Paper Group AWR 101
Binary Classification from Positive-Confidence Data
| Title | Binary Classification from Positive-Confidence Data |
|---|---|
| Authors | Takashi Ishida, Gang Niu, Masashi Sugiyama |
| Abstract | Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification, which aims at “describing” the positive class with clustering-related methods; however, one-class classification offers no way to tune hyper-parameters, and its aim is not to “discriminate” between the positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments. |
| Tasks | |
| Published | 2017-10-19 |
| URL | http://arxiv.org/abs/1710.07138v3 |
| PDF | http://arxiv.org/pdf/1710.07138v3.pdf |
| PWC | https://paperswithcode.com/paper/binary-classification-from-positive |
| Repo | https://github.com/takashiishida/pconf |
| Framework | pytorch |
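The empirical risk minimization framework the abstract mentions can be made concrete. Below is a minimal PyTorch sketch, assuming the logistic loss and the paper's rewriting of the classification risk in terms of positive data and their confidences r(x) = p(y=+1|x); the tensor names are illustrative placeholders.

```python
import torch.nn.functional as F

def pconf_loss(logits, conf):
    """Empirical Pconf risk under the logistic loss (a sketch).

    logits: (N,) raw classifier outputs g(x) on positive-confidence samples.
    conf:   (N,) confidences r(x) = p(y=+1 | x), assumed to lie in (0, 1].
    """
    loss_pos = F.softplus(-logits)   # logistic loss for treating x as positive
    loss_neg = F.softplus(logits)    # logistic loss for treating x as negative
    # Each positive example also contributes a (1 - r)/r weighted negative-class
    # term, which is what lets the risk be estimated from positive data alone.
    return (loss_pos + (1.0 - conf) / conf * loss_neg).mean()
```

Since the weight (1 − r)/r grows without bound as r → 0, low-confidence samples dominate the estimate, which is one reason an estimation error bound of the kind the abstract establishes matters in practice.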
Toward Multimodal Image-to-Image Translation
| Title | Toward Multimodal Image-to-Image Translation |
|---|---|
| Authors | Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman |
| Abstract | Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a *distribution* of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled into a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity. |
| Tasks | Image-to-Image Translation |
| Published | 2017-11-30 |
| URL | http://arxiv.org/abs/1711.11586v4 |
| PDF | http://arxiv.org/pdf/1711.11586v4.pdf |
| PWC | https://paperswithcode.com/paper/toward-multimodal-image-to-image-translation |
| Repo | https://github.com/eriklindernoren/PyTorch-GAN |
| Framework | pytorch |
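The invertibility constraint the abstract describes can be sketched as a latent regression term: the encoder should recover the code that generated an output. A minimal sketch, assuming a generator `G(x, z)` and an encoder `E(img)` (hypothetical interfaces; the paper explores several variants and objectives beyond this one):

```python
import torch

def latent_regression_loss(G, E, x, z):
    # Penalize the generator/encoder pair when the encoder cannot recover
    # the latent code that produced an output, discouraging the many-to-one
    # (mode-collapsed) code-to-output mapping described in the abstract.
    fake = G(x, z)                                 # translate x using code z
    z_recovered = E(fake)                          # try to read the code back
    return torch.mean(torch.abs(z_recovered - z))  # L1 latent reconstruction
```

In the full model this term is combined with adversarial and image reconstruction losses; only the invertibility piece is shown here.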
Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography
| Title | Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography |
|---|---|
| Authors | Li Shen, Laurie R. Margolies, Joseph H. Rothstein, Eugene Fluder, Russell B. McBride, Weiva Sieh |
| Abstract | The rapid development of deep learning, a family of machine learning techniques, has spurred much interest in its application to medical imaging problems. Here, we develop a deep learning algorithm that can accurately detect breast cancer on screening mammograms using an “end-to-end” training approach that efficiently leverages training datasets with either complete clinical annotation or only the cancer status (label) of the whole image. In this approach, lesion annotations are required only in the initial training stage, and subsequent stages require only image-level labels, eliminating the reliance on rarely available lesion annotations. Our all convolutional network method for classifying screening mammograms attained excellent performance in comparison with previous methods. On an independent test set of digitized film mammograms from the Digital Database for Screening Mammography (DDSM), the best single model achieved a per-image AUC of 0.88, and four-model averaging improved the AUC to 0.91 (sensitivity: 86.1%, specificity: 80.1%). On a validation set of full-field digital mammography (FFDM) images from the INbreast database, the best single model achieved a per-image AUC of 0.95, and four-model averaging improved the AUC to 0.98 (sensitivity: 86.7%, specificity: 96.1%). We also demonstrate that a whole image classifier trained using our end-to-end approach on the DDSM digitized film mammograms can be transferred to INbreast FFDM images using only a subset of the INbreast data for fine-tuning and without further reliance on the availability of lesion annotations. These findings show that automatic deep learning methods can be readily trained to attain high accuracy on heterogeneous mammography platforms, and hold tremendous promise for improving clinical tools to reduce false positive and false negative screening mammography results. |
| Tasks | Breast Cancer Detection |
| Published | 2017-08-30 |
| URL | http://arxiv.org/abs/1708.09427v5 |
| PDF | http://arxiv.org/pdf/1708.09427v5.pdf |
| PWC | https://paperswithcode.com/paper/deep-learning-to-improve-breast-cancer-early |
| Repo | https://github.com/yuyuyu123456/CBIS-DDSM |
| Framework | tf |
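The staged recipe the abstract describes (a patch classifier trained first on lesion annotations, then reused fully convolutionally on whole images with only image-level labels) might look roughly like the PyTorch sketch below. The backbone, channel count, and all-convolutional head are assumptions for illustration, not the paper's exact architecture:

```python
import torch.nn as nn

# Stage 1 (not shown): train `backbone` as a patch classifier using
# lesion-level patch labels. Stage 2: wrap it as below and train on whole
# mammograms with image-level cancer labels only.
class WholeImageClassifier(nn.Module):
    def __init__(self, backbone, n_classes=2):
        super().__init__()
        self.backbone = backbone               # patch classifier minus its head
        self.top = nn.Sequential(              # all-convolutional top layers
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, n_classes, 1),
        )

    def forward(self, x):                      # x: a whole mammogram
        h = self.backbone(x)                   # (N, 512, H', W') feature map
        logits = self.top(h)                   # per-location class scores
        return logits.amax(dim=(2, 3))         # global max pool -> image logits
```

Because the top layers are convolutional, the same weights apply to images of any size, which is what makes the patch-to-whole-image transfer possible.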
Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
| Title | Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images |
|---|---|
| Authors | Wei-Lin Hsiao, Kristen Grauman |
| Abstract | What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style. |
| Tasks | Topic Models |
| Published | 2017-07-11 |
| URL | http://arxiv.org/abs/1707.03376v2 |
| PDF | http://arxiv.org/pdf/1707.03376v2.pdf |
| PWC | https://paperswithcode.com/paper/learning-the-latent-look-unsupervised |
| Repo | https://github.com/arodri202/dl-final-project |
| Framework | none |
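The pipeline in the abstract (visual attributes → topic model → style mixtures per outfit) can be approximated with standard tools. The toy sketch below substitutes ordinary LDA for the polylingual topic model the paper actually uses, with randomly generated outfit-attribute counts standing in for real data:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Each outfit is a "document" whose "words" are detected visual attributes
# (e.g. "floral", "maxi", "denim"); topics then play the role of latent styles.
rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(1000, 200))  # outfit-by-attribute counts (toy)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
style_mix = lda.fit_transform(counts)  # (1000, 10): how each outfit mixes styles
```

Galleries can then be organized by sorting or clustering outfits on their `style_mix` rows, mirroring the label-free organization the abstract claims.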
Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks
| Title | Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks |
|---|---|
| Authors | Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang |
| Abstract | Convolutional neural networks have recently demonstrated high-quality reconstruction for single image super-resolution. However, existing methods often require a large number of network parameters and entail heavy computational loads at runtime for generating high-accuracy super-resolution results. In this paper, we propose the deep Laplacian Pyramid Super-Resolution Network for fast and accurate image super-resolution. The proposed network progressively reconstructs the sub-band residuals of high-resolution images at multiple pyramid levels. In contrast to existing methods that involve bicubic interpolation for pre-processing (which results in large feature maps), the proposed method directly extracts features from the low-resolution input space and thereby entails low computational loads. We train the proposed network with deep supervision using the robust Charbonnier loss function and achieve high-quality image reconstruction. Furthermore, we utilize recursive layers to share parameters across as well as within pyramid levels, and thus drastically reduce the number of parameters. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of run-time and image quality. |
| Tasks | Image Reconstruction, Image Super-Resolution, Super-Resolution |
| Published | 2017-10-04 |
| URL | http://arxiv.org/abs/1710.01992v3 |
| PDF | http://arxiv.org/pdf/1710.01992v3.pdf |
| PWC | https://paperswithcode.com/paper/fast-and-accurate-image-super-resolution-with |
| Repo | https://github.com/pratik-kubal/Deep-Laplacian-Pyramid-Networks |
| Framework | tf |
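The Charbonnier loss mentioned in the abstract is a smooth, outlier-robust variant of L1 and is simple to write down. A minimal sketch (the epsilon value shown is a commonly used setting, assumed here):

```python
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    # Differentiable relaxation of L1: sqrt(x^2 + eps^2) is smooth near zero
    # yet grows linearly for large residuals, so it tolerates outliers better
    # than L2. Applied to the residual prediction at each pyramid level.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```

Under deep supervision, one such loss is computed at every pyramid level's output and the terms are summed.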
Online Learning for Neural Machine Translation Post-editing
| Title | Online Learning for Neural Machine Translation Post-editing |
|---|---|
| Authors | Álvaro Peris, Luis Cebrián, Francisco Casacuberta |
| Abstract | Neural machine translation has revolutionized the field. Nevertheless, post-editing the outputs of the system is mandatory for tasks requiring high translation quality. Post-editing offers a unique opportunity for improving neural machine translation systems, using online learning techniques and treating the post-edited translations as new, fresh training data. We review classical learning methods and propose a new optimization algorithm. We thoroughly compare online learning algorithms in a post-editing scenario. Results show significant improvements in translation quality and effort reduction. |
| Tasks | Machine Translation |
| Published | 2017-06-10 |
| URL | http://arxiv.org/abs/1706.03196v1 |
| PDF | http://arxiv.org/pdf/1706.03196v1.pdf |
| PWC | https://paperswithcode.com/paper/online-learning-for-neural-machine |
| Repo | https://github.com/lvapeab/nmt-keras |
| Framework | tf |
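The core loop the abstract describes, treating each post-edited translation as fresh training data, reduces to one online gradient step per sentence. A sketch with a hypothetical seq2seq `model` interface (the paper compares several update rules; a plain gradient step is shown here):

```python
import torch

def online_adapt(model, optimizer, loss_fn, src, post_edit):
    # After a translator post-edits a hypothesis, treat the (source,
    # post-edited) pair as a single fresh training sample and take one
    # gradient step before translating the next sentence.
    model.train()
    optimizer.zero_grad()
    logits = model(src, post_edit[:-1])       # teacher forcing on the post-edit
    loss = loss_fn(logits.view(-1, logits.size(-1)), post_edit[1:].view(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the model adapts continuously, later sentences in a document benefit from corrections made on earlier ones, which is where the reported effort reduction comes from.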
Video Object Segmentation with Re-identification
| Title | Video Object Segmentation with Re-identification |
|---|---|
| Authors | Xiaoxiao Li, Yuankai Qi, Zhe Wang, Kai Chen, Ziwei Liu, Jianping Shi, Ping Luo, Xiaoou Tang, Chen Change Loy |
| Abstract | Conventional video segmentation methods often rely on temporal continuity to propagate masks. Such an assumption suffers from issues such as drifting and an inability to handle large displacements. To overcome these issues, we formulate an effective mechanism to prevent the target from being lost via adaptive object re-identification. Specifically, our Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module. The former produces an initial probability map by flow warping, while the latter retrieves missing instances by adaptive matching. With these two modules applied iteratively, our VS-ReID records a global mean (Region Jaccard and Boundary F measure) of 0.699, the best performance in the 2017 DAVIS Challenge. |
| Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
| Published | 2017-08-01 |
| URL | http://arxiv.org/abs/1708.00197v1 |
| PDF | http://arxiv.org/pdf/1708.00197v1.pdf |
| PWC | https://paperswithcode.com/paper/video-object-segmentation-with-re |
| Repo | https://github.com/birdman9390/MetaMaskTrack |
| Framework | pytorch |
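At a high level, the alternation between the two modules reads like the loop below. All functions are hypothetical placeholders for the paper's flow-warping propagation and adaptive-matching re-identification:

```python
def vs_reid(frames, init_mask, propagate, reid_retrieve, n_iters=3):
    # Alternate the two modules described in the abstract: flow-based mask
    # propagation, then re-identification of instances the propagation lost
    # (e.g. after occlusion or large displacement), which seed the next pass.
    masks = propagate(frames, init_mask)           # initial probability maps
    for _ in range(n_iters):
        recovered = reid_retrieve(frames, masks)   # match candidates to target
        masks = propagate(frames, init_mask, extra_anchors=recovered)
    return masks
```

The key design point is that re-identification gives propagation new anchor frames, so a target lost mid-sequence can be picked up again rather than drifting away permanently.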
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
| Title | Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning |
|---|---|
| Authors | Victor Zhong, Caiming Xiong, Richard Socher |
| Abstract | A significant amount of the world’s knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%. |
| Tasks | Text-To-Sql |
| Published | 2017-08-31 |
| URL | http://arxiv.org/abs/1709.00103v7 |
| PDF | http://arxiv.org/pdf/1709.00103v7.pdf |
| PWC | https://paperswithcode.com/paper/seq2sql-generating-structured-queries-from |
| Repo | https://github.com/llSourcell/SQL_Database_Optimization |
| Framework | pytorch |
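The in-the-loop execution reward the abstract mentions compares query results rather than query strings, since the WHERE clause is unordered and many literal orderings are equally correct. A sketch (the specific penalty values are illustrative, and `execute` is a hypothetical database handle):

```python
def execution_reward(pred_sql, gold_sql, execute):
    # Reward for the policy-gradient training of the unordered query parts:
    # judge the prediction by what it returns from the database, not by
    # token-level agreement with the gold query.
    try:
        pred_result = execute(pred_sql)
    except Exception:
        return -2.0   # query is not even valid SQL
    if pred_result != execute(gold_sql):
        return -1.0   # runs, but returns the wrong rows
    return 1.0        # correct execution result
```

This reward then drives a REINFORCE-style update on the clause generator, while the ordered parts of the query (aggregation, selection) keep a standard cross-entropy loss.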
The Diverse Cohort Selection Problem
| Title | The Diverse Cohort Selection Problem |
|---|---|
| Authors | Candice Schumann, Samsara N. Counts, Jeffrey S. Foster, John P. Dickerson |
| Abstract | How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent performs costly exploration of a set of arms before selecting a final subset with some combinatorial structure. We generalize a recent CPE algorithm to the setting where arm pulls can have different costs and return different levels of information. We then prove theoretical upper bounds for a general class of arm-pulling strategies in this new setting. We apply our general algorithm to a real-world problem with combinatorial structure: incorporating diversity into university admissions. We take real data from admissions at one of the largest US-based computer science graduate programs and show that a simulation of our algorithm produces a cohort with higher overall utility while spending a budget comparable to the current admissions process at that university. |
| Tasks | |
| Published | 2017-09-11 |
| URL | http://arxiv.org/abs/1709.03441v5 |
| PDF | http://arxiv.org/pdf/1709.03441v5.pdf |
| PWC | https://paperswithcode.com/paper/the-diverse-cohort-selection-problem |
| Repo | https://github.com/principledhiring/SWAP |
| Framework | none |
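The setting, cheap noisy pulls versus expensive informative ones under a budget, can be illustrated with a toy strategy. This is not the paper's algorithm (which generalizes a combinatorial pure exploration method with proven bounds); it only demonstrates the cost/information trade-off, and `pull` is a hypothetical oracle:

```python
import numpy as np

def cost_aware_exploration(pull, n_arms, budget, cheap_cost=1.0, deep_cost=5.0):
    # Toy sketch of the setting: arm pulls (resume screens vs. interviews)
    # have different costs and fidelities. Screen every arm cheaply once,
    # then spend the remaining budget interviewing the least-informed arms.
    est = np.zeros(n_arms)   # running utility estimate per applicant (arm)
    n = np.zeros(n_arms)     # number of observations per arm
    for a in range(n_arms):
        est[a], n[a] = pull(a, deep=False), 1
        budget -= cheap_cost
    while budget >= deep_cost:
        a = int(np.argmin(n))                    # least-observed arm first
        est[a] = (est[a] * n[a] + pull(a, deep=True)) / (n[a] + 1)
        n[a] += 1
        budget -= deep_cost
    return est
```

The final cohort would then be chosen from `est` subject to the combinatorial (e.g. diversity) constraints, which is the part the paper's CPE machinery handles.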
Spatial As Deep: Spatial CNN for Traffic Scene Understanding
| Title | Spatial As Deep: Spatial CNN for Traffic Scene Understanding |
|---|---|
| Authors | Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang |
| Abstract | Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNNs have shown a strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored. These relationships are important for learning semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface, as shown in Fig. 1 (a). In this paper, we propose Spatial CNN (SCNN), which generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns in a layer. Such an SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance clues, such as traffic lanes, poles, and walls. We apply SCNN to a newly released, very challenging traffic lane detection dataset and the Cityscapes dataset. The results show that SCNN can learn the spatial relationships needed for structured output and significantly improves performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) on the lane detection dataset by 8.7% and 4.6%, respectively. Moreover, our SCNN won first place in the TuSimple Benchmark Lane Detection Challenge with an accuracy of 96.53%. |
| Tasks | Lane Detection, Scene Understanding |
| Published | 2017-12-17 |
| URL | http://arxiv.org/abs/1712.06080v1 |
| PDF | http://arxiv.org/pdf/1712.06080v1.pdf |
| PWC | https://paperswithcode.com/paper/spatial-as-deep-spatial-cnn-for-traffic-scene |
| Repo | https://github.com/cardwing/Codes-for-Lane-Detection |
| Framework | tf |
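The slice-by-slice convolution at the heart of SCNN is easy to sketch for one direction (top-to-bottom); the full model applies four such passes (downward, upward, rightward, leftward). A minimal PyTorch sketch, with the kernel width as an assumed setting:

```python
import torch
import torch.nn as nn

class SCNNDown(nn.Module):
    # One directional pass: each row of the feature map receives a convolved,
    # ReLU-gated message from the row above, so information travels the full
    # height of the map within a single layer.
    def __init__(self, channels, kernel_width=9):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, (1, kernel_width),
                              padding=(0, kernel_width // 2))

    def forward(self, x):                  # x: (N, C, H, W)
        rows = list(x.split(1, dim=2))     # H slices of shape (N, C, 1, W)
        for i in range(1, len(rows)):
            rows[i] = rows[i] + torch.relu(self.conv(rows[i - 1]))
        return torch.cat(rows, dim=2)
```

Because each row depends on the updated row before it, the pass is sequential across slices, which is the "spatial as deep" idea: the spatial dimension is traversed the way depth normally is.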
Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings
| Title | Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings |
|---|---|
| Authors | Weixin Cai, Nima S. Hejazi, Alan E. Hubbard |
| Abstract | Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands – even millions – of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variables. In analyzing such hypothesis testing procedures, gains in efficiency and power can be achieved by performing variable reduction on the set of hypotheses prior to testing. We present in this paper an approach using data-adaptive multiple testing that serves exactly this purpose. This approach applies data mining techniques to screen the full set of covariates on equally sized partitions of the whole sample via cross-validation. This generalized screening procedure is used to create average ranks for covariates, which are then used to generate a reduced (sub)set of hypotheses, from which we compute test statistics that are subsequently subjected to standard multiple testing corrections. The principal advantage of this methodology lies in its providing valid statistical inference without specifying *a priori* which hypotheses will be tested. Here, we present the theoretical details of this approach, confirm its validity via a simulation study, and exemplify its use by applying it to the analysis of data on microRNA differential expression. |
| Tasks | |
| Published | 2017-04-24 |
| URL | http://arxiv.org/abs/1704.07008v1 |
| PDF | http://arxiv.org/pdf/1704.07008v1.pdf |
| PWC | https://paperswithcode.com/paper/data-adaptive-statistics-for-multiple |
| Repo | https://github.com/wilsoncai1992/adaptest |
| Framework | none |
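The screening stage the abstract describes (cross-validated scoring, averaged ranks, reduced hypothesis set) can be sketched as follows; `score_fn` is a hypothetical stand-in for whatever data-mining scorer is used, and the retained hypotheses would then pass through a standard multiple-testing correction such as Benjamini-Hochberg:

```python
import numpy as np
from sklearn.model_selection import KFold

def data_adaptive_screen(X, y, score_fn, keep=100):
    # Rank covariates by a data-mining score on each cross-validation
    # partition, average the ranks across folds, and carry only the
    # top-ranked covariates forward to formal hypothesis testing.
    ranks = np.zeros(X.shape[1])
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for train, _ in kf.split(X):
        scores = score_fn(X[train], y[train])  # one score per covariate
        ranks += scores.argsort().argsort()    # higher score -> higher rank
    return np.argsort(-ranks)[:keep]           # indices of retained hypotheses
```

Screening on partitions disjoint from the data used for final testing is what preserves the validity of the downstream inference.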
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
| Title | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification |
|---|---|
| Authors | Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy |
| Abstract | Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification. Three main challenges exist: spatial (image) feature representation, temporal information representation, and model/computation complexity. It was recently shown by Carreira and Zisserman that 3D CNNs, inflated from 2D networks and pretrained on ImageNet, could be a promising way for spatial and temporal representation learning. However, in terms of model/computation complexity, 3D CNNs are much more expensive than 2D CNNs and prone to overfitting. We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices. In particular, we show that it is possible to replace many of the 3D convolutions with low-cost 2D convolutions. Rather surprisingly, the best result (in both speed and accuracy) is achieved when replacing the 3D convolutions at the bottom of the network, suggesting that temporal representation learning on high-level semantic features is more useful. Our conclusion generalizes to datasets with very different properties. When combined with several other cost-effective designs, including separable spatial/temporal convolution and feature gating, our approach results in an effective video classification system that produces very competitive results on several action classification benchmarks (Kinetics, Something-something, UCF101 and HMDB), as well as two action detection (localization) benchmarks (JHMDB and UCF101-24). |
| Tasks | Action Classification, Action Detection, Action Recognition In Videos, Image Classification, Representation Learning, Video Classification |
| Published | 2017-12-13 |
| URL | http://arxiv.org/abs/1712.04851v2 |
| PDF | http://arxiv.org/pdf/1712.04851v2.pdf |
| PWC | https://paperswithcode.com/paper/rethinking-spatiotemporal-feature-learning |
| Repo | https://github.com/kylemin/S3D |
| Framework | pytorch |
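The separable spatial/temporal convolution mentioned among the cost-effective designs factorizes a k×k×k 3D kernel into a spatial 1×k×k convolution followed by a temporal k×1×1 one, cutting parameters and FLOPs roughly from k³ to 2k² per channel pair. A minimal PyTorch sketch (layer shapes are illustrative):

```python
import torch.nn as nn

class SepConv3d(nn.Module):
    # Separable spatiotemporal convolution: a cheap 1xkxk spatial convolution
    # followed by a kx1x1 temporal one, replacing a full kxkxk 3D kernel.
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_out, (1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(c_out, c_out, (k, 1, 1),
                                  padding=(k // 2, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (N, C, T, H, W)
        return self.relu(self.temporal(self.relu(self.spatial(x))))
```

Per the abstract's finding, such temporal modeling pays off most in the upper layers, so the lower layers can stay purely 2D.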
Linear Ensembles of Word Embedding Models
| Title | Linear Ensembles of Word Embedding Models |
|---|---|
| Authors | Avo Muromägi, Kairit Sirts, Sven Laur |
| Abstract | This paper explores linear methods for combining several word embedding models into an ensemble. We construct the combined models using an iterative method based on either ordinary least squares regression or the solution to the orthogonal Procrustes problem. We evaluate the proposed approaches on Estonian—a morphologically complex language, for which the available corpora for training word embeddings are relatively small. We compare both combined models with each other and with the input word embedding models using synonym and analogy tests. The results show that while using ordinary least squares regression performs poorly in our experiments, using orthogonal Procrustes to combine several word embedding models into an ensemble model leads to 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests. |
| Tasks | Word Embeddings |
| Published | 2017-04-05 |
| URL | http://arxiv.org/abs/1704.01419v1 |
| PDF | http://arxiv.org/pdf/1704.01419v1.pdf |
| PWC | https://paperswithcode.com/paper/linear-ensembles-of-word-embedding-models |
| Repo | https://github.com/Shujian2015/meta-embedding-paper-list |
| Framework | none |
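The orthogonal Procrustes step at the core of the better-performing ensemble has a closed-form solution via SVD. A minimal NumPy sketch of that single step (the paper's full method iterates such alignments toward a common target space):

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: (vocab, dim) embedding matrices over a shared vocabulary.
    The closed-form solution is U @ Vt from the SVD of X.T @ Y.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Each input model can then be mapped into the evolving average space with `X @ procrustes_align(X, Y)`; restricting W to be orthogonal preserves distances and angles within each embedding space, which is plausibly why it outperforms unconstrained least squares here.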
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
| Title | Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses |
|---|---|
| Authors | Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau |
| Abstract | Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model’s predictions correlate significantly with human judgements, at a level much higher than word-overlap metrics such as BLEU, at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation. |
| Tasks | |
| Published | 2017-08-23 |
| URL | http://arxiv.org/abs/1708.07149v2 |
| PDF | http://arxiv.org/pdf/1708.07149v2.pdf |
| PWC | https://paperswithcode.com/paper/towards-an-automatic-turing-test-learning-to |
| Repo | https://github.com/mike-n-7/ADEM |
| Framework | none |
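A scoring head in the spirit of ADEM can be sketched as bilinear interactions between encoded context, reference response, and model response. The encoder, dimensions, and score rescaling are placeholders, not necessarily the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class ADEMScore(nn.Module):
    # Bilinear scoring head: dot products between learned projections of the
    # context/reference encodings and the model-response encoding produce a
    # scalar quality score, trained to regress onto human ratings.
    def __init__(self, dim):
        super().__init__()
        self.M = nn.Parameter(torch.eye(dim))  # context-response interaction
        self.N = nn.Parameter(torch.eye(dim))  # reference-response interaction

    def forward(self, ctx, ref, resp):         # each: (batch, dim) encodings
        score = (ctx @ self.M * resp).sum(-1) + (ref @ self.N * resp).sum(-1)
        return score                           # optionally rescaled to [1, 5]
```

Training minimizes squared error against the collected human scores, which is what lets the metric reward semantically appropriate responses that share few words with the reference.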
3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network
| Title | 3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network |
|---|---|
| Authors | Renato Hermoza, Ivan Sipiran |
| Abstract | We introduce a data-driven approach to aid the repair and conservation of archaeological objects: ORGAN, an object reconstruction generative adversarial network (GAN). By using an encoder-decoder 3D deep neural network in a GAN architecture, and combining two loss objectives, a completion loss and an Improved Wasserstein GAN loss, we can train a network to effectively predict the missing geometry of damaged objects. As archaeological objects can differ greatly from one another, the network is conditioned on a variable, which can be a culture, a region, or any metadata of the object. Our results show that our method can recover most of the information from damaged objects, even in cases where more than half of the voxels are missing, without producing many errors. |
| Tasks | 3D Reconstruction, Object Reconstruction |
| Published | 2017-11-17 |
| URL | http://arxiv.org/abs/1711.06363v2 |
| PDF | http://arxiv.org/pdf/1711.06363v2.pdf |
| PWC | https://paperswithcode.com/paper/3d-reconstruction-of-incomplete |
| Repo | https://github.com/renato145/3D-ORGAN |
| Framework | none |
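The combination of the two loss objectives named in the abstract can be sketched for the generator side; the Improved WGAN gradient penalty would live in the critic's loss, and the weighting `lam` is an illustrative assumption:

```python
import torch

def organ_generator_loss(fake_critic_score, fake_vox, real_vox, lam=10.0):
    # Two terms, as described in the abstract: a completion (reconstruction)
    # loss against the ground-truth voxel grid, plus the WGAN generator
    # objective (maximize the critic's score on completed shapes).
    completion = torch.mean(torch.abs(fake_vox - real_vox))
    adversarial = -fake_critic_score.mean()
    return lam * completion + adversarial
```

The completion term keeps the predicted geometry faithful to the surviving fragment, while the adversarial term pushes the filled-in voxels toward plausible shapes for the conditioning culture or region.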