July 29, 2019

3187 words 15 mins read

Paper Group AWR 101

Binary Classification from Positive-Confidence Data. Toward Multimodal Image-to-Image Translation. Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography. Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images. Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid …

Binary Classification from Positive-Confidence Data

Title Binary Classification from Positive-Confidence Data
Authors Takashi Ishida, Gang Niu, Masashi Sugiyama
Abstract Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification, which aims at “describing” the positive class with clustering-related methods; however, one-class classification lacks the ability to tune hyper-parameters, and its aim is not to “discriminate” between positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish the consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.
Tasks
Published 2017-10-19
URL http://arxiv.org/abs/1710.07138v3
PDF http://arxiv.org/pdf/1710.07138v3.pdf
PWC https://paperswithcode.com/paper/binary-classification-from-positive
Repo https://github.com/takashiishida/pconf
Framework pytorch
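
As a concrete illustration of the empirical risk minimization framework described in the abstract, here is a minimal PyTorch sketch of a positive-confidence risk with the logistic loss. The confidence-weighting form and function name are illustrative assumptions, not the authors' reference implementation (see the linked repo for that).

```python
import torch
import torch.nn.functional as F

def pconf_logistic_risk(scores, conf):
    """Illustrative Pconf empirical risk with the logistic loss.

    scores: model outputs g(x) on positive-confidence samples, shape (n,)
    conf:   positive-confidence values r(x) in (0, 1], shape (n,)
    """
    loss_pos = F.softplus(-scores)   # loss for treating x as positive
    loss_neg = F.softplus(scores)    # loss for treating x as negative
    # negative-class term is re-weighted by (1 - r) / r using only positive data
    return (loss_pos + (1.0 - conf) / conf * loss_neg).mean()
```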

Toward Multimodal Image-to-Image Translation

Title Toward Multimodal Image-to-Image Translation
Authors Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman
Abstract Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent a many-to-one mapping from the latent code to the output during training, also known as the problem of mode collapse, and produces more diverse results. We explore several variants of this approach by employing different training objectives, network architectures, and methods of injecting the latent code. Our proposed method encourages bijective consistency between the latent encoding and output modes. We present a systematic comparison of our method and other variants on both perceptual realism and diversity.
Tasks Image-to-Image Translation
Published 2017-11-30
URL http://arxiv.org/abs/1711.11586v4
PDF http://arxiv.org/pdf/1711.11586v4.pdf
PWC https://paperswithcode.com/paper/toward-multimodal-image-to-image-translation
Repo https://github.com/eriklindernoren/PyTorch-GAN
Framework pytorch
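
The “invertible connection between output and latent code” amounts to a latent-regression term: an encoder should be able to recover the sampled code from the generated image. A minimal sketch of that term follows; the encoder, generator, and code dimension are placeholders, not the paper's exact architecture.

```python
import torch

def latent_regression_loss(encoder, generator, x, z_dim=8):
    """Encourage z -> G(x, z) -> E(G(x, z)) ~= z, discouraging mode collapse."""
    z = torch.randn(x.size(0), z_dim)   # randomly sampled low-dimensional code
    fake = generator(x, z)              # output conditioned on input and code
    z_recovered = encoder(fake)         # try to invert the mapping back to z
    return torch.mean(torch.abs(z_recovered - z))
```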

Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography

Title Deep Learning to Improve Breast Cancer Early Detection on Screening Mammography
Authors Li Shen, Laurie R. Margolies, Joseph H. Rothstein, Eugene Fluder, Russell B. McBride, Weiva Sieh
Abstract The rapid development of deep learning, a family of machine learning techniques, has spurred much interest in its application to medical imaging problems. Here, we develop a deep learning algorithm that can accurately detect breast cancer on screening mammograms using an “end-to-end” training approach that efficiently leverages training datasets with either complete clinical annotation or only the cancer status (label) of the whole image. In this approach, lesion annotations are required only in the initial training stage, and subsequent stages require only image-level labels, eliminating the reliance on rarely available lesion annotations. Our all convolutional network method for classifying screening mammograms attained excellent performance in comparison with previous methods. On an independent test set of digitized film mammograms from Digital Database for Screening Mammography (DDSM), the best single model achieved a per-image AUC of 0.88, and four-model averaging improved the AUC to 0.91 (sensitivity: 86.1%, specificity: 80.1%). On a validation set of full-field digital mammography (FFDM) images from the INbreast database, the best single model achieved a per-image AUC of 0.95, and four-model averaging improved the AUC to 0.98 (sensitivity: 86.7%, specificity: 96.1%). We also demonstrate that a whole image classifier trained using our end-to-end approach on the DDSM digitized film mammograms can be transferred to INbreast FFDM images using only a subset of the INbreast data for fine-tuning and without further reliance on the availability of lesion annotations. These findings show that automatic deep learning methods can be readily trained to attain high accuracy on heterogeneous mammography platforms, and hold tremendous promise for improving clinical tools to reduce false positive and false negative screening mammography results.
Tasks Breast Cancer Detection
Published 2017-08-30
URL http://arxiv.org/abs/1708.09427v5
PDF http://arxiv.org/pdf/1708.09427v5.pdf
PWC https://paperswithcode.com/paper/deep-learning-to-improve-breast-cancer-early
Repo https://github.com/yuyuyu123456/CBIS-DDSM
Framework tf
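
A rough tf.keras sketch of the whole-image stage described above, which needs only image-level labels. It substitutes a generic ImageNet backbone for the paper's patch-trained all-convolutional network, and the input size, learning rates, and epoch counts are assumptions for illustration only.

```python
import tensorflow as tf

# Backbone stands in for the patch-trained network; pooling="avg" yields one vector per image.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=(1152, 896, 3), pooling="avg")
model = tf.keras.Sequential([backbone, tf.keras.layers.Dense(1, activation="sigmoid")])

# Stage 1: freeze the backbone and train only the new head on image-level labels.
backbone.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])
# model.fit(train_ds, validation_data=val_ds, epochs=3)

# Stage 2: unfreeze and fine-tune end to end at a lower learning rate.
backbone.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=[tf.keras.metrics.AUC()])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```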

Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images

Title Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images
Authors Wei-Lin Hsiao, Kristen Grauman
Abstract What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style.
Tasks Topic Models
Published 2017-07-11
URL http://arxiv.org/abs/1707.03376v2
PDF http://arxiv.org/pdf/1707.03376v2.pdf
PWC https://paperswithcode.com/paper/learning-the-latent-look-unsupervised
Repo https://github.com/arodri202/dl-final-project
Framework none
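
A heavily simplified sketch of the outfit-as-mixture-of-styles idea: treat each outfit's detected attributes as a bag-of-words document and fit a plain LDA topic model. The paper uses a polylingual topic model over attribute “languages”, which this sketch does not reproduce; the data here is random toy input.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
outfit_attributes = rng.integers(0, 2, size=(1000, 200))  # outfits x binary visual attributes (toy data)

lda = LatentDirichletAllocation(n_components=20, random_state=0)
style_mixtures = lda.fit_transform(outfit_attributes)     # each outfit as a mixture of latent styles
top_attrs = lda.components_.argsort(axis=1)[:, -5:]       # most probable attributes per latent style
```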

Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks

Title Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks
Authors Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
Abstract Convolutional neural networks have recently demonstrated high-quality reconstruction for single image super-resolution. However, existing methods often require a large number of network parameters and entail heavy computational loads at runtime for generating high-accuracy super-resolution results. In this paper, we propose the deep Laplacian Pyramid Super-Resolution Network for fast and accurate image super-resolution. The proposed network progressively reconstructs the sub-band residuals of high-resolution images at multiple pyramid levels. In contrast to existing methods that involve bicubic interpolation for pre-processing (which results in large feature maps), the proposed method directly extracts features from the low-resolution input space and thereby entails low computational loads. We train the proposed network with deep supervision using the robust Charbonnier loss function and achieve high-quality image reconstruction. Furthermore, we utilize recursive layers to share parameters across as well as within pyramid levels, and thus drastically reduce the number of parameters. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of run-time and image quality.
Tasks Image Reconstruction, Image Super-Resolution, Super-Resolution
Published 2017-10-04
URL http://arxiv.org/abs/1710.01992v3
PDF http://arxiv.org/pdf/1710.01992v3.pdf
PWC https://paperswithcode.com/paper/fast-and-accurate-image-super-resolution-with
Repo https://github.com/pratik-kubal/Deep-Laplacian-Pyramid-Networks
Framework tf
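
The “robust Charbonnier loss” mentioned above is a differentiable variant of the L1 penalty. A minimal PyTorch version, with the epsilon value assumed:

```python
import torch

def charbonnier_loss(pred, target, eps=1e-3):
    # rho(x) = sqrt(x^2 + eps^2), applied per pixel and averaged
    diff = pred - target
    return torch.mean(torch.sqrt(diff * diff + eps * eps))
```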

Online Learning for Neural Machine Translation Post-editing

Title Online Learning for Neural Machine Translation Post-editing
Authors Álvaro Peris, Luis Cebrián, Francisco Casacuberta
Abstract Neural machine translation has revolutionized the field. Nevertheless, post-editing the outputs of the system is mandatory for tasks requiring high translation quality. Post-editing offers a unique opportunity for improving neural machine translation systems, using online learning techniques and treating the post-edited translations as new, fresh training data. We review classical learning methods and propose a new optimization algorithm. We thoroughly compare online learning algorithms in a post-editing scenario. Results show significant improvements in translation quality and effort reduction.
Tasks Machine Translation
Published 2017-06-10
URL http://arxiv.org/abs/1706.03196v1
PDF http://arxiv.org/pdf/1706.03196v1.pdf
PWC https://paperswithcode.com/paper/online-learning-for-neural-machine
Repo https://github.com/lvapeab/nmt-keras
Framework tf

Video Object Segmentation with Re-identification

Title Video Object Segmentation with Re-identification
Authors Xiaoxiao Li, Yuankai Qi, Zhe Wang, Kai Chen, Ziwei Liu, Jianping Shi, Ping Luo, Xiaoou Tang, Chen Change Loy
Abstract Conventional video segmentation methods often rely on temporal continuity to propagate masks. Such an assumption suffers from issues like drifting and inability to handle large displacement. To overcome these issues, we formulate an effective mechanism to prevent the target from being lost via adaptive object re-identification. Specifically, our Video Object Segmentation with Re-identification (VS-ReID) model includes a mask propagation module and a ReID module. The former module produces an initial probability map by flow warping while the latter module retrieves missing instances by adaptive matching. With these two modules iteratively applied, our VS-ReID records a global mean (Region Jaccard and Boundary F measure) of 0.699, the best performance in 2017 DAVIS Challenge.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2017-08-01
URL http://arxiv.org/abs/1708.00197v1
PDF http://arxiv.org/pdf/1708.00197v1.pdf
PWC https://paperswithcode.com/paper/video-object-segmentation-with-re
Repo https://github.com/birdman9390/MetaMaskTrack
Framework pytorch
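
For reference, the “Region Jaccard” half of the reported global mean is the usual intersection-over-union between predicted and ground-truth masks. A small NumPy helper:

```python
import numpy as np

def region_jaccard(pred_mask, gt_mask):
    """J measure: IoU between boolean prediction and ground-truth masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                       # both masks empty: treat as a perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```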

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

Title Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
Authors Victor Zhong, Caiming Xiong, Richard Socher
Abstract A significant amount of the world’s knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly reduce the output space of generated queries. Moreover, we use rewards from in-the-loop query execution over the database to learn a policy to generate unordered parts of the query, which we show are less suitable for optimization via cross entropy loss. In addition, we will publish WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia. This dataset is required to train our model and is an order of magnitude larger than comparable datasets. By applying policy-based reinforcement learning with a query execution environment to WikiSQL, our model Seq2SQL outperforms attentional sequence to sequence models, improving execution accuracy from 35.9% to 59.4% and logical form accuracy from 23.4% to 48.3%.
Tasks Text-To-Sql
Published 2017-08-31
URL http://arxiv.org/abs/1709.00103v7
PDF http://arxiv.org/pdf/1709.00103v7.pdf
PWC https://paperswithcode.com/paper/seq2sql-generating-structured-queries-from
Repo https://github.com/llSourcell/SQL_Database_Optimization
Framework pytorch
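
The “rewards from in-the-loop query execution” can be pictured as a small scoring function around a database interface. The sketch below assumes a common three-level reward scheme (unexecutable, wrong result, correct result) and treats `execute` as an injected callable rather than a specific DB API.

```python
def execution_reward(predicted_sql, gold_result, execute):
    """Reward for policy learning: -2 if the query cannot run, -1 if it runs but
    returns the wrong result, +1 if it matches the gold result (assumed scheme)."""
    try:
        result = execute(predicted_sql)
    except Exception:
        return -2.0
    return 1.0 if result == gold_result else -1.0
```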

The Diverse Cohort Selection Problem

Title The Diverse Cohort Selection Problem
Authors Candice Schumann, Samsara N. Counts, Jeffrey S. Foster, John P. Dickerson
Abstract How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent performs costly exploration of a set of arms before selecting a final subset with some combinatorial structure. We generalize a recent CPE algorithm to the setting where arm pulls can have different costs and return different levels of information. We then prove theoretical upper bounds for a general class of arm-pulling strategies in this new setting. We apply our general algorithm to a real-world problem with combinatorial structure: incorporating diversity into university admissions. We take real data from admissions at one of the largest US-based computer science graduate programs and show that a simulation of our algorithm produces a cohort with higher overall utility while spending a budget comparable to that of the current admissions process at that university.
Tasks
Published 2017-09-11
URL http://arxiv.org/abs/1709.03441v5
PDF http://arxiv.org/pdf/1709.03441v5.pdf
PWC https://paperswithcode.com/paper/the-diverse-cohort-selection-problem
Repo https://github.com/principledhiring/SWAP
Framework none

Spatial As Deep: Spatial CNN for Traffic Scene Understanding

Title Spatial As Deep: Spatial CNN for Traffic Scene Understanding
Authors Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNNs have shown a strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships between pixels across rows and columns of an image is not fully explored. These relationships are important for learning semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface. In this paper, we propose Spatial CNN (SCNN), which generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns in a layer. Such an SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance clues, such as traffic lanes, poles, and walls. We apply SCNN to a newly released, very challenging traffic lane detection dataset and to the Cityscapes dataset. The results show that SCNN can learn the spatial relationship for structured output and significantly improves performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) on the lane detection dataset by 8.7% and 4.6%, respectively. Moreover, our SCNN won 1st place on the TuSimple Benchmark Lane Detection Challenge, with an accuracy of 96.53%.
Tasks Lane Detection, Scene Understanding
Published 2017-12-17
URL http://arxiv.org/abs/1712.06080v1
PDF http://arxiv.org/pdf/1712.06080v1.pdf
PWC https://paperswithcode.com/paper/spatial-as-deep-spatial-cnn-for-traffic-scene
Repo https://github.com/cardwing/Codes-for-Lane-Detection
Framework tf
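
A minimal PyTorch sketch of one “slice-by-slice” pass, in which each row of the feature map receives a message from the row above it. The full SCNN applies such passes in four directions (downward, upward, rightward, leftward); the kernel width and residual form here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialConvDown(nn.Module):
    """Top-to-bottom slice-by-slice convolution over a (N, C, H, W) feature map."""
    def __init__(self, channels, kernel_width=9):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_width,
                              padding=kernel_width // 2, bias=False)

    def forward(self, x):
        rows = list(torch.unbind(x, dim=2))            # H slices of shape (N, C, W)
        for i in range(1, len(rows)):
            # each row is updated with a message from the already-updated row above
            rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
        return torch.stack(rows, dim=2)
```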

Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings

Title Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings
Authors Weixin Cai, Nima S. Hejazi, Alan E. Hubbard
Abstract Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands – even millions – of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variables. In analyzing such hypothesis testing procedures, gains in efficiency and power can be achieved by performing variable reduction on the set of hypotheses prior to testing. We present in this paper an approach using data-adaptive multiple testing that serves exactly this purpose. This approach applies data mining techniques to screen the full set of covariates on equally sized partitions of the whole sample via cross-validation. This generalized screening procedure is used to create average ranks for covariates, which are then used to generate a reduced (sub)set of hypotheses, from which we compute test statistics that are subsequently subjected to standard multiple testing corrections. The principal advantage of this methodology lies in providing valid statistical inference without specifying a priori which hypotheses will be tested. Here, we present the theoretical details of this approach, confirm its validity via a simulation study, and exemplify its use by applying it to the analysis of data on microRNA differential expression.
Tasks
Published 2017-04-24
URL http://arxiv.org/abs/1704.07008v1
PDF http://arxiv.org/pdf/1704.07008v1.pdf
PWC https://paperswithcode.com/paper/data-adaptive-statistics-for-multiple
Repo https://github.com/wilsoncai1992/adaptest
Framework none
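
A heavily simplified screen-then-test sketch of the idea above, using a single sample split instead of the paper's cross-validated rank averaging: screen covariates on one half, test only the survivors on the other half, and apply a standard multiple-testing correction. Dimensions and data are toy assumptions.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))   # samples x covariates (e.g. miRNA expression, toy data)
y = rng.integers(0, 2, size=200)   # two-group comparison

screen, test = slice(0, 100), slice(100, 200)
# screen: rank covariates by absolute mean difference on the first half
score = np.abs(X[screen][y[screen] == 1].mean(0) - X[screen][y[screen] == 0].mean(0))
keep = np.argsort(score)[-100:]    # reduced (sub)set of hypotheses

# test: t-tests on the held-out half, then Benjamini-Hochberg correction
_, pvals = stats.ttest_ind(X[test][y[test] == 1][:, keep], X[test][y[test] == 0][:, keep])
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```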

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

Title Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Authors Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
Abstract Despite the steady progress in video analysis led by the adoption of convolutional neural networks (CNNs), the relative improvement has been less drastic than that in 2D static image classification. Three main challenges exist: spatial (image) feature representation, temporal information representation, and model/computation complexity. It was recently shown by Carreira and Zisserman that 3D CNNs, inflated from 2D networks and pretrained on ImageNet, could be a promising way for spatial and temporal representation learning. However, as for model/computation complexity, 3D CNNs are much more expensive than 2D CNNs and prone to overfit. We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices. In particular, we show that it is possible to replace many of the 3D convolutions by low-cost 2D convolutions. Rather surprisingly, the best result (in both speed and accuracy) is achieved when replacing the 3D convolutions at the bottom of the network, suggesting that temporal representation learning on high-level semantic features is more useful. Our conclusion generalizes to datasets with very different properties. When combined with several other cost-effective designs including separable spatial/temporal convolution and feature gating, our system results in an effective video classification system that produces very competitive results on several action classification benchmarks (Kinetics, Something-something, UCF101 and HMDB), as well as two action detection (localization) benchmarks (JHMDB and UCF101-24).
Tasks Action Classification, Action Detection, Action Recognition In Videos, Image Classification, Representation Learning, Video Classification
Published 2017-12-13
URL http://arxiv.org/abs/1712.04851v2
PDF http://arxiv.org/pdf/1712.04851v2.pdf
PWC https://paperswithcode.com/paper/rethinking-spatiotemporal-feature-learning
Repo https://github.com/kylemin/S3D
Framework pytorch
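
One of the “cost-effective designs” named above, separable spatial/temporal convolution, replaces a full 3D convolution with a spatial 1×k×k convolution followed by a temporal k×1×1 convolution. A minimal PyTorch sketch, with the channel handling and ReLU placement assumed:

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Spatial 1xkxk conv followed by a temporal kx1x1 conv (replaces one full 3D conv)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, (1, k, k), padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(out_ch, out_ch, (k, 1, 1), padding=(k // 2, 0, 0))

    def forward(self, x):                      # x: (N, C, T, H, W)
        return self.temporal(torch.relu(self.spatial(x)))
```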

Linear Ensembles of Word Embedding Models

Title Linear Ensembles of Word Embedding Models
Authors Avo Muromägi, Kairit Sirts, Sven Laur
Abstract This paper explores linear methods for combining several word embedding models into an ensemble. We construct the combined models using an iterative method based on either ordinary least squares regression or the solution to the orthogonal Procrustes problem. We evaluate the proposed approaches on Estonian—a morphologically complex language, for which the available corpora for training word embeddings are relatively small. We compare both combined models with each other and with the input word embedding models using synonym and analogy tests. The results show that while using the ordinary least squares regression performs poorly in our experiments, using orthogonal Procrustes to combine several word embedding models into an ensemble model leads to 7-10% relative improvements over the mean result of the initial models in synonym tests and 19-47% in analogy tests.
Tasks Word Embeddings
Published 2017-04-05
URL http://arxiv.org/abs/1704.01419v1
PDF http://arxiv.org/pdf/1704.01419v1.pdf
PWC https://paperswithcode.com/paper/linear-ensembles-of-word-embedding-models
Repo https://github.com/Shujian2015/meta-embedding-paper-list
Framework none
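
The orthogonal Procrustes step has a closed-form solution via an SVD. A small NumPy sketch of aligning two embedding matrices over a shared vocabulary and averaging them; the simple averaging is an illustrative simplification of the paper's iterative ensemble construction, and the data is random toy input.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F: W = U V^T where X^T Y = U S V^T."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 100))   # embeddings from model 1 (shared vocab x dim)
Y = rng.standard_normal((5000, 100))   # embeddings from model 2, rows in the same order
ensemble = (X @ procrustes_align(X, Y) + Y) / 2.0
```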

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

Title Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Authors Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas Angelard-Gontier, Yoshua Bengio, Joelle Pineau
Abstract Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model’s predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system level. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation.
Tasks
Published 2017-08-23
URL http://arxiv.org/abs/1708.07149v2
PDF http://arxiv.org/pdf/1708.07149v2.pdf
PWC https://paperswithcode.com/paper/towards-an-automatic-turing-test-learning-to
Repo https://github.com/mike-n-7/ADEM
Framework none
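
ADEM scores a response by comparing it against both the dialogue context and a reference response in a learned embedding space. The bilinear form below is a sketch of that idea with assumed shapes and parameter names, not the paper's exact parameterization or normalization.

```python
import torch
import torch.nn as nn

class AdemStyleScorer(nn.Module):
    """Score a response from context/reference/response embeddings (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.M = nn.Parameter(torch.eye(dim))   # context-response comparison matrix
        self.N = nn.Parameter(torch.eye(dim))   # reference-response comparison matrix

    def forward(self, context, reference, response):   # each: (batch, dim)
        ctx_term = (context @ self.M * response).sum(dim=1)
        ref_term = (reference @ self.N * response).sum(dim=1)
        return ctx_term + ref_term               # unnormalized quality score
```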

3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network

Title 3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network
Authors Renato Hermoza, Ivan Sipiran
Abstract We introduce a data-driven approach to aid the repairing and conservation of archaeological objects: ORGAN, an object reconstruction generative adversarial network (GAN). By using an encoder-decoder 3D deep neural network on a GAN architecture, and combining two loss objectives, a completion loss and an Improved Wasserstein GAN loss, we can train a network to effectively predict the missing geometry of damaged objects. As archaeological objects can differ greatly from one another, the network is conditioned on a variable, which can be a culture, a region, or any other metadata of the object. In our results, we show that our method can recover most of the information from damaged objects, even in cases where more than half of the voxels are missing, without producing many errors.
Tasks 3D Reconstruction, Object Reconstruction
Published 2017-11-17
URL http://arxiv.org/abs/1711.06363v2
PDF http://arxiv.org/pdf/1711.06363v2.pdf
PWC https://paperswithcode.com/paper/3d-reconstruction-of-incomplete
Repo https://github.com/renato145/3D-ORGAN
Framework none
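
The “two loss objectives” for the generator can be pictured as a voxel-wise completion term plus a Wasserstein adversarial term. In the sketch below, the choice of binary cross-entropy for the completion term and the weighting between the two terms are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_voxels, target_voxels, critic_scores, adv_weight=1e-2):
    """Completion loss on the predicted voxel grid plus a WGAN generator term."""
    completion = F.binary_cross_entropy(pred_voxels, target_voxels)  # pred in [0, 1]
    adversarial = -critic_scores.mean()          # maximize the critic's score of fakes
    return completion + adv_weight * adversarial
```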