Paper Group AWR 127
The StreetLearn Environment and Dataset
Title | The StreetLearn Environment and Dataset |
Authors | Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Denis Teplyashin, Karl Moritz Hermann, Mateusz Malinowski, Matthew Koichi Grimes, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell |
Abstract | Navigation is a rich and well-grounded problem domain that drives progress in many different areas of research: perception, planning, memory, exploration, and optimisation in particular. Historically these challenges have been separately considered and solutions built that rely on stationary datasets - for example, recorded trajectories through an environment. These datasets cannot be used for decision-making and reinforcement learning, however, and in general the perspective of navigation as an interactive learning task, where the actions and behaviours of a learning agent are learned simultaneously with the perception and planning, is relatively unsupported. Thus, existing navigation benchmarks generally rely on static datasets (Geiger et al., 2013; Kendall et al., 2015) or simulators (Beattie et al., 2016; Shah et al., 2018). To support and validate research in end-to-end navigation, we present StreetLearn: an interactive, first-person, partially-observed visual environment that uses Google Street View for its photographic content and broad coverage, and give performance baselines for a challenging goal-driven navigation task. The environment code, baseline agent code, and the dataset are available at http://streetlearn.cc |
Tasks | Decision Making |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01292v1 |
PDF | http://arxiv.org/pdf/1903.01292v1.pdf |
PWC | https://paperswithcode.com/paper/the-streetlearn-environment-and-dataset |
Repo | https://github.com/deepmind/streetlearn |
Framework | tf |
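The environment described above amounts to goal-driven navigation over a graph of Street View panoramas. The toy sketch below mimics that setting with a hand-written adjacency dict, a random agent, and a sparse goal reward; the panorama ids and the `run_episode` helper are hypothetical, and this is not the StreetLearn API.

```python
import random

# Toy panorama graph: node -> list of neighbouring panorama ids (hypothetical ids).
PANO_GRAPH = {
    "pano_a": ["pano_b"],
    "pano_b": ["pano_a", "pano_c", "pano_d"],
    "pano_c": ["pano_b", "pano_e"],
    "pano_d": ["pano_b", "pano_e"],
    "pano_e": ["pano_c", "pano_d"],
}

def run_episode(start, goal, max_steps=20, seed=0):
    """Random-walk agent on the panorama graph; reward 1 only at the goal node."""
    rng = random.Random(seed)
    node, total_reward = start, 0.0
    for t in range(max_steps):
        if node == goal:            # sparse, goal-driven reward, courier-style
            total_reward += 1.0
            break
        node = rng.choice(PANO_GRAPH[node])   # move to one of the reachable panoramas
    return total_reward, t

if __name__ == "__main__":
    reward, steps = run_episode("pano_a", "pano_e")
    print(f"reward={reward}, steps={steps}")
```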
Variational Inference with Numerical Derivatives: variance reduction through coupling
Title | Variational Inference with Numerical Derivatives: variance reduction through coupling |
Authors | Alexander Immer, Guillaume P. Dehaene |
Abstract | The Black Box Variational Inference (Ranganath et al. (2014)) algorithm provides a universal method for Variational Inference, but taking advantage of special properties of the approximation family or of the target can improve the convergence speed significantly. For example, if the approximation family is a transformation family, such as a Gaussian, then switching to the reparameterization gradient (Kingma and Welling (2014)) often yields a major reduction in gradient variance. Ultimately, reducing the variance can reduce the computational cost and yield better approximations. We present a new method to extend the reparameterization trick to more general exponential families including the Wishart, Gamma, and Student distributions. Variational Inference with Numerical Derivatives (VIND) approximates the gradient with numerical derivatives and reduces its variance using a tight coupling of the approximation family. The resulting algorithm is simple to implement and can profit from widely known couplings. Our experiments confirm that VIND effectively decreases the gradient variance and therefore improves the posterior approximation in relevant cases. It thus provides an efficient yet simple Variational Inference method for computing non-Gaussian approximations. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06914v1 |
PDF | https://arxiv.org/pdf/1906.06914v1.pdf |
PWC | https://paperswithcode.com/paper/variational-inference-with-numerical |
Repo | https://github.com/AlexImmer/VIND |
Framework | pytorch |
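The core trick, numerical (finite-difference) derivatives whose variance is tamed by coupling the samples drawn at θ and at θ+ε, can be illustrated in isolation. The sketch below couples two Gamma samples through a shared uniform variate (inverse-CDF coupling) and compares the estimator's spread with and without coupling; it is a toy illustration of the coupling idea, not the VIND algorithm itself.

```python
import numpy as np
from scipy.stats import gamma

def fd_gradient(theta, eps=1e-3, n=2000, coupled=True, seed=0):
    """Finite-difference estimate of d/dtheta E_{x~Gamma(theta,1)}[x**2]."""
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(size=n)
    # Coupled: reuse the same uniforms, so x(theta) and x(theta + eps) move together.
    u2 = u1 if coupled else rng.uniform(size=n)
    x1 = gamma.ppf(u1, a=theta)          # inverse-CDF sampling at theta
    x2 = gamma.ppf(u2, a=theta + eps)    # ... and at the perturbed parameter
    return np.mean((x2**2 - x1**2) / eps)

theta = 2.0
exact = 2 * theta + 1                    # for Gamma(theta,1): E[x^2] = theta(theta+1)
for coupled in (True, False):
    ests = [fd_gradient(theta, coupled=coupled, seed=s) for s in range(50)]
    print(f"coupled={coupled}: mean={np.mean(ests):.2f} (exact {exact}), std={np.std(ests):.2f}")
```

The coupled estimator behaves like a proper numerical derivative, while the uncoupled one inherits the full Monte Carlo variance of both expectations divided by ε.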
3D Face Modeling From Diverse Raw Scan Data
Title | 3D Face Modeling From Diverse Raw Scan Data |
Authors | Feng Liu, Luan Tran, Xiaoming Liu |
Abstract | Traditional 3D face models learn a latent representation of faces using linear subspaces from limited scans of a single database. The main roadblock of building a large-scale face model from diverse 3D databases lies in the lack of dense correspondence among raw scans. To address this problem, this paper proposes an innovative framework to jointly learn a nonlinear face model from a diverse set of raw 3D scan databases and establish dense point-to-point correspondence among their scans. Specifically, by treating input scans as unorganized point clouds, we explore the use of PointNet architectures for converting point clouds to identity and expression feature representations, from which the decoder networks recover their 3D face shapes. Further, we propose a weakly supervised learning approach that does not require correspondence labels for the scans. We demonstrate the superior dense correspondence and representation power of our proposed method, and its contribution to single-image 3D face reconstruction. |
Tasks | 3D Face Reconstruction, Face Reconstruction |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.04943v3 |
PDF | https://arxiv.org/pdf/1902.04943v3.pdf |
PWC | https://paperswithcode.com/paper/3d-face-modeling-from-diverse-raw-scan-data |
Repo | https://github.com/liuf1990/3DFC |
Framework | none |
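A minimal sketch of the encoder-decoder split the abstract describes: a PointNet-style shared per-point MLP with max-pooling that yields separate identity and expression codes, and a decoder that maps the codes back to a fixed set of vertices. Layer sizes, code dimensions, and the decoder design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Shared per-point MLP + max-pooling, split into identity / expression codes."""
    def __init__(self, id_dim=128, exp_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.fc_id = nn.Linear(256, id_dim)
        self.fc_exp = nn.Linear(256, exp_dim)

    def forward(self, pts):                    # pts: (B, N, 3) unorganized point cloud
        feat = self.mlp(pts.transpose(1, 2))   # (B, 256, N)
        global_feat = feat.max(dim=2).values   # order-invariant pooling over points
        return self.fc_id(global_feat), self.fc_exp(global_feat)

class ShapeDecoder(nn.Module):
    """Decodes identity + expression codes to a fixed-topology face shape (n_verts x 3)."""
    def __init__(self, id_dim=128, exp_dim=64, n_verts=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(id_dim + exp_dim, 512), nn.ReLU(),
            nn.Linear(512, n_verts * 3),
        )
        self.n_verts = n_verts

    def forward(self, z_id, z_exp):
        out = self.net(torch.cat([z_id, z_exp], dim=1))
        return out.view(-1, self.n_verts, 3)

scan = torch.randn(2, 5000, 3)               # two raw scans with 5000 points each
z_id, z_exp = PointEncoder()(scan)
print(ShapeDecoder()(z_id, z_exp).shape)     # torch.Size([2, 1024, 3])
```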
Reprogrammable Electro-Optic Nonlinear Activation Functions for Optical Neural Networks
Title | Reprogrammable Electro-Optic Nonlinear Activation Functions for Optical Neural Networks |
Authors | Ian A. D. Williamson, Tyler W. Hughes, Momchil Minkov, Ben Bartlett, Sunil Pai, Shanhui Fan |
Abstract | We introduce an electro-optic hardware platform for nonlinear activation functions in optical neural networks. The optical-to-optical nonlinearity operates by converting a small portion of the input optical signal into an analog electric signal, which is used to intensity-modulate the original optical signal with no reduction in processing speed. Our scheme allows for complete nonlinear on-off contrast in transmission at relatively low optical power thresholds and eliminates the requirement of having additional optical sources between each layer of the network. Moreover, the activation function is reconfigurable via electrical bias, allowing it to be programmed or trained to synthesize a variety of nonlinear responses. Using numerical simulations, we demonstrate that this activation function significantly improves the expressiveness of optical neural networks, allowing them to perform well on two benchmark machine learning tasks: learning a multi-input exclusive-OR (XOR) logic function and classification of images of handwritten numbers from the MNIST dataset. The addition of the nonlinear activation function improves test accuracy on the MNIST task from 85% to 94%. |
Tasks | |
Published | 2019-03-12 |
URL | https://arxiv.org/abs/1903.04579v2 |
PDF | https://arxiv.org/pdf/1903.04579v2.pdf |
PWC | https://paperswithcode.com/paper/reprogrammable-electro-optic-nonlinear |
Repo | https://github.com/fancompute/neuroptica |
Framework | none |
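A simplified sketch of an activation of the kind described above: a fraction of the optical power is tapped off, photodetected, and the resulting signal drives a phase shift that intensity-modulates the remaining field. The constants and the exact transfer function below are assumptions chosen for illustration and may differ from the paper's device model.

```python
import numpy as np

def electro_optic_activation(z, alpha=0.1, gain=np.pi, phase_bias=np.pi / 4):
    """Simplified electro-optic nonlinearity on a complex optical amplitude z.

    A fraction `alpha` of the power is photodetected; the resulting voltage
    shifts the phase of an interferometer arm that modulates the remaining
    sqrt(1 - alpha) portion of the field.
    """
    detected_power = alpha * np.abs(z) ** 2        # photocurrent ~ tapped optical power
    phi = phase_bias + gain * detected_power       # electrically induced phase shift
    # Interferometric intensity modulation of the un-tapped portion of the field.
    return np.sqrt(1 - alpha) * np.cos(phi / 2) * np.exp(-1j * phi / 2) * z

z = np.linspace(0, 2, 100) * np.exp(1j * 0.3)      # ramp of input amplitudes
out = electro_optic_activation(z)
print(np.abs(out[::25]))                            # nonlinear, saturating transmission
```

Changing `phase_bias` reprograms the shape of the response, which is the reconfigurability the abstract refers to.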
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Title | Adaptive Gradient Methods with Dynamic Bound of Learning Rate |
Authors | Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun |
Abstract | Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with SGD or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue, but they failed to achieve considerable improvement over existing methods. In our paper, we demonstrate that extreme learning rates can lead to poor performance. We provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD and give a theoretical proof of convergence. We further conduct experiments on various popular tasks and models, an evaluation that is often insufficient in previous work. Experimental results show that the new variants can eliminate the generalization gap between adaptive methods and SGD while maintaining a higher learning speed early in training. Moreover, they can bring significant improvement over their prototypes, especially on complex deep networks. The implementation of the algorithm can be found at https://github.com/Luolc/AdaBound . |
Tasks | |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09843v1 |
PDF | http://arxiv.org/pdf/1902.09843v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-gradient-methods-with-dynamic-bound |
Repo | https://github.com/201419/Optimizer-PyTorch |
Framework | pytorch |
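The mechanism behind AdaBound is a per-element clipping of Adam's step size between bounds that tighten toward a final SGD-like rate. A rough NumPy sketch of one update step, with bound schedules of the general shape described in the paper (the specific constants are assumptions):

```python
import numpy as np

def adabound_step(param, grad, m, v, t, lr=1e-3, final_lr=0.1,
                  beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """One AdaBound-style update: Adam moments plus a clipped per-element step size."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected base step size, as in Adam.
    step = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Dynamic bounds that start wide and converge to final_lr as t grows.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    eta = np.clip(step / (np.sqrt(v) + eps), lower, upper)   # element-wise clipping
    return param - eta * m, m, v

# Toy run on f(x) = x^2 (minimum at 0).
x, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    x, m, v = adabound_step(x, 2 * x, m, v, t)
print(x)   # should approach 0
```

Early in training the bounds are loose and the step behaves like Adam; as `t` grows, both bounds squeeze toward `final_lr` and the update approaches SGD with momentum.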
Unsupervised learning of action classes with continuous temporal embedding
Title | Unsupervised learning of action classes with continuous temporal embedding |
Authors | Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall |
Abstract | The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently. One problem in this context arises from the need to define and label action boundaries to create annotations for training, which is very time- and cost-intensive. To address this issue, we propose an unsupervised approach for learning action classes from untrimmed video sequences. To this end, we use a continuous temporal embedding of framewise features to benefit from the sequential nature of activities. Based on the latent space created by the embedding, we identify clusters of temporal segments across all videos that correspond to semantically meaningful action classes. The approach is evaluated on three challenging datasets, namely the Breakfast dataset, YouTube Instructions, and the 50Salads dataset. While previous works assumed that the videos contain the same high-level activity, we furthermore show that the proposed approach can also be applied to a more general setting where the content of the videos is unknown. |
Tasks | |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04189v1 |
PDF | http://arxiv.org/pdf/1904.04189v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-action-classes-with |
Repo | https://github.com/annusha/unsup_temp_embed |
Framework | none |
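The pipeline sketched in the abstract, embed framewise features so they predict their relative timestamp, then cluster the embedded frames and order clusters by time, fits in a few lines. The toy below runs on synthetic features; the tiny MLP, k-means in place of a GMM, and the missing Viterbi decoding are simplifications.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Synthetic "video": 300 frames, 64-dim features, with 3 latent stages.
T, D, K = 300, 64, 3
feats = np.concatenate([np.random.randn(100, D) + 3 * k for k in range(K)]).astype(np.float32)
rel_time = (np.arange(T) / (T - 1)).astype(np.float32)   # relative timestamp in [0, 1]

# Continuous temporal embedding: a small MLP trained to regress the timestamp.
embed = nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, 16))
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-2)
x, y = torch.from_numpy(feats), torch.from_numpy(rel_time).unsqueeze(1)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(embed(x)), y)
    loss.backward()
    opt.step()

# Cluster frames in the learned latent space and order clusters by mean timestamp.
z = embed(x).detach().numpy()
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(z)
order = np.argsort([rel_time[labels == k].mean() for k in range(K)])
print("temporal order of discovered action clusters:", order)
```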
Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models
Title | Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models |
Authors | Hongru Zhu, Peng Tang, Jeongho Park, Soojin Park, Alan Yuille |
Abstract | Most objects in the visual world are partially occluded, but humans can recognize them without difficulty. However, it remains unknown whether object recognition models like convolutional neural networks (CNNs) can handle real-world occlusion. It is also a question whether efforts to make these models robust to constant mask occlusion are effective for real-world occlusion. We test both humans and the above-mentioned computational models in a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds. Our results show that human vision is very robust to extreme occlusion while CNNs are not, even with modifications to handle constant mask occlusion. This implies that the ability to handle constant mask occlusion does not entail robustness to real-world occlusion. As a comparison, we propose another computational model that utilizes object parts/subparts in a compositional manner to build robustness to occlusion. This performs significantly better than CNN-based models on our task with error patterns similar to humans. These findings suggest that testing under extreme occlusion can better reveal the robustness of visual recognition, and that the principle of composition can encourage such robustness. |
Tasks | Object Recognition |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.04598v2 |
PDF | https://arxiv.org/pdf/1905.04598v2.pdf |
PWC | https://paperswithcode.com/paper/robustness-of-object-recognition-under |
Repo | https://github.com/mattfeng/recurrent-extreme-occlusion |
Framework | none |
Cascading Convolutional Color Constancy
Title | Cascading Convolutional Color Constancy |
Authors | Huanglin Yu, Ke Chen, Kaiqi Wang, Yanlin Qian, Zhaoxiang Zhang, Kui Jia |
Abstract | Regressing the illumination of a scene from the representations of object appearances is popularly adopted in computational color constancy. However, it is still challenging due to intrinsic appearance and label ambiguities caused by unknown illuminants, diverse reflection properties of materials, and extrinsic imaging factors (such as different camera sensors). In this paper, we introduce a novel algorithm by Cascading Convolutional Color Constancy (in short, C4) to improve robustness of regression learning and achieve stable generalization capability across datasets (different cameras and scenes) in a unique framework. The proposed C4 method ensembles a series of dependent illumination hypotheses from each cascade stage by introducing a weighted multiply-accumulate loss function, which can inherently capture different modes of illuminations and explicitly enforce coarse-to-fine network optimization. Experimental results on the public Color Checker and NUS 8-Camera benchmarks demonstrate the superior performance of the proposed algorithm in comparison with the state-of-the-art methods, especially for more difficult scenes. |
Tasks | Color Constancy |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11180v1 |
PDF | https://arxiv.org/pdf/1912.11180v1.pdf |
PWC | https://paperswithcode.com/paper/cascading-convolutional-color-constancy |
Repo | https://github.com/yhlscut/C4 |
Framework | pytorch |
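The cascade can be written down compactly: each stage re-estimates the illuminant from the image corrected by the accumulated (multiplied) estimates of the earlier stages, and every stage contributes a weighted angular-error term to the loss. The PyTorch toy below uses a made-up per-stage network and arbitrary stage weights, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminantNet(nn.Module):
    """Tiny per-stage regressor: image -> RGB illuminant estimate (toy stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3))
    def forward(self, img):
        return F.relu(self.net(img)) + 1e-4            # keep the illuminant positive

def angular_error(pred, gt):
    cos = F.cosine_similarity(pred, gt, dim=1).clamp(-1 + 1e-6, 1 - 1e-6)
    return torch.acos(cos).mean()

def cascade_loss(img, gt_illum, stages, weights=(0.3, 0.3, 1.0)):
    """Multiply-accumulate the per-stage estimates; weight later stages more heavily."""
    est = torch.ones_like(gt_illum)
    loss = 0.0
    for stage, w in zip(stages, weights):
        corrected = img / est.view(-1, 3, 1, 1)        # remove the illuminant found so far
        est = est * stage(corrected)                   # accumulate (multiply) the new estimate
        loss = loss + w * angular_error(est, gt_illum)
    return loss

stages = nn.ModuleList([IlluminantNet() for _ in range(3)])
img, gt = torch.rand(4, 3, 64, 64) + 0.1, torch.rand(4, 3) + 0.1
print(cascade_loss(img, gt, stages))
```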
A Generalized Language Model in Tensor Space
Title | A Generalized Language Model in Tensor Space |
Authors | Lipeng Zhang, Peng Zhang, Xindian Ma, Shuqin Gu, Zhan Su, Dawei Song |
Abstract | In the literature, tensors have been effectively used for capturing the context information in language models. However, the existing methods usually adopt relatively-low order tensors, which have limited expressive power in modeling language. Developing a higher-order tensor representation is challenging, in terms of deriving an effective solution and showing its generality. In this paper, we propose a language model named Tensor Space Language Model (TSLM), by utilizing tensor networks and tensor decomposition. In TSLM, we build a high-dimensional semantic space constructed by the tensor product of word vectors. Theoretically, we prove that such tensor representation is a generalization of the n-gram language model. We further show that this high-order tensor representation can be decomposed to a recursive calculation of conditional probability for language modeling. The experimental results on Penn Tree Bank (PTB) dataset and WikiText benchmark demonstrate the effectiveness of TSLM. |
Tasks | Language Modelling, Tensor Networks |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11167v1 |
PDF | http://arxiv.org/pdf/1901.11167v1.pdf |
PWC | https://paperswithcode.com/paper/a-generalized-language-model-in-tensor-space |
Repo | https://github.com/TJUIRLAB/AAAI19-TSLM |
Framework | pytorch |
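The claim that the tensor-space representation generalizes the n-gram model has a compact special case: with one-hot word vectors, summing the tensor (outer) products of adjacent words yields exactly the bigram count table, and row-normalizing it gives the bigram conditional probabilities. The sketch below shows only that special case; it is not the full TSLM with tensor decomposition.

```python
import numpy as np

corpus = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(corpus))
V = len(vocab)
idx = {w: i for i, w in enumerate(vocab)}

def one_hot(w):
    e = np.zeros(V)
    e[idx[w]] = 1.0
    return e

# Sum of tensor (outer) products of adjacent word vectors.
# With one-hot vectors this is exactly the bigram count matrix.
T = np.zeros((V, V))
for prev, nxt in zip(corpus[:-1], corpus[1:]):
    T += np.outer(one_hot(prev), one_hot(nxt))

# Row-normalizing recovers the bigram conditional probabilities.
P = T / T.sum(axis=1, keepdims=True).clip(min=1)
print("p(cat | the) =", P[idx["the"], idx["cat"]])   # 2 of the 3 'the' tokens precede 'cat'

# Replacing one-hot vectors with dense word embeddings keeps the same contraction
# but shares statistics across similar words, which is the generalized tensor-space view.
```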
Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning
Title | Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning |
Authors | Xi Shen, Alexei A. Efros, Mathieu Aubry |
Abstract | Our goal in this paper is to discover near-duplicate patterns in large collections of artworks. This is harder than standard instance mining due to differences in the artistic media (oil, pastel, drawing, etc.), and imperfections inherent in the copying process. The key technical insight is to adapt a standard deep feature to this task by fine-tuning it on the specific art collection using self-supervised learning. More specifically, spatial consistency between neighbouring feature matches is used as the supervisory fine-tuning signal. The adapted feature leads to more accurate style-invariant matching, and can be used with a standard discovery approach, based on geometric verification, to identify duplicate patterns in the dataset. The approach is evaluated on several different datasets and shows surprisingly good qualitative discovery results. For quantitative evaluation of the method, we annotated 273 near-duplicate details in a dataset of 1587 artworks attributed to Jan Brueghel and his workshop. Beyond artwork, we also demonstrate improved localization on the Oxford5K photo dataset as well as on historical photograph localization on the Large Time Lags Location (LTLL) dataset. |
Tasks | Image Retrieval |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.02678v2 |
PDF | http://arxiv.org/pdf/1903.02678v2.pdf |
PWC | https://paperswithcode.com/paper/discovering-visual-patterns-in-art |
Repo | https://github.com/XiSHEN0220/ArtMiner |
Framework | pytorch |
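The supervisory signal is spatial consistency: a candidate feature match between two artworks is kept only if its spatial neighbours map to consistent target locations, and the verified matches then fine-tune the feature. The sketch below implements just that consistency check with a cosine nearest-neighbour matcher; the threshold and neighbourhood radius are arbitrary choices, and this is not the released ArtMiner code.

```python
import numpy as np

def nn_match(fa, fb):
    """For every cell of feature map fa (H, W, C), index of its nearest cell in fb."""
    Ha, Wa, C = fa.shape
    Hb, Wb, _ = fb.shape
    a = fa.reshape(-1, C)
    b = fb.reshape(-1, C)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    nn = (a @ b.T).argmax(axis=1)                      # cosine nearest neighbour
    return np.stack([nn // Wb, nn % Wb], axis=1).reshape(Ha, Wa, 2)

def spatially_consistent(matches, y, x, radius=1, tol=1.5):
    """A match is verified if neighbouring matches point to nearby target locations."""
    H, W, _ = matches.shape
    center = matches[y, x]
    votes = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if (dy, dx) == (0, 0) or not (0 <= y + dy < H and 0 <= x + dx < W):
                continue
            expected = center + np.array([dy, dx])     # neighbour should shift consistently
            votes.append(np.linalg.norm(matches[y + dy, x + dx] - expected) <= tol)
    return np.mean(votes) > 0.5                        # majority of neighbours agree

fa, fb = np.random.randn(8, 8, 16), np.random.randn(8, 8, 16)
matches = nn_match(fa, fb)
print(spatially_consistent(matches, 4, 4))             # usually False for random features
```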
MOSNet: Deep Learning based Objective Assessment for Voice Conversion
Title | MOSNet: Deep Learning based Objective Assessment for Voice Conversion |
Authors | Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang |
Abstract | Existing objective evaluation metrics for voice conversion (VC) are not always correlated with human perception. Therefore, training VC models with such criteria may not effectively improve naturalness and similarity of converted speech. In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech. We adopt the convolutional and recurrent neural network models to build a mean opinion score (MOS) predictor, termed as MOSNet. The proposed models are tested on large-scale listening test results of the Voice Conversion Challenge (VCC) 2018. Experimental results show that the predicted scores of the proposed MOSNet are highly correlated with human MOS ratings at the system level while being fairly correlated with human MOS ratings at the utterance level. Meanwhile, we have modified MOSNet to predict the similarity scores, and the preliminary results show that the predicted scores are also fairly correlated with human ratings. These results confirm that the proposed models could be used as a computational evaluator to measure the MOS of VC systems to reduce the need for expensive human rating. |
Tasks | Voice Conversion |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08352v2 |
PDF | http://arxiv.org/pdf/1904.08352v2.pdf |
PWC | https://paperswithcode.com/paper/mosnet-deep-learning-based-objective |
Repo | https://github.com/aliutkus/speechmetrics |
Framework | none |
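A stripped-down sketch of the CNN-plus-BLSTM design the abstract describes, with one predicted score per frame averaged into the utterance-level MOS and a loss that mixes utterance- and frame-level regression; all layer sizes here are made up.

```python
import torch
import torch.nn as nn

class TinyMOSNet(nn.Module):
    def __init__(self, n_bins=257):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.rnn = nn.LSTM(16 * n_bins, 64, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(128, 1)

    def forward(self, spec):                      # spec: (B, T, n_bins) magnitude spectrogram
        B, T, F = spec.shape
        h = self.cnn(spec.unsqueeze(1))           # (B, 16, T, F)
        h = h.permute(0, 2, 1, 3).reshape(B, T, -1)
        h, _ = self.rnn(h)
        frame_scores = self.frame_head(h).squeeze(-1)    # one score per frame
        return frame_scores.mean(dim=1), frame_scores    # utterance MOS = mean frame score

def mos_loss(utt_pred, frame_pred, mos_target, frame_weight=1.0):
    utt = nn.functional.mse_loss(utt_pred, mos_target)
    frame = nn.functional.mse_loss(frame_pred, mos_target.unsqueeze(1).expand_as(frame_pred))
    return utt + frame_weight * frame             # frame loss regularizes the utterance score

spec = torch.randn(2, 120, 257)                   # two converted utterances
mos = torch.tensor([3.5, 2.0])                    # human MOS ratings
utt_pred, frame_pred = TinyMOSNet()(spec)
print(mos_loss(utt_pred, frame_pred, mos))
```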
Neural Message Passing for Multi-Label Classification
Title | Neural Message Passing for Multi-Label Classification |
Authors | Jack Lanchantin, Arshdeep Sekhon, Yanjun Qi |
Abstract | Multi-label classification (MLC) is the task of assigning a set of target labels to a given sample. Modeling the combinatorial label interactions in MLC has been a long-standing challenge. We propose Label Message Passing (LaMP) Neural Networks to efficiently model the joint prediction of multiple labels. LaMP treats labels as nodes on a label-interaction graph and computes the hidden representation of each label node conditioned on the input using attention-based neural message passing. Attention enables LaMP to assign different importance to neighbor nodes per label, learning how labels interact (implicitly). The proposed models are simple, accurate, interpretable, structure-agnostic, and applicable for predicting dense labels since LaMP is incredibly parallelizable. We validate the benefits of LaMP on seven real-world MLC datasets, covering a broad spectrum of input/output types and outperforming the state-of-the-art results. Notably, LaMP enables intuitive interpretation of how classifying each label depends on the elements of a sample and at the same time relies on its interactions with other labels. We provide our code and datasets at https://github.com/QData/LaMP |
Tasks | Multi-Label Classification |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08049v1 |
PDF | http://arxiv.org/pdf/1904.08049v1.pdf |
PWC | https://paperswithcode.com/paper/neural-message-passing-for-multi-label-1 |
Repo | https://github.com/QData/LaMP |
Framework | pytorch |
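A compact sketch of the message-passing pattern: label embeddings act as graph nodes that attend first to the input tokens and then to each other, after which each node makes an independent binary prediction. Dimensions, the single round of message passing, and the missing residual connections are simplifications, not the exact LaMP architecture.

```python
import torch
import torch.nn as nn

class TinyLaMP(nn.Module):
    def __init__(self, vocab=1000, n_labels=20, d=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d)
        self.label_emb = nn.Parameter(torch.randn(n_labels, d))   # one node per label
        self.attn_input = nn.MultiheadAttention(d, 4, batch_first=True)  # labels -> tokens
        self.attn_label = nn.MultiheadAttention(d, 4, batch_first=True)  # labels -> labels
        self.out = nn.Linear(d, 1)

    def forward(self, tokens):                        # tokens: (B, T) integer ids
        x = self.tok_emb(tokens)                      # (B, T, d)
        q = self.label_emb.unsqueeze(0).expand(tokens.size(0), -1, -1)
        # Message passing: each label node attends to the input, then to other labels.
        q, _ = self.attn_input(q, x, x)
        q, _ = self.attn_label(q, q, q)
        return self.out(q).squeeze(-1)                # (B, n_labels) logits, one per label

logits = TinyLaMP()(torch.randint(0, 1000, (3, 50)))
probs = torch.sigmoid(logits)                         # independent binary decision per label
print(probs.shape)                                    # torch.Size([3, 20])
```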
Hyperbolic Interaction Model For Hierarchical Multi-Label Classification
Title | Hyperbolic Interaction Model For Hierarchical Multi-Label Classification |
Authors | Boli Chen, Xin Huang, Lin Xiao, Zixin Cai, Liping Jing |
Abstract | Different from traditional classification tasks, which assume mutual exclusion of labels, hierarchical multi-label classification (HMLC) aims to assign multiple labels to every instance, with the labels organized under hierarchical relations. Besides the labels, since linguistic ontologies are intrinsically hierarchical, the conceptual relations between words can also form hierarchical structures. Thus it can be a challenge to learn mappings from word hierarchies to label hierarchies. We propose to model the word and label hierarchies by embedding them jointly in hyperbolic space. The main reason is that the tree-likeness of the hyperbolic space matches the complexity of symbolic data with hierarchical structures. A new Hyperbolic Interaction Model (HyperIM) is designed to learn the label-aware document representations and make predictions for HMLC. Extensive experiments are conducted on three benchmark datasets. The results have demonstrated that the new model can realistically capture the complex data structures and further improve HMLC performance compared with the state-of-the-art methods. To facilitate future research, our code is publicly available. |
Tasks | Multi-Label Classification |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10802v2 |
PDF | https://arxiv.org/pdf/1905.10802v2.pdf |
PWC | https://paperswithcode.com/paper/hyperbolic-interaction-model-for-hierarchical |
Repo | https://github.com/bcol23/HyperIM |
Framework | pytorch |
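The interaction in HyperIM is driven by distances in the Poincaré ball between word and label embeddings. The sketch below computes the Poincaré distance and a naive label score (negative mean distance over a document's words); this aggregation is an assumption, not the exact HyperIM interaction layer.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance in the Poincare ball: arccosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    num = 2 * np.sum((u - v) ** 2)
    den = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + num / max(den, eps))

rng = np.random.default_rng(0)
word_emb = rng.uniform(-0.3, 0.3, size=(6, 2))    # words of one document, inside the unit ball
label_emb = rng.uniform(-0.3, 0.3, size=(4, 2))   # label-hierarchy nodes, also in the ball

# Naive label score: labels whose embeddings are hyperbolically close to the
# document's words get higher scores.
scores = [-np.mean([poincare_distance(w, l) for w in word_emb]) for l in label_emb]
print(np.argsort(scores)[::-1])                    # labels ranked for this document
```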
Uncover the Ground-Truth Relations in Distant Supervision: A Neural Expectation-Maximization Framework
Title | Uncover the Ground-Truth Relations in Distant Supervision: A Neural Expectation-Maximization Framework |
Authors | Junfan Chen, Richong Zhang, Yongyi Mao, Hongyu Guo, Jie Xu |
Abstract | Distant supervision for relation extraction enables one to effectively acquire structured relations out of very large text corpora with less human effort. Nevertheless, most prior models for such tasks assume that the given text can be noisy, but their corresponding labels are clean. Such an unrealistic assumption contradicts the fact that the given labels are often noisy as well, thus leading to significant performance degradation of those models on real-world data. To cope with this challenge, we propose a novel label-denoising framework that combines neural networks with probabilistic modelling, which naturally takes into account the noisy labels during learning. We empirically demonstrate that our approach significantly improves on the current art in uncovering the ground-truth relation labels. |
Tasks | Denoising, Relation Extraction |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05448v1 |
PDF | https://arxiv.org/pdf/1909.05448v1.pdf |
PWC | https://paperswithcode.com/paper/uncover-the-ground-truth-relations-in-distant |
Repo | https://github.com/AlbertChen1991/nEM |
Framework | pytorch |
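A generic EM-style label-denoising loop in the spirit of the abstract: the E-step forms a posterior over the true relation label from the model's current prediction, the observed noisy label, and a label-noise (transition) matrix; the M-step re-estimates that matrix (and, in the full method, retrains the network on the soft targets). The transition matrix and the toy data below are assumptions, not the exact nEM derivation.

```python
import numpy as np

def e_step(pred_probs, noisy_labels, transition):
    """Posterior over the true label: p(y_true | x, y_noisy) ~ p(y_noisy | y_true) p(y_true | x)."""
    lik = transition[:, noisy_labels].T               # (N, C): p(observed noisy label | each true label)
    post = lik * pred_probs
    return post / post.sum(axis=1, keepdims=True)

def m_step_transition(posteriors, noisy_labels, n_classes):
    """Re-estimate the noise matrix from expected (true, noisy) label co-occurrences."""
    T = np.full((n_classes, n_classes), 1e-6)
    for post, yn in zip(posteriors, noisy_labels):
        T[:, yn] += post
    return T / T.sum(axis=1, keepdims=True)

# Toy setup: 3 relation classes, a model's softmax outputs, and noisy labels.
rng = np.random.default_rng(0)
pred_probs = rng.dirichlet(np.ones(3), size=5)        # stand-in for the neural model's p(y | x)
noisy_labels = np.array([0, 1, 1, 2, 0])
transition = np.full((3, 3), 0.1) + 0.7 * np.eye(3)   # assumed p(noisy | true), rows sum to 1

post = e_step(pred_probs, noisy_labels, transition)    # soft targets for retraining the model
transition = m_step_transition(post, noisy_labels, 3)
print(post.round(2))
```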
Icebreaker: Element-wise Active Information Acquisition with Bayesian Deep Latent Gaussian Model
Title | Icebreaker: Element-wise Active Information Acquisition with Bayesian Deep Latent Gaussian Model |
Authors | Wenbo Gong, Sebastian Tschiatschek, Richard Turner, Sebastian Nowozin, José Miguel Hernández-Lobato, Cheng Zhang |
Abstract | In this paper we introduce the ice-start problem, i.e., the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. This setting is representative of real-world machine learning applications. For instance, in the health-care domain, when training an AI system for predicting patient metrics from lab tests, obtaining every single measurement comes with a high cost. Active learning, where only the label is associated with a cost, does not apply to such a problem, because performing all possible lab tests to acquire a new training datum would be costly, as well as unnecessary due to redundancy. We propose Icebreaker, a principled framework to approach the ice-start problem. Icebreaker uses a full Bayesian Deep Latent Gaussian Model (BELGAM) with a novel inference method. Our proposed method combines recent advances in amortized inference and stochastic gradient MCMC to enable fast and accurate posterior inference. By utilizing BELGAM’s ability to fully quantify model uncertainty, we also propose two information acquisition functions for imputation and active prediction problems. We demonstrate that BELGAM performs significantly better than previous VAE (variational autoencoder) based models when the dataset size is small, using both machine learning benchmarks and real-world recommender systems and health-care applications. Moreover, based on BELGAM, Icebreaker further improves performance and demonstrates the ability to use a minimal amount of training data to obtain the highest test-time performance. |
Tasks | Active Learning, Imputation, Recommendation Systems |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04537v2 |
PDF | https://arxiv.org/pdf/1908.04537v2.pdf |
PWC | https://paperswithcode.com/paper/icebreaker-element-wise-active-information |
Repo | https://github.com/microsoft/Icebreaker |
Framework | pytorch |
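The acquisition side can be sketched with a crude stand-in: score every unobserved feature element by the disagreement (predictive variance) across posterior imputation samples and acquire the highest-scoring one. The real acquisition functions are derived from BELGAM's posterior and differ from this heuristic.

```python
import numpy as np

def acquisition_scores(posterior_samples, observed_mask):
    """Score each missing element of a data matrix by posterior predictive variance.

    posterior_samples: (S, N, D) imputations drawn from S posterior samples of the model.
    observed_mask:     (N, D) boolean, True where the value is already observed.
    """
    variance = posterior_samples.var(axis=0)           # disagreement across posterior samples
    variance[observed_mask] = -np.inf                  # never re-acquire known elements
    return variance

rng = np.random.default_rng(0)
S, N, D = 20, 4, 5
posterior_samples = rng.normal(size=(S, N, D)) * rng.uniform(0.1, 2.0, size=(1, 1, D))
observed_mask = rng.random((N, D)) < 0.3

scores = acquisition_scores(posterior_samples, observed_mask)
row, col = np.unravel_index(np.argmax(scores), scores.shape)
print(f"acquire feature element (row={row}, col={col})")   # most informative element to buy next
```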