Paper Group AWR 18
Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples. Correlation Alignment for Unsupervised Domain Adaptation. Understanding deep learning requires rethinking generalization. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark. Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. Item2Vec: Neural Item Embedding for Collaborative Filtering. Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks. Jointly Extracting Relations with Class Ties via Effective Deep Ranking. A Convex Surrogate Operator for General Non-Modular Loss Functions. Fast Wavenet Generation Algorithm. DCTNet and PCANet for acoustic signal feature extraction. Spectrum Estimation from Samples. Text-based LSTM networks for Automatic Music Composition.
Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples
Title | Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples |
Authors | Basem G. El-Barashy |
Abstract | Current learning algorithms face many difficulties in learning simple patterns and using them to learn more complex ones. They also require more examples than humans do to learn the same pattern, assuming no prior knowledge. In this paper, a new learning framework, called common-description learning (CDL), is introduced. This framework has been tested on 32 small multi-task datasets, and the results show that it was able to learn complex algorithms from a small number of examples. The final model is perfectly interpretable and its depth depends on the question. What is meant by depth here is that whenever needed, the model learns to break down the problem into simpler subproblems and solves them using previously learned models. Finally, we explain the capabilities of our framework in discovering complex relations in data and how it can help in improving language understanding in machines. |
Tasks | |
Published | 2016-05-01 |
URL | http://arxiv.org/abs/1605.00241v1 |
http://arxiv.org/pdf/1605.00241v1.pdf | |
PWC | https://paperswithcode.com/paper/common-description-learning-a-framework-for |
Repo | https://github.com/BasemElbarashy/CDL |
Framework | none |
Correlation Alignment for Unsupervised Domain Adaptation
Title | Correlation Alignment for Unsupervised Domain Adaptation |
Authors | Baochen Sun, Jiashi Feng, Kate Saenko |
Abstract | In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at: https://github.com/VisionLearningGroup/CORAL |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01939v1 |
http://arxiv.org/pdf/1612.01939v1.pdf | |
PWC | https://paperswithcode.com/paper/correlation-alignment-for-unsupervised-domain |
Repo | https://github.com/VisionLearningGroup/CORAL |
Framework | none |
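
The linear CORAL transform described above reduces to two matrix operations: whiten the source features with the inverse square root of their covariance, then re-color them with the square root of the target covariance. A minimal NumPy sketch under these assumptions (the regularization constant and helper names are illustrative, not the authors' released code):

```python
import numpy as np

def _spd_power(C, p):
    """C**p for a symmetric positive-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** p) @ vecs.T

def coral(Xs, Xt, reg=1.0):
    """Align source features Xs (ns x d) to target features Xt (nt x d)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + reg * np.eye(d)   # regularized source covariance
    Ct = np.cov(Xt, rowvar=False) + reg * np.eye(d)   # regularized target covariance
    Xs_white = Xs @ _spd_power(Cs, -0.5)              # whiten source features
    return Xs_white @ _spd_power(Ct, 0.5)             # re-color with target statistics
```

Per the abstract, a classifier is then trained on the transformed source features and applied to the target features unchanged.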
Understanding deep learning requires rethinking generalization
Title | Understanding deep learning requires rethinking generalization |
Authors | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals |
Abstract | Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. |
Tasks | Image Classification |
Published | 2016-11-10 |
URL | http://arxiv.org/abs/1611.03530v2 |
http://arxiv.org/pdf/1611.03530v2.pdf | |
PWC | https://paperswithcode.com/paper/understanding-deep-learning-requires |
Repo | https://github.com/timbrgr/yellow-brick-road-to-MrLd-city |
Framework | tf |
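
The paper's central randomization test is simple to reproduce: corrupt some or all of the training labels and check whether the network still drives training error to zero. A small sketch of the label-corruption step (the function name and partial-corruption option are our own, for illustration):

```python
import numpy as np

def randomize_labels(y, num_classes, fraction=1.0, seed=0):
    """Replace a fraction of the labels with uniformly random classes."""
    rng = np.random.default_rng(seed)
    y = np.array(y, copy=True)
    mask = rng.random(len(y)) < fraction        # which examples to corrupt
    y[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y
```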
Deep CORAL: Correlation Alignment for Deep Domain Adaptation
Title | Deep CORAL: Correlation Alignment for Deep Domain Adaptation |
Authors | Baochen Sun, Kate Saenko |
Abstract | Deep neural networks are able to learn powerful representations from large quantities of labeled input data, however they cannot always generalize well across changes in input distributions. Domain adaptation algorithms have been proposed to compensate for the degradation in performance due to domain shift. In this paper, we address the case when the target domain is unlabeled, requiring unsupervised adaptation. CORAL is a “frustratingly easy” unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation. Here, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (Deep CORAL). Experiments on standard benchmark datasets show state-of-the-art performance. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-07-06 |
URL | http://arxiv.org/abs/1607.01719v1 |
http://arxiv.org/pdf/1607.01719v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-coral-correlation-alignment-for-deep |
Repo | https://github.com/lzx6/deep-coral |
Framework | pytorch |
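
Deep CORAL turns correlation alignment into a differentiable loss between the covariances of source and target activations at a chosen layer, added to the usual classification loss. A PyTorch sketch of that loss term (batch handling is simplified; the 1/(4d^2) scaling follows the paper's formulation as we understand it):

```python
import torch

def coral_loss(source, target):
    """CORAL loss between two (batch, d) activation matrices from the same layer."""
    d = source.size(1)

    def cov(x):
        xm = x - x.mean(dim=0, keepdim=True)    # center the activations
        return xm.t() @ xm / (x.size(0) - 1)    # sample covariance

    return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)
```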
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark
Title | Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark |
Authors | Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang |
Abstract | Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people’s emotional reactions towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up being 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs. |
Tasks | Emotion Recognition |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02677v1 |
http://arxiv.org/pdf/1605.02677v1.pdf | |
PWC | https://paperswithcode.com/paper/building-a-large-scale-dataset-for-image |
Repo | https://github.com/noahj08/DeepConnotation |
Framework | tf |
Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates
Title | Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates |
Authors | Ilija Ilievski, Taimoor Akhtar, Jiashi Feng, Christine Annette Shoemaker |
Abstract | Automatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires many fewer function evaluations. HORD does well in low dimensions and performs even better in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate HORD significantly outperforms the well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters. |
Tasks | Gaussian Processes, Hyperparameter Optimization |
Published | 2016-07-28 |
URL | http://arxiv.org/abs/1607.08316v2 |
http://arxiv.org/pdf/1607.08316v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-hyperparameter-optimization-of-deep |
Repo | https://github.com/jekyllstein/HORDOpt.jl |
Framework | none |
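
The surrogate idea behind HORD can be illustrated with SciPy's RBF interpolator (SciPy >= 1.7): fit it to the hyperparameter/validation-error pairs evaluated so far, then query it cheaply to pick the next configuration to train. The sketch below samples candidates uniformly at random instead of using the paper's dynamic coordinate search, so it is only a toy stand-in for the actual algorithm:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def suggest_next(observed_x, observed_err, bounds, n_candidates=1000, seed=0):
    """Fit an RBF surrogate to (hyperparameters, validation error) pairs and
    return the unevaluated candidate the surrogate predicts to be best."""
    rng = np.random.default_rng(seed)
    surrogate = RBFInterpolator(np.asarray(observed_x), np.asarray(observed_err))
    lo, hi = np.asarray(bounds, dtype=float).T          # bounds: list of (min, max)
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    return candidates[np.argmin(surrogate(candidates))]  # cheapest-to-query minimum
```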
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Title | A Joint Speaker-Listener-Reinforcer Model for Referring Expressions |
Authors | Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg |
Abstract | Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer’s feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets. Project and demo page: https://vision.cs.unc.edu/refer |
Tasks | |
Published | 2016-12-30 |
URL | http://arxiv.org/abs/1612.09542v2 |
http://arxiv.org/pdf/1612.09542v2.pdf | |
PWC | https://paperswithcode.com/paper/a-joint-speaker-listener-reinforcer-model-for |
Repo | https://github.com/mikittt/re-SLR |
Framework | none |
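
The abstract does not spell out the listener's internals; a common way to implement such a comprehension module is a joint expression-object embedding trained with paired hinge losses, which the sketch below illustrates. Treat it as a hypothetical fragment of one module, not the authors' full speaker-listener-reinforcer model:

```python
import torch
import torch.nn.functional as F

def listener_triplet_loss(expr, obj, neg_obj, neg_expr, margin=0.1):
    """Pull a referring expression and its object together in a joint embedding
    space; push mismatched expression/object pairs at least `margin` apart."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    pos = sim(expr, obj)
    return (F.relu(margin + sim(expr, neg_obj) - pos) +
            F.relu(margin + sim(neg_expr, obj) - pos)).mean()
```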
Item2Vec: Neural Item Embedding for Collaborative Filtering
Title | Item2Vec: Neural Item Embedding for Collaborative Filtering |
Authors | Oren Barkan, Noam Koenigstein |
Abstract | Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, the Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistics tasks. In this paper, we show that item-based CF can be cast in the same framework of neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04259v3 |
http://arxiv.org/pdf/1603.04259v3.pdf | |
PWC | https://paperswithcode.com/paper/item2vec-neural-item-embedding-for |
Repo | https://github.com/hyunbool/item2vec_movie_practice |
Framework | none |
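
Because item2vec is SGNS applied to item sets rather than word sequences, an off-the-shelf word2vec implementation covers most of it: feed each user's item set as a "sentence" with a window large enough to span the whole set. A toy sketch with gensim (item IDs and hyperparameters are placeholders; gensim >= 4.x assumed):

```python
from gensim.models import Word2Vec  # assumes gensim >= 4.x

# Each "sentence" is the set of items consumed by one user; a very large
# window makes SGNS treat all items in a set as co-occurring.
baskets = [["item_12", "item_7", "item_93"], ["item_7", "item_5", "item_12"]]
model = Word2Vec(baskets, vector_size=64, window=100, sg=1, negative=15,
                 min_count=1, epochs=20)
print(model.wv.most_similar("item_12", topn=2))  # nearest items in the embedding space
```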
Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks
Title | Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks |
Authors | Faris B. Mismar, Brian L. Evans |
Abstract | We propose a method for downlink coordinated multipoint (DL CoMP) in heterogeneous fifth generation New Radio (NR) networks. The primary contribution of our paper is an algorithm to enhance the trigger of DL CoMP using online machine learning. We use support vector machine (SVM) classifiers to enhance the user downlink throughput in a realistic frequency division duplex network environment. Our simulation results show improvement in both the macro and pico base station downlink throughputs due to the informed triggering of the multiple radio streams as learned by the SVM classifier. |
Tasks | |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08306v6 |
http://arxiv.org/pdf/1608.08306v6.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-in-downlink-coordinated |
Repo | https://github.com/farismismar/DL-CoMP-Machine-Learning |
Framework | none |
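
As a rough illustration of the triggering idea, an SVM can be trained to predict, from per-user link measurements, whether enabling the second radio stream improved throughput. The feature names and numbers below are placeholders, not the paper's simulator outputs:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical link measurements per user: [SINR (dB), normalized CQI].
X = np.array([[12.5, 0.80], [3.1, 0.20], [8.7, 0.55],
              [1.4, 0.10], [15.0, 0.90], [2.2, 0.15]])
y = np.array([1, 0, 1, 0, 1, 0])                 # 1 = enabling CoMP helped throughput
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.predict([[10.0, 0.7]]))                # trigger decision for a new measurement
```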
Jointly Extracting Relations with Class Ties via Effective Deep Ranking
Title | Jointly Extracting Relations with Class Ties via Effective Deep Ranking |
Authors | Hai Ye, Wenhan Chao, Zhunchen Luo, Zhoujun Li |
Abstract | Connections between relations in relation extraction, which we call class ties, are common. In the distantly supervised scenario, one entity tuple may have multiple relation facts. Exploiting class ties between the relations of one entity tuple is promising for distantly supervised relation extraction. However, previous models either fail to model this property effectively or ignore it altogether. In this work, to effectively leverage class ties, we propose to perform joint relation extraction with a unified model that integrates a convolutional neural network (CNN) with a general pairwise ranking framework, in which three novel ranking loss functions are introduced. Additionally, an effective method is presented to relieve the severe class imbalance caused by the NR (not relation) class during model training. Experiments on a widely used dataset show that leveraging class ties enhances extraction and demonstrate the effectiveness of our model in learning class ties. Our model outperforms the baselines significantly, achieving state-of-the-art performance. |
Tasks | Relation Extraction |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07602v4 |
http://arxiv.org/pdf/1612.07602v4.pdf | |
PWC | https://paperswithcode.com/paper/jointly-extracting-relations-with-class-ties |
Repo | https://github.com/oceanypt/DR_RE |
Framework | none |
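
The pairwise ranking component scores all candidate relations for an entity tuple and contrasts a correct relation against a sampled incorrect one. The PyTorch sketch below shows one margin-based ranking term of the kind such models build on; the paper introduces three loss variants of its own, and the exact form and margins here are illustrative:

```python
import torch

def ranking_loss(scores, pos_idx, neg_idx, m_pos=2.5, m_neg=0.5, gamma=2.0):
    """Push the score of a correct relation above m_pos and the score of a
    sampled incorrect relation below -m_neg, with steepness gamma."""
    s_pos, s_neg = scores[pos_idx], scores[neg_idx]   # scores: (num_relations,)
    return (torch.log1p(torch.exp(gamma * (m_pos - s_pos))) +
            torch.log1p(torch.exp(gamma * (m_neg + s_neg))))
```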
A Convex Surrogate Operator for General Non-Modular Loss Functions
Title | A Convex Surrogate Operator for General Non-Modular Loss Functions |
Authors | Jiaqian Yu, Matthew Blaschko |
Abstract | Empirical risk minimization frequently employs convex surrogates to underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular loss functions, submodular functions, or supermodular functions separately while maintaining polynomial time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition whose existence and uniqueness are proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack-rescaling and the Lovász hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, and an extension of the loss function, and that its subgradient can be computed in polynomial time. Empirical results are reported on a non-submodular loss based on the Sørensen-Dice difference function, and a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate. |
Tasks | |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03373v1 |
http://arxiv.org/pdf/1604.03373v1.pdf | |
PWC | https://paperswithcode.com/paper/a-convex-surrogate-operator-for-general-non |
Repo | https://github.com/yjq8812/aistats2016 |
Framework | none |
Fast Wavenet Generation Algorithm
Title | Fast Wavenet Generation Algorithm |
Authors | Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang |
Abstract | This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advantages of our fast implementation over a naive one. While this method is presented for Wavenet, the same scheme can be applied anytime one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available. |
Tasks | |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09482v1 |
http://arxiv.org/pdf/1611.09482v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-wavenet-generation-algorithm |
Repo | https://github.com/tomlepaine/fast-wavenet |
Framework | tf |
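
The caching scheme is easy to picture: each dilated layer keeps a FIFO queue whose length equals its dilation, so generating one sample touches every layer once (O(L)) instead of recomputing the whole receptive field (O(2^L)). A toy scalar sketch of one cached layer (real layers use vector-valued states and gated activations):

```python
from collections import deque
import numpy as np

class CachedDilatedLayer:
    """Toy scalar version of the caching idea: the queue stores the last
    `dilation` inputs so the value from `dilation` steps back is reused,
    not recomputed, at every generation step."""
    def __init__(self, w_past, w_cur, dilation):
        self.w_past, self.w_cur = w_past, w_cur          # two taps of a width-2 dilated conv
        self.queue = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x):
        past = self.queue[0]                             # input from `dilation` steps ago (cached)
        self.queue.append(x)                             # cache current input for future steps
        return np.tanh(self.w_past * past + self.w_cur * x)
```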
DCTNet and PCANet for acoustic signal feature extraction
Title | DCTNet and PCANet for acoustic signal feature extraction |
Authors | Yin Xian, Andrew Thompson, Xiaobai Sun, Douglas Nowacek, Loren Nolte |
Abstract | We introduce the use of DCTNet, an efficient approximation and alternative to PCANet, for acoustic signal classification. In PCANet, the eigenfunctions of the local sample covariance matrix (PCA) are used as filterbanks for convolution and feature extraction. When the eigenfunctions are well approximated by Discrete Cosine Transform (DCT) functions, each layer of PCANet and DCTNet is essentially a time-frequency representation. We relate DCTNet to spectral feature representation methods, such as the short-time Fourier transform (STFT), the spectrogram, and linear frequency spectral coefficients (LFSC). Experimental results on whale vocalization data show that DCTNet improves the classification rate, demonstrating DCTNet’s applicability to signal processing problems such as underwater acoustics. |
Tasks | |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1605.01755v1 |
http://arxiv.org/pdf/1605.01755v1.pdf | |
PWC | https://paperswithcode.com/paper/dctnet-and-pcanet-for-acoustic-signal-feature |
Repo | https://github.com/poline3939/DCTNet |
Framework | none |
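
Since DCTNet swaps PCA eigenfilters for fixed DCT basis functions, a single layer amounts to convolving the signal with a DCT filterbank and taking magnitudes. A one-layer NumPy sketch (filter length and count are illustrative):

```python
import numpy as np
from scipy.signal import convolve

def dct_filterbank(filter_len=64, n_filters=16):
    """First n_filters DCT-II basis functions of length filter_len."""
    n = np.arange(filter_len)
    return np.stack([np.cos(np.pi * k * (2 * n + 1) / (2 * filter_len))
                     for k in range(n_filters)])

def dctnet_layer(signal, filter_len=64, n_filters=16):
    """Convolve a 1-D signal with each DCT filter; the magnitudes form a
    time-frequency representation (rows = filters, columns = time)."""
    filters = dct_filterbank(filter_len, n_filters)
    return np.abs(np.stack([convolve(signal, f, mode="same") for f in filters]))
```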
Spectrum Estimation from Samples
Title | Spectrum Estimation from Samples |
Authors | Weihao Kong, Gregory Valiant |
Abstract | We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. The eigenvalues of the covariance of a distribution contain basic information about the distribution, including the presence or lack of structure in the distribution, the effective dimensionality of the distribution, and the applicability of higher-level machine learning and multivariate statistical tools. We consider this fundamental recovery problem in the regime where the number of samples is comparable to, or even sublinear in, the dimensionality of the distribution in question. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to show that the vector of eigenvalues can be accurately recovered. We provide finite-sample bounds on the expected error of the recovered eigenvalues, which imply that our estimator is asymptotically consistent as the dimensionality of the distribution and sample size tend towards infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes. |
Tasks | |
Published | 2016-01-30 |
URL | http://arxiv.org/abs/1602.00061v5 |
http://arxiv.org/pdf/1602.00061v5.pdf | |
PWC | https://paperswithcode.com/paper/spectrum-estimation-from-samples |
Repo | https://github.com/harinath001/compbio-project |
Framework | none |
Text-based LSTM networks for Automatic Music Composition
Title | Text-based LSTM networks for Automatic Music Composition |
Authors | Keunwoo Choi, George Fazekas, Mark Sandler |
Abstract | In this paper, we introduce new methods and discuss results of text-based LSTM (Long Short-Term Memory) networks for automatic music composition. The proposed network is designed to learn relationships within text documents that represent chord progressions and drum tracks in two case studies. In the experiments, word-RNNs (Recurrent Neural Networks) show good results for both cases, while character-based RNNs (char-RNNs) only succeed in learning chord progressions. The proposed system can be used for fully automatic composition or as a semi-automatic system that helps humans compose music by controlling a diversity parameter of the model. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05358v1 |
http://arxiv.org/pdf/1604.05358v1.pdf | |
PWC | https://paperswithcode.com/paper/text-based-lstm-networks-for-automatic-music |
Repo | https://github.com/keunwoochoi/lstm_real_book |
Framework | none |
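
A character-level variant of this setup can be sketched in a few lines of Keras: embed each character of the text-encoded chord progression, run an LSTM, and predict the next character. The vocabulary size, sequence length, and layer sizes below are assumptions, not the authors' configuration:

```python
from tensorflow import keras

vocab_size, seq_len = 40, 128        # assumed character vocabulary and context length
model = keras.Sequential([
    keras.Input(shape=(seq_len,), dtype="int32"),       # integer-encoded characters
    keras.layers.Embedding(vocab_size, 64),
    keras.layers.LSTM(256),
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-character distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```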