Paper Group AWR 18
Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples. Correlation Alignment for Unsupervised Domain Adaptation. Understanding deep learning requires rethinking generalization. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark. Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates. A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. Item2Vec: Neural Item Embedding for Collaborative Filtering. Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks. Jointly Extracting Relations with Class Ties via Effective Deep Ranking. A Convex Surrogate Operator for General Non-Modular Loss Functions. Fast Wavenet Generation Algorithm. DCTNet and PCANet for acoustic signal feature extraction. Spectrum Estimation from Samples. Text-based LSTM networks for Automatic Music Composition.
Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples
Title | Common-Description Learning: A Framework for Learning Algorithms and Generating Subproblems from Few Examples |
Authors | Basem G. El-Barashy |
Abstract | Current learning algorithms face many difficulties in learning simple patterns and using them to learn more complex ones. They also require more examples than humans do to learn the same pattern, assuming no prior knowledge. In this paper, a new learning framework, called common-description learning (CDL), is introduced. This framework has been tested on 32 small multi-task datasets, and the results show that it was able to learn complex algorithms from a small number of examples. The final model is perfectly interpretable and its depth depends on the question. What is meant by depth here is that whenever needed, the model learns to break down the problem into simpler subproblems and solves them using previously learned models. Finally, we explain the capabilities of our framework in discovering complex relations in data and how it can help in improving language understanding in machines. |
Tasks | |
Published | 2016-05-01 |
URL | http://arxiv.org/abs/1605.00241v1 |
http://arxiv.org/pdf/1605.00241v1.pdf | |
PWC | https://paperswithcode.com/paper/common-description-learning-a-framework-for |
Repo | https://github.com/BasemElbarashy/CDL |
Framework | none |
Correlation Alignment for Unsupervised Domain Adaptation
Title | Correlation Alignment for Unsupervised Domain Adaptation |
Authors | Baochen Sun, Jiashi Feng, Kate Saenko |
Abstract | In this chapter, we present CORrelation ALignment (CORAL), a simple yet effective method for unsupervised domain adaptation. CORAL minimizes domain shift by aligning the second-order statistics of source and target distributions, without requiring any target labels. In contrast to subspace manifold methods, it aligns the original feature distributions of the source and target domains, rather than the bases of lower-dimensional subspaces. It is also much simpler than other distribution matching methods. CORAL performs remarkably well in extensive evaluations on standard benchmark datasets. We first describe a solution that applies a linear transformation to source features to align them with target features before classifier training. For linear classifiers, we propose to equivalently apply CORAL to the classifier weights, leading to added efficiency when the number of classifiers is small but the number and dimensionality of target examples are very high. The resulting CORAL Linear Discriminant Analysis (CORAL-LDA) outperforms LDA by a large margin on standard domain adaptation benchmarks. Finally, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (DNNs). The resulting Deep CORAL approach works seamlessly with DNNs and achieves state-of-the-art performance on standard benchmark datasets. Our code is available at: https://github.com/VisionLearningGroup/CORAL |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01939v1 |
http://arxiv.org/pdf/1612.01939v1.pdf | |
PWC | https://paperswithcode.com/paper/correlation-alignment-for-unsupervised-domain |
Repo | https://github.com/VisionLearningGroup/CORAL |
Framework | none |
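
The linear CORAL transform described above reduces to two matrix operations: whiten the source features with the inverse square root of their covariance, then re-color them with the square root of the target covariance. A minimal NumPy sketch under these assumptions (the regularization constant and helper names are illustrative, not the authors' released code):

```python
import numpy as np

def _spd_power(C, p):
    """C**p for a symmetric positive-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** p) @ vecs.T

def coral(Xs, Xt, reg=1.0):
    """Align source features Xs (ns x d) to target features Xt (nt x d)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + reg * np.eye(d)   # regularized source covariance
    Ct = np.cov(Xt, rowvar=False) + reg * np.eye(d)   # regularized target covariance
    Xs_white = Xs @ _spd_power(Cs, -0.5)              # whiten source features
    return Xs_white @ _spd_power(Ct, 0.5)             # re-color with target statistics
```

Per the abstract, a classifier is then trained on the transformed source features and applied to the target features unchanged.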
Understanding deep learning requires rethinking generalization
Title | Understanding deep learning requires rethinking generalization |
Authors | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals |
Abstract | Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. |
Tasks | Image Classification |
Published | 2016-11-10 |
URL | http://arxiv.org/abs/1611.03530v2 |
http://arxiv.org/pdf/1611.03530v2.pdf | |
PWC | https://paperswithcode.com/paper/understanding-deep-learning-requires |
Repo | https://github.com/timbrgr/yellow-brick-road-to-MrLd-city |
Framework | tf |
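
The paper's central randomization test is simple to reproduce: corrupt some or all of the training labels and check whether the network still drives training error to zero. A small sketch of the label-corruption step (the function name and partial-corruption option are our own, for illustration):

```python
import numpy as np

def randomize_labels(y, num_classes, fraction=1.0, seed=0):
    """Replace a fraction of the labels with uniformly random classes."""
    rng = np.random.default_rng(seed)
    y = np.array(y, copy=True)
    mask = rng.random(len(y)) < fraction        # which examples to corrupt
    y[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y
```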
Deep CORAL: Correlation Alignment for Deep Domain Adaptation
Title | Deep CORAL: Correlation Alignment for Deep Domain Adaptation |
Authors | Baochen Sun, Kate Saenko |
Abstract | Deep neural networks are able to learn powerful representations from large quantities of labeled input data, however they cannot always generalize well across changes in input distributions. Domain adaptation algorithms have been proposed to compensate for the degradation in performance due to domain shift. In this paper, we address the case when the target domain is unlabeled, requiring unsupervised adaptation. CORAL is a “frustratingly easy” unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation. Here, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (Deep CORAL). Experiments on standard benchmark datasets show state-of-the-art performance. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-07-06 |
URL | http://arxiv.org/abs/1607.01719v1 |
http://arxiv.org/pdf/1607.01719v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-coral-correlation-alignment-for-deep |
Repo | https://github.com/lzx6/deep-coral |
Framework | pytorch |
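
Deep CORAL turns correlation alignment into a differentiable loss between the covariances of source and target activations at a chosen layer, added to the usual classification loss. A PyTorch sketch of that loss term (batch handling is simplified; the 1/(4d^2) scaling follows the paper's formulation as we understand it):

```python
import torch

def coral_loss(source, target):
    """CORAL loss between two (batch, d) activation matrices from the same layer."""
    d = source.size(1)

    def cov(x):
        xm = x - x.mean(dim=0, keepdim=True)    # center the activations
        return xm.t() @ xm / (x.size(0) - 1)    # sample covariance

    return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)
```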
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark
Title | Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark |
Authors | Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang |
Abstract | Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people’s emotional reactions towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up being 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs. |
Tasks | Emotion Recognition |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02677v1 |
http://arxiv.org/pdf/1605.02677v1.pdf | |
PWC | https://paperswithcode.com/paper/building-a-large-scale-dataset-for-image |
Repo | https://github.com/noahj08/DeepConnotation |
Framework | tf |
Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates
Title | Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates |
Authors | Ilija Ilievski, Taimoor Akhtar, Jiashi Feng, Christine Annette Shoemaker |
Abstract | Automatically searching for optimal hyperparameter configurations is of crucial importance for applying deep learning algorithms in practice. Recently, Bayesian optimization has been proposed for optimizing hyperparameters of various machine learning algorithms. Those methods adopt probabilistic surrogate models like Gaussian processes to approximate and minimize the validation error function of hyperparameter values. However, probabilistic surrogates require accurate estimates of sufficient statistics (e.g., covariance) of the error distribution and thus need many function evaluations with a sizeable number of hyperparameters. This makes them inefficient for optimizing hyperparameters of deep learning algorithms, which are highly expensive to evaluate. In this work, we propose a new deterministic and efficient hyperparameter optimization method that employs radial basis functions as error surrogates. The proposed mixed integer algorithm, called HORD, searches the surrogate for the most promising hyperparameter values through dynamic coordinate search and requires many fewer function evaluations. HORD does well in low dimensions and performs even better in higher dimensions. Extensive evaluations on MNIST and CIFAR-10 for four deep neural networks demonstrate HORD significantly outperforms the well-established Bayesian optimization methods such as GP, SMAC, and TPE. For instance, on average, HORD is more than 6 times faster than GP-EI in obtaining the best configuration of 19 hyperparameters. |
Tasks | Gaussian Processes, Hyperparameter Optimization |
Published | 2016-07-28 |
URL | http://arxiv.org/abs/1607.08316v2 |
http://arxiv.org/pdf/1607.08316v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-hyperparameter-optimization-of-deep |
Repo | https://github.com/jekyllstein/HORDOpt.jl |
Framework | none |
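
The surrogate idea behind HORD can be illustrated with SciPy's RBF interpolator (SciPy >= 1.7): fit it to the hyperparameter/validation-error pairs evaluated so far, then query it cheaply to pick the next configuration to train. The sketch below samples candidates uniformly at random instead of using the paper's dynamic coordinate search, so it is only a toy stand-in for the actual algorithm:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def suggest_next(observed_x, observed_err, bounds, n_candidates=1000, seed=0):
    """Fit an RBF surrogate to (hyperparameters, validation error) pairs and
    return the unevaluated candidate the surrogate predicts to be best."""
    rng = np.random.default_rng(seed)
    surrogate = RBFInterpolator(np.asarray(observed_x), np.asarray(observed_err))
    lo, hi = np.asarray(bounds, dtype=float).T          # bounds: list of (min, max)
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    return candidates[np.argmin(surrogate(candidates))]  # cheapest-to-query minimum
```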
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Title | A Joint Speaker-Listener-Reinforcer Model for Referring Expressions |
Authors | Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg |
Abstract | Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer’s feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets. Project and demo page: https://vision.cs.unc.edu/refer |
Tasks | |
Published | 2016-12-30 |
URL | http://arxiv.org/abs/1612.09542v2 |
http://arxiv.org/pdf/1612.09542v2.pdf | |
PWC | https://paperswithcode.com/paper/a-joint-speaker-listener-reinforcer-model-for |
Repo | https://github.com/mikittt/re-SLR |
Framework | none |
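
The abstract does not spell out the listener's internals; a common way to implement such a comprehension module is a joint expression-object embedding trained with paired hinge losses, which the sketch below illustrates. Treat it as a hypothetical fragment of one module, not the authors' full speaker-listener-reinforcer model:

```python
import torch
import torch.nn.functional as F

def listener_triplet_loss(expr, obj, neg_obj, neg_expr, margin=0.1):
    """Pull a referring expression and its object together in a joint embedding
    space; push mismatched expression/object pairs at least `margin` apart."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    pos = sim(expr, obj)
    return (F.relu(margin + sim(expr, neg_obj) - pos) +
            F.relu(margin + sim(neg_expr, obj) - pos)).mean()
```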
Item2Vec: Neural Item Embedding for Collaborative Filtering
Title | Item2Vec: Neural Item Embedding for Collaborative Filtering |
Authors | Oren Barkan, Noam Koenigstein |
Abstract | Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, the Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistics tasks. In this paper, we show that item-based CF can be cast in the same framework of neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD. |
Tasks | |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04259v3 |
http://arxiv.org/pdf/1603.04259v3.pdf | |
PWC | https://paperswithcode.com/paper/item2vec-neural-item-embedding-for |
Repo | https://github.com/hyunbool/item2vec_movie_practice |
Framework | none |
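
Because item2vec is SGNS applied to item sets rather than word sequences, an off-the-shelf word2vec implementation covers most of it: feed each user's item set as a "sentence" with a window large enough to span the whole set. A toy sketch with gensim (item IDs and hyperparameters are placeholders; gensim >= 4.x assumed):

```python
from gensim.models import Word2Vec  # assumes gensim >= 4.x

# Each "sentence" is the set of items consumed by one user; a very large
# window makes SGNS treat all items in a set as co-occurring.
baskets = [["item_12", "item_7", "item_93"], ["item_7", "item_5", "item_12"]]
model = Word2Vec(baskets, vector_size=64, window=100, sg=1, negative=15,
                 min_count=1, epochs=20)
print(model.wv.most_similar("item_12", topn=2))  # nearest items in the embedding space
```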
Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks
Title | Machine Learning in Downlink Coordinated Multipoint in Heterogeneous Networks |
Authors | Faris B. Mismar, Brian L. Evans |
Abstract | We propose a method for downlink coordinated multipoint (DL CoMP) in heterogeneous fifth generation New Radio (NR) networks. The primary contribution of our paper is an algorithm to enhance the trigger of DL CoMP using online machine learning. We use support vector machine (SVM) classifiers to enhance the user downlink throughput in a realistic frequency division duplex network environment. Our simulation results show improvement in both the macro and pico base station downlink throughputs due to the informed triggering of the multiple radio streams as learned by the SVM classifier. |
Tasks | |
Published | 2016-08-30 |
URL | http://arxiv.org/abs/1608.08306v6 |
http://arxiv.org/pdf/1608.08306v6.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-in-downlink-coordinated |
Repo | https://github.com/farismismar/DL-CoMP-Machine-Learning |
Framework | none |
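
As a rough illustration of the triggering idea, an SVM can be trained to predict, from per-user link measurements, whether enabling the second radio stream improved throughput. The feature names and numbers below are placeholders, not the paper's simulator outputs:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical link measurements per user: [SINR (dB), normalized CQI].
X = np.array([[12.5, 0.80], [3.1, 0.20], [8.7, 0.55],
              [1.4, 0.10], [15.0, 0.90], [2.2, 0.15]])
y = np.array([1, 0, 1, 0, 1, 0])                 # 1 = enabling CoMP helped throughput
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.predict([[10.0, 0.7]]))                # trigger decision for a new measurement
```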
Jointly Extracting Relations with Class Ties via Effective Deep Ranking
Title | Jointly Extracting Relations with Class Ties via Effective Deep Ranking |
Authors | Hai Ye, Wenhan Chao, Zhunchen Luo, Zhoujun Li |
Abstract | Connections between relations in relation extraction, which we call class ties, are common. In the distantly supervised scenario, one entity tuple may have multiple relation facts. Exploiting class ties between the relations of one entity tuple is promising for distantly supervised relation extraction. However, previous models either fail to model this property effectively or ignore it altogether. In this work, to effectively leverage class ties, we propose to perform joint relation extraction with a unified model that integrates a convolutional neural network (CNN) with a general pairwise ranking framework, in which three novel ranking loss functions are introduced. Additionally, an effective method is presented to relieve the severe class imbalance caused by the NR (not relation) class during model training. Experiments on a widely used dataset show that leveraging class ties enhances extraction and demonstrate the effectiveness of our model in learning class ties. Our model outperforms the baselines significantly, achieving state-of-the-art performance. |
Tasks | Relation Extraction |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07602v4 |
http://arxiv.org/pdf/1612.07602v4.pdf | |
PWC | https://paperswithcode.com/paper/jointly-extracting-relations-with-class-ties |
Repo | https://github.com/oceanypt/DR_RE |
Framework | none |
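
The pairwise ranking component scores all candidate relations for an entity tuple and contrasts a correct relation against a sampled incorrect one. The PyTorch sketch below shows one margin-based ranking term of the kind such models build on; the paper introduces three loss variants of its own, and the exact form and margins here are illustrative:

```python
import torch

def ranking_loss(scores, pos_idx, neg_idx, m_pos=2.5, m_neg=0.5, gamma=2.0):
    """Push the score of a correct relation above m_pos and the score of a
    sampled incorrect relation below -m_neg, with steepness gamma."""
    s_pos, s_neg = scores[pos_idx], scores[neg_idx]   # scores: (num_relations,)
    return (torch.log1p(torch.exp(gamma * (m_pos - s_pos))) +
            torch.log1p(torch.exp(gamma * (m_neg + s_neg))))
```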
A Convex Surrogate Operator for General Non-Modular Loss Functions
Title | A Convex Surrogate Operator for General Non-Modular Loss Functions |
Authors | Jiaqian Yu, Matthew Blaschko |
Abstract | Empirical risk minimization frequently employs convex surrogates to underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular loss functions, submodular functions, or supermodular functions separately while maintaining polynomial time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition whose existence and uniqueness are proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack-rescaling and the Lovász hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, and an extension of the loss function, and that its subgradient can be computed in polynomial time. Empirical results are reported on a non-submodular loss based on the Sørensen-Dice difference function, and a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate. |
Tasks | |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03373v1 |
http://arxiv.org/pdf/1604.03373v1.pdf | |
PWC | https://paperswithcode.com/paper/a-convex-surrogate-operator-for-general-non |
Repo | https://github.com/yjq8812/aistats2016 |
Framework | none |
Fast Wavenet Generation Algorithm
Title | Fast Wavenet Generation Algorithm |
Authors | Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang |
Abstract | This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advantages of our fast implementation over a naive one. While this method is presented for Wavenet, the same scheme can be applied anytime one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available. |
Tasks | |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09482v1 |
http://arxiv.org/pdf/1611.09482v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-wavenet-generation-algorithm |
Repo | https://github.com/tomlepaine/fast-wavenet |
Framework | tf |
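
The caching scheme is easy to picture: each dilated layer keeps a FIFO queue whose length equals its dilation, so generating one sample touches every layer once (O(L)) instead of recomputing the whole receptive field (O(2^L)). A toy scalar sketch of one cached layer (real layers use vector-valued states and gated activations):

```python
from collections import deque
import numpy as np

class CachedDilatedLayer:
    """Toy scalar version of the caching idea: the queue stores the last
    `dilation` inputs so the value from `dilation` steps back is reused,
    not recomputed, at every generation step."""
    def __init__(self, w_past, w_cur, dilation):
        self.w_past, self.w_cur = w_past, w_cur          # two taps of a width-2 dilated conv
        self.queue = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x):
        past = self.queue[0]                             # input from `dilation` steps ago (cached)
        self.queue.append(x)                             # cache current input for future steps
        return np.tanh(self.w_past * past + self.w_cur * x)
```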
DCTNet and PCANet for acoustic signal feature extraction
Title | DCTNet and PCANet for acoustic signal feature extraction |
Authors | Yin Xian, Andrew Thompson, Xiaobai Sun, Douglas Nowacek, Loren Nolte |
Abstract | We introduce the use of DCTNet, an efficient approximation and alternative to PCANet, for acoustic signal classification. In PCANet, the eigenfunctions of the local sample covariance matrix (PCA) are used as filterbanks for convolution and feature extraction. When the eigenfunctions are well approximated by Discrete Cosine Transform (DCT) functions, each layer of PCANet and DCTNet is essentially a time-frequency representation. We relate DCTNet to spectral feature representation methods, such as the short-time Fourier transform (STFT), the spectrogram, and linear frequency spectral coefficients (LFSC). Experimental results on whale vocalization data show that DCTNet improves the classification rate, demonstrating DCTNet’s applicability to signal processing problems such as underwater acoustics. |
Tasks | |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1605.01755v1 |
http://arxiv.org/pdf/1605.01755v1.pdf | |
PWC | https://paperswithcode.com/paper/dctnet-and-pcanet-for-acoustic-signal-feature |
Repo | https://github.com/poline3939/DCTNet |
Framework | none |
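
Since DCTNet swaps PCA eigenfilters for fixed DCT basis functions, a single layer amounts to convolving the signal with a DCT filterbank and taking magnitudes. A one-layer NumPy sketch (filter length and count are illustrative):

```python
import numpy as np
from scipy.signal import convolve

def dct_filterbank(filter_len=64, n_filters=16):
    """First n_filters DCT-II basis functions of length filter_len."""
    n = np.arange(filter_len)
    return np.stack([np.cos(np.pi * k * (2 * n + 1) / (2 * filter_len))
                     for k in range(n_filters)])

def dctnet_layer(signal, filter_len=64, n_filters=16):
    """Convolve a 1-D signal with each DCT filter; the magnitudes form a
    time-frequency representation (rows = filters, columns = time)."""
    filters = dct_filterbank(filter_len, n_filters)
    return np.abs(np.stack([convolve(signal, f, mode="same") for f in filters]))
```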
Spectrum Estimation from Samples
Title | Spectrum Estimation from Samples |
Authors | Weihao Kong, Gregory Valiant |
Abstract | We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. The eigenvalues of the covariance of a distribution contain basic information about the distribution, including the presence or lack of structure in the distribution, the effective dimensionality of the distribution, and the applicability of higher-level machine learning and multivariate statistical tools. We consider this fundamental recovery problem in the regime where the number of samples is comparable to, or even sublinear in, the dimensionality of the distribution in question. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to show that the vector of eigenvalues can be accurately recovered. We provide finite-sample bounds on the expected error of the recovered eigenvalues, which imply that our estimator is asymptotically consistent as the dimensionality of the distribution and sample size tend towards infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes. |
Tasks | |
Published | 2016-01-30 |
URL | http://arxiv.org/abs/1602.00061v5 |
http://arxiv.org/pdf/1602.00061v5.pdf | |
PWC | https://paperswithcode.com/paper/spectrum-estimation-from-samples |
Repo | https://github.com/harinath001/compbio-project |
Framework | none |
Text-based LSTM networks for Automatic Music Composition
Title | Text-based LSTM networks for Automatic Music Composition |
Authors | Keunwoo Choi, George Fazekas, Mark Sandler |
Abstract | In this paper, we introduce new methods and discuss results of text-based LSTM (Long Short-Term Memory) networks for automatic music composition. The proposed network is designed to learn relationships within text documents that represent chord progressions and drum tracks in two case studies. In the experiments, word-RNNs (Recurrent Neural Networks) show good results for both cases, while character-based RNNs (char-RNNs) only succeed in learning chord progressions. The proposed system can be used for fully automatic composition or as a semi-automatic system that helps humans compose music by controlling a diversity parameter of the model. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05358v1 |
http://arxiv.org/pdf/1604.05358v1.pdf | |
PWC | https://paperswithcode.com/paper/text-based-lstm-networks-for-automatic-music |
Repo | https://github.com/keunwoochoi/lstm_real_book |
Framework | none |
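
A character-level variant of this setup can be sketched in a few lines of Keras: embed each character of the text-encoded chord progression, run an LSTM, and predict the next character. The vocabulary size, sequence length, and layer sizes below are assumptions, not the authors' configuration:

```python
from tensorflow import keras

vocab_size, seq_len = 40, 128        # assumed character vocabulary and context length
model = keras.Sequential([
    keras.Input(shape=(seq_len,), dtype="int32"),       # integer-encoded characters
    keras.layers.Embedding(vocab_size, 64),
    keras.layers.LSTM(256),
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-character distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```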