October 20, 2019

Paper Group AWR 259

Tunability: Importance of Hyperparameters of Machine Learning Algorithms


Title	Tunability: Importance of Hyperparameters of Machine Learning Algorithms
Authors	Philipp Probst, Bernd Bischl, Anne-Laure Boulesteix
Abstract	Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.
Tasks
Published	2018-02-26
URL	http://arxiv.org/abs/1802.09596v3
PDF	http://arxiv.org/pdf/1802.09596v3.pdf
PWC	https://paperswithcode.com/paper/tunability-importance-of-hyperparameters-of
Repo	https://github.com/PhilippPro/tunability
Framework	none
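The two ideas in the abstract, data-based defaults and tunability, can be made concrete with a small table of scores. The sketch below makes several assumptions. Performance is taken as already measured for a finite set of candidate configurations on every dataset, and higher is better. The data-based default is the configuration with the best mean score. Tunability is the mean gain of the best configuration over that default, and tuning one hyperparameter means keeping all the others at their default values. The paper's own estimation procedure is more involved. The function names are hypothetical.

```python
from statistics import mean


def data_based_default(scores):
    """Index of the configuration with the best mean score across datasets.

    scores[d][c] is the performance (higher is better) of configuration c on dataset d.
    """
    n_configs = len(scores[0])
    return max(range(n_configs), key=lambda c: mean(row[c] for row in scores))


def tunability(scores, allowed=None):
    """Mean gain of the best allowed configuration over the data-based default."""
    default = data_based_default(scores)
    if allowed is None:
        allowed = range(len(scores[0]))
    gains = [max(row[c] for c in allowed) - row[default] for row in scores]
    return mean(gains)


def hyperparameter_tunability(scores, configs, name):
    """Tunability when only hyperparameter `name` is tuned; all others stay at the default."""
    default = configs[data_based_default(scores)]
    allowed = [
        i for i, cfg in enumerate(configs)
        if all(cfg[k] == default[k] for k in cfg if k != name)
    ]
    return tunability(scores, allowed)
```

Suppose `hyperparameter_tunability` for one hyperparameter comes close to the overall `tunability`. Then most of the benefit of tuning comes from that hyperparameter alone, and the search can focus on it.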

Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings

Title	Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings
Authors	Sören Schulze, Emily J. King
Abstract	We propose a novel method for the blind separation of single-channel audio signals produced by the mixed sounds of musical instruments. While the approach of applying non-negative matrix factorization (NMF) has been studied in many papers, it does not make use of the pitch-invariance that the sounds of many instruments exhibit. This limitation can be overcome by using tensor factorization, in which context the use of log-frequency spectrograms was initiated, but this still requires the specific tuning of the instruments to be hard-coded into the algorithm. We develop a general-purpose sparse pursuit method that matches a discrete spectrum with given shifted continuous patterns. We first use it in order to transform our audio signal into a log-frequency spectrogram that shares properties with the mel spectrogram but is applicable to a wider frequency range. Then, we use the same algorithm to identify patterns from instrument sounds in the spectrogram. The relative amplitudes of the harmonics are saved in a dictionary, which is trained via a modified version of Adam. For a realistic monaural piece with acoustic recorder and violin, we achieve qualitatively good separation with a signal-to-distortion ratio (SDR) of 13.7 dB, a signal-to-interference ratio (SIR) of 28.1 dB, and a signal-to-artifacts ratio (SAR) of 13.9 dB, averaged over the instruments.
Tasks	Dictionary Learning
Published	2018-06-01
URL	https://arxiv.org/abs/1806.00273v3
PDF	https://arxiv.org/pdf/1806.00273v3.pdf
PWC	https://paperswithcode.com/paper/musical-instrument-separation-on-shift
Repo	https://github.com/ybayle/ReproducibleResearchCode
Framework	none
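The abstract credits the log-frequency representation with pitch invariance. On a log-frequency axis, the k-th harmonic of a fundamental f0 sits at log f0 + log k. Changing the pitch therefore translates the whole harmonic pattern without changing its shape, so one shifted pattern per instrument can describe many notes. The sketch below only illustrates that geometric fact. The reference frequency `f_min=27.5` Hz and the 48 bins per octave are arbitrary choices, not the paper's settings. The sketch does not reproduce the paper's sparse-pursuit transform.

```python
import math


def log_freq_position(freq_hz, f_min=27.5, bins_per_octave=48):
    """Position of a frequency on a log-frequency axis, in (fractional) bins above f_min."""
    return bins_per_octave * math.log2(freq_hz / f_min)


def harmonic_positions(f0, n_harmonics=6, **axis):
    """Log-frequency positions of the first n_harmonics partials k * f0."""
    return [log_freq_position(k * f0, **axis) for k in range(1, n_harmonics + 1)]
```

Raising a note by one octave, for example from 220 Hz to 440 Hz, moves every harmonic up by exactly `bins_per_octave` bins. The spacing between harmonics stays the same.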

The Devil of Face Recognition is in the Noise


Title	The Devil of Face Recognition is in the Noise
Authors	Fei Wang, Liren Chen, Cheng Li, Shiyao Huang, Yanjie Chen, Chen Qian, Chen Change Loy
Abstract	The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise (label flips and outliers) and the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. The IMDb-Face dataset has been released on https://github.com/fwang91/IMDb-Face.
Tasks	Face Recognition
Published	2018-07-31
URL	http://arxiv.org/abs/1807.11649v1
PDF	http://arxiv.org/pdf/1807.11649v1.pdf
PWC	https://paperswithcode.com/paper/the-devil-of-face-recognition-is-in-the-noise
Repo	https://github.com/fwang91/IMDb-Face
Framework	none

Quantifying the dynamics of topical fluctuations in language


Title	Quantifying the dynamics of topical fluctuations in language
Authors	Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith
Abstract	The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model for controlling for topical fluctuations in corpora (the topical-cultural advection model) and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries and on a carefully controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.
Tasks	Time Series
Published	2018-06-02
URL	https://arxiv.org/abs/1806.00699v3
PDF	https://arxiv.org/pdf/1806.00699v3.pdf
PWC	https://paperswithcode.com/paper/quantifying-the-dynamics-of-topical
Repo	https://github.com/andreskarjus/topical_cultural_advection_model
Framework	none
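The abstract does not say how advection is computed. As we understand the model, a word's advection value is a weighted mean of the log frequency changes of the words that make up its topic, i.e. its most strongly associated context words. The sketch below follows that reading and should not be taken as the authors' exact procedure. It assumes the topic weights (for example, association scores) and the normalized word frequencies for two periods are already available. The function name is hypothetical.

```python
import math


def advection(topic_weights, freq_before, freq_after):
    """Weighted mean log-change in frequency of a word's topic words between two periods.

    topic_weights: {context_word: association weight}
    freq_before, freq_after: {word: normalized frequency} for the two periods
    """
    total = sum(topic_weights.values())
    return sum(
        weight * math.log(freq_after[w] / freq_before[w])
        for w, weight in topic_weights.items()
    ) / total
```

A target word's own log frequency change can then be compared with its advection value. A word that merely rises along with its topic has a change close to its advection, and only the difference would be attributed to the word itself.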

Learning to Embed Sentences Using Attentive Recursive Trees


Title	Learning to Embed Sentences Using Attentive Recursive Trees
Authors	Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang
Abstract	Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods is using recursive latent tree-structured networks to embed sentences with task-specific structures. However, existing models have no explicit mechanism to emphasize task-informative words in the tree structure. To this end, we propose an Attentive Recursive Tree model (AR-Tree), where the words are dynamically located according to their importance in the task. Specifically, we construct the latent tree for a sentence in a proposed important-first strategy, and place more attentive words nearer to the root; thus, AR-Tree can inherently emphasize important words during the bottom-up composition of the sentence embedding. We propose an end-to-end reinforced training strategy for AR-Tree, which is demonstrated to consistently outperform, or be at least comparable to, the state-of-the-art sentence embedding methods on three sentence understanding tasks.
Tasks	Sentence Embedding
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02338v2
PDF	http://arxiv.org/pdf/1811.02338v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-embed-sentences-using-attentive
Repo	https://github.com/shijx12/AR-Tree
Framework	pytorch
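The abstract describes an important-first construction that places the most attentive words nearest the root. One way to realize this, sketched here under our reading and not taken from the authors' code, works recursively. It makes the highest-scoring word in a span the root, then builds subtrees from the words to its left and to its right. This keeps the original word order recoverable by in-order traversal. The attention scores are given as plain numbers; in AR-Tree they are learned. The bottom-up composition and the reinforced training are omitted, and the names `Node`, `build_important_first` and `in_order` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    word: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_important_first(words: List[str], scores: List[float],
                          lo: int = 0, hi: Optional[int] = None) -> Optional[Node]:
    """Root the span words[lo:hi] at its highest-scoring word and recurse on both sides."""
    if hi is None:
        hi = len(words)
    if lo >= hi:
        return None
    i = max(range(lo, hi), key=lambda k: scores[k])
    return Node(words[i],
                build_important_first(words, scores, lo, i),
                build_important_first(words, scores, i + 1, hi))


def in_order(node: Optional[Node]) -> List[str]:
    """In-order traversal; it recovers the original word order."""
    if node is None:
        return []
    return in_order(node.left) + [node.word] + in_order(node.right)
```

For "the movie was great" with scores `[0.1, 0.5, 0.2, 0.9]`, "great" becomes the root and "movie" the root of its left subtree. A bottom-up composition would therefore fold "great" in last, closest to the sentence embedding.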

A Survey on Deep Learning Methods for Robot Vision


Title	A Survey on Deep Learning Methods for Robot Vision
Authors	Javier Ruiz-del-Solar, Patricio Loncomilla, Naiomi Soto
Abstract	Deep learning has allowed a paradigm shift in pattern recognition, from using hand-crafted features together with statistical classifiers to using general-purpose learning procedures for learning data-driven representations, features, and classifiers together. The application of this new paradigm has been particularly successful in computer vision, in which the development of deep learning methods for vision applications has become a hot research topic. Given that deep learning has already attracted the attention of the robot vision community, the main purpose of this survey is to address the use of deep learning in robot vision. To achieve this, a comprehensive overview of deep learning and its usage in computer vision is given, which includes a description of the most frequently used neural models and their main application areas. Then, the standard methodology and tools used for designing deep-learning-based vision systems are presented. Afterwards, a review of the principal work using deep learning in robot vision is presented, as well as current and future trends related to the use of deep learning in robotics. This survey is intended to be a guide for the developers of robot vision systems.
Tasks
Published	2018-03-28
URL	http://arxiv.org/abs/1803.10862v1
PDF	http://arxiv.org/pdf/1803.10862v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-deep-learning-methods-for-robot
Repo	https://github.com/smrjan/robotic-vision
Framework	pytorch

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking


Title	Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
Authors	Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, Ming-Hsuan Yang
Abstract	Discriminative Correlation Filters (DCF) are efficient in visual tracking but suffer from unwanted boundary effects. Spatially Regularized DCF (SRDCF) has been suggested to resolve this issue by enforcing a spatial penalty on DCF coefficients, which inevitably improves the tracking performance at the price of increased complexity. To tackle online updating, SRDCF formulates its model on multiple training images, further adding difficulties in improving efficiency. In this work, motivated by the online Passive-Aggressive (PA) algorithm, we introduce temporal regularization to SRDCF with a single sample, resulting in our spatial-temporal regularized correlation filters (STRCF). The STRCF formulation can not only serve as a reasonable approximation to SRDCF with multiple training samples, but also provide a more robust appearance model than SRDCF in the case of large appearance variations. Besides, it can be efficiently solved via the alternating direction method of multipliers (ADMM). By incorporating both temporal and spatial regularization, our STRCF can handle boundary effects without much loss in efficiency and achieves superior performance over SRDCF in terms of accuracy and speed. Experiments are conducted on three benchmark datasets: OTB-2015, Temple-Color, and VOT-2016. Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively. Moreover, STRCF combined with CNN features also performs favorably against state-of-the-art CNN-based trackers and achieves an AUC score of 68.3% on OTB-2015.
Tasks	Visual Object Tracking, Visual Tracking
Published	2018-03-23
URL	http://arxiv.org/abs/1803.08679v1
PDF	http://arxiv.org/pdf/1803.08679v1.pdf
PWC	https://paperswithcode.com/paper/learning-spatial-temporal-regularized
Repo	https://github.com/lifeng9472/STRCF
Framework	none
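The effect of the temporal term is easiest to see in a simplified setting. This sketch replaces STRCF's spatial weight map with a constant ridge penalty `lam`. That makes the problem separable in the Fourier domain and solvable in closed form, whereas full STRCF keeps the spatial penalty and needs ADMM. The objective assumed here is 0.5*||x * f - y||^2 + 0.5*lam*||f||^2 + 0.5*mu*||f - f_prev||^2, where `*` is 2-D circular convolution and there is a single feature channel. The values of `lam` and `mu` are arbitrary, not the paper's.

```python
import numpy as np


def temporal_dcf_update(x, y, f_prev, lam=1e-2, mu=15.0):
    """Closed-form minimizer of
        0.5*||x (*) f - y||^2 + 0.5*lam*||f||^2 + 0.5*mu*||f - f_prev||^2,
    where (*) is 2-D circular convolution (single feature channel, no spatial weights).
    """
    X, Y, F_prev = np.fft.fft2(x), np.fft.fft2(y), np.fft.fft2(f_prev)
    # Per-frequency solution: the mu term pulls the new filter toward the previous one.
    F = (np.conj(X) * Y + mu * F_prev) / (np.abs(X) ** 2 + lam + mu)
    return np.real(np.fft.ifft2(F))


def objective(f, x, y, f_prev, lam=1e-2, mu=15.0):
    """Evaluate the same objective in the spatial domain (used to sanity-check the solution)."""
    response = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(f)))
    return 0.5 * (np.sum((response - y) ** 2)
                  + lam * np.sum(f ** 2)
                  + mu * np.sum((f - f_prev) ** 2))
```

With `mu = 0` this reduces to an ordinary ridge-regularized correlation filter. As `mu` grows, the update stays closer to `f_prev`, which is the passive-aggressive flavour of the temporal term.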

SGM: Sequence Generation Model for Multi-label Classification


Title	SGM: Sequence Generation Model for Multi-label Classification
Authors	Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, Houfeng Wang
Abstract	Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently to predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.
Tasks	Multi-Label Classification
Published	2018-06-13
URL	http://arxiv.org/abs/1806.04822v3
PDF	http://arxiv.org/pdf/1806.04822v3.pdf
PWC	https://paperswithcode.com/paper/sgm-sequence-generation-model-for-multi-label
Repo	https://github.com/lancopku/SGM
Framework	pytorch
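To treat multi-label classification as sequence generation, each unordered label set must first become a target sequence. The generated sequence must later be read back as a set. The sketch below covers only that preprocessing and postprocessing step. Ordering labels by training-set frequency is an assumption here, since the abstract does not specify an order. The encoder and the decoder structure are not shown. The function names and the `<bos>`/`<eos>` tokens are hypothetical.

```python
from collections import Counter


def label_order(train_label_sets):
    """Global label order: most frequent first, ties broken alphabetically."""
    counts = Counter(label for labels in train_label_sets for label in labels)
    return sorted(counts, key=lambda label: (-counts[label], label))


def to_target_sequence(labels, order, bos="<bos>", eos="<eos>"):
    """Turn an unordered label set into a decoder target sequence."""
    rank = {label: i for i, label in enumerate(order)}
    return [bos] + sorted(labels, key=rank.__getitem__) + [eos]


def to_label_set(generated, bos="<bos>", eos="<eos>"):
    """Read a generated sequence back as a set, stopping at the first end token."""
    labels = set()
    for token in generated:
        if token == eos:
            break
        if token != bos:
            labels.add(token)
    return labels
```

A fixed ordering lets the decoder condition each label on the ones already emitted. This is how a sequence model can capture label correlations that independent binary classifiers ignore.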

Training Neural Networks by Using Power Linear Units (PoLUs)


Title	Training Neural Networks by Using Power Linear Units (PoLUs)
Authors	Yikang Li, Pak Lun Kevin Ding, Baoxin Li
Abstract	In this paper, we introduce “Power Linear Unit” (PoLU), which increases the nonlinearity capacity of a neural network and thus helps improve its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be the identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Third, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification to compare our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include the widely used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks.
Tasks	Image Classification
Published	2018-02-01
URL	http://arxiv.org/abs/1802.00212v1
PDF	http://arxiv.org/pdf/1802.00212v1.pdf
PWC	https://paperswithcode.com/paper/training-neural-networks-by-using-power
Repo	https://github.com/awur978/Autoencoder
Framework	tf
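The abstract lists PoLU's properties but not its formula. As we understand the paper, PoLU is the identity for non-negative inputs and (1 - x)^(-n) - 1 for negative inputs, with a power n > 0. Check this definition against the paper before relying on it. The sketch follows that reading and exhibits the three properties named above: the identity on positive inputs, non-zero outputs for negative inputs, and saturation toward -1 for large negative inputs.

```python
import numpy as np


def polu(x, n=1.0):
    """PoLU as read from the paper: x for x >= 0, (1 - x)**(-n) - 1 for x < 0."""
    x = np.asarray(x, dtype=float)
    # Apply the power only to the negative part, so (1 - x) >= 1 and never hits zero.
    negative = (1.0 - np.minimum(x, 0.0)) ** (-n) - 1.0
    return np.where(x >= 0, x, negative)
```

Under this form, the left-hand slope at zero is n. With n = 1 the function is therefore continuously differentiable at the origin, and larger n makes the negative branch steeper near zero before it saturates.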

Single-Model Uncertainties for Deep Learning


Title	Single-Model Uncertainties for Deep Learning
Authors	Natasa Tagasovska, David Lopez-Paz
Abstract	We provide single-model estimates of aleatoric and epistemic uncertainty for deep neural networks. To estimate aleatoric uncertainty, we propose Simultaneous Quantile Regression (SQR), a loss function to learn all the conditional quantiles of a given target variable. These quantiles can be used to compute well-calibrated prediction intervals. To estimate epistemic uncertainty, we propose Orthonormal Certificates (OCs), a collection of diverse non-constant functions that map all training samples to zero. These certificates map out-of-distribution examples to non-zero values, signaling epistemic uncertainty. Our uncertainty estimators are computationally attractive, as they do not require ensembling or retraining deep models, and achieve competitive performance.
Tasks
Published	2018-11-02
URL	https://arxiv.org/abs/1811.00908v3
PDF	https://arxiv.org/pdf/1811.00908v3.pdf
PWC	https://paperswithcode.com/paper/frequentist-uncertainty-estimates-for-deep
Repo	https://github.com/facebookresearch/SingleModelUncertainty
Framework	pytorch
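Simultaneous Quantile Regression trains one network for every quantile by feeding the quantile level as an extra input. The network is then trained with the quantile (pinball) loss at a randomly drawn level. The minimal sketch below shows the loss. It assumes a hypothetical `model(x, tau)` callable and draws one level per sample from U(0, 1). The network architecture and the Orthonormal Certificates part are not shown.

```python
import numpy as np


def pinball_loss(y, y_hat, tau):
    """Quantile (pinball) loss, averaged; tau may be a scalar or one value per sample."""
    diff = y - y_hat
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def sqr_loss(model, x, y, rng):
    """Simultaneous Quantile Regression objective for one batch.

    `model(x, tau)` is a hypothetical network taking the quantile level as an extra input.
    A fresh tau ~ U(0, 1) is drawn per sample, so a single model learns all quantiles.
    """
    tau = rng.uniform(0.0, 1.0, size=len(y))
    return pinball_loss(y, model(x, tau), tau)
```

Once trained, querying the same model at tau = 0.05 and tau = 0.95 gives the ends of a 90% prediction interval. Minimizing the pinball loss at level tau targets the tau-quantile of the conditional distribution.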

TipsC: Tips and Corrections for programming MOOCs


Title	TipsC: Tips and Corrections for programming MOOCs
Authors	Saksham Sharma, Pallav Agarwal, Parv Mor, Amey Karkare
Abstract	With the widespread adoption of MOOCs in academic institutions, it has become imperative to come up with better techniques to solve the tutoring and grading problems posed by programming courses. Programming being the new ‘writing’, it becomes a challenge to ensure that a large section of society is exposed to programming. Due to the variation in learning abilities of students, the course instructor must ensure that everyone can cope with the material and receive adequate help in completing assignments while learning along the way. We introduce TipsC for this task. By analyzing a large number of correct submissions, TipsC can search for correct programs resembling a given incorrect solution. Without revealing the actual code, TipsC then suggests changes in the incorrect code to help the student fix logical runtime errors. In addition, it also serves as a cluster visualization tool for the instructor, revealing different patterns in user submissions. We evaluated the effectiveness of TipsC’s clustering algorithm on data collected from previous offerings of an introductory programming course conducted at IIT Kanpur, where the grades were given by human TAs. The results show that the weighted average variance of marks within clusters, when similar submissions are grouped together, is 47% lower than when all programs are grouped together.
Tasks
Published	2018-04-02
URL	http://arxiv.org/abs/1804.00373v1
PDF	http://arxiv.org/pdf/1804.00373v1.pdf
PWC	https://paperswithcode.com/paper/tipsc-tips-and-corrections-for-programming
Repo	https://github.com/HexFlow/tipsy
Framework	none
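The evaluation metric in the abstract compares two quantities. The first is the variance of marks within clusters, averaged with each cluster weighted by its size. The second is the variance when all submissions form a single group. A minimal sketch of that comparison follows. It assumes the weights are cluster sizes and that population variance is used; the abstract confirms neither detail. The clustering itself is not shown, and the function names are hypothetical.

```python
from statistics import pvariance


def weighted_cluster_variance(clusters):
    """Size-weighted average of the within-cluster (population) variance of marks."""
    total = sum(len(marks) for marks in clusters)
    return sum(len(marks) * pvariance(marks) for marks in clusters) / total


def variance_reduction(clusters):
    """Relative reduction versus putting every submission into one group."""
    everything = [m for marks in clusters for m in marks]
    return 1.0 - weighted_cluster_variance(clusters) / pvariance(everything)
```

Under this reading, the reported result corresponds to `variance_reduction` returning about 0.47. Clusters that group submissions of similar quality have more uniform marks.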

Recycle-GAN: Unsupervised Video Retargeting


Title	Recycle-GAN: Unsupervised Video Retargeting
Authors	Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh
Abstract	We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if the content of John Oliver’s speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert’s style. Our approach combines both spatial and temporal information along with adversarial losses for content translation and style preservation. In this work, we first study the advantages of using spatiotemporal constraints over spatial constraints for effective retargeting. We then demonstrate the proposed approach on problems where information in both space and time matters, such as face-to-face translation, flower-to-flower translation, wind and cloud synthesis, and sunrise and sunset.
Tasks	Face to Face Translation, Video Generation
Published	2018-08-15
URL	http://arxiv.org/abs/1808.05174v1
PDF	http://arxiv.org/pdf/1808.05174v1.pdf
PWC	https://paperswithcode.com/paper/recycle-gan-unsupervised-video-retargeting
Repo	https://github.com/aayushbansal/Recycle-GAN
Framework	pytorch
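The spatiotemporal constraint can be written as a "recycle" loss. Past frames from domain X are translated into domain Y, a temporal predictor forecasts the next frame in Y, and that prediction is translated back and compared with the true next frame in X. In the minimal sketch below, the generators and the predictor are plain callables; `g_xy`, `g_yx` and `p_y` are hypothetical names. Conditioning the predictor on the two most recent frames is an assumption. The adversarial and other loss terms are omitted.

```python
import numpy as np


def recycle_loss(frames_x, g_xy, g_yx, p_y):
    """Sum over t of ||x_{t+1} - g_yx(p_y(g_xy(x_{t-1}), g_xy(x_t)))||^2.

    frames_x: sequence of frames (arrays) from domain X
    g_xy, g_yx: generators X -> Y and Y -> X
    p_y: temporal predictor in domain Y, fed the two most recent translated frames
    """
    total = 0.0
    for t in range(1, len(frames_x) - 1):
        y_prev, y_cur = g_xy(frames_x[t - 1]), g_xy(frames_x[t])
        x_next_hat = g_yx(p_y(y_prev, y_cur))
        total += float(np.sum((frames_x[t + 1] - x_next_hat) ** 2))
    return total
```

Unlike a purely spatial cycle loss, this term penalizes the generators unless their translations stay consistent with how the scene evolves over time.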

Empowerment-driven Exploration using Mutual Information Estimation


Title	Empowerment-driven Exploration using Mutual Information Estimation
Authors	Navneet Madhu Kumar
Abstract	Exploration is a difficult challenge in reinforcement learning and is of prime importance in sparse-reward environments. However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail in these environments. In such cases, empowerment can serve as an intrinsic reward signal that enables the agent to maximize the influence it has over the near future. We formulate empowerment as the channel capacity between states and actions, which is calculated by estimating the mutual information between the actions and the following states. The mutual information is estimated using Mutual Information Neural Estimator (MINE) and a forward dynamics model. We demonstrate that an empowerment-driven agent significantly improves on the score of a baseline DQN agent on the game of Montezuma’s Revenge.
Tasks	Montezuma’s Revenge
Published	2018-10-11
URL	http://arxiv.org/abs/1810.05533v1
PDF	http://arxiv.org/pdf/1810.05533v1.pdf
PWC	https://paperswithcode.com/paper/empowerment-driven-exploration-using-mutual
Repo	https://github.com/navneet-nmk/pytorch-rl
Framework	pytorch
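MINE estimates mutual information by maximizing the Donsker-Varadhan lower bound I(A; S') >= E_joint[T] - log E_marginal[exp(T)] over a learned statistics network T. In the minimal sketch below, T is a fixed callable standing in for that network. Product-of-marginals samples are approximated by shuffling next states within a batch, as is common for MINE. Empowerment proper also maximizes over the action distribution, and the paper pairs the estimator with a forward dynamics model; neither is shown. The function names are hypothetical.

```python
import numpy as np


def dv_lower_bound(T, actions, next_states, marg_actions, marg_states):
    """Donsker-Varadhan bound E_joint[T] - log E_marginal[exp(T)] on I(action; next state)."""
    joint = np.asarray(T(actions, next_states), dtype=float)
    marginal = np.asarray(T(marg_actions, marg_states), dtype=float)
    m = marginal.max()  # log-mean-exp with the max subtracted for numerical stability
    return float(joint.mean() - (m + np.log(np.mean(np.exp(marginal - m)))))


def shuffled_marginals(actions, next_states, rng):
    """Approximate product-of-marginals samples by shuffling next states within the batch."""
    return actions, next_states[rng.permutation(len(next_states))]
```

Take a uniformly random binary action that fully determines the next state, and the ideal statistic T = log(p(a, s) / (p(a) p(s))). The bound then equals the true mutual information, log 2. In practice T is trained by gradient ascent on this bound.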

Playing hard exploration games by watching YouTube


Title	Playing hard exploration games by watching YouTube
Authors	Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas
Abstract	Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma’s Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
Tasks	Montezuma’s Revenge
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11592v2
PDF	http://arxiv.org/pdf/1805.11592v2.pdf
PWC	https://paperswithcode.com/paper/playing-hard-exploration-games-by-watching
Repo	https://github.com/MaxSobolMark/HardRLWithYoutube
Framework	tf
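The abstract says only that a single embedded YouTube video is used to construct a reward function. One simple construction, sketched here as an assumption and not as the paper's exact scheme, places checkpoints at regular intervals along the video's embedding sequence. The agent is rewarded each time its own embedding is close enough, by cosine similarity, to the next checkpoint. The spacing, threshold and reward values are placeholders, and the embedding network is not shown. `CheckpointReward` is a hypothetical name.

```python
import numpy as np


class CheckpointReward:
    """Reward an agent for reaching, in order, checkpoints taken from a demonstration video."""

    def __init__(self, video_embeddings, spacing=16, threshold=0.9, bonus=1.0):
        # Every `spacing`-th frame embedding of the video becomes a checkpoint (placeholder value).
        self.checkpoints = [self._unit(e) for e in video_embeddings[::spacing]]
        self.threshold = threshold
        self.bonus = bonus
        self.next_idx = 0

    @staticmethod
    def _unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    def __call__(self, agent_embedding):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # all checkpoints already reached
        similarity = float(self._unit(agent_embedding) @ self.checkpoints[self.next_idx])
        if similarity >= self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0
```

Because checkpoints must be reached in order, this reward gives a dense signal along the human's route through the level, even when the game itself pays out nothing.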

Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records


Title	Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records
Authors	Jeffrey Thompson, Jinxiang Hu, Dinesh Pal Mudaranthakam, David Streeter, Lisa Neums, Michele Park, Devin C. Koestler, Byron Gajewski, Matthew S. Mayo
Abstract	Objective: Electronic health records (EHR) represent a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the relevant information is stored in an unstructured format that makes it difficult to use. Natural language processing approaches that attempt to automatically classify the data depend on vectorization algorithms that impose structure on the text, but these algorithms were not designed for the unique characteristics of EHR. Here, we propose a new algorithm for structuring so-called free-text that may help researchers make better use of EHR. We call this method Relevant Word Order Vectorization (RWOV). Materials and Methods: As a proof-of-concept, we attempted to classify the hormone receptor status of breast cancer patients treated at the University of Kansas Medical Center during a recent year, from the unstructured text of pathology reports. Our approach attempts to account for the semi-structured way that healthcare providers often enter information. We compared this approach to the n-gram and word2vec methods. Results: Our approach resulted in the most consistently high accuracy, as measured by F1 score and area under the receiver operating characteristic curve (AUC). Discussion: Our results suggest that methods of structuring free text that take into account its context may show better performance, and that our approach is promising. Conclusion: By using a method that accounts for the fact that healthcare providers tend to use certain key words repetitively and that the order of these key words is important, we showed improved performance over methods that do not.
Tasks
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02627v1
PDF	http://arxiv.org/pdf/1812.02627v1.pdf
PWC	https://paperswithcode.com/paper/relevant-word-order-vectorization-for
Repo	https://github.com/jeffreyat/RWOV
Framework	none