October 20, 2019

Paper Group AWR 259

Tunability: Importance of Hyperparameters of Machine Learning Algorithms

Title Tunability: Importance of Hyperparameters of Machine Learning Algorithms
Authors Philipp Probst, Bernd Bischl, Anne-Laure Boulesteix
Abstract Modern supervised machine learning algorithms involve hyperparameters that have to be set before running them. Options for setting hyperparameters are default values from the software package, manual configuration by the user or configuring them for optimal predictive performance by a tuning procedure. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view, define data-based defaults and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform and six common machine learning algorithms. We apply our measures to assess the tunability of their parameters. Our results yield default values for hyperparameters and enable users to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.
Tasks
Published 2018-02-26
URL http://arxiv.org/abs/1802.09596v3
PDF http://arxiv.org/pdf/1802.09596v3.pdf
PWC https://paperswithcode.com/paper/tunability-importance-of-hyperparameters-of
Repo https://github.com/PhilippPro/tunability
Framework none
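
As a rough illustration of the tunability idea in the abstract above, the sketch below measures, per dataset, how much the best tuned configuration gains over the default one and then averages the gains. It is a simplification: the paper formalizes tunability in terms of risk, whereas this sketch assumes a higher-is-better score such as AUC. The function name `tunability` and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def tunability(default_perf, tuned_perf):
    """Per-dataset gain of the best tuned configuration over the default.

    Assumes a higher-is-better performance measure (e.g. AUC).
    """
    default_perf = np.asarray(default_perf, dtype=float)
    tuned_perf = np.asarray(tuned_perf, dtype=float)
    return tuned_perf - default_perf

# Hypothetical AUC values on three datasets (made-up numbers).
default_auc = [0.80, 0.90, 0.70]
tuned_auc = [0.85, 0.91, 0.78]

gains = tunability(default_auc, tuned_auc)
mean_gain = float(gains.mean())  # average tunability across datasets
print(gains, mean_gain)
```

A small mean gain would suggest that tuning is hardly worth the cost for that algorithm, which is the kind of decision the abstract says the measures are meant to support.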

Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings

Title Sparse Pursuit and Dictionary Learning for Blind Source Separation in Polyphonic Music Recordings
Authors Sören Schulze, Emily J. King
Abstract We propose a novel method for the blind separation of single-channel audio signals produced by the mixed sounds of musical instruments. While the approach of applying non-negative matrix factorization (NMF) has been studied in many papers, it does not make use of the pitch-invariance that the sounds of many instruments exhibit. This limitation can be overcome by using tensor factorization, in which context the use of log-frequency spectrograms was initiated, but this still requires the specific tuning of the instruments to be hard-coded into the algorithm. We develop a general-purpose sparse pursuit method that matches a discrete spectrum with given shifted continuous patterns. We first use it in order to transform our audio signal into a log-frequency spectrogram that shares properties with the mel spectrogram but is applicable to a wider frequency range. Then, we use the same algorithm to identify patterns from instrument sounds in the spectrogram. The relative amplitudes of the harmonics are saved in a dictionary, which is trained via a modified version of Adam. For a realistic monaural piece with acoustic recorder and violin, we achieve qualitatively good separation with a signal-to-distortion ratio (SDR) of 13.7 dB, a signal-to-interference ratio (SIR) of 28.1 dB, and a signal-to-artifacts ratio (SAR) of 13.9 dB, averaged over the instruments.
Tasks Dictionary Learning
Published 2018-06-01
URL https://arxiv.org/abs/1806.00273v3
PDF https://arxiv.org/pdf/1806.00273v3.pdf
PWC https://paperswithcode.com/paper/musical-instrument-separation-on-shift
Repo https://github.com/ybayle/ReproducibleResearchCode
Framework none

The Devil of Face Recognition is in the Noise

Title The Devil of Face Recognition is in the Noise
Authors Fei Wang, Liren Chen, Cheng Li, Shiyao Huang, Yanjie Chen, Chen Qian, Chen Change Loy
Abstract The growing scale of face recognition datasets empowers us to train strong convolutional networks for face recognition. While a variety of architectures and loss functions have been devised, we still have a limited understanding of the source and consequence of label noise inherent in existing datasets. We make the following contributions: 1) We contribute cleaned subsets of popular face databases, i.e., MegaFace and MS-Celeb-1M datasets, and build a new large-scale noise-controlled IMDb-Face dataset. 2) With the original datasets and cleaned subsets, we profile and analyze label noise properties of MegaFace and MS-Celeb-1M. We show that a few orders of magnitude more samples are needed to achieve the same accuracy yielded by a clean subset. 3) We study the association between different types of noise, i.e., label flips and outliers, and the accuracy of face recognition models. 4) We investigate ways to improve data cleanliness, including a comprehensive user study on the influence of data labeling strategies on annotation accuracy. The IMDb-Face dataset has been released on https://github.com/fwang91/IMDb-Face.
Tasks Face Recognition
Published 2018-07-31
URL http://arxiv.org/abs/1807.11649v1
PDF http://arxiv.org/pdf/1807.11649v1.pdf
PWC https://paperswithcode.com/paper/the-devil-of-face-recognition-is-in-the-noise
Repo https://github.com/fwang91/IMDb-Face
Framework none

Quantifying the dynamics of topical fluctuations in language

Title Quantifying the dynamics of topical fluctuations in language
Authors Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith
Abstract The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model for controlling for topical fluctuations in corpora - the topical-cultural advection model - and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic corpus spanning two centuries, and a carefully-controlled artificial language change scenario, and then use it to correct for topical fluctuations in historical time series. Finally, we use the model to show that the emergence of new words typically corresponds with the rise of a trending topic. This suggests that some lexical innovations occur due to growing communicative need in a subspace of the lexicon, and that the topical-cultural advection model can be used to quantify this.
Tasks Time Series
Published 2018-06-02
URL https://arxiv.org/abs/1806.00699v3
PDF https://arxiv.org/pdf/1806.00699v3.pdf
PWC https://paperswithcode.com/paper/quantifying-the-dynamics-of-topical
Repo https://github.com/andreskarjus/topical_cultural_advection_model
Framework none

Learning to Embed Sentences Using Attentive Recursive Trees

Title Learning to Embed Sentences Using Attentive Recursive Trees
Authors Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang
Abstract Sentence embedding is an effective feature representation for most deep learning-based NLP tasks. One prevailing line of methods is using recursive latent tree-structured networks to embed sentences with task-specific structures. However, existing models have no explicit mechanism to emphasize task-informative words in the tree structure. To this end, we propose an Attentive Recursive Tree model (AR-Tree), where the words are dynamically located according to their importance in the task. Specifically, we construct the latent tree for a sentence in a proposed important-first strategy, and place more attentive words nearer to the root; thus, AR-Tree can inherently emphasize important words during the bottom-up composition of the sentence embedding. We propose an end-to-end reinforced training strategy for AR-Tree, which is demonstrated to consistently outperform, or be at least comparable to, the state-of-the-art sentence embedding methods on three sentence understanding tasks.
Tasks Sentence Embedding
Published 2018-11-06
URL http://arxiv.org/abs/1811.02338v2
PDF http://arxiv.org/pdf/1811.02338v2.pdf
PWC https://paperswithcode.com/paper/learning-to-embed-sentences-using-attentive
Repo https://github.com/shijx12/AR-Tree
Framework pytorch
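
The "important-first" construction mentioned in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: the word with the highest importance score becomes the root, and the words to its left and right are handled recursively. The recursion over left and right sub-sequences, the `Node` class, and the scores are assumptions made for illustration; in the paper the scores are learned, not hand-set.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    word: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(words: List[str], scores: List[float]) -> Optional[Node]:
    """Place the highest-scoring word at the root, then recurse on each side."""
    if not words:
        return None
    i = max(range(len(words)), key=lambda k: scores[k])
    return Node(
        word=words[i],
        left=build_tree(words[:i], scores[:i]),
        right=build_tree(words[i + 1:], scores[i + 1:]),
    )

def inorder(node: Optional[Node]) -> List[str]:
    """In-order traversal; recovers the original word order."""
    if node is None:
        return []
    return inorder(node.left) + [node.word] + inorder(node.right)

# Hypothetical importance scores for a sentiment-style task.
words = ["the", "movie", "was", "great"]
scores = [0.1, 0.6, 0.2, 0.9]
root = build_tree(words, scores)
print(root.word, inorder(root))
```

Because an in-order traversal returns the sentence unchanged, the tree only reorders how words are composed, with higher-scoring words sitting closer to the root.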

A Survey on Deep Learning Methods for Robot Vision

Title A Survey on Deep Learning Methods for Robot Vision
Authors Javier Ruiz-del-Solar, Patricio Loncomilla, Naiomi Soto
Abstract Deep learning has allowed a paradigm shift in pattern recognition, from using hand-crafted features together with statistical classifiers to using general-purpose learning procedures for learning data-driven representations, features, and classifiers together. The application of this new paradigm has been particularly successful in computer vision, in which the development of deep learning methods for vision applications has become a hot research topic. Given that deep learning has already attracted the attention of the robot vision community, the main purpose of this survey is to address the use of deep learning in robot vision. To achieve this, a comprehensive overview of deep learning and its usage in computer vision is given, that includes a description of the most frequently used neural models and their main application areas. Then, the standard methodology and tools used for designing deep-learning based vision systems are presented. Afterwards, a review of the principal work using deep learning in robot vision is presented, as well as current and future trends related to the use of deep learning in robotics. This survey is intended to be a guide for the developers of robot vision systems.
Tasks
Published 2018-03-28
URL http://arxiv.org/abs/1803.10862v1
PDF http://arxiv.org/pdf/1803.10862v1.pdf
PWC https://paperswithcode.com/paper/a-survey-on-deep-learning-methods-for-robot
Repo https://github.com/smrjan/robotic-vision
Framework pytorch

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

Title Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking
Authors Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, Ming-Hsuan Yang
Abstract Discriminative Correlation Filters (DCF) are efficient in visual tracking but suffer from unwanted boundary effects. Spatially Regularized DCF (SRDCF) has been suggested to resolve this issue by enforcing spatial penalty on DCF coefficients, which, inevitably, improves the tracking performance at the price of increasing complexity. To tackle online updating, SRDCF formulates its model on multiple training images, further adding difficulties in improving efficiency. In this work, motivated by the online Passive-Aggressive (PA) algorithm, we introduce temporal regularization to SRDCF with a single sample, resulting in our spatial-temporal regularized correlation filters (STRCF). The STRCF formulation can not only serve as a reasonable approximation to SRDCF with multiple training samples, but also provide a more robust appearance model than SRDCF in the case of large appearance variations. Besides, it can be efficiently solved via the alternating direction method of multipliers (ADMM). By incorporating both temporal and spatial regularization, our STRCF can handle boundary effects without much loss in efficiency and achieve superior performance over SRDCF in terms of accuracy and speed. Experiments are conducted on three benchmark datasets: OTB-2015, Temple-Color, and VOT-2016. Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively. Moreover, STRCF combined with CNN features also performs favorably against state-of-the-art CNN-based trackers and achieves an AUC score of 68.3% on OTB-2015.
Tasks Visual Object Tracking, Visual Tracking
Published 2018-03-23
URL http://arxiv.org/abs/1803.08679v1
PDF http://arxiv.org/pdf/1803.08679v1.pdf
PWC https://paperswithcode.com/paper/learning-spatial-temporal-regularized
Repo https://github.com/lifeng9472/STRCF
Framework none

SGM: Sequence Generation Model for Multi-label Classification

Title SGM: Sequence Generation Model for Multi-label Classification
Authors Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, Houfeng Wang
Abstract Multi-label classification is an important yet challenging task in natural language processing. It is more complex than single-label classification in that the labels tend to be correlated. Existing methods tend to ignore the correlations between labels. Besides, different parts of the text can contribute differently for predicting different labels, which is not considered by existing models. In this paper, we propose to view the multi-label classification task as a sequence generation problem, and apply a sequence generation model with a novel decoder structure to solve it. Extensive experimental results show that our proposed methods outperform previous work by a substantial margin. Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.
Tasks Multi-Label Classification
Published 2018-06-13
URL http://arxiv.org/abs/1806.04822v3
PDF http://arxiv.org/pdf/1806.04822v3.pdf
PWC https://paperswithcode.com/paper/sgm-sequence-generation-model-for-multi-label
Repo https://github.com/lancopku/SGM
Framework pytorch

Training Neural Networks by Using Power Linear Units (PoLUs)

Title Training Neural Networks by Using Power Linear Units (PoLUs)
Authors Yikang Li, Pak Lun Kevin Ding, Baoxin Li
Abstract In this paper, we introduce “Power Linear Unit” (PoLU) which increases the nonlinearity capacity of a neural network and thus helps improve its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Third, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification for comparing our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include widely-used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks.
Tasks Image Classification
Published 2018-02-01
URL http://arxiv.org/abs/1802.00212v1
PDF http://arxiv.org/pdf/1802.00212v1.pdf
PWC https://paperswithcode.com/paper/training-neural-networks-by-using-power
Repo https://github.com/awur978/Autoencoder
Framework tf
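
The three properties listed in the abstract above (identity for positive inputs, non-zero output for negative inputs, saturation on the negative side) are satisfied by a power-based function such as the one sketched below. The exact form used here, `(1 - x) ** (-n) - 1` for negative inputs, and the default `n = 2` are assumptions for illustration and should be checked against the paper before being relied on.

```python
import numpy as np

def polu(x, n=2.0):
    """Sketch of a power linear unit (assumed form, see the note above).

    x >= 0: identity.
    x <  0: (1 - x) ** (-n) - 1, which is negative and tends to -1.
    """
    x = np.asarray(x, dtype=float)
    # Clamp to the negative side so the power is only taken of values >= 1.
    neg = np.minimum(x, 0.0)
    return np.where(x >= 0, x, (1.0 - neg) ** (-n) - 1.0)

values = polu([-1e6, -1.0, 0.0, 2.0])
print(values)
```

With this form the function is continuous at zero, and the negative branch is bounded below by -1, which matches the saturation behavior the abstract describes.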

Single-Model Uncertainties for Deep Learning

Title Single-Model Uncertainties for Deep Learning
Authors Natasa Tagasovska, David Lopez-Paz
Abstract We provide single-model estimates of aleatoric and epistemic uncertainty for deep neural networks. To estimate aleatoric uncertainty, we propose Simultaneous Quantile Regression (SQR), a loss function to learn all the conditional quantiles of a given target variable. These quantiles can be used to compute well-calibrated prediction intervals. To estimate epistemic uncertainty, we propose Orthonormal Certificates (OCs), a collection of diverse non-constant functions that map all training samples to zero. These certificates map out-of-distribution examples to non-zero values, signaling epistemic uncertainty. Our uncertainty estimators are computationally attractive, as they do not require ensembling or retraining deep models, and achieve competitive performance.
Tasks
Published 2018-11-02
URL https://arxiv.org/abs/1811.00908v3
PDF https://arxiv.org/pdf/1811.00908v3.pdf
PWC https://paperswithcode.com/paper/frequentist-uncertainty-estimates-for-deep
Repo https://github.com/facebookresearch/SingleModelUncertainty
Framework pytorch
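
Quantile regression of the kind described in the abstract above is commonly trained with the pinball (quantile) loss, whose minimizer for a level `tau` is the `tau`-quantile of the target. The sketch below shows that loss for a single fixed `tau` with a constant predictor. It does not reproduce the paper's simultaneous training over all quantile levels, and the data and names are hypothetical.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction costs tau per unit, over-prediction costs (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Hypothetical targets 1..100; search for the best constant prediction.
y = np.arange(1, 101, dtype=float)
candidates = np.arange(1, 101, dtype=float)
losses = [pinball_loss(y, np.full_like(y, c), tau=0.9) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
print(best)
```

The best constant lands at the empirical 0.9-quantile of the targets, which is why predictions at a low and a high `tau` can be paired into a prediction interval.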

TipsC: Tips and Corrections for programming MOOCs

Title TipsC: Tips and Corrections for programming MOOCs
Authors Saksham Sharma, Pallav Agarwal, Parv Mor, Amey Karkare
Abstract With the widespread adoption of MOOCs in academic institutions, it has become imperative to come up with better techniques to solve the tutoring and grading problems posed by programming courses. Programming being the new ‘writing’, it becomes a challenge to ensure that a large section of society is exposed to programming. Due to the gradient in learning abilities of students, the course instructor must ensure that everyone can cope with the material, and receive adequate help in completing assignments while learning along the way. We introduce TipsC for this task. By analyzing a large number of correct submissions, TipsC can search for correct programs resembling a given incorrect solution. Without revealing the actual code, TipsC then suggests changes in the incorrect code to help the student fix logical runtime errors. In addition, this also serves as a cluster visualization tool for the instructor, revealing different patterns in user submissions. We evaluated the effectiveness of TipsC’s clustering algorithm on data collected from previous offerings of an introductory programming course conducted at IIT Kanpur where the grades were given by human TAs. The results show that the weighted average variance of marks for clusters when similar submissions are grouped together is 47% lower than when all programs are grouped together.
Tasks
Published 2018-04-02
URL http://arxiv.org/abs/1804.00373v1
PDF http://arxiv.org/pdf/1804.00373v1.pdf
PWC https://paperswithcode.com/paper/tipsc-tips-and-corrections-for-programming
Repo https://github.com/HexFlow/tipsy
Framework none
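
The evaluation metric named in the abstract above, a weighted average variance of marks over clusters, can be sketched as follows. Weighting each cluster's variance by its size is an assumption made here, since the abstract does not state the weights. The marks and cluster assignments are made up.

```python
import numpy as np

def weighted_avg_variance(marks, cluster_ids):
    """Average of per-cluster mark variances, weighted by cluster size."""
    marks = np.asarray(marks, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    total = 0.0
    for c in np.unique(cluster_ids):
        group = marks[cluster_ids == c]
        total += len(group) * group.var()
    return total / len(marks)

# Hypothetical marks for four submissions.
marks = [9, 10, 2, 3]
clustered = weighted_avg_variance(marks, [0, 0, 1, 1])  # similar programs grouped
baseline = weighted_avg_variance(marks, [0, 0, 0, 0])   # everything in one group
reduction = 1.0 - clustered / baseline
print(clustered, baseline, reduction)
```

A large reduction relative to the single-group baseline indicates that submissions placed in the same cluster tend to receive similar marks.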

Recycle-GAN: Unsupervised Video Retargeting

Title Recycle-GAN: Unsupervised Video Retargeting
Authors Aayush Bansal, Shugao Ma, Deva Ramanan, Yaser Sheikh
Abstract We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to a domain, i.e., if contents of John Oliver’s speech were to be transferred to Stephen Colbert, then the generated content/speech should be in Stephen Colbert’s style. Our approach combines both spatial and temporal information along with adversarial losses for content translation and style preservation. In this work, we first study the advantages of using spatiotemporal constraints over spatial constraints for effective retargeting. We then demonstrate the proposed approach for the problems where information in both space and time matters such as face-to-face translation, flower-to-flower, wind and cloud synthesis, sunrise and sunset.
Tasks Face to Face Translation, Video Generation
Published 2018-08-15
URL http://arxiv.org/abs/1808.05174v1
PDF http://arxiv.org/pdf/1808.05174v1.pdf
PWC https://paperswithcode.com/paper/recycle-gan-unsupervised-video-retargeting
Repo https://github.com/aayushbansal/Recycle-GAN
Framework pytorch

Empowerment-driven Exploration using Mutual Information Estimation

Title Empowerment-driven Exploration using Mutual Information Estimation
Authors Navneet Madhu Kumar
Abstract Exploration is a difficult challenge in reinforcement learning and is of prime importance in sparse reward environments. However, many state-of-the-art deep reinforcement learning algorithms that rely on epsilon-greedy exploration fail in these environments. In such cases, empowerment can serve as an intrinsic reward signal to enable the agent to maximize the influence it has over the near future. We formulate empowerment as the channel capacity between states and actions, calculated by estimating the mutual information between the actions and the following states. The mutual information is estimated using the Mutual Information Neural Estimator and a forward dynamics model. We demonstrate that an empowerment-driven agent significantly improves on the score of a baseline DQN agent on the game of Montezuma’s Revenge.
Tasks Montezuma’s Revenge
Published 2018-10-11
URL http://arxiv.org/abs/1810.05533v1
PDF http://arxiv.org/pdf/1810.05533v1.pdf
PWC https://paperswithcode.com/paper/empowerment-driven-exploration-using-mutual
Repo https://github.com/navneet-nmk/pytorch-rl
Framework pytorch
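
The quantity behind the abstract above, the mutual information between actions and following states, can be computed exactly for a small discrete toy case, as sketched below. This is not the neural estimator used in the paper, and it evaluates the mutual information for one fixed action distribution rather than maximizing over action distributions as channel capacity requires. The joint tables are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits from a joint table p(action, next_state)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalize to a distribution
    pa = joint.sum(axis=1, keepdims=True)      # marginal over actions, shape (A, 1)
    ps = joint.sum(axis=0, keepdims=True)      # marginal over states, shape (1, S)
    mask = joint > 0                           # skip zero cells, 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ ps)[mask])))

# Two actions, two next states.
controllable = [[0.5, 0.0], [0.0, 0.5]]      # action fully determines next state
uncontrollable = [[0.25, 0.25], [0.25, 0.25]]  # next state ignores the action
mi_high = mutual_information(controllable)
mi_low = mutual_information(uncontrollable)
print(mi_high, mi_low)
```

States where actions strongly determine what happens next score high, which is the intuition for using this quantity as an intrinsic reward.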

Playing hard exploration games by watching YouTube

Title Playing hard exploration games by watching YouTube
Authors Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas
Abstract Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma’s Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
Tasks Montezuma’s Revenge
Published 2018-05-29
URL http://arxiv.org/abs/1805.11592v2
PDF http://arxiv.org/pdf/1805.11592v2.pdf
PWC https://paperswithcode.com/paper/playing-hard-exploration-games-by-watching
Repo https://github.com/MaxSobolMark/HardRLWithYoutube
Framework tf

Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records

Title Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records
Authors Jeffrey Thompson, Jinxiang Hu, Dinesh Pal Mudaranthakam, David Streeter, Lisa Neums, Michele Park, Devin C. Koestler, Byron Gajewski, Matthew S. Mayo
Abstract Objective: Electronic health records (EHR) represent a rich resource for conducting observational studies, supporting clinical trials, and more. However, much of the relevant information is stored in an unstructured format that makes it difficult to use. Natural language processing approaches that attempt to automatically classify the data depend on vectorization algorithms that impose structure on the text, but these algorithms were not designed for the unique characteristics of EHR. Here, we propose a new algorithm for structuring so-called free-text that may help researchers make better use of EHR. We call this method Relevant Word Order Vectorization (RWOV). Materials and Methods: As a proof-of-concept, we attempted to classify the hormone receptor status of breast cancer patients treated at the University of Kansas Medical Center during a recent year, from the unstructured text of pathology reports. Our approach attempts to account for the semi-structured way that healthcare providers often enter information. We compared this approach to the ngrams and word2vec methods. Results: Our approach resulted in the most consistently high accuracy, as measured by F1 score and area under the receiver operating characteristic curve (AUC). Discussion: Our results suggest that methods of structuring free text that take into account its context may show better performance, and that our approach is promising. Conclusion: By using a method that accounts for the fact that healthcare providers tend to use certain key words repetitively and that the order of these key words is important, we showed improved performance over methods that do not.
Tasks
Published 2018-12-06
URL http://arxiv.org/abs/1812.02627v1
PDF http://arxiv.org/pdf/1812.02627v1.pdf
PWC https://paperswithcode.com/paper/relevant-word-order-vectorization-for
Repo https://github.com/jeffreyat/RWOV
Framework none