October 21, 2019


Paper Group AWR 73


Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms. Distilling Knowledge for Search-based Structured Prediction. Fast High-Dimensional Bilateral and Nonlocal Means Filtering. Gradient descent revisited via an adaptive online learning rate. Data-Efficient Hierarchical Reinforcement Learning. Improving Opti …

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

Title Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms
Authors Hussein Hazimeh, Rahul Mazumder
Abstract The $L_0$-regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of $L_0$-based estimators and generic MIO solvers, there is a steep computational price to pay when compared to popular sparse learning algorithms (e.g., based on $L_1$ regularization). In this paper, we aim to push the frontiers of computation for a family of $L_0$-regularized problems with additional convex penalties. We propose a new hierarchy of necessary optimality conditions for these problems. We develop fast algorithms, based on coordinate descent and local combinatorial optimization, that are guaranteed to converge to solutions satisfying these optimality conditions. From a statistical viewpoint, an interesting story emerges. When the signal strength is high, our combinatorial optimization algorithms have an edge in challenging statistical settings. When the signal is lower, pure $L_0$ benefits from additional convex regularization. We empirically demonstrate that our family of $L_0$-based estimators can outperform the state-of-the-art sparse learning algorithms in terms of a combination of prediction, estimation, and variable selection metrics under various regimes (e.g., different signal strengths, feature correlations, number of samples and features). Our new open-source sparse learning toolkit L0Learn (available on CRAN and Github) reaches up to a three-fold speedup (with $p$ up to $10^6$) when compared to competing toolkits such as glmnet and ncvreg.
Tasks Combinatorial Optimization, Feature Selection, Sparse Learning
Published 2018-03-05
URL https://arxiv.org/abs/1803.01454v3
PDF https://arxiv.org/pdf/1803.01454v3.pdf
PWC https://paperswithcode.com/paper/fast-best-subset-selection-coordinate-descent
Repo https://github.com/hazimehh/L0Learn
Framework none
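
The workhorse behind this family of algorithms is a per-coordinate hard-thresholding update for the $L_0$+$L_2$-penalized least squares objective. The NumPy sketch below shows cyclic coordinate descent with that update; it is an illustration of the basic idea under simplifying assumptions (no feature screening, warm starts, or local combinatorial swaps), not the L0Learn implementation.

```python
import numpy as np

def l0l2_coordinate_descent(X, y, lam0, lam2, n_iters=100, tol=1e-8):
    """Cyclic coordinate descent for
        0.5*||y - X b||^2 + lam0*||b||_0 + lam2*||b||_2^2.
    Illustrative sketch only; L0Learn adds screening, warm starts and
    local combinatorial swaps on top of this basic update."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)              # ||X_j||^2 for each column
    r = y - X @ b                              # full residual
    for _ in range(n_iters):
        b_old = b.copy()
        for j in range(p):
            r_partial = r + X[:, j] * b[j]     # residual with coordinate j removed
            rho = X[:, j] @ r_partial          # <X_j, partial residual>
            denom = col_sq[j] + 2.0 * lam2
            # keep the coordinate only if it beats the L0 penalty
            b_new = rho / denom if rho ** 2 > 2.0 * lam0 * denom else 0.0
            r = r_partial - X[:, j] * b_new
            b[j] = b_new
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b
```

The closed-form rule keeps coordinate $j$ nonzero only when its squared correlation with the partial residual exceeds the price $2\lambda_0(\|X_j\|^2 + 2\lambda_2)$ of adding it to the support; the paper's local combinatorial search then swaps variables in and out of the support to escape weak coordinate-wise minima.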

Distilling Knowledge for Search-based Structured Prediction

Title Distilling Knowledge for Search-based Structured Prediction
Authors Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu
Abstract Many natural language processing tasks can be modeled as structured prediction and solved as a search problem. In this paper, we distill an ensemble of multiple models trained with different initializations into a single model. In addition to learning to match the ensemble’s probability output on the reference states, we also use the ensemble to explore the search space and learn from the states encountered during that exploration. Experimental results on two typical search-based structured prediction tasks – transition-based dependency parsing and neural machine translation – show that distillation can effectively improve the single model’s performance. The final model achieves improvements of 1.32 LAS and 2.65 BLEU over strong baselines on these two tasks, respectively, and outperforms the greedy structured prediction models in the previous literature.
Tasks Dependency Parsing, Machine Translation, Structured Prediction, Transition-Based Dependency Parsing
Published 2018-05-29
URL http://arxiv.org/abs/1805.11224v1
PDF http://arxiv.org/pdf/1805.11224v1.pdf
PWC https://paperswithcode.com/paper/distilling-knowledge-for-search-based
Repo https://github.com/Oneplus/twpipe
Framework none
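
A minimal sketch of the distillation objective described above, assuming the ensemble's averaged action distribution is available as `teacher_probs` and using a simple mixing weight `alpha` (both names are mine, not the paper's):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_probs, gold_actions, alpha=0.5):
    """Mix of (i) cross-entropy against the ensemble's averaged (soft) action
    distribution on reference states and (ii) ordinary cross-entropy against the
    gold transition/action. Shapes: logits and teacher_probs are (n, num_actions),
    gold_actions is (n,)."""
    p_student = softmax(student_logits)
    soft_ce = -(teacher_probs * np.log(p_student + 1e-12)).sum(axis=-1).mean()
    hard_ce = -np.log(p_student[np.arange(len(gold_actions)), gold_actions] + 1e-12).mean()
    return alpha * soft_ce + (1.0 - alpha) * hard_ce
```

In the exploration variant, states visited by the ensemble have no reference action, so only the soft-target term applies on those states.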

Fast High-Dimensional Bilateral and Nonlocal Means Filtering

Title Fast High-Dimensional Bilateral and Nonlocal Means Filtering
Authors Pravin Nair, Kunal N. Chaudhury
Abstract Existing fast algorithms for bilateral and nonlocal means filtering mostly work with grayscale images. They cannot easily be extended to high-dimensional data such as color and hyperspectral images, patch-based data, flow-fields, etc. In this paper, we propose a fast algorithm for high-dimensional bilateral and nonlocal means filtering. Unlike existing approaches, where the focus is on approximating the data (using quantization) or the filter kernel (via analytic expansions), we locally approximate the kernel using weighted and shifted copies of a Gaussian, where the weights and shifts are inferred from the data. The algorithm emerging from the proposed approximation essentially involves clustering and fast convolutions, and is easy to implement. Moreover, a variant of our algorithm comes with a guarantee (bound) on the approximation error, which is not enjoyed by existing algorithms. We present some results for high-dimensional bilateral and nonlocal means filtering to demonstrate the speed and accuracy of our proposal. Moreover, we also show that our algorithm can outperform state-of-the-art fast approximations in terms of accuracy and timing.
Tasks Quantization
Published 2018-11-06
URL http://arxiv.org/abs/1811.02363v1
PDF http://arxiv.org/pdf/1811.02363v1.pdf
PWC https://paperswithcode.com/paper/fast-high-dimensional-bilateral-and-nonlocal
Repo https://github.com/pravin1390/FastHDFilter
Framework none
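
The NumPy/SciPy sketch below illustrates the general "clustering plus fast convolutions" recipe: cluster the high-dimensional guide features, evaluate the range kernel only against the K cluster centers, and replace the brute-force double sum by K spatial Gaussian convolutions. It uses a Nyström-style low-rank expansion of the range kernel for simplicity, so it conveys the class of approximation rather than the authors' shifted weighted-Gaussian construction or its error bound.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

def fast_bilateral(img, guide, sigma_s, sigma_r, n_clusters=16):
    """Approximate high-dimensional bilateral filtering of img (H x W x C),
    guided by guide (H x W x D, e.g. color or patch features).
    The effective range bandwidth of this low-rank expansion is roughly
    sqrt(2) * sigma_r; it is a sketch, not the paper's algorithm."""
    H, W = guide.shape[:2]
    feats = guide.reshape(-1, guide.shape[-1]).astype(np.float64)
    centers = KMeans(n_clusters=n_clusters, n_init=4).fit(feats).cluster_centers_

    num = np.zeros_like(img, dtype=np.float64)
    den = np.zeros(img.shape[:2], dtype=np.float64)
    for mu in centers:
        d2 = ((feats - mu) ** 2).sum(axis=1).reshape(H, W)
        w = np.exp(-d2 / (2.0 * sigma_r ** 2))        # range weights w.r.t. this center
        for c in range(img.shape[-1]):
            num[..., c] += w * gaussian_filter(w * img[..., c], sigma_s)
        den += w * gaussian_filter(w, sigma_s)
    return num / np.maximum(den, 1e-12)[..., None]
```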

Gradient descent revisited via an adaptive online learning rate

Title Gradient descent revisited via an adaptive online learning rate
Authors Mathieu Ravaut, Satya Gorti
Abstract Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to ideal convergence. We propose a variation of the gradient descent algorithm in which the learning rate is not fixed. Instead, we learn the learning rate itself, either by another gradient descent (first-order method) or by Newton’s method (second-order). This way, gradient descent for any machine learning algorithm can be optimized.
Tasks
Published 2018-01-27
URL http://arxiv.org/abs/1801.09136v2
PDF http://arxiv.org/pdf/1801.09136v2.pdf
PWC https://paperswithcode.com/paper/gradient-descent-revisited-via-an-adaptive
Repo https://github.com/UrosOgrizovic/SimpleGoogleQuickdraw
Framework tf
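
A first-order version of "learning the learning rate by gradient descent" can be written as a hypergradient-style update: since theta_t = theta_{t-1} - lr * g_{t-1}, the derivative of the new loss with respect to lr is -g_t . g_{t-1}, so the learning rate grows while successive gradients agree and shrinks when they oppose. The sketch below illustrates that idea on a toy quadratic; it is not the authors' exact algorithm (they also derive a second-order, Newton-style update for the learning rate).

```python
import numpy as np

def gd_with_learned_lr(grad_fn, theta0, lr0=0.01, beta=1e-4, steps=200):
    """Gradient descent in which the scalar learning rate is itself updated by a
    first-order step:
        theta_t = theta_{t-1} - lr * g_{t-1}
        dL(theta_t)/dlr = -g_t . g_{t-1}
        lr <- lr + beta * (g_t . g_{t-1})"""
    theta, lr = np.asarray(theta0, dtype=float), lr0
    g_prev = grad_fn(theta)
    for _ in range(steps):
        theta = theta - lr * g_prev
        g = grad_fn(theta)
        lr = lr + beta * float(g @ g_prev)   # grow lr while gradients keep agreeing
        g_prev = g
    return theta, lr

# Toy usage: minimize 0.5 * ||A x - b||^2.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_opt, lr_final = gd_with_learned_lr(lambda x: A.T @ (A @ x - b), np.zeros(2))
```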

Data-Efficient Hierarchical Reinforcement Learning

Title Data-Efficient Hierarchical Reinforcement Learning
Authors Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine
Abstract Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher and lower-level training. This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We term the resulting HRL agent HIRO and find that it is generally applicable and highly sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques.
Tasks Hierarchical Reinforcement Learning
Published 2018-05-21
URL http://arxiv.org/abs/1805.08296v4
PDF http://arxiv.org/pdf/1805.08296v4.pdf
PWC https://paperswithcode.com/paper/data-efficient-hierarchical-reinforcement
Repo https://github.com/josherich/efficient-hrl
Framework tf
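
Three mechanisms carry most of the method: the lower level is rewarded for reaching a relative goal set by the higher level, the goal is re-expressed as the agent moves, and logged higher-level transitions are relabeled with the goal that best explains the lower level's logged actions under its current policy. The sketch below illustrates these pieces in NumPy; `lower_policy_mean` is a hypothetical stand-in for the trained lower-level actor, and the candidate-goal scheme only loosely follows the paper.

```python
import numpy as np

def intrinsic_reward(s, g, s_next):
    """Lower-level reward: get close to the relative goal g set by the higher level."""
    return -np.linalg.norm(s + g - s_next)

def goal_transition(s, g, s_next):
    """Re-express the goal after a step so it stays fixed in absolute state space."""
    return s + g - s_next

def relabel_goal(states, actions, lower_policy_mean, orig_goal, n_candidates=8, rng=None):
    """Off-policy correction: among candidate goals, pick the one that best explains
    the logged lower-level actions under the *current* lower-level policy, using a
    squared-error action-matching score as a log-likelihood surrogate.
    lower_policy_mean(s, g) is a hypothetical stand-in for the lower-level actor."""
    rng = rng or np.random.default_rng()
    achieved = states[-1] - states[0]
    candidates = [orig_goal, achieved]
    candidates += [achieved + rng.normal(scale=0.5, size=orig_goal.shape)
                   for _ in range(n_candidates)]

    def score(g0):
        g, total = g0, 0.0
        for s, s_next, a in zip(states[:-1], states[1:], actions):
            total -= np.sum((a - lower_policy_mean(s, g)) ** 2)
            g = goal_transition(s, g, s_next)
        return total

    return max(candidates, key=score)
```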

Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning

Title Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning
Authors Quentin Cappart, Emmanuel Goutierre, David Bergman, Louis-Martin Rousseau
Abstract Finding tight bounds on the optimal solution is a critical element of practical solution methods for discrete optimization problems. In the last decade, decision diagrams (DDs) have brought a new perspective on obtaining upper and lower bounds that can be significantly better than classical bounding mechanisms, such as linear relaxations. It is well known that the quality of the bounds achieved through this flexible bounding method is highly reliant on the ordering of variables chosen for building the diagram, and finding an ordering that optimizes standard metrics is an NP-hard problem. In this paper, we propose an innovative and generic approach based on deep reinforcement learning for obtaining an ordering for tightening the bounds obtained with relaxed and restricted DDs. We apply the approach to both the Maximum Independent Set Problem and the Maximum Cut Problem. Experimental results on synthetic instances show that the deep reinforcement learning approach, by achieving tighter objective function bounds, generally outperforms ordering methods commonly used in the literature when the distribution of instances is known. To the best knowledge of the authors, this is the first paper to apply machine learning to directly improve relaxation bounds obtained by general-purpose bounding mechanisms for combinatorial optimization problems.
Tasks Combinatorial Optimization
Published 2018-09-10
URL http://arxiv.org/abs/1809.03359v2
PDF http://arxiv.org/pdf/1809.03359v2.pdf
PWC https://paperswithcode.com/paper/improving-optimization-bounds-using-machine
Repo https://github.com/qcappart/learning-DD
Framework none

Training compact deep learning models for video classification using circulant matrices

Title Training compact deep learning models for video classification using circulant matrices
Authors Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif
Abstract In real world scenarios, model accuracy is hardly the only factor to consider. Large models consume more memory and are computationally more intensive, which makes them difficult to train and to deploy, especially on mobile devices. In this paper, we build on recent results at the crossroads of Linear Algebra and Deep Learning which demonstrate how imposing a structure on large weight matrices can be used to reduce the size of the model. We propose very compact models for video classification based on state-of-the-art network architectures such as Deep Bag-of-Frames, NetVLAD and NetFisherVectors. We then conduct thorough experiments using the large YouTube-8M video classification dataset. As we will show, the circulant DBoF embedding achieves an excellent trade-off between size and accuracy.
Tasks Video Classification
Published 2018-10-02
URL http://arxiv.org/abs/1810.01140v2
PDF http://arxiv.org/pdf/1810.01140v2.pdf
PWC https://paperswithcode.com/paper/training-compact-deep-learning-models-for
Repo https://github.com/araujoalexandre/youtube8m-circulant
Framework tf
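
The compactness comes from replacing a dense d x d weight matrix (d^2 parameters) with a circulant one (d parameters), whose matrix-vector product is a circular convolution computable with FFTs in O(d log d) time. A minimal NumPy sketch of that core identity follows; the paper's layers additionally combine circulant factors with diagonal matrices, which the sketch omits.

```python
import numpy as np
from scipy.linalg import circulant

def circulant_matvec(c, x):
    """y = C @ x, where C is the circulant matrix whose first column is c.
    Uses C @ x = IFFT(FFT(c) * FFT(x)): O(d log d) time, d weights instead of d^2."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Sanity check against the explicit dense matrix.
d = 8
c, x = np.random.randn(d), np.random.randn(d)
assert np.allclose(circulant(c) @ x, circulant_matvec(c, x))
```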

MixTrain: Scalable Training of Verifiably Robust Neural Networks

Title MixTrain: Scalable Training of Verifiably Robust Neural Networks
Authors Shiqi Wang, Yizheng Chen, Ahmed Abdou, Suman Jana
Abstract Making neural networks robust against adversarial inputs has resulted in an arms race between new defenses and attacks. The most promising defenses, adversarially robust training and verifiably robust training, have limitations that restrict their practical applications. Adversarially robust training only makes the networks robust against a subclass of attackers, and we reveal such weaknesses by developing a new attack based on interval gradients. By contrast, verifiably robust training provides protection against any L-p norm-bounded attacker but incurs orders of magnitude more computational and memory overhead than adversarially robust training. We propose two novel techniques, stochastic robust approximation and dynamic mixed training, to drastically improve the efficiency of verifiably robust training without sacrificing verified robustness. We leverage two critical insights: (1) instead of over the entire training set, sound over-approximations over randomly subsampled training data points are sufficient for efficiently guiding the robust training process; and (2) the test accuracy and verifiable robustness often conflict after certain training epochs, so we use a dynamic loss function to adaptively balance them for each epoch. We designed and implemented our techniques as part of MixTrain and evaluated it on six networks trained on three popular datasets including MNIST, CIFAR, and ImageNet-200. Our evaluations show that MixTrain can achieve up to $95.2\%$ verified robust accuracy against $L_\infty$ norm-bounded attackers while taking $15$ and $3$ times less training time than state-of-the-art verifiably robust training and adversarially robust training schemes, respectively. Furthermore, MixTrain easily scales to larger networks like the one trained on ImageNet-200, significantly outperforming the existing verifiably robust training methods.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02625v2
PDF http://arxiv.org/pdf/1811.02625v2.pdf
PWC https://paperswithcode.com/paper/mixtrain-scalable-training-of-verifiably
Repo https://github.com/tcwangshiqi-columbia/Interval-Attack
Framework tf
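
Below is a hedged sketch of the two training ideas, written for a linear classifier so that the worst-case (verified) loss under an L-infinity perturbation has a closed form; MixTrain itself applies sound over-approximations to deep networks, so treat the helper names and the `alpha` schedule as illustrative assumptions.

```python
import numpy as np

def softmax_xent(logits, y):
    """Cross-entropy of softmax(logits) against integer labels y."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def verified_loss_linear(W, b, X, y, eps):
    """Worst-case loss of the linear classifier x -> W x + b under ||delta||_inf <= eps.
    For a linear model the bound is exact: each wrong-class logit can rise by at most
    eps * ||W_k - W_y||_1 relative to the true class."""
    logits = X @ W.T + b
    worst = logits.copy()
    for i in range(len(X)):
        worst[i] += eps * np.abs(W - W[y[i]]).sum(axis=1)   # adds zero for the true class
    return softmax_xent(worst, y)

def mixtrain_style_loss(W, b, X, y, eps, alpha, k_sub=16, rng=None):
    """Dynamic mixed loss: natural loss on the full batch plus a verified loss on a
    random subsample (stochastic robust approximation)."""
    rng = rng or np.random.default_rng()
    nat = softmax_xent(X @ W.T + b, y)
    idx = rng.choice(len(X), size=min(k_sub, len(X)), replace=False)
    rob = verified_loss_linear(W, b, X[idx], y[idx], eps)
    return alpha * nat + (1.0 - alpha) * rob
```

The "dynamic" part is the schedule on `alpha`: shift weight toward the verified term when verified robust accuracy lags, and toward the natural term when clean accuracy lags.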

Convolutional Neural Networks with Recurrent Neural Filters

Title Convolutional Neural Networks with Recurrent Neural Filters
Authors Yi Yang
Abstract We introduce a class of convolutional neural networks (CNNs) that utilize recurrent neural networks (RNNs) as convolution filters. A convolution filter is typically implemented as a linear affine transformation followed by a non-linear function, which fails to account for language compositionality. As a result, it limits the use of high-order filters that are often warranted for natural language processing tasks. In this work, we model convolution filters with RNNs that naturally capture compositionality and long-term dependencies in language. We show that simple CNN architectures equipped with recurrent neural filters (RNFs) achieve results that are on par with the best published ones on the Stanford Sentiment Treebank and two answer sentence selection datasets.
Tasks Sentiment Analysis
Published 2018-08-28
URL http://arxiv.org/abs/1808.09315v1
PDF http://arxiv.org/pdf/1808.09315v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-with-recurrent
Repo https://github.com/bloomberg/cnn-rnf
Framework tf
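
Concretely, a recurrent neural filter runs a small RNN over every k-gram window and uses the final hidden state as that window's feature, in place of the usual linear filter plus nonlinearity; pooling over window positions then proceeds as in a standard CNN. The sketch below uses a plain tanh RNN to keep things short (the paper uses gated recurrent filters), so it is an illustration rather than the published architecture.

```python
import numpy as np

def rnf_layer(X, Wx, Wh, bias, k=5):
    """Recurrent neural filter over one sentence.
    X:  (T, d) word embeddings.
    Wx: (h, d), Wh: (h, h), bias: (h,) parameters of a vanilla tanh RNN.
    Returns (T - k + 1, h): for each k-gram window, the RNN's final hidden state
    after reading the window left to right, replacing the linear convolution filter."""
    T, _ = X.shape
    h_dim = bias.shape[0]
    feats = []
    for start in range(T - k + 1):
        h = np.zeros(h_dim)
        for t in range(start, start + k):
            h = np.tanh(Wx @ X[t] + Wh @ h + bias)
        feats.append(h)
    return np.stack(feats)

# A CNN equipped with RNFs would max-pool these window features over positions:
# sentence_vec = rnf_layer(X, Wx, Wh, bias, k=5).max(axis=0)
```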

Setting up a Reinforcement Learning Task with a Real-World Robot

Title Setting up a Reinforcement Learning Task with a Real-World Robot
Authors A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra
Abstract Reinforcement learning is a promising approach to developing hard-to-engineer adaptive solutions for complex and diverse robotic tasks. However, learning with real-world robots is often unreliable and difficult, which resulted in their low adoption in reinforcement learning research. This difficulty is worsened by the lack of guidelines for setting up learning tasks with robots. In this work, we develop a learning task with a UR5 robotic arm to bring to light some key elements of a task setup and study their contributions to the challenges with robots. We find that learning performance can be highly sensitive to the setup, and thus oversights and omissions in setup details can make effective learning, reproducibility, and fair comparison hard. Our study suggests some mitigating steps to help future experimenters avoid difficulties and pitfalls. We show that highly reliable and repeatable experiments can be performed in our setup, indicating the possibility of reinforcement learning research extensively based on real-world robots.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.07067v1
PDF http://arxiv.org/pdf/1803.07067v1.pdf
PWC https://paperswithcode.com/paper/setting-up-a-reinforcement-learning-task-with
Repo https://github.com/kindredresearch/SenseAct
Framework none

Self-Supervised Generation of Spatial Audio for 360 Video

Title Self-Supervised Generation of Spatial Audio for 360 Video
Authors Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang
Abstract We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video viewing, but spatial audio microphones are still rare in current 360 video production. Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360 video frames. We introduce several datasets, including one filmed ourselves, and one collected in-the-wild from YouTube, consisting of 360 videos uploaded with spatial audio. During training, ground-truth spatial audio serves as self-supervision and a mixed down mono track forms the input to our network. Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360 video and a mono audio track.
Tasks
Published 2018-09-07
URL http://arxiv.org/abs/1809.02587v1
PDF http://arxiv.org/pdf/1809.02587v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-generation-of-spatial-audio-1
Repo https://github.com/pedro-morgado/spatialaudiogen
Framework tf

Adversarial Removal of Demographic Attributes from Text Data

Title Adversarial Removal of Demographic Attributes from Text Data
Authors Yanai Elazar, Yoav Goldberg
Abstract Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. We show that demographic information of authors is encoded in – and can be recovered from – the intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic to – and likely condition on – demographic attributes. When attempting to remove such demographic information using adversarial training, we find that while the adversarial component achieves chance-level development-set accuracy during training, a post-hoc classifier, trained on the encoded sentences from the first part, still manages to reach substantially higher classification accuracies on the same data. This behavior is consistent across several tasks, demographic properties and datasets. We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one: do not rely on the adversarial training to achieve invariant representation to sensitive features.
Tasks Representation Learning
Published 2018-08-20
URL http://arxiv.org/abs/1808.06640v2
PDF http://arxiv.org/pdf/1808.06640v2.pdf
PWC https://paperswithcode.com/paper/adversarial-removal-of-demographic-attributes
Repo https://github.com/yanaiela/demog-text-removal
Framework none
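
The paper's central diagnostic is a post-hoc probe: freeze the adversarially trained encoder, train a fresh classifier on its representations, and check whether the demographic attribute is still recoverable. A scikit-learn sketch of that probe follows; `encode` is a hypothetical stand-in for the trained text encoder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def attribute_leakage(encode, texts_train, z_train, texts_dev, z_dev):
    """Post-hoc probe: train a fresh classifier to recover the protected attribute z
    from frozen sentence encodings. encode(texts) -> (n, d) is a hypothetical
    stand-in for the adversarially trained encoder. Probe accuracy well above the
    majority-class baseline means the attribute is still encoded, even if the
    adversary reached chance-level accuracy during training."""
    H_train, H_dev = encode(texts_train), encode(texts_dev)
    probe = LogisticRegression(max_iter=1000).fit(H_train, z_train)
    majority = np.bincount(z_dev).max() / len(z_dev)
    return accuracy_score(z_dev, probe.predict(H_dev)), majority
```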

VDMS: Efficient Big-Visual-Data Access for Machine Learning Workloads

Title VDMS: Efficient Big-Visual-Data Access for Machine Learning Workloads
Authors Luis Remis, Vishakha Gupta-Cledat, Christina Strong, Ragaad Altarawneh
Abstract We introduce the Visual Data Management System (VDMS), which enables faster access to big-visual-data and adds support for visual analytics. This is achieved by searching for relevant visual data via metadata stored as a graph, and by enabling faster access to visual data through new machine-friendly storage formats. VDMS differs from existing large-scale photo serving, video streaming, and textual big-data management systems due to its primary focus on supporting machine learning and data analytics pipelines that use visual data (images, videos, and feature vectors), treating these as first-class entities. We describe how to use VDMS via its user-friendly interface and how it enables rich and efficient vision analytics through a machine learning pipeline for processing medical images. We show a 2x performance improvement on complex queries over a comparable set-up.
Tasks
Published 2018-10-28
URL http://arxiv.org/abs/1810.11832v3
PDF http://arxiv.org/pdf/1810.11832v3.pdf
PWC https://paperswithcode.com/paper/vdms-efficient-big-visual-data-access-for
Repo https://github.com/IntelLabs/vdms
Framework none

Road Damage Detection Using Deep Neural Networks with Images Captured Through a Smartphone

Title Road Damage Detection Using Deep Neural Networks with Images Captured Through a Smartphone
Authors Hiroya Maeda, Yoshihide Sekimoto, Toshikazu Seto, Takehiro Kashiyama, Hiroshi Omata
Abstract Research on damage detection of road surfaces using image processing techniques has been actively conducted, achieving considerably high detection accuracies. Many studies only focus on the detection of the presence or absence of damage. However, in a real-world scenario, when the road managers from a governing body need to repair such damage, they need to clearly understand the type of damage in order to take effective action. In addition, in many of these previous studies, the researchers acquire their own data using different methods. Hence, there is no uniform road damage dataset available openly, leading to the absence of a benchmark for road damage detection. This study makes three contributions to address these issues. First, to the best of our knowledge, for the first time, a large-scale road damage dataset is prepared. This dataset is composed of 9,053 road damage images captured with a smartphone installed on a car, with 15,435 instances of road surface damage included in these road images. In order to generate this dataset, we cooperated with 7 municipalities in Japan and acquired road images for more than 40 hours. These images were captured in a wide variety of weather and illuminance conditions. In each image, we annotated the bounding box representing the location and type of damage. Next, we used a state-of-the-art object detection method based on convolutional neural networks to train the damage detection model with our dataset, and compared the accuracy and runtime speed on both a GPU server and a smartphone. Finally, we demonstrate that the type of damage can be classified into eight types with high accuracy by applying the proposed object detection method. The road damage dataset, our experimental results, and the developed smartphone application used in this study are publicly available (https://github.com/sekilab/RoadDamageDetector/).
Tasks Object Detection, Road Damage Detection
Published 2018-01-29
URL http://arxiv.org/abs/1801.09454v2
PDF http://arxiv.org/pdf/1801.09454v2.pdf
PWC https://paperswithcode.com/paper/road-damage-detection-using-deep-neural
Repo https://github.com/sekilab/RoadDamageDetector
Framework tf

The Hybrid Bootstrap: A Drop-in Replacement for Dropout

Title The Hybrid Bootstrap: A Drop-in Replacement for Dropout
Authors Robert Kosar, David W. Scott
Abstract Regularization is an important component of predictive model building. The hybrid bootstrap is a regularization technique that functions similarly to dropout except that features are resampled from other training points rather than replaced with zeros. We show that the hybrid bootstrap offers superior performance to dropout. We also present a sampling based technique to simplify hyperparameter choice. Next, we provide an alternative sampling technique for convolutional neural networks. Finally, we demonstrate the efficacy of the hybrid bootstrap on non-image tasks using tree-based models.
Tasks
Published 2018-01-22
URL http://arxiv.org/abs/1801.07316v1
PDF http://arxiv.org/pdf/1801.07316v1.pdf
PWC https://paperswithcode.com/paper/the-hybrid-bootstrap-a-drop-in-replacement
Repo https://github.com/r-kosar/hybrid_bootstrap
Framework none
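
The hybrid bootstrap keeps dropout's random masking but fills each masked feature with the value of that feature from another training point instead of zero. Below is a minimal NumPy sketch of the training-time corruption; the paper also discusses sampling the resampling proportion itself, while a fixed `p` keeps the sketch short.

```python
import numpy as np

def hybrid_bootstrap(X, p=0.5, rng=None):
    """Dropout-style corruption in which each dropped feature is replaced by the
    same feature taken from another randomly chosen training example, rather
    than by zero.
    X: (n, d) batch of training points; p: probability that an entry is resampled."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    mask = rng.random((n, d)) < p                     # entries to resample
    donors = rng.integers(0, n, size=(n, d))          # donor row for every entry
    cols = np.broadcast_to(np.arange(d), (n, d))      # column index for every entry
    X_new = X.copy()
    X_new[mask] = X[donors[mask], cols[mask]]
    return X_new
```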