October 21, 2019

3098 words 15 mins read

Paper Group AWR 160

Dataset Distillation. Revisiting Distillation and Incremental Classifier Learning. Using deep Q-learning to understand the tax evasion behavior of risk-averse firms. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Masked Conditional Neural Networks for Environmental Sound Classification. ECC: Platform-Independe …

Dataset Distillation

Title Dataset Distillation
Authors Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros
Abstract Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one. The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. For example, we show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization. We evaluate our method in various initialization settings and with different learning objectives. Experiments on multiple datasets show the advantage of our approach compared to alternative methods.
Tasks
Published 2018-11-27
URL https://arxiv.org/abs/1811.10959v3
PDF https://arxiv.org/pdf/1811.10959v3.pdf
PWC https://paperswithcode.com/paper/dataset-distillation
Repo https://github.com/SsnL/dataset-distillation
Framework pytorch
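
The bilevel loop behind dataset distillation can be sketched compactly: take a gradient step on the synthetic data, evaluate the updated model on real data, and backpropagate through that step into the synthetic images. Below is a minimal PyTorch sketch under simplifying assumptions (a linear classifier, a random batch standing in for a real MNIST batch, and illustrative names like `syn_lr`); it is not the authors' implementation, which lives in the linked repo.

```python
import torch
import torch.nn.functional as F

n_classes, dim = 10, 28 * 28

# Learnable synthetic data (one image per class) plus a learnable inner step size.
syn_x = torch.randn(n_classes, dim, requires_grad=True)
syn_y = torch.arange(n_classes)                       # fixed labels, one per class
syn_lr = torch.tensor(0.02, requires_grad=True)
outer_opt = torch.optim.Adam([syn_x, syn_lr], lr=1e-3)

def forward(w, b, x):
    return x @ w + b                                  # functional forward pass

for step in range(100):
    # Fixed (or re-sampled) network initialization for the inner problem.
    w = torch.zeros(dim, n_classes, requires_grad=True)
    b = torch.zeros(n_classes, requires_grad=True)

    # Inner step: train the model on the synthetic data only.
    inner_loss = F.cross_entropy(forward(w, b, syn_x), syn_y)
    gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
    w2, b2 = w - syn_lr * gw, b - syn_lr * gb         # differentiable update

    # Outer step: evaluate the updated model on (placeholder) real data and
    # backpropagate through the inner update into the synthetic images.
    real_x, real_y = torch.randn(64, dim), torch.randint(0, n_classes, (64,))
    outer_loss = F.cross_entropy(forward(w2, b2, real_x), real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```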

Revisiting Distillation and Incremental Classifier Learning

Title Revisiting Distillation and Incremental Classifier Learning
Authors Khurram Javed, Faisal Shafait
Abstract One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously. Any attempt at learning new tasks incrementally causes them to completely forget about previous tasks. This lack of ability to learn incrementally, called Catastrophic Forgetting, is considered a major hurdle in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method for incremental learning and demonstrate that its good performance is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we identify a key limitation of knowledge distillation, i.e., that it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that successfully removes this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning.
Tasks
Published 2018-07-08
URL http://arxiv.org/abs/1807.02802v2
PDF http://arxiv.org/pdf/1807.02802v2.pdf
PWC https://paperswithcode.com/paper/revisiting-distillation-and-incremental
Repo https://github.com/einavyog/my-incremental-learning
Framework pytorch
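
The paper's fix for distillation bias is a dynamic threshold-moving step at inference time. A hedged sketch of the general idea follows: rescale old-class probabilities before taking the argmax. The scalar `scale_old` is a placeholder; in the paper the correction is computed dynamically from the data rather than hand-set.

```python
import torch

def threshold_moving(probs, old_class_mask, scale_old):
    """Boost old-class probabilities by a correction factor, then re-normalize."""
    adjusted = probs.clone()
    adjusted[:, old_class_mask] *= scale_old
    return adjusted / adjusted.sum(dim=1, keepdim=True)

probs = torch.softmax(torch.randn(8, 100), dim=1)   # a batch of predictions
old_mask = torch.arange(100) < 50                   # first 50 classes = old tasks
preds = threshold_moving(probs, old_mask, scale_old=1.5).argmax(dim=1)
```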

Using deep Q-learning to understand the tax evasion behavior of risk-averse firms

Title Using deep Q-learning to understand the tax evasion behavior of risk-averse firms
Authors Nikolaos D. Goumagias, Dimitrios Hristu-Varsakelis, Yannis M. Assael
Abstract Designing tax policies that are effective in curbing tax evasion and maximizing state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy a self-interested, risk-averse tax entity is expected to follow, as it “navigates” - in the context of a Markov Decision Process - a government-controlled tax environment that includes random audits, penalties, and occasional tax amnesties. Although simplified versions of this problem have been explored previously, the mere assumption of risk-aversion (as opposed to risk-neutrality) raises the complexity of finding the optimal policy well beyond the reach of analytical techniques. Here, we obtain approximate solutions via a combination of Q-learning and recent advances in Deep Reinforcement Learning. By doing so, we i) determine the tax evasion behavior expected of the taxpayer entity, ii) calculate the degree of risk aversion of the “average” entity given empirical estimates of tax evasion, and iii) evaluate sample tax policies in terms of expected revenues. Our model can be useful as a testbed for “in-vitro” testing of tax policies, while our results lead to various policy recommendations.
Tasks Q-Learning
Published 2018-01-29
URL http://arxiv.org/abs/1801.09466v1
PDF http://arxiv.org/pdf/1801.09466v1.pdf
PWC https://paperswithcode.com/paper/using-deep-q-learning-to-understand-the-tax
Repo https://github.com/iassael/tax-evasion-dqn
Framework pytorch
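
The taxpayer's problem is a standard MDP, so the underlying machinery is the familiar deep Q-learning update. Below is a minimal DQN-style temporal-difference step, assuming a hypothetical two-action encoding (comply vs. under-report) and placeholder transitions; it illustrates the Bellman-target update, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

n_state, n_action = 8, 2          # hypothetical: 2 actions = comply / under-report
q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target.load_state_dict(q_net.state_dict())            # frozen target network
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95

def td_update(s, a, r, s2, done):
    with torch.no_grad():
        max_q2 = target(s2).max(dim=1).values
        y = r + gamma * (1 - done) * max_q2           # Bellman target
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Placeholder transitions standing in for sampled tax-environment experience.
s = torch.randn(32, n_state); a = torch.randint(0, n_action, (32,))
r = torch.randn(32); s2 = torch.randn(32, n_state); done = torch.zeros(32)
td_update(s, a, r, s2, done)
```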

Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

Title Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Authors Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr
Abstract Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps and provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence: the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual], with a theoretically grounded KL-divergence-based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy and also provides a better trade-off between forgetting and intransigence.
Tasks
Published 2018-01-30
URL http://arxiv.org/abs/1801.10112v3
PDF http://arxiv.org/pdf/1801.10112v3.pdf
PWC https://paperswithcode.com/paper/riemannian-walk-for-incremental-learning
Repo https://github.com/facebookresearch/agem
Framework tf
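
RWalk generalizes EWC-style regularization, where each parameter is anchored to its post-task value with a per-parameter importance weight. A hedged sketch of that quadratic penalty, plus an EWC++-style running Fisher estimate (an exponential moving average of squared gradients), is below; the KL-based path term that distinguishes RWalk proper is omitted.

```python
import torch

def ewc_penalty(params, old_params, importance):
    # Quadratic anchor: penalize drift from post-task parameter values,
    # weighted per parameter by its importance score.
    return sum((imp * (p - old).pow(2)).sum()
               for p, old, imp in zip(params, old_params, importance))

def update_importance(importance, grads, alpha=0.9):
    # EWC++-style running Fisher estimate: an exponential moving average of
    # squared gradients, updated online instead of recomputed after each task.
    return [alpha * f + (1 - alpha) * g.pow(2) for f, g in zip(importance, grads)]

# Usage inside a training step (lam is the regularization strength):
# loss = task_loss + (lam / 2) * ewc_penalty(model.parameters(), anchors, fisher)
```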

Masked Conditional Neural Networks for Environmental Sound Classification

Title Masked Conditional Neural Networks for Environmental Sound Classification
Authors Fady Medhat, David Chesmore, John Robinson
Abstract The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of the sound signal represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network’s links using a binary mask. Additionally, the masking automates the exploration of different feature combinations concurrently, analogous to handcrafting the optimum combination of features for a recognition task. We have evaluated the MCLNN performance using the Urbansound8k dataset of environmental sounds. Additionally, we present a collection of manually recorded sounds for rail and road traffic, YorNoise, to investigate the confusion rates among machine-generated sounds possessing low-frequency components. MCLNN has achieved competitive results on Urbansound8k without augmentation and using 12% of the trainable parameters utilized by an equivalent model based on state-of-the-art Convolutional Neural Networks. We extended the Urbansound8k dataset with YorNoise, where experiments have shown that common tonal properties affect classification performance.
Tasks Environmental Sound Classification
Published 2018-05-25
URL http://arxiv.org/abs/1805.10004v2
PDF http://arxiv.org/pdf/1805.10004v2.pdf
PWC https://paperswithcode.com/paper/masked-conditional-neural-networks-for-1
Repo https://github.com/fadymedhat/MCLNN
Framework tf
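
The masking idea can be sketched as a fixed binary band mask multiplied into a dense layer's weights, so that each hidden unit only sees one contiguous frequency band. The bandwidth/overlap parameterization below is a simplification of the MCLNN mask, and the dimensions are illustrative (PyTorch is used here for brevity, though the linked repo is TensorFlow-based).

```python
import torch
import torch.nn as nn

def band_mask(in_dim, out_dim, bandwidth, overlap):
    mask = torch.zeros(out_dim, in_dim)
    stride = max(bandwidth - overlap, 1)
    for j in range(out_dim):
        start = (j * stride) % in_dim
        mask[j, start:start + bandwidth] = 1.0    # one frequency band per unit
    return mask

class MaskedLinear(nn.Linear):
    def __init__(self, in_dim, out_dim, mask):
        super().__init__(in_dim, out_dim)
        self.register_buffer("mask", mask)        # fixed binary sparseness
    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(60, 40, band_mask(60, 40, bandwidth=20, overlap=5))
out = layer(torch.randn(2, 60))                   # 60 frequency bins in, 40 out
```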

ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

Title ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
Authors Haichuan Yang, Yuhao Zhu, Ji Liu
Abstract Many DNN-enabled vision applications, such as unmanned aerial vehicles, Augmented Reality headsets, and smartphones, constantly operate under severe energy constraints. Designing DNNs that can meet a stringent energy budget is becoming increasingly important. This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. The key idea of ECC is to model DNN energy consumption via a novel bilinear regression function. The energy estimation model allows us to formulate DNN compression as a constrained optimization problem that minimizes the DNN loss function subject to the energy constraint. The optimization problem, however, has nontrivial constraints, so existing deep learning solvers do not apply directly. We propose an optimization algorithm that combines the essence of the Alternating Direction Method of Multipliers (ADMM) framework with gradient-based learning algorithms. The algorithm decomposes the original constrained optimization into several subproblems that are solved iteratively and efficiently. ECC is also portable across different hardware platforms without requiring hardware knowledge. Experiments show that ECC achieves higher accuracy under the same or lower energy budget compared to state-of-the-art resource-constrained DNN compression techniques.
Tasks Neural Network Compression
Published 2018-12-05
URL http://arxiv.org/abs/1812.01803v3
PDF http://arxiv.org/pdf/1812.01803v3.pdf
PWC https://paperswithcode.com/paper/ecc-energy-constrained-deep-neural-network
Repo https://github.com/hyang1990/energy_constrained_compression
Framework pytorch
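
The energy model is the piece that makes the constraint tractable: predicted energy is bilinear in per-layer densities, with coefficients fitted by regression against measurements on the target platform. The sketch below shows one plausible reading of such a bilinear form with placeholder coefficients; the paper's exact parameterization, and the ADMM solver built on top of it, are in the linked repo.

```python
import torch

# Per-layer densities after pruning (fraction of weights/channels kept);
# values here are placeholders.
d_in = torch.tensor([1.0, 0.8, 0.6])
d_out = torch.tensor([0.8, 0.6, 0.5])

# Bilinear regression coefficients. In ECC these are fitted against real
# energy measurements; random values below are purely illustrative.
A = torch.rand(3)
b = torch.rand(3)

# Predicted energy: bilinear in the (input density, output density) pairs,
# plus a linear term per layer.
energy = (A * d_in * d_out).sum() + (b * d_out).sum()
```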

Mining gold from implicit models to improve likelihood-free inference

Title Mining gold from implicit models to improve likelihood-free inference
Authors Johann Brehmer, Gilles Louppe, Juan Pavez, Kyle Cranmer
Abstract Simulators often provide the best description of real-world phenomena. However, they also lead to challenging inverse problems because the density they implicitly define is often intractable. We present a new suite of simulation-based inference techniques that go beyond the traditional Approximate Bayesian Computation approach, which struggles in a high-dimensional setting, and extend methods that use surrogate models based on neural networks. We show that additional information, such as the joint likelihood ratio and the joint score, can often be extracted from simulators and used to augment the training data for these surrogate models. Finally, we demonstrate that these new techniques are more sample efficient and provide higher-fidelity inference than traditional methods.
Tasks
Published 2018-05-30
URL https://arxiv.org/abs/1805.12244v4
PDF https://arxiv.org/pdf/1805.12244v4.pdf
PWC https://paperswithcode.com/paper/mining-gold-from-implicit-models-to-improve
Repo https://github.com/johannbrehmer/simulator-mining-example
Framework none
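
The “gold” being mined is extra simulator output, such as the joint likelihood ratio and joint score, used as regression targets for a neural surrogate of the intractable likelihood ratio. A minimal sketch of that augmented training signal follows, with a toy network and placeholder data standing in for real simulator output.

```python
import torch
import torch.nn as nn

# Surrogate for the intractable likelihood ratio; architecture is illustrative.
ratio_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(ratio_net.parameters(), lr=1e-3)

x = torch.randn(256, 4)            # placeholder simulator outputs
joint_ratio = torch.rand(256, 1)   # placeholder joint ratios from the latents

pred = ratio_net(x)
loss = ((pred - joint_ratio) ** 2).mean()   # regress on the joint ratio
opt.zero_grad(); loss.backward(); opt.step()
```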

Partial Adversarial Domain Adaptation

Title Partial Adversarial Domain Adaptation
Authors Zhangjie Cao, Lijia Ma, Mingsheng Long, Jianmin Wang
Abstract Domain adversarial learning aligns the feature distributions across the source and target domains in a two-player minimax game. Existing domain adversarial networks generally assume identical label spaces across different domains. In the presence of big data, there is strong motivation to transfer deep models from existing big domains to unknown small domains. This paper introduces partial domain adaptation as a new domain adaptation scenario, which relaxes the fully shared label space assumption so that the source label space subsumes the target label space. Previous methods typically match the whole source domain to the target domain, making them vulnerable to negative transfer in the partial domain adaptation problem due to the large mismatch between label spaces. We present Partial Adversarial Domain Adaptation (PADA), which simultaneously alleviates negative transfer by down-weighting the data of outlier source classes when training both the source classifier and the domain adversary, and promotes positive transfer by matching the feature distributions in the shared label space. Experiments show that PADA exceeds state-of-the-art results for partial domain adaptation tasks on several datasets.
Tasks Domain Adaptation, Partial Domain Adaptation
Published 2018-08-10
URL http://arxiv.org/abs/1808.04205v1
PDF http://arxiv.org/pdf/1808.04205v1.pdf
PWC https://paperswithcode.com/paper/partial-adversarial-domain-adaptation
Repo https://github.com/thuml/PADA
Framework pytorch
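
PADA's class weighting is simple to sketch: average the target domain's predicted class probabilities to estimate which source classes actually appear, then weight each example's loss by its source class. The snippet below is a hedged reconstruction of that weighting (class counts and data are placeholders), not the authors' code.

```python
import torch
import torch.nn.functional as F

def class_weights(target_probs):
    w = target_probs.mean(dim=0)   # average prediction over a target batch
    return w / w.max()             # normalize so the largest weight is 1

def weighted_ce(logits, labels, w):
    losses = F.cross_entropy(logits, labels, reduction="none")
    return (w[labels] * losses).mean()   # down-weight outlier source classes

probs_t = torch.softmax(torch.randn(64, 31), dim=1)   # target predictions
w = class_weights(probs_t)
loss = weighted_ce(torch.randn(32, 31), torch.randint(0, 31, (32,)), w)
```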

Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation

Title Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation
Authors Daniele De Gregorio, Gianluca Palli, Luigi Di Stefano
Abstract While robotic manipulation of rigid objects is quite straightforward, coping with deformable objects is an open issue. More specifically, tasks like tying a knot, wiring a connector, or even surgical suturing deal with the domain of Deformable Linear Objects (DLOs). In particular, the detection of a DLO is a non-trivial problem, especially under clutter and occlusions (as well as self-occlusions). The pose estimation of a DLO results in the identification of its parameters with respect to a designated model, e.g. a basis spline. It follows that stand-alone segmentation of a DLO might not be sufficient to conduct a full manipulation task. This is why we propose a novel framework able to perform both semantic segmentation and b-spline modeling of multiple deformable linear objects simultaneously, without strict requirements on the environment (i.e. the background). The core algorithm is based on biased random walks over the Region Adjacency Graph built on a superpixel oversegmentation of the source image. The algorithm is initialized by a Convolutional Neural Network that detects the DLO’s endcaps. An open-source implementation of the proposed approach is also provided to ease the reproduction of the whole detection pipeline, along with a novel cables dataset, in order to encourage further experiments.
Tasks Pose Estimation, Semantic Segmentation
Published 2018-10-10
URL http://arxiv.org/abs/1810.04461v1
PDF http://arxiv.org/pdf/1810.04461v1.pdf
PWC https://paperswithcode.com/paper/lets-take-a-walk-on-superpixels-graphs
Repo https://github.com/m4nh/cables_dataset
Framework none
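
The core decoding step is a biased random walk over the region-adjacency graph of superpixels. A toy sketch on a hand-written adjacency structure follows; the bias weights stand in for the paper's curvature and appearance terms.

```python
import random

# Region-adjacency graph: node -> list of (neighbor, bias weight).
# Weights here are placeholders for curvature/appearance bias terms.
adjacency = {
    0: [(1, 0.9), (2, 0.1)],
    1: [(0, 0.1), (3, 0.8)],
    2: [(0, 0.5)],
    3: [(1, 0.2)],
}

def biased_walk(adjacency, start, steps, rng=random):
    path = [start]
    for _ in range(steps):
        options = [(n, w) for n, w in adjacency[path[-1]] if n not in path]
        if not options:
            break
        nodes, weights = zip(*options)
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path

print(biased_walk(adjacency, start=0, steps=3))   # e.g. [0, 1, 3]
```
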
Machine Learning for Link Quality Estimation: A Survey

Title Machine Learning for Link Quality Estimation: A Survey
Authors Gregor Cerar, Halil Yetgin, Mihael Mohorčič, Carolina Fortuna
Abstract Since the emergence of wireless communication networks, a plethora of research papers have focused on the quality aspects of wireless links. The analysis of the rich body of existing literature on link quality estimation using models developed from data traces indicates that the techniques used for modeling link quality estimation are becoming increasingly sophisticated. A number of recent estimators leverage machine learning (ML) techniques that require a sophisticated design and development process, each step of which has great potential to significantly affect the overall model performance. In this paper, we provide a comprehensive survey of link quality estimators developed from empirical data and review a rich variety of the existing open-source datasets. We then perform a systematic analysis to reveal the influence of the design decisions taken in each step of the ML process on the final performance of ML-based link quality estimators. One substantial lesson learned is that measurement data preprocessing and feature engineering have a greater influence on model performance than the choice of ML algorithm.
Tasks Feature Engineering, Link Quality Estimation
Published 2018-12-07
URL https://arxiv.org/abs/1812.08856v5
PDF https://arxiv.org/pdf/1812.08856v5.pdf
PWC https://paperswithcode.com/paper/analysis-of-machine-learning-for-link-quality
Repo https://github.com/sensorlab/link-quality-estimation
Framework none
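
The survey's headline lesson, that preprocessing and feature engineering dominate the choice of classifier, is easy to illustrate with a minimal pipeline: windowed statistics of a link metric feed a simple scaler-plus-classifier stack. Everything below (the synthetic RSSI traces, the thresholded label) is placeholder data for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
rssi = rng.normal(-70, 5, (500, 10))                # placeholder windowed RSSI

# Feature engineering: summary statistics over each measurement window.
X = np.c_[rssi.mean(1), rssi.std(1), rssi.min(1)]
y = (X[:, 0] > -70).astype(int)                     # placeholder good/bad label

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
```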

Zero-Shot Dual Machine Translation

Title Zero-Shot Dual Machine Translation
Authors Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann
Abstract Neural Machine Translation (NMT) systems rely on large amounts of parallel data, which is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we present an approach that combines zero-shot and dual learning. The latter relies on reinforcement learning to exploit the duality of the machine translation task, and requires only monolingual data for the target language pair. Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms a standard NMT system by large margins in zero-shot translation performance on Spanish-French (both directions). The zero-shot dual method approaches the performance of a comparable supervised setting to within 2.2 BLEU points. Our method also obtains improvements in the setting where a small amount of parallel data for the zero-shot language pair is available. When we add Russian, extending our experiments to jointly model 6 zero-shot translation directions, all directions improve by between 4 and 15 BLEU points, again reaching performance near that of the supervised setting.
Tasks Machine Translation
Published 2018-05-25
URL http://arxiv.org/abs/1805.10338v1
PDF http://arxiv.org/pdf/1805.10338v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-dual-machine-translation
Repo https://github.com/liernisestorain/zero-shot-dual-MT
Framework tf
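
The dual-learning signal needs only monolingual data: translate forward, score the output with a target-side language model for fluency, and score how well the backward model reconstructs the source. A hedged, stub-based sketch of combining those two rewards (the models, weights, and names are all placeholders, not the paper's code):

```python
def dual_reward(src, fwd_translate, bwd_logprob, lm_logprob, alpha=0.5):
    mid = fwd_translate(src)          # s -> t: sampled forward translation
    r_lm = lm_logprob(mid)            # fluency of the translation
    r_rec = bwd_logprob(mid, src)     # can the backward model reconstruct src?
    return alpha * r_lm + (1 - alpha) * r_rec

# Stub usage: real systems would plug in trained NMT and LM models here.
reward = dual_reward("hola .", lambda s: "hello .",
                     lambda m, s: -1.2, lambda m: -0.8)
```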

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Title HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Authors Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Abstract Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
Tasks Question Answering
Published 2018-09-25
URL http://arxiv.org/abs/1809.09600v1
PDF http://arxiv.org/pdf/1809.09600v1.pdf
PWC https://paperswithcode.com/paper/hotpotqa-a-dataset-for-diverse-explainable
Repo https://github.com/facebookresearch/UnsupervisedDecomposition
Framework pytorch
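
For readers poking at the data: each HotpotQA example carries the question, the answer, a set of context paragraphs, and sentence-level supporting facts as (paragraph title, sentence index) pairs. A short sketch of walking that structure (the file name is illustrative):

```python
import json

with open("hotpot_train_v1.1.json") as f:   # illustrative file name
    data = json.load(f)

ex = data[0]
print(ex["question"], "->", ex["answer"])

# "context" is a list of [title, list-of-sentences] paragraphs.
context = {title: sents for title, sents in ex["context"]}

# "supporting_facts" points at the gold sentences needed for the answer.
for title, sent_id in ex["supporting_facts"]:
    print(title, ":", context[title][sent_id])
```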

Unsupervised Depth Estimation, 3D Face Rotation and Replacement

Title Unsupervised Depth Estimation, 3D Face Rotation and Replacement
Authors Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Christopher Pal
Abstract We present an unsupervised approach for learning to estimate three dimensional (3D) facial structure from a single image while also predicting 3D viewpoint transformations that match a desired pose and facial geometry. We achieve this by inferring the depth of facial keypoints of an input image in an unsupervised manner, without using any form of ground-truth depth information. We show how it is possible to use these depths as intermediate computations within a new backpropable loss to predict the parameters of a 3D affine transformation matrix that maps inferred 3D keypoints of an input face to the corresponding 2D keypoints on a desired target facial geometry or pose. Our resulting approach, called DepthNets, can therefore be used to infer plausible 3D transformations from one face pose to another, allowing faces to be frontalized, transformed into 3D models or even warped to another pose and facial geometry. Lastly, we identify certain shortcomings with our formulation, and explore adversarial image translation techniques as a post-processing step to re-synthesize complete head shots for faces re-targeted to different poses or identities.
Tasks Depth Estimation
Published 2018-03-25
URL http://arxiv.org/abs/1803.09202v5
PDF http://arxiv.org/pdf/1803.09202v5.pdf
PWC https://paperswithcode.com/paper/unsupervised-depth-estimation-3d-face
Repo https://github.com/joelmoniz/DepthNets
Framework pytorch
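
The geometric core is easy to state: lift the source 2D keypoints with their predicted depths into pseudo-3D, then solve a least-squares problem for the affine map that sends them to the target keypoints. A hedged sketch with placeholder keypoints and depths (in DepthNets the depths come from the trained network and the least-squares solve sits inside the loss):

```python
import torch

src_2d = torch.randn(66, 2)    # source keypoints (placeholder)
depth = torch.randn(66, 1)     # predicted per-keypoint depths (placeholder)
tgt_2d = torch.randn(66, 2)    # target-pose keypoints (placeholder)

# Pseudo-3D homogeneous coordinates: (x, y, predicted z, 1).
src_3d = torch.cat([src_2d, depth, torch.ones(66, 1)], dim=1)

# Least-squares affine map A minimizing ||src_3d @ A - tgt_2d||.
A = torch.linalg.lstsq(src_3d, tgt_2d).solution
warped = src_3d @ A            # source keypoints mapped toward the target pose
```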

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Title Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Authors Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
Abstract In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance amongst single model, discriminative parsers in the PTB dataset and outperforms previous models in the CTB dataset.
Tasks Constituency Parsing
Published 2018-06-11
URL http://arxiv.org/abs/1806.04168v1
PDF http://arxiv.org/pdf/1806.04168v1.pdf
PWC https://paperswithcode.com/paper/straight-to-the-tree-constituency-parsing
Repo https://github.com/sordonia/distance_parser
Framework pytorch
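
Decoding a tree from syntactic distances is a clean recursive procedure: split the sentence at the position with the largest distance, then recurse on both halves. A minimal sketch (distance `i` sits between words `i` and `i+1`; the model that predicts the distances is omitted):

```python
def build_tree(words, distances):
    """Top-down decoding: split at the largest syntactic distance, recurse."""
    if len(words) == 1:
        return words[0]
    k = max(range(len(distances)), key=distances.__getitem__)
    left = build_tree(words[:k + 1], distances[:k])
    right = build_tree(words[k + 1:], distances[k + 1:])
    return (left, right)

# Largest distance (0.9) separates "sat" -> (('the', 'cat'), 'sat')
print(build_tree(["the", "cat", "sat"], [0.3, 0.9]))
```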

Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages

Title Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages
Authors Nurendra Choudhary, Rajat Singh, Ishita Bindlish, Manish Shrivastava
Abstract Code-mixed data poses an important challenge for natural language processing because its characteristics differ completely from the traditional structures of standard languages. In this paper, we propose a novel approach called Sentiment Analysis of Code-Mixed Text (SACMT) to classify sentences into their corresponding sentiment - positive, negative, or neutral - using contrastive learning. We utilize the shared parameters of siamese networks to map the sentences of code-mixed and standard languages to a common sentiment space. Additionally, we introduce a basic clustering-based preprocessing method to capture variations of code-mixed transliterated words. Our experiments reveal that SACMT outperforms the state-of-the-art approaches in sentiment analysis for code-mixed text by 7.6% in accuracy and 10.1% in F-score.
Tasks Sentiment Analysis
Published 2018-04-03
URL http://arxiv.org/abs/1804.00806v1
PDF http://arxiv.org/pdf/1804.00806v1.pdf
PWC https://paperswithcode.com/paper/sentiment-analysis-of-code-mixed-languages
Repo https://github.com/mankadronit/60DaysofUdacity-Challenge
Framework pytorch
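
The SACMT setup is a siamese pair of shared-weight encoders trained with a contrastive loss, pulling same-sentiment pairs together across the code-mixed and standard languages. A hedged sketch with placeholder sentence embeddings and a stand-in encoder (the paper's actual encoder and preprocessing are not reproduced here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared-weight encoder applied to both branches of the siamese network.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))

def contrastive_loss(a, b, same, margin=1.0):
    d = F.pairwise_distance(encoder(a), encoder(b))
    # Pull same-sentiment pairs together, push others past the margin.
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

x_mixed = torch.randn(32, 300)   # code-mixed sentence embeddings (placeholder)
x_std = torch.randn(32, 300)     # standard-language embeddings (placeholder)
same = torch.randint(0, 2, (32,)).float()
loss = contrastive_loss(x_mixed, x_std, same)
```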