October 21, 2019

3098 words 15 mins read

Paper Group AWR 160

Dataset Distillation. Revisiting Distillation and Incremental Classifier Learning. Using deep Q-learning to understand the tax evasion behavior of risk-averse firms. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Masked Conditional Neural Networks for Environmental Sound Classification. ECC: Platform-Independe …

Dataset Distillation

Title Dataset Distillation
Authors Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros
Abstract Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one. The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. For example, we show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization. We evaluate our method in various initialization settings and with different learning objectives. Experiments on multiple datasets show the advantage of our approach compared to alternative methods.
Tasks
Published 2018-11-27
URL https://arxiv.org/abs/1811.10959v3
PDF https://arxiv.org/pdf/1811.10959v3.pdf
PWC https://paperswithcode.com/paper/dataset-distillation
Repo https://github.com/SsnL/dataset-distillation
Framework pytorch
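
The bilevel loop behind dataset distillation can be sketched compactly: take a gradient step on the synthetic data, evaluate the updated model on real data, and backpropagate through that step into the synthetic images. Below is a minimal PyTorch sketch under simplifying assumptions (a linear classifier, a random batch standing in for a real MNIST batch, and illustrative names like `syn_lr`); it is not the authors' implementation, which lives in the linked repo.

```python
import torch
import torch.nn.functional as F

n_classes, dim = 10, 28 * 28

# Learnable synthetic data (one image per class) plus a learnable inner step size.
syn_x = torch.randn(n_classes, dim, requires_grad=True)
syn_y = torch.arange(n_classes)                       # fixed labels, one per class
syn_lr = torch.tensor(0.02, requires_grad=True)
outer_opt = torch.optim.Adam([syn_x, syn_lr], lr=1e-3)

def forward(w, b, x):
    return x @ w + b                                  # functional forward pass

for step in range(100):
    # Fixed (or re-sampled) network initialization for the inner problem.
    w = torch.zeros(dim, n_classes, requires_grad=True)
    b = torch.zeros(n_classes, requires_grad=True)

    # Inner step: train the model on the synthetic data only.
    inner_loss = F.cross_entropy(forward(w, b, syn_x), syn_y)
    gw, gb = torch.autograd.grad(inner_loss, (w, b), create_graph=True)
    w2, b2 = w - syn_lr * gw, b - syn_lr * gb         # differentiable update

    # Outer step: evaluate the updated model on (placeholder) real data and
    # backpropagate through the inner update into the synthetic images.
    real_x, real_y = torch.randn(64, dim), torch.randint(0, n_classes, (64,))
    outer_loss = F.cross_entropy(forward(w2, b2, real_x), real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```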

Revisiting Distillation and Incremental Classifier Learning

Title Revisiting Distillation and Incremental Classifier Learning
Authors Khurram Javed, Faisal Shafait
Abstract One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously. Any attempt at learning new tasks incrementally causes them to completely forget about previous tasks. This lack of ability to learn incrementally, called Catastrophic Forgetting, is considered a major hurdle in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method for incremental learning and demonstrate that its good performance is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we identify a key limitation of knowledge distillation, i.e., that it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that successfully removes this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning.
Tasks
Published 2018-07-08
URL http://arxiv.org/abs/1807.02802v2
PDF http://arxiv.org/pdf/1807.02802v2.pdf
PWC https://paperswithcode.com/paper/revisiting-distillation-and-incremental
Repo https://github.com/einavyog/my-incremental-learning
Framework pytorch
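
The paper's fix for distillation bias is a dynamic threshold-moving step at inference time. A hedged sketch of the general idea follows: rescale old-class probabilities before taking the argmax. The scalar `scale_old` is a placeholder; in the paper the correction is computed dynamically from the data rather than hand-set.

```python
import torch

def threshold_moving(probs, old_class_mask, scale_old):
    """Boost old-class probabilities by a correction factor, then re-normalize."""
    adjusted = probs.clone()
    adjusted[:, old_class_mask] *= scale_old
    return adjusted / adjusted.sum(dim=1, keepdim=True)

probs = torch.softmax(torch.randn(8, 100), dim=1)   # a batch of predictions
old_mask = torch.arange(100) < 50                   # first 50 classes = old tasks
preds = threshold_moving(probs, old_mask, scale_old=1.5).argmax(dim=1)
```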

Using deep Q-learning to understand the tax evasion behavior of risk-averse firms

Title Using deep Q-learning to understand the tax evasion behavior of risk-averse firms
Authors Nikolaos D. Goumagias, Dimitrios Hristu-Varsakelis, Yannis M. Assael
Abstract Designing tax policies that are effective in curbing tax evasion and maximizing state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy a self-interested, risk-averse tax entity is expected to follow, as it “navigates” - in the context of a Markov Decision Process - a government-controlled tax environment that includes random audits, penalties, and occasional tax amnesties. Although simplified versions of this problem have been explored previously, the mere assumption of risk-aversion (as opposed to risk-neutrality) raises the complexity of finding the optimal policy well beyond the reach of analytical techniques. Here, we obtain approximate solutions via a combination of Q-learning and recent advances in Deep Reinforcement Learning. By doing so, we i) determine the tax evasion behavior expected of the taxpayer entity, ii) calculate the degree of risk aversion of the “average” entity given empirical estimates of tax evasion, and iii) evaluate sample tax policies in terms of expected revenues. Our model can be useful as a testbed for “in-vitro” testing of tax policies, while our results lead to various policy recommendations.
Tasks Q-Learning
Published 2018-01-29
URL http://arxiv.org/abs/1801.09466v1
PDF http://arxiv.org/pdf/1801.09466v1.pdf
PWC https://paperswithcode.com/paper/using-deep-q-learning-to-understand-the-tax
Repo https://github.com/iassael/tax-evasion-dqn
Framework pytorch
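
The taxpayer's problem is a standard MDP, so the underlying machinery is the familiar deep Q-learning update. Below is a minimal DQN-style temporal-difference step, assuming a hypothetical two-action encoding (comply vs. under-report) and placeholder transitions; it illustrates the Bellman-target update, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

n_state, n_action = 8, 2          # hypothetical: 2 actions = comply / under-report
q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target.load_state_dict(q_net.state_dict())            # frozen target network
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95

def td_update(s, a, r, s2, done):
    with torch.no_grad():
        max_q2 = target(s2).max(dim=1).values
        y = r + gamma * (1 - done) * max_q2           # Bellman target
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Placeholder transitions standing in for sampled tax-environment experience.
s = torch.randn(32, n_state); a = torch.randint(0, n_action, (32,))
r = torch.randn(32); s2 = torch.randn(32, n_state); done = torch.zeros(32)
td_update(s, a, r, s2, done)
```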

Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence

Title Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Authors Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr
Abstract Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps and provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence: the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual], with a theoretically grounded KL-divergence-based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy and also provides a better trade-off between forgetting and intransigence.
Tasks
Published 2018-01-30
URL http://arxiv.org/abs/1801.10112v3
PDF http://arxiv.org/pdf/1801.10112v3.pdf
PWC https://paperswithcode.com/paper/riemannian-walk-for-incremental-learning
Repo https://github.com/facebookresearch/agem
Framework tf
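
RWalk generalizes EWC-style regularization, where each parameter is anchored to its post-task value with a per-parameter importance weight. A hedged sketch of that quadratic penalty, plus an EWC++-style running Fisher estimate (an exponential moving average of squared gradients), is below; the KL-based path term that distinguishes RWalk proper is omitted.

```python
import torch

def ewc_penalty(params, old_params, importance):
    # Quadratic anchor: penalize drift from post-task parameter values,
    # weighted per parameter by its importance score.
    return sum((imp * (p - old).pow(2)).sum()
               for p, old, imp in zip(params, old_params, importance))

def update_importance(importance, grads, alpha=0.9):
    # EWC++-style running Fisher estimate: an exponential moving average of
    # squared gradients, updated online instead of recomputed after each task.
    return [alpha * f + (1 - alpha) * g.pow(2) for f, g in zip(importance, grads)]

# Usage inside a training step (lam is the regularization strength):
# loss = task_loss + (lam / 2) * ewc_penalty(model.parameters(), anchors, fisher)
```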

Masked Conditional Neural Networks for Environmental Sound Classification

Title Masked Conditional Neural Networks for Environmental Sound Classification
Authors Fady Medhat, David Chesmore, John Robinson
Abstract The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of the sound signal represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network’s links using a binary mask. Additionally, the masking automates the exploration of different feature combinations concurrently, analogous to handcrafting the optimum combination of features for a recognition task. We have evaluated the MCLNN performance using the Urbansound8k dataset of environmental sounds. Additionally, we present a collection of manually recorded sounds for rail and road traffic, YorNoise, to investigate the confusion rates among machine-generated sounds possessing low-frequency components. MCLNN has achieved competitive results on Urbansound8k without augmentation and using 12% of the trainable parameters utilized by an equivalent model based on state-of-the-art Convolutional Neural Networks. We extended the Urbansound8k dataset with YorNoise, where experiments have shown that common tonal properties affect classification performance.
Tasks Environmental Sound Classification
Published 2018-05-25
URL http://arxiv.org/abs/1805.10004v2
PDF http://arxiv.org/pdf/1805.10004v2.pdf
PWC https://paperswithcode.com/paper/masked-conditional-neural-networks-for-1
Repo https://github.com/fadymedhat/MCLNN
Framework tf
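
The masking idea can be sketched as a fixed binary band mask multiplied into a dense layer's weights, so that each hidden unit only sees one contiguous frequency band. The bandwidth/overlap parameterization below is a simplification of the MCLNN mask, and the dimensions are illustrative (PyTorch is used here for brevity, though the linked repo is TensorFlow-based).

```python
import torch
import torch.nn as nn

def band_mask(in_dim, out_dim, bandwidth, overlap):
    mask = torch.zeros(out_dim, in_dim)
    stride = max(bandwidth - overlap, 1)
    for j in range(out_dim):
        start = (j * stride) % in_dim
        mask[j, start:start + bandwidth] = 1.0    # one frequency band per unit
    return mask

class MaskedLinear(nn.Linear):
    def __init__(self, in_dim, out_dim, mask):
        super().__init__(in_dim, out_dim)
        self.register_buffer("mask", mask)        # fixed binary sparseness
    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(60, 40, band_mask(60, 40, bandwidth=20, overlap=5))
out = layer(torch.randn(2, 60))                   # 60 frequency bins in, 40 out
```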

ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

Title ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
Authors Haichuan Yang, Yuhao Zhu, Ji Liu
Abstract Many DNN-enabled vision applications, such as unmanned aerial vehicles, Augmented Reality headsets, and smartphones, constantly operate under severe energy constraints. Designing DNNs that can meet a stringent energy budget is becoming increasingly important. This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. The key idea of ECC is to model DNN energy consumption via a novel bilinear regression function. The energy estimation model allows us to formulate DNN compression as a constrained optimization problem that minimizes the DNN loss function subject to the energy constraint. The optimization problem, however, has nontrivial constraints, so existing deep learning solvers do not apply directly. We propose an optimization algorithm that combines the essence of the Alternating Direction Method of Multipliers (ADMM) framework with gradient-based learning algorithms. The algorithm decomposes the original constrained optimization into several subproblems that are solved iteratively and efficiently. ECC is also portable across different hardware platforms without requiring hardware knowledge. Experiments show that ECC achieves higher accuracy under the same or lower energy budget compared to state-of-the-art resource-constrained DNN compression techniques.
Tasks Neural Network Compression
Published 2018-12-05
URL http://arxiv.org/abs/1812.01803v3
PDF http://arxiv.org/pdf/1812.01803v3.pdf
PWC https://paperswithcode.com/paper/ecc-energy-constrained-deep-neural-network
Repo https://github.com/hyang1990/energy_constrained_compression
Framework pytorch
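
The energy model is the piece that makes the constraint tractable: predicted energy is bilinear in per-layer densities, with coefficients fitted by regression against measurements on the target platform. The sketch below shows one plausible reading of such a bilinear form with placeholder coefficients; the paper's exact parameterization, and the ADMM solver built on top of it, are in the linked repo.

```python
import torch

# Per-layer densities after pruning (fraction of weights/channels kept);
# values here are placeholders.
d_in = torch.tensor([1.0, 0.8, 0.6])
d_out = torch.tensor([0.8, 0.6, 0.5])

# Bilinear regression coefficients. In ECC these are fitted against real
# energy measurements; random values below are purely illustrative.
A = torch.rand(3)
b = torch.rand(3)

# Predicted energy: bilinear in the (input density, output density) pairs,
# plus a linear term per layer.
energy = (A * d_in * d_out).sum() + (b * d_out).sum()
```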

Mining gold from implicit models to improve likelihood-free inference

Title Mining gold from implicit models to improve likelihood-free inference
Authors Johann Brehmer, Gilles Louppe, Juan Pavez, Kyle Cranmer
Abstract Simulators often provide the best description of real-world phenomena. However, they also lead to challenging inverse problems because the density they implicitly define is often intractable. We present a new suite of simulation-based inference techniques that go beyond the traditional Approximate Bayesian Computation approach, which struggles in a high-dimensional setting, and extend methods that use surrogate models based on neural networks. We show that additional information, such as the joint likelihood ratio and the joint score, can often be extracted from simulators and used to augment the training data for these surrogate models. Finally, we demonstrate that these new techniques are more sample efficient and provide higher-fidelity inference than traditional methods.
Tasks
Published 2018-05-30
URL https://arxiv.org/abs/1805.12244v4
PDF https://arxiv.org/pdf/1805.12244v4.pdf
PWC https://paperswithcode.com/paper/mining-gold-from-implicit-models-to-improve
Repo https://github.com/johannbrehmer/simulator-mining-example
Framework none
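
The “gold” being mined is extra simulator output, such as the joint likelihood ratio and joint score, used as regression targets for a neural surrogate of the intractable likelihood ratio. A minimal sketch of that augmented training signal follows, with a toy network and placeholder data standing in for real simulator output.

```python
import torch
import torch.nn as nn

# Surrogate for the intractable likelihood ratio; architecture is illustrative.
ratio_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(ratio_net.parameters(), lr=1e-3)

x = torch.randn(256, 4)            # placeholder simulator outputs
joint_ratio = torch.rand(256, 1)   # placeholder joint ratios from the latents

pred = ratio_net(x)
loss = ((pred - joint_ratio) ** 2).mean()   # regress on the joint ratio
opt.zero_grad(); loss.backward(); opt.step()
```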

Partial Adversarial Domain Adaptation

Title Partial Adversarial Domain Adaptation
Authors Zhangjie Cao, Lijia Ma, Mingsheng Long, Jianmin Wang
Abstract Domain adversarial learning aligns the feature distributions across the source and target domains in a two-player minimax game. Existing domain adversarial networks generally assume identical label spaces across different domains. In the presence of big data, there is strong motivation to transfer deep models from existing big domains to unknown small domains. This paper introduces partial domain adaptation as a new domain adaptation scenario, which relaxes the fully shared label space assumption so that the source label space subsumes the target label space. Previous methods typically match the whole source domain to the target domain, making them vulnerable to negative transfer in the partial domain adaptation problem due to the large mismatch between label spaces. We present Partial Adversarial Domain Adaptation (PADA), which simultaneously alleviates negative transfer by down-weighting the data of outlier source classes when training both the source classifier and the domain adversary, and promotes positive transfer by matching the feature distributions in the shared label space. Experiments show that PADA exceeds state-of-the-art results for partial domain adaptation tasks on several datasets.
Tasks Domain Adaptation, Partial Domain Adaptation
Published 2018-08-10
URL http://arxiv.org/abs/1808.04205v1
PDF http://arxiv.org/pdf/1808.04205v1.pdf
PWC https://paperswithcode.com/paper/partial-adversarial-domain-adaptation
Repo https://github.com/thuml/PADA
Framework pytorch
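
PADA's class weighting is simple to sketch: average the target domain's predicted class probabilities to estimate which source classes actually appear, then weight each example's loss by its source class. The snippet below is a hedged reconstruction of that weighting (class counts and data are placeholders), not the authors' code.

```python
import torch
import torch.nn.functional as F

def class_weights(target_probs):
    w = target_probs.mean(dim=0)   # average prediction over a target batch
    return w / w.max()             # normalize so the largest weight is 1

def weighted_ce(logits, labels, w):
    losses = F.cross_entropy(logits, labels, reduction="none")
    return (w[labels] * losses).mean()   # down-weight outlier source classes

probs_t = torch.softmax(torch.randn(64, 31), dim=1)   # target predictions
w = class_weights(probs_t)
loss = weighted_ce(torch.randn(32, 31), torch.randint(0, 31, (32,)), w)
```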

Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation

Title Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation
Authors Daniele De Gregorio, Gianluca Palli, Luigi Di Stefano
Abstract While robotic manipulation of rigid objects is quite straightforward, coping with deformable objects is an open issue. More specifically, tasks like tying a knot, wiring a connector, or even surgical suturing deal with the domain of Deformable Linear Objects (DLOs). In particular, the detection of a DLO is a non-trivial problem, especially under clutter and occlusions (as well as self-occlusions). The pose estimation of a DLO results in the identification of its parameters with respect to a designated model, e.g. a basis spline. It follows that stand-alone segmentation of a DLO might not be sufficient to conduct a full manipulation task. This is why we propose a novel framework able to perform both semantic segmentation and b-spline modeling of multiple deformable linear objects simultaneously, without strict requirements on the environment (i.e. the background). The core algorithm is based on biased random walks over the Region Adjacency Graph built on a superpixel oversegmentation of the source image. The algorithm is initialized by a Convolutional Neural Network that detects the DLO’s endcaps. An open-source implementation of the proposed approach is also provided to ease the reproduction of the whole detection pipeline, along with a novel cables dataset, in order to encourage further experiments.
Tasks Pose Estimation, Semantic Segmentation
Published 2018-10-10
URL http://arxiv.org/abs/1810.04461v1
PDF http://arxiv.org/pdf/1810.04461v1.pdf
PWC https://paperswithcode.com/paper/lets-take-a-walk-on-superpixels-graphs
Repo https://github.com/m4nh/cables_dataset
Framework none
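
The core decoding step is a biased random walk over the region-adjacency graph of superpixels. A toy sketch on a hand-written adjacency structure follows; the bias weights stand in for the paper's curvature and appearance terms.

```python
import random

# Region-adjacency graph: node -> list of (neighbor, bias weight).
# Weights here are placeholders for curvature/appearance bias terms.
adjacency = {
    0: [(1, 0.9), (2, 0.1)],
    1: [(0, 0.1), (3, 0.8)],
    2: [(0, 0.5)],
    3: [(1, 0.2)],
}

def biased_walk(adjacency, start, steps, rng=random):
    path = [start]
    for _ in range(steps):
        options = [(n, w) for n, w in adjacency[path[-1]] if n not in path]
        if not options:
            break
        nodes, weights = zip(*options)
        path.append(rng.choices(nodes, weights=weights, k=1)[0])
    return path

print(biased_walk(adjacency, start=0, steps=3))   # e.g. [0, 1, 3]
```
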
Machine Learning for Link Quality Estimation: A Survey

Title Machine Learning for Link Quality Estimation: A Survey
Authors Gregor Cerar, Halil Yetgin, Mihael Mohorčič, Carolina Fortuna
Abstract Since the emergence of wireless communication networks, a plethora of research papers have focused on the quality aspects of wireless links. The analysis of the rich body of existing literature on link quality estimation using models developed from data traces indicates that the techniques used for modeling link quality estimation are becoming increasingly sophisticated. A number of recent estimators leverage machine learning (ML) techniques that require a sophisticated design and development process, each step of which has great potential to significantly affect the overall model performance. In this paper, we provide a comprehensive survey of link quality estimators developed from empirical data and review a rich variety of the existing open-source datasets. We then perform a systematic analysis to reveal the influence of the design decisions taken in each step of the ML process on the final performance of ML-based link quality estimators. One substantial lesson learned is that measurement data preprocessing and feature engineering have a greater influence on model performance than the choice of ML algorithm.
Tasks Feature Engineering, Link Quality Estimation
Published 2018-12-07
URL https://arxiv.org/abs/1812.08856v5
PDF https://arxiv.org/pdf/1812.08856v5.pdf
PWC https://paperswithcode.com/paper/analysis-of-machine-learning-for-link-quality
Repo https://github.com/sensorlab/link-quality-estimation
Framework none
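
The survey's headline lesson, that preprocessing and feature engineering dominate the choice of classifier, is easy to illustrate with a minimal pipeline: windowed statistics of a link metric feed a simple scaler-plus-classifier stack. Everything below (the synthetic RSSI traces, the thresholded label) is placeholder data for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
rssi = rng.normal(-70, 5, (500, 10))                # placeholder windowed RSSI

# Feature engineering: summary statistics over each measurement window.
X = np.c_[rssi.mean(1), rssi.std(1), rssi.min(1)]
y = (X[:, 0] > -70).astype(int)                     # placeholder good/bad label

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
```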

Zero-Shot Dual Machine Translation

Title Zero-Shot Dual Machine Translation
Authors Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann
Abstract Neural Machine Translation (NMT) systems rely on large amounts of parallel data, which is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we present an approach that combines zero-shot and dual learning. The latter relies on reinforcement learning to exploit the duality of the machine translation task, and requires only monolingual data for the target language pair. Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms a standard NMT system by large margins in zero-shot translation performance on Spanish-French (both directions). The zero-shot dual method approaches the performance of a comparable supervised setting to within 2.2 BLEU points. Our method also obtains improvements in the setting where a small amount of parallel data for the zero-shot language pair is available. When we add Russian, extending our experiments to jointly model 6 zero-shot translation directions, all directions improve by between 4 and 15 BLEU points, again reaching performance near that of the supervised setting.
Tasks Machine Translation
Published 2018-05-25
URL http://arxiv.org/abs/1805.10338v1
PDF http://arxiv.org/pdf/1805.10338v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-dual-machine-translation
Repo https://github.com/liernisestorain/zero-shot-dual-MT
Framework tf
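
The dual-learning signal needs only monolingual data: translate forward, score the output with a target-side language model for fluency, and score how well the backward model reconstructs the source. A hedged, stub-based sketch of combining those two rewards (the models, weights, and names are all placeholders, not the paper's code):

```python
def dual_reward(src, fwd_translate, bwd_logprob, lm_logprob, alpha=0.5):
    mid = fwd_translate(src)          # s -> t: sampled forward translation
    r_lm = lm_logprob(mid)            # fluency of the translation
    r_rec = bwd_logprob(mid, src)     # can the backward model reconstruct src?
    return alpha * r_lm + (1 - alpha) * r_rec

# Stub usage: real systems would plug in trained NMT and LM models here.
reward = dual_reward("hola .", lambda s: "hello .",
                     lambda m, s: -1.2, lambda m: -0.8)
```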

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Title HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Authors Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Abstract Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
Tasks Question Answering
Published 2018-09-25
URL http://arxiv.org/abs/1809.09600v1
PDF http://arxiv.org/pdf/1809.09600v1.pdf
PWC https://paperswithcode.com/paper/hotpotqa-a-dataset-for-diverse-explainable
Repo https://github.com/facebookresearch/UnsupervisedDecomposition
Framework pytorch
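
For readers poking at the data: each HotpotQA example carries the question, the answer, a set of context paragraphs, and sentence-level supporting facts as (paragraph title, sentence index) pairs. A short sketch of walking that structure (the file name is illustrative):

```python
import json

with open("hotpot_train_v1.1.json") as f:   # illustrative file name
    data = json.load(f)

ex = data[0]
print(ex["question"], "->", ex["answer"])

# "context" is a list of [title, list-of-sentences] paragraphs.
context = {title: sents for title, sents in ex["context"]}

# "supporting_facts" points at the gold sentences needed for the answer.
for title, sent_id in ex["supporting_facts"]:
    print(title, ":", context[title][sent_id])
```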

Unsupervised Depth Estimation, 3D Face Rotation and Replacement

Title Unsupervised Depth Estimation, 3D Face Rotation and Replacement
Authors Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Christopher Pal
Abstract We present an unsupervised approach for learning to estimate three dimensional (3D) facial structure from a single image while also predicting 3D viewpoint transformations that match a desired pose and facial geometry. We achieve this by inferring the depth of facial keypoints of an input image in an unsupervised manner, without using any form of ground-truth depth information. We show how it is possible to use these depths as intermediate computations within a new backpropable loss to predict the parameters of a 3D affine transformation matrix that maps inferred 3D keypoints of an input face to the corresponding 2D keypoints on a desired target facial geometry or pose. Our resulting approach, called DepthNets, can therefore be used to infer plausible 3D transformations from one face pose to another, allowing faces to be frontalized, transformed into 3D models or even warped to another pose and facial geometry. Lastly, we identify certain shortcomings with our formulation, and explore adversarial image translation techniques as a post-processing step to re-synthesize complete head shots for faces re-targeted to different poses or identities.
Tasks Depth Estimation
Published 2018-03-25
URL http://arxiv.org/abs/1803.09202v5
PDF http://arxiv.org/pdf/1803.09202v5.pdf
PWC https://paperswithcode.com/paper/unsupervised-depth-estimation-3d-face
Repo https://github.com/joelmoniz/DepthNets
Framework pytorch
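
The geometric core is easy to state: lift the source 2D keypoints with their predicted depths into pseudo-3D, then solve a least-squares problem for the affine map that sends them to the target keypoints. A hedged sketch with placeholder keypoints and depths (in DepthNets the depths come from the trained network and the least-squares solve sits inside the loss):

```python
import torch

src_2d = torch.randn(66, 2)    # source keypoints (placeholder)
depth = torch.randn(66, 1)     # predicted per-keypoint depths (placeholder)
tgt_2d = torch.randn(66, 2)    # target-pose keypoints (placeholder)

# Pseudo-3D homogeneous coordinates: (x, y, predicted z, 1).
src_3d = torch.cat([src_2d, depth, torch.ones(66, 1)], dim=1)

# Least-squares affine map A minimizing ||src_3d @ A - tgt_2d||.
A = torch.linalg.lstsq(src_3d, tgt_2d).solution
warped = src_3d @ A            # source keypoints mapped toward the target pose
```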

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Title Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Authors Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
Abstract In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance amongst single model, discriminative parsers in the PTB dataset and outperforms previous models in the CTB dataset.
Tasks Constituency Parsing
Published 2018-06-11
URL http://arxiv.org/abs/1806.04168v1
PDF http://arxiv.org/pdf/1806.04168v1.pdf
PWC https://paperswithcode.com/paper/straight-to-the-tree-constituency-parsing
Repo https://github.com/sordonia/distance_parser
Framework pytorch
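
Decoding a tree from syntactic distances is a clean recursive procedure: split the sentence at the position with the largest distance, then recurse on both halves. A minimal sketch (distance `i` sits between words `i` and `i+1`; the model that predicts the distances is omitted):

```python
def build_tree(words, distances):
    """Top-down decoding: split at the largest syntactic distance, recurse."""
    if len(words) == 1:
        return words[0]
    k = max(range(len(distances)), key=distances.__getitem__)
    left = build_tree(words[:k + 1], distances[:k])
    right = build_tree(words[k + 1:], distances[k + 1:])
    return (left, right)

# Largest distance (0.9) separates "sat" -> (('the', 'cat'), 'sat')
print(build_tree(["the", "cat", "sat"], [0.3, 0.9]))
```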

Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages

Title Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages
Authors Nurendra Choudhary, Rajat Singh, Ishita Bindlish, Manish Shrivastava
Abstract Code-mixed data poses an important challenge for natural language processing because its characteristics differ completely from the traditional structures of standard languages. In this paper, we propose a novel approach called Sentiment Analysis of Code-Mixed Text (SACMT) to classify sentences into their corresponding sentiment - positive, negative, or neutral - using contrastive learning. We utilize the shared parameters of siamese networks to map the sentences of code-mixed and standard languages to a common sentiment space. Additionally, we introduce a basic clustering-based preprocessing method to capture variations of code-mixed transliterated words. Our experiments reveal that SACMT outperforms the state-of-the-art approaches in sentiment analysis for code-mixed text by 7.6% in accuracy and 10.1% in F-score.
Tasks Sentiment Analysis
Published 2018-04-03
URL http://arxiv.org/abs/1804.00806v1
PDF http://arxiv.org/pdf/1804.00806v1.pdf
PWC https://paperswithcode.com/paper/sentiment-analysis-of-code-mixed-languages
Repo https://github.com/mankadronit/60DaysofUdacity-Challenge
Framework pytorch
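
The SACMT setup is a siamese pair of shared-weight encoders trained with a contrastive loss, pulling same-sentiment pairs together across the code-mixed and standard languages. A hedged sketch with placeholder sentence embeddings and a stand-in encoder (the paper's actual encoder and preprocessing are not reproduced here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared-weight encoder applied to both branches of the siamese network.
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))

def contrastive_loss(a, b, same, margin=1.0):
    d = F.pairwise_distance(encoder(a), encoder(b))
    # Pull same-sentiment pairs together, push others past the margin.
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

x_mixed = torch.randn(32, 300)   # code-mixed sentence embeddings (placeholder)
x_std = torch.randn(32, 300)     # standard-language embeddings (placeholder)
same = torch.randint(0, 2, (32,)).float()
loss = contrastive_loss(x_mixed, x_std, same)
```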