Paper Group AWR 160
Dataset Distillation. Revisiting Distillation and Incremental Classifier Learning. Using deep Q-learning to understand the tax evasion behavior of risk-averse firms. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. Masked Conditional Neural Networks for Environmental Sound Classification. ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model. Mining gold from implicit models to improve likelihood-free inference. Partial Adversarial Domain Adaptation. Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation. Machine Learning for Link Quality Estimation: A Survey. Zero-Shot Dual Machine Translation. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Unsupervised Depth Estimation, 3D Face Rotation and Replacement. Straight to the Tree: Constituency Parsing with Neural Syntactic Distance. Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages.
Dataset Distillation
Title | Dataset Distillation |
Authors | Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros |
Abstract | Model distillation aims to distill the knowledge of a complex model into a simpler one. In this paper, we consider an alternative formulation called dataset distillation: we keep the model fixed and instead attempt to distill the knowledge from a large training dataset into a small one. The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. For example, we show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization. We evaluate our method in various initialization settings and with different learning objectives. Experiments on multiple datasets show the advantage of our approach compared to alternative methods. |
Tasks | |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.10959v3 |
PDF | https://arxiv.org/pdf/1811.10959v3.pdf |
PWC | https://paperswithcode.com/paper/dataset-distillation |
Repo | https://github.com/SsnL/dataset-distillation |
Framework | pytorch |
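
The training objective lends itself to a compact sketch: one differentiable gradient step on the learned synthetic images should minimize the loss on real data. Below is a minimal PyTorch sketch under the paper's fixed-initialization setting; the toy model, image count, and step sizes are illustrative choices, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy MNIST net
# Fixed network initialization, as in the paper's fixed-init setting.
params = {k: v.detach().requires_grad_(True) for k, v in model.named_parameters()}

x_syn = torch.randn(10, 1, 28, 28, requires_grad=True)  # one image per class
y_syn = torch.arange(10)
lr_inner = torch.tensor(0.02, requires_grad=True)       # learned inner step size
opt = torch.optim.Adam([x_syn, lr_inner], lr=1e-2)

def outer_loss(x_real, y_real):
    # Inner step: one SGD update of the net on the synthetic batch, kept
    # differentiable w.r.t. the synthetic images (create_graph=True).
    loss_in = F.cross_entropy(functional_call(model, params, (x_syn,)), y_syn)
    grads = torch.autograd.grad(loss_in, list(params.values()), create_graph=True)
    updated = {k: p - lr_inner * g for (k, p), g in zip(params.items(), grads)}
    # Outer objective: after that one step, the net should fit real data.
    return F.cross_entropy(functional_call(model, updated, (x_real,)), y_real)

# x_real, y_real = next(iter(mnist_loader))  # a real MNIST batch
# opt.zero_grad(); outer_loss(x_real, y_real).backward(); opt.step()
```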
Revisiting Distillation and Incremental Classifier Learning
Title | Revisiting Distillation and Incremental Classifier Learning |
Authors | Khurram Javed, Faisal Shafait |
Abstract | One of the key differences between the learning mechanism of humans and Artificial Neural Networks (ANNs) is the ability of humans to learn one task at a time. ANNs, on the other hand, can only learn multiple tasks simultaneously. Any attempts at learning new tasks incrementally cause them to completely forget about previous tasks. This lack of ability to learn incrementally, called Catastrophic Forgetting, is considered a major hurdle in building a true AI system. In this paper, our goal is to isolate the truly effective existing ideas for incremental learning from those that only work under certain conditions. To this end, we first thoroughly analyze the current state-of-the-art (iCaRL) method for incremental learning and demonstrate that its good performance is not due to the reasons presented in the existing literature. We conclude that the success of iCaRL is primarily due to knowledge distillation, and we identify a key limitation of knowledge distillation, i.e., it often leads to bias in classifiers. Finally, we propose a dynamic threshold moving algorithm that successfully removes this bias. We demonstrate the effectiveness of our algorithm on the CIFAR100 and MNIST datasets, showing near-optimal results. Our implementation is available at https://github.com/Khurramjaved96/incremental-learning. |
Tasks | |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02802v2 |
PDF | http://arxiv.org/pdf/1807.02802v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-distillation-and-incremental |
Repo | https://github.com/einavyog/my-incremental-learning |
Framework | pytorch |
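
Two pieces carry the result: a temperature-scaled distillation loss that transfers old-class knowledge, and a test-time rescaling of class probabilities to undo the bias it introduces. A minimal sketch, with the temperature and the scaling vector as illustrative stand-ins for the paper's dynamically computed thresholds:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soft-target cross-entropy at temperature T (Hinton-style distillation).
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return -(p_teacher * log_p_student).sum(dim=1).mean() * T * T

def threshold_moving_predict(logits, class_scale):
    # Bias correction at test time: rescale each class probability by a
    # per-class threshold vector before taking the argmax.
    return (F.softmax(logits, dim=1) * class_scale).argmax(dim=1)
```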
Using deep Q-learning to understand the tax evasion behavior of risk-averse firms
Title | Using deep Q-learning to understand the tax evasion behavior of risk-averse firms |
Authors | Nikolaos D. Goumagias, Dimitrios Hristu-Varsakelis, Yannis M. Assael |
Abstract | Designing tax policies that are effective in curbing tax evasion and maximizing state revenues requires a rigorous understanding of taxpayer behavior. This work explores the problem of determining the strategy that a self-interested, risk-averse tax entity is expected to follow as it “navigates” - in the context of a Markov Decision Process - a government-controlled tax environment that includes random audits, penalties and occasional tax amnesties. Although simplified versions of this problem have been explored previously, the mere assumption of risk-aversion (as opposed to risk-neutrality) raises the complexity of finding the optimal policy well beyond the reach of analytical techniques. Here, we obtain approximate solutions via a combination of Q-learning and recent advances in Deep Reinforcement Learning. By doing so, we i) determine the tax evasion behavior expected of the taxpayer entity, ii) calculate the degree of risk aversion of the “average” entity given empirical estimates of tax evasion, and iii) evaluate sample tax policies in terms of expected revenues. Our model can be useful as a testbed for “in-vitro” testing of tax policies, while our results lead to various policy recommendations. |
Tasks | Q-Learning |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09466v1 |
PDF | http://arxiv.org/pdf/1801.09466v1.pdf |
PWC | https://paperswithcode.com/paper/using-deep-q-learning-to-understand-the-tax |
Repo | https://github.com/iassael/tax-evasion-dqn |
Framework | pytorch |
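
The solution machinery is standard deep Q-learning; the sketch below shows a generic update, with the tax MDP's state encoding (audit history, amnesty flags, declared income, and so on) reduced to a placeholder vector and the risk-averse utility of after-tax wealth assumed to be baked into the reward.

```python
import torch
import torch.nn as nn

n_state, n_action, gamma = 8, 4, 0.95  # placeholder sizes for the tax MDP
q_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target_net = nn.Sequential(nn.Linear(n_state, 64), nn.ReLU(), nn.Linear(64, n_action))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    # Bellman target: r + gamma * max_a' Q_target(s', a') for non-terminal s';
    # here r would be the firm's (risk-averse) utility of after-tax wealth.
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```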
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Title | Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence |
Authors | Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, Philip H. S. Torr |
Abstract | Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for a better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue when preserving knowledge, IL also suffers from a problem we call intransigence, the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual], with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on the MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior accuracy and also provides a better trade-off between forgetting and intransigence. |
Tasks | |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.10112v3 |
PDF | http://arxiv.org/pdf/1801.10112v3.pdf |
PWC | https://paperswithcode.com/paper/riemannian-walk-for-incremental-learning |
Repo | https://github.com/facebookresearch/agem |
Framework | tf |
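
RWalk's full importance score combines a KL-based Fisher term with an optimization-path term; the sketch below shows only the EWC++-style building blocks it generalizes: the quadratic penalty on parameter movement and the running Fisher estimate. The lambda and alpha values are illustrative.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    # fisher / old_params: dicts keyed by parameter name, snapshotted after
    # the previous task; movement of important parameters is penalized.
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

def update_fisher_online(fisher, model, alpha=0.9):
    # EWC++ keeps an exponential moving average of squared gradients rather
    # than re-estimating the Fisher from scratch after every task.
    for name, p in model.named_parameters():
        if p.grad is not None:
            fisher[name] = alpha * fisher[name] + (1 - alpha) * p.grad.detach() ** 2
    return fisher

# fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
```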
Masked Conditional Neural Networks for Environmental Sound Classification
Title | Masked Conditional Neural Networks for Environmental Sound Classification |
Authors | Fady Medhat, David Chesmore, John Robinson |
Abstract | The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of the sound signal as represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding filterbank-like sparseness over the network’s links using a binary mask. Additionally, the masking automates the exploration of different feature combinations concurrently, analogous to handcrafting the optimum combination of features for a recognition task. We evaluated MCLNN on the Urbansound8k dataset of environmental sounds. Additionally, we present YorNoise, a collection of manually recorded rail and road traffic sounds, to investigate the confusion rates among machine-generated sounds possessing low-frequency components. MCLNN achieved competitive results without augmentation, using 12% of the trainable parameters of an equivalent model based on state-of-the-art Convolutional Neural Networks on Urbansound8k. We extended Urbansound8k with YorNoise; experiments show that common tonal properties affect classification performance. |
Tasks | Environmental Sound Classification |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10004v2 |
PDF | http://arxiv.org/pdf/1805.10004v2.pdf |
PWC | https://paperswithcode.com/paper/masked-conditional-neural-networks-for-1 |
Repo | https://github.com/fadymedhat/MCLNN |
Framework | tf |
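
The MCLNN additionally conditions each frame on its temporal neighbours; the sketch below isolates just the masking idea: a fixed binary band mask multiplied into a dense layer's weights so that each hidden unit sees only a narrow band of frequency bins. Bandwidth and stride are my own picks, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def band_mask(n_out, n_in, bandwidth=20, stride=5):
    # Filterbank-like sparsity: row j passes bins [start, start + bandwidth).
    mask = torch.zeros(n_out, n_in)
    for j in range(n_out):
        start = (j * stride) % n_in
        mask[j, start:start + bandwidth] = 1.0
    return mask

class MaskedLinear(nn.Linear):
    def __init__(self, n_in, n_out):
        super().__init__(n_in, n_out)
        self.register_buffer("mask", band_mask(n_out, n_in))

    def forward(self, x):
        # The mask zeroes most links, so each unit learns from one band.
        return F.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(256, 64)        # 256 frequency bins -> 64 hidden units
out = layer(torch.randn(8, 256))     # batch of spectrogram frames
```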
ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
Title | ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model |
Authors | Haichuan Yang, Yuhao Zhu, Ji Liu |
Abstract | Many DNN-enabled vision applications, such as those on unmanned aerial vehicles, Augmented Reality headsets, and smartphones, constantly operate under severe energy constraints. Designing DNNs that can meet a stringent energy budget is becoming increasingly important. This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. The key idea of ECC is to model the DNN energy consumption via a novel bilinear regression function. The energy estimation model allows us to formulate DNN compression as a constrained optimization that minimizes the DNN loss function under the energy constraint. The optimization problem, however, has nontrivial constraints, so existing deep learning solvers do not apply directly. We propose an optimization algorithm that combines the essence of the Alternating Direction Method of Multipliers (ADMM) framework with gradient-based learning algorithms. The algorithm decomposes the original constrained optimization into several subproblems that are solved iteratively and efficiently. ECC is also portable across different hardware platforms without requiring hardware knowledge. Experiments show that ECC achieves higher accuracy under the same or lower energy budget compared to state-of-the-art resource-constrained DNN compression techniques. |
Tasks | Neural Network Compression |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01803v3 |
PDF | http://arxiv.org/pdf/1812.01803v3.pdf |
PWC | https://paperswithcode.com/paper/ecc-energy-constrained-deep-neural-network |
Repo | https://github.com/hyang1990/energy_constrained_compression |
Framework | pytorch |
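
A hedged sketch of the energy-model idea follows: per-layer energy is assumed bilinear in the nonzero-channel counts of adjacent layers, and the coefficients are fit to a handful of platform measurements by ordinary least squares. The exact parameterization in the paper may differ; the feature construction here is illustrative.

```python
import numpy as np

def fit_bilinear_energy(S, energies):
    # S: (n_samples, n_layers) nonzero-channel counts measured per setting;
    # features: products of adjacent-layer counts (bilinear terms) + bias.
    feats = np.stack([S[:, j] * S[:, j + 1] for j in range(S.shape[1] - 1)], axis=1)
    X = np.hstack([feats, np.ones((len(S), 1))])
    coef, *_ = np.linalg.lstsq(X, energies, rcond=None)
    return coef  # predicted energy for a new setting: its feature row @ coef

# Usage: measure energy for a few dozen sparsity settings on the target
# platform, fit once, then use the model as the constraint in compression.
```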
Mining gold from implicit models to improve likelihood-free inference
Title | Mining gold from implicit models to improve likelihood-free inference |
Authors | Johann Brehmer, Gilles Louppe, Juan Pavez, Kyle Cranmer |
Abstract | Simulators often provide the best description of real-world phenomena. However, they also lead to challenging inverse problems because the density they implicitly define is often intractable. We present a new suite of simulation-based inference techniques that go beyond the traditional Approximate Bayesian Computation approach, which struggles in a high-dimensional setting, and extend methods that use surrogate models based on neural networks. We show that additional information, such as the joint likelihood ratio and the joint score, can often be extracted from simulators and used to augment the training data for these surrogate models. Finally, we demonstrate that these new techniques are more sample efficient and provide higher-fidelity inference than traditional methods. |
Tasks | |
Published | 2018-05-30 |
URL | https://arxiv.org/abs/1805.12244v4 |
PDF | https://arxiv.org/pdf/1805.12244v4.pdf |
PWC | https://paperswithcode.com/paper/mining-gold-from-implicit-models-to-improve |
Repo | https://github.com/johannbrehmer/simulator-mining-example |
Framework | none |
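
One member of the suite, sketched under heavy simplification: a surrogate network regresses the likelihood ratio toward the joint ratio r(x, z) that can be "mined" from the simulator's latent trajectory (a ROLR-style squared loss; the architecture, input size, and variable names are placeholders).

```python
import torch
import torch.nn as nn

log_r_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(log_r_net.parameters(), lr=1e-3)

def rolr_loss(x, joint_ratio, from_numerator):
    # joint_ratio: r(x, z | theta0, theta1) extracted from the simulator;
    # from_numerator: 1.0 if x was simulated under theta0, else 0.0.
    r_hat = torch.exp(log_r_net(x).squeeze(1))
    # Both terms are minimized in expectation by the true ratio r(x):
    loss_num = from_numerator * (1.0 / r_hat - 1.0 / joint_ratio) ** 2
    loss_den = (1.0 - from_numerator) * (r_hat - joint_ratio) ** 2
    return (loss_num + loss_den).mean()
```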
Partial Adversarial Domain Adaptation
Title | Partial Adversarial Domain Adaptation |
Authors | Zhangjie Cao, Lijia Ma, Mingsheng Long, Jianmin Wang |
Abstract | Domain adversarial learning aligns the feature distributions across the source and target domains in a two-player minimax game. Existing domain adversarial networks generally assume an identical label space across different domains. In the presence of big data, there is strong motivation to transfer deep models from existing big domains to unknown small domains. This paper introduces partial domain adaptation as a new domain adaptation scenario, which relaxes the fully shared label space assumption to one where the source label space subsumes the target label space. Previous methods typically match the whole source domain to the target domain; they are vulnerable to negative transfer in the partial domain adaptation problem due to the large mismatch between label spaces. We present Partial Adversarial Domain Adaptation (PADA), which simultaneously alleviates negative transfer by down-weighting the data of outlier source classes when training both the source classifier and the domain adversary, and promotes positive transfer by matching the feature distributions in the shared label space. Experiments show that PADA exceeds state-of-the-art results for partial domain adaptation tasks on several datasets. |
Tasks | Domain Adaptation, Partial Domain Adaptation |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.04205v1 |
PDF | http://arxiv.org/pdf/1808.04205v1.pdf |
PWC | https://paperswithcode.com/paper/partial-adversarial-domain-adaptation |
Repo | https://github.com/thuml/PADA |
Framework | pytorch |
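
The down-weighting step fits in a few lines: average the target domain's predicted class probabilities to get one weight per source class, so classes absent from the target label space end up with weights near zero. A minimal sketch, assuming a standard softmax source classifier:

```python
import torch

def pada_class_weights(target_logits):
    # target_logits: (n_target, C) source-classifier outputs on target data.
    probs = torch.softmax(target_logits, dim=1)
    gamma = probs.mean(dim=0)        # average predicted probability per class
    return gamma / gamma.max()       # normalize so shared classes stay near 1

# Each source example's classification and domain-adversarial losses are
# then rescaled by the weight of its ground-truth class.
```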
Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation
Title | Let’s take a Walk on Superpixels Graphs: Deformable Linear Objects Segmentation and Model Estimation |
Authors | Daniele De Gregorio, Gianluca Palli, Luigi Di Stefano |
Abstract | While robotic manipulation of rigid objects is quite straightforward, coping with deformable objects is an open issue. More specifically, tasks like tying a knot, wiring a connector or even surgical suturing deal with the domain of Deformable Linear Objects (DLOs). In particular, the detection of a DLO is a non-trivial problem, especially under clutter and occlusions (as well as self-occlusions). The pose estimation of a DLO results in the identification of its parameters with respect to a designed model, e.g. a basis spline. It follows that the stand-alone segmentation of a DLO might not be sufficient to conduct a full manipulation task. This is why we propose a novel framework able to perform both semantic segmentation and b-spline modeling of multiple deformable linear objects simultaneously, without strict requirements on the environment (i.e. the background). The core algorithm is based on biased random walks over the Region Adjacency Graph built on a superpixel oversegmentation of the source image. The algorithm is initialized by a Convolutional Neural Network that detects the DLO’s endcaps. An open source implementation of the proposed approach is also provided to ease reproduction of the whole detection pipeline, along with a novel cables dataset to encourage further experiments. |
Tasks | Pose Estimation, Semantic Segmentation |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04461v1 |
PDF | http://arxiv.org/pdf/1810.04461v1.pdf |
PWC | https://paperswithcode.com/paper/lets-take-a-walk-on-superpixels-graphs |
Repo | https://github.com/m4nh/cables_dataset |
Framework | none |
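
A minimal sketch of the traversal, with a plain adjacency dict standing in for the Region Adjacency Graph and a generic score table standing in for the paper's appearance/curvature bias:

```python
import random

def biased_walk(adj, scores, start, steps=50, seed=0):
    # adj: {node: [neighbor, ...]}; scores: {(u, v): nonnegative bias}.
    rng = random.Random(seed)
    path, node = [start], start
    for _ in range(steps):
        nbrs = [n for n in adj[node] if n not in path]  # no revisits
        if not nbrs:
            break
        weights = [scores.get((node, n), 1e-6) for n in nbrs]
        node = rng.choices(nbrs, weights=weights, k=1)[0]
        path.append(node)
    return path

# `start` would be a superpixel flagged by the endcap-detection CNN; the
# walk then traces the cable node by node through the superpixel graph.
```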
Machine Learning for Link Quality Estimation: A Survey
Title | Machine Learning for Link Quality Estimation: A Survey |
Authors | Gregor Cerar, Halil Yetgin, Mihael Mohorčič, Carolina Fortuna |
Abstract | Since the emergence of wireless communication networks, a plethora of research papers have focused their attention on the quality aspects of wireless links. The analysis of the rich body of existing literature on link quality estimation using models developed from data traces indicates that the techniques used for modeling link quality estimation are becoming increasingly sophisticated. A number of recent estimators leverage machine learning (ML) techniques that require a sophisticated design and development process, each step of which has great potential to significantly affect the overall model performance. In this paper, we provide a comprehensive survey on link quality estimators developed from empirical data and review a rich variety of the existing open source datasets. We then perform a systematic analysis to reveal the influence of the design decisions taken in each step of the ML process on the final performance of ML-based link quality estimators. One substantial lesson learned is that measurement data preprocessing and feature engineering have a greater influence on model performance than the choice of ML algorithm. |
Tasks | Feature Engineering, Link Quality Estimation |
Published | 2018-12-07 |
URL | https://arxiv.org/abs/1812.08856v5 |
PDF | https://arxiv.org/pdf/1812.08856v5.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-machine-learning-for-link-quality |
Repo | https://github.com/sensorlab/link-quality-estimation |
Framework | none |
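
That lesson maps directly onto how one would structure an estimator: put the effort into the preprocessing and feature stages before swapping classifiers. A sketch with scikit-learn, where the feature set (windowed RSSI/SNR/PRR statistics per link) is a placeholder for real trace data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

lqe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # link traces have gaps
    ("scale", StandardScaler()),                   # put RSSI/SNR on one scale
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
# lqe.fit(X_train, y_train)  # y: link-quality classes, e.g. good/intermediate/bad
```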
Zero-Shot Dual Machine Translation
Title | Zero-Shot Dual Machine Translation |
Authors | Lierni Sestorain, Massimiliano Ciaramita, Christian Buck, Thomas Hofmann |
Abstract | Neural Machine Translation (NMT) systems rely on large amounts of parallel data. This is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we present an approach that combines zero-shot and dual learning. The latter relies on reinforcement learning to exploit the duality of the machine translation task, and requires only monolingual data for the target language pair. Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms by large margins a standard NMT system in zero-shot translation performance on Spanish-French (both directions). The zero-shot dual method comes within 2.2 BLEU points of a comparable supervised setting. Our method also obtains improvements in the setting where a small amount of parallel data for the zero-shot language pair is available. When we add Russian, extending our experiments to jointly modeling 6 zero-shot translation directions, all directions improve by between 4 and 15 BLEU points, again reaching performance near that of the supervised setting. |
Tasks | Machine Translation |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10338v1 |
PDF | http://arxiv.org/pdf/1805.10338v1.pdf |
PWC | https://paperswithcode.com/paper/zero-shot-dual-machine-translation |
Repo | https://github.com/liernisestorain/zero-shot-dual-MT |
Framework | tf |
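
The dual-learning signal is compact to state: sample a forward translation, score it with a target-side language model and with the backward model's reconstruction probability, and use the combination as a REINFORCE reward, so only monolingual data is needed. A sketch with the two scorers left as stand-in callables:

```python
def dual_reward(src_sent, sampled_translation, lm_logprob, backward_logprob,
                alpha=0.5):
    # lm_logprob(y): fluency of the sampled translation under a target LM.
    # backward_logprob(x, y): log P(x | y), the reverse model's reconstruction.
    r_lm = lm_logprob(sampled_translation)
    r_rec = backward_logprob(src_sent, sampled_translation)
    return alpha * r_lm + (1.0 - alpha) * r_rec

# REINFORCE then raises the forward model's log-probability of samples in
# proportion to this reward; both translation directions are updated in turn.
```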
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Title | HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering |
Authors | Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning |
Abstract | Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions. |
Tasks | Question Answering |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09600v1 |
PDF | http://arxiv.org/pdf/1809.09600v1.pdf |
PWC | https://paperswithcode.com/paper/hotpotqa-a-dataset-for-diverse-explainable |
Repo | https://github.com/facebookresearch/UnsupervisedDecomposition |
Framework | pytorch |
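
The sentence-level supporting facts are what make the dataset distinctive. A quick way to inspect them, with field names as in the released JSON (context holds [title, sentences] pairs; supporting_facts holds [title, sentence_id] pairs):

```python
import json

with open("hotpot_train_v1.1.json") as f:  # from hotpotqa.github.io
    data = json.load(f)

ex = data[0]
print(ex["question"], "->", ex["answer"])
sentences = {title: sents for title, sents in ex["context"]}
for title, sent_id in ex["supporting_facts"]:
    print(f"[{title}]", sentences[title][sent_id])  # a gold supporting sentence
```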
Unsupervised Depth Estimation, 3D Face Rotation and Replacement
Title | Unsupervised Depth Estimation, 3D Face Rotation and Replacement |
Authors | Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Christopher Pal |
Abstract | We present an unsupervised approach for learning to estimate three dimensional (3D) facial structure from a single image while also predicting 3D viewpoint transformations that match a desired pose and facial geometry. We achieve this by inferring the depth of facial keypoints of an input image in an unsupervised manner, without using any form of ground-truth depth information. We show how it is possible to use these depths as intermediate computations within a new backpropable loss to predict the parameters of a 3D affine transformation matrix that maps inferred 3D keypoints of an input face to the corresponding 2D keypoints on a desired target facial geometry or pose. Our resulting approach, called DepthNets, can therefore be used to infer plausible 3D transformations from one face pose to another, allowing faces to be frontalized, transformed into 3D models or even warped to another pose and facial geometry. Lastly, we identify certain shortcomings with our formulation, and explore adversarial image translation techniques as a post-processing step to re-synthesize complete head shots for faces re-targeted to different poses or identities. |
Tasks | Depth Estimation |
Published | 2018-03-25 |
URL | http://arxiv.org/abs/1803.09202v5 |
PDF | http://arxiv.org/pdf/1803.09202v5.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-depth-estimation-3d-face |
Repo | https://github.com/joelmoniz/DepthNets |
Framework | pytorch |
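
The geometric core has a closed form: once depths are predicted for the source keypoints, the 2x4 affine map onto the target's 2D keypoints is an ordinary least-squares solve. A sketch with random stand-ins for the keypoints and for the network's depth predictions:

```python
import numpy as np

def fit_affine(src_xy, depth, tgt_xy):
    # src_xy, tgt_xy: (K, 2) keypoints; depth: (K,) pseudo-depths per keypoint.
    A = np.hstack([src_xy, depth[:, None], np.ones((len(src_xy), 1))])  # (K, 4)
    M, *_ = np.linalg.lstsq(A, tgt_xy, rcond=None)                      # (4, 2)
    return M.T  # (2, 4) affine map: tgt ~= M @ [x, y, d, 1]

K = 68  # a common facial-keypoint count
src, tgt = np.random.rand(K, 2), np.random.rand(K, 2)
d = np.random.rand(K)           # stand-in for DepthNets' predicted depths
M = fit_affine(src, d, tgt)
```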
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Title | Straight to the Tree: Constituency Parsing with Neural Syntactic Distance |
Authors | Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio |
Abstract | In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, one for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance among single-model discriminative parsers on the PTB dataset and outperforms previous models on the CTB dataset. |
Tasks | Constituency Parsing |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04168v1 |
PDF | http://arxiv.org/pdf/1806.04168v1.pdf |
PWC | https://paperswithcode.com/paper/straight-to-the-tree-constituency-parsing |
Repo | https://github.com/sordonia/distance_parser |
Framework | pytorch |
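
The decoding half of the scheme takes only a few lines: given one real-valued distance per split point, recursively split at the largest distance; the higher a split point ranks, the higher it sits in the tree. A minimal sketch producing an unlabeled binary tree:

```python
def build_tree(words, distances):
    # len(distances) == len(words) - 1; distances[i] sits between word i and i+1.
    if len(words) == 1:
        return words[0]
    i = max(range(len(distances)), key=distances.__getitem__)
    left = build_tree(words[: i + 1], distances[:i])
    right = build_tree(words[i + 1 :], distances[i + 1 :])
    return (left, right)

print(build_tree(["the", "cat", "sat"], [1.2, 3.4]))
# (('the', 'cat'), 'sat') -- the largest distance is split first, at the root
```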
Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages
Title | Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich Languages |
Authors | Nurendra Choudhary, Rajat Singh, Ishita Bindlish, Manish Shrivastava |
Abstract | Code-mixed data is an important challenge for natural language processing because its characteristics differ entirely from the traditional structures of standard languages. In this paper, we propose a novel approach called Sentiment Analysis of Code-Mixed Text (SACMT) to classify sentences into their corresponding sentiment: positive, negative, or neutral, using contrastive learning. We utilize the shared parameters of siamese networks to map the sentences of code-mixed and standard languages to a common sentiment space. Additionally, we introduce a basic clustering-based preprocessing method to capture variations of code-mixed transliterated words. Our experiments reveal that SACMT outperforms the state-of-the-art approaches in sentiment analysis for code-mixed text by 7.6% in accuracy and 10.1% in F-score. |
Tasks | Sentiment Analysis |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00806v1 |
PDF | http://arxiv.org/pdf/1804.00806v1.pdf |
PWC | https://paperswithcode.com/paper/sentiment-analysis-of-code-mixed-languages |
Repo | https://github.com/mankadronit/60DaysofUdacity-Challenge |
Framework | pytorch |
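
The shared-space objective reduces to a siamese encoder plus a contrastive loss that pulls same-sentiment pairs together and pushes different-sentiment pairs apart. A minimal sketch; the encoder, input features, and margin are simplifications rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))

def contrastive_loss(x_codemixed, x_standard, same_sentiment, margin=1.0):
    # same_sentiment: 1.0 where the pair shares a sentiment label, else 0.0.
    # The same encoder (shared weights) embeds both sides of the pair.
    d = F.pairwise_distance(encoder(x_codemixed), encoder(x_standard))
    pos = same_sentiment * d.pow(2)                            # pull together
    neg = (1.0 - same_sentiment) * F.relu(margin - d).pow(2)   # push apart
    return (pos + neg).mean()

loss = contrastive_loss(torch.randn(16, 300), torch.randn(16, 300),
                        (torch.rand(16) > 0.5).float())
```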