Paper Group AWR 143
Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning. CatBoost: unbiased boosting with categorical features. Rethinking Feature Discrimination and Polymerization for Large-scale Recognition. Generating Sentences by Editing Prototypes. EnzyNet: enzyme classification using 3D convolutional neural networ …
Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning
Title | Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning |
Authors | Morgan A. Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Maurice Ngolè Mboula, David Coeurjolly, Marco Cuturi, Gabriel Peyré, Jean-Luc Starck |
Abstract | This paper introduces a new nonlinear dictionary learning method for histograms in the probability simplex. The method leverages optimal transport theory, in the sense that our aim is to reconstruct histograms using so-called displacement interpolations (a.k.a. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex. Our method simultaneously estimates such atoms, and, for each datapoint, the vector of weights that can optimally reconstruct it as an optimal transport barycenter of such atoms. Our method is computationally tractable thanks to the addition of an entropic regularization to the usual optimal transportation problem, leading to an approximation scheme that is efficient, parallel and simple to differentiate. Both atoms and weights are learned using a gradient-based descent method. Gradients are obtained by automatic differentiation of the generalized Sinkhorn iterations that yield barycenters with entropic smoothing. Because of its formulation relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, our method allows for nonlinear relationships between atoms and the reconstruction of input data. We illustrate its application in several different image processing settings. |
Tasks | Dictionary Learning |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.01955v3 |
http://arxiv.org/pdf/1708.01955v3.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-dictionary-learning-optimal |
Repo | https://github.com/matthieuheitz/WassersteinDictionaryLearning |
Framework | none |
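A minimal NumPy sketch of the entropy-regularized (Sinkhorn) barycenter iteration that the abstract builds on, with atoms and weights held fixed; the learning loop, the automatic differentiation through these iterations, and all variable names are assumptions, not the authors' implementation.

```python
import numpy as np

def sinkhorn_barycenter(atoms, weights, cost, eps=1e-2, n_iter=100):
    """Entropy-regularized Wasserstein barycenter of histogram `atoms`
    (shape: n_atoms x n_bins) with barycentric `weights` summing to 1."""
    K = np.exp(-cost / eps)                 # Gibbs kernel from the ground cost
    v = np.ones_like(atoms)                 # one scaling vector per atom
    for _ in range(n_iter):
        u = atoms / (v @ K.T)               # u_k = p_k / (K v_k)
        b = np.exp(weights @ np.log(u @ K)) # weighted geometric mean of K^T u_k
        v = b[None, :] / (u @ K)            # v_k = b / (K^T u_k)
    return b

# toy usage: barycenter of two 1-D histograms on a grid (hypothetical data)
x = np.linspace(0, 1, 50)
cost = (x[:, None] - x[None, :]) ** 2
a1 = np.exp(-(x - 0.2) ** 2 / 0.005); a1 /= a1.sum()
a2 = np.exp(-(x - 0.8) ** 2 / 0.005); a2 /= a2.sum()
bary = sinkhorn_barycenter(np.stack([a1, a2]), np.array([0.5, 0.5]), cost)
```

In the paper, gradients with respect to both the atoms and the weights are obtained by differentiating through iterations of this kind; the sketch only shows the forward barycenter computation.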
CatBoost: unbiased boosting with categorical features
Title | CatBoost: unbiased boosting with categorical features |
Authors | Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin |
Abstract | This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results. |
Tasks | Dimensionality Reduction |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09516v5 |
http://arxiv.org/pdf/1706.09516v5.pdf | |
PWC | https://paperswithcode.com/paper/catboost-unbiased-boosting-with-categorical |
Repo | https://github.com/yumoh/catboost_iter |
Framework | none |
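The categorical-feature processing described above can be illustrated with ordered target statistics: each example's category is encoded using only the targets of examples that precede it in a random permutation, so the encoding never leaks the example's own target. The sketch below is a simplified re-implementation of that idea, not CatBoost's actual code, and the prior constant is illustrative.

```python
import numpy as np

def ordered_target_statistics(cats, y, prior=0.5, seed=None):
    """Leakage-free encoding of categorical column `cats` against target `y`:
    each example is encoded from the targets of examples that precede it in a
    random permutation (illustrative sketch of CatBoost's ordered idea)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y), dtype=float)
    for idx in order:
        c = cats[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + prior) / (n + 1)   # smoothed mean of *previous* targets
        sums[c] = s + y[idx]                   # only now add the current target
        counts[c] = n + 1
    return encoded

# toy usage with hypothetical data
cats = np.array(["a", "b", "a", "a", "b"])
y = np.array([1, 0, 1, 0, 1])
print(ordered_target_statistics(cats, y, seed=0))
```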
Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
Title | Rethinking Feature Discrimination and Polymerization for Large-scale Recognition |
Authors | Yu Liu, Hongyang Li, Xiaogang Wang |
Abstract | Feature matters. How to train a deep network to acquire discriminative features across categories and polymerized features within classes has always been at the core of many computer vision tasks, especially for large-scale recognition systems where test identities are unseen during training and the number of classes could be at million scale. In this paper, we address this problem based on the simple intuition that the cosine distance of features in high-dimensional space should be close enough within one class and far away across categories. To this end, we propose the congenerous cosine (COCO) algorithm to simultaneously optimize the cosine similarity among data. It inherits the softmax property to make inter-class features discriminative and shares the idea of a class centroid from metric learning. Unlike previous work where the center is a temporal, statistical variable within one mini-batch during training, the formulated centroid is responsible for clustering inner-class features to enforce them polymerized around the network truncus. COCO is bundled with discriminative training and learned end-to-end with stable convergence. Experiments on five benchmarks have been extensively conducted to verify the effectiveness of our approach on both small-scale classification tasks and the large-scale human recognition problem. |
Tasks | Metric Learning |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00870v2 |
http://arxiv.org/pdf/1710.00870v2.pdf | |
PWC | https://paperswithcode.com/paper/rethinking-feature-discrimination-and |
Repo | https://github.com/sciencefans/coco_loss |
Framework | none |
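A hedged PyTorch sketch of the scaled-cosine-to-centroid objective the abstract describes: features and class centroids are L2-normalized and a softmax cross-entropy is applied to scaled cosine similarities. The scale `alpha` and the treatment of centroids as plain learnable parameters are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def coco_style_loss(features, labels, centroids, alpha=10.0):
    """Congenerous-cosine-style loss (illustrative): scaled cosine similarity
    between normalized features and normalized class centroids, fed to a
    softmax cross-entropy so inter-class features stay discriminative."""
    f = F.normalize(features, dim=1)       # (batch, dim)
    c = F.normalize(centroids, dim=1)      # (num_classes, dim)
    logits = alpha * f @ c.t()             # scaled cosine similarities
    return F.cross_entropy(logits, labels)

# toy usage with hypothetical shapes
feats = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
centroids = torch.randn(10, 128, requires_grad=True)
loss = coco_style_loss(feats, labels, centroids)
```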
Generating Sentences by Editing Prototypes
Title | Generating Sentences by Editing Prototypes |
Authors | Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang |
Abstract | We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies. |
Tasks | Language Modelling |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.08878v2 |
http://arxiv.org/pdf/1709.08878v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-sentences-by-editing-prototypes |
Repo | https://github.com/kelvinguu/neural-editor |
Framework | pytorch |
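The prototype-then-edit generative story (sample a prototype sentence from the training corpus, then edit it into a new sentence) can be mirrored by the toy sketch below, where the learned neural edit is replaced with a single random word substitution; this only illustrates the sampling structure, not the model itself.

```python
import random

def prototype_then_edit(corpus, vocab, rng=random.Random(0)):
    """Toy generative process: pick a prototype sentence, then 'edit' it.
    The real model encodes the edit as a latent vector and decodes with a
    neural editor; here the edit is one random word substitution."""
    prototype = rng.choice(corpus)
    tokens = prototype.split()
    pos = rng.randrange(len(tokens))
    tokens[pos] = rng.choice(vocab)        # stand-in for the learned edit
    return prototype, " ".join(tokens)

# hypothetical corpus and vocabulary
corpus = ["the food was great", "service was slow but friendly"]
vocab = ["terrible", "amazing", "okay"]
print(prototype_then_edit(corpus, vocab))
```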
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Title | EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation |
Authors | Afshine Amidi, Shervine Amidi, Dimitrios Vlachakis, Vasileios Megalooikonomou, Nikos Paragios, Evangelia I. Zacharaki |
Abstract | During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence however is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06017v1 |
http://arxiv.org/pdf/1707.06017v1.pdf | |
PWC | https://paperswithcode.com/paper/enzynet-enzyme-classification-using-3d |
Repo | https://github.com/edraizen/molmimic |
Framework | pytorch |
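A minimal PyTorch sketch of a two-layer 3D CNN over binary voxel grids, in the spirit of the architecture described above; the grid size, channel counts, and the six EC top-level classes are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TinyVoxelCNN(nn.Module):
    """Two 3D conv layers over a binary occupancy grid, then a linear
    classifier over the 6 top-level Enzyme Commission classes (toy sizes)."""
    def __init__(self, grid=32, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.classifier = nn.Linear(32 * (grid // 4) ** 3, n_classes)

    def forward(self, voxels):              # voxels: (batch, 1, grid, grid, grid)
        h = self.features(voxels)
        return self.classifier(h.flatten(1))

logits = TinyVoxelCNN()(torch.zeros(2, 1, 32, 32, 32))
```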
Hybrid Isolation Forest - Application to Intrusion Detection
Title | Hybrid Isolation Forest - Application to Intrusion Detection |
Authors | Pierre-François Marteau, Saeid Soheily-Khah, Nicolas Béchet |
Abstract | From the identification of a drawback in the Isolation Forest (IF) algorithm that limits its use in the scope of anomaly detection, we propose two extensions that allow us firstly to overcome the previously mentioned limitation and secondly to provide it with some supervised learning capability. The resulting Hybrid Isolation Forest (HIF) that we propose is first evaluated on a synthetic dataset to analyze the effect of the new meta-parameters that are introduced and to verify that the addressed limitation of the IF algorithm is effectively overcome. We then compare the two algorithms on the ISCX benchmark dataset, in the context of a network intrusion detection application. Our experiments show that HIF outperforms IF, but also challenges the 1-class and 2-class SVM baselines while remaining computationally efficient. |
Tasks | Anomaly Detection, Intrusion Detection, Network Intrusion Detection |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03800v1 |
http://arxiv.org/pdf/1705.03800v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-isolation-forest-application-to |
Repo | https://github.com/pfmarteau/HIF |
Framework | none |
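The Isolation Forest baseline that HIF extends is available in scikit-learn; the sketch below runs that baseline and blends in a simple distance-based score to stand in for the extra supervised signal HIF introduces. The blending weight and the centroid-distance term are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                   # "normal" traffic (toy data)
X_test = np.vstack([rng.normal(size=(5, 2)), rng.normal(4, 1, size=(5, 2))])

iforest = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
if_score = -iforest.score_samples(X_test)             # higher = more anomalous

# Hypothetical hybrid step: blend in a distance to the training centroid,
# standing in for HIF's additional distance/supervised scores.
centroid = X_train.mean(axis=0)
dist_score = np.linalg.norm(X_test - centroid, axis=1)
dist_score = (dist_score - dist_score.min()) / (np.ptp(dist_score) + 1e-12)
hybrid_score = 0.5 * if_score + 0.5 * dist_score
```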
Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
Title | Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling |
Authors | Chaitanya Ahuja, Louis-Philippe Morency |
Abstract | Recurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but coupled) flows of information inside the units: a first flow along the time dimension and a second flow along the depth dimension. It also offers a symmetry in how information can flow horizontally and vertically. We analyze the effects of decoupling three different components of our LRU model: Reset Gate, Update Gate and Projected State. We evaluate this new family of LRU models on computational convergence rates and statistical efficiency. Our experiments are performed on four publicly-available datasets, comparing with Grid-LSTM and Recurrent Highway networks. Our results show that LRU has better empirical computational convergence rates and statistical efficiency values, along with learning more accurate language models. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02254v2 |
http://arxiv.org/pdf/1710.02254v2.pdf | |
PWC | https://paperswithcode.com/paper/lattice-recurrent-unit-improving-convergence |
Repo | https://github.com/simonnanty/8f6667310a94f0c18bda10a1d5ff578c |
Framework | none |
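As a rough illustration of the two coupled flows described above, the toy cell below takes one state arriving along the time axis and one arriving along the depth axis, and emits a new state in each direction through GRU-style gates; the gating equations are assumptions and do not reproduce the paper's LRU parameterization.

```python
import torch
import torch.nn as nn

class ToyLatticeCell(nn.Module):
    """Illustrative lattice-style cell: separate outputs for the time flow and
    the depth flow, computed from shared GRU-like gates (not the paper's exact
    equations, only the two-flow structure described in the abstract)."""
    def __init__(self, size):
        super().__init__()
        self.gates = nn.Linear(2 * size, 3 * size)    # reset, update, candidate
        self.out_depth = nn.Linear(size, size)

    def forward(self, h_time, h_depth):
        r, z, c = self.gates(torch.cat([h_time, h_depth], dim=-1)).chunk(3, dim=-1)
        r, z = torch.sigmoid(r), torch.sigmoid(z)
        cand = torch.tanh(c) * r
        new_time = (1 - z) * h_time + z * cand        # flow along time
        new_depth = torch.tanh(self.out_depth(cand))  # flow along depth
        return new_time, new_depth

cell = ToyLatticeCell(16)
t_out, d_out = cell(torch.zeros(2, 16), torch.zeros(2, 16))
```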
On a Formal Model of Safe and Scalable Self-driving Cars
Title | On a Formal Model of Safe and Scalable Self-driving Cars |
Authors | Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua |
Abstract | In recent years, car makers and tech companies have been racing towards self driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of this paper is to add to the equation two additional crucial parameters. The first is standardization of safety assurance — what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements. The second parameter is scalability — engineering solutions that lead to unleashed costs will not scale to millions of cars, which will push interest in this field into a niche academic corner, and drive the entire field into a “winter of autonomous driving”. In the first part of the paper we propose a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars. |
Tasks | Autonomous Driving, Self-Driving Cars |
Published | 2017-08-21 |
URL | http://arxiv.org/abs/1708.06374v6 |
http://arxiv.org/pdf/1708.06374v6.pdf | |
PWC | https://paperswithcode.com/paper/on-a-formal-model-of-safe-and-scalable-self |
Repo | https://github.com/PhilippeW83440/CarND-Path-Planning-Project |
Framework | none |
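RSS's safe longitudinal following rule can be written down directly: the rear car is assumed to accelerate at most a_max_accel during the response time, then brake at least at a_min_brake, while the front car may brake at up to a_max_brake. The function below is a sketch of that rule as commonly stated; the numeric parameters in the example are illustrative, not values from the paper.

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho,
                                   a_max_accel, a_min_brake, a_max_brake):
    """Minimum gap (meters) such that the rear car can always avoid hitting the
    front car: worst-case acceleration for the response time rho, then braking
    at a_min_brake, against a front car braking at up to a_max_brake."""
    v_rear_after = v_rear + rho * a_max_accel
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + v_rear_after ** 2 / (2 * a_min_brake)
         - v_front ** 2 / (2 * a_max_brake))
    return max(d, 0.0)

# illustrative parameters: both cars at 30 m/s, 0.5 s response time
print(rss_safe_longitudinal_distance(30, 30, 0.5, 2.0, 4.0, 8.0))
```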
Fine-tuning deep CNN models on specific MS COCO categories
Title | Fine-tuning deep CNN models on specific MS COCO categories |
Authors | Daniel Sonntag, Michael Barz, Jan Zacharias, Sven Stauden, Vahid Rahmani, Áron Fóthi, András Lőrincz |
Abstract | Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files that can then be used in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned. |
Tasks | |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01476v1 |
http://arxiv.org/pdf/1709.01476v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-tuning-deep-cnn-models-on-specific-ms |
Repo | https://github.com/DFKI-Interactive-Machine-Learning/py-faster-rcnn-ft |
Framework | none |
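The image-selection step described above (randomly choosing images that contain at least one object of the fine-tuning categories) can be reproduced with the standard pycocotools API; the snippet below is a hedged sketch of that step, not the library's own code, and the annotation path and category names are placeholders.

```python
import random
from pycocotools.coco import COCO

ann_file = "annotations/instances_train2014.json"     # placeholder path
coco = COCO(ann_file)

cat_ids = coco.getCatIds(catNms=["person", "dog"])    # fine-tuning categories
img_ids = set()
for cat_id in cat_ids:
    img_ids.update(coco.getImgIds(catIds=[cat_id]))   # images with >= 1 such object

sample = random.sample(sorted(img_ids), k=5)          # random demo images
for info in coco.loadImgs(sample):
    print(info["file_name"])
```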
Adversarial Learning for Neural Dialogue Generation
Title | Adversarial Learning for Neural Dialogue Generation |
Authors | Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky |
Abstract | In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning (RL) problem where we jointly train two systems, a generative model to produce response sequences, and a discriminator (analogous to the human evaluator in the Turing test) to distinguish between the human-generated dialogues and the machine-generated ones. The outputs from the discriminator are then used as rewards for the generative model, pushing the system to generate dialogues that mostly resemble human dialogues. In addition to adversarial training we describe a model for adversarial evaluation that uses success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls. Experimental results on several metrics, including adversarial evaluation, demonstrate that the adversarially-trained system generates higher-quality responses than previous baselines. |
Tasks | Dialogue Generation |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06547v5 |
http://arxiv.org/pdf/1701.06547v5.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-learning-for-neural-dialogue |
Repo | https://github.com/AIJoris/DPAC-DialogueGAN |
Framework | pytorch |
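The generator update described above (the discriminator's output used as a reward for the generative model) can be sketched as a REINFORCE-style loss; summing per-token log-probabilities into one scalar per response and omitting a variance-reducing baseline are simplifying assumptions.

```python
import torch

def generator_pg_loss(log_probs, disc_scores):
    """REINFORCE-style generator loss (illustrative): `log_probs` holds the
    summed log-probability of each sampled response, `disc_scores` the
    discriminator's probability that the response is human-generated, used
    here as the reward."""
    rewards = disc_scores.detach()          # reward comes from the discriminator
    return -(rewards * log_probs).mean()

# toy usage with hypothetical tensors
log_probs = torch.randn(4, requires_grad=True)
disc_scores = torch.rand(4)
loss = generator_pg_loss(log_probs, disc_scores)
loss.backward()
```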
Fast and Accurate Neural Word Segmentation for Chinese
Title | Fast and Accurate Neural Word Segmentation for Chinese |
Authors | Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang |
Abstract | Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both training and working procedures of the current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets. |
Tasks | Chinese Word Segmentation, Feature Engineering |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07047v1 |
http://arxiv.org/pdf/1704.07047v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-accurate-neural-word-segmentation |
Repo | https://github.com/jcyk/greedyCWS |
Framework | none |
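A toy sketch of greedy left-to-right decoding in the spirit of the segmenter above: at each position the highest-scoring candidate word is committed and decoding continues after it. The scoring function is a user-supplied stand-in here (the paper scores candidates with balanced word and character embeddings), and the tiny lexicon is hypothetical.

```python
def greedy_segment(sentence, score_word, max_word_len=4):
    """Greedy segmentation (illustrative): at each position, commit the
    candidate word of up to max_word_len characters with the highest score."""
    words, i = [], 0
    while i < len(sentence):
        candidates = [sentence[i:i + k] for k in range(1, max_word_len + 1)
                      if i + k <= len(sentence)]
        best = max(candidates, key=score_word)
        words.append(best)
        i += len(best)
    return words

# toy scorer: prefer longer words found in a tiny hypothetical lexicon
lexicon = {"北京", "大学"}
print(greedy_segment("北京大学", lambda w: (w in lexicon, len(w))))
```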
Moonshine: Distilling with Cheap Convolutions
Title | Moonshine: Distilling with Cheap Convolutions |
Authors | Elliot J. Crowley, Gavin Gray, Amos Storkey |
Abstract | Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data. |
Tasks | |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02613v4 |
http://arxiv.org/pdf/1711.02613v4.pdf | |
PWC | https://paperswithcode.com/paper/moonshine-distilling-with-cheap-convolutions |
Repo | https://github.com/BayesWatch/pytorch-moonshine |
Framework | pytorch |
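The attention-transfer term named in the abstract can be written compactly: each feature map is collapsed to a spatial attention map by summing squared channel activations, the maps are L2-normalized, and the student is penalized for deviating from the teacher. This is the standard formulation of attention transfer, sketched here independently of the authors' code.

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(student_feat, teacher_feat):
    """Distillation term between student and teacher activations at matching
    spatial resolution; channel counts may differ, as in the cheap student
    blocks the paper studies."""
    def attention(feat):                        # feat: (batch, C, H, W)
        a = feat.pow(2).sum(dim=1).flatten(1)   # spatial attention map, (batch, H*W)
        return F.normalize(a, dim=1)
    return (attention(student_feat) - attention(teacher_feat)).pow(2).mean()

# toy usage: thin student (16 channels) vs wide teacher (64 channels)
loss = attention_transfer_loss(torch.randn(2, 16, 8, 8), torch.randn(2, 64, 8, 8))
```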
Adaptive Nonparametric Clustering
Title | Adaptive Nonparametric Clustering |
Authors | Kirill Efimov, Larisa Adamyan, Vladimir Spokoiny |
Abstract | This paper presents a new approach to non-parametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights w_{ij}, each of which measures the degree of local inhomogeneity for two neighbor local clusters using statistical tests of “no gap” between them. The procedure starts from a very local scale; the locality parameter then grows by some factor at each step. The method is fully adaptive and does not require specifying the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, and the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows state-of-the-art performance of the method on various artificial examples and applications to text data. Our theoretical study establishes the optimal sensitivity of AWC to local inhomogeneity. |
Tasks | |
Published | 2017-09-26 |
URL | http://arxiv.org/abs/1709.09102v1 |
http://arxiv.org/pdf/1709.09102v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-nonparametric-clustering |
Repo | https://github.com/larisahax/awc |
Framework | none |
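A deliberately crude illustration of the weight construction described above: neighbor pairs get weight 1 when the region between them shows no density gap. The “test” below just compares point counts around the midpoint with counts around the endpoints; the paper instead uses a proper statistical test of local homogeneity, so this is only a toy.

```python
import numpy as np

def toy_no_gap_weights(X, h, ratio=0.5):
    """Toy AWC-style weights at a single scale h: w_ij = 1 for neighbor pairs
    whose midpoint neighborhood is not much emptier than the endpoints'."""
    n = len(X)
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    counts = (dists < h).sum(axis=1)              # local mass around each point
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] >= h:
                continue                          # only neighbor pairs get weights
            mid = (X[i] + X[j]) / 2
            mid_count = (np.linalg.norm(X - mid, axis=1) < h).sum()
            if mid_count >= ratio * min(counts[i], counts[j]):
                W[i, j] = W[j, i] = 1             # "no gap" between i and j
    return W
```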
Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising
Title | Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising |
Authors | Peng Liu, Ruogu Fang |
Abstract | In this work, we explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn the pixel distribution from noisy data. By increasing a CNN’s width with large receptive fields and more channels in each layer, CNNs can learn the pixel distribution, a prior that exists across many different types of noise. The key to our approach is the discovery that wider CNNs tend to learn pixel-distribution features, which suggests that the inference mapping relies primarily on these priors rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on additive white Gaussian noise (AWGN) and demonstrate that by learning the pixel distribution in images, WIN-based networks consistently achieve significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. Code and models are available at https://github.com/cswin/WIN. |
Tasks | Denoising, Image Denoising |
Published | 2017-07-28 |
URL | http://arxiv.org/abs/1707.09135v1 |
http://arxiv.org/pdf/1707.09135v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-pixel-distribution-prior-with-wider |
Repo | https://github.com/cswin/WIN |
Framework | none |
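A hedged PyTorch sketch of a wide, shallow denoising network in the spirit described above: few layers, large kernels and many channels, with a residual connection so the network predicts the noise. The layer count, kernel size and width are assumptions, not the paper's WIN configuration.

```python
import torch
import torch.nn as nn

class ToyWideDenoiser(nn.Module):
    """Shallow but wide denoising CNN: 7x7 kernels for a large receptive field,
    many channels per layer, residual output (noisy image minus predicted noise)."""
    def __init__(self, channels=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
        )

    def forward(self, noisy):
        return noisy - self.body(noisy)     # subtract the predicted noise

denoised = ToyWideDenoiser()(torch.randn(1, 1, 64, 64))
```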
Challenges in Data-to-Document Generation
Title | Challenges in Data-to-Document Generation |
Authors | Sam Wiseman, Stuart M. Shieber, Alexander M. Rush |
Abstract | Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08052v1 |
http://arxiv.org/pdf/1707.08052v1.pdf | |
PWC | https://paperswithcode.com/paper/challenges-in-data-to-document-generation |
Repo | https://github.com/harvardnlp/boxscore-data |
Framework | none |
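One of the extractive evaluation ideas described above, relation-generation precision, can be sketched in a few lines: of the relations an information-extraction system pulls out of the generated text, count the fraction that also appears in the source records. The IE step itself is assumed to exist and is not shown, and the box-score style tuples in the example are hypothetical.

```python
def relation_generation_precision(extracted, records):
    """Fraction of (entity, type, value) relations extracted from the generated
    document that are supported by the source records (simplified sketch)."""
    records = set(records)
    if not extracted:
        return 0.0
    return sum(rel in records for rel in extracted) / len(extracted)

# toy usage with hypothetical box-score relations
records = [("LeBron James", "PTS", 30), ("LeBron James", "AST", 8)]
extracted = [("LeBron James", "PTS", 30), ("LeBron James", "AST", 11)]
print(relation_generation_precision(extracted, records))   # 0.5
```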