July 29, 2019

Paper Group AWR 143

Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning

Title Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning
Authors Morgan A. Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Maurice Ngolè Mboula, David Coeurjolly, Marco Cuturi, Gabriel Peyré, Jean-Luc Starck
Abstract This paper introduces a new nonlinear dictionary learning method for histograms in the probability simplex. The method leverages optimal transport theory, in the sense that our aim is to reconstruct histograms using so-called displacement interpolations (a.k.a. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex. Our method simultaneously estimates such atoms, and, for each datapoint, the vector of weights that can optimally reconstruct it as an optimal transport barycenter of such atoms. Our method is computationally tractable thanks to the addition of an entropic regularization to the usual optimal transportation problem, leading to an approximation scheme that is efficient, parallel and simple to differentiate. Both atoms and weights are learned using a gradient-based descent method. Gradients are obtained by automatic differentiation of the generalized Sinkhorn iterations that yield barycenters with entropic smoothing. Because of its formulation relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, our method allows for nonlinear relationships between atoms and the reconstruction of input data. We illustrate its application in several different image processing settings.
Tasks Dictionary Learning
Published 2017-08-07
URL http://arxiv.org/abs/1708.01955v3
PDF http://arxiv.org/pdf/1708.01955v3.pdf
PWC https://paperswithcode.com/paper/wasserstein-dictionary-learning-optimal
Repo https://github.com/matthieuheitz/WassersteinDictionaryLearning
Framework none
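
The differentiable barycenter computation at the heart of the method can be sketched directly from the abstract: run the generalized Sinkhorn (iterative Bregman projection) updates for an entropic Wasserstein barycenter inside an autodiff framework, so gradients with respect to atoms and weights come for free. A minimal PyTorch sketch, assuming a precomputed Gibbs kernel K = exp(-C/eps) and atoms stored as columns of a matrix:

```python
import torch

def sinkhorn_barycenter(atoms, weights, K, n_iters=100):
    """Entropic Wasserstein barycenter of histograms `atoms` (d x S)
    with simplex weights (S,), for Gibbs kernel K = exp(-C / eps).
    Differentiable w.r.t. atoms and weights via autograd."""
    d, S = atoms.shape
    v = torch.ones(d, S, dtype=atoms.dtype)
    for _ in range(n_iters):
        u = atoms / (K @ v)                                  # one scaling per atom
        # weighted geometric mean of K^T u_s, computed in the log domain
        b = torch.exp((weights * torch.log(K.T @ u)).sum(dim=1, keepdim=True))
        v = b / (K.T @ u)
    return b.squeeze(1)
```

In the full method, atoms and weights would be kept on the simplex (e.g. via a softmax parameterization) and fitted by gradient descent on a reconstruction loss between each data histogram and its barycentric reconstruction.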

CatBoost: unbiased boosting with categorical features

Title CatBoost: unbiased boosting with categorical features
Authors Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin
Abstract This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that the proposed algorithms solve it effectively, leading to excellent empirical results.
Tasks Dimensionality Reduction
Published 2017-06-28
URL http://arxiv.org/abs/1706.09516v5
PDF http://arxiv.org/pdf/1706.09516v5.pdf
PWC https://paperswithcode.com/paper/catboost-unbiased-boosting-with-categorical
Repo https://github.com/yumoh/catboost_iter
Framework none
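
The "ordered" idea can be illustrated on the categorical-feature side: each example is encoded with target statistics computed only from examples that precede it in a random permutation, so its own label never leaks into its own features. A simplified sketch (the library uses several permutations and considerably more machinery):

```python
import numpy as np

def ordered_target_encoding(cats, y, prior=0.5, a=1.0, seed=0):
    """Encode a categorical column using only targets of examples that appear
    earlier in a random permutation, sketching how CatBoost's ordered target
    statistics avoid target leakage. `prior` and `a` are smoothing choices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for i in perm:
        c = cats[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)   # smoothed mean of past targets
        sums[c] = s + y[i]
        counts[c] = n + 1
    return encoded
```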

Rethinking Feature Discrimination and Polymerization for Large-scale Recognition

Title Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
Authors Yu Liu, Hongyang Li, Xiaogang Wang
Abstract Feature matters. How to train a deep network to acquire discriminative features across categories and polymerized features within classes has always been at the core of many computer vision tasks, especially for large-scale recognition systems where test identities are unseen during training and the number of classes could be at million scale. In this paper, we address this problem based on the simple intuition that the cosine distance of features in high-dimensional space should be close enough within one class and far away across categories. To this end, we propose the congenerous cosine (COCO) algorithm to simultaneously optimize the cosine similarity among data. It inherits the softmax property to make inter-class features discriminative and shares the idea of a class centroid in metric learning. Unlike previous work where the center is a temporal, statistical variable within one mini-batch during training, the formulated centroid is responsible for clustering inner-class features to enforce them polymerized around the network truncus. COCO is bundled with discriminative training and learned end-to-end with stable convergence. Experiments on five benchmarks have been extensively conducted to verify the effectiveness of our approach on both the small-scale classification task and the large-scale human recognition problem.
Tasks Metric Learning
Published 2017-10-02
URL http://arxiv.org/abs/1710.00870v2
PDF http://arxiv.org/pdf/1710.00870v2.pdf
PWC https://paperswithcode.com/paper/rethinking-feature-discrimination-and
Repo https://github.com/sciencefans/coco_loss
Framework none
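
A hedged reading of the loss the abstract describes: l2-normalize both the features and a set of learnable class centroids, then train with softmax cross-entropy over their scaled cosine similarities, so inter-class discrimination and intra-class clustering around centroids are optimized at once. The scale factor `alpha` below is an assumption, not the paper's exact scaling:

```python
import torch
import torch.nn.functional as F

def coco_loss(features, labels, centroids, alpha=6.0):
    """Sketch of a congenerous-cosine-style loss: softmax cross-entropy over
    scaled cosine similarities between normalized features (B x D) and
    normalized learnable class centroids (K x D)."""
    f = F.normalize(features, dim=1)
    c = F.normalize(centroids, dim=1)
    logits = alpha * f @ c.t()        # cosine similarities, scaled
    return F.cross_entropy(logits, labels)
```

Here `centroids` would be an `nn.Parameter` of shape (num_classes, dim), learned end-to-end with the network rather than recomputed per mini-batch.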

Generating Sentences by Editing Prototypes

Title Generating Sentences by Editing Prototypes
Authors Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang
Abstract We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
Tasks Language Modelling
Published 2017-09-26
URL http://arxiv.org/abs/1709.08878v2
PDF http://arxiv.org/pdf/1709.08878v2.pdf
PWC https://paperswithcode.com/paper/generating-sentences-by-editing-prototypes
Repo https://github.com/kelvinguu/neural-editor
Framework pytorch
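
The generative story is simple enough to sketch: sample a prototype from the corpus, sample a latent edit vector, and decode a new sentence conditioned on both. The `encoder` and `editor` callables below are hypothetical stand-ins, not the neural-editor repo's API:

```python
import random
import torch

def prototype_then_edit(corpus, encoder, editor, edit_dim=128):
    """Prototype-then-edit sampling, as described in the abstract: pick a
    prototype sentence, draw a latent edit vector, and decode an edit of it."""
    prototype = random.choice(corpus)   # sample a prototype from training data
    z = torch.randn(edit_dim)           # latent edit vector (a simple Gaussian
                                        # stands in for the paper's prior)
    hidden = encoder(prototype)         # encode the prototype
    return editor(hidden, z)            # decode prototype + edit into a sentence
```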

EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

Title EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Authors Afshine Amidi, Shervine Amidi, Dimitrios Vlachakis, Vasileios Megalooikonomou, Nikos Paragios, Evangelia I. Zacharaki
Abstract During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
Tasks
Published 2017-07-19
URL http://arxiv.org/abs/1707.06017v1
PDF http://arxiv.org/pdf/1707.06017v1.pdf
PWC https://paperswithcode.com/paper/enzynet-enzyme-classification-using-3d
Repo https://github.com/edraizen/molmimic
Framework pytorch
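
The input representation is the distinctive part: each enzyme is rendered as a binary occupancy grid from its atom coordinates before being fed to the 3D CNN. A sketch of such a voxelization (grid size and padding are assumptions, not the paper's exact settings):

```python
import numpy as np

def voxelize(coords, grid=32, pad=2):
    """Binary voxel grid from atom coordinates (N x 3): center the protein,
    scale it to fit the grid, and mark occupied cells."""
    coords = coords - coords.mean(axis=0)                       # center
    scale = (grid / 2 - pad) / (np.abs(coords).max() + 1e-9)    # fit into grid
    idx = np.round(coords * scale + grid / 2).astype(int)
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0                  # occupied voxels
    return vol
```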

Hybrid Isolation Forest - Application to Intrusion Detection

Title Hybrid Isolation Forest - Application to Intrusion Detection
Authors Pierre-François Marteau, Saeid Soheily-Khah, Nicolas Béchet
Abstract Starting from the identification of a drawback in the Isolation Forest (IF) algorithm that limits its use in the scope of anomaly detection, we propose two extensions: the first overcomes the previously mentioned limitation, and the second provides it with some supervised learning capability. The resulting Hybrid Isolation Forest (HIF) that we propose is first evaluated on a synthetic dataset to analyze the effect of the new meta-parameters that are introduced and to verify that the addressed limitation of the IF algorithm is effectively overcome. We then compare the two algorithms on the ISCX benchmark dataset, in the context of a network intrusion detection application. Our experiments show that HIF outperforms IF, but also challenges the one-class and two-class SVM baselines with computational efficiency.
Tasks Anomaly Detection, Intrusion Detection, Network Intrusion Detection
Published 2017-05-10
URL http://arxiv.org/abs/1705.03800v1
PDF http://arxiv.org/pdf/1705.03800v1.pdf
PWC https://paperswithcode.com/paper/hybrid-isolation-forest-application-to
Repo https://github.com/pfmarteau/HIF
Framework none
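
One way to picture the hybrid: combine the isolation-based score with a distance term, so a point must be both easy to isolate and far from known data to rank as anomalous. The blend below is built on scikit-learn rather than the authors' code, and the mixing weight `alpha` is an assumption; it conveys the idea, not the paper's exact score:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

def hybrid_scores(X_train, X_test, alpha=0.5):
    """Blend an Isolation Forest score with a normalized distance to the
    training data; higher values mean more anomalous."""
    iforest = IsolationForest(random_state=0).fit(X_train)
    if_score = -iforest.score_samples(X_test)          # higher = more anomalous
    knn = NearestNeighbors(n_neighbors=1).fit(X_train)
    dist, _ = knn.kneighbors(X_test)
    d = dist[:, 0] / (dist[:, 0].max() + 1e-12)        # normalized distance term
    return alpha * if_score + (1 - alpha) * d
```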

Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling

Title Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
Authors Chaitanya Ahuja, Louis-Philippe Morency
Abstract Recurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but coupled) flows of information inside the units: a first flow along the time dimension and a second along the depth dimension. It also offers a symmetry in how information can flow horizontally and vertically. We analyze the effects of decoupling three different components of our LRU model: Reset Gate, Update Gate and Projected State. We evaluate this new family of LRU models on computational convergence rates and statistical efficiency. Our experiments are performed on four publicly-available datasets, comparing with Grid-LSTM and Recurrent Highway Networks. Our results show that LRU has better empirical computational convergence rates and statistical efficiency values, along with learning more accurate language models.
Tasks
Published 2017-10-06
URL http://arxiv.org/abs/1710.02254v2
PDF http://arxiv.org/pdf/1710.02254v2.pdf
PWC https://paperswithcode.com/paper/lattice-recurrent-unit-improving-convergence
Repo https://github.com/simonnanty/8f6667310a94f0c18bda10a1d5ff578c
Framework none
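
The distinctive ingredient is the pair of coupled flows: each unit consumes a state arriving along the time axis and one arriving along the depth axis, and emits an updated state for each, with gates that can be shared or decoupled (hence the Reset Gate, Update Gate and Projected State ablations). A very rough PyTorch illustration of that information flow, not the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class LatticeCellSketch(nn.Module):
    """Toy lattice-style cell: gated updates over concatenated time- and
    depth-direction states, returning a new state for each direction."""
    def __init__(self, size):
        super().__init__()
        self.gate = nn.Linear(2 * size, 2 * size)
        self.cand = nn.Linear(2 * size, 2 * size)

    def forward(self, h_time, h_depth):
        x = torch.cat([h_time, h_depth], dim=-1)
        u = torch.sigmoid(self.gate(x))   # update gates for both flows
        c = torch.tanh(self.cand(x))      # candidate states
        out = u * c + (1 - u) * x         # gated mix of old and new
        return out.chunk(2, dim=-1)       # (new h_time, new h_depth)
```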

On a Formal Model of Safe and Scalable Self-driving Cars

Title On a Formal Model of Safe and Scalable Self-driving Cars
Authors Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
Abstract In recent years, car makers and tech companies have been racing towards self driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of this paper is to add to the equation two additional crucial parameters. The first is standardization of safety assurance — what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements. The second parameter is scalability — engineering solutions that lead to unleashed costs will not scale to millions of cars, which will push interest in this field into a niche academic corner, and drive the entire field into a “winter of autonomous driving”. In the first part of the paper we propose a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.
Tasks Autonomous Driving, Self-Driving Cars
Published 2017-08-21
URL http://arxiv.org/abs/1708.06374v6
PDF http://arxiv.org/pdf/1708.06374v6.pdf
PWC https://paperswithcode.com/paper/on-a-formal-model-of-safe-and-scalable-self
Repo https://github.com/PhilippeW83440/CarND-Path-Planning-Project
Framework none
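
RSS's longitudinal rule is concrete enough to state as code: the following distance must cover the worst case in which the rear car accelerates for its entire response time and then brakes only gently, while the front car brakes as hard as possible. A sketch of that minimum safe distance (the formula follows the paper; the numeric defaults are illustrative assumptions):

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho=1.0,
                                   a_max_accel=3.0, a_min_brake=4.0,
                                   a_max_brake=8.0):
    """Minimum safe following distance (meters), speeds in m/s: the rear car
    may accelerate at a_max_accel during response time rho, then is only
    guaranteed to brake at a_min_brake, while the front car may brake at
    a_max_brake."""
    v_resp = v_rear + rho * a_max_accel
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + v_resp ** 2 / (2 * a_min_brake)
         - v_front ** 2 / (2 * a_max_brake))
    return max(d, 0.0)
```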

Fine-tuning deep CNN models on specific MS COCO categories

Title Fine-tuning deep CNN models on specific MS COCO categories
Authors Daniel Sonntag, Michael Barz, Jan Zacharias, Sven Stauden, Vahid Rahmani, Áron Fóthi, András Lőrincz
Abstract Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01476v1
PDF http://arxiv.org/pdf/1709.01476v1.pdf
PWC https://paperswithcode.com/paper/fine-tuning-deep-cnn-models-on-specific-ms
Repo https://github.com/DFKI-Interactive-Machine-Learning/py-faster-rcnn-ft
Framework none
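
The image-selection step the abstract mentions, picking images that contain at least one object of the chosen categories, can be reproduced with pycocotools; this sketch shows the selection logic rather than py-faster-rcnn-ft's own internals:

```python
from pycocotools.coco import COCO

def images_for_categories(ann_file, category_names):
    """Return ids of MS COCO images containing at least one object of any of
    the given categories (union over categories)."""
    coco = COCO(ann_file)
    cat_ids = coco.getCatIds(catNms=category_names)
    img_ids = set()
    for cat_id in cat_ids:                        # union, not intersection
        img_ids.update(coco.getImgIds(catIds=[cat_id]))
    return sorted(img_ids)

# e.g. images_for_categories("instances_train2014.json", ["person", "dog"])
```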

Adversarial Learning for Neural Dialogue Generation

Title Adversarial Learning for Neural Dialogue Generation
Authors Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky
Abstract In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning (RL) problem where we jointly train two systems: a generative model to produce response sequences, and a discriminator, analogous to the human evaluator in the Turing test, to distinguish between the human-generated dialogues and the machine-generated ones. The outputs from the discriminator are then used as rewards for the generative model, pushing the system to generate dialogues that mostly resemble human dialogues. In addition to adversarial training, we describe a model for adversarial evaluation that uses success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls. Experimental results on several metrics, including adversarial evaluation, demonstrate that the adversarially-trained system generates higher-quality responses than previous baselines.
Tasks Dialogue Generation
Published 2017-01-23
URL http://arxiv.org/abs/1701.06547v5
PDF http://arxiv.org/pdf/1701.06547v5.pdf
PWC https://paperswithcode.com/paper/adversarial-learning-for-neural-dialogue
Repo https://github.com/AIJoris/DPAC-DialogueGAN
Framework pytorch
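
The training signal can be sketched as a single REINFORCE step: sample a response, ask the discriminator how human it looks, and use that probability as the generator's reward. The `generator.sample` interface below is a hypothetical stand-in, not the repo's API:

```python
import torch

def adversarial_generator_step(generator, discriminator, context, optimizer):
    """One policy-gradient update: the discriminator's probability that the
    sampled response is human-generated serves as the reward."""
    response, log_probs = generator.sample(context)   # reply + token log-probs
    with torch.no_grad():
        reward = discriminator(context, response)     # P(human | dialogue)
    loss = -(reward * log_probs.sum())                # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()
```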

Fast and Accurate Neural Word Segmentation for Chinese

Title Fast and Accurate Neural Word Segmentation for Chinese
Authors Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang
Abstract Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both the training and working procedures of current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets.
Tasks Chinese Word Segmentation, Feature Engineering
Published 2017-04-24
URL http://arxiv.org/abs/1704.07047v1
PDF http://arxiv.org/pdf/1704.07047v1.pdf
PWC https://paperswithcode.com/paper/fast-and-accurate-neural-word-segmentation
Repo https://github.com/jcyk/greedyCWS
Framework none
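
The decoding scheme is a plain greedy loop over candidate words; the neural contribution is the scorer, abstracted here as a `score(words_so_far, candidate)` callable (a hypothetical interface):

```python
def greedy_segment(sentence, score, max_word_len=4):
    """Greedy left-to-right segmentation: at each position, commit to the
    highest-scoring candidate word and advance past it."""
    words, i = [], 0
    while i < len(sentence):
        limit = min(max_word_len, len(sentence) - i)
        candidates = [sentence[i:i + k] for k in range(1, limit + 1)]
        best = max(candidates, key=lambda w: score(words, w))
        words.append(best)
        i += len(best)
    return words
```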

Moonshine: Distilling with Cheap Convolutions

Title Moonshine: Distilling with Cheap Convolutions
Authors Elliot J. Crowley, Gavin Gray, Amos Storkey
Abstract Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02613v4
PDF http://arxiv.org/pdf/1711.02613v4.pdf
PWC https://paperswithcode.com/paper/moonshine-distilling-with-cheap-convolutions
Repo https://github.com/BayesWatch/pytorch-moonshine
Framework pytorch
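
The attention-transfer term the abstract relies on matches l2-normalized spatial attention maps (channel-wise sums of squared activations) between teacher and student at chosen layers. A PyTorch sketch of that loss; layer choice and weighting are left to the paper:

```python
import torch.nn.functional as F

def attention_map(fmap):
    """Spatial attention map of a (B, C, H, W) feature map: sum of squared
    activations over channels, flattened and l2-normalized."""
    a = fmap.pow(2).sum(dim=1).flatten(1)   # (B, H*W)
    return F.normalize(a, dim=1)

def attention_transfer_loss(student_fmaps, teacher_fmaps):
    """Sum of l2 distances between student and teacher attention maps."""
    return sum((attention_map(s) - attention_map(t)).norm(p=2, dim=1).mean()
               for s, t in zip(student_fmaps, teacher_fmaps))
```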

Adaptive Nonparametric Clustering

Title Adaptive Nonparametric Clustering
Authors Kirill Efimov, Larisa Adamyan, Vladimir Spokoiny
Abstract This paper presents a new approach to non-parametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights $w_{ij}$, each of which measures the degree of local inhomogeneity for two neighboring local clusters using statistical tests of "no gap" between them. The procedure starts from a very local scale, then the parameter of locality grows by some factor at each step. The method is fully adaptive and does not require specifying the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, and the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows state-of-the-art performance of the method in various artificial examples and applications to text data. Our theoretical study states optimal sensitivity of AWC to local inhomogeneity.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09102v1
PDF http://arxiv.org/pdf/1709.09102v1.pdf
PWC https://paperswithcode.com/paper/adaptive-nonparametric-clustering
Repo https://github.com/larisahax/awc
Framework none
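
A toy rendering of the loop the abstract describes: start at a very local scale, test neighboring points for "no gap" between their local clusters, keep the weights that pass, and grow the scale by a fixed factor. The overlap ratio below is a crude stand-in for the paper's statistical test, so treat this as a picture of the control flow only:

```python
import numpy as np

def awc_sketch(X, n_steps=5, grow=1.4, thresh=0.5):
    """Toy AWC-style loop: weights W[i, j] stay 1 while the neighborhoods of
    i and j overlap enough at the current scale; the scale grows each step."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    h = np.sort(D, axis=1)[:, 1].max()     # initial, very local scale
    W = np.eye(n)
    for _ in range(n_steps):
        N = D <= h                         # neighborhoods at current scale
        for i in range(n):
            for j in range(i + 1, n):
                if D[i, j] <= h:           # crude 'no gap' check via overlap
                    overlap = np.sum(N[i] & N[j]) / np.sum(N[i] | N[j])
                    W[i, j] = W[j, i] = float(overlap >= thresh)
        h *= grow                          # grow the locality parameter
    return W
```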

Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising

Title Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising
Authors Peng Liu, Ruogu Fang
Abstract In this work, we explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn the pixel distribution from noisy data. By increasing a CNN's width with large receptive fields and more channels in each layer, CNNs can reveal the ability to learn the pixel distribution, a prior that exists in many different types of noise. The key to our approach is the discovery that wider CNNs tend to learn pixel-distribution features, which suggests that the inference mapping primarily relies on such priors rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on additive white Gaussian noise (AWGN) and demonstrate that by learning the pixel distribution in images, the WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. Code and models are available at https://github.com/cswin/WIN.
Tasks Denoising, Image Denoising
Published 2017-07-28
URL http://arxiv.org/abs/1707.09135v1
PDF http://arxiv.org/pdf/1707.09135v1.pdf
PWC https://paperswithcode.com/paper/learning-pixel-distribution-prior-with-wider
Repo https://github.com/cswin/WIN
Framework none
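
The architectural claim ("wider, not deeper") is easy to make concrete: a handful of convolutional layers with large kernels and many channels, regressing the clean image directly. Depth, width and kernel size below are assumptions in the spirit of WIN, not the released configuration:

```python
import torch.nn as nn

class WideDenoiserSketch(nn.Module):
    """Shallow but wide denoising CNN: large 7x7 receptive fields and many
    channels per layer, mapping a noisy grayscale image to a clean one."""
    def __init__(self, channels=128, depth=4):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 7, padding=3), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 7, padding=3),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 7, padding=3)]
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):
        return self.net(noisy)
```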

Challenges in Data-to-Document Generation

Title Challenges in Data-to-Document Generation
Authors Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Abstract Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
Tasks Data-to-Text Generation, Text Generation
Published 2017-07-25
URL http://arxiv.org/abs/1707.08052v1
PDF http://arxiv.org/pdf/1707.08052v1.pdf
PWC https://paperswithcode.com/paper/challenges-in-data-to-document-generation
Repo https://github.com/harvardnlp/boxscore-data
Framework none
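
The extractive evaluation idea can be sketched as record matching: run an information-extraction system over a generated document, then score how many of the extracted (entity, type, value) records are supported by the source database. A minimal precision metric in that spirit (the paper's full suite also measures content selection and ordering):

```python
def record_precision(source_records, extracted_records):
    """Fraction of records extracted from the generated text that appear in
    the source data; records are hashable tuples like (entity, type, value)."""
    if not extracted_records:
        return 0.0
    source = set(source_records)
    return sum(r in source for r in extracted_records) / len(extracted_records)
```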