July 29, 2019

Paper Group AWR 143

Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning

Title Wasserstein Dictionary Learning: Optimal Transport-based unsupervised non-linear dictionary learning
Authors Morgan A. Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Maurice Ngolè Mboula, David Coeurjolly, Marco Cuturi, Gabriel Peyré, Jean-Luc Starck
Abstract This paper introduces a new nonlinear dictionary learning method for histograms in the probability simplex. The method leverages optimal transport theory, in the sense that our aim is to reconstruct histograms using so-called displacement interpolations (a.k.a. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex. Our method simultaneously estimates such atoms, and, for each datapoint, the vector of weights that can optimally reconstruct it as an optimal transport barycenter of such atoms. Our method is computationally tractable thanks to the addition of an entropic regularization to the usual optimal transportation problem, leading to an approximation scheme that is efficient, parallel and simple to differentiate. Both atoms and weights are learned using a gradient-based descent method. Gradients are obtained by automatic differentiation of the generalized Sinkhorn iterations that yield barycenters with entropic smoothing. Because of its formulation relying on Wasserstein barycenters instead of the usual matrix product between dictionary and codes, our method allows for nonlinear relationships between atoms and the reconstruction of input data. We illustrate its application in several different image processing settings.
Tasks Dictionary Learning
Published 2017-08-07
URL http://arxiv.org/abs/1708.01955v3
PDF http://arxiv.org/pdf/1708.01955v3.pdf
PWC https://paperswithcode.com/paper/wasserstein-dictionary-learning-optimal
Repo https://github.com/matthieuheitz/WassersteinDictionaryLearning
Framework none
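
The differentiable barycenter computation at the heart of the method can be sketched directly from the abstract: run the generalized Sinkhorn (iterative Bregman projection) updates for an entropic Wasserstein barycenter inside an autodiff framework, so gradients with respect to atoms and weights come for free. A minimal PyTorch sketch, assuming a precomputed Gibbs kernel K = exp(-C/eps) and atoms stored as columns of a matrix:

```python
import torch

def sinkhorn_barycenter(atoms, weights, K, n_iters=100):
    """Entropic Wasserstein barycenter of histograms `atoms` (d x S)
    with simplex weights (S,), for Gibbs kernel K = exp(-C / eps).
    Differentiable w.r.t. atoms and weights via autograd."""
    d, S = atoms.shape
    v = torch.ones(d, S, dtype=atoms.dtype)
    for _ in range(n_iters):
        u = atoms / (K @ v)                                  # one scaling per atom
        # weighted geometric mean of K^T u_s, computed in the log domain
        b = torch.exp((weights * torch.log(K.T @ u)).sum(dim=1, keepdim=True))
        v = b / (K.T @ u)
    return b.squeeze(1)
```

In the full method, atoms and weights would be kept on the simplex (e.g. via a softmax parameterization) and fitted by gradient descent on a reconstruction loss between each data histogram and its barycentric reconstruction.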

CatBoost: unbiased boosting with categorical features

Title CatBoost: unbiased boosting with categorical features
Authors Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin
Abstract This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that the proposed algorithms solve it effectively, leading to excellent empirical results.
Tasks Dimensionality Reduction
Published 2017-06-28
URL http://arxiv.org/abs/1706.09516v5
PDF http://arxiv.org/pdf/1706.09516v5.pdf
PWC https://paperswithcode.com/paper/catboost-unbiased-boosting-with-categorical
Repo https://github.com/yumoh/catboost_iter
Framework none
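
The "ordered" idea can be illustrated on the categorical-feature side: each example is encoded with target statistics computed only from examples that precede it in a random permutation, so its own label never leaks into its own features. A simplified sketch (the library uses several permutations and considerably more machinery):

```python
import numpy as np

def ordered_target_encoding(cats, y, prior=0.5, a=1.0, seed=0):
    """Encode a categorical column using only targets of examples that appear
    earlier in a random permutation, sketching how CatBoost's ordered target
    statistics avoid target leakage. `prior` and `a` are smoothing choices."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y))
    for i in perm:
        c = cats[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)   # smoothed mean of past targets
        sums[c] = s + y[i]
        counts[c] = n + 1
    return encoded
```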

Rethinking Feature Discrimination and Polymerization for Large-scale Recognition

Title Rethinking Feature Discrimination and Polymerization for Large-scale Recognition
Authors Yu Liu, Hongyang Li, Xiaogang Wang
Abstract Feature matters. How to train a deep network to acquire discriminative features across categories and polymerized features within classes has always been at the core of many computer vision tasks, especially for large-scale recognition systems where test identities are unseen during training and the number of classes could be at million scale. In this paper, we address this problem based on the simple intuition that the cosine distance of features in high-dimensional space should be close enough within one class and far away across categories. To this end, we propose the congenerous cosine (COCO) algorithm to simultaneously optimize the cosine similarity among data. It inherits the softmax property to make inter-class features discriminative and shares the idea of a class centroid in metric learning. Unlike previous work where the center is a temporal, statistical variable within one mini-batch during training, the formulated centroid is responsible for clustering inner-class features to enforce them polymerized around the network truncus. COCO is bundled with discriminative training and learned end-to-end with stable convergence. Experiments on five benchmarks have been extensively conducted to verify the effectiveness of our approach on both the small-scale classification task and the large-scale human recognition problem.
Tasks Metric Learning
Published 2017-10-02
URL http://arxiv.org/abs/1710.00870v2
PDF http://arxiv.org/pdf/1710.00870v2.pdf
PWC https://paperswithcode.com/paper/rethinking-feature-discrimination-and
Repo https://github.com/sciencefans/coco_loss
Framework none
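
A hedged reading of the loss the abstract describes: l2-normalize both the features and a set of learnable class centroids, then train with softmax cross-entropy over their scaled cosine similarities, so inter-class discrimination and intra-class clustering around centroids are optimized at once. The scale factor `alpha` below is an assumption, not the paper's exact scaling:

```python
import torch
import torch.nn.functional as F

def coco_loss(features, labels, centroids, alpha=6.0):
    """Sketch of a congenerous-cosine-style loss: softmax cross-entropy over
    scaled cosine similarities between normalized features (B x D) and
    normalized learnable class centroids (K x D)."""
    f = F.normalize(features, dim=1)
    c = F.normalize(centroids, dim=1)
    logits = alpha * f @ c.t()        # cosine similarities, scaled
    return F.cross_entropy(logits, labels)
```

Here `centroids` would be an `nn.Parameter` of shape (num_classes, dim), learned end-to-end with the network rather than recomputed per mini-batch.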

Generating Sentences by Editing Prototypes

Title Generating Sentences by Editing Prototypes
Authors Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang
Abstract We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.
Tasks Language Modelling
Published 2017-09-26
URL http://arxiv.org/abs/1709.08878v2
PDF http://arxiv.org/pdf/1709.08878v2.pdf
PWC https://paperswithcode.com/paper/generating-sentences-by-editing-prototypes
Repo https://github.com/kelvinguu/neural-editor
Framework pytorch
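
The generative story is simple enough to sketch: sample a prototype from the corpus, sample a latent edit vector, and decode a new sentence conditioned on both. The `encoder` and `editor` callables below are hypothetical stand-ins, not the neural-editor repo's API:

```python
import random
import torch

def prototype_then_edit(corpus, encoder, editor, edit_dim=128):
    """Prototype-then-edit sampling, as described in the abstract: pick a
    prototype sentence, draw a latent edit vector, and decode an edit of it."""
    prototype = random.choice(corpus)   # sample a prototype from training data
    z = torch.randn(edit_dim)           # latent edit vector (a simple Gaussian
                                        # stands in for the paper's prior)
    hidden = encoder(prototype)         # encode the prototype
    return editor(hidden, z)            # decode prototype + edit into a sentence
```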

EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

Title EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Authors Afshine Amidi, Shervine Amidi, Dimitrios Vlachakis, Vasileios Megalooikonomou, Nikos Paragios, Evangelia I. Zacharaki
Abstract During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
Tasks
Published 2017-07-19
URL http://arxiv.org/abs/1707.06017v1
PDF http://arxiv.org/pdf/1707.06017v1.pdf
PWC https://paperswithcode.com/paper/enzynet-enzyme-classification-using-3d
Repo https://github.com/edraizen/molmimic
Framework pytorch
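
The input representation is the distinctive part: each enzyme is rendered as a binary occupancy grid from its atom coordinates before being fed to the 3D CNN. A sketch of such a voxelization (grid size and padding are assumptions, not the paper's exact settings):

```python
import numpy as np

def voxelize(coords, grid=32, pad=2):
    """Binary voxel grid from atom coordinates (N x 3): center the protein,
    scale it to fit the grid, and mark occupied cells."""
    coords = coords - coords.mean(axis=0)                       # center
    scale = (grid / 2 - pad) / (np.abs(coords).max() + 1e-9)    # fit into grid
    idx = np.round(coords * scale + grid / 2).astype(int)
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0                  # occupied voxels
    return vol
```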

Hybrid Isolation Forest - Application to Intrusion Detection

Title Hybrid Isolation Forest - Application to Intrusion Detection
Authors Pierre-François Marteau, Saeid Soheily-Khah, Nicolas Béchet
Abstract Starting from the identification of a drawback in the Isolation Forest (IF) algorithm that limits its use in the scope of anomaly detection, we propose two extensions: the first overcomes the previously mentioned limitation, and the second provides it with some supervised learning capability. The resulting Hybrid Isolation Forest (HIF) that we propose is first evaluated on a synthetic dataset to analyze the effect of the new meta-parameters that are introduced and to verify that the addressed limitation of the IF algorithm is effectively overcome. We then compare the two algorithms on the ISCX benchmark dataset, in the context of a network intrusion detection application. Our experiments show that HIF outperforms IF, but also challenges the one-class and two-class SVM baselines with computational efficiency.
Tasks Anomaly Detection, Intrusion Detection, Network Intrusion Detection
Published 2017-05-10
URL http://arxiv.org/abs/1705.03800v1
PDF http://arxiv.org/pdf/1705.03800v1.pdf
PWC https://paperswithcode.com/paper/hybrid-isolation-forest-application-to
Repo https://github.com/pfmarteau/HIF
Framework none
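
One way to picture the hybrid: combine the isolation-based score with a distance term, so a point must be both easy to isolate and far from known data to rank as anomalous. The blend below is built on scikit-learn rather than the authors' code, and the mixing weight `alpha` is an assumption; it conveys the idea, not the paper's exact score:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

def hybrid_scores(X_train, X_test, alpha=0.5):
    """Blend an Isolation Forest score with a normalized distance to the
    training data; higher values mean more anomalous."""
    iforest = IsolationForest(random_state=0).fit(X_train)
    if_score = -iforest.score_samples(X_test)          # higher = more anomalous
    knn = NearestNeighbors(n_neighbors=1).fit(X_train)
    dist, _ = knn.kneighbors(X_test)
    d = dist[:, 0] / (dist[:, 0].max() + 1e-12)        # normalized distance term
    return alpha * if_score + (1 - alpha) * d
```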

Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling

Title Lattice Recurrent Unit: Improving Convergence and Statistical Efficiency for Sequence Modeling
Authors Chaitanya Ahuja, Louis-Philippe Morency
Abstract Recurrent neural networks have shown remarkable success in modeling sequences. However, low-resource situations still adversely affect the generalizability of these models. We introduce a new family of models, called Lattice Recurrent Units (LRU), to address the challenge of learning deep multi-layer recurrent models with limited resources. LRU models achieve this goal by creating distinct (but coupled) flows of information inside the units: a first flow along the time dimension and a second along the depth dimension. It also offers a symmetry in how information can flow horizontally and vertically. We analyze the effects of decoupling three different components of our LRU model: Reset Gate, Update Gate and Projected State. We evaluate this new family of LRU models on computational convergence rates and statistical efficiency. Our experiments are performed on four publicly-available datasets, comparing with Grid-LSTM and Recurrent Highway Networks. Our results show that LRU has better empirical computational convergence rates and statistical efficiency values, along with learning more accurate language models.
Tasks
Published 2017-10-06
URL http://arxiv.org/abs/1710.02254v2
PDF http://arxiv.org/pdf/1710.02254v2.pdf
PWC https://paperswithcode.com/paper/lattice-recurrent-unit-improving-convergence
Repo https://github.com/simonnanty/8f6667310a94f0c18bda10a1d5ff578c
Framework none
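
The distinctive ingredient is the pair of coupled flows: each unit consumes a state arriving along the time axis and one arriving along the depth axis, and emits an updated state for each, with gates that can be shared or decoupled (hence the Reset Gate, Update Gate and Projected State ablations). A very rough PyTorch illustration of that information flow, not the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class LatticeCellSketch(nn.Module):
    """Toy lattice-style cell: gated updates over concatenated time- and
    depth-direction states, returning a new state for each direction."""
    def __init__(self, size):
        super().__init__()
        self.gate = nn.Linear(2 * size, 2 * size)
        self.cand = nn.Linear(2 * size, 2 * size)

    def forward(self, h_time, h_depth):
        x = torch.cat([h_time, h_depth], dim=-1)
        u = torch.sigmoid(self.gate(x))   # update gates for both flows
        c = torch.tanh(self.cand(x))      # candidate states
        out = u * c + (1 - u) * x         # gated mix of old and new
        return out.chunk(2, dim=-1)       # (new h_time, new h_depth)
```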

On a Formal Model of Safe and Scalable Self-driving Cars

Title On a Formal Model of Safe and Scalable Self-driving Cars
Authors Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua
Abstract In recent years, car makers and tech companies have been racing towards self driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of this paper is to add to the equation two additional crucial parameters. The first is standardization of safety assurance — what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements. The second parameter is scalability — engineering solutions that lead to unleashed costs will not scale to millions of cars, which will push interest in this field into a niche academic corner, and drive the entire field into a “winter of autonomous driving”. In the first part of the paper we propose a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). In the second part we describe a design of a system that adheres to our safety assurance requirements and is scalable to millions of cars.
Tasks Autonomous Driving, Self-Driving Cars
Published 2017-08-21
URL http://arxiv.org/abs/1708.06374v6
PDF http://arxiv.org/pdf/1708.06374v6.pdf
PWC https://paperswithcode.com/paper/on-a-formal-model-of-safe-and-scalable-self
Repo https://github.com/PhilippeW83440/CarND-Path-Planning-Project
Framework none
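
RSS's longitudinal rule is concrete enough to state as code: the following distance must cover the worst case in which the rear car accelerates for its entire response time and then brakes only gently, while the front car brakes as hard as possible. A sketch of that minimum safe distance (the formula follows the paper; the numeric defaults are illustrative assumptions):

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho=1.0,
                                   a_max_accel=3.0, a_min_brake=4.0,
                                   a_max_brake=8.0):
    """Minimum safe following distance (meters), speeds in m/s: the rear car
    may accelerate at a_max_accel during response time rho, then is only
    guaranteed to brake at a_min_brake, while the front car may brake at
    a_max_brake."""
    v_resp = v_rear + rho * a_max_accel
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + v_resp ** 2 / (2 * a_min_brake)
         - v_front ** 2 / (2 * a_max_brake))
    return max(d, 0.0)
```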

Fine-tuning deep CNN models on specific MS COCO categories

Title Fine-tuning deep CNN models on specific MS COCO categories
Authors Daniel Sonntag, Michael Barz, Jan Zacharias, Sven Stauden, Vahid Rahmani, Áron Fóthi, András Lőrincz
Abstract Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user no longer has to search the dataset by hand for suitable image files to use in the demo program. Our implementation randomly selects images that contain at least one object of the categories on which the model is fine-tuned.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01476v1
PDF http://arxiv.org/pdf/1709.01476v1.pdf
PWC https://paperswithcode.com/paper/fine-tuning-deep-cnn-models-on-specific-ms
Repo https://github.com/DFKI-Interactive-Machine-Learning/py-faster-rcnn-ft
Framework none
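
The image-selection step the abstract mentions, picking images that contain at least one object of the chosen categories, can be reproduced with pycocotools; this sketch shows the selection logic rather than py-faster-rcnn-ft's own internals:

```python
from pycocotools.coco import COCO

def images_for_categories(ann_file, category_names):
    """Return ids of MS COCO images containing at least one object of any of
    the given categories (union over categories)."""
    coco = COCO(ann_file)
    cat_ids = coco.getCatIds(catNms=category_names)
    img_ids = set()
    for cat_id in cat_ids:                        # union, not intersection
        img_ids.update(coco.getImgIds(catIds=[cat_id]))
    return sorted(img_ids)

# e.g. images_for_categories("instances_train2014.json", ["person", "dog"])
```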

Adversarial Learning for Neural Dialogue Generation

Title Adversarial Learning for Neural Dialogue Generation
Authors Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky
Abstract In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances. We cast the task as a reinforcement learning (RL) problem where we jointly train two systems: a generative model to produce response sequences, and a discriminator, analogous to the human evaluator in the Turing test, to distinguish between the human-generated dialogues and the machine-generated ones. The outputs from the discriminator are then used as rewards for the generative model, pushing the system to generate dialogues that mostly resemble human dialogues. In addition to adversarial training, we describe a model for adversarial evaluation that uses success in fooling an adversary as a dialogue evaluation metric, while avoiding a number of potential pitfalls. Experimental results on several metrics, including adversarial evaluation, demonstrate that the adversarially-trained system generates higher-quality responses than previous baselines.
Tasks Dialogue Generation
Published 2017-01-23
URL http://arxiv.org/abs/1701.06547v5
PDF http://arxiv.org/pdf/1701.06547v5.pdf
PWC https://paperswithcode.com/paper/adversarial-learning-for-neural-dialogue
Repo https://github.com/AIJoris/DPAC-DialogueGAN
Framework pytorch
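
The training signal can be sketched as a single REINFORCE step: sample a response, ask the discriminator how human it looks, and use that probability as the generator's reward. The `generator.sample` interface below is a hypothetical stand-in, not the repo's API:

```python
import torch

def adversarial_generator_step(generator, discriminator, context, optimizer):
    """One policy-gradient update: the discriminator's probability that the
    sampled response is human-generated serves as the reward."""
    response, log_probs = generator.sample(context)   # reply + token log-probs
    with torch.no_grad():
        reward = discriminator(context, response)     # P(human | dialogue)
    loss = -(reward * log_probs.sum())                # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()
```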

Fast and Accurate Neural Word Segmentation for Chinese

Title Fast and Accurate Neural Word Segmentation for Chinese
Authors Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang
Abstract Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both the training and working procedures of current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of performing segmentation much faster and even more accurately than state-of-the-art neural models on Chinese benchmark datasets.
Tasks Chinese Word Segmentation, Feature Engineering
Published 2017-04-24
URL http://arxiv.org/abs/1704.07047v1
PDF http://arxiv.org/pdf/1704.07047v1.pdf
PWC https://paperswithcode.com/paper/fast-and-accurate-neural-word-segmentation
Repo https://github.com/jcyk/greedyCWS
Framework none
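
The decoding scheme is a plain greedy loop over candidate words; the neural contribution is the scorer, abstracted here as a `score(words_so_far, candidate)` callable (a hypothetical interface):

```python
def greedy_segment(sentence, score, max_word_len=4):
    """Greedy left-to-right segmentation: at each position, commit to the
    highest-scoring candidate word and advance past it."""
    words, i = [], 0
    while i < len(sentence):
        limit = min(max_word_len, len(sentence) - i)
        candidates = [sentence[i:i + k] for k in range(1, limit + 1)]
        best = max(candidates, key=lambda w: score(words, w))
        words.append(best)
        i += len(best)
    return words
```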

Moonshine: Distilling with Cheap Convolutions

Title Moonshine: Distilling with Cheap Convolutions
Authors Elliot J. Crowley, Gavin Gray, Amos Storkey
Abstract Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.02613v4
PDF http://arxiv.org/pdf/1711.02613v4.pdf
PWC https://paperswithcode.com/paper/moonshine-distilling-with-cheap-convolutions
Repo https://github.com/BayesWatch/pytorch-moonshine
Framework pytorch
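
The attention-transfer term the abstract relies on matches l2-normalized spatial attention maps (channel-wise sums of squared activations) between teacher and student at chosen layers. A PyTorch sketch of that loss; layer choice and weighting are left to the paper:

```python
import torch.nn.functional as F

def attention_map(fmap):
    """Spatial attention map of a (B, C, H, W) feature map: sum of squared
    activations over channels, flattened and l2-normalized."""
    a = fmap.pow(2).sum(dim=1).flatten(1)   # (B, H*W)
    return F.normalize(a, dim=1)

def attention_transfer_loss(student_fmaps, teacher_fmaps):
    """Sum of l2 distances between student and teacher attention maps."""
    return sum((attention_map(s) - attention_map(t)).norm(p=2, dim=1).mean()
               for s, t in zip(student_fmaps, teacher_fmaps))
```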

Adaptive Nonparametric Clustering

Title Adaptive Nonparametric Clustering
Authors Kirill Efimov, Larisa Adamyan, Vladimir Spokoiny
Abstract This paper presents a new approach to non-parametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking at different points and for different scales on departure from local homogeneity. The proposed procedure describes the clustering structure in terms of weights $w_{ij}$, each of which measures the degree of local inhomogeneity for two neighboring local clusters using statistical tests of "no gap" between them. The procedure starts from a very local scale, then the parameter of locality grows by some factor at each step. The method is fully adaptive and does not require specifying the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, and the procedure is able to recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows state-of-the-art performance of the method in various artificial examples and applications to text data. Our theoretical study states optimal sensitivity of AWC to local inhomogeneity.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09102v1
PDF http://arxiv.org/pdf/1709.09102v1.pdf
PWC https://paperswithcode.com/paper/adaptive-nonparametric-clustering
Repo https://github.com/larisahax/awc
Framework none
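
A toy rendering of the loop the abstract describes: start at a very local scale, test neighboring points for "no gap" between their local clusters, keep the weights that pass, and grow the scale by a fixed factor. The overlap ratio below is a crude stand-in for the paper's statistical test, so treat this as a picture of the control flow only:

```python
import numpy as np

def awc_sketch(X, n_steps=5, grow=1.4, thresh=0.5):
    """Toy AWC-style loop: weights W[i, j] stay 1 while the neighborhoods of
    i and j overlap enough at the current scale; the scale grows each step."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    h = np.sort(D, axis=1)[:, 1].max()     # initial, very local scale
    W = np.eye(n)
    for _ in range(n_steps):
        N = D <= h                         # neighborhoods at current scale
        for i in range(n):
            for j in range(i + 1, n):
                if D[i, j] <= h:           # crude 'no gap' check via overlap
                    overlap = np.sum(N[i] & N[j]) / np.sum(N[i] | N[j])
                    W[i, j] = W[j, i] = float(overlap >= thresh)
        h *= grow                          # grow the locality parameter
    return W
```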

Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising

Title Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising
Authors Peng Liu, Ruogu Fang
Abstract In this work, we explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn the pixel distribution from noisy data. By increasing a CNN's width with large receptive fields and more channels in each layer, CNNs can reveal the ability to learn the pixel distribution, a prior that exists in many different types of noise. The key to our approach is the discovery that wider CNNs tend to learn pixel-distribution features, which suggests that the inference mapping primarily relies on such priors rather than on deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on additive white Gaussian noise (AWGN) and demonstrate that by learning the pixel distribution in images, the WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. Code and models are available at https://github.com/cswin/WIN.
Tasks Denoising, Image Denoising
Published 2017-07-28
URL http://arxiv.org/abs/1707.09135v1
PDF http://arxiv.org/pdf/1707.09135v1.pdf
PWC https://paperswithcode.com/paper/learning-pixel-distribution-prior-with-wider
Repo https://github.com/cswin/WIN
Framework none
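
The architectural claim ("wider, not deeper") is easy to make concrete: a handful of convolutional layers with large kernels and many channels, regressing the clean image directly. Depth, width and kernel size below are assumptions in the spirit of WIN, not the released configuration:

```python
import torch.nn as nn

class WideDenoiserSketch(nn.Module):
    """Shallow but wide denoising CNN: large 7x7 receptive fields and many
    channels per layer, mapping a noisy grayscale image to a clean one."""
    def __init__(self, channels=128, depth=4):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 7, padding=3), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 7, padding=3),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 7, padding=3)]
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):
        return self.net(noisy)
```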

Challenges in Data-to-Document Generation

Title Challenges in Data-to-Document Generation
Authors Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
Abstract Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
Tasks Data-to-Text Generation, Text Generation
Published 2017-07-25
URL http://arxiv.org/abs/1707.08052v1
PDF http://arxiv.org/pdf/1707.08052v1.pdf
PWC https://paperswithcode.com/paper/challenges-in-data-to-document-generation
Repo https://github.com/harvardnlp/boxscore-data
Framework none
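
The extractive evaluation idea can be sketched as record matching: run an information-extraction system over a generated document, then score how many of the extracted (entity, type, value) records are supported by the source database. A minimal precision metric in that spirit (the paper's full suite also measures content selection and ordering):

```python
def record_precision(source_records, extracted_records):
    """Fraction of records extracted from the generated text that appear in
    the source data; records are hashable tuples like (entity, type, value)."""
    if not extracted_records:
        return 0.0
    source = set(source_records)
    return sum(r in source for r in extracted_records) / len(extracted_records)
```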