July 30, 2019

Paper Group AWR 9

SGDLibrary: A MATLAB library for stochastic gradient descent algorithms. Naive Bayes Classification for Subset Selection. Attention-Based Models for Text-Dependent Speaker Verification. Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors. SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Clas …

SGDLibrary: A MATLAB library for stochastic gradient descent algorithms


Title	SGDLibrary: A MATLAB library for stochastic gradient descent algorithms
Authors	Hiroyuki Kasai
Abstract	We consider the problem of finding the minimizer of a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ of the finite-sum form $\min_w f(w) = \frac{1}{n}\sum_{i=1}^n f_i(w)$. This problem has been studied intensively in recent years in the field of machine learning (ML). One promising approach for large-scale data is to use a stochastic optimization algorithm to solve the problem. SGDLibrary is a readable, flexible and extensible pure-MATLAB library that collects stochastic optimization algorithms. The purpose of the library is to provide researchers and implementers with a comprehensive evaluation environment for the use of these algorithms on various ML problems.
Tasks	Stochastic Optimization
Published	2017-10-27
URL	http://arxiv.org/abs/1710.10951v2
PDF	http://arxiv.org/pdf/1710.10951v2.pdf
PWC	https://paperswithcode.com/paper/sgdlibrary-a-matlab-library-for-stochastic
Repo	https://github.com/hiroyuki-kasai/SGDLibrary
Framework	none
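
As a rough illustration of the finite-sum problem in the abstract above, the following minimal Python sketch runs plain stochastic gradient descent on a one-dimensional least-squares objective. It does not use SGDLibrary (which is MATLAB); the helper name `sgd_least_squares`, the step size, the epoch count, and the toy data are all assumptions made for this example.

```python
import random

def sgd_least_squares(xs, ys, step=0.05, epochs=200, seed=0):
    """Minimize f(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i)**2 with plain SGD.

    Hypothetical helper for illustration; a scalar w keeps the sketch short.
    """
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # One epoch: visit every component function f_i once, in random order.
        for i in rng.sample(range(n), n):
            grad_i = (w * xs[i] - ys[i]) * xs[i]  # gradient of f_i at w
            w -= step * grad_i
    return w

# Toy data generated by y = 3x, so the minimizer of f is w = 3.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]
w_hat = sgd_least_squares(xs, ys)
```

Each update uses the gradient of a single $f_i$ rather than the full sum, which is what makes the method attractive when $n$ is large.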

Naive Bayes Classification for Subset Selection


Title	Naive Bayes Classification for Subset Selection
Authors	Luca Mossina, Emmanuel Rachelson
Abstract	This article focuses on the question of learning how to automatically select a subset of items from a larger set. We introduce a methodology for the inference of ensembles of discrete values, based on the Naive Bayes assumption. Our motivation stems from practical use cases where one wishes to predict an unordered set of (possibly interdependent) values from a set of observed features. This problem can be considered in the context of Multi-label Classification (MLC), where such values are seen as labels associated with continuous or discrete features. We introduce the NaiBX algorithm, an extension of Naive Bayes classification into the multi-label domain, discuss its properties and evaluate our approach on real-world problems.
Tasks	Multi-Label Classification
Published	2017-07-19
URL	http://arxiv.org/abs/1707.06142v1
PDF	http://arxiv.org/pdf/1707.06142v1.pdf
PWC	https://paperswithcode.com/paper/naive-bayes-classification-for-subset
Repo	https://github.com/SuReLI/naibx-mlc
Framework	none

Attention-Based Models for Text-Dependent Speaker Verification


Title	Attention-Based Models for Text-Dependent Speaker Verification
Authors	F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Abstract	Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for the problem of sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies of the attention layer and their variants, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by a relative 14% compared to our non-attention LSTM baseline model.
Tasks	Image Captioning, Machine Translation, Speaker Recognition, Speaker Verification, Speech Recognition, Text-Dependent Speaker Verification
Published	2017-10-28
URL	http://arxiv.org/abs/1710.10470v3
PDF	http://arxiv.org/pdf/1710.10470v3.pdf
PWC	https://paperswithcode.com/paper/attention-based-models-for-text-dependent
Repo	https://github.com/liyongze/lstm_speaker_verification
Framework	tf
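
The sequence summarization mentioned in the abstract above can be pictured as a weighted sum of per-frame features. The sketch below is a generic attention-weighted pooling in plain Python, not the paper's model: the abstract does not say how the scores are computed or which pooling variants are used, so the function `attention_pool` and its inputs are assumptions for illustration.

```python
import math

def attention_pool(frames, scores):
    """Softmax the per-frame scores and return the weighted sum of frames.

    frames: list of equal-length feature vectors (e.g. LSTM outputs).
    scores: one scalar per frame; how they are produced is left open here.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Equal scores give equal weights, so pooling reduces to a plain average.
pooled, weights = attention_pool(frames, scores=[0.0, 0.0, 0.0])
```

With unequal scores, frames with higher scores contribute more to the pooled vector, which is the behavior an attention layer is meant to learn.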

Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors


Title	Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors
Authors	Yu Zhao, Rennong Yang, Guillaume Chevalier, Maoguo Gong
Abstract	Human activity recognition (HAR) has become a popular research topic because of its wide range of applications. With the development of deep learning, new ideas have appeared to address HAR problems. Here, a deep network architecture using residual bidirectional long short-term memory (LSTM) cells is proposed. The new network has two advantages. First, a bidirectional connection can concatenate the positive time direction (forward state) and the negative time direction (backward state). Second, residual connections between stacked cells act as highways for gradients, which can pass underlying information directly to the upper layer, effectively avoiding the vanishing gradient problem. Generally, the proposed network shows improvements on both the temporal (using bidirectional cells) and the spatial (residual connections stacked deeply) dimensions, aiming to enhance the recognition rate. When tested with the Opportunity data set and the public domain UCI data set, the accuracy was increased by 4.78% and 3.68%, respectively, compared with previously reported results. Finally, the confusion matrix of the public domain UCI data set was analyzed.
Tasks	Activity Recognition, Human Activity Recognition
Published	2017-08-22
URL	http://arxiv.org/abs/1708.08989v2
PDF	http://arxiv.org/pdf/1708.08989v2.pdf
PWC	https://paperswithcode.com/paper/deep-residual-bidir-lstm-for-human-activity
Repo	https://github.com/guillaume-chevalier/HAR-stacked-residual-bidir-LSTMs
Framework	tf

SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification


Title	SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification
Authors	Atsushi Takeda
Abstract	Deep convolutional neural networks (CNNs) achieve remarkable performance on image classification tasks. Recent studies, however, have demonstrated that generalization abilities are more important than the depth of neural networks for improving performance on image classification tasks. Herein, a new neural network called SwGridNet is proposed. A SwGridNet includes many convolutional processing units connected to one another as a grid network, so that many processing paths exist between input and output. A SwGridNet has high generalization capability because the multipath architecture has the same effect as ensemble learning. This paper presents the details of the SwGridNet network architecture. Experimental results show that SwGridNets achieve test error rates of 2.95% and 15.67% on the CIFAR-10 and CIFAR-100 classification tasks, respectively. The results indicate that SwGridNet performance approaches that of state-of-the-art deep CNNs.
Tasks	Image Classification
Published	2017-09-22
URL	http://arxiv.org/abs/1709.07646v3
PDF	http://arxiv.org/pdf/1709.07646v3.pdf
PWC	https://paperswithcode.com/paper/swgridnet-a-deep-convolutional-neural-network
Repo	https://github.com/takedarts/swgridnet
Framework	none

Provable defenses against adversarial examples via the convex outer adversarial polytope


Title	Provable defenses against adversarial examples via the convex outer adversarial polytope
Authors	Eric Wong, J. Zico Kolter
Abstract	We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.
Tasks	Adversarial Attack
Published	2017-11-02
URL	http://arxiv.org/abs/1711.00851v3
PDF	http://arxiv.org/pdf/1711.00851v3.pdf
PWC	https://paperswithcode.com/paper/provable-defenses-against-adversarial
Repo	https://github.com/fra31/mmr-universal
Framework	pytorch

Cascaded Pyramid Network for Multi-Person Pose Estimation


Title	Cascaded Pyramid Network for Multi-Person Pose Estimation
Authors	Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
Abstract	Multi-person pose estimation has improved greatly in recent years, especially with the development of convolutional neural networks. However, many challenging cases remain, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to relieve the problem posed by these “hard” keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the “simple” keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the “hard” keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge. Code (https://github.com/chenyilun95/tf-cpn.git) and the detection results are publicly available for further research.
Tasks	Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07319v2
PDF	http://arxiv.org/pdf/1711.07319v2.pdf
PWC	https://paperswithcode.com/paper/cascaded-pyramid-network-for-multi-person
Repo	https://github.com/fenglinglwb/MSPN
Framework	pytorch
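
The "19% relative improvement" quoted in the abstract above is a ratio, not a difference in average precision points. The short Python check below uses only the numbers given in the abstract. Reading the 19% against the test-challenge score of 72.1 is an assumption based on the sentence structure, and `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(new, old):
    """Return the gain of `new` over `old` as a fraction of `old`."""
    return (new - old) / old

# COCO test-challenge AP (72.1) versus the 2016 challenge result (60.5).
rel_challenge = relative_improvement(72.1, 60.5)
# The same comparison for the test-dev AP (73.0), shown for contrast.
rel_dev = relative_improvement(73.0, 60.5)
```

Here `rel_challenge` comes out at roughly 0.19, which matches the quoted figure, while `rel_dev` is slightly higher.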

Sampling Matters in Deep Embedding Learning


Title	Sampling Matters in Deep Embedding Learning
Authors	Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
Abstract	Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more informative and stable examples than traditional approaches. In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions. We evaluate our approach on the Stanford Online Products, CAR196, and the CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset for face verification. Our method achieves state-of-the-art performance on all of them.
Tasks	Face Verification, Image Retrieval, Metric Learning, Zero-Shot Learning
Published	2017-06-23
URL	http://arxiv.org/abs/1706.07567v2
PDF	http://arxiv.org/pdf/1706.07567v2.pdf
PWC	https://paperswithcode.com/paper/sampling-matters-in-deep-embedding-learning
Repo	https://github.com/Confusezius/Deep-Metric-Learning-Baselines
Framework	pytorch
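
To make the idea of a margin-based loss on embedding distances concrete, here is a minimal Python sketch of one common form: pairs from the same class are penalized when they are farther apart than a boundary minus a margin, and pairs from different classes when they are closer than the boundary plus a margin. The abstract does not give the formula, so this form, the name `margin_pair_loss`, and the parameters `alpha` and `beta` are assumptions for illustration, and the sampling strategy is not shown.

```python
import math

def margin_pair_loss(a, b, same_class, alpha=0.2, beta=1.2):
    """Hinge-style loss on the Euclidean distance between two embeddings.

    alpha is the margin and beta the distance boundary (both assumed values).
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    y = 1.0 if same_class else -1.0
    return max(0.0, alpha + y * (d - beta))

far_positive = margin_pair_loss([0.0, 0.0], [3.0, 4.0], same_class=True)
far_negative = margin_pair_loss([0.0, 0.0], [3.0, 4.0], same_class=False)
```

In this toy case the two points are 5 apart, so the same-class pair is penalized and the different-class pair incurs no loss.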

Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations


Title	Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations
Authors	Diane Bouchacourt, Ryota Tomioka, Sebastian Nowozin
Abstract	We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.
Tasks
Published	2017-05-24
URL	http://arxiv.org/abs/1705.08841v1
PDF	http://arxiv.org/pdf/1705.08841v1.pdf
PWC	https://paperswithcode.com/paper/multi-level-variational-autoencoder-learning
Repo	https://github.com/ananyahjha93/multi-level-vae
Framework	pytorch

MR Acquisition-Invariant Representation Learning


Title	MR Acquisition-Invariant Representation Learning
Authors	Wouter M. Kouw, Marco Loog, Lambertus W. Bartels, Adriënne M. Mendrik
Abstract	Voxelwise classification approaches are popular and effective methods for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, generalization of these approaches is hampered by large differences between sets of MRI scans such as differences in field strength, vendor or acquisition protocols. Due to this acquisition related variation, classifiers trained on data from a specific scanner fail or under-perform when applied to data that was acquired differently. In order to address this lack of generalization, we propose a Siamese neural network (MRAI-net) to learn a representation that minimizes the between-scanner variation, while maintaining the contrast between brain tissues necessary for brain tissue quantification. The proposed MRAI-net was evaluated on both simulated and real MRI data. After learning the MR acquisition invariant representation, any supervised classification model that uses feature vectors can be applied. In this paper, we provide a proof of principle, which shows that a linear classifier applied on the MRAI representation is able to outperform supervised convolutional neural network classifiers for tissue classification when little target training data is available.
Tasks	Representation Learning
Published	2017-09-22
URL	http://arxiv.org/abs/1709.07944v2
PDF	http://arxiv.org/pdf/1709.07944v2.pdf
PWC	https://paperswithcode.com/paper/mr-acquisition-invariant-representation
Repo	https://github.com/wmkouw/mrai-net
Framework	none

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference


Title	A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Authors	Adina Williams, Nikita Nangia, Samuel R. Bowman
Abstract	This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation.
Tasks	Domain Adaptation, Natural Language Inference
Published	2017-04-18
URL	http://arxiv.org/abs/1704.05426v4
PDF	http://arxiv.org/pdf/1704.05426v4.pdf
PWC	https://paperswithcode.com/paper/a-broad-coverage-challenge-corpus-for
Repo	https://github.com/nyu-mll/multiNLI
Framework	tf

Representation Learning on Graphs: Methods and Applications


Title	Representation Learning on Graphs: Methods and Applications
Authors	William L. Hamilton, Rex Ying, Jure Leskovec
Abstract	Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
Tasks	Dimensionality Reduction, Representation Learning
Published	2017-09-17
URL	http://arxiv.org/abs/1709.05584v3
PDF	http://arxiv.org/pdf/1709.05584v3.pdf
PWC	https://paperswithcode.com/paper/representation-learning-on-graphs-methods-and
Repo	https://github.com/dariarom94/Material_database
Framework	none

Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy


Title	Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy
Authors	Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
Abstract	Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a larger vocabulary because the training complexity and decoding complexity increase in proportion to the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In this paper, we propose to select phrases that contain out-of-vocabulary words using the statistical approach of branching entropy. This allows the proposed NMT system to be applied to a translation task of any language pair without any language-specific knowledge about technical term identification. The selected phrases are then replaced with tokens during training and post-translated by the phrase translation table of SMT. Evaluation on Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and English-to-Japanese patent sentence translation demonstrated the effectiveness of phrases selected with branching entropy: the proposed NMT model achieves a substantial improvement over a baseline NMT model without our proposed technique. Moreover, the proposed NMT model reduces the number of under-translation errors to around half of those made by the baseline NMT model.
Tasks	Machine Translation
Published	2017-04-14
URL	http://arxiv.org/abs/1704.04520v6
PDF	http://arxiv.org/pdf/1704.04520v6.pdf
PWC	https://paperswithcode.com/paper/neural-machine-translation-model-with-a-large
Repo	https://github.com/FulstatResearch/Machine-Translation-Language-Model
Framework	tf

Incremental Learning of Object Detectors without Catastrophic Forgetting


Title	Incremental Learning of Object Detectors without Catastrophic Forgetting
Authors	Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari
Abstract	Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data. They suffer from “catastrophic forgetting” - an abrupt degradation of performance on the original set of classes, when the training objective is adapted to the new classes. We present a method to address this issue, and learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available. The core of our proposed solution is a loss function to balance the interplay between predictions on the new classes and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the updated networks. This incremental learning can be performed multiple times, for a new set of classes in each step, with a moderate drop in performance compared to the baseline network trained on the ensemble of data. We present object detection results on the PASCAL VOC 2007 and COCO datasets, along with a detailed empirical analysis of the approach.
Tasks	Object Detection
Published	2017-08-23
URL	http://arxiv.org/abs/1708.06977v1
PDF	http://arxiv.org/pdf/1708.06977v1.pdf
PWC	https://paperswithcode.com/paper/incremental-learning-of-object-detectors
Repo	https://github.com/Ze-Yang/Context-Transformer
Framework	pytorch
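
The distillation idea in the abstract above, keeping the updated network's responses for old classes close to those of the original network, can be sketched as a simple discrepancy term. The Python below uses a mean squared difference over old-class outputs. The abstract does not specify the exact loss, so this choice, the name `distillation_loss`, and the toy values are assumptions for illustration only.

```python
def distillation_loss(old_logits, new_logits, num_old_classes):
    """Mean squared gap between the two networks' responses on old classes."""
    gaps = [
        (new_logits[c] - old_logits[c]) ** 2
        for c in range(num_old_classes)
    ]
    return sum(gaps) / num_old_classes

old = [2.0, -1.0, 0.5]        # frozen original network, 3 old classes
new = [2.0, -1.0, 1.5, 0.3]   # updated network, 3 old classes + 1 new class
loss = distillation_loss(old, new, num_old_classes=3)
```

Only the first three outputs of the updated network enter the term; the output for the new class would be trained by the usual detection loss instead.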

Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold


Title	Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold
Authors	Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith
Abstract	We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates. Built using an extension to the segmental RNN that emphasizes recall, our basic system achieves competitive performance without any calls to a syntactic parser. We then introduce a method that uses phrase-syntactic annotations from the Penn Treebank during training only, through a multitask objective; no parsing is required at training or test time. This “syntactic scaffold” offers a cheaper alternative to traditional syntactic pipelining, and achieves state-of-the-art performance.
Tasks	Semantic Parsing
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09528v1
PDF	http://arxiv.org/pdf/1706.09528v1.pdf
PWC	https://paperswithcode.com/paper/frame-semantic-parsing-with-softmax-margin
Repo	https://github.com/swabhs/open-sesame
Framework	none