July 30, 2019

Paper Group AWR 9

SGDLibrary: A MATLAB library for stochastic gradient descent algorithms. Naive Bayes Classification for Subset Selection. Attention-Based Models for Text-Dependent Speaker Verification. Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors. SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Clas …

SGDLibrary: A MATLAB library for stochastic gradient descent algorithms

Title SGDLibrary: A MATLAB library for stochastic gradient descent algorithms
Authors Hiroyuki Kasai
Abstract We consider the problem of finding the minimizer of a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ of the finite-sum form $\min_{w} f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$. This problem has been studied intensively in recent years in the field of machine learning (ML). One promising approach for large-scale data is to use a stochastic optimization algorithm to solve the problem. SGDLibrary is a readable, flexible and extensible pure-MATLAB library of stochastic optimization algorithms. The purpose of the library is to provide researchers and implementers with a comprehensive evaluation environment for the use of these algorithms on various ML problems.
Tasks Stochastic Optimization
Published 2017-10-27
URL http://arxiv.org/abs/1710.10951v2
PDF http://arxiv.org/pdf/1710.10951v2.pdf
PWC https://paperswithcode.com/paper/sgdlibrary-a-matlab-library-for-stochastic
Repo https://github.com/hiroyuki-kasai/SGDLibrary
Framework none
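
The finite-sum objective in the SGDLibrary abstract can be illustrated with a minimal sketch of plain stochastic gradient descent. This is not code from SGDLibrary (which is written in MATLAB); the function name `sgd_least_squares`, the one-dimensional least-squares components, the step size, and the toy data are all assumptions chosen for illustration.

```python
import random


def sgd_least_squares(xs, ys, step=0.02, epochs=200, seed=0):
    """Minimise f(w) = (1/n) * sum_i f_i(w), with f_i(w) = 0.5 * (w * x_i - y_i)**2."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the components f_i in random order
        for i in indices:
            grad_i = (w * xs[i] - ys[i]) * xs[i]  # gradient of the single term f_i
            w -= step * grad_i  # step along one component, not the full sum
    return w


# Hypothetical toy data generated by y = 2x, so the minimiser is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w_hat = sgd_least_squares(xs, ys)
print(round(w_hat, 3))
```

Each update touches a single component $f_i$, which is what makes the approach attractive when $n$ is large.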

Naive Bayes Classification for Subset Selection

Title Naive Bayes Classification for Subset Selection
Authors Luca Mossina, Emmanuel Rachelson
Abstract This article focuses on the question of learning how to automatically select a subset of items among a bigger set. We introduce a methodology for the inference of ensembles of discrete values, based on the Naive Bayes assumption. Our motivation stems from practical use cases where one wishes to predict an unordered set of (possibly interdependent) values from a set of observed features. This problem can be considered in the context of Multi-label Classification (MLC) where such values are seen as labels associated with continuous or discrete features. We introduce the NaiBX algorithm, an extension of Naive Bayes classification into the multi-label domain, discuss its properties and evaluate our approach on real-world problems.
Tasks Multi-Label Classification
Published 2017-07-19
URL http://arxiv.org/abs/1707.06142v1
PDF http://arxiv.org/pdf/1707.06142v1.pdf
PWC https://paperswithcode.com/paper/naive-bayes-classification-for-subset
Repo https://github.com/SuReLI/naibx-mlc
Framework none

Attention-Based Models for Text-Dependent Speaker Verification

Title Attention-Based Models for Text-Dependent Speaker Verification
Authors F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Abstract Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for the problem of sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies and their variants of the attention layer, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by 14% relative to our non-attention LSTM baseline model.
Tasks Image Captioning, Machine Translation, Speaker Recognition, Speaker Verification, Speech Recognition, Text-Dependent Speaker Verification
Published 2017-10-28
URL http://arxiv.org/abs/1710.10470v3
PDF http://arxiv.org/pdf/1710.10470v3.pdf
PWC https://paperswithcode.com/paper/attention-based-models-for-text-dependent
Repo https://github.com/liyongze/lstm_speaker_verification
Framework tf
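
The sequence summarization described in the speaker verification abstract can be sketched as attention-weighted pooling: a score per frame is turned into softmax weights, and the frame vectors are averaged with those weights. This is a minimal sketch, not the authors' model. In the paper the scores are learned from the LSTM outputs; here they are simply passed in, and the name `attention_pool` and the toy vectors are assumptions.

```python
import math


def attention_pool(frames, scores):
    """Collapse a sequence of frame vectors into one vector using softmax weights."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Hypothetical frame-level outputs, one vector per frame.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores reduce to plain averaging
pooled, weights = attention_pool(frames, scores)
print(pooled, weights)
```

With unequal scores, frames with higher scores contribute more to the pooled vector, which is the mechanism the abstract credits for the improvement over a non-attention baseline.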

Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors

Title Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors
Authors Yu Zhao, Rennong Yang, Guillaume Chevalier, Maoguo Gong
Abstract Human activity recognition (HAR) has become a popular topic in research because of its wide application. With the development of deep learning, new ideas have appeared to address HAR problems. Here, a deep network architecture using residual bidirectional long short-term memory (LSTM) cells is proposed. The new network has two advantages. First, a bidirectional connection can concatenate the positive time direction (forward state) and the negative time direction (backward state). Second, residual connections between stacked cells act as highways for gradients, which can pass underlying information directly to the upper layer, effectively avoiding the vanishing gradient problem. Generally, the proposed network shows improvements on both the temporal (using bidirectional cells) and the spatial (residual connections stacked deeply) dimensions, aiming to enhance the recognition rate. When tested with the Opportunity data set and the public domain UCI data set, the accuracy was increased by 4.78% and 3.68%, respectively, compared with previously reported results. Finally, the confusion matrix of the public domain UCI data set was analyzed.
Tasks Activity Recognition, Human Activity Recognition
Published 2017-08-22
URL http://arxiv.org/abs/1708.08989v2
PDF http://arxiv.org/pdf/1708.08989v2.pdf
PWC https://paperswithcode.com/paper/deep-residual-bidir-lstm-for-human-activity
Repo https://github.com/guillaume-chevalier/HAR-stacked-residual-bidir-LSTMs
Framework tf

SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification

Title SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification
Authors Atsushi Takeda
Abstract Deep convolutional neural networks (CNNs) achieve remarkable performance on image classification tasks. Recent studies, however, have demonstrated that generalization abilities are more important than the depth of neural networks for improving performance on image classification tasks. Herein, a new neural network called SwGridNet is proposed. A SwGridNet includes many convolutional processing units which connect mutually as a grid network where many processing paths exist between input and output. A SwGridNet has high generalization capability because the multipath architecture has the same effect as ensemble learning. Details of the SwGridNet network architecture are presented in this paper. Experimental results show that SwGridNets achieve test error rates of 2.95% and 15.67% on the CIFAR-10 and CIFAR-100 classification tasks, respectively. The results indicate that the SwGridNet performance approximates that of state-of-the-art deep CNNs.
Tasks Image Classification
Published 2017-09-22
URL http://arxiv.org/abs/1709.07646v3
PDF http://arxiv.org/pdf/1709.07646v3.pdf
PWC https://paperswithcode.com/paper/swgridnet-a-deep-convolutional-neural-network
Repo https://github.com/takedarts/swgridnet
Framework none

Provable defenses against adversarial examples via the convex outer adversarial polytope

Title Provable defenses against adversarial examples via the convex outer adversarial polytope
Authors Eric Wong, J. Zico Kolter
Abstract We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.
Tasks Adversarial Attack
Published 2017-11-02
URL http://arxiv.org/abs/1711.00851v3
PDF http://arxiv.org/pdf/1711.00851v3.pdf
PWC https://paperswithcode.com/paper/provable-defenses-against-adversarial
Repo https://github.com/fra31/mmr-universal
Framework pytorch
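
The idea of bounding the activations reachable under a norm-bounded perturbation can be illustrated with a much simpler technique than the one in the paper. The sketch below uses interval (box) bounds for one linear layer followed by a ReLU. This is a looser outer approximation than the convex relaxation and dual network the abstract describes, and it is not the authors' method; the function name and the toy weights are assumptions.

```python
def interval_bounds_linear_relu(weights, bias, x, eps):
    """Bounds on relu(W x' + b) for every x' with max_i |x'_i - x_i| <= eps."""
    lower, upper = [], []
    for row, b in zip(weights, bias):
        centre = sum(w * xi for w, xi in zip(row, x)) + b
        radius = eps * sum(abs(w) for w in row)  # worst-case shift of the pre-activation
        lower.append(max(0.0, centre - radius))  # ReLU is monotone, so bounds pass through
        upper.append(max(0.0, centre + radius))
    return lower, upper


# Hypothetical two-unit layer and input.
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -1.0]
lower, upper = interval_bounds_linear_relu(W, b, x=[1.0, 0.0], eps=0.1)
print(lower, upper)
```

Any activation the perturbed input can produce lies inside these bounds, so a loss that is small over the whole box is small for every admissible attack. Tighter outer regions, such as the one in the paper, give less pessimistic guarantees.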

Cascaded Pyramid Network for Multi-Person Pose Estimation

Title Cascaded Pyramid Network for Multi-Person Pose Estimation
Authors Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
Abstract The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural networks. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which aims to relieve the problem from these “hard” keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the “simple” keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the “hard” keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge. Code (https://github.com/chenyilun95/tf-cpn.git) and the detection results are publicly available for further research.
Tasks Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation
Published 2017-11-20
URL http://arxiv.org/abs/1711.07319v2
PDF http://arxiv.org/pdf/1711.07319v2.pdf
PWC https://paperswithcode.com/paper/cascaded-pyramid-network-for-multi-person
Repo https://github.com/fenglinglwb/MSPN
Framework pytorch
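
The CPN abstract names an online hard keypoint mining loss without spelling it out. One plausible reading, shown here as a minimal sketch, is to back-propagate only the largest per-keypoint losses so that training concentrates on the "hard" keypoints. The selection rule, the name `hard_keypoint_loss`, the value of `top_k`, and the example losses are assumptions rather than details taken from the source.

```python
def hard_keypoint_loss(per_keypoint_losses, top_k):
    """Average only the top_k largest per-keypoint losses."""
    hardest = sorted(per_keypoint_losses, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


# Hypothetical per-keypoint losses for one person instance.
losses = [0.1, 0.9, 0.2, 0.7, 0.05]
loss = hard_keypoint_loss(losses, top_k=2)
print(loss)
```

Keypoints that are already well localized fall outside the top `top_k` and contribute nothing to the update for that example.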

Sampling Matters in Deep Embedding Learning

Title Sampling Matters in Deep Embedding Learning
Authors Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl
Abstract Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more informative and stable examples than traditional approaches. In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions. We evaluate our approach on the Stanford Online Products, CAR196, and the CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset for face verification. Our method achieves state-of-the-art performance on all of them.
Tasks Face Verification, Image Retrieval, Metric Learning, Zero-Shot Learning
Published 2017-06-23
URL http://arxiv.org/abs/1706.07567v2
PDF http://arxiv.org/pdf/1706.07567v2.pdf
PWC https://paperswithcode.com/paper/sampling-matters-in-deep-embedding-learning
Repo https://github.com/Confusezius/Deep-Metric-Learning-Baselines
Framework pytorch
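
The abstract of the sampling paper mentions a simple margin based loss but does not give its formula. A minimal sketch of one hinge-style form on pairwise distances follows. The exact expression, the boundary `beta`, the margin `alpha`, and their values are assumptions for illustration and should be checked against the paper.

```python
import math


def margin_loss(anchor, other, is_positive, alpha=0.2, beta=1.2):
    """Hinge-style loss on the Euclidean distance between two embeddings."""
    dist = math.sqrt(sum((a - o) ** 2 for a, o in zip(anchor, other)))
    sign = 1.0 if is_positive else -1.0
    # Positive pairs are penalised beyond beta - alpha, negatives inside beta + alpha.
    return max(0.0, alpha + sign * (dist - beta))


# Hypothetical embeddings that are 5 units apart.
pos_loss = margin_loss([0.0, 0.0], [3.0, 4.0], is_positive=True)
neg_loss = margin_loss([0.0, 0.0], [3.0, 4.0], is_positive=False)
print(pos_loss, neg_loss)
```

A distant positive pair is penalised while an equally distant negative pair incurs no loss. Which pairs are fed to such a loss is the sampling question the paper argues matters as much as the loss itself.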

Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations

Title Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations
Authors Diane Bouchacourt, Ryota Tomioka, Sebastian Nowozin
Abstract We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.
Tasks
Published 2017-05-24
URL http://arxiv.org/abs/1705.08841v1
PDF http://arxiv.org/pdf/1705.08841v1.pdf
PWC https://paperswithcode.com/paper/multi-level-variational-autoencoder-learning
Repo https://github.com/ananyahjha93/multi-level-vae
Framework pytorch

MR Acquisition-Invariant Representation Learning

Title MR Acquisition-Invariant Representation Learning
Authors Wouter M. Kouw, Marco Loog, Lambertus W. Bartels, Adriënne M. Mendrik
Abstract Voxelwise classification approaches are popular and effective methods for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, generalization of these approaches is hampered by large differences between sets of MRI scans such as differences in field strength, vendor or acquisition protocols. Due to this acquisition related variation, classifiers trained on data from a specific scanner fail or under-perform when applied to data that was acquired differently. In order to address this lack of generalization, we propose a Siamese neural network (MRAI-net) to learn a representation that minimizes the between-scanner variation, while maintaining the contrast between brain tissues necessary for brain tissue quantification. The proposed MRAI-net was evaluated on both simulated and real MRI data. After learning the MR acquisition invariant representation, any supervised classification model that uses feature vectors can be applied. In this paper, we provide a proof of principle, which shows that a linear classifier applied on the MRAI representation is able to outperform supervised convolutional neural network classifiers for tissue classification when little target training data is available.
Tasks Representation Learning
Published 2017-09-22
URL http://arxiv.org/abs/1709.07944v2
PDF http://arxiv.org/pdf/1709.07944v2.pdf
PWC https://paperswithcode.com/paper/mr-acquisition-invariant-representation
Repo https://github.com/wmkouw/mrai-net
Framework none

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Title A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Authors Adina Williams, Nikita Nangia, Samuel R. Bowman
Abstract This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation.
Tasks Domain Adaptation, Natural Language Inference
Published 2017-04-18
URL http://arxiv.org/abs/1704.05426v4
PDF http://arxiv.org/pdf/1704.05426v4.pdf
PWC https://paperswithcode.com/paper/a-broad-coverage-challenge-corpus-for
Repo https://github.com/nyu-mll/multiNLI
Framework tf

Representation Learning on Graphs: Methods and Applications

Title Representation Learning on Graphs: Methods and Applications
Authors William L. Hamilton, Rex Ying, Jure Leskovec
Abstract Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.
Tasks Dimensionality Reduction, Representation Learning
Published 2017-09-17
URL http://arxiv.org/abs/1709.05584v3
PDF http://arxiv.org/pdf/1709.05584v3.pdf
PWC https://paperswithcode.com/paper/representation-learning-on-graphs-methods-and
Repo https://github.com/dariarom94/Material_database
Framework none

Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy

Title Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy
Authors Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto
Abstract Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a larger vocabulary because the training complexity and decoding complexity increase proportionally with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In this paper, we propose to select phrases that contain out-of-vocabulary words using the statistical approach of branching entropy. This allows the proposed NMT system to be applied to a translation task of any language pair without any language-specific knowledge about technical term identification. The selected phrases are then replaced with tokens during training and post-translated by the phrase translation table of SMT. Evaluation on Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and English-to-Japanese patent sentence translation proved the effectiveness of phrases selected with branching entropy, where the proposed NMT model achieves a substantial improvement over a baseline NMT model without our proposed technique. Moreover, the proposed NMT model reduces the number of under-translation errors made by the baseline NMT model to around half.
Tasks Machine Translation
Published 2017-04-14
URL http://arxiv.org/abs/1704.04520v6
PDF http://arxiv.org/pdf/1704.04520v6.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-model-with-a-large
Repo https://github.com/FulstatResearch/Machine-Translation-Language-Model
Framework tf
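
Branching entropy, the statistic used in the NMT abstract to select phrases, measures how unpredictable the next token is after a given token sequence. The sketch below computes the right branching entropy from raw counts. It only computes the quantity; how the paper thresholds it to pick phrase boundaries is not stated in the abstract, and the function name and toy corpus are assumptions.

```python
import math
from collections import Counter


def right_branching_entropy(sentences, prefix):
    """Entropy (in bits) of the token that follows `prefix` across a tokenised corpus."""
    followers = Counter()
    n = len(prefix)
    for tokens in sentences:
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == prefix:
                followers[tokens[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0  # prefix never occurs with a following token
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


# Hypothetical tokenised corpus.
corpus = [
    ["fuel", "cell", "stack"],
    ["fuel", "cell", "system"],
    ["fuel", "tank"],
]
h_cell = right_branching_entropy(corpus, ["fuel", "cell"])
h_fuel = right_branching_entropy(corpus, ["fuel"])
print(h_cell, h_fuel)
```

Low entropy after a sequence suggests the following token is still part of the same unit, while a jump in entropy suggests a phrase boundary. That property is what allows phrases to be selected without language-specific knowledge.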

Incremental Learning of Object Detectors without Catastrophic Forgetting

Title Incremental Learning of Object Detectors without Catastrophic Forgetting
Authors Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari
Abstract Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data. They suffer from “catastrophic forgetting” - an abrupt degradation of performance on the original set of classes, when the training objective is adapted to the new classes. We present a method to address this issue, and learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available. The core of our proposed solution is a loss function to balance the interplay between predictions on the new classes and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the updated networks. This incremental learning can be performed multiple times, for a new set of classes in each step, with a moderate drop in performance compared to the baseline network trained on the ensemble of data. We present object detection results on the PASCAL VOC 2007 and COCO datasets, along with a detailed empirical analysis of the approach.
Tasks Object Detection
Published 2017-08-23
URL http://arxiv.org/abs/1708.06977v1
PDF http://arxiv.org/pdf/1708.06977v1.pdf
PWC https://paperswithcode.com/paper/incremental-learning-of-object-detectors
Repo https://github.com/Ze-Yang/Context-Transformer
Framework pytorch
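
The distillation term in the incremental learning abstract penalises the gap between the original and updated networks' responses on the old classes. A minimal sketch follows, assuming a plain mean squared difference over old-class outputs. The abstract does not specify the discrepancy measure, so this choice, the function name, and the example values are assumptions.

```python
def distillation_loss(old_logits, new_logits, num_old_classes):
    """Mean squared gap between the two networks' responses on the old classes."""
    gaps = [
        (new_logits[c] - old_logits[c]) ** 2
        for c in range(num_old_classes)
    ]
    return sum(gaps) / num_old_classes


# Hypothetical responses for one detection proposal.
old = [2.0, -1.0]        # frozen original network, two old classes
new = [1.5, -1.0, 0.7]   # updated network, old classes plus one new class
loss = distillation_loss(old, new, num_old_classes=2)
print(loss)
```

The new-class output is ignored by this term. In the full objective it would be trained by the standard detection loss, and the two terms are balanced against each other.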

Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold

Title Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold
Authors Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith
Abstract We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates. Built using an extension to the segmental RNN that emphasizes recall, our basic system achieves competitive performance without any calls to a syntactic parser. We then introduce a method that uses phrase-syntactic annotations from the Penn Treebank during training only, through a multitask objective; no parsing is required at training or test time. This “syntactic scaffold” offers a cheaper alternative to traditional syntactic pipelining, and achieves state-of-the-art performance.
Tasks Semantic Parsing
Published 2017-06-29
URL http://arxiv.org/abs/1706.09528v1
PDF http://arxiv.org/pdf/1706.09528v1.pdf
PWC https://paperswithcode.com/paper/frame-semantic-parsing-with-softmax-margin
Repo https://github.com/swabhs/open-sesame
Framework none