Paper Group AWR 9
SGDLibrary: A MATLAB library for stochastic gradient descent algorithms
Title | SGDLibrary: A MATLAB library for stochastic gradient descent algorithms |
Authors | Hiroyuki Kasai |
Abstract | We consider the problem of finding the minimizer of a function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ of the finite-sum form $\min_w f(w) = \frac{1}{n}\sum_{i=1}^n f_i(w)$. This problem has been studied intensively in recent years in the field of machine learning (ML). One promising approach for large-scale data is to use a stochastic optimization algorithm to solve the problem. SGDLibrary is a readable, flexible and extensible pure-MATLAB library of a collection of stochastic optimization algorithms. The purpose of the library is to provide researchers and implementers with a comprehensive evaluation environment for the use of these algorithms on various ML problems. |
Tasks | Stochastic Optimization |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10951v2 |
PDF | http://arxiv.org/pdf/1710.10951v2.pdf |
PWC | https://paperswithcode.com/paper/sgdlibrary-a-matlab-library-for-stochastic |
Repo | https://github.com/hiroyuki-kasai/SGDLibrary |
Framework | none |
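To make the finite-sum setting in the abstract above concrete, here is a minimal Python sketch of plain stochastic gradient descent on a one-dimensional least-squares problem. It is not SGDLibrary code (the library itself is MATLAB); the function name `sgd_least_squares`, the step size, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
import random

def sgd_least_squares(xs, ys, step=0.05, epochs=200, seed=0):
    """Minimize f(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i)**2 with plain SGD.

    Hypothetical helper for illustration; a scalar w keeps the sketch short.
    """
    rng = random.Random(seed)
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        # One epoch = n stochastic steps, each using a single sampled f_i.
        for _ in range(n):
            i = rng.randrange(n)
            grad_i = (w * xs[i] - ys[i]) * xs[i]  # gradient of f_i at w
            w -= step * grad_i
    return w

# Toy data generated by y = 3x, so the minimizer of f is w = 3.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]
w_hat = sgd_least_squares(xs, ys)
```

Each update touches only one term f_i, which is what keeps the per-step cost independent of n in the large-scale setting the abstract describes.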
Naive Bayes Classification for Subset Selection
Title | Naive Bayes Classification for Subset Selection |
Authors | Luca Mossina, Emmanuel Rachelson |
Abstract | This article focuses on the question of learning how to automatically select a subset of items among a bigger set. We introduce a methodology for the inference of ensembles of discrete values, based on the Naive Bayes assumption. Our motivation stems from practical use cases where one wishes to predict an unordered set of (possibly interdependent) values from a set of observed features. This problem can be considered in the context of Multi-label Classification (MLC) where such values are seen as labels associated with continuous or discrete features. We introduce the NaiBX algorithm, an extension of Naive Bayes classification into the multi-label domain, discuss its properties and evaluate our approach on real-world problems. |
Tasks | Multi-Label Classification |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06142v1 |
PDF | http://arxiv.org/pdf/1707.06142v1.pdf |
PWC | https://paperswithcode.com/paper/naive-bayes-classification-for-subset |
Repo | https://github.com/SuReLI/naibx-mlc |
Framework | none |
Attention-Based Models for Text-Dependent Speaker Verification
Title | Attention-Based Models for Text-Dependent Speaker Verification |
Authors | F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan |
Abstract | Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence. In this paper, we analyze the use of attention mechanisms for the problem of sequence summarization in our end-to-end text-dependent speaker recognition system. We explore different topologies of the attention layer and their variants, and compare different pooling methods on the attention weights. Ultimately, we show that attention-based models can improve the Equal Error Rate (EER) of our speaker verification system by a relative 14% compared to our non-attention LSTM baseline model. |
Tasks | Image Captioning, Machine Translation, Speaker Recognition, Speaker Verification, Speech Recognition, Text-Dependent Speaker Verification |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10470v3 |
PDF | http://arxiv.org/pdf/1710.10470v3.pdf |
PWC | https://paperswithcode.com/paper/attention-based-models-for-text-dependent |
Repo | https://github.com/liyongze/lstm_speaker_verification |
Framework | tf |
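The sequence summarization idea in the abstract above can be sketched as softmax-weighted pooling over per-frame vectors. This is a generic illustration, not the authors' implementation: the linear scoring function and the names `attention_pool` and `score_weights` are assumptions, and the abstract indicates that several attention-layer variants were explored.

```python
import math

def attention_pool(hidden_states, score_weights):
    """Summarize a sequence of vectors into one vector via softmax attention.

    hidden_states: list of T vectors (for example, per-frame LSTM outputs).
    score_weights: vector for a linear score e_t = w . h_t (an assumed,
                   illustrative scoring function).
    """
    scores = [sum(w * h for w, h in zip(score_weights, h_t))
              for h_t in hidden_states]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # attention weights, sum to 1
    dim = len(hidden_states[0])
    pooled = [sum(a * h_t[d] for a, h_t in zip(alphas, hidden_states))
              for d in range(dim)]
    return alphas, pooled

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# With all-zero score weights every frame gets the same weight,
# so the pooled vector reduces to the plain average of the frames.
alphas, pooled = attention_pool(frames, [0.0, 0.0])
```

A non-attention baseline would typically use only the last frame or a uniform average; learned scores let the model weight informative frames more heavily.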
Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors
Title | Deep Residual Bidir-LSTM for Human Activity Recognition Using Wearable Sensors |
Authors | Yu Zhao, Rennong Yang, Guillaume Chevalier, Maoguo Gong |
Abstract | Human activity recognition (HAR) has become a popular topic in research because of its wide application. With the development of deep learning, new ideas have appeared to address HAR problems. Here, a deep network architecture using residual bidirectional long short-term memory (LSTM) cells is proposed. The new network has two advantages. First, a bidirectional connection can concatenate the positive time direction (forward state) and the negative time direction (backward state). Second, residual connections between stacked cells act as highways for gradients, which can pass underlying information directly to the upper layer, effectively avoiding the vanishing gradient problem. Generally, the proposed network shows improvements on both the temporal (using bidirectional cells) and the spatial (residual connections stacked deeply) dimensions, aiming to enhance the recognition rate. When tested with the Opportunity data set and the public domain UCI data set, the accuracy was increased by 4.78% and 3.68%, respectively, compared with previously reported results. Finally, the confusion matrix of the public domain UCI data set was analyzed. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.08989v2 |
PDF | http://arxiv.org/pdf/1708.08989v2.pdf |
PWC | https://paperswithcode.com/paper/deep-residual-bidir-lstm-for-human-activity |
Repo | https://github.com/guillaume-chevalier/HAR-stacked-residual-bidir-LSTMs |
Framework | tf |
SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification
Title | SwGridNet: A Deep Convolutional Neural Network based on Grid Topology for Image Classification |
Authors | Atsushi Takeda |
Abstract | Deep convolutional neural networks (CNNs) achieve remarkable performance on image classification tasks. Recent studies, however, have demonstrated that generalization abilities are more important than the depth of neural networks for improving performance on image classification tasks. Herein, a new neural network called SwGridNet is proposed. A SwGridNet includes many convolutional processing units which connect mutually as a grid network where many processing paths exist between input and output. A SwGridNet has high generalization capability because the multipath architecture has the same effect as ensemble learning. This paper presents the details of the SwGridNet network architecture. Experimental results show that SwGridNets achieve test error rates of 2.95% and 15.67% on the CIFAR-10 and CIFAR-100 classification tasks, respectively. The results indicate that the SwGridNet performance approximates that of state-of-the-art deep CNNs. |
Tasks | Image Classification |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07646v3 |
PDF | http://arxiv.org/pdf/1709.07646v3.pdf |
PWC | https://paperswithcode.com/paper/swgridnet-a-deep-convolutional-neural-network |
Repo | https://github.com/takedarts/swgridnet |
Framework | none |
Provable defenses against adversarial examples via the convex outer adversarial polytope
Title | Provable defenses against adversarial examples via the convex outer adversarial polytope |
Authors | Eric Wong, J. Zico Kolter |
Abstract | We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial. |
Tasks | Adversarial Attack |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00851v3 |
PDF | http://arxiv.org/pdf/1711.00851v3.pdf |
PWC | https://paperswithcode.com/paper/provable-defenses-against-adversarial |
Repo | https://github.com/fra31/mmr-universal |
Framework | pytorch |
Cascaded Pyramid Network for Multi-Person Pose Estimation
Title | Cascaded Pyramid Network for Multi-Person Pose Estimation |
Authors | Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun |
Abstract | Multi-person pose estimation has improved considerably in recent years, especially with the development of convolutional neural networks. However, there still exist many challenging cases, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which aims to relieve the problem posed by these “hard” keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the “simple” keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the “hard” keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge. Code (https://github.com/chenyilun95/tf-cpn.git) and the detection results are publicly available for further research. |
Tasks | Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07319v2 |
PDF | http://arxiv.org/pdf/1711.07319v2.pdf |
PWC | https://paperswithcode.com/paper/cascaded-pyramid-network-for-multi-person |
Repo | https://github.com/fenglinglwb/MSPN |
Framework | pytorch |
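The "19% relative improvement" quoted in the abstract above can be checked with a short calculation. The sketch below only uses the three average-precision figures given in the abstract; the helper name `relative_improvement` is hypothetical. The abstract does not say which of the two CPN scores the 19% refers to, but the arithmetic suggests it is the test-challenge figure.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline

# Average precision values quoted in the abstract.
baseline_2016 = 60.5
challenge_gain = relative_improvement(72.1, baseline_2016)  # test-challenge
test_dev_gain = relative_improvement(73.0, baseline_2016)   # test-dev

print(f"test-challenge: {challenge_gain:.1%}")
print(f"test-dev:       {test_dev_gain:.1%}")
```

The test-challenge gain comes out at roughly 19%, while the test-dev gain is closer to 21%.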
Sampling Matters in Deep Embedding Learning
Title | Sampling Matters in Deep Embedding Learning |
Authors | Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, Philipp Krähenbühl |
Abstract | Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling, which selects more informative and stable examples than traditional approaches. In addition, we show that a simple margin based loss is sufficient to outperform all other loss functions. We evaluate our approach on the Stanford Online Products, CAR196, and the CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset for face verification. Our method achieves state-of-the-art performance on all of them. |
Tasks | Face Verification, Image Retrieval, Metric Learning, Zero-Shot Learning |
Published | 2017-06-23 |
URL | http://arxiv.org/abs/1706.07567v2 |
PDF | http://arxiv.org/pdf/1706.07567v2.pdf |
PWC | https://paperswithcode.com/paper/sampling-matters-in-deep-embedding-learning |
Repo | https://github.com/Confusezius/Deep-Metric-Learning-Baselines |
Framework | pytorch |
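For readers unfamiliar with the loss functions named in the abstract above, the following Python sketch shows a standard triplet loss next to one common form of a pairwise margin-based loss. The abstract does not give formulas, so the exact form of `margin_pair_loss` and the default values of `margin`, `alpha`, and `beta` are assumptions for illustration; distance weighted sampling itself is not shown.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def margin_pair_loss(u, v, same_class, alpha=0.2, beta=1.2):
    """Pairwise hinge around a distance boundary beta (assumed form).

    Same-class pairs are penalized when farther apart than beta - alpha;
    different-class pairs are penalized when closer than beta + alpha.
    """
    y = 1.0 if same_class else -1.0
    return max(0.0, alpha + y * (euclidean(u, v) - beta))

origin, near, far = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
easy_triplet = triplet_loss(origin, near, far)   # already well separated
hard_triplet = triplet_loss(origin, far, near)   # positive farther than negative
```

Easy triplets such as `easy_triplet` contribute zero loss and therefore no gradient, which is one reason the choice of training examples matters as much as the loss itself.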
Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations
Title | Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations |
Authors | Diane Bouchacourt, Ryota Tomioka, Sebastian Nowozin |
Abstract | We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups. |
Tasks | |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08841v1 |
PDF | http://arxiv.org/pdf/1705.08841v1.pdf |
PWC | https://paperswithcode.com/paper/multi-level-variational-autoencoder-learning |
Repo | https://github.com/ananyahjha93/multi-level-vae |
Framework | pytorch |
MR Acquisition-Invariant Representation Learning
Title | MR Acquisition-Invariant Representation Learning |
Authors | Wouter M. Kouw, Marco Loog, Lambertus W. Bartels, Adriënne M. Mendrik |
Abstract | Voxelwise classification approaches are popular and effective methods for tissue quantification in brain magnetic resonance imaging (MRI) scans. However, generalization of these approaches is hampered by large differences between sets of MRI scans such as differences in field strength, vendor or acquisition protocols. Due to this acquisition related variation, classifiers trained on data from a specific scanner fail or under-perform when applied to data that was acquired differently. In order to address this lack of generalization, we propose a Siamese neural network (MRAI-net) to learn a representation that minimizes the between-scanner variation, while maintaining the contrast between brain tissues necessary for brain tissue quantification. The proposed MRAI-net was evaluated on both simulated and real MRI data. After learning the MR acquisition invariant representation, any supervised classification model that uses feature vectors can be applied. In this paper, we provide a proof of principle, which shows that a linear classifier applied on the MRAI representation is able to outperform supervised convolutional neural network classifiers for tissue classification when little target training data is available. |
Tasks | Representation Learning |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07944v2 |
PDF | http://arxiv.org/pdf/1709.07944v2.pdf |
PWC | https://paperswithcode.com/paper/mr-acquisition-invariant-representation |
Repo | https://github.com/wmkouw/mrai-net |
Framework | none |
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Title | A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference |
Authors | Adina Williams, Nikita Nangia, Samuel R. Bowman |
Abstract | This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation. |
Tasks | Domain Adaptation, Natural Language Inference |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05426v4 |
PDF | http://arxiv.org/pdf/1704.05426v4.pdf |
PWC | https://paperswithcode.com/paper/a-broad-coverage-challenge-corpus-for |
Repo | https://github.com/nyu-mll/multiNLI |
Framework | tf |
Representation Learning on Graphs: Methods and Applications
Title | Representation Learning on Graphs: Methods and Applications |
Authors | William L. Hamilton, Rex Ying, Jure Leskovec |
Abstract | Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph neural networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work. |
Tasks | Dimensionality Reduction, Representation Learning |
Published | 2017-09-17 |
URL | http://arxiv.org/abs/1709.05584v3 |
PDF | http://arxiv.org/pdf/1709.05584v3.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-on-graphs-methods-and |
Repo | https://github.com/dariarom94/Material_database |
Framework | none |
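As a small illustration of the random-walk based family mentioned in the abstract above, the sketch below samples an unbiased random walk on an adjacency-list graph and extracts the (node, context) pairs that such methods typically feed to an embedding model. The function names, the toy graph, and the window size are assumptions; no specific algorithm from the review is reproduced, and the embedding training step is omitted.

```python
import random

def random_walk(adjacency, start, length, rng):
    """Sample a uniform random walk of up to `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def context_pairs(walk, window):
    """(node, context) pairs for nodes co-occurring within `window` steps."""
    pairs = []
    for i, node in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((node, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Toy undirected graph as adjacency lists.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk = random_walk(graph, "a", 6, random.Random(0))
pairs = context_pairs(walk, window=1)
```

Nodes that co-occur often in such walks are then pushed toward similar low-dimensional embeddings.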
Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy
Title | Neural Machine Translation Model with a Large Vocabulary Selected by Branching Entropy |
Authors | Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto |
Abstract | Neural machine translation (NMT), a new approach to machine translation, has achieved promising results comparable to those of traditional approaches such as statistical machine translation (SMT). Despite its recent success, NMT cannot handle a larger vocabulary because the training complexity and decoding complexity proportionally increase with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. In this paper, we propose to select phrases that contain out-of-vocabulary words using the statistical approach of branching entropy. This allows the proposed NMT system to be applied to a translation task of any language pair without any language-specific knowledge about technical term identification. The selected phrases are then replaced with tokens during training and post-translated by the phrase translation table of SMT. Evaluation on Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English and English-to-Japanese patent sentence translation proved the effectiveness of phrases selected with branching entropy, where the proposed NMT model achieves a substantial improvement over a baseline NMT model without our proposed technique. Moreover, the proposed NMT model reduces the number of under-translation errors to around half of that of the baseline NMT model. |
Tasks | Machine Translation |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04520v6 |
PDF | http://arxiv.org/pdf/1704.04520v6.pdf |
PWC | https://paperswithcode.com/paper/neural-machine-translation-model-with-a-large |
Repo | https://github.com/FulstatResearch/Machine-Translation-Language-Model |
Framework | tf |
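Branching entropy, the statistic named in the abstract above, measures how unpredictable the next token is after a given context. The following Python sketch computes right-branching entropy from a tokenized corpus using raw counts. The function name, the toy corpus, and the use of base-2 logarithms are assumptions; how the paper turns these values into phrase boundaries is not stated in the abstract and is not shown here.

```python
import math
from collections import Counter, defaultdict

def right_branching_entropy(sentences, context_len=1):
    """Entropy (in bits) of the token following each length-n context.

    H(context) = -sum_x P(x | context) * log2 P(x | context),
    with P estimated from raw counts in `sentences`.
    """
    followers = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens) - context_len):
            context = tuple(tokens[i:i + context_len])
            followers[context][tokens[i + context_len]] += 1
    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum((c / total) * math.log2(c / total)
                                for c in counts.values())
    return entropy

# Toy corpus: "a" is followed by two different tokens, "d" always by "e".
corpus = [["a", "b"], ["a", "c"], ["d", "e"], ["d", "e"]]
h = right_branching_entropy(corpus)
```

Low entropy after a context suggests the following token belongs to the same unit, while a jump in entropy suggests a likely boundary; this needs only corpus counts and no language-specific rules.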
Incremental Learning of Object Detectors without Catastrophic Forgetting
Title | Incremental Learning of Object Detectors without Catastrophic Forgetting |
Authors | Konstantin Shmelkov, Cordelia Schmid, Karteek Alahari |
Abstract | Despite their success for object detection, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original model trained on a set of classes to additionally detect objects of new classes, in the absence of the initial training data. They suffer from “catastrophic forgetting” - an abrupt degradation of performance on the original set of classes, when the training objective is adapted to the new classes. We present a method to address this issue, and learn object detectors incrementally, when neither the original training data nor annotations for the original classes in the new training set are available. The core of our proposed solution is a loss function to balance the interplay between predictions on the new classes and a new distillation loss which minimizes the discrepancy between responses for old classes from the original and the updated networks. This incremental learning can be performed multiple times, for a new set of classes in each step, with a moderate drop in performance compared to the baseline network trained on the ensemble of data. We present object detection results on the PASCAL VOC 2007 and COCO datasets, along with a detailed empirical analysis of the approach. |
Tasks | Object Detection |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.06977v1 |
PDF | http://arxiv.org/pdf/1708.06977v1.pdf |
PWC | https://paperswithcode.com/paper/incremental-learning-of-object-detectors |
Repo | https://github.com/Ze-Yang/Context-Transformer |
Framework | pytorch |
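The loss described in the abstract above combines a standard objective on the new classes with a distillation term that keeps old-class responses close to those of the original network. The sketch below shows that structure for a single classification output. It is a simplified illustration, not the paper's detector loss: the squared-error form of the distillation term, the weighting factor `weight`, and all function names are assumptions, and bounding-box outputs are ignored.

```python
import math

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, computed stably."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[target_index]

def distillation_loss(frozen_old_logits, updated_old_logits):
    """Mean squared gap between original and updated old-class responses."""
    n = len(frozen_old_logits)
    return sum((a - b) ** 2
               for a, b in zip(frozen_old_logits, updated_old_logits)) / n

def incremental_loss(updated_logits, target_index, frozen_old_logits, weight=1.0):
    """New-class objective plus weighted distillation on the old classes.

    updated_logits lists old-class logits first, then new-class logits;
    frozen_old_logits come from the original network and are held fixed.
    """
    num_old = len(frozen_old_logits)
    ce = cross_entropy(updated_logits, target_index)
    dist = distillation_loss(frozen_old_logits, updated_logits[:num_old])
    return ce + weight * dist

frozen = [2.0, -1.0]                 # original network, 2 old classes
updated = [2.0, -1.0, 0.5]           # updated network, 1 new class appended
loss_no_drift = incremental_loss(updated, 2, frozen)
loss_drifted = incremental_loss([0.0, 1.0, 0.5], 2, frozen)
```

Because the distillation term needs only the original network's outputs on the new images, this structure does not require the original training data.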
Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold
Title | Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold |
Authors | Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith |
Abstract | We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates. Built using an extension to the segmental RNN that emphasizes recall, our basic system achieves competitive performance without any calls to a syntactic parser. We then introduce a method that uses phrase-syntactic annotations from the Penn Treebank during training only, through a multitask objective; no parsing is required at training or test time. This “syntactic scaffold” offers a cheaper alternative to traditional syntactic pipelining, and achieves state-of-the-art performance. |
Tasks | Semantic Parsing |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09528v1 |
PDF | http://arxiv.org/pdf/1706.09528v1.pdf |
PWC | https://paperswithcode.com/paper/frame-semantic-parsing-with-softmax-margin |
Repo | https://github.com/swabhs/open-sesame |
Framework | none |