October 19, 2019

3218 words 16 mins read

Paper Group ANR 128

AlphaSeq: Sequence Discovery with Deep Reinforcement Learning. Interpretable Set Functions. Self-Attentional Acoustic Models. Near Maximum Likelihood Decoding with Deep Learning. A Classification approach towards Unsupervised Learning of Visual Representations. Localization Guided Learning for Pedestrian Attribute Recognition. Categorizing Variants …

AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

Title AlphaSeq: Sequence Discovery with Deep Reinforcement Learning
Authors Yulin Shao, Soung Chang Liew, Taotao Wang
Abstract Sequences play an important role in many applications and systems. Discovering sequences with desired properties has long been an interesting intellectual pursuit. This paper puts forth a new paradigm, AlphaSeq, to discover desired sequences algorithmically using deep reinforcement learning (DRL) techniques. AlphaSeq treats the sequence discovery problem as an episodic symbol-filling game, in which a player fills symbols in the vacant positions of a sequence set sequentially during an episode of the game. Each episode ends with a completely-filled sequence set, upon which a reward is given based on the desirability of the sequence set. AlphaSeq models the game as a Markov Decision Process (MDP), and adapts the DRL framework of AlphaGo to solve the MDP. Sequences discovered improve progressively as AlphaSeq, starting as a novice, learns to become an expert game player through many episodes of game playing. Compared with traditional sequence construction by mathematical tools, AlphaSeq is particularly suitable for problems with complex objectives intractable to mathematical analysis. We demonstrate the searching capabilities of AlphaSeq in two applications: 1) AlphaSeq successfully rediscovers a set of ideal complementary codes that can zero-force all potential interferences in multi-carrier CDMA systems. 2) AlphaSeq discovers new sequences that triple the signal-to-interference ratio – benchmarked against the well-known Legendre sequence – of a mismatched filter estimator in pulse compression radar systems.
Tasks
Published 2018-09-26
URL https://arxiv.org/abs/1810.01218v3
PDF https://arxiv.org/pdf/1810.01218v3.pdf
PWC https://paperswithcode.com/paper/alphaseq-sequence-discovery-with-deep
Repo
Framework
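
To make the symbol-filling game concrete, here is a minimal, hypothetical sketch of such an episodic environment: symbols are filled one vacant position at a time, and the reward arrives only once the sequence set is complete. The reward metric below (negative peak autocorrelation sidelobe) is a placeholder, not the paper's application-specific objective.

```python
# Hypothetical sketch of the episodic symbol-filling game described above.
import numpy as np

class SymbolFillingGame:
    def __init__(self, num_sequences=2, length=8):
        self.shape = (num_sequences, length)
        self.reset()

    def reset(self):
        self.symbols = np.zeros(self.shape, dtype=int)  # 0 = vacant position
        self.pos = 0                                    # next position to fill
        return self.symbols.copy()

    def step(self, action):
        """action: +1 or -1, the symbol placed in the next vacant position."""
        idx = np.unravel_index(self.pos, self.shape)
        self.symbols[idx] = action
        self.pos += 1
        done = self.pos == self.symbols.size
        reward = self._metric() if done else 0.0        # reward only at episode end
        return self.symbols.copy(), reward, done

    def _metric(self):
        # Placeholder desirability: negative worst autocorrelation sidelobe.
        worst = 0.0
        for seq in self.symbols:
            for lag in range(1, len(seq)):
                worst = max(worst, abs(np.dot(seq[:-lag], seq[lag:])))
        return -worst
```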

Interpretable Set Functions

Title Interpretable Set Functions
Authors Andrew Cotter, Maya Gupta, Heinrich Jiang, James Muller, Taman Narayan, Serena Wang, Tao Zhu
Abstract We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label. We use a deep lattice network model so we can architect the model structure to enhance interpretability, and add monotonicity constraints between inputs and outputs. We then use the proposed set function to automate the engineering of dense, interpretable features from sparse categorical features, which we call the semantic feature engine. Experiments on real-world data show the achieved accuracy is similar to deep sets or deep neural networks, and is easier to debug and understand.
Tasks
Published 2018-05-31
URL http://arxiv.org/abs/1806.00050v1
PDF http://arxiv.org/pdf/1806.00050v1.pdf
PWC https://paperswithcode.com/paper/interpretable-set-functions
Repo
Framework
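
As a toy illustration of a permutation-invariant set function with an input-output monotonicity guarantee, a mean-pooled per-item score with non-negative weights suffices. This is a simplified stand-in for the paper's deep lattice network; the names and functional form are assumptions.

```python
# Minimal numpy sketch of a permutation-invariant, monotone set function:
# each item is scored by a linear model with non-negative weights
# (guaranteeing monotone inputs-to-output), then scores are mean-pooled.
import numpy as np

def monotone_set_score(items, w, b):
    """items: (n_items, n_features); w: weights (n_features,); b: scalar bias."""
    w = np.abs(w)                         # non-negativity => monotonicity in each input
    per_item = items @ w + b              # monotone per-item score
    return per_item.mean()                # mean pooling => permutation invariance
```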

Self-Attentional Acoustic Models

Title Self-Attentional Acoustic Models
Authors Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel
Abstract Self-attention is a method of encoding sequences of vectors by relating these vectors to each other based on pairwise similarities. These models have recently shown promising results for modeling discrete sequences, but they are non-trivial to apply to acoustic modeling due to computational and modeling issues. In this paper, we apply self-attention to acoustic modeling, proposing several improvements to mitigate these issues: First, self-attention memory grows quadratically in the sequence length, which we address through a downsampling technique. Second, we find that previous approaches to incorporate position information into the model are unsuitable and explore other representations and hybrid models to this end. Third, to stress the importance of local context in the acoustic signal, we propose a Gaussian biasing approach that allows explicit control over the context range. Experiments find that our model approaches a strong baseline based on LSTMs with network-in-network connections while being much faster to compute. Besides speed, we find that interpretability is a strength of self-attentional acoustic models, and demonstrate that self-attention heads learn a linguistically plausible division of labor.
Tasks
Published 2018-03-26
URL http://arxiv.org/abs/1803.09519v2
PDF http://arxiv.org/pdf/1803.09519v2.pdf
PWC https://paperswithcode.com/paper/self-attentional-acoustic-models
Repo
Framework
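
A minimal numpy sketch of the two ideas easiest to show in a few lines: frame downsampling and a Gaussian bias on the attention logits that favors local acoustic context. The stride-based downsampling and parameter names are illustrative assumptions, not the paper's exact design.

```python
# Sketch of Gaussian-biased self-attention over an acoustic frame sequence.
import numpy as np

def gaussian_biased_attention(x, sigma=3.0, stride=2):
    """x: (T, d) frame features. Returns downsampled, locally-biased encodings."""
    x = x[::stride]                                     # naive temporal downsampling
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # pairwise similarity logits
    pos = np.arange(T)
    dist2 = (pos[:, None] - pos[None, :]) ** 2
    scores = scores - dist2 / (2 * sigma ** 2)          # Gaussian bias: sigma sets context range
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ x
```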

Near Maximum Likelihood Decoding with Deep Learning

Title Near Maximum Likelihood Decoding with Deep Learning
Authors Eliya Nachmani, Yaron Bachar, Elad Marciano, David Burshtein, Yair Be’ery
Abstract A novel and efficient neural decoder algorithm is proposed. The proposed decoder is based on the neural Belief Propagation algorithm and the Automorphism Group. By combining neural belief propagation with permutations from the Automorphism Group we achieve near maximum likelihood performance for High Density Parity Check codes. Moreover, the proposed decoder significantly improves the decoding complexity, compared to our earlier work on the topic. We also investigate the training process and show how it can be accelerated. Simulations of the Hessian and the condition number show why the learning process is accelerated. We demonstrate the decoding algorithm for various linear block codes of length up to 63 bits.
Tasks
Published 2018-01-08
URL http://arxiv.org/abs/1801.02726v1
PDF http://arxiv.org/pdf/1801.02726v1.pdf
PWC https://paperswithcode.com/paper/near-maximum-likelihood-decoding-with-deep
Repo
Framework
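
The combination of belief propagation with code automorphisms can be pictured as an ensemble: decode several permuted copies of the received word and keep the most likely candidate. In the sketch below, `bp_decode` and the permutation list are assumed inputs, and the likelihood score (BPSK correlation with the channel LLRs) is an illustrative choice.

```python
# Sketch of automorphism-ensemble decoding around a given BP decoder.
import numpy as np

def permutation_decoding(llr, permutations, bp_decode):
    """llr: channel log-likelihood ratios; permutations: list of index arrays;
    bp_decode: callable mapping LLRs to a 0/1 codeword estimate."""
    best, best_score = None, -np.inf
    for perm in permutations:
        inv = np.argsort(perm)
        cand = bp_decode(llr[perm])[inv]           # decode permuted word, map back
        score = np.sum((1 - 2 * cand) * llr)       # correlation with channel LLRs
        if score > best_score:
            best, best_score = cand, score
    return best
```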

A Classification approach towards Unsupervised Learning of Visual Representations

Title A Classification approach towards Unsupervised Learning of Visual Representations
Authors Aditya Vora
Abstract In this paper, we present a technique for unsupervised learning of visual representations. Specifically, we train a model for a foreground and background classification task, in the process of which it learns visual representations. Foreground and background patches for training come after mining for such patches from hundreds of thousands of unlabelled videos available on the web, which we extract using a proposed patch extraction algorithm. Without using any supervision, with just 150,000 unlabelled videos and the PASCAL VOC 2007 dataset, we train an object recognition model that achieves 45.3 mAP, which is close to the best-performing unsupervised feature learning technique and better than many other proposed algorithms. The code for patch extraction is implemented in Matlab and available open source at the following link.
Tasks Object Recognition
Published 2018-06-01
URL http://arxiv.org/abs/1806.00428v1
PDF http://arxiv.org/pdf/1806.00428v1.pdf
PWC https://paperswithcode.com/paper/a-classification-approach-towards
Repo
Framework

Localization Guided Learning for Pedestrian Attribute Recognition

Title Localization Guided Learning for Pedestrian Attribute Recognition
Authors Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao
Abstract Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos. Existing methods try to use additional pose, part or viewpoint information to complement the global feature representation for attribute classification. However, these methods face difficulties in localizing the areas corresponding to different attributes. To address this problem, we propose a novel Localization Guided Network which assigns attribute-specific weights to local features based on the affinity between pre-extracted proposals and attribute locations. The advantage of our model is that our local features are learned automatically for each attribute and emphasized by the interaction with global features. We demonstrate the effectiveness of our Localization Guided Network on two pedestrian attribute benchmarks (PA-100K and RAP). Our result surpasses the previous state-of-the-art in all five metrics on both datasets.
Tasks Pedestrian Attribute Recognition, Scene Understanding
Published 2018-08-28
URL http://arxiv.org/abs/1808.09102v1
PDF http://arxiv.org/pdf/1808.09102v1.pdf
PWC https://paperswithcode.com/paper/localization-guided-learning-for-pedestrian
Repo
Framework
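
The attribute-specific weighting of local features can be pictured as a softmax over proposal-to-attribute affinities. The sketch below is an illustrative simplification with assumed inputs, not the network described in the paper.

```python
# Toy sketch: pool proposal features into one attribute-specific local feature,
# weighted by each proposal's affinity to the attribute's location.
import numpy as np

def attribute_weighted_feature(proposal_feats, affinities):
    """proposal_feats: (n_props, d); affinities: (n_props,) scores for one attribute."""
    w = np.exp(affinities - affinities.max())
    w /= w.sum()                                   # softmax over proposals
    return w @ proposal_feats                      # attribute-specific local feature
```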

Categorizing Variants of Goodhart’s Law

Title Categorizing Variants of Goodhart’s Law
Authors David Manheim, Scott Garrabrant
Abstract There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field.
Tasks
Published 2018-03-13
URL http://arxiv.org/abs/1803.04585v4
PDF http://arxiv.org/pdf/1803.04585v4.pdf
PWC https://paperswithcode.com/paper/categorizing-variants-of-goodharts-law
Repo
Framework

FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices

Title FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices
Authors Shuochao Yao, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Lu Su, Tarek Abdelzaher
Abstract Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, posing a serious impediment to deployment on low-end devices. To address this challenge, recent literature focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the used deep learning library. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by 48% to 78% and energy consumption by 37% to 69% compared with the state-of-the-art compression algorithms.
Tasks
Published 2018-09-19
URL http://arxiv.org/abs/1809.06970v1
PDF http://arxiv.org/pdf/1809.06970v1.pdf
PWC https://paperswithcode.com/paper/fastdeepiot-towards-understanding-and
Repo
Framework
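
The first contribution (an interpretable execution-time model learned from on-device profiling) can be sketched as a per-layer regression. The feature choice (FLOPs and output size) and plain least squares below are assumptions for illustration, not the paper's actual model.

```python
# Fit an interpretable execution-time model from profiled layer configurations.
import numpy as np

def fit_time_model(flops, output_sizes, measured_ms):
    """Fit measured per-layer time as a linear function of per-layer features."""
    X = np.column_stack([flops, output_sizes, np.ones_like(flops)])
    coef, *_ = np.linalg.lstsq(X, measured_ms, rcond=None)
    return coef                                    # per-feature costs are readable directly

def predict_time(coef, flops, output_size):
    return coef[0] * flops + coef[1] * output_size + coef[2]
```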

Deep Learning Framework for Digital Breast Tomosynthesis Reconstruction

Title Deep Learning Framework for Digital Breast Tomosynthesis Reconstruction
Authors Nikita Moriakov, Koen Michielsen, Jonas Adler, Ritse Mann, Ioannis Sechopoulos, Jonas Teuwen
Abstract Digital breast tomosynthesis is rapidly replacing digital mammography as the basic x-ray technique for evaluation of the breasts. However, the sparse sampling and limited angular range give rise to different artifacts, which manufacturers try to solve in several ways. In this study we propose an extension of the Learned Primal-Dual algorithm for digital breast tomosynthesis. The Learned Primal-Dual algorithm is a deep neural network consisting of several “reconstruction blocks”, which take in raw sinogram data as the initial input, perform a forward and a backward pass by taking projections and back-projections, and use a convolutional neural network to produce an intermediate reconstruction result which is then improved further by the successive reconstruction block. We extend the architecture by providing breast thickness measurements as a mask to the neural network and allow it to learn how to use this thickness mask. We have trained the algorithm on digital phantoms and the corresponding noise-free/noisy projections, and then tested the algorithm on digital phantoms for varying levels of noise. Reconstruction performance of the algorithms was compared visually, using MSE loss and Structural Similarity Index. Results indicate that the proposed algorithm outperforms the baseline iterative reconstruction algorithm in terms of reconstruction quality for both breast edges and internal structures and is robust to noise.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04640v1
PDF http://arxiv.org/pdf/1808.04640v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-framework-for-digital-breast
Repo
Framework
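
The data flow of one reconstruction block with a thickness mask, as described above, might look like the following. `forward_project`, `back_project`, and `refine_cnn` are assumed to be supplied elsewhere (e.g. by a tomography toolbox and a trained network), so this only illustrates the structure.

```python
# Conceptual sketch of one Learned Primal-Dual-style reconstruction block
# extended with a breast thickness mask.
def reconstruction_block(image, sinogram, thickness_mask,
                         forward_project, back_project, refine_cnn):
    residual = sinogram - forward_project(image)       # data-consistency residual
    correction = back_project(residual)                 # map residual to image space
    # The CNN sees the current image, the correction, and the thickness mask,
    # and learns how to use the mask to constrain the update.
    return image + refine_cnn(image, correction, thickness_mask)
```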

Analyzing Covariate Influence on Gender and Race Prediction from Near-Infrared Ocular Images

Title Analyzing Covariate Influence on Gender and Race Prediction from Near-Infrared Ocular Images
Authors Denton Bobeldyk, Arun Ross
Abstract Recent research has explored the possibility of automatically deducing information such as gender, age and race of an individual from their biometric data. While the face modality has been extensively studied in this regard, the iris modality less so. In this paper, we first review the medical literature to establish a biological basis for extracting gender and race cues from the iris. Then, we demonstrate that it is possible to use simple texture descriptors, like BSIF (Binarized Statistical Image Feature) and LBP (Local Binary Patterns), to extract gender and race attributes from an NIR ocular image used in a typical iris recognition system. The proposed method predicts gender and race from a single eye image with an accuracy of 86% and 90%, respectively. In addition, the following analyses are conducted: (a) the role of different parts of the ocular region on attribute prediction; (b) the influence of gender on race prediction, and vice-versa; (c) the impact of eye color on gender and race prediction; (d) the impact of image blur on gender and race prediction; (e) the generalizability of the method across different datasets; and (f) the consistency of prediction performance across the left and right eyes.
Tasks Iris Recognition
Published 2018-05-04
URL http://arxiv.org/abs/1805.01912v4
PDF http://arxiv.org/pdf/1805.01912v4.pdf
PWC https://paperswithcode.com/paper/analyzing-covariate-influence-on-gender-and
Repo
Framework
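
A small sketch of the kind of texture-descriptor pipeline the abstract describes, using uniform LBP histograms and a linear classifier. The parameter values and the scikit-image/scikit-learn choices are assumptions, not the paper's exact configuration.

```python
# LBP histogram features from an NIR ocular image, fed to a linear classifier.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

def lbp_histogram(image, P=8, R=1):
    """image: 2-D grayscale array. Returns a normalized uniform-LBP histogram."""
    codes = local_binary_pattern(image, P, R, method="uniform")   # values in [0, P+1]
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Usage (images: list of 2-D NIR ocular arrays; labels: e.g. gender labels):
# clf = LinearSVC().fit([lbp_histogram(im) for im in images], labels)
```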

Comparing Neural- and N-Gram-Based Language Models for Word Segmentation

Title Comparing Neural- and N-Gram-Based Language Models for Word Segmentation
Authors Yerai Doval, Carlos Gómez-Rodríguez
Abstract Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
Tasks Language Modelling
Published 2018-12-03
URL http://arxiv.org/abs/1812.00815v1
PDF http://arxiv.org/pdf/1812.00815v1.pdf
PWC https://paperswithcode.com/paper/comparing-neural-and-n-gram-based-language
Repo
Framework
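
The boundary-insertion search can be sketched as a beam search in which each input character either extends the current word or starts a new one, with hypotheses ranked by a character-level language model. Here `lm_logprob` (the total log-probability of a hypothesis string) is an assumed callable; the real system also works at the byte level and feeds into microtext normalization.

```python
# Minimal beam search for word-boundary insertion with a character-level LM.
def segment(text, lm_logprob, beam_size=8):
    beams = [""]                                         # segmented hypotheses so far
    for ch in text:
        candidates = set()
        for prefix in beams:
            candidates.add(prefix + ch)                  # continue the current word
            if prefix:
                candidates.add(prefix + " " + ch)        # insert a boundary first
        beams = sorted(candidates, key=lm_logprob, reverse=True)[:beam_size]
    return beams[0]                                      # most probable segmentation
```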

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

Title Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Authors Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng
Abstract The training process of Deep Neural Network (DNN) is compute-intensive, often taking days to weeks to train a DNN model. Therefore, parallel execution of DNN training on GPUs is a widely adopted approach to speed up the process nowadays. Due to the implementation simplicity, data parallelism is currently the most commonly used parallelization method. Nonetheless, data parallelism suffers from excessive inter-GPU communication overhead due to frequent weight synchronization among GPUs. Another approach is pipelined model parallelism, which partitions a DNN model among GPUs, and processes multiple mini-batches concurrently. This approach can significantly reduce inter-GPU communication cost compared to data parallelism. However, pipelined model parallelism faces the weight staleness issue; that is, gradients are computed with stale weights, leading to training instability and accuracy loss. In this paper, we present a pipelined model parallel execution method that enables high GPU utilization while maintaining robust training accuracy via a novel weight prediction technique, SpecTrain. Experimental results show that our proposal achieves up to 8.91x speedup compared to data parallelism on a 4-GPU platform while maintaining comparable model accuracy.
Tasks
Published 2018-09-08
URL https://arxiv.org/abs/1809.02839v4
PDF https://arxiv.org/pdf/1809.02839v4.pdf
PWC https://paperswithcode.com/paper/efficient-and-robust-parallel-dnn-training
Repo
Framework
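
The weight-prediction idea can be sketched as extrapolating the weights several updates ahead using a smoothed gradient, so that the pipeline stages compute with (approximately) consistent, non-stale weights. The exact predictor used by SpecTrain may differ from this simplified form.

```python
# Hedged sketch of weight prediction for mitigating staleness in a pipeline.
def predict_weights(weights, smoothed_grads, lr, steps_ahead):
    """weights, smoothed_grads: sequences of arrays with matching shapes.
    Extrapolate the SGD trajectory `steps_ahead` updates into the future."""
    return [w - lr * steps_ahead * g for w, g in zip(weights, smoothed_grads)]
```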

Variational Capsules for Image Analysis and Synthesis

Title Variational Capsules for Image Analysis and Synthesis
Authors Huaibo Huang, Lingxiao Song, Ran He, Zhenan Sun, Tieniu Tan
Abstract A capsule is a group of neurons whose activity vector models different properties of the same entity. This paper extends the capsule to a generative version, named variational capsules (VCs). Each VC produces a latent variable for a specific entity, making it possible to integrate image analysis and image synthesis into a unified framework. Variational capsules model an image as a composition of entities in a probabilistic model. Different capsules’ divergences from a specific prior distribution represent the presence of different entities, which can be applied in image analysis tasks such as classification. In addition, variational capsules encode multiple entities in a semantically-disentangling way. Diverse instantiations of capsules are related to various properties of the same entity, making it easy to generate diverse samples with fine-grained semantic attributes. Extensive experiments demonstrate that deep networks designed with variational capsules can not only achieve promising performance on image analysis tasks (including image classification and attribute prediction) but can also improve the diversity and controllability of image synthesis.
Tasks Image Classification, Image Generation
Published 2018-07-11
URL http://arxiv.org/abs/1807.04099v1
PDF http://arxiv.org/pdf/1807.04099v1.pdf
PWC https://paperswithcode.com/paper/variational-capsules-for-image-analysis-and
Repo
Framework
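
One way to picture the "divergence signals presence" idea: with Gaussian capsules parameterized by a mean and log-variance (an assumption of this sketch), the per-capsule KL divergence from a standard-normal prior gives one presence score per entity.

```python
# Per-capsule KL divergence from a standard normal prior as a presence score.
import numpy as np

def capsule_presence_scores(mu, logvar):
    """mu, logvar: (n_capsules, dim). Returns one score per capsule."""
    kl = 0.5 * (np.exp(logvar) + mu ** 2 - 1.0 - logvar)   # KL(N(mu, var) || N(0, 1)) per dim
    return kl.sum(axis=1)                                   # larger KL => entity more present
```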

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

Title Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Authors Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
Abstract Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, both for sampling with or without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ - similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation to similar behaviors observed in deep networks, when trained with SGD.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01796v2
PDF http://arxiv.org/pdf/1806.01796v2.pdf
PWC https://paperswithcode.com/paper/stochastic-gradient-descent-on-separable-data
Repo
Framework
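
Restating the quoted rates in symbols, under the abstract's assumptions (homogeneous linear classifier, logistic or similarly exponentially-tailed loss, linearly separable data, fixed learning rate):

$$\left\|\frac{w(t)}{\|w(t)\|} - \frac{\hat{w}}{\|\hat{w}\|}\right\| = O\!\left(\frac{1}{\log t}\right), \qquad \mathcal{L}(w(t)) = O\!\left(\frac{1}{t}\right),$$

where $\hat{w}$ is the $L_2$ maximum-margin direction and $\mathcal{L}$ is the empirical loss.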

Reconstructing networks with unknown and heterogeneous errors

Title Reconstructing networks with unknown and heterogeneous errors
Authors Tiago P. Peixoto
Abstract The vast majority of network datasets contain errors and omissions, although this is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by not only allowing for heterogeneous errors, but also for single edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable uncertainty estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, and yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.
Tasks Bayesian Inference
Published 2018-06-09
URL http://arxiv.org/abs/1806.07956v3
PDF http://arxiv.org/pdf/1806.07956v3.pdf
PWC https://paperswithcode.com/paper/reconstructing-networks-with-unknown-and
Repo
Framework
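
As a toy illustration of how a generative prior and a noisy measurement combine for a single edge, consider the Bayes update below. The error rates and the SBM-derived prior probability are placeholders here, whereas the paper infers these quantities jointly and nonparametrically.

```python
# Toy posterior for a single edge given one noisy observation of its presence.
def edge_posterior(observed, prior_edge, p_false_pos, p_false_neg):
    """P(edge exists | observation), with a prior from a generative model."""
    like_edge = (1 - p_false_neg) if observed else p_false_neg
    like_no_edge = p_false_pos if observed else (1 - p_false_pos)
    num = like_edge * prior_edge
    return num / (num + like_no_edge * (1 - prior_edge))
```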