Paper Group ANR 128
AlphaSeq: Sequence Discovery with Deep Reinforcement Learning
Title | AlphaSeq: Sequence Discovery with Deep Reinforcement Learning |
Authors | Yulin Shao, Soung Chang Liew, Taotao Wang |
Abstract | Sequences play an important role in many applications and systems. Discovering sequences with desired properties has long been an interesting intellectual pursuit. This paper puts forth a new paradigm, AlphaSeq, to discover desired sequences algorithmically using deep reinforcement learning (DRL) techniques. AlphaSeq treats the sequence discovery problem as an episodic symbol-filling game, in which a player fills symbols in the vacant positions of a sequence set sequentially during an episode of the game. Each episode ends with a completely-filled sequence set, upon which a reward is given based on the desirability of the sequence set. AlphaSeq models the game as a Markov Decision Process (MDP), and adapts the DRL framework of AlphaGo to solve the MDP. Sequences discovered improve progressively as AlphaSeq, starting as a novice, learns to become an expert game player through many episodes of game playing. Compared with traditional sequence construction by mathematical tools, AlphaSeq is particularly suitable for problems with complex objectives intractable to mathematical analysis. We demonstrate the searching capabilities of AlphaSeq in two applications: 1) AlphaSeq successfully rediscovers a set of ideal complementary codes that can zero-force all potential interferences in multi-carrier CDMA systems. 2) AlphaSeq discovers new sequences that triple the signal-to-interference ratio – benchmarked against the well-known Legendre sequence – of a mismatched filter estimator in pulse compression radar systems. |
Tasks | |
Published | 2018-09-26 |
URL | https://arxiv.org/abs/1810.01218v3 |
https://arxiv.org/pdf/1810.01218v3.pdf | |
PWC | https://paperswithcode.com/paper/alphaseq-sequence-discovery-with-deep |
Repo | |
Framework | |
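The symbol-filling game described in the abstract can be sketched as a small episodic environment. This is a minimal illustration, not the authors' implementation: the class name `SymbolFillingGame`, the binary alphabet, the single-sequence "set", and the reward (negative sidelobe energy of the aperiodic autocorrelation) are all assumptions chosen for brevity, and a brute-force search stands in for the AlphaGo-style DRL player.

```python
from itertools import product


def sidelobe_energy(seq):
    """Sum of squared aperiodic autocorrelation sidelobes (lower is better)."""
    n = len(seq)
    total = 0
    for shift in range(1, n):
        corr = sum(seq[i] * seq[i + shift] for i in range(n - shift))
        total += corr * corr
    return total


class SymbolFillingGame:
    """Hypothetical episodic game: fill the vacant positions left to right."""

    SYMBOLS = (-1, +1)  # assumed binary alphabet

    def __init__(self, length):
        self.length = length
        self.state = []

    def step(self, symbol):
        """Fill the next vacant position; reward is given only at episode end."""
        assert symbol in self.SYMBOLS and len(self.state) < self.length
        self.state.append(symbol)
        done = len(self.state) == self.length
        # Assumed reward: negative sidelobe energy of the completed sequence.
        reward = -sidelobe_energy(self.state) if done else 0
        return tuple(self.state), reward, done


def best_by_exhaustive_search(length):
    """Play every possible episode; a stand-in for the learned player."""
    best_reward, best_seq = None, None
    for candidate in product(SymbolFillingGame.SYMBOLS, repeat=length):
        game = SymbolFillingGame(length)
        for symbol in candidate:
            state, reward, done = game.step(symbol)
        if best_reward is None or reward > best_reward:
            best_reward, best_seq = reward, state
    return best_seq, best_reward
```

Exhaustive play is only feasible for very short sequences (2^n episodes for length n), which is the gap a learned player is meant to close. For length 7 the best terminal reward under this assumed metric is -3, the value attained by the length-7 Barker code.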
Interpretable Set Functions
Title | Interpretable Set Functions |
Authors | Andrew Cotter, Maya Gupta, Heinrich Jiang, James Muller, Taman Narayan, Serena Wang, Tao Zhu |
Abstract | We propose learning flexible but interpretable functions that aggregate a variable-length set of permutation-invariant feature vectors to predict a label. We use a deep lattice network model so we can architect the model structure to enhance interpretability, and add monotonicity constraints between inputs and outputs. We then use the proposed set function to automate the engineering of dense, interpretable features from sparse categorical features, which we call semantic feature engine. Experiments on real-world data show the achieved accuracy is similar to deep sets or deep neural networks, and is easier to debug and understand. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1806.00050v1 |
http://arxiv.org/pdf/1806.00050v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-set-functions |
Repo | |
Framework | |
Self-Attentional Acoustic Models
Title | Self-Attentional Acoustic Models |
Authors | Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel |
Abstract | Self-attention is a method of encoding sequences of vectors by relating these vectors to each other based on pairwise similarities. These models have recently shown promising results for modeling discrete sequences, but they are non-trivial to apply to acoustic modeling due to computational and modeling issues. In this paper, we apply self-attention to acoustic modeling, proposing several improvements to mitigate these issues: First, self-attention memory grows quadratically in the sequence length, which we address through a downsampling technique. Second, we find that previous approaches to incorporate position information into the model are unsuitable and explore other representations and hybrid models to this end. Third, to stress the importance of local context in the acoustic signal, we propose a Gaussian biasing approach that allows explicit control over the context range. Experiments find that our model approaches a strong baseline based on LSTMs with network-in-network connections while being much faster to compute. Besides speed, we find that interpretability is a strength of self-attentional acoustic models, and demonstrate that self-attention heads learn a linguistically plausible division of labor. |
Tasks | |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09519v2 |
http://arxiv.org/pdf/1803.09519v2.pdf | |
PWC | https://paperswithcode.com/paper/self-attentional-acoustic-models |
Repo | |
Framework | |
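One way to picture the Gaussian biasing idea from the abstract is to subtract a distance-dependent penalty from the attention logits before the softmax, so that nearby frames receive more weight. The sketch below is an assumption about the general form, not the paper's exact formulation: it uses a single head, no learned query/key/value projections, and a hypothetical `sigma` parameter that sets the context range.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def gaussian_biased_attention(vectors, sigma):
    """Single-head dot-product self-attention with a Gaussian locality bias.

    Assumed form: logit(i, j) = <x_i, x_j> / sqrt(d) - (i - j)^2 / (2 sigma^2).
    Returns the attended outputs and the attention weight matrix.
    """
    n, d = len(vectors), len(vectors[0])
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            score = sum(a * b for a, b in zip(vectors[i], vectors[j])) / math.sqrt(d)
            bias = -((i - j) ** 2) / (2.0 * sigma ** 2)  # penalise distant frames
            logits.append(score + bias)
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * vectors[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights
```

A small `sigma` concentrates each frame's attention on its immediate neighbours, while a large `sigma` approaches unbiased attention. Note that this sketch still materialises the full n-by-n weight matrix, so it does not address the quadratic memory growth that the abstract handles through downsampling.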
Near Maximum Likelihood Decoding with Deep Learning
Title | Near Maximum Likelihood Decoding with Deep Learning |
Authors | Eliya Nachmani, Yaron Bachar, Elad Marciano, David Burshtein, Yair Be’ery |
Abstract | A novel and efficient neural decoder algorithm is proposed. The proposed decoder is based on the neural Belief Propagation algorithm and the Automorphism Group. By combining neural belief propagation with permutations from the Automorphism Group we achieve near maximum likelihood performance for High Density Parity Check codes. Moreover, the proposed decoder significantly improves the decoding complexity, compared to our earlier work on the topic. We also investigate the training process and show how it can be accelerated. Simulations of the Hessian and the condition number show why the learning process is accelerated. We demonstrate the decoding algorithm for various linear block codes of length up to 63 bits. |
Tasks | |
Published | 2018-01-08 |
URL | http://arxiv.org/abs/1801.02726v1 |
http://arxiv.org/pdf/1801.02726v1.pdf | |
PWC | https://paperswithcode.com/paper/near-maximum-likelihood-decoding-with-deep |
Repo | |
Framework | |
A Classification approach towards Unsupervised Learning of Visual Representations
Title | A Classification approach towards Unsupervised Learning of Visual Representations |
Authors | Aditya Vora |
Abstract | In this paper, we present a technique for unsupervised learning of visual representations. Specifically, we train a model for a foreground and background classification task, in the process of which it learns visual representations. Foreground and background patches for training come after mining for such patches from hundreds and thousands of unlabelled videos available on the web which we extract using a proposed patch extraction algorithm. Without using any supervision, with just using 150,000 unlabelled videos and the PASCAL VOC 2007 dataset, we train an object recognition model that achieves 45.3 mAP which is close to the best performing unsupervised feature learning technique and better than many other proposed algorithms. The code for patch extraction is implemented in Matlab and available open source at the following link. |
Tasks | Object Recognition |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00428v1 |
http://arxiv.org/pdf/1806.00428v1.pdf | |
PWC | https://paperswithcode.com/paper/a-classification-approach-towards |
Repo | |
Framework | |
Localization Guided Learning for Pedestrian Attribute Recognition
Title | Localization Guided Learning for Pedestrian Attribute Recognition |
Authors | Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao |
Abstract | Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos. Existing methods try to use additional pose, part or viewpoint information to complement the global feature representation for attribute classification. However, these methods face difficulties in localizing the areas corresponding to different attributes. To address this problem, we propose a novel Localization Guided Network which assigns attribute-specific weights to local features based on the affinity between pre-extracted proposals and attribute locations. The advantage of our model is that our local features are learned automatically for each attribute and emphasized by the interaction with global features. We demonstrate the effectiveness of our Localization Guided Network on two pedestrian attribute benchmarks (PA-100K and RAP). Our result surpasses the previous state-of-the-art in all five metrics on both datasets. |
Tasks | Pedestrian Attribute Recognition, Scene Understanding |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09102v1 |
http://arxiv.org/pdf/1808.09102v1.pdf | |
PWC | https://paperswithcode.com/paper/localization-guided-learning-for-pedestrian |
Repo | |
Framework | |
Categorizing Variants of Goodhart’s Law
Title | Categorizing Variants of Goodhart’s Law |
Authors | David Manheim, Scott Garrabrant |
Abstract | There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart’s Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are “(at least) four different mechanisms” that relate to Goodhart’s Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field. |
Tasks | |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04585v4 |
http://arxiv.org/pdf/1803.04585v4.pdf | |
PWC | https://paperswithcode.com/paper/categorizing-variants-of-goodharts-law |
Repo | |
Framework | |
FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices
Title | FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices |
Authors | Shuochao Yao, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Lu Su, Tarek Abdelzaher |
Abstract | Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, posing a serious impediment to deployment on low-end devices. To address this challenge, recent literature focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the used deep learning library. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by 48% to 78% and energy consumption by 37% to 69% compared with the state-of-the-art compression algorithms. |
Tasks | |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.06970v1 |
http://arxiv.org/pdf/1809.06970v1.pdf | |
PWC | https://paperswithcode.com/paper/fastdeepiot-towards-understanding-and |
Repo | |
Framework | |
Deep Learning Framework for Digital Breast Tomosynthesis Reconstruction
Title | Deep Learning Framework for Digital Breast Tomosynthesis Reconstruction |
Authors | Nikita Moriakov, Koen Michielsen, Jonas Adler, Ritse Mann, Ioannis Sechopoulos, Jonas Teuwen |
Abstract | Digital breast tomosynthesis is rapidly replacing digital mammography as the basic x-ray technique for evaluation of the breasts. However, the sparse sampling and limited angular range give rise to different artifacts, which manufacturers try to solve in several ways. In this study we propose an extension of the Learned Primal-Dual algorithm for digital breast tomosynthesis. The Learned Primal-Dual algorithm is a deep neural network consisting of several ‘reconstruction blocks’, which take in raw sinogram data as the initial input, perform a forward and a backward pass by taking projections and back-projections, and use a convolutional neural network to produce an intermediate reconstruction result which is then improved further by the successive reconstruction block. We extend the architecture by providing breast thickness measurements as a mask to the neural network and allow it to learn how to use this thickness mask. We have trained the algorithm on digital phantoms and the corresponding noise-free/noisy projections, and then tested the algorithm on digital phantoms for varying levels of noise. Reconstruction performance of the algorithms was compared visually, using MSE loss and Structural Similarity Index. Results indicate that the proposed algorithm outperforms the baseline iterative reconstruction algorithm in terms of reconstruction quality for both breast edges and internal structures and is robust to noise. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04640v1 |
http://arxiv.org/pdf/1808.04640v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-framework-for-digital-breast |
Repo | |
Framework | |
Analyzing Covariate Influence on Gender and Race Prediction from Near-Infrared Ocular Images
Title | Analyzing Covariate Influence on Gender and Race Prediction from Near-Infrared Ocular Images |
Authors | Denton Bobeldyk, Arun Ross |
Abstract | Recent research has explored the possibility of automatically deducing information such as gender, age and race of an individual from their biometric data. While the face modality has been extensively studied in this regard, the iris modality less so. In this paper, we first review the medical literature to establish a biological basis for extracting gender and race cues from the iris. Then, we demonstrate that it is possible to use simple texture descriptors, like BSIF (Binarized Statistical Image Feature) and LBP (Local Binary Patterns), to extract gender and race attributes from an NIR ocular image used in a typical iris recognition system. The proposed method predicts gender and race from a single eye image with an accuracy of 86% and 90%, respectively. In addition, the following analyses are conducted: (a) the role of different parts of the ocular region on attribute prediction; (b) the influence of gender on race prediction, and vice-versa; (c) the impact of eye color on gender and race prediction; (d) the impact of image blur on gender and race prediction; (e) the generalizability of the method across different datasets; and (f) the consistency of prediction performance across the left and right eyes. |
Tasks | Iris Recognition |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.01912v4 |
http://arxiv.org/pdf/1805.01912v4.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-covariate-influence-on-gender-and |
Repo | |
Framework | |
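For readers unfamiliar with the LBP descriptor named in the abstract, the basic 3x3 variant can be sketched in a few lines. This illustrates the generic textbook operator only; the neighbour ordering below is one common convention, and the paper's exact LBP configuration (radius, number of neighbours, any uniform-pattern mapping) is not stated in the abstract, so none of those details should be read into this example.

```python
def lbp_code(image, r, c):
    """8-bit LBP code for the interior pixel (r, c) of a 2-D grey-level grid.

    Each of the eight neighbours contributes a 1 bit when its value is
    greater than or equal to the centre pixel. The clockwise ordering
    starting at the top-left neighbour is an assumed convention.
    """
    center = image[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

The histogram, rather than the per-pixel codes, is what would typically be passed to a classifier as the texture feature vector.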
Comparing Neural- and N-Gram-Based Language Models for Word Segmentation
Title | Comparing Neural- and N-Gram-Based Language Models for Word Segmentation |
Authors | Yerai Doval, Carlos Gómez-Rodríguez |
Abstract | Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future. |
Tasks | Language Modelling |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00815v1 |
http://arxiv.org/pdf/1812.00815v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-neural-and-n-gram-based-language |
Repo | |
Framework | |
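The boundary-placement loop described in the abstract can be sketched with a toy character bigram model and a small beam. This is a simplified stand-in, not the authors' system: `CharBigramLM`, the add-k smoothing constant, the tiny training corpus in the usage note, and the restriction to inserting (never deleting) spaces are all assumptions made to keep the example short.

```python
import math
from collections import Counter


class CharBigramLM:
    """Toy character bigram model with add-k smoothing (assumed setup)."""

    def __init__(self, corpus, k=0.1):
        padded = " " + corpus + " "  # treat text start and end as boundaries
        self.k = k
        self.vocab = set(padded)
        self.bigrams = Counter(zip(padded, padded[1:]))
        self.context = Counter(padded[:-1])

    def logprob(self, prev, ch):
        num = self.bigrams[(prev, ch)] + self.k
        den = self.context[prev] + self.k * len(self.vocab)
        return math.log(num / den)


def segment(text, lm, beam_width=4):
    """Read `text` one character at a time, optionally placing a boundary first."""
    beams = [(0.0, "")]  # (cumulative log-probability, output so far)
    for ch in text:
        candidates = []
        for score, out in beams:
            prev = out[-1] if out else " "
            # Option 1: no boundary before this character.
            candidates.append((score + lm.logprob(prev, ch), out + ch))
            # Option 2: boundary before this character (not at the very start).
            if out:
                boundary = score + lm.logprob(prev, " ") + lm.logprob(" ", ch)
                candidates.append((boundary, out + " " + ch))
        candidates.sort(key=lambda cand: cand[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]
```

With a model trained on the string `"the cat sat on the mat"`, `segment("thecatsat", lm)` returns `"the cat sat"`. A bigram model conditions only on the previous character, so the beam is not really needed here; it matters once the language model uses longer context, as an RNN or higher-order n-gram would.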
Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform
Title | Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform |
Authors | Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng |
Abstract | The training process of Deep Neural Network (DNN) is compute-intensive, often taking days to weeks to train a DNN model. Therefore, parallel execution of DNN training on GPUs is a widely adopted approach to speed up the process nowadays. Due to the implementation simplicity, data parallelism is currently the most commonly used parallelization method. Nonetheless, data parallelism suffers from excessive inter-GPU communication overhead due to frequent weight synchronization among GPUs. Another approach is pipelined model parallelism, which partitions a DNN model among GPUs, and processes multiple mini-batches concurrently. This approach can significantly reduce inter-GPU communication cost compared to data parallelism. However, pipelined model parallelism faces the weight staleness issue; that is, gradients are computed with stale weights, leading to training instability and accuracy loss. In this paper, we present a pipelined model parallel execution method that enables high GPU utilization while maintaining robust training accuracy via a novel weight prediction technique, SpecTrain. Experimental results show that our proposal achieves up to 8.91x speedup compared to data parallelism on a 4-GPU platform while maintaining comparable model accuracy. |
Tasks | |
Published | 2018-09-08 |
URL | https://arxiv.org/abs/1809.02839v4 |
https://arxiv.org/pdf/1809.02839v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-and-robust-parallel-dnn-training |
Repo | |
Framework | |
Variational Capsules for Image Analysis and Synthesis
Title | Variational Capsules for Image Analysis and Synthesis |
Authors | Huaibo Huang, Lingxiao Song, Ran He, Zhenan Sun, Tieniu Tan |
Abstract | A capsule is a group of neurons whose activity vector models different properties of the same entity. This paper extends the capsule to a generative version, named variational capsules (VCs). Each VC produces a latent variable for a specific entity, making it possible to integrate image analysis and image synthesis into a unified framework. Variational capsules model an image as a composition of entities in a probabilistic model. Different capsules’ divergence with a specific prior distribution represents the presence of different entities, which can be applied in image analysis tasks such as classification. In addition, variational capsules encode multiple entities in a semantically-disentangling way. Diverse instantiations of capsules are related to various properties of the same entity, making it easy to generate diverse samples with fine-grained semantic attributes. Extensive experiments demonstrate that deep networks designed with variational capsules can not only achieve promising performance on image analysis tasks (including image classification and attribute prediction) but can also improve the diversity and controllability of image synthesis. |
Tasks | Image Classification, Image Generation |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04099v1 |
http://arxiv.org/pdf/1807.04099v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-capsules-for-image-analysis-and |
Repo | |
Framework | |
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Title | Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate |
Authors | Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry |
Abstract | Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate - in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, for sampling both with and without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ - similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation for similar behaviors observed in deep networks, when trained with SGD. |
Tasks | |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01796v2 |
http://arxiv.org/pdf/1806.01796v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-on-separable-data |
Repo | |
Framework | |
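The setting in the abstract (a homogeneous linear classifier, logistic loss, linearly separable data, fixed learning rate) is small enough to try numerically. The sketch below is an illustration under assumed values, not a reproduction of the paper's experiments: the four-point dataset, the learning rate of 0.1, and the epoch counts are made up, and sampling is without replacement via per-epoch shuffling.

```python
import math
import random


def logistic_loss(margin):
    """log(1 + exp(-margin)), computed stably for either sign of the margin."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(w, data):
    """Average logistic loss of the bias-free classifier w over the dataset."""
    return sum(logistic_loss(y * (w[0] * x[0] + w[1] * x[1])) for x, y in data) / len(data)


def sgd(data, lr, epochs, seed=0):
    """Single-sample SGD with a fixed learning rate, sampling without replacement."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx]
            margin = y * (w[0] * x[0] + w[1] * x[1])
            g = 1.0 / (1.0 + math.exp(margin))  # magnitude of d(loss)/d(margin)
            w[0] += lr * g * y * x[0]
            w[1] += lr * g * y * x[1]
    return w


# Assumed toy dataset, separable through the origin; by symmetry the
# L2 max-margin direction is (1, 1) / sqrt(2).
DATA = [((2.0, 1.0), 1), ((1.0, 2.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
```

Calling `sgd(DATA, 0.1, 2000)` gives a much lower `mean_loss` than `sgd(DATA, 0.1, 10)`, and the resulting weight vector points close to the (1, 1) direction while its norm keeps growing slowly. That matches the qualitative picture in the abstract: the loss keeps shrinking without any learning-rate decay, and the direction settles long before the norm does.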
Reconstructing networks with unknown and heterogeneous errors
Title | Reconstructing networks with unknown and heterogeneous errors |
Authors | Tiago P. Peixoto |
Abstract | The vast majority of network datasets contains errors and omissions, although this is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by not only allowing for heterogeneous errors, but also for single edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable uncertainty estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, and yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks. |
Tasks | Bayesian Inference |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.07956v3 |
http://arxiv.org/pdf/1806.07956v3.pdf | |
PWC | https://paperswithcode.com/paper/reconstructing-networks-with-unknown-and |
Repo | |
Framework | |