October 20, 2019

2787 words 14 mins read

Paper Group ANR 98

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe. Lightweight Adaptive Mixture of Neural and N-gram Language Models. First-order and second-order variants of the gradient descent in a unified framework. Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks. ActiveStereoNet: …

Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe

Title Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe
Authors Jiong Gong, Haihao Shen, Guoming Zhang, Xiaoli Liu, Shane Li, Ge Jin, Niharika Maheshwari, Evarist Fomenko, Eden Segal
Abstract High-throughput, low-latency inference of deep neural networks is critical for the deployment of deep learning applications. This paper presents the efficient inference techniques of IntelCaffe, the first Intel-optimized deep learning framework that supports efficient 8-bit low-precision inference and model optimization techniques for convolutional neural networks on Intel Xeon Scalable Processors. The 8-bit optimized model is automatically generated from the FP32 model with a calibration process, without the need for fine-tuning or retraining. We show that inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from the IntelCaffe FP32 baseline, and by 56X-75X and 26X-37X from BVLC Caffe. All these techniques have been open-sourced on the IntelCaffe GitHub, and an artifact is provided to reproduce the results on the Amazon AWS Cloud. A toy sketch of the calibration step follows this entry.
Tasks Calibration
Published 2018-05-04
URL http://arxiv.org/abs/1805.08691v1
PDF http://arxiv.org/pdf/1805.08691v1.pdf
PWC https://paperswithcode.com/paper/highly-efficient-8-bit-low-precision-1
Repo
Framework
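As a reading aid, here is a minimal sketch of post-training symmetric 8-bit calibration in the spirit described above: quantization scales are derived from FP32 activation ranges observed on a small calibration set, with no fine-tuning or retraining. The function names and the max-abs calibration rule are illustrative assumptions, not IntelCaffe's actual implementation.

```python
# Hypothetical sketch of post-training 8-bit calibration: derive scales from
# FP32 activations observed on a calibration set, then quantize symmetrically.
import numpy as np

def calibrate_scale(activations, num_bits=8):
    """Derive a symmetric quantization scale from observed FP32 activations."""
    max_abs = max(np.abs(a).max() for a in activations)
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    return max_abs / qmax

def quantize(x, scale, num_bits=8):
    """Map FP32 values to int8 using the calibrated scale."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: calibrate on a few batches of activations, then quantize one batch.
rng = np.random.default_rng(0)
calib_batches = [rng.normal(size=(32, 64)).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(calib_batches)
x = calib_batches[0]
err = np.abs(dequantize(quantize(x, scale), scale) - x).mean()
print(f"scale={scale:.4f}, mean abs quantization error={err:.5f}")
```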

Lightweight Adaptive Mixture of Neural and N-gram Language Models

Title Lightweight Adaptive Mixture of Neural and N-gram Language Models
Authors Anton Bakhtin, Arthur Szlam, Marc’Aurelio Ranzato, Edouard Grave
Abstract It is often the case that the best-performing language model is an ensemble of a neural language model with n-grams. In this work, we propose a method to improve how these two models are combined. By using a small network that predicts the mixture weight between the two models, we adapt their relative importance at each time step. Because the gating network is small, it trains quickly on small amounts of held-out data and adds no overhead at scoring time. Our experiments, carried out on the One Billion Word benchmark, show a significant improvement over the state-of-the-art ensemble without retraining of the basic modules. A minimal gating sketch follows this entry.
Tasks Language Modelling
Published 2018-04-20
URL http://arxiv.org/abs/1804.07705v2
PDF http://arxiv.org/pdf/1804.07705v2.pdf
PWC https://paperswithcode.com/paper/lightweight-adaptive-mixture-of-neural-and-n
Repo
Framework
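A minimal sketch (not the authors' code) of the core idea: a tiny gate maps a context feature to a weight in (0, 1), and the ensemble probability at each time step is a convex combination of the neural and n-gram distributions. The single-linear-layer gate and the feature choice are assumptions for illustration.

```python
# Per-time-step mixing: a small gate produces lambda_t, and the ensemble
# probability is lambda_t * p_neural + (1 - lambda_t) * p_ngram.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixture_prob(p_neural, p_ngram, context_feat, w, b):
    """Combine two next-word distributions with a context-dependent gate."""
    lam = sigmoid(context_feat @ w + b)      # scalar mixture weight in (0, 1)
    return lam * p_neural + (1.0 - lam) * p_ngram

rng = np.random.default_rng(0)
V, d = 1000, 16                               # vocabulary size, feature dim
p_neural = rng.dirichlet(np.ones(V))
p_ngram = rng.dirichlet(np.ones(V))
w, b = rng.normal(size=d) * 0.1, 0.0          # tiny gate: one linear layer
feat = rng.normal(size=d)                     # e.g. hidden state of the neural LM
p_mix = mixture_prob(p_neural, p_ngram, feat, w, b)
print(p_mix.sum())                            # still a valid distribution: ~1.0
```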

First-order and second-order variants of the gradient descent in a unified framework

Title First-order and second-order variants of the gradient descent in a unified framework
Authors Thomas Pierrot, Nicolas Perrin, Olivier Sigaud
Abstract In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning. We propose a general framework in which six of these variants can be interpreted as different instances of the same approach. They are vanilla gradient descent, the classical and generalized Gauss-Newton methods, the natural gradient descent method, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specificities and show under which conditions some of them coincide. A toy illustration of the unified update follows this entry.
Tasks
Published 2018-10-18
URL https://arxiv.org/abs/1810.08102v2
PDF https://arxiv.org/pdf/1810.08102v2.pdf
PWC https://paperswithcode.com/paper/first-order-and-second-order-variants-of-the
Repo
Framework
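A toy illustration of the unifying view, under my reading of the abstract: every variant performs theta <- theta - lr * inv(M) @ grad for a different curvature matrix M (identity for vanilla gradient descent, the Hessian for Newton's method, the Fisher matrix for natural gradient, J^T J for Gauss-Newton). On the quadratic loss below, Newton and Gauss-Newton coincide, which illustrates the "conditions under which some of them coincide".

```python
# Toy illustration (my reading, not the paper's notation): each variant is
# theta <- theta - lr * solve(M, grad) for a different curvature matrix M.
# Natural gradient would use the Fisher matrix; it is omitted because this
# toy quadratic loss has no probabilistic model attached.
import numpy as np

def step(theta, grad, M, lr=1.0):
    """One preconditioned update; the choice of M selects the method."""
    return theta - lr * np.linalg.solve(M, grad)

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # loss L(theta) = 0.5 * theta^T A theta
theta = np.array([1.0, -1.0])
grad = A @ theta                           # exact gradient of the quadratic

variants = {
    "vanilla GD (M = I)":       np.eye(2),
    "Newton (M = Hessian)":     A,         # lands exactly at the minimum here
    "Gauss-Newton (M = J^T J)": A,         # coincides with Newton on a quadratic
}
for name, M in variants.items():
    print(name, step(theta, grad, M))
```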

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

Title Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks
Authors Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh
Abstract Neural language models have been widely used in various NLP tasks, including machine translation, next-word prediction and conversational agents. However, it is challenging to deploy these models on mobile devices due to their slow prediction speed, where the bottleneck is computing the top candidates in the softmax layer. In this paper, we introduce a novel softmax layer approximation algorithm that exploits the clustering structure of context vectors. Our algorithm uses a light-weight screening model to predict a much smaller set of candidate words based on the given context, and then conducts an exact softmax only within that subset. Training such a procedure end-to-end is challenging, as traditional clustering methods are discrete and non-differentiable and thus cannot be used with back-propagation during training. Using the Gumbel softmax, we are able to train the screening model end-to-end on the training set to exploit the data distribution. The algorithm achieves an order-of-magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next-word prediction. For example, for a German-to-English machine translation task with a vocabulary of around 25K words, we achieve a 20.4x speedup with 98.9% precision@1 and 99.3% precision@5 relative to the original softmax layer prediction, while the state of the art [MSRprediction] achieves only a 6.7x speedup with 98.7% precision@1 and 98.1% precision@5 on the same task. A rough sketch of the screening step follows this entry.
Tasks Machine Translation
Published 2018-10-29
URL http://arxiv.org/abs/1810.12406v1
PDF http://arxiv.org/pdf/1810.12406v1.pdf
PWC https://paperswithcode.com/paper/learning-to-screen-for-fast-softmax-inference
Repo
Framework
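A rough sketch of the screening idea, with assumed details: a light-weight model assigns the context vector to a cluster, each cluster keeps a precomputed shortlist of candidate words, and exact softmax logits are computed only over that shortlist. Here the screening model is a simple nearest-centroid lookup; the paper instead trains the assignment end-to-end with the Gumbel softmax.

```python
# Approximate top-k over a large vocabulary: screen to a small candidate set,
# then score exactly within it. Centroids and shortlists are invented here.
import numpy as np

def screened_topk(h, W, centroids, shortlists, k=5):
    """Predict top-k words using a cluster shortlist instead of the full vocab."""
    c = np.argmin(((centroids - h) ** 2).sum(axis=1))   # screening step
    cand = shortlists[c]                                # small candidate set
    logits = W[cand] @ h                                # exact logits on subset only
    order = np.argsort(-logits)[:k]
    return cand[order]

rng = np.random.default_rng(0)
V, d, C = 25000, 32, 64                       # vocab size, hidden dim, #clusters
W = rng.normal(size=(V, d))                   # softmax weight matrix
centroids = rng.normal(size=(C, d))
shortlists = [rng.choice(V, size=500, replace=False) for _ in range(C)]
h = rng.normal(size=d)                        # context vector
print(screened_topk(h, W, centroids, shortlists))
```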

ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems

Title ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
Authors Yinda Zhang, Sameh Khamis, Christoph Rhemann, Julien Valentin, Adarsh Kowdle, Vladimir Tankovich, Michael Schoenberg, Shahram Izadi, Thomas Funkhouser, Sean Fanello
Abstract In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of $1/30th$ of a pixel; it does not suffer from the common over-smoothing issues; it preserves edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using window-based cost aggregation with an adaptive support-weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allowing the network to reach compelling results. Finally, we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes. An illustrative sketch of such a reconstruction loss follows this entry.
Tasks
Published 2018-07-16
URL http://arxiv.org/abs/1807.06009v1
PDF http://arxiv.org/pdf/1807.06009v1.pdf
PWC https://paperswithcode.com/paper/activestereonet-end-to-end-self-supervised
Repo
Framework
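An illustrative sketch (not the paper's exact loss) of the self-supervised ingredient: warp the right image into the left view using the predicted disparity and penalize the photometric residual with a robust Charbonnier penalty, which tolerates noise better than plain L2. A 1-D image row stands in for a full image, and window-based cost aggregation is omitted for brevity.

```python
# Self-supervised stereo sketch: reconstruct the left view from the right
# image and a disparity map, then measure a robust photometric residual.
import numpy as np

def warp_right_to_left(right_row, disparity_row):
    """Sample the right image at x - d(x) with linear interpolation (1-D row)."""
    x = np.arange(right_row.size) - disparity_row
    x0 = np.clip(np.floor(x).astype(int), 0, right_row.size - 2)
    frac = np.clip(x - x0, 0.0, 1.0)
    return (1 - frac) * right_row[x0] + frac * right_row[x0 + 1]

def robust_recon_loss(left_row, right_row, disparity_row, eps=1e-3):
    recon = warp_right_to_left(right_row, disparity_row)
    residual = left_row - recon
    return np.mean(np.sqrt(residual ** 2 + eps ** 2))   # Charbonnier penalty

rng = np.random.default_rng(0)
right = rng.random(128)
true_disp = np.full(128, 3.0)
left = warp_right_to_left(right, true_disp)             # synthetic left view
print(robust_recon_loss(left, right, true_disp))        # near zero at true disparity
print(robust_recon_loss(left, right, true_disp + 2.0))  # larger when disparity is off
```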

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Title Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings
Authors Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, John R. Talburt
Abstract Keyword extraction is a fundamental task in natural language processing that facilitates mapping of documents to a concise set of representative single- and multi-word phrases. Keywords from text documents are primarily extracted using supervised and unsupervised approaches. In this paper, we present an unsupervised technique that uses a combination of a theme-weighted personalized PageRank algorithm and neural phrase embeddings for extracting and ranking keywords. We also introduce an efficient way of processing text documents and training phrase embeddings using existing techniques. We share an evaluation dataset derived from an existing dataset that is used for choosing the underlying embedding model. Evaluations of ranked keyword extraction are performed on two benchmark datasets comprising short abstracts (Inspec) and long scientific papers (SemEval 2010), and our method is shown to produce results better than state-of-the-art systems. A sketch of personalized PageRank ranking follows this entry.
Tasks Keyword Extraction
Published 2018-07-16
URL http://arxiv.org/abs/1807.05962v1
PDF http://arxiv.org/pdf/1807.05962v1.pdf
PWC https://paperswithcode.com/paper/theme-weighted-ranking-of-keywords-from-text
Repo
Framework
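A rough sketch, with assumed details, of personalized PageRank for keyword ranking: candidate phrases form a graph, and the teleport distribution is biased toward theme-relevant phrases, here via similarity of phrase embeddings to a toy theme vector. The graph construction and theme weighting below are invented for illustration.

```python
# Power iteration for PageRank with a non-uniform (theme-biased) teleport vector.
import numpy as np

def personalized_pagerank(adj, teleport, damping=0.85, iters=100):
    """Rank nodes by PageRank with a personalized teleport distribution."""
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    r = teleport.copy()
    for _ in range(iters):
        r = damping * (P.T @ r) + (1 - damping) * teleport
    return r

rng = np.random.default_rng(0)
n, d = 6, 8                                   # candidate phrases, embedding dim
adj = (rng.random((n, n)) > 0.5).astype(float)
np.fill_diagonal(adj, 0.0)                    # no self-loops
emb = rng.normal(size=(n, d))                 # phrase embeddings
theme = emb[:3].mean(axis=0)                  # toy theme vector
weights = np.maximum(emb @ theme, 1e-6)       # theme-similarity weights
teleport = weights / weights.sum()
print(personalized_pagerank(adj, teleport))   # scores used to rank keywords
```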

Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference

Title Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference
Authors Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
Abstract Gradient-descent-based algorithms and their stochastic versions have widespread applications in machine learning and statistical inference. In this work we perform an analytic study of the performance of one of them, the Langevin algorithm, in the context of noisy high-dimensional inference. We employ the Langevin algorithm to sample the posterior probability measure for the spiked matrix-tensor model. The typical behaviour of this algorithm is described by a system of integro-differential equations that we call the Langevin state evolution, whose solution is compared with that of the state evolution of approximate message passing (AMP). Our results show that, remarkably, the algorithmic threshold of the Langevin algorithm is sub-optimal with respect to the one given by AMP. We conjecture this phenomenon to be due to the residual glassiness present in that region of parameters. Finally, we show how a landscape-annealing protocol, which uses the Langevin algorithm but violates the Bayes-optimality condition, can approach the performance of AMP. A minimal sketch of the Langevin update follows this entry.
Tasks
Published 2018-12-21
URL https://arxiv.org/abs/1812.09066v4
PDF https://arxiv.org/pdf/1812.09066v4.pdf
PWC https://paperswithcode.com/paper/marvels-and-pitfalls-of-the-langevin
Repo
Framework
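A minimal sketch of the (unadjusted) Langevin algorithm studied in the paper: gradient ascent on the log-posterior plus Gaussian noise, which for small step sizes draws samples from the posterior. The 1-D Gaussian target below is a toy stand-in for the spiked matrix-tensor posterior.

```python
# Unadjusted Langevin algorithm: x <- x + step * grad_log_p(x) + sqrt(2*step) * xi.
import numpy as np

def langevin_sample(grad_log_p, x0, step=1e-2, n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x = x + step * grad_log_p(x) + np.sqrt(2 * step) * rng.normal()
        samples.append(x)
    return np.array(samples)

mu, sigma = 2.0, 0.5
grad_log_p = lambda x: -(x - mu) / sigma ** 2        # d/dx log N(mu, sigma^2)
samples = langevin_sample(grad_log_p, x0=0.0)
print(samples[5000:].mean(), samples[5000:].std())   # approx mu and sigma
```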

Bypassing Feature Squeezing by Increasing Adversary Strength

Title Bypassing Feature Squeezing by Increasing Adversary Strength
Authors Yash Sharma, Pin-Yu Chen
Abstract Feature Squeezing is a recently proposed defense method that reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. It has been shown that feature squeezing defenses can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks. However, we demonstrate on the MNIST and CIFAR-10 datasets that by increasing the adversary strength of said state-of-the-art attacks, one can bypass the detection framework with adversarial examples of minimal visual distortion. These results suggest that proposed defenses should be validated against stronger attack configurations. A sketch of bit-depth squeezing follows this entry.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.09868v1
PDF http://arxiv.org/pdf/1803.09868v1.pdf
PWC https://paperswithcode.com/paper/bypassing-feature-squeezing-by-increasing
Repo
Framework
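A sketch of the defense being probed, under standard assumptions about feature squeezing: bit-depth reduction collapses nearby inputs, and a detector flags inputs whose prediction changes noticeably after squeezing. The toy linear "model" below is invented; the paper's point is that a stronger adversary (e.g., a larger perturbation budget) can craft examples whose predictions stay consistent under squeezing.

```python
# Bit-depth squeezing plus a prediction-consistency detector score.
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Reduce color depth: quantize pixels in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def detector_score(predict, x, bits=4):
    """L1 distance between predictions on raw and squeezed input."""
    return np.abs(predict(x) - predict(squeeze_bit_depth(x, bits))).sum()

# Toy 'model': a fixed linear scorer over flattened pixels.
rng = np.random.default_rng(0)
w = rng.normal(size=28 * 28)
predict = lambda x: 1.0 / (1.0 + np.exp(-(x.ravel() @ w) / 28.0))
x = rng.random((28, 28))
print(detector_score(predict, x))   # small score -> input passes the detector
```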

Deep Learning: Computational Aspects

Title Deep Learning: Computational Aspects
Authors Nicholas Polson, Vadim Sokolov
Abstract In this article we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key to both training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets. A compact mini-batch SGD sketch follows this entry.
Tasks
Published 2018-08-26
URL https://arxiv.org/abs/1808.08618v2
PDF https://arxiv.org/pdf/1808.08618v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-computational-aspects
Repo
Framework
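A compact sketch of the core loop the review describes: mini-batch stochastic gradient descent on a least-squares model, where each update uses a random batch rather than the full data set.

```python
# Mini-batch SGD on least squares: shuffle, slice batches, step down the gradient.
import numpy as np

def sgd(X, y, lr=0.1, batch=32, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for s in range(0, len(y), batch):
            b = idx[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # batch gradient
            w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.01 * rng.normal(size=512)
print(sgd(X, y))   # close to [1, 2, 3, 4, 5]
```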

Learning to Segment Inputs for NMT Favors Character-Level Processing

Title Learning to Segment Inputs for NMT Favors Character-Level Processing
Authors Julia Kreutzer, Artem Sokolov
Abstract Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity importantly determines the input and output sequence lengths, and hence the modeling depth, as well as the source and target vocabularies, which in turn determine model size, the computational cost of softmax normalization, and the handling of out-of-vocabulary words. However, the current practice is to use static, heuristic-based segmentations that are fixed before NMT training. This raises the question of whether the chosen segmentation is optimal for the translation task. To overcome suboptimal segmentation choices, we present an algorithm for dynamic segmentation based on the Adaptive Computation Time algorithm (Graves 2016) that is trainable end-to-end and driven by the NMT objective. In an evaluation on four translation tasks we found that, given the freedom to navigate between different segmentation levels, the model prefers to operate on (almost) character level, providing support for purely character-level NMT models from a novel angle. A loose sketch of the halting mechanism follows this entry.
Tasks Machine Translation
Published 2018-10-02
URL http://arxiv.org/abs/1810.01480v3
PDF http://arxiv.org/pdf/1810.01480v3.pdf
PWC https://paperswithcode.com/paper/learning-to-segment-inputs-for-nmt-favors
Repo
Framework
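A loose sketch of the Adaptive Computation Time mechanism (Graves 2016) that the segmentation builds on: at each step a halting unit emits a probability, computation stops once the cumulative sum exceeds 1 - eps, and the remainder is assigned to the final step. The halting probabilities below are made up; in the paper they are produced by the network and drive segmentation decisions.

```python
# ACT-style halting: accumulate halting probabilities until the budget closes.
import numpy as np

def act_halting(halt_probs, eps=0.01):
    """Return per-step weights and the number of steps actually used."""
    cum, weights = 0.0, []
    for p in halt_probs:
        if cum + p >= 1.0 - eps:
            weights.append(1.0 - cum)        # remainder closes the budget
            break
        weights.append(p)
        cum += p
    return np.array(weights), len(weights)

weights, n_steps = act_halting([0.2, 0.5, 0.4, 0.3])
print(weights, weights.sum(), n_steps)       # weights sum to 1, halted at step 3
```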

Inexact SARAH Algorithm for Stochastic Optimization

Title Inexact SARAH Algorithm for Stochastic Optimization
Authors Lam M. Nguyen, Katya Scheinberg, Martin Takáč
Abstract We develop and analyze a variant of the variance-reducing stochastic gradient algorithm SARAH that does not require computation of the exact gradient. This new method can therefore be applied to general expectation-minimization problems rather than only finite-sum problems. While the original SARAH algorithm, as well as its predecessor SVRG, requires an exact gradient computation on each outer iteration, the inexact variant of SARAH (iSARAH) developed here requires only a stochastic gradient computed on a mini-batch of sufficient size. The proposed method combines variance reduction via sample-size selection with iterative stochastic gradient updates. We analyze the convergence rate of the algorithm for the strongly convex, convex, and nonconvex cases, with an appropriate mini-batch size selected for each case. We show that, under an additional reasonable assumption, iSARAH achieves the best known complexity among stochastic methods in the general convex case. A sketch of the recursive estimator follows this entry.
Tasks Stochastic Optimization
Published 2018-11-25
URL http://arxiv.org/abs/1811.10105v1
PDF http://arxiv.org/pdf/1811.10105v1.pdf
PWC https://paperswithcode.com/paper/inexact-sarah-algorithm-for-stochastic
Repo
Framework
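A hedged sketch of the SARAH-style recursive gradient estimator with the inexact twist: the outer gradient is a mini-batch estimate rather than the full gradient. The least-squares objective and all hyperparameters below are illustrative assumptions, not the paper's settings.

```python
# iSARAH-style loop: mini-batch outer gradient, then recursive inner updates
# v <- grad_i(w) - grad_i(w_prev) + v on single samples.
import numpy as np

def isarah(X, y, lr=0.05, outer=20, inner=50, outer_batch=128, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grad = lambda idx, w: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    for _ in range(outer):
        b = rng.choice(n, size=outer_batch, replace=False)
        v = grad(b, w)                             # inexact outer gradient
        w_prev, w = w, w - lr * v
        for _ in range(inner):
            i = rng.integers(n, size=1)
            v = grad(i, w) - grad(i, w_prev) + v   # SARAH recursive estimator
            w_prev, w = w, w - lr * v
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 5))
y = X @ np.ones(5) + 0.01 * rng.normal(size=1024)
print(isarah(X, y))   # approaches the all-ones solution
```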

Chinese Discourse Segmentation Using Bilingual Discourse Commonality

Title Chinese Discourse Segmentation Using Bilingual Discourse Commonality
Authors Jingfeng Yang, Sujian Li
Abstract Discourse segmentation aims to segment Elementary Discourse Units (EDUs) and is a fundamental task in discourse analysis. For Chinese, previous research identifies EDUs simply by discriminating the functions of punctuation marks. In this paper, we argue that Chinese EDUs may not end at punctuation positions and should instead follow the definition of EDUs in RST-DT. With this definition, we conduct Chinese discourse segmentation with the help of English labeled data. Exploiting the discourse commonality between English and Chinese, we design an adversarial neural network framework to extract common language-independent features and language-specific features that are useful for discourse segmentation when no, or only a small amount of, Chinese labeled data is available. Experiments on discourse segmentation demonstrate that our models can leverage common features from bilingual data and learn effective Chinese-specific features from a small amount of Chinese labeled data, outperforming the baseline models.
Tasks
Published 2018-08-30
URL http://arxiv.org/abs/1809.01497v1
PDF http://arxiv.org/pdf/1809.01497v1.pdf
PWC https://paperswithcode.com/paper/chinese-discourse-segmentation-using
Repo
Framework

Can Multisensory Cues in VR Help Train Pattern Recognition to Citizen Scientists?

Title Can Multisensory Cues in VR Help Train Pattern Recognition to Citizen Scientists?
Authors Alina Striner
Abstract As the internet of things (IoT) has integrated physical and digital technologies, designing for multiple sensory media (mulsemedia) has become more attainable. Designing technology for multiple senses has the capacity to improve virtual realism, extend our ability to process information, and more easily transfer knowledge between physical and digital environments. HCI researchers are beginning to explore the viability of integrating mulsemedia into virtual experiences; however, research has yet to consider whether mulsemedia truly enhances realism, immersion and knowledge transfer. My work developing StreamBED, a VR training platform for citizen science water monitors, plans to consider the role of mulsemedia in immersion and learning goals. Future findings about the role of mulsemedia in learning contexts may allow learners to experience, connect to, and learn from spaces that are impossible to experience firsthand.
Tasks Transfer Learning
Published 2018-04-01
URL http://arxiv.org/abs/1804.00229v1
PDF http://arxiv.org/pdf/1804.00229v1.pdf
PWC https://paperswithcode.com/paper/can-multisensory-cues-in-vr-help-train
Repo
Framework

Determining ellipses from low-resolution images with a comprehensive image formation model

Title Determining ellipses from low-resolution images with a comprehensive image formation model
Authors Wojciech Chojnacki, Zygmunt L. Szpak
Abstract When determining the parameters of a parametric planar shape from a single low-resolution image, common estimation paradigms lead to inaccurate parameter estimates, because standard estimation frameworks fail to model the image formation process at a sufficiently detailed level of analysis. We propose a new method for estimating the parameters of a planar elliptic shape based on a single photon-limited, low-resolution image. Our technique incorporates the effects of several elements - point-spread function, discretisation step, quantisation step, and photon noise - into a single cohesive and manageable statistical model. While we concentrate on the particular task of estimating the parameters of elliptic shapes, our ideas and methods have a much broader scope and can be used to address the problem of estimating the parameters of an arbitrary parametrically representable planar shape. Comprehensive experimental results on simulated and real imagery demonstrate that our approach yields parameter estimates with unprecedented accuracy. Furthermore, our method supplies a parameter covariance matrix as a measure of uncertainty for the estimated parameters, as well as a planar confidence region as a means of visualising that uncertainty. The mathematical model developed in this paper may prove useful in a variety of disciplines that operate with imagery at the limits of resolution. A toy forward model in this spirit follows this entry.
Tasks
Published 2018-07-18
URL http://arxiv.org/abs/1807.06814v3
PDF http://arxiv.org/pdf/1807.06814v3.pdf
PWC https://paperswithcode.com/paper/determining-ellipses-from-low-resolution
Repo
Framework
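A toy forward model in the spirit of the paper's pipeline, with invented parameters: render a binary ellipse at high resolution, blur with a point-spread function, downsample (discretisation), apply photon (Poisson) noise, and quantise intensities.

```python
# Toy image formation: ellipse -> PSF blur -> downsample -> photon noise -> quantise.
import numpy as np
from scipy.ndimage import gaussian_filter

def render_ellipse(n, cx, cy, a, b):
    yy, xx = np.mgrid[0:n, 0:n]
    return ((((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2) <= 1.0).astype(float)

def image_formation(hi_res, psf_sigma=4.0, factor=8, levels=256, photons=200.0, seed=0):
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(hi_res, psf_sigma)             # point-spread function
    low = blurred.reshape(hi_res.shape[0] // factor, factor,
                          hi_res.shape[1] // factor, factor).mean(axis=(1, 3))
    noisy = rng.poisson(low * photons) / photons             # photon noise
    return np.round(noisy * (levels - 1)) / (levels - 1)     # quantisation

hi = render_ellipse(256, cx=128, cy=128, a=60, b=35)
low = image_formation(hi)
print(low.shape, low.min(), low.max())                       # 32x32 degraded image
```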

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Title Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning
Authors Ziming Li, Julia Kiseleva, Maarten de Rijke
Abstract The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or produce nonsensical replies. To alleviate the first problem, we extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, within the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that provides a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state of the art.
Tasks Dialogue Generation, Imitation Learning
Published 2018-12-09
URL http://arxiv.org/abs/1812.03509v1
PDF http://arxiv.org/pdf/1812.03509v1.pdf
PWC https://paperswithcode.com/paper/dialogue-generation-from-imitation-learning
Repo
Framework