May 7, 2019

Paper Group AWR 46

Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling. An End-to-End Architecture for Keyword Spotting and Voice Activity Detection. Learning Optimized Risk Scores. Programming with a Differentiable Forth Interpreter. Learning Where to Attend Like a Human Driver. Learning to Refine Object Segments …

Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling

Title Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling
Authors Santi Puch, Asier Aduriz, Adrià Casamitjana, Veronica Vilaplana, Paula Petrone, Grégory Operto, Raffaele Cacciaglia, Stavros Skouras, Carles Falcon, José Luis Molinuevo, Juan Domingo Gispert
Abstract This paper describes a new neuroimaging analysis toolbox that allows for the modeling of nonlinear effects at the voxel level, overcoming limitations of methods based on linear models like the GLM. We illustrate its features using a relevant example in which distinct nonlinear trajectories of Alzheimer’s disease related brain atrophy patterns were found across the full biological spectrum of the disease. The open-source toolbox presented in this paper is available at https://github.com/imatge-upc/VNeAT.
Tasks
Published 2016-12-02
URL http://arxiv.org/abs/1612.00667v3
PDF http://arxiv.org/pdf/1612.00667v3.pdf
PWC https://paperswithcode.com/paper/voxelwise-nonlinear-regression-toolbox-for
Repo https://github.com/imatge-upc/VNeAT
Framework none
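
The toolbox’s core operation, fitting an independent nonlinear curve at every voxel, can be illustrated with a toy numpy sketch. This is a minimal illustration of voxelwise polynomial regression on made-up data, not VNeAT’s actual API:

```python
import numpy as np

# Toy data: 50 subjects, a 4x4x4 image per subject, plus each subject's age.
rng = np.random.default_rng(0)
n_subjects, shape = 50, (4, 4, 4)
age = rng.uniform(50, 90, n_subjects)
images = rng.normal(size=(n_subjects,) + shape)

# Design matrix for a quadratic curve in age (degree-2 polynomial basis).
X = np.stack([np.ones_like(age), age, age ** 2], axis=1)   # (n_subjects, 3)

# Solve an independent least-squares fit for every voxel at once by
# flattening the spatial dimensions into columns of the target matrix.
Y = images.reshape(n_subjects, -1)                          # (n_subjects, n_voxels)
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)                # (3, n_voxels)

# Per-voxel fitted trajectories and residual variance.
fitted = X @ coef
residual_var = ((Y - fitted) ** 2).mean(axis=0).reshape(shape)
print(residual_var.shape)  # (4, 4, 4): one variance estimate per voxel
```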

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

Title An End-to-End Architecture for Keyword Spotting and Voice Activity Detection
Authors Chris Lengerich, Awni Hannun
Abstract We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection. We develop novel inference algorithms for an end-to-end Recurrent Neural Network trained with the Connectionist Temporal Classification loss function which allow our model to achieve high accuracy on both keyword spotting and voice activity detection without retraining. In contrast to prior voice activity detection models, our architecture does not require aligned training data and uses the same parameters as the keyword spotting model. This allows us to deploy a high quality voice activity detector with no additional memory or maintenance requirements.
Tasks Action Detection, Activity Detection, Keyword Spotting
Published 2016-11-28
URL http://arxiv.org/abs/1611.09405v1
PDF http://arxiv.org/pdf/1611.09405v1.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-architecture-for-keyword
Repo https://github.com/taylorlu/AudioKWS
Framework tf
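
One natural way to read a voice-activity signal off a CTC model is from its frame-level posteriors: the probability mass on non-blank labels is a per-frame speech score. A minimal numpy sketch of that idea, assuming a (frames, labels) posterior matrix with the blank symbol at index 0 (both the shape and the blank index are assumptions, not details from the paper):

```python
import numpy as np

def vad_score(posteriors: np.ndarray, blank: int = 0) -> np.ndarray:
    """Per-frame speech score from CTC posteriors: probability mass
    assigned to any non-blank label. Shape: (frames, labels) -> (frames,)."""
    return 1.0 - posteriors[:, blank]

def is_speech(posteriors: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary voice-activity decision per frame."""
    return vad_score(posteriors) > threshold

# Toy example: 4 frames over a 3-symbol alphabet (blank, 'a', 'b').
post = np.array([[0.9, 0.05, 0.05],   # mostly blank -> silence
                 [0.2, 0.7, 0.1],     # non-blank mass -> speech
                 [0.1, 0.2, 0.7],
                 [0.8, 0.1, 0.1]])
print(is_speech(post))  # [False  True  True False]
```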

Learning Optimized Risk Scores

Title Learning Optimized Risk Scores
Authors Berk Ustun, Cynthia Rudin
Abstract Risk scores are simple classification models that let users make quick risk predictions by adding and subtracting a few small numbers. These models are widely used in medicine and criminal justice, but are difficult to learn from data because they need to be calibrated, sparse, use small integer coefficients, and obey application-specific operational constraints. In this paper, we present a new machine learning approach to learn risk scores. We formulate the risk score problem as a mixed integer nonlinear program, and present a cutting plane algorithm for non-convex settings to efficiently recover its optimal solution. We improve our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. Our approach can fit risk scores in a way that scales linearly in the number of samples, provides a certificate of optimality, and obeys real-world constraints without parameter tuning or post-processing. We benchmark the performance benefits of this approach through an extensive set of numerical experiments, comparing to risk scores built using heuristic approaches. We also discuss its practical benefits through a real-world application where we build a customized risk score for ICU seizure prediction in collaboration with the Massachusetts General Hospital.
Tasks Seizure Prediction
Published 2016-10-01
URL https://arxiv.org/abs/1610.00168v5
PDF https://arxiv.org/pdf/1610.00168v5.pdf
PWC https://paperswithcode.com/paper/learning-optimized-risk-scores
Repo https://github.com/ustunb/risk-slim
Framework none
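
Whatever solver produces it, the deployed artifact is tiny: a table of small integer points plus a logistic link. A toy sketch of scoring with such a model; the feature names and point values below are invented for illustration, not taken from the paper:

```python
import math

# Hypothetical risk score: small integer points per binary feature.
POINTS = {"age_ge_75": 2, "prior_seizure": 3, "abnormal_eeg": 1}
INTERCEPT = -4  # integer offset, also part of the learned model

def predicted_risk(patient: dict) -> float:
    """Add up the points, then map the integer total to a probability
    with the logistic function (the calibration the paper optimizes for)."""
    total = INTERCEPT + sum(p for f, p in POINTS.items() if patient.get(f))
    return 1.0 / (1.0 + math.exp(-total))

print(predicted_risk({"age_ge_75": True, "prior_seizure": True}))  # ~0.73
```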

Programming with a Differentiable Forth Interpreter

Title Programming with a Differentiable Forth Interpreter
Authors Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel
Abstract Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories.
Tasks
Published 2016-05-21
URL http://arxiv.org/abs/1605.06640v3
PDF http://arxiv.org/pdf/1605.06640v3.pdf
PWC https://paperswithcode.com/paper/programming-with-a-differentiable-forth
Repo https://github.com/uclmr/d4
Framework tf
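
The heart of a differentiable interpreter is a machine state where every operation is continuous. A generic numpy sketch of one such component, a soft stack whose push and pop are blended by a differentiable weight; this is the standard construction, not code from the d4 repo:

```python
import numpy as np

class SoftStack:
    """Stack over fixed-size value vectors where every operation is a
    convex combination, so gradients can flow through the control flow."""

    def __init__(self, depth: int, width: int):
        self.values = np.zeros((depth, width))

    def step(self, push_weight: float, value: np.ndarray):
        """Blend a hard push and a hard pop by push_weight in [0, 1]."""
        pushed = np.vstack([value, self.values[:-1]])                 # shift down, insert on top
        popped = np.vstack([self.values[1:], np.zeros_like(value)])   # shift up
        self.values = push_weight * pushed + (1 - push_weight) * popped

stack = SoftStack(depth=4, width=2)
stack.step(1.0, np.array([1.0, 0.0]))   # hard push
stack.step(0.5, np.array([0.0, 1.0]))   # half push, half pop
print(stack.values[0])                   # soft top of stack: [0.  0.5]
```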

Learning Where to Attend Like a Human Driver

Title Learning Where to Attend Like a Human Driver
Authors Andrea Palazzi, Francesco Solera, Simone Calderara, Stefano Alletto, Rita Cucchiara
Abstract Despite the advent of autonomous cars, it’s likely - at least in the near future - that human attention will still maintain a central role as a guarantee of legal responsibility during the driving task. In this paper we study the dynamics of the driver’s gaze and use it as a proxy to understand related attentional mechanisms. First, we build our analysis upon two questions: where and what is the driver looking at? Second, we model the driver’s gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveals that the driver’s gaze can indeed be learnt to some extent, despite i) being highly subjective and ii) having only one driver’s gaze available for each sequence due to the irreproducibility of the scene. Finally, we advocate a new assisted driving paradigm which suggests to the driver, with no intervention, where she should focus her attention.
Tasks
Published 2016-11-24
URL http://arxiv.org/abs/1611.08215v2
PDF http://arxiv.org/pdf/1611.08215v2.pdf
PWC https://paperswithcode.com/paper/learning-where-to-attend-like-a-human-driver
Repo https://github.com/francescosolera/dreyeving
Framework none
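
A hedged PyTorch sketch of the coarse-to-fine idea: predict a low-resolution attention map, then refine it at full resolution. For brevity it operates on single frames rather than the paper’s short clips, and all layer sizes are invented, not the authors’ model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineGaze(nn.Module):
    def __init__(self):
        super().__init__()
        # Coarse branch: predicts a low-resolution attention map.
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))
        # Fine branch: refines using the full-resolution frame plus
        # the upsampled coarse prediction as a fourth input channel.
        self.fine = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, frame):                      # frame: (B, 3, H, W)
        small = F.interpolate(frame, scale_factor=0.25)
        coarse = self.coarse(small)                # (B, 1, H/4, W/4)
        up = F.interpolate(coarse, size=frame.shape[-2:], mode="bilinear",
                           align_corners=False)
        return self.fine(torch.cat([frame, up], dim=1))  # (B, 1, H, W)

maps = CoarseToFineGaze()(torch.randn(2, 3, 64, 64))
print(maps.shape)  # torch.Size([2, 1, 64, 64])
```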

Learning to Refine Object Segments

Title Learning to Refine Object Segments
Authors Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár
Abstract Object segmentation requires both object-level information and low-level pixel data. This presents a challenge for feedforward networks: lower layers in convolutional nets capture rich spatial information, while upper layers encode object-level knowledge but are invariant to factors such as pose and appearance. In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach. The resulting bottom-up/top-down architecture is capable of efficiently generating high-fidelity object masks. Similarly to skip connections, our approach leverages features at all layers of the net. Unlike skip connections, our approach does not attempt to output independent predictions at each layer. Instead, we first output a coarse ‘mask encoding’ in a feedforward pass, then refine this mask encoding in a top-down pass utilizing features at successively lower layers. The approach is simple, fast, and effective. Building on the recent DeepMask network for generating object proposals, we show accuracy improvements of 10-20% in average recall for various setups. Additionally, by optimizing the overall network architecture, our approach, which we call SharpMask, is 50% faster than the original DeepMask network (under 0.8s per image).
Tasks Semantic Segmentation
Published 2016-03-29
URL http://arxiv.org/abs/1603.08695v2
PDF http://arxiv.org/pdf/1603.08695v2.pdf
PWC https://paperswithcode.com/paper/learning-to-refine-object-segments
Repo https://github.com/aby2s/sharpmask
Framework tf
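
The top-down refinement module is easy to sketch. A minimal PyTorch version of one refinement step as described above, where the current mask encoding is fused with same-resolution bottom-up features and upsampled; channel sizes are placeholders, not SharpMask’s actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementModule(nn.Module):
    """One SharpMask-style refinement step: fuse the current mask encoding
    with same-resolution bottom-up features, then upsample 2x."""

    def __init__(self, feat_ch, mask_ch, out_ch):
        super().__init__()
        self.skip = nn.Conv2d(feat_ch, out_ch, 3, padding=1)   # bottom-up features
        self.mask = nn.Conv2d(mask_ch, out_ch, 3, padding=1)   # top-down encoding

    def forward(self, features, mask_encoding):
        fused = F.relu(self.skip(features) + self.mask(mask_encoding))
        return F.interpolate(fused, scale_factor=2, mode="bilinear",
                             align_corners=False)

# One step: 14x14 mask encoding refined with 14x14 conv features -> 28x28.
m = RefinementModule(feat_ch=256, mask_ch=64, out_ch=64)
out = m(torch.randn(1, 256, 14, 14), torch.randn(1, 64, 14, 14))
print(out.shape)  # torch.Size([1, 64, 28, 28])
```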

Exploring the Limits of Language Modeling

Title Exploring the Limits of Language Modeling
Authors Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
Abstract In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
Tasks Language Modelling
Published 2016-02-07
URL http://arxiv.org/abs/1602.02410v2
PDF http://arxiv.org/pdf/1602.02410v2.pdf
PWC https://paperswithcode.com/paper/exploring-the-limits-of-language-modeling
Repo https://github.com/IBM/MAX-News-Text-Generator
Framework tf
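
One ingredient of the study, character-CNN word embeddings, fits in a few lines of PyTorch: per-character embeddings, a 1-D convolution over the character axis, then max-over-time pooling. A small sketch with illustrative dimensions (far smaller than the paper’s):

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Embed a word from its characters: per-character embeddings, a 1-D
    convolution over the character axis, then max-over-time pooling."""

    def __init__(self, n_chars=256, char_dim=16, word_dim=128, kernel=5):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel, padding=kernel // 2)

    def forward(self, char_ids):                  # (batch, word_len)
        x = self.chars(char_ids).transpose(1, 2)  # (batch, char_dim, word_len)
        return self.conv(x).max(dim=2).values     # (batch, word_dim)

enc = CharCNNWordEncoder()
words = torch.randint(0, 256, (4, 12))            # 4 words, 12 chars each
print(enc(words).shape)                           # torch.Size([4, 128])
```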

TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

Title TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks
Authors Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys
Abstract In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods use dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all possible transformations of the input. We instead formulate features in convolutional neural networks to be transformation-invariant. We achieve this using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal “canonical” instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with a smaller number of parameters, compared to standard convolutional neural networks with dataset augmentation and to other baselines.
Tasks
Published 2016-04-21
URL http://arxiv.org/abs/1604.06318v2
PDF http://arxiv.org/pdf/1604.06318v2.pdf
PWC https://paperswithcode.com/paper/ti-pooling-transformation-invariant-pooling
Repo https://github.com/nsavinov/semantic3dnet
Framework torch
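
TI-POOLING reduces to an element-wise max over weight-shared siamese branches. A minimal PyTorch sketch for a rotation transformation set; the backbone is a stand-in, not the paper’s architecture:

```python
import torch
import torch.nn as nn

class TIPooling(nn.Module):
    """Run the same (weight-shared) net on each transformed copy of the
    input and max-pool the resulting feature vectors element-wise."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):                              # x: (B, C, H, W)
        # Transformation set: 0/90/180/270 degree rotations.
        variants = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
        feats = torch.stack([self.backbone(v) for v in variants])  # (4, B, D)
        return feats.max(dim=0).values                 # (B, D), rotation-invariant

backbone = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(),
                         nn.LazyLinear(32))
model = TIPooling(backbone)
print(model(torch.randn(2, 1, 28, 28)).shape)          # torch.Size([2, 32])
```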

Gap Safe screening rules for sparsity enforcing penalties

Title Gap Safe screening rules for sparsity enforcing penalties
Authors Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon
Abstract In high-dimensional regression settings, sparsity-enforcing penalties have proved useful for regularizing the data-fitting term. A recently introduced technique, screening rules, proposes to ignore some variables during optimization, leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly, the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity-enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique allows us to safely discard more variables than previously considered safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality-gap computation) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks (Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc.), we report significant speed-ups compared to previously proposed safe rules on all tested data sets.
Tasks
Published 2016-11-17
URL http://arxiv.org/abs/1611.05780v4
PDF http://arxiv.org/pdf/1611.05780v4.pdf
PWC https://paperswithcode.com/paper/gap-safe-screening-rules-for-sparsity
Repo https://github.com/EugeneNdiaye/Gap_Safe_Rules
Framework none
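
For the Lasso, the Gap Safe test takes a few lines of numpy. A sketch under the formulation P(w) = 0.5*||y - Xw||^2 + lam*||w||_1; scaling conventions differ across papers, so treat the exact constants as assumptions rather than a transcription of the paper:

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Return a boolean mask of features that are provably inactive at the
    optimum of the Lasso 0.5*||y - Xw||^2 + lam*||w||_1, given any iterate w."""
    residual = y - X @ w
    # Dual-feasible point: rescale the residual so ||X^T theta||_inf <= 1.
    theta = residual / max(lam, np.abs(X.T @ residual).max())
    # Duality gap between the primal iterate and the dual point.
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = primal - dual
    # Gap Safe sphere: any feature with |x_j^T theta| + R*||x_j|| < 1 is inactive.
    radius = np.sqrt(2 * max(gap, 0.0)) / lam
    return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1.0

rng = np.random.default_rng(0)
X, y = rng.normal(size=(30, 100)), rng.normal(size=30)
screened = gap_safe_screen(X, y, np.zeros(100), lam=0.9 * np.abs(X.T @ y).max())
print(screened.sum(), "of 100 features safely discarded")
```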

The Freiburg Groceries Dataset

Title The Freiburg Groceries Dataset
Authors Philipp Jund, Nichola Abdo, Andreas Eitel, Wolfram Burgard
Abstract With the increasing performance of machine learning techniques in the last few years, the computer vision and robotics communities have created a large number of datasets for benchmarking object recognition tasks. These datasets cover a large spectrum of natural images and object categories, making them not only useful as a testbed for comparing machine learning approaches, but also a great resource for bootstrapping different domain-specific perception and robotic systems. One such domain is domestic environments, where an autonomous robot has to recognize a large variety of everyday objects such as groceries. This is a challenging task due to the large variety of objects and products, and there is a great need for real-world training data that goes beyond the product images available online. In this paper, we address this issue and present a dataset consisting of 5,000 images covering 25 different classes of groceries, with at least 97 images per class. We collected all images from real-world settings at different stores and apartments. In contrast to existing groceries datasets, our dataset includes a large variety of perspectives, lighting conditions, and degrees of clutter. Overall, our images contain thousands of different object instances. It is our hope that machine learning and robotics researchers find this dataset useful for training, testing, and bootstrapping their approaches. As a baseline classifier to facilitate comparison, we re-trained the CaffeNet architecture (an adaptation of the well-known AlexNet) on our dataset and achieved a mean accuracy of 78.9%. We release this trained model along with the code and data splits we used in our experiments.
Tasks Object Recognition
Published 2016-11-17
URL http://arxiv.org/abs/1611.05799v1
PDF http://arxiv.org/pdf/1611.05799v1.pdf
PWC https://paperswithcode.com/paper/the-freiburg-groceries-dataset
Repo https://github.com/PhilJd/freiburg_groceries_dataset
Framework none

Bi-directional Attention with Agreement for Dependency Parsing

Title Bi-directional Attention with Agreement for Dependency Parsing
Authors Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, Li Deng
Abstract We develop a novel bi-directional attention model for dependency parsing, which learns to agree on headword predictions from the forward and backward parsing directions. The parsing procedure for each direction is formulated as sequentially querying the memory component that stores continuous headword embeddings. The proposed parser makes use of ‘soft’ headword embeddings, allowing the model to implicitly capture high-order parsing history without dramatically increasing the computational complexity. We conduct experiments on English, Chinese, and 12 other languages from the CoNLL 2006 shared task, showing that the proposed model achieves state-of-the-art unlabeled attachment scores on 6 languages.
Tasks Dependency Parsing
Published 2016-08-06
URL http://arxiv.org/abs/1608.02076v2
PDF http://arxiv.org/pdf/1608.02076v2.pdf
PWC https://paperswithcode.com/paper/bi-directional-attention-with-agreement-for
Repo https://github.com/hao-cheng/biattdp
Framework none
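
The memory-querying step amounts to soft attention over candidate headword embeddings. A toy numpy sketch; names and dimensions are illustrative, not the paper’s:

```python
import numpy as np

def soft_headword(query: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Query the memory of candidate headword embeddings with one token's
    state; return the attention-weighted 'soft' headword embedding."""
    scores = memory @ query                  # (n_words,) dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over candidates
    return weights @ memory                  # convex mix of embeddings

rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 8))   # 6 candidate headwords, dim 8
query = rng.normal(size=8)         # state of the token being attached
print(soft_headword(query, memory).shape)  # (8,)
```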

Words or Characters? Fine-grained Gating for Reading Comprehension

Title Words or Characters? Fine-grained Gating for Reading Comprehension
Authors Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Abstract Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach can improve the performance on reading comprehension tasks, achieving new state-of-the-art results on the Children’s Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task.
Tasks Question Answering, Reading Comprehension
Published 2016-11-06
URL http://arxiv.org/abs/1611.01724v2
PDF http://arxiv.org/pdf/1611.01724v2.pdf
PWC https://paperswithcode.com/paper/words-or-characters-fine-grained-gating-for
Repo https://github.com/kimiyoung/fg-gating
Framework none
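
The gate itself is compact: a sigmoid computed from per-word properties interpolates element-wise between the character-level and word-level vectors. A minimal numpy sketch; the property features are placeholders:

```python
import numpy as np

def fine_grained_gate(word_vec, char_vec, props, W, b):
    """Element-wise interpolation between char- and word-level vectors,
    with the gate predicted from per-word properties (e.g. POS, frequency)."""
    g = 1.0 / (1.0 + np.exp(-(W @ props + b)))   # sigmoid gate, one value per dim
    return g * char_vec + (1.0 - g) * word_vec

rng = np.random.default_rng(0)
dim, n_props = 8, 5
W, b = rng.normal(size=(dim, n_props)), np.zeros(dim)
out = fine_grained_gate(rng.normal(size=dim), rng.normal(size=dim),
                        rng.normal(size=n_props), W, b)
print(out.shape)  # (8,)
```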

Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM

Title Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM
Authors Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang
Abstract Modeling human conversations is essential for building satisfying chat-bots with multi-turn dialog ability. Conversation modeling benefits notably from domain knowledge, since the relationships between sentences can be clarified by the semantic hints that knowledge introduces. In this paper, a deep neural network is proposed to incorporate background knowledge for conversation modeling. Through a specially designed Recall gate, domain knowledge can be transformed into the extra global memory of Long Short-Term Memory (LSTM), so as to enhance LSTM by cooperating with its local memory to capture the implicit semantic relevance between sentences within conversations. In addition, this paper introduces a loosely structured domain knowledge base, which can be built with a small amount of manual work and easily adopted by the Recall gate. Our model is evaluated on the context-oriented response selection task, and experimental results on two datasets show that our approach is promising for modeling human conversations and building key components of automatic chatting systems.
Tasks
Published 2016-05-17
URL http://arxiv.org/abs/1605.05110v2
PDF http://arxiv.org/pdf/1605.05110v2.pdf
PWC https://paperswithcode.com/paper/incorporating-loose-structured-knowledge-into
Repo https://github.com/JasonForJoy/Leaderboards-for-Multi-Turn-Response-Selection
Framework none
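
A hedged sketch of the Recall-gate idea: a standard LSTM step with one extra sigmoid gate that decides how much of a transformed knowledge vector enters the cell state. The gate placement here is a simplification of the paper’s design, not a transcription:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recall_lstm_step(x, h, c, knowledge, params):
    """One LSTM step with an extra 'recall' gate r that injects a global
    knowledge vector into the cell state alongside the usual gates."""
    z = np.concatenate([x, h])
    i = sigmoid(params["Wi"] @ z)               # input gate
    f = sigmoid(params["Wf"] @ z)               # forget gate
    o = sigmoid(params["Wo"] @ z)               # output gate
    r = sigmoid(params["Wr"] @ z)               # recall gate over knowledge
    g = np.tanh(params["Wg"] @ z)               # candidate cell update
    c = f * c + i * g + r * np.tanh(params["Wk"] @ knowledge)
    return np.tanh(c) * o, c

rng = np.random.default_rng(0)
d, dk = 8, 16
params = {k: rng.normal(scale=0.1, size=(d, 2 * d))
          for k in "Wi Wf Wo Wr Wg".split()}
params["Wk"] = rng.normal(scale=0.1, size=(d, dk))
h, c = recall_lstm_step(rng.normal(size=d), np.zeros(d), np.zeros(d),
                        rng.normal(size=dk), params)
print(h.shape, c.shape)  # (8,) (8,)
```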

Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text

Title Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text
Authors Ameya Prabhu, Aditya Joshi, Manish Shrivastava, Vasudeva Varma
Abstract Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining, ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform empirical analysis comparing the suitability and performance of various state-of-the-art SA methods on social media. In this paper, we introduce learning sub-word level representations in an LSTM (Subword-LSTM) architecture instead of character-level or word-level representations. This linguistic prior in our architecture enables us to learn information about the sentiment value of important morphemes. It also seems to work well on highly noisy text containing misspellings, as demonstrated by the morpheme-level feature maps learned by our model. We further hypothesize that encoding this linguistic prior in the Subword-LSTM architecture leads to its superior performance. Our system attains an accuracy 4-5% higher than traditional approaches on our dataset, and also outperforms the available system for sentiment analysis in Hi-En code-mixed text by 18%.
Tasks Opinion Mining, Sentiment Analysis
Published 2016-11-02
URL http://arxiv.org/abs/1611.00472v1
PDF http://arxiv.org/pdf/1611.00472v1.pdf
PWC https://paperswithcode.com/paper/towards-sub-word-level-compositions-for
Repo https://github.com/DrImpossible/Sub-word-LSTM
Framework none
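
A sketch of the Subword-LSTM pipeline as described: character embeddings, a 1-D convolution producing morpheme-like n-gram features, then an LSTM. All sizes are illustrative, not the authors’ configuration:

```python
import torch
import torch.nn as nn

class SubwordLSTM(nn.Module):
    """Character embeddings -> 1-D conv (morpheme-like n-gram features)
    -> LSTM -> sentence-level sentiment logits."""

    def __init__(self, n_chars=128, char_dim=16, feat_dim=32, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, feat_dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.out = nn.Linear(feat_dim, n_classes)

    def forward(self, char_ids):                      # (batch, seq_len)
        x = self.emb(char_ids).transpose(1, 2)        # (batch, char_dim, seq)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq, feat_dim)
        _, (h, _) = self.lstm(x)                      # final hidden state
        return self.out(h[-1])                        # (batch, n_classes)

logits = SubwordLSTM()(torch.randint(0, 128, (4, 40)))
print(logits.shape)  # torch.Size([4, 3])
```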

Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Title Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
Authors Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz
Abstract We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU’s computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C .
Tasks
Published 2016-11-18
URL http://arxiv.org/abs/1611.06256v3
PDF http://arxiv.org/pdf/1611.06256v3.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-through-asynchronous
Repo https://github.com/NVlabs/GA3C
Framework tf
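
The queueing scheme can be sketched with Python’s standard library: agents enqueue states on a shared prediction queue, a predictor thread batches them for one “GPU” call, and results are scattered back per agent. This mirrors GA3C’s structure, not its actual code:

```python
import queue
import threading

prediction_queue = queue.Queue()
result_queues = [queue.Queue() for _ in range(4)]   # one per agent

def predictor(batch_size=4):
    """Drain the shared queue, run one batched 'GPU' call, scatter results."""
    while True:
        batch = [prediction_queue.get()]
        while len(batch) < batch_size and not prediction_queue.empty():
            batch.append(prediction_queue.get())
        for agent_id, state in batch:               # dummy model: echo the state
            result_queues[agent_id].put(("policy_for", state))

threading.Thread(target=predictor, daemon=True).start()

# Each agent enqueues a state and blocks on its own result queue.
for agent_id in range(4):
    prediction_queue.put((agent_id, f"state-{agent_id}"))
print([result_queues[i].get() for i in range(4)])
```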