Paper Group AWR 46
Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling. An End-to-End Architecture for Keyword Spotting and Voice Activity Detection. Learning Optimized Risk Scores. Programming with a Differentiable Forth Interpreter. Learning Where to Attend Like a Human Driver. Learning to Refine Object Segments. Exploring the Limits of Language Modeling. TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks. Gap Safe screening rules for sparsity enforcing penalties. The Freiburg Groceries Dataset. Bi-directional Attention with Agreement for Dependency Parsing. Words or Characters? Fine-grained Gating for Reading Comprehension. Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM. Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU.
Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling
Title | Voxelwise nonlinear regression toolbox for neuroimage analysis: Application to aging and neurodegenerative disease modeling |
Authors | Santi Puch, Asier Aduriz, Adrià Casamitjana, Veronica Vilaplana, Paula Petrone, Grégory Operto, Raffaele Cacciaglia, Stavros Skouras, Carles Falcon, José Luis Molinuevo, Juan Domingo Gispert |
Abstract | This paper describes a new neuroimaging analysis toolbox that allows for the modeling of nonlinear effects at the voxel level, overcoming limitations of methods based on linear models like the GLM. We illustrate its features using a relevant example in which distinct nonlinear trajectories of Alzheimer’s disease related brain atrophy patterns were found across the full biological spectrum of the disease. The open-source toolbox presented in this paper is available at https://github.com/imatge-upc/VNeAT. |
Tasks | |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00667v3 |
http://arxiv.org/pdf/1612.00667v3.pdf | |
PWC | https://paperswithcode.com/paper/voxelwise-nonlinear-regression-toolbox-for |
Repo | https://github.com/imatge-upc/VNeAT |
Framework | none |
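The core computation the toolbox automates is fitting an independent nonlinear curve to every voxel. A minimal NumPy sketch of this idea, assuming a simple polynomial age trajectory (the function name and shapes are illustrative, not VNeAT's actual API):

```python
import numpy as np

def fit_voxelwise_polynomial(ages, voxels, degree=2):
    """Fit an independent polynomial age trajectory to every voxel.

    ages:   (n_subjects,) predictor, e.g. age at scan
    voxels: (n_subjects, n_voxels) gray-matter values
    Returns (degree + 1, n_voxels) coefficients, highest power first.
    """
    X = np.vander(ages, degree + 1)  # design matrix [age^d, ..., age, 1]
    coefs, *_ = np.linalg.lstsq(X, voxels, rcond=None)
    return coefs

# Toy usage: 50 subjects, 1000 voxels
rng = np.random.default_rng(0)
coefs = fit_voxelwise_polynomial(rng.uniform(55, 90, 50), rng.normal(size=(50, 1000)))
print(coefs.shape)  # (3, 1000)
```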
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection
Title | An End-to-End Architecture for Keyword Spotting and Voice Activity Detection |
Authors | Chris Lengerich, Awni Hannun |
Abstract | We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection. We develop novel inference algorithms for an end-to-end Recurrent Neural Network trained with the Connectionist Temporal Classification loss function which allow our model to achieve high accuracy on both keyword spotting and voice activity detection without retraining. In contrast to prior voice activity detection models, our architecture does not require aligned training data and uses the same parameters as the keyword spotting model. This allows us to deploy a high-quality voice activity detector with no additional memory or maintenance requirements. |
Tasks | Action Detection, Activity Detection, Keyword Spotting |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09405v1 |
http://arxiv.org/pdf/1611.09405v1.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-architecture-for-keyword |
Repo | https://github.com/taylorlu/AudioKWS |
Framework | tf |
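The paper's trick is to reuse one CTC-trained network for both tasks at inference time. A hedged sketch: the keyword score is the CTC forward probability of the keyword's label sequence, and a simple voice-activity signal can be read off as per-frame non-blank probability (the label indices and scoring details here are illustrative, not the authors' exact algorithms):

```python
import numpy as np

def ctc_forward_logprob(log_probs, labels, blank=0):
    """log P(labels | log_probs) under the standard CTC forward recursion.

    log_probs: (T, n_labels) framewise log-posteriors from the network
    labels:    keyword label sequence, e.g. its phoneme/character indices
    """
    T, _ = log_probs.shape
    ext = [blank]
    for l in labels:            # interleave blanks: [b, l1, b, l2, b, ...]
        ext += [l, blank]
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]
            if s > 0:
                cands.append(alpha[t - 1, s - 1])
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    return np.logaddexp(alpha[-1, -1], alpha[-1, -2] if S > 1 else -np.inf)

def vad_score(log_probs, blank=0):
    # Frame-level speech evidence from the same model: 1 - p(blank)
    return 1.0 - np.exp(log_probs[:, blank])
```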
Learning Optimized Risk Scores
Title | Learning Optimized Risk Scores |
Authors | Berk Ustun, Cynthia Rudin |
Abstract | Risk scores are simple classification models that let users make quick risk predictions by adding and subtracting a few small numbers. These models are widely used in medicine and criminal justice, but are difficult to learn from data because they need to be calibrated, sparse, use small integer coefficients, and obey application-specific operational constraints. In this paper, we present a new machine learning approach to learn risk scores. We formulate the risk score problem as a mixed integer nonlinear program, and present a cutting plane algorithm for non-convex settings to efficiently recover its optimal solution. We improve our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. Our approach can fit risk scores in a way that scales linearly in the number of samples, provides a certificate of optimality, and obeys real-world constraints without parameter tuning or post-processing. We benchmark the performance benefits of this approach through an extensive set of numerical experiments, comparing to risk scores built using heuristic approaches. We also discuss its practical benefits through a real-world application where we build a customized risk score for ICU seizure prediction in collaboration with the Massachusetts General Hospital. |
Tasks | Seizure Prediction |
Published | 2016-10-01 |
URL | https://arxiv.org/abs/1610.00168v5 |
https://arxiv.org/pdf/1610.00168v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-optimized-risk-scores |
Repo | https://github.com/ustunb/risk-slim |
Framework | none |
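For intuition, this is what the learned artifact looks like at prediction time: a handful of small integer points summed and passed through the logistic function. The point values below are made up for illustration; risk-slim learns them under calibration, sparsity, and operational constraints:

```python
import math

# Hypothetical risk score; features and points are NOT from the paper.
POINTS = {"age_ge_75": 2, "prior_seizure": 3, "abnormal_eeg": 2, "bias": -4}

def predicted_risk(features):
    """Sum the points of the present features, map the score to a risk."""
    score = POINTS["bias"] + sum(POINTS[f] for f in features)
    return 1.0 / (1.0 + math.exp(-score))

print(round(predicted_risk({"prior_seizure", "abnormal_eeg"}), 3))  # 0.731
```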
Programming with a Differentiable Forth Interpreter
Title | Programming with a Differentiable Forth Interpreter |
Authors | Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel |
Abstract | Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories. |
Tasks | |
Published | 2016-05-21 |
URL | http://arxiv.org/abs/1605.06640v3 |
http://arxiv.org/pdf/1605.06640v3.pdf | |
PWC | https://paperswithcode.com/paper/programming-with-a-differentiable-forth |
Repo | https://github.com/uclmr/d4 |
Framework | tf |
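The enabling data structure is a continuous stack in which push/pop decisions are soft, so slot behaviour can be trained by gradient descent. A toy PyTorch sketch under that reading (one of several possible formulations; names and the exact update rule are illustrative, not d4's implementation):

```python
import torch

def soft_step(stack, pointer, value, push_strength):
    """One differentiable stack step.

    stack:   (depth, width) memory, pointer: (depth,) soft one-hot position,
    value:   (width,) vector to write, push_strength in [0, 1] (1=push, 0=pop).
    """
    # Interpolated write of `value` at the pointer position
    stack = stack + push_strength * pointer.unsqueeze(1) * (value - stack)
    # Blend pointer movement: up for push, down for pop
    pointer = push_strength * torch.roll(pointer, 1) \
        + (1 - push_strength) * torch.roll(pointer, -1)
    return stack, pointer

stack, ptr = torch.zeros(8, 4), torch.zeros(8)
ptr[0] = 1.0
stack, ptr = soft_step(stack, ptr, torch.ones(4), push_strength=0.9)
```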
Learning Where to Attend Like a Human Driver
Title | Learning Where to Attend Like a Human Driver |
Authors | Andrea Palazzi, Francesco Solera, Simone Calderara, Stefano Alletto, Rita Cucchiara |
Abstract | Despite the advent of autonomous cars, it’s likely - at least in the near future - that human attention will still maintain a central role as a guarantee in terms of legal responsibility during the driving task. In this paper we study the dynamics of the driver’s gaze and use it as a proxy to understand related attentional mechanisms. First, we build our analysis upon two questions: where and what is the driver looking at? Second, we model the driver’s gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveals that the driver’s gaze can indeed be learnt to some extent, despite i) being highly subjective and ii) having only one driver’s gaze available for each sequence due to the irreproducibility of the scene. Finally, we advocate for a new assisted driving paradigm which suggests to the driver, with no intervention, where she should focus her attention. |
Tasks | |
Published | 2016-11-24 |
URL | http://arxiv.org/abs/1611.08215v2 |
http://arxiv.org/pdf/1611.08215v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-where-to-attend-like-a-human-driver |
Repo | https://github.com/francescosolera/dreyeving |
Framework | none |
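A minimal stand-in for the coarse prediction stage: a small fully-convolutional network mapping a stack of frames to a fixation heatmap. Layer sizes are arbitrary placeholders; the real model is the paper's coarse-to-fine architecture trained on DR(eye)VE clips:

```python
import torch
import torch.nn as nn

class CoarseGazeNet(nn.Module):
    def __init__(self, in_frames=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * in_frames, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # 1-channel saliency logits
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, clip):  # clip: (B, 3*T, H, W) stacked RGB frames
        return torch.sigmoid(self.features(clip))

heatmap = CoarseGazeNet()(torch.randn(1, 48, 112, 112))
print(heatmap.shape)  # (1, 1, 112, 112)
```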
Learning to Refine Object Segments
Title | Learning to Refine Object Segments |
Authors | Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár |
Abstract | Object segmentation requires both object-level information and low-level pixel data. This presents a challenge for feedforward networks: lower layers in convolutional nets capture rich spatial information, while upper layers encode object-level knowledge but are invariant to factors such as pose and appearance. In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach. The resulting bottom-up/top-down architecture is capable of efficiently generating high-fidelity object masks. Similarly to skip connections, our approach leverages features at all layers of the net. Unlike skip connections, our approach does not attempt to output independent predictions at each layer. Instead, we first output a coarse “mask encoding” in a feedforward pass, then refine this mask encoding in a top-down pass utilizing features at successively lower layers. The approach is simple, fast, and effective. Building on the recent DeepMask network for generating object proposals, we show accuracy improvements of 10-20% in average recall for various setups. Additionally, by optimizing the overall network architecture, our approach, which we call SharpMask, is 50% faster than the original DeepMask network (under 0.8s per image). |
Tasks | Semantic Segmentation |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08695v2 |
http://arxiv.org/pdf/1603.08695v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-refine-object-segments |
Repo | https://github.com/aby2s/sharpmask |
Framework | tf |
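The refinement module described in the abstract can be sketched as follows: merge the top-down mask encoding with same-resolution bottom-up skip features, then upsample toward the next-higher resolution. Channel counts are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class RefinementModule(nn.Module):
    def __init__(self, mask_ch, skip_ch, out_ch):
        super().__init__()
        self.mask_conv = nn.Conv2d(mask_ch, out_ch, 3, padding=1)
        self.skip_conv = nn.Conv2d(skip_ch, out_ch, 3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, mask_enc, skip_feat):
        # Merge top-down mask encoding with bottom-up features, then upsample
        merged = torch.relu(self.mask_conv(mask_enc) + self.skip_conv(skip_feat))
        return self.up(merged)

m = RefinementModule(32, 64, 32)
out = m(torch.randn(1, 32, 14, 14), torch.randn(1, 64, 14, 14))
print(out.shape)  # (1, 32, 28, 28)
```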
Exploring the Limits of Language Modeling
Title | Exploring the Limits of Language Modeling |
Authors | Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu |
Abstract | In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long Short-Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon. |
Tasks | Language Modelling |
Published | 2016-02-07 |
URL | http://arxiv.org/abs/1602.02410v2 |
http://arxiv.org/pdf/1602.02410v2.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-limits-of-language-modeling |
Repo | https://github.com/IBM/MAX-News-Text-Generator |
Framework | tf |
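Since the headline numbers are perplexities, a quick reminder of the metric being reported: perplexity is the exponentiated average per-token negative log-likelihood on held-out text.

```python
import math

def perplexity(token_log_probs):
    """token_log_probs: model log-probabilities (natural log) of each held-out token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model assigning every token probability 1/30 has perplexity 30
print(round(perplexity([math.log(1 / 30)] * 1000), 1))  # 30.0
```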
TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks
Title | TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks |
Authors | Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys |
Abstract | In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. Instead, we formulate features in convolutional neural networks to be transformation-invariant. We achieve this using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal “canonical” instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with a smaller number of parameters compared to standard convolutional neural networks with dataset augmentation and to other baselines. |
Tasks | |
Published | 2016-04-21 |
URL | http://arxiv.org/abs/1604.06318v2 |
http://arxiv.org/pdf/1604.06318v2.pdf | |
PWC | https://paperswithcode.com/paper/ti-pooling-transformation-invariant-pooling |
Repo | https://github.com/nsavinov/semantic3dnet |
Framework | torch |
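The operator itself is compact enough to sketch: run a weight-shared branch over each transformed copy of the input (here, the four 90-degree rotations as an assumed transformation set) and max-pool the feature vectors element-wise before the classifier:

```python
import torch
import torch.nn as nn

class TIPoolNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.branch = nn.Sequential(  # shared across all transformed copies
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):  # x: (B, 1, H, W)
        feats = [self.branch(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        ti = torch.stack(feats, dim=0).max(dim=0).values  # pool over transforms
        return self.fc(ti)

logits = TIPoolNet()(torch.randn(2, 1, 28, 28))
print(logits.shape)  # (2, 10)
```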
Gap Safe screening rules for sparsity enforcing penalties
Title | Gap Safe screening rules for sparsity enforcing penalties |
Authors | Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, Joseph Salmon |
Abstract | In high-dimensional regression settings, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules proposes to ignore some variables during the optimization, leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly, the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique allows us to safely discard more variables than previously proposed safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality-gap computation) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks, Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc., we report significant speed-ups compared to previously proposed safe rules on all tested data sets. |
Tasks | |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05780v4 |
http://arxiv.org/pdf/1611.05780v4.pdf | |
PWC | https://paperswithcode.com/paper/gap-safe-screening-rules-for-sparsity |
Repo | https://github.com/EugeneNdiaye/Gap_Safe_Rules |
Framework | none |
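For the plain Lasso, the rule reduces to a few lines: build a dual-feasible point from the residual, compute the duality gap, and discard any feature whose screening score stays below 1. A hedged NumPy sketch (the rescaling choice and tolerances are simplifications of the paper's general framework):

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Gap Safe screening for min_w 0.5*||y - Xw||^2 + lam*||w||_1.

    Returns a boolean mask of features that can be safely set to zero.
    """
    residual = y - X @ w
    # Dual-feasible point: rescale the residual into the dual constraint ball
    scale = min(1.0, lam / max(np.max(np.abs(X.T @ residual)), 1e-12))
    theta = residual * scale / lam
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * (theta - y / lam) @ (theta - y / lam)
    gap = max(primal - dual, 0.0)
    radius = np.sqrt(2 * gap) / lam
    # Safe to discard feature j if |x_j^T theta| + radius * ||x_j|| < 1
    scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
    return scores < 1.0
```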
The Freiburg Groceries Dataset
Title | The Freiburg Groceries Dataset |
Authors | Philipp Jund, Nichola Abdo, Andreas Eitel, Wolfram Burgard |
Abstract | With the increasing performance of machine learning techniques in the last few years, the computer vision and robotics communities have created a large number of datasets for benchmarking object recognition tasks. These datasets cover a large spectrum of natural images and object categories, making them not only useful as a testbed for comparing machine learning approaches, but also a great resource for bootstrapping different domain-specific perception and robotic systems. One such domain is domestic environments, where an autonomous robot has to recognize a large variety of everyday objects such as groceries. This is a challenging task due to the large variety of objects and products, and there is a great need for real-world training data that goes beyond the product images available online. In this paper, we address this issue and present a dataset consisting of 5,000 images covering 25 different classes of groceries, with at least 97 images per class. We collected all images from real-world settings at different stores and apartments. In contrast to existing groceries datasets, our dataset includes a large variety of perspectives, lighting conditions, and degrees of clutter. Overall, our images contain thousands of different object instances. It is our hope that machine learning and robotics researchers find this dataset of use for training, testing, and bootstrapping their approaches. As a baseline classifier to facilitate comparison, we re-trained the CaffeNet architecture (an adaptation of the well-known AlexNet) on our dataset and achieved a mean accuracy of 78.9%. We release this trained model along with the code and data splits we used in our experiments. |
Tasks | Object Recognition |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05799v1 |
http://arxiv.org/pdf/1611.05799v1.pdf | |
PWC | https://paperswithcode.com/paper/the-freiburg-groceries-dataset |
Repo | https://github.com/PhilJd/freiburg_groceries_dataset |
Framework | none |
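The released baseline is Caffe-based; a rough PyTorch equivalent of the same recipe, assuming torchvision >= 0.13, is to take an ImageNet-pretrained AlexNet (the architecture CaffeNet adapts) and retrain its final layer for the 25 grocery classes:

```python
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained AlexNet, final layer swapped for 25 grocery classes
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 25)
# ...then train with cross-entropy on the released train/test splits.
```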
Bi-directional Attention with Agreement for Dependency Parsing
Title | Bi-directional Attention with Agreement for Dependency Parsing |
Authors | Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, Li Deng |
Abstract | We develop a novel bi-directional attention model for dependency parsing, which learns to agree on headword predictions from the forward and backward parsing directions. The parsing procedure for each direction is formulated as sequentially querying the memory component that stores continuous headword embeddings. The proposed parser makes use of “soft” headword embeddings, allowing the model to implicitly capture high-order parsing history without dramatically increasing the computational complexity. We conduct experiments on English, Chinese, and 12 other languages from the CoNLL 2006 shared task, showing that the proposed model achieves state-of-the-art unlabeled attachment scores on 6 languages. |
Tasks | Dependency Parsing |
Published | 2016-08-06 |
URL | http://arxiv.org/abs/1608.02076v2 |
http://arxiv.org/pdf/1608.02076v2.pdf | |
PWC | https://paperswithcode.com/paper/bi-directional-attention-with-agreement-for |
Repo | https://github.com/hao-cheng/biattdp |
Framework | none |
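The “soft headword embedding” idea in miniature: rather than committing to one head, a dependent's head representation is the attention-weighted sum over all candidate head embeddings stored in memory. Shapes and the scoring function are illustrative:

```python
import torch

def soft_headword(query, head_memory):
    """query: (d,) dependent representation; head_memory: (n_words, d)."""
    scores = head_memory @ query            # attention logits over candidates
    weights = torch.softmax(scores, dim=0)  # distribution over possible heads
    return weights @ head_memory            # soft headword embedding

emb = soft_headword(torch.randn(8), torch.randn(5, 8))
print(emb.shape)  # (8,)
```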
Words or Characters? Fine-grained Gating for Reading Comprehension
Title | Words or Characters? Fine-grained Gating for Reading Comprehension |
Authors | Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov |
Abstract | Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach can improve the performance on reading comprehension tasks, achieving new state-of-the-art results on the Children’s Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01724v2 |
http://arxiv.org/pdf/1611.01724v2.pdf | |
PWC | https://paperswithcode.com/paper/words-or-characters-fine-grained-gating-for |
Repo | https://github.com/kimiyoung/fg-gating |
Framework | none |
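The core gating equation is easy to state: a per-dimension gate computed from word properties blends the character-level and word-level vectors. In this sketch `props` stands in for the paper's word features (POS, NER, frequency, etc.):

```python
import torch
import torch.nn as nn

class FineGrainedGate(nn.Module):
    def __init__(self, prop_dim, emb_dim):
        super().__init__()
        self.gate = nn.Linear(prop_dim, emb_dim)

    def forward(self, word_emb, char_emb, props):
        g = torch.sigmoid(self.gate(props))       # per-dimension gate in [0, 1]
        return g * char_emb + (1 - g) * word_emb  # fine-grained blend

mix = FineGrainedGate(5, 100)(torch.randn(100), torch.randn(100), torch.randn(5))
print(mix.shape)  # (100,)
```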
Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM
Title | Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM |
Authors | Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang |
Abstract | Modeling human conversations is essential for building satisfying chat-bots with multi-turn dialog ability. Conversation modeling will notably benefit from domain knowledge since the relationships between sentences can be clarified due to semantic hints introduced by knowledge. In this paper, a deep neural network is proposed to incorporate background knowledge for conversation modeling. Through a specially designed Recall gate, domain knowledge can be transformed into the extra global memory of Long Short-Term Memory (LSTM), so as to enhance LSTM by cooperating with its local memory to capture the implicit semantic relevance between sentences within conversations. In addition, this paper introduces a loosely structured domain knowledge base, which can be built with a slight amount of manual work and easily adopted by the Recall gate. Our model is evaluated on the context-oriented response selection task, and experimental results on both datasets show that our approach is promising for modeling human conversations and building key components of automatic chatting systems. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05110v2 |
http://arxiv.org/pdf/1605.05110v2.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-loose-structured-knowledge-into |
Repo | https://github.com/JasonForJoy/Leaderboards-for-Multi-Turn-Response-Selection |
Framework | none |
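A schematic reading of the Recall gate, not the authors' code: alongside a standard LSTM cell, a gate computed from the input decides how much of a fixed global knowledge vector to mix into the cell state:

```python
import torch
import torch.nn as nn

class RecallGateLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, knowledge_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.recall = nn.Linear(input_size + knowledge_size, hidden_size)
        self.know_proj = nn.Linear(knowledge_size, hidden_size)

    def forward(self, x, state, knowledge):
        h, c = self.cell(x, state)
        # Recall gate: how much global knowledge memory enters the cell state
        r = torch.sigmoid(self.recall(torch.cat([x, knowledge], dim=-1)))
        c = c + r * torch.tanh(self.know_proj(knowledge))
        return h, c
```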
Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text
Title | Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text |
Authors | Ameya Prabhu, Aditya Joshi, Manish Shrivastava, Vasudeva Varma |
Abstract | Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining, ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform an empirical analysis comparing the suitability and performance of various state-of-the-art SA methods on social media. In this paper, we introduce learning sub-word level representations in an LSTM (Subword-LSTM) architecture instead of character-level or word-level representations. This linguistic prior in our architecture enables us to learn information about the sentiment value of important morphemes. It also seems to work well on highly noisy text containing misspellings, as demonstrated by the morpheme-level feature maps learned by our model. We hypothesize that encoding this linguistic prior in the Subword-LSTM architecture leads to its superior performance. Our system attains an accuracy 4-5% greater than traditional approaches on our dataset, and also outperforms the available system for sentiment analysis in Hi-En code-mixed text by 18%. |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2016-11-02 |
URL | http://arxiv.org/abs/1611.00472v1 |
http://arxiv.org/pdf/1611.00472v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-sub-word-level-compositions-for |
Repo | https://github.com/DrImpossible/Sub-word-LSTM |
Framework | none |
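The Subword-LSTM pipeline in outline: 1-D convolutions over character embeddings produce morpheme-like sub-word features, which an LSTM composes into a sentence representation for sentiment classification. Sizes below are illustrative, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class SubwordLSTM(nn.Module):
    def __init__(self, n_chars=128, emb=32, feat=64, classes=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, feat, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(feat, feat, batch_first=True)
        self.out = nn.Linear(feat, classes)

    def forward(self, char_ids):                      # (B, T) character indices
        x = self.emb(char_ids).transpose(1, 2)        # (B, emb, T)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # sub-word features (B, T, feat)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])                        # sentiment logits

logits = SubwordLSTM()(torch.randint(0, 128, (2, 40)))
print(logits.shape)  # (2, 3)
```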
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
Title | Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU |
Authors | Mohammad Babaeizadeh, Iuri Frosio, Stephen Tyree, Jason Clemons, Jan Kautz |
Abstract | We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU’s computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed up compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C . |
Tasks | |
Published | 2016-11-18 |
URL | http://arxiv.org/abs/1611.06256v3 |
http://arxiv.org/pdf/1611.06256v3.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-through-asynchronous |
Repo | https://github.com/NVlabs/GA3C |
Framework | tf |
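The systems idea in miniature: agents enqueue states, and a predictor thread batches them for a single GPU forward pass, amortizing latency across agents. Queue sizes and the stand-in `policy` are illustrative, not NVlabs' implementation:

```python
import queue

prediction_q = queue.Queue(maxsize=256)  # agents put (state, reply_q) tuples here

def predictor_loop(policy, batch_size=32):
    """Drain the prediction queue into batches and run one GPU call per batch."""
    while True:
        batch = [prediction_q.get()]                 # block for the first item
        while len(batch) < batch_size and not prediction_q.empty():
            batch.append(prediction_q.get_nowait())  # opportunistic batching
        states = [s for s, _ in batch]
        actions = policy(states)                     # one batched forward pass
        for (_, reply_q), a in zip(batch, actions):
            reply_q.put(a)                           # return each agent's action
```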