Paper Group AWR 122
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
Title | Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors |
Authors | Andrew Ilyas, Logan Engstrom, Aleksander Madry |
Abstract | We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available. We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and we demonstrate that the current state-of-the-art methods are optimal in a natural sense. Despite this optimality, we show how to improve black-box attacks by bringing a new element into the problem: gradient priors. We give a bandit optimization-based algorithm that allows us to seamlessly integrate any such priors, and we explicitly identify and incorporate two examples. The resulting methods use two to four times fewer queries and fail two to five times less often than the current state-of-the-art. |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07978v3 |
PDF | http://arxiv.org/pdf/1807.07978v3.pdf |
PWC | https://paperswithcode.com/paper/prior-convictions-black-box-adversarial |
Repo | https://github.com/mllab-adv-attack/lazy-attack |
Framework | tf |
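
The bandit framing above reduces to a short loop: antithetic loss queries around a running gradient prior refine the prior, and the image takes a signed step along it. Below is a minimal numpy sketch of that loop under an l-infinity threat model; `loss_fn`, the step sizes, and the [0, 1] pixel range are placeholder assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def bandit_attack_step(loss_fn, x, v, fd_eta=0.1, image_lr=0.01,
                       prior_lr=0.1, exploration=0.01):
    """One bandit-style black-box gradient-estimation step.

    loss_fn: loss oracle mapping an image to a scalar loss.
    v: running gradient prior, same shape as x.
    Antithetic queries around the prior give a one-dimensional
    finite-difference signal that updates the prior itself; the
    image then ascends the loss along the sign of the prior.
    """
    u = np.random.randn(*x.shape)          # exploration direction
    u /= np.linalg.norm(u)
    l_plus = loss_fn(x + fd_eta * (v + exploration * u))
    l_minus = loss_fn(x + fd_eta * (v - exploration * u))
    v = v + prior_lr * (l_plus - l_minus) / (2 * exploration) * u
    x_adv = np.clip(x + image_lr * np.sign(v), 0.0, 1.0)  # assumes [0, 1] pixels
    return x_adv, v
```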
Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware
Title | Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware |
Authors | Peter Blouw, Xuan Choo, Eric Hunsberger, Chris Eliasmith |
Abstract | Using Intel’s Loihi neuromorphic research chip and ABR’s Nengo Deep Learning toolkit, we analyze the inference speed, dynamic power consumption, and energy cost per inference of a two-layer neural network keyword spotter trained to recognize a single phrase. We perform comparative analyses of this keyword spotter running on more conventional hardware devices including a CPU, a GPU, Nvidia’s Jetson TX1, and the Movidius Neural Compute Stick. Our results indicate that for this inference application, Loihi outperforms all of these alternatives on an energy cost per inference basis while maintaining equivalent inference accuracy. Furthermore, an analysis of tradeoffs between network size, inference speed, and energy cost indicates that Loihi’s comparative advantage over other low-power computing devices improves for larger networks. |
Tasks | Keyword Spotting |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01739v2 |
PDF | http://arxiv.org/pdf/1812.01739v2.pdf |
PWC | https://paperswithcode.com/paper/benchmarking-keyword-spotting-efficiency-on |
Repo | https://github.com/abr/power_benchmarks |
Framework | tf |
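
The headline metric here, dynamic energy cost per inference, is simple arithmetic: subtract idle power from total power, multiply by the wall-clock time of a timed batch, and divide by the number of inferences. A small sketch with hypothetical numbers (not the paper's measurements):

```python
def energy_per_inference(idle_power_w, total_power_w,
                         batch_runtime_s, n_inferences):
    """Dynamic energy cost per inference, in joules.

    Subtracting the idle baseline isolates the power actually spent
    on computation before spreading it across the batch.
    """
    dynamic_power_w = total_power_w - idle_power_w
    return dynamic_power_w * batch_runtime_s / n_inferences

# Hypothetical example values, not measurements from the paper:
print(energy_per_inference(idle_power_w=0.30, total_power_w=0.41,
                           batch_runtime_s=10.0, n_inferences=100))
```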
Applying Deep Learning To Airbnb Search
Title | Applying Deep Learning To Airbnb Search |
Authors | Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, Huizhong Duan, Qing Zhang, Nick Barrow-Williams, Bradley C. Turnbull, Brendan M. Collins, Thomas Legrand |
Abstract | The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model. The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real life product. Deep learning was steep learning for us. To other teams embarking on similar journeys, we hope an account of our struggles and triumphs will provide some useful pointers. Bon voyage! |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09591v2 |
PDF | http://arxiv.org/pdf/1810.09591v2.pdf |
PWC | https://paperswithcode.com/paper/applying-deep-learning-to-airbnb-search |
Repo | https://github.com/SachaIZADI/Misc-Machine-Learning |
Framework | tf |
Fine-grained Activity Recognition in Baseball Videos
Title | Fine-grained Activity Recognition in Baseball Videos |
Authors | AJ Piergiovanni, Michael S. Ryoo |
Abstract | In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition. |
Tasks | Action Detection, Activity Detection, Activity Recognition, Video Classification |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03247v1 |
PDF | http://arxiv.org/pdf/1804.03247v1.pdf |
PWC | https://paperswithcode.com/paper/fine-grained-activity-recognition-in-baseball |
Repo | https://github.com/piergiaj/mlb-youtube |
Framework | pytorch |
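
For the segmented-classification setting, the simplest temporal baselines the paper compares against pool per-frame CNN features over time before a linear classifier. A minimal numpy sketch of that baseline (the feature extractor and the learned classifier `w`, `b` are assumed given):

```python
import numpy as np

def classify_segment(frame_features, w, b, pooling="max"):
    """Classify a segmented clip from per-frame CNN features.

    frame_features: (T, D) array, one feature vector per frame.
    Max or mean pooling collapses time; the structure-aware models
    in the paper replace this pooling with learned temporal layers.
    """
    pooled = frame_features.max(axis=0) if pooling == "max" \
        else frame_features.mean(axis=0)
    return int(np.argmax(pooled @ w + b))
```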
Commonsense for Generative Multi-Hop Question Answering Tasks
Title | Commonsense for Generative Multi-Hop Question Answering Tasks |
Authors | Lisa Bauer, Yicheng Wang, Mohit Bansal |
Abstract | Reading comprehension QA tasks have seen a recent surge in popularity, yet most works have focused on fact-finding extractive QA. We instead focus on a more challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer. This type of multi-step reasoning also often requires understanding implicit relations, which humans resolve via external, background commonsense knowledge. We first present a strong generative baseline that uses a multi-attention mechanism to perform multiple hops of reasoning and a pointer-generator decoder to synthesize the answer. This model performs substantially better than previous generative models, and is competitive with current state-of-the-art span prediction models. We next introduce a novel system for selecting grounded multi-hop relational commonsense information from ConceptNet via a pointwise mutual information and term-frequency based scoring function. Finally, we effectively use this extracted commonsense information to fill in gaps of reasoning between context hops, using a selectively-gated attention mechanism. This boosts the model’s performance significantly (also verified via human evaluation), establishing a new state-of-the-art for the task. We also show promising initial results of the generalizability of our background knowledge enhancements by demonstrating some improvement on QAngaroo-WikiHop, another multi-hop reasoning dataset. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-09-17 |
URL | https://arxiv.org/abs/1809.06309v3 |
PDF | https://arxiv.org/pdf/1809.06309v3.pdf |
PWC | https://paperswithcode.com/paper/commonsense-for-generative-multi-hop-question |
Repo | https://github.com/yicheng-w/CommonSenseMultiHopQA |
Framework | tf |
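
The commonsense selection step scores candidate ConceptNet concepts against the context with a pointwise mutual information and term-frequency based function. The sketch below is one hedged reading of such a score; `unigram`, `cooccur`, and `n_docs` describe an assumed background corpus rather than the paper's exact statistics:

```python
import math
from collections import Counter

def score_concept(concept, context_tokens, unigram, cooccur, n_docs):
    """Score a candidate ConceptNet concept against the QA context.

    Term frequency rewards concepts that appear in the context;
    pointwise mutual information rewards concepts that co-occur
    with context words more often than chance. unigram maps
    word -> count, cooccur maps (word, word) -> count.
    """
    tf = Counter(context_tokens)[concept] / max(len(context_tokens), 1)
    pmis = []
    for w in set(context_tokens):
        p_joint = cooccur.get((concept, w), 0) / n_docs
        p_c = unigram.get(concept, 0) / n_docs
        p_w = unigram.get(w, 0) / n_docs
        if p_joint > 0 and p_c > 0 and p_w > 0:
            pmis.append(math.log(p_joint / (p_c * p_w)))
    return tf + (sum(pmis) / len(pmis) if pmis else 0.0)
```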
Visual Reasoning by Progressive Module Networks
Title | Visual Reasoning by Progressive Module Networks |
Authors | Seung Wook Kim, Makarand Tapaswi, Sanja Fidler |
Abstract | Humans learn to solve tasks of increasing complexity by building on top of previously acquired knowledge. Typically, there exists a natural progression in the tasks that we learn - most do not require completely independent solutions, but can be broken down into simpler subtasks. We propose to represent a solver for each task as a neural module that calls existing modules (solvers for simpler tasks) in a functional program-like manner. Lower modules are a black box to the calling module, and communicate only via a query and an output. Thus, a module for a new task learns to query existing modules and composes their outputs in order to produce its own output. Our model effectively combines previous skill-sets, does not suffer from forgetting, and is fully differentiable. We test our model in learning a set of visual reasoning tasks, and demonstrate improved performances in all tasks by learning progressively. By evaluating the reasoning process using human judges, we show that our model is more interpretable than an attention-based baseline. |
Tasks | Visual Reasoning |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02453v2 |
PDF | http://arxiv.org/pdf/1806.02453v2.pdf |
PWC | https://paperswithcode.com/paper/visual-reasoning-by-progressive-module |
Repo | https://github.com/seung-kim/pmn_demo |
Framework | pytorch |
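
The module-calling pattern described above, parents reaching children only through (query, output) pairs, can be captured in a few lines. This is an interface sketch with the learned query and composition functions stubbed out as placeholders:

```python
class Module:
    """A task solver that can call lower-level solvers.

    Child modules are black boxes reached only through
    (query, output) pairs; the parent forms queries and composes
    the answers. The learned parts are stubbed with placeholders.
    """
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def make_query(self, inputs, child):
        return inputs                     # placeholder for a learned query

    def compose(self, inputs, outputs):
        return {self.name: outputs}       # placeholder for learned composition

    def solve(self, inputs):
        outputs = [child.solve(self.make_query(inputs, child))
                   for child in self.children]
        return self.compose(inputs, outputs)
```

For example, `Module("vqa", [Module("count"), Module("attribute")]).solve(x)` routes `x` through both child solvers before composing their answers.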
Inferring transportation modes from GPS trajectories using a convolutional neural network
Title | Inferring transportation modes from GPS trajectories using a convolutional neural network |
Authors | Sina Dabiri, Kevin Heaslip |
Abstract | Identifying the distribution of users’ transportation modes is an essential part of travel demand analysis and transportation planning. With the advent of ubiquitous GPS-enabled devices (e.g., smartphones), a cost-effective approach for inferring commuters’ mobility modes is to leverage their GPS trajectories. A majority of studies have proposed mode inference models based on hand-crafted features and traditional machine learning algorithms. However, manual features have major drawbacks, including vulnerability to traffic and environmental conditions and the human bias involved in crafting effective features. One way to overcome these issues is to use Convolutional Neural Network (CNN) schemes that are capable of automatically deriving high-level features from the raw input. Accordingly, in this paper, we take advantage of CNN architectures to predict travel modes from raw GPS trajectories alone, where the modes are labeled as walk, bike, bus, driving, and train. Our key contribution is designing the layout of the CNN’s input in such a way that it is not only compatible with CNN schemes but also represents fundamental motion characteristics of a moving object, including speed, acceleration, jerk, and bearing rate. Furthermore, we improve the quality of the GPS logs through several data preprocessing steps. Using this clean input, a variety of CNN configurations are evaluated to find the best architecture; an ensemble of the best CNN configuration achieves the highest accuracy of 84.8%. We contrast our methodology with traditional machine learning algorithms as well as the seminal and most related studies to demonstrate the superiority of our framework. |
Tasks | |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.02386v1 |
PDF | http://arxiv.org/pdf/1804.02386v1.pdf |
PWC | https://paperswithcode.com/paper/inferring-transportation-modes-from-gps |
Repo | https://github.com/PatrickMotylinski/LBCPI-project |
Framework | none |
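
The paper's input layer stacks four per-point motion channels computed from the raw trajectory. A self-contained sketch of those channels from latitude/longitude/timestamp lists, using haversine distance and forward differences (the paper's exact preprocessing and interpolation are not reproduced here):

```python
import math

def motion_channels(lats, lons, times):
    """Speed, acceleration, jerk, and bearing rate from a GPS trace."""
    R = 6371000.0  # earth radius in metres

    def haversine(lat1, lon1, lat2, lon2):
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * R * math.asin(math.sqrt(a))

    def bearing(lat1, lon1, lat2, lon2):
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dl = math.radians(lon2 - lon1)
        y = math.sin(dl) * math.cos(p2)
        x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
        return math.degrees(math.atan2(y, x))

    dts = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    speed = [haversine(lats[i], lons[i], lats[i + 1], lons[i + 1]) / dts[i]
             for i in range(len(dts))]
    # Higher-order channels are forward differences of the one below.
    accel = [(s2 - s1) / dt for s1, s2, dt in zip(speed, speed[1:], dts[1:])]
    jerk = [(a2 - a1) / dt for a1, a2, dt in zip(accel, accel[1:], dts[2:])]
    bearings = [bearing(lats[i], lons[i], lats[i + 1], lons[i + 1])
                for i in range(len(dts))]
    brate = [abs(b2 - b1) / dt for b1, b2, dt in zip(bearings, bearings[1:], dts[1:])]
    return speed, accel, jerk, brate
```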
LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image
Title | LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image |
Authors | Chuhang Zou, Alex Colburn, Qi Shan, Derek Hoiem |
Abstract | We propose an algorithm to predict room layout from a single image that generalizes across panoramas and perspective images, cuboid layouts and more general layouts (e.g. L-shape room). Our method operates directly on the panoramic image, rather than decomposing into perspective images as do recent works. Our network architecture is similar to that of RoomNet, but we show improvements due to aligning the image based on vanishing points, predicting multiple layout elements (corners, boundaries, size and translation), and fitting a constrained Manhattan layout to the resulting predictions. Our method compares well in speed and accuracy to other existing work on panoramas, achieves among the best accuracy for perspective images, and can handle both cuboid-shaped and more general Manhattan layouts. |
Tasks | 3D Room Layouts From A Single RGB Panorama |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08999v1 |
PDF | http://arxiv.org/pdf/1803.08999v1.pdf |
PWC | https://paperswithcode.com/paper/layoutnet-reconstructing-the-3d-room-layout |
Repo | https://github.com/zouchuhang/LayoutNet |
Framework | pytorch |
STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework
Title | STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework |
Authors | Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, Haifeng Wang |
Abstract | Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences. While the conventional seq-to-seq framework is only suitable for full-sentence translation, we propose a novel prefix-to-prefix framework for simultaneous translation that implicitly learns to anticipate in a single translation model. Within this framework, we present a very simple yet surprisingly effective wait-k policy trained to generate the target sentence concurrently with the source sentence, but always k words behind. Experiments show our strategy achieves low latency and reasonable quality (compared to full-sentence translation) on 4 directions: zh<->en and de<->en. |
Tasks | |
Published | 2018-10-19 |
URL | https://arxiv.org/abs/1810.08398v5 |
PDF | https://arxiv.org/pdf/1810.08398v5.pdf |
PWC | https://paperswithcode.com/paper/stacl-simultaneous-translation-with |
Repo | https://github.com/SimulTrans-demo/STACL |
Framework | none |
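
The wait-k policy itself is a few lines of control flow around the underlying model: wait for k source words, then emit one target word per incoming source word. A minimal sketch, where `translate_step(src_prefix, tgt_prefix)` is a stand-in for the trained prefix-to-prefix model:

```python
def wait_k_decode(source_words, translate_step, k, max_extra=50):
    """Wait-k simultaneous decoding loop (prefix-to-prefix).

    After an initial wait of k source words, one target word is
    emitted per incoming source word, so the translation always
    trails the source by k words; the tail is flushed at the end.
    """
    src, tgt = [], []
    for word in source_words:
        src.append(word)
        if len(src) >= k:
            tgt.append(translate_step(src, tgt))
    # Source exhausted: finish the remaining target words.
    while (not tgt or tgt[-1] != "</s>") and len(tgt) < len(src) + max_extra:
        tgt.append(translate_step(src, tgt))
    return tgt
```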
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
Title | Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs |
Authors | Xiaolong Wang, Yufei Ye, Abhinav Gupta |
Abstract | We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, using only the word embedding of the category and its relationship to other categories for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2-3% on some metrics to a whopping 20% on a few). |
Tasks | Knowledge Graphs, Zero-Shot Learning |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08035v2 |
PDF | http://arxiv.org/pdf/1803.08035v2.pdf |
PWC | https://paperswithcode.com/paper/zero-shot-recognition-via-semantic-embeddings |
Repo | https://github.com/JudyYe/zero-shot-gcn |
Framework | tf |
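
The core computation is a stack of standard graph-convolution layers over the knowledge graph, with word embeddings as input node features and known classifier weights as regression targets. A forward-pass sketch of one Kipf-and-Welling-style layer in numpy (the paper's exact depth, normalization, and nonlinearity may differ):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(A_hat, H, W, activate=True):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W).

    A_hat: normalized KG adjacency; H: node features (word
    embeddings at the input); W: learned layer weights. Stacking
    such layers and regressing the final node features onto known
    classifier weights is the essence of the approach.
    """
    H = A_hat @ H @ W
    return np.maximum(H, 0.0) if activate else H
```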
News Session-Based Recommendations using Deep Neural Networks
Title | News Session-Based Recommendations using Deep Neural Networks |
Authors | Gabriel de Souza P. Moreira, Felipe Ferreira, Adilson Marques da Cunha |
Abstract | News recommender systems aim to personalize users’ experiences and help them discover relevant articles in a large and dynamic search space. The news domain is therefore a challenging scenario for recommendation, due to sparse user profiles, a fast-growing number of items, accelerated decay of item value, and dynamic shifts in user preferences. Promising results have recently been achieved by applying Deep Learning techniques to Recommender Systems, especially for item feature extraction and for session-based recommendations with Recurrent Neural Networks. In this paper, we propose an instantiation of CHAMELEON, a Deep Learning meta-architecture for news recommender systems. The architecture is composed of two modules: the first learns representations of news articles from their text and metadata, and the second provides session-based recommendations using Recurrent Neural Networks. The recommendation task addressed in this work is next-item prediction for user sessions: “what is the next most likely article a user might read in a session?” Session context is leveraged by the architecture to provide additional information in this extreme cold-start scenario of news recommendation, and user behavior and item features are merged in a hybrid recommendation approach. As a complementary contribution, we propose a temporal offline evaluation method for a more realistic evaluation of the task, considering dynamic factors that affect global readership interests, such as popularity, recency, and seasonality. In experiments with an extensive number of session-based recommendation methods, the proposed instantiation of the CHAMELEON meta-architecture obtained a significant relative improvement in top-n accuracy and ranking metrics (10% on Hit Rate and 13% on MRR) over the best benchmark methods. |
Tasks | Recommendation Systems, Session-Based Recommendations |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1808.00076v3 |
PDF | http://arxiv.org/pdf/1808.00076v3.pdf |
PWC | https://paperswithcode.com/paper/news-session-based-recommendations-using-deep |
Repo | https://github.com/gabrielspmoreira/chameleon_recsys |
Framework | tf |
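
The session-based half of the architecture scores candidate articles by matching an RNN-encoded session state against article content embeddings. A hedged sketch of that next-item step, with `rnn_step` and `item_embs` standing in for the trained recurrent cell and the learned article representations:

```python
def recommend_next(session_items, item_embs, rnn_step, top_n=5):
    """Rank candidate articles for the next click in a session.

    item_embs: dict mapping article id -> numpy embedding vector.
    rnn_step(state, emb) -> new state is a stand-in for the trained
    recurrent cell (it should accept state=None at session start).
    Candidates are scored by dot product between the session state
    and their content embeddings.
    """
    state = None
    for item in session_items:
        state = rnn_step(state, item_embs[item])
    scores = {item: float(state @ emb)
              for item, emb in item_embs.items()
              if item not in session_items}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```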
Stochastic algorithms with descent guarantees for ICA
Title | Stochastic algorithms with descent guarantees for ICA |
Authors | Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach |
Abstract | Independent component analysis (ICA) is a widespread data exploration technique, where observed signals are modeled as linear mixtures of independent components. From a machine learning point of view, it amounts to a matrix factorization problem with a statistical independence criterion. Infomax is one of the most used ICA algorithms. It is based on a loss function which is a non-convex log-likelihood. We develop a new majorization-minimization framework adapted to this loss function. We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits. First, unlike most algorithms found in the literature, the proposed methods do not rely on any critical hyper-parameter like a step size, nor do they require a line-search technique. Second, the algorithm for the finite sum setting, although stochastic, guarantees a decrease of the loss function at each iteration. Experiments demonstrate progress on the state-of-the-art for large scale datasets, without the necessity for any manual parameter tuning. |
Tasks | |
Published | 2018-05-25 |
URL | https://arxiv.org/abs/1805.10054v2 |
PDF | https://arxiv.org/pdf/1805.10054v2.pdf |
PWC | https://paperswithcode.com/paper/em-algorithms-for-ica |
Repo | https://github.com/pierreablin/mmica |
Framework | none |
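
The loss in question is the Infomax negative log-likelihood, which the paper's majorization-minimization algorithms decrease without any step-size tuning. A numpy sketch of that loss under the standard log-cosh source density (the optimization itself is not reproduced here):

```python
import numpy as np

def infomax_loss(W, X):
    """Infomax negative log-likelihood for ICA (up to constants).

    X: (p, n) whitened observations; W: (p, p) unmixing matrix.
    log cosh is computed via logaddexp for numerical stability.
    """
    n = X.shape[1]
    Y = W @ X
    _, logdet = np.linalg.slogdet(W)
    logcosh = np.logaddexp(Y, -Y) - np.log(2.0)
    return -logdet + logcosh.sum() / n
```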
On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network
Title | On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network |
Authors | Yuanhang Su, C. -C. Jay Kuo |
Abstract | In this work, we first analyze the memory behavior in three recurrent neural networks (RNN) cells; namely, the simple RNN (SRN), the long short-term memory (LSTM) and the gated recurrent unit (GRU), where the memory is defined as a function that maps previous elements in a sequence to the current output. Our study shows that all three of them suffer rapid memory decay. Then, to alleviate this effect, we introduce trainable scaling factors that act like an attention mechanism to adjust memory decay adaptively. The new design is called the extended LSTM (ELSTM). Finally, to design a system that is robust to previous erroneous predictions, we propose a dependent bidirectional recurrent neural network (DBRNN). Extensive experiments are conducted on different language tasks to demonstrate the superiority of the proposed ELSTM and DBRNN solutions. The ELSTM has achieved up to 30% increase in the labeled attachment score (LAS) as compared to LSTM and GRU in the dependency parsing (DP) task. Our models also outperform other state-of-the-art models such as bi-attention and convolutional sequence to sequence (convseq2seq) by close to 10% in the LAS. The code is released as open source (https://github.com/yuanhangsu/ELSTM-DBRNN). |
Tasks | Dependency Parsing |
Published | 2018-02-27 |
URL | https://arxiv.org/abs/1803.01686v5 |
PDF | https://arxiv.org/pdf/1803.01686v5.pdf |
PWC | https://paperswithcode.com/paper/on-extended-long-short-term-memory-and |
Repo | https://github.com/yuanhangsu/ELSTM-DBRNN |
Framework | tf |
GADGET SVM: A Gossip-bAseD sub-GradiEnT Solver for Linear SVMs
Title | GADGET SVM: A Gossip-bAseD sub-GradiEnT Solver for Linear SVMs |
Authors | Haimonti Dutta, Nitin Nataraj |
Abstract | In the era of big data, an important weapon in a machine learning researcher’s arsenal is a scalable Support Vector Machine (SVM) algorithm. SVMs are extensively used for solving classification problems. Traditional algorithms for learning SVMs often scale super linearly with training set size which becomes infeasible very quickly for large data sets. In recent years, scalable algorithms have been designed which study the primal or dual formulations of the problem. This often suggests a way to decompose the problem and facilitate development of distributed algorithms. In this paper, we present a distributed algorithm for learning linear Support Vector Machines in the primal form for binary classification called Gossip-bAseD sub-GradiEnT (GADGET) SVM. The algorithm is designed such that it can be executed locally on nodes of a distributed system. Each node processes its local homogeneously partitioned data and learns a primal SVM model. It then gossips with random neighbors about the classifier learnt and uses this information to update the model. Extensive theoretical and empirical results suggest that this anytime algorithm has performance comparable to its centralized and online counterparts. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.02261v1 |
PDF | http://arxiv.org/pdf/1812.02261v1.pdf |
PWC | https://paperswithcode.com/paper/gadget-svm-a-gossip-based-sub-gradient-solver |
Repo | https://github.com/nitinnat/GADGET |
Framework | none |
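
Each node alternates a local subgradient step on the hinge loss with a gossip averaging step against a random neighbor. A minimal numpy sketch of one such round (the step sizes and the neighbor-selection protocol are placeholder assumptions):

```python
import numpy as np

def gadget_round(w, X, y, neighbor_w, lam=0.01, lr=0.1):
    """One local round of a gossip-style primal SVM solver.

    X: (n, d) local data partition, y: (n,) labels in {-1, +1}.
    Take a subgradient step on the regularized hinge loss over the
    local data, then average the model with the one received from
    a randomly chosen neighbor (the gossip step).
    """
    margins = y * (X @ w)
    active = margins < 1                       # margin violators
    subgrad = lam * w
    if active.any():
        subgrad = subgrad - (y[active, None] * X[active]).mean(axis=0)
    w = w - lr * subgrad
    return 0.5 * (w + neighbor_w)
```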
Lightweight Probabilistic Deep Networks
Title | Lightweight Probabilistic Deep Networks |
Authors | Jochen Gast, Stefan Roth |
Abstract | Even though probabilistic treatments of neural networks have a long history, they have not found widespread use in practice. Sampling approaches are often too slow already for simple networks. The size of the inputs and the depth of typical CNN architectures in computer vision only compound this problem. Uncertainty in neural networks has thus been largely ignored in practice, despite the fact that it may provide important information about the reliability of predictions and the inner workings of the network. In this paper, we introduce two lightweight approaches to making supervised learning with probabilistic deep networks practical: First, we suggest probabilistic output layers for classification and regression that require only minimal changes to existing networks. Second, we employ assumed density filtering and show that activation uncertainties can be propagated in a practical fashion through the entire network, again with minor changes. Both probabilistic networks retain the predictive power of the deterministic counterpart, but yield uncertainties that correlate well with the empirical error induced by their predictions. Moreover, the robustness to adversarial examples is significantly increased. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11327v1 |
PDF | http://arxiv.org/pdf/1805.11327v1.pdf |
PWC | https://paperswithcode.com/paper/lightweight-probabilistic-deep-networks |
Repo | https://github.com/mattiasegu/uncertainty_estimation_deep_learning |
Framework | pytorch |
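
The assumed density filtering piece propagates a Gaussian mean and variance through each layer; for ReLU this has a closed form via the standard normal pdf and cdf. A sketch of that moment-matching step for a single activation (the paper's layer-by-layer wiring is not reproduced here):

```python
import math

def relu_adf(mean, var, eps=1e-12):
    """Moment-match a Gaussian activation through ReLU.

    For z ~ N(mean, var), returns the mean and variance of
    max(z, 0); assumed density filtering continues the forward
    pass with the resulting Gaussian.
    """
    std = math.sqrt(max(var, eps))
    t = mean / std
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    new_mean = mean * cdf + std * pdf
    new_var = (mean ** 2 + var) * cdf + mean * std * pdf - new_mean ** 2
    return new_mean, max(new_var, 0.0)
```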