July 28, 2019

Paper Group ANR 274

Identifying First-person Camera Wearers in Third-person Videos

Title Identifying First-person Camera Wearers in Third-person Videos
Authors Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo
Abstract We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while maximizing the distance between incorrect ones. This end-to-end approach performs significantly better than several baselines, in part by learning the first- and third-person features optimized for matching jointly with the distance measure itself.
Tasks Activity Recognition, Object Tracking, Scene Understanding
Published 2017-04-20
URL http://arxiv.org/abs/1704.06340v1
PDF http://arxiv.org/pdf/1704.06340v1.pdf
PWC https://paperswithcode.com/paper/identifying-first-person-camera-wearers-in
Repo
Framework
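
The abstract above describes a triplet loss that pulls correct first- and third-person matches together and pushes incorrect ones apart. As a minimal sketch, the standard margin-based triplet loss on plain embedding vectors looks like the following. This is the textbook form, not the paper's new loss function, and the names `euclidean`, `triplet_loss`, and `margin` are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss (assumed form, not the paper's variant).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: one first-person clip and two third-person candidates.
first_person = [0.0, 0.0]
correct_match = [0.0, 0.0]
wrong_match = [3.0, 4.0]

print(triplet_loss(first_person, correct_match, wrong_match))
print(triplet_loss(first_person, wrong_match, correct_match))
```

The first call yields no penalty because the wrong match is already far away; the second, with the roles swapped, is penalized.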

Structured Set Matching Networks for One-Shot Part Labeling

Title Structured Set Matching Networks for One-Shot Part Labeling
Authors Jonghyun Choi, Jayant Krishnamurthy, Aniruddha Kembhavi, Ali Farhadi
Abstract Diagrams often depict complex phenomena and serve as a good test bed for visual and textual reasoning. However, understanding diagrams using natural image understanding approaches requires large training datasets of diagrams, which are very hard to obtain. Instead, this can be addressed as a matching problem either between labeled diagrams, images or both. This problem is very challenging since the absence of significant color and texture renders local cues ambiguous and requires global reasoning. We consider the problem of one-shot part labeling: labeling multiple parts of an object in a target image given only a single source image of that category. For this set-to-set matching problem, we introduce the Structured Set Matching Network (SSMN), a structured prediction model that incorporates convolutional neural networks. The SSMN is trained using global normalization to maximize local match scores between corresponding elements and a global consistency score among all matched elements, while also enforcing a matching constraint between the two sets. The SSMN significantly outperforms several strong baselines on three label transfer scenarios: diagram-to-diagram, evaluated on a new diagram dataset of over 200 categories; image-to-image, evaluated on a dataset built on top of the Pascal Part Dataset; and image-to-diagram, evaluated on transferring labels across these datasets.
Tasks Structured Prediction
Published 2017-12-05
URL http://arxiv.org/abs/1712.01867v2
PDF http://arxiv.org/pdf/1712.01867v2.pdf
PWC https://paperswithcode.com/paper/structured-set-matching-networks-for-one-shot
Repo
Framework

Partial Knowledge In Embeddings

Title Partial Knowledge In Embeddings
Authors Ramanathan V. Guha
Abstract Representing domain knowledge is crucial for any task. There has been a wide range of techniques developed to represent this knowledge, from older logic based approaches to the more recent deep learning based techniques (i.e. embeddings). In this paper, we discuss some of these methods, focusing on the representational expressiveness tradeoffs that are often made. In particular, we focus on the ability of various techniques to encode 'partial knowledge', a key component of successful knowledge systems. We introduce and describe the concepts of 'ensembles of embeddings' and 'aggregate embeddings' and demonstrate how they allow for partial knowledge.
Tasks
Published 2017-10-28
URL http://arxiv.org/abs/1710.10538v1
PDF http://arxiv.org/pdf/1710.10538v1.pdf
PWC https://paperswithcode.com/paper/partial-knowledge-in-embeddings
Repo
Framework

Multitask diffusion adaptation over networks with common latent representations

Title Multitask diffusion adaptation over networks with common latent representations
Authors Jie Chen, Cédric Richard, Ali H. Sayed
Abstract Online learning with streaming data in a distributed and collaborative manner can be useful in a wide range of applications. This topic has been receiving considerable attention in recent years with emphasis on both single-task and multitask scenarios. In single-task adaptation, agents cooperate to track an objective of common interest, while in multitask adaptation agents track multiple objectives simultaneously. Regularization is one useful technique to promote and exploit similarity among tasks in the latter scenario. This work examines an alternative way to model relations among tasks by assuming that they all share a common latent feature representation. As a result, a new multitask learning formulation is presented and algorithms are developed for its solution in a distributed online manner. We present a unified framework to analyze the mean-square-error performance of the adaptive strategies, and conduct simulations to illustrate the theoretical findings and potential applications.
Tasks
Published 2017-02-13
URL http://arxiv.org/abs/1702.03614v1
PDF http://arxiv.org/pdf/1702.03614v1.pdf
PWC https://paperswithcode.com/paper/multitask-diffusion-adaptation-over-networks
Repo
Framework

Structural Embedding of Syntactic Trees for Machine Comprehension

Title Structural Embedding of Syntactic Trees for Machine Comprehension
Authors Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg
Abstract Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithm framework to utilize structured information and encode it into vector representations that can boost the performance of algorithms for machine comprehension. We evaluate our approach using a state-of-the-art neural attention model on the SQuAD dataset. Experimental results demonstrate that our model can accurately identify the syntactic boundaries of the sentences and extract answers that are more syntactically coherent than those of the baseline methods.
Tasks Question Answering, Reading Comprehension
Published 2017-03-02
URL http://arxiv.org/abs/1703.00572v3
PDF http://arxiv.org/pdf/1703.00572v3.pdf
PWC https://paperswithcode.com/paper/structural-embedding-of-syntactic-trees-for
Repo
Framework

Improving the Performance of Online Neural Transducer Models

Title Improving the Performance of Online Neural Transducer Models
Authors Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Zhifeng Chen
Abstract Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. The neural transducer (NT) is a streaming sequence-to-sequence model, but has shown a significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so the model still remains online. In addition, we explore initializing an NT model from a LAS-trained model so that it is guided with a better alignment. Finally, we explore including stronger language models such as using wordpiece models, and applying an external LM during the beam search. On a Voice Search task, we find with these improvements we can get NT to match the performance of LAS.
Tasks
Published 2017-12-05
URL http://arxiv.org/abs/1712.01807v1
PDF http://arxiv.org/pdf/1712.01807v1.pdf
PWC https://paperswithcode.com/paper/improving-the-performance-of-online-neural
Repo
Framework

Autonomous Quadrotor Landing using Deep Reinforcement Learning

Title Autonomous Quadrotor Landing using Deep Reinforcement Learning
Authors Riccardo Polvara, Massimiliano Patacchiola, Sanjay Sharma, Jian Wan, Andrew Manning, Robert Sutton, Angelo Cangelosi
Abstract Landing an unmanned aerial vehicle (UAV) on a ground marker is an open problem despite the effort of the research community. Previous attempts mostly focused on the analysis of hand-crafted geometric features and the use of external sensors in order to allow the vehicle to approach the land-pad. In this article, we propose a method based on deep reinforcement learning that only requires low-resolution images taken from a down-looking camera in order to identify the position of the marker and land the UAV on it. The proposed approach is based on a hierarchy of Deep Q-Networks (DQNs) used as high-level control policy for the navigation toward the marker. We implemented different technical solutions, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Using domain randomization we trained the vehicle on uniform textures and we tested it on a large variety of simulated and real-world environments. The overall performance is comparable with a state-of-the-art algorithm and human pilots.
Tasks
Published 2017-09-11
URL http://arxiv.org/abs/1709.03339v3
PDF http://arxiv.org/pdf/1709.03339v3.pdf
PWC https://paperswithcode.com/paper/autonomous-quadrotor-landing-using-deep
Repo
Framework
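
The abstract above mentions combining vanilla and double DQNs. The sketch below contrasts the two bootstrap targets using their textbook definitions. It does not reproduce the paper's hierarchy of networks or its partitioned replay buffer, and the function names and list-based Q-values are assumptions made for illustration.

```python
def vanilla_dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for two actions in the next state.
q_online = [1.0, 3.0]   # online network prefers action 1
q_target = [5.0, 2.0]   # target network overrates action 0

print(vanilla_dqn_target(1.0, q_target, gamma=0.5))
print(double_dqn_target(1.0, q_online, q_target, gamma=0.5))
```

Decoupling action selection from evaluation is what lets the double variant avoid bootstrapping from the target network's own overestimated maximum.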

Not all bytes are equal: Neural byte sieve for fuzzing

Title Not all bytes are equal: Neural byte sieve for fuzzing
Authors Mohit Rajpal, William Blum, Rishabh Singh
Abstract Fuzzing is a popular dynamic program analysis technique used to find vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input designed to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem and often the best approach to generating such inputs is through applying uniform random mutations to pre-existing valid inputs (seed files). We present a learning technique that uses neural networks to learn patterns in the input files from past fuzzing explorations to guide future fuzzing explorations. In particular, the neural models learn a function to predict good (and bad) locations in input files to perform fuzzing mutations based on the past mutations and corresponding code coverage information. We implement several neural models including LSTMs and sequence-to-sequence models that can encode variable length input files. We incorporate our models in the state-of-the-art AFL (American Fuzzy Lop) fuzzer and show significant improvements in terms of code coverage, unique code paths, and crashes for various input formats including ELF, PNG, PDF, and XML.
Tasks
Published 2017-11-10
URL http://arxiv.org/abs/1711.04596v1
PDF http://arxiv.org/pdf/1711.04596v1.pdf
PWC https://paperswithcode.com/paper/not-all-bytes-are-equal-neural-byte-sieve-for
Repo
Framework

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Title Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
Authors Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Joon Kyung Kim, Vikas Chandra, Hadi Esmaeilzadeh
Abstract Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power.
Tasks
Published 2017-12-05
URL http://arxiv.org/abs/1712.01507v2
PDF http://arxiv.org/pdf/1712.01507v2.pdf
PWC https://paperswithcode.com/paper/bit-fusion-bit-level-dynamically-composable
Repo
Framework
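
The abstract above describes bit-level processing elements that fuse to match a layer's bitwidth. The arithmetic idea behind such composition can be sketched in software: a wide unsigned multiply is the shifted sum of products of narrow slices. The 2-bit slice width, the unsigned-only operands, and the names `split_into_bricks` and `fused_multiply` are assumptions for illustration, not details taken from the abstract.

```python
def split_into_bricks(value, bits, brick_bits=2):
    """Split an unsigned `bits`-wide value into `brick_bits`-wide slices, low slice first.

    Assumes 0 <= value < 2**bits and that `bits` is a multiple of `brick_bits`.
    """
    mask = (1 << brick_bits) - 1
    return [(value >> shift) & mask for shift in range(0, bits, brick_bits)]


def fused_multiply(a, b, bits, brick_bits=2):
    """Multiply two unsigned values using only narrow slice-by-slice products."""
    a_bricks = split_into_bricks(a, bits, brick_bits)
    b_bricks = split_into_bricks(b, bits, brick_bits)
    total = 0
    for i, a_part in enumerate(a_bricks):
        for j, b_part in enumerate(b_bricks):
            # Each narrow product is shifted by the combined slice position.
            total += (a_part * b_part) << (brick_bits * (i + j))
    return total


print(fused_multiply(13, 11, bits=4))    # four 2-bit products
print(fused_multiply(200, 100, bits=8))  # sixteen 2-bit products
```

A 4-bit multiply needs four narrow products and an 8-bit multiply needs sixteen, which illustrates why lower-bitwidth layers leave more narrow units free for parallel work.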

An Online Learning Approach to Generative Adversarial Networks

Title An Online Learning Approach to Generative Adversarial Networks
Authors Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann, Andreas Krause
Abstract We consider the problem of training generative models with a Generative Adversarial Network (GAN). Although GANs can accurately model complex distributions, they are known to be difficult to train due to instabilities caused by a difficult minimax optimization problem. In this paper, we view the problem of training GANs as finding a mixed strategy in a zero-sum game. Building on ideas from online learning we propose a novel training method named Chekhov GAN. On the theory side, we show that our method provably converges to an equilibrium for semi-shallow GAN architectures, i.e. architectures where the discriminator is a one layer network and the generator is arbitrary. On the practical side, we develop an efficient heuristic guided by our theoretical results, which we apply to commonly used deep GAN architectures. On several real world tasks our approach exhibits improved stability and performance compared to standard GAN training.
Tasks
Published 2017-06-10
URL http://arxiv.org/abs/1706.03269v1
PDF http://arxiv.org/pdf/1706.03269v1.pdf
PWC https://paperswithcode.com/paper/an-online-learning-approach-to-generative
Repo
Framework

Collaborative Filtering using Denoising Auto-Encoders for Market Basket Data

Title Collaborative Filtering using Denoising Auto-Encoders for Market Basket Data
Authors Andres G. Abad, Luis I. Reyes-Castro
Abstract Recommender systems (RS) help users navigate large sets of items in the search for “interesting” ones. One approach to RS is Collaborative Filtering (CF), which is based on the idea that similar users are interested in similar items. Most model-based approaches to CF seek to train a machine-learning/data-mining model based on sparse data; the model is then used to provide recommendations. While most of the proposed approaches are effective for small-size situations, the combinatorial nature of the problem makes them impractical for medium-to-large instances. In this work we present a novel approach to CF that works by training a Denoising Auto-Encoder (DAE) on corrupted baskets, i.e., baskets from which one or more items have been removed. The DAE is then forced to learn to reconstruct the original basket given its corrupted input. Due to recent advancements in optimization and other technologies for training neural-network models (such as DAE), the proposed method results in a scalable and practical approach to CF. The contribution of this work is twofold: (1) to identify missing items in observed baskets and, thus, directly provide a CF model; and (2) to construct a generative model of baskets which may be used, for instance, in simulation analysis or as part of a more complex analytical method.
Tasks Denoising, Recommendation Systems
Published 2017-08-14
URL http://arxiv.org/abs/1708.04312v1
PDF http://arxiv.org/pdf/1708.04312v1.pdf
PWC https://paperswithcode.com/paper/collaborative-filtering-using-denoising-auto
Repo
Framework
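
The abstract above defines a corrupted basket as one from which one or more items have been removed. A minimal sketch of building such (corrupted input, original target) training pairs is shown below. The auto-encoder itself is omitted, and the binary-vector encoding, the item catalog, and the helper names are assumptions rather than the authors' implementation.

```python
import random


def corrupt_basket(basket, n_remove=1, rng=None):
    """Return a copy of `basket` with up to `n_remove` randomly chosen items removed."""
    rng = rng or random.Random()
    items = sorted(basket)
    removed = set(rng.sample(items, min(n_remove, len(items))))
    return [item for item in items if item not in removed]


def to_binary_vector(basket, catalog):
    """Encode a basket as a 0/1 vector over a fixed item catalog."""
    present = set(basket)
    return [1 if item in present else 0 for item in catalog]


# Hypothetical catalog and basket.
catalog = ["bread", "eggs", "milk", "tea"]
basket = ["milk", "bread", "eggs"]

corrupted = corrupt_basket(basket, n_remove=1, rng=random.Random(0))
model_input = to_binary_vector(corrupted, catalog)   # what the DAE would see
model_target = to_binary_vector(basket, catalog)     # what it should reconstruct
print(model_input, model_target)
```

Positions where the target is 1 but the input is 0 are exactly the removed items, which is what a trained model would be asked to recover as recommendations.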

Impact of Feature Selection on Micro-Text Classification

Title Impact of Feature Selection on Micro-Text Classification
Authors Ankit Vadehra, Maura R. Grossman, Gordon V. Cormack
Abstract Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of micro-text (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others. Tweets often include keywords referred to as “Hashtags” that can be used as labels for the tweet. Using tweets encompassing 50 labels, we studied the impact of word versus character-level feature selection and extraction on different learners to solve a multi-class classification task. We show that feature extraction of simple character-level groups performs better than simple word groups and pre-processing methods like normalizing using Porter’s Stemming and Part-of-Speech (“POS”)-Lemmatization.
Tasks Feature Selection, Lemmatization, Recommendation Systems, Sentiment Analysis, Text Classification
Published 2017-08-27
URL http://arxiv.org/abs/1708.08123v1
PDF http://arxiv.org/pdf/1708.08123v1.pdf
PWC https://paperswithcode.com/paper/impact-of-feature-selection-on-micro-text
Repo
Framework
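
The abstract above compares character-level groups with word groups as features. The sketch below shows one common way to extract both kinds of n-gram from a tweet. The tokenization (whitespace split, no normalization) and the function names are assumptions, since the abstract does not specify the exact feature pipeline.

```python
def char_ngrams(text, n):
    """All contiguous character n-grams of `text`, including spaces and symbols."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All contiguous word n-grams of `text`, using a plain whitespace split."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical tweet with an elongated, misspelled word.
tweet = "so gooood today"

print(char_ngrams("#nlp", 3))
print(word_ngrams(tweet, 2))
print(char_ngrams(tweet, 4)[:3])
```

Character n-grams still share pieces such as "goo" between "gooood" and "good", whereas word-level features treat the two spellings as unrelated tokens.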

KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools

Title KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools
Authors Tharindu Weerasooriya, Nandula Perera, S. R. Liyanage
Abstract Since a tweet is limited to 140 characters, it is ambiguous and difficult for traditional Natural Language Processing (NLP) tools to analyse. This research presents KeyXtract which enhances the machine learning based Stanford CoreNLP Part-of-Speech (POS) tagger with the Twitter model to extract essential keywords from a tweet. The system was developed using rule-based parsers and two corpora. The data for the research was obtained from a Twitter profile of a telecommunication company. The system development consisted of two stages. At the initial stage, a domain specific corpus was compiled after analysing the tweets. The POS tagger extracted the Noun Phrases and Verb Phrases while the parsers removed noise and extracted any other keywords missed by the POS tagger. The system was evaluated using the Turing Test. After it was tested and compared against Stanford CoreNLP, the second stage of the system was developed addressing the shortcomings of the first stage. It was enhanced using Named Entity Recognition and Lemmatization. The second stage was also tested using the Turing Test and its pass rate increased from 50.00% to 83.33%. The performance of the final system output was measured using the F1 score. Stanford CoreNLP with the Twitter model had an average F1 of 0.69 while the improved system had an F1 of 0.77. The accuracy of the system could be improved by using a complete domain specific corpus. Since the system used linguistic features of a sentence, it could be applied to other NLP tools.
Tasks Lemmatization, Named Entity Recognition
Published 2017-08-09
URL http://arxiv.org/abs/1708.02912v1
PDF http://arxiv.org/pdf/1708.02912v1.pdf
PWC https://paperswithcode.com/paper/keyxtract-twitter-model-an-essential-keywords
Repo
Framework
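
The abstract above reports performance as an F1 score. For reference, here is a minimal sketch of set-based precision, recall, and F1 for one tweet's extracted keywords. Exact-match comparison of keywords and the example keyword sets are assumptions; the abstract does not say how matches were counted.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 between predicted and gold keywords."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical keywords for a single tweet.
extracted = ["network", "outage", "today"]
reference = ["network", "outage", "colombo"]

precision, recall, f1 = precision_recall_f1(extracted, reference)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

An average F1 such as the ones quoted in the abstract would then be the mean of this per-tweet value over the evaluation set, under the assumption that averaging is done per tweet.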

Gradient-free Policy Architecture Search and Adaptation

Title Gradient-free Policy Architecture Search and Adaptation
Authors Sayna Ebrahimi, Anna Rohrbach, Trevor Darrell
Abstract We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.
Tasks Autonomous Driving, Neural Architecture Search
Published 2017-10-16
URL http://arxiv.org/abs/1710.05958v1
PDF http://arxiv.org/pdf/1710.05958v1.pdf
PWC https://paperswithcode.com/paper/gradient-free-policy-architecture-search-and
Repo
Framework

View Selection with Geometric Uncertainty Modeling

Title View Selection with Geometric Uncertainty Modeling
Authors Cheng Peng, Volkan Isler
Abstract Estimating positions of world points from features observed in images is a key problem in 3D reconstruction, image mosaicking, simultaneous localization and mapping and structure from motion. We consider a special instance in which there is a dominant ground plane $\mathcal{G}$ viewed from a parallel viewing plane $\mathcal{S}$ above it. Such instances commonly arise, for example, in aerial photography. Consider a world point $g \in \mathcal{G}$ and its worst case reconstruction uncertainty $\varepsilon(g,\mathcal{S})$ obtained by merging all possible views of $g$ chosen from $\mathcal{S}$. We first show that one can pick two views $s_p$ and $s_q$ such that the uncertainty $\varepsilon(g,\{s_p,s_q\})$ obtained using only these two views is almost as good as (i.e. within a small constant factor of) $\varepsilon(g,\mathcal{S})$. Next, we extend the result to the entire ground plane $\mathcal{G}$ and show that one can pick a small subset $\mathcal{S}' \subseteq \mathcal{S}$ (which grows only linearly with the area of $\mathcal{G}$) and still obtain a constant factor approximation, for every point $g \in \mathcal{G}$, to the minimum worst case estimate obtained by merging all views in $\mathcal{S}$. Finally, we present a multi-resolution view selection method which extends our techniques to non-planar scenes. We show that the method can produce rich and accurate dense reconstructions with a small number of views. Our results provide a view selection mechanism with provable performance guarantees which can drastically increase the speed of scene reconstruction algorithms. In addition to theoretical results, we demonstrate their effectiveness in an application where aerial imagery is used for monitoring farms and orchards.
Tasks 3D Reconstruction, Simultaneous Localization and Mapping
Published 2017-03-31
URL http://arxiv.org/abs/1704.00085v2
PDF http://arxiv.org/pdf/1704.00085v2.pdf
PWC https://paperswithcode.com/paper/view-selection-with-geometric-uncertainty
Repo
Framework