July 27, 2019

Paper Group ANR 477

HoME: a Household Multimodal Environment


Title	HoME: a Household Multimodal Environment
Authors	Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville
Abstract	We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
Tasks
Published	2017-11-29
URL	http://arxiv.org/abs/1711.11017v1
PDF	http://arxiv.org/pdf/1711.11017v1.pdf
PWC	https://paperswithcode.com/paper/home-a-household-multimodal-environment
Repo
Framework
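The abstract says HoME is OpenAI Gym-compatible. In practice, that means an agent interacts with it only through `reset()` and `step(action)`. The sketch below is a hypothetical toy environment, not HoME's API. It shows that contract using the classic Gym convention, where `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`. The modality keys (`vision`, `audio`, `semantics`), the actions and the reward are all illustrative assumptions. Newer Gym/Gymnasium releases return `(obs, info)` from `reset()` and a five-tuple from `step()`.

```python
import random
from typing import Any, Dict, Tuple

Observation = Dict[str, Any]


class ToyHouseEnv:
    """Hypothetical multimodal environment that follows the classic Gym API.

    This is NOT the HoME API: the modality keys, actions and reward are
    illustrative assumptions used only to show the reset/step contract.
    """

    ACTIONS = ("forward", "turn_left", "turn_right")

    def __init__(self, goal_distance: int = 3, seed: int = 0) -> None:
        self.goal_distance = goal_distance
        self.rng = random.Random(seed)
        self.position = 0

    def _observe(self) -> Observation:
        # Placeholder modalities; a real simulator would render images,
        # spatialised audio and semantic annotations from a 3D house.
        return {
            "vision": [[0.0] * 4 for _ in range(4)],
            "audio": [self.rng.uniform(-1.0, 1.0) for _ in range(8)],
            "semantics": {"distance_to_goal": self.goal_distance - self.position},
        }

    def reset(self) -> Observation:
        self.position = 0
        return self._observe()

    def step(self, action: int) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        if self.ACTIONS[action] == "forward":
            self.position += 1
        done = self.position >= self.goal_distance
        reward = 1.0 if done else -0.01  # step penalty, bonus on reaching the goal
        return self._observe(), reward, done, {"action_name": self.ACTIONS[action]}


# Standard agent loop: any Gym-style agent code can drive the environment.
env = ToyHouseEnv()
obs = env.reset()
done, total_reward, steps = False, 0.0, 0
while not done:
    obs, reward, done, info = env.step(0)  # always move forward
    total_reward += reward
    steps += 1
```

Because the agent only sees dictionaries of modalities, adding a new sense (for example, physics state) only changes `_observe`, not the agent loop.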

Sharp Bounds for Generalized Uniformity Testing


Title	Sharp Bounds for Generalized Uniformity Testing
Authors	Ilias Diakonikolas, Daniel M. Kane, Alistair Stewart
Abstract	We study the problem of generalized uniformity testing [BC17] of a discrete probability distribution: given samples from a probability distribution $p$ over an unknown discrete domain $\mathbf{\Omega}$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some subset of $\mathbf{\Omega}$ and the case that $p$ is $\epsilon$-far, in total variation distance, from any such uniform distribution. We establish tight bounds on the sample complexity of generalized uniformity testing. In more detail, we present a computationally efficient tester whose sample complexity is optimal, up to constant factors, and a matching information-theoretic lower bound. Specifically, we show that the sample complexity of generalized uniformity testing is $\Theta\left(1/(\epsilon^{4/3}\|p\|_3) + 1/(\epsilon^{2} \|p\|_2) \right)$.
Tasks
Published	2017-09-07
URL	http://arxiv.org/abs/1709.02087v1
PDF	http://arxiv.org/pdf/1709.02087v1.pdf
PWC	https://paperswithcode.com/paper/sharp-bounds-for-generalized-uniformity
Repo
Framework
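To see how the two terms of the bound trade off, the helper below evaluates $1/(\epsilon^{4/3}\|p\|_3) + 1/(\epsilon^{2}\|p\|_2)$ for a given $p$ and $\epsilon$. The bound is stated only up to constant factors, so the value is an order of magnitude, not a sample count. The function names are hypothetical. For $p$ uniform on $n$ elements, $\|p\|_3 = n^{-2/3}$ and $\|p\|_2 = n^{-1/2}$, so the expression becomes $n^{2/3}\epsilon^{-4/3} + n^{1/2}\epsilon^{-2}$.

```python
import math
from typing import Sequence


def lp_norm(p: Sequence[float], r: float) -> float:
    """The l_r norm (sum_i p_i^r)^(1/r) of a probability vector."""
    return sum(x ** r for x in p) ** (1.0 / r)


def generalized_uniformity_rate(p: Sequence[float], eps: float) -> float:
    """Order of the sample complexity from the abstract, constants dropped:
    1/(eps^{4/3} ||p||_3) + 1/(eps^2 ||p||_2)."""
    return 1.0 / (eps ** (4.0 / 3.0) * lp_norm(p, 3)) + 1.0 / (eps ** 2 * lp_norm(p, 2))


# Worked example: p uniform on n = 1000 elements, eps = 0.1.
n, eps = 1000, 0.1
uniform = [1.0 / n] * n
rate = generalized_uniformity_rate(uniform, eps)
closed_form = n ** (2 / 3) * eps ** (-4 / 3) + n ** 0.5 * eps ** (-2)
```

For uniform $p$, the first term dominates exactly when $n^{1/6}\epsilon^{2/3} > 1$, i.e. $n > \epsilon^{-4}$. In the example, $n = 1000 < 10^4$, so the $\|p\|_2$ term is the larger one.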

Learning Interpretable Spatial Operations in a Rich 3D Blocks World


Title	Learning Interpretable Spatial Operations in a Rich 3D Blocks World
Authors	Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu
Abstract	In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world. We first introduce a new dataset that pairs complex 3D spatial operations with rich natural language descriptions that require complex spatial and pragmatic interpretations such as “mirroring”, “twisting”, and “balancing”. This dataset, built on the simulation environment of Bisk, Yuret, and Marcu (2016), attains language that is significantly richer and more complex, while also doubling the size of the original dataset in the 2D environment with 100 new world configurations and 250,000 tokens. In addition, we propose a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations (Figure 5).
Tasks
Published	2017-12-10
URL	http://arxiv.org/abs/1712.03463v2
PDF	http://arxiv.org/pdf/1712.03463v2.pdf
PWC	https://paperswithcode.com/paper/learning-interpretable-spatial-operations-in
Repo
Framework

Autonomous Braking System via Deep Reinforcement Learning


Title	Autonomous Braking System via Deep Reinforcement Learning
Authors	Hyunmin Chae, Chang Mook Kang, ByeoungDo Kim, Jaekyum Kim, Chung Choo Chung, Jun Won Choi
Abstract	In this paper, we propose a new autonomous braking system based on deep reinforcement learning. The proposed system automatically decides whether to apply the brake at each time step when confronting the risk of collision, using information about the obstacle obtained from the sensors. The brake-control design problem is formulated as a search for the optimal policy in a Markov decision process (MDP) model, where the state is given by the relative position of the obstacle and the vehicle’s speed, and the action space is whether the brake is applied or not. The brake-control policy is learned through computer simulations using the deep reinforcement learning method called deep Q-network (DQN). To derive a desirable braking policy, we propose a reward function that balances the damage imposed on the obstacle in case of an accident against the reward achieved when the vehicle escapes the risk as soon as possible. DQN is trained for the scenario where a vehicle encounters a pedestrian crossing an urban road. Experiments show that the control agent exhibits desirable control behavior and avoids collisions without any mistake in various uncertain environments.
Tasks
Published	2017-02-08
URL	http://arxiv.org/abs/1702.02302v2
PDF	http://arxiv.org/pdf/1702.02302v2.pdf
PWC	https://paperswithcode.com/paper/autonomous-braking-system-via-deep
Repo
Framework
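The MDP described above has a two-dimensional state (relative position of the obstacle, vehicle speed) and a binary action (brake or not). The sketch below shows the standard DQN regression target and greedy action selection for that setting. It does not reproduce the authors' code. `hypothetical_reward` is an illustrative stand-in for the paper's reward, which trades accident damage against escaping the risk early; its exact form is not given in the abstract. `stand_in_q` is a made-up placeholder for a trained Q-network.

```python
from typing import Callable, Sequence, Tuple

# State as described in the abstract: (relative position of the obstacle in m,
# vehicle speed in m/s). Actions: 0 = do not brake, 1 = brake.
State = Tuple[float, float]
NO_BRAKE, BRAKE = 0, 1


def hypothetical_reward(next_state: State, collided: bool, escaped: bool, t: int,
                        alpha: float = 1.0, beta: float = 10.0) -> float:
    """Illustrative reward (NOT the paper's): penalise a collision by the squared
    impact speed and reward escaping the risk zone, more so when it happens early."""
    _, speed = next_state
    if collided:
        return -alpha * speed ** 2
    if escaped:
        return beta / (1 + t)
    return 0.0


def dqn_target(reward: float, next_state: State, terminal: bool,
               q_target: Callable[[State], Sequence[float]], gamma: float = 0.99) -> float:
    """Standard DQN regression target: r if the episode ended, else r + gamma * max_a Q'(s', a)."""
    if terminal:
        return reward
    return reward + gamma * max(q_target(next_state))


def greedy_action(q_online: Callable[[State], Sequence[float]], state: State) -> int:
    """Pick the action with the largest Q-value (brake or not)."""
    q_values = q_online(state)
    return max(range(len(q_values)), key=lambda a: q_values[a])


def stand_in_q(state: State) -> Sequence[float]:
    """Hypothetical stand-in for a trained network: prefer braking when close."""
    distance, speed = state
    return [0.5, 2.0] if distance < 2.0 * speed else [1.0, 0.0]
```

During training, the online network's prediction $Q(s, a)$ for the action actually taken would be regressed toward `dqn_target(...)`. The target network supplying `q_target` is typically a periodically updated copy of the online network.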

Mosquito Detection with Neural Networks: The Buzz of Deep Learning


Title	Mosquito Detection with Neural Networks: The Buzz of Deep Learning
Authors	Ivan Kiskin, Bernardo Pérez Orozco, Theo Windebank, Davide Zilli, Marianne Sinka, Kathy Willis, Stephen Roberts
Abstract	Many real-world time-series analysis problems are characterised by scarce data. Solutions typically rely on hand-crafted features extracted from the time or frequency domain allied with classification or regression engines which condition on this (often low-dimensional) feature vector. The huge advances enjoyed by many application domains in recent years have been fuelled by the use of deep learning architectures trained on large data sets. This paper presents an application of deep learning for acoustic event detection in a challenging, data-scarce, real-world problem. Our candidate challenge is to accurately detect the presence of a mosquito from its acoustic signature. We develop convolutional neural networks (CNNs) operating on wavelet transformations of audio recordings. Furthermore, we interrogate the network’s predictive power by visualising statistics of network-excitatory samples. These visualisations offer a deep insight into the relative informativeness of components in the detection problem. We include comparisons with conventional classifiers, conditioned on both hand-tuned and generic features, to stress the strength of automatic deep feature learning. Detection is achieved with performance metrics significantly surpassing those of existing algorithmic methods, as well as marginally exceeding those attained by individual human experts.
Tasks	Time Series, Time Series Analysis
Published	2017-05-15
URL	http://arxiv.org/abs/1705.05180v1
PDF	http://arxiv.org/pdf/1705.05180v1.pdf
PWC	https://paperswithcode.com/paper/mosquito-detection-with-neural-networks-the
Repo
Framework
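The CNNs in this paper operate on wavelet transformations of audio rather than on raw waveforms. The abstract does not name the wavelet, so the sketch below assumes a complex Morlet wavelet with $\omega_0 = 6$. It computes a magnitude scalogram by direct convolution, producing a (scales x time) array that can be treated as a one-channel image for a 2D CNN. The sample rate and tone frequency in the usage are hypothetical. For a Morlet wavelet, a tone of frequency $f$ responds most strongly near scale $s \approx \omega_0 f_s / (2\pi f)$.

```python
import numpy as np


def morlet_scalogram(x: np.ndarray, scales: np.ndarray, w0: float = 6.0) -> np.ndarray:
    """Magnitude of a continuous wavelet transform with a complex Morlet wavelet.

    Returns an array of shape (len(scales), len(x)) that can be treated as a
    single-channel image and fed to a 2D CNN.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))  # truncate the Gaussian envelope at +/- 4 scales
        if 2 * half + 1 > len(x):
            raise ValueError("scale too large for the signal length")
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        # Cross-correlating with psi equals convolving with its reversed conjugate.
        out[i] = np.abs(np.convolve(x, np.conj(psi[::-1]), mode="same"))
    return out


fs, f = 8000, 600.0  # hypothetical sample rate and tone frequency (Hz)
n = np.arange(4096)
x = np.sin(2 * np.pi * f * n / fs)
scales = np.arange(2, 40)
S = morlet_scalogram(x, scales)
peak_scale = scales[S.mean(axis=1).argmax()]
expected = 6.0 * fs / (2 * np.pi * f)  # centre-frequency relation for w0 = 6
```

Direct convolution keeps the sketch dependency-free. For long recordings, an FFT-based implementation would be much faster.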

Human Action Recognition System using Good Features and Multilayer Perceptron Network


Title	Human Action Recognition System using Good Features and Multilayer Perceptron Network
Authors	Jonti Talukdar, Bhavana Mehta
Abstract	Human action recognition involves the characterization of human actions through the automated analysis of video data and is integral to the development of smart computer vision systems. However, challenges such as dynamic backgrounds, camera stabilization, complex actions, and occlusions make real-time, robust action recognition difficult. Several complex approaches exist, but they are computationally intensive. This paper presents a novel approach that combines good features with an iterative optical flow algorithm to compute feature vectors, which are classified using a multilayer perceptron (MLP) network. Using multiple features as motion descriptors enhances the quality of tracking. The resilient backpropagation algorithm is used to train the feedforward neural network, reducing the learning time. The overall system accuracy is improved by optimizing the various parameters of the multilayer perceptron network.
Tasks	Optical Flow Estimation, Temporal Action Localization
Published	2017-08-22
URL	http://arxiv.org/abs/1708.06794v1
PDF	http://arxiv.org/pdf/1708.06794v1.pdf
PWC	https://paperswithcode.com/paper/human-action-recognition-system-using-good
Repo
Framework
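Resilient backpropagation (Rprop) is what the abstract credits with reducing the learning time. It ignores gradient magnitudes and adapts a separate step size per weight from the sign of the gradient. The abstract does not say which Rprop variant was used, so the sketch below assumes the common iRprop- variant. Its default hyperparameters (growth 1.2, shrink 0.5, initial step 0.1) are conventional choices, not values from the paper. The usage minimises a toy quadratic instead of an MLP loss.

```python
import numpy as np


class IRpropMinus:
    """Resilient backpropagation, iRprop- variant (Igel and Huesken).

    Only the sign of each partial derivative is used: a per-weight step size
    grows while the sign stays the same and shrinks when it flips.
    """

    def __init__(self, shape, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                 delta_min=1e-6, delta_max=50.0):
        self.delta = np.full(shape, delta0, dtype=float)
        self.prev_grad = np.zeros(shape)
        self.eta_plus, self.eta_minus = eta_plus, eta_minus
        self.delta_min, self.delta_max = delta_min, delta_max

    def step(self, w: np.ndarray, grad: np.ndarray) -> np.ndarray:
        grad = np.array(grad, dtype=float)  # copy: we may zero entries below
        agreement = grad * self.prev_grad
        grew = agreement > 0
        flipped = agreement < 0
        self.delta[grew] = np.minimum(self.delta[grew] * self.eta_plus, self.delta_max)
        self.delta[flipped] = np.maximum(self.delta[flipped] * self.eta_minus, self.delta_min)
        grad[flipped] = 0.0  # iRprop-: skip this weight's update after a sign change
        self.prev_grad = grad
        return w - np.sign(grad) * self.delta


# Toy usage: minimise ||w - target||^2, whose gradient is 2 (w - target).
target = np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
opt = IRpropMinus(w.shape)
for _ in range(500):
    w = opt.step(w, 2 * (w - target))
```

Because only gradient signs matter, Rprop is insensitive to badly scaled gradients. This is one reason it tends to need fewer epochs than plain gradient descent on small feedforward networks.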

Building a Neural Machine Translation System Using Only Synthetic Parallel Data


Title	Building a Neural Machine Translation System Using Only Synthetic Parallel Data
Authors	Jaehong Park, Jongyoon Song, Sungroh Yoon
Abstract	Recent works have shown that synthetic parallel data automatically generated by translation models can be effective for various neural machine translation (NMT) issues. In this study, we build NMT systems using only synthetic parallel data. As an efficient alternative to real parallel data, we also present a new type of synthetic parallel corpus. The proposed pseudo parallel data differ from those of previous works in that ground-truth and synthetic examples are mixed on both sides of the sentence pairs. Experiments on Czech-German and French-German translations demonstrate the efficacy of the proposed pseudo parallel corpus, which shows not only enhanced results for bidirectional translation tasks but also substantial improvement with the aid of a ground-truth real parallel corpus.
Tasks	Machine Translation
Published	2017-04-02
URL	http://arxiv.org/abs/1704.00253v4
PDF	http://arxiv.org/pdf/1704.00253v4.pdf
PWC	https://paperswithcode.com/paper/building-a-neural-machine-translation-system
Repo
Framework
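One reading of "ground truth and synthetic examples are mixed on both sides" is to combine two kinds of pairs. Source-originated pairs have a real source and a machine-translated target. Target-originated (back-translated) pairs have a machine-translated source and a real target. The sketch below builds such a mixed corpus under that assumption. It is an interpretation of the abstract, not the authors' exact procedure. The translator callables are hypothetical stand-ins for trained NMT models, and the usage uses trivial string functions purely for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def build_mixed_pseudo_corpus(mono_src: Sequence[str], mono_tgt: Sequence[str],
                              translate_src2tgt: Callable[[str], str],
                              translate_tgt2src: Callable[[str], str],
                              seed: int = 0) -> List[Pair]:
    """Mix two kinds of synthetic pairs so that real and synthetic text
    appear on both the source side and the target side of the corpus."""
    # Real source sentence paired with a synthetic (forward-translated) target.
    src_originated = [(s, translate_src2tgt(s)) for s in mono_src]
    # Synthetic (back-translated) source paired with a real target sentence.
    tgt_originated = [(translate_tgt2src(t), t) for t in mono_tgt]
    corpus = src_originated + tgt_originated
    random.Random(seed).shuffle(corpus)  # interleave the two kinds for training
    return corpus


def fake_cs2de(sentence: str) -> str:
    """Hypothetical stand-in for a Czech->German model."""
    return sentence.upper()


def fake_de2cs(sentence: str) -> str:
    """Hypothetical stand-in for a German->Czech model."""
    return sentence[::-1]


corpus = build_mixed_pseudo_corpus(["ahoj"], ["hallo welt"], fake_cs2de, fake_de2cs)
```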

Predicting the Gender of Indonesian Names


Title	Predicting the Gender of Indonesian Names
Authors	Ali Akbar Septiandri
Abstract	We investigated a way to predict the gender of a name using character-level Long Short-Term Memory (char-LSTM). We compared our method with some conventional machine learning methods, namely Naive Bayes, logistic regression, and XGBoost with n-grams as the features. We evaluated the models on a dataset consisting of the names of Indonesian people. It is not common to use a family name as the surname in Indonesian culture, except in some ethnicities. Therefore, we inferred the gender from both full names and first names. The results show that we can achieve 92.25% accuracy from full names, while using first names only yields 90.65% accuracy. These results are better than those obtained by applying the classical machine learning algorithms to n-grams.
Tasks
Published	2017-07-22
URL	http://arxiv.org/abs/1707.07129v2
PDF	http://arxiv.org/pdf/1707.07129v2.pdf
PWC	https://paperswithcode.com/paper/predicting-the-gender-of-indonesian-names
Repo
Framework
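The conventional baselines (Naive Bayes, logistic regression, XGBoost) take character n-grams as features, computed either from the full name or from the first name only. The sketch below shows one plausible way to extract such features. It uses boundary markers `^` and `$`, so prefixes and suffixes are distinguishable. The n-gram orders (1 to 3) and the helper names are assumptions, not the paper's settings. The resulting `Counter` can be fed to any bag-of-features classifier.

```python
from collections import Counter
from typing import Dict, Iterable


def char_ngrams(name: str, n_values: Iterable[int] = (1, 2, 3)) -> Counter:
    """Bag of character n-grams with '^' and '$' marking the name boundaries."""
    padded = "^" + " ".join(name.lower().split()) + "$"
    counts: Counter = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def name_views(full_name: str) -> Dict[str, str]:
    """The two inputs compared in the abstract: the full name and the first name only."""
    tokens = full_name.split()
    return {"full": " ".join(tokens), "first": tokens[0] if tokens else ""}


views = name_views("  Budi   Santoso ")
full_features = char_ngrams(views["full"])
first_features = char_ngrams(views["first"])
```

The boundary markers matter for names because suffixes in particular can carry gender cues. Without them, a bigram at the end of a name would be indistinguishable from the same bigram in the middle.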

Decoding Sentiment from Distributed Representations of Sentences


Title	Decoding Sentiment from Distributed Representations of Sentences
Authors	Edoardo Maria Ponti, Ivan Vulić, Anna Korhonen
Abstract	Distributed representations of sentences have been developed recently to represent their meaning as real-valued vectors. However, it is not clear how much information such representations retain about the polarity of sentences. To study this question, we decode sentiment from unsupervised sentence representations learned with different architectures (sensitive to the order of words, the order of sentences, or none) in 9 typologically diverse languages. Sentiment results from the (recursive) composition of lexical items and grammatical strategies such as negation and concession. The results are manifold: we show that there is no ‘one-size-fits-all’ representation architecture outperforming the others across the board. Rather, the top-ranking architectures depend on the language and data at hand. Moreover, we find that in several cases the additive composition model based on skip-gram word vectors may surpass supervised state-of-the-art architectures such as bidirectional LSTMs. Finally, we provide a possible explanation of the observed variation based on the type of negative constructions in each language.
Tasks
Published	2017-05-17
URL	http://arxiv.org/abs/1705.06369v3
PDF	http://arxiv.org/pdf/1705.06369v3.pdf
PWC	https://paperswithcode.com/paper/decoding-sentiment-from-distributed
Repo
Framework
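The additive composition model mentioned above represents a sentence as the sum of its word vectors. The sketch below pairs that representation with a simple logistic-regression decoder for binary polarity, fitted by batch gradient descent. The decoder choice, the hyperparameters and the two-dimensional toy vectors are illustrative assumptions, not the paper's setup; the paper uses skip-gram vectors. The last line shows a property relevant to the negation discussion: an order-insensitive sum gives "not good" and "good not" the same vector.

```python
import numpy as np
from typing import Dict, Sequence, Tuple


def additive_sentence_vector(tokens: Sequence[str], word_vectors: Dict[str, np.ndarray],
                             dim: int) -> np.ndarray:
    """Additive composition: sum the vectors of in-vocabulary tokens."""
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in word_vectors:
            vec += word_vectors[tok]
    return vec


def train_logistic_decoder(X: np.ndarray, y: np.ndarray, lr: float = 0.5,
                           epochs: int = 200) -> Tuple[np.ndarray, float]:
    """Binary polarity decoder fitted by batch gradient descent on the log loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def predict_positive(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Probability that the sentence vector x has positive polarity."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))


# Hypothetical 2-d "word vectors", for illustration only.
vecs = {"good": np.array([1.0, 0.0]), "bad": np.array([-1.0, 0.0]),
        "movie": np.array([0.0, 1.0]), "not": np.array([0.0, -0.5])}
X = np.stack([additive_sentence_vector(s, vecs, 2)
              for s in (["good", "movie"], ["bad", "movie"])])
y = np.array([1.0, 0.0])
w, b = train_logistic_decoder(X, y)
order_blind = np.array_equal(additive_sentence_vector(["not", "good"], vecs, 2),
                             additive_sentence_vector(["good", "not"], vecs, 2))
```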

MAVOT: Memory-Augmented Video Object Tracking


Title	MAVOT: Memory-Augmented Video Object Tracking
Authors	Boyu Liu, Yanzhao Wang, Yu-Wing Tai, Chi-Keung Tang
Abstract	We introduce a one-shot learning approach for video object tracking. The proposed algorithm needs to see the object to be tracked only once, and employs an external memory to store and remember the evolving features of the foreground object as well as the background over time during tracking. With the relevant memory retrieved and updated at each tracking step, our tracking model is capable of maintaining long-term memory of the object, and thus can naturally deal with hard tracking scenarios including partial and total occlusion, motion changes, and large scale and shape variations. In our experiments we use the ImageNet ILSVRC2015 video detection dataset to train and the VOT-2016 benchmark to test and compare our Memory-Augmented Video Object Tracking (MAVOT) model. From the results, we conclude that, given its one-shot property and simplicity of design, MAVOT is an attractive approach to visual tracking: it shows good performance on the VOT-2016 benchmark and is among the top 5 performers in accuracy and robustness under occlusion, motion changes and empty target.
Tasks	Object Tracking, One-Shot Learning, Video Object Tracking, Visual Tracking
Published	2017-11-26
URL	http://arxiv.org/abs/1711.09414v1
PDF	http://arxiv.org/pdf/1711.09414v1.pdf
PWC	https://paperswithcode.com/paper/mavot-memory-augmented-video-object-tracking
Repo
Framework
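The abstract describes an external memory that is retrieved from and updated at each tracking step. It does not specify the addressing scheme. The sketch below therefore assumes a generic content-based memory, as in Neural Turing Machine-style models. Reads use softmax-weighted cosine similarity between a query feature and the stored slots. Writes blend the new feature into slots in proportion to those weights. `sharpness` and `rate` are hypothetical parameters, and this is not MAVOT's published architecture.

```python
import numpy as np


def cosine_similarity(memory: np.ndarray, query: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity between each memory slot (row) and the query vector."""
    return memory @ query / (np.linalg.norm(memory, axis=1) * np.linalg.norm(query) + eps)


def memory_read(memory: np.ndarray, query: np.ndarray, sharpness: float = 10.0):
    """Content-based read: softmax over scaled similarities, then a weighted sum of slots."""
    scores = sharpness * cosine_similarity(memory, query)
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ memory, weights


def memory_write(memory: np.ndarray, weights: np.ndarray, new_feature: np.ndarray,
                 rate: float = 0.5) -> np.ndarray:
    """Move each slot toward the new feature in proportion to its read weight."""
    return memory + rate * weights[:, None] * (new_feature[None, :] - memory)


# Toy usage: two slots (e.g. a foreground and a background appearance).
memory = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([0.9, 0.1])         # current frame's feature, closer to slot 0
read, weights = memory_read(memory, query)
new_feature = np.array([0.8, 0.6])   # hypothetical updated appearance
updated = memory_write(memory, weights, new_feature)
```

Slots that match the current frame are updated most, so the memory can follow gradual appearance changes. Slots that stored other appearances stay largely untouched, which is the kind of behavior that could help after occlusions.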

Body Joint guided 3D Deep Convolutional Descriptors for Action Recognition


Title	Body Joint guided 3D Deep Convolutional Descriptors for Action Recognition
Authors	Congqi Cao, Yifan Zhang, Chunjie Zhang, Hanqing Lu
Abstract	Three-dimensional convolutional neural networks (3D CNNs) have been established as a powerful tool to simultaneously learn features from both spatial and temporal dimensions, which makes them suitable for video-based action recognition. In this work, we propose not to directly use the activations of the fully-connected layers of a 3D CNN as the video feature, but to use selected convolutional layer activations to form a discriminative descriptor for video. The descriptor pools features on the convolutional layers under the guidance of body joint positions. Two schemes for mapping body joints into convolutional feature maps for pooling are discussed. The body joint positions can be obtained from any off-the-shelf skeleton estimation algorithm. The benefit of body joint guided feature pooling under inaccurate skeleton estimation is systematically evaluated. To make the method end-to-end and independent of any sophisticated body joint detection algorithm, we further propose a two-stream bilinear model which can learn the guidance from the body joints and capture the spatio-temporal features simultaneously. In this model, body joint guided feature pooling is conveniently formulated as a bilinear product operation. Experimental results on three real-world datasets demonstrate the effectiveness of body joint guided pooling, which achieves promising performance.
Tasks	Temporal Action Localization
Published	2017-04-24
URL	http://arxiv.org/abs/1704.07160v2
PDF	http://arxiv.org/pdf/1704.07160v2.pdf
PWC	https://paperswithcode.com/paper/body-joint-guided-3d-deep-convolutional
Repo
Framework
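The abstract describes two ideas: pooling convolutional features at body joint positions, and expressing that pooling as a bilinear product. The sketch below shows both for a single frame. The paper's feature maps are 3D (spatio-temporal), and this sketch drops the temporal axis for brevity. Joints are mapped to feature-map cells by simple coordinate scaling, which is only one possible scheme; the paper discusses two that the abstract does not name. Stacking one-hot joint heatmaps into a matrix and multiplying by the flattened feature map gives the same result as direct indexing. This is why soft, learned heatmaps can replace detected joints in an end-to-end model.

```python
import numpy as np


def joints_to_cells(joints_xy: np.ndarray, image_hw, feat_hw):
    """Map pixel coordinates (x, y) to (row, col) indices on a feature map by scaling."""
    img_h, img_w = image_hw
    feat_h, feat_w = feat_hw
    rows = np.clip((joints_xy[:, 1] * feat_h / img_h).astype(int), 0, feat_h - 1)
    cols = np.clip((joints_xy[:, 0] * feat_w / img_w).astype(int), 0, feat_w - 1)
    return rows, cols


def joint_guided_pool(features: np.ndarray, joints_xy: np.ndarray, image_hw) -> np.ndarray:
    """features: (H, W, C) conv activations of one frame; returns one C-vector per joint."""
    rows, cols = joints_to_cells(joints_xy, image_hw, features.shape[:2])
    return features[rows, cols]


def joint_guided_pool_bilinear(features: np.ndarray, heatmaps: np.ndarray) -> np.ndarray:
    """Same pooling as a bilinear product: heatmaps (J, H, W) times features (H, W, C)."""
    j = heatmaps.shape[0]
    h, w, c = features.shape
    return heatmaps.reshape(j, h * w) @ features.reshape(h * w, c)


# Toy usage: a 7x7x4 feature map from a hypothetical 112x112 frame, two joints.
features = np.random.default_rng(0).normal(size=(7, 7, 4))
joints = np.array([[10.0, 20.0], [100.0, 50.0]])  # (x, y) in pixels
pooled = joint_guided_pool(features, joints, (112, 112))

rows, cols = joints_to_cells(joints, (112, 112), (7, 7))
heatmaps = np.zeros((2, 7, 7))
heatmaps[np.arange(2), rows, cols] = 1.0  # one-hot heatmap per joint
pooled_bilinear = joint_guided_pool_bilinear(features, heatmaps)
```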

Camera Calibration for Daylight Specular-Point Locus


Title	Camera Calibration for Daylight Specular-Point Locus
Authors	Mark S. Drew, Hamid Reza Vaezi Joze, Graham D. Finlayson
Abstract	In this paper we present a new camera calibration method aimed at finding a straight-line locus, in a special colour feature space, that is traversed by daylights and also approximately followed by specular points. The aim of the calibration is to enable recovering the colour of the illuminant in a scene, using the calibrated camera. First we prove theoretically that any candidate specular points, for an image that is generated by a specific camera and taken under a daylight, must lie on a straight line in log-chromaticity space, for a chromaticity that is generated using a geometric-mean denominator. Use is made of the assumptions that daylight illuminants can be approximated using Planckians and that camera sensors are narrowband or can be made so by spectral sharpening. Then we show how a particular camera can be calibrated so as to discover this locus. As applications we use this curve for illuminant detection, and also for re-lighting of images to show how they would appear under lighting of a different colour temperature.
Tasks	Calibration
Published	2017-12-12
URL	http://arxiv.org/abs/1712.04509v1
PDF	http://arxiv.org/pdf/1712.04509v1.pdf
PWC	https://paperswithcode.com/paper/camera-calibration-for-daylight-specular
Repo
Framework
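The straight-line claim can be checked numerically under the assumptions the abstract states: Planckian illuminants (here via Wien's approximation) and narrowband sensors (modelled as delta functions). With a geometric-mean denominator, $\chi_k = \log\big(R_k / (R\,G\,B)^{1/3}\big)$. Then $\log R_k$ is a constant plus $-c_2/(\lambda_k T)$, and subtracting the mean over channels leaves $\chi$ affine in $1/T$. The sketch below generates responses for several colour temperatures and confirms that the $\chi$ points are collinear. The peak wavelengths and sensor gains are hypothetical, and this is a numerical illustration, not the paper's calibration procedure.

```python
import numpy as np

C2 = 1.4388e-2  # second radiation constant (m*K)


def geometric_mean_log_chromaticity(rgb) -> np.ndarray:
    """chi_k = log(R_k / (R*G*B)^(1/3)); the three components sum to zero."""
    log_rgb = np.log(np.asarray(rgb, dtype=float))
    return log_rgb - log_rgb.mean(axis=-1, keepdims=True)


def wien_narrowband_response(temperature: float, wavelengths, sensor_gains,
                             intensity: float = 1.0) -> np.ndarray:
    """Camera response to a Planckian light under Wien's approximation, assuming
    delta-function (narrowband) sensors and a neutral (specular) surface."""
    lam = np.asarray(wavelengths, dtype=float)
    # The first radiation constant c1 is dropped: like intensity, it cancels in chi.
    spectrum = lam ** -5.0 * np.exp(-C2 / (lam * temperature))
    return intensity * np.asarray(sensor_gains, dtype=float) * spectrum


wavelengths = np.array([610e-9, 540e-9, 450e-9])  # hypothetical R, G, B peaks (m)
gains = np.array([0.8, 1.0, 0.7])                 # hypothetical sensor gains
temps = np.linspace(2500.0, 10000.0, 8)           # daylight-like colour temperatures (K)
chi = np.stack([geometric_mean_log_chromaticity(
    wien_narrowband_response(T, wavelengths, gains)) for T in temps])

# Collinearity check: offsets from the first point span a single direction.
sv = np.linalg.svd(chi - chi[0], compute_uv=False)
```

The direction of the line depends only on the sensor wavelengths, while its offset depends on the gains. This suggests why calibrating a specific camera recovers the locus.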


Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets


Title	Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Authors	Karol Hausman, Yevgen Chebotar, Stefan Schaal, Gaurav Sukhatme, Joseph Lim
Abstract	Imitation learning has traditionally been applied to learn a single task from demonstrations thereof. The requirement of structured and isolated demonstrations limits the scalability of imitation learning approaches as they are difficult to apply to real-world scenarios, where robots have to be able to execute a multitude of tasks. In this paper, we propose a multi-modal imitation learning framework that is able to segment and imitate skills from unlabelled and unstructured demonstrations by learning skill segmentation and imitation learning jointly. The extensive simulation results indicate that our method can efficiently separate the demonstrations into individual skills and learn to imitate them using a single multi-modal policy. The video of our experiments is available at http://sites.google.com/view/nips17intentiongan
Tasks	Imitation Learning
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10479v2
PDF	http://arxiv.org/pdf/1705.10479v2.pdf
PWC	https://paperswithcode.com/paper/multi-modal-imitation-learning-from
Repo
Framework

Malicious URL Detection using Machine Learning: A Survey


Title	Malicious URL Detection using Machine Learning: A Survey
Authors	Doyen Sahoo, Chenghao Liu, Steven C. H. Hoi
Abstract	Malicious URLs, a.k.a. malicious websites, are a common and serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.), lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, and malware installation), and cause losses of billions of dollars every year. It is imperative to detect and act on such threats in a timely manner. Traditionally, this detection is done mostly through the use of blacklists. However, blacklists cannot be exhaustive, and they lack the ability to detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have been explored with increasing attention in recent years. This article aims to provide a comprehensive survey and a structural understanding of Malicious URL Detection techniques using machine learning. We present the formal formulation of Malicious URL Detection as a machine learning task, and categorize and review the contributions of literature studies that address different dimensions of this problem (feature representation, algorithm design, etc.). Further, this article provides a timely and comprehensive survey for a range of different audiences, not only for machine learning researchers and engineers in academia, but also for professionals and practitioners in the cybersecurity industry, to help them understand the state of the art and facilitate their own research and practical applications. We also discuss practical issues in system design, open research challenges, and point out some important directions for future research.
Tasks
Published	2017-01-25
URL	https://arxiv.org/abs/1701.07179v3
PDF	https://arxiv.org/pdf/1701.07179v3.pdf
PWC	https://paperswithcode.com/paper/malicious-url-detection-using-machine
Repo
Framework
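Feature representation is one of the problem dimensions the survey reviews. Lexical features computed from the URL string alone are a common starting point, because they can score a never-before-seen URL that no blacklist contains. The sketch below extracts a small illustrative set with the standard library. The specific features are assumptions chosen for illustration, not a list taken from the survey. A real detector would feed such vectors, possibly with host-based or content features, into a classifier trained on labelled URLs.

```python
import re
from typing import Dict, Union
from urllib.parse import urlparse

IPV4_HOST = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def lexical_url_features(url: str) -> Dict[str, Union[int, bool]]:
    """Illustrative lexical features computed from the URL string only."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special_chars": sum(ch in "-_@?=&%~" for ch in url),
        "has_ip_host": bool(IPV4_HOST.fullmatch(host)),  # raw IP instead of a domain
        "has_at_symbol": "@" in url,  # text before '@' is ignored as the host
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


ip_url = lexical_url_features("http://192.168.0.1/login/verify?id=1")
plain_url = lexical_url_features("https://example.com/a")
at_url = lexical_url_features("http://paypal.com@evil.example/")
```

The `@` trick in the last example is a classic obfuscation: the visible "paypal.com" is only userinfo, while `urlparse` correctly reports `evil.example` as the host.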

QCRI Machine Translation Systems for IWSLT 16


Title	QCRI Machine Translation Systems for IWSLT 16
Authors	Nadir Durrani, Fahim Dalvi, Hassan Sajjad, Stephan Vogel
Abstract	This paper describes QCRI’s machine translation systems for the IWSLT 2016 evaluation campaign. We participated in the Arabic->English and English->Arabic tracks. We built both phrase-based and neural machine translation models, in an effort to probe whether the newly emerged NMT framework surpasses traditional phrase-based systems for Arabic-English language pairs. We trained a very strong phrase-based system including a big language model, the Operation Sequence Model, a Neural Network Joint Model and class-based models, along with different domain adaptation techniques such as MML filtering, mixture modeling and fine-tuning of the NNJM model. However, a neural MT system, trained by stacking data from different genres through fine-tuning and ensembling 8 models, beat our very strong phrase-based system by a significant margin of 2 BLEU points in the Arabic->English direction. We did not obtain similar gains in the other direction but were still able to outperform the phrase-based system. We also applied system combination to the phrase-based and NMT outputs.
Tasks	Domain Adaptation, Language Modelling, Machine Translation
Published	2017-01-14
URL	http://arxiv.org/abs/1701.03924v1
PDF	http://arxiv.org/pdf/1701.03924v1.pdf
PWC	https://paperswithcode.com/paper/qcri-machine-translation-systems-for-iwslt-16
Repo
Framework
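The abstract does not say how the 8 NMT models are combined. A common approach in NMT toolkits is to average the models' next-token probability distributions at every decoding step. The sketch below assumes that scheme and computes the log of the averaged distribution in a numerically stable way. The logits are random placeholders for the scores each model would produce given the same source sentence and target prefix.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def ensemble_next_token_logprobs(model_logits) -> np.ndarray:
    """model_logits: (num_models, vocab_size) scores for the next target token.
    Returns the log of the arithmetic mean of the models' probability distributions."""
    log_probs = log_softmax(np.asarray(model_logits, dtype=float))
    m = log_probs.max(axis=0)  # log-mean-exp trick over the model axis
    return m + np.log(np.exp(log_probs - m).mean(axis=0))


# Placeholder scores from 8 hypothetical models over a 5-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 5))
ensemble_lp = ensemble_next_token_logprobs(logits)
next_token = int(ensemble_lp.argmax())  # greedy choice; beam search would keep the top-k
```

Averaging probabilities rather than log-probabilities is also a design choice. The log-probability average (a geometric mean) is another option, and it penalises tokens that any single model considers unlikely more heavily.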