January 29, 2020

Paper Group ANR 732

Faster AutoAugment: Learning Augmentation Strategies using Backpropagation


Title	Faster AutoAugment: Learning Augmentation Strategies using Backpropagation
Authors	Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, Hideki Nakayama
Abstract	Data augmentation methods are indispensable heuristics to boost the performance of deep neural networks, especially in image recognition tasks. Recently, several studies have shown that augmentation strategies found by search algorithms outperform hand-made strategies. Such methods employ black-box search algorithms over image transformations with continuous or discrete parameters and require a long time to obtain better strategies. In this paper, we propose a differentiable policy search pipeline for data augmentation, which is much faster than previous methods. We introduce approximate gradients for several transformation operations with discrete parameters as well as the differentiable mechanism for selecting operations. As the objective of training, we minimize the distance between the distributions of augmented data and the original data, which can be differentiated. We show that our method, Faster AutoAugment, achieves significantly faster searching than prior work without a performance drop.
Tasks	Data Augmentation
Published	2019-11-16
URL	https://arxiv.org/abs/1911.06987v1
PDF	https://arxiv.org/pdf/1911.06987v1.pdf
PWC	https://paperswithcode.com/paper/faster-autoaugment-learning-augmentation
Repo
Framework
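
The abstract mentions a differentiable mechanism for selecting operations. A minimal sketch, assuming a softmax relaxation over candidate operations (a common choice, not necessarily the paper's exact formulation, and ignoring the approximate gradients for discrete parameters): the output is a probability-weighted mix of every operation's output, so the gradient with respect to the selection logits has a closed form. The operations `identity`, `invert`, `brighten` and the scalar "pixel" input are hypothetical.

```python
import math

# Hypothetical candidate operations acting on a pixel intensity in [0, 1].
def identity(x):
    return x

def invert(x):
    return 1.0 - x

def brighten(x, amount=0.2):
    return min(1.0, x + amount)

OPS = [identity, invert, brighten]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, logits):
    """Relaxed selection: softmax-weighted sum of all operation outputs."""
    probs = softmax(logits)
    return sum(p * op(x) for p, op in zip(probs, OPS))

def mixed_op_grad(x, logits):
    """Analytic gradient: d mixed_op / d logits[j] = p_j * (op_j(x) - mixed_op(x))."""
    probs = softmax(logits)
    outs = [op(x) for op in OPS]
    y = sum(p * o for p, o in zip(probs, outs))
    return [p * (o - y) for p, o in zip(probs, outs)]
```

Because every operation contributes to the output in proportion to its probability, the selection logits can be trained by gradient descent together with the rest of the pipeline instead of by black-box search.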

Improving Evolutionary Strategies with Generative Neural Networks


Title	Improving Evolutionary Strategies with Generative Neural Networks
Authors	Louis Faury, Clement Calauzenes, Olivier Fercoq, Syrine Krichen
Abstract	Evolutionary Strategies (ES) are a popular family of black-box zeroth-order optimization algorithms which rely on search distributions to efficiently optimize a large variety of objective functions. This paper investigates the potential benefits of using highly flexible search distributions in classical ES algorithms, in contrast to standard ones (typically Gaussians). We model such distributions with Generative Neural Networks (GNNs) and introduce a new training algorithm that leverages their expressiveness to accelerate the ES procedure. We show that this tailored algorithm can readily incorporate existing ES algorithms, and outperforms the state-of-the-art on diverse objective functions.
Tasks
Published	2019-01-31
URL	http://arxiv.org/abs/1901.11271v1
PDF	http://arxiv.org/pdf/1901.11271v1.pdf
PWC	https://paperswithcode.com/paper/improving-evolutionary-strategies-with
Repo
Framework
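
For context, the "standard" Gaussian search distribution that the paper proposes to replace can be sketched as a vanilla evolution strategy: sample Gaussian perturbations around a mean, evaluate the objective, and move the mean along the fitness-weighted average of the perturbations. This is the Gaussian baseline on a hypothetical sphere objective, not the GNN-based method; the population size, step size and learning rate are illustrative.

```python
import random

def sphere(x):
    """Hypothetical objective to minimize: sum of squares."""
    return sum(v * v for v in x)

def gaussian_es(f, x0, sigma=0.1, lr=0.02, pop=20, iters=300, seed=0):
    rng = random.Random(seed)
    mean = list(x0)
    dim = len(mean)
    for _ in range(iters):
        # Sample a population from the Gaussian search distribution N(mean, sigma^2 I).
        eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        fits = [f([m + sigma * e for m, e in zip(mean, ei)]) for ei in eps]
        # Standardize fitness values to reduce the variance of the update.
        mu = sum(fits) / pop
        sd = (sum((v - mu) ** 2 for v in fits) / pop) ** 0.5 or 1.0
        z = [(v - mu) / sd for v in fits]
        # Monte Carlo estimate of the gradient of E[f(mean + sigma * eps)].
        grad = [sum(z[i] * eps[i][d] for i in range(pop)) / (pop * sigma)
                for d in range(dim)]
        mean = [m - lr * g for m, g in zip(mean, grad)]  # descent, since we minimize
    return mean
```

The only part a more flexible search distribution would change is how `eps` is sampled; the paper's contribution is to replace this fixed Gaussian with one modeled by a generative network.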

Running Event Visualization using Videos from Multiple Cameras


Title	Running Event Visualization using Videos from Multiple Cameras
Authors	Yeshwanth Napolean, Priadi Teguh Wibowo, Jan van Gemert
Abstract	Visualizing the trajectories of multiple runners with videos collected at different points in a race could be useful for sports performance analysis. The videos and the trajectories can also aid in athlete health monitoring. While the runners' unique IDs and their appearance are distinct, the task is not straightforward because the video data does not contain explicit information as to which runners appear in each of the videos. There is no direct supervision of the model in tracking athletes, only filtering steps to remove irrelevant detections. Other factors of concern include occlusion of runners and harsh illumination. To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory. One is scene text detection, which recognizes the runners by detecting a unique ‘bib number’ attached to their clothes, and the other is person re-identification, which detects the runners based on their appearance. We train our method without ground truth, but to evaluate the proposed methods, we create a ground truth database which consists of video and frame interval information where the runners appear. The videos in the dataset were recorded by nine cameras at different locations during a marathon event. This data is annotated with bib numbers of runners appearing in each video. The bib numbers of runners known to occur in the frame are used to filter irrelevant text and numbers detected. Except for this filtering step, no supervisory signal is used. The experimental evidence shows that the scene text recognition method achieves an F1-score of 74. Combining the two methods, that is, using samples collected by the text spotter to train the re-identification model, yields a higher F1-score of 85.8. Re-training the person re-identification model with identified inliers yields a slight improvement in performance (F1-score of 87.8).
Tasks	Person Re-Identification, Scene Text Detection, Scene Text Recognition
Published	2019-09-06
URL	https://arxiv.org/abs/1909.02835v1
PDF	https://arxiv.org/pdf/1909.02835v1.pdf
PWC	https://paperswithcode.com/paper/running-event-visualization-using-videos-from
Repo
Framework
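
The filtering step described above (keeping only recognized text that matches known bib numbers) and the F1-score used for evaluation can be sketched as follows. The data layout, a list of recognized strings per camera plus a set of valid bib numbers, and the scoring of (camera, bib) appearance pairs are hypothetical simplifications, not the paper's actual format.

```python
def predicted_appearances(recognized_by_camera, valid_bibs):
    """Keep only recognized strings that are known bib numbers; drop other text."""
    return {(cam, text)
            for cam, texts in recognized_by_camera.items()
            for text in texts
            if text in valid_bibs}

def f1_score(predicted, ground_truth):
    """F1 between sets of predicted and annotated (camera, bib) appearances."""
    tp = len(predicted & ground_truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

For example, if camera `cam2` recognizes "512" and a non-bib string such as "2019", only "512" survives the filter; a runner who is never spotted lowers recall but not precision.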

What Can Learned Intrinsic Rewards Capture?


Title	What Can Learned Intrinsic Rewards Capture?
Authors	Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh
Abstract	Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt the multi-lifetime setting of the Optimal Rewards Framework, and propose to meta-learn an intrinsic reward function from experience that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. Rewards as a locus of knowledge provide guidance on “what” the agent should strive to do rather than “how” the agent should behave; the latter is more directly captured in policies or value functions for example. Thus, our focus here is on demonstrating the following: (1) that it is feasible to meta-learn good reward functions, (2) that the learned reward functions can capture interesting kinds of “what” knowledge, and (3) that because of the indirectness of this form of knowledge the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment.
Tasks
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05500v1
PDF	https://arxiv.org/pdf/1912.05500v1.pdf
PWC	https://paperswithcode.com/paper/what-can-learned-intrinsic-rewards-capture-1
Repo
Framework

Learning Self-Game-Play Agents for Combinatorial Optimization Problems


Title	Learning Self-Game-Play Agents for Combinatorial Optimization Problems
Authors	Ruiyang Xu, Karl Lieberherr
Abstract	Recent progress in reinforcement learning (RL) using self-game-play has shown remarkable performance on several board games (e.g., Chess and Go) as well as video games (e.g., Atari games and Dota2). It is plausible to consider that RL, starting from zero knowledge, might be able to gradually approximate a winning strategy after a certain amount of training. In this paper, we explore neural Monte-Carlo-Tree-Search (neural MCTS), an RL algorithm which has been applied successfully by DeepMind to play Go and Chess at a super-human level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification (ZG) to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem. The ZG also provides a specially designed neural MCTS. We use a combinatorial planning problem for which the ground-truth policy is efficiently computable to demonstrate that ZG is promising.
Tasks	Atari Games, Board Games, Combinatorial Optimization
Published	2019-03-08
URL	https://arxiv.org/abs/1903.03674v2
PDF	https://arxiv.org/pdf/1903.03674v2.pdf
PWC	https://paperswithcode.com/paper/learning-self-game-play-agents-for
Repo
Framework
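
Hintikka-style game semantics turns an "there exists a choice such that, for all responses" claim into a two-player game: one player picks moves, the other picks how the world responds, and the claim is true exactly when the first player has a winning strategy. The abstract does not name its planning problem, so as a hypothetical illustration the sketch below casts the classic egg-drop puzzle (with `eggs` eggs, can the critical floor among `floors` candidates always be found within `queries` drops?) as such a game and solves it exactly by backward induction. This exact solver stands in for the ground-truth policy that a neural MCTS agent would try to approximate; it is not the paper's ZG implementation.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def proponent_wins(eggs, floors, queries):
    """True if the proponent can always pin down the critical floor.

    Proponent move: pick a floor f to drop from (existential choice).
    Opponent move: decide whether the egg breaks (universal choice).
    """
    if floors == 0:
        return True  # nothing left to determine
    if eggs == 0 or queries == 0:
        return False
    return any(
        proponent_wins(eggs - 1, f - 1, queries - 1)       # egg breaks: search below f
        and proponent_wins(eggs, floors - f, queries - 1)  # egg survives: search above f
        for f in range(1, floors + 1)
    )

def min_queries(eggs, floors):
    """Smallest query budget for which the proponent has a winning strategy."""
    if eggs < 1 and floors > 0:
        raise ValueError("need at least one egg")
    q = 0
    while not proponent_wins(eggs, floors, q):
        q += 1
    return q
```

For instance, `min_queries(2, 100)` returns 14, the well-known answer for two eggs and 100 floors.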

GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition


Title	GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
Authors	Fangneng Zhan, Chuhui Xue, Shijian Lu
Abstract	Recent adversarial learning research has achieved very impressive progress for modelling cross-domain data shifts in appearance space but its counterpart in modelling cross-domain shifts in geometry space lags far behind. This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. In the proposed GA-DAN, a novel multi-modal spatial learning technique is designed which converts a source-domain image into multiple images of different spatial views as in the target domain. A new disentangled cycle-consistency loss is introduced which balances the cycle consistency in appearance and geometry spaces and greatly improves the learning of the whole network. The proposed GA-DAN has been evaluated for the classic scene text detection and recognition tasks, and experiments show that the domain-adapted images achieve superior scene text detection and recognition performance when applied to network training.
Tasks	Domain Adaptation, Scene Text Detection
Published	2019-07-23
URL	https://arxiv.org/abs/1907.09653v1
PDF	https://arxiv.org/pdf/1907.09653v1.pdf
PWC	https://paperswithcode.com/paper/ga-dan-geometry-aware-domain-adaptation
Repo
Framework

City-Scale Road Extraction from Satellite Imagery v2: Road Speeds and Travel Times


Title	City-Scale Road Extraction from Satellite Imagery v2: Road Speeds and Travel Times
Authors	Adam Van Etten
Abstract	Automated road network extraction from remote sensing imagery remains a significant challenge despite its importance in a broad array of applications. To this end, we explore road network extraction at scale with inference of semantic features of the graph, identifying speed limits and route travel times for each roadway. We call this approach City-Scale Road Extraction from Satellite Imagery v2 (CRESIv2). Including estimates for travel time permits true optimal routing (rather than just the shortest geographic distance), which is not possible with existing remote sensing imagery based methods. We evaluate our method using two sources of labels (OpenStreetMap and those from the SpaceNet dataset), and find that models both trained and tested on SpaceNet labels outperform OpenStreetMap labels by more than 60%. We quantify the performance of our algorithm with the Average Path Length Similarity (APLS) and map topology (TOPO) graph-theoretic metrics over a diverse test area covering four cities in the SpaceNet dataset. For a traditional edge weight of geometric distance, we find an aggregate improvement of 5% over existing methods for SpaceNet data. We also test our algorithm on Google satellite imagery with OpenStreetMap labels, and find a 23% improvement over previous work. Metric scores decrease by only 4% on large graphs when using travel time rather than geometric distance for edge weights, indicating that optimizing routing for travel time is feasible with this approach.
Tasks
Published	2019-08-06
URL	https://arxiv.org/abs/1908.09715v2
PDF	https://arxiv.org/pdf/1908.09715v2.pdf
PWC	https://paperswithcode.com/paper/city-scale-road-extraction-from-satellite-1
Repo
Framework
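
Why travel-time estimates matter for routing can be shown on a toy graph: the same shortest-path search returns a different route when edge weights are travel times (length divided by speed) instead of geometric lengths. The graph, lengths and speeds below are hypothetical; in CRESIv2 the speeds would come from the model's inference on the imagery.

```python
import heapq

# Hypothetical road graph: (u, v) -> (length in meters, speed in km/h).
ROADS = {
    ("A", "B"): (1000, 30),  # short but slow street
    ("B", "D"): (1000, 30),
    ("A", "C"): (1500, 90),  # longer but faster arterial
    ("C", "D"): (1500, 90),
}

def build_graph(weight):
    graph = {}
    for (u, v), (length_m, speed_kmh) in ROADS.items():
        w = weight(length_m, speed_kmh)
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))  # treat roads as two-way
    return graph

def geometric(length_m, speed_kmh):
    return float(length_m)

def travel_time_s(length_m, speed_kmh):
    return length_m / (speed_kmh / 3.6)  # km/h -> m/s

def dijkstra(graph, src, dst):
    """Shortest path from src to dst; assumes dst is reachable."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]
```

With geometric weights the route is A, B, D (2000 m); with travel-time weights it is A, C, D (about 120 s versus 240 s through B).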

Closed Form Variances for Variational Auto-Encoders


Title	Closed Form Variances for Variational Auto-Encoders
Authors	Graham Fyffe
Abstract	We propose a reformulation of Variational Auto-Encoders eliminating half of the network outputs (the variances) in a deep network setting. While it is well known that the posterior is in general intractable, we show that the variances of Gaussian posteriors and likelihoods may be solved in closed form, producing improved variational lower bounds over their learned counterparts in experiments. The closed forms reduce to remarkably simple expressions – in particular, one optimal choice for the posterior variance is simply the identity matrix. We arrive at these conclusions by analyzing the variational lower bound objective irrespective of any particular network architecture, deriving its partial derivatives and closed form solutions for all parameters but the posterior means. In deriving the closed form likelihood variance, we show that the objective is underdetermined, which we resolve by constraining the presumed information content of the data examples. Any of these modifications may be applied to simplify, and perhaps improve, any Variational Auto-Encoder.
Tasks
Published	2019-12-21
URL	https://arxiv.org/abs/1912.10309v2
PDF	https://arxiv.org/pdf/1912.10309v2.pdf
PWC	https://paperswithcode.com/paper/closed-form-variances-for-variational-auto
Repo
Framework
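
A worked example of the kind of closed-form variance reasoning the abstract describes, restricted to the KL term of the standard ELBO with a standard normal prior. The paper analyzes the full objective, including the reconstruction term, so this is only a partial illustration: for a diagonal Gaussian posterior, setting the derivative of the KL term with respect to each variance to zero gives unit variance.

```latex
\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}\sigma^2)\,\|\,\mathcal{N}(0, I)\big)
  = \tfrac{1}{2}\sum_{j}\left(\sigma_j^2 + \mu_j^2 - 1 - \ln\sigma_j^2\right),
\qquad
\frac{\partial\,\mathrm{KL}}{\partial \sigma_j^2}
  = \tfrac{1}{2}\left(1 - \frac{1}{\sigma_j^2}\right) = 0
\;\Longrightarrow\; \sigma_j^2 = 1 .
```

Taken in isolation, the KL term is therefore minimized by the identity covariance; whether that choice remains optimal once the reconstruction term is included is what the paper's full analysis addresses.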

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019


Title	ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019
Authors	Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-lin Liu, Jean-Marc Ogier
Abstract	With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been greater. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge.
Tasks	Scene Text Detection
Published	2019-07-01
URL	https://arxiv.org/abs/1907.00945v1
PDF	https://arxiv.org/pdf/1907.00945v1.pdf
PWC	https://paperswithcode.com/paper/icdar2019-robust-reading-challenge-on-multi
Repo
Framework

Deep Neural Network Symbol Detection for Millimeter Wave Communications


Title	Deep Neural Network Symbol Detection for Millimeter Wave Communications
Authors	Yun Liao, Nariman Farsad, Nir Shlezinger, Yonina C. Eldar, Andrea J. Goldsmith
Abstract	This paper proposes to use a deep neural network (DNN)-based symbol detector for millimeter wave (mmWave) systems such that channel state information (CSI) acquisition can be bypassed. In particular, we consider a sliding bidirectional recurrent neural network (BRNN) architecture that is suitable for the long memory length of typical mmWave channels. The performance of the DNN detector is evaluated in comparison to that of the Viterbi detector. The results show that the performance of the DNN detector is close to that of the optimal Viterbi detector with perfect CSI, and that it outperforms the Viterbi algorithm with CSI estimation error. Further experiments show that the DNN detector is robust to a wide range of noise levels and varying channel conditions, and that a pretrained detector can be reliably applied to different mmWave channel realizations with minimal overhead.
Tasks
Published	2019-07-25
URL	https://arxiv.org/abs/1907.11294v1
PDF	https://arxiv.org/pdf/1907.11294v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-symbol-detection-for
Repo
Framework
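
The baseline the DNN detector is compared against, a Viterbi detector with known CSI, can be sketched for a toy intersymbol-interference channel. BPSK symbols pass through a short FIR channel with known taps, and the detector searches the trellis of recent symbols for the sequence with the smallest squared error. The taps, noise level and BPSK alphabet are hypothetical simplifications; real mmWave channels have much longer memory, which makes the trellis (2 to the power of the memory length states) expensive and motivates the paper's learned detector.

```python
import random

SYMBOLS = (-1.0, 1.0)  # BPSK alphabet

def channel(symbols, taps, noise_std=0.0, seed=0):
    """y[t] = sum_l taps[l] * s[t - l] + Gaussian noise, with zeros before t = 0."""
    rng = random.Random(seed)
    out = []
    for t in range(len(symbols)):
        y = sum(h * symbols[t - l] for l, h in enumerate(taps) if t - l >= 0)
        out.append(y + rng.gauss(0.0, noise_std))
    return out

def viterbi_detect(received, taps):
    """Maximum-likelihood sequence detection under Gaussian noise with known taps."""
    memory = len(taps) - 1
    # Trellis state = the previous `memory` symbols, most recent first (zeros at start).
    survivors = {(0.0,) * memory: (0.0, [])}  # state -> (path metric, decided symbols)
    for y in received:
        nxt = {}
        for state, (cost, path) in survivors.items():
            for s in SYMBOLS:
                pred = taps[0] * s + sum(h * p for h, p in zip(taps[1:], state))
                new_cost = cost + (y - pred) ** 2
                new_state = ((s,) + state)[:memory]
                # Keep only the best path entering each state (add-compare-select).
                if new_state not in nxt or new_cost < nxt[new_state][0]:
                    nxt[new_state] = (new_cost, path + [s])
        survivors = nxt
    return min(survivors.values(), key=lambda cp: cp[0])[1]
```

Because `taps` is passed in exactly, this corresponds to the "perfect CSI" case; feeding it estimated taps instead would model the CSI-estimation-error case the abstract mentions.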

Tutorial on NLP-Inspired Network Embedding


Title	Tutorial on NLP-Inspired Network Embedding
Authors	Boaz Shmueli
Abstract	This tutorial covers a few recent papers in the field of network embedding. Network embedding is a collective term for techniques for mapping graph nodes to vectors of real numbers in a multidimensional space. To be useful, a good embedding should preserve the structure of the graph. The vectors can then be used as input to various network and graph analysis tasks, such as link prediction. The papers discussed develop methods for the online learning of such embeddings, and include DeepWalk, LINE, node2vec, struc2vec and metapath2vec. These new methods and developments in online learning of network embeddings have major applications for the analysis of graphs and networks, including online social networks.
Tasks	Link Prediction, Network Embedding
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07212v1
PDF	https://arxiv.org/pdf/1910.07212v1.pdf
PWC	https://paperswithcode.com/paper/tutorial-on-nlp-inspired-network-embedding
Repo
Framework
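
DeepWalk, the first method the tutorial lists, borrows from NLP by treating truncated random walks on the graph as "sentences" and nodes as "words", then training a skip-gram model on (center, context) pairs. The sketch below covers only the walk and pair-generation stage on a hypothetical toy graph; the skip-gram training itself (as in word2vec) is omitted, and the walk length, number of walks and window size are illustrative values.

```python
import random

# Hypothetical undirected toy graph as an adjacency list.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def deepwalk_corpus(graph, walks_per_node=2, walk_length=5, seed=0):
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus

def skipgram_pairs(walk, window=2):
    """(center, context) pairs within `window` positions, as in word2vec."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs
```

Nodes that co-occur often within short walks end up as frequent (center, context) pairs, which is what lets the learned vectors preserve graph neighborhoods.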

Casting a Wide Net: Robust Extraction of Potentially Idiomatic Expressions


Title	Casting a Wide Net: Robust Extraction of Potentially Idiomatic Expressions
Authors	Hessel Haagsma, Malvina Nissim, Johan Bos
Abstract	Idiomatic expressions like ‘out of the woods’ and ‘up the ante’ present a range of difficulties for natural language processing applications. We present work on the annotation and extraction of what we term potentially idiomatic expressions (PIEs), a subclass of multiword expressions covering both literal and non-literal uses of idiomatic expressions. Existing corpora of PIEs are small and have limited coverage of different PIE types, which hampers research. To further progress on the extraction and disambiguation of potentially idiomatic expressions, larger corpora of PIEs are required. In addition, larger corpora are a potential source for valuable linguistic insights into idiomatic expressions and their variability. We propose automatic tools to facilitate the building of larger PIE corpora, by investigating the feasibility of using dictionary-based extraction of PIEs as a pre-extraction tool for English. We do this by assessing the reliability and coverage of idiom dictionaries, the annotation of a PIE corpus, and the automatic extraction of PIEs from a large corpus. Results show that combinations of dictionaries are a reliable source of idiomatic expressions, that PIEs can be annotated with a high reliability (0.74-0.91 Fleiss’ Kappa), and that parse-based PIE extraction yields highly accurate performance (88% F1-score). Combining complementary PIE extraction methods increases reliability further, to over 92% F1-score. Moreover, the extraction method presented here could be extended to other types of multiword expressions and to other languages, given that sufficient NLP tools are available.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08829v1
PDF	https://arxiv.org/pdf/1911.08829v1.pdf
PWC	https://paperswithcode.com/paper/casting-a-wide-net-robust-extraction-of
Repo
Framework
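
The annotation reliability above is reported as Fleiss' kappa, which corrects the observed agreement among several annotators for the agreement expected by chance. A minimal implementation of the standard formula, assuming every item is rated by the same number of annotators (at least two); the rating tables in the usage check are made-up examples, not the paper's data.

```python
def fleiss_kappa(table):
    """table[i][j] = number of annotators who assigned item i to category j."""
    n_items = len(table)
    n_raters = sum(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item needs the same number of ratings")
    # Observed agreement: fraction of agreeing annotator pairs per item, averaged.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in table]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category.
    n_cats = len(table[0])
    p_cats = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)  # undefined if every rating is one category
```

With categories such as "idiomatic" and "literal", `fleiss_kappa([[3, 0], [0, 3]])` gives 1.0 (perfect agreement), while values in the 0.74-0.91 range reported above indicate strong but imperfect agreement.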

Convolutional neural network for breathing phase detection in lung sounds


Title	Convolutional neural network for breathing phase detection in lung sounds
Authors	Cristina Jácome, Johan Ravn, Einar Holsbø, Juan Carlos Aviles-Solis, Hasse Melbye, Lars Ailo Bongo
Abstract	We applied deep learning to create an algorithm for breathing phase detection in lung sound recordings, and we compared the breathing phases detected by the algorithm and manually annotated by two experienced lung sound researchers. Our algorithm uses a convolutional neural network with spectrograms as the features, removing the need to specify features explicitly. We trained and evaluated the algorithm using three subsets that are larger than previously seen in the literature. We evaluated the performance of the algorithm in two ways. First, the discrete count of agreed breathing phases (using 50% overlap between a pair of boxes) shows a mean agreement with lung sound experts of 97% for inspiration and 87% for expiration. Second, the fraction of time of agreement (in seconds) gives higher pseudo-kappa values for inspiration (0.73-0.88) than expiration (0.63-0.84), showing an average sensitivity of 97% and an average specificity of 84%. With both evaluation methods, the agreement between the annotators and the algorithm shows human-level performance for the algorithm. The developed algorithm is valid for detecting breathing phases in lung sound recordings.
Tasks
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10251v1
PDF	http://arxiv.org/pdf/1903.10251v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-neural-network-for-breathing
Repo
Framework
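
The first evaluation counts a detected breathing phase as agreeing with an expert annotation when the two boxes overlap by at least 50%. The abstract does not say how overlap is normalized, so the sketch below assumes intersection over union of the time intervals and a greedy one-to-one matching; the intervals in the usage check are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two time intervals (start, end), in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def count_agreements(detected, annotated, threshold=0.5):
    """Greedily match each annotated phase to the best unused detection above threshold."""
    used = set()
    agreed = 0
    for ref in annotated:
        best, best_iou = None, threshold
        for k, det in enumerate(detected):
            score = iou(ref, det)
            if k not in used and score >= best_iou:
                best, best_iou = k, score
        if best is not None:
            used.add(best)
            agreed += 1
    return agreed
```

Dividing `count_agreements(...)` by the number of annotated phases gives an agreement percentage comparable in spirit to the 97% (inspiration) and 87% (expiration) figures, though the paper's exact matching rule may differ.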

A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning


Title	A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning
Authors	Shihao Ge, Haruna Isah, Farhana Zulkernine, Shahzad Khan
Abstract	The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams with the increasing need for real-time analytics for decision support in business, healthcare, manufacturing, and security. The analytics of streaming data usually relies on the output of offline analytics on static or archived data. However, businesses and organizations such as our industry partner Gnowit strive to provide their customers with real-time market information and continuously look for a unified analytics framework that can integrate both streaming and offline analytics in a seamless fashion to extract knowledge from large volumes of hybrid streaming data. We present our study on designing a multilevel streaming text data analytics framework by comparing leading-edge scalable open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis including data indexing and query processing. Our framework combines Spark Streaming for real-time text processing, the Long Short-Term Memory (LSTM) deep learning model for higher level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics.
Tasks	Sentiment Analysis
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06690v1
PDF	https://arxiv.org/pdf/1907.06690v1.pdf
PWC	https://paperswithcode.com/paper/a-scalable-framework-for-multilevel-streaming
Repo
Framework
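
The multilevel design above (a streaming layer for low-level text processing, a model for higher-level sentiment, and SQL for analytical queries) can be sketched in miniature with the Python standard library. This is not Spark or an LSTM: micro-batches stand in for Spark Streaming, a hypothetical word-list scorer stands in for the sentiment model, and an in-memory SQLite table stands in for the SQL layer.

```python
import re
import sqlite3

# Hypothetical word lists standing in for the LSTM sentiment model.
POSITIVE = {"gain", "growth", "strong"}
NEGATIVE = {"loss", "decline", "weak"}

def score_sentiment(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def micro_batches(stream, size):
    """Group a (possibly unbounded) iterator of records into fixed-size batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_pipeline(stream, batch_size=2):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (source TEXT, text TEXT, sentiment TEXT)")
    for batch in micro_batches(stream, batch_size):
        # Level 1: light text processing; level 2: sentiment labeling.
        rows = [(src, text.strip(), score_sentiment(text)) for src, text in batch]
        # Level 3: persist for SQL-based analytical queries.
        db.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
        db.commit()
    return db
```

Once records are stored, analytical questions become plain SQL, for example `SELECT sentiment, COUNT(*) FROM docs GROUP BY sentiment`.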

Falls Prediction in elderly people using Gated Recurrent Units


Title	Falls Prediction in elderly people using Gated Recurrent Units
Authors	Marcin Radzio, Maciej Wielgosz, Matej Mertik
Abstract	Falls prevention, especially in older people, is becoming an increasingly important topic in aging societies. In this work, we present Gated Recurrent Unit-based neural network models designed for predicting falls (syncope). The cardiovascular system signals used in the study come from the Gravitational Physiology, Aging and Medicine Research Unit, Institute of Physiology, Medical University of Graz. We used two of the collected signals: heart rate and mean blood pressure. Using a bidirectional GRU model, it was possible to predict the occurrence of syncope approximately ten minutes before the manual marker.
Tasks
Published	2019-08-02
URL	https://arxiv.org/abs/1908.01050v1
PDF	https://arxiv.org/pdf/1908.01050v1.pdf
PWC	https://paperswithcode.com/paper/falls-prediction-in-eldery-people-using-gated
Repo
Framework
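
A bidirectional GRU reads the signal sequence (here, two features per time step: heart rate and mean blood pressure) both forwards and backwards. A from-scratch sketch of the forward pass, assuming the original Cho et al. (2014) gate convention, normalized inputs, random untrained weights and a made-up hidden size; a real model would be trained and followed by a classifier that outputs the syncope prediction.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def init_gru(input_size, hidden_size, rng):
    """Random (untrained) weights for the update (z), reset (r) and candidate (h) paths."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_size, input_size)
        params["U" + gate] = mat(hidden_size, hidden_size)
        params["b" + gate] = [0.0] * hidden_size
    return params

def gru_step(p, x, h):
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Cho et al. convention: h_t = z * h_{t-1} + (1 - z) * candidate
    return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(p, seq, hidden_size):
    h = [0.0] * hidden_size
    for x in seq:
        h = gru_step(p, x, h)
    return h

def bidirectional_gru(p_fwd, p_bwd, seq, hidden_size):
    """Concatenate the final states of a forward pass and a reversed-sequence pass."""
    return run_gru(p_fwd, seq, hidden_size) + run_gru(p_bwd, seq[::-1], hidden_size)
```

Each element of `seq` would be one normalized `[heart_rate, mean_blood_pressure]` pair; the concatenated state summarizes the window from both directions.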