Paper Group ANR 732
Faster AutoAugment: Learning Augmentation Strategies using Backpropagation
Title | Faster AutoAugment: Learning Augmentation Strategies using Backpropagation |
Authors | Ryuichiro Hataya, Jan Zdenek, Kazuki Yoshizoe, Hideki Nakayama |
Abstract | Data augmentation methods are indispensable heuristics to boost the performance of deep neural networks, especially in image recognition tasks. Recently, several studies have shown that augmentation strategies found by search algorithms outperform hand-made strategies. Such methods employ black-box search algorithms over image transformations with continuous or discrete parameters and require a long time to obtain better strategies. In this paper, we propose a differentiable policy search pipeline for data augmentation, which is much faster than previous methods. We introduce approximate gradients for several transformation operations with discrete parameters as well as the differentiable mechanism for selecting operations. As the objective of training, we minimize the distance between the distributions of augmented data and the original data, which can be differentiated. We show that our method, Faster AutoAugment, achieves significantly faster searching than prior work without a performance drop. |
Tasks | Data Augmentation |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06987v1 |
https://arxiv.org/pdf/1911.06987v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-autoaugment-learning-augmentation |
Repo | |
Framework | |
Improving Evolutionary Strategies with Generative Neural Networks
Title | Improving Evolutionary Strategies with Generative Neural Networks |
Authors | Louis Faury, Clement Calauzenes, Olivier Fercoq, Syrine Krichen |
Abstract | Evolutionary Strategies (ES) are a popular family of black-box zeroth-order optimization algorithms which rely on search distributions to efficiently optimize a large variety of objective functions. This paper investigates the potential benefits of using highly flexible search distributions in classical ES algorithms, in contrast to standard ones (typically Gaussians). We model such distributions with Generative Neural Networks (GNNs) and introduce a new training algorithm that leverages their expressiveness to accelerate the ES procedure. We show that this tailored algorithm can readily incorporate existing ES algorithms, and outperforms the state-of-the-art on diverse objective functions. |
Tasks | |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11271v1 |
http://arxiv.org/pdf/1901.11271v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-evolutionary-strategies-with |
Repo | |
Framework | |
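As a rough illustration of the baseline the abstract above contrasts against, the following is a minimal sketch of an evolutionary strategy with a standard isotropic Gaussian search distribution. It is not the paper's GNN-based method. The objective `sphere`, the function name `gaussian_es`, and all hyperparameters (population size, elite count, step-size decay) are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective to minimize: sum of squares, with its minimum at the origin."""
    return sum(v * v for v in x)


def gaussian_es(objective, dim, iterations=200, population=20, elite=5,
                sigma=0.5, seed=0):
    """Simple ES with an isotropic Gaussian search distribution N(mean, sigma^2 I)."""
    rng = random.Random(seed)
    mean = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
    for _ in range(iterations):
        # Sample candidate solutions from the current search distribution.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        # Zeroth-order: only objective values are used, never gradients.
        candidates.sort(key=objective)
        best = candidates[:elite]
        # Move the mean to the average of the best candidates.
        mean = [sum(c[i] for c in best) / elite for i in range(dim)]
        # Hypothetical fixed decay schedule for the step size.
        sigma *= 0.98
    return mean


solution = gaussian_es(sphere, dim=3)
print(sphere(solution))
```

The only part an ES method would change when swapping in a more flexible search distribution is the sampling step; the black-box evaluation and selection loop stay the same.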
Running Event Visualization using Videos from Multiple Cameras
Title | Running Event Visualization using Videos from Multiple Cameras |
Authors | Yeshwanth Napolean, Priadi Teguh Wibowo, Jan van Gemert |
Abstract | Visualizing the trajectory of multiple runners with videos collected at different points in a race could be useful for sports performance analysis. The videos and the trajectories can also aid in athlete health monitoring. While the runners' unique IDs and their appearance are distinct, the task is not straightforward because the video data does not contain explicit information as to which runners appear in each of the videos. There is no direct supervision of the model in tracking athletes, only filtering steps to remove irrelevant detections. Other factors of concern include occlusion of runners and harsh illumination. To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory. One is scene text detection, which recognizes the runners by detecting a unique ‘bib number’ attached to their clothes, and the other is person re-identification, which detects the runners based on their appearance. We train our method without ground truth, but to evaluate the proposed methods we create a ground truth database which consists of video and frame interval information where the runners appear. The videos in the dataset were recorded by nine cameras at different locations during a marathon event. This data is annotated with bib numbers of runners appearing in each video. The bib numbers of runners known to occur in the frame are used to filter irrelevant text and numbers detected. Except for this filtering step, no supervisory signal is used. The experimental evidence shows that the scene text recognition method achieves an F1-score of 74. Combining the two methods, that is, using samples collected by the text spotter to train the re-identification model, yields a higher F1-score of 85.8. Re-training the person re-identification model with identified inliers yields a slight improvement in performance (F1-score of 87.8). |
Tasks | Person Re-Identification, Scene Text Detection, Scene Text Recognition |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02835v1 |
https://arxiv.org/pdf/1909.02835v1.pdf | |
PWC | https://paperswithcode.com/paper/running-event-visualization-using-videos-from |
Repo | |
Framework | |
What Can Learned Intrinsic Rewards Capture?
Title | What Can Learned Intrinsic Rewards Capture? |
Authors | Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh |
Abstract | Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. The objective of an agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt the multi-lifetime setting of the Optimal Rewards Framework, and propose to meta-learn an intrinsic reward function from experience that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. Rewards as a locus of knowledge provide guidance on “what” the agent should strive to do rather than “how” the agent should behave; the latter is more directly captured in policies or value functions for example. Thus, our focus here is on demonstrating the following: (1) that it is feasible to meta-learn good reward functions, (2) that the learned reward functions can capture interesting kinds of “what” knowledge, and (3) that because of the indirectness of this form of knowledge the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05500v1 |
https://arxiv.org/pdf/1912.05500v1.pdf | |
PWC | https://paperswithcode.com/paper/what-can-learned-intrinsic-rewards-capture-1 |
Repo | |
Framework | |
Learning Self-Game-Play Agents for Combinatorial Optimization Problems
Title | Learning Self-Game-Play Agents for Combinatorial Optimization Problems |
Authors | Ruiyang Xu, Karl Lieberherr |
Abstract | Recent progress in reinforcement learning (RL) using self-game-play has shown remarkable performance on several board games (e.g., Chess and Go) as well as video games (e.g., Atari games and Dota2). It is plausible to consider that RL, starting from zero knowledge, might be able to gradually approximate a winning strategy after a certain amount of training. In this paper, we explore neural Monte-Carlo-Tree-Search (neural MCTS), an RL algorithm which has been applied successfully by DeepMind to play Go and Chess at a super-human level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification (ZG) to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem. The ZG also provides a specially designed neural MCTS. We use a combinatorial planning problem for which the ground-truth policy is efficiently computable to demonstrate that ZG is promising. |
Tasks | Atari Games, Board Games, Combinatorial Optimization |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03674v2 |
https://arxiv.org/pdf/1903.03674v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-self-game-play-agents-for |
Repo | |
Framework | |
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition
Title | GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition |
Authors | Fangneng Zhan, Chuhui Xue, Shijian Lu |
Abstract | Recent adversarial learning research has achieved very impressive progress for modelling cross-domain data shifts in appearance space but its counterpart in modelling cross-domain shifts in geometry space lags far behind. This paper presents an innovative Geometry-Aware Domain Adaptation Network (GA-DAN) that is capable of modelling cross-domain shifts concurrently in both geometry space and appearance space and realistically converting images across domains with very different characteristics. In the proposed GA-DAN, a novel multi-modal spatial learning technique is designed which converts a source-domain image into multiple images of different spatial views as in the target domain. A new disentangled cycle-consistency loss is introduced which balances the cycle consistency in appearance and geometry spaces and improves the learning of the whole network greatly. The proposed GA-DAN has been evaluated for the classic scene text detection and recognition tasks, and experiments show that the domain-adapted images achieve superior scene text detection and recognition performance when applied to network training. |
Tasks | Domain Adaptation, Scene Text Detection |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09653v1 |
https://arxiv.org/pdf/1907.09653v1.pdf | |
PWC | https://paperswithcode.com/paper/ga-dan-geometry-aware-domain-adaptation |
Repo | |
Framework | |
City-Scale Road Extraction from Satellite Imagery v2: Road Speeds and Travel Times
Title | City-Scale Road Extraction from Satellite Imagery v2: Road Speeds and Travel Times |
Authors | Adam Van Etten |
Abstract | Automated road network extraction from remote sensing imagery remains a significant challenge despite its importance in a broad array of applications. To this end, we explore road network extraction at scale with inference of semantic features of the graph, identifying speed limits and route travel times for each roadway. We call this approach City-Scale Road Extraction from Satellite Imagery v2 (CRESIv2). Including estimates for travel time permits true optimal routing (rather than just the shortest geographic distance), which is not possible with existing remote sensing imagery based methods. We evaluate our method using two sources of labels (OpenStreetMap, and those from the SpaceNet dataset), and find that models both trained and tested on SpaceNet labels outperform OpenStreetMap labels by greater than 60%. We quantify the performance of our algorithm with the Average Path Length Similarity (APLS) and map topology (TOPO) graph-theoretic metrics over a diverse test area covering four cities in the SpaceNet dataset. For a traditional edge weight of geometric distance, we find an aggregate of 5% improvement over existing methods for SpaceNet data. We also test our algorithm on Google satellite imagery with OpenStreetMap labels, and find a 23% improvement over previous work. Metric scores decrease by only 4% on large graphs when using travel time rather than geometric distance for edge weights, indicating that optimizing routing for travel time is feasible with this approach. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.09715v2 |
https://arxiv.org/pdf/1908.09715v2.pdf | |
PWC | https://paperswithcode.com/paper/city-scale-road-extraction-from-satellite-1 |
Repo | |
Framework | |
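The abstract's point that travel-time estimates permit optimal routing, rather than only the shortest geographic path, can be sketched with Dijkstra's algorithm run under two different edge weights. This is an illustrative toy and not the CRESIv2 pipeline: the graph `roads`, the attribute names `length_km` and `speed_kmh`, and the helper `shortest_path` are all hypothetical.

```python
import heapq


def shortest_path(graph, source, target, weight):
    """Dijkstra's algorithm; `weight` maps an edge-attribute dict to a nonnegative cost."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, attrs in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    queue, (cost + weight(attrs), neighbor, path + [neighbor])
                )
    return float("inf"), []


# Hypothetical toy road graph: a short slow route (A-B-D) and a longer fast one (A-C-D).
roads = {
    "A": {"B": {"length_km": 2.0, "speed_kmh": 30.0},
          "C": {"length_km": 3.0, "speed_kmh": 90.0}},
    "B": {"D": {"length_km": 2.0, "speed_kmh": 30.0}},
    "C": {"D": {"length_km": 3.0, "speed_kmh": 90.0}},
    "D": {},
}

# Edge weight = geometric distance.
by_distance = shortest_path(roads, "A", "D", lambda e: e["length_km"])
# Edge weight = travel time in hours (length divided by inferred speed).
by_time = shortest_path(roads, "A", "D", lambda e: e["length_km"] / e["speed_kmh"])

print(by_distance[1], by_time[1])
```

In this toy graph the two weightings select different routes, which is why a per-edge speed estimate changes what "optimal" means for routing.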
Closed Form Variances for Variational Auto-Encoders
Title | Closed Form Variances for Variational Auto-Encoders |
Authors | Graham Fyffe |
Abstract | We propose a reformulation of Variational Auto-Encoders eliminating half of the network outputs (the variances) in a deep network setting. While it is well known that the posterior is in general intractable, we show that the variances of Gaussian posteriors and likelihoods may be solved in closed form, producing improved variational lower bounds over their learned counterparts in experiments. The closed forms reduce to remarkably simple expressions – in particular, one optimal choice for the posterior variance is simply the identity matrix. We arrive at these conclusions by analyzing the variational lower bound objective irrespective of any particular network architecture, deriving its partial derivatives and closed form solutions for all parameters but the posterior means. In deriving the closed form likelihood variance, we show that the objective is underdetermined, which we resolve by constraining the presumed information content of the data examples. Any of these modifications may be applied to simplify, and perhaps improve, any Variational Auto-Encoder. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10309v2 |
https://arxiv.org/pdf/1912.10309v2.pdf | |
PWC | https://paperswithcode.com/paper/closed-form-variances-for-variational-auto |
Repo | |
Framework | |
ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019
Title | ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition – RRC-MLT-2019 |
Authors | Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-lin Liu, Jean-Marc Ogier |
Abstract | With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a large scale multi-lingual synthetic dataset to assist the training, and a baseline End-to-End recognition method. The real dataset consists of 20,000 images containing text from 10 languages. The challenge has 4 tasks covering various aspects of multi-lingual scene text: (a) text detection, (b) cropped word script classification, (c) joint text detection and script classification and (d) end-to-end detection and recognition. In total, the competition received 60 submissions from the research and industrial communities. This paper presents the dataset, the tasks and the findings of the presented RRC-MLT-2019 challenge. |
Tasks | Scene Text Detection |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00945v1 |
https://arxiv.org/pdf/1907.00945v1.pdf | |
PWC | https://paperswithcode.com/paper/icdar2019-robust-reading-challenge-on-multi |
Repo | |
Framework | |
Deep Neural Network Symbol Detection for Millimeter Wave Communications
Title | Deep Neural Network Symbol Detection for Millimeter Wave Communications |
Authors | Yun Liao, Nariman Farsad, Nir Shlezinger, Yonina C. Eldar, Andrea J. Goldsmith |
Abstract | This paper proposes to use a deep neural network (DNN)-based symbol detector for mmWave systems such that CSI acquisition can be bypassed. In particular, we consider a sliding bidirectional recurrent neural network (BRNN) architecture that is suitable for the long memory length of typical mmWave channels. The performance of the DNN detector is evaluated in comparison to that of the Viterbi detector. The results show that the performance of the DNN detector is close to that of the optimal Viterbi detector with perfect CSI, and that it outperforms the Viterbi algorithm with CSI estimation error. Further experiments show that the DNN detector is robust to a wide range of noise levels and varying channel conditions, and that a pretrained detector can be reliably applied to different mmWave channel realizations with minimal overhead. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11294v1 |
https://arxiv.org/pdf/1907.11294v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-symbol-detection-for |
Repo | |
Framework | |
Tutorial on NLP-Inspired Network Embedding
Title | Tutorial on NLP-Inspired Network Embedding |
Authors | Boaz Shmueli |
Abstract | This tutorial covers a few recent papers in the field of network embedding. Network embedding is a collective term for techniques for mapping graph nodes to vectors of real numbers in a multidimensional space. To be useful, a good embedding should preserve the structure of the graph. The vectors can then be used as input to various network and graph analysis tasks, such as link prediction. The papers discussed develop methods for the online learning of such embeddings, and include DeepWalk, LINE, node2vec, struc2vec and metapath2vec. These new methods and developments in online learning of network embeddings have major applications for the analysis of graphs and networks, including online social networks. |
Tasks | Link Prediction, Network Embedding |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07212v1 |
https://arxiv.org/pdf/1910.07212v1.pdf | |
PWC | https://paperswithcode.com/paper/tutorial-on-nlp-inspired-network-embedding |
Repo | |
Framework | |
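To make the NLP connection in the abstract above concrete, here is a minimal sketch of the walk-generation step used by DeepWalk-style methods: truncated uniform random walks over the graph, which are then treated like sentences for a skip-gram model. Only the walk generation is shown; the embedding training is omitted. The function name `random_walks`, its parameters, and the toy graph are assumptions for illustration.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    nodes = sorted(adjacency)
    for _ in range(walks_per_node):
        # Visit start nodes in a fresh random order on each pass.
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy undirected graph as an adjacency list.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = random_walks(graph)
for walk in walks:
    print(" ".join(walk))
```

Each printed line plays the role of a sentence whose words are node identifiers, so nodes that co-occur in walks end up with similar vectors once a word-embedding model is trained on them.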
Casting a Wide Net: Robust Extraction of Potentially Idiomatic Expressions
Title | Casting a Wide Net: Robust Extraction of Potentially Idiomatic Expressions |
Authors | Hessel Haagsma, Malvina Nissim, Johan Bos |
Abstract | Idiomatic expressions like ‘out of the woods’ and ‘up the ante’ present a range of difficulties for natural language processing applications. We present work on the annotation and extraction of what we term potentially idiomatic expressions (PIEs), a subclass of multiword expressions covering both literal and non-literal uses of idiomatic expressions. Existing corpora of PIEs are small and have limited coverage of different PIE types, which hampers research. To further progress on the extraction and disambiguation of potentially idiomatic expressions, larger corpora of PIEs are required. In addition, larger corpora are a potential source for valuable linguistic insights into idiomatic expressions and their variability. We propose automatic tools to facilitate the building of larger PIE corpora, by investigating the feasibility of using dictionary-based extraction of PIEs as a pre-extraction tool for English. We do this by assessing the reliability and coverage of idiom dictionaries, the annotation of a PIE corpus, and the automatic extraction of PIEs from a large corpus. Results show that combinations of dictionaries are a reliable source of idiomatic expressions, that PIEs can be annotated with a high reliability (0.74-0.91 Fleiss’ Kappa), and that parse-based PIE extraction yields highly accurate performance (88% F1-score). Combining complementary PIE extraction methods increases reliability further, to over 92% F1-score. Moreover, the extraction method presented here could be extended to other types of multiword expressions and to other languages, given that sufficient NLP tools are available. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08829v1 |
https://arxiv.org/pdf/1911.08829v1.pdf | |
PWC | https://paperswithcode.com/paper/casting-a-wide-net-robust-extraction-of |
Repo | |
Framework | |
Convolutional neural network for breathing phase detection in lung sounds
Title | Convolutional neural network for breathing phase detection in lung sounds |
Authors | Cristina Jácome, Johan Ravn, Einar Holsbø, Juan Carlos Aviles-Solis, Hasse Melbye, Lars Ailo Bongo |
Abstract | We applied deep learning to create an algorithm for breathing phase detection in lung sound recordings, and we compared the breathing phases detected by the algorithm with those manually annotated by two experienced lung sound researchers. Our algorithm uses a convolutional neural network with spectrograms as the features, removing the need to specify features explicitly. We trained and evaluated the algorithm using three subsets that are larger than previously seen in the literature. We evaluated the performance of the method in two ways. First, a discrete count of agreed breathing phases (using 50% overlap between a pair of boxes) shows a mean agreement with lung sound experts of 97% for inspiration and 87% for expiration. Second, the fraction of time of agreement (in seconds) gives higher pseudo-kappa values for inspiration (0.73-0.88) than expiration (0.63-0.84), showing an average sensitivity of 97% and an average specificity of 84%. With both evaluation methods, the agreement between the annotators and the algorithm shows human-level performance for the algorithm. The developed algorithm is valid for detecting breathing phases in lung sound recordings. |
Tasks | |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10251v1 |
http://arxiv.org/pdf/1903.10251v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-network-for-breathing |
Repo | |
Framework | |
A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning
Title | A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning |
Authors | Shihao Ge, Haruna Isah, Farhana Zulkernine, Shahzad Khan |
Abstract | The rapid growth of data in velocity, volume, value, variety, and veracity has enabled exciting new opportunities and presented big challenges for businesses of all types. Recently, there has been considerable interest in developing systems for processing continuous data streams, with the increasing need for real-time analytics for decision support in business, healthcare, manufacturing, and security. The analytics of streaming data usually relies on the output of offline analytics on static or archived data. However, businesses and organizations, like our industry partner Gnowit, strive to provide their customers with real-time market information and continuously look for a unified analytics framework that can integrate both streaming and offline analytics in a seamless fashion to extract knowledge from large volumes of hybrid streaming data. We present our study on designing a multilevel streaming text data analytics framework by comparing leading-edge scalable open-source, distributed, and in-memory technologies. We demonstrate the functionality of the framework for a use case of multilevel text analytics using deep learning for language understanding and sentiment analysis including data indexing and query processing. Our framework combines Spark streaming for real-time text processing, the Long Short Term Memory (LSTM) deep learning model for higher level sentiment analysis, and other tools for SQL-based analytical processing to provide a scalable solution for multilevel streaming text analytics. |
Tasks | Sentiment Analysis |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06690v1 |
https://arxiv.org/pdf/1907.06690v1.pdf | |
PWC | https://paperswithcode.com/paper/a-scalable-framework-for-multilevel-streaming |
Repo | |
Framework | |
Falls Prediction in eldery people using Gated Recurrent Units
Title | Falls Prediction in eldery people using Gated Recurrent Units |
Authors | Marcin Radzio, Maciej Wielgosz, Matej Mertik |
Abstract | Fall prevention, especially in older people, is becoming an increasingly important topic in aging societies. In this work, we present Gated Recurrent Unit-based neural network models designed for predicting falls (syncope). The cardiovascular system signals used in the study come from the Gravitational Physiology, Aging and Medicine Research Unit, Institute of Physiology, Medical University of Graz. We used two of the collected signals: heart rate and mean blood pressure. Using a bidirectional GRU model, it was possible to predict the syncope occurrence approximately ten minutes before the manual marker. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.01050v1 |
https://arxiv.org/pdf/1908.01050v1.pdf | |
PWC | https://paperswithcode.com/paper/falls-prediction-in-eldery-people-using-gated |
Repo | |
Framework | |