January 26, 2020

3037 words 15 mins read

Paper Group ANR 1464

Neural Fictitious Self-Play on ELF Mini-RTS

Title Neural Fictitious Self-Play on ELF Mini-RTS
Authors Keigo Kawamura, Yoshimasa Tsuruoka
Abstract Despite the notable successes in video games such as Atari 2600, current AI is yet to defeat human champions in the domain of real-time strategy (RTS) games. One of the reasons is that an RTS game is a multi-agent game, in which single-agent reinforcement learning methods cannot simply be applied because the environment is not a stationary Markov Decision Process. In this paper, we present a first step toward finding a game-theoretic solution to RTS games by applying Neural Fictitious Self-Play (NFSP), a game-theoretic approach for finding Nash equilibria, to Mini-RTS, a small but nontrivial RTS game provided on the ELF platform. More specifically, we show that NFSP can be effectively combined with policy gradient reinforcement learning and applied to Mini-RTS. Experimental results also show that the scalability of NFSP can be substantially improved by pretraining the models with simple self-play using policy gradients, which by itself yields a strong strategy despite its lack of a theoretical guarantee of convergence.
Tasks
Published 2019-02-06
URL http://arxiv.org/abs/1902.02004v1
PDF http://arxiv.org/pdf/1902.02004v1.pdf
PWC https://paperswithcode.com/paper/neural-fictitious-self-play-on-elf-mini-rts
Repo
Framework
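
To make the method concrete: NFSP maintains two policies per agent, a best-response policy trained with reinforcement learning and an average policy trained by supervised learning on the agent's own best-response actions, and acts with a mixture of the two. Below is a minimal, hypothetical tabular sketch of that control flow; the paper combines NFSP with policy gradients rather than the Q-learning update used here, and all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class NFSPAgent:
    """Hypothetical tabular NFSP agent, for illustration only."""

    def __init__(self, n_states, n_actions, eta=0.1, lr=0.05):
        self.q = np.zeros((n_states, n_actions))                    # best response (RL)
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)   # average policy (SL)
        self.eta = eta   # anticipatory parameter: prob. of acting greedily
        self.lr = lr

    def act(self, s):
        if rng.random() < self.eta:
            a = int(np.argmax(self.q[s]))                      # best-response mode
            one_hot = np.eye(self.pi.shape[1])[a]
            self.pi[s] += self.lr * (one_hot - self.pi[s])     # SL: imitate own best responses
            return a
        return int(rng.choice(self.pi.shape[1], p=self.pi[s]))  # average-policy mode

    def learn(self, s, a, r, s2):
        # one-step Q-learning update toward the best response
        # (the paper uses policy gradients here instead)
        self.q[s, a] += self.lr * (r + np.max(self.q[s2]) - self.q[s, a])
```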

The ‘Paris-end’ of town? Urban typology through machine learning

Title The ‘Paris-end’ of town? Urban typology through machine learning
Authors Kerry A. Nice, Jason Thompson, Jasper S. Wijnands, Gideon D. P. A. Aschwanden, Mark Stevenson
Abstract The confluence of recent advances in the availability of geospatial information, computing power, and artificial intelligence offers new opportunities to understand how and where our cities differ or are alike. Departing from a traditional 'top-down' analysis of urban design features, this project analyses millions of images of urban form (consisting of street view, satellite imagery, and street maps) to find shared characteristics. A novel neural network-based framework is trained with imagery from the largest 1692 cities in the world, and the resulting models are used to compare within-city locations from Melbourne and Sydney to determine the closest connections between these areas and their international comparators. This work demonstrates a new, consistent, and objective method to begin to understand the relationship between cities and the health, transport, and environmental consequences of their design. The results show specific advantages and disadvantages of using each type of imagery. Neural networks trained with map imagery will be highly influenced by the mix of roads, public transport, and green and blue space, as well as the structure of these elements. The colours of natural and built features stand out as dominant characteristics in satellite imagery. The use of street view imagery will emphasise the features of a human-scaled visual geography of streetscapes. Finally, and perhaps most importantly, this research also answers the age-old question, "Is there really a 'Paris-end' to your city?".
Tasks
Published 2019-10-08
URL https://arxiv.org/abs/1910.03220v1
PDF https://arxiv.org/pdf/1910.03220v1.pdf
PWC https://paperswithcode.com/paper/the-paris-end-of-town-urban-typology-through
Repo
Framework
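
The comparison step described above, matching within-city locations to their closest international comparators in the learned embedding space, might look like the following; the nearest-centroid rule and cosine similarity are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def closest_comparators(query_embs, city_embs, city_labels, top_k=3):
    """Rank comparator cities for a set of local image embeddings.

    Hypothetical sketch: embed imagery with the trained network, then
    match each local location to the nearest city centroids by cosine
    similarity.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = city_embs / np.linalg.norm(city_embs, axis=1, keepdims=True)
    sims = q @ c.T                                  # (n_locations, n_cities)
    order = np.argsort(-sims, axis=1)[:, :top_k]    # best matches first
    return [[city_labels[j] for j in row] for row in order]
```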

A Theoretical Analysis of Deep Neural Networks and Parametric PDEs

Title A Theoretical Analysis of Deep Neural Networks and Parametric PDEs
Authors Gitta Kutyniok, Philipp Petersen, Mones Raslan, Reinhold Schneider
Abstract We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical approximation results. We use this low dimensionality to guarantee the existence of a reduced basis. Then, for a large variety of parametric partial differential equations, we construct neural networks that yield approximations of the parametric maps not suffering from a curse of dimension and essentially only depending on the size of the reduced basis.
Tasks
Published 2019-03-31
URL https://arxiv.org/abs/1904.00377v2
PDF https://arxiv.org/pdf/1904.00377v2.pdf
PWC https://paperswithcode.com/paper/a-theoretical-analysis-of-deep-neural-1
Repo
Framework
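
The shape of the result can be summarized schematically. In generic notation (not the paper's exact statement), with u_y the PDE solution at parameter y and d(ε) the reduced-basis dimension at accuracy ε:

```latex
% Generic notation, not the paper's exact statement.
\[
  \sup_{y \in \mathcal{Y}} \bigl\| u_y - \Phi_\varepsilon(y) \bigr\| \le \varepsilon ,
  \qquad
  \operatorname{size}(\Phi_\varepsilon) \le C \, d(\varepsilon)^{c} \, \log^{k}(1/\varepsilon) .
\]
```

The point is that the network size is governed by d(ε), the intrinsic dimension of the solution manifold, rather than by the ambient parameter dimension, which is what "not suffering from a curse of dimension" means here.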

Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data

Title Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data
Authors Richeng Jin, Xiaofan He, Huaiyu Dai
Abstract The recent advances in sensor technologies and smart devices enable the collaborative collection of vast volumes of data from multiple information sources. As a promising tool to efficiently extract useful information from such big data, machine learning has been pushed to the forefront and has seen great success in a wide range of relevant areas such as computer vision, health care, and financial market analysis. To accommodate the large volume of data, there is a surge of interest in the design of distributed machine learning, among which stochastic gradient descent (SGD) is one of the most widely adopted methods. Nonetheless, distributed machine learning methods may be vulnerable to Byzantine attacks, in which the adversary can deliberately share falsified information to disrupt the intended machine learning procedures. Therefore, two asynchronous Byzantine tolerant SGD algorithms are proposed in this work, in which the honest collaborative workers are assumed to store the model parameters derived from their own local data and use them as the ground truth. The proposed algorithms can deal with an arbitrary number of Byzantine attackers and are provably convergent. Simulation results based on a real-world dataset are presented to verify the theoretical results and demonstrate the effectiveness of the proposed algorithms.
Tasks
Published 2019-02-27
URL http://arxiv.org/abs/1902.10336v3
PDF http://arxiv.org/pdf/1902.10336v3.pdf
PWC https://paperswithcode.com/paper/distributed-byzantine-tolerant-stochastic
Repo
Framework
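
A loose sketch of the screening idea described above: an honest worker treats its locally derived gradient as ground truth and discards incoming updates that stray too far from it before averaging. The distance test and threshold below are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def byzantine_filtered_step(w, local_grad, received_grads, lr=0.01, tol=2.0):
    """One SGD step that screens incoming gradients against a local estimate.

    `tol` and the Euclidean-distance test are assumptions for illustration.
    """
    ref = np.linalg.norm(local_grad)
    accepted = [g for g in received_grads
                if np.linalg.norm(g - local_grad) <= tol * ref]
    accepted.append(local_grad)                 # the local gradient is always used
    return w - lr * np.mean(accepted, axis=0)   # average the surviving gradients
```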

Data Interpolating Prediction: Alternative Interpretation of Mixup

Title Data Interpolating Prediction: Alternative Interpretation of Mixup
Authors Takuya Shimada, Shoichiro Yamaguchi, Kohei Hayashi, Sosuke Kobayashi
Abstract Data augmentation by mixing samples, such as Mixup, has been widely used, typically for classification tasks. However, this strategy is not always effective due to the gap between augmented samples used for training and original samples used for testing. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentations, we encapsulate the sample-mixing process in the hypothesis class of the classifier so that train and test samples are treated equally. We derive a generalization bound and show that DIP helps to reduce the original Rademacher complexity. We also empirically demonstrate that DIP can outperform existing Mixup methods.
Tasks Data Augmentation
Published 2019-06-20
URL https://arxiv.org/abs/1906.08412v1
PDF https://arxiv.org/pdf/1906.08412v1.pdf
PWC https://paperswithcode.com/paper/data-interpolating-prediction-alternative
Repo
Framework
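
The core idea, mixing inside the predictor rather than only in the training data, can be sketched in a few lines: at both train and test time, the prediction is a Monte Carlo average of the classifier applied to mixed inputs. All names here, and the Beta(α, α) mixing borrowed from Mixup convention, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dip_predict(f, x, pool, n_samples=16, alpha=1.0):
    """Monte Carlo estimate of a DIP-style prediction.

    `f` maps a batch of inputs to class probabilities; `pool` holds
    reference samples to mix with. The same mixed-input predictor is
    used at train and test time, so the two are treated equally.
    """
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    partners = pool[rng.integers(len(pool), size=n_samples)]
    mixed = lam * x + (1.0 - lam) * partners     # broadcast x over the samples
    return f(mixed).mean(axis=0)                 # average predictions, not labels
```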

Attentive Adversarial Learning for Domain-Invariant Training

Title Attentive Adversarial Learning for Domain-Invariant Training
Authors Zhong Meng, Jinyu Li, Yifan Gong
Abstract Adversarial domain-invariant training (ADIT) has proven effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR). In ADIT, an auxiliary domain classifier takes in equally-weighted deep features from a deep neural network (DNN) acoustic model and is trained to improve their domain-invariance by optimizing an adversarial loss function. In this work, we propose an attentive ADIT (AADIT), in which we advance the domain classifier with an attention mechanism to automatically weight the input deep features according to their importance in domain classification. With this attentive re-weighting, AADIT can focus on the domain normalization of phonetic components that are more susceptible to domain variability and generates deep features with improved domain-invariance and senone-discriminativity over ADIT. Most importantly, the attention block serves only as an external component to the DNN acoustic model and is not involved in ASR, so AADIT can be used to improve acoustic modeling with any DNN architecture. More generally, the same methodology can improve any adversarial learning system with an auxiliary discriminator. Evaluated on the CHiME-3 dataset, AADIT achieves 13.6% and 9.3% relative WER improvements over a multi-conditional model and a strong ADIT baseline, respectively.
Tasks Speech Recognition
Published 2019-04-28
URL http://arxiv.org/abs/1904.12400v1
PDF http://arxiv.org/pdf/1904.12400v1.pdf
PWC https://paperswithcode.com/paper/attentive-adversarial-learning-for-domain
Repo
Framework
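
A minimal PyTorch sketch of the moving parts: a gradient-reversal layer couples the domain classifier adversarially to the acoustic model, and a learned per-frame attention score replaces the equal weighting of deep features. The single-linear-layer scorer and all sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the
    backward pass -- the standard gradient-reversal trick."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class AttentiveDomainClassifier(nn.Module):
    """Attention-weighted domain classifier in the spirit of AADIT."""
    def __init__(self, feat_dim, n_domains):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)      # per-frame attention score
        self.clf = nn.Linear(feat_dim, n_domains)

    def forward(self, feats, lam=1.0):
        # feats: (batch, frames, feat_dim) deep features from the acoustic model
        feats = GradReverse.apply(feats, lam)
        w = torch.softmax(self.score(feats), dim=1)   # attention over frames
        pooled = (w * feats).sum(dim=1)               # re-weighted pooling
        return self.clf(pooled)                       # domain logits
```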

Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents

Title Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents
Authors Zalán Borsos, Andrey Khorlin, Andrea Gesmundo
Abstract Recent advances in Neural Architecture Search (NAS) have produced state-of-the-art architectures on several tasks. NAS shifts the efforts of human experts from developing novel architectures directly to designing architecture search spaces and methods to explore them efficiently. The search space definition captures prior knowledge about the properties of the architectures and it is crucial for the complexity and the performance of the search algorithm. However, different search space definitions require restarting the learning process from scratch. We propose a novel agent based on the Transformer that supports joint training and efficient transfer of prior knowledge between multiple search spaces and tasks.
Tasks Neural Architecture Search, Transfer Learning
Published 2019-06-19
URL https://arxiv.org/abs/1906.08102v1
PDF https://arxiv.org/pdf/1906.08102v1.pdf
PWC https://paperswithcode.com/paper/transfer-nas-knowledge-transfer-between
Repo
Framework

Extending Event Detection to New Types with Learning from Keywords

Title Extending Event Detection to New Types with Learning from Keywords
Authors Viet Dac Lai, Thien Huu Nguyen
Abstract Traditional event detection classifies a word or a phrase in a given sentence into a set of predefined event types. The limitation of such a predefined set is that it prevents the adaptation of event detection models to new event types. We study a novel formulation of event detection that describes types via several keywords to match contexts in documents, which facilitates extending the models to new types. We introduce a novel feature-based attention mechanism for convolutional neural networks for event detection in this new formulation. Our extensive experiments demonstrate the benefits of the new formulation for extending event detection to new types, as well as the effectiveness of the proposed attention mechanism for this problem.
Tasks
Published 2019-10-24
URL https://arxiv.org/abs/1910.11368v1
PDF https://arxiv.org/pdf/1910.11368v1.pdf
PWC https://paperswithcode.com/paper/extending-event-detection-to-new-types-with
Repo
Framework
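
Stripped of the CNN and attention mechanism, the keyword-based formulation can be pictured as scoring a trigger's context against a prototype embedding per type. The sketch below is that bare-bones version, with all names illustrative.

```python
import numpy as np

def type_scores(context_vecs, type_keywords, embed):
    """Score a candidate trigger's context against keyword-defined event types.

    `embed` maps a word to a vector; each type is represented by the mean
    embedding of its keywords, and types are ranked by cosine similarity.
    """
    ctx = context_vecs.mean(axis=0)
    ctx = ctx / np.linalg.norm(ctx)
    scores = {}
    for etype, words in type_keywords.items():
        proto = np.mean([embed(w) for w in words], axis=0)
        scores[etype] = float(ctx @ (proto / np.linalg.norm(proto)))
    return scores
```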

Classification with unknown class-conditional label noise on non-compact feature spaces

Title Classification with unknown class-conditional label noise on non-compact feature spaces
Authors Henry W J Reeve, Ata Kaban
Abstract We investigate the problem of classification in the presence of unknown class-conditional label noise, in which the labels observed by the learner have been corrupted with some unknown class-dependent probability. In order to obtain finite sample rates, previous approaches to classification with unknown class-conditional label noise have required that the regression function be close to its extrema on sets of large measure. We consider this problem in the setting of non-compact metric spaces, where the regression function need not attain its extrema. In this setting we determine the minimax optimal learning rates (up to logarithmic factors). The rates display interesting threshold behaviour: when the regression function approaches its extrema at a sufficient rate, the optimal learning rates are of the same order as those obtained in the label-noise-free setting. If the regression function approaches its extrema more gradually, then classification performance necessarily degrades. In addition, we present an adaptive algorithm which attains these rates without prior knowledge of either the distributional parameters or the local density. This identifies, for the first time, a scenario in which finite sample rates are achievable in the label-noise setting but differ from the optimal rates without label noise.
Tasks
Published 2019-02-14
URL https://arxiv.org/abs/1902.05627v2
PDF https://arxiv.org/pdf/1902.05627v2.pdf
PWC https://paperswithcode.com/paper/classification-with-unknown-class-conditional
Repo
Framework
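
Concretely, the noise model in question is the standard class-conditional one; in generic notation:

```latex
% The learner observes a corrupted label \tilde{Y} whose flip probability
% depends only on the true class Y, with the noise rates unknown:
\[
  \mathbb{P}\bigl(\tilde{Y}=1 \mid Y=0\bigr) = \rho_0 ,
  \qquad
  \mathbb{P}\bigl(\tilde{Y}=0 \mid Y=1\bigr) = \rho_1 ,
  \qquad
  \rho_0 + \rho_1 < 1 .
\]
```

The threshold behaviour then turns on how fast the regression function η(x) = P(Y = 1 | X = x) approaches its extrema, which on a non-compact space it need not attain.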

Shifted Randomized Singular Value Decomposition

Title Shifted Randomized Singular Value Decomposition
Authors Ali Basirat
Abstract We extend the randomized singular value decomposition (SVD) algorithm of Halko et al. (2011) to estimate the SVD of a shifted data matrix without explicitly constructing the matrix in memory. With no loss in the accuracy of the original algorithm, the extended algorithm provides a more efficient way of matrix factorization. The algorithm facilitates the low-rank approximation and principal component analysis (PCA) of off-center data matrices. When applied to different types of data matrices, our experimental results confirm the advantages of the extensions made to the original algorithm.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11772v2
PDF https://arxiv.org/pdf/1911.11772v2.pdf
PWC https://paperswithcode.com/paper/sifted-randomized-singular-value
Repo
Framework
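
The trick is easy to state: in the randomized SVD recipe of Halko et al., every product with the shifted matrix A − μ1ᵀ expands as (A − μ1ᵀ)X = AX − μ(1ᵀX), so the centered matrix never needs to be formed. A minimal numpy sketch, following the standard recipe rather than the paper's code:

```python
import numpy as np

def shifted_rsvd(A, mu, k, n_oversample=10, seed=0):
    """Rank-k randomized SVD of A - mu @ 1^T without materializing it."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    mu = mu.reshape(m, 1)
    ones = np.ones((n, 1))

    Omega = rng.standard_normal((n, k + n_oversample))
    Y = A @ Omega - mu @ (ones.T @ Omega)   # (A - mu 1^T) Omega, implicitly
    Q, _ = np.linalg.qr(Y)                  # orthonormal basis for the range
    B = Q.T @ A - (Q.T @ mu) @ ones.T       # Q^T (A - mu 1^T), still implicit
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# e.g. PCA of an off-center matrix: shifted_rsvd(A, A.mean(axis=1), k=5)
```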

A Study of the Learning Progress in Neural Architecture Search Techniques

Title A Study of the Learning Progress in Neural Architecture Search Techniques
Authors Prabhant Singh, Tobias Jacobs, Sebastien Nicolas, Mischa Schmidt
Abstract In neural architecture search, the structure of the neural network that best models a given dataset is determined by an automated search process. Efficient Neural Architecture Search (ENAS), proposed by Pham et al. (2018), has recently received considerable attention due to its ability to find excellent architectures within a comparably short search time. In this work, which is motivated by the quest to further improve the learning speed of architecture search, we evaluate the learning progress of the controller which generates the architectures in ENAS. We measure the progress by comparing the architectures generated by it at different controller training epochs, where architectures are evaluated after being re-trained from scratch. As a surprising result, we find that the learning curves are completely flat, i.e., there is no observable progress of the controller in terms of the performance of its generated architectures. This observation is consistent across the CIFAR-10 and CIFAR-100 datasets and two different search spaces. We conclude that the high quality of the models generated by ENAS is a result of the search space design rather than the controller training, and our results indicate that one-shot architecture design is an efficient alternative to architecture search by ENAS.
Tasks Neural Architecture Search
Published 2019-06-18
URL https://arxiv.org/abs/1906.07590v1
PDF https://arxiv.org/pdf/1906.07590v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-the-learning-progress-in-neural
Repo
Framework

Using Pairwise Occurrence Information to Improve Knowledge Graph Completion on Large-Scale Datasets

Title Using Pairwise Occurrence Information to Improve Knowledge Graph Completion on Large-Scale Datasets
Authors Esma Balkir, Masha Naslidnyk, Dave Palfrey, Arpit Mittal
Abstract Bilinear models such as DistMult and ComplEx are effective methods for knowledge graph (KG) completion. However, they require large batch sizes, which becomes a performance bottleneck when training on large scale datasets due to memory constraints. In this paper we use occurrences of entity-relation pairs in the dataset to construct a joint learning model and to increase the quality of sampled negatives during training. We show on three standard datasets that when these two techniques are combined, they give a significant improvement in performance, especially when the batch size and the number of generated negative examples are low relative to the size of the dataset. We then apply our techniques to a dataset containing 2 million entities and demonstrate that our model outperforms the baseline by 2.8% absolute on hits@1.
Tasks Knowledge Graph Completion, Link Prediction
Published 2019-10-25
URL https://arxiv.org/abs/1910.11583v1
PDF https://arxiv.org/pdf/1910.11583v1.pdf
PWC https://paperswithcode.com/paper/using-pairwise-occurrence-information-to
Repo
Framework
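
One way to picture the negative-sampling side of this: bias the choice of corrupted tails by how often each entity co-occurs with the relation, so sampled negatives are plausible rather than trivially wrong. The add-one weighting below is an illustrative assumption, not the paper's formula.

```python
import numpy as np

def sample_negatives(rel, pair_counts, entities, n_neg, rng):
    """Draw negative tail entities for a relation, biased by co-occurrence.

    `pair_counts` maps (entity, relation) pairs to their corpus counts;
    entities that appear with `rel` more often are sampled more, which
    tends to yield harder negatives.
    """
    weights = np.array([pair_counts.get((e, rel), 0) + 1.0 for e in entities])
    probs = weights / weights.sum()
    return rng.choice(entities, size=n_neg, p=probs)
```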

Semi-supervised Bootstrapping of Dialogue State Trackers for Task Oriented Modelling

Title Semi-supervised Bootstrapping of Dialogue State Trackers for Task Oriented Modelling
Authors Bo-Hsiang Tseng, Marek Rei, Paweł Budzianowski, Richard E. Turner, Bill Byrne, Anna Korhonen
Abstract Dialogue systems benefit greatly from optimizing on detailed annotations, such as transcribed utterances, internal dialogue state representations and dialogue act labels. However, collecting these annotations is expensive and time-consuming, holding back development in the area of dialogue modelling. In this paper, we investigate semi-supervised learning methods that are able to reduce the amount of required intermediate labelling. We find that by leveraging un-annotated data instead, the amount of turn-level annotations of dialogue state can be significantly reduced when building a neural dialogue system. Our analysis on the MultiWOZ corpus, covering a range of domains and topics, finds that annotations can be reduced by up to 30% while maintaining equivalent system performance. We also describe and evaluate the first end-to-end dialogue model created for the MultiWOZ corpus.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11672v1
PDF https://arxiv.org/pdf/1911.11672v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-bootstrapping-of-dialogue-1
Repo
Framework

Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory: An Abstract Algebraic Interpretation of Convolutional Neural Networks

Title Seeing Convolution Through the Eyes of Finite Transformation Semigroup Theory: An Abstract Algebraic Interpretation of Convolutional Neural Networks
Authors Andrew Hryniowski, Alexander Wong
Abstract Researchers are actively trying to gain better insights into the representational properties of convolutional neural networks for guiding better network designs and for interpreting a network’s computational nature. Gaining such insights can be an arduous task due to the number of parameters in a network and the complexity of a network’s architecture. Current approaches to neural network interpretation include Bayesian probabilistic interpretations and information theoretic interpretations. In this study, we take a different approach to studying convolutional neural networks by proposing an abstract algebraic interpretation using finite transformation semigroup theory. Specifically, convolutional layers are broken up and mapped to a finite space. The state space of the proposed finite transformation semigroup is then defined as a single element within the convolutional layer, with the acting elements defined by surrounding state elements combined with convolution kernel elements. Generators of the finite transformation semigroup are defined to complete the interpretation. We leverage this approach to analyze the basic properties of the resulting finite transformation semigroup to gain insights into the representational properties of convolutional neural networks, including insights into quantized network representation. Such a finite transformation semigroup interpretation can also enable better understanding outside the confines of fixed lattice data structures, making it useful for handling data that lie on irregular lattices. Furthermore, the proposed abstract algebraic interpretation is shown to be viable for interpreting convolutional operations within a variety of convolutional neural network architectures.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10901v1
PDF https://arxiv.org/pdf/1905.10901v1.pdf
PWC https://paperswithcode.com/paper/seeing-convolution-through-the-eyes-of-finite
Repo
Framework
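
For readers without the algebra background, the object being invoked is standard; in generic notation (not the paper's specific construction):

```latex
% A finite transformation semigroup is a pair (Q, S) with Q a finite
% state set and S a set of maps from Q to Q closed under composition:
\[
  f, g \in S \;\Longrightarrow\; f \circ g \in S .
\]
```

In the paper's reading, Q is a single element within a convolutional layer, the acting maps are built from surrounding state elements combined with kernel elements, and a chosen set of generators completes the interpretation.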

Tracking as A Whole: Multi-Target Tracking by Modeling Group Behavior with Sequential Detection

Title Tracking as A Whole: Multi-Target Tracking by Modeling Group Behavior with Sequential Detection
Authors Yuan Yuan, Yuwei Lu, Qi Wang
Abstract Video-based vehicle detection and tracking is one of the most important components of Intelligent Transportation Systems (ITS). When it comes to road junctions, the problem becomes even more difficult due to the occlusions and complex interactions among vehicles. In order to obtain precise detection and tracking results, in this work we propose a novel tracking-by-detection framework. In the detection stage, we present a sequential detection model to deal with serious occlusions. In the tracking stage, we model group behavior to treat complex interactions with overlaps and ambiguities. The main contributions of this paper are twofold: 1) Shape prior is exploited in the sequential detection model to tackle occlusions in crowded scenes. 2) Traffic force is defined in the traffic scene to model group behavior, and it can assist in handling complex interactions among vehicles. We evaluate the proposed approach on real surveillance videos at road junctions, and the results demonstrate the effectiveness of our method.
Tasks
Published 2019-04-22
URL http://arxiv.org/abs/1904.12641v1
PDF http://arxiv.org/pdf/1904.12641v1.pdf
PWC https://paperswithcode.com/paper/190412641
Repo
Framework