January 26, 2020

3277 words 16 mins read

Paper Group ANR 1462

TinySearch – Semantics based Search Engine using Bert Embeddings

Title TinySearch – Semantics based Search Engine using Bert Embeddings
Authors Manish Patel
Abstract Existing search engines use keyword matching or tf-idf based matching to map a query to web documents and rank them. They also consider other factors, such as PageRank, hubs-and-authorities scores, and knowledge graphs, to make the results more meaningful. However, existing search engines fail to capture the meaning of a query when it becomes long and complex. BERT, introduced by Google in 2018, provides embeddings for words as well as sentences. In this paper, I develop a semantics-oriented search engine using neural networks and BERT embeddings that can search for a query and rank the documents in order from most meaningful to least meaningful. The results show an improvement over an existing search engine for complex queries on a given set of documents.
Tasks Knowledge Graphs
Published 2019-08-07
URL https://arxiv.org/abs/1908.02451v1
PDF https://arxiv.org/pdf/1908.02451v1.pdf
PWC https://paperswithcode.com/paper/tinysearch-semantics-based-search-engine
Repo
Framework
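The ranking step the abstract describes can be illustrated with a minimal embedding-based retrieval sketch. This is not the paper's architecture (which additionally trains a neural ranker); the model name, documents, and query below are placeholders, and the sentence-transformers library stands in for any BERT-style sentence encoder.

```python
# Minimal sketch: rank documents by cosine similarity of BERT-style sentence
# embeddings. Model name and documents are placeholders; the paper also trains
# a neural ranking component on top of the embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any BERT-based encoder

documents = [
    "BERT provides contextual embeddings for words and sentences.",
    "PageRank scores web pages by their link structure.",
    "TF-IDF weighs terms by frequency and rarity across documents.",
]
doc_vecs = model.encode(documents, normalize_embeddings=True)

def search(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                 # cosine similarity (vectors are unit norm)
    order = np.argsort(-scores)[:k]
    return [(documents[i], float(scores[i])) for i in order]

print(search("how do neural language models represent sentence meaning?"))
```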

All Neural Networks are Created Equal

Title All Neural Networks are Created Equal
Authors Guy Hacohen, Leshem Choshen, Daphna Weinshall
Abstract One of the unresolved questions in deep learning is the nature of the solutions that are being discovered. We investigate the collection of solutions reached by the same network architecture, with different random initialization of weights and random mini-batches. These solutions are shown to be rather similar - more often than not, each train and test example is either classified correctly by all the networks, or by none at all. Surprisingly, all the network instances seem to share the same learning dynamics, whereby initially the same train and test examples are correctly recognized by the learned model, followed by other examples which are learned in roughly the same order. When extending the investigation to heterogeneous collections of neural network architectures, once again examples are seen to be learned in the same order irrespective of architecture, although the more powerful architecture may continue to learn and thus achieve higher accuracy. This pattern of results remains true even when the composition of classes in the test set is unrelated to the train set, for example, when using out of sample natural images or even artificial images. To show the robustness of these phenomena we provide an extensive summary of our empirical study, which includes hundreds of graphs describing tens of thousands of networks with varying NN architectures, hyper-parameters and domains. We also discuss cases where this pattern of similarity breaks down, which show that the reported similarity is not an artifact of optimization by gradient descent. Rather, the observed pattern of similarity is characteristic of learning complex problems with big networks. Finally, we show that this pattern of similarity seems to be strongly correlated with effective generalization.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10854v4
PDF https://arxiv.org/pdf/1905.10854v4.pdf
PWC https://paperswithcode.com/paper/all-neural-networks-are-created-equal
Repo
Framework
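The central measurement in this paper is the per-example agreement across independently trained networks. A minimal sketch of that statistic follows; the boolean correctness matrix is a random placeholder standing in for results collected from real training runs.

```python
# Sketch of the agreement statistic the paper reports: for a collection of
# independently trained networks, how often is each test example classified
# correctly by all of them, or by none of them? `correct` is assumed to be a
# boolean matrix of shape (num_models, num_examples) gathered elsewhere.
import numpy as np

rng = np.random.default_rng(0)
correct = rng.random((10, 1000)) > 0.3   # placeholder for real per-model results

per_example_rate = correct.mean(axis=0)             # fraction of models right per example
all_right = float(np.mean(per_example_rate == 1.0))
none_right = float(np.mean(per_example_rate == 0.0))
print(f"classified correctly by all models: {all_right:.2%}")
print(f"classified correctly by no model:   {none_right:.2%}")
```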

Neural Networks, Hypersurfaces, and Radon Transforms

Title Neural Networks, Hypersurfaces, and Radon Transforms
Authors Soheil Kolouri, Xuwang Yin, Gustavo K. Rohde
Abstract Connections between integration along hypersurfaces, Radon transforms, and neural networks are exploited to highlight an integral-geometric interpretation of neural networks. By analyzing the properties of neural networks as operators on probability distributions of the observed data, we show that the distribution of outputs for any node in a neural network can be interpreted as a nonlinear projection along hypersurfaces defined by level surfaces over the input data space. We utilize these descriptions to provide new interpretations for phenomena such as nonlinearity, pooling, activation functions, and adversarial examples in neural-network-based learning problems.
Tasks
Published 2019-07-04
URL https://arxiv.org/abs/1907.02220v1
PDF https://arxiv.org/pdf/1907.02220v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-hypersurfaces-and-radon
Repo
Framework
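To make the "projection along hypersurfaces" phrase concrete, the following writes out the standard generalized Radon transform in generic notation (not necessarily the paper's): the input distribution is integrated over the level sets of a node's pre-activation.

```latex
% Generalized Radon transform along the level sets of a node's pre-activation.
% Generic notation, assumed for illustration: p_X is the input distribution,
% g(x;w,b) = w^\top x + b is a node's pre-activation, and the transform
% integrates p_X over the hypersurface {x : g(x;w,b) = t}.
\[
  (\mathcal{R}\, p_X)(t; w, b) \;=\; \int_{\mathbb{R}^d} p_X(x)\,
      \delta\bigl(t - g(x; w, b)\bigr)\, dx ,
\]
% so the distribution of the node's output \sigma(g(X;w,b)) is the pushforward
% of this one-dimensional projection through the nonlinear activation \sigma.
```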

Human-In-The-Loop Learning of Qualitative Preference Models

Title Human-In-The-Loop Learning of Qualitative Preference Models
Authors Joseph Allen, Ahmed Moussa, Xudong Liu
Abstract In this work, we present a novel human-in-the-loop framework to help the human user understand the decision-making process involved in choosing preferred options. We focus on qualitative preference models over alternatives from combinatorial domains. The framework is interactive: the user provides her behavioral data to the framework, and the framework explains the learned model to the user. It is iterative: the framework collects feedback on the learned model from the user and tries to improve it accordingly until the user terminates the iteration. In order to communicate the learned preference model to the user, we develop visualizations of intuitive and explainable graphical models, such as lexicographic preference trees and forests, and conditional preference networks. To this end, we discuss key aspects of our framework for lexicographic preference models.
Tasks Decision Making
Published 2019-09-19
URL https://arxiv.org/abs/1909.09064v1
PDF https://arxiv.org/pdf/1909.09064v1.pdf
PWC https://paperswithcode.com/paper/human-in-the-loop-learning-of-qualitative
Repo
Framework
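One of the model classes named in the abstract, lexicographic preference models, can be sketched in a few lines: attributes are checked in a fixed importance order, and the first attribute on which two alternatives differ decides the preference. The attribute names and preferred values below are illustrative, not from the paper.

```python
# Minimal sketch of a linear lexicographic preference model over binary
# attributes. Attributes are inspected in a fixed importance order; the first
# attribute on which two alternatives differ decides the preference.
importance_order = ["price_low", "nonstop", "window_seat"]   # most to least important
preferred_value = {"price_low": True, "nonstop": True, "window_seat": False}

def prefers(a: dict, b: dict) -> int:
    """Return 1 if a is preferred to b, -1 if b is preferred, 0 if equivalent."""
    for attr in importance_order:
        if a[attr] != b[attr]:
            return 1 if a[attr] == preferred_value[attr] else -1
    return 0

flight_a = {"price_low": True, "nonstop": False, "window_seat": True}
flight_b = {"price_low": True, "nonstop": True, "window_seat": False}
print(prefers(flight_a, flight_b))   # -1: flight_b wins on the 'nonstop' attribute
```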

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

Title Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
Authors Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dongdong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Abstract Modern deep neural networks are often vulnerable to adversarial samples. Since the first optimization-based attack method, many subsequent methods have been proposed to improve attack performance and speed. Recently, generation-based methods have received much attention because they directly use feed-forward networks to generate adversarial samples, avoiding the time-consuming iterative attack procedure of optimization-based and gradient-based methods. However, current generation-based methods can only attack one specific target (category) with one model, making them inapplicable to real classification systems that often have hundreds or thousands of categories. In this paper, we propose the first Multi-target Adversarial Network (MAN), which can generate multi-target adversarial samples with a single model. By incorporating the specified category information into the intermediate features, it can attack any category of the target classification model at runtime. Experiments show that the proposed MAN produces stronger attacks and has better transferability than previous state-of-the-art methods on both the multi-target and single-target attack tasks. We further use the adversarial samples generated by our MAN to improve the robustness of the classification model, which then achieves better classification accuracy than other methods when attacked in various ways.
Tasks
Published 2019-08-14
URL https://arxiv.org/abs/1908.05185v1
PDF https://arxiv.org/pdf/1908.05185v1.pdf
PWC https://paperswithcode.com/paper/once-a-man-towards-multi-target-attack-via
Repo
Framework
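The core idea, a single feed-forward generator conditioned on the desired target class, can be sketched as below. The layer sizes, the conditioning mechanism (concatenating a broadcast class embedding), and the perturbation budget are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Hedged sketch: one generator, conditioned on the target class, produces a
# bounded perturbation so a single model can mount attacks toward any category.
import torch
import torch.nn as nn

class MultiTargetGenerator(nn.Module):
    def __init__(self, num_classes: int, channels: int = 3, emb_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Conv2d(channels + emb_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, target, eps: float = 8 / 255):
        # Broadcast the target-class embedding over the spatial dims and
        # concatenate it with the image as extra conditioning channels.
        b, _, h, w = x.shape
        cond = self.embed(target).view(b, -1, 1, 1).expand(b, -1, h, w)
        delta = eps * self.net(torch.cat([x, cond], dim=1))
        return (x + delta).clamp(0, 1)

gen = MultiTargetGenerator(num_classes=1000)
x = torch.rand(4, 3, 32, 32)
target = torch.tensor([3, 7, 7, 42])
x_adv = gen(x, target)            # adversarial samples aimed at per-image targets
print(x_adv.shape)
```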

Towards Robust Direct Perception Networks for Automated Driving

Title Towards Robust Direct Perception Networks for Automated Driving
Authors Chih-Hong Cheng
Abstract We consider the problem of engineering robust direct perception neural networks with output being regression. Such networks take high dimensional input image data, and they produce affordances such as the curvature of the upcoming road segment or the distance to the front vehicle. Our proposal starts by allowing a neural network prediction to deviate from the label with tolerance $\Delta$. The source of tolerance can be either contractual or from limiting factors where two entities may label the same data with slightly different numerical values. The tolerance motivates the use of a non-standard loss function where the loss is set to $0$ so long as the prediction-to-label distance is less than $\Delta$. We further extend the loss function and define a new provably robust criterion that is parametric to the allowed output tolerance $\Delta$, the layer index $\tilde{l}$ where perturbation is considered, and the maximum perturbation amount $\kappa$. During training, the robust loss is computed by first propagating symbolic errors from the $\tilde{l}$-th layer (with quantity bounded by $\kappa$) to the output layer, followed by computing the overflow between the error bounds and the allowed tolerance. The overall concept is experimented in engineering a direct perception neural network for understanding the central position of the ego-lane in pixel coordinates.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13600v1
PDF https://arxiv.org/pdf/1909.13600v1.pdf
PWC https://paperswithcode.com/paper/towards-robust-direct-perception-networks-for
Repo
Framework
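The tolerance-aware loss described in the abstract (zero loss whenever the prediction is within $\Delta$ of the label) can be written in a few lines. Penalizing the overflow quadratically is one plausible choice, not necessarily the paper's; the symbolic error propagation from layer $\tilde{l}$ is not reproduced here.

```python
# Sketch of a tolerance-aware regression loss: zero inside the tolerance band,
# quadratic in the overflow beyond it.
import torch

def tolerance_loss(pred: torch.Tensor, label: torch.Tensor, delta: float) -> torch.Tensor:
    overflow = torch.clamp(torch.abs(pred - label) - delta, min=0.0)
    return (overflow ** 2).mean()

pred = torch.tensor([0.48, 0.55, 0.90])
label = torch.tensor([0.50, 0.50, 0.50])
print(tolerance_loss(pred, label, delta=0.05))   # only the last prediction is penalized
```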

Model-free Deep Reinforcement Learning for Urban Autonomous Driving

Title Model-free Deep Reinforcement Learning for Urban Autonomous Driving
Authors Jianyu Chen, Bodi Yuan, Masayoshi Tomizuka
Abstract Urban autonomous driving decision making is challenging due to complex road geometry and multi-agent interactions. Current decision-making methods mostly rely on manually designed driving policies, which may yield sub-optimal solutions and are expensive to develop, generalize, and maintain at scale. On the other hand, with reinforcement learning (RL), a policy can be learned and improved automatically without any manual design. However, current RL methods generally do not work well in complex urban scenarios. In this paper, we propose a framework that enables model-free deep reinforcement learning in challenging urban autonomous driving scenarios. We design a specific input representation and use visual encoding to capture low-dimensional latent states. Several state-of-the-art model-free deep RL algorithms are implemented in our framework, with several tricks to improve their performance. We evaluate our method on a challenging roundabout task with dense surrounding vehicles in a high-definition driving simulator. The results show that our method solves the task well and significantly outperforms the baseline.
Tasks Autonomous Driving, Decision Making
Published 2019-04-20
URL https://arxiv.org/abs/1904.09503v2
PDF https://arxiv.org/pdf/1904.09503v2.pdf
PWC https://paperswithcode.com/paper/model-free-deep-reinforcement-learning-for
Repo
Framework
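The pipeline shape in the abstract, a visual encoder that compresses the designed input into a low-dimensional latent state consumed by a model-free policy, can be sketched as follows. The encoder architecture, latent size, and action head are placeholders; the paper's exact input representation and RL algorithms are not reproduced.

```python
# Hedged sketch: bird's-eye-view observation -> latent state -> policy output.
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, obs):                 # obs: (B, 3, H, W) bird's-eye image
        return self.fc(self.conv(obs))      # low-dimensional latent state

policy = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))  # [steer, throttle]
latent = VisualEncoder()(torch.rand(1, 3, 64, 64))
print(policy(latent))
```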

Convolutional Composer Classification

Title Convolutional Composer Classification
Authors Harsh Verma, John Thickstun
Abstract This paper investigates end-to-end learnable models for attributing composers to musical scores. We introduce several pooled, convolutional architectures for this task and draw connections between our approach and classical learning approaches based on global and n-gram features. We evaluate models on a corpus of 2,500 scores from the KernScores collection, authored by a variety of composers spanning the Renaissance era to the early 20th century. This corpus has substantial overlap with the corpora used in several previous, smaller studies; we compare our results on subsets of the corpus to these previous works.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.11737v1
PDF https://arxiv.org/pdf/1911.11737v1.pdf
PWC https://paperswithcode.com/paper/convolutional-composer-classification
Repo
Framework
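A pooled convolutional classifier of the kind the abstract describes can be sketched with a piano-roll-style input: convolutions over a (time x pitch) grid followed by global pooling, so scores of arbitrary length map to a fixed-size composer prediction. The input encoding and layer sizes are illustrative, not the paper's exact models.

```python
# Hedged sketch of a pooled convolutional composer classifier.
import torch
import torch.nn as nn

num_composers = 13
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(7, 12), padding=(3, 6)), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=(7, 12), padding=(3, 6)), nn.ReLU(),
    nn.AdaptiveMaxPool2d(1), nn.Flatten(),          # global pooling over time and pitch
    nn.Linear(64, num_composers),
)

piano_roll = torch.rand(2, 1, 512, 88)   # (batch, channel, time steps, pitches)
print(model(piano_roll).shape)           # (2, num_composers) logits
```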

Learning with Hierarchical Complement Objective

Title Learning with Hierarchical Complement Objective
Authors Hao-Yun Chen, Li-Huang Tsai, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan
Abstract Label hierarchies are common in many vision-related problems, ranging from the explicit label hierarchies in image classification to the latent label hierarchies in semantic segmentation. Nevertheless, state-of-the-art methods often deploy a cross-entropy loss that implicitly assumes class labels to be exclusive and thus independent of each other. Motivated by the fact that classes from the same parental category usually share certain similarities, we design a new training paradigm called Hierarchical Complement Objective Training (HCOT) that leverages the information in the label hierarchy. HCOT maximizes the probability of the ground-truth class and, at the same time, neutralizes the probabilities of the rest of the classes in a hierarchical fashion, making the model take advantage of the label hierarchy explicitly. The proposed HCOT is evaluated on both image classification and semantic segmentation tasks. Experimental results confirm that HCOT outperforms state-of-the-art models on CIFAR-100, ImageNet-2012, and PASCAL-Context. The study further demonstrates that HCOT can be applied to tasks with latent label hierarchies, a common characteristic of many machine learning tasks.
Tasks Image Classification, Semantic Segmentation
Published 2019-11-17
URL https://arxiv.org/abs/1911.07257v1
PDF https://arxiv.org/pdf/1911.07257v1.pdf
PWC https://paperswithcode.com/paper/learning-with-hierarchical-complement
Repo
Framework
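The complement-objective idea that HCOT builds on can be sketched flatly: alongside cross-entropy on the ground-truth class, push the predicted probabilities of all other classes toward a high-entropy (flat) distribution. The hierarchical part of HCOT, which applies this within and across parent categories of the label hierarchy, is not reproduced in this sketch.

```python
# Hedged sketch: cross-entropy plus a (non-hierarchical) complement-entropy term.
import torch
import torch.nn.functional as F

def complement_entropy(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=1)
    mask = F.one_hot(target, logits.size(1)).bool()
    comp = probs.masked_fill(mask, 0.0)
    comp = comp / comp.sum(dim=1, keepdim=True).clamp_min(1e-12)   # renormalize complement
    ent = -(comp.clamp_min(1e-12).log() * comp).sum(dim=1)
    return -ent.mean()    # negated: minimizing this maximizes complement entropy

logits = torch.randn(8, 100)
target = torch.randint(0, 100, (8,))
loss = F.cross_entropy(logits, target) + complement_entropy(logits, target)
print(loss)
```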

Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Title Watch, Try, Learn: Meta-Learning from Demonstrations and Reward
Authors Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
Abstract Imitation learning allows agents to learn complex behaviors from demonstrations. However, learning a complex vision-based task may require an impractical number of demonstrations. Meta-imitation learning is a promising approach towards enabling agents to learn a new task from one or a few demonstrations by leveraging experience from learning similar tasks. In the presence of task ambiguity or unobserved dynamics, demonstrations alone may not provide enough information; an agent must also try the task to successfully infer a policy. In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. In comparison to meta-imitation, this approach enables the agent to effectively and efficiently improve itself autonomously beyond the demonstration data. In comparison to meta-reinforcement learning, we can scale to substantially broader distributions of tasks, as the demonstration reduces the burden of exploration. Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks.
Tasks Imitation Learning, Meta-Learning
Published 2019-06-07
URL https://arxiv.org/abs/1906.03352v4
PDF https://arxiv.org/pdf/1906.03352v4.pdf
PWC https://paperswithcode.com/paper/watch-try-learn-meta-learning-from
Repo
Framework

HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking

Title HM-NAS: Efficient Neural Architecture Search via Hierarchical Masking
Authors Shen Yan, Biyi Fang, Faen Zhang, Yu Zheng, Xiao Zeng, Hui Xu, Mi Zhang
Abstract The use of automatic methods, often referred to as Neural Architecture Search (NAS), in designing neural network architectures has recently drawn considerable attention. In this work, we present an efficient NAS approach, named HM-NAS, that generalizes existing weight-sharing-based NAS approaches. Existing weight-sharing-based NAS approaches still adopt hand-designed heuristics to generate architecture candidates. As a consequence, the space of architecture candidates is constrained to a subset of all possible architectures, making the architecture search results sub-optimal. HM-NAS addresses this limitation via two innovations. First, HM-NAS incorporates a multi-level architecture encoding scheme to enable searching for more flexible network architectures. Second, it discards the hand-designed heuristics and incorporates a hierarchical masking scheme that automatically learns and determines the optimal architecture. Compared to state-of-the-art weight-sharing-based approaches, HM-NAS is able to achieve better architecture search performance and competitive model evaluation accuracy. Without the constraint imposed by the hand-designed heuristics, our searched networks contain more flexible and meaningful architectures that existing weight-sharing-based NAS approaches are not able to discover.
Tasks Neural Architecture Search
Published 2019-08-31
URL https://arxiv.org/abs/1909.00122v2
PDF https://arxiv.org/pdf/1909.00122v2.pdf
PWC https://paperswithcode.com/paper/hm-nas-efficient-neural-architecture-search
Repo
Framework
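The masking idea can be sketched at a single level: each edge of the weight-sharing search space keeps a learnable real-valued mask over its candidate operations, and the final architecture retains only the operations whose mask passes a threshold. HM-NAS's full scheme masks at several levels (operations, edges, weights); the candidate operations and threshold below are illustrative.

```python
# Hedged sketch of a single-level learnable mask over candidate operations.
import torch
import torch.nn as nn

class MaskedMixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        self.mask_logits = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x, discretize: bool = False):
        m = torch.sigmoid(self.mask_logits)
        if discretize:                      # after search: hard selection
            m = (m > 0.5).float()
        return sum(mi * op(x) for mi, op in zip(m, self.ops))

edge = MaskedMixedOp(channels=16)
out = edge(torch.rand(1, 16, 8, 8))
print(out.shape)
```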

Efficient Novelty-Driven Neural Architecture Search

Title Efficient Novelty-Driven Neural Architecture Search
Authors Miao Zhang, Huiqi Li, Shirui Pan, Taoping Liu, Steven Su
Abstract One-shot Neural Architecture Search (NAS) has recently attracted broad attention due to its capacity to reduce computational hours through weight sharing. However, extensive experiments in several recent works show that, for one-shot NAS, there is no positive correlation between the validation accuracy obtained with weights inherited from the supernet and the test accuracy after retraining. Rather than devising a controller to find the best-performing architecture with inherited weights, this paper focuses on how to sample architectures for training the supernet so as to make it more predictive. A single-path supernet is adopted, in which only a small part of the weights is optimized in each step, to greatly reduce the memory demand. Furthermore, we abandon complicated reward-based architecture-sampling controllers and instead sample architectures to train the supernet based on novelty search. An efficient novelty search method for NAS is devised in this paper, and extensive experiments demonstrate the effectiveness and efficiency of our novelty-search-based architecture sampling method. The best architecture obtained by our algorithm in the same search space achieves a state-of-the-art test error rate of 2.51% on CIFAR-10 with only 7.5 hours of search time on a single GPU, and a validation perplexity of 60.02 and a test perplexity of 57.36 on PTB. We also transfer these searched cell structures to the larger datasets ImageNet and WikiText-2, respectively.
Tasks Neural Architecture Search
Published 2019-07-22
URL https://arxiv.org/abs/1907.09109v1
PDF https://arxiv.org/pdf/1907.09109v1.pdf
PWC https://paperswithcode.com/paper/efficient-novelty-driven-neural-architecture
Repo
Framework
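Novelty-based sampling can be sketched as follows: encode each sampled architecture as a vector, score candidates by their mean distance to the k nearest architectures already in an archive, and train the supernet on the most novel candidate. The encoding, distance metric, and k are illustrative assumptions, not the paper's exact novelty search method.

```python
# Hedged sketch of novelty-driven architecture sampling for supernet training.
import numpy as np

rng = np.random.default_rng(0)
archive: list[np.ndarray] = []

def novelty(arch: np.ndarray, k: int = 5) -> float:
    if not archive:
        return float("inf")
    dists = sorted(np.linalg.norm(arch - a) for a in archive)
    return float(np.mean(dists[:k]))

def sample_architecture(num_edges: int = 8, num_ops: int = 5) -> np.ndarray:
    return rng.integers(0, num_ops, size=num_edges).astype(float)   # one op per edge

for step in range(100):
    candidates = [sample_architecture() for _ in range(10)]
    best = max(candidates, key=novelty)          # most novel candidate
    archive.append(best)
    # train_supernet_one_step(best)              # single-path supernet update (omitted)
```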

How bad is worst-case data if you know where it comes from?

Title How bad is worst-case data if you know where it comes from?
Authors Justin Y. Chen, Gregory Valiant, Paul Valiant
Abstract We introduce a framework for studying how distributional assumptions on the process by which data is partitioned into a training and test set can be leveraged to provide accurate estimation or learning algorithms, even for worst-case datasets. We consider a setting of $n$ datapoints, $x_1,\ldots,x_n$, together with a specified distribution, $P$, over partitions of these datapoints into a training set, test set, and irrelevant set. An algorithm takes as input a description of $P$ (or sample access), the indices of the test and training sets, and the datapoints in the training set, and returns a model or estimate that will be evaluated on the datapoints in the test set. We evaluate an algorithm in terms of its worst-case expected performance: the expected performance over potential test/training sets, for worst-case datapoints, $x_1,\ldots,x_n.$ This framework is a departure from more typical distributional assumptions on the datapoints (e.g. that data is drawn independently, or according to an exchangeable process), and can model a number of natural data collection processes, including processes with dependencies such as “snowball sampling” and “chain sampling”, and settings where test and training sets satisfy chronological constraints (e.g. the test instances were observed after the training instances). Within this framework, we consider the setting where datapoints are bounded real numbers, and the goal is to estimate the mean of the test set. We give an efficient algorithm that returns a weighted combination of the training set—whose weights depend on the distribution, $P$, and on the training and test set indices—and show that the worst-case expected error achieved by this algorithm is at most a multiplicative $\pi/2$ factor worse than the optimal of such algorithms. The algorithm, and its proof, leverage a surprising connection to the Grothendieck problem.
Tasks
Published 2019-11-09
URL https://arxiv.org/abs/1911.03605v1
PDF https://arxiv.org/pdf/1911.03605v1.pdf
PWC https://paperswithcode.com/paper/how-bad-is-worst-case-data-if-you-know-where
Repo
Framework
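The evaluation criterion described in the abstract can be written out explicitly. The notation below is generic, assumed for illustration; the paper's exact formulation may differ.

```latex
% Worst-case expected performance: an algorithm A receives a description of P,
% the partition indices (Tr, Te), and the training datapoints, and is judged by
% its expected test-set loss under P for the worst-case choice of datapoints.
\[
  \mathrm{err}(A) \;=\; \max_{x_1,\ldots,x_n}\;
  \mathbb{E}_{(\mathrm{Tr},\,\mathrm{Te})\sim P}
  \Bigl[\,\mathrm{loss}\bigl(A(P,\mathrm{Tr},\mathrm{Te},\{x_i\}_{i\in \mathrm{Tr}}),\;
        \{x_i\}_{i\in \mathrm{Te}}\bigr)\Bigr].
\]
% For the mean-estimation setting of the paper, the loss would compare A's
% estimate against the mean of the test-set datapoints.
```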

On the use of BERT for Neural Machine Translation

Title On the use of BERT for Neural Machine Translation
Authors Stéphane Clinchant, Kweon Woo Jung, Vassilina Nikoulina
Abstract Exploiting large pretrained models for various NMT tasks has recently gained a lot of visibility. In this work we study how BERT pretrained models can be exploited for supervised neural machine translation. We compare various ways to integrate a pretrained BERT model with an NMT model and study the impact of the monolingual data used for BERT training on the final translation quality. We use the WMT-14 English-German, IWSLT15 English-German and IWSLT14 English-Russian datasets for these experiments. In addition to the standard task test-set evaluation, we perform evaluation on out-of-domain test sets and noise-injected test sets in order to assess how BERT pretrained representations affect model robustness.
Tasks Machine Translation
Published 2019-09-27
URL https://arxiv.org/abs/1909.12744v1
PDF https://arxiv.org/pdf/1909.12744v1.pdf
PWC https://paperswithcode.com/paper/on-the-use-of-bert-for-neural-machine
Repo
Framework
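One of the simpler integration strategies such a study might compare can be sketched as follows: use a frozen pretrained BERT as the source-side encoder and let a standard Transformer decoder cross-attend to its contextual states. The checkpoint name and decoder configuration are placeholders; the paper compares several integration variants not reproduced here.

```python
# Hedged sketch: frozen BERT encoder states feeding a Transformer decoder.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased").eval()

decoder_layer = nn.TransformerDecoderLayer(d_model=768, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)

src = tokenizer(["The cat sat on the mat."], return_tensors="pt")
with torch.no_grad():
    memory = bert(**src).last_hidden_state        # (1, src_len, 768) contextual states

tgt_embeds = torch.rand(1, 5, 768)                # placeholder target-side embeddings
out = decoder(tgt_embeds, memory)                 # cross-attends to BERT representations
print(out.shape)
```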

To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions

Title To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions
Authors Junjie Zhang, Lingqiao Liu, Peng Wang, Chunhua Shen
Abstract Real-world visual data often exhibits a long-tailed distribution, where some “head” classes have a large number of samples, yet only a few samples are available for “tail” classes. Such an imbalanced distribution poses a great challenge for learning a deep neural network, which can be boiled down to a dilemma: on the one hand, we prefer to increase the exposure of tail-class samples to avoid the excessive dominance of head classes in classifier training. On the other hand, oversampling tail classes makes the network prone to over-fitting, since head-class samples are consequently under-represented. To resolve this dilemma, in this paper, we propose a simple-yet-effective auxiliary learning approach. The key idea is to split a network into a classifier part and a feature extractor part, and then employ different training strategies for each part. Specifically, to promote awareness of tail classes, a class-balanced sampling scheme is utilised for training both the classifier and the feature extractor. For the feature extractor, we also introduce an auxiliary training task, which is to train a classifier under the regular random sampling scheme. In this way, the feature extractor is jointly trained with both sampling strategies and thus can take advantage of all training data and avoid the over-fitting issue. Apart from this basic auxiliary task, we further explore the benefit of using self-supervised learning as the auxiliary task. Without any bells and whistles, our model achieves superior performance over state-of-the-art solutions.
Tasks Auxiliary Learning
Published 2019-12-10
URL https://arxiv.org/abs/1912.04486v2
PDF https://arxiv.org/pdf/1912.04486v2.pdf
PWC https://paperswithcode.com/paper/to-balance-or-not-to-balance-an
Repo
Framework
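The two-branch training described in the abstract can be sketched with a shared feature extractor, a main classifier fed class-balanced batches, and an auxiliary classifier fed regular randomly sampled batches, with the two losses summed. The backbone, head sizes, and equal loss weighting below are placeholders, not the paper's exact configuration.

```python
# Hedged sketch: shared backbone, balanced-sampling head plus auxiliary
# random-sampling head, losses summed into one update.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
main_head = nn.Linear(256, 100)        # trained on class-balanced batches
aux_head = nn.Linear(256, 100)         # trained on randomly sampled batches

def train_step(balanced_batch, random_batch, optimizer):
    (xb, yb), (xr, yr) = balanced_batch, random_batch
    loss = F.cross_entropy(main_head(backbone(xb)), yb) \
         + F.cross_entropy(aux_head(backbone(xr)), yr)   # auxiliary task
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

params = list(backbone.parameters()) + list(main_head.parameters()) + list(aux_head.parameters())
opt = torch.optim.SGD(params, lr=0.1)
fake = (torch.rand(8, 3, 32, 32), torch.randint(0, 100, (8,)))
print(train_step(fake, fake, opt))
```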