October 15, 2019

2830 words 14 mins read

Paper Group NANR 99

SNAG: Spoken Narratives and Gaze Dataset. ISNN: Impact Sound Neural Network for Audio-Visual Object Classification. Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks. Multi-Module Recurrent Neural Networks with Transfer Learning. Avoiding degradation in deep feed-forward networks by phasing out skip-connections …

SNAG: Spoken Narratives and Gaze Dataset

Title SNAG: Spoken Narratives and Gaze Dataset
Authors Preethi Vaidyanathan, Emily T. Prud'hommeaux, Jeff B. Pelz, Cecilia O. Alm
Abstract Humans rely on multiple sensory modalities when examining and reasoning over images. In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework in order to label important image regions with appropriate linguistic labels.
Tasks
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-2022/
PDF https://www.aclweb.org/anthology/P18-2022
PWC https://paperswithcode.com/paper/snag-spoken-narratives-and-gaze-dataset
Repo
Framework

ISNN: Impact Sound Neural Network for Audio-Visual Object Classification

Title ISNN: Impact Sound Neural Network for Audio-Visual Object Classification
Authors Auston Sterling, Justin Wilson, Sam Lowe, Ming C. Lin
Abstract 3D object geometry reconstruction remains a challenge when working with transparent, occluded, or highly reflective surfaces. While recent methods classify shape features using raw audio, we present a multimodal neural network optimized for estimating an object’s geometry and material. Our networks use spectrograms of recorded and synthesized object impact sounds and voxelized shape estimates to extend the capabilities of vision-based reconstruction. We evaluate our method on multiple datasets of both recorded and synthesized sounds. We further present an interactive application for real-time scene reconstruction in which a user can strike objects, producing sound that can instantly classify and segment the struck object, even if the object is transparent or visually occluded.
Tasks Object Classification
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Auston_Sterling_ISNN_-_Impact_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Auston_Sterling_ISNN_-_Impact_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/isnn-impact-sound-neural-network-for-audio
Repo
Framework
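To make the audio-visual fusion idea above concrete, here is a minimal PyTorch sketch of a two-branch classifier that merges a spectrogram branch with a voxel-grid branch. The layer sizes, late-fusion design, and class count are illustrative assumptions, not the authors' ISNN architecture.

```python
# Minimal two-branch fusion sketch (not the authors' exact ISNN architecture);
# layer sizes and the late-fusion design are illustrative assumptions.
import torch
import torch.nn as nn

class AudioVisualClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Audio branch: 2D CNN over a (1, freq, time) impact-sound spectrogram.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Shape branch: 3D CNN over a (1, D, H, W) voxelized shape estimate.
        self.shape = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        # Late fusion of the two embeddings, then classification.
        self.head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, num_classes))

    def forward(self, spectrogram, voxels):
        return self.head(torch.cat([self.audio(spectrogram), self.shape(voxels)], dim=1))

logits = AudioVisualClassifier()(torch.randn(2, 1, 64, 128), torch.randn(2, 1, 32, 32, 32))
```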

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Title Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks
Authors Grant Rotskoff, Eric Vanden-Eijnden
Abstract The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks
PDF http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks.pdf
PWC https://paperswithcode.com/paper/parameters-as-interacting-particles-long-time
Repo
Framework
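A compact way to state the scaling picture from the abstract (a sketch only; $f$ is the target function, $\varphi$ the unit, and $(c_i,\theta_i)$ the parameters viewed as particles):

$$
f_n(x) = \frac{1}{n}\sum_{i=1}^{n} c_i\,\varphi(x,\theta_i), \qquad \mathbb{E}_x\big[(f(x) - f_n(x))^2\big] = O(n^{-1}) \ \text{ as } n \to \infty .
$$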

Multi-Module Recurrent Neural Networks with Transfer Learning

Title Multi-Module Recurrent Neural Networks with Transfer Learning
Authors Filip Skurniak, Maria Janicka, Aleksander Wawer
Abstract This paper describes multiple solutions designed and tested for the problem of word-level metaphor detection. The proposed systems are all based on variants of recurrent neural network architectures. Specifically, we explore multiple sources of information: pre-trained word embeddings (GloVe), a dictionary of language concreteness, and a transfer learning scenario based on the states of an encoder network from a neural machine translation system. One of the architectures combines all three components: (1) a neural CRF (Conditional Random Fields), trained directly on the metaphor data set; (2) the neural machine translation encoder from the transfer learning scenario; (3) a neural network used to predict final labels, trained directly on the metaphor data set. Our results vary between test sets: the standalone neural CRF is the best on the submission data, while the combined system scores highest on a test subset randomly selected from the training data.
Tasks Machine Translation, Transfer Learning, Word Embeddings
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-0917/
PDF https://www.aclweb.org/anthology/W18-0917
PWC https://paperswithcode.com/paper/multi-module-recurrent-neural-networks-with
Repo
Framework
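As a rough illustration of the multi-source setup, the sketch below concatenates a GloVe vector, a concreteness score, and an NMT-encoder state per token and feeds the result to a BiLSTM tagger; all dimensions and the tagging head are assumptions for illustration, not the paper's exact modules.

```python
# Toy per-token feature fusion in the spirit of the multi-module setup:
# GloVe vector + concreteness scalar + NMT-encoder state, fed to a BiLSTM tagger.
# Dimensions and the tagging head are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class FusionTagger(nn.Module):
    def __init__(self, glove_dim=300, enc_dim=512, hidden=128, num_labels=2):
        super().__init__()
        feat_dim = glove_dim + 1 + enc_dim  # embedding + concreteness + encoder state
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_labels)  # per-token metaphor/literal logits

    def forward(self, glove, concreteness, enc_states):
        feats = torch.cat([glove, concreteness.unsqueeze(-1), enc_states], dim=-1)
        h, _ = self.rnn(feats)
        return self.out(h)

# Batch of 4 sentences, 20 tokens each, with random stand-in features.
tagger = FusionTagger()
logits = tagger(torch.randn(4, 20, 300), torch.rand(4, 20), torch.randn(4, 20, 512))
```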

Avoiding degradation in deep feed-forward networks by phasing out skip-connections

Title Avoiding degradation in deep feed-forward networks by phasing out skip-connections
Authors Ricardo Pio Monti, Sina Tootoonian, Robin Cao
Abstract A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem where the presence of skip-connections is penalized by Lagrange multipliers. This allows for skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, fashion-MNIST, CIFAR-10 and CIFAR-100 where the proposed method is shown to greatly decrease the degradation effect (compared to plain networks) and is often competitive with ResNets.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=BJQPG5lR-
PDF https://openreview.net/pdf?id=BJQPG5lR-
PWC https://paperswithcode.com/paper/avoiding-degradation-in-deep-feed-forward
Repo
Framework
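The core mechanism lends itself to a short sketch: a residual block whose skip strength `alpha` is a trainable parameter that an added penalty drives toward zero, leaving a plain feed-forward block. This is a simplified stand-in, with a manually scheduled penalty weight `lam`, for the paper's constrained formulation with Lagrange multipliers.

```python
# A gated residual block whose skip-connection strength alpha is penalized and
# phased out during training. Simplified sketch of the idea; the paper instead
# formulates it as constrained optimization with Lagrange multipliers.
import torch
import torch.nn as nn

class PhasedSkipBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.alpha = nn.Parameter(torch.ones(1))  # skip strength, starts at 1

    def forward(self, x):
        return self.f(x) + self.alpha * x  # plain feed-forward once alpha reaches 0

def total_loss(task_loss, blocks, lam):
    # lam acts like a (manually scheduled) multiplier penalizing remaining skips.
    penalty = sum(b.alpha.abs().sum() for b in blocks)
    return task_loss + lam * penalty

blocks = [PhasedSkipBlock(32) for _ in range(3)]
x = torch.randn(8, 32)
for b in blocks:
    x = b(x)
loss = total_loss(x.pow(2).mean(), blocks, lam=0.1)
loss.backward()
```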

Discovering Interpretable Representations for Both Deep Generative and Discriminative Models

Title Discovering Interpretable Representations for Both Deep Generative and Discriminative Models
Authors Tameem Adel, Zoubin Ghahramani, Adrian Weller
Abstract Interpretability of representations in both deep generative and discriminative models is highly desirable. Current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. We propose two interpretability frameworks. First, we provide an interpretable lens for an existing model. We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. Applying a flexible and invertible transformation to the input leads to an interpretable representation with no loss in accuracy. We extend the approach using an active learning strategy to choose the most useful side information to obtain, allowing a human to guide what “interpretable” means. Our second framework relies on joint optimization for a representation which is both maximally informative about the side information and maximally compressive about the non-interpretable data factors. This leads to a novel perspective on the relationship between compression and regularization. We also propose a new interpretability evaluation metric based on our framework. Empirically, we achieve state-of-the-art results on three datasets using the two proposed algorithms.
Tasks Active Learning
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=1908
PDF http://proceedings.mlr.press/v80/adel18a/adel18a.pdf
PWC https://paperswithcode.com/paper/discovering-interpretable-representations-for
Repo
Framework

Up-cycling Data for Natural Language Generation

Title Up-cycling Data for Natural Language Generation
Authors Amy Isard, Jon Oberlander, Claire Grover
Abstract
Tasks Text Generation
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1483/
PDF https://www.aclweb.org/anthology/L18-1483
PWC https://paperswithcode.com/paper/up-cycling-data-for-natural-language
Repo
Framework

Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss

Title Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss
Authors Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau
Abstract Monocular depth estimation benefits greatly from learning based techniques. By studying the training data, we observe that the per-pixel depth values in existing datasets typically exhibit a long-tailed distribution. However, most previous approaches treat all the regions in the training data equally regardless of the imbalanced depth distribution, which restricts the model performance particularly on distant depth regions. In this paper, we investigate the long tail property and delve deeper into the distant depth regions (i.e. the tail part) to propose an attention-driven loss for the network supervision. In addition, to better leverage the semantic information for monocular depth estimation, we propose a synergy network to automatically learn the information sharing strategies between the two tasks. With the proposed attention-driven loss and synergy network, the depth estimation and semantic labeling tasks can be mutually improved. Experiments on the challenging indoor dataset show that the proposed approach achieves state-of-the-art performance on both monocular depth estimation and semantic labeling tasks.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Jianbo_Jiao_Look_Deeper_into_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Jianbo_Jiao_Look_Deeper_into_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/look-deeper-into-depth-monocular-depth
Repo
Framework
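A toy version of re-weighting the supervision toward distant (tail) depths might look as follows; the linear weighting and the `max_depth` value are hypothetical placeholders, not the paper's attention-driven loss, and only illustrate countering the long-tailed depth distribution.

```python
# Depth-dependent per-pixel weighting that up-weights distant (tail) depths.
# The linear weight is a hypothetical placeholder, not the paper's loss.
import torch

def weighted_depth_loss(pred, target, max_depth=80.0):
    weight = 1.0 + target / max_depth          # farther pixels get larger weights
    return (weight * (pred - target).abs()).mean()

pred = torch.rand(2, 1, 60, 80) * 80.0
target = torch.rand(2, 1, 60, 80) * 80.0
loss = weighted_depth_loss(pred, target)
```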

Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension

Title Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension
Authors Shusen Liu, Tao Li, Zhimin Li, Vivek Srikumar, Valerio Pascucci, Peer-Timo Bremer
Abstract Neural network models have gained unprecedented popularity in natural language processing due to their state-of-the-art performance and the flexible end-to-end training scheme. Despite their advantages, the lack of interpretability hinders the deployment and refinement of the models. In this work, we present a flexible visualization library for creating customized visual analytic environments, in which the user can investigate and interrogate the relationships among the input, the model internals (i.e., attention), and the output predictions, which in turn shed light on the model decision-making process.
Tasks Decision Making, Natural Language Inference, Reading Comprehension
Published 2018-11-01
URL https://www.aclweb.org/anthology/D18-2007/
PDF https://www.aclweb.org/anthology/D18-2007
PWC https://paperswithcode.com/paper/visual-interrogation-of-attention-based
Repo
Framework

Duplex Generative Adversarial Network for Unsupervised Domain Adaptation

Title Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
Authors Lanqing Hu, Meina Kan, Shiguang Shan, Xilin Chen
Abstract Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between the source and target domains. Most existing works endeavor to learn a domain-invariant representation, usually by minimizing a distribution distance, e.g., MMD or the discriminator in the recently proposed generative adversarial network (GAN). Following a similar idea to GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts: an encoder, a generator, and two discriminators. The encoder embeds samples from both domains into the latent representation, and the generator decodes the latent representation to both the source and target domains, conditioned on a domain code, i.e., it achieves domain transformation. The generator is pitted against the duplex discriminators, one for the source domain and one for the target, to ensure that the domain transformation is realistic, that the latent representation is domain-invariant, and that its category information is preserved. Our proposed work achieves state-of-the-art performance on unsupervised domain adaptation for digit classification and object recognition.
Tasks Domain Adaptation, Object Recognition, Unsupervised Domain Adaptation
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Duplex_Generative_Adversarial_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Duplex_Generative_Adversarial_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/duplex-generative-adversarial-network-for
Repo
Framework
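The component layout described above can be sketched as follows; the MLP layers, dimensions, and single scalar domain code are illustrative assumptions, and the adversarial losses and training schedule are omitted.

```python
# Structural skeleton of the DupGAN components described above: an encoder, a
# generator conditioned on a domain code, and two discriminators (one per domain).
# All layer choices are illustrative; losses and the training loop are omitted.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim=784, latent=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    def __init__(self, latent=64, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent + 1, 256), nn.ReLU(), nn.Linear(256, out_dim))
    def forward(self, z, domain_code):  # domain_code: 0 = source, 1 = target
        code = torch.full((z.size(0), 1), float(domain_code))
        return self.net(torch.cat([z, code], dim=1))

class Discriminator(nn.Module):
    def __init__(self, in_dim=784, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.real_fake = nn.Linear(256, 1)           # adversarial head
        self.category = nn.Linear(256, num_classes)  # preserves category information
    def forward(self, x):
        h = self.net(x)
        return self.real_fake(h), self.category(h)

E, G = Encoder(), Generator()
D_source, D_target = Discriminator(), Discriminator()
z = E(torch.randn(16, 784))
fake_target = G(z, domain_code=1)   # latent representation rendered in the target domain
adv, cls = D_target(fake_target)
```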

Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings

Title Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings
Authors Maksym Del, Andre Tättar, Mark Fishel
Abstract This paper describes the University of Tartu's submission to the unsupervised machine translation track of the WMT18 news translation shared task. We build several baseline translation systems for both directions of the English-Estonian language pair using monolingual data only; the systems belong to the phrase-based unsupervised machine translation paradigm, where we experimented with phrase lengths of up to 3. As a main contribution, we performed a set of standalone experiments with compositional phrase embeddings as a substitute for phrases as individual vocabulary entries. Results show that reasonable n-gram vectors can be obtained by simply summing up individual word vectors, which retains or improves the performance of phrase-based unsupervised machine translation systems while avoiding the limitations of atomic phrase vectors.
Tasks Machine Translation, Unsupervised Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6407/
PDF https://www.aclweb.org/anthology/W18-6407
PWC https://paperswithcode.com/paper/phrase-based-unsupervised-machine-translation
Repo
Framework
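The paper's central substitution is easy to sketch: an n-gram vector obtained by summing its word vectors, used wherever an atomic phrase vector would otherwise be looked up. The random vectors below are stand-ins for real monolingual embeddings.

```python
# Compositional n-gram embedding formed by summing word vectors. The random
# vectors are stand-ins for real monolingual embeddings used in the actual system.
import numpy as np

rng = np.random.default_rng(0)
word_vecs = {w: rng.standard_normal(300) for w in ["the", "european", "union", "said"]}

def phrase_vector(phrase):
    """Compositional embedding of a phrase of up to 3 words: sum of word vectors."""
    return np.sum([word_vecs[w] for w in phrase.split()], axis=0)

v = phrase_vector("european union")  # reused wherever an atomic phrase vector was expected
```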

Learning to Actively Learn Neural Machine Translation

Title Learning to Actively Learn Neural Machine Translation
Authors Ming Liu, Wray Buntine, Gholamreza Haffari
Abstract Traditional active learning (AL) methods for machine translation (MT) rely on heuristics. However, these heuristics are limited when the characteristics of the MT problem change due to, e.g., the language pair or the amount of the initial bitext. In this paper, we present a framework to learn sentence selection strategies for neural MT. We train the AL query strategy on a high-resource language pair using AL simulations, and then transfer it to the low-resource language pair of interest. The learned query strategy capitalizes on the shared characteristics between the language pairs to make effective use of the AL budget. Our experiments on three language pairs confirm that our method is more effective than strong heuristic-based methods in various conditions, including cold-start and warm-start as well as small and extremely small data conditions.
Tasks Active Learning, Imitation Learning, Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/K18-1033/
PDF https://www.aclweb.org/anthology/K18-1033
PWC https://paperswithcode.com/paper/learning-to-actively-learn-neural-machine
Repo
Framework
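For orientation, a generic pool-based selection step looks like the sketch below; the length-based scorer is only a placeholder, whereas the paper learns the query strategy from AL simulations on a high-resource pair and transfers it to the low-resource pair.

```python
# Generic pool-based active learning step: a scoring function ranks unlabeled
# source sentences and the top-k are sent for labeling. The length-based scorer
# is a stand-in; the paper's scorer is a learned, transferred query strategy.
def select_for_labeling(unlabeled, k, score):
    return sorted(unlabeled, key=score, reverse=True)[:k]

pool = ["a short sentence", "a much longer and rarer source sentence to translate", "hello"]
batch = select_for_labeling(pool, k=1, score=lambda s: len(s.split()))
```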

Document Representation Learning for Patient History Visualization

Title Document Representation Learning for Patient History Visualization
Authors Halid Ziya Yerebakan, Yoshihisa Shinagawa, Parmeet Bhatia, Yiqiang Zhan
Abstract We tackle the problem of generating a diagrammatic summary of a set of documents, each of which pertains to loosely related topics. In particular, we aim at visualizing the medical histories of patients. In medicine, choosing relevant reports from a patient's past exams for comparison provides valuable information for precise treatment planning. Manually finding the relevant reports for comparison studies in a large database is time-consuming and can result in critical information being overlooked. This task can be automated by defining similarity among documents, which is nontrivial since these documents are often stored in an unstructured text format. To facilitate this, we use a representation learning algorithm that creates a semantic space in which clinically related documents lie close to each other. We utilize referral information to weakly supervise an LSTM network to learn this semantic space. The abstract representations within this semantic space are not only useful for visualizing disease progressions corresponding to the relevant report groups of a patient, but are also beneficial for analyzing diseases at the population level. The key tool proposed here is clustering of documents based on document similarity, whose metric is learned from corpora.
Tasks Representation Learning
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-2007/
PDF https://www.aclweb.org/anthology/C18-2007
PWC https://paperswithcode.com/paper/document-representation-learning-for-patient
Repo
Framework
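Once reports are embedded in such a semantic space, grouping them reduces to clustering under a similarity metric. Here is a minimal scikit-learn sketch with random vectors standing in for the weakly supervised LSTM embeddings.

```python
# Cluster document embeddings by cosine similarity; random vectors stand in for
# the learned report embeddings.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

doc_vecs = np.random.default_rng(0).standard_normal((20, 128))  # 20 report embeddings
# Note: on scikit-learn < 1.2 the "metric" argument is called "affinity".
clusters = AgglomerativeClustering(
    n_clusters=4, metric="cosine", linkage="average"
).fit_predict(doc_vecs)
```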

Neuron as an Agent

Title Neuron as an Agent
Authors Shohei Ohsawa, Kei Akuzawa, Tatsuya Matsushima, Gustavo Bezerra, Yusuke Iwasawa, Hiroshi Kajino, Seiya Takenaka, Yutaka Matsuo
Abstract Existing multi-agent reinforcement learning (MARL) communication methods have relied on a trusted third party (TTP) to distribute reward to agents, leaving them inapplicable in peer-to-peer environments. This paper proposes reward distribution using Neuron as an Agent (NaaA) in MARL without a TTP, with two key ideas: (i) inter-agent reward distribution and (ii) auction theory. Auction theory is introduced because inter-agent reward distribution alone is insufficient for optimization. Agents in NaaA maximize their profits (the difference between reward and cost) and, as a theoretical result, the auction mechanism is shown to have agents autonomously evaluate counterfactual returns as the values of other agents. NaaA enables representation trades in peer-to-peer environments, ultimately regarding each unit in a neural network as an agent. Finally, numerical experiments (a single-agent environment from OpenAI Gym and a multi-agent environment from ViZDoom) confirm that NaaA framework optimization leads to better performance in reinforcement learning.
Tasks Multi-agent Reinforcement Learning
Published 2018-01-01
URL https://openreview.net/forum?id=BkfEzz-0-
PDF https://openreview.net/pdf?id=BkfEzz-0-
PWC https://paperswithcode.com/paper/neuron-as-an-agent
Repo
Framework

Exploiting Transitivity for Learning Person Re-Identification Models on a Budget

Title Exploiting Transitivity for Learning Person Re-Identification Models on a Budget
Authors Sourya Roy, Sujoy Paul, Neal E. Young, Amit K. Roy-Chowdhury
Abstract Minimization of labeling effort for person re-identification in camera networks is an important problem, as most existing popular methods are supervised and require a large amount of manual annotation, which is tedious to acquire. In this work, we focus on this labeling-effort minimization problem and approach it as a subset selection task, where the objective is to select an optimal subset of image pairs for labeling without compromising performance. Towards this goal, our proposed scheme first represents any camera network (with k cameras) as an edge-weighted complete k-partite graph, where each vertex denotes a person and similarity scores between persons are used as edge weights. In the second stage, our algorithm selects an optimal subset of pairs by solving a triangle-free subgraph maximization problem on the k-partite graph. This subgraph weight maximization problem is NP-hard (at least for k >= 4), which means the optimization becomes intractable for large datasets. In order to make our framework scalable, we propose two polynomial-time approximately-optimal algorithms. The first is a 1/2-approximation algorithm that runs in time linear in the number of edges. The second is a greedy algorithm with sub-quadratic (in the number of edges) time complexity. Experiments on three state-of-the-art datasets show that the proposed approach requires on average only 8-15% of the pairs to be manually labeled in order to match the performance obtained when all pairs are manually annotated.
Tasks Person Re-Identification
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Roy_Exploiting_Transitivity_for_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Roy_Exploiting_Transitivity_for_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/exploiting-transitivity-for-learning-person
Repo
Framework
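In the spirit of the paper's greedy variant, a simple triangle-free selection heuristic can be sketched as below; it carries none of the paper's k-partite structure or approximation guarantee and only illustrates the "skip edges that would close a triangle" idea.

```python
# Greedy heuristic for selecting a heavy triangle-free set of image pairs:
# consider edges by decreasing similarity and keep an edge only if it closes no
# triangle with already-selected edges. A sketch, not the paper's exact algorithm.
from collections import defaultdict

def greedy_triangle_free(edges):
    """edges: list of (weight, u, v); returns a triangle-free subset of pairs."""
    adj, chosen = defaultdict(set), []
    for w, u, v in sorted(edges, reverse=True):
        if adj[u] & adj[v]:          # a common neighbor would form a triangle
            continue
        adj[u].add(v)
        adj[v].add(u)
        chosen.append((u, v))
    return chosen

pairs = greedy_triangle_free([(0.9, "a", "b"), (0.8, "b", "c"), (0.7, "a", "c"), (0.6, "c", "d")])
# -> [("a", "b"), ("b", "c"), ("c", "d")]; ("a", "c") is skipped as it closes a triangle
```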