Paper Group NANR 99
SNAG: Spoken Narratives and Gaze Dataset
Title | SNAG: Spoken Narratives and Gaze Dataset |
Authors | Preethi Vaidyanathan, Emily T. Prud'hommeaux, Jeff B. Pelz, Cecilia O. Alm |
Abstract | Humans rely on multiple sensory modalities when examining and reasoning over images. In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework in order to label important image regions with appropriate linguistic labels. |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2022/ |
PWC | https://paperswithcode.com/paper/snag-spoken-narratives-and-gaze-dataset |
Repo | |
Framework | |
ISNN: Impact Sound Neural Network for Audio-Visual Object Classification
Title | ISNN: Impact Sound Neural Network for Audio-Visual Object Classification |
Authors | Auston Sterling, Justin Wilson, Sam Lowe, Ming C. Lin |
Abstract | 3D object geometry reconstruction remains a challenge when working with transparent, occluded, or highly reflective surfaces. While recent methods classify shape features using raw audio, we present a multimodal neural network optimized for estimating an object’s geometry and material. Our networks use spectrograms of recorded and synthesized object impact sounds and voxelized shape estimates to extend the capabilities of vision-based reconstruction. We evaluate our method on multiple datasets of both recorded and synthesized sounds. We further present an interactive application for real-time scene reconstruction in which a user can strike objects, producing sound that can instantly classify and segment the struck object, even if the object is transparent or visually occluded. |
Tasks | Object Classification |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Auston_Sterling_ISNN_-_Impact_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Auston_Sterling_ISNN_-_Impact_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/isnn-impact-sound-neural-network-for-audio |
Repo | |
Framework | |
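As a rough illustration of the audio-visual fusion described in the ISNN abstract, the sketch below (assuming PyTorch; layer sizes and input shapes are illustrative, not the paper's architecture) combines a 2D convolutional branch over an impact-sound spectrogram with a 3D convolutional branch over a voxelized shape estimate before a joint classification head.

```python
# Hypothetical sketch (not the authors' code): fuse a spectrogram of an impact
# sound with a voxelized shape estimate for object classification.
import torch
import torch.nn as nn

class AudioVisualClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Audio branch: 2D convolutions over a 1 x 128 x 128 spectrogram.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),          # -> 32 * 4 * 4
        )
        # Shape branch: 3D convolutions over a 1 x 32 x 32 x 32 voxel grid.
        self.shape = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(4), nn.Flatten(),          # -> 16 * 4 * 4 * 4
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 4 * 4 + 16 * 4 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, spectrogram, voxels):
        fused = torch.cat([self.audio(spectrogram), self.shape(voxels)], dim=1)
        return self.head(fused)

# Toy usage with random inputs of the assumed shapes.
logits = AudioVisualClassifier()(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 32, 32, 32))
```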
Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks
Title | Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks |
Authors | Grant Rotskoff, Eric Vanden-Eijnden |
Abstract | The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks |
PDF | http://papers.nips.cc/paper/7945-parameters-as-interacting-particles-long-time-convergence-and-asymptotic-error-scaling-of-neural-networks.pdf |
PWC | https://paperswithcode.com/paper/parameters-as-interacting-particles-long-time |
Repo | |
Framework | |
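The particle analogy in the abstract can be written out schematically. In the notation below (which follows the abstract rather than the paper's exact conventions), a width-$n$ approximator is an average over parameter "particles" $(c_i, z_i)$, and expanding the mean-squared error exposes a pairwise interaction between particles; the abstract states that the representation error and the fluctuations around the optimum both scale as $O(n^{-1})$.

```latex
% Schematic notation (illustrative paraphrase of the abstract, not the paper's exact formulas).
% A width-n approximator as an average over parameter "particles" (c_i, z_i):
\[
  f_n(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} c_i\,\varphi(x, z_i)
\]
% Expanding the mean-squared error against a target f gives a one-particle "drift" term
% and a pairwise interaction between particles, which is the sense in which parameters
% behave as interacting particles:
\[
  \mathrm{E}_x\!\big[(f(x)-f_n(x))^2\big]
  \;=\; \mathrm{E}_x\big[f^2\big]
  \;-\; \frac{2}{n}\sum_{i}\mathrm{E}_x\!\big[f(x)\,c_i\varphi(x,z_i)\big]
  \;+\; \frac{1}{n^2}\sum_{i,j}\mathrm{E}_x\!\big[c_i c_j\,\varphi(x,z_i)\varphi(x,z_j)\big]
\]
% According to the abstract, as n grows the representation error scales as O(1/n),
% and the fluctuations around the optimal representation arise at the same scale.
```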
Multi-Module Recurrent Neural Networks with Transfer Learning
Title | Multi-Module Recurrent Neural Networks with Transfer Learning |
Authors | Filip Skurniak, Maria Janicka, Aleksander Wawer |
Abstract | This paper describes multiple solutions designed and tested for the problem of word-level metaphor detection. The proposed systems are all based on variants of recurrent neural network architectures. Specifically, we explore multiple sources of information: pre-trained word embeddings (GloVe), a dictionary of language concreteness, and a transfer learning scenario based on the states of an encoder network from a neural machine translation system. One of the architectures combines all three modules: (1) a neural CRF (Conditional Random Fields), trained directly on the metaphor data set; (2) the neural machine translation encoder from the transfer learning scenario; (3) a neural network used to predict the final labels, trained directly on the metaphor data set. Our results vary between test sets: the standalone neural CRF performs best on the submission data, while the combined system scores highest on a test subset randomly selected from the training data. |
Tasks | Machine Translation, Transfer Learning, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0917/ |
PWC | https://paperswithcode.com/paper/multi-module-recurrent-neural-networks-with |
Repo | |
Framework | |
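To make the feature combination in the abstract concrete, here is a hypothetical sketch (assuming PyTorch; all dimensions are illustrative, and the paper's neural-CRF output layer is replaced by a plain per-token softmax) of a BiLSTM tagger that consumes concatenated GloVe embeddings, a scalar concreteness score, and transferred NMT-encoder states for each token.

```python
# Hypothetical sketch (not the authors' system): per-token features from three
# sources -- a pretrained word embedding, a concreteness score, and a transferred
# NMT-encoder state -- concatenated and fed to a BiLSTM tagger.
import torch
import torch.nn as nn

class MetaphorTagger(nn.Module):
    def __init__(self, emb_dim=300, enc_dim=512, hidden=128, num_tags=2):
        super().__init__()
        feat_dim = emb_dim + 1 + enc_dim          # GloVe + concreteness + NMT state
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, glove, concreteness, nmt_states):
        # glove: (B, T, 300); concreteness: (B, T, 1); nmt_states: (B, T, 512)
        feats = torch.cat([glove, concreteness, nmt_states], dim=-1)
        h, _ = self.lstm(feats)
        return self.out(h)                        # per-token tag logits

B, T = 4, 20
logits = MetaphorTagger()(torch.randn(B, T, 300), torch.rand(B, T, 1), torch.randn(B, T, 512))
```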
Avoiding degradation in deep feed-forward networks by phasing out skip-connections
Title | Avoiding degradation in deep feed-forward networks by phasing out skip-connections |
Authors | Ricardo Pio Monti, Sina Tootoonian, Robin Cao |
Abstract | A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem where the presence of skip-connections is penalized by Lagrange multipliers. This allows for skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, fashion-MNIST, CIFAR-10 and CIFAR-100 where the proposed method is shown to greatly decrease the degradation effect (compared to plain networks) and is often competitive with ResNets. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BJQPG5lR- |
PDF | https://openreview.net/pdf?id=BJQPG5lR- |
PWC | https://paperswithcode.com/paper/avoiding-degradation-in-deep-feed-forward |
Repo | |
Framework | |
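A minimal sketch of the idea, not the paper's exact constrained-optimization formulation: each block carries a skip connection scaled by a learnable coefficient, and a penalty term (standing in for the Lagrange-multiplier constraint) drives that coefficient toward zero, so the skip path is introduced early and phased out during training. Assumes PyTorch; names, sizes, and the penalty form are illustrative.

```python
# Sketch of phasing out skip connections via a penalized skip coefficient.
import torch
import torch.nn as nn

class PhasedSkipBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.alpha = nn.Parameter(torch.ones(1))   # skip-connection strength

    def forward(self, x):
        return self.f(x) + self.alpha * x          # residual-style path, scaled by alpha

def skip_penalty(blocks, lam):
    # Stand-in for the paper's Lagrangian constraint: push every alpha toward 0.
    return lam * sum(block.alpha.abs().sum() for block in blocks)

blocks = nn.ModuleList([PhasedSkipBlock(64) for _ in range(8)])
x = torch.randn(32, 64)
for b in blocks:
    x = b(x)
task_loss = x.pow(2).mean()                        # placeholder task loss
loss = task_loss + skip_penalty(blocks, lam=0.1)
loss.backward()
```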
Discovering Interpretable Representations for Both Deep Generative and Discriminative Models
Title | Discovering Interpretable Representations for Both Deep Generative and Discriminative Models |
Authors | Tameem Adel, Zoubin Ghahramani, Adrian Weller |
Abstract | Interpretability of representations in both deep generative and discriminative models is highly desirable. Current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. We propose two interpretability frameworks. First, we provide an interpretable lens for an existing model. We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. Applying a flexible and invertible transformation to the input leads to an interpretable representation with no loss in accuracy. We extend the approach using an active learning strategy to choose the most useful side information to obtain, allowing a human to guide what “interpretable” means. Our second framework relies on joint optimization for a representation which is both maximally informative about the side information and maximally compressive about the non-interpretable data factors. This leads to a novel perspective on the relationship between compression and regularization. We also propose a new interpretability evaluation metric based on our framework. Empirically, we achieve state-of-the-art results on three datasets using the two proposed algorithms. |
Tasks | Active Learning |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1908 |
PDF | http://proceedings.mlr.press/v80/adel18a/adel18a.pdf |
PWC | https://paperswithcode.com/paper/discovering-interpretable-representations-for |
Repo | |
Framework | |
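The second framework in the abstract, a representation that is maximally informative about the side information while maximally compressive about the remaining data factors, can be summarized with an information-bottleneck-style objective. The form below is a schematic paraphrase, not the paper's exact formulation; $z$ is the learned representation, $s$ the side information, $x$ the data, and $\beta$ a trade-off weight.

```latex
% Schematic trade-off between informativeness about s and compression of x.
\[
  \max_{\theta}\;\; I_{\theta}(z; s) \;-\; \beta\, I_{\theta}(z; x)
\]
```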
Up-cycling Data for Natural Language Generation
Title | Up-cycling Data for Natural Language Generation |
Authors | Amy Isard, Oberl, Jon er, Claire Grover |
Abstract | |
Tasks | Text Generation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1483/ |
PWC | https://paperswithcode.com/paper/up-cycling-data-for-natural-language |
Repo | |
Framework | |
Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss
Title | Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss |
Authors | Jianbo Jiao, Ying Cao, Yibing Song, Rynson Lau |
Abstract | Monocular depth estimation benefits greatly from learning based techniques. By studying the training data, we observe that the per-pixel depth values in existing datasets typically exhibit a long-tailed distribution. However, most previous approaches treat all the regions in the training data equally regardless of the imbalanced depth distribution, which restricts the model performance particularly on distant depth regions. In this paper, we investigate the long tail property and delve deeper into the distant depth regions (i.e. the tail part) to propose an attention-driven loss for the network supervision. In addition, to better leverage the semantic information for monocular depth estimation, we propose a synergy network to automatically learn the information sharing strategies between the two tasks. With the proposed attention-driven loss and synergy network, the depth estimation and semantic labeling tasks can be mutually improved. Experiments on the challenging indoor dataset show that the proposed approach achieves state-of-the-art performance on both monocular depth estimation and semantic labeling tasks. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Jianbo_Jiao_Look_Deeper_into_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Jianbo_Jiao_Look_Deeper_into_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/look-deeper-into-depth-monocular-depth |
Repo | |
Framework | |
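As a toy illustration of an attention-driven supervision signal for long-tailed depth, the sketch below (assuming PyTorch; the weighting function and constants are hypothetical and differ from the paper's loss) up-weights the regression error on distant pixels so the tail of the depth distribution is not swamped by the abundant near-range pixels.

```python
# Toy attention-driven depth loss: larger weights for distant (long-tail) depths.
import torch

def attention_driven_depth_loss(pred, target, alpha=1.0, max_depth=10.0):
    # pred, target: (B, 1, H, W) depth maps in metres.
    weight = 1.0 + alpha * (target / max_depth)     # emphasize distant regions
    return (weight * (pred - target).abs()).mean()

loss = attention_driven_depth_loss(torch.rand(2, 1, 64, 64) * 10,
                                   torch.rand(2, 1, 64, 64) * 10)
```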
Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension
Title | Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension |
Authors | Shusen Liu, Tao Li, Zhimin Li, Vivek Srikumar, Valerio Pascucci, Peer-Timo Bremer |
Abstract | Neural network models have gained unprecedented popularity in natural language processing due to their state-of-the-art performance and the flexible end-to-end training scheme. Despite their advantages, the lack of interpretability hinders the deployment and refinement of the models. In this work, we present a flexible visualization library for creating customized visual analytic environments, in which the user can investigate and interrogate the relationships among the input, the model internals (i.e., attention), and the output predictions, which in turn shed light on the model's decision-making process. |
Tasks | Decision Making, Natural Language Inference, Reading Comprehension |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/D18-2007/ |
PWC | https://paperswithcode.com/paper/visual-interrogation-of-attention-based |
Repo | |
Framework | |
Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
Title | Duplex Generative Adversarial Network for Unsupervised Domain Adaptation |
Authors | Lanqing Hu, Meina Kan, Shiguang Shan, Xilin Chen |
Abstract | Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between the source and target domains. Most existing works endeavor to learn a domain-invariant representation, usually by minimizing a distribution distance, e.g., MMD or the discriminator in the recently proposed generative adversarial network (GAN). Following a similar idea to GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve both domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts: an encoder, a generator, and two discriminators. The encoder embeds samples from both domains into a latent representation, and the generator decodes the latent representation to the source and target domains respectively, conditioned on a domain code, i.e., it achieves domain transformation. The generator is pitted against the duplex discriminators, one for the source domain and one for the target, to ensure the realism of the domain transformation, keep the latent representation domain-invariant, and preserve its category information. Our proposed approach achieves state-of-the-art performance on unsupervised domain adaptation for digit classification and object recognition. |
Tasks | Domain Adaptation, Object Recognition, Unsupervised Domain Adaptation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Duplex_Generative_Adversarial_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Duplex_Generative_Adversarial_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/duplex-generative-adversarial-network-for |
Repo | |
Framework | |
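A hypothetical wiring sketch of the DupGAN setup described in the abstract: one encoder shared across domains, one generator conditioned on a binary domain code, and duplex discriminators, one per domain. Assumes PyTorch; the MLP architectures, dimensions, and function names are placeholders, not the paper's networks.

```python
# Hypothetical wiring of encoder, domain-conditioned generator, and duplex discriminators.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 784

encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
generator = nn.Sequential(nn.Linear(latent_dim + 1, 256), nn.ReLU(), nn.Linear(256, img_dim))
disc_source = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, 1))
disc_target = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, 1))

def translate(x, domain_code):
    """Encode x and decode it into the domain indicated by domain_code (0=source, 1=target)."""
    z = encoder(x)
    code = torch.full((x.size(0), 1), float(domain_code))
    return generator(torch.cat([z, code], dim=1))

x_source = torch.rand(8, img_dim)
fake_target = translate(x_source, domain_code=1)    # source -> target transformation
score = disc_target(fake_target)                    # the target-domain discriminator judges it
```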
Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings
Title | Phrase-based Unsupervised Machine Translation with Compositional Phrase Embeddings |
Authors | Maksym Del, Andre Tättar, Mark Fishel |
Abstract | This paper describes the University of Tartu's submission to the unsupervised machine translation track of the WMT18 news translation shared task. We build several baseline translation systems for both directions of the English-Estonian language pair using monolingual data only; the systems belong to the phrase-based unsupervised machine translation paradigm, and we experimented with phrase lengths of up to 3. As our main contribution, we performed a set of standalone experiments with compositional phrase embeddings as a substitute for phrases as individual vocabulary entries. Results show that reasonable n-gram vectors can be obtained by simply summing up individual word vectors, which retains or improves the performance of phrase-based unsupervised machine translation systems while avoiding the limitations of atomic phrase vectors. |
Tasks | Machine Translation, Unsupervised Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6407/ |
PWC | https://paperswithcode.com/paper/phrase-based-unsupervised-machine-translation |
Repo | |
Framework | |
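The core trick in the abstract, building an n-gram vector by summing its word vectors instead of treating the phrase as an atomic vocabulary entry, fits in a few lines. The snippet below uses toy vectors and arbitrary tokens purely for illustration.

```python
# Compositional phrase embeddings: an n-gram vector is the sum of its word vectors.
import numpy as np

word_vectors = {
    "riigi": np.array([0.1, 0.3, -0.2]),     # toy 3-dimensional vectors
    "eelarve": np.array([0.4, -0.1, 0.2]),
}

def phrase_vector(phrase, vectors):
    """Sum the word vectors of an n-gram (the abstract reports phrases of up to 3 words)."""
    return np.sum([vectors[w] for w in phrase.split()], axis=0)

print(phrase_vector("riigi eelarve", word_vectors))
```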
Learning to Actively Learn Neural Machine Translation
Title | Learning to Actively Learn Neural Machine Translation |
Authors | Ming Liu, Wray Buntine, Gholamreza Haffari |
Abstract | Traditional active learning (AL) methods for machine translation (MT) rely on heuristics. However, these heuristics are limited when the characteristics of the MT problem change due to, e.g., the language pair or the amount of initial bitext. In this paper, we present a framework to learn sentence selection strategies for neural MT. We train the AL query strategy on a high-resource language pair using AL simulations, and then transfer it to the low-resource language pair of interest. The learned query strategy capitalizes on the shared characteristics between the language pairs to make effective use of the AL budget. Our experiments on three language pairs confirm that our method is more effective than strong heuristic-based methods in various conditions, including cold-start and warm-start as well as small and extremely small data conditions. |
Tasks | Active Learning, Imitation Learning, Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/K18-1033/ |
PWC | https://paperswithcode.com/paper/learning-to-actively-learn-neural-machine |
Repo | |
Framework | |
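A hypothetical sketch of the outer active-learning loop implied by the abstract: a learned query strategy scores unlabelled source sentences, the top-scoring batch is sent to an oracle for translation, and the NMT system is retrained. The scoring function below is only a stand-in for the transferred policy, and all function names are assumptions.

```python
# Sketch of one active-learning round with a (placeholder) learned query strategy.
import random

def learned_query_score(sentence):
    # Stand-in for the strategy transferred from the high-resource pair;
    # here it simply prefers longer sentences, with random tie-breaking.
    return len(sentence.split()) + random.random()

def active_learning_round(unlabelled, labelled, budget, translate_fn, retrain_fn):
    ranked = sorted(unlabelled, key=learned_query_score, reverse=True)
    selected = ranked[:budget]
    labelled += [(src, translate_fn(src)) for src in selected]   # query the oracle
    remaining = [s for s in unlabelled if s not in selected]
    retrain_fn(labelled)                                         # update the NMT model
    return remaining, labelled
```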
Document Representation Learning for Patient History Visualization
Title | Document Representation Learning for Patient History Visualization |
Authors | Halid Ziya Yerebakan, Yoshihisa Shinagawa, Parmeet Bhatia, Yiqiang Zhan |
Abstract | We tackle the problem of generating a diagrammatic summary of a set of documents, each of which pertains to loosely related topics. In particular, we aim at visualizing the medical histories of patients. In medicine, choosing relevant reports from a patient's past exams for comparison provides valuable information for precise treatment planning. Manually finding the relevant reports for comparison studies in a large database is time-consuming and could result in critical information being overlooked. This task can be automated by defining similarity among documents, which is nontrivial since these documents are often stored in an unstructured text format. To this end, we use a representation learning algorithm that creates a semantic representation space for documents in which clinically related documents lie close to each other. We utilize referral information to weakly supervise an LSTM network to learn this semantic space. The abstract representations within this semantic space are not only useful for visualizing disease progressions corresponding to the relevant report groups of a patient, but are also beneficial for analyzing diseases at the population level. The key tool proposed here is clustering of documents based on a document similarity metric learned from corpora. |
Tasks | Representation Learning |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-2007/ |
PWC | https://paperswithcode.com/paper/document-representation-learning-for-patient |
Repo | |
Framework | |
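The retrieval step implied by the abstract, finding clinically related reports as nearest neighbours of a query report in the learned embedding space, might look like the sketch below. The embeddings are random stand-ins for the LSTM-derived document representations, and all names are hypothetical.

```python
# Nearest-neighbour retrieval over document embeddings with cosine similarity.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Random stand-ins for learned report embeddings (128-dimensional).
report_embeddings = {f"report_{i}": np.random.randn(128) for i in range(100)}

def related_reports(query_id, embeddings, k=5):
    q = embeddings[query_id]
    scores = {rid: cosine_similarity(q, v) for rid, v in embeddings.items() if rid != query_id}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(related_reports("report_0", report_embeddings))
```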
Neuron as an Agent
Title | Neuron as an Agent |
Authors | Shohei Ohsawa, Kei Akuzawa, Tatsuya Matsushima, Gustavo Bezerra, Yusuke Iwasawa, Hiroshi Kajino, Seiya Takenaka, Yutaka Matsuo |
Abstract | Existing multi-agent reinforcement learning (MARL) communication methods have relied on a trusted third party (TTP) to distribute reward to agents, leaving them inapplicable in peer-to-peer environments. This paper proposes reward distribution using Neuron as an Agent (NaaA) in MARL without a TTP, based on two key ideas: (i) inter-agent reward distribution and (ii) auction theory. Auction theory is introduced because inter-agent reward distribution alone is insufficient for optimization. Agents in NaaA maximize their profits (the difference between reward and cost) and, as a theoretical result, the auction mechanism is shown to have agents autonomously evaluate counterfactual returns as the values of other agents. NaaA enables representation trades in peer-to-peer environments, ultimately regarding each unit in a neural network as an agent. Finally, numerical experiments (a single-agent environment from OpenAI Gym and a multi-agent environment from ViZDoom) confirm that optimization with the NaaA framework leads to better performance in reinforcement learning. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BkfEzz-0- |
PDF | https://openreview.net/pdf?id=BkfEzz-0- |
PWC | https://paperswithcode.com/paper/neuron-as-an-agent |
Repo | |
Framework | |
Exploiting Transitivity for Learning Person Re-Identification Models on a Budget
Title | Exploiting Transitivity for Learning Person Re-Identification Models on a Budget |
Authors | Sourya Roy, Sujoy Paul, Neal E. Young, Amit K. Roy-Chowdhury |
Abstract | Minimizing labeling effort for person re-identification in camera networks is an important problem, as most existing popular methods are supervised and require a large amount of manual annotation, which is tedious to acquire. In this work, we focus on this labeling-effort minimization problem and approach it as a subset selection task, where the objective is to select an optimal subset of image pairs for labeling without compromising performance. Towards this goal, our proposed scheme first represents any camera network (with k cameras) as an edge-weighted complete k-partite graph, where each vertex denotes a person and similarity scores between persons are used as edge weights. In the second stage, our algorithm selects an optimal subset of pairs by solving a triangle-free subgraph maximization problem on the k-partite graph. This subgraph weight maximization problem is NP-hard (at least for k >= 4), which means that for large datasets the optimization becomes intractable. To make our framework scalable, we propose two polynomial-time approximately-optimal algorithms. The first is a 1/2-approximation algorithm that runs in time linear in the number of edges. The second is a greedy algorithm with sub-quadratic (in the number of edges) time complexity. Experiments on three state-of-the-art datasets show that the proposed approach requires on average only 8-15% manually labeled pairs to achieve the performance obtained when all pairs are manually annotated. |
Tasks | Person Re-Identification |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Roy_Exploiting_Transitivity_for_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Roy_Exploiting_Transitivity_for_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/exploiting-transitivity-for-learning-person |
Repo | |
Framework | |
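As an illustration of the triangle-free subgraph idea in the abstract, the sketch below greedily adds image pairs (edges) in decreasing order of similarity weight and skips any edge that would close a triangle in the selected subgraph. This is a generic greedy heuristic for exposition; it is not necessarily the paper's 1/2-approximation or sub-quadratic algorithm, and the pair identifiers are made up.

```python
# Greedy construction of a triangle-free, high-weight subset of image-pair edges.
from collections import defaultdict

def greedy_triangle_free(edges):
    """edges: list of (weight, u, v). Returns a triangle-free subset of edges."""
    adjacency = defaultdict(set)
    chosen = []
    for weight, u, v in sorted(edges, reverse=True):
        if adjacency[u] & adjacency[v]:        # a common neighbour would close a triangle
            continue
        adjacency[u].add(v)
        adjacency[v].add(u)
        chosen.append((weight, u, v))
    return chosen

pairs = [(0.9, "cam1_p3", "cam2_p7"), (0.8, "cam2_p7", "cam3_p1"), (0.7, "cam1_p3", "cam3_p1")]
print(greedy_triangle_free(pairs))   # the third edge is skipped: it would form a triangle
```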