January 26, 2020


Paper Group ANR 1480


Learning latent state representation for speeding up exploration


Title	Learning latent state representation for speeding up exploration
Authors	Giulia Vezzani, Abhishek Gupta, Lorenzo Natale, Pieter Abbeel
Abstract	Exploration is an extremely challenging problem in reinforcement learning, especially in high-dimensional state and action spaces and when only sparse rewards are available. Effective representations can indicate which components of the state are task relevant and thus reduce the dimensionality of the space to explore. In this work, we take a representation learning viewpoint on exploration, utilizing prior experience to learn effective latent representations, which can subsequently indicate which regions to explore. Prior experience on separate but related tasks helps learn representations of the state that are effective at predicting instantaneous rewards. These learned representations can then be used with an entropy-based exploration method to explore high-dimensional spaces by lowering the dimensionality of the search space. We show the benefits of this representation for meta-exploration in a simulated object pushing environment.
Tasks	Representation Learning
Published	2019-05-27
URL	https://arxiv.org/abs/1905.12621v1
PDF	https://arxiv.org/pdf/1905.12621v1.pdf
PWC	https://paperswithcode.com/paper/learning-latent-state-representation-for
Repo
Framework

Improving Generalization in Coreference Resolution via Adversarial Training


Title	Improving Generalization in Coreference Resolution via Adversarial Training
Authors	Sanjay Subramanian, Dan Roth
Abstract	In order for coreference resolution systems to be useful in practice, they must be able to generalize to new text. In this work, we demonstrate that the performance of the state-of-the-art system decreases when the names of PER and GPE named entities in the CoNLL dataset are changed to names that do not occur in the training set. We use the technique of adversarial gradient-based training to retrain the state-of-the-art system and demonstrate that the retrained system achieves higher performance on the CoNLL dataset (both with and without the change of named entities) and the GAP dataset.
Tasks	Coreference Resolution
Published	2019-08-13
URL	https://arxiv.org/abs/1908.04728v1
PDF	https://arxiv.org/pdf/1908.04728v1.pdf
PWC	https://paperswithcode.com/paper/improving-generalization-in-coreference-1
Repo
Framework
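
The adversarial gradient-based training mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy logistic model in place of a coreference system; the weights, the input vector, and the `epsilon` radius are hypothetical, and the abstract does not say which perturbation rule the authors use. The input is moved a small distance along the normalized gradient of the loss, which is the direction that locally increases the loss fastest.

```python
import math

def logistic_loss(w, x, y):
    """Binary cross-entropy of a linear model; y is 0 or 1."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, x, y):
    """Gradient of the loss with respect to the input features x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def adversarial_input(w, x, y, epsilon=0.1):
    """Move x a distance epsilon along the normalized loss gradient."""
    g = input_gradient(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    return [xi + epsilon * gi / norm for xi, gi in zip(x, g)]

w = [0.5, -1.0, 2.0]   # hypothetical model weights
x = [1.0, 0.3, -0.2]   # hypothetical input embedding
y = 1                  # hypothetical gold label

x_adv = adversarial_input(w, x, y)
clean_loss = logistic_loss(w, x, y)
adv_loss = logistic_loss(w, x_adv, y)
```

A training loop built on this idea would typically minimize a combination of `clean_loss` and `adv_loss`, so that the model stays accurate under small input perturbations.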

Non-convex Penalty for Tensor Completion and Robust PCA


Title	Non-convex Penalty for Tensor Completion and Robust PCA
Authors	Tao Li, Jinwen Ma
Abstract	In this paper, we propose a novel non-convex tensor rank surrogate function and a novel non-convex sparsity measure for tensors. The basic idea is to sidestep the bias of the $\ell_1$-norm by introducing concavity. Furthermore, we employ the proposed non-convex penalties in tensor recovery problems such as tensor completion and tensor robust principal component analysis, which have various real-world applications such as image inpainting and denoising. Due to the concavity, the models are difficult to solve. To tackle this problem, we devise majorization-minimization algorithms, which optimize upper bounds of the original functions in each iteration, and every sub-problem is solved by the alternating direction method of multipliers. Finally, experimental results on natural images and hyperspectral images demonstrate the effectiveness and efficiency of the proposed methods.
Tasks	Denoising, Image Inpainting
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10165v1
PDF	http://arxiv.org/pdf/1904.10165v1.pdf
PWC	https://paperswithcode.com/paper/non-convex-penalty-for-tensor-completion-and
Repo
Framework
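
The bias of the $\ell_1$-norm that the abstract refers to can be seen on scalars. The sketch below compares the proximal operator of the $\ell_1$ penalty (soft thresholding) with that of the minimax concave penalty (MCP, firm thresholding). MCP is used here only as a well-known stand-in for a concave penalty; it is an assumption, not the surrogate proposed in the paper, and `lam` and `gamma` are illustrative parameters.

```python
import math

def soft_threshold(x, lam):
    """Proximal operator of lam * |x| (the l1 penalty)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def firm_threshold(x, lam, gamma=3.0):
    """Proximal operator of the minimax concave penalty, for gamma > 1."""
    a = abs(x)
    if a <= lam:
        return 0.0                      # small entries are set to zero
    if a <= gamma * lam:
        # intermediate entries are shrunk less than under l1
        return math.copysign(gamma * (a - lam) / (gamma - 1.0), x)
    return x                            # large entries are left untouched
```

Soft thresholding subtracts `lam` from every surviving entry, however large, which is the bias in question. The concave penalty stops shrinking once an entry exceeds `gamma * lam`.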

Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning


Title	Controllable Length Control Neural Encoder-Decoder via Reinforcement Learning
Authors	Junyi Bian, Baojun Lin, Ke Zhang, Zhaohui Yan, Hong Tang, Yonghe Zhang
Abstract	Controlling output length in neural language generation is valuable in many scenarios, especially for tasks that have length constraints. A model with stronger length control capacity can produce sentences with more specific lengths; however, it usually sacrifices the semantic accuracy of the generated sentences. Here, we introduce the concept of Controllable Length Control (CLC) for the trade-off between length control capacity and semantic accuracy of the language generation model. More specifically, CLC alters the length control capacity of the model so as to generate sentences of corresponding quality. This is meaningful in real applications where length control capacity and output quality are requested with different priorities, or to overcome the instability of length control during model training. In this paper, we propose two reinforcement learning (RL) methods to adjust the trade-off between length control capacity and semantic accuracy of length control models. Results show that our RL methods improve scores across a wide range of target lengths and achieve the goal of CLC. Additionally, two models, LenMC and LenLInit, modified from previous length-control models, are proposed to obtain better performance in the summarization task while still maintaining the ability to control length.
Tasks	Text Generation
Published	2019-09-17
URL	https://arxiv.org/abs/1909.09492v1
PDF	https://arxiv.org/pdf/1909.09492v1.pdf
PWC	https://paperswithcode.com/paper/controllable-length-control-neural-encoder
Repo
Framework

A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming


Title	A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming
Authors	Jia-Jie Zhu, Krikamol Muandet, Moritz Diehl, Bernhard Schölkopf
Abstract	This work presents the concept of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide applicability as demonstrated in distinct numerical examples. The implication of this new concept is a new mode of thinking about the statistical nature of uncertainty in dynamical systems.
Tasks	Probabilistic Programming
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11082v1
PDF	https://arxiv.org/pdf/1911.11082v1.pdf
PWC	https://paperswithcode.com/paper/a-new-distribution-free-concept-for
Repo
Framework
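
To make the idea of comparing distributions through kernel mean embeddings concrete, here is a minimal sketch of the squared maximum mean discrepancy (MMD) between two scalar samples. The Gaussian kernel, the `bandwidth` value, and the sample data are assumptions made for illustration; the abstract does not state which kernel or estimator the authors use.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared distance between the two
    empirical kernel mean embeddings."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

samples_a = [0.0, 0.5, 1.0, 1.5]            # hypothetical sample
samples_b = [x + 3.0 for x in samples_a]    # same sample, shifted by 3

same = mmd_squared(samples_a, samples_a)
shifted = mmd_squared(samples_a, samples_b)
```

The estimate is zero when both samples coincide and grows as the samples move apart, without any parametric assumption about the underlying distributions.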

Trifocal Relative Pose from Lines at Points and its Efficient Solution


Title	Trifocal Relative Pose from Lines at Points and its Efficient Solution
Authors	Ricardo Fabbri, Timothy Duff, Hongyi Fan, Margaret Regan, David da Costa de Pinho, Elias Tsigaridas, Charles Wampler, Jonathan Hauenstein, Benjamin Kimia, Anton Leykin, Tomas Pajdla
Abstract	We present a new minimal problem for relative pose estimation mixing point features with lines incident at points observed in three views and its efficient homotopy continuation solver. We demonstrate the generality of the approach by analyzing and solving an additional problem with mixed point and line correspondences in three views. The minimal problems include correspondences of (i) three points and one line and (ii) three points and two lines through two of the points which is reported and analyzed here for the first time. These are difficult to solve, as they have 216 and - as shown here - 312 solutions, but cover important practical situations when line and point features appear together, e.g., in urban scenes or when observing curves. We demonstrate that even such difficult problems can be solved robustly using a suitable homotopy continuation technique and we provide an implementation optimized for minimal problems that can be integrated into engineering applications. Our simulated and real experiments demonstrate our solvers in the camera geometry computation task in structure from motion. We show that new solvers allow for reconstructing challenging scenes where the standard two-view initialization of structure from motion fails.
Tasks	Pose Estimation
Published	2019-03-23
URL	http://arxiv.org/abs/1903.09755v3
PDF	http://arxiv.org/pdf/1903.09755v3.pdf
PWC	https://paperswithcode.com/paper/trifocal-relative-pose-from-lines-at-points
Repo
Framework

Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement


Title	Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement
Authors	Weihao Xia, Yujiu Yang, Jing-Hao Xue
Abstract	Image-to-image translation has drawn great attention during the past few years. It aims to translate an image in one domain to a given reference image in another domain. Due to its effectiveness and efficiency, many applications can be formulated as image-to-image translation problems. However, three main challenges remain in image-to-image translation: 1) the lack of large amounts of aligned training pairs for different tasks; 2) the ambiguity of multiple possible outputs from a single input image; and 3) the lack of simultaneous training of multiple datasets from different domains within a single network. We also found in experiments that the implicit disentanglement of content and style could lead to unexpected results. In this paper, we propose a unified framework for learning to generate diverse outputs using unpaired training data that allows simultaneous training of multiple datasets from different domains via a single network. Furthermore, we investigate how to better extract domain supervision information so as to learn better disentangled representations and achieve better image translation. Experiments show that the proposed method outperforms or is comparable with the state-of-the-art methods.
Tasks	Image-to-Image Translation
Published	2019-11-02
URL	https://arxiv.org/abs/1911.00622v1
PDF	https://arxiv.org/pdf/1911.00622v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-multi-domain-multimodal-image-to
Repo
Framework

Cross-lingual topic prediction for speech using translations


Title	Cross-lingual topic prediction for speech using translations
Authors	Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater
Abstract	Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We consider this question in the setting where a small amount of speech in the low-resource language is paired with text translations in a high-resource language. We develop an effective cross-lingual topic classifier by training on just 20 hours of translated speech, using a recent model for direct speech-to-text translation. While the translations are poor, they are still good enough to correctly classify the topic of 1-minute speech segments over 70% of the time - a 20% improvement over a majority-class baseline. Such a system could be useful for humanitarian applications like crisis response, where incoming speech in a foreign low-resource language must be quickly assessed for further action.
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11425v2
PDF	https://arxiv.org/pdf/1908.11425v2.pdf
PWC	https://paperswithcode.com/paper/classifying-topics-in-speech-when-all-you
Repo
Framework

CTRL: A Conditional Transformer Language Model for Controllable Generation


Title	CTRL: A Conditional Transformer Language Model for Controllable Generation
Authors	Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher
Abstract	Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
Tasks	Language Modelling, Text Generation
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05858v2
PDF	https://arxiv.org/pdf/1909.05858v2.pdf
PWC	https://paperswithcode.com/paper/ctrl-a-conditional-transformer-language-model-1
Repo
Framework

ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices


Title	ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices
Authors	Elham Azari, Sarma Vrudhula
Abstract	The next significant step in the evolution and proliferation of artificial intelligence technology will be the integration of neural network (NN) models within embedded and mobile systems. This calls for the design of compact, energy-efficient NN models in silicon. In this paper, we present a scalable ASIC design of an LSTM accelerator named ELSA that is suitable for energy-constrained devices. It includes several architectural innovations to achieve small area and high energy efficiency. To reduce the area and power consumption of the overall design, the compute-intensive units of ELSA employ approximate multiplications and still achieve high performance and accuracy. The performance is further improved through efficient synchronization of the elastic pipeline stages to maximize the utilization. The paper also includes a performance model of ELSA, as a function of the hidden nodes and time steps, permitting its use for the evaluation of any LSTM application. ELSA was implemented in RTL and was synthesized and placed and routed in 65nm technology. Its functionality is demonstrated for language modeling, a common application of LSTM. ELSA is compared against a baseline implementation of an LSTM accelerator with standard functional units and without any of the architectural innovations of ELSA. The paper demonstrates that ELSA can achieve significant improvements in power, area and energy efficiency when compared to the baseline design and several ASIC implementations reported in the literature, making it suitable for use in embedded systems and real-time applications.
Tasks	Language Modelling
Published	2019-10-19
URL	https://arxiv.org/abs/1910.08683v1
PDF	https://arxiv.org/pdf/1910.08683v1.pdf
PWC	https://paperswithcode.com/paper/elsa-a-throughput-optimized-design-of-an-lstm
Repo
Framework

Recognizing the vocabulary of Brazilian popular newspapers with a free-access computational dictionary


Title	Recognizing the vocabulary of Brazilian popular newspapers with a free-access computational dictionary
Authors	Maria José Finatto, Oto Vale, Eric Laporte
Abstract	We report an experiment to check the identification of a set of words in popular written Portuguese with two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015. This dictionary is freely available for use in linguistic analyses of Brazilian Portuguese and other research, which justifies critical study. The vocabulary comes from the PorPopular corpus, made of popular newspapers Diário Gaúcho (DG) and Massa! (MA). From DG, we retained a set of texts with 984,465 words (tokens), published in 2008, with the spelling used before the Portuguese Language Orthographic Agreement adopted in 2009. From MA, we examined papers from 2012, 2014 and 2015, with 215,776 words (tokens), all with the new spelling. The checking involved: a) generating lists of words (types) occurring in DG and MA; b) comparing them with the entry lists of both versions of DELAF PB; c) assessing the coverage of this vocabulary; d) proposing ways of incorporating the items not covered. The results of the work show that an average of 19% of the types in DG were not found in DELAF PB 2004 or 2015. In MA, this average is 13%. Switching versions of the dictionary slightly affected the performance in recognizing the words.
Tasks
Published	2019-04-19
URL	http://arxiv.org/abs/1904.09108v1
PDF	http://arxiv.org/pdf/1904.09108v1.pdf
PWC	https://paperswithcode.com/paper/recognizing-the-vocabulary-of-brazilian
Repo
Framework

Decoder-tailored Polar Code Design Using the Genetic Algorithm


Title	Decoder-tailored Polar Code Design Using the Genetic Algorithm
Authors	Ahmed Elkelesh, Moustafa Ebada, Sebastian Cammerer, Stephan ten Brink
Abstract	We propose a new framework for constructing polar codes (i.e., selecting the frozen bit positions) for arbitrary channels, and tailored to a given decoding algorithm, rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding. The proposed framework is based on the Genetic Algorithm (GenAlg), where populations (i.e., collections) of information sets evolve successively via evolutionary transformations based on their individual error-rate performance. These populations converge towards an information set that fits both the decoding behavior and the defined channel. Using our proposed algorithm over the additive white Gaussian noise (AWGN) channel, we construct a polar code of length 2048 with code rate 0.5, without the CRC-aid, tailored to plain successive cancellation list (SCL) decoding, achieving the same error-rate performance as the CRC-aided SCL decoding, and leading to a coding gain of 1 dB at a BER of $10^{-6}$. Further, a belief propagation (BP)-tailored construction approaches the SCL error-rate performance without any modifications in the decoding algorithm itself. The performance gains can be attributed to the significant reduction in the total number of low-weight codewords. To demonstrate the flexibility, coding gains for the Rayleigh channel are shown under SCL and BP decoding. Besides improvements in error-rate performance, we show that, when required, the GenAlg can also be set up to reduce the decoding complexity, e.g., the SCL list size or the number of BP iterations can be reduced, while maintaining the same error-rate performance.
Tasks
Published	2019-01-28
URL	http://arxiv.org/abs/1901.10464v1
PDF	http://arxiv.org/pdf/1901.10464v1.pdf
PWC	https://paperswithcode.com/paper/decoder-tailored-polar-code-design-using-the
Repo
Framework
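
For context on what "selecting the frozen bit positions" means, the sketch below shows the conventional SC-oriented construction that the abstract contrasts with, not the genetic algorithm itself. It assumes a binary erasure channel, for which the Bhattacharyya parameters of the synthesized bit channels follow an exact recursion. The function names and the bit-channel indexing convention are choices made for this example.

```python
def bec_bhattacharyya(n_levels, erasure_prob):
    """Bhattacharyya parameters of the 2**n_levels polarized bit channels
    of a binary erasure channel with the given erasure probability."""
    z = [erasure_prob]
    for _ in range(n_levels):
        # each channel splits into a worse one (2z - z^2) and a better one (z^2)
        z = [v for zi in z for v in (2.0 * zi - zi * zi, zi * zi)]
    return z

def select_information_set(n_levels, erasure_prob, k):
    """Indices of the k most reliable bit channels; the rest are frozen."""
    z = bec_bhattacharyya(n_levels, erasure_prob)
    ranked = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(ranked[:k])

# Length-8 code at rate 0.5 over an erasure channel with probability 0.5.
info_set = select_information_set(3, 0.5, 4)
```

A decoder-tailored search, as described in the abstract, would instead treat a set like `info_set` as one candidate and score candidates by their simulated error rate under the target decoder.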


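Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants


Title	Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants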
Authors	Mason Swofford, John Charles Peruzzi, Nathan Tsoi, Sydney Thompson, Roberto Martín-Martín, Silvio Savarese, Marynel Vázquez
Abstract	We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering their social context. The predicted pair-wise affinities are then used in a graph clustering framework to identify both small (e.g., dyads) and large groups. The results from our evaluation on multiple, established benchmarks suggest that combining powerful deep learning methods with classical clustering techniques can improve the detection of conversational groups in comparison to prior approaches. Finally, we demonstrate the practicality of our approach in a human-robot interaction scenario. Our efforts show that our work advances group detection not only in theory, but also in practice.
Tasks	Graph Clustering, Scene Understanding
Published	2019-07-24
URL	https://arxiv.org/abs/1907.12910v4
PDF	https://arxiv.org/pdf/1907.12910v4.pdf
PWC	https://paperswithcode.com/paper/dante-deep-affinity-network-for-clustering
Repo
Framework
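
The step from pairwise affinities to groups can be sketched as follows. This example assumes the affinities are already available as scores in [0, 1] and uses thresholding plus connected components (via union-find) as a simple stand-in; the abstract does not specify the graph clustering method, and the scores, the `threshold`, and the function name are hypothetical.

```python
def cluster_by_affinity(n_people, affinities, threshold=0.5):
    """Group people whose pairwise affinity reaches the threshold,
    taking connected components of the resulting graph."""
    parent = list(range(n_people))

    def find(i):
        # follow parent links to the root, compressing the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), score in affinities.items():
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for person in range(n_people):
        groups.setdefault(find(person), []).append(person)
    return sorted(groups.values())

# Hypothetical affinity scores for six people in a scene.
affinities = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7, (0, 4): 0.2}
groups = cluster_by_affinity(6, affinities)
```

With these scores, people 0, 1 and 2 form one group, 3 and 4 form a dyad, and person 5 remains alone.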

On the Convergence of AdaBound and its Connection to SGD


Title	On the Convergence of AdaBound and its Connection to SGD
Authors	Pedro Savarese
Abstract	Adaptive gradient methods such as Adam have gained extreme popularity due to their success in training complex neural networks and lower sensitivity to hyperparameter tuning compared to SGD. However, it has recently been shown that Adam can fail to converge and might cause poor generalization; this led to the design of new, sophisticated adaptive methods which attempt to generalize well while being theoretically reliable. In this technical report we focus on AdaBound, a promising, recently proposed optimizer. We present a stochastic convex problem for which AdaBound can provably take arbitrarily long to converge in terms of a factor which is not accounted for in the convergence rate guarantee of Luo et al. (2019). We present a new $O(\sqrt T)$ regret guarantee under different assumptions on the bound functions, and provide empirical results on CIFAR suggesting that a specific form of momentum SGD can match AdaBound’s performance while having fewer hyperparameters and lower computational costs.
Tasks
Published	2019-08-13
URL	https://arxiv.org/abs/1908.04457v2
PDF	https://arxiv.org/pdf/1908.04457v2.pdf
PWC	https://paperswithcode.com/paper/on-the-convergence-of-adabound-and-its
Repo
Framework
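
The role of the bound functions can be illustrated with a small scalar sketch of an AdaBound-style update. The particular bound schedules, the default hyperparameters, and the omission of bias correction and of any extra step-size decay are assumptions made for this example and may differ from the formulation analyzed in the report. The per-step learning rate is the Adam-style rate clipped to an interval that tightens around `final_lr`, so the update gradually turns into momentum SGD.

```python
import math

def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Assumed bound schedules: both converge to final_lr as step grows."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper

def adabound_minimize(grad_fn, x0, steps=5000, alpha=1e-3, beta1=0.9,
                      beta2=0.999, eps=1e-8, final_lr=0.1, gamma=1e-3):
    """Minimize a scalar function given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
        lower, upper = adabound_bounds(t, final_lr, gamma)
        # Adam-style rate, clipped into [lower, upper]
        lr = min(max(alpha / (math.sqrt(v) + eps), lower), upper)
        x -= lr * m
    return x

# Toy problem: f(x) = x**2, whose gradient is 2x.
x_final = adabound_minimize(lambda x: 2.0 * x, x0=5.0)
```

Because the interval collapses onto a single value, late iterations behave like momentum SGD with learning rate `final_lr`, which is the connection to SGD that the title refers to.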

Dynamic Attention Networks for Task Oriented Grounding


Title	Dynamic Attention Networks for Task Oriented Grounding
Authors	Soumik Dasgupta, Badri N. Patro, Vinay P. Namboodiri
Abstract	In order to successfully perform tasks specified by natural language instructions, an artificial agent operating in a visual world needs to map words, concepts, and actions from the instruction to visual elements in its environment. This association is termed Task-Oriented Grounding. In this work, we propose a novel Dynamic Attention Network architecture for the efficient multi-modal fusion of text and visual representations which can generate a robust definition of state for the policy learner. Our model assumes no prior knowledge from visual and textual domains and is end-to-end trainable. For a 3D visual world where the observation changes continuously, the attention on the visual elements tends to be highly correlated from one time step to the next. We term this “Dynamic Attention”. In this work, we show that Dynamic Attention helps in achieving grounding and also aids in the policy learning objective. Since most practical robotic applications take place in the real world where the observation space is continuous, our framework can be used as a generalized multi-modal fusion unit for robotic control through natural language. We show the effectiveness of using 1D convolution over Gated Attention Hadamard product on the rate of convergence of the network. We demonstrate that the cell-state of a Long Short Term Memory (LSTM) is a natural choice for modeling Dynamic Attention and show through visualization that the generated attention is very close to how humans tend to focus on the environment.
Tasks
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06315v1
PDF	https://arxiv.org/pdf/1910.06315v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-attention-networks-for-task-oriented
Repo
Framework