October 21, 2019


Paper Group AWR 144

GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations. XNMT: The eXtensible Neural Machine Translation Toolkit. No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques. Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation. Learning to Run challenge …

GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations


Title	GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations
Authors	Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
Abstract	Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden unit), or embedding-free units such as image pixels.
Tasks	Image Classification, Natural Language Inference, Question Answering, Sentiment Analysis, Transfer Learning, Word Embeddings
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05662v3
PDF	http://arxiv.org/pdf/1806.05662v3.pdf
PWC	https://paperswithcode.com/paper/glomo-unsupervisedly-learned-relational
Repo	https://github.com/YJHMITWEB/GLoMo-tensorflow
Framework	tf

XNMT: The eXtensible Neural Machine Translation Toolkit


Title	XNMT: The eXtensible Neural Machine Translation Toolkit
Authors	Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, Liming Wang
Abstract	This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://github.com/neulab/xnmt
Tasks	Machine Translation, Speech Recognition
Published	2018-03-01
URL	http://arxiv.org/abs/1803.00188v1
PDF	http://arxiv.org/pdf/1803.00188v1.pdf
PWC	https://paperswithcode.com/paper/xnmt-the-extensible-neural-machine
Repo	https://github.com/neulab/xnmt
Framework	none

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques


Title	No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Authors	Tanmay Gupta, Alexander Schwing, Derek Hoiem
Abstract	We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset.
Tasks	Human-Object Interaction Detection
Published	2018-11-14
URL	https://arxiv.org/abs/1811.05967v2
PDF	https://arxiv.org/pdf/1811.05967v2.pdf
PWC	https://paperswithcode.com/paper/no-frills-human-object-interaction-detection
Repo	https://github.com/BigRedT/no_frills_hoi_det
Framework	pytorch

Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation


Title	Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation
Authors	Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata
Abstract	Synthesizing human movements such as dancing is a flourishing research field with several applications in computer graphics. Recent studies have demonstrated the advantages of deep neural networks (DNNs) for achieving remarkable performance in motion and music tasks with little effort for feature pre-processing. However, applying DNNs to generate dance for a piece of music is nevertheless challenging, because 1) DNNs need to generate large sequences while mapping the music input, 2) the DNN needs to constrain the motion beat to the music, and 3) DNNs require a considerable amount of hand-crafted data. In this study, we propose a weakly supervised deep recurrent method for real-time basic dance generation with audio power spectrum as input. The proposed model employs convolutional layers and a multilayered Long Short-Term Memory (LSTM) to process the audio input. Then, another deep LSTM layer decodes the target dance sequence. Notably, this end-to-end approach 1) has an auto-conditioned decode configuration that reduces the accumulation of feedback error over large dance sequences, 2) uses a contrastive cost function to regulate the mapping between the music and motion beat, and 3) trains with weak labels generated from the motion beat, reducing the amount of hand-crafted data. We evaluate the proposed network based on i) the similarity between the generated motion and the baseline dancer's motion, using a cross entropy measure for large dance sequences, and ii) accurate timing between the music and motion beat, using an F-measure. Experimental results revealed that, after training on a small dataset, the model generates basic dance steps with low cross entropy and maintains an F-measure score similar to that of a baseline dancer.
Tasks	Motion Estimation
Published	2018-07-03
URL	https://arxiv.org/abs/1807.01126v3
PDF	https://arxiv.org/pdf/1807.01126v3.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-deep-recurrent-neural
Repo	https://github.com/audiofhrozen/motion_dance
Framework	none

Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments


Title	Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
Authors	Łukasz Kidziński, Sharada Prasanna Mohanty, Carmichael Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, Sergey Plis, Zhibo Chen, Zhizheng Zhang, Jiale Chen, Jun Shi, Zhuobin Zheng, Chun Yuan, Zhihui Lin, Henryk Michalewski, Piotr Miłoś, Błażej Osiński, Andrew Melnik, Malte Schilling, Helge Ritter, Sean Carroll, Jennifer Hicks, Sergey Levine, Marcel Salathé, Scott Delp
Abstract	In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.
Tasks
Published	2018-04-02
URL	http://arxiv.org/abs/1804.00361v1
PDF	http://arxiv.org/pdf/1804.00361v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-run-challenge-solutions-adapting
Repo	https://github.com/AdamStelmaszczyk/learning2run
Framework	none
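
Frame skipping, one of the heuristics the abstract lists, is easy to picture in code. The block below is a minimal sketch and not taken from any of the eight solutions: it assumes a Gym-style environment whose `step` returns `(observation, reward, done, info)`, and the names `FrameSkip` and `CountingEnv` are hypothetical, introduced only for illustration.

```python
class FrameSkip:
    """Repeat each chosen action for `skip` consecutive simulator steps.

    Assumes a Gym-style environment whose step() returns
    (observation, reward, done, info). Hypothetical wrapper for illustration.
    """

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # accumulate reward over the skipped frames
            if done:
                break  # stop early if the episode ends mid-skip
        return obs, total_reward, done, info


class CountingEnv:
    """Hypothetical toy environment: reward 1.0 per step, ends after 10 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, {}
```

With `skip=4`, the agent chooses an action a quarter as often, which shortens the effective horizon the policy has to reason over at the cost of coarser control.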

Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning


Title	Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning
Authors	Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Akiko Aizawa, Kiyoharu Aizawa
Abstract	Face hallucination is a technique that reconstructs high-resolution (HR) faces from low-resolution (LR) faces by using the prior knowledge learned from HR/LR face pairs. Most state-of-the-art methods leverage position-patch prior knowledge of the human face to estimate the optimal representation coefficients for each image patch. However, they focus only on the position information and usually ignore the context information of the image patch. In addition, when they are confronted with misalignment or the Small Sample Size (SSS) problem, the hallucination performance is very poor. To this end, this study incorporates the contextual information of the image patch and proposes a powerful and efficient context-patch based face hallucination approach, namely Thresholding Locality-constrained Representation and Reproducing learning (TLcR-RL). Under the context-patch based framework, we advance a thresholding based representation method to enhance the reconstruction accuracy and reduce the computational complexity. To further improve the performance of the proposed algorithm, we propose a promotion strategy called reproducing learning. Adding the estimated HR face to the training set simulates the case in which the HR version of the input LR face is present in the training set, thus iteratively enhancing the final hallucination result. Experiments demonstrate that the proposed TLcR-RL method achieves a substantial increase in the hallucinated results, both subjectively and objectively. Additionally, the proposed framework is more robust to face misalignment and the SSS problem, and its hallucinated HR face is still very good when the LR test face is from the real world. The MATLAB source code is available at https://github.com/junjun-jiang/TLcR-RL
Tasks	Face Hallucination
Published	2018-09-03
URL	http://arxiv.org/abs/1809.00665v2
PDF	http://arxiv.org/pdf/1809.00665v2.pdf
PWC	https://paperswithcode.com/paper/context-patch-face-hallucination-based-on
Repo	https://github.com/junjun-jiang/TLcR-RL
Framework	none

Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination


Title	Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination
Authors	Junjun Jiang, Yi Yu, Jinhui Hu, Suhua Tang, Jiayi Ma
Abstract	Most of the current face hallucination methods, whether they are shallow learning-based or deep learning-based, try to learn a relationship model between Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a training set. They mainly focus on modeling the image prior through either model-based optimization or discriminative inference learning. However, when the input LR face is tiny, the learned prior knowledge is no longer effective and their performance will drop sharply. To solve this problem, in this paper we propose a general face hallucination method that can integrate model-based optimization and discriminative inference. In particular, to exploit the model-based prior, the Deep Convolutional Neural Networks (CNN) denoiser prior is plugged into the super-resolution optimization model with the aid of image-adaptive Laplacian regularization. Additionally, we further develop a high-frequency details compensation method by dividing the face image into facial components and performing face hallucination in a multi-layer neighbor embedding manner. Experiments demonstrate that the proposed method can achieve promising super-resolution results for tiny input LR faces.
Tasks	Face Hallucination, Super-Resolution
Published	2018-06-28
URL	http://arxiv.org/abs/1806.10726v1
PDF	http://arxiv.org/pdf/1806.10726v1.pdf
PWC	https://paperswithcode.com/paper/deep-cnn-denoiser-and-multi-layer-neighbor
Repo	https://github.com/ZoieMo/Multi-task
Framework	none

Deep Reinforcement Learning for Playing 2.5D Fighting Games


Title	Deep Reinforcement Learning for Playing 2.5D Fighting Games
Authors	Yu-Jhe Li, Hsin-Yu Chang, Yu-Jing Lin, Po-Wei Wu, Yu-Chiang Frank Wang
Abstract	Deep reinforcement learning has shown its success in game playing. However, 2.5D fighting games would be a challenging task to handle due to ambiguity in visual appearances like height or depth of the characters. Moreover, actions in such games typically involve particular sequential action orders, which also makes the network design very difficult. Based on the network of Asynchronous Advantage Actor-Critic (A3C), we create an OpenAI-gym-like gaming environment with the game of Little Fighter 2 (LF2), and present a novel A3C+ network for learning RL agents. The introduced model includes a Recurrent Info network, which utilizes game-related info features with recurrent layers to observe combo skills for fighting. In the experiments, we consider LF2 in different settings, which successfully demonstrates the use of our proposed model for learning 2.5D fighting games.
Tasks
Published	2018-05-05
URL	http://arxiv.org/abs/1805.02070v1
PDF	http://arxiv.org/pdf/1805.02070v1.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-playing-25d
Repo	https://github.com/TobiKick/DL_LF2
Framework	none

Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer


Title	Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer
Authors	Kanishka Rao, Haşim Sak, Rohit Prabhavalkar
Abstract	We investigate training end-to-end speech recognition models with the recurrent neural network transducer (RNN-T): a streaming, all-neural, sequence-to-sequence architecture which jointly learns acoustic and language model components from transcribed acoustic data. We explore various model architectures and demonstrate how the model can be improved further if additional text or pronunciation data are available. The model consists of an 'encoder', which is initialized from a connectionist temporal classification-based (CTC) acoustic model, and a 'decoder', which is partially initialized from a recurrent neural network language model trained on text data alone. The entire neural network is trained with the RNN-T loss and directly outputs the recognized transcript as a sequence of graphemes, thus performing end-to-end speech recognition. We find that performance can be improved further through the use of sub-word units ('wordpieces'), which capture longer context and significantly reduce substitution errors. The best RNN-T system, a twelve-layer LSTM encoder with a two-layer LSTM decoder trained with 30,000 wordpieces as output targets, achieves a word error rate of 8.5% on voice-search and 5.2% on voice-dictation tasks and is comparable to a state-of-the-art baseline at 8.3% on voice-search and 5.4% on voice-dictation.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published	2018-01-02
URL	http://arxiv.org/abs/1801.00841v1
PDF	http://arxiv.org/pdf/1801.00841v1.pdf
PWC	https://paperswithcode.com/paper/exploring-architectures-data-and-units-for
Repo	https://github.com/ZhengkunTian/Speech-Recognition-Paper-List
Framework	none
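
The results in this abstract are reported as word error rate (WER). As a reminder of what that number measures, here is a minimal sketch of the standard definition: the word-level edit distance between hypothesis and reference, divided by the reference length. The function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and the reference is assumed to be non-empty; the paper's own scoring pipeline is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization and a non-empty reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "play sum jazz" against the reference "play some jazz music" counts one substitution and one deletion over four reference words.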

Pre-gen metrics: Predicting caption quality metrics without generating captions


Title	Pre-gen metrics: Predicting caption quality metrics without generating captions
Authors	Marc Tanti, Albert Gatt, Adrian Muscat
Abstract	Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated to standard evaluation metrics.
Tasks
Published	2018-10-12
URL	http://arxiv.org/abs/1810.05474v1
PDF	http://arxiv.org/pdf/1810.05474v1.pdf
PWC	https://paperswithcode.com/paper/pre-gen-metrics-predicting-caption-quality
Repo	https://github.com/mtanti/pregen-metrics
Framework	tf
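
The idea of scoring a model by the probability it assigns to reference captions can be sketched as follows. This is an illustration only: the abstract does not give the exact metric definitions, so perplexity is used here as one common probability-based score, and the function names and the input format (a list of per-token probabilities for each reference caption) are assumptions.

```python
import math


def caption_log_prob(token_probs):
    """Log-probability of one reference caption from its per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def corpus_perplexity(captions_token_probs):
    """Perplexity of the model over all reference captions.

    `captions_token_probs` holds, for each reference caption, the probability
    the model assigned to each of its tokens (hypothetical input format).
    """
    total_log_prob = 0.0
    total_tokens = 0
    for token_probs in captions_token_probs:
        total_log_prob += caption_log_prob(token_probs)
        total_tokens += len(token_probs)
    return math.exp(-total_log_prob / total_tokens)
```

No caption is decoded at any point; the model is only asked how likely it finds captions that already exist, which is what makes such a score cheap to compute.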

Evaluating the squared-exponential covariance function in Gaussian processes with integral observations


Title	Evaluating the squared-exponential covariance function in Gaussian processes with integral observations
Authors	J. N. Hendriks, C. Jidling, A. Wills, T. B. Schön
Abstract	This paper deals with the evaluation of double line integrals of the squared exponential covariance function. We propose a new approach in which the double integral is reduced to a single integral using the error function. This single integral is then computed with efficiently implemented numerical techniques. The performance is compared against existing state of the art methods and the results show superior properties in numerical robustness and accuracy per computation time.
Tasks	Gaussian Processes
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07319v1
PDF	http://arxiv.org/pdf/1812.07319v1.pdf
PWC	https://paperswithcode.com/paper/evaluating-the-squared-exponential-covariance
Repo	https://github.com/jnh277/lineIntSquaredExponential
Framework	none
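
The reduction from a double integral to a single integral via the error function can be illustrated in a simplified one-dimensional setting. The sketch below is not the authors' implementation: it assumes the kernel k(s, t) = exp(-(s - t)^2 / (2 l^2)) on scalar inputs rather than line integrals in higher dimensions, uses Simpson's rule as a stand-in for the "efficiently implemented numerical techniques", and all function names are hypothetical. The inner integral over t has the closed form l * sqrt(pi/2) * [erf((d - s) / (l * sqrt(2))) - erf((c - s) / (l * sqrt(2)))].

```python
import math


def inner_integral(s, c, d, length_scale):
    """Closed form of the integral of exp(-(s - t)^2 / (2 l^2)) over t in [c, d]."""
    scale = length_scale * math.sqrt(2.0)
    return length_scale * math.sqrt(math.pi / 2.0) * (
        math.erf((d - s) / scale) - math.erf((c - s) / scale)
    )


def double_integral_se(a, b, c, d, length_scale, n=200):
    """Double integral of the kernel over [a, b] x [c, d].

    The inner integral is evaluated analytically, so only a single
    integral over s remains; Simpson's rule handles it here.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = inner_integral(a, c, d, length_scale) + inner_integral(b, c, d, length_scale)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * inner_integral(a + i * h, c, d, length_scale)
    return total * h / 3.0


def double_integral_brute_force(a, b, c, d, length_scale, n=300):
    """Reference value from a two-dimensional midpoint rule (slow, for checking)."""
    hs, ht = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        s = a + (i + 0.5) * hs
        for j in range(n):
            t = c + (j + 0.5) * ht
            total += math.exp(-((s - t) ** 2) / (2.0 * length_scale ** 2))
    return total * hs * ht
```

The single-integral version needs on the order of n kernel-free evaluations, while the brute-force check needs n squared kernel evaluations, which is the kind of saving the reduction is after.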

Our Practice Of Using Machine Learning To Recognize Species By Voice


Title	Our Practice Of Using Machine Learning To Recognize Species By Voice
Authors	Siddhardha Balemarthy, Atul Sajjanhar, James Xi Zheng
Abstract	As technology advances, audio recognition in machine learning improves as well. Research in audio recognition has traditionally focused on speech. Living creatures (especially small ones) are part of the whole ecosystem, and monitoring and maintaining them are important tasks. Species such as animals and birds tend to change their activities and habitats due to adverse effects on the environment or other natural or man-made calamities. For those in remote, deserted areas, we will have no idea of their existence unless we can monitor them continuously. Continuous monitoring takes a lot of hard work and labor. Without continuous monitoring, there may be instances where endangered species encounter dangerous situations. The best way to monitor those species is through audio recognition. Classifying sound can be a difficult task even for humans. Powerful audio signal processing techniques make it possible to detect the audio of various species. Audio recognition can be done in many ways. We can train machines either on pre-recorded audio files or by recording audio live and detecting species in it. The audio of species can be detected by removing all the background noise and echoes. The smallest unit of sound is considered a syllable. Extracting the various syllables is the process we focus on, which is known as audio recognition in terms of Machine Learning (ML).
Tasks
Published	2018-10-22
URL	http://arxiv.org/abs/1810.09078v1
PDF	http://arxiv.org/pdf/1810.09078v1.pdf
PWC	https://paperswithcode.com/paper/our-practice-of-using-machine-learning-to
Repo	https://github.com/siyangBai/twittering_sparkles
Framework	none

Generalization of Equilibrium Propagation to Vector Field Dynamics


Title	Generalization of Equilibrium Propagation to Vector Field Dynamics
Authors	Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio
Abstract	The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed point recurrent networks that addresses both these issues. In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism. Our learning method generalizes Equilibrium Propagation to vector field dynamics, relaxing the requirement of an energy function. As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function, but rather approximates it at a precision which is proven to be directly related to the degree of symmetry of the feedforward and feedback weights. We show experimentally that our algorithm optimizes the objective function.
Tasks
Published	2018-08-14
URL	http://arxiv.org/abs/1808.04873v1
PDF	http://arxiv.org/pdf/1808.04873v1.pdf
PWC	https://paperswithcode.com/paper/generalization-of-equilibrium-propagation-to
Repo	https://github.com/musyoku/equilibrium-propagation
Framework	none

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks


Title	Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Authors	Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Quanquan Gu
Abstract	Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD) with momentum in training deep neural networks. This leaves how to close the generalization gap of adaptive gradient methods an open problem. In this work, we show that adaptive gradient methods such as Adam, Amsgrad, are sometimes “over adapted”. We design a new algorithm, called Partially adaptive momentum estimation method, which unifies the Adam/Amsgrad with SGD by introducing a partial adaptive parameter $p$, to achieve the best from both worlds. We also prove the convergence rate of our proposed algorithm to a stationary point in the stochastic nonconvex optimization setting. Experiments on standard benchmarks show that our proposed algorithm can maintain fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks. These results would suggest practitioners pick up adaptive gradient methods once again for faster training of deep neural networks.
Tasks
Published	2018-06-18
URL	https://arxiv.org/abs/1806.06763v2
PDF	https://arxiv.org/pdf/1806.06763v2.pdf
PWC	https://paperswithcode.com/paper/closing-the-generalization-gap-of-adaptive
Repo	https://github.com/thughost2/Padam
Framework	pytorch
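
The role of the partial adaptive parameter $p$ is easiest to see in the update rule itself. The block below is a minimal pure-Python sketch, not the code from the linked repository: it applies an AMSGrad-style update but raises the second-moment estimate to the power `p` instead of taking a square root. The function name `padam_step`, the default hyperparameters, and the small `eps` added for numerical stability are assumptions made for illustration.

```python
def padam_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One partially adaptive update on a list of scalar parameters.

    state = (m, v, v_hat): first moment, second moment, and running maximum
    of the second moment. p = 0.5 gives an AMSGrad-style step; p = 0 gives
    SGD with (exponential moving average) momentum.
    """
    m, v, v_hat = state
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for th, g, m_i, v_i, vh_i in zip(theta, grad, m, v, v_hat):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # momentum
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # second-moment estimate
        vh_i = max(vh_i, v_i)                      # AMSGrad-style running maximum
        th = th - lr * m_i / (vh_i ** p + eps)     # partially adaptive step
        new_theta.append(th)
        new_m.append(m_i)
        new_v.append(v_i)
        new_v_hat.append(vh_i)
    return new_theta, (new_m, new_v, new_v_hat)


def minimize_quadratic(steps=500, p=0.125):
    """Usage example: minimize f(x) = x^2 starting from x = 3."""
    theta = [3.0]
    state = ([0.0], [0.0], [0.0])
    for _ in range(steps):
        grad = [2.0 * theta[0]]
        theta, state = padam_step(theta, grad, state, p=p)
    return theta[0]
```

Intermediate values of `p` dampen how strongly the per-coordinate scaling reacts to the gradient history, which is the sense in which the method interpolates between the two families.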

Inverting The Generator Of A Generative Adversarial Network (II)


Title	Inverting The Generator Of A Generative Adversarial Network (II)
Authors	Antonia Creswell, Anil A Bharath
Abstract	Generative adversarial networks (GANs) learn a deep generative model that is able to synthesise novel, high-dimensional data samples. New data samples are synthesised by passing latent samples, drawn from a chosen prior distribution, through the generative model. Once trained, the latent space exhibits interesting properties that may be useful for downstream tasks such as classification or retrieval. Unfortunately, GANs do not offer an “inverse model”, a mapping from data space back to latent space, making it difficult to infer a latent representation for a given data sample. In this paper, we introduce a technique, inversion, to project data samples, specifically images, to the latent space using a pre-trained GAN. Using our proposed inversion technique, we are able to identify which attributes of a dataset a trained GAN is able to model and quantify GAN performance, based on a reconstruction loss. We demonstrate how our proposed inversion technique may be used to quantitatively compare performance of various GAN models trained on three image datasets. We provide code for all of our experiments, https://github.com/ToniCreswell/InvertingGAN.
Tasks
Published	2018-02-15
URL	http://arxiv.org/abs/1802.05701v1
PDF	http://arxiv.org/pdf/1802.05701v1.pdf
PWC	https://paperswithcode.com/paper/inverting-the-generator-of-a-generative-1
Repo	https://github.com/ToniCreswell/InvertingGAN
Framework	pytorch
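
Inversion by minimizing a reconstruction loss can be sketched on a toy problem. The block below is an illustration under strong assumptions, not the authors' code: the "generator" is a fixed linear map standing in for a pre-trained GAN, the loss is squared error, and the names `generate` and `invert` are hypothetical. The point is only the mechanism: hold the generator fixed and run gradient descent on the latent vector until the generated sample matches the target.

```python
def generate(weights, z):
    """Toy stand-in for a pre-trained generator: a fixed linear map G(z) = W z."""
    return [sum(w * z_j for w, z_j in zip(row, z)) for row in weights]


def invert(weights, x, steps=500, lr=0.05):
    """Recover a latent vector z by gradient descent on ||G(z) - x||^2.

    The generator weights stay fixed; only z is updated.
    """
    latent_dim = len(weights[0])
    z = [0.0] * latent_dim
    for _ in range(steps):
        residual = [g_i - x_i for g_i, x_i in zip(generate(weights, z), x)]
        # Gradient of the squared error with respect to z is 2 * W^T (W z - x)
        grad = [
            2.0 * sum(weights[i][j] * residual[i] for i in range(len(x)))
            for j in range(latent_dim)
        ]
        z = [z_j - lr * g_j for z_j, g_j in zip(z, grad)]
    return z


def reconstruction_loss(weights, z, x):
    """Squared error between the generated sample and the target."""
    return sum((g_i - x_i) ** 2 for g_i, x_i in zip(generate(weights, z), x))
```

The final value of `reconstruction_loss` is the kind of quantity the abstract proposes for comparing models: a sample the generator cannot reproduce well leaves a large residual even after optimization.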