Paper Group AWR 144
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations. XNMT: The eXtensible Neural Machine Translation Toolkit. No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques. Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation. Learning to Run challenge …
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations
Title | GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations |
Authors | Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun |
Abstract | Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels. |
Tasks | Image Classification, Natural Language Inference, Question Answering, Sentiment Analysis, Transfer Learning, Word Embeddings |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05662v3 |
http://arxiv.org/pdf/1806.05662v3.pdf | |
PWC | https://paperswithcode.com/paper/glomo-unsupervisedly-learned-relational |
Repo | https://github.com/YJHMITWEB/GLoMo-tensorflow |
Framework | tf |
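A minimal sketch of the paper's central object: a latent affinity graph over input units that is learned once and then reused to mix *different* downstream embeddings. The bilinear key/query parameterisation and all dimensions below are illustrative simplifications, not GLoMo's actual CNN-based affinity networks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def learn_graph(X, Wk, Wq):
    """Affinity graph over the T units in X (rows): G[i, j] is the weight
    unit i places on unit j when mixing features."""
    scores = (X @ Wq) @ (X @ Wk).T      # (T, T) pairwise affinities
    return softmax(scores, axis=-1)     # each row is a distribution

def transfer(G, E):
    """The transfer step: mix downstream embeddings E -- which the graph
    was never trained on -- using the learned graph G."""
    return G @ E

rng = np.random.default_rng(0)
T, d = 6, 8                              # sequence length, feature dim
X = rng.normal(size=(T, d))              # units the graph is learned from
Wk, Wq = rng.normal(size=(d, d)), rng.normal(size=(d, d))
G = learn_graph(X, Wk, Wq)
E = rng.normal(size=(T, 16))             # e.g. GloVe vectors for the same tokens
print(transfer(G, E).shape)              # (6, 16)
```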
XNMT: The eXtensible Neural Machine Translation Toolkit
Title | XNMT: The eXtensible Neural Machine Translation Toolkit |
Authors | Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, Liming Wang |
Abstract | This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of machine translation, speech recognition, and multi-tasked machine translation/parsing. XNMT is available open-source at https://github.com/neulab/xnmt |
Tasks | Machine Translation, Speech Recognition |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00188v1 |
http://arxiv.org/pdf/1803.00188v1.pdf | |
PWC | https://paperswithcode.com/paper/xnmt-the-extensible-neural-machine |
Repo | https://github.com/neulab/xnmt |
Framework | none |
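XNMT's real experiments are driven by YAML configuration files; the sketch below only illustrates the modular design goal the abstract emphasises, where components are assembled from a declarative spec so they can be swapped without touching code. All class names and config keys are hypothetical, not XNMT's actual API.

```python
from dataclasses import dataclass

@dataclass
class Encoder:
    layers: int
    hidden_dim: int

@dataclass
class Decoder:
    layers: int
    hidden_dim: int

@dataclass
class Experiment:
    task: str
    encoder: Encoder
    decoder: Decoder

CONFIG = {  # mirrors the shape of a YAML experiment spec
    "task": "machine_translation",   # or speech recognition, multi-task...
    "encoder": {"layers": 2, "hidden_dim": 512},
    "decoder": {"layers": 1, "hidden_dim": 512},
}

def build(config: dict) -> Experiment:
    """Assemble an experiment from a declarative spec, so components are
    swapped by editing configuration rather than code."""
    return Experiment(
        task=config["task"],
        encoder=Encoder(**config["encoder"]),
        decoder=Decoder(**config["decoder"]),
    )

print(build(CONFIG))
```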
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
Title | No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques |
Authors | Tanmay Gupta, Alexander Schwing, Derek Hoiem |
Abstract | We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset. |
Tasks | Human-Object Interaction Detection |
Published | 2018-11-14 |
URL | https://arxiv.org/abs/1811.05967v2 |
https://arxiv.org/pdf/1811.05967v2.pdf | |
PWC | https://paperswithcode.com/paper/no-frills-human-object-interaction-detection |
Repo | https://github.com/BigRedT/no_frills_hoi_det |
Framework | pytorch |
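The factorisation the abstract describes can be written in a few lines: pretrained detector confidences multiply a sigmoid over the *sum* of interaction-factor logits. The numbers below are made up for illustration; the repo is the reference for the actual factor definitions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hoi_score(human_det, object_det, factor_logits):
    """Factorised human-object interaction score, in the spirit of the
    paper: detector confidences gate a sigmoid over summed factor logits."""
    return human_det * object_det * sigmoid(sum(factor_logits))

score = hoi_score(
    human_det=0.92,
    object_det=0.85,
    factor_logits=[1.3,   # human appearance factor
                   0.4,   # object appearance factor
                   0.9,   # coarse box-pair layout factor
                   -0.2], # fine-grained (pose) layout factor
)
print(f"P(interaction) = {score:.3f}")
```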
Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation
Title | Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation |
Authors | Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata |
Abstract | Synthesizing human movements such as dancing is a flourishing research field with several applications in computer graphics. Recent studies have demonstrated the advantages of deep neural networks (DNNs) for achieving remarkable performance in motion and music tasks with little effort for feature pre-processing. However, applying DNNs to generating dance for a piece of music remains challenging, because 1) DNNs need to generate large sequences while mapping the music input, 2) the DNN needs to constrain the motion beat to the music, and 3) DNNs require a considerable amount of hand-crafted data. In this study, we propose a weakly supervised deep recurrent method for real-time basic dance generation with the audio power spectrum as input. The proposed model employs convolutional layers and a multilayered Long Short-Term Memory (LSTM) to process the audio input. Then, another deep LSTM layer decodes the target dance sequence. Notably, this end-to-end approach 1) has an auto-conditioned decoding configuration that reduces the accumulation of feedback error over long dance sequences, 2) uses a contrastive cost function to regulate the mapping between music and motion beats, and 3) trains with weak labels generated from the motion beat, reducing the amount of hand-crafted data needed. We evaluate the proposed network on i) the similarity between the generated motion and the baseline dancer's motion, using a cross-entropy measure over long dance sequences, and ii) the timing between music and motion beats, using an F-measure. Experimental results reveal that, after training on a small dataset, the model generates basic dance steps with low cross entropy and maintains an F-measure score similar to that of a baseline dancer. |
Tasks | Motion Estimation |
Published | 2018-07-03 |
URL | https://arxiv.org/abs/1807.01126v3 |
https://arxiv.org/pdf/1807.01126v3.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-deep-recurrent-neural |
Repo | https://github.com/audiofhrozen/motion_dance |
Framework | none |
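A hedged sketch of the auto-conditioned decoding idea: during training, the LSTM is periodically fed its own predictions instead of ground truth, so feedback errors on long sequences stop compounding. Block lengths, dimensions, and the power-spectrum input shape are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class AutoConditionedDecoder(nn.Module):
    """LSTM decoder that alternates blocks of ground-truth frames with
    blocks of its own outputs as the conditioning input."""
    def __init__(self, audio_dim=64, motion_dim=30, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim + motion_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, motion_dim)

    def forward(self, audio, motion_gt, self_len=4, gt_len=4):
        B, T, _ = audio.shape
        prev = torch.zeros(B, motion_gt.size(-1))
        state, outputs = None, []
        for t in range(T):
            x = torch.cat([audio[:, t], prev], dim=-1).unsqueeze(1)
            h, state = self.lstm(x, state)
            pred = self.out(h[:, 0])
            outputs.append(pred)
            # feed back the model's own output during "self" blocks
            in_self_block = (t % (self_len + gt_len)) < self_len
            prev = pred.detach() if in_self_block else motion_gt[:, t]
        return torch.stack(outputs, dim=1)

dec = AutoConditionedDecoder()
audio = torch.randn(2, 16, 64)      # power-spectrum frames
motion = torch.randn(2, 16, 30)     # target joint coordinates
print(dec(audio, motion).shape)     # torch.Size([2, 16, 30])
```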
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
Title | Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments |
Authors | Łukasz Kidziński, Sharada Prasanna Mohanty, Carmichael Ong, Zhewei Huang, Shuchang Zhou, Anton Pechenko, Adam Stelmaszczyk, Piotr Jarosik, Mikhail Pavlov, Sergey Kolesnikov, Sergey Plis, Zhibo Chen, Zhizheng Zhang, Jiale Chen, Jun Shi, Zhuobin Zheng, Chun Yuan, Zhihui Lin, Henryk Michalewski, Piotr Miłoś, Błażej Osiński, Andrew Melnik, Malte Schilling, Helge Ritter, Sean Carroll, Jennifer Hicks, Sergey Levine, Marcel Salathé, Scott Delp |
Abstract | In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. |
Tasks | |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00361v1 |
http://arxiv.org/pdf/1804.00361v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-run-challenge-solutions-adapting |
Repo | https://github.com/AdamStelmaszczyk/learning2run |
Framework | none |
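Two of the shared heuristics the abstract lists, frame skipping and reward shaping, fit naturally into a gym-style wrapper. The observation index used for the velocity bonus and the `DummyEnv` stand-in are assumptions; the real environment is the challenge's osim-rl package.

```python
import numpy as np

class DummyEnv:
    """Stand-in so the wrapper can be exercised without osim-rl installed."""
    def reset(self):
        return np.zeros(41)
    def step(self, action):
        return np.random.default_rng(0).normal(size=41), 0.01, False, {}

class FrameSkipShapedEnv:
    """Frame skipping + reward shaping. obs[4] as the pelvis forward
    velocity is a hypothetical index, not the challenge's actual layout."""
    def __init__(self, env, skip=4, velocity_bonus=0.1):
        self.env, self.skip, self.velocity_bonus = env, skip, velocity_bonus

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self.skip):                    # repeat the action
            obs, reward, done, info = self.env.step(action)
            reward += self.velocity_bonus * obs[4]    # shaped bonus
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

env = FrameSkipShapedEnv(DummyEnv())
obs = env.reset()
obs, r, done, info = env.step(np.zeros(18))   # 18 muscle activations
print(r)
```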
Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning
Title | Context-Patch Face Hallucination Based on Thresholding Locality-constrained Representation and Reproducing Learning |
Authors | Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Akiko Aizawa, Kiyoharu Aizawa |
Abstract | Face hallucination is a technique that reconstructs high-resolution (HR) faces from low-resolution (LR) faces, using prior knowledge learned from HR/LR face pairs. Most state-of-the-art methods leverage position-patch prior knowledge of the human face to estimate the optimal representation coefficients for each image patch. However, they focus only on the position information and usually ignore the context information of the image patch. In addition, when confronted with misalignment or the Small Sample Size (SSS) problem, their hallucination performance degrades severely. To this end, this study incorporates the contextual information of the image patch and proposes a powerful and efficient context-patch based face hallucination approach, namely Thresholding Locality-constrained Representation and Reproducing learning (TLcR-RL). Under the context-patch based framework, we advance a thresholding-based representation method to enhance reconstruction accuracy and reduce computational complexity. To further improve the performance of the proposed algorithm, we propose a promotion strategy called reproducing learning: by adding the estimated HR face to the training set, which simulates the case where the HR version of the input LR face is present in the training set, the final hallucination result is iteratively enhanced. Experiments demonstrate that the proposed TLcR-RL method achieves a substantial improvement in hallucination results, both subjectively and objectively. Additionally, the proposed framework is more robust to face misalignment and the SSS problem, and its hallucinated HR faces remain good when the LR test face comes from the real world. The MATLAB source code is available at https://github.com/junjun-jiang/TLcR-RL |
Tasks | Face Hallucination |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00665v2 |
http://arxiv.org/pdf/1809.00665v2.pdf | |
PWC | https://paperswithcode.com/paper/context-patch-face-hallucination-based-on |
Repo | https://github.com/junjun-jiang/TLcR-RL |
Framework | none |
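A numpy sketch of the thresholding locality-constrained representation step, without the reproducing-learning outer loop: keep only training context patches within a distance threshold, solve a regularised least-squares for combination weights, and apply the same weights to the aligned HR dictionary. Parameter names and the plain ridge regulariser are simplifications of the paper's formulation; the authors' MATLAB code is the reference.

```python
import numpy as np

def tlcr_hallucinate(lr_patch, lr_dict, hr_dict, tau=7.0, lam=1e-3):
    """Reconstruct one HR patch from an LR patch via thresholded,
    locality-constrained least squares over context patches."""
    dist = np.linalg.norm(lr_dict - lr_patch, axis=1)
    keep = dist <= tau                      # thresholding step
    if not keep.any():                      # fall back to nearest patch
        keep = dist == dist.min()
    D = lr_dict[keep]                       # (K, d_lr) retained patches
    A = D @ D.T + lam * np.eye(D.shape[0])
    w = np.linalg.solve(A, D @ lr_patch)    # combination coefficients
    w /= w.sum()                            # crude sum-to-one normalisation
    return w @ hr_dict[keep]                # reconstructed HR patch

rng = np.random.default_rng(1)
lr_dict = rng.normal(size=(500, 25))        # 5x5 LR context patches
hr_dict = rng.normal(size=(500, 100))       # 10x10 HR counterparts
lr_patch = rng.normal(size=25)
print(tlcr_hallucinate(lr_patch, lr_dict, hr_dict).shape)  # (100,)
```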
Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination
Title | Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination |
Authors | Junjun Jiang, Yi Yu, Jinhui Hu, Suhua Tang, Jiayi Ma |
Abstract | Most current face hallucination methods, whether shallow learning-based or deep learning-based, try to learn a relationship model between Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a training set. They mainly focus on modeling an image prior through either model-based optimization or discriminative inference learning. However, when the input LR face is tiny, the learned prior knowledge is no longer effective and their performance drops sharply. To solve this problem, in this paper we propose a general face hallucination method that can integrate model-based optimization and discriminative inference. In particular, to exploit the model-based prior, a Deep Convolutional Neural Network (CNN) denoiser prior is plugged into the super-resolution optimization model with the aid of image-adaptive Laplacian regularization. Additionally, we develop a high-frequency detail compensation method by dividing the face image into facial components and performing face hallucination in a multi-layer neighbor embedding manner. Experiments demonstrate that the proposed method achieves promising super-resolution results for tiny input LR faces. |
Tasks | Face Hallucination, Super-Resolution |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10726v1 |
http://arxiv.org/pdf/1806.10726v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-cnn-denoiser-and-multi-layer-neighbor |
Repo | https://github.com/ZoieMo/Multi-task |
Framework | none |
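The "denoiser as prior" optimisation can be sketched as half-quadratic splitting on a toy 1-D problem: alternate a least-squares data-fidelity step with a denoising step. The paper plugs in a deep CNN denoiser and works on face images; a Gaussian filter stands in here so the example runs self-contained.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def plug_and_play_sr(y, scale=4, mu=0.5, iters=30):
    """Half-quadratic-splitting super-resolution sketch: the denoiser acts
    as the prior, the linear solve enforces consistency with y."""
    n = len(y) * scale
    A = np.zeros((len(y), n))
    A[np.arange(len(y)), np.arange(len(y)) * scale] = 1.0   # decimation
    x = np.repeat(y, scale)                                 # crude init
    lhs = A.T @ A + mu * np.eye(n)
    for _ in range(iters):
        z = gaussian_filter1d(x, sigma=1.0)         # denoiser prior step
        x = np.linalg.solve(lhs, A.T @ y + mu * z)  # data-fidelity step
    return x

y = np.sin(np.linspace(0, 3, 16))     # toy low-resolution signal
print(plug_and_play_sr(y).shape)      # (64,)
```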
Deep Reinforcement Learning for Playing 2.5D Fighting Games
Title | Deep Reinforcement Learning for Playing 2.5D Fighting Games |
Authors | Yu-Jhe Li, Hsin-Yu Chang, Yu-Jing Lin, Po-Wei Wu, Yu-Chiang Frank Wang |
Abstract | Deep reinforcement learning has shown its success in game playing. However, 2.5D fighting games pose a challenging task due to ambiguity in visual appearance, such as the height or depth of the characters. Moreover, actions in such games typically involve particular sequential action orders, which also makes the network design very difficult. Based on the network of Asynchronous Advantage Actor-Critic (A3C), we create an OpenAI-gym-like gaming environment with the game of Little Fighter 2 (LF2), and present a novel A3C+ network for learning RL agents. The introduced model includes a Recurrent Info network, which utilizes game-related info features with recurrent layers to observe combo skills for fighting. In the experiments, we consider LF2 in different settings, which successfully demonstrates the use of our proposed model for learning 2.5D fighting games. |
Tasks | |
Published | 2018-05-05 |
URL | http://arxiv.org/abs/1805.02070v1 |
http://arxiv.org/pdf/1805.02070v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-playing-25d |
Repo | https://github.com/TobiKick/DL_LF2 |
Framework | none |
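A hedged sketch of the A3C+ forward pass described above: convolutional features from the frame are concatenated with the last state of a "Recurrent Info" LSTM over game-state features before the actor and critic heads. All dimensions, layer choices, and the info-feature contents are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class A3CPlus(nn.Module):
    """Vision branch + recurrent info branch feeding actor/critic heads."""
    def __init__(self, n_actions=8, info_dim=16, hidden=64):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(), nn.Flatten())
        self.info_rnn = nn.LSTM(info_dim, hidden, batch_first=True)
        self.policy = nn.LazyLinear(n_actions)   # actor head
        self.value = nn.LazyLinear(1)            # critic head

    def forward(self, frames, info_seq):
        v = self.vision(frames)                  # (B, conv features)
        _, (h, _) = self.info_rnn(info_seq)      # last hidden state
        joint = torch.cat([v, h[-1]], dim=-1)
        return self.policy(joint), self.value(joint)

net = A3CPlus()
logits, value = net(torch.randn(2, 3, 84, 84),   # game frames
                    torch.randn(2, 10, 16))      # HP/positions/action history
print(logits.shape, value.shape)   # torch.Size([2, 8]) torch.Size([2, 1])
```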
Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer
Title | Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer |
Authors | Kanishka Rao, Haşim Sak, Rohit Prabhavalkar |
Abstract | We investigate training end-to-end speech recognition models with the recurrent neural network transducer (RNN-T): a streaming, all-neural, sequence-to-sequence architecture which jointly learns acoustic and language model components from transcribed acoustic data. We explore various model architectures and demonstrate how the model can be improved further if additional text or pronunciation data are available. The model consists of an “encoder”, which is initialized from a connectionist temporal classification-based (CTC) acoustic model, and a “decoder”, which is partially initialized from a recurrent neural network language model trained on text data alone. The entire neural network is trained with the RNN-T loss and directly outputs the recognized transcript as a sequence of graphemes, thus performing end-to-end speech recognition. We find that performance can be improved further through the use of sub-word units (“wordpieces”) which capture longer context and significantly reduce substitution errors. The best RNN-T system, a twelve-layer LSTM encoder with a two-layer LSTM decoder trained with 30,000 wordpieces as output targets, achieves a word error rate of 8.5% on voice-search and 5.2% on voice-dictation tasks, and is comparable to a state-of-the-art baseline at 8.3% on voice-search and 5.4% on voice-dictation. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2018-01-02 |
URL | http://arxiv.org/abs/1801.00841v1 |
http://arxiv.org/pdf/1801.00841v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-architectures-data-and-units-for |
Repo | https://github.com/ZhengkunTian/Speech-Recognition-Paper-List |
Framework | none |
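The joint network at the heart of the RNN-T, sketched in numpy: each (t, u) cell combines one encoder frame with one prediction-network state and emits a distribution over output units plus the blank symbol. This is the standard RNN-T joint formulation; the weight shapes below are arbitrary.

```python
import numpy as np

def joiner(enc_t, pred_u, W_enc, W_pred, W_out, b):
    """RNN-T joint network: P(label | t, u), last entry = blank."""
    h = np.tanh(W_enc @ enc_t + W_pred @ pred_u)
    logits = W_out @ h + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_enc, d_pred, d_join, vocab = 32, 24, 16, 5 + 1   # 5 labels + blank
probs = joiner(rng.normal(size=d_enc),             # one encoder frame
               rng.normal(size=d_pred),            # one prediction state
               rng.normal(size=(d_join, d_enc)) * 0.1,
               rng.normal(size=(d_join, d_pred)) * 0.1,
               rng.normal(size=(vocab, d_join)) * 0.1,
               np.zeros(vocab))
print(probs.sum())   # 1.0
```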
Pre-gen metrics: Predicting caption quality metrics without generating captions
Title | Pre-gen metrics: Predicting caption quality metrics without generating captions |
Authors | Marc Tanti, Albert Gatt, Adrian Muscat |
Abstract | Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions. Such pre-gen metrics are strongly correlated with standard evaluation metrics. |
Tasks | |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05474v1 |
http://arxiv.org/pdf/1810.05474v1.pdf | |
PWC | https://paperswithcode.com/paper/pre-gen-metrics-predicting-caption-quality |
Repo | https://github.com/mtanti/pregen-metrics |
Framework | tf |
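The pre-gen idea is nearly a one-liner: score the model by the log probability it assigns to the reference captions, with no generation or beam search involved. The token probabilities below are faked; in practice they come from a trained captioning model's softmax over each reference token.

```python
import numpy as np

def pregen_score(caption_token_probs):
    """Two candidate pre-gen metrics from p(w_t | w_<t, image) over the
    reference caption: total and per-token log probability."""
    logp = np.log(caption_token_probs)
    return logp.sum(), logp.mean()

total, per_token = pregen_score(np.array([0.31, 0.12, 0.48, 0.25, 0.6]))
print(f"sum log-prob: {total:.3f}, mean log-prob: {per_token:.3f}")
```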
Evaluating the squared-exponential covariance function in Gaussian processes with integral observations
Title | Evaluating the squared-exponential covariance function in Gaussian processes with integral observations |
Authors | J. N. Hendriks, C. Jidling, A. Wills, T. B. Schön |
Abstract | This paper deals with the evaluation of double line integrals of the squared-exponential covariance function. We propose a new approach in which the double integral is reduced to a single integral using the error function. This single integral is then computed with efficiently implemented numerical techniques. The performance is compared against existing state-of-the-art methods, and the results show superior numerical robustness and accuracy per unit of computation time. |
Tasks | Gaussian Processes |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07319v1 |
http://arxiv.org/pdf/1812.07319v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-the-squared-exponential-covariance |
Repo | https://github.com/jnh277/lineIntSquaredExponential |
Framework | none |
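For the simplest one-dimensional instance — a double integral of the SE kernel over two intervals, rather than the paper's double line integrals in higher dimensions — the reduction to error functions collapses all the way to a closed form, which the sketch below checks against brute-force quadrature. The helper g is a standard identity, not code from the paper's repo.

```python
import numpy as np
from scipy.special import erf
from scipy.integrate import dblquad

ell, sf2 = 0.7, 1.3  # length-scale and signal variance (arbitrary test values)

def g(z):
    # helper with g'(z) = sqrt(pi/2) * ell * erf(z / (sqrt(2) * ell))
    return (np.sqrt(np.pi / 2) * ell * z * erf(z / (np.sqrt(2) * ell))
            + ell**2 * np.exp(-z**2 / (2 * ell**2)))

def k_double_int(a, b, c, d):
    # closed form for  int_a^b int_c^d  sf2 * exp(-(x-y)^2 / (2 ell^2)) dy dx
    return sf2 * (g(b - c) - g(a - c) - g(b - d) + g(a - d))

a, b, c, d = 0.0, 1.0, 0.5, 2.0
closed = k_double_int(a, b, c, d)
numeric, _ = dblquad(lambda y, x: sf2 * np.exp(-(x - y)**2 / (2 * ell**2)),
                     a, b, lambda x: c, lambda x: d)
print(closed, numeric)  # should agree to numerical precision
```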
Our Practice Of Using Machine Learning To Recognize Species By Voice
Title | Our Practice Of Using Machine Learning To Recognize Species By Voice |
Authors | Siddhardha Balemarthy, Atul Sajjanhar, James Xi Zheng |
Abstract | As technology advances, audio recognition in machine learning improves as well. Research in audio recognition has traditionally focused on speech. Living creatures (especially small ones) are part of the whole ecosystem, and monitoring as well as maintaining them are important tasks. Species such as animals and birds tend to change their activities and habitats due to adverse effects on the environment or other natural or man-made calamities. For those in remote, deserted areas, we have no idea about their existence unless we can continuously monitor them. Continuous monitoring takes a great deal of labor, and without it endangered species may encounter dangerous situations unnoticed. The best way to monitor such species is through audio recognition. Classifying sound can be a difficult task even for humans, but powerful audio signals and processing techniques make it possible to detect the audio of various species. Machines can be trained either on pre-recorded audio files or by recording and detecting live. The audio of a species can be detected by removing background noise and echoes; the smallest sound unit is considered a syllable. Extracting these syllables is the process we focus on, known as audio recognition in terms of Machine Learning (ML). |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09078v1 |
http://arxiv.org/pdf/1810.09078v1.pdf | |
PWC | https://paperswithcode.com/paper/our-practice-of-using-machine-learning-to |
Repo | https://github.com/siyangBai/twittering_sparkles |
Framework | none |
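The syllable-extraction step the abstract centres on can be prototyped as energy thresholding: frame the signal and keep contiguous runs whose energy rises above a threshold relative to the loudest frame. Frame length and threshold are arbitrary illustrative choices, a crude stand-in for the noise and echo removal a real pipeline would add.

```python
import numpy as np

def extract_syllables(signal, sr=22050, frame_ms=20, thresh_db=-30):
    """Return (start, end) sample spans of energy-active regions."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    active = energy_db > energy_db.max() + thresh_db
    spans, start = [], None
    for i, a in enumerate(active):       # group consecutive active frames
        if a and start is None:
            start = i
        elif not a and start is not None:
            spans.append((start * frame, i * frame))
            start = None
    if start is not None:
        spans.append((start * frame, n * frame))
    return spans

sr = 22050
t = np.linspace(0, 1, sr)
chirp = np.sin(2 * np.pi * 3000 * t) * (t > 0.3) * (t < 0.5)  # one "syllable"
noise = 0.001 * np.random.default_rng(0).normal(size=sr)
print(extract_syllables(chirp + noise))
```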
Generalization of Equilibrium Propagation to Vector Field Dynamics
Title | Generalization of Equilibrium Propagation to Vector Field Dynamics |
Authors | Benjamin Scellier, Anirudh Goyal, Jonathan Binas, Thomas Mesnard, Yoshua Bengio |
Abstract | The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed point recurrent networks that addresses both these issues. In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism. Our learning method generalizes Equilibrium Propagation to vector field dynamics, relaxing the requirement of an energy function. As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function, but rather approximates it at a precision which is proven to be directly related to the degree of symmetry of the feedforward and feedback weights. We show experimentally that our algorithm optimizes the objective function. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04873v1 |
http://arxiv.org/pdf/1808.04873v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-of-equilibrium-propagation-to |
Repo | https://github.com/musyoku/equilibrium-propagation |
Framework | none |
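For orientation, a numpy sketch of the two-phase procedure in its original energy-based form (symmetric weights), which the paper relaxes to general vector fields: a free relaxation to a fixed point, a weakly nudged relaxation, and a local contrastive weight update. Network size, step sizes, and beta are arbitrary.

```python
import numpy as np

rho = np.tanh  # neuron non-linearity

def relax(s, W, x, Wx, steps=50, dt=0.1, beta=0.0, target=None, out=None):
    """Leaky-integrator relaxation to a fixed point. With beta > 0 the
    output units are weakly nudged toward the target (second phase)."""
    for _ in range(steps):
        drive = rho(s) @ W + x @ Wx
        if beta > 0:
            drive[:, out] += beta * (target - s[:, out])
        s += dt * (drive - s)
    return s

def ep_update(W, s_free, s_nudged, beta, lr=0.01):
    """Contrastive Equilibrium Propagation update:
    dW ~ (rho(s^beta)^T rho(s^beta) - rho(s^0)^T rho(s^0)) / beta."""
    return W + lr * (rho(s_nudged).T @ rho(s_nudged)
                     - rho(s_free).T @ rho(s_free)) / beta

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6
W = rng.normal(scale=0.1, size=(n_hid, n_hid))
Wx = rng.normal(scale=0.1, size=(n_in, n_hid))
x = rng.normal(size=(1, n_in))
target, out = np.array([[1.0]]), [0]       # nudge unit 0 toward 1
s_free = relax(np.zeros((1, n_hid)), W, x, Wx)                    # phase 1
s_nudged = relax(s_free.copy(), W, x, Wx, beta=0.5,
                 target=target, out=out)                          # phase 2
W = ep_update(W, s_free, s_nudged, beta=0.5)
print("mean |W| after update:", np.abs(W).mean())
```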
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Title | Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks |
Authors | Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Quanquan Gu |
Abstract | Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, have been observed to generalize worse than stochastic gradient descent (SGD) with momentum in training deep neural networks, despite their nice property of fast convergence. This leaves how to close the generalization gap of adaptive gradient methods an open problem. In this work, we show that adaptive gradient methods such as Adam and Amsgrad are sometimes “over-adapted”. We design a new algorithm, called the Partially adaptive momentum estimation method (Padam), which unifies Adam/Amsgrad with SGD by introducing a partial adaptive parameter $p$, to achieve the best of both worlds. We also prove the convergence rate of our proposed algorithm to a stationary point in the stochastic nonconvex optimization setting. Experiments on standard benchmarks show that our proposed algorithm can maintain a fast convergence rate like Adam/Amsgrad while generalizing as well as SGD in training deep neural networks. These results suggest that practitioners may want to pick up adaptive gradient methods once again for faster training of deep neural networks. |
Tasks | |
Published | 2018-06-18 |
URL | https://arxiv.org/abs/1806.06763v2 |
https://arxiv.org/pdf/1806.06763v2.pdf | |
PWC | https://paperswithcode.com/paper/closing-the-generalization-gap-of-adaptive |
Repo | https://github.com/thughost2/Padam |
Framework | pytorch |
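The Padam update itself is small enough to state in full: Adam/Amsgrad moment estimates, but with the denominator raised to a partial power p in (0, 1/2] — p = 1/2 recovers Amsgrad, and p toward 0 approaches SGD with momentum. Hyperparameters below are test values; the eps and the omitted bias correction are simplifications for this sketch.

```python
import numpy as np

def padam(grad, theta0, lr=0.05, beta1=0.9, beta2=0.999, p=0.125,
          eps=1e-8, steps=500):
    """Partially adaptive momentum estimation (Padam) on a toy problem."""
    theta = theta0.astype(float).copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    v_hat = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)            # Amsgrad max-tracking
        theta -= lr * m / (v_hat**p + eps)      # partially adaptive step
    return theta

# minimise a badly conditioned quadratic f(x) = x^T A x
A = np.diag([1.0, 10.0])
x = padam(lambda x: 2 * A @ x, np.array([3.0, 2.0]))
print(x)  # should approach the origin
```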
Inverting The Generator Of A Generative Adversarial Network (II)
Title | Inverting The Generator Of A Generative Adversarial Network (II) |
Authors | Antonia Creswell, Anil A Bharath |
Abstract | Generative adversarial networks (GANs) learn a deep generative model that is able to synthesise novel, high-dimensional data samples. New data samples are synthesised by passing latent samples, drawn from a chosen prior distribution, through the generative model. Once trained, the latent space exhibits interesting properties that may be useful for downstream tasks such as classification or retrieval. Unfortunately, GANs do not offer an “inverse model”, a mapping from data space back to latent space, making it difficult to infer a latent representation for a given data sample. In this paper, we introduce a technique, inversion, to project data samples, specifically images, to the latent space using a pre-trained GAN. Using our proposed inversion technique, we are able to identify which attributes of a dataset a trained GAN is able to model, and to quantify GAN performance based on a reconstruction loss. We demonstrate how our proposed inversion technique may be used to quantitatively compare the performance of various GAN models trained on three image datasets. We provide code for all of our experiments at https://github.com/ToniCreswell/InvertingGAN. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05701v1 |
http://arxiv.org/pdf/1802.05701v1.pdf | |
PWC | https://paperswithcode.com/paper/inverting-the-generator-of-a-generative-1 |
Repo | https://github.com/ToniCreswell/InvertingGAN |
Framework | pytorch |
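The inversion technique reduces to gradient descent on the latent code of a frozen generator. The paper minimises a reconstruction loss over z for a pre-trained GAN; the sketch below substitutes a toy untrained generator and an MSE loss (the paper uses a cross-entropy reconstruction loss for images) so it runs standalone.

```python
import torch
import torch.nn as nn

# toy generator standing in for a pre-trained GAN generator
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                  nn.Linear(64, 28 * 28), nn.Tanh())
for param in G.parameters():
    param.requires_grad_(False)              # generator stays frozen

x = torch.tanh(torch.randn(1, 28 * 28))      # "image" to invert
z = torch.randn(1, 8, requires_grad=True)    # latent initialisation
opt = torch.optim.Adam([z], lr=0.05)

for step in range(500):                      # gradient descent on z only
    opt.zero_grad()
    loss = nn.functional.mse_loss(G(z), x)   # reconstruction loss
    loss.backward()
    opt.step()

print(f"final reconstruction loss: {loss.item():.4f}")
```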