October 15, 2019


Paper Group NANR 209


An Analysis of Encoder Representations in Transformer-Based Machine Translation


Title	An Analysis of Encoder Representations in Transformer-Based Machine Translation
Authors	Alessandro Raganato, Jörg Tiedemann
Abstract	The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed network architecture of the Transformer is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, so far not much is known about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information that is learned by the attention mechanism in Transformer models with different translation quality. We assess the representations of the encoder by extracting dependency relations based on self-attention weights, we perform four probing tasks to study the amount of syntactic and semantic information captured, and we also test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations and we can also confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics.
Tasks	Feature Engineering, Machine Translation, Transfer Learning
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-5431/
PDF	https://www.aclweb.org/anthology/W18-5431
PWC	https://paperswithcode.com/paper/an-analysis-of-encoder-representations-in
Repo
Framework
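
The abstract above mentions extracting dependency relations from self-attention weights. A minimal sketch of one way to read candidate arcs off a single attention head follows, assuming a greedy rule in which each token takes as its head the other token it attends to most. The function name, the tokens, and the attention matrix are made up for illustration; the abstract does not state the paper's actual extraction procedure, and this greedy rule does not guarantee a well-formed tree.

```python
def attention_heads(tokens, attn):
    """For each token, pick the other token it attends to most strongly."""
    arcs = []
    for i, row in enumerate(attn):
        # Ignore the diagonal (self-attention) when choosing a head.
        candidates = [(weight, j) for j, weight in enumerate(row) if j != i]
        _, head = max(candidates)
        arcs.append((tokens[i], tokens[head]))
    return arcs


# Hypothetical attention weights for one head; each row sums to 1.
tokens = ["the", "cat", "sleeps"]
attn = [
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.6, 0.3],
]
arcs = attention_heads(tokens, attn)
```

In this toy input "cat" and "sleeps" select each other, which illustrates why a tree-constrained decoder would be needed to obtain a valid dependency structure.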

Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets


Title	Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets
Authors	Prathusha Kameswara Sarma
Abstract	This research proposal describes two algorithms that are aimed at learning word embeddings for data sparse and sentiment rich data sets. The goal is to use word embeddings adapted for domain specific data sets in downstream applications such as sentiment classification. The first approach learns word embeddings in a supervised fashion via SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis on data sets that are of modest size. SWESA leverages document labels to jointly learn polarity-aware word embeddings and a classifier to classify unseen documents. In the second approach domain adapted (DA) word embeddings are learned by exploiting the specificity of domain specific data sets and the breadth of generic word embeddings. The new embeddings are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Experimental results on binary sentiment classification tasks using both approaches for standard data sets are presented.
Tasks	Learning Word Embeddings, Sentiment Analysis, Word Embeddings
Published	2018-06-01
URL	https://www.aclweb.org/anthology/N18-4007/
PDF	https://www.aclweb.org/anthology/N18-4007
PWC	https://paperswithcode.com/paper/learning-word-embeddings-for-data-sparse-and
Repo
Framework

Learning to Solve Nonlinear Least Squares for Monocular Stereo


Title	Learning to Solve Nonlinear Least Squares for Monocular Stereo
Authors	Ronald Clark, Michael Bloesch, Jan Czarnowski, Stefan Leutenegger, Andrew J. Davison
Abstract	Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e., jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.
Tasks
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Ronald_Clark_Neural_Nonlinear_least_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Ronald_Clark_Neural_Nonlinear_least_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/learning-to-solve-nonlinear-least-squares-for
Repo
Framework

Deep Sensing: Active Sensing using Multi-directional Recurrent Neural Networks


Title	Deep Sensing: Active Sensing using Multi-directional Recurrent Neural Networks
Authors	Jinsung Yoon, William R. Zame, Mihaela van der Schaar
Abstract	For every prediction we might wish to make, we must decide what to observe (what source of information) and when to observe it. Because making observations is costly, this decision must trade off the value of information against the cost of observation. Making observations (sensing) should be an active choice. To solve the problem of active sensing we develop a novel deep learning architecture: Deep Sensing. At training time, Deep Sensing learns how to issue predictions at various cost-performance points. To do this, it creates multiple representations at various performance levels associated with different measurement rates (costs). This requires learning how to estimate the value of real measurements vs. inferred measurements, which in turn requires learning how to infer missing (unobserved) measurements. To infer missing measurements, we develop a Multi-directional Recurrent Neural Network (M-RNN). An M-RNN differs from a bi-directional RNN in that it sequentially operates across streams in addition to within streams, and because the timing of inputs into the hidden layers is both lagged and advanced. At runtime, the operator prescribes a performance level or a cost constraint, and Deep Sensing determines what measurements to take and what to infer from those measurements, and then issues predictions. To demonstrate the power of our method, we apply it to two real-world medical datasets with significantly improved performance.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=r1SnX5xCb
PDF	https://openreview.net/pdf?id=r1SnX5xCb
PWC	https://paperswithcode.com/paper/deep-sensing-active-sensing-using-multi
Repo
Framework

Continuous-fidelity Bayesian Optimization with Knowledge Gradient


Title	Continuous-fidelity Bayesian Optimization with Knowledge Gradient
Authors	Jian Wu, Peter I. Frazier
Abstract	While Bayesian optimization (BO) has achieved great success in optimizing expensive-to-evaluate black-box functions, especially tuning hyperparameters of neural networks, methods such as random search (Li et al., 2016) and multi-fidelity BO (e.g. Klein et al. (2017)) that exploit cheap approximations, e.g. training on a smaller training data or with fewer iterations, can outperform standard BO approaches that use only full-fidelity observations. In this paper, we propose a novel Bayesian optimization algorithm, the continuous-fidelity knowledge gradient (cfKG) method, that can be used when fidelity is controlled by one or more continuous settings such as training data size and the number of training iterations. cfKG characterizes the value of the information gained by sampling a point at a given fidelity, choosing to sample at the point and fidelity with the largest value per unit cost. Furthermore, cfKG can be generalized, following Wu et al. (2017), to settings where derivatives are available in the optimization process, e.g. large-scale kernel learning, and where more than one point can be evaluated simultaneously. Numerical experiments show that cfKG outperforms state-of-art algorithms when optimizing synthetic functions, tuning convolutional neural networks (CNNs) on CIFAR-10 and SVHN, and in large-scale kernel learning.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=SknC0bW0-
PDF	https://openreview.net/pdf?id=SknC0bW0-
PWC	https://paperswithcode.com/paper/continuous-fidelity-bayesian-optimization
Repo
Framework
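
The selection rule described in the abstract above (sample at the point and fidelity with the largest value per unit cost) can be sketched as follows. This is only the final argmax step: the value and cost numbers are invented for illustration, and computing the knowledge-gradient value itself is not shown.

```python
def pick_sample(candidates):
    """Return the candidate with the largest value-of-information per unit cost."""
    return max(candidates, key=lambda c: c["value"] / c["cost"])


# Hypothetical candidates: the same point at two fidelities, plus another point.
candidates = [
    {"point": 0.2, "fidelity": 1.00, "value": 0.30, "cost": 10.0},
    {"point": 0.2, "fidelity": 0.25, "value": 0.12, "cost": 2.0},
    {"point": 0.7, "fidelity": 0.50, "value": 0.20, "cost": 5.0},
]
best = pick_sample(candidates)
```

Here the low-fidelity evaluation wins: it is less informative in absolute terms but cheaper per unit of information gained.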

Predicting misreadings from gaze in children with reading difficulties


Title	Predicting misreadings from gaze in children with reading difficulties
Authors	Joachim Bingel, Maria Barrett, Sigrid Klerke
Abstract	We present the first work on predicting reading mistakes in children with reading difficulties based on eye-tracking data from real-world reading teaching. Our approach employs several linguistic and gaze-based features to inform an ensemble of different classifiers, including multi-task learning models that let us transfer knowledge about individual readers to attain better predictions. Notably, the data we use in this work stems from noisy readings in the wild, outside of controlled lab conditions. Our experiments show that despite the noise and despite the small fraction of misreadings, gaze data improves the performance more than any other feature group and our models achieve good performance. We further show that gaze patterns for misread words do not fully generalize across readers, but that we can transfer some knowledge between readers using multitask learning at least in some cases. Applications of our models include partial automation of reading assessment as well as personalized text simplification.
Tasks	Eye Tracking, Multi-Task Learning, Reading Comprehension, Text Simplification
Published	2018-06-01
URL	https://www.aclweb.org/anthology/W18-0503/
PDF	https://www.aclweb.org/anthology/W18-0503
PWC	https://paperswithcode.com/paper/predicting-misreadings-from-gaze-in-children
Repo
Framework

Anomaly Detection with Generative Adversarial Networks


Title	Anomaly Detection with Generative Adversarial Networks
Authors	Lucas Deecke, Robert Vandermeulen, Lukas Ruff, Stephan Mandt, Marius Kloft
Abstract	Many anomaly detection methods exist that perform well on low-dimensional problems; however, there is a notable lack of effective methods for high-dimensional spaces, such as images. Inspired by recent successes in deep learning we propose a novel approach to anomaly detection using generative adversarial networks. Given a sample under consideration, our method is based on searching for a good representation of that sample in the latent space of the generator; if such a representation is not found, the sample is deemed anomalous. We achieve state-of-the-art performance on standard image benchmark datasets and visual inspection of the most anomalous samples reveals that our method does indeed return anomalies.
Tasks	Anomaly Detection
Published	2018-01-01
URL	https://openreview.net/forum?id=S1EfylZ0Z
PDF	https://openreview.net/pdf?id=S1EfylZ0Z
PWC	https://paperswithcode.com/paper/anomaly-detection-with-generative-adversarial-1
Repo
Framework
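
The latent-space search described in the abstract above can be illustrated with a toy example. In this sketch the "generator" is a fixed stand-in that maps a scalar latent onto the unit circle, not a trained GAN, and the step size, number of steps, and starting points are arbitrary assumptions. The anomaly score is the smallest reconstruction error found by gradient descent over the latent.

```python
import math


def generator(z):
    """Toy stand-in for a trained generator: scalar latent -> point on the unit circle."""
    return (math.cos(z), math.sin(z))


def recon_error(z, x):
    """Squared distance between the generated point and the sample x."""
    gx, gy = generator(z)
    return (gx - x[0]) ** 2 + (gy - x[1]) ** 2


def anomaly_score(x, starts=(0.0, 2.0, 4.0), lr=0.1, steps=500):
    """Smallest reconstruction error found by gradient descent from several starts."""
    best = float("inf")
    for z in starts:
        for _ in range(steps):
            # Analytic derivative of recon_error with respect to z.
            grad = 2.0 * (x[0] * math.sin(z) - x[1] * math.cos(z))
            z -= lr * grad
        best = min(best, recon_error(z, x))
    return best


normal_score = anomaly_score((0.0, 1.0))   # lies on the generator's manifold
outlier_score = anomaly_score((3.0, 0.0))  # lies far from it
```

A sample on the circle can be reconstructed almost exactly, while the distant sample keeps a large residual error no matter which latent is chosen, so it would be flagged as anomalous.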

On Neuronal Capacity


Title	On Neuronal Capacity
Authors	Pierre Baldi, Roman Vershynin
Abstract	We define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. We review known results, and derive new results, estimating the capacity of several neuronal models: linear and polynomial threshold gates, linear and polynomial threshold gates with constrained weights (binary weights, positive weights), and ReLU neurons. We also derive capacity estimates and bounds for fully recurrent networks and layered feedforward networks.
Tasks
Published	2018-12-01
URL	http://papers.nips.cc/paper/7999-on-neuronal-capacity
PDF	http://papers.nips.cc/paper/7999-on-neuronal-capacity.pdf
PWC	https://paperswithcode.com/paper/on-neuronal-capacity
Repo
Framework
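
As a small worked instance of the definition in the abstract above (capacity as the logarithm of the number of functions a machine can implement), the sketch below counts the Boolean functions of two inputs that a single linear threshold gate can realize. The weight and bias grids are assumptions chosen to be sufficient for two inputs; they are not taken from the paper and would miss functions for larger input sizes.

```python
import math
from itertools import product


def threshold_functions(n, weights=(-1, 0, 1), biases=(-1.5, -0.5, 0.5, 1.5)):
    """Collect the distinct truth tables of x -> [w.x + b > 0] over a small grid."""
    inputs = list(product((0, 1), repeat=n))
    tables = set()
    for w in product(weights, repeat=n):
        for b in biases:
            # Half-integer biases mean w.x + b is never exactly zero.
            table = tuple(
                int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in inputs
            )
            tables.add(table)
    return tables


count = len(threshold_functions(2))  # 14 of the 16 Boolean functions
capacity = math.log2(count)          # capacity in bits
```

The two missing functions are XOR and its negation, which are not linearly separable.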

rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions


Title	rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions
Authors	Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
Abstract	Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem (a belief MDP) and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs, i.e., for information-oriented criteria, when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into “fully observable” problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems.
Tasks
Published	2018-12-01
URL	http://papers.nips.cc/paper/7925-rho-pomdps-have-lipschitz-continuous-epsilon-optimal-value-functions
PDF	http://papers.nips.cc/paper/7925-rho-pomdps-have-lipschitz-continuous-epsilon-optimal-value-functions.pdf
PWC	https://paperswithcode.com/paper/rho-pomdps-have-lipschitz-continuous-epsilon
Repo
Framework
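
To illustrate how Lipschitz continuity yields upper and lower bounds on a value function, the sketch below uses the generic Lipschitz envelope: if V is λ-Lipschitz and its value is known at a few beliefs, then V at any other belief is confined to a cone around each known point. The choice of the L1 norm, the function names, and the sample values are assumptions; the paper's own approximators may differ in detail.

```python
def l1(b, c):
    """L1 distance between two belief vectors."""
    return sum(abs(x - y) for x, y in zip(b, c))


def upper_bound(b, points, lam):
    """Tightest upper bound on V(b) implied by known values and Lipschitz constant lam."""
    return min(v + lam * l1(b, bi) for bi, v in points)


def lower_bound(b, points, lam):
    """Tightest lower bound on V(b) implied by known values and Lipschitz constant lam."""
    return max(v - lam * l1(b, bi) for bi, v in points)


# Hypothetical known values at the two corners of a two-state belief simplex.
points = [((1.0, 0.0), 0.0), ((0.0, 1.0), 1.0)]
b = (0.5, 0.5)
hi = upper_bound(b, points, lam=1.0)
lo = lower_bound(b, points, lam=1.0)
```

Adding more known points can only tighten both bounds, which is the sense in which such approximators can be improved incrementally.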

Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it


Title	Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it
Authors	Andrei Malchanau, Volha Petukhova, Harry Bunt
Abstract
Tasks
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1121/
PDF	https://www.aclweb.org/anthology/L18-1121
PWC	https://paperswithcode.com/paper/towards-continuous-dialogue-corpus-creation
Repo
Framework

Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis


Title	Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Authors
Abstract
Tasks
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5600/
PDF	https://www.aclweb.org/anthology/W18-5600
PWC	https://paperswithcode.com/paper/proceedings-of-the-ninth-international-1
Repo
Framework

Structure Preserving Video Prediction


Title	Structure Preserving Video Prediction
Authors	Jingwei Xu, Bingbing Ni, Zefan Li, Shuo Cheng, Xiaokang Yang
Abstract	Despite recent emergence of adversarial based methods for video prediction, existing algorithms often produce unsatisfactory results in image regions with rich structural information (i.e., object boundary) and detailed motion (i.e., articulated body movement). To this end, we present a structure preserving video prediction framework to explicitly address the above issues and enhance video prediction quality. On one hand, our framework contains a two-stream generation architecture which deals with high frequency video content (i.e., detailed object or articulated motion structure) and low frequency video content (i.e., location or moving directions) in two separate streams. On the other hand, we propose an RNN structure for video prediction, which employs temporal-adaptive convolutional kernels to capture time-varying motion patterns as well as tiny objects within a scene. Extensive experiments on diverse scenes, ranging from human motion to semantic layout prediction, demonstrate the effectiveness of the proposed video prediction approach.
Tasks	Video Prediction
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Xu_Structure_Preserving_Video_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Structure_Preserving_Video_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/structure-preserving-video-prediction
Repo
Framework

Future Frame Prediction for Anomaly Detection – A New Baseline


Title	Future Frame Prediction for Anomaly Detection – A New Baseline
Authors	Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao
Abstract	Anomaly detection in videos refers to the identification of events that do not conform to expected behavior. However, almost all existing methods tackle the problem by minimizing the reconstruction errors of training data, which cannot guarantee a larger reconstruction error for an abnormal event. In this paper, we propose to tackle the anomaly detection problem within a video prediction framework. To the best of our knowledge, this is the first work that leverages the difference between a predicted future frame and its ground truth to detect an abnormal event. To predict a future frame with higher quality for normal events, other than the commonly used appearance (spatial) constraints on intensity and gradient, we also introduce a motion (temporal) constraint in video prediction by enforcing the optical flow between predicted frames and ground truth frames to be consistent, and this is the first work that introduces a temporal constraint into the video prediction task. Such spatial and motion constraints facilitate the future frame prediction for normal events, and consequently make it easier to identify abnormal events that do not conform to the expectation. Extensive experiments on both a toy dataset and some publicly available datasets validate the effectiveness of our method in terms of robustness to the uncertainty in normal events and the sensitivity to abnormal events.
Tasks	Anomaly Detection, Optical Flow Estimation, Video Prediction
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Liu_Future_Frame_Prediction_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Future_Frame_Prediction_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/future-frame-prediction-for-anomaly-detection-1
Repo
Framework
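
The abstract above detects anomalies from the difference between a predicted frame and its ground truth. One simple way to turn that difference into a score is the peak signal-to-noise ratio (PSNR), sketched below on flattened toy frames. Using PSNR, the threshold value, and the pixel values are assumptions for illustration; the abstract does not specify the scoring function.

```python
import math


def psnr(pred, truth, max_val=1.0):
    """Peak signal-to-noise ratio between two frames given as flat pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)


def is_anomalous(pred, truth, threshold=25.0):
    """Flag a frame when the prediction matches the ground truth poorly (low PSNR)."""
    return psnr(pred, truth) < threshold


truth = [0.5, 0.5, 0.5, 0.5]
good_pred = [0.52, 0.52, 0.52, 0.52]  # small prediction error
bad_pred = [0.6, 0.6, 0.6, 0.6]       # larger prediction error
```

With these toy frames, `is_anomalous(good_pred, truth)` is false and `is_anomalous(bad_pred, truth)` is true, since the larger error gives a PSNR of about 20 dB.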

Controllable Video Generation With Sparse Trajectories


Title	Controllable Video Generation With Sparse Trajectories
Authors	Zekun Hao, Xun Huang, Serge Belongie
Abstract	Video generation and manipulation is an important yet challenging task in computer vision. Existing methods usually lack ways to explicitly control the synthesized motion. In this work, we present a conditional video generation model that allows detailed control over the motion of the generated video. Given the first frame and sparse motion trajectories specified by users, our model can synthesize a video with corresponding appearance and motion. We propose to combine the advantage of copying pixels from the given frame and hallucinating the lightness difference from scratch, which helps generate sharp video while keeping the model robust to occlusion and lightness change. We also propose a training paradigm that calculates trajectories from video clips, which eliminates the need for annotated training data. Experiments on several standard benchmarks demonstrate that our approach can generate realistic videos comparable to state-of-the-art video generation and video prediction methods while the motion of the generated videos can correspond well with user input.
Tasks	Video Generation, Video Prediction
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Hao_Controllable_Video_Generation_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Hao_Controllable_Video_Generation_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/controllable-video-generation-with-sparse
Repo
Framework

Reward Estimation via State Prediction


Title	Reward Estimation via State Prediction
Authors	Daiki Kimura, Subhajit Chaudhury, Ryuki Tachibana, Sakyasingha Dasgupta
Abstract	Reinforcement learning typically requires carefully designed reward functions in order to learn the desired behavior. We present a novel reward estimation method that is based on a finite sample of optimal state trajectories from expert demonstrations and can be used for guiding an agent to mimic the expert behavior. The optimal state trajectories are used to learn a generative or predictive model of the “good” states distribution. The reward signal is computed by a function of the difference between the actual next state acquired by the agent and the predicted next state given by the learned generative or predictive model. With this inferred reward function, we perform standard reinforcement learning in the inner loop to guide the agent to learn the given task. Experimental evaluations across a range of tasks demonstrate that the proposed method produces superior performance compared to standard reinforcement learning with both complete or sparse hand engineered rewards. Furthermore, we show that our method successfully enables an agent to learn good actions directly from expert player video of games such as the Super Mario Bros and Flappy Bird.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=HktXuGb0-
PDF	https://openreview.net/pdf?id=HktXuGb0-
PWC	https://paperswithcode.com/paper/reward-estimation-via-state-prediction
Repo
Framework
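
The reward described in the abstract above is a function of the gap between the agent's actual next state and the next state predicted by a model learned from expert trajectories. A minimal sketch follows, assuming scalar states, a deliberately trivial predictor (the mean expert step), and the negative absolute difference as the reward function. All three choices are illustrative assumptions, not the paper's model.

```python
def fit_mean_step(trajectories):
    """Learn a trivial predictive model: the average state change per expert step."""
    deltas = [b - a for traj in trajectories for a, b in zip(traj, traj[1:])]
    return sum(deltas) / len(deltas)


def predict_next(state, step):
    """Predicted next state under the learned expert model."""
    return state + step


def estimated_reward(state, next_state, step):
    """Reward is higher (closer to zero) when the agent's next state matches the prediction."""
    return -abs(next_state - predict_next(state, step))


# Hypothetical expert state trajectories.
expert = [[0.0, 1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]
step = fit_mean_step(expert)

on_track = estimated_reward(2.0, 3.0, step)   # agent moved as the expert would
off_track = estimated_reward(2.0, 2.0, step)  # agent stayed put
```

An agent trained with standard reinforcement learning on this signal is rewarded for transitions that resemble the expert's, without needing a hand-designed reward or the expert's actions.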