Paper Group NANR 209
An Analysis of Encoder Representations in Transformer-Based Machine Translation
Title | An Analysis of Encoder Representations in Transformer-Based Machine Translation |
Authors | Alessandro Raganato, Jörg Tiedemann |
Abstract | The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed Transformer architecture is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, so far not much is known about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information that is learned by the attention mechanism in Transformer models with different translation quality. We assess the representations of the encoder by extracting dependency relations based on self-attention weights, we perform four probing tasks to study the amount of syntactic and semantic information captured, and we also test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations and we can also confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics. |
Tasks | Feature Engineering, Machine Translation, Transfer Learning |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-5431/ |
https://www.aclweb.org/anthology/W18-5431 | |
PWC | https://paperswithcode.com/paper/an-analysis-of-encoder-representations-in |
Repo | |
Framework | |
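As a rough illustration of the abstract's idea of reading dependency relations off self-attention weights, the sketch below picks, for each token, the other token it attends to most and treats that token as its head. The greedy argmax rule, the function name `heads_from_attention`, and the toy weights are assumptions made for illustration, not the authors' extraction procedure; a greedy choice also does not guarantee a well-formed tree.

```python
def heads_from_attention(attn):
    """For each token i, return the index of the other token it attends to most.

    attn[i][j] is the (hypothetical) attention weight from token i to token j.
    Self-attention (j == i) is ignored so a token cannot be its own head.
    """
    heads = []
    for i, row in enumerate(attn):
        candidates = [(weight, j) for j, weight in enumerate(row) if j != i]
        heads.append(max(candidates)[1])
    return heads


# Made-up weights from one attention head for the 3-token sentence
# "dogs chase cats"; these numbers are illustrative only.
attn = [
    [0.10, 0.80, 0.10],  # "dogs"  attends mostly to "chase"
    [0.35, 0.40, 0.25],  # "chase" attends mostly to itself, then "dogs"
    [0.20, 0.70, 0.10],  # "cats"  attends mostly to "chase"
]
heads = heads_from_attention(attn)
print(heads)
```

The predicted heads could then be compared with a gold dependency parse to score how well a given attention head tracks syntax.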
Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets
Title | Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets |
Authors | Prathusha Kameswara Sarma |
Abstract | This research proposal describes two algorithms that are aimed at learning word embeddings for data sparse and sentiment rich data sets. The goal is to use word embeddings adapted for domain specific data sets in downstream applications such as sentiment classification. The first approach learns word embeddings in a supervised fashion via SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis on data sets that are of modest size. SWESA leverages document labels to jointly learn polarity-aware word embeddings and a classifier to classify unseen documents. In the second approach domain adapted (DA) word embeddings are learned by exploiting the specificity of domain specific data sets and the breadth of generic word embeddings. The new embeddings are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA. Experimental results on binary sentiment classification tasks using both approaches for standard data sets are presented. |
Tasks | Learning Word Embeddings, Sentiment Analysis, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-4007/ |
https://www.aclweb.org/anthology/N18-4007 | |
PWC | https://paperswithcode.com/paper/learning-word-embeddings-for-data-sparse-and |
Repo | |
Framework | |
Learning to Solve Nonlinear Least Squares for Monocular Stereo
Title | Learning to Solve Nonlinear Least Squares for Monocular Stereo |
Authors | Ronald Clark, Michael Bloesch, Jan Czarnowski, Stefan Leutenegger, Andrew J. Davison |
Abstract | Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e., jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Ronald_Clark_Neural_Nonlinear_least_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Ronald_Clark_Neural_Nonlinear_least_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-solve-nonlinear-least-squares-for |
Repo | |
Framework | |
Deep Sensing: Active Sensing using Multi-directional Recurrent Neural Networks
Title | Deep Sensing: Active Sensing using Multi-directional Recurrent Neural Networks |
Authors | Jinsung Yoon, William R. Zame, Mihaela van der Schaar |
Abstract | For every prediction we might wish to make, we must decide what to observe (what source of information) and when to observe it. Because making observations is costly, this decision must trade off the value of information against the cost of observation. Making observations (sensing) should be an active choice. To solve the problem of active sensing we develop a novel deep learning architecture: Deep Sensing. At training time, Deep Sensing learns how to issue predictions at various cost-performance points. To do this, it creates multiple representations at various performance levels associated with different measurement rates (costs). This requires learning how to estimate the value of real measurements vs. inferred measurements, which in turn requires learning how to infer missing (unobserved) measurements. To infer missing measurements, we develop a Multi-directional Recurrent Neural Network (M-RNN). An M-RNN differs from a bi-directional RNN in that it sequentially operates across streams in addition to within streams, and because the timing of inputs into the hidden layers is both lagged and advanced. At runtime, the operator prescribes a performance level or a cost constraint, and Deep Sensing determines what measurements to take and what to infer from those measurements, and then issues predictions. To demonstrate the power of our method, we apply it to two real-world medical datasets with significantly improved performance. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=r1SnX5xCb |
https://openreview.net/pdf?id=r1SnX5xCb | |
PWC | https://paperswithcode.com/paper/deep-sensing-active-sensing-using-multi |
Repo | |
Framework | |
Continuous-fidelity Bayesian Optimization with Knowledge Gradient
Title | Continuous-fidelity Bayesian Optimization with Knowledge Gradient |
Authors | Jian Wu, Peter I. Frazier |
Abstract | While Bayesian optimization (BO) has achieved great success in optimizing expensive-to-evaluate black-box functions, especially tuning hyperparameters of neural networks, methods such as random search (Li et al., 2016) and multi-fidelity BO (e.g. Klein et al. (2017)) that exploit cheap approximations, e.g. training on a smaller training set or with fewer iterations, can outperform standard BO approaches that use only full-fidelity observations. In this paper, we propose a novel Bayesian optimization algorithm, the continuous-fidelity knowledge gradient (cfKG) method, that can be used when fidelity is controlled by one or more continuous settings such as training data size and the number of training iterations. cfKG characterizes the value of the information gained by sampling a point at a given fidelity, choosing to sample at the point and fidelity with the largest value per unit cost. Furthermore, cfKG can be generalized, following Wu et al. (2017), to settings where derivatives are available in the optimization process, e.g. large-scale kernel learning, and where more than one point can be evaluated simultaneously. Numerical experiments show that cfKG outperforms state-of-the-art algorithms when optimizing synthetic functions, tuning convolutional neural networks (CNNs) on CIFAR-10 and SVHN, and in large-scale kernel learning. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=SknC0bW0- |
https://openreview.net/pdf?id=SknC0bW0- | |
PWC | https://paperswithcode.com/paper/continuous-fidelity-bayesian-optimization |
Repo | |
Framework | |
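The selection rule named in the abstract, sampling at the point and fidelity with the largest value per unit cost, can be sketched as a simple argmax over candidates. This is only a sketch of that final step: the `value` numbers below are made up, whereas cfKG would compute them from a knowledge-gradient acquisition over a Gaussian-process posterior, and the names `pick_by_value_per_cost`, `x`, `fidelity`, `value`, and `cost` are hypothetical.

```python
def pick_by_value_per_cost(candidates):
    """Return the candidate with the largest value-of-information per unit cost."""
    return max(candidates, key=lambda c: c["value"] / c["cost"])


# Hypothetical (point, fidelity) candidates. A lower fidelity is cheaper to
# evaluate but usually yields less information about the full-fidelity objective.
candidates = [
    {"x": 0.1, "fidelity": 0.25, "value": 0.5, "cost": 1.0},  # ratio 0.5
    {"x": 0.1, "fidelity": 1.00, "value": 1.5, "cost": 4.0},  # ratio 0.375
    {"x": 0.7, "fidelity": 0.50, "value": 1.5, "cost": 2.0},  # ratio 0.75
]
best = pick_by_value_per_cost(candidates)
print(best)
```

Dividing by cost is what lets a cheap, moderately informative evaluation win over an expensive full-fidelity one.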
Predicting misreadings from gaze in children with reading difficulties
Title | Predicting misreadings from gaze in children with reading difficulties |
Authors | Joachim Bingel, Maria Barrett, Sigrid Klerke |
Abstract | We present the first work on predicting reading mistakes in children with reading difficulties based on eye-tracking data from real-world reading teaching. Our approach employs several linguistic and gaze-based features to inform an ensemble of different classifiers, including multi-task learning models that let us transfer knowledge about individual readers to attain better predictions. Notably, the data we use in this work stems from noisy readings in the wild, outside of controlled lab conditions. Our experiments show that despite the noise and despite the small fraction of misreadings, gaze data improves the performance more than any other feature group and our models achieve good performance. We further show that gaze patterns for misread words do not fully generalize across readers, but that we can transfer some knowledge between readers using multitask learning at least in some cases. Applications of our models include partial automation of reading assessment as well as personalized text simplification. |
Tasks | Eye Tracking, Multi-Task Learning, Reading Comprehension, Text Simplification |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0503/ |
https://www.aclweb.org/anthology/W18-0503 | |
PWC | https://paperswithcode.com/paper/predicting-misreadings-from-gaze-in-children |
Repo | |
Framework | |
Anomaly Detection with Generative Adversarial Networks
Title | Anomaly Detection with Generative Adversarial Networks |
Authors | Lucas Deecke, Robert Vandermeulen, Lukas Ruff, Stephan Mandt, Marius Kloft |
Abstract | Many anomaly detection methods exist that perform well on low-dimensional problems; however, there is a notable lack of effective methods for high-dimensional spaces, such as images. Inspired by recent successes in deep learning, we propose a novel approach to anomaly detection using generative adversarial networks. Given a sample under consideration, our method is based on searching for a good representation of that sample in the latent space of the generator; if such a representation is not found, the sample is deemed anomalous. We achieve state-of-the-art performance on standard image benchmark datasets, and visual inspection of the most anomalous samples reveals that our method does indeed return anomalies. |
Tasks | Anomaly Detection |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1EfylZ0Z |
https://openreview.net/pdf?id=S1EfylZ0Z | |
PWC | https://paperswithcode.com/paper/anomaly-detection-with-generative-adversarial-1 |
Repo | |
Framework | |
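The latent-space search described in the abstract can be illustrated with a toy example. The sketch below assumes a fixed, hand-written "generator" that maps a 1-D latent code onto the line y = 2x, and scores a sample by the reconstruction error left after gradient descent on the latent code. The generator, step size, and function names are assumptions for illustration; a trained GAN generator and the authors' exact search procedure would differ.

```python
def generator(z):
    """Toy stand-in for a trained generator: maps a latent z to a 2-D point."""
    return (z, 2.0 * z)


def anomaly_score(x, steps=200, lr=0.05):
    """Search the latent space for the code whose output best matches x.

    Returns the squared reconstruction error after the search; a large value
    means no good latent representation was found.
    """
    z = 0.0
    for _ in range(steps):
        gx, gy = generator(z)
        # Gradient of (z - x0)^2 + (2z - x1)^2 with respect to z.
        grad = 2.0 * (gx - x[0]) + 4.0 * (gy - x[1])
        z -= lr * grad
    gx, gy = generator(z)
    return (gx - x[0]) ** 2 + (gy - x[1]) ** 2


normal_score = anomaly_score((1.0, 2.0))    # lies on the generator's line
outlier_score = anomaly_score((2.0, -1.0))  # lies far from the line
print(normal_score, outlier_score)
```

Samples the generator can reproduce get a score near zero, while samples off the learned manifold keep a large residual and would be flagged as anomalous.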
On Neuronal Capacity
Title | On Neuronal Capacity |
Authors | Pierre Baldi, Roman Vershynin |
Abstract | We define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. We review known results, and derive new results, estimating the capacity of several neuronal models: linear and polynomial threshold gates, linear and polynomial threshold gates with constrained weights (binary weights, positive weights), and ReLU neurons. We also derive capacity estimates and bounds for fully recurrent networks and layered feedforward networks. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7999-on-neuronal-capacity |
http://papers.nips.cc/paper/7999-on-neuronal-capacity.pdf | |
PWC | https://paperswithcode.com/paper/on-neuronal-capacity |
Repo | |
Framework | |
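The definition in the abstract, capacity as the logarithm of the number of functions a model can implement, can be checked by brute force for a tiny case. The sketch below enumerates the Boolean functions that a single linear threshold gate with two binary inputs can realize, using a small integer grid of weights and biases. The base-2 logarithm, the {0, 1} input encoding, and the weight grid are assumptions chosen for illustration.

```python
import itertools
import math


def threshold_functions(n, weight_grid):
    """Collect the truth tables realizable by sign(w . x + b) on {0, 1}^n."""
    inputs = list(itertools.product([0, 1], repeat=n))
    found = set()
    for params in itertools.product(weight_grid, repeat=n + 1):
        *weights, bias = params
        table = tuple(
            int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)
            for x in inputs
        )
        found.add(table)
    return found


# Small integer grid; large enough to reach every threshold function for n = 2.
grid = [-2, -1, 0, 1, 2]
funcs = threshold_functions(2, grid)
capacity = math.log2(len(funcs))
print(len(funcs), capacity)
```

Of the 16 Boolean functions of two inputs, the gate realizes 14; XOR and its complement are missing because they are not linearly separable.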
rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions
Title | rho-POMDPs have Lipschitz-Continuous epsilon-Optimal Value Functions |
Authors | Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye |
Abstract | Many state-of-the-art algorithms for solving Partially Observable Markov Decision Processes (POMDPs) rely on turning the problem into a “fully observable” problem (a belief MDP) and exploiting the piece-wise linearity and convexity (PWLC) of the optimal value function in this new state space (the belief simplex ∆). This approach has been extended to solving ρ-POMDPs (i.e., for information-oriented criteria) when the reward ρ is convex in ∆. General ρ-POMDPs can also be turned into “fully observable” problems, but with no means to exploit the PWLC property. In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward function, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous. Then, value function approximators are proposed for both upper- and lower-bounding the optimal value function, which are shown to provide uniformly improvable bounds. This allows proposing two algorithms derived from HSVI which are empirically evaluated on various benchmark problems. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7925-rho-pomdps-have-lipschitz-continuous-epsilon-optimal-value-functions |
http://papers.nips.cc/paper/7925-rho-pomdps-have-lipschitz-continuous-epsilon-optimal-value-functions.pdf | |
PWC | https://paperswithcode.com/paper/rho-pomdps-have-lipschitz-continuous-epsilon |
Repo | |
Framework | |
Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it
Title | Towards Continuous Dialogue Corpus Creation: writing to corpus and generating from it |
Authors | Andrei Malchanau, Volha Petukhova, Harry Bunt |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1121/ |
https://www.aclweb.org/anthology/L18-1121 | |
PWC | https://paperswithcode.com/paper/towards-continuous-dialogue-corpus-creation |
Repo | |
Framework | |
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Title | Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis |
Authors | |
Abstract | |
Tasks | |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-5600/ |
https://www.aclweb.org/anthology/W18-5600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-ninth-international-1 |
Repo | |
Framework | |
Structure Preserving Video Prediction
Title | Structure Preserving Video Prediction |
Authors | Jingwei Xu, Bingbing Ni, Zefan Li, Shuo Cheng, Xiaokang Yang |
Abstract | Despite the recent emergence of adversarial-based methods for video prediction, existing algorithms often produce unsatisfactory results in image regions with rich structural information (i.e., object boundary) and detailed motion (i.e., articulated body movement). To this end, we present a structure preserving video prediction framework to explicitly address the above issues and enhance video prediction quality. On one hand, our framework contains a two-stream generation architecture which deals with high frequency video content (i.e., detailed object or articulated motion structure) and low frequency video content (i.e., location or moving directions) in two separate streams. On the other hand, we propose an RNN structure for video prediction, which employs temporal-adaptive convolutional kernels to capture time-varying motion patterns as well as tiny objects within a scene. Extensive experiments on diverse scenes, ranging from human motion to semantic layout prediction, demonstrate the effectiveness of the proposed video prediction approach. |
Tasks | Video Prediction |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Xu_Structure_Preserving_Video_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Structure_Preserving_Video_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/structure-preserving-video-prediction |
Repo | |
Framework | |
Future Frame Prediction for Anomaly Detection – A New Baseline
Title | Future Frame Prediction for Anomaly Detection – A New Baseline |
Authors | Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao |
Abstract | Anomaly detection in videos refers to the identification of events that do not conform to expected behavior. However, almost all existing methods tackle the problem by minimizing the reconstruction errors of training data, which cannot guarantee a larger reconstruction error for an abnormal event. In this paper, we propose to tackle the anomaly detection problem within a video prediction framework. To the best of our knowledge, this is the first work that leverages the difference between a predicted future frame and its ground truth to detect an abnormal event. To predict a future frame with higher quality for normal events, in addition to the commonly used appearance (spatial) constraints on intensity and gradient, we also introduce a motion (temporal) constraint in video prediction by enforcing the optical flow between predicted frames and ground truth frames to be consistent, and this is the first work that introduces a temporal constraint into the video prediction task. Such spatial and motion constraints facilitate the future frame prediction for normal events, and consequently make it easier to identify abnormal events that do not conform to the expectation. Extensive experiments on both a toy dataset and some publicly available datasets validate the effectiveness of our method in terms of robustness to the uncertainty in normal events and the sensitivity to abnormal events. |
Tasks | Anomaly Detection, Optical Flow Estimation, Video Prediction |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Liu_Future_Frame_Prediction_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Future_Frame_Prediction_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/future-frame-prediction-for-anomaly-detection-1 |
Repo | |
Framework | |
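The core idea in the abstract, flagging a frame as abnormal when the predicted future frame differs strongly from the actual one, can be sketched with a per-frame error and a threshold. Mean squared error, the fixed threshold, the flattened-list frame format, and the function names are all assumptions for illustration; the abstract does not specify the error measure, and the predictor itself (a trained network) is replaced here by made-up predictions.

```python
def frame_error(predicted, actual):
    """Mean squared error between two frames given as flat lists of pixels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def anomaly_flags(predicted_frames, actual_frames, threshold):
    """Flag each frame whose prediction error exceeds the threshold."""
    errors = [frame_error(p, a) for p, a in zip(predicted_frames, actual_frames)]
    return [error > threshold for error in errors]


# Made-up 2-pixel "frames"; the last actual frame departs from what the
# (hypothetical) predictor expected, mimicking an abnormal event.
predicted_frames = [[0.1, 0.2], [0.5, 0.5], [0.9, 0.9]]
actual_frames = [[0.1, 0.2], [0.5, 0.6], [0.1, 0.2]]
flags = anomaly_flags(predicted_frames, actual_frames, threshold=0.1)
print(flags)
```

Because the predictor is trained only on normal events, normal frames should be predicted well and abnormal ones poorly, which is what makes the error usable as an anomaly score.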
Controllable Video Generation With Sparse Trajectories
Title | Controllable Video Generation With Sparse Trajectories |
Authors | Zekun Hao, Xun Huang, Serge Belongie |
Abstract | Video generation and manipulation is an important yet challenging task in computer vision. Existing methods usually lack ways to explicitly control the synthesized motion. In this work, we present a conditional video generation model that allows detailed control over the motion of the generated video. Given the first frame and sparse motion trajectories specified by users, our model can synthesize a video with corresponding appearance and motion. We propose to combine the advantages of copying pixels from the given frame and hallucinating the lightness difference from scratch, which helps generate sharp video while keeping the model robust to occlusion and lightness change. We also propose a training paradigm that calculates trajectories from video clips, which eliminates the need for annotated training data. Experiments on several standard benchmarks demonstrate that our approach can generate realistic videos comparable to state-of-the-art video generation and video prediction methods, while the motion of the generated videos corresponds well with user input. |
Tasks | Video Generation, Video Prediction |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Hao_Controllable_Video_Generation_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Hao_Controllable_Video_Generation_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/controllable-video-generation-with-sparse |
Repo | |
Framework | |
Reward Estimation via State Prediction
Title | Reward Estimation via State Prediction |
Authors | Daiki Kimura, Subhajit Chaudhury, Ryuki Tachibana, Sakyasingha Dasgupta |
Abstract | Reinforcement learning typically requires carefully designed reward functions in order to learn the desired behavior. We present a novel reward estimation method that is based on a finite sample of optimal state trajectories from expert demonstrations and can be used for guiding an agent to mimic the expert behavior. The optimal state trajectories are used to learn a generative or predictive model of the “good” states distribution. The reward signal is computed by a function of the difference between the actual next state acquired by the agent and the predicted next state given by the learned generative or predictive model. With this inferred reward function, we perform standard reinforcement learning in the inner loop to guide the agent to learn the given task. Experimental evaluations across a range of tasks demonstrate that the proposed method produces superior performance compared to standard reinforcement learning with either complete or sparse hand-engineered rewards. Furthermore, we show that our method successfully enables an agent to learn good actions directly from expert player videos of games such as Super Mario Bros. and Flappy Bird. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=HktXuGb0- |
https://openreview.net/pdf?id=HktXuGb0- | |
PWC | https://paperswithcode.com/paper/reward-estimation-via-state-prediction |
Repo | |
Framework | |
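The reward described in the abstract is a function of the difference between the agent's actual next state and the next state predicted by a model trained on expert trajectories. A minimal sketch follows, assuming a hand-written predictor in place of the learned model and the negative Euclidean distance as the reward function. Both choices, and the names `predicted_next_state` and `estimated_reward`, are hypothetical; the abstract does not fix the exact function.

```python
import math


def predicted_next_state(state):
    """Hypothetical stand-in for the learned model: experts move +1 along x."""
    return (state[0] + 1.0, state[1])


def estimated_reward(state, next_state):
    """Reward is higher (closer to 0) when the agent's next state matches
    the next state the expert model predicts."""
    predicted = predicted_next_state(state)
    distance = math.sqrt(
        sum((p - s) ** 2 for p, s in zip(predicted, next_state))
    )
    return -distance


expert_like = estimated_reward((0.0, 0.0), (1.0, 0.0))  # matches the prediction
off_course = estimated_reward((0.0, 0.0), (0.0, 1.0))   # deviates from it
print(expert_like, off_course)
```

This estimated reward would then replace a hand-engineered reward inside an otherwise standard reinforcement learning loop.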