May 6, 2019

2756 words 13 mins read

Paper Group ANR 407

Interactive Attention for Neural Machine Translation. Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis. Comparison of Optimization Methods in Optical Flow Estimation. A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning. Quantile Reinforcement Learning. Binary Paragraph Vectors. Learning to Navigat …

Interactive Attention for Neural Machine Translation


Title	Interactive Attention for Neural Machine Translation
Authors	Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu
Abstract	Conventional attention-based Neural Machine Translation (NMT) conducts dynamic alignment in generating the target sentence. By repeatedly reading the representation of source sentence, which keeps fixed after generated by the encoder (Bahdanau et al., 2015), the attention mechanism has greatly enhanced state-of-the-art NMT. In this paper, we propose a new attention mechanism, called INTERACTIVE ATTENTION, which models the interaction between the decoder and the representation of source sentence during translation by both reading and writing operations. INTERACTIVE ATTENTION can keep track of the interaction history and therefore improve the translation performance. Experiments on NIST Chinese-English translation task show that INTERACTIVE ATTENTION can achieve significant improvements over both the previous attention-based NMT baseline and some state-of-the-art variants of attention-based NMT (i.e., coverage models (Tu et al., 2016)). And neural machine translator with our INTERACTIVE ATTENTION can outperform the open source attention-based NMT system Groundhog by 4.22 BLEU points and the open source phrase-based system Moses by 3.94 BLEU points averagely on multiple test sets.
Tasks	Machine Translation
Published	2016-10-17
URL	http://arxiv.org/abs/1610.05011v1
PDF	http://arxiv.org/pdf/1610.05011v1.pdf
PWC	https://paperswithcode.com/paper/interactive-attention-for-neural-machine
Repo
Framework

Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis


Title	Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis
Authors	Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, Xiaokui Xiao
Abstract	In aspect-based sentiment analysis, extracting aspect terms along with the opinions being expressed from user-generated content is one of the most important subtasks. Previous studies have shown that exploiting connections between aspect and opinion terms is promising for this task. In this paper, we propose a novel joint model that integrates recursive neural networks and conditional random fields into a unified framework for explicit aspect and opinion terms co-extraction. The proposed model learns high-level discriminative features and double propagate information between aspect and opinion terms, simultaneously. Moreover, it is flexible to incorporate hand-crafted features into the proposed model to further boost its information extraction performance. Experimental results on the SemEval Challenge 2014 dataset show the superiority of our proposed model over several baseline methods as well as the winning systems of the challenge.
Tasks	Aspect-Based Sentiment Analysis, Sentiment Analysis
Published	2016-03-22
URL	http://arxiv.org/abs/1603.06679v3
PDF	http://arxiv.org/pdf/1603.06679v3.pdf
PWC	https://paperswithcode.com/paper/recursive-neural-conditional-random-fields
Repo
Framework

Comparison of Optimization Methods in Optical Flow Estimation


Title	Comparison of Optimization Methods in Optical Flow Estimation
Authors	Noranart Vesdapunt, Utkarsh Sinha
Abstract	Optical flow estimation is a widely known problem in computer vision introduced by Gibson, J.J(1950) to describe the visual perception of human by stimulus objects. Estimation of optical flow model can be achieved by solving for the motion vectors from region of interest in the the different timeline. In this paper, we assumed slightly uniform change of velocity between two nearby frames, and solve the optical flow problem by traditional method, Lucas-Kanade(1981). This method performs minimization of errors between template and target frame warped back onto the template. Solving minimization steps requires optimization methods which have diverse convergence rate and error. We explored first and second order optimization methods, and compare their results with Gauss-Newton method in Lucas-Kanade. We generated 105 videos with 10,500 frames by synthetic objects, and 10 videos with 1,000 frames from real world footage. Our experimental results could be used as tuning parameters for Lucas-Kanade method.
Tasks	Optical Flow Estimation
Published	2016-05-02
URL	http://arxiv.org/abs/1605.00572v1
PDF	http://arxiv.org/pdf/1605.00572v1.pdf
PWC	https://paperswithcode.com/paper/comparison-of-optimization-methods-in-optical
Repo
Framework

A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning


Title	A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning
Authors	Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro
Abstract	We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple parallel processors to operate on a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose training subsets uniformly at random. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is convex. Moreover, we present an accelerated version of RAPSA (ARAPSA) that incorporates the objective function curvature information by premultiplying the descent direction by a Hessian approximation matrix. We further extend the results for asynchronous settings and show that if the processors perform their updates without any coordination the algorithms are still convergent to the optimal argument. RAPSA and its extensions are then numerically evaluated on a linear estimation problem and a binary image classification task using the MNIST handwritten digit dataset.
Tasks	Image Classification
Published	2016-06-15
URL	http://arxiv.org/abs/1606.04991v1
PDF	http://arxiv.org/pdf/1606.04991v1.pdf
PWC	https://paperswithcode.com/paper/a-class-of-parallel-doubly-stochastic
Repo
Framework

Quantile Reinforcement Learning


Title	Quantile Reinforcement Learning
Authors	Hugo Gilbert, Paul Weng
Abstract	In reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of (discounted) sum of rewards. However, this criterion may not always be suitable, we consider an alternative criterion based on the notion of quantiles. In the case of episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposition on a simple model of the TV show, Who wants to be a millionaire.
Tasks
Published	2016-11-03
URL	http://arxiv.org/abs/1611.00862v1
PDF	http://arxiv.org/pdf/1611.00862v1.pdf
PWC	https://paperswithcode.com/paper/quantile-reinforcement-learning
Repo
Framework

Binary Paragraph Vectors


Title	Binary Paragraph Vectors
Authors	Karol Grzegorczyk, Marcin Kurdziel
Abstract	Recently Le & Mikolov described two log-linear models, called Paragraph Vector, that can be used to learn state-of-the-art distributed representations of documents. Inspired by this work, we present Binary Paragraph Vector models: simple neural networks that learn short binary codes for fast information retrieval. We show that binary paragraph vectors outperform autoencoder-based binary codes, despite using fewer bits. We also evaluate their precision in transfer learning settings, where binary codes are inferred for documents unrelated to the training corpus. Results from these experiments indicate that binary paragraph vectors can capture semantics relevant for various domain-specific documents. Finally, we present a model that simultaneously learns short binary codes and longer, real-valued representations. This model can be used to rapidly retrieve a short list of highly relevant documents from a large document collection.
Tasks	Information Retrieval, Transfer Learning
Published	2016-11-03
URL	http://arxiv.org/abs/1611.01116v3
PDF	http://arxiv.org/pdf/1611.01116v3.pdf
PWC	https://paperswithcode.com/paper/binary-paragraph-vectors
Repo
Framework

Learning to Navigate in Complex Environments


Title	Learning to Navigate in Complex Environments
Authors	Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell
Abstract	Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
Tasks	Depth Estimation
Published	2016-11-11
URL	http://arxiv.org/abs/1611.03673v3
PDF	http://arxiv.org/pdf/1611.03673v3.pdf
PWC	https://paperswithcode.com/paper/learning-to-navigate-in-complex-environments
Repo
Framework

Visualizing Natural Language Descriptions: A Survey


Title	Visualizing Natural Language Descriptions: A Survey
Authors	Kaveh Hassani, Won-Sook Lee
Abstract	A natural language interface exploits the conceptual simplicity and naturalness of the language to create a high-level user-friendly communication channel between humans and machines. One of the promising applications of such interfaces is generating visual interpretations of semantic content of a given natural language that can be then visualized either as a static scene or a dynamic animation. This survey discusses requirements and challenges of developing such systems and reports 26 graphical systems that exploit natural language interfaces and addresses both artificial intelligence and visualization aspects. This work serves as a frame of reference to researchers and to enable further advances in the field.
Tasks
Published	2016-07-03
URL	http://arxiv.org/abs/1607.00623v1
PDF	http://arxiv.org/pdf/1607.00623v1.pdf
PWC	https://paperswithcode.com/paper/visualizing-natural-language-descriptions-a
Repo
Framework

Unit Commitment using Nearest Neighbor as a Short-Term Proxy


Title	Unit Commitment using Nearest Neighbor as a Short-Term Proxy
Authors	Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel
Abstract	We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating outcomes of short-term decisions, to make tractable hierarchical long-term assessment and planning for large power systems. Experimental results on updated versions of IEEE-RTS79 and IEEE-RTS96 show high accuracy measured on operational cost, achieved in runtimes that are lower in several orders of magnitude than the traditional approach.
Tasks
Published	2016-11-30
URL	http://arxiv.org/abs/1611.10215v3
PDF	http://arxiv.org/pdf/1611.10215v3.pdf
PWC	https://paperswithcode.com/paper/unit-commitment-using-nearest-neighbor-as-a
Repo
Framework

Reducing the error of Monte Carlo Algorithms by Learning Control Variates


Title	Reducing the error of Monte Carlo Algorithms by Learning Control Variates
Authors	Brendan D. Tracey, David H. Wolpert
Abstract	Monte Carlo (MC) sampling algorithms are an extremely widely-used technique to estimate expectations of functions f(x), especially in high dimensions. Control variates are a very powerful technique to reduce the error of such estimates, but in their conventional form rely on having an accurate approximation of f, a priori. Stacked Monte Carlo (StackMC) is a recently introduced technique designed to overcome this limitation by fitting a control variate to the data samples themselves. Done naively, forming a control variate to the data would result in overfitting, typically worsening the MC algorithm’s performance. StackMC uses in-sample / out-sample techniques to remove this overfitting. Crucially, it is a post-processing technique, requiring no additional samples, and can be applied to data generated by any MC estimator. Our preliminary experiments demonstrated that StackMC improved the estimates of expectations when it was used to post-process samples produces by a “simple sampling” MC estimator. Here we substantially extend this earlier work. We provide an in-depth analysis of the StackMC algorithm, which we use to construct an improved version of the original algorithm, with lower estimation error. We then perform experiments of StackMC on several additional kinds of MC estimators, demonstrating improved performance when the samples are generated via importance sampling, Latin-hypercube sampling and quasi-Monte Carlo sampling. We also show how to extend StackMC to combine multiple fitting functions, and how to apply it to discrete input spaces x.
Tasks
Published	2016-06-07
URL	http://arxiv.org/abs/1606.02261v1
PDF	http://arxiv.org/pdf/1606.02261v1.pdf
PWC	https://paperswithcode.com/paper/reducing-the-error-of-monte-carlo-algorithms
Repo
Framework

Efficient Convolutional Neural Network with Binary Quantization Layer


Title	Efficient Convolutional Neural Network with Binary Quantization Layer
Authors	Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Lucio Marcenaro, Carlo Regazzoni
Abstract	In this paper we introduce a novel method for segmentation that can benefit from general semantics of Convolutional Neural Network (CNN). Our segmentation proposes visually and semantically coherent image segments. We use binary encoding of CNN features to overcome the difficulty of the clustering on the high-dimensional CNN feature space. These binary encoding can be embedded into the CNN as an extra layer at the end of the network. This results in real-time segmentation. To the best of our knowledge our method is the first attempt on general semantic image segmentation using CNN. All the previous papers were limited to few number of category of the images (e.g. PASCAL VOC). Experiments show that our segmentation algorithm outperform the state-of-the-art non-semantic segmentation methods by a large margin.
Tasks	Quantization, Semantic Segmentation
Published	2016-11-21
URL	http://arxiv.org/abs/1611.06764v1
PDF	http://arxiv.org/pdf/1611.06764v1.pdf
PWC	https://paperswithcode.com/paper/efficient-convolutional-neural-network-with
Repo
Framework

The Price of Anarchy in Auctions


Title	The Price of Anarchy in Auctions
Authors	Tim Roughgarden, Vasilis Syrgkanis, Eva Tardos
Abstract	This survey outlines a general and modular theory for proving approximation guarantees for equilibria of auctions in complex settings. This theory complements traditional economic techniques, which generally focus on exact and optimal solutions and are accordingly limited to relatively stylized settings. We highlight three user-friendly analytical tools: smoothness-type inequalities, which immediately yield approximation guarantees for many auction formats of interest in the special case of complete information and deterministic strategies; extension theorems, which extend such guarantees to randomized strategies, no-regret learning outcomes, and incomplete-information settings; and composition theorems, which extend such guarantees from simpler to more complex auctions. Combining these tools yields tight worst-case approximation guarantees for the equilibria of many widely-used auction formats.
Tasks
Published	2016-07-26
URL	http://arxiv.org/abs/1607.07684v1
PDF	http://arxiv.org/pdf/1607.07684v1.pdf
PWC	https://paperswithcode.com/paper/the-price-of-anarchy-in-auctions
Repo
Framework

A Stratified Analysis of Bayesian Optimization Methods


Title	A Stratified Analysis of Bayesian Optimization Methods
Authors	Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke
Abstract	Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of test functions. These test functions serve to mimic the complexity of hyperparameter optimization problems, the most prominent application of Bayesian optimization, but with a closed form which allows for rapid evaluation and more predictable behavior. This offers a flexible and efficient way to investigate functions with specific properties of interest, such as oscillatory behavior or an optimum on the domain boundary.
Tasks	Hyperparameter Optimization
Published	2016-03-31
URL	http://arxiv.org/abs/1603.09441v1
PDF	http://arxiv.org/pdf/1603.09441v1.pdf
PWC	https://paperswithcode.com/paper/a-stratified-analysis-of-bayesian
Repo
Framework

Fitting Spectral Decay with the $k$-Support Norm


Title	Fitting Spectral Decay with the $k$-Support Norm
Authors	Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos
Abstract	The spectral $k$-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank $k$ matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral $(k,p)$-support norm, whose additional parameter $p$ can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection on the unit ball in the case $p=\infty$. In numerical experiments, we show that allowing $p$ to vary significantly improves performance over the spectral $k$-support norm on various matrix completion benchmarks, and better captures the spectral decay of the underlying model.
Tasks	Matrix Completion
Published	2016-01-04
URL	http://arxiv.org/abs/1601.00449v1
PDF	http://arxiv.org/pdf/1601.00449v1.pdf
PWC	https://paperswithcode.com/paper/fitting-spectral-decay-with-the-k-support
Repo
Framework

A Communication-Efficient Parallel Algorithm for Decision Tree


Title	A Communication-Efficient Parallel Algorithm for Decision Tree
Authors	Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
Abstract	Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called \emph{Parallel Voting Decision Tree (PV-Tree)}, to tackle this challenge. After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-$k$ attributes are selected from each machine according to its local data. Then, globally top-$2k$ attributes are determined by a majority voting among these local candidates. Finally, the full-grained histograms of the globally top-$2k$ attributes are collected from local machines in order to identify the best (most informative) attribute and its split point. PV-Tree can achieve a very low communication cost (independent of the total number of attributes) and thus can scale out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near optimal decision tree, since it can find the best attribute with a large probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.
Tasks
Published	2016-11-04
URL	http://arxiv.org/abs/1611.01276v1
PDF	http://arxiv.org/pdf/1611.01276v1.pdf
PWC	https://paperswithcode.com/paper/a-communication-efficient-parallel-algorithm
Repo
Framework