Paper Group ANR 407
Interactive Attention for Neural Machine Translation. Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis. Comparison of Optimization Methods in Optical Flow Estimation. A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning. Quantile Reinforcement Learning. Binary Paragraph Vectors. Learning to Navigate in Complex Environments. Visualizing Natural Language Descriptions: A Survey. Unit Commitment using Nearest Neighbor as a Short-Term Proxy. Reducing the error of Monte Carlo Algorithms by Learning Control Variates. Efficient Convolutional Neural Network with Binary Quantization Layer. The Price of Anarchy in Auctions. A Stratified Analysis of Bayesian Optimization Methods. Fitting Spectral Decay with the $k$-Support Norm. A Communication-Efficient Parallel Algorithm for Decision Tree.
Interactive Attention for Neural Machine Translation
Title | Interactive Attention for Neural Machine Translation |
Authors | Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu |
Abstract | Conventional attention-based Neural Machine Translation (NMT) conducts dynamic alignment in generating the target sentence. By repeatedly reading the representation of the source sentence, which remains fixed after being generated by the encoder (Bahdanau et al., 2015), the attention mechanism has greatly enhanced state-of-the-art NMT. In this paper, we propose a new attention mechanism, called INTERACTIVE ATTENTION, which models the interaction between the decoder and the representation of the source sentence during translation by both reading and writing operations. INTERACTIVE ATTENTION can keep track of the interaction history and therefore improve the translation performance. Experiments on the NIST Chinese-English translation task show that INTERACTIVE ATTENTION achieves significant improvements over both the previous attention-based NMT baseline and some state-of-the-art variants of attention-based NMT (i.e., coverage models (Tu et al., 2016)). A neural machine translator with our INTERACTIVE ATTENTION outperforms the open-source attention-based NMT system Groundhog by 4.22 BLEU points and the open-source phrase-based system Moses by 3.94 BLEU points on average over multiple test sets. |
Tasks | Machine Translation |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05011v1 |
PDF | http://arxiv.org/pdf/1610.05011v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-attention-for-neural-machine |
Repo | |
Framework | |
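The read-then-write loop over the source memory described in the abstract can be made concrete. Below is a minimal NumPy sketch of one decoding step, assuming content-based read attention and an erase/add write in the style of memory networks; the gate functions, shapes, and weight names are our own assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interactive_attention_step(memory, query, W_score, W_erase, W_add):
    """One read-write step over the source memory (hypothetical parameterization).

    memory : (n_src, d) source annotations, mutated at every decoding step
    query  : (d,) current decoder state
    """
    # READ: standard content-based attention over the (mutable) memory.
    scores = memory @ (W_score @ query)        # (n_src,)
    alpha = softmax(scores)                    # attention weights
    context = alpha @ memory                   # (d,) read vector

    # WRITE: each source position is updated in proportion to its attention
    # weight, erasing old content and adding new content, so the memory
    # records the interaction history.
    erase = 1.0 / (1.0 + np.exp(-(W_erase @ query)))   # (d,) erase gate in (0,1)
    add = np.tanh(W_add @ query)                       # (d,) candidate content
    memory = memory * (1 - np.outer(alpha, erase)) + np.outer(alpha, add)
    return context, memory

rng = np.random.default_rng(0)
d, n_src = 8, 5
mem = rng.normal(size=(n_src, d))
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
ctx, mem = interactive_attention_step(mem, rng.normal(size=d), *W)
```

Because the write reuses the read weights `alpha`, positions that have already been attended to change the most, which is one plausible way the memory can keep track of what has been translated.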
Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis
Title | Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis |
Authors | Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, Xiaokui Xiao |
Abstract | In aspect-based sentiment analysis, extracting aspect terms along with the opinions being expressed from user-generated content is one of the most important subtasks. Previous studies have shown that exploiting connections between aspect and opinion terms is promising for this task. In this paper, we propose a novel joint model that integrates recursive neural networks and conditional random fields into a unified framework for explicit aspect and opinion term co-extraction. The proposed model simultaneously learns high-level discriminative features and doubly propagates information between aspect and opinion terms. Moreover, it can flexibly incorporate hand-crafted features to further boost its information extraction performance. Experimental results on the SemEval Challenge 2014 dataset show the superiority of our proposed model over several baseline methods as well as the winning systems of the challenge. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2016-03-22 |
URL | http://arxiv.org/abs/1603.06679v3 |
PDF | http://arxiv.org/pdf/1603.06679v3.pdf |
PWC | https://paperswithcode.com/paper/recursive-neural-conditional-random-fields |
Repo | |
Framework | |
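The joint model couples features from a recursive network with CRF inference. As a rough illustration, the sketch below Viterbi-decodes BIO-style aspect/opinion tags from unary scores over token features; the recursive (dependency-tree) feature composition is replaced by a random feature matrix, and the label set and all names are hypothetical.

```python
import numpy as np

def viterbi(unary, trans):
    """Linear-chain CRF decoding. unary: (T, L) scores, trans: (L, L)."""
    T, L = unary.shape
    delta = unary[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans          # (L, L): prev label -> cur label
        back[t] = cand.argmax(axis=0)          # best predecessor per label
        delta = cand.max(axis=0) + unary[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: BIO-style labels for aspect/opinion co-extraction.
labels = ["O", "B-ASP", "I-ASP", "B-OPN", "I-OPN"]
rng = np.random.default_rng(1)
h = rng.normal(size=(6, 16))                   # token features (stand-in for the recursive net)
W_unary = rng.normal(size=(16, len(labels))) * 0.1
trans = rng.normal(size=(len(labels), len(labels))) * 0.1
print(viterbi(h @ W_unary, trans))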
Comparison of Optimization Methods in Optical Flow Estimation
Title | Comparison of Optimization Methods in Optical Flow Estimation |
Authors | Noranart Vesdapunt, Utkarsh Sinha |
Abstract | Optical flow estimation is a widely known problem in computer vision, introduced by Gibson (1950) to describe the visual perception of humans induced by moving stimulus objects. The optical flow model can be estimated by solving for the motion vectors of a region of interest across different points in time. In this paper, we assume an approximately uniform change of velocity between two nearby frames and solve the optical flow problem with the traditional Lucas-Kanade (1981) method, which minimizes the error between the template and the target frame warped back onto the template. The minimization requires optimization methods with diverse convergence rates and errors. We explore first- and second-order optimization methods and compare their results with the Gauss-Newton method used in Lucas-Kanade. We generated 105 videos with 10,500 frames of synthetic objects and 10 videos with 1,000 frames from real-world footage. Our experimental results can be used to tune the parameters of the Lucas-Kanade method. |
Tasks | Optical Flow Estimation |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00572v1 |
PDF | http://arxiv.org/pdf/1605.00572v1.pdf |
PWC | https://paperswithcode.com/paper/comparison-of-optimization-methods-in-optical |
Repo | |
Framework | |
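The Gauss-Newton inner loop the paper benchmarks against can be written compactly. A sketch for a single global translation between two frames (real Lucas-Kanade solves the same least-squares problem per window; the nearest-neighbor warp and stopping rule here are simplifications):

```python
import numpy as np

def lucas_kanade_translation(I0, I1, max_iters=20, tol=1e-6):
    """Estimate one translational flow (u, v) between frames I0 and I1 by
    Gauss-Newton on the sum-of-squared-differences objective."""
    p = np.zeros(2)                               # current (u, v) estimate
    ys, xs = np.mgrid[0:I0.shape[0], 0:I0.shape[1]]
    for _ in range(max_iters):
        # Warp I1 back by the current estimate (nearest-neighbor for brevity).
        xw = np.clip((xs + p[0]).round().astype(int), 0, I0.shape[1] - 1)
        yw = np.clip((ys + p[1]).round().astype(int), 0, I0.shape[0] - 1)
        Iw = I1[yw, xw]
        r = (Iw - I0).ravel()                     # residual image
        gy, gx = np.gradient(Iw)                  # image gradients (rows, cols)
        J = np.stack([gx.ravel(), gy.ravel()], 1) # Jacobian w.r.t. (u, v)
        step = np.linalg.lstsq(J, -r, rcond=None)[0]   # Gauss-Newton step
        p += step
        if np.linalg.norm(step) < tol:
            break
    return p

# Toy check: a Gaussian blob shifted by (u, v) = (3, 2).
ys, xs = np.mgrid[0:64, 0:64]
I0 = np.exp(-((xs - 30.0)**2 + (ys - 30.0)**2) / 50.0)
I1 = np.exp(-((xs - 33.0)**2 + (ys - 32.0)**2) / 50.0)
print(lucas_kanade_translation(I0, I1))          # approx. [3. 2.]
```

Swapping the `lstsq` solve for a first-order step (gradient descent on the same residual) or a full second-order step is exactly the comparison the paper runs.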
A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning
Title | A Class of Parallel Doubly Stochastic Algorithms for Large-Scale Learning |
Authors | Aryan Mokhtari, Alec Koppel, Alejandro Ribeiro |
Abstract | We consider learning problems over training sets in which both the number of training examples and the dimension of the feature vectors are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple parallel processors to operate on a randomly chosen subset of blocks of the feature vector. We call the algorithm stochastic because processors choose training subsets uniformly at random. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is convex. Moreover, we present an accelerated version of RAPSA (ARAPSA) that incorporates the objective function curvature information by premultiplying the descent direction by a Hessian approximation matrix. We further extend the results to asynchronous settings and show that if the processors perform their updates without any coordination the algorithms still converge to the optimal argument. RAPSA and its extensions are then numerically evaluated on a linear estimation problem and a binary image classification task using the MNIST handwritten digit dataset. |
Tasks | Image Classification |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04991v1 |
PDF | http://arxiv.org/pdf/1606.04991v1.pdf |
PWC | https://paperswithcode.com/paper/a-class-of-parallel-doubly-stochastic |
Repo | |
Framework | |
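The abstract specifies the update precisely: each processor draws a random block of coordinates and a random subset of training examples, then updates only that block. A serial NumPy simulation of that scheme (the step size, block count, and minibatch size are arbitrary choices; the real algorithm runs the processors in parallel, including asynchronously):

```python
import numpy as np

def rapsa(grad_fn, w, n_samples, n_blocks=4, n_procs=2,
          batch=8, lr=0.1, iters=200, seed=0):
    """Random Parallel Stochastic Algorithm, simulated serially: at each
    iteration every 'processor' draws a random coordinate block and a
    random minibatch, and updates only that block."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(w.size), n_blocks)
    for _ in range(iters):
        for _ in range(n_procs):
            b = blocks[rng.integers(n_blocks)]        # random coordinate block
            idx = rng.integers(n_samples, size=batch) # random training subset
            g = grad_fn(w, idx)                       # stochastic gradient
            w[b] -= lr * g[b]                         # update only this block
    return w

# Toy usage: least squares, gradient of 0.5*||X w - y||^2 on a minibatch.
rng = np.random.default_rng(1)
X, w_true = rng.normal(size=(500, 20)), rng.normal(size=20)
y = X @ w_true
grad = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
w_hat = rapsa(grad, np.zeros(20), n_samples=500)
```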
Quantile Reinforcement Learning
Title | Quantile Reinforcement Learning |
Authors | Hugo Gilbert, Paul Weng |
Abstract | In reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of the (discounted) sum of rewards. However, this criterion may not always be suitable; we therefore consider an alternative criterion based on the notion of quantiles. For episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposal on a simple model of the TV show Who Wants to Be a Millionaire. |
Tasks | |
Published | 2016-11-03 |
URL | http://arxiv.org/abs/1611.00862v1 |
PDF | http://arxiv.org/pdf/1611.00862v1.pdf |
PWC | https://paperswithcode.com/paper/quantile-reinforcement-learning |
Repo | |
Framework | |
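The slow timescale of such a scheme is a stochastic-approximation update that tracks a quantile rather than a mean. A sketch of that update alone (the fast timescale, which would adapt the policy against the current quantile estimate, is omitted; the step-size schedule is our own choice):

```python
import numpy as np

def track_quantile(sample_return, tau=0.5, iters=5000, seed=0):
    """Stochastic-approximation estimate of the tau-quantile of the return
    distribution: a gradient step on the pinball loss per sample."""
    rng = np.random.default_rng(seed)
    rho = 0.0
    for t in range(1, iters + 1):
        r = sample_return(rng)
        beta = 1.0 / t**0.8                    # slowly decaying step size
        rho += beta * (tau - float(r <= rho))  # pinball-loss gradient step
    return rho

# Toy check: the 0.9-quantile of a standard normal is about 1.2816.
print(track_quantile(lambda rng: rng.normal(), tau=0.9))
```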
Binary Paragraph Vectors
Title | Binary Paragraph Vectors |
Authors | Karol Grzegorczyk, Marcin Kurdziel |
Abstract | Recently Le & Mikolov described two log-linear models, called Paragraph Vector, that can be used to learn state-of-the-art distributed representations of documents. Inspired by this work, we present Binary Paragraph Vector models: simple neural networks that learn short binary codes for fast information retrieval. We show that binary paragraph vectors outperform autoencoder-based binary codes, despite using fewer bits. We also evaluate their precision in transfer learning settings, where binary codes are inferred for documents unrelated to the training corpus. Results from these experiments indicate that binary paragraph vectors can capture semantics relevant for various domain-specific documents. Finally, we present a model that simultaneously learns short binary codes and longer, real-valued representations. This model can be used to rapidly retrieve a short list of highly relevant documents from a large document collection. |
Tasks | Information Retrieval, Transfer Learning |
Published | 2016-11-03 |
URL | http://arxiv.org/abs/1611.01116v3 |
PDF | http://arxiv.org/pdf/1611.01116v3.pdf |
PWC | https://paperswithcode.com/paper/binary-paragraph-vectors |
Repo | |
Framework | |
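At retrieval time, short binary codes reduce search to a Hamming-distance scan. A sketch under the assumption that codes come from rounding a sigmoid code layer at inference (the random "database" below is only a stand-in for learned document codes):

```python
import numpy as np

def binarize(codes):
    """Round real-valued sigmoid activations to bits (inference only;
    training works with the real-valued activations)."""
    return (codes > 0.5).astype(np.uint8)

def hamming_search(query_bits, db_bits, top=5):
    """Rank documents by Hamming distance to the query code."""
    dists = (query_bits[None, :] != db_bits).sum(axis=1)
    return np.argsort(dists)[:top]

rng = np.random.default_rng(0)
db = binarize(rng.uniform(size=(1000, 128)))   # stand-in for learned codes
q = binarize(rng.uniform(size=128))
print(hamming_search(q, db))
```

The paper's joint model would use exactly this scan with the short codes to produce a candidate list, then re-rank it with the longer real-valued vectors.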
Learning to Navigate in Complex Environments
Title | Learning to Navigate in Complex Environments |
Authors | Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell |
Abstract | Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities. |
Tasks | Depth Estimation |
Published | 2016-11-11 |
URL | http://arxiv.org/abs/1611.03673v3 |
PDF | http://arxiv.org/pdf/1611.03673v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-navigate-in-complex-environments |
Repo | |
Framework | |
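The auxiliary-task setup amounts to adding supervised losses to the RL objective. A sketch of the combined objective, treating depth prediction as regression and loop closure as binary classification; the loss weights are placeholders, and the paper's exact formulation (e.g., depth as classification over quantized bins) may differ:

```python
import numpy as np

def joint_objective(policy_loss, depth_pred, depth_true,
                    loop_logit, loop_label, w_depth=0.33, w_loop=0.33):
    """Total objective = RL loss + auxiliary depth-prediction loss
    + auxiliary loop-closure-classification loss."""
    depth_loss = np.mean((depth_pred - depth_true) ** 2)   # regression aux task
    p = 1.0 / (1.0 + np.exp(-loop_logit))                  # sigmoid
    loop_loss = -(loop_label * np.log(p) + (1 - loop_label) * np.log(1 - p))
    return policy_loss + w_depth * depth_loss + w_loop * loop_loss

rng = np.random.default_rng(0)
L = joint_objective(policy_loss=1.7,
                    depth_pred=rng.uniform(size=(4, 16)),
                    depth_true=rng.uniform(size=(4, 16)),
                    loop_logit=0.3, loop_label=1.0)
print(L)
```

The auxiliary gradients shape the shared representation, which is the claimed source of the data-efficiency gains.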
Visualizing Natural Language Descriptions: A Survey
Title | Visualizing Natural Language Descriptions: A Survey |
Authors | Kaveh Hassani, Won-Sook Lee |
Abstract | A natural language interface exploits the conceptual simplicity and naturalness of language to create a high-level, user-friendly communication channel between humans and machines. One promising application of such interfaces is generating visual interpretations of the semantic content of a given natural language text, which can then be visualized either as a static scene or a dynamic animation. This survey discusses the requirements and challenges of developing such systems and reviews 26 graphical systems that exploit natural language interfaces, addressing both the artificial intelligence and visualization aspects. This work serves as a frame of reference for researchers and aims to enable further advances in the field. |
Tasks | |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00623v1 |
PDF | http://arxiv.org/pdf/1607.00623v1.pdf |
PWC | https://paperswithcode.com/paper/visualizing-natural-language-descriptions-a |
Repo | |
Framework | |
Unit Commitment using Nearest Neighbor as a Short-Term Proxy
Title | Unit Commitment using Nearest Neighbor as a Short-Term Proxy |
Authors | Gal Dalal, Elad Gilboa, Shie Mannor, Louis Wehenkel |
Abstract | We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating the outcomes of short-term decisions, making tractable the hierarchical long-term assessment and planning of large power systems. Experimental results on updated versions of IEEE-RTS79 and IEEE-RTS96 show high accuracy, measured on operational cost, achieved in runtimes that are several orders of magnitude lower than those of the traditional approach. |
Tasks | |
Published | 2016-11-30 |
URL | http://arxiv.org/abs/1611.10215v3 |
PDF | http://arxiv.org/pdf/1611.10215v3.pdf |
PWC | https://paperswithcode.com/paper/unit-commitment-using-nearest-neighbor-as-a |
Repo | |
Framework | |
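The proxy itself is a nearest-neighbor lookup into a library of precomputed unit-commitment solutions. A sketch, where the choice of input features (hourly demand profiles) and the stand-in cost function are our own assumptions:

```python
import numpy as np

def ucnn_lookup(query, library_inputs, library_costs, k=1):
    """Approximate a short-term unit-commitment outcome by returning the
    stored outcome of the nearest precomputed instance(s)."""
    d = np.linalg.norm(library_inputs - query[None, :], axis=1)
    nearest = np.argsort(d)[:k]
    return library_costs[nearest].mean()       # k-NN average of stored costs

rng = np.random.default_rng(0)
lib_x = rng.uniform(size=(5000, 24))           # e.g., hourly demand profiles
lib_c = lib_x.sum(axis=1) * 1e3                # stand-in for true solver costs
print(ucnn_lookup(rng.uniform(size=24), lib_x, lib_c, k=3))
```

The speedup comes from replacing a mixed-integer optimization solve with a distance computation against the library.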
Reducing the error of Monte Carlo Algorithms by Learning Control Variates
Title | Reducing the error of Monte Carlo Algorithms by Learning Control Variates |
Authors | Brendan D. Tracey, David H. Wolpert |
Abstract | Monte Carlo (MC) sampling algorithms are an extremely widely used technique to estimate expectations of functions f(x), especially in high dimensions. Control variates are a very powerful technique to reduce the error of such estimates, but in their conventional form rely on having an accurate approximation of f a priori. Stacked Monte Carlo (StackMC) is a recently introduced technique designed to overcome this limitation by fitting a control variate to the data samples themselves. Done naively, forming a control variate from the data would result in overfitting, typically worsening the MC algorithm’s performance. StackMC uses in-sample / out-sample techniques to remove this overfitting. Crucially, it is a post-processing technique, requiring no additional samples, and can be applied to data generated by any MC estimator. Our preliminary experiments demonstrated that StackMC improved the estimates of expectations when it was used to post-process samples produced by a “simple sampling” MC estimator. Here we substantially extend this earlier work. We provide an in-depth analysis of the StackMC algorithm, which we use to construct an improved version of the original algorithm, with lower estimation error. We then perform experiments with StackMC on several additional kinds of MC estimators, demonstrating improved performance when the samples are generated via importance sampling, Latin-hypercube sampling and quasi-Monte Carlo sampling. We also show how to extend StackMC to combine multiple fitting functions, and how to apply it to discrete input spaces x. |
Tasks | |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.02261v1 |
PDF | http://arxiv.org/pdf/1606.02261v1.pdf |
PWC | https://paperswithcode.com/paper/reducing-the-error-of-monte-carlo-algorithms |
Repo | |
Framework | |
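The key idea is to fit the control variate out-of-sample and correct the plain MC average with it. A sketch for x ~ Uniform[0,1]^d with a linear control variate, whose expectation is known in closed form; StackMC itself supports richer fitters and other samplers:

```python
import numpy as np

def stackmc_uniform(f, xs, n_folds=5):
    """Stacked Monte Carlo for x ~ Uniform[0,1]^d: fit a linear control
    variate g on out-of-fold data, then estimate
        E[f] ~= E[g] + mean(f(x) - g(x))   on the held-out fold,
    where E[g] is known in closed form."""
    n, d = xs.shape
    fx = np.array([f(x) for x in xs])
    folds = np.array_split(np.random.default_rng(0).permutation(n), n_folds)
    est = []
    for hold in folds:
        train = np.setdiff1d(np.arange(n), hold)
        A = np.hstack([xs[train], np.ones((len(train), 1))])   # design matrix
        coef, *_ = np.linalg.lstsq(A, fx[train], rcond=None)   # fit g out-of-fold
        a, b = coef[:d], coef[d]
        eg = a @ np.full(d, 0.5) + b                 # E[g] under Uniform[0,1]^d
        g_hold = xs[hold] @ a + b
        est.append(eg + np.mean(fx[hold] - g_hold))  # corrected estimate
    return float(np.mean(est))

f = lambda x: np.sum(x**2)                           # true E[f] = d/3
xs = np.random.default_rng(1).uniform(size=(400, 4))
print(stackmc_uniform(f, xs))                        # close to 4/3
```

Fitting g on one part of the data and evaluating the correction on the other is precisely the in-sample/out-sample device that prevents the overfitting the abstract warns about.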
Efficient Convolutional Neural Network with Binary Quantization Layer
Title | Efficient Convolutional Neural Network with Binary Quantization Layer |
Authors | Mahdyar Ravanbakhsh, Hossein Mousavi, Moin Nabi, Lucio Marcenaro, Carlo Regazzoni |
Abstract | In this paper we introduce a novel segmentation method that can benefit from the general semantics of a Convolutional Neural Network (CNN). Our method proposes visually and semantically coherent image segments. We use binary encoding of CNN features to overcome the difficulty of clustering in the high-dimensional CNN feature space. This binary encoding can be embedded into the CNN as an extra layer at the end of the network, which results in real-time segmentation. To the best of our knowledge, our method is the first attempt at general semantic image segmentation using a CNN; previous works were limited to a small number of image categories (e.g., PASCAL VOC). Experiments show that our segmentation algorithm outperforms state-of-the-art non-semantic segmentation methods by a large margin. |
Tasks | Quantization, Semantic Segmentation |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.06764v1 |
PDF | http://arxiv.org/pdf/1611.06764v1.pdf |
PWC | https://paperswithcode.com/paper/efficient-convolutional-neural-network-with |
Repo | |
Framework | |
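A stripped-down version of the pipeline: map each pixel's CNN feature vector to a binary code and let identical codes define segments. Note the paper learns the binary encoding as a network layer; per-channel mean-thresholding here is only a stand-in:

```python
import numpy as np

def binary_segmentation(features, thresh=None):
    """Threshold per-pixel CNN features into binary codes and group pixels
    with the same code into one segment."""
    H, W, C = features.shape
    if thresh is None:
        thresh = features.mean(axis=(0, 1))        # per-channel threshold
    bits = (features > thresh).astype(np.uint8)    # (H, W, C) binary codes
    # Pack each pixel's code into an integer label for grouping.
    weights = 1 << np.arange(C, dtype=np.uint64)
    labels = (bits.astype(np.uint64) * weights).sum(axis=2)
    _, segments = np.unique(labels, return_inverse=True)
    return segments.reshape(H, W)

rng = np.random.default_rng(0)
seg = binary_segmentation(rng.normal(size=(32, 32, 8)))
print(seg.max() + 1, "segments")
```

Clustering in this binary code space sidesteps the distance-concentration problems of clustering raw high-dimensional CNN features, which is the difficulty the abstract points at.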
The Price of Anarchy in Auctions
Title | The Price of Anarchy in Auctions |
Authors | Tim Roughgarden, Vasilis Syrgkanis, Eva Tardos |
Abstract | This survey outlines a general and modular theory for proving approximation guarantees for equilibria of auctions in complex settings. This theory complements traditional economic techniques, which generally focus on exact and optimal solutions and are accordingly limited to relatively stylized settings. We highlight three user-friendly analytical tools: smoothness-type inequalities, which immediately yield approximation guarantees for many auction formats of interest in the special case of complete information and deterministic strategies; extension theorems, which extend such guarantees to randomized strategies, no-regret learning outcomes, and incomplete-information settings; and composition theorems, which extend such guarantees from simpler to more complex auctions. Combining these tools yields tight worst-case approximation guarantees for the equilibria of many widely-used auction formats. |
Tasks | |
Published | 2016-07-26 |
URL | http://arxiv.org/abs/1607.07684v1 |
PDF | http://arxiv.org/pdf/1607.07684v1.pdf |
PWC | https://paperswithcode.com/paper/the-price-of-anarchy-in-auctions |
Repo | |
Framework | |
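The smoothness tool highlighted above can be stated in one inequality. A sketch of the smooth-mechanism definition and the guarantee it yields, in our own notation (the survey's exact statement may differ in details such as the equilibrium classes covered):

```latex
% A mechanism is (\lambda,\mu)-smooth if for every valuation profile v there
% exist deviations b_i^*(v) such that, for every action profile b,
\sum_i u_i\bigl(b_i^*(v),\, b_{-i};\, v_i\bigr)
  \;\ge\; \lambda\,\mathrm{OPT}(v) \;-\; \mu \sum_i P_i(b),
% where P_i(b) is player i's payment. Smoothness then bounds the welfare of
% every equilibrium (and of no-regret learning outcomes) from below:
\mathrm{Welfare}(b) \;\ge\; \frac{\lambda}{\max\{1,\mu\}}\,\mathrm{OPT}(v).
```

The extension and composition theorems named in the abstract preserve exactly these $(\lambda,\mu)$ parameters across randomized strategies, incomplete information, and simultaneous/sequential combinations of auctions.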
A Stratified Analysis of Bayesian Optimization Methods
Title | A Stratified Analysis of Bayesian Optimization Methods |
Authors | Ian Dewancker, Michael McCourt, Scott Clark, Patrick Hayes, Alexandra Johnson, George Ke |
Abstract | Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of test functions. These test functions serve to mimic the complexity of hyperparameter optimization problems, the most prominent application of Bayesian optimization, but with a closed form which allows for rapid evaluation and more predictable behavior. This offers a flexible and efficient way to investigate functions with specific properties of interest, such as oscillatory behavior or an optimum on the domain boundary. |
Tasks | Hyperparameter Optimization |
Published | 2016-03-31 |
URL | http://arxiv.org/abs/1603.09441v1 |
PDF | http://arxiv.org/pdf/1603.09441v1.pdf |
PWC | https://paperswithcode.com/paper/a-stratified-analysis-of-bayesian |
Repo | |
Framework | |
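Since the abstract does not spell out its two metrics, the sketch below only illustrates the general shape of such a ranking mechanism: rank the optimizers per test function by the best value found under a fixed budget, then aggregate the ranks within a stratum. The aggregation rule (mean rank) is entirely our own stand-in:

```python
import numpy as np

def rank_methods(best_values):
    """best_values[i, j] = best objective value found by method j on test
    function i (lower is better). Returns each method's mean rank within
    this stratum of functions."""
    ranks = best_values.argsort(axis=1).argsort(axis=1) + 1.0   # ordinal ranks
    return ranks.mean(axis=0)

results = np.array([[0.10, 0.12, 0.30],    # function 1: method 0 wins
                    [0.05, 0.04, 0.20],    # function 2: method 1 wins
                    [1.00, 1.50, 1.40]])   # function 3: method 0 wins
print(rank_methods(results))               # lower mean rank = better method
```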
Fitting Spectral Decay with the $k$-Support Norm
Title | Fitting Spectral Decay with the $k$-Support Norm |
Authors | Andrew M. McDonald, Massimiliano Pontil, Dimitris Stamos |
Abstract | The spectral $k$-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank $k$ matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral $(k,p)$-support norm, whose additional parameter $p$ can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection on the unit ball in the case $p=\infty$. In numerical experiments, we show that allowing $p$ to vary significantly improves performance over the spectral $k$-support norm on various matrix completion benchmarks, and better captures the spectral decay of the underlying model. |
Tasks | Matrix Completion |
Published | 2016-01-04 |
URL | http://arxiv.org/abs/1601.00449v1 |
PDF | http://arxiv.org/pdf/1601.00449v1.pdf |
PWC | https://paperswithcode.com/paper/fitting-spectral-decay-with-the-k-support |
Repo | |
Framework | |
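The conditional-gradient method mentioned in the abstract only needs a linear minimization oracle over the norm ball. For the $p=2$ case, the ball is the convex hull of rank-$k$ matrices with unit Frobenius norm, so the oracle reduces to a truncated, Frobenius-normalized SVD; the sketch below applies it to a toy matrix completion problem (the $(k,p)$ generalization and the paper's exact solver are not reproduced):

```python
import numpy as np

def lmo_spectral_ksupport(G, k, tau=1.0):
    """Linear minimization oracle over the scaled spectral k-support ball
    (p = 2): argmin_{||S|| <= tau} <G, S> is the best rank-k,
    Frobenius-normalized approximation of -G."""
    U, s, Vt = np.linalg.svd(-G, full_matrices=False)
    sk = s[:k]
    return tau * (U[:, :k] * (sk / np.linalg.norm(sk))) @ Vt[:k]

def frank_wolfe(grad_fn, shape, k, tau, iters=100):
    """Conditional gradient: W <- (1 - g) W + g S with g = 2/(t + 2)."""
    W = np.zeros(shape)
    for t in range(iters):
        S = lmo_spectral_ksupport(grad_fn(W), k, tau)
        W += (2.0 / (t + 2.0)) * (S - W)
    return W

# Toy matrix completion: observe half the entries of a rank-2 matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))
mask = rng.uniform(size=M.shape) < 0.5
grad = lambda W: mask * (W - M)            # grad of 0.5*||mask*(W - M)||_F^2
W_hat = frank_wolfe(grad, M.shape, k=2, tau=np.linalg.norm(M))
```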
A Communication-Efficient Parallel Algorithm for Decision Tree
Title | A Communication-Efficient Parallel Algorithm for Decision Tree |
Authors | Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu |
Abstract | The decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision trees. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called \emph{Parallel Voting Decision Tree (PV-Tree)}, to tackle this challenge. After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-$k$ attributes are selected from each machine according to its local data. Then, the global top-$2k$ attributes are determined by majority voting among these local candidates. Finally, the full-grained histograms of the global top-$2k$ attributes are collected from the local machines in order to identify the best (most informative) attribute and its split point. PV-Tree achieves a very low communication cost (independent of the total number of attributes) and thus scales out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near-optimal decision tree, since it finds the best attribute with high probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms existing parallel decision tree algorithms in the trade-off between accuracy and efficiency. |
Tasks | |
Published | 2016-11-04 |
URL | http://arxiv.org/abs/1611.01276v1 |
PDF | http://arxiv.org/pdf/1611.01276v1.pdf |
PWC | https://paperswithcode.com/paper/a-communication-efficient-parallel-algorithm |
Repo | |
Framework | |
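The local/global voting protocol is easy to simulate. In the sketch below, per-machine information gains are precomputed random scores; real PV-Tree derives them from per-attribute histograms and gathers full histograms only for the $2k$ global candidates, a step we approximate by averaging the candidates' local scores:

```python
import numpy as np
from collections import Counter

def pv_tree_select(local_scores, k):
    """One PV-Tree split selection. local_scores[m, a] = information gain of
    attribute a computed on machine m's local data."""
    M, A = local_scores.shape
    # Local voting: each machine nominates its top-k attributes.
    votes = Counter()
    for m in range(M):
        for a in np.argsort(local_scores[m])[-k:]:
            votes[int(a)] += 1
    # Global voting: keep the 2k attributes with the most nominations.
    candidates = [a for a, _ in votes.most_common(2 * k)]
    # Full histograms of only these 2k attributes would be gathered here;
    # we stand in for that by averaging the candidates' local scores.
    global_scores = local_scores[:, candidates].mean(axis=0)
    return candidates[int(global_scores.argmax())]

rng = np.random.default_rng(0)
scores = rng.uniform(size=(8, 100))            # 8 machines, 100 attributes
scores[:, 17] += 0.5                           # attribute 17 is clearly best
print(pv_tree_select(scores, k=5))             # selects attribute 17
```

Communication scales with $k$ and the number of machines, not with the total number of attributes, which is the source of the claimed low communication cost.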