July 26, 2019


Paper Group NANR 154

Paper Group NANR 154

When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent. The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach. Automatic Threshold Detection for Data Selection in Machine Translation. A New Alternating Direction Method for Linear Programming. Adaptive Accelerated Gradient Converging Method under Hölder …

When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

Title When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
Authors Mert Gurbuzbalaban, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Vanli
Abstract The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in the selection of update coordinates. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples and, more generally, problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent
PDF http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent.pdf
PWC https://paperswithcode.com/paper/when-cyclic-coordinate-descent-outperforms
Repo
Framework
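
For readers who want to experiment with the CCD-vs-RCD comparison above, here is a minimal NumPy sketch of both variants on a convex quadratic, with exact minimization along each coordinate. It is an illustrative toy, not the paper's experimental setup.

```python
import numpy as np

def coordinate_descent(A, b, n_epochs=100, order="cyclic", seed=0):
    """Minimize f(x) = 0.5 x^T A x - b^T x by exact coordinate minimization.

    A must be symmetric positive definite; order is "cyclic" or "random".
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(n_epochs):
        idx = range(n) if order == "cyclic" else rng.integers(0, n, size=n)
        for i in idx:
            # Exact minimization along coordinate i: set df/dx_i = 0.
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

# Tiny example on a random SPD system.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
b = rng.standard_normal(5)
x_star = np.linalg.solve(A, b)
for order in ("cyclic", "random"):
    x = coordinate_descent(A, b, order=order)
    print(order, np.linalg.norm(x - x_star))
```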

The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach

Title The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach
Authors John Duselis, Michael Hutt, Jeremy Gwinnup, James Davis, Joshua Sandvick
Abstract
Tasks Image Captioning, Machine Translation, Multimodal Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4748/
PDF https://www.aclweb.org/anthology/W17-4748
PWC https://paperswithcode.com/paper/the-afrl-osu-wmt17-multimodal-translation
Repo
Framework

Automatic Threshold Detection for Data Selection in Machine Translation

Title Automatic Threshold Detection for Data Selection in Machine Translation
Authors Mirela-Stefania Duma, Wolfgang Menzel
Abstract
Tasks Domain Adaptation, Information Retrieval, Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4754/
PDF https://www.aclweb.org/anthology/W17-4754
PWC https://paperswithcode.com/paper/automatic-threshold-detection-for-data
Repo
Framework

A New Alternating Direction Method for Linear Programming

Title A New Alternating Direction Method for Linear Programming
Authors Sinong Wang, Ness Shroff
Abstract It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$. However, such a rate is related to the problem dimension, and the algorithm exhibits a slow and fluctuating "tail convergence" in practice. In this paper, we propose a new variable splitting method for LP and prove that our method has a convergence rate of $O(\|\mathbf{A}\|^2\log(1/\epsilon))$. The proof is based on simultaneously estimating the distance from a pair of primal-dual iterates to the optimal primal and dual solution sets by certain residuals. In practice, we obtain a new first-order LP solver that can exploit both the sparsity and the specific structure of the matrix $\mathbf{A}$, and that achieves a significant speedup on important problems such as basis pursuit, inverse covariance matrix estimation, L1 SVM and nonnegative matrix factorization, compared with the current fastest LP solvers.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming
PDF http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming.pdf
PWC https://paperswithcode.com/paper/a-new-alternating-direction-method-for-linear
Repo
Framework
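
As context for the splitting discussed above, here is a toy NumPy sketch of the classical ADMM baseline for an LP in standard form, min c^T x s.t. Ax = b, x >= 0, using the splitting x = z. The paper's contribution is a new splitting with a dimension-independent rate, which this sketch does not implement.

```python
import numpy as np

def admm_lp(c, A, b, rho=1.0, n_iter=500):
    """Classical ADMM for  min c^T x  s.t.  Ax = b, x >= 0,
    via the splitting x = z with the nonnegativity constraint on z.
    """
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    # KKT matrix for the equality-constrained x-update (built once).
    K = np.block([[rho * np.eye(n), A.T], [A, np.zeros((m, m))]])
    K_inv = np.linalg.inv(K)  # fine for a toy problem; factorize in practice
    for _ in range(n_iter):
        rhs = np.concatenate([rho * (z - u) - c, b])
        x = (K_inv @ rhs)[:n]            # solve the equality-constrained QP
        z = np.maximum(x + u, 0.0)       # project onto the nonnegative orthant
        u += x - z                       # scaled dual update
    return z

# Toy LP: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0  (optimum is (1, 0)).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
print(admm_lp(c, A, b))
```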

Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition

Title Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition
Authors Mingrui Liu, Tianbao Yang
Abstract Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a condition weaker than strong convexity, namely a quadratic growth condition (QGC). However, the faster convergence of the restarted APG method relies on the potentially unknown constant in the QGC to restart APG appropriately, which restricts its applicability. We address this issue by developing a novel adaptive gradient converging method, i.e., leveraging the magnitude of the proximal gradient as a criterion for restart and termination. Our analysis extends to a much more general condition beyond the QGC, namely the Hölderian error bound (HEB) condition. *The key technique* for our development is a novel synthesis of *adaptive regularization and a conditional restarting scheme*, which extends previous work focusing on strongly convex problems to a much broader family of problems. Furthermore, we demonstrate that our results have important implications and applications in machine learning: (i) if the objective function is coercive and semi-algebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber norm regularization and a convex smooth piecewise quadratic loss (e.g., square loss, squared hinge loss and Huber loss), the proposed algorithm is parameter-free and enjoys a *faster linear convergence* than PG without any other assumptions (e.g., a restricted eigenvalue condition). Notably, our linear convergence results for the aforementioned problems are global rather than local. To the best of our knowledge, these improved results are shown for the first time in this work.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition
PDF http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition.pdf
PWC https://paperswithcode.com/paper/adaptive-accelerated-gradient-converging
Repo
Framework
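
The following sketch illustrates the restarting idea above in a simplified form: FISTA on an $\ell_1$-regularized least-squares problem, restarted whenever the proximal-gradient magnitude falls below a geometrically shrinking threshold. It is a loose stand-in for the paper's conditional restarting scheme, not a faithful implementation.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apg_restart_l1(A, b, lam, n_iter=2000, shrink=0.5):
    """FISTA for  min 0.5*||Ax - b||^2 + lam*||x||_1,  restarted whenever
    the proximal-gradient magnitude drops below a shrinking threshold.
    """
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = y = np.zeros(n)
    t, eps = 1.0, 1.0                      # momentum parameter, restart threshold
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        if L * np.linalg.norm(x_new - y) <= eps:   # proximal-gradient magnitude
            t, y, eps = 1.0, x_new, eps * shrink   # restart, tighten threshold
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
print(np.count_nonzero(apg_restart_l1(A, b, lam=1.0)))
```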

Creating register sub-corpora for the Finnish Internet Parsebank

Title Creating register sub-corpora for the Finnish Internet Parsebank
Authors Veronika Laippala, Juhani Luotolahti, Aki-Juhani Kyröläinen, Tapio Salakoski, Filip Ginter
Abstract
Tasks Information Retrieval
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0218/
PDF https://www.aclweb.org/anthology/W17-0218
PWC https://paperswithcode.com/paper/creating-register-sub-corpora-for-the-finnish
Repo
Framework

A Two-Stage Parsing Method for Text-Level Discourse Analysis

Title A Two-Stage Parsing Method for Text-Level Discourse Analysis
Authors Yizhong Wang, Sujian Li, Houfeng Wang
Abstract Previous work introduced transition-based algorithms to form a unified architecture for parsing rhetorical structures (including span, nuclearity and relation), but did not achieve satisfactory performance. In this paper, we propose that a transition-based model is more appropriate for parsing the naked discourse tree (i.e., identifying span and nuclearity) due to data sparsity. At the same time, we argue that relation labeling can benefit from the naked tree structure and should be treated elaborately, with consideration of three kinds of relations: within-sentence, across-sentence and across-paragraph relations. Thus, we design a pipelined two-stage parsing method for generating an RST tree from text. Experimental results show that our method achieves state-of-the-art performance, especially on span and nuclearity identification.
Tasks Dependency Parsing, Document Summarization, Sentiment Analysis
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2029/
PDF https://www.aclweb.org/anthology/P17-2029
PWC https://paperswithcode.com/paper/a-two-stage-parsing-method-for-text-level
Repo
Framework

Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers

Title Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers
Authors Julien Tourille, Olivier Ferret, Aurélie Névéol, Xavier Tannier
Abstract We present a neural architecture for containment relation identification between medical events and/or temporal expressions. We experiment on a corpus of de-identified clinical notes in English from the Mayo Clinic, namely the THYME corpus. Our model achieves an F-measure of 0.613 and outperforms the best result reported on this corpus to date.
Tasks Relation Extraction, Temporal Information Extraction
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2035/
PDF https://www.aclweb.org/anthology/P17-2035
PWC https://paperswithcode.com/paper/neural-architecture-for-temporal-relation
Repo
Framework
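
A minimal PyTorch sketch of a Bi-LSTM pair classifier for containment relations is shown below; the dimensions, pooling choice, and class names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BiLSTMRelationClassifier(nn.Module):
    """Minimal Bi-LSTM encoder for classifying whether a candidate
    (event, time-expression) pair stands in a CONTAINS relation.
    Illustrative only; hyperparameters are not taken from the paper.
    """
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(4 * hidden, n_classes)  # two spans, 2*hidden each

    def forward(self, tokens, ev_idx, tx_idx):
        # tokens: (batch, seq_len); ev_idx/tx_idx: positions of the mentions.
        h, _ = self.lstm(self.emb(tokens))           # (batch, seq_len, 2*hidden)
        batch = torch.arange(tokens.size(0))
        pair = torch.cat([h[batch, ev_idx], h[batch, tx_idx]], dim=-1)
        return self.out(pair)

model = BiLSTMRelationClassifier(vocab_size=5000)
logits = model(torch.randint(1, 5000, (4, 30)),
               torch.tensor([3, 7, 1, 9]), torch.tensor([12, 20, 5, 25]))
print(logits.shape)  # torch.Size([4, 2])
```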

Building a Better Bitext for Structurally Different Languages through Self-training

Title Building a Better Bitext for Structurally Different Languages through Self-training
Authors Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha
Abstract We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment, and a self-training procedure makes it possible to achieve better alignment at both the document and sentence levels. We also propose several evaluation methods for the resulting alignment.
Tasks Information Retrieval, Machine Translation
Published 2017-11-01
URL https://www.aclweb.org/anthology/W17-5601/
PDF https://www.aclweb.org/anthology/W17-5601
PWC https://paperswithcode.com/paper/building-a-better-bitext-for-structurally
Repo
Framework

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation

Title An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation
Authors David Anderson, Ming Gu
Abstract Low-rank matrix approximation is a fundamental tool in data analysis for processing large datasets, reducing noise, and finding important signals. In this work, we present a novel truncated LU factorization called Spectrum-Revealing LU (SRLU) for effective low-rank matrix approximation, and develop a fast algorithm to compute an SRLU factorization. We provide both matrix and singular value approximation error bounds for the SRLU approximation computed by our algorithm. Our analysis suggests that SRLU is competitive with the best low-rank matrix approximation methods, deterministic or randomized, in both computational complexity and approximation quality. Numeric experiments illustrate that SRLU preserves sparsity, highlights important data features and variables, can be efficiently updated, and calculates data approximations nearly as accurately as the best possible. To the best of our knowledge this is the first practical variant of the LU factorization for effective and efficient low-rank matrix approximation.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=833
PDF http://proceedings.mlr.press/v70/anderson17a/anderson17a.pdf
PWC https://paperswithcode.com/paper/an-efficient-sparsity-preserving-online
Repo
Framework
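
To illustrate the basic mechanism, here is a naive truncated LU low-rank approximation using SciPy's partial-pivoting LU. SRLU additionally selects pivots so that the truncation reveals the spectrum; this sketch does plain truncation only.

```python
import numpy as np
from scipy.linalg import lu

def truncated_lu(A, k):
    """Rank-k approximation from a partial-pivoting LU:
    A ~= P @ L[:, :k] @ U[:k, :]. Plain truncation for illustration;
    it lacks the spectrum-revealing pivot selection of SRLU.
    """
    P, L, U = lu(A)                  # A = P @ L @ U
    return P @ L[:, :k] @ U[:k, :]

rng = np.random.default_rng(0)
# A matrix with rapidly decaying singular values plus small noise.
A = (rng.standard_normal((50, 8)) @ rng.standard_normal((8, 40))
     + 1e-6 * rng.standard_normal((50, 40)))
A_k = truncated_lu(A, 8)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```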

Core Arguments in Universal Dependencies

Title Core Arguments in Universal Dependencies
Authors Daniel Zeman
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6532/
PDF https://www.aclweb.org/anthology/W17-6532
PWC https://paperswithcode.com/paper/core-arguments-in-universal-dependencies
Repo
Framework

Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology

Title Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Authors
Abstract
Tasks
Published 2017-10-01
URL https://www.aclweb.org/anthology/W17-6600/
PDF https://www.aclweb.org/anthology/W17-6600
PWC https://paperswithcode.com/paper/proceedings-of-the-11th-brazilian-symposium
Repo
Framework

Subset Selection and Summarization in Sequential Data

Title Subset Selection and Summarization in Sequential Data
Authors Ehsan Elhamifar, M. Clara De Paolis Kaluza
Abstract Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of the data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore the underlying dynamics of the data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of the data. To do so, we equip items with transition dynamic models and pose the problem as a binary integer optimization over assignments of sequential items to representatives, which leads to high encoding, diversity and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates the dynamics of the data, leading to compatible representatives.
Tasks Time Series, Video Summarization
Published 2017-12-01
URL http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data
PDF http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data.pdf
PWC https://paperswithcode.com/paper/subset-selection-and-summarization-in
Repo
Framework
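
For context, below is a sketch of greedy maximization of the classical (non-sequential) facility-location objective that the paper generalizes; the paper's method instead optimizes a sequential objective with transition potentials via max-sum message passing.

```python
import numpy as np

def greedy_facility_location(S, k):
    """Greedily pick k representatives maximizing sum_i max_{j in R} S[i, j],
    where S[i, j] is the similarity of item i to candidate j.
    """
    n = S.shape[0]
    selected = []
    # Current best similarity of each item to the selected set; a constant
    # finite baseline keeps the argmax well defined on the first pick.
    best = np.full(n, S.min())
    for _ in range(k):
        # Marginal gain of adding each candidate j as a representative.
        gains = (np.maximum(S, best[:, None]) - best[:, None]).sum(axis=0)
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, S[:, j])
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
S = -np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # similarity = -distance
print(greedy_facility_location(S, k=3))
```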

Algorithms for $\ell_p$ Low-Rank Approximation

Title Algorithms for $\ell_p$ Low-Rank Approximation
Authors Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
Abstract We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation algorithms for this robust version of low-rank approximation that work for every value of $p$. Our algorithms are simple, easy to implement, work well in practice, and illustrate interesting tradeoffs between the approximation quality, the running time, and the rank of the approximating matrix.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=710
PDF http://proceedings.mlr.press/v70/chierichetti17a/chierichetti17a.pdf
PWC https://paperswithcode.com/paper/algorithms-for-ell_p-low-rank-approximation
Repo
Framework
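
As a hands-on illustration of the entrywise objective above (here with p = 1), the sketch below runs a simple alternating iteratively-reweighted least-squares heuristic; it carries no approximation guarantee, unlike the paper's algorithms.

```python
import numpy as np

def l1_lowrank_irls(A, k, n_iter=50, eps=1e-6, seed=0):
    """Heuristic entrywise-l1 low-rank approximation: alternately solve
    reweighted least-squares updates for the factors U (m x k) and V (n x k).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    ridge = 1e-8 * np.eye(k)                         # keeps solves well posed
    for _ in range(n_iter):
        W = 1.0 / np.maximum(np.abs(A - U @ V.T), eps)   # IRLS weights
        for i in range(m):   # weighted least-squares update of each row of U
            Wi = W[i]
            G = (V * Wi[:, None]).T @ V + ridge
            U[i] = np.linalg.solve(G, (V * Wi[:, None]).T @ A[i])
        for j in range(n):   # symmetric update of each row of V
            Wj = W[:, j]
            G = (U * Wj[:, None]).T @ U + ridge
            V[j] = np.linalg.solve(G, (U * Wj[:, None]).T @ A[:, j])
    return U, V

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
A[rng.random(A.shape) < 0.05] += 10.0   # sparse gross outliers
U, V = l1_lowrank_irls(A, k=3)
print(np.abs(A - U @ V.T).mean())
```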

Relative Fisher Information and Natural Gradient for Learning Large Modular Models

Title Relative Fisher Information and Natural Gradient for Learning Large Modular Models
Authors Ke Sun, Frank Nielsen
Abstract Fisher information and the natural gradient have provided deep insights and powerful tools for artificial neural networks. However, the related analysis becomes more and more difficult as the learner's structure grows large and complex. This paper makes a preliminary step towards a new direction. We extract a local component from a large neural system and define its relative Fisher information metric, which accurately describes this small component and is invariant to the other parts of the system. This concept is important because the geometric structure is much simplified and it can easily be applied to guide the learning of neural networks. We provide an analysis of a list of commonly used components, and demonstrate how to use this concept to further improve optimization.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=458
PDF http://proceedings.mlr.press/v70/sun17b/sun17b.pdf
PWC https://paperswithcode.com/paper/relative-fisher-information-and-natural
Repo
Framework
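
For reference, these are the standard definitions the abstract builds on (the paper's *relative* Fisher metric restricts them to a local component, which is not reproduced here):

```latex
% Fisher information matrix of a parametric model p(x; \theta):
\[
  F(\theta) \;=\; \mathbb{E}_{x \sim p(x;\theta)}
  \left[ \nabla_\theta \log p(x;\theta)\,
         \nabla_\theta \log p(x;\theta)^{\top} \right],
\]
% and the natural-gradient update for a loss L(\theta) with step size \eta:
\[
  \theta_{t+1} \;=\; \theta_t \;-\; \eta\, F(\theta_t)^{-1}
  \nabla_\theta L(\theta_t).
\]
```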