Paper Group NANR 154
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent. The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach. Automatic Threshold Detection for Data Selection in Machine Translation. A New Alternating Direction Method for Linear Programming. Adaptive Accelerated Gradient Converging Method under Hölder …
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
Title | When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent |
Authors | Mert Gurbuzbalaban, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Vanli |
Abstract | The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in the selection of update coordinates. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples, and more generally problem classes, for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent |
http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent.pdf | |
PWC | https://paperswithcode.com/paper/when-cyclic-coordinate-descent-outperforms |
Repo | |
Framework | |
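As a quick empirical companion to the entry above, here is a minimal sketch comparing cyclic and randomized coordinate descent on a convex quadratic. The matrix, seed, and epoch count are arbitrary illustrative choices, not the paper's worst-case construction; on many instances either order can win, which is exactly the comparison the paper formalizes.

```python
# Cyclic vs. randomized coordinate descent on f(x) = 0.5 x'Ax - b'x.
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)            # an arbitrary positive-definite Hessian
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)     # the unique minimizer

def cd(order_fn, epochs=200):
    """Exact coordinate minimization; order_fn(k) yields epoch k's coordinate order."""
    x = np.zeros(n)
    for k in range(epochs):
        for i in order_fn(k):
            # minimize f over coordinate i with all other coordinates fixed
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return np.linalg.norm(x - x_star)

ccd_err = cd(lambda k: range(n))                    # fixed cyclic order
rcd_err = cd(lambda k: rng.integers(0, n, size=n))  # i.i.d. random coordinates
print(f"CCD error: {ccd_err:.3e}   RCD error: {rcd_err:.3e}")
```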
The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach
Title | The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach |
Authors | John Duselis, Michael Hutt, Jeremy Gwinnup, James Davis, Joshua Sandvick |
Abstract | |
Tasks | Image Captioning, Machine Translation, Multimodal Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4748/ |
https://www.aclweb.org/anthology/W17-4748 | |
PWC | https://paperswithcode.com/paper/the-afrl-osu-wmt17-multimodal-translation |
Repo | |
Framework | |
Automatic Threshold Detection for Data Selection in Machine Translation
Title | Automatic Threshold Detection for Data Selection in Machine Translation |
Authors | Mirela-Stefania Duma, Wolfgang Menzel |
Abstract | |
Tasks | Domain Adaptation, Information Retrieval, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4754/ |
https://www.aclweb.org/anthology/W17-4754 | |
PWC | https://paperswithcode.com/paper/automatic-threshold-detection-for-data |
Repo | |
Framework | |
A New Alternating Direction Method for Linear Programming
Title | A New Alternating Direction Method for Linear Programming |
Authors | Sinong Wang, Ness Shroff |
Abstract | It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$. However, such a rate is related to the problem dimension, and the algorithm exhibits a slow and fluctuating ``tail convergence'' in practice. In this paper, we propose a new variable splitting method for LP and prove that our method has a convergence rate of $O(\|\mathbf{A}\|^2\log(1/\epsilon))$. The proof is based on simultaneously estimating the distance from a pair of primal-dual iterates to the optimal primal and dual solution sets by certain residuals. In practice, this yields a new first-order LP solver that can exploit both the sparsity and the specific structure of the matrix $\mathbf{A}$, achieving a significant speedup on important problems such as basis pursuit, inverse covariance matrix estimation, L1 SVM, and nonnegative matrix factorization, compared with the current fastest LP solvers. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming |
http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming.pdf | |
PWC | https://paperswithcode.com/paper/a-new-alternating-direction-method-for-linear |
Repo | |
Framework | |
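For context, the sketch below implements the textbook ADMM splitting for a standard-form LP ($\min\, c^\top x$ s.t. $Ax=b$, $x \ge 0$), not the paper's new variable splitting; the toy problem data and penalty parameter are illustrative assumptions.

```python
import numpy as np

def admm_lp(c, A, b, rho=1.0, iters=500):
    """ADMM for  min c'x  s.t.  Ax = b, x >= 0,  via the splitting x = z."""
    m, n = A.shape
    z = np.zeros(n); u = np.zeros(n)
    AAT = A @ A.T                      # small KKT system, reused every iteration
    for _ in range(iters):
        # x-update: equality-constrained quadratic, solved through its KKT conditions
        nu = np.linalg.solve(AAT, rho * (A @ (z - u) - b) - A @ c)
        x = z - u - (c + A.T @ nu) / rho
        z = np.maximum(x + u, 0.0)     # z-update: project onto the nonnegative orthant
        u = u + x - z                  # scaled dual update
    return z

# toy example:  min -x1 - 2*x2  s.t.  x1 + x2 + s = 4,  x, s >= 0
c = np.array([-1.0, -2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
print(admm_lp(c, A, b))                # expect approximately [0, 4, 0]
```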
Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition
Title | Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition |
Authors | Mingrui Liu, Tianbao Yang |
Abstract | Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a condition weaker than strong convexity, namely a quadratic growth condition (QGC). However, the faster convergence of the restarting APG method relies on the potentially unknown constant in the QGC to restart APG appropriately, which restricts its applicability. We address this issue by developing novel adaptive gradient converging methods, i.e., leveraging the magnitude of the proximal gradient as a criterion for restart and termination. Our analysis extends to a much more general condition beyond the QGC, namely the Hölderian error bound (HEB) condition. {\it The key technique} for our development is a novel synthesis of {\it adaptive regularization and a conditional restarting scheme}, which extends previous work focusing on strongly convex problems to a much broader family of problems. Furthermore, we demonstrate that our results have important implications and applications in machine learning: (i) if the objective function is coercive and semi-algebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber norm regularization and a convex smooth piecewise quadratic loss (e.g., square loss, squared hinge loss, and Huber loss), the proposed algorithm is parameter-free and enjoys a {\it faster linear convergence} than PG without any other assumptions (e.g., a restricted eigenvalue condition). It is notable that our linear convergence results for the aforementioned problems are global instead of local. To the best of our knowledge, these improved results are shown for the first time in this work. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition |
http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-accelerated-gradient-converging |
Repo | |
Framework | |
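The following is a loose sketch of the restarting idea described above, applied to FISTA on an $\ell_1$-regularized least-squares problem: the magnitude of the proximal gradient serves as the restart criterion. The halving threshold and step-size handling are illustrative assumptions, not the paper's exact adaptive schedule.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apg_prox_restart(A, b, lam, eta, iters=2000):
    """FISTA for 0.5*||Ax-b||^2 + lam*||x||_1, restarting momentum whenever the
    proximal-gradient magnitude halves; eta should be at most 1/||A||_2^2."""
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0; G_ref = None
    for _ in range(iters):
        grad = A.T @ (A @ y - b)
        x_new = soft(y - eta * grad, eta * lam)
        G = np.linalg.norm(y - x_new) / eta        # proximal gradient magnitude
        if G_ref is None:
            G_ref = G
        if G <= 0.5 * G_ref:                       # criterion met: restart momentum
            t, y, G_ref = 1.0, x_new.copy(), G
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x
```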
Creating register sub-corpora for the Finnish Internet Parsebank
Title | Creating register sub-corpora for the Finnish Internet Parsebank |
Authors | Veronika Laippala, Juhani Luotolahti, Aki-Juhani Kyröläinen, Tapio Salakoski, Filip Ginter |
Abstract | |
Tasks | Information Retrieval |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0218/ |
https://www.aclweb.org/anthology/W17-0218 | |
PWC | https://paperswithcode.com/paper/creating-register-sub-corpora-for-the-finnish |
Repo | |
Framework | |
A Two-Stage Parsing Method for Text-Level Discourse Analysis
Title | A Two-Stage Parsing Method for Text-Level Discourse Analysis |
Authors | Yizhong Wang, Sujian Li, Houfeng Wang |
Abstract | Previous work introduced transition-based algorithms to form a unified architecture for parsing rhetorical structures (including span, nuclearity, and relation), but did not achieve satisfactory performance. In this paper, we propose that a transition-based model is more appropriate for parsing the naked discourse tree (i.e., identifying span and nuclearity) due to data sparsity. At the same time, we argue that relation labeling can benefit from the naked tree structure and should be treated elaborately, with three kinds of relations considered: within-sentence, across-sentence, and across-paragraph relations. Thus, we design a pipelined two-stage parsing method for generating an RST tree from text. Experimental results show that our method achieves state-of-the-art performance, especially on span and nuclearity identification. |
Tasks | Dependency Parsing, Document Summarization, Sentiment Analysis |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2029/ |
https://www.aclweb.org/anthology/P17-2029 | |
PWC | https://paperswithcode.com/paper/a-two-stage-parsing-method-for-text-level |
Repo | |
Framework | |
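To make the two-stage design concrete, here is a schematic shift-reduce skeleton for stage one, building the unlabeled ("naked") binary tree over discourse units; relation labeling would run as a separate second stage over the finished tree. The stand-in `prefer_reduce` function and the toy heuristic are hypothetical placeholders for the learned transition classifier.

```python
def parse_naked_tree(edus, prefer_reduce):
    """edus: elementary discourse units; prefer_reduce(stack, buffer) stands in
    for a learned classifier choosing between SHIFT and REDUCE."""
    stack, buffer = [], list(edus)
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and (not buffer or prefer_reduce(stack, buffer)):
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))     # REDUCE: merge the top two subtrees
        else:
            stack.append(buffer.pop(0))     # SHIFT: move the next unit onto the stack
    return stack[0]

# toy run with a trivial heuristic in place of the classifier
print(parse_naked_tree(["e1", "e2", "e3", "e4"], lambda s, b: len(s) > 2))
```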
Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers
Title | Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers |
Authors | Julien Tourille, Olivier Ferret, Aurélie Névéol, Xavier Tannier |
Abstract | We present a neural architecture for containment relation identification between medical events and/or temporal expressions. We experiment on a corpus of de-identified clinical notes in English from the Mayo Clinic, namely the THYME corpus. Our model achieves an F-measure of 0.613 and outperforms the best result reported on this corpus to date. |
Tasks | Relation Extraction, Temporal Information Extraction |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2035/ |
https://www.aclweb.org/anthology/P17-2035 | |
PWC | https://paperswithcode.com/paper/neural-architecture-for-temporal-relation |
Repo | |
Framework | |
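A minimal PyTorch sketch of a bidirectional-LSTM relation classifier in the spirit of the entry above; the embedding size, hidden size, max-pooling choice, and two-class output are illustrative assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class BiLSTMRelationClassifier(nn.Module):
    """Encode the token window around a candidate (event, temporal expression)
    pair with a bidirectional LSTM, then classify CONTAINS vs. NO-RELATION."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        pooled, _ = h.max(dim=1)              # max-pool the BiLSTM states over time
        return self.out(pooled)               # one logit per relation class

model = BiLSTMRelationClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 30)))   # 4 candidate windows of 30 tokens
print(logits.shape)                                  # torch.Size([4, 2])
```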
Building a Better Bitext for Structurally Different Languages through Self-training
Title | Building a Better Bitext for Structurally Different Languages through Self-training |
Authors | Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha |
Abstract | We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment, and a self-training procedure enables better alignment, at both the document and sentence level. We also propose several evaluation methods for the resulting alignment. |
Tasks | Information Retrieval, Machine Translation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/W17-5601/ |
https://www.aclweb.org/anthology/W17-5601 | |
PWC | https://paperswithcode.com/paper/building-a-better-bitext-for-structurally |
Repo | |
Framework | |
An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation
Title | An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation |
Authors | David Anderson, Ming Gu |
Abstract | Low-rank matrix approximation is a fundamental tool in data analysis for processing large datasets, reducing noise, and finding important signals. In this work, we present a novel truncated LU factorization called Spectrum-Revealing LU (SRLU) for effective low-rank matrix approximation, and develop a fast algorithm to compute an SRLU factorization. We provide both matrix and singular value approximation error bounds for the SRLU approximation computed by our algorithm. Our analysis suggests that SRLU is competitive with the best low-rank matrix approximation methods, deterministic or randomized, in both computational complexity and approximation quality. Numerical experiments illustrate that SRLU preserves sparsity, highlights important data features and variables, can be efficiently updated, and calculates data approximations nearly as accurately as the best possible. To the best of our knowledge, this is the first practical variant of the LU factorization for effective and efficient low-rank matrix approximation. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=833 |
http://proceedings.mlr.press/v70/anderson17a/anderson17a.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-sparsity-preserving-online |
Repo | |
Framework | |
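For orientation, the sketch below shows what a truncated LU low-rank approximation looks like with plain partial pivoting; the spectrum-revealing pivot selection that defines SRLU is the paper's contribution and is not reproduced here.

```python
import numpy as np
from scipy.linalg import lu

def truncated_lu(A, k):
    """Rank-k approximation from a pivoted LU factorization A = P @ L @ U."""
    P, L, U = lu(A)
    return P @ L[:, :k] @ U[:k, :]

rng = np.random.default_rng(1)
# a test matrix with rapidly decaying singular values
A = (rng.standard_normal((200, 50)) * (0.5 ** np.arange(50))) @ rng.standard_normal((50, 200))
Ak = truncated_lu(A, k=10)
print("relative Frobenius error:", np.linalg.norm(A - Ak) / np.linalg.norm(A))
```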
Core Arguments in Universal Dependencies
Title | Core Arguments in Universal Dependencies |
Authors | Daniel Zeman |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6532/ |
https://www.aclweb.org/anthology/W17-6532 | |
PWC | https://paperswithcode.com/paper/core-arguments-in-universal-dependencies |
Repo | |
Framework | |
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Title | Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology |
Authors | |
Abstract | |
Tasks | |
Published | 2017-10-01 |
URL | https://www.aclweb.org/anthology/W17-6600/ |
https://www.aclweb.org/anthology/W17-6600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-11th-brazilian-symposium |
Repo | |
Framework | |
Subset Selection and Summarization in Sequential Data
Title | Subset Selection and Summarization in Sequential Data |
Authors | Ehsan Elhamifar, M. Clara De Paolis Kaluza |
Abstract | Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, which leads to high encoding, diversity, and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives. |
Tasks | Time Series, Video Summarization |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data |
http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data.pdf | |
PWC | https://paperswithcode.com/paper/subset-selection-and-summarization-in |
Repo | |
Framework | |
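Since the formulation generalizes the facility-location objective, a sketch of the classical, dynamics-free baseline may help: greedy maximization of $F(S)=\sum_i \max_{j \in S} \mathrm{sim}(i,j)$. The paper's sequential version adds transition potentials and uses max-sum message passing rather than this greedy loop.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k representatives maximizing sum_i max_{j in S} sim[i, j].
    Assumes nonnegative similarities; sim[i, j] scores item i against candidate j."""
    covered = np.zeros(sim.shape[0])      # best similarity achieved so far, per item
    selected = []
    for _ in range(k):
        gains = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        gains[selected] = -1.0            # never re-pick a representative
        j = int(np.argmax(gains))
        selected.append(j)
        covered = np.maximum(covered, sim[:, j])
    return selected

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 8))
sim = X @ X.T
sim -= sim.min()                          # shift to nonnegative
print(greedy_facility_location(sim, k=5))
```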
Algorithms for $\ell_p$ Low-Rank Approximation
Title | Algorithms for $\ell_p$ Low-Rank Approximation |
Authors | Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff |
Abstract | We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation algorithms for this robust version of low-rank approximation that work for every value of $p$. Our algorithms are simple, easy to implement, work well in practice, and illustrate interesting tradeoffs between the approximation quality, the running time, and the rank of the approximating matrix. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=710 |
http://proceedings.mlr.press/v70/chierichetti17a/chierichetti17a.pdf | |
PWC | https://paperswithcode.com/paper/algorithms-for-ell_p-low-rank-approximation |
Repo | |
Framework | |
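As a point of reference for the $p=1$ case, here is a common heuristic for entrywise $\ell_1$ low-rank approximation: alternating minimization where each row update is an IRLS-approximated least-absolute-deviations fit. This is not the paper's provably good algorithm; it only illustrates the objective being optimized.

```python
import numpy as np

def irls_row(a, V, u, steps=5, eps=1e-6):
    """Approximate min_u ||a - V @ u||_1 by iteratively reweighted least squares."""
    for _ in range(steps):
        w = 1.0 / np.sqrt(np.abs(a - V @ u) + eps)    # sqrt of the IRLS weights
        u = np.linalg.lstsq(w[:, None] * V, w * a, rcond=None)[0]
    return u

def l1_lowrank(A, k, sweeps=10, seed=0):
    """Alternating heuristic for min_{U,V} ||A - U @ V.T||_1 (entrywise)."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k)); V = rng.standard_normal((n, k))
    for _ in range(sweeps):
        for i in range(m):
            U[i] = irls_row(A[i], V, U[i])
        for j in range(n):
            V[j] = irls_row(A[:, j], U, V[j])
    return U, V

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))
A[5, 7] += 50.0                                       # one gross outlier
U, V = l1_lowrank(A, k=8)
print("median absolute error:", np.median(np.abs(A - U @ V.T)))
```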
Relative Fisher Information and Natural Gradient for Learning Large Modular Models
Title | Relative Fisher Information and Natural Gradient for Learning Large Modular Models |
Authors | Ke Sun, Frank Nielsen |
Abstract | Fisher information and the natural gradient have provided deep insights and powerful tools for artificial neural networks. However, the related analysis becomes more and more difficult as the learner's structure grows large and complex. This paper makes a preliminary step towards a new direction. We extract a local component from a large neural system and define its relative Fisher information metric, which accurately describes this small component and is invariant to the other parts of the system. This concept is important because the geometric structure is much simplified, and it can easily be applied to guide the learning of neural networks. We provide an analysis of a list of commonly used components and demonstrate how to use this concept to further improve optimization. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=458 |
http://proceedings.mlr.press/v70/sun17b/sun17b.pdf | |
PWC | https://paperswithcode.com/paper/relative-fisher-information-and-natural |
Repo | |
Framework | |
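To ground the terminology, the sketch below takes natural-gradient steps for logistic regression, preconditioning the gradient by a damped empirical Fisher estimate. It illustrates plain $F^{-1}\nabla L$ preconditioning only; the paper's relative Fisher metric for a sub-component of a larger model is not shown.

```python
import numpy as np

def natural_gradient_step(theta, X, y, lr=0.5, damping=1e-3):
    """One natural-gradient step: precondition the mean gradient by the
    (damped) empirical Fisher built from per-example gradients."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    per_example = (p - y)[:, None] * X                 # per-example gradients
    grad = per_example.mean(axis=0)
    F = per_example.T @ per_example / len(y)           # empirical Fisher estimate
    return theta - lr * np.linalg.solve(F + damping * np.eye(len(theta)), grad)

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
theta = np.zeros(5)
for _ in range(50):
    theta = natural_gradient_step(theta, X, y)
print(np.round(theta, 2))                              # should approach w_true
```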