Paper Group NANR 154
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent. The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach. Automatic Threshold Detection for Data Selection in Machine Translation. A New Alternating Direction Method for Linear Programming. Adaptive Accelerated Gradient Converging Method under Hölder …
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
Title | When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent |
Authors | Mert Gurbuzbalaban, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Vanli |
Abstract | The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in the selection of update coordinates. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples, and more generally problem classes, for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent |
http://papers.nips.cc/paper/7275-when-cyclic-coordinate-descent-outperforms-randomized-coordinate-descent.pdf | |
PWC | https://paperswithcode.com/paper/when-cyclic-coordinate-descent-outperforms |
Repo | |
Framework | |
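As a quick empirical companion to the entry above, here is a minimal sketch comparing cyclic and randomized coordinate descent on a convex quadratic. The matrix, seed, and epoch count are arbitrary illustrative choices, not the paper's worst-case construction; on many instances either order can win, which is exactly the comparison the paper formalizes.

```python
# Cyclic vs. randomized coordinate descent on f(x) = 0.5 x'Ax - b'x.
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)            # an arbitrary positive-definite Hessian
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)     # the unique minimizer

def cd(order_fn, epochs=200):
    """Exact coordinate minimization; order_fn(k) yields epoch k's coordinate order."""
    x = np.zeros(n)
    for k in range(epochs):
        for i in order_fn(k):
            # minimize f over coordinate i with all other coordinates fixed
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return np.linalg.norm(x - x_star)

ccd_err = cd(lambda k: range(n))                    # fixed cyclic order
rcd_err = cd(lambda k: rng.integers(0, n, size=n))  # i.i.d. random coordinates
print(f"CCD error: {ccd_err:.3e}   RCD error: {rcd_err:.3e}")
```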
The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach
Title | The AFRL-OSU WMT17 Multimodal Translation System: An Image Processing Approach |
Authors | John Duselis, Michael Hutt, Jeremy Gwinnup, James Davis, Joshua Sandvick |
Abstract | |
Tasks | Image Captioning, Machine Translation, Multimodal Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4748/ |
https://www.aclweb.org/anthology/W17-4748 | |
PWC | https://paperswithcode.com/paper/the-afrl-osu-wmt17-multimodal-translation |
Repo | |
Framework | |
Automatic Threshold Detection for Data Selection in Machine Translation
Title | Automatic Threshold Detection for Data Selection in Machine Translation |
Authors | Mirela-Stefania Duma, Wolfgang Menzel |
Abstract | |
Tasks | Domain Adaptation, Information Retrieval, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4754/ |
https://www.aclweb.org/anthology/W17-4754 | |
PWC | https://paperswithcode.com/paper/automatic-threshold-detection-for-data |
Repo | |
Framework | |
A New Alternating Direction Method for Linear Programming
Title | A New Alternating Direction Method for Linear Programming |
Authors | Sinong Wang, Ness Shroff |
Abstract | It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$. However, such a rate is related to the problem dimension, and the algorithm exhibits a slow and fluctuating ``tail convergence'' in practice. In this paper, we propose a new variable splitting method for LP and prove that our method has a convergence rate of $O(\|\mathbf{A}\|^2\log(1/\epsilon))$. The proof is based on simultaneously estimating the distance from a pair of primal-dual iterates to the optimal primal and dual solution sets by certain residuals. In practice, this yields a new first-order LP solver that can exploit both the sparsity and the specific structure of the matrix $\mathbf{A}$, achieving a significant speedup on important problems such as basis pursuit, inverse covariance matrix estimation, L1 SVM, and nonnegative matrix factorization, compared with the current fastest LP solvers. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming |
http://papers.nips.cc/paper/6746-a-new-alternating-direction-method-for-linear-programming.pdf | |
PWC | https://paperswithcode.com/paper/a-new-alternating-direction-method-for-linear |
Repo | |
Framework | |
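For context, the sketch below implements the textbook ADMM splitting for a standard-form LP ($\min\, c^\top x$ s.t. $Ax=b$, $x \ge 0$), not the paper's new variable splitting; the toy problem data and penalty parameter are illustrative assumptions.

```python
import numpy as np

def admm_lp(c, A, b, rho=1.0, iters=500):
    """ADMM for  min c'x  s.t.  Ax = b, x >= 0,  via the splitting x = z."""
    m, n = A.shape
    z = np.zeros(n); u = np.zeros(n)
    AAT = A @ A.T                      # small KKT system, reused every iteration
    for _ in range(iters):
        # x-update: equality-constrained quadratic, solved through its KKT conditions
        nu = np.linalg.solve(AAT, rho * (A @ (z - u) - b) - A @ c)
        x = z - u - (c + A.T @ nu) / rho
        z = np.maximum(x + u, 0.0)     # z-update: project onto the nonnegative orthant
        u = u + x - z                  # scaled dual update
    return z

# toy example:  min -x1 - 2*x2  s.t.  x1 + x2 + s = 4,  x, s >= 0
c = np.array([-1.0, -2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])
print(admm_lp(c, A, b))                # expect approximately [0, 4, 0]
```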
Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition
Title | Adaptive Accelerated Gradient Converging Method under Hölderian Error Bound Condition |
Authors | Mingrui Liu, Tianbao Yang |
Abstract | Recent studies have shown that the proximal gradient (PG) method and the accelerated proximal gradient (APG) method with restarting can enjoy linear convergence under a condition weaker than strong convexity, namely a quadratic growth condition (QGC). However, the faster convergence of the restarting APG method relies on the potentially unknown constant in the QGC to restart APG appropriately, which restricts its applicability. We address this issue by developing novel adaptive gradient converging methods, i.e., leveraging the magnitude of the proximal gradient as a criterion for restart and termination. Our analysis extends to a much more general condition beyond the QGC, namely the Hölderian error bound (HEB) condition. {\it The key technique} for our development is a novel synthesis of {\it adaptive regularization and a conditional restarting scheme}, which extends previous work focusing on strongly convex problems to a much broader family of problems. Furthermore, we demonstrate that our results have important implications and applications in machine learning: (i) if the objective function is coercive and semi-algebraic, PG's convergence speed is essentially $o(\frac{1}{t})$, where $t$ is the total number of iterations; (ii) if the objective function consists of an $\ell_1$, $\ell_\infty$, $\ell_{1,\infty}$, or Huber norm regularization and a convex smooth piecewise quadratic loss (e.g., square loss, squared hinge loss, and Huber loss), the proposed algorithm is parameter-free and enjoys a {\it faster linear convergence} than PG without any other assumptions (e.g., a restricted eigenvalue condition). It is notable that our linear convergence results for the aforementioned problems are global instead of local. To the best of our knowledge, these improved results are shown for the first time in this work. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition |
http://papers.nips.cc/paper/6903-adaptive-accelerated-gradient-converging-method-under-holderian-error-bound-condition.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-accelerated-gradient-converging |
Repo | |
Framework | |
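The following is a loose sketch of the restarting idea described above, applied to FISTA on an $\ell_1$-regularized least-squares problem: the magnitude of the proximal gradient serves as the restart criterion. The halving threshold and step-size handling are illustrative assumptions, not the paper's exact adaptive schedule.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def apg_prox_restart(A, b, lam, eta, iters=2000):
    """FISTA for 0.5*||Ax-b||^2 + lam*||x||_1, restarting momentum whenever the
    proximal-gradient magnitude halves; eta should be at most 1/||A||_2^2."""
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0; G_ref = None
    for _ in range(iters):
        grad = A.T @ (A @ y - b)
        x_new = soft(y - eta * grad, eta * lam)
        G = np.linalg.norm(y - x_new) / eta        # proximal gradient magnitude
        if G_ref is None:
            G_ref = G
        if G <= 0.5 * G_ref:                       # criterion met: restart momentum
            t, y, G_ref = 1.0, x_new.copy(), G
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x
```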
Creating register sub-corpora for the Finnish Internet Parsebank
Title | Creating register sub-corpora for the Finnish Internet Parsebank |
Authors | Veronika Laippala, Juhani Luotolahti, Aki-Juhani Kyröläinen, Tapio Salakoski, Filip Ginter |
Abstract | |
Tasks | Information Retrieval |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0218/ |
https://www.aclweb.org/anthology/W17-0218 | |
PWC | https://paperswithcode.com/paper/creating-register-sub-corpora-for-the-finnish |
Repo | |
Framework | |
A Two-Stage Parsing Method for Text-Level Discourse Analysis
Title | A Two-Stage Parsing Method for Text-Level Discourse Analysis |
Authors | Yizhong Wang, Sujian Li, Houfeng Wang |
Abstract | Previous work introduced transition-based algorithms to form a unified architecture for parsing rhetorical structures (including span, nuclearity, and relation), but did not achieve satisfactory performance. In this paper, we propose that a transition-based model is more appropriate for parsing the naked discourse tree (i.e., identifying span and nuclearity) due to data sparsity. At the same time, we argue that relation labeling can benefit from the naked tree structure and should be treated elaborately, with three kinds of relations considered: within-sentence, across-sentence, and across-paragraph relations. Thus, we design a pipelined two-stage parsing method for generating an RST tree from text. Experimental results show that our method achieves state-of-the-art performance, especially on span and nuclearity identification. |
Tasks | Dependency Parsing, Document Summarization, Sentiment Analysis |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2029/ |
https://www.aclweb.org/anthology/P17-2029 | |
PWC | https://paperswithcode.com/paper/a-two-stage-parsing-method-for-text-level |
Repo | |
Framework | |
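To make the two-stage design concrete, here is a schematic shift-reduce skeleton for stage one, building the unlabeled ("naked") binary tree over discourse units; relation labeling would run as a separate second stage over the finished tree. The stand-in `prefer_reduce` function and the toy heuristic are hypothetical placeholders for the learned transition classifier.

```python
def parse_naked_tree(edus, prefer_reduce):
    """edus: elementary discourse units; prefer_reduce(stack, buffer) stands in
    for a learned classifier choosing between SHIFT and REDUCE."""
    stack, buffer = [], list(edus)
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and (not buffer or prefer_reduce(stack, buffer)):
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))     # REDUCE: merge the top two subtrees
        else:
            stack.append(buffer.pop(0))     # SHIFT: move the next unit onto the stack
    return stack[0]

# toy run with a trivial heuristic in place of the classifier
print(parse_naked_tree(["e1", "e2", "e3", "e4"], lambda s, b: len(s) > 2))
```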
Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers
Title | Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers |
Authors | Julien Tourille, Olivier Ferret, Aurélie Névéol, Xavier Tannier |
Abstract | We present a neural architecture for containment relation identification between medical events and/or temporal expressions. We experiment on a corpus of de-identified clinical notes in English from the Mayo Clinic, namely the THYME corpus. Our model achieves an F-measure of 0.613 and outperforms the best result reported on this corpus to date. |
Tasks | Relation Extraction, Temporal Information Extraction |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2035/ |
https://www.aclweb.org/anthology/P17-2035 | |
PWC | https://paperswithcode.com/paper/neural-architecture-for-temporal-relation |
Repo | |
Framework | |
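A minimal PyTorch sketch of a bidirectional-LSTM relation classifier in the spirit of the entry above; the embedding size, hidden size, max-pooling choice, and two-class output are illustrative assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class BiLSTMRelationClassifier(nn.Module):
    """Encode the token window around a candidate (event, temporal expression)
    pair with a bidirectional LSTM, then classify CONTAINS vs. NO-RELATION."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        pooled, _ = h.max(dim=1)              # max-pool the BiLSTM states over time
        return self.out(pooled)               # one logit per relation class

model = BiLSTMRelationClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 30)))   # 4 candidate windows of 30 tokens
print(logits.shape)                                  # torch.Size([4, 2])
```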
Building a Better Bitext for Structurally Different Languages through Self-training
Title | Building a Better Bitext for Structurally Different Languages through Self-training |
Authors | Jungyeul Park, Loïc Dugast, Jeen-Pyo Hong, Chang-Uk Shin, Jeong-Won Cha |
Abstract | We propose a novel method to bootstrap the construction of parallel corpora for new pairs of structurally different languages. We do so by combining the use of a pivot language and self-training. A pivot language enables the use of existing translation models to bootstrap the alignment, and a self-training procedure enables better alignment, at both the document and sentence level. We also propose several evaluation methods for the resulting alignment. |
Tasks | Information Retrieval, Machine Translation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/W17-5601/ |
https://www.aclweb.org/anthology/W17-5601 | |
PWC | https://paperswithcode.com/paper/building-a-better-bitext-for-structurally |
Repo | |
Framework | |
An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation
Title | An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation |
Authors | David Anderson, Ming Gu |
Abstract | Low-rank matrix approximation is a fundamental tool in data analysis for processing large datasets, reducing noise, and finding important signals. In this work, we present a novel truncated LU factorization called Spectrum-Revealing LU (SRLU) for effective low-rank matrix approximation, and develop a fast algorithm to compute an SRLU factorization. We provide both matrix and singular value approximation error bounds for the SRLU approximation computed by our algorithm. Our analysis suggests that SRLU is competitive with the best low-rank matrix approximation methods, deterministic or randomized, in both computational complexity and approximation quality. Numerical experiments illustrate that SRLU preserves sparsity, highlights important data features and variables, can be efficiently updated, and calculates data approximations nearly as accurately as the best possible. To the best of our knowledge, this is the first practical variant of the LU factorization for effective and efficient low-rank matrix approximation. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=833 |
http://proceedings.mlr.press/v70/anderson17a/anderson17a.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-sparsity-preserving-online |
Repo | |
Framework | |
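For orientation, the sketch below shows what a truncated LU low-rank approximation looks like with plain partial pivoting; the spectrum-revealing pivot selection that defines SRLU is the paper's contribution and is not reproduced here.

```python
import numpy as np
from scipy.linalg import lu

def truncated_lu(A, k):
    """Rank-k approximation from a pivoted LU factorization A = P @ L @ U."""
    P, L, U = lu(A)
    return P @ L[:, :k] @ U[:k, :]

rng = np.random.default_rng(1)
# a test matrix with rapidly decaying singular values
A = (rng.standard_normal((200, 50)) * (0.5 ** np.arange(50))) @ rng.standard_normal((50, 200))
Ak = truncated_lu(A, k=10)
print("relative Frobenius error:", np.linalg.norm(A - Ak) / np.linalg.norm(A))
```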
Core Arguments in Universal Dependencies
Title | Core Arguments in Universal Dependencies |
Authors | Daniel Zeman |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6532/ |
https://www.aclweb.org/anthology/W17-6532 | |
PWC | https://paperswithcode.com/paper/core-arguments-in-universal-dependencies |
Repo | |
Framework | |
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Title | Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology |
Authors | |
Abstract | |
Tasks | |
Published | 2017-10-01 |
URL | https://www.aclweb.org/anthology/W17-6600/ |
https://www.aclweb.org/anthology/W17-6600 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-11th-brazilian-symposium |
Repo | |
Framework | |
Subset Selection and Summarization in Sequential Data
Title | Subset Selection and Summarization in Sequential Data |
Authors | Ehsan Elhamifar, M. Clara De Paolis Kaluza |
Abstract | Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, which leads to high encoding, diversity, and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives. |
Tasks | Time Series, Video Summarization |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data |
http://papers.nips.cc/paper/6704-subset-selection-and-summarization-in-sequential-data.pdf | |
PWC | https://paperswithcode.com/paper/subset-selection-and-summarization-in |
Repo | |
Framework | |
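Since the formulation generalizes the facility-location objective, a sketch of the classical, dynamics-free baseline may help: greedy maximization of $F(S)=\sum_i \max_{j \in S} \mathrm{sim}(i,j)$. The paper's sequential version adds transition potentials and uses max-sum message passing rather than this greedy loop.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k representatives maximizing sum_i max_{j in S} sim[i, j].
    Assumes nonnegative similarities; sim[i, j] scores item i against candidate j."""
    covered = np.zeros(sim.shape[0])      # best similarity achieved so far, per item
    selected = []
    for _ in range(k):
        gains = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        gains[selected] = -1.0            # never re-pick a representative
        j = int(np.argmax(gains))
        selected.append(j)
        covered = np.maximum(covered, sim[:, j])
    return selected

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 8))
sim = X @ X.T
sim -= sim.min()                          # shift to nonnegative
print(greedy_facility_location(sim, k=5))
```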
Algorithms for $\ell_p$ Low-Rank Approximation
Title | Algorithms for $\ell_p$ Low-Rank Approximation |
Authors | Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff |
Abstract | We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation algorithms for this robust version of low-rank approximation that work for every value of $p$. Our algorithms are simple, easy to implement, work well in practice, and illustrate interesting tradeoffs between the approximation quality, the running time, and the rank of the approximating matrix. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=710 |
http://proceedings.mlr.press/v70/chierichetti17a/chierichetti17a.pdf | |
PWC | https://paperswithcode.com/paper/algorithms-for-ell_p-low-rank-approximation |
Repo | |
Framework | |
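As a point of reference for the $p=1$ case, here is a common heuristic for entrywise $\ell_1$ low-rank approximation: alternating minimization where each row update is an IRLS-approximated least-absolute-deviations fit. This is not the paper's provably good algorithm; it only illustrates the objective being optimized.

```python
import numpy as np

def irls_row(a, V, u, steps=5, eps=1e-6):
    """Approximate min_u ||a - V @ u||_1 by iteratively reweighted least squares."""
    for _ in range(steps):
        w = 1.0 / np.sqrt(np.abs(a - V @ u) + eps)    # sqrt of the IRLS weights
        u = np.linalg.lstsq(w[:, None] * V, w * a, rcond=None)[0]
    return u

def l1_lowrank(A, k, sweeps=10, seed=0):
    """Alternating heuristic for min_{U,V} ||A - U @ V.T||_1 (entrywise)."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k)); V = rng.standard_normal((n, k))
    for _ in range(sweeps):
        for i in range(m):
            U[i] = irls_row(A[i], V, U[i])
        for j in range(n):
            V[j] = irls_row(A[:, j], U, V[j])
    return U, V

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 40))
A[5, 7] += 50.0                                       # one gross outlier
U, V = l1_lowrank(A, k=8)
print("median absolute error:", np.median(np.abs(A - U @ V.T)))
```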
Relative Fisher Information and Natural Gradient for Learning Large Modular Models
Title | Relative Fisher Information and Natural Gradient for Learning Large Modular Models |
Authors | Ke Sun, Frank Nielsen |
Abstract | Fisher information and the natural gradient have provided deep insights and powerful tools for artificial neural networks. However, the related analysis becomes more and more difficult as the learner's structure grows large and complex. This paper makes a preliminary step towards a new direction. We extract a local component from a large neural system and define its relative Fisher information metric, which accurately describes this small component and is invariant to the other parts of the system. This concept is important because the geometric structure is much simplified, and it can easily be applied to guide the learning of neural networks. We provide an analysis of a list of commonly used components and demonstrate how to use this concept to further improve optimization. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=458 |
http://proceedings.mlr.press/v70/sun17b/sun17b.pdf | |
PWC | https://paperswithcode.com/paper/relative-fisher-information-and-natural |
Repo | |
Framework | |
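To ground the terminology, the sketch below takes natural-gradient steps for logistic regression, preconditioning the gradient by a damped empirical Fisher estimate. It illustrates plain $F^{-1}\nabla L$ preconditioning only; the paper's relative Fisher metric for a sub-component of a larger model is not shown.

```python
import numpy as np

def natural_gradient_step(theta, X, y, lr=0.5, damping=1e-3):
    """One natural-gradient step: precondition the mean gradient by the
    (damped) empirical Fisher built from per-example gradients."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    per_example = (p - y)[:, None] * X                 # per-example gradients
    grad = per_example.mean(axis=0)
    F = per_example.T @ per_example / len(y)           # empirical Fisher estimate
    return theta - lr * np.linalg.solve(F + damping * np.eye(len(theta)), grad)

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
theta = np.zeros(5)
for _ in range(50):
    theta = natural_gradient_step(theta, X, y)
print(np.round(theta, 2))                              # should approach w_true
```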