Paper Group NANR 265
Training Hybrid Language Models by Marginalizing over Segmentations. Neural Networks with Cheap Differential Operators. Efficient Dictionary Learning with Gradient Descent. Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation. Transformer-XL: Language Modeling with Longer-Term Dependency. A Refined Margin Distribution Analysis for Forest Representation Learning. A Bayesian Optimization Framework for Neural Network Compression. Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views. Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses. Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence. Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation. USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing. APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task. Effort-Aware Neural Automatic Post-Editing. Volumetric Correspondence Networks for Optical Flow.
Training Hybrid Language Models by Marginalizing over Segmentations
Title | Training Hybrid Language Models by Marginalizing over Segmentations |
Authors | Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin |
Abstract | In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character n-grams or words. Using such models, multiple potential segmentations usually exist for a given string, for example one using words and one using characters only. Thus, the probability of a string is the sum of the probabilities of all the possible segmentations. Here, we show how it is possible to marginalize over the segmentations efficiently, in order to compute the true probability of a sequence. We apply our technique to three datasets, comprising seven languages, showing improvements over a strong character-level language model. |
Tasks | Language Modelling |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1143/ |
PWC | https://paperswithcode.com/paper/training-hybrid-language-models-by |
Repo | |
Framework | |
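The marginalization described in the abstract admits a simple forward dynamic program: the log-probability of each prefix is accumulated over every choice of last segment. Below is a minimal sketch of that recursion, assuming a hypothetical segment scorer `log_p_segment`; it illustrates the idea, not the paper's implementation.

```python
import math

def logaddexp(a, b):
    # numerically stable log(exp(a) + exp(b))
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def marginal_log_prob(s, log_p_segment, max_len=4):
    # alpha[t] = log-probability of s[:t], summed over all segmentations
    alpha = [-math.inf] * (len(s) + 1)
    alpha[0] = 0.0
    for t in range(1, len(s) + 1):
        for k in range(1, min(max_len, t) + 1):   # length of the last segment
            piece = s[t - k:t]
            alpha[t] = logaddexp(alpha[t], alpha[t - k] + log_p_segment(s[:t - k], piece))
    return alpha[len(s)]

# hypothetical stand-in for a trained conditional segment model
toy_score = lambda prefix, piece: -2.0 * len(piece)
print(marginal_log_prob("hello", toy_score))
```

The cost is O(n · max_len) scorer calls, which is what makes exact marginalization tractable compared to enumerating all 2^(n-1) segmentations.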
Neural Networks with Cheap Differential Operators
Title | Neural Networks with Cheap Differential Operators |
Authors | Tian Qi Chen, David K. Duvenaud |
Abstract | Gradients of neural networks can be computed efficiently for any architecture, but some applications require computing differential operators with higher time complexity. We describe a family of neural network architectures that allow easy access to a family of differential operators involving dimension-wise derivatives, and we show how to modify the backward computation graph to compute them efficiently. We demonstrate the use of these operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9187-neural-networks-with-cheap-differential-operators |
PDF | http://papers.nips.cc/paper/9187-neural-networks-with-cheap-differential-operators.pdf |
PWC | https://paperswithcode.com/paper/neural-networks-with-cheap-differential |
Repo | |
Framework | |
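To make "dimension-wise derivatives" concrete, here is the naive baseline the paper improves on: extracting the Jacobian diagonal, diag(∂f_i/∂x_i), with one backward pass per dimension. The paper's contribution is an architecture that avoids this D-pass loop; that construction is not reproduced here.

```python
import torch

def jacobian_diagonal(f, x):
    # Dimension-wise derivatives d f_i / d x_i via D separate backward
    # passes (the expensive baseline that cheap architectures avoid).
    x = x.detach().requires_grad_(True)
    y = f(x)
    diag = torch.zeros_like(x)
    for i in range(x.numel()):
        g, = torch.autograd.grad(y[i], x, retain_graph=True)
        diag[i] = g[i]
    return diag

f = lambda x: torch.tanh(x) + 0.1 * x.sum()   # toy vector field
print(jacobian_diagonal(f, torch.randn(5)))   # here: 1 - tanh(x)**2 + 0.1
```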
Efficient Dictionary Learning with Gradient Descent
Title | Efficient Dictionary Learning with Gradient Descent |
Authors | Dar Gilboa, Sam Buchanan, John Wright |
Abstract | Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well. |
Tasks | Dictionary Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HyxlHsActm |
PDF | https://openreview.net/pdf?id=HyxlHsActm |
PWC | https://paperswithcode.com/paper/efficient-dictionary-learning-with-gradient |
Repo | |
Framework | |
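For orientation, a sketch of the setting under stated assumptions: randomly initialized Riemannian gradient ascent on the unit sphere, applied here to the common fourth-moment surrogate for recovering one column of a complete orthogonal dictionary. This objective is a stand-in for illustration, not necessarily the one whose geometry the paper analyzes.

```python
import numpy as np

def sphere_gradient_ascent(Y, steps=2000, lr=0.5, seed=0):
    # Maximize f(q) = mean((q @ Y)**4) / 4 over the unit sphere; its
    # maximizers align with dictionary columns under sparse coefficients.
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(Y.shape[0])
    q /= np.linalg.norm(q)
    for _ in range(steps):
        g = Y @ (q @ Y) ** 3 / Y.shape[1]   # Euclidean gradient of f
        g -= (g @ q) * q                    # project onto the tangent space
        q = q + lr * g
        q /= np.linalg.norm(q)              # retract back to the sphere
    return q

# toy instance: orthogonal dictionary, Bernoulli-Gaussian sparse coefficients
rng = np.random.default_rng(1)
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))
X = rng.standard_normal((8, 2000)) * (rng.random((8, 2000)) < 0.2)
q = sphere_gradient_ascent(D @ X)
print(np.abs(D.T @ q).max())   # near 1 when q has aligned with a column of D
```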
Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation
Title | Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation |
Authors | Po-Yi Chen, Alexander H. Liu, Yen-Cheng Liu, Yu-Chiang Frank Wang |
Abstract | Monocular depth estimation is a challenging task in scene understanding, with the goal of acquiring the geometric properties of 3D space from 2D images. Due to the lack of RGB-depth image pairs, unsupervised learning methods aim at deriving depth information with alternative supervision such as stereo pairs. However, most existing works fail to model the geometric structure of objects, which generally results from considering pixel-level objective functions during training. In this paper, we propose SceneNet to overcome this limitation with the aid of semantic understanding from segmentation. Moreover, our proposed model is able to perform region-aware depth estimation by enforcing semantics consistency between stereo pairs. In our experiments, we qualitatively and quantitatively verify the effectiveness and robustness of our model, which performs favorably against state-of-the-art approaches. |
Tasks | Depth Estimation, Monocular Depth Estimation, Scene Understanding |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Towards_Scene_Understanding_Unsupervised_Monocular_Depth_Estimation_With_Semantic-Aware_Representation_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Towards_Scene_Understanding_Unsupervised_Monocular_Depth_Estimation_With_Semantic-Aware_Representation_CVPR_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/towards-scene-understanding-unsupervised |
Repo | |
Framework | |
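One plausible reading of the semantics-consistency idea, assuming stereo supervision: warp the segmentation predicted for one view into the other using the predicted disparity and penalize disagreement. The loss form, disparity sign convention, and tensor shapes below are illustrative assumptions, not SceneNet's published objective.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(seg_left, seg_right, disp_left):
    # seg_*: segmentation logits [B, C, H, W]; disp_left: disparity in
    # pixels [B, H, W]. Warp right-view logits into the left view, then
    # compare distributions (KL used here as an illustrative choice).
    B, C, H, W = seg_left.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2).clone()
    grid[..., 0] -= 2.0 * disp_left / W     # shift x by disparity (normalized units)
    warped = F.grid_sample(seg_right, grid, align_corners=True)
    return F.kl_div(warped.log_softmax(1), seg_left.softmax(1), reduction="batchmean")

loss = semantic_consistency_loss(torch.randn(2, 5, 32, 64), torch.randn(2, 5, 32, 64),
                                 torch.rand(2, 32, 64) * 4)
```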
Transformer-XL: Language Modeling with Longer-Term Dependency
Title | Transformer-XL: Language Modeling with Longer-Term Dependency |
Authors | Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov |
Abstract | We propose a novel neural architecture, Transformer-XL, for modeling longer-term dependency. To address the limitation of fixed-length contexts, we introduce a notion of recurrence by reusing the representations from the history. Empirically, we show state-of-the-art (SoTA) results on both word-level and character-level language modeling datasets, including WikiText-103, One Billion Word, Penn Treebank, and enwik8. Notably, we improve the SoTA results from 1.06 to 0.99 in bpc on enwik8, from 33.0 to 18.9 in perplexity on WikiText-103, and from 28.0 to 23.5 in perplexity on One Billion Word. Performance improves when the attention length increases during evaluation, and our best model attends to up to 1,600 words and 3,800 characters. To quantify the effective length of dependency, we devise a new metric and show that on WikiText-103 Transformer-XL manages to model dependency that is about 80% longer than recurrent networks and 450% longer than Transformer. Moreover, Transformer-XL is up to 1,800+ times faster than vanilla Transformer during evaluation. |
Tasks | Language Modelling |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HJePno0cYm |
PDF | https://openreview.net/pdf?id=HJePno0cYm |
PWC | https://paperswithcode.com/paper/transformer-xl-language-modeling-with-longer |
Repo | |
Framework | |
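The recurrence itself is compact to sketch: hidden states from the previous segment are detached (no gradient) and concatenated as extra attention context. The toy below uses a stock attention module and omits Transformer-XL's relative positional encodings, so it illustrates only the caching mechanism.

```python
import torch

def attend_with_memory(h, mem, attn):
    # h: current segment [B, L, D]; mem: cached states [B, M, D]
    context = torch.cat([mem.detach(), h], dim=1)   # extended context
    out, _ = attn(h, context, context, need_weights=False)
    new_mem = context[:, -mem.size(1):].detach()    # keep most recent states
    return out, new_mem

B, L, D, M = 2, 8, 16, 8
attn = torch.nn.MultiheadAttention(D, num_heads=4, batch_first=True)
mem = torch.zeros(B, M, D)
for segment in torch.randn(3, B, L, D):             # three consecutive segments
    out, mem = attend_with_memory(segment, mem, attn)
```

At evaluation time this same cache is what lets the attention length grow well beyond the training segment length.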
A Refined Margin Distribution Analysis for Forest Representation Learning
Title | A Refined Margin Distribution Analysis for Forest Representation Learning |
Authors | Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou |
Abstract | In this paper, we formulate the forest representation learning approach called CasDF as an additive model which boosts the augmented feature instead of the prediction. We substantially improve the upper bound of the generalization gap from $\mathcal{O}(\sqrt{\ln m/m})$ to $\mathcal{O}(\ln m/m)$ when the margin ratio, i.e., the ratio of the margin standard deviation to the margin mean, is sufficiently small. This tighter upper bound inspires us to optimize the ratio. Therefore, we design a margin distribution reweighting approach for deep forest to achieve a small margin ratio by boosting the augmented feature. Experiments confirm the correlation between the margin distribution and generalization performance. We remark that this study offers a novel understanding of CasDF from the perspective of margin theory and further guides the layer-by-layer forest representation learning. |
Tasks | Representation Learning |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8791-a-refined-margin-distribution-analysis-for-forest-representation-learning |
PDF | http://papers.nips.cc/paper/8791-a-refined-margin-distribution-analysis-for-forest-representation-learning.pdf |
PWC | https://paperswithcode.com/paper/a-refined-margin-distribution-analysis-for |
Repo | |
Framework | |
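Stated schematically, the refinement in the abstract is a bound improvement on the generalization gap (constants and exact conditions are in the paper):

```latex
\underbrace{\mathcal{O}\!\left(\sqrt{\ln m / m}\,\right)}_{\text{previous bound}}
\;\longrightarrow\;
\underbrace{\mathcal{O}\!\left(\ln m / m\right)}_{\text{refined bound}},
\qquad\text{when}\qquad
\frac{\operatorname{std}(\text{margin})}{\operatorname{mean}(\text{margin})}
\;\text{is sufficiently small.}
```

The reweighting approach described in the abstract then targets exactly this ratio.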
A Bayesian Optimization Framework for Neural Network Compression
Title | A Bayesian Optimization Framework for Neural Network Compression |
Authors | Xingchen Ma, Amal Rannen Triki, Maxim Berman, Christos Sagonas, Jacques Cali, Matthew B. Blaschko |
Abstract | Neural network compression is an important step for deploying neural networks where speed is of high importance, or on devices with limited memory. It is necessary to tune compression parameters in order to achieve the desired trade-off between size and performance. This is often done by optimizing the loss on a validation set of data, which should be large enough to approximate the true risk and therefore yield sufficient generalization ability. However, using a full validation set can be computationally expensive. In this work, we develop a general Bayesian optimization framework for optimizing functions that are computed based on U-statistics. We propagate Gaussian uncertainties from the statistics through the Bayesian optimization framework yielding a method that gives a probabilistic approximation certificate of the result. We then apply this to parameter selection in neural network compression. Compression objectives that can be written as U-statistics are typically based on empirical risk and knowledge distillation for deep discriminative models. We demonstrate our method on VGG and ResNet models, and the resulting system can find optimal compression parameters for relatively high-dimensional parametrizations in a matter of minutes on a standard desktop machine, orders of magnitude faster than competing methods. |
Tasks | Neural Network Compression |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Ma_A_Bayesian_Optimization_Framework_for_Neural_Network_Compression_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Ma_A_Bayesian_Optimization_Framework_for_Neural_Network_Compression_ICCV_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/a-bayesian-optimization-framework-for-neural |
Repo | |
Framework | |
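For context, a generic Bayesian-optimization loop of the kind the framework builds on: a Gaussian-process surrogate plus expected improvement over candidate compression parameters. The paper's actual contribution, propagating U-statistic uncertainties through this loop, is not reproduced, and the toy objective is hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X, gp, y_best):
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma                  # minimization convention
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, lo, hi, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(lo, hi, size=(256, len(lo)))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X, y = np.vstack([X, x_next]), np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()

# hypothetical stand-in for "validation loss vs. two compression parameters"
loss = lambda x: (x[0] - 0.3) ** 2 + 0.5 * (x[1] - 0.7) ** 2
print(bayes_opt(loss, np.zeros(2), np.ones(2)))
```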
Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views
Title | Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views |
Authors | Kun Huang, Yifu Wang, Laurent Kneip |
Abstract | The planar motion of ground vehicles is often non-holonomic, which enables a solution of the two-view relative pose problem from a single point feature correspondence. Man-made environments such as underground parking lots are however dominated by line features. Inspired by the planar tri-focal tensor and its ability to handle lines, we establish an n-linear constraint on the locally circular motion of non-holonomic vehicles able to handle an arbitrarily large and dense window of views. We prove that this remains a univariate problem under the assumption of locally constant vehicle speed, and that it can transparently handle both point and vertical line correspondences. In particular, we prove that an application of Viete's formulas for extrapolating trigonometric functions of angle multiples and the Weierstrass substitution casts the problem as one that merely seeks the roots of a univariate polynomial. We present the complete theory of this novel solver, and test it on both simulated and real data. Our results prove that it successfully handles a variety of relevant scenarios, eventually outperforming the 1-point two-view solver. |
Tasks | Motion Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Huang_Motion_Estimation_of_Non-Holonomic_Ground_Vehicles_From_a_Single_Feature_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Motion_Estimation_of_Non-Holonomic_Ground_Vehicles_From_a_Single_Feature_CVPR_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/motion-estimation-of-non-holonomic-ground |
Repo | |
Framework | |
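The algebraic core named in the abstract is standard and worth spelling out: the Weierstrass substitution rewrites trigonometric constraints as rational ones,

```latex
t = \tan\frac{\theta}{2}, \qquad
\sin\theta = \frac{2t}{1+t^{2}}, \qquad
\cos\theta = \frac{1-t^{2}}{1+t^{2}}.
```

Any polynomial equation in sin θ and cos θ thus becomes a rational equation in t; clearing the common (1 + t²)ⁿ denominator leaves a single univariate polynomial whose real roots enumerate the candidate motions.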
Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Title | Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses |
Authors | Étienne Simon, Vincent Guigue, Benjamin Piwowarski |
Abstract | Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss enforcing that all relations are predicted on average. These losses improve the performance of discriminative models, and enable us to train deep neural networks satisfactorily, surpassing the current state of the art on three different datasets. |
Tasks | Relation Extraction |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1133/ |
PWC | https://paperswithcode.com/paper/unsupervised-information-extraction |
Repo | |
Framework | |
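Both regularizers fit in a few lines. The forms below are plausible instantiations consistent with the abstract: a per-sentence entropy term pushing toward confident (skewed) predictions, and a KL term pulling the batch-average prediction toward uniform so that every relation is used on average; the paper's exact losses may differ.

```python
import torch

def skewness_loss(p, eps=1e-8):
    # per-sentence entropy; minimizing it encourages confident predictions
    return -(p * (p + eps).log()).sum(dim=1).mean()

def distribution_loss(p, eps=1e-8):
    # KL(batch-average prediction || uniform): all relations used on average
    mean = p.mean(dim=0)
    uniform = 1.0 / p.size(1)
    return (mean * ((mean + eps) / uniform).log()).sum()

p = torch.softmax(torch.randn(32, 10), dim=1)   # toy relation classifier output
total = skewness_loss(p) + distribution_loss(p)
```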
Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence
Title | Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence |
Authors | Siyi Liu, Lei Guo, Kate Mays, Margrit Betke, Derry Tanti Wijaya |
Abstract | Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as "frames", which, when used in news media, will influence the opinion of their readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation is based on a new dataset of news headlines related to the issue of gun violence in the United States. This Gun Violence Frame Corpus (GVFC) was curated and annotated by journalism and communication experts. Our proposed approach sets a new state-of-the-art performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large-scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/K19-1047/ |
PWC | https://paperswithcode.com/paper/detecting-frames-in-news-headlines-and-its |
Repo | |
Framework | |
Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation
Title | Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation |
Authors | Rocco Tripodi, Roberto Navigli |
Abstract | Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have been shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non-cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph and the payoff as an embedding similarity function that the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-the-art performance, and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations. |
Tasks | Word Sense Disambiguation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1009/ |
PWC | https://paperswithcode.com/paper/game-theory-meets-embeddings-a-unified |
Repo | |
Framework | |
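A toy version of the game dynamics: each ambiguous word keeps a mixed strategy over its candidate senses and updates it with replicator dynamics against its graph neighbors. The payoff numbers below stand in for embedding similarities and are hypothetical.

```python
import numpy as np

def replicator_wsd(W, payoff, sizes, steps=200):
    # W[i, j]: edge weight between words i and j; payoff[(i, j)]: matrix of
    # sense-pair similarities with shape (sizes[i], sizes[j])
    x = [np.full(s, 1.0 / s) for s in sizes]     # uniform mixed strategies
    for _ in range(steps):
        for i in range(len(sizes)):
            u = sum(W[i, j] * payoff[(i, j)] @ x[j]
                    for j in range(len(sizes)) if j != i and W[i, j] > 0)
            if np.isscalar(u):                   # isolated word: nothing to update
                continue
            x[i] = x[i] * u / (x[i] @ u)         # replicator update
    return [int(xi.argmax()) for xi in x]

# two words, two senses each; sense 0 of each word supports the other
W = np.array([[0.0, 1.0], [1.0, 0.0]])
payoff = {(0, 1): np.array([[0.9, 0.1], [0.2, 0.8]])}
payoff[(1, 0)] = payoff[(0, 1)].T
print(replicator_wsd(W, payoff, sizes=[2, 2]))   # [0, 0]
```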
USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing
Title | USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing |
Authors | Santanu Pal, Hongfei Xu, Nico Herbig, Antonio Krüger, Josef van Genabith |
Abstract | In this paper we present an English–German Automatic Post-Editing (APE) system called transference, submitted to the APE Task organized at WMT 2019. Our transference model is based on a multi-encoder transformer architecture. Unlike previous approaches, it (i) uses a transformer encoder block for src, (ii) followed by a transformer decoder block, but without masking, for self-attention on mt, which effectively acts as a second encoder combining src → mt, and (iii) feeds this representation into a final decoder block generating pe. Our model improves over the raw black-box neural machine translation system by 0.9 and 1.0 absolute BLEU points on the WMT 2019 APE development and test set. Our submission ranked 3rd; however, compared to the two top systems, the performance differences are not statistically significant. |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5414/ |
PWC | https://paperswithcode.com/paper/usaar-dfki-the-transference-architecture-for |
Repo | |
Framework | |
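The three blocks (i)-(iii) map naturally onto stock Transformer modules: a decoder layer run without a causal mask is exactly "self-attention plus cross-attention", which is what lets it act as a second encoder over mt. The sketch below follows the abstract's layout only; layer counts, sizes, the shared embedding, and the omitted positional encodings are placeholders, not the submission's configuration.

```python
import torch
import torch.nn as nn

class Transference(nn.Module):
    def __init__(self, vocab, d=512, h=8, layers=6):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)        # shared embedding, for brevity
        enc = nn.TransformerEncoderLayer(d, h, batch_first=True)
        dec = nn.TransformerDecoderLayer(d, h, batch_first=True)
        self.src_enc = nn.TransformerEncoder(enc, layers)   # (i) encode src
        self.mt_enc = nn.TransformerDecoder(dec, layers)    # (ii) unmasked decoder
        self.pe_dec = nn.TransformerDecoder(dec, layers)    # (iii) generate pe
        self.out = nn.Linear(d, vocab)

    def forward(self, src, mt, pe_in):
        src_h = self.src_enc(self.emb(src))
        mt_h = self.mt_enc(self.emb(mt), src_h)  # no tgt_mask: acts as encoder
        causal = nn.Transformer.generate_square_subsequent_mask(pe_in.size(1))
        return self.out(self.pe_dec(self.emb(pe_in), mt_h, tgt_mask=causal))

model = Transference(vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # src
               torch.randint(0, 1000, (2, 9)),   # mt
               torch.randint(0, 1000, (2, 9)))   # pe (shifted right in training)
```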
APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task
Title | APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task |
Authors | Dimitar Shterionov, Joachim Wagner, Félix do Carmo |
Abstract | Automatic post-editing (APE) can be reduced to a machine translation (MT) task, where the source is the output of a specific MT system and the target is its post-edited variant. However, this approach does not consider context information that can be found in the original source of the MT system. Thus a better approach is to employ multi-source MT, where two input sequences are considered – the one being the original source and the other being the MT output. Extra context information can be introduced in the form of extra tokens that identify a certain global property of a group of segments, added as a prefix or a suffix to each segment. Successfully applied in domain adaptation of MT as well as in APE, this technique deserves further attention. In this work we investigate multi-source neural APE (or NPE) systems with training data which has been augmented with two types of extra context tokens. We experiment with authentic and synthetic data provided by WMT 2019 and submit our results to the APE shared task. We also experiment with using statistical machine translation (SMT) methods for APE. While our systems score below the baseline, we consider this work a step towards understanding the added value of extra context in the case of APE. |
Tasks | Automatic Post-Editing, Domain Adaptation, Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5415/ |
PWC | https://paperswithcode.com/paper/ape-through-neural-and-statistical-mt-with |
Repo | |
Framework | |
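The extra-token technique itself is a one-liner: a marker for some global property of a group of segments is prepended (or appended) to each segment. The token names here are hypothetical.

```python
def add_context_token(segments, token, as_suffix=False):
    # mark every segment in a group with a global-property token
    return [f"{seg} {token}" if as_suffix else f"{token} {seg}" for seg in segments]

batch = ["das Haus ist klein", "der Hund schläft"]
print(add_context_token(batch, "<synthetic>"))
# ['<synthetic> das Haus ist klein', '<synthetic> der Hund schläft']
```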
Effort-Aware Neural Automatic Post-Editing
Title | Effort-Aware Neural Automatic Post-Editing |
Authors | Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi |
Abstract | For this round of the WMT 2019 APE shared task, our submission focuses on addressing the "over-correction" problem in APE. Over-correction occurs when the APE system tends to rephrase an already correct MT output, and the resulting sentence is penalized by a reference-based evaluation against human post-edits. Our intuition is that this problem can be prevented by informing the system about the predicted quality of the MT output or, in other terms, the expected amount of needed corrections. For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing. Following the best submissions to the WMT 2018 APE shared task, our backbone architecture is based on multi-source Transformer to encode both the MT output and the corresponding source text. We participated both in the English-German and English-Russian subtasks. In the first subtask, our best submission improved the original MT output quality up to +0.98 BLEU and -0.47 TER. In the second subtask, where the higher quality of the MT output increases the risk of over-correction, none of our submitted runs was able to improve the MT output. |
Tasks | Automatic Post-Editing |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5416/ |
PWC | https://paperswithcode.com/paper/effort-aware-neural-automatic-post-editing |
Repo | |
Framework | |
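The quality-informed token reduces to binning an effort estimate (e.g., predicted TER of the MT output) and prepending the bin's token to both inputs. The bin boundaries and token names below are hypothetical, not the paper's.

```python
def effort_token(predicted_ter):
    # map predicted post-editing effort to a coarse bin token
    bins = [(0.1, "<effort_low>"), (0.3, "<effort_mid>"), (float("inf"), "<effort_high>")]
    return next(tok for bound, tok in bins if predicted_ter <= bound)

src, mt = "the house is small", "das Haus ist klein"
tok = effort_token(0.05)
src, mt = f"{tok} {src}", f"{tok} {mt}"   # prepended to both source and MT
print(src, "||", mt)
```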
Volumetric Correspondence Networks for Optical Flow
Title | Volumetric Correspondence Networks for Optical Flow |
Authors | Gengshan Yang, Deva Ramanan |
Abstract | Many classic tasks in vision – such as the estimation of optical flow or stereo disparities – can be cast as dense correspondence matching. Well-known techniques for doing so make use of a cost volume, typically a 4D tensor of match costs between all pixels in a 2D image and their potential matches in a 2D search window. State-of-the-art (SOTA) deep networks for flow/stereo make use of such volumetric representations as internal layers. However, such layers require significant amounts of memory and compute, making them cumbersome to use in practice. As a result, SOTA networks also employ various heuristics designed to limit volumetric processing, leading to limited accuracy and overfitting. Instead, we introduce several simple modifications that dramatically simplify the use of volumetric layers: (1) volumetric encoder-decoder architectures that efficiently capture large receptive fields, (2) multi-channel cost volumes that capture multi-dimensional notions of pixel similarities, and finally, (3) separable volumetric filtering that significantly reduces computation and parameters while preserving accuracy. Our innovations dramatically improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 10X fewer iterations, and most importantly, our networks generalize across correspondence tasks. On-the-fly adaptation of search windows allows us to repurpose optical flow networks for stereo (and vice versa), and can also be used to implement adaptive networks that increase search window sizes on-demand. |
Tasks | Optical Flow Estimation |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8367-volumetric-correspondence-networks-for-optical-flow |
PDF | http://papers.nips.cc/paper/8367-volumetric-correspondence-networks-for-optical-flow.pdf |
PWC | https://paperswithcode.com/paper/volumetric-correspondence-networks-for |
Repo | |
Framework | |
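Modification (3), separable volumetric filtering, is straightforward to sketch: a full k×k×k kernel over the cost volume is replaced by a 1×k×k spatial convolution followed by a k×1×1 convolution across matching hypotheses, cutting per-filter parameters from roughly k³ to k²+k. The channel widths and activation below are illustrative choices, not the paper's exact blocks.

```python
import torch
import torch.nn as nn

class SeparableVolumeFilter(nn.Module):
    # filters a multi-channel cost volume [B, C, U, H, W], where U indexes
    # the matching hypotheses (disparities / flow offsets)
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(c_in, c_out, (1, k, k), padding=(0, k // 2, k // 2))
        self.matchwise = nn.Conv3d(c_out, c_out, (k, 1, 1), padding=(k // 2, 0, 0))

    def forward(self, cost):
        return self.matchwise(torch.relu(self.spatial(cost)))

vol = torch.randn(1, 4, 16, 32, 32)              # toy cost volume
print(SeparableVolumeFilter(4, 8)(vol).shape)    # torch.Size([1, 8, 16, 32, 32])
```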