January 24, 2020

2982 words 14 mins read

Paper Group NANR 265

Training Hybrid Language Models by Marginalizing over Segmentations. Neural Networks with Cheap Differential Operators. Efficient Dictionary Learning with Gradient Descent. Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation. Transformer-XL: Language Modeling with Longer-Term Dependency. A Refined …

Training Hybrid Language Models by Marginalizing over Segmentations

Title Training Hybrid Language Models by Marginalizing over Segmentations
Authors Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin
Abstract In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character n-grams or words. Using such models, multiple potential segmentations usually exist for a given string, for example, one using words and one using characters only. Thus, the probability of a string is the sum of the probabilities of all the possible segmentations. Here, we show how it is possible to marginalize over the segmentations efficiently, in order to compute the true probability of a sequence. We apply our technique on three datasets, comprising seven languages, showing improvements over a strong character-level language model.
Tasks Language Modelling
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1143/
PDF https://www.aclweb.org/anthology/P19-1143
PWC https://paperswithcode.com/paper/training-hybrid-language-models-by
Repo
Framework
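
The sum over segmentations in this abstract is computable with a forward dynamic program over segment boundaries. Below is a minimal Python sketch under assumptions: seg_log_prob is a hypothetical stand-in for the hybrid model's segment scorer, and the toy segment vocabulary is invented for illustration.

```python
import numpy as np

def marginal_log_prob(s, seg_log_prob, max_len=5):
    # alpha[j] = log p(s[:j]), marginalized over all segmentations whose
    # segments have length <= max_len.
    n = len(s)
    alpha = np.full(n + 1, -np.inf)
    alpha[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            alpha[j] = np.logaddexp(alpha[j], alpha[i] + seg_log_prob(s[:i], s[i:j]))
    return alpha[n]

# Toy scorer: uniform over a tiny made-up segment vocabulary.
vocab = {"th", "e", " ", "cat", "c", "a", "t", "h"}
score = lambda prefix, seg: np.log(1.0 / len(vocab)) if seg in vocab else -np.inf
print(marginal_log_prob("the cat", score))
```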

Neural Networks with Cheap Differential Operators

Title Neural Networks with Cheap Differential Operators
Authors Tian Qi Chen, David K. Duvenaud
Abstract Gradients of neural networks can be computed efficiently for any architecture, but some applications require computing differential operators with higher time complexity. We describe a family of neural network architectures that allow easy access to a family of differential operators involving dimension-wise derivatives, and we show how to modify the backward computation graph to compute them efficiently. We demonstrate the use of these operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9187-neural-networks-with-cheap-differential-operators
PDF http://papers.nips.cc/paper/9187-neural-networks-with-cheap-differential-operators.pdf
PWC https://paperswithcode.com/paper/neural-networks-with-cheap-differential
Repo
Framework
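
For reference, the quantity the paper makes cheap is the vector of dimension-wise derivatives df_i/dx_i, i.e., the Jacobian diagonal. The naive autograd baseline below (a PyTorch sketch with an invented toy vector field) needs one backward pass per dimension, which is exactly the cost the paper's architecture avoids.

```python
import torch

def diag_jacobian_naive(f, x):
    # Dimension-wise derivatives df_i/dx_i, computed the expensive way:
    # one backward pass per output dimension.
    x = x.clone().requires_grad_(True)
    y = f(x)
    return torch.stack([
        torch.autograd.grad(y[i], x, retain_graph=True)[0][i]
        for i in range(x.numel())
    ])

f = lambda x: torch.tanh(x + x.flip(0))   # toy vector field, purely illustrative
print(diag_jacobian_naive(f, torch.randn(3)))
```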

Efficient Dictionary Learning with Gradient Descent

Title Efficient Dictionary Learning with Gradient Descent
Authors Dar Gilboa, Sam Buchanan, John Wright
Abstract Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems, however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem, complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.
Tasks Dictionary Learning
Published 2019-05-01
URL https://openreview.net/forum?id=HyxlHsActm
PDF https://openreview.net/pdf?id=HyxlHsActm
PWC https://paperswithcode.com/paper/efficient-dictionary-learning-with-gradient
Repo
Framework
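
A rough sense of the setting: observe Y = AX with A an orthogonal dictionary and X sparse, then recover a column of A by first-order ascent over the unit sphere. The NumPy sketch below uses the l4 surrogate, a common formulation for this problem; the paper analyzes a related objective, so treat both the objective and the step size as illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5000
A = np.linalg.qr(rng.standard_normal((n, n)))[0]               # orthogonal dictionary
X = rng.standard_normal((n, m)) * (rng.random((n, m)) < 0.1)   # sparse coefficients
Y = A @ X

# Riemannian gradient ascent on the sphere for one dictionary column.
q = rng.standard_normal(n)
q /= np.linalg.norm(q)
for _ in range(300):
    g = Y @ (q @ Y) ** 3 / m    # Euclidean gradient (up to a constant) of ||Y^T q||_4^4
    g -= (g @ q) * q            # project onto the tangent space of the sphere
    q += 3.0 * g                # ascend
    q /= np.linalg.norm(q)      # retract back to the sphere

print(np.round(np.abs(A.T @ q), 2))   # one entry near 1: q aligned with a column of A
```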

Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation

Title Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation
Authors Po-Yi Chen, Alexander H. Liu, Yen-Cheng Liu, Yu-Chiang Frank Wang
Abstract Monocular depth estimation is a challenging task in scene understanding, with the goal of acquiring the geometric properties of 3D space from 2D images. Due to the lack of RGB-depth image pairs, unsupervised learning methods aim at deriving depth information with alternative supervision such as stereo pairs. However, most existing works fail to model the geometric structure of objects, which generally results from considering pixel-level objective functions during training. In this paper, we propose SceneNet to overcome this limitation with the aid of semantic understanding from segmentation. Moreover, our proposed model is able to perform region-aware depth estimation by enforcing semantics consistency between stereo pairs. In our experiments, we qualitatively and quantitatively verify the effectiveness and robustness of our model, which produces favorable results compared to state-of-the-art approaches.
Tasks Depth Estimation, Monocular Depth Estimation, Scene Understanding
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Towards_Scene_Understanding_Unsupervised_Monocular_Depth_Estimation_With_Semantic-Aware_Representation_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Towards_Scene_Understanding_Unsupervised_Monocular_Depth_Estimation_With_Semantic-Aware_Representation_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/towards-scene-understanding-unsupervised
Repo
Framework
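
The alternative supervision mentioned above is typically a photometric reconstruction loss between stereo views. A minimal PyTorch sketch, with an invented disparity tensor standing in for the network output (SceneNet's semantics-consistency term is not shown):

```python
import torch
import torch.nn.functional as F

def photometric_loss(left, right, disp):
    # Warp the right image into the left view with the predicted disparity,
    # then penalize reconstruction error: the standard stereo-pair
    # supervision signal for unsupervised monocular depth.
    b, _, h, w = left.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    xs = xs.unsqueeze(0) - 2.0 * disp.squeeze(1) / (w - 1)  # disparity in pixels
    grid = torch.stack([xs, ys.unsqueeze(0).expand_as(xs)], dim=-1)
    warped = F.grid_sample(right, grid, align_corners=True)
    return (warped - left).abs().mean()

left = torch.rand(2, 3, 64, 128)
right = torch.rand(2, 3, 64, 128)
disp = torch.rand(2, 1, 64, 128) * 5   # hypothetical network output
print(photometric_loss(left, right, disp))
```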

Transformer-XL: Language Modeling with Longer-Term Dependency

Title Transformer-XL: Language Modeling with Longer-Term Dependency
Authors Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Abstract We propose a novel neural architecture, Transformer-XL, for modeling longer-term dependency. To address the limitation of fixed-length contexts, we introduce a notion of recurrence by reusing the representations from the history. Empirically, we show state-of-the-art (SoTA) results on both word-level and character-level language modeling datasets, including WikiText-103, One Billion Word, Penn Treebank, and enwik8. Notably, we improve the SoTA results from 1.06 to 0.99 in bpc on enwik8, from 33.0 to 18.9 in perplexity on WikiText-103, and from 28.0 to 23.5 in perplexity on One Billion Word. Performance improves when the attention length increases during evaluation, and our best model attends to up to 1,600 words and 3,800 characters. To quantify the effective length of dependency, we devise a new metric and show that on WikiText-103 Transformer-XL manages to model dependency that is about 80% longer than recurrent networks and 450% longer than Transformer. Moreover, Transformer-XL is up to 1,800+ times faster than vanilla Transformer during evaluation.
Tasks Language Modelling
Published 2019-05-01
URL https://openreview.net/forum?id=HJePno0cYm
PDF https://openreview.net/pdf?id=HJePno0cYm
PWC https://paperswithcode.com/paper/transformer-xl-language-modeling-with-longer
Repo
Framework
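
The recurrence mechanism can be sketched compactly: cache the previous segment's hidden states and let the current segment attend over both, with gradients stopped through the cache. The PyTorch sketch below is a simplification under assumptions; in particular, Transformer-XL's relative positional encodings are omitted.

```python
import torch
import torch.nn as nn

class RecurrentSegmentAttention(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x, mem):
        m, l = mem.size(1), x.size(1)
        ctx = torch.cat([mem.detach(), x], dim=1)        # reuse cached states, no gradient
        mask = torch.triu(torch.ones(l, m + l, dtype=torch.bool), diagonal=m + 1)
        out, _ = self.attn(x, ctx, ctx, attn_mask=mask)  # causal over extended context
        return out, x.detach()                           # current states become next memory

layer = RecurrentSegmentAttention()
mem = torch.zeros(1, 16, 64)
for seg in torch.randn(4, 1, 16, 64):   # a stream of 4 segments
    out, mem = layer(seg, mem)
print(out.shape)                        # torch.Size([1, 16, 64])
```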

A Refined Margin Distribution Analysis for Forest Representation Learning

Title A Refined Margin Distribution Analysis for Forest Representation Learning
Authors Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou
Abstract In this paper, we formulate the forest representation learning approach called CasDF as an additive model which boosts the augmented feature instead of the prediction. We substantially improve the upper bound of the generalization gap from $\mathcal{O}(\sqrt{\ln m/m})$ to $\mathcal{O}(\ln m/m)$, provided the margin ratio (the ratio of the margin standard deviation to the margin mean) is sufficiently small. This tighter upper bound inspires us to optimize the ratio. Therefore, we design a margin distribution reweighting approach for deep forest to achieve a small margin ratio by boosting the augmented feature. Experiments confirm the correlation between the margin distribution and generalization performance. We remark that this study offers a novel understanding of CasDF from the perspective of margin theory and further guides layer-by-layer forest representation learning.
Tasks Representation Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/8791-a-refined-margin-distribution-analysis-for-forest-representation-learning
PDF http://papers.nips.cc/paper/8791-a-refined-margin-distribution-analysis-for-forest-representation-learning.pdf
PWC https://paperswithcode.com/paper/a-refined-margin-distribution-analysis-for
Repo
Framework
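
The quantity driving the bound is the margin ratio: the margin standard deviation divided by the margin mean. A small NumPy sketch with invented scores, using the usual multiclass margin (true-class score minus best other score):

```python
import numpy as np

def margin_ratio(scores, labels):
    # Margin = score of the true class minus the best competing class.
    idx = np.arange(len(labels))
    true = scores[idx, labels]
    others = scores.copy()
    others[idx, labels] = -np.inf
    margins = true - others.max(axis=1)
    return margins.std() / margins.mean()

scores = np.random.default_rng(0).random((100, 5))  # stand-in for forest outputs
labels = scores.argmax(axis=1)                      # toy labels => positive margins
print(margin_ratio(scores, labels))
```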

A Bayesian Optimization Framework for Neural Network Compression

Title A Bayesian Optimization Framework for Neural Network Compression
Authors Xingchen Ma, Amal Rannen Triki, Maxim Berman, Christos Sagonas, Jacques Cali, Matthew B. Blaschko
Abstract Neural network compression is an important step for deploying neural networks where speed is of high importance, or on devices with limited memory. It is necessary to tune compression parameters in order to achieve the desired trade-off between size and performance. This is often done by optimizing the loss on a validation set of data, which should be large enough to approximate the true risk and therefore yield sufficient generalization ability. However, using a full validation set can be computationally expensive. In this work, we develop a general Bayesian optimization framework for optimizing functions that are computed based on U-statistics. We propagate Gaussian uncertainties from the statistics through the Bayesian optimization framework yielding a method that gives a probabilistic approximation certificate of the result. We then apply this to parameter selection in neural network compression. Compression objectives that can be written as U-statistics are typically based on empirical risk and knowledge distillation for deep discriminative models. We demonstrate our method on VGG and ResNet models, and the resulting system can find optimal compression parameters for relatively high-dimensional parametrizations in a matter of minutes on a standard desktop machine, orders of magnitude faster than competing methods.
Tasks Neural Network Compression
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Ma_A_Bayesian_Optimization_Framework_for_Neural_Network_Compression_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Ma_A_Bayesian_Optimization_Framework_for_Neural_Network_Compression_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/a-bayesian-optimization-framework-for-neural
Repo
Framework
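
The observation model behind the framework is easy to illustrate: evaluate the compression objective on a subsample of the validation set and attach a Gaussian uncertainty to the estimate, which Bayesian optimization can then consume as a noisy observation. A hedged NumPy sketch with invented per-example losses (the paper's full machinery over U-statistics and the GP propagation are not shown):

```python
import numpy as np

def subsampled_risk(loss_fn, data, k, rng):
    # Mean validation loss from a subsample of size k, with a Gaussian
    # uncertainty estimate (standard error of the mean).
    idx = rng.choice(len(data), size=k, replace=False)
    losses = np.array([loss_fn(d) for d in data[idx]])
    return losses.mean(), losses.std(ddof=1) / np.sqrt(k)

rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.5, size=10_000)   # stand-in per-example losses
mean, se = subsampled_risk(lambda d: d, data, k=200, rng=rng)
print(f"estimated risk {mean:.3f} +/- {1.96 * se:.3f}")
```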

Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views

Title Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views
Authors Kun Huang, Yifu Wang, Laurent Kneip
Abstract The planar motion of ground vehicles is often non-holonomic, which enables a solution of the two-view relative pose problem from a single point feature correspondence. Man-made environments such as underground parking lots are however dominated by line features. Inspired by the planar tri-focal tensor and its ability to handle lines, we establish an n-linear constraint on the locally circular motion of non-holonomic vehicles able to handle an arbitrarily large and dense window of views. We prove that this remains a univariate problem under the assumption of locally constant vehicle speed, and that it can transparently handle both point and vertical line correspondences. In particular, we prove that an application of Viète's formulas for extrapolating trigonometric functions of angle multiples, together with the Weierstrass substitution, casts the problem as one that merely seeks the roots of a univariate polynomial. We present the complete theory of this novel solver, and test it on both simulated and real data. Our results prove that it successfully handles a variety of relevant scenarios, eventually outperforming the 1-point two-view solver.
Tasks Motion Estimation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Huang_Motion_Estimation_of_Non-Holonomic_Ground_Vehicles_From_a_Single_Feature_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Motion_Estimation_of_Non-Holonomic_Ground_Vehicles_From_a_Single_Feature_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/motion-estimation-of-non-holonomic-ground
Repo
Framework
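
The algebraic core is easy to demonstrate in isolation: the Weierstrass substitution t = tan(theta/2) turns a trigonometric constraint into a polynomial in t whose roots can be found numerically. A small NumPy sketch for a single constraint of the form a*cos(theta) + b*sin(theta) + c = 0 (the paper's actual polynomial aggregates many views):

```python
import numpy as np

def solve_trig(a, b, c):
    # With t = tan(theta/2): cos = (1-t^2)/(1+t^2), sin = 2t/(1+t^2),
    # so the constraint becomes (c-a)t^2 + 2bt + (a+c) = 0.
    roots = np.roots([c - a, 2 * b, a + c])
    return [2 * np.arctan(t.real) for t in roots if abs(t.imag) < 1e-9]

for theta in solve_trig(1.0, -0.5, 0.3):
    print(theta, np.cos(theta) - 0.5 * np.sin(theta) + 0.3)  # residual ~ 0
```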

Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses

Title Unsupervised Information Extraction: Regularizing Discriminative Approaches with Relation Distribution Losses
Authors Étienne Simon, Vincent Guigue, Benjamin Piwowarski
Abstract Unsupervised relation extraction aims at extracting relations between entities in text. Previous unsupervised approaches are either generative or discriminative. In a supervised setting, discriminative approaches, such as deep neural network classifiers, have demonstrated substantial improvement. However, these models are hard to train without supervision, and the currently proposed solutions are unstable. To overcome this limitation, we introduce a skewness loss, which encourages the classifier to predict a relation with confidence given a sentence, and a distribution distance loss, enforcing that all relations are predicted on average. These losses improve the performance of discriminative models, and enable us to train deep neural networks satisfactorily, surpassing the current state of the art on three different datasets.
Tasks Relation Extraction
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1133/
PDF https://www.aclweb.org/anthology/P19-1133
PWC https://paperswithcode.com/paper/unsupervised-information-extraction
Repo
Framework
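
The two regularizers lend themselves to a short sketch. Below is a plausible PyTorch rendering under assumptions: the functional forms (entropy for the skewness loss, a KL divergence to the uniform distribution for the distribution-distance loss) are stand-ins consistent with the abstract, not the paper's exact equations.

```python
import torch
import torch.nn.functional as F

def relation_losses(logits):
    p = logits.softmax(dim=-1)
    # Skewness loss: low entropy => confident per-sentence prediction.
    skew = -(p * p.clamp_min(1e-9).log()).sum(-1).mean()
    # Distribution-distance loss: batch-average prediction stays near uniform,
    # so all relations get predicted on average.
    mean_p = p.mean(0)
    uniform = torch.full_like(mean_p, 1.0 / p.size(-1))
    dist = F.kl_div(mean_p.clamp_min(1e-9).log(), uniform, reduction="sum")
    return skew, dist

logits = torch.randn(32, 10, requires_grad=True)  # batch of 32, 10 relations
skew, dist = relation_losses(logits)
(skew + dist).backward()
print(skew.item(), dist.item())
```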

Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence

Title Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence
Authors Siyi Liu, Lei Guo, Kate Mays, Margrit Betke, Derry Tanti Wijaya
Abstract Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as "frames", which, when used in news media, will influence the opinion of their readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation is based on a new dataset of news headlines related to the issue of gun violence in the United States. This Gun Violence Frame Corpus (GVFC) was curated and annotated by journalism and communication experts. Our proposed approach sets a new state-of-the-art performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large-scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/K19-1047/
PDF https://www.aclweb.org/anthology/K19-1047
PWC https://paperswithcode.com/paper/detecting-frames-in-news-headlines-and-its
Repo
Framework

Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation

Title Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation
Authors Rocco Tripodi, Roberto Navigli
Abstract Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have been shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non-cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph, and the payoff as an embedding similarity function, which the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-the-art performance, and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations.
Tasks Word Sense Disambiguation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1009/
PDF https://www.aclweb.org/anthology/D19-1009
PWC https://paperswithcode.com/paper/game-theory-meets-embeddings-a-unified
Repo
Framework
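
Games of this kind are typically run with replicator dynamics: each word holds a mixed strategy over its senses and repeatedly reweights it by the expected payoff against its neighbours. A NumPy sketch with invented similarity payoffs, assuming for simplicity that every word has the same number of candidate senses:

```python
import numpy as np

def replicator_wsd(W, Z, iters=200):
    # W[i, j]: graph weight between words i and j.
    # Z[i, j]: payoff matrix between their senses (embedding similarities).
    # X[i]: word i's mixed strategy over its k senses.
    n, k = Z.shape[0], Z.shape[2]
    X = np.full((n, k), 1.0 / k)
    for _ in range(iters):
        u = np.einsum("ij,ijkl,jl->ik", W, Z, X)  # expected payoff per sense
        X = X * u                                 # replicator update
        X /= X.sum(axis=1, keepdims=True)
    return X.argmax(axis=1)                       # chosen sense per word

rng = np.random.default_rng(0)
n, k = 5, 3
W = rng.random((n, n)); np.fill_diagonal(W, 0)
Z = rng.random((n, n, k, k))   # stand-in sense-similarity payoffs
print(replicator_wsd(W, Z))
```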

USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing

Title USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing
Authors Santanu Pal, Hongfei Xu, Nico Herbig, Antonio Krüger, Josef van Genabith
Abstract In this paper we present an English–German Automatic Post-Editing (APE) system called transference, submitted to the APE Task organized at WMT 2019. Our transference model is based on a multi-encoder transformer architecture. Unlike previous approaches, it (i) uses a transformer encoder block for src, (ii) followed by a transformer decoder block, but without masking, for self-attention on mt, which effectively acts as a second encoder combining src → mt, and (iii) feeds this representation into a final decoder block generating pe. Our model improves over the raw black-box neural machine translation system by 0.9 and 1.0 absolute BLEU points on the WMT 2019 APE development and test sets. Our submission ranked 3rd; however, compared to the two top systems, the performance differences are not statistically significant.
Tasks Automatic Post-Editing, Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5414/
PDF https://www.aclweb.org/anthology/W19-5414
PWC https://paperswithcode.com/paper/usaar-dfki-the-transference-architecture-for
Repo
Framework
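
The three-block wiring in (i)-(iii) maps naturally onto stock transformer modules. A hedged PyTorch sketch, with illustrative sizes and with positional encodings and padding masks omitted:

```python
import torch
import torch.nn as nn

class Transference(nn.Module):
    def __init__(self, vocab=32000, d=512, h=8, layers=6):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)  # positional encodings omitted
        self.enc_src = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, h, batch_first=True), layers)
        self.enc_mt = nn.TransformerDecoder(   # decoder block used as second encoder
            nn.TransformerDecoderLayer(d, h, batch_first=True), layers)
        self.dec_pe = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, h, batch_first=True), layers)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, mt, pe_in):
        h_src = self.enc_src(self.emb(src))
        h_mt = self.enc_mt(self.emb(mt), h_src)   # mt self-attends without a mask
        t = pe_in.size(1)                         # and cross-attends to src
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h_pe = self.dec_pe(self.emb(pe_in), h_mt, tgt_mask=causal)
        return self.out(h_pe)

model = Transference()
logits = model(torch.randint(0, 32000, (2, 7)),   # src
               torch.randint(0, 32000, (2, 9)),   # mt
               torch.randint(0, 32000, (2, 8)))   # pe, shifted right
print(logits.shape)                               # torch.Size([2, 8, 32000])
```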

APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task

Title APE through Neural and Statistical MT with Augmented Data. ADAPT/DCU Submission to the WMT 2019 APE Shared Task
Authors Dimitar Shterionov, Joachim Wagner, Félix do Carmo
Abstract Automatic post-editing (APE) can be reduced to a machine translation (MT) task, where the source is the output of a specific MT system and the target is its post-edited variant. However, this approach does not consider context information that can be found in the original source of the MT system. Thus a better approach is to employ multi-source MT, where two input sequences are considered: the original source and the MT output. Extra context information can be introduced in the form of extra tokens that identify a certain global property of a group of segments, added as a prefix or a suffix to each segment. Successfully applied in domain adaptation of MT as well as in APE, this technique deserves further attention. In this work we investigate multi-source neural APE (or NPE) systems with training data which has been augmented with two types of extra context tokens. We experiment with authentic and synthetic data provided by WMT 2019 and submit our results to the APE shared task. We also experiment with using statistical machine translation (SMT) methods for APE. While our systems score below the baseline, we consider this work a step towards understanding the added value of extra context in the case of APE.
Tasks Automatic Post-Editing, Domain Adaptation, Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5415/
PDF https://www.aclweb.org/anthology/W19-5415
PWC https://paperswithcode.com/paper/ape-through-neural-and-statistical-mt-with
Repo
Framework
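
The extra-token trick is trivial to reproduce in code. A tiny sketch with made-up token names (the paper's two token types encode global properties of segment groups):

```python
def add_context_tokens(segments, domain, quality_bin):
    # Prepend one context token and append another to each segment,
    # mirroring the prefix/suffix augmentation described above.
    # Token names here are invented for illustration.
    return [f"<dom:{domain}> {seg} <q:{quality_bin}>" for seg in segments]

print(add_context_tokens(["das ist gut .", "ein Test ."], "it", "high"))
```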

Effort-Aware Neural Automatic Post-Editing

Title Effort-Aware Neural Automatic Post-Editing
Authors Amirhossein Tebbifakhr, Matteo Negri, Marco Turchi
Abstract For this round of the WMT 2019 APE shared task, our submission focuses on addressing the "over-correction" problem in APE. Over-correction occurs when the APE system tends to rephrase an already correct MT output, and the resulting sentence is penalized by a reference-based evaluation against human post-edits. Our intuition is that this problem can be prevented by informing the system about the predicted quality of the MT output or, in other terms, the expected amount of needed corrections. For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing. Following the best submissions to the WMT 2018 APE shared task, our backbone architecture is based on multi-source Transformer to encode both the MT output and the corresponding source text. We participated both in the English-German and English-Russian subtasks. In the first subtask, our best submission improved the original MT output quality up to +0.98 BLEU and -0.47 TER. In the second subtask, where the higher quality of the MT output increases the risk of over-correction, none of our submitted runs was able to improve the MT output.
Tasks Automatic Post-Editing
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5416/
PDF https://www.aclweb.org/anthology/W19-5416
PWC https://paperswithcode.com/paper/effort-aware-neural-automatic-post-editing
Repo
Framework
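
The quality-token idea amounts to binning a predicted effort score into a small vocabulary of special tokens prepended to both src and mt. A tiny illustrative sketch; bin boundaries and token names are invented:

```python
def effort_token(predicted_ter, bins=(0.1, 0.3, 0.6)):
    # Map a predicted TER score to a discrete effort token telling the
    # APE model how much post-editing is expected.
    for i, b in enumerate(bins):
        if predicted_ter < b:
            return f"<effort_{i}>"
    return f"<effort_{len(bins)}>"

src, mt, ter = "the cat sat .", "die Katze sass .", 0.22
tok = effort_token(ter)
print(f"{tok} {src}", "|||", f"{tok} {mt}")
```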

Volumetric Correspondence Networks for Optical Flow

Title Volumetric Correspondence Networks for Optical Flow
Authors Gengshan Yang, Deva Ramanan
Abstract Many classic tasks in vision – such as the estimation of optical flow or stereo disparities – can be cast as dense correspondence matching. Well-known techniques for doing so make use of a cost volume, typically a 4D tensor of match costs between all pixels in a 2D image and their potential matches in a 2D search window. State-of-the-art (SOTA) deep networks for flow/stereo make use of such volumetric representations as internal layers. However, such layers require significant amounts of memory and compute, making them cumbersome to use in practice. As a result, SOTA networks also employ various heuristics designed to limit volumetric processing, leading to limited accuracy and overfitting. Instead, we introduce several simple modifications that dramatically simplify the use of volumetric layers: (1) volumetric encoder-decoder architectures that efficiently capture large receptive fields, (2) multi-channel cost volumes that capture multi-dimensional notions of pixel similarities, and finally, (3) separable volumetric filtering that significantly reduces computation and parameters while preserving accuracy. Our innovations dramatically improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 10X fewer iterations, and most importantly, our networks generalize across correspondence tasks. On-the-fly adaptation of search windows allows us to repurpose optical flow networks for stereo (and vice versa), and can also be used to implement adaptive networks that increase search window sizes on demand.
Tasks Optical Flow Estimation
Published 2019-12-01
URL http://papers.nips.cc/paper/8367-volumetric-correspondence-networks-for-optical-flow
PDF http://papers.nips.cc/paper/8367-volumetric-correspondence-networks-for-optical-flow.pdf
PWC https://paperswithcode.com/paper/volumetric-correspondence-networks-for
Repo
Framework
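
Modification (2) is easy to show in isolation: instead of collapsing feature comparisons to a single matching cost per candidate, keep the per-channel products, yielding a multi-channel cost volume. A PyTorch sketch for a 1-D (stereo-style) search window, with simplified border handling:

```python
import torch
import torch.nn.functional as F

def multichannel_cost_volume(feat1, feat2, max_disp=4):
    # For each disparity d, compare feat1 with feat2 shifted right by d.
    # Keeping the elementwise product per channel gives multi-channel costs;
    # a dot product would collapse them to a single channel.
    b, c, hgt, wid = feat1.shape
    vols = []
    for d in range(max_disp + 1):
        shifted = F.pad(feat2, (d, 0))[:, :, :, :wid]  # zero-pad then crop
        vols.append(feat1 * shifted)                   # [b, c, h, w] per disparity
    return torch.stack(vols, dim=2)                    # [b, c, disp, h, w]

f1, f2 = torch.randn(1, 16, 32, 64), torch.randn(1, 16, 32, 64)
print(multichannel_cost_volume(f1, f2).shape)          # torch.Size([1, 16, 5, 32, 64])
```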