Paper Group NANR 111
Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing
Title | Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing |
Authors | Mohab Elkaref, Bernd Bohnet |
Abstract | |
Tasks | Dependency Parsing, Transition-Based Dependency Parsing |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-8012/ |
https://www.aclweb.org/anthology/W19-8012 | |
PWC | https://paperswithcode.com/paper/recursive-lstm-tree-representation-for-arc |
Repo | |
Framework | |
Predicting Cognitive Effort in Translation Production
Title | Predicting Cognitive Effort in Translation Production |
Authors | Yuxiang Wei |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7008/ |
https://www.aclweb.org/anthology/W19-7008 | |
PWC | https://paperswithcode.com/paper/predicting-cognitive-effort-in-translation |
Repo | |
Framework | |
Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation
Title | Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation |
Authors | Xin Tan, Longyin Zhang, Deyi Xiong, Guodong Zhou |
Abstract | Document-level machine translation (MT) remains challenging due to the difficulty in efficiently using document context for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed back the extracted global document context to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy to take advantage of a large-scale corpus with out-of-domain parallel sentence pairs and a small-scale corpus with in-domain parallel document pairs to achieve domain adaptability. Experimental results on several benchmark corpora show that our proposed model can significantly improve document-level translation performance over several strong NMT baselines. |
Tasks | Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1168/ |
https://www.aclweb.org/anthology/D19-1168 | |
PWC | https://paperswithcode.com/paper/hierarchical-modeling-of-global-context-for |
Repo | |
Framework | |
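A minimal sketch of the abstract's top-down feedback idea: pool a document vector from the per-sentence encodings and broadcast it back to every sentence representation. The mean-pooling document encoder and the function name below are illustrative stand-ins, not the paper's architecture or API.

```python
import numpy as np

def add_global_context(sent_embs):
    # Document encoder stand-in: pool a global context vector over sentences.
    doc_ctx = sent_embs.mean(axis=0)
    # Top-down feedback: augment every sentence representation with the
    # global document context before translation.
    return np.concatenate(
        [sent_embs, np.tile(doc_ctx, (len(sent_embs), 1))], axis=1)
```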
Globally Soft Filter Pruning For Efficient Convolutional Neural Networks
Title | Globally Soft Filter Pruning For Efficient Convolutional Neural Networks |
Authors | Ke Xu, Xiaoyun Wang, Qun Jia, Jianjing An, Dong Wang |
Abstract | This paper proposes a cumulative-saliency-based Globally Soft Filter Pruning (GSFP) scheme to prune redundant filters of Convolutional Neural Networks (CNNs). Specifically, GSFP adopts a robust pruning method, which measures the global redundancy of each filter in the whole model using a soft pruning strategy. In addition, in the model recovery process after pruning, we use the cumulative saliency strategy to improve the accuracy of pruning. GSFP has two advantages over previous works: (1) More accurate pruning guidance. For a pre-trained CNN model, the saliency of a filter varies with different input data. Therefore, accumulating the saliency of the filter over the entire data set can provide more accurate guidance for pruning. On the other hand, pruning from a global perspective is more accurate than local pruning. (2) More robust pruning strategy. We propose a reasonable normalization formula to prevent certain layers of filters in the network from being completely clipped due to an excessive pruning rate. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1fevoAcKX |
https://openreview.net/pdf?id=H1fevoAcKX | |
PWC | https://paperswithcode.com/paper/globally-soft-filter-pruning-for-efficient |
Repo | |
Framework | |
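The cumulative-saliency, globally soft pruning loop from the abstract might look roughly like the following sketch. The L1 saliency measure, the per-layer normalization, and the `gsfp_step` interface are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gsfp_step(filters, saliency, prune_frac=0.3):
    """One Globally Soft Filter Pruning step (illustrative sketch).

    filters:  dict layer_name -> weights of shape (n_filters, ...)
    saliency: dict layer_name -> running (cumulative) saliency per filter
    """
    # 1. Per-batch saliency: L1 norm of each filter, normalized by the
    #    layer total so no single layer can be clipped entirely (the
    #    robustness point made in the abstract).
    for name, w in filters.items():
        l1 = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
        saliency[name] += l1 / (l1.sum() + 1e-12)

    # 2. Rank all filters globally by cumulative saliency.
    flat = [(s, name, i)
            for name, sal in saliency.items()
            for i, s in enumerate(sal)]
    flat.sort()
    n_prune = int(prune_frac * len(flat))

    # 3. "Soft" pruning: zero the weights but keep the filters trainable,
    #    so they may recover in later epochs.
    for _, name, i in flat[:n_prune]:
        filters[name][i] = 0.0
    return filters, saliency
```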
Understand the dynamics of GANs via Primal-Dual Optimization
Title | Understand the dynamics of GANs via Primal-Dual Optimization |
Authors | Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong |
Abstract | Generative adversarial networks (GANs) are among the best-known unsupervised learning techniques today, owing to their superior ability to learn data distributions. In spite of their great success in applications, GANs are known to be notoriously hard to train. The tremendous amount of time it takes to run the training algorithm and its sensitivity to hyper-parameter tuning have been haunting researchers in this area. To resolve these issues, we need to first understand how GANs work. Herein, we take a step in this direction by examining the dynamics of GANs. We relate a large class of GANs, including the Wasserstein GAN, to max-min optimization problems with a coupling term that is linear over the discriminator. By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm for training GANs in fact converges to a stationary solution at a sublinear rate. The same framework also applies to multi-task learning and distributionally robust learning problems. We verify our analysis on numerical examples with both synthetic and real data sets. We hope our analysis sheds light on future studies of the theoretical properties of relevant machine learning problems. |
Tasks | Multi-Task Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rylIy3R9K7 |
https://openreview.net/pdf?id=rylIy3R9K7 | |
PWC | https://paperswithcode.com/paper/understand-the-dynamics-of-gans-via-primal |
Repo | |
Framework | |
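The max-min structure with a coupling term linear in the discriminator can be illustrated on a one-dimensional toy problem. The fixed stepsize and the quadratic regularizer below are assumptions for the sketch, not the paper's algorithm or theory.

```python
def gda(theta0, w0, lr=0.1, steps=500):
    # Toy max-min problem with the structure studied in the paper
    # (coupling linear in the discriminator w):
    #   min_theta max_w  w * (theta - 1) - 0.5 * w**2
    # The -0.5*w**2 term keeps the inner maximization concave.
    theta, w = theta0, w0
    for _ in range(steps):
        g_theta = w                   # gradient of the coupling w.r.t. theta
        g_w = (theta - 1.0) - w       # gradient w.r.t. the discriminator w
        theta -= lr * g_theta         # primal descent step
        w += lr * g_w                 # dual ascent step
    return theta, w
```

With this stepsize the iterates spiral into the stationary point (theta, w) = (1, 0).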
Optimal margin Distribution Network
Title | Optimal margin Distribution Network |
Authors | Shen-Huan Lv, Lu Wang, Zhi-Hua Zhou |
Abstract | Recent research on margin theory has proved that maximizing the minimum margin, as support vector machines do, does not necessarily lead to better performance; instead, it is crucial to optimize the margin distribution. In the meantime, margin theory has been used to explain the empirical success of deep networks in recent studies. In this paper, we present ODN (the Optimal margin Distribution Network), a network which embeds a loss function based on the optimal margin distribution. We give a theoretical analysis for our method using the PAC-Bayesian framework, which confirms the significance of the margin distribution for classification within the framework of deep networks. In addition, empirical results show that the ODN model consistently outperforms the baseline cross-entropy loss model across different regularization situations. Our ODN model also outperforms the cross-entropy (Xent), hinge, and soft hinge loss models in generalization tasks with limited training data. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HygcvsAcFX |
https://openreview.net/pdf?id=HygcvsAcFX | |
PWC | https://paperswithcode.com/paper/optimal-margin-distribution-network-1 |
Repo | |
Framework | |
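A margin-distribution-style loss of the kind the abstract refers to can be sketched as below. The band form [r − θ, r + θ] with a quadratic penalty is one common formulation of optimal-margin-distribution losses; the exact loss used by ODN may differ, so treat this as illustrative.

```python
import numpy as np

def od_loss(margins, r=1.0, theta=0.5):
    # Margins inside the band [r - theta, r + theta] incur no loss;
    # deviations on either side are penalized quadratically, pushing the
    # whole margin *distribution* toward r rather than just maximizing
    # the minimum margin.
    lo = np.maximum(0.0, (r - theta) - margins)   # margin too small
    hi = np.maximum(0.0, margins - (r + theta))   # margin too large
    return np.mean(lo ** 2 + hi ** 2)
```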
Modeling Frames in Argumentation
Title | Modeling Frames in Argumentation |
Authors | Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, Benno Stein |
Abstract | In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economic aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the "right" frame(s) to convince the audience to adopt the author's stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience's cultural background and interests. This paper introduces frame identification, which is the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12,326 debate-portal arguments, organized along the frames of the debates' topics. On this corpus, our approach outperforms different strong baselines, achieving an F1-score of 0.28. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1290/ |
https://www.aclweb.org/anthology/D19-1290 | |
PWC | https://paperswithcode.com/paper/modeling-frames-in-argumentation |
Repo | |
Framework | |
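The two-step pipeline (remove topical information, then cluster) can be caricatured in a few lines. Grouping arguments by their dominant leftover word below stands in for the real clustering step, and all names are illustrative, not the authors' system.

```python
from collections import Counter

def identify_frames(arguments, topics):
    # Step 1: strip topic-specific words from each argument.
    topic_words = {w for t in topics for w in t.lower().split()}
    frames = {}
    for arg in arguments:
        content = [w for w in arg.lower().split() if w not in topic_words]
        # Step 2 (stand-in for clustering): group by the most frequent
        # remaining word, a crude proxy for the emphasized aspect.
        key = Counter(content).most_common(1)[0][0] if content else "misc"
        frames.setdefault(key, []).append(arg)
    return frames
```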
Study of lexical aspect in the French medical language. Development of a lexical resource
Title | Study of lexical aspect in the French medical language. Development of a lexical resource |
Authors | Agathe Pierson, Cédrick Fairon |
Abstract | This paper details the development of a linguistic resource designed to improve temporal information extraction systems and to integrate aspectual values. After a brief review of recent works in temporal information extraction for the medical area, we discuss the linguistic notion of aspect and how it got a place in the NLP field. Then, we present our clinical data and describe the five-step approach adopted in this study. Finally, we present the linguistic resource itself and explain how we elaborated it and which properties were selected for the creation of the tables. |
Tasks | Temporal Information Extraction |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-1907/ |
https://www.aclweb.org/anthology/W19-1907 | |
PWC | https://paperswithcode.com/paper/study-of-lexical-aspect-in-the-french-medical |
Repo | |
Framework | |
ELAN as a search engine for hierarchically structured, tagged corpora
Title | ELAN as a search engine for hierarchically structured, tagged corpora |
Authors | Joshua Wilbur |
Abstract | |
Tasks | |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0308/ |
https://www.aclweb.org/anthology/W19-0308 | |
PWC | https://paperswithcode.com/paper/elan-as-a-search-engine-for-hierarchically |
Repo | |
Framework | |
Learning Abstract Models for Long-Horizon Exploration
Title | Learning Abstract Models for Long-Horizon Exploration |
Authors | Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang |
Abstract | In high-dimensional reinforcement learning settings with sparse rewards, performing effective exploration to even obtain any reward signal is an open challenge. While model-based approaches hold promise of better exploration via planning, it is extremely difficult to learn a reliable enough Markov Decision Process (MDP) in high dimensions (e.g., over 10^100 states). In this paper, we propose learning an abstract MDP over a much smaller number of states (e.g., 10^5), which we can plan over for effective exploration. We assume we have an abstraction function that maps concrete states (e.g., raw pixels) to abstract states (e.g., agent position, ignoring other objects). In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP). Concurrently, we learn a worker policy to travel between abstract states; the worker deals with the messiness of concrete states and presents a clean abstraction to the manager. On three of the hardest games from the Arcade Learning Environment (Montezuma’s, Pitfall!, and Private Eye), our approach outperforms the previous state-of-the-art by over a factor of 2 in each game. In Pitfall!, our approach is the first to achieve superhuman performance without demonstrations. |
Tasks | Atari Games |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ryxLG2RcYX |
https://openreview.net/pdf?id=ryxLG2RcYX | |
PWC | https://paperswithcode.com/paper/learning-abstract-models-for-long-horizon |
Repo | |
Framework | |
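The manager/worker decomposition can be illustrated with a toy line-world: the abstraction buckets integer positions, the manager's set of known abstract states grows monotonically, and a random walk stands in for the learned worker policy. Everything below is an illustrative simplification, not the paper's system.

```python
import random

def explore(budget=2000, seed=0):
    rng = random.Random(seed)
    pos = 0                                  # concrete state: integer position
    abstraction = lambda s: s // 10          # concrete -> abstract (toy bucketing)
    known = {abstraction(pos)}               # manager's discovered abstract states
    for _ in range(budget):
        pos += rng.choice([-1, 1])           # stand-in worker policy
        known.add(abstraction(pos))          # set grows monotonically
    return known
```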
Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation
Title | Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation |
Authors | Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, Rui Yao |
Abstract | One-shot image segmentation aims to undertake the segmentation task of a novel class with only one training image available. The difficulty lies in that image segmentation has structured data representations, which yields a many-to-many message passing problem. Previous methods often simplify it to a one-to-many problem by squeezing support data to a global descriptor. However, a mixed global representation drops the data structure and information of individual elements. In this paper, we propose to model structured segmentation data with graphs and apply attentive graph reasoning to propagate label information from support data to query data. The graph attention mechanism could establish the element-to-element correspondence across structured data by learning attention weights between connected graph nodes. To capture correspondence at different semantic levels, we further propose a pyramid-like structure that models different sizes of image regions as graph nodes and undertakes graph reasoning at different levels. Experiments on PASCAL VOC 2012 dataset demonstrate that our proposed network significantly outperforms the baseline method and leads to new state-of-the-art performance on 1-shot and 5-shot segmentation benchmarks. |
Tasks | Semantic Segmentation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/pyramid-graph-networks-with-connection |
Repo | |
Framework | |
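The element-to-element attention the abstract contrasts with a squeezed global descriptor can be sketched as a softmax over pairwise affinities between query and support region features. The function, shapes, and dot-product affinity below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def attentive_label_propagation(support_feats, support_labels, query_feats):
    # Pairwise affinities between every query node and every support node,
    # so individual elements are matched instead of one global descriptor.
    logits = query_feats @ support_feats.T
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # per-query softmax attention
    return w @ support_labels                # attention-weighted label propagation
```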
Online Hyperparameter Adaptation via Amortized Proximal Optimization
Title | Online Hyperparameter Adaptation via Amortized Proximal Optimization |
Authors | Paul Vicol, Jeffery Z. HaoChen, Roger Grosse |
Abstract | Effective performance of neural networks depends critically on effective tuning of optimization hyperparameters, especially learning rates (and schedules thereof). We present Amortized Proximal Optimization (APO), which takes the perspective that each optimization step should approximately minimize a proximal objective (similar to the ones used to motivate natural gradient and trust region policy optimization). Optimization hyperparameters are adapted to best minimize the proximal objective after one weight update. We show that an idealized version of APO (where an oracle minimizes the proximal objective exactly) achieves global convergence to a stationary point and locally second-order convergence to a global optimum for neural networks. APO incurs minimal computational overhead. We experiment with using APO to adapt a variety of optimization hyperparameters online during training, including (possibly layer-specific) learning rates, damping coefficients, and gradient variance exponents. For a variety of network architectures and optimization algorithms (including SGD, RMSprop, and K-FAC), we show that with minimal tuning, APO performs competitively with carefully tuned optimizers. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJl6M2C5Y7 |
https://openreview.net/pdf?id=rJl6M2C5Y7 | |
PWC | https://paperswithcode.com/paper/online-hyperparameter-adaptation-via |
Repo | |
Framework | |
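The oracle version of the idea (pick the hyperparameter whose one-step update best minimizes a proximal objective) can be sketched as a search over candidate learning rates. The candidate grid and the proximal weight `lam` are illustrative assumptions; the paper amortizes this oracle rather than enumerating candidates.

```python
import numpy as np

def apo_tune_lr(f, grad, w, lrs, lam=0.1):
    # Oracle APO-style step: keep the learning rate whose one-step update
    # minimizes the proximal objective  f(w') + lam * ||w' - w||^2.
    g = grad(w)
    best_lr, best_val = None, np.inf
    for lr in lrs:
        w_new = w - lr * g
        val = f(w_new) + lam * np.sum((w_new - w) ** 2)
        if val < best_val:
            best_lr, best_val = lr, val
    return best_lr
```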
Multiple People Tracking using Body and Joint Detections
Title | Multiple People Tracking using Body and Joint Detections |
Authors | Roberto Henschel, Yunzhe Zou, Bodo Rosenhahn |
Abstract | Most multiple people tracking systems compute trajectories based on the tracking-by-detection paradigm. Consequently, the performance depends to a large extent on the quality of the employed input detections. However, despite enormous progress in recent years, partially occluded people are still often not recognized. Also, many correct detections are mistakenly discarded when non-maximum suppression is performed. Improving the tracking performance thus requires augmenting the coarse input. Well-suited for this task are fine-grained body joint detections, as they allow locating even strongly occluded persons. Thus, in this work, we analyze the suitability of including joint detections for multiple people tracking. We introduce different affinities between the two detection types and evaluate their performance. Tracking is then performed within a near-online framework based on a min-cost graph labeling formulation. As a result, our framework can recover heavily occluded persons and solve the data association efficiently. We evaluate our framework on the MOT16/17 benchmark. Experimental results demonstrate that our framework achieves state-of-the-art results. |
Tasks | Multiple People Tracking |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPRW_2019/html/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.html |
http://openaccess.thecvf.com/content_CVPRW_2019/papers/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/multiple-people-tracking-using-body-and-joint |
Repo | |
Framework | |
Optimization on Multiple Manifolds
Title | Optimization on Multiple Manifolds |
Authors | Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-ming Ma, Tie-yan Liu |
Abstract | Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints. Most previous works focus on the case with a single manifold. However, in practice it is quite common that the optimization problem involves more than one constraint (each constraint corresponding to one manifold). It is not clear in general how to optimize on multiple manifolds effectively and provably, especially when the intersection of multiple manifolds is not a manifold or cannot be easily calculated. We propose a unified algorithm framework to handle optimization on multiple manifolds. Specifically, we integrate information from multiple manifolds and move along an ensemble direction by viewing the information from each manifold as a drift and adding them together. We prove the convergence properties of the proposed algorithms. We also apply the algorithms to training neural networks with batch normalization layers and achieve preferable empirical results. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HJerDj05tQ |
https://openreview.net/pdf?id=HJerDj05tQ | |
PWC | https://paperswithcode.com/paper/optimization-on-multiple-manifolds |
Repo | |
Framework | |
Cutting Down Training Memory by Re-fowarding
Title | Cutting Down Training Memory by Re-fowarding |
Authors | Jianwei Feng, Dong Huang |
Abstract | Deep Neural Networks (DNNs) require huge GPU memory when training on modern image/video databases. Unfortunately, GPU memory as a hardware resource is always finite, which limits the image resolution, batch size, and learning rate that can be used for better DNN performance. In this paper, we propose a novel training approach, called Re-forwarding, that substantially reduces memory usage in training. Our approach automatically finds a subset of vertices in a DNN computation graph and stores tensors only at these vertices during the first forward pass. During the backward pass, extra local forwards (called the Re-forwarding process) are conducted to compute the missing tensors between the subset of vertices. The total memory cost becomes the sum of (1) the memory cost at the subset of vertices and (2) the maximum memory cost among local re-forwards. Re-forwarding trades training-time overhead for memory and does not compromise any performance in testing. We propose theories and algorithms that achieve the optimal memory solutions for DNNs with either linear or arbitrary computation graphs. Experiments show that Re-forwarding cuts up to 80% of training memory on popular DNNs such as AlexNet, VGG, ResNet, DenseNet and Inception net. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJMvBjC5YQ |
https://openreview.net/pdf?id=BJMvBjC5YQ | |
PWC | https://paperswithcode.com/paper/cutting-down-training-memory-by-re-fowarding-1 |
Repo | |
Framework | |
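The store-few, recompute-the-rest idea can be demonstrated on a scalar chain of multiplications: checkpoints are kept every `keep_every` vertices and the backward pass performs short local re-forwards from the nearest stored activation. This is a generic gradient-checkpointing sketch under assumed interfaces, not the authors' optimal-subset algorithm.

```python
def checkpointed_backward(x, weights, keep_every=3):
    # Forward pass for y = w_n * ... * w_1 * x, storing activations only
    # at every `keep_every`-th vertex of the chain.
    stored = {0: x}
    a = x
    for i, w in enumerate(weights, 1):
        a = w * a
        if i % keep_every == 0:
            stored[i] = a

    # Backward pass: each missing activation a_{i-1} is recomputed by a
    # local re-forward from the nearest checkpoint at or before it.
    grad = 1.0                           # dy/dy
    grads_w = [0.0] * len(weights)
    for i in range(len(weights), 0, -1):
        start = max(k for k in stored if k <= i - 1)
        a_prev = stored[start]
        for j in range(start, i - 1):    # local re-forward of the segment
            a_prev = weights[j] * a_prev
        grads_w[i - 1] = grad * a_prev   # dy/dw_i = upstream grad * a_{i-1}
        grad *= weights[i - 1]           # propagate upstream grad through w_i
    return grads_w, len(stored)
```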