January 24, 2020

2550 words 12 mins read

Paper Group NANR 111

Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing. Predicting Cognitive Effort in Translation Production. Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation. Globally Soft Filter Pruning For Efficient Convolutional Neural Networks. Understand the dynamics of GANs via Primal-Du …

Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing

Title Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing
Authors Mohab Elkaref, Bernd Bohnet
Abstract
Tasks Dependency Parsing, Transition-Based Dependency Parsing
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-8012/
PDF https://www.aclweb.org/anthology/W19-8012
PWC https://paperswithcode.com/paper/recursive-lstm-tree-representation-for-arc
Repo
Framework

Predicting Cognitive Effort in Translation Production

Title Predicting Cognitive Effort in Translation Production
Authors Yuxiang Wei
Abstract
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-7008/
PDF https://www.aclweb.org/anthology/W19-7008
PWC https://paperswithcode.com/paper/predicting-cognitive-effort-in-translation
Repo
Framework

Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation

Title Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation
Authors Xin Tan, Longyin Zhang, Deyi Xiong, Guodong Zhou
Abstract Document-level machine translation (MT) remains challenging due to the difficulty of efficiently using document context for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder that captures intra-sentence dependencies and a document encoder that models document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed the extracted global document context back to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy that takes advantage of a large-scale corpus of out-of-domain parallel sentence pairs and a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. Experimental results on several benchmark corpora show that our proposed model significantly improves document-level translation performance over several strong NMT baselines.
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1168/
PDF https://www.aclweb.org/anthology/D19-1168
PWC https://paperswithcode.com/paper/hierarchical-modeling-of-global-context-for
Repo
Framework
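
To make the architecture concrete, here is a minimal PyTorch sketch of the hierarchical context idea (not the authors' code; module choices and dimensions are illustrative): a sentence encoder summarizes each sentence, a document encoder models the sequence of sentence vectors, and the resulting global context is fed back to every word.

```python
import torch
import torch.nn as nn

class HierarchicalContext(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.sent_enc = nn.GRU(emb, hid, batch_first=True)  # intra-sentence dependencies
        self.doc_enc = nn.GRU(hid, hid, batch_first=True)   # inter-sentence coherence
        self.fuse = nn.Linear(emb + hid, hid)                # inject global context per word

    def forward(self, doc):  # doc: (n_sents, sent_len) token ids
        words = self.embed(doc)                   # (S, L, emb)
        _, h = self.sent_enc(words)               # (1, S, hid): one vector per sentence
        doc_states, _ = self.doc_enc(h)           # document as a length-S sequence
        ctx = doc_states.squeeze(0).unsqueeze(1)  # (S, 1, hid)
        ctx = ctx.expand(-1, doc.size(1), -1)     # top-down: broadcast context to each word
        return torch.tanh(self.fuse(torch.cat([words, ctx], dim=-1)))

out = HierarchicalContext()(torch.randint(0, 1000, (3, 7)))
print(out.shape)  # torch.Size([3, 7, 64])
```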

Globally Soft Filter Pruning For Efficient Convolutional Neural Networks

Title Globally Soft Filter Pruning For Efficient Convolutional Neural Networks
Authors Ke Xu, Xiaoyun Wang, Qun Jia, Jianjing An, Dong Wang
Abstract This paper proposes a cumulative-saliency-based Globally Soft Filter Pruning (GSFP) scheme to prune redundant filters of Convolutional Neural Networks (CNNs). Specifically, GSFP adopts a robust pruning method, which measures the global redundancy of each filter in the whole model while using a soft pruning strategy. In addition, in the model recovery process after pruning, we use a cumulative saliency strategy to improve the accuracy of pruning. GSFP has two advantages over previous works: (1) More accurate pruning guidance. For a pre-trained CNN model, the saliency of a filter varies with the input data, so accumulating the saliency of each filter over the entire data set provides more accurate guidance for pruning. Moreover, pruning from a global perspective is more accurate than local pruning. (2) A more robust pruning strategy. We propose a normalization formula that prevents the filters of certain layers from being clipped entirely due to an excessive pruning rate.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=H1fevoAcKX
PDF https://openreview.net/pdf?id=H1fevoAcKX
PWC https://paperswithcode.com/paper/globally-soft-filter-pruning-for-efficient
Repo
Framework
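
A minimal sketch of the pruning loop under stated assumptions: per-filter weight-magnitude saliency stands in for the paper's data-dependent measure, saliencies are normalized per layer (so no layer can be clipped entirely), accumulated across calls, ranked globally, and the weakest filters are zeroed softly, i.e., they remain in the model and can recover during further training.

```python
import torch
import torch.nn as nn

def gsfp_step(model, saliency, prune_ratio=0.3):
    """One pruning pass; call periodically and keep `saliency` across calls."""
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    scores = []
    for conv, acc in zip(convs, saliency):
        s = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # per-filter L1 saliency
        acc += s / s.sum()           # accumulate; per-layer normalization keeps scales comparable
        scores.append(acc)
    flat = torch.cat(scores)
    k = max(1, int(prune_ratio * flat.numel()))
    threshold = flat.kthvalue(k).values                    # one global threshold
    for conv, s in zip(convs, scores):
        mask = (s > threshold).float().view(-1, 1, 1, 1)
        conv.weight.data.mul_(mask)  # soft pruning: zeroed filters can still recover

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
saliency = [torch.zeros(m.out_channels) for m in model.modules() if isinstance(m, nn.Conv2d)]
gsfp_step(model, saliency)
```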

Understand the dynamics of GANs via Primal-Dual Optimization

Title Understand the dynamics of GANs via Primal-Dual Optimization
Authors Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong
Abstract The generative adversarial network (GAN) is one of the best-known unsupervised learning techniques today, owing to its superior ability to learn data distributions. In spite of its great success in applications, GAN is known to be notoriously hard to train. The tremendous amount of time it takes to run the training algorithm and its sensitivity to hyper-parameter tuning have long troubled researchers in this area. To resolve these issues, we need to first understand how GANs work. Herein, we take a step in this direction by examining the dynamics of GANs. We relate a large class of GANs, including the Wasserstein GAN, to max-min optimization problems with a coupling term that is linear in the discriminator. By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm for training GANs in fact converges to a stationary solution at a sublinear rate. The same framework also applies to multi-task learning and distributionally robust learning problems. We verify our analysis on numerical examples with both synthetic and real data sets. We hope our analysis sheds light on future studies of the theoretical properties of relevant machine learning problems.
Tasks Multi-Task Learning
Published 2019-05-01
URL https://openreview.net/forum?id=rylIy3R9K7
PDF https://openreview.net/pdf?id=rylIy3R9K7
PWC https://paperswithcode.com/paper/understand-the-dynamics-of-gans-via-primal
Repo
Framework
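
As a toy illustration of the dynamics the paper analyzes (not its algorithm), the following script runs alternating first-order primal-dual updates on a max-min objective whose coupling term is linear in the dual "discriminator" variable; with a small enough stepsize the iterates settle at a stationary point.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(3.0, 1.0, size=10000)   # target distribution N(3, 1)

theta = 0.0   # primal variable: generator mean
w = 0.5       # dual variable: linear "discriminator" weight
eta = 0.05    # stepsize; the convergence result hinges on choosing this properly

for _ in range(2000):
    fake = theta + rng.normal(0.0, 1.0, size=256)
    # max-min objective: L(theta, w) = w * (E[data] - E[fake]) - 0.5 * w**2
    grad_w = data.mean() - fake.mean() - w   # ascent step on the dual
    grad_theta = -w                          # descent step on the primal
    w += eta * grad_w
    theta -= eta * grad_theta

print(f"learned generator mean ~ {theta:.2f} (target 3.0)")
```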

Optimal margin Distribution Network

Title Optimal margin Distribution Network
Authors Shen-Huan Lv, Lu Wang, Zhi-Hua Zhou
Abstract Recent research on margin theory has proved that maximizing the minimum margin, as support vector machines do, does not necessarily lead to better performance; instead, it is crucial to optimize the margin distribution. In the meantime, margin theory has been used to explain the empirical success of deep networks in recent studies. In this paper, we present ODN (the Optimal margin Distribution Network), a network that embeds a loss function targeting the optimal margin distribution. We give a theoretical analysis of our method using the PAC-Bayesian framework, which confirms the significance of the margin distribution for classification within the framework of deep networks. In addition, empirical results show that the ODN model consistently outperforms the baseline cross-entropy loss model across different regularization settings. Our ODN model also outperforms the cross-entropy (Xent), hinge, and soft hinge loss models on generalization tasks with limited training data.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HygcvsAcFX
PDF https://openreview.net/pdf?id=HygcvsAcFX
PWC https://paperswithcode.com/paper/optimal-margin-distribution-network-1
Repo
Framework
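
A minimal PyTorch sketch of a margin-distribution-style loss; the band parameters r and theta are illustrative assumptions, not the paper's exact formulation. Rather than maximizing only the minimum margin, the loss penalizes margins that fall outside a band around a target, which controls both the margin mean and the margin variance.

```python
import torch

def omd_loss(logits, targets, r=1.0, theta=0.3):
    true = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    other = logits.clone()
    other.scatter_(1, targets.unsqueeze(1), float("-inf"))
    margin = true - other.max(dim=1).values            # multiclass margin
    low = torch.clamp(r - theta - margin, min=0) ** 2  # penalize margins below the band
    high = torch.clamp(margin - r - theta, min=0) ** 2 # penalize margins above the band
    return (low + high).mean()

logits = torch.randn(8, 10, requires_grad=True)
loss = omd_loss(logits, torch.randint(0, 10, (8,)))
loss.backward()
print(loss.item())
```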

Modeling Frames in Argumentation

Title Modeling Frames in Argumentation
Authors Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, Benno Stein
Abstract In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economic aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the "right" frame(s) to convince the audience to adopt the author's stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience's cultural background and interests. This paper introduces frame identification, the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus of 12,326 debate-portal arguments, organized along the frames of the debates' topics. On this corpus, our approach outperforms different strong baselines, achieving an F1-score of 0.28.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1290/
PDF https://www.aclweb.org/anthology/D19-1290
PWC https://paperswithcode.com/paper/modeling-frames-in-argumentation
Repo
Framework
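
A minimal sketch of the two-step recipe under assumptions (TF-IDF features and k-means stand in for whatever representations and clustering the paper uses): topical information is removed by centering argument vectors within each debate topic, and the residuals are clustered into candidate frames.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

arguments = ["legalizing drugs boosts tax revenue",
             "drug legalization harms public health",
             "gun ownership is a constitutional right",
             "gun control reduces household accidents"]
topics = np.array([0, 0, 1, 1])   # debate-portal topic of each argument

X = TfidfVectorizer().fit_transform(arguments).toarray()
for t in np.unique(topics):       # strip topic signal: center within each topic
    X[topics == t] -= X[topics == t].mean(axis=0)

frames = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(frames)                     # cluster ids = candidate frames
```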

Study of lexical aspect in the French medical language. Development of a lexical resource

Title Study of lexical aspect in the French medical language. Development of a lexical resource
Authors Agathe Pierson, Cédrick Fairon
Abstract This paper details the development of a linguistic resource designed to improve temporal information extraction systems and to integrate aspectual values. After a brief review of recent work on temporal information extraction for the medical area, we discuss the linguistic notion of aspect and how it found its place in the NLP field. We then present our clinical data and describe the five-step approach adopted in this study. Finally, we present the linguistic resource itself, explaining how we elaborated it and which properties were selected for the creation of the tables.
Tasks Temporal Information Extraction
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-1907/
PDF https://www.aclweb.org/anthology/W19-1907
PWC https://paperswithcode.com/paper/study-of-lexical-aspect-in-the-french-medical
Repo
Framework

ELAN as a search engine for hierarchically structured, tagged corpora

Title ELAN as a search engine for hierarchically structured, tagged corpora
Authors Joshua Wilbur
Abstract
Tasks
Published 2019-01-01
URL https://www.aclweb.org/anthology/W19-0308/
PDF https://www.aclweb.org/anthology/W19-0308
PWC https://paperswithcode.com/paper/elan-as-a-search-engine-for-hierarchically
Repo
Framework

Learning Abstract Models for Long-Horizon Exploration

Title Learning Abstract Models for Long-Horizon Exploration
Authors Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang
Abstract In high-dimensional reinforcement learning settings with sparse rewards, performing effective exploration to even obtain any reward signal is an open challenge. While model-based approaches hold promise of better exploration via planning, it is extremely difficult to learn a reliable enough Markov Decision Process (MDP) in high dimensions (e.g., over 10^100 states). In this paper, we propose learning an abstract MDP over a much smaller number of states (e.g., 10^5), which we can plan over for effective exploration. We assume we have an abstraction function that maps concrete states (e.g., raw pixels) to abstract states (e.g., agent position, ignoring other objects). In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP). Concurrently, we learn a worker policy to travel between abstract states; the worker deals with the messiness of concrete states and presents a clean abstraction to the manager. On three of the hardest games from the Arcade Learning Environment (Montezuma’s Revenge, Pitfall!, and Private Eye), our approach outperforms the previous state-of-the-art by over a factor of 2 in each game. In Pitfall!, our approach is the first to achieve superhuman performance without demonstrations.
Tasks Atari Games
Published 2019-05-01
URL https://openreview.net/forum?id=ryxLG2RcYX
PDF https://openreview.net/pdf?id=ryxLG2RcYX
PWC https://paperswithcode.com/paper/learning-abstract-models-for-long-horizon
Repo
Framework
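
A minimal sketch of the manager's bookkeeping, with an assumed abstraction function (agent position on a coarse grid); the worker, a goal-conditioned policy that travels between abstract states, is omitted. The set of known abstract states grows monotonically as transitions are observed, and the manager targets under-visited states.

```python
from collections import defaultdict

def abstraction(concrete_state):
    # assumed abstraction: agent position on a coarse grid, other objects ignored
    return (concrete_state["x"] // 10, concrete_state["y"] // 10)

class Manager:
    def __init__(self):
        self.known = set()                   # grows monotonically
        self.visits = defaultdict(int)
        self.transitions = defaultdict(set)  # edges of the abstract MDP

    def observe(self, abs_from, abs_to):
        self.known.update([abs_from, abs_to])
        self.visits[abs_to] += 1
        self.transitions[abs_from].add(abs_to)

    def pick_target(self):
        # direct the worker toward the least-visited known abstract state
        return min(self.known, key=lambda s: self.visits[s])

m = Manager()
m.observe(abstraction({"x": 3, "y": 4}), abstraction({"x": 15, "y": 4}))
print(m.pick_target())  # (0, 0): discovered but never entered via a transition
```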

Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation

Title Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation
Authors Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, Rui Yao
Abstract One-shot image segmentation aims to undertake the segmentation task of a novel class with only one training image available. The difficulty lies in that image segmentation has structured data representations, which yields a many-to-many message passing problem. Previous methods often simplify it to a one-to-many problem by squeezing support data to a global descriptor. However, a mixed global representation drops the data structure and information of individual elements. In this paper, we propose to model structured segmentation data with graphs and apply attentive graph reasoning to propagate label information from support data to query data. The graph attention mechanism could establish the element-to-element correspondence across structured data by learning attention weights between connected graph nodes. To capture correspondence at different semantic levels, we further propose a pyramid-like structure that models different sizes of image regions as graph nodes and undertakes graph reasoning at different levels. Experiments on PASCAL VOC 2012 dataset demonstrate that our proposed network significantly outperforms the baseline method and leads to new state-of-the-art performance on 1-shot and 5-shot segmentation benchmarks.
Tasks Semantic Segmentation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/pyramid-graph-networks-with-connection
Repo
Framework
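
A minimal sketch of attentive graph reasoning across a region pyramid (pooling scheme and dimensions are assumptions): image regions at each scale become graph nodes, and attention weights propagate the support mask's label information to the query nodes.

```python
import torch
import torch.nn.functional as F

def graph_attention_pass(query_nodes, support_nodes, support_labels):
    # query_nodes: (Nq, d); support_nodes: (Ns, d); support_labels: (Ns, 1)
    attn = torch.softmax(query_nodes @ support_nodes.t() / query_nodes.size(1) ** 0.5, dim=-1)
    return attn @ support_labels             # (Nq, 1): label info propagated to query nodes

def pyramid_pass(query_feat, support_feat, support_mask, levels=(1, 2, 4)):
    # features: (d, H, W); mask: (1, H, W); regions at each scale become graph nodes
    outs = []
    for g in levels:
        q = F.adaptive_avg_pool2d(query_feat, g).flatten(1).t()      # (g*g, d) query nodes
        s = F.adaptive_avg_pool2d(support_feat, g).flatten(1).t()    # (g*g, d) support nodes
        lbl = F.adaptive_avg_pool2d(support_mask, g).flatten(1).t()  # (g*g, 1) region labels
        outs.append(graph_attention_pass(q, s, lbl))
    return outs

outs = pyramid_pass(torch.randn(64, 32, 32), torch.randn(64, 32, 32), torch.rand(1, 32, 32))
print([tuple(o.shape) for o in outs])  # [(1, 1), (4, 1), (16, 1)]
```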

Online Hyperparameter Adaptation via Amortized Proximal Optimization

Title Online Hyperparameter Adaptation via Amortized Proximal Optimization
Authors Paul Vicol, Jeffery Z. HaoChen, Roger Grosse
Abstract The performance of neural networks depends critically on effective tuning of optimization hyperparameters, especially learning rates (and their schedules). We present Amortized Proximal Optimization (APO), which takes the perspective that each optimization step should approximately minimize a proximal objective (similar to the ones used to motivate natural gradient and trust region policy optimization). Optimization hyperparameters are adapted to best minimize the proximal objective after one weight update. We show that an idealized version of APO (where an oracle minimizes the proximal objective exactly) achieves global convergence to a stationary point and local second-order convergence to a global optimum for neural networks. APO incurs minimal computational overhead. We experiment with using APO to adapt a variety of optimization hyperparameters online during training, including (possibly layer-specific) learning rates, damping coefficients, and gradient variance exponents. For a variety of network architectures and optimization algorithms (including SGD, RMSprop, and K-FAC), we show that with minimal tuning, APO performs competitively with carefully tuned optimizers.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=rJl6M2C5Y7
PDF https://openreview.net/pdf?id=rJl6M2C5Y7
PWC https://paperswithcode.com/paper/online-hyperparameter-adaptation-via
Repo
Framework
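
A simplified stand-in for the idea (not APO's amortized update): after computing a gradient, pick the learning rate whose one-step update best minimizes the proximal objective J(w') = loss(w') + lam * ||w' - w||^2, here via a small candidate grid.

```python
import torch

def proximal_lr_step(w, grad, loss_fn, candidates=(1e-3, 1e-2, 1e-1), lam=0.1):
    best_w, best_obj = w, float("inf")
    for lr in candidates:
        w_new = w - lr * grad                                  # one-step update for this lr
        obj = loss_fn(w_new) + lam * (w_new - w).pow(2).sum()  # proximal objective
        if obj < best_obj:
            best_w, best_obj = w_new, obj
    return best_w

loss_fn = lambda w: (w - 3.0).pow(2).sum()  # toy quadratic with optimum at w = 3
w = torch.zeros(2)
for _ in range(50):
    w = proximal_lr_step(w, 2 * (w - 3.0), loss_fn)
print(w)  # approaches tensor([3., 3.])
```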

Multiple People Tracking using Body and Joint Detections

Title Multiple People Tracking using Body and Joint Detections
Authors Roberto Henschel, Yunzhe Zou, Bodo Rosenhahn
Abstract Most multiple people tracking systems compute trajectories based on the tracking-by-detection paradigm. Consequently, the performance depends to a large extent on the quality of the input detections. However, despite enormous progress in recent years, partially occluded people are still often not recognized. Also, many correct detections are mistakenly discarded when non-maximum suppression is performed. Improving the tracking performance thus requires augmenting the coarse input. Well suited for this task are fine-grained body joint detections, as they make it possible to locate even strongly occluded persons. In this work, we therefore analyze the suitability of including joint detections in multiple people tracking. We introduce different affinities between the two detection types and evaluate their performance. Tracking is then performed within a near-online framework based on a min-cost graph labeling formulation. As a result, our framework can recover heavily occluded persons and solve the data association efficiently. We evaluate our framework on the MOT16/17 benchmark. Experimental results demonstrate that our framework achieves state-of-the-art results.
Tasks Multiple People Tracking
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPRW_2019/html/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPRW_2019/papers/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.pdf
PWC https://paperswithcode.com/paper/multiple-people-tracking-using-body-and-joint
Repo
Framework
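
One assumed affinity between the two detection types (the paper introduces and evaluates several): score how well a set of joint detections supports a body box by the confidence-weighted fraction of joints that fall inside the box. Affinities like this feed the min-cost graph labeling that performs the data association.

```python
def box_joint_affinity(box, joints):
    # box: (x1, y1, x2, y2); joints: list of (x, y, confidence)
    x1, y1, x2, y2 = box
    total = sum(c for _, _, c in joints)
    if total == 0:
        return 0.0
    inside = sum(c for x, y, c in joints if x1 <= x <= x2 and y1 <= y <= y2)
    return inside / total

person_box = (100, 50, 180, 250)
pose = [(140, 70, 0.9), (135, 120, 0.8), (300, 400, 0.3)]  # head, hip, stray joint
print(box_joint_affinity(person_box, pose))  # ~0.85
```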

Optimization on Multiple Manifolds

Title Optimization on Multiple Manifolds
Authors Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-ming Ma, Tie-yan Liu
Abstract Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints. Most previous works focus on the case of a single manifold. In practice, however, it is quite common for an optimization problem to involve more than one constraint, each corresponding to one manifold. It is not clear in general how to optimize on multiple manifolds effectively and provably, especially when the intersection of the manifolds is not itself a manifold or cannot be easily computed. We propose a unified algorithmic framework for optimization on multiple manifolds. Specifically, we integrate information from the manifolds and move along an ensemble direction, viewing the information from each manifold as a drift and adding the drifts together. We prove convergence properties of the proposed algorithms. We also apply the algorithms to training neural networks with batch normalization layers and achieve preferable empirical results.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=HJerDj05tQ
PDF https://openreview.net/pdf?id=HJerDj05tQ
PWC https://paperswithcode.com/paper/optimization-on-multiple-manifolds
Repo
Framework
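
A toy numerical sketch of the ensemble-direction idea (one possible instantiation, not the paper's full framework): the gradient is projected onto each manifold's tangent space, the resulting drifts are added, and a retraction maps the iterate back to the feasible set.

```python
import numpy as np

b = np.array([1.0, -2.0, 0.5])
f_grad = lambda x: 2 * (x - b)     # gradient of f(x) = ||x - b||^2

def tangent_sphere(x, g):          # tangent projection for the unit sphere ||x|| = 1
    return g - (g @ x) * x

def tangent_plane(g):              # tangent projection for the hyperplane sum(x) = 0
    n = np.ones_like(g) / np.sqrt(g.size)
    return g - (g @ n) * n

def retract(x):                    # map back onto both constraint sets
    x = x - x.mean()
    return x / np.linalg.norm(x)

x = retract(np.array([1.0, 0.0, 0.0]))
for _ in range(200):
    drift = tangent_sphere(x, f_grad(x)) + tangent_plane(f_grad(x))  # ensemble direction
    x = retract(x - 0.05 * drift)

print(x, "sum:", round(x.sum(), 8), "norm:", round(np.linalg.norm(x), 8))
```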

Cutting Down Training Memory by Re-fowarding

Title Cutting Down Training Memory by Re-fowarding
Authors Jianwei Feng, Dong Huang
Abstract Deep Neural Networks (DNNs) require huge GPU memory when trained on modern image/video databases. Unfortunately, GPU memory is a finite hardware resource, which limits the image resolution, batch size, and learning rate that can be used for better DNN performance. In this paper, we propose a novel training approach, called Re-forwarding, that substantially reduces memory usage in training. Our approach automatically finds a subset of vertices in a DNN's computation graph and stores tensors only at these vertices during the first forward pass. During the backward pass, extra local forward passes (the Re-forwarding process) are conducted to compute the missing tensors between the subset of vertices. The total memory cost becomes the sum of (1) the memory cost at the subset of vertices and (2) the maximum memory cost among the local re-forwards. Re-forwarding trades training-time overhead for memory and does not compromise performance at test time. We propose theories and algorithms that achieve the optimal memory solution for DNNs with either linear or arbitrary computation graphs. Experiments show that Re-forwarding cuts down up to 80% of training memory on popular DNNs such as AlexNet, VGG, ResNet, DenseNet and Inception networks.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=BJMvBjC5YQ
PDF https://openreview.net/pdf?id=BJMvBjC5YQ
PWC https://paperswithcode.com/paper/cutting-down-training-memory-by-re-fowarding-1
Repo
Framework
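
The trade-off resembles gradient checkpointing; as a minimal runnable illustration, PyTorch's built-in checkpoint_sequential stands in for the paper's optimal vertex selection: only segment-boundary tensors are stored in the forward pass, and local forwards recompute the rest during backward.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

net = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)])
x = torch.randn(32, 512, requires_grad=True)

# Split the chain into 4 segments: only the segment-boundary activations are
# kept; tensors inside a segment are recomputed by a local forward
# ("re-forward") when backward reaches that segment.
out = checkpoint_sequential(net, 4, x)
out.sum().backward()
print(x.grad.shape)  # torch.Size([32, 512])
```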