Paper Group NANR 111
Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing
Title | Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing |
Authors | Mohab Elkaref, Bernd Bohnet |
Abstract | |
Tasks | Dependency Parsing, Transition-Based Dependency Parsing |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-8012/ |
https://www.aclweb.org/anthology/W19-8012 | |
PWC | https://paperswithcode.com/paper/recursive-lstm-tree-representation-for-arc |
Repo | |
Framework | |
Predicting Cognitive Effort in Translation Production
Title | Predicting Cognitive Effort in Translation Production |
Authors | Yuxiang Wei |
Abstract | |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-7008/ |
https://www.aclweb.org/anthology/W19-7008 | |
PWC | https://paperswithcode.com/paper/predicting-cognitive-effort-in-translation |
Repo | |
Framework | |
Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation
Title | Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation |
Authors | Xin Tan, Longyin Zhang, Deyi Xiong, Guodong Zhou |
Abstract | Document-level machine translation (MT) remains challenging due to the difficulty in efficiently using document context for translation. In this paper, we propose a hierarchical model to learn the global context for document-level neural machine translation (NMT). This is done through a sentence encoder to capture intra-sentence dependencies and a document encoder to model document-level inter-sentence consistency and coherence. With this hierarchical architecture, we feed back the extracted global document context to each word in a top-down fashion to distinguish different translations of a word according to its specific surrounding context. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we use a two-step training strategy to take advantage of a large-scale corpus with out-of-domain parallel sentence pairs and a small-scale corpus with in-domain parallel document pairs to achieve domain adaptability. Experimental results on several benchmark corpora show that our proposed model can significantly improve document-level translation performance over several strong NMT baselines. |
Tasks | Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1168/ |
https://www.aclweb.org/anthology/D19-1168 | |
PWC | https://paperswithcode.com/paper/hierarchical-modeling-of-global-context-for |
Repo | |
Framework | |
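A minimal sketch of the abstract's top-down feedback idea: pool a document vector from the per-sentence encodings and broadcast it back to every sentence representation. The mean-pooling document encoder and the function name below are illustrative stand-ins, not the paper's architecture or API.

```python
import numpy as np

def add_global_context(sent_embs):
    # Document encoder stand-in: pool a global context vector over sentences.
    doc_ctx = sent_embs.mean(axis=0)
    # Top-down feedback: augment every sentence representation with the
    # global document context before translation.
    return np.concatenate(
        [sent_embs, np.tile(doc_ctx, (len(sent_embs), 1))], axis=1)
```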
Globally Soft Filter Pruning For Efficient Convolutional Neural Networks
Title | Globally Soft Filter Pruning For Efficient Convolutional Neural Networks |
Authors | Ke Xu, Xiaoyun Wang, Qun Jia, Jianjing An, Dong Wang |
Abstract | This paper proposes a cumulative-saliency-based Globally Soft Filter Pruning (GSFP) scheme to prune redundant filters of Convolutional Neural Networks (CNNs). Specifically, GSFP adopts a robust pruning method, which measures the global redundancy of each filter in the whole model using a soft pruning strategy. In addition, in the model recovery process after pruning, we use the cumulative saliency strategy to improve the accuracy of pruning. GSFP has two advantages over previous works: (1) More accurate pruning guidance. For a pre-trained CNN model, the saliency of a filter varies with different input data. Therefore, accumulating the saliency of the filter over the entire data set can provide more accurate guidance for pruning. On the other hand, pruning from a global perspective is more accurate than local pruning. (2) More robust pruning strategy. We propose a reasonable normalization formula to prevent certain layers of filters in the network from being completely clipped due to an excessive pruning rate. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1fevoAcKX |
https://openreview.net/pdf?id=H1fevoAcKX | |
PWC | https://paperswithcode.com/paper/globally-soft-filter-pruning-for-efficient |
Repo | |
Framework | |
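The cumulative-saliency, globally soft pruning loop from the abstract might look roughly like the following sketch. The L1 saliency measure, the per-layer normalization, and the `gsfp_step` interface are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gsfp_step(filters, saliency, prune_frac=0.3):
    """One Globally Soft Filter Pruning step (illustrative sketch).

    filters:  dict layer_name -> weights of shape (n_filters, ...)
    saliency: dict layer_name -> running (cumulative) saliency per filter
    """
    # 1. Per-batch saliency: L1 norm of each filter, normalized by the
    #    layer total so no single layer can be clipped entirely (the
    #    robustness point made in the abstract).
    for name, w in filters.items():
        l1 = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
        saliency[name] += l1 / (l1.sum() + 1e-12)

    # 2. Rank all filters globally by cumulative saliency.
    flat = [(s, name, i)
            for name, sal in saliency.items()
            for i, s in enumerate(sal)]
    flat.sort()
    n_prune = int(prune_frac * len(flat))

    # 3. "Soft" pruning: zero the weights but keep the filters trainable,
    #    so they may recover in later epochs.
    for _, name, i in flat[:n_prune]:
        filters[name][i] = 0.0
    return filters, saliency
```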
Understand the dynamics of GANs via Primal-Dual Optimization
Title | Understand the dynamics of GANs via Primal-Dual Optimization |
Authors | Songtao Lu, Rahul Singh, Xiangyi Chen, Yongxin Chen, Mingyi Hong |
Abstract | Generative adversarial networks (GANs) are among the best-known unsupervised learning techniques today, owing to their superior ability to learn data distributions. In spite of their great success in applications, GANs are known to be notoriously hard to train. The tremendous amount of time it takes to run the training algorithm and its sensitivity to hyper-parameter tuning have been haunting researchers in this area. To resolve these issues, we need to first understand how GANs work. Herein, we take a step in this direction by examining the dynamics of GANs. We relate a large class of GANs, including the Wasserstein GAN, to max-min optimization problems with a coupling term that is linear over the discriminator. By developing new primal-dual optimization tools, we show that, with a proper stepsize choice, the widely used first-order iterative algorithm for training GANs in fact converges to a stationary solution at a sublinear rate. The same framework also applies to multi-task learning and distributionally robust learning problems. We verify our analysis on numerical examples with both synthetic and real data sets. We hope our analysis sheds light on future studies of the theoretical properties of relevant machine learning problems. |
Tasks | Multi-Task Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rylIy3R9K7 |
https://openreview.net/pdf?id=rylIy3R9K7 | |
PWC | https://paperswithcode.com/paper/understand-the-dynamics-of-gans-via-primal |
Repo | |
Framework | |
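The max-min structure with a coupling term linear in the discriminator can be illustrated on a one-dimensional toy problem. The fixed stepsize and the quadratic regularizer below are assumptions for the sketch, not the paper's algorithm or theory.

```python
def gda(theta0, w0, lr=0.1, steps=500):
    # Toy max-min problem with the structure studied in the paper
    # (coupling linear in the discriminator w):
    #   min_theta max_w  w * (theta - 1) - 0.5 * w**2
    # The -0.5*w**2 term keeps the inner maximization concave.
    theta, w = theta0, w0
    for _ in range(steps):
        g_theta = w                   # gradient of the coupling w.r.t. theta
        g_w = (theta - 1.0) - w       # gradient w.r.t. the discriminator w
        theta -= lr * g_theta         # primal descent step
        w += lr * g_w                 # dual ascent step
    return theta, w
```

With this stepsize the iterates spiral into the stationary point (theta, w) = (1, 0).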
Optimal margin Distribution Network
Title | Optimal margin Distribution Network |
Authors | Shen-Huan Lv, Lu Wang, Zhi-Hua Zhou |
Abstract | Recent research on margin theory has proved that maximizing the minimum margin, as support vector machines do, does not necessarily lead to better performance; instead, it is crucial to optimize the margin distribution. In the meantime, margin theory has been used to explain the empirical success of deep networks in recent studies. In this paper, we present ODN (the Optimal margin Distribution Network), a network which embeds a loss function based on the optimal margin distribution. We give a theoretical analysis for our method using the PAC-Bayesian framework, which confirms the significance of the margin distribution for classification within the framework of deep networks. In addition, empirical results show that the ODN model consistently outperforms the baseline cross-entropy loss model across different regularization situations. Our ODN model also outperforms the cross-entropy (Xent), hinge, and soft hinge loss models in generalization tasks with limited training data. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HygcvsAcFX |
https://openreview.net/pdf?id=HygcvsAcFX | |
PWC | https://paperswithcode.com/paper/optimal-margin-distribution-network-1 |
Repo | |
Framework | |
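A margin-distribution-style loss of the kind the abstract refers to can be sketched as below. The band form [r − θ, r + θ] with a quadratic penalty is one common formulation of optimal-margin-distribution losses; the exact loss used by ODN may differ, so treat this as illustrative.

```python
import numpy as np

def od_loss(margins, r=1.0, theta=0.5):
    # Margins inside the band [r - theta, r + theta] incur no loss;
    # deviations on either side are penalized quadratically, pushing the
    # whole margin *distribution* toward r rather than just maximizing
    # the minimum margin.
    lo = np.maximum(0.0, (r - theta) - margins)   # margin too small
    hi = np.maximum(0.0, margins - (r + theta))   # margin too large
    return np.mean(lo ** 2 + hi ** 2)
```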
Modeling Frames in Argumentation
Title | Modeling Frames in Argumentation |
Authors | Yamen Ajjour, Milad Alshomary, Henning Wachsmuth, Benno Stein |
Abstract | In argumentation, framing is used to emphasize a specific aspect of a controversial topic while concealing others. When talking about legalizing drugs, for instance, its economic aspect may be emphasized. In general, we call a set of arguments that focus on the same aspect a frame. An argumentative text has to serve the "right" frame(s) to convince the audience to adopt the author's stance (e.g., being pro or con legalizing drugs). More specifically, an author has to choose frames that fit the audience's cultural background and interests. This paper introduces frame identification, which is the task of splitting a set of arguments into non-overlapping frames. We present a fully unsupervised approach to this task, which first removes topical information and then identifies frames using clustering. For evaluation purposes, we provide a corpus with 12,326 debate-portal arguments, organized along the frames of the debates' topics. On this corpus, our approach outperforms different strong baselines, achieving an F1-score of 0.28. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1290/ |
https://www.aclweb.org/anthology/D19-1290 | |
PWC | https://paperswithcode.com/paper/modeling-frames-in-argumentation |
Repo | |
Framework | |
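The two-step pipeline (remove topical information, then cluster) can be caricatured in a few lines. Grouping arguments by their dominant leftover word below stands in for the real clustering step, and all names are illustrative, not the authors' system.

```python
from collections import Counter

def identify_frames(arguments, topics):
    # Step 1: strip topic-specific words from each argument.
    topic_words = {w for t in topics for w in t.lower().split()}
    frames = {}
    for arg in arguments:
        content = [w for w in arg.lower().split() if w not in topic_words]
        # Step 2 (stand-in for clustering): group by the most frequent
        # remaining word, a crude proxy for the emphasized aspect.
        key = Counter(content).most_common(1)[0][0] if content else "misc"
        frames.setdefault(key, []).append(arg)
    return frames
```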
Study of lexical aspect in the French medical language. Development of a lexical resource
Title | Study of lexical aspect in the French medical language. Development of a lexical resource |
Authors | Agathe Pierson, Cédrick Fairon |
Abstract | This paper details the development of a linguistic resource designed to improve temporal information extraction systems and to integrate aspectual values. After a brief review of recent works in temporal information extraction for the medical area, we discuss the linguistic notion of aspect and how it got a place in the NLP field. Then, we present our clinical data and describe the five-step approach adopted in this study. Finally, we present the linguistic resource itself and explain how we elaborated it and which properties were selected for the creation of the tables. |
Tasks | Temporal Information Extraction |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-1907/ |
https://www.aclweb.org/anthology/W19-1907 | |
PWC | https://paperswithcode.com/paper/study-of-lexical-aspect-in-the-french-medical |
Repo | |
Framework | |
ELAN as a search engine for hierarchically structured, tagged corpora
Title | ELAN as a search engine for hierarchically structured, tagged corpora |
Authors | Joshua Wilbur |
Abstract | |
Tasks | |
Published | 2019-01-01 |
URL | https://www.aclweb.org/anthology/W19-0308/ |
https://www.aclweb.org/anthology/W19-0308 | |
PWC | https://paperswithcode.com/paper/elan-as-a-search-engine-for-hierarchically |
Repo | |
Framework | |
Learning Abstract Models for Long-Horizon Exploration
Title | Learning Abstract Models for Long-Horizon Exploration |
Authors | Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang |
Abstract | In high-dimensional reinforcement learning settings with sparse rewards, performing effective exploration to even obtain any reward signal is an open challenge. While model-based approaches hold promise of better exploration via planning, it is extremely difficult to learn a reliable enough Markov Decision Process (MDP) in high dimensions (e.g., over 10^100 states). In this paper, we propose learning an abstract MDP over a much smaller number of states (e.g., 10^5), which we can plan over for effective exploration. We assume we have an abstraction function that maps concrete states (e.g., raw pixels) to abstract states (e.g., agent position, ignoring other objects). In our approach, a manager maintains an abstract MDP over a subset of the abstract states, which grows monotonically through targeted exploration (possible due to the abstract MDP). Concurrently, we learn a worker policy to travel between abstract states; the worker deals with the messiness of concrete states and presents a clean abstraction to the manager. On three of the hardest games from the Arcade Learning Environment (Montezuma’s, Pitfall!, and Private Eye), our approach outperforms the previous state-of-the-art by over a factor of 2 in each game. In Pitfall!, our approach is the first to achieve superhuman performance without demonstrations. |
Tasks | Atari Games |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ryxLG2RcYX |
https://openreview.net/pdf?id=ryxLG2RcYX | |
PWC | https://paperswithcode.com/paper/learning-abstract-models-for-long-horizon |
Repo | |
Framework | |
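The manager/worker decomposition can be illustrated with a toy line-world: the abstraction buckets integer positions, the manager's set of known abstract states grows monotonically, and a random walk stands in for the learned worker policy. Everything below is an illustrative simplification, not the paper's system.

```python
import random

def explore(budget=2000, seed=0):
    rng = random.Random(seed)
    pos = 0                                  # concrete state: integer position
    abstraction = lambda s: s // 10          # concrete -> abstract (toy bucketing)
    known = {abstraction(pos)}               # manager's discovered abstract states
    for _ in range(budget):
        pos += rng.choice([-1, 1])           # stand-in worker policy
        known.add(abstraction(pos))          # set grows monotonically
    return known
```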
Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation
Title | Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation |
Authors | Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, Rui Yao |
Abstract | One-shot image segmentation aims to undertake the segmentation task of a novel class with only one training image available. The difficulty lies in that image segmentation has structured data representations, which yields a many-to-many message passing problem. Previous methods often simplify it to a one-to-many problem by squeezing support data to a global descriptor. However, a mixed global representation drops the data structure and information of individual elements. In this paper, we propose to model structured segmentation data with graphs and apply attentive graph reasoning to propagate label information from support data to query data. The graph attention mechanism could establish the element-to-element correspondence across structured data by learning attention weights between connected graph nodes. To capture correspondence at different semantic levels, we further propose a pyramid-like structure that models different sizes of image regions as graph nodes and undertakes graph reasoning at different levels. Experiments on PASCAL VOC 2012 dataset demonstrate that our proposed network significantly outperforms the baseline method and leads to new state-of-the-art performance on 1-shot and 5-shot segmentation benchmarks. |
Tasks | Semantic Segmentation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Pyramid_Graph_Networks_With_Connection_Attentions_for_Region-Based_One-Shot_Semantic_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/pyramid-graph-networks-with-connection |
Repo | |
Framework | |
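The element-to-element attention the abstract contrasts with a squeezed global descriptor can be sketched as a softmax over pairwise affinities between query and support region features. The function, shapes, and dot-product affinity below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def attentive_label_propagation(support_feats, support_labels, query_feats):
    # Pairwise affinities between every query node and every support node,
    # so individual elements are matched instead of one global descriptor.
    logits = query_feats @ support_feats.T
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # per-query softmax attention
    return w @ support_labels                # attention-weighted label propagation
```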
Online Hyperparameter Adaptation via Amortized Proximal Optimization
Title | Online Hyperparameter Adaptation via Amortized Proximal Optimization |
Authors | Paul Vicol, Jeffery Z. HaoChen, Roger Grosse |
Abstract | Effective performance of neural networks depends critically on effective tuning of optimization hyperparameters, especially learning rates (and schedules thereof). We present Amortized Proximal Optimization (APO), which takes the perspective that each optimization step should approximately minimize a proximal objective (similar to the ones used to motivate natural gradient and trust region policy optimization). Optimization hyperparameters are adapted to best minimize the proximal objective after one weight update. We show that an idealized version of APO (where an oracle minimizes the proximal objective exactly) achieves global convergence to a stationary point and locally second-order convergence to a global optimum for neural networks. APO incurs minimal computational overhead. We experiment with using APO to adapt a variety of optimization hyperparameters online during training, including (possibly layer-specific) learning rates, damping coefficients, and gradient variance exponents. For a variety of network architectures and optimization algorithms (including SGD, RMSprop, and K-FAC), we show that with minimal tuning, APO performs competitively with carefully tuned optimizers. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJl6M2C5Y7 |
https://openreview.net/pdf?id=rJl6M2C5Y7 | |
PWC | https://paperswithcode.com/paper/online-hyperparameter-adaptation-via |
Repo | |
Framework | |
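The oracle version of the idea (pick the hyperparameter whose one-step update best minimizes a proximal objective) can be sketched as a search over candidate learning rates. The candidate grid and the proximal weight `lam` are illustrative assumptions; the paper amortizes this oracle rather than enumerating candidates.

```python
import numpy as np

def apo_tune_lr(f, grad, w, lrs, lam=0.1):
    # Oracle APO-style step: keep the learning rate whose one-step update
    # minimizes the proximal objective  f(w') + lam * ||w' - w||^2.
    g = grad(w)
    best_lr, best_val = None, np.inf
    for lr in lrs:
        w_new = w - lr * g
        val = f(w_new) + lam * np.sum((w_new - w) ** 2)
        if val < best_val:
            best_lr, best_val = lr, val
    return best_lr
```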
Multiple People Tracking using Body and Joint Detections
Title | Multiple People Tracking using Body and Joint Detections |
Authors | Roberto Henschel, Yunzhe Zou, Bodo Rosenhahn |
Abstract | Most multiple people tracking systems compute trajectories based on the tracking-by-detection paradigm. Consequently, the performance depends to a large extent on the quality of the employed input detections. However, despite enormous progress in recent years, partially occluded people are still often not recognized. Also, many correct detections are mistakenly discarded when non-maximum suppression is performed. Improving the tracking performance thus requires augmenting the coarse input. Well-suited for this task are fine-grained body joint detections, as they allow locating even strongly occluded persons. Thus, in this work, we analyze the suitability of including joint detections for multiple people tracking. We introduce different affinities between the two detection types and evaluate their performance. Tracking is then performed within a near-online framework based on a min-cost graph labeling formulation. As a result, our framework can recover heavily occluded persons and solve the data association efficiently. We evaluate our framework on the MOT16/17 benchmark. Experimental results demonstrate that our framework achieves state-of-the-art results. |
Tasks | Multiple People Tracking |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPRW_2019/html/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.html |
http://openaccess.thecvf.com/content_CVPRW_2019/papers/BMTT/Henschel_Multiple_People_Tracking_Using_Body_and_Joint_Detections_CVPRW_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/multiple-people-tracking-using-body-and-joint |
Repo | |
Framework | |
Optimization on Multiple Manifolds
Title | Optimization on Multiple Manifolds |
Authors | Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-ming Ma, Tie-yan Liu |
Abstract | Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints. Most previous works focus on the case with a single manifold. However, in practice it is quite common that the optimization problem involves more than one constraint (each constraint corresponding to one manifold). It is not clear in general how to optimize on multiple manifolds effectively and provably, especially when the intersection of multiple manifolds is not a manifold or cannot be easily calculated. We propose a unified algorithm framework to handle optimization on multiple manifolds. Specifically, we integrate information from multiple manifolds and move along an ensemble direction by viewing the information from each manifold as a drift and adding them together. We prove the convergence properties of the proposed algorithms. We also apply the algorithms to training neural networks with batch normalization layers and achieve preferable empirical results. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HJerDj05tQ |
https://openreview.net/pdf?id=HJerDj05tQ | |
PWC | https://paperswithcode.com/paper/optimization-on-multiple-manifolds |
Repo | |
Framework | |
Cutting Down Training Memory by Re-fowarding
Title | Cutting Down Training Memory by Re-fowarding |
Authors | Jianwei Feng, Dong Huang |
Abstract | Deep Neural Networks (DNNs) require huge GPU memory when training on modern image/video databases. Unfortunately, GPU memory as a hardware resource is always finite, which limits the image resolution, batch size, and learning rate that can be used for better DNN performance. In this paper, we propose a novel training approach, called Re-forwarding, that substantially reduces memory usage in training. Our approach automatically finds a subset of vertices in a DNN computation graph and stores tensors only at these vertices during the first forward pass. During the backward pass, extra local forwards (called the Re-forwarding process) are conducted to compute the missing tensors between the subset of vertices. The total memory cost becomes the sum of (1) the memory cost at the subset of vertices and (2) the maximum memory cost among local re-forwards. Re-forwarding trades training-time overhead for memory and does not compromise any performance in testing. We propose theories and algorithms that achieve the optimal memory solutions for DNNs with either linear or arbitrary computation graphs. Experiments show that Re-forwarding cuts up to 80% of training memory on popular DNNs such as AlexNet, VGG, ResNet, DenseNet and Inception net. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJMvBjC5YQ |
https://openreview.net/pdf?id=BJMvBjC5YQ | |
PWC | https://paperswithcode.com/paper/cutting-down-training-memory-by-re-fowarding-1 |
Repo | |
Framework | |
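The store-few, recompute-the-rest idea can be demonstrated on a scalar chain of multiplications: checkpoints are kept every `keep_every` vertices and the backward pass performs short local re-forwards from the nearest stored activation. This is a generic gradient-checkpointing sketch under assumed interfaces, not the authors' optimal-subset algorithm.

```python
def checkpointed_backward(x, weights, keep_every=3):
    # Forward pass for y = w_n * ... * w_1 * x, storing activations only
    # at every `keep_every`-th vertex of the chain.
    stored = {0: x}
    a = x
    for i, w in enumerate(weights, 1):
        a = w * a
        if i % keep_every == 0:
            stored[i] = a

    # Backward pass: each missing activation a_{i-1} is recomputed by a
    # local re-forward from the nearest checkpoint at or before it.
    grad = 1.0                           # dy/dy
    grads_w = [0.0] * len(weights)
    for i in range(len(weights), 0, -1):
        start = max(k for k in stored if k <= i - 1)
        a_prev = stored[start]
        for j in range(start, i - 1):    # local re-forward of the segment
            a_prev = weights[j] * a_prev
        grads_w[i - 1] = grad * a_prev   # dy/dw_i = upstream grad * a_{i-1}
        grad *= weights[i - 1]           # propagate upstream grad through w_i
    return grads_w, len(stored)
```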