Paper Group NANR 227
Theoretical properties of the global optimizer of two-layer Neural Network
Title | Theoretical properties of the global optimizer of two-layer Neural Network |
Authors | Digvijay Boob, Guanghui Lan |
Abstract | In this paper, we study the problem of optimizing a two-layer artificial neural network that best fits a training dataset. We look at this problem in the setting where the number of parameters is greater than the number of sampled points. We show that for a wide class of differentiable activation functions (this class contains most nonlinear functions and excludes piecewise linear functions), arbitrary first-order optimal solutions satisfy global optimality provided the hidden layer is non-singular. We essentially show that these non-singular hidden layer matrices satisfy a "good" property for this broad class of activation functions. The techniques involved in proving this result inspire a new algorithmic framework in which, between two gradient steps on the hidden layer, we add a stochastic gradient descent (SGD) step on the output layer. In this new framework, we extend our earlier result and show that for all finite iterations the hidden layer satisfies the "good" property mentioned earlier, thereby partially explaining the success of noisy gradient methods and addressing the data-independence issue of our earlier result. Both of these results extend easily from square hidden layer matrices to flat (wide) ones. The results also apply to networks with more than one hidden layer, provided all inner hidden layers are arbitrary but non-singular, all activations come from the given class of differentiable functions, and optimization is performed only with respect to the outermost hidden layer. Separately, we also study the smoothness properties of the objective function and show that it is Lipschitz smooth, i.e., its gradients do not change sharply. We use these smoothness properties to guarantee asymptotic convergence at rate $O(1/\text{number of iterations})$ to a first-order optimal solution. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BkIkkseAZ |
PDF | https://openreview.net/pdf?id=BkIkkseAZ |
PWC | https://paperswithcode.com/paper/theoretical-properties-of-the-global-1 |
Repo | |
Framework | |
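The interleaved scheme described in the abstract lends itself to a compact illustration. Below is a minimal sketch, assuming squared loss, a tanh activation (a member of the differentiable class studied), full-batch gradients for the hidden layer, and illustrative step sizes; it is not the authors' implementation.

```python
import numpy as np

def interleaved_training(X, y, d_hidden, steps=1000, eta_w=1e-2, eta_v=1e-2, seed=0):
    """Alternate a full-batch gradient step on the hidden layer W with a
    single-sample SGD step on the output layer v, for f(x) = v . tanh(W x)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d_hidden, d))  # hidden layer (non-singularity is what the analysis tracks)
    v = rng.standard_normal(d_hidden)       # output layer

    for _ in range(steps):
        # Gradient step on the hidden layer.
        H = np.tanh(X @ W.T)                # (n, d_hidden) hidden activations
        r = H @ v - y                       # residuals of the current fit
        G = ((r[:, None] * (1.0 - H**2)) * v).T @ X / n  # dL/dW for squared loss
        W -= eta_w * G

        # SGD step on the output layer (single random sample).
        i = rng.integers(n)
        h = np.tanh(W @ X[i])
        v -= eta_v * (h @ v - y[i]) * h
    return W, v
```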
UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes
Title | UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes |
Authors | Milton King, Ali Hakimi Parizi, Paul Cook |
Abstract | In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model. |
Tasks | Semantic Textual Similarity, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1168/ |
PWC | https://paperswithcode.com/paper/unbnlp-at-semeval-2018-task-10-evaluating |
Repo | |
Framework | |
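The winning co-occurrence approach is simple enough to sketch. A minimal version, assuming sentence-level co-occurrence counts and a plain greater-than decision rule (the paper's exact thresholding is not specified in the abstract):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count how often each unordered word pair co-occurs in a sentence."""
    counts = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts

def cooc(counts, w1, w2):
    return counts[tuple(sorted((w1, w2)))]

def is_discriminative(counts, word1, word2, attribute):
    """Predict that `attribute` discriminates word1 from word2 when it
    co-occurs more often with word1 (an illustrative rule, not the
    paper's tuned threshold)."""
    return cooc(counts, word1, attribute) > cooc(counts, word2, attribute)

# Toy usage:
corpus = [["the", "banana", "is", "yellow"], ["a", "ripe", "banana"],
          ["the", "apple", "is", "red"], ["yellow", "banana", "bread"]]
counts = cooccurrence_counts(corpus)
print(is_discriminative(counts, "banana", "apple", "yellow"))  # True
```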
On Hapax Legomena and Morphological Productivity
Title | On Hapax Legomena and Morphological Productivity |
Authors | Janet Pierrehumbert, Ramon Granell |
Abstract | Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics. The same challenge reappears in natural language processing in the context of handling words that were not seen in the training set (out-of-vocabulary, or OOV, words). Prior research showed that a good indicator of the productivity of a morpheme is the number of words involving it that occur exactly once (the *hapax legomena*). A technical connection was adduced between this result and Good-Turing smoothing, which assigns probability mass to unseen events on the basis of the simplifying assumption that word frequencies are stationary. In a large-scale study of 133 affixes in Wikipedia, we develop evidence that success in fact depends on tapping the frequency range in which the assumptions of Good-Turing are violated. |
Tasks | |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-5814/ |
PWC | https://paperswithcode.com/paper/on-hapax-legomena-and-morphological |
Repo | |
Framework | |
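For reference, the hapax-based productivity measure the abstract builds on (Baayen's category-conditioned productivity) and its Good-Turing counterpart, in standard notation:

```latex
% Hapax-based productivity of an affix a (Baayen):
%   n_1(a): number of hapax legomena (frequency-1 word types) containing a
%   N(a):   total token count of words containing a
\mathcal{P}(a) = \frac{n_1(a)}{N(a)}

% Good-Turing: the total probability mass reserved for unseen events,
% estimated from the corpus-wide hapax count N_1 and token count N:
p_0 \approx \frac{N_1}{N}
```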
Toward learning better metrics for sequence generation training with policy gradient
Title | Toward learning better metrics for sequence generation training with policy gradient |
Authors | Joji Toyama, Yusuke Iwasawa, Kotaro Nakayama, Yutaka Matsuo |
Abstract | Designing a metric manually for unsupervised sequence generation tasks, such as text generation, is essentially difficult. In such a situation, learning a metric of a sequence from data is one possible solution. A previous study, SeqGAN, proposed a framework for unsupervised sequence generation in which a metric is learned from data and a generator is optimized with respect to the learned metric with policy gradient, inspired by generative adversarial nets (GANs) and reinforcement learning. In this paper, we make two proposals to learn a better metric than SeqGAN's: a partial reward function and expert-based reward function training. The partial reward function is a reward function for a partial sequence of a certain length; SeqGAN employs a reward function for completed sequences only. By combining long-scale and short-scale partial reward functions, we expect a learned metric to be able to evaluate the partial correctness as well as the coherence of a sequence as a whole. In expert-based reward function training, a reward function is trained to discriminate between an expert (or true) sequence and a fake sequence produced by editing an expert sequence. Expert-based reward function training is not a GAN framework, which makes the optimization of the generator easier. We examine the effect of the partial reward function and expert-based reward function training on synthetic data and real text data, and show improvements over SeqGAN and a model trained with MLE. Specifically, whereas SeqGAN gains a 0.42 improvement in NLL over MLE on synthetic data, our best model gains a 3.02 improvement, and whereas SeqGAN gains a 0.029 improvement in BLEU over MLE, our best model gains a 0.250 improvement. |
Tasks | Text Generation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=r1kP7vlRb |
PDF | https://openreview.net/pdf?id=r1kP7vlRb |
PWC | https://paperswithcode.com/paper/toward-learning-better-metrics-for-sequence |
Repo | |
Framework | |
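A hedged sketch of the partial-reward idea: rewards from discriminators over windows of several lengths are mixed into a single policy-gradient objective. The window lengths, mixing weights, and whole-sequence credit assignment below are illustrative simplifications, not the paper's training procedure.

```python
import torch

def policy_gradient_loss(log_probs, tokens, reward_fns, weights):
    """REINFORCE-style loss in which the reward for a generated sequence is
    a weighted sum of partial reward functions, each scoring fixed-length
    windows of the sequence.

    log_probs : 1-D tensor, log pi(token_t | prefix) for each sampled token
    tokens    : list of generated token ids
    reward_fns: {window_length: fn(window) -> float in [0, 1]}
    weights   : one mixing weight per reward function
    """
    T = len(tokens)
    reward = 0.0
    for (length, fn), w in zip(reward_fns.items(), weights):
        windows = [tokens[i:i + length] for i in range(max(T - length + 1, 1))]
        reward += w * sum(fn(win) for win in windows) / len(windows)
    # Maximise expected reward: minimise -(reward) * log-likelihood.
    return -reward * log_probs.sum()
```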
The DipInfo-UniTo system for SRST 2018
Title | The DipInfo-UniTo system for SRST 2018 |
Authors | Valerio Basile, Alessandro Mazzei |
Abstract | This paper describes the system developed by the DipInfo-UniTo team to participate in the shallow track of the Surface Realization Shared Task 2018. The system employs two separate neural networks with different architectures to predict word ordering and morphological inflection independently of each other. The UniTO realizer is language independent, and its simple architecture placed it in the middle of the final ranking of the shared task. |
Tasks | Morphological Inflection, Text Generation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3609/ |
PWC | https://paperswithcode.com/paper/the-dipinfo-unito-system-for-srst-2018 |
Repo | |
Framework | |
Scalar Posterior Sampling with Applications
Title | Scalar Posterior Sampling with Applications |
Authors | Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis |
Abstract | We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode-switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm are satisfied by a sensible parameterization for a large class of problems in sequential recommendations. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7995-scalar-posterior-sampling-with-applications |
PDF | http://papers.nips.cc/paper/7995-scalar-posterior-sampling-with-applications.pdf |
PWC | https://paperswithcode.com/paper/scalar-posterior-sampling-with-applications |
Repo | |
Framework | |
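A minimal sketch of the deterministic-schedule idea behind DS-PSRL: the posterior over models is resampled only at pre-set, data-independent times. A doubling schedule is used here purely as an example; `posterior`, `solve`, and the environment interface are assumed names, not the paper's API.

```python
def ds_psrl(env, posterior, solve, horizon=10_000):
    """PSRL with a deterministic episode-switching schedule: a new model is
    sampled from the posterior only at pre-set times, independent of the
    observed data, and the policy for that model is followed in between."""
    next_switch, episode_len = 1, 1
    policy, state = None, env.reset()
    for t in range(1, horizon + 1):
        if t == next_switch:                      # deterministic switch time
            model = posterior.sample()            # Thompson-style draw
            policy = solve(model)                 # plan under the sampled model
            episode_len *= 2                      # e.g. doubling episodes
            next_switch = t + episode_len
        action = policy(state)
        state, reward = env.step(action)          # assumed (state, reward) API
        posterior.update(state, action, reward)   # Bayesian belief update
```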
A Robust Approach to Sequential Information Theoretic Planning
Title | A Robust Approach to Sequential Information Theoretic Planning |
Authors | Sue Zheng, Jason Pacheco, John Fisher |
Abstract | In many sequential planning applications, a natural approach to generating high-quality plans is to maximize an information reward such as mutual information (MI). Unfortunately, MI lacks a closed form in all but trivial models, and so must be estimated. In applications where the cost of plan execution is expensive, one desires planning estimates that admit theoretical guarantees. Through the use of robust M-estimators we obtain bounds on the absolute deviation of estimated MI. Moreover, we propose a sequential algorithm which integrates inference and planning by maximally reusing particles in each stage. We validate the utility of using robust estimators in the sequential approach on a Gaussian Markov Random Field wherein information measures have a closed form. Lastly, we demonstrate the benefits of our integrated approach in the context of sequential experiment design for inferring causal regulatory networks from gene expression levels. Our method shows improvements over a recent method which selects intervention experiments based on the same MI objective. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2372 |
PDF | http://proceedings.mlr.press/v80/zheng18b/zheng18b.pdf |
PWC | https://paperswithcode.com/paper/a-robust-approach-to-sequential-information |
Repo | |
Framework | |
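For reference, the MI planning objective the abstract refers to, in standard notation (the plan-subscripted distributions are our shorthand):

```latex
% Mutual information between latent state x and future observation y
% under a candidate plan a:
I_a(x; y) = \mathbb{E}_{p_a(x,y)}\!\left[\log \frac{p_a(x,y)}{p_a(x)\, p_a(y)}\right]

% Information-theoretic planning selects the plan maximising a particle
% estimate \hat{I}_a of this quantity (robustified via M-estimators):
a^{*} = \operatorname*{arg\,max}_{a} \hat{I}_a(x; y)
```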
Towards an ISO Standard for the Annotation of Quantification
Title | Towards an ISO Standard for the Annotation of Quantification |
Authors | Harry Bunt, James Pustejovsky, Kiyong Lee |
Abstract | |
Tasks | Question Answering |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1282/ |
PWC | https://paperswithcode.com/paper/towards-an-iso-standard-for-the-annotation-of |
Repo | |
Framework | |
Towards Human-Level License Plate Recognition
Title | Towards Human-Level License Plate Recognition |
Authors | Jiafan Zhuang, Saihui Hou, Zilei Wang, Zheng-Jun Zha |
Abstract | License plate recognition (LPR) is a fundamental component of various intelligent transport systems and is expected to be both accurate and efficient. In this paper, we propose a novel LPR framework consisting of semantic segmentation and character counting, towards achieving human-level performance. Benefiting from this innovative structure, our method can recognize a whole license plate at once, rather than performing character detection or a sliding-window search followed by per-character recognition. Moreover, our method achieves higher recognition accuracy by exploiting global information more effectively and avoiding sensitive character detection, and it saves time by eliminating one-by-one character recognition. Finally, we experimentally verify the effectiveness of the proposed method on two public datasets (AOLP and Media Lab) and our own License Plate Dataset. The results demonstrate that our method significantly outperforms previous state-of-the-art methods, achieving accuracies of more than 99% in almost all settings. |
Tasks | License Plate Recognition, Semantic Segmentation |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Jiafan_Zhuang_Towards_Human-Level_License_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiafan_Zhuang_Towards_Human-Level_License_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/towards-human-level-license-plate-recognition |
Repo | |
Framework | |
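One plausible reading of the segmentation-plus-counting pipeline, sketched with a CTC-like collapse rule standing in for the paper's character-counting branch; the actual architecture is not reproduced here.

```python
import numpy as np

def decode_plate(seg_logits, blank=0):
    """Turn a per-pixel semantic segmentation of a plate image into a
    character sequence: (1) read out the dominant non-background class per
    pixel column, then (2) collapse consecutive repeats (illustrative only).

    seg_logits: (num_classes, H, W) per-pixel class scores; class `blank`
    is the background.
    """
    pixel_classes = seg_logits.argmax(axis=0)         # (H, W) label map
    cols = []
    for w in range(pixel_classes.shape[1]):
        col = pixel_classes[:, w]
        fg = col[col != blank]                        # foreground labels only
        cols.append(np.bincount(fg).argmax() if fg.size else blank)
    # Collapse runs of the same class and drop the background.
    out, prev = [], blank
    for c in cols:
        if c != blank and c != prev:
            out.append(int(c))
        prev = c
    return out
```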
Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
Title | Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order |
Authors | Vladimir Braverman, Stephen Chestnut, Robert Krauthgamer, Yi Li, David Woodruff, Lin Yang |
Abstract | A central problem in mining massive data streams is characterizing which functions of an underlying frequency vector can be approximated efficiently. Given the prevalence of large scale linear algebra problems in machine learning, recently there has been considerable effort in extending this data stream problem to that of estimating functions of a matrix. This setting generalizes classical problems to the analogous ones for matrices. For example, instead of estimating frequent-item counts, we now wish to estimate “frequent-direction” counts. A related example is to estimate norms, which now correspond to estimating a vector norm on the singular values of the matrix. Despite recent efforts, the current understanding of such matrix problems is considerably weaker than that for vector problems. We study a number of aspects of estimating matrix norms in a stream that have not previously been considered: (1) multi-pass algorithms, (2) algorithms that see the underlying matrix one row at a time, and (3) time-efficient algorithms. Our multi-pass and row-order algorithms use less memory than what is provably required in the single-pass and entrywise-update models, and thus give separations between these models (in terms of memory). Moreover, all of our algorithms are considerably faster than previous ones. We also prove a number of lower bounds and obtain, for instance, a near-complete characterization of the memory required of row-order algorithms for estimating Schatten $p$-norms of sparse matrices. We complement our results with numerical experiments. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2069 |
PDF | http://proceedings.mlr.press/v80/braverman18a/braverman18a.pdf |
PWC | https://paperswithcode.com/paper/matrix-norms-in-data-streams-faster-multi |
Repo | |
Framework | |
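For reference, the Schatten $p$-norms at the centre of the abstract's row-order lower bounds, i.e. the vector $\ell_p$ norm applied to the singular values:

```latex
% Schatten p-norm of A with singular values sigma_1 >= ... >= sigma_r:
\|A\|_{S_p} = \Big( \sum_{i=1}^{r} \sigma_i(A)^{p} \Big)^{1/p}
% Special cases: p = 1 is the trace (nuclear) norm, p = 2 the Frobenius
% norm, and p -> infinity the operator norm.
```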
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
Title | Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning |
Authors | Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, Sheila McIlraith |
Abstract | In this paper we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to hierarchical reinforcement learning methods, which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space. |
Tasks | Hierarchical Reinforcement Learning, Q-Learning |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2454 |
PDF | http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf |
PWC | https://paperswithcode.com/paper/using-reward-machines-for-high-level-task |
Repo | |
Framework | |
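A minimal tabular sketch of the reward machine and the QRM update described above. The toy machine and all names are illustrative; the key point is that a single environment transition produces an off-policy update for every reward machine state.

```python
from collections import defaultdict

class RewardMachine:
    """Finite state machine over abstract events: transitions maps
    (state, event) -> (next_state, reward); unlisted events self-loop with 0."""
    def __init__(self, transitions, initial=0):
        self.transitions, self.initial = transitions, initial
    def step(self, u, event):
        return self.transitions.get((u, event), (u, 0.0))

def qrm_update(Q, rm, s, a, event, s2, actions, alpha=0.1, gamma=0.9):
    """One QRM step: the transition (s, a, s2), labelled with `event`,
    yields a Q-learning update for EVERY reward machine state u, each
    against its own reward r and successor RM state u2."""
    states = {u for (u, _) in rm.transitions} | {rm.initial}
    for u in states:
        u2, r = rm.step(u, event)
        target = r + gamma * max(Q[(u2, s2, b)] for b in actions)
        Q[(u, s, a)] += alpha * (target - Q[(u, s, a)])

# Toy machine: observe event 'c' (got coffee), then 'd' (delivered) -> reward 1.
rm = RewardMachine({(0, 'c'): (1, 0.0), (1, 'd'): (2, 1.0)})
Q = defaultdict(float)
qrm_update(Q, rm, s=(0, 0), a='up', event='c', s2=(0, 1), actions=['up', 'down'])
```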
Extracting structured data from invoices
Title | Extracting structured data from invoices |
Authors | Xavier Holt, Andrew Chisholm |
Abstract | Business documents encode a wealth of information in a format tailored to human consumption, i.e. aesthetically dispersed natural language text, graphics, and tables. We address the task of extracting key fields (e.g. the amount due on an invoice) from a wide variety of potentially unseen document formats. In contrast to traditional template-driven extraction systems, we introduce a content-driven machine-learning approach which is both robust to noise and generalises to unseen document formats. In a comparison of our approach with alternative invoice extraction systems, we observe an absolute accuracy gain of 20% across compared fields, and a 25%–94% reduction in extraction latency. |
Tasks | Optical Character Recognition |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/U18-1006/ |
PWC | https://paperswithcode.com/paper/extracting-structured-data-from-invoices |
Repo | |
Framework | |
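The abstract describes the approach only at a high level; as a hedged illustration of "content-driven" (template-free) extraction, here is a toy token-level field classifier. The features, labels, and data are invented for the example and are not the authors' system.

```python
from sklearn.feature_extraction.text import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tok):
    """Content-driven features: what a token looks like, not where a
    template says it should be (illustrative feature set)."""
    return {
        "lower": tok.lower(),
        "is_amount": tok.replace(".", "").replace(",", "").isdigit(),
        "has_currency": any(c in tok for c in "$€£"),
        "shape": "".join("9" if c.isdigit() else "x" for c in tok),
    }

# Toy training data: (token, field label) pairs from labelled invoices.
train = [("$1,240.00", "amount_due"), ("2018-12-01", "due_date"),
         ("INV-00492", "invoice_number"), ("Acme", "other")]
X = [token_features(t) for t, _ in train]
y = [label for _, label in train]
clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([token_features("$99.50")]))  # -> ['amount_due'] (toy)
```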
Interpretation of Implicit Conditions in Database Search Dialogues
Title | Interpretation of Implicit Conditions in Database Search Dialogues |
Authors | Shunya Fukunaga, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono, Tetsuro Takahashi |
Abstract | Targeting the database search dialogue, we propose to utilise information in user utterances that does not directly mention a database (DB) field of the backend system but is nonetheless useful for constructing database queries. We call this kind of information implicit conditions. Interpreting implicit conditions makes a dialogue system more natural and efficient in communicating with humans. We formalised the interpretation of implicit conditions as classifying user utterances by the related DB field while simultaneously identifying the evidence for that classification. Introducing this new task is one of the contributions of this paper. We implemented two models for this task: an SVM-based model and an RCNN-based model. Through an evaluation using a corpus of simulated dialogues between a real estate agent and a customer, we found that the SVM-based model performed better than the RCNN-based model. |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1040/ |
PWC | https://paperswithcode.com/paper/interpretation-of-implicit-conditions-in |
Repo | |
Framework | |
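The SVM-based model is not specified in detail; a minimal baseline in the same spirit (TF-IDF features with a linear SVM, with the paper's evidence-identification component omitted) might look like the sketch below, with invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for (user utterance, related DB field) pairs; the real
# corpus is simulated real-estate dialogues, which we do not have here.
utterances = ["somewhere quiet for the kids", "I bike to work every day",
              "we need a spacious kitchen", "close to the station please"]
fields = ["environment", "commute", "kitchen", "location"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(utterances, fields)
print(model.predict(["a big kitchen would be great"]))  # -> ['kitchen'] (toy)
```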
Investigating the importance of linguistic complexity features across different datasets related to language learning
Title | Investigating the importance of linguistic complexity features across different datasets related to language learning |
Authors | Ildikó Pilán, Elena Volodina |
Abstract | We present the results of our investigations aimed at identifying the most informative linguistic complexity features for classifying language learning levels in three different datasets. The datasets vary across two dimensions: the size of the instances (texts vs. sentences) and the language learning skill they involve (reading comprehension texts vs. texts written by learners themselves). We present a subset of the most predictive features for each dataset, taking into consideration significant differences in their per-class mean values, and show that these subsets lead not only to simpler models but also to improved classification performance. Furthermore, we pinpoint fourteen central features that are good predictors regardless of the size of the linguistic unit analyzed or the skills involved, and which include both morpho-syntactic and lexical dimensions. |
Tasks | Reading Comprehension |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4606/ |
PWC | https://paperswithcode.com/paper/investigating-the-importance-of-linguistic |
Repo | |
Framework | |
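A sketch of the selection idea in the abstract: rank features by how significantly their per-class means differ, then keep the top $k$. One-way ANOVA stands in for the paper's (unspecified) statistical test; the default $k=14$ merely echoes the fourteen central features mentioned.

```python
import numpy as np
from scipy.stats import f_oneway

def select_features(X, y, k=14):
    """Keep the k features whose per-class means differ most significantly,
    scored by the one-way ANOVA F statistic (an illustrative stand-in).

    X: (n_samples, n_features) feature matrix; y: (n_samples,) class labels.
    """
    classes = np.unique(y)
    f_scores = [f_oneway(*(X[y == c, j] for c in classes)).statistic
                for j in range(X.shape[1])]
    return np.argsort(f_scores)[::-1][:k]   # indices of the top-k features
```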
Proceedings of the First Workshop on Multilingual Surface Realisation
Title | Proceedings of the First Workshop on Multilingual Surface Realisation |
Authors | |
Abstract | |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3600/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-first-workshop-on-10 |
Repo | |
Framework | |