Paper Group NANR 227
Theoretical properties of the global optimizer of two-layer Neural Network
Title | Theoretical properties of the global optimizer of two-layer Neural Network |
Authors | Digvijay Boob, Guanghui Lan |
Abstract | In this paper, we study the problem of optimizing a two-layer artificial neural network that best fits a training dataset. We look at this problem in the setting where the number of parameters is greater than the number of sampled points. We show that for a wide class of differentiable activation functions (this class contains most nonlinear functions and excludes piecewise linear functions), arbitrary first-order optimal solutions satisfy global optimality provided the hidden layer is non-singular. We essentially show that these non-singular hidden layer matrices satisfy a "good" property for this broad class of activation functions. The techniques involved in proving this result inspire a new algorithmic framework in which, between two gradient steps on the hidden layer, we add a stochastic gradient descent (SGD) step on the output layer. In this new framework, we extend our earlier result and show that for all finite iterations the hidden layer satisfies the "good" property mentioned earlier, thereby partially explaining the success of noisy gradient methods and addressing the data-independence issue of our earlier result. Both of these results extend easily from square hidden layer matrices to flat (wide) ones. The results also apply to networks with more than one hidden layer, provided all inner hidden layers are arbitrary but non-singular, all activations come from the given class of differentiable functions, and optimization is performed only with respect to the outermost hidden layer. Separately, we also study the smoothness properties of the objective function and show that it is Lipschitz smooth, i.e., its gradients do not change sharply. We use these smoothness properties to guarantee asymptotic convergence at rate $O(1/\text{number of iterations})$ to a first-order optimal solution. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=BkIkkseAZ |
PDF | https://openreview.net/pdf?id=BkIkkseAZ |
PWC | https://paperswithcode.com/paper/theoretical-properties-of-the-global-1 |
Repo | |
Framework | |
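The interleaved scheme described in the abstract lends itself to a compact illustration. Below is a minimal sketch, assuming squared loss, a tanh activation (a member of the differentiable class studied), full-batch gradients for the hidden layer, and illustrative step sizes; it is not the authors' implementation.

```python
import numpy as np

def interleaved_training(X, y, d_hidden, steps=1000, eta_w=1e-2, eta_v=1e-2, seed=0):
    """Alternate a full-batch gradient step on the hidden layer W with a
    single-sample SGD step on the output layer v, for f(x) = v . tanh(W x)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d_hidden, d))  # hidden layer (non-singularity is what the analysis tracks)
    v = rng.standard_normal(d_hidden)       # output layer

    for _ in range(steps):
        # Gradient step on the hidden layer.
        H = np.tanh(X @ W.T)                # (n, d_hidden) hidden activations
        r = H @ v - y                       # residuals of the current fit
        G = ((r[:, None] * (1.0 - H**2)) * v).T @ X / n  # dL/dW for squared loss
        W -= eta_w * G

        # SGD step on the output layer (single random sample).
        i = rng.integers(n)
        h = np.tanh(W @ X[i])
        v -= eta_v * (h @ v - y[i]) * h
    return W, v
```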
UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes
Title | UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes |
Authors | Milton King, Ali Hakimi Parizi, Paul Cook |
Abstract | In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model. |
Tasks | Semantic Textual Similarity, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1168/ |
PWC | https://paperswithcode.com/paper/unbnlp-at-semeval-2018-task-10-evaluating |
Repo | |
Framework | |
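The winning co-occurrence approach is simple enough to sketch. A minimal version, assuming sentence-level co-occurrence counts and a plain greater-than decision rule (the paper's exact thresholding is not specified in the abstract):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count how often each unordered word pair co-occurs in a sentence."""
    counts = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts

def cooc(counts, w1, w2):
    return counts[tuple(sorted((w1, w2)))]

def is_discriminative(counts, word1, word2, attribute):
    """Predict that `attribute` discriminates word1 from word2 when it
    co-occurs more often with word1 (an illustrative rule, not the
    paper's tuned threshold)."""
    return cooc(counts, word1, attribute) > cooc(counts, word2, attribute)

# Toy usage:
corpus = [["the", "banana", "is", "yellow"], ["a", "ripe", "banana"],
          ["the", "apple", "is", "red"], ["yellow", "banana", "bread"]]
counts = cooccurrence_counts(corpus)
print(is_discriminative(counts, "banana", "apple", "yellow"))  # True
```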
On Hapax Legomena and Morphological Productivity
Title | On Hapax Legomena and Morphological Productivity |
Authors | Janet Pierrehumbert, Ramon Granell |
Abstract | Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics. The same challenge reappears in natural language processing in the context of handling words that were not seen in the training set (out-of-vocabulary, or OOV, words). Prior research showed that a good indicator of the productivity of a morpheme is the number of words involving it that occur exactly once (the *hapax legomena*). A technical connection was adduced between this result and Good-Turing smoothing, which assigns probability mass to unseen events on the basis of the simplifying assumption that word frequencies are stationary. In a large-scale study of 133 affixes in Wikipedia, we develop evidence that success in fact depends on tapping the frequency range in which the assumptions of Good-Turing are violated. |
Tasks | |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-5814/ |
PWC | https://paperswithcode.com/paper/on-hapax-legomena-and-morphological |
Repo | |
Framework | |
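For reference, the hapax-based productivity measure the abstract builds on (Baayen's category-conditioned productivity) and its Good-Turing counterpart, in standard notation:

```latex
% Hapax-based productivity of an affix a (Baayen):
%   n_1(a): number of hapax legomena (frequency-1 word types) containing a
%   N(a):   total token count of words containing a
\mathcal{P}(a) = \frac{n_1(a)}{N(a)}

% Good-Turing: the total probability mass reserved for unseen events,
% estimated from the corpus-wide hapax count N_1 and token count N:
p_0 \approx \frac{N_1}{N}
```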
Toward learning better metrics for sequence generation training with policy gradient
Title | Toward learning better metrics for sequence generation training with policy gradient |
Authors | Joji Toyama, Yusuke Iwasawa, Kotaro Nakayama, Yutaka Matsuo |
Abstract | Designing a metric manually for unsupervised sequence generation tasks, such as text generation, is essentially difficult. In such a situation, learning a metric of a sequence from data is one possible solution. A previous study, SeqGAN, proposed a framework for unsupervised sequence generation in which a metric is learned from data and a generator is optimized with respect to the learned metric with policy gradient, inspired by generative adversarial nets (GANs) and reinforcement learning. In this paper, we make two proposals to learn a better metric than SeqGAN's: a partial reward function and expert-based reward function training. The partial reward function is a reward function for a partial sequence of a certain length; SeqGAN employs a reward function for completed sequences only. By combining long-scale and short-scale partial reward functions, we expect a learned metric to be able to evaluate the partial correctness as well as the coherence of a sequence as a whole. In expert-based reward function training, a reward function is trained to discriminate between an expert (or true) sequence and a fake sequence produced by editing an expert sequence. Expert-based reward function training is not a GAN framework, which makes the optimization of the generator easier. We examine the effect of the partial reward function and expert-based reward function training on synthetic data and real text data, and show improvements over SeqGAN and a model trained with MLE. Specifically, whereas SeqGAN gains a 0.42 improvement in NLL over MLE on synthetic data, our best model gains a 3.02 improvement, and whereas SeqGAN gains a 0.029 improvement in BLEU over MLE, our best model gains a 0.250 improvement. |
Tasks | Text Generation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=r1kP7vlRb |
PDF | https://openreview.net/pdf?id=r1kP7vlRb |
PWC | https://paperswithcode.com/paper/toward-learning-better-metrics-for-sequence |
Repo | |
Framework | |
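A hedged sketch of the partial-reward idea: rewards from discriminators over windows of several lengths are mixed into a single policy-gradient objective. The window lengths, mixing weights, and whole-sequence credit assignment below are illustrative simplifications, not the paper's training procedure.

```python
import torch

def policy_gradient_loss(log_probs, tokens, reward_fns, weights):
    """REINFORCE-style loss in which the reward for a generated sequence is
    a weighted sum of partial reward functions, each scoring fixed-length
    windows of the sequence.

    log_probs : 1-D tensor, log pi(token_t | prefix) for each sampled token
    tokens    : list of generated token ids
    reward_fns: {window_length: fn(window) -> float in [0, 1]}
    weights   : one mixing weight per reward function
    """
    T = len(tokens)
    reward = 0.0
    for (length, fn), w in zip(reward_fns.items(), weights):
        windows = [tokens[i:i + length] for i in range(max(T - length + 1, 1))]
        reward += w * sum(fn(win) for win in windows) / len(windows)
    # Maximise expected reward: minimise -(reward) * log-likelihood.
    return -reward * log_probs.sum()
```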
The DipInfo-UniTo system for SRST 2018
Title | The DipInfo-UniTo system for SRST 2018 |
Authors | Valerio Basile, Alessandro Mazzei |
Abstract | This paper describes the system developed by the DipInfo-UniTo team to participate in the shallow track of the Surface Realization Shared Task 2018. The system employs two separate neural networks with different architectures to predict word ordering and morphological inflection independently of each other. The UniTO realizer is language independent, and its simple architecture placed it in the middle of the final ranking of the shared task. |
Tasks | Morphological Inflection, Text Generation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3609/ |
PWC | https://paperswithcode.com/paper/the-dipinfo-unito-system-for-srst-2018 |
Repo | |
Framework | |
Scalar Posterior Sampling with Applications
Title | Scalar Posterior Sampling with Applications |
Authors | Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis |
Abstract | We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode-switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm are satisfied by a sensible parameterization for a large class of problems in sequential recommendations. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7995-scalar-posterior-sampling-with-applications |
PDF | http://papers.nips.cc/paper/7995-scalar-posterior-sampling-with-applications.pdf |
PWC | https://paperswithcode.com/paper/scalar-posterior-sampling-with-applications |
Repo | |
Framework | |
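A minimal sketch of the deterministic-schedule idea behind DS-PSRL: the posterior over models is resampled only at pre-set, data-independent times. A doubling schedule is used here purely as an example; `posterior`, `solve`, and the environment interface are assumed names, not the paper's API.

```python
def ds_psrl(env, posterior, solve, horizon=10_000):
    """PSRL with a deterministic episode-switching schedule: a new model is
    sampled from the posterior only at pre-set times, independent of the
    observed data, and the policy for that model is followed in between."""
    next_switch, episode_len = 1, 1
    policy, state = None, env.reset()
    for t in range(1, horizon + 1):
        if t == next_switch:                      # deterministic switch time
            model = posterior.sample()            # Thompson-style draw
            policy = solve(model)                 # plan under the sampled model
            episode_len *= 2                      # e.g. doubling episodes
            next_switch = t + episode_len
        action = policy(state)
        state, reward = env.step(action)          # assumed (state, reward) API
        posterior.update(state, action, reward)   # Bayesian belief update
```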
A Robust Approach to Sequential Information Theoretic Planning
Title | A Robust Approach to Sequential Information Theoretic Planning |
Authors | Sue Zheng, Jason Pacheco, John Fisher |
Abstract | In many sequential planning applications, a natural approach to generating high-quality plans is to maximize an information reward such as mutual information (MI). Unfortunately, MI lacks a closed form in all but trivial models, and so must be estimated. In applications where the cost of plan execution is expensive, one desires planning estimates that admit theoretical guarantees. Through the use of robust M-estimators we obtain bounds on the absolute deviation of estimated MI. Moreover, we propose a sequential algorithm which integrates inference and planning by maximally reusing particles in each stage. We validate the utility of using robust estimators in the sequential approach on a Gaussian Markov Random Field wherein information measures have a closed form. Lastly, we demonstrate the benefits of our integrated approach in the context of sequential experiment design for inferring causal regulatory networks from gene expression levels. Our method shows improvements over a recent method which selects intervention experiments based on the same MI objective. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2372 |
PDF | http://proceedings.mlr.press/v80/zheng18b/zheng18b.pdf |
PWC | https://paperswithcode.com/paper/a-robust-approach-to-sequential-information |
Repo | |
Framework | |
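For reference, the MI planning objective the abstract refers to, in standard notation (the plan-subscripted distributions are our shorthand):

```latex
% Mutual information between latent state x and future observation y
% under a candidate plan a:
I_a(x; y) = \mathbb{E}_{p_a(x,y)}\!\left[\log \frac{p_a(x,y)}{p_a(x)\, p_a(y)}\right]

% Information-theoretic planning selects the plan maximising a particle
% estimate \hat{I}_a of this quantity (robustified via M-estimators):
a^{*} = \operatorname*{arg\,max}_{a} \hat{I}_a(x; y)
```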
Towards an ISO Standard for the Annotation of Quantification
Title | Towards an ISO Standard for the Annotation of Quantification |
Authors | Harry Bunt, James Pustejovsky, Kiyong Lee |
Abstract | |
Tasks | Question Answering |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1282/ |
PWC | https://paperswithcode.com/paper/towards-an-iso-standard-for-the-annotation-of |
Repo | |
Framework | |
Towards Human-Level License Plate Recognition
Title | Towards Human-Level License Plate Recognition |
Authors | Jiafan Zhuang, Saihui Hou, Zilei Wang, Zheng-Jun Zha |
Abstract | License plate recognition (LPR) is a fundamental component of various intelligent transport systems and is expected to be both accurate and efficient. In this paper, we propose a novel LPR framework consisting of semantic segmentation and character counting, towards achieving human-level performance. Benefiting from this innovative structure, our method can recognize a whole license plate at once, rather than performing character detection or a sliding-window search followed by per-character recognition. Moreover, our method achieves higher recognition accuracy by exploiting global information more effectively and avoiding sensitive character detection, and it saves time by eliminating one-by-one character recognition. Finally, we experimentally verify the effectiveness of the proposed method on two public datasets (AOLP and Media Lab) and our own License Plate Dataset. The results demonstrate that our method significantly outperforms previous state-of-the-art methods, achieving accuracies of more than 99% in almost all settings. |
Tasks | License Plate Recognition, Semantic Segmentation |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Jiafan_Zhuang_Towards_Human-Level_License_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Jiafan_Zhuang_Towards_Human-Level_License_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/towards-human-level-license-plate-recognition |
Repo | |
Framework | |
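One plausible reading of the segmentation-plus-counting pipeline, sketched with a CTC-like collapse rule standing in for the paper's character-counting branch; the actual architecture is not reproduced here.

```python
import numpy as np

def decode_plate(seg_logits, blank=0):
    """Turn a per-pixel semantic segmentation of a plate image into a
    character sequence: (1) read out the dominant non-background class per
    pixel column, then (2) collapse consecutive repeats (illustrative only).

    seg_logits: (num_classes, H, W) per-pixel class scores; class `blank`
    is the background.
    """
    pixel_classes = seg_logits.argmax(axis=0)         # (H, W) label map
    cols = []
    for w in range(pixel_classes.shape[1]):
        col = pixel_classes[:, w]
        fg = col[col != blank]                        # foreground labels only
        cols.append(np.bincount(fg).argmax() if fg.size else blank)
    # Collapse runs of the same class and drop the background.
    out, prev = [], blank
    for c in cols:
        if c != blank and c != prev:
            out.append(int(c))
        prev = c
    return out
```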
Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order
Title | Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order |
Authors | Vladimir Braverman, Stephen Chestnut, Robert Krauthgamer, Yi Li, David Woodruff, Lin Yang |
Abstract | A central problem in mining massive data streams is characterizing which functions of an underlying frequency vector can be approximated efficiently. Given the prevalence of large scale linear algebra problems in machine learning, recently there has been considerable effort in extending this data stream problem to that of estimating functions of a matrix. This setting generalizes classical problems to the analogous ones for matrices. For example, instead of estimating frequent-item counts, we now wish to estimate “frequent-direction” counts. A related example is to estimate norms, which now correspond to estimating a vector norm on the singular values of the matrix. Despite recent efforts, the current understanding of such matrix problems is considerably weaker than that for vector problems. We study a number of aspects of estimating matrix norms in a stream that have not previously been considered: (1) multi-pass algorithms, (2) algorithms that see the underlying matrix one row at a time, and (3) time-efficient algorithms. Our multi-pass and row-order algorithms use less memory than what is provably required in the single-pass and entrywise-update models, and thus give separations between these models (in terms of memory). Moreover, all of our algorithms are considerably faster than previous ones. We also prove a number of lower bounds and obtain, for instance, a near-complete characterization of the memory required of row-order algorithms for estimating Schatten $p$-norms of sparse matrices. We complement our results with numerical experiments. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2069 |
PDF | http://proceedings.mlr.press/v80/braverman18a/braverman18a.pdf |
PWC | https://paperswithcode.com/paper/matrix-norms-in-data-streams-faster-multi |
Repo | |
Framework | |
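For reference, the Schatten $p$-norms at the centre of the abstract's row-order lower bounds, i.e. the vector $\ell_p$ norm applied to the singular values:

```latex
% Schatten p-norm of A with singular values sigma_1 >= ... >= sigma_r:
\|A\|_{S_p} = \Big( \sum_{i=1}^{r} \sigma_i(A)^{p} \Big)^{1/p}
% Special cases: p = 1 is the trace (nuclear) norm, p = 2 the Frobenius
% norm, and p -> infinity the operator norm.
```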
Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning
Title | Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning |
Authors | Rodrigo Toro Icarte, Toryn Klassen, Richard Valenzano, Sheila McIlraith |
Abstract | In this paper we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to hierarchical reinforcement learning methods, which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space. |
Tasks | Hierarchical Reinforcement Learning, Q-Learning |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2454 |
PDF | http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf |
PWC | https://paperswithcode.com/paper/using-reward-machines-for-high-level-task |
Repo | |
Framework | |
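A minimal tabular sketch of the reward machine and the QRM update described above. The toy machine and all names are illustrative; the key point is that a single environment transition produces an off-policy update for every reward machine state.

```python
from collections import defaultdict

class RewardMachine:
    """Finite state machine over abstract events: transitions maps
    (state, event) -> (next_state, reward); unlisted events self-loop with 0."""
    def __init__(self, transitions, initial=0):
        self.transitions, self.initial = transitions, initial
    def step(self, u, event):
        return self.transitions.get((u, event), (u, 0.0))

def qrm_update(Q, rm, s, a, event, s2, actions, alpha=0.1, gamma=0.9):
    """One QRM step: the transition (s, a, s2), labelled with `event`,
    yields a Q-learning update for EVERY reward machine state u, each
    against its own reward r and successor RM state u2."""
    states = {u for (u, _) in rm.transitions} | {rm.initial}
    for u in states:
        u2, r = rm.step(u, event)
        target = r + gamma * max(Q[(u2, s2, b)] for b in actions)
        Q[(u, s, a)] += alpha * (target - Q[(u, s, a)])

# Toy machine: observe event 'c' (got coffee), then 'd' (delivered) -> reward 1.
rm = RewardMachine({(0, 'c'): (1, 0.0), (1, 'd'): (2, 1.0)})
Q = defaultdict(float)
qrm_update(Q, rm, s=(0, 0), a='up', event='c', s2=(0, 1), actions=['up', 'down'])
```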
Extracting structured data from invoices
Title | Extracting structured data from invoices |
Authors | Xavier Holt, Andrew Chisholm |
Abstract | Business documents encode a wealth of information in a format tailored to human consumption, i.e. aesthetically dispersed natural language text, graphics, and tables. We address the task of extracting key fields (e.g. the amount due on an invoice) from a wide variety of potentially unseen document formats. In contrast to traditional template-driven extraction systems, we introduce a content-driven machine-learning approach which is both robust to noise and generalises to unseen document formats. In a comparison of our approach with alternative invoice extraction systems, we observe an absolute accuracy gain of 20% across compared fields, and a 25%–94% reduction in extraction latency. |
Tasks | Optical Character Recognition |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/U18-1006/ |
PWC | https://paperswithcode.com/paper/extracting-structured-data-from-invoices |
Repo | |
Framework | |
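The abstract describes the approach only at a high level; as a hedged illustration of "content-driven" (template-free) extraction, here is a toy token-level field classifier. The features, labels, and data are invented for the example and are not the authors' system.

```python
from sklearn.feature_extraction.text import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tok):
    """Content-driven features: what a token looks like, not where a
    template says it should be (illustrative feature set)."""
    return {
        "lower": tok.lower(),
        "is_amount": tok.replace(".", "").replace(",", "").isdigit(),
        "has_currency": any(c in tok for c in "$€£"),
        "shape": "".join("9" if c.isdigit() else "x" for c in tok),
    }

# Toy training data: (token, field label) pairs from labelled invoices.
train = [("$1,240.00", "amount_due"), ("2018-12-01", "due_date"),
         ("INV-00492", "invoice_number"), ("Acme", "other")]
X = [token_features(t) for t, _ in train]
y = [label for _, label in train]
clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([token_features("$99.50")]))  # -> ['amount_due'] (toy)
```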
Interpretation of Implicit Conditions in Database Search Dialogues
Title | Interpretation of Implicit Conditions in Database Search Dialogues |
Authors | Shunya Fukunaga, Hitoshi Nishikawa, Takenobu Tokunaga, Hikaru Yokono, Tetsuro Takahashi |
Abstract | Targeting the database search dialogue, we propose to utilise information in user utterances that does not directly mention a database (DB) field of the backend system but is nonetheless useful for constructing database queries. We call this kind of information implicit conditions. Interpreting implicit conditions makes a dialogue system more natural and efficient in communicating with humans. We formalised the interpretation of implicit conditions as classifying user utterances by the related DB field while simultaneously identifying the evidence for that classification. Introducing this new task is one of the contributions of this paper. We implemented two models for this task: an SVM-based model and an RCNN-based model. Through an evaluation using a corpus of simulated dialogues between a real estate agent and a customer, we found that the SVM-based model performed better than the RCNN-based model. |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1040/ |
PWC | https://paperswithcode.com/paper/interpretation-of-implicit-conditions-in |
Repo | |
Framework | |
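The SVM-based model is not specified in detail; a minimal baseline in the same spirit (TF-IDF features with a linear SVM, with the paper's evidence-identification component omitted) might look like the sketch below, with invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for (user utterance, related DB field) pairs; the real
# corpus is simulated real-estate dialogues, which we do not have here.
utterances = ["somewhere quiet for the kids", "I bike to work every day",
              "we need a spacious kitchen", "close to the station please"]
fields = ["environment", "commute", "kitchen", "location"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(utterances, fields)
print(model.predict(["a big kitchen would be great"]))  # -> ['kitchen'] (toy)
```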
Investigating the importance of linguistic complexity features across different datasets related to language learning
Title | Investigating the importance of linguistic complexity features across different datasets related to language learning |
Authors | Ildikó Pilán, Elena Volodina |
Abstract | We present the results of our investigations aimed at identifying the most informative linguistic complexity features for classifying language learning levels in three different datasets. The datasets vary across two dimensions: the size of the instances (texts vs. sentences) and the language learning skill they involve (reading comprehension texts vs. texts written by learners themselves). We present a subset of the most predictive features for each dataset, taking into consideration significant differences in their per-class mean values, and show that these subsets lead not only to simpler models but also to improved classification performance. Furthermore, we pinpoint fourteen central features that are good predictors regardless of the size of the linguistic unit analyzed or the skills involved, and which include both morpho-syntactic and lexical dimensions. |
Tasks | Reading Comprehension |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4606/ |
PWC | https://paperswithcode.com/paper/investigating-the-importance-of-linguistic |
Repo | |
Framework | |
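A sketch of the selection idea in the abstract: rank features by how significantly their per-class means differ, then keep the top $k$. One-way ANOVA stands in for the paper's (unspecified) statistical test; the default $k=14$ merely echoes the fourteen central features mentioned.

```python
import numpy as np
from scipy.stats import f_oneway

def select_features(X, y, k=14):
    """Keep the k features whose per-class means differ most significantly,
    scored by the one-way ANOVA F statistic (an illustrative stand-in).

    X: (n_samples, n_features) feature matrix; y: (n_samples,) class labels.
    """
    classes = np.unique(y)
    f_scores = [f_oneway(*(X[y == c, j] for c in classes)).statistic
                for j in range(X.shape[1])]
    return np.argsort(f_scores)[::-1][:k]   # indices of the top-k features
```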
Proceedings of the First Workshop on Multilingual Surface Realisation
Title | Proceedings of the First Workshop on Multilingual Surface Realisation |
Authors | |
Abstract | |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3600/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-first-workshop-on-10 |
Repo | |
Framework | |