October 16, 2019

2413 words 12 mins read

Paper Group NAWR 22

Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions. Fully Statistical Neural Belief Tracking. An Empirical Study of Building a Strong Baseline for Constituency Parsing. Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem. Fusing Document, Collection and L …

Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

Title Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
Authors Junru Wu, Yue Wang, Zhenyu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin
Abstract The current trend of pushing CNNs deeper with convolutions has created a pressing demand for higher compression gains on CNNs where convolutions dominate the computation and parameter count (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions by applying k-means clustering to the weights; compression is achieved through weight-sharing, by recording only $K$ cluster centers and the per-weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to the $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate the energy consumption of CNN hardware implementations, whose estimates are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means
Tasks
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2219
PDF http://proceedings.mlr.press/v80/wu18h/wu18h.pdf
PWC https://paperswithcode.com/paper/deep-k-means-re-training-and-parameter
Repo https://github.com/Sandbox3aster/Deep-K-Means
Framework pytorch
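
To make the weight-sharing idea above concrete, here is a minimal sketch of the quantization step: cluster a convolutional layer's weights with k-means and replace each weight by its cluster center, so only the $K$ centers and per-weight indexes need to be stored. This illustrates the compression mechanics only; the paper's spectrally relaxed k-means regularization during re-training is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(weights: np.ndarray, k: int = 16):
    """Cluster weights with k-means; return quantized weights, centers, indexes."""
    flat = weights.reshape(-1, 1)                  # treat each weight as a 1-D point
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(flat)
    centers = km.cluster_centers_.ravel()          # the K stored cluster centers
    indexes = km.labels_                           # per-weight assignment index
    quantized = centers[indexes].reshape(weights.shape)
    return quantized, centers, indexes

# Toy example: quantize a random bank of 64 conv filters (32 channels, 3x3)
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_q, centers, idx = kmeans_quantize(w, k=16)
print(w_q.shape, centers.shape)  # storage: 16 floats + one small index per weight
```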

Fully Statistical Neural Belief Tracking

Title Fully Statistical Neural Belief Tracking
Authors Nikola Mrkšić, Ivan Vulić
Abstract This paper proposes an improvement to the existing data-driven Neural Belief Tracking (NBT) framework for Dialogue State Tracking (DST). The existing NBT model uses a hand-crafted belief state update mechanism which involves an expensive manual retuning step whenever the model is deployed to a new dialogue domain. We show that this update mechanism can be learned jointly with the semantic decoding and context modelling parts of the NBT model, eliminating the last rule-based module from this DST framework. We propose two different statistical update mechanisms and show that dialogue dynamics can be modelled with a very small number of additional model parameters. In our DST evaluation over three languages, we show that this model achieves competitive performance and provides a robust framework for building resource-light DST models.
Tasks Dialogue Management, Dialogue State Tracking, Spoken Language Understanding, Word Embeddings
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-2018/
PDF https://www.aclweb.org/anthology/P18-2018
PWC https://paperswithcode.com/paper/fully-statistical-neural-belief-tracking-1
Repo https://github.com/nmrksic/neural-belief-tracker
Framework tf
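
The abstract above replaces the NBT's hand-crafted belief update with a learned one that adds very few parameters. The sketch below shows one plausible form of such an update, a learned convex combination of the previous belief and the current turn's scores; the exact parameterization is an illustrative assumption, not necessarily either of the paper's two mechanisms.

```python
import torch
import torch.nn as nn

class LearnedBeliefUpdate(nn.Module):
    """Hypothetical one-parameter belief update, learned jointly with the model."""
    def __init__(self):
        super().__init__()
        self.logit_lambda = nn.Parameter(torch.zeros(1))  # single extra parameter

    def forward(self, prev_belief, turn_scores):
        lam = torch.sigmoid(self.logit_lambda)        # mixing weight in (0, 1)
        mixed = lam * turn_scores + (1 - lam) * prev_belief
        return torch.softmax(mixed, dim=-1)           # renormalize over slot values

update = LearnedBeliefUpdate()
prev = torch.softmax(torch.randn(1, 7), dim=-1)       # belief over 7 slot values
scores = torch.randn(1, 7)                            # this turn's NBT scores
print(update(prev, scores).shape)                     # torch.Size([1, 7])
```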

An Empirical Study of Building a Strong Baseline for Constituency Parsing

Title An Empirical Study of Building a Strong Baseline for Constituency Parsing
Authors Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata
Abstract This paper investigates the construction of a strong baseline based on general-purpose sequence-to-sequence models for constituency parsing. We incorporate several techniques that were mainly developed for natural language generation tasks, e.g., machine translation and summarization, and demonstrate that the sequence-to-sequence model achieves the current top-notch parsers' performance (almost) without requiring any explicit task-specific knowledge or architecture for constituency parsing.
Tasks Abstractive Text Summarization, Constituency Parsing, Machine Translation, Text Generation
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-2097/
PDF https://www.aclweb.org/anthology/P18-2097
PWC https://paperswithcode.com/paper/an-empirical-study-of-building-a-strong
Repo https://github.com/nttcslab-nlp/strong_s2s_baseline_parser
Framework none
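
Casting constituency parsing as sequence-to-sequence learning requires linearizing trees into token sequences a general-purpose model can emit. The sketch below shows a simple bracket linearization; the paper's exact output vocabulary (e.g., how words or POS tags are handled) may differ, so treat this as an illustrative assumption.

```python
def linearize(tree):
    """Flatten (label, children) trees into bracketed token sequences."""
    label, children = tree
    parts = [f"({label}"]
    for child in children:
        parts.append(child if isinstance(child, str) else linearize(child))
    parts.append(")")
    return " ".join(parts)

t = ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
print(linearize(t))  # (S (NP John ) (VP (V sleeps ) ) )
```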

Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem

Title Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem
Authors Sven Buechel, Udo Hahn
Abstract Predicting the emotional value of lexical items is a well-known problem in sentiment analysis. While research focused on polarity for quite a long time, this early focus has since shifted to more expressive emotion representation models (such as Basic Emotions or Valence-Arousal-Dominance). This change has resulted in a proliferation of heterogeneous formats and, in parallel, often small-sized, non-interoperable resources (lexicons and corpus annotations). In particular, the limited size of these resources has hampered the application of deep learning methods in this area, as such methods typically require large amounts of input data. We here present a solution to this language data bottleneck by rephrasing word emotion induction as a multi-task learning problem. In this approach, the prediction of each independent emotion dimension is treated as an individual task, and hidden layers are shared between these dimensions. We investigate whether multi-task learning is more advantageous than single-task learning for emotion prediction by comparing our model against a wide range of alternative emotion and polarity induction methods across 9 typologically diverse languages and a total of 15 conditions. Our model outperforms every one of them. Against all odds, the proposed deep learning approach yields the largest gain on the smallest data sets, composed of merely one thousand samples.
Tasks Multi-Task Learning, Sentiment Analysis
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1173/
PDF https://www.aclweb.org/anthology/N18-1173
PWC https://paperswithcode.com/paper/word-emotion-induction-for-multiple-languages
Repo https://github.com/JULIELab/wordEmotions
Framework tf
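
The core architectural idea above, shared hidden layers with one output head per emotion dimension, can be sketched in a few lines. Layer sizes and the choice of PyTorch are illustrative assumptions; the paper's released code (see the repo above) is TensorFlow-based.

```python
import torch
import torch.nn as nn

class MultiTaskEmotionNet(nn.Module):
    """Shared trunk, one regression head per emotion dimension (e.g., V/A/D)."""
    def __init__(self, embed_dim=300, hidden=256,
                 dims=("valence", "arousal", "dominance")):
        super().__init__()
        self.shared = nn.Sequential(                 # layers shared by all tasks
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({d: nn.Linear(hidden, 1) for d in dims})

    def forward(self, word_vecs):
        h = self.shared(word_vecs)
        return {d: head(h).squeeze(-1) for d, head in self.heads.items()}

model = MultiTaskEmotionNet()
out = model(torch.randn(8, 300))   # 8 word embeddings in, one score per task out
print({d: tuple(v.shape) for d, v in out.items()})
```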

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification

Title Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification
Authors Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis
Abstract Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model, in which each document is represented by a graph that encodes relationships between its different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria.
Tasks Sentiment Analysis, Text Categorization, Text Classification, Word Embeddings
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-1707/
PDF https://www.aclweb.org/anthology/W18-1707
PWC https://paperswithcode.com/paper/fusing-document-collection-and-label-graph
Repo https://github.com/y3nk0/Graph-Based-TC
Framework tf
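
A minimal Graph-of-Words sketch, assuming a sliding co-occurrence window and degree centrality as the node-weighting criterion (the paper evaluates several centrality measures as well as the embedding-enriched variants, which are not shown here):

```python
import networkx as nx

def graph_of_words(tokens, window=3):
    """Connect terms that co-occur within a sliding window."""
    g = nx.Graph()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if term != other:
                g.add_edge(term, other)
    return g

doc = "the quick brown fox jumps over the lazy dog".split()
g = graph_of_words(doc)
weights = nx.degree_centrality(g)        # centrality replaces raw term frequency
print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```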

Word-like character n-gram embedding

Title Word-like character n-gram embedding
Authors Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira
Abstract We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams. Our method is an extension of the recently proposed segmentation-free word embedding, which directly embeds frequent character n-grams from a raw corpus. However, its n-gram vocabulary tends to contain too many non-word n-grams. We solve this problem by introducing the idea of expected word frequency. Compared to previously proposed methods, our method can embed more words, including words that are not in a given basic word dictionary. Since our method does not rely on word segmentation with rich word dictionaries, it is especially effective when the corpus text is in an unsegmented language and contains many neologisms and informal words (e.g., a Chinese SNS dataset). Our experimental results on Sina Weibo (a Chinese microblog service) and Twitter show that the proposed method can embed more words and improve the performance of downstream tasks.
Tasks Word Embeddings
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6120/
PDF https://www.aclweb.org/anthology/W18-6120
PWC https://paperswithcode.com/paper/word-like-character-n-gram-embedding
Repo https://github.com/kdrl/WNE
Framework none
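
The starting point of segmentation-free embedding is a vocabulary of frequent character n-grams collected directly from raw, unsegmented text. The sketch below shows that counting step only; the paper's key contribution, filtering this vocabulary by expected word frequency so that non-word n-grams are dropped, is not reproduced here.

```python
from collections import Counter

def frequent_char_ngrams(text, n_max=4, top_k=10):
    """Count all character n-grams up to n_max and return the most frequent."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i : i + n]] += 1
    return counts.most_common(top_k)

raw = "词向量学习词向量嵌入"   # toy unsegmented Chinese text
print(frequent_char_ngrams(raw, n_max=3, top_k=5))
```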

From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations

Title From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations
Authors Egoitz Laparra, Dongfang Xu, Steven Bethard
Abstract This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character level multi-output neural network that outperforms previous state-of-the-art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemes in the same corpus.
Tasks Semantic Composition, Semantic Parsing
Published 2018-01-01
URL https://www.aclweb.org/anthology/Q18-1025/
PDF https://www.aclweb.org/anthology/Q18-1025
PWC https://paperswithcode.com/paper/from-characters-to-time-intervals-new
Repo https://github.com/clulab/timenorm
Framework none
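
To give a feel for interval-based scoring, here is a hedged sketch in the same spirit as the metric the paper introduces: precision as the fraction of predicted time covered by gold intervals, recall as the fraction of gold time covered by predictions. This overlap formulation is an illustrative assumption, not the paper's exact definition.

```python
def total_overlap(pred, gold):
    """Summed length of pairwise overlaps between two interval lists."""
    return sum(max(0, min(pe, ge) - max(ps, gs))
               for ps, pe in pred for gs, ge in gold)

def interval_prf(pred, gold):
    ov = total_overlap(pred, gold)
    p = ov / sum(e - s for s, e in pred)           # predicted time that is correct
    r = ov / sum(e - s for s, e in gold)           # gold time that is recovered
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Intervals as (start, end), e.g. hours since some epoch
print(interval_prf(pred=[(0, 24), (48, 72)], gold=[(0, 30)]))
```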

Arrows are the Verbs of Diagrams

Title Arrows are the Verbs of Diagrams
Authors Malihe Alikhani, Matthew Stone
Abstract Arrows are a key ingredient of schematic pictorial communication. This paper investigates the interpretation of arrows through linguistic, crowdsourcing and machine-learning methodology. Our work establishes a novel analogy between arrows and verbs: we advocate representing arrows in terms of qualitatively different structural and semantic frames, and resolving frames to specific interpretations using shallow world knowledge.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1301/
PDF https://www.aclweb.org/anthology/C18-1301
PWC https://paperswithcode.com/paper/arrows-are-the-verbs-of-diagrams
Repo https://github.com/malihealikhani/Arrows_are_Verbs
Framework pytorch

E2E NLG Challenge: Neural Models vs. Templates

Title E2E NLG Challenge: Neural Models vs. Templates
Authors Yevgeniy Puzikov, Iryna Gurevych
Abstract The E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours.
Tasks Data-to-Text Generation, Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6557/
PDF https://www.aclweb.org/anthology/W18-6557
PWC https://paperswithcode.com/paper/e2e-nlg-challenge-neural-models-vs-templates
Repo https://github.com/UKPLab/e2e-nlg-challenge-2017
Framework pytorch
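
The template-based alternative mentioned above can be sketched in a few minutes of code: meaning-representation values are slotted into hand-written text. The template wording below is an illustrative assumption, not the system the authors submitted.

```python
def template_nlg(mr: dict) -> str:
    """Render an E2E-style meaning representation with a fixed template."""
    s = f"{mr['name']} is a {mr['food']} restaurant in the {mr['area']} area"
    if "priceRange" in mr:
        s += f" with {mr['priceRange']} prices"
    return s + "."

mr = {"name": "The Eagle", "food": "French",
      "area": "riverside", "priceRange": "moderate"}
print(template_nlg(mr))
# The Eagle is a French restaurant in the riverside area with moderate prices.
```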

Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch

Title Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch
Authors Osman Asif Malik, Stephen Becker
Abstract We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch
PDF http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch.pdf
PWC https://paperswithcode.com/paper/low-rank-tucker-decomposition-of-large
Repo https://github.com/OsmanMalik/tucker-tensorsketch
Framework none
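
The sketching primitive underlying TensorSketch is CountSketch: each coordinate is hashed to one of m buckets with a random sign, compressing a long vector while preserving inner products in expectation. A minimal sketch follows (sizes are illustrative; the paper's algorithms apply such sketches to matricized tensors in a single pass):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_countsketch(n, m):
    """Build a CountSketch map from R^n to R^m with fixed random hashes/signs."""
    h = rng.integers(0, m, size=n)                # target bucket per coordinate
    s = rng.choice([-1.0, 1.0], size=n)           # random sign per coordinate
    def apply(x):
        out = np.zeros(m)
        np.add.at(out, h, s * x)                  # scatter-add into buckets
        return out
    return apply

sketch = make_countsketch(n=10_000, m=256)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)
print(np.dot(x, y), np.dot(sketch(x), sketch(y)))  # close in expectation
```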

BioRead: A New Dataset for Biomedical Reading Comprehension

Title BioRead: A New Dataset for Biomedical Reading Comprehension
Authors Dimitris Pappas, Ion Androutsopoulos, Haris Papageorgiou
Abstract
Tasks Information Retrieval, Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1439/
PDF https://www.aclweb.org/anthology/L18-1439
PWC https://paperswithcode.com/paper/bioread-a-new-dataset-for-biomedical-reading
Repo https://github.com/dpappas/BIOREAD_code
Framework pytorch

Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable

Title Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
Authors Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze
Abstract Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language. Resources required for such tasks are often out-of-domain, thus domain adaptation is an important problem here. We make two contributions. First, we test a delightfully simple method for domain adaptation of bilingual word embeddings. We evaluate these embeddings on two bilingual tasks involving different domains: cross-lingual Twitter sentiment classification and medical bilingual lexicon induction. Second, we tailor a broadly applicable semi-supervised classification method from computer vision to these tasks. We show that this method also helps in low-resource setups. Using both methods together we achieve large improvements over our baselines, by using only additional unlabeled data.
Tasks Domain Adaptation, Image Classification, Semi-Supervised Image Classification, Sentiment Analysis, Transfer Learning, Word Embeddings
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1075/
PDF https://www.aclweb.org/anthology/P18-1075
PWC https://paperswithcode.com/paper/two-methods-for-domain-adaptation-of
Repo https://github.com/hangyav/biadapt
Framework tf
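
A hedged sketch of the "delightfully simple" ingredient described above, on the understanding that it amounts to training word embeddings on general-domain text concatenated with the smaller in-domain text (the bilingual mapping and the semi-supervised classifier are not shown; the corpora here are toy data):

```python
from gensim.models import Word2Vec

general = [["the", "bank", "approved", "the", "loan"]] * 50     # general domain
in_domain = [["the", "patient", "received", "a", "dose"]] * 10  # target domain

model = Word2Vec(sentences=general + in_domain,   # simple corpus concatenation
                 vector_size=50, window=3, min_count=1, epochs=20, seed=0)
print(model.wv["patient"][:5])   # domain word lives in the shared vector space
```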

Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks

Title Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks
Authors Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii
Abstract Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks
PDF http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks.pdf
PWC https://paperswithcode.com/paper/manifold-tiling-localized-receptive-fields
Repo https://github.com/flatironinstitute/mantis
Framework none
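
The offline objective behind nonnegative similarity-preserving mapping can be sketched directly: find a nonnegative output $Y$ whose Gram matrix matches the input's, here by projected gradient descent on $\|X^\top X - Y^\top Y\|_F^2$ subject to $Y \ge 0$. Step size and initialization are illustrative; the paper derives online neural-network implementations rather than this batch solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 100, 5, 10
X = rng.standard_normal((d_in, n))               # inputs as columns
Gx = X.T @ X                                     # input similarity (Gram) matrix

Y = 0.1 * np.abs(rng.standard_normal((d_out, n)))
lr = 1e-3
for _ in range(2000):
    grad = -4 * Y @ (Gx - Y.T @ Y)               # gradient of the Frobenius loss
    Y = np.maximum(0.0, Y - lr * grad)           # project onto Y >= 0

print(np.linalg.norm(Gx - Y.T @ Y) / np.linalg.norm(Gx))  # relative residual
```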

Hierarchical Relational Networks for Group Activity Recognition and Retrieval

Title Hierarchical Relational Networks for Group Activity Recognition and Retrieval
Authors Mostafa S. Ibrahim, Greg Mori
Abstract Modeling structured relationships between people in a scene is an important step toward visual understanding. We present a Hierarchical Relational Network that computes relational representations of people, given graph structures describing potential interactions. Each relational layer is fed individual person representations and a potential relationship graph. Relational representations of each person are created based on their connections in this particular graph. We demonstrate the efficacy of this model by applying it in both supervised and unsupervised learning paradigms. First, given a video sequence of people doing a collective activity, the relational scene representation is utilized for multi-person activity recognition. Second, we propose a Relational Autoencoder model for unsupervised learning of features for action and scene retrieval. Finally, a Denoising Autoencoder variant is presented to infer missing people in the scene from their context. Empirical results demonstrate that this approach learns relational feature representations that can effectively discriminate person and group activity classes.
Tasks Activity Recognition, Denoising, Group Activity Recognition
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/hierarchical-relational-networks-for-group
Repo https://github.com/mostafa-saad/hierarchical-relational-network
Framework none
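
A single relational layer in the spirit described above: a map shared across all people combines each person's features with an aggregate of their graph neighbors' features. Mean aggregation and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    """One relational layer: shared weights, per-person graph aggregation."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.fc = nn.Linear(2 * d_in, d_out)      # shared across all people

    def forward(self, feats, adj):
        # feats: (P, d_in) person features; adj: (P, P) 0/1 relationship graph
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neighbor_mean = adj @ feats / deg         # average neighbor features
        return torch.relu(self.fc(torch.cat([feats, neighbor_mean], dim=-1)))

layer = RelationalLayer(d_in=64, d_out=32)
feats = torch.randn(6, 64)                        # 6 people in the scene
adj = (torch.rand(6, 6) > 0.5).float()            # a potential relationship graph
print(layer(feats, adj).shape)                    # torch.Size([6, 32])
```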

Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML

Title Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML
Authors Benjamin Strang, Peter van der Putten, Jan N. van Rijn, Frank Hutter
Abstract A basic step for each data-mining or machine learning task is to determine which model to choose based on the problem and the data at hand. In this paper we investigate when non-linear classifiers outperform linear classifiers by means of a large scale experiment. We benchmark linear and non-linear versions of three types of classifiers (support vector machines; neural networks; and decision trees), and analyze the results to determine on what type of datasets the non-linear version performs better. To the best of our knowledge, this work is the first principled and large scale attempt to support the common assumption that non-linear classifiers excel only when large amounts of data are available.
Tasks
Published 2018-10-05
URL https://link.springer.com/chapter/10.1007/978-3-030-01768-2_25
PDF http://liacs.leidenuniv.nl/~rijnjnvan/pdf/pub/ida2018.pdf
PWC https://paperswithcode.com/paper/dont-rule-out-simple-models-prematurely-a
Repo https://github.com/openml/openml-python/blob/develop/examples/40_paper/2018_ida_strang_example.py
Framework none
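
A toy version of the comparison the paper runs at scale: a linear and a non-linear classifier of the same family evaluated with identical cross-validation splits. Dataset and hyperparameters are illustrative; the actual benchmark covers many OpenML datasets and three classifier families.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
for name, kernel in [("linear SVM", "linear"), ("RBF SVM", "rbf")]:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(clf, X, y, cv=5)     # same splits for both models
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```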