Paper Group NAWR 22
Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions. Fully Statistical Neural Belief Tracking. An Empirical Study of Building a Strong Baseline for Constituency Parsing. Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem. Fusing Document, Collection and L …
Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
Title | Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions |
Authors | Junru Wu, Yue Wang, Zhenyu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin |
Abstract | The current trend of pushing CNNs deeper with convolutions has created a pressing demand for higher compression gains on CNNs in which convolutions dominate the computation and parameter count (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions by applying k-means clustering to the weights: compression is achieved through weight sharing, recording only $K$ cluster centers and the weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to the $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate the energy consumption of CNN hardware implementations, whose estimates are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2219 |
http://proceedings.mlr.press/v80/wu18h/wu18h.pdf | |
PWC | https://paperswithcode.com/paper/deep-k-means-re-training-and-parameter |
Repo | https://github.com/Sandbox3aster/Deep-K-Means |
Framework | pytorch |
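Below is a minimal sketch of the weight-sharing idea from the abstract: cluster one convolutional layer's weights with k-means and store only the $K$ centers plus a small index per weight. It uses a random weight tensor and scikit-learn's KMeans as stand-ins; the paper's spectrally relaxed regularization during re-training is not reproduced.

```python
# Minimal sketch of k-means weight sharing for one convolutional layer.
# A random weight tensor stands in for trained weights; the paper's
# spectrally relaxed regularizer used during re-training is not shown.
import numpy as np
from sklearn.cluster import KMeans

K = 16                                   # number of shared weight values
weights = np.random.randn(64, 3, 3, 3)   # (out_ch, in_ch, kH, kW) stand-in

flat = weights.reshape(-1, 1)
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(flat)

# Compression: store only K centers plus one small integer index per weight.
centers = kmeans.cluster_centers_.ravel()          # K floats
assignments = kmeans.labels_.astype(np.uint8)      # one index per weight

shared_weights = centers[assignments].reshape(weights.shape)
print("max abs error after sharing:", np.abs(weights - shared_weights).max())
```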
Fully Statistical Neural Belief Tracking
Title | Fully Statistical Neural Belief Tracking |
Authors | Nikola Mrkšić, Ivan Vulić |
Abstract | This paper proposes an improvement to the existing data-driven Neural Belief Tracking (NBT) framework for Dialogue State Tracking (DST). The existing NBT model uses a hand-crafted belief state update mechanism which involves an expensive manual retuning step whenever the model is deployed to a new dialogue domain. We show that this update mechanism can be learned jointly with the semantic decoding and context modelling parts of the NBT model, eliminating the last rule-based module from this DST framework. We propose two different statistical update mechanisms and show that dialogue dynamics can be modelled with a very small number of additional model parameters. In our DST evaluation over three languages, we show that this model achieves competitive performance and provides a robust framework for building resource-light DST models. |
Tasks | Dialogue Management, Dialogue State Tracking, Spoken Language Understanding, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2018/ |
https://www.aclweb.org/anthology/P18-2018 | |
PWC | https://paperswithcode.com/paper/fully-statistical-neural-belief-tracking-1 |
Repo | https://github.com/nmrksic/neural-belief-tracker |
Framework | tf |
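The sketch below illustrates the general idea of a learned belief-state update: mixing the previous belief with the current turn's prediction through a handful of learned parameters. The per-value sigmoid gate is an illustrative assumption, not the exact update mechanisms proposed in the paper.

```python
# Illustrative learned belief-state update for dialogue state tracking.
# A hedged sketch of the general idea (learned mixing of the previous belief
# and the current turn's prediction), not the paper's exact parameterization.
import torch
import torch.nn as nn

class LearnedBeliefUpdate(nn.Module):
    def __init__(self, num_values: int):
        super().__init__()
        # One learned mixing weight per slot value (a very small parameter count).
        self.mix = nn.Parameter(torch.zeros(num_values))

    def forward(self, prev_belief, turn_prediction):
        lam = torch.sigmoid(self.mix)                       # in (0, 1)
        combined = lam * turn_prediction + (1 - lam) * prev_belief
        return torch.softmax(combined, dim=-1)              # renormalize

update = LearnedBeliefUpdate(num_values=4)
prev = torch.tensor([0.7, 0.1, 0.1, 0.1])
turn = torch.tensor([0.1, 0.8, 0.05, 0.05])
print(update(prev, turn))
```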
An Empirical Study of Building a Strong Baseline for Constituency Parsing
Title | An Empirical Study of Building a Strong Baseline for Constituency Parsing |
Authors | Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata |
Abstract | This paper investigates the construction of a strong baseline based on general-purpose sequence-to-sequence models for constituency parsing. We incorporate several techniques that were mainly developed for natural language generation tasks, e.g., machine translation and summarization, and demonstrate that the sequence-to-sequence model achieves the current top-notch parsers' performance (almost) without requiring any explicit task-specific knowledge or architecture for constituency parsing. |
Tasks | Abstractive Text Summarization, Constituency Parsing, Machine Translation, Text Generation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2097/ |
https://www.aclweb.org/anthology/P18-2097 | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-building-a-strong |
Repo | https://github.com/nttcslab-nlp/strong_s2s_baseline_parser |
Framework | none |
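Treating constituency parsing with a general-purpose sequence-to-sequence model presupposes a tree linearization. The helper below shows one common convention (terminals replaced by a placeholder token); it is an assumed illustration, not the exact preprocessing used in the paper.

```python
# Linearize a constituency tree into a token sequence so a seq2seq model can
# predict it. Replacing terminals with "XX" is a common convention assumed
# here, not taken verbatim from the paper.
def linearize(tree):
    """tree is (label, children...) for nonterminals, a plain string for words."""
    if isinstance(tree, str):
        return ["XX"]
    label, *children = tree
    tokens = ["(" + label]
    for child in children:
        tokens += linearize(child)
    tokens.append(")" + label)
    return tokens

tree = ("S", ("NP", ("PRP", "They")), ("VP", ("VBP", "solve"), ("NP", ("NNS", "problems"))))
print(" ".join(linearize(tree)))
# (S (NP (PRP XX )PRP )NP (VP (VBP XX )VBP (NP (NNS XX )NNS )NP )VP )S
```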
Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem
Title | Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem |
Authors | Sven Buechel, Udo Hahn |
Abstract | Predicting the emotional value of lexical items is a well-known problem in sentiment analysis. While research focused on polarity for quite a long time, this early focus has meanwhile shifted to more expressive emotion representation models (such as Basic Emotions or Valence-Arousal-Dominance). This change has resulted in a proliferation of heterogeneous formats and, in parallel, often small-sized, non-interoperable resources (lexicons and corpus annotations). In particular, the limitations in size have hampered the application of deep learning methods in this area, because they typically require large amounts of input data. We present a solution to get around this language data bottleneck by rephrasing word emotion induction as a multi-task learning problem. In this approach, the prediction of each independent emotion dimension is considered an individual task, and hidden layers are shared between these dimensions. We investigate whether multi-task learning is more advantageous than single-task learning for emotion prediction by comparing our model against a wide range of alternative emotion and polarity induction methods across 9 typologically diverse languages and a total of 15 conditions. Our model outperforms each of them. Against all odds, the proposed deep learning approach yields the largest gains on the smallest data sets, composed of merely one thousand samples. |
Tasks | Multi-Task Learning, Sentiment Analysis |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1173/ |
https://www.aclweb.org/anthology/N18-1173 | |
PWC | https://paperswithcode.com/paper/word-emotion-induction-for-multiple-languages |
Repo | https://github.com/JULIELab/wordEmotions |
Framework | tf |
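A minimal sketch of the multi-task setup described in the abstract: a shared trunk over word embeddings with one regression head per emotion dimension (e.g., Valence, Arousal, Dominance). Layer sizes and the plain feed-forward trunk are assumptions for illustration, not the paper's exact architecture.

```python
# Multi-task word emotion induction: shared hidden layers, one scalar
# regression head per emotion dimension. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskEmotionNet(nn.Module):
    def __init__(self, emb_dim=300, hidden=256, dims=("valence", "arousal", "dominance")):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One scalar regression head per emotion dimension.
        self.heads = nn.ModuleDict({d: nn.Linear(hidden, 1) for d in dims})

    def forward(self, word_vectors):
        h = self.shared(word_vectors)
        return {d: head(h).squeeze(-1) for d, head in self.heads.items()}

model = MultiTaskEmotionNet()
out = model(torch.randn(8, 300))          # batch of 8 word embeddings
print({k: v.shape for k, v in out.items()})
```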
Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification
Title | Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification |
Authors | Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis |
Abstract | Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model, in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria. |
Tasks | Sentiment Analysis, Text Categorization, Text Classification, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1707/ |
https://www.aclweb.org/anthology/W18-1707 | |
PWC | https://paperswithcode.com/paper/fusing-document-collection-and-label-graph |
Repo | https://github.com/y3nk0/Graph-Based-TC |
Framework | tf |
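The snippet below sketches the graph-of-words idea on a single document: terms become nodes, edges link terms that co-occur within a sliding window, and a node-centrality score replaces raw term frequency. The window size and the choice of degree centrality are illustrative assumptions; the collection and label graphs and the embedding-based edge weights from the paper are omitted.

```python
# Graph-of-words term weighting on one document: co-occurrence graph over a
# sliding window, term weight = node centrality instead of term frequency.
import networkx as nx

def graph_of_words(tokens, window=3):
    g = nx.Graph()
    g.add_nodes_from(tokens)
    for i, t in enumerate(tokens):
        for u in tokens[i + 1 : i + window]:
            if u != t:
                g.add_edge(t, u)
    return g

doc = "graph based methods reward semantic relationships between terms".split()
g = graph_of_words(doc)
weights = nx.degree_centrality(g)          # term weight = node centrality
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {w:.3f}")
```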
Word-like character n-gram embedding
Title | Word-like character n-gram embedding |
Authors | Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira |
Abstract | We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams. Our method is an extension of the recently proposed segmentation-free word embedding, which directly embeds frequent character n-grams from a raw corpus. However, its n-gram vocabulary tends to contain too many non-word n-grams. We solve this problem by introducing the idea of expected word frequency. Compared to previously proposed methods, our method can embed more words, including words that are not contained in a given basic word dictionary. Since our method does not rely on word segmentation with rich word dictionaries, it is especially effective when the corpus text is in an unsegmented language and contains many neologisms and informal words (e.g., a Chinese SNS dataset). Our experimental results on Sina Weibo (a Chinese microblog service) and Twitter show that the proposed method can embed more words and improve the performance of downstream tasks. |
Tasks | Word Embeddings |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6120/ |
https://www.aclweb.org/anthology/W18-6120 | |
PWC | https://paperswithcode.com/paper/word-like-character-n-gram-embedding |
Repo | https://github.com/kdrl/WNE |
Framework | none |
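As a rough illustration of the segmentation-free starting point, the snippet below collects frequent character n-grams directly from raw, unsegmented text as candidate embedding targets. The paper's expected-word-frequency filtering of non-word n-grams is not reproduced here.

```python
# First step of segmentation-free embedding: build a vocabulary of frequent
# character n-grams from raw (unsegmented) text. The expected-word-frequency
# filtering proposed in the paper is not shown.
from collections import Counter

def char_ngrams(text, n_min=1, n_max=4):
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i : i + n]

corpus = "深層学習で単語の分散表現を学習する"    # unsegmented Japanese example
counts = Counter(char_ngrams(corpus))

vocab = [g for g, c in counts.most_common(20)]
print(vocab)
```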
From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations
Title | From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations |
Authors | Egoitz Laparra, Dongfang Xu, Steven Bethard |
Abstract | This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural network that outperforms the previous state of the art built on the TimeML schema. To compare predictions of systems that follow either SCATE or TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemas in the same corpus. |
Tasks | Semantic Composition, Semantic Parsing |
Published | 2018-01-01 |
URL | https://www.aclweb.org/anthology/Q18-1025/ |
https://www.aclweb.org/anthology/Q18-1025 | |
PWC | https://paperswithcode.com/paper/from-characters-to-time-intervals-new |
Repo | https://github.com/clulab/timenorm |
Framework | none |
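A hedged sketch of interval-based scoring: precision, recall, and F1 computed from the total overlap between predicted and gold time intervals. This is a simplified illustration of the idea, not the exact metric defined in the paper.

```python
# Simplified interval-overlap scoring: intervals are (start, end) pairs in some
# common unit, and the score is overlap length over predicted/gold length.
def overlap(a, b):
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def interval_prf(pred, gold):
    inter = sum(overlap(p, g) for p in pred for g in gold)
    p_len = sum(e - s for s, e in pred)
    g_len = sum(e - s for s, e in gold)
    precision = inter / p_len if p_len else 0.0
    recall = inter / g_len if g_len else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = [(0, 24), (48, 72)]     # e.g., hours since some reference point
gold = [(0, 24), (50, 74)]
print(interval_prf(pred, gold))
```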
Arrows are the Verbs of Diagrams
Title | Arrows are the Verbs of Diagrams |
Authors | Malihe Alikhani, Matthew Stone |
Abstract | Arrows are a key ingredient of schematic pictorial communication. This paper investigates the interpretation of arrows through linguistic, crowdsourcing and machine-learning methodology. Our work establishes a novel analogy between arrows and verbs: we advocate representing arrows in terms of qualitatively different structural and semantic frames, and resolving frames to specific interpretations using shallow world knowledge. |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1301/ |
https://www.aclweb.org/anthology/C18-1301 | |
PWC | https://paperswithcode.com/paper/arrows-are-the-verbs-of-diagrams |
Repo | https://github.com/malihealikhani/Arrows_are_Verbs |
Framework | pytorch |
E2E NLG Challenge: Neural Models vs. Templates
Title | E2E NLG Challenge: Neural Models vs. Templates |
Authors | Yevgeniy Puzikov, Iryna Gurevych |
Abstract | The E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6557/ |
https://www.aclweb.org/anthology/W18-6557 | |
PWC | https://paperswithcode.com/paper/e2e-nlg-challenge-neural-models-vs-templates |
Repo | https://github.com/UKPLab/e2e-nlg-challenge-2017 |
Framework | pytorch |
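The template-based alternative mentioned in the abstract can be pictured as below: each E2E meaning representation (a set of key-value pairs) is slotted into a fixed sentence pattern. The template wording here is an illustrative assumption, not the authors' actual template.

```python
# Sketch of a template-based E2E NLG system: map a meaning representation
# (key-value pairs) onto a fixed sentence template. Wording is illustrative.
def template_nlg(mr: dict) -> str:
    parts = [f"{mr['name']} is a {mr.get('eatType', 'restaurant')}"]
    if "food" in mr:
        parts.append(f"serving {mr['food']} food")
    if "area" in mr:
        parts.append(f"in the {mr['area']}")
    if "priceRange" in mr:
        parts.append(f"with {mr['priceRange']} prices")
    return " ".join(parts) + "."

mr = {"name": "The Eagle", "eatType": "coffee shop", "food": "French",
      "area": "riverside", "priceRange": "moderate"}
print(template_nlg(mr))
# The Eagle is a coffee shop serving French food in the riverside with moderate prices.
```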
Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch
Title | Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch |
Authors | Osman Asif Malik, Stephen Becker |
Abstract | We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch |
http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-tucker-decomposition-of-large |
Repo | https://github.com/OsmanMalik/tucker-tensorsketch |
Framework | none |
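As a reference point for what the sketched algorithms approximate, the snippet below computes a plain truncated HOSVD-style Tucker decomposition (factor matrices from each mode unfolding, then the core tensor). The paper's single-pass, TensorSketch-based randomized algorithms are not reproduced here; this only fixes the target Tucker format.

```python
# Plain truncated HOSVD as a baseline Tucker decomposition. The paper's
# contribution (sketched, single-pass randomized algorithms) is not shown.
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = t
    for mode, u in enumerate(factors):
        # Contract the factor against the corresponding mode of the core.
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

x = np.random.randn(20, 30, 40)
core, factors = hosvd(x, ranks=(5, 5, 5))
print(core.shape, [f.shape for f in factors])   # (5, 5, 5) and the three factors
```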
BioRead: A New Dataset for Biomedical Reading Comprehension
Title | BioRead: A New Dataset for Biomedical Reading Comprehension |
Authors | Dimitris Pappas, Ion Androutsopoulos, Haris Papageorgiou |
Abstract | |
Tasks | Information Retrieval, Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1439/ |
https://www.aclweb.org/anthology/L18-1439 | |
PWC | https://paperswithcode.com/paper/bioread-a-new-dataset-for-biomedical-reading |
Repo | https://github.com/dpappas/BIOREAD_code |
Framework | pytorch |
Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
Title | Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable |
Authors | Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze |
Abstract | Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language. Resources required for such tasks are often out-of-domain, thus domain adaptation is an important problem here. We make two contributions. First, we test a delightfully simple method for domain adaptation of bilingual word embeddings. We evaluate these embeddings on two bilingual tasks involving different domains: cross-lingual twitter sentiment classification and medical bilingual lexicon induction. Second, we tailor a broadly applicable semi-supervised classification method from computer vision to these tasks. We show that this method also helps in low-resource setups. Using both methods together we achieve large improvements over our baselines, by using only additional unlabeled data. |
Tasks | Domain Adaptation, Image Classification, Semi-Supervised Image Classification, Sentiment Analysis, Transfer Learning, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1075/ |
https://www.aclweb.org/anthology/P18-1075 | |
PWC | https://paperswithcode.com/paper/two-methods-for-domain-adaptation-of |
Repo | https://github.com/hangyav/biadapt |
Framework | tf |
Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks
Title | Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks |
Authors | Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii |
Abstract | Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks |
http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks.pdf | |
PWC | https://paperswithcode.com/paper/manifold-tiling-localized-receptive-fields |
Repo | https://github.com/flatironinstitute/mantis |
Framework | none |
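A batch sketch of the underlying objective: find a nonnegative output matrix whose Gram matrix matches the input similarities, here by projected gradient descent on toy data lying on a circle. The step size, dimensions, and toy data are illustrative assumptions; the paper derives online neural-network dynamics for this objective rather than an offline solver like this one.

```python
# Offline sketch of nonnegative similarity matching: minimize ||G - Y^T Y||_F^2
# over Y >= 0 by projected gradient descent. Hyperparameters and the circular
# toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 2, 20
theta = np.sort(rng.uniform(0, 2 * np.pi, n))
X = np.vstack([np.cos(theta), np.sin(theta)])          # points on a circle (d x n)

G = X.T @ X                                             # input similarities (n x n)
Y = np.abs(rng.normal(scale=0.1, size=(k, n)))          # nonnegative initialization
lr = 1e-4
for _ in range(5000):
    grad = -4 * Y @ (G - Y.T @ Y)                       # gradient of ||G - Y^T Y||_F^2
    Y = np.maximum(Y - lr * grad, 0.0)                  # project onto Y >= 0

# In the optimal solutions each output responds to a localized arc of the circle;
# this naive solver typically recovers qualitatively similar sparse structure.
print("active fraction per output:", np.round((Y > 1e-3).mean(axis=1), 2))
```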
Hierarchical Relational Networks for Group Activity Recognition and Retrieval
Title | Hierarchical Relational Networks for Group Activity Recognition and Retrieval |
Authors | Mostafa S. Ibrahim, Greg Mori |
Abstract | Modeling structured relationships between people in a scene is an important step toward visual understanding. We present a Hierarchical Relational Network that computes relational representations of people, given graph structures describing potential interactions. Each relational layer is fed individual person representations and a potential relationship graph. Relational representations of each person are created based on their connections in this particular graph. We demonstrate the efficacy of this model by applying it in both supervised and unsupervised learning paradigms. First, given a video sequence of people doing a collective activity, the relational scene representation is utilized for multi-person activity recognition. Second, we propose a Relational Autoencoder model for unsupervised learning of features for action and scene retrieval. Finally, a Denoising Autoencoder variant is presented to infer missing people in the scene from their context. Empirical results demonstrate that this approach learns relational feature representations that can effectively discriminate person and group activity classes. |
Tasks | Activity Recognition, Denoising, Group Activity Recognition |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-relational-networks-for-group |
Repo | https://github.com/mostafa-saad/hierarchical-relational-network |
Framework | none |
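A sketch of what a single relational layer might look like: each person's new representation is computed from its own features and those of its graph neighbors through a shared MLP, then averaged over neighbors. The concatenate-and-average aggregation is an illustrative choice, not necessarily the paper's exact formulation.

```python
# One relational layer: pairwise features through a shared MLP, masked by the
# relationship graph, averaged over neighbors. Aggregation is illustrative.
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, feats, adj):
        # feats: (P, in_dim) person features; adj: (P, P) 0/1 relationship graph
        P = feats.size(0)
        pairs = torch.cat(
            [feats.unsqueeze(1).expand(P, P, -1), feats.unsqueeze(0).expand(P, P, -1)],
            dim=-1,
        )                                             # (P, P, 2*in_dim)
        messages = self.mlp(pairs) * adj.unsqueeze(-1)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return messages.sum(dim=1) / deg              # mean over neighbors

layer = RelationalLayer(in_dim=128, out_dim=64)
feats = torch.randn(6, 128)                           # 6 people in the scene
adj = (torch.rand(6, 6) > 0.5).float()
print(layer(feats, adj).shape)                        # torch.Size([6, 64])
```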
Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML
Title | Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML |
Authors | Benjamin Strang, Peter van der Putten, Jan N. van Rijn, Frank Hutter |
Abstract | A basic step for each data-mining or machine learning task is to determine which model to choose based on the problem and the data at hand. In this paper we investigate when non-linear classifiers outperform linear classifiers by means of a large scale experiment. We benchmark linear and non-linear versions of three types of classifiers (support vector machines; neural networks; and decision trees), and analyze the results to determine on what type of datasets the non-linear version performs better. To the best of our knowledge, this work is the first principled and large scale attempt to support the common assumption that non-linear classifiers excel only when large amounts of data are available. |
Tasks | |
Published | 2018-10-05 |
URL | https://link.springer.com/chapter/10.1007/978-3-030-01768-2_25 |
http://liacs.leidenuniv.nl/~rijnjnvan/pdf/pub/ida2018.pdf | |
PWC | https://paperswithcode.com/paper/dont-rule-out-simple-models-prematurely-a |
Repo | https://github.com/openml/openml-python/blob/develop/examples/40_paper/2018_ida_strang_example.py |
Framework | none |
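In miniature, the benchmark's core comparison for one classifier family looks like the snippet below: a linear SVM versus its non-linear (RBF-kernel) counterpart under identical cross-validation. The small scikit-learn built-in dataset stands in for the OpenML datasets used in the actual study.

```python
# Linear vs. non-linear SVM under identical 10-fold cross-validation, as a
# miniature of the benchmark's per-dataset comparison.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name:12s} mean accuracy = {scores.mean():.3f}")
```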