Paper Group NAWR 22
Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions. Fully Statistical Neural Belief Tracking. An Empirical Study of Building a Strong Baseline for Constituency Parsing. Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem. Fusing Document, Collection and L …
Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
Title | Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions |
Authors | Junru Wu, Yue Wang, Zhenyu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin |
Abstract | The current trend of pushing CNNs deeper with convolutions has created a pressing demand for higher compression gains on CNNs in which convolutions dominate the computation and parameter count (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions by applying k-means clustering to the weights: compression is achieved through weight sharing, recording only $K$ cluster centers and the weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to the $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate the energy consumption of CNN hardware implementations, whose estimates are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2219 |
http://proceedings.mlr.press/v80/wu18h/wu18h.pdf | |
PWC | https://paperswithcode.com/paper/deep-k-means-re-training-and-parameter |
Repo | https://github.com/Sandbox3aster/Deep-K-Means |
Framework | pytorch |
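Below is a minimal sketch of the weight-sharing idea from the abstract: cluster one convolutional layer's weights with k-means and store only the $K$ centers plus a small index per weight. It uses a random weight tensor and scikit-learn's KMeans as stand-ins; the paper's spectrally relaxed regularization during re-training is not reproduced.

```python
# Minimal sketch of k-means weight sharing for one convolutional layer.
# A random weight tensor stands in for trained weights; the paper's
# spectrally relaxed regularizer used during re-training is not shown.
import numpy as np
from sklearn.cluster import KMeans

K = 16                                   # number of shared weight values
weights = np.random.randn(64, 3, 3, 3)   # (out_ch, in_ch, kH, kW) stand-in

flat = weights.reshape(-1, 1)
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(flat)

# Compression: store only K centers plus one small integer index per weight.
centers = kmeans.cluster_centers_.ravel()          # K floats
assignments = kmeans.labels_.astype(np.uint8)      # one index per weight

shared_weights = centers[assignments].reshape(weights.shape)
print("max abs error after sharing:", np.abs(weights - shared_weights).max())
```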
Fully Statistical Neural Belief Tracking
Title | Fully Statistical Neural Belief Tracking |
Authors | Nikola Mrkšić, Ivan Vulić |
Abstract | This paper proposes an improvement to the existing data-driven Neural Belief Tracking (NBT) framework for Dialogue State Tracking (DST). The existing NBT model uses a hand-crafted belief state update mechanism which involves an expensive manual retuning step whenever the model is deployed to a new dialogue domain. We show that this update mechanism can be learned jointly with the semantic decoding and context modelling parts of the NBT model, eliminating the last rule-based module from this DST framework. We propose two different statistical update mechanisms and show that dialogue dynamics can be modelled with a very small number of additional model parameters. In our DST evaluation over three languages, we show that this model achieves competitive performance and provides a robust framework for building resource-light DST models. |
Tasks | Dialogue Management, Dialogue State Tracking, Spoken Language Understanding, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2018/ |
https://www.aclweb.org/anthology/P18-2018 | |
PWC | https://paperswithcode.com/paper/fully-statistical-neural-belief-tracking-1 |
Repo | https://github.com/nmrksic/neural-belief-tracker |
Framework | tf |
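The sketch below illustrates the general idea of a learned belief-state update: mixing the previous belief with the current turn's prediction through a handful of learned parameters. The per-value sigmoid gate is an illustrative assumption, not the exact update mechanisms proposed in the paper.

```python
# Illustrative learned belief-state update for dialogue state tracking.
# A hedged sketch of the general idea (learned mixing of the previous belief
# and the current turn's prediction), not the paper's exact parameterization.
import torch
import torch.nn as nn

class LearnedBeliefUpdate(nn.Module):
    def __init__(self, num_values: int):
        super().__init__()
        # One learned mixing weight per slot value (a very small parameter count).
        self.mix = nn.Parameter(torch.zeros(num_values))

    def forward(self, prev_belief, turn_prediction):
        lam = torch.sigmoid(self.mix)                       # in (0, 1)
        combined = lam * turn_prediction + (1 - lam) * prev_belief
        return torch.softmax(combined, dim=-1)              # renormalize

update = LearnedBeliefUpdate(num_values=4)
prev = torch.tensor([0.7, 0.1, 0.1, 0.1])
turn = torch.tensor([0.1, 0.8, 0.05, 0.05])
print(update(prev, turn))
```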
An Empirical Study of Building a Strong Baseline for Constituency Parsing
Title | An Empirical Study of Building a Strong Baseline for Constituency Parsing |
Authors | Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata |
Abstract | This paper investigates the construction of a strong baseline based on general-purpose sequence-to-sequence models for constituency parsing. We incorporate several techniques that were mainly developed for natural language generation tasks, e.g., machine translation and summarization, and demonstrate that the sequence-to-sequence model achieves the current top-notch parsers' performance (almost) without requiring any explicit task-specific knowledge or architecture for constituency parsing. |
Tasks | Abstractive Text Summarization, Constituency Parsing, Machine Translation, Text Generation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2097/ |
https://www.aclweb.org/anthology/P18-2097 | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-building-a-strong |
Repo | https://github.com/nttcslab-nlp/strong_s2s_baseline_parser |
Framework | none |
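Treating constituency parsing with a general-purpose sequence-to-sequence model presupposes a tree linearization. The helper below shows one common convention (terminals replaced by a placeholder token); it is an assumed illustration, not the exact preprocessing used in the paper.

```python
# Linearize a constituency tree into a token sequence so a seq2seq model can
# predict it. Replacing terminals with "XX" is a common convention assumed
# here, not taken verbatim from the paper.
def linearize(tree):
    """tree is (label, children...) for nonterminals, a plain string for words."""
    if isinstance(tree, str):
        return ["XX"]
    label, *children = tree
    tokens = ["(" + label]
    for child in children:
        tokens += linearize(child)
    tokens.append(")" + label)
    return tokens

tree = ("S", ("NP", ("PRP", "They")), ("VP", ("VBP", "solve"), ("NP", ("NNS", "problems"))))
print(" ".join(linearize(tree)))
# (S (NP (PRP XX )PRP )NP (VP (VBP XX )VBP (NP (NNS XX )NNS )NP )VP )S
```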
Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem
Title | Word Emotion Induction for Multiple Languages as a Deep Multi-Task Learning Problem |
Authors | Sven Buechel, Udo Hahn |
Abstract | Predicting the emotional value of lexical items is a well-known problem in sentiment analysis. While research focused on polarity for quite a long time, this early focus has meanwhile shifted to more expressive emotion representation models (such as Basic Emotions or Valence-Arousal-Dominance). This change has resulted in a proliferation of heterogeneous formats and, in parallel, often small-sized, non-interoperable resources (lexicons and corpus annotations). In particular, the limitations in size have hampered the application of deep learning methods in this area, because they typically require large amounts of input data. We present a solution to get around this language data bottleneck by rephrasing word emotion induction as a multi-task learning problem. In this approach, the prediction of each independent emotion dimension is considered an individual task, and hidden layers are shared between these dimensions. We investigate whether multi-task learning is more advantageous than single-task learning for emotion prediction by comparing our model against a wide range of alternative emotion and polarity induction methods across 9 typologically diverse languages and a total of 15 conditions. Our model outperforms each of them. Against all odds, the proposed deep learning approach yields the largest gains on the smallest data sets, composed of merely one thousand samples. |
Tasks | Multi-Task Learning, Sentiment Analysis |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1173/ |
https://www.aclweb.org/anthology/N18-1173 | |
PWC | https://paperswithcode.com/paper/word-emotion-induction-for-multiple-languages |
Repo | https://github.com/JULIELab/wordEmotions |
Framework | tf |
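A minimal sketch of the multi-task setup described in the abstract: a shared trunk over word embeddings with one regression head per emotion dimension (e.g., Valence, Arousal, Dominance). Layer sizes and the plain feed-forward trunk are assumptions for illustration, not the paper's exact architecture.

```python
# Multi-task word emotion induction: shared hidden layers, one scalar
# regression head per emotion dimension. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskEmotionNet(nn.Module):
    def __init__(self, emb_dim=300, hidden=256, dims=("valence", "arousal", "dominance")):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One scalar regression head per emotion dimension.
        self.heads = nn.ModuleDict({d: nn.Linear(hidden, 1) for d in dims})

    def forward(self, word_vectors):
        h = self.shared(word_vectors)
        return {d: head(h).squeeze(-1) for d, head in self.heads.items()}

model = MultiTaskEmotionNet()
out = model(torch.randn(8, 300))          # batch of 8 word embeddings
print({k: v.shape for k, v in out.items()})
```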
Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification
Title | Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification |
Authors | Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis |
Abstract | Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model, in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria. |
Tasks | Sentiment Analysis, Text Categorization, Text Classification, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1707/ |
https://www.aclweb.org/anthology/W18-1707 | |
PWC | https://paperswithcode.com/paper/fusing-document-collection-and-label-graph |
Repo | https://github.com/y3nk0/Graph-Based-TC |
Framework | tf |
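The snippet below sketches the graph-of-words idea on a single document: terms become nodes, edges link terms that co-occur within a sliding window, and a node-centrality score replaces raw term frequency. The window size and the choice of degree centrality are illustrative assumptions; the collection and label graphs and the embedding-based edge weights from the paper are omitted.

```python
# Graph-of-words term weighting on one document: co-occurrence graph over a
# sliding window, term weight = node centrality instead of term frequency.
import networkx as nx

def graph_of_words(tokens, window=3):
    g = nx.Graph()
    g.add_nodes_from(tokens)
    for i, t in enumerate(tokens):
        for u in tokens[i + 1 : i + window]:
            if u != t:
                g.add_edge(t, u)
    return g

doc = "graph based methods reward semantic relationships between terms".split()
g = graph_of_words(doc)
weights = nx.degree_centrality(g)          # term weight = node centrality
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {w:.3f}")
```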
Word-like character n-gram embedding
Title | Word-like character n-gram embedding |
Authors | Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira |
Abstract | We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams. Our method is an extension of the recently proposed segmentation-free word embedding, which directly embeds frequent character n-grams from a raw corpus. However, its n-gram vocabulary tends to contain too many non-word n-grams. We solve this problem by introducing the idea of expected word frequency. Compared to previously proposed methods, our method can embed more words, including words that are not contained in a given basic word dictionary. Since our method does not rely on word segmentation with rich word dictionaries, it is especially effective when the corpus text is in an unsegmented language and contains many neologisms and informal words (e.g., a Chinese SNS dataset). Our experimental results on Sina Weibo (a Chinese microblog service) and Twitter show that the proposed method can embed more words and improve the performance of downstream tasks. |
Tasks | Word Embeddings |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6120/ |
https://www.aclweb.org/anthology/W18-6120 | |
PWC | https://paperswithcode.com/paper/word-like-character-n-gram-embedding |
Repo | https://github.com/kdrl/WNE |
Framework | none |
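As a rough illustration of the segmentation-free starting point, the snippet below collects frequent character n-grams directly from raw, unsegmented text as candidate embedding targets. The paper's expected-word-frequency filtering of non-word n-grams is not reproduced here.

```python
# First step of segmentation-free embedding: build a vocabulary of frequent
# character n-grams from raw (unsegmented) text. The expected-word-frequency
# filtering proposed in the paper is not shown.
from collections import Counter

def char_ngrams(text, n_min=1, n_max=4):
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i : i + n]

corpus = "深層学習で単語の分散表現を学習する"    # unsegmented Japanese example
counts = Counter(char_ngrams(corpus))

vocab = [g for g, c in counts.most_common(20)]
print(vocab)
```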
From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations
Title | From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations |
Authors | Egoitz Laparra, Dongfang Xu, Steven Bethard |
Abstract | This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural network that outperforms the previous state of the art built on the TimeML schema. To compare predictions of systems that follow either SCATE or TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemas in the same corpus. |
Tasks | Semantic Composition, Semantic Parsing |
Published | 2018-01-01 |
URL | https://www.aclweb.org/anthology/Q18-1025/ |
https://www.aclweb.org/anthology/Q18-1025 | |
PWC | https://paperswithcode.com/paper/from-characters-to-time-intervals-new |
Repo | https://github.com/clulab/timenorm |
Framework | none |
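A hedged sketch of interval-based scoring: precision, recall, and F1 computed from the total overlap between predicted and gold time intervals. This is a simplified illustration of the idea, not the exact metric defined in the paper.

```python
# Simplified interval-overlap scoring: intervals are (start, end) pairs in some
# common unit, and the score is overlap length over predicted/gold length.
def overlap(a, b):
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def interval_prf(pred, gold):
    inter = sum(overlap(p, g) for p in pred for g in gold)
    p_len = sum(e - s for s, e in pred)
    g_len = sum(e - s for s, e in gold)
    precision = inter / p_len if p_len else 0.0
    recall = inter / g_len if g_len else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

pred = [(0, 24), (48, 72)]     # e.g., hours since some reference point
gold = [(0, 24), (50, 74)]
print(interval_prf(pred, gold))
```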
Arrows are the Verbs of Diagrams
Title | Arrows are the Verbs of Diagrams |
Authors | Malihe Alikhani, Matthew Stone |
Abstract | Arrows are a key ingredient of schematic pictorial communication. This paper investigates the interpretation of arrows through linguistic, crowdsourcing and machine-learning methodology. Our work establishes a novel analogy between arrows and verbs: we advocate representing arrows in terms of qualitatively different structural and semantic frames, and resolving frames to specific interpretations using shallow world knowledge. |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1301/ |
https://www.aclweb.org/anthology/C18-1301 | |
PWC | https://paperswithcode.com/paper/arrows-are-the-verbs-of-diagrams |
Repo | https://github.com/malihealikhani/Arrows_are_Verbs |
Framework | pytorch |
E2E NLG Challenge: Neural Models vs. Templates
Title | E2E NLG Challenge: Neural Models vs. Templates |
Authors | Yevgeniy Puzikov, Iryna Gurevych |
Abstract | The E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6557/ |
https://www.aclweb.org/anthology/W18-6557 | |
PWC | https://paperswithcode.com/paper/e2e-nlg-challenge-neural-models-vs-templates |
Repo | https://github.com/UKPLab/e2e-nlg-challenge-2017 |
Framework | pytorch |
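The template-based alternative mentioned in the abstract can be pictured as below: each E2E meaning representation (a set of key-value pairs) is slotted into a fixed sentence pattern. The template wording here is an illustrative assumption, not the authors' actual template.

```python
# Sketch of a template-based E2E NLG system: map a meaning representation
# (key-value pairs) onto a fixed sentence template. Wording is illustrative.
def template_nlg(mr: dict) -> str:
    parts = [f"{mr['name']} is a {mr.get('eatType', 'restaurant')}"]
    if "food" in mr:
        parts.append(f"serving {mr['food']} food")
    if "area" in mr:
        parts.append(f"in the {mr['area']}")
    if "priceRange" in mr:
        parts.append(f"with {mr['priceRange']} prices")
    return " ".join(parts) + "."

mr = {"name": "The Eagle", "eatType": "coffee shop", "food": "French",
      "area": "riverside", "priceRange": "moderate"}
print(template_nlg(mr))
# The Eagle is a coffee shop serving French food in the riverside with moderate prices.
```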
Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch
Title | Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch |
Authors | Osman Asif Malik, Stephen Becker |
Abstract | We propose two randomized algorithms for low-rank Tucker decomposition of tensors. The algorithms, which incorporate sketching, only require a single pass of the input tensor and can handle tensors whose elements are streamed in any order. To the best of our knowledge, ours are the only algorithms which can do this. We test our algorithms on sparse synthetic data and compare them to multiple other methods. We also apply one of our algorithms to a real dense 38 GB tensor representing a video and use the resulting decomposition to correctly classify frames containing disturbances. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch |
http://papers.nips.cc/paper/8213-low-rank-tucker-decomposition-of-large-tensors-using-tensorsketch.pdf | |
PWC | https://paperswithcode.com/paper/low-rank-tucker-decomposition-of-large |
Repo | https://github.com/OsmanMalik/tucker-tensorsketch |
Framework | none |
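As a reference point for what the sketched algorithms approximate, the snippet below computes a plain truncated HOSVD-style Tucker decomposition (factor matrices from each mode unfolding, then the core tensor). The paper's single-pass, TensorSketch-based randomized algorithms are not reproduced here; this only fixes the target Tucker format.

```python
# Plain truncated HOSVD as a baseline Tucker decomposition. The paper's
# contribution (sketched, single-pass randomized algorithms) is not shown.
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = t
    for mode, u in enumerate(factors):
        # Contract the factor against the corresponding mode of the core.
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

x = np.random.randn(20, 30, 40)
core, factors = hosvd(x, ranks=(5, 5, 5))
print(core.shape, [f.shape for f in factors])   # (5, 5, 5) and the three factors
```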
BioRead: A New Dataset for Biomedical Reading Comprehension
Title | BioRead: A New Dataset for Biomedical Reading Comprehension |
Authors | Dimitris Pappas, Ion Androutsopoulos, Haris Papageorgiou |
Abstract | |
Tasks | Information Retrieval, Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1439/ |
https://www.aclweb.org/anthology/L18-1439 | |
PWC | https://paperswithcode.com/paper/bioread-a-new-dataset-for-biomedical-reading |
Repo | https://github.com/dpappas/BIOREAD_code |
Framework | pytorch |
Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
Title | Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable |
Authors | Viktor Hangya, Fabienne Braune, Alexander Fraser, Hinrich Schütze |
Abstract | Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language. Resources required for such tasks are often out-of-domain, thus domain adaptation is an important problem here. We make two contributions. First, we test a delightfully simple method for domain adaptation of bilingual word embeddings. We evaluate these embeddings on two bilingual tasks involving different domains: cross-lingual twitter sentiment classification and medical bilingual lexicon induction. Second, we tailor a broadly applicable semi-supervised classification method from computer vision to these tasks. We show that this method also helps in low-resource setups. Using both methods together we achieve large improvements over our baselines, by using only additional unlabeled data. |
Tasks | Domain Adaptation, Image Classification, Semi-Supervised Image Classification, Sentiment Analysis, Transfer Learning, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1075/ |
https://www.aclweb.org/anthology/P18-1075 | |
PWC | https://paperswithcode.com/paper/two-methods-for-domain-adaptation-of |
Repo | https://github.com/hangyav/biadapt |
Framework | tf |
Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks
Title | Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks |
Authors | Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii |
Abstract | Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields. Therefore, nonnegative similarity-preserving mapping (NSM) implemented by neural networks can model representations of continuous manifolds in the brain. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks |
http://papers.nips.cc/paper/7939-manifold-tiling-localized-receptive-fields-are-optimal-in-similarity-preserving-neural-networks.pdf | |
PWC | https://paperswithcode.com/paper/manifold-tiling-localized-receptive-fields |
Repo | https://github.com/flatironinstitute/mantis |
Framework | none |
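A batch sketch of the underlying objective: find a nonnegative output matrix whose Gram matrix matches the input similarities, here by projected gradient descent on toy data lying on a circle. The step size, dimensions, and toy data are illustrative assumptions; the paper derives online neural-network dynamics for this objective rather than an offline solver like this one.

```python
# Offline sketch of nonnegative similarity matching: minimize ||G - Y^T Y||_F^2
# over Y >= 0 by projected gradient descent. Hyperparameters and the circular
# toy data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 2, 20
theta = np.sort(rng.uniform(0, 2 * np.pi, n))
X = np.vstack([np.cos(theta), np.sin(theta)])          # points on a circle (d x n)

G = X.T @ X                                             # input similarities (n x n)
Y = np.abs(rng.normal(scale=0.1, size=(k, n)))          # nonnegative initialization
lr = 1e-4
for _ in range(5000):
    grad = -4 * Y @ (G - Y.T @ Y)                       # gradient of ||G - Y^T Y||_F^2
    Y = np.maximum(Y - lr * grad, 0.0)                  # project onto Y >= 0

# In the optimal solutions each output responds to a localized arc of the circle;
# this naive solver typically recovers qualitatively similar sparse structure.
print("active fraction per output:", np.round((Y > 1e-3).mean(axis=1), 2))
```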
Hierarchical Relational Networks for Group Activity Recognition and Retrieval
Title | Hierarchical Relational Networks for Group Activity Recognition and Retrieval |
Authors | Mostafa S. Ibrahim, Greg Mori |
Abstract | Modeling structured relationships between people in a scene is an important step toward visual understanding. We present a Hierarchical Relational Network that computes relational representations of people, given graph structures describing potential interactions. Each relational layer is fed individual person representations and a potential relationship graph. Relational representations of each person are created based on their connections in this particular graph. We demonstrate the efficacy of this model by applying it in both supervised and unsupervised learning paradigms. First, given a video sequence of people doing a collective activity, the relational scene representation is utilized for multi-person activity recognition. Second, we propose a Relational Autoencoder model for unsupervised learning of features for action and scene retrieval. Finally, a Denoising Autoencoder variant is presented to infer missing people in the scene from their context. Empirical results demonstrate that this approach learns relational feature representations that can effectively discriminate person and group activity classes. |
Tasks | Activity Recognition, Denoising, Group Activity Recognition |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Mostafa_Ibrahim_Hierarchical_Relational_Networks_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-relational-networks-for-group |
Repo | https://github.com/mostafa-saad/hierarchical-relational-network |
Framework | none |
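A sketch of what a single relational layer might look like: each person's new representation is computed from its own features and those of its graph neighbors through a shared MLP, then averaged over neighbors. The concatenate-and-average aggregation is an illustrative choice, not necessarily the paper's exact formulation.

```python
# One relational layer: pairwise features through a shared MLP, masked by the
# relationship graph, averaged over neighbors. Aggregation is illustrative.
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, feats, adj):
        # feats: (P, in_dim) person features; adj: (P, P) 0/1 relationship graph
        P = feats.size(0)
        pairs = torch.cat(
            [feats.unsqueeze(1).expand(P, P, -1), feats.unsqueeze(0).expand(P, P, -1)],
            dim=-1,
        )                                             # (P, P, 2*in_dim)
        messages = self.mlp(pairs) * adj.unsqueeze(-1)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return messages.sum(dim=1) / deg              # mean over neighbors

layer = RelationalLayer(in_dim=128, out_dim=64)
feats = torch.randn(6, 128)                           # 6 people in the scene
adj = (torch.rand(6, 6) > 0.5).float()
print(layer(feats, adj).shape)                        # torch.Size([6, 64])
```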
Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML
Title | Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML |
Authors | Benjamin Strang, Peter van der Putten, Jan N. van Rijn, Frank Hutter |
Abstract | A basic step for each data-mining or machine learning task is to determine which model to choose based on the problem and the data at hand. In this paper we investigate when non-linear classifiers outperform linear classifiers by means of a large scale experiment. We benchmark linear and non-linear versions of three types of classifiers (support vector machines; neural networks; and decision trees), and analyze the results to determine on what type of datasets the non-linear version performs better. To the best of our knowledge, this work is the first principled and large scale attempt to support the common assumption that non-linear classifiers excel only when large amounts of data are available. |
Tasks | |
Published | 2018-10-05 |
URL | https://link.springer.com/chapter/10.1007/978-3-030-01768-2_25 |
http://liacs.leidenuniv.nl/~rijnjnvan/pdf/pub/ida2018.pdf | |
PWC | https://paperswithcode.com/paper/dont-rule-out-simple-models-prematurely-a |
Repo | https://github.com/openml/openml-python/blob/develop/examples/40_paper/2018_ida_strang_example.py |
Framework | none |
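In miniature, the benchmark's core comparison for one classifier family looks like the snippet below: a linear SVM versus its non-linear (RBF-kernel) counterpart under identical cross-validation. The small scikit-learn built-in dataset stands in for the OpenML datasets used in the actual study.

```python
# Linear vs. non-linear SVM under identical 10-fold cross-validation, as a
# miniature of the benchmark's per-dataset comparison.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name:12s} mean accuracy = {scores.mean():.3f}")
```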