Paper Group NANR 256
Zero-shot Learning of Classifiers from Natural Language Quantification
Title | Zero-shot Learning of Classifiers from Natural Language Quantification |
Authors | Shashank Srivastava, Igor Labutov, Tom Mitchell |
Abstract | Humans can efficiently learn new concepts using language. We present a framework through which a set of explanations of a concept can be used to learn a classifier without access to any labeled examples. We use semantic parsing to map explanations to probabilistic assertions grounded in latent class labels and observed attributes of unlabeled data, and leverage the differential semantics of linguistic quantifiers (e.g., 'usually' vs. 'always') to drive model training. Experiments on three domains show that the learned classifiers outperform previous approaches for learning with limited data, and are comparable with fully supervised classifiers trained from a small number of labeled examples. |
Tasks | Semantic Parsing, Zero-Shot Learning |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1029/ |
https://www.aclweb.org/anthology/P18-1029 | |
PWC | https://paperswithcode.com/paper/zero-shot-learning-of-classifiers-from |
Repo | |
Framework | |
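The key idea in the abstract above is that different quantifiers carry different probabilistic strengths. A minimal sketch of that grounding step, with purely illustrative probability values (the paper's actual mapping and training procedure are not given here):

```python
# Hypothetical quantifier-to-probability table: each quantifier in an
# explanation is grounded as a soft constraint on P(attribute | class).
# The specific numbers are illustrative assumptions, not the paper's.
QUANTIFIER_PROB = {
    "always": 0.95,
    "usually": 0.80,
    "often": 0.65,
    "sometimes": 0.40,
    "rarely": 0.15,
    "never": 0.05,
}

def assertion_likelihood(quantifier: str, observed_frequency: float) -> float:
    """Score how well an observed attribute frequency matches the
    probability implied by the quantifier (closer -> higher score)."""
    target = QUANTIFIER_PROB[quantifier]
    return 1.0 - abs(target - observed_frequency)
```

Such per-assertion scores could then be aggregated across explanations to drive training without labels.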
Randomized Block Cubic Newton Method
Title | Randomized Block Cubic Newton Method |
Authors | Nikita Doikov, Peter Richtarik |
Abstract | We study the problem of minimizing the sum of three convex functions: a differentiable term, a twice-differentiable term, and a non-smooth term in a high dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice differentiable term, and a perfect (proximal) model for the nonsmooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods, matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2322 |
http://proceedings.mlr.press/v80/doikov18a/doikov18a.pdf | |
PWC | https://paperswithcode.com/paper/randomized-block-cubic-newton-method |
Repo | |
Framework | |
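The cubic-regularized model described above has a closed-form minimizer in the scalar case. As a one-dimensional sketch of the per-block subproblem (RBCN itself works over random blocks of a high-dimensional variable; this is only an illustration of the cubic model):

```python
import math

def cubic_newton_step(g: float, H: float, M: float) -> float:
    """Closed-form minimizer of the scalar cubic model
        m(h) = g*h + 0.5*H*h**2 + (M/6)*|h|**3,
    i.e. a quadratic model with a cubic regularizer, the building block
    of cubic-regularized Newton methods. g = gradient, H = curvature,
    M = cubic regularization parameter (M > 0)."""
    if g == 0.0:
        return 0.0
    if g < 0:   # minimizer lies at h > 0; solve g + H*h + (M/2)*h^2 = 0
        return (-H + math.sqrt(H * H - 2.0 * M * g)) / M
    else:       # minimizer lies at h < 0; solve g + H*h - (M/2)*h^2 = 0
        return (H - math.sqrt(H * H + 2.0 * M * g)) / M
```

The returned step satisfies the model's first-order optimality condition exactly, which is what each block update of such a method enforces.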
Convolutions Are All You Need (For Classifying Character Sequences)
Title | Convolutions Are All You Need (For Classifying Character Sequences) |
Authors | Zach Wood-Doughty, Nicholas Andrews, Mark Dredze |
Abstract | While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences. When text is modeled as characters instead of words, the longer sequences make RNNs a poor choice. Convolutional neural networks (CNNs), although somewhat less ubiquitous than RNNs, have an internal structure more appropriate for long-distance character dependencies. To better understand how CNNs and RNNs differ in handling long sequences, we use them for text classification tasks in several character-level social media datasets. The CNN models vastly outperform the RNN models in our experiments, suggesting that CNNs are superior to RNNs at learning to classify character-level data. |
Tasks | Document Classification, Language Modelling, Machine Translation, Text Classification |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-6127/ |
https://www.aclweb.org/anthology/W18-6127 | |
PWC | https://paperswithcode.com/paper/convolutions-are-all-you-need-for-classifying |
Repo | |
Framework | |
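The core operation of the character-level CNNs compared above is a filter slid over character embeddings followed by max-pooling. A dependency-free sketch of that single filter-and-pool step (the embedding and filter values here are stand-ins, not trained parameters):

```python
def conv1d_features(char_ids, emb, filt, width):
    """Slide one convolutional filter over a character-embedded sequence
    and max-pool, the core feature extractor of a character-level CNN.
    char_ids: list of int character ids; emb: dict id -> embedding vector;
    filt: list of `width` vectors matching the embedding dimension."""
    best = float("-inf")
    for i in range(len(char_ids) - width + 1):
        window = [emb[c] for c in char_ids[i:i + width]]
        # dot product of the flattened window with the flattened filter
        score = sum(w * f
                    for vec, fvec in zip(window, filt)
                    for w, f in zip(vec, fvec))
        best = max(best, score)
    return best  # max-pooled activation for this filter
```

A real classifier stacks many such filters of several widths and feeds the pooled activations to a softmax layer; unlike an RNN, no state is carried across the full sequence length.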
Orthographic Features for Bilingual Lexicon Induction
Title | Orthographic Features for Bilingual Lexicon Induction |
Authors | Parker Riley, Daniel Gildea |
Abstract | Recent embedding-based methods in bilingual lexicon induction show good results, but do not take advantage of orthographic features, such as edit distance, which can be helpful for pairs of related languages. This work extends embedding-based methods to incorporate these features, resulting in significant accuracy gains for related languages. |
Tasks | Machine Translation, Multilingual Word Embeddings, Unsupervised Machine Translation, Word Alignment, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2062/ |
https://www.aclweb.org/anthology/P18-2062 | |
PWC | https://paperswithcode.com/paper/orthographic-features-for-bilingual-lexicon |
Repo | |
Framework | |
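The orthographic feature the abstract mentions is based on edit distance. A minimal sketch of turning it into a similarity score that could be combined with embedding similarity for related-language pairs (the normalization choice here is an assumption, not necessarily the paper's):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def orthographic_similarity(a: str, b: str) -> float:
    """1 minus length-normalized edit distance, usable as an extra
    feature alongside cross-lingual embedding similarity."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))
```

For cognate-rich pairs (e.g. Spanish "noche" vs. Italian "notte"), this score is high exactly where embedding-only methods gain from the extra signal.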
Adversarial Multiple Source Domain Adaptation
Title | Adversarial Multiple Source Domain Adaptation |
Authors | Han Zhao, Shanghang Zhang, Guanhang Wu, José M. F. Moura, Joao P. Costeira, Geoffrey J. Gordon |
Abstract | While domain adaptation has been actively researched, most algorithms focus on the single-source-single-target adaptation setting. In this paper we propose new generalization bounds and algorithms under both classification and regression settings for unsupervised multiple source domain adaptation. Our theoretical analysis naturally leads to an efficient learning strategy using adversarial neural networks: we show how to interpret it as learning feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task. To this end, we propose multisource domain adversarial networks (MDAN) that approach domain adaptation by optimizing task-adaptive generalization bounds. To demonstrate the effectiveness of MDAN, we conduct extensive experiments showing superior adaptation performance on both classification and regression problems: sentiment analysis, digit classification, and vehicle counting. |
Tasks | Domain Adaptation, Sentiment Analysis |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8075-adversarial-multiple-source-domain-adaptation |
http://papers.nips.cc/paper/8075-adversarial-multiple-source-domain-adaptation.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-multiple-source-domain-adaptation |
Repo | |
Framework | |
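The MDAN objective combines per-source task losses with adversarial domain-discriminator terms, in a "hard" (worst-case) or "soft" (smoothed) form. A scalar sketch of that combination, assuming the per-source losses are already computed (the network and discriminator themselves are omitted):

```python
import math

def mdan_objective(task_losses, domain_losses, gamma=1.0, hard=True):
    """Sketch of a multisource adversarial objective: the 'hard' variant
    takes the worst (max) per-source cost, the 'soft' variant a
    log-sum-exp smoothing of the same costs. task_losses[i] and
    domain_losses[i] are the classification loss and adversarial
    domain-confusion term for source domain i; gamma controls smoothing."""
    costs = [t + d for t, d in zip(task_losses, domain_losses)]
    if hard:
        return max(costs)
    return math.log(sum(math.exp(gamma * c) for c in costs)) / gamma
```

Because log-sum-exp upper-bounds the max, minimizing the soft objective still controls the worst-case source, while giving smoother gradients.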
Binary Rating Estimation with Graph Side Information
Title | Binary Rating Estimation with Graph Side Information |
Authors | Kwangjun Ahn, Kangwook Lee, Hyunseung Cha, Changho Suh |
Abstract | Rich experimental evidence shows that one can better estimate users’ unknown ratings with the aid of graph side information such as social graphs. However, the gain is not theoretically quantified. In this work, we study the binary rating estimation problem to understand the fundamental value of graph side information. Considering a simple correlation model between a rating matrix and a graph, we characterize the sharp threshold on the number of observed entries required to recover the rating matrix (called the optimal sample complexity) as a function of the quality of graph side information (to be detailed). To the best of our knowledge, we are the first to reveal how much the graph side information reduces sample complexity. Further, we propose a computationally efficient algorithm that achieves the limit. Our experimental results demonstrate that the algorithm performs well even with real-world graphs. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7681-binary-rating-estimation-with-graph-side-information |
http://papers.nips.cc/paper/7681-binary-rating-estimation-with-graph-side-information.pdf | |
PWC | https://paperswithcode.com/paper/binary-rating-estimation-with-graph-side |
Repo | |
Framework | |
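The intuition behind using graph side information is that users in the same graph cluster share rating behavior, so few observations per item suffice once the cluster is known. A toy estimator in that spirit (a majority vote within given clusters; the paper's actual algorithm and correlation model are more refined):

```python
def cluster_majority_ratings(observed, clusters):
    """Toy estimator: users in the same graph cluster are assumed to
    share a binary rating vector, so each item's rating is recovered by
    a majority vote over the cluster's observed entries.
    observed: dict (user, item) -> rating in {+1, -1};
    clusters: list of sets of user ids. Returns (cluster_idx, item) -> rating."""
    est = {}
    for cid, members in enumerate(clusters):
        votes = {}
        for (u, item), r in observed.items():
            if u in members:
                votes.setdefault(item, []).append(r)
        for item, vs in votes.items():
            est[(cid, item)] = 1 if sum(vs) >= 0 else -1
    return est
```

The theoretical question the paper answers is how many observed entries such pooling needs, as a function of how reliably the graph reflects the true clusters.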
UZH at CoNLL–SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection
Title | UZH at CoNLL–SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection |
Authors | Peter Makarov, Simon Clematide |
Abstract | |
Tasks | Imitation Learning, Morphological Inflection |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/K18-3008/ |
https://www.aclweb.org/anthology/K18-3008 | |
PWC | https://paperswithcode.com/paper/uzh-at-conll-sigmorphon-2018-shared-task-on |
Repo | |
Framework | |
Tübingen-Oslo system at SIGMORPHON shared task on morphological inflection. A multi-tasking multilingual sequence to sequence model.
Title | Tübingen-Oslo system at SIGMORPHON shared task on morphological inflection. A multi-tasking multilingual sequence to sequence model. |
Authors | Taraka Rama, Çağrı Çöltekin |
Abstract | |
Tasks | Data Augmentation, Morphological Inflection |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/K18-3014/ |
https://www.aclweb.org/anthology/K18-3014 | |
PWC | https://paperswithcode.com/paper/ta14bingen-oslo-system-at-sigmorphon-shared |
Repo | |
Framework | |
Model-based imitation learning from state trajectories
Title | Model-based imitation learning from state trajectories |
Authors | Subhajit Chaudhury, Daiki Kimura, Tadanobu Inoue, Ryuki Tachibana |
Abstract | Imitation learning from demonstrations usually relies on learning a policy from trajectories of optimal states and actions. However, in real-life expert demonstrations, often the action information is missing and only state trajectories are available. We present a model-based imitation learning method that can learn environment-specific optimal actions only from expert state trajectories. Our proposed method starts with a model-free reinforcement learning algorithm with a heuristic reward signal to sample environment dynamics, which is then used to train the state-transition probability. Subsequently, we learn the optimal actions from expert state trajectories by supervised learning, while back-propagating the error gradients through the modeled environment dynamics. Experimental evaluations show that our proposed method successfully achieves performance similar to (state, action) trajectory-based traditional imitation learning methods even in the absence of action information, with much fewer iterations compared to conventional model-free reinforcement learning methods. We also demonstrate that our method can learn to act from only video demonstrations of an expert agent for simple games and can learn to achieve desired performance in fewer iterations. |
Tasks | Imitation Learning |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1GDXzb0b |
https://openreview.net/pdf?id=S1GDXzb0b | |
PWC | https://paperswithcode.com/paper/model-based-imitation-learning-from-state |
Repo | |
Framework | |
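Once a forward-dynamics model has been learned, missing actions can be recovered from consecutive expert states. A discrete-action sketch of that recovery step (the paper back-propagates through a learned neural dynamics model; here `dynamics` is a given callable and the search is exhaustive):

```python
def infer_actions(states, dynamics, action_set):
    """Given an expert state trajectory and a learned forward-dynamics
    model `dynamics(s, a) -> s_next`, recover the action at each step as
    the one whose predicted next state is closest to the observed one.
    This illustrates learning to act from state-only demonstrations."""
    actions = []
    for s, s_next in zip(states, states[1:]):
        best = min(action_set,
                   key=lambda a: abs(dynamics(s, a) - s_next))
        actions.append(best)
    return actions
```

With the inferred actions in hand, a policy can then be trained by ordinary supervised learning on (state, inferred-action) pairs.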
Sequence Classification with Human Attention
Title | Sequence Classification with Human Attention |
Authors | Maria Barrett, Joachim Bingel, Nora Hollenstein, Marek Rei, Anders Søgaard |
Abstract | Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior, and in this paper, we show that human attention really does provide a good inductive bias on many attention functions in NLP. Specifically, we use estimated human attention derived from eye-tracking corpora to regularize attention functions in recurrent neural networks. We show substantial improvements across a range of tasks, including sentiment analysis, grammatical error detection, and detection of abusive language. |
Tasks | Eye Tracking, Grammatical Error Detection, Sentiment Analysis |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/K18-1030/ |
https://www.aclweb.org/anthology/K18-1030 | |
PWC | https://paperswithcode.com/paper/sequence-classification-with-human-attention |
Repo | |
Framework | |
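The regularization described above can be written as a penalty pulling the model's attention weights toward human (eye-tracking-estimated) attention. A minimal sketch, where the squared-error penalty and the trade-off weight `lam` are simplifying assumptions:

```python
def attention_regularized_loss(task_loss, model_attn, human_attn, lam=0.1):
    """Augment a task loss with a penalty on the mismatch between the
    model's attention weights and human attention estimated from
    eye-tracking data. `lam` is a hypothetical trade-off weight; the
    paper's exact formulation may differ."""
    penalty = sum((m - h) ** 2 for m, h in zip(model_attn, human_attn))
    return task_loss + lam * penalty
```

Minimizing this joint loss biases the attention function toward words humans fixate on, which acts as the inductive bias the abstract describes.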
Language Codes
Title | Language Codes |
Authors | Jennifer DeCamp |
Abstract | |
Tasks | Machine Translation |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-2001/ |
https://www.aclweb.org/anthology/W18-2001 | |
PWC | https://paperswithcode.com/paper/language-codes |
Repo | |
Framework | |
Noise-Robust Morphological Disambiguation for Dialectal Arabic
Title | Noise-Robust Morphological Disambiguation for Dialectal Arabic |
Authors | Nasser Zalmout, Alexander Erdmann, Nizar Habash |
Abstract | User-generated text tends to be noisy with many lexical and orthographic inconsistencies, making natural language processing (NLP) tasks more challenging. The challenging nature of noisy text processing is exacerbated for dialectal content, where in addition to spelling and lexical differences, dialectal text is characterized with morpho-syntactic and phonetic variations. These issues increase sparsity in NLP models and reduce accuracy. We present a neural morphological tagging and disambiguation model for Egyptian Arabic, with various extensions to handle noisy and inconsistent content. Our models achieve about 5% relative error reduction (1.1% absolute improvement) for full morphological analysis, and around 22% relative error reduction (1.8% absolute improvement) for part-of-speech tagging, over a state-of-the-art baseline. |
Tasks | Lexical Normalization, Morphological Analysis, Morphological Tagging, Part-Of-Speech Tagging |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1087/ |
https://www.aclweb.org/anthology/N18-1087 | |
PWC | https://paperswithcode.com/paper/noise-robust-morphological-disambiguation-for |
Repo | |
Framework | |
Neural Morphological Tagging of Lemma Sequences for Machine Translation
Title | Neural Morphological Tagging of Lemma Sequences for Machine Translation |
Authors | Costanza Conforti, Matthias Huck, Alexander Fraser |
Abstract | |
Tasks | Machine Translation, Morphological Tagging |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-1805/ |
https://www.aclweb.org/anthology/W18-1805 | |
PWC | https://paperswithcode.com/paper/neural-morphological-tagging-of-lemma |
Repo | |
Framework | |
Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages
Title | Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages |
Authors | Richard T. McCoy, Robert Frank |
Abstract | |
Tasks | Machine Translation, Speech Recognition, Word Alignment |
Published | 2018-01-01 |
URL | https://www.aclweb.org/anthology/W18-0311/ |
https://www.aclweb.org/anthology/W18-0311 | |
PWC | https://paperswithcode.com/paper/phonologically-informed-edit-distance |
Repo | |
Framework | |
Alternating Randomized Block Coordinate Descent
Title | Alternating Randomized Block Coordinate Descent |
Authors | Jelena Diakonikolas, Lorenzo Orecchia |
Abstract | Block-coordinate descent algorithms and alternating minimization methods are fundamental optimization algorithms and an important primitive in large-scale optimization and machine learning. While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization – which applies to the setting of only two blocks – is known to have convergence time that scales independently of the least smooth block. A natural question is then: is the setting of two blocks special? We show that the answer is “no” as long as the least smooth block can be optimized exactly – an assumption that is also needed in the setting of alternating minimization. We do so by introducing a novel algorithm AR-BCD, whose convergence time scales independently of the least smooth (possibly non-smooth) block. The basic algorithm generalizes both alternating minimization and randomized block coordinate (gradient) descent, and we also provide its accelerated version – AAR-BCD. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2445 |
http://proceedings.mlr.press/v80/diakonikolas18a/diakonikolas18a.pdf | |
PWC | https://paperswithcode.com/paper/alternating-randomized-block-coordinate |
Repo | |
Framework | |
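The AR-BCD scheme described above alternates a gradient step on one randomly chosen smooth block with exact minimization over the designated least-smooth block. A small sketch on a coupled quadratic, where the block structure, step size, and objective are illustrative choices:

```python
import random

def ar_bcd(grad_blocks, exact_block_solve, x, iters=200, lr=0.25, seed=0):
    """Sketch of alternating randomized block coordinate descent:
    each iteration takes a gradient step on one randomly chosen smooth
    block, then minimizes exactly over the special (least smooth) block.
    grad_blocks: list of functions g_i(x) -> partial derivative for
    smooth block i; exact_block_solve: x -> optimal value of the special
    block, stored as the last coordinate of x."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iters):
        i = rng.randrange(len(grad_blocks))
        x[i] -= lr * grad_blocks[i](x)       # random smooth-block step
        x[-1] = exact_block_solve(x)         # exact solve on special block
    return x
```

On f(x0, x1, y) = (x0-1)^2 + (x1-2)^2 + (y - x0 - x1)^2 with y as the exactly-solved block, the iterates converge to (1, 2, 3), and the rate does not depend on the smoothness of the y-block, mirroring the paper's claim.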