October 15, 2019


Paper Group NANR 251

Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods. Extreme Learning to Rank via Low Rank Assumption. Domain transfer through deep activation matching. UCL Machine Reading Group: Four Factor Framework For Fact Finding (HexaF). Recognizing Complex Entity Mentions: A Review and Future Directions. A Conditional Gradient Framework …

Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods


Title	Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods
Authors	Junhong Lin, Volkan Cevher
Abstract	We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not too large. Our results are superior to the state-of-the-art theory, covering the cases that the regression function may not be in the hypothesis spaces. Particularly, our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed kernel ridge regression (KRR) and classic SGM.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2484
PDF	http://proceedings.mlr.press/v80/lin18a/lin18a.pdf
PWC	https://paperswithcode.com/paper/optimal-distributed-learning-with-multi-pass
Repo
Framework
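
The divide-and-conquer idea in this abstract can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: a plain linear model stands in for the RKHS estimator, the local solutions are combined by simple averaging, and all function names, step sizes and data are hypothetical.

```python
import random

def sgd_least_squares(data, passes, step, batch_size, rng):
    """Mini-batch SGD on the squared loss of a linear model, with several passes over the data."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(passes):
        order = list(range(len(data)))
        rng.shuffle(order)  # a fresh random order for every pass
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            grad = [0.0] * dim
            for x, y in batch:
                err = sum(wj * xj for wj, xj in zip(w, x)) - y
                for j in range(dim):
                    grad[j] += err * x[j] / len(batch)
            w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w

def distributed_sgd(data, num_parts, **sgd_args):
    """Split the data into num_parts subsets, run SGD on each, and average the local solutions."""
    parts = [data[i::num_parts] for i in range(num_parts)]
    local = [sgd_least_squares(part, **sgd_args) for part in parts]
    dim = len(local[0])
    return [sum(w[j] for w in local) / num_parts for j in range(dim)]

# Synthetic regression data (hypothetical): y = 2*x0 - x1 + small noise.
rng = random.Random(0)
true_w = [2.0, -1.0]
data = []
for _ in range(400):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0, 0.01)
    data.append((x, y))

w_bar = distributed_sgd(data, num_parts=4, passes=20, step=0.5, batch_size=10, rng=rng)
print(w_bar)
```

Increasing num_parts shrinks each local subset, which is the "partition level" trade-off the abstract refers to.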

Extreme Learning to Rank via Low Rank Assumption


Title	Extreme Learning to Rank via Low Rank Assumption
Authors	Minhao Cheng, Ian Davidson, Cho-Jui Hsieh
Abstract	We consider the setting where we wish to perform ranking for hundreds of thousands of users which is common in recommender systems and web search ranking. Learning a single ranking function is unlikely to capture the variability across all users while learning a ranking function for each person is time-consuming and requires large amounts of data from each user. To address this situation, we propose a Factorization RankSVM algorithm which learns a series of k basic ranking functions and then constructs for each user a local ranking function that is a combination of them. We develop a fast algorithm to reduce the time complexity of gradient descent solver by exploiting the low-rank structure, and the resulting algorithm is much faster than existing methods. Furthermore, we prove that the generalization error of the proposed method can be significantly better than training individual RankSVMs. Finally, we present some interesting patterns in the principal ranking functions learned by our algorithms.
Tasks	Learning-To-Rank, Recommendation Systems
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2102
PDF	http://proceedings.mlr.press/v80/cheng18a/cheng18a.pdf
PWC	https://paperswithcode.com/paper/extreme-learning-to-rank-via-low-rank
Repo
Framework
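
As a rough illustration of the low-rank assumption, the sketch below scores items for a user as a weighted combination of k shared linear ranking functions. It covers only the scoring step, not the RankSVM training the abstract describes, and every name and number in it is hypothetical.

```python
def basis_scores(basis, x):
    """Evaluate each of the k shared linear ranking functions on item features x."""
    return [sum(b_j * x_j for b_j, x_j in zip(b, x)) for b in basis]

def user_score(mix, basis, x):
    """A user's score is a weighted combination of the k basis scores."""
    return sum(m * s for m, s in zip(mix, basis_scores(basis, x)))

def rank_items(mix, basis, items):
    """Return item indices sorted from highest to lowest score for this user."""
    scores = [user_score(mix, basis, x) for x in items]
    return sorted(range(len(items)), key=lambda i: -scores[i])

# Two basis functions over two item features (toy values).
basis = [[1.0, 0.0], [0.0, 1.0]]
items = [[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

rank_a = rank_items([1.0, 0.0], basis, items)  # user who relies only on the first function
rank_b = rank_items([0.2, 0.8], basis, items)  # user who mostly follows the second function
print(rank_a, rank_b)
```

Each user needs only k mixing weights instead of a full weight vector, which is where the savings in data and computation would come from under this assumption.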

Domain transfer through deep activation matching


Title	Domain transfer through deep activation matching
Authors	Haoshuo Huang, Qixing Huang, Philipp Krahenbuhl
Abstract	We introduce a layer-wise unsupervised domain adaptation approach for the task of semantic segmentation. Instead of merely matching the output distributions of the source and target domains, our approach aligns the distributions of activations of intermediate layers. This scheme exhibits two key advantages. First, matching across intermediate layers introduces more constraints for training the network in the target domain, making the optimization problem better conditioned. Second, the matched activations at each layer provide similar inputs to the next layer for both training and adaptation, and thus alleviate covariate shift. We use a Generative Adversarial Network (or GAN) to align activation distributions. Experimental results show that our approach achieves state-of-the-art results on a variety of popular domain adaptation tasks, including (1) from GTA to Cityscapes for semantic segmentation, (2) from SYNTHIA to Cityscapes for semantic segmentation, and (3) adaptations on USPS and MNIST for image classification.
Tasks	Domain Adaptation, Image Classification, Semantic Segmentation, Unsupervised Domain Adaptation
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Haoshuo_Huang_Domain_transfer_through_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Haoshuo_Huang_Domain_transfer_through_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/domain-transfer-through-deep-activation
Repo
Framework

UCL Machine Reading Group: Four Factor Framework For Fact Finding (HexaF)


Title	UCL Machine Reading Group: Four Factor Framework For Fact Finding (HexaF)
Authors	Takuma Yoneda, Jeff Mitchell, Johannes Welbl, Pontus Stenetorp, Sebastian Riedel
Abstract	In this paper we describe our 2nd place FEVER shared-task system that achieved a FEVER score of 62.52% on the provisional test set (without additional human evaluation), and 65.41% on the development set. Our system is a four-stage model consisting of document retrieval, sentence retrieval, natural language inference and aggregation. Retrieval is performed leveraging task-specific features, and then a natural language inference model takes each of the retrieved sentences paired with the claimed fact. The resulting predictions are aggregated across retrieved sentences with a Multi-Layer Perceptron, and re-ranked corresponding to the final prediction.
Tasks	Information Retrieval, Natural Language Inference, Reading Comprehension
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-5515/
PDF	https://www.aclweb.org/anthology/W18-5515
PWC	https://paperswithcode.com/paper/ucl-machine-reading-group-four-factor
Repo
Framework

Recognizing Complex Entity Mentions: A Review and Future Directions


Title	Recognizing Complex Entity Mentions: A Review and Future Directions
Authors	Xiang Dai
Abstract	Standard named entity recognizers can effectively recognize entity mentions that consist of contiguous tokens and do not overlap with each other. However, in practice, there are many domains, such as the biomedical domain, in which there are nested, overlapping, and discontinuous entity mentions. These complex mentions cannot be directly recognized by conventional sequence tagging models because they may break the assumptions on which sequence tagging techniques are built. We review the existing methods which are revised to tackle complex entity mentions and categorize them as token-level and sentence-level approaches. We then identify the research gap, and discuss some directions that we are exploring.
Tasks	Entity Linking, Named Entity Recognition, Question Answering, Relation Extraction
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-3006/
PDF	https://www.aclweb.org/anthology/P18-3006
PWC	https://paperswithcode.com/paper/recognizing-complex-entity-mentions-a-review
Repo
Framework

A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming


Title	A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming
Authors	Alp Yurtsever, Olivier Fercoq, Francesco Locatello, Volkan Cevher
Abstract	We propose a conditional gradient framework for a composite convex minimization template with broad applications. Our approach combines smoothing and homotopy techniques under the CGM framework, and provably achieves the optimal convergence rate. We demonstrate that the same rate holds if the linear subproblems are solved approximately with additive or multiplicative error. In contrast with the relevant work, we are able to characterize the convergence when the non-smooth term is an indicator function. Specific applications of our framework include the non-smooth minimization, semidefinite programming, and minimization with linear inclusion constraints over a compact domain. Numerical evidence demonstrates the benefits of our framework.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2152
PDF	http://proceedings.mlr.press/v80/yurtsever18a/yurtsever18a.pdf
PWC	https://paperswithcode.com/paper/a-conditional-gradient-framework-for
Repo
Framework
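
For readers unfamiliar with the conditional gradient method (CGM, also known as Frank-Wolfe) that this framework builds on, here is a minimal sketch of the plain method on the probability simplex. It does not include the smoothing and homotopy techniques the abstract describes; the objective, the starting point and the 2/(k+2) step size are standard textbook choices assumed here for illustration.

```python
def frank_wolfe_simplex(grad_fn, dim, num_iters):
    """Plain conditional gradient (Frank-Wolfe) over the probability simplex."""
    x = [1.0] + [0.0] * (dim - 1)  # start at a vertex of the simplex
    for k in range(num_iters):
        g = grad_fn(x)
        # Linear subproblem: over the simplex, the minimizer of <g, s> is the
        # vertex at the coordinate with the smallest gradient entry.
        i_min = min(range(dim), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)  # classic open-loop step size
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x

# Toy problem: minimize 0.5 * ||x - b||^2 over the simplex, with b inside the simplex.
b = [0.1, 0.6, 0.3]
x_fw = frank_wolfe_simplex(lambda x: [xi - bi for xi, bi in zip(x, b)], dim=3, num_iters=2000)
print(x_fw)
```

Each iteration only solves a linear problem over the feasible set and never projects, which is why the method is attractive for semidefinite programming, where the linear subproblem reduces to an extreme eigenvector computation.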

Explaining the Mistakes of Neural Networks with Latent Sympathetic Examples


Title	Explaining the Mistakes of Neural Networks with Latent Sympathetic Examples
Authors	Riaan Zoetmulder, Efstratios Gavves, Peter O’Connor
Abstract	Neural networks make mistakes. The reason why a mistake is made often remains a mystery. As such, neural networks are often considered a black box. It would be useful to have a method that can give an explanation that is intuitive to a user as to why an image is misclassified. In this paper we develop a method for explaining the mistakes of a classifier model by visually showing what must be added to an image such that it is correctly classified. Our work combines the fields of adversarial examples, generative modeling and a correction technique based on difference target propagation to create a technique that explains why an image is misclassified. In this paper we explain our method and demonstrate it on MNIST and CelebA. This approach could aid in demystifying neural networks for a user.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=S1EzRgb0W
PDF	https://openreview.net/pdf?id=S1EzRgb0W
PWC	https://paperswithcode.com/paper/explaining-the-mistakes-of-neural-networks
Repo
Framework

Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text


Title	Handling Normalization Issues for Part-of-Speech Tagging of Online Conversational Text
Authors	Géraldine Damnati, Jeremy Auguste, Alexis Nasr, Delphine Charlet, Johannes Heinecke, Frédéric Béchet
Abstract
Tasks	Lexical Normalization, Part-Of-Speech Tagging, Word Embeddings
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1014/
PDF	https://www.aclweb.org/anthology/L18-1014
PWC	https://paperswithcode.com/paper/handling-normalization-issues-for-part-of
Repo
Framework

Let’s be Honest: An Optimal No-Regret Framework for Zero-Sum Games


Title	Let’s be Honest: An Optimal No-Regret Framework for Zero-Sum Games
Authors	Ehsan Asadi Kangarshahi, Ya-Ping Hsieh, Mehmet Fatih Sahin, Volkan Cevher
Abstract	We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves the open problem of removing the logarithmic terms in convergence to the value of the game. We achieve this goal in three steps. First, we provide a novel analysis of the optimistic mirror descent (OMD), showing that it can be modified to guarantee fast convergence for both honest regret and value of the game, when the players are playing collaboratively. Second, we propose a new algorithm, dubbed as robust optimistic mirror descent (ROMD), which attains optimal adversarial regret without knowing the time horizon beforehand. Finally, we propose a simple signaling scheme, which enables us to bridge OMD and ROMD to achieve the best of both worlds. Numerical examples are presented to support our theoretical claims and show that our non-adaptive ROMD algorithm can be competitive to OMD with adaptive step-size selection.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=1975
PDF	http://proceedings.mlr.press/v80/kangarshahi18a/kangarshahi18a.pdf
PWC	https://paperswithcode.com/paper/lets-be-honest-an-optimal-no-regret-framework
Repo
Framework
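
To make the optimistic update concrete, the sketch below runs optimistic multiplicative weights (the entropic instance of optimistic mirror descent) for both players of a small matrix game. It is not the paper's ROMD algorithm or its signaling scheme; the game, starting points, step size and helper names are all assumptions chosen for illustration.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def optimistic_hedge_step(p, loss, prev_loss, eta):
    """Multiplicative-weights step using the optimistic loss estimate 2*loss - prev_loss."""
    return normalize([pi * math.exp(-eta * (2.0 * l - pl))
                      for pi, l, pl in zip(p, loss, prev_loss)])

def solve_zero_sum(A, x, y, eta, num_rounds):
    """Row player minimizes x^T A y, column player maximizes it; returns average strategies."""
    n, m = len(A), len(A[0])
    prev_lx, prev_ly = [0.0] * n, [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(num_rounds):
        avg_x = [a + xi / num_rounds for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / num_rounds for a, yj in zip(avg_y, y)]
        # Both losses are computed from the current strategies before either player moves.
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        x, prev_lx = optimistic_hedge_step(x, loss_x, prev_lx, eta), loss_x
        y, prev_ly = optimistic_hedge_step(y, loss_y, prev_ly, eta), loss_y
    return avg_x, avg_y

def duality_gap(A, x, y):
    """Best-response payoff against x minus best-response payoff against y (zero at equilibrium)."""
    n, m = len(A), len(A[0])
    best_for_column = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_row = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_for_column - best_for_row

# Matching pennies: the unique equilibrium is uniform play and the game value is 0.
A = [[1.0, -1.0], [-1.0, 1.0]]
avg_x, avg_y = solve_zero_sum(A, [0.9, 0.1], [0.2, 0.8], eta=0.1, num_rounds=5000)
gap = duality_gap(A, avg_x, avg_y)
print(avg_x, avg_y, gap)
```

The only change from ordinary multiplicative weights is that the most recent loss is counted twice and the previous one is subtracted, which acts as a prediction of the next loss when both players change slowly.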

UWB at IEST 2018: Emotion Prediction in Tweets with Bidirectional Long Short-Term Memory Neural Network


Title	UWB at IEST 2018: Emotion Prediction in Tweets with Bidirectional Long Short-Term Memory Neural Network
Authors	Pavel Přibáň, Jiří Martínek
Abstract	This paper describes our system created for the WASSA 2018 Implicit Emotion Shared Task. The goal of this task is to predict the emotion of a given tweet, from which a certain emotion word is removed. The removed word can be sad, happy, disgusted, angry, afraid or a synonym of one of them. Our proposed system is based on deep-learning methods. We use Bidirectional Long Short-Term Memory (BiLSTM) with word embeddings as an input. Pre-trained DeepMoji model and pre-trained emoji2vec emoji embeddings are also used as additional inputs. Our system achieves 0.657 macro F1 score and our rank is 13th out of 30.
Tasks	Sentiment Analysis, Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-6232/
PDF	https://www.aclweb.org/anthology/W18-6232
PWC	https://paperswithcode.com/paper/uwb-at-iest-2018-emotion-prediction-in-tweets
Repo
Framework

Deep Learning for Social Media Health Text Classification


Title	Deep Learning for Social Media Health Text Classification
Authors	Santosh Tokala, Vaibhav Gambhir, Animesh Mukherjee
Abstract	This paper describes the systems developed for the 1st and 2nd tasks of the 3rd Social Media Mining for Health Applications Shared Task at EMNLP 2018. The first task focuses on automatic detection of posts mentioning a drug name or dietary supplement, a binary classification. The second task is about distinguishing the tweets that present personal medication intake, possible medication intake and non-intake. We performed extensive experiments with various classifiers like Logistic Regression, Random Forest, SVMs, Gradient Boosted Decision Trees (GBDT) and deep learning architectures such as Long Short-Term Memory Networks (LSTM), joint Convolutional Neural Network (CNN) and LSTM architectures, and attention-based LSTM architectures, both at word and character level. We have also explored using various pre-trained embeddings like Global Vectors for Word Representation (GloVe), Word2Vec and task-specific embeddings learned using CNN-LSTM and LSTMs.
Tasks	Speech Recognition, Text Classification, Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5917/
PDF	https://www.aclweb.org/anthology/W18-5917
PWC	https://paperswithcode.com/paper/deep-learning-for-social-media-health-text
Repo
Framework

Phonetic Vector Representations for Sound Sequence Alignment


Title	Phonetic Vector Representations for Sound Sequence Alignment
Authors	Pavel Sofroniev, Çağrı Çöltekin
Abstract	This study explores a number of data-driven vector representations of the IPA-encoded sound segments for the purpose of sound sequence alignment. We test the alternative representations based on the alignment accuracy in the context of computational historical linguistics. We show that the data-driven methods consistently do better than linguistically-motivated articulatory-acoustic features. The similarity scores obtained using the data-driven representations in a monolingual context, however, perform worse than the state-of-the-art distance (or similarity) scoring methods proposed in earlier studies of computational historical linguistics. We also show that adapting representations to the task at hand improves the results, yielding alignment accuracy comparable to the state-of-the-art methods.
Tasks
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5812/
PDF	https://www.aclweb.org/anthology/W18-5812
PWC	https://paperswithcode.com/paper/phonetic-vector-representations-for-sound
Repo
Framework
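
One way vector representations of sound segments can drive an alignment is sketched below: a standard global (Needleman-Wunsch) alignment in which the match score is the cosine similarity of the two segment vectors. The abstract does not specify the alignment procedure, so the algorithm choice, the gap penalty and the two-dimensional toy vectors are all assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def align(seq_a, seq_b, vectors, gap=-0.5):
    """Global alignment; returns pairs (a, b), with None marking a gap."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = cosine(vectors[seq_a[i - 1]], vectors[seq_b[j - 1]])
            options = [
                (score[i - 1][j - 1] + sim, "diag"),  # align the two segments
                (score[i - 1][j] + gap, "up"),        # segment of seq_a against a gap
                (score[i][j - 1] + gap, "left"),      # segment of seq_b against a gap
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Follow the back-pointers from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((seq_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, seq_b[j - 1]))
            j -= 1
    return pairs[::-1]

# Hypothetical 2-d segment vectors: similar sounds get nearby directions.
vectors = {
    "p": [1.0, 0.1], "b": [0.9, 0.3],
    "a": [0.1, 1.0],
    "t": [-1.0, 0.2], "d": [-0.9, 0.4],
}
aligned_same_length = align(list("pat"), list("bad"), vectors)
aligned_with_gap = align(list("pat"), list("at"), vectors)
print(aligned_same_length)
print(aligned_with_gap)
```

In this setup, swapping in a different set of vectors changes the similarity scores without touching the alignment code, which is the kind of comparison across representations the abstract reports.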

Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop


Title	Conditional Word Embedding and Hypothesis Testing via Bayes-by-Backprop
Authors	Rujun Han, Michael Gill, Arthur Spirling, Kyunghyun Cho
Abstract	Conventional word embedding models do not leverage information from document meta-data, and they do not model uncertainty. We address these concerns with a model that incorporates document covariates to estimate conditional word embedding distributions. Our model allows for (a) hypothesis tests about the meanings of terms, (b) assessments as to whether a word is near or far from another conditioned on different covariate values, and (c) assessments as to whether estimated differences are statistically significant.
Tasks	Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1527/
PDF	https://www.aclweb.org/anthology/D18-1527
PWC	https://paperswithcode.com/paper/conditional-word-embedding-and-hypothesis
Repo
Framework

FOI DSS at SemEval-2018 Task 1: Combining LSTM States, Embeddings, and Lexical Features for Affect Analysis


Title	FOI DSS at SemEval-2018 Task 1: Combining LSTM States, Embeddings, and Lexical Features for Affect Analysis
Authors	Maja Karasalo, Mattias Nilsson, Magnus Rosell, Ulrika Wickenberg Bolin
Abstract	This paper describes the system used and results obtained for team FOI DSS at SemEval-2018 Task 1: Affect In Tweets. The team participated in all English language subtasks, with a method utilizing transfer learning from LSTM nets trained on large sentiment datasets combined with embeddings and lexical features. For four out of five subtasks, the system performed in the range of 92-95% of the winning systems, in terms of the competition metrics. Analysis of the results suggests that improved pre-processing and addition of more lexical features may further elevate performance.
Tasks	Emotion Classification, Transfer Learning
Published	2018-06-01
URL	https://www.aclweb.org/anthology/S18-1014/
PDF	https://www.aclweb.org/anthology/S18-1014
PWC	https://paperswithcode.com/paper/foi-dss-at-semeval-2018-task-1-combining-lstm
Repo
Framework

SADAGRAD: Strongly Adaptive Stochastic Gradient Methods


Title	SADAGRAD: Strongly Adaptive Stochastic Gradient Methods
Authors	Zaiyi Chen, Yi Xu, Enhong Chen, Tianbao Yang
Abstract	Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition, their iteration complexities have an explicitly linear dependence on the dimensionality of the problem. To alleviate this bad dependence, we propose a simple yet novel variant of ADAGRAD for stochastic (weakly) strongly convex optimization. Different from existing variants, the proposed variant (referred to as SADAGRAD) uses an adaptive restarting scheme in which (i) ADAGRAD serves as a sub-routine and is restarted periodically; (ii) the number of iterations for restarting ADAGRAD depends on the history of learning that incorporates knowledge of the geometry of the data. In addition to the adaptive proximal functions and adaptive number of iterations for restarting, we also develop a variant that is adaptive to the (implicit) strong convexity from the data, which together makes the proposed algorithm strongly adaptive. In terms of iteration complexity, in the worst case SADAGRAD has an O(1/ε) bound for finding an ε-optimal solution, similar to other variants. However, it could enjoy faster convergence and much better dependence on the problem’s dimensionality when stochastic gradients are sparse. Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and stochastic gradient method.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2010
PDF	http://proceedings.mlr.press/v80/chen18m/chen18m.pdf
PWC	https://paperswithcode.com/paper/sadagrad-strongly-adaptive-stochastic
Repo
Framework
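
The idea of running ADAGRAD as a periodically restarted sub-routine can be sketched as follows. This is not SADAGRAD itself: the fixed epoch length and the halving of the step size between epochs are illustrative assumptions standing in for the paper's adaptive rule, and the test problem and all names are hypothetical.

```python
import math
import random

def adagrad_epoch(grad_fn, x0, step, num_iters, eps=1e-8):
    """Run diagonal AdaGrad from x0 and return the average of the iterates."""
    x = list(x0)
    accum = [0.0] * len(x)  # per-coordinate running sum of squared gradients
    avg = [0.0] * len(x)
    for _ in range(num_iters):
        g = grad_fn(x)
        for j in range(len(x)):
            accum[j] += g[j] * g[j]
            x[j] -= step * g[j] / (math.sqrt(accum[j]) + eps)
            avg[j] += x[j] / num_iters
    return avg

def restarted_adagrad(grad_fn, x0, step, epochs, epoch_len):
    """Restart AdaGrad from the previous epoch's average, resetting its accumulators."""
    x = list(x0)
    for _ in range(epochs):
        x = adagrad_epoch(grad_fn, x, step, epoch_len)
        step /= 2.0  # assumed schedule, not taken from the paper
    return x

# Strongly convex toy problem with noisy gradients.
rng = random.Random(0)
target = [1.0, -2.0]
curv = [1.0, 10.0]

def noisy_grad(x):
    return [c * (xi - t) + rng.gauss(0.0, 0.1) for c, xi, t in zip(curv, x, target)]

def objective(x):
    return 0.5 * sum(c * (xi - t) ** 2 for c, xi, t in zip(curv, x, target))

x_final = restarted_adagrad(noisy_grad, [0.0, 0.0], step=0.5, epochs=5, epoch_len=200)
print(x_final, objective(x_final))
```

Resetting the accumulated squared gradients at each restart lets the per-coordinate step sizes adapt to the region near the current solution rather than to the whole history.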