Paper Group NANR 273
Distributed Clustering via LSH Based Data Partitioning. Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases. Fast Node Embeddings: Learning Ego-Centric Representations. ECNU at SemEval-2018 Task 10: Evaluating Simple but Effective Features on Machine Learning Methods for Semantic …
Distributed Clustering via LSH Based Data Partitioning
Title | Distributed Clustering via LSH Based Data Partitioning |
Authors | Aditya Bhaskara, Maheshakya Wijewardena |
Abstract | Given the importance of clustering in the analysis of large scale data, distributed algorithms for formulations such as k-means, k-median, etc. have been extensively studied. A successful approach here has been the “reduce and merge” paradigm, in which each machine reduces its input size to Õ(k), and this data reduction continues (possibly iteratively) until all the data fits on one machine, at which point the problem is solved locally. This approach has the intrinsic bottleneck that each machine must solve a problem of size $\geq$ k, and needs to communicate at least $\Omega$(k) points to the other machines. We propose a novel data partitioning idea to overcome this bottleneck, and in effect, have different machines focus on “finding different clusters”. Under the assumption that we know the optimum value of the objective up to a poly(n) factor (arbitrary polynomial), we establish worst-case approximation guarantees for our method. We see that our algorithm results in lower communication as well as a near-optimal number of ‘rounds’ of computation (in the popular MapReduce framework). |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2443 |
http://proceedings.mlr.press/v80/bhaskara18a/bhaskara18a.pdf | |
PWC | https://paperswithcode.com/paper/distributed-clustering-via-lsh-based-data |
Repo | |
Framework | |
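The partitioning idea in the abstract above can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing, which is not necessarily the hash family the authors use, and the names `make_hyperplanes`, `lsh_bucket`, and `partition` are hypothetical. The point is only that nearby points tend to share a bucket, so each machine receives points from a few clusters rather than a sample of all of them.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_bucket(point, hyperplanes):
    """Hash a point to an integer: one sign bit per hyperplane."""
    bucket = 0
    for plane in hyperplanes:
        dot = sum(p * h for p, h in zip(point, plane))
        bucket = (bucket << 1) | (1 if dot >= 0 else 0)
    return bucket


def partition(points, hyperplanes, num_machines):
    """Assign each point to a machine based on its LSH bucket (assumed scheme)."""
    shards = {m: [] for m in range(num_machines)}
    for point in points:
        machine = lsh_bucket(point, hyperplanes) % num_machines
        shards[machine].append(point)
    return shards
```

Each shard could then be clustered locally; how the paper merges the local solutions and obtains its guarantees is not covered by this sketch.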
Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases
Title | Lexical and Semantic Features for Cross-lingual Text Reuse Classification: an Experiment in English and Latin Paraphrases |
Authors | Maria Moritz, David Steding |
Abstract | |
Tasks | Semantic Textual Similarity, Word Embeddings, Word Sense Disambiguation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1311/ |
https://www.aclweb.org/anthology/L18-1311 | |
PWC | https://paperswithcode.com/paper/lexical-and-semantic-features-for-cross |
Repo | |
Framework | |
Fast Node Embeddings: Learning Ego-Centric Representations
Title | Fast Node Embeddings: Learning Ego-Centric Representations |
Authors | Tiago Pimentel, Adriano Veloso, Nivio Ziviani |
Abstract | Representation learning is one of the foundations of Deep Learning and allowed important improvements on several Machine Learning tasks, such as Neural Machine Translation, Question Answering and Speech Recognition. Recent works have proposed new methods for learning representations for nodes and edges in graphs. Several of these methods are based on the SkipGram algorithm, and they usually process a large number of multi-hop neighbors in order to produce the context from which node representations are learned. In this paper, we propose an effective and also efficient method for generating node embeddings in graphs that employs a restricted number of permutations over the immediate neighborhood of a node as context to generate its representation, thus ego-centric representations. We present a thorough evaluation showing that our method outperforms state-of-the-art methods in six different datasets related to the problems of link prediction and node classification, being one to three orders of magnitude faster than baselines when generating node embeddings for very large graphs. |
Tasks | Link Prediction, Machine Translation, Node Classification, Question Answering, Representation Learning, Speech Recognition |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=SJyfrl-0b |
https://openreview.net/pdf?id=SJyfrl-0b | |
PWC | https://paperswithcode.com/paper/fast-node-embeddings-learning-ego-centric |
Repo | |
Framework | |
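As a rough illustration of the context construction described in the abstract above, the sketch below samples a small, fixed number of permutations of each node's immediate neighbors. The function name `ego_contexts`, the adjacency-dict input, and the default of two permutations are assumptions for illustration; the abstract does not say how the sequences are fed to the SkipGram-style trainer, so that step is omitted.

```python
import random


def ego_contexts(adjacency, num_permutations=2, seed=0):
    """Return (node, shuffled_neighbors) pairs, a few per node.

    adjacency maps each node to a list of its immediate neighbors.
    """
    rng = random.Random(seed)
    contexts = []
    for node, neighbors in adjacency.items():
        for _ in range(num_permutations):
            shuffled = list(neighbors)  # copy so the input graph is untouched
            rng.shuffle(shuffled)
            contexts.append((node, shuffled))
    return contexts
```

Because only one-hop neighbors are visited, the number of generated contexts grows with the number of nodes times `num_permutations`, rather than with the number of multi-hop walks.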
ECNU at SemEval-2018 Task 10: Evaluating Simple but Effective Features on Machine Learning Methods for Semantic Difference Detection
Title | ECNU at SemEval-2018 Task 10: Evaluating Simple but Effective Features on Machine Learning Methods for Semantic Difference Detection |
Authors | Yunxiao Zhou, Man Lan, Yuanbin Wu |
Abstract | This paper describes the system we submitted to Task 10 (Capturing Discriminative Attributes) in SemEval 2018. Given a triple (word1, word2, attribute), this task is to predict whether it exemplifies a semantic difference or not. We design and investigate several word embedding features, PMI features and WordNet features together with supervised machine learning methods to address this task. Officially released results show that our system ranks above average. |
Tasks | Feature Engineering, Machine Translation, Semantic Textual Similarity, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1165/ |
https://www.aclweb.org/anthology/S18-1165 | |
PWC | https://paperswithcode.com/paper/ecnu-at-semeval-2018-task-10-evaluating |
Repo | |
Framework | |
Improving User Impression in Spoken Dialog System with Gradual Speech Form Control
Title | Improving User Impression in Spoken Dialog System with Gradual Speech Form Control |
Authors | Yukiko Kageyama, Yuya Chiba, Takashi Nose, Akinori Ito |
Abstract | This paper examines a method to improve the user impression of a spoken dialog system by introducing a mechanism that gradually changes form of utterances every time the user uses the system. In some languages, including Japanese, the form of utterances changes corresponding to social relationship between the talker and the listener. Thus, this mechanism can be effective to express the system's intention to make social distance to the user closer; however, an actual effect of this method is not investigated enough when introduced to the dialog system. In this paper, we conduct dialog experiments and show that controlling the form of system utterances can improve the users' impression. |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-5026/ |
https://www.aclweb.org/anthology/W18-5026 | |
PWC | https://paperswithcode.com/paper/improving-user-impression-in-spoken-dialog |
Repo | |
Framework | |
AmritaNLP at SemEval-2018 Task 10: Capturing discriminative attributes using convolution neural network over global vector representation.
Title | AmritaNLP at SemEval-2018 Task 10: Capturing discriminative attributes using convolution neural network over global vector representation. |
Authors | Vivek Vinayan, Anand Kumar M, Soman K P |
Abstract | The “Capturing Discriminative Attributes” shared task is the tenth task of SemEval 2018. The task is to predict if a word can capture distinguishing attributes of one word from another. We use GloVe word embeddings, pre-trained on an openly sourced corpus, for this task. A base representation is initially established over varied dimensions. These representations are evaluated based on validation scores over two models, first on an SVM based classifier and second on a one-dimensional CNN model. The scores are used to further develop the representation with vector combinations, by considering various distance measures. These measures correspond to offset vectors which are concatenated as features, mainly to improve upon the F1 score, with the best accuracy. The features are then further tuned on the validation scores, to achieve the highest F1 score. Our evaluation narrowed down to two representations, classified on CNN models, having a total dimension length of 1204 & 1203 for the final submissions. Of the two, the latter feature representation delivered our best F1 score of 0.658024 (as per result). |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1166/ |
https://www.aclweb.org/anthology/S18-1166 | |
PWC | https://paperswithcode.com/paper/amritanlp-at-semeval-2018-task-10-capturing |
Repo | |
Framework | |
UMDuluth-CS8761 at SemEval-2018 Task 2: Emojis: Too many Choices?
Title | UMDuluth-CS8761 at SemEval-2018 Task 2: Emojis: Too many Choices? |
Authors | Jonathan Beaulieu, Dennis Asamoah Owusu |
Abstract | In this paper, we present our system for assigning an emoji to a tweet based on the text. Each tweet was originally posted with an emoji which the task providers removed. Our task was to decide, out of 20 emojis, which one originally came with the tweet. Two datasets were provided, one in English and the other in Spanish. We treated the task as a standard classification task with the emojis as our classes and the tweets as our documents. Our best performing system used a Bag of Words model with a Linear Support Vector Machine as its classifier. We achieved a macro F1 score of 32.73% for the English data and 17.98% for the Spanish data. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1061/ |
https://www.aclweb.org/anthology/S18-1061 | |
PWC | https://paperswithcode.com/paper/umduluth-cs8761-at-semeval-2018-task-2-emojis |
Repo | |
Framework | |
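The macro F1 score reported in the abstract above averages per-class F1 with equal weight for every class. A minimal sketch follows, assuming the label set is the union of gold and predicted labels; the task's official scorer may define the label set differently, and `macro_f1` is a hypothetical helper, not the organizers' code.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class counts equally, a classifier that does well only on the most frequent emojis is penalized more under macro F1 than under plain accuracy.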
Learning to Exploit Stability for 3D Scene Parsing
Title | Learning to Exploit Stability for 3D Scene Parsing |
Authors | Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu |
Abstract | Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue which we exploit to enhance scene understanding. We integrate the physical cue of stability into the learning process using a REINFORCE approach coupled to a physics engine, and apply this to the problem of producing the 3D bounding boxes and poses of objects in a scene. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, learning to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that applying physics supervision on unlabeled real images improves real domain transfer of models trained on synthetic data. |
Tasks | Scene Parsing, Scene Understanding |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing |
http://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-exploit-stability-for-3d-scene |
Repo | |
Framework | |
UMD at SemEval-2018 Task 10: Can Word Embeddings Capture Discriminative Attributes?
Title | UMD at SemEval-2018 Task 10: Can Word Embeddings Capture Discriminative Attributes? |
Authors | Alexander Zhang, Marine Carpuat |
Abstract | We describe the University of Maryland's submission to SemEval-2018 Task 10, “Capturing Discriminative Attributes”: given word triples (w1, w2, d), the goal is to determine whether d is a discriminating attribute belonging to w1 but not w2. Our study aims to determine whether word embeddings can address this challenging task. Our submission casts this problem as supervised binary classification using only word embedding features. Using a Gaussian SVM model trained only on validation data results in an F-score of 60%. We also show that cosine similarity features are more effective, both in unsupervised systems (F-score of 65%) and supervised systems (F-score of 67%). |
Tasks | Semantic Textual Similarity, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1170/ |
https://www.aclweb.org/anthology/S18-1170 | |
PWC | https://paperswithcode.com/paper/umd-at-semeval-2018-task-10-can-word |
Repo | |
Framework | |
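One way to picture an unsupervised cosine-similarity system for the (w1, w2, d) triples described above is a simple margin rule: call d discriminative when it is noticeably closer to w1 than to w2 in embedding space. This is a sketch under assumptions, not the submitted system; the `margin` value, the function names, and the embedding lookup `emb` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_discriminative(emb, w1, w2, d, margin=0.1):
    """Assumed rule: d describes w1 but not w2 if the similarity gap exceeds margin."""
    return cosine(emb[w1], emb[d]) - cosine(emb[w2], emb[d]) > margin
```

In a supervised variant, the two cosine values (or their difference) would instead be passed as features to a classifier, which is the setting the abstract reports as scoring slightly higher.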
NTU NLP Lab System at SemEval-2018 Task 10: Verifying Semantic Differences by Integrating Distributional Information and Expert Knowledge
Title | NTU NLP Lab System at SemEval-2018 Task 10: Verifying Semantic Differences by Integrating Distributional Information and Expert Knowledge |
Authors | Yow-Ting Shiue, Hen-Hsen Huang, Hsin-Hsi Chen |
Abstract | This paper presents the NTU NLP Lab system for the SemEval-2018 Capturing Discriminative Attributes task. Word embeddings, pointwise mutual information (PMI), ConceptNet edges and shortest path lengths are utilized as input features to build binary classifiers to tell whether an attribute is discriminative for a pair of concepts. Our neural network model reaches about 73% F1 score on the test set and ranks 3rd in the task. Though the attributes to deal with in this task are all visual, our models are not provided with any image data. The results indicate that visual information can be derived from textual data. |
Tasks | Machine Translation, Semantic Textual Similarity, Sentiment Analysis, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1171/ |
https://www.aclweb.org/anthology/S18-1171 | |
PWC | https://paperswithcode.com/paper/ntu-nlp-lab-system-at-semeval-2018-task-10 |
Repo | |
Framework | |
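Pointwise mutual information, one of the input features named in the abstract above, compares how often two words co-occur with how often they would co-occur if independent: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below is a simplified illustration that assumes all counts come from the same pool of `total` observations; the corpus and counting window the authors used are not stated here, and `pmi` is a hypothetical helper.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """PMI in bits from raw counts; -inf when the pair was never observed."""
    if pair_count == 0:
        return float("-inf")
    # p(x, y) / (p(x) * p(y)) simplifies to pair_count * total / (x_count * y_count).
    return math.log2((pair_count * total) / (x_count * y_count))
```

For a triple (w1, w2, attribute), PMI(w1, attribute) and PMI(w2, attribute) could each serve as a feature, with a large gap suggesting that the attribute is associated with only one of the two concepts.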
CSReader at SemEval-2018 Task 11: Multiple Choice Question Answering as Textual Entailment
Title | CSReader at SemEval-2018 Task 11: Multiple Choice Question Answering as Textual Entailment |
Authors | Zhengping Jiang, Qi Sun |
Abstract | In this document we present an end-to-end machine reading comprehension system that solves multiple choice questions with a textual entailment perspective. Since some of the knowledge required is not explicitly mentioned in the text, we try to exploit commonsense knowledge by using pretrained word embeddings during contextual embeddings and by dynamically generating a weighted representation of related script knowledge. In the model two kinds of prediction structure are ensembled, and the final accuracy of our system is 10 percent higher than the naive baseline. |
Tasks | Common Sense Reasoning, Language Modelling, Machine Reading Comprehension, Natural Language Inference, Question Answering, Reading Comprehension, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1176/ |
https://www.aclweb.org/anthology/S18-1176 | |
PWC | https://paperswithcode.com/paper/csreader-at-semeval-2018-task-11-multiple |
Repo | |
Framework | |
ECNU at SemEval-2018 Task 11: Using Deep Learning Method to Address Machine Comprehension Task
Title | ECNU at SemEval-2018 Task 11: Using Deep Learning Method to Address Machine Comprehension Task |
Authors | Yixuan Sheng, Man Lan, Yuanbin Wu |
Abstract | This paper describes the system we submitted to Task 11 in SemEval 2018, i.e., Machine Comprehension using Commonsense Knowledge. Given a passage and some questions that each have two candidate answers, this task requires the participating system to select, from the candidate answers, the one that matches the meaning of the original text or commonsense knowledge. For this task, we use a deep learning method to obtain the final predicted answer by calculating the relevance of the choice representations and the question-aware document representation. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1175/ |
https://www.aclweb.org/anthology/S18-1175 | |
PWC | https://paperswithcode.com/paper/ecnu-at-semeval-2018-task-11-using-deep |
Repo | |
Framework | |
YNU-HPCC at Semeval-2018 Task 11: Using an Attention-based CNN-LSTM for Machine Comprehension using Commonsense Knowledge
Title | YNU-HPCC at Semeval-2018 Task 11: Using an Attention-based CNN-LSTM for Machine Comprehension using Commonsense Knowledge |
Authors | Hang Yuan, Jin Wang, Xuejie Zhang |
Abstract | This shared task is a typical question answering task. Compared with a normal question answering system, it needs to give the answer to the question based on the text provided. The essence of the problem is actually reading comprehension. Typically, there are several questions for each text that correspond to it. For each question, there are two candidate answers, only one of which is correct. To solve this problem, the usual approach is to use convolutional neural networks (CNN) and recurrent neural networks (RNN) or their improved models (such as long short-term memory (LSTM)). In this paper, an attention-based CNN-LSTM model is proposed for this task. By adding an attention mechanism and combining the two models, the experimental results are significantly improved. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1177/ |
https://www.aclweb.org/anthology/S18-1177 | |
PWC | https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2018-task-11-using-an |
Repo | |
Framework | |
Aligning Infinite-Dimensional Covariance Matrices in Reproducing Kernel Hilbert Spaces for Domain Adaptation
Title | Aligning Infinite-Dimensional Covariance Matrices in Reproducing Kernel Hilbert Spaces for Domain Adaptation |
Authors | Zhen Zhang, Mianzhi Wang, Yan Huang, Arye Nehorai |
Abstract | Domain shift, which occurs when there is a mismatch between the distributions of training (source) and testing (target) datasets, usually results in poor performance of the trained model on the target domain. Existing algorithms typically solve this issue by reducing the distribution discrepancy in the input spaces. However, for kernel-based learning machines, performance highly depends on the statistical properties of data in reproducing kernel Hilbert spaces (RKHS). Motivated by these considerations, we propose a novel strategy for matching distributions in RKHS, which is done by aligning the RKHS covariance matrices (descriptors) across domains. This strategy is a generalization of the correlation alignment problem in Euclidean spaces to (potentially) infinite-dimensional feature spaces. In this paper, we provide two alignment approaches, for both of which we obtain closed-form expressions via kernel matrices. Furthermore, our approaches are scalable to large datasets since they can naturally handle out-of-sample instances. We conduct extensive experiments (248 domain adaptation tasks) to evaluate our approaches. Experiment results show that our approaches outperform other state-of-the-art methods in both accuracy and computational efficiency. |
Tasks | Domain Adaptation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Aligning_Infinite-Dimensional_Covariance_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Aligning_Infinite-Dimensional_Covariance_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/aligning-infinite-dimensional-covariance |
Repo | |
Framework | |
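For orientation, the Euclidean correlation alignment problem that the abstract above generalizes is commonly written as follows. This is the standard finite-dimensional formulation, given here as background rather than as the paper's RKHS derivation; the symbols $C_S$, $C_T$, and $A$ are notation assumed for this note, standing for the source covariance, the target covariance, and a linear map applied to the source features.

```latex
\min_{A} \; \left\lVert A^{\top} C_S A - C_T \right\rVert_F^{2}
```

The paper's contribution, per the abstract, is to pose the analogous matching problem for covariance descriptors in an RKHS, where the matrices may be infinite-dimensional and the solution is expressed through kernel matrices.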
BLCU_NLP at SemEval-2018 Task 12: An Ensemble Model for Argument Reasoning Based on Hierarchical Attention
Title | BLCU_NLP at SemEval-2018 Task 12: An Ensemble Model for Argument Reasoning Based on Hierarchical Attention |
Authors | Meiqian Zhao, Chunhua Liu, Lu Liu, Yan Zhao, Dong Yu |
Abstract | To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind them. In this paper, we propose a hierarchical attention model to identify the right warrant, which explains why the reason supports the claim. Our model focuses not only on the similar part between warrants and other information but also on the contradictory part between two opposing warrants. In addition, we use the ensemble method for different models. Our model achieves an accuracy of 61%, ranking second in this task. Experimental results demonstrate that our model is effective at making correct choices. |
Tasks | Common Sense Reasoning, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1186/ |
https://www.aclweb.org/anthology/S18-1186 | |
PWC | https://paperswithcode.com/paper/blcu_nlp-at-semeval-2018-task-12-an-ensemble |
Repo | |
Framework | |