Paper Group AWR 386
Issue Framing in Online Discussion Fora
Title | Issue Framing in Online Discussion Fora |
Authors | Mareike Hartmann, Tallulah Jansen, Isabelle Augenstein, Anders Søgaard |
Abstract | In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects of the topic. In social science, this is referred to as issue framing. In this paper, we introduce a new issue frame annotated corpus of online discussions. We explore to what extent models trained to detect issue frames in newswire and social media can be transferred to the domain of discussion fora, using a combination of multi-task and adversarial training, assuming only unlabeled training data in the target domain. |
Tasks | |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03969v2 |
http://arxiv.org/pdf/1904.03969v2.pdf | |
PWC | https://paperswithcode.com/paper/issue-framing-in-online-discussion-fora |
Repo | https://github.com/coastalcph/issue_framing |
Framework | none |
Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping
Title | Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping |
Authors | Yunzhi Lin, Chao Tang, Fu-Jen Chu, Patricio A. Vela |
Abstract | A segmentation-based architecture is proposed to decompose objects into multiple primitive shapes from monocular depth input for robotic manipulation. The backbone deep network is trained on synthetic data with 6 classes of primitive shapes generated by a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape primitive region. The grasps are priority ordered via a proposed ranking algorithm, with the first feasible one chosen for execution. On task-free grasping of individual objects, the method achieves a 94% success rate. On task-oriented grasping, it achieves a 76% success rate. Overall, the method supports the hypothesis that shape primitives can support task-free and task-relevant grasp prediction. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.08508v1 |
https://arxiv.org/pdf/1909.08508v1.pdf | |
PWC | https://paperswithcode.com/paper/using-synthetic-data-and-deep-networks-to |
Repo | https://github.com/ivalab/grasp_primitiveShape |
Framework | pytorch |
NRPA: Neural Recommendation with Personalized Attention
Title | NRPA: Neural Recommendation with Personalized Attention |
Authors | Hongtao Liu, Fangzhao Wu, Wenjun Wang, Xianchen Wang, Pengfei Jiao, Chuhan Wu, Xing Xie |
Abstract | Existing review-based recommendation methods usually use the same model to learn the representations of all users/items from reviews posted by users towards items. However, different users have different preferences and different items have different characteristics. Thus, the same word or similar reviews may have different informativeness for different users and items. In this paper we propose a neural recommendation approach with personalized attention to learn personalized representations of users and items from reviews. We use a review encoder to learn representations of reviews from words, and a user/item encoder to learn representations of users or items from reviews. We propose a personalized attention model, and apply it to both review and user/item encoders to select different important words and reviews for different users/items. Experiments on five datasets validate that our approach can effectively improve the performance of neural recommendation. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12480v1 |
https://arxiv.org/pdf/1905.12480v1.pdf | |
PWC | https://paperswithcode.com/paper/nrpa-neural-recommendation-with-personalized |
Repo | https://github.com/TianHongTao/Recommendation-System-Graduation-Design |
Framework | pytorch |
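The personalized attention described in the NRPA abstract can be illustrated with a minimal sketch. It assumes a plain dot-product attention in which a per-user query vector scores each word vector; the function name `personalized_attention`, the toy vectors, and the omission of the paper's encoders are all simplifications made here, not the authors' implementation.

```python
import math

def personalized_attention(word_vectors, query):
    """Pool word vectors with softmax weights driven by a user-specific query.

    Hypothetical helper: the query stands in for a vector derived from a
    user (or item) ID, so different users weight the same words differently.
    """
    # Dot-product relevance of each word to this user's query.
    scores = [sum(q * h for q, h in zip(query, vec)) for vec in word_vectors]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the word vectors gives the pooled representation.
    dim = len(word_vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
              for d in range(dim)]
    return weights, pooled

words = [[1.0, 0.0], [0.0, 1.0]]           # two toy word vectors
weights_a, _ = personalized_attention(words, [5.0, 0.0])  # user A's query
weights_b, _ = personalized_attention(words, [0.0, 5.0])  # user B's query
```

With the same two words, user A's query puts most of the weight on the first word and user B's on the second, which is the effect the abstract attributes to personalization.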
DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm
Title | DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm |
Authors | Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters |
Abstract | The goal of this paper is twofold. First, we introduce DALI, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity. The second goal is to explain our methodology, in which dataset creation and model learning interact through a teacher-student machine learning paradigm so that each benefits the other. We start with a set of manual annotations of draft time-aligned lyrics and notes made by non-expert users of Karaoke games. This set comes without audio. Therefore, we need to find the corresponding audio and adapt the annotations to it. To that end, we retrieve audio candidates from the Web. Each candidate is then turned into a singing-voice probability over time using a teacher, a deep convolutional neural network singing-voice detection system (SVD), trained on cleaned data. Comparing the time-aligned lyrics and the singing-voice probability, we detect matches and update the time-aligned lyrics accordingly. From this, we obtain new audio sets. They are then used to train new SVD students, which are used to repeat the above comparison. The process could be repeated iteratively. We show that this progressively improves the performance of our SVD and yields better audio matching and alignment. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10606v1 |
https://arxiv.org/pdf/1906.10606v1.pdf | |
PWC | https://paperswithcode.com/paper/dali-a-large-dataset-of-synchronized-audio |
Repo | https://github.com/gabolsgabs/DALI |
Framework | none |
CrowdFix: An Eyetracking Dataset of Real Life Crowd Videos
Title | CrowdFix: An Eyetracking Dataset of Real Life Crowd Videos |
Authors | Memoona Tahira, Sobas Mehboob, Anis U. Rahman, Omar Arif |
Abstract | Understanding human visual attention and saliency is an integral part of vision research. In this context, there is an ever-present need for fresh and diverse benchmark datasets, particularly for insight into special use cases like crowded scenes. We contribute to this end by: (1) reviewing the dynamics behind saliency and crowds; (2) using eye tracking to create a dynamic human eye fixation dataset over a new set of crowd videos gathered from the Internet, annotated into three distinct density levels; and (3) evaluating state-of-the-art saliency models on our dataset to identify possible improvements for the design and creation of a more robust saliency model. |
Tasks | Eye Tracking |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02618v2 |
https://arxiv.org/pdf/1910.02618v2.pdf | |
PWC | https://paperswithcode.com/paper/crowdfix-an-eyetracking-data-set-of-human |
Repo | https://github.com/MemoonaTahira/CrowdFix |
Framework | none |
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Title | CrossWeigh: Training Named Entity Tagger from Imperfect Annotations |
Authors | Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han |
Abstract | Everyone makes mistakes. So do human annotators when curating labels for named entity recognition (NER). Such label mistakes might hurt model training and interfere with model comparison. In this study, we dive deep into one of the widely-adopted NER benchmark datasets, CoNLL03 NER. We are able to identify label mistakes in about 5.38% of test sentences, which is a significant ratio considering that the state-of-the-art test F1 score is already around 93%. Therefore, we manually correct these label mistakes and form a cleaner test set. Our re-evaluation of popular models on this corrected test set leads to more accurate assessments, compared to those on the original test set. More importantly, we propose a simple yet effective framework, CrossWeigh, to handle label mistakes during NER model training. Specifically, it partitions the training data into several folds and trains independent NER models to identify potential mistakes in each fold. Then it adjusts the weights of training data accordingly to train the final NER model. Extensive experiments demonstrate significant improvements from plugging various NER models into our proposed framework on three datasets. All implementations and the corrected test set are available at our GitHub repo: https://github.com/ZihanWangKi/CrossWeigh. |
Tasks | Named Entity Recognition |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01441v1 |
https://arxiv.org/pdf/1909.01441v1.pdf | |
PWC | https://paperswithcode.com/paper/crossweigh-training-named-entity-tagger-from |
Repo | https://github.com/ZihanWangKi/CrossWeigh |
Framework | none |
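The fold-and-reweight procedure in the CrossWeigh abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the toy `majority_tagger` stands in for a real NER model, folds are formed by index modulo `k`, and a flagged sentence has its weight multiplied by a hypothetical factor `epsilon`. None of these names or choices are taken from the CrossWeigh repository.

```python
from collections import Counter

def majority_tagger(train):
    """Toy stand-in for an NER model: predicts each token's most common tag."""
    counts = {}
    for tokens, tags in train:
        for tok, tag in zip(tokens, tags):
            counts.setdefault(tok, Counter())[tag] += 1
    table = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return lambda tokens: [table.get(tok, "O") for tok in tokens]

def crossweigh_weights(data, k=3, epsilon=0.7, train_fn=majority_tagger):
    """Down-weight sentences whose labels disagree with a held-out model.

    Assumption: each sentence is held out exactly once, and a disagreement
    multiplies its weight by epsilon.
    """
    weights = [1.0] * len(data)
    for fold in range(k):
        held_out = [i for i in range(len(data)) if i % k == fold]
        train = [data[i] for i in range(len(data)) if i % k != fold]
        predict = train_fn(train)  # model never sees the held-out fold
        for i in held_out:
            tokens, tags = data[i]
            if predict(tokens) != tags:  # potential label mistake
                weights[i] *= epsilon
    return weights

# Five consistent sentences and one that contradicts them.
data = [(["Paris"], ["LOC"])] * 5 + [(["Paris"], ["O"])]
weights = crossweigh_weights(data)
```

The resulting weights would then be passed as per-sentence loss weights when training the final tagger; only the contradictory sentence ends up below 1.0 in this toy run.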
Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek
Title | Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek |
Authors | Dong Quan Vu, Patrick Loiseau, Alonso Silva, Long Tran-Thanh |
Abstract | Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11151v3 |
https://arxiv.org/pdf/1905.11151v3.pdf | |
PWC | https://paperswithcode.com/paper/colonel-blotto-games-and-hide-and-seek-games |
Repo | https://github.com/dongquan11/SOPPP_CB_and_HS_games |
Framework | none |
GBDT-MO: Gradient Boosted Decision Trees for Multiple Outputs
Title | GBDT-MO: Gradient Boosted Decision Trees for Multiple Outputs |
Authors | Zhendong Zhang, Cheolkon Jung |
Abstract | Gradient boosted decision trees (GBDTs) are widely used in machine learning, and the output of current GBDT implementations is a single variable. When there are multiple outputs, GBDT constructs multiple trees corresponding to the output variables. The correlations between variables are ignored by such a strategy, causing redundancy in the learned tree structures. In this paper, we propose a general method to learn GBDT for multiple outputs, called GBDT-MO. Each leaf of GBDT-MO constructs predictions of all variables or a subset of automatically selected variables. This is achieved by considering the summation of objective gains over all output variables. Moreover, we extend histogram approximation to the multiple-output case to speed up the training process. Various experiments on synthetic and real-world datasets verify that GBDT-MO achieves outstanding performance in terms of both accuracy and training speed. Our code is available online. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04373v2 |
https://arxiv.org/pdf/1909.04373v2.pdf | |
PWC | https://paperswithcode.com/paper/gbdt-mo-gradient-boosted-decision-trees-for |
Repo | https://github.com/zzd1992/GBDTMO-EX |
Framework | none |
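The phrase "summation of objective gains over all output variables" can be made concrete with a small sketch. It assumes the familiar second-order gain used in common GBDT implementations, with gradient sum `g`, Hessian sum `h` and L2 penalty `lam`, applied per output and summed. The exact objective in the paper may differ, and `leaf_score` and `split_gain` are names chosen here for illustration.

```python
def leaf_score(grad_sums, hess_sums, lam=1.0):
    """Sum over outputs of g^2 / (h + lam) for one candidate leaf."""
    return sum(g * g / (h + lam) for g, h in zip(grad_sums, hess_sums))

def split_gain(left, right, lam=1.0):
    """Gain of splitting a node into left/right children.

    left and right are (grad_sums, hess_sums) pairs with one entry per
    output variable; the parent's sums are the elementwise totals.
    """
    parent_g = [a + b for a, b in zip(left[0], right[0])]
    parent_h = [a + b for a, b in zip(left[1], right[1])]
    return 0.5 * (leaf_score(left[0], left[1], lam)
                  + leaf_score(right[0], right[1], lam)
                  - leaf_score(parent_g, parent_h, lam))

# Two outputs whose gradients point in opposite directions on each side.
left = ([2.0, -2.0], [1.0, 1.0])
right = ([-2.0, 2.0], [1.0, 1.0])
gain = split_gain(left, right)
```

Because both outputs contribute to a single gain, one split is chosen for all of them, rather than growing a separate tree per output.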
InterpretML: A Unified Framework for Machine Learning Interpretability
Title | InterpretML: A Unified Framework for Machine Learning Interpretability |
Authors | Harsha Nori, Samuel Jenkins, Paul Koch, Rich Caruana |
Abstract | InterpretML is an open-source Python package which exposes machine learning interpretability algorithms to practitioners and researchers. InterpretML exposes two types of interpretability - glassbox models, which are machine learning models designed for interpretability (ex: linear models, rule lists, generalized additive models), and blackbox explainability techniques for explaining existing systems (ex: Partial Dependence, LIME). The package enables practitioners to easily compare interpretability algorithms by exposing multiple methods under a unified API, and by having a built-in, extensible visualization platform. InterpretML also includes the first implementation of the Explainable Boosting Machine, a powerful, interpretable, glassbox model that can be as accurate as many blackbox models. The MIT licensed source code can be downloaded from github.com/microsoft/interpret. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09223v1 |
https://arxiv.org/pdf/1909.09223v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretml-a-unified-framework-for-machine |
Repo | https://github.com/microsoft/interpret |
Framework | none |
Computationally Efficient Feature Significance and Importance for Machine Learning Models
Title | Computationally Efficient Feature Significance and Importance for Machine Learning Models |
Authors | Enguerrand Horel, Kay Giesecke |
Abstract | We develop a simple and computationally efficient significance test for the features of a machine learning model. Our forward-selection approach applies to any model specification, learning task and variable type. The test is non-asymptotic, straightforward to implement, and does not require model refitting. It identifies the statistically significant features as well as feature interactions of any order in a hierarchical manner, and generates a model-free notion of feature importance. Experimental and empirical results illustrate its performance. |
Tasks | Feature Importance |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09849v2 |
https://arxiv.org/pdf/1905.09849v2.pdf | |
PWC | https://paperswithcode.com/paper/computationally-efficient-feature |
Repo | https://github.com/fintechstanford/SFIT |
Framework | tf |
TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
Title | TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems |
Authors | Wenbo Guo, Lun Wang, Xinyu Xing, Min Du, Dawn Song |
Abstract | A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It can be activated, forcing the infected model to behave abnormally, only when an input data sample containing a particular trigger is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers have designed and developed several pioneering solutions to address this acute problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under an unrealistic assumption (e.g. assuming availability of the contaminated training database). On the other hand, the proposed techniques cannot accurately detect the existence of trojan backdoors, nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Unlike the existing technique that also models trojan detection as an optimization problem, TABOR designs a new objective function, under the guidance of explainable AI techniques as well as heuristics, that can guide optimization to identify a trojan backdoor more effectively. In addition, TABOR defines a new metric to measure the quality of an identified trojan backdoor. Using an anomaly detection method, we show the new metric could better help TABOR identify intentionally injected triggers in an infected model and filter out false alarms…… |
Tasks | Anomaly Detection |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.01763v2 |
https://arxiv.org/pdf/1908.01763v2.pdf | |
PWC | https://paperswithcode.com/paper/tabor-a-highly-accurate-approach-to |
Repo | https://github.com/UsmannK/TABOR |
Framework | tf |
What You Say and How You Say it: Joint Modeling of Topics and Discourse in Microblog Conversations
Title | What You Say and How You Say it: Joint Modeling of Topics and Discourse in Microblog Conversations |
Authors | Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael R. Lyu, Irwin King |
Abstract | This paper presents an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations. Concretely, we propose a neural model to discover word clusters indicating what a conversation concerns (i.e., topics) and those reflecting how participants voice their opinions (i.e., discourse). Extensive experiments show that our model can yield both coherent topics and meaningful discourse behavior. Further study shows that our topic and discourse representations can benefit the classification of microblog messages, especially when they are jointly trained with the classifier. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07319v1 |
http://arxiv.org/pdf/1903.07319v1.pdf | |
PWC | https://paperswithcode.com/paper/what-you-say-and-how-you-say-it-joint |
Repo | https://github.com/zengjichuan/Topic_Disc |
Framework | pytorch |
TuckER: Tensor Factorization for Knowledge Graph Completion
Title | TuckER: Tensor Factorization for Knowledge Graph Completion |
Authors | Ivana Balažević, Carl Allen, Timothy M. Hospedales |
Abstract | Knowledge graphs are structured representations of real world facts. However, they typically contain only a small subset of all possible facts. Link prediction is the task of inferring missing facts based on existing ones. We propose TuckER, a relatively straightforward but powerful linear model based on Tucker decomposition of the binary tensor representation of knowledge graph triples. TuckER outperforms previous state-of-the-art models across standard link prediction datasets, acting as a strong baseline for more elaborate models. We show that TuckER is a fully expressive model, derive sufficient bounds on its embedding dimensionalities and demonstrate that several previously introduced linear models can be viewed as special cases of TuckER. |
Tasks | Knowledge Graph Completion, Knowledge Graphs, Link Prediction |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09590v2 |
https://arxiv.org/pdf/1901.09590v2.pdf | |
PWC | https://paperswithcode.com/paper/tucker-tensor-factorization-for-knowledge |
Repo | https://github.com/Sujit-O/pykg2vec |
Framework | tf |
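The Tucker-decomposition scoring that the TuckER abstract refers to can be sketched in a few lines. This is a minimal pure-Python illustration, assuming a core tensor indexed as `core[i][j][k]` over subject, relation and object dimensions. The function names are chosen here and are not taken from the pykg2vec repository. The second helper builds a superdiagonal core to show how a DistMult-style score arises as a special case.

```python
def tucker_score(core, e_s, w_r, e_o):
    """Trilinear score: sum_ijk core[i][j][k] * e_s[i] * w_r[j] * e_o[k]."""
    return sum(core[i][j][k] * e_s[i] * w_r[j] * e_o[k]
               for i in range(len(e_s))
               for j in range(len(w_r))
               for k in range(len(e_o)))

def superdiagonal_core(dim):
    """Core with ones where i == j == k; reduces the score to sum_i s_i r_i o_i."""
    return [[[1.0 if i == j == k else 0.0 for k in range(dim)]
             for j in range(dim)]
            for i in range(dim)]

e_s, w_r, e_o = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
score = tucker_score(superdiagonal_core(2), e_s, w_r, e_o)
```

In a trained model the core tensor is learned and shared across all relations, and a sigmoid over the score is typically used to obtain a probability for the triple.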
Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior
Title | Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior |
Authors | William Hoiles, Vikram Krishnamurthy, Kunal Pattanayak |
Abstract | We consider a novel application of inverse reinforcement learning which involves modeling, learning and predicting the commenting behavior of YouTube viewers. Each group of users is modeled as a rationally inattentive Bayesian agent. Our methodology integrates three key components. First, to identify distinct commenting patterns, we use deep embedded clustering to estimate framing information (essential extrinsic features) that clusters users into distinct groups. Second, we present an inverse reinforcement learning algorithm that uses Bayesian revealed preferences to test for rationality: does there exist a utility function that rationalizes the given data, and if yes, can it be used to predict future behavior? Finally, we impose behavioral economics constraints stemming from rational inattention to characterize the attention span of groups of users. The test imposes a Rényi mutual information cost constraint which impacts how the agent can select attention strategies to maximize their expected utility. After a careful analysis of a massive YouTube dataset, our surprising result is that in most YouTube user groups, the commenting behavior is consistent with optimizing a Bayesian utility with rationally inattentive constraints. The paper also highlights how the rational inattention model can accurately predict future commenting behavior. The massive YouTube dataset and analysis used in this paper are available on GitHub and completely reproducible. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11703v1 |
https://arxiv.org/pdf/1910.11703v1.pdf | |
PWC | https://paperswithcode.com/paper/rationally-inattentive-inverse-reinforcement |
Repo | https://github.com/KunalP117/YouTube_project |
Framework | none |
Robust Re-identification of Manta Rays from Natural Markings by Learning Pose Invariant Embeddings
Title | Robust Re-identification of Manta Rays from Natural Markings by Learning Pose Invariant Embeddings |
Authors | Olga Moskvyak, Frederic Maire, Asia O. Armstrong, Feras Dayoub, Mahsa Baktashmotlagh |
Abstract | Visual identification of individual animals that bear unique natural body markings is an important task in wildlife conservation. The photo databases of animal markings grow larger and each new observation has to be matched against thousands of images. Existing photo-identification solutions have constraints on image quality and appearance of the pattern of interest in the image. These constraints limit the use of photos from citizen scientists. We present a novel system for visual re-identification based on unique natural markings that is robust to occlusions, viewpoint and illumination changes. We adapt methods developed for face re-identification and implement a deep convolutional neural network (CNN) to learn embeddings for images of natural markings. The distance between the learned embedding points provides a dissimilarity measure between the corresponding input images. The network is optimized using the triplet loss function and the online semi-hard triplet mining strategy. The proposed re-identification method is generic and not species specific. We evaluate the proposed system on image databases of manta ray belly patterns and humpback whale flukes. To be of practical value and adopted by marine biologists, a re-identification system needs to have a top-10 accuracy of at least 95%. The proposed system achieves this performance standard. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.10847v1 |
http://arxiv.org/pdf/1902.10847v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-re-identification-of-manta-rays-from |
Repo | https://github.com/olgamoskvyak/reid-manta |
Framework | tf |
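The triplet loss and semi-hard mining mentioned in the manta ray abstract can be illustrated with a brute-force sketch over one small batch. It assumes Euclidean distance and the usual definition of a semi-hard negative, one that is farther from the anchor than the positive but still within the margin. The function names and toy embeddings are illustrative only and do not come from the reid-manta repository.

```python
import math

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss pushing the negative at least `margin` farther than the positive."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def semi_hard_triplets(embeddings, labels, margin=0.5):
    """Index triples (anchor, positive, negative) with
    d(a, p) < d(a, n) < d(a, p) + margin."""
    triplets = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = dist(embeddings[a], embeddings[p])
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue
                d_an = dist(embeddings[a], embeddings[neg])
                if d_ap < d_an < d_ap + margin:
                    triplets.append((a, p, neg))
    return triplets

embeddings = [[0.0, 0.0], [1.0, 0.0], [1.2, 0.0], [5.0, 0.0]]
labels = ["A", "A", "B", "B"]   # two individuals, two images each
triplets = semi_hard_triplets(embeddings, labels)
```

Semi-hard triplets always have a positive loss, so they contribute gradient without being the most extreme cases in the batch. A practical implementation would vectorize the distance computation rather than loop over all triples.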