Paper Group AWR 402
An Empirical Study of Content Understanding in Conversational Question Answering
Title | An Empirical Study of Content Understanding in Conversational Question Answering |
Authors | Ting-Rui Chiang, Hao-Tong Ye, Yun-Nung Chen |
Abstract | With a lot of work about context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to the recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to the best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark dataset reflect models’ content understanding? 2) Do the models make good use of the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, as well as an attack to verify the models’ capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models. With deep investigation of the task, it is believed that this work can benefit the future progress of conversation comprehension. The source code is available at https://github.com/MiuLab/CQA-Study. |
Tasks | Question Answering |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10743v2 |
https://arxiv.org/pdf/1909.10743v2.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-content-understanding |
Repo | https://github.com/MiuLab/CQA-Study |
Framework | pytorch |
AutoML: Exploration v.s. Exploitation
Title | AutoML: Exploration v.s. Exploitation |
Authors | Hassan Eldeeb, Abdelrhman Eldallal |
Abstract | Building a machine learning (ML) pipeline in an automated way is a crucial and complex task, as it is constrained by the available time budget and resources. This has encouraged the research community to introduce several solutions that make better use of the available time and resources. Much work has gone into suggesting the most promising classifiers for a given dataset using a variety of techniques, including meta-learning. This gives the AutoML framework the chance to spend more time exploiting those classifiers and tuning their hyper-parameters. In this paper, we empirically study the hypothesis that pipeline performance improves by exploiting the most promising classifiers within the limited time budget. We also study the effect of increasing the time budget on pipeline performance. The empirical results across autoSKLearn, TPOT and ATM show that exploiting the most promising classifiers does not achieve statistically better performance than exploring the entire search space. The same conclusion also holds for long time budgets. |
Tasks | AutoML, Meta-Learning |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10746v2 |
https://arxiv.org/pdf/1912.10746v2.pdf | |
PWC | https://paperswithcode.com/paper/automl-exploration-vs-exploitation |
Repo | https://github.com/DataSystemsGroupUT/automl_exploration_vs_exploitation |
Framework | none |
Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels
Title | Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels |
Authors | Lukas Lange, Michael A. Hedderich, Dietrich Klakow |
Abstract | In low-resource settings, the performance of supervised labeling models can be improved with automatically annotated or distantly supervised data, which is cheap to create but often noisy. Previous works have shown that significant improvements can be reached by injecting information about the confusion between clean and noisy labels in this additional training data into the classifier training. However, for noise estimation, these approaches either do not take the input features (in our case word embeddings) into account, or they need to learn the noise modeling from scratch which can be difficult in a low-resource setting. We propose to cluster the training data using the input features and then compute different confusion matrices for each cluster. To the best of our knowledge, our approach is the first to leverage feature-dependent noise modeling with pre-initialized confusion matrices. We evaluate on low-resource named entity recognition settings in several languages, showing that our methods improve upon other confusion-matrix based methods by up to 9%. |
Tasks | Named Entity Recognition, Word Embeddings |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06061v2 |
https://arxiv.org/pdf/1910.06061v2.pdf | |
PWC | https://paperswithcode.com/paper/feature-dependent-confusion-matrices-for-low |
Repo | https://github.com/uds-lsv/noise-matrix-ner |
Framework | tf |
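The abstract above describes clustering the training data by input features and computing a separate confusion matrix per cluster. A minimal sketch of that idea follows, assuming a small paired clean/noisy sample is available. The function name `cluster_confusion_matrices`, the hand-rolled k-means, and the `smoothing` parameter are illustrative assumptions, not the authors' implementation (their code lives in the linked repository).

```python
import numpy as np

def cluster_confusion_matrices(features, clean, noisy, n_labels,
                               n_clusters=2, n_iter=10, smoothing=1e-3, seed=0):
    """Hypothetical helper: cluster feature vectors with a tiny k-means, then
    estimate one row-normalised clean->noisy confusion matrix per cluster."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen data points.
    centroids = features[rng.choice(len(features), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Distance of every point to every centroid, shape (n_points, n_clusters).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = features[assign == c].mean(axis=0)
    # Count (clean label, noisy label) pairs per cluster; the small smoothing
    # constant keeps rows with no observations valid after normalisation.
    counts = np.full((n_clusters, n_labels, n_labels), smoothing)
    for c, y, z in zip(assign, clean, noisy):
        counts[c, y, z] += 1.0
    # Row y of matrix c approximates P(noisy label | clean label y, cluster c).
    return counts / counts.sum(axis=2, keepdims=True), assign
```

Each returned matrix could then serve as the starting point for a cluster-specific noise layer, which is one reading of the "pre-initialized confusion matrices" mentioned in the abstract.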
Copy mechanism and tailored training for character-based data-to-text generation
Title | Copy mechanism and tailored training for character-based data-to-text generation |
Authors | Marco Roberti, Giovanni Bonetta, Rossella Cancelliere, Patrick Gallinari |
Abstract | In the last few years, many different methods have focused on using deep recurrent neural networks for natural language generation. The most widely used sequence-to-sequence neural methods are word-based: as such, they need a pre-processing step called delexicalization (conversely, relexicalization) to deal with uncommon or unknown words. These forms of processing, however, give rise to models that depend on the vocabulary used and are not completely neural. In this work, we present an end-to-end sequence-to-sequence model with attention mechanism which reads and generates at a character level, no longer requiring delexicalization, tokenization, or even lowercasing. Moreover, since characters constitute the common “building blocks” of every text, it also allows a more general approach to text generation, making it possible to exploit transfer learning for training. These skills are obtained thanks to two major features: (i) the possibility to alternate between the standard generation mechanism and a copy one, which allows input facts to be copied directly to produce outputs, and (ii) the use of an original training pipeline that further improves the quality of the generated texts. We also introduce a new dataset called E2E+, designed to highlight the copying capabilities of character-based models, that is a modified version of the well-known E2E dataset used in the E2E Challenge. We tested our model according to five broadly accepted metrics (including the widely used BLEU), showing that it yields competitive performance with respect to both character-based and word-based approaches. |
Tasks | Data-to-Text Generation, Text Generation, Tokenization, Transfer Learning |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1904.11838v3 |
https://arxiv.org/pdf/1904.11838v3.pdf | |
PWC | https://paperswithcode.com/paper/copy-mechanism-and-tailored-training-for |
Repo | https://github.com/marco-roberti/char-data-to-text-gen |
Framework | pytorch |
Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention
Title | Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention |
Authors | Yuxiao Chen, Long Zhao, Xi Peng, Jianbo Yuan, Dimitris N. Metaxas |
Abstract | We propose a Dynamic Graph-Based Spatial-Temporal Attention (DG-STA) method for hand gesture recognition. The key idea is to first construct a fully-connected graph from a hand skeleton, where the node features and edges are then automatically learned via a self-attention mechanism that performs in both spatial and temporal domains. We further propose to leverage the spatial-temporal cues of joint positions to guarantee robust recognition in challenging conditions. In addition, a novel spatial-temporal mask is applied to significantly cut down the computational cost by 99%. We carry out extensive experiments on benchmarks (DHG-14/28 and SHREC’17) and prove the superior performance of our method compared with the state-of-the-art methods. The source code can be found at https://github.com/yuxiaochen1103/DG-STA. |
Tasks | Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition, Skeleton Based Action Recognition |
Published | 2019-07-20 |
URL | https://arxiv.org/abs/1907.08871v1 |
https://arxiv.org/pdf/1907.08871v1.pdf | |
PWC | https://paperswithcode.com/paper/construct-dynamic-graphs-for-hand-gesture |
Repo | https://github.com/yuxiaochen1103/DG-STA |
Framework | pytorch |
Unifying Human and Statistical Evaluation for Natural Language Generation
Title | Unifying Human and Statistical Evaluation for Natural Language Generation |
Authors | Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang |
Abstract | How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be insufficiently penalized. In this paper, we propose a unified framework which evaluates both diversity and quality, based on the optimal error rate of predicting whether a sentence is human- or machine-generated. We demonstrate that this error rate can be efficiently estimated by combining human and statistical evaluation, using an evaluation metric which we call HUSE. On summarization and chit-chat dialogue, we show that (i) HUSE detects diversity defects which fool pure human evaluation and that (ii) techniques such as annealing for improving quality actually decrease HUSE due to decreased diversity. |
Tasks | Text Generation |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02792v1 |
http://arxiv.org/pdf/1904.02792v1.pdf | |
PWC | https://paperswithcode.com/paper/unifying-human-and-statistical-evaluation-for |
Repo | https://github.com/hughbzhang/HUSE |
Framework | none |
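The abstract frames the evaluation as the error rate of telling human text from machine text, estimated from human and statistical signals together. The sketch below illustrates that framing under stated assumptions: each sentence is represented by a feature vector (for example a human judgment score and a model log-probability), and the error is estimated with a leave-one-out 1-nearest-neighbor classifier. The function name `loo_nn_error`, the feature choice, and the classifier are illustrative assumptions, not the published HUSE definition.

```python
import numpy as np

def loo_nn_error(features, labels):
    """Leave-one-out 1-nearest-neighbor error for human (0) vs. machine (1) labels."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    # Pairwise Euclidean distances between all feature vectors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # A point may not be its own neighbor.
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    # Fraction of points whose nearest neighbor carries the other label.
    return float(np.mean(y[nearest] != y))
```

Under this framing, an error near 0 means machine outputs are easy to tell apart from human text, while a higher error means the two are harder to distinguish on the chosen features.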
Don’t take it lightly: Phasing optical random projections with unknown operators
Title | Don’t take it lightly: Phasing optical random projections with unknown operators |
Authors | Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić |
Abstract | In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media. A signal of interest $\mathbf{\xi} \in \mathbb{R}^N$ is mixed by a random scattering medium to compute the projection $\mathbf{y} = \mathbf{A} \mathbf{\xi}$, with $\mathbf{A} \in \mathbb{C}^{M \times N}$ being a realization of a standard complex Gaussian iid random matrix. Such optics-based matrix multiplications can be much faster and energy-efficient than their CPU or GPU counterparts, yet two difficulties must be resolved: only the intensity $|\mathbf{y}|^2$ can be recorded by the camera, and the transmission matrix $\mathbf{A}$ is unknown. We show that even without knowing $\mathbf{A}$, we can recover the unknown phase of $\mathbf{y}$ for some equivalent transmission matrix with the same distribution as $\mathbf{A}$. Our method is based on two observations: first, conjugating or changing the phase of any row of $\mathbf{A}$ does not change its distribution; and second, since we control the input we can interfere $\mathbf{\xi}$ with arbitrary reference signals. We show how to leverage these observations to cast the measurement phase retrieval problem as a Euclidean distance geometry problem. We demonstrate appealing properties of the proposed algorithm in both numerical simulations and real hardware experiments. Not only does our algorithm accurately recover the missing phase, but it mitigates the effects of quantization and the sensitivity threshold, thus improving the measured magnitudes. |
Tasks | Quantization |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01703v3 |
https://arxiv.org/pdf/1907.01703v3.pdf | |
PWC | https://paperswithcode.com/paper/dont-take-it-lightly-phasing-optical-random |
Repo | https://github.com/swing-research/opu_phase |
Framework | none |
A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease Outbreaks
Title | A Scheme for Continuous Input to the Tsetlin Machine with Applications to Forecasting Disease Outbreaks |
Authors | K. Darshana Abeyrathna, Ole-Christoffer Granmo, Xuan Zhang, Morten Goodwin |
Abstract | In this paper, we apply a new promising tool for pattern classification, namely, the Tsetlin Machine (TM), to the field of disease forecasting. The TM is interpretable because it is based on manipulating expressions in propositional logic, leveraging a large team of Tsetlin Automata (TA). Apart from being interpretable, this approach is attractive due to its low computational cost and its capacity to handle noise. To attack the problem of forecasting, we introduce a preprocessing method that extends the TM so that it can handle continuous input. Briefly stated, we convert continuous input into a binary representation based on thresholding. The resulting extended TM is evaluated and analyzed using an artificial dataset. The TM is further applied to forecast dengue outbreaks of all the seventeen regions in the Philippines using the spatio-temporal properties of the data. Experimental results show that dengue outbreak forecasts made by the TM are more accurate than those obtained by a Support Vector Machine (SVM), Decision Trees (DTs), and several multi-layered Artificial Neural Networks (ANNs), both in terms of forecasting precision and F1-score. |
Tasks | Disease Prediction |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04199v2 |
https://arxiv.org/pdf/1905.04199v2.pdf | |
PWC | https://paperswithcode.com/paper/a-scheme-for-continuous-input-to-the-tsetlin |
Repo | https://github.com/zdx3578/pyTsetlinMachine |
Framework | none |
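The abstract states that continuous input is converted into a binary representation based on thresholding. One way this could look is sketched below, assuming each feature's unique observed values serve as thresholds and each bit records whether the value is at or below a threshold. The function name `binarize` and the threshold choice are assumptions for illustration; the paper and the linked repository define the actual scheme.

```python
import numpy as np

def binarize(X, thresholds=None):
    """Hypothetical helper: encode each continuous feature as threshold bits.

    For feature j with sorted thresholds t_1 < ... < t_m, a value v becomes
    the bit vector [v <= t_1, ..., v <= t_m].
    """
    X = np.asarray(X, dtype=float)
    if thresholds is None:
        # Assumption: use every unique value seen for the feature as a threshold.
        thresholds = [np.unique(X[:, j]) for j in range(X.shape[1])]
    # Broadcasting (n, 1) against (1, m) compares each value to each threshold.
    bits = [(X[:, [j]] <= t[None, :]).astype(np.uint8)
            for j, t in enumerate(thresholds)]
    return np.hstack(bits), thresholds

# One feature with values 1.0, 3.0, 2.0 gives thresholds [1.0, 2.0, 3.0].
bits, thresholds = binarize([[1.0], [3.0], [2.0]])
```

Passing the `thresholds` learned on training data back into `binarize` keeps the encoding of test data consistent with training.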
Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis
Title | Eliciting Knowledge from Experts: Automatic Transcript Parsing for Cognitive Task Analysis |
Authors | Junyi Du, He Jiang, Jiaming Shen, Xiang Ren |
Abstract | Cognitive task analysis (CTA) is a type of analysis in applied psychology aimed at eliciting and representing the knowledge and thought processes of domain experts. In CTA, heavy human labor is often involved to parse the interview transcript into structured knowledge (e.g., flowchart for different actions). To reduce human effort and scale the process, automated CTA transcript parsing is desirable. However, this task has unique challenges as (1) it requires the understanding of long-range context information in conversational text; and (2) the amount of labeled data is limited and indirect (i.e., context-aware, noisy, and low-resource). In this paper, we propose a weakly-supervised information extraction framework for automated CTA transcript parsing. We partition the parsing process into a sequence labeling task and a text span-pair relation extraction task, with distant supervision from human-curated protocol files. To model long-range context information for extracting sentence relations, neighbor sentences are involved as a part of input. Different types of models for capturing context dependency are then applied. We manually annotate real-world CTA transcripts to facilitate the evaluation of the parsing tasks. |
Tasks | Relation Extraction |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11384v1 |
https://arxiv.org/pdf/1906.11384v1.pdf | |
PWC | https://paperswithcode.com/paper/eliciting-knowledge-from-expertsautomatic |
Repo | https://github.com/INK-USC/procedural-extraction |
Framework | none |
I Stand With You: Using Emojis to Study Solidarity in Crisis Events
Title | I Stand With You: Using Emojis to Study Solidarity in Crisis Events |
Authors | Sashank Santhanam, Vidhushini Srinivasan, Shaina Glass, Samira Shaikh |
Abstract | We study how emojis are used to express solidarity in social media in the context of two major crisis events: a natural disaster, Hurricane Irma in 2017, and the terrorist attacks that occurred in November 2015 in Paris. Using annotated corpora, we first train a recurrent neural network model to classify expressions of solidarity in text. Next, we use these expressions of solidarity to characterize human behavior in online social networks, through the temporal and geospatial diffusion of emojis. Our analysis reveals that emojis are a powerful indicator of sociolinguistic behaviors (solidarity) that are exhibited on social media as the crisis events unfold. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08326v1 |
https://arxiv.org/pdf/1907.08326v1.pdf | |
PWC | https://paperswithcode.com/paper/i-stand-with-you-using-emojis-to-study |
Repo | https://github.com/sashank06/ICWSM_Emoji |
Framework | none |
On zero-shot recognition of generic objects
Title | On zero-shot recognition of generic objects |
Authors | Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi |
Abstract | Many recent advances in computer vision are the result of a healthy competition among researchers on high-quality, task-specific benchmarks. After a decade of active research, the accuracy of zero-shot learning (ZSL) models on the Imagenet benchmark remains far too low to be considered for practical object recognition applications. In this paper, we argue that the main reason behind this apparent lack of progress is the poor quality of this benchmark. We highlight major structural flaws of the current benchmark and analyze different factors impacting the accuracy of ZSL models. We show that the actual classification accuracy of existing ZSL models is significantly higher than was previously thought once we account for these flaws. We then introduce the notion of structural bias specific to ZSL datasets. We discuss how the presence of this new form of bias allows for a trivial solution to the standard benchmark and conclude that a new benchmark is needed. We then detail the semi-automated construction of a new benchmark to address these flaws. |
Tasks | Object Recognition, Zero-Shot Learning |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.04957v1 |
http://arxiv.org/pdf/1904.04957v1.pdf | |
PWC | https://paperswithcode.com/paper/on-zero-shot-recognition-of-generic-objects |
Repo | https://github.com/TristHas/GOZ |
Framework | pytorch |
A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks
Title | A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks |
Authors | Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu |
Abstract | Distributed synchronous stochastic gradient descent (S-SGD) has been widely used in training large-scale deep neural networks (DNNs), but it typically requires very high communication bandwidth between computational workers (e.g., GPUs) to exchange gradients iteratively. Recently, Top-$k$ sparsification techniques have been proposed to reduce the volume of data to be exchanged among workers. Top-$k$ sparsification can zero-out a significant portion of gradients without impacting the model convergence. However, the sparse gradients should be transferred with their irregular indices, which makes the sparse gradients aggregation difficult. Current methods that use AllGather to accumulate the sparse gradients have a communication complexity of $O(kP)$, where $P$ is the number of workers, which is inefficient on low bandwidth networks with a large number of workers. We observe that not all top-$k$ gradients from $P$ workers are needed for the model update, and therefore we propose a novel global Top-$k$ (gTop-$k$) sparsification mechanism to address the problem. Specifically, we choose global top-$k$ largest absolute values of gradients from $P$ workers, instead of accumulating all local top-$k$ gradients to update the model in each iteration. The gradient aggregation method based on gTop-$k$ sparsification reduces the communication complexity from $O(kP)$ to $O(k\log P)$. Through extensive experiments on different DNNs, we verify that gTop-$k$ S-SGD has nearly consistent convergence performance with S-SGD, and it has only slight degradations on generalization performance. In terms of scaling efficiency, we evaluate gTop-$k$ on a cluster with 32 GPU machines which are interconnected with 1 Gbps Ethernet. The experimental results show that our method achieves $2.7-12\times$ higher scaling efficiency than S-SGD and $1.1-1.7\times$ improvement than the existing Top-$k$ S-SGD. |
Tasks | |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04359v2 |
http://arxiv.org/pdf/1901.04359v2.pdf | |
PWC | https://paperswithcode.com/paper/a-distributed-synchronous-sgd-algorithm-with |
Repo | https://github.com/hclhkbu/gtopkssgd |
Framework | pytorch |
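The abstract contrasts gathering every worker's local top-$k$ gradients, at cost $O(kP)$, with keeping only $k$ global entries at cost $O(k\log P)$. The sketch below simulates one way such a reduction could work on a single machine: workers are merged pairwise in a tree, and each merge keeps only the $k$ largest-magnitude summed entries. The pairwise tree and the function names `local_topk`, `merge_topk`, and `gtopk` are assumptions for illustration; the authors' implementation is in the linked repository.

```python
import numpy as np

def local_topk(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as {index: value}."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return {int(i): float(grad[i]) for i in idx}

def merge_topk(a, b, k):
    """Sum two sparse gradients, then keep only the k largest-magnitude entries."""
    merged = dict(a)
    for i, v in b.items():
        merged[i] = merged.get(i, 0.0) + v
    keep = sorted(merged, key=lambda i: abs(merged[i]), reverse=True)[:k]
    return {i: merged[i] for i in keep}

def gtopk(grads, k):
    """Simulate a tree reduction: each round halves the number of active workers,
    so P workers need about log2(P) rounds of k-sized messages."""
    sparse = [local_topk(g, k) for g in grads]
    rounds = 0
    while len(sparse) > 1:
        sparse = [merge_topk(sparse[i], sparse[i + 1], k) if i + 1 < len(sparse)
                  else sparse[i]
                  for i in range(0, len(sparse), 2)]
        rounds += 1
    return sparse[0], rounds

# Four simulated workers, each holding a dense gradient of length 6.
grads = [np.array([5.0, 0.0, 0.0, 0.0, 0.0, 1.0]),
         np.array([4.0, 0.0, 0.0, 0.0, 0.5, 0.0]),
         np.array([0.0, -3.0, 0.0, 0.2, 0.0, 0.0]),
         np.array([0.0, -3.0, 0.0, 0.0, 0.1, 0.0])]
result, rounds = gtopk(grads, k=2)
```

Because entries are discarded at every merge, the result approximates the exact top-$k$ of the summed gradient rather than guaranteeing it.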
Model Pruning Enables Efficient Federated Learning on Edge Devices
Title | Model Pruning Enables Efficient Federated Learning on Edge Devices |
Authors | Yuang Jiang, Shiqiang Wang, Bong Jun Ko, Wei-Han Lee, Leandros Tassiulas |
Abstract | Federated learning is a recent approach for distributed model training without sharing the raw data of clients. It allows model training using the large amount of user data collected by edge and mobile devices, while preserving data privacy. A challenge in federated learning is that the devices usually have much lower computational power and communication bandwidth than machines in data centers. Training large-sized deep neural networks in such a federated setting can consume a large amount of time and resources. To overcome this challenge, we propose a method that integrates model pruning with federated learning in this paper, which includes initial model pruning at the server, further model pruning as part of the federated learning process, followed by the regular federated learning procedure. Our proposed approach can save the computation, communication, and storage costs compared to standard federated learning approaches. Extensive experiments on real edge devices validate the benefit of our proposed method. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12326v3 |
https://arxiv.org/pdf/1909.12326v3.pdf | |
PWC | https://paperswithcode.com/paper/model-pruning-enables-efficient-federated |
Repo | https://github.com/jiangyuang/ModelPruningLibrary |
Framework | pytorch |
Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics
Title | Teach Biped Robots to Walk via Gait Principles and Reinforcement Learning with Adversarial Critics |
Authors | Kuangen Zhang, Zhimin Hou, Clarence W. de Silva, Haoyong Yu, Chenglong Fu |
Abstract | Controlling a biped robot to walk stably is a challenging task considering its nonlinearity and hybrid dynamics. Reinforcement learning can address these issues by directly mapping the observed states to optimal actions that maximize the cumulative reward. However, the local minima caused by unsuitable rewards and the overestimation of the cumulative reward impede the maximization of the cumulative reward. To increase the cumulative reward, this paper designs a gait reward based on walking principles, which compensates the local minima for unnatural motions. Besides, an Adversarial Twin Delayed Deep Deterministic (ATD3) policy gradient algorithm with a recurrent neural network (RNN) is proposed to further boost the cumulative reward by mitigating the overestimation of the cumulative reward. Experimental results in the Roboschool Walker2d and Webots Atlas simulators indicate that the test rewards increase by 23.50% and 9.63% after adding the gait reward. The test rewards further increase by 15.96% and 12.68% after using the ATD3_RNN, and the reason may be that the ATD3_RNN decreases the error of estimating cumulative reward from 19.86% to 3.35%. Besides, the cosine kinetic similarity between the human and the biped robot trained by the gait reward and ATD3_RNN increases by over 69.23%. Consequently, the designed gait reward and ATD3_RNN boost the cumulative reward and teach biped robots to walk better. |
Tasks | |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10194v1 |
https://arxiv.org/pdf/1910.10194v1.pdf | |
PWC | https://paperswithcode.com/paper/teach-biped-robots-to-walk-via-gait |
Repo | https://github.com/KuangenZhang/ATD3 |
Framework | pytorch |
On the Coherence of Fake News Articles
Title | On the Coherence of Fake News Articles |
Authors | Iknoor Singh, Deepak P, Anoop K |
Abstract | The generation and spread of fake news within new and online media sources is emerging as a phenomenon of high societal significance. Combating it using data-driven analytics has been attracting much recent scholarly interest. In this study, we analyze the textual coherence of fake news articles vis-a-vis legitimate ones. We develop three computational formulations of textual coherence drawing upon the state-of-the-art methods in natural language processing and data science. Two real-world datasets from widely different domains which have fake/legitimate article labellings are then analyzed with respect to textual coherence. We observe apparent differences in textual coherence across fake and legitimate news articles, with fake news articles consistently scoring lower on coherence as compared to legitimate news ones. While the relative coherence shortfall of fake news articles as compared to legitimate ones forms the main observation from our study, we analyze several aspects of the differences and outline potential avenues of further inquiry. |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11126v1 |
https://arxiv.org/pdf/1906.11126v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-coherence-of-fake-news-articles |
Repo | https://github.com/wikipedia2vec/wikipedia2vec |
Framework | none |