October 15, 2019

Paper Group NANR 77

What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

Title What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
Authors De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, Juan Carlos Niebles
Abstract The ability to capture temporal information has been critical to the development of video understanding models. While there have been numerous attempts at modeling motion in videos, an explicit analysis of the effect of temporal information for video understanding is still missing. In this work, we aim to bridge this gap and ask the following question: How important is the motion in the video for recognizing the action? To this end, we propose two novel frameworks: (i) class-agnostic temporal generator and (ii) motion-invariant frame selector to reduce/remove motion for an ablation analysis without introducing other artifacts. This isolates the analysis of motion from other aspects of the video. The proposed frameworks provide a much tighter estimate of the effect of motion (from 25% to 6% on UCF101 and 15% to 5% on Kinetics) compared to baselines in our analysis. Our analysis provides critical insights about existing models like C3D, and how it could be made to achieve comparable results with a sparser set of frames.
Tasks Video Understanding
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Huang_What_Makes_a_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_What_Makes_a_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/what-makes-a-video-a-video-analyzing-temporal
Repo
Framework

Unsupervised Bilingual Lexicon Induction via Latent Variable Models

Title Unsupervised Bilingual Lexicon Induction via Latent Variable Models
Authors Zi-Yi Dou, Zhi-Hao Zhou, Shujian Huang
Abstract Bilingual lexicon extraction has been studied for decades and most previous methods have relied on parallel corpora or bilingual dictionaries. Recent studies have shown that it is possible to build a bilingual dictionary by aligning monolingual word embedding spaces in an unsupervised way. With the recent advances in generative models, we propose a novel approach which builds cross-lingual dictionaries via latent variable models and adversarial training with no parallel corpora. To demonstrate the effectiveness of our approach, we evaluate our approach on several language pairs and the experimental results show that our model could achieve competitive and even superior performance compared with several state-of-the-art models.
Tasks Latent Variable Models, Word Embeddings
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1062/
PDF https://www.aclweb.org/anthology/D18-1062
PWC https://paperswithcode.com/paper/unsupervised-bilingual-lexicon-induction-via
Repo
Framework
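
The abstract refers to building a bilingual dictionary from aligned monolingual embedding spaces. The sketch below illustrates only the final lookup step, assuming both vocabularies have already been mapped into one shared space. It does not implement the paper's latent variable model or adversarial training. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_lexicon(src_vecs, tgt_vecs):
    """Map each source word to its nearest target word by cosine similarity.

    Both dictionaries are assumed to live in one shared space already.
    """
    lexicon = {}
    for src_word, u in src_vecs.items():
        lexicon[src_word] = max(tgt_vecs, key=lambda t: cosine(u, tgt_vecs[t]))
    return lexicon


# Toy, hand-made vectors (hypothetical, for illustration only).
src = {"dog": [0.9, 0.1], "house": [0.1, 0.95]}
tgt = {"perro": [0.85, 0.2], "casa": [0.15, 0.9]}
lexicon = induce_lexicon(src, tgt)
```

Plain nearest-neighbor retrieval is the simplest choice here; other retrieval criteria could be swapped in without changing the structure.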

Complex Word Identification: Convolutional Neural Network vs. Feature Engineering

Title Complex Word Identification: Convolutional Neural Network vs. Feature Engineering
Authors Segun Taofeek Aroyehun, Jason Angel, Daniel Alejandro Pérez Alvarez, Alexander Gelbukh
Abstract We describe the systems of the NLP-CIC team that participated in the Complex Word Identification (CWI) 2018 shared task. The shared task aimed to benchmark approaches for identifying complex words in English and other languages from the perspective of non-native speakers. Our goal is to compare two approaches: feature engineering and a deep neural network. Both approaches achieved comparable performance on the English test set. We demonstrated the flexibility of the deep-learning approach by using the same deep neural network setup in the Spanish track. Our systems achieved competitive results: all our systems were within 0.01 of the system with the best macro-F1 score on the test sets, except on the Wikipedia test set, on which our best system is 0.04 below the best macro-F1 score.
Tasks Complex Word Identification, Feature Engineering, Text Simplification
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-0538/
PDF https://www.aclweb.org/anthology/W18-0538
PWC https://paperswithcode.com/paper/complex-word-identification-convolutional
Repo
Framework
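
The results above are reported as macro-F1. As a reminder of what that metric measures, here is a minimal sketch that averages per-class F1 scores without weighting by class frequency. The label encoding (1 for complex, 0 for simple) and the toy predictions are assumptions for illustration, not data from the shared task.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: 1 = complex word, 0 = simple word.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)  # class 0 F1 = 0.75, class 1 F1 = 0.5
```

Because each class counts equally, a gap of 0.01 in macro-F1 can reflect a change on the minority class alone.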

CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities

Title CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
Authors Yiyun Liang, Zhucheng Tu, Laetitia Huang, Jimmy Lin
Abstract We demonstrate a JavaScript implementation of a convolutional neural network that performs feedforward inference completely in the browser. Such a deployment means that models can run completely on the client, on a wide range of devices, without making backend server requests. This design is useful for applications with stringent latency requirements or low connectivity. Our evaluations show the feasibility of JavaScript as a deployment target. Furthermore, an in-browser implementation enables seamless integration with the JavaScript ecosystem for information visualization, providing opportunities to visually inspect neural networks and better understand their inner workings.
Tasks Interpretable Machine Learning, Sentence Classification, Sentiment Analysis
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-5013/
PDF https://www.aclweb.org/anthology/N18-5013
PWC https://paperswithcode.com/paper/cnns-for-nlp-in-the-browser-client-side
Repo
Framework

Noise-Based Regularizers for Recurrent Neural Networks

Title Noise-Based Regularizers for Recurrent Neural Networks
Authors Adji B. Dieng, Jaan Altosaar, Rajesh Ranganath, David M. Blei
Abstract Recurrent neural networks (RNNs) are powerful models for sequential data. They can approximate arbitrary computations, and have been used successfully in domains such as text and speech. However, the flexibility of RNNs makes them susceptible to overfitting and regularization is important. We develop a noise-based regularization method for RNNs. The idea is simple and easy to implement: we inject noise in the hidden units of the RNN and then maximize the original RNN’s likelihood averaged over the injected noise. On a language modeling benchmark, our method achieves better performance than the deterministic RNN and the variational dropout.
Tasks Language Modelling
Published 2018-01-01
URL https://openreview.net/forum?id=ryk77mbRZ
PDF https://openreview.net/pdf?id=ryk77mbRZ
PWC https://paperswithcode.com/paper/noise-based-regularizers-for-recurrent-neural
Repo
Framework
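
The abstract describes injecting noise into the hidden units and averaging the likelihood over that noise. A minimal sketch of this idea follows, under several assumptions that are not taken from the paper: a plain Elman-style RNN, additive Gaussian noise, and a Monte Carlo average of the log-likelihood. All names and toy parameters are hypothetical.

```python
import math
import random


def noisy_rnn_nll(tokens, params, noise_std, rng):
    """Negative log-likelihood of `tokens` for one sample of hidden-unit noise."""
    W_in, W_h, W_out = params["W_in"], params["W_h"], params["W_out"]
    hidden_size = len(W_h)
    h = [0.0] * hidden_size
    nll = 0.0
    for current, target in zip(tokens[:-1], tokens[1:]):
        # Elman update: h_t = tanh(W_in[:, x_t] + W_h h_{t-1}).
        h = [
            math.tanh(W_in[i][current] + sum(W_h[i][j] * h[j] for j in range(hidden_size)))
            for i in range(hidden_size)
        ]
        # Inject noise into the hidden units (additive Gaussian is an assumption).
        h = [value + rng.gauss(0.0, noise_std) for value in h]
        # Log-softmax over the vocabulary, then accumulate -log p(target).
        logits = [sum(W_out[k][i] * h[i] for i in range(hidden_size)) for k in range(len(W_out))]
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(value - top) for value in logits))
        nll -= logits[target] - log_norm
    return nll


def averaged_nll(tokens, params, noise_std, num_samples, seed=0):
    """Monte Carlo average of the loss over independent noise draws."""
    rng = random.Random(seed)
    total = sum(noisy_rnn_nll(tokens, params, noise_std, rng) for _ in range(num_samples))
    return total / num_samples


# Toy parameters: vocabulary of 3 tokens, 2 hidden units (hypothetical values).
params = {
    "W_in": [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]],      # hidden x vocab
    "W_h": [[0.1, 0.0], [0.0, 0.1]],                   # hidden x hidden
    "W_out": [[0.3, -0.2], [0.1, 0.5], [-0.4, 0.2]],   # vocab x hidden
}
tokens = [0, 1, 2, 1]
clean = averaged_nll(tokens, params, noise_std=0.0, num_samples=1)
noisy = averaged_nll(tokens, params, noise_std=0.1, num_samples=8)
```

Setting `noise_std` to zero recovers the deterministic RNN loss, which makes the noise level a single knob for the strength of regularization in this sketch.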

PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages

Title PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages
Authors Yiqing Zhang, Jianzhong Qi, Rui Zhang, Chuandong Yin
Abstract Publication information in a researcher's academic homepage provides insights about the researcher's expertise, research interests, and collaboration networks. We aim to extract all the publication strings from a given academic homepage. This is a challenging task because the publication strings in different academic homepages may be located at different positions with different structures. To capture the positional and structural diversity, we propose an end-to-end hierarchical model named PubSE based on Bi-LSTM-CRF. We further propose an alternating training method for training the model. Experiments on real data show that PubSE outperforms the state-of-the-art models by up to 11.8% in F1-score.
Tasks
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1123/
PDF https://www.aclweb.org/anthology/D18-1123
PWC https://paperswithcode.com/paper/pubse-a-hierarchical-model-for-publication
Repo
Framework

Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions

Title Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions
Authors Rafael Ehren, Timm Lichte, Younes Samih
Abstract In this paper, we describe Mumpitz, the system we submitted to the PARSEME Shared task on automatic identification of verbal multiword expressions (VMWEs). Mumpitz consists of a Bidirectional Recurrent Neural Network (BRNN) with Long Short-Term Memory (LSTM) units and a heuristic that leverages the dependency information provided in the PARSEME corpus data to differentiate VMWEs in a sentence. We submitted results for seven languages in the closed track of the task and for one language in the open track. For the open track we used the same system, but with pretrained instead of randomly initialized word embeddings to improve the system performance.
Tasks Machine Translation, Word Embeddings
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4929/
PDF https://www.aclweb.org/anthology/W18-4929
PWC https://paperswithcode.com/paper/mumpitz-at-parseme-shared-task-2018-a
Repo
Framework

Recognizing Human Actions as the Evolution of Pose Estimation Maps

Title Recognizing Human Actions as the Evolution of Pose Estimation Maps
Authors Mengyuan Liu, Junsong Yuan
Abstract Most video-based action recognition approaches choose to extract features from the whole video to recognize actions. The cluttered background and non-action motions limit the performances of these methods, since they lack the explicit modeling of human body movements. With recent advances of human pose estimation, this work presents a novel method to recognize human action as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparse property of heatmap, we develop spatial rank pooling to aggregate the evolution of heatmaps as a body shape evolution image. As body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses as a body pose evolution image. The complementary properties between both types of images are explored by deep convolutional neural networks to predict action label. Experiments on NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.
Tasks Action Recognition In Videos, Multimodal Activity Recognition, Pose Estimation, Skeleton Based Action Recognition, Temporal Action Localization
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Liu_Recognizing_Human_Actions_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Recognizing_Human_Actions_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/recognizing-human-actions-as-the-evolution-of
Repo
Framework

Learning clip representations for skeleton-based 3d action recognition

Title Learning clip representations for skeleton-based 3d action recognition
Authors Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid
Abstract This paper presents a new representation of skeleton sequences for 3D action recognition. Existing methods based on hand-crafted features or recurrent neural networks cannot adequately capture the complex spatial structures and the long-term temporal dynamics of the skeleton sequences, which are very important to recognize the actions. In this paper, we propose to transform each channel of the 3D coordinates of a skeleton sequence into a clip. Each frame of the generated clip represents the temporal information of the entire skeleton sequence and one particular spatial relationship between the skeleton joints. The entire clip incorporates multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We also propose a multitask convolutional neural network (MTCNN) to learn the generated clips for action recognition. The proposed MTCNN processes all the frames of the generated clips in parallel to explore the spatial and temporal information of the skeleton sequences. The proposed method has been extensively tested on six challenging benchmark datasets. Experimental results consistently demonstrate the superiority of the proposed clip representation and the feature learning method for 3D action recognition compared to the existing techniques.
Tasks 3D Human Action Recognition, Skeleton Based Action Recognition
Published 2018-03-05
URL https://doi.org/10.1109/TIP.2018.2812099
PDF https://www.semanticscholar.org/paper/Learning-Clip-Representations-for-Skeleton-Based-3D-Ke-Bennamoun/ef761435c1af2b3e5caba5e8bbbf5aeab69d934e
PWC https://paperswithcode.com/paper/learning-clip-representations-for-skeleton
Repo
Framework

A Unified Syntax-aware Framework for Semantic Role Labeling

Title A Unified Syntax-aware Framework for Semantic Role Labeling
Authors Zuchao Li, Shexia He, Jiaxun Cai, Zhuosheng Zhang, Hai Zhao, Gongshen Liu, Linlin Li, Luo Si
Abstract Semantic role labeling (SRL) aims to recognize the predicate-argument structure of a sentence. Syntactic information has received a great deal of attention for its role in enhancing SRL. However, the latest advances show that syntax may not be so important for SRL, given the emerging, much smaller gap between syntax-aware and syntax-agnostic SRL. To comprehensively explore the role of syntax for the SRL task, we extend existing models and propose a unified framework to investigate more effective and more diverse ways of incorporating syntax into sequential neural networks. Exploring the effect of syntactic input quality on SRL performance, we confirm that a high-quality syntactic parse could still effectively enhance syntactically-driven SRL. Using an empirically optimized integration strategy, we even enlarge the gap between syntax-aware and syntax-agnostic SRL. Our framework achieves state-of-the-art results on CoNLL-2009 benchmarks both for English and Chinese, substantially outperforming all previous models.
Tasks Machine Translation, Question Answering, Semantic Role Labeling
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1262/
PDF https://www.aclweb.org/anthology/D18-1262
PWC https://paperswithcode.com/paper/a-unified-syntax-aware-framework-for-semantic
Repo
Framework

Unbabel: How to combine AI with the crowd to scale professional-quality translation

Title Unbabel: How to combine AI with the crowd to scale professional-quality translation
Authors João Graça
Abstract
Tasks Automatic Post-Editing
Published 2018-03-01
URL https://www.aclweb.org/anthology/W18-2103/
PDF https://www.aclweb.org/anthology/W18-2103
PWC https://paperswithcode.com/paper/unbabel-how-to-combine-ai-with-the-crowd-to
Repo
Framework

Investigating the Challenges of Temporal Relation Extraction from Clinical Text

Title Investigating the Challenges of Temporal Relation Extraction from Clinical Text
Authors Diana Galvan, Naoaki Okazaki, Koji Matsuda, Kentaro Inui
Abstract Temporal reasoning remains an unsolved task for Natural Language Processing (NLP), particularly in the clinical domain. The complexity of temporal representation in language is evident in the results of the 2016 Clinical TempEval challenge: the current state-of-the-art systems perform well on mention-identification tasks for event and time expressions but poorly on temporal relation extraction, showing a gap of around 0.25 points below human performance. We explore adapting the tree-based LSTM-RNN model proposed by Miwa and Bansal (2016) to temporal relation extraction from clinical text, obtaining a five-point improvement over the best 2016 Clinical TempEval system and two points over the state of the art. We deliver a deep analysis of the results and discuss the next step towards human-like temporal reasoning.
Tasks Named Entity Recognition, Question Answering, Relation Extraction, Temporal Information Extraction, Text Summarization
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-5607/
PDF https://www.aclweb.org/anthology/W18-5607
PWC https://paperswithcode.com/paper/investigating-the-challenges-of-temporal
Repo
Framework

Gaussian Process Neurons

Title Gaussian Process Neurons
Authors Sebastian Urban, Patrick van der Smagt
Abstract We propose a method to learn stochastic activation functions for use in probabilistic neural networks. First, we develop a framework to embed stochastic activation functions based on Gaussian processes in probabilistic neural networks. Second, we analytically derive expressions for the propagation of means and covariances in such a network, thus allowing for an efficient implementation and training without the need for sampling. Third, we show how to apply variational Bayesian inference to regularize and efficiently train this model. The resulting model can deal with uncertain inputs and implicitly provides an estimate of the confidence of its predictions. Like a conventional neural network it can scale to datasets of arbitrary size and be extended with convolutional and recurrent connections, if desired.
Tasks Bayesian Inference, Gaussian Processes
Published 2018-01-01
URL https://openreview.net/forum?id=By-IifZRW
PDF https://openreview.net/pdf?id=By-IifZRW
PWC https://paperswithcode.com/paper/gaussian-process-neurons
Repo
Framework

Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese

Title Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese
Authors Hugo Gonçalo Oliveira, Ricardo Rodrigues
Abstract
Tasks Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6604/
PDF https://www.aclweb.org/anthology/W18-6604
PWC https://paperswithcode.com/paper/exploring-lexical-semantic-knowledge-in-the
Repo
Framework

Early action prediction by soft regression

Title Early action prediction by soft regression
Authors Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jianguo Zhang
Abstract We propose a novel approach for predicting on-going action with the assistance of a low-cost depth camera. Our approach introduces a soft regression-based early prediction framework. In this framework, we estimate soft labels for the subsequences at different progress levels, jointly learned with an action predictor. Our formulation of the soft regression framework 1) overcomes a usual assumption in existing early action prediction systems that the progress level of the on-going sequence is given in the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at an early performing stage. The proposed soft regression framework is further enhanced in order to take the relationships among subsequences and the discrepancy of soft labels over different classes into consideration, so that a Multiple Soft labels Recurrent Neural Network (MSRNN) is finally developed. For real-time performance, we also introduce a "local accumulative frame feature (LAFF)", which can be computed efficiently by constructing an integral feature map. Our experiments on three RGB-D benchmark datasets and an unconstrained RGB action set demonstrate that the proposed regression-based early action prediction model outperforms existing models and that early action prediction on RGB-D sequences is more accurate than that on the RGB channel.
Tasks Skeleton Based Action Recognition
Published 2018-08-06
URL https://doi.org/10.1109/TPAMI.2018.2863279
PDF https://discovery.dundee.ac.uk/ws/portalfiles/portal/28028712/Early_Action_Prediction_by_Soft_Regression.pdf
PWC https://paperswithcode.com/paper/early-action-prediction-by-soft-regression
Repo
Framework
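
The abstract notes that the accumulative frame feature can be computed efficiently from an integral feature map. The exact definition of LAFF is not given here, so the sketch below shows only the general prefix-sum idea that an integral map usually implies: after one pass over the frames, the summed feature of any frame range costs a single vector subtraction. The function names and toy features are hypothetical.

```python
def build_integral_map(frame_features):
    """Prefix sums: integral[t] holds the sum of features of frames 0..t-1."""
    dim = len(frame_features[0])
    integral = [[0.0] * dim]
    for feature in frame_features:
        previous = integral[-1]
        integral.append([p + f for p, f in zip(previous, feature)])
    return integral


def accumulated_feature(integral, start, end):
    """Sum of per-frame features over frames start..end-1 in O(dim) time."""
    return [b - a for a, b in zip(integral[start], integral[end])]


# Four toy frames with 2-dimensional features (hypothetical values).
frames = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0], [1.5, 1.0]]
integral = build_integral_map(frames)
window = accumulated_feature(integral, 1, 3)  # frames 1 and 2
```

The cost of a query does not depend on the length of the frame range, which is what makes this kind of structure attractive for real-time prediction on partial sequences.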