October 15, 2019


Paper Group NANR 77


What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets


Title	What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
Authors	De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, Juan Carlos Niebles
Abstract	The ability to capture temporal information has been critical to the development of video understanding models. While there have been numerous attempts at modeling motion in videos, an explicit analysis of the effect of temporal information for video understanding is still missing. In this work, we aim to bridge this gap and ask the following question: How important is the motion in the video for recognizing the action? To this end, we propose two novel frameworks: (i) class-agnostic temporal generator and (ii) motion-invariant frame selector to reduce/remove motion for an ablation analysis without introducing other artifacts. This isolates the analysis of motion from other aspects of the video. The proposed frameworks provide a much tighter estimate of the effect of motion (from 25% to 6% on UCF101 and 15% to 5% on Kinetics) compared to baselines in our analysis. Our analysis provides critical insights about existing models like C3D, and how it could be made to achieve comparable results with a sparser set of frames.
Tasks	Video Understanding
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Huang_What_Makes_a_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_What_Makes_a_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/what-makes-a-video-a-video-analyzing-temporal
Repo
Framework

Unsupervised Bilingual Lexicon Induction via Latent Variable Models


Title	Unsupervised Bilingual Lexicon Induction via Latent Variable Models
Authors	Zi-Yi Dou, Zhi-Hao Zhou, Shujian Huang
Abstract	Bilingual lexicon extraction has been studied for decades and most previous methods have relied on parallel corpora or bilingual dictionaries. Recent studies have shown that it is possible to build a bilingual dictionary by aligning monolingual word embedding spaces in an unsupervised way. With the recent advances in generative models, we propose a novel approach which builds cross-lingual dictionaries via latent variable models and adversarial training with no parallel corpora. To demonstrate the effectiveness of our approach, we evaluate our approach on several language pairs and the experimental results show that our model could achieve competitive and even superior performance compared with several state-of-the-art models.
Tasks	Latent Variable Models, Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1062/
PDF	https://www.aclweb.org/anthology/D18-1062
PWC	https://paperswithcode.com/paper/unsupervised-bilingual-lexicon-induction-via
Repo
Framework

Complex Word Identification: Convolutional Neural Network vs. Feature Engineering


Title	Complex Word Identification: Convolutional Neural Network vs. Feature Engineering
Authors	Segun Taofeek Aroyehun, Jason Angel, Daniel Alejandro Pérez Alvarez, Alexander Gelbukh
Abstract	We describe the systems of NLP-CIC team that participated in the Complex Word Identification (CWI) 2018 shared task. The shared task aimed to benchmark approaches for identifying complex words in English and other languages from the perspective of non-native speakers. Our goal is to compare two approaches: feature engineering and a deep neural network. Both approaches achieved comparable performance on the English test set. We demonstrated the flexibility of the deep-learning approach by using the same deep neural network setup in the Spanish track. Our systems achieved competitive results: all our systems were within 0.01 of the system with the best macro-F1 score on the test sets except on Wikipedia test set, on which our best system is 0.04 below the best macro-F1 score.
Tasks	Complex Word Identification, Feature Engineering, Text Simplification
Published	2018-06-01
URL	https://www.aclweb.org/anthology/W18-0538/
PDF	https://www.aclweb.org/anthology/W18-0538
PWC	https://paperswithcode.com/paper/complex-word-identification-convolutional
Repo
Framework
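
The abstract above reports results as macro-F1. As a minimal sketch of how that metric is usually computed (per-class F1 averaged with equal weight per class), the following uses hypothetical helper names and toy labels that are not taken from the shared task data:

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example (illustrative labels only): 1 = complex word, 0 = simple word.
gold = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
score = macro_f1(gold, predicted)
```

Because every class counts equally, a system that does well only on the majority class is penalized more than it would be under accuracy.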

CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities


Title	CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
Authors	Yiyun Liang, Zhucheng Tu, Laetitia Huang, Jimmy Lin
Abstract	We demonstrate a JavaScript implementation of a convolutional neural network that performs feedforward inference completely in the browser. Such a deployment means that models can run completely on the client, on a wide range of devices, without making backend server requests. This design is useful for applications with stringent latency requirements or low connectivity. Our evaluations show the feasibility of JavaScript as a deployment target. Furthermore, an in-browser implementation enables seamless integration with the JavaScript ecosystem for information visualization, providing opportunities to visually inspect neural networks and better understand their inner workings.
Tasks	Interpretable Machine Learning, Sentence Classification, Sentiment Analysis
Published	2018-06-01
URL	https://www.aclweb.org/anthology/N18-5013/
PDF	https://www.aclweb.org/anthology/N18-5013
PWC	https://paperswithcode.com/paper/cnns-for-nlp-in-the-browser-client-side
Repo
Framework

Noise-Based Regularizers for Recurrent Neural Networks


Title	Noise-Based Regularizers for Recurrent Neural Networks
Authors	Adji B. Dieng, Jaan Altosaar, Rajesh Ranganath, David M. Blei
Abstract	Recurrent neural networks (RNNs) are powerful models for sequential data. They can approximate arbitrary computations, and have been used successfully in domains such as text and speech. However, the flexibility of RNNs makes them susceptible to overfitting and regularization is important. We develop a noise-based regularization method for RNNs. The idea is simple and easy to implement: we inject noise in the hidden units of the RNN and then maximize the original RNN’s likelihood averaged over the injected noise. On a language modeling benchmark, our method achieves better performance than the deterministic RNN and the variational dropout.
Tasks	Language Modelling
Published	2018-01-01
URL	https://openreview.net/forum?id=ryk77mbRZ
PDF	https://openreview.net/pdf?id=ryk77mbRZ
PWC	https://paperswithcode.com/paper/noise-based-regularizers-for-recurrent-neural
Repo
Framework
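
The idea described in the abstract above (inject noise into the hidden units, then average the likelihood over that noise) can be sketched roughly as follows. This is not the authors' implementation: the additive Gaussian noise, the unit-variance Gaussian output model, averaging the log-likelihood by Monte Carlo, and all function names and weights are assumptions made for illustration. The sketch only evaluates the objective; training would maximize it with respect to the weights.

```python
import math
import random


def rnn_step(h, x, w_hh, w_xh):
    """One tanh RNN step: h_new[i] = tanh(sum_j w_hh[i][j] * h[j] + w_xh[i] * x)."""
    return [
        math.tanh(sum(w_hh[i][j] * h[j] for j in range(len(h))) + w_xh[i] * x)
        for i in range(len(h))
    ]


def noisy_final_state(xs, w_hh, w_xh, sigma, rng):
    """Run the RNN over xs, perturbing the hidden units after every step."""
    h = [0.0] * len(w_xh)
    for x in xs:
        h = rnn_step(h, x, w_hh, w_xh)
        # Assumption: additive zero-mean Gaussian noise on each hidden unit.
        h = [h_i + rng.gauss(0.0, sigma) for h_i in h]
    return h


def log_likelihood(y, h, w_out):
    """Log-density of y under a unit-variance Gaussian centred on w_out . h."""
    mean = sum(w * h_i for w, h_i in zip(w_out, h))
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - mean) ** 2


def noise_averaged_log_likelihood(xs, y, w_hh, w_xh, w_out, sigma, n_samples, rng):
    """Monte Carlo estimate of the log-likelihood averaged over injected noise."""
    total = 0.0
    for _ in range(n_samples):
        h = noisy_final_state(xs, w_hh, w_xh, sigma, rng)
        total += log_likelihood(y, h, w_out)
    return total / n_samples


# Hypothetical toy weights and data, chosen only to make the sketch runnable.
w_hh = [[0.5, -0.2], [0.1, 0.3]]
w_xh = [0.8, -0.4]
w_out = [1.0, 0.5]
xs, y = [0.2, -0.1, 0.4], 0.3

clean = noise_averaged_log_likelihood(xs, y, w_hh, w_xh, w_out, 0.0, 1, random.Random(0))
noisy = noise_averaged_log_likelihood(xs, y, w_hh, w_xh, w_out, 0.1, 200, random.Random(0))
```

With `sigma` set to zero the objective reduces to the deterministic RNN's log-likelihood, which makes the noise level a single knob for the strength of the regularization in this sketch.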

PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages


Title	PubSE: A Hierarchical Model for Publication Extraction from Academic Homepages
Authors	Yiqing Zhang, Jianzhong Qi, Rui Zhang, Chuandong Yin
Abstract	Publication information in a researcher's academic homepage provides insights about the researcher's expertise, research interests, and collaboration networks. We aim to extract all the publication strings from a given academic homepage. This is a challenging task because the publication strings in different academic homepages may be located at different positions with different structures. To capture the positional and structural diversity, we propose an end-to-end hierarchical model named PubSE based on Bi-LSTM-CRF. We further propose an alternating training method for training the model. Experiments on real data show that PubSE outperforms the state-of-the-art models by up to 11.8% in F1-score.
Tasks
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1123/
PDF	https://www.aclweb.org/anthology/D18-1123
PWC	https://paperswithcode.com/paper/pubse-a-hierarchical-model-for-publication
Repo
Framework

Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions


Title	Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions
Authors	Rafael Ehren, Timm Lichte, Younes Samih
Abstract	In this paper, we describe Mumpitz, the system we submitted to the PARSEME Shared task on automatic identification of verbal multiword expressions (VMWEs). Mumpitz consists of a Bidirectional Recurrent Neural Network (BRNN) with Long Short-Term Memory (LSTM) units and a heuristic that leverages the dependency information provided in the PARSEME corpus data to differentiate VMWEs in a sentence. We submitted results for seven languages in the closed track of the task and for one language in the open track. For the open track we used the same system, but with pretrained instead of randomly initialized word embeddings to improve the system performance.
Tasks	Machine Translation, Word Embeddings
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-4929/
PDF	https://www.aclweb.org/anthology/W18-4929
PWC	https://paperswithcode.com/paper/mumpitz-at-parseme-shared-task-2018-a
Repo
Framework

Recognizing Human Actions as the Evolution of Pose Estimation Maps


Title	Recognizing Human Actions as the Evolution of Pose Estimation Maps
Authors	Mengyuan Liu, Junsong Yuan
Abstract	Most video-based action recognition approaches choose to extract features from the whole video to recognize actions. The cluttered background and non-action motions limit the performances of these methods, since they lack the explicit modeling of human body movements. With recent advances of human pose estimation, this work presents a novel method to recognize human action as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparse property of heatmap, we develop spatial rank pooling to aggregate the evolution of heatmaps as a body shape evolution image. As body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses as a body pose evolution image. The complementary properties between both types of images are explored by deep convolutional neural networks to predict action label. Experiments on NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.
Tasks	Action Recognition In Videos, Multimodal Activity Recognition, Pose Estimation, Skeleton Based Action Recognition, Temporal Action Localization
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Liu_Recognizing_Human_Actions_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Recognizing_Human_Actions_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/recognizing-human-actions-as-the-evolution-of
Repo
Framework

Learning clip representations for skeleton-based 3d action recognition


Title	Learning clip representations for skeleton-based 3d action recognition
Authors	Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid
Abstract	This paper presents a new representation of skeleton sequences for 3D action recognition. Existing methods based on hand-crafted features or recurrent neural networks cannot adequately capture the complex spatial structures and the long-term temporal dynamics of the skeleton sequences, which are very important to recognize the actions. In this paper, we propose to transform each channel of the 3D coordinates of a skeleton sequence into a clip. Each frame of the generated clip represents the temporal information of the entire skeleton sequence and one particular spatial relationship between the skeleton joints. The entire clip incorporates multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We also propose a multitask convolutional neural network (MTCNN) to learn the generated clips for action recognition. The proposed MTCNN processes all the frames of the generated clips in parallel to explore the spatial and temporal information of the skeleton sequences. The proposed method has been extensively tested on six challenging benchmark datasets. Experimental results consistently demonstrate the superiority of the proposed clip representation and the feature learning method for 3D action recognition compared to the existing techniques.
Tasks	3D Human Action Recognition, Skeleton Based Action Recognition
Published	2018-03-05
URL	https://doi.org/10.1109/TIP.2018.2812099
PDF	https://www.semanticscholar.org/paper/Learning-Clip-Representations-for-Skeleton-Based-3D-Ke-Bennamoun/ef761435c1af2b3e5caba5e8bbbf5aeab69d934e
PWC	https://paperswithcode.com/paper/learning-clip-representations-for-skeleton
Repo
Framework

A Unified Syntax-aware Framework for Semantic Role Labeling


Title	A Unified Syntax-aware Framework for Semantic Role Labeling
Authors	Zuchao Li, Shexia He, Jiaxun Cai, Zhuosheng Zhang, Hai Zhao, Gongshen Liu, Linlin Li, Luo Si
Abstract	Semantic role labeling (SRL) aims to recognize the predicate-argument structure of a sentence. Syntactic information has been paid a great attention over the role of enhancing SRL. However, the latest advance shows that syntax would not be so important for SRL with the emerging much smaller gap between syntax-aware and syntax-agnostic SRL. To comprehensively explore the role of syntax for SRL task, we extend existing models and propose a unified framework to investigate more effective and more diverse ways of incorporating syntax into sequential neural networks. Exploring the effect of syntactic input quality on SRL performance, we confirm that high-quality syntactic parse could still effectively enhance syntactically-driven SRL. Using empirically optimized integration strategy, we even enlarge the gap between syntax-aware and syntax-agnostic SRL. Our framework achieves state-of-the-art results on CoNLL-2009 benchmarks both for English and Chinese, substantially outperforming all previous models.
Tasks	Machine Translation, Question Answering, Semantic Role Labeling
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1262/
PDF	https://www.aclweb.org/anthology/D18-1262
PWC	https://paperswithcode.com/paper/a-unified-syntax-aware-framework-for-semantic
Repo
Framework

Unbabel: How to combine AI with the crowd to scale professional-quality translation


Title	Unbabel: How to combine AI with the crowd to scale professional-quality translation
Authors	João Graça
Abstract
Tasks	Automatic Post-Editing
Published	2018-03-01
URL	https://www.aclweb.org/anthology/W18-2103/
PDF	https://www.aclweb.org/anthology/W18-2103
PWC	https://paperswithcode.com/paper/unbabel-how-to-combine-ai-with-the-crowd-to
Repo
Framework

Investigating the Challenges of Temporal Relation Extraction from Clinical Text


Title	Investigating the Challenges of Temporal Relation Extraction from Clinical Text
Authors	Diana Galvan, Naoaki Okazaki, Koji Matsuda, Kentaro Inui
Abstract	Temporal reasoning remains as an unsolved task for Natural Language Processing (NLP), particularly demonstrated in the clinical domain. The complexity of temporal representation in language is evident as results of the 2016 Clinical TempEval challenge indicate: the current state-of-the-art systems perform well in solving mention-identification tasks of event and time expressions but poorly in temporal relation extraction, showing a gap of around 0.25 point below human performance. We explore to adapt the tree-based LSTM-RNN model proposed by Miwa and Bansal (2016) to temporal relation extraction from clinical text, obtaining a five point improvement over the best 2016 Clinical TempEval system and two points over the state-of-the-art. We deliver a deep analysis of the results and discuss the next step towards human-like temporal reasoning.
Tasks	Named Entity Recognition, Question Answering, Relation Extraction, Temporal Information Extraction, Text Summarization
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5607/
PDF	https://www.aclweb.org/anthology/W18-5607
PWC	https://paperswithcode.com/paper/investigating-the-challenges-of-temporal
Repo
Framework

Gaussian Process Neurons


Title	Gaussian Process Neurons
Authors	Sebastian Urban, Patrick van der Smagt
Abstract	We propose a method to learn stochastic activation functions for use in probabilistic neural networks. First, we develop a framework to embed stochastic activation functions based on Gaussian processes in probabilistic neural networks. Second, we analytically derive expressions for the propagation of means and covariances in such a network, thus allowing for an efficient implementation and training without the need for sampling. Third, we show how to apply variational Bayesian inference to regularize and efficiently train this model. The resulting model can deal with uncertain inputs and implicitly provides an estimate of the confidence of its predictions. Like a conventional neural network it can scale to datasets of arbitrary size and be extended with convolutional and recurrent connections, if desired.
Tasks	Bayesian Inference, Gaussian Processes
Published	2018-01-01
URL	https://openreview.net/forum?id=By-IifZRW
PDF	https://openreview.net/pdf?id=By-IifZRW
PWC	https://paperswithcode.com/paper/gaussian-process-neurons
Repo
Framework

Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese


Title	Exploring Lexical-Semantic Knowledge in the Generation of Novel Riddles in Portuguese
Authors	Hugo Gonçalo Oliveira, Ricardo Rodrigues
Abstract
Tasks	Text Generation
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-6604/
PDF	https://www.aclweb.org/anthology/W18-6604
PWC	https://paperswithcode.com/paper/exploring-lexical-semantic-knowledge-in-the
Repo
Framework

Early action prediction by soft regression


Title	Early action prediction by soft regression
Authors	Jian-Fang Hu, Wei-Shi Zheng, Lianyang Ma, Gang Wang, Jian-Huang Lai, Jianguo Zhang
Abstract	We propose a novel approach for predicting on-going action with the assistance of a low-cost depth camera. Our approach introduces a soft regression-based early prediction framework. In this framework, we estimate soft labels for the subsequences at different progress levels, jointly learned with an action predictor. Our formulation of soft regression framework 1) overcomes a usual assumption in existing early action prediction systems that the progress level of on-going sequence is given in the testing stage; and 2) presents a theoretical framework to better resolve the ambiguity and uncertainty of subsequences at early performing stage. The proposed soft regression framework is further enhanced in order to take the relationships among subsequences and the discrepancy of soft labels over different classes into consideration, so that a Multiple Soft labels Recurrent Neural Network (MSRNN) is finally developed. For real-time performance, we also introduce "local accumulative frame feature (LAFF)", which can be computed efficiently by constructing an integral feature map. Our experiments on three RGB-D benchmark datasets and an unconstrained RGB action set demonstrate that the proposed regression-based early action prediction model outperforms existing models and the early action prediction on RGB-D sequence is more accurate than that on RGB channel.
Tasks	Skeleton Based Action Recognition
Published	2018-08-06
URL	https://doi.org/10.1109/TPAMI.2018.2863279
PDF	https://discovery.dundee.ac.uk/ws/portalfiles/portal/28028712/Early_Action_Prediction_by_Soft_Regression.pdf
PWC	https://paperswithcode.com/paper/early-action-prediction-by-soft-regression
Repo
Framework
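
The abstract above says the accumulative frame feature can be computed efficiently from an integral feature map. The abstract does not define LAFF itself, so the following is only a sketch of the generic prefix-sum idea it appears to rely on: store running sums of per-frame feature vectors once, then obtain the accumulated feature of any frame range with a single subtraction. Function names and the toy features are hypothetical.

```python
def build_integral_map(frame_features):
    """Return prefix sums: integral[t] is the sum of the first t frame features."""
    dim = len(frame_features[0]) if frame_features else 0
    integral = [[0.0] * dim]
    for feat in frame_features:
        prev = integral[-1]
        integral.append([p + f for p, f in zip(prev, feat)])
    return integral


def accumulated_feature(integral, start, end):
    """Sum of per-frame features over frames start..end-1 (half-open range)."""
    return [e - s for s, e in zip(integral[start], integral[end])]


# Toy per-frame features (illustrative values only): three frames, two dimensions.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
integral = build_integral_map(frames)
last_two = accumulated_feature(integral, 1, 3)
```

Building the map costs one pass over the frames; after that, each subsequence query costs time proportional to the feature dimension rather than to the subsequence length, which is what makes the approach attractive for real-time prediction on partially observed sequences.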