October 16, 2019

Paper Group NANR 11

Multi-Level Policy and Reward Reinforcement Learning for Image Captioning

Title Multi-Level Policy and Reward Reinforcement Learning for Image Captioning
Authors An-An Liu, Ning Xu, Hanwang Zhang, Weizhi Nie, Yuting Su, Yongdong Zhang
Abstract Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function that does not well fit the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) Multi-Level Policy Network that can adaptively fuse the word-level policy and the sentence-level policy for the word generation; and 2) Multi-Level Reward Function that collaboratively leverages both vision-language reward and language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework can achieve competing performances with respect to different evaluation metrics.
Tasks Image Captioning
Published 2018-06-15
URL https://www.ijcai.org/proceedings/2018/0114.pdf
PDF https://www.ijcai.org/proceedings/2018/0114.pdf
PWC https://paperswithcode.com/paper/multi-level-policy-and-reward-reinforcement
Repo
Framework

JAIST Annotated Corpus of Free Conversation

Title JAIST Annotated Corpus of Free Conversation
Authors Kiyoaki Shirai, Tomotaka Fukuoka
Abstract
Tasks Dialog Act Classification
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1119/
PDF https://www.aclweb.org/anthology/L18-1119
PWC https://paperswithcode.com/paper/jaist-annotated-corpus-of-free-conversation
Repo
Framework

Connecting Language and Vision to Actions

Title Connecting Language and Vision to Actions
Authors Peter Anderson, Abhishek Das, Qi Wu
Abstract A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment. To this end, recent advances at the intersection of language and vision have made incredible progress, from being able to generate natural language descriptions of images/videos, to answering questions about them, to even holding free-form conversations about visual content! However, while these agents can passively describe images or answer (a sequence of) questions about them, they cannot act in the world (what if I cannot answer a question from my current view, or I am asked to move or manipulate something?). Thus, the challenge now is to extend this progress in language and vision to embodied agents that take actions and actively interact with their visual environments. To reduce the entry barrier for new researchers, this tutorial will provide an overview of the growing number of multimodal tasks and datasets that combine textual and visual understanding. We will comprehensively review existing state-of-the-art approaches to selected tasks such as image captioning, visual question answering (VQA) and visual dialog, presenting the key architectural building blocks (such as co-attention) and novel algorithms (such as cooperative/adversarial games) used to train models for these tasks. We will then discuss some of the current and upcoming challenges of combining language, vision and actions, and introduce some recently-released interactive 3D simulation environments designed for this purpose.
Tasks Image Captioning, Language Modelling, Question Answering, Visual Dialog, Visual Question Answering
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-5004/
PDF https://www.aclweb.org/anthology/P18-5004
PWC https://paperswithcode.com/paper/connecting-language-and-vision-to-actions
Repo
Framework

Out-of-domain Detection based on Generative Adversarial Network

Title Out-of-domain Detection based on Generative Adversarial Network
Authors Seonghan Ryu, Sangjun Koo, Hwanjo Yu, Gary Geunbae Lee
Abstract The main goal of this paper is to develop out-of-domain (OOD) detection for dialog systems. We propose to use only in-domain (IND) sentences to build a generative adversarial network (GAN) of which the discriminator generates low scores for OOD sentences. To improve basic GANs, we apply feature matching loss in the discriminator, use domain-category analysis as an additional task in the discriminator, and remove the biases in the generator. Thereby, we reduce the huge effort of collecting OOD sentences for training OOD detection. For evaluation, we experimented OOD detection on a multi-domain dialog system. The experimental results showed the proposed method was most accurate compared to the existing methods.
Tasks Sentence Embedding
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1077/
PDF https://www.aclweb.org/anthology/D18-1077
PWC https://paperswithcode.com/paper/out-of-domain-detection-based-on-generative
Repo
Framework

Now You Shake Me: Towards Automatic 4D Cinema

Title Now You Shake Me: Towards Automatic 4D Cinema
Authors Yuhao Zhou, Makarand Tapaswi, Sanja Fidler
Abstract We are interested in enabling automatic 4D cinema by parsing physical and special effects from untrimmed movies. These include effects such as physical interactions, water splashing, light, and shaking, and are grounded to either a character in the scene or the camera. We collect a new dataset referred to as the Movie4D dataset which annotates over 9K effects in 63 movies. We propose a Conditional Random Field model atop a neural network that brings together visual and audio information, as well as semantics in the form of person tracks. Our model further exploits correlations of effects between different characters in the clip as well as across movie threads. We propose effect detection and classification as two tasks, and present results along with ablation studies on our dataset, paving the way towards 4D cinema in everyone’s homes.
Tasks
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Zhou_Now_You_Shake_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Now_You_Shake_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/now-you-shake-me-towards-automatic-4d-cinema
Repo
Framework

An Encoder-decoder Approach to Predicting Causal Relations in Stories

Title An Encoder-decoder Approach to Predicting Causal Relations in Stories
Authors Melissa Roemmele, Andrew Gordon
Abstract We address the task of predicting causally related events in stories according to a standard evaluation framework, the Choice of Plausible Alternatives (COPA). We present a neural encoder-decoder model that learns to predict relations between adjacent sequences in stories as a means of modeling causality. We explore this approach using different methods for extracting and representing sequence pairs as well as different model architectures. We also compare the impact of different training datasets on our model. In particular, we demonstrate the usefulness of a corpus not previously applied to COPA, the ROCStories corpus. While not state-of-the-art, our results establish a new reference point for systems evaluated on COPA, and one that is particularly informative for future neural-based approaches.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-1506/
PDF https://www.aclweb.org/anthology/W18-1506
PWC https://paperswithcode.com/paper/an-encoder-decoder-approach-to-predicting
Repo
Framework

A Goal-oriented Neural Conversation Model by Self-Play

Title A Goal-oriented Neural Conversation Model by Self-Play
Authors Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li
Abstract Building chatbots that can accomplish goals such as booking a flight ticket is an unsolved problem in natural language understanding. Much progress has been made to build conversation models using techniques such as sequence2sequence modeling. One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals. Recently, many methods have been proposed to address this issue by optimizing a reward that contains task status or outcome. However, adding the reward optimization on the fly usually provides little guidance for language construction and the conversation model soon becomes decoupled from the language model. In this paper, we propose a new setting in goal-oriented dialogue system to tighten the gap between these two aspects by enforcing model level information isolation on individual models between two agents. Language construction now becomes an important part in reward optimization since it is the only way information can be exchanged. We experimented our models using self-play and results showed that our method not only beat the baseline sequence2sequence model in rewards but can also generate human-readable meaningful conversations of comparable quality.
Tasks Language Modelling
Published 2018-01-01
URL https://openreview.net/forum?id=HJXyS7bRb
PDF https://openreview.net/pdf?id=HJXyS7bRb
PWC https://paperswithcode.com/paper/a-goal-oriented-neural-conversation-model-by
Repo
Framework

Mining Possessions: Existence, Type and Temporal Anchors

Title Mining Possessions: Existence, Type and Temporal Anchors
Authors Dhivya Chinnappa, Eduardo Blanco
Abstract This paper presents a corpus and experiments to mine possession relations from text. Specifically, we target alienable and control possessions, and assign temporal anchors indicating when the possession holds between possessor and possessee. We present new annotations for this task, and experimental results using both traditional classifiers and neural networks. Results show that the three subtasks (predicting possession existence, possession type and temporal anchors) can be automated.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1046/
PDF https://www.aclweb.org/anthology/N18-1046
PWC https://paperswithcode.com/paper/mining-possessions-existence-type-and
Repo
Framework

Text Simplification from Professionally Produced Corpora

Title Text Simplification from Professionally Produced Corpora
Authors Carolina Scarton, Gustavo Paetzold, Lucia Specia
Abstract
Tasks Lexical Simplification, Machine Translation, Text Simplification
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1553/
PDF https://www.aclweb.org/anthology/L18-1553
PWC https://paperswithcode.com/paper/text-simplification-from-professionally
Repo
Framework

UC3M-NII Team at SemEval-2018 Task 7: Semantic Relation Classification in Scientific Papers via Convolutional Neural Network

Title UC3M-NII Team at SemEval-2018 Task 7: Semantic Relation Classification in Scientific Papers via Convolutional Neural Network
Authors Víctor Suárez-Paniagua, Isabel Segura-Bedmar, Akiko Aizawa
Abstract This paper reports our participation for SemEval-2018 Task 7 on extraction and classification of relationships between entities in scientific papers. Our approach is based on the use of a Convolutional Neural Network (CNN) trained on 350 abstracts with manually annotated entities and relations. Our hypothesis is that this deep learning model can be applied to extract and classify relations between entities for scientific papers at the same time. We use the Part-of-Speech and the distances to the target entities as part of the embedding for each word and we blind all the entities by marker names. In addition, we use sampling techniques to overcome the imbalance issues of this dataset. Our architecture obtained an F1-score of 35.4% for the relation extraction task and 18.5% for the relation classification task with a basic configuration of the one step CNN.
Tasks Relation Classification, Relation Extraction, Sentence Classification, Sentiment Analysis
Published 2018-06-01
URL https://www.aclweb.org/anthology/S18-1126/
PDF https://www.aclweb.org/anthology/S18-1126
PWC https://paperswithcode.com/paper/uc3m-nii-team-at-semeval-2018-task-7-semantic
Repo
Framework

Compositional Language Modeling for Icon-Based Augmentative and Alternative Communication

Title Compositional Language Modeling for Icon-Based Augmentative and Alternative Communication
Authors Shiran Dudy, Steven Bedrick
Abstract Icon-based communication systems are widely used in the field of Augmentative and Alternative Communication. Typically, icon-based systems have lagged behind word- and character-based systems in terms of predictive typing functionality, due to the challenges inherent to training icon-based language models. We propose a method for synthesizing training data for use in icon-based language models, and explore two different modeling strategies. We propose a method to generate language models for corpus-less symbol-set.
Tasks Language Modelling, Spoken Language Understanding
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-3404/
PDF https://www.aclweb.org/anthology/W18-3404
PWC https://paperswithcode.com/paper/compositional-language-modeling-for-icon
Repo
Framework

Visual Attention Model for Name Tagging in Multimodal Social Media

Title Visual Attention Model for Name Tagging in Multimodal Social Media
Authors Di Lu, Leonardo Neves, Vitor Carvalho, Ning Zhang, Heng Ji
Abstract Everyday billions of multimodal posts containing both images and text are shared in social media sites such as Snapchat, Twitter or Instagram. This combination of image and text in a single message allows for more creative and expressive forms of communication, and has become increasingly common in such sites. This new paradigm brings new challenges for natural language understanding, as the textual component tends to be shorter, more informal, and often is only understood if combined with the visual context. In this paper, we explore the task of name tagging in multimodal social media posts. We start by creating two new multimodal datasets: the first based on Twitter posts and the second based on Snapchat captions (exclusively submitted to public and crowd-sourced stories). We then propose a novel model architecture based on Visual Attention that not only provides deeper visual understanding on the decisions of the model, but also significantly outperforms other state-of-the-art baseline methods for this task.
Tasks Question Answering
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1185/
PDF https://www.aclweb.org/anthology/P18-1185
PWC https://paperswithcode.com/paper/visual-attention-model-for-name-tagging-in
Repo
Framework

Nested Named Entity Recognition Revisited

Title Nested Named Entity Recognition Revisited
Authors Arzoo Katiyar, Claire Cardie
Abstract We propose a novel recurrent neural network-based approach to simultaneously handle nested named entity recognition and nested entity mention detection. The model learns a hypergraph representation for nested entities using features extracted from a recurrent neural network. In evaluations on three standard data sets, we show that our approach significantly outperforms existing state-of-the-art methods, which are feature-based. The approach is also efficient: it operates linearly in the number of tokens and the number of possible output labels at any token. Finally, we present an extension of our model that jointly learns the head of each entity mention.
Tasks Coreference Resolution, Named Entity Recognition, Nested Named Entity Recognition, Opinion Mining, Relation Extraction
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1079/
PDF https://www.aclweb.org/anthology/N18-1079
PWC https://paperswithcode.com/paper/nested-named-entity-recognition-revisited
Repo
Framework

Self-Learning Architecture for Natural Language Generation

Title Self-Learning Architecture for Natural Language Generation
Authors Hyungtak Choi, Siddarth K.M., Haehun Yang, Heesik Jeon, Inchul Hwang, Jihie Kim
Abstract In this paper, we propose a self-learning architecture for generating natural language templates for conversational assistants. Generating templates to cover all the combinations of slots in an intent is time consuming and labor-intensive. We examine three different models based on our proposed architecture - Rule-based model, Sequence-to-Sequence (Seq2Seq) model and Semantically Conditioned LSTM (SC-LSTM) model for the IoT domain - to reduce the human labor required for template generation. We demonstrate the feasibility of template generation for the IoT domain using our self-learning architecture. In both automatic and human evaluation, the self-learning architecture outperforms previous works trained with a fully human-labeled dataset. This is promising for commercial conversational assistant solutions.
Tasks Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6520/
PDF https://www.aclweb.org/anthology/W18-6520
PWC https://paperswithcode.com/paper/self-learning-architecture-for-natural
Repo
Framework

Good Line Cutting: towards Accurate Pose Tracking of Line-assisted VO/VSLAM

Title Good Line Cutting: towards Accurate Pose Tracking of Line-assisted VO/VSLAM
Authors Yipu Zhao, Patricio A. Vela
Abstract This paper tackles a problem in line-assisted VO/VSLAM: accurately solving the least squares pose optimization with unreliable 3D line input. The solution we present is good line cutting, which extracts the most-informative sub-segment from each 3D line for use within the pose optimization formulation. By studying the impact of line cutting towards the information gain of pose estimation in line-based least squares problem, we demonstrate the applicability of improving pose estimation accuracy with good line cutting. To that end, we describe an efficient algorithm that approximately approaches the joint optimization problem of good line cutting. The proposed algorithm is integrated into a state-of-the-art line-assisted VSLAM system. When evaluated in two target scenarios of line-assisted VO/VSLAM, low-texture and motion blur, the accuracy of pose tracking is improved, while the robustness is preserved.
Tasks Pose Estimation, Pose Tracking
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Yipu_Zhao_Good_Line_Cutting_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Yipu_Zhao_Good_Line_Cutting_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/good-line-cutting-towards-accurate-pose
Repo
Framework