October 16, 2019


Paper Group ANR 1157


An Attempt towards Interpretable Audio-Visual Video Captioning


Title	An Attempt towards Interpretable Audio-Visual Video Captioning
Authors	Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu
Abstract	Automatically generating a natural language sentence to describe the content of an input video is a very challenging problem. It is an essential multimodal task in which auditory and visual contents are equally important. Although audio information has been exploited to improve video captioning in previous works, it is usually regarded as an additional feature fed into a black-box fusion machine. How are the words in the generated sentences associated with the auditory and visual modalities? This question has not yet been investigated. In this paper, we make the first attempt to design an interpretable audio-visual video captioning network that discovers the association between words in sentences and audio-visual sequences. To achieve this, we propose a multimodal convolutional neural network-based audio-visual video captioning framework and introduce a modality-aware module for exploring modality selection during sentence generation. In addition, we collect new audio captioning and visual captioning datasets to further explore the interactions between auditory and visual modalities for high-level video understanding. Extensive experiments demonstrate that the modality-aware module makes our model interpretable with respect to modality selection during sentence generation. Even with the added interpretability, our video captioning network still achieves performance comparable to recent state-of-the-art methods.
Tasks	Audio-Visual Video Captioning, Image Captioning, Video Captioning, Video Understanding
Published	2018-12-07
URL	http://arxiv.org/abs/1812.02872v1
PDF	http://arxiv.org/pdf/1812.02872v1.pdf
PWC	https://paperswithcode.com/paper/an-attempt-towards-interpretable-audio-visual
Repo
Framework
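The abstract does not say how the modality-aware module weighs the two streams. Purely as a hypothetical illustration, assuming the module produces one relevance score per modality at each decoding step and normalizes them with a softmax, the NumPy sketch below shows how such weights could expose which modality a generated word relies on. The function name, the parameters `W_a`, `W_v`, `w` and the scoring form are all assumptions, not the authors' design.

```python
import numpy as np

def modality_aware_fusion(h, audio_feat, visual_feat, W_a, W_v, w):
    """Hypothetical soft modality selection for one decoding step (not the paper's exact module).

    h: decoder state (d,); audio_feat, visual_feat: pooled features (d,);
    W_a, W_v: (d, d) projections; w: (d,) scoring vector. All are assumptions.
    """
    # One scalar relevance score per modality, conditioned on the decoder state.
    scores = np.array([
        w @ np.tanh(W_a @ audio_feat + h),
        w @ np.tanh(W_v @ visual_feat + h),
    ])
    # Softmax over the two modalities; these weights are the inspectable signal.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights[0] * audio_feat + weights[1] * visual_feat
    return context, weights

rng = np.random.default_rng(0)
d = 8
context, weights = modality_aware_fusion(
    rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d),
)
print("audio weight %.3f, visual weight %.3f" % (weights[0], weights[1]))
```

Logging these two weights for every generated word would give a per-word audio-versus-visual attribution of the kind the abstract describes.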

Geometric Image Synthesis


Title	Geometric Image Synthesis
Authors	Hassan Abu Alhaija, Siva Karthik Mustikovela, Andreas Geiger, Carsten Rother
Abstract	The task of generating natural images from 3D scenes has been a long-standing goal in computer graphics. On the other hand, recent developments in deep neural networks allow for trainable models that can produce natural-looking images with little or no knowledge about the scene structure. While the generated images often consist of realistic-looking local patterns, their overall structure is often inconsistent. In this work we propose a trainable, geometry-aware image generation method that leverages various types of scene information, including geometry and segmentation, to create realistic-looking natural images that match the desired scene structure. Our geometrically consistent image synthesis method is a deep neural network, called the Geometry to Image Synthesis (GIS) framework, which retains the advantages of a trainable method, e.g., differentiability and adaptiveness, but, at the same time, takes a step towards the generalizability, control and output quality of modern graphics rendering engines. We utilize the GIS framework to insert vehicles in outdoor driving scenes, as well as to generate novel views of objects from the Linemod dataset. We qualitatively show that our network is able to generalize beyond the training set to novel scene geometries, object shapes and segmentations. Furthermore, we quantitatively show that the GIS framework can be used to synthesize large amounts of training data, which proves beneficial for training instance segmentation models.
Tasks	Image Generation, Instance Segmentation, Semantic Segmentation
Published	2018-09-12
URL	http://arxiv.org/abs/1809.04696v2
PDF	http://arxiv.org/pdf/1809.04696v2.pdf
PWC	https://paperswithcode.com/paper/geometric-image-synthesis
Repo
Framework

Addressing Class Imbalance in Classification Problems of Noisy Signals by using Fourier Transform Surrogates


Title	Addressing Class Imbalance in Classification Problems of Noisy Signals by using Fourier Transform Surrogates
Authors	Justus T. C. Schwabedal, John C. Snyder, Ayse Cakmak, Shamim Nemati, Gari D. Clifford
Abstract	Randomizing the Fourier-transform (FT) phases of temporal-spatial data generates surrogates that approximate examples from the data-generating distribution. We propose such FT surrogates as a novel tool to augment and analyze the training of neural networks and explore the approach using sleep-stage classification as an example. By computing FT surrogates of raw EEG, EOG, and EMG signals of under-represented sleep stages, we balanced the CAPSLPDB sleep database. We then trained and tested a convolutional neural network for sleep-stage classification, and found that our surrogate-based augmentation improved the mean F1-score by 7%. As another application of FT surrogates, we formulated an approach to compute saliency maps for individual sleep epochs. The visualization is based on how the inferred class probabilities respond when short data segments are replaced by partial surrogates. To quantify how well the distributions of the surrogates and the original data match, we evaluated a trained classifier on surrogates of correctly classified examples, and summarized these conditional predictions in a confusion matrix. We show how such conditional confusion matrices can qualitatively explain the performance of surrogates in class balancing. The FT-surrogate augmentation approach may improve classification on noisy signals if carefully adapted to the data distribution under analysis.
Tasks	EEG
Published	2018-06-20
URL	http://arxiv.org/abs/1806.08675v2
PDF	http://arxiv.org/pdf/1806.08675v2.pdf
PWC	https://paperswithcode.com/paper/addressing-class-imbalance-in-classification
Repo
Framework
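Phase randomization itself is a standard operation, so it can be sketched directly. A minimal single-channel version, assuming NumPy's real FFT and phases drawn uniformly from [0, 2π): the amplitude spectrum (and therefore the power spectrum) is kept while the phases are scrambled. The function name `ft_surrogate` and the toy signal are illustrative assumptions; the paper's handling of multichannel EEG/EOG/EMG and of partial surrogates is not reproduced here.

```python
import numpy as np

def ft_surrogate(x, rng):
    """Phase-randomized FT surrogate of a real 1-D signal (single-channel sketch).

    Keeps |FFT(x)| exactly and adds a uniformly random phase to every frequency bin.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    # Leave the DC bin (and the Nyquist bin for even n) untouched so the result
    # stays real-valued and keeps the original mean.
    phases[0] = 0.0
    if n % 2 == 0:
        phases[-1] = 0.0
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 256, endpoint=False)
eeg_like = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)  # toy signal
s = ft_surrogate(eeg_like, rng)
```

For class balancing along the lines of the abstract, one would generate several such surrogates per epoch of an under-represented sleep stage and add them to the training set with that stage's label.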

Entropy based Independent Learning in Anonymous Multi-Agent Settings


Title	Entropy based Independent Learning in Anonymous Multi-Agent Settings
Authors	Tanvi Verma, Pradeep Varakantham, Hoong Chuin Lau
Abstract	Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services: for instance, Uber, Lyft and Grab match taxis to customers, while Ubereats, Deliveroo, FoodPanda, etc. match restaurants to customers. In these online-to-offline service problems, individuals who are responsible for supply (e.g., taxi drivers, delivery bikes or delivery van drivers) earn more by being at the “right” place at the “right” time. We are interested in developing approaches that learn to guide individuals to be in the “right” place at the “right” time (to maximize revenue) in the presence of other similar “learning” individuals and with only local, aggregated observations of other agents' states (e.g., only the number of other taxis in the same zone as the current agent). A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) depends only on the number and not on the identity of the agents. We model these problems using the Anonymous MARL (AyMARL) model. The key contribution of this paper is in employing the principle of maximum entropy to provide a general framework of independent learning that is both empirically effective (even with only local aggregated information about the agent population distribution) and theoretically justified. Finally, on a generic simulator for online-to-offline services and on a real-world taxi problem, our approaches provide a significant improvement in joint and individual revenue over existing approaches. More importantly, this is achieved while having the lowest variance in revenues earned by the learning individuals, an indicator of fairness.
Tasks	Multi-agent Reinforcement Learning
Published	2018-03-27
URL	https://arxiv.org/abs/1803.09928v4
PDF	https://arxiv.org/pdf/1803.09928v4.pdf
PWC	https://paperswithcode.com/paper/maximum-entropy-based-independent-learning-in
Repo
Framework
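The abstract does not spell out the learning rule, so the following is for orientation only: it shows the generic maximum-entropy policy used in entropy-regularized RL, not the authors' algorithm. For fixed action values Q, the distribution that maximizes E[Q] + τ·H(π) is the Boltzmann (softmax) policy π(a) ∝ exp(Q(a)/τ). The zone values and temperatures below are hypothetical.

```python
import numpy as np

def max_entropy_policy(q_values, temperature):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / temperature).

    For fixed Q this maximizes E_pi[Q] + temperature * H(pi).
    """
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; does not change the result
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def objective(p, q_values, temperature):
    """Entropy-regularized value E_p[Q] + temperature * H(p)."""
    return float(p @ np.asarray(q_values)) + temperature * entropy(p)

# Hypothetical expected revenues for moving to three zones.
q = [1.0, 1.2, 0.8]
for tau in (0.05, 1.0, 10.0):
    pi = max_entropy_policy(q, tau)
    print(tau, np.round(pi, 3), round(entropy(pi), 3))
```

Low temperatures approach a greedy zone choice, while high temperatures make the choice closer to uniform.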

Learning to Play Pong using Policy Gradient Learning


Title	Learning to Play Pong using Policy Gradient Learning
Authors	Somnuk Phon-Amnuaisuk
Abstract	Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following parameters: state values, V; state-action values, Q; and policy, pi. These parameters are commonly implemented as arrays. Scaling up the problem means scaling up the size of the array, which quickly leads to a computational bottleneck. To get around this, the RL problem is commonly formulated to learn a specific task using hand-crafted input features that curb the size of the array. In this report, we discuss an alternative, end-to-end Deep Reinforcement Learning (DRL) approach in which the DRL agent attempts to learn general task representations; in our context, this means learning to play the Pong game from a sequence of screen snapshots without game-specific hand-crafted features. We apply artificial neural networks (ANN) to approximate a policy of the RL model. The policy network, via the Policy Gradients (PG) method, learns to play the Pong game from a sequence of frames without any extra semantics apart from the pixel information and the score. In contrast to the traditional tabular RL approach, where the contents of the array have clear interpretations such as V or Q, the interpretation of knowledge content from the weights of the policy network is more elusive. In this work, we experiment with various deep ANN architectures, i.e., Feedforward ANN (FFNN), Convolution ANN (CNN) and Asynchronous Advantage Actor-Critic (A3C). We also examine the activation of hidden nodes and the weights between the input and the hidden layers, before and after the DRL agent has successfully learnt to play the Pong game. Insights into the internal learning mechanisms and future research directions are then discussed.
Tasks
Published	2018-07-23
URL	http://arxiv.org/abs/1807.08452v1
PDF	http://arxiv.org/pdf/1807.08452v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-play-pong-using-policy-gradient
Repo
Framework
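The policy-gradient idea in the abstract can be illustrated with REINFORCE on the simplest possible policy network. This sketch assumes a single linear layer with a sigmoid output for P(UP | frame) and frames that are already preprocessed and flattened. It is a simplification for exposition, not the FFNN, CNN or A3C models the report compares, and the episode data is random and hypothetical.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def reinforce_gradient(W, frames, actions, rewards, gamma=0.99):
    """REINFORCE gradient for a linear-logistic policy P(UP | x) = sigmoid(W @ x).

    frames: (T, D) preprocessed pixel vectors; actions: (T,) with 1 = UP, 0 = DOWN.
    """
    G = discounted_returns(rewards, gamma)
    G = (G - G.mean()) / (G.std() + 1e-8)  # normalized returns act as a simple baseline
    p_up = 1.0 / (1.0 + np.exp(-(frames @ W)))
    # For a Bernoulli policy, d/dW log pi(a | x) = (a - p_up) * x.
    return ((actions - p_up) * G) @ frames

rng = np.random.default_rng(0)
T, D = 5, 16  # hypothetical episode length and flattened-frame size
frames = rng.normal(size=(T, D))
actions = np.array([1, 0, 1, 1, 0])
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # +1 when the agent wins the point
W = np.zeros(D)
W += 1e-3 * reinforce_gradient(W, frames, actions, rewards)  # one gradient-ascent step
```

Swapping the linear layer for a deeper network changes only how log π(a | x) and its gradient are computed; the return-weighted update stays the same.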

The use of Virtual Reality in Enhancing Interdisciplinary Research and Education


Title	The use of Virtual Reality in Enhancing Interdisciplinary Research and Education
Authors	Tiffany Leung, Farhana Zulkernine, Haruna Isah
Abstract	Virtual Reality (VR) is increasingly recognized for its educational potential and as an effective way to convey new knowledge to people; it supports interactive and collaborative activities. Affordable VR powered by mobile technologies is opening a new world of opportunities that can transform the ways in which we learn and engage with others. This paper reports our study of the application of VR in stimulating interdisciplinary communication. It investigates the promise of VR in interdisciplinary education and research. The main contributions of this study are (i) a literature review of the learning theories that justify the use of VR systems in education, (ii) a taxonomy of the various types and implementations of VR systems and their application in supporting education and research, (iii) an evaluation of educational applications of VR from a broad range of disciplines, (iv) an investigation of how the learning process and learning outcomes are affected by VR systems, and (v) a comparative analysis of VR and traditional methods of teaching in terms of quality of learning. This study seeks to inspire and inform interdisciplinary researchers and learners about the ways in which VR might support them, and also to encourage VR software developers to push the limits of their craft.
Tasks
Published	2018-09-23
URL	http://arxiv.org/abs/1809.08585v1
PDF	http://arxiv.org/pdf/1809.08585v1.pdf
PWC	https://paperswithcode.com/paper/the-use-of-virtual-reality-in-enhancing
Repo
Framework

A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions


Title	A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions
Authors	Wenqing Bao, Xiaoqiang Cai, Xianyi Wu
Abstract	This paper proposes a general framework of multi-armed bandit (MAB) processes by introducing a type of restriction on the switches among arms that evolve in continuous time. The Gittins index process is constructed for any single arm subject to the restrictions on switches, and then the optimality of the corresponding Gittins index rule is established. The Gittins indices defined in this paper are consistent with those for MAB processes in continuous time, integer time, the semi-Markovian setting and the general discrete-time setting, so the new theory covers the classical models as special cases and also applies to many other situations that have not yet been addressed in the literature. While the proof of the optimality of Gittins index policies benefits from ideas in the existing theory of MAB processes in continuous time, new techniques are introduced that drastically simplify the proof.
Tasks
Published	2018-08-20
URL	http://arxiv.org/abs/1808.06314v2
PDF	http://arxiv.org/pdf/1808.06314v2.pdf
PWC	https://paperswithcode.com/paper/a-general-framework-of-multi-armed-bandit
Repo
Framework
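The paper's continuous-time framework with switching restrictions is beyond a short example, but the index it generalizes is easy to illustrate. In the classical discrete-time case with discount factor β, and assuming for simplicity an arm whose future rewards r_0, r_1, … are known in advance, the Gittins index is the largest ratio of discounted reward to discounted time over stopping times τ ≥ 1, i.e. max over τ of (Σ_{t<τ} β^t r_t) / (Σ_{t<τ} β^t). With a deterministic reward stream, stopping times reduce to stopping steps, so a single pass suffices. The arm names and rewards below are hypothetical.

```python
def gittins_index_deterministic(rewards, beta):
    """Gittins index of an arm with a known (deterministic) reward sequence.

    Classical discrete-time special case: the maximum, over stopping steps
    tau >= 1 within the given horizon, of discounted reward / discounted time.
    """
    best = float("-inf")
    reward_sum = 0.0
    time_sum = 0.0
    for t, r in enumerate(rewards):
        reward_sum += beta ** t * r
        time_sum += beta ** t
        best = max(best, reward_sum / time_sum)  # candidate: stop after step t
    return best

# The Gittins rule plays the arm with the largest current index.
arms = {"A": [1.0, 3.0, 0.0], "B": [1.5, 1.5, 1.5]}
indices = {name: gittins_index_deterministic(r, beta=0.5) for name, r in arms.items()}
print(indices, "-> play", max(indices, key=indices.get))
```

Arm A's index is 5/3 (stopping after its second step), which exceeds arm B's constant 1.5, so the rule plays A first even though B pays more on the very first pull.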

Learning Cognitive Models using Neural Networks


Title	Learning Cognitive Models using Neural Networks
Authors	Devendra Singh Chaplot, Christopher MacLellan, Ruslan Salakhutdinov, Kenneth Koedinger
Abstract	A cognitive model of human learning provides information about the skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is two-fold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to correlate highly with estimates obtained from student performance data on an Article Selection dataset.
Tasks
Published	2018-06-21
URL	http://arxiv.org/abs/1806.08065v1
PDF	http://arxiv.org/pdf/1806.08065v1.pdf
PWC	https://paperswithcode.com/paper/learning-cognitive-models-using-neural
Repo
Framework

Learning to Compose over Tree Structures via POS Tags


Title	Learning to Compose over Tree Structures via POS Tags
Authors	Gehui Shen, Zhi-Hong Deng, Ting Huang, Xi Chen
Abstract	Recursive Neural Network (RecNN), a type of model that composes words or phrases recursively over syntactic tree structures, has been shown to produce superior sentence representations for a variety of NLP tasks. However, RecNN has an inherent weakness: a single compositional function shared by every node of the tree cannot capture complex semantic compositionality, which limits the expressive power of the model. To address this problem, we propose Tag-Guided HyperRecNN/TreeLSTM (TG-HRecNN/TreeLSTM), which introduces a hypernetwork into RecNNs that takes the Part-of-Speech (POS) tags of words/phrases as input and dynamically generates the semantic composition parameters. Experimental results on five datasets for two typical NLP tasks show that both proposed models consistently obtain significant improvements over RecNN and TreeLSTM. Our TG-HTreeLSTM outperforms all existing RecNN-based models and matches or is competitive with the state of the art on four sentence classification benchmarks. The effectiveness of our models is also demonstrated by qualitative analysis.
Tasks	Semantic Composition, Sentence Classification
Published	2018-08-18
URL	http://arxiv.org/abs/1808.06075v2
PDF	http://arxiv.org/pdf/1808.06075v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-compose-over-tree-structures-via
Repo
Framework
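A hypernetwork here means a small network whose output is the parameter set of another network. The sketch below assumes, purely for illustration, that a linear hypernetwork `H` maps a POS-tag embedding to the full composition matrix used to merge two child vectors. The actual TG-HRecNN/TreeLSTM parameterization is not given in the abstract and may differ; all sizes, names and tag embeddings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 4, 3  # hypothetical hidden size and POS-tag embedding size

# Hypothetical hypernetwork parameters: a linear map from a tag embedding to a (D, 2D) matrix.
H = rng.normal(scale=0.1, size=(D * 2 * D, T))
bias = np.zeros(D)

def compose(left, right, tag_embedding):
    """Merge two child vectors with weights generated from the parent's POS tag."""
    W = (H @ tag_embedding).reshape(D, 2 * D)  # tag-specific composition matrix
    return np.tanh(W @ np.concatenate([left, right]) + bias)

left, right = rng.normal(size=D), rng.normal(size=D)
tag_np = np.array([1.0, 0.0, 0.0])  # hypothetical embedding for an NP node
tag_vp = np.array([0.0, 1.0, 0.0])  # hypothetical embedding for a VP node
print(compose(left, right, tag_np))
print(compose(left, right, tag_vp))
```

A plain RecNN would apply one shared `W` at every node; in this sketch the same two children are composed differently depending on the tag, which is the limitation the abstract targets.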

Sequence to Sequence Mixture Model for Diverse Machine Translation


Title	Sequence to Sequence Mixture Model for Diverse Machine Translation
Authors	Xuanli He, Gholamreza Haffari, Mohammad Norouzi
Abstract	Sequence to sequence (SEQ2SEQ) models often lack diversity in their generated translations. This can be attributed to the limitation of SEQ2SEQ models in capturing lexical and syntactic variations in a parallel corpus resulting from different styles, genres, topics, or ambiguity of the translation process. In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model. Each mixture component selects its own training dataset via optimization of the marginal log-likelihood, which leads to a soft clustering of the parallel corpus. Experiments on four language pairs demonstrate the superiority of our mixture model over a SEQ2SEQ baseline with standard or diversity-boosted beam search. Our mixture model uses negligible additional parameters and incurs no extra computation cost during decoding.
Tasks	Machine Translation
Published	2018-10-17
URL	http://arxiv.org/abs/1810.07391v1
PDF	http://arxiv.org/pdf/1810.07391v1.pdf
PWC	https://paperswithcode.com/paper/sequence-to-sequence-mixture-model-for
Repo
Framework
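The marginal log-likelihood objective and the soft clustering it induces can be made concrete. Assuming each of K translation models already returns log p(y | x, z = k) for every sentence pair, and assuming a uniform prior over components (both are assumptions of this sketch), the objective is Σ_n log Σ_k p(z = k) p(y_n | x_n, z = k), and the posterior over components gives each pair's soft assignment. The helper names and the toy probabilities are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mixture_objective(component_log_probs, log_prior=None):
    """component_log_probs: (N, K) array of log p(y_n | x_n, z=k).

    Returns the marginal log-likelihood and the (N, K) responsibilities
    p(z=k | x_n, y_n), i.e. the soft clustering of the corpus.
    """
    N, K = component_log_probs.shape
    if log_prior is None:
        log_prior = np.full(K, -np.log(K))  # uniform prior over components (assumption)
    joint = component_log_probs + log_prior
    marginal = logsumexp(joint, axis=1)
    responsibilities = np.exp(joint - marginal[:, None])
    return marginal.sum(), responsibilities

# Two sentence pairs scored by two hypothetical components.
log_p = np.log(np.array([[0.2, 0.6],
                         [0.5, 0.5]]))
total, resp = mixture_objective(log_p)
print(total)  # approximately log(0.4) + log(0.5)
print(resp)   # rows approximately [0.25, 0.75] and [0.5, 0.5]
```

Differentiating this objective weights each component's loss on a pair by that pair's responsibility, which is how components can come to specialize on different parts of the corpus.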

Classification based Grasp Detection using Spatial Transformer Network


Title	Classification based Grasp Detection using Spatial Transformer Network
Authors	Dongwon Park, Se Young Chun
Abstract	Robotic grasp detection is still challenging, particularly for novel objects. With the recent advance of deep learning, there have been several works on detecting robotic grasps using neural networks. Typically, regression-based grasp detection methods have outperformed classification-based methods in computational complexity while achieving excellent accuracy. However, classification-based robotic grasp detection still seems to have merits such as intermediate-step observability and a straightforward backpropagation routine for end-to-end training. In this work, we propose a novel classification-based robotic grasp detection method with multiple-stage spatial transformer networks (STN). Our proposed method achieved state-of-the-art accuracy with real-time computation. Additionally, unlike other regression-based grasp detection methods, our proposed method allows intermediate results, such as grasp location and orientation, to be partially observed for a number of grasp configuration candidates.
Tasks
Published	2018-03-04
URL	http://arxiv.org/abs/1803.01356v1
PDF	http://arxiv.org/pdf/1803.01356v1.pdf
PWC	https://paperswithcode.com/paper/classification-based-grasp-detection-using
Repo
Framework
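The core of a spatial transformer is differentiable resampling of the input under a predicted transformation. The sketch below assumes a single-channel image, a 2x3 affine matrix `theta` in normalized [-1, 1] coordinates and bilinear interpolation, i.e. the generic STN sampler. The paper's multi-stage network and grasp parameterization are not reproduced, and in a real STN `theta` would come from a learned localization network rather than being written by hand.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map normalized output coordinates in [-1, 1] through a 2x3 affine matrix."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h), np.linspace(-1, 1, out_w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, H*W)
    src = theta @ coords  # (2, H*W) source (x, y) positions, normalized
    return src[0].reshape(out_h, out_w), src[1].reshape(out_h, out_w)

def bilinear_sample(img, x_norm, y_norm):
    """Bilinearly sample a 2-D image at normalized coordinates (zeros outside the image)."""
    H, W = img.shape
    x = (x_norm + 1) * (W - 1) / 2  # to pixel coordinates (corner-aligned convention)
    y = (y_norm + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1

    def pixel(yy, xx):
        inside = (xx >= 0) & (xx < W) & (yy >= 0) & (yy < H)
        out = np.zeros_like(x, dtype=float)
        out[inside] = img[yy[inside], xx[inside]]
        return out

    # Interpolation weights are linear in x and y, so gradients reach theta.
    return ((x1 - x) * (y1 - y) * pixel(y0, x0) + (x1 - x) * (y - y0) * pixel(y1, x0)
            + (x - x0) * (y1 - y) * pixel(y0, x1) + (x - x0) * (y - y0) * pixel(y1, x1))

img = np.arange(20, dtype=float).reshape(4, 5)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
gx, gy = affine_grid(identity, 4, 5)
same = bilinear_sample(img, gx, gy)  # identity transform reproduces the image

# Zoom 2x into a window around a hypothetical grasp candidate, rotated by 30 degrees.
a = np.deg2rad(30)
theta = 0.5 * np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0]])
theta[:, 2] = [0.2, -0.1]  # window center in normalized coordinates
gx, gy = affine_grid(theta, 3, 3)
patch = bilinear_sample(img, gx, gy)
```

Because the sampled output is piecewise linear in the sampling coordinates, gradients can flow back into `theta`, which is what lets such a module be trained end to end.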

Foundations of Sequence-to-Sequence Modeling for Time Series


Title	Foundations of Sequence-to-Sequence Modeling for Time Series
Authors	Vitaly Kuznetsov, Zelda Mariet
Abstract	The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.
Tasks	Time Series, Time Series Forecasting
Published	2018-05-09
URL	http://arxiv.org/abs/1805.03714v2
PDF	http://arxiv.org/pdf/1805.03714v2.pdf
PWC	https://paperswithcode.com/paper/foundations-of-sequence-to-sequence-modeling
Repo
Framework
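To make the forecasting setup concrete: a sequence-to-sequence forecaster is typically trained on pairs consisting of an input window and the window that immediately follows it. The hypothetical helper below sketches that data preparation for a univariate series; the window lengths and stride are free parameters, not values from the paper.

```python
import numpy as np

def make_seq2seq_windows(series, input_len, output_len, stride=1):
    """Slice a 1-D series into (encoder input, decoder target) pairs for multi-step forecasting."""
    series = np.asarray(series)
    X, Y = [], []
    last_start = len(series) - input_len - output_len
    for start in range(0, last_start + 1, stride):
        X.append(series[start:start + input_len])
        Y.append(series[start + input_len:start + input_len + output_len])
    return np.array(X), np.array(Y)

series = np.sin(np.linspace(0, 6 * np.pi, 50))
X, Y = make_seq2seq_windows(series, input_len=12, output_len=4)
print(X.shape, Y.shape)  # (35, 12) (35, 4)
```

Because consecutive windows overlap and come from a single series, the training pairs are not independent, so generalization arguments for such models have to account for temporal dependence rather than assume i.i.d. samples.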

High-Quality Face Capture Using Anatomical Muscles


Title	High-Quality Face Capture Using Anatomical Muscles
Authors	Michael Bao, Matthew Cong, Stéphane Grabli, Ronald Fedkiw
Abstract	Muscle-based systems have the potential to provide both anatomical accuracy and semantic interpretability as compared to blendshape models; however, a lack of expressivity and differentiability has limited their impact. Thus, we propose modifying a recently developed, rather expressive muscle-based system in order to make it fully differentiable; in fact, our proposed modifications allow this physically robust and anatomically accurate muscle model to conveniently be driven by an underlying blendshape basis. Our formulation is intuitive, natural, as well as monolithically and fully coupled such that one can differentiate the model from end to end, which makes it viable for both optimization and learning-based approaches for a variety of applications. We illustrate this with a number of examples, including both shape matching of three-dimensional geometry as well as the automatic determination of a three-dimensional facial pose from a single two-dimensional RGB image without using markers or depth information.
Tasks
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02836v1
PDF	http://arxiv.org/pdf/1812.02836v1.pdf
PWC	https://paperswithcode.com/paper/high-quality-face-capture-using-anatomical
Repo
Framework

Joint Multitask Learning for Community Question Answering Using Task-Specific Embeddings


Title	Joint Multitask Learning for Community Question Answering Using Task-Specific Embeddings
Authors	Shafiq Joty, Lluis Marquez, Preslav Nakov
Abstract	We jointly address two important tasks for Question Answering in community forums: given a new question, (i) find related existing questions, and (ii) find relevant answers to this new question. We further use an auxiliary task to complement the previous two, i.e., (iii) find good answers with respect to the thread question in a question-comment thread. We use deep neural networks (DNNs) to learn meaningful task-specific embeddings, which we then incorporate into a conditional random field (CRF) model for the multitask setting, performing joint learning over a complex graph structure. While DNNs alone achieve competitive results when trained to produce the embeddings, the CRF, which makes use of the embeddings and the dependencies between the tasks, improves the results significantly and consistently across a variety of evaluation metrics, thus showing the complementarity of DNNs and structured learning.
Tasks	Community Question Answering, Question Answering
Published	2018-09-24
URL	http://arxiv.org/abs/1809.08928v1
PDF	http://arxiv.org/pdf/1809.08928v1.pdf
PWC	https://paperswithcode.com/paper/joint-multitask-learning-for-community
Repo
Framework

Neural Networks Models for Analyzing Magic: the Gathering Cards


Title	Neural Networks Models for Analyzing Magic: the Gathering Cards
Authors	Felipe Zilio, Marcelo Prates, Luis Lamb
Abstract	Historically, games of all kinds have often been the subject of study in scientific works of Computer Science, including the field of machine learning. By using machine learning techniques and applying them to a game with defined rules or a structured dataset, it is possible to learn and improve on the already existing techniques and methods to tackle new challenges and solve problems that are out of the ordinary. Existing work on card games tends to focus on gameplay and card mechanics. This work aims to apply neural network models, including Convolutional Neural Networks and Recurrent Neural Networks, to analyze Magic: the Gathering cards in terms of both card text and illustrations; the card images and texts are used to train the networks to classify them into multiple categories. The ultimate goal was to develop a methodology that could generate card text matching an input image, which was attained by relating the prediction values of the images and the generated text across the different categories.
Tasks	Card Games, Text Matching
Published	2018-10-08
URL	http://arxiv.org/abs/1810.03744v1
PDF	http://arxiv.org/pdf/1810.03744v1.pdf
PWC	https://paperswithcode.com/paper/neural-networks-models-for-analyzing-magic
Repo
Framework