January 31, 2020

Paper Group ANR 180

Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog

Title Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog
Authors Shachi H Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Abstract We are witnessing a confluence of vision, speech and dialog system technologies that are enabling intelligent virtual assistants (IVAs) to learn audio-visual groundings of utterances and have conversations with users about the objects, activities and events surrounding them. Recent progress in visual grounding techniques and audio understanding is enabling machines to understand shared semantic concepts and listen to the various sensory events in the environment. With audio and visual grounding methods, end-to-end multimodal spoken dialog systems (SDS) are trained to meaningfully communicate with us in natural language about the real dynamic audio-visual sensory world around us. In this work, we explore the role of 'topics' as the context of the conversation, along with multimodal attention, in such an end-to-end audio-visual scene-aware dialog system architecture. We also incorporate an end-to-end audio classification ConvNet, AclNet, into our models. We develop and test our approaches on the Audio Visual Scene-Aware Dialog (AVSD) dataset released as a part of DSTC7. We present the analysis of our experiments and show that some of our model variations outperform the baseline system released for AVSD.
Tasks Audio Classification
Published 2019-12-20
URL https://arxiv.org/abs/1912.10132v1
PDF https://arxiv.org/pdf/1912.10132v1.pdf
PWC https://paperswithcode.com/paper/exploring-context-attention-and-audio
Repo
Framework

Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

Title Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog
Authors Shachi H Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Abstract With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVA) such as Alexa, Google Home, etc., have become a ubiquitous part of many homes. Currently, such IVAs are mostly audio-based, but going forward, we are witnessing a confluence of vision, speech and dialog system technologies that are enabling the IVAs to learn audio-visual groundings of utterances. This will enable agents to have conversations with users about the objects, activities and events surrounding them. In this work, we present three main architectural explorations for the Audio Visual Scene-Aware Dialog (AVSD): 1) investigating 'topics' of the dialog as an important contextual feature for the conversation, 2) exploring several multimodal attention mechanisms during response generation, 3) incorporating an end-to-end audio classification ConvNet, AclNet, into our architecture. We discuss detailed analysis of the experimental results and show that our model variations outperform the baseline system presented for the AVSD task.
Tasks Audio Classification
Published 2019-12-20
URL https://arxiv.org/abs/1912.10131v1
PDF https://arxiv.org/pdf/1912.10131v1.pdf
PWC https://paperswithcode.com/paper/leveraging-topics-and-audio-features-with
Repo
Framework

Studying Software Engineering Patterns for Designing Machine Learning Systems

Title Studying Software Engineering Patterns for Designing Machine Learning Systems
Authors Hironori Washizaki, Hiromu Uchida, Foutse Khomh, Yann-Gael Gueheneuc
Abstract Machine-learning (ML) techniques have become popular in recent years. ML techniques rely on mathematics and on software engineering. Researchers and practitioners are studying best practices for designing ML application systems and software to address the software complexity and quality of ML techniques. Such design practices are often formalized as architecture patterns and design patterns by encapsulating reusable solutions to commonly occurring problems within given contexts. However, to the best of our knowledge, there has been no work collecting, classifying, and discussing these software-engineering (SE) design patterns for ML techniques systematically. Thus, we set out to collect good/bad SE design patterns for ML techniques to provide developers with a comprehensive and ordered classification of such patterns. We report here preliminary results of a systematic literature review (SLR) of good/bad design patterns for ML.
Tasks
Published 2019-10-10
URL https://arxiv.org/abs/1910.04736v2
PDF https://arxiv.org/pdf/1910.04736v2.pdf
PWC https://paperswithcode.com/paper/studying-software-engineering-patterns-for
Repo
Framework

Co-Evolutionary Compression for Unpaired Image Translation

Title Co-Evolutionary Compression for Unpaired Image Translation
Authors Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu
Abstract Generative adversarial networks (GANs) have been successfully used for many computer vision tasks, especially image-to-image translation. However, generators in these networks have complicated architectures with a large number of parameters and huge computational complexity. Existing methods are mainly designed for compressing and speeding up deep neural networks in the classification task, and cannot be directly applied to GANs for image translation, due to their different objectives and training procedures. To this end, we develop a novel co-evolutionary approach for reducing their memory usage and FLOPs simultaneously. In practice, generators for two image domains are encoded as two populations and synergistically optimized to iteratively identify the most important convolution filters. Fitness of each individual is calculated using the number of parameters, a discriminator-aware regularization, and the cycle consistency. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method for obtaining compact and effective generators.
Tasks Image-to-Image Translation
Published 2019-07-25
URL https://arxiv.org/abs/1907.10804v1
PDF https://arxiv.org/pdf/1907.10804v1.pdf
PWC https://paperswithcode.com/paper/co-evolutionary-compression-for-unpaired
Repo
Framework

Order Optimal One-Shot Distributed Learning

Title Order Optimal One-Shot Distributed Learning
Authors Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani
Abstract We consider distributed statistical optimization in the one-shot setting, where there are $m$ machines each observing $n$ i.i.d. samples. Based on its observed samples, each machine then sends an $O(\log(mn))$-length message to a server, at which a parameter minimizing an expected loss is to be estimated. We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}\big(m^{-1/\max(d,2)} n^{-1/2}\big)$, where $d$ is the dimension of the parameter space. This error bound meets existing lower bounds up to poly-logarithmic factors, and is thereby order optimal. The expected error of MRE, unlike existing algorithms, tends to zero as the number of machines ($m$) goes to infinity, even when the number of samples per machine ($n$) remains upper bounded by a constant. This property of the MRE algorithm makes it applicable in new machine learning paradigms where $m$ is much larger than $n$.
Tasks
Published 2019-11-02
URL https://arxiv.org/abs/1911.00731v1
PDF https://arxiv.org/pdf/1911.00731v1.pdf
PWC https://paperswithcode.com/paper/order-optimal-one-shot-distributed-learning
Repo
Framework
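
To make the stated rate concrete, the display below simply instantiates the abstract's bound for two parameter dimensions. The choices $d = 2$ and $d = 4$ are illustrative assumptions, and poly-logarithmic factors stay hidden inside the $\tilde{O}$ notation.

```latex
\begin{align*}
d = 2:&\quad \tilde{O}\big(m^{-1/\max(2,2)}\, n^{-1/2}\big) = \tilde{O}\big(m^{-1/2}\, n^{-1/2}\big) \\
d = 4:&\quad \tilde{O}\big(m^{-1/\max(4,2)}\, n^{-1/2}\big) = \tilde{O}\big(m^{-1/4}\, n^{-1/2}\big)
\end{align*}
```

In both cases the bound shrinks to zero as $m$ grows with $n$ held constant, which is the property the abstract highlights; the decay in $m$ is slower in higher dimensions.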

Green Deep Reinforcement Learning for Radio Resource Management: Architecture, Algorithm Compression and Challenge

Title Green Deep Reinforcement Learning for Radio Resource Management: Architecture, Algorithm Compression and Challenge
Authors Zhiyong Du, Yansha Deng, Weisi Guo, Arumugam Nallanathan, Qihui Wu
Abstract AI heralds a step-change in the performance and capability of wireless networks and other critical infrastructures. However, it may also cause irreversible environmental damage due to its high energy consumption. Here, we address this challenge in the context of 5G and beyond, where there is a complexity explosion in radio resource management (RRM). On the one hand, deep reinforcement learning (DRL) provides a powerful tool for scalable optimization for high dimensional RRM problems in a dynamic environment. On the other hand, DRL algorithms consume a high amount of energy over time and risk compromising progress made in green radio research. This paper reviews and analyzes how to achieve green DRL for RRM via both architecture and algorithm innovations. Architecturally, a cloud based training and distributed decision-making DRL scheme is proposed, where RRM entities can make lightweight deep local decisions whilst assisted by on-cloud training and updating. On the algorithm level, compression approaches are introduced for both deep neural networks and the underlying Markov Decision Processes, enabling accurate low-dimensional representations of challenges. To scale learning across geographic areas, a spatial transfer learning scheme is proposed to further promote the learning efficiency of distributed DRL entities by exploiting the traffic demand correlations. Together, our proposed architecture and algorithms provide a vision for green and on-demand DRL capability.
Tasks Decision Making, Transfer Learning
Published 2019-10-11
URL https://arxiv.org/abs/1910.05054v1
PDF https://arxiv.org/pdf/1910.05054v1.pdf
PWC https://paperswithcode.com/paper/green-deep-reinforcement-learning-for-radio
Repo
Framework

Graph Convolutions over Constituent Trees for Syntax-Aware Semantic Role Labeling

Title Graph Convolutions over Constituent Trees for Syntax-Aware Semantic Role Labeling
Authors Diego Marcheggiani, Ivan Titov
Abstract Semantic role labeling (SRL) is the task of identifying predicates and labeling argument spans with semantic roles. Even though most semantic-role formalisms are built upon constituent syntax and only syntactic constituents can be labeled as arguments (e.g., FrameNet and PropBank), all the recent work on syntax-aware SRL relies on dependency representations of syntax. In contrast, we show how graph convolutional networks (GCNs) can be used to encode constituent structures and inform an SRL system. Nodes in our SpanGCN correspond to constituents. The computation is done in three stages. First, initial node representations are produced by 'composing' word representations of the first and the last word in the constituent. Second, graph convolutions relying on the constituent tree are performed, yielding syntactically-informed constituent representations. Finally, the constituent representations are 'decomposed' back into word representations which in turn are used as input to the SRL classifier. We show the effectiveness of our syntax-aware model on standard CoNLL-2005, CoNLL-2012, and FrameNet benchmarks.
Tasks Semantic Role Labeling
Published 2019-09-21
URL https://arxiv.org/abs/1909.09814v1
PDF https://arxiv.org/pdf/1909.09814v1.pdf
PWC https://paperswithcode.com/paper/190909814
Repo
Framework

From Textual Information Sources to Linked Data in the Agatha Project

Title From Textual Information Sources to Linked Data in the Agatha Project
Authors Paulo Quaresma, Vitor Beires Nogueira, Kashyap Raiyani, Roy Bayot, Teresa Gonçalves
Abstract Automatic reasoning about textual information is a challenging task in modern Natural Language Processing (NLP) systems. In this work we describe our proposal for representing and reasoning about Portuguese documents by means of Linked Data like ontologies and thesauri. Our approach resorts to a specialized pipeline of natural language processing (part-of-speech tagger, named entity recognition, semantic role labeling) to populate an ontology for the domain of criminal investigations. The provided architecture and ontology are language independent. Although some of the NLP modules are language dependent, they can be built using adequate AI methodologies.
Tasks Named Entity Recognition, Semantic Role Labeling
Published 2019-09-03
URL https://arxiv.org/abs/1909.05359v1
PDF https://arxiv.org/pdf/1909.05359v1.pdf
PWC https://paperswithcode.com/paper/from-textual-information-sources-to-linked
Repo
Framework

Not Only Look But Observe: Variational Observation Model of Scene-Level 3D Multi-Object Understanding for Probabilistic SLAM

Title Not Only Look But Observe: Variational Observation Model of Scene-Level 3D Multi-Object Understanding for Probabilistic SLAM
Authors Hyeonwoo Yu
Abstract We present NOLBO, a variational observation model estimation for 3D multi-object understanding from a single 2D shot. Previous probabilistic instance-level understandings mainly consider single-object images, not a single shot containing multiple objects; relations between objects and the entire scene are out of their focus. The objectness of each observation is also rarely incorporated into their models. Therefore, we propose a method to approximate the Bayesian observation model of scene-level 3D multi-object understanding. By exploiting a variational auto-encoder (VAE), we estimate latent variables from the entire scene, which follow tractable distributions and concurrently imply 3D full shape and pose. To perform object-oriented data association and probabilistic simultaneous localization and mapping (SLAM), our observation models can easily be adopted for probabilistic inference by replacing object-oriented features with latent variables.
Tasks Simultaneous Localization and Mapping
Published 2019-07-23
URL https://arxiv.org/abs/1907.09760v3
PDF https://arxiv.org/pdf/1907.09760v3.pdf
PWC https://paperswithcode.com/paper/not-only-look-but-observe-variational
Repo
Framework

Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm

Title Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient Algorithm
Authors Pin Wang, Hanhan Li, Ching-Yao Chan
Abstract Lane change is a challenging task which requires delicate actions to ensure safety and comfort. Some recent studies have attempted to solve the lane-change control problem with Reinforcement Learning (RL), yet the action is confined to a discrete action space. To overcome this limitation, we formulate the lane change behavior with continuous action in a model-free dynamic driving environment based on Deep Deterministic Policy Gradient (DDPG). The reward function, which is critical for learning the optimal policy, is defined by control values, position deviation status, and maneuvering time to provide the RL agent with informative signals. The RL agent is trained from scratch without resorting to any prior knowledge of the environment and vehicle dynamics since they are not easy to obtain. Seven models under different hyperparameter settings are compared. A video showing the learning progress of the driving behavior is available. It shows that the RL vehicle agent initially runs off the road frequently, but eventually manages to change to the target lane smoothly and stably with a success rate of 100% under diverse driving situations in simulation.
Tasks Continuous Control
Published 2019-06-05
URL https://arxiv.org/abs/1906.02275v1
PDF https://arxiv.org/pdf/1906.02275v1.pdf
PWC https://paperswithcode.com/paper/continuous-control-for-automated-lane-change
Repo
Framework

Situation-Aware Pedestrian Trajectory Prediction with Spatio-Temporal Attention Model

Title Situation-Aware Pedestrian Trajectory Prediction with Spatio-Temporal Attention Model
Authors Sirin Haddad, Meiqing Wu, He Wei, Siew Kei Lam
Abstract Pedestrian trajectory prediction is essential for collision avoidance in autonomous driving and robot navigation. However, predicting a pedestrian's trajectory in crowded environments is non-trivial as it is influenced by other pedestrians' motion and static structures that are present in the scene. Such human-human and human-space interactions lead to non-linearities in the trajectories. In this paper, we present a new spatio-temporal graph based Long Short-Term Memory (LSTM) network for predicting pedestrian trajectories in crowded environments, which takes into account the interaction with static (physical objects) and dynamic (other pedestrians) elements in the scene. Results on two widely used datasets demonstrate that the proposed method outperforms the state-of-the-art approaches in human trajectory prediction. In particular, our method leads to a reduction in Average Displacement Error (ADE) and Final Displacement Error (FDE) of up to 55% and 61% respectively over state-of-the-art approaches.
Tasks Autonomous Driving, Robot Navigation, Trajectory Prediction
Published 2019-02-13
URL http://arxiv.org/abs/1902.05437v1
PDF http://arxiv.org/pdf/1902.05437v1.pdf
PWC https://paperswithcode.com/paper/situation-aware-pedestrian-trajectory
Repo
Framework
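
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions (ADE as the mean Euclidean distance over all predicted time steps, FDE as the distance at the last step) and a single trajectory of 2D points; the function name `displacement_errors` and the sample coordinates are hypothetical and not taken from the paper.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    Assumed definitions: ADE is the mean per-step Euclidean distance,
    FDE is the Euclidean distance at the final time step.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between prediction and ground truth.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)
    fde = distances[-1]
    return ade, fde


# Hypothetical three-step trajectory, for illustration only.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(f"ADE={ade:.2f} FDE={fde:.2f}")
```

Benchmarks typically average these values over many pedestrians and scenes; that aggregation is left out here to keep the sketch short.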

Global optimization of parameters in the reactive force field ReaxFF for SiOH

Title Global optimization of parameters in the reactive force field ReaxFF for SiOH
Authors H. R. Larsson, A. C. T. van Duin, B. Hartke
Abstract We have used unbiased global optimization to fit a reactive force field to a given set of reference data. Specifically, we have employed genetic algorithms (GA) to fit ReaxFF to SiOH data, using an in-house GA code that is parallelized across reference data items via the message-passing interface (MPI). Details of GA tuning turn out to be far less important for global optimization efficiency than using suitable ranges within which the parameters are varied. To establish these ranges, either prior knowledge can be used or successive stages of GA optimizations, each building upon the best parameter vectors and ranges found in the previous stage. We finally arrive at optimized force fields with smaller error measures than those published previously. Hence, this optimization approach will contribute to converting force-field fitting from a specialist task to an everyday commodity, even for the more difficult case of reactive force fields.
Tasks
Published 2019-09-15
URL https://arxiv.org/abs/1909.06876v1
PDF https://arxiv.org/pdf/1909.06876v1.pdf
PWC https://paperswithcode.com/paper/global-optimization-of-parameters-in-the
Repo
Framework

Cooperative Generator-Discriminator Networks for Abstractive Summarization with Narrative Flow

Title Cooperative Generator-Discriminator Networks for Abstractive Summarization with Narrative Flow
Authors Saadia Gabriel, Antoine Bosselut, Ari Holtzman, Kyle Lo, Asli Celikyilmaz, Yejin Choi
Abstract We introduce Cooperative Generator-Discriminator Networks (Co-opNet), a general framework for abstractive summarization with distinct modeling of the narrative flow in the output summary. Most current approaches to abstractive summarization, in contrast, are based on datasets whose target summaries are either a single sentence, or a bag of standalone sentences (e.g., extracted highlights of a story), neither of which allows for learning coherent narrative flow in the output summaries. To promote research toward abstractive summarization with narrative flow, we first introduce a new dataset, Scientific Abstract SummarieS (SASS), where the abstracts are used as proxy gold summaries for scientific articles. We then propose Co-opNet, a novel transformer-based framework where the generator works with the discourse discriminator to compose a long-form summary. Empirical results demonstrate that Co-opNet learns to summarize with considerably improved global coherence compared to competitive baselines.
Tasks Abstractive Text Summarization
Published 2019-07-02
URL https://arxiv.org/abs/1907.01272v1
PDF https://arxiv.org/pdf/1907.01272v1.pdf
PWC https://paperswithcode.com/paper/cooperative-generator-discriminator-networks
Repo
Framework

A new simple and effective measure for bag-of-word inter-document similarity measurement

Title A new simple and effective measure for bag-of-word inter-document similarity measurement
Authors Sunil Aryal, Kai Ming Ting, Takashi Washio, Gholamreza Haffari
Abstract To measure the similarity of two documents in the bag-of-words (BoW) vector representation, different term weighting schemes are used to improve the performance of cosine similarity, the most widely used inter-document similarity measure in text mining. In this paper, we identify the shortcomings of the underlying assumptions of term weighting in the inter-document similarity measurement task, and provide a more fit-to-the-purpose alternative. Based on this new assumption, we introduce a new simple but effective similarity measure which does not require explicit term weighting. The proposed measure employs a more nuanced probabilistic approach than those used in term weighting to measure the similarity of two documents w.r.t. each term occurring in the two documents. Our empirical comparison with the existing similarity measures using different term weighting schemes shows that the new measure produces (i) better results in the binary BoW representation; and (ii) competitive and more consistent results in the term-frequency-based BoW representation.
Tasks
Published 2019-02-09
URL http://arxiv.org/abs/1902.03402v1
PDF http://arxiv.org/pdf/1902.03402v1.pdf
PWC https://paperswithcode.com/paper/a-new-simple-and-effective-measure-for-bag-of
Repo
Framework
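
For orientation, the baseline that the abstract compares against, cosine similarity over BoW vectors, can be sketched as follows. This is not the paper's proposed measure, which the abstract does not specify in detail. The whitespace tokenizer, the helper names `bow_vector` and `cosine_similarity`, and the sample strings are assumptions made for illustration.

```python
import math
from collections import Counter


def bow_vector(text, binary=False):
    """Build a sparse BoW vector (term -> weight) using whitespace tokens."""
    counts = Counter(text.lower().split())
    if binary:
        # Binary BoW: record only whether a term is present.
        return {term: 1 for term in counts}
    # Term-frequency BoW: record how often each term occurs.
    return dict(counts)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, for illustration only.
doc_a = "the cat sat"
doc_b = "the cat ran"
print(cosine_similarity(bow_vector(doc_a), bow_vector(doc_b)))
```

The `binary` flag switches between the two representations the abstract mentions, so the same similarity function can be compared across binary and term-frequency vectors.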

An Application of Multiple-Instance Learning to Estimate Generalization Risk

Title An Application of Multiple-Instance Learning to Estimate Generalization Risk
Authors Daiki Suehiro
Abstract We focus on several learning approaches that employ the max operator to evaluate the margin. For example, such approaches are commonly used in multi-class learning tasks and top-rank learning tasks. In general, in order to estimate the theoretical generalization risk, we need to individually evaluate the complexity of each hypothesis class used in the learning approaches. In this paper, we provide a technique to estimate a theoretical generalization risk for such learning approaches in the same fashion. The key idea is to "redundantly" reformulate the learning problem as one-class multiple-instance learning by redefining the specific input space based on the original input space. Surprisingly, we succeed in improving the generalization risk bounds for some multi-class learning and top-rank learning algorithms.
Tasks Multiple Instance Learning
Published 2019-11-14
URL https://arxiv.org/abs/1911.05999v1
PDF https://arxiv.org/pdf/1911.05999v1.pdf
PWC https://paperswithcode.com/paper/an-application-of-multiple-instance-learning
Repo
Framework