January 29, 2020

3293 words 16 mins read

Paper Group ANR 742

Multiple Human Association between Top and Horizontal Views by Matching Subjects’ Spatial Distributions. Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning. Brief Review of Computational Intelligence Algorithms. Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments. Semi-super …

Multiple Human Association between Top and Horizontal Views by Matching Subjects’ Spatial Distributions

Title Multiple Human Association between Top and Horizontal Views by Matching Subjects’ Spatial Distributions
Authors Ruize Han, Yujun Zhang, Wei Feng, Chenxing Gong, Xiaoyu Zhang, Jiewen Zhao, Liang Wan, Song Wang
Abstract Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views. This is a very challenging problem due to the large human-appearance difference between top and horizontal views. In this paper, we present a new approach to address this problem by exploring and matching the subjects’ spatial distributions between the two views. More specifically, on the top-view image, we model and match subjects’ relative positions to the horizontal-view camera in both views and define a matching cost to decide the actual location of the horizontal-view camera and its view angle in the top-view image. We collect a new dataset consisting of top-view and horizontal-view image pairs for performance evaluation, and the experimental results show the effectiveness of the proposed method.
Tasks Activity Recognition, Human Activity Recognition, Person Identification
Published 2019-07-26
URL https://arxiv.org/abs/1907.11458v1
PDF https://arxiv.org/pdf/1907.11458v1.pdf
PWC https://paperswithcode.com/paper/multiple-human-association-between-top-and
Repo
Framework
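The association step described in this abstract can be pictured as a search over candidate camera poses in the top-view image. The Python sketch below is a minimal, hypothetical illustration only: the function names and the simple sorted-angle cost are assumptions, not the paper's actual matching cost over subjects' spatial distributions.

```python
import numpy as np

def predicted_bearings(cam_xy, cam_angle, subjects_xy):
    # Bearing of each top-view subject relative to a candidate camera pose.
    d = subjects_xy - cam_xy
    return (np.arctan2(d[:, 1], d[:, 0]) - cam_angle + np.pi) % (2 * np.pi) - np.pi

def matching_cost(observed, predicted):
    # Toy cost: compare sorted angular distributions (a stand-in for the paper's cost).
    return np.sum((np.sort(observed) - np.sort(predicted)) ** 2)

def locate_camera(subjects_xy, observed_bearings, candidate_positions, candidate_angles):
    # Brute-force search over candidate horizontal-view camera locations and view angles.
    best_pose, best_cost = None, np.inf
    for cam_xy in candidate_positions:
        for ang in candidate_angles:
            cost = matching_cost(observed_bearings,
                                 predicted_bearings(cam_xy, ang, subjects_xy))
            if cost < best_cost:
                best_pose, best_cost = (cam_xy, ang), cost
    return best_pose, best_cost
```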

Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning

Title Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning
Authors Gabriel Marzinotto, Geraldine Damnati, Frédéric Béchet
Abstract This paper presents a new semantic frame parsing model, based on Berkeley FrameNet, adapted to process spoken documents in order to perform information extraction from broadcast contents. Building upon previous work that had shown the effectiveness of adversarial learning for domain generalization in the context of semantic parsing of encyclopedic written documents, we propose to extend this approach to elocutionary style generalization. The underlying question throughout this study is whether adversarial learning can be used to combine data from different sources and train models on a higher level of abstraction in order to increase their robustness to lexical and stylistic variations as well as automatic speech recognition errors. The proposed strategy is evaluated on a French corpus of encyclopedic written documents and a smaller corpus of radio podcast transcriptions, both annotated with a FrameNet paradigm. We show that adversarial learning increases all models’ generalization capabilities on both manual and automatic speech transcriptions, as well as on encyclopedic data.
Tasks Domain Generalization, Semantic Parsing, Speech Recognition, Spoken Language Understanding, Style Generalization
Published 2019-10-07
URL https://arxiv.org/abs/1910.02734v1
PDF https://arxiv.org/pdf/1910.02734v1.pdf
PWC https://paperswithcode.com/paper/adapting-a-framenet-semantic-parser-for
Repo
Framework
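Adversarial domain or style generalization of the kind described above is commonly implemented with a domain classifier attached through a gradient reversal layer. The following PyTorch sketch is written under that assumption; the encoder, head sizes, and the lambda weight are illustrative and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; scaled, sign-flipped gradient on the backward pass.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class AdversarialFrameParser(nn.Module):
    def __init__(self, emb_dim=300, hidden=256, n_frames=100, n_domains=2, lam=0.1):
        super().__init__()
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(2 * hidden, n_frames)    # main frame-parsing task
        self.domain_head = nn.Linear(2 * hidden, n_domains)  # adversarial domain/style classifier
        self.lam = lam

    def forward(self, embedded_tokens):
        h, _ = self.encoder(embedded_tokens)                  # (batch, seq, 2*hidden)
        frame_logits = self.frame_head(h)                     # per-token frame predictions
        pooled = GradReverse.apply(h.mean(dim=1), self.lam)   # reversed gradient for the adversary
        return frame_logits, self.domain_head(pooled)
```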

Brief Review of Computational Intelligence Algorithms

Title Brief Review of Computational Intelligence Algorithms
Authors Satyarth Vaidya, Arshveer Kaur, Lavika Goel
Abstract Computational Intelligence algorithms have gained considerable attention from researchers in recent years due to their ability to deliver near-optimal solutions.
Tasks
Published 2019-01-04
URL http://arxiv.org/abs/1901.00983v3
PDF http://arxiv.org/pdf/1901.00983v3.pdf
PWC https://paperswithcode.com/paper/brief-review-of-computational-intelligence
Repo
Framework

Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments

Title Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments
Authors Alex Warstadt, Samuel R. Bowman
Abstract Recent work on evaluating grammatical knowledge in pretrained sentence encoders gives a fine-grained view of a small number of phenomena. We introduce a new analysis dataset that also has broad coverage of linguistic phenomena. We annotate the development set of the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018) for the presence of 13 classes of syntactic phenomena including various forms of argument alternations, movement, and modification. We use this analysis set to investigate the grammatical knowledge of three pretrained encoders: BERT (Devlin et al., 2018), GPT (Radford et al., 2018), and the BiLSTM baseline from Warstadt et al. We find that these models have a strong command of complex or non-canonical argument structures like ditransitives (Sue gave Dan a book) and passives (The book was read). Sentences with long distance dependencies like questions (What do you think I ate?) challenge all models, but for these, BERT and GPT have a distinct advantage over the baseline. We conclude that recent sentence encoders, despite showing near-human performance on acceptability classification overall, still fail to make fine-grained grammaticality distinctions for many complex syntactic structures.
Tasks Linguistic Acceptability
Published 2019-01-11
URL https://arxiv.org/abs/1901.03438v3
PDF https://arxiv.org/pdf/1901.03438v3.pdf
PWC https://paperswithcode.com/paper/grammatical-analysis-of-pretrained-sentence
Repo
Framework
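Since CoLA is scored with the Matthews correlation coefficient, a natural way to use a phenomenon-annotated analysis set is to compute MCC separately on the subset of sentences exhibiting each annotated phenomenon. The sketch below illustrates that evaluation loop; the data layout is a hypothetical assumption.

```python
from sklearn.metrics import matthews_corrcoef

def per_phenomenon_mcc(labels, predictions, phenomenon_masks):
    # labels, predictions: acceptability judgments (0/1) per sentence.
    # phenomenon_masks: dict mapping phenomenon name -> boolean list over sentences.
    scores = {}
    for name, mask in phenomenon_masks.items():
        y = [l for l, m in zip(labels, mask) if m]
        p = [q for q, m in zip(predictions, mask) if m]
        # MCC is undefined when only one class is present in the slice.
        scores[name] = matthews_corrcoef(y, p) if len(set(y)) > 1 else float("nan")
    return scores
```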

Semi-supervised Multi-domain Multi-task Training for Metastatic Colon Lymph Node Diagnosis From Abdominal CT

Title Semi-supervised Multi-domain Multi-task Training for Metastatic Colon Lymph Node Diagnosis From Abdominal CT
Authors Saskia Glaser, Gabriel Maicas, Sergei Bedrikovetski, Tarik Sammour, Gustavo Carneiro
Abstract The diagnosis of the presence of metastatic lymph nodes from abdominal computed tomography (CT) scans is an essential task performed by radiologists to guide radiation and chemotherapy treatment. State-of-the-art deep learning classifiers trained for this task usually rely on a training set containing CT volumes and their respective image-level (i.e., global) annotation. However, the lack of annotations for the localisation of the regions of interest (ROIs) containing lymph nodes can limit classification accuracy due to the small size of the relevant ROIs in this problem. The use of lymph node ROIs together with global annotations in a multi-task training process has the potential to improve classification accuracy, but the high cost involved in obtaining the ROI annotation for the same samples that have global annotations is a roadblock for this alternative. We address this limitation by introducing a new training strategy from two data sets: one containing the global annotations, and another (publicly available) containing only the lymph node ROI localisation. We term our new strategy semi-supervised multi-domain multi-task training, where the goal is to improve the diagnosis accuracy on the globally annotated data set by incorporating the ROI annotations from a different domain. Using a private data set containing global annotations and a public data set containing lymph node ROI localisation, we show that our proposed training mechanism improves the area under the ROC curve for the classification task compared to several training method baselines.
Tasks Computed Tomography (CT)
Published 2019-10-23
URL https://arxiv.org/abs/1910.10371v1
PDF https://arxiv.org/pdf/1910.10371v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-multi-domain-multi-task
Repo
Framework

Different Approaches for Human Activity Recognition: A Survey

Title Different Approaches for Human Activity Recognition: A Survey
Authors Zawar Hussain, Michael Sheng, Wei Emma Zhang
Abstract Human activity recognition has gained importance in recent years due to its applications in various fields such as health, security and surveillance, entertainment, and intelligent environments. A significant amount of work has been done on human activity recognition and researchers have leveraged different approaches, such as wearable, object-tagged, and device-free, to recognize human activities. In this article, we present a comprehensive survey of the work conducted over the period 2010-2018 in various areas of human activity recognition, with a main focus on device-free solutions. The device-free approach is becoming very popular because the subject is not required to carry anything; instead, the environment is tagged with devices to capture the required information. We propose a new taxonomy for categorizing the research work conducted in the field of activity recognition and divide the existing literature into three sub-areas: action-based, motion-based, and interaction-based. We further divide these areas into ten different sub-topics and present the latest research work in these sub-topics. Unlike previous surveys, which focus on only one type of activity, to the best of our knowledge, we cover all the sub-areas in activity recognition and provide a comparison of the latest research work in these sub-areas. Specifically, we discuss the key attributes and design approaches for the work presented. Then we provide extensive analysis based on 10 important metrics to give the reader a complete overview of the state-of-the-art techniques and trends in different sub-areas of human activity recognition. Finally, we discuss open research issues and provide future research directions in the field of human activity recognition.
Tasks Activity Recognition, Human Activity Recognition
Published 2019-06-11
URL https://arxiv.org/abs/1906.05074v1
PDF https://arxiv.org/pdf/1906.05074v1.pdf
PWC https://paperswithcode.com/paper/different-approaches-for-human-activity
Repo
Framework

Balancing the Tradeoff Between Clustering Value and Interpretability

Title Balancing the Tradeoff Between Clustering Value and Interpretability
Authors Sandhya Saisubramanian, Sainyam Galhotra, Shlomo Zilberstein
Abstract Graph clustering groups entities – the vertices of a graph – based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a $\beta$-interpretable clustering algorithm that ensures that at least $\beta$ fraction of nodes in each cluster share the same feature value. The tunable parameter $\beta$ is user-specified. We also present a more efficient algorithm for scenarios with $\beta = 1$ and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by generating simple explanations denoting the feature values of the nodes in the clusters, using frequent pattern mining.
Tasks Graph Clustering
Published 2019-12-17
URL https://arxiv.org/abs/1912.07820v3
PDF https://arxiv.org/pdf/1912.07820v3.pdf
PWC https://paperswithcode.com/paper/balancing-the-tradeoff-between-clustering
Repo
Framework
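The β-interpretability constraint itself is simple to state in code: every cluster must have at least a β fraction of its nodes agreeing on the feature of interest. The check below is a minimal sketch of that constraint only, not of the paper's clustering algorithm; the data layout is assumed.

```python
from collections import Counter

def is_beta_interpretable(clusters, feature, beta):
    # clusters: iterable of node-id lists; feature: dict node-id -> feature value.
    for nodes in clusters:
        if not nodes:
            continue
        most_common_count = Counter(feature[n] for n in nodes).most_common(1)[0][1]
        if most_common_count / len(nodes) < beta:
            return False
    return True
```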

Global Momentum Compression for Sparse Communication in Distributed SGD

Title Global Momentum Compression for Sparse Communication in Distributed SGD
Authors Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
Abstract With the rapid growth of data, distributed stochastic gradient descent (DSGD) has been widely used for solving large-scale machine learning problems. Due to the latency and limited bandwidth of networks, communication has become the bottleneck of DSGD when training large-scale models such as deep neural networks. Communication compression with sparsified gradients, abbreviated as sparse communication, has been widely used to reduce the communication cost in DSGD. Recently, one method, called deep gradient compression (DGC), has combined memory gradient and momentum SGD for sparse communication. DGC has achieved promising performance in practice. However, a convergence theory for DGC is lacking. In this paper, we propose a novel method, called global momentum compression (GMC), for sparse communication in DSGD. GMC also combines memory gradient and momentum SGD, but unlike DGC, which adopts local momentum, GMC adopts global momentum. We theoretically prove the convergence rate of GMC for both convex and non-convex problems. To the best of our knowledge, this is the first work that proves the convergence of distributed momentum SGD (DMSGD) with sparse communication and memory gradient. Empirical results show that, compared with the DMSGD counterpart without sparse communication, GMC can reduce the communication cost by approximately 100-fold without loss of generalization accuracy. GMC can also achieve comparable (sometimes better) performance compared with DGC, with an extra theoretical guarantee.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.12948v2
PDF https://arxiv.org/pdf/1905.12948v2.pdf
PWC https://paperswithcode.com/paper/global-momentum-compression-for-sparse
Repo
Framework
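A generic picture of sparse communication with memory gradient is top-k gradient sparsification with error feedback layered on a momentum update. The sketch below shows a single worker's step under those assumptions; it illustrates the ingredients that GMC and DGC share, not the paper's specific global-momentum update rule.

```python
import numpy as np

def topk_sparsify(vec, k):
    # Keep the k largest-magnitude entries; return the sparse part and the residual.
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    sparse = np.zeros_like(vec)
    sparse[idx] = vec[idx]
    return sparse, vec - sparse

def worker_step(grad, state, k, lr=0.01, beta=0.9):
    # One worker update: momentum, then error feedback (memory), then sparse communication.
    state["momentum"] = beta * state["momentum"] + grad
    update = lr * state["momentum"] + state["memory"]  # re-add what was not sent previously
    to_send, state["memory"] = topk_sparsify(update, k)
    return to_send                                      # only the sparse update is communicated
```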

Structural Robustness for Deep Learning Architectures

Title Structural Robustness for Deep Learning Architectures
Authors Carlos Lassance, Vincent Gripon, Jian Tang, Antonio Ortega
Abstract Deep Networks have been shown to provide state-of-the-art performance in many machine learning challenges. Unfortunately, they are susceptible to various types of noise, including adversarial attacks and corrupted inputs. In this work we introduce a formal definition of robustness which can be viewed as a localized Lipschitz constant of the network function, quantified in the domain of the data to be classified. We compare this notion of robustness to existing ones, and study its connections with methods in the literature. We evaluate this metric by performing experiments on various competitive vision datasets.
Tasks
Published 2019-09-11
URL https://arxiv.org/abs/1909.05095v1
PDF https://arxiv.org/pdf/1909.05095v1.pdf
PWC https://paperswithcode.com/paper/structural-robustness-for-deep-learning
Repo
Framework
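The robustness notion described above is a localized Lipschitz constant. As a rough illustration, one can estimate such a quantity empirically by sampling perturbations around a data point and taking the largest observed Lipschitz quotient. The sketch below does exactly that and is only an assumption-laden stand-in for the paper's definition, which is quantified over the domain of the data to be classified.

```python
import numpy as np

def local_lipschitz_estimate(f, x, radius=0.1, n_samples=64, rng=None):
    # Empirical Lipschitz quotient of f within a ball of the given radius around x.
    rng = rng or np.random.default_rng(0)
    best = 0.0
    for _ in range(n_samples):
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform() / (np.linalg.norm(delta) + 1e-12)
        quotient = np.linalg.norm(f(x + delta) - f(x)) / (np.linalg.norm(delta) + 1e-12)
        best = max(best, quotient)
    return best
```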

Teaching Vehicles to Anticipate: A Systematic Study on Probabilistic Behavior Prediction using Large Data Sets

Title Teaching Vehicles to Anticipate: A Systematic Study on Probabilistic Behavior Prediction using Large Data Sets
Authors Florian Wirthmüller, Julian Schlechtriemen, Jochen Hipp, Manfred Reichert
Abstract Observations of traffic participants and their environment enable humans to drive road vehicles safely. However, when being driven, there is a notable difference between having a non-experienced and an experienced driver. One may get the feeling that the latter anticipates what may happen in the next few moments and factors these foresights into their driving behavior. To make the driving style of automated vehicles comparable to a human driver in the sense of comfort and perceived safety, the aforementioned anticipation skills need to become a built-in feature of self-driving vehicles. This article provides a systematic comparison of methods and strategies to generate this intention for self-driving cars using machine learning techniques. To implement and test these algorithms, we use a large data set collected over more than 30,000 km of highway driving and containing approximately 40,000 real-world driving situations. Moreover, we show that it is possible to reliably detect more than 47% of all lane changes on German highways 3 or more seconds in advance, with a false positive rate of less than 1%. This enables us to predict the lateral position over a prediction horizon of 5 s with a median error of less than 0.21 m.
Tasks Self-Driving Cars
Published 2019-10-17
URL https://arxiv.org/abs/1910.07772v2
PDF https://arxiv.org/pdf/1910.07772v2.pdf
PWC https://paperswithcode.com/paper/teaching-vehicles-to-anticipate-a-systematic
Repo
Framework
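The headline numbers above (detecting over 47% of lane changes at a false positive rate below 1%) correspond to choosing an operating point for a score threshold. A minimal sketch of how such a threshold can be chosen from validation data follows; the variable names are hypothetical and this is not the paper's classifier.

```python
import numpy as np

def threshold_for_fpr(scores_negative, max_fpr=0.01):
    # Approximate threshold whose false-positive rate on validation negatives is <= max_fpr.
    return np.quantile(np.asarray(scores_negative), 1.0 - max_fpr)

def detection_rate(scores_positive, threshold):
    # Fraction of true lane-change events scored at or above the chosen threshold.
    return float(np.mean(np.asarray(scores_positive) >= threshold))
```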

Conditional Driving from Natural Language Instructions

Title Conditional Driving from Natural Language Instructions
Authors Junha Roh, Chris Paxton, Andrzej Pronobis, Ali Farhadi, Dieter Fox
Abstract Widespread adoption of self-driving cars will depend not only on their safety but largely on their ability to interact with human users. Just like human drivers, self-driving cars will be expected to understand and safely follow natural-language directions that suddenly alter the pre-planned route according to the user’s preference or in the presence of ambiguities, particularly in locations with poor or outdated map coverage. To this end, we propose a language-grounded driving agent implementing a hierarchical policy using recurrent layers and gated attention. The hierarchical approach enables us to reason both in terms of high-level language instructions describing long time horizons and low-level, complex, continuous state/action spaces required for real-time control of a self-driving car. We train our policy with conditional imitation learning from realistic language data collected from human drivers and navigators. Through quantitative and interactive experiments within the CARLA framework, we show that our model can successfully interpret language instructions and follow them safely, even when generalizing to previously unseen environments. Code and video are available at https://sites.google.com/view/language-grounded-driving.
Tasks Imitation Learning, Self-Driving Cars
Published 2019-10-16
URL https://arxiv.org/abs/1910.07615v1
PDF https://arxiv.org/pdf/1910.07615v1.pdf
PWC https://paperswithcode.com/paper/conditional-driving-from-natural-language
Repo
Framework

A Multi-Resolution Word Embedding for Document Retrieval from Large Unstructured Knowledge Bases

Title A Multi-Resolution Word Embedding for Document Retrieval from Large Unstructured Knowledge Bases
Authors Tolgahan Cakaloglu, Xiaowei Xu
Abstract Deep language models that learn hierarchical representations have proved to be powerful tools for natural language processing, text mining, and information retrieval. However, representations that perform well for retrieval must capture semantic meaning at different levels of abstraction or context-scopes. In this paper, we propose a new method to generate multi-resolution word embeddings that represent documents at multiple resolutions in terms of context-scopes. To investigate its performance, we use the Stanford Question Answering Dataset (SQuAD) and the Question Answering by Search And Reading (QUASAR) dataset in an open-domain question-answering setting, where the first task is to find documents useful for answering a given question. To this end, we first compare the quality of various text-embedding methods for retrieval and give an extensive empirical comparison of various non-augmented base embeddings with and without the multi-resolution representation. We argue that multi-resolution word embeddings are consistently superior to their original counterparts, and that deep residual neural models specifically trained for retrieval can yield further significant gains when used to augment those embeddings.
Tasks Information Retrieval, Open-Domain Question Answering, Question Answering, Word Embeddings
Published 2019-02-02
URL https://arxiv.org/abs/1902.00663v7
PDF https://arxiv.org/pdf/1902.00663v7.pdf
PWC https://paperswithcode.com/paper/a-multi-resolution-word-embedding-for
Repo
Framework
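The core idea of a multi-resolution representation can be illustrated by pooling base word embeddings over several context scopes and concatenating the results. The sketch below is a simplified stand-in, assuming mean pooling and fixed window sizes; the paper's actual combination of base embeddings and resolutions is not reproduced here.

```python
import numpy as np

def multi_resolution_embedding(token_vectors, window_sizes=(1, 4, 16)):
    # token_vectors: (n_tokens, dim) array of base word embeddings for one document.
    parts = []
    n = len(token_vectors)
    for w in window_sizes:
        # Mean-pool each token's trailing window of size w, then summarize the document.
        pooled = np.stack([token_vectors[max(0, i - w + 1):i + 1].mean(axis=0)
                           for i in range(n)])
        parts.append(pooled.mean(axis=0))
    return np.concatenate(parts)  # one vector with a block per context scope
```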

Semantic Understanding of Foggy Scenes with Purely Synthetic Data

Title Semantic Understanding of Foggy Scenes with Purely Synthetic Data
Authors Martin Hahner, Dengxin Dai, Christos Sakaridis, Jan-Nico Zaech, Luc Van Gool
Abstract This work addresses the problem of semantic scene understanding under foggy road conditions. Although marked progress has been made in semantic scene understanding over the recent years, it is mainly concentrated on clear-weather outdoor scenes. Extending semantic segmentation methods to adverse weather conditions like fog is crucially important for outdoor applications such as self-driving cars. In this paper, we propose a novel method which uses purely synthetic data to improve the performance on unseen real-world foggy scenes captured in the streets of Zurich and its surroundings. Our results highlight the potential and power of photo-realistic synthetic images for training and especially fine-tuning deep neural nets. Our contributions are threefold: 1) we create a purely synthetic, high-quality foggy dataset of 25,000 unique outdoor scenes, which we call Foggy Synscapes and plan to release publicly; 2) we show that with this data we outperform previous approaches on real-world foggy test data; 3) we show that a combination of our data and previously used data can further improve performance on real-world foggy data.
Tasks Scene Understanding, Self-Driving Cars, Semantic Segmentation
Published 2019-10-09
URL https://arxiv.org/abs/1910.03997v1
PDF https://arxiv.org/pdf/1910.03997v1.pdf
PWC https://paperswithcode.com/paper/semantic-understanding-of-foggy-scenes-with
Repo
Framework

Semi-Implicit Generative Model

Title Semi-Implicit Generative Model
Authors Mingzhang Yin, Mingyuan Zhou
Abstract To combine explicit and implicit generative models, we introduce semi-implicit generator (SIG) as a flexible hierarchical model that can be trained in the maximum likelihood framework. Both theoretically and experimentally, we demonstrate that SIG can generate high quality samples especially when dealing with multi-modality. By introducing SIG as an unbiased regularizer to the generative adversarial network (GAN), we show the interplay between maximum likelihood and adversarial learning can stabilize the adversarial training, resist the notorious mode collapsing problem of GANs, and improve the diversity of generated random samples.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12659v2
PDF https://arxiv.org/pdf/1905.12659v2.pdf
PWC https://paperswithcode.com/paper/semi-implicit-generative-model
Repo
Framework
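A semi-implicit generator can be pictured as a hierarchical sampler: an explicit top-level latent is transformed by a network that also receives fresh auxiliary noise, so the marginal output distribution becomes implicit. The PyTorch sketch below shows only this sampling construction with made-up dimensions; the SIG training objective (maximum likelihood plus the GAN regularizer) is not shown.

```python
import torch
import torch.nn as nn

class SemiImplicitGenerator(nn.Module):
    def __init__(self, z_dim=16, eps_dim=16, hidden=128, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + eps_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )
        self.z_dim, self.eps_dim = z_dim, eps_dim

    def sample(self, n):
        z = torch.randn(n, self.z_dim)      # explicit top-level latent
        eps = torch.randn(n, self.eps_dim)  # auxiliary noise makes the conditional implicit
        return self.net(torch.cat([z, eps], dim=1))
```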

Generative Adversarial Networks for Operational Scenario Planning of Renewable Energy Farms: A Study on Wind and Photovoltaic

Title Generative Adversarial Networks for Operational Scenario Planning of Renewable Energy Farms: A Study on Wind and Photovoltaic
Authors Jens Schreiber, Maik Jessulat, Bernhard Sick
Abstract For the integration of renewable energy sources, power grid operators need realistic information about the effects of energy production and consumption to assess grid stability. Recently, research in scenario planning has benefited from utilizing generative adversarial networks (GANs) as generative models for operational scenario planning. In these scenarios, operators examine temporal as well as spatial influences of different energy sources on the grid. The analysis of how renewable energy resources affect the grid enables the operators to evaluate the stability and to identify potential weak points such as a limiting transformer. However, due to their novelty, there are limited studies on how well GANs model the underlying power distribution. This analysis is essential because, for example, extreme situations with low or high power generation are required to evaluate grid stability. We conduct a comparative study of the Wasserstein distance, the binary cross-entropy loss, and a Gaussian copula as the baseline, applied to two wind and two solar datasets with limited data compared to previous studies. Both GANs achieve good results considering the limited amount of data, but the Wasserstein GAN is superior in modeling temporal and spatial relations and the power distribution. Besides evaluating the generated power distribution over all farms, it is essential to assess terrain-specific distributions for wind scenarios. These terrain-specific power distributions affect the grid through differences in their generated power magnitudes. Therefore, in a second study, we show that GANs are capable of modeling these individualities even when simultaneously learning distributions from wind parks with terrain-specific patterns and when faced with limited data.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00662v1
PDF https://arxiv.org/pdf/1906.00662v1.pdf
PWC https://paperswithcode.com/paper/190600662
Repo
Framework
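Two of the objectives compared in the study, the Wasserstein critic loss and the binary cross-entropy (standard GAN) loss, differ only in how the discriminator outputs are scored. A small sketch of both, assuming raw logits for a batch of real and generated samples:

```python
import torch
import torch.nn.functional as F

def critic_losses(d_real, d_fake):
    # d_real, d_fake: raw critic/discriminator outputs (logits) for real and generated batches.
    # Wasserstein critic loss (requires a Lipschitz constraint, e.g. clipping or gradient penalty).
    wasserstein = d_fake.mean() - d_real.mean()
    # Standard binary cross-entropy discriminator loss.
    bce = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    return wasserstein, bce
```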