October 21, 2019

3355 words 16 mins read

Paper Group AWR 26

Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions

Title Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions
Authors Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, Fei Sha
Abstract Learning with limited data is a key challenge for visual recognition. Many few-shot learning methods address this challenge by learning an instance embedding function from seen classes and applying the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned to be optimally discriminative with respect to the unseen classes, where discerning among them leads to the target task. In this paper, we propose a novel approach to adapt the instance embeddings to the target classification task with a set-to-set function, yielding embeddings that are task-specific and discriminative. We empirically investigated various instantiations of such set-to-set functions and observed that the Transformer is most effective, as it naturally satisfies key properties of our desired model. We denote this model as FEAT (few-shot embedding adaptation w/ Transformer) and validate it on both the standard few-shot classification benchmark and four extended few-shot learning settings with essential use cases, i.e., cross-domain, transductive, generalized few-shot learning, and low-shot learning. It achieved consistent improvements over baseline models as well as previous methods and established new state-of-the-art results on two benchmarks.
Tasks Few-Shot Image Classification, Few-Shot Learning, Transfer Learning
Published 2018-12-10
URL https://arxiv.org/abs/1812.03664v5
PDF https://arxiv.org/pdf/1812.03664v5.pdf
PWC https://paperswithcode.com/paper/learning-embedding-adaptation-for-few-shot
Repo https://github.com/phecy/SSL-FEW-SHOT
Framework pytorch
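
The embedding-adaptation idea lends itself to a short illustration: a small Transformer encoder (here PyTorch's nn.TransformerEncoder, standing in for the paper's set-to-set function) jointly re-embeds the support set before class prototypes are computed, so the embeddings become task-specific. This is a minimal sketch under assumed dimensions and names, not the authors' FEAT implementation (see the linked repo for that).

    import torch
    import torch.nn as nn

    class EmbeddingAdapter(nn.Module):
        """Adapt task-agnostic instance embeddings with a set-to-set function."""
        def __init__(self, dim=64, heads=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.set_to_set = nn.TransformerEncoder(layer, num_layers=1)

        def forward(self, support, support_labels, query):
            # support: (N*K, dim) embeddings of the labelled support set
            adapted = self.set_to_set(support.unsqueeze(0)).squeeze(0)  # contextualise the whole set jointly
            classes = support_labels.unique()
            prototypes = torch.stack([adapted[support_labels == c].mean(0) for c in classes])
            # nearest-prototype classification of the query embeddings
            return -torch.cdist(query, prototypes)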

Learning to Detect Fake Face Images in the Wild

Title Learning to Detect Fake Face Images in the Wild
Authors Chih-Chung Hsu, Chia-Yen Lee, Yi-Xiu Zhuang
Abstract Although Generative Adversarial Networks (GANs) can be used to generate realistic images, improper use of these technologies brings hidden concerns. For example, a GAN can be used to generate a tampered video of a specific person and inappropriate events, creating images that are detrimental to a particular person and may even threaten their personal safety. In this paper, we develop a deep forgery discriminator (DeepFD) to efficiently and effectively detect computer-generated images. Directly learning a binary classifier is relatively tricky since it is hard to find common discriminative features for judging fake images generated by different GANs. To address this shortcoming, we adopt a contrastive loss to seek the typical features of synthesized images generated by different GANs, followed by a concatenated classifier to detect such computer-generated images. Experimental results demonstrate that the proposed DeepFD successfully detected 94.7% of the fake images generated by several state-of-the-art GANs.
Tasks Face Swapping, Fake Image Detection, GAN image forensics, Image Generation
Published 2018-09-24
URL http://arxiv.org/abs/1809.08754v3
PDF http://arxiv.org/pdf/1809.08754v3.pdf
PWC https://paperswithcode.com/paper/learning-to-detect-fake-face-images-in-the
Repo https://github.com/jesse1029/Fake-Face-Images-Detection-Tensorflow
Framework tf
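
The contrastive-loss step can be sketched in a few lines: pairs of images drawn from the same source are pulled together in feature space and real/fake pairs are pushed apart, after which a classifier is trained on the shared features. The snippet below is a hedged PyTorch illustration (the released code is TensorFlow); the margin value and all names are assumptions.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(feat_a, feat_b, same_source, margin=1.0):
        """Pull same-source pairs together, push real/fake pairs apart."""
        d = F.pairwise_distance(feat_a, feat_b)
        pos = same_source * d.pow(2)                         # same_source is 1 for matched pairs
        neg = (1 - same_source) * F.relu(margin - d).pow(2)
        return (pos + neg).mean()

    # usage sketch: feats = cnn(images); loss = contrastive_loss(feats[0::2], feats[1::2], pair_labels)
    # a small binary classifier head is then trained on the learned features to flag GAN images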

Deep Reinforcement Learning for Swarm Systems

Title Deep Reinforcement Learning for Swarm Systems
Authors Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann
Abstract Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.
Tasks Decision Making
Published 2018-07-17
URL https://arxiv.org/abs/1807.06613v3
PDF https://arxiv.org/pdf/1807.06613v3.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-swarm-systems
Repo https://github.com/LCAS/deep_rl_for_swarms
Framework none
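
The mean-embedding representation amounts to averaging a learned per-neighbour feature, so the policy input keeps a fixed size however many agents are observed. A minimal PyTorch sketch with assumed dimensions (the repository's implementation is more elaborate):

    import torch
    import torch.nn as nn

    class MeanEmbeddingPolicy(nn.Module):
        """Decentralised policy fed by the mean embedding of observed neighbours."""
        def __init__(self, obs_dim, local_dim, embed_dim=64, n_actions=5):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU())   # per-neighbour feature
            self.pi = nn.Sequential(nn.Linear(embed_dim + local_dim, 64), nn.ReLU(),
                                    nn.Linear(64, n_actions))

        def forward(self, neighbour_obs, local_obs):
            # neighbour_obs: (num_neighbours, obs_dim); any number of neighbours works
            mean_embedding = self.phi(neighbour_obs).mean(dim=0)
            return self.pi(torch.cat([mean_embedding, local_obs], dim=-1))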

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Title Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling
Authors N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, S. Poria
Abstract Multimodal sentiment analysis is a very actively growing field of research. A promising area of opportunity in this field is to improve the multimodal fusion mechanism. We present a novel feature fusion strategy that proceeds in a hierarchical fashion, first fusing the modalities pairwise (two at a time) and only then fusing all three modalities. On multimodal sentiment analysis of individual utterances, our strategy outperforms conventional concatenation of features by 1%, which amounts to a 5% reduction in error rate. On utterance-level multimodal sentiment analysis of multi-utterance video clips, for which current state-of-the-art techniques incorporate contextual information from other utterances of the same clip, our hierarchical fusion gives an improvement of up to 2.4% (almost 10% error rate reduction) over the commonly used concatenation. The implementation of our method is publicly available in the form of open-source code.
Tasks Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-06-16
URL http://arxiv.org/abs/1806.06228v1
PDF http://arxiv.org/pdf/1806.06228v1.pdf
PWC https://paperswithcode.com/paper/multimodal-sentiment-analysis-using
Repo https://github.com/SenticNet/hfusion
Framework tf
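
The fusion order described in the abstract (bimodal fusions first, trimodal fusion second) is easy to make concrete. A rough PyTorch sketch with assumed layer sizes and names, not the released TensorFlow code:

    import torch
    import torch.nn as nn

    class HierarchicalFusion(nn.Module):
        def __init__(self, d_text, d_audio, d_video, d_fuse=128, n_classes=2):
            super().__init__()
            self.ta = nn.Linear(d_text + d_audio, d_fuse)   # fuse the modalities in pairs first
            self.tv = nn.Linear(d_text + d_video, d_fuse)
            self.av = nn.Linear(d_audio + d_video, d_fuse)
            self.tri = nn.Linear(3 * d_fuse, d_fuse)        # then fuse all three
            self.out = nn.Linear(d_fuse, n_classes)

        def forward(self, t, a, v):
            ta = torch.tanh(self.ta(torch.cat([t, a], -1)))
            tv = torch.tanh(self.tv(torch.cat([t, v], -1)))
            av = torch.tanh(self.av(torch.cat([a, v], -1)))
            fused = torch.tanh(self.tri(torch.cat([ta, tv, av], -1)))
            return self.out(fused)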

Graph Neural Networks for IceCube Signal Classification

Title Graph Neural Networks for IceCube Signal Classification
Authors Nicholas Choma, Federico Monti, Lisa Gerhardt, Tomasz Palczewski, Zahra Ronaghi, Prabhat, Wahid Bhimji, Michael M. Bronstein, Spencer R. Klein, Joan Bruna
Abstract Tasks involving the analysis of geometric (graph- and manifold-structured) data have recently gained prominence in the machine learning community, giving birth to a rapidly developing field of geometric deep learning. In this work, we leverage graph neural networks to improve signal detection in the IceCube neutrino observatory. The IceCube detector array is modeled as a graph, where vertices are sensors and edges are a learned function of the sensors’ spatial coordinates. As only a subset of IceCube’s sensors is active during a given observation, we note the adaptive nature of our GNN, wherein computation is restricted to the input signal support. We demonstrate the effectiveness of our GNN architecture on a task of classifying IceCube events, where it outperforms both a traditional physics-based method and classical 3D convolutional neural networks.
Tasks
Published 2018-09-17
URL http://arxiv.org/abs/1809.06166v1
PDF http://arxiv.org/pdf/1809.06166v1.pdf
PWC https://paperswithcode.com/paper/graph-neural-networks-for-icecube-signal
Repo https://github.com/WIPACrepo/NuIntClassification
Framework pytorch
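
A minimal message-passing sketch of the core idea, that only the active sensors form the graph and the adjacency is a learned function of sensor coordinates. It is an illustration under assumed shapes and names, not the NuIntClassification code.

    import torch
    import torch.nn as nn

    class SensorGraphLayer(nn.Module):
        """One round of message passing over the graph of active sensors."""
        def __init__(self, feat_dim):
            super().__init__()
            self.edge_fn = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
            self.update = nn.Linear(feat_dim, feat_dim)

        def forward(self, x, coords):
            # x: (n_active, feat_dim) pulse features, coords: (n_active, 3) sensor positions
            diff = coords[:, None, :] - coords[None, :, :]             # pairwise spatial offsets
            w = torch.softmax(self.edge_fn(diff).squeeze(-1), dim=-1)  # learned, row-normalised adjacency
            return torch.relu(self.update(w @ x))                      # aggregate neighbour features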

Multimodal Speech Emotion Recognition Using Audio and Text

Title Multimodal Speech Emotion Recognition Using Audio and Text
Authors Seunghyun Yoon, Seokhyun Byun, Kyomin Jung
Abstract Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
Tasks Emotion Classification, Emotion Recognition, Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Speech Emotion Recognition
Published 2018-10-10
URL http://arxiv.org/abs/1810.04635v1
PDF http://arxiv.org/pdf/1810.04635v1.pdf
PWC https://paperswithcode.com/paper/multimodal-speech-emotion-recognition-using
Repo https://github.com/david-yoon/multimodal-speech-emotion
Framework tf
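
The dual recurrent encoder can be sketched directly: one RNN encodes the audio feature sequence, another encodes the token sequence, and their final states are concatenated before the four-way emotion classifier. Dimensions and names are assumptions; the released code is TensorFlow.

    import torch
    import torch.nn as nn

    class DualRNNEncoder(nn.Module):
        def __init__(self, audio_dim, vocab_size, emb_dim=128, hid=128, n_classes=4):
            super().__init__()
            self.audio_rnn = nn.GRU(audio_dim, hid, batch_first=True)
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.text_rnn = nn.GRU(emb_dim, hid, batch_first=True)
            self.cls = nn.Linear(2 * hid, n_classes)   # angry / happy / sad / neutral

        def forward(self, audio_seq, token_ids):
            _, h_audio = self.audio_rnn(audio_seq)             # audio_seq: (B, T_a, audio_dim)
            _, h_text = self.text_rnn(self.embed(token_ids))   # token_ids: (B, T_t)
            fused = torch.cat([h_audio[-1], h_text[-1]], dim=-1)
            return self.cls(fused)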

Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures

Title Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures
Authors Hongyang Zhang, Susu Xu, Jiantao Jiao, Pengtao Xie, Ruslan Salakhutdinov, Eric P. Xing
Abstract We study the problem of alleviating the instability issue in the GAN training procedure via new architecture design. The discrepancy between the minimax and maximin objective values could serve as a proxy for the difficulties that the alternating gradient descent encounters in the optimization of GANs. In this work, we give new results on the benefits of multi-generator architectures for GANs. We show that the minimax gap shrinks to $\epsilon$ as the number of generators increases with rate $\widetilde{O}(1/\epsilon)$. This improves over the best-known result of $\widetilde{O}(1/\epsilon^2)$. At the core of our techniques is a novel application of the Shapley-Folkman lemma to the generic minimax problem, where in the literature the technique was only known to work when the objective function is restricted to the Lagrangian function of a constrained optimization problem. Our proposed Stackelberg GAN performs well experimentally on both synthetic and real-world datasets, improving Fréchet Inception Distance by 14.61% over the previous multi-generator GANs on the benchmark datasets.
Tasks
Published 2018-11-19
URL http://arxiv.org/abs/1811.08010v1
PDF http://arxiv.org/pdf/1811.08010v1.pdf
PWC https://paperswithcode.com/paper/stackelberg-gan-towards-provable-minimax
Repo https://github.com/hongyanz/Stackelberg-GAN
Framework pytorch
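
The multi-generator architecture analysed in the paper boils down to several generators sharing a single discriminator, with the generator losses summed at each step. A hedged PyTorch sketch of one training step with assumed sizes, not the authors' training code:

    import torch
    import torch.nn as nn

    n_gens, z_dim, x_dim = 4, 16, 2
    gens = nn.ModuleList(nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
                         for _ in range(n_gens))
    disc = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(gens.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

    def gan_step(real_batch):
        b = real_batch.size(0)
        # discriminator step: real samples vs. samples pooled from every generator
        fakes = [g(torch.randn(b, z_dim)) for g in gens]
        d_loss = bce(disc(real_batch), torch.ones(b, 1))
        d_loss = d_loss + sum(bce(disc(f.detach()), torch.zeros(b, 1)) for f in fakes)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # generator step: every generator tries to fool the shared discriminator
        g_loss = sum(bce(disc(g(torch.randn(b, z_dim))), torch.ones(b, 1)) for g in gens)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()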

Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations

Title Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations
Authors Maziar Raissi
Abstract A long-standing problem at the interface of artificial intelligence and applied mathematics is to devise an algorithm capable of achieving human level or even superhuman proficiency in transforming observed data into predictive mathematical models of the physical world. In the current era of abundance of data and advanced machine learning capabilities, the natural question arises: How can we automatically uncover the underlying laws of physics from high-dimensional data generated from experiments? In this work, we put forth a deep learning approach for discovering nonlinear partial differential equations from scattered and potentially noisy observations in space and time. Specifically, we approximate the unknown solution as well as the nonlinear dynamics by two deep neural networks. The first network acts as a prior on the unknown solution and essentially enables us to avoid numerical differentiations which are inherently ill-conditioned and unstable. The second network represents the nonlinear dynamics and helps us distill the mechanisms that govern the evolution of a given spatiotemporal data-set. We test the effectiveness of our approach for several benchmark problems spanning a number of scientific domains and demonstrate how the proposed framework can help us accurately learn the underlying dynamics and forecast future states of the system. In particular, we study the Burgers’, Korteweg-de Vries (KdV), Kuramoto-Sivashinsky, nonlinear Schrödinger, and Navier-Stokes equations.
Tasks
Published 2018-01-20
URL http://arxiv.org/abs/1801.06637v1
PDF http://arxiv.org/pdf/1801.06637v1.pdf
PWC https://paperswithcode.com/paper/deep-hidden-physics-models-deep-learning-of
Repo https://github.com/maziarraissi/DeepHPMs
Framework tf
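
The two-network construction can be written down directly: one network approximates the solution u(t, x), automatic differentiation supplies u_t, u_x and u_xx, and a second network plays the role of the hidden dynamics so that the residual u_t - N(u, u_x, u_xx) is driven to zero alongside the data-fitting loss. A hedged PyTorch sketch (the released code is TensorFlow and supports other derivative sets):

    import torch
    import torch.nn as nn

    u_net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))   # u(t, x)
    f_net = nn.Sequential(nn.Linear(3, 50), nn.Tanh(), nn.Linear(50, 1))   # hidden dynamics N(u, u_x, u_xx)

    def pde_residual(t, x):
        t = t.requires_grad_(True)
        x = x.requires_grad_(True)
        u = u_net(torch.cat([t, x], dim=1))
        ones = torch.ones_like(u)
        u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
        u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
        u_xx = torch.autograd.grad(u_x, x, grad_outputs=torch.ones_like(u_x), create_graph=True)[0]
        return u_t - f_net(torch.cat([u, u_x, u_xx], dim=1))   # pushed towards zero during training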

A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign

Title A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign
Authors Pham Quang Nhat Minh
Abstract In this report, we describe our participating named-entity recognition system at the VLSP 2018 evaluation campaign. We formalized the task as a sequence labeling problem using the BIO encoding scheme. We applied a feature-based model which combines word, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compared several methods to deal with nested entities in the dataset. We showed that combining the tags of entities at all levels for training a sequence labeling model (joint-tag model) improved the accuracy of nested named-entity recognition.
Tasks Named Entity Recognition, Nested Named Entity Recognition
Published 2018-03-22
URL http://arxiv.org/abs/1803.08463v1
PDF http://arxiv.org/pdf/1803.08463v1.pdf
PWC https://paperswithcode.com/paper/a-feature-based-model-for-nested-named-entity
Repo https://github.com/minhpqn/vietner
Framework none
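
The joint-tag trick for nested entities (combining the BIO tags of every nesting level into one label per token, so that an ordinary sequence labeller can be trained) fits in a few lines. The helper below is a hypothetical illustration, not part of vietner.

    def to_joint_tags(level_tags):
        """Combine per-level BIO tags into a single joint tag per token.

        level_tags: one BIO sequence per nesting level, e.g.
            [["B-ORG", "I-ORG", "O"], ["O", "B-LOC", "O"]]
        returns ["B-ORG+O", "I-ORG+B-LOC", "O+O"]
        """
        return ["+".join(tags) for tags in zip(*level_tags)]

    # a CRF or other sequence labeller is trained on these joint tags; predictions are
    # split on "+" to recover the nested entities at each level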

edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

Title edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Authors Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding
Abstract Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs while accounting for edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning on heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery.
Tasks Information Retrieval, Knowledge Graphs, Representation Learning
Published 2018-09-07
URL https://arxiv.org/abs/1809.02269v3
PDF https://arxiv.org/pdf/1809.02269v3.pdf
PWC https://paperswithcode.com/paper/edge2vec-representation-learning-using-edge
Repo https://github.com/RoyZhengGao/edge2vec
Framework none
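
The sampling side of edge2vec can be sketched as node2vec-style random walks whose next step is biased by an edge-type transition matrix, assumed here to be already estimated by the EM procedure the abstract mentions; the walks are then fed to a skip-gram embedding model. Names and data structures are illustrative.

    import random

    def edge_type_walk(graph, trans, start, length=10):
        """graph[u] -> list of (neighbour, edge_type); trans[t1][t2] -> transition weight."""
        walk, node, prev_type = [start], start, None
        for _ in range(length - 1):
            nbrs = graph.get(node, [])
            if not nbrs:
                break
            if prev_type is None:
                node, prev_type = random.choice(nbrs)
            else:
                weights = [trans[prev_type][t] for _, t in nbrs]   # bias by edge-type transitions
                node, prev_type = random.choices(nbrs, weights=weights, k=1)[0]
            walk.append(node)
        return walk

    # the walks are treated as sentences and embedded with a word2vec-style skip-gram model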

Mobile Sensor Data Anonymization

Title Mobile Sensor Data Anonymization
Authors Mohammad Malekzadeh, Richard G. Clegg, Andrea Cavallaro, Hamed Haddadi
Abstract Motion sensors such as accelerometers and gyroscopes measure the instant acceleration and rotation of a device in three dimensions. Raw data streams from motion sensors embedded in portable and wearable devices may reveal private information about users without their awareness. For example, motion data might disclose the weight or gender of a user, or enable their re-identification. To address this problem, we propose an on-device transformation of sensor data to be shared for specific applications, such as monitoring selected daily activities, without revealing information that enables user identification. We formulate the anonymization problem using an information-theoretic approach and propose a new multi-objective loss function for training deep autoencoders. This loss function helps minimize user-identity information as well as data distortion, preserving the application-specific utility. The training process regulates the encoder to disregard user-identifiable patterns and tunes the decoder to shape the output independently of the users in the training set. The trained autoencoder can be deployed on a mobile or wearable device to anonymize sensor data even for users who are not included in the training dataset. Data from 24 users transformed by the proposed anonymizing autoencoder lead to a promising trade-off between utility and privacy, with an accuracy for activity recognition above 92% and an accuracy for user identification below 7%.
Tasks Activity Recognition
Published 2018-10-26
URL http://arxiv.org/abs/1810.11546v3
PDF http://arxiv.org/pdf/1810.11546v3.pdf
PWC https://paperswithcode.com/paper/mobile-sensor-data-anonymization
Repo https://github.com/mmalekzadeh/motion-sense
Framework none
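
The multi-objective loss can be illustrated as three competing terms: keep the transformed window close to the original, keep the activity label recoverable, and push any identity classifier towards chance. A rough PyTorch sketch with assumed weights and names; the released motion-sense code differs in detail.

    import torch
    import torch.nn.functional as F

    def anonymization_loss(x, x_hat, act_logits, act_labels, id_logits, alpha=1.0, beta=1.0):
        """Trade off distortion, task utility, and user-identity leakage."""
        distortion = F.mse_loss(x_hat, x)                    # keep the transformed signal close
        utility = F.cross_entropy(act_logits, act_labels)    # preserve activity information
        n_users = id_logits.size(1)
        uniform = torch.full_like(id_logits, 1.0 / n_users)
        # drive identity predictions towards the uniform distribution (minimal identity information)
        privacy = F.kl_div(F.log_softmax(id_logits, dim=1), uniform, reduction="batchmean")
        return distortion + alpha * utility + beta * privacy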

Large-Scale Stochastic Sampling from the Probability Simplex

Title Large-Scale Stochastic Sampling from the Probability Simplex
Authors Jack Baker, Paul Fearnhead, Emily B Fox, Christopher Nemeth
Abstract Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that, because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces, i.e., when many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces. To avoid the biases caused by this discretization error, we propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error, and we prove that samples from the SCIR process are asymptotically unbiased. We discuss how this idea can be extended to target other constrained spaces. Use of the SCIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
Tasks Bayesian Inference, Topic Models
Published 2018-06-19
URL http://arxiv.org/abs/1806.07137v2
PDF http://arxiv.org/pdf/1806.07137v2.pdf
PWC https://paperswithcode.com/paper/large-scale-stochastic-sampling-from-the
Repo https://github.com/jbaker92/scir
Framework none
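
The reason the approach avoids discretization error is that the CIR process has an exact transition: a scaled non-central chi-squared distribution. Normalising independent CIR components then yields a point on the probability simplex. A hedged NumPy illustration of that exact transition (not the authors' SGMCMC code, which additionally works with stochastic gradients of the target):

    import numpy as np

    def cir_exact_step(x, h, a=1.0, b=1.0, sigma=2.0, rng=np.random.default_rng()):
        """Exact transition of dX = a(b - X) dt + sigma * sqrt(X) dW over a step of size h."""
        c = sigma**2 * (1.0 - np.exp(-a * h)) / (4.0 * a)
        df = 4.0 * a * b / sigma**2
        nonc = x * np.exp(-a * h) / c
        return c * rng.noncentral_chisquare(df, nonc)

    # K independent CIR components, normalised, give a sample on the probability simplex
    x = np.ones(5)
    for _ in range(100):
        x = cir_exact_step(x, h=0.1)
    theta = x / x.sum()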

Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos

Title Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Authors Dawei Liang, Edison Thomaz
Abstract Over the years, activity sensing and recognition has been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and leveraging new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not require further feature processing or outlier filtering as in prior work. We evaluated our approach in the context of Activities of Daily Living (ADL) by recognizing 15 everyday activities with 14 participants in their own homes, achieving 64.2% and 83.6% averaged within-subject accuracy in terms of top-1 and top-3 classification, respectively. Individual class performance was also examined to further study the co-occurrence characteristics of the activities and the robustness of the framework.
Tasks Activity Recognition
Published 2018-10-19
URL http://arxiv.org/abs/1810.08691v2
PDF http://arxiv.org/pdf/1810.08691v2.pdf
PWC https://paperswithcode.com/paper/audio-based-activities-of-daily-living-adl
Repo https://github.com/dawei-liang/AudioAR_Research_Codes
Framework tf
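
The training recipe implied by the abstract (pre-computed audio embeddings, class balancing by oversampling, then a small classifier) can be sketched with off-the-shelf tools. The random features below merely stand in for 128-dimensional embeddings of online video sound clips, and the specific oversampler and classifier are assumptions, not the authors' pipeline.

    import numpy as np
    from imblearn.over_sampling import RandomOverSampler
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 128))        # stand-in for 128-d audio embeddings
    y = rng.integers(0, 15, size=1000)      # 15 everyday activity classes

    X_bal, y_bal = RandomOverSampler(random_state=0).fit_resample(X, y)   # balance rare activities
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=50).fit(X_bal, y_bal)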

Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity

Title Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity
Authors Claudia Carpineti, Vincenzo Lomonaco, Luca Bedogni, Marco Di Felice, Luciano Bononi
Abstract Making applications aware of the mobility experienced by the user can open the door to a wide range of novel services in different use cases, from smart parking to vehicular traffic monitoring. In the literature, there are many different studies demonstrating the theoretical possibility of performing Transportation Mode Detection (TMD) by mining the data of sensors embedded in smartphones. However, very few of them provide details on the benchmarking process and on how to implement the detection process in practice. In this study, we provide guidelines and fundamental results that can be useful for both researchers and practitioners aiming to implement a working TMD system. These guidelines consist of three main contributions. First, we detail the construction of a training dataset, gathered by heterogeneous users and including five different transportation modes; the dataset is made available to the research community as a reference benchmark. Second, we provide an in-depth analysis of sensor relevance for the case of Dual TMD, which is required by most mobility-aware applications. Third, we investigate the possibility of performing TMD on unknown users/instances not present in the training set, and we compare with state-of-the-art Android APIs for activity recognition.
Tasks Activity Recognition
Published 2018-10-12
URL http://arxiv.org/abs/1810.05596v1
PDF http://arxiv.org/pdf/1810.05596v1.pdf
PWC https://paperswithcode.com/paper/custom-dual-transportation-mode-detection-by
Repo https://github.com/vlomonaco/US-TransportationMode
Framework none

Object Level Visual Reasoning in Videos

Title Object Level Visual Reasoning in Videos
Authors Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori
Abstract Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehension of interactions between actors and objects in a scene. We propose a model capable of learning to reason about semantically meaningful spatiotemporal interactions in videos. The key to our approach is the choice of performing this reasoning at the object level through the integration of state-of-the-art object detection networks. This allows the model to learn detailed spatial interactions that exist at a semantic, object-interaction-relevant level. We evaluate our method on three standard datasets (Twenty-BN Something-Something, VLOG and EPIC Kitchens) and achieve state-of-the-art results on all of them. Finally, we show visualizations of the interactions learned by the model, which illustrate object classes and their interactions corresponding to different activity classes.
Tasks Activity Recognition, Human Activity Recognition, Object Detection, Visual Reasoning
Published 2018-06-16
URL http://arxiv.org/abs/1806.06157v3
PDF http://arxiv.org/pdf/1806.06157v3.pdf
PWC https://paperswithcode.com/paper/object-level-visual-reasoning-in-videos
Repo https://github.com/fabienbaradel/object_level_visual_reasoning
Framework pytorch
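
The object-level idea can be reduced to a small sketch: per-frame object features from a detector are pooled and a recurrent head reasons over the resulting sequence. Purely illustrative with assumed shapes; the released code integrates a full detection backbone and richer pairwise reasoning.

    import torch
    import torch.nn as nn

    class ObjectLevelReasoner(nn.Module):
        def __init__(self, obj_dim=256, hid=256, n_classes=100):
            super().__init__()
            self.gru = nn.GRU(obj_dim, hid, batch_first=True)
            self.cls = nn.Linear(hid, n_classes)

        def forward(self, object_feats):
            # object_feats: (T, n_objects, obj_dim) detected-object features per frame
            frame_repr = object_feats.mean(dim=1).unsqueeze(0)   # (1, T, obj_dim) pooled per frame
            _, h = self.gru(frame_repr)
            return self.cls(h[-1])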