Paper Group AWR 26
Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions. Learning to Detect Fake Face Images in the Wild. Deep Reinforcement Learning for Swarm Systems. Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling. Graph Neural Networks for IceCube Signal Classification. Multimodal Speech Emotion Recognition Using Audi …
Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions
Title | Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions |
Authors | Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, Fei Sha |
Abstract | Learning with limited data is a key challenge for visual recognition. Many few-shot learning methods address this challenge by learning an instance embedding function from seen classes and applying the function to instances from unseen classes with limited labels. This style of transfer learning is task-agnostic: the embedding function is not learned to be optimally discriminative with respect to the unseen classes, where discerning among them constitutes the target task. In this paper, we propose a novel approach to adapt the instance embeddings to the target classification task with a set-to-set function, yielding embeddings that are task-specific and discriminative. We empirically investigated various instantiations of such set-to-set functions and observed that the Transformer is most effective, as it naturally satisfies key properties of our desired model. We denote this model as FEAT (few-shot embedding adaptation w/ Transformer) and validate it on both the standard few-shot classification benchmark and four extended few-shot learning settings with essential use cases, i.e., cross-domain, transductive, generalized few-shot learning, and low-shot learning. It achieved consistent improvements over baseline models as well as previous methods and established new state-of-the-art results on two benchmarks. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Transfer Learning |
Published | 2018-12-10 |
URL | https://arxiv.org/abs/1812.03664v5 |
https://arxiv.org/pdf/1812.03664v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-embedding-adaptation-for-few-shot |
Repo | https://github.com/phecy/SSL-FEW-SHOT |
Framework | pytorch |
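A minimal PyTorch sketch of the embedding-adaptation idea described above (not the authors' FEAT code; the dimensions, single encoder layer, and distance-based classifier are illustrative assumptions): support-class prototypes are passed through a Transformer encoder acting as the set-to-set function, and queries are classified by distance to the adapted, task-specific prototypes.

```python
# Minimal sketch (not the authors' code): adapt support embeddings with a
# Transformer encoder acting as a set-to-set function, then classify queries
# by distance to the adapted class prototypes.
import torch
import torch.nn as nn

class SetToSetAdapter(nn.Module):
    def __init__(self, dim=64, heads=4, layers=1):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               dim_feedforward=dim * 2,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, prototypes):            # (n_way, dim)
        # Treat the set of prototypes as one "sequence"; self-attention makes
        # every adapted embedding a function of the whole support set.
        return self.encoder(prototypes.unsqueeze(0)).squeeze(0)

def few_shot_logits(support, support_y, query, adapter, n_way):
    # support: (n_way*k_shot, dim), support_y: class indices, query: (q, dim)
    prototypes = torch.stack([support[support_y == c].mean(0)
                              for c in range(n_way)])        # (n_way, dim)
    adapted = adapter(prototypes)                            # task-specific embeddings
    return -torch.cdist(query, adapted)                      # negative distance as logits

if __name__ == "__main__":
    dim, n_way, k_shot, q = 64, 5, 1, 15
    adapter = SetToSetAdapter(dim)
    support = torch.randn(n_way * k_shot, dim)
    support_y = torch.arange(n_way).repeat_interleave(k_shot)
    query = torch.randn(q, dim)
    print(few_shot_logits(support, support_y, query, adapter, n_way).shape)  # (15, 5)
```

Because the adaptation attends over the whole support set at once, the resulting embeddings depend on the target task rather than being fixed after pre-training.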
Learning to Detect Fake Face Images in the Wild
Title | Learning to Detect Fake Face Images in the Wild |
Authors | Chih-Chung Hsu, Chia-Yen Lee, Yi-Xiu Zhuang |
Abstract | Although Generative Adversarial Networks (GANs) can be used to generate realistic images, improper use of these technologies raises hidden concerns. For example, GANs can be used to generate a tampered video of a specific person and inappropriate events, creating images that are detrimental to a particular person and may even threaten that person's safety. In this paper, we develop a deep forgery discriminator (DeepFD) to efficiently and effectively detect computer-generated images. Directly learning a binary classifier is relatively tricky, since it is hard to find common discriminative features for judging the fake images generated by different GANs. To address this shortcoming, we adopt a contrastive loss to seek the typical features of the synthesized images generated by different GANs, followed by a concatenated classifier to detect such computer-generated images. Experimental results demonstrate that the proposed DeepFD successfully detected 94.7% of the fake images generated by several state-of-the-art GANs. |
Tasks | Face Swapping, Fake Image Detection, GAN image forensics, Image Generation |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08754v3 |
http://arxiv.org/pdf/1809.08754v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-detect-fake-face-images-in-the |
Repo | https://github.com/jesse1029/Fake-Face-Images-Detection-Tensorflow |
Framework | tf |
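A small, hedged illustration of the contrastive component mentioned in the abstract (not the paper's implementation; the feature dimensions and margin are arbitrary): a standard pairwise contrastive loss pulls features of same-label image pairs together and pushes mismatched pairs apart, and a small classifier head would then operate on the learned features.

```python
# Illustrative sketch: pairwise contrastive loss over image features produced
# by a shared backbone; the backbone and classifier head are omitted here.
import torch
import torch.nn.functional as F

def contrastive_loss(feat_a, feat_b, same_label, margin=1.0):
    """same_label = 1 if both images share a label (e.g. both GAN-generated)."""
    dist = F.pairwise_distance(feat_a, feat_b)
    pos = same_label * dist.pow(2)                      # pull matching pairs together
    neg = (1 - same_label) * F.relu(margin - dist).pow(2)  # push mismatched pairs apart
    return (pos + neg).mean()

# Toy usage with random "features"; in practice feat_* come from a shared CNN.
feat_a, feat_b = torch.randn(8, 128), torch.randn(8, 128)
same_label = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(feat_a, feat_b, same_label))
```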
Deep Reinforcement Learning for Swarm Systems
Title | Deep Reinforcement Learning for Swarm Systems |
Authors | Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann |
Abstract | Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in both a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies. |
Tasks | Decision Making |
Published | 2018-07-17 |
URL | https://arxiv.org/abs/1807.06613v3 |
https://arxiv.org/pdf/1807.06613v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-swarm-systems |
Repo | https://github.com/LCAS/deep_rl_for_swarms |
Framework | none |
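A hedged sketch of the mean-embedding state representation (the 2-D agent observations, layer sizes, and action count are assumptions for illustration): each neighboring agent's observation is mapped through a small learned feature network, the embeddings are averaged, and the result feeds a policy head. The output is permutation-invariant and independent of the number of agents, which is the property the abstract emphasizes.

```python
# Sketch of a mean-embedding state representation; not the authors' exact setup.
import torch
import torch.nn as nn

class MeanEmbeddingPolicy(nn.Module):
    """Encodes each neighboring agent, averages the embeddings, and feeds the
    permutation-invariant, size-independent result to a policy head."""
    def __init__(self, obs_dim=2, embed_dim=32, n_actions=4):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, embed_dim))
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_actions))

    def forward(self, neighbor_obs):            # (n_neighbors, obs_dim)
        mean_embedding = self.phi(neighbor_obs).mean(dim=0)
        return self.head(mean_embedding)        # action logits

policy = MeanEmbeddingPolicy()
print(policy(torch.randn(7, 2)).shape)    # same output shape ...
print(policy(torch.randn(23, 2)).shape)   # ... for any number of neighbors
```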
Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling
Title | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
Authors | N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, S. Poria |
Abstract | Multimodal sentiment analysis is a very actively growing field of research. A promising area of opportunity in this field is to improve the multimodal fusion mechanism. We present a novel feature-fusion strategy that proceeds in a hierarchical fashion, first fusing the modalities two at a time and only then fusing all three modalities. On multimodal sentiment analysis of individual utterances, our strategy outperforms conventional concatenation of features by 1%, which amounts to a 5% reduction in error rate. On utterance-level multimodal sentiment analysis of multi-utterance video clips, for which current state-of-the-art techniques incorporate contextual information from other utterances of the same clip, our hierarchical fusion gives up to a 2.4% improvement (almost 10% error-rate reduction) over the currently used concatenation. The implementation of our method is publicly available as open-source code. |
Tasks | Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06228v1 |
http://arxiv.org/pdf/1806.06228v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-sentiment-analysis-using |
Repo | https://github.com/SenticNet/hfusion |
Framework | tf |
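A hedged sketch of the hierarchical-fusion idea from the entry above: fuse the modality pairs first (audio+video, audio+text, video+text), then fuse the three bimodal vectors into a single trimodal representation for classification. The feature dimensions and layer sizes below are illustrative assumptions, not the paper's configuration, and the context modeling over utterances is omitted.

```python
# Hierarchical fusion of three modalities: pairwise fusion, then trimodal fusion.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, d_a=74, d_v=35, d_t=100, d_h=64, n_classes=2):
        super().__init__()
        self.f_av = nn.Linear(d_a + d_v, d_h)
        self.f_at = nn.Linear(d_a + d_t, d_h)
        self.f_vt = nn.Linear(d_v + d_t, d_h)
        self.f_all = nn.Linear(3 * d_h, d_h)
        self.clf = nn.Linear(d_h, n_classes)

    def forward(self, a, v, t):
        av = torch.tanh(self.f_av(torch.cat([a, v], dim=-1)))   # bimodal fusions
        at = torch.tanh(self.f_at(torch.cat([a, t], dim=-1)))
        vt = torch.tanh(self.f_vt(torch.cat([v, t], dim=-1)))
        fused = torch.tanh(self.f_all(torch.cat([av, at, vt], dim=-1)))  # trimodal
        return self.clf(fused)

model = HierarchicalFusion()
print(model(torch.randn(4, 74), torch.randn(4, 35), torch.randn(4, 100)).shape)  # (4, 2)
```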
Graph Neural Networks for IceCube Signal Classification
Title | Graph Neural Networks for IceCube Signal Classification |
Authors | Nicholas Choma, Federico Monti, Lisa Gerhardt, Tomasz Palczewski, Zahra Ronaghi, Prabhat, Wahid Bhimji, Michael M. Bronstein, Spencer R. Klein, Joan Bruna |
Abstract | Tasks involving the analysis of geometric (graph- and manifold-structured) data have recently gained prominence in the machine learning community, giving birth to the rapidly developing field of geometric deep learning. In this work, we leverage graph neural networks to improve signal detection in the IceCube neutrino observatory. The IceCube detector array is modeled as a graph, where vertices are sensors and edges are a learned function of the sensors' spatial coordinates. As only a subset of IceCube's sensors is active during a given observation, we note the adaptive nature of our GNN, wherein computation is restricted to the input signal support. We demonstrate the effectiveness of our GNN architecture on a task of classifying IceCube events, where it outperforms both a traditional physics-based method and classical 3D convolutional neural networks. |
Tasks | |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06166v1 |
http://arxiv.org/pdf/1809.06166v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-neural-networks-for-icecube-signal |
Repo | https://github.com/WIPACrepo/NuIntClassification |
Framework | pytorch |
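A minimal sketch of the graph construction and message passing described above. The paper learns the edge function and uses a deeper architecture; here, as a stated assumption, edge weights come from a fixed Gaussian kernel of sensor distance, and a single graph-convolution layer aggregates features over the sensors active in one event (the "adaptive support" the abstract mentions).

```python
# One event: build a graph over the active sensors, run one graph-conv layer,
# pool to an event embedding for signal/background classification.
import torch
import torch.nn as nn

def adjacency_from_coords(coords, sigma=1.0):
    # coords: (n_active_sensors, 3); edge weight = Gaussian kernel of distance.
    d2 = torch.cdist(coords, coords).pow(2)
    adj = torch.exp(-d2 / (2 * sigma ** 2))
    deg = adj.sum(dim=1, keepdim=True)
    return adj / deg                              # row-normalized adjacency

class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                    # x: (n, in_dim)
        return torch.relu(self.lin(adj @ x))      # aggregate neighbors, then transform

coords = torch.randn(12, 3)                       # positions of active sensors
feats = torch.randn(12, 6)                        # per-sensor pulse features
adj = adjacency_from_coords(coords)
layer = GraphConvLayer(6, 16)
event_embedding = layer(feats, adj).mean(dim=0)   # pooled event representation
print(event_embedding.shape)
```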
Multimodal Speech Emotion Recognition Using Audio and Text
Title | Multimodal Speech Emotion Recognition Using Audio and Text |
Authors | Seunghyun Yoon, Seokhyun Byun, Kyomin Jung |
Abstract | Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%. |
Tasks | Emotion Classification, Emotion Recognition, Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Speech Emotion Recognition |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04635v1 |
http://arxiv.org/pdf/1810.04635v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-speech-emotion-recognition-using |
Repo | https://github.com/david-yoon/multimodal-speech-emotion |
Framework | tf |
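A hedged sketch of a dual recurrent encoder as described in the abstract: one GRU encodes the audio frames, another encodes the word embeddings of the transcript, and the final hidden states are concatenated for classification over the four emotion classes. The feature dimensions below are illustrative assumptions, not the paper's.

```python
# Dual recurrent encoder: separate GRUs for audio and text, fused at the end.
import torch
import torch.nn as nn

class DualRecurrentEncoder(nn.Module):
    def __init__(self, d_audio=39, d_text=300, d_h=128, n_classes=4):
        super().__init__()
        self.audio_rnn = nn.GRU(d_audio, d_h, batch_first=True)
        self.text_rnn = nn.GRU(d_text, d_h, batch_first=True)
        self.clf = nn.Linear(2 * d_h, n_classes)

    def forward(self, audio, text):             # (B, T_a, d_audio), (B, T_t, d_text)
        _, h_audio = self.audio_rnn(audio)      # final hidden state: (1, B, d_h)
        _, h_text = self.text_rnn(text)
        fused = torch.cat([h_audio[-1], h_text[-1]], dim=-1)
        return self.clf(fused)                  # logits over {angry, happy, sad, neutral}

model = DualRecurrentEncoder()
print(model(torch.randn(2, 100, 39), torch.randn(2, 20, 300)).shape)  # (2, 4)
```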
Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures
Title | Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures |
Authors | Hongyang Zhang, Susu Xu, Jiantao Jiao, Pengtao Xie, Ruslan Salakhutdinov, Eric P. Xing |
Abstract | We study the problem of alleviating the instability issue in the GAN training procedure via new architecture design. The discrepancy between the minimax and maximin objective values can serve as a proxy for the difficulties that alternating gradient descent encounters in the optimization of GANs. In this work, we give new results on the benefits of multi-generator GAN architectures. We show that the minimax gap shrinks to $\epsilon$ as the number of generators increases with rate $\widetilde{O}(1/\epsilon)$. This improves over the best-known result of $\widetilde{O}(1/\epsilon^2)$. At the core of our techniques is a novel application of the Shapley-Folkman lemma to the generic minimax problem, where previously the technique was only known to work when the objective function is restricted to the Lagrangian of a constrained optimization problem. Our proposed Stackelberg GAN performs well experimentally on both synthetic and real-world datasets, improving Fréchet Inception Distance by 14.61% over previous multi-generator GANs on the benchmark datasets. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.08010v1 |
http://arxiv.org/pdf/1811.08010v1.pdf | |
PWC | https://paperswithcode.com/paper/stackelberg-gan-towards-provable-minimax |
Repo | https://github.com/hongyanz/Stackelberg-GAN |
Framework | pytorch |
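A toy illustration of the multi-generator setup the abstract argues for (a generic sketch, not the paper's Stackelberg training procedure): K generators share one discriminator, the discriminator's fake loss is averaged over the generators, and each generator tries to fool the shared discriminator. Network sizes and the 2-D data are arbitrary assumptions.

```python
# Multi-generator GAN objective: one discriminator vs. an ensemble of K generators.
import torch
import torch.nn as nn

K, z_dim, x_dim = 4, 8, 2
generators = nn.ModuleList(
    [nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim)) for _ in range(K)])
discriminator = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

def gan_losses(real_batch):
    zs = torch.randn(K, real_batch.size(0), z_dim)
    fakes = [g(z) for g, z in zip(generators, zs)]
    # Discriminator: real vs. the ensemble of generators (fake loss averaged over K).
    d_loss = bce(discriminator(real_batch), torch.ones(real_batch.size(0), 1))
    d_loss = d_loss + sum(bce(discriminator(f.detach()), torch.zeros(f.size(0), 1))
                          for f in fakes) / K
    # Generators: each tries to fool the shared discriminator.
    g_loss = sum(bce(discriminator(f), torch.ones(f.size(0), 1)) for f in fakes) / K
    return d_loss, g_loss

print(gan_losses(torch.randn(16, x_dim)))
```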
Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations
Title | Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations |
Authors | Maziar Raissi |
Abstract | A long-standing problem at the interface of artificial intelligence and applied mathematics is to devise an algorithm capable of achieving human-level or even superhuman proficiency in transforming observed data into predictive mathematical models of the physical world. In the current era of abundant data and advanced machine learning capabilities, a natural question arises: how can we automatically uncover the underlying laws of physics from high-dimensional data generated from experiments? In this work, we put forth a deep learning approach for discovering nonlinear partial differential equations from scattered and potentially noisy observations in space and time. Specifically, we approximate the unknown solution as well as the nonlinear dynamics by two deep neural networks. The first network acts as a prior on the unknown solution and essentially enables us to avoid numerical differentiations, which are inherently ill-conditioned and unstable. The second network represents the nonlinear dynamics and helps us distill the mechanisms that govern the evolution of a given spatiotemporal dataset. We test the effectiveness of our approach on several benchmark problems spanning a number of scientific domains and demonstrate how the proposed framework can help us accurately learn the underlying dynamics and forecast future states of the system. In particular, we study the Burgers', Korteweg-de Vries (KdV), Kuramoto-Sivashinsky, nonlinear Schrödinger, and Navier-Stokes equations. |
Tasks | |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06637v1 |
http://arxiv.org/pdf/1801.06637v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-hidden-physics-models-deep-learning-of |
Repo | https://github.com/maziarraissi/DeepHPMs |
Framework | tf |
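A minimal PyTorch sketch of the two-network construction (toy sizes, one spatial dimension, derivatives up to second order; the paper's code is in TensorFlow and handles more general settings): one network approximates the solution u(t, x), a second network approximates the hidden dynamics N(u, u_x, u_xx), and automatic differentiation forms the PDE residual f = u_t - N(...), which should vanish on the true dynamics.

```python
# Deep-hidden-physics-style residual: u-network + dynamics-network + autograd.
import torch
import torch.nn as nn

u_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                      nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
n_net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))

def pde_residual(t, x):
    t, x = t.requires_grad_(True), x.requires_grad_(True)
    u = u_net(torch.cat([t, x], dim=1))
    grad = lambda out, inp: torch.autograd.grad(out, inp, torch.ones_like(out),
                                                create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t - n_net(torch.cat([u, u_x, u_xx], dim=1))

t, x = torch.rand(64, 1), torch.rand(64, 1)
f = pde_residual(t, x)
# In training, the loss combines a data-fit term on u and the mean squared residual on f.
print(f.shape)
```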
A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign
Title | A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign |
Authors | Pham Quang Nhat Minh |
Abstract | In this report, we describe our participating named-entity recognition system at the VLSP 2018 evaluation campaign. We formalized the task as a sequence labeling problem using the BIO encoding scheme. We applied a feature-based model which combines word features, word-shape features, Brown-cluster-based features, and word-embedding-based features. We compare several methods of dealing with nested entities in the dataset. We show that combining the tags of entities at all levels for training a sequence labeling model (the joint-tag model) improves the accuracy of nested named-entity recognition. |
Tasks | Named Entity Recognition, Nested Named Entity Recognition |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08463v1 |
http://arxiv.org/pdf/1803.08463v1.pdf | |
PWC | https://paperswithcode.com/paper/a-feature-based-model-for-nested-named-entity |
Repo | https://github.com/minhpqn/vietner |
Framework | none |
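A hedged illustration of the joint-tag idea mentioned in the abstract: BIO tags from every nesting level are concatenated into a single label per token, so a standard flat sequence labeler can be trained; the concrete "+"-joined representation and the example entity below are illustrative assumptions, not the paper's exact encoding.

```python
# Joint-tag encoding for nested NER: one composite label per token.
def to_joint_tags(level_tags):
    """level_tags: list of per-level BIO tag sequences, outermost level first."""
    return ["+".join(tags) for tags in zip(*level_tags)]

def from_joint_tags(joint_tags, n_levels):
    """Split joint tags back into one BIO sequence per nesting level."""
    split = [tag.split("+") for tag in joint_tags]
    return [[tok[i] for tok in split] for i in range(n_levels)]

# Toy nesting: an organization name that contains a location.
level1 = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG"]
level2 = ["O",     "O",     "O",     "B-LOC", "I-LOC"]
joint = to_joint_tags([level1, level2])
print(joint)                                 # ['B-ORG+O', ..., 'I-ORG+I-LOC']
print(from_joint_tags(joint, 2) == [level1, level2])  # round-trips to both levels
```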
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Title | edge2vec: Representation learning using edge semantics for biomedical knowledge discovery |
Authors | Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding |
Abstract | Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs while taking edge semantics into account. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology and for its applicability in the real-world context of biomedical knowledge discovery. |
Tasks | Information Retrieval, Knowledge Graphs, Representation Learning |
Published | 2018-09-07 |
URL | https://arxiv.org/abs/1809.02269v3 |
https://arxiv.org/pdf/1809.02269v3.pdf | |
PWC | https://paperswithcode.com/paper/edge2vec-representation-learning-using-edge |
Repo | https://github.com/RoyZhengGao/edge2vec |
Framework | none |
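A simplified sketch of the walk-generation step (not the released edge2vec code): on a heterogeneous graph, the next edge is chosen with probability proportional to an edge-type transition matrix M[prev_type][next_type]. In edge2vec, M is estimated with an EM procedure and the resulting walks are fed to a skip-gram model to learn node embeddings; the toy graph and weights below are assumptions for illustration.

```python
# Edge-type-biased random walk on a small heterogeneous graph.
import random

def edge_type_walk(adj, M, start, length, rng=random):
    """adj: node -> list of (neighbor, edge_type); M: edge-type transition weights."""
    walk, prev_type = [start], None
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        weights = [1.0 if prev_type is None else M[prev_type][etype] for _, etype in nbrs]
        nxt, prev_type = rng.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

# Toy biomedical graph: a gene (g1), a compound (c1), and a disease (d1).
adj = {"g1": [("c1", "binds"), ("d1", "associates")],
       "c1": [("g1", "binds"), ("d1", "treats")],
       "d1": [("g1", "associates"), ("c1", "treats")]}
M = {"binds":      {"binds": 0.2, "treats": 1.0, "associates": 0.5},
     "treats":     {"binds": 1.0, "treats": 0.2, "associates": 0.5},
     "associates": {"binds": 0.5, "treats": 0.5, "associates": 0.2}}
print(edge_type_walk(adj, M, "g1", length=6))
```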
Mobile Sensor Data Anonymization
Title | Mobile Sensor Data Anonymization |
Authors | Mohammad Malekzadeh, Richard G. Clegg, Andrea Cavallaro, Hamed Haddadi |
Abstract | Motion sensors such as accelerometers and gyroscopes measure the instant acceleration and rotation of a device in three dimensions. Raw data streams from motion sensors embedded in portable and wearable devices may reveal private information about users without their awareness. For example, motion data might disclose the weight or gender of a user, or enable their re-identification. To address this problem, we propose an on-device transformation of sensor data to be shared for specific applications, such as monitoring selected daily activities, without revealing information that enables user identification. We formulate the anonymization problem using an information-theoretic approach and propose a new multi-objective loss function for training deep autoencoders. This loss function helps minimize user-identity information as well as data distortion, in order to preserve the application-specific utility. The training process regulates the encoder to disregard user-identifiable patterns and tunes the decoder to shape the output independently of the users in the training set. The trained autoencoder can be deployed on a mobile or wearable device to anonymize sensor data even for users who are not included in the training dataset. Data from 24 users transformed by the proposed anonymizing autoencoder lead to a promising trade-off between utility and privacy, with an accuracy for activity recognition above 92% and an accuracy for user identification below 7%. |
Tasks | Activity Recognition |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11546v3 |
http://arxiv.org/pdf/1810.11546v3.pdf | |
PWC | https://paperswithcode.com/paper/mobile-sensor-data-anonymization |
Repo | https://github.com/mmalekzadeh/motion-sense |
Framework | none |
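A hedged sketch of the shape of such a multi-objective loss (the architectures, weights, and the way identity information is penalized here are illustrative assumptions, not the paper's formulation): an autoencoder is trained to limit distortion and keep activity-relevant content while an identity classifier's output on the latent code is pushed toward the uniform distribution over users.

```python
# Multi-objective anonymization loss: reconstruction + utility + privacy terms.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 128))
activity_clf = nn.Linear(16, 4)      # utility task (e.g. 4 activities)
identity_clf = nn.Linear(16, 24)     # privacy probe (e.g. 24 users)

def anonymization_loss(x, activity_y, alpha=1.0, beta=1.0):
    z = enc(x)
    recon = F.mse_loss(dec(z), x)                            # limit data distortion
    utility = F.cross_entropy(activity_clf(z), activity_y)   # keep task information
    # Push the identity probe's prediction toward uniform over users, i.e.
    # penalize identity information that survives in the latent code.
    id_logp = F.log_softmax(identity_clf(z), dim=1)
    privacy = -id_logp.mean()        # cross-entropy to the uniform distribution
    return recon + alpha * utility + beta * privacy

x = torch.randn(32, 128)                       # one batch of sensor-feature windows
activity_y = torch.randint(0, 4, (32,))
print(anonymization_loss(x, activity_y))
```

In the actual method the identity branch acts as an adversary trained to re-identify users while the encoder learns to defeat it; the fixed probe above only illustrates how the three terms combine.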
Large-Scale Stochastic Sampling from the Probability Simplex
Title | Large-Scale Stochastic Sampling from the Probability Simplex |
Authors | Jack Baker, Paul Fearnhead, Emily B Fox, Christopher Nemeth |
Abstract | Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that, because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces, i.e., when many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces. To avoid the biases caused by this discretization error, we propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error, and we prove that samples from the SCIR process are asymptotically unbiased. We discuss how this idea can be extended to target other constrained spaces. Use of the SCIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches. |
Tasks | Bayesian Inference, Topic Models |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07137v2 |
http://arxiv.org/pdf/1806.07137v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-stochastic-sampling-from-the |
Repo | https://github.com/jbaker92/scir |
Framework | none |
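An illustrative numerical aside on why a CIR-based scheme can be discretization-free: the transition law of a standard CIR process is known in closed form (a scaled noncentral chi-squared), so it can be simulated exactly, and normalizing independent components maps onto the probability simplex. The sketch below shows only this generic exact CIR step under assumed parameters; it is not the paper's stochastic-gradient SCIR algorithm.

```python
# Exact (discretization-free) transition of dX = a(b - X)dt + sigma*sqrt(X) dW,
# applied component-wise, then normalized onto the simplex.
import numpy as np

def cir_exact_step(x, a, b, sigma, h, rng):
    c = sigma**2 * (1.0 - np.exp(-a * h)) / (4.0 * a)
    df = 4.0 * a * b / sigma**2
    nonc = x * np.exp(-a * h) / c
    return c * rng.noncentral_chisquare(df, nonc)

rng = np.random.default_rng(0)
alpha = np.array([0.1, 0.1, 2.0, 5.0])       # sparse Dirichlet-like concentration
x = np.full(4, 0.5)
for _ in range(1000):                         # independent exact CIR steps per component
    x = cir_exact_step(x, a=1.0, b=alpha, sigma=np.sqrt(2.0), h=0.1, rng=rng)
theta = x / x.sum()                           # a point on the probability simplex
print(theta)
```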
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Title | Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos |
Authors | Dawei Liang, Edison Thomaz |
Abstract | Over the years, activity sensing and recognition has been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and leveraging new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not require further feature processing or outlier filtering as in prior work. We evaluated our approach in the context of Activities of Daily Living (ADL) by recognizing 15 everyday activities with 14 participants in their own homes, achieving 64.2% and 83.6% averaged within-subject accuracy in terms of top-1 and top-3 classification, respectively. Individual class performance is also examined to further study the co-occurrence characteristics of the activities and the robustness of the framework. |
Tasks | Activity Recognition |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08691v2 |
http://arxiv.org/pdf/1810.08691v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-based-activities-of-daily-living-adl |
Repo | https://github.com/dawei-liang/AudioAR_Research_Codes |
Framework | tf |
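A sketch under stated assumptions of the overall recipe (embeddings, class oversampling, then a classifier): random 128-d vectors stand in for the audio embeddings extracted from public video sound clips, minority classes are oversampled by simple resampling, and an off-the-shelf classifier is trained. The paper's deep models and its real embedding pipeline are not reproduced here.

```python
# Oversample imbalanced audio-embedding data, then fit a classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))                         # placeholder audio embeddings
class_weights = np.linspace(1, 3, 15)
y = rng.choice(15, size=500, p=class_weights / class_weights.sum())  # imbalanced labels

# Oversample every class up to the size of the largest one.
counts = np.bincount(y, minlength=15)
idx = np.concatenate([rng.choice(np.flatnonzero(y == c), counts.max(), replace=True)
                      for c in range(15) if counts[c] > 0])

clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200).fit(X[idx], y[idx])
print(clf.predict(X[:3]))
```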
Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity
Title | Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity |
Authors | Claudia Carpineti, Vincenzo Lomonaco, Luca Bedogni, Marco Di Felice, Luciano Bononi |
Abstract | Making applications aware of the mobility experienced by the user can open the door to a wide range of novel services in different use cases, from smart parking to vehicular traffic monitoring. In the literature, there are many studies demonstrating the theoretical possibility of performing Transportation Mode Detection (TMD) by mining data from smartphone-embedded sensors. However, very few of them provide details on the benchmarking process and on how to implement the detection process in practice. In this study, we provide guidelines and fundamental results that can be useful for both researchers and practitioners aiming to implement a working TMD system. These guidelines consist of three main contributions. First, we detail the construction of a training dataset, gathered from heterogeneous users and including five different transportation modes; the dataset is made available to the research community as a reference benchmark. Second, we provide an in-depth analysis of sensor relevance for the case of Dual TMD, which is required by most mobility-aware applications. Third, we investigate the possibility of performing TMD for unknown users/instances not present in the training set, and we compare with state-of-the-art Android APIs for activity recognition. |
Tasks | Activity Recognition |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05596v1 |
http://arxiv.org/pdf/1810.05596v1.pdf | |
PWC | https://paperswithcode.com/paper/custom-dual-transportation-mode-detection-by |
Repo | https://github.com/vlomonaco/US-TransportationMode |
Framework | none |
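A minimal sketch of one practical way to build a dual TMD classifier (not the released benchmark code; the synthetic data, window length, features, and model choice are all assumptions): window the accelerometer magnitude, compute a few summary statistics per window, and fit a random forest to separate two modes.

```python
# Windowed accelerometer features + random forest for dual transportation-mode detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(magnitude, win=128):
    wins = magnitude[: len(magnitude) // win * win].reshape(-1, win)
    return np.column_stack([wins.mean(1), wins.std(1), wins.min(1), wins.max(1)])

rng = np.random.default_rng(1)
still = 9.8 + 0.05 * rng.standard_normal(128 * 200)            # synthetic "still" magnitude
walking = 9.8 + 2.0 * np.abs(rng.standard_normal(128 * 200))   # synthetic "walking" magnitude
X = np.vstack([window_features(still), window_features(walking)])
y = np.array([0] * 200 + [1] * 200)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))   # training accuracy on the toy data
```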
Object Level Visual Reasoning in Videos
Title | Object Level Visual Reasoning in Videos |
Authors | Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori |
Abstract | Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehension of interactions between actors and objects in a scene. We propose a model capable of learning to reason about semantically meaningful spatiotemporal interactions in videos. The key to our approach is the choice of performing this reasoning at the object level through the integration of state-of-the-art object detection networks. This allows the model to learn detailed spatial interactions that exist at a semantic, object-interaction level. We evaluate our method on three standard datasets (Twenty-BN Something-Something, VLOG and EPIC Kitchens) and achieve state-of-the-art results on all of them. Finally, we show visualizations of the interactions learned by the model, which illustrate object classes and their interactions corresponding to different activity classes. |
Tasks | Activity Recognition, Human Activity Recognition, Object Detection, Visual Reasoning |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06157v3 |
http://arxiv.org/pdf/1806.06157v3.pdf | |
PWC | https://paperswithcode.com/paper/object-level-visual-reasoning-in-videos |
Repo | https://github.com/fabienbaradel/object_level_visual_reasoning |
Framework | pytorch |
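A hedged sketch of object-level reasoning in the spirit of relation networks (not the paper's full model, which also reasons over time and integrates a detector): per-object features pooled from a detector are paired, each pair is scored with a shared MLP, and the aggregated relation vector is classified into an activity. All sizes below are illustrative assumptions.

```python
# Pairwise object-level reasoning over detector features for activity classification.
import torch
import torch.nn as nn

class ObjectPairReasoning(nn.Module):
    def __init__(self, d_obj=256, d_h=128, n_classes=174):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * d_obj, d_h), nn.ReLU(), nn.Linear(d_h, d_h))
        self.clf = nn.Linear(d_h, n_classes)

    def forward(self, objs):                         # objs: (n_objects, d_obj)
        n = objs.size(0)
        pairs = torch.cat([objs.unsqueeze(0).expand(n, n, -1),
                           objs.unsqueeze(1).expand(n, n, -1)], dim=-1)
        relation = self.g(pairs.reshape(n * n, -1)).sum(dim=0)   # aggregate all object pairs
        return self.clf(relation)                    # activity logits

model = ObjectPairReasoning()
print(model(torch.randn(6, 256)).shape)              # logits over activity classes
```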