Paper Group AWR 273
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
Title | Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments |
Authors | Giovanni Morrone, Luca Pasa, Vadim Tikhanoff, Sonia Bergamaschi, Luciano Fadiga, Leonardo Badino |
Abstract | In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks are used by LSTM-based models to generate time-frequency masks which are applied to the acoustic mixed-speech spectrogram. Results show that: (i) landmark motion features are very effective for this task, (ii) similarly to previous work, reconstruction of the target speaker’s spectrogram mediated by masking is significantly more accurate than direct spectrogram reconstruction, and (iii) the best masks depend on both landmark motion features and the input mixed-speech spectrogram. To the best of our knowledge, our proposed models are the first trained and evaluated on the limited-size GRID and TCD-TIMIT datasets that achieve speaker-independent speech enhancement in a multi-talker setting. |
Tasks | Speech Enhancement, Speech Separation |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02480v3 |
PDF | https://arxiv.org/pdf/1811.02480v3.pdf |
PWC | https://paperswithcode.com/paper/face-landmark-based-speaker-independent-audio |
Repo | https://github.com/dr-pato/audio_visual_speech_enhancement |
Framework | tf |
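A minimal PyTorch sketch of the landmark-driven masking pipeline the abstract describes: a BLSTM consumes landmark motion features together with the mixed-speech spectrogram and emits a sigmoid time-frequency mask. All layer sizes and input dimensions are illustrative assumptions; the authors' actual TensorFlow model (repo above) differs in detail.

```python
# Sketch only: shapes and layer sizes are assumptions, not the authors' model.
import torch
import torch.nn as nn

class LandmarkMaskNet(nn.Module):
    def __init__(self, n_landmarks=68, n_freq=257, hidden=250):
        super().__init__()
        # Per frame: landmark motion (dx, dy per point) + one spectrogram frame.
        in_dim = 2 * n_landmarks + n_freq
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.mask = nn.Linear(2 * hidden, n_freq)

    def forward(self, landmark_motion, mix_spec):
        # landmark_motion: (B, T, 2*n_landmarks); mix_spec: (B, T, n_freq)
        x = torch.cat([landmark_motion, mix_spec], dim=-1)
        h, _ = self.blstm(x)
        m = torch.sigmoid(self.mask(h))   # time-frequency mask in [0, 1]
        return m * mix_spec               # masked (enhanced) spectrogram

net = LandmarkMaskNet()
motion = torch.randn(1, 100, 136)   # 68 landmarks x (dx, dy)
mix = torch.rand(1, 100, 257)       # magnitude spectrogram of the mixture
enhanced = net(motion, mix)         # train with e.g. MSE against the clean target
```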
General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Title | General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms |
Authors | Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas |
Abstract | General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, the General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers with a common open-source and easy-to-use platform for testing their AI methods on a potentially infinite number of games created using the Video Game Description Language (VGDL). The framework has been expanded into several tracks during the last few years to meet the demands of different research directions. The agents are required either to play multiple unknown games with or without access to game simulations, or to design new game levels or rules. This survey paper presents VGDL, the GVGAI framework, and the existing tracks, and reviews the wide use of the GVGAI framework in research, education and competitions five years after its birth. A future plan of framework improvements is also described. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10363v4 |
PDF | http://arxiv.org/pdf/1802.10363v4.pdf |
PWC | https://paperswithcode.com/paper/general-video-game-ai-a-multi-track-framework |
Repo | https://github.com/aadharna/UntouchableThunder |
Framework | pytorch |
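To make the "play with access to game simulations" tracks concrete, here is a one-step-lookahead agent in the spirit of GVGAI's planning tracks. The official framework is Java; the `copy()`/`advance()`/`get_score()` interface below is a hypothetical Python stand-in (with a dummy state so the sketch executes), not a real GVGAI API.

```python
# Hypothetical forward-model interface, in the spirit of GVGAI planning tracks.
def act(state, actions):
    """Pick the action whose simulated successor state scores best."""
    best_action, best_score = actions[0], float('-inf')
    for action in actions:
        future = state.copy()      # planning tracks expose a forward model
        future.advance(action)     # simulate one step
        if future.get_score() > best_score:
            best_action, best_score = action, future.get_score()
    return best_action

class _DummyState:                 # stand-in so the sketch runs end to end
    def __init__(self, score=0): self.score = score
    def copy(self): return _DummyState(self.score)
    def advance(self, a): self.score += a
    def get_score(self): return self.score

print(act(_DummyState(), actions=[-1, 0, 1]))   # -> 1
```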
Affinity Derivation and Graph Merge for Instance Segmentation
Title | Affinity Derivation and Graph Merge for Instance Segmentation |
Authors | Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu |
Abstract | We present an instance segmentation scheme based on pixel affinity information, i.e., the likelihood that two pixels belong to the same instance. In our scheme, we use two neural networks with similar structures: one predicts pixel-level semantic scores and the other derives pixel affinities. Regarding pixels as vertices and affinities as edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experimental results show that our scheme generates fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10870v1 |
PDF | http://arxiv.org/pdf/1811.10870v1.pdf |
PWC | https://paperswithcode.com/paper/affinity-derivation-and-graph-merge-for |
Repo | https://github.com/xck36/GMIS |
Framework | tf |
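The core of the graph merge step can be shown in a few lines: treat pixels as vertices, predicted affinities as weighted edges, and greedily merge with union-find. The paper's algorithm additionally uses semantic scores and more careful ordering; this NumPy sketch only illustrates the clustering idea.

```python
# Simplified graph-merge clustering over predicted pixel affinities.
import numpy as np

def graph_merge(affinities, edges, threshold=0.5):
    """affinities: (E,) scores in [0,1]; edges: (E,2) pixel-index pairs."""
    n = int(edges.max()) + 1
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for e in np.argsort(-affinities):      # merge high-affinity pairs first
        if affinities[e] < threshold:
            break
        a, b = find(edges[e, 0]), find(edges[e, 1])
        if a != b:
            parent[a] = b
    return np.array([find(i) for i in range(n)])  # instance label per pixel

labels = graph_merge(np.random.rand(10), np.random.randint(0, 6, size=(10, 2)))
```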
SNIPER: Efficient Multi-Scale Training
Title | SNIPER: Efficient Multi-Scale Training |
Authors | Bharat Singh, Mahyar Najibi, Larry S. Davis |
Abstract | We present SNIPER, an algorithm for performing efficient multi-scale training in instance-level visual recognition tasks. Instead of processing every pixel in an image pyramid, SNIPER processes context regions around ground-truth instances (referred to as chips) at the appropriate scale. For background sampling, these context regions are generated using proposals extracted from a region proposal network trained with a short learning schedule. Hence, the number of chips generated per image during training adaptively changes based on the scene complexity. SNIPER only processes 30% more pixels compared to the commonly used single-scale training at 800x1333 pixels on the COCO dataset. But it also observes samples from extreme resolutions of the image pyramid, like 1400x2000 pixels. As SNIPER operates on resampled low-resolution chips (512x512 pixels), it can have a batch size as large as 20 on a single GPU even with a ResNet-101 backbone. Therefore, it can benefit from batch normalization during training without the need for synchronizing batch-normalization statistics across GPUs. SNIPER brings training of instance-level recognition tasks like object detection closer to the protocol for image classification, and suggests that the commonly accepted guideline that it is important to train on high-resolution images for instance-level visual recognition tasks might not be correct. Our implementation based on Faster-RCNN with a ResNet-101 backbone obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference with a single GPU. Code is available at https://github.com/MahyarNajibi/SNIPER/. |
Tasks | Object Detection |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09300v3 |
PDF | http://arxiv.org/pdf/1805.09300v3.pdf |
PWC | https://paperswithcode.com/paper/sniper-efficient-multi-scale-training |
Repo | https://github.com/Hwang64/PSIS |
Framework | none |
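A sketch of the scale-aware chip selection that makes SNIPER efficient: at each pyramid scale, keep only the ground-truth boxes whose scaled size falls in a valid range, then crop fixed-size chips around them. The valid range and chip size below follow the paper's spirit, not its exact configuration, and a full implementation would also greedily cover all kept boxes with overlapping chips.

```python
# Simplified positive-chip sampling for one image-pyramid scale.
import numpy as np

def positive_chips(boxes, scale, chip=512, valid=(32, 256)):
    """boxes: (N,4) as [x1,y1,x2,y2] in original pixels; returns chip centers."""
    b = boxes * scale
    sizes = np.sqrt((b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]))
    keep = (sizes >= valid[0]) & (sizes <= valid[1])   # in-range at this scale
    centers = np.stack([(b[keep, 0] + b[keep, 2]) / 2,
                        (b[keep, 1] + b[keep, 3]) / 2], axis=1)
    return centers   # place chip x chip crops around these centers

print(positive_chips(np.array([[10., 10., 200., 180.]]), scale=1.5))
```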
Depth-aware CNN for RGB-D Segmentation
Title | Depth-aware CNN for RGB-D Segmentation |
Authors | Weiyue Wang, Ulrich Neumann |
Abstract | Convolutional neural networks (CNNs) are limited in their ability to handle geometric information due to their fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds; these approaches suffer from high computation and memory cost. To address these issues, we present the Depth-aware CNN, introducing two intuitive, flexible and effective operations: depth-aware convolution and depth-aware average pooling. By leveraging depth similarity between pixels in the process of information propagation, geometry is seamlessly incorporated into the CNN. Without introducing any additional parameters, both operators can be easily integrated into existing CNNs. Extensive experiments and ablation studies on challenging RGB-D semantic segmentation benchmarks validate the effectiveness and flexibility of our approach. |
Tasks | Semantic Segmentation |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06791v1 |
PDF | http://arxiv.org/pdf/1803.06791v1.pdf |
PWC | https://paperswithcode.com/paper/depth-aware-cnn-for-rgb-d-segmentation |
Repo | https://github.com/laughtervv/DepthAwareCNN |
Framework | pytorch |
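A minimal PyTorch sketch of depth-aware convolution: each neighbor's contribution is re-weighted by a depth-similarity term exp(-alpha * |depth_neighbor - depth_center|) before the usual weighted sum. Kernel size 3, stride 1; `alpha` is a free sharpness parameter here, and this is an illustration of the operator, not the authors' CUDA implementation.

```python
# Depth-aware 3x3 convolution via unfold; a sketch, not the authors' code.
import torch
import torch.nn.functional as F

def depth_aware_conv(x, depth, weight, alpha=1.0):
    # x: (B,C,H,W), depth: (B,1,H,W), weight: (Cout,C,3,3)
    B, C, H, W = x.shape
    cols = F.unfold(x, 3, padding=1)                  # (B, C*9, H*W)
    dcols = F.unfold(depth, 3, padding=1)             # (B, 9, H*W)
    center = depth.reshape(B, 1, H * W)
    sim = torch.exp(-alpha * (dcols - center).abs())  # depth similarity per tap
    cols = cols.reshape(B, C, 9, H * W) * sim.unsqueeze(1)
    out = torch.einsum('ock,bckp->bop', weight.flatten(2), cols)
    return out.reshape(B, -1, H, W)

x = torch.randn(1, 4, 8, 8); d = torch.rand(1, 1, 8, 8)
w = torch.randn(6, 4, 3, 3)
y = depth_aware_conv(x, d, w)   # (1, 6, 8, 8)
```

Note that the operator adds no parameters over a plain convolution, matching the abstract's claim: the depth term only modulates existing kernel weights.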
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Title | Unsupervised Learning of Object Landmarks through Conditional Image Generation |
Authors | Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi |
Abstract | We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications. |
Tasks | Conditional Image Generation, Image Generation, Unsupervised Facial Landmark Detection |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07823v2 |
PDF | http://arxiv.org/pdf/1806.07823v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-object-landmarks |
Repo | https://github.com/tomasjakab/imm |
Framework | tf |
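The "tight geometry bottleneck" can be sketched as a differentiable soft-argmax: K feature heatmaps are collapsed into K (x, y) coordinates via a spatial softmax and expectation, then re-rendered as Gaussian maps for the conditional decoder. Map sizes and sigma below are illustrative.

```python
# Soft-argmax geometry bottleneck + Gaussian re-rendering (PyTorch sketch).
import torch

def heatmaps_to_landmarks(h):
    # h: (B, K, H, W) raw heatmaps
    B, K, H, W = h.shape
    p = torch.softmax(h.reshape(B, K, -1), dim=-1).reshape(B, K, H, W)
    ys = torch.linspace(-1, 1, H).reshape(1, 1, H, 1)
    xs = torch.linspace(-1, 1, W).reshape(1, 1, 1, W)
    x = (p * xs).sum(dim=(2, 3))        # expected x per landmark
    y = (p * ys).sum(dim=(2, 3))        # expected y per landmark
    return torch.stack([x, y], dim=-1)  # (B, K, 2) differentiable coordinates

def render_gaussians(coords, H=16, W=16, sigma=0.1):
    ys = torch.linspace(-1, 1, H).reshape(1, 1, H, 1)
    xs = torch.linspace(-1, 1, W).reshape(1, 1, 1, W)
    dx = xs - coords[..., 0].reshape(*coords.shape[:2], 1, 1)
    dy = ys - coords[..., 1].reshape(*coords.shape[:2], 1, 1)
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))  # (B, K, H, W)

maps = render_gaussians(heatmaps_to_landmarks(torch.randn(2, 10, 16, 16)))
```

Because only coordinates survive the bottleneck, appearance information must flow through the other (appearance) encoder, which is what forces the heatmaps to become landmark-like.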
Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE
Title | Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE |
Authors | Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy, Adway Mitra |
Abstract | Classifying time series data using neural networks is a challenging problem when the length of the data varies. Video object trajectories, which are key to many visual surveillance applications, are often found to be of varying length. If such trajectories are used to understand the behavior (normal or anomalous) of moving objects, they need to be represented correctly. In this paper, we propose video object trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and Variational Autoencoder (VAE) architecture. First, we introduce a high-level representation of object trajectories using a color gradient form. In the next stage, a semi-supervised approach based on Temporal Unknown Incremental Clustering (TUIC) is applied to label the classes of the extracted trajectories. Anomalous trajectories are separated using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is used for trajectory classification and anomaly detection. The results obtained using publicly available surveillance video datasets reveal that the proposed method can successfully identify some important traffic anomalies, such as vehicles not following lane driving, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions. The proposed method detects the above anomalies with higher accuracy than existing anomaly detection methods. |
Tasks | Anomaly Detection, Time Series |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07203v1 |
PDF | http://arxiv.org/pdf/1812.07203v1.pdf |
PWC | https://paperswithcode.com/paper/video-trajectory-classification-and-anomaly |
Repo | https://github.com/lisaong/hss |
Framework | tf |
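The "color gradient" representation can be sketched simply: rasterize a variable-length (x, y) track into a fixed-size RGB image whose color encodes normalized time, so a CNN sees both path shape and direction. The color mapping and image size below are illustrative assumptions, not the paper's exact encoding.

```python
# Rasterize a trajectory into a time-colored image for CNN input.
import numpy as np

def trajectory_to_image(track, size=64):
    """track: (T, 2) pixel coordinates; returns (size, size, 3) float image."""
    img = np.zeros((size, size, 3), dtype=np.float32)
    t = track - track.min(axis=0)
    t = t / max(t.max(), 1e-6) * (size - 1)
    for i, (x, y) in enumerate(t.astype(int)):
        c = i / max(len(t) - 1, 1)        # 0 at start, 1 at end
        img[y, x] = (c, 0.0, 1.0 - c)     # blue -> red over time
    return img

img = trajectory_to_image(np.cumsum(np.random.randn(120, 2), axis=0))
```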
Landmine Detection Using Autoencoders on Multi-polarization GPR Volumetric Data
Title | Landmine Detection Using Autoencoders on Multi-polarization GPR Volumetric Data |
Authors | Paolo Bestagini, Federico Lombardi, Maurizio Lualdi, Francesco Picetti, Stefano Tubaro |
Abstract | Buried landmines and unexploded remnants of war are a constant threat for the population of many countries that have been hit by wars in past years. The huge number of human lives lost due to this phenomenon has been a strong motivation for the research community toward the development of safe and robust techniques designed for landmine clearance. Nonetheless, detecting and localizing buried landmines with high precision in an automatic fashion is still considered a challenging task due to the many different boundary conditions that characterize this problem (e.g., several kinds of objects to detect, different soils and meteorological conditions, etc.). In this paper, we propose a novel technique for buried object detection tailored to unexploded landmine discovery. The proposed solution exploits a specific kind of convolutional neural network (CNN) known as an autoencoder to analyze volumetric data acquired with ground penetrating radar (GPR) using different polarizations. This method works in an anomaly detection framework: we train the autoencoder only on GPR data acquired over landmine-free areas. The system then recognizes landmines as objects that are dissimilar to the soil used during the training step. Experiments conducted on real data show that the proposed technique requires little training and no ad-hoc data pre-processing to achieve accuracy higher than 93% on challenging datasets. |
Tasks | Anomaly Detection, Object Detection |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01316v1 |
PDF | http://arxiv.org/pdf/1810.01316v1.pdf |
PWC | https://paperswithcode.com/paper/landmine-detection-using-autoencoders-on |
Repo | https://github.com/polimi-ispl/landmine_detection_autoencoder |
Framework | tf |
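The anomaly-detection setup reduces to a familiar pattern: train a convolutional autoencoder on landmine-free GPR patches only, then flag inputs with high reconstruction error. The toy architecture, patch size and scoring below are illustrative, not the paper's network.

```python
# Autoencoder-based anomaly scoring for GPR patches (PyTorch sketch).
import torch
import torch.nn as nn

ae = nn.Sequential(                       # toy encoder/decoder for 1x32x32 patches
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
)

def anomaly_score(patch):
    with torch.no_grad():
        return ((ae(patch) - patch) ** 2).mean().item()   # per-patch MSE

# After training on landmine-free data only, a threshold on anomaly_score
# separates soil-like patches from dissimilar (possibly mined) ones.
score = anomaly_score(torch.rand(1, 1, 32, 32))
```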
Adversarial Feedback Loop
Title | Adversarial Feedback Loop |
Authors | Firas Shama, Roey Mechrez, Alon Shoshan, Lihi Zelnik-Manor |
Abstract | Thanks to their remarkable generative capabilities, GANs have gained great popularity and are used abundantly in state-of-the-art methods and applications. In a GAN-based model, a discriminator is trained to learn the real data distribution. To date, it has been used only for training purposes, where it is utilized to train the generator to produce realistic-looking outputs. In this paper we propose a novel method that makes explicit use of the discriminator at test time, in a feedback manner, in order to improve the generator's results. To the best of our knowledge, this is the first time a discriminator is involved at test time. We claim that the discriminator holds significant information on the real data distribution that could be useful at test time as well, a potential that has not been explored before. The approach we propose does not alter the conventional training stage. At test time, however, it transfers the output of the generator to the discriminator, and uses feedback modules (convolutional blocks) to translate the features of the discriminator layers into corrections to the features of the generator layers, which are eventually used to obtain a better generator result. Our method can contribute to both conditional and unconditional GANs. As demonstrated by our experiments, it can improve the results of state-of-the-art networks for super-resolution and image generation. |
Tasks | Image Generation, Super-Resolution |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08126v1 |
PDF | http://arxiv.org/pdf/1811.08126v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-feedback-loop |
Repo | https://github.com/shamafiras/AFL |
Framework | pytorch |
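A toy PyTorch sketch of the test-time feedback loop: the generator output is passed to the discriminator, and a small feedback module turns discriminator features into an additive correction injected back into the generator's intermediate features. The two-layer modules below are deliberately minimal stand-ins for the paper's convolutional blocks.

```python
# Test-time adversarial feedback with toy G, D and a feedback module.
import torch
import torch.nn as nn

class G(nn.Module):                      # toy two-stage generator
    def __init__(self):
        super().__init__()
        self.f1 = nn.Linear(16, 64)
        self.f2 = nn.Linear(64, 32)
    def forward(self, z, correction=None):
        h = torch.relu(self.f1(z))
        if correction is not None:       # feedback enters between stages
            h = h + correction
        return self.f2(h)

class D(nn.Module):                      # toy discriminator exposing features
    def __init__(self):
        super().__init__()
        self.f1 = nn.Linear(32, 64)
        self.out = nn.Linear(64, 1)
    def forward(self, x):
        feats = torch.relu(self.f1(x))
        return self.out(feats), feats

g, d = G(), D()
feedback = nn.Linear(64, 64)             # trained with G and D frozen
z = torch.randn(4, 16)
x = g(z)
for _ in range(2):                       # iterative test-time refinement
    _, feats = d(x)
    x = g(z, correction=feedback(feats))
```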
Lifelong Domain Word Embedding via Meta-Learning
Title | Lifelong Domain Word Embedding via Meta-Learning |
Authors | Hu Xu, Bing Liu, Lei Shu, Philip S. Yu |
Abstract | Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks. |
Tasks | Meta-Learning, Word Embeddings |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09991v1 |
PDF | http://arxiv.org/pdf/1805.09991v1.pdf |
PWC | https://paperswithcode.com/paper/lifelong-domain-word-embedding-via-meta |
Repo | https://github.com/howardhsu/L-DEM |
Framework | none |
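The retrieval idea can be pictured as follows: for a word in the new domain, compare its aggregate context vector against the same word's context vectors from past domains, and borrow corpus data from domains where the contexts agree. Plain cosine similarity below is an assumption for illustration; the paper instead trains a meta-learner to make this relevance judgment.

```python
# Retrieving relevant past domains by context-vector agreement (illustrative).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def relevant_domains(word_ctx_new, word_ctx_past, threshold=0.7):
    """word_ctx_new: (d,) context vector; word_ctx_past: {domain: (d,) vector}."""
    return [dom for dom, v in word_ctx_past.items()
            if cosine(word_ctx_new, v) >= threshold]

past = {"laptops": np.random.rand(50), "cameras": np.random.rand(50)}
print(relevant_domains(np.random.rand(50), past))
```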
Determinantal thinning of point processes with network learning applications
Title | Determinantal thinning of point processes with network learning applications |
Authors | Bartłomiej Błaszczyszyn, Paul Keeler |
Abstract | A new type of dependent thinning for point processes in continuous space is proposed, which leverages the advantages of determinantal point processes defined on finite spaces and, as such, is particularly amenable to statistical, numerical, and simulation techniques. It gives a new point process that can serve as a network model exhibiting repulsion. The properties and functions of the new point process, such as moment measures, the Laplace functional, the void probabilities, as well as conditional (Palm) characteristics, can be estimated accurately by simulating the underlying (non-thinned) point process, which can be taken, for example, to be Poisson. This is in contrast (and preference) to finite Gibbs point processes, which, instead of thinning, require weighting the Poisson realizations, usually involving intractable normalizing constants. Models based on determinantal point processes are also well suited for statistical (supervised) learning techniques, allowing the models to be fitted to observed network patterns with some particular geometric properties. We illustrate this approach by imitating with determinantal thinning the well-known Matérn II hard-core thinning, as well as a soft-core thinning depending on nearest-neighbour triangles. These two examples demonstrate how the proposed approach can lead to new, statistically optimized, probabilistic transmission scheduling schemes. |
Tasks | Point Processes |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.08672v2 |
PDF | http://arxiv.org/pdf/1810.08672v2.pdf |
PWC | https://paperswithcode.com/paper/determinantal-thinning-of-point-processes |
Repo | https://github.com/hpaulkeeler/DetPoisson_MATLAB |
Framework | none |
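A small NumPy sketch of the construction: simulate a Poisson point process, define an L-ensemble DPP on the realized points with a repulsive kernel, and read off each point's retention probability from the diagonal of the marginal kernel K = L(I + L)^{-1} (a standard DPP identity). The Gaussian kernel and its bandwidth are illustrative choices, not the paper's fitted models.

```python
# Determinantal thinning of a Poisson sample: retention probabilities.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(rng.poisson(50), 2))   # Poisson process on [0,1]^2

d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
L = np.exp(-d2 / (2 * 0.05 ** 2))                    # similarity kernel -> repulsive DPP

K = L @ np.linalg.inv(np.eye(len(pts)) + L)          # marginal kernel of the thinning
retention = np.diag(K)                               # P(point i survives)
# Exact sampling of the thinned configuration can be delegated to a toolbox
# such as DPPy (see the DPPy entry below).
```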
Fast Estimation of Causal Interactions using Wold Processes
Title | Fast Estimation of Causal Interactions using Wold Processes |
Authors | Flavio Figueiredo, Guilherme Borges, Pedro O. S. Vaz de Melo, Renato M. Assunção |
Abstract | We here focus on the task of learning Granger causality matrices for multivariate point processes. To accomplish this task, our work is the first to explore the use of Wold processes. By doing so, we are able to develop asymptotically fast MCMC learning algorithms. With $N$ being the total number of events and $K$ the number of processes, our learning algorithm has a $O(N(\log(N) + \log(K)))$ cost per iteration. This is much faster than the $O(N^3 K^2)$ or $O(K^3)$ costs of the state of the art. Our approach, called GrangerBusca, is validated on nine datasets, an advance over most prior efforts, which focus mostly on subsets of the Memetracker data. Regarding accuracy, GrangerBusca is three times more accurate (in Precision@10) than the state of the art on the commonly explored Memetracker subsets. Due to GrangerBusca's much lower training complexity, our approach is the only one able to train models for larger, full datasets. |
Tasks | Point Processes |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04595v2 |
PDF | http://arxiv.org/pdf/1807.04595v2.pdf |
PWC | https://paperswithcode.com/paper/fast-estimation-of-causal-interactions-using |
Repo | https://github.com/flaviovdf/granger-busca |
Framework | none |
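What makes Wold processes cheap to work with is that the intensity after each event depends only on the previous inter-event gap, so inter-event times form a Markov chain and simulation needs no thinning. The Busca-style rate form and parameters below are illustrative assumptions, not GrangerBusca's fitted model.

```python
# Simulating a univariate Busca-style Wold process (illustrative).
import numpy as np

def simulate_wold(mu=0.5, alpha=1.0, beta=1.0, n_events=10, seed=0):
    rng = np.random.default_rng(seed)
    t, gap, times = 0.0, 1.0, []
    for _ in range(n_events):
        rate = mu + alpha / (beta + gap)     # constant until the next event
        new_gap = rng.exponential(1.0 / rate)
        t += new_gap
        times.append(t)
        gap = new_gap                        # Markov dependence on the last gap
    return np.array(times)

print(simulate_wold())
```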
DPPy: Sampling DPPs with Python
Title | DPPy: Sampling DPPs with Python |
Authors | Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko |
Abstract | Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning. Sampling from DPPs is a challenge and therefore we present DPPy, a Python toolbox that gathers known exact and approximate sampling algorithms for both finite and continuous DPPs. The project is hosted on GitHub and equipped with an extensive documentation. |
Tasks | Point Processes |
Published | 2018-09-19 |
URL | https://arxiv.org/abs/1809.07258v2 |
PDF | https://arxiv.org/pdf/1809.07258v2.pdf |
PWC | https://paperswithcode.com/paper/dppy-sampling-determinantal-point-processes |
Repo | https://github.com/guilgautier/DPPy_paper |
Framework | none |
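Minimal usage of the toolbox for a finite L-ensemble, mirroring DPPy's documented interface for exact sampling (API as of the paper's release; treat it as a sketch and check the project docs for the current signature).

```python
# Exact sampling from a finite L-ensemble with DPPy.
import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)
Phi = rng.randn(6, 30)                 # random features
L = Phi.T @ Phi                        # PSD likelihood kernel (30 x 30)

dpp = FiniteDPP('likelihood', **{'L': L})
sample = dpp.sample_exact()            # indices of a diverse subset
print(sample)
```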
A Review of Network Inference Techniques for Neural Activation Time Series
Title | A Review of Network Inference Techniques for Neural Activation Time Series |
Authors | George Panagopoulos |
Abstract | Studying neural connectivity is considered one of the most promising and challenging areas of modern neuroscience. The underpinnings of cognition are hidden in the way neurons interact with each other. However, our experimental methods of studying real neural connections at a microscopic level are still arduous and costly. An efficient alternative is to infer connectivity from the neuronal activations using computational methods. A reliable method for network inference would not only facilitate research on neural circuits without the need for laborious experiments, but also reveal insights into the underlying mechanisms of the brain. In this work, we review methods for neural circuit inference given the activation time series of the neural population. Approaching the problem from a machine learning perspective, we divide the methodologies into unsupervised and supervised learning. The methods are based on correlation metrics, probabilistic point processes, and neural networks. Furthermore, we add a data mining methodology inspired by influence estimation in social networks as a new supervised learning approach. For comparison, we use the small version of the ChaLearn Connectomics competition dataset, which is accompanied by ground-truth connections between neurons. The experiments indicate that unsupervised learning methods perform better; however, supervised methods could surpass them given enough data and resources. |
Tasks | Point Processes, Time Series |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.08212v1 |
PDF | http://arxiv.org/pdf/1806.08212v1.pdf |
PWC | https://paperswithcode.com/paper/a-review-of-network-inference-techniques-for |
Repo | https://github.com/GiorgosPanagopoulos/Network-Inference-From-Neural-Activations |
Framework | tf |
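The simplest unsupervised baseline in this family scores every neuron pair by the Pearson correlation of their activation time series and thresholds the matrix into an adjacency guess; the threshold below is an arbitrary illustrative value.

```python
# Correlation-based network inference baseline.
import numpy as np

def correlation_network(activations, threshold=0.3):
    """activations: (n_neurons, n_timesteps); returns boolean adjacency."""
    corr = np.corrcoef(activations)
    np.fill_diagonal(corr, 0.0)          # ignore self-connections
    return corr > threshold

adj = correlation_network(np.random.rand(20, 1000))
print(adj.sum(), "predicted connections")
```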
Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Title | Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks |
Authors | Kenneth T. Co, Luis Muñoz-González, Sixte de Maupeou, Emil C. Lupu |
Abstract | Deep Convolutional Networks (DCNs) have been shown to be vulnerable to adversarial examples: perturbed inputs specifically designed to produce intentional errors in the learning algorithms at test time. Existing input-agnostic adversarial perturbations exhibit interesting visual patterns that are currently unexplained. In this paper, we introduce a structured approach for generating Universal Adversarial Perturbations (UAPs) with procedural noise functions. Our approach unveils the systemic vulnerability of popular DCN models like Inception v3 and YOLO v3, with single noise patterns able to fool a model on up to 90% of the dataset. Procedural noise allows us to generate a distribution of UAPs with high universal evasion rates using only a few parameters. Additionally, we propose Bayesian optimization to efficiently learn procedural noise parameters to construct inexpensive untargeted black-box attacks. We demonstrate that it can achieve an average of fewer than 10 queries per successful attack, a 100-fold improvement on existing methods. We further motivate the use of input-agnostic defences to increase the stability of models to adversarial perturbations. The universality of our attacks suggests that DCN models may be sensitive to aggregations of low-level class-agnostic features. These findings give insight into the nature of some universal adversarial perturbations and how they could be generated in other applications. |
Tasks | |
Published | 2018-09-30 |
URL | https://arxiv.org/abs/1810.00470v4 |
PDF | https://arxiv.org/pdf/1810.00470v4.pdf |
PWC | https://paperswithcode.com/paper/procedural-noise-adversarial-examples-for |
Repo | https://github.com/kenny-co/procedural-advml |
Framework | tf |
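To illustrate the shape of such an attack, here is a procedural perturbation controlled by a handful of parameters and clipped to an L-infinity budget. The low-frequency sinusoidal pattern is a simplified stand-in for the paper's Perlin/Gabor noise functions, not their implementation.

```python
# Parametric, input-agnostic perturbation (simplified procedural noise).
import numpy as np

def sine_noise_uap(h, w, freq=8.0, angle=0.6, eps=16 / 255):
    ys, xs = np.mgrid[0:h, 0:w] / max(h, w)
    pattern = np.sin(2 * np.pi * freq * (np.cos(angle) * xs + np.sin(angle) * ys))
    return np.clip(pattern, -1, 1) * eps   # bounded perturbation, same for all inputs

delta = sine_noise_uap(224, 224)           # add to any image, then clip to [0, 1]
# The paper's black-box attack uses Bayesian optimization to search the noise
# parameters (here freq, angle) for a high evasion rate under a query budget.
```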