Paper Group AWR 273
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
Title | Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments |
Authors | Giovanni Morrone, Luca Pasa, Vadim Tikhanoff, Sonia Bergamaschi, Luciano Fadiga, Leonardo Badino |
Abstract | In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks are used by LSTM-based models to generate time-frequency masks which are applied to the acoustic mixed-speech spectrogram. Results show that: (i) landmark motion features are very effective for this task, (ii) similarly to previous work, reconstruction of the target speaker’s spectrogram mediated by masking is significantly more accurate than direct spectrogram reconstruction, and (iii) the best masks depend on both landmark motion features and the input mixed-speech spectrogram. To the best of our knowledge, our proposed models are the first trained and evaluated on the limited-size GRID and TCD-TIMIT datasets that achieve speaker-independent speech enhancement in a multi-talker setting. |
Tasks | Speech Enhancement, Speech Separation |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02480v3 |
PDF | https://arxiv.org/pdf/1811.02480v3.pdf |
PWC | https://paperswithcode.com/paper/face-landmark-based-speaker-independent-audio |
Repo | https://github.com/dr-pato/audio_visual_speech_enhancement |
Framework | tf |
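A minimal PyTorch sketch of the landmark-driven masking pipeline the abstract describes: a BLSTM consumes landmark motion features together with the mixed-speech spectrogram and emits a sigmoid time-frequency mask. All layer sizes and input dimensions are illustrative assumptions; the authors' actual TensorFlow model (repo above) differs in detail.

```python
# Sketch only: shapes and layer sizes are assumptions, not the authors' model.
import torch
import torch.nn as nn

class LandmarkMaskNet(nn.Module):
    def __init__(self, n_landmarks=68, n_freq=257, hidden=250):
        super().__init__()
        # Per frame: landmark motion (dx, dy per point) + one spectrogram frame.
        in_dim = 2 * n_landmarks + n_freq
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.mask = nn.Linear(2 * hidden, n_freq)

    def forward(self, landmark_motion, mix_spec):
        # landmark_motion: (B, T, 2*n_landmarks); mix_spec: (B, T, n_freq)
        x = torch.cat([landmark_motion, mix_spec], dim=-1)
        h, _ = self.blstm(x)
        m = torch.sigmoid(self.mask(h))   # time-frequency mask in [0, 1]
        return m * mix_spec               # masked (enhanced) spectrogram

net = LandmarkMaskNet()
motion = torch.randn(1, 100, 136)   # 68 landmarks x (dx, dy)
mix = torch.rand(1, 100, 257)       # magnitude spectrogram of the mixture
enhanced = net(motion, mix)         # train with e.g. MSE against the clean target
```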
General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Title | General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms |
Authors | Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas |
Abstract | General Video Game Playing (GVGP) aims at designing an agent that is capable of playing multiple video games with no human intervention. In 2014, the General Video Game AI (GVGAI) competition framework was created and released with the purpose of providing researchers with a common open-source and easy-to-use platform for testing their AI methods on a potentially infinite number of games created using the Video Game Description Language (VGDL). The framework has been expanded into several tracks during the last few years to meet the demands of different research directions. The agents are required either to play multiple unknown games with or without access to game simulations, or to design new game levels or rules. This survey paper presents VGDL, the GVGAI framework, and the existing tracks, and reviews the wide use of the GVGAI framework in research, education and competitions five years after its birth. A future plan of framework improvements is also described. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10363v4 |
PDF | http://arxiv.org/pdf/1802.10363v4.pdf |
PWC | https://paperswithcode.com/paper/general-video-game-ai-a-multi-track-framework |
Repo | https://github.com/aadharna/UntouchableThunder |
Framework | pytorch |
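To make the "play with access to game simulations" tracks concrete, here is a one-step-lookahead agent in the spirit of GVGAI's planning tracks. The official framework is Java; the `copy()`/`advance()`/`get_score()` interface below is a hypothetical Python stand-in (with a dummy state so the sketch executes), not a real GVGAI API.

```python
# Hypothetical forward-model interface, in the spirit of GVGAI planning tracks.
def act(state, actions):
    """Pick the action whose simulated successor state scores best."""
    best_action, best_score = actions[0], float('-inf')
    for action in actions:
        future = state.copy()      # planning tracks expose a forward model
        future.advance(action)     # simulate one step
        if future.get_score() > best_score:
            best_action, best_score = action, future.get_score()
    return best_action

class _DummyState:                 # stand-in so the sketch runs end to end
    def __init__(self, score=0): self.score = score
    def copy(self): return _DummyState(self.score)
    def advance(self, a): self.score += a
    def get_score(self): return self.score

print(act(_DummyState(), actions=[-1, 0, 1]))   # -> 1
```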
Affinity Derivation and Graph Merge for Instance Segmentation
Title | Affinity Derivation and Graph Merge for Instance Segmentation |
Authors | Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, Jizheng Xu, Houqiang Li, Yan Lu |
Abstract | We present an instance segmentation scheme based on pixel affinity information, i.e., the likelihood that two pixels belong to the same instance. In our scheme, we use two neural networks with similar structures: one predicts pixel-level semantic scores and the other derives pixel affinities. Regarding pixels as vertices and affinities as edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experimental results show that our scheme generates fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10870v1 |
PDF | http://arxiv.org/pdf/1811.10870v1.pdf |
PWC | https://paperswithcode.com/paper/affinity-derivation-and-graph-merge-for |
Repo | https://github.com/xck36/GMIS |
Framework | tf |
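The core of the graph merge step can be shown in a few lines: treat pixels as vertices, predicted affinities as weighted edges, and greedily merge with union-find. The paper's algorithm additionally uses semantic scores and more careful ordering; this NumPy sketch only illustrates the clustering idea.

```python
# Simplified graph-merge clustering over predicted pixel affinities.
import numpy as np

def graph_merge(affinities, edges, threshold=0.5):
    """affinities: (E,) scores in [0,1]; edges: (E,2) pixel-index pairs."""
    n = int(edges.max()) + 1
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for e in np.argsort(-affinities):      # merge high-affinity pairs first
        if affinities[e] < threshold:
            break
        a, b = find(edges[e, 0]), find(edges[e, 1])
        if a != b:
            parent[a] = b
    return np.array([find(i) for i in range(n)])  # instance label per pixel

labels = graph_merge(np.random.rand(10), np.random.randint(0, 6, size=(10, 2)))
```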
SNIPER: Efficient Multi-Scale Training
Title | SNIPER: Efficient Multi-Scale Training |
Authors | Bharat Singh, Mahyar Najibi, Larry S. Davis |
Abstract | We present SNIPER, an algorithm for performing efficient multi-scale training in instance-level visual recognition tasks. Instead of processing every pixel in an image pyramid, SNIPER processes context regions around ground-truth instances (referred to as chips) at the appropriate scale. For background sampling, these context regions are generated using proposals extracted from a region proposal network trained with a short learning schedule. Hence, the number of chips generated per image during training adaptively changes based on the scene complexity. SNIPER only processes 30% more pixels compared to the commonly used single-scale training at 800x1333 pixels on the COCO dataset. But it also observes samples from extreme resolutions of the image pyramid, like 1400x2000 pixels. As SNIPER operates on resampled low-resolution chips (512x512 pixels), it can have a batch size as large as 20 on a single GPU even with a ResNet-101 backbone. Therefore, it can benefit from batch normalization during training without the need for synchronizing batch-normalization statistics across GPUs. SNIPER brings training of instance-level recognition tasks like object detection closer to the protocol for image classification, and suggests that the commonly accepted guideline that it is important to train on high-resolution images for instance-level visual recognition tasks might not be correct. Our implementation based on Faster-RCNN with a ResNet-101 backbone obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference with a single GPU. Code is available at https://github.com/MahyarNajibi/SNIPER/. |
Tasks | Object Detection |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09300v3 |
PDF | http://arxiv.org/pdf/1805.09300v3.pdf |
PWC | https://paperswithcode.com/paper/sniper-efficient-multi-scale-training |
Repo | https://github.com/Hwang64/PSIS |
Framework | none |
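A sketch of the scale-aware chip selection that makes SNIPER efficient: at each pyramid scale, keep only the ground-truth boxes whose scaled size falls in a valid range, then crop fixed-size chips around them. The valid range and chip size below follow the paper's spirit, not its exact configuration, and a full implementation would also greedily cover all kept boxes with overlapping chips.

```python
# Simplified positive-chip sampling for one image-pyramid scale.
import numpy as np

def positive_chips(boxes, scale, chip=512, valid=(32, 256)):
    """boxes: (N,4) as [x1,y1,x2,y2] in original pixels; returns chip centers."""
    b = boxes * scale
    sizes = np.sqrt((b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]))
    keep = (sizes >= valid[0]) & (sizes <= valid[1])   # in-range at this scale
    centers = np.stack([(b[keep, 0] + b[keep, 2]) / 2,
                        (b[keep, 1] + b[keep, 3]) / 2], axis=1)
    return centers   # place chip x chip crops around these centers

print(positive_chips(np.array([[10., 10., 200., 180.]]), scale=1.5))
```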
Depth-aware CNN for RGB-D Segmentation
Title | Depth-aware CNN for RGB-D Segmentation |
Authors | Weiyue Wang, Ulrich Neumann |
Abstract | Convolutional neural networks (CNNs) are limited in their ability to handle geometric information due to their fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds; these approaches suffer from high computation and memory cost. To address these issues, we present the Depth-aware CNN, introducing two intuitive, flexible and effective operations: depth-aware convolution and depth-aware average pooling. By leveraging depth similarity between pixels in the process of information propagation, geometry is seamlessly incorporated into the CNN. Without introducing any additional parameters, both operators can be easily integrated into existing CNNs. Extensive experiments and ablation studies on challenging RGB-D semantic segmentation benchmarks validate the effectiveness and flexibility of our approach. |
Tasks | Semantic Segmentation |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06791v1 |
PDF | http://arxiv.org/pdf/1803.06791v1.pdf |
PWC | https://paperswithcode.com/paper/depth-aware-cnn-for-rgb-d-segmentation |
Repo | https://github.com/laughtervv/DepthAwareCNN |
Framework | pytorch |
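A minimal PyTorch sketch of depth-aware convolution: each neighbor's contribution is re-weighted by a depth-similarity term exp(-alpha * |depth_neighbor - depth_center|) before the usual weighted sum. Kernel size 3, stride 1; `alpha` is a free sharpness parameter here, and this is an illustration of the operator, not the authors' CUDA implementation.

```python
# Depth-aware 3x3 convolution via unfold; a sketch, not the authors' code.
import torch
import torch.nn.functional as F

def depth_aware_conv(x, depth, weight, alpha=1.0):
    # x: (B,C,H,W), depth: (B,1,H,W), weight: (Cout,C,3,3)
    B, C, H, W = x.shape
    cols = F.unfold(x, 3, padding=1)                  # (B, C*9, H*W)
    dcols = F.unfold(depth, 3, padding=1)             # (B, 9, H*W)
    center = depth.reshape(B, 1, H * W)
    sim = torch.exp(-alpha * (dcols - center).abs())  # depth similarity per tap
    cols = cols.reshape(B, C, 9, H * W) * sim.unsqueeze(1)
    out = torch.einsum('ock,bckp->bop', weight.flatten(2), cols)
    return out.reshape(B, -1, H, W)

x = torch.randn(1, 4, 8, 8); d = torch.rand(1, 1, 8, 8)
w = torch.randn(6, 4, 3, 3)
y = depth_aware_conv(x, d, w)   # (1, 6, 8, 8)
```

Note that the operator adds no parameters over a plain convolution, matching the abstract's claim: the depth term only modulates existing kernel weights.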
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Title | Unsupervised Learning of Object Landmarks through Conditional Image Generation |
Authors | Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi |
Abstract | We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications. |
Tasks | Conditional Image Generation, Image Generation, Unsupervised Facial Landmark Detection |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07823v2 |
PDF | http://arxiv.org/pdf/1806.07823v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-object-landmarks |
Repo | https://github.com/tomasjakab/imm |
Framework | tf |
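The "tight geometry bottleneck" can be sketched as a differentiable soft-argmax: K feature heatmaps are collapsed into K (x, y) coordinates via a spatial softmax and expectation, then re-rendered as Gaussian maps for the conditional decoder. Map sizes and sigma below are illustrative.

```python
# Soft-argmax geometry bottleneck + Gaussian re-rendering (PyTorch sketch).
import torch

def heatmaps_to_landmarks(h):
    # h: (B, K, H, W) raw heatmaps
    B, K, H, W = h.shape
    p = torch.softmax(h.reshape(B, K, -1), dim=-1).reshape(B, K, H, W)
    ys = torch.linspace(-1, 1, H).reshape(1, 1, H, 1)
    xs = torch.linspace(-1, 1, W).reshape(1, 1, 1, W)
    x = (p * xs).sum(dim=(2, 3))        # expected x per landmark
    y = (p * ys).sum(dim=(2, 3))        # expected y per landmark
    return torch.stack([x, y], dim=-1)  # (B, K, 2) differentiable coordinates

def render_gaussians(coords, H=16, W=16, sigma=0.1):
    ys = torch.linspace(-1, 1, H).reshape(1, 1, H, 1)
    xs = torch.linspace(-1, 1, W).reshape(1, 1, 1, W)
    dx = xs - coords[..., 0].reshape(*coords.shape[:2], 1, 1)
    dy = ys - coords[..., 1].reshape(*coords.shape[:2], 1, 1)
    return torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))  # (B, K, H, W)

maps = render_gaussians(heatmaps_to_landmarks(torch.randn(2, 10, 16, 16)))
```

Because only coordinates survive the bottleneck, appearance information must flow through the other (appearance) encoder, which is what forces the heatmaps to become landmark-like.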
Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE
Title | Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE |
Authors | Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy, Adway Mitra |
Abstract | Classifying time series data using neural networks is a challenging problem when the length of the data varies. Video object trajectories, which are key to many visual surveillance applications, are often found to be of varying length. If such trajectories are used to understand the behavior (normal or anomalous) of moving objects, they need to be represented correctly. In this paper, we propose video object trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and Variational Autoencoder (VAE) architecture. First, we introduce a high-level representation of object trajectories using a color gradient form. In the next stage, a semi-supervised approach based on Temporal Unknown Incremental Clustering (TUIC) is applied to label the classes of the extracted trajectories. Anomalous trajectories are separated using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is used for trajectory classification and anomaly detection. The results obtained using publicly available surveillance video datasets reveal that the proposed method can successfully identify some important traffic anomalies, such as vehicles not following lane driving, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions. The proposed method detects the above anomalies with higher accuracy than existing anomaly detection methods. |
Tasks | Anomaly Detection, Time Series |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07203v1 |
PDF | http://arxiv.org/pdf/1812.07203v1.pdf |
PWC | https://paperswithcode.com/paper/video-trajectory-classification-and-anomaly |
Repo | https://github.com/lisaong/hss |
Framework | tf |
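The "color gradient" representation can be sketched simply: rasterize a variable-length (x, y) track into a fixed-size RGB image whose color encodes normalized time, so a CNN sees both path shape and direction. The color mapping and image size below are illustrative assumptions, not the paper's exact encoding.

```python
# Rasterize a trajectory into a time-colored image for CNN input.
import numpy as np

def trajectory_to_image(track, size=64):
    """track: (T, 2) pixel coordinates; returns (size, size, 3) float image."""
    img = np.zeros((size, size, 3), dtype=np.float32)
    t = track - track.min(axis=0)
    t = t / max(t.max(), 1e-6) * (size - 1)
    for i, (x, y) in enumerate(t.astype(int)):
        c = i / max(len(t) - 1, 1)        # 0 at start, 1 at end
        img[y, x] = (c, 0.0, 1.0 - c)     # blue -> red over time
    return img

img = trajectory_to_image(np.cumsum(np.random.randn(120, 2), axis=0))
```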
Landmine Detection Using Autoencoders on Multi-polarization GPR Volumetric Data
Title | Landmine Detection Using Autoencoders on Multi-polarization GPR Volumetric Data |
Authors | Paolo Bestagini, Federico Lombardi, Maurizio Lualdi, Francesco Picetti, Stefano Tubaro |
Abstract | Buried landmines and unexploded remnants of war are a constant threat for the population of many countries that have been hit by wars in past years. The huge number of human lives lost due to this phenomenon has been a strong motivation for the research community toward the development of safe and robust techniques designed for landmine clearance. Nonetheless, detecting and localizing buried landmines with high precision in an automatic fashion is still considered a challenging task due to the many different boundary conditions that characterize this problem (e.g., several kinds of objects to detect, different soils and meteorological conditions, etc.). In this paper, we propose a novel technique for buried object detection tailored to unexploded landmine discovery. The proposed solution exploits a specific kind of convolutional neural network (CNN) known as an autoencoder to analyze volumetric data acquired with ground penetrating radar (GPR) using different polarizations. This method works in an anomaly detection framework: we train the autoencoder only on GPR data acquired over landmine-free areas. The system then recognizes landmines as objects that are dissimilar to the soil used during the training step. Experiments conducted on real data show that the proposed technique requires little training and no ad-hoc data pre-processing to achieve accuracy higher than 93% on challenging datasets. |
Tasks | Anomaly Detection, Object Detection |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01316v1 |
PDF | http://arxiv.org/pdf/1810.01316v1.pdf |
PWC | https://paperswithcode.com/paper/landmine-detection-using-autoencoders-on |
Repo | https://github.com/polimi-ispl/landmine_detection_autoencoder |
Framework | tf |
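The anomaly-detection setup reduces to a familiar pattern: train a convolutional autoencoder on landmine-free GPR patches only, then flag inputs with high reconstruction error. The toy architecture, patch size and scoring below are illustrative, not the paper's network.

```python
# Autoencoder-based anomaly scoring for GPR patches (PyTorch sketch).
import torch
import torch.nn as nn

ae = nn.Sequential(                       # toy encoder/decoder for 1x32x32 patches
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
)

def anomaly_score(patch):
    with torch.no_grad():
        return ((ae(patch) - patch) ** 2).mean().item()   # per-patch MSE

# After training on landmine-free data only, a threshold on anomaly_score
# separates soil-like patches from dissimilar (possibly mined) ones.
score = anomaly_score(torch.rand(1, 1, 32, 32))
```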
Adversarial Feedback Loop
Title | Adversarial Feedback Loop |
Authors | Firas Shama, Roey Mechrez, Alon Shoshan, Lihi Zelnik-Manor |
Abstract | Thanks to their remarkable generative capabilities, GANs have gained great popularity and are used abundantly in state-of-the-art methods and applications. In a GAN-based model, a discriminator is trained to learn the real data distribution. To date, it has been used only for training purposes, where it is utilized to train the generator to produce realistic-looking outputs. In this paper we propose a novel method that makes explicit use of the discriminator at test time, in a feedback manner, in order to improve the generator's results. To the best of our knowledge, this is the first time a discriminator is involved at test time. We claim that the discriminator holds significant information on the real data distribution that could be useful at test time as well, a potential that has not been explored before. The approach we propose does not alter the conventional training stage. At test time, however, it transfers the output of the generator to the discriminator, and uses feedback modules (convolutional blocks) to translate the features of the discriminator layers into corrections to the features of the generator layers, which are eventually used to obtain a better generator result. Our method can contribute to both conditional and unconditional GANs. As demonstrated by our experiments, it can improve the results of state-of-the-art networks for super-resolution and image generation. |
Tasks | Image Generation, Super-Resolution |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08126v1 |
PDF | http://arxiv.org/pdf/1811.08126v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-feedback-loop |
Repo | https://github.com/shamafiras/AFL |
Framework | pytorch |
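A toy PyTorch sketch of the test-time feedback loop: the generator output is passed to the discriminator, and a small feedback module turns discriminator features into an additive correction injected back into the generator's intermediate features. The two-layer modules below are deliberately minimal stand-ins for the paper's convolutional blocks.

```python
# Test-time adversarial feedback with toy G, D and a feedback module.
import torch
import torch.nn as nn

class G(nn.Module):                      # toy two-stage generator
    def __init__(self):
        super().__init__()
        self.f1 = nn.Linear(16, 64)
        self.f2 = nn.Linear(64, 32)
    def forward(self, z, correction=None):
        h = torch.relu(self.f1(z))
        if correction is not None:       # feedback enters between stages
            h = h + correction
        return self.f2(h)

class D(nn.Module):                      # toy discriminator exposing features
    def __init__(self):
        super().__init__()
        self.f1 = nn.Linear(32, 64)
        self.out = nn.Linear(64, 1)
    def forward(self, x):
        feats = torch.relu(self.f1(x))
        return self.out(feats), feats

g, d = G(), D()
feedback = nn.Linear(64, 64)             # trained with G and D frozen
z = torch.randn(4, 16)
x = g(z)
for _ in range(2):                       # iterative test-time refinement
    _, feats = d(x)
    x = g(z, correction=feedback(feats))
```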
Lifelong Domain Word Embedding via Meta-Learning
Title | Lifelong Domain Word Embedding via Meta-Learning |
Authors | Hu Xu, Bing Liu, Lei Shu, Philip S. Yu |
Abstract | Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks. |
Tasks | Meta-Learning, Word Embeddings |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09991v1 |
PDF | http://arxiv.org/pdf/1805.09991v1.pdf |
PWC | https://paperswithcode.com/paper/lifelong-domain-word-embedding-via-meta |
Repo | https://github.com/howardhsu/L-DEM |
Framework | none |
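The retrieval idea can be pictured as follows: for a word in the new domain, compare its aggregate context vector against the same word's context vectors from past domains, and borrow corpus data from domains where the contexts agree. Plain cosine similarity below is an assumption for illustration; the paper instead trains a meta-learner to make this relevance judgment.

```python
# Retrieving relevant past domains by context-vector agreement (illustrative).
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def relevant_domains(word_ctx_new, word_ctx_past, threshold=0.7):
    """word_ctx_new: (d,) context vector; word_ctx_past: {domain: (d,) vector}."""
    return [dom for dom, v in word_ctx_past.items()
            if cosine(word_ctx_new, v) >= threshold]

past = {"laptops": np.random.rand(50), "cameras": np.random.rand(50)}
print(relevant_domains(np.random.rand(50), past))
```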
Determinantal thinning of point processes with network learning applications
Title | Determinantal thinning of point processes with network learning applications |
Authors | Bartłomiej Błaszczyszyn, Paul Keeler |
Abstract | A new type of dependent thinning for point processes in continuous space is proposed, which leverages the advantages of determinantal point processes defined on finite spaces and, as such, is particularly amenable to statistical, numerical, and simulation techniques. It gives a new point process that can serve as a network model exhibiting repulsion. The properties and functions of the new point process, such as moment measures, the Laplace functional, the void probabilities, as well as conditional (Palm) characteristics, can be estimated accurately by simulating the underlying (non-thinned) point process, which can be taken, for example, to be Poisson. This is in contrast (and preference) to finite Gibbs point processes, which, instead of thinning, require weighting the Poisson realizations, usually involving intractable normalizing constants. Models based on determinantal point processes are also well suited for statistical (supervised) learning techniques, allowing the models to be fitted to observed network patterns with some particular geometric properties. We illustrate this approach by imitating with determinantal thinning the well-known Matérn II hard-core thinning, as well as a soft-core thinning depending on nearest-neighbour triangles. These two examples demonstrate how the proposed approach can lead to new, statistically optimized, probabilistic transmission scheduling schemes. |
Tasks | Point Processes |
Published | 2018-10-09 |
URL | http://arxiv.org/abs/1810.08672v2 |
PDF | http://arxiv.org/pdf/1810.08672v2.pdf |
PWC | https://paperswithcode.com/paper/determinantal-thinning-of-point-processes |
Repo | https://github.com/hpaulkeeler/DetPoisson_MATLAB |
Framework | none |
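A small NumPy sketch of the construction: simulate a Poisson point process, define an L-ensemble DPP on the realized points with a repulsive kernel, and read off each point's retention probability from the diagonal of the marginal kernel K = L(I + L)^{-1} (a standard DPP identity). The Gaussian kernel and its bandwidth are illustrative choices, not the paper's fitted models.

```python
# Determinantal thinning of a Poisson sample: retention probabilities.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(rng.poisson(50), 2))   # Poisson process on [0,1]^2

d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
L = np.exp(-d2 / (2 * 0.05 ** 2))                    # similarity kernel -> repulsive DPP

K = L @ np.linalg.inv(np.eye(len(pts)) + L)          # marginal kernel of the thinning
retention = np.diag(K)                               # P(point i survives)
# Exact sampling of the thinned configuration can be delegated to a toolbox
# such as DPPy (see the DPPy entry below).
```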
Fast Estimation of Causal Interactions using Wold Processes
Title | Fast Estimation of Causal Interactions using Wold Processes |
Authors | Flavio Figueiredo, Guilherme Borges, Pedro O. S. Vaz de Melo, Renato M. Assunção |
Abstract | We here focus on the task of learning Granger causality matrices for multivariate point processes. To accomplish this task, our work is the first to explore the use of Wold processes. By doing so, we are able to develop asymptotically fast MCMC learning algorithms. With $N$ being the total number of events and $K$ the number of processes, our learning algorithm has a $O(N(\log(N) + \log(K)))$ cost per iteration. This is much faster than the $O(N^3 K^2)$ or $O(K^3)$ costs of the state of the art. Our approach, called GrangerBusca, is validated on nine datasets, an advance over most prior efforts, which focus mostly on subsets of the Memetracker data. Regarding accuracy, GrangerBusca is three times more accurate (in Precision@10) than the state of the art on the commonly explored Memetracker subsets. Due to GrangerBusca's much lower training complexity, our approach is the only one able to train models for larger, full datasets. |
Tasks | Point Processes |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04595v2 |
PDF | http://arxiv.org/pdf/1807.04595v2.pdf |
PWC | https://paperswithcode.com/paper/fast-estimation-of-causal-interactions-using |
Repo | https://github.com/flaviovdf/granger-busca |
Framework | none |
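What makes Wold processes cheap to work with is that the intensity after each event depends only on the previous inter-event gap, so inter-event times form a Markov chain and simulation needs no thinning. The Busca-style rate form and parameters below are illustrative assumptions, not GrangerBusca's fitted model.

```python
# Simulating a univariate Busca-style Wold process (illustrative).
import numpy as np

def simulate_wold(mu=0.5, alpha=1.0, beta=1.0, n_events=10, seed=0):
    rng = np.random.default_rng(seed)
    t, gap, times = 0.0, 1.0, []
    for _ in range(n_events):
        rate = mu + alpha / (beta + gap)     # constant until the next event
        new_gap = rng.exponential(1.0 / rate)
        t += new_gap
        times.append(t)
        gap = new_gap                        # Markov dependence on the last gap
    return np.array(times)

print(simulate_wold())
```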
DPPy: Sampling DPPs with Python
Title | DPPy: Sampling DPPs with Python |
Authors | Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko |
Abstract | Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning. Sampling from DPPs is a challenge and therefore we present DPPy, a Python toolbox that gathers known exact and approximate sampling algorithms for both finite and continuous DPPs. The project is hosted on GitHub and equipped with an extensive documentation. |
Tasks | Point Processes |
Published | 2018-09-19 |
URL | https://arxiv.org/abs/1809.07258v2 |
PDF | https://arxiv.org/pdf/1809.07258v2.pdf |
PWC | https://paperswithcode.com/paper/dppy-sampling-determinantal-point-processes |
Repo | https://github.com/guilgautier/DPPy_paper |
Framework | none |
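Minimal usage of the toolbox for a finite L-ensemble, mirroring DPPy's documented interface for exact sampling (API as of the paper's release; treat it as a sketch and check the project docs for the current signature).

```python
# Exact sampling from a finite L-ensemble with DPPy.
import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(1)
Phi = rng.randn(6, 30)                 # random features
L = Phi.T @ Phi                        # PSD likelihood kernel (30 x 30)

dpp = FiniteDPP('likelihood', **{'L': L})
sample = dpp.sample_exact()            # indices of a diverse subset
print(sample)
```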
A Review of Network Inference Techniques for Neural Activation Time Series
Title | A Review of Network Inference Techniques for Neural Activation Time Series |
Authors | George Panagopoulos |
Abstract | Studying neural connectivity is considered one of the most promising and challenging areas of modern neuroscience. The underpinnings of cognition are hidden in the way neurons interact with each other. However, our experimental methods of studying real neural connections at a microscopic level are still arduous and costly. An efficient alternative is to infer connectivity from the neuronal activations using computational methods. A reliable method for network inference would not only facilitate research on neural circuits without the need for laborious experiments, but also reveal insights into the underlying mechanisms of the brain. In this work, we review methods for neural circuit inference given the activation time series of the neural population. Approaching the problem from a machine learning perspective, we divide the methodologies into unsupervised and supervised learning. The methods are based on correlation metrics, probabilistic point processes, and neural networks. Furthermore, we add a data mining methodology inspired by influence estimation in social networks as a new supervised learning approach. For comparison, we use the small version of the ChaLearn Connectomics competition dataset, which is accompanied by ground-truth connections between neurons. The experiments indicate that unsupervised learning methods perform better; however, supervised methods could surpass them given enough data and resources. |
Tasks | Point Processes, Time Series |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.08212v1 |
PDF | http://arxiv.org/pdf/1806.08212v1.pdf |
PWC | https://paperswithcode.com/paper/a-review-of-network-inference-techniques-for |
Repo | https://github.com/GiorgosPanagopoulos/Network-Inference-From-Neural-Activations |
Framework | tf |
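The simplest unsupervised baseline in this family scores every neuron pair by the Pearson correlation of their activation time series and thresholds the matrix into an adjacency guess; the threshold below is an arbitrary illustrative value.

```python
# Correlation-based network inference baseline.
import numpy as np

def correlation_network(activations, threshold=0.3):
    """activations: (n_neurons, n_timesteps); returns boolean adjacency."""
    corr = np.corrcoef(activations)
    np.fill_diagonal(corr, 0.0)          # ignore self-connections
    return corr > threshold

adj = correlation_network(np.random.rand(20, 1000))
print(adj.sum(), "predicted connections")
```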
Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks
Title | Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks |
Authors | Kenneth T. Co, Luis Muñoz-González, Sixte de Maupeou, Emil C. Lupu |
Abstract | Deep Convolutional Networks (DCNs) have been shown to be vulnerable to adversarial examples: perturbed inputs specifically designed to produce intentional errors in the learning algorithms at test time. Existing input-agnostic adversarial perturbations exhibit interesting visual patterns that are currently unexplained. In this paper, we introduce a structured approach for generating Universal Adversarial Perturbations (UAPs) with procedural noise functions. Our approach unveils the systemic vulnerability of popular DCN models like Inception v3 and YOLO v3, with single noise patterns able to fool a model on up to 90% of the dataset. Procedural noise allows us to generate a distribution of UAPs with high universal evasion rates using only a few parameters. Additionally, we propose Bayesian optimization to efficiently learn procedural noise parameters to construct inexpensive untargeted black-box attacks. We demonstrate that it can achieve an average of fewer than 10 queries per successful attack, a 100-fold improvement on existing methods. We further motivate the use of input-agnostic defences to increase the stability of models to adversarial perturbations. The universality of our attacks suggests that DCN models may be sensitive to aggregations of low-level class-agnostic features. These findings give insight into the nature of some universal adversarial perturbations and how they could be generated in other applications. |
Tasks | |
Published | 2018-09-30 |
URL | https://arxiv.org/abs/1810.00470v4 |
PDF | https://arxiv.org/pdf/1810.00470v4.pdf |
PWC | https://paperswithcode.com/paper/procedural-noise-adversarial-examples-for |
Repo | https://github.com/kenny-co/procedural-advml |
Framework | tf |
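To illustrate the shape of such an attack, here is a procedural perturbation controlled by a handful of parameters and clipped to an L-infinity budget. The low-frequency sinusoidal pattern is a simplified stand-in for the paper's Perlin/Gabor noise functions, not their implementation.

```python
# Parametric, input-agnostic perturbation (simplified procedural noise).
import numpy as np

def sine_noise_uap(h, w, freq=8.0, angle=0.6, eps=16 / 255):
    ys, xs = np.mgrid[0:h, 0:w] / max(h, w)
    pattern = np.sin(2 * np.pi * freq * (np.cos(angle) * xs + np.sin(angle) * ys))
    return np.clip(pattern, -1, 1) * eps   # bounded perturbation, same for all inputs

delta = sine_noise_uap(224, 224)           # add to any image, then clip to [0, 1]
# The paper's black-box attack uses Bayesian optimization to search the noise
# parameters (here freq, angle) for a high evasion rate under a query budget.
```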