January 30, 2020

Paper Group ANR 378

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments


Title	Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Authors	Guan-Lin Chao, William Chan, Ian Lane
Abstract	Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics. We propose the use of speaker-targeted acoustic and audio-visual models for this task. We complement the acoustic features in a hybrid DNN-HMM model with information about the target speaker’s identity as well as visual features from the mouth region of the target speaker. Experimentation was performed using simulated cocktail-party data generated from the GRID audio-visual corpus by overlapping two speakers’ speech on a single acoustic channel. Our audio-only baseline achieved a WER of 26.3%. The audio-visual model improved the WER to 4.4%. Introducing speaker identity information had an even more pronounced effect, improving the WER to 3.6%. Combining both approaches, however, did not significantly improve performance further. Our work demonstrates that speaker-targeted models can significantly improve speech recognition in cocktail-party environments.
Tasks	Speech Recognition
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05962v1
PDF	https://arxiv.org/pdf/1906.05962v1.pdf
PWC	https://paperswithcode.com/paper/speaker-targeted-audio-visual-models-for
Repo
Framework
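
The WER figures quoted in the abstract above are word error rates. As a minimal sketch (not the authors' evaluation code), WER is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names `wer` and `relative_reduction` and the example sentences are illustrative assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the distance between
    # ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Illustrative sentences (hypothetical, one substituted word out of six).
example = wer("bin blue at f two now", "bin blue at s two now")

# Relative WER reductions implied by the numbers in the abstract.
audio_visual = relative_reduction(26.3, 4.4)
speaker_id = relative_reduction(26.3, 3.6)
```

With the abstract's figures, the relative reduction over the 26.3% baseline works out to roughly 83% for the audio-visual model and roughly 86% for the speaker-identity model.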

The Regularization of Small Sub-Constraint Satisfaction Problems


Title	The Regularization of Small Sub-Constraint Satisfaction Problems
Authors	Sven Löffler, Ke Liu, Petra Hofstedt
Abstract	This paper describes a new approach to optimizing constraint satisfaction problems (CSPs) by substituting sub-CSPs with locally consistent regular membership constraints. The purpose of this approach is to reduce the number of fails in the resolution process, to improve the inferences made during search by the constraint solver by strengthening constraint propagation, and to maintain the level of propagation while reducing the cost of propagating the constraints. Our experimental results show improved resolution speed compared to the original CSPs and competitiveness with the recent tabulation approach. Besides, our approach can be realized in a preprocessing step, and therefore would not collide with redundancy constraints or parallel computing if implemented.
Tasks
Published	2019-08-16
URL	https://arxiv.org/abs/1908.05907v1
PDF	https://arxiv.org/pdf/1908.05907v1.pdf
PWC	https://paperswithcode.com/paper/the-regularization-of-small-sub-constraint
Repo
Framework
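
A regular membership constraint requires the sequence of values taken by a group of variables to be accepted by a finite automaton. The sketch below illustrates the general idea of replacing a small sub-CSP by such a constraint: it enumerates the sub-CSP's solutions by brute force and stores them in an unminimized prefix automaton. The abstract does not specify the paper's construction, so the helper names, the toy constraints, and the automaton layout are all assumptions.

```python
from itertools import product


def solve_sub_csp(domains, constraints):
    """Enumerate all solutions of a small sub-CSP by brute force."""
    return [t for t in product(*domains) if all(c(t) for c in constraints)]


def build_automaton(solutions):
    """Build a prefix automaton (a trie) accepting exactly the solutions.

    States are value prefixes; transitions map (prefix, value) to the
    extended prefix. The accepting states are the full solutions.
    """
    transitions = {}
    for sol in solutions:
        for i, value in enumerate(sol):
            transitions[(sol[:i], value)] = sol[:i + 1]
    return transitions, set(solutions)


def accepts(transitions, accepting, assignment):
    """Run the automaton on an assignment, one variable value per step."""
    state = ()
    for value in assignment:
        state = transitions.get((state, value))
        if state is None:  # no transition: the prefix cannot be extended
            return False
    return state in accepting


# Toy sub-CSP (hypothetical): three variables over {0, 1, 2},
# neighbours must differ and the values must sum to 3.
domains = [range(3)] * 3
constraints = [
    lambda t: t[0] != t[1],
    lambda t: t[1] != t[2],
    lambda t: sum(t) == 3,
]
solutions = solve_sub_csp(domains, constraints)
transitions, accepting = build_automaton(solutions)
```

A solver that supports regular constraints could then propagate over the automaton in place of the three original constraints; a production version would typically minimize the automaton first.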

Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models


Title	Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models
Authors	Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
Abstract	In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of the landscape at large signal-to-noise ratios. We evaluate in a closed form the performance of a gradient flow algorithm using integro-differential PDEs as developed in physics of disordered systems for the Langevin dynamics. We analyze the performance of an approximate message passing algorithm estimating the maximum likelihood configuration via its state evolution. We conclude by comparing the above results: while we observe a drastic slow down of the gradient flow dynamics even in the region where the landscape is trivial, both the analyzed algorithms are shown to perform well even in the part of the region of parameters where spurious local minima are present.
Tasks
Published	2019-02-01
URL	https://arxiv.org/abs/1902.00139v4
PDF	https://arxiv.org/pdf/1902.00139v4.pdf
PWC	https://paperswithcode.com/paper/passed-spurious-analysing-descent-algorithms
Repo
Framework

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency


Title	Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency
Authors	Tejas Khot, Shubham Agrawal, Shubham Tulsiani, Christoph Mertz, Simon Lucey, Martial Hebert
Abstract	We present a learning based approach for multi-view stereopsis (MVS). While current deep MVS methods achieve impressive results, they crucially rely on ground-truth 3D training data, and acquisition of such precise 3D geometry for supervision is a major hurdle. Our framework instead leverages photometric consistency between multiple views as supervisory signal for learning depth prediction in a wide baseline MVS setup. However, naively applying photo consistency constraints is undesirable due to occlusion and lighting changes across views. To overcome this, we propose a robust loss formulation that: a) enforces first order consistency and b) for each point, selectively enforces consistency with some views, thus implicitly handling occlusions. We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement. We qualitatively observe that our reconstructions are often more complete than the acquired ground truth, further showing the merits of this approach. Lastly, our learned model generalizes to novel settings, and our approach allows adaptation of existing CNNs to datasets without ground-truth 3D by unsupervised finetuning. Project webpage: https://tejaskhot.github.io/unsup_mvs
Tasks	Depth Estimation
Published	2019-05-07
URL	https://arxiv.org/abs/1905.02706v2
PDF	https://arxiv.org/pdf/1905.02706v2.pdf
PWC	https://paperswithcode.com/paper/learning-unsupervised-multi-view-stereopsis
Repo
Framework
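
The abstract above describes a loss that, for each point, enforces consistency with only some views. One plausible reading, sketched here under that assumption, is to keep the k lowest photometric errors across source views for each pixel and ignore the rest, so that occluded views do not contribute. The function name, the data layout, and the choice of averaging are hypothetical, and the first-order (gradient) consistency term is not shown.

```python
import heapq


def robust_photometric_loss(per_view_errors, k):
    """Average, over pixels, of the mean of the k smallest per-view errors.

    per_view_errors[v][p] is the photometric error at pixel p when the
    reference image is compared with source view v warped into its frame.
    """
    num_views = len(per_view_errors)
    if not 1 <= k <= num_views:
        raise ValueError("k must be between 1 and the number of views")
    num_pixels = len(per_view_errors[0])
    total = 0.0
    for p in range(num_pixels):
        errors = [view[p] for view in per_view_errors]
        # Views with large error (for example, occluded ones) are dropped.
        total += sum(heapq.nsmallest(k, errors)) / k
    return total / num_pixels


# Three source views, two pixels; each pixel has one outlier view.
errors = [
    [0.1, 0.9],
    [0.2, 0.1],
    [0.8, 0.2],
]
loss = robust_photometric_loss(errors, k=2)
```

Setting k equal to the number of views recovers the naive loss that averages over every view.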

Variational bridge constructs for approximate Gaussian process regression


Title	Variational bridge constructs for approximate Gaussian process regression
Authors	Wil O C Ward, Mauricio A Álvarez
Abstract	This paper introduces a method to approximate Gaussian process regression by representing the problem as a stochastic differential equation and using variational inference to approximate solutions. The approximations are compared with full GP regression and generated paths are demonstrated to be indistinguishable from GP samples. We show that the approach extends easily to non-linear dynamics and discuss extensions to which the approach can be easily applied.
Tasks
Published	2019-01-07
URL	http://arxiv.org/abs/1901.01727v1
PDF	http://arxiv.org/pdf/1901.01727v1.pdf
PWC	https://paperswithcode.com/paper/variational-bridge-constructs-for-approximate
Repo
Framework

The Termination Critic


Title	The Termination Critic
Authors	Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup
Abstract	In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to – as is common – the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding – arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a “critic” for the termination condition. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning and planning.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09996v1
PDF	http://arxiv.org/pdf/1902.09996v1.pdf
PWC	https://paperswithcode.com/paper/the-termination-critic
Repo
Framework

Nearest Neighbor Search-Based Bitwise Source Separation Using Discriminant Winner-Take-All Hashing


Title	Nearest Neighbor Search-Based Bitwise Source Separation Using Discriminant Winner-Take-All Hashing
Authors	Sunwoo Kim, Minje Kim
Abstract	We propose an iteration-free source separation algorithm based on Winner-Take-All (WTA) hash codes, which is a faster, yet accurate alternative to a complex machine learning model for single-channel source separation in a resource-constrained environment. We first generate random permutations with WTA hashing to encode the shape of the multidimensional audio spectrum to a reduced bitstring representation. A nearest neighbor search on the hash codes of an incoming noisy spectrum as the query string results in the closest matches among the hashed mixture spectra. Using the indices of the matching frames, we obtain the corresponding ideal binary mask vectors for denoising. Since both the training data and the search operation are bitwise, the procedure can be done efficiently in hardware implementations. Experimental results show that the WTA hash codes are discriminant and provide an affordable dictionary search mechanism that leads to a competent performance compared to a comprehensive model and oracle masking.
Tasks	Denoising
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09799v1
PDF	https://arxiv.org/pdf/1908.09799v1.pdf
PWC	https://paperswithcode.com/paper/nearest-neighbor-search-based-bitwise-source
Repo
Framework
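
The pipeline described in the abstract above (hash the spectrum, find the nearest hashed frame, reuse its mask) can be sketched with plain Winner-Take-All hashing. This is a minimal illustration, not the authors' implementation: the function names, the toy vectors, and the masks are assumptions, and the similarity is simply the count of matching hash codes.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Generate num_hashes random permutations of range(dim)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(vec, perms, k):
    """For each permutation, take the first k permuted entries and
    record the position (0..k-1) of the largest one."""
    codes = []
    for perm in perms:
        window = [vec[i] for i in perm[:k]]
        codes.append(max(range(k), key=window.__getitem__))
    return codes


def nearest(query_codes, dictionary_codes):
    """Index of the dictionary entry sharing the most codes with the query."""
    def matches(codes):
        return sum(a == b for a, b in zip(query_codes, codes))
    return max(range(len(dictionary_codes)),
               key=lambda i: matches(dictionary_codes[i]))


# Toy "spectra" and their (hypothetical) ideal binary masks.
perms = make_permutations(dim=8, num_hashes=16)
frames = [[3, 1, 4, 1.5, 9, 2, 6, 5], [5, 6, 2, 9, 1.5, 4, 1, 3]]
masks = [[1, 0, 1, 0, 1, 0, 1, 1], [1, 1, 0, 1, 0, 1, 0, 1]]
dictionary = [wta_hash(f, perms, k=4) for f in frames]

# A rescaled copy of the first frame keeps the same ordering of values,
# so it hashes to the same codes and retrieves the first mask.
query = [2 * x + 1 for x in frames[0]]
best = nearest(wta_hash(query, perms, k=4), dictionary)
mask = masks[best]
```

Because the codes depend only on the rank order of the entries, they are unchanged by any order-preserving rescaling of the spectrum, and comparing codes needs no floating-point arithmetic.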

Evaluation of Deep Learning-based prediction models in Microgrids


Title	Evaluation of Deep Learning-based prediction models in Microgrids
Authors	Alexey Györi, Mathis Niederau, Violett Zeller, Volker Stich
Abstract	It is crucial today that economies harness renewable energies and integrate them into the existing grid. Conventionally, energy has been generated based on forecasts of peak and low demands. Renewable energy can neither be produced on demand nor stored efficiently. Thus, the aim of this paper is to evaluate Deep Learning-based forecasts of energy consumption to align energy consumption with renewable energy production. Using a dataset from a use-case related to landfill leachate management, multiple prediction models were used to forecast energy demand. The results were validated based on the same dataset from the recycling industry. Shallow models showed the lowest Mean Absolute Percentage Error (MAPE), significantly outperforming a persistence baseline for long-term (30 days), mid-term (7 days), and short-term (1 day) forecasts. A potential decrease of up to 23% in peak energy demand was found that could lead to a reduction of 3,091 kg in CO2-emissions per year. Our approach requires low financial investments for energy-management hardware, making it suitable for usage in Small and Medium-sized Enterprises (SMEs).
Tasks
Published	2019-09-29
URL	https://arxiv.org/abs/1910.00500v1
PDF	https://arxiv.org/pdf/1910.00500v1.pdf
PWC	https://paperswithcode.com/paper/evaluation-of-deep-learning-based-prediction
Repo
Framework
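
The abstract above compares models by MAPE against a persistence baseline. As a minimal sketch, assuming the usual definitions (MAPE as the mean of absolute relative errors in percent, and persistence as repeating the value observed one horizon earlier), the two can be written as follows. The function names and the demand series are made up for illustration.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent.

    Undefined when an actual value is zero (division by zero).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def persistence_forecast(series, horizon):
    """Predict each value as the one observed `horizon` steps earlier.

    Returns (actual, predicted) aligned so that predicted[i] is the
    forecast for actual[i].
    """
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be in 1..len(series)-1")
    return series[horizon:], series[:-horizon]


# Hypothetical daily energy demand in kWh.
demand = [100.0, 110.0, 120.0, 130.0]
actual, predicted = persistence_forecast(demand, horizon=1)
baseline_mape = mape(actual, predicted)
```

A learned model is useful in this setting only if its MAPE on the same aligned targets is lower than `baseline_mape`.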

Estimating and Inferring the Maximum Degree of Stimulus-Locked Time-Varying Brain Connectivity Networks


Title	Estimating and Inferring the Maximum Degree of Stimulus-Locked Time-Varying Brain Connectivity Networks
Authors	Kean Ming Tan, Junwei Lu, Tong Zhang, Han Liu
Abstract	Neuroscientists have enjoyed much success in understanding brain functions by constructing brain connectivity networks using data collected under highly controlled experimental settings. However, these experimental settings bear little resemblance to our real-life experience in day-to-day interactions with the surroundings. To address this issue, neuroscientists have been measuring brain activity under natural viewing experiments in which the subjects are given continuous stimuli, such as watching a movie or listening to a story. The main challenge with this approach is that the measured signal consists of both the stimulus-induced signal, as well as intrinsic-neural and non-neuronal signals. By exploiting the experimental design, we propose to estimate the stimulus-locked brain network by treating non-stimulus-induced signals as nuisance parameters. In many neuroscience applications, it is often important to identify brain regions that are connected to many other brain regions during cognitive processes. We propose an inferential method to test whether the maximum degree of the estimated network is larger than a pre-specified number. We prove that the type I error can be controlled and that the power increases to one asymptotically. Simulation studies are conducted to assess the performance of our method. Finally, we analyze a functional magnetic resonance imaging dataset obtained under the Sherlock Holmes movie stimuli.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11588v2
PDF	https://arxiv.org/pdf/1905.11588v2.pdf
PWC	https://paperswithcode.com/paper/estimating-and-inferring-the-maximum-degree
Repo
Framework

DeepPerimeter: Indoor Boundary Estimation from Posed Monocular Sequences


Title	DeepPerimeter: Indoor Boundary Estimation from Posed Monocular Sequences
Authors	Ameya Phalak, Zhao Chen, Darvin Yi, Khushi Gupta, Vijay Badrinarayanan, Andrew Rabinovich
Abstract	We present DeepPerimeter, a deep learning based pipeline for inferring a full indoor perimeter (i.e. exterior boundary map) from a sequence of posed RGB images. Our method relies on robust deep methods for depth estimation and wall segmentation to generate an exterior boundary point cloud, and then uses deep unsupervised clustering to fit wall planes to obtain a final boundary map of the room. We demonstrate that DeepPerimeter results in excellent visual and quantitative performance on the popular ScanNet and FloorNet datasets and works for room shapes of various complexities as well as in multiroom scenarios. We also establish important baselines for future work on indoor perimeter estimation, topics which will become increasingly prevalent as application areas like augmented reality and robotics become more significant.
Tasks	Depth Estimation
Published	2019-04-25
URL	https://arxiv.org/abs/1904.11595v2
PDF	https://arxiv.org/pdf/1904.11595v2.pdf
PWC	https://paperswithcode.com/paper/190411595
Repo
Framework

Training in Task Space to Speed Up and Guide Reinforcement Learning


Title	Training in Task Space to Speed Up and Guide Reinforcement Learning
Authors	Guillaume Bellegarda, Katie Byl
Abstract	Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) systems. There are also concerns arising from lack of perceived stability or robustness guarantees from emerging policies. This paper aims at mitigating these drawbacks by: (1) modeling a complex, high DOF system with a representative simple one, (2) making explicit use of forward and inverse kinematics without forcing the RL algorithm to “learn” them on its own, and (3) learning locomotion policies in Cartesian space instead of joint space. In this paper these methods are applied to JPL’s Robosimian, but can be readily used on any system with a base and end effector(s). These locomotion policies can be produced in just a few minutes, trained on a single laptop. We compare the robustness of the resulting learned policies to those of other control methods. An accompanying video for this paper can be found at https://youtu.be/xDxxSw5ahnc .
Tasks
Published	2019-03-06
URL	http://arxiv.org/abs/1903.02219v1
PDF	http://arxiv.org/pdf/1903.02219v1.pdf
PWC	https://paperswithcode.com/paper/training-in-task-space-to-speed-up-and-guide
Repo
Framework
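
Point (2) of the abstract above, using forward and inverse kinematics explicitly so that a policy can act in Cartesian space, can be illustrated on a toy two-link planar arm. This is a sketch under strong simplifying assumptions: Robosimian's limbs have many more degrees of freedom, and the link lengths, function names, and target below are hypothetical.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a two-link planar arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Joint angles reaching (x, y); returns one of the two solutions
    (the one with a non-negative elbow angle)."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0:
        raise ValueError("target is outside the reachable workspace")
    theta2 = math.acos(cos_t2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2


# A policy acting in Cartesian space would output a target like this;
# inverse kinematics then converts it to joint commands.
target = (1.2, 0.5)
joints = inverse_kinematics(*target)
reached = forward_kinematics(*joints)
```

The policy only has to choose where the end effector goes, while the known geometry handles the mapping to joint angles, which is the kind of structure the abstract argues the learner should not have to rediscover.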

Machine Learning for Intelligent Authentication in 5G-and-Beyond Wireless Networks


Title	Machine Learning for Intelligent Authentication in 5G-and-Beyond Wireless Networks
Authors	He Fang, Xianbin Wang, Stefano Tomasin
Abstract	The fifth generation (5G) and beyond wireless networks are critical to support diverse vertical applications by connecting heterogeneous devices and machines, which directly increases vulnerability to various spoofing attacks. Conventional cryptographic and physical layer authentication techniques are facing some challenges in complex dynamic wireless environments, including significant security overhead, low reliability, as well as difficulty in pre-designing an authentication model, providing continuous protection, and learning time-varying attributes. In this article, we envision new authentication approaches based on machine learning techniques by opportunistically leveraging physical layer attributes, and introduce intelligence to authentication for more efficient security provisioning. Machine learning paradigms for intelligent authentication design are presented, namely for parametric/non-parametric and supervised/unsupervised/reinforcement learning algorithms. In a nutshell, the machine learning-based intelligent authentication approaches utilize specific features in the multi-dimensional domain for achieving cost-effective, more reliable, model-free, continuous and situation-aware device validation under unknown network conditions and unpredictable dynamics.
Tasks
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00429v2
PDF	https://arxiv.org/pdf/1907.00429v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-for-intelligent
Repo
Framework

Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations


Title	Deep Learning for Sequential Recommendation: Algorithms, Influential Factors, and Evaluations
Authors	Hui Fang, Danning Zhang, Yiheng Shu, Guibing Guo
Abstract	In the field of sequential recommendation, deep learning (DL)-based methods have received a lot of attention in the past few years and surpassed traditional models such as Markov chain-based and factorization-based ones. However, there is little systematic study on DL-based methods, especially regarding how to design an effective DL model for sequential recommendation. In this view, this survey focuses on DL-based sequential recommender systems by taking the aforementioned issues into consideration. Specifically, we illustrate the concept of sequential recommendation, propose a categorization of existing algorithms in terms of three types of behavioral sequence, summarize the key factors affecting the performance of DL-based models, and conduct corresponding evaluations to demonstrate the effects of these factors. We conclude this survey by systematically outlining future directions and challenges in this field.
Tasks	Recommendation Systems
Published	2019-04-30
URL	https://arxiv.org/abs/1905.01997v2
PDF	https://arxiv.org/pdf/1905.01997v2.pdf
PWC	https://paperswithcode.com/paper/190501997
Repo
Framework

Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks


Title	Reconstruction of Natural Visual Scenes from Neural Spikes with Deep Neural Networks
Authors	Yichen Zhang, Shanshan Jia, Yajing Zheng, Zhaofei Yu, Yonghong Tian, Siwei Ma, Tiejun Huang, Jian K. Liu
Abstract	Neural coding is one of the central questions in systems neuroscience for understanding how the brain processes stimuli from the environment. Moreover, it is a cornerstone for designing algorithms for brain-machine interfaces, where decoding incoming stimuli is highly demanded for better performance of physical devices. Traditionally researchers have focused on functional magnetic resonance imaging (fMRI) data as the neural signals of interest for decoding visual scenes. However, our visual perception operates on a fast, millisecond time scale through events termed neural spikes. There are few studies of decoding using spikes. Here we fulfill this aim by developing a novel decoding framework based on deep neural networks, named spike-image decoder (SID), for reconstructing natural visual scenes, including static images and dynamic videos, from experimentally recorded spikes of a population of retinal ganglion cells. The SID is an end-to-end decoder with one end as neural spikes and the other end as images, which can be trained directly such that visual scenes are reconstructed from spikes in a highly accurate fashion. Our SID also outperforms existing fMRI decoding models on the reconstruction of visual stimuli. In addition, with the aid of a spike encoder, we show that SID can be generalized to arbitrary visual scenes by using the image datasets of MNIST, CIFAR10, and CIFAR100. Furthermore, with a pre-trained SID, one can decode any dynamic videos to achieve real-time encoding and decoding of visual scenes by spikes. Altogether, our results shed new light on neuromorphic computing for artificial visual systems, such as event-based visual cameras and visual neuroprostheses.
Tasks
Published	2019-04-30
URL	https://arxiv.org/abs/1904.13007v2
PDF	https://arxiv.org/pdf/1904.13007v2.pdf
PWC	https://paperswithcode.com/paper/reconstruction-of-natural-visual-scenes-from
Repo
Framework

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning


Title	SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning
Authors	Linxi Fan, Yuke Zhu, Jiren Zhu, Zihua Liu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei
Abstract	We present an overview of SURREAL-System, a reproducible, flexible, and scalable framework for distributed reinforcement learning (RL). The framework consists of a stack of four layers: Provisioner, Orchestrator, Protocol, and Algorithms. The Provisioner abstracts away the machine hardware and node pools across different cloud providers. The Orchestrator provides a unified interface for scheduling and deploying distributed algorithms by high-level description, which is capable of deploying to a wide range of hardware from a personal laptop to full-fledged cloud clusters. The Protocol provides network communication primitives optimized for RL. Finally, the SURREAL algorithms, such as Proximal Policy Optimization (PPO) and Evolution Strategies (ES), can easily scale to 1000s of CPU cores and 100s of GPUs. The learning performances of our distributed algorithms establish new state-of-the-art on OpenAI Gym and Robotics Suites tasks.
Tasks
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12989v2
PDF	https://arxiv.org/pdf/1909.12989v2.pdf
PWC	https://paperswithcode.com/paper/surreal-system-fully-integrated-stack-for
Repo
Framework