October 21, 2019

3119 words 15 mins read

Paper Group AWR 121

Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

Title Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning
Authors Paul Gay, Stuart James, Alessio Del Bue
Abstract Recent approaches to visual scene understanding attempt to build a scene graph – a computational representation of objects and their pairwise relationships. Such a rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. In contrast, an image sequence conveys useful information through the multi-view geometric relations arising from camera motion. Indeed, in such cases, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. This compelling representation is obtained using a new model in which geometric and visual features are merged within an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.
Tasks Graph Generation, Scene Graph Generation, Scene Understanding
Published 2018-07-16
URL http://arxiv.org/abs/1807.05933v2
PDF http://arxiv.org/pdf/1807.05933v2.pdf
PWC https://paperswithcode.com/paper/visual-graphs-from-motion-vgfm-scene
Repo https://github.com/paulgay/VGfM
Framework tf
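
The model merges per-frame geometric and visual features with an RNN. Below is a minimal PyTorch sketch of that fusion idea, not the authors' implementation: the feature sizes, the concatenation-based fusion, the GRU, and the relation-classification head are all illustrative assumptions.

```python
# Hedged sketch: fuse per-frame visual and geometric features of an object
# pair with a GRU and classify their relationship from the final state.
# All dimensions are placeholders, not the paper's values.
import torch
import torch.nn as nn

class GeomVisualRelationRNN(nn.Module):
    def __init__(self, visual_dim=512, geom_dim=18, hidden_dim=256, num_relations=10):
        super().__init__()
        self.rnn = nn.GRU(visual_dim + geom_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_relations)

    def forward(self, visual_seq, geom_seq):
        # visual_seq: (batch, frames, visual_dim) appearance features
        # geom_seq:   (batch, frames, geom_dim)   e.g. 3D box/ellipsoid parameters
        fused = torch.cat([visual_seq, geom_seq], dim=-1)
        _, h = self.rnn(fused)                # final hidden state summarizes the sequence
        return self.classifier(h.squeeze(0))  # relationship logits per object pair

model = GeomVisualRelationRNN()
print(model(torch.randn(4, 8, 512), torch.randn(4, 8, 18)).shape)  # torch.Size([4, 10])
```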

Deep Sequence Learning with Auxiliary Information for Traffic Prediction

Title Deep Sequence Learning with Auxiliary Information for Traffic Prediction
Authors Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, Fei Wu
Abstract Predicting traffic conditions from online route queries is a challenging task, as there are many complicated interactions among the roads and crowds involved. In this paper, we intend to improve traffic prediction by appropriately integrating three kinds of implicit but essential factors encoded in auxiliary information. We do this within an encoder-decoder sequence learning framework that integrates the following data: 1) offline geographical and social attributes, for example, the geographical structure of roads or public social events such as national celebrations; 2) road intersection information, since traffic congestion generally occurs at major junctions; 3) online crowd queries, for example, when many online queries are issued for the same destination due to a public performance, the traffic around that destination will potentially become heavier after a while. Qualitative and quantitative experiments on a real-world dataset from Baidu have demonstrated the effectiveness of our framework.
Tasks Traffic Prediction
Published 2018-06-13
URL http://arxiv.org/abs/1806.07380v1
PDF http://arxiv.org/pdf/1806.07380v1.pdf
PWC https://paperswithcode.com/paper/deep-sequence-learning-with-auxiliary
Repo https://github.com/JingqingZ/BaiduTraffic
Framework tf
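
As a rough illustration of the pattern the abstract describes, here is a hedged PyTorch sketch of an encoder-decoder forecaster conditioned on auxiliary inputs (geo/social attributes, intersection flags, query counts). The dimensions, the concatenation-based conditioning, and the autoregressive decoding loop are assumptions, not the paper's architecture.

```python
# Hedged sketch: seq2seq traffic forecaster that concatenates auxiliary
# features to both encoder and decoder inputs. Dimensions are illustrative.
import torch
import torch.nn as nn

class AuxSeq2Seq(nn.Module):
    def __init__(self, aux_dim=16, hidden=64, horizon=6):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(1 + aux_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(1 + aux_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, speed, aux):
        # speed: (batch, T, 1) past traffic speed; aux: (batch, T + horizon, aux_dim)
        T = speed.size(1)
        _, h = self.encoder(torch.cat([speed, aux[:, :T]], -1))
        y, preds = speed[:, -1:], []
        for t in range(self.horizon):   # autoregressive decoding
            step = torch.cat([y, aux[:, T + t : T + t + 1]], -1)
            o, h = self.decoder(step, h)
            y = self.out(o)
            preds.append(y)
        return torch.cat(preds, dim=1)  # (batch, horizon, 1)

model = AuxSeq2Seq()
print(model(torch.randn(2, 24, 1), torch.randn(2, 30, 16)).shape)  # (2, 6, 1)
```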

Integrating kinematics and environment context into deep inverse reinforcement learning for predicting off-road vehicle trajectories

Title Integrating kinematics and environment context into deep inverse reinforcement learning for predicting off-road vehicle trajectories
Authors Yanfu Zhang, Wenshan Wang, Rogerio Bonatti, Daniel Maturana, Sebastian Scherer
Abstract Predicting the motion of a mobile agent from a third-person perspective is an important component for many robotics applications, such as autonomous navigation and tracking. With accurate motion prediction of other agents, robots can plan for more intelligent behaviors to achieve specified objectives, instead of acting in a purely reactive way. Previous work addresses motion prediction either by filtering kinematics alone or by using hand-designed and learned representations of the environment. Instead of separating kinematic and environmental context, we propose a novel approach that integrates both into an inverse reinforcement learning (IRL) framework for trajectory prediction. Rather than exponentially increasing the state-space complexity with kinematics, we propose a two-stage neural network architecture that considers motion and environment together to recover the reward function. The first-stage network learns feature representations of the environment using low-level LiDAR statistics, and the second-stage network combines those learned features with kinematics data. We collected over 30 km of off-road driving data and validated experimentally that our method can effectively extract useful environmental and kinematic features. We generate accurate predictions of the distribution of future trajectories of the vehicle, encoding complex behaviors such as multi-modal distributions at road intersections, and even show different predictions at the same intersection depending on the vehicle’s speed.
Tasks Autonomous Navigation, motion prediction, Trajectory Prediction
Published 2018-10-16
URL http://arxiv.org/abs/1810.07225v1
PDF http://arxiv.org/pdf/1810.07225v1.pdf
PWC https://paperswithcode.com/paper/integrating-kinematics-and-environment
Repo https://github.com/yfzhang/vehicle-motion-forecasting
Framework pytorch
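
The two-stage idea lends itself to a compact sketch: a CNN maps LiDAR-statistic grids to environment features, which a second stage combines with kinematic inputs to produce a spatial reward map for IRL. This is a hedged PyTorch illustration; channel counts, the broadcast-and-concatenate fusion, and the 1x1 reward head are assumptions.

```python
# Hedged sketch of a two-stage reward network for IRL: stage 1 extracts
# environment features from LiDAR grids, stage 2 fuses them with kinematics.
import torch
import torch.nn as nn

class TwoStageReward(nn.Module):
    def __init__(self, lidar_channels=4, kin_dim=2, feat=16):
        super().__init__()
        # Stage 1: environment features from low-level LiDAR statistics
        self.env = nn.Sequential(
            nn.Conv2d(lidar_channels, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Stage 2: fuse environment features with kinematics (e.g. speed, yaw rate)
        self.head = nn.Conv2d(feat + kin_dim, 1, 1)

    def forward(self, lidar_grid, kinematics):
        # lidar_grid: (B, C, H, W); kinematics: (B, kin_dim)
        f = self.env(lidar_grid)
        k = kinematics[:, :, None, None].expand(-1, -1, *f.shape[-2:])
        return self.head(torch.cat([f, k], dim=1))  # per-cell reward map (B, 1, H, W)

r = TwoStageReward()(torch.randn(1, 4, 32, 32), torch.randn(1, 2))
print(r.shape)  # torch.Size([1, 1, 32, 32])
```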

Neural Non-Stationary Spectral Kernel

Title Neural Non-Stationary Spectral Kernel
Authors Sami Remes, Markus Heinonen, Samuel Kaski
Abstract Standard kernels such as Matérn or RBF kernels only encode simple monotonic dependencies within the input space. Spectral mixture kernels have been proposed as general-purpose, flexible kernels for learning and discovering more complicated patterns in the data. Spectral mixture kernels have recently been generalized into non-stationary kernels by replacing the mixture weights, frequency means and variances by input-dependent functions. These functions have also been modelled as Gaussian processes on their own. In this paper we propose modelling the hyperparameter functions with neural networks, and provide an experimental comparison between the stationary spectral mixture and the two non-stationary spectral mixtures. Scalable Gaussian process inference is implemented within the sparse variational framework for all the kernels considered. We show that the neural variant of the kernel is able to achieve the best performance, among alternatives, on several benchmark datasets.
Tasks Gaussian Processes
Published 2018-11-27
URL http://arxiv.org/abs/1811.10978v1
PDF http://arxiv.org/pdf/1811.10978v1.pdf
PWC https://paperswithcode.com/paper/neural-non-stationary-spectral-kernel
Repo https://github.com/sremes/nssm-gp
Framework tf
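
To make the core idea concrete, here is a simplified 1D illustration of a spectral mixture kernel whose weights and frequencies are produced by a small neural network, which is what makes it non-stationary. This is not the paper's exact kernel: the exponential (Gibbs) envelope of the full construction is omitted for brevity, and the parameterization below is an assumption.

```python
# Simplified 1D illustration (NOT the paper's exact kernel): a small network
# outputs input-dependent mixture weights and frequencies.
import torch
import torch.nn as nn
import torch.nn.functional as F

Q = 3  # number of spectral components

hyper_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2 * Q))

def nonstationary_sm_kernel(x1, x2):
    # x1: (n, 1), x2: (m, 1) -> (n, m) kernel matrix
    h1, h2 = hyper_net(x1), hyper_net(x2)
    w1, mu1 = torch.softmax(h1[:, :Q], -1), F.softplus(h1[:, Q:])
    w2, mu2 = torch.softmax(h2[:, :Q], -1), F.softplus(h2[:, Q:])
    k = torch.zeros(x1.shape[0], x2.shape[0])
    for q in range(Q):
        phase = 2 * torch.pi * (mu1[:, q:q+1] * x1 - (mu2[:, q:q+1] * x2).T)
        k = k + w1[:, q:q+1] * w2[:, q:q+1].T * torch.cos(phase)
    return k

x = torch.linspace(0, 1, 5).unsqueeze(-1)
print(nonstationary_sm_kernel(x, x).shape)  # torch.Size([5, 5])
```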

A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods

Title A snapshot on nonstandard supervised learning problems: taxonomy, relationships and methods
Authors David Charte, Francisco Charte, Salvador García, Francisco Herrera
Abstract Machine learning is a field which studies how machines can alter and adapt their behavior, improving their actions according to the information they are given. This field is subdivided into multiple areas, among which the best known are supervised learning (e.g. classification and regression) and unsupervised learning (e.g. clustering and association rules). Within supervised learning, most studies and research focus on well-known standard tasks, such as binary classification, multiclass classification and regression with one dependent variable. However, there are many other, lesser-known problems. These are what we generically call nonstandard supervised learning problems. The literature about them is much sparser, and each study is directed at a specific task. Therefore, the definitions, relations and applications of this kind of learner are hard to find. The goal of this paper is to provide the reader with a broad view of the distinct variations of nonstandard supervised problems. A comprehensive taxonomy summarizing their traits is proposed. A review of the common approaches followed to accomplish them and their main applications is provided as well.
Tasks
Published 2018-11-29
URL http://arxiv.org/abs/1811.12044v1
PDF http://arxiv.org/pdf/1811.12044v1.pdf
PWC https://paperswithcode.com/paper/a-snapshot-on-nonstandard-supervised-learning
Repo https://github.com/fdavidcl/q
Framework none

AP18-OLR Challenge: Three Tasks and Their Baselines

Title AP18-OLR Challenge: Three Tasks and Their Baselines
Authors Zhiyuan Tang, Dong Wang, Qing Chen
Abstract The third oriental language recognition (OLR) challenge, AP18-OLR, is introduced in this paper, including the data profile, the tasks and the evaluation principles. Following the events of the last two years, namely AP16-OLR and AP17-OLR, this year’s challenge focuses on more challenging tasks, including (1) short-duration utterances, (2) confusing languages, and (3) open-set recognition. As in the previous events, the data of AP18-OLR is provided by SpeechOcean and the NSFC M2ASR project. Baselines based on both the i-vector model and neural networks are constructed for the participants’ reference. We report the baseline results on the three tasks and demonstrate that the three tasks are truly challenging. All the data is free for participants, and the Kaldi recipes for the baselines have been published online.
Tasks Open Set Learning
Published 2018-06-02
URL http://arxiv.org/abs/1806.00616v1
PDF http://arxiv.org/pdf/1806.00616v1.pdf
PWC https://paperswithcode.com/paper/ap18-olr-challenge-three-tasks-and-their
Repo https://github.com/Rithmax/Sub-band-Envelope-Features-Using-Frequency-Domain-Linear-Prediction
Framework none
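
For context on what an i-vector baseline involves, here is a hedged sketch of cosine scoring of a test i-vector against per-language mean vectors, with a threshold for the open-set (out-of-set) decision. The vectors, dimensions, and threshold below are synthetic placeholders; real systems tune the threshold on development data.

```python
# Hedged sketch: cosine-scoring language identification with an open-set
# rejection threshold. All data below is synthetic.
import numpy as np

def cosine_scores(v, centroids):
    v = v / np.linalg.norm(v)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return c @ v  # one score per enrolled language

rng = np.random.default_rng(0)
language_means = rng.normal(size=(10, 400))  # 10 enrolled languages, 400-dim i-vectors
test_ivector = rng.normal(size=400)

scores = cosine_scores(test_ivector, language_means)
best = int(np.argmax(scores))
THRESHOLD = 0.1  # would be tuned on development data in practice
print(best if scores[best] > THRESHOLD else "out-of-set")
```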

Evaluating Compositionality in Sentence Embeddings

Title Evaluating Compositionality in Sentence Embeddings
Authors Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman
Abstract An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, ‘natural language inference’ (NLI), that cannot be solved using only word-level knowledge and requires some compositionality. We find that the performance of state-of-the-art sentence embeddings (InferSent; Conneau et al., 2017) on our new dataset is poor. We analyze the decision rules learned by InferSent and find that they are consistent with simple heuristics that are ecologically valid in its training dataset. Further, we find that augmenting training with our dataset improves test performance on our dataset without loss of performance on the original training dataset. This highlights the importance of structured datasets in better understanding and improving AI systems.
Tasks Natural Language Inference, Sentence Embeddings
Published 2018-02-12
URL http://arxiv.org/abs/1802.04302v2
PDF http://arxiv.org/pdf/1802.04302v2.pdf
PWC https://paperswithcode.com/paper/evaluating-compositionality-in-sentence
Repo https://github.com/ishita-dg/ScrambleTests
Framework pytorch
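
A tiny sketch of the kind of failure the paper probes: any model whose sentence representation is insensitive to word order cannot distinguish pairs that contain the same words in different arrangements. The example sentences below are illustrative, not drawn from the paper's dataset.

```python
# Hedged sketch: a bag-of-words representation collapses two sentences that
# differ only in word order, even when one contradicts the other.
from collections import Counter

def bag_of_words(sentence):
    return Counter(sentence.lower().split())

premise = "the woman is taller than the man"
hypothesis = "the man is taller than the woman"  # contradiction, same words

print(bag_of_words(premise) == bag_of_words(hypothesis))  # True: indistinguishable
```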

An Empirical Study of Example Forgetting during Deep Neural Network Learning

Title An Empirical Study of Example Forgetting during Deep Neural Network Learning
Authors Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon
Abstract Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a ‘forgetting event’ to have occurred when an individual training example transitions from being classified correctly to incorrectly over the course of learning. Across several benchmark data sets, we find that: (i) certain examples are forgotten with high frequency, and some not at all; (ii) a data set’s (un)forgettable examples generalize across neural architectures; and (iii) based on forgetting dynamics, a significant fraction of examples can be omitted from the training data set while still maintaining state-of-the-art generalization performance.
Tasks
Published 2018-12-12
URL https://arxiv.org/abs/1812.05159v3
PDF https://arxiv.org/pdf/1812.05159v3.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-of-example-forgetting
Repo https://github.com/mtoneva/example_forgetting
Framework pytorch
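
The paper's central statistic is easy to state in code: a forgetting event is counted whenever an example goes from classified correctly at one logged step to incorrectly at the next. Below is a minimal sketch; `acc_history` would be collected during training, and the toy values here are illustrative.

```python
# Minimal sketch: count correct -> incorrect transitions per training example.
# Rows are examples, columns are epochs; 1 = classified correctly.
import numpy as np

acc_history = np.array([
    [1, 1, 1, 1, 1],  # never misclassified, hence never forgotten
    [0, 1, 0, 1, 1],  # forgotten once (correct at epoch 1, incorrect at epoch 2)
    [0, 0, 0, 0, 0],  # never learned
], dtype=int)

forgetting_events = np.sum(
    (acc_history[:, :-1] == 1) & (acc_history[:, 1:] == 0), axis=1)
print(forgetting_events)  # [0 1 0]
```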

MS-UNIQUE: Multi-model and Sharpness-weighted Unsupervised Image Quality Estimation

Title MS-UNIQUE: Multi-model and Sharpness-weighted Unsupervised Image Quality Estimation
Authors Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib
Abstract In this paper, we train independent linear decoder models to estimate the perceived quality of images. More specifically, we calculate the responses of individual non-overlapping image patches to each of the decoders and scale these responses based on the sharpness characteristics of the filter set. We use multiple linear decoders to capture different abstraction levels of the image patches. Training each model is carried out on 100,000 image patches from the ImageNet database in an unsupervised fashion. Color space selection and ZCA whitening are performed over these patches to enhance the descriptiveness of the data. The proposed quality estimator is tested on the LIVE and the TID 2013 image quality assessment databases. Performance of the proposed method is compared against eleven other state-of-the-art methods in terms of accuracy, consistency, linearity, and monotonic behavior. Based on experimental results, the proposed method is generally among the top performing quality estimators in all categories.
Tasks Image Quality Assessment, Image Quality Estimation
Published 2018-11-21
URL http://arxiv.org/abs/1811.08947v1
PDF http://arxiv.org/pdf/1811.08947v1.pdf
PWC https://paperswithcode.com/paper/ms-unique-multi-model-and-sharpness-weighted
Repo https://github.com/olivesgatech/MS-UNIQUE
Framework none
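
Since the abstract names ZCA whitening as a preprocessing step, here is a hedged sketch of that operation on image patches. The patch size and epsilon are illustrative choices, not the paper's settings.

```python
# Hedged sketch: ZCA whitening of flattened image patches. After whitening,
# the empirical covariance is approximately the identity.
import numpy as np

def zca_whiten(patches, eps=1e-5):
    # patches: (n_patches, n_pixels)
    patches = patches - patches.mean(axis=0)
    cov = patches.T @ patches / patches.shape[0]
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # ZCA transform
    return patches @ W

rng = np.random.default_rng(0)
white = zca_whiten(rng.normal(size=(1000, 64)))  # e.g. 8x8 patches
print(np.allclose(np.cov(white, rowvar=False), np.eye(64), atol=1e-1))  # True
```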

DeepSphere: Efficient spherical Convolutional Neural Network with HEALPix sampling for cosmological applications

Title DeepSphere: Efficient spherical Convolutional Neural Network with HEALPix sampling for cosmological applications
Authors Nathanaël Perraudin, Michaël Defferrard, Tomasz Kacprzak, Raphael Sgier
Abstract Convolutional Neural Networks (CNNs) are a cornerstone of the Deep Learning toolbox and have led to many breakthroughs in Artificial Intelligence. These networks have mostly been developed for regular Euclidean domains such as those supporting images, audio, or video. Because of their success, CNN-based methods are becoming increasingly popular in Cosmology. Cosmological data often comes as spherical maps, which make the use of the traditional CNNs more complicated. The commonly used pixelization scheme for spherical maps is the Hierarchical Equal Area isoLatitude Pixelisation (HEALPix). We present a spherical CNN for analysis of full and partial HEALPix maps, which we call DeepSphere. The spherical CNN is constructed by representing the sphere as a graph. Graphs are versatile data structures that can act as a discrete representation of a continuous manifold. Using the graph-based representation, we define many of the standard CNN operations, such as convolution and pooling. With filters restricted to being radial, our convolutions are equivariant to rotation on the sphere, and DeepSphere can be made invariant or equivariant to rotation. This way, DeepSphere is a special case of a graph CNN, tailored to the HEALPix sampling of the sphere. This approach is computationally more efficient than using spherical harmonics to perform convolutions. We demonstrate the method on a classification problem of weak lensing mass maps from two cosmological models and compare the performance of the CNN with that of two baseline classifiers. The results show that the performance of DeepSphere is always superior or equal to both of these baselines. For high noise levels and for data covering only a smaller fraction of the sphere, DeepSphere achieves typically 10% better classification accuracy than those baselines. Finally, we show how learned filters can be visualized to introspect the neural network.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.12186v2
PDF http://arxiv.org/pdf/1810.12186v2.pdf
PWC https://paperswithcode.com/paper/deepsphere-efficient-spherical-convolutional
Repo https://github.com/SwissDataScienceCenter/DeepSphere
Framework tf
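
The graph construction behind DeepSphere can be sketched directly: HEALPix pixels become vertices, each connected to its (up to) eight neighbours, and maps are filtered with a polynomial of the graph Laplacian. The polynomial filter below is a plain stand-in for the Chebyshev graph convolutions used in the paper, and the coefficients are arbitrary; it assumes `healpy` and `scipy` are installed.

```python
# Hedged sketch: build the HEALPix neighbour graph and apply a Laplacian-
# polynomial filter to a toy spherical map.
import numpy as np
import healpy as hp
import scipy.sparse as sp

nside = 4
npix = hp.nside2npix(nside)  # 192 pixels

rows, cols = [], []
for ipix in range(npix):
    for nb in hp.get_all_neighbours(nside, ipix):
        if nb >= 0:  # -1 marks a missing neighbour
            rows.append(ipix)
            cols.append(nb)
A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(npix, npix))
A = (A + A.T) / 2                                    # enforce symmetry
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A  # combinatorial Laplacian

x = np.random.randn(npix)    # a toy map on the sphere
theta = [0.5, -0.2, 0.05]    # filter coefficients (learned in practice)
y, Lx = np.zeros(npix), x.copy()
for t in theta:              # y = sum_k theta_k * L^k x
    y += t * Lx
    Lx = L @ Lx
print(y.shape)  # (192,)
```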

3D MRI brain tumor segmentation using autoencoder regularization

Title 3D MRI brain tumor segmentation using autoencoder regularization
Authors Andriy Myronenko
Abstract Automated segmentation of brain tumors from 3D magnetic resonance images (MRIs) is necessary for the diagnosis, monitoring, and treatment planning of the disease. Manual delineation practices require anatomical knowledge, are expensive and time-consuming, and can be inaccurate due to human error. Here, we describe a semantic segmentation network for tumor subregion segmentation from 3D MRIs based on an encoder-decoder architecture. Due to a limited training dataset size, a variational auto-encoder branch is added to reconstruct the input image itself in order to regularize the shared decoder and impose additional constraints on its layers. The current approach won 1st place in the BraTS 2018 challenge.
Tasks Brain Tumor Segmentation, Semantic Segmentation
Published 2018-10-27
URL http://arxiv.org/abs/1810.11654v3
PDF http://arxiv.org/pdf/1810.11654v3.pdf
PWC https://paperswithcode.com/paper/3d-mri-brain-tumor-segmentation-using
Repo https://github.com/IAmSuyogJadhav/3d-mri-brain-tumor-segmentation-using-autoencoder-regularization
Framework none
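
The training objective described in the abstract combines a segmentation loss with the VAE branch's reconstruction and KL terms. Here is a hedged PyTorch sketch of that combination; the soft Dice formulation and the 0.1 weights on the VAE terms are assumptions for illustration.

```python
# Hedged sketch: segmentation (soft Dice) loss plus VAE reconstruction (L2)
# and KL terms that regularize the shared encoder. Weights are illustrative.
import torch

def soft_dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - 2 * inter / (pred.pow(2).sum() + target.pow(2).sum() + eps)

def total_loss(seg_pred, seg_target, recon, image, mu, logvar):
    l_dice = soft_dice_loss(seg_pred, seg_target)
    l_l2 = torch.mean((recon - image) ** 2)                          # VAE reconstruction
    l_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
    return l_dice + 0.1 * l_l2 + 0.1 * l_kl

# Toy tensors standing in for network outputs on a small 3D volume
p, t = torch.rand(1, 1, 8, 8, 8), (torch.rand(1, 1, 8, 8, 8) > 0.5).float()
r, im = torch.rand(1, 1, 8, 8, 8), torch.rand(1, 1, 8, 8, 8)
mu, logvar = torch.zeros(1, 128), torch.zeros(1, 128)
print(total_loss(p, t, r, im, mu, logvar))
```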

Barren plateaus in quantum neural network training landscapes

Title Barren plateaus in quantum neural network training landscapes
Authors Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, Hartmut Neven
Abstract Many experimental proposals for noisy intermediate-scale quantum devices involve training a parameterized quantum circuit with a classical optimization loop. Such hybrid quantum-classical algorithms are popular for applications in quantum simulation, optimization, and machine learning. Due to their simplicity and hardware efficiency, random circuits are often proposed as initial guesses for exploring the space of quantum states. We show that the exponential dimension of Hilbert space and the gradient estimation complexity make this choice unsuitable for hybrid quantum-classical algorithms run on more than a few qubits. Specifically, we show that for a wide class of reasonable parameterized quantum circuits, the probability that the gradient along any reasonable direction is non-zero to some fixed precision is exponentially small as a function of the number of qubits. We argue that this is related to the 2-design characteristic of random circuits, and that solutions to this problem must be studied.
Tasks
Published 2018-03-29
URL http://arxiv.org/abs/1803.11173v1
PDF http://arxiv.org/pdf/1803.11173v1.pdf
PWC https://paperswithcode.com/paper/barren-plateaus-in-quantum-neural-network
Repo https://github.com/XanaduAI/qml/blob/master/implementations/tutorial_barren_plateaus.py
Framework none
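
The linked repo is a PennyLane tutorial, so here is a hedged sketch in the same spirit: estimate the variance of one gradient component over randomly initialized layered circuits and watch it shrink as qubits are added. The circuit template (RY rotations plus a CZ entangling ladder) and sample counts are illustrative choices, not the paper's exact ensemble.

```python
# Hedged sketch: gradient variance of a random parameterized circuit decays
# with qubit count. Assumes PennyLane is installed.
import numpy as np
import pennylane as qml
from pennylane import numpy as pnp

def grad_variance(n_qubits, n_layers=4, n_samples=50):
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def circuit(params):
        for l in range(n_layers):
            for w in range(n_qubits):
                qml.RY(params[l, w], wires=w)
            for w in range(n_qubits - 1):  # entangling ladder
                qml.CZ(wires=[w, w + 1])
        return qml.expval(qml.PauliZ(0))

    grad_fn = qml.grad(circuit)
    samples = []
    for _ in range(n_samples):
        params = pnp.array(np.random.uniform(0, 2 * np.pi, (n_layers, n_qubits)),
                           requires_grad=True)
        samples.append(grad_fn(params)[0, 0])  # one fixed gradient component
    return np.var(samples)

for n in (2, 4, 6):
    print(n, grad_variance(n))  # variance shrinks as n grows
```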

Bringing Alive Blurred Moments

Title Bringing Alive Blurred Moments
Authors Kuldeep Purohit, Anshul Shah, A. N. Rajagopalan
Abstract We present a solution to the problem of extracting a video from a single motion-blurred image, sequentially reconstructing the clear views of a scene as beheld by the camera during the time of exposure. We first learn a motion representation from sharp videos in an unsupervised manner by training a convolutional recurrent video autoencoder network on the surrogate task of video reconstruction. Once trained, it is employed for guided training of a motion encoder for blurred images. This network extracts embedded motion information from the blurred image to generate a sharp video in conjunction with the trained recurrent video decoder. As an intermediate step, we also design an efficient architecture that enables real-time single image deblurring and outperforms competing methods across all factors: accuracy, speed, and compactness. Experiments on real scenes and standard datasets demonstrate the superiority of our framework over the state of the art and its ability to generate a plausible sequence of temporally consistent sharp frames.
Tasks Deblurring, Video Reconstruction
Published 2018-04-09
URL http://arxiv.org/abs/1804.02913v2
PDF http://arxiv.org/pdf/1804.02913v2.pdf
PWC https://paperswithcode.com/paper/bringing-alive-blurred-moments
Repo https://github.com/anshulbshah/Blurred-Image-to-Video
Framework none
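
A hedged sketch of the guided-training step described in the abstract: a motion encoder for blurred images is trained to match the motion embedding that a pretrained recurrent video autoencoder extracts from the corresponding sharp video. The placeholder architectures, the MSE matching loss, and all dimensions below are assumptions, not the paper's design.

```python
# Hedged sketch: train a blurred-image motion encoder against the frozen
# video encoder's embedding. Architectures are toy placeholders.
import torch
import torch.nn as nn

video_encoder = nn.GRU(input_size=64, hidden_size=128, batch_first=True)  # pretrained, frozen
blur_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))       # being trained

sharp_video_feats = torch.randn(8, 10, 64)  # per-frame features of sharp videos
blurred_image = torch.randn(8, 1, 64, 64)   # corresponding blurred images

with torch.no_grad():
    _, h = video_encoder(sharp_video_feats)
    target_motion = h.squeeze(0)            # motion embedding from the sharp video

pred_motion = blur_encoder(blurred_image)
loss = nn.functional.mse_loss(pred_motion, target_motion)
loss.backward()
print(loss.item())
```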

Blind Ptychography by Douglas-Rachford Splitting

Title Blind Ptychography by Douglas-Rachford Splitting
Authors A. Fannjiang, Z. Zhang
Abstract Blind ptychography is the scanning version of coherent diffractive imaging which seeks to recover both the object and the probe simultaneously. Based on alternating minimization by Douglas-Rachford splitting, AMDRS is a blind ptychographic algorithm informed by the uniqueness theory, the Poisson noise model and the stability analysis. Enhanced by the initialization method and the use of a randomly phased mask, AMDRS converges globally and geometrically. Three boundary conditions are considered in the simulations: periodic, dark-field and bright-field boundary conditions. The dark-field boundary condition is suited for isolated objects, while the bright-field boundary condition is for non-isolated objects. The periodic boundary condition is a mathematically convenient reference point. Depending on the availability of the boundary prior, the dark-field and the bright-field boundary conditions may or may not be enforced in the reconstruction. Not surprisingly, enforcing the boundary condition improves the rate of convergence, sometimes in a significant way. Enforcing the bright-field condition in the reconstruction can also remove the linear phase ambiguity.
Tasks
Published 2018-08-26
URL http://arxiv.org/abs/1809.00962v3
PDF http://arxiv.org/pdf/1809.00962v3.pdf
PWC https://paperswithcode.com/paper/blind-ptychography-by-douglas-rachford
Repo https://github.com/AnotherdayBeaux/Blind_Ptychography_GUI
Framework none
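
For readers unfamiliar with the splitting scheme the method builds on, here is the generic Douglas-Rachford iteration on a toy two-set feasibility problem in R^2 (a line and a disk). This is context only, not the blind-ptychography algorithm itself; the sets and iteration count are arbitrary.

```python
# Generic Douglas-Rachford splitting on a toy convex feasibility problem:
# find a point on the line x0 + x1 = 2 that also lies in a disk.
import numpy as np

def proj_line(y):            # projection onto {x : x[0] + x[1] = 2}
    return y - (y.sum() - 2) / 2

def proj_disk(y, r=1.5):     # projection onto the disk of radius r
    n = np.linalg.norm(y)
    return y if n <= r else y * (r / n)

y = np.array([5.0, -3.0])
for _ in range(100):
    pa = proj_line(y)
    y = y + proj_disk(2 * pa - y) - pa  # Douglas-Rachford update
x = proj_line(y)                        # solution estimate
print(x, np.linalg.norm(x))             # on the line, inside the disk
```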

Deep CT to MR Synthesis using Paired and Unpaired Data

Title Deep CT to MR Synthesis using Paired and Unpaired Data
Authors Cheng-Bin Jin, Hakil Kim, Wonmo Jung, Seongsu Joo, Ensik Park, Ahn Young Saem, In Ho Han, Jae Il Lee, Xuenan Cui
Abstract MR imaging will play a very important role in radiotherapy treatment planning for the segmentation of tumor volumes and organs. However, the use of MR-based radiotherapy is limited by its high cost and by the increased use of metal implants, such as cardiac pacemakers and artificial joints, in an aging society. To improve the accuracy of CT-based radiotherapy planning, we propose a synthetic approach that translates a CT image into an MR image using paired and unpaired training data. In contrast to current synthetic methods for medical images, which depend on sparse pairwise-aligned data or plentiful unpaired data, the proposed approach alleviates the rigid registration challenge of paired training and overcomes the context-misalignment problem of unpaired training. A generative adversarial network was trained to transform 2D brain CT image slices into 2D brain MR image slices, combining adversarial loss, dual cycle-consistent loss, and voxel-wise loss. The experiments were analyzed using CT and MR images of 202 patients. Qualitative and quantitative comparisons against independent paired-training and unpaired-training methods demonstrate the superiority of our approach.
Tasks
Published 2018-05-28
URL http://arxiv.org/abs/1805.10790v2
PDF http://arxiv.org/pdf/1805.10790v2.pdf
PWC https://paperswithcode.com/paper/deep-ct-to-mr-synthesis-using-paired-and
Repo https://github.com/ChengBinJin/MRGAN-TensorFlow
Framework tf
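
The abstract names three generator loss terms: adversarial, dual cycle-consistent, and voxel-wise. Here is a hedged PyTorch sketch of how such a combined objective might look; the L1 formulations, loss weights, and toy tensors are assumptions, not the paper's values.

```python
# Hedged sketch: combined generator objective with adversarial, dual
# cycle-consistency, and voxel-wise terms. Weights are illustrative.
import torch
import torch.nn.functional as F

def generator_loss(d_fake_mr, fake_mr, real_mr, cycled_ct, real_ct,
                   cycled_mr, lam_cyc=10.0, lam_vox=10.0):
    l_adv = F.binary_cross_entropy_with_logits(d_fake_mr, torch.ones_like(d_fake_mr))
    l_cyc = F.l1_loss(cycled_ct, real_ct) + F.l1_loss(cycled_mr, real_mr)  # dual cycles
    l_vox = F.l1_loss(fake_mr, real_mr)  # usable on the paired portion of the data
    return l_adv + lam_cyc * l_cyc + lam_vox * l_vox

# Toy tensors standing in for discriminator/generator outputs on 2D slices
d = torch.randn(2, 1, 8, 8)
imgs = [torch.rand(2, 1, 64, 64) for _ in range(5)]
print(generator_loss(d, *imgs))
```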