April 1, 2020

Paper Group ANR 434

On Recoverability of Randomly Compressed Tensors with Low CP Rank


Title	On Recoverability of Randomly Compressed Tensors with Low CP Rank
Authors	Shahana Ibrahim, Xiao Fu, Xingguo Li
Abstract	Our interest lies in the recoverability properties of compressed tensors under the canonical polyadic decomposition (CPD) model. The considered problem is well-motivated in many applications, e.g., hyperspectral image and video compression. Prior work studied this problem under somewhat special assumptions, e.g., that the latent factors of the tensor are sparse or drawn from absolutely continuous distributions. We offer an alternative result: we show that if the tensor is compressed by a subgaussian linear mapping, then the tensor is recoverable if the number of measurements is on the same order of magnitude as the number of model parameters, without strong assumptions on the latent factors. Our proof is based on deriving a restricted isometry property (R.I.P.) under the CPD model via set covering techniques, and thus exhibits a flavor of classic compressive sensing. The new recoverability result enriches the understanding of the compressed CP tensor recovery problem; it offers theoretical guarantees for recovering tensors whose elements are not necessarily continuous or sparse.
Tasks	Compressive Sensing, Video Compression
Published	2020-01-08
URL	https://arxiv.org/abs/2001.02370v1
PDF	https://arxiv.org/pdf/2001.02370v1.pdf
PWC	https://paperswithcode.com/paper/on-recoverability-of-randomly-compressed
Repo
Framework
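
To make the measurement-versus-parameter comparison in the abstract above concrete, here is a minimal sketch rather than the authors' code. It builds a rank-R third-order CP tensor from random factors and compresses its vectorization with a Gaussian (hence subgaussian) matrix. The tensor sizes, the helper names `cp_tensor` and `compress`, the Gaussian choice, and the constant 2 in the measurement count are all illustrative assumptions.

```python
import random


def cp_tensor(A, B, C):
    """Rank-R CP tensor: T[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def compress(T, m, rng):
    """Apply a random m x (I*J*K) Gaussian matrix to vec(T) (hypothetical helper)."""
    vec = [x for mat in T for row in mat for x in row]
    scale = 1.0 / m ** 0.5
    return [sum(rng.gauss(0.0, 1.0) * scale * x for x in vec) for _ in range(m)]


rng = random.Random(0)
I, J, K, R = 6, 5, 4, 2  # illustrative sizes, not taken from the paper
A = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(I)]
B = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(J)]
C = [[rng.gauss(0.0, 1.0) for _ in range(R)] for _ in range(K)]

T = cp_tensor(A, B, C)
n_params = (I + J + K) * R   # entries in the three CP factor matrices
n_entries = I * J * K        # ambient dimension of the tensor
m = 2 * n_params             # measurements on the order of the parameter count
y = compress(T, m, rng)
```

With these sizes the factors hold 30 numbers while the tensor has 120 entries, so the sketch takes 60 compressed measurements. Recovering the factors from `y` is the hard part studied in the paper and is not attempted here.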

Soft Hindsight Experience Replay


Title	Soft Hindsight Experience Replay
Authors	Qiwei He, Liansheng Zhuang, Houqiang Li
Abstract	Efficient learning in environments with sparse rewards is one of the most important challenges in Deep Reinforcement Learning (DRL). In continuous DRL environments such as robotic arm control, Hindsight Experience Replay (HER) has been shown to be an effective solution. However, due to the brittleness of deterministic methods, HER and its variants typically suffer from a major challenge for stability and convergence, which significantly affects the final performance. This challenge severely limits the applicability of such methods to complex real-world domains. To tackle this challenge, in this paper, we propose Soft Hindsight Experience Replay (SHER), a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL), combining the reuse of failed experiences with a maximum entropy probabilistic inference model. We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards. Experimental results show that, in contrast to HER and its variants, our proposed SHER achieves state-of-the-art performance, especially in the difficult HandManipulation tasks. Furthermore, our SHER method is more stable, achieving very similar performance across different random seeds.
Tasks
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02089v1
PDF	https://arxiv.org/pdf/2002.02089v1.pdf
PWC	https://paperswithcode.com/paper/soft-hindsight-experience-replay
Repo
Framework
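
The "failed experiences reuse" that SHER inherits from HER can be sketched as goal relabeling. The example below is a minimal illustration under stated assumptions: scalar goals, the "final" relabeling strategy, and a sparse reward of 0 on success and -1 otherwise. The function name and the episode layout are hypothetical, and the maximum entropy component of SHER is not shown.

```python
def relabel_with_final_goal(episode, tol=1e-6):
    """Hindsight relabeling using the goal actually achieved at the episode end.

    `episode` is a list of dicts with keys 'state', 'action', 'achieved'
    (the scalar goal reached after the step) and 'goal' (the original goal).
    """
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        # Sparse reward: 0 when the substituted goal is reached, -1 otherwise.
        reward = 0.0 if abs(step["achieved"] - new_goal) <= tol else -1.0
        relabeled.append({**step, "goal": new_goal, "reward": reward})
    return relabeled


# A failed episode: the original goal 1.0 is never reached.
episode = [
    {"state": 0.0, "action": 0.1, "achieved": 0.1, "goal": 1.0},
    {"state": 0.1, "action": 0.3, "achieved": 0.4, "goal": 1.0},
    {"state": 0.4, "action": 0.3, "achieved": 0.7, "goal": 1.0},
]
hindsight = relabel_with_final_goal(episode)
```

After relabeling, the last transition counts as a success for the substituted goal 0.7, so the failed episode still yields a non-trivial learning signal.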

MCFlow: Monte Carlo Flow Models for Data Imputation


Title	MCFlow: Monte Carlo Flow Models for Data Imputation
Authors	Trevor W. Richardson, Wencheng Wu, Lei Lin, Beilei Xu, Edgar A. Bernal
Abstract	We consider the topic of data imputation, a foundational task in machine learning that addresses issues with missing data. To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Carlo sampling. We address the causality dilemma that arises when training models with incomplete data by introducing an iterative learning scheme which alternately updates the density estimate and the values of the missing entries in the training data. We provide extensive empirical validation of the effectiveness of the proposed method on standard multivariate and image datasets, and benchmark its performance against state-of-the-art alternatives. We demonstrate that MCFlow is superior to competing methods in terms of the quality of the imputed data, as well as with regards to its ability to preserve the semantic structure of the data.
Tasks	Imputation
Published	2020-03-27
URL	https://arxiv.org/abs/2003.12628v1
PDF	https://arxiv.org/pdf/2003.12628v1.pdf
PWC	https://paperswithcode.com/paper/mcflow-monte-carlo-flow-models-for-data
Repo
Framework

Multi-Objective Variational Autoencoder: an Application for Smart Infrastructure Maintenance


Title	Multi-Objective Variational Autoencoder: an Application for Smart Infrastructure Maintenance
Authors	Ali Anaissi, Seid Miad Zandavi
Abstract	Multi-way data analysis has become an essential tool for capturing underlying structures in higher-order data sets where standard two-way analysis techniques often fail to discover the hidden correlations between variables in multi-way data. We propose a multi-objective variational autoencoder (MVA) method for smart infrastructure damage detection and diagnosis in multi-way sensing data based on the reconstruction probability of an autoencoder deep neural network (ADNN). Our method fuses data from multiple sensors in one ADNN, from which informative features are extracted and utilized for damage identification. It generates probabilistic anomaly scores to detect damage, assess its severity and further localize it via a new localization layer introduced in the ADNN. We evaluated our method on multi-way datasets in the area of structural health monitoring for damage diagnosis purposes. The data was collected from our deployed data acquisition system on a cable-stayed bridge in Western Sydney and from a laboratory based building structure obtained from Los Alamos National Laboratory (LANL). Experimental results show that the proposed method can accurately detect structural damage. It was also able to estimate the different levels of damage severity, and capture damage locations in an unsupervised manner. Compared to the state-of-the-art approaches, our proposed method shows better performance in terms of damage detection and localization.
Tasks
Published	2020-03-11
URL	https://arxiv.org/abs/2003.05070v1
PDF	https://arxiv.org/pdf/2003.05070v1.pdf
PWC	https://paperswithcode.com/paper/multi-objective-variational-autoencoder-an
Repo
Framework

EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion


Title	EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion
Authors	Chia-Yuan Chang, Shuo-En Chang, Pei-Yung Hsiao, Li-Chen Fu
Abstract	Panoptic segmentation is a scene parsing task which unifies semantic segmentation and instance segmentation into one single task. However, current state-of-the-art studies have paid little attention to inference time. In this work, we propose an Efficient Panoptic Segmentation Network (EPSNet) to tackle the panoptic segmentation tasks with fast inference speed. Basically, EPSNet generates masks based on a simple linear combination of prototype masks and mask coefficients. The light-weight network branches for instance segmentation and semantic segmentation only need to predict mask coefficients and produce masks with the shared prototypes predicted by the prototype network branch. Furthermore, to enhance the quality of shared prototypes, we adopt a module called “cross-layer attention fusion module”, which aggregates the multi-scale features with an attention mechanism, helping them capture the long-range dependencies between each other. To validate the proposed work, we have conducted various experiments on the challenging COCO panoptic dataset, achieving highly promising performance with significantly faster inference speed (53ms on GPU).
Tasks	Instance Segmentation, Panoptic Segmentation, Scene Parsing, Semantic Segmentation
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10142v1
PDF	https://arxiv.org/pdf/2003.10142v1.pdf
PWC	https://paperswithcode.com/paper/epsnet-efficient-panoptic-segmentation
Repo
Framework
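
The mask generation step described in the abstract, a linear combination of shared prototype masks weighted by per-mask coefficients, can be sketched as follows. This is a minimal illustration, not the EPSNet implementation: the function name is hypothetical, the prototypes are tiny 2x2 grids, and the final sigmoid is an assumption that the abstract does not state.

```python
import math


def assemble_mask(prototypes, coeffs):
    """Per pixel: sigmoid(sum_k coeffs[k] * prototypes[k][y][x])."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of the shared prototypes at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            # Squash to (0, 1); the sigmoid is an assumed post-processing step.
            row.append(1.0 / (1.0 + math.exp(-logit)))
        mask.append(row)
    return mask


# Two shared 2x2 prototypes and one predicted coefficient vector.
prototypes = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
coeffs = [2.0, -2.0]
mask = assemble_mask(prototypes, coeffs)
```

Because the prototypes are shared, each additional mask costs only one coefficient vector and one weighted sum, which is consistent with the light-weight branches the abstract describes.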

Censored Quantile Regression Forest


Title	Censored Quantile Regression Forest
Authors	Alexander Hanbo Li, Jelena Bradic
Abstract	Random forests are a powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on a new estimating equation that adapts to censoring and leads to the quantile score whenever the data do not exhibit censoring. The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
Tasks
Published	2020-01-08
URL	https://arxiv.org/abs/2001.03458v1
PDF	https://arxiv.org/pdf/2001.03458v1.pdf
PWC	https://paperswithcode.com/paper/censored-quantile-regression-forest
Repo
Framework
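
For the uncensored case that the abstract mentions as the reference point, quantile estimation amounts to minimizing the standard quantile (pinball) loss, whose derivative is the quantile score. The sketch below shows only that baseline; the function names are hypothetical, and the censoring adjustment and the forest weighting proposed in the paper are not reproduced.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of prediction q for observation y at level tau."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def empirical_quantile(ys, tau):
    """Sample value that minimizes the average pinball loss over ys."""
    return min(ys, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))


times = [1, 2, 3, 4, 100]  # illustrative uncensored time-to-event values
median = empirical_quantile(times, 0.5)
```

At `tau = 0.5` the minimizer is the sample median, which is unaffected by the outlying value 100.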

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos


Title	Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos
Authors	Yijun Song, Jingwen Wang, Lin Ma, Zhou Yu, Jun Yu
Abstract	The task of temporally grounding textual queries in videos is to localize one video segment that semantically corresponds to the given query. Most of the existing approaches rely on segment-sentence pairs (temporal annotations) for training, which are usually unavailable in real-world scenarios. In this work we present an effective weakly-supervised model, named as Multi-Level Attentional Reconstruction Network (MARN), which only relies on video-sentence pairs during the training stage. The proposed method leverages the idea of attentional reconstruction and directly scores the candidate segments with the learnt proposal-level attentions. Moreover, another branch learning clip-level attention is exploited to refine the proposals at both the training and testing stage. We develop a novel proposal sampling mechanism to leverage intra-proposal information for learning better proposal representation and adopt 2D convolution to exploit inter-proposal clues for learning reliable attention map. Experiments on Charades-STA and ActivityNet-Captions datasets demonstrate the superiority of our MARN over the existing weakly-supervised methods.
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07048v1
PDF	https://arxiv.org/pdf/2003.07048v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-multi-level-attentional
Repo
Framework

Entropy Regularized Power k-Means Clustering


Title	Entropy Regularized Power k-Means Clustering
Authors	Saptarshi Chakraborty, Debolina Paul, Swagatam Das, Jason Xu
Abstract	Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the power $k$-means algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This paper addresses these issues by introducing entropy regularization to learn feature relevance while annealing. We prove consistency of the proposed approach and derive a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees. In particular, our method retains the same computational complexity as $k$-means and power $k$-means, but yields significant improvements over both. Its merits are thoroughly assessed on a suite of real and synthetic data experiments.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03452v1
PDF	https://arxiv.org/pdf/2001.03452v1.pdf
PWC	https://paperswithcode.com/paper/entropy-regularized-power-k-means-clustering
Repo
Framework
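
The "family of smoother surfaces" in the abstract can be illustrated with power means, assuming the usual power k-means formulation in which the minimum over center distances is replaced by a power mean with exponent s that is annealed toward minus infinity. The sketch below shows only that smoothing; the function name and the distance values are hypothetical, and the entropy-regularized feature weights introduced in the paper are not included.

```python
def power_mean(values, s):
    """Power mean M_s(v) = (mean(v_i ** s)) ** (1 / s) for positive v_i, s != 0."""
    return (sum(v ** s for v in values) / len(values)) ** (1.0 / s)


# Squared distances from one point to three cluster centers (illustrative).
sq_dists = [1.0, 4.0, 9.0]

smooth = power_mean(sq_dists, -1.0)    # harmonic mean, a smooth surrogate
sharper = power_mean(sq_dists, -20.0)  # closer to the smallest distance
limit = power_mean(sq_dists, -200.0)   # nearly min(sq_dists)
```

As s decreases, the power mean moves monotonically toward the smallest squared distance, which is the term the ordinary k-means objective uses for each point.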

$Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison


Title	$Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Authors	Tengyang Xie, Nan Jiang
Abstract	We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning. Compared to classical iterative methods such as Fitted Q-Iteration—whose performance loss incurs quadratic dependence on horizon—these methods estimate (some forms of) the Bellman error and enjoy linear-in-horizon error propagation, a property established for the first time for algorithms that rely solely on batch data and output stationary policies. One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous “double sampling” difficulty in Bellman error estimation, and does not use any squared losses. Our analyses reveal its distinct characteristics and potential advantages compared to classical algorithms.
Tasks
Published	2020-03-09
URL	https://arxiv.org/abs/2003.03924v2
PDF	https://arxiv.org/pdf/2003.03924v2.pdf
PWC	https://paperswithcode.com/paper/qstar-approximation-schemes-for-batch
Repo
Framework

Emergence of Pragmatics from Referential Game between Theory of Mind Agents


Title	Emergence of Pragmatics from Referential Game between Theory of Mind Agents
Authors	Luyao Yuan, Zipeng Fu, Jingyue Shen, Lu Xu, Junhong Shen, Song-Chun Zhu
Abstract	Pragmatics studies how context can contribute to language meanings [1]. In human communication, language is never interpreted out of context, and sentences can usually convey more information than their literal meanings [2]. However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6], restricting the communication efficiency and the capability of human-agent interaction. In this paper, we propose an algorithm with which agents can spontaneously learn the ability to “read between the lines” without any explicit hand-designed rules. We integrate the theory of mind (ToM) [7, 8] in a cooperative multi-agent pedagogical situation and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol. ToM is a profound cognitive science concept, claiming that people regularly reason about others’ mental states, including beliefs, goals, and intentions, to obtain performance advantage in competition, cooperation or coalition. With this ability, agents consider language as not only messages but also rational acts reflecting others’ hidden states. Our experiments demonstrate the advantage of pragmatic protocols over non-pragmatic protocols. We also show that the teaching complexity following the pragmatic protocol empirically approximates the recursive teaching dimension (RTD).
Tasks
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07752v1
PDF	https://arxiv.org/pdf/2001.07752v1.pdf
PWC	https://paperswithcode.com/paper/emergence-of-pragmatics-from-referential-game
Repo
Framework

Compensation of Fiber Nonlinearities in Digital Coherent Systems Leveraging Long Short-Term Memory Neural Networks


Title	Compensation of Fiber Nonlinearities in Digital Coherent Systems Leveraging Long Short-Term Memory Neural Networks
Authors	Stavros Deligiannidis, Adonis Bogris, Charis Mesaritakis, Yannis Kopsinis
Abstract	We introduce for the first time the utilization of long short-term memory (LSTM) neural network architectures for the compensation of fiber nonlinearities in digital coherent systems. We conduct numerical simulations considering either C-band or O-band transmission systems for single channel and multi-channel 16-QAM modulation format with polarization multiplexing. A detailed analysis regarding the effect of the number of hidden units and the length of the word of symbols that trains the LSTM algorithm and corresponds to the considered channel memory is conducted in order to reveal the limits of the LSTM based receiver with respect to performance and complexity. The numerical results show that LSTM neural networks can be very efficient as post processors of optical receivers which classify data that have undergone non-linear impairments in fiber and provide superior performance compared to digital back propagation (DBP), especially in the multi-channel transmission scenario. The complexity analysis shows that, although the LSTM becomes more complex as the number of hidden units and the channel memory increase, it can be less complex than DBP over long distances (> 1000 km).
Tasks
Published	2020-01-31
URL	https://arxiv.org/abs/2001.11802v1
PDF	https://arxiv.org/pdf/2001.11802v1.pdf
PWC	https://paperswithcode.com/paper/compensation-of-fiber-nonlinearities-in
Repo
Framework

Fine-Grained Urban Flow Inference


Title	Fine-Grained Urban Flow Inference
Authors	Kun Ouyang, Yuxuan Liang, Ye Liu, Zekun Tong, Sijie Ruan, Yu Zheng, David S. Rosenblum
Abstract	The ubiquitous deployment of monitoring devices in urban flow monitoring systems induces a significant cost for maintenance and operation. A technique is required to reduce the number of deployed devices, while preventing the degeneration of data accuracy and granularity. In this paper, we present an approach for inferring the real-time and fine-grained crowd flows throughout a city based on coarse-grained observations. This task exhibits two challenges: the spatial correlations between coarse- and fine-grained urban flows, and the complexities of external impacts. To tackle these issues, we develop a model entitled UrbanFM which consists of two major parts: 1) an inference network to generate fine-grained flow distributions from coarse-grained inputs that uses a feature extraction module and a novel distributional upsampling module; 2) a general fusion subnet to further boost the performance by considering the influence of different external factors. This structure provides outstanding effectiveness and efficiency for small scale upsampling. However, the single-pass upsampling used by UrbanFM is insufficient at higher upscaling rates. Therefore, we further present UrbanPy, a cascading model for progressive inference of fine-grained urban flows by decomposing the original tasks into multiple subtasks. Compared to UrbanFM, such an enhanced structure demonstrates favorable performance for larger-scale inference tasks.
Tasks
Published	2020-02-05
URL	https://arxiv.org/abs/2002.02318v1
PDF	https://arxiv.org/pdf/2002.02318v1.pdf
PWC	https://paperswithcode.com/paper/fine-grained-urban-flow-inference
Repo
Framework

Internal representation dynamics and geometry in recurrent neural networks


Title	Internal representation dynamics and geometry in recurrent neural networks
Authors	Stefan Horoi, Guillaume Lajoie, Guy Wolf
Abstract	The efficiency of recurrent neural networks (RNNs) in dealing with sequential data has long been established. However, unlike deep and convolutional networks, where we can attribute the recognition of a certain feature to every layer, it is unclear what “sub-task” a single recurrent step or layer accomplishes. Our work seeks to shed light onto how a vanilla RNN implements a simple classification task by analysing the dynamics of the network and the geometric properties of its hidden states. We find that early internal representations are evocative of the real labels of the data but this information is not directly accessible to the output layer. Furthermore, the network’s dynamics and the sequence length are both critical to correct classifications even when there is no additional task-relevant information provided.
Tasks
Published	2020-01-09
URL	https://arxiv.org/abs/2001.03255v2
PDF	https://arxiv.org/pdf/2001.03255v2.pdf
PWC	https://paperswithcode.com/paper/internal-representation-dynamics-and-geometry
Repo
Framework

Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences


Title	Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Authors	Zhu Zhang, Zhou Zhao, Yang Zhao, Qi Wang, Huasheng Liu, Lianli Gao
Abstract	In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). Given an untrimmed video and a declarative/interrogative sentence depicting an object, STVG aims to localize the spatio-temporal tube of the queried object. STVG has two challenging settings: (1) We need to localize spatio-temporal object tubes from untrimmed videos, where the object may only exist in a very small segment of the video; (2) We deal with multi-form sentences, including declarative sentences with explicit objects and interrogative sentences with unknown objects. Existing methods cannot tackle the STVG task due to ineffective tube pre-generation and the lack of object relationship modeling. We therefore propose a novel Spatio-Temporal Graph Reasoning Network (STGRN) for this task. First, we build a spatio-temporal region graph to capture the region relationships with temporal object dynamics, which involves the implicit and explicit spatial subgraphs in each frame and the temporal dynamic subgraph across frames. We then incorporate textual clues into the graph and develop multi-step cross-modal graph reasoning. Next, we introduce a spatio-temporal localizer with a dynamic selection method to directly retrieve the spatio-temporal tubes without tube pre-generation. Moreover, we contribute a large-scale video grounding dataset, VidSTG, based on the video relation dataset VidOR. Extensive experiments demonstrate the effectiveness of our method.
Tasks
Published	2020-01-19
URL	https://arxiv.org/abs/2001.06891v3
PDF	https://arxiv.org/pdf/2001.06891v3.pdf
PWC	https://paperswithcode.com/paper/where-does-it-exist-spatio-temporal-video
Repo
Framework

Two Cycle Learning: Clustering Based Regularisation for Deep Semi-Supervised Classification


Title	Two Cycle Learning: Clustering Based Regularisation for Deep Semi-Supervised Classification
Authors	Philip Sellars, Angelica Aviles-Rivero, Carola Bibiane Schönlieb
Abstract	This work addresses the challenge of classification with minimal annotations. Obtaining annotated data is time consuming, expensive and can require expert knowledge. As a result, there is an acceleration towards semi-supervised learning (SSL) approaches which utilise large amounts of unlabelled data to improve classification performance. The vast majority of SSL approaches have focused on implementing the low-density separation assumption, in which the idea is that decision boundaries should lie in low density regions. However, they have implemented this assumption by treating the dataset as a set of individual attributes rather than as a global structure, which limits the overall performance of the classifier. Therefore, in this work, we go beyond this implementation and propose a novel SSL framework called two-cycle learning. For the first cycle, we use clustering based regularisation that allows for improved decision boundaries as well as features that generalise well. The second cycle is set as a graph based SSL that takes advantage of the richer discriminative features of the first cycle to significantly boost the accuracy of generated pseudo-labels. We evaluate our two-cycle learning method extensively across multiple datasets, outperforming current approaches.
Tasks
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05317v1
PDF	https://arxiv.org/pdf/2001.05317v1.pdf
PWC	https://paperswithcode.com/paper/two-cycle-learning-clustering-based
Repo
Framework