January 26, 2020

3284 words 16 mins read

Paper Group ANR 1558

Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets. Spatio-temporal Manifold Learning for Human Motions via Long-horizon Modeling. Graph Convolution for Multimodal Information Extraction from Visually Rich Documents. Early Prediction of Alzheimer’s Disease Dementia Based on Baseline Hippocampal MRI and 1-Year Foll …

Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets

Title Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets
Authors Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, Jack Xin
Abstract Training activation quantized neural networks involves minimizing a piecewise constant function whose gradient vanishes almost everywhere, which is undesirable for the standard back-propagation or chain rule. An empirical way around this issue is to use a straight-through estimator (STE) (Bengio et al., 2013) in the backward pass only, so that the “gradient” through the modified chain rule becomes non-trivial. Since this unusual “gradient” is certainly not the gradient of the loss function, the following question arises: why does searching in its negative direction minimize the training loss? In this paper, we provide a theoretical justification for the concept of STE by answering this question. We consider the problem of learning a two-linear-layer network with binarized ReLU activation and Gaussian input data. We shall refer to the unusual “gradient” given by the STE-modified chain rule as the coarse gradient. The choice of STE is not unique. We prove that if the STE is properly chosen, the expected coarse gradient correlates positively with the population gradient (not available for training), and its negation is a descent direction for minimizing the population loss. We further show that the associated coarse gradient descent algorithm converges to a critical point of the population loss minimization problem. Moreover, we show that a poor choice of STE leads to instability of the training algorithm near certain local minima, which is verified with CIFAR-10 experiments.
Tasks
Published 2019-03-13
URL https://arxiv.org/abs/1903.05662v4
PDF https://arxiv.org/pdf/1903.05662v4.pdf
PWC https://paperswithcode.com/paper/understanding-straight-through-estimator-in-1
Repo
Framework
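
The coarse-gradient idea above is easy to make concrete. Below is a minimal, hedged PyTorch sketch (not the paper's code): a binarized activation whose backward pass substitutes the derivative of a clipped ReLU for the true gradient, which is zero almost everywhere. The particular surrogate choice and the toy usage are illustrative assumptions.

```python
import torch

class BinarizedReLU(torch.autograd.Function):
    """Forward: hard 0/1 activation (true gradient is zero almost everywhere).
    Backward: a straight-through surrogate, here the clipped-ReLU derivative,
    stands in for it -- this is the "coarse gradient" the abstract refers to."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE choice: let gradients pass only where 0 < x < 1.
        return grad_output * ((x > 0) & (x < 1)).float()

x = torch.randn(5, requires_grad=True)
BinarizedReLU.apply(x).sum().backward()
print(x.grad)  # nonzero only for entries of x inside (0, 1)
```

The paper's point is that this choice matters: a properly chosen surrogate yields a descent direction for the population loss in expectation, while a poor one can destabilize training near certain local minima.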

Spatio-temporal Manifold Learning for Human Motions via Long-horizon Modeling

Title Spatio-temporal Manifold Learning for Human Motions via Long-horizon Modeling
Authors He Wang, Edmond S. L. Ho, Hubert P. H. Shum, Zhanxing Zhu
Abstract Data-driven modeling of human motions is ubiquitous in computer graphics and computer vision applications, such as synthesizing realistic motions or recognizing actions. Recent research has shown that such problems can be approached by learning a natural motion manifold using deep learning to address the shortcomings of traditional data-driven approaches. However, previous methods can be sub-optimal for two reasons. First, the skeletal information has not been fully utilized for feature extraction. Unlike images, it is difficult to define spatial proximity in skeletal motions in a way that allows deep networks to be applied. Second, motion is time-series data with strong multi-modal temporal correlations. A frame could be followed by several candidate frames leading to different motions, and long-range dependencies exist where a number of frames at the beginning correlate with a number of frames much later. Ineffective modeling would either under-estimate the multi-modality and variance, resulting in featureless mean motion, or over-estimate them, resulting in jittery motions. In this paper, we propose a new deep network to tackle these challenges by creating a natural motion manifold that is versatile for many applications. The network has a new spatial component for feature extraction. It is also equipped with a new batch prediction model that predicts a large number of frames at once, such that long-term temporally-based objective functions can be employed to correctly learn the motion multi-modality and variances. With our system, long-duration motions can be predicted/synthesized using an open-loop setup where the motion retains its dynamics accurately. It can also be used for denoising corrupted motions and synthesizing new motions with given control signals. We demonstrate that our system can create superior results compared to existing work in multiple applications.
Tasks Denoising, Time Series
Published 2019-08-20
URL https://arxiv.org/abs/1908.07214v1
PDF https://arxiv.org/pdf/1908.07214v1.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-manifold-learning-for-human
Repo
Framework
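
As a rough, hedged sketch of the batch-prediction idea (emitting many future frames in one pass so a long-horizon objective can be applied), the toy model below uses a GRU encoder and a linear decoder. The pose dimension, horizon, layer sizes, and the plain L2 loss are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class BatchMotionPredictor(nn.Module):
    """Encode a window of past poses, then predict `horizon` future frames at once,
    so the training objective can span the whole predicted window."""

    def __init__(self, pose_dim=63, hidden=256, horizon=30):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, horizon * pose_dim)
        self.horizon, self.pose_dim = horizon, pose_dim

    def forward(self, past):                    # past: (batch, T_past, pose_dim)
        _, h = self.encoder(past)               # h: (1, batch, hidden)
        out = self.decoder(h[-1])               # all future frames in one pass
        return out.view(-1, self.horizon, self.pose_dim)

model = BatchMotionPredictor()
past, future = torch.randn(8, 60, 63), torch.randn(8, 30, 63)
loss = nn.functional.mse_loss(model(past), future)   # long-horizon objective
loss.backward()
```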

Graph Convolution for Multimodal Information Extraction from Visually Rich Documents

Title Graph Convolution for Multimodal Information Extraction from Visually Rich Documents
Authors Xiaojing Liu, Feiyu Gao, Qiong Zhang, Huasha Zhao
Abstract Visually rich documents (VRDs) are ubiquitous in daily business and life. Examples include purchase receipts, insurance policy documents, customs declaration forms, and so on. In VRDs, visual and layout information is critical for document understanding, and texts in such documents cannot be serialized into a one-dimensional sequence without losing information. Classic information extraction models such as BiLSTM-CRF typically operate on text sequences and do not incorporate visual features. In this paper, we introduce a graph convolution based model to combine textual and visual information presented in VRDs. Graph embeddings are trained to summarize the context of a text segment in the document, and are further combined with text embeddings for entity extraction. Extensive experiments show that our method outperforms BiLSTM-CRF baselines by significant margins on two real-world datasets. Additionally, ablation studies are performed to evaluate the effectiveness of each component of our model.
Tasks Entity Extraction
Published 2019-03-27
URL http://arxiv.org/abs/1903.11279v1
PDF http://arxiv.org/pdf/1903.11279v1.pdf
PWC https://paperswithcode.com/paper/graph-convolution-for-multimodal-information
Repo
Framework
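
To illustrate how textual and layout signals can be combined, here is a hedged single-layer sketch: each text segment is a node whose feature concatenates a text embedding with normalized bounding-box coordinates, and a row-normalized adjacency aggregates neighbor context. The dimensions and the simple aggregation rule are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TextLayoutGraphConv(nn.Module):
    """One graph-convolution layer over the text segments of a document."""

    def __init__(self, text_dim=128, layout_dim=4, out_dim=128):
        super().__init__()
        self.lin = nn.Linear(text_dim + layout_dim, out_dim)

    def forward(self, text_emb, boxes, adj):
        # text_emb: (N, text_dim), boxes: (N, 4) normalized coordinates,
        # adj: (N, N) row-normalized adjacency over nearby segments.
        h = torch.cat([text_emb, boxes], dim=-1)
        return torch.relu(adj @ self.lin(h))    # aggregate neighbor context

N = 10
layer = TextLayoutGraphConv()
out = layer(torch.randn(N, 128), torch.rand(N, 4), torch.eye(N))  # (N, 128)
```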

Early Prediction of Alzheimer’s Disease Dementia Based on Baseline Hippocampal MRI and 1-Year Follow-Up Cognitive Measures Using Deep Recurrent Neural Networks

Title Early Prediction of Alzheimer’s Disease Dementia Based on Baseline Hippocampal MRI and 1-Year Follow-Up Cognitive Measures Using Deep Recurrent Neural Networks
Authors Hongming Li, Yong Fan
Abstract Multi-modal biological, imaging, and neuropsychological markers have demonstrated promising performance for distinguishing Alzheimer’s disease (AD) patients from cognitively normal elders. However, it remains difficult to predict early when and which mild cognitive impairment (MCI) individuals will convert to AD dementia. Informed by pattern classification studies demonstrating that classifiers built on longitudinal data achieve better performance than those built on cross-sectional data, we develop a deep learning model based on recurrent neural networks (RNNs) to learn an informative representation and the temporal dynamics of longitudinal cognitive measures of individual subjects, and combine them with baseline hippocampal MRI to build a prognostic model of AD dementia progression. Experimental results on a large cohort of MCI subjects demonstrate that the deep learning model can learn informative measures from longitudinal data for characterizing the progression of MCI subjects to AD dementia, and that the prognostic model can predict AD progression early and with high accuracy.
Tasks
Published 2019-01-05
URL http://arxiv.org/abs/1901.01451v1
PDF http://arxiv.org/pdf/1901.01451v1.pdf
PWC https://paperswithcode.com/paper/early-prediction-of-alzheimers-disease
Repo
Framework
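
A hedged sketch of the overall design the abstract describes: an LSTM summarizes the longitudinal cognitive scores, its final state is concatenated with a baseline hippocampal MRI feature vector, and a small head predicts conversion. All dimensions and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProgressionModel(nn.Module):
    """Fuse temporal dynamics of cognitive measures with baseline MRI features."""

    def __init__(self, n_scores=6, mri_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_scores, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden + mri_dim, 64),
                                  nn.ReLU(),
                                  nn.Linear(64, 2))     # convert vs. stable MCI

    def forward(self, scores, mri):             # scores: (B, visits, n_scores)
        _, (h, _) = self.rnn(scores)
        return self.head(torch.cat([h[-1], mri], dim=1))

logits = ProgressionModel()(torch.randn(4, 3, 6), torch.randn(4, 128))
```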

Graph-based Inpainting for 3D Dynamic Point Clouds

Title Graph-based Inpainting for 3D Dynamic Point Clouds
Authors Zeqing Fu, Wei Hu, Zongming Guo
Abstract With the development of depth sensors and 3D laser scanning techniques, 3D dynamic point clouds have attracted increasing attention as a format for the representation of 3D objects in motion, with applications in various fields such as 3D immersive tele-presence, navigation, animation, gaming and virtual reality. However, dynamic point clouds usually exhibit holes of missing data, mainly due to fast motion, the limitations of acquisition techniques, and complicated structure. Further, point clouds are defined on an irregular non-Euclidean domain, which is challenging to address with conventional methods for regular data. Hence, leveraging graph signal processing tools, we propose an efficient dynamic point cloud inpainting method that exploits both the inter-frame coherence and the intra-frame self-similarity in 3D dynamic point clouds. Specifically, for each frame in a point cloud sequence, we first split it into cubes of fixed size as the processing unit, and treat cubes with holes inside as target cubes. Secondly, we take advantage of the intra-frame self-similarity in the target frame by globally searching for the cube most similar to each target cube as the intra-source cube. Thirdly, we exploit the inter-frame coherence among every three consecutive frames by searching the corresponding cubes in the previous and subsequent frames for each target cube as the inter-source cubes, which contain most of the target cube’s nearest neighbors at the corresponding relative location. Finally, we formulate dynamic point cloud inpainting as an optimization problem based on both intra- and inter-source cubes, regularized by a graph-signal smoothness prior. Experimental results show that the proposed approach significantly outperforms three competing methods in both objective and subjective quality.
Tasks
Published 2019-04-23
URL http://arxiv.org/abs/1904.10795v1
PDF http://arxiv.org/pdf/1904.10795v1.pdf
PWC https://paperswithcode.com/paper/graph-based-inpainting-for-3d-dynamic-point
Repo
Framework
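
The final optimization step can be sketched generically. Assuming a graph Laplacian L built over a target cube's points and a mask selecting coordinates supplied by the intra-/inter-source cubes, a graph-smoothness-regularized fit has a closed-form solution; the paper's actual formulation and source-cube construction are richer than this.

```python
import numpy as np

def inpaint_signal(y, mask, L, lam=1.0):
    """Minimize ||H(x - y)||^2 + lam * x^T L x, where H keeps observed entries.
    Setting the gradient to zero gives (H + lam * L) x = H y for diagonal 0/1 H."""
    H = np.diag(mask.astype(float))
    return np.linalg.solve(H + lam * L, H @ y)

# Toy path graph on 5 points with the two middle coordinates missing.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(1)) - A                      # graph Laplacian
y = np.array([0.0, 1.0, 0.0, 0.0, 4.0])        # values at missing entries are ignored
mask = np.array([1, 1, 0, 0, 1])
print(inpaint_signal(y, mask, L))              # missing entries filled in smoothly
```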

ScenarioSA: A Large Scale Conversational Database for Interactive Sentiment Analysis

Title ScenarioSA: A Large Scale Conversational Database for Interactive Sentiment Analysis
Authors Yazhou Zhang, Lingling Song, Dawei Song, Peng Guo, Junwei Zhang, Peng Zhang
Abstract Interactive sentiment analysis is an emerging, yet challenging, subtask of the sentiment analysis problem. It aims to discover the affective state and sentimental change of each person in a conversation. Existing sentiment analysis approaches are insufficient for modelling the interactions among people. However, the development of new approaches is critically limited by the lack of labelled interactive sentiment datasets. In this paper, we present a new conversational emotion database that we have created and made publicly available, namely ScenarioSA. We manually label 2,214 multi-turn English conversations collected from natural contexts. In comparison with existing sentiment datasets, ScenarioSA (1) covers a wide range of scenarios; (2) describes the interactions between two speakers; and (3) reflects the sentimental evolution of each speaker over the course of a conversation. Finally, we evaluate various state-of-the-art algorithms on ScenarioSA, demonstrating the need for novel interactive sentiment analysis models and the potential of ScenarioSA to facilitate the development of such models.
Tasks Sentiment Analysis
Published 2019-07-12
URL https://arxiv.org/abs/1907.05562v1
PDF https://arxiv.org/pdf/1907.05562v1.pdf
PWC https://paperswithcode.com/paper/scenariosa-a-large-scale-conversational
Repo
Framework

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

Title Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
Authors Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton
Abstract We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation. In the $K$-armed bandit setting, we show that there are infinitely many variants of $\tt RandUCB$, all of which achieve the minimax-optimal $\widetilde{O}(\sqrt{K T})$ regret after $T$ rounds. Moreover, for a specific multi-armed bandit setting, we show that both UCB and TS can be recovered as special cases of $\tt RandUCB$. For structured bandits, where each arm is associated with a $d$-dimensional feature vector and rewards are distributed according to a linear or generalized linear model, we prove that $\tt RandUCB$ achieves the minimax-optimal $\widetilde{O}(d \sqrt{T})$ regret even in the case of infinitely many arms. Through experiments in both the multi-armed and structured bandit settings, we demonstrate that $\tt RandUCB$ matches or outperforms TS and other randomized exploration strategies. Our theoretical and empirical results together imply that $\tt RandUCB$ achieves the best of both worlds.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.04928v2
PDF https://arxiv.org/pdf/1910.04928v2.pdf
PWC https://paperswithcode.com/paper/old-dog-learns-new-tricks-randomized-ucb-for
Repo
Framework
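
The core selection rule is simple to sketch. In the version below (a hedged illustration, not the authors' exact sampling distribution), the fixed UCB exploration constant is replaced each round by a random multiplier Z drawn from a discrete distribution supported on [0, beta], coupling randomized exploration with UCB-style confidence widths.

```python
import numpy as np

def randucb_pick(means, counts, t, beta=2.0, m=20, rng=np.random.default_rng()):
    """One round of a RandUCB-style rule: pull argmax of mean + Z * confidence width."""
    z = rng.choice(np.linspace(0.0, beta, m + 1))          # shared random scaling
    width = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    scores = means + z * width
    scores = np.where(counts == 0, np.inf, scores)         # pull each arm once first
    return int(np.argmax(scores))

# Toy usage with two arms after 20 rounds.
arm = randucb_pick(np.array([0.4, 0.6]), np.array([10, 10]), t=20)
```

Setting Z to a constant recovers plain UCB, while a spread-out distribution behaves more like Thompson sampling, which is the "best of both worlds" trade-off the abstract describes.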

A Perceived Environment Design using a Multi-Modal Variational Autoencoder for learning Active-Sensing

Title A Perceived Environment Design using a Multi-Modal Variational Autoencoder for learning Active-Sensing
Authors Timo Korthals, Malte Schilling, Jürgen Leitner
Abstract This contribution combines a multi-modal variational autoencoder with an environment to form a perceived environment on which an agent can act. We conclude our work with a comparison to curiosity-driven learning.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00584v1
PDF https://arxiv.org/pdf/1911.00584v1.pdf
PWC https://paperswithcode.com/paper/a-perceived-environment-design-using-a-multi
Repo
Framework

Effect of Superpixel Aggregation on Explanations in LIME – A Case Study with Biological Data

Title Effect of Superpixel Aggregation on Explanations in LIME – A Case Study with Biological Data
Authors Ludwig Schallner, Johannes Rabold, Oliver Scholz, Ute Schmid
Abstract End-to-end learning with deep neural networks, such as convolutional neural networks (CNNs), has been demonstrated to be very successful for different image classification tasks. To make the decisions of such black-box approaches transparent, different solutions have been proposed. LIME is an approach to explainable AI that relies on segmenting images into superpixels based on the Quick-Shift algorithm. In this paper, we present an explorative study of how different superpixel methods, namely Felzenszwalb, SLIC and Compact-Watershed, impact the generated visual explanations. We compare the resulting relevance areas with the image parts marked by a human reference. Results show that the image parts selected as relevant vary strongly depending on the applied method. Quick-Shift resulted in the lowest and Compact-Watershed in the highest correspondence with the reference relevance areas.
Tasks Image Classification
Published 2019-10-17
URL https://arxiv.org/abs/1910.07856v1
PDF https://arxiv.org/pdf/1910.07856v1.pdf
PWC https://paperswithcode.com/paper/effect-of-superpixel-aggregation-on
Repo
Framework
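
The study's main variable, which superpixel algorithm LIME uses, maps directly onto LIME's `segmentation_fn` argument. The sketch below is a hedged illustration: the scikit-image segmenters and the `lime_image` API are real, but the sample image and two-class classifier are placeholders standing in for the biological data and trained CNN, and the parameter values are arbitrary.

```python
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import felzenszwalb, quickshift, slic
from lime import lime_image

image = astronaut()                       # placeholder for a biological image

def predict_fn(images):                   # placeholder for the trained CNN
    return np.tile([0.5, 0.5], (len(images), 1))

segmenters = {
    "quickshift":   lambda im: quickshift(im, kernel_size=4, max_dist=200, ratio=0.2),
    "felzenszwalb": lambda im: felzenszwalb(im, scale=100, sigma=0.5, min_size=50),
    "slic":         lambda im: slic(im, n_segments=100, compactness=10),
}

explainer = lime_image.LimeImageExplainer()
for name, seg_fn in segmenters.items():
    exp = explainer.explain_instance(image, predict_fn, segmentation_fn=seg_fn,
                                     top_labels=1, num_samples=100)
    print(name, "superpixels:", len(np.unique(exp.segments)))
```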

Predicting Attributes of Nodes Using Network Structure

Title Predicting Attributes of Nodes Using Network Structure
Authors Sarwan Ali, Muhammad Haroon Shakeel, Imdadullah Khan, Safiullah Faizullah, Muhammad Asad Khan
Abstract In many graphs, such as social networks, nodes have associated attributes representing their behavior. Predicting node attributes in such graphs is an important problem with applications in many domains, such as recommendation systems, privacy preservation, and targeted advertisement. Attribute values can be predicted by analyzing patterns and correlations among attributes and employing classification/regression algorithms. However, these approaches do not utilize readily available network topology information. In this regard, interconnections between different attributes of nodes can be exploited to improve prediction accuracy. In this paper, we propose an approach that represents a node by a feature map with respect to an attribute $a_i$ (which is used as input for machine learning algorithms), using all attributes of its neighbors to predict the value of $a_i$. We perform extensive experiments on ten real-world datasets and show that the proposed feature map significantly improves prediction accuracy compared to baseline approaches on these datasets.
Tasks Recommendation Systems
Published 2019-12-27
URL https://arxiv.org/abs/1912.12264v1
PDF https://arxiv.org/pdf/1912.12264v1.pdf
PWC https://paperswithcode.com/paper/predicting-attributes-of-nodes-using-network
Repo
Framework
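
A hedged sketch of the underlying idea: build, for each node, a feature vector aggregated from its neighbors' attribute values and feed it to any standard classifier or regressor for the target attribute. The simple mean aggregation and the NetworkX-based setup are illustrative assumptions; the paper's feature map is more elaborate.

```python
import numpy as np
import networkx as nx

def neighbor_feature_map(G, attrs, target_attr):
    """Represent each node by the mean of its neighbors' non-target attributes."""
    keys = [k for k in next(iter(attrs.values())) if k != target_attr]
    X, y = [], []
    for v in G.nodes():
        neigh = list(G.neighbors(v)) or [v]        # fall back to self if isolated
        feats = np.array([[attrs[u][k] for k in keys] for u in neigh], float)
        X.append(feats.mean(axis=0))
        y.append(attrs[v][target_attr])
    return np.array(X), np.array(y)

G = nx.karate_club_graph()
attrs = {v: {"degree": G.degree(v), "age": 20 + v % 30} for v in G.nodes()}
X, y = neighbor_feature_map(G, attrs, target_attr="age")   # X feeds any classifier
```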

Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution

Title Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution
Authors Chen Feng, Tao Sheng, Zhiyu Liang, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Matthew Ardi, Alexander C. Berg, Yiran Chen, Bo Chen, Kent Gauen, Yung-Hsiang Lu
Abstract The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition, started in 2015, that encourages joint hardware and software solutions for computer vision systems with low latency and power. Track 1 of the 2018 competition focused on innovation in software solutions with a fixed inference engine and hardware. This decision allowed participants to submit models online without worrying about building and bringing custom hardware on-site, which attracted a historically large number of submissions. Among the diverse solutions, the winning solution proposed a quantization-friendly framework for MobileNets that achieves an accuracy of 72.67% on the holdout dataset with an average latency of 27ms on a single CPU core of a Google Pixel 2 phone, which is superior to the best real-time MobileNet models at the time.
Tasks Quantization
Published 2019-03-12
URL http://arxiv.org/abs/1903.06791v1
PDF http://arxiv.org/pdf/1903.06791v1.pdf
PWC https://paperswithcode.com/paper/low-power-inference-for-on-device-visual
Repo
Framework
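
For context on what a "quantization-friendly" model has to survive, here is a hedged sketch of plain uniform 8-bit (asymmetric) quantization of a weight tensor, the kind of fixed-point conversion a low-power inference engine applies; the winning entry's MobileNet modifications go beyond this.

```python
import numpy as np

def quantize_uint8(w):
    """Map float weights to uint8 with a per-tensor scale and zero point, and
    return the de-quantized values the integer engine effectively computes with."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0      # avoid div-by-zero for constant tensors
    q = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return q, q.astype(np.float32) * scale + lo

w = np.random.randn(3, 3, 32, 32).astype(np.float32)
q, w_hat = quantize_uint8(w)
print("max round-trip error:", np.abs(w - w_hat).max())
```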

Multi-level Encoder-Decoder Architectures for Image Restoration

Title Multi-level Encoder-Decoder Architectures for Image Restoration
Authors Indra Deep Mastan, Shanmuganathan Raman
Abstract Many real-world solutions for image restoration are learning-free and based on handcrafted image priors such as self-similarity. Recently, deep-learning methods that use training data have achieved state-of-the-art results in various image restoration tasks (e.g., super-resolution and inpainting). Ulyanov et al. (CVPR 2018) bridge the gap between these two families of methods, showing that learning-free methods perform close to state-of-the-art learning-based methods (within approximately 1 dB PSNR). Their approach benefits from the encoder-decoder network. In this paper, we propose a framework based on multi-level extensions of the encoder-decoder network to investigate interesting aspects of the relationship between image restoration and network construction, independent of learning. Our framework allows various network structures by modifying the following network components: skip links, cascading of the network input into intermediate layers, composition of the encoder-decoder subnetworks, and network depth. These handcrafted network structures illustrate how the construction of untrained networks influences the following image restoration tasks: denoising, super-resolution, and inpainting. We also demonstrate image reconstruction using flash and no-flash image pairs. We provide performance comparisons with state-of-the-art methods for all of the restoration tasks above.
Tasks Denoising, Image Reconstruction, Image Restoration, Super-Resolution
Published 2019-05-01
URL https://arxiv.org/abs/1905.00322v3
PDF https://arxiv.org/pdf/1905.00322v3.pdf
PWC https://paperswithcode.com/paper/multi-level-encoder-decoder-architectures-for
Repo
Framework
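
One of the design axes listed above, skip links, can be isolated in a tiny untrained encoder-decoder. The sketch below (channel counts, depth, and the simple additive skip are assumptions) shows the kind of structural toggle such a framework varies.

```python
import torch
import torch.nn as nn

class SkipEncoderDecoder(nn.Module):
    """A two-level encoder-decoder whose skip link can be switched on or off."""

    def __init__(self, ch=32, use_skips=True):
        super().__init__()
        self.use_skips = use_skips
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU())
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU())
        self.dec1 = nn.ConvTranspose2d(ch, 3, 4, 2, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d2 = self.dec2(e2)
        if self.use_skips:
            d2 = d2 + e1                  # encoder-to-decoder skip link
        return torch.sigmoid(self.dec1(d2))

out = SkipEncoderDecoder(use_skips=True)(torch.rand(1, 3, 64, 64))  # (1, 3, 64, 64)
```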

Controlled Natural Languages and Default Reasoning

Title Controlled Natural Languages and Default Reasoning
Authors Tiantian Gao
Abstract Controlled natural languages (CNLs) are effective languages for knowledge representation and reasoning. They are designed based on certain natural languages with a restricted lexicon and grammar. CNLs are unambiguous and simple as opposed to their base languages, while preserving the expressiveness and coherence of natural languages. In this report, we focus on a class of CNLs, called machine-oriented CNLs, which have well-defined semantics that can be deterministically translated into formal languages, such as Prolog, to do logical reasoning. Over the past 20 years, a number of machine-oriented CNLs have emerged and been used in many application domains for problem solving and question answering. However, few of them support non-monotonic inference. In our work, we propose non-monotonic extensions of CNL to support defeasible reasoning. In the first part of this report, we survey CNLs and compare three influential systems: Attempto Controlled English (ACE), Processable English (PENG), and Computer-processable English (CPL), comparing their language design, semantic interpretations, and reasoning services. In the second part, we first identify typical non-monotonicity in natural languages, such as defaults, exceptions and conversational implicatures. Then, we propose their representation in CNL and the corresponding formalizations in a form of defeasible reasoning known as Logic Programming with Defaults and Argumentation Theory (LPDA).
Tasks Question Answering
Published 2019-05-11
URL https://arxiv.org/abs/1905.04422v1
PDF https://arxiv.org/pdf/1905.04422v1.pdf
PWC https://paperswithcode.com/paper/controlled-natural-languages-and-default
Repo
Framework
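
The non-monotonicity discussed in the second part can be illustrated with a toy example outside any particular CNL or LPDA syntax: a default ("birds normally fly") whose conclusion is withdrawn once an exception becomes known. This Python sketch only illustrates defeasible inference; it is not the report's formalization.

```python
facts = {"bird(tweety)"}

def flies(x, facts):
    """Default rule: bird(X) flies unless an exception (here, penguin) is known."""
    return f"bird({x})" in facts and f"penguin({x})" not in facts

print(flies("tweety", facts))     # True: the default applies
facts.add("penguin(tweety)")
print(flies("tweety", facts))     # False: the conclusion is retracted (non-monotonic)
```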

An approach to image denoising using manifold approximation without clean images

Title An approach to image denoising using manifold approximation without clean images
Authors Rohit Jena
Abstract Image restoration has been an extensively researched topic in numerous fields. With the advent of deep learning, many existing algorithms have been replaced by more flexible and robust ones. Deep networks have demonstrated impressive performance in a variety of tasks such as blind denoising, image enhancement, deblurring, super-resolution, and inpainting, among others. Most of these learning-based algorithms use a large amount of clean data during the training process. However, in certain applications in medical image processing, one may not have access to a large amount of clean data. In this paper, we propose a denoising method that attempts to learn the denoising process by pushing the noisy data close to the clean-data manifold, using only noisy images during training. Furthermore, we use perceptual loss terms and an iterative refinement step to further refine the clean images without losing important features.
Tasks Deblurring, Denoising, Image Denoising, Image Enhancement, Image Restoration, Super-Resolution
Published 2019-04-28
URL http://arxiv.org/abs/1904.12323v1
PDF http://arxiv.org/pdf/1904.12323v1.pdf
PWC https://paperswithcode.com/paper/an-approach-to-image-denoising-using-manifold
Repo
Framework

Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis

Title Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis
Authors Hannes Thaller, Lukas Linsbauer, Rudolf Ramler, Alexander Egyed
Abstract Software systems are complex, and with the increasing number of AI components, behavioral comprehension challenges traditional testing and maintenance strategies. The lack of tools and methodologies for behavioral software comprehension leaves developers with testing and debugging approaches that work only within the boundaries of known scenarios. We present Probabilistic Software Modeling (PSM), a data-driven modeling paradigm for predictive and generative methods in software engineering. PSM analyzes a program and synthesizes a network of probabilistic models that can simulate and quantify the original program’s behavior. The approach extracts the type, executable, and property structure of a program and copies its topology. Each model is then optimized towards the observed runtime, leading to a network that reflects the system’s structure and behavior. The resulting network allows for the full spectrum of statistical inferential analysis, with which rich predictive and generative applications can be built. Applications range from the visualization of states, inferential queries, test case generation, and anomaly detection up to the stochastic execution of the modeled system. In this work, we present the modeling methodology, an empirical study of the runtime behavior of software systems, and a comprehensive study of PSM-modeled systems. Results indicate that PSM is a solid foundation for structural and behavioral software comprehension applications.
Tasks Anomaly Detection
Published 2019-12-17
URL https://arxiv.org/abs/1912.07936v2
PDF https://arxiv.org/pdf/1912.07936v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-software-modeling-a-data-driven
Repo
Framework