October 17, 2019

Paper Group ANR 829

Real-Time Subpixel Fast Bilateral Stereo. An Estimation of Favorite Value in Emotion Generating Calculation by Fuzzy Petri Net. Semantic Driven Multi-Camera Pedestrian Detection. Deep Learning Based Sphere Decoding. Neural network state estimation for full quantum state tomography. Privacy Aware Offloading of Deep Neural Networks. A Study on Dialog …

Real-Time Subpixel Fast Bilateral Stereo


Title	Real-Time Subpixel Fast Bilateral Stereo
Authors	Rui Fan, Yanan Liu, Mohammud Junaid Bocus, Lujia Wang, Ming Liu
Abstract	Stereo vision technique has been widely used in robotic systems to acquire 3-D information. In recent years, many researchers have applied bilateral filtering in stereo vision to adaptively aggregate the matching costs. This has greatly improved the accuracy of the estimated disparity maps. However, the process of filtering the whole cost volume is very time consuming and therefore the researchers have to resort to some powerful hardware for the real-time purpose. This paper presents the implementation of fast bilateral stereo on a state-of-the-art GPU. By highly exploiting the parallel computing architecture of the GPU, the fast bilateral stereo performs in real time when processing the Middlebury stereo datasets.
Tasks
Published	2018-07-05
URL	http://arxiv.org/abs/1807.02044v3
PDF	http://arxiv.org/pdf/1807.02044v3.pdf
PWC	https://paperswithcode.com/paper/real-time-subpixel-fast-bilateral-stereo
Repo
Framework

An Estimation of Favorite Value in Emotion Generating Calculation by Fuzzy Petri Net


Title	An Estimation of Favorite Value in Emotion Generating Calculation by Fuzzy Petri Net
Authors	Takumi Ichimura, Kousuke Tanabe
Abstract	Emotion Generating Calculations (EGC) method based on the Emotion Eliciting Condition Theory can decide whether an event arouses pleasure or not and quantify the degree under the event. An event in the form of Case Frame representation is classified into 12 types of calculations. However, the weak point in EGC is Favorite Value (FV) as the personal taste information. In order to improve the problem, this paper challenges to establish a learning method to learn speaker’s taste information from dialog. Especially, the learning method employs Fuzzy Petri Net to find an appropriate FV to a word which has the unknown FV. This paper discusses the effective learning method to improve a weak point of EGC when a missing value of FV exists.
Tasks
Published	2018-04-10
URL	http://arxiv.org/abs/1804.03994v1
PDF	http://arxiv.org/pdf/1804.03994v1.pdf
PWC	https://paperswithcode.com/paper/an-estimation-of-favorite-value-in-emotion
Repo
Framework

Semantic Driven Multi-Camera Pedestrian Detection


Title	Semantic Driven Multi-Camera Pedestrian Detection
Authors	Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós, Pablo Carballeira
Abstract	Nowadays, pedestrian detection is one of the pivotal fields in computer vision, especially when performed over video surveillance scenarios. People detection methods are highly sensitive to occlusions among pedestrians, which dramatically degrades performance in crowded scenarios. The cutback in camera prices has allowed generalizing multi-camera set-ups, which can better confront occlusions by using different points of view to disambiguate detections. In this paper we present an approach to improve the performance of these multi-camera systems and to make them independent of the considered scenario, via an automatic understanding of the scene content. This semantic information, obtained from a semantic segmentation, is used 1) to automatically generate a common Area of Interest for all cameras, instead of the usual manual definition of this area; and 2) to improve the 2D detections of each camera via an optimization technique which maximizes coherence of every detection both in all 2D views and in the 3D world, obtaining best-fitted bounding boxes and a consensus height for every pedestrian. Experimental results on five publicly available datasets show that the proposed approach, which does not require any training stage, outperforms state-of-the-art multi-camera pedestrian detectors non specifically trained for these datasets, which demonstrates the expected semantic-based robustness to different scenarios.
Tasks	Pedestrian Detection, Semantic Segmentation
Published	2018-12-27
URL	http://arxiv.org/abs/1812.10779v1
PDF	http://arxiv.org/pdf/1812.10779v1.pdf
PWC	https://paperswithcode.com/paper/semantic-driven-multi-camera-pedestrian
Repo
Framework

Deep Learning Based Sphere Decoding


Title	Deep Learning Based Sphere Decoding
Authors	Mostafa Mohammadkarimi, Mehrtash Mehrabi, Masoud Ardakani, Yindi Jing
Abstract	In this paper, a deep learning (DL)-based sphere decoding algorithm is proposed, where the radius of the decoding hypersphere is learnt by a deep neural network (DNN). The performance achieved by the proposed algorithm is very close to the optimal maximum likelihood decoding (MLD) over a wide range of signal-to-noise ratios (SNRs), while the computational complexity, compared to existing sphere decoding variants, is significantly reduced. This improvement is attributed to DNN’s ability of intelligently learning the radius of the hypersphere used in decoding. The expected complexity of the proposed DL-based algorithm is analytically derived and compared with existing ones. It is shown that the number of lattice points inside the decoding hypersphere drastically reduces in the DL-based algorithm in both the average and worst-case senses. The effectiveness of the proposed algorithm is shown through simulation for high-dimensional multiple-input multiple-output (MIMO) systems, using high-order modulations.
Tasks
Published	2018-07-06
URL	http://arxiv.org/abs/1807.03162v1
PDF	http://arxiv.org/pdf/1807.03162v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-sphere-decoding
Repo
Framework
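
To illustrate why the radius matters in the sphere decoding described above, here is a minimal depth-first sphere decoder in Python. It is a generic textbook-style sketch, not the authors' DL-based algorithm. The function name `sphere_decode`, the assumption that the channel has already been reduced to an upper-triangular matrix `R` (for example by QR decomposition) with rotated observation `z`, and the fixed `radius_sq` argument (which the paper would instead obtain from a DNN) are all assumptions made for this example.

```python
def sphere_decode(R, z, alphabet, radius_sq):
    """Depth-first search for the x minimising ||z - R x||^2 inside a sphere.

    R         : upper-triangular matrix as a list of rows (assumed given)
    z         : rotated observation vector (assumed given)
    alphabet  : candidate symbol values for each entry of x
    radius_sq : squared search radius (hypothetical fixed value here)

    Returns (best_x, best_dist, visited_nodes); best_x and best_dist are
    None when no lattice point lies strictly inside the sphere.
    """
    n = len(z)
    x = [0] * n
    best_x = None
    best_d = radius_sq
    nodes = 0

    def search(level, partial):
        nonlocal best_x, best_d, nodes
        nodes += 1
        if level < 0:
            # All symbols are fixed: record the point and shrink the sphere.
            best_x, best_d = list(x), partial
            return
        for s in alphabet:
            x[level] = s
            # Row `level` only involves symbols that are already decided.
            resid = z[level] - sum(R[level][j] * x[j] for j in range(level, n))
            d = partial + resid * resid
            if d < best_d:  # prune branches that leave the sphere
                search(level - 1, d)

    search(n - 1, 0.0)
    return best_x, (best_d if best_x is not None else None), nodes


if __name__ == "__main__":
    R = [[2.0, 1.0], [0.0, 1.0]]
    z = [1.1, -0.9]  # noisy observation of x = [1, -1]
    print(sphere_decode(R, z, [-1, 1], 10.0))
    print(sphere_decode(R, z, [-1, 1], 0.001))  # radius too small: no point found
```

A smaller radius prunes more of the search tree, but a radius that is too small leaves the sphere empty, which is the trade-off a learned radius is meant to balance.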

Neural network state estimation for full quantum state tomography


Title	Neural network state estimation for full quantum state tomography
Authors	Qian Xu, Shuqi Xu
Abstract	An efficient state estimation model, neural network estimation (NNE), empowered by machine learning techniques, is presented for full quantum state tomography (FQST). A parameterized function based on neural network is applied to map the measurement outcomes to the estimated quantum states. Parameters are updated with supervised learning procedures. From the computational complexity perspective our algorithm is the most efficient one among existing state estimation algorithms for full quantum state tomography. We perform numerical tests to prove both the accuracy and scalability of our model.
Tasks	Quantum State Tomography
Published	2018-11-16
URL	http://arxiv.org/abs/1811.06654v2
PDF	http://arxiv.org/pdf/1811.06654v2.pdf
PWC	https://paperswithcode.com/paper/neural-network-state-estimation-for-full
Repo
Framework

Privacy Aware Offloading of Deep Neural Networks


Title	Privacy Aware Offloading of Deep Neural Networks
Authors	Sam Leroux, Tim Verbelen, Pieter Simoens, Bart Dhoedt
Abstract	Deep neural networks require large amounts of resources which makes them hard to use on resource constrained devices such as Internet-of-things devices. Offloading the computations to the cloud can circumvent these constraints but introduces a privacy risk since the operator of the cloud is not necessarily trustworthy. We propose a technique that obfuscates the data before sending it to the remote computation node. The obfuscated data is unintelligible for a human eavesdropper but can still be classified with a high accuracy by a neural network trained on unobfuscated images.
Tasks
Published	2018-05-30
URL	http://arxiv.org/abs/1805.12024v1
PDF	http://arxiv.org/pdf/1805.12024v1.pdf
PWC	https://paperswithcode.com/paper/privacy-aware-offloading-of-deep-neural
Repo
Framework

A Study on Dialog Act Recognition using Character-Level Tokenization


Title	A Study on Dialog Act Recognition using Character-Level Tokenization
Authors	Eugénio Ribeiro, Ricardo Ribeiro, David Martins de Matos
Abstract	Dialog act recognition is an important step for dialog systems since it reveals the intention behind the uttered words. Most approaches on the task use word-level tokenization. In contrast, this paper explores the use of character-level tokenization. This is relevant since there is information at the sub-word level that is related to the function of the words and, thus, their intention. We also explore the use of different context windows around each token, which are able to capture important elements, such as affixes. Furthermore, we assess the importance of punctuation and capitalization. We performed experiments on both the Switchboard Dialog Act Corpus and the DIHANA Corpus. In both cases, the experiments not only show that character-level tokenization leads to better performance than the typical word-level approaches, but also that both approaches are able to capture complementary information. Thus, the best results are achieved by combining tokenization at both levels.
Tasks	Tokenization
Published	2018-05-18
URL	http://arxiv.org/abs/1805.07231v2
PDF	http://arxiv.org/pdf/1805.07231v2.pdf
PWC	https://paperswithcode.com/paper/a-study-on-dialog-act-recognition-using
Repo
Framework
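
The difference between word-level and character-level tokenization with context windows, as discussed in the abstract above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `char_windows`, the padding symbol `_`, and the choice of an odd window width centred on each character are hypothetical and are not taken from the paper.

```python
def char_windows(utterance, width, pad="_"):
    """Return one fixed-width character window centred on each character.

    `width` must be a positive odd number so that each window has a centre.
    The padding symbol fills positions that fall outside the utterance.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    padded = pad * half + utterance + pad * half
    return [padded[i:i + width] for i in range(len(utterance))]


if __name__ == "__main__":
    text = "Is it okay?"
    print(text.split())           # word-level tokens
    print(char_windows(text, 3))  # character-level tokens with context
```

Windows of this kind keep punctuation, capitalization and affix-like fragments visible to the model, which are the sub-word cues the abstract refers to.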

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods


Title	Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
Authors	Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton
Abstract	This paper investigates estimating the variance of a temporal-difference learning agent’s update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a particular state and is mathematically expressed as the expected sum of discounted future rewards (called the return). These values can be straightforwardly estimated by averaging batches of returns using Monte Carlo methods. However, if we wish to update the agent’s value estimates during learning, before terminal outcomes are observed, we must use a different estimation target called the λ-return, which truncates the return with the agent’s own estimate of the value function. Temporal difference learning methods estimate the expected λ-return for each state, allowing these methods to update online and incrementally, and in most cases achieve better generalization error and faster learning than Monte Carlo methods. Naturally one could attempt to estimate higher-order moments of the λ-return. This paper is about estimating the variance of the λ-return. Prior work has shown that given estimates of the variance of the λ-return, learning systems can be constructed to (1) mitigate risk in action selection, and (2) automatically adapt the parameters of the learning process itself to improve performance. Unfortunately, existing methods for estimating the variance of the λ-return are complex and not well understood empirically. We contribute a method for estimating the variance of the λ-return directly using policy evaluation methods from reinforcement learning. Our approach is significantly simpler than prior methods that independently estimate the second moment of the λ-return. Empirically our new approach behaves at least as well as existing approaches, but is generally more robust.
Tasks
Published	2018-01-25
URL	http://arxiv.org/abs/1801.08287v2
PDF	http://arxiv.org/pdf/1801.08287v2.pdf
PWC	https://paperswithcode.com/paper/directly-estimating-the-variance-of-the
Repo
Framework
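
To make the λ-return mentioned in the abstract above concrete, the following Python sketch computes it for a finished episode with the standard backward recursion G_t = R_{t+1} + γ[(1 − λ)V(S_{t+1}) + λG_{t+1}]. It then takes a plain sample variance of the start-state λ-returns across episodes. That variance step is a naive Monte Carlo baseline for illustration only, not the direct temporal-difference estimator the paper proposes. The names `lambda_returns` and `start_return_variance`, and the convention that the value of the terminal state is 0, are assumptions of this example.

```python
from statistics import pvariance


def lambda_returns(rewards, next_values, gamma, lam):
    """Compute the lambda-return for every step of one episode.

    rewards[t]     : reward received after step t (R_{t+1})
    next_values[t] : value estimate of the next state, V(S_{t+1});
                     assumed to be 0.0 for the terminal state
    """
    g = 0.0  # lambda-return beyond the terminal state
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        out[t] = g
    return out


def start_return_variance(episodes, gamma, lam):
    """Population variance of the first-step lambda-return over episodes.

    Naive Monte Carlo baseline; `episodes` is a list of
    (rewards, next_values) pairs that all start in the same state.
    """
    starts = [lambda_returns(r, v, gamma, lam)[0] for r, v in episodes]
    return pvariance(starts)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.5, 0.5, 0.0]
    print(lambda_returns(rewards, next_values, gamma=1.0, lam=1.0))  # Monte Carlo returns
    print(lambda_returns(rewards, next_values, gamma=1.0, lam=0.0))  # one-step TD targets
```

Setting `lam=1.0` recovers the ordinary Monte Carlo return, and `lam=0.0` gives the one-step bootstrapped target, so intermediate values interpolate between the two.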

SSIMLayer: Towards Robust Deep Representation Learning via Nonlinear Structural Similarity


Title	SSIMLayer: Towards Robust Deep Representation Learning via Nonlinear Structural Similarity
Authors	Ahmed Abobakr, Mohammed Hossny, Saeid Nahavandi
Abstract	Deeper convolutional neural networks provide more capacity to approximate complex mapping functions. However, increasing network depth imposes difficulties on training and increases model complexity. This paper presents a new nonlinear computational layer of considerably high capacity to the deep convolutional neural network architectures. This layer performs a set of comprehensive convolution operations that mimics the overall function of the human visual system (HVS) via focusing on learning structural information in its input. The core of its computations is evaluating the components of the structural similarity metric (SSIM) in a setting that allows the kernels to learn to match structural information. The proposed SSIMLayer is inherently nonlinear and hence, it does not require subsequent nonlinear transformations. Experiments conducted on the CIFAR-10 benchmark demonstrate that the SSIMLayer provides better convergence than the traditional convolutional layer, bypasses the need for nonlinear transformations and shows more robustness against noise perturbations and adversarial attacks.
Tasks	Representation Learning
Published	2018-06-24
URL	http://arxiv.org/abs/1806.09152v2
PDF	http://arxiv.org/pdf/1806.09152v2.pdf
PWC	https://paperswithcode.com/paper/ssimlayer-towards-robust-deep-representation
Repo
Framework

Learning to Sketch with Deep Q Networks and Demonstrated Strokes


Title	Learning to Sketch with Deep Q Networks and Demonstrated Strokes
Authors	Tao Zhou, Chen Fang, Zhaowen Wang, Jimei Yang, Byungmoon Kim, Zhili Chen, Jonathan Brandt, Demetri Terzopoulos
Abstract	Doodling is a useful and common intelligent skill that people can learn and master. In this work, we propose a two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ). The developed system, Doodle-SDQ, generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters. In the first stage, it learns to draw simple strokes by imitating in supervised fashion from a set of stroke-action pairs collected from artist paintings. In the second stage, it is challenged to draw real and more complex doodles without ground truth actions; thus, it is trained with Q-learning. Our experiments confirm that (1) doodling can be learned without direct step-by-step action supervision and (2) pretraining with stroke demonstration via supervised learning is important to improve performance. We further show that Doodle-SDQ is effective at producing plausible drawings in different media types, including sketch and watercolor.
Tasks	Q-Learning
Published	2018-10-14
URL	http://arxiv.org/abs/1810.05977v1
PDF	http://arxiv.org/pdf/1810.05977v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-sketch-with-deep-q-networks-and
Repo
Framework

Identifying Cross-Depicted Historical Motifs


Title	Identifying Cross-Depicted Historical Motifs
Authors	Vinaychandran Pondenkandath, Michele Alberti, Nicole Eichenberger, Rolf Ingold, Marcus Liwicki
Abstract	Cross-depiction is the problem of identifying the same object even when it is depicted in a variety of manners. This is a common problem in handwritten historical documents image analysis, for instance when the same letter or motif is depicted in several different ways. It is a simple task for humans yet conventional heuristic computer vision methods struggle to cope with it. In this paper we address this problem using state-of-the-art deep learning techniques on a dataset of historical watermarks containing images created with different methods of reproduction, such as hand tracing, rubbing, and radiography. To study the robustness of deep learning based approaches to the cross-depiction problem, we measure their performance on two different tasks: classification and similarity rankings. For the former we achieve a classification accuracy of 96% using deep convolutional neural networks. For the latter we have a false positive rate at 95% true positive rate of 0.11. These results outperform state-of-the-art methods by a significant margin.
Tasks
Published	2018-04-05
URL	http://arxiv.org/abs/1804.01728v2
PDF	http://arxiv.org/pdf/1804.01728v2.pdf
PWC	https://paperswithcode.com/paper/identifying-cross-depicted-historical-motifs
Repo
Framework

Policy Design for Active Sequential Hypothesis Testing using Deep Learning


Title	Policy Design for Active Sequential Hypothesis Testing using Deep Learning
Authors	Dhruva Kartik, Ekraam Sabir, Urbashi Mitra, Prem Natarajan
Abstract	Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general and thus, heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.
Tasks
Published	2018-10-11
URL	http://arxiv.org/abs/1810.04859v1
PDF	http://arxiv.org/pdf/1810.04859v1.pdf
PWC	https://paperswithcode.com/paper/policy-design-for-active-sequential
Repo
Framework

Leveraging Clinical Time-Series Data for Prediction: A Cautionary Tale


Title	Leveraging Clinical Time-Series Data for Prediction: A Cautionary Tale
Authors	Eli Sherman, Hitinder Gurm, Ulysses Balis, Scott Owens, Jenna Wiens
Abstract	In healthcare, patient risk stratification models are often learned using time-series data extracted from electronic health records. When extracting data for a clinical prediction task, several formulations exist, depending on how one chooses the time of prediction and the prediction horizon. In this paper, we show how the formulation can greatly impact both model performance and clinical utility. Leveraging a publicly available ICU dataset, we consider two clinical prediction tasks: in-hospital mortality, and hypokalemia. Through these case studies, we demonstrate the necessity of evaluating models using an outcome-independent reference point, since choosing the time of prediction relative to the event can result in unrealistic performance. Further, an outcome-independent scheme outperforms an outcome-dependent scheme on both tasks (In-Hospital Mortality AUROC .882 vs. .831; Serum Potassium: AUROC .829 vs. .740) when evaluated on test sets that mimic real-world use.
Tasks	Time Series
Published	2018-11-29
URL	http://arxiv.org/abs/1811.12520v1
PDF	http://arxiv.org/pdf/1811.12520v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-clinical-time-series-data-for
Repo
Framework

Deep Segment Hash Learning for Music Generation


Title	Deep Segment Hash Learning for Music Generation
Authors	Kevin Joslyn, Naifan Zhuang, Kien A. Hua
Abstract	Music generation research has grown in popularity over the past decade, thanks to the deep learning revolution that has redefined the landscape of artificial intelligence. In this paper, we propose a novel approach to music generation inspired by musical segment concatenation methods and hash learning algorithms. Given a segment of music, we use a deep recurrent neural network and ranking-based hash learning to assign a forward hash code to the segment to retrieve candidate segments for continuation with matching backward hash codes. The proposed method is thus called Deep Segment Hash Learning (DSHL). To the best of our knowledge, DSHL is the first end-to-end segment hash learning method for music generation, and the first to use pair-wise training with segments of music. We demonstrate that this method is capable of generating music which is both original and enjoyable, and that DSHL offers a promising new direction for music generation research.
Tasks	Music Generation
Published	2018-05-30
URL	http://arxiv.org/abs/1805.12176v1
PDF	http://arxiv.org/pdf/1805.12176v1.pdf
PWC	https://paperswithcode.com/paper/deep-segment-hash-learning-for-music
Repo
Framework

Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control


Title	Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control
Authors	Kenny Young, Baoxiang Wang, Matthew E. Taylor
Abstract	Reinforcement learning (RL) has had many successes in both “deep” and “shallow” settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the online RL problem as it is traditionally formulated (i.e., a single agent learning online without maintaining a large database of training examples). Meta-learning can potentially help with both these issues by tuning hyperparameters online and allowing the algorithm to more robustly adjust to non-stationarity in a problem. This paper applies meta-gradient descent to derive a set of step-size tuning algorithms specifically for online RL control with eligibility traces. Our novel technique, Metatrace, makes use of an eligibility trace analogous to methods like TD(λ). We explore tuning both a single scalar step-size and a separate step-size for each learned parameter. We evaluate Metatrace first for control with linear function approximation in the classic mountain car problem and then in a noisy, non-stationary version. Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice. Results show that the meta-step-size parameter of Metatrace is easy to set, Metatrace can speed learning, and Metatrace can allow an RL algorithm to deal with non-stationarity in the learning task.
Tasks	Atari Games, Meta-Learning
Published	2018-05-10
URL	https://arxiv.org/abs/1805.04514v2
PDF	https://arxiv.org/pdf/1805.04514v2.pdf
PWC	https://paperswithcode.com/paper/metatrace-online-step-size-tuning-by-meta
Repo
Framework