January 28, 2020

Paper Group ANR 933

X-Section: Cross-Section Prediction for Enhanced RGBD Fusion


Title	X-Section: Cross-Section Prediction for Enhanced RGBD Fusion
Authors	Andrea Nicastro, Ronald Clark, Stefan Leutenegger
Abstract	Detailed 3D reconstruction is an important challenge with applications in robotics and augmented and virtual reality, and it has seen impressive progress over the past years. Advancements were driven by the availability of depth cameras (RGB-D) and by increased compute power, e.g. in the form of GPUs, but also by the inclusion of machine learning in the process. Here, we propose X-Section, an RGB-D 3D reconstruction approach that leverages deep learning to make object-level predictions about thicknesses that can be readily integrated into a volumetric multi-view fusion process, where we propose an extension to the popular KinectFusion approach. In essence, our method completes shape in general indoor scenes behind what is sensed by the RGB-D camera, which may be crucial e.g. for robotic manipulation tasks or efficient scene exploration. Predicting object thicknesses rather than volumes allows us to work with comparably high spatial resolution without exploding memory and training data requirements on the employed Convolutional Neural Networks. In a series of qualitative and quantitative evaluations, we demonstrate how we accurately predict object thickness and reconstruct general 3D scenes containing multiple objects.
Tasks	3D Reconstruction
Published	2019-03-03
URL	https://arxiv.org/abs/1903.00987v3
PDF	https://arxiv.org/pdf/1903.00987v3.pdf
PWC	https://paperswithcode.com/paper/x-section-cross-section-prediction-for
Repo
Framework

Bayesian Hyperparameter Optimization with BoTorch, GPyTorch and Ax


Title	Bayesian Hyperparameter Optimization with BoTorch, GPyTorch and Ax
Authors	Daniel T. Chang
Abstract	Deep learning models are full of hyperparameters, which are set manually before the learning process can start. Finding the best configuration for these hyperparameters in such a high-dimensional space, with time-consuming and expensive model training and validation, is not a trivial challenge. Bayesian optimization is a powerful tool for the joint optimization of hyperparameters, efficiently trading off exploration and exploitation of the hyperparameter space. In this paper, we discuss Bayesian hyperparameter optimization, including hyperparameter optimization, Bayesian optimization, and Gaussian processes. We also review BoTorch, GPyTorch and Ax, the new open-source frameworks that we use for Bayesian optimization, Gaussian process inference and adaptive experimentation, respectively. For experimentation, we apply Bayesian hyperparameter optimization, for optimizing group weights, to weighted group pooling, which couples unsupervised tiered graph autoencoders learning and supervised graph classification learning for molecular graphs. We find that Ax, BoTorch and GPyTorch together provide a simple-to-use but powerful framework for Bayesian hyperparameter optimization, using Ax’s high-level API that constructs and runs a full optimization loop and returns the best hyperparameter configuration.
Tasks	Gaussian Processes, Graph Classification, Hyperparameter Optimization
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05686v1
PDF	https://arxiv.org/pdf/1912.05686v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-hyperparameter-optimization-with
Repo
Framework

Dynamic Matrix Decomposition for Action Recognition


Title	Dynamic Matrix Decomposition for Action Recognition
Authors	Abdul Basit
Abstract	Designing a technique for the automatic analysis of actions in videos in order to detect activities of interest is of high significance nowadays. In this paper, we explore a robust and dynamic appearance technique for identifying different actions. We also exploit a low-rank and structured sparse matrix decomposition (LSMD) method to better model these activities. Our method is effective in encoding localized spatio-temporal features, which enables the analysis of local motion taking place in the video. Our proposed model uses adjacent frame differences as its input, thereby forcing it to capture the changes occurring in the video. The performance of our model is tested on a benchmark dataset in terms of detection accuracy. The results show the promising capability of our model in detecting actions.
Tasks	Temporal Action Localization
Published	2019-02-20
URL	http://arxiv.org/abs/1902.07438v1
PDF	http://arxiv.org/pdf/1902.07438v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-matrix-decomposition-for-action
Repo
Framework
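
The abstract above says the model takes adjacent frame differences as input. A minimal sketch of that preprocessing step follows, assuming grayscale frames stored as nested lists and an absolute difference; the function name `frame_differences` and the toy `video` are illustrative, not taken from the paper.

```python
from typing import List

# One grayscale frame as rows of pixel intensities (an assumed representation).
Frame = List[List[float]]


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Return the absolute pixel-wise difference of each adjacent frame pair."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


# Toy 2x2 video with three frames; only the changed pixels survive.
video = [
    [[0, 0], [0, 0]],
    [[0, 5], [0, 0]],
    [[0, 5], [3, 0]],
]
diffs = frame_differences(video)
```

A video with N frames yields N - 1 difference frames, and static background pixels become zero, which is why such an input emphasizes motion.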

Deep Features Analysis with Attention Networks


Title	Deep Features Analysis with Attention Networks
Authors	Shipeng Xie, Da Chen, Rong Zhang, Hui Xue
Abstract	Deep neural network models have recently drawn a lot of attention, as they consistently produce impressive results in many computer vision tasks such as image classification and object detection. However, interpreting such models and showing why they perform so well remains a challenging question. In this paper, we propose a novel method to interpret neural network models with an attention mechanism. Inspired by heatmap visualization, we analyze the relation between classification accuracy and the attention-based heatmap. An improved attention-based method is also included, illustrating that a better classifier can be interpreted by the attention-based heatmap.
Tasks	Image Classification, Object Detection
Published	2019-01-20
URL	http://arxiv.org/abs/1901.10042v1
PDF	http://arxiv.org/pdf/1901.10042v1.pdf
PWC	https://paperswithcode.com/paper/deep-features-analysis-with-attention
Repo
Framework

A Data-Efficient Deep Learning Approach for Deployable Multimodal Social Robots


Title	A Data-Efficient Deep Learning Approach for Deployable Multimodal Social Robots
Authors	Heriberto Cuayáhuitl
Abstract	The deep supervised and reinforcement learning paradigms (among others) have the potential to endow interactive multimodal social robots with the ability to acquire skills autonomously. However, it is not yet clear how they can best be deployed in real-world applications. As a step in this direction, we propose a deep learning-based approach for efficiently training a humanoid robot to play multimodal games, and use the game of 'Noughts & Crosses' with two variants as a case study. Its minimum requirements for learning to perceive and interact are based on a few hundred example images, a few example multimodal dialogues and physical demonstrations of robot manipulation, and automatic simulations. In addition, we propose novel algorithms for robust visual game tracking and for competitive policy learning with high winning rates, which substantially outperform DQN-based baselines. While an automatic evaluation shows evidence that the proposed approach can be easily extended to new games with competitive robot behaviours, a human evaluation with 130 humans playing with the Pepper robot confirms that highly accurate visual perception is required for successful game play.
Tasks
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10398v1
PDF	https://arxiv.org/pdf/1908.10398v1.pdf
PWC	https://paperswithcode.com/paper/a-data-efficient-deep-learning-approach-for
Repo
Framework

Wasserstein Adversarial Imitation Learning


Title	Wasserstein Adversarial Imitation Learning
Authors	Huang Xiao, Michael Herman, Joerg Wagner, Sebastian Ziesche, Jalal Etesami, Thai Hong Linh
Abstract	Imitation Learning describes the problem of recovering an expert policy from demonstrations. While inverse reinforcement learning approaches are known to be very sample-efficient in terms of expert demonstrations, they usually require problem-dependent reward functions or a (task-)specific reward-function regularization. In this paper, we show a natural connection between inverse reinforcement learning approaches and Optimal Transport, that enables more general reward functions with desirable properties (e.g., smoothness). Based on our observation, we propose a novel approach called Wasserstein Adversarial Imitation Learning. Our approach considers the Kantorovich potentials as a reward function and further leverages regularized optimal transport to enable large-scale applications. In several robotic experiments, our approach outperforms the baselines in terms of average cumulative rewards and shows a significant improvement in sample-efficiency, by requiring just one expert demonstration.
Tasks	Imitation Learning
Published	2019-06-19
URL	https://arxiv.org/abs/1906.08113v1
PDF	https://arxiv.org/pdf/1906.08113v1.pdf
PWC	https://paperswithcode.com/paper/wasserstein-adversarial-imitation-learning
Repo
Framework
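
The abstract above builds on optimal transport. As a hedged illustration of the underlying distance only, the sketch below computes the Wasserstein-1 distance between two equal-sized one-dimensional samples, where the optimal coupling simply matches sorted values. This is not the paper's method, which uses Kantorovich potentials as rewards and regularized optimal transport; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-sized 1D empirical samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of one sample with the i-th smallest value of the other.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Hypothetical scalar features of expert and agent state visits.
expert = [0.0, 1.0, 2.0]
agent = [3.0, 1.0, 2.0]
distance = wasserstein_1d(expert, agent)  # sorted pairs differ by 1 each -> 1.0
```

The distance is zero only when the two samples coincide as multisets, which is the sense in which an imitation learner can drive its state distribution toward the expert's.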

Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm


Title	Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm
Authors	Jingwei Liu
Abstract	Modeling microbial clades from microarray genome sequences is a challenging problem in biology, especially for the discovery and categorization of gene isolates from new species. Marker family genome sequences play important roles in describing specific microbial clades within species. A framework for support vector machine (SVM) based microbial species classification with the N-best algorithm is constructed to classify the centroid marker genome fragments randomly generated from marker genome sequences on MetaRef. A time series feature extraction method is proposed by segmenting the centroid gene sequences and mapping them into spaces of different dimensions. Two ways of data splitting are investigated: randomly splitting fragments along the genome sequence (DI), or separating genome sequences into two parts (DII). Two strategies for fragment recognition, dimension-by-dimension and sequence-by-sequence, are investigated. The k-mer size selection, the overlap of segmentation and the effects of the random split percentage are also discussed. Experiments on 12390 marker genome sequences belonging to marker families of 17 species from MetaRef show that, for both DI and DII in dimension-by-dimension and sequence-by-sequence recognition, the recognition accuracy can exceed 28% for the top-1 candidate and 91% for the top-10 candidates, on both training and testing sets overall.
Tasks	Time Series
Published	2019-04-19
URL	http://arxiv.org/abs/1904.09061v1
PDF	http://arxiv.org/pdf/1904.09061v1.pdf
PWC	https://paperswithcode.com/paper/random-fragments-classification-of-microbial
Repo
Framework

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model


Title	H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model
Authors	Yanpei Shi, Qiang Huang, Thomas Hain
Abstract	In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use of a hierarchical structure aims to learn speaker-related information locally and globally. In the proposed approach, a frame-level encoder and attention are applied to segments of an input utterance and generate individual segment vectors. Then, segment-level attention is applied to the segment vectors to construct an utterance representation. To evaluate the effectiveness of the proposed approach, the NIST SRE 2008 Part1 dataset is used for training, and two datasets, Switchboard Cellular part1 and CallHome American English Speech, are used to evaluate the quality of the extracted utterance embeddings on speaker identification and verification tasks. In comparison with two baselines, X-vector and X-vector+Attention, the obtained results show that H-vectors can achieve significantly better performance. Furthermore, the extracted utterance-level embeddings are more discriminative than those of the two baselines when mapped into a 2D space using t-SNE.
Tasks	Speaker Identification
Published	2019-10-17
URL	https://arxiv.org/abs/1910.07900v2
PDF	https://arxiv.org/pdf/1910.07900v2.pdf
PWC	https://paperswithcode.com/paper/h-vectors-utterance-level-speaker-embedding
Repo
Framework
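
The two-level pooling described in the abstract above (frame-level attention inside each segment, then segment-level attention over the segment vectors) can be sketched as follows. This is a simplified illustration under stated assumptions: the learned frame-level encoder is omitted, attention scores are plain dot products with fixed query vectors, and all names (`attention_pool`, `utterance_embedding`, the queries) are hypothetical rather than the paper's.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Weighted average of vectors, weights from dot-product scores."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def utterance_embedding(frames: List[Vector], segment_len: int,
                        frame_query: Vector, segment_query: Vector) -> Vector:
    """Pool frames into segment vectors, then segment vectors into one vector."""
    segments = [frames[i:i + segment_len]
                for i in range(0, len(frames), segment_len)]
    segment_vectors = [attention_pool(seg, frame_query) for seg in segments]
    return attention_pool(segment_vectors, segment_query)


# Four toy 2-D frames split into two segments; zero queries give uniform weights.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
embedding = utterance_embedding(frames, 2, [0.0, 0.0], [0.0, 0.0])
```

With non-zero queries, frames and segments that align with the query receive larger weights, which is how the hierarchy can emphasize the speaker-relevant parts of an utterance.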

Emirati-Accented Speaker Identification in Stressful Talking Conditions


Title	Emirati-Accented Speaker Identification in Stressful Talking Conditions
Authors	Ismail Shahin, Ali Bou Nassif
Abstract	This research is dedicated to improving text-independent Emirati-accented speaker identification performance in stressful talking conditions using three distinct classifiers: First-Order Hidden Markov Models (HMM1s), Second-Order Hidden Markov Models (HMM2s), and Third-Order Hidden Markov Models (HMM3s). The database used in this work was collected from 25 native Emirati speakers per gender uttering eight widespread Emirati sentences in each of the neutral, shouted, slow, loud, soft, and fast talking conditions. The features extracted from the captured database are Mel-Frequency Cepstral Coefficients (MFCCs). Based on HMM1s, HMM2s, and HMM3s, average Emirati-accented speaker identification accuracy in stressful conditions is 58.6%, 61.1%, and 65.0%, respectively. The average speaker identification accuracy achieved in stressful conditions based on HMM3s is very similar to that attained in subjective assessment by human listeners.
Tasks	Speaker Identification
Published	2019-09-28
URL	https://arxiv.org/abs/1909.13070v2
PDF	https://arxiv.org/pdf/1909.13070v2.pdf
PWC	https://paperswithcode.com/paper/emirati-accented-speaker-identification-in
Repo
Framework

Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks


Title	Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks
Authors	Yuanshun Yao, Huiying Li, Haitao Zheng, Ben Y. Zhao
Abstract	Recent work has proposed the concept of backdoor attacks on deep neural networks (DNNs), where misbehaviors are hidden inside “normal” models, only to be triggered by very specific inputs. In practice, however, these attacks are difficult to perform and highly constrained by sharing of models through transfer learning. Adversaries have a small window during which they must compromise the student model before it is deployed. In this paper, we describe a significantly more powerful variant of the backdoor attack, latent backdoors, where hidden rules can be embedded in a single “Teacher” model, and automatically inherited by all “Student” models through the transfer learning process. We show that latent backdoors can be quite effective in a variety of application contexts, and validate their practicality through real-world attacks against traffic sign recognition, iris identification of lab volunteers, and facial recognition of public figures (politicians). Finally, we evaluate 4 potential defenses, and find that only one is effective in disrupting latent backdoors, but might incur a cost in classification accuracy as a tradeoff.
Tasks	Traffic Sign Recognition, Transfer Learning
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10447v1
PDF	https://arxiv.org/pdf/1905.10447v1.pdf
PWC	https://paperswithcode.com/paper/regula-sub-rosa-latent-backdoor-attacks-on
Repo
Framework

LdSM: Logarithm-depth Streaming Multi-label Decision Trees


Title	LdSM: Logarithm-depth Streaming Multi-label Decision Trees
Authors	Maryam Majzoubi, Anna Choromanska
Abstract	We consider multi-label classification where the goal is to annotate each data point with the most relevant subset of labels from an extremely large label set. Efficient annotation can be achieved with balanced tree predictors, i.e. trees with logarithmic depth in the label complexity, whose leaves correspond to labels. Designing a prediction mechanism with such trees for real data applications is non-trivial, as it needs to accommodate sending examples to multiple leaves while at the same time sustaining high prediction accuracy. In this paper we develop the LdSM algorithm for the construction and training of multi-label decision trees, where in every node of the tree we optimize a novel objective function that favors balanced splits, maintains high class purity of children nodes, and allows sending examples in multiple directions but with a penalty that prevents tree over-growth. Each node of the tree is trained once the previous node is completed, leading to a streaming approach for training. We analyze the proposed objective theoretically and show that minimizing it leads to pure and balanced data splits. Furthermore, we show a boosting theorem that captures its connection to the multi-label classification error. Experimental results on benchmark data sets demonstrate that our approach achieves high prediction accuracy and low prediction time, and position LdSM as a competitive tool among existing state-of-the-art approaches.
Tasks	Multi-Label Classification
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10428v4
PDF	https://arxiv.org/pdf/1905.10428v4.pdf
PWC	https://paperswithcode.com/paper/ldsm-logarithm-depth-streaming-multi-label
Repo
Framework
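
To make the "logarithmic depth" claim in the abstract above concrete, the sketch below computes the depth of a balanced tree whose leaves correspond to labels. It assumes a binary tree with one label per leaf, which the abstract does not state; the function name is hypothetical, and the real algorithm may route an example to several leaves, so actual prediction cost can exceed a single root-to-leaf path.

```python
def balanced_tree_depth(num_labels: int) -> int:
    """Depth (edges from root to deepest leaf) of a balanced binary tree
    with one leaf per label, i.e. ceil(log2(num_labels))."""
    if num_labels < 1:
        raise ValueError("num_labels must be positive")
    # (k - 1).bit_length() equals ceil(log2(k)) for every integer k >= 1.
    return (num_labels - 1).bit_length()


# One million labels need only 20 binary decisions along a single path.
depth = balanced_tree_depth(1_000_000)  # 20
```

Compared with scoring every label, which is linear in the label count, a single path costs a number of node evaluations that grows only logarithmically.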

Use of Ghost Cytometry to Differentiate Cells with Similar Gross Morphologic Characteristics


Title	Use of Ghost Cytometry to Differentiate Cells with Similar Gross Morphologic Characteristics
Authors	Hiroaki Adachi, Yoko Kawamura, Keiji Nakagawa, Ryoichi Horisaki, Issei Sato, Satoko Yamaguchi, Katsuhito Fujiu, Kayo Waki, Hiroyuki Noji, Sadao Ota
Abstract	Imaging flow cytometry shows significant potential for increasing our understanding of heterogeneous and complex life systems and is useful for biomedical applications. Ghost cytometry is a recently proposed approach for directly analyzing compressively measured signals, thereby relieving the computational bottleneck observed in high-throughput cytometry based on morphological information. While this image-free approach could distinguish different cell types using the same fluorescence staining method, further strict controls are sometimes required to clearly demonstrate that the classification is based on detailed morphologic analysis. In this study, we show that ghost cytometry can be used to classify cell populations of the same type but with different fluorescence distributions in space, supporting the strength of our image-free approach for morphologic cell analysis.
Tasks
Published	2019-03-22
URL	http://arxiv.org/abs/1903.09538v1
PDF	http://arxiv.org/pdf/1903.09538v1.pdf
PWC	https://paperswithcode.com/paper/use-of-ghost-cytometry-to-differentiate-cells
Repo
Framework

Branch and Bound for Piecewise Linear Neural Network Verification


Title	Branch and Bound for Piecewise Linear Neural Network Verification
Authors	Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Philip H. S. Torr, Pushmeet Kohli, M. Pawan Kumar
Abstract	The success of Deep Learning and its potential use in many safety-critical applications has motivated research on formal verification of Neural Network (NN) models. In this context, verification involves proving or disproving that an NN model satisfies certain input-output properties. Despite the reputation of learned NN models as black boxes, and the theoretical hardness of proving useful properties about them, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure and taking insights from formal methods such as Satisfiability Modulo Theory. However, these methods are still far from scaling to realistic neural networks. To facilitate progress in this crucial area, we exploit the Mixed Integer Linear Programming (MIP) formulation of verification to propose a family of algorithms based on Branch-and-Bound (BaB). We show that our family contains previous verification methods as special cases. With the help of the BaB framework, we make three key contributions. Firstly, we identify new methods that combine the strengths of multiple existing approaches, accomplishing significant performance improvements over previous state of the art. Secondly, we introduce an effective branching strategy on ReLU non-linearities. This branching strategy allows us to efficiently and successfully deal with high input dimensional problems with convolutional network architecture, on which previous methods fail frequently. Finally, we propose comprehensive test data sets and benchmarks, which include a collection of previously released test cases. We use the data sets to conduct a thorough experimental comparison of existing and new algorithms and to provide an inclusive analysis of the factors impacting the hardness of verification problems.
Tasks
Published	2019-09-14
URL	https://arxiv.org/abs/1909.06588v3
PDF	https://arxiv.org/pdf/1909.06588v3.pdf
PWC	https://paperswithcode.com/paper/branch-and-bound-for-piecewise-linear-neural
Repo
Framework
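
The branch-and-bound idea in the abstract above can be illustrated on a toy problem. The sketch below tries to prove that a tiny one-input ReLU network is positive on an interval. It is a simplification under loud assumptions: it bounds with plain interval arithmetic instead of a MIP or LP relaxation, and it branches on the input domain rather than on ReLU non-linearities as the paper proposes. The network weights and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical toy network, chosen only for illustration:
#   f(x) = C + sum_j V[j] * relu(W[j] * x + B[j])
W = [1.0, 1.0]
B = [1.0, 0.0]
V = [1.0, -1.0]
C = 0.1


def relu(z: float) -> float:
    return max(0.0, z)


def f(x: float) -> float:
    return C + sum(v * relu(w * x + b) for w, b, v in zip(W, B, V))


def lower_bound(lo: float, hi: float) -> float:
    """Interval-arithmetic lower bound of f over [lo, hi]."""
    bound = C
    for w, b, v in zip(W, B, V):
        # Range of the pre-activation w * x + b over the interval.
        z_lo = min(w * lo, w * hi) + b
        z_hi = max(w * lo, w * hi) + b
        # ReLU is monotone, so range endpoints map to range endpoints.
        a_lo, a_hi = relu(z_lo), relu(z_hi)
        bound += v * a_lo if v >= 0 else v * a_hi
    return bound


def verify_positive(lo: float, hi: float,
                    max_splits: int = 1000) -> Tuple[str, Optional[float]]:
    """Try to prove f(x) > 0 for all x in [lo, hi] by branch and bound."""
    domains: List[Tuple[float, float]] = [(lo, hi)]
    splits = 0
    while domains:
        d_lo, d_hi = domains.pop()
        mid = 0.5 * (d_lo + d_hi)
        # Upper-bounding step: a concrete point with f <= 0 is a counterexample.
        for x in (d_lo, mid, d_hi):
            if f(x) <= 0.0:
                return "falsified", x
        # Lower-bounding step: a positive bound proves the property here.
        if lower_bound(d_lo, d_hi) > 0.0:
            continue
        if splits >= max_splits:
            return "unknown", None
        # Branching step: split the domain and examine both halves.
        splits += 1
        domains.append((d_lo, mid))
        domains.append((mid, d_hi))
    return "verified", None


result = verify_positive(-1.0, 1.0)
```

On the whole interval the interval bound is negative, so nothing can be concluded; after one split both halves have positive lower bounds and the property is proved. That gap between a loose bound on a large domain and tight bounds on its pieces is what branching exploits.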

EM-Fusion: Dynamic Object-Level SLAM with Probabilistic Data Association


Title	EM-Fusion: Dynamic Object-Level SLAM with Probabilistic Data Association
Authors	Michael Strecke, Jörg Stückler
Abstract	The majority of approaches for acquiring dense 3D environment maps with RGB-D cameras assumes static environments or rejects moving objects as outliers. The representation and tracking of moving objects, however, has significant potential for applications in robotics or augmented reality. In this paper, we propose a novel approach to dynamic SLAM with dense object-level representations. We represent rigid objects in local volumetric signed distance function (SDF) maps, and formulate multi-object tracking as direct alignment of RGB-D images with the SDF representations. Our main novelty is a probabilistic formulation which naturally leads to strategies for data association and occlusion handling. We analyze our approach in experiments and demonstrate that our approach compares favorably with the state-of-the-art methods in terms of robustness and accuracy.
Tasks	Multi-Object Tracking, Object Tracking
Published	2019-04-26
URL	http://arxiv.org/abs/1904.11781v1
PDF	http://arxiv.org/pdf/1904.11781v1.pdf
PWC	https://paperswithcode.com/paper/em-fusion-dynamic-object-level-slam-with
Repo
Framework

AlgaeDICE: Policy Gradient from Arbitrary Experience


Title	AlgaeDICE: Policy Gradient from Arbitrary Experience
Authors	Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans
Abstract	In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility. This presents a challenge to traditional RL algorithms since the max-return objective involves an expectation over on-policy samples. We introduce a new formulation of max-return optimization that allows the problem to be re-expressed by an expectation over an arbitrary behavior-agnostic and off-policy data distribution. We first derive this result by considering a regularized version of the dual max-return objective before extending our findings to unregularized objectives through the use of a Lagrangian formulation of the linear programming characterization of Q-values. We show that, if auxiliary dual variables of the objective are optimized, then the gradient of the off-policy objective is exactly the on-policy policy gradient, without any use of importance weighting. In addition to revealing the appealing theoretical properties of this approach, we also show that it delivers good practical performance.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02074v1
PDF	https://arxiv.org/pdf/1912.02074v1.pdf
PWC	https://paperswithcode.com/paper/algaedice-policy-gradient-from-arbitrary
Repo
Framework