July 28, 2019


Paper Group ANR 347


6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality


Title	6D Object Pose Estimation with Depth Images: A Seamless Approach for Robotic Interaction and Augmented Reality
Authors	David Joseph Tan, Nassir Navab, Federico Tombari
Abstract	To determine the 3D orientation and 3D location of objects in the surroundings of a camera mounted on a robot or mobile device, we developed two powerful algorithms in object detection and temporal tracking that are combined seamlessly for robotic perception and interaction as well as Augmented Reality (AR). A separate evaluation of, respectively, the object detection and the temporal tracker demonstrates the important stride in research as well as the impact on industrial robotic applications and AR. When evaluated on a standard dataset, the detector produced the highest f1-score with a large margin while the tracker generated the best accuracy at a very low latency of approximately 2 ms per frame with one CPU core: both algorithms outperforming the state of the art. When combined, we achieve a powerful framework that is robust to handle multiple instances of the same object under occlusion and clutter while attaining real-time performance. Aiming at stepping beyond the simple scenarios used by current systems, often constrained by having a single object in absence of clutter, averting to touch the object to prevent close-range partial occlusion, selecting brightly colored objects to easily segment them individually or assuming that the object has simple geometric structure, we demonstrate the capacity to handle challenging cases under clutter, partial occlusion and varying lighting conditions with objects of different shapes and sizes.
Tasks	6D Pose Estimation using RGB, Object Detection, Pose Estimation
Published	2017-09-05
URL	http://arxiv.org/abs/1709.01459v1
PDF	http://arxiv.org/pdf/1709.01459v1.pdf
PWC	https://paperswithcode.com/paper/6d-object-pose-estimation-with-depth-images-a
Repo
Framework

BARCHAN: Blob Alignment for Robust CHromatographic ANalysis


Title	BARCHAN: Blob Alignment for Robust CHromatographic ANalysis
Authors	Camille Couprie, Laurent Duval, Maxime Moreaud, Sophie Hénon, Mélinda Tebib, Vincent Souchon
Abstract	Comprehensive two-dimensional gas chromatography (GCxGC) plays a central role in the elucidation of complex samples. The automation of the identification of peak areas is of prime interest to obtain a fast and repeatable analysis of chromatograms. To determine the concentration of compounds or pseudo-compounds, templates of blobs are defined and superimposed on a reference chromatogram. The templates then need to be modified when different chromatograms are recorded. In this study, we present a chromatogram and template alignment method based on peak registration called BARCHAN. Peaks are identified using a robust mathematical morphology tool. The alignment is performed by a probabilistic estimation of a rigid transformation along the first dimension, and a non-rigid transformation in the second dimension, taking into account noise, outliers and missing peaks in a fully automated way. Resulting aligned chromatograms and masks are presented on two datasets. The proposed algorithm proves to be fast and reliable. It significantly reduces the time to results for GCxGC analysis.
Tasks
Published	2017-02-25
URL	http://arxiv.org/abs/1702.07942v1
PDF	http://arxiv.org/pdf/1702.07942v1.pdf
PWC	https://paperswithcode.com/paper/barchan-blob-alignment-for-robust
Repo
Framework

Generative Adversarial Mapping Networks


Title	Generative Adversarial Mapping Networks
Authors	Jianbo Guo, Guangxiang Zhu, Jian Li
Abstract	Generative Adversarial Networks (GANs) have shown impressive performance in generating photo-realistic images. They fit generative models by minimizing a certain distance measure between the real image distribution and the generated data distribution. Several distance measures have been used, such as Jensen-Shannon divergence, $f$-divergence, and Wasserstein distance, and choosing an appropriate distance measure is very important for training the generative network. In this paper, we choose to use the maximum mean discrepancy (MMD) as the distance metric, which has several nice theoretical guarantees. In fact, the generative moment matching network (GMMN) (Li, Swersky, and Zemel 2015) is such a generative model, which contains only one generator network $G$ trained by directly minimizing MMD between the real and generated distributions. However, it fails to generate meaningful samples on challenging benchmark datasets, such as CIFAR-10 and LSUN. To improve on GMMN, we propose to add an extra network $F$, called the mapper. $F$ maps both the real data distribution and the generated data distribution from the original data space to a feature representation space $\mathcal{R}$, and it is trained to maximize MMD between the two mapped distributions in $\mathcal{R}$, while the generator $G$ tries to minimize the MMD. We call the new model generative adversarial mapping networks (GAMNs). We demonstrate that the adversarial mapper $F$ can help $G$ to better capture the underlying data distribution. We also show that GAMN significantly outperforms GMMN, and is also superior to or comparable with other state-of-the-art GAN based methods on MNIST, CIFAR-10 and LSUN-Bedrooms datasets.
Tasks
Published	2017-09-28
URL	http://arxiv.org/abs/1709.09820v1
PDF	http://arxiv.org/pdf/1709.09820v1.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-mapping-networks
Repo
Framework
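
The abstract above relies on the maximum mean discrepancy (MMD) as the distance between real and generated samples. The following is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel, computed directly in the data space (as in GMMN). The kernel choice, the `bandwidth` parameter, the function names and the toy data are assumptions for illustration; the mapper $F$ of GAMN is not implemented here.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Toy data (hypothetical): two samples from the same distribution
# and one sample from a shifted distribution.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))
close = rng.normal(0.0, 1.0, size=(200, 2))
far = rng.normal(3.0, 1.0, size=(200, 2))

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
print(mmd_close, mmd_far)
```

In an adversarial setup of the kind the abstract describes, `x` and `y` would presumably be replaced by their images under the mapper before the estimate is taken.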

Arabidopsis roots segmentation based on morphological operations and CRFs


Title	Arabidopsis roots segmentation based on morphological operations and CRFs
Authors	José Ignacio Orlando, Hugo Luis Manterola, Enzo Ferrante, Federico Ariel
Abstract	Arabidopsis thaliana is a plant species widely utilized by scientists to estimate the impact of genetic differences in root morphological features. For this purpose, images of this plant after genetic modifications are taken to study differences in the root architecture. This task requires manual segmentation of radicular structures, which is particularly tedious and time-consuming. In this work, we present an unsupervised method for Arabidopsis thaliana root segmentation based on morphological operations and fully-connected Conditional Random Fields. Although other approaches have been proposed for this purpose, all of them are based on more complex and expensive imaging modalities. Our results show that our method can be easily applied to images taken using conventional scanners, with minor user intervention. A first data set, our results and a fully open source implementation are available online.
Tasks
Published	2017-04-25
URL	http://arxiv.org/abs/1704.07793v1
PDF	http://arxiv.org/pdf/1704.07793v1.pdf
PWC	https://paperswithcode.com/paper/arabidopsis-roots-segmentation-based-on
Repo
Framework

A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction


Title	A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction
Authors	Yoonsik Kim, Insung Hwang, Nam Ik Cho
Abstract	The inception network has been shown to provide good performance on image classification problems, but there is not much evidence that it is also effective for image restoration or pixel-wise labeling problems. For image restoration problems, pooling is generally not used because the decimated features are not helpful for the reconstruction of an image as the output. Moreover, most deep learning architectures for restoration problems do not use dense prediction, which needs many training parameters. From these observations, to bring the performance of inception-like structures to image-based problems, we propose a new convolutional network-in-network structure. The proposed network can be considered a modification of the inception structure where the pool projection and pooling layer are removed to maintain the entire feature map size, and a larger kernel filter is added instead. The proposed network greatly reduces the number of parameters on account of the removed dense prediction and pooling, which is an advantage, but may also reduce the receptive field in each layer. Hence, we add a larger kernel than the original inception structure so as not to increase the depth of layers. The proposed structure is applied to typical image-to-image learning problems, i.e., problems where the input and output have the same size, such as skin detection, semantic segmentation, and compression artifact reduction. Extensive experiments show that the proposed network brings comparable or better results than state-of-the-art convolutional neural networks for these problems.
Tasks	Image Classification, Image Restoration, Semantic Segmentation
Published	2017-01-22
URL	http://arxiv.org/abs/1701.06190v1
PDF	http://arxiv.org/pdf/1701.06190v1.pdf
PWC	https://paperswithcode.com/paper/a-new-convolutional-network-in-network
Repo
Framework

Panoramic Robust PCA for Foreground-Background Separation on Noisy, Free-Motion Camera Video


Title	Panoramic Robust PCA for Foreground-Background Separation on Noisy, Free-Motion Camera Video
Authors	Brian E. Moore, Chen Gao, Raj Rao Nadakuditi
Abstract	This work presents a new robust PCA method for foreground-background separation on freely moving camera video with possible dense and sparse corruptions. Our proposed method registers the frames of the corrupted video and then encodes the varying perspective arising from camera motion as missing data in a global model. This formulation allows our algorithm to produce a panoramic background component that automatically stitches together corrupted data from partially overlapping frames to reconstruct the full field of view. We model the registered video as the sum of a low-rank component that captures the background, a smooth component that captures the dynamic foreground of the scene, and a sparse component that isolates possible outliers and other sparse corruptions in the video. The low-rank portion of our model is based on a recent low-rank matrix estimator (OptShrink) that has been shown to yield superior low-rank subspace estimates in practice. To estimate the smooth foreground component of our model, we use a weighted total variation framework that enables our method to reliably decouple the true foreground of the video from sparse corruptions. We perform extensive numerical experiments on both static and moving camera video subject to a variety of dense and sparse corruptions. Our experiments demonstrate the state-of-the-art performance of our proposed method compared to existing methods both in terms of foreground and background estimation accuracy.
Tasks
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06229v3
PDF	http://arxiv.org/pdf/1712.06229v3.pdf
PWC	https://paperswithcode.com/paper/panoramic-robust-pca-for-foreground
Repo
Framework

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator


Title	Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator
Authors	Matthew J. Marinella, Sapan Agarwal, Alexander Hsia, Isaac Richter, Robin Jacobs-Gedrim, John Niroula, Steven J. Plimpton, Engin Ipek, Conrad D. James
Abstract	Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per op and, more recently, production accelerators capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance per watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This work presents a detailed design, using a state of the art 14/16 nm PDK, of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit and device-level analysis of energy, latency, area, and accuracy is given and compared to relevant designs using standard digital ReRAM and SRAM operations. It is shown that the analog accelerator has a 270x energy and 540x latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate (MAC). Compared to an SRAM based accelerator, the energy is 430x better and latency is 34x better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.
Tasks
Published	2017-07-31
URL	http://arxiv.org/abs/1707.09952v2
PDF	http://arxiv.org/pdf/1707.09952v2.pdf
PWC	https://paperswithcode.com/paper/multiscale-co-design-analysis-of-energy
Repo
Framework

Improving Sonar Image Patch Matching via Deep Learning


Title	Improving Sonar Image Patch Matching via Deep Learning
Authors	Matias Valdenegro-Toro
Abstract	Matching sonar images with high accuracy has been a problem for a long time, as sonar images are inherently hard to model due to reflections, noise and viewpoint dependence. Autonomous Underwater Vehicles require good sonar image matching capabilities for tasks such as tracking, simultaneous localization and mapping (SLAM) and some cases of object detection/recognition. We propose the use of Convolutional Neural Networks (CNN) to learn a matching function that can be trained from labeled sonar data, after pre-processing to generate matching and non-matching pairs. In a dataset of 39K training pairs, we obtain 0.91 Area under the ROC Curve (AUC) for a CNN that outputs a binary classification matching decision, and 0.89 AUC for another CNN that outputs a matching score. In comparison, classical keypoint matching methods like SIFT, SURF, ORB and AKAZE obtain AUC 0.61 to 0.68. Alternative learning methods obtain similar results, with a Random Forest Classifier obtaining AUC 0.79, and a Support Vector Machine resulting in AUC 0.66.
Tasks	Object Detection, Simultaneous Localization and Mapping
Published	2017-09-07
URL	http://arxiv.org/abs/1709.02150v1
PDF	http://arxiv.org/pdf/1709.02150v1.pdf
PWC	https://paperswithcode.com/paper/improving-sonar-image-patch-matching-via-deep
Repo
Framework
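
The results above are reported as Area under the ROC Curve (AUC). As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (matching, non-matching) pairs in which the matching pair receives the higher score, with ties counted as one half. The function name and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of matching pairs
    neg = scores[~labels]   # scores of non-matching pairs
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)

# Toy example (hypothetical): 1 = matching patches, 0 = non-matching.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 8 of the 9 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which gives some context for the 0.61 to 0.68 range reported for the keypoint methods.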

Aicyber’s System for NLPCC 2017 Shared Task 2: Voting of Baselines


Title	Aicyber’s System for NLPCC 2017 Shared Task 2: Voting of Baselines
Authors	Du Steven, Xi Zhang
Abstract	This paper presents Aicyber’s system for NLPCC 2017 shared task 2. It is formed by a voting of three deep learning based systems trained on character-enhanced word vectors and a well known bag-of-words model.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05467v1
PDF	http://arxiv.org/pdf/1711.05467v1.pdf
PWC	https://paperswithcode.com/paper/aicybers-system-for-nlpcc-2017-shared-task-2
Repo
Framework
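
The abstract does not say how the votes are combined, so the following is only a sketch of plain plurality voting over four classifiers, under the assumption that each system outputs one label per example. The function name, the label names and the predictions are hypothetical.

```python
from collections import Counter

def plurality_vote(predictions_per_system):
    """Combine per-system label lists into one label per example."""
    combined = []
    # zip(*...) groups the predictions of all systems for each example.
    for votes in zip(*predictions_per_system):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of three neural systems and one bag-of-words model.
system_outputs = [
    ["pos", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "neg", "pos"],
]
voted = plurality_vote(system_outputs)
print(voted)
```

With an even number of voters ties are possible; this sketch resolves them by whichever label `Counter` saw first, which a real system would likely replace with a confidence-based rule.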

ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations


Title	ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
Authors	John Wieting, Kevin Gimpel
Abstract	We describe ParaNMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs. We generated the pairs automatically by using neural machine translation to translate the non-English side of a large parallel corpus, following Wieting et al. (2017). Our hope is that ParaNMT-50M can be a valuable resource for paraphrase generation and can provide a rich source of semantic knowledge to improve downstream natural language understanding tasks. To show its utility, we use ParaNMT-50M to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition, in addition to showing how it can be used for paraphrase generation.
Tasks	Machine Translation, Paraphrase Generation, Semantic Textual Similarity, Sentence Embeddings
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05732v2
PDF	http://arxiv.org/pdf/1711.05732v2.pdf
PWC	https://paperswithcode.com/paper/paranmt-50m-pushing-the-limits-of
Repo
Framework

Online Natural Gradient as a Kalman Filter


Title	Online Natural Gradient as a Kalman Filter
Authors	Yann Ollivier
Abstract	We cast Amari’s natural gradient in statistical learning as a specific case of Kalman filtering. Namely, applying an extended Kalman filter to estimate a fixed unknown parameter of a probabilistic model from a series of observations, is rigorously equivalent to estimating this parameter via an online stochastic natural gradient descent on the log-likelihood of the observations. In the i.i.d. case, this relation is a consequence of the “information filter” phrasing of the extended Kalman filter. In the recurrent (state space, non-i.i.d.) case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models. This exact algebraic correspondence provides relevant interpretations for natural gradient hyperparameters such as learning rates or initialization and regularization of the Fisher information matrix.
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00209v3
PDF	http://arxiv.org/pdf/1703.00209v3.pdf
PWC	https://paperswithcode.com/paper/online-natural-gradient-as-a-kalman-filter
Repo
Framework
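
The equivalence stated in the abstract can be checked by hand in the simplest possible setting. The sketch below assumes a scalar Gaussian model $y_t \sim \mathcal{N}(\theta, \sigma^2)$ with a fixed unknown mean, and compares a Kalman filter with static state against an online natural gradient step whose Fisher information is accumulated over time and initialized to the inverse of the Kalman prior variance. This is a toy instance chosen for illustration, not the paper's general construction; all function and variable names are hypothetical.

```python
import numpy as np

def kalman_static_mean(observations, theta0, p0, noise_var=1.0):
    """Kalman filter for a constant scalar state observed with noise."""
    theta, p = theta0, p0
    for y in observations:
        gain = p / (p + noise_var)         # Kalman gain
        theta = theta + gain * (y - theta)
        p = (1.0 - gain) * p               # posterior variance
    return theta

def online_natural_gradient_mean(observations, theta0, fisher0, noise_var=1.0):
    """Online natural gradient ascent on the Gaussian log-likelihood."""
    theta, fisher = theta0, fisher0
    for y in observations:
        fisher = fisher + 1.0 / noise_var  # accumulated Fisher information
        grad = (y - theta) / noise_var     # gradient of log p(y | theta)
        theta = theta + grad / fisher      # preconditioned step
    return theta

rng = np.random.default_rng(1)
ys = rng.normal(2.0, 1.0, size=50)

p0 = 4.0  # prior variance for the Kalman filter (assumed)
theta_kalman = kalman_static_mean(ys, theta0=0.0, p0=p0)
theta_natgrad = online_natural_gradient_mean(ys, theta0=0.0, fisher0=1.0 / p0)
print(theta_kalman, theta_natgrad)
```

In this toy case the Kalman gain $P/(P+\sigma^2)$ and the natural gradient step $(1/\sigma^2)/(J + 1/\sigma^2)$ coincide when $J = 1/P$, which illustrates how the prior variance plays the role of a Fisher matrix initialization.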

Large Linear Multi-output Gaussian Process Learning


Title	Large Linear Multi-output Gaussian Process Learning
Authors	Vladimir Feinberg, Li-Fang Cheng, Kai Li, Barbara E Engelhardt
Abstract	Gaussian processes (GPs), or distributions over arbitrary functions in a continuous domain, can be generalized to the multi-output case: a linear model of coregionalization (LMC) is one approach. LMCs estimate and exploit correlations across the multiple outputs. While model estimation can be performed efficiently for single-output GPs, these assume stationarity, but in the multi-output case the cross-covariance interaction is not stationary. We propose Large Linear GP (LLGP), which circumvents the need for stationarity by inducing structure in the LMC kernel through a common grid of inputs shared between outputs, enabling optimization of GP hyperparameters for multi-dimensional outputs and low-dimensional inputs. When applied to synthetic two-dimensional and real time series data, we find our theoretical improvement relative to the current solutions for multi-output GPs is realized with LLGP reducing training time while improving or maintaining predictive mean accuracy. Moreover, by using a direct likelihood approximation rather than a variational one, model confidence estimates are significantly improved.
Tasks	Gaussian Processes, Time Series
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10813v3
PDF	http://arxiv.org/pdf/1705.10813v3.pdf
PWC	https://paperswithcode.com/paper/large-linear-multi-output-gaussian-process
Repo
Framework
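
For readers unfamiliar with the linear model of coregionalization, the sketch below builds a dense LMC covariance on a common input grid as a sum of Kronecker products, $K = \sum_q B_q \otimes k_q(X, X)$ with $B_q = a_q a_q^\top + \mathrm{diag}(\kappa_q)$. The RBF kernels, parameter values and names are assumptions for illustration. The dense construction shown here is exactly the cost that a structured method would try to avoid, so this is not the LLGP algorithm itself.

```python
import numpy as np

def rbf(x, lengthscale):
    """Stationary RBF kernel matrix on a 1-D grid of inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def lmc_kernel(x, mixing, kappas, lengthscales):
    """Dense LMC covariance: sum over latent kernels of B_q kron k_q(x, x)."""
    n_outputs = mixing.shape[1]
    size = n_outputs * x.size
    total = np.zeros((size, size))
    for a_q, kappa_q, ell_q in zip(mixing, kappas, lengthscales):
        # Coregionalization matrix coupling the outputs for latent kernel q.
        b_q = np.outer(a_q, a_q) + np.diag(kappa_q)
        total += np.kron(b_q, rbf(x, ell_q))
    return total

# Hypothetical setup: 2 latent kernels, 2 outputs, 5 shared grid points.
grid = np.linspace(0.0, 1.0, 5)
mixing = np.array([[1.0, 0.5], [0.2, -0.3]])
kappas = np.array([[0.1, 0.1], [0.05, 0.05]])
lengthscales = [0.2, 1.0]

K = lmc_kernel(grid, mixing, kappas, lengthscales)
print(K.shape)
```

Because every output shares the same grid, each term keeps its Kronecker form, which is the kind of structure the abstract refers to when it mentions a common grid of inputs.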

Analyzing users’ sentiment towards popular consumer industries and brands on Twitter


Title	Analyzing users’ sentiment towards popular consumer industries and brands on Twitter
Authors	Guoning Hu, Preeti Bhargava, Saul Fuhrmann, Sarah Ellinger, Nemanja Spasojevic
Abstract	Social media serves as a unified platform for users to express their thoughts on subjects ranging from their daily lives to their opinion on consumer brands and products. These users wield an enormous influence in shaping the opinions of other consumers and influence brand perception, brand loyalty and brand advocacy. In this paper, we analyze the opinion of 19M Twitter users towards 62 popular industries, encompassing 12,898 enterprise and consumer brands, as well as associated subject matter topics, via sentiment analysis of 330M tweets over a period spanning a month. We find that users tend to be most positive towards manufacturing and most negative towards service industries. In addition, they tend to be more positive or negative when interacting with brands than generally on Twitter. We also find that sentiment towards brands within an industry varies greatly and we demonstrate this using two industries as use cases. In addition, we discover that there is no strong correlation between topic sentiments of different industries, demonstrating that topic sentiments are highly dependent on the context of the industry that they are mentioned in. We demonstrate the value of such an analysis in order to assess the impact of brands on social media. We hope that this initial study will prove valuable for both researchers and companies in understanding users’ perception of industries, brands and associated topics and encourage more research in this field.
Tasks	Sentiment Analysis
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07434v1
PDF	http://arxiv.org/pdf/1709.07434v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-users-sentiment-towards-popular
Repo
Framework

Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images


Title	Convolutional Networks for Object Category and 3D Pose Estimation from 2D Images
Authors	Siddharth Mahendran, Haider Ali, Rene Vidal
Abstract	Current CNN-based algorithms for recovering the 3D pose of an object in an image assume knowledge about both the object category and its 2D localization in the image. In this paper, we relax one of these constraints and propose to solve the task of joint object category and 3D pose estimation from an image assuming known 2D localization. We design a new architecture for this task composed of a feature network that is shared between subtasks, an object categorization network built on top of the feature network, and a collection of category dependent pose regression networks. We also introduce suitable loss functions and a training method for the new architecture. Experiments on the challenging PASCAL3D+ dataset show state-of-the-art performance in the joint categorization and pose estimation task. Moreover, our performance on the joint task is comparable to the performance of state-of-the-art methods on the simpler 3D pose estimation with known object category task.
Tasks	3D Pose Estimation, Pose Estimation
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07426v3
PDF	http://arxiv.org/pdf/1711.07426v3.pdf
PWC	https://paperswithcode.com/paper/convolutional-networks-for-object-category
Repo
Framework

Control-Oriented Learning on the Fly


Title	Control-Oriented Learning on the Fly
Authors	Melkior Ornik, Arie Israel, Ufuk Topcu
Abstract	This paper focuses on developing a strategy for control of systems whose dynamics are almost entirely unknown. This situation arises naturally in a scenario where a system undergoes a critical failure. In that case, it is imperative to retain the ability to satisfy basic control objectives in order to avert an imminent catastrophe. A prime example of such an objective is the reach-avoid problem, where a system needs to move to a certain state in a constrained state space. To deal with limitations on our knowledge of system dynamics, we develop a theory of myopic control. The primary goal of myopic control is to, at any given time, optimize the current direction of the system trajectory, given solely the information obtained about the system until that time. We propose an algorithm that uses small perturbations in the control effort to learn local dynamics while simultaneously ensuring that the system moves in a direction that appears to be nearly optimal, and provide hard bounds for its suboptimality. We additionally verify the usefulness of the algorithm on a simulation of a damaged aircraft seeking to avoid a crash, as well as on an example of a Van der Pol oscillator.
Tasks
Published	2017-09-14
URL	http://arxiv.org/abs/1709.04889v2
PDF	http://arxiv.org/pdf/1709.04889v2.pdf
PWC	https://paperswithcode.com/paper/control-oriented-learning-on-the-fly
Repo
Framework