Paper Group ANR 617
Pose estimation and bin picking for deformable products. Salient Slices: Improved Neural Network Training and Performance with Image Entropy. Cascaded Revision Network for Novel Object Captioning. Data Valuation using Reinforcement Learning. Global Optimality Guarantees For Policy Gradient Methods. Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots. Mono-Stixels: Monocular depth reconstruction of dynamic street scenes. Subspace Networks for Few-shot Classification. Supervised and Semi-supervised Deep Probabilistic Models for Indoor Positioning Problems. GP-VAE: Deep Probabilistic Time Series Imputation. High Frame Rate Video Reconstruction based on an Event Camera. Interpretable Neural Predictions with Differentiable Binary Variables. Classification of Crop Tolerance to Heat and Drought: A Deep Convolutional Neural Networks Approach. An Empirical Study of Efficient ASR Rescoring with Transformers. The General Pair-based Weighting Loss for Deep Metric Learning.
Pose estimation and bin picking for deformable products
Title | Pose estimation and bin picking for deformable products |
Authors | Benjamin Joffe, Tevon Walker, Remi Gourdon, Konrad Ahlin |
Abstract | Robotic systems in manufacturing applications commonly assume known object geometry and appearance. This simplifies the task for the 3D perception algorithms and allows the manipulation to be more deterministic. However, those approaches are not easily transferable to the agricultural and food domains due to the variability and deformability of natural food. We demonstrate an approach applied to poultry products that allows picking up a whole chicken from an unordered bin using a suction cup gripper, estimating its pose using a Deep Learning approach, and placing it in a canonical orientation where it can be further processed. Our robotic system was experimentally evaluated and is able to generalize to object variations and achieves high accuracy on bin picking and pose estimation tasks in a real-world environment. |
Tasks | Pose Estimation |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.05185v1 |
PDF | https://arxiv.org/pdf/1911.05185v1.pdf |
PWC | https://paperswithcode.com/paper/pose-estimation-and-bin-picking-for |
Repo | |
Framework | |
Salient Slices: Improved Neural Network Training and Performance with Image Entropy
Title | Salient Slices: Improved Neural Network Training and Performance with Image Entropy |
Authors | Steven J. Frank, Andrea M. Frank |
Abstract | As a training and analysis strategy for convolutional neural networks (CNNs), we slice images into tiled segments and use, for training and prediction, segments that both satisfy a criterion of information diversity and contain sufficient content to support classification. In particular, we use image entropy as the diversity criterion. This ensures that each tile carries as much information diversity as the original image, and for many applications serves as an indicator of usefulness in classification. To make predictions, a probability aggregation framework is applied to the probabilities the CNN assigns to the input image tiles. This technique facilitates the use of large, high-resolution images that would be impractical to analyze unmodified; provides data augmentation for training, which is particularly valuable when image availability is limited; and enhances prediction accuracy through the ensemble nature of the input. |
Tasks | Data Augmentation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12436v3 |
PDF | https://arxiv.org/pdf/1907.12436v3.pdf |
PWC | https://paperswithcode.com/paper/salient-slices-improved-neural-network |
Repo | |
Framework | |
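The entropy criterion in the abstract above is straightforward to prototype. Below is a minimal sketch, assuming grayscale images; the tile size, stride, and entropy threshold are illustrative choices, not the paper's values.

```python
import numpy as np

def shannon_entropy(tile, bins=256):
    """Shannon entropy (bits) of a grayscale tile's intensity histogram."""
    hist, _ = np.histogram(tile, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]
    return -np.sum(p * np.log2(p))

def salient_tiles(image, tile=128, stride=64, min_entropy=5.0):
    """Slice `image` into overlapping tiles and keep the high-entropy ones.
    `min_entropy` is a hypothetical stand-in for the paper's criterion."""
    kept = []
    h, w = image.shape
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            t = image[y:y + tile, x:x + tile]
            if shannon_entropy(t) >= min_entropy:
                kept.append(t)
    return kept
```

At prediction time, the per-tile class probabilities from the CNN would then be aggregated (e.g. averaged) into a single image-level prediction, matching the ensemble behavior described above.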
Cascaded Revision Network for Novel Object Captioning
Title | Cascaded Revision Network for Novel Object Captioning |
Authors | Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang |
Abstract | Image captioning, a challenging task in which a machine automatically describes an image with sentences, has drawn significant attention in recent years. Despite the remarkable improvements of recent approaches, however, these methods are built upon large sets of training image-sentence pairs. The expensive labeling effort hence limits captioning models' ability to describe the wider world. In this paper, we present a novel network structure, the Cascaded Revision Network (CRN), which aims to relieve this problem by equipping the model with out-of-domain knowledge. CRN first tries its best to describe an image using the existing vocabulary from in-domain knowledge. Due to the lack of out-of-domain knowledge, the caption may be inaccurate or include ambiguous words for images with unknown (novel) objects. We propose to re-edit the primary caption through a series of cascaded operations. We introduce a perplexity predictor to find the words most likely to be inaccurate given the input image. Thereafter, we utilize external knowledge from a pre-trained object detection model and select more accurate words from the detection results via a visual matching module. In the last step, we design a semantic matching module to ensure that the novel object fits in the right position. Through this cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects. We validate the proposed method with state-of-the-art performance on the held-out MSCOCO dataset and show that it scales to ImageNet, demonstrating its effectiveness. |
Tasks | Image Captioning, Object Detection |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02726v1 |
PDF | https://arxiv.org/pdf/1908.02726v1.pdf |
PWC | https://paperswithcode.com/paper/cascaded-revision-network-for-novel-object |
Repo | |
Framework | |
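The revision pipeline above hinges on spotting which generated words are probably wrong. A minimal sketch of that first step, using raw per-token negative log-likelihood as the suspicion signal; the function name and the NLL heuristic (in place of CRN's learned perplexity predictor) are assumptions.

```python
import numpy as np

def most_suspicious_word(token_logprobs, tokens):
    """Flag the generated word the captioner was least confident about.
    token_logprobs[i] = log p(tokens[i] | tokens[:i], image).
    CRN's learned perplexity predictor would replace this heuristic."""
    nll = -np.asarray(token_logprobs)
    i = int(np.argmax(nll))
    return i, tokens[i]

# Usage: the flagged word is the candidate for replacement by the
# best-matching label from the pre-trained object detector.
idx, word = most_suspicious_word([-0.1, -0.3, -2.7, -0.2],
                                 ["a", "man", "riding", "a"])
```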
Data Valuation using Reinforcement Learning
Title | Data Valuation using Reinforcement Learning |
Authors | Jinsung Yoon, Sercan O. Arik, Tomas Pfister |
Abstract | Quantifying the value of data is a fundamental problem in machine learning. Data valuation has multiple important use cases: (1) building insights about the learning task, (2) domain adaptation, (3) corrupted sample discovery, and (4) robust learning. To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL). We employ a data value estimator (modeled by a deep neural network) to learn how likely each datum is to be used in training the predictor model. We train the data value estimator using a reinforcement signal: the reward obtained on a small validation set that reflects performance on the target task. We demonstrate that DVRL yields superior data value estimates compared to alternative methods across different types of datasets and in a diverse set of application scenarios. The corrupted sample discovery performance of DVRL is close to optimal in many regimes (i.e. as if the noisy samples were known a priori), and for domain adaptation and robust learning DVRL significantly outperforms the state of the art by 14.6% and 10.8%, respectively. |
Tasks | Domain Adaptation, Meta-Learning |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11671v1 |
PDF | https://arxiv.org/pdf/1909.11671v1.pdf |
PWC | https://paperswithcode.com/paper/data-valuation-using-reinforcement-learning |
Repo | |
Framework | |
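A minimal sketch of the training signal described above: the value estimator scores each training point, a predictor is trained on a subset sampled by those scores, and the validation reward reinforces the estimator. The `estimator` and `train_predictor` interfaces and the simple moving-average baseline are assumptions, not DVRL's exact implementation.

```python
import torch

def dvrl_step(estimator, train_predictor, x_tr, y_tr, x_val, y_val, baseline, opt):
    """One REINFORCE step for the data value estimator (a sketch).
    estimator(x, y)      -> per-datum selection probabilities in (0, 1)
    train_predictor(...) -> validation accuracy of a predictor trained
                            on the selected subset (a plain float)"""
    probs = estimator(x_tr, y_tr).squeeze(-1)       # learned data values
    sel = torch.bernoulli(probs)                    # sample a training subset
    reward = train_predictor(x_tr[sel.bool()], y_tr[sel.bool()], x_val, y_val)
    log_prob = (sel * torch.log(probs + 1e-8)
                + (1 - sel) * torch.log(1 - probs + 1e-8)).sum()
    loss = -(reward - baseline) * log_prob          # REINFORCE with baseline
    opt.zero_grad(); loss.backward(); opt.step()
    return reward
```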
Global Optimality Guarantees For Policy Gradient Methods
Title | Global Optimality Guarantees For Policy Gradient Methods |
Authors | Jalaj Bhandari, Daniel Russo |
Abstract | Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by classical techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to local minima. This work identifies structural properties, shared by finite MDPs and several classic control problems, which guarantee that the policy gradient objective function has no suboptimal local minima despite being non-convex. When these assumptions are relaxed, our work gives conditions under which any local minimum is near-optimal, where the error bound depends on a notion of the expressive capacity of the policy class. |
Tasks | Policy Gradient Methods |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01786v1 |
PDF | https://arxiv.org/pdf/1906.01786v1.pdf |
PWC | https://paperswithcode.com/paper/global-optimality-guarantees-for-policy |
Repo | |
Framework | |
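For reference, the non-convex objective in question is the standard expected discounted return, ascended via the policy gradient theorem. The formulas below are textbook statements, not results specific to this paper:

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\Big],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big],
\quad
G_t = \sum_{k=t}^{\infty} \gamma^k\, r(s_k, a_k).
```

The paper's question is when gradient ascent on this non-convex J(θ) nonetheless reaches a global optimum.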
Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots
Title | Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots |
Authors | Kyo Kutsuzawa, Hitoshi Kusano, Ayaka Kume, Shoichiro Yamaguchi |
Abstract | When robots work in a cluttered environment, the constraints on motions change frequently and the required action can change even for the same task. However, planning complex motions by direct calculation risks converging to poor local optima. In addition, machine learning approaches often require relearning for novel situations. In this paper, we propose a method of searching for appropriate motions using conditional Generative Adversarial Networks (cGANs), which can generate motions conditioned on the situation by mimicking training datasets. By training a cGAN on various motions for a task, its latent space becomes filled with valid motions for that task. Appropriate motions can then be found efficiently by searching the latent space of the trained cGAN instead of the motion space, while avoiding poor local optima. We demonstrate that the proposed method successfully works for an object-throwing task to given target positions in both numerical simulation and real-robot experiments. The proposed method achieved three times higher accuracy with 2.5 times faster calculation time than searching the action space directly. |
Tasks | |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03253v1 |
PDF | https://arxiv.org/pdf/1910.03253v1.pdf |
PWC | https://paperswithcode.com/paper/motion-generation-considering-situation-with |
Repo | |
Framework | |
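A minimal sketch of the latent-space search described above, assuming a trained generator G(z, c) and a differentiable forward model f mapping a motion to its landing position; the names and the plain gradient-descent search are assumptions, not the paper's exact procedure.

```python
import torch

def search_latent(G, f, target, z_dim=8, steps=200, lr=0.05):
    """Find a latent code whose generated motion lands at `target`.
    G: trained cGAN generator, motion = G(z, condition)
    f: differentiable forward model, landing position = f(motion)"""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        motion = G(z, target)                 # stays on the learned motion manifold
        loss = ((f(motion) - target) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z, target).detach()
```

Searching over z rather than over raw joint trajectories is what keeps the optimizer inside the set of valid, demonstrated motions.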
Mono-Stixels: Monocular depth reconstruction of dynamic street scenes
Title | Mono-Stixels: Monocular depth reconstruction of dynamic street scenes |
Authors | Fabian Brickwedde, Steffen Abraham, Rudolf Mester |
Abstract | In this paper we present mono-stixels, a compact environment representation specially designed for dynamic street scenes. Mono-stixels are a novel approach to estimating stixels from a monocular camera sequence instead of the traditionally used stereo depth measurements. Our approach jointly infers the depth, motion and semantic information of the dynamic scene as a 1D energy minimization problem based on optical flow estimates, pixel-wise semantic segmentation and camera motion. The optical flow of a stixel is described by a homography. By applying the mono-stixel model, the degrees of freedom of a stixel homography are reduced to at most two. Furthermore, we exploit a scene model and semantic information to handle moving objects. In our experiments we use the publicly available DeepFlow for optical flow estimation and FCN8s for the semantic information as inputs, and we show on the KITTI 2015 dataset that mono-stixels provide a compact and reliable depth reconstruction of both the static and moving parts of the scene. Mono-stixels thereby overcome the restriction of previous structure-from-motion approaches to static scenes. |
Tasks | Optical Flow Estimation, Semantic Segmentation |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02635v1 |
PDF | https://arxiv.org/pdf/1908.02635v1.pdf |
PWC | https://paperswithcode.com/paper/mono-stixels-monocular-depth-reconstruction |
Repo | |
Framework | |
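For context, the homography induced by a 3D plane under camera motion is the standard two-view relation below; constraining the stixel's supporting plane (e.g. fixing its normal direction for vertical or ground surfaces) is what cuts the eight degrees of freedom of a general homography down to the "at most two" mentioned in the abstract. This is textbook multi-view geometry, not the paper's own derivation:

```latex
H = K \Big( R - \frac{t\, n^{\top}}{d} \Big) K^{-1}
```

with camera intrinsics K, relative camera motion (R, t), plane normal n, and plane distance d.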
Subspace Networks for Few-shot Classification
Title | Subspace Networks for Few-shot Classification |
Authors | Arnout Devos, Matthias Grossglauser |
Abstract | We propose subspace networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each class. Subspace networks learn an embedding space in which classification can be performed by computing distances of embedded points to subspace representations of each class. The class subspaces are spanned by examples belonging to the same class, transformed by a learnable embedding function. Similarly to recent approaches for few-shot learning, subspace networks reflect a simple inductive bias that is beneficial in this limited-data regime and they achieve excellent results. In particular, our proposed method shows consistently better performance than other state-of-the-art few-shot distance-metric learning methods when the embedding function is deep or when training and testing domains are shifted. |
Tasks | Few-Shot Learning, Metric Learning |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13613v1 |
PDF | https://arxiv.org/pdf/1905.13613v1.pdf |
PWC | https://paperswithcode.com/paper/subspace-networks-for-few-shot-classification |
Repo | |
Framework | |
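The classification rule above reduces to a projection distance. A minimal sketch, assuming embeddings are already computed; passing the subspace through the class mean and taking the basis from an SVD of the centered support set is one reasonable construction, not necessarily the paper's exact one.

```python
import numpy as np

def subspace_distance(query, support):
    """Distance from an embedded query [d] to the affine subspace through
    the mean of one class's embedded support examples [k, d]."""
    mu = support.mean(axis=0)
    _, _, vt = np.linalg.svd(support - mu, full_matrices=False)
    basis = vt[:len(support) - 1]        # centered set has rank <= k - 1
    diff = query - mu
    proj = basis.T @ (basis @ diff)      # projection onto the class subspace
    return np.linalg.norm(diff - proj)

def classify(query, class_supports):
    """Assign the query to the class with the nearest subspace."""
    return min(class_supports,
               key=lambda c: subspace_distance(query, class_supports[c]))
```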
Supervised and Semi-supervised Deep Probabilistic Models for Indoor Positioning Problems
Title | Supervised and Semi-supervised Deep Probabilistic Models for Indoor Positioning Problems |
Authors | Weizhu Qian, Fabrice Lauri, Franck Gechter |
Abstract | Predicting smartphone users' locations with WiFi fingerprints has been a popular research topic recently. In this work, we propose two novel deep learning-based models: the convolutional mixture density recurrent neural network and a VAE-based semi-supervised learning model. The convolutional mixture density recurrent neural network is designed for path prediction, combining the advantages of convolutional neural networks, recurrent neural networks and mixture density networks. Further, since most real-world datasets are not labeled, we devise the VAE-based model for semi-supervised learning tasks. To test the proposed models, we conduct validation experiments on real-world datasets. The final results verify the effectiveness of our approaches and show their superiority over existing methods. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09906v2 |
PDF | https://arxiv.org/pdf/1911.09906v2.pdf |
PWC | https://paperswithcode.com/paper/supervised-and-semi-supervised-deep-learning |
Repo | |
Framework | |
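The mixture density output mentioned above turns the network's final layer into the parameters of a Gaussian mixture over positions. A minimal sketch of the MDN negative log-likelihood, assuming isotropic 2D Gaussian components; the head layout is illustrative.

```python
import math
import torch
import torch.nn.functional as F

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of targets under a Gaussian mixture.
    pi_logits: [B, K]    mixture weights (pre-softmax)
    mu:        [B, K, 2] component means (2D positions)
    log_sigma: [B, K]    per-component isotropic log std
    y:         [B, 2]    observed positions"""
    log_pi = F.log_softmax(pi_logits, dim=-1)
    d2 = ((y.unsqueeze(1) - mu) ** 2).sum(-1)                  # [B, K]
    # log N(y | mu_k, sigma_k^2 I) in 2D
    log_comp = (-d2 / (2 * torch.exp(2 * log_sigma))
                - 2 * log_sigma - math.log(2 * math.pi))
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()
```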
GP-VAE: Deep Probabilistic Time Series Imputation
Title | GP-VAE: Deep Probabilistic Time Series Imputation |
Authors | Vincent Fortuin, Dmitry Baranchuk, Gunnar Rätsch, Stephan Mandt |
Abstract | Multivariate time series with missing values are common in areas such as healthcare and finance, and have grown in number and complexity over the years. This raises the question whether deep learning methodologies can outperform classical data imputation methods in this domain. However, naive applications of deep learning fall short in giving reliable confidence estimates and lack interpretability. We propose a new deep sequential latent variable model for dimensionality reduction and data imputation. Our modeling assumption is simple and interpretable: the high dimensional time series has a lower-dimensional representation which evolves smoothly in time according to a Gaussian process. The non-linear dimensionality reduction in the presence of missing data is achieved using a VAE approach with a novel structured variational approximation. We demonstrate that our approach outperforms several classical and deep learning-based data imputation methods on high-dimensional data from the domains of computer vision and healthcare, while additionally improving the smoothness of the imputations and providing interpretable uncertainty estimates. |
Tasks | Dimensionality Reduction, Imputation, Multivariate Time Series Imputation, Time Series |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04155v5 |
PDF | https://arxiv.org/pdf/1907.04155v5.pdf |
PWC | https://paperswithcode.com/paper/multivariate-time-series-imputation-with-1 |
Repo | |
Framework | |
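A minimal sketch of the modeling assumption above: latent trajectories drawn from a GP prior over time, here with an RBF kernel (the kernel choice and dimensions are illustrative; the paper also considers other kernels).

```python
import numpy as np

def rbf_kernel(ts, lengthscale=2.0, var=1.0):
    """K[i, j] = var * exp(-(t_i - t_j)^2 / (2 * lengthscale^2))."""
    d = ts[:, None] - ts[None, :]
    return var * np.exp(-d ** 2 / (2 * lengthscale ** 2))

# Sample a smooth latent trajectory: one independent GP per latent dimension.
T, latent_dim = 50, 4
ts = np.arange(T, dtype=float)
K = rbf_kernel(ts) + 1e-6 * np.eye(T)    # jitter for numerical stability
z = np.random.multivariate_normal(np.zeros(T), K, size=latent_dim).T  # [T, latent_dim]
# In GP-VAE, a decoder maps each z_t to the (partially missing) observed
# frame, and this GP acts as the temporal prior that smooths imputations.
```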
High Frame Rate Video Reconstruction based on an Event Camera
Title | High Frame Rate Video Reconstruction based on an Event Camera |
Authors | Liyuan Pan, Richard Hartley, Cedric Scheerlinck, Miaomiao Liu, Xin Yu, Yuchao Dai |
Abstract | Event-based cameras measure intensity changes (called events) with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), event cameras can simultaneously output intensity frames. However, the output images are captured at a relatively low frame rate and often suffer from motion blur. A blurred image can be regarded as the integral of a sequence of latent images, while events indicate changes between the latent images. Thus, we are able to model the blur-generation process by associating event data with a latent sharp image. Based on the abundant event data and the low-frame-rate, easily blurred intensity images, we propose a simple yet effective approach to reconstruct high-quality, high-frame-rate sharp videos. Starting with a single blurred frame and its event data, we propose the Event-based Double Integral (EDI) model and solve it by adding regularization terms. We then extend it to the multiple Event-based Double Integral (mEDI) model to obtain smoother results based on multiple images and their events. Furthermore, we provide a new and more efficient solver to minimize the proposed energy model. By optimizing the energy function, we achieve significant improvements in removing blur and reconstructing high temporal resolution video. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real sequences demonstrate the superiority of our mEDI model and optimization method compared to the state of the art. |
Tasks | Video Generation, Video Reconstruction |
Published | 2019-03-12 |
URL | http://arxiv.org/abs/1903.06531v2 |
PDF | http://arxiv.org/pdf/1903.06531v2.pdf |
PWC | https://paperswithcode.com/paper/bringing-blurry-alive-at-high-frame-rate-with |
Repo | |
Framework | |
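The blur-generation model in the abstract can be written out explicitly. A sketch in standard event-camera notation, consistent with the abstract though not copied from the paper: with L(f) the latent sharp image at reference time f, E(t) the events accumulated between f and t, c the event contrast threshold, and T the exposure time,

```latex
B = \frac{1}{T}\int_{f - T/2}^{f + T/2} L(t)\,dt,
\qquad
L(t) = L(f)\,\exp\!\big(c\,E(t)\big)
\;\;\Longrightarrow\;\;
L(f) = \frac{B}{\dfrac{1}{T}\displaystyle\int_{f - T/2}^{f + T/2} \exp\!\big(c\,E(t)\big)\,dt}.
```

The sharp frame thus follows from the blurred frame B and the event stream up to the single scalar c, which matches the abstract's "non-convex optimization problem in a single scalar variable."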
Interpretable Neural Predictions with Differentiable Binary Variables
Title | Interpretable Neural Predictions with Differentiable Binary Variables |
Authors | Joost Bastings, Wilker Aziz, Ivan Titov |
Abstract | The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification, a rationale, for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and explore further uses in attention mechanisms. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08160v1 |
PDF | https://arxiv.org/pdf/1905.08160v1.pdf |
PWC | https://paperswithcode.com/paper/interpretable-neural-predictions-with |
Repo | |
Framework | |
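The mixed discrete/continuous trick above stretches a (0, 1) random variable past the unit interval and rectifies it, so exact zeros and ones occur with finite probability while gradients still flow. A minimal sketch using the closely related Hard Concrete distribution (the paper itself uses a Hard Kumaraswamy); the stretch limits and temperature are conventional values, and the expected-L0 formula is the Hard Concrete one.

```python
import math
import torch

GAMMA, ZETA, TEMP = -0.1, 1.1, 0.5   # stretch limits and temperature

def hard_concrete_sample(log_alpha):
    """Sample z in [0, 1] with point masses at exactly 0 and 1, keeping a
    reparameterized gradient path through log_alpha."""
    u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / TEMP)
    return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)  # stretch, then rectify

def expected_l0(log_alpha):
    """Differentiable P(z != 0); summing over positions gives the expected
    number of selected words, which can be pushed toward a target rate."""
    return torch.sigmoid(log_alpha - TEMP * math.log(-GAMMA / ZETA))
```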
Classification of Crop Tolerance to Heat and Drought: A Deep Convolutional Neural Networks Approach
Title | Classification of Crop Tolerance to Heat and Drought: A Deep Convolutional Neural Networks Approach |
Authors | Saeed Khaki, Zahra Khalilzadeh, Lizhi Wang |
Abstract | Environmental stresses such as drought and heat can cause substantial yield loss in agriculture. As such, hybrid crops that are tolerant to drought and heat stress would produce more consistent yields compared to the hybrids that are not tolerant to these stresses. In the 2019 Syngenta Crop Challenge, Syngenta released several large datasets that recorded the yield performances of 2,452 corn hybrids planted in 1,560 locations between 2008 and 2017 and asked participants to classify the corn hybrids as either tolerant or susceptible to drought stress, heat stress, and combined drought and heat stress. However, no data was provided that classified any set of hybrids as tolerant or susceptible to any type of stress. In this paper, we present an unsupervised approach to solving this problem, which was recognized as one of the winners in the 2019 Syngenta Crop Challenge. Our results labeled 121 hybrids as drought tolerant, 193 as heat tolerant, and 29 as tolerant to both stresses. |
Tasks | |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00454v5 |
PDF | https://arxiv.org/pdf/1906.00454v5.pdf |
PWC | https://paperswithcode.com/paper/190600454 |
Repo | |
Framework | |
An Empirical Study of Efficient ASR Rescoring with Transformers
Title | An Empirical Study of Efficient ASR Rescoring with Transformers |
Authors | Hongzhao Huang, Fuchun Peng |
Abstract | Neural language models (LMs) have been shown to significantly outperform classical n-gram LMs for language modeling, due to their superior ability to model long-range dependencies in text and to handle data sparsity. Recently, well-configured deep Transformers have exhibited superior performance over shallow stacks of recurrent neural network layers for language modeling. However, these state-of-the-art deep Transformer models were mostly engineered to be deep with high model capacity, which makes them computationally inefficient and challenging to deploy in large-scale real-world applications. Therefore, it is important to develop Transformer LMs that have relatively small model sizes while still retaining the good performance of much larger models. In this paper, we conduct an empirical study on training Transformers with small parameter sizes in the context of ASR rescoring. By combining techniques including subword units, adaptive softmax, large-scale model pre-training, and knowledge distillation, we show that we are able to successfully train small Transformer LMs with significant relative word error rate reductions (WERR) through n-best rescoring. In particular, our experiments on a video speech recognition dataset show that we are able to achieve WERRs ranging from 6.46% to 7.17% with only 5.5% to 11.9% of the parameters of the well-known large GPT model [1], whose WERR with rescoring on the same dataset is 7.58%. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11450v1 |
PDF | https://arxiv.org/pdf/1910.11450v1.pdf |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-efficient-asr-rescoring |
Repo | |
Framework | |
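N-best rescoring itself is simple: re-rank the first-pass ASR hypotheses by interpolating their scores with the Transformer LM score. A minimal sketch; the interpolation weight and score conventions are illustrative.

```python
def rescore_nbest(hypotheses, lm_score, lam=0.5):
    """Pick the best hypothesis after LM rescoring.
    hypotheses: list of (text, first_pass_score) with log-prob scores
    lm_score:   callable returning the LM log-prob of a text (here, the
                small Transformer LM)"""
    def total(hyp):
        text, first_pass = hyp
        return first_pass + lam * lm_score(text)
    return max(hypotheses, key=total)[0]
```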
The General Pair-based Weighting Loss for Deep Metric Learning
Title | The General Pair-based Weighting Loss for Deep Metric Learning |
Authors | Haijun Liu, Jian Cheng, Wen Wang, Yanzhou Su |
Abstract | Deep metric learning aims at learning the distance metric between pairs of samples, using deep neural networks to extract semantic feature embeddings in which similar samples are close to each other while dissimilar samples are farther apart. A large number of loss functions based on pair distances have been presented in the literature to guide the training of deep metric learning. In this paper, we unify them in a general pair-based weighting loss function, where the objective to be minimized is simply a weighted sum of the distances of informative pairs. The general pair-based weighting loss comprises two main aspects: (1) sample mining and (2) pair weighting. Sample mining aims at selecting informative positive and negative pair sets to exploit the structured relationships of samples in a mini-batch and to reduce the number of non-trivial pairs. Pair weighting aims at assigning different weights to different pairs according to their distances, so as to train the network discriminatively. We review in detail the existing pair-based losses in line with our general loss function, and explore some possible methods from the perspective of sample mining and pair weighting. Finally, extensive experiments on three image retrieval datasets show that our general pair-based weighting loss obtains new state-of-the-art performance, demonstrating the effectiveness of pair-based sample mining and pair weighting for deep metric learning. |
Tasks | Image Retrieval, Metric Learning |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12837v1 |
PDF | https://arxiv.org/pdf/1905.12837v1.pdf |
PWC | https://paperswithcode.com/paper/the-general-pair-based-weighting-loss-for |
Repo | |
Framework | |
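The unified loss above has the shape "mine informative pairs, then weight their distances." A minimal sketch with a margin-based miner and exponential weights; both choices are illustrative instances of the general form, not the paper's specific recommendation.

```python
import torch

def general_pair_loss(emb, labels, margin=1.0, alpha=2.0):
    """A sketch of the general pair-based weighting form: mined positive
    pairs are pulled together and mined negative pairs pushed apart, each
    weighted by a function of its distance (harder pair = heavier weight)."""
    d = torch.cdist(emb, emb)                         # pairwise distances
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(len(labels), dtype=torch.bool, device=d.device)
    pos_d = d[same & ~eye]
    neg_d = d[~same]
    # Mining: drop trivial pairs that already satisfy the margin.
    pos_d = pos_d[pos_d > 0.5 * margin]               # positives still far apart
    neg_d = neg_d[neg_d < margin]                     # negatives still too close
    # Weighting: exponential emphasis on informative pairs (weights are
    # treated as constants, hence the detach).
    pos_loss = (torch.exp(alpha * pos_d).detach() * pos_d).sum()
    neg_loss = (torch.exp(-alpha * neg_d).detach() * (margin - neg_d)).sum()
    n = pos_d.numel() + neg_d.numel()
    return (pos_loss + neg_loss) / max(n, 1)
```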