July 26, 2019

Paper Group ANR 758

Semi-Dense Visual Odometry for RGB-D Cameras Using Approximate Nearest Neighbour Fields. Texture segmentation with Fully Convolutional Networks. Using Multi-Label Classification for Improved Question Answering. Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits. Temporal Dynamic Graph LSTM for Action-driven Video Object Detection. …

Semi-Dense Visual Odometry for RGB-D Cameras Using Approximate Nearest Neighbour Fields


Title	Semi-Dense Visual Odometry for RGB-D Cameras Using Approximate Nearest Neighbour Fields
Authors	Yi Zhou, Laurent Kneip, Hongdong Li
Abstract	This paper presents a robust and efficient semi-dense visual odometry solution for RGB-D cameras. The core of our method is a 2D-3D ICP pipeline which estimates the pose of the sensor by registering the projection of a 3D semi-dense map of the reference frame with the 2D semi-dense region extracted in the current frame. The processing is sped up by efficiently implemented approximate nearest neighbour fields under the Euclidean distance criterion, which permits the use of compact Gauss-Newton updates in the optimization. The registration is formulated as a maximum a posteriori problem to deal with outliers and sensor noise, and the equivalent weighted least squares problem is solved by the iteratively reweighted least squares method. A variety of robust weight functions are tested and the optimum is determined based on the characteristics of the sensor model. Extensive evaluation on publicly available RGB-D datasets shows that the proposed method predominantly outperforms existing state-of-the-art methods.
Tasks	Visual Odometry
Published	2017-02-06
URL	http://arxiv.org/abs/1702.02512v1
PDF	http://arxiv.org/pdf/1702.02512v1.pdf
PWC	https://paperswithcode.com/paper/semi-dense-visual-odometry-for-rgb-d-cameras
Repo
Framework
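
The abstract above mentions solving a weighted least squares problem by iteratively reweighted least squares (IRLS) with a robust weight function. The following is a minimal sketch of that general idea on a toy line-fitting problem, not the paper's 2D-3D registration pipeline. The Huber weight, the threshold `delta`, and all function names are assumptions chosen for illustration.

```python
def huber_weight(residual, delta=1.0):
    """Huber weight: 1 inside the threshold, delta/|r| outside it."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= delta else delta / magnitude


def weighted_line_fit(xs, ys, weights):
    """Closed-form weighted least squares fit of y = slope * x + intercept."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def irls_line_fit(xs, ys, iterations=50, delta=1.0):
    """Alternate between a weighted fit and recomputing robust weights."""
    weights = [1.0] * len(xs)  # first pass is ordinary least squares
    slope, intercept = weighted_line_fit(xs, ys, weights)
    for _ in range(iterations):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        weights = [huber_weight(r, delta) for r in residuals]
        slope, intercept = weighted_line_fit(xs, ys, weights)
    return slope, intercept
```

Each pass down-weights points with large residuals, so a single gross outlier pulls the fit far less than it does in the unweighted solution.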

Texture segmentation with Fully Convolutional Networks


Title	Texture segmentation with Fully Convolutional Networks
Authors	Vincent Andrearczyk, Paul F. Whelan
Abstract	In the last decade, deep learning has contributed to advances in a wide range of computer vision tasks including texture analysis. This paper explores a new approach for texture segmentation using deep convolutional neural networks, sharing important ideas with classic filter bank based texture segmentation methods. Several methods are developed to train Fully Convolutional Networks to segment textures in various applications. We show in particular that these networks can learn to recognize and segment a type of texture, e.g. wood and grass, from texture recognition datasets (no training segmentation). We demonstrate that Fully Convolutional Networks can learn from repetitive patterns to segment a particular texture from a single image or even a part of an image. We take advantage of these findings to develop a method that is evaluated on a series of supervised and unsupervised experiments and improves the state of the art on the Prague texture segmentation datasets.
Tasks	Texture Classification
Published	2017-03-15
URL	http://arxiv.org/abs/1703.05230v1
PDF	http://arxiv.org/pdf/1703.05230v1.pdf
PWC	https://paperswithcode.com/paper/texture-segmentation-with-fully-convolutional
Repo
Framework

Using Multi-Label Classification for Improved Question Answering


Title	Using Multi-Label Classification for Improved Question Answering
Authors	Ricardo Usbeck, Michael Hoffmann, Michael Röder, Jens Lehmann, Axel-Cyrille Ngonga Ngomo
Abstract	A plethora of diverse approaches for question answering over RDF data have been developed in recent years. While the accuracy of these systems has increased significantly over time, most systems still focus on particular types of questions or particular challenges in question answering. What is a curse for single systems is a blessing for the combination of these systems. We show in this paper how machine learning techniques can be applied to create a more accurate question answering metasystem by reusing existing systems. In particular, we develop a multi-label classification-based metasystem for question answering over 6 existing systems using an innovative set of 14 question features. The metasystem outperforms the best single system by 14% F-measure on the recent QALD-6 benchmark. Furthermore, we analyzed the influence and correlation of the underlying features on the metasystem quality.
Tasks	Multi-Label Classification, Question Answering
Published	2017-10-24
URL	http://arxiv.org/abs/1710.08634v1
PDF	http://arxiv.org/pdf/1710.08634v1.pdf
PWC	https://paperswithcode.com/paper/using-multi-label-classification-for-improved
Repo
Framework

Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits


Title	Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
Authors	Huasen Wu, Xueying Guo, Xin Liu
Abstract	In this paper, we propose and study opportunistic bandits - a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves $O(\log T)$ regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves $O(1)$ regret with respect to $T$ if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations.
Tasks
Published	2017-09-12
URL	http://arxiv.org/abs/1709.04004v2
PDF	http://arxiv.org/pdf/1709.04004v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-exploration-exploitation-tradeoff
Repo
Framework
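
The intuition in the abstract above (explore more when the load is low, exploit more when it is high) can be illustrated with a standard UCB1 index whose exploration bonus is scaled by the current load. This is only a sketch of the intuition and is not the AdaUCB algorithm from the paper: the scaling rule `1 - load`, the assumption that `load` is normalized to [0, 1], and the function names are all hypothetical.

```python
import math


def ucb_index(mean, pulls, t, explore_scale=1.0):
    """UCB1-style index: empirical mean plus a scaled confidence bonus."""
    return mean + explore_scale * math.sqrt(2.0 * math.log(t) / pulls)


def select_arm(means, pulls, t, load):
    """Pick an arm; `load` in [0, 1] shrinks the exploration bonus.

    The (1 - load) scaling is an illustrative assumption, not the
    rule used by AdaUCB.
    """
    # Pull every arm once before using the index.
    for arm, count in enumerate(pulls):
        if count == 0:
            return arm
    scale = 1.0 - load
    indices = [ucb_index(m, n, t, scale) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)
```

With `load` near 0 the rarely pulled arm wins through its large bonus, while with `load` near 1 the choice reduces to the arm with the best empirical mean.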

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection


Title	Temporal Dynamic Graph LSTM for Action-driven Video Object Detection
Authors	Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-Yan Yeung, Abhinav Gupta
Abstract	In this paper, we investigate a weakly-supervised object detection framework. Most existing frameworks focus on using static images to learn object detectors. However, these detectors often fail to generalize to videos because of the existing domain shift. Therefore, we investigate learning these detectors directly from boring videos of daily activities. Instead of using bounding boxes, we explore the use of action descriptions as supervision since they are relatively easy to gather. A common issue, however, is that objects of interest that are not involved in human actions are often absent from global action descriptions, a problem known as “missing label”. To tackle this problem, we propose a novel temporal dynamic graph Long Short-Term Memory network (TD-Graph LSTM). TD-Graph LSTM enables global temporal reasoning by constructing a dynamic graph that is based on temporal correlations of object proposals and spans the entire video. The missing label issue for each individual frame can thus be significantly alleviated by transferring knowledge across correlated object proposals in the whole video. Extensive evaluations on a large-scale daily-life action dataset (i.e., Charades) demonstrate the superiority of our proposed method. We also release object bounding-box annotations for more than 5,000 frames in Charades. We believe this annotated data can also benefit other research on video-based object recognition in the future.
Tasks	Object Detection, Object Recognition, Video Object Detection, Weakly Supervised Object Detection
Published	2017-08-02
URL	http://arxiv.org/abs/1708.00666v1
PDF	http://arxiv.org/pdf/1708.00666v1.pdf
PWC	https://paperswithcode.com/paper/temporal-dynamic-graph-lstm-for-action-driven
Repo
Framework

Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings


Title	Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
Authors	John Wieting, Kevin Gimpel
Abstract	We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations.
Tasks	Sentence Embeddings, Transfer Learning
Published	2017-04-30
URL	http://arxiv.org/abs/1705.00364v1
PDF	http://arxiv.org/pdf/1705.00364v1.pdf
PWC	https://paperswithcode.com/paper/revisiting-recurrent-networks-for
Repo
Framework
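
One of the developments listed in the abstract above is averaging recurrent states to represent a sequence, rather than relying on a single state. A minimal sketch of that pooling step is shown below, assuming the per-token hidden states have already been computed and are given as plain lists of floats. The function names are illustrative and the recurrent network itself is not modelled.

```python
def average_states(states):
    """Element-wise mean of per-token hidden states (one vector per token)."""
    if not states:
        raise ValueError("need at least one hidden state")
    dim = len(states[0])
    count = len(states)
    return [sum(state[i] for state in states) / count for i in range(dim)]


def last_state(states):
    """Common alternative: use only the final hidden state."""
    return list(states[-1])
```

Averaging lets every token position contribute to the sentence embedding, whereas `last_state` depends on how much earlier context the final state has retained.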

AIXIjs: A Software Demo for General Reinforcement Learning


Title	AIXIjs: A Software Demo for General Reinforcement Learning
Authors	John Aslanides
Abstract	Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.
Tasks
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07615v1
PDF	http://arxiv.org/pdf/1705.07615v1.pdf
PWC	https://paperswithcode.com/paper/aixijs-a-software-demo-for-general
Repo
Framework

What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification


Title	What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification
Authors	Lin Wu, Yang Wang, Xue Li, Junbin Gao
Abstract	Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions within spatial manipulation to perform matching in local correspondence. However, they essentially extract fixed representations from pre-divided regions for each image and then perform matching based on the extracted representation. Models in this pipeline cannot capture the finer local patterns that are crucial for distinguishing positive pairs from negative ones, which makes them underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat the spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network to impose spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03 and Market-1501.
Tasks	Person Re-Identification
Published	2017-07-21
URL	http://arxiv.org/abs/1707.07074v4
PDF	http://arxiv.org/pdf/1707.07074v4.pdf
PWC	https://paperswithcode.com/paper/what-and-where-to-match-deep-spatially
Repo
Framework

Attention based convolutional neural network for predicting RNA-protein binding sites


Title	Attention based convolutional neural network for predicting RNA-protein binding sites
Authors	Xiaoyong Pan, Junchi Yan
Abstract	RNA-binding proteins (RBPs) play crucial roles in many biological processes, e.g. gene regulation. Computational identification of RBP binding sites on RNAs is urgently needed. In particular, RBPs bind to RNAs by recognizing sequence motifs. Thus, quickly locating those motifs on RNA sequences is crucial and time-efficient for determining whether the RNAs interact with the RBPs or not. In this study, we present an attention based convolutional neural network, iDeepA, to predict RNA-protein binding sites from raw RNA sequences. We first encode RNA sequences into one-hot encoding. Next, we design a deep learning model with a convolutional neural network (CNN) and an attention mechanism, which automatically searches for important positions, e.g. binding motifs, to learn discriminant high-level features for predicting RBP binding sites. We evaluate iDeepA on publicly available gold-standard RBP binding sites derived from CLIP-seq data. The results demonstrate that iDeepA achieves comparable performance with other state-of-the-art methods.
Tasks
Published	2017-12-06
URL	http://arxiv.org/abs/1712.02270v1
PDF	http://arxiv.org/pdf/1712.02270v1.pdf
PWC	https://paperswithcode.com/paper/attention-based-convolutional-neural-network-1
Repo
Framework
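
The abstract above states that RNA sequences are first converted to a one-hot encoding. A minimal sketch of such an encoding follows. The base order `ACGU`, the all-zero row for unknown symbols such as `N`, and the function name are assumptions made for illustration and may differ from what iDeepA does.

```python
ALPHABET = "ACGU"  # assumed base order; one column per base


def one_hot_encode(sequence):
    """Return one row per nucleotide with a single 1 in the base's column.

    Symbols outside ALPHABET (for example 'N') become all-zero rows,
    which is an illustrative choice rather than a documented one.
    """
    rows = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        index = ALPHABET.find(base)
        if index >= 0:
            row[index] = 1
        rows.append(row)
    return rows
```

The resulting matrix has shape (sequence length, 4) and can be passed to a convolutional layer that slides filters along the sequence axis.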

Differentially Private Variational Dropout


Title	Differentially Private Variational Dropout
Authors	Beyza Ermis, Ali Taylan Cemgil
Abstract	Deep neural networks with their large number of parameters are highly flexible learning systems. The high flexibility of such networks brings with it some serious problems such as overfitting, and regularization is used to address this problem. A currently popular and effective regularization technique for controlling overfitting is dropout. Often, the large data collections required for neural networks contain sensitive information such as the medical histories of patients, and the privacy of the training data should be protected. In this paper, we modify the recently proposed variational dropout technique, which provided an elegant Bayesian interpretation of dropout, and show that the intrinsic noise in variational dropout can be exploited to obtain a degree of differential privacy. The iterative nature of training neural networks presents a challenge for privacy-preserving estimation since multiple iterations increase the amount of noise added. We overcome this by using a relaxed notion of differential privacy, called concentrated differential privacy, which provides tighter estimates on the overall privacy loss. We demonstrate the accuracy of our privacy-preserving variational dropout algorithm on benchmark datasets.
Tasks
Published	2017-11-30
URL	http://arxiv.org/abs/1712.02629v3
PDF	http://arxiv.org/pdf/1712.02629v3.pdf
PWC	https://paperswithcode.com/paper/differentially-private-variational-dropout
Repo
Framework

A Review of Methodologies for Natural-Language-Facilitated Human-Robot Cooperation


Title	A Review of Methodologies for Natural-Language-Facilitated Human-Robot Cooperation
Authors	Rui Liu, Xiaoli Zhang
Abstract	Natural-language-facilitated human-robot cooperation (NLC) refers to using natural language (NL) to facilitate interactive information sharing and task execution with a common goal constraint between robots and humans. Recently, NLC research has received increasing attention. Typical NLC scenarios include robotic daily assistance, robotic health caregiving, intelligent manufacturing, autonomous navigation, and robotic social companionship. However, a thorough review that reveals the latest methodologies for using NL to facilitate human-robot cooperation is missing. In this review, a comprehensive summary of methodologies for NLC is presented. NLC research has three main focuses: NL instruction understanding, NL-based execution plan generation, and knowledge-world mapping. In-depth analyses of theoretical methods, applications, and model advantages and disadvantages are made. Based on our paper review and perspective, potential research directions of NLC are summarized.
Tasks	Autonomous Navigation
Published	2017-01-30
URL	http://arxiv.org/abs/1701.08756v3
PDF	http://arxiv.org/pdf/1701.08756v3.pdf
PWC	https://paperswithcode.com/paper/a-review-of-methodologies-for-natural
Repo
Framework

Solutions of Quadratic First-Order ODEs applied to Computer Vision Problems


Title	Solutions of Quadratic First-Order ODEs applied to Computer Vision Problems
Authors	David Casillas-Perez, Daniel Pizarro, Manuel Mazo, Adrien Bartoli
Abstract	This article studies the existence and uniqueness of solutions of a specific quadratic first-order ODE that frequently appears in multiple reconstruction problems. It is called the planar-perspective equation due to its duality with the geometric problem of reconstructing planar-perspective curves from their modulus. Because of this geometric interpretation, solutions of the planar-perspective equation are related to planar curves parametrized with a perspective parametrization. The article proves the existence of only two local solutions to the initial value problem with regular initial conditions and a maximum of two analytic solutions with critical initial conditions. The article also gives theorems to extend the local definition domain where the existence of both solutions is guaranteed. It introduces the maximal depth function as a function that upper-bounds all possible solutions of the planar-perspective equation and contains all its possible critical points. Finally, the article describes the maximal-depth solution problem, which consists of finding the solution of the equation that has maximum depth, and proves its uniqueness. It is an important problem because it does not need initial conditions to obtain the unique solution, and it is the solution that state-of-the-art practical algorithms frequently give.
Tasks
Published	2017-10-11
URL	http://arxiv.org/abs/1710.04265v3
PDF	http://arxiv.org/pdf/1710.04265v3.pdf
PWC	https://paperswithcode.com/paper/solutions-of-quadratic-first-order-odes
Repo
Framework

Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese


Title	Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese
Authors	Yuanzhi Ke, Masafumi Hagiwara
Abstract	The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models for such languages huge. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with a 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
Tasks	Sentiment Analysis
Published	2017-08-10
URL	http://arxiv.org/abs/1708.03312v1
PDF	http://arxiv.org/pdf/1708.03312v1.pdf
PWC	https://paperswithcode.com/paper/radical-level-ideograph-encoder-for-rnn-based
Repo
Framework

ActivityNet Challenge 2017 Summary


Title	ActivityNet Challenge 2017 Summary
Authors	Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Khrisna, Victor Escorcia, Kenji Hata, Shyamal Buch
Abstract	The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary: results and challenge participants' papers.
Tasks	Activity Recognition
Published	2017-10-22
URL	http://arxiv.org/abs/1710.08011v1
PDF	http://arxiv.org/pdf/1710.08011v1.pdf
PWC	https://paperswithcode.com/paper/activitynet-challenge-2017-summary
Repo
Framework

Detecting Hands in Egocentric Videos: Towards Action Recognition


Title	Detecting Hands in Egocentric Videos: Towards Action Recognition
Authors	Alejandro Cartas, Mariella Dimiccoli, Petia Radeva
Abstract	Recently, there has been a growing interest in analyzing human daily activities from data collected by wearable cameras. Since the hands are involved in a vast set of daily tasks, detecting hands in egocentric images is an important step towards the recognition of a variety of egocentric actions. However, besides extreme illumination changes in egocentric images, hand detection is not a trivial task because of the intrinsic large variability of hand appearance. We propose a hand detector that exploits skin modeling for fast hand proposal generation and Convolutional Neural Networks for hand recognition. We tested our method on the UNIGE-HANDS dataset and showed that the proposed approach achieves competitive hand detection results.
Tasks	Temporal Action Localization
Published	2017-09-08
URL	http://arxiv.org/abs/1709.02780v1
PDF	http://arxiv.org/pdf/1709.02780v1.pdf
PWC	https://paperswithcode.com/paper/detecting-hands-in-egocentric-videos-towards
Repo
Framework