February 1, 2020

3271 words 16 mins read

Paper Group AWR 208

Augmentation for small object detection. ID-aware Quality for Set-based Person Re-identification. Similarity-DT: Kernel Similarity Embedding for Dynamic Texture Synthesis. Semantics-Aligned Representation Learning for Person Re-identification. MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation. PRECOG: PREdiction Conditioned On …

Augmentation for small object detection

Title Augmentation for small object detection
Authors Mate Kisantal, Zbigniew Wojna, Jakub Murawski, Jacek Naruniec, Kyunghyun Cho
Abstract In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap between the detection performance on small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) small objects do not appear often enough even within the images that contain them. We thus propose to oversample images with small objects and to augment each of those images by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies and ultimately achieve a 9.7% relative improvement on instance segmentation and a 7.1% relative improvement on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation, Small Object Detection
Published 2019-02-19
URL http://arxiv.org/abs/1902.07296v1
PDF http://arxiv.org/pdf/1902.07296v1.pdf
PWC https://paperswithcode.com/paper/augmentation-for-small-object-detection
Repo https://github.com/siddhanthaldar/PyTorch_Object_Detection
Framework pytorch
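
The core augmentation is simple enough to sketch. Below is a minimal, illustrative NumPy version of the copy-paste idea (not the authors' code): small objects are cut out via their ground-truth masks and pasted back into the same image at random locations. The paper additionally ensures pasted objects do not overlap existing ones; this sketch omits that check.

```python
import numpy as np

def copy_paste_small_objects(image, masks, num_copies=2, rng=None):
    """image: HxWx3 uint8 array; masks: list of HxW bool arrays, one per small object."""
    rng = rng or np.random.default_rng()
    aug = image.copy()
    new_masks = []
    h, w = image.shape[:2]
    for mask in masks:
        ys, xs = np.nonzero(mask)
        if len(ys) == 0:
            continue
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        patch, patch_mask = image[y0:y1, x0:x1], mask[y0:y1, x0:x1]
        ph, pw = patch.shape[:2]
        if ph >= h or pw >= w:
            continue
        for _ in range(num_copies):
            ty, tx = rng.integers(0, h - ph), rng.integers(0, w - pw)
            region = aug[ty:ty + ph, tx:tx + pw]
            region[patch_mask] = patch[patch_mask]  # paste only the object's pixels
            m = np.zeros((h, w), dtype=bool)
            m[ty:ty + ph, tx:tx + pw] = patch_mask
            new_masks.append(m)
    return aug, masks + new_masks  # augmented image plus original and pasted masks
```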

ID-aware Quality for Set-based Person Re-identification

Title ID-aware Quality for Set-based Person Re-identification
Authors Xinshao Wang, Elyor Kodirov, Yang Hua, Neil M. Robertson
Abstract Set-based person re-identification (SReID) is a matching problem that aims to verify whether two sets are of the same identity (ID). Existing SReID models typically generate a feature representation per image and aggregate them to represent the set as a single embedding. However, they can easily be perturbed by noise (perceptually or semantically low-quality images), which is inevitable due to imperfect tracking/detection systems, or they can overfit to trivial images. In this work, we present a novel and simple solution to this problem based on ID-aware quality, which measures the perceptual and semantic quality of images guided by their ID information. Specifically, we propose an ID-aware Embedding that consists of two key components: (1) Feature learning attention, which learns robust image embeddings by focusing on ‘medium-hard’ images. This way it prevents overfitting to trivial images and alleviates the influence of outliers. (2) Feature fusion attention, which fuses the image embeddings in the set to obtain the set-level embedding. It ignores noisy information and pays more attention to discriminative images, aggregating more discriminative information. Experimental results on four datasets show that our method outperforms state-of-the-art approaches despite its simplicity.
Tasks Person Re-Identification
Published 2019-11-20
URL https://arxiv.org/abs/1911.09143v1
PDF https://arxiv.org/pdf/1911.09143v1.pdf
PWC https://paperswithcode.com/paper/id-aware-quality-for-set-based-person-re
Repo https://github.com/XinshaoAmosWang/OSM_CAA_WeightedContrastiveLoss
Framework none
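
As a rough illustration of component (2), here is a generic attention-based set pooling in PyTorch. In the paper the quality scores are learned under ID supervision; here they are taken as given, so this sketches only the aggregation step.

```python
import torch

def fuse_set_embeddings(embeddings, quality_logits):
    """embeddings: (N, D) per-image features; quality_logits: (N,) quality scores.
    Returns a single (D,) set-level embedding weighted toward high-quality images."""
    weights = torch.softmax(quality_logits, dim=0)     # attention over the set
    return (weights.unsqueeze(1) * embeddings).sum(0)  # weighted aggregation
```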

Similarity-DT: Kernel Similarity Embedding for Dynamic Texture Synthesis

Title Similarity-DT: Kernel Similarity Embedding for Dynamic Texture Synthesis
Authors Shiming Chen, Peng Zhang, Xinge You, Qinmu Peng, Xin Liu, Zehong Cao, Dacheng Tao
Abstract Dynamic texture (DT) exhibits statistical stationarity in the spatial domain and stochastic repetitiveness in the temporal dimension, indicating that different frames of a DT are highly correlated. However, existing DT synthesis methods do not exploit this similarity prior in their representation of DT, even though it can explicitly capture the homogeneous and heterogeneous correlation between different frames. In this paper, we propose a novel DT synthesis method (named Similarity-DT), which embeds the similarity prior into the representation of DT. Specifically, we first state two hypotheses: the content of texture video frames varies over time, and temporally closer frames should be more similar; and the transition from frame to frame can be modeled as a linear or nonlinear function that captures the similarity correlation. Our proposed Similarity-DT then integrates kernel learning and the extreme learning machine (ELM) into a unified synthesis model that learns a kernel similarity embedding to represent the spatio-temporal transition between frames of a DT. Extensive experiments on DT videos collected from the internet and on two benchmark datasets, i.e., Gatech Graphcut Textures and Dyntex, demonstrate that the learned kernel similarity embedding provides a discriminative representation of DTs. Hence our method is capable of preserving the long-term temporal continuity of the synthesized DT sequences, with excellent sustainability and generalization. We also show that our method generates realistic DT videos at high speed and low computational cost compared with state-of-the-art approaches.
Tasks Texture Synthesis
Published 2019-11-11
URL https://arxiv.org/abs/1911.04254v2
PDF https://arxiv.org/pdf/1911.04254v2.pdf
PWC https://paperswithcode.com/paper/similarity-dt-kernel-similarity-embedding-for
Repo https://github.com/shiming-chen/Similariy-DT
Framework none
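
The similarity prior (temporally close frames are highly correlated) can be made concrete with a kernel similarity matrix over frames. The sketch below is illustrative only; the paper goes further and learns a kernel similarity embedding within an ELM-based synthesis model, which this does not reproduce.

```python
import numpy as np

def frame_kernel_similarity(frames, gamma=1e-4):
    """frames: (T, H, W) grayscale video; returns the T x T RBF similarity matrix."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    sq_norms = (X ** 2).sum(1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative residues
```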

Semantics-Aligned Representation Learning for Person Re-identification

Title Semantics-Aligned Representation Learning for Person Re-identification
Authors Xin Jin, Cuiling Lan, Wenjun Zeng, Guoqiang Wei, Zhibo Chen
Abstract Person re-identification (reID) aims to match person images in order to retrieve the ones with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, the incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn semantics-aligned feature representations through carefully designed supervision. Specifically, we build a Semantics Aligning Network (SAN), which consists of a base network as encoder (SA-Enc) for reID and a decoder (SA-Dec) for reconstructing/regressing the densely semantically aligned full texture image. We jointly train the SAN under the supervision of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as perceptual losses. The decoder is discarded at inference, so our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve state-of-the-art performance on the benchmark datasets CUHK03, Market1501, MSMT17, and the partial person reID dataset Partial REID. Code for our proposed method is available at: https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification.
Tasks Person Re-Identification, Representation Learning, Texture Synthesis
Published 2019-05-30
URL https://arxiv.org/abs/1905.13143v3
PDF https://arxiv.org/pdf/1905.13143v3.pdf
PWC https://paperswithcode.com/paper/semantics-aligned-representation-learning-for
Repo https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification
Framework pytorch
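
The abstract names three supervision signals: the reID loss on the encoder, the texture reconstruction loss at the decoder, and triplet constraints over decoder feature maps. A hypothetical PyTorch combination is below; the loss weights w_rec and w_tri are illustrative assumptions, not the paper's values.

```python
import torch.nn.functional as F

def san_loss(logits, labels, recon, target_texture,
             anchor, positive, negative, w_rec=1.0, w_tri=0.1):
    """Joint objective sketch for the Semantics Aligning Network (weights assumed)."""
    loss_id = F.cross_entropy(logits, labels)                     # re-identification
    loss_rec = F.l1_loss(recon, target_texture)                   # aligned texture regression
    loss_tri = F.triplet_margin_loss(anchor, positive, negative)  # perceptual triplet constraint
    return loss_id + w_rec * loss_rec + w_tri * loss_tri
```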

MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation

Title MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation
Authors Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou
Abstract We address the problem of semi-supervised video object segmentation (VOS), where the masks of the objects of interest are given in the first frame of an input video. To deal with challenging cases where objects are occluded or missing, previous work relies on greedy data association strategies that make decisions for each frame individually. In this paper, we propose a novel approach that defers the decision making for a target object in each frame until a global view can be established over the entire video. Our approach is in the same spirit as Multiple Hypotheses Tracking (MHT) methods, with several critical adaptations for the VOS problem. We employ bounding box (bbox) hypotheses to form the tracking tree, and multiple hypotheses are spawned by propagating the preceding bbox into the detected bbox proposals within a gated region, starting from the initial object mask in the first frame. The gated region is determined by a gating scheme that takes into account a more comprehensive motion model than the simple Kalman filtering model in traditional MHT. To further tailor the algorithm to VOS, we develop a novel mask propagation score to replace the appearance similarity score, which can be brittle under large deformations. The mask propagation score, together with the motion score, determines the affinity between hypotheses during tree pruning. Finally, a novel mask merging strategy handles mask conflicts between objects. Extensive experiments on challenging datasets demonstrate the effectiveness of the proposed method, especially when objects go missing.
Tasks Decision Making, Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-04-17
URL http://arxiv.org/abs/1904.08141v1
PDF http://arxiv.org/pdf/1904.08141v1.pdf
PWC https://paperswithcode.com/paper/190408141
Repo https://github.com/shuangjiexu/MHP-VOS
Framework pytorch
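
To make the gating step concrete, here is a simple IoU-based gate in the spirit of classical MHT. The paper replaces this naive check with a more comprehensive motion model, so treat this purely as an illustration of restricting proposals to a region around the propagated box.

```python
def box_iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2) tuples."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def gate_proposals(propagated_box, proposals, iou_thresh=0.3):
    """Keep only detected bbox proposals near the propagated bbox; each survivor
    spawns a new hypothesis branch in the tracking tree."""
    return [p for p in proposals if box_iou(propagated_box, p) >= iou_thresh]
```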

PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings

Title PRECOG: PREdiction Conditioned On Goals in Visual Multi-Agent Settings
Authors Nicholas Rhinehart, Rowan McAllister, Kris Kitani, Sergey Levine
Abstract For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. We perform both standard forecasting and the novel task of conditional forecasting, which reasons about how all agents will likely respond to the goal of a controlled agent (here, the AV). We train models on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios than the existing state of the art. Beyond its general ability to perform conditional forecasting queries, we show that our model’s predictions of all agents improve when conditioned on knowledge of the AV’s goal, further illustrating its capability to model agent interactions.
Tasks Autonomous Vehicles
Published 2019-05-03
URL https://arxiv.org/abs/1905.01296v3
PDF https://arxiv.org/pdf/1905.01296v3.pdf
PWC https://paperswithcode.com/paper/precog-prediction-conditioned-on-goals-in
Repo https://github.com/nrhine1/precog_carla_dataset
Framework none
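
The difference between standard and conditional forecasting can be sketched schematically: sample latent intentions for all agents, but pin the controlled agent's latent to the one implied by its goal. Everything below (rollout_fn, the latent shapes, agent index 0 for the AV) is an assumption for illustration, not the paper's model.

```python
import numpy as np

def conditional_forecast(rollout_fn, n_agents, latent_dim, av_latent,
                         n_samples=10, rng=None):
    """rollout_fn maps an (n_agents, latent_dim) latent array to a joint trajectory.
    Standard forecasting samples all rows; conditional forecasting fixes row 0."""
    rng = rng or np.random.default_rng()
    samples = []
    for _ in range(n_samples):
        z = rng.standard_normal((n_agents, latent_dim))
        z[0] = av_latent  # condition the joint rollout on the AV's goal-directed plan
        samples.append(rollout_fn(z))
    return samples
```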

Joint Forward-Backward Visual Odometry for Stereo Cameras

Title Joint Forward-Backward Visual Odometry for Stereo Cameras
Authors Raghav Sardana, Rahul Kottath, Vinod Karar, Shashi Poddar
Abstract Visual odometry is a widely used technique in robotics and automation for keeping track of a robot’s location using visual cues alone. In this paper, we propose a joint forward-backward visual odometry framework that combines the forward and backward motion estimates from stereo cameras. The basic framework of LIBVISO2 is used here for pose estimation, as it can run in real time on standard CPUs. The complementary nature of the errors in the forward and backward modes of visual odometry helps provide a refined motion estimate when these individual estimates are combined. In addition, two reliability measures, the forward-backward relative pose error and the forward-backward absolute pose error, are proposed for evaluating visual odometry frameworks on their own, without the need for any ground-truth data. The proposed scheme is evaluated on the KITTI visual odometry dataset. The experimental results demonstrate improved accuracy over the traditional odometry pipeline without much increase in computational overhead.
Tasks Motion Estimation, Pose Estimation, Visual Odometry
Published 2019-12-21
URL https://arxiv.org/abs/1912.10293v1
PDF https://arxiv.org/pdf/1912.10293v1.pdf
PWC https://paperswithcode.com/paper/joint-forward-backward-visual-odometry-for
Repo https://github.com/raghavsardana/raghavsardana.github.io
Framework none
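
The forward-backward consistency idea admits a compact check: composing the forward pose estimate with the backward one should yield the identity transform, and any residual rotation or translation is an error signal that requires no ground truth. The paper's exact error definitions may differ; this is one plausible reading.

```python
import numpy as np

def fb_relative_pose_error(T_fwd, T_bwd):
    """T_fwd: 4x4 pose from frame k to k+1 (forward pass);
    T_bwd: 4x4 pose from frame k+1 to k (backward pass).
    Perfect estimates satisfy T_fwd @ T_bwd == identity."""
    E = T_fwd @ T_bwd
    trans_err = np.linalg.norm(E[:3, 3])                                # translation residual
    rot_err = np.arccos(np.clip((np.trace(E[:3, :3]) - 1) / 2, -1, 1))  # rotation angle, radians
    return trans_err, rot_err
```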

XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera

Title XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
Authors Dushyant Mehta, Oleksandr Sotnychenko, Franziska Mueller, Weipeng Xu, Mohamed Elgharib, Pascal Fua, Hans-Peter Seidel, Helge Rhodin, Gerard Pons-Moll, Christian Theobalt
Abstract We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions, both by other people and by objects. Our method operates in successive stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long- and short-range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the two and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work, which did not extract global body positions or joint angles of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input, while achieving state-of-the-art accuracy, which we demonstrate on a range of challenging real-world scenes.
Tasks 3D Human Pose Estimation, Motion Capture, Pose Estimation
Published 2019-07-01
URL https://arxiv.org/abs/1907.00837v1
PDF https://arxiv.org/pdf/1907.00837v1.pdf
PWC https://paperswithcode.com/paper/xnect-real-time-multi-person-3d-human-pose
Repo https://github.com/Daniil-Osokin/lightweight-human-pose-estimation-3d-demo.pytorch
Framework pytorch
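
The second stage is concrete enough to sketch: a small fully-connected network maps the per-subject 2D/3D pose features, with occluded joints zeroed out, to a complete 3D pose. The dimensions and zero-filling convention below are illustrative assumptions, not the paper's exact design.

```python
import torch.nn as nn

class PoseCompletionNet(nn.Module):
    """Stage-2 sketch: per-subject feature vector -> complete 3D pose (J joints)."""
    def __init__(self, feat_dim=512, n_joints=21, hidden=256):
        super().__init__()
        self.n_joints = n_joints
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joints * 3),  # (x, y, z) per joint
        )

    def forward(self, features, visible_mask=None):
        if visible_mask is not None:          # occluded joints contribute zeros
            features = features * visible_mask
        return self.net(features).view(-1, self.n_joints, 3)
```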

Towards Safety Verification of Direct Perception Neural Networks

Title Towards Safety Verification of Direct Perception Neural Networks
Authors Chih-Hong Cheng, Chung-Hao Huang, Thomas Brunner, Vahid Hashemi
Abstract We study the problem of safety verification of direct perception neural networks, where camera images are used as inputs to produce high-level features for autonomous vehicles to make control decisions. Formal verification of direct perception neural networks is extremely challenging, as it is difficult to formulate the specification, which requires characterizing the input as constraints, while the number of neurons in such a network can reach millions. We approach the specification problem by learning an input property characterizer that carefully extends a direct perception neural network at close-to-output layers, and we address the scalability problem with a novel assume-guarantee based verification approach. The presented workflow is used to understand a direct perception neural network (developed by Audi) that computes the next waypoint and orientation for autonomous vehicles to follow.
Tasks Autonomous Vehicles
Published 2019-04-09
URL https://arxiv.org/abs/1904.04706v2
PDF https://arxiv.org/pdf/1904.04706v2.pdf
PWC https://paperswithcode.com/paper/towards-safety-verification-of-direct
Repo https://github.com/dependable-ai/nn-dependability-kit
Framework tf
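
One reading of "learning an input property characterizer which carefully extends the network at close-to-output layers" is an auxiliary head over late-layer features that predicts whether the input lies in the operational domain; the assume-guarantee argument then only has to verify the control output on inputs the characterizer accepts. The sketch below is an interpretation, not the paper's construction.

```python
import torch
import torch.nn as nn

class CharacterizedPerceptionNet(nn.Module):
    """Direct perception backbone plus a property head on late-layer features."""
    def __init__(self, backbone: nn.Module, feat_dim: int, out_dim: int):
        super().__init__()
        self.backbone = backbone                  # image -> late-layer features
        self.head = nn.Linear(feat_dim, out_dim)  # features -> waypoint/orientation
        self.prop_head = nn.Linear(feat_dim, 1)   # features -> "is the input in-domain?"

    def forward(self, x):
        feats = self.backbone(x)
        in_domain = torch.sigmoid(self.prop_head(feats))
        return self.head(feats), in_domain  # verify the control head only where in_domain holds
```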

Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change

Title Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
Authors Haim Dubossarsky, Simon Hengchen, Nina Tahmasebi, Dominik Schlechtweg
Abstract State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as on a manually annotated test set. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01688v1
PDF https://arxiv.org/pdf/1906.01688v1.pdf
PWC https://paperswithcode.com/paper/time-out-temporal-referencing-for-robust
Repo https://github.com/Garrafao/TemporalReferencing
Framework none
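
Temporal Referencing itself is a preprocessing trick: target words are tagged with their time period while context words are left untouched, so a single skip-gram model trained over all periods places every period's target vectors in one shared space, and no post-hoc alignment is needed. A minimal sketch (the function name and tag format are illustrative):

```python
def temporal_reference(tokens, targets, period):
    """Tag target words with their period, e.g. 'bank' -> 'bank_1990'.
    Context words stay untagged, so all periods share one embedding space."""
    return [f"{t}_{period}" if t in targets else t for t in tokens]

# Usage sketch: tag each diachronic slice, train one SGNS model on the union of
# slices, then measure change as the cosine distance between, e.g., the vectors
# of 'bank_1990' and 'bank_2010'.
sentence = ["the", "bank", "raised", "rates"]
print(temporal_reference(sentence, {"bank"}, 1990))  # ['the', 'bank_1990', ...]
```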

Separating value functions across time-scales

Title Separating value functions across time-scales
Authors Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier
Abstract In many finite-horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return; in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet it may be mathematically difficult (or even intractable) to learn with this target. As such, temporal discounting is often applied to optimize over a shorter effective planning horizon. This comes at the risk of biasing the optimization target away from the undiscounted goal. In settings where this bias is unacceptable, because the system must optimize for longer horizons at higher discounts, the target of the value function approximator may increase in variance, leading to difficulties in learning. We present an extension of temporal difference (TD) learning, which we call TD($\Delta$), that breaks a value function down into a series of components based on the differences between value functions with smaller discount factors. Separating a longer-horizon value function into these components has useful properties for scalability and performance. We discuss these properties and show theoretical and empirical improvements over standard TD learning in certain settings.
Tasks
Published 2019-02-05
URL https://arxiv.org/abs/1902.01883v3
PDF https://arxiv.org/pdf/1902.01883v3.pdf
PWC https://paperswithcode.com/paper/separating-value-functions-across-time-scales
Repo https://github.com/facebookresearch/td-delta
Framework pytorch
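
The decomposition is easy to state in the tabular case: with increasing discounts gamma_0 < ... < gamma_Z, define W_0 = V_{gamma_0} and W_z = V_{gamma_z} - V_{gamma_{z-1}}, so each value function is a running sum of components. Subtracting the two standard Bellman equations gives each W_z its own recursion, which the sketch below implements; the step size and the tabular setting are simplifying assumptions.

```python
import numpy as np

def td_delta_update(W, s, r, s_next, gammas, alpha=0.1):
    """W: (Z+1, n_states) table of delta components; gammas: increasing discounts.
    V_{gamma_z}(s) is recovered as W[:z+1, s].sum()."""
    # z = 0: ordinary TD(0) under the smallest discount
    W[0, s] += alpha * (r + gammas[0] * W[0, s_next] - W[0, s])
    for z in range(1, len(gammas)):
        V_prev = W[:z, s_next].sum()  # value at s_next under discount gammas[z-1]
        target = (gammas[z] - gammas[z - 1]) * V_prev + gammas[z] * W[z, s_next]
        W[z, s] += alpha * (target - W[z, s])
    return W
```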

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

Title Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
Authors Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
Abstract Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy gradient method. This framework is derived by leveraging, via a deterministic option policy, the analogy between a monolithic policy in standard RL and a hierarchical policy in HRL. Experimental results indicate that our HRL approach can learn a diverse set of options and that it can enhance the performance of RL in continuous control tasks.
Tasks Continuous Control, Hierarchical Reinforcement Learning
Published 2019-01-05
URL http://arxiv.org/abs/1901.01365v2
PDF http://arxiv.org/pdf/1901.01365v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-reinforcement-learning-via
Repo https://github.com/TakaOsa/adInfoHRL
Framework tf
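
"Advantage-weighted importance sampling" can be illustrated generically: when estimating the mutual-information objective, weight state-action samples by an increasing function of their advantage so the learned latent concentrates on the modes of the advantage function. The exponential weighting below is one plausible choice, not necessarily the paper's exact estimator.

```python
import numpy as np

def advantage_weights(advantages, beta=1.0):
    """advantages: (N,) advantage estimates for sampled state-action pairs.
    Returns normalized sampling weights favouring high-advantage samples."""
    w = np.exp(beta * (advantages - advantages.max()))  # max-shift for stability
    return w / w.sum()
```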

The autofeat Python Library for Automated Feature Engineering and Selection

Title The autofeat Python Library for Automated Feature Engineering and Selection
Authors Franziska Horn, Robert Pack, Michael Rieger
Abstract This paper describes the autofeat Python library, which provides scikit-learn style linear regression and classification models with automated feature engineering and selection capabilities. Complex non-linear machine learning models, such as neural networks, are in practice often difficult to train and even harder to explain to non-statisticians, who require transparent analysis results as a basis for important business decisions. While linear models are efficient and intuitive, they generally provide lower prediction accuracies. Our library provides a multi-step feature engineering and selection process, where first a large pool of non-linear features is generated, from which then a small and robust set of meaningful features is selected, which improve the prediction accuracy of a linear model while retaining its interpretability.
Tasks Automated Feature Engineering, Feature Engineering
Published 2019-01-22
URL https://arxiv.org/abs/1901.07329v4
PDF https://arxiv.org/pdf/1901.07329v4.pdf
PWC https://paperswithcode.com/paper/the-autofeat-python-library-for-automatic
Repo https://github.com/cod3licious/autofeat
Framework none
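
The library exposes scikit-learn-style estimators, so typical usage is a single fit_transform that generates the non-linear feature pool, selects a robust subset, and fits a linear model on it. A short sketch based on the documented interface (verify parameter names against the repo):

```python
import numpy as np
from autofeat import AutoFeatRegressor

X = np.random.rand(200, 3)
y = 2 * X[:, 0] ** 2 + np.log1p(X[:, 1]) + 0.01 * np.random.randn(200)

model = AutoFeatRegressor(feateng_steps=2)  # rounds of non-linear feature generation
X_feat = model.fit_transform(X, y)          # engineered + selected features
print(X_feat.shape, model.score(X, y))      # feature matrix shape and linear-model R^2
```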

Training Generative Networks with general Optimal Transport distances

Title Training Generative Networks with general Optimal Transport distances
Authors Vaios Laschos, Jan Tinapp, Klaus Obermayer
Abstract We propose a new algorithm that uses an auxiliary neural network to calculate the transport distance between two data distributions and to extract an optimal transport map. We then use this map to train generative networks. Unlike WGANs, where the Euclidean distance is implicitly used, this new method allows any transportation cost function to be chosen to match the problem at hand. In particular, it allows using the squared distance as the transportation cost, giving rise to the Wasserstein-2 metric for probability distributions, which has rich geometric properties that result in fast and stable gradient descent. It also allows image-centered distances, such as the Structural Similarity index, with notable differences in the results.
Tasks
Published 2019-10-01
URL https://arxiv.org/abs/1910.00535v1
PDF https://arxiv.org/pdf/1910.00535v1.pdf
PWC https://paperswithcode.com/paper/training-generative-networks-with-general
Repo https://github.com/artnoage/Optimal-Transport-GAN
Framework tf
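
For intuition, the general-cost transport distance between two equal-size minibatches can be solved exactly as an assignment problem. The paper's contribution is learning the distance and map with an auxiliary network so the approach scales; the small discrete version below only shows that the cost function is a free choice (the squared distance recovers Wasserstein-2 behaviour).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def batch_transport_distance(real, fake, cost=lambda x, y: ((x - y) ** 2).sum()):
    """real, fake: (N, D) arrays of samples; returns the optimal transport cost
    under `cost` and the matching that attains it."""
    C = np.array([[cost(x, y) for y in fake] for x in real])
    rows, cols = linear_sum_assignment(C)  # exact discrete OT for uniform weights
    return C[rows, cols].mean(), cols
```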

Deep Q-Learning for Nash Equilibria: Nash-DQN

Title Deep Q-Learning for Nash Equilibria: Nash-DQN
Authors Philippe Casgrain, Brian Ning, Sebastian Jaimungal
Abstract Model-free learning for multi-agent stochastic games is an active area of research. Existing reinforcement learning algorithms, however, are often restricted to zero-sum games, and are applicable only in small state-action spaces or other simplified settings. Here, we develop a new data-efficient deep Q-learning methodology for model-free learning of Nash equilibria in general-sum stochastic games. The algorithm uses a local linear-quadratic expansion of the stochastic game, which leads to analytically solvable optimal actions. The expansion is parametrized by deep neural networks to give it sufficient flexibility to learn the environment without the need to experience all state-action pairs. We study symmetry properties of the algorithm stemming from label-invariant stochastic games and, as a proof of concept, apply our algorithm to learning optimal trading strategies in competitive electronic markets.
Tasks Q-Learning
Published 2019-04-23
URL http://arxiv.org/abs/1904.10554v1
PDF http://arxiv.org/pdf/1904.10554v1.pdf
PWC https://paperswithcode.com/paper/deep-q-learning-for-nash-equilibria-nash-dqn
Repo https://github.com/p-casgrain/Nash-DQN
Framework pytorch
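
The local linear-quadratic expansion makes the maximizer of Q analytic, which is what lets equilibrium actions be computed in closed form. Below is a single-agent, NAF-style sketch of such a parametrization in PyTorch, included as an illustration only; the paper's multi-agent expansion couples the agents' quadratic terms to yield Nash equilibria rather than simple maxima.

```python
import torch
import torch.nn as nn

class QuadraticQ(nn.Module):
    """Q(s, a) = V(s) - (a - mu(s))^T P(s) (a - mu(s)), so argmax_a Q = mu(s)."""
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.a_dim = a_dim
        self.body = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU())
        self.v = nn.Linear(hidden, 1)              # state value
        self.mu = nn.Linear(hidden, a_dim)         # analytically optimal action
        self.l = nn.Linear(hidden, a_dim * a_dim)  # factor of the curvature matrix

    def forward(self, s, a):
        h = self.body(s)
        L = torch.tril(self.l(h).view(-1, self.a_dim, self.a_dim))
        P = L @ L.transpose(1, 2)                  # positive semi-definite curvature
        d = (a - self.mu(h)).unsqueeze(-1)
        quad = (d.transpose(1, 2) @ P @ d).squeeze(-1).squeeze(-1)
        return self.v(h).squeeze(-1) - quad
```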