July 29, 2019

3093 words 15 mins read

Paper Group AWR 173

RON: Reverse Connection with Objectness Prior Networks for Object Detection. Recent Trends in Deep Learning Based Natural Language Processing. Causal Effect Inference with Deep Latent-Variable Models. Joint Maximum Purity Forest with Application to Image Super-Resolution. Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outpu …

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Title RON: Reverse Connection with Objectness Prior Networks for Object Detection
Authors Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen
Abstract We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, so RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and a low-resolution 384×384 input size, the network achieves 81.3% mAP on PASCAL VOC 2007 and 80.7% mAP on PASCAL VOC 2012. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3× faster than the Faster R-CNN counterpart.
Tasks Object Detection, Object Localization
Published 2017-07-06
URL http://arxiv.org/abs/1707.01691v1
PDF http://arxiv.org/pdf/1707.01691v1.pdf
PWC https://paperswithcode.com/paper/ron-reverse-connection-with-objectness-prior
Repo https://github.com/taokong/RON
Framework tf
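
The gist of RON's two ideas is easy to sketch. Below is a minimal, illustrative PyTorch rendering of a reverse connection (fusing a deeper feature map back into a shallower one) and an objectness-prior head that gates per-location class scores. Module names, channel sizes, and layer choices are assumptions for the sketch, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseConnection(nn.Module):
    """Fuse a deeper feature map back into a shallower one.

    Assumes the deep map has half the spatial resolution of the shallow one.
    """
    def __init__(self, shallow_ch, deep_ch, out_ch=512):
        super().__init__()
        self.lateral = nn.Conv2d(shallow_ch, out_ch, 3, padding=1)
        self.top_down = nn.ConvTranspose2d(deep_ch, out_ch, 2, stride=2)

    def forward(self, shallow, deep):
        return F.relu(self.lateral(shallow) + self.top_down(deep))

class ObjectnessGatedHead(nn.Module):
    """Per-location objectness prior that gates the class scores."""
    def __init__(self, ch, num_anchors, num_classes):
        super().__init__()
        self.obj = nn.Conv2d(ch, num_anchors, 3, padding=1)
        self.cls = nn.Conv2d(ch, num_anchors * num_classes, 3, padding=1)

    def forward(self, x):
        obj = torch.sigmoid(self.obj(x))                   # (B, A, H, W)
        B, _, H, W = x.shape
        cls = self.cls(x).view(B, -1, obj.shape[1], H, W)  # (B, C, A, H, W)
        cls = cls.softmax(dim=1)
        return obj.unsqueeze(1) * cls  # low-objectness locations are suppressed
```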

Recent Trends in Deep Learning Based Natural Language Processing

Title Recent Trends in Deep Learning Based Natural Language Processing
Authors Tom Young, Devamanyu Hazarika, Soujanya Poria, Erik Cambria
Abstract Deep learning methods employ multiple processing layers to learn hierarchical representations of data and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP.
Tasks
Published 2017-08-09
URL http://arxiv.org/abs/1708.02709v8
PDF http://arxiv.org/pdf/1708.02709v8.pdf
PWC https://paperswithcode.com/paper/recent-trends-in-deep-learning-based-natural
Repo https://github.com/ridakadri14/AspectBasedSentimentAnalysis
Framework tf

Causal Effect Inference with Deep Latent-Variable Models

Title Causal Effect Inference with Deep Latent-Variable Models
Authors Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling
Abstract Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of confounders, factors that affect both an intervention and its outcome. A carefully designed observational study attempts to measure all important confounders. However, even if one does not have direct access to all confounders, there may exist noisy and uncertain measurement of proxies for confounders. We build on recent advances in latent variable modeling to simultaneously estimate the unknown latent space summarizing the confounders and the causal effect. Our method is based on Variational Autoencoders (VAE) which follow the causal structure of inference with proxies. We show our method is significantly more robust than existing methods, and matches the state-of-the-art on previous benchmarks focused on individual treatment effects.
Tasks Latent Variable Models
Published 2017-05-24
URL http://arxiv.org/abs/1705.08821v2
PDF http://arxiv.org/pdf/1705.08821v2.pdf
PWC https://paperswithcode.com/paper/causal-effect-inference-with-deep-latent
Repo https://github.com/kim-hyunsu/CEVAE-pyro
Framework pytorch
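
A heavily simplified sketch of the model's structure follows: a VAE whose encoder infers the latent confounder z from the proxies (plus treatment and outcome), with decoders for p(x|z), p(t|z) and p(y|t,z); the individual treatment effect is then read off by switching t in the outcome head. This omits CEVAE's auxiliary test-time networks q(t|x) and q(y|x,t), and all names and sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CEVAELite(nn.Module):
    """Toy CEVAE-style model: x are proxies, t a binary treatment, y an outcome."""
    def __init__(self, x_dim, z_dim=20, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + 2, h), nn.ELU(),
                                 nn.Linear(h, 2 * z_dim))       # q(z | x, t, y)
        self.px = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, x_dim))
        self.pt = nn.Sequential(nn.Linear(z_dim, h), nn.ELU(), nn.Linear(h, 1))
        self.py = nn.Sequential(nn.Linear(z_dim + 1, h), nn.ELU(), nn.Linear(h, 1))

    def loss(self, x, t, y):
        """Negative ELBO; t and y are (batch, 1) float tensors."""
        mu, logvar = self.enc(torch.cat([x, t, y], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
        rec = ((self.px(z) - x) ** 2).sum(1).mean() \
            + F.binary_cross_entropy_with_logits(self.pt(z), t) \
            + ((self.py(torch.cat([z, t], dim=1)) - y) ** 2).sum(1).mean()
        return rec + kl

    def ite(self, x, t, y):
        """Individual treatment effect: E[y | z, do(t=1)] - E[y | z, do(t=0)]."""
        mu, _ = self.enc(torch.cat([x, t, y], dim=1)).chunk(2, dim=1)
        y1 = self.py(torch.cat([mu, torch.ones_like(t)], dim=1))
        y0 = self.py(torch.cat([mu, torch.zeros_like(t)], dim=1))
        return (y1 - y0).squeeze(1)
```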

Joint Maximum Purity Forest with Application to Image Super-Resolution

Title Joint Maximum Purity Forest with Application to Image Super-Resolution
Authors Hailiang Li, Kin-Man Lam, Dong Li
Abstract In this paper, we propose a novel random-forest scheme, namely Joint Maximum Purity Forest (JMPF), for classification, clustering, and regression tasks. In the JMPF scheme, the original feature space is transformed into a compactly pre-clustered feature space, via a trained rotation matrix. The rotation matrix is obtained through an iterative quantization process, where the input data belonging to different classes are clustered to the respective vertices of the new feature space with maximum purity. In the new feature space, orthogonal hyperplanes, which are employed at the split-nodes of decision trees in random forests, can tackle the clustering problems effectively. We evaluated our proposed method on public benchmark datasets for regression and classification tasks, and experiments showed that JMPF remarkably outperforms other state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF to image super-resolution, because the transformed, compact features are more discriminative to the clustering-regression scheme. Experiment results on several public benchmark datasets also showed that the JMPF-based image super-resolution scheme is consistently superior to recent state-of-the-art image super-resolution algorithms.
Tasks Image Super-Resolution, Quantization, Super-Resolution
Published 2017-08-30
URL http://arxiv.org/abs/1708.09200v1
PDF http://arxiv.org/pdf/1708.09200v1.pdf
PWC https://paperswithcode.com/paper/joint-maximum-purity-forest-with-application
Repo https://github.com/HarleyHK/JMPF
Framework none
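
The rotation step is essentially the classic iterative-quantization alternation, which is compact enough to sketch. The NumPy snippet below (a sketch under that assumption, not the JMPF reference code) alternates between snapping rotated points to hypercube vertices and solving an orthogonal Procrustes problem; the forest itself is omitted.

```python
import numpy as np

def learn_rotation(X, n_iter=50, seed=0):
    """X: (n_samples, d) zero-centered features. Returns an orthogonal R (d, d)."""
    rng = np.random.default_rng(seed)
    # random orthogonal initialization via QR
    R, _ = np.linalg.qr(rng.standard_normal((X.shape[1], X.shape[1])))
    for _ in range(n_iter):
        B = np.sign(X @ R)                 # assign points to hypercube vertices
        # orthogonal Procrustes: the R minimizing ||B - X R||_F
        U, _, Vt = np.linalg.svd(X.T @ B)
        R = U @ Vt
    return R

X = np.random.randn(500, 16)
X -= X.mean(axis=0)
R = learn_rotation(X)
X_rot = X @ R  # pre-clustered space for the forest's axis-aligned splits
```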

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

Title Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
Authors Michael Gygli, Mohammad Norouzi, Anelia Angelova
Abstract We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN’s objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves the state-of-the-art results on multi-label prediction and image segmentation benchmarks.
Tasks Multi-Label Classification, Semantic Segmentation
Published 2017-03-13
URL http://arxiv.org/abs/1703.04363v2
PDF http://arxiv.org/pdf/1703.04363v2.pdf
PWC https://paperswithcode.com/paper/deep-value-networks-learn-to-evaluate-and
Repo https://github.com/gyglim/dvn
Framework pytorch
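
Inference in this framework is just gradient ascent on the value estimate, which a few lines make concrete. The sketch below assumes a trained, callable value_net(x, y) and a relaxed output in [0, 1]; it is illustrative rather than the authors' implementation.

```python
import torch

def dvn_inference(value_net, x, y_shape, steps=30, lr=0.5):
    y = torch.full(y_shape, 0.5, requires_grad=True)  # relaxed output in [0, 1]
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = value_net(x, y)        # predicted task metric (e.g., IoU or F1)
        (-score).backward()            # ascend on the value estimate
        opt.step()
        with torch.no_grad():
            y.clamp_(0.0, 1.0)         # keep the relaxation feasible
    return (y.detach() > 0.5).float()  # discretize the refined output
```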

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

Title V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Authors Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
Abstract Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly non-linear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we firstly cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods in almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge. The code is available in https://github.com/mks0601/V2V-PoseNet_RELEASE.
Tasks 3D Human Pose Estimation, Hand Pose Estimation, Pose Estimation
Published 2017-11-20
URL http://arxiv.org/abs/1711.07399v3
PDF http://arxiv.org/pdf/1711.07399v3.pdf
PWC https://paperswithcode.com/paper/v2v-posenet-voxel-to-voxel-prediction-network
Repo https://github.com/rajbharat/PoseNet-V2V-Pytorch1.0-Win10
Framework pytorch
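
The voxel-to-voxel formulation reduces to "3D CNN in, one likelihood volume per keypoint out". The toy sketch below is far shallower than the paper's encoder-decoder and uses made-up layer sizes; it only shows the input/output contract and an argmax read-out of keypoint voxel coordinates.

```python
import torch
import torch.nn as nn

class TinyV2V(nn.Module):
    def __init__(self, num_keypoints, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, num_keypoints, 1),  # per-voxel likelihood per joint
        )

    def forward(self, voxels):   # (B, 1, D, H, W) occupancy grid
        return self.net(voxels)  # (B, K, D, H, W) likelihood volumes

def keypoints_from_heatmaps(heatmaps):
    """Return (B, K, 3) voxel coordinates of the per-joint maxima."""
    B, K, D, H, W = heatmaps.shape
    flat = heatmaps.view(B, K, -1).argmax(dim=-1)
    z, rem = flat // (H * W), flat % (H * W)
    return torch.stack([z, rem // W, rem % W], dim=-1)
```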

Learning Deep Features via Congenerous Cosine Loss for Person Recognition

Title Learning Deep Features via Congenerous Cosine Loss for Person Recognition
Authors Yu Liu, Hongyang Li, Xiaogang Wang
Abstract Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. The intuition is that we directly compare and optimize the cosine distance between two features - enlarging inter-class distinction as well as alleviating intra-class variance. We propose a congenerous cosine loss by minimizing the cosine distance between samples and their cluster centroid in a cooperative way. Such a design reduces the complexity and could be implemented via softmax with normalized inputs. Our method also differs from previous work in person recognition in that we do not conduct a second training on the test subset. The identity of a person is determined by measuring the similarity from several body regions in the reference set. Experimental results show that the proposed approach achieves better classification accuracy than previous state-of-the-art methods.
Tasks Person Recognition
Published 2017-02-22
URL http://arxiv.org/abs/1702.06890v2
PDF http://arxiv.org/pdf/1702.06890v2.pdf
PWC https://paperswithcode.com/paper/learning-deep-features-via-congenerous-cosine
Repo https://github.com/sciencefans/coco_loss
Framework none
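
The loss itself is close to a softmax over scaled cosine similarities, as the abstract notes. Here is a minimal PyTorch sketch; treating the centroids as freely learnable parameters (and the scale value) is a simplification of the paper's cooperatively updated cluster centroids.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CongenerousCosineLoss(nn.Module):
    def __init__(self, feat_dim, num_classes, scale=20.0):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # temperature on the cosine logits

    def forward(self, features, labels):
        f = F.normalize(features, dim=1)        # unit-length features
        c = F.normalize(self.centroids, dim=1)  # unit-length centroids
        logits = self.scale * f @ c.t()         # cosine-similarity logits
        return F.cross_entropy(logits, labels)  # pulls samples to their centroid
```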

Watch Your Step: Learning Node Embeddings via Graph Attention

Title Watch Your Step: Learning Node Embeddings via Graph Attention
Authors Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi
Abstract Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In particular, we learn a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective. Unlike previous approaches to attention models, the method that we propose utilizes attention parameters exclusively on the data (e.g. on the random walk); they are not used by the model for inference. We experiment on link prediction tasks, as we aim to produce embeddings that best preserve the graph structure, generalizing to unseen information. We improve the state-of-the-art on a comprehensive suite of real-world datasets including social, collaboration, and biological networks. Adding attention to random walks can reduce the error by 20% to 45% on the datasets we attempted. Further, our learned attention parameters are different for every graph, and our automatically-found values agree with the optimal choice of hyper-parameter if we manually tune existing methods.
Tasks Graph Embedding, Link Prediction, Node Classification
Published 2017-10-26
URL http://arxiv.org/abs/1710.09599v2
PDF http://arxiv.org/pdf/1710.09599v2.pdf
PWC https://paperswithcode.com/paper/watch-your-step-learning-node-embeddings-via
Repo https://github.com/karthik63/attention
Framework tf
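
The central learned quantity is an attention-weighted power series of the transition matrix. The sketch below computes E = sum_k q_k T^k with trainable softmax weights q; plugging E into a link-prediction or graph-likelihood loss lets backpropagation tune the effective walk length. This is a dense-matrix toy under those assumptions, not the authors' code.

```python
import torch

def expected_cooccurrence(T, q_logits):
    """T: (n, n) row-stochastic transition matrix; q_logits: (K,) trainable."""
    q = torch.softmax(q_logits, dim=0)  # attention over walk lengths 1..K
    E, Tk = torch.zeros_like(T), T
    for k in range(q.shape[0]):
        E = E + q[k] * Tk               # weight each walk length
        Tk = Tk @ T                     # next power of the transition matrix
    return E

# q_logits receives gradients through E when E feeds a loss, e.g. a graph
# likelihood over inner products of node embeddings.
```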

Learning Less is More - 6D Camera Localization via 3D Surface Regression

Title Learning Less is More - 6D Camera Localization via 3D Surface Regression
Authors Eric Brachmann, Carsten Rother
Abstract Popular research areas like autonomous driving and augmented reality have renewed the interest in image-based camera localization. In this work, we address the task of predicting the 6D camera pose from a single RGB image in a given 3D environment. With the advent of neural networks, previous works have either learned the entire camera localization process, or multiple components of a camera localization pipeline. Our key contribution is to demonstrate and explain that learning a single component of this pipeline is sufficient. This component is a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space. The neural network is prepended to a new end-to-end trainable pipeline. Our system is efficient, highly accurate, robust in training, and exhibits outstanding generalization capabilities. It exceeds state-of-the-art consistently on indoor and outdoor datasets. Interestingly, our approach surpasses existing techniques even without utilizing a 3D model of the scene during training, since the network is able to discover 3D scene geometry automatically, solely from single-view constraints.
Tasks Autonomous Driving, Camera Localization
Published 2017-11-28
URL http://arxiv.org/abs/1711.10228v2
PDF http://arxiv.org/pdf/1711.10228v2.pdf
PWC https://paperswithcode.com/paper/learning-less-is-more-6d-camera-localization
Repo https://github.com/HanjiangHu/DIFL-FCL
Framework pytorch
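
At inference, the learned component's output plugs into a standard geometric solver. The sketch below is illustrative: scene_net is a stand-in for the paper's fully convolutional scene-coordinate regressor (assumed here to return a NumPy array), and the pose comes from dense 2D-3D correspondences via OpenCV's PnP-RANSAC. The paper's differentiable RANSAC used for end-to-end training is not shown.

```python
import numpy as np
import cv2

def estimate_pose(scene_net, image, K):
    coords = scene_net(image)                  # (H, W, 3) scene coordinates
    H, W, _ = coords.shape
    ys, xs = np.mgrid[0:H, 0:W]                # pixel grid for correspondences
    pts2d = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    pts3d = coords.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=10.0)
    return rvec, tvec, inliers  # pose as rotation vector + translation
```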

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

Title MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence
Authors Lianmin Zheng, Jiacheng Yang, Han Cai, Weinan Zhang, Jun Wang, Yong Yu
Abstract We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms on single or multi-agent reinforcement learning, MAgent focuses on supporting the tasks and the applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents’ optimal policies, but more importantly, the observation and understanding of individual agents’ behaviors and social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show the collective intelligence that emerges from learning from scratch.
Tasks Multi-agent Reinforcement Learning
Published 2017-12-02
URL http://arxiv.org/abs/1712.00600v1
PDF http://arxiv.org/pdf/1712.00600v1.pdf
PWC https://paperswithcode.com/paper/magent-a-many-agent-reinforcement-learning
Repo https://github.com/hjjimmykim/SchwabRoyale
Framework pytorch

Prototypical Networks for Few-shot Learning

Title Prototypical Networks for Few-shot Learning
Authors Jake Snell, Kevin Swersky, Richard S. Zemel
Abstract We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.
Tasks Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, One-Shot Learning, Zero-Shot Learning
Published 2017-03-15
URL http://arxiv.org/abs/1703.05175v2
PDF http://arxiv.org/pdf/1703.05175v2.pdf
PWC https://paperswithcode.com/paper/prototypical-networks-for-few-shot-learning
Repo https://github.com/ash3n/Prototypical-Network-TF
Framework tf
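
The algorithm is small enough to state in full. A minimal episode sketch, assuming an embedding network embed and integer label tensors for the support and query sets:

```python
import torch
import torch.nn.functional as F

def prototypical_loss(embed, support_x, support_y, query_x, query_y, n_classes):
    s = embed(support_x)                             # (Ns, d) support embeddings
    q = embed(query_x)                               # (Nq, d) query embeddings
    protos = torch.stack([s[support_y == c].mean(0)  # prototype = class mean
                          for c in range(n_classes)])
    dists = torch.cdist(q, protos) ** 2              # squared Euclidean distance
    log_p = F.log_softmax(-dists, dim=1)             # softmax over -distance
    return F.nll_loss(log_p, query_y)
```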

A System for Accessible Artificial Intelligence

Title A System for Accessible Artificial Intelligence
Authors Randal S. Olson, Moshe Sipper, William La Cava, Sharon Tartarone, Steven Vitale, Weixuan Fu, Patryk Orzechowski, Ryan J. Urbanowicz, John H. Holmes, Jason H. Moore
Abstract While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers or the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects.
Tasks
Published 2017-05-01
URL http://arxiv.org/abs/1705.00594v2
PDF http://arxiv.org/pdf/1705.00594v2.pdf
PWC https://paperswithcode.com/paper/a-system-for-accessible-artificial
Repo https://github.com/EpistasisLab/pennai
Framework none

Structured Generative Adversarial Networks

Title Structured Generative Adversarial Networks
Authors Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing
Abstract We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y that encodes the designated semantics, and z that contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games that have their equilibrium concentrating at the true joint data distributions p(x, z) and p(x, y), avoiding the diffuse spread of probability mass over the data space from which MLE-based methods may suffer. We assess SGAN by evaluating its trained networks, and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator and disentangled representations; it also establishes state-of-the-art results across multiple datasets when applied to semi-supervised image classification (1.27%, 5.73%, 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images of high visual quality that strictly follow the designated semantics, and can be extended to a wide spectrum of applications, such as style transfer.
Tasks Image Classification, Semi-Supervised Image Classification, Style Transfer
Published 2017-11-02
URL http://arxiv.org/abs/1711.00889v1
PDF http://arxiv.org/pdf/1711.00889v1.pdf
PWC https://paperswithcode.com/paper/structured-generative-adversarial-networks
Repo https://github.com/thudzj/StructuredGAN
Framework none
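
The disentanglement mechanism can be sketched compactly. Below, toy MLPs stand in for the generator and for the two auxiliary networks of the collaborative games, which must recover y and z from generated samples; the adversarial games on p(x, y) and p(x, z) are omitted, and every size and name here is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes, z_dim, x_dim, h = 10, 16, 64, 128
G = nn.Sequential(nn.Linear(n_classes + z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))
C = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, n_classes))  # recovers y
E = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, z_dim))      # recovers z

def collaborative_loss(y, z):
    x_fake = G(torch.cat([F.one_hot(y, n_classes).float(), z], dim=1))
    loss_y = F.cross_entropy(C(x_fake), y)  # semantics must survive generation
    loss_z = F.mse_loss(E(x_fake), z)       # style must survive generation
    return loss_y + loss_z

y = torch.randint(0, n_classes, (32,))
z = torch.randn(32, z_dim)
loss = collaborative_loss(y, z)
```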

Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation

Title Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
Authors Satoshi Tsutsui, David Crandall
Abstract Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a multilingual captioning system. We propose a very simple technique to build a single unified model across languages, using artificial tokens to control the language, making the captioning system more compact. We evaluate our approach on generating English and Japanese captions, and show that a typical neural captioning architecture is capable of learning a single model that can switch between two different languages.
Tasks
Published 2017-06-20
URL http://arxiv.org/abs/1706.06275v1
PDF http://arxiv.org/pdf/1706.06275v1.pdf
PWC https://paperswithcode.com/paper/using-artificial-tokens-to-control-languages
Repo https://github.com/chetangr/Speech-Image-Captioning
Framework none
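
The technique really is as simple as the abstract suggests: prepend a control token to each target sequence so one decoder learns to switch languages on demand. A sketch, where the token names and tokenization are assumptions:

```python
def add_language_token(caption_tokens, language):
    """Prefix a caption with an artificial token that selects the output language."""
    token = {"en": "<2en>", "ja": "<2ja>"}[language]  # control tokens in the vocab
    return [token] + caption_tokens

# Training pairs mix both languages; at test time the token picks the output
# language for the same model:
print(add_language_token(["a", "cat", "on", "a", "mat"], "en"))
# ['<2en>', 'a', 'cat', 'on', 'a', 'mat']
```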

Exploring the similarity of medical imaging classification problems

Title Exploring the similarity of medical imaging classification problems
Authors Veronika Cheplygina, Pim Moeskops, Mitko Veta, Behdad Dasht Bozorg, Josien Pluim
Abstract Supervised learning is ubiquitous in medical image analysis. In this paper we consider the problem of meta-learning – predicting which methods will perform well in an unseen classification problem, given previous experience with other classification problems. We investigate the first step of such an approach: how to quantify the similarity of different classification problems. We characterize datasets sampled from six classification problems by performance ranks of simple classifiers, and define the similarity by the inverse of Euclidean distance in this meta-feature space. We visualize the similarities in a 2D space, where meaningful clusters start to emerge, and show that the proposed representation can be used to classify datasets according to their origin with 89.3% accuracy. These findings, together with the observations of recent trends in machine learning, suggest that meta-learning could be a valuable tool for the medical imaging community.
Tasks Meta-Learning
Published 2017-06-12
URL http://arxiv.org/abs/1706.03509v1
PDF http://arxiv.org/pdf/1706.03509v1.pdf
PWC https://paperswithcode.com/paper/exploring-the-similarity-of-medical-imaging
Repo https://github.com/tueimage/similarity-medical-2017
Framework none
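
The meta-feature construction is straightforward to reproduce in outline. The sketch below ranks a fixed set of classifiers per dataset and compares the rank vectors; the 1/(1+d) form is a guard against division by zero, a slight variation on the paper's inverse-distance similarity.

```python
import numpy as np
from scipy.stats import rankdata

def dataset_meta_features(accuracies):
    """accuracies: (n_classifiers,) scores of the same classifiers on one dataset."""
    return rankdata(-np.asarray(accuracies))      # rank 1 = best classifier

def dataset_similarity(acc_a, acc_b):
    ra, rb = dataset_meta_features(acc_a), dataset_meta_features(acc_b)
    return 1.0 / (1.0 + np.linalg.norm(ra - rb))  # inverse-distance similarity

# Two datasets on which the same classifiers do best look maximally similar:
print(dataset_similarity([0.9, 0.7, 0.8], [0.85, 0.6, 0.7]))  # -> 1.0 (same ranks)
```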