Paper Group AWR 173
RON: Reverse Connection with Objectness Prior Networks for Object Detection. Recent Trends in Deep Learning Based Natural Language Processing. Causal Effect Inference with Deep Latent-Variable Models. Joint Maximum Purity Forest with Application to Image Super-Resolution. Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs …
RON: Reverse Connection with Objectness Prior Networks for Object Detection
Title | RON: Reverse Connection with Objectness Prior Networks for Object Detection |
Authors | Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, Yurong Chen |
Abstract | We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under a fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multiple levels of CNN feature maps. To deal with (b), we propose the objectness prior to significantly reduce the search space for objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, so RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and a low-resolution 384×384 input size, the network achieves 81.3% mAP on PASCAL VOC 2007 and 80.7% mAP on PASCAL VOC 2012. Its advantage grows as datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5 GB of GPU memory at test time, the network runs at 15 FPS, 3× faster than its Faster R-CNN counterpart. |
Tasks | Object Detection, Object Localization |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01691v1 |
PDF | http://arxiv.org/pdf/1707.01691v1.pdf |
PWC | https://paperswithcode.com/paper/ron-reverse-connection-with-objectness-prior |
Repo | https://github.com/taokong/RON |
Framework | tf |
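As a rough illustration of the reverse connection described above, the sketch below fuses a deeper feature map into a shallower one via a deconvolution and an element-wise sum. This is a minimal PyTorch sketch; the channel counts, kernel sizes, and ReLU are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of a RON-style reverse connection (not the official code).
import torch
import torch.nn as nn

class ReverseConnection(nn.Module):
    def __init__(self, shallow_ch, deep_ch, out_ch=512):
        super().__init__()
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, shallow, deep):
        # Upsample the deep map and fuse it with the lateral shallow map.
        return torch.relu(self.lateral(shallow) + self.up(deep))

rc = ReverseConnection(shallow_ch=512, deep_ch=1024)
shallow = torch.randn(1, 512, 48, 48)   # e.g. conv4-level features
deep = torch.randn(1, 1024, 24, 24)     # e.g. conv5-level features
fused = rc(shallow, deep)               # -> (1, 512, 48, 48)
```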
Recent Trends in Deep Learning Based Natural Language Processing
Title | Recent Trends in Deep Learning Based Natural Language Processing |
Authors | Tom Young, Devamanyu Hazarika, Soujanya Poria, Erik Cambria |
Abstract | Deep learning methods employ multiple processing layers to learn hierarchical representations of data and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP. |
Tasks | |
Published | 2017-08-09 |
URL | http://arxiv.org/abs/1708.02709v8 |
PDF | http://arxiv.org/pdf/1708.02709v8.pdf |
PWC | https://paperswithcode.com/paper/recent-trends-in-deep-learning-based-natural |
Repo | https://github.com/ridakadri14/AspectBasedSentimentAnalysis |
Framework | tf |
Causal Effect Inference with Deep Latent-Variable Models
Title | Causal Effect Inference with Deep Latent-Variable Models |
Authors | Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling |
Abstract | Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of confounders, factors that affect both an intervention and its outcome. A carefully designed observational study attempts to measure all important confounders. However, even if one does not have direct access to all confounders, there may exist noisy and uncertain measurements of proxies for confounders. We build on recent advances in latent variable modeling to simultaneously estimate the unknown latent space summarizing the confounders and the causal effect. Our method is based on Variational Autoencoders (VAEs), which follow the causal structure of inference with proxies. We show our method is significantly more robust than existing methods, and matches the state-of-the-art on previous benchmarks focused on individual treatment effects. |
Tasks | Latent Variable Models |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08821v2 |
PDF | http://arxiv.org/pdf/1705.08821v2.pdf |
PWC | https://paperswithcode.com/paper/causal-effect-inference-with-deep-latent |
Repo | https://github.com/kim-hyunsu/CEVAE-pyro |
Framework | pytorch |
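The core inference step — estimating an individual treatment effect from a latent confounder inferred from proxies — can be sketched as follows. Everything here (network sizes, the single Gaussian encoder, the Monte Carlo average) is an assumption for illustration, not the paper's full CEVAE.

```python
# Hypothetical, stripped-down CEVAE-style ITE estimator.
import torch
import torch.nn as nn

class TinyCEVAE(nn.Module):
    def __init__(self, x_dim, z_dim=16, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(),
                                 nn.Linear(h, 2 * z_dim))  # mu, logvar of q(z|x)
        self.y0 = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, 1))
        self.y1 = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, 1))

    def ite(self, x, n_samples=100):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        ites = []
        for _ in range(n_samples):
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            ites.append(self.y1(z) - self.y0(z))  # E[y|x,do(t=1)] - E[y|x,do(t=0)]
        return torch.stack(ites).mean(0)          # Monte Carlo estimate of the ITE

model = TinyCEVAE(x_dim=25)
x = torch.randn(8, 25)        # proxy covariates
print(model.ite(x).shape)     # -> torch.Size([8, 1])
```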
Joint Maximum Purity Forest with Application to Image Super-Resolution
Title | Joint Maximum Purity Forest with Application to Image Super-Resolution |
Authors | Hailiang Li, Kin-Man Lam, Dong Li |
Abstract | In this paper, we propose a novel random-forest scheme, namely the Joint Maximum Purity Forest (JMPF), for classification, clustering, and regression tasks. In the JMPF scheme, the original feature space is transformed into a compactly pre-clustered feature space, via a trained rotation matrix. The rotation matrix is obtained through an iterative quantization process, where the input data belonging to different classes are clustered to the respective vertices of the new feature space with maximum purity. In the new feature space, orthogonal hyperplanes, which are employed at the split-nodes of decision trees in random forests, can tackle the clustering problems effectively. We evaluated our proposed method on public benchmark datasets for regression and classification tasks, and experiments showed that JMPF remarkably outperforms other state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF to image super-resolution, because the transformed, compact features are more discriminative for the clustering-regression scheme. Experimental results on several public benchmark datasets also showed that the JMPF-based image super-resolution scheme is consistently superior to recent state-of-the-art image super-resolution algorithms. |
Tasks | Image Super-Resolution, Quantization, Super-Resolution |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09200v1 |
PDF | http://arxiv.org/pdf/1708.09200v1.pdf |
PWC | https://paperswithcode.com/paper/joint-maximum-purity-forest-with-application |
Repo | https://github.com/HarleyHK/JMPF |
Framework | none |
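The iterative quantization that learns the rotation matrix can be sketched in a few lines of NumPy: alternate between snapping zero-centered features to hypercube vertices and updating an orthogonal rotation via the Procrustes solution. This mirrors classic ITQ; the random orthogonal initialization and iteration count are assumptions.

```python
# Sketch of an ITQ-style rotation update, as described in the abstract.
import numpy as np

def learn_rotation(X, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal init
    for _ in range(n_iters):
        B = np.sign(X @ R)                  # assign points to hypercube vertices
        U, _, Vt = np.linalg.svd(X.T @ B)   # orthogonal Procrustes update
        R = U @ Vt
    return R

X = np.random.randn(1000, 8)
X -= X.mean(axis=0)                          # zero-center before rotating
R = learn_rotation(X)
print(np.allclose(R @ R.T, np.eye(8), atol=1e-8))  # R stays orthogonal
```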
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
Title | Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs |
Authors | Michael Gygli, Mohammad Norouzi, Anelia Angelova |
Abstract | We approach structured output prediction by optimizing a deep value network (DVN) to precisely estimate the task loss on different output configurations for a given input. Once the model is trained, we perform inference by gradient descent on the continuous relaxations of the output variables to find outputs with promising scores from the value network. When applied to image segmentation, the value network takes an image and a segmentation mask as inputs and predicts a scalar estimating the intersection over union between the input and ground truth masks. For multi-label classification, the DVN’s objective is to correctly predict the F1 score for any potential label configuration. The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks. |
Tasks | Multi-Label Classification, Semantic Segmentation |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04363v2 |
PDF | http://arxiv.org/pdf/1703.04363v2.pdf |
PWC | https://paperswithcode.com/paper/deep-value-networks-learn-to-evaluate-and |
Repo | https://github.com/gyglim/dvn |
Framework | pytorch |
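Inference in this framework is just gradient ascent on the value network's score with respect to a relaxed output. A minimal sketch, assuming a toy MLP value network in place of the paper's architecture:

```python
# DVN-style inference: refine a relaxed output y by ascending v(x, y).
import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

def infer(x, y_dim=10, steps=30, lr=0.5):
    y = torch.full((x.shape[0], y_dim), 0.5, requires_grad=True)
    for _ in range(steps):
        score = value_net(torch.cat([x, y], dim=-1)).sum()
        grad, = torch.autograd.grad(score, y)
        y = (y + lr * grad).clamp(0.0, 1.0)   # stay inside the relaxed [0,1] box
        y = y.detach().requires_grad_(True)
    return y.detach()

x = torch.randn(4, 10)
y_hat = infer(x)   # refined structured output for each input
```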
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Title | V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map |
Authors | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee |
Abstract | Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly non-linear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we first cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods on almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge. The code is available at https://github.com/mks0601/V2V-PoseNet_RELEASE. |
Tasks | 3D Human Pose Estimation, Hand Pose Estimation, Pose Estimation |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07399v3 |
PDF | http://arxiv.org/pdf/1711.07399v3.pdf |
PWC | https://paperswithcode.com/paper/v2v-posenet-voxel-to-voxel-prediction-network |
Repo | https://github.com/rajbharat/PoseNet-V2V-Pytorch1.0-Win10 |
Framework | pytorch |
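In code, the voxel-to-voxel formulation amounts to a 3D CNN that maps an occupancy grid to one per-voxel likelihood volume per keypoint, with keypoints read out at the maximal voxel. The tiny network below is a stand-in for the paper's encoder-decoder, for illustration only:

```python
# Toy voxel-to-voxel prediction: per-voxel keypoint likelihoods from a grid.
import torch
import torch.nn as nn

num_keypoints = 21
net = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(16, num_keypoints, kernel_size=3, padding=1),
)

voxels = torch.randn(1, 1, 32, 32, 32)   # voxelized depth map
heat = net(voxels)                       # (1, 21, 32, 32, 32) likelihood volumes
flat = heat.flatten(2).argmax(dim=-1)    # best voxel index per keypoint
z = flat // (32 * 32)                    # unravel the flat index into (z, y, x)
y = (flat // 32) % 32
x = flat % 32
coords = torch.stack([z, y, x], dim=-1)
print(coords.shape)                      # -> torch.Size([1, 21, 3])
```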
Learning Deep Features via Congenerous Cosine Loss for Person Recognition
Title | Learning Deep Features via Congenerous Cosine Loss for Person Recognition |
Authors | Yu Liu, Hongyang Li, Xiaogang Wang |
Abstract | Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. The intuition is that we directly compare and optimize the cosine distance between two features - enlarging inter-class distinction as well as alleviating inner-class variance. We propose a congenerous cosine loss that minimizes the cosine distance between samples and their cluster centroid in a cooperative way. Such a design reduces the complexity and can be implemented via softmax with normalized inputs. Our method also differs from previous work in person recognition in that we do not conduct a second training on the test subset. The identity of a person is determined by measuring the similarity from several body regions in the reference set. Experimental results show that the proposed approach achieves better classification accuracy than previous state-of-the-art methods. |
Tasks | Person Recognition |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06890v2 |
PDF | http://arxiv.org/pdf/1702.06890v2.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-features-via-congenerous-cosine |
Repo | https://github.com/sciencefans/coco_loss |
Framework | none |
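Because the loss reduces to a softmax over normalized inputs, it is short to sketch: L2-normalize features and class centroids, scale the cosine similarities, and apply cross-entropy. The scale factor here is an assumed hyper-parameter:

```python
# Sketch of a congenerous-cosine-style loss via normalized softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CocoLoss(nn.Module):
    def __init__(self, feat_dim, num_classes, scale=20.0):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, feats, labels):
        f = F.normalize(feats, dim=1)            # unit-norm features
        c = F.normalize(self.centroids, dim=1)   # unit-norm class centroids
        logits = self.scale * f @ c.t()          # scaled cosine similarities
        return F.cross_entropy(logits, labels)   # pulls samples to their centroid

loss_fn = CocoLoss(feat_dim=128, num_classes=10)
loss = loss_fn(torch.randn(32, 128), torch.randint(0, 10, (32,)))
```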
Watch Your Step: Learning Node Embeddings via Graph Attention
Title | Watch Your Step: Learning Node Embeddings via Graph Attention |
Authors | Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi |
Abstract | Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). These methods have many hyper-parameters (such as the random walk length) which must be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In particular, we learn a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective. Unlike previous approaches to attention models, the method we propose applies attention parameters exclusively to the data (e.g. to the random walk); they are not used by the model for inference. We experiment on link prediction tasks, as we aim to produce embeddings that best preserve the graph structure, generalizing to unseen information. We improve state-of-the-art results on a comprehensive suite of real-world datasets including social, collaboration, and biological networks. Adding attention to random walks can reduce the error by 20% to 45% on the datasets we attempted. Further, our learned attention parameters are different for every graph, and the values found automatically agree with the optimal choice of hyper-parameters when we manually tune existing methods. |
Tasks | Graph Embedding, Link Prediction, Node Classification |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09599v2 |
PDF | http://arxiv.org/pdf/1710.09599v2.pdf |
PWC | https://paperswithcode.com/paper/watch-your-step-learning-node-embeddings-via |
Repo | https://github.com/karthik63/attention |
Framework | tf |
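The attention on the power series of the transition matrix can be sketched directly: learn logits q over hop distances and form an expected context matrix E = Σ_k softmax(q)_k T^k, which is differentiable in q. The graph size and walk horizon below are arbitrary:

```python
# Differentiable hop-attention over powers of the transition matrix.
import torch

def expected_context(T, q):
    # T: (n, n) row-stochastic transition matrix; q: (C,) logits over hops.
    weights = torch.softmax(q, dim=0)
    E = torch.zeros_like(T)
    Tk = torch.eye(T.shape[0])
    for w in weights:
        Tk = Tk @ T               # T^1, T^2, ..., T^C
        E = E + w * Tk            # attention-weighted expected co-occurrence
    return E

n, C = 5, 4
A = torch.rand(n, n)
T = A / A.sum(dim=1, keepdim=True)      # row-normalize adjacency into T
q = torch.zeros(C, requires_grad=True)  # trainable hop-attention logits
E = expected_context(T, q)
E.sum().backward()                      # gradients flow back into q
```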
Learning Less is More - 6D Camera Localization via 3D Surface Regression
Title | Learning Less is More - 6D Camera Localization via 3D Surface Regression |
Authors | Eric Brachmann, Carsten Rother |
Abstract | Popular research areas like autonomous driving and augmented reality have renewed the interest in image-based camera localization. In this work, we address the task of predicting the 6D camera pose from a single RGB image in a given 3D environment. With the advent of neural networks, previous works have either learned the entire camera localization process, or multiple components of a camera localization pipeline. Our key contribution is to demonstrate and explain that learning a single component of this pipeline is sufficient. This component is a fully convolutional neural network for densely regressing so-called scene coordinates, defining the correspondence between the input image and the 3D scene space. The neural network is prepended to a new end-to-end trainable pipeline. Our system is efficient, highly accurate, robust in training, and exhibits outstanding generalization capabilities. It consistently exceeds the state of the art on indoor and outdoor datasets. Interestingly, our approach surpasses existing techniques even without utilizing a 3D model of the scene during training, since the network is able to discover 3D scene geometry automatically, solely from single-view constraints. |
Tasks | Autonomous Driving, Camera Localization |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10228v2 |
PDF | http://arxiv.org/pdf/1711.10228v2.pdf |
PWC | https://paperswithcode.com/paper/learning-less-is-more-6d-camera-localization |
Repo | https://github.com/HanjiangHu/DIFL-FCL |
Framework | pytorch |
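The pipeline the abstract describes — dense scene-coordinate regression followed by pose solving — can be outlined with OpenCV's PnP-RANSAC standing in for the paper's differentiable pose optimization. The random scene coordinates below are a stub for the network output:

```python
# Scene-coordinate regression -> 2D-3D correspondences -> PnP with RANSAC.
import numpy as np
import cv2

h, w = 60, 80
scene_coords = np.random.rand(h, w, 3).astype(np.float32)  # network output (stub)
ys, xs = np.mgrid[0:h, 0:w]
pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)
pts3d = scene_coords.reshape(-1, 3)                         # one 3D point per pixel

K = np.array([[525, 0, w / 2], [0, 525, h / 2], [0, 0, 1]], dtype=np.float64)
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pix, K, None)
print(ok, rvec.ravel(), tvec.ravel())   # recovered 6D camera pose (if ok)
```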
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence
Title | MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence |
Authors | Lianmin Zheng, Jiacheng Yang, Han Cai, Weinan Zhang, Jun Wang, Yong Yu |
Abstract | We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms for single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents’ optimal policies, but more importantly, the observation and understanding of individual agents’ behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show emergent collective intelligence learned from scratch. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00600v1 |
PDF | http://arxiv.org/pdf/1712.00600v1.pdf |
PWC | https://paperswithcode.com/paper/magent-a-many-agent-reinforcement-learning |
Repo | https://github.com/hjjimmykim/SchwabRoyale |
Framework | pytorch |
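The scalability claim rests largely on parameter sharing: large agent populations query one policy network in a single batched forward pass. The generic sketch below illustrates that pattern only; it is not the MAgent API.

```python
# Generic parameter-sharing pattern for many-agent RL (not MAgent's API).
import torch
import torch.nn as nn

num_agents, obs_dim, num_actions = 10000, 32, 8
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, num_actions))   # one net shared by all agents

obs = torch.randn(num_agents, obs_dim)   # all agents' observations, stacked
with torch.no_grad():
    logits = policy(obs)                 # single batched forward pass
    actions = torch.distributions.Categorical(logits=logits).sample()
print(actions.shape)                     # -> torch.Size([10000])
```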
Prototypical Networks for Few-shot Learning
Title | Prototypical Networks for Few-shot Learning |
Authors | Jake Snell, Kevin Swersky, Richard S. Zemel |
Abstract | We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, One-Shot Learning, Zero-Shot Learning |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05175v2 |
PDF | http://arxiv.org/pdf/1703.05175v2.pdf |
PWC | https://paperswithcode.com/paper/prototypical-networks-for-few-shot-learning |
Repo | https://github.com/ash3n/Prototypical-Network-TF |
Framework | tf |
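The method itself is a few lines: prototypes are the mean embedded support points per class, and queries are classified by a softmax over negative squared Euclidean distances to those prototypes. A minimal sketch, with random embeddings standing in for a trained encoder:

```python
# Prototypical-network episode: mean prototypes + distance-based softmax.
import torch

def prototypes(support_emb, support_labels, n_classes):
    return torch.stack([support_emb[support_labels == c].mean(0)
                        for c in range(n_classes)])

def classify(query_emb, protos):
    d2 = torch.cdist(query_emb, protos) ** 2   # squared Euclidean distances
    return torch.log_softmax(-d2, dim=1)       # log p(class | query)

n_way, k_shot, dim = 5, 5, 64
support = torch.randn(n_way * k_shot, dim)     # embedded support set (stub)
labels = torch.arange(n_way).repeat_interleave(k_shot)
query = torch.randn(10, dim)                   # embedded queries (stub)
log_p = classify(query, prototypes(support, labels, n_way))
print(log_p.argmax(dim=1))                     # predicted classes
```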
A System for Accessible Artificial Intelligence
Title | A System for Accessible Artificial Intelligence |
Authors | Randal S. Olson, Moshe Sipper, William La Cava, Sharon Tartarone, Steven Vitale, Weixuan Fu, Patryk Orzechowski, Ryan J. Urbanowicz, John H. Holmes, Jason H. Moore |
Abstract | While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects. |
Tasks | |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00594v2 |
PDF | http://arxiv.org/pdf/1705.00594v2.pdf |
PWC | https://paperswithcode.com/paper/a-system-for-accessible-artificial |
Repo | https://github.com/EpistasisLab/pennai |
Framework | none |
Structured Generative Adversarial Networks
Title | Structured Generative Adversarial Networks |
Authors | Zhijie Deng, Hao Zhang, Xiaodan Liang, Luona Yang, Shizhen Xu, Jun Zhu, Eric P. Xing |
Abstract | We study the problem of conditional generative modeling based on designated semantics or structures. Existing models that build conditional generators either require massive labeled instances as supervision or are unable to accurately control the semantics of generated samples. We propose structured generative adversarial networks (SGANs) for semi-supervised conditional generative modeling. SGAN assumes the data x is generated conditioned on two independent latent variables: y, which encodes the designated semantics, and z, which contains other factors of variation. To ensure disentangled semantics in y and z, SGAN builds two collaborative games in the hidden space to minimize the reconstruction error of y and z, respectively. Training SGAN also involves solving two adversarial games whose equilibria concentrate at the true joint data distributions p(x, z) and p(x, y), avoiding the diffuse spread of probability mass over the data space from which MLE-based methods may suffer. We assess SGAN by evaluating its trained networks and its performance on downstream tasks. We show that SGAN delivers a highly controllable generator and disentangled representations; it also establishes state-of-the-art results across multiple datasets when applied to semi-supervised image classification (1.27%, 5.73% and 17.26% error rates on MNIST, SVHN and CIFAR-10 using 50, 1000 and 4000 labels, respectively). Benefiting from the separate modeling of y and z, SGAN can generate images of high visual quality that strictly follow the designated semantics, and can be extended to a wide spectrum of applications, such as style transfer. |
Tasks | Image Classification, Semi-Supervised Image Classification, Style Transfer |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00889v1 |
PDF | http://arxiv.org/pdf/1711.00889v1.pdf |
PWC | https://paperswithcode.com/paper/structured-generative-adversarial-networks |
Repo | https://github.com/thudzj/StructuredGAN |
Framework | none |
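The factored latent code can be made concrete: the generator consumes a semantic code y (a one-hot label here) and a nuisance code z, while two inference networks reconstruct y and z from the generated sample — the collaborative games the abstract mentions. The adversarial games are omitted, and all sizes and MLP architectures are assumptions:

```python
# Sketch of SGAN's factored code and the two reconstruction games.
import torch
import torch.nn as nn
import torch.nn.functional as F

y_dim, z_dim, x_dim = 10, 64, 784
G = nn.Sequential(nn.Linear(y_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
infer_y = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, y_dim))
infer_z = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))

y = F.one_hot(torch.randint(0, y_dim, (16,)), y_dim).float()  # semantic code
z = torch.randn(16, z_dim)                                    # other variation
x_fake = G(torch.cat([y, z], dim=1))
# Collaborative games: reconstruct y and z from the generated sample.
loss_y = F.cross_entropy(infer_y(x_fake), y.argmax(dim=1))
loss_z = F.mse_loss(infer_z(x_fake), z)
(loss_y + loss_z).backward()
```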
Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation
Title | Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation |
Authors | Satoshi Tsutsui, David Crandall |
Abstract | Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a multilingual captioning system. We propose a very simple technique to build a single unified model across languages, using artificial tokens to control the language, making the captioning system more compact. We evaluate our approach on generating English and Japanese captions, and show that a typical neural captioning architecture is capable of learning a single model that can switch between two different languages. |
Tasks | |
Published | 2017-06-20 |
URL | http://arxiv.org/abs/1706.06275v1 |
PDF | http://arxiv.org/pdf/1706.06275v1.pdf |
PWC | https://paperswithcode.com/paper/using-artificial-tokens-to-control-languages |
Repo | https://github.com/chetangr/Speech-Image-Captioning |
Framework | none |
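The technique itself amounts to prepending a language token to each target caption so that one decoder learns to switch languages. A toy preprocessing sketch; the token spellings are illustrative assumptions:

```python
# Prepend an artificial language token to each target caption.
def add_language_token(caption_tokens, lang):
    return ["<2{}>".format(lang)] + caption_tokens

print(add_language_token(["a", "dog", "runs"], "en"))
# -> ['<2en>', 'a', 'dog', 'runs']
print(add_language_token(["犬", "が", "走る"], "ja"))
# -> ['<2ja>', '犬', 'が', '走る']
```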
Exploring the similarity of medical imaging classification problems
Title | Exploring the similarity of medical imaging classification problems |
Authors | Veronika Cheplygina, Pim Moeskops, Mitko Veta, Behdad Dasht Bozorg, Josien Pluim |
Abstract | Supervised learning is ubiquitous in medical image analysis. In this paper we consider the problem of meta-learning – predicting which methods will perform well in an unseen classification problem, given previous experience with other classification problems. We investigate the first step of such an approach: how to quantify the similarity of different classification problems. We characterize datasets sampled from six classification problems by performance ranks of simple classifiers, and define the similarity by the inverse of the Euclidean distance in this meta-feature space. We visualize the similarities in a 2D space, where meaningful clusters start to emerge, and show that the proposed representation can be used to classify datasets according to their origin with 89.3% accuracy. These findings, together with the observations of recent trends in machine learning, suggest that meta-learning could be a valuable tool for the medical imaging community. |
Tasks | Meta-Learning |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03509v1 |
PDF | http://arxiv.org/pdf/1706.03509v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-the-similarity-of-medical-imaging |
Repo | https://github.com/tueimage/similarity-medical-2017 |
Framework | none |
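The meta-feature construction is easy to sketch: represent each dataset by the performance ranks of a fixed pool of simple classifiers and compare rank vectors by inverse Euclidean distance. The accuracies below are made-up inputs, and the +1 in the denominator is an assumption added to avoid division by zero:

```python
# Dataset similarity from classifier performance ranks.
import numpy as np
from scipy.stats import rankdata

# Rows: datasets; columns: accuracies of the same simple classifiers on each.
acc = np.array([[0.81, 0.74, 0.90, 0.66],
                [0.79, 0.72, 0.88, 0.65],
                [0.60, 0.85, 0.55, 0.91]])
ranks = np.vstack([rankdata(-row) for row in acc])   # rank 1 = best classifier

def similarity(a, b):
    return 1.0 / (1.0 + np.linalg.norm(a - b))       # inverse distance in rank space

print(similarity(ranks[0], ranks[1]))  # near-identical rank profiles -> high
print(similarity(ranks[0], ranks[2]))  # dissimilar problems -> low
```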