Paper Group AWR 54
Persistence Diagrams with Linear Machine Learning Models
Title | Persistence Diagrams with Linear Machine Learning Models |
Authors | Ippei Obayashi, Yasuaki Hiraoka |
Abstract | Persistence diagrams have been widely recognized as a compact descriptor for characterizing multiscale topological features in data. When many datasets are available, statistical features embedded in those persistence diagrams can be extracted by applying machine learning. In particular, the ability to explicitly analyze the inverse map from those statistical features back into the original data space is of significant importance for practical applications. In this paper, we propose a unified method for the inverse analysis by combining linear machine learning models with persistence images. The method is applied to point clouds and cubical sets, showing the ability of the statistical inverse analysis and its advantages. |
Tasks | |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10082v2 |
http://arxiv.org/pdf/1706.10082v2.pdf | |
PWC | https://paperswithcode.com/paper/persistence-diagrams-with-linear-machine |
Repo | https://github.com/scikit-tda/persim |
Framework | none |
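A minimal sketch of the pipeline this abstract describes: rasterize diagrams into persistence images, fit a linear model, and read the learned weights back as a heatmap over birth-persistence space (the inverse analysis). This is self-contained toy code, not the paper's implementation; in practice the diagrams would come from a TDA library such as the linked persim/ripser tools.

```python
# Toy persistence-image + linear-model pipeline; all data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

def persistence_image(diagram, grid=20, sigma=0.1):
    """Rasterize (birth, death) pairs on [0, 1]^2 into a persistence image:
    one Gaussian bump per point, weighted by its persistence."""
    xs = np.linspace(0.0, 1.0, grid)
    bb, pp = np.meshgrid(xs, xs)            # birth / persistence axes
    img = np.zeros((grid, grid))
    for birth, death in diagram:
        pers = death - birth
        img += pers * np.exp(-((bb - birth) ** 2 + (pp - pers) ** 2)
                             / (2 * sigma ** 2))
    return img.ravel()

rng = np.random.default_rng(0)
# Toy data: class 0 has short-lived features, class 1 has persistent ones.
diagrams = [[(b, b + 0.1 + 0.4 * y) for b in rng.uniform(0, 0.5, 10)]
            for y in (0, 1) for _ in range(50)]
labels = np.repeat([0, 1], 50)

X = np.array([persistence_image(d) for d in diagrams])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Inverse analysis: the linear weights form a heatmap over (birth,
# persistence) space, showing which topological features drive the decision.
weight_map = clf.coef_.reshape(20, 20)
print(weight_map.shape)
```

Because the model is linear, each pixel of `weight_map` maps directly back to a region of the diagram, which is exactly what makes the inverse analysis explicit.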
Memory-Efficient Implementation of DenseNets
Title | Memory-Efficient Implementation of DenseNets |
Authors | Geoff Pleiss, Danlu Chen, Gao Huang, Tongcheng Li, Laurens van der Maaten, Kilian Q. Weinberger |
Abstract | The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization and contiguous convolution operations can produce feature maps that grow quadratically with network depth. In this technical report, we introduce strategies to reduce the memory consumption of DenseNets during training. By strategically using shared memory allocations, we reduce the memory cost for storing feature maps from quadratic to linear. Without the GPU memory bottleneck, it is now possible to train extremely deep DenseNets. Networks with 14M parameters can be trained on a single GPU, up from 4M. A 264-layer DenseNet (73M parameters), which previously would have been infeasible to train, can now be trained on a single workstation with 8 NVIDIA Tesla M40 GPUs. On the ImageNet ILSVRC classification dataset, this large DenseNet obtains a state-of-the-art single-crop top-1 error of 20.26%. |
Tasks | |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06990v1 |
http://arxiv.org/pdf/1707.06990v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-efficient-implementation-of-densenets |
Repo | https://github.com/facebookresearch/ResNeXt |
Framework | torch |
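The memory-saving idea is, in stock PyTorch terms, gradient checkpointing: the cheap concatenation + batch-norm + ReLU outputs of each dense layer are recomputed during the backward pass instead of being cached, so feature-map storage grows linearly rather than quadratically with depth. The official implementation uses shared memory allocations; the sketch below uses `torch.utils.checkpoint` as the closest equivalent, with illustrative layer sizes.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=False)   # must not be in-place when recomputed
        self.conv = nn.Conv2d(in_ch, growth, 3, padding=1, bias=False)

    def forward(self, features):             # features: list of prior feature maps
        def bottleneck(*feats):
            return self.conv(self.relu(self.norm(torch.cat(feats, 1))))
        # Recompute concat/BN/ReLU on the backward pass instead of storing it.
        return checkpoint(bottleneck, *features, use_reentrant=False)

layers, growth, channels = [], 12, 24
for i in range(4):
    layers.append(DenseLayer(channels + i * growth, growth))

x = torch.randn(2, channels, 32, 32, requires_grad=True)
feats = [x]
for layer in layers:
    feats.append(layer(feats))
out = torch.cat(feats, 1)
print(out.shape)  # torch.Size([2, 72, 32, 32])
```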
Modular Multi-Objective Deep Reinforcement Learning with Decision Values
Title | Modular Multi-Objective Deep Reinforcement Learning with Decision Values |
Authors | Tomasz Tajmajer |
Abstract | In this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks provide remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics and games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent’s behaviour with respect to particular objectives. In this architecture we introduce decision values to improve the scalarization of multiple DQNs into a single action. Our architecture enables the decomposition of the agent’s behaviour into controllable and replaceable sub-behaviours learned by distinct modules. Moreover, it allows the priorities of particular objectives to be changed post-learning, while preserving the overall performance of the agent. To evaluate our solution we used a game-like simulator in which an agent - provided with high-level visual input - pursues multiple objectives in a 2D world. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06676v2 |
http://arxiv.org/pdf/1704.06676v2.pdf | |
PWC | https://paperswithcode.com/paper/modular-multi-objective-deep-reinforcement |
Repo | https://github.com/ttajmajer/morl-dv |
Framework | tf |
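A minimal numpy sketch of the decision-value scalarization described in the abstract: one Q-function per objective, each scaled by a learned decision value and a user-set priority, combined into a single action choice. The networks are stubbed out with random linear maps; all names here are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, state_dim, n_objectives = 4, 8, 3

# Stand-ins for per-objective DQNs: state -> Q-values over actions.
q_nets = [rng.normal(size=(state_dim, n_actions)) for _ in range(n_objectives)]
# Decision values: how strongly each module wants to act in this state
# (learned alongside the DQN in the paper; here a stub state -> scalar map).
d_nets = [rng.normal(size=state_dim) for _ in range(n_objectives)]

def select_action(state, priorities):
    votes = np.zeros(n_actions)
    for q, d, w in zip(q_nets, d_nets, priorities):
        q_vals = state @ q
        decision = 1 / (1 + np.exp(-(state @ d)))   # squash to (0, 1)
        # Each module's centered preferences are weighted by its decision
        # value and by a post-learning priority.
        votes += w * decision * (q_vals - q_vals.mean())
    return int(np.argmax(votes))

state = rng.normal(size=state_dim)
print(select_action(state, priorities=[1.0, 1.0, 1.0]))
print(select_action(state, priorities=[2.0, 0.5, 1.0]))  # re-prioritized
```

Changing `priorities` after training is what the abstract means by adjusting objectives post-learning: the modules stay fixed and only the scalarization changes.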
Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
Title | Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph |
Authors | Cong Fu, Chao Xiang, Changxu Wang, Deng Cai |
Abstract | Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still cannot scale to billion-node databases. In this paper, to further improve the search efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the e-commerce search scenario of Taobao (Alibaba Group) and has been integrated into their search engine at billion-node scale. |
Tasks | |
Published | 2017-07-01 |
URL | http://arxiv.org/abs/1707.00143v9 |
http://arxiv.org/pdf/1707.00143v9.pdf | |
PWC | https://paperswithcode.com/paper/fast-approximate-nearest-neighbor-search-with |
Repo | https://github.com/ZJULearning/nsg |
Framework | none |
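A minimal sketch of greedy best-first search on a pre-built proximity graph, the query-time procedure that NSG-style indices rely on. The graph here is a toy k-NN adjacency list; building the actual NSG/MRNG index is the paper's contribution and is not reproduced.

```python
import heapq
import numpy as np

def search(graph, points, query, start, k=5, ef=20):
    """graph: {node: [neighbor ids]}; ef controls the candidate pool size."""
    dist = lambda i: float(np.linalg.norm(points[i] - query))
    visited = {start}
    candidates = [(dist(start), start)]          # min-heap of frontier nodes
    results = [(-dist(start), start)]            # max-heap of best-so-far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break                                # frontier can't improve results
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(candidates, (dist(nb), nb))
                heapq.heappush(results, (-dist(nb), nb))
                if len(results) > ef:
                    heapq.heappop(results)       # drop current worst
    return sorted((-d, n) for d, n in results)[:k]

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 16))
# Toy graph: connect each point to its 8 nearest neighbors (brute force).
nn = np.argsort(((points[:, None] - points[None]) ** 2).sum(-1), axis=1)
graph = {i: list(nn[i, 1:9]) for i in range(200)}
print(search(graph, points, rng.normal(size=16), start=0))
```

The four design aspects in the abstract all serve this loop: connectivity guarantees the walk can reach every point, and low out-degree plus short paths bound how many distance computations it performs.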
Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures
Title | Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures |
Authors | Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler |
Abstract | Ground-based whole sky imagers are popular for monitoring cloud formations, which is necessary for various applications. We present two new Wide Angle High-Resolution Sky Imaging System (WAHRSIS) models, which were designed especially to withstand the hot and humid climate of Singapore. The first uses a fully sealed casing, whose interior temperature is regulated using a Peltier cooler. The second features a double roof design with ventilation grids on the sides, allowing the outside air to flow through the device. Measurements of temperature inside these two devices show their ability to operate in Singapore weather conditions. Unlike our original WAHRSIS model, neither uses a mechanical sun blocker to prevent the direct sunlight from reaching the camera; instead they rely on high-dynamic-range imaging (HDRI) techniques to reduce the glare from the sun. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05678v1 |
http://arxiv.org/pdf/1704.05678v1.pdf | |
PWC | https://paperswithcode.com/paper/design-of-low-cost-compact-and-weather-proof |
Repo | https://github.com/Soumyabrata/HDRCaptures |
Framework | none |
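A minimal sketch of the HDR step the imagers rely on: merging a bracketed exposure stack so circumsolar glare is tamed without a mechanical sun blocker. OpenCV's Mertens exposure fusion needs no exposure-time metadata; the file names below are placeholders.

```python
import cv2
import numpy as np

# Load a low/medium/high exposure bracket captured back-to-back.
stack = [cv2.imread(f) for f in ("sky_low.jpg", "sky_mid.jpg", "sky_high.jpg")]

merger = cv2.createMergeMertens()
fused = merger.process(stack)                  # float32 image in [0, 1]

result = np.clip(fused * 255, 0, 255).astype(np.uint8)
cv2.imwrite("sky_hdr.jpg", result)
```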
Dataset for a Neural Natural Language Interface for Databases (NNLIDB)
Title | Dataset for a Neural Natural Language Interface for Databases (NNLIDB) |
Authors | Florin Brad, Radu Iacob, Ionel Hosu, Traian Rebedea |
Abstract | Progress in natural language interfaces to databases (NLIDB) has been slow mainly due to linguistic issues (such as language ambiguity) and domain portability. Moreover, the lack of a large corpus to be used as a standard benchmark has made data-driven approaches difficult to develop and compare. In this paper, we revisit the problem of NLIDBs and recast it as a sequence translation problem. To this end, we introduce a large dataset extracted from the Stack Exchange Data Explorer website, which can be used for training neural natural language interfaces for databases. We also report encouraging baseline results on a smaller manually annotated test corpus, obtained using an attention-based sequence-to-sequence neural network. |
Tasks | |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03172v1 |
http://arxiv.org/pdf/1707.03172v1.pdf | |
PWC | https://paperswithcode.com/paper/dataset-for-a-neural-natural-language |
Repo | https://github.com/fbrad/text2sql |
Framework | none |
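The paper's recasting of NLIDB as sequence translation amounts to treating (question, SQL) pairs as a parallel corpus. A minimal sketch of that data framing, with illustrative pairs rather than entries from the released Stack Exchange dataset:

```python
import re

pairs = [
    ("show all users from london",
     "SELECT * FROM users WHERE location = 'london'"),
    ("how many posts have more than 10 votes",
     "SELECT COUNT(*) FROM posts WHERE votes > 10"),
]

def tokenize(text):
    # Split on words, numbers, and SQL punctuation so both sides of the
    # corpus share one simple tokenization scheme.
    return re.findall(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]", text)

source = [tokenize(q.lower()) for q, _ in pairs]
target = [["<s>"] + tokenize(sql) + ["</s>"] for _, sql in pairs]

vocab = {tok for seq in source + target for tok in seq}
print(len(vocab), source[0], target[0], sep="\n")
```

Once framed this way, any off-the-shelf attention-based sequence-to-sequence model (the paper's baseline) can be trained on `source`/`target` directly.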
Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation
Title | Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation |
Authors | Kota Hara, Raviteja Vemulapalli, Rama Chellappa |
Abstract | Deep Convolutional Neural Networks (DCNNs) have been proven effective for various computer vision problems. In this work, we demonstrate their effectiveness on a continuous object orientation estimation task, which requires predicting object orientation in the full 0 to 360 degree range. We do so by proposing and comparing three continuous orientation prediction approaches designed for DCNNs. The first two approaches work by representing an orientation as a point on a unit circle and minimizing either an L2 loss or an angular difference loss. The third method works by first converting the continuous orientation estimation task into a set of discrete orientation estimation tasks and then converting the discrete orientation outputs back to a continuous orientation using a mean-shift algorithm. By evaluating on a vehicle orientation estimation task and a pedestrian orientation estimation task, we demonstrate that the discretization-based approach not only works better than the other two approaches but also achieves state-of-the-art performance. We also demonstrate that finding an appropriate feature representation is critical to achieving good performance when adapting a DCNN trained for an image recognition task. |
Tasks | |
Published | 2017-02-06 |
URL | http://arxiv.org/abs/1702.01499v1 |
http://arxiv.org/pdf/1702.01499v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-deep-convolutional-neural-networks |
Repo | https://github.com/kevinzakka/angle-pred |
Framework | pytorch |
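A minimal numpy sketch of the best-performing idea in the abstract: predict scores over discrete angular bins, then recover a continuous angle by running mean shift with a circular kernel over the bin centers. The bin scores below are synthetic stand-ins for a CNN's softmax output.

```python
import numpy as np

def mean_shift_angle(bin_scores, bandwidth_deg=20.0, iters=30):
    n = len(bin_scores)
    centers = np.arange(n) * (360.0 / n)              # bin centers in degrees
    theta = centers[np.argmax(bin_scores)]            # start at the mode
    bw = np.radians(bandwidth_deg)
    for _ in range(iters):
        # Angular differences wrapped to [-180, 180).
        diff = (centers - theta + 180.0) % 360.0 - 180.0
        w = bin_scores * np.exp(-0.5 * (np.radians(diff) / bw) ** 2)
        theta = (theta + (w @ diff) / w.sum()) % 360.0
    return theta

# Fake CNN output: probability mass around 350 deg, wrapping past 0.
scores = np.full(36, 1e-3)
scores[[34, 35, 0]] = [0.2, 0.5, 0.3]
print(mean_shift_angle(scores))   # ~351 deg, despite the wrap-around
```

The wrapped difference is the key detail: a naive weighted average of bin centers near 0/360 degrees would land around 180, which is why the decoding needs mean shift on the circle rather than on the line.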
Learning to Generate Chairs with Generative Adversarial Nets
Title | Learning to Generate Chairs with Generative Adversarial Nets |
Authors | Evgeny Zamyatin, Andrey Filchenkov |
Abstract | Generative adversarial networks (GANs) have gained tremendous popularity lately due to their ability to reinforce the quality of the predictive model with generated objects and the quality of the generative model with supervised feedback. GANs make it possible to synthesize images with a high degree of realism. However, the learning process of such models is a very complicated optimization problem, and certain limitations of such models have been identified. These limitations constrain the choice of layers and nonlinearities when designing architectures; in particular, they prevent training convolutional GAN models with fully-connected hidden layers. In our work, we propose a modification of the previously described set of rules, as well as new approaches to designing architectures, that allow us to train more powerful GAN models. We show the effectiveness of our methods on the problem of synthesizing projections of 3D objects with the possibility of interpolation by class and viewpoint. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10413v1 |
http://arxiv.org/pdf/1705.10413v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-generate-chairs-with-generative |
Repo | https://github.com/EvgenyZamyatin/chair-gan-code |
Framework | none |
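A minimal PyTorch sketch in the spirit of the abstract: a conditional generator that embeds noise, a class label, and a viewpoint angle, passes them through fully-connected hidden layers (the kind of block the revised design rules aim to make trainable), then upsamples with transposed convolutions. Layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChairGenerator(nn.Module):
    def __init__(self, n_classes=10, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 16)
        self.fc = nn.Sequential(                  # fully-connected hidden layers
            nn.Linear(z_dim + 16 + 2, 256), nn.ReLU(),
            nn.Linear(256, 128 * 8 * 8), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, cls, view_rad):
        # Encode the viewpoint (radians) as (sin, cos) so 0 and 2*pi coincide,
        # which is what makes interpolation over view point smooth.
        view = torch.stack([torch.sin(view_rad), torch.cos(view_rad)], dim=1)
        h = self.fc(torch.cat([z, self.embed(cls), view], dim=1))
        return self.deconv(h.view(-1, 128, 8, 8))

g = ChairGenerator()
img = g(torch.randn(4, 64), torch.tensor([0, 1, 2, 3]),
        torch.deg2rad(torch.tensor([0.0, 90.0, 180.0, 270.0])))
print(img.shape)  # torch.Size([4, 3, 32, 32])
```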
Meta Learning Shared Hierarchies
Title | Meta Learning Shared Hierarchies |
Authors | Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman |
Abstract | We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives—policies that are executed for large numbers of timesteps. Specifically, a set of primitives are shared within a distribution of tasks, and are switched between by task-specific policies. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy. |
Tasks | Legged Robots, Meta-Learning |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09767v1 |
http://arxiv.org/pdf/1710.09767v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-shared-hierarchies |
Repo | https://github.com/dsapandora/s_cera |
Framework | tf |
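A minimal sketch of the MLSH control loop: a task-specific master policy picks one of the shared primitives every N timesteps, and only the master is reset when a new task is sampled. The policies are random stubs standing in for trained networks, and the environment interface is a gym-like assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, n_primitives, N = 8, 4, 3, 10

primitives = [rng.normal(size=(obs_dim, n_actions))   # shared across tasks
              for _ in range(n_primitives)]

def rollout(env_step, obs, master, steps=100):
    total = 0.0
    for t in range(steps):
        if t % N == 0:                        # master acts on a slow timescale
            k = int(np.argmax(obs @ master))
        action = int(np.argmax(obs @ primitives[k]))
        obs, reward = env_step(obs, action)
        total += reward
    return total

def toy_env_step(obs, action):                # stand-in for a real environment
    return rng.normal(size=obs_dim), float(action == 0)

for task in range(3):
    master = rng.normal(size=(obs_dim, n_primitives))  # reset per task
    ret = rollout(toy_env_step, rng.normal(size=obs_dim), master)
    print(f"task {task}: return = {ret:.1f}")
```

The slow master timescale is the point of the architecture: because primitives run for many steps, the master faces a short-horizon problem on each new task, which is the source of the sample-efficiency gains.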
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
Title | PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume |
Authors | Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz |
Abstract | We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our models are available on https://github.com/NVlabs/PWC-Net. |
Tasks | Dense Pixel Correspondence Estimation, Optical Flow Estimation |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02371v3 |
http://arxiv.org/pdf/1709.02371v3.pdf | |
PWC | https://paperswithcode.com/paper/pwc-net-cnns-for-optical-flow-using-pyramid |
Repo | https://github.com/yanqi1811/PWC-Net |
Framework | pytorch |
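A minimal PyTorch sketch of PWC-Net's two core operations at one pyramid level: warping the second image's features by the current flow estimate, and building a correlation cost volume over a small displacement range. Shapes and the search radius are illustrative.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp feat (B,C,H,W) by flow (B,2,H,W) in pixels."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    grid_x = (xs + flow[:, 0]) / (W - 1) * 2 - 1    # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (H - 1) * 2 - 1
    return F.grid_sample(feat, torch.stack([grid_x, grid_y], dim=-1),
                         align_corners=True)

def cost_volume(f1, f2_warped, radius=3):
    """Correlation between f1 and shifted copies of the warped f2."""
    vols = []
    f2_pad = F.pad(f2_warped, [radius] * 4)
    _, _, H, W = f1.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = f2_pad[:, :, dy:dy + H, dx:dx + W]
            vols.append((f1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(vols, dim=1)        # (B, (2r+1)^2, H, W)

f1, f2 = torch.randn(1, 32, 24, 32), torch.randn(1, 32, 24, 32)
flow = torch.zeros(1, 2, 24, 32)         # the current (here: zero) estimate
cv = cost_volume(f1, warp(f2, flow))
print(cv.shape)                           # torch.Size([1, 49, 24, 32])
```

Warping first is what keeps the cost volume small: after compensating with the current estimate, only a small residual displacement range needs to be searched at each level.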
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Title | VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation |
Authors | Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong |
Abstract | Rich and dense human labeled datasets are among the main enabling factors for the recent advance on vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding about the same visual scenes — and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer more effective leverage to existing problems, and also open the door for new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task by simply augmenting the multilayer perceptrons with some attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method assuming that the instance segmentations are given at the test stage. |
Tasks | Language Modelling, Question Answering, Semantic Segmentation, Visual Question Answering |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04686v1 |
http://arxiv.org/pdf/1708.04686v1.pdf | |
PWC | https://paperswithcode.com/paper/vqs-linking-segmentations-to-questions-and |
Repo | https://github.com/Cold-Winter/vqs |
Framework | caffe2 |
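A minimal PyTorch sketch of the supervised-attention idea: the segmentation mask linked to a question-answer pair is downsampled to the attention grid and used as an explicit target for the model's attention map. The attention logits here are random stand-ins for a VQA model's internals.

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn_logits, mask):
    """attn_logits: (B, H*W); mask: (B, 1, Hm, Wm) binary instance mask."""
    B, HW = attn_logits.shape
    side = int(HW ** 0.5)
    # Downsample the mask to the attention grid; normalize to a distribution.
    target = F.adaptive_avg_pool2d(mask.float(), side).view(B, -1)
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1e-8)
    log_attn = F.log_softmax(attn_logits, dim=1)
    return F.kl_div(log_attn, target, reduction="batchmean")

attn_logits = torch.randn(2, 14 * 14, requires_grad=True)
mask = torch.zeros(2, 1, 224, 224)
mask[:, :, 64:160, 64:160] = 1.0          # the QA-linked instance region
loss = attention_supervision_loss(attn_logits, mask)
loss.backward()
print(loss.item())
```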
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments
Title | Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments |
Authors | Oier Mees, Andreas Eitel, Wolfram Burgard |
Abstract | Object detection is an essential task for autonomous robots operating in dynamic and changing environments. A robot should be able to detect objects in the presence of sensor noise that can be induced by changing lighting conditions for cameras and false depth readings for range sensors, especially RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion approach for object detection that learns to weight the predictions of different sensor modalities in an online manner. Our approach is based on a mixture of convolutional neural network (CNN) experts and incorporates multiple modalities including appearance, depth and motion. We test our method in extensive robot experiments, in which we detect people in a combined indoor and outdoor scenario from RGB-D data, and we demonstrate that our method can adapt to harsh lighting changes and severe camera motion blur. Furthermore, we present a new RGB-D dataset for people detection in mixed in- and outdoor environments, recorded with a mobile robot. Code, pretrained models and dataset are available at http://adaptivefusion.cs.uni-freiburg.de |
Tasks | Object Detection |
Published | 2017-07-18 |
URL | https://arxiv.org/abs/1707.05733v2 |
https://arxiv.org/pdf/1707.05733v2.pdf | |
PWC | https://paperswithcode.com/paper/choosing-smartly-adaptive-multimodal-fusion |
Repo | https://github.com/mees/deep_adaptive_fusion |
Framework | none |
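A minimal PyTorch sketch of the adaptive fusion idea: a small gating network looks at each modality expert's output and produces per-expert weights, so a noisy modality (e.g. RGB under harsh lighting) can be down-weighted online. The experts are stubs, not the paper's CNNs.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, n_experts=3, n_classes=2):
        super().__init__()
        # The gating net scores each expert from the experts' own predictions.
        self.gate = nn.Sequential(
            nn.Linear(n_experts * n_classes, 32), nn.ReLU(),
            nn.Linear(32, n_experts),
        )

    def forward(self, expert_logits):             # (B, n_experts, n_classes)
        B, E, C = expert_logits.shape
        weights = torch.softmax(self.gate(expert_logits.reshape(B, E * C)), dim=1)
        fused = (weights.unsqueeze(-1) * expert_logits).sum(dim=1)
        return fused, weights

# Stand-ins for appearance / depth / motion expert detections.
logits = torch.randn(4, 3, 2)
fusion = AdaptiveFusion()
fused, weights = fusion(logits)
print(fused.shape, weights[0])   # per-sample expert weighting
```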
Inhomogeneous Hypergraph Clustering with Applications
Title | Inhomogeneous Hypergraph Clustering with Applications |
Authors | Pan Li, Olgica Milenkovic |
Abstract | Hypergraph partitioning is an important problem in machine learning, computer vision and network analytics. A widely used method for hypergraph partitioning relies on minimizing a normalized sum of the costs of partitioning hyperedges across clusters. Algorithmic solutions based on this approach assume that different partitions of a hyperedge incur the same cost. However, this assumption fails to leverage the fact that different subsets of vertices within the same hyperedge may have different structural importance. We hence propose a new hypergraph clustering technique, termed inhomogeneous hypergraph partitioning, which assigns different costs to different hyperedge cuts. We prove that inhomogeneous partitioning produces a quadratic approximation to the optimal solution if the inhomogeneous costs satisfy submodularity constraints. Moreover, we demonstrate that inhomogeneous partitioning offers significant performance improvements in applications such as structure learning of rankings, subspace segmentation and motif clustering. |
Tasks | Hypergraph Partitioning |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01249v4 |
http://arxiv.org/pdf/1709.01249v4.pdf | |
PWC | https://paperswithcode.com/paper/inhomogeneous-hypergraph-clustering-with |
Repo | https://github.com/lipan00123/InHclustering |
Framework | none |
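A minimal numpy sketch of the projection-based approach: each hyperedge is expanded into weighted clique edges, where a pair's weight reflects the (inhomogeneous) cost of cutting those two vertices apart, and the resulting graph is split with a standard spectral bipartition. The toy costs below are illustrative; the paper derives them from submodular hyperedge cost functions.

```python
import numpy as np

n = 6
hyperedges = [
    # (vertices, pairwise cut costs keyed by vertex pair)
    ((0, 1, 2), {(0, 1): 3.0, (0, 2): 3.0, (1, 2): 1.0}),
    ((2, 3, 4), {(2, 3): 0.5, (2, 4): 0.5, (3, 4): 4.0}),
    ((3, 4, 5), {(3, 4): 4.0, (3, 5): 2.0, (4, 5): 2.0}),
]

# Project hyperedges onto a weighted graph.
W = np.zeros((n, n))
for _, costs in hyperedges:
    for (u, v), c in costs.items():
        W[u, v] += c
        W[v, u] += c

# Spectral bipartition: sign of the Fiedler vector of the graph Laplacian.
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                 # eigenvector of 2nd-smallest eigenvalue
print(np.where(fiedler >= 0, 1, 0))     # cluster assignment per vertex
```

The inhomogeneity shows up in the per-pair costs: cutting vertices 3 and 4 apart is expensive while separating either from vertex 2 is cheap, so the bipartition splits {0, 1, 2} from {3, 4, 5}.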
Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network
Title | Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network |
Authors | Xiangyong Cao, Feng Zhou, Lin Xu, Deyu Meng, Zongben Xu, John Paisley |
Abstract | This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient descent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00727v2 |
http://arxiv.org/pdf/1705.00727v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-image-classification-with-1 |
Repo | https://github.com/xiangyongcao/CNN_HSIC_MRF |
Framework | tf |
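A minimal numpy sketch of the spectral-spatial idea: per-pixel class posteriors (here random stand-ins for the CNN's output) are combined with a Potts smoothness prior, and labels are updated iteratively. The paper uses alpha-expansion min-cut for this step; iterated conditional modes (ICM) below is a simpler stand-in for that inference.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K, beta = 32, 32, 4, 1.5           # beta = smoothness strength

log_post = np.log(rng.dirichlet(np.ones(K), size=(H, W)))  # CNN log-posteriors
labels = log_post.argmax(axis=-1)

for _ in range(5):                        # ICM sweeps
    for i in range(H):
        for j in range(W):
            # Count neighbor agreement for each candidate class.
            agree = np.zeros(K)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    agree[labels[ni, nj]] += 1
            # Posterior term + Potts prior term, maximized per pixel.
            labels[i, j] = np.argmax(log_post[i, j] + beta * agree)

print(np.bincount(labels.ravel(), minlength=K))
```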
Geometric features for voxel-based surface recognition
Title | Geometric features for voxel-based surface recognition |
Authors | Dmitry Yarotsky |
Abstract | We introduce a library of geometric voxel features for CAD surface recognition/retrieval tasks. Our features include local versions of the intrinsic volumes (the usual 3D volume, surface area, integrated mean and Gaussian curvature) and a few closely related quantities. We also compute Haar wavelet and statistical distribution features by aggregating raw voxel features. We apply our features to object classification on the ESB data set and demonstrate accurate results with a small number of shallow decision trees. |
Tasks | Object Classification |
Published | 2017-01-16 |
URL | http://arxiv.org/abs/1701.04249v1 |
http://arxiv.org/pdf/1701.04249v1.pdf | |
PWC | https://paperswithcode.com/paper/geometric-features-for-voxel-based-surface |
Repo | https://github.com/yarotsky/voxelfeatures |
Framework | none |
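A minimal numpy sketch of two of the simplest features in such a library: the volume of a voxelized solid (its voxel count) and its surface area (its count of exposed faces), computed on a toy voxel ball. The curvature-based intrinsic volumes in the paper require more careful local configuration counts and are not reproduced here.

```python
import numpy as np

def volume_and_surface(vox):
    """vox: 3D boolean occupancy grid."""
    volume = int(vox.sum())
    surface = 0
    padded = np.pad(vox, 1)
    for axis in range(3):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)
            # A face is exposed where a filled voxel meets an empty neighbor.
            surface += int((padded & ~neighbor).sum())
    return volume, surface

# Toy solid: a voxelized ball of radius 8.
idx = np.indices((20, 20, 20)) - 9.5
ball = (idx ** 2).sum(axis=0) <= 8 ** 2
print(volume_and_surface(ball))   # (volume in voxels, exposed faces)
```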