Paper Group AWR 54
Persistence Diagrams with Linear Machine Learning Models
Title | Persistence Diagrams with Linear Machine Learning Models |
Authors | Ippei Obayashi, Yasuaki Hiraoka |
Abstract | Persistence diagrams have been widely recognized as a compact descriptor for characterizing multiscale topological features in data. When many datasets are available, statistical features embedded in those persistence diagrams can be extracted by applying machine learning. In particular, the ability to explicitly analyze the inverse map from those statistical features back into the original data space is of significant importance for practical applications. In this paper, we propose a unified method for the inverse analysis by combining linear machine learning models with persistence images. The method is applied to point clouds and cubical sets, showing the ability of the statistical inverse analysis and its advantages. |
Tasks | |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10082v2 |
http://arxiv.org/pdf/1706.10082v2.pdf | |
PWC | https://paperswithcode.com/paper/persistence-diagrams-with-linear-machine |
Repo | https://github.com/scikit-tda/persim |
Framework | none |
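A minimal sketch of the pipeline this abstract describes: rasterize diagrams into persistence images, fit a linear model, and read the learned weights back as a heatmap over birth-persistence space (the inverse analysis). This is self-contained toy code, not the paper's implementation; in practice the diagrams would come from a TDA library such as the linked persim/ripser tools.

```python
# Toy persistence-image + linear-model pipeline; all data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

def persistence_image(diagram, grid=20, sigma=0.1):
    """Rasterize (birth, death) pairs on [0, 1]^2 into a persistence image:
    one Gaussian bump per point, weighted by its persistence."""
    xs = np.linspace(0.0, 1.0, grid)
    bb, pp = np.meshgrid(xs, xs)            # birth / persistence axes
    img = np.zeros((grid, grid))
    for birth, death in diagram:
        pers = death - birth
        img += pers * np.exp(-((bb - birth) ** 2 + (pp - pers) ** 2)
                             / (2 * sigma ** 2))
    return img.ravel()

rng = np.random.default_rng(0)
# Toy data: class 0 has short-lived features, class 1 has persistent ones.
diagrams = [[(b, b + 0.1 + 0.4 * y) for b in rng.uniform(0, 0.5, 10)]
            for y in (0, 1) for _ in range(50)]
labels = np.repeat([0, 1], 50)

X = np.array([persistence_image(d) for d in diagrams])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Inverse analysis: the linear weights form a heatmap over (birth,
# persistence) space, showing which topological features drive the decision.
weight_map = clf.coef_.reshape(20, 20)
print(weight_map.shape)
```

Because the model is linear, each pixel of `weight_map` maps directly back to a region of the diagram, which is exactly what makes the inverse analysis explicit.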
Memory-Efficient Implementation of DenseNets
Title | Memory-Efficient Implementation of DenseNets |
Authors | Geoff Pleiss, Danlu Chen, Gao Huang, Tongcheng Li, Laurens van der Maaten, Kilian Q. Weinberger |
Abstract | The DenseNet architecture is highly computationally efficient as a result of feature reuse. However, a naive DenseNet implementation can require a significant amount of GPU memory: If not properly managed, pre-activation batch normalization and contiguous convolution operations can produce feature maps that grow quadratically with network depth. In this technical report, we introduce strategies to reduce the memory consumption of DenseNets during training. By strategically using shared memory allocations, we reduce the memory cost for storing feature maps from quadratic to linear. Without the GPU memory bottleneck, it is now possible to train extremely deep DenseNets. Networks with 14M parameters can be trained on a single GPU, up from 4M. A 264-layer DenseNet (73M parameters), which previously would have been infeasible to train, can now be trained on a single workstation with 8 NVIDIA Tesla M40 GPUs. On the ImageNet ILSVRC classification dataset, this large DenseNet obtains a state-of-the-art single-crop top-1 error of 20.26%. |
Tasks | |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06990v1 |
http://arxiv.org/pdf/1707.06990v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-efficient-implementation-of-densenets |
Repo | https://github.com/facebookresearch/ResNeXt |
Framework | torch |
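The memory-saving idea is, in stock PyTorch terms, gradient checkpointing: the cheap concatenation + batch-norm + ReLU outputs of each dense layer are recomputed during the backward pass instead of being cached, so feature-map storage grows linearly rather than quadratically with depth. The official implementation uses shared memory allocations; the sketch below uses `torch.utils.checkpoint` as the closest equivalent, with illustrative layer sizes.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=False)   # must not be in-place when recomputed
        self.conv = nn.Conv2d(in_ch, growth, 3, padding=1, bias=False)

    def forward(self, features):             # features: list of prior feature maps
        def bottleneck(*feats):
            return self.conv(self.relu(self.norm(torch.cat(feats, 1))))
        # Recompute concat/BN/ReLU on the backward pass instead of storing it.
        return checkpoint(bottleneck, *features, use_reentrant=False)

layers, growth, channels = [], 12, 24
for i in range(4):
    layers.append(DenseLayer(channels + i * growth, growth))

x = torch.randn(2, channels, 32, 32, requires_grad=True)
feats = [x]
for layer in layers:
    feats.append(layer(feats))
out = torch.cat(feats, 1)
print(out.shape)  # torch.Size([2, 72, 32, 32])
```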
Modular Multi-Objective Deep Reinforcement Learning with Decision Values
Title | Modular Multi-Objective Deep Reinforcement Learning with Decision Values |
Authors | Tomasz Tajmajer |
Abstract | In this work we present a method for using Deep Q-Networks (DQNs) in multi-objective environments. Deep Q-Networks provide remarkable performance in single-objective problems, learning from high-level visual state representations. However, in many scenarios (e.g., in robotics and games), the agent needs to pursue multiple objectives simultaneously. We propose an architecture in which separate DQNs are used to control the agent’s behaviour with respect to particular objectives. In this architecture we introduce decision values to improve the scalarization of multiple DQNs into a single action. Our architecture enables the decomposition of the agent’s behaviour into controllable and replaceable sub-behaviours learned by distinct modules. Moreover, it allows the priorities of particular objectives to be changed post-learning, while preserving the overall performance of the agent. To evaluate our solution we used a game-like simulator in which an agent - provided with high-level visual input - pursues multiple objectives in a 2D world. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06676v2 |
http://arxiv.org/pdf/1704.06676v2.pdf | |
PWC | https://paperswithcode.com/paper/modular-multi-objective-deep-reinforcement |
Repo | https://github.com/ttajmajer/morl-dv |
Framework | tf |
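A minimal numpy sketch of the decision-value scalarization described in the abstract: one Q-function per objective, each scaled by a learned decision value and a user-set priority, combined into a single action choice. The networks are stubbed out with random linear maps; all names here are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, state_dim, n_objectives = 4, 8, 3

# Stand-ins for per-objective DQNs: state -> Q-values over actions.
q_nets = [rng.normal(size=(state_dim, n_actions)) for _ in range(n_objectives)]
# Decision values: how strongly each module wants to act in this state
# (learned alongside the DQN in the paper; here a stub state -> scalar map).
d_nets = [rng.normal(size=state_dim) for _ in range(n_objectives)]

def select_action(state, priorities):
    votes = np.zeros(n_actions)
    for q, d, w in zip(q_nets, d_nets, priorities):
        q_vals = state @ q
        decision = 1 / (1 + np.exp(-(state @ d)))   # squash to (0, 1)
        # Each module's centered preferences are weighted by its decision
        # value and by a post-learning priority.
        votes += w * decision * (q_vals - q_vals.mean())
    return int(np.argmax(votes))

state = rng.normal(size=state_dim)
print(select_action(state, priorities=[1.0, 1.0, 1.0]))
print(select_action(state, priorities=[2.0, 0.5, 1.0]))  # re-prioritized
```

Changing `priorities` after training is what the abstract means by adjusting objectives post-learning: the modules stay fixed and only the scalarization changes.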
Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
Title | Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph |
Authors | Cong Fu, Chao Xiang, Changxu Wang, Deng Cai |
Abstract | Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still cannot scale to billion-node databases. In this paper, to further improve the search efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the e-commerce search scenario of Taobao (Alibaba Group) and has been integrated into their search engine at billion-node scale. |
Tasks | |
Published | 2017-07-01 |
URL | http://arxiv.org/abs/1707.00143v9 |
http://arxiv.org/pdf/1707.00143v9.pdf | |
PWC | https://paperswithcode.com/paper/fast-approximate-nearest-neighbor-search-with |
Repo | https://github.com/ZJULearning/nsg |
Framework | none |
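A minimal sketch of greedy best-first search on a pre-built proximity graph, the query-time procedure that NSG-style indices rely on. The graph here is a toy k-NN adjacency list; building the actual NSG/MRNG index is the paper's contribution and is not reproduced.

```python
import heapq
import numpy as np

def search(graph, points, query, start, k=5, ef=20):
    """graph: {node: [neighbor ids]}; ef controls the candidate pool size."""
    dist = lambda i: float(np.linalg.norm(points[i] - query))
    visited = {start}
    candidates = [(dist(start), start)]          # min-heap of frontier nodes
    results = [(-dist(start), start)]            # max-heap of best-so-far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break                                # frontier can't improve results
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(candidates, (dist(nb), nb))
                heapq.heappush(results, (-dist(nb), nb))
                if len(results) > ef:
                    heapq.heappop(results)       # drop current worst
    return sorted((-d, n) for d, n in results)[:k]

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 16))
# Toy graph: connect each point to its 8 nearest neighbors (brute force).
nn = np.argsort(((points[:, None] - points[None]) ** 2).sum(-1), axis=1)
graph = {i: list(nn[i, 1:9]) for i in range(200)}
print(search(graph, points, rng.normal(size=16), start=0))
```

The four design aspects in the abstract all serve this loop: connectivity guarantees the walk can reach every point, and low out-degree plus short paths bound how many distance computations it performs.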
Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures
Title | Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures |
Authors | Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler |
Abstract | Ground-based whole sky imagers are popular for monitoring cloud formations, which is necessary for various applications. We present two new Wide Angle High-Resolution Sky Imaging System (WAHRSIS) models, which were designed especially to withstand the hot and humid climate of Singapore. The first uses a fully sealed casing, whose interior temperature is regulated using a Peltier cooler. The second features a double roof design with ventilation grids on the sides, allowing the outside air to flow through the device. Measurements of temperature inside these two devices show their ability to operate in Singapore weather conditions. Unlike our original WAHRSIS model, neither uses a mechanical sun blocker to prevent the direct sunlight from reaching the camera; instead they rely on high-dynamic-range imaging (HDRI) techniques to reduce the glare from the sun. |
Tasks | |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05678v1 |
http://arxiv.org/pdf/1704.05678v1.pdf | |
PWC | https://paperswithcode.com/paper/design-of-low-cost-compact-and-weather-proof |
Repo | https://github.com/Soumyabrata/HDRCaptures |
Framework | none |
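A minimal sketch of the HDR step the imagers rely on: merging a bracketed exposure stack so circumsolar glare is tamed without a mechanical sun blocker. OpenCV's Mertens exposure fusion needs no exposure-time metadata; the file names below are placeholders.

```python
import cv2
import numpy as np

# Load a low/medium/high exposure bracket captured back-to-back.
stack = [cv2.imread(f) for f in ("sky_low.jpg", "sky_mid.jpg", "sky_high.jpg")]

merger = cv2.createMergeMertens()
fused = merger.process(stack)                  # float32 image in [0, 1]

result = np.clip(fused * 255, 0, 255).astype(np.uint8)
cv2.imwrite("sky_hdr.jpg", result)
```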
Dataset for a Neural Natural Language Interface for Databases (NNLIDB)
Title | Dataset for a Neural Natural Language Interface for Databases (NNLIDB) |
Authors | Florin Brad, Radu Iacob, Ionel Hosu, Traian Rebedea |
Abstract | Progress in natural language interfaces to databases (NLIDB) has been slow mainly due to linguistic issues (such as language ambiguity) and domain portability. Moreover, the lack of a large corpus to be used as a standard benchmark has made data-driven approaches difficult to develop and compare. In this paper, we revisit the problem of NLIDBs and recast it as a sequence translation problem. To this end, we introduce a large dataset extracted from the Stack Exchange Data Explorer website, which can be used for training neural natural language interfaces for databases. We also report encouraging baseline results on a smaller manually annotated test corpus, obtained using an attention-based sequence-to-sequence neural network. |
Tasks | |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03172v1 |
http://arxiv.org/pdf/1707.03172v1.pdf | |
PWC | https://paperswithcode.com/paper/dataset-for-a-neural-natural-language |
Repo | https://github.com/fbrad/text2sql |
Framework | none |
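The paper's recasting of NLIDB as sequence translation amounts to treating (question, SQL) pairs as a parallel corpus. A minimal sketch of that data framing, with illustrative pairs rather than entries from the released Stack Exchange dataset:

```python
import re

pairs = [
    ("show all users from london",
     "SELECT * FROM users WHERE location = 'london'"),
    ("how many posts have more than 10 votes",
     "SELECT COUNT(*) FROM posts WHERE votes > 10"),
]

def tokenize(text):
    # Split on words, numbers, and SQL punctuation so both sides of the
    # corpus share one simple tokenization scheme.
    return re.findall(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]", text)

source = [tokenize(q.lower()) for q, _ in pairs]
target = [["<s>"] + tokenize(sql) + ["</s>"] for _, sql in pairs]

vocab = {tok for seq in source + target for tok in seq}
print(len(vocab), source[0], target[0], sep="\n")
```

Once framed this way, any off-the-shelf attention-based sequence-to-sequence model (the paper's baseline) can be trained on `source`/`target` directly.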
Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation
Title | Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation |
Authors | Kota Hara, Raviteja Vemulapalli, Rama Chellappa |
Abstract | Deep Convolutional Neural Networks (DCNNs) have been proven effective for various computer vision problems. In this work, we demonstrate their effectiveness on a continuous object orientation estimation task, which requires predicting object orientation in the full 0 to 360 degree range. We do so by proposing and comparing three continuous orientation prediction approaches designed for DCNNs. The first two approaches work by representing an orientation as a point on a unit circle and minimizing either an L2 loss or an angular difference loss. The third method works by first converting the continuous orientation estimation task into a set of discrete orientation estimation tasks and then converting the discrete orientation outputs back to a continuous orientation using a mean-shift algorithm. By evaluating on a vehicle orientation estimation task and a pedestrian orientation estimation task, we demonstrate that the discretization-based approach not only works better than the other two approaches but also achieves state-of-the-art performance. We also demonstrate that finding an appropriate feature representation is critical to achieving good performance when adapting a DCNN trained for an image recognition task. |
Tasks | |
Published | 2017-02-06 |
URL | http://arxiv.org/abs/1702.01499v1 |
http://arxiv.org/pdf/1702.01499v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-deep-convolutional-neural-networks |
Repo | https://github.com/kevinzakka/angle-pred |
Framework | pytorch |
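A minimal numpy sketch of the best-performing idea in the abstract: predict scores over discrete angular bins, then recover a continuous angle by running mean shift with a circular kernel over the bin centers. The bin scores below are synthetic stand-ins for a CNN's softmax output.

```python
import numpy as np

def mean_shift_angle(bin_scores, bandwidth_deg=20.0, iters=30):
    n = len(bin_scores)
    centers = np.arange(n) * (360.0 / n)              # bin centers in degrees
    theta = centers[np.argmax(bin_scores)]            # start at the mode
    bw = np.radians(bandwidth_deg)
    for _ in range(iters):
        # Angular differences wrapped to [-180, 180).
        diff = (centers - theta + 180.0) % 360.0 - 180.0
        w = bin_scores * np.exp(-0.5 * (np.radians(diff) / bw) ** 2)
        theta = (theta + (w @ diff) / w.sum()) % 360.0
    return theta

# Fake CNN output: probability mass around 350 deg, wrapping past 0.
scores = np.full(36, 1e-3)
scores[[34, 35, 0]] = [0.2, 0.5, 0.3]
print(mean_shift_angle(scores))   # ~351 deg, despite the wrap-around
```

The wrapped difference is the key detail: a naive weighted average of bin centers near 0/360 degrees would land around 180, which is why the decoding needs mean shift on the circle rather than on the line.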
Learning to Generate Chairs with Generative Adversarial Nets
Title | Learning to Generate Chairs with Generative Adversarial Nets |
Authors | Evgeny Zamyatin, Andrey Filchenkov |
Abstract | Generative adversarial networks (GANs) have gained tremendous popularity lately due to their ability to reinforce the quality of the predictive model with generated objects and the quality of the generative model with supervised feedback. GANs make it possible to synthesize images with a high degree of realism. However, the learning process of such models is a very complicated optimization problem, and certain limitations of such models have been identified. These limitations constrain the choice of layers and nonlinearities when designing architectures; in particular, they prevent training convolutional GAN models with fully-connected hidden layers. In our work, we propose a modification of the previously described set of rules, as well as new approaches to designing architectures, that allow us to train more powerful GAN models. We show the effectiveness of our methods on the problem of synthesizing projections of 3D objects with the possibility of interpolation by class and viewpoint. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10413v1 |
http://arxiv.org/pdf/1705.10413v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-generate-chairs-with-generative |
Repo | https://github.com/EvgenyZamyatin/chair-gan-code |
Framework | none |
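A minimal PyTorch sketch in the spirit of the abstract: a conditional generator that embeds noise, a class label, and a viewpoint angle, passes them through fully-connected hidden layers (the kind of block the revised design rules aim to make trainable), then upsamples with transposed convolutions. Layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChairGenerator(nn.Module):
    def __init__(self, n_classes=10, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, 16)
        self.fc = nn.Sequential(                  # fully-connected hidden layers
            nn.Linear(z_dim + 16 + 2, 256), nn.ReLU(),
            nn.Linear(256, 128 * 8 * 8), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, cls, view_rad):
        # Encode the viewpoint (radians) as (sin, cos) so 0 and 2*pi coincide,
        # which is what makes interpolation over view point smooth.
        view = torch.stack([torch.sin(view_rad), torch.cos(view_rad)], dim=1)
        h = self.fc(torch.cat([z, self.embed(cls), view], dim=1))
        return self.deconv(h.view(-1, 128, 8, 8))

g = ChairGenerator()
img = g(torch.randn(4, 64), torch.tensor([0, 1, 2, 3]),
        torch.deg2rad(torch.tensor([0.0, 90.0, 180.0, 270.0])))
print(img.shape)  # torch.Size([4, 3, 32, 32])
```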
Meta Learning Shared Hierarchies
Title | Meta Learning Shared Hierarchies |
Authors | Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman |
Abstract | We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives—policies that are executed for large numbers of timesteps. Specifically, a set of primitives are shared within a distribution of tasks, and are switched between by task-specific policies. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy. |
Tasks | Legged Robots, Meta-Learning |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09767v1 |
http://arxiv.org/pdf/1710.09767v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-shared-hierarchies |
Repo | https://github.com/dsapandora/s_cera |
Framework | tf |
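A minimal sketch of the MLSH control loop: a task-specific master policy picks one of the shared primitives every N timesteps, and only the master is reset when a new task is sampled. The policies are random stubs standing in for trained networks, and the environment interface is a gym-like assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, n_primitives, N = 8, 4, 3, 10

primitives = [rng.normal(size=(obs_dim, n_actions))   # shared across tasks
              for _ in range(n_primitives)]

def rollout(env_step, obs, master, steps=100):
    total = 0.0
    for t in range(steps):
        if t % N == 0:                        # master acts on a slow timescale
            k = int(np.argmax(obs @ master))
        action = int(np.argmax(obs @ primitives[k]))
        obs, reward = env_step(obs, action)
        total += reward
    return total

def toy_env_step(obs, action):                # stand-in for a real environment
    return rng.normal(size=obs_dim), float(action == 0)

for task in range(3):
    master = rng.normal(size=(obs_dim, n_primitives))  # reset per task
    ret = rollout(toy_env_step, rng.normal(size=obs_dim), master)
    print(f"task {task}: return = {ret:.1f}")
```

The slow master timescale is the point of the architecture: because primitives run for many steps, the master faces a short-horizon problem on each new task, which is the source of the sample-efficiency gains.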
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
Title | PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume |
Authors | Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz |
Abstract | We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our models are available on https://github.com/NVlabs/PWC-Net. |
Tasks | Dense Pixel Correspondence Estimation, Optical Flow Estimation |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02371v3 |
http://arxiv.org/pdf/1709.02371v3.pdf | |
PWC | https://paperswithcode.com/paper/pwc-net-cnns-for-optical-flow-using-pyramid |
Repo | https://github.com/yanqi1811/PWC-Net |
Framework | pytorch |
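A minimal PyTorch sketch of PWC-Net's two core operations at one pyramid level: warping the second image's features by the current flow estimate, and building a correlation cost volume over a small displacement range. Shapes and the search radius are illustrative.

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp feat (B,C,H,W) by flow (B,2,H,W) in pixels."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    grid_x = (xs + flow[:, 0]) / (W - 1) * 2 - 1    # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (H - 1) * 2 - 1
    return F.grid_sample(feat, torch.stack([grid_x, grid_y], dim=-1),
                         align_corners=True)

def cost_volume(f1, f2_warped, radius=3):
    """Correlation between f1 and shifted copies of the warped f2."""
    vols = []
    f2_pad = F.pad(f2_warped, [radius] * 4)
    _, _, H, W = f1.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = f2_pad[:, :, dy:dy + H, dx:dx + W]
            vols.append((f1 * shifted).mean(dim=1, keepdim=True))
    return torch.cat(vols, dim=1)        # (B, (2r+1)^2, H, W)

f1, f2 = torch.randn(1, 32, 24, 32), torch.randn(1, 32, 24, 32)
flow = torch.zeros(1, 2, 24, 32)         # the current (here: zero) estimate
cv = cost_volume(f1, warp(f2, flow))
print(cv.shape)                           # torch.Size([1, 49, 24, 32])
```

Warping first is what keeps the cost volume small: after compensating with the current estimate, only a small residual displacement range needs to be searched at each level.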
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Title | VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation |
Authors | Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong |
Abstract | Rich and dense human labeled datasets are among the main enabling factors for the recent advance on vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding about the same visual scenes — and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer more effective leverage to existing problems, and also open the door for new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task. For the former, we obtain state-of-the-art results on the VQA real multiple-choice task by simply augmenting the multilayer perceptrons with some attention features that are learned using the segmentation-QA links as explicit supervision. To put the latter in perspective, we study two plausible methods and compare them to an oracle method assuming that the instance segmentations are given at the test stage. |
Tasks | Language Modelling, Question Answering, Semantic Segmentation, Visual Question Answering |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04686v1 |
http://arxiv.org/pdf/1708.04686v1.pdf | |
PWC | https://paperswithcode.com/paper/vqs-linking-segmentations-to-questions-and |
Repo | https://github.com/Cold-Winter/vqs |
Framework | caffe2 |
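A minimal PyTorch sketch of the supervised-attention idea: the segmentation mask linked to a question-answer pair is downsampled to the attention grid and used as an explicit target for the model's attention map. The attention logits here are random stand-ins for a VQA model's internals.

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn_logits, mask):
    """attn_logits: (B, H*W); mask: (B, 1, Hm, Wm) binary instance mask."""
    B, HW = attn_logits.shape
    side = int(HW ** 0.5)
    # Downsample the mask to the attention grid; normalize to a distribution.
    target = F.adaptive_avg_pool2d(mask.float(), side).view(B, -1)
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1e-8)
    log_attn = F.log_softmax(attn_logits, dim=1)
    return F.kl_div(log_attn, target, reduction="batchmean")

attn_logits = torch.randn(2, 14 * 14, requires_grad=True)
mask = torch.zeros(2, 1, 224, 224)
mask[:, :, 64:160, 64:160] = 1.0          # the QA-linked instance region
loss = attention_supervision_loss(attn_logits, mask)
loss.backward()
print(loss.item())
```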
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments
Title | Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments |
Authors | Oier Mees, Andreas Eitel, Wolfram Burgard |
Abstract | Object detection is an essential task for autonomous robots operating in dynamic and changing environments. A robot should be able to detect objects in the presence of sensor noise that can be induced by changing lighting conditions for cameras and false depth readings for range sensors, especially RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion approach for object detection that learns to weight the predictions of different sensor modalities in an online manner. Our approach is based on a mixture of convolutional neural network (CNN) experts and incorporates multiple modalities including appearance, depth and motion. We test our method in extensive robot experiments, in which we detect people in a combined indoor and outdoor scenario from RGB-D data, and we demonstrate that our method can adapt to harsh lighting changes and severe camera motion blur. Furthermore, we present a new RGB-D dataset for people detection in mixed in- and outdoor environments, recorded with a mobile robot. Code, pretrained models and dataset are available at http://adaptivefusion.cs.uni-freiburg.de |
Tasks | Object Detection |
Published | 2017-07-18 |
URL | https://arxiv.org/abs/1707.05733v2 |
https://arxiv.org/pdf/1707.05733v2.pdf | |
PWC | https://paperswithcode.com/paper/choosing-smartly-adaptive-multimodal-fusion |
Repo | https://github.com/mees/deep_adaptive_fusion |
Framework | none |
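A minimal PyTorch sketch of the adaptive fusion idea: a small gating network looks at each modality expert's output and produces per-expert weights, so a noisy modality (e.g. RGB under harsh lighting) can be down-weighted online. The experts are stubs, not the paper's CNNs.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, n_experts=3, n_classes=2):
        super().__init__()
        # The gating net scores each expert from the experts' own predictions.
        self.gate = nn.Sequential(
            nn.Linear(n_experts * n_classes, 32), nn.ReLU(),
            nn.Linear(32, n_experts),
        )

    def forward(self, expert_logits):             # (B, n_experts, n_classes)
        B, E, C = expert_logits.shape
        weights = torch.softmax(self.gate(expert_logits.reshape(B, E * C)), dim=1)
        fused = (weights.unsqueeze(-1) * expert_logits).sum(dim=1)
        return fused, weights

# Stand-ins for appearance / depth / motion expert detections.
logits = torch.randn(4, 3, 2)
fusion = AdaptiveFusion()
fused, weights = fusion(logits)
print(fused.shape, weights[0])   # per-sample expert weighting
```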
Inhomogeneous Hypergraph Clustering with Applications
Title | Inhomogeneous Hypergraph Clustering with Applications |
Authors | Pan Li, Olgica Milenkovic |
Abstract | Hypergraph partitioning is an important problem in machine learning, computer vision and network analytics. A widely used method for hypergraph partitioning relies on minimizing a normalized sum of the costs of partitioning hyperedges across clusters. Algorithmic solutions based on this approach assume that different partitions of a hyperedge incur the same cost. However, this assumption fails to leverage the fact that different subsets of vertices within the same hyperedge may have different structural importance. We hence propose a new hypergraph clustering technique, termed inhomogeneous hypergraph partitioning, which assigns different costs to different hyperedge cuts. We prove that inhomogeneous partitioning produces a quadratic approximation to the optimal solution if the inhomogeneous costs satisfy submodularity constraints. Moreover, we demonstrate that inhomogeneous partitioning offers significant performance improvements in applications such as structure learning of rankings, subspace segmentation and motif clustering. |
Tasks | Hypergraph Partitioning |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01249v4 |
http://arxiv.org/pdf/1709.01249v4.pdf | |
PWC | https://paperswithcode.com/paper/inhomogeneous-hypergraph-clustering-with |
Repo | https://github.com/lipan00123/InHclustering |
Framework | none |
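A minimal numpy sketch of the projection-based approach: each hyperedge is expanded into weighted clique edges, where a pair's weight reflects the (inhomogeneous) cost of cutting those two vertices apart, and the resulting graph is split with a standard spectral bipartition. The toy costs below are illustrative; the paper derives them from submodular hyperedge cost functions.

```python
import numpy as np

n = 6
hyperedges = [
    # (vertices, pairwise cut costs keyed by vertex pair)
    ((0, 1, 2), {(0, 1): 3.0, (0, 2): 3.0, (1, 2): 1.0}),
    ((2, 3, 4), {(2, 3): 0.5, (2, 4): 0.5, (3, 4): 4.0}),
    ((3, 4, 5), {(3, 4): 4.0, (3, 5): 2.0, (4, 5): 2.0}),
]

# Project hyperedges onto a weighted graph.
W = np.zeros((n, n))
for _, costs in hyperedges:
    for (u, v), c in costs.items():
        W[u, v] += c
        W[v, u] += c

# Spectral bipartition: sign of the Fiedler vector of the graph Laplacian.
L = np.diag(W.sum(axis=1)) - W
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                 # eigenvector of 2nd-smallest eigenvalue
print(np.where(fiedler >= 0, 1, 0))     # cluster assignment per vertex
```

The inhomogeneity shows up in the per-pair costs: cutting vertices 3 and 4 apart is expensive while separating either from vertex 2 is cheap, so the bipartition splits {0, 1, 2} from {3, 4, 5}.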
Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network
Title | Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network |
Authors | Xiangyong Cao, Feng Zhou, Lin Xu, Deyu Meng, Zongben Xu, John Paisley |
Abstract | This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient descent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00727v2 |
http://arxiv.org/pdf/1705.00727v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-image-classification-with-1 |
Repo | https://github.com/xiangyongcao/CNN_HSIC_MRF |
Framework | tf |
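A minimal numpy sketch of the spectral-spatial idea: per-pixel class posteriors (here random stand-ins for the CNN's output) are combined with a Potts smoothness prior, and labels are updated iteratively. The paper uses alpha-expansion min-cut for this step; iterated conditional modes (ICM) below is a simpler stand-in for that inference.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K, beta = 32, 32, 4, 1.5           # beta = smoothness strength

log_post = np.log(rng.dirichlet(np.ones(K), size=(H, W)))  # CNN log-posteriors
labels = log_post.argmax(axis=-1)

for _ in range(5):                        # ICM sweeps
    for i in range(H):
        for j in range(W):
            # Count neighbor agreement for each candidate class.
            agree = np.zeros(K)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < H and 0 <= nj < W:
                    agree[labels[ni, nj]] += 1
            # Posterior term + Potts prior term, maximized per pixel.
            labels[i, j] = np.argmax(log_post[i, j] + beta * agree)

print(np.bincount(labels.ravel(), minlength=K))
```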
Geometric features for voxel-based surface recognition
Title | Geometric features for voxel-based surface recognition |
Authors | Dmitry Yarotsky |
Abstract | We introduce a library of geometric voxel features for CAD surface recognition/retrieval tasks. Our features include local versions of the intrinsic volumes (the usual 3D volume, surface area, integrated mean and Gaussian curvature) and a few closely related quantities. We also compute Haar wavelet and statistical distribution features by aggregating raw voxel features. We apply our features to object classification on the ESB data set and demonstrate accurate results with a small number of shallow decision trees. |
Tasks | Object Classification |
Published | 2017-01-16 |
URL | http://arxiv.org/abs/1701.04249v1 |
http://arxiv.org/pdf/1701.04249v1.pdf | |
PWC | https://paperswithcode.com/paper/geometric-features-for-voxel-based-surface |
Repo | https://github.com/yarotsky/voxelfeatures |
Framework | none |
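A minimal numpy sketch of two of the simplest features in such a library: the volume of a voxelized solid (its voxel count) and its surface area (its count of exposed faces), computed on a toy voxel ball. The curvature-based intrinsic volumes in the paper require more careful local configuration counts and are not reproduced here.

```python
import numpy as np

def volume_and_surface(vox):
    """vox: 3D boolean occupancy grid."""
    volume = int(vox.sum())
    surface = 0
    padded = np.pad(vox, 1)
    for axis in range(3):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)
            # A face is exposed where a filled voxel meets an empty neighbor.
            surface += int((padded & ~neighbor).sum())
    return volume, surface

# Toy solid: a voxelized ball of radius 8.
idx = np.indices((20, 20, 20)) - 9.5
ball = (idx ** 2).sum(axis=0) <= 8 ** 2
print(volume_and_surface(ball))   # (volume in voxels, exposed faces)
```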