Paper Group AWR 61
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization. Accurate, Efficient and Scalable Graph Embedding. Torchbearer: A Model Fitting Library for PyTorch. DEMorphy, German Language Morphological Analyzer. C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation. The TUM VI Benchmark for Evaluati …
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization
Title | Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization |
Authors | Liu Liu, Hongdong Li, Yuchao Dai |
Abstract | This paper tackles the problem of large-scale image-based localization (IBL), where the spatial location of a query image is determined by finding the most similar reference images in a large database. For solving this problem, a critical task is to learn a discriminative image representation that captures information relevant for localization. We propose a novel representation learning method with higher location-discriminating power. It provides the following contributions: 1) we represent a place (location) as a set of exemplar images depicting the same landmarks and aim to maximize similarities among intra-place images while minimizing similarities among inter-place images; 2) we model a similarity measure as a probability distribution on L_2-metric distances between intra-place and inter-place image representations; 3) we propose a new Stochastic Attraction and Repulsion Embedding (SARE) loss function minimizing the KL divergence between the learned and the actual probability distributions; 4) we give theoretical comparisons between SARE, triplet ranking, and contrastive losses, analyzing their gradients to show why SARE is better. Our SARE loss is easy to implement and pluggable into any CNN. Experiments show that our proposed method improves the localization performance on standard benchmarks by a large margin. Demonstrating the broad applicability of our method, we took third place out of 209 teams in the 2018 Google Landmark Retrieval Challenge. Our code and model are available at https://github.com/Liumouliu/deepIBL. |
Tasks | Image-Based Localization, Representation Learning |
Published | 2018-08-27 |
URL | https://arxiv.org/abs/1808.08779v2 |
https://arxiv.org/pdf/1808.08779v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-stochastic-attraction-and-repulsion |
Repo | https://github.com/Liumouliu/deepIBL |
Framework | none |
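A minimal PyTorch sketch of the SARE idea in its joint form, under the assumption of one positive and several negatives per query: since the ground-truth match distribution places all mass on the positive, the KL divergence between the actual and learned distributions over exp(-d^2) similarities reduces to a softmax cross-entropy over squared L2 distances. The function name and single-positive setup are illustrative, not the authors' exact implementation (see their repository for that).

```python
# Hedged sketch of a SARE-style (joint) loss over L2-normalized embeddings.
import torch
import torch.nn.functional as F

def sare_joint_loss(query, positive, negatives):
    """query: (D,), positive: (D,), negatives: (N, D) -- L2-normalized embeddings."""
    d_pos = torch.sum((query - positive) ** 2).unsqueeze(0)          # (1,)
    d_neg = torch.sum((query.unsqueeze(0) - negatives) ** 2, dim=1)  # (N,)
    logits = -torch.cat([d_pos, d_neg])                              # similarity ~ exp(-d^2)
    # KL(actual || learned) with a one-hot actual distribution = -log p(positive)
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# toy usage
q = F.normalize(torch.randn(256), dim=0)
p = F.normalize(torch.randn(256), dim=0)
n = F.normalize(torch.randn(10, 256), dim=1)
print(sare_joint_loss(q, p, n).item())
```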
Accurate, Efficient and Scalable Graph Embedding
Title | Accurate, Efficient and Scalable Graph Embedding |
Authors | Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna |
Abstract | The Graph Convolutional Network (GCN) model and its variants are powerful graph embedding tools for facilitating classification and clustering on graphs. However, a major challenge is to reduce the complexity of layered GCNs and make them parallelizable and scalable on very large graphs: state-of-the-art techniques are unable to achieve scalability without losing accuracy and efficiency. In this paper, we propose novel parallelization techniques for graph sampling-based GCNs that achieve superior scalable performance on very large graphs without compromising accuracy. Specifically, our GCN guarantees work-efficient training and produces order-of-magnitude savings in computation and communication. To scale GCN training on tightly-coupled shared memory systems, we develop parallelization strategies for the key steps in training: For the graph sampling step, we exploit parallelism within and across multiple sampling instances, and devise an efficient data structure for concurrent accesses that provides a theoretical guarantee of near-linear speedup with the number of processing units. For the feature propagation step within the sampled graph, we improve cache utilization and reduce DRAM communication by data partitioning. We prove that our partitioning strategy is a 2-approximation for minimizing the communication time compared to the optimal strategy. We demonstrate that our parallel graph embedding outperforms state-of-the-art methods in scalability (with respect to number of processors, graph size and GCN model size), efficiency and accuracy on several large datasets. On a 40-core Xeon platform, our parallel training achieves 64$\times$ speedup (with AVX) in the sampling step and 25$\times$ speedup in the feature propagation step, compared to the serial implementation, resulting in a net speedup of 21$\times$. |
Tasks | Graph Embedding |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11899v2 |
http://arxiv.org/pdf/1810.11899v2.pdf | |
PWC | https://paperswithcode.com/paper/accurate-efficient-and-scalable-graph |
Repo | https://github.com/ZimpleX/gcn-ipdps19 |
Framework | tf |
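A small NumPy sketch of the two steps the paper parallelizes, graph sampling and feature propagation within the sampled subgraph. The uniform node sampler, row normalization, and single ReLU layer below are deliberately simple stand-ins, not the authors' work-efficient parallel scheme.

```python
# Sample a node set, build the induced normalized adjacency, propagate features.
import numpy as np

def sample_subgraph(adj, num_nodes, sample_size, rng):
    nodes = rng.choice(num_nodes, size=sample_size, replace=False)
    sub = adj[np.ix_(nodes, nodes)]
    deg = sub.sum(axis=1, keepdims=True) + 1e-8
    return nodes, sub / deg                                  # row-normalized adjacency

def gcn_layer(a_norm, x, w):
    return np.maximum(a_norm @ x @ w, 0.0)                   # ReLU(A_hat X W)

rng = np.random.default_rng(0)
N, F_in, F_out = 1000, 64, 32
adj = (rng.random((N, N)) < 0.01).astype(np.float32)
feats = rng.standard_normal((N, F_in)).astype(np.float32)
W = rng.standard_normal((F_in, F_out)).astype(np.float32) * 0.1

nodes, a_norm = sample_subgraph(adj, N, sample_size=128, rng=rng)
h = gcn_layer(a_norm, feats[nodes], W)
print(h.shape)   # (128, 32)
```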
Torchbearer: A Model Fitting Library for PyTorch
Title | Torchbearer: A Model Fitting Library for PyTorch |
Authors | Ethan Harris, Matthew Painter, Jonathon Hare |
Abstract | We introduce torchbearer, a model fitting library for pytorch aimed at researchers working on deep learning or differentiable programming. The torchbearer library provides a high level metric and callback API that can be used for a wide range of applications. We also include a series of built in callbacks that can be used for: model persistence, learning rate decay, logging, data visualization and more. The extensive documentation includes an example library for deep learning and dynamic programming problems and can be found at http://torchbearer.readthedocs.io. The code is licensed under the MIT License and available at https://github.com/ecs-vlc/torchbearer. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03363v1 |
http://arxiv.org/pdf/1809.03363v1.pdf | |
PWC | https://paperswithcode.com/paper/torchbearer-a-model-fitting-library-for |
Repo | https://github.com/pytorchbearer/torchbearer |
Framework | pytorch |
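A hedged usage sketch based on the Trial API described in the torchbearer documentation; exact argument names and defaults may differ across versions.

```python
# Fit a tiny model with torchbearer's Trial object (documented high-level API).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchbearer import Trial

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

trial = Trial(model, optimiser, criterion, metrics=['acc', 'loss'])
trial.with_generators(train_generator=train_loader)
trial.run(epochs=5)
```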
DEMorphy, German Language Morphological Analyzer
Title | DEMorphy, German Language Morphological Analyzer |
Authors | Duygu Altinok |
Abstract | DEMorphy is a morphological analyzer for German. It is built on large, compactified lexicons from the German Morphological Dictionary. A guesser based on German declension suffixes is also provided. For German, DEMorphy offers state-of-the-art morphological analysis. It is implemented in Python with an emphasis on ease of use and comes with accompanying documentation. The package is suitable for both academic and commercial purposes with a permissive licence. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00902v1 |
http://arxiv.org/pdf/1803.00902v1.pdf | |
PWC | https://paperswithcode.com/paper/demorphy-german-language-morphological |
Repo | https://github.com/DuyguA/DEMorphy |
Framework | none |
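A usage sketch following the pattern shown in the project README; the constructor option and the printed result format are assumptions and may differ by version.

```python
# Analyze a German word form and print each candidate morphological reading.
from demorphy import Analyzer

analyzer = Analyzer(char_subs_allowed=True)   # option name assumed from the README
for analysis in analyzer.analyze("Hauses"):
    print(analysis)                           # one line per candidate reading with its tags
```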
C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation
Title | C3: Concentrated-Comprehensive Convolution and its application to semantic segmentation |
Authors | Hyojin Park, Youngjoon Yoo, Geonseok Seo, Dongyoon Han, Sangdoo Yun, Nojun Kwak |
Abstract | One of the practical choices for making a lightweight semantic segmentation model is to combine a depth-wise separable convolution with a dilated convolution. However, the simple combination of these two methods results in an over-simplified operation which causes severe performance degradation due to loss of information contained in the feature map. To resolve this problem, we propose a new block called Concentrated-Comprehensive Convolution (C3) which applies the asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss due to dilated convolution. The C3 block consists of a concentration stage and a comprehensive convolution stage. The first stage uses two depth-wise asymmetric convolutions to compress information from the neighboring pixels and alleviate the information loss. The second stage increases the receptive field by using a depth-wise separable dilated convolution on the feature map of the first stage. We applied the C3 block to various segmentation frameworks (ESPNet, DRN, ERFNet, ENet) to demonstrate the benefits of our proposed method. Experimental results show that the proposed method preserves the original accuracies on the Cityscapes dataset while reducing the complexity. Furthermore, we modified ESPNet to achieve about 2% better performance while reducing the number of parameters by half and the number of FLOPs by 35% compared with the original ESPNet. Finally, experiments on the ImageNet classification task show that the C3 block can successfully replace dilated convolutions. |
Tasks | Semantic Segmentation |
Published | 2018-12-12 |
URL | https://arxiv.org/abs/1812.04920v3 |
https://arxiv.org/pdf/1812.04920v3.pdf | |
PWC | https://paperswithcode.com/paper/concentrated-comprehensive-convolutions-for |
Repo | https://github.com/clovaai/c3_sinet |
Framework | pytorch |
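A hedged PyTorch sketch of a C3-style block: a "concentration" stage of two depth-wise asymmetric convolutions (k x 1, then 1 x k) followed by a "comprehensive" depth-wise separable dilated convolution. Kernel sizes, normalization, and ordering are simplifications for illustration, not the authors' exact block.

```python
import torch
import torch.nn as nn

class C3Block(nn.Module):
    def __init__(self, channels, k=3, dilation=2):
        super().__init__()
        p = k // 2
        self.concentrate = nn.Sequential(
            nn.Conv2d(channels, channels, (k, 1), padding=(p, 0), groups=channels, bias=False),
            nn.Conv2d(channels, channels, (1, k), padding=(0, p), groups=channels, bias=False),
        )
        self.comprehensive = nn.Sequential(
            nn.Conv2d(channels, channels, k, padding=p * dilation, dilation=dilation,
                      groups=channels, bias=False),         # depth-wise dilated
            nn.Conv2d(channels, channels, 1, bias=False),    # point-wise
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.comprehensive(self.concentrate(x))

x = torch.randn(1, 64, 32, 32)
print(C3Block(64)(x).shape)   # spatial size is preserved
```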
The TUM VI Benchmark for Evaluating Visual-Inertial Odometry
Title | The TUM VI Benchmark for Evaluating Visual-Inertial Odometry |
Authors | David Schubert, Thore Goll, Nikolaus Demmel, Vladyslav Usenko, Jörg Stückler, Daniel Cremers |
Abstract | Visual odometry and SLAM methods have a large variety of applications in domains such as augmented reality or robotics. Complementing vision sensors with inertial measurements tremendously improves tracking accuracy and robustness, and thus has spawned large interest in the development of visual-inertial (VI) odometry approaches. In this paper, we propose the TUM VI benchmark, a novel dataset with a diverse set of sequences in different scenes for evaluating VI odometry. It provides camera images with 1024x1024 resolution at 20 Hz, high dynamic range and photometric calibration. An IMU measures accelerations and angular velocities on 3 axes at 200 Hz, while the cameras and IMU sensors are time-synchronized in hardware. For trajectory evaluation, we also provide accurate pose ground truth from a motion capture system at high frequency (120 Hz) at the start and end of the sequences, which we accurately aligned with the camera and IMU measurements. The full dataset with raw and calibrated data is publicly available. We also evaluate state-of-the-art VI odometry approaches on our dataset. |
Tasks | Calibration, Motion Capture, Visual Odometry |
Published | 2018-04-17 |
URL | https://arxiv.org/abs/1804.06120v3 |
https://arxiv.org/pdf/1804.06120v3.pdf | |
PWC | https://paperswithcode.com/paper/the-tum-vi-benchmark-for-evaluating-visual |
Repo | https://github.com/VladyslavUsenko/basalt-mirror |
Framework | none |
Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection
Title | Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection |
Authors | Jeremias Knoblauch, Theodoros Damoulas |
Abstract | Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, and thus it is two orders of magnitude faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data. |
Tasks | Change Point Detection, Model Selection |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05383v2 |
http://arxiv.org/pdf/1805.05383v2.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-bayesian-on-line-changepoint |
Repo | https://github.com/alan-turing-institute/bocpdms |
Framework | none |
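The recursion the paper builds on can be sketched for the simplest case, a univariate Gaussian with known observation variance (Adams-and-MacKay-style run-length filtering). The paper's contributions (spatially structured VAR models between changepoints and on-line model selection) are not reproduced here.

```python
# Minimal Bayesian on-line changepoint detection: maintain a posterior over the
# current run length, grow it with each observation, and reset it with hazard H.
import numpy as np
from scipy.stats import norm

def bocpd(data, hazard=1/100, mu0=0.0, var0=10.0, var_x=1.0):
    T = len(data)
    log_r = np.full(T + 1, -np.inf); log_r[0] = 0.0          # run-length log-posterior
    mu, var = np.array([mu0]), np.array([var0])              # per-run-length Gaussian posterior
    cp_prob = np.zeros(T)
    for t, x in enumerate(data):
        pred = norm.logpdf(x, mu, np.sqrt(var + var_x))      # predictive per run length
        growth = log_r[:t + 1] + pred + np.log(1 - hazard)
        change = np.logaddexp.reduce(log_r[:t + 1] + pred + np.log(hazard))
        log_r[:t + 2] = np.concatenate([[change], growth])
        log_r[:t + 2] -= np.logaddexp.reduce(log_r[:t + 2])  # normalize
        cp_prob[t] = np.exp(log_r[0])
        # conjugate update of the Gaussian mean for every surviving run length
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate([[mu0], new_mu])
        var = np.concatenate([[var0], new_var])
    return cp_prob

data = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(4, 1, 200)])
print(np.argsort(bocpd(data))[-3:])   # time steps with highest changepoint mass
```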
On gradient regularizers for MMD GANs
Title | On gradient regularizers for MMD GANs |
Authors | Michael Arbel, Dougal J. Sutherland, Mikołaj Bińkowski, Arthur Gretton |
Abstract | We propose a principled method for gradient-based regularization of the critic of GAN-like models trained by adversarially optimizing the kernel of a Maximum Mean Discrepancy (MMD). We show that controlling the gradient of the critic is vital to having a sensible loss function, and devise a method to enforce exact, analytical gradient constraints at no additional cost compared to existing approximate techniques based on additive regularizers. The new loss function is provably continuous, and experiments show that it stabilizes and accelerates training, giving image generation models that outperform state-of-the-art methods on $160 \times 160$ CelebA and $64 \times 64$ unconditional ImageNet. |
Tasks | Image Generation |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11565v4 |
http://arxiv.org/pdf/1805.11565v4.pdf | |
PWC | https://paperswithcode.com/paper/on-gradient-regularizers-for-mmd-gans |
Repo | https://github.com/MichaelArbel/Scaled-MMD-GAN |
Framework | tf |
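For orientation, a generic critic loss combining an RBF-kernel MMD estimate with an additive gradient penalty. Note this is the approximate additive-regularizer approach the paper improves upon, not its exact analytical gradient constraint (the Scaled MMD); the tiny critic and image sizes are placeholders.

```python
# MMD between critic features plus a WGAN-GP-style gradient penalty (illustrative only).
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased squared MMD between feature batches x, y with an RBF kernel."""
    xy = torch.cat([x, y], dim=0)
    k = torch.exp(-torch.cdist(xy, xy) ** 2 / (2 * sigma ** 2))
    n = x.size(0)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2 * k[:n, n:].mean()

def critic_loss(critic, real, fake, gp_weight=10.0):
    mmd2 = rbf_mmd2(critic(real), critic(fake))
    eps = torch.rand(real.size(0), 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    gp = ((grad.flatten(1).norm(dim=1) - 1) ** 2).mean()
    return -mmd2 + gp_weight * gp        # critic maximizes MMD, hence the minus sign

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 16))
real, fake = torch.randn(4, 3, 8, 8), torch.randn(4, 3, 8, 8)
print(critic_loss(critic, real, fake).item())
```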
Revisiting Gray Pixel for Statistical Illumination Estimation
Title | Revisiting Gray Pixel for Statistical Illumination Estimation |
Authors | Yanlin Qian, Said Pertuz, Jarno Nikkanen, Joni-Kristian Kämäräinen, Jiri Matas |
Abstract | We present a statistical color constancy method that relies on novel gray pixel detection and mean shift clustering. The method, called Mean Shifted Grey Pixel (MSGP), is based on the observation that true-gray pixels are aligned along a single direction. Our solution is compact, easy to compute and requires no training. Experiments on two real-world benchmarks show that the proposed approach outperforms state-of-the-art methods in the camera-agnostic scenario. In the setting where the camera is known, MSGP outperforms all statistical methods. |
Tasks | Color Constancy |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08326v4 |
http://arxiv.org/pdf/1803.08326v4.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-gray-pixel-for-statistical |
Repo | https://github.com/yanlinqian/Mean-shifted-Gray-Pixel |
Framework | none |
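A deliberately simplified grayness-based illuminant estimate: score each pixel by how close its chromaticity is to neutral gray, keep the grayest fraction, and average them. The actual MSGP method derives grayness from local log-image derivatives and clusters the candidates with mean shift, neither of which is shown here.

```python
import numpy as np

def estimate_illuminant(img, top_fraction=0.01):
    """img: (H, W, 3) linear RGB in [0, 1]; returns a unit-norm illuminant estimate."""
    eps = 1e-6
    chroma = img / (img.sum(axis=2, keepdims=True) + eps)      # per-pixel chromaticity
    grayness = np.abs(chroma - 1.0 / 3.0).sum(axis=2)           # 0 = perfectly neutral
    flat = grayness.reshape(-1)
    idx = np.argsort(flat)[: max(1, int(top_fraction * flat.size))]
    candidates = img.reshape(-1, 3)[idx]
    illum = candidates.mean(axis=0)
    return illum / (np.linalg.norm(illum) + eps)

img = np.random.rand(64, 64, 3).astype(np.float32)
print(estimate_illuminant(img))
```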
Towards a Better Match in Siamese Network Based Visual Object Tracker
Title | Towards a Better Match in Siamese Network Based Visual Object Tracker |
Authors | Anfeng He, Chong Luo, Xinmei Tian, Wenjun Zeng |
Abstract | Recently, Siamese network based trackers have received tremendous interest for their fast tracking speed and high performance. Despite the great success, this tracking framework still suffers from several limitations. First, it cannot properly handle large object rotation. Second, tracking gets easily distracted when the background contains salient objects. In this paper, we propose two simple yet effective mechanisms, namely angle estimation and spatial masking, to address these issues. The objective is to extract more representative features so that a better match can be obtained between the same object from different frames. The resulting tracker, named Siam-BM, not only significantly improves the tracking performance, but more importantly maintains the realtime capability. Evaluations on the VOT2017 dataset show that Siam-BM achieves an EAO of 0.335, which makes it the best-performing realtime tracker to date. |
Tasks | Visual Object Tracking |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01368v1 |
http://arxiv.org/pdf/1809.01368v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-better-match-in-siamese-network |
Repo | https://github.com/77695/Siam-BM |
Framework | tf |
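A toy illustration of the angle-estimation idea: correlate the search region against several rotated copies of the template and keep the best-responding rotation. Siam-BM does this on deep SiamFC feature maps and adds spatial masking; plain image patches stand in for those features here.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import correlate2d

def best_rotation(template, search, angles=(-10, -5, 0, 5, 10)):
    """Return the angle whose rotated template gives the strongest correlation peak."""
    scores = {}
    for a in angles:
        t = rotate(template, a, reshape=False, mode='nearest')
        scores[a] = correlate2d(search, t, mode='valid').max()
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 64))
template = rotate(search[20:40, 20:40], 5, reshape=False, mode='nearest')  # a rotated cut-out
print(best_rotation(template, search)[0])
```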
Neuronal Circuit Policies
Title | Neuronal Circuit Policies |
Authors | Mathias Lechner, Ramin M. Hasani, Radu Grosu |
Abstract | We propose an effective way to create interpretable control agents, by re-purposing the function of a biological neural circuit model, to govern simulated and real-world reinforcement learning (RL) test-beds. We model the tap-withdrawal (TW) neural circuit of the nematode C. elegans, a circuit responsible for the worm's reflexive response to external mechanical touch stimulation, and learn its synaptic and neuronal parameters as a policy for controlling basic RL tasks. We also autonomously park a real rover robot on a pre-defined trajectory, by deploying such neuronal circuit policies learned in a simulated environment. To repurpose the TW neural circuit, we adopt a search-based RL algorithm. We show that our neuronal policies perform as well as deep neural network policies, with the advantage of realizing interpretable dynamics at the cell level. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08554v1 |
http://arxiv.org/pdf/1803.08554v1.pdf | |
PWC | https://paperswithcode.com/paper/neuronal-circuit-policies |
Repo | https://github.com/mlech26l/neuronal_circuit_policies |
Framework | none |
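The "search-based RL algorithm" can be illustrated with a bare-bones hill-climbing random search over policy parameters: perturb the current best parameter vector, evaluate a rollout, and keep the perturbation if it improves. The rollout function and dimensionality below are placeholders; the paper optimizes the synaptic and neuronal parameters of the TW circuit, which this sketch does not model.

```python
import numpy as np

def random_search(rollout, dim, iters=500, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best_theta = rng.standard_normal(dim)
    best_return = rollout(best_theta)
    for _ in range(iters):
        candidate = best_theta + sigma * rng.standard_normal(dim)
        ret = rollout(candidate)
        if ret > best_return:                 # keep only improving perturbations
            best_theta, best_return = candidate, ret
    return best_theta, best_return

# toy "rollout": reward peaks when all parameters equal 1
toy_rollout = lambda theta: -np.sum((theta - 1.0) ** 2)
theta, ret = random_search(toy_rollout, dim=8)
print(round(ret, 4))
```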
Choosing to Rank
Title | Choosing to Rank |
Authors | Stephen Ragain, Johan Ugander |
Abstract | Ranking data arises in a wide variety of application areas but remains difficult to model, learn from, and predict. Datasets often exhibit multimodality, intransitivity, or incomplete rankings, particularly when generated by humans, yet popular probabilistic models are often too rigid to capture such complexities. In this work we leverage recent progress on similar challenges in discrete choice modeling to form flexible and tractable choice-based models for ranking data. We study choice representations, maps from rankings (complete or top-$k$) to collections of choices, as a way of forming ranking models from choice models. We focus on the repeated selection (RS) choice representation, first used to form the Plackett-Luce ranking model from the conditional multinomial logit choice model. We fully characterize, for a prime number of alternatives, the choice representations that admit ranking distributions with unit normalization, a desirable property that greatly simplifies maximum likelihood estimation. We further show that only specific minor variations on repeated selection exhibit this property. Our choice-based ranking models provide higher out-of-sample likelihood when compared to Plackett-Luce and Mallows models on a broad collection of ranking tasks including food preferences, ranked-choice elections, car racing, and search engine relevance tasks. |
Tasks | Car Racing |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05139v2 |
http://arxiv.org/pdf/1809.05139v2.pdf | |
PWC | https://paperswithcode.com/paper/choosing-to-rank |
Repo | https://github.com/sragain/CTR |
Framework | pytorch |
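A small sketch of the repeated-selection (RS) representation that yields the Plackett-Luce model: a ranking is decomposed into successive choices of the top item among the remaining alternatives, and the ranking log-likelihood is the sum of the corresponding multinomial-logit choice log-probabilities.

```python
import numpy as np

def plackett_luce_loglik(ranking, utilities):
    """ranking: items ordered best-to-worst; utilities: one real score per item."""
    remaining = list(ranking)
    ll = 0.0
    for item in ranking:
        u = np.array([utilities[i] for i in remaining])
        ll += utilities[item] - np.log(np.exp(u).sum())   # softmax choice among the rest
        remaining.remove(item)
    return ll

utilities = {"a": 2.0, "b": 1.0, "c": 0.0}
print(plackett_luce_loglik(["a", "b", "c"], utilities))
print(plackett_luce_loglik(["c", "b", "a"], utilities))   # a less likely ordering
```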
Biological Mechanisms for Learning: A Computational Model of Olfactory Learning in the Manduca sexta Moth, with Applications to Neural Nets
Title | Biological Mechanisms for Learning: A Computational Model of Olfactory Learning in the Manduca sexta Moth, with Applications to Neural Nets |
Authors | Charles B. Delahunt, Jeffrey A. Riffell, J. Nathan Kutz |
Abstract | The insect olfactory system, which includes the antennal lobe (AL), mushroom body (MB), and ancillary structures, is a relatively simple neural system capable of learning. Its structural features, which are widespread in biological neural systems, process olfactory stimuli through a cascade of networks where large dimension shifts occur from stage to stage and where sparsity and randomness play a critical role in coding. Learning is partly enabled by a neuromodulatory reward mechanism of octopamine stimulation of the AL, whose increased activity induces rewiring of the MB through Hebbian plasticity. Enforced sparsity in the MB focuses Hebbian growth on neurons that are the most important for the representation of the learned odor. Based upon current biophysical knowledge, we have constructed an end-to-end computational model of the Manduca sexta moth olfactory system which includes the interaction of the AL and MB under octopamine stimulation. Our model is able to robustly learn new odors, and our simulations of integrate-and-fire neurons match the statistical features of in-vivo firing rate data. From a biological perspective, the model provides a valuable tool for examining the role of neuromodulators, like octopamine, in learning, and gives insight into critical interactions between sparsity, Hebbian growth, and stimulation during learning. Our simulations also inform predictions about structural details of the olfactory system that are not currently well-characterized. From a machine learning perspective, the model yields bio-inspired mechanisms that are potentially useful in constructing neural nets for rapid learning from very few samples. These mechanisms include high-noise layers, sparse layers as noise filters, and a biologically-plausible optimization method to train the network based on octopamine stimulation, sparse layers, and Hebbian growth. |
Tasks | |
Published | 2018-02-08 |
URL | http://arxiv.org/abs/1802.02678v1 |
http://arxiv.org/pdf/1802.02678v1.pdf | |
PWC | https://paperswithcode.com/paper/biological-mechanisms-for-learning-a |
Repo | https://github.com/charlesDelahunt/SmartAsABug |
Framework | none |
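Two of the mechanisms named in the abstract, enforced sparsity as a noise filter and Hebbian growth between co-active units, can be caricatured in a few lines. This is a cartoon of the mechanisms only, not the AL/MB moth model.

```python
import numpy as np

def sparse_code(x, k):
    """Keep the k largest activations, zero the rest (sparse layer as noise filter)."""
    out = np.zeros_like(x)
    idx = np.argsort(x)[-k:]
    out[idx] = x[idx]
    return out

def hebbian_update(w, pre, post, lr=0.01):
    """dW ~ outer(post, pre): connections between co-active units grow stronger."""
    return w + lr * np.outer(post, pre)

rng = np.random.default_rng(0)
w = rng.standard_normal((50, 20)) * 0.1      # 20 inputs -> 50 sparsely firing units
pre = rng.random(20)
post = sparse_code(w @ pre, k=5)             # sparse response to the input
w = hebbian_update(w, pre, post)
print((post > 0).sum(), np.abs(w).mean())
```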
Cross-View Image Synthesis using Conditional GANs
Title | Cross-View Image Synthesis using Conditional GANs |
Authors | Krishna Regmi, Ali Borji |
Abstract | Learning to generate natural scenes has always been a challenging task in computer vision. It is even more painstaking when the generation is conditioned on images with drastically different views. This is mainly because understanding, corresponding, and transforming appearance and semantic information across the views is not trivial. In this paper, we attempt to solve the novel problem of cross-view image synthesis, aerial to street-view and vice versa, using conditional generative adversarial networks (cGAN). Two new architectures called Crossview Fork (X-Fork) and Crossview Sequential (X-Seq) are proposed to generate scenes with resolutions of 64x64 and 256x256 pixels. The X-Fork architecture has a single discriminator and a single generator. The generator hallucinates both the image and its semantic segmentation in the target view. The X-Seq architecture utilizes two cGANs. The first one generates the target image, which is subsequently fed to the second cGAN for generating its corresponding semantic segmentation map. The feedback from the second cGAN helps the first cGAN generate sharper images. Both of our proposed architectures learn to generate natural images as well as their semantic segmentation maps. The proposed methods capture and maintain the true semantics of objects in source and target views better than the traditional image-to-image translation method, which considers only the visual appearance of the scene. Extensive qualitative and quantitative evaluations support the effectiveness of our frameworks, compared to two state-of-the-art methods, for natural scene generation across drastically different views. |
Tasks | Cross-View Image-to-Image Translation, Image Generation, Image-to-Image Translation, Scene Generation, Semantic Segmentation |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03396v2 |
http://arxiv.org/pdf/1803.03396v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-view-image-synthesis-using-conditional |
Repo | https://github.com/kregmi/cross-view-image-synthesis |
Framework | pytorch |
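A hedged sketch of the X-Fork layout: one generator trunk with two output heads, one producing the target-view image and one its semantic segmentation logits. The tiny encoder-decoder, layer sizes, and class count are invented for illustration and are much smaller than the paper's 64x64/256x256 models.

```python
import torch
import torch.nn as nn

class XForkGenerator(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.image_head = nn.Sequential(
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())
        self.seg_head = nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1)

    def forward(self, src_view):
        h = self.trunk(src_view)
        return self.image_head(h), self.seg_head(h)   # target image + segmentation logits

g = XForkGenerator()
img, seg = g(torch.randn(2, 3, 64, 64))
print(img.shape, seg.shape)   # (2, 3, 64, 64) and (2, 8, 64, 64)
```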
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search
Title | Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search |
Authors | Linnan Wang, Yiyang Zhao, Yuu Jinnai, Yuandong Tian, Rodrigo Fonseca |
Abstract | Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive computational cost of current NAS methods calls for further work on improving sample efficiency and reducing network evaluation cost to get better results in less time. In this paper, we present a novel scalable Monte Carlo Tree Search (MCTS) based NAS agent, named AlphaX, to tackle these two aspects. AlphaX improves the search efficiency by adaptively balancing exploration and exploitation at the state level, and by using a Meta-Deep Neural Network (DNN) to predict network accuracies, biasing the search toward promising regions. To amortize the network evaluation cost, AlphaX accelerates MCTS rollouts with a distributed design and reduces the number of epochs needed to evaluate a network via transfer learning, guided by the tree structure in MCTS. In 12 GPU days and 1000 samples, AlphaX found an architecture that reaches 97.84% top-1 accuracy on CIFAR-10 and 75.5% top-1 accuracy on ImageNet, exceeding SOTA NAS methods in both accuracy and sample efficiency. In particular, we also evaluate AlphaX on NASBench-101, a large-scale NAS dataset; AlphaX is 3x and 2.8x more sample-efficient than Random Search and Regularized Evolution in finding the global optimum. Finally, we show the searched architecture improves a variety of vision applications, from neural style transfer to image captioning and object detection. |
Tasks | Image Captioning, Neural Architecture Search, Object Detection, Style Transfer, Transfer Learning |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07440v5 |
https://arxiv.org/pdf/1805.07440v5.pdf | |
PWC | https://paperswithcode.com/paper/alphax-exploring-neural-architectures-with |
Repo | https://github.com/linnanwang/AlphaX-NASBench101 |
Framework | none |
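A toy sketch of the two ingredients named in the abstract: UCT-style tree search over partial architectures, with rewards supplied by a learned accuracy predictor instead of full training. The search space (a list of layer widths) and the predictor below are stand-ins, not the paper's NASNet-style space or its meta-DNN.

```python
import math, random

SPACE = [16, 32, 64]          # candidate widths per layer (toy search space)
DEPTH = 3

def predictor(arch):          # stand-in for the meta-DNN accuracy predictor
    return 1.0 - abs(sum(arch) / (64 * DEPTH) - 0.75)

class Node:
    def __init__(self, arch=()):
        self.arch, self.children, self.visits, self.value = arch, {}, 0, 0.0

def uct_select(node, c=1.4):
    return max(node.children.values(),
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def mcts(iters=200):
    root, best = Node(), (float("-inf"), None)
    for _ in range(iters):
        node, path = root, [root]
        while len(node.arch) < DEPTH:
            untried = [w for w in SPACE if w not in node.children]
            if untried:                                      # expand an untried width
                node.children[untried[0]] = Node(node.arch + (untried[0],))
                node = node.children[untried[0]]
                path.append(node)
                break
            node = uct_select(node)                          # otherwise descend by UCT
            path.append(node)
        arch = list(node.arch) + [random.choice(SPACE) for _ in range(DEPTH - len(node.arch))]
        reward = predictor(arch)     # AlphaX would (eventually) train the sampled network
        best = max(best, (reward, tuple(arch)))
        for n in path:                                       # back-propagate the reward
            n.visits += 1
            n.value += reward
    return best

print(mcts())
```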