May 7, 2019

Paper Group AWR 57

MLPnP - A Real-Time Maximum Likelihood Solution to the Perspective-n-Point Problem

Title MLPnP - A Real-Time Maximum Likelihood Solution to the Perspective-n-Point Problem
Authors Steffen Urban, Jens Leitloff, Stefan Hinz
Abstract In this paper, a statistically optimal solution to the Perspective-n-Point (PnP) problem is presented. Many solutions to the PnP problem are geometrically optimal, but do not consider the uncertainties of the observations. In addition, it would be desirable to have an internal estimation of the accuracy of the estimated rotation and translation parameters of the camera pose. Thus, we propose a novel maximum likelihood solution to the PnP problem that incorporates image observation uncertainties and remains real-time capable at the same time. Further, the presented method is general, as it works with 3D direction vectors instead of 2D image points and is thus able to cope with arbitrary central camera models. This is achieved by projecting (and thus reducing) the covariance matrices of the observations to the corresponding vector tangent space.
Tasks
Published 2016-07-27
URL http://arxiv.org/abs/1607.08112v1
PDF http://arxiv.org/pdf/1607.08112v1.pdf
PWC https://paperswithcode.com/paper/mlpnp-a-real-time-maximum-likelihood-solution
Repo https://github.com/derleeG/EAPPnP
Framework none
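
The last sentence of the abstract, on projecting observation covariances to the tangent space of a direction vector, can be illustrated with a small sketch. It assumes the tangent space is spanned by two orthonormal vectors r and s perpendicular to the bearing vector v, so a 3x3 covariance reduces to a 2x2 one. The function names and the way the basis is chosen are illustrative assumptions, not taken from the authors' code.

```python
import math

def _cross(a, b):
    # Standard 3D cross product.
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _normalise(a):
    n = math.sqrt(sum(c * c for c in a))
    return [c / n for c in a]

def tangent_basis(v):
    """Return two orthonormal vectors spanning the plane perpendicular to v."""
    v = _normalise(v)
    # Assumption: seed the basis with the coordinate axis least aligned with v.
    axis = min(range(3), key=lambda i: abs(v[i]))
    e = [0.0, 0.0, 0.0]
    e[axis] = 1.0
    r = _normalise(_cross(v, e))
    s = _cross(v, r)  # already unit length because v and r are orthonormal
    return r, s

def reduce_covariance(cov, v):
    """Project a 3x3 covariance onto the tangent space of v, giving a 2x2 matrix."""
    basis = tangent_basis(v)
    return [[sum(basis[a][i] * cov[i][j] * basis[b][j]
                 for i in range(3) for j in range(3))
             for b in range(2)]
            for a in range(2)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reduced = reduce_covariance(identity, [1.0, 2.0, 2.0])
```

An isotropic 3D covariance stays isotropic after the projection, which is a quick sanity check on the basis.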

DehazeNet: An End-to-End System for Single Image Haze Removal

Title DehazeNet: An End-to-End System for Single Image Haze Removal
Authors Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, Dacheng Tao
Abstract Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints/priors to get plausible dehazing solutions. The key to achieving haze removal is to estimate a medium transmission map for an input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet, for medium transmission estimation. DehazeNet takes a hazy image as input, and outputs its medium transmission map that is subsequently used to recover a haze-free image via the atmospheric scattering model. DehazeNet adopts a deep architecture based on Convolutional Neural Networks (CNNs), whose layers are specially designed to embody the established assumptions/priors in image dehazing. Specifically, layers of Maxout units are used for feature extraction, which can generate almost all haze-relevant features. We also propose a novel nonlinear activation function in DehazeNet, called Bilateral Rectified Linear Unit (BReLU), which is able to improve the quality of the recovered haze-free image. We establish connections between components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods, yet remains efficient and easy to use.
Tasks Image Dehazing, Single Image Haze Removal
Published 2016-01-28
URL http://arxiv.org/abs/1601.07661v2
PDF http://arxiv.org/pdf/1601.07661v2.pdf
PWC https://paperswithcode.com/paper/dehazenet-an-end-to-end-system-for-single
Repo https://github.com/saintazunya/DehazeNet
Framework tf
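
Two ingredients named in the abstract can be sketched per pixel. The atmospheric scattering model is commonly written as I = J * t + A * (1 - t), where I is the hazy intensity, J the haze-free intensity, t the medium transmission and A the global atmospheric light. BReLU is sketched here as a clamp to [t_min, t_max]. The lower bound t_floor used when inverting the model is an assumption added to avoid division by very small transmissions; it is not taken from the abstract.

```python
def brelu(x: float, t_min: float = 0.0, t_max: float = 1.0) -> float:
    """Bilateral ReLU sketched as a two-sided clamp to [t_min, t_max]."""
    return min(max(x, t_min), t_max)

def recover_pixel(hazy: float, transmission: float, airlight: float,
                  t_floor: float = 0.1) -> float:
    """Invert I = J * t + A * (1 - t) for J at a single pixel.

    t_floor is an assumed safeguard against dividing by a tiny transmission.
    """
    t = max(transmission, t_floor)
    return (hazy - airlight) / t + airlight

# Round trip: J = 0.2, t = 0.5, A = 0.9 gives I = 0.2 * 0.5 + 0.9 * 0.5 = 0.55.
restored = recover_pixel(hazy=0.55, transmission=0.5, airlight=0.9)
```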

The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems

Title The Influence of Frequency, Recency and Semantic Context on the Reuse of Tags in Social Tagging Systems
Authors Dominik Kowald, Elisabeth Lex
Abstract In this paper, we study factors that influence tag reuse behavior in social tagging systems. Our work is guided by the activation equation of the cognitive model ACT-R, which states that the usefulness of information in human memory depends on three factors: usage frequency, recency and semantic context. It is our aim to shed light on the influence of these factors on tag reuse. In our experiments, we utilize six datasets from the social tagging systems Flickr, CiteULike, BibSonomy, Delicious, LastFM and MovieLens, covering a range of tagging settings. Our results confirm that frequency, recency and semantic context positively influence the reuse probability of tags. However, the extent to which each factor individually influences tag reuse strongly depends on the type of folksonomy present in a social tagging system. Our work can serve as a guideline for researchers and developers of tag-based recommender systems when designing algorithms for social tagging environments.
Tasks Recommendation Systems
Published 2016-04-04
URL https://arxiv.org/abs/1604.00837v1
PDF https://arxiv.org/pdf/1604.00837v1.pdf
PWC https://paperswithcode.com/paper/the-influence-of-frequency-recency-and
Repo https://github.com/learning-layers/TagRec
Framework none
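
The frequency and recency factors can be illustrated with the standard ACT-R base-level learning equation, B = ln(sum_j (now - t_j)^(-d)), where t_j are the past usage times of a tag and d is a decay parameter. This is a minimal sketch of that textbook equation only: the decay value 0.5 is the conventional ACT-R default, the semantic context term is left out, and the paper's exact formulation may differ.

```python
import math

def base_level_activation(usage_times, now, decay=0.5):
    """ACT-R base-level activation of one tag from its past usage timestamps.

    More uses (frequency) and more recent uses (recency) both raise the value.
    Every usage time must lie strictly before `now`.
    """
    if not usage_times:
        raise ValueError("a tag needs at least one past usage")
    return math.log(sum((now - t) ** (-decay) for t in usage_times))

recent = base_level_activation([9.0], now=10.0)         # used once, 1 time unit ago
old = base_level_activation([1.0], now=10.0)            # used once, 9 time units ago
frequent = base_level_activation([1.0, 5.0], now=10.0)  # used twice
```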

Xception: Deep Learning with Depthwise Separable Convolutions

Title Xception: Deep Learning with Depthwise Separable Convolutions
Authors François Chollet
Abstract We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
Tasks Image Classification
Published 2016-10-07
URL http://arxiv.org/abs/1610.02357v3
PDF http://arxiv.org/pdf/1610.02357v3.pdf
PWC https://paperswithcode.com/paper/xception-deep-learning-with-depthwise
Repo https://github.com/LouisFoucard/w-net
Framework none
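
The abstract defines a depthwise separable convolution as a depthwise convolution followed by a pointwise (1x1) convolution. A small sketch of the weight counts for a single layer shows how the factorisation spends parameters. Biases are ignored and the layer sizes are arbitrary examples, not values from the paper.

```python
def regular_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

regular = regular_conv_params(3, 128, 256)      # 294912
separable = separable_conv_params(3, 128, 256)  # 33920
```

The per-layer saving is what lets a network of the same total size use more or wider layers, which fits the abstract's point that the gains come from using parameters more efficiently rather than from added capacity.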

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

Title Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks
Authors Junyuan Xie, Ross Girshick, Ali Farhadi
Abstract As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly. Producing 3D videos, however, remains challenging. In this paper we propose to use deep neural networks for automatically converting 2D videos and images to stereoscopic 3D format. In contrast to previous automatic 2D-to-3D conversion algorithms, which have separate stages and need ground-truth depth maps as supervision, our approach is trained end-to-end directly on stereo pairs extracted from 3D movies. This novel training scheme makes it possible to exploit orders of magnitude more data and significantly increases performance. Indeed, Deep3D outperforms baselines in both quantitative and human subject evaluations.
Tasks
Published 2016-04-13
URL http://arxiv.org/abs/1604.03650v1
PDF http://arxiv.org/pdf/1604.03650v1.pdf
PWC https://paperswithcode.com/paper/deep3d-fully-automatic-2d-to-3d-video
Repo https://github.com/LouisFoucard/w-net
Framework none

Feature Learning for Chord Recognition: The Deep Chroma Extractor

Title Feature Learning for Chord Recognition: The Deep Chroma Extractor
Authors Filip Korzeniowski, Gerhard Widmer
Abstract We explore frame-level audio feature learning for chord recognition using artificial neural networks. We present the argument that chroma vectors potentially hold enough information to model harmonic content of audio for chord recognition, but that standard chroma extractors compute features that are too noisy. This leads us to propose a learned chroma feature extractor based on artificial neural networks. It is trained to compute chroma features that encode harmonic information important for chord recognition, while being robust to irrelevant interferences. We achieve this by feeding the network an audio spectrum with context instead of a single frame as input. This way, the network can learn to selectively compensate for noise and resolve harmonic ambiguities. We compare the resulting features to hand-crafted ones by using a simple linear frame-wise classifier for chord recognition on various data sets. The results show that the learned feature extractor produces superior chroma vectors for chord recognition.
Tasks Chord Recognition
Published 2016-12-15
URL http://arxiv.org/abs/1612.05065v1
PDF http://arxiv.org/pdf/1612.05065v1.pdf
PWC https://paperswithcode.com/paper/feature-learning-for-chord-recognition-the
Repo https://github.com/CPJKU/madmom
Framework none

Entity Identification as Multitasking

Title Entity Identification as Multitasking
Authors Karl Stratos
Abstract Standard approaches in entity identification hard-code boundary detection and type prediction into labels (e.g., John/B-PER Smith/I-PER) and then perform Viterbi. This has two disadvantages: 1. the runtime complexity grows quadratically in the number of types, and 2. there is no natural segment-level representation. In this paper, we propose a novel neural architecture that addresses these disadvantages. We frame the problem as multitasking, separating boundary detection and type prediction but optimizing them jointly. Despite its simplicity, this architecture performs competitively with fully structured models such as BiLSTM-CRFs while scaling linearly in the number of types. Furthermore, by construction, the model induces type-disambiguating embeddings of predicted mentions.
Tasks Boundary Detection
Published 2016-12-08
URL http://arxiv.org/abs/1612.02706v2
PDF http://arxiv.org/pdf/1612.02706v2.pdf
PWC https://paperswithcode.com/paper/entity-identification-as-multitasking
Repo https://github.com/karlstratos/mention2vec
Framework none
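
The complexity argument in the abstract can be made concrete by counting labels. With joint BIO tagging, T entity types produce 2T + 1 labels, and Viterbi considers every label-to-label transition, so the work per token grows quadratically in T. Separating boundary detection from type prediction keeps the sequence labels at three (B, I, O) regardless of T. The sketch below only counts labels and transitions; it is not the paper's neural architecture.

```python
def joint_bio_labels(types):
    """Label set when boundaries and types are fused, e.g. B-PER, I-PER, O."""
    labels = ["O"]
    for t in types:
        labels += [f"B-{t}", f"I-{t}"]
    return labels

def transition_count(num_labels: int) -> int:
    """Label-to-label transitions Viterbi scores at each token."""
    return num_labels * num_labels

types = ["PER", "LOC", "ORG"]
joint = joint_bio_labels(types)               # 2 * 3 + 1 = 7 labels
boundary_only = ["B", "I", "O"]               # independent of the number of types

joint_cost = transition_count(len(joint))             # 49
boundary_cost = transition_count(len(boundary_only))  # 9
```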

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

Title Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs
Authors Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré
Abstract We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over state-of-the-art systems on CPUs. This ensures an end-to-end training speed directly proportional to the throughput of a device regardless of its underlying hardware, allowing each node in the cluster to be treated as a black box. Our second contribution is a theoretical and empirical study of the tradeoffs affecting end-to-end training time in a multiple-device setting. We identify the degree of asynchronous parallelization as a key factor affecting both hardware and statistical efficiency. We see that asynchrony can be viewed as introducing a momentum term. Our results imply that tuning momentum is critical in asynchronous parallel configurations, and suggest that published results that have not been fully tuned might report suboptimal performance for some configurations. For our third contribution, we use our novel understanding of the interaction between system and optimization dynamics to provide an efficient hyperparameter optimizer. Our optimizer involves a predictive model for the total time to convergence and selects an allocation of resources to minimize that time. We demonstrate that the most popular distributed deep learning systems fall within our tradeoff space, but do not optimize within the space. By doing this optimization, our prototype runs 1.9x to 12x faster than the fastest state-of-the-art systems.
Tasks
Published 2016-06-14
URL http://arxiv.org/abs/1606.04487v4
PDF http://arxiv.org/pdf/1606.04487v4.pdf
PWC https://paperswithcode.com/paper/omnivore-an-optimizer-for-multi-device-deep
Repo https://github.com/HazyResearch/Omnivore
Framework none

Unifying Count-Based Exploration and Intrinsic Motivation

Title Unifying Count-Based Exploration and Intrinsic Motivation
Authors Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos
Abstract We consider an agent’s uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma’s Revenge.
Tasks Atari Games, Montezuma’s Revenge
Published 2016-06-06
URL http://arxiv.org/abs/1606.01868v2
PDF http://arxiv.org/pdf/1606.01868v2.pdf
PWC https://paperswithcode.com/paper/unifying-count-based-exploration-and
Repo https://github.com/RLAgent/state-marginal-matching
Framework none
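
The pseudo-count mentioned in the abstract can be sketched from two density values: rho, the model's probability of an observation before seeing it, and rho_prime, the probability the model would assign after one more update on that observation. Under those definitions the pseudo-count is rho * (1 - rho_prime) / (rho_prime - rho). The bonus form beta / sqrt(count + offset) and the default values of beta and offset are illustrative assumptions here.

```python
import math

def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count derived from a density model's probability before and after an update."""
    if not 0.0 < rho < rho_prime < 1.0:
        raise ValueError("expected 0 < rho < rho_prime < 1")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(count: float, beta: float = 0.05, offset: float = 0.01) -> float:
    """Intrinsic reward that shrinks as the pseudo-count grows (beta, offset assumed)."""
    return beta / math.sqrt(count + offset)

# Sanity check with an empirical counting model: a state seen 3 times in 10 steps
# has rho = 3/10, and after one more visit rho_prime = 4/11.
n_hat = pseudo_count(3 / 10, 4 / 11)
```

For an empirical counting model the pseudo-count recovers the true visit count, which is why it can stand in for counts in the non-tabular case.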

Membership Inference Attacks against Machine Learning Models

Title Membership Inference Attacks against Machine Learning Models
Authors Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov
Abstract We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model’s training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model’s predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial “machine learning as a service” providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
Tasks Inference Attack
Published 2016-10-18
URL http://arxiv.org/abs/1610.05820v2
PDF http://arxiv.org/pdf/1610.05820v2.pdf
PWC https://paperswithcode.com/paper/membership-inference-attacks-against-machine
Repo https://github.com/spring-epfl/mia
Framework pytorch

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

Title Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Authors Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
Abstract Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however, these require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. We propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates. Through a series of experiments on classification and regression benchmarks, we demonstrate that our method produces well-calibrated uncertainty estimates which are as good as or better than approximate Bayesian NNs. To assess robustness to dataset shift, we evaluate the predictive uncertainty on test examples from known and unknown distributions, and show that our method is able to express higher uncertainty on out-of-distribution examples. We demonstrate the scalability of our method by evaluating predictive uncertainty estimates on ImageNet.
Tasks
Published 2016-12-05
URL http://arxiv.org/abs/1612.01474v3
PDF http://arxiv.org/pdf/1612.01474v3.pdf
PWC https://paperswithcode.com/paper/simple-and-scalable-predictive-uncertainty
Repo https://github.com/hayoung-kim/tf2-deep-ensemble-uncertainty
Framework tf
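
The abstract does not spell out how the ensemble members' predictions are combined. One common choice for regression, assumed in this sketch, is to let each member predict a Gaussian mean and variance and to treat the ensemble as a uniformly weighted mixture, summarised by its overall mean and variance. The function name and the example numbers are illustrative.

```python
def combine_gaussian_predictions(means, variances):
    """Mean and variance of a uniform mixture of Gaussian predictions.

    mean = average of member means
    var  = average of (member variance + member mean squared) - mean squared
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one variance per mean and at least one member")
    m = len(means)
    mean = sum(means) / m
    var = sum(v + mu * mu for mu, v in zip(means, variances)) / m - mean * mean
    return mean, var

# Two members that disagree (1.0 vs 3.0) yield more variance than either alone.
mean, var = combine_gaussian_predictions([1.0, 3.0], [0.5, 0.5])
```

The combined variance has two parts: the average of the members' own variances and the spread of their means, so disagreement between members shows up as extra uncertainty.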

Interaction Networks for Learning about Objects, Relations and Physics

Title Interaction Networks for Learning about Objects, Relations and Physics
Authors Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Rezende, Koray Kavukcuoglu
Abstract Reasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks. We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about objects and relations in a wide variety of complex real-world domains.
Tasks
Published 2016-12-01
URL http://arxiv.org/abs/1612.00222v1
PDF http://arxiv.org/pdf/1612.00222v1.pdf
PWC https://paperswithcode.com/paper/interaction-networks-for-learning-about
Repo https://github.com/ToruOwO/InteractionNetwork-pytorch
Framework pytorch

Marvin: Semantic annotation using multiple knowledge sources

Title Marvin: Semantic annotation using multiple knowledge sources
Authors Nikola Milosevic
Abstract People are producing more written material than at any time in history. The increase is so high that professionals from various fields are no longer able to cope with this amount of publications. Text mining can offer tools to help them, and one that can aid information retrieval and information extraction is semantic text annotation. In this report we present Marvin, a text annotator written in Java, which can be used as a command line tool and as a Java library. Marvin is able to annotate text using multiple sources, including WordNet, MetaMap, DBPedia and thesauri represented as SKOS.
Tasks Information Retrieval
Published 2016-02-01
URL http://arxiv.org/abs/1602.00515v2
PDF http://arxiv.org/pdf/1602.00515v2.pdf
PWC https://paperswithcode.com/paper/marvin-semantic-annotation-using-multiple
Repo https://github.com/nikolamilosevic86/Marvin
Framework none

Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values

Title Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values
Authors Eric V. Strobl, Peter L. Spirtes, Shyam Visweswaran
Abstract The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
Tasks
Published 2016-07-14
URL http://arxiv.org/abs/1607.03975v2
PDF http://arxiv.org/pdf/1607.03975v2.pdf
PWC https://paperswithcode.com/paper/estimating-and-controlling-the-false
Repo https://github.com/ericstrobl/PCp
Framework none
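
The Benjamini-Yekutieli step named in the abstract can be sketched as a generic step-up procedure over a list of p-values. With m hypotheses and sorted p-values p_(1) <= ... <= p_(m), it rejects the k smallest, where k is the largest rank with p_(k) <= k * alpha / (m * c(m)) and c(m) = 1 + 1/2 + ... + 1/m. This is only the final FDR control step; the part of PC-p that bounds the edge-specific p-values is not shown, and the function name is illustrative.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank  # step-up: keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

decisions = benjamini_yekutieli([0.001, 0.5, 0.9], alpha=0.05)
```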

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

Title The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
Authors Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio
Abstract State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining. Moreover, due to smart construction of the model, our approach has far fewer parameters than currently published best entries for these datasets. Code to reproduce the experiments is available here: https://github.com/SimJeg/FC-DenseNet/blob/master/train.py
Tasks Semantic Segmentation
Published 2016-11-28
URL http://arxiv.org/abs/1611.09326v3
PDF http://arxiv.org/pdf/1611.09326v3.pdf
PWC https://paperswithcode.com/paper/the-one-hundred-layers-tiramisu-fully
Repo https://github.com/SimJeg/FC-DenseNet
Framework none