Paper Group AWR 62
Single-Image Depth Perception in the Wild. CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases. A deep language model for software code. The Grail theorem prover: Type theory for syntax and semantics. Hybrid Recommender System based on Autoencoders. Deep Exploration via Bootstrapped DQN. Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification. DeepLearningKit - an GPU Optimized Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift. Learning to Discover Sparse Graphical Models. FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification. Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology. Asynchrony begets Momentum, with an Application to Deep Learning. MOT16: A Benchmark for Multi-Object Tracking.
Single-Image Depth Perception in the Wild
Title | Single-Image Depth Perception in the Wild |
Authors | Weifeng Chen, Zhao Fu, Dawei Yang, Jia Deng |
Abstract | This paper studies single-image depth perception in the wild, i.e., recovering depth from a single image taken in unconstrained settings. We introduce a new dataset “Depth in the Wild” consisting of images in the wild annotated with relative depth between pairs of random points. We also propose a new algorithm that learns to estimate metric depth using annotations of relative depth. Compared to the state of the art, our algorithm is simpler and performs better. Experiments show that our algorithm, combined with existing RGB-D data and our new relative depth annotations, significantly improves single-image depth perception in the wild. |
Tasks | |
Published | 2016-04-13 |
URL | http://arxiv.org/abs/1604.03901v2 |
http://arxiv.org/pdf/1604.03901v2.pdf | |
PWC | https://paperswithcode.com/paper/single-image-depth-perception-in-the-wild |
Repo | https://github.com/dfan/single-image-surface-normal-estimation |
Framework | pytorch |
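The key technical idea above, supervising metric depth with only ordinal (closer/farther/equal) point-pair annotations, boils down to a pairwise ranking loss over sampled pairs. The sketch below is a minimal PyTorch rendering of that idea; the function name, tensor layout, and exact loss form are illustrative rather than the paper's reference implementation.

```python
# Hedged sketch of a pairwise ordinal-depth ranking loss (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def relative_depth_loss(pred, pts_a, pts_b, rel):
    """pred: (B, H, W) depth map; pts_a, pts_b: (N, 3) long tensors of (batch, row, col);
    rel: (N,) in {+1, -1, 0}; +1 means point A should get the larger depth value,
    -1 the smaller, 0 roughly equal."""
    za = pred[pts_a[:, 0], pts_a[:, 1], pts_a[:, 2]]
    zb = pred[pts_b[:, 0], pts_b[:, 1], pts_b[:, 2]]
    diff = za - zb
    ordered = rel != 0
    # Ranking term: penalize predictions that contradict the annotated ordering.
    rank = F.softplus(-rel[ordered].float() * diff[ordered])
    # Equality term: pull the two depths together when annotated as equal.
    equal = diff[~ordered] ** 2
    return torch.cat([rank, equal]).mean()
```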
CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Title | CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases |
Authors | Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han |
Abstract | Extracting entities and relations for types of interest from text is important for understanding massive text corpora. Traditionally, systems of entity relation extraction have relied on human-annotated corpora for training and adopted an incremental pipeline. Such systems require additional human expertise to be ported to a new domain, and are vulnerable to errors cascading down the pipeline. In this paper, we investigate joint extraction of typed entities and relations with labeled data heuristically obtained from knowledge bases (i.e., distant supervision). As our algorithm for type labeling via distant supervision is context-agnostic, noisy training data poses unique challenges for the task. We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations. Using these learned embeddings, CoType then estimates the types of test (unlinkable) mentions. We formulate a joint optimization problem to learn embeddings from text corpora and knowledge bases, adopting a novel partial-label loss function for noisy labeled data and introducing an object “translation” function to capture the cross-constraints of entities and relations on each other. Experiments on three public datasets demonstrate the effectiveness of CoType across different domains (e.g., news, biomedical), with an average of 25% improvement in F1 score compared to the next best method. |
Tasks | Joint Entity and Relation Extraction, Relation Extraction |
Published | 2016-10-27 |
URL | http://arxiv.org/abs/1610.08763v2 |
http://arxiv.org/pdf/1610.08763v2.pdf | |
PWC | https://paperswithcode.com/paper/cotype-joint-extraction-of-typed-entities-and |
Repo | https://github.com/ellenmellon/ReQuest |
Framework | none |
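The partial-label loss mentioned in the abstract addresses distant supervision's noisy candidate type sets: the best-scoring type inside a mention's candidate set should outrank every type outside it. Below is a minimal NumPy sketch of one hinge-style variant of that idea; the names, shapes, and margin are illustrative and not CoType's actual objective.

```python
# Hedged sketch of a partial-label hinge loss over noisy candidate type sets.
import numpy as np

def partial_label_loss(mention_vec, type_embs, candidate_ids, margin=1.0):
    """mention_vec: (d,); type_embs: (T, d); candidate_ids: indices of noisy candidate types."""
    scores = type_embs @ mention_vec                 # similarity of the mention to every type
    mask = np.zeros(len(type_embs), dtype=bool)
    mask[candidate_ids] = True
    best_pos = scores[mask].max()                    # best type within the candidate set
    best_neg = scores[~mask].max()                   # best type outside the candidate set
    return max(0.0, margin - best_pos + best_neg)    # hinge on the separation margin
```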
A deep language model for software code
Title | A deep language model for software code |
Authors | Hoa Khanh Dam, Truyen Tran, Trang Pham |
Abstract | Existing language models such as n-grams for software code often fail to capture a long context where dependent code elements scatter far apart. In this paper, we propose a novel approach to build a language model for software code to address this particular issue. Our language model, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term dependencies which occur frequently in software code. Results from our intrinsic evaluation on a corpus of Java projects have demonstrated the effectiveness of our language model. This work contributes to realizing our vision for DeepSoft, an end-to-end, generic deep learning-based framework for modeling software and its development process. |
Tasks | Language Modelling |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02715v1 |
http://arxiv.org/pdf/1608.02715v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-language-model-for-software-code |
Repo | https://github.com/D-a-r-e-k/Source-Code-Modelling |
Framework | pytorch |
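At its core, the model described above is an LSTM language model over code tokens trained to predict the next token. A minimal PyTorch sketch of that setup follows; the vocabulary size, layer widths, and class name are illustrative, not the paper's configuration.

```python
# Minimal LSTM next-token language model sketch (illustrative sizes).
import torch
import torch.nn as nn

class CodeLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                # tokens: (B, T) int64 code-token ids
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)                    # (B, T, vocab) next-token logits

# One training step: predict token t+1 from tokens 0..t.
model = CodeLSTM(vocab_size=10000)
tokens = torch.randint(0, 10000, (4, 50))
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
loss.backward()
```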
The Grail theorem prover: Type theory for syntax and semantics
Title | The Grail theorem prover: Type theory for syntax and semantics |
Authors | Richard Moot |
Abstract | As the name suggests, type-logical grammars are a grammar formalism based on logic and type theory. From the perspective of grammar design, type-logical grammars develop the syntactic and semantic aspects of linguistic phenomena hand-in-hand, letting the desired semantics of an expression inform the syntactic type and vice versa. Prototypical examples of the successful application of type-logical grammars to the syntax-semantics interface include coordination, quantifier scope and extraction. This chapter describes the Grail theorem prover, a series of tools for designing and testing grammars in various modern type-logical formalisms. All tools described in this chapter are freely available. |
Tasks | |
Published | 2016-02-02 |
URL | http://arxiv.org/abs/1602.00812v2 |
http://arxiv.org/pdf/1602.00812v2.pdf | |
PWC | https://paperswithcode.com/paper/the-grail-theorem-prover-type-theory-for |
Repo | https://github.com/RichardMoot/GrailLight |
Framework | none |
Hybrid Recommender System based on Autoencoders
Title | Hybrid Recommender System based on Autoencoders |
Authors | Florian Strub, Romaric Gaudel, Jérémie Mary |
Abstract | A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings. In the last decades, few attempts were made to handle that objective with Neural Networks, but recently an architecture based on Autoencoders proved to be a promising approach. In the current paper, we enhance that architecture (i) by using a loss function adapted to input data with missing values, and (ii) by incorporating side information. The experiments demonstrate that while side information only slightly improves the test error averaged on all users/items, it has more impact on cold users/items. |
Tasks | Matrix Completion, Recommendation Systems |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07659v3 |
http://arxiv.org/pdf/1606.07659v3.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-recommender-system-based-on |
Repo | https://github.com/jowoojun/collaborative_filtering_keras |
Framework | none |
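Point (i) of the abstract, a loss adapted to input with missing values, typically means reconstructing a user's rating vector while scoring only the observed entries. The sketch below illustrates that masking idea in PyTorch under assumed shapes; it omits the paper's side-information input (point (ii)) and its actual architecture.

```python
# Hedged sketch of an autoencoder loss masked to observed ratings only.
import torch
import torch.nn as nn

n_items, hidden = 1000, 64
autoencoder = nn.Sequential(
    nn.Linear(n_items, hidden), nn.Tanh(), nn.Linear(hidden, n_items))

ratings = torch.zeros(32, n_items)                        # user-item ratings, 0 = unknown
ratings[:, :20] = torch.randint(1, 6, (32, 20)).float()   # pretend a few entries are observed
observed = ratings != 0                                   # mask of known entries
recon = autoencoder(ratings)
loss = ((recon - ratings)[observed] ** 2).mean()          # MSE over observed entries only
loss.backward()
```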
Deep Exploration via Bootstrapped DQN
Title | Deep Exploration via Bootstrapped DQN |
Authors | Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy |
Abstract | Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games. |
Tasks | Atari Games, Efficient Exploration |
Published | 2016-02-15 |
URL | http://arxiv.org/abs/1602.04621v3 |
http://arxiv.org/pdf/1602.04621v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-exploration-via-bootstrapped-dqn |
Repo | https://github.com/tensorflow/models |
Framework | tf |
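The exploration mechanism is easiest to see in code: K value heads share a torso, one head is sampled at the start of each episode, and the agent acts greedily with respect to it for the whole episode. The PyTorch sketch below shows only that head-sampling rule, with illustrative sizes; it is not the paper's full training loop (bootstrap masks, replay, and target networks are omitted).

```python
# Hedged sketch of per-episode head sampling in a bootstrapped Q-network.
import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, n_heads=10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(64, n_actions) for _ in range(n_heads))

    def forward(self, obs, head):
        return self.heads[head](self.torso(obs))   # Q-values from the chosen head

net = BootstrappedQNet(obs_dim=4, n_actions=2)
head = torch.randint(0, 10, ()).item()             # sample one head at episode start
obs = torch.randn(1, 4)
action = net(obs, head).argmax(dim=-1)             # act greedily w.r.t. that head all episode
```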
Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification
Title | Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification |
Authors | Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford |
Abstract | This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD). In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averaging the final few iterates of SGD to decrease the variance in SGD’s final iterate. This work presents non-asymptotic excess risk bounds for these schemes for the stochastic approximation problem of least squares regression. Furthermore, this work establishes a precise problem-dependent extent to which mini-batch SGD yields provable near-linear parallelization speedups over SGD with batch size one. This allows for understanding learning rate versus batch size tradeoffs for the final iterate of an SGD method. These results are then utilized in providing a highly parallelizable SGD method that obtains the minimax risk with nearly the same number of serial updates as batch gradient descent, improving significantly over existing SGD methods. A non-asymptotic analysis of communication efficient parallelization schemes such as model-averaging/parameter mixing methods is then provided. Finally, this work sheds light on some fundamental differences in SGD’s behavior when dealing with agnostic noise in the (non-realizable) least squares regression problem. In particular, the work shows that the stepsizes that ensure minimax risk for the agnostic case must be a function of the noise properties. This paper builds on the operator view of analyzing SGD methods, introduced by Defossez and Bach (2015), followed by developing a novel analysis in bounding these operators to characterize the excess risk. These techniques are of broader interest in analyzing computational aspects of stochastic approximation. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03774v4 |
http://arxiv.org/pdf/1610.03774v4.pdf | |
PWC | https://paperswithcode.com/paper/parallelizing-stochastic-gradient-descent-for |
Repo | https://github.com/rahulkidambi/AccSGD |
Framework | pytorch |
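The two averaging schemes analyzed above, mini-batching and tail-averaging, are simple to state for least squares. The NumPy sketch below combines them under illustrative hyperparameters; the step size, batch size, and tail fraction are placeholders, not the paper's prescribed problem-dependent choices.

```python
# Hedged sketch of mini-batch SGD with tail-averaging for least squares regression.
import numpy as np

def tail_averaged_sgd(X, y, batch=32, lr=0.1, steps=2000, tail_frac=0.5):
    n, d = X.shape
    w = np.zeros(d)
    tail, tail_sum = int(steps * tail_frac), np.zeros(d)
    for t in range(steps):
        idx = np.random.randint(0, n, size=batch)             # sample a mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch       # least-squares stochastic gradient
        w -= lr * grad
        if t >= steps - tail:
            tail_sum += w                                     # accumulate the tail iterates
    return tail_sum / tail                                    # tail-averaged estimate
```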
DeepLearningKit - an GPU Optimized Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift
Title | DeepLearningKit - an GPU Optimized Deep Learning Framework for Apple’s iOS, OS X and tvOS developed in Metal and Swift |
Authors | Amund Tveit, Torbjørn Morland, Thomas Brox Røst |
Abstract | In this paper we present DeepLearningKit - an open source framework that supports using pretrained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal in order to utilize the GPU efficiently and Swift for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for the big screen, or OS X desktop applications. The goal is to support using deep learning models trained with popular frameworks such as Caffe, Torch, TensorFlow, Theano, Pylearn, Deeplearning4J and Mocha. Given the massive GPU resources and time required to train Deep Learning models, we suggest an App Store-like model to distribute and download pretrained and reusable Deep Learning models. |
Tasks | |
Published | 2016-05-15 |
URL | http://arxiv.org/abs/1605.04614v1 |
http://arxiv.org/pdf/1605.04614v1.pdf | |
PWC | https://paperswithcode.com/paper/deeplearningkit-an-gpu-optimized-deep |
Repo | https://github.com/DeepLearningKit/DeepLearningKit |
Framework | none |
Learning to Discover Sparse Graphical Models
Title | Learning to Discover Sparse Graphical Models |
Authors | Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko |
Abstract | We consider structure discovery of undirected graphical models from observational data. Inferring likely structures from few examples is a complex task often requiring the formulation of priors and sophisticated inference procedures. Popular methods rely on estimating a penalized maximum likelihood of the precision matrix. However, in these approaches structure recovery is an indirect consequence of the data-fit term, the penalty can be difficult to adapt for domain-specific knowledge, and the inference is computationally demanding. By contrast, it may be easier to generate training samples of data that arise from graphs with the desired structure properties. We propose here to leverage this latter source of information as training data to learn a function, parametrized by a neural network, that maps empirical covariance matrices to estimated graph structures. Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood. Applying this framework, we find that our learnable graph-discovery method trained on synthetic data generalizes well, identifying relevant edges in both synthetic and real data completely unknown at training time. We find that on genetics, brain imaging, and simulation data we obtain performance generally superior to analytical methods. |
Tasks | |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06359v3 |
http://arxiv.org/pdf/1605.06359v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-discover-sparse-graphical-models |
Repo | https://github.com/eugenium/LearnGraphDiscovery |
Framework | none |
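The central trick is generating training data from graphs with known structure: sample a sparse precision matrix, draw Gaussian data from it, and pair the empirical covariance (network input) with the true edge set (network target). A hedged NumPy sketch of one such generator follows; the sparsity level, sample sizes, and positive-definiteness fix are illustrative choices, not the paper's protocol.

```python
# Hedged sketch of synthetic (empirical covariance, edge structure) training pairs.
import numpy as np

def make_example(p=10, n=50, edge_prob=0.1):
    A = (np.random.rand(p, p) < edge_prob) * np.random.randn(p, p) * 0.3
    prec = np.eye(p) + (A + A.T) / 2                        # symmetric, sparse precision matrix
    prec += np.eye(p) * max(0.0, 1e-3 - np.linalg.eigvalsh(prec).min())  # ensure positive definite
    cov = np.linalg.inv(prec)
    X = np.random.multivariate_normal(np.zeros(p), cov, size=n)
    emp_cov = np.cov(X, rowvar=False)                       # what the network would see
    edges = (np.abs(prec) > 1e-8) & ~np.eye(p, dtype=bool)  # what the network should predict
    return emp_cov, edges
```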
FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification
Title | FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification |
Authors | Thomas Keck |
Abstract | Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fitting-phase and application-phase, in comparison with popular implementations in software frameworks like TMVA, scikit-learn and XGBoost. The concepts used to optimize the execution time and performance studies are discussed in detail in this paper. The key ideas include: An equal-frequency binning on the input data, which allows replacing expensive floating-point with integer operations, while at the same time increasing the quality of the classification; a cache-friendly linear access pattern to the input data, in contrast to usual implementations, which exhibit a random access pattern. FastBDT provides interfaces to C/C++, Python and TMVA. It is extensively used in the field of high energy physics by the Belle II experiment. |
Tasks | |
Published | 2016-09-20 |
URL | http://arxiv.org/abs/1609.06119v1 |
http://arxiv.org/pdf/1609.06119v1.pdf | |
PWC | https://paperswithcode.com/paper/fastbdt-a-speed-optimized-and-cache-friendly |
Repo | https://github.com/thomaskeck/FastBDT |
Framework | none |
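The first key idea, equal-frequency binning, replaces floating-point feature values with small integer bin indices so that split finding can operate on histograms with integer comparisons. A short NumPy sketch of that preprocessing step is below; the bin count and function name are illustrative (FastBDT itself is implemented in C++).

```python
# Hedged sketch of equal-frequency (quantile) binning of a feature.
import numpy as np

def equal_frequency_bins(feature, n_bins=16):
    # Quantile-based cut points place roughly the same number of samples in each bin.
    cuts = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(feature, cuts)        # integer bin index per sample

x = np.random.exponential(size=1000)
binned = equal_frequency_bins(x)             # values in {0, ..., 15}, roughly uniform counts
```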
Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning
Title | Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning |
Authors | Dilin Wang, Qiang Liu |
Abstract | We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient that maximally decreases the KL divergence with the target distribution. Our method works for any target distribution specified by its unnormalized density function, and can train any black-box architectures that are differentiable in terms of the parameters we want to adapt. As an application of our method, we propose an amortized MLE algorithm for training deep energy models, where a neural sampler is adaptively trained to approximate the likelihood function. Our method mimics an adversarial game between the deep energy model and the neural sampler, and obtains realistic-looking images competitive with the state-of-the-art results. |
Tasks | Conditional Image Generation |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01722v2 |
http://arxiv.org/pdf/1611.01722v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-draw-samples-with-application-to |
Repo | https://github.com/DartML/SteinGAN |
Framework | none |
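The "Stein variational gradient" referenced above is a deterministic particle update built from a kernel and the target's score function; the paper's neural sampler is trained to mimic such updates. The NumPy sketch below shows a single SVGD step with an RBF kernel under an assumed fixed bandwidth; it is the underlying update rule, not the paper's amortized training procedure.

```python
# Hedged sketch of one Stein variational gradient descent (SVGD) step with an RBF kernel.
import numpy as np

def svgd_step(particles, score_fn, step=0.1, bandwidth=1.0):
    """particles: (n, d); score_fn(x) returns grad_x log p(x) for each row of x."""
    n = len(particles)
    diff = particles[:, None, :] - particles[None, :, :]          # (n, n, d) pairwise differences
    sq = (diff ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))                        # RBF kernel matrix
    grad_K = -diff / bandwidth ** 2 * K[..., None]                # gradient of k(x_j, x_i) wrt x_j
    phi = (K @ score_fn(particles) + grad_K.sum(axis=0)) / n      # SVGD update direction
    return particles + step * phi
```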
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Title | SampleRNN: An Unconditional End-to-End Neural Audio Generation Model |
Authors | Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, Yoshua Bengio |
Abstract | In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation on the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance. |
Tasks | Audio Generation |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07837v2 |
http://arxiv.org/pdf/1612.07837v2.pdf | |
PWC | https://paperswithcode.com/paper/samplernn-an-unconditional-end-to-end-neural |
Repo | https://github.com/soroushmehr/sampleRNN_ICLR2017 |
Framework | torch |
A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology
Title | A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology |
Authors | Kyunghyun Paeng, Sangheum Hwang, Sunggyun Park, Minsoo Kim |
Abstract | We present a unified framework to predict tumor proliferation scores from breast histopathology whole slide images. Our system offers a fully automated solution to predicting both a molecular data-based and a mitosis counting-based tumor proliferation score. The framework integrates three modules, each fine-tuned to maximize the overall performance: an image processing component for handling whole slide images, a deep learning based mitosis detection network, and a proliferation score prediction module. We have achieved 0.567 quadratic weighted Cohen’s kappa in mitosis counting-based score prediction and 0.652 F1-score in mitosis detection. On Spearman’s correlation coefficient, which evaluates predictive accuracy on the molecular data-based score, the system obtained 0.6171. Our approach won first place in all three tasks of the Tumor Proliferation Assessment Challenge 2016, a MICCAI grand challenge. |
Tasks | Mitosis Detection |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07180v2 |
http://arxiv.org/pdf/1612.07180v2.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-framework-for-tumor-proliferation |
Repo | https://github.com/CODAIT/deep-histopath |
Framework | tf |
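The mitosis counting-based task above is scored with quadratic weighted Cohen’s kappa, which penalizes ordinal disagreements by their squared distance. If scikit-learn is available it can be computed directly, as sketched below; the labels are made up purely for illustration.

```python
# Hedged example of computing quadratic weighted Cohen's kappa with scikit-learn.
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 2, 3, 1]   # illustrative reference proliferation scores
y_pred = [0, 2, 2, 1, 3, 1]   # illustrative predicted scores
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(kappa)
```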
Asynchrony begets Momentum, with an Application to Deep Learning
Title | Asynchrony begets Momentum, with an Application to Deep Learning |
Authors | Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré |
Abstract | Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We show that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the objective function, so it is applicable to deep learning systems. We observe that a standard queuing model of asynchrony results in a form of momentum that is commonly used by deep learning practitioners. This forges a link between queuing theory and asynchrony in deep learning systems, which could be useful for systems builders. For convolutional neural networks, we experimentally validate that the degree of asynchrony directly correlates with the momentum, confirming our main result. An important implication is that tuning the momentum parameter is important when considering different levels of asynchrony. We assert that properly tuned momentum reduces the number of steps required for convergence. Finally, our theory suggests new ways of counteracting the adverse effects of asynchrony: a simple mechanism like using negative algorithmic momentum can improve performance under high asynchrony. Since asynchronous methods have better hardware efficiency, this result may shed light on when asynchronous execution is more efficient for deep learning systems. |
Tasks | |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09774v2 |
http://arxiv.org/pdf/1605.09774v2.pdf | |
PWC | https://paperswithcode.com/paper/asynchrony-begets-momentum-with-an |
Repo | https://github.com/JoeriHermans/dist-keras |
Framework | none |
MOT16: A Benchmark for Multi-Object Tracking
Title | MOT16: A Benchmark for Multi-Object Tracking |
Authors | Anton Milan, Laura Leal-Taixe, Ian Reid, Stefan Roth, Konrad Schindler |
Abstract | Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. Recently, a new benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal of collecting existing and new data and creating a framework for the standardized evaluation of multiple object tracking methods. The first release of the benchmark focuses on multiple people tracking, since pedestrians are by far the most studied object in the tracking community. This paper accompanies a new release of the MOTChallenge benchmark. Unlike the initial release, all videos of MOT16 have been carefully annotated following a consistent protocol. Moreover, it not only offers a significant increase in the number of labeled boxes, but also provides multiple object classes besides pedestrians and the level of visibility for every single object of interest. |
Tasks | Multi-Object Tracking, Multiple Object Tracking, Multiple People Tracking, Object Tracking |
Published | 2016-03-02 |
URL | http://arxiv.org/abs/1603.00831v2 |
http://arxiv.org/pdf/1603.00831v2.pdf | |
PWC | https://paperswithcode.com/paper/mot16-a-benchmark-for-multi-object-tracking |
Repo | https://github.com/yihongXU/deepMOT |
Framework | pytorch |
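For anyone loading the benchmark, MOT16 annotations are commonly described as comma-separated rows of frame, track id, box (left, top, width, height), confidence, class, and visibility. The sketch below parses a file under that assumed column order; it should be checked against the official MOTChallenge devkit rather than taken as the definitive format.

```python
# Hedged sketch of reading a MOT16-style ground-truth file (column order assumed, verify against the devkit).
import csv

def load_mot16_gt(path):
    tracks = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, tid = int(row[0]), int(row[1])
            left, top, w, h = map(float, row[2:6])
            conf, cls, vis = float(row[6]), int(row[7]), float(row[8])
            tracks.append(dict(frame=frame, id=tid, box=(left, top, w, h),
                               conf=conf, cls=cls, visibility=vis))
    return tracks
```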