May 7, 2019

2805 words 14 mins read

Paper Group AWR 101

Input Convex Neural Networks

Title Input Convex Neural Networks
Authors Brandon Amos, Lei Xu, J. Zico Kolter
Abstract This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, reinforcement learning, and others. In this paper we lay the basic groundwork for these models, proposing methods for inference, optimization and learning, and analyze their representational power. We show that many existing neural network architectures can be made input-convex with a minor modification, and develop specialized optimization algorithms tailored to this setting. Finally, we highlight the performance of the methods on multi-label prediction, image completion, and reinforcement learning problems, where we show improvement over the existing state of the art in many cases.
Tasks Imputation, Structured Prediction
Published 2016-09-22
URL http://arxiv.org/abs/1609.07152v3
PDF http://arxiv.org/pdf/1609.07152v3.pdf
PWC https://paperswithcode.com/paper/input-convex-neural-networks
Repo https://github.com/locuslab/icnn
Framework tf
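
The convexity constraint is simple to state: if the activations are convex and nondecreasing (e.g. ReLU) and the weights on the hidden-to-hidden path are nonnegative, the scalar output is convex in the input y. A minimal sketch of a fully input-convex network (not the authors' code; layer sizes are illustrative, and clamping in the forward pass stands in for the paper's projection of weights after each gradient step):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FICNN(nn.Module):
    """Fully input-convex network sketch: the output is convex in y because
    the z-path weights are kept nonnegative and ReLU is convex and
    nondecreasing."""
    def __init__(self, dim_y, hidden=64, depth=3):
        super().__init__()
        self.Wy = nn.ModuleList([nn.Linear(dim_y, hidden) for _ in range(depth)])
        self.Wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                 for _ in range(depth - 1)])
        self.out = nn.Linear(hidden, 1, bias=False)

    def forward(self, y):
        z = F.relu(self.Wy[0](y))
        for Wy, Wz in zip(self.Wy[1:], self.Wz):
            # clamp hidden-to-hidden weights so convexity in y is preserved
            z = F.relu(Wy(y) + F.linear(z, Wz.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0))
```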

Temporal Learning and Sequence Modeling for a Job Recommender System

Title Temporal Learning and Sequence Modeling for a Job Recommender System
Authors Kuan Liu, Xing Shi, Anoop Kumar, Linhong Zhu, Prem Natarajan
Abstract We present our solution to the job recommendation task for RecSys Challenge 2016. The main contribution of our work is to combine temporal learning with sequence modeling to capture complex user-item activity patterns to improve job recommendations. First, we propose a time-based ranking model applied to historical observations and a hybrid matrix factorization over time re-weighted interactions. Second, we exploit sequence properties in user-item activities and develop an RNN-based recommendation model. Our solution achieved 5$^{th}$ place in the challenge among more than 100 participants. Notably, the strong performance of our RNN approach shows a promising new direction in employing sequence modeling for recommendation systems.
Tasks Recommendation Systems
Published 2016-08-11
URL http://arxiv.org/abs/1608.03333v1
PDF http://arxiv.org/pdf/1608.03333v1.pdf
PWC https://paperswithcode.com/paper/temporal-learning-and-sequence-modeling-for-a
Repo https://github.com/skywaLKer518/A-Recsys
Framework tf
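
A hedged sketch of the time re-weighting idea (the exponential decay form and `half_life_days` are assumptions, not values from the paper): older interactions contribute less confidence to the factorization.

```python
import numpy as np

def time_reweight(interactions, now, half_life_days=30.0):
    """Each (user, item, timestamp) interaction gets an exponentially
    decayed confidence, so recent activity dominates the factorization.
    `half_life_days` is an illustrative hyperparameter."""
    weights = {}
    for user, item, ts in interactions:          # ts, now: seconds since epoch
        age_days = (now - ts) / 86400.0
        w = 0.5 ** (age_days / half_life_days)
        weights[(user, item)] = weights.get((user, item), 0.0) + w
    return weights  # feed as confidences into a weighted MF solver
```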

A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans

Title A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans
Authors Yuyin Zhou, Lingxi Xie, Wei Shen, Yan Wang, Elliot K. Fishman, Alan L. Yuille
Abstract Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy for some small organs (e.g., the pancreas) sometimes falls short, arguably because deep networks are easily disrupted by the complex and variable background regions, which occupy a large fraction of the input volume. In this paper, we formulate this problem as a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset and outperform the state of the art by more than 4%, measured by the average Dice-S{\o}rensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.
Tasks Pancreas Segmentation
Published 2016-12-25
URL http://arxiv.org/abs/1612.08230v4
PDF http://arxiv.org/pdf/1612.08230v4.pdf
PWC https://paperswithcode.com/paper/a-fixed-point-model-for-pancreas-segmentation
Repo https://github.com/198808xc/OrganSegC2F
Framework caffe2
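
The fixed-point loop itself is compact: segment, crop to the predicted mask plus a margin, re-segment, and stop when the mask stabilizes. A sketch under assumed interfaces (`coarse_net` and `fine_net` stand in for trained segmentation networks; the margin and iteration cap are illustrative):

```python
import numpy as np

def fixed_point_segment(volume, coarse_net, fine_net, max_iters=10, margin=20):
    """Iteratively shrink the input region with the predicted mask until
    the segmentation reaches a fixed point."""
    mask = coarse_net(volume) > 0.5
    for _ in range(max_iters):
        zs, ys, xs = np.where(mask)
        if zs.size == 0:
            break
        lo = np.maximum([zs.min(), ys.min(), xs.min()] - np.array([margin] * 3), 0)
        hi = np.minimum([zs.max(), ys.max(), xs.max()] + np.array([margin] * 3) + 1,
                        volume.shape)
        crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        new_mask = np.zeros_like(mask)
        new_mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine_net(crop) > 0.5
        if (new_mask == mask).all():   # reached a fixed point
            break
        mask = new_mask
    return mask
```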

Deep Fully-Connected Networks for Video Compressive Sensing

Title Deep Fully-Connected Networks for Video Compressive Sensing
Authors Michael Iliadis, Leonidas Spinoulas, Aggelos K. Katsaggelos
Abstract In this work we present a deep learning framework for video compressive sensing. The proposed formulation enables recovery of video frames in a few seconds at significantly improved reconstruction quality compared to previous approaches. Our investigation starts by learning a linear mapping between video sequences and corresponding measured frames which turns out to provide promising results. We then extend the linear formulation to deep fully-connected networks and explore the performance gains using deeper architectures. Our analysis is always driven by the applicability of the proposed framework on existing compressive video architectures. Extensive simulations on several video sequences document the superiority of our approach both quantitatively and qualitatively. Finally, our analysis offers insights into understanding how dataset sizes and number of layers affect reconstruction performance while raising a few points for future investigation. Code is available at Github: https://github.com/miliadis/DeepVideoCS
Tasks Compressive Sensing, Video Compressive Sensing
Published 2016-03-16
URL http://arxiv.org/abs/1603.04930v2
PDF http://arxiv.org/pdf/1603.04930v2.pdf
PWC https://paperswithcode.com/paper/deep-fully-connected-networks-for-video
Repo https://github.com/miliadis/DeepVideoCS
Framework pytorch
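
In spirit, the decoder is a plain multi-layer perceptron from compressed temporal measurements back to a spatio-temporal video block. A sketch with assumed dimensions (the patch size, layer widths, and depth here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# An MLP mapping a compressed measurement of a video block back to the full
# block, e.g. an 8x8x16 spatio-temporal patch whose 16 frames were collapsed
# into one 8x8 measurement. Trained on (measurement, block) pairs.
patch_dim, meas_dim = 8 * 8 * 16, 8 * 8
decoder = nn.Sequential(
    nn.Linear(meas_dim, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, patch_dim),        # reconstructed (flattened) video block
)
loss_fn = nn.MSELoss()
```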

Deep Learning for Music

Title Deep Learning for Music
Authors Allen Huang, Raymond Wu
Abstract Our goal is to build a generative model from a deep neural network architecture that creates music with both harmony and melody and is passable as music composed by humans. Previous work in music generation has mainly focused on creating a single melody. More recent work on polyphonic music modeling, centered around time series probability density estimation, has met with partial success. In particular, there has been a lot of work based on Recurrent Neural Networks combined with Restricted Boltzmann Machines (RNN-RBM) and other similar recurrent energy-based models. Our approach, however, is to perform end-to-end learning and generation with deep neural nets alone.
Tasks Density Estimation, Music Generation, Music Modeling, Time Series
Published 2016-06-15
URL http://arxiv.org/abs/1606.04930v1
PDF http://arxiv.org/pdf/1606.04930v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-music
Repo https://github.com/sarthak15169/Deep-Music
Framework none

Word Ordering Without Syntax

Title Word Ordering Without Syntax
Authors Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber
Abstract Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence. We find that, in fact, an n-gram language model with a simple heuristic gives strong results on this task. Furthermore, we show that a long short-term memory (LSTM) language model is even more effective at recovering order, with our basic model outperforming a state-of-the-art syntactic model by 11.5 BLEU points. Additional data and larger beams yield further gains, at the expense of training and search time.
Tasks Language Modelling
Published 2016-04-28
URL http://arxiv.org/abs/1604.08633v2
PDF http://arxiv.org/pdf/1604.08633v2.pdf
PWC https://paperswithcode.com/paper/word-ordering-without-syntax
Repo https://github.com/allenschmaltz/word_ordering
Framework none
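
The "simple heuristic" amounts to beam search over partial orderings scored by a language model. A sketch assuming a generic `lm_score` callable (any n-gram or LSTM log-probability; the beam size is illustrative):

```python
import heapq

def order_words(bag, lm_score, beam_size=64):
    """Grow partial sentences by appending remaining words from the bag,
    keeping the `beam_size` highest-scoring prefixes under the LM."""
    beam = [((), tuple(sorted(bag)))]            # (prefix, remaining words)
    for _ in range(len(bag)):
        candidates = []
        for prefix, remaining in beam:
            for i, w in enumerate(remaining):
                if i > 0 and remaining[i] == remaining[i - 1]:
                    continue                     # skip duplicate expansions
                cand = (prefix + (w,), remaining[:i] + remaining[i + 1:])
                candidates.append((lm_score(cand[0]), cand))
        beam = [c for _, c in heapq.nlargest(beam_size, candidates,
                                             key=lambda x: x[0])]
    return beam[0][0]                            # best full ordering
```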

On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference

Title On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference
Authors Anish Acharya, Uddipan Mukherjee, Charless Fowlkes
Abstract Estimates of image gradients play a ubiquitous role in image segmentation and classification problems since gradients directly relate to the boundaries or the edges of a scene. This paper proposes a unified approach to gradient estimation based on fractional calculus that is computationally cheap and readily applicable to any existing algorithm that relies on image gradients. We show experiments on edge detection and image segmentation on the Stanford Backgrounds Dataset where these improved local gradients outperform the state of the art, achieving 79.2% average accuracy.
Tasks Edge Detection, Semantic Segmentation
Published 2016-05-07
URL http://arxiv.org/abs/1605.02240v1
PDF http://arxiv.org/pdf/1605.02240v1.pdf
PWC https://paperswithcode.com/paper/on-image-segmentation-using-fractional
Repo https://github.com/anishacharya/Image-Segmentation-FDOG-TRW
Framework none
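
A common discretization of a fractional derivative is the Grunwald-Letnikov sum D^alpha f(x) ≈ Σ_k w_k f(x − k), with w_0 = 1 and w_k = w_{k−1}(k − 1 − alpha)/k. The sketch below applies it per image axis; the alpha value and number of taps are illustrative, and the paper's exact discretization may differ. Setting alpha = 1 recovers the ordinary backward difference.

```python
import numpy as np

def gl_weights(alpha, taps):
    """Grunwald-Letnikov coefficients w_k = (-1)^k * binom(alpha, k)."""
    w = np.empty(taps)
    w[0] = 1.0
    for k in range(1, taps):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w

def fractional_gradient(image, alpha=0.8, taps=5):
    """Fractional-order gradient: sum_k w_k f(x - k) along each axis."""
    w = gl_weights(alpha, taps)
    gx = np.zeros(image.shape, dtype=float)
    gy = np.zeros(image.shape, dtype=float)
    for k, wk in enumerate(w):
        gx[:, k:] += wk * image[:, :image.shape[1] - k]
        gy[k:, :] += wk * image[:image.shape[0] - k, :]
    return gx, gy
```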

End-to-end Learning of Driving Models from Large-scale Video Datasets

Title End-to-end Learning of Driving Models from Large-scale Video Datasets
Authors Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell
Abstract Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm.
Tasks Scene Segmentation
Published 2016-12-04
URL http://arxiv.org/abs/1612.01079v2
PDF http://arxiv.org/pdf/1612.01079v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-driving-models-from
Repo https://github.com/NupurBhaisare/BDD-Model
Framework tf
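
A hedged sketch of the FCN-LSTM shape: a convolutional encoder summarizes each frame, an LSTM integrates frame features with previous vehicle state, and a softmax head predicts a distribution over discrete egomotion actions. All sizes, the state features, and the action set are assumptions:

```python
import torch
import torch.nn as nn

class FCNLSTM(nn.Module):
    """Per-frame conv encoder feeding an LSTM that predicts a distribution
    over discrete egomotion actions (e.g. straight/stop/left/right)."""
    def __init__(self, n_actions=4, state_dim=2, feat=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat + state_dim, 64, batch_first=True)
        self.head = nn.Linear(64, n_actions)

    def forward(self, frames, states):
        # frames: (B, T, 3, H, W); states: (B, T, state_dim), e.g. past speed
        B, T = frames.shape[:2]
        f = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        h, _ = self.lstm(torch.cat([f, states], dim=-1))
        return self.head(h).log_softmax(-1)   # per-step action distribution
```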

Learning a Natural Language Interface with Neural Programmer

Title Learning a Natural Language Interface with Neural Programmer
Authors Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei
Abstract Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it to WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision from question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, competitive with the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser.
Tasks Question Answering
Published 2016-11-28
URL http://arxiv.org/abs/1611.08945v4
PDF http://arxiv.org/pdf/1611.08945v4.pdf
PWC https://paperswithcode.com/paper/learning-a-natural-language-interface-with
Repo https://github.com/pramodkaushik/np_analysis
Framework tf
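
The trick that makes the built-in discrete operations trainable end-to-end is soft selection: every operation runs, and the controller's attention mixes their outputs, keeping the whole program differentiable. A simplified sketch (shapes and names are assumptions, not the paper's interface):

```python
import torch
import torch.nn.functional as F

def soft_select_step(controller_state, op_outputs, op_embeddings, W):
    """One step of soft operation selection.
      controller_state: (B, H)        op_embeddings: (n_ops, H)
      op_outputs:       (B, n_ops, D) W: (H, H) learned projection."""
    scores = op_embeddings @ (W @ controller_state.T)    # (n_ops, B)
    attn = F.softmax(scores.T, dim=-1)                   # (B, n_ops)
    # differentiable mix of all operation results
    return (attn.unsqueeze(-1) * op_outputs).sum(dim=1)  # (B, D)
```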

Learning Non-Lambertian Object Intrinsics across ShapeNet Categories

Title Learning Non-Lambertian Object Intrinsics across ShapeNet Categories
Authors Jian Shi, Yue Dong, Hao Su, Stella X. Yu
Abstract We consider the non-Lambertian object intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. We build a large-scale object intrinsics database based on existing 3D models in the ShapeNet database. Rendered with realistic environment maps, millions of synthetic images of objects and their corresponding albedo, shading, and specular ground-truth images are used to train an encoder-decoder CNN. Once trained, the network can decompose an image into the product of albedo and shading components, along with an additive specular component. Our CNN delivers accurate and sharp results in this classical inverse problem of computer vision, with the sharp details attributable to skip-layer connections at corresponding resolutions from the encoder to the decoder. Benchmarked on our ShapeNet and MIT intrinsics datasets, our model consistently outperforms the state of the art by a large margin. We train and test our CNN on different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very well across categories. Our analysis shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our synthetic-data-trained model to images and videos downloaded from the internet, and observe robust and realistic intrinsics results. Quality non-Lambertian intrinsics could open up many interesting applications, such as image-based albedo and specular editing.
Tasks
Published 2016-12-27
URL http://arxiv.org/abs/1612.08510v1
PDF http://arxiv.org/pdf/1612.08510v1.pdf
PWC https://paperswithcode.com/paper/learning-non-lambertian-object-intrinsics
Repo https://github.com/shi-jian/shapenet-intrinsics
Framework torch
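
The decomposition being learned is image = albedo × shading + specular, so a natural training signal supervises each predicted component and asks the composition to reproduce the input. A sketch (the loss weighting is an assumption, not the paper's):

```python
import torch
import torch.nn.functional as F

def intrinsics_loss(albedo, shading, specular, image,
                    gt_albedo, gt_shading, gt_specular):
    """Supervise each component and the non-Lambertian composition
    image = albedo * shading + specular."""
    recon = albedo * shading + specular
    return (F.mse_loss(albedo, gt_albedo)
            + F.mse_loss(shading, gt_shading)
            + F.mse_loss(specular, gt_specular)
            + 0.5 * F.mse_loss(recon, image))   # illustrative weight
```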

Effectiveness of greedily collecting items in open world games

Title Effectiveness of greedily collecting items in open world games
Authors Andrej Gajduk
Abstract Since Pokemon Go sent millions on the quest of collecting virtual monsters, an important question has been on the minds of many people: is going after the closest item first a time- and cost-effective way to play? Here, we show that this is in fact a good strategy, which performs on average only 7% worse than the best possible solution in terms of the total distance traveled to gather all the items. Even when accounting for errors due to the inability of people to accurately measure distances by eye, performance only degrades to 16% worse than the optimal solution.
Tasks
Published 2016-08-17
URL http://arxiv.org/abs/1608.06175v1
PDF http://arxiv.org/pdf/1608.06175v1.pdf
PWC https://paperswithcode.com/paper/effectiveness-of-greedily-collecting-items-in
Repo https://github.com/gajduk/greedy-tsp
Framework none
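
The strategy under study is the classic nearest-neighbor heuristic: always walk to the closest remaining item, as in this sketch.

```python
import math

def greedy_route(start, items):
    """Nearest-neighbor tour: repeatedly visit the closest remaining item.
    The paper reports this comes within ~7% of the optimal tour length
    on average."""
    pos, remaining, total = start, list(items), 0.0
    order = []
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(pos, p))
        total += math.dist(pos, nxt)
        order.append(nxt)
        remaining.remove(nxt)
        pos = nxt
    return order, total
```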

AGA: Attribute Guided Augmentation

Title AGA: Attribute Guided Augmentation
Authors Mandar Dixit, Roland Kwitt, Marc Niethammer, Nuno Vasconcelos
Abstract We consider the problem of data augmentation, i.e., generating artificial samples to extend a given corpus of training data. Specifically, we propose attribute-guided augmentation (AGA), which learns a mapping that allows synthesizing data such that an attribute of a synthesized sample is at a desired value or strength. This is particularly interesting in situations where little data with no attribute annotation is available for learning, but we have access to a large external corpus of heavily annotated samples. While prior works primarily augment in the space of images, we propose to perform augmentation in feature space instead. We implement our approach as a deep encoder-decoder architecture that learns the synthesis function in an end-to-end manner. We demonstrate the utility of our approach on the problems of (1) one-shot object recognition in a transfer-learning setting where we have no prior knowledge of the new classes, and (2) object-based one-shot scene recognition. As external data, we leverage 3D depth and pose information from the SUN RGB-D dataset. Our experiments show that attribute-guided augmentation of high-level CNN features considerably improves one-shot recognition performance on both problems.
Tasks Data Augmentation, Object Recognition, Scene Recognition, Transfer Learning
Published 2016-12-08
URL http://arxiv.org/abs/1612.02559v2
PDF http://arxiv.org/pdf/1612.02559v2.pdf
PWC https://paperswithcode.com/paper/aga-attribute-guided-augmentation
Repo https://github.com/rkwitt/GuidedAugmentation
Framework torch
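
A simplified sketch of feature-space augmentation: an encoder-decoder takes a CNN feature plus a target attribute value and emits a synthesized feature. Note this conditions directly on the target value, whereas the paper trains regressors per attribute range; all sizes are assumptions:

```python
import torch
import torch.nn as nn

class AGASynthesizer(nn.Module):
    """Map (CNN feature, target attribute value) -> synthesized feature
    whose attribute should match the target (e.g. object depth or pose)."""
    def __init__(self, feat_dim=4096, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feature, target_attr):
        # target_attr: (B, 1) desired attribute strength for the new sample
        return self.net(torch.cat([feature, target_attr], dim=-1))
```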

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

Title Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
Authors Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid
Abstract A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with a small, known camera motion between the two, such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of a depth sensor to the camera. We show that our network, trained on less than half of the KITTI dataset (without any further augmentation), gives comparable performance to that of state-of-the-art supervised methods for single view depth estimation.
Tasks Calibration, Depth Estimation
Published 2016-03-16
URL http://arxiv.org/abs/1603.04992v2
PDF http://arxiv.org/pdf/1603.04992v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-cnn-for-single-view-depth
Repo https://github.com/Ravi-Garg/Unsupervised_Depth_Estimation
Framework none
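
The self-supervision reduces to stereo geometry: predicted depth gives a per-pixel disparity d = fB/depth, sampling the target (right) image at x − d reconstructs the source (left) image, and the photometric error trains the depth network. A sketch (the clamping and grid normalization details are assumptions):

```python
import torch
import torch.nn.functional as F

def inverse_warp(right, depth, focal, baseline):
    """Reconstruct the left image by sampling the right image shifted by
    disparity d = focal * baseline / depth (in pixels).
    Shapes: right (B, C, H, W), depth (B, 1, H, W)."""
    B, _, H, W = right.shape
    disp = focal * baseline / depth.clamp(min=1e-3)
    xs = torch.linspace(-1, 1, W, device=right.device).expand(B, H, W)
    ys = torch.linspace(-1, 1, H, device=right.device).view(1, H, 1).expand(B, H, W)
    # shift normalized sampling coordinates by the disparity
    grid = torch.stack([xs - 2 * disp.squeeze(1) / (W - 1), ys], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

# training signal: photometric reconstruction error, no ground-truth depth
# loss = ((inverse_warp(right, depth_net(left), f, B) - left) ** 2).mean()
```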

Energy-based Generative Adversarial Network

Title Energy-based Generative Adversarial Network
Authors Junbo Zhao, Michael Mathieu, Yann LeCun
Abstract We introduce the “Energy-based Generative Adversarial Network” model (EBGAN), which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to probabilistic GANs, the generator is trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows the use of a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of the EBGAN framework using an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images.
Tasks
Published 2016-09-11
URL http://arxiv.org/abs/1609.03126v4
PDF http://arxiv.org/pdf/1609.03126v4.pdf
PWC https://paperswithcode.com/paper/energy-based-generative-adversarial-network
Repo https://github.com/buriburisuri/ebgan
Framework tf
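
With the auto-encoder instantiation, the energy is just the reconstruction error and the objectives take a margin form. A sketch (the margin value is an illustrative hyperparameter, and `D` is assumed to be an auto-encoder):

```python
import torch
import torch.nn.functional as F

def ebgan_losses(D, G, x, z, margin=10.0):
    """Auto-encoder EBGAN: energy = reconstruction error; the discriminator
    pushes real energies down and fake energies up to a margin, while the
    generator minimizes the energy of its samples."""
    energy = lambda v: F.mse_loss(D(v), v)       # reconstruction energy
    fake = G(z)
    d_loss = energy(x) + F.relu(margin - energy(fake.detach()))
    g_loss = energy(fake)
    return d_loss, g_loss
```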

Political Speech Generation

Title Political Speech Generation
Authors Valentin Kassarnig
Abstract In this report we present a system that can generate political speeches for a desired political party. Furthermore, the system allows the user to specify whether a speech should hold a supportive or opposing opinion. The system relies on a combination of several state-of-the-art NLP methods, which are discussed in this report. These include n-grams, the Justeson & Katz POS tag filter, recurrent neural networks, and latent Dirichlet allocation. Sequences of words are generated based on probabilities obtained from two underlying models: a language model takes care of grammatical correctness, while a topic model aims for textual consistency. Both models were trained on the Convote dataset, which contains transcripts from US congressional floor debates. Furthermore, we present a manual and an automated approach to evaluating the quality of generated speeches. In an experimental evaluation, generated speeches showed very high quality in terms of grammatical correctness and sentence transitions.
Tasks Language Modelling
Published 2016-01-13
URL http://arxiv.org/abs/1601.03313v2
PDF http://arxiv.org/pdf/1601.03313v2.pdf
PWC https://paperswithcode.com/paper/political-speech-generation
Repo https://github.com/valentin012/conspeech
Framework none
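
A hedged sketch of the two-model combination: candidate next words are scored by a language model (grammaticality) and a topic model (consistency), then sampled. The geometric mixing and `alpha` are assumptions about how the two scores are combined, not the report's exact formula:

```python
import random

def next_word(history, candidates, ngram_prob, topic_prob, alpha=0.5):
    """Sample the next word from a mix of language-model and topic-model
    probabilities. `ngram_prob(history, w)` and `topic_prob(w)` are
    placeholders for the two trained models."""
    scores = [(w, (ngram_prob(history, w) ** alpha)
                  * (topic_prob(w) ** (1 - alpha))) for w in candidates]
    total = sum(s for _, s in scores)
    r, acc = random.uniform(0, total), 0.0
    for w, s in scores:
        acc += s
        if acc >= r:
            return w
    return scores[-1][0]
```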