Paper Group AWR 101
Input Convex Neural Networks
Title | Input Convex Neural Networks |
Authors | Brandon Amos, Lei Xu, J. Zico Kolter |
Abstract | This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, reinforcement learning, and others. In this paper we lay the basic groundwork for these models, proposing methods for inference, optimization, and learning, and analyzing their representational power. We show that many existing neural network architectures can be made input-convex with a minor modification, and develop specialized optimization algorithms tailored to this setting. Finally, we highlight the performance of the methods on multi-label prediction, image completion, and reinforcement learning problems, where we show improvement over the existing state of the art in many cases. |
Tasks | Imputation, Structured Prediction |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07152v3 |
http://arxiv.org/pdf/1609.07152v3.pdf | |
PWC | https://paperswithcode.com/paper/input-convex-neural-networks |
Repo | https://github.com/locuslab/icnn |
Framework | tf |
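The convexity constraint described in the abstract is concrete enough to sketch: keep the weights on the hidden-state path element-wise non-negative and use a convex, non-decreasing activation, and the scalar output is convex in the input. Below is a minimal numpy sketch under those assumptions; the function and parameter names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def icnn_forward(y, params):
    """Scalar-valued network that is convex in the input y.

    params is a list of (Wz, Wy, b) triples.  Convexity holds because
    each Wz is projected to be element-wise non-negative and ReLU is
    convex and non-decreasing; the Wy matrices are unconstrained.
    """
    z = np.zeros(params[0][0].shape[1])   # z_0 = 0, so the first Wz term vanishes
    for Wz, Wy, b in params:
        z = np.maximum(np.maximum(Wz, 0.0) @ z + Wy @ y + b, 0.0)
    return float(z[0])                    # final layer has a single unit

d, h = 4, 16
params = [
    (rng.standard_normal((h, h)), rng.standard_normal((h, d)), np.zeros(h)),
    (rng.standard_normal((1, h)), rng.standard_normal((1, d)), np.zeros(1)),
]
y1, y2 = rng.standard_normal(d), rng.standard_normal(d)
# Spot-check convexity: f(midpoint) <= average of the endpoint values.
mid = icnn_forward(0.5 * (y1 + y2), params)
avg = 0.5 * (icnn_forward(y1, params) + icnn_forward(y2, params))
assert mid <= avg + 1e-9
```

Inference in the paper then means optimizing over y for fixed parameters, which the convexity guarantee makes a tractable convex problem.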
Temporal Learning and Sequence Modeling for a Job Recommender System
Title | Temporal Learning and Sequence Modeling for a Job Recommender System |
Authors | Kuan Liu, Xing Shi, Anoop Kumar, Linhong Zhu, Prem Natarajan |
Abstract | We present our solution to the job recommendation task for RecSys Challenge 2016. The main contribution of our work is to combine temporal learning with sequence modeling to capture complex user-item activity patterns and thereby improve job recommendations. First, we propose a time-based ranking model applied to historical observations and a hybrid matrix factorization over time-re-weighted interactions. Second, we exploit sequence properties in user-item activities and develop an RNN-based recommendation model. Our solution achieved 5$^{th}$ place in the challenge among more than 100 participants. Notably, the strong performance of our RNN approach shows a promising new direction in employing sequence modeling for recommendation systems. |
Tasks | Recommendation Systems |
Published | 2016-08-11 |
URL | http://arxiv.org/abs/1608.03333v1 |
http://arxiv.org/pdf/1608.03333v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-learning-and-sequence-modeling-for-a |
Repo | https://github.com/skywaLKer518/A-Recsys |
Framework | tf |
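The "time-re-weighted interactions" idea can be illustrated with a generic exponential decay; the paper's exact weighting scheme may differ. A hypothetical numpy sketch (all names and the half-life parameter are ours):

```python
import numpy as np

def reweight_interactions(user_idx, item_idx, timestamps, n_users, n_items,
                          half_life_days=30.0):
    """Build an implicit-feedback matrix where recent user-item events
    count more than old ones (exponential time decay), to be factorized
    by a standard matrix-factorization model afterwards."""
    t = np.asarray(timestamps, dtype=float)
    age_days = (t.max() - t) / 86400.0
    weights = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    R = np.zeros((n_users, n_items))
    np.add.at(R, (user_idx, item_idx), weights)   # accumulate repeated events
    return R
```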
A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans
Title | A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans |
Authors | Yuyin Zhou, Lingxi Xie, Wei Shen, Yan Wang, Elliot K. Fishman, Alan L. Yuille |
Abstract | Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy of some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions that occupy a large fraction of the input volume. In this paper, we formulate this problem as a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset and outperform the state of the art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which supports the reliability of our approach in clinical applications. |
Tasks | Pancreas Segmentation |
Published | 2016-12-25 |
URL | http://arxiv.org/abs/1612.08230v4 |
http://arxiv.org/pdf/1612.08230v4.pdf | |
PWC | https://paperswithcode.com/paper/a-fixed-point-model-for-pancreas-segmentation |
Repo | https://github.com/198808xc/OrganSegC2F |
Framework | caffe2 |
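The test-time procedure described above is a simple loop: segment, crop the input to the predicted mask plus a margin, re-segment, and stop once successive masks agree. A sketch with a placeholder `segment_fn` standing in for the trained network; the margin and the Dice-based stopping rule are our assumptions:

```python
import numpy as np

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / max(a.sum() + b.sum(), 1)

def fixed_point_segment(volume, segment_fn, max_iters=10, tol=0.95, margin=20):
    """Iterative coarse-to-fine segmentation as a fixed-point process.

    segment_fn(subvolume) -> binary mask of the same shape (a trained
    network in the paper; any callable here).  The predicted mask defines
    a bounding box that shrinks the input for the next pass; iteration
    stops once successive masks agree (Dice >= tol), i.e. we are near a
    fixed point of the update.
    """
    mask = segment_fn(volume)                    # first pass on the whole scan
    for _ in range(max_iters):
        idx = np.nonzero(mask)
        if idx[0].size == 0:
            break
        lo = [max(int(i.min()) - margin, 0) for i in idx]
        hi = [min(int(i.max()) + margin + 1, s) for i, s in zip(idx, mask.shape)]
        box = tuple(slice(l, h) for l, h in zip(lo, hi))
        new_mask = np.zeros_like(mask)
        new_mask[box] = segment_fn(volume[box])  # re-segment the cropped region
        if dice(new_mask, mask) >= tol:
            return new_mask
        mask = new_mask
    return mask
```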
Deep Fully-Connected Networks for Video Compressive Sensing
Title | Deep Fully-Connected Networks for Video Compressive Sensing |
Authors | Michael Iliadis, Leonidas Spinoulas, Aggelos K. Katsaggelos |
Abstract | In this work we present a deep learning framework for video compressive sensing. The proposed formulation enables recovery of video frames in a few seconds at significantly improved reconstruction quality compared to previous approaches. Our investigation starts by learning a linear mapping between video sequences and corresponding measured frames, which turns out to provide promising results. We then extend the linear formulation to deep fully-connected networks and explore the performance gains using deeper architectures. Our analysis is always driven by the applicability of the proposed framework on existing compressive video architectures. Extensive simulations on several video sequences document the superiority of our approach both quantitatively and qualitatively. Finally, our analysis offers insights into understanding how dataset sizes and number of layers affect reconstruction performance, while raising a few points for future investigation. Code is available on GitHub: https://github.com/miliadis/DeepVideoCS |
Tasks | Compressive Sensing, Video Compressive Sensing |
Published | 2016-03-16 |
URL | http://arxiv.org/abs/1603.04930v2 |
http://arxiv.org/pdf/1603.04930v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-fully-connected-networks-for-video |
Repo | https://github.com/miliadis/DeepVideoCS |
Framework | pytorch |
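The linear baseline the authors start from can be reproduced in a few lines: learn a decoder W by ridge regression so that recovery is a single matrix multiply. A toy sketch with a random sensing matrix standing in for the paper's per-frame measurement masks; all sizes and the regularizer are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy temporal-CS setup: each block of T frames is compressed to one
# measured frame; Phi stands in for the per-frame binary masks.
T, P, N = 4, 64, 2000               # frames/block, pixels/patch, training patches
X = rng.random((T * P, N))          # vectorized video patches (training data)
Phi = rng.integers(0, 2, (P, T * P)).astype(float)  # sensing matrix
Y = Phi @ X                         # simulated measurements

# Linear decoder: W = argmin ||X - W Y||^2 + lam ||W||^2 (ridge regression).
lam = 1e-3
W = X @ Y.T @ np.linalg.inv(Y @ Y.T + lam * np.eye(P))

x_test = rng.random((T * P, 1))
x_hat = W @ (Phi @ x_test)          # recovery is a single matrix multiply
```

The deep fully-connected networks in the paper replace the single matrix W with a stack of learned layers, trained on the same (measurement, patch) pairs.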
Deep Learning for Music
Title | Deep Learning for Music |
Authors | Allen Huang, Raymond Wu |
Abstract | Our goal is to build a generative model from a deep neural network architecture that creates music with both harmony and melody, passable as music composed by humans. Previous work in music generation has mainly focused on creating a single melody. More recent work on polyphonic music modeling, centered around time series probability density estimation, has met with partial success. In particular, there has been a lot of work based on Recurrent Neural Networks combined with Restricted Boltzmann Machines (RNN-RBM) and other similar recurrent energy-based models. Our approach, however, is to perform end-to-end learning and generation with deep neural nets alone. |
Tasks | Density Estimation, Music Generation, Music Modeling, Time Series |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04930v1 |
http://arxiv.org/pdf/1606.04930v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-music |
Repo | https://github.com/sarthak15169/Deep-Music |
Framework | none |
Word Ordering Without Syntax
Title | Word Ordering Without Syntax |
Authors | Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber |
Abstract | Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence. We find that, in fact, an n-gram language model with a simple heuristic gives strong results on this task. Furthermore, we show that a long short-term memory (LSTM) language model is even more effective at recovering order, with our basic model outperforming a state-of-the-art syntactic model by 11.5 BLEU points. Additional data and larger beams yield further gains, at the expense of training and search time. |
Tasks | Language Modelling |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1604.08633v2 |
http://arxiv.org/pdf/1604.08633v2.pdf | |
PWC | https://paperswithcode.com/paper/word-ordering-without-syntax |
Repo | https://github.com/allenschmaltz/word_ordering |
Framework | none |
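The method is essentially beam search over a bag of words scored by a language model, with no syntactic structure involved. A sketch with a toy bigram model (the paper uses n-gram and LSTM LMs with much larger beams); the `logp` scores below are made up purely for illustration:

```python
from heapq import nlargest

def order_words(bag, logp, beam=8):
    """Recover a word order by beam search under a bigram language model.

    bag: list of tokens (a multiset); logp(prev, w) -> log P(w | prev).
    The only signal is the LM score, mirroring the syntax-free setup.
    """
    beams = [(0.0, ("<s>",), tuple(sorted(bag)))]  # (score, prefix, remaining)
    for _ in range(len(bag)):
        cand = []
        for score, prefix, rest in beams:
            for i, w in enumerate(rest):
                if i > 0 and rest[i] == rest[i - 1]:
                    continue                       # skip duplicate expansions
                cand.append((score + logp(prefix[-1], w),
                             prefix + (w,),
                             rest[:i] + rest[i + 1:]))
        beams = nlargest(beam, cand, key=lambda c: c[0])
    _, prefix, _ = max(beams, key=lambda c: c[0])
    return list(prefix[1:])

# Toy bigram scores (invented for illustration); unseen bigrams get -5.
bigram = {("<s>", "the"): -0.5, ("the", "dog"): -0.7, ("dog", "barks"): -0.9}
logp = lambda p, w: bigram.get((p, w), -5.0)
print(order_words(["barks", "dog", "the"], logp))  # -> ['the', 'dog', 'barks']
```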
On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference
Title | On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference |
Authors | Anish Acharya, Uddipan Mukherjee, Charless Fowlkes |
Abstract | Estimates of image gradients play a ubiquitous role in image segmentation and classification problems, since gradients directly relate to the boundaries or edges of a scene. This paper proposes a unified approach to gradient estimation based on fractional calculus that is computationally cheap and readily applicable to any existing algorithm that relies on image gradients. We show experiments on edge detection and image segmentation on the Stanford Backgrounds Dataset, where these improved local gradients outperform the state of the art, achieving 79.2% average accuracy. |
Tasks | Edge Detection, Semantic Segmentation |
Published | 2016-05-07 |
URL | http://arxiv.org/abs/1605.02240v1 |
http://arxiv.org/pdf/1605.02240v1.pdf | |
PWC | https://paperswithcode.com/paper/on-image-segmentation-using-fractional |
Repo | https://github.com/anishacharya/Image-Segmentation-FDOG-TRW |
Framework | none |
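A standard way to realize fractional-order gradients is the Grünwald-Letnikov expansion, whose mask coefficients follow a simple recurrence. We assume this common construction here as a sketch; the paper's exact discretization may differ:

```python
import numpy as np

def gl_fractional_diff(signal, alpha, n_terms=16):
    """Fractional-order derivative via the Grunwald-Letnikov expansion.

    D^alpha f(x) ~ sum_k w_k f(x - k), with w_0 = 1 and the recurrence
    w_k = w_{k-1} * (k - 1 - alpha) / k, i.e. w_k = (-1)^k C(alpha, k).
    With alpha = 1 this reduces to the ordinary backward difference.
    """
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    padded = np.pad(np.asarray(signal, dtype=float), (n_terms - 1, 0),
                    mode="edge")
    return np.convolve(padded, w, mode="valid")

def fractional_gradient(image, alpha=0.8):
    """Apply the 1-D fractional difference along rows and columns and
    return the gradient magnitude, as used for edge maps."""
    gx = np.apply_along_axis(gl_fractional_diff, 1, image, alpha)
    gy = np.apply_along_axis(gl_fractional_diff, 0, image, alpha)
    return np.hypot(gx, gy)
```

Because the result is just a drop-in gradient estimate, it can feed any downstream edge detector or CRF-based segmenter unchanged, which is the "readily applicable" claim in the abstract.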
End-to-end Learning of Driving Models from Large-scale Video Datasets
Title | End-to-end Learning of Driving Models from Large-scale Video Datasets |
Authors | Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell |
Abstract | Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. |
Tasks | Scene Segmentation |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01079v2 |
http://arxiv.org/pdf/1612.01079v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-driving-models-from |
Repo | https://github.com/NupurBhaisare/BDD-Model |
Framework | tf |
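A hedged sketch of the FCN-LSTM idea: per-frame convolutional features, concatenated with the previous vehicle state, feed an LSTM that outputs a distribution over discretized egomotion. The layer sizes, action set, and module names below are ours, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FCNLSTM(nn.Module):
    """Per-frame conv features + previous vehicle state -> LSTM ->
    distribution over discretized future egomotion (e.g. straight,
    left, right, stop).  Illustrative sizes only."""

    def __init__(self, n_actions=4, state_dim=2, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(              # tiny fully-conv encoder
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32 + state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames, states):
        # frames: (B, T, 3, H, W); states: (B, T, state_dim)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1).view(B, T, -1)
        h, _ = self.lstm(torch.cat([feats, states], dim=-1))
        return self.head(h).log_softmax(-1)        # log P(action) per step

model = FCNLSTM()
out = model(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 2))  # (2, 5, 4)
```

The segmentation side task in the paper would add a decoder branch on the conv features, trained jointly as privileged supervision.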
Learning a Natural Language Interface with Neural Programmer
Title | Learning a Natural Language Interface with Neural Programmer |
Authors | Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei |
Abstract | Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it to WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision from question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive with the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser. |
Tasks | Question Answering |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08945v4 |
http://arxiv.org/pdf/1611.08945v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-natural-language-interface-with |
Repo | https://github.com/pramodkaushik/np_analysis |
Framework | tf |
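The "built-in discrete operations" become trainable because operations are selected softly: the model outputs a distribution over ops, and the step result is the probability-weighted mix, keeping everything differentiable from question-answer pairs alone. A toy one-step sketch with a tiny op set (the real model composes several steps over table columns and rows):

```python
import numpy as np

def soft_program_step(column, op_probs):
    """One Neural Programmer-style step: a probability-weighted mix of
    discrete table operations, so gradients flow through op selection."""
    ops = {
        "count": float(len(column)),
        "sum":   float(np.sum(column)),
        "max":   float(np.max(column)),
        "min":   float(np.min(column)),
    }
    return sum(op_probs[name] * val for name, val in ops.items())

col = np.array([3.0, 7.0, 2.0])
# A hypothetical selector output that is nearly certain the answer is a max.
probs = {"count": 0.02, "sum": 0.02, "max": 0.94, "min": 0.02}
print(soft_program_step(col, probs))   # ~= 6.92, close to the hard answer 7
```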
Learning Non-Lambertian Object Intrinsics across ShapeNet Categories
Title | Learning Non-Lambertian Object Intrinsics across ShapeNet Categories |
Authors | Jian Shi, Yue Dong, Hao Su, Stella X. Yu |
Abstract | We consider the non-Lambertian object intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. We build a large-scale object intrinsics database based on existing 3D models in the ShapeNet database. Rendered with realistic environment maps, millions of synthetic images of objects and their corresponding albedo, shading, and specular ground-truth images are used to train an encoder-decoder CNN. Once trained, the network can decompose an image into the product of albedo and shading components, along with an additive specular component. Our CNN delivers accurate and sharp results in this classical inverse problem of computer vision, with the sharp details attributable to skip connections at corresponding resolutions from the encoder to the decoder. Benchmarked on our ShapeNet and MIT intrinsics datasets, our model consistently outperforms the state of the art by a large margin. We train and test our CNN on different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very well across categories. Our analysis shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our synthetic-data-trained model to images and videos downloaded from the internet, and observe robust and realistic intrinsics results. High-quality non-Lambertian intrinsics could open up many interesting applications such as image-based albedo and specular editing. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08510v1 |
http://arxiv.org/pdf/1612.08510v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-non-lambertian-object-intrinsics |
Repo | https://github.com/shi-jian/shapenet-intrinsics |
Framework | torch |
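The decomposition being learned is the non-Lambertian image formation model, image = albedo x shading + specular. A small numpy sketch of the forward model and the recombination check it implies; the shapes and helper name are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-Lambertian image formation: element-wise product for the diffuse
# term, plus an additive specular layer.
H, W = 4, 4
albedo   = rng.random((H, W, 3))         # per-channel diffuse reflectance
shading  = rng.random((H, W, 1))         # grayscale diffuse shading
specular = 0.2 * rng.random((H, W, 1))   # additive highlights
image = albedo * shading + specular      # broadcasts over channels

def recombination_error(pred_albedo, pred_shading, pred_specular, target):
    """MSE between the input and the recomposed prediction: a natural
    sanity check (and loss term) for any predicted decomposition."""
    recon = pred_albedo * pred_shading + pred_specular
    return float(np.mean((recon - target) ** 2))

assert recombination_error(albedo, shading, specular, image) == 0.0
```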
Effectiveness of greedily collecting items in open world games
Title | Effectiveness of greedily collecting items in open world games |
Authors | Andrej Gajduk |
Abstract | Since Pokemon Go sent millions on the quest of collecting virtual monsters, an important question has been on the minds of many people: Is going after the closest item first a time- and cost-effective way to play? Here, we show that this is in fact a good strategy, performing on average only 7% worse than the best possible solution in terms of the total distance traveled to gather all the items. Even when accounting for errors due to people's inability to accurately judge distances by eye, the strategy performs only 16% worse than the optimal solution. |
Tasks | |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.06175v1 |
http://arxiv.org/pdf/1608.06175v1.pdf | |
PWC | https://paperswithcode.com/paper/effectiveness-of-greedily-collecting-items-in |
Repo | https://github.com/gajduk/greedy-tsp |
Framework | none |
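The claim is easy to reproduce on toy instances: compare the nearest-item-first tour against the brute-force optimum. A self-contained sketch (instance size, geometry, and the fixed starting point are our choices):

```python
import numpy as np
from itertools import permutations

def tour_length(points, order):
    """Total distance to visit items in the given order, starting at index 0."""
    path = [0] + list(order)
    return sum(np.linalg.norm(points[a] - points[b])
               for a, b in zip(path, path[1:]))

def greedy_order(points):
    """Always walk to the closest unvisited item (the strategy studied)."""
    todo, order, cur = set(range(1, len(points))), [], 0
    while todo:
        cur = min(todo, key=lambda j: np.linalg.norm(points[cur] - points[j]))
        order.append(cur)
        todo.remove(cur)
    return order

rng = np.random.default_rng(1)
pts = rng.random((9, 2))                  # player at pts[0], 8 items
greedy = tour_length(pts, greedy_order(pts))
best = min(tour_length(pts, p) for p in permutations(range(1, len(pts))))
print(f"greedy is {100 * (greedy / best - 1):.1f}% longer than optimal")
```

Averaging this ratio over many random instances is exactly the kind of experiment behind the 7% figure.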
AGA: Attribute Guided Augmentation
Title | AGA: Attribute Guided Augmentation |
Authors | Mandar Dixit, Roland Kwitt, Marc Niethammer, Nuno Vasconcelos |
Abstract | We consider the problem of data augmentation, i.e., generating artificial samples to extend a given corpus of training data. Specifically, we propose attribute-guided augmentation (AGA), which learns a mapping that allows us to synthesize data such that an attribute of a synthesized sample is at a desired value or strength. This is particularly interesting in situations where only little data, without attribute annotations, is available for learning, but we have access to a large external corpus of heavily annotated samples. While prior works primarily augment in the space of images, we propose to perform augmentation in feature space instead. We implement our approach as a deep encoder-decoder architecture that learns the synthesis function in an end-to-end manner. We demonstrate the utility of our approach on the problems of (1) one-shot object recognition in a transfer-learning setting where we have no prior knowledge of the new classes, as well as (2) object-based one-shot scene recognition. As external data, we leverage 3D depth and pose information from the SUN RGB-D dataset. Our experiments show that attribute-guided augmentation of high-level CNN features considerably improves one-shot recognition performance on both problems. |
Tasks | Data Augmentation, Object Recognition, Scene Recognition, Transfer Learning |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02559v2 |
http://arxiv.org/pdf/1612.02559v2.pdf | |
PWC | https://paperswithcode.com/paper/aga-attribute-guided-augmentation |
Repo | https://github.com/rkwitt/GuidedAugmentation |
Framework | torch |
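A hedged sketch of the core component: an encoder-decoder that maps an object's CNN feature plus a target attribute value (e.g. depth or pose) to a synthesized feature at that attribute strength. The dimensions and the training note are our reading of the abstract, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AGARegressor(nn.Module):
    """Feature-space synthesis: (feature, target attribute) -> feature
    whose attribute has the desired strength.  Sizes are illustrative."""

    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),  # +1: target attribute
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feat, target_attr):
        return self.net(torch.cat([feat, target_attr], dim=-1))

# Hypothetical use: regress features of annotated external data at target
# attribute values during training, then synthesize extra features for
# one-shot classes at new attribute strengths at augmentation time.
model = AGARegressor()
x = torch.randn(32, 256)              # CNN features of detected objects
a_target = torch.full((32, 1), 2.5)   # e.g. a desired depth of 2.5 m
x_aug = model(x, a_target)            # synthesized features, shape (32, 256)
```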
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
Title | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue |
Authors | Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid |
Abstract | A significant weakness of most current deep convolutional neural networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two, such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset (without any further augmentation) gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation. |
Tasks | Calibration, Depth Estimation |
Published | 2016-03-16 |
URL | http://arxiv.org/abs/1603.04992v2 |
http://arxiv.org/pdf/1603.04992v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-cnn-for-single-view-depth |
Repo | https://github.com/Ravi-Garg/Unsupervised_Depth_Estimation |
Framework | none |
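The training signal is the photometric reconstruction error after inversely warping the target view using the predicted depth and the known stereo geometry. A simplified numpy sketch of that loss (sign conventions and the sampling are simplified to 1-D linear interpolation per row; the paper uses a differentiable warp inside the network):

```python
import numpy as np

def photometric_loss(src, tgt, depth, focal, baseline):
    """Unsupervised depth supervision, sketched.

    Predicted depth + known focal length and stereo baseline give a
    per-pixel disparity; sampling the target image at horizontally
    shifted coordinates reconstructs the source, and the reconstruction
    error trains the depth network with no ground-truth depth.
    """
    H, W = src.shape
    disparity = focal * baseline / np.maximum(depth, 1e-6)
    xs = np.arange(W, dtype=float)
    recon = np.empty_like(src, dtype=float)
    for r in range(H):
        recon[r] = np.interp(xs - disparity[r], xs, tgt[r])  # inverse warp
    return np.mean((recon - src) ** 2)
```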
Energy-based Generative Adversarial Network
Title | Energy-based Generative Adversarial Network |
Authors | Junbo Zhao, Michael Mathieu, Yann LeCun |
Abstract | We introduce the “Energy-based Generative Adversarial Network” model (EBGAN), which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function makes it possible to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of the EBGAN framework that uses an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images. |
Tasks | |
Published | 2016-09-11 |
URL | http://arxiv.org/abs/1609.03126v4 |
http://arxiv.org/pdf/1609.03126v4.pdf | |
PWC | https://paperswithcode.com/paper/energy-based-generative-adversarial-network |
Repo | https://github.com/buriburisuri/ebgan |
Framework | tf |
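The auto-encoder instantiation in the abstract pins down the losses: the energy of an image is its reconstruction error, the discriminator pushes real energies down and fake energies up to a margin, and the generator minimizes fake energy. A PyTorch sketch under those assumptions (the margin value is illustrative; `autoencoder` is any nn.Module mapping images to reconstructions):

```python
import torch
import torch.nn.functional as F

def ebgan_losses(x_real, x_fake, autoencoder, margin=10.0):
    """EBGAN objectives with an auto-encoder discriminator:
    D loss = E(x_real) + max(0, margin - E(x_fake)),
    G loss = E(x_fake), where E is per-sample reconstruction error."""
    def energy(x):
        return F.mse_loss(autoencoder(x), x,
                          reduction="none").flatten(1).mean(1)

    d_loss = (energy(x_real).mean()
              + F.relu(margin - energy(x_fake.detach())).mean())
    g_loss = energy(x_fake).mean()
    return d_loss, g_loss
```

The margin plays the role of the hinge in a binary discriminator: once a fake sample's energy exceeds it, the discriminator stops pushing, which is one source of the training stability the abstract reports.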
Political Speech Generation
Title | Political Speech Generation |
Authors | Valentin Kassarnig |
Abstract | In this report we present a system that can generate political speeches for a desired political party. Furthermore, the system allows the user to specify whether a speech should express a supportive or an opposing opinion. The system relies on a combination of several state-of-the-art NLP methods, which are discussed in this report. These include n-grams, the Justeson & Katz POS tag filter, recurrent neural networks, and latent Dirichlet allocation. Sequences of words are generated based on probabilities obtained from two underlying models: a language model takes care of grammatical correctness while a topic model aims for textual consistency. Both models were trained on the Convote dataset, which contains transcripts from US congressional floor debates. Furthermore, we present a manual and an automated approach to evaluating the quality of generated speeches. In an experimental evaluation, generated speeches showed very high quality in terms of grammatical correctness and sentence transitions. |
Tasks | Language Modelling |
Published | 2016-01-13 |
URL | http://arxiv.org/abs/1601.03313v2 |
http://arxiv.org/pdf/1601.03313v2.pdf | |
PWC | https://paperswithcode.com/paper/political-speech-generation |
Repo | https://github.com/valentin012/conspeech |
Framework | none |
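The two-model generation scheme (a language model for grammar, a topic model for consistency) amounts to sampling the next word from a product of the two distributions. A sketch where `p_lm` and `p_topic` are stand-ins for the trained n-gram/RNN language model and the LDA topic model; the blending exponent is our addition:

```python
import random

def sample_next_word(history, vocab, p_lm, p_topic, power=1.0):
    """Sample the next word from the product of the language-model and
    topic-model probabilities.  power tempers the topic model's
    influence (1.0 = plain product)."""
    scores = [p_lm(history, w) * p_topic(w) ** power for w in vocab]
    return random.choices(vocab, weights=scores)[0]
```

Repeating this step, with the sampled word appended to the history, yields speech text that is simultaneously grammatical (LM) and on-message (topic model).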