Paper Group AWR 101
Input Convex Neural Networks
Title | Input Convex Neural Networks |
Authors | Brandon Amos, Lei Xu, J. Zico Kolter |
Abstract | This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, reinforcement learning, and others. In this paper we lay the basic groundwork for these models, proposing methods for inference, optimization, and learning, and analyzing their representational power. We show that many existing neural network architectures can be made input-convex with a minor modification, and develop specialized optimization algorithms tailored to this setting. Finally, we highlight the performance of the methods on multi-label prediction, image completion, and reinforcement learning problems, where we show improvement over the existing state of the art in many cases. |
Tasks | Imputation, Structured Prediction |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07152v3 |
http://arxiv.org/pdf/1609.07152v3.pdf | |
PWC | https://paperswithcode.com/paper/input-convex-neural-networks |
Repo | https://github.com/locuslab/icnn |
Framework | tf |
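The convexity constraint described in the abstract is concrete enough to sketch: keep the weights on the hidden-state path element-wise non-negative and use a convex, non-decreasing activation, and the scalar output is convex in the input. Below is a minimal numpy sketch under those assumptions; the function and parameter names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def icnn_forward(y, params):
    """Scalar-valued network that is convex in the input y.

    params is a list of (Wz, Wy, b) triples.  Convexity holds because
    each Wz is projected to be element-wise non-negative and ReLU is
    convex and non-decreasing; the Wy matrices are unconstrained.
    """
    z = np.zeros(params[0][0].shape[1])   # z_0 = 0, so the first Wz term vanishes
    for Wz, Wy, b in params:
        z = np.maximum(np.maximum(Wz, 0.0) @ z + Wy @ y + b, 0.0)
    return float(z[0])                    # final layer has a single unit

d, h = 4, 16
params = [
    (rng.standard_normal((h, h)), rng.standard_normal((h, d)), np.zeros(h)),
    (rng.standard_normal((1, h)), rng.standard_normal((1, d)), np.zeros(1)),
]
y1, y2 = rng.standard_normal(d), rng.standard_normal(d)
# Spot-check convexity: f(midpoint) <= average of the endpoint values.
mid = icnn_forward(0.5 * (y1 + y2), params)
avg = 0.5 * (icnn_forward(y1, params) + icnn_forward(y2, params))
assert mid <= avg + 1e-9
```

Inference in the paper then means optimizing over y for fixed parameters, which the convexity guarantee makes a tractable convex problem.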
Temporal Learning and Sequence Modeling for a Job Recommender System
Title | Temporal Learning and Sequence Modeling for a Job Recommender System |
Authors | Kuan Liu, Xing Shi, Anoop Kumar, Linhong Zhu, Prem Natarajan |
Abstract | We present our solution to the job recommendation task for RecSys Challenge 2016. The main contribution of our work is to combine temporal learning with sequence modeling to capture complex user-item activity patterns and thereby improve job recommendations. First, we propose a time-based ranking model applied to historical observations and a hybrid matrix factorization over time-re-weighted interactions. Second, we exploit sequence properties in user-item activities and develop an RNN-based recommendation model. Our solution achieved 5$^{th}$ place in the challenge among more than 100 participants. Notably, the strong performance of our RNN approach shows a promising new direction in employing sequence modeling for recommendation systems. |
Tasks | Recommendation Systems |
Published | 2016-08-11 |
URL | http://arxiv.org/abs/1608.03333v1 |
http://arxiv.org/pdf/1608.03333v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-learning-and-sequence-modeling-for-a |
Repo | https://github.com/skywaLKer518/A-Recsys |
Framework | tf |
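The "time-re-weighted interactions" idea can be illustrated with a generic exponential decay; the paper's exact weighting scheme may differ. A hypothetical numpy sketch (all names and the half-life parameter are ours):

```python
import numpy as np

def reweight_interactions(user_idx, item_idx, timestamps, n_users, n_items,
                          half_life_days=30.0):
    """Build an implicit-feedback matrix where recent user-item events
    count more than old ones (exponential time decay), to be factorized
    by a standard matrix-factorization model afterwards."""
    t = np.asarray(timestamps, dtype=float)
    age_days = (t.max() - t) / 86400.0
    weights = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    R = np.zeros((n_users, n_items))
    np.add.at(R, (user_idx, item_idx), weights)   # accumulate repeated events
    return R
```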
A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans
Title | A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans |
Authors | Yuyin Zhou, Lingxi Xie, Wei Shen, Yan Wang, Elliot K. Fishman, Alan L. Yuille |
Abstract | Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy of some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions that occupy a large fraction of the input volume. In this paper, we formulate this problem as a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset and outperform the state of the art by more than 4%, measured by the average Dice-Sørensen Coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which supports the reliability of our approach in clinical applications. |
Tasks | Pancreas Segmentation |
Published | 2016-12-25 |
URL | http://arxiv.org/abs/1612.08230v4 |
http://arxiv.org/pdf/1612.08230v4.pdf | |
PWC | https://paperswithcode.com/paper/a-fixed-point-model-for-pancreas-segmentation |
Repo | https://github.com/198808xc/OrganSegC2F |
Framework | caffe2 |
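The test-time procedure described above is a simple loop: segment, crop the input to the predicted mask plus a margin, re-segment, and stop once successive masks agree. A sketch with a placeholder `segment_fn` standing in for the trained network; the margin and the Dice-based stopping rule are our assumptions:

```python
import numpy as np

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / max(a.sum() + b.sum(), 1)

def fixed_point_segment(volume, segment_fn, max_iters=10, tol=0.95, margin=20):
    """Iterative coarse-to-fine segmentation as a fixed-point process.

    segment_fn(subvolume) -> binary mask of the same shape (a trained
    network in the paper; any callable here).  The predicted mask defines
    a bounding box that shrinks the input for the next pass; iteration
    stops once successive masks agree (Dice >= tol), i.e. we are near a
    fixed point of the update.
    """
    mask = segment_fn(volume)                    # first pass on the whole scan
    for _ in range(max_iters):
        idx = np.nonzero(mask)
        if idx[0].size == 0:
            break
        lo = [max(int(i.min()) - margin, 0) for i in idx]
        hi = [min(int(i.max()) + margin + 1, s) for i, s in zip(idx, mask.shape)]
        box = tuple(slice(l, h) for l, h in zip(lo, hi))
        new_mask = np.zeros_like(mask)
        new_mask[box] = segment_fn(volume[box])  # re-segment the cropped region
        if dice(new_mask, mask) >= tol:
            return new_mask
        mask = new_mask
    return mask
```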
Deep Fully-Connected Networks for Video Compressive Sensing
Title | Deep Fully-Connected Networks for Video Compressive Sensing |
Authors | Michael Iliadis, Leonidas Spinoulas, Aggelos K. Katsaggelos |
Abstract | In this work we present a deep learning framework for video compressive sensing. The proposed formulation enables recovery of video frames in a few seconds at significantly improved reconstruction quality compared to previous approaches. Our investigation starts by learning a linear mapping between video sequences and corresponding measured frames, which turns out to provide promising results. We then extend the linear formulation to deep fully-connected networks and explore the performance gains using deeper architectures. Our analysis is always driven by the applicability of the proposed framework on existing compressive video architectures. Extensive simulations on several video sequences document the superiority of our approach both quantitatively and qualitatively. Finally, our analysis offers insights into understanding how dataset sizes and number of layers affect reconstruction performance, while raising a few points for future investigation. Code is available on GitHub: https://github.com/miliadis/DeepVideoCS |
Tasks | Compressive Sensing, Video Compressive Sensing |
Published | 2016-03-16 |
URL | http://arxiv.org/abs/1603.04930v2 |
http://arxiv.org/pdf/1603.04930v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-fully-connected-networks-for-video |
Repo | https://github.com/miliadis/DeepVideoCS |
Framework | pytorch |
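The linear baseline the authors start from can be reproduced in a few lines: learn a decoder W by ridge regression so that recovery is a single matrix multiply. A toy sketch with a random sensing matrix standing in for the paper's per-frame measurement masks; all sizes and the regularizer are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy temporal-CS setup: each block of T frames is compressed to one
# measured frame; Phi stands in for the per-frame binary masks.
T, P, N = 4, 64, 2000               # frames/block, pixels/patch, training patches
X = rng.random((T * P, N))          # vectorized video patches (training data)
Phi = rng.integers(0, 2, (P, T * P)).astype(float)  # sensing matrix
Y = Phi @ X                         # simulated measurements

# Linear decoder: W = argmin ||X - W Y||^2 + lam ||W||^2 (ridge regression).
lam = 1e-3
W = X @ Y.T @ np.linalg.inv(Y @ Y.T + lam * np.eye(P))

x_test = rng.random((T * P, 1))
x_hat = W @ (Phi @ x_test)          # recovery is a single matrix multiply
```

The deep fully-connected networks in the paper replace the single matrix W with a stack of learned layers, trained on the same (measurement, patch) pairs.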
Deep Learning for Music
Title | Deep Learning for Music |
Authors | Allen Huang, Raymond Wu |
Abstract | Our goal is to build a generative model from a deep neural network architecture that creates music with both harmony and melody, passable as music composed by humans. Previous work in music generation has mainly focused on creating a single melody. More recent work on polyphonic music modeling, centered around time series probability density estimation, has met with partial success. In particular, there has been a lot of work based on Recurrent Neural Networks combined with Restricted Boltzmann Machines (RNN-RBM) and other similar recurrent energy-based models. Our approach, however, is to perform end-to-end learning and generation with deep neural nets alone. |
Tasks | Density Estimation, Music Generation, Music Modeling, Time Series |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04930v1 |
http://arxiv.org/pdf/1606.04930v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-music |
Repo | https://github.com/sarthak15169/Deep-Music |
Framework | none |
Word Ordering Without Syntax
Title | Word Ordering Without Syntax |
Authors | Allen Schmaltz, Alexander M. Rush, Stuart M. Shieber |
Abstract | Recent work on word ordering has argued that syntactic structure is important, or even required, for effectively recovering the order of a sentence. We find that, in fact, an n-gram language model with a simple heuristic gives strong results on this task. Furthermore, we show that a long short-term memory (LSTM) language model is even more effective at recovering order, with our basic model outperforming a state-of-the-art syntactic model by 11.5 BLEU points. Additional data and larger beams yield further gains, at the expense of training and search time. |
Tasks | Language Modelling |
Published | 2016-04-28 |
URL | http://arxiv.org/abs/1604.08633v2 |
http://arxiv.org/pdf/1604.08633v2.pdf | |
PWC | https://paperswithcode.com/paper/word-ordering-without-syntax |
Repo | https://github.com/allenschmaltz/word_ordering |
Framework | none |
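The method is essentially beam search over a bag of words scored by a language model, with no syntactic structure involved. A sketch with a toy bigram model (the paper uses n-gram and LSTM LMs with much larger beams); the `logp` scores below are made up purely for illustration:

```python
from heapq import nlargest

def order_words(bag, logp, beam=8):
    """Recover a word order by beam search under a bigram language model.

    bag: list of tokens (a multiset); logp(prev, w) -> log P(w | prev).
    The only signal is the LM score, mirroring the syntax-free setup.
    """
    beams = [(0.0, ("<s>",), tuple(sorted(bag)))]  # (score, prefix, remaining)
    for _ in range(len(bag)):
        cand = []
        for score, prefix, rest in beams:
            for i, w in enumerate(rest):
                if i > 0 and rest[i] == rest[i - 1]:
                    continue                       # skip duplicate expansions
                cand.append((score + logp(prefix[-1], w),
                             prefix + (w,),
                             rest[:i] + rest[i + 1:]))
        beams = nlargest(beam, cand, key=lambda c: c[0])
    _, prefix, _ = max(beams, key=lambda c: c[0])
    return list(prefix[1:])

# Toy bigram scores (invented for illustration); unseen bigrams get -5.
bigram = {("<s>", "the"): -0.5, ("the", "dog"): -0.7, ("dog", "barks"): -0.9}
logp = lambda p, w: bigram.get((p, w), -5.0)
print(order_words(["barks", "dog", "the"], logp))  # -> ['the', 'dog', 'barks']
```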
On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference
Title | On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference |
Authors | Anish Acharya, Uddipan Mukherjee, Charless Fowlkes |
Abstract | Estimates of image gradients play a ubiquitous role in image segmentation and classification problems, since gradients directly relate to the boundaries or edges of a scene. This paper proposes a unified approach to gradient estimation based on fractional calculus that is computationally cheap and readily applicable to any existing algorithm that relies on image gradients. We show experiments on edge detection and image segmentation on the Stanford Backgrounds Dataset, where these improved local gradients outperform the state of the art, achieving 79.2% average accuracy. |
Tasks | Edge Detection, Semantic Segmentation |
Published | 2016-05-07 |
URL | http://arxiv.org/abs/1605.02240v1 |
http://arxiv.org/pdf/1605.02240v1.pdf | |
PWC | https://paperswithcode.com/paper/on-image-segmentation-using-fractional |
Repo | https://github.com/anishacharya/Image-Segmentation-FDOG-TRW |
Framework | none |
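A standard way to realize fractional-order gradients is the Grünwald-Letnikov expansion, whose mask coefficients follow a simple recurrence. We assume this common construction here as a sketch; the paper's exact discretization may differ:

```python
import numpy as np

def gl_fractional_diff(signal, alpha, n_terms=16):
    """Fractional-order derivative via the Grunwald-Letnikov expansion.

    D^alpha f(x) ~ sum_k w_k f(x - k), with w_0 = 1 and the recurrence
    w_k = w_{k-1} * (k - 1 - alpha) / k, i.e. w_k = (-1)^k C(alpha, k).
    With alpha = 1 this reduces to the ordinary backward difference.
    """
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    padded = np.pad(np.asarray(signal, dtype=float), (n_terms - 1, 0),
                    mode="edge")
    return np.convolve(padded, w, mode="valid")

def fractional_gradient(image, alpha=0.8):
    """Apply the 1-D fractional difference along rows and columns and
    return the gradient magnitude, as used for edge maps."""
    gx = np.apply_along_axis(gl_fractional_diff, 1, image, alpha)
    gy = np.apply_along_axis(gl_fractional_diff, 0, image, alpha)
    return np.hypot(gx, gy)
```

Because the result is just a drop-in gradient estimate, it can feed any downstream edge detector or CRF-based segmenter unchanged, which is the "readily applicable" claim in the abstract.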
End-to-end Learning of Driving Models from Large-scale Video Datasets
Title | End-to-end Learning of Driving Models from Large-scale Video Datasets |
Authors | Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell |
Abstract | Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. |
Tasks | Scene Segmentation |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01079v2 |
http://arxiv.org/pdf/1612.01079v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-driving-models-from |
Repo | https://github.com/NupurBhaisare/BDD-Model |
Framework | tf |
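A hedged sketch of the FCN-LSTM idea: per-frame convolutional features, concatenated with the previous vehicle state, feed an LSTM that outputs a distribution over discretized egomotion. The layer sizes, action set, and module names below are ours, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FCNLSTM(nn.Module):
    """Per-frame conv features + previous vehicle state -> LSTM ->
    distribution over discretized future egomotion (e.g. straight,
    left, right, stop).  Illustrative sizes only."""

    def __init__(self, n_actions=4, state_dim=2, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(              # tiny fully-conv encoder
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32 + state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames, states):
        # frames: (B, T, 3, H, W); states: (B, T, state_dim)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1).view(B, T, -1)
        h, _ = self.lstm(torch.cat([feats, states], dim=-1))
        return self.head(h).log_softmax(-1)        # log P(action) per step

model = FCNLSTM()
out = model(torch.randn(2, 5, 3, 64, 64), torch.randn(2, 5, 2))  # (2, 5, 4)
```

The segmentation side task in the paper would add a decoder branch on the conv features, trained jointly as privileged supervision.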
Learning a Natural Language Interface with Neural Programmer
Title | Learning a Natural Language Interface with Neural Programmer |
Authors | Arvind Neelakantan, Quoc V. Le, Martin Abadi, Andrew McCallum, Dario Amodei |
Abstract | Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it to WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision from question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive with the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser. |
Tasks | Question Answering |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08945v4 |
http://arxiv.org/pdf/1611.08945v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-natural-language-interface-with |
Repo | https://github.com/pramodkaushik/np_analysis |
Framework | tf |
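The "built-in discrete operations" become trainable because operations are selected softly: the model outputs a distribution over ops, and the step result is the probability-weighted mix, keeping everything differentiable from question-answer pairs alone. A toy one-step sketch with a tiny op set (the real model composes several steps over table columns and rows):

```python
import numpy as np

def soft_program_step(column, op_probs):
    """One Neural Programmer-style step: a probability-weighted mix of
    discrete table operations, so gradients flow through op selection."""
    ops = {
        "count": float(len(column)),
        "sum":   float(np.sum(column)),
        "max":   float(np.max(column)),
        "min":   float(np.min(column)),
    }
    return sum(op_probs[name] * val for name, val in ops.items())

col = np.array([3.0, 7.0, 2.0])
# A hypothetical selector output that is nearly certain the answer is a max.
probs = {"count": 0.02, "sum": 0.02, "max": 0.94, "min": 0.02}
print(soft_program_step(col, probs))   # ~= 6.92, close to the hard answer 7
```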
Learning Non-Lambertian Object Intrinsics across ShapeNet Categories
Title | Learning Non-Lambertian Object Intrinsics across ShapeNet Categories |
Authors | Jian Shi, Yue Dong, Hao Su, Stella X. Yu |
Abstract | We consider the non-Lambertian object intrinsic problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. We build a large-scale object intrinsics database based on existing 3D models in the ShapeNet database. Rendered with realistic environment maps, millions of synthetic images of objects and their corresponding albedo, shading, and specular ground-truth images are used to train an encoder-decoder CNN. Once trained, the network can decompose an image into the product of albedo and shading components, along with an additive specular component. Our CNN delivers accurate and sharp results in this classical inverse problem of computer vision, with the sharp details attributable to skip connections at corresponding resolutions from the encoder to the decoder. Benchmarked on our ShapeNet and MIT intrinsics datasets, our model consistently outperforms the state of the art by a large margin. We train and test our CNN on different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very well across categories. Our analysis shows that feature learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our synthetic-data-trained model to images and videos downloaded from the internet, and observe robust and realistic intrinsics results. High-quality non-Lambertian intrinsics could open up many interesting applications such as image-based albedo and specular editing. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08510v1 |
http://arxiv.org/pdf/1612.08510v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-non-lambertian-object-intrinsics |
Repo | https://github.com/shi-jian/shapenet-intrinsics |
Framework | torch |
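The decomposition being learned is the non-Lambertian image formation model, image = albedo x shading + specular. A small numpy sketch of the forward model and the recombination check it implies; the shapes and helper name are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-Lambertian image formation: element-wise product for the diffuse
# term, plus an additive specular layer.
H, W = 4, 4
albedo   = rng.random((H, W, 3))         # per-channel diffuse reflectance
shading  = rng.random((H, W, 1))         # grayscale diffuse shading
specular = 0.2 * rng.random((H, W, 1))   # additive highlights
image = albedo * shading + specular      # broadcasts over channels

def recombination_error(pred_albedo, pred_shading, pred_specular, target):
    """MSE between the input and the recomposed prediction: a natural
    sanity check (and loss term) for any predicted decomposition."""
    recon = pred_albedo * pred_shading + pred_specular
    return float(np.mean((recon - target) ** 2))

assert recombination_error(albedo, shading, specular, image) == 0.0
```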
Effectiveness of greedily collecting items in open world games
Title | Effectiveness of greedily collecting items in open world games |
Authors | Andrej Gajduk |
Abstract | Since Pokemon Go sent millions on the quest of collecting virtual monsters, an important question has been on the minds of many people: Is going after the closest item first a time- and cost-effective way to play? Here, we show that this is in fact a good strategy, performing on average only 7% worse than the best possible solution in terms of the total distance traveled to gather all the items. Even when accounting for errors due to people's inability to accurately judge distances by eye, the strategy performs only 16% worse than the optimal solution. |
Tasks | |
Published | 2016-08-17 |
URL | http://arxiv.org/abs/1608.06175v1 |
http://arxiv.org/pdf/1608.06175v1.pdf | |
PWC | https://paperswithcode.com/paper/effectiveness-of-greedily-collecting-items-in |
Repo | https://github.com/gajduk/greedy-tsp |
Framework | none |
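The claim is easy to reproduce on toy instances: compare the nearest-item-first tour against the brute-force optimum. A self-contained sketch (instance size, geometry, and the fixed starting point are our choices):

```python
import numpy as np
from itertools import permutations

def tour_length(points, order):
    """Total distance to visit items in the given order, starting at index 0."""
    path = [0] + list(order)
    return sum(np.linalg.norm(points[a] - points[b])
               for a, b in zip(path, path[1:]))

def greedy_order(points):
    """Always walk to the closest unvisited item (the strategy studied)."""
    todo, order, cur = set(range(1, len(points))), [], 0
    while todo:
        cur = min(todo, key=lambda j: np.linalg.norm(points[cur] - points[j]))
        order.append(cur)
        todo.remove(cur)
    return order

rng = np.random.default_rng(1)
pts = rng.random((9, 2))                  # player at pts[0], 8 items
greedy = tour_length(pts, greedy_order(pts))
best = min(tour_length(pts, p) for p in permutations(range(1, len(pts))))
print(f"greedy is {100 * (greedy / best - 1):.1f}% longer than optimal")
```

Averaging this ratio over many random instances is exactly the kind of experiment behind the 7% figure.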
AGA: Attribute Guided Augmentation
Title | AGA: Attribute Guided Augmentation |
Authors | Mandar Dixit, Roland Kwitt, Marc Niethammer, Nuno Vasconcelos |
Abstract | We consider the problem of data augmentation, i.e., generating artificial samples to extend a given corpus of training data. Specifically, we propose attribute-guided augmentation (AGA), which learns a mapping that allows us to synthesize data such that an attribute of a synthesized sample is at a desired value or strength. This is particularly interesting in situations where only little data, without attribute annotations, is available for learning, but we have access to a large external corpus of heavily annotated samples. While prior works primarily augment in the space of images, we propose to perform augmentation in feature space instead. We implement our approach as a deep encoder-decoder architecture that learns the synthesis function in an end-to-end manner. We demonstrate the utility of our approach on the problems of (1) one-shot object recognition in a transfer-learning setting where we have no prior knowledge of the new classes, as well as (2) object-based one-shot scene recognition. As external data, we leverage 3D depth and pose information from the SUN RGB-D dataset. Our experiments show that attribute-guided augmentation of high-level CNN features considerably improves one-shot recognition performance on both problems. |
Tasks | Data Augmentation, Object Recognition, Scene Recognition, Transfer Learning |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02559v2 |
http://arxiv.org/pdf/1612.02559v2.pdf | |
PWC | https://paperswithcode.com/paper/aga-attribute-guided-augmentation |
Repo | https://github.com/rkwitt/GuidedAugmentation |
Framework | torch |
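A hedged sketch of the core component: an encoder-decoder that maps an object's CNN feature plus a target attribute value (e.g. depth or pose) to a synthesized feature at that attribute strength. The dimensions and the training note are our reading of the abstract, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AGARegressor(nn.Module):
    """Feature-space synthesis: (feature, target attribute) -> feature
    whose attribute has the desired strength.  Sizes are illustrative."""

    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(),  # +1: target attribute
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, feat, target_attr):
        return self.net(torch.cat([feat, target_attr], dim=-1))

# Hypothetical use: regress features of annotated external data at target
# attribute values during training, then synthesize extra features for
# one-shot classes at new attribute strengths at augmentation time.
model = AGARegressor()
x = torch.randn(32, 256)              # CNN features of detected objects
a_target = torch.full((32, 1), 2.5)   # e.g. a desired depth of 2.5 m
x_aug = model(x, a_target)            # synthesized features, shape (32, 256)
```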
Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue
Title | Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue |
Authors | Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid |
Abstract | A significant weakness of most current deep convolutional neural networks is the need to train them using vast amounts of manually labelled data. In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two, such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset (without any further augmentation) gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation. |
Tasks | Calibration, Depth Estimation |
Published | 2016-03-16 |
URL | http://arxiv.org/abs/1603.04992v2 |
http://arxiv.org/pdf/1603.04992v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-cnn-for-single-view-depth |
Repo | https://github.com/Ravi-Garg/Unsupervised_Depth_Estimation |
Framework | none |
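The training signal is the photometric reconstruction error after inversely warping the target view using the predicted depth and the known stereo geometry. A simplified numpy sketch of that loss (sign conventions and the sampling are simplified to 1-D linear interpolation per row; the paper uses a differentiable warp inside the network):

```python
import numpy as np

def photometric_loss(src, tgt, depth, focal, baseline):
    """Unsupervised depth supervision, sketched.

    Predicted depth + known focal length and stereo baseline give a
    per-pixel disparity; sampling the target image at horizontally
    shifted coordinates reconstructs the source, and the reconstruction
    error trains the depth network with no ground-truth depth.
    """
    H, W = src.shape
    disparity = focal * baseline / np.maximum(depth, 1e-6)
    xs = np.arange(W, dtype=float)
    recon = np.empty_like(src, dtype=float)
    for r in range(H):
        recon[r] = np.interp(xs - disparity[r], xs, tgt[r])  # inverse warp
    return np.mean((recon - src) ** 2)
```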
Energy-based Generative Adversarial Network
Title | Energy-based Generative Adversarial Network |
Authors | Junbo Zhao, Michael Mathieu, Yann LeCun |
Abstract | We introduce the “Energy-based Generative Adversarial Network” model (EBGAN), which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function makes it possible to use a wide variety of architectures and loss functionals in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of the EBGAN framework that uses an auto-encoder architecture, with the energy being the reconstruction error, in place of the discriminator. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images. |
Tasks | |
Published | 2016-09-11 |
URL | http://arxiv.org/abs/1609.03126v4 |
http://arxiv.org/pdf/1609.03126v4.pdf | |
PWC | https://paperswithcode.com/paper/energy-based-generative-adversarial-network |
Repo | https://github.com/buriburisuri/ebgan |
Framework | tf |
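The auto-encoder instantiation in the abstract pins down the losses: the energy of an image is its reconstruction error, the discriminator pushes real energies down and fake energies up to a margin, and the generator minimizes fake energy. A PyTorch sketch under those assumptions (the margin value is illustrative; `autoencoder` is any nn.Module mapping images to reconstructions):

```python
import torch
import torch.nn.functional as F

def ebgan_losses(x_real, x_fake, autoencoder, margin=10.0):
    """EBGAN objectives with an auto-encoder discriminator:
    D loss = E(x_real) + max(0, margin - E(x_fake)),
    G loss = E(x_fake), where E is per-sample reconstruction error."""
    def energy(x):
        return F.mse_loss(autoencoder(x), x,
                          reduction="none").flatten(1).mean(1)

    d_loss = (energy(x_real).mean()
              + F.relu(margin - energy(x_fake.detach())).mean())
    g_loss = energy(x_fake).mean()
    return d_loss, g_loss
```

The margin plays the role of the hinge in a binary discriminator: once a fake sample's energy exceeds it, the discriminator stops pushing, which is one source of the training stability the abstract reports.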
Political Speech Generation
Title | Political Speech Generation |
Authors | Valentin Kassarnig |
Abstract | In this report we present a system that can generate political speeches for a desired political party. Furthermore, the system allows the user to specify whether a speech should express a supportive or an opposing opinion. The system relies on a combination of several state-of-the-art NLP methods, which are discussed in this report. These include n-grams, the Justeson & Katz POS tag filter, recurrent neural networks, and latent Dirichlet allocation. Sequences of words are generated based on probabilities obtained from two underlying models: a language model takes care of grammatical correctness while a topic model aims for textual consistency. Both models were trained on the Convote dataset, which contains transcripts from US congressional floor debates. Furthermore, we present a manual and an automated approach to evaluating the quality of generated speeches. In an experimental evaluation, generated speeches showed very high quality in terms of grammatical correctness and sentence transitions. |
Tasks | Language Modelling |
Published | 2016-01-13 |
URL | http://arxiv.org/abs/1601.03313v2 |
http://arxiv.org/pdf/1601.03313v2.pdf | |
PWC | https://paperswithcode.com/paper/political-speech-generation |
Repo | https://github.com/valentin012/conspeech |
Framework | none |
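The two-model generation scheme (a language model for grammar, a topic model for consistency) amounts to sampling the next word from a product of the two distributions. A sketch where `p_lm` and `p_topic` are stand-ins for the trained n-gram/RNN language model and the LDA topic model; the blending exponent is our addition:

```python
import random

def sample_next_word(history, vocab, p_lm, p_topic, power=1.0):
    """Sample the next word from the product of the language-model and
    topic-model probabilities.  power tempers the topic model's
    influence (1.0 = plain product)."""
    scores = [p_lm(history, w) * p_topic(w) ** power for w in vocab]
    return random.choices(vocab, weights=scores)[0]
```

Repeating this step, with the sampled word appended to the history, yields speech text that is simultaneously grammatical (LM) and on-message (topic model).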