July 29, 2019

3352 words 16 mins read

Paper Group AWR 141

MoDL: Model Based Deep Learning Architecture for Inverse Problems. Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. Learning Face Age Progression: A Pyramid Architecture of GANs. Learning to Compose Domain-Specific Transformations for Data Augmentation. Learning From Noisy Singly-labeled Data. Don’t Just Assume; Loo …

MoDL: Model Based Deep Learning Architecture for Inverse Problems

Title MoDL: Model Based Deep Learning Architecture for Inverse Problems
Authors Hemant Kumar Aggarwal, Merry P. Mani, Mathews Jacob
Abstract We introduce a model-based image reconstruction framework with a convolutional neural network (CNN) based regularization prior. The proposed formulation provides a systematic approach for deriving deep architectures for inverse problems with arbitrary structure. Since the forward model is explicitly accounted for, a smaller network with fewer parameters is sufficient to capture the image information compared to black-box deep learning approaches, thus reducing the demand for training data and training time. Since we rely on end-to-end training, the CNN weights are customized to the forward model, thus offering improved performance over approaches that rely on pre-trained denoisers. The main difference of the framework from existing end-to-end training strategies is the sharing of the network weights across iterations and channels. Our experiments show that the decoupling of the number of iterations from the network complexity offered by this approach provides benefits including a lower demand for training data, reduced risk of overfitting, and implementations with a significantly reduced memory footprint. We propose to enforce data consistency by using numerical optimization blocks, such as the conjugate gradient algorithm, within the network; this approach offers faster convergence per iteration compared to methods that rely on proximal gradient steps to enforce data consistency. Our experiments show that the faster convergence translates to improved performance, especially when the available GPU memory restricts the number of iterations.
Tasks Image Reconstruction
Published 2017-12-07
URL https://arxiv.org/abs/1712.02862v4
PDF https://arxiv.org/pdf/1712.02862v4.pdf
PWC https://paperswithcode.com/paper/modl-model-based-deep-learning-architecture
Repo https://github.com/hkaggarwal/modl
Framework tf
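A minimal NumPy sketch of the unrolled iteration described in the entry above: alternate a learned denoiser with a conjugate-gradient (CG) data-consistency solve of (A^T A + lam I) x = A^T b + lam D(x). The forward operator, the toy "denoiser", and all sizes are placeholders for illustration, not the paper's trained network.

```python
import numpy as np

def cg(apply_M, rhs, iters=10):
    """Plain conjugate gradient for the SPD system M x = rhs."""
    x = np.zeros_like(rhs)
    r = rhs - apply_M(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Mp = apply_M(p)
        alpha = rs / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def modl_recon(A, b, denoise, lam=1.0, unrolls=5):
    x = A.T @ b                      # adjoint / zero-filled initialization
    apply_M = lambda v: A.T @ (A @ v) + lam * v
    for _ in range(unrolls):         # weights shared across unrolled iterations
        z = denoise(x)               # CNN prior in the paper; a stand-in here
        x = cg(apply_M, A.T @ b + lam * z)
    return x

# Toy usage: random forward model and a shrinking "denoiser".
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 64))
b = A @ rng.standard_normal(64)
x_hat = modl_recon(A, b, denoise=lambda v: 0.9 * v)
```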

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

Title Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
Authors Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
Abstract The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Tasks Sentence Embeddings, Word Embeddings
Published 2017-03-07
URL http://arxiv.org/abs/1703.02507v3
PDF http://arxiv.org/pdf/1703.02507v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-sentence-embeddings
Repo https://github.com/celento/sent2vec
Framework none
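A toy sketch of the compositional idea behind this entry: a sentence embedding is the average of the embeddings of its unigrams and word n-grams. The vocabulary, dimensionality, and random vectors below are illustrative; the paper learns these embeddings with an unsupervised objective rather than sampling them.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = {}  # maps a unigram or joined n-gram to its vector

def lookup(token):
    return emb.setdefault(token, rng.standard_normal(dim))

def sentence_embedding(tokens, n=2):
    grams = list(tokens)                                   # unigrams
    grams += [" ".join(tokens[i:i + n])                    # word bigrams
              for i in range(len(tokens) - n + 1)]
    return np.mean([lookup(g) for g in grams], axis=0)

vec = sentence_embedding("the cat sat on the mat".split())
print(vec.shape)   # (8,)
```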

Learning Face Age Progression: A Pyramid Architecture of GANs

Title Learning Face Age Progression: A Pyramid Architecture of GANs
Authors Hongyu Yang, Di Huang, Yunhong Wang, Anil K. Jain
Abstract The two underlying requirements of face age progression, i.e. aging accuracy and identity permanence, are not well studied in the literature. In this paper, we present a novel generative adversarial network based approach. It separately models the constraints for the intrinsic subject-specific characteristics and the age-specific facial changes with respect to the elapsed time, ensuring that the generated faces present desired aging effects while simultaneously keeping personalized properties stable. Further, to generate more lifelike facial details, high-level age-specific features conveyed by the synthesized face are estimated by a pyramidal adversarial discriminator at multiple scales, which simulates the aging effects in a finer manner. The proposed method is applicable to diverse face samples in the presence of variations in pose, expression, makeup, etc., and remarkably vivid aging effects are achieved. Both visual fidelity and quantitative evaluations show that the approach advances the state-of-the-art.
Tasks
Published 2017-11-28
URL http://arxiv.org/abs/1711.10352v4
PDF http://arxiv.org/pdf/1711.10352v4.pdf
PWC https://paperswithcode.com/paper/learning-face-age-progression-a-pyramid
Repo https://github.com/lumosity4tpj/Pytorch-Implementation-of-A-Pyramid-Architecture-of-GANs
Framework pytorch
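A hedged PyTorch sketch of a pyramidal discriminator in the spirit of the entry above: small critics score the synthesized face at several spatial scales so aging cues are judged both locally and globally. Channel counts, depths, and scales are made up for illustration and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class PyramidDiscriminator(nn.Module):
    def __init__(self, in_ch=3, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.critics = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(32, 1, 4, stride=2, padding=1),
            ) for _ in scales
        ])

    def forward(self, img):
        scores = []
        for s, critic in zip(self.scales, self.critics):
            x = img if s == 1 else nn.functional.avg_pool2d(img, s)
            scores.append(critic(x).mean(dim=(1, 2, 3)))   # one score per image
        return torch.stack(scores, dim=1)                  # [batch, num_scales]

fake = torch.randn(2, 3, 64, 64)
print(PyramidDiscriminator()(fake).shape)                  # torch.Size([2, 3])
```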

Learning to Compose Domain-Specific Transformations for Data Augmentation

Title Learning to Compose Domain-Specific Transformations for Data Augmentation
Authors Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré
Abstract Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in practice. We propose a method for automating this process by learning a generative sequence model over user-specified transformation functions using a generative adversarial approach. Our method can make use of arbitrary, non-deterministic transformation functions, is robust to misspecified user input, and is trained on unlabeled data. The learned transformation model can then be used to perform data augmentation for any end discriminative model. In our experiments, we show the efficacy of our approach on both image and text datasets, achieving improvements of 4.0 accuracy points on CIFAR-10, 1.4 F1 points on the ACE relation extraction task, and 3.4 accuracy points when using domain-specific transformation operations on a medical imaging dataset as compared to standard heuristic augmentation approaches.
Tasks Data Augmentation, Image Augmentation, Relation Extraction, Text Augmentation
Published 2017-09-06
URL http://arxiv.org/abs/1709.01643v3
PDF http://arxiv.org/pdf/1709.01643v3.pdf
PWC https://paperswithcode.com/paper/learning-to-compose-domain-specific
Repo https://github.com/HazyResearch/tanda
Framework tf
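A small sketch of the augmentation interface this entry builds on: the user writes label-preserving transformation functions (TFs), and a learned sequence model chooses which ones to compose. Here the "policy" is a uniform random sampler standing in for the adversarially trained generator, and the TFs are invented toy examples.

```python
import random

def shift(tokens):              # example user-specified TFs on token lists
    return tokens[1:] + tokens[:1]

def duplicate_first(tokens):
    return tokens[:1] + tokens

def drop_last(tokens):
    return tokens[:-1] if len(tokens) > 1 else tokens

TFS = [shift, duplicate_first, drop_last]

def augment(example, policy=lambda tfs: random.choices(tfs, k=3)):
    out = list(example)
    for tf in policy(TFS):      # the learned model would pick this sequence
        out = tf(out)
    return out

print(augment("a b c d".split()))
```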

Learning From Noisy Singly-labeled Data

Title Learning From Noisy Singly-labeled Data
Authors Ashish Khetan, Zachary C. Lipton, Anima Anandkumar
Abstract Supervised learning depends on annotated examples, which are taken to be the \emph{ground truth}. But these labels often come from noisy crowdsourcing platforms, like Amazon Mechanical Turk. Practitioners typically collect multiple labels per example and aggregate the results to mitigate noise (the classic crowdsourcing problem). Given a fixed annotation budget and unlimited unlabeled data, redundant annotation comes at the expense of fewer labeled examples. This raises two fundamental questions: (1) How can we best learn from noisy workers? (2) How should we allocate our labeling budget to maximize the performance of a classifier? We propose a new algorithm for jointly modeling labels and worker quality from noisy crowd-sourced data. The alternating minimization proceeds in rounds, estimating worker quality from disagreement with the current model and then updating the model by optimizing a loss function that accounts for the current estimate of worker quality. Unlike previous approaches, our algorithm can estimate worker quality even with only one annotation per example. We establish a generalization error bound for models learned with our algorithm and show theoretically that it is better to label many examples once (versus labeling fewer examples multiple times) when worker quality is above a threshold. Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm’s benefits.
Tasks
Published 2017-12-13
URL http://arxiv.org/abs/1712.04577v2
PDF http://arxiv.org/pdf/1712.04577v2.pdf
PWC https://paperswithcode.com/paper/learning-from-noisy-singly-labeled-data
Repo https://github.com/khetan2/MBEM
Framework mxnet
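A hedged sketch of the alternating scheme described in the abstract above: estimate each worker's accuracy from agreement with the current model, then re-weight the (single) crowd labels when retraining. The `train_model` callback and the symmetric two-class setup are placeholders; the paper estimates full confusion matrices and proves generalization bounds.

```python
import numpy as np

def estimate_worker_quality(worker_ids, labels, model_preds, n_workers):
    q = np.full(n_workers, 0.5)
    for w in range(n_workers):
        mask = worker_ids == w
        if mask.any():
            q[w] = np.mean(labels[mask] == model_preds[mask])
    return q

def soft_targets(worker_ids, labels, quality):
    # P(true label = 1 | observed binary label, symmetric worker noise).
    p = quality[worker_ids]
    return np.where(labels == 1, p, 1.0 - p)

def alternating_minimization(worker_ids, labels, train_model, rounds=3):
    n_workers = worker_ids.max() + 1
    preds = labels.copy()                      # bootstrap from the raw labels
    for _ in range(rounds):
        quality = estimate_worker_quality(worker_ids, labels, preds, n_workers)
        preds = train_model(soft_targets(worker_ids, labels, quality))
    return preds, quality

# Toy usage: the "model" just thresholds the soft targets.
ids = np.array([0, 0, 1, 1]); obs = np.array([1, 0, 1, 1])
preds, q = alternating_minimization(ids, obs, lambda t: (t > 0.5).astype(int))
```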

Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Title Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Authors Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
Abstract A number of studies have found that today’s Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the model from ‘cheating’ by primarily relying on priors in the training data. Specifically, GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers. GVQA is built off an existing VQA model – Stacked Attention Networks (SAN). Our experiments demonstrate that GVQA significantly outperforms SAN on both VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in several cases. GVQA offers strengths complementary to SAN when trained and evaluated on the original VQA v1 and VQA v2 datasets. Finally, GVQA is more transparent and interpretable than existing VQA models.
Tasks Question Answering, Visual Question Answering
Published 2017-12-01
URL http://arxiv.org/abs/1712.00377v2
PDF http://arxiv.org/pdf/1712.00377v2.pdf
PWC https://paperswithcode.com/paper/dont-just-assume-look-and-answer-overcoming
Repo https://github.com/AishwaryaAgrawal/GVQA
Framework none
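A rough sketch of the changing-priors idea from the entry above: regroup examples so that, within each question type, the answer distribution of the test split differs from that of the train split. The greedy round-robin over (question type, answer) groups below only illustrates the principle and is not the paper's exact splitting procedure.

```python
from collections import defaultdict

def changing_prior_split(examples):
    """examples: list of dicts with 'qtype' and 'answer' keys (toy schema)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["qtype"], ex["answer"])].append(ex)
    train, test = [], []
    for i, (_, grp) in enumerate(sorted(groups.items())):
        (train if i % 2 == 0 else test).extend(grp)   # whole group to one split
    return train, test

data = [{"qtype": "what color", "answer": a} for a in ["red"] * 3 + ["blue"] * 2]
tr, te = changing_prior_split(data)   # "blue" answers train-only, "red" test-only
```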

MoCoGAN: Decomposing Motion and Content for Video Generation

Title MoCoGAN: Decomposing Motion and Content for Video Generation
Authors Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz
Abstract Visual signals in a video can be divided into content and motion. While content specifies which objects are in the video, motion describes their dynamics. Based on this prior, we propose the Motion and Content decomposed Generative Adversarial Network (MoCoGAN) framework for video generation. The proposed framework generates a video by mapping a sequence of random vectors to a sequence of video frames. Each random vector consists of a content part and a motion part. While the content part is kept fixed, the motion part is realized as a stochastic process. To learn motion and content decomposition in an unsupervised manner, we introduce a novel adversarial learning scheme utilizing both image and video discriminators. Extensive experimental results on several challenging datasets, with qualitative and quantitative comparison to state-of-the-art approaches, verify the effectiveness of the proposed framework. In addition, we show that MoCoGAN allows one to generate videos with the same content but different motion, as well as videos with different content and the same motion.
Tasks Video Generation
Published 2017-07-17
URL http://arxiv.org/abs/1707.04993v2
PDF http://arxiv.org/pdf/1707.04993v2.pdf
PWC https://paperswithcode.com/paper/mocogan-decomposing-motion-and-content-for
Repo https://github.com/sergeytulyakov/mocogan
Framework pytorch
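A PyTorch sketch of the latent decomposition described above: one content code is held fixed for a clip while a recurrent cell turns i.i.d. noise into a motion trajectory, and each frame would be generated from [content, motion_t]. Dimensions are placeholders chosen for illustration; the frame generator and discriminators are omitted.

```python
import torch
import torch.nn as nn

class LatentSequence(nn.Module):
    def __init__(self, content_dim=64, motion_dim=16):
        super().__init__()
        self.content_dim, self.motion_dim = content_dim, motion_dim
        self.rnn = nn.GRUCell(motion_dim, motion_dim)

    def forward(self, batch, frames):
        z_content = torch.randn(batch, self.content_dim)          # fixed per clip
        h = torch.zeros(batch, self.motion_dim)
        zs = []
        for _ in range(frames):
            h = self.rnn(torch.randn(batch, self.motion_dim), h)  # stochastic motion path
            zs.append(torch.cat([z_content, h], dim=1))
        return torch.stack(zs, dim=1)                             # [batch, T, content+motion]

z = LatentSequence()(batch=2, frames=8)
print(z.shape)   # torch.Size([2, 8, 80])
```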

Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes

Title Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes
Authors Feras Saad, Leonardo Casarsa, Vikash Mansinghka
Abstract Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously-typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data analysis. This paper demonstrates applications to databases of US colleges, global macroeconomic indicators of public health, and classic cars. We found that human evaluators often prefer the results from probabilistic search to results from a standard baseline.
Tasks Information Retrieval, Probabilistic Programming
Published 2017-04-04
URL http://arxiv.org/abs/1704.01087v1
PDF http://arxiv.org/pdf/1704.01087v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-search-for-structured-data-via
Repo https://github.com/probcomp/cgpm
Framework none
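A hedged illustration of one way posterior samples from a clustering model can drive a relevance ranking, in the spirit of the entry above: a candidate row is scored by how often it lands in the same cluster as the query rows across samples. This mimics the flavour of ranking over CrossCat posterior samples; it is not BayesDB's actual predictive-relevance computation.

```python
def relevance(candidate, query_rows, posterior_samples):
    """posterior_samples: list of dicts mapping row id -> cluster id."""
    hits = 0
    for assignment in posterior_samples:
        hits += sum(assignment[candidate] == assignment[q] for q in query_rows)
    return hits / (len(posterior_samples) * len(query_rows))

samples = [{"a": 0, "b": 0, "c": 1}, {"a": 0, "b": 1, "c": 1}]
ranked = sorted(["b", "c"], key=lambda r: relevance(r, ["a"], samples), reverse=True)
print(ranked)   # rows most relevant to query row "a" first
```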

Learning Certifiably Optimal Rule Lists for Categorical Data

Title Learning Certifiably Optimal Rule Lists for Categorical Data
Authors Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, Cynthia Rudin
Abstract We present the design and implementation of a custom discrete optimization technique for building rule lists over a categorical feature space. Our algorithm produces rule lists with optimal training performance, according to the regularized empirical risk, with a certificate of optimality. By leveraging algorithmic bounds, efficient data structures, and computational reuse, we achieve several orders of magnitude speedup in time and a massive reduction of memory consumption. We demonstrate that our approach produces optimal rule lists on practical problems in seconds. Our results indicate that it is possible to construct optimal sparse rule lists that are approximately as accurate as the COMPAS proprietary risk prediction tool on data from Broward County, Florida, but that are completely interpretable. This framework is a novel alternative to CART and other decision tree methods for interpretable modeling.
Tasks
Published 2017-04-06
URL http://arxiv.org/abs/1704.01701v4
PDF http://arxiv.org/pdf/1704.01701v4.pdf
PWC https://paperswithcode.com/paper/learning-certifiably-optimal-rule-lists-for
Repo https://github.com/nlarusstone/corels
Framework none
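A small sketch of the objective that the entry above certifies: a rule list's regularized empirical risk is its misclassification rate plus a penalty per rule. The branch-and-bound search, bounds, and data structures of the paper are omitted; this only shows how a candidate rule list would be scored.

```python
def predict(rule_list, default, x):
    for condition, label in rule_list:        # first matching rule fires
        if condition(x):
            return label
    return default

def regularized_risk(rule_list, default, data, lam=0.01):
    errors = sum(predict(rule_list, default, x) != y for x, y in data)
    return errors / len(data) + lam * len(rule_list)

# Toy usage on dict-shaped categorical examples.
rules = [(lambda x: x["age"] == "young", 1)]
data = [({"age": "young"}, 1), ({"age": "old"}, 0), ({"age": "old"}, 1)]
print(regularized_risk(rules, default=0, data=data))   # 1/3 + 0.01
```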

Patterns versus Characters in Subword-aware Neural Language Modeling

Title Patterns versus Characters in Subword-aware Neural Language Modeling
Authors Rustem Takhanov, Zhenisbek Assylbekov
Abstract Words in some natural languages can have a composite structure. Elements of this structure include the root (which could itself be composite), and prefixes and suffixes with which various nuances and relations to other words can be expressed. Thus, in order to build a proper word representation one must take its internal structure into account. From a corpus of texts we extract a set of frequent subwords, and from the latter set we select patterns, i.e. subwords which encapsulate information on character $n$-gram regularities. The selection is made using the pattern-based Conditional Random Field model with $l_1$ regularization. Further, for every word we construct a new sequence over an alphabet of patterns. The new alphabet’s symbols capture a stronger local statistical context than individual characters; therefore, they allow better representations in ${\mathbb{R}}^n$ and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform character-based analogues by 2-20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture.
Tasks Language Modelling
Published 2017-09-02
URL http://arxiv.org/abs/1709.00541v1
PDF http://arxiv.org/pdf/1709.00541v1.pdf
PWC https://paperswithcode.com/paper/patterns-versus-characters-in-subword-aware
Repo https://github.com/zh3nis/pat-sum
Framework tf
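A toy sketch of the word-representation step described above: a word's vector is the sum of the embeddings of the patterns (frequent subwords) it is segmented into. The pattern inventory, the greedy segmentation, and the random vectors are illustrative assumptions; the paper selects patterns with an l1-regularized CRF.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
patterns = {"un", "break", "able", "ing"}                  # assumed pattern inventory
pat_emb = {p: rng.standard_normal(dim) for p in patterns}
char_emb = {}                                              # fallback for single characters

def segment(word):
    out, i = [], 0
    while i < len(word):                                   # greedy longest match
        match = max((p for p in patterns if word.startswith(p, i)),
                    key=len, default=word[i])
        out.append(match)
        i += len(match)
    return out

def word_vector(word):
    vecs = [pat_emb[p] if p in pat_emb
            else char_emb.setdefault(p, rng.standard_normal(dim))
            for p in segment(word)]
    return np.sum(vecs, axis=0)

print(segment("unbreakable"), word_vector("unbreakable").shape)
```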

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

Title HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
Authors Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang
Abstract Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component of security-centric computer vision systems. Although convolutional neural networks are remarkable at learning discriminative features from images, learning comprehensive features of pedestrians for fine-grained tasks remains an open problem. In this study, we propose a new attention-based deep neural network, named HydraPlus-Net (HP-net), that multi-directionally feeds multi-level attention maps to different feature layers. The attentive deep features learned from the proposed HP-net bring unique advantages: (1) the model is capable of capturing multiple attentions from the low level to the semantic level, and (2) it explores the multi-scale selectiveness of attentive features to enrich the final feature representations of a pedestrian image. We demonstrate the effectiveness and generality of the proposed HP-net for pedestrian analysis on two tasks, i.e. pedestrian attribute recognition and person re-identification. Extensive experimental results show that the HP-net outperforms the state-of-the-art methods on various datasets.
Tasks Pedestrian Attribute Recognition, Person Re-Identification
Published 2017-09-28
URL http://arxiv.org/abs/1709.09930v1
PDF http://arxiv.org/pdf/1709.09930v1.pdf
PWC https://paperswithcode.com/paper/hydraplus-net-attentive-deep-features-for
Repo https://github.com/xh-liu/HydraPlus-Net
Framework none
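A loose PyTorch sketch of the multi-level attention idea in the entry above: an attention map computed at one feature level is resized and applied to feature maps from several levels, and the attended features are pooled and concatenated. Channel sizes and the backbone are stand-ins, not the HP-net architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAttention(nn.Module):
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        self.att = nn.Conv2d(channels[1], 1, kernel_size=1)   # attention from the mid level

    def forward(self, feats):                                  # list of 3 feature maps
        a = torch.sigmoid(self.att(feats[1]))                  # [B, 1, h, w]
        pooled = []
        for f in feats:
            a_r = F.interpolate(a, size=f.shape[-2:], mode="bilinear",
                                align_corners=False)
            pooled.append((f * a_r).mean(dim=(2, 3)))          # global average pool
        return torch.cat(pooled, dim=1)                        # fused descriptor

feats = [torch.randn(2, c, s, s) for c, s in zip((16, 32, 64), (32, 16, 8))]
print(MultiLevelAttention()(feats).shape)   # torch.Size([2, 112])
```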

Process Monitoring on Sequences of System Call Count Vectors

Title Process Monitoring on Sequences of System Call Count Vectors
Authors Michael Dymshits, Ben Myara, David Tolpin
Abstract We introduce a methodology for efficient monitoring of processes running on hosts in a corporate network. The methodology is based on collecting streams of system calls produced by all or selected processes on the hosts and sending them over the network to a monitoring server, where machine learning algorithms are used to identify changes in process behavior due to malicious activity, hardware failures, or software errors. The methodology uses a sequence of system call count vectors as the data format, which can handle large and varying volumes of data. Unlike previous approaches, the methodology introduced in this paper is suitable for distributed collection and processing of data in large corporate networks. We evaluate the methodology both in a laboratory setting and on a real-life setup, and provide statistics characterizing its performance and accuracy.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03821v1
PDF http://arxiv.org/pdf/1707.03821v1.pdf
PWC https://paperswithcode.com/paper/process-monitoring-on-sequences-of-system
Repo https://github.com/michael135/count-vector-paper-experiments
Framework tf
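A minimal sketch of the data format described above: the stream of system calls made by a process is bucketed into fixed-size windows, and each window becomes a count vector over the system-call vocabulary. The window size and call names are illustrative.

```python
from collections import Counter

SYSCALLS = ["read", "write", "open", "close", "socket"]   # assumed vocabulary

def count_vectors(call_stream, window=4):
    vectors = []
    for i in range(0, len(call_stream), window):
        counts = Counter(call_stream[i:i + window])
        vectors.append([counts.get(c, 0) for c in SYSCALLS])
    return vectors                                         # one count vector per window

stream = ["open", "read", "read", "write", "socket", "read", "close", "close"]
print(count_vectors(stream))   # [[2, 1, 1, 0, 0], [1, 0, 0, 2, 1]]
```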

A Survey of Neuromorphic Computing and Neural Networks in Hardware

Title A Survey of Neuromorphic Computing and Neural Networks in Hardware
Authors Catherine D. Schuman, Thomas E. Potok, Robert M. Patton, J. Douglas Birdwell, Mark E. Dean, Garrett S. Rose, James S. Plank
Abstract Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, starting with an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices to support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion on the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.
Tasks
Published 2017-05-19
URL http://arxiv.org/abs/1705.06963v1
PDF http://arxiv.org/pdf/1705.06963v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-neuromorphic-computing-and-neural
Repo https://github.com/THULemon/Neuromorphic-Computing
Framework none

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Title Accurate Single Stage Detector Using Recurrent Rolling Convolution
Authors Jimmy Ren, Xiaohao Chen, Jianbo Liu, Wenxiu Sun, Jiahao Pang, Qiong Yan, Yu-Wing Tai, Li Xu
Abstract Most of the recent successful methods for accurate object detection and localization used some variant of R-CNN-style two-stage Convolutional Neural Networks (CNN), where plausible regions are proposed in the first stage and then refined in a second stage. Despite the simplicity of training and the efficiency in deployment, single-stage detection methods have not been as competitive when evaluated on benchmarks that consider mAP at high IoU thresholds. In this paper, we propose a novel single-stage, end-to-end trainable object detection network to overcome this limitation. We achieve this by introducing a Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors which are “deep in context”. We evaluated our method on the challenging KITTI dataset, which measures methods under an IoU threshold of 0.7. We showed that with RRC, a single reduced VGG-16 based model already significantly outperformed all the previously published results. At the time this paper was written, our models ranked first in KITTI car detection (the hard level), first in cyclist detection, and second in pedestrian detection. These results were not reached by the previous single-stage methods. The code is publicly available.
Tasks 3D Object Detection, Object Detection, Pedestrian Detection
Published 2017-04-19
URL http://arxiv.org/abs/1704.05776v1
PDF http://arxiv.org/pdf/1704.05776v1.pdf
PWC https://paperswithcode.com/paper/accurate-single-stage-detector-using
Repo https://github.com/xiaohaoChen/rrc_detection
Framework none
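A very rough PyTorch sketch of "rolling" features between neighbouring scales: at each step every feature map is concatenated with a downsampled copy of the finer scale and an upsampled copy of the coarser scale, then reduced by a 1x1 convolution; repeating the shared step makes the maps progressively "deep in context". All sizes are placeholders and this does not reproduce the paper's RRC module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RollingStep(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.reduce = nn.Conv2d(3 * ch, ch, kernel_size=1)     # shared across scales and steps

    def forward(self, maps):                                   # list of maps, fine to coarse
        out = []
        for i, f in enumerate(maps):
            finer = maps[i - 1] if i > 0 else f
            coarser = maps[i + 1] if i < len(maps) - 1 else f
            down = F.adaptive_avg_pool2d(finer, f.shape[-2:])
            up = F.interpolate(coarser, size=f.shape[-2:], mode="nearest")
            out.append(self.reduce(torch.cat([f, down, up], dim=1)))
        return out

maps = [torch.randn(1, 16, s, s) for s in (32, 16, 8)]
step = RollingStep()
for _ in range(2):                                             # recurrent rolling
    maps = step(maps)
```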

Deep Depth From Focus

Title Deep Depth From Focus
Authors Caner Hazirbas, Sebastian Georg Soyer, Maximilian Christian Staab, Laura Leal-Taixé, Daniel Cremers
Abstract Depth from focus (DFF) is one of the classical ill-posed inverse problems in computer vision. Most approaches recover the depth at each pixel based on the focal setting which exhibits maximal sharpness. Yet, it is not obvious how to reliably estimate the sharpness level, particularly in low-textured areas. In this paper, we propose Deep Depth From Focus (DDFF) as the first end-to-end learning approach to this problem. One of the main challenges we face is the data hunger of deep neural networks. In order to obtain a significant amount of focal stacks with corresponding ground-truth depth, we propose to leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us to digitally create focal stacks of varying sizes. Compared to existing benchmarks, our dataset is 25 times larger, enabling the use of machine learning for this inverse problem. We compare our results with state-of-the-art DFF methods and we also analyze the effect of several key deep architectural components. These experiments show that our proposed method, DDFFNet, achieves state-of-the-art performance in all scenes, reducing depth error by more than 75% compared to the classical DFF methods.
Tasks
Published 2017-04-04
URL http://arxiv.org/abs/1704.01085v3
PDF http://arxiv.org/pdf/1704.01085v3.pdf
PWC https://paperswithcode.com/paper/deep-depth-from-focus
Repo https://github.com/soyers/ddff-pytorch
Framework pytorch
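A NumPy sketch of the classical depth-from-focus baseline that the entry above improves on: for every pixel, pick the focal slice where a local sharpness measure (here the squared Laplacian) is largest. The stack shape and sharpness operator are illustrative; DDFFNet instead learns the mapping end to end.

```python
import numpy as np

def laplacian(img):
    padded = np.pad(img, 1, mode="edge")
    return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:] - 4 * img)

def depth_from_focus(focal_stack):
    """focal_stack: array of shape [num_focal_settings, H, W]."""
    sharpness = np.stack([laplacian(s) ** 2 for s in focal_stack])
    return np.argmax(sharpness, axis=0)        # index of sharpest slice per pixel

stack = np.random.default_rng(0).standard_normal((10, 64, 64))
print(depth_from_focus(stack).shape)           # (64, 64)
```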