October 21, 2019

Paper Group AWR 29

Changing the Image Memorability: From Basic Photo Editing to GANs

Title Changing the Image Memorability: From Basic Photo Editing to GANs
Authors Oleksii Sidorov
Abstract Memorability is considered to be an important characteristic of visual content, whereas for advertisement and educational purposes it is often crucial. Despite numerous studies on understanding and predicting image memorability, there are almost no achievements in memorability modification. In this work, we study two approaches to image editing - GAN and classical image processing - and show their impact on memorability. The visual features which influence memorability directly stay unknown till now, hence it is impossible to control it manually. As a solution, we let GAN learn it deeply using labeled data, and then use it for conditional generation of new images. By analogy with algorithms which edit facial attributes, we consider memorability as yet another attribute and operate with it in the same way. Obtained data is also interesting for analysis, simply because there are no real-world examples of successful change of image memorability while preserving its other attributes. We believe this may give many new answers to the question “what makes an image memorable?” Apart from that we also study the influence of conventional photo-editing tools (Photoshop, Instagram, etc.) used daily by a wide audience on memorability. In this case, we start from real practical methods and study it using statistics and recent advances in memorability prediction. Photographers, designers, and advertisers will benefit from the results of this study directly.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.03825v4
PDF http://arxiv.org/pdf/1811.03825v4.pdf
PWC https://paperswithcode.com/paper/changing-the-image-memorability-from-basic
Repo https://github.com/acecreamu/changing-the-memorability
Framework tf

Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking

Title Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking
Authors Heng Fan, Haibin Ling
Abstract Region proposal networks (RPN) have been recently combined with the Siamese network for tracking, and shown excellent accuracy with high efficiency. Nevertheless, previously proposed one-stage Siamese-RPN trackers degenerate in presence of similar distractors and large scale variation. Addressing these issues, we propose a multi-stage tracking framework, Siamese Cascaded RPN (C-RPN), which consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network. Compared to previous solutions, C-RPN has several advantages: (1) Each RPN is trained using the outputs of RPN in the previous stage. Such process stimulates hard negative sampling, resulting in more balanced training samples. Consequently, the RPNs are sequentially more discriminative in distinguishing difficult background (i.e., similar distractors). (2) Multi-level features are fully leveraged through a novel feature transfer block (FTB) for each RPN, further improving the discriminability of C-RPN using both high-level semantic and low-level spatial information. (3) With multiple steps of regressions, C-RPN progressively refines the location and shape of the target in each RPN with adjusted anchor boxes in the previous stage, which makes localization more accurate. C-RPN is trained end-to-end with the multi-task loss function. In inference, C-RPN is deployed as it is, without any temporal adaption, for real-time tracking. In extensive experiments on OTB-2013, OTB-2015, VOT-2016, VOT-2017, LaSOT and TrackingNet, C-RPN consistently achieves state-of-the-art results and runs in real-time.
Tasks Real-Time Visual Tracking, Visual Tracking
Published 2018-12-14
URL http://arxiv.org/abs/1812.06148v1
PDF http://arxiv.org/pdf/1812.06148v1.pdf
PWC https://paperswithcode.com/paper/siamese-cascaded-region-proposal-networks-for
Repo https://github.com/LeeWise9/Target-Tracking-Overview
Framework tf

Learning image-to-image translation using paired and unpaired training samples

Title Learning image-to-image translation using paired and unpaired training samples
Authors Soumya Tripathy, Juho Kannala, Esa Rahtu
Abstract Image-to-image translation is a general name for a task where an image from one domain is converted to a corresponding image in another domain, given sufficient training data. Traditionally different approaches have been proposed depending on whether aligned image pairs or two sets of (unaligned) examples from both domains are available for training. While paired training samples might be difficult to obtain, the unpaired setup leads to a highly under-constrained problem and inferior results. In this paper, we propose a new general purpose image-to-image translation model that is able to utilize both paired and unpaired training data simultaneously. We compare our method with two strong baselines and obtain both qualitatively and quantitatively improved results. Our model outperforms the baselines also in the case of purely paired and unpaired training data. To our knowledge, this is the first work to consider such hybrid setup in image-to-image translation.
Tasks Image-to-Image Translation
Published 2018-05-08
URL http://arxiv.org/abs/1805.03189v1
PDF http://arxiv.org/pdf/1805.03189v1.pdf
PWC https://paperswithcode.com/paper/learning-image-to-image-translation-using
Repo https://github.com/Blade6570/Learningimage-to-imagetranslationusingpairedandunpairedtrainingsamples
Framework pytorch

Transport-Based Pattern Theory: A Signal Transformation Approach

Title Transport-Based Pattern Theory: A Signal Transformation Approach
Authors Liam Cattell, Gustavo K. Rohde
Abstract In many scientific fields imaging is used to relate a certain physical quantity to other dependent variables. Therefore, images can be considered as a map from a real-world coordinate system to the non-negative measurements being acquired. In this work we describe an approach for simultaneous modeling and inference of such data, using the mathematics of optimal transport. To achieve this, we describe a numerical implementation of the linear optimal transport transform, based on the solution of the Monge-Ampere equation, which uses Brenier’s theorem to characterize the solution of the Monge functional as the derivative of a convex potential function. We use our implementation of the transform to compute a curl-free mapping between two images, and show that it is able to match images with lower error than existing methods. Moreover, we provide theoretical justification for properties of the linear optimal transport framework observed in the literature, including a theorem for the linear separation of data classes. Finally, we use our optimal transport method to empirically demonstrate that the linear separability theorem holds, by rendering non-linearly separable data as linearly separable following transform to transport space.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.07163v2
PDF http://arxiv.org/pdf/1802.07163v2.pdf
PWC https://paperswithcode.com/paper/transport-based-pattern-theory-a-signal
Repo https://github.com/skolouri/BAMC2019
Framework none
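
To make the role of Brenier's theorem in the abstract above more concrete, here is a minimal one-dimensional sketch. It is not the paper's Monge-Ampere solver: it relies only on the standard fact that, for quadratic cost on the real line, the optimal map is non-decreasing (the derivative of a convex potential), so equal-sized sample sets are matched in sorted order. The function names `monotone_transport_1d` and `transport_cost` are hypothetical.

```python
def monotone_transport_1d(source, target):
    """Map each source sample to a target sample by matching sorted ranks.

    Assumption: both sample sets have the same size and equal weights.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of samples")
    # Indices of the source samples in increasing order of value.
    order = sorted(range(len(source)), key=source.__getitem__)
    sorted_target = sorted(target)
    mapping = [0.0] * len(source)
    for rank, index in enumerate(order):
        # The k-th smallest source sample goes to the k-th smallest target sample.
        mapping[index] = sorted_target[rank]
    return mapping


def transport_cost(source, mapping):
    """Mean squared displacement of the map (the quadratic transport cost)."""
    return sum((s - m) ** 2 for s, m in zip(source, mapping)) / len(source)


source = [3.0, 1.0, 2.0]
target = [10.0, 30.0, 20.0]
mapping = monotone_transport_1d(source, target)
print(mapping, transport_cost(source, mapping))
```

In two dimensions no such sorting shortcut exists, which is why the paper resorts to a numerical solution of the Monge-Ampere equation.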

Chinese Lexical Analysis with Deep Bi-GRU-CRF Network

Title Chinese Lexical Analysis with Deep Bi-GRU-CRF Network
Authors Zhenyu Jiao, Shuqi Sun, Ke Sun
Abstract Lexical analysis is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end lexical analysis models with recurrent neural networks have gained increasing attention. In this report, we introduce a deep Bi-GRU-CRF network that jointly models word segmentation, part-of-speech tagging and named entity recognition tasks. We trained the model using several massive corpora pre-tagged by our best Chinese lexical analysis tool, together with a small, yet high-quality human annotated corpus. We conducted balanced sampling between different corpora to guarantee the influence of human annotations, and fine-tuned the CRF decoding layer regularly during the training process. As evaluated by linguistic experts, the model achieved a 95.5% accuracy on the test set, roughly 13% relative error reduction over our (previously) best Chinese lexical analysis tool. The model is computationally efficient, achieving the speed of 2.3K characters per second with one thread.
Tasks Lexical Analysis, Named Entity Recognition, Part-Of-Speech Tagging
Published 2018-07-05
URL http://arxiv.org/abs/1807.01882v1
PDF http://arxiv.org/pdf/1807.01882v1.pdf
PWC https://paperswithcode.com/paper/chinese-lexical-analysis-with-deep-bi-gru-crf
Repo https://github.com/baidu/lac
Framework none
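
The abstract above reports 95.5% accuracy and "roughly 13% relative error reduction" without stating the earlier tool's accuracy. The short calculation below backs out the implied figure. It is only arithmetic on the two quoted numbers, so the result is approximate, and the helper name `previous_accuracy` is hypothetical.

```python
def previous_accuracy(new_accuracy: float, relative_error_reduction: float) -> float:
    """Back out the earlier accuracy implied by a relative error reduction.

    Uses r = (old_error - new_error) / old_error, solved for old_error.
    """
    new_error = 1.0 - new_accuracy
    old_error = new_error / (1.0 - relative_error_reduction)
    return 1.0 - old_error


implied = previous_accuracy(0.955, 0.13)
print(f"implied earlier accuracy: {implied:.3f}")
```

With these inputs the error rate drops from about 5.2% to 4.5%, which puts the earlier tool at roughly 94.8% accuracy.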

DynamicGEM: A Library for Dynamic Graph Embedding Methods

Title DynamicGEM: A Library for Dynamic Graph Embedding Methods
Authors Palash Goyal, Sujit Rokka Chhetri, Ninareh Mehrabi, Emilio Ferrara, Arquimedes Canedo
Abstract DynamicGEM is an open-source Python library for learning node representations of dynamic graphs. It consists of state-of-the-art algorithms for defining embeddings of nodes whose connections evolve over time. The library also contains the evaluation framework for four downstream tasks on the network: graph reconstruction, static and temporal link prediction, node classification, and temporal visualization. We have implemented various metrics to evaluate the state-of-the-art methods, and examples of evolving networks from various domains. We have easy-to-use functions to call and evaluate the methods and have extensive usage documentation. Furthermore, DynamicGEM provides a template to add new algorithms with ease to facilitate further research on the topic.
Tasks Graph Embedding, Link Prediction, Node Classification
Published 2018-11-26
URL http://arxiv.org/abs/1811.10734v1
PDF http://arxiv.org/pdf/1811.10734v1.pdf
PWC https://paperswithcode.com/paper/dynamicgem-a-library-for-dynamic-graph
Repo https://github.com/palash1992/DynamicGEM
Framework tf

World Models

Title World Models
Authors David Ha, Jürgen Schmidhuber
Abstract We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io/
Tasks Car Racing
Published 2018-03-27
URL http://arxiv.org/abs/1803.10122v4
PDF http://arxiv.org/pdf/1803.10122v4.pdf
PWC https://paperswithcode.com/paper/world-models
Repo https://github.com/yueqiw/gqn-world-model
Framework pytorch

AutoLoss: Learning Discrete Schedules for Alternate Optimization

Title AutoLoss: Learning Discrete Schedules for Alternate Optimization
Authors Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing
Abstract Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters. Appropriately scheduling the optimization of a task objective or a set of parameters is usually crucial to the quality of convergence. In this paper, we present AutoLoss, a meta-learning framework that automatically learns and determines the optimization schedule. AutoLoss provides a generic way to represent and learn the discrete optimization schedule from metadata, allows for a dynamic and data-driven schedule in ML problems that involve alternating updates of different parameters or from different loss objectives. We apply AutoLoss on four ML tasks: d-ary quadratic regression, classification using a multi-layer perceptron (MLP), image generation using GANs, and multi-task neural machine translation (NMT). We show that the AutoLoss controller is able to capture the distribution of better optimization schedules that result in higher quality of convergence on all four tasks. The trained AutoLoss controller is generalizable – it can guide and improve the learning of a new task model with different specifications, or on different datasets.
Tasks Image Generation, Machine Translation, Meta-Learning
Published 2018-10-04
URL http://arxiv.org/abs/1810.02442v1
PDF http://arxiv.org/pdf/1810.02442v1.pdf
PWC https://paperswithcode.com/paper/autoloss-learning-discrete-schedules-for
Repo https://github.com/safpla/AutoLossRelease
Framework tf

Learning Quickly to Plan Quickly Using Modular Meta-Learning

Title Learning Quickly to Plan Quickly Using Modular Meta-Learning
Authors Rohan Chitnis, Leslie Pack Kaelbling, Tomás Lozano-Pérez
Abstract Multi-object manipulation problems in continuous state and action spaces can be solved by planners that search over sampled values for the continuous parameters of operators. The efficiency of these planners depends critically on the effectiveness of the samplers used, but effective sampling in turn depends on details of the robot, environment, and task. Our strategy is to learn functions called “specializers” that generate values for continuous operator parameters, given a state description and values for the discrete parameters. Rather than trying to learn a single specializer for each operator from large amounts of data on a single task, we take a modular meta-learning approach. We train on multiple tasks and learn a variety of specializers that, on a new task, can be quickly adapted using relatively little data – thus, our system “learns quickly to plan quickly” using these specializers. We validate our approach experimentally in simulated 3D pick-and-place tasks with continuous state and action spaces. Visit http://tinyurl.com/chitnis-icra-19 for a supplementary video.
Tasks Meta-Learning
Published 2018-09-20
URL http://arxiv.org/abs/1809.07878v2
PDF http://arxiv.org/pdf/1809.07878v2.pdf
PWC https://paperswithcode.com/paper/learning-quickly-to-plan-quickly-using
Repo https://github.com/FerranAlet/modular-metalearning
Framework pytorch

A Unified Model for Opinion Target Extraction and Target Sentiment Prediction

Title A Unified Model for Opinion Target Extraction and Target Sentiment Prediction
Authors Xin Li, Lidong Bing, Piji Li, Wai Lam
Abstract Target-based sentiment analysis involves opinion target extraction and target sentiment classification. However, most of the existing works usually studied one of these two sub-tasks alone, which hinders their practical use. This paper aims to solve the complete task of target-based sentiment analysis in an end-to-end fashion, and presents a novel unified model which applies a unified tagging scheme. Our framework involves two stacked recurrent neural networks: The upper one predicts the unified tags to produce the final output results of the primary target-based sentiment analysis; The lower one performs an auxiliary target boundary prediction aiming at guiding the upper network to improve the performance of the primary task. To explore the inter-task dependency, we propose to explicitly model the constrained transitions from target boundaries to target sentiment polarities. We also propose to maintain the sentiment consistency within an opinion target via a gate mechanism which models the relation between the features for the current word and the previous word. We conduct extensive experiments on three benchmark datasets and our framework achieves consistently superior results.
Tasks Sentiment Analysis
Published 2018-11-13
URL http://arxiv.org/abs/1811.05082v2
PDF http://arxiv.org/pdf/1811.05082v2.pdf
PWC https://paperswithcode.com/paper/a-unified-model-for-opinion-target-extraction
Repo https://github.com/IssacLin/GithubStarRepository
Framework pytorch
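
As an illustration of what a unified tagging scheme buys, the sketch below decodes one tag sequence into (target, polarity) pairs in a single pass, so boundary and sentiment come from the same label. The tag inventory used here (`B-`/`I-` boundaries joined with `POS`/`NEG`/`NEU`, plus `O`) is an assumption for the example and may differ from the paper's exact scheme. The function name `decode_unified_tags` is hypothetical.

```python
def decode_unified_tags(tokens, tags):
    """Turn unified tags such as B-POS / I-POS / O into (target, polarity) pairs."""
    spans = []
    current, polarity = [], None
    for token, tag in zip(tokens, tags):
        boundary, _, label = tag.partition("-")
        if boundary == "B" or (boundary == "I" and label != polarity):
            # A new target starts here; an I tag whose polarity disagrees
            # with the open span is also treated as a new target.
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [token], label
        elif boundary == "I":
            current.append(token)
        else:
            # "O": close any open target.
            if current:
                spans.append((" ".join(current), polarity))
            current, polarity = [], None
    if current:
        spans.append((" ".join(current), polarity))
    return spans


tokens = ["The", "battery", "life", "is", "great", "but", "screen", "flickers"]
tags = ["O", "B-POS", "I-POS", "O", "O", "O", "B-NEG", "O"]
print(decode_unified_tags(tokens, tags))
```

Treating a polarity change inside a span as a new target is one simple way to enforce the sentiment consistency the abstract mentions. The paper itself handles this with a gate mechanism rather than a decoding rule.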

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Title The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Authors Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang
Abstract While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called “perceptual losses”? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
Tasks
Published 2018-01-11
URL http://arxiv.org/abs/1801.03924v2
PDF http://arxiv.org/pdf/1801.03924v2.pdf
PWC https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-deep
Repo https://github.com/kozistr/gan-metrics
Framework pytorch
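
To show why the abstract above calls PSNR a "simple, shallow" function, here is a small sketch of the standard PSNR definition on flat pixel sequences. The example inputs are made up for illustration and do not come from the paper's dataset: two distortions with very different structure but the same mean squared error receive the same score.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100, 100, 100, 100]
alternating = [110, 90, 110, 90]        # oscillating, texture-like error
brightness_shift = [110, 110, 110, 110]  # uniform brightness offset

# Both distortions have MSE = 100, so PSNR cannot tell them apart.
print(psnr(reference, alternating), psnr(reference, brightness_shift))
```

Because the score depends only on the per-pixel error energy, any spatial arrangement of the same errors is rated identically. Learned deep features are meant to close exactly this kind of gap.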

PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction

Title PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction
Authors Qingyuan Feng, Evgenia Dueva, Artem Cherkasov, Martin Ester
Abstract In silico drug-target interaction (DTI) prediction is an important and challenging problem in biomedical research with a huge potential benefit to the pharmaceutical industry and patients. Most existing methods for DTI prediction including deep learning models generally have binary endpoints, which could be an oversimplification of the problem, and those methods are typically unable to handle cold-target problems, i.e., problems involving target protein that never appeared in the training set. Towards this, we contrived PADME (Protein And Drug Molecule interaction prEdiction), a framework based on Deep Neural Networks, to predict real-valued interaction strength between compounds and proteins without requiring feature engineering. PADME takes both compound and protein information as inputs, so it is capable of solving cold-target (and cold-drug) problems. To our knowledge, we are the first to combine Molecular Graph Convolution (MGC) for compound featurization with protein descriptors for DTI prediction. We used multiple cross-validation split schemes and evaluation metrics to measure the performance of PADME on multiple datasets, including the ToxCast dataset, and PADME consistently dominates baseline methods. The results of a case study, which predicts the binding affinity between various compounds and androgen receptor (AR), suggest PADME’s potential in drug development. The scalability of PADME is another advantage in the age of Big Data.
Tasks Feature Engineering
Published 2018-07-25
URL https://arxiv.org/abs/1807.09741v4
PDF https://arxiv.org/pdf/1807.09741v4.pdf
PWC https://paperswithcode.com/paper/padme-a-deep-learning-based-framework-for
Repo https://github.com/simonfqy/PADME
Framework tf

Localization Recall Precision (LRP): A New Performance Metric for Object Detection

Title Localization Recall Precision (LRP): A New Performance Metric for Object Detection
Authors Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan
Abstract Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose ‘Localization Recall Precision (LRP) Error’, a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the ‘Optimal LRP’, the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the ‘best’ confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy against the common approach of using a general threshold for all classes. At https://github.com/cancam/LRP we provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well.
Tasks Object Detection
Published 2018-07-04
URL http://arxiv.org/abs/1807.01696v2
PDF http://arxiv.org/pdf/1807.01696v2.pdf
PWC https://paperswithcode.com/paper/localization-recall-precision-lrp-a-new
Repo https://github.com/VladimirYugay/category-level-6D-pose-estimation
Framework none
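
The three components named in the abstract above (localization, FP rate, FN rate) can be combined into a single number as sketched below. The exact normalization is an assumption based on a reading of the paper and should be checked against the authors' repository: each true positive contributes (1 - IoU) / (1 - tau), each false positive and false negative contributes 1, and the sum is divided by the total number of TP, FP and FN. The function names `lrp_error` and `optimal_lrp` and the input layout are hypothetical.

```python
def lrp_error(tp_ious, num_fp, num_fn, tau=0.5):
    """LRP-style error for one class at one confidence threshold.

    tp_ious: IoU of each true-positive detection with its matched ground truth.
    tau:     IoU threshold used to decide what counts as a true positive.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    if any(iou < tau or iou > 1.0 for iou in tp_ious):
        raise ValueError("every true-positive IoU must lie in [tau, 1]")
    total = len(tp_ious) + num_fp + num_fn
    if total == 0:
        raise ValueError("no detections and no ground truth: error is undefined")
    # Normalized localization penalty: 0 for a perfect box, 1 at IoU == tau.
    localization = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
    return (localization + num_fp + num_fn) / total


def optimal_lrp(per_threshold):
    """Pick the confidence threshold with the smallest error.

    per_threshold maps a score threshold to (tp_ious, num_fp, num_fn).
    Returns (lowest_error, threshold).
    """
    return min((lrp_error(*stats), score) for score, stats in per_threshold.items())


stats = {
    0.3: ([0.75], 1, 0),  # lower threshold keeps one extra false positive
    0.6: ([0.75], 0, 0),  # higher threshold removes it
}
print(optimal_lrp(stats))
```

Unlike AP, the result identifies a specific operating threshold per class, which is the property the abstract uses for its video detection experiment.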

The streaming rollout of deep networks - towards fully model-parallel execution

Title The streaming rollout of deep networks - towards fully model-parallel execution
Authors Volker Fischer, Jan Köhler, Thomas Pfeil
Abstract Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world. However, this requires a seamless integration of temporal features into the network’s architecture. For the training of and inference with recurrent neural networks, they are usually rolled out over time, and different rollouts exist. Conventionally during inference, the layers of a network are computed in a sequential manner resulting in sparse temporal integration of information and long response times. In this study, we present a theoretical framework to describe rollouts, the level of model-parallelization they induce, and demonstrate differences in solving specific tasks. We prove that certain rollouts, also for networks with only skip and no recurrent connections, enable earlier and more frequent responses, and show empirically that these early responses have better performance. The streaming rollout maximizes these properties and enables a fully parallel execution of the network reducing runtime on massively parallel devices. Finally, we provide an open-source toolbox to design, train, evaluate, and interact with streaming rollouts.
Tasks
Published 2018-06-13
URL http://arxiv.org/abs/1806.04965v2
PDF http://arxiv.org/pdf/1806.04965v2.pdf
PWC https://paperswithcode.com/paper/the-streaming-rollout-of-deep-networks
Repo https://github.com/boschresearch/statestream
Framework tf

ProMP: Proximal Meta-Policy Search

Title ProMP: Proximal Meta-Policy Search
Authors Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel
Abstract Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.
Tasks Meta-Learning
Published 2018-10-16
URL http://arxiv.org/abs/1810.06784v3
PDF http://arxiv.org/pdf/1810.06784v3.pdf
PWC https://paperswithcode.com/paper/promp-proximal-meta-policy-search
Repo https://github.com/clrrrr/promp_plus
Framework none