July 29, 2019

2903 words · 14 min read

Paper Group AWR 134

A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning. CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. Mapping Instructions and Visual Observations to Actions with Reinforcement Learning. Lectures on …

A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning

Title A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning
Authors Jingjia Huang, Nannan Li, Tao Zhang, Ge Li
Abstract Existing action detection algorithms usually generate action proposals through an extensive search over the video at multiple temporal scales, which brings about huge computational overhead and deviates from the human perception procedure. We argue that the process of detecting actions should naturally be one of observation and refinement: observe the current window and refine the span of the attended window to cover true action regions. In this paper, we propose an active action proposal model that learns to find actions by continuously adjusting the temporal bounds in a self-adaptive way. The whole process can be viewed as an agent that is first placed at a random position in the video and then applies a sequence of transformations to the currently attended region to discover actions according to a learned policy. We utilize reinforcement learning, specifically the Deep Q-learning algorithm, to learn the agent’s decision policy. In addition, we use a temporal pooling operation to extract a more effective feature representation for the long temporal window, and design a regression network to adjust the position offsets between predicted results and the ground truth. Experimental results on THUMOS 2014 validate the effectiveness of the proposed approach, which achieves competitive performance with current action detection algorithms using far fewer proposals.
Tasks Action Detection, Q-Learning
Published 2017-06-22
URL http://arxiv.org/abs/1706.07251v1
PDF http://arxiv.org/pdf/1706.07251v1.pdf
PWC https://paperswithcode.com/paper/a-self-adaptive-proposal-model-for-temporal
Repo https://github.com/Parapompadoo/Temporal_Action_Detection
Framework none
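
To make the observe-and-refine loop concrete, here is a minimal sketch of a Deep Q-learning agent over window transformations. The action set, feature dimensionality, and reward are illustrative placeholders, not the authors’ exact design (their code is in the linked repo).

```python
import random
import torch
import torch.nn as nn

ACTIONS = ["shift_left", "shift_right", "expand", "shrink", "stop"]  # assumed set

class QNet(nn.Module):
    """Tiny Q-network scoring window transformations from a window feature."""
    def __init__(self, feat_dim=512, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def select_action(qnet, feat, epsilon=0.1):
    """Epsilon-greedy choice of the next window transformation."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(qnet(feat).argmax())

def td_update(qnet, target_net, opt, feat, action, reward, next_feat, done, gamma=0.99):
    """One-step Q-learning: regress Q(s, a) onto r + gamma * max_a' Q_target(s', a')."""
    q = qnet(feat)[action]
    with torch.no_grad():
        bootstrap = 0.0 if done else float(target_net(next_feat).max())
    loss = (q - (reward + gamma * bootstrap)) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```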

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Title CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Authors Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid
Abstract Class-imbalanced classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances represent the concept of greater interest. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with boosting (AdaBoost), called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of CUSBoost against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.
Tasks
Published 2017-12-12
URL http://arxiv.org/abs/1712.04356v1
PDF http://arxiv.org/pdf/1712.04356v1.pdf
PWC https://paperswithcode.com/paper/cusboost-cluster-based-under-sampling-with
Repo https://github.com/farshidrayhanuiu/CUSBoost
Framework none
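
A minimal sketch of the cluster-based under-sampling idea, as we read it from the abstract (the authors’ implementation is in the linked repo): cluster the majority class with k-means, sample from every cluster so its internal structure is preserved, then train AdaBoost on the balanced set. The cluster count and per-cluster sample size below are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def cluster_undersample(X_maj, n_clusters=5, per_cluster=20, seed=0):
    """Draw samples from every k-means cluster of the majority class."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X_maj)
    keep = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, idx.size)
        keep.extend(rng.choice(idx, size=take, replace=False))
    return X_maj[np.array(keep)]

def cusboost_fit(X, y, minority_label=1, **kws):
    """Balance via cluster-based under-sampling, then boost."""
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    X_maj_s = cluster_undersample(X_maj, **kws)
    X_bal = np.vstack([X_min, X_maj_s])
    y_bal = np.concatenate([np.ones(len(X_min)), np.zeros(len(X_maj_s))])
    return AdaBoostClassifier(n_estimators=100).fit(X_bal, y_bal)
```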

DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

Title DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks
Authors Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, Jiri Matas
Abstract We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and a content loss. DeblurGAN achieves state-of-the-art performance in both the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem – object detection on (de-)blurred images. The method is 5 times faster than the closest competitor – DeepDeblur. We also introduce a novel method for generating synthetic motion-blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN
Tasks Deblurring, Object Detection
Published 2017-11-19
URL http://arxiv.org/abs/1711.07064v4
PDF http://arxiv.org/pdf/1711.07064v4.pdf
PWC https://paperswithcode.com/paper/deblurgan-blind-motion-deblurring-using
Repo https://github.com/KupynOrest/DeblurGAN
Framework pytorch
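
The abstract’s recipe – conditional GAN plus content loss – has roughly this shape. The sketch below is illustrative only: the VGG layer cut-off and loss weight are our assumptions, not the paper’s exact choices.

```python
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG feature extractor for the perceptual content loss.
vgg = vgg19(weights="DEFAULT").features[:15].eval()  # mid-level cut-off: our choice
for p in vgg.parameters():
    p.requires_grad_(False)

def content_loss(restored, sharp):
    """Perceptual distance between VGG feature maps of restored and sharp images."""
    return nn.functional.mse_loss(vgg(restored), vgg(sharp))

def generator_loss(critic, restored, sharp, lam=100.0):
    """Adversarial term (fool the critic) plus weighted content term."""
    adversarial = -critic(restored).mean()
    return adversarial + lam * content_loss(restored, sharp)
```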

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

Title Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Authors Dipendra Misra, John Langford, Yoav Artzi
Abstract We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent’s exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment, and show significant improvements over supervised learning and common reinforcement learning variants.
Tasks
Published 2017-04-28
URL http://arxiv.org/abs/1704.08795v2
PDF http://arxiv.org/pdf/1704.08795v2.pdf
PWC https://paperswithcode.com/paper/mapping-instructions-and-visual-observations
Repo https://github.com/clic-lab/blocks
Framework tf
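
A minimal sketch of the training signal described above: a one-step (contextual-bandit) policy-gradient update with a shaped reward added to the environment reward. Feature sizes, the action space, and the shaping term are placeholders.

```python
import torch
import torch.nn as nn

N_ACTIONS, FEAT_DIM = 10, 256          # placeholders
policy = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def bandit_step(state_feat, env_reward, shaping):
    """Sample one action, observe the immediate shaped reward, take a REINFORCE step."""
    dist = torch.distributions.Categorical(logits=policy(state_feat))
    action = dist.sample()
    reward = env_reward(action.item()) + shaping(action.item())  # reward shaping
    loss = -dist.log_prob(action) * reward   # no bootstrapping: bandit setting
    opt.zero_grad()
    loss.backward()
    opt.step()
    return action.item()
```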

Lectures on Randomized Numerical Linear Algebra

Title Lectures on Randomized Numerical Linear Algebra
Authors Petros Drineas, Michael W. Mahoney
Abstract This chapter is based on lectures on Randomized Numerical Linear Algebra from the 2016 Park City Mathematics Institute summer school on The Mathematics of Data.
Tasks
Published 2017-12-24
URL http://arxiv.org/abs/1712.08880v1
PDF http://arxiv.org/pdf/1712.08880v1.pdf
PWC https://paperswithcode.com/paper/lectures-on-randomized-numerical-linear
Repo https://github.com/bkmi/RandLowRank
Framework none
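
As a taste of the lectures’ subject matter, here is the standard randomized range-finder SVD, one of the central algorithms of randomized numerical linear algebra: project onto a random subspace, orthonormalize, and solve the small problem exactly.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate truncated SVD via a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for range(A @ Omega)
    B = Q.T @ A                             # small (rank + p) x n problem
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

A = np.random.default_rng(1).standard_normal((500, 200))
U, s, Vt = randomized_svd(A, rank=20)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt))  # near the best rank-20 error
```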

Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal

Title Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal
Authors Jifeng Wang, Xiang Li, Le Hui, Jian Yang
Abstract In previous studies, understanding shadows from a single image has naturally split into two tasks: shadow detection and shadow removal. In this paper, we present a multi-task perspective, not embraced by any existing work, that jointly learns both detection and removal in an end-to-end fashion, aiming to let the two tasks mutually benefit each other. Our framework is based on a novel STacked Conditional Generative Adversarial Network (ST-CGAN), which is composed of two stacked CGANs, each with a generator and a discriminator. Specifically, a shadow image is fed into the first generator, which produces a shadow detection mask. That shadow image, concatenated with its predicted mask, then goes through the second generator to recover the corresponding shadow-free image. In addition, the two corresponding discriminators are well positioned to model higher-level relationships and global scene characteristics for the detected shadow region and the shadow-removal reconstruction, respectively. More importantly, for multi-task learning, our stacked design provides a novel view that is notably different from the commonly used multi-branch paradigm. To fully evaluate the performance of our proposed framework, we construct the first large-scale benchmark with 1870 image triplets (shadow image, shadow mask image, and shadow-free image) under 135 scenes. Extensive experimental results consistently show the advantages of ST-CGAN over several representative state-of-the-art methods on two large-scale publicly available datasets and our newly released one.
Tasks Multi-Task Learning, Shadow Detection
Published 2017-12-07
URL http://arxiv.org/abs/1712.02478v1
PDF http://arxiv.org/pdf/1712.02478v1.pdf
PWC https://paperswithcode.com/paper/stacked-conditional-generative-adversarial
Repo https://github.com/kjybinp/SCGAN
Framework none
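
The stacked data flow is easy to sketch; the snippet below uses stand-in convolutional stacks, not the authors’ generator architecture. G1 maps the shadow image to a detection mask, and G2 consumes the image concatenated with the predicted mask.

```python
import torch
import torch.nn as nn

def tiny_generator(in_ch, out_ch):
    """Stand-in conv stack; the real generators are much deeper."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

G1 = tiny_generator(3, 1)   # shadow image -> detection mask
G2 = tiny_generator(4, 3)   # image + mask -> shadow-free image

x = torch.randn(1, 3, 64, 64)                   # shadow image
mask = torch.sigmoid(G1(x))                     # stage 1: detection
shadow_free = G2(torch.cat([x, mask], dim=1))   # stage 2: removal
```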

Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks

Title Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks
Authors Huihuang Zhao, Paul L. Rosin, Yu-Kun Lai
Abstract This paper presents an automatic image synthesis method to transfer the style of an example image to a content image. When standard neural style transfer approaches are used, the textures and colours in different semantic regions of the style image are often applied inappropriately to the content image, ignoring its semantic layout, and ruining the transfer result. In order to reduce or avoid such effects, we propose a novel method based on automatically segmenting the objects and extracting their soft semantic masks from the style and content images, in order to preserve the structure of the content image while having the style transferred. Each soft mask of the style image represents a specific part of the style image, corresponding to the soft mask of the content image with the same semantics. Both the soft masks and source images are provided as multichannel input to an augmented deep CNN framework for style transfer which incorporates a generative Markov random field (MRF) model. Results on various images show that our method outperforms the most recent techniques.
Tasks Image Generation, Style Transfer
Published 2017-08-31
URL http://arxiv.org/abs/1708.09641v1
PDF http://arxiv.org/pdf/1708.09641v1.pdf
PWC https://paperswithcode.com/paper/automatic-semantic-style-transfer-using-deep
Repo https://github.com/huihuangz/Neural-Style-Transfer-Papers-Code
Framework torch
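
To make the soft-mask idea concrete, here is a hedged sketch of region-aware style matching using Gram matrices weighted by a soft mask. The paper’s actual formulation is MRF-based; this simpler masked-Gram variant only illustrates how masks restrict style statistics to corresponding semantic regions.

```python
import torch
import torch.nn.functional as F

def masked_gram(features, soft_mask):
    """Gram matrix of features restricted to one soft region.
    features: (C, H, W); soft_mask: (H, W) with values in [0, 1]."""
    C, H, W = features.shape
    f = (features * soft_mask).reshape(C, -1)   # attenuate other regions
    return (f @ f.t()) / (soft_mask.sum() * C + 1e-8)

def region_style_loss(content_feats, style_feats, content_mask, style_mask):
    """Match style statistics only between semantically corresponding regions."""
    gc = masked_gram(content_feats, content_mask)
    gs = masked_gram(style_feats, style_mask)
    return F.mse_loss(gc, gs)
```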

Separating Style and Content for Generalized Style Transfer

Title Separating Style and Content for Generalized Style Transfer
Authors Yexun Zhang, Ya Zhang, Wenbin Cai, Jie Chang
Abstract Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, so the learned model does not generalize to new styles. We instead attempt to separate the representations of style and content, and propose a generalized style transfer network consisting of a style encoder, a content encoder, a mixer and a decoder. The style and content encoders extract the style and content factors from the style reference images and content reference images, respectively. The mixer employs a bilinear model to integrate the two factors and feeds the result into the decoder to generate images with the target style and content. To separate style features from content features, we leverage the conditional dependence of styles and contents given an image. During training, the encoder network learns to extract styles and contents from two sets of reference images of limited size, one with shared style and the other with shared content. This learning framework allows simultaneous style transfer among multiple styles and can be seen as a special `multi-task’ learning scenario. The encoders are expected to capture underlying features for different styles and contents which generalize to new styles and contents. For validation, we applied the proposed algorithm to the Chinese typeface transfer problem. Extensive experimental results on character generation demonstrate the effectiveness and robustness of our method.
Tasks Multi-Task Learning, Style Transfer
Published 2017-11-17
URL http://arxiv.org/abs/1711.06454v6
PDF http://arxiv.org/pdf/1711.06454v6.pdf
PWC https://paperswithcode.com/paper/separating-style-and-content-for-generalized
Repo https://github.com/ycjing/Character-Stylization
Framework none
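
The factor-mixing step maps directly onto a bilinear layer. A minimal sketch, with placeholder dimensions:

```python
import torch
import torch.nn as nn

STYLE_DIM, CONTENT_DIM, MIX_DIM = 128, 128, 256   # placeholders
mixer = nn.Bilinear(STYLE_DIM, CONTENT_DIM, MIX_DIM)

style_code = torch.randn(8, STYLE_DIM)      # from the style encoder
content_code = torch.randn(8, CONTENT_DIM)  # from the content encoder
mixed = mixer(style_code, content_code)     # latent consumed by the decoder
```

The bilinear form lets every style dimension interact multiplicatively with every content dimension, which is what allows the two factors to recombine freely for unseen style/content pairs.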

Neural Language Modeling by Jointly Learning Syntax and Lexicon

Title Neural Language Modeling by Jointly Learning Syntax and Lexicon
Authors Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville
Abstract We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word- and character-level language modeling tasks.
Tasks Constituency Grammar Induction, Language Modelling
Published 2017-11-02
URL http://arxiv.org/abs/1711.02013v2
PDF http://arxiv.org/pdf/1711.02013v2.pdf
PWC https://paperswithcode.com/paper/neural-language-modeling-by-jointly-learning
Repo https://github.com/nyu-mll/PRPN-Analysis
Framework pytorch
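
A loose sketch of our reading of the core mechanism (not the authors’ code): a convolutional parsing network predicts a syntactic distance per token, and soft gates attenuate attention across positions separated by a larger intervening distance, i.e. a likely constituent boundary.

```python
import torch
import torch.nn as nn

class DistanceGate(nn.Module):
    """Toy parsing network: per-token syntactic distances -> soft attention gates."""
    def __init__(self, dim, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, 1, kernel, padding=kernel // 2)

    def forward(self, h):                            # h: (batch, seq, dim)
        d = self.conv(h.transpose(1, 2)).squeeze(1)  # (batch, seq) distances
        b, t = d.shape
        gates = torch.ones(b, t, t)                  # only i < j entries are used
        for j in range(t):
            for i in range(j):
                # largest distance between positions i and j acts as a barrier
                barrier = d[:, i + 1:j + 1].max(dim=1).values
                gates[:, j, i] = torch.sigmoid(d[:, i] - barrier)
        return gates   # soft mask limiting how far back position j may attend
```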

Task-based End-to-end Model Learning in Stochastic Optimization

Title Task-based End-to-end Model Learning in Stochastic Optimization
Authors Priya L. Donti, Brandon Amos, J. Zico Kolter
Abstract With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications.
Tasks Stochastic Optimization
Published 2017-03-13
URL http://arxiv.org/abs/1703.04529v4
PDF http://arxiv.org/pdf/1703.04529v4.pdf
PWC https://paperswithcode.com/paper/task-based-end-to-end-model-learning-in
Repo https://github.com/locuslab/e2e-model-learning
Framework pytorch
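
A toy sketch of the idea on a newsvendor-style inventory problem: rather than fitting the demand model by MSE, backpropagate the asymmetric task cost of over- and under-stocking through the ordering decision. The model, costs, and data here are made up, and the decision is taken to be the prediction itself rather than the paper’s full stochastic-programming argmin.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
C_OVER, C_UNDER = 1.0, 5.0   # asymmetric stocking costs (made up)

def task_loss(features, demand):
    """Evaluate the prediction by the cost of the decision it induces."""
    order = model(features).squeeze(-1)       # decision = predicted demand
    over = torch.relu(order - demand)         # units wasted
    under = torch.relu(demand - order)        # demand missed
    return (C_OVER * over + C_UNDER * under).mean()

features = torch.randn(64, 5)
demand = features.sum(dim=1).abs() + torch.rand(64)
for _ in range(200):
    opt.zero_grad()
    task_loss(features, demand).backward()    # gradient of the *task* objective
    opt.step()
```

With these costs, the task-trained model learns to over-order relative to the MSE-optimal prediction, which is exactly the behaviour a symmetric training loss cannot express.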

Training Deep Networks without Learning Rates Through Coin Betting

Title Training Deep Networks without Learning Rates Through Coin Betting
Authors Francesco Orabona, Tatiana Tommasi
Abstract Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we do not adapt the learning rates, nor do we make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms.
Tasks Stochastic Optimization
Published 2017-05-22
URL http://arxiv.org/abs/1705.07795v3
PDF http://arxiv.org/pdf/1705.07795v3.pdf
PWC https://paperswithcode.com/paper/training-deep-networks-without-learning-rates
Repo https://github.com/bremen79/cocob
Framework tf
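
The coin-betting reduction admits a tiny one-dimensional demonstration (a KT-style bettor, not the authors’ COCOB implementation): the coin outcome is the negative gradient, the bet is a data-dependent fraction of accumulated wealth, and no learning rate appears anywhere.

```python
import numpy as np

def kt_coin_betting(grad, w0=0.0, steps=2000):
    """Minimize via coin betting: parameter = current bet, no step size."""
    w, wealth, outcome_sum = w0, 1.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)                     # coin outcome c_t = -g, assumed |g| <= 1
        wealth += -g * (w - w0)         # money won or lost on the current bet
        outcome_sum += -g
        w = w0 + (outcome_sum / (t + 1)) * wealth  # KT fraction of wealth
    return w

# f(w) = 0.5 * (w - 3)^2 with gradients clipped into [-1, 1];
# the iterate approaches the minimizer w = 3 with no tuned step size.
print(kt_coin_betting(lambda w: np.clip(w - 3.0, -1, 1)))
```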

Probabilistic Line Searches for Stochastic Optimization

Title Probabilistic Line Searches for Stochastic Optimization
Authors Maren Mahsereci, Philipp Hennig
Abstract In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.
Tasks Stochastic Optimization
Published 2017-03-29
URL http://arxiv.org/abs/1703.10034v2
PDF http://arxiv.org/pdf/1703.10034v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-line-searches-for-stochastic
Repo https://github.com/lessw2020/Best-Deep-Learning-Optimizers
Framework pytorch
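
A very loose sketch of the flavour of the approach (not the authors’ algorithm, which also models gradient observations and a probabilistic Wolfe criterion): fit a GP to a few noisy evaluations along the search direction and pick the step favoured by the posterior mean.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def probabilistic_step(f_noisy, candidate_steps):
    """GP surrogate over the 1-D line-search objective; return the best step."""
    t = np.asarray(candidate_steps).reshape(-1, 1)
    y = np.array([f_noisy(s) for s in candidate_steps])
    gp = GaussianProcessRegressor(RBF(0.5) + WhiteKernel(0.01)).fit(t, y)
    grid = np.linspace(t.min(), t.max(), 200).reshape(-1, 1)
    return float(grid[np.argmin(gp.predict(grid)), 0])

rng = np.random.default_rng(0)
f = lambda s: (s - 0.7) ** 2 + 0.05 * rng.standard_normal()  # noisy objective
print(probabilistic_step(f, [0.0, 0.25, 0.5, 1.0, 2.0]))     # near 0.7
```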

Face Super-Resolution Through Wasserstein GANs

Title Face Super-Resolution Through Wasserstein GANs
Authors Zhimin Chen, Yuguang Tong
Abstract Generative adversarial networks (GANs) have received a tremendous amount of attention in the past few years, and have inspired applications addressing a wide range of problems. Despite their great potential, GANs are difficult to train. Recently, a series of papers (Arjovsky & Bottou, 2017a; Arjovsky et al., 2017b; Gulrajani et al., 2017) proposed using the Wasserstein distance as the training objective and promised easy, stable GAN training across architectures with minimal hyperparameter tuning. In this paper, we compare the performance of the Wasserstein distance with other training objectives on a variety of GAN architectures in the context of single image super-resolution. Our results confirm that Wasserstein GAN with gradient penalty (WGAN-GP) provides stable and converging GAN training and that the Wasserstein distance is an effective metric to gauge training progress.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-05-06
URL http://arxiv.org/abs/1705.02438v1
PDF http://arxiv.org/pdf/1705.02438v1.pdf
PWC https://paperswithcode.com/paper/face-super-resolution-through-wasserstein
Repo https://github.com/MandyZChen/srez
Framework tf
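
The gradient-penalty term the abstract credits for stable training is standard and worth writing out; the critic below is a placeholder.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize the critic's gradient norm at random interpolates (WGAN-GP)."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    norms = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lam * ((norms - 1) ** 2).mean()
```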

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

Title A parallel corpus of Python functions and documentation strings for automated code documentation and code generation
Authors Antonio Valerio Miceli Barone, Rico Sennrich
Abstract Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of about a hundred thousand Python functions with their documentation strings (“docstrings”), generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained by neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas.
Tasks Code Generation, Data Augmentation, Machine Translation
Published 2017-07-07
URL http://arxiv.org/abs/1707.02275v1
PDF http://arxiv.org/pdf/1707.02275v1.pdf
PWC https://paperswithcode.com/paper/a-parallel-corpus-of-python-functions-and
Repo https://github.com/Avmb/code-docstring-corpus
Framework none
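
A minimal sketch of the scraping step using only the standard-library ast module; the released corpus applies far more filtering and normalization than this.

```python
import ast

def function_docstring_pairs(source: str):
    """Extract (function name, docstring) pairs from Python source text."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                pairs.append((node.name, doc))
    return pairs

code = 'def add(a, b):\n    """Return the sum of a and b."""\n    return a + b\n'
print(function_docstring_pairs(code))  # [('add', 'Return the sum of a and b.')]
```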

Real-time Deep Video Deinterlacing

Title Real-time Deep Video Deinterlacing
Authors Haichao Zhu, Xueting Liu, Xiangyu Mao, Tien-Tsin Wong
Abstract Interlacing is a widely used technique in television broadcast and video recording to double the perceived frame rate without increasing the bandwidth. But it introduces annoying visual artifacts, such as flickering and silhouette “serration,” during playback. Existing state-of-the-art deinterlacing methods either ignore the temporal information to provide real-time performance but lower visual quality, or estimate the motion for better deinterlacing at the cost of higher computation. In this paper, we present the first deep convolutional neural network (DCNN) based method to deinterlace with high visual quality and real-time performance. Unlike existing models for super-resolution, which rely on a translation-invariance assumption, our proposed DCNN model utilizes the temporal information from both the odd and even half frames to reconstruct only the missing scanlines, and retains the given odd and even scanlines for producing the full deinterlaced frames. By further introducing a layer-sharable architecture, our system achieves real-time performance on a single GPU. Experiments show that our method outperforms all existing methods in terms of reconstruction accuracy and computational performance.
Tasks Super-Resolution, Video Deinterlacing
Published 2017-08-01
URL http://arxiv.org/abs/1708.00187v1
PDF http://arxiv.org/pdf/1708.00187v1.pdf
PWC https://paperswithcode.com/paper/real-time-deep-video-deinterlacing
Repo https://github.com/lszhuhaichao/Deep-Video-Deinterlacing
Framework tf
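
The field-preserving reconstruction described above can be sketched as follows (our own toy, not the paper’s network): keep the scanlines the interlaced field provides verbatim, let a small CNN predict only the missing ones, then interleave.

```python
import torch
import torch.nn as nn

predict_missing = nn.Sequential(        # stand-in for the paper's DCNN
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

def deinterlace_even_field(field):      # field: (B, 1, H/2, W) even scanlines
    missing = predict_missing(field)    # predicted odd scanlines, same shape
    b, c, h, w = field.shape
    # Interleave rows: the given scanlines are kept verbatim, predictions
    # fill only the absent positions.
    return torch.stack([field, missing], dim=3).reshape(b, c, 2 * h, w)

print(deinterlace_even_field(torch.rand(1, 1, 120, 320)).shape)  # (1, 1, 240, 320)
```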