Paper Group AWR 134
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning
Title | A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning |
Authors | Jingjia Huang, Nannan Li, Tao Zhang, Ge Li |
Abstract | Existing action detection algorithms usually generate action proposals through an extensive search over the video at multiple temporal scales, which brings about huge computational overhead and deviates from the human perception procedure. We argue that the process of detecting actions should naturally be one of observation and refinement: observe the current window and refine the span of the attended window to cover true action regions. In this paper, we propose an active action proposal model that learns to find actions by continuously adjusting the temporal bounds in a self-adaptive way. The whole process can be deemed as an agent that is first placed at a random position in the video and then applies a sequence of transformations to the currently attended region to discover actions according to a learned policy. We utilize reinforcement learning, specifically the Deep Q-learning algorithm, to learn the agent’s decision policy. In addition, we use a temporal pooling operation to extract a more effective feature representation for the long temporal window, and design a regression network to adjust the position offsets between predicted results and the ground truth. Experimental results on THUMOS 2014 validate the effectiveness of the proposed approach, which achieves competitive performance with current action detection algorithms using far fewer proposals. |
Tasks | Action Detection, Q-Learning |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07251v1 |
PDF | http://arxiv.org/pdf/1706.07251v1.pdf |
PWC | https://paperswithcode.com/paper/a-self-adaptive-proposal-model-for-temporal |
Repo | https://github.com/Parapompadoo/Temporal_Action_Detection |
Framework | none |
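The abstract above describes an agent that repeatedly transforms a temporal window under a Deep Q-learning policy. Below is a minimal, illustrative sketch of such an agent; the action set (shift/expand/shrink/terminate), the pooled-feature input, and the IoU-based reward are assumptions for illustration, not the authors' exact design.

```python
import random
import torch
import torch.nn as nn

# Hypothetical action set for adjusting a temporal window [start, end].
ACTIONS = ["shift_left", "shift_right", "expand", "shrink", "terminate"]

class QNet(nn.Module):
    """Maps pooled window features to one Q-value per action."""
    def __init__(self, feat_dim=512, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def temporal_iou(a, b):
    """IoU of two temporal segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def select_action(qnet, feat, epsilon=0.1):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(qnet(feat).argmax())
```

In a full training loop the agent would receive a reward based on how much each transformation improves the temporal IoU with the ground-truth segment, and the Q-network would be trained with the usual DQN targets.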
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Title | CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification |
Authors | Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid |
Abstract | Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances represent the concept of greater interest than the majority class instances. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with boosting (AdaBoost), called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of the CUSBoost algorithm against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04356v1 |
PDF | http://arxiv.org/pdf/1712.04356v1.pdf |
PWC | https://paperswithcode.com/paper/cusboost-cluster-based-under-sampling-with |
Repo | https://github.com/farshidrayhanuiu/CUSBoost |
Framework | none |
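CUSBoost clusters the majority class and under-samples within each cluster before boosting. A rough scikit-learn sketch of that idea (KMeans plus AdaBoost) is shown below; the cluster count and per-cluster sampling ratio are illustrative choices, not the authors' exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def cluster_undersample(X_maj, n_clusters=5, keep_ratio=0.5, seed=0):
    """Cluster the majority class and keep a fraction of each cluster."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        n_keep = max(1, int(len(idx) * keep_ratio))
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    return X_maj[np.concatenate(kept)]

def cusboost_fit(X, y, majority_label=0, **kwargs):
    """Under-sample the majority class cluster-wise, then fit AdaBoost."""
    X_maj, X_min = X[y == majority_label], X[y != majority_label]
    y_min = y[y != majority_label]
    X_maj_s = cluster_undersample(X_maj, **kwargs)
    X_bal = np.vstack([X_maj_s, X_min])
    y_bal = np.concatenate([np.full(len(X_maj_s), majority_label), y_min])
    return AdaBoostClassifier(n_estimators=100).fit(X_bal, y_bal)
```

Sampling within clusters (rather than uniformly at random, as in RUSBoost) aims to keep representatives from every region of the majority class.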
DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks
Title | DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks |
Authors | Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, Jiri Matas |
Abstract | We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and a content loss. DeblurGAN achieves state-of-the-art performance in both the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem – object detection on (de-)blurred images. The method is 5 times faster than the closest competitor – DeepDeblur. We also introduce a novel method for generating synthetic motion-blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN |
Tasks | Deblurring, Object Detection |
Published | 2017-11-19 |
URL | http://arxiv.org/abs/1711.07064v4 |
PDF | http://arxiv.org/pdf/1711.07064v4.pdf |
PWC | https://paperswithcode.com/paper/deblurgan-blind-motion-deblurring-using |
Repo | https://github.com/KupynOrest/DeblurGAN |
Framework | pytorch |
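DeblurGAN trains the generator with a content loss plus a conditional adversarial loss. The snippet below is only an approximation of that objective: a VGG-based perceptual content term combined with a WGAN-style critic score; the layer cut-off and the weighting factor are arbitrary illustrative choices, not the paper's exact values.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor used for a perceptual "content" loss.
# (torchvision >= 0.13 weights API; older versions use pretrained=True.)
vgg_features = vgg19(weights="DEFAULT").features[:15].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def generator_loss(critic, generator, blurred, sharp, content_weight=100.0):
    """Content (perceptual) loss + adversarial loss for the deblurring generator."""
    restored = generator(blurred)
    content = F.mse_loss(vgg_features(restored), vgg_features(sharp))
    adversarial = -critic(restored).mean()   # WGAN-style critic score
    return content_weight * content + adversarial
```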
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Title | Mapping Instructions and Visual Observations to Actions with Reinforcement Learning |
Authors | Dipendra Misra, John Langford, Yoav Artzi |
Abstract | We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent’s exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment, and show significant improvements over supervised learning and common reinforcement learning variants. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08795v2 |
PDF | http://arxiv.org/pdf/1704.08795v2.pdf |
PWC | https://paperswithcode.com/paper/mapping-instructions-and-visual-observations |
Repo | https://github.com/clic-lab/blocks |
Framework | tf |
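The paper trains a neural agent with reinforcement learning in a contextual bandit setting, guided by reward shaping. A toy sketch of a single shaped-reward policy-gradient update is given below; the potential-based shaping term, feature size, and action count are illustrative assumptions rather than the authors' setup.

```python
import torch
import torch.nn as nn

# Toy policy over 10 actions given a 128-dimensional joint text+vision feature.
policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def shaped_reward(task_reward, potential_prev, potential_next, gamma=1.0):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s)."""
    return task_reward + gamma * potential_next - potential_prev

def bandit_update(state_feat, action, reward):
    """One contextual-bandit policy-gradient step on a single observed action."""
    logits = policy(state_feat)
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    loss = -reward * log_prob          # REINFORCE-style objective, no bootstrapping
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the contextual bandit setting, each action's reward is used immediately, without propagating value estimates across a trajectory.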
Lectures on Randomized Numerical Linear Algebra
Title | Lectures on Randomized Numerical Linear Algebra |
Authors | Petros Drineas, Michael W. Mahoney |
Abstract | This chapter is based on lectures on Randomized Numerical Linear Algebra from the 2016 Park City Mathematics Institute summer school on The Mathematics of Data. |
Tasks | |
Published | 2017-12-24 |
URL | http://arxiv.org/abs/1712.08880v1 |
PDF | http://arxiv.org/pdf/1712.08880v1.pdf |
PWC | https://paperswithcode.com/paper/lectures-on-randomized-numerical-linear |
Repo | https://github.com/bkmi/RandLowRank |
Framework | none |
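These lectures cover randomized sketching methods for matrix problems. As a small worked example of the flavor of RandNLA (not taken from the notes themselves), here is a standard randomized range-finder for low-rank approximation in NumPy.

```python
import numpy as np

def randomized_low_rank(A, rank, oversample=10, seed=0):
    """Randomized range finder: sketch with a random test matrix, then small SVD."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                        # orthonormal basis for range(A @ Omega)
    B = Q.T @ A                                           # small (rank+oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Example: approximate a 1000 x 500 matrix with a rank-20 factorization.
A = np.random.randn(1000, 500)
U, s, Vt = randomized_low_rank(A, rank=20)
```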
Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal
Title | Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal |
Authors | Jifeng Wang, Xiang Li, Le Hui, Jian Yang |
Abstract | In previous studies, understanding shadows from a single image naturally splits into two types of task: shadow detection and shadow removal. In this paper, we present a multi-task perspective, not embraced by any existing work, to jointly learn both detection and removal in an end-to-end fashion, so that the two tasks mutually benefit from each other. Our framework is based on a novel STacked Conditional Generative Adversarial Network (ST-CGAN), which is composed of two stacked CGANs, each with a generator and a discriminator. Specifically, a shadow image is fed into the first generator, which produces a shadow detection mask. That shadow image, concatenated with its predicted mask, is then fed into the second generator to recover the corresponding shadow-free image. In addition, the two corresponding discriminators model higher-level relationships and global scene characteristics for the detected shadow region and for the reconstruction obtained by removing shadows, respectively. More importantly, for multi-task learning, our stacked design provides a novel perspective that differs notably from the commonly used multi-branch version. To fully evaluate the performance of our proposed framework, we construct the first large-scale benchmark with 1870 image triplets (shadow image, shadow mask image, and shadow-free image) under 135 scenes. Extensive experimental results consistently show the advantages of ST-CGAN over several representative state-of-the-art methods on two large-scale publicly available datasets and our newly released one. |
Tasks | Multi-Task Learning, Shadow Detection |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02478v1 |
PDF | http://arxiv.org/pdf/1712.02478v1.pdf |
PWC | https://paperswithcode.com/paper/stacked-conditional-generative-adversarial |
Repo | https://github.com/kjybinp/SCGAN |
Framework | none |
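The ST-CGAN pipeline stacks two conditional generators: the first predicts a shadow mask from the shadow image, and the second takes the image concatenated with the predicted mask and outputs a shadow-free image. A schematic PyTorch forward pass with placeholder generator architectures (the real generators are much deeper encoder-decoders) might look like this.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Placeholder encoder-decoder; a stand-in for the paper's generators."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

g_detect = TinyGenerator(in_ch=3, out_ch=1)   # shadow image -> shadow mask
g_remove = TinyGenerator(in_ch=4, out_ch=3)   # image + mask -> shadow-free image

def stacked_forward(shadow_img):
    mask = g_detect(shadow_img)                   # stage 1: detection
    x = torch.cat([shadow_img, mask], dim=1)      # condition stage 2 on the predicted mask
    shadow_free = g_remove(x)                     # stage 2: removal
    return mask, shadow_free
```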
Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks
Title | Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks |
Authors | Huihuang Zhao, Paul L. Rosin, Yu-Kun Lai |
Abstract | This paper presents an automatic image synthesis method to transfer the style of an example image to a content image. When standard neural style transfer approaches are used, the textures and colours in different semantic regions of the style image are often applied inappropriately to the content image, ignoring its semantic layout, and ruining the transfer result. In order to reduce or avoid such effects, we propose a novel method based on automatically segmenting the objects and extracting their soft semantic masks from the style and content images, in order to preserve the structure of the content image while having the style transferred. Each soft mask of the style image represents a specific part of the style image, corresponding to the soft mask of the content image with the same semantics. Both the soft masks and source images are provided as multichannel input to an augmented deep CNN framework for style transfer which incorporates a generative Markov random field (MRF) model. Results on various images show that our method outperforms the most recent techniques. |
Tasks | Image Generation, Style Transfer |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09641v1 |
PDF | http://arxiv.org/pdf/1708.09641v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-semantic-style-transfer-using-deep |
Repo | https://github.com/huihuangz/Neural-Style-Transfer-Papers-Code |
Framework | torch |
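The method restricts style statistics to semantically corresponding regions via soft masks. One common way to realize this idea is a mask-weighted Gram matrix per semantic region; the sketch below illustrates that computation and is not the authors' exact MRF-based formulation.

```python
import torch

def masked_gram(features, mask):
    """Gram matrix of CNN features weighted by a soft semantic mask.

    features: (C, H, W) feature map; mask: (H, W) soft mask with values in [0, 1].
    """
    C, H, W = features.shape
    weighted = features * mask.unsqueeze(0)          # apply the soft mask channel-wise
    flat = weighted.reshape(C, H * W)
    return flat @ flat.t() / (mask.sum() + 1e-8)     # normalize by the mask "area"

def soft_mask_style_loss(feat_out, feat_style, masks_out, masks_style):
    """Sum of Gram differences over corresponding semantic regions."""
    loss = 0.0
    for m_out, m_style in zip(masks_out, masks_style):
        loss = loss + torch.mean((masked_gram(feat_out, m_out)
                                  - masked_gram(feat_style, m_style)) ** 2)
    return loss
```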
Separating Style and Content for Generalized Style Transfer
Title | Separating Style and Content for Generalized Style Transfer |
Authors | Yexun Zhang, Ya Zhang, Wenbin Cai, Jie Chang |
Abstract | Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here attempt to separate the representations for styles and contents, and propose a generalized style transfer network consisting of a style encoder, a content encoder, a mixer, and a decoder. The style encoder and content encoder are used to extract the style and content factors from the style reference images and content reference images, respectively. The mixer employs a bilinear model to integrate the two factors and feeds the result into a decoder that generates images with the target style and content. To separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. During training, the encoder network learns to extract styles and contents from two sets of reference images of limited size, one with shared style and the other with shared content. This learning framework allows simultaneous style transfer among multiple styles and can be deemed a special 'multi-task' learning scenario. The encoders are expected to capture the underlying features of different styles and contents, which are generalizable to new styles and contents. For validation, we applied the proposed algorithm to the Chinese typeface transfer problem. Extensive experimental results on character generation have demonstrated the effectiveness and robustness of our method. |
Tasks | Multi-Task Learning, Style Transfer |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06454v6 |
PDF | http://arxiv.org/pdf/1711.06454v6.pdf |
PWC | https://paperswithcode.com/paper/separating-style-and-content-for-generalized |
Repo | https://github.com/ycjing/Character-Stylization |
Framework | none |
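The generalized style transfer network combines a style code and a content code with a bilinear mixer before decoding. A compact sketch of such a bilinear combination layer is below; the dimensions and the commented-out pipeline names (style_encoder, content_encoder, decoder) are hypothetical.

```python
import torch
import torch.nn as nn

class BilinearMixer(nn.Module):
    """Combine a style vector and a content vector with a learned bilinear map."""
    def __init__(self, style_dim=128, content_dim=128, out_dim=256):
        super().__init__()
        # Weight tensor has shape (out_dim, style_dim, content_dim).
        self.bilinear = nn.Bilinear(style_dim, content_dim, out_dim)

    def forward(self, style_code, content_code):
        return self.bilinear(style_code, content_code)

# Hypothetical usage in the overall pipeline:
# mixed = BilinearMixer()(style_encoder(style_refs), content_encoder(content_refs))
# image = decoder(mixed)
```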
Neural Language Modeling by Jointly Learning Syntax and Lexicon
Title | Neural Language Modeling by Jointly Learning Syntax and Lexicon |
Authors | Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville |
Abstract | We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word- and character-level language modeling tasks. |
Tasks | Constituency Grammar Induction, Language Modelling |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.02013v2 |
PDF | http://arxiv.org/pdf/1711.02013v2.pdf |
PWC | https://paperswithcode.com/paper/neural-language-modeling-by-jointly-learning |
Repo | https://github.com/nyu-mll/PRPN-Analysis |
Framework | pytorch |
Task-based End-to-end Model Learning in Stochastic Optimization
Title | Task-based End-to-end Model Learning in Stochastic Optimization |
Authors | Priya L. Donti, Brandon Amos, J. Zico Kolter |
Abstract | With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications. |
Tasks | Stochastic Optimization |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04529v4 |
PDF | http://arxiv.org/pdf/1703.04529v4.pdf |
PWC | https://paperswithcode.com/paper/task-based-end-to-end-model-learning-in |
Repo | https://github.com/locuslab/e2e-model-learning |
Framework | pytorch |
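The paper trains a probabilistic model by back-propagating the downstream task cost through the stochastic-programming solution. As a simplified illustration of the inventory-stock flavor of this idea, the snippet below orders the cost-optimal quantile of a predicted Gaussian demand and pays the realized newsvendor cost on the true demand; the cost coefficients are made up and the closed-form quantile sidesteps the paper's general argmin-differentiation machinery.

```python
import torch

# Hypothetical under-/over-stocking costs for a newsvendor-style problem.
C_UNDER, C_OVER = 5.0, 1.0

def task_loss(pred_mu, pred_sigma, true_demand):
    """End-to-end task loss: order the cost-optimal quantile of the *predicted*
    Gaussian demand, then pay the realized newsvendor cost on the true demand."""
    critical_ratio = C_UNDER / (C_UNDER + C_OVER)
    # Standard-normal quantile: Phi^{-1}(p) = sqrt(2) * erfinv(2p - 1).
    z = torch.erfinv(torch.tensor(2 * critical_ratio - 1)) * (2 ** 0.5)
    order = pred_mu + pred_sigma * z            # differentiable w.r.t. the model outputs
    under = torch.clamp(true_demand - order, min=0.0)
    over = torch.clamp(order - true_demand, min=0.0)
    return (C_UNDER * under + C_OVER * over).mean()
```

Because the decision depends smoothly on the predicted mean and variance, gradients of this task cost flow back into the forecasting model, which is the essence of task-based end-to-end learning.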
Training Deep Networks without Learning Rates Through Coin Betting
Title | Training Deep Networks without Learning Rates Through Coin Betting |
Authors | Francesco Orabona, Tatiana Tommasi |
Abstract | Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms. |
Tasks | Stochastic Optimization |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07795v3 |
PDF | http://arxiv.org/pdf/1705.07795v3.pdf |
PWC | https://paperswithcode.com/paper/training-deep-networks-without-learning-rates |
Repo | https://github.com/bremen79/cocob |
Framework | tf |
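The coin-betting optimizer (COCOB) replaces the learning rate with a per-coordinate betting scheme. The NumPy class below is a sketch of a COCOB-Backprop-style update written from the paper's description; the exact bookkeeping (especially the alpha clipping) should be checked against the authors' reference implementation linked above.

```python
import numpy as np

class CocobSketch:
    """Per-coordinate coin-betting update (COCOB-Backprop style), as a sketch."""
    def __init__(self, w_init, alpha=100.0, eps=1e-8):
        self.w1 = w_init.copy()                 # initial weights (the bettor's reference point)
        self.w = w_init.copy()
        self.L = np.full_like(w_init, eps)      # running maximum |gradient|
        self.G = np.zeros_like(w_init)          # sum of |gradients|
        self.reward = np.zeros_like(w_init)     # accumulated "winnings"
        self.theta = np.zeros_like(w_init)      # sum of negative gradients
        self.alpha = alpha

    def step(self, grad):
        self.L = np.maximum(self.L, np.abs(grad))
        self.G += np.abs(grad)
        self.reward = np.maximum(self.reward - grad * (self.w - self.w1), 0.0)
        self.theta -= grad
        # Bet a fraction of the current "wealth" (L + reward) in the direction of theta.
        beta = self.theta / (self.L * np.maximum(self.G + self.L, self.alpha * self.L))
        self.w = self.w1 + beta * (self.L + self.reward)
        return self.w
```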
Probabilistic Line Searches for Stochastic Optimization
Title | Probabilistic Line Searches for Stochastic Optimization |
Authors | Maren Mahsereci, Philipp Hennig |
Abstract | In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. |
Tasks | Stochastic Optimization |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10034v2 |
PDF | http://arxiv.org/pdf/1703.10034v2.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-line-searches-for-stochastic |
Repo | https://github.com/lessw2020/Best-Deep-Learning-Optimizers |
Framework | pytorch |
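The method keeps a Gaussian-process posterior over the one-dimensional objective along the search direction and accepts a step when a probabilistic analogue of the Wolfe conditions holds with high probability. For reference, the deterministic conditions that the probabilistic belief is built over can be written as follows (the constants c1, c2 are conventional defaults, not necessarily the paper's).

```python
def wolfe_conditions(f0, g0, f_t, g_t, t, c1=1e-4, c2=0.9):
    """Deterministic Wolfe conditions at step size t along a descent direction.

    f0, g0: objective value and directional derivative at t = 0.
    f_t, g_t: objective value and directional derivative at step size t.
    The probabilistic line search evaluates the probability of these two
    inequalities under a GP posterior instead of checking them exactly.
    """
    armijo = f_t <= f0 + c1 * t * g0      # sufficient decrease
    curvature = g_t >= c2 * g0            # sufficient increase of the slope
    return armijo and curvature
```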
Face Super-Resolution Through Wasserstein GANs
Title | Face Super-Resolution Through Wasserstein GANs |
Authors | Zhimin Chen, Yuguang Tong |
Abstract | Generative adversarial networks (GANs) have received a tremendous amount of attention in the past few years, and have inspired applications addressing a wide range of problems. Despite their great potential, GANs are difficult to train. Recently, a series of papers (Arjovsky & Bottou, 2017a; Arjovsky et al., 2017b; Gulrajani et al., 2017) proposed using the Wasserstein distance as the training objective and promised easy, stable GAN training across architectures with minimal hyperparameter tuning. In this paper, we compare the performance of the Wasserstein distance with other training objectives on a variety of GAN architectures in the context of single image super-resolution. Our results agree that Wasserstein GAN with gradient penalty (WGAN-GP) provides stable and converging GAN training and that the Wasserstein distance is an effective metric to gauge training progress. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-05-06 |
URL | http://arxiv.org/abs/1705.02438v1 |
PDF | http://arxiv.org/pdf/1705.02438v1.pdf |
PWC | https://paperswithcode.com/paper/face-super-resolution-through-wasserstein |
Repo | https://github.com/MandyZChen/srez |
Framework | tf |
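The paper's preferred objective is WGAN with gradient penalty (WGAN-GP). A standard PyTorch rendering of the gradient-penalty term, not specific to this paper's architectures, looks like this.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=interp, create_graph=True)
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

The penalty is added to the critic loss; `create_graph=True` keeps the gradient computation differentiable so the penalty itself can be back-propagated.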
A parallel corpus of Python functions and documentation strings for automated code documentation and code generation
Title | A parallel corpus of Python functions and documentation strings for automated code documentation and code generation |
Authors | Antonio Valerio Miceli Barone, Rico Sennrich |
Abstract | Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (“docstrings”) generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained with neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas. |
Tasks | Code Generation, Data Augmentation, Machine Translation |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02275v1 |
PDF | http://arxiv.org/pdf/1707.02275v1.pdf |
PWC | https://paperswithcode.com/paper/a-parallel-corpus-of-python-functions-and |
Repo | https://github.com/Avmb/code-docstring-corpus |
Framework | none |
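The corpus pairs each Python function with its docstring. The released dataset uses its own scraping and tokenization pipeline, but the core extraction step can be approximated with Python's ast module as follows.

```python
import ast

def extract_pairs(source_code):
    """Yield (function source, docstring) pairs from a Python source string."""
    tree = ast.parse(source_code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield ast.unparse(node), doc   # ast.unparse requires Python >= 3.9

example = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
print(list(extract_pairs(example)))
```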
Real-time Deep Video Deinterlacing
Title | Real-time Deep Video Deinterlacing |
Authors | Haichao Zhu, Xueting Liu, Xiangyu Mao, Tien-Tsin Wong |
Abstract | Interlacing is a widely used technique in television broadcast and video recording to double the perceived frame rate without increasing the bandwidth. However, it produces annoying visual artifacts, such as flickering and silhouette “serration,” during playback. Existing state-of-the-art deinterlacing methods either ignore the temporal information to provide real-time performance but lower visual quality, or estimate the motion for better deinterlacing at the cost of higher computation. In this paper, we present the first deep convolutional neural network (DCNN) based method to deinterlace with high visual quality and real-time performance. Unlike existing models for super-resolution problems, which rely on the translation-invariant assumption, our proposed DCNN model utilizes the temporal information from both the odd and even half frames to reconstruct only the missing scanlines, and retains the given odd and even scanlines for producing the full deinterlaced frames. By further introducing a layer-sharable architecture, our system achieves real-time performance on a single GPU. Experiments show that our method outperforms all existing methods in terms of reconstruction accuracy and computational performance. |
Tasks | Super-Resolution, Video Deinterlacing |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00187v1 |
PDF | http://arxiv.org/pdf/1708.00187v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-deep-video-deinterlacing |
Repo | https://github.com/lszhuhaichao/Deep-Video-Deinterlacing |
Framework | tf |
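The key idea is to predict only the missing scanlines from the retained field and then interleave them with the given ones. A schematic NumPy version of that reconstruction step, with a trivial stand-in for the paper's DCNN predictor, is shown below.

```python
import numpy as np

def deinterlace_field(field, predict_missing):
    """Rebuild a full frame from one field: keep the given scanlines and fill
    the missing ones with the predictor's output (any callable works here)."""
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=field.dtype)
    frame[0::2] = field                      # retained (given) scanlines
    frame[1::2] = predict_missing(field)     # predicted missing scanlines
    return frame

# Toy predictor standing in for the paper's DCNN: simple vertical averaging.
def naive_predictor(field):
    shifted = np.vstack([field[1:], field[-1:]])
    return (field + shifted) / 2.0
```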