Paper Group AWR 134
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning
Title | A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning |
Authors | Jingjia Huang, Nannan Li, Tao Zhang, Ge Li |
Abstract | Existing action detection algorithms usually generate action proposals through an extensive search over the video at multiple temporal scales, which brings about huge computational overhead and deviates from the human perception procedure. We argue that the process of detecting actions should naturally be one of observation and refinement: observe the current window and refine the span of the attended window to cover true action regions. In this paper, we propose an active action proposal model that learns to find actions by continuously adjusting the temporal bounds in a self-adaptive way. The whole process can be deemed as an agent that is first placed at a random position in the video and then applies a sequence of transformations to the currently attended region to discover actions according to a learned policy. We utilize reinforcement learning, specifically the Deep Q-learning algorithm, to learn the agent’s decision policy. In addition, we use a temporal pooling operation to extract a more effective feature representation for the long temporal window, and design a regression network to adjust the position offsets between predicted results and the ground truth. Experimental results on THUMOS 2014 validate the effectiveness of the proposed approach, which achieves competitive performance with current action detection algorithms using far fewer proposals. |
Tasks | Action Detection, Q-Learning |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07251v1 |
PDF | http://arxiv.org/pdf/1706.07251v1.pdf |
PWC | https://paperswithcode.com/paper/a-self-adaptive-proposal-model-for-temporal |
Repo | https://github.com/Parapompadoo/Temporal_Action_Detection |
Framework | none |
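The abstract above describes an agent that repeatedly transforms a temporal window under a Deep Q-learning policy. Below is a minimal, illustrative sketch of such an agent; the action set (shift/expand/shrink/terminate), the pooled-feature input, and the IoU-based reward are assumptions for illustration, not the authors' exact design.

```python
import random
import torch
import torch.nn as nn

# Hypothetical action set for adjusting a temporal window [start, end].
ACTIONS = ["shift_left", "shift_right", "expand", "shrink", "terminate"]

class QNet(nn.Module):
    """Maps pooled window features to one Q-value per action."""
    def __init__(self, feat_dim=512, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def temporal_iou(a, b):
    """IoU of two temporal segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def select_action(qnet, feat, epsilon=0.1):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(qnet(feat).argmax())
```

In a full training loop the agent would receive a reward based on how much each transformation improves the temporal IoU with the ground-truth segment, and the Q-network would be trained with the usual DQN targets.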
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Title | CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification |
Authors | Farshid Rayhan, Sajid Ahmed, Asif Mahbub, Md. Rafsan Jani, Swakkhar Shatabda, Dewan Md. Farid |
Abstract | Class imbalance classification is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, in real-life applications the minority class instances represent the concept of greater interest than the majority class instances. Recently, several techniques based on sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning have been used in the literature for classifying imbalanced datasets. In this paper, we introduce a new clustering-based under-sampling approach combined with boosting (AdaBoost), called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We evaluated the performance of the CUSBoost algorithm against state-of-the-art ensemble methods such as AdaBoost, RUSBoost, and SMOTEBoost on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04356v1 |
PDF | http://arxiv.org/pdf/1712.04356v1.pdf |
PWC | https://paperswithcode.com/paper/cusboost-cluster-based-under-sampling-with |
Repo | https://github.com/farshidrayhanuiu/CUSBoost |
Framework | none |
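CUSBoost clusters the majority class and under-samples within each cluster before boosting. A rough scikit-learn sketch of that idea (KMeans plus AdaBoost) is shown below; the cluster count and per-cluster sampling ratio are illustrative choices, not the authors' exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def cluster_undersample(X_maj, n_clusters=5, keep_ratio=0.5, seed=0):
    """Cluster the majority class and keep a fraction of each cluster."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        n_keep = max(1, int(len(idx) * keep_ratio))
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    return X_maj[np.concatenate(kept)]

def cusboost_fit(X, y, majority_label=0, **kwargs):
    """Under-sample the majority class cluster-wise, then fit AdaBoost."""
    X_maj, X_min = X[y == majority_label], X[y != majority_label]
    y_min = y[y != majority_label]
    X_maj_s = cluster_undersample(X_maj, **kwargs)
    X_bal = np.vstack([X_maj_s, X_min])
    y_bal = np.concatenate([np.full(len(X_maj_s), majority_label), y_min])
    return AdaBoostClassifier(n_estimators=100).fit(X_bal, y_bal)
```

Sampling within clusters (rather than uniformly at random, as in RUSBoost) aims to keep representatives from every region of the majority class.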
DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks
Title | DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks |
Authors | Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, Jiri Matas |
Abstract | We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and a content loss. DeblurGAN achieves state-of-the-art performance in both the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem – object detection on (de-)blurred images. The method is 5 times faster than the closest competitor – DeepDeblur. We also introduce a novel method for generating synthetic motion-blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN |
Tasks | Deblurring, Object Detection |
Published | 2017-11-19 |
URL | http://arxiv.org/abs/1711.07064v4 |
PDF | http://arxiv.org/pdf/1711.07064v4.pdf |
PWC | https://paperswithcode.com/paper/deblurgan-blind-motion-deblurring-using |
Repo | https://github.com/KupynOrest/DeblurGAN |
Framework | pytorch |
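DeblurGAN trains the generator with a content loss plus a conditional adversarial loss. The snippet below is only an approximation of that objective: a VGG-based perceptual content term combined with a WGAN-style critic score; the layer cut-off and the weighting factor are arbitrary illustrative choices, not the paper's exact values.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor used for a perceptual "content" loss.
# (torchvision >= 0.13 weights API; older versions use pretrained=True.)
vgg_features = vgg19(weights="DEFAULT").features[:15].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def generator_loss(critic, generator, blurred, sharp, content_weight=100.0):
    """Content (perceptual) loss + adversarial loss for the deblurring generator."""
    restored = generator(blurred)
    content = F.mse_loss(vgg_features(restored), vgg_features(sharp))
    adversarial = -critic(restored).mean()   # WGAN-style critic score
    return content_weight * content + adversarial
```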
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Title | Mapping Instructions and Visual Observations to Actions with Reinforcement Learning |
Authors | Dipendra Misra, John Langford, Yoav Artzi |
Abstract | We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network agent. To guide the agent’s exploration, we use reward shaping with different forms of supervision. Our approach does not require intermediate representations, planning procedures, or training different models. We evaluate in a simulated environment, and show significant improvements over supervised learning and common reinforcement learning variants. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08795v2 |
PDF | http://arxiv.org/pdf/1704.08795v2.pdf |
PWC | https://paperswithcode.com/paper/mapping-instructions-and-visual-observations |
Repo | https://github.com/clic-lab/blocks |
Framework | tf |
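The paper trains a neural agent with reinforcement learning in a contextual bandit setting, guided by reward shaping. A toy sketch of a single shaped-reward policy-gradient update is given below; the potential-based shaping term, feature size, and action count are illustrative assumptions rather than the authors' setup.

```python
import torch
import torch.nn as nn

# Toy policy over 10 actions given a 128-dimensional joint text+vision feature.
policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def shaped_reward(task_reward, potential_prev, potential_next, gamma=1.0):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s)."""
    return task_reward + gamma * potential_next - potential_prev

def bandit_update(state_feat, action, reward):
    """One contextual-bandit policy-gradient step on a single observed action."""
    logits = policy(state_feat)
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    loss = -reward * log_prob          # REINFORCE-style objective, no bootstrapping
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the contextual bandit setting, each action's reward is used immediately, without propagating value estimates across a trajectory.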
Lectures on Randomized Numerical Linear Algebra
Title | Lectures on Randomized Numerical Linear Algebra |
Authors | Petros Drineas, Michael W. Mahoney |
Abstract | This chapter is based on lectures on Randomized Numerical Linear Algebra from the 2016 Park City Mathematics Institute summer school on The Mathematics of Data. |
Tasks | |
Published | 2017-12-24 |
URL | http://arxiv.org/abs/1712.08880v1 |
PDF | http://arxiv.org/pdf/1712.08880v1.pdf |
PWC | https://paperswithcode.com/paper/lectures-on-randomized-numerical-linear |
Repo | https://github.com/bkmi/RandLowRank |
Framework | none |
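These lectures cover randomized sketching methods for matrix problems. As a small worked example of the flavor of RandNLA (not taken from the notes themselves), here is a standard randomized range-finder for low-rank approximation in NumPy.

```python
import numpy as np

def randomized_low_rank(A, rank, oversample=10, seed=0):
    """Randomized range finder: sketch with a random test matrix, then small SVD."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                        # orthonormal basis for range(A @ Omega)
    B = Q.T @ A                                           # small (rank+oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Example: approximate a 1000 x 500 matrix with a rank-20 factorization.
A = np.random.randn(1000, 500)
U, s, Vt = randomized_low_rank(A, rank=20)
```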
Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal
Title | Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal |
Authors | Jifeng Wang, Xiang Li, Le Hui, Jian Yang |
Abstract | In previous studies, understanding shadows from a single image naturally splits into two types of task: shadow detection and shadow removal. In this paper, we present a multi-task perspective, not embraced by any existing work, to jointly learn both detection and removal in an end-to-end fashion, so that the two tasks mutually benefit from each other. Our framework is based on a novel STacked Conditional Generative Adversarial Network (ST-CGAN), which is composed of two stacked CGANs, each with a generator and a discriminator. Specifically, a shadow image is fed into the first generator, which produces a shadow detection mask. That shadow image, concatenated with its predicted mask, is then fed into the second generator to recover the corresponding shadow-free image. In addition, the two corresponding discriminators model higher-level relationships and global scene characteristics for the detected shadow region and for the reconstruction obtained by removing shadows, respectively. More importantly, for multi-task learning, our stacked design provides a novel perspective that differs notably from the commonly used multi-branch version. To fully evaluate the performance of our proposed framework, we construct the first large-scale benchmark with 1870 image triplets (shadow image, shadow mask image, and shadow-free image) under 135 scenes. Extensive experimental results consistently show the advantages of ST-CGAN over several representative state-of-the-art methods on two large-scale publicly available datasets and our newly released one. |
Tasks | Multi-Task Learning, Shadow Detection |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02478v1 |
PDF | http://arxiv.org/pdf/1712.02478v1.pdf |
PWC | https://paperswithcode.com/paper/stacked-conditional-generative-adversarial |
Repo | https://github.com/kjybinp/SCGAN |
Framework | none |
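The ST-CGAN pipeline stacks two conditional generators: the first predicts a shadow mask from the shadow image, and the second takes the image concatenated with the predicted mask and outputs a shadow-free image. A schematic PyTorch forward pass with placeholder generator architectures (the real generators are much deeper encoder-decoders) might look like this.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Placeholder encoder-decoder; a stand-in for the paper's generators."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

g_detect = TinyGenerator(in_ch=3, out_ch=1)   # shadow image -> shadow mask
g_remove = TinyGenerator(in_ch=4, out_ch=3)   # image + mask -> shadow-free image

def stacked_forward(shadow_img):
    mask = g_detect(shadow_img)                   # stage 1: detection
    x = torch.cat([shadow_img, mask], dim=1)      # condition stage 2 on the predicted mask
    shadow_free = g_remove(x)                     # stage 2: removal
    return mask, shadow_free
```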
Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks
Title | Automatic Semantic Style Transfer using Deep Convolutional Neural Networks and Soft Masks |
Authors | Huihuang Zhao, Paul L. Rosin, Yu-Kun Lai |
Abstract | This paper presents an automatic image synthesis method to transfer the style of an example image to a content image. When standard neural style transfer approaches are used, the textures and colours in different semantic regions of the style image are often applied inappropriately to the content image, ignoring its semantic layout, and ruining the transfer result. In order to reduce or avoid such effects, we propose a novel method based on automatically segmenting the objects and extracting their soft semantic masks from the style and content images, in order to preserve the structure of the content image while having the style transferred. Each soft mask of the style image represents a specific part of the style image, corresponding to the soft mask of the content image with the same semantics. Both the soft masks and source images are provided as multichannel input to an augmented deep CNN framework for style transfer which incorporates a generative Markov random field (MRF) model. Results on various images show that our method outperforms the most recent techniques. |
Tasks | Image Generation, Style Transfer |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09641v1 |
PDF | http://arxiv.org/pdf/1708.09641v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-semantic-style-transfer-using-deep |
Repo | https://github.com/huihuangz/Neural-Style-Transfer-Papers-Code |
Framework | torch |
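The method restricts style statistics to semantically corresponding regions via soft masks. One common way to realize this idea is a mask-weighted Gram matrix per semantic region; the sketch below illustrates that computation and is not the authors' exact MRF-based formulation.

```python
import torch

def masked_gram(features, mask):
    """Gram matrix of CNN features weighted by a soft semantic mask.

    features: (C, H, W) feature map; mask: (H, W) soft mask with values in [0, 1].
    """
    C, H, W = features.shape
    weighted = features * mask.unsqueeze(0)          # apply the soft mask channel-wise
    flat = weighted.reshape(C, H * W)
    return flat @ flat.t() / (mask.sum() + 1e-8)     # normalize by the mask "area"

def soft_mask_style_loss(feat_out, feat_style, masks_out, masks_style):
    """Sum of Gram differences over corresponding semantic regions."""
    loss = 0.0
    for m_out, m_style in zip(masks_out, masks_style):
        loss = loss + torch.mean((masked_gram(feat_out, m_out)
                                  - masked_gram(feat_style, m_style)) ** 2)
    return loss
```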
Separating Style and Content for Generalized Style Transfer
Title | Separating Style and Content for Generalized Style Transfer |
Authors | Yexun Zhang, Ya Zhang, Wenbin Cai, Jie Chang |
Abstract | Neural style transfer has drawn broad attention in recent years. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is thus not generalizable to new styles. We here attempt to separate the representations for styles and contents, and propose a generalized style transfer network consisting of a style encoder, a content encoder, a mixer, and a decoder. The style encoder and content encoder are used to extract the style and content factors from the style reference images and content reference images, respectively. The mixer employs a bilinear model to integrate the two factors and feeds the result into a decoder that generates images with the target style and content. To separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. During training, the encoder network learns to extract styles and contents from two sets of reference images of limited size, one with shared style and the other with shared content. This learning framework allows simultaneous style transfer among multiple styles and can be deemed a special 'multi-task' learning scenario. The encoders are expected to capture the underlying features of different styles and contents, which are generalizable to new styles and contents. For validation, we applied the proposed algorithm to the Chinese typeface transfer problem. Extensive experimental results on character generation have demonstrated the effectiveness and robustness of our method. |
Tasks | Multi-Task Learning, Style Transfer |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06454v6 |
PDF | http://arxiv.org/pdf/1711.06454v6.pdf |
PWC | https://paperswithcode.com/paper/separating-style-and-content-for-generalized |
Repo | https://github.com/ycjing/Character-Stylization |
Framework | none |
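The generalized style transfer network combines a style code and a content code with a bilinear mixer before decoding. A compact sketch of such a bilinear combination layer is below; the dimensions and the commented-out pipeline names (style_encoder, content_encoder, decoder) are hypothetical.

```python
import torch
import torch.nn as nn

class BilinearMixer(nn.Module):
    """Combine a style vector and a content vector with a learned bilinear map."""
    def __init__(self, style_dim=128, content_dim=128, out_dim=256):
        super().__init__()
        # Weight tensor has shape (out_dim, style_dim, content_dim).
        self.bilinear = nn.Bilinear(style_dim, content_dim, out_dim)

    def forward(self, style_code, content_code):
        return self.bilinear(style_code, content_code)

# Hypothetical usage in the overall pipeline:
# mixed = BilinearMixer()(style_encoder(style_refs), content_encoder(content_refs))
# image = decoder(mixed)
```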
Neural Language Modeling by Jointly Learning Syntax and Lexicon
Title | Neural Language Modeling by Jointly Learning Syntax and Lexicon |
Authors | Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville |
Abstract | We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word- and character-level language modeling tasks. |
Tasks | Constituency Grammar Induction, Language Modelling |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.02013v2 |
PDF | http://arxiv.org/pdf/1711.02013v2.pdf |
PWC | https://paperswithcode.com/paper/neural-language-modeling-by-jointly-learning |
Repo | https://github.com/nyu-mll/PRPN-Analysis |
Framework | pytorch |
Task-based End-to-end Model Learning in Stochastic Optimization
Title | Task-based End-to-end Model Learning in Stochastic Optimization |
Authors | Priya L. Donti, Brandon Amos, J. Zico Kolter |
Abstract | With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications. |
Tasks | Stochastic Optimization |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04529v4 |
PDF | http://arxiv.org/pdf/1703.04529v4.pdf |
PWC | https://paperswithcode.com/paper/task-based-end-to-end-model-learning-in |
Repo | https://github.com/locuslab/e2e-model-learning |
Framework | pytorch |
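The paper trains a probabilistic model by back-propagating the downstream task cost through the stochastic-programming solution. As a simplified illustration of the inventory-stock flavor of this idea, the snippet below orders the cost-optimal quantile of a predicted Gaussian demand and pays the realized newsvendor cost on the true demand; the cost coefficients are made up and the closed-form quantile sidesteps the paper's general argmin-differentiation machinery.

```python
import torch

# Hypothetical under-/over-stocking costs for a newsvendor-style problem.
C_UNDER, C_OVER = 5.0, 1.0

def task_loss(pred_mu, pred_sigma, true_demand):
    """End-to-end task loss: order the cost-optimal quantile of the *predicted*
    Gaussian demand, then pay the realized newsvendor cost on the true demand."""
    critical_ratio = C_UNDER / (C_UNDER + C_OVER)
    # Standard-normal quantile: Phi^{-1}(p) = sqrt(2) * erfinv(2p - 1).
    z = torch.erfinv(torch.tensor(2 * critical_ratio - 1)) * (2 ** 0.5)
    order = pred_mu + pred_sigma * z            # differentiable w.r.t. the model outputs
    under = torch.clamp(true_demand - order, min=0.0)
    over = torch.clamp(order - true_demand, min=0.0)
    return (C_UNDER * under + C_OVER * over).mean()
```

Because the decision depends smoothly on the predicted mean and variance, gradients of this task cost flow back into the forecasting model, which is the essence of task-based end-to-end learning.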
Training Deep Networks without Learning Rates Through Coin Betting
Title | Training Deep Networks without Learning Rates Through Coin Betting |
Authors | Francesco Orabona, Tatiana Tommasi |
Abstract | Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameter tuning in order to achieve the best results. In particular, tuning the learning rates in the stochastic optimization process is still one of the main bottlenecks. In this paper, we propose a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting. Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm over popular stochastic gradient algorithms. |
Tasks | Stochastic Optimization |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07795v3 |
PDF | http://arxiv.org/pdf/1705.07795v3.pdf |
PWC | https://paperswithcode.com/paper/training-deep-networks-without-learning-rates |
Repo | https://github.com/bremen79/cocob |
Framework | tf |
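The coin-betting optimizer (COCOB) replaces the learning rate with a per-coordinate betting scheme. The NumPy class below is a sketch of a COCOB-Backprop-style update written from the paper's description; the exact bookkeeping (especially the alpha clipping) should be checked against the authors' reference implementation linked above.

```python
import numpy as np

class CocobSketch:
    """Per-coordinate coin-betting update (COCOB-Backprop style), as a sketch."""
    def __init__(self, w_init, alpha=100.0, eps=1e-8):
        self.w1 = w_init.copy()                 # initial weights (the bettor's reference point)
        self.w = w_init.copy()
        self.L = np.full_like(w_init, eps)      # running maximum |gradient|
        self.G = np.zeros_like(w_init)          # sum of |gradients|
        self.reward = np.zeros_like(w_init)     # accumulated "winnings"
        self.theta = np.zeros_like(w_init)      # sum of negative gradients
        self.alpha = alpha

    def step(self, grad):
        self.L = np.maximum(self.L, np.abs(grad))
        self.G += np.abs(grad)
        self.reward = np.maximum(self.reward - grad * (self.w - self.w1), 0.0)
        self.theta -= grad
        # Bet a fraction of the current "wealth" (L + reward) in the direction of theta.
        beta = self.theta / (self.L * np.maximum(self.G + self.L, self.alpha * self.L))
        self.w = self.w1 + beta * (self.L + self.reward)
        return self.w
```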
Probabilistic Line Searches for Stochastic Optimization
Title | Probabilistic Line Searches for Stochastic Optimization |
Authors | Maren Mahsereci, Philipp Hennig |
Abstract | In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. |
Tasks | Stochastic Optimization |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10034v2 |
PDF | http://arxiv.org/pdf/1703.10034v2.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-line-searches-for-stochastic |
Repo | https://github.com/lessw2020/Best-Deep-Learning-Optimizers |
Framework | pytorch |
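The method keeps a Gaussian-process posterior over the one-dimensional objective along the search direction and accepts a step when a probabilistic analogue of the Wolfe conditions holds with high probability. For reference, the deterministic conditions that the probabilistic belief is built over can be written as follows (the constants c1, c2 are conventional defaults, not necessarily the paper's).

```python
def wolfe_conditions(f0, g0, f_t, g_t, t, c1=1e-4, c2=0.9):
    """Deterministic Wolfe conditions at step size t along a descent direction.

    f0, g0: objective value and directional derivative at t = 0.
    f_t, g_t: objective value and directional derivative at step size t.
    The probabilistic line search evaluates the probability of these two
    inequalities under a GP posterior instead of checking them exactly.
    """
    armijo = f_t <= f0 + c1 * t * g0      # sufficient decrease
    curvature = g_t >= c2 * g0            # sufficient increase of the slope
    return armijo and curvature
```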
Face Super-Resolution Through Wasserstein GANs
Title | Face Super-Resolution Through Wasserstein GANs |
Authors | Zhimin Chen, Yuguang Tong |
Abstract | Generative adversarial networks (GANs) have received a tremendous amount of attention in the past few years, and have inspired applications addressing a wide range of problems. Despite their great potential, GANs are difficult to train. Recently, a series of papers (Arjovsky & Bottou, 2017a; Arjovsky et al., 2017b; Gulrajani et al., 2017) proposed using the Wasserstein distance as the training objective and promised easy, stable GAN training across architectures with minimal hyperparameter tuning. In this paper, we compare the performance of the Wasserstein distance with other training objectives on a variety of GAN architectures in the context of single image super-resolution. Our results agree that Wasserstein GAN with gradient penalty (WGAN-GP) provides stable and converging GAN training and that the Wasserstein distance is an effective metric to gauge training progress. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-05-06 |
URL | http://arxiv.org/abs/1705.02438v1 |
PDF | http://arxiv.org/pdf/1705.02438v1.pdf |
PWC | https://paperswithcode.com/paper/face-super-resolution-through-wasserstein |
Repo | https://github.com/MandyZChen/srez |
Framework | tf |
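The paper's preferred objective is WGAN with gradient penalty (WGAN-GP). A standard PyTorch rendering of the gradient-penalty term, not specific to this paper's architectures, looks like this.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=interp, create_graph=True)
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```

The penalty is added to the critic loss; `create_graph=True` keeps the gradient computation differentiable so the penalty itself can be back-propagated.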
A parallel corpus of Python functions and documentation strings for automated code documentation and code generation
Title | A parallel corpus of Python functions and documentation strings for automated code documentation and code generation |
Authors | Antonio Valerio Miceli Barone, Rico Sennrich |
Abstract | Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest. Progress in these areas has been limited by the low availability of parallel corpora of code and natural language descriptions, which tend to be small and constrained to specific domains. In this work we introduce a large and diverse parallel corpus of a hundred thousand Python functions with their documentation strings (“docstrings”) generated by scraping open source repositories on GitHub. We describe baseline results for the code documentation and code generation tasks obtained with neural machine translation. We also experiment with data augmentation techniques to further increase the amount of training data. We release our datasets and processing scripts in order to stimulate research in these areas. |
Tasks | Code Generation, Data Augmentation, Machine Translation |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02275v1 |
PDF | http://arxiv.org/pdf/1707.02275v1.pdf |
PWC | https://paperswithcode.com/paper/a-parallel-corpus-of-python-functions-and |
Repo | https://github.com/Avmb/code-docstring-corpus |
Framework | none |
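The corpus pairs each Python function with its docstring. The released dataset uses its own scraping and tokenization pipeline, but the core extraction step can be approximated with Python's ast module as follows.

```python
import ast

def extract_pairs(source_code):
    """Yield (function source, docstring) pairs from a Python source string."""
    tree = ast.parse(source_code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield ast.unparse(node), doc   # ast.unparse requires Python >= 3.9

example = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b
'''
print(list(extract_pairs(example)))
```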
Real-time Deep Video Deinterlacing
Title | Real-time Deep Video Deinterlacing |
Authors | Haichao Zhu, Xueting Liu, Xiangyu Mao, Tien-Tsin Wong |
Abstract | Interlacing is a widely used technique in television broadcast and video recording to double the perceived frame rate without increasing the bandwidth. However, it produces annoying visual artifacts, such as flickering and silhouette “serration,” during playback. Existing state-of-the-art deinterlacing methods either ignore the temporal information to provide real-time performance but lower visual quality, or estimate the motion for better deinterlacing at the cost of higher computation. In this paper, we present the first deep convolutional neural network (DCNN) based method to deinterlace with high visual quality and real-time performance. Unlike existing models for super-resolution problems, which rely on the translation-invariant assumption, our proposed DCNN model utilizes the temporal information from both the odd and even half frames to reconstruct only the missing scanlines, and retains the given odd and even scanlines for producing the full deinterlaced frames. By further introducing a layer-sharable architecture, our system achieves real-time performance on a single GPU. Experiments show that our method outperforms all existing methods in terms of reconstruction accuracy and computational performance. |
Tasks | Super-Resolution, Video Deinterlacing |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00187v1 |
PDF | http://arxiv.org/pdf/1708.00187v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-deep-video-deinterlacing |
Repo | https://github.com/lszhuhaichao/Deep-Video-Deinterlacing |
Framework | tf |
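The key idea is to predict only the missing scanlines from the retained field and then interleave them with the given ones. A schematic NumPy version of that reconstruction step, with a trivial stand-in for the paper's DCNN predictor, is shown below.

```python
import numpy as np

def deinterlace_field(field, predict_missing):
    """Rebuild a full frame from one field: keep the given scanlines and fill
    the missing ones with the predictor's output (any callable works here)."""
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=field.dtype)
    frame[0::2] = field                      # retained (given) scanlines
    frame[1::2] = predict_missing(field)     # predicted missing scanlines
    return frame

# Toy predictor standing in for the paper's DCNN: simple vertical averaging.
def naive_predictor(field):
    shifted = np.vstack([field[1:], field[-1:]])
    return (field + shifted) / 2.0
```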