February 1, 2020

Paper Group AWR 288

Label Propagation for Deep Semi-supervised Learning. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Fully Decoupled Neural Network Learning Using Delayed Gradients. Residual Flows for Invertible Generative Modeling. Embarrassingly Simple Binary Representation Learning. Latent-Variable Non-Autoregressive Neural Machine Transl …

Label Propagation for Deep Semi-supervised Learning

Title Label Propagation for Deep Semi-supervised Learning
Authors Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum
Abstract Semi-supervised learning is becoming increasingly important because it can combine data carefully labeled by humans with abundant unlabeled data to train deep neural networks. Classic methods on semi-supervised learning that have focused on transductive learning have not been fully exploited in the inductive framework followed by modern deep learning. The same holds for the manifold assumption, that similar examples should get the same prediction. In this work, we employ a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network. At the core of the transductive method lies a nearest neighbor graph of the dataset that we create based on the embeddings of the same network. Therefore, our learning process iterates between these two steps. We improve performance on several datasets, especially in the few-labels regime, and show that our work is complementary to the current state of the art.
Tasks
Published 2019-04-09
URL http://arxiv.org/abs/1904.04717v1
PDF http://arxiv.org/pdf/1904.04717v1.pdf
PWC https://paperswithcode.com/paper/label-propagation-for-deep-semi-supervised
Repo https://github.com/kleinzcy/Semi-supervised-Learning
Framework none
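
Note: the propagation step described in the abstract can be illustrated with a small NumPy sketch. It assumes a classic diffusion-style label propagation (solving (I - alpha W) Z = Y on a symmetrically normalized kNN graph); the function names, the cosine-similarity affinity, and the values of k and alpha are illustrative assumptions, not taken from the paper or the linked repository.

```python
import numpy as np

def knn_affinity(embeddings, k=2):
    """Symmetric kNN affinity matrix from non-negative cosine similarity."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)                      # no self-loops
    n = sim.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]       # k most similar points per row
    rows = np.repeat(np.arange(n), k)
    affinity = np.zeros_like(sim)
    affinity[rows, nearest.ravel()] = sim[rows, nearest.ravel()]
    return affinity + affinity.T                    # symmetrize

def propagate_labels(affinity, labels, num_classes, alpha=0.99):
    """Diffuse known labels over the graph; labels == -1 marks unlabeled points."""
    n = affinity.shape[0]
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    w = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2
    y = np.zeros((n, num_classes))
    labeled = labels >= 0
    y[labeled, labels[labeled]] = 1.0               # one-hot rows for labeled points
    z = np.linalg.solve(np.eye(n) - alpha * w, y)   # closed-form diffusion
    return z.argmax(axis=1)                         # pseudo-labels for every point
```

In the iterative scheme the abstract describes, the embeddings would come from the current network, and the resulting pseudo-labels would be used to train it further before the graph is rebuilt.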

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Title Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Authors Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
Abstract Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
Tasks Atari Games, Game of Chess, Game of Go, Game of Shogi
Published 2019-11-19
URL https://arxiv.org/abs/1911.08265v2
PDF https://arxiv.org/pdf/1911.08265v2.pdf
PWC https://paperswithcode.com/paper/mastering-atari-go-chess-and-shogi-by
Repo https://github.com/johan-gras/MuZero
Framework tf

Fully Decoupled Neural Network Learning Using Delayed Gradients

Title Fully Decoupled Neural Network Learning Using Delayed Gradients
Authors Huiping Zhuang, Yi Wang, Qinglai Liu, Shuai Zhang, Zhiping Lin
Abstract Training neural networks with back-propagation (BP) requires a sequential passing of activations and gradients, which forces the network modules to work in a synchronous fashion. This has been recognized as the lockings (i.e., the forward, backward and update lockings) inherited from the BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and asynchronously using different workers (e.g., GPUs). We also introduce a gradient shrinking process to reduce the stale gradient effect caused by the delayed gradients. In addition, we prove that the proposed FDG algorithm guarantees a statistical convergence during training. Experiments are conducted by training deep convolutional neural networks to perform classification tasks on benchmark datasets, showing comparable or better results against the state-of-the-art methods as well as the BP in terms of both generalization and acceleration abilities. In particular, we show that the FDG is also able to train very wide networks (e.g., WRN-28-10) and extremely deep networks (e.g., ResNet-1202). Code is available at https://github.com/ZHUANGHP/FDG.
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.09108v3
PDF https://arxiv.org/pdf/1906.09108v3.pdf
PWC https://paperswithcode.com/paper/fully-decoupled-neural-network-learning-using
Repo https://github.com/ZHUANGHP/FDG
Framework pytorch

Residual Flows for Invertible Generative Modeling

Title Residual Flows for Invertible Generative Modeling
Authors Ricky T. Q. Chen, Jens Behrmann, David Duvenaud, Jörn-Henrik Jacobsen
Abstract Flow-based generative models parameterize probability distributions through an invertible transformation and can be trained by maximum likelihood. Invertible residual networks provide a flexible family of transformations where only Lipschitz conditions rather than strict architectural constraints are needed for enforcing invertibility. However, prior work trained invertible residual networks for density estimation by relying on biased log-density estimates whose bias increased with the network’s expressiveness. We give a tractable unbiased estimate of the log density using a “Russian roulette” estimator, and reduce the memory required during training by using an alternative infinite series for the gradient. Furthermore, we improve invertible residual blocks by proposing the use of activation functions that avoid derivative saturation and generalizing the Lipschitz condition to induced mixed norms. The resulting approach, called Residual Flows, achieves state-of-the-art performance on density estimation amongst flow-based models, and outperforms networks that use coupling blocks at joint generative and discriminative modeling.
Tasks Density Estimation, Image Generation
Published 2019-06-06
URL https://arxiv.org/abs/1906.02735v5
PDF https://arxiv.org/pdf/1906.02735v5.pdf
PWC https://paperswithcode.com/paper/residual-flows-for-invertible-generative
Repo https://github.com/rtqichen/residual-flows
Framework pytorch
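
Note: the "Russian roulette" idea mentioned in the abstract, an unbiased estimate of an infinite series from a random finite number of terms, can be shown on a scalar analogue. The sketch below estimates log(1 + x) from its power series; the function name, the geometric stopping distribution, and the stop probability are assumptions chosen for illustration, and the paper applies the idea to a series of Jacobian traces rather than to scalars.

```python
import math
import numpy as np

def roulette_log1p(x, rng, stop_prob=0.5):
    """One unbiased Russian-roulette sample of log(1 + x), for |x| < 1."""
    # Random truncation point N with P(N = n) = (1 - stop_prob)^(n - 1) * stop_prob.
    n_terms = rng.geometric(stop_prob)
    estimate = 0.0
    for k in range(1, n_terms + 1):
        term = (-1) ** (k + 1) * x ** k / k          # k-th term of the log(1 + x) series
        survival = (1.0 - stop_prob) ** (k - 1)      # P(N >= k)
        estimate += term / survival                  # reweight so the expectation is exact
    return estimate

rng = np.random.default_rng(0)
samples = [roulette_log1p(0.5, rng) for _ in range(20000)]
print(np.mean(samples), math.log(1.5))
```

Each sample evaluates only a handful of terms, yet dividing the k-th term by the probability of reaching it keeps the expectation equal to the full series. A fixed truncation would instead introduce a bias.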

Embarrassingly Simple Binary Representation Learning

Title Embarrassingly Simple Binary Representation Learning
Authors Yuming Shen, Jie Qin, Jiaxin Chen, Li Liu, Fan Zhu
Abstract Recent binary representation learning models usually require sophisticated binary optimization, similarity measure or even generative models as auxiliaries. However, one may wonder whether these non-trivial components are needed to formulate practical and effective hashing models. In this paper, we answer the above question by proposing an embarrassingly simple approach to binary representation learning. With a simple classification objective, our model only incorporates two additional fully-connected layers onto the top of an arbitrary backbone network, whilst complying with the binary constraints during training. The proposed model lower-bounds the Information Bottleneck (IB) between data samples and their semantics, and can be related to many recent ‘learning to hash’ paradigms. We show that, when properly designed, even such a simple network can generate effective binary codes, by fully exploring data semantics without any held-out alternating updating steps or auxiliary models. Experiments are conducted on conventional large-scale benchmarks, i.e., CIFAR-10, NUS-WIDE, and ImageNet, where the proposed simple model outperforms the state-of-the-art methods.
Tasks Representation Learning
Published 2019-08-26
URL https://arxiv.org/abs/1908.09573v1
PDF https://arxiv.org/pdf/1908.09573v1.pdf
PWC https://paperswithcode.com/paper/embarrassingly-simple-binary-representation
Repo https://github.com/ymcidence/JMLH
Framework none

Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior

Title Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior
Authors Raphael Shu, Jason Lee, Hideki Nakayama, Kyunghyun Cho
Abstract Although neural machine translation models reached high translation quality, the autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lowerbound to the log-probability. During inference, the length of translation automatically adapts itself. Our experiments show that the lowerbound can be greatly increased by running the inference algorithm, resulting in significantly improved translation quality. Our proposed model closes the performance gap between non-autoregressive and autoregressive approaches on ASPEC Ja-En dataset with 8.6x faster decoding. On WMT’14 En-De dataset, our model narrows the gap with autoregressive baseline to 2.0 BLEU points with 12.5x speedup. By decoding multiple initial latent variables in parallel and rescore using a teacher model, the proposed model further brings the gap down to 1.0 BLEU point on WMT’14 En-De task with 6.8x speedup.
Tasks Machine Translation
Published 2019-08-20
URL https://arxiv.org/abs/1908.07181v5
PDF https://arxiv.org/pdf/1908.07181v5.pdf
PWC https://paperswithcode.com/paper/latent-variable-non-autoregressive-neural
Repo https://github.com/zomux/lanmt
Framework pytorch

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

Title Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Authors Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro
Abstract Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice. Unlike other methods, we train Mellotron using only read speech data without alignments between text and audio. We evaluate our models using the LJSpeech and LibriTTS datasets. We provide F0 Frame Errors and synthesized samples that include style transfer from other speakers, singers and styles not seen during training, procedural manipulation of rhythm and pitch and choir synthesis.
Tasks Style Transfer
Published 2019-10-26
URL https://arxiv.org/abs/1910.11997v1
PDF https://arxiv.org/pdf/1910.11997v1.pdf
PWC https://paperswithcode.com/paper/mellotron-multispeaker-expressive-voice
Repo https://github.com/NVIDIA/mellotron
Framework pytorch

Deep Neural Network Ensembles for Time Series Classification

Title Deep Neural Network Ensembles for Time Series Classification
Authors Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller
Abstract Deep neural networks have revolutionized many fields such as computer vision and natural language processing. Inspired by this recent success, deep learning started to show promising results for Time Series Classification (TSC). However, neural networks are still behind the state-of-the-art TSC algorithms, which are currently composed of ensembles of 37 non-deep-learning-based classifiers. We attribute this gap in performance to the lack of neural network ensembles for TSC. Therefore, in this paper, we show how an ensemble of 60 deep learning models can significantly improve upon the current state-of-the-art performance of neural networks for TSC, when evaluated over the UCR/UEA archive: the largest publicly available benchmark for time series analysis. Finally, we show how our proposed Neural Network Ensemble (NNE) is the first time series classifier to outperform COTE while reaching similar performance to the current state-of-the-art ensemble HIVE-COTE.
Tasks Time Series, Time Series Analysis, Time Series Classification
Published 2019-03-15
URL http://arxiv.org/abs/1903.06602v2
PDF http://arxiv.org/pdf/1903.06602v2.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-ensembles-for-time-series
Repo https://github.com/hfawaz/ijcnn19ensemble
Framework tf
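
Note: one simple way to combine several trained classifiers is to average their predicted class probabilities and take the argmax. The sketch below assumes this uniform-averaging rule and a hypothetical function name; the abstract does not say how the 60 models are combined, so treat it as an illustration rather than the authors' procedure.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class probabilities, each of shape (samples, classes)."""
    stacked = np.stack(prob_list)            # (models, samples, classes)
    mean_probs = stacked.mean(axis=0)        # uniform average over models
    return mean_probs.argmax(axis=1), mean_probs

# Three hypothetical models scoring two time series over two classes.
m1 = np.array([[0.6, 0.4], [0.3, 0.7]])
m2 = np.array([[0.4, 0.6], [0.2, 0.8]])
m3 = np.array([[0.7, 0.3], [0.6, 0.4]])
labels, probs = ensemble_predict([m1, m2, m3])
print(labels)
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is one reason it is a common default for neural network ensembles.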

A New Defense Against Adversarial Images: Turning a Weakness into a Strength

Title A New Defense Against Adversarial Images: Turning a Weakness into a Strength
Authors Tao Yu, Shengyuan Hu, Chuan Guo, Wei-Lun Chao, Kilian Q. Weinberger
Abstract Natural images are virtually surrounded by low-density misclassified regions that can be efficiently discovered by gradient-guided search — enabling the generation of adversarial images. While many techniques for detecting these attacks have been proposed, they are easily bypassed when the adversary has full knowledge of the detection mechanism and adapts the attack strategy accordingly. In this paper, we adopt a novel perspective and regard the omnipresence of adversarial perturbations as a strength rather than a weakness. We postulate that if an image has been tampered with, these adversarial directions either become harder to find with gradient methods or have substantially higher density than for natural images. We develop a practical test for this signature characteristic to successfully detect adversarial attacks, achieving unprecedented accuracy under the white-box setting where the adversary is given full knowledge of our detection mechanism.
Tasks Adversarial Defense
Published 2019-10-16
URL https://arxiv.org/abs/1910.07629v2
PDF https://arxiv.org/pdf/1910.07629v2.pdf
PWC https://paperswithcode.com/paper/a-new-defense-against-adversarial-images
Repo https://github.com/s-huu/TurningWeaknessIntoStrength
Framework pytorch

Attentive Normalization

Title Attentive Normalization
Authors Xilai Li, Wei Sun, Tianfu Wu
Abstract In state-of-the-art deep neural networks, both feature normalization and feature attention have become ubiquitous with significant performance improvement shown in a vast amount of tasks. They are usually studied as separate modules, however. In this paper, we propose a light-weight integration between, and thus harness the best of, the two schema. We present Attentive Normalization (AN) which generalizes the common affine transformation component in the vanilla feature normalization. Instead of learning a single affine transformation, AN learns a mixture of affine transformations and utilizes their weighted-sum as the final affine transformation applied to re-calibrate features in an instance-specific way. The weights are learned by leveraging feature attention. AN introduces negligible extra parameters and computational cost (i.e., light-weight). AN can be used as a drop-in replacement for any feature normalization technique which includes the affine transformation component. In experiments, we test the proposed AN using three representative neural architectures (ResNets, MobileNets-v2 and AOGNets) in the ImageNet-1000 classification benchmark and the MS-COCO 2017 object detection and instance segmentation benchmark. AN obtains consistent performance improvement for different neural architectures in both benchmarks with absolute increase of top-1 accuracy in ImageNet-1000 between 0.5% and 2.0%, and absolute increase up to 1.8% and 2.2% for bounding box and mask AP in MS-COCO respectively. The source codes are publicly available (classification in ImageNet: https://github.com/iVMCL/AOGNets-v2; detection in MS-COCO: https://github.com/iVMCL/AttentiveNorm_Detection).
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-08-04
URL https://arxiv.org/abs/1908.01259v2
PDF https://arxiv.org/pdf/1908.01259v2.pdf
PWC https://paperswithcode.com/paper/attentive-normalization
Repo https://github.com/ivMCL/AttentiveNorm_Detection
Framework pytorch
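
Note: the "mixture of affine transformations" described in the abstract can be sketched in NumPy as follows. The batch-statistics standardization, the global-average-pooled attention input, and the softmax over the K components are assumptions made for illustration; the paper and repository may use different choices, and all names here are hypothetical.

```python
import numpy as np

def attentive_norm(x, gammas, betas, attn_weight, attn_bias, eps=1e-5):
    """x: (N, C, H, W); gammas, betas: (K, C); attn_weight: (C, K); attn_bias: (K,)."""
    # Standardize each channel with batch statistics (as in batch normalization).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Instance-specific attention weights over the K affine components.
    pooled = x.mean(axis=(2, 3))                         # (N, C)
    logits = pooled @ attn_weight + attn_bias            # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted sum of the K scale/shift vectors gives one affine map per instance.
    gamma = weights @ gammas                             # (N, C)
    beta = weights @ betas                               # (N, C)
    return x_hat * gamma[:, :, None, None] + beta[:, :, None, None]
```

With K = 1 the attention weight is always 1 and the function reduces to an ordinary normalization layer with a single learned scale and shift, which matches the abstract's description of AN as a generalization of that component.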

ViDeNN: Deep Blind Video Denoising

Title ViDeNN: Deep Blind Video Denoising
Authors Michele Claus, Jan van Gemert
Abstract We propose ViDeNN: a CNN for Video Denoising without prior knowledge on the noise distribution (blind denoising). The CNN architecture uses a combination of spatial and temporal filtering, learning to spatially denoise the frames first and at the same time how to combine their temporal information, handling objects motion, brightness changes, low-light conditions and temporal inconsistencies. We demonstrate the importance of the data used for CNNs training, creating for this purpose a specific dataset for low-light conditions. We test ViDeNN on common benchmarks and on self-collected data, achieving good results comparable with the state-of-the-art.
Tasks Denoising, Video Denoising
Published 2019-04-24
URL http://arxiv.org/abs/1904.10898v1
PDF http://arxiv.org/pdf/1904.10898v1.pdf
PWC https://paperswithcode.com/paper/videnn-deep-blind-video-denoising
Repo https://github.com/clausmichele/ViDeNN
Framework tf

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

Title Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
Authors Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky
Abstract We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x,y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.
Tasks Calibration
Published 2019-12-06
URL https://arxiv.org/abs/1912.03263v2
PDF https://arxiv.org/pdf/1912.03263v2.pdf
PWC https://paperswithcode.com/paper/your-classifier-is-secretly-an-energy-based-1
Repo https://github.com/tohmae/pytorch-jem
Framework pytorch
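
Note: the reinterpretation in the abstract can be made concrete with a few lines of NumPy. Treating each logit as an unnormalized log p(x, y), the softmax recovers p(y|x) and the log-sum-exp of the logits gives log p(x) up to the unknown normalizing constant. The function names are illustrative, and the sketch covers only these quantities, not the sampling-based training procedure.

```python
import numpy as np

def logsumexp(logits):
    """Numerically stable log(sum(exp(logits))) over the class axis."""
    m = logits.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def class_probs(logits):
    """p(y | x): the usual softmax over logits."""
    return np.exp(logits - logsumexp(logits)[..., None])

def unnormalized_log_px(logits):
    """log p(x) up to an additive constant (the log partition function)."""
    return logsumexp(logits)

logits = np.array([[2.0, 0.0, -1.0]])
print(class_probs(logits), unnormalized_log_px(logits))
```

The identity logit[y] = log p(y|x) + (unnormalized) log p(x) holds by construction, which is why an existing classifier architecture can be reused unchanged.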

Automating dynamic consent decisions for the processing of social media data in health research

Title Automating dynamic consent decisions for the processing of social media data in health research
Authors Chris Norval, Tristan Henderson
Abstract Social media have become a rich source of data, particularly in health research. Yet, the use of such data raises significant ethical questions about the need for the informed consent of those being studied. Consent mechanisms, if even obtained, are typically broad and inflexible, or place a significant burden on the participant. Machine learning algorithms show much promise for facilitating a ‘middle ground’ approach: using trained models to predict and automate granular consent decisions. Such techniques, however, raise a myriad of follow-on ethical and technical considerations. In this paper, we present an exploratory user study (n = 67) in which we find that we can predict the appropriate flow of health-related social media data with reasonable accuracy, while minimising undesired data leaks. We then attempt to deconstruct the findings of this study, identifying and discussing a number of real-world implications if such a technique were put into practice.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.05265v1
PDF https://arxiv.org/pdf/1910.05265v1.pdf
PWC https://paperswithcode.com/paper/automating-dynamic-consent-decisions-for-the
Repo https://github.com/cnorval/automating-dynamic-consent-dataset
Framework none

Complex Network based Supervised Keyword Extractor

Title Complex Network based Supervised Keyword Extractor
Authors Swagata Duari, Vasudha Bhatnagar
Abstract In this paper, we present a supervised framework for automatic keyword extraction from single document. We model the text as complex network, and construct the feature set by extracting select node properties from it. Several node properties have been exploited by unsupervised, graph-based keyword extraction methods to discriminate keywords from non-keywords. We exploit the complex interplay of node properties to design a supervised keyword extraction method. The training set is created from the feature set by assigning a label to each candidate keyword depending on whether the candidate is listed as a gold-standard keyword or not. Since the number of keywords in a document is much less than non-keywords, the curated training set is naturally imbalanced. We train a binary classifier to predict keywords after balancing the training set. The model is trained using two public datasets from scientific domain and tested using three unseen scientific corpora and one news corpus. Comparative study of the results with several recent keyword and keyphrase extraction methods establishes that the proposed method performs better in most cases. This substantiates our claim that graph-theoretic properties of words are effective discriminators between keywords and non-keywords. We support our argument by showing that the improved performance of the proposed method is statistically significant for all datasets. We also evaluate the effectiveness of the pre-trained model on Hindi and Assamese language documents. We observe that the model performs equally well for the cross-language text even though it was trained only on English language documents. This shows that the proposed method is independent of the domain, collection, and language of the training corpora.
Tasks Keyword Extraction
Published 2019-09-26
URL https://arxiv.org/abs/1909.12009v1
PDF https://arxiv.org/pdf/1909.12009v1.pdf
PWC https://paperswithcode.com/paper/complex-network-based-supervised-keyword
Repo https://github.com/SDuari/Supervised-Keyword-Extraction
Framework none

Distilling Object Detectors with Fine-grained Feature Imitation

Title Distilling Object Detectors with Fine-grained Feature Imitation
Authors Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng
Abstract State-of-the-art CNN-based recognition models are often computationally prohibitive to deploy on low-end devices. A promising high-level approach to tackling this limitation is knowledge distillation, which lets a small student model mimic a cumbersome teacher model’s output to obtain improved generalization. However, related methods mainly focus on the simple task of classification and do not consider complex tasks like object detection. We show that applying vanilla knowledge distillation to a detection model yields only a minor gain. To address the challenge of distilling knowledge in detection models, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response. Our intuition is that detectors care more about local near-object regions. Thus the discrepancy of feature response on the near-object anchor locations reveals important information about how the teacher model tends to generalize. We design a novel mechanism to estimate those locations and let the student model imitate the teacher on them to get enhanced performance. We first validate the idea on a lightweight toy detector that carries the simplest notion of current state-of-the-art anchor-based detection models. On the challenging KITTI dataset, our method generates up to a 15% boost in mAP for the student model compared to the non-imitated counterpart. We then extensively evaluate the method with the Faster R-CNN model under various scenarios on the common object detection benchmarks Pascal VOC and COCO, where imitation alleviates up to 74% of the performance drop of the student model compared to the teacher. Code is released at https://github.com/twangnh/Distilling-Object-Detectors.
Tasks Object Detection
Published 2019-06-09
URL https://arxiv.org/abs/1906.03609v1
PDF https://arxiv.org/pdf/1906.03609v1.pdf
PWC https://paperswithcode.com/paper/distilling-object-detectors-with-fine-grained-1
Repo https://github.com/twangnh/Distilling-Object-Detectors
Framework pytorch
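
Note: the core of the feature imitation idea, penalizing the student only at near-object locations, can be sketched as a masked squared-error loss. The function name and the normalization by the number of masked locations are assumptions for illustration. How the near-object mask is estimated from anchors and ground-truth boxes is not shown, and the student and teacher features are assumed to already have matching shapes.

```python
import numpy as np

def imitation_loss(student_feat, teacher_feat, mask):
    """Masked squared error between feature maps.

    student_feat, teacher_feat: (C, H, W); mask: (H, W) with 1 at near-object locations.
    """
    sq_diff = (student_feat - teacher_feat) ** 2
    masked = sq_diff * mask[None, :, :]          # zero out background locations
    n_pos = max(float(mask.sum()), 1.0)          # avoid division by zero on empty masks
    return float(masked.sum() / (2.0 * n_pos))

student = np.zeros((2, 2, 2))
teacher = np.ones((2, 2, 2))
mask = np.array([[1.0, 0.0], [0.0, 0.0]])
print(imitation_loss(student, teacher, mask))    # only one location contributes
```

Restricting the loss to masked locations keeps large background regions from dominating the imitation signal, which is the intuition the abstract gives for focusing on near-object anchors.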