February 1, 2020

Paper Group AWR 289

BlenderProc. Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition. Opytimizer: A Nature-Inspired Python Optimizer. WHAM!: Extending Speech Separation to Noisy Environments. Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph. ETNet: Error Transition Network for Arbitrary Style Transfer. …

BlenderProc

Title BlenderProc
Authors Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Youssef Zidan, Dmitry Olefir, Mohamad Elbadrawy, Ahsan Lodhi, Harinandan Katam
Abstract BlenderProc is a modular procedural pipeline that helps generate realistic-looking images for training convolutional neural networks. These images can be used in a variety of use cases, including segmentation, depth, normal and pose estimation, and many others. A key feature of our extension of Blender is the simple-to-use modular pipeline, which was designed to be easily extendable. By offering standard modules that cover a variety of scenarios, we provide a starting point from which new modules can be created.
Tasks 3D Object Recognition, Depth Image Estimation, Instance Segmentation, Pose Estimation, Semantic Segmentation, Surface Normals Estimation
Published 2019-10-25
URL https://arxiv.org/abs/1911.01911v1
PDF https://arxiv.org/pdf/1911.01911v1.pdf
PWC https://paperswithcode.com/paper/blenderproc
Repo https://github.com/DLR-RM/BlenderProc
Framework none

Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition

Title Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition
Authors Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman
Abstract Face recognition on caricatures still performs far worse than face recognition on visual images. The challenge is the extreme non-rigid distortion of the caricatures, introduced by exaggerating the facial features to strengthen the characters. In this paper, we propose dynamic multi-task learning based on deep CNNs for cross-modal caricature-visual face recognition. Instead of conventional multi-task learning with fixed task weights, the proposed dynamic multi-task learning updates the task weights according to the importance of the tasks, which lets the training of the networks focus on the hard task instead of being stuck overtraining the easy task. The experimental results demonstrate the effectiveness of dynamic multi-task learning for caricature-visual face recognition. The performance evaluated on the CaVI and WebCaricature datasets shows superiority over the state-of-the-art methods. The implementation code is available at the repository listed below.
Tasks Caricature, Face Recognition, Multi-Task Learning
Published 2019-11-08
URL https://arxiv.org/abs/1911.03341v1
PDF https://arxiv.org/pdf/1911.03341v1.pdf
PWC https://paperswithcode.com/paper/dynamic-deep-multi-task-learning-for
Repo https://github.com/hengxyz/cari-visual-recognition-via-multitask-learning
Framework tf
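
A minimal sketch of the idea of dynamically weighted task losses described in the abstract above. The abstract does not give the update rule, so the softmax-over-losses rule, the `temperature` parameter, and the function names here are assumptions chosen for illustration, not the authors' implementation. The only property shown is that the task with the larger current loss receives the larger weight.

```python
import math


def dynamic_task_weights(losses, temperature=1.0):
    """Return one weight per task; harder (higher-loss) tasks get more weight.

    Assumption: weights are a softmax over the current task losses.
    The paper's actual weighting rule may differ.
    """
    largest = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - largest) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_total_loss(losses, temperature=1.0):
    """Combine per-task losses using the dynamic weights."""
    weights = dynamic_task_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical example: task 0 is currently much harder than task 1.
weights = dynamic_task_weights([2.0, 0.5])
```

With fixed weights the easy task would keep contributing the same share of the gradient even after its loss has become small; recomputing the weights at every step shifts the emphasis as the losses change.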

Opytimizer: A Nature-Inspired Python Optimizer

Title Opytimizer: A Nature-Inspired Python Optimizer
Authors Gustavo H. de Rosa, João P. Papa
Abstract Optimization aims at selecting a feasible set of parameters in an attempt to solve a particular problem, and is applied in a wide range of areas, such as operations research, machine learning fine-tuning, and control engineering, among others. Nevertheless, traditional iterative optimization methods rely on evaluating gradients and Hessians to find their solutions, which is impractical because of the computational burden and when working with non-convex functions. Recent biologically inspired methods, known as meta-heuristics, have arisen in an attempt to overcome these problems. Even though they are not guaranteed to find optimal solutions, they usually find a suitable one. In this paper, we propose a Python-based meta-heuristic optimization framework called Opytimizer. Several methods and classes are implemented to provide a user-friendly workspace among diverse meta-heuristics, ranging from evolutionary- to swarm-based techniques.
Tasks
Published 2019-12-30
URL https://arxiv.org/abs/1912.13002v1
PDF https://arxiv.org/pdf/1912.13002v1.pdf
PWC https://paperswithcode.com/paper/opytimizer-a-nature-inspired-python-optimizer
Repo https://github.com/gugarosa/opytimizer
Framework tf
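
To make the notion of a gradient-free, swarm-based meta-heuristic concrete, here is a small particle swarm optimization sketch in plain Python. It deliberately does not use Opytimizer's API; the function names, the sphere objective, and the coefficient values (`w`, `c1`, `c2`) are illustrative assumptions. As the abstract notes, such methods do not guarantee an optimal solution.

```python
import random


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iterations=100,
        lower=-5.0, upper=5.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic particle swarm (no gradients)."""
    rng = random.Random(seed)
    positions = [[rng.uniform(lower, upper) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]           # per-particle best
    best_fit = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_fit.__getitem__)
    global_pos, global_fit = best_pos[g][:], best_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                    + c2 * r2 * (global_pos[d] - positions[i][d])
                )
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(upper, max(lower, moved))
            fit = objective(positions[i])
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = positions[i][:], fit
                if fit < global_fit:
                    global_pos, global_fit = positions[i][:], fit
    return global_pos, global_fit


solution, fitness = pso(sphere, dim=2)
```

Only function evaluations are needed, which is why this family of methods remains usable when gradients or Hessians are unavailable or too expensive.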

WHAM!: Extending Speech Separation to Noisy Environments

Title WHAM!: Extending Speech Separation to Noisy Environments
Authors Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux
Abstract Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.
Tasks Speech Separation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01160v1
PDF https://arxiv.org/pdf/1907.01160v1.pdf
PWC https://paperswithcode.com/paper/wham-extending-speech-separation-to-noisy
Repo https://github.com/AkojimaSLP/Neural-mask-estimation
Framework tf

Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph

Title Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph
Authors Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Lina Yao, Chengqi Zhang
Abstract A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, they do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose prototype propagation network (PPN) trained on few-shot tasks together with data annotated by coarse-label. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data.
Tasks Few-Shot Learning, Meta-Learning
Published 2019-05-10
URL https://arxiv.org/abs/1905.04042v2
PDF https://arxiv.org/pdf/1905.04042v2.pdf
PWC https://paperswithcode.com/paper/prototype-propagation-networks-ppn-for-weakly
Repo https://github.com/liulu112601/PPN
Framework pytorch

ETNet: Error Transition Network for Arbitrary Style Transfer

Title ETNet: Error Transition Network for Arbitrary Style Transfer
Authors Chunjin Song, Zhijie Wu, Yang Zhou, Minglun Gong, Hui Huang
Abstract Numerous valuable efforts have been devoted to achieving arbitrary style transfer since the seminal work of Gatys et al. However, existing state-of-the-art approaches often generate insufficiently stylized results under challenging cases. We believe a fundamental reason is that these approaches try to generate the stylized result in a single shot and hence fail to fully satisfy the constraints on semantic structures in the content images and style patterns in the style images. Inspired by the works on error-correction, instead, we propose a self-correcting model to predict what is wrong with the current stylization and refine it accordingly in an iterative manner. For each refinement, we transit the error features across both the spatial and scale domain and invert the processed features into a residual image, with a network we call Error Transition Network (ETNet). The proposed model improves over the state-of-the-art methods with better semantic structures and more adaptive style pattern details. Various qualitative and quantitative experiments show that the key concept of both progressive strategy and error-correction leads to better results. Code and models are available at https://github.com/zhijieW94/ETNet.
Tasks Style Transfer
Published 2019-10-26
URL https://arxiv.org/abs/1910.12056v2
PDF https://arxiv.org/pdf/1910.12056v2.pdf
PWC https://paperswithcode.com/paper/etnet-error-transition-network-for-arbitrary
Repo https://github.com/zhijieW94/ETNet
Framework tf

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

Title LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
Authors Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
Abstract We propose a local adversarial disentangling network (LADN) for facial makeup and de-makeup. Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details. Existing techniques do not demonstrate or fail to transfer high-frequency details in a global adversarial setting, or train a single local discriminator only to ensure image structure consistency and thus work only for relatively simple styles. Unlike others, our proposed local adversarial discriminators can distinguish whether the generated local image details are consistent with the corresponding regions in the given reference image in cross-image style transfer in an unsupervised setting. Incorporating these technical contributions, we achieve not only state-of-the-art results on conventional styles but also novel results involving complex and dramatic styles with high-frequency details covering large areas across multiple facial features. A carefully designed dataset of unpaired before and after makeup images is released.
Tasks Style Transfer
Published 2019-04-25
URL https://arxiv.org/abs/1904.11272v2
PDF https://arxiv.org/pdf/1904.11272v2.pdf
PWC https://paperswithcode.com/paper/ladn-local-adversarial-disentangling-network
Repo https://github.com/wangguanzhi/LADN
Framework pytorch

Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Title Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Authors Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Mike Rabbat
Abstract Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one-another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable compared to A2C, its fully-synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.
Tasks
Published 2019-06-09
URL https://arxiv.org/abs/1906.04585v1
PDF https://arxiv.org/pdf/1906.04585v1.pdf
PWC https://paperswithcode.com/paper/gossip-based-actor-learner-architectures-for
Repo https://github.com/facebookresearch/gala
Framework pytorch
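
The gossip idea from the abstract above can be sketched with a toy parameter-averaging loop. This is a simplified, synchronous illustration on a directed ring and omits the learning updates entirely; GALA itself uses asynchronous gossip between actor-learners, and the topology, function names, and numbers below are assumptions made for the example.

```python
def gossip_round(params):
    """One synchronous gossip round on a directed ring.

    Each agent replaces its parameter vector with the average of its own
    vector and that of its predecessor on the ring.
    """
    n = len(params)
    return [
        [(own + peer) / 2.0 for own, peer in zip(params[i], params[(i - 1) % n])]
        for i in range(n)
    ]


def mean_vector(params):
    """Coordinate-wise mean over all agents."""
    dims = len(params[0])
    return [sum(p[d] for p in params) / len(params) for d in range(dims)]


def max_disagreement(params):
    """Largest coordinate-wise distance of any agent from the mean."""
    mean = mean_vector(params)
    return max(abs(p[d] - mean[d]) for p in params for d in range(len(mean)))


# Four hypothetical agents, each holding a one-dimensional parameter.
agents = [[0.0], [4.0], [8.0], [12.0]]
for _ in range(40):
    agents = gossip_round(agents)
```

Each round keeps the average of all agents unchanged while shrinking the spread between them, which gives an intuition for how loosely coupled agents can stay close to one another without a global synchronization step.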

Hard Pixel Mining for Depth Privileged Semantic Segmentation

Title Hard Pixel Mining for Depth Privileged Semantic Segmentation
Authors Zhangxuan Gu, Li Niu, Haohua Zhao, Liqing Zhang
Abstract Semantic segmentation has achieved remarkable progress but remains challenging due to complex scenes, object occlusion, and so on. Some research works have attempted to use extra information such as a depth map to help RGB-based semantic segmentation, because the depth map can provide complementary geometric cues. However, due to the inaccessibility of depth sensors, depth information is usually unavailable for the test images. In this paper, we leverage the depth of training images only, as privileged information, to mine the hard pixels in semantic segmentation; depth information is available for training images but not for test images. Specifically, we propose a novel Loss Weight Module, which outputs a loss weight map by employing two depth-related measurements of hard pixels: Depth Prediction Error and Depth-aware Segmentation Error. The loss weight map is then applied to the segmentation loss, with the goal of learning a more robust model by paying more attention to the hard pixels. In addition, we explore a curriculum learning strategy based on the loss weight map. Meanwhile, to fully mine the hard pixels on different scales, we apply our loss weight module to multi-scale side outputs. Our hard pixel mining method achieves state-of-the-art results on two benchmark datasets, and even outperforms methods that need depth input during testing.
Tasks Depth Estimation, Semantic Segmentation
Published 2019-06-27
URL https://arxiv.org/abs/1906.11437v5
PDF https://arxiv.org/pdf/1906.11437v5.pdf
PWC https://paperswithcode.com/paper/hard-pixels-mining-learning-using-privileged
Repo https://github.com/strivebo/image_segmentation_dl
Framework tf

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Title META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation
Authors Mingde Zhao, Sitao Luan, Ian Porada, Xiao-Wen Chang, Doina Precup
Abstract Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies. TD-learning with eligibility traces provides a way to boost sample efficiency by temporal credit assignment, i.e. deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter $\lambda$. However, tuning this parameter can be time-consuming, and not tuning it can lead to inefficient learning. For better sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online, incurring roughly the same computational complexity per step as the usual value learner. Our approach can be used both in on-policy and off-policy learning. We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error. This method can be viewed as a plugin to assist prediction with function approximation by meta-learning feature (observation)-based $\lambda$ online, or even in the control case to assist policy improvement. Our empirical evaluation demonstrates significant performance improvements, as well as improved robustness of the proposed algorithm to learning rate variation.
Tasks Meta-Learning
Published 2019-04-25
URL https://arxiv.org/abs/1904.11439v5
PDF https://arxiv.org/pdf/1904.11439v5.pdf
PWC https://paperswithcode.com/paper/faster-and-more-accurate-learning-with-meta
Repo https://github.com/PwnerHarry/MTA
Framework none
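
The following sketch shows where a state-dependent $\lambda$ enters tabular TD learning with accumulating eligibility traces. It does not implement the paper's meta-learning of $\lambda$: the per-state values in `lambdas` are fixed by hand, and the five-state random walk, step size, and function name are assumptions made for illustration.

```python
import random


def td_lambda_episode(values, lambdas, alpha=0.1, gamma=1.0, rng=None):
    """Run one episode of tabular TD(lambda) on a random-walk chain.

    The walk starts in the middle state, moves left or right with equal
    probability, and yields reward 1 only when it exits on the right.
    `lambdas[s]` is the trace-decay parameter used when the agent is in s.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(values)
    traces = [0.0] * n
    state = n // 2
    while True:
        next_state = state + rng.choice((-1, 1))
        done = next_state < 0 or next_state >= n
        reward = 1.0 if next_state >= n else 0.0
        next_value = 0.0 if done else values[next_state]
        td_error = reward + gamma * next_value - values[state]

        # Decay every trace with the lambda of the current state,
        # then mark the current state as just visited.
        for s in range(n):
            traces[s] *= gamma * lambdas[state]
        traces[state] += 1.0

        # Credit the TD error to all recently visited states.
        for s in range(n):
            values[s] += alpha * td_error * traces[s]

        if done:
            return values
        state = next_state


# Hypothetical hand-picked lambdas; the paper learns these online instead.
rng = random.Random(1)
values = [0.0] * 5
lambdas = [0.0, 0.3, 0.5, 0.3, 0.0]
for _ in range(5000):
    td_lambda_episode(values, lambdas, alpha=0.02, rng=rng)
```

A single global $\lambda$ corresponds to filling `lambdas` with one repeated value; making it state-dependent changes only the decay line, which is what makes the parameter a natural target for per-state adaptation.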

Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data

Title Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data
Authors Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu
Abstract Neural machine translation systems have become the state-of-the-art approach for the Grammatical Error Correction (GEC) task. In this paper, we propose a copy-augmented architecture for the GEC task that copies the unchanged words from the source sentence to the target sentence. Since GEC suffers from not having enough labeled training data to achieve high accuracy, we pre-train the copy-augmented architecture with a denoising auto-encoder using the unlabeled One Billion Benchmark and compare the fully pre-trained model with a partially pre-trained model. This is the first time that copying words from the source context and fully pre-training a sequence-to-sequence model have been tried on the GEC task. Moreover, we add token-level and sentence-level multi-task learning for the GEC task. The evaluation results on the CoNLL-2014 test set show that our approach outperforms all recently published state-of-the-art results by a large margin. The code and pre-trained models are released at https://github.com/zhawe01/fairseq-gec.
Tasks Denoising, Grammatical Error Correction, Machine Translation, Multi-Task Learning
Published 2019-03-01
URL https://arxiv.org/abs/1903.00138v3
PDF https://arxiv.org/pdf/1903.00138v3.pdf
PWC https://paperswithcode.com/paper/improving-grammatical-error-correction-via
Repo https://github.com/zhawe01/fairseq-gec
Framework pytorch

TEASPN: Framework and Protocol for Integrated Writing Assistance Environments

Title TEASPN: Framework and Protocol for Integrated Writing Assistance Environments
Authors Masato Hagiwara, Takumi Ito, Tatsuki Kuribayashi, Jun Suzuki, Kentaro Inui
Abstract Language technologies play a key role in assisting people with their writing. Although there has been steady progress in e.g., grammatical error correction (GEC), human writers are yet to benefit from this progress due to the high development cost of integrating with writing software. We propose TEASPN, a protocol and an open-source framework for achieving integrated writing assistance environments. The protocol standardizes the way writing software communicates with servers that implement such technologies, allowing developers and researchers to integrate the latest developments in natural language processing (NLP) with low cost. As a result, users can enjoy the integrated experience in their favorite writing software. The results from experiments with human participants show that users use a wide range of technologies and rate their writing experience favorably, allowing them to write more fluent text.
Tasks Grammatical Error Correction
Published 2019-09-05
URL https://arxiv.org/abs/1909.02621v1
PDF https://arxiv.org/pdf/1909.02621v1.pdf
PWC https://paperswithcode.com/paper/teaspn-framework-and-protocol-for-integrated
Repo https://github.com/teaspn/teaspn-sdk
Framework none

Face Recognition via Locality Constrained Low Rank Representation and Dictionary Learning

Title Face Recognition via Locality Constrained Low Rank Representation and Dictionary Learning
Authors He-Feng Yin, Xiao-Jun Wu, Josef Kittler
Abstract Face recognition has been widely studied due to its importance in smart cities applications. However, the case when both training and test images are corrupted is not well solved. To address such a problem, this paper proposes a locality constrained low rank representation and dictionary learning (LCLRRDL) algorithm for robust face recognition. In particular, we present three contributions in the proposed formulation. First, a low-rank representation is introduced to handle the possible contamination of the training as well as test data. Second, a locality constraint is incorporated to acknowledge the intrinsic manifold structure of training data. With the locality constraint term, our scheme induces similar samples to have similar representations. Third, a compact dictionary is learned to handle the problem of corrupted data. The experimental results on two public databases demonstrate the effectiveness of the proposed approach. Matlab code of our proposed LCLRRDL can be downloaded from https://github.com/yinhefeng/LCLRRDL.
Tasks Dictionary Learning, Face Recognition, Robust Face Recognition
Published 2019-12-06
URL https://arxiv.org/abs/1912.03145v1
PDF https://arxiv.org/pdf/1912.03145v1.pdf
PWC https://paperswithcode.com/paper/face-recognition-via-locality-constrained-low
Repo https://github.com/yinhefeng/LCLRRDL
Framework none

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Title On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Authors Christian Etmann, Sebastian Lunz, Peter Maass, Carola-Bibiane Schönlieb
Abstract Recent studies on the adversarial vulnerability of neural networks have shown that models trained to be more robust to adversarial attacks exhibit more interpretable saliency maps than their non-robust counterparts. We aim to quantify this behavior by considering the alignment between the input image and the saliency map. We hypothesize that as the distance to the decision boundary grows, so does the alignment. This connection is strictly true in the case of linear models. We confirm these theoretical findings with experiments based on models trained with a local Lipschitz regularization and identify where the non-linear nature of neural networks weakens the relation.
Tasks
Published 2019-05-10
URL https://arxiv.org/abs/1905.04172v1
PDF https://arxiv.org/pdf/1905.04172v1.pdf
PWC https://paperswithcode.com/paper/on-the-connection-between-adversarial
Repo https://github.com/cetmann/robustness-interpretability
Framework tf
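
A small worked example of the linear case mentioned in the abstract above. It assumes a bias-free linear binary classifier with score $\langle w, x \rangle$ and assumes that alignment is measured as $|\langle x, g \rangle| / \lVert g \rVert$, where $g$ is the gradient of the score with respect to the input (the saliency map). Both the definition and the helper names are assumptions for illustration; the abstract itself does not spell them out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def boundary_distance(x, w):
    """Euclidean distance from x to the hyperplane <w, x> = 0."""
    return abs(dot(w, x)) / norm(w)


def alignment(x, grad):
    """Assumed alignment measure between an input and its saliency map."""
    return abs(dot(x, grad)) / norm(grad)


w = [3.0, -4.0]   # hypothetical weights of a bias-free linear classifier
x = [2.0, 1.0]    # hypothetical input
saliency = w      # gradient of <w, x> with respect to x is w itself

distance = boundary_distance(x, w)
aligned = alignment(x, saliency)
```

Under these assumptions the two quantities are the same expression, so moving an input farther from the decision boundary increases its alignment by exactly the same amount; for non-linear networks the gradient depends on the input and the equality no longer has to hold.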

JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python

Title JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
Authors Samuel S. Schoenholz, Ekin D. Cubuk
Abstract A large fraction of computational science involves simulating the dynamics of particles that interact via pairwise or many-body interactions. These simulations, called Molecular Dynamics (MD), span a vast range of subjects from physics and materials science to biochemistry and drug discovery. Most MD software involves significant use of handwritten derivatives and code reuse across C++, FORTRAN, and CUDA. This is reminiscent of the state of machine learning before automatic differentiation became popular. In this work we bring the substantial advances in software that have taken place in machine learning to MD with JAX, M.D. (JAX MD). JAX MD is an end-to-end differentiable MD package written entirely in Python that can be just-in-time compiled to CPU, GPU, or TPU. JAX MD allows researchers to iterate extremely quickly and lets researchers easily incorporate machine learning models into their workflows. Finally, since all of the simulation code is written in Python, researchers can have unprecedented flexibility in setting up experiments without having to edit any low-level C++ or CUDA code. In addition to making existing workloads easier, JAX MD allows researchers to take derivatives through whole-simulations as well as seamlessly incorporate neural networks into simulations. This paper explores the architecture of JAX MD and its capabilities through several vignettes. Code is available at www.github.com/google/jax-md. We also provide an interactive Colab notebook that goes through all of the experiments discussed in the paper.
Tasks Drug Discovery
Published 2019-12-09
URL https://arxiv.org/abs/1912.04232v1
PDF https://arxiv.org/pdf/1912.04232v1.pdf
PWC https://paperswithcode.com/paper/jax-md-end-to-end-differentiable-hardware-1
Repo https://github.com/google/jax-md
Framework jax