Paper Group AWR 289
BlenderProc
Title | BlenderProc |
Authors | Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Youssef Zidan, Dmitry Olefir, Mohamad Elbadrawy, Ahsan Lodhi, Harinandan Katam |
Abstract | BlenderProc is a modular procedural pipeline that helps generate realistic-looking images for training convolutional neural networks. These images can be used in a variety of use cases, including segmentation, depth, normal, and pose estimation, among many others. A key feature of our extension of Blender is the simple-to-use modular pipeline, which was designed to be easily extendable. By offering standard modules that cover a variety of scenarios, we provide a starting point on which new modules can be created. |
Tasks | 3D Object Recognition, Depth Image Estimation, Instance Segmentation, Pose Estimation, Semantic Segmentation, Surface Normals Estimation |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1911.01911v1 |
PDF | https://arxiv.org/pdf/1911.01911v1.pdf |
PWC | https://paperswithcode.com/paper/blenderproc |
Repo | https://github.com/DLR-RM/BlenderProc |
Framework | none |
Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition
Title | Dynamic Deep Multi-task Learning for Caricature-Visual Face Recognition |
Authors | Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman |
Abstract | Face recognition on caricatures still performs far worse than face recognition on visual images. The challenge lies in the extreme non-rigid distortions that caricatures introduce by exaggerating facial features to strengthen the characters. In this paper, we propose dynamic multi-task learning based on deep CNNs for cross-modal caricature-visual face recognition. Instead of conventional multi-task learning with fixed task weights, the proposed dynamic multi-task learning updates the task weights according to the importance of each task, which lets training focus on the hard task instead of being stuck overtraining the easy task. The experimental results demonstrate the effectiveness of dynamic multi-task learning for caricature-visual face recognition. The performance evaluated on the CaVI and WebCaricature datasets shows its superiority over state-of-the-art methods. The implementation code is publicly available. |
Tasks | Caricature, Face Recognition, Multi-Task Learning |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03341v1 |
PDF | https://arxiv.org/pdf/1911.03341v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-deep-multi-task-learning-for |
Repo | https://github.com/hengxyz/cari-visual-recognition-via-multitask-learning |
Framework | tf |
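The abstract above contrasts fixed task weights with weights that shift toward the harder task during training. A minimal sketch of that idea follows, assuming the weights are a softmax over the current per-task losses. This weighting rule, the function names, and the `temperature` parameter are illustrative assumptions and not the authors' formulation.

```python
import math

def dynamic_task_weights(losses, temperature=1.0):
    """Return one weight per task; larger loss -> larger weight (assumed rule)."""
    # Subtract the max before exponentiating for numerical stability.
    peak = max(losses)
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

def combined_loss(losses, temperature=1.0):
    """Weighted sum of the task losses using the dynamic weights."""
    weights = dynamic_task_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))

# Hypothetical losses: the first task is currently the harder one.
weights = dynamic_task_weights([2.0, 0.5])
```

With these hypothetical losses the first task receives the larger weight, and equal losses give equal weights.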
Opytimizer: A Nature-Inspired Python Optimizer
Title | Opytimizer: A Nature-Inspired Python Optimizer |
Authors | Gustavo H. de Rosa, João P. Papa |
Abstract | Optimization aims at selecting a feasible set of parameters in an attempt to solve a particular problem, and is applied in a wide range of areas such as operations research, machine learning fine-tuning, and control engineering, among others. Nevertheless, traditional iterative optimization methods rely on evaluating gradients and Hessians to find their solutions, which makes them impractical when the computational burden is high or when working with non-convex functions. Recent biologically inspired methods, known as meta-heuristics, have arisen in an attempt to address these problems. Even though they are not guaranteed to find optimal solutions, they usually find a suitable solution. In this paper, we propose a Python-based meta-heuristic optimization framework denoted as Opytimizer. Several methods and classes are implemented to provide a user-friendly workspace among diverse meta-heuristics, ranging from evolutionary- to swarm-based techniques. |
Tasks | |
Published | 2019-12-30 |
URL | https://arxiv.org/abs/1912.13002v1 |
PDF | https://arxiv.org/pdf/1912.13002v1.pdf |
PWC | https://paperswithcode.com/paper/opytimizer-a-nature-inspired-python-optimizer |
Repo | https://github.com/gugarosa/opytimizer |
Framework | tf |
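The swarm-based meta-heuristics mentioned in the abstract above search without gradients or Hessians. As a rough illustration, here is a minimal particle swarm optimizer in plain Python. It does not use Opytimizer's API; the function names, default coefficients, and the sphere benchmark are assumptions chosen for the sketch.

```python
import random

def sphere(position):
    """Benchmark objective: sum of squares, minimum 0 at the origin."""
    return sum(x * x for x in position)

def particle_swarm(objective, dim=2, particles=20, iterations=100,
                   bounds=(-5.0, 5.0), inertia=0.7, cognitive=1.5,
                   social=1.5, seed=0):
    """Minimise `objective`; returns (best_position, best_value, history)."""
    rng = random.Random(seed)
    low, high = bounds
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]              # per-particle best positions
    best_val = [objective(p) for p in pos]      # per-particle best values
    g = min(range(particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]
    history = [swarm_val]                       # swarm-best value per iteration

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (swarm_pos[d] - pos[i][d]))
                # Move, then clamp to the search bounds.
                pos[i][d] = min(high, max(low, pos[i][d] + vel[i][d]))
            value = objective(pos[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], value
                if value < swarm_val:
                    swarm_pos, swarm_val = pos[i][:], value
        history.append(swarm_val)
    return swarm_pos, swarm_val, history

best_position, best_value, history = particle_swarm(sphere)
```

The swarm-best value recorded in `history` can only decrease or stay the same, which matches the abstract's point that such methods return a suitable solution without guaranteeing an optimal one.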
WHAM!: Extending Speech Separation to Noisy Environments
Title | WHAM!: Extending Speech Separation to Noisy Environments |
Authors | Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux |
Abstract | Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches. |
Tasks | Speech Separation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01160v1 |
PDF | https://arxiv.org/pdf/1907.01160v1.pdf |
PWC | https://paperswithcode.com/paper/wham-extending-speech-separation-to-noisy |
Repo | https://github.com/AkojimaSLP/Neural-mask-estimation |
Framework | tf |
Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph
Title | Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph |
Authors | Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Lina Yao, Chengqi Zhang |
Abstract | A variety of machine learning applications expect to achieve rapid learning from a limited amount of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, existing meta-learning methods do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose a prototype propagation network (PPN) trained on few-shot tasks together with data annotated with coarse labels. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04042v2 |
PDF | https://arxiv.org/pdf/1905.04042v2.pdf |
PWC | https://paperswithcode.com/paper/prototype-propagation-networks-ppn-for-weakly |
Repo | https://github.com/liulu112601/PPN |
Framework | pytorch |
ETNet: Error Transition Network for Arbitrary Style Transfer
Title | ETNet: Error Transition Network for Arbitrary Style Transfer |
Authors | Chunjin Song, Zhijie Wu, Yang Zhou, Minglun Gong, Hui Huang |
Abstract | Numerous valuable efforts have been devoted to achieving arbitrary style transfer since the seminal work of Gatys et al. However, existing state-of-the-art approaches often generate insufficiently stylized results under challenging cases. We believe a fundamental reason is that these approaches try to generate the stylized result in a single shot and hence fail to fully satisfy the constraints on semantic structures in the content images and style patterns in the style images. Inspired by the works on error-correction, instead, we propose a self-correcting model to predict what is wrong with the current stylization and refine it accordingly in an iterative manner. For each refinement, we transit the error features across both the spatial and scale domain and invert the processed features into a residual image, with a network we call Error Transition Network (ETNet). The proposed model improves over the state-of-the-art methods with better semantic structures and more adaptive style pattern details. Various qualitative and quantitative experiments show that the key concept of both progressive strategy and error-correction leads to better results. Code and models are available at https://github.com/zhijieW94/ETNet. |
Tasks | Style Transfer |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1910.12056v2 |
PDF | https://arxiv.org/pdf/1910.12056v2.pdf |
PWC | https://paperswithcode.com/paper/etnet-error-transition-network-for-arbitrary |
Repo | https://github.com/zhijieW94/ETNet |
Framework | tf |
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup
Title | LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup |
Authors | Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang |
Abstract | We propose a local adversarial disentangling network (LADN) for facial makeup and de-makeup. Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details. Existing techniques do not demonstrate or fail to transfer high-frequency details in a global adversarial setting, or train a single local discriminator only to ensure image structure consistency and thus work only for relatively simple styles. Unlike others, our proposed local adversarial discriminators can distinguish whether the generated local image details are consistent with the corresponding regions in the given reference image in cross-image style transfer in an unsupervised setting. Incorporating these technical contributions, we achieve not only state-of-the-art results on conventional styles but also novel results involving complex and dramatic styles with high-frequency details covering large areas across multiple facial features. A carefully designed dataset of unpaired before and after makeup images is released. |
Tasks | Style Transfer |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11272v2 |
PDF | https://arxiv.org/pdf/1904.11272v2.pdf |
PWC | https://paperswithcode.com/paper/ladn-local-adversarial-disentangling-network |
Repo | https://github.com/wangguanzhi/LADN |
Framework | pytorch |
Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Title | Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning |
Authors | Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Mike Rabbat |
Abstract | Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one-another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable compared to A2C, its fully-synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.04585v1 |
PDF | https://arxiv.org/pdf/1906.04585v1.pdf |
PWC | https://paperswithcode.com/paper/gossip-based-actor-learner-architectures-for |
Repo | https://github.com/facebookresearch/gala |
Framework | pytorch |
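The GALA abstract above describes agents that exchange information with peers through gossip and stay close to one another. The sketch below shows only the averaging part of that idea, under simplifying assumptions: rounds are synchronous (GALA's gossip is asynchronous), each agent holds a single scalar, no gradient steps are taken, and the ring topology is chosen purely for illustration.

```python
def gossip_round(params):
    """One synchronous round on a directed ring: agent i mixes with agent i+1."""
    n = len(params)
    return [0.5 * (params[i] + params[(i + 1) % n]) for i in range(n)]

def run_gossip(params, rounds):
    """Apply `rounds` gossip rounds and return the resulting parameters."""
    for _ in range(rounds):
        params = gossip_round(params)
    return params

# Hypothetical scalar "parameters" held by four actor-learners.
initial = [0.0, 4.0, 8.0, 12.0]
final = run_gossip(initial, rounds=50)
```

Because every agent's value is split evenly between itself and one peer, the average across agents is preserved, and repeated rounds pull all agents toward that average.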
Hard Pixel Mining for Depth Privileged Semantic Segmentation
Title | Hard Pixel Mining for Depth Privileged Semantic Segmentation |
Authors | Zhangxuan Gu, Li Niu, Haohua Zhao, Liqing Zhang |
Abstract | Semantic segmentation has achieved remarkable progress but remains challenging due to complex scenes, object occlusion, and so on. Some research works have attempted to use extra information such as a depth map to help RGB-based semantic segmentation, because the depth map can provide complementary geometric cues. However, due to the inaccessibility of depth sensors, depth information is usually unavailable for the test images. In this paper, we use the depth of training images as privileged information to mine the hard pixels in semantic segmentation; that is, depth information is available for training images but not for test images. Specifically, we propose a novel Loss Weight Module, which outputs a loss weight map by employing two depth-related measurements of hard pixels: Depth Prediction Error and Depth-aware Segmentation Error. The loss weight map is then applied to the segmentation loss, with the goal of learning a more robust model by paying more attention to the hard pixels. In addition, we explore a curriculum learning strategy based on the loss weight map. Meanwhile, to fully mine the hard pixels on different scales, we apply our loss weight module to multi-scale side outputs. Our hard pixel mining method achieves state-of-the-art results on two benchmark datasets, and even outperforms methods that need depth input during testing. |
Tasks | Depth Estimation, Semantic Segmentation |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11437v5 |
PDF | https://arxiv.org/pdf/1906.11437v5.pdf |
PWC | https://paperswithcode.com/paper/hard-pixels-mining-learning-using-privileged |
Repo | https://github.com/strivebo/image_segmentation_dl |
Framework | tf |
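The abstract above applies a loss weight map to the segmentation loss so that hard pixels count more. A minimal sketch of that final step follows, assuming a per-pixel cross-entropy loss normalised by the total weight. In the paper the map comes from the Loss Weight Module; here it is set by hand, and the function name and normalisation are assumptions.

```python
import math

def weighted_segmentation_loss(probs, labels, weight_map):
    """Weighted mean of per-pixel cross-entropy.

    probs:      per-pixel lists of class probabilities
    labels:     per-pixel ground-truth class indices
    weight_map: per-pixel non-negative loss weights
    """
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for pixel_probs, label, weight in zip(probs, labels, weight_map):
        total += weight * -math.log(pixel_probs[label] + eps)
    return total / sum(weight_map)

# Hypothetical three-pixel image with two classes.
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
labels = [0, 0, 1]
uniform = weighted_segmentation_loss(probs, labels, [1.0, 1.0, 1.0])
# Pixel 1 is misclassified, so a hard-pixel weight map would emphasise it.
focused = weighted_segmentation_loss(probs, labels, [1.0, 3.0, 1.0])
```

Raising the weight of the misclassified pixel increases the loss relative to uniform weighting, which is what steers training toward the hard pixels.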
META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation
Title | META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation |
Authors | Mingde Zhao, Sitao Luan, Ian Porada, Xiao-Wen Chang, Doina Precup |
Abstract | Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies. TD-learning with eligibility traces provides a way to boost sample efficiency by temporal credit assignment, i.e. deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter $\lambda$. However, tuning this parameter can be time-consuming, and not tuning it can lead to inefficient learning. For better sample efficiency of TD-learning, we propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner. The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online, incurring roughly the same computational complexity per step as the usual value learner. Our approach can be used both in on-policy and off-policy learning. We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error. This method can be viewed as a plugin to assist prediction with function approximation by meta-learning feature (observation)-based $\lambda$ online, or even in the control case to assist policy improvement. Our empirical evaluation demonstrates significant performance improvements, as well as improved robustness of the proposed algorithm to learning rate variation. |
Tasks | Meta-Learning |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11439v5 |
PDF | https://arxiv.org/pdf/1904.11439v5.pdf |
PWC | https://paperswithcode.com/paper/faster-and-more-accurate-learning-with-meta |
Repo | https://github.com/PwnerHarry/MTA |
Framework | none |
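The abstract above adjusts the eligibility trace parameter $\lambda$ per state. The sketch below shows tabular TD($\lambda$) with accumulating traces and a state-dependent $\lambda$, which is the setting the method operates in. The $\lambda$ values are set by hand rather than meta-learned, and the two-state chain, step size, and function name are assumptions for illustration.

```python
def td_lambda_episode(values, episode, lambdas, alpha=0.5, gamma=1.0):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    episode: list of (state, reward, next_state); next_state None = terminal
    lambdas: per-state trace-decay parameter lambda(s)
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in episode:
        # Decay all traces using the lambda of the current state, then
        # bump the trace of the state being visited.
        for s in traces:
            traces[s] *= gamma * lambdas[state]
        traces[state] += 1.0
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        # Every state is updated in proportion to its trace.
        for s in values:
            values[s] += alpha * td_error * traces[s]
    return values

# Hypothetical two-state chain: 0 -> 1 -> terminal, reward 1 on the last step.
values = {0: 0.0, 1: 0.0}
lambdas = {0: 0.9, 1: 0.3}   # hand-set here; the paper meta-learns these
episode = [(0, 0.0, 1), (1, 1.0, None)]
for _ in range(200):
    td_lambda_episode(values, episode, lambdas)
```

On this chain both state values approach the true return of 1; the trace is what lets the reward on the last step also update state 0 within the same episode.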
Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data
Title | Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data |
Authors | Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu |
Abstract | Neural machine translation systems have become state-of-the-art approaches for the Grammatical Error Correction (GEC) task. In this paper, we propose a copy-augmented architecture for the GEC task that copies the unchanged words from the source sentence to the target sentence. Since GEC suffers from not having enough labeled training data to achieve high accuracy, we pre-train the copy-augmented architecture with a denoising auto-encoder using the unlabeled One Billion Benchmark and compare the fully pre-trained model with a partially pre-trained model. This is the first time that copying words from the source context and fully pre-training a sequence-to-sequence model have been tried on the GEC task. Moreover, we add token-level and sentence-level multi-task learning for the GEC task. The evaluation results on the CoNLL-2014 test set show that our approach outperforms all recently published state-of-the-art results by a large margin. The code and pre-trained models are released at https://github.com/zhawe01/fairseq-gec. |
Tasks | Denoising, Grammatical Error Correction, Machine Translation, Multi-Task Learning |
Published | 2019-03-01 |
URL | https://arxiv.org/abs/1903.00138v3 |
PDF | https://arxiv.org/pdf/1903.00138v3.pdf |
PWC | https://paperswithcode.com/paper/improving-grammatical-error-correction-via |
Repo | https://github.com/zhawe01/fairseq-gec |
Framework | pytorch |
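The copy-augmented architecture in the abstract above lets the model copy unchanged words from the source sentence. One common way to realise this, sketched below, is to mix the decoder's generation distribution with a copy distribution built from attention over the source tokens. The mixing formula, the `alpha_copy` balance parameter, and all names here are assumptions for illustration; the abstract does not give the exact formulation.

```python
def copy_augmented_distribution(gen_probs, copy_attention, source_ids, alpha_copy):
    """Mix a generation distribution with a copy distribution.

    gen_probs:      probability per vocabulary id from the decoder
    copy_attention: attention weight per source position (sums to 1)
    source_ids:     vocabulary id of the token at each source position
    alpha_copy:     balance in [0, 1]; 1.0 means "always copy"
    """
    # Scatter attention mass onto the vocabulary ids of the source tokens.
    copy_probs = [0.0] * len(gen_probs)
    for weight, token_id in zip(copy_attention, source_ids):
        copy_probs[token_id] += weight
    return [(1.0 - alpha_copy) * g + alpha_copy * c
            for g, c in zip(gen_probs, copy_probs)]

# Hypothetical 4-word vocabulary and a 3-token source sentence.
gen_probs = [0.1, 0.2, 0.3, 0.4]
mixed = copy_augmented_distribution(
    gen_probs, copy_attention=[0.5, 0.25, 0.25], source_ids=[2, 0, 2],
    alpha_copy=0.5)
```

The mixture is still a valid probability distribution, and words that appear in the source (ids 0 and 2 here) gain probability mass relative to generation alone.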
TEASPN: Framework and Protocol for Integrated Writing Assistance Environments
Title | TEASPN: Framework and Protocol for Integrated Writing Assistance Environments |
Authors | Masato Hagiwara, Takumi Ito, Tatsuki Kuribayashi, Jun Suzuki, Kentaro Inui |
Abstract | Language technologies play a key role in assisting people with their writing. Although there has been steady progress in areas such as grammatical error correction (GEC), human writers have yet to benefit from this progress due to the high development cost of integrating with writing software. We propose TEASPN, a protocol and an open-source framework for achieving integrated writing assistance environments. The protocol standardizes the way writing software communicates with servers that implement such technologies, allowing developers and researchers to integrate the latest developments in natural language processing (NLP) at low cost. As a result, users can enjoy the integrated experience in their favorite writing software. The results from experiments with human participants show that users use a wide range of technologies and rate their writing experience favorably, allowing them to write more fluent text. |
Tasks | Grammatical Error Correction |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02621v1 |
PDF | https://arxiv.org/pdf/1909.02621v1.pdf |
PWC | https://paperswithcode.com/paper/teaspn-framework-and-protocol-for-integrated |
Repo | https://github.com/teaspn/teaspn-sdk |
Framework | none |
Face Recognition via Locality Constrained Low Rank Representation and Dictionary Learning
Title | Face Recognition via Locality Constrained Low Rank Representation and Dictionary Learning |
Authors | He-Feng Yin, Xiao-Jun Wu, Josef Kittler |
Abstract | Face recognition has been widely studied due to its importance in smart city applications. However, the case in which both training and test images are corrupted has not been well solved. To address this problem, this paper proposes a locality constrained low rank representation and dictionary learning (LCLRRDL) algorithm for robust face recognition. In particular, we present three contributions in the proposed formulation. First, a low-rank representation is introduced to handle the possible contamination of the training as well as test data. Second, a locality constraint is incorporated to acknowledge the intrinsic manifold structure of training data. With the locality constraint term, our scheme induces similar samples to have similar representations. Third, a compact dictionary is learned to handle the problem of corrupted data. The experimental results on two public databases demonstrate the effectiveness of the proposed approach. Matlab code of our proposed LCLRRDL can be downloaded from https://github.com/yinhefeng/LCLRRDL. |
Tasks | Dictionary Learning, Face Recognition, Robust Face Recognition |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03145v1 |
PDF | https://arxiv.org/pdf/1912.03145v1.pdf |
PWC | https://paperswithcode.com/paper/face-recognition-via-locality-constrained-low |
Repo | https://github.com/yinhefeng/LCLRRDL |
Framework | none |
On the Connection Between Adversarial Robustness and Saliency Map Interpretability
Title | On the Connection Between Adversarial Robustness and Saliency Map Interpretability |
Authors | Christian Etmann, Sebastian Lunz, Peter Maass, Carola-Bibiane Schönlieb |
Abstract | Recent studies on the adversarial vulnerability of neural networks have shown that models trained to be more robust to adversarial attacks exhibit more interpretable saliency maps than their non-robust counterparts. We aim to quantify this behavior by considering the alignment between input image and saliency map. We hypothesize that as the distance to the decision boundary grows, so does the alignment. This connection is strictly true in the case of linear models. We confirm these theoretical findings with experiments based on models trained with a local Lipschitz regularization and identify where the non-linear nature of neural networks weakens the relation. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04172v1 |
PDF | https://arxiv.org/pdf/1905.04172v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-connection-between-adversarial |
Repo | https://github.com/cetmann/robustness-interpretability |
Framework | tf |
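The abstract above states that the link between alignment and distance to the decision boundary holds strictly for linear models. The sketch below checks this numerically for a linear binary classifier without a bias term, assuming alignment is measured as |<x, s>| / ||s|| for a saliency map s equal to the input gradient. That definition, the function names, and the example vectors are assumptions for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def alignment(x, saliency):
    """|<x, saliency>| / ||saliency||: how well the input aligns with the map."""
    return abs(dot(x, saliency)) / norm(saliency)

def distance_to_boundary(x, w, b=0.0):
    """Euclidean distance from x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / norm(w)

# Hypothetical linear binary classifier f(x) = w.x with no bias term.
w = [3.0, 4.0]
x = [2.0, 1.0]
saliency = w  # the gradient of f with respect to x is w for a linear model
```

For this classifier `alignment(x, saliency)` and `distance_to_boundary(x, w)` are the same quantity, so one grows exactly when the other does. Adding a non-zero bias breaks the equality, since the distance then depends on `b` while the alignment does not.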
JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python
Title | JAX, M.D.: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python |
Authors | Samuel S. Schoenholz, Ekin D. Cubuk |
Abstract | A large fraction of computational science involves simulating the dynamics of particles that interact via pairwise or many-body interactions. These simulations, called Molecular Dynamics (MD), span a vast range of subjects from physics and materials science to biochemistry and drug discovery. Most MD software involves significant use of handwritten derivatives and code reuse across C++, FORTRAN, and CUDA. This is reminiscent of the state of machine learning before automatic differentiation became popular. In this work we bring the substantial advances in software that have taken place in machine learning to MD with JAX, M.D. (JAX MD). JAX MD is an end-to-end differentiable MD package written entirely in Python that can be just-in-time compiled to CPU, GPU, or TPU. JAX MD allows researchers to iterate extremely quickly and lets researchers easily incorporate machine learning models into their workflows. Finally, since all of the simulation code is written in Python, researchers can have unprecedented flexibility in setting up experiments without having to edit any low-level C++ or CUDA code. In addition to making existing workloads easier, JAX MD allows researchers to take derivatives through whole-simulations as well as seamlessly incorporate neural networks into simulations. This paper explores the architecture of JAX MD and its capabilities through several vignettes. Code is available at www.github.com/google/jax-md. We also provide an interactive Colab notebook that goes through all of the experiments discussed in the paper. |
Tasks | Drug Discovery |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04232v1 |
PDF | https://arxiv.org/pdf/1912.04232v1.pdf |
PWC | https://paperswithcode.com/paper/jax-md-end-to-end-differentiable-hardware-1 |
Repo | https://github.com/google/jax-md |
Framework | jax |