April 3, 2020

3079 words 15 mins read

Paper Group AWR 32



Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

Title Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals
Authors Eric Guizzo, Tillman Weyde, Jack Barnett Leveson
Abstract Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method to create flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, to increase temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using 4 datasets of different sizes. The results show that the use of MTS layers consistently improves the generalization of networks of different capacity and depth, compared to standard convolution, especially on smaller datasets.
Tasks Emotion Recognition
Published 2020-03-06
URL https://arxiv.org/abs/2003.03375v1
PDF https://arxiv.org/pdf/2003.03375v1.pdf
PWC https://paperswithcode.com/paper/multi-time-scale-convolution-for-emotion
Repo https://github.com/ericguizzo/multi_time_scale
Framework pytorch
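
The core mechanism is easy to sketch: one set of trainable kernel weights is resampled along the time axis at several scales before convolution, so temporal flexibility comes at no extra parameter cost. Below is a minimal PyTorch sketch of that idea; the scale factors and the max-over-scales aggregation are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Minimal sketch of a multi-time-scale conv layer: a single trainable kernel
# is resampled along the time axis at several scales; outputs are max-pooled
# over scales. Scales and the max aggregation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTSConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=(5, 5), scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        # One set of trainable weights shared by all scales.
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, *kernel_size) * 0.01)

    def forward(self, x):  # x: (batch, in_ch, freq, time)
        outs = []
        for s in self.scales:
            # Resample the kernel along the time axis only.
            w = F.interpolate(self.weight, scale_factor=(1, s),
                              mode='bilinear', align_corners=False)
            outs.append(F.conv2d(x, w, padding='same'))
        return torch.stack(outs, dim=0).max(dim=0).values

x = torch.randn(2, 1, 128, 256)   # e.g. a batch of mel-spectrograms
y = MTSConv2d(1, 16)(x)
print(y.shape)                    # torch.Size([2, 16, 128, 256])
```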

GANILLA: Generative Adversarial Networks for Image to Illustration Translation

Title GANILLA: Generative Adversarial Networks for Image to Illustration Translation
Authors Samet Hicsonmez, Nermin Samet, Emre Akbas, Pinar Duygulu
Abstract In this paper, we explore illustrations in children’s books as a new domain in unpaired image-to-image translation. We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content. There are no well-defined or agreed-upon evaluation metrics for unpaired image-to-image translation. So far, the success of image translation models has been based on subjective, qualitative visual comparison on a limited number of images. To address this problem, we propose a new framework for the quantitative evaluation of image-to-illustration models, where both content and style are taken into account using separate classifiers. In this new evaluation framework, our proposed model performs better than the current state-of-the-art models on the illustrations dataset. Our code and pretrained models can be found at https://github.com/giddyyupp/ganilla.
Tasks Image-to-Image Translation
Published 2020-02-13
URL https://arxiv.org/abs/2002.05638v2
PDF https://arxiv.org/pdf/2002.05638v2.pdf
PWC https://paperswithcode.com/paper/ganilla-generative-adversarial-networks-for
Repo https://github.com/giddyyupp/ganilla
Framework pytorch
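
The proposed evaluation framework scores content and style separately with two classifiers. The sketch below illustrates that protocol under assumed interfaces (a loader yielding translated images with content and style labels, plus two pretrained classifiers); it is not the authors' exact evaluation code.

```python
# Hedged sketch of the two-classifier evaluation: a content classifier and a
# style (illustrator) classifier score translated images on separate axes.
import torch

@torch.no_grad()
def evaluate(translated_loader, content_clf, style_clf):
    content_hits = style_hits = n = 0
    for imgs, content_labels, style_labels in translated_loader:
        content_hits += (content_clf(imgs).argmax(1) == content_labels).sum().item()
        style_hits += (style_clf(imgs).argmax(1) == style_labels).sum().item()
        n += imgs.size(0)
    # A good image-to-illustration model should score high on both axes.
    return content_hits / n, style_hits / n
```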

MNN: A Universal and Efficient Inference Engine

Title MNN: A Universal and Efficient Inference Engine
Authors Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lv, Zhihua Wu
Abstract Deploying deep learning models on mobile devices has drawn increasing attention recently. However, designing an efficient on-device inference engine faces the great challenges of model compatibility, device diversity, and resource limitation. To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications. In this paper, the contributions of MNN include: (1) presenting a mechanism called pre-inference that manages to conduct runtime optimization; (2) delivering thorough kernel optimization on operators to achieve optimal computation performance; (3) introducing a backend abstraction module which enables hybrid scheduling and keeps the engine lightweight. Extensive benchmark experiments demonstrate that MNN performs favorably against other popular lightweight deep learning frameworks. MNN is available to the public at: https://github.com/alibaba/MNN.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12418v1
PDF https://arxiv.org/pdf/2002.12418v1.pdf
PWC https://paperswithcode.com/paper/mnn-a-universal-and-efficient-inference
Repo https://github.com/alibaba/MNN
Framework tf
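
To make the backend-abstraction contribution concrete, here is a toy Python illustration of the idea: operators are defined once against an abstract backend, so kernels can be swapped per device and a scheduler can mix devices without touching the graph definition. This is purely illustrative and does not reflect MNN's actual (C++) API.

```python
# Toy illustration of backend abstraction (not MNN's real API): the engine
# depends only on the Backend interface; each device supplies its own kernels.
class Backend:
    def matmul(self, a, b):
        raise NotImplementedError

class CPUBackend(Backend):
    def matmul(self, a, b):
        import numpy as np
        return np.asarray(a) @ np.asarray(b)

class Engine:
    def __init__(self, backend: Backend):
        # A hybrid scheduler would pick a backend per operator here.
        self.backend = backend

    def run(self, a, b):
        return self.backend.matmul(a, b)

print(Engine(CPUBackend()).run([[1, 2]], [[3], [4]]))   # [[11]]
```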

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

Title Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Authors Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang
Abstract Universal style transfer methods typically leverage rich representations from deep Convolutional Neural Network (CNN) models (e.g., VGG-19) pre-trained on large collections of images. Despite their effectiveness, these methods are heavily constrained by the large model size when handling ultra-resolution images with limited memory. In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters. The main idea is underpinned by the finding that encoder-decoder pairs construct an exclusive collaborative relationship, which is regarded as a new kind of knowledge for style transfer models. Moreover, to overcome the feature size mismatch when applying collaborative distillation, a linear embedding loss is introduced to drive the student network to learn a linear embedding of the teacher's features. Extensive experiments show the effectiveness of our method when applied to different universal style transfer approaches (WCT and AdaIN), even with the model size reduced by 15.5 times. In particular, on WCT with the compressed models, we achieve ultra-resolution (over 40 megapixels) universal style transfer on a 12GB GPU for the first time. Further experiments on an optimization-based stylization scheme show the generality of our algorithm across different stylization paradigms. Our code and trained models are available at https://github.com/mingsun-tse/collaborative-distillation.
Tasks Style Transfer
Published 2020-03-18
URL https://arxiv.org/abs/2003.08436v2
PDF https://arxiv.org/pdf/2003.08436v2.pdf
PWC https://paperswithcode.com/paper/collaborative-distillation-for-ultra
Repo https://github.com/mingsun-tse/collaborative-distillation
Framework pytorch
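
The linear embedding loss admits a compact sketch: a learned linear map lifts the student's smaller features into the teacher's feature space, where they are matched. The PyTorch snippet below is a minimal sketch; the feature dimensions and plain MSE matching are assumptions for illustration.

```python
# Sketch of a linear embedding loss: student features are linearly mapped
# to the teacher's feature dimension and matched with MSE.
import torch
import torch.nn as nn

student_dim, teacher_dim = 64, 512
embed = nn.Linear(student_dim, teacher_dim, bias=False)  # trained jointly

def linear_embedding_loss(f_student, f_teacher):
    # f_*: (batch, channels, H, W) -> treat each spatial location as a vector
    s = f_student.flatten(2).transpose(1, 2)   # (batch, HW, student_dim)
    t = f_teacher.flatten(2).transpose(1, 2)   # (batch, HW, teacher_dim)
    return ((embed(s) - t) ** 2).mean()
```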

An Explicit Local and Global Representation Disentanglement Framework with Applications in Deep Clustering and Unsupervised Object Detection

Title An Explicit Local and Global Representation Disentanglement Framework with Applications in Deep Clustering and Unsupervised Object Detection
Authors Rujikorn Charakorn, Yuttapong Thawornwattana, Sirawaj Itthipuripat, Nick Pawlowski, Poramate Manoonpong, Nat Dilokthanakul
Abstract Visual data can be understood at different levels of granularity, where global features correspond to semantic-level information and local features correspond to texture patterns. In this work, we propose a framework, called SPLIT, which allows us to disentangle local and global information into two separate sets of latent variables within the variational autoencoder (VAE) framework. Our framework adds a generative assumption to the VAE by requiring a subset of the latent variables to generate an auxiliary set of observable data. This additional generative assumption primes the latent variables toward local information and encourages the other latent variables to represent global information. We examine three different flavours of VAEs with different generative assumptions. We show that the framework can effectively disentangle local and global information within these models, leading to improved representations with better performance on clustering and unsupervised object detection benchmarks. Finally, we establish connections between SPLIT and recent research in cognitive neuroscience regarding the disentanglement in human visual perception. The code for our experiments is at https://github.com/51616/split-vae.
Tasks Object Detection, Representation Learning, Style Transfer
Published 2020-01-24
URL https://arxiv.org/abs/2001.08957v2
PDF https://arxiv.org/pdf/2001.08957v2.pdf
PWC https://paperswithcode.com/paper/an-explicit-local-and-global-representation
Repo https://github.com/51616/split-vae
Framework tf
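
A rough sketch of the SPLIT objective, assuming the auxiliary observable data are image patches and the latent vector is split in half: only the "local" half feeds the auxiliary decoder, which primes it toward texture and leaves global content to the other half. The encoder/decoder interfaces here are assumed, not the paper's exact architecture.

```python
# Sketch of the SPLIT loss: z_local alone must generate auxiliary patches,
# while the full latent reconstructs the image. Interfaces are assumptions.
import torch

def split_losses(x, patches, encoder, decoder, patch_decoder):
    mu, logvar = encoder(x)                                  # assumed interface
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterize
    z_local, z_global = z.chunk(2, dim=1)
    recon = ((decoder(torch.cat([z_local, z_global], 1)) - x) ** 2).mean()
    aux = ((patch_decoder(z_local) - patches) ** 2).mean()   # local-only branch
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + aux + kl
```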

Continual Local Replacement for Few-shot Learning

Title Continual Local Replacement for Few-shot Learning
Authors Canyu Le, Zhonggui Chen, Xihan Wei, Biao Wang, Lei Zhang
Abstract The goal of few-shot learning is to learn a model that can recognize novel classes based on one or a few training examples. It is challenging mainly due to two aspects: (1) it lacks good feature representations of novel classes; (2) a few labeled examples cannot accurately represent the true data distribution, making it hard to learn a good decision function for classification. In this work, we use a sophisticated network architecture to learn better feature representations and focus on the second issue. A novel continual local replacement strategy is proposed to address the data deficiency problem. It takes advantage of the content in unlabeled images to continually enhance labeled ones. Specifically, a pseudo labeling method is adopted to constantly select semantically similar images on the fly. Original labeled images are then locally replaced by the selected images for the next epoch of training. In this way, the model can directly learn new semantic information from unlabeled images and the capacity of supervised signals in the embedding space can be significantly enlarged. This allows the model to improve generalization and learn a better decision boundary for classification. Our method is conceptually simple and easy to implement. Extensive experiments demonstrate that it achieves state-of-the-art results on various few-shot image recognition benchmarks.
Tasks Few-Shot Learning
Published 2020-01-23
URL https://arxiv.org/abs/2001.08366v2
PDF https://arxiv.org/pdf/2001.08366v2.pdf
PWC https://paperswithcode.com/paper/continual-local-replacement-for-few-shot
Repo https://github.com/Lecanyu/ContinualLocalReplacement
Framework pytorch
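
The replacement step itself can be sketched in a few lines: the current model pseudo-labels the unlabeled pool, an image predicted to share the label is chosen, and a local region of it is pasted into the labeled image for the next epoch. The region size and random selection below are assumptions.

```python
# Hedged sketch of continual local replacement: paste a region from a
# pseudo-labeled, same-class unlabeled image into the labeled image.
import torch

def local_replace(labeled_img, label, unlabeled_imgs, model, patch=16):
    with torch.no_grad():
        preds = model(unlabeled_imgs).argmax(1)       # pseudo labels on the fly
    same = (preds == label).nonzero(as_tuple=True)[0]
    if len(same) == 0:
        return labeled_img
    donor = unlabeled_imgs[same[torch.randint(len(same), (1,))].item()]
    _, h, w = labeled_img.shape
    y = torch.randint(0, h - patch + 1, (1,)).item()
    x = torch.randint(0, w - patch + 1, (1,)).item()
    out = labeled_img.clone()
    out[:, y:y + patch, x:x + patch] = donor[:, y:y + patch, x:x + patch]
    return out   # trained with the original label in the next epoch
```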

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Title FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Authors Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel
Abstract Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model’s performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model’s predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels – just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch’s success. We make our code available at https://github.com/google-research/fixmatch.
Tasks Semi-Supervised Image Classification
Published 2020-01-21
URL https://arxiv.org/abs/2001.07685v1
PDF https://arxiv.org/pdf/2001.07685v1.pdf
PWC https://paperswithcode.com/paper/fixmatch-simplifying-semi-supervised-learning
Repo https://github.com/kekmodel/FixMatch-pytorch
Framework pytorch
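
The core FixMatch update is compact enough to state directly: pseudo-label the weakly augmented view, keep the label only above a confidence threshold, and train the strongly augmented view against it. The sketch below follows the paper's description (the 0.95 threshold matches the paper); the augmentation functions are assumed to exist elsewhere.

```python
# The FixMatch loss in a few lines: supervised CE plus a masked CE on
# strongly augmented views against high-confidence pseudo-labels.
import torch
import torch.nn.functional as F

def fixmatch_loss(model, x_lab, y_lab, x_weak, x_strong, tau=0.95, lam=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=1)       # weak view -> pseudo-label
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= tau).float()                  # keep confident ones only
    unsup = (F.cross_entropy(model(x_strong), pseudo, reduction='none') * mask).mean()
    return sup + lam * unsup
```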

Adversarial Detection and Correction by Matching Prediction Distributions

Title Adversarial Detection and Correction by Matching Prediction Distributions
Authors Giovanni Vacanti, Arnaud Van Looveren
Abstract We present a novel adversarial detection and correction method for machine learning classifiers. The detector consists of an autoencoder trained with a custom loss function based on the Kullback-Leibler divergence between the classifier predictions on the original and reconstructed instances. The method is unsupervised, easy to train and does not require any knowledge about the underlying attack. The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST, and remains very effective on CIFAR-10 when the attack is granted full access to the classification model but not the defence. We show that our method is still able to detect adversarial examples in the case of a white-box attack, where the attacker has full knowledge of both the model and the defence, and we investigate the robustness of the attack. The method is very flexible and can also be used to detect common data corruptions and perturbations which negatively impact model performance. We illustrate this capability on the CIFAR-10-C dataset.
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09364v1
PDF https://arxiv.org/pdf/2002.09364v1.pdf
PWC https://paperswithcode.com/paper/adversarial-detection-and-correction-by
Repo https://github.com/SeldonIO/alibi-detect
Framework tf
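
The training objective follows directly from the abstract: fit an autoencoder so that the classifier's predictive distribution on a reconstruction matches the one on the original input, and reuse the same divergence as the detection score at test time. This is a from-scratch sketch, not the alibi-detect implementation.

```python
# Sketch of the detector objective: KL between classifier predictions on the
# original input and on its autoencoder reconstruction; large KL flags attacks.
import torch
import torch.nn.functional as F

def detector_loss(classifier, autoencoder, x):
    x_recon = autoencoder(x)
    with torch.no_grad():
        p = F.softmax(classifier(x), dim=1)               # target distribution
    log_q = F.log_softmax(classifier(x_recon), dim=1)
    return F.kl_div(log_q, p, reduction='batchmean')      # KL(p || q)

def adversarial_score(classifier, autoencoder, x):
    # Same quantity at test time; threshold it to decide "adversarial or not".
    return detector_loss(classifier, autoencoder, x)
```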

Convex Density Constraints for Computing Plausible Counterfactual Explanations

Title Convex Density Constraints for Computing Plausible Counterfactual Explanations
Authors André Artelt, Barbara Hammer
Abstract The increasing deployment of machine learning as well as legal regulations such as EU’s GDPR cause a need for user-friendly explanations of decisions proposed by machine learning models. Counterfactual explanations are considered as one of the most popular techniques to explain a specific decision of a model. While the computation of “arbitrary” counterfactual explanations is well studied, it is still an open research problem how to efficiently compute plausible and feasible counterfactual explanations. We build upon recent work and propose and study a formal definition of plausible counterfactual explanations. In particular, we investigate how to use density estimators for enforcing plausibility and feasibility of counterfactual explanations. For the purpose of efficient computations, we propose convex density constraints that ensure that the resulting counterfactual is located in a region of the data space of high density.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.04862v1
PDF https://arxiv.org/pdf/2002.04862v1.pdf
PWC https://paperswithcode.com/paper/convex-density-constraints-for-computing
Repo https://github.com/andreArtelt/ConvexDensityConstraintsForPlausibleCounterfactuals
Framework none
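
A toy version of the construction, assuming a linear classifier and a single Gaussian density component (whose log-density level set yields a convex quadratic constraint), can be written with cvxpy as follows. The paper's actual formulation covers richer density estimators; this sketch only conveys the shape of the optimization problem.

```python
# Toy plausible-counterfactual problem: closest point on the target-class
# side of a linear classifier that stays inside a high-density ellipsoid.
import cvxpy as cp
import numpy as np

def plausible_counterfactual(x, w, b, mu, Sigma_inv, density_thresh):
    # Sigma_inv must be symmetric PSD for quad_form to be convex.
    xcf = cp.Variable(x.shape[0])
    constraints = [
        w @ xcf + b >= 0,                                     # target-class side
        cp.quad_form(xcf - mu, Sigma_inv) <= density_thresh,  # dense region
    ]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(xcf - x)), constraints)
    prob.solve()
    return xcf.value
```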

Improving Candidate Generation for Low-resource Cross-lingual Entity Linking

Title Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
Authors Shuyan Zhou, Shruti Rijhwani, John Wieting, Jaime Carbonell, Graham Neubig
Abstract Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages (HRL), but these do not extend well to low-resource languages (LRL) with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the LRL by utilizing resources in closely-related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple, but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared to state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL.
Tasks Cross-Lingual Entity Linking, Entity Linking, Transfer Learning
Published 2020-03-03
URL https://arxiv.org/abs/2003.01343v1
PDF https://arxiv.org/pdf/2003.01343v1.pdf
PWC https://paperswithcode.com/paper/improving-candidate-generation-for-low
Repo https://github.com/shuyanzhou/pbel_plus
Framework pytorch
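
For context, candidate generation can be as simple as ranking KB entries by string similarity to the mention; the sketch below implements a generic character n-gram scorer and the top-k cut on which gold-candidate recall is measured. It illustrates the task setup only, not the paper's proposed improvements.

```python
# Generic candidate generation: rank KB entries by character trigram overlap
# with the mention and keep the top k (e.g. k=30 for Top-30 recall).
from collections import Counter

def ngrams(s, n=3):
    s = f"#{s.lower()}#"
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def top_k_candidates(mention, kb_entries, k=30):
    m = ngrams(mention)
    def score(entry):
        c = ngrams(entry)
        overlap = sum((m & c).values())
        return overlap / max(sum(m.values()), sum(c.values()), 1)
    return sorted(kb_entries, key=score, reverse=True)[:k]
```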

Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data

Title Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data
Authors Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson
Abstract The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group with a surjective exponential map. Incorporating equivariance to a new group requires implementing only the group exponential and logarithm maps, enabling rapid prototyping. Showcasing the simplicity and generality of our method, we apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems. For Hamiltonian systems, the equivariance of our models is especially impactful, leading to exact conservation of linear and angular momentum.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.12880v1
PDF https://arxiv.org/pdf/2002.12880v1.pdf
PWC https://paperswithcode.com/paper/generalizing-convolutional-neural-networks
Repo https://github.com/mfinzi/LieConv
Framework pytorch
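
A heavily simplified sketch of the kernel parameterization: the convolution weights are an MLP over group log-map coordinates of point pairs. For the translation group the log map reduces to coordinate differences, which is all this toy layer uses; the paper's full method additionally handles lifting, sampling, and general Lie groups.

```python
# Toy LieConv-style layer for the translation group: kernel weights are an
# MLP of pairwise displacements (the translation-group log map).
import torch
import torch.nn as nn

class LiftedConv(nn.Module):
    def __init__(self, c_in, c_out, coord_dim=2):
        super().__init__()
        self.kernel = nn.Sequential(
            nn.Linear(coord_dim, 32), nn.ReLU(), nn.Linear(32, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, coords, feats):
        # coords: (N, d), feats: (N, c_in); log map for translations = difference
        diff = coords[:, None, :] - coords[None, :, :]           # (N, N, d)
        w = self.kernel(diff).view(*diff.shape[:2], self.c_out, self.c_in)
        return torch.einsum('nmoi,mi->no', w, feats) / feats.shape[0]

coords, feats = torch.randn(10, 2), torch.randn(10, 3)
print(LiftedConv(3, 8)(coords, feats).shape)   # torch.Size([10, 8])
```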

Predict your Click-out: Modeling User-Item Interactions and Session Actions in an Ensemble Learning Fashion

Title Predict your Click-out: Modeling User-Item Interactions and Session Actions in an Ensemble Learning Fashion
Authors Andrea Fiandro, Giorgio Crepaldi, Diego Monti, Giuseppe Rizzo, Maurizio Morisio
Abstract This paper describes the solution of the POLINKS team to the RecSys Challenge 2019, which focuses on the task of predicting the last click-out in a session-based interaction. We propose an ensemble approach comprising a matrix factorization for modeling user-item interactions and a session-aware learning model implemented with a recurrent neural network. This method appears to be effective in predicting the last click-out, achieving a Mean Reciprocal Rank of 0.60277 on the local test set.
Tasks
Published 2020-02-08
URL https://arxiv.org/abs/2002.03124v1
PDF https://arxiv.org/pdf/2002.03124v1.pdf
PWC https://paperswithcode.com/paper/predict-your-click-out-modeling-user-item
Repo https://github.com/D2KLab/touringrec
Framework none
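
The ensemble can be summarized as a score blend: rank candidate items by a weighted combination of the matrix-factorization score for the user-item pair and the session RNN's score. The blend weight below, and the simple linear combination itself, are assumptions for illustration; the paper's exact combination may differ.

```python
# Sketch of the ensemble ranking: blend MF and session-RNN scores per item.
def ensemble_rank(user, session, items, mf_score, rnn_score, alpha=0.5):
    # mf_score(user, item) and rnn_score(session, item) are assumed callables.
    return sorted(items,
                  key=lambda it: alpha * mf_score(user, it)
                                 + (1 - alpha) * rnn_score(session, it),
                  reverse=True)
```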

Centralized and distributed online learning for sparse time-varying optimization

Title Centralized and distributed online learning for sparse time-varying optimization
Authors Sophie M. Fosson
Abstract The development of online algorithms to track time-varying systems has drawn a lot of attention in recent years, in particular in the framework of online convex optimization. Meanwhile, sparse time-varying optimization has emerged as a powerful tool to deal with widespread applications, ranging from dynamic compressed sensing to parsimonious system identification. In most of the literature on sparse time-varying problems, some prior information on the system’s evolution is assumed to be available. In contrast, in this paper, we propose an online learning approach which does not rely on a given model and is suitable for adversarial frameworks. Specifically, we develop centralized and distributed algorithms, and we theoretically analyze them in terms of dynamic regret, from an online learning perspective. Further, we present numerical experiments that illustrate their practical effectiveness.
Tasks
Published 2020-01-31
URL https://arxiv.org/abs/2001.11939v1
PDF https://arxiv.org/pdf/2001.11939v1.pdf
PWC https://paperswithcode.com/paper/centralized-and-distributed-online-learning
Repo https://github.com/sophie27/Sparse-Time-Varying-Optimization
Framework none
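
A plausible centralized update in this setting, assuming a least-squares loss per time step, is an online proximal gradient step: one gradient step on the newest loss followed by soft thresholding to promote sparsity. The step size and l1 weight below are illustrative; the paper's exact algorithms and their regret analysis are in the text.

```python
# Sketch of an online sparse tracking update (online ISTA-style step).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def online_sparse_step(w, A_t, y_t, step=0.01, lam=0.1):
    grad = A_t.T @ (A_t @ w - y_t)    # gradient of 0.5*||A_t w - y_t||^2
    return soft_threshold(w - step * grad, step * lam)
```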

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Title Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering
Authors Chihao Zhang, Yang Yang, Wei Zhang, Shihua Zhang
Abstract Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method on a single machine. Moreover, big data are often distributedly collected and stored on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computation: 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieve superior or competitive performance compared to other distributed methods.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03703v1
PDF https://arxiv.org/pdf/2002.03703v1.pdf
PWC https://paperswithcode.com/paper/distributed-bayesian-matrix-decomposition-for
Repo https://github.com/zhanglabtools/dbmd-dev
Framework none
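
The plug-in weighted average admits a one-function sketch: local estimates are combined with weights proportional to the inverse of each node's noise variance, which reduces the variance of the aggregate. The local decomposition step itself is omitted, and the interfaces are assumed.

```python
# Sketch of inverse-variance weighted averaging of per-node estimates.
import numpy as np

def weighted_average(local_estimates, noise_vars):
    w = 1.0 / np.asarray(noise_vars)   # noisier nodes get smaller weights
    w /= w.sum()
    return sum(wi * est for wi, est in zip(w, local_estimates))
```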

A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU

Title A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU
Authors Shuang Li, Jiaxi Jiang, Philipp Ruppel, Hongzhuo Liang, Xiaojian Ma, Norman Hendrich, Fuchun Sun, Jianwei Zhang
Abstract In this paper, we present a multimodal mobile teleoperation system that consists of a novel vision-based hand pose regression network (Transteleop) and an IMU-based arm tracking method. Transteleop observes the human hand through a low-cost depth camera and generates not only joint angles but also depth images of paired robot hand poses through an image-to-image translation process. A keypoint-based reconstruction loss explores the resemblance in appearance and anatomy between human and robotic hands and enriches the local features of reconstructed images. A wearable camera holder enables simultaneous hand-arm control and facilitates the mobility of the whole teleoperation system. Network evaluation results on a test dataset and a variety of complex manipulation tasks that go beyond simple pick-and-place operations show the efficiency and stability of our multimodal teleoperation system.
Tasks Image-to-Image Translation
Published 2020-03-11
URL https://arxiv.org/abs/2003.05212v1
PDF https://arxiv.org/pdf/2003.05212v1.pdf
PWC https://paperswithcode.com/paper/a-mobile-robot-hand-arm-teleoperation-system
Repo https://github.com/Smilels/multimodal-translation-teleop
Framework pytorch
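
The keypoint-based reconstruction loss can be sketched as a pixel-weighted image loss, where pixels near the hand keypoints are up-weighted so the generated depth image preserves local anatomy. The Gaussian weighting and its bandwidth below are assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of a keypoint-weighted reconstruction loss: up-weight pixels near
# hand keypoints via Gaussian bumps. Weighting scheme is an assumption.
import torch

def keypoint_recon_loss(pred, target, keypoints_uv, sigma=8.0):
    # pred/target: (B, 1, H, W); keypoints_uv: (B, K, 2) pixel coordinates
    B, _, H, W = pred.shape
    ys = torch.arange(H).view(1, 1, H, 1).float()
    xs = torch.arange(W).view(1, 1, 1, W).float()
    u = keypoints_uv[..., 0].view(B, -1, 1, 1)
    v = keypoints_uv[..., 1].view(B, -1, 1, 1)
    d2 = (xs - u) ** 2 + (ys - v) ** 2
    weight = 1.0 + torch.exp(-d2 / (2 * sigma ** 2)).sum(dim=1, keepdim=True)
    return (weight * (pred - target) ** 2).mean()
```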