October 20, 2019

3313 words 16 mins read

Paper Group AWR 339

Training Generative Reversible Networks. Deep Micro-Dictionary Learning and Coding Network. The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books. Nonparallel Emotional Speech Conversion. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. Generating Text through Adversarial Training using Skip-Tho …

Training Generative Reversible Networks

Title Training Generative Reversible Networks
Authors Robin Tibor Schirrmeister, Patryk Chrabąszcz, Frank Hutter, Tonio Ball
Abstract Generative models with an encoding component such as autoencoders currently receive great interest. However, training of autoencoders is typically complicated by the need to train a separate encoder and decoder model that have to be enforced to be reciprocal to each other. To overcome this problem, by-design reversible neural networks (RevNets) have previously been used as generative models, either directly optimizing the likelihood of the data under the model or using an adversarial approach on the generated data. Here, we instead investigate their performance using an adversary on the latent space in the adversarial autoencoder framework. We investigate the generative performance of RevNets on the CelebA dataset, showing that generative RevNets can generate coherent faces of similar quality to Variational Autoencoders. This first attempt to use RevNets inside the adversarial autoencoder framework slightly underperformed relative to recent advanced generative models using an autoencoder component on CelebA, but this gap may diminish with further optimization of the training setup of generative RevNets. In addition to the experiments on CelebA, we show a proof-of-principle experiment on the MNIST dataset suggesting that adversary-free trained RevNets can discover meaningful latent dimensions without pre-specifying the number of dimensions of the latent sampling distribution. In summary, this study shows that RevNets can be employed in different generative training settings. Source code for this study is at https://github.com/robintibor/generative-reversible
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01610v4
PDF http://arxiv.org/pdf/1806.01610v4.pdf
PWC https://paperswithcode.com/paper/training-generative-reversible-networks
Repo https://github.com/robintibor/generative-reversible
Framework pytorch
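
As a rough illustration of the latent-space adversary described in the entry above, here is a minimal PyTorch sketch of the adversarial-autoencoder objective. The placeholder linear `rev_net`, the `discriminator` architecture, and all sizes are hypothetical stand-ins; the paper's actual encoder is an invertible RevNet, which is omitted here.

```python
# Minimal sketch of an adversarial-autoencoder latent objective, assuming
# `rev_net` stands in for the paper's invertible encoder (hypothetical).
import torch
import torch.nn as nn

latent_dim = 64
rev_net = nn.Sequential(nn.Flatten(), nn.Linear(784, latent_dim))  # placeholder encoder
discriminator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()

def adversarial_latent_losses(x):
    z_fake = rev_net(x)                          # latent codes of real data
    z_real = torch.randn_like(z_fake)            # samples from the latent prior
    # Discriminator learns to tell prior samples from encoded data.
    d_loss = bce(discriminator(z_real), torch.ones(len(x), 1)) + \
             bce(discriminator(z_fake.detach()), torch.zeros(len(x), 1))
    # Encoder is trained so its codes look like prior samples.
    g_loss = bce(discriminator(z_fake), torch.ones(len(x), 1))
    return d_loss, g_loss

d_loss, g_loss = adversarial_latent_losses(torch.randn(8, 1, 28, 28))
```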

Deep Micro-Dictionary Learning and Coding Network

Title Deep Micro-Dictionary Learning and Coding Network
Authors Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe
Abstract In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN). DDLCN has most of the standard deep learning layers (pooling, fully-connected, input/output, etc.), but the main difference is that the fundamental convolutional layers are replaced by novel compound dictionary learning and coding layers. The dictionary learning layer learns an over-complete dictionary for the input training data. At the deep coding layer, a locality constraint is added to guarantee that the activated dictionary bases are close to each other. Next, the activated dictionary atoms are assembled and passed to the next compound dictionary learning and coding layer. In this way, the activated atoms in the first layer can be represented by the deeper atoms in the second dictionary. Intuitively, the second dictionary is designed to learn the fine-grained components which are shared among the input dictionary atoms. As a result, a more informative and discriminative low-level representation of the dictionary atoms can be obtained. We empirically compare the proposed DDLCN with several dictionary learning methods and deep learning architectures. The experimental results on four popular benchmark datasets demonstrate that the proposed DDLCN achieves competitive results compared with state-of-the-art approaches.
Tasks Dictionary Learning
Published 2018-09-11
URL http://arxiv.org/abs/1809.04185v2
PDF http://arxiv.org/pdf/1809.04185v2.pdf
PWC https://paperswithcode.com/paper/deep-micro-dictionary-learning-and-coding
Repo https://github.com/Ha0Tang/DDLCN
Framework none
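
The locality constraint in the coding layer described above can be illustrated with a generic locality-constrained coding step (in the style of LLC): an input is encoded over its k nearest dictionary atoms only. This numpy sketch is an assumption-laden approximation, not the paper's exact layer; all names and parameters are illustrative.

```python
# Rough numpy sketch of one locality-constrained coding step in the spirit of
# DDLCN's coding layer (LLC-style; the paper's exact formulation may differ).
import numpy as np

def locality_constrained_code(x, D, k=5, lam=1e-4):
    """Encode x (d,) over dictionary D (d, n_atoms) using its k nearest atoms."""
    dists = np.linalg.norm(D - x[:, None], axis=0)
    idx = np.argsort(dists)[:k]                 # locality: activate nearby atoms only
    B = D[:, idx]                               # selected atom basis (d, k)
    z = B - x[:, None]                          # basis shifted to the input
    C = z.T @ z                                 # local covariance (k, k)
    C += lam * np.trace(C) * np.eye(k)          # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                # codes sum to one (shift invariance)
    code = np.zeros(D.shape[1])
    code[idx] = w
    return code

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))                  # over-complete dictionary
code = locality_constrained_code(rng.normal(size=64), D)
```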

The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books

Title The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books
Authors Huimin Xu, Zhang Zhang, Lingfei Wu, Cheng-Jun Wang
Abstract Our analysis of thousands of movies and books reveals how these cultural products weave stereotypical gender roles into morality tales and perpetuate gender inequality through storytelling. Using word embedding techniques, we reveal the constructed emotional dependency of female characters on male characters in stories.
Tasks Word Embeddings
Published 2018-11-12
URL https://arxiv.org/abs/1811.04599v3
PDF https://arxiv.org/pdf/1811.04599v3.pdf
PWC https://paperswithcode.com/paper/the-hidden-shape-of-stories-reveals
Repo https://github.com/wenoptics/viz-of-gender-stereotypes-from-texts
Framework none
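
As a hedged illustration of the kind of word-embedding analysis mentioned above, the sketch below projects words onto a he-she axis to probe gendered associations. It uses a generic pretrained GloVe model via gensim rather than the paper's corpus-specific embeddings, and the probe words are arbitrary.

```python
# Illustrative probe of gendered associations in word embeddings.
# `glove-wiki-gigaword-100` is a convenient pretrained model (downloaded on
# first use); the paper instead trains embeddings on movie/book corpora.
import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

gender_axis = wv["he"] - wv["she"]
for word in ["brave", "fearful", "leader", "lovely"]:   # arbitrary probe words
    print(f"{word:>8}: {cosine(wv[word], gender_axis):+.3f}")  # >0 leans male-associated
```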

Nonparallel Emotional Speech Conversion

Title Nonparallel Emotional Speech Conversion
Authors Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye
Abstract We propose a nonparallel data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker’s identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training based on an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach.
Tasks Style Transfer
Published 2018-11-03
URL http://arxiv.org/abs/1811.01174v2
PDF http://arxiv.org/pdf/1811.01174v2.pdf
PWC https://paperswithcode.com/paper/nonparallel-emotional-speech-conversion
Repo https://github.com/Speech-VINO/SER
Framework pytorch
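
A minimal PyTorch sketch of the content/style recombination described above: encode the content code from the source utterance, borrow the style code from the target emotion domain, then decode. All module shapes and names here are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of content/style decomposition for emotion conversion.
import torch
import torch.nn as nn

class DomainAutoencoder(nn.Module):
    def __init__(self, feat_dim=80, content_dim=32, style_dim=8):
        super().__init__()
        self.content_enc = nn.Linear(feat_dim, content_dim)  # emotion-invariant code
        self.style_enc = nn.Linear(feat_dim, style_dim)      # emotion-related code
        self.dec = nn.Linear(content_dim + style_dim, feat_dim)

    def forward(self, x):
        return self.content_enc(x), self.style_enc(x)

src, tgt = DomainAutoencoder(), DomainAutoencoder()  # one autoencoder per emotion domain
x_src = torch.randn(1, 100, 80)                      # (batch, frames, mel bins)
x_tgt = torch.randn(1, 120, 80)

content, _ = src(x_src)                   # keep linguistic content of the source
_, style = tgt(x_tgt)                     # borrow style code of the target emotion
style = style.mean(dim=1, keepdim=True).expand(-1, content.size(1), -1)
converted = tgt.dec(torch.cat([content, style], dim=-1))
```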

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data

Title Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Authors Antoine Miech, Ivan Laptev, Josef Sivic
Abstract Joint understanding of video and language is an active research area with many applications. Prior work in this domain typically relies on learning text-video embeddings. One difficulty with this approach, however, is the lack of large-scale annotated video-caption datasets for training. To address this issue, we aim at learning text-video embeddings from heterogeneous data sources. To this end, we propose a Mixture-of-Embedding-Experts (MEE) model with the ability to handle missing input modalities during training. As a result, our framework can learn improved text-video embeddings simultaneously from image and video datasets. We also show the generalization of MEE to other input modalities such as face descriptors. We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets. The proposed MEE model demonstrates significant improvements and outperforms previously reported methods on both text-to-video and video-to-text retrieval tasks. Code is available at: https://github.com/antoine77340/Mixture-of-Embedding-Experts
Tasks Video Retrieval
Published 2018-04-07
URL https://arxiv.org/abs/1804.02516v2
PDF https://arxiv.org/pdf/1804.02516v2.pdf
PWC https://paperswithcode.com/paper/learning-a-text-video-embedding-from
Repo https://github.com/jayleicn/TVRetrieval
Framework pytorch
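
The core MEE idea described above can be sketched as follows: per-modality similarities are combined with text-conditioned gate weights that are renormalized over the modalities actually present, so a missing modality simply drops out of the mixture. This is an illustrative approximation of the mechanism, not the authors' code; all names and shapes are assumptions.

```python
# Sketch of Mixture-of-Embedding-Experts scoring with missing modalities.
import torch
import torch.nn.functional as F

def mee_similarity(text_emb, video_embs, available, gate_logits):
    """text_emb: (B, D); video_embs: modality -> (B, D) or None;
    available: modality -> bool; gate_logits: (B, n_modalities)."""
    mods = list(video_embs)
    sims, mask = [], []
    for i, m in enumerate(mods):
        if available[m]:
            sims.append(F.cosine_similarity(text_emb, video_embs[m], dim=-1))
            mask.append(i)
    w = torch.softmax(gate_logits[:, mask], dim=-1)   # renormalize over present experts
    return (w * torch.stack(sims, dim=-1)).sum(-1)    # (B,) final text-video score

B, D = 4, 256
embs = {"rgb": torch.randn(B, D), "audio": torch.randn(B, D), "face": None}
score = mee_similarity(torch.randn(B, D), embs,
                       {"rgb": True, "audio": True, "face": False},
                       torch.randn(B, 3))
```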

Generating Text through Adversarial Training using Skip-Thought Vectors

Title Generating Text through Adversarial Training using Skip-Thought Vectors
Authors Afroz Ahamad
Abstract In the past few years, various advancements have been made in generative models owing to the formulation of Generative Adversarial Networks (GANs). GANs have been shown to perform exceedingly well on a wide variety of tasks pertaining to image generation and style transfer. In the field of Natural Language Processing, word embeddings such as word2vec and GloVe are state-of-the-art methods for applying neural network models to textual data. Attempts have been made at utilizing GANs with word embeddings for text generation. This work presents an approach to text generation using Skip-Thought sentence embeddings in conjunction with GANs based on gradient penalty functions and f-measures. The results of using sentence embeddings with GANs for generating text conditioned on input information are comparable to the approaches where word embeddings are used.
Tasks Sentence Embeddings, Style Transfer, Text Generation, Word Embeddings
Published 2018-08-27
URL http://arxiv.org/abs/1808.08703v2
PDF http://arxiv.org/pdf/1808.08703v2.pdf
PWC https://paperswithcode.com/paper/generating-text-through-adversarial-training
Repo https://github.com/enigmaeth/skip-thought-gan
Framework none
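
Since the abstract mentions GANs with gradient penalty functions, below is the standard WGAN-GP penalty applied to interpolations of real and generated sentence embeddings, as a hedged illustration rather than the paper's exact objective. The critic architecture is a placeholder; the 4800-dimensional input matches the common combine-skip Skip-Thought vector size.

```python
# Standard WGAN-GP gradient penalty on interpolated sentence embeddings.
import torch

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()  # push gradient norm toward 1

critic = torch.nn.Sequential(torch.nn.Linear(4800, 256), torch.nn.ReLU(),
                             torch.nn.Linear(256, 1))  # placeholder critic
gp = gradient_penalty(critic, torch.randn(8, 4800), torch.randn(8, 4800))
```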

Deep Learning: An Introduction for Applied Mathematicians

Title Deep Learning: An Introduction for Applied Mathematicians
Authors Catherine F. Higham, Desmond J. Higham
Abstract Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably from calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature.
Tasks Image Classification
Published 2018-01-17
URL http://arxiv.org/abs/1801.05894v1
PDF http://arxiv.org/pdf/1801.05894v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-an-introduction-for-applied
Repo https://github.com/jamesrynn/Basic_DNN
Framework none
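
In the spirit of the article's short MATLAB example, here is a numpy rendition of the same idea: a tiny fully-connected network fit with plain stochastic gradient descent, backpropagating through a squared loss. The data, layer sizes, and hyperparameters are illustrative, not the paper's.

```python
# Tiny two-layer sigmoid network trained by SGD on a toy 2-D problem.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(10, 2))                    # ten 2-D training points
y = (X[:, 0] > X[:, 1]).astype(float)[:, None]   # toy binary labels

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # one hidden layer of 8 units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

lr = 0.5
for step in range(10000):
    i = rng.integers(len(X))                     # stochastic: one sample per step
    x, t = X[i:i+1], y[i:i+1]
    h = sigmoid(x @ W1 + b1)                     # forward pass
    p = sigmoid(h @ W2 + b2)
    dp = (p - t) * p * (1 - p)                   # backprop through squared loss
    dh = dp @ W2.T * h * (1 - h)
    W2 -= lr * h.T @ dp; b2 -= lr * dp.ravel()
    W1 -= lr * x.T @ dh; b1 -= lr * dh.ravel()

print("final squared loss:",
      float(((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2).mean()))
```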

Introducing the Simulated Flying Shapes and Simulated Planar Manipulator Datasets

Title Introducing the Simulated Flying Shapes and Simulated Planar Manipulator Datasets
Authors Fabio Ferreira, Jonas Rothfuss, Eren Erdal Aksoy, You Zhou, Tamim Asfour
Abstract We release two artificial datasets, Simulated Flying Shapes and Simulated Planar Manipulator, that allow testing of the learning ability of video processing systems. In particular, the datasets are meant as tools for easily assessing the sanity of deep neural network models that aim to encode, reconstruct or predict video frame sequences. The datasets each consist of 90,000 videos. The Simulated Flying Shapes dataset comprises scenes showing two objects of equal shape (rectangle, triangle and circle) and size in which one object approaches its counterpart. The Simulated Planar Manipulator shows a 3-DOF planar manipulator that executes a pick-and-place task in which it has to place a size-varying circle on a square platform. Unlike other widely used datasets such as Moving MNIST [1], [2], the two presented datasets involve goal-oriented tasks (e.g. the manipulator grasping an object and placing it on a platform), rather than showing random movements. This makes our datasets more suitable for testing prediction capabilities and the learning of sophisticated motions by a machine learning model. This technical document aims at providing an introduction to the usage of both datasets.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00703v1
PDF http://arxiv.org/pdf/1807.00703v1.pdf
PWC https://paperswithcode.com/paper/introducing-the-simulated-flying-shapes-and
Repo https://github.com/ferreirafabio/FlyingShapesDataset
Framework tf

Conditional Neural Processes

Title Conditional Neural Processes
Authors Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami
Abstract Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion.
Tasks Gaussian Processes
Published 2018-07-04
URL http://arxiv.org/abs/1807.01613v1
PDF http://arxiv.org/pdf/1807.01613v1.pdf
PWC https://paperswithcode.com/paper/conditional-neural-processes
Repo https://github.com/CocoJam/Conditional_Neural_process
Framework none
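
The CNP recipe described above is compact enough to sketch directly: encode each context (x, y) pair, average the encodings into a single permutation-invariant representation, then decode a Gaussian prediction at each target input. Layer sizes below are illustrative; only the encode-aggregate-decode structure follows the paper.

```python
# Hedged PyTorch sketch of a Conditional Neural Process.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNP(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, 64), nn.ReLU(),
                                     nn.Linear(64, r_dim))
        self.decoder = nn.Sequential(nn.Linear(x_dim + r_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 2 * y_dim))  # mean and raw scale

    def forward(self, x_ctx, y_ctx, x_tgt):
        # Mean over context points: a permutation-invariant aggregate.
        r = self.encoder(torch.cat([x_ctx, y_ctx], -1)).mean(dim=1)
        r = r.unsqueeze(1).expand(-1, x_tgt.size(1), -1)
        mu, sigma_raw = self.decoder(torch.cat([x_tgt, r], -1)).chunk(2, dim=-1)
        return mu, 0.1 + 0.9 * F.softplus(sigma_raw)  # bounded-below std

model = CNP()
x_c, y_c = torch.randn(4, 10, 1), torch.randn(4, 10, 1)   # 10 observed points
mu, sigma = model(x_c, y_c, torch.randn(4, 50, 1))        # predict at 50 targets
loss = -torch.distributions.Normal(mu, sigma).log_prob(torch.randn(4, 50, 1)).mean()
```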

High-speed Tracking with Multi-kernel Correlation Filters

Title High-speed Tracking with Multi-kernel Correlation Filters
Authors Ming Tang, Bin Yu, Fan Zhang, Jinqiao Wang
Abstract Correlation filter (CF) based trackers are currently ranked top in terms of their performances. Nevertheless, only some of them, such as KCF and MKCF, are able to exploit the powerful discriminability of non-linear kernels. Although MKCF achieves more powerful discriminability than KCF by introducing multi-kernel learning (MKL) into KCF, its improvement over KCF is quite limited and its computational burden increases significantly in comparison with KCF. In this paper, we introduce MKL into KCF in a different way than MKCF does. We reformulate the MKL version of the CF objective function with its upper bound, significantly alleviating the negative mutual interference of different kernels. Our novel MKCF tracker, MKCFup, outperforms KCF and MKCF by large margins and can still run at very high fps. Extensive experiments on public datasets show that our method is superior to state-of-the-art algorithms for target objects with small motion, while operating at very high speed.
Tasks
Published 2018-06-17
URL http://arxiv.org/abs/1806.06418v1
PDF http://arxiv.org/pdf/1806.06418v1.pdf
PWC https://paperswithcode.com/paper/high-speed-tracking-with-multi-kernel
Repo https://github.com/lukaswals/cf-trackers
Framework pytorch
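
As background for the multi-kernel extension above, here is a numpy sketch of the single-kernel (KCF-style) building block: Gaussian kernel correlation evaluated in the Fourier domain, followed by the closed-form kernel ridge solution. The impulse label and the reuse of the training patch at detection are simplifications; KCF normally uses a Gaussian-shaped label and a new frame's patch.

```python
# KCF-style Gaussian kernel correlation and ridge solution in Fourier domain.
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    """Kernel correlation k^{xz} of two equal-shape feature patches x, z."""
    c = np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(z))).real  # cross-correlation
    d = (x ** 2).sum() + (z ** 2).sum() - 2 * c                      # shifted distances
    return np.exp(-np.maximum(d, 0) / (sigma ** 2 * x.size))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 32))           # training patch features
y = np.zeros((32, 32)); y[0, 0] = 1.0   # impulse label (stand-in for a Gaussian label)
lam = 1e-4

kxx_f = np.fft.fft2(gaussian_correlation(x, x))
alpha_f = np.fft.fft2(y) / (kxx_f + lam)            # kernel ridge regression solution
# Detection: correlate a new patch z with x; here z = x as a placeholder.
response = np.fft.ifft2(np.fft.fft2(gaussian_correlation(x, x)) * alpha_f).real
```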

CosFace: Large Margin Cosine Loss for Deep Face Recognition

Title CosFace: Large Margin Cosine Loss for Deep Face Recognition
Authors Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, Wei Liu
Abstract Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by $L_2$ normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, YouTube Faces (YTF) and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach.
Tasks Face Identification, Face Recognition, Face Verification
Published 2018-01-29
URL http://arxiv.org/abs/1801.09414v2
PDF http://arxiv.org/pdf/1801.09414v2.pdf
PWC https://paperswithcode.com/paper/cosface-large-margin-cosine-loss-for-deep
Repo https://github.com/rupaai/60DaysOfUdacity
Framework pytorch
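
The LMCL formulation above translates almost directly into code: L2-normalize both features and class weights, subtract the margin m from the target class cosine, scale by s, and apply softmax cross-entropy. The sketch below follows that recipe; the hyperparameters s = 64 and m = 0.35 are values reported in the paper, while the feature and class dimensions are illustrative.

```python
# PyTorch sketch of the large margin cosine loss (LMCL) behind CosFace.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMCL(nn.Module):
    def __init__(self, feat_dim=512, n_classes=10, s=64.0, m=0.35):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # Cosines between normalized features and normalized class weights.
        cos = F.linear(F.normalize(feats), F.normalize(self.W))  # (B, n_classes)
        # Subtract margin m from the target class cosine only.
        margin = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), self.m)
        return F.cross_entropy(self.s * (cos - margin), labels)

loss = LMCL()(torch.randn(8, 512), torch.randint(0, 10, (8,)))
```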

LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking

Title LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking
Authors Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling
Abstract In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. LaSOT consists of 1,400 sequences with more than 3.5M frames in total. Each frame in these sequences is carefully and manually annotated with a bounding box, making LaSOT the largest, to the best of our knowledge, densely annotated tracking benchmark. The average video length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild, where target objects may disappear and re-appear in the view. By releasing LaSOT, we expect to provide the community with a large-scale, high-quality dedicated benchmark for both the training of deep trackers and the veritable evaluation of tracking algorithms. Moreover, considering the close connection between visual appearance and natural language, we enrich LaSOT by providing additional language specifications, aiming at encouraging the exploration of natural linguistic features for tracking. A thorough experimental evaluation of 35 tracking algorithms on LaSOT is presented with detailed analysis, and the results demonstrate that there is still significant room for improvement.
Tasks Object Tracking
Published 2018-09-20
URL http://arxiv.org/abs/1809.07845v2
PDF http://arxiv.org/pdf/1809.07845v2.pdf
PWC https://paperswithcode.com/paper/lasot-a-high-quality-benchmark-for-large
Repo https://github.com/HengLan/LaSOT_Evaluation_Toolkit
Framework none

BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN

Title BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN
Authors Jogendra Nath Kundu, Maharshi Gor, R. Venkatesh Babu
Abstract Human motion prediction models have applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converge to a deterministic, undesired mean of multiple probable outcomes. To avoid this, we propose a novel probabilistic generative approach called Bidirectional Human motion prediction GAN, or BiHMP-GAN. To be able to generate multiple probable human-pose sequences, conditioned on a given starting sequence, we introduce a random extrinsic factor r, drawn from a predefined prior distribution. Furthermore, to enforce a direct content loss on the predicted motion sequence and also to avoid mode collapse, a novel bidirectional framework is incorporated by modifying the usual discriminator architecture. The discriminator is also trained to regress this extrinsic factor r, which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. To further regularize the training, we introduce a novel recursive prediction strategy. Despite being in a probabilistic framework, the enhanced discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as conditioning for prediction of the latter part of the sequence. The bidirectional setup also provides a new direction to evaluate the prediction quality against a given test sequence. For a fair assessment of BiHMP-GAN, we report performance of the generated motion sequences using (i) a critic model trained to discriminate between real and fake motion sequences, and (ii) an action classifier trained on real human motion dynamics. Outcomes of both qualitative and quantitative evaluations, on the probabilistic generations of the model, demonstrate the superiority of BiHMP-GAN over previously available methods.
Tasks motion prediction
Published 2018-12-06
URL http://arxiv.org/abs/1812.02591v1
PDF http://arxiv.org/pdf/1812.02591v1.pdf
PWC https://paperswithcode.com/paper/bihmp-gan-bidirectional-3d-human-motion
Repo https://github.com/maharshi95/Pose2vec
Framework tf

Training behavior of deep neural network in frequency domain

Title Training behavior of deep neural network in frequency domain
Authors Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao
Abstract Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery (Zhang et al., 2016). To find a potential mechanism, we focus on the study of implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). The F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms in our experiments. We also illustrate how the F-Principle helps explain the effect of early stopping as well as the generalization of DNNs. The F-Principle potentially provides insight into a general principle underlying DNN optimization and generalization.
Tasks
Published 2018-07-03
URL https://arxiv.org/abs/1807.01251v6
PDF https://arxiv.org/pdf/1807.01251v6.pdf
PWC https://paperswithcode.com/paper/training-behavior-of-deep-neural-network-in
Repo https://github.com/xuzhiqin1990/F-Principle
Framework tf
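
The F-Principle is easy to observe at small scale. The sketch below, a hedged toy experiment rather than the paper's setup, fits a 1-D signal containing one low and one high frequency with a small MLP, and tracks each component's error via the FFT of the model's output on a grid.

```python
# Toy F-Principle experiment: watch low- vs high-frequency fitting error.
import numpy as np
import torch
import torch.nn as nn

x = torch.linspace(0, 1, 256).unsqueeze(1)
y = torch.sin(2 * np.pi * x) + 0.5 * torch.sin(2 * np.pi * 10 * x)  # freq 1 + freq 10

net = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, 128),
                    nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

target_f = np.fft.rfft(y.squeeze().numpy())
for step in range(2001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        pred_f = np.fft.rfft(net(x).detach().squeeze().numpy())
        err = np.abs(pred_f - target_f)
        # Typically the k=1 component converges well before the k=10 component.
        print(f"step {step:4d}  low-freq err {err[1]:.3f}  high-freq err {err[10]:.3f}")
```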

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

Title A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data
Authors Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla
Abstract Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging, since it requires not only capturing the temporal dependency within each time series but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses in different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations, and an attention based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices, and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
Tasks Anomaly Detection, Time Series, Unsupervised Anomaly Detection
Published 2018-11-20
URL http://arxiv.org/abs/1811.08055v1
PDF http://arxiv.org/pdf/1811.08055v1.pdf
PWC https://paperswithcode.com/paper/a-deep-neural-network-for-unsupervised
Repo https://github.com/albertwujj/MSCRED
Framework tf
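
MSCRED's first step, the multi-scale signature matrices, is simple to sketch: at each time step, compute the pairwise inner products of the n series over windows of several lengths. The numpy sketch below follows that construction; the window sizes (10, 30, 60) match the segment lengths used in the paper, while the data shapes are illustrative.

```python
# Numpy sketch of MSCRED-style multi-scale signature matrices.
import numpy as np

def signature_matrices(X, t, windows=(10, 30, 60)):
    """X: (n_series, T) multivariate series; returns (len(windows), n, n) at time t."""
    mats = []
    for w in windows:
        seg = X[:, t - w:t]                     # the last w observations before t
        mats.append(seg @ seg.T / w)            # pairwise inner products of series
    return np.stack(mats)                       # channels for the conv encoder

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2000))                 # 30 sensors, 2000 time steps
S = signature_matrices(X, t=1000)               # shape (3, 30, 30)
# Anomaly scores then come from the residuals between S and its reconstruction
# by the convolutional encoder-decoder (not shown here).
```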