Paper Group AWR 339
Training Generative Reversible Networks. Deep Micro-Dictionary Learning and Coding Network. The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books. Nonparallel Emotional Speech Conversion. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. Generating Text through Adversarial Training using Skip-Thought Vectors …
Training Generative Reversible Networks
Title | Training Generative Reversible Networks |
Authors | Robin Tibor Schirrmeister, Patryk Chrabąszcz, Frank Hutter, Tonio Ball |
Abstract | Generative models with an encoding component such as autoencoders currently receive great interest. However, training of autoencoders is typically complicated by the need to train a separate encoder and decoder model that have to be enforced to be reciprocal to each other. To overcome this problem, by-design reversible neural networks (RevNets) have previously been used as generative models, either directly optimizing the likelihood of the data under the model or using an adversarial approach on the generated data. Here, we instead investigate their performance using an adversary on the latent space in the adversarial autoencoder framework. We investigate the generative performance of RevNets on the CelebA dataset, showing that generative RevNets can generate coherent faces of similar quality to Variational Autoencoders. This first attempt to use RevNets inside the adversarial autoencoder framework slightly underperformed relative to recent advanced generative models using an autoencoder component on CelebA, but this gap may diminish with further optimization of the training setup of generative RevNets. In addition to the experiments on CelebA, we show a proof-of-principle experiment on the MNIST dataset suggesting that adversary-free trained RevNets can discover meaningful latent dimensions without pre-specifying the number of dimensions of the latent sampling distribution. In summary, this study shows that RevNets can be employed in different generative training settings. Source code for this study is at https://github.com/robintibor/generative-reversible |
Tasks | |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01610v4 |
http://arxiv.org/pdf/1806.01610v4.pdf | |
PWC | https://paperswithcode.com/paper/training-generative-reversible-networks |
Repo | https://github.com/robintibor/generative-reversible |
Framework | pytorch |
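For readers unfamiliar with reversible networks, the core building block can be sketched as an additive coupling layer whose inverse is exact, so encoder and decoder are reciprocal by construction. A minimal PyTorch sketch, assuming placeholder channel counts and sub-networks rather than the paper's actual architecture:

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive coupling block: invertible by construction, so the decoder
    is the exact inverse of the encoder and needs no separate training."""
    def __init__(self, channels):          # channels must be even
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))
        self.g = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):                  # exact inverse of forward()
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=1)
```

Stacking such blocks gives a bijective encoder; in the adversarial autoencoder setting, the adversary is then applied to the latent output of the stack.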
Deep Micro-Dictionary Learning and Coding Network
Title | Deep Micro-Dictionary Learning and Coding Network |
Authors | Hao Tang, Heng Wei, Wei Xiao, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe |
Abstract | In this paper, we propose a novel Deep Micro-Dictionary Learning and Coding Network (DDLCN). DDLCN has most of the standard deep learning layers (pooling, fully connected, input/output, etc.), but the main difference is that the fundamental convolutional layers are replaced by novel compound dictionary learning and coding layers. The dictionary learning layer learns an over-complete dictionary for the input training data. At the deep coding layer, a locality constraint is added to guarantee that the activated dictionary bases are close to each other. Next, the activated dictionary atoms are assembled together and passed to the next compound dictionary learning and coding layers. In this way, the activated atoms in the first layer can be represented by the deeper atoms in the second dictionary. Intuitively, the second dictionary is designed to learn the fine-grained components which are shared among the input dictionary atoms. In this way, a more informative and discriminative low-level representation of the dictionary atoms can be obtained. We empirically compare the proposed DDLCN with several dictionary learning methods and deep learning architectures. The experimental results on four popular benchmark datasets demonstrate that the proposed DDLCN achieves competitive results compared with state-of-the-art approaches. |
Tasks | Dictionary Learning |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04185v2 |
http://arxiv.org/pdf/1809.04185v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-micro-dictionary-learning-and-coding |
Repo | https://github.com/Ha0Tang/DDLCN |
Framework | none |
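The locality-constrained coding step that the compound layers build on can be illustrated with the standard approximated locality-constrained linear coding (LLC) solver: each input is coded over its k nearest dictionary atoms under a sum-to-one constraint. This is a generic NumPy sketch of that idea, not the authors' exact layer; `k` and `beta` are illustrative hyperparameters:

```python
import numpy as np

def llc_code(x, D, k=5, beta=1e-4):
    """Approximated LLC coding of a single sample.
    x: (dim,) input vector; D: (n_atoms, dim) dictionary."""
    dists = np.linalg.norm(D - x, axis=1)
    idx = np.argsort(dists)[:k]            # locality: keep the k nearest atoms
    B = D[idx]                              # (k, dim) local bases
    z = B - x                               # shift so x sits at the origin
    C = z @ z.T                             # local covariance
    C += beta * (np.trace(C) + 1e-8) * np.eye(k)   # regularize for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                            # enforce sum-to-one constraint
    code = np.zeros(D.shape[0])
    code[idx] = w                           # sparse code over the full dictionary
    return code
```

The activated (non-zero) entries of such a code are what gets passed on to the next dictionary layer in the architecture described above.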
The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books
Title | The Cinderella Complex: Word Embeddings Reveal Gender Stereotypes in Movies and Books |
Authors | Huimin Xu, Zhang Zhang, Lingfei Wu, Cheng-Jun Wang |
Abstract | Our analysis of thousands of movies and books reveals how these cultural products weave stereotypical gender roles into morality tales and perpetuate gender inequality through storytelling. Using word embedding techniques, we reveal the constructed emotional dependency of female characters on male characters in stories. |
Tasks | Word Embeddings |
Published | 2018-11-12 |
URL | https://arxiv.org/abs/1811.04599v3 |
https://arxiv.org/pdf/1811.04599v3.pdf | |
PWC | https://paperswithcode.com/paper/the-hidden-shape-of-stories-reveals |
Repo | https://github.com/wenoptics/viz-of-gender-stereotypes-from-texts |
Framework | none |
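A common way to quantify such stereotypes with word embeddings is to project words onto a gender direction built from definitional pairs. A toy sketch of that kind of measurement (the `emb` lookup and the word pairs are assumptions for illustration, not the paper's exact pipeline):

```python
import numpy as np

def gender_score(word, emb, pairs=(("he", "she"), ("him", "her"), ("man", "woman"))):
    """Signed projection of a word onto a he-she axis.
    emb: dict mapping word -> embedding vector (e.g. loaded from word2vec)."""
    axis = np.mean([emb[m] - emb[f] for m, f in pairs], axis=0)
    axis /= np.linalg.norm(axis)
    v = emb[word] / np.linalg.norm(emb[word])
    return float(v @ axis)    # > 0: closer to the male pole, < 0: closer to the female pole

# e.g. comparing gender_score("dependent", emb) with scores of agency-related
# words gives a rough picture of the stereotype direction the paper quantifies.
```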
Nonparallel Emotional Speech Conversion
Title | Nonparallel Emotional Speech Conversion |
Authors | Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye |
Abstract | We propose a nonparallel data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which is not available in most real applications. We achieve nonparallel training based on an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach. |
Tasks | Style Transfer |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01174v2 |
http://arxiv.org/pdf/1811.01174v2.pdf | |
PWC | https://paperswithcode.com/paper/nonparallel-emotional-speech-conversion |
Repo | https://github.com/Speech-VINO/SER |
Framework | pytorch |
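The content/style decomposition described in the abstract can be sketched as two encoders and a decoder, where conversion recombines the source content code with the target-emotion style code. A heavily simplified PyTorch sketch; feature and code dimensions, the recurrent layers, and the absence of adversarial and reconstruction losses are all simplifications, not the paper's model:

```python
import torch
import torch.nn as nn

class EmotionConverter(nn.Module):
    """Sketch: emotion-invariant content code + emotion-related style code."""
    def __init__(self, feat_dim=80, content_dim=64, style_dim=8):
        super().__init__()
        self.enc_content = nn.GRU(feat_dim, content_dim, batch_first=True)
        self.enc_style = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                       nn.Linear(64, style_dim))
        self.dec = nn.GRU(content_dim + style_dim, feat_dim, batch_first=True)

    def convert(self, src, tgt):
        # src: (B, T, feat_dim) source-emotion features; tgt: (B, T', feat_dim) target-emotion features
        content, _ = self.enc_content(src)                  # per-frame content code
        style = self.enc_style(tgt.mean(dim=1))             # one style code per utterance
        style = style.unsqueeze(1).expand(-1, src.size(1), -1)
        out, _ = self.dec(torch.cat([content, style], dim=-1))
        return out                                          # converted features
```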
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Title | Learning a Text-Video Embedding from Incomplete and Heterogeneous Data |
Authors | Antoine Miech, Ivan Laptev, Josef Sivic |
Abstract | Joint understanding of video and language is an active research area with many applications. Prior work in this domain typically relies on learning text-video embeddings. One difficulty with this approach, however, is the lack of large-scale annotated video-caption datasets for training. To address this issue, we aim at learning text-video embeddings from heterogeneous data sources. To this end, we propose a Mixture-of-Embedding-Experts (MEE) model with the ability to handle missing input modalities during training. As a result, our framework can learn improved text-video embeddings simultaneously from image and video datasets. We also show the generalization of MEE to other input modalities such as face descriptors. We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets. The proposed MEE model demonstrates significant improvements and outperforms previously reported methods on both text-to-video and video-to-text retrieval tasks. Code is available at: https://github.com/antoine77340/Mixture-of-Embedding-Experts |
Tasks | Video Retrieval |
Published | 2018-04-07 |
URL | https://arxiv.org/abs/1804.02516v2 |
https://arxiv.org/pdf/1804.02516v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-text-video-embedding-from |
Repo | https://github.com/jayleicn/TVRetrieval |
Framework | pytorch |
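The key idea of MEE, weighting per-expert text-video similarities and renormalizing the mixture weights over the modalities that are actually present, can be sketched as follows. Tensor shapes and the gating mechanism are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def mee_similarity(text_embs, video_embs, gate_logits, available):
    """text_embs, video_embs: (B, n_experts, D) per-expert embeddings.
    gate_logits: (B, n_experts) mixture weights predicted from the text.
    available:   (B, n_experts) float mask, 1 where the video modality exists."""
    w = torch.softmax(gate_logits, dim=1) * available
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)    # renormalize over present experts
    sims = F.cosine_similarity(text_embs, video_embs, dim=2)   # (B, n_experts)
    return (w * sims).sum(dim=1)                           # final text-video score
```

Missing modalities simply receive zero weight, so the same model can be trained jointly on image datasets (no motion/audio experts) and video datasets.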
Generating Text through Adversarial Training using Skip-Thought Vectors
Title | Generating Text through Adversarial Training using Skip-Thought Vectors |
Authors | Afroz Ahamad |
Abstract | In the past few years, various advancements have been made in generative models owing to the formulation of Generative Adversarial Networks (GANs). GANs have been shown to perform exceedingly well on a wide variety of tasks pertaining to image generation and style transfer. In the field of Natural Language Processing, word embeddings such as word2vec and GloVe are state-of-the-art methods for applying neural network models on textual data. Attempts have been made for utilizing GANs with word embeddings for text generation. This work presents an approach to text generation using Skip-Thought sentence embeddings in conjunction with GANs based on gradient penalty functions and f-measures. The results of using sentence embeddings with GANs for generating text conditioned on input information are comparable to the approaches where word embeddings are used. |
Tasks | Sentence Embeddings, Style Transfer, Text Generation, Word Embeddings |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08703v2 |
http://arxiv.org/pdf/1808.08703v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-text-through-adversarial-training |
Repo | https://github.com/enigmaeth/skip-thought-gan |
Framework | none |
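The gradient-penalty part of such a setup is typically the WGAN-GP term computed on interpolations between real and generated sentence embeddings. A minimal sketch of that penalty, assuming a critic that operates directly on Skip-Thought-sized vectors (not the author's exact code):

```python
import torch

def gradient_penalty(critic, real_emb, fake_emb):
    """WGAN-GP-style penalty on interpolated sentence embeddings (a sketch)."""
    b = real_emb.size(0)
    alpha = torch.rand(b, 1, device=real_emb.device)
    interp = (alpha * real_emb + (1 - alpha) * fake_emb.detach()).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()   # push gradient norm toward 1
```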
Deep Learning: An Introduction for Applied Mathematicians
Title | Deep Learning: An Introduction for Applied Mathematicians |
Authors | Catherine F. Higham, Desmond J. Higham |
Abstract | Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large scale image classification problem. We finish with references to the current literature. |
Tasks | Image Classification |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05894v1 |
http://arxiv.org/pdf/1801.05894v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-an-introduction-for-applied |
Repo | https://github.com/jamesrynn/Basic_DNN |
Framework | none |
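The paper's worked example is in MATLAB; an equivalent toy illustration of the stochastic gradient method on a tiny fully connected network might look like the following Python/PyTorch sketch (data, architecture, and hyperparameters are arbitrary placeholders, not the authors' code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(200, 2)                     # 200 random points in the unit square
y = (x[:, 0] > x[:, 1]).long()             # synthetic two-class labels

net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    idx = torch.randint(0, 200, (16,))     # random mini-batch: the "stochastic" in SGD
    loss = loss_fn(net(x[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```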
Introducing the Simulated Flying Shapes and Simulated Planar Manipulator Datasets
Title | Introducing the Simulated Flying Shapes and Simulated Planar Manipulator Datasets |
Authors | Fabio Ferreira, Jonas Rothfuss, Eren Erdal Aksoy, You Zhou, Tamim Asfour |
Abstract | We release two artificial datasets, Simulated Flying Shapes and Simulated Planar Manipulator, that allow testing the learning ability of video processing systems. In particular, the datasets are meant as a tool which allows one to easily assess the sanity of deep neural network models that aim to encode, reconstruct or predict video frame sequences. The datasets each consist of 90000 videos. The Simulated Flying Shapes dataset comprises scenes showing two objects of equal shape (rectangle, triangle and circle) and size in which one object approaches its counterpart. The Simulated Planar Manipulator shows a 3-DOF planar manipulator that executes a pick-and-place task in which it has to place a size-varying circle on a squared platform. Different from other widely used datasets such as moving MNIST [1], [2], the two presented datasets involve goal-oriented tasks (e.g. the manipulator grasping an object and placing it on a platform), rather than showing random movements. This makes our datasets more suitable for testing prediction capabilities and the learning of sophisticated motions by a machine learning model. This technical document aims at providing an introduction into the usage of both datasets. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00703v1 |
http://arxiv.org/pdf/1807.00703v1.pdf | |
PWC | https://paperswithcode.com/paper/introducing-the-simulated-flying-shapes-and |
Repo | https://github.com/ferreirafabio/FlyingShapesDataset |
Framework | tf |
Conditional Neural Processes
Title | Conditional Neural Processes |
Authors | Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami |
Abstract | Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification, and image completion. |
Tasks | Gaussian Processes |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01613v1 |
http://arxiv.org/pdf/1807.01613v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-neural-processes |
Repo | https://github.com/CocoJam/Conditional_Neural_process |
Framework | none |
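The CNP recipe, encode each context (x, y) pair, aggregate with a permutation-invariant mean, and decode a predictive distribution at the target inputs, can be sketched compactly. A minimal PyTorch version for 1-D regression; layer sizes and the bounded-variance trick are common choices, not necessarily the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNP(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, 128), nn.ReLU(),
                                     nn.Linear(128, r_dim))
        self.decoder = nn.Sequential(nn.Linear(r_dim + x_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 2 * y_dim))

    def forward(self, x_ctx, y_ctx, x_tgt):
        # encode each context (x, y) pair, then aggregate with a mean (order-invariant)
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=1, keepdim=True)
        r = r.expand(-1, x_tgt.size(1), -1)
        out = self.decoder(torch.cat([r, x_tgt], dim=-1))
        mu, log_sigma = out.chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * F.softplus(log_sigma)   # bounded standard deviation
        return mu, sigma
```

Training would then maximize the log-likelihood of the target outputs under Normal(mu, sigma), with context/target splits resampled each iteration.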
High-speed Tracking with Multi-kernel Correlation Filters
Title | High-speed Tracking with Multi-kernel Correlation Filters |
Authors | Ming Tang, Bin Yu, Fan Zhang, Jinqiao Wang |
Abstract | Correlation filter (CF) based trackers are currently ranked top in terms of their performance. Nevertheless, only some of them, such as KCF~\cite{henriques15} and MKCF~\cite{tangm15}, are able to exploit the powerful discriminability of non-linear kernels. Although MKCF achieves more powerful discriminability than KCF through introducing multi-kernel learning (MKL) into KCF, its improvement over KCF is quite limited and its computational burden increases significantly in comparison with KCF. In this paper, we introduce MKL into KCF in a different way than MKCF. We reformulate the MKL version of the CF objective function with its upper bound, alleviating the negative mutual interference of different kernels significantly. Our novel MKCF tracker, MKCFup, outperforms KCF and MKCF by large margins and can still work at very high fps. Extensive experiments on public datasets show that our method is superior to state-of-the-art algorithms for target objects with small movements at very high speed. |
Tasks | |
Published | 2018-06-17 |
URL | http://arxiv.org/abs/1806.06418v1 |
http://arxiv.org/pdf/1806.06418v1.pdf | |
PWC | https://paperswithcode.com/paper/high-speed-tracking-with-multi-kernel |
Repo | https://github.com/lukaswals/cf-trackers |
Framework | pytorch |
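As background for the multi-kernel formulation, a single-kernel (Gaussian) correlation filter in the KCF style trains dual coefficients in the Fourier domain and detects via an inverse FFT of the kernel correlation. A NumPy sketch of that baseline, not the MKCFup algorithm itself:

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    """Dense Gaussian kernel correlation between two equally sized 2-D patches."""
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(np.conj(xf) * zf))           # all circular shifts at once
    d = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross) / x.size
    return np.exp(-np.maximum(d, 0) / sigma ** 2)

def train_filter(x, y, lam=1e-4):
    """x: template patch; y: Gaussian-shaped regression target of the same size."""
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)             # dual coefficients (Fourier domain)

def detect(alpha_f, x, z):
    """Response map for a new patch z given the learned template x."""
    k = gaussian_correlation(x, z)
    return np.real(np.fft.ifft2(alpha_f * np.fft.fft2(k)))     # peak gives the target shift
```

MKCFup's contribution is to combine several such kernels and bound the resulting MKL objective so the kernels do not interfere, while keeping the FFT-level efficiency shown here.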
CosFace: Large Margin Cosine Loss for Deep Face Recognition
Title | CosFace: Large Margin Cosine Loss for Deep Face Recognition |
Authors | Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, Wei Liu |
Abstract | Face recognition has made extraordinary progress owing to the advancement of deep convolutional neural networks (CNNs). The central task of face recognition, including face verification and identification, involves face feature discrimination. However, the traditional softmax loss of deep CNNs usually lacks the power of discrimination. To address this problem, recently several loss functions such as center loss, large margin softmax loss, and angular softmax loss have been proposed. All these improved losses share the same idea: maximizing inter-class variance and minimizing intra-class variance. In this paper, we propose a novel loss function, namely large margin cosine loss (LMCL), to realize this idea from a different perspective. More specifically, we reformulate the softmax loss as a cosine loss by $L_2$ normalizing both features and weight vectors to remove radial variations, based on which a cosine margin term is introduced to further maximize the decision margin in the angular space. As a result, minimum intra-class variance and maximum inter-class variance are achieved by virtue of normalization and cosine decision margin maximization. We refer to our model trained with LMCL as CosFace. Extensive experimental evaluations are conducted on the most popular public-domain face recognition datasets such as MegaFace Challenge, YouTube Faces (YTF) and Labeled Faces in the Wild (LFW). We achieve state-of-the-art performance on these benchmarks, which confirms the effectiveness of our proposed approach. |
Tasks | Face Identification, Face Recognition, Face Verification |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09414v2 |
http://arxiv.org/pdf/1801.09414v2.pdf | |
PWC | https://paperswithcode.com/paper/cosface-large-margin-cosine-loss-for-deep |
Repo | https://github.com/rupaai/60DaysOfUdacity |
Framework | pytorch |
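The LMCL objective is compact enough to sketch directly: L2-normalize features and class weights, subtract the margin m from the true-class cosine, and scale by s before softmax cross-entropy. A minimal PyTorch sketch of such a loss head (a generic re-implementation, not the authors' code; s and m follow values commonly reported for CosFace):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMCL(nn.Module):
    """Large margin cosine loss head: logits = s * (cos(theta) - m * one_hot)."""
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.35):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats, labels):
        # cosine similarity between normalized features and normalized class weights
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))   # (B, C)
        margin = F.one_hot(labels, cos.size(1)).float() * self.m       # margin on true class only
        logits = self.s * (cos - margin)
        return F.cross_entropy(logits, labels)
```

At test time the same normalized-cosine similarity (without the margin) is used to compare face embeddings.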
LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking
Title | LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking |
Authors | Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling |
Abstract | In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. LaSOT consists of 1,400 sequences with more than 3.5M frames in total. Each frame in these sequences is carefully and manually annotated with a bounding box, making LaSOT the largest, to the best of our knowledge, densely annotated tracking benchmark. The average video length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild where target objects may disappear and re-appear in the view. By releasing LaSOT, we expect to provide the community with a large-scale, high-quality dedicated benchmark for both the training of deep trackers and the veritable evaluation of tracking algorithms. Moreover, considering the close connection of visual appearance and natural language, we enrich LaSOT by providing additional language specifications, aiming at encouraging the exploration of natural linguistic features for tracking. A thorough experimental evaluation of 35 tracking algorithms on LaSOT is presented with detailed analysis, and the results demonstrate that there is still considerable room for improvement. |
Tasks | Object Tracking |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07845v2 |
http://arxiv.org/pdf/1809.07845v2.pdf | |
PWC | https://paperswithcode.com/paper/lasot-a-high-quality-benchmark-for-large |
Repo | https://github.com/HengLan/LaSOT_Evaluation_Toolkit |
Framework | none |
BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN
Title | BiHMP-GAN: Bidirectional 3D Human Motion Prediction GAN |
Authors | Jogendra Nath Kundu, Maharshi Gor, R. Venkatesh Babu |
Abstract | Human motion prediction models have applications in various fields of computer vision. Without taking into account the inherent stochasticity in the prediction of future pose dynamics, such methods often converge to a deterministic, undesired mean of multiple probable outcomes. To address this, we propose a novel probabilistic generative approach called Bidirectional Human motion prediction GAN, or BiHMP-GAN. To be able to generate multiple probable human-pose sequences, conditioned on a given starting sequence, we introduce a random extrinsic factor r, drawn from a predefined prior distribution. Furthermore, to enforce a direct content loss on the predicted motion sequence and also to avoid mode collapse, a novel bidirectional framework is incorporated by modifying the usual discriminator architecture. The discriminator is also trained to regress this extrinsic factor r, which is used alongside the intrinsic factor (the encoded starting pose sequence) to generate a particular pose sequence. To further regularize the training, we introduce a novel recursive prediction strategy. In spite of being in a probabilistic framework, the enhanced discriminator architecture allows predictions of an intermediate part of the pose sequence to be used as a conditioning for prediction of the latter part of the sequence. The bidirectional setup also provides a new direction to evaluate the prediction quality against a given test sequence. For a fair assessment of BiHMP-GAN, we report the performance of the generated motion sequences using (i) a critic model trained to discriminate between real and fake motion sequences, and (ii) an action classifier trained on real human motion dynamics. Outcomes of both qualitative and quantitative evaluations, on the probabilistic generations of the model, demonstrate the superiority of BiHMP-GAN over previously available methods. |
Tasks | motion prediction |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02591v1 |
http://arxiv.org/pdf/1812.02591v1.pdf | |
PWC | https://paperswithcode.com/paper/bihmp-gan-bidirectional-3d-human-motion |
Repo | https://github.com/maharshi95/Pose2vec |
Framework | tf |
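The modified discriminator described above outputs both a real/fake score and a regression of the extrinsic factor r. A rough PyTorch sketch of that idea; the pose dimension, recurrent backbone, and loss weighting are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class PoseDiscriminator(nn.Module):
    """Sketch: besides a real/fake score, the discriminator regresses the
    extrinsic factor r that was used to generate the sequence."""
    def __init__(self, pose_dim=54, hidden=128, r_dim=16):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)       # real / fake score
        self.r_head = nn.Linear(hidden, r_dim)  # recovered extrinsic factor

    def forward(self, pose_seq):                # pose_seq: (B, T, pose_dim)
        _, h = self.rnn(pose_seq)
        h = h[-1]
        return self.score(h), self.r_head(h)

# training-side idea (pseudocode): an MSE term ||r_hat - r||^2 on generated
# sequences encourages distinct r values to map to distinct plausible futures,
# which is what counteracts mode collapse in this setup.
```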
Training behavior of deep neural network in frequency domain
Title | Training behavior of deep neural network in frequency domain |
Authors | Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao |
Abstract | Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery [Zhang et al., 2016]. To find a potential mechanism, we focus on the study of implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones. We call this phenomenon the Frequency Principle (F-Principle). The F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms in our experiments. We also illustrate how the F-Principle helps understand the effect of early stopping as well as the generalization of DNNs. This F-Principle potentially provides insights into a general principle underlying DNN optimization and generalization. |
Tasks | |
Published | 2018-07-03 |
URL | https://arxiv.org/abs/1807.01251v6 |
https://arxiv.org/pdf/1807.01251v6.pdf | |
PWC | https://paperswithcode.com/paper/training-behavior-of-deep-neural-network-in |
Repo | https://github.com/xuzhiqin1990/F-Principle |
Framework | tf |
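The F-Principle is easy to observe on a 1-D toy problem: fit a sum of a low- and a high-frequency sinusoid and track the spectral error during training; the low-frequency bin typically converges much earlier. A small illustrative script (the target function, network size, and training schedule are arbitrary choices, not the paper's experiments):

```python
import numpy as np
import torch
import torch.nn as nn

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(2 * np.pi * x) + 0.5 * torch.sin(10 * np.pi * x)   # low + high frequency

net = nn.Sequential(nn.Linear(1, 200), nn.Tanh(), nn.Linear(200, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

target_f = np.fft.rfft(y.squeeze().numpy())
for step in range(1, 5001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        pred_f = np.fft.rfft(net(x).detach().squeeze().numpy())
        err = np.abs(pred_f - target_f) / (np.abs(target_f) + 1e-8)
        # bin 2 holds the low-frequency component, bin 10 the high-frequency one
        print(f"step {step}: err @ low freq {err[2]:.2f}, err @ high freq {err[10]:.2f}")
```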
A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data
Title | A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data |
Authors | Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla |
Abstract | Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only requires capturing the temporal dependency in each time series, but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses at different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations and an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods. |
Tasks | Anomaly Detection, Time Series, Unsupervised Anomaly Detection |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08055v1 |
http://arxiv.org/pdf/1811.08055v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-neural-network-for-unsupervised |
Repo | https://github.com/albertwujj/MSCRED |
Framework | tf |
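The signature matrices at the heart of MSCRED are windowed pairwise inner products between the time series, computed at several window lengths (scales). A NumPy sketch of their construction for the most recent time step (the window lengths are illustrative; the paper builds one such tensor per time step before feeding them to the encoder):

```python
import numpy as np

def signature_matrices(X, windows=(10, 30, 60)):
    """Multi-scale signature matrices at the last time step.
    X: (n_series, T) multivariate time series.
    Returns an (n_scales, n_series, n_series) stack of correlation-like matrices."""
    n, T = X.shape
    mats = []
    for w in windows:
        seg = X[:, T - w:]              # most recent w steps at this scale
        mats.append(seg @ seg.T / w)    # pairwise inner products, normalized by window length
    return np.stack(mats)
```

Anomalies are then flagged from the residuals between these input matrices and the ones reconstructed by the encoder-decoder.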