February 1, 2020

3171 words 15 mins read

Paper Group AWR 206


DurIAN: Duration Informed Attention Network For Multimodal Synthesis

Title DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Authors Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu
Abstract In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously. The key component of this system is the Duration Informed Attention Network (DurIAN), an autoregressive model in which the alignments between the input text and the output acoustic features are inferred from a duration model. This differs from the end-to-end attention mechanism used in existing end-to-end speech synthesis systems such as Tacotron, and avoids the various unavoidable artifacts of that mechanism. Furthermore, DurIAN can be used to generate high-quality facial expression that can be synchronized with the generated speech, with or without parallel speech and face data. To improve the efficiency of speech generation, we also propose a multi-band parallel generation strategy on top of the WaveRNN model. The proposed multi-band WaveRNN effectively reduces the total computational complexity from 9.8 to 5.5 GFLOPS, and is able to generate audio 6 times faster than real time on a single CPU core. We show that DurIAN can generate highly natural speech that is on par with current state-of-the-art end-to-end systems, while avoiding the word skipping/repeating errors of those systems. Finally, a simple yet effective approach for fine-grained control of the expressiveness of speech and facial expression is introduced.
Tasks Speech Synthesis
Published 2019-09-04
URL https://arxiv.org/abs/1909.01700v2
PDF https://arxiv.org/pdf/1909.01700v2.pdf
PWC https://paperswithcode.com/paper/durian-duration-informed-attention-network
Repo https://github.com/yanggeng1995/subband_WaveRNN
Framework pytorch
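The duration-informed alignment at the heart of DurIAN can be illustrated with a minimal sketch: instead of learning an attention alignment, each encoder state is simply repeated for its predicted number of output frames. The function name and shapes below are illustrative, not the paper's implementation.

```python
import numpy as np

def expand_by_duration(encoder_states, durations):
    """Repeat each input-side state by its predicted duration (in frames).

    encoder_states: (T_in, C) array of per-phoneme hidden states.
    durations: (T_in,) integer array of frame counts per phoneme.
    Returns a (sum(durations), C) frame-aligned sequence.
    """
    return np.repeat(encoder_states, durations, axis=0)

# Three phoneme states expanded to 2 + 3 + 1 = 6 output frames.
states = np.arange(6, dtype=float).reshape(3, 2)
frames = expand_by_duration(states, np.array([2, 3, 1]))
```

Because the alignment is deterministic given the durations, the word skipping/repeating failure modes of learned attention cannot occur by construction.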

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Title fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Authors Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
Abstract fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto
Tasks Language Modelling, Text Generation
Published 2019-04-01
URL http://arxiv.org/abs/1904.01038v1
PDF http://arxiv.org/pdf/1904.01038v1.pdf
PWC https://paperswithcode.com/paper/fairseq-a-fast-extensible-toolkit-for
Repo https://github.com/facebookresearch/fairseq-py
Framework pytorch

Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

Title Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video
Authors Oier Mees, Markus Merklinger, Gabriel Kalweit, Wolfram Burgard
Abstract Key challenges for the deployment of reinforcement learning (RL) agents in the real world are the discovery, representation and reuse of skills in the absence of a reward function. To this end, we propose a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos. Our method learns a general skill embedding independently from the task context by using an adversarial loss. We combine a metric learning loss, which utilizes temporal video coherence to learn a state representation, with an entropy regularized adversarial skill-transfer loss. The metric learning loss learns a disentangled representation by attracting simultaneous viewpoints of the same observations and repelling visually similar frames from temporal neighbors. The adversarial skill-transfer loss enhances re-usability of learned skill embeddings over multiple task domains. We show that the learned embedding enables training of continuous control policies to solve novel tasks that require the interpolation of previously seen skills. Our extensive evaluation with both simulation and real world data demonstrates the effectiveness of our method in learning transferable skills from unlabeled interaction videos and composing them for new tasks. Code, pretrained models and dataset are available at http://robotskills.cs.uni-freiburg.de
Tasks Continuous Control, Metric Learning
Published 2019-10-21
URL https://arxiv.org/abs/1910.09430v2
PDF https://arxiv.org/pdf/1910.09430v2.pdf
PWC https://paperswithcode.com/paper/adversarial-skill-networks-unsupervised-robot
Repo https://github.com/mees/Adversarial-Skill-Networks
Framework pytorch
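The time-contrastive triplet selection described in the abstract can be sketched as follows: the anchor and positive are simultaneous frames from different viewpoints, while the negative is a temporally distant frame from the same view. The array names and sizes are made up for illustration; this is not the authors' implementation.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive within `margin` of the
    anchor relative to the negative, in Euclidean distance."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Multi-view time-contrastive selection: embeddings indexed as [view, time].
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 10, 4))   # 2 views, 10 timesteps, embedding dim 4
t, t_far = 3, 8
anchor   = emb[0, t]                # view 0 at time t
positive = emb[1, t]                # simultaneous frame from the other view
negative = emb[0, t_far]            # temporally distant frame, same view
loss = triplet_margin_loss(anchor, positive, negative)
```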

Visual Explanation for Deep Metric Learning

Title Visual Explanation for Deep Metric Learning
Authors Sijie Zhu, Taojiannan Yang, Chen Chen
Abstract This work explores visual explanation for deep metric learning and its applications. As an important problem in representation learning, metric learning has attracted much attention recently, yet the interpretation of such models is not as well studied as that of classification models. To this end, we propose an intuitive idea: show which regions contribute the most to the overall similarity of two input images by decomposing the final activation. Instead of only providing the overall activation map of each image, we propose to generate point-to-point activation intensities between two images so that the relationships between different regions are uncovered. We show that the proposed framework can be directly deployed to a wide range of metric learning applications and provides valuable information for understanding the model. Furthermore, our experiments show its effectiveness on two potential applications, i.e., cross-view pattern discovery and interactive retrieval. The source code is available at https://github.com/Jeff-Zilence/Explain_Metric_Learning.
Tasks Metric Learning
Published 2019-09-27
URL https://arxiv.org/abs/1909.12977v2
PDF https://arxiv.org/pdf/1909.12977v2.pdf
PWC https://paperswithcode.com/paper/visual-explanation-for-deep-metric-learning
Repo https://github.com/Jeff-Zilence/Explain_Metric_Learning
Framework pytorch
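The core decomposition can be checked in a few lines: when image embeddings come from global average pooling, their dot-product similarity decomposes exactly into a point-to-point map over pairs of spatial positions. The sketch below assumes GAP embeddings and dot-product similarity; the paper's framework covers more general cases.

```python
import numpy as np

def pointwise_similarity(feat_a, feat_b):
    """Decompose the similarity of two GAP embeddings into a
    point-to-point activation map between spatial positions.

    feat_a: (Na, C) and feat_b: (Nb, C) flattened conv feature maps.
    Returns an (Na, Nb) map whose entries sum exactly to the overall
    similarity of the two average-pooled embeddings.
    """
    na, nb = len(feat_a), len(feat_b)
    return feat_a @ feat_b.T / (na * nb)

rng = np.random.default_rng(1)
fa, fb = rng.normal(size=(9, 8)), rng.normal(size=(9, 8))
pmap = pointwise_similarity(fa, fb)
overall = fa.mean(axis=0) @ fb.mean(axis=0)   # dot product of GAP embeddings
```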

Visualizing How Embeddings Generalize

Title Visualizing How Embeddings Generalize
Authors Xiaotong Liu, Hong Xuan, Zeyu Zhang, Abby Stylianou, Robert Pless
Abstract Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how this embedding generalizes to new classes of data. In observing many triplet selection strategies for metric learning, we find that the best performance consistently arises from approaches that focus on a few, well-selected triplets. We introduce visualization tools to illustrate how an embedding generalizes beyond measuring accuracy on validation data, and we illustrate the behavior of a range of triplet selection strategies.
Tasks Metric Learning
Published 2019-09-16
URL https://arxiv.org/abs/1909.07464v1
PDF https://arxiv.org/pdf/1909.07464v1.pdf
PWC https://paperswithcode.com/paper/visualizing-how-embeddings-generalize
Repo https://github.com/GWUvision/Embedding_visualization
Framework pytorch

Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis

Title Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis
Authors David G. Clark, Jesse A. Livezey, Kristofer E. Bouchard
Abstract Linear dimensionality reduction methods are commonly used to extract low-dimensional structure from high-dimensional data. However, popular methods disregard temporal structure, rendering them prone to extracting noise rather than meaningful dynamics when applied to time series data. At the same time, many successful unsupervised learning methods for temporal, sequential and spatial data extract features which are predictive of their surrounding context. Combining these approaches, we introduce Dynamical Components Analysis (DCA), a linear dimensionality reduction method which discovers a subspace of high-dimensional time series data with maximal predictive information, defined as the mutual information between the past and future. We test DCA on synthetic examples and demonstrate its superior ability to extract dynamical structure compared to commonly used linear methods. We also apply DCA to several real-world datasets, showing that the dimensions extracted by DCA are more useful than those extracted by other methods for predicting future states and decoding auxiliary variables. Overall, DCA robustly extracts dynamical structure in noisy, high-dimensional data while retaining the computational efficiency and geometric interpretability of linear dimensionality reduction methods.
Tasks Dimensionality Reduction, Time Series
Published 2019-05-23
URL https://arxiv.org/abs/1905.09944v2
PDF https://arxiv.org/pdf/1905.09944v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-discovery-of-temporal-structure
Repo https://github.com/BouchardLab/DynamicalComponentsAnalysis
Framework none

Predicting Sparse Clients’ Actions with CPOPT-Net in the Banking Environment

Title Predicting Sparse Clients’ Actions with CPOPT-Net in the Banking Environment
Authors Jeremy Charlier, Radu State, Jean Hilger
Abstract The digital revolution of the banking system, together with evolving European regulations, has pushed the major banking actors to innovate through a new use of their clients’ digital information. Given highly sparse client activities, we propose CPOPT-Net, an algorithm that combines neural networks with the CP canonical tensor decomposition, a multidimensional matrix decomposition that factorizes a tensor as the sum of rank-one tensors. CPOPT-Net efficiently handles sparse information with a gradient-based resolution while relying on neural networks for time series predictions. Our experiments show that CPOPT-Net is capable of accurately predicting clients’ actions in the context of personalized recommendation. CPOPT-Net is the first algorithm to combine non-linear conjugate gradient tensor resolution with neural networks to predict financial activities on a public data set.
Tasks Time Series
Published 2019-05-23
URL https://arxiv.org/abs/1905.12568v1
PDF https://arxiv.org/pdf/1905.12568v1.pdf
PWC https://paperswithcode.com/paper/190512568
Repo https://github.com/dagrate/cpoptnet
Framework none
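The CP building block can be illustrated with a rank-one alternating least squares sketch. CPOPT-Net itself uses a non-linear conjugate gradient resolution, so this is only a schematic of the decomposition, not the paper's solver.

```python
import numpy as np

def cp_rank1_als(tensor, n_iter=50):
    """Fit a rank-1 CP decomposition T ≈ a ∘ b ∘ c of a 3-way tensor
    by alternating least squares: solve for each factor in turn while
    holding the other two fixed."""
    _, J, K = tensor.shape
    rng = np.random.default_rng(0)
    b, c = rng.normal(size=J), rng.normal(size=K)
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', tensor, b, c) / ((b @ b) * (c @ c))
        b = np.einsum('ijk,i,k->j', tensor, a, c) / ((a @ a) * (c @ c))
        c = np.einsum('ijk,i,j->k', tensor, a, b) / ((a @ a) * (b @ b))
    return a, b, c

# Recover the factors of an exactly rank-1 tensor.
a0 = np.array([1.0, 2.0, 3.0])
b0 = np.array([4.0, 5.0])
c0 = np.array([6.0, 7.0, 8.0, 9.0])
T = np.einsum('i,j,k->ijk', a0, b0, c0)
a, b, c = cp_rank1_als(T)
recon = np.einsum('i,j,k->ijk', a, b, c)
```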

Predicting Solar Flares Using a Long Short-Term Memory Network

Title Predicting Solar Flares Using a Long Short-Term Memory Network
Authors Hao Liu, Chang Liu, Jason T. L. Wang, Haimin Wang
Abstract We present a long short-term memory (LSTM) network for predicting whether an active region (AR) would produce a gamma-class flare within the next 24 hours. We consider three gamma classes, namely >=M5.0 class, >=M class, and >=C class, and build three LSTM models separately, each corresponding to a gamma class. Each LSTM model is used to make predictions of its corresponding gamma-class flares. The essence of our approach is to model data samples in an AR as time series and use LSTMs to capture temporal information of the data samples. Each data sample has 40 features including 25 magnetic parameters obtained from the Space-weather HMI Active Region Patches (SHARP) and related data products as well as 15 flare history parameters. We survey the flare events that occurred from 2010 May to 2018 May, using the GOES X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI), and select flares with identified ARs in the NCEI flare catalogs. These flare events are used to build the labels (positive vs. negative) of the data samples. Experimental results show that (i) using only 14-22 most important features including both flare history and magnetic parameters can achieve better performance than using all the 40 features together; (ii) our LSTM network outperforms related machine learning methods in predicting the labels of the data samples. To our knowledge, this is the first time that LSTMs have been used for solar flare prediction.
Tasks Time Series
Published 2019-05-17
URL https://arxiv.org/abs/1905.07095v1
PDF https://arxiv.org/pdf/1905.07095v1.pdf
PWC https://paperswithcode.com/paper/predicting-solar-flares-using-a-long-short
Repo https://github.com/JasonTLWang/LSTM-flare-prediction
Framework none
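Modeling per-timestep AR parameters as LSTM inputs amounts to a sliding-window construction like the following; the shapes, names, and window length are illustrative.

```python
import numpy as np

def make_sequences(features, labels, window):
    """Turn per-timestep feature rows into overlapping input sequences
    of length `window`, each labeled by the step that follows it.

    features: (T, F) array, labels: (T,) array.
    Returns X of shape (T - window, window, F) and y of shape (T - window,).
    """
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    y = labels[window:]
    return X, y

# 100 timesteps of 40 features (e.g. 25 SHARP magnetic parameters plus
# 15 flare-history parameters), with 10 past steps per prediction.
feats = np.random.default_rng(0).normal(size=(100, 40))
labs = np.zeros(100)
X, y = make_sequences(feats, labs, window=10)
```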

MigrationMiner: An Automated Detection Tool of Third-Party Java Library Migration at the Method Level

Title MigrationMiner: An Automated Detection Tool of Third-Party Java Library Migration at the Method Level
Authors Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni
Abstract In this paper we introduce MigrationMiner, an automated tool that detects code migrations performed between third-party Java libraries. Given a list of open source projects, the tool detects potential library migration code changes and collects the specific code fragments in which the developer replaces methods from the retired library with methods from the new library. To support the migration process, MigrationMiner collects the library documentation associated with every method involved in the migration. We evaluate our tool on a benchmark of manually validated library migrations. Results show that MigrationMiner achieves an accuracy of 100%. A demo video of MigrationMiner is available at https://youtu.be/sAlR1HNetXc.
Tasks
Published 2019-07-05
URL https://arxiv.org/abs/1907.02997v2
PDF https://arxiv.org/pdf/1907.02997v2.pdf
PWC https://paperswithcode.com/paper/migrationminer-an-automated-detection-tool-of
Repo https://github.com/hussien89aa/MigrationMiner
Framework none

Scaling Object Detection by Transferring Classification Weights

Title Scaling Object Detection by Transferring Classification Weights
Authors Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap-Peng Tan
Abstract Large scale object detection datasets are constantly increasing their size in terms of the number of classes and annotations count. Yet, the number of object-level categories annotated in detection datasets is an order of magnitude smaller than image-level classification labels. State-of-the-art object detection models are trained in a supervised fashion and this limits the number of object classes they can detect. In this paper, we propose a novel weight transfer network (WTN) to effectively and efficiently transfer knowledge from a classification network’s weights to a detection network’s weights, allowing detection of novel classes without box supervision. We first introduce input and feature normalization schemes to curb the under-fitting during training of a vanilla WTN. We then propose the autoencoder-WTN (AE-WTN), which uses a reconstruction loss to preserve the classification network’s information over all classes in the target latent space, ensuring generalization to novel classes. Compared to vanilla WTN, AE-WTN obtains absolute performance gains of 6% on two Open Images evaluation sets with 500 seen and 57 novel classes respectively, and 25% on a Visual Genome evaluation set with 200 novel classes. The code is available at https://github.com/xternalz/AE-WTN.
Tasks Object Detection
Published 2019-09-15
URL https://arxiv.org/abs/1909.06804v1
PDF https://arxiv.org/pdf/1909.06804v1.pdf
PWC https://paperswithcode.com/paper/scaling-object-detection-by-transferring
Repo https://github.com/xternalz/AE-WTN
Framework pytorch
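A weight transfer network in its simplest form maps each class's classification weight vector to a detection weight vector through a small network with input normalization. The sketch below is a schematic forward pass with made-up dimensions, not the AE-WTN architecture or training procedure.

```python
import numpy as np

def transfer_weights(cls_weights, W1, W2):
    """Map per-class classification weight vectors to detection weight
    vectors with a two-layer weight transfer network (WTN) sketch."""
    # Input normalization: unit-norm each class's weight vector.
    x = cls_weights / np.linalg.norm(cls_weights, axis=1, keepdims=True)
    h = np.maximum(0.0, x @ W1)     # hidden layer with ReLU
    return h @ W2                   # predicted detection weights

rng = np.random.default_rng(0)
cls_w = rng.normal(size=(500, 256))     # classifier weights for 500 classes
W1 = rng.normal(size=(256, 128)) * 0.1  # (untrained) transfer-net parameters
W2 = rng.normal(size=(128, 256)) * 0.1
det_w = transfer_weights(cls_w, W1, W2)
```

Because the mapping acts per class, novel classes only need classification weights (no box supervision) to obtain detection weights.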

Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training

Title Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training
Authors Zelun Wang, Jyh-Charn Liu
Abstract In this paper we propose a deep neural network model with an encoder-decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network (CNN) that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory (LSTM) model integrated with the soft attention mechanism, which works as a language model to translate the encoder output into a sequence of LaTeX tokens. The neural network is trained in two steps. The first step is token-level training using the Maximum-Likelihood Estimation (MLE) as the objective function. At completion of the token-level training, the sequence-level training objective function is employed to optimize the overall model based on the policy gradient algorithm from reinforcement learning. Our design also overcomes the exposure bias problem by closing the feedback loop in the decoder during sequence-level training, i.e., feeding in the predicted token instead of the ground truth token at every time step. The model is trained and evaluated on the IM2LATEX-100K dataset and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
Tasks Language Modelling
Published 2019-08-29
URL https://arxiv.org/abs/1908.11415v2
PDF https://arxiv.org/pdf/1908.11415v2.pdf
PWC https://paperswithcode.com/paper/translating-mathematical-formula-images-to
Repo https://github.com/wzlxjtu/PositionalEncoding2D
Framework pytorch
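A common way to realize the 2D positional encoding mentioned in the abstract is to devote half of the channels to sinusoids of the row index and half to the column index. The layout below follows that convention and may differ in detail from the authors' code.

```python
import numpy as np

def positional_encoding_2d(h, w, d):
    """2D sinusoidal positional encoding: the first d/2 channels encode
    the row index, the last d/2 the column index (d divisible by 4)."""
    assert d % 4 == 0
    pe = np.zeros((h, w, d))
    half = d // 2
    freq = np.exp(-np.log(10000.0) * np.arange(0, half, 2) / half)
    rows = np.arange(h)[:, None] * freq          # (h, half/2)
    cols = np.arange(w)[:, None] * freq          # (w, half/2)
    pe[:, :, 0:half:2] = np.sin(rows)[:, None, :]     # row sin channels
    pe[:, :, 1:half:2] = np.cos(rows)[:, None, :]     # row cos channels
    pe[:, :, half::2] = np.sin(cols)[None, :, :]      # column sin channels
    pe[:, :, half + 1::2] = np.cos(cols)[None, :, :]  # column cos channels
    return pe

pe = positional_encoding_2d(8, 16, 64)   # add to an (8, 16, 64) feature map
```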

Performance of regression models as a function of experiment noise

Title Performance of regression models as a function of experiment noise
Authors Gang Li, Jan Zrimec, Boyang Ji, Jun Geng, Johan Larsbrink, Aleksej Zelezniak, Jens Nielsen, Martin KM Engqvist
Abstract A challenge in developing machine learning regression models is that it is difficult to know whether maximal performance has been reached on a particular dataset, or whether further model improvement is possible. In biology this problem is particularly pronounced as sample labels (response variables) are typically obtained through experiments and therefore have experiment noise associated with them. Such label noise puts a fundamental limit to the performance attainable by regression models. We address this challenge by deriving a theoretical upper bound for the coefficient of determination (R2) for regression models. This theoretical upper bound depends only on the noise associated with the response variable in a dataset as well as its variance. The upper bound estimate was validated via Monte Carlo simulations and then used as a tool to bootstrap performance of regression models trained on biological datasets, including protein sequence data, transcriptomic data, and genomic data. Although we study biological datasets in this work, the new upper bound estimates will hold true for regression models from any research field or application area where response variables have associated noise.
Tasks
Published 2019-12-17
URL https://arxiv.org/abs/1912.08141v3
PDF https://arxiv.org/pdf/1912.08141v3.pdf
PWC https://paperswithcode.com/paper/performance-of-regression-models-as-a
Repo https://github.com/EngqvistLab/Supplemetenary_scripts_datasets_R2LG
Framework tf
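Under the paper's setting, where observed labels are the true responses plus i.i.d. experiment noise of variance σ², even a perfect model cannot explain the noise, so R² against the noisy labels is bounded by roughly 1 − σ²/Var(y). The Monte Carlo check below is our own illustration with made-up numbers, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200000, 0.5
f = rng.normal(size=n)                    # true (noise-free) response
y = f + rng.normal(scale=sigma, size=n)   # measured labels with noise

# Upper bound: the fraction of label variance that is not noise.
r2_bound = 1 - sigma**2 / np.var(y)

# A "perfect" model predicts f exactly; its R2 against the noisy
# labels should approach the bound, not 1.
ss_res = np.sum((y - f) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r2_perfect = 1 - ss_res / ss_tot
```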

Class-incremental Learning via Deep Model Consolidation

Title Class-incremental Learning via Deep Model Consolidation
Authors Junting Zhang, Jie Zhang, Shalini Ghosh, Dawei Li, Serafettin Tasci, Larry Heck, Heming Zhang, C. -C. Jay Kuo
Abstract Deep neural networks (DNNs) often suffer from “catastrophic forgetting” during incremental learning (IL) — an abrupt degradation of performance on the original set of classes when the training objective is adapted to a newly added set of classes. Existing IL approaches tend to produce a model that is biased towards either the old classes or new classes, unless with the help of exemplars of the old data. To address this issue, we propose a class-incremental learning paradigm called Deep Model Consolidation (DMC), which works well even when the original training data is not available. The idea is to first train a separate model only for the new classes, and then combine the two individual models trained on data of two distinct set of classes (old classes and new classes) via a novel double distillation training objective. The two existing models are consolidated by exploiting publicly available unlabeled auxiliary data. This overcomes the potential difficulties due to the unavailability of original training data. Compared to the state-of-the-art techniques, DMC demonstrates significantly better performance in image classification (CIFAR-100 and CUB-200) and object detection (PASCAL VOC 2007) in the single-headed IL setting.
Tasks Image Classification, Object Detection
Published 2019-03-19
URL https://arxiv.org/abs/1903.07864v4
PDF https://arxiv.org/pdf/1903.07864v4.pdf
PWC https://paperswithcode.com/paper/class-incremental-learning-via-deep-model
Repo https://github.com/juntingzh/incremental-learning-baselines
Framework tf
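One reading of the double distillation objective: on unlabeled auxiliary data, the consolidated student matches the (mean-centered) logits of the old-class and new-class teachers jointly. The sketch below is our simplified interpretation, not the exact DMC loss.

```python
import numpy as np

def double_distillation_loss(student_logits, old_logits, new_logits):
    """Match the student's logits to the concatenated, mean-centered
    logits of the old-class and new-class teacher models."""
    target = np.concatenate([old_logits, new_logits], axis=1)
    target = target - target.mean(axis=1, keepdims=True)     # center logits
    student = student_logits - student_logits.mean(axis=1, keepdims=True)
    return np.mean((student - target) ** 2)

rng = np.random.default_rng(0)
old_t = rng.normal(size=(4, 10))     # teacher trained on 10 old classes
new_t = rng.normal(size=(4, 5))      # teacher trained on 5 new classes
student = rng.normal(size=(4, 15))   # consolidated model over all 15 classes
loss = double_distillation_loss(student, old_t, new_t)
```

Because only teacher logits are needed, the loss can be evaluated on any unlabeled auxiliary data, which is what lets DMC avoid storing the original training data.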

SelectiveNet: A Deep Neural Network with an Integrated Reject Option

Title SelectiveNet: A Deep Neural Network with an Integrated Reject Option
Authors Yonatan Geifman, Ran El-Yaniv
Abstract We consider the problem of selective prediction (also known as reject option) in deep neural networks, and introduce SelectiveNet, a deep neural architecture with an integrated reject option. Existing rejection mechanisms are based mostly on a threshold over the prediction confidence of a pre-trained network. In contrast, SelectiveNet is trained to optimize both classification (or regression) and rejection simultaneously, end-to-end. The result is a deep neural network that is optimized over the covered domain. In our experiments, we show a consistently improved risk-coverage trade-off over several well-known classification and regression datasets, thus reaching new state-of-the-art results for deep selective classification.
Tasks
Published 2019-01-26
URL https://arxiv.org/abs/1901.09192v4
PDF https://arxiv.org/pdf/1901.09192v4.pdf
PWC https://paperswithcode.com/paper/selectivenet-a-deep-neural-network-with-an
Repo https://github.com/geifmany/SelectiveNet
Framework none
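The selective objective can be sketched as the empirical selective risk plus a quadratic penalty for falling short of a target coverage. The λ value, names, and shapes below are illustrative.

```python
import numpy as np

def selective_loss(per_example_loss, selection, target_coverage, lam=32.0):
    """SelectiveNet-style objective: risk over the selected examples
    plus a penalty for undershooting the target coverage.

    per_example_loss: (N,) base losses; selection: (N,) soft select
    scores in [0, 1] from the selection head g.
    """
    coverage = np.mean(selection)
    risk = np.sum(per_example_loss * selection) / (np.sum(selection) + 1e-12)
    penalty = lam * max(0.0, target_coverage - coverage) ** 2
    return risk + penalty

losses = np.array([0.1, 2.0, 0.05, 3.0])
sel = np.array([1.0, 0.0, 1.0, 0.0])   # reject the two hard examples
loss = selective_loss(losses, sel, target_coverage=0.5)
```

Rejecting the two high-loss examples meets the 50% coverage target, so the objective reduces to the average loss over the accepted pair.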

Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting

Title Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting
Authors Rajat Sen, Hsiang-Fu Yu, Inderjit Dhillon
Abstract Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions. Modern datasets can have millions of correlated time series that evolve together, i.e., they are extremely high-dimensional (one dimension for each individual time series). There is a need for exploiting global patterns and coupling them with local calibration for better prediction. However, most recent deep learning approaches in the literature are one-dimensional, i.e., even though they are trained on the whole dataset, during prediction the future forecast for a single dimension mainly depends on past values from the same dimension. In this paper, we seek to correct this deficiency and propose DeepGLO, a deep forecasting model which thinks globally and acts locally. In particular, DeepGLO is a hybrid model that combines a global matrix factorization model regularized by a temporal convolution network, along with another temporal network that can capture local properties of each time series and associated covariates. Our model can be trained effectively on high-dimensional but diverse time series, where different time series can have vastly different scales, without a priori normalization or rescaling. Empirical results demonstrate that DeepGLO can outperform state-of-the-art approaches; for example, we see more than 25% improvement in WAPE over other methods on a public dataset that contains more than 100K-dimensional time series.
Tasks Calibration, Time Series, Time Series Forecasting
Published 2019-05-09
URL https://arxiv.org/abs/1905.03806v2
PDF https://arxiv.org/pdf/1905.03806v2.pdf
PWC https://paperswithcode.com/paper/think-globally-act-locally-a-deep-neural
Repo https://github.com/rajatsen91/deepglo
Framework pytorch
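WAPE, the metric the abstract quotes, is simply the total absolute error normalized by the total absolute actuals:

```python
import numpy as np

def wape(y_true, y_pred):
    """Weighted absolute percentage error: sum of absolute errors
    divided by the sum of absolute actual values."""
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

y = np.array([10.0, 20.0, 30.0])
err = wape(y, y + 3)   # every forecast off by 3: 9 / 60 = 0.15
```

Unlike MAPE, WAPE weights each timestep by its magnitude, so it stays well-behaved on series with near-zero values.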