Paper Group ANR 944
Universality Theorems for Generative Models. Joint Semantic Domain Alignment and Target Classifier Learning for Unsupervised Domain Adaptation. Guided Layer-wise Learning for Deep Models using Side Information. End-to-End Spoken Language Translation. Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling. Dynamical Sy …
Universality Theorems for Generative Models
Title | Universality Theorems for Generative Models |
Authors | Valentin Khrulkov, Ivan Oseledets |
Abstract | Despite the fact that generative models are extremely successful in practice, the theory underlying this phenomenon is only starting to catch up with practice. In this work we address the question of the universality of generative models: is it true that neural networks can approximate any data manifold arbitrarily well? We provide a positive answer to this question and show that under mild assumptions on the activation function one can always find a feedforward neural network that maps the latent space onto a set located within the specified Hausdorff distance from the desired data manifold. We also prove similar theorems for the case of multiclass generative models and cycle generative models, trained to map samples from one manifold to another and vice versa. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11520v1 |
https://arxiv.org/pdf/1905.11520v1.pdf | |
PWC | https://paperswithcode.com/paper/universality-theorems-for-generative-models |
Repo | |
Framework | |
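The abstract above measures approximation quality by the Hausdorff distance between the generated set and the data manifold. As a rough illustration only, the sketch below computes that distance for two finite point clouds. The circle "manifold", the 5% radial perturbation, and the function names are assumptions made for this example and do not come from the paper.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical data: 100 points on the unit circle stand in for the manifold,
# and a copy scaled by 1.05 stands in for the generator's output.
manifold = [
    (math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
    for k in range(100)
]
generated = [(1.05 * x, 1.05 * y) for x, y in manifold]

distance = hausdorff(manifold, generated)
print(f"Hausdorff distance: {distance:.4f}")
```

In this toy setup every generated point lies 0.05 away from its radial counterpart, so the two sets are within Hausdorff distance 0.05 of each other.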
Joint Semantic Domain Alignment and Target Classifier Learning for Unsupervised Domain Adaptation
Title | Joint Semantic Domain Alignment and Target Classifier Learning for Unsupervised Domain Adaptation |
Authors | Dong-Dong Chen, Yisen Wang, Jinfeng Yi, Zaiyi Chen, Zhi-Hua Zhou |
Abstract | Unsupervised domain adaptation aims to transfer the classifier learned from the source domain to the target domain in an unsupervised manner. With the help of target pseudo-labels, aligning class-level distributions and learning the classifier in the target domain are two widely used objectives. Existing methods often separately optimize these two individual objectives, which makes them suffer from the neglect of the other. However, optimizing these two aspects together is not trivial. To alleviate the above issues, we propose a novel method that jointly optimizes semantic domain alignment and target classifier learning in a holistic way. The joint optimization mechanism can not only eliminate their weaknesses but also complement their strengths. The theoretical analysis also verifies the favor of the joint optimization mechanism. Extensive experiments on benchmark datasets show that the proposed method yields the best performance in comparison with the state-of-the-art unsupervised domain adaptation methods. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04053v1 |
https://arxiv.org/pdf/1906.04053v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-semantic-domain-alignment-and-target |
Repo | |
Framework | |
Guided Layer-wise Learning for Deep Models using Side Information
Title | Guided Layer-wise Learning for Deep Models using Side Information |
Authors | Pavel Sulimov, Elena Sukmanova, Roman Chereshnev, Attila Kertesz-Farkas |
Abstract | Training of deep models for classification tasks is hindered by local minima problems and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty on hidden units at any layer if they obtain similar features for different types of data. For generative models, DR is defined as divergence over the variational posterior distributions and included in the maximum likelihood estimation as a prior. Thus, DR includes class label information for greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. On the other hand, for discriminative training of deep neural networks, DR is defined as a distance over the features and included in the learning objective. With our experimental tests, we show that DR can help the backpropagation to cope with vanishing gradient problems and to provide faster convergence and smaller generalization errors. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.02048v1 |
https://arxiv.org/pdf/1911.02048v1.pdf | |
PWC | https://paperswithcode.com/paper/guided-layer-wise-learning-for-deep-models |
Repo | |
Framework | |
End-to-End Spoken Language Translation
Title | End-to-End Spoken Language Translation |
Authors | Michelle Guo, Albert Haque, Prateek Verma |
Abstract | In this paper, we address the task of spoken language understanding. We present a method for translating spoken sentences from one language into spoken sentences in another language. Given spectrogram-spectrogram pairs, our model can be trained completely from scratch to translate unseen sentences. Our method consists of a pyramidal-bidirectional recurrent network combined with a convolutional network to output sentence-level spectrograms in the target language. Empirically, our model achieves competitive performance with state-of-the-art methods on multiple languages and can generalize to unseen speakers. |
Tasks | Spoken Language Understanding |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10760v1 |
http://arxiv.org/pdf/1904.10760v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-spoken-language-translation |
Repo | |
Framework | |
Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling
Title | Seq2Biseq: Bidirectional Output-wise Recurrent Neural Networks for Sequence Modelling |
Authors | Marco Dinarelli, Loïc Grobol |
Abstract | During the last couple of years, Recurrent Neural Networks (RNN) have reached state-of-the-art performances on most of the sequence modelling problems. In particular, the “sequence to sequence” model and the neural CRF have proved to be very effective in this domain. In this article, we propose a new RNN architecture for sequence labelling, leveraging gated recurrent layers to take arbitrarily long contexts into account, and using two decoders operating forward and backward. We compare several variants of the proposed solution and their performances to the state-of-the-art. Most of our results are better than the state-of-the-art or very close to it and thanks to the use of recent technologies, our architecture can scale on corpora larger than those used in this work. |
Tasks | |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04733v3 |
http://arxiv.org/pdf/1904.04733v3.pdf | |
PWC | https://paperswithcode.com/paper/seq2biseq-bidirectional-output-wise-recurrent |
Repo | |
Framework | |
Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families
Title | Dynamical System Inspired Adaptive Time Stepping Controller for Residual Network Families |
Authors | Yibo Yang, Jianlong Wu, Hongyang Li, Xia Li, Tiancheng Shen, Zhouchen Lin |
Abstract | The correspondence between residual networks and dynamical systems motivates researchers to unravel the physics of ResNets with well-developed tools in numerical methods for ODE systems. The Runge-Kutta-Fehlberg method is an adaptive time stepping method that renders a good trade-off between stability and efficiency. Can we also have an adaptive time stepping for ResNets to ensure both stability and performance? In this study, we analyze the effects of time stepping on the Euler method and ResNets. We establish a stability condition for ResNets with step sizes and weight parameters, and point out the effects of step sizes on the stability and performance. Inspired by our analyses, we develop an adaptive time stepping controller that is dependent on the parameters of the current step, and aware of previous steps. The controller is jointly optimized with the network training so that variable step sizes and evolution time can be adaptively adjusted. We conduct experiments on ImageNet and CIFAR to demonstrate the effectiveness. It is shown that our proposed method is able to improve both stability and accuracy without introducing additional overhead in the inference phase. |
Tasks | |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10305v1 |
https://arxiv.org/pdf/1911.10305v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamical-system-inspired-adaptive-time |
Repo | |
Framework | |
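The abstract above relates ResNet blocks to forward Euler steps and stresses that the step size affects stability. The sketch below is a textbook illustration of that effect, not the paper's stability condition or controller: it applies forward Euler to the linear test equation dx/dt = -lam * x, where the update x + h * (-lam * x) is stable only when |1 - h * lam| < 1. The function name and the chosen values of `lam` and `h` are assumptions for this example.

```python
def euler_trajectory(lam, h, steps, x0=1.0):
    """Forward Euler on dx/dt = -lam * x, written like a residual update."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + h * (-lam * x))  # x_{n+1} = x_n + h * f(x_n)
    return xs


# Hypothetical step sizes: 0.5 satisfies |1 - h * lam| < 1, 2.5 does not.
stable = euler_trajectory(lam=1.0, h=0.5, steps=20)
unstable = euler_trajectory(lam=1.0, h=2.5, steps=20)

print(f"h=0.5 final |x|: {abs(stable[-1]):.2e}")
print(f"h=2.5 final |x|: {abs(unstable[-1]):.2e}")
```

With the small step the iterate shrinks toward zero, while the large step makes it oscillate and grow, which is the kind of behaviour an adaptive step-size rule is meant to avoid.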
Ordering-Based Causal Structure Learning in the Presence of Latent Variables
Title | Ordering-Based Causal Structure Learning in the Presence of Latent Variables |
Authors | Daniel Irving Bernstein, Basil Saeed, Chandler Squires, Caroline Uhler |
Abstract | We consider the task of learning a causal graph in the presence of latent confounders given i.i.d. samples from the model. While current algorithms for causal structure discovery in the presence of latent confounders are constraint-based, we here propose a score-based approach. We prove that under assumptions weaker than faithfulness, any sparsest independence map (IMAP) of the distribution belongs to the Markov equivalence class of the true model. This motivates the Sparsest Poset formulation - that posets can be mapped to minimal IMAPs of the true model such that the sparsest of these IMAPs is Markov equivalent to the true model. Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data. |
Tasks | |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.09014v2 |
https://arxiv.org/pdf/1910.09014v2.pdf | |
PWC | https://paperswithcode.com/paper/ordering-based-causal-structure-learning-in |
Repo | |
Framework | |
Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
Title | Improving Transformer-based Speech Recognition Using Unsupervised Pre-training |
Authors | Dongwei Jiang, Xiaoning Lei, Wubo Li, Ne Luo, Yuxuan Hu, Wei Zou, Xiangang Li |
Abstract | Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, an unsupervised pre-training method called Masked Predictive Coding is proposed, which can be applied for unsupervised pre-training with Transformer-based models. Experiments on HKUST show that using the same training data, we can achieve CER 23.3%, exceeding the best end-to-end model by over 0.2% absolute CER. With more pre-training data, we can further reduce the CER to 21.0%, or an 11.8% relative CER reduction over baseline. |
Tasks | Speech Recognition |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09932v3 |
https://arxiv.org/pdf/1910.09932v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-transformer-based-speech |
Repo | |
Framework | |
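The abstract above mixes absolute and relative CER figures. The sketch below shows how the two relate, using only the 21.0% CER and the 11.8% relative reduction quoted there. The baseline CER is not stated in the abstract, so the value computed here is an implied figure derived from those two numbers, and the helper names are hypothetical.

```python
def relative_reduction(baseline, new):
    """Relative error reduction: the drop as a fraction of the baseline."""
    return (baseline - new) / baseline


def implied_baseline(new, reduction):
    """Baseline that yields `new` after a given relative reduction."""
    return new / (1.0 - reduction)


# Figures quoted in the abstract: CER of 21.0% after an 11.8% relative reduction.
baseline_cer = implied_baseline(21.0, 0.118)
print(f"Implied baseline CER: {baseline_cer:.2f}%")
print(f"Check: {relative_reduction(baseline_cer, 21.0):.3f}")
```

Under these assumptions the implied baseline is roughly 23.8% CER, which corresponds to an absolute reduction of about 2.8 points.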
Molecular activity prediction using graph convolutional deep neural network considering distance on a molecular graph
Title | Molecular activity prediction using graph convolutional deep neural network considering distance on a molecular graph |
Authors | Masahito Ohue, Ryota Ii, Keisuke Yanagisawa, Yutaka Akiyama |
Abstract | Machine learning is often used in virtual screening to find compounds that are pharmacologically active on a target protein. The weave module is a type of graph convolutional deep neural network that uses not only features focusing on atoms alone (atom features) but also features focusing on atom pairs (pair features); thus, it can consider information of nonadjacent atoms. However, the correlation between the distance on the graph and the three-dimensional coordinate distance is uncertain. In this paper, we propose three improvements for modifying the weave module. First, the distances between ring atoms on the graph were modified to bring the distances on the graph closer to the coordinate distance. Second, different weight matrices were used depending on the distance on the graph in the convolution layers of the pair features. Finally, a weighted sum, by distance, was used when converting pair features to atom features. The experimental results show that the performance of the proposed method is slightly better than that of the weave module, and the improvement in the distance representation might be useful for compound activity prediction. |
Tasks | Activity Prediction |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01103v2 |
https://arxiv.org/pdf/1907.01103v2.pdf | |
PWC | https://paperswithcode.com/paper/molecular-activity-prediction-using-graph |
Repo | |
Framework | |
Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models
Title | Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models |
Authors | Sungjae Cho, Jaeseo Lim, Chris Hickey, Jung Ae Park, Byoung-Tak Zhang |
Abstract | The present study aims to investigate similarities between how humans and connectionist models experience difficulty in arithmetic problems. Problem difficulty was operationalized by the number of carries involved in solving a given problem. Problem difficulty was measured in humans by response time, and in models by computational steps. The present study found that both humans and connectionist models experience difficulty similarly when solving binary addition and subtraction. Specifically, both agents found difficulty to be strictly increasing with respect to the number of carries. Another notable similarity is that problem difficulty increases more steeply in subtraction than in addition, for both humans and connectionist models. Further investigation on two model hyperparameters — confidence threshold and hidden dimension — shows higher confidence thresholds cause the model to take more computational steps to arrive at the correct answer. Likewise, larger hidden dimensions cause the model to take more computational steps to correctly answer arithmetic problems; however, this effect by hidden dimensions is negligible. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03617v3 |
https://arxiv.org/pdf/1905.03617v3.pdf | |
PWC | https://paperswithcode.com/paper/190503617 |
Repo | |
Framework | |
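The abstract above operationalizes problem difficulty as the number of carries involved in solving a problem. A minimal sketch of that count for binary addition follows. The function name and the bit-by-bit procedure are assumptions made for illustration and are not taken from the paper's code.

```python
def count_carries(a, b):
    """Count the carries produced when adding non-negative integers in binary."""
    carry = 0
    carries = 0
    while a or b:
        total = (a & 1) + (b & 1) + carry
        carry = total >> 1  # 1 if this bit position overflows, else 0
        carries += carry
        a >>= 1
        b >>= 1
    return carries


# Hypothetical problems ordered by difficulty under the carry-count measure.
for x, y in [(2, 1), (1, 1), (3, 1), (7, 1)]:
    print(f"{x:b} + {y:b}: {count_carries(x, y)} carries")
```

Here `10 + 1` needs no carry, while `111 + 1` needs three, so the second problem would count as harder under this measure.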
Facial Expressions Analysis Under Occlusions Based on Specificities of Facial Motion Propagation
Title | Facial Expressions Analysis Under Occlusions Based on Specificities of Facial Motion Propagation |
Authors | Delphine Poux, Benjamin Allaert, Jose Mennesson, Nacim Ihaddadene, Ioan Marius Bilasco, Chaabane Djeraba |
Abstract | Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation brought by this contribution consists in exploiting the specificities of facial movement propagation for recognizing expressions in the presence of important occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement occurring in an occluded region propagates towards neighboring visible regions. In the presence of occlusions, per expression, we compute the importance of each unoccluded facial region and we construct adapted facial frameworks that boost the performance of the per-expression binary classifier. The output of each expression-dependent binary classifier is then aggregated and fed into a fusion process that aims at constructing, per occlusion, a unique model that recognizes all the facial expressions considered. The evaluations highlight the robustness of this approach in the presence of significant facial occlusions. |
Tasks | |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13154v1 |
http://arxiv.org/pdf/1904.13154v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-expressions-analysis-under-occlusions |
Repo | |
Framework | |
Image Inpainting by Adaptive Fusion of Variable Spline Interpolations
Title | Image Inpainting by Adaptive Fusion of Variable Spline Interpolations |
Authors | Zahra Nabizadeh, Ghazale Ghorbanzade, Nader Karimi, Shadrokh Samavi |
Abstract | There are many methods for image enhancement. Image inpainting is one of them, which could be used in reconstruction and restoration of scratched images or editing images by adding or removing objects. According to its application, different algorithmic and learning methods are proposed. In this paper, the focus is on applications that enhance old and historical scratched images. For this purpose, we propose an adaptive spline interpolation. In this method, a different number of neighbors in four directions are considered for each pixel in the lost block. In the previous methods, predicting the lost pixels that are on edges is the problem. To address this problem, we consider horizontal and vertical edge information. If the pixel is located on an edge, then we use the predicted value in that direction. In other situations, irrelevant predicted values are omitted, and the average of the remaining values is used as the value of the missing pixel. The method is evaluated using PSNR and SSIM metrics on the Kodak dataset. The results show improvement in PSNR and SSIM compared to similar procedures. Also, the run time of the proposed method outperforms others. |
Tasks | Image Enhancement, Image Inpainting |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00825v1 |
https://arxiv.org/pdf/1911.00825v1.pdf | |
PWC | https://paperswithcode.com/paper/image-inpainting-by-adaptive-fusion-of |
Repo | |
Framework | |
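The abstract above reports results in PSNR. The sketch below shows the standard PSNR definition, 10 * log10(MAX^2 / MSE), on small grayscale images stored as nested lists. The 8-bit peak value of 255, the function name, and the toy images are assumptions for this example; the paper's evaluation code is not shown in the listing.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (r - s) ** 2
        for ref_row, res_row in zip(reference, restored)
        for r, s in zip(ref_row, res_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images: the restored image is off by one level everywhere.
original = [[10, 20], [30, 40]]
inpainted = [[11, 21], [31, 41]]
print(f"PSNR: {psnr(original, inpainted):.2f} dB")
```

A mean squared error of 1 on an 8-bit scale gives about 48.13 dB; larger reconstruction errors lower the score.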
On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes
Title | On Sample Complexity of Projection-Free Primal-Dual Methods for Learning Mixture Policies in Markov Decision Processes |
Authors | Masoud Badiei Khuzani, Varun Vasudevan, Hongyi Ren, Lei Xing |
Abstract | We study the problem of learning a policy for an infinite-horizon, discounted cost, Markov decision process (MDP) with a large number of states. We compute the actions of a policy that is nearly as good as a policy chosen by a suitable oracle from a given mixture policy class characterized by the convex hull of a set of known base policies. To learn the coefficients of the mixture model, we recast the problem as an approximate linear programming (ALP) formulation for MDPs, where the feature vectors correspond to the occupation measures of the base policies defined on the state-action space. We then propose a projection-free stochastic primal-dual method with the Bregman divergence to solve the characterized ALP. Furthermore, we analyze the probably approximately correct (PAC) sample complexity of the proposed stochastic algorithm, namely the number of queries required to achieve near optimal objective value. We also propose a modification of our proposed algorithm with the polytope constraint sampling for the smoothed ALP, where the restriction to lower bounding approximations is relaxed. In addition, we apply the proposed algorithms to a queuing problem, and compare their performance with a penalty function algorithm. The numerical results illustrate that the primal-dual achieves better efficiency and low variance across different trials compared to the penalty function method. |
Tasks | |
Published | 2019-03-15 |
URL | https://arxiv.org/abs/1903.06727v3 |
https://arxiv.org/pdf/1903.06727v3.pdf | |
PWC | https://paperswithcode.com/paper/on-sample-complexity-of-projection-free |
Repo | |
Framework | |
Variational Multi-Phase Segmentation using High-Dimensional Local Features
Title | Variational Multi-Phase Segmentation using High-Dimensional Local Features |
Authors | Niklas Mevenkamp, Benjamin Berkels |
Abstract | We propose a novel method for multi-phase segmentation of images based on high-dimensional local feature vectors. While the method was developed for the segmentation of extremely noisy crystal images based on localized Fourier transforms, the resulting framework is not tied to specific feature descriptors. For instance, using local spectral histograms as features, it allows for robust texture segmentation. The segmentation itself is based on the multi-phase Mumford-Shah model. Initializing the high-dimensional mean features directly is computationally too demanding and ill-posed in practice. This is resolved by projecting the features onto a low-dimensional space using principal component analysis. The resulting objective functional is minimized using a convexification and the Chambolle-Pock algorithm. Numerical results are presented, illustrating that the algorithm is very competitive in texture segmentation with state-of-the-art performance on the Prague benchmark and provides new possibilities in crystal segmentation, being robust to extreme noise and requiring no prior knowledge of the crystal structure. |
Tasks | |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09863v1 |
http://arxiv.org/pdf/1902.09863v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-multi-phase-segmentation-using |
Repo | |
Framework | |
Deep Reflection Prior
Title | Deep Reflection Prior |
Authors | Qingnan Fan, Yingda Yin, Dongdong Chen, Yujie Wang, Angelica Aviles-Rivero, Ruoteng Li, Carola-Bibiane Schönlieb, Dani Lischinski, Baoquan Chen |
Abstract | Reflections are very common phenomena in our daily photography, which distract people’s attention from the scene behind the glass. The problem of removing reflection artifacts is important but challenging due to its ill-posed nature. Recent learning-based approaches have demonstrated a significant improvement in removing reflections. However, these methods are limited as they require a large number of synthetic reflection/clean image pairs for supervision, at the risk of overfitting in the synthetic image domain. In this paper, we propose a learning-based approach that captures the reflection statistical prior for single image reflection removal. Our algorithm is driven by optimizing the target with joint constraints enhanced between multiple input images during the training stage, but is able to eliminate reflections only from a single input for evaluation. Our framework allows to predict both background and reflection via a one-branch deep neural network, which is implemented by the controllable latent code that indicates either the background or reflection output. We demonstrate superior performance over the state-of-the-art methods on a large range of real-world images. We further provide insightful analysis behind the learned latent code, which may inspire more future work. |
Tasks | |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03623v1 |
https://arxiv.org/pdf/1912.03623v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reflection-prior |
Repo | |
Framework | |