Paper Group ANR 1284
Matrix and tensor decompositions for training binary neural networks
Title | Matrix and tensor decompositions for training binary neural networks |
Authors | Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic |
Abstract | This paper is on improving the training of binary neural networks in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantization function (e.g. sign function) applied to the reconstructed weights. A key feature of our method is that while the reconstruction is binarized, the computation in the latent factorized space is done in the real domain. This has several advantages: (i) the latent factorization enforces a coupling of the filters before binarization, which significantly improves the accuracy of the trained models; (ii) while at training time the binary weights of each convolutional layer are parametrized using real-valued matrix or tensor decomposition, during inference we simply use the reconstructed (binary) weights. As a result, our method does not sacrifice any advantage of binary networks in terms of model compression and inference speed-up. As a further contribution, instead of computing the binary weight scaling factors analytically, as in prior work, we propose to learn them discriminatively via back-propagation. Finally, we show that our approach significantly outperforms existing methods when tested on the challenging tasks of (a) human pose estimation (more than 4% improvements) and (b) ImageNet classification (up to 5% performance gains). |
Tasks | Model Compression, Pose Estimation, Quantization |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07852v1 |
http://arxiv.org/pdf/1904.07852v1.pdf | |
PWC | https://paperswithcode.com/paper/matrix-and-tensor-decompositions-for-training |
Repo | |
Framework | |
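The parametrization described in the abstract above can be illustrated with a minimal forward-pass sketch. It assumes a rank-4 matrix factorization of a flattened filter bank and a per-filter scale; the names `binarize_from_factors`, `U`, `V`, and `alpha` are hypothetical, and this is not the authors' implementation. Training through the sign function is not shown.

```python
import numpy as np

def binarize_from_factors(U, V, alpha):
    """Reconstruct real-valued weights from latent factors, then binarize.

    U: (out_channels, rank) and V: (in_features, rank) are real-valued
    latent factors; alpha: (out_channels,) is a per-filter scale
    (learned in the paper, fixed here).
    """
    W_real = U @ V.T                           # real-domain reconstruction
    W_bin = np.where(W_real >= 0, 1.0, -1.0)   # sign, with 0 mapped to +1
    return alpha[:, None] * W_bin              # scaled binary weights

rng = np.random.default_rng(0)
out_ch, in_features, rank = 8, 27, 4           # e.g. 3x3x3 filters, flattened
U = rng.standard_normal((out_ch, rank))
V = rng.standard_normal((in_features, rank))
alpha = np.abs(rng.standard_normal(out_ch)) + 0.1
W = binarize_from_factors(U, V, alpha)
```

Because every filter is reconstructed from the shared factor `V`, the filters are coupled before the sign is applied, while the weights used at inference contain only one scale and one sign pattern per filter.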
Information Robust Dirichlet Networks for Predictive Uncertainty Estimation
Title | Information Robust Dirichlet Networks for Predictive Uncertainty Estimation |
Authors | Theodoros Tsiligkaridis |
Abstract | Precise estimation of uncertainty in predictions for AI systems is a critical factor in ensuring trust and safety. Deep neural networks trained with a conventional method are prone to over-confident predictions. In contrast to Bayesian neural networks that learn approximate distributions on weights to infer prediction confidence, we propose a novel method, Information Robust Dirichlet networks, that learn an explicit Dirichlet prior distribution on predictive distributions by minimizing the expected $L_p$ norm of the prediction error and penalizing information flow associated with incorrect outcomes. Properties of the new cost function are derived to indicate how improved uncertainty estimation is achieved. Experiments using real datasets show that our technique outperforms by a large margin state-of-the-art neural networks for estimating within-distribution and out-of-distribution uncertainty, and detecting adversarial examples. |
Tasks | |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04819v2 |
https://arxiv.org/pdf/1910.04819v2.pdf | |
PWC | https://paperswithcode.com/paper/information-robust-dirichlet-networks-for |
Repo | |
Framework | |
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Title | Video-Driven Speech Reconstruction using Generative Adversarial Networks |
Authors | Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic |
Abstract | Speech is a means of communication which relies on both audio and visual information. The absence of one modality can often lead to confusion or misinterpretation of information. In this paper we present an end-to-end temporal model capable of directly synthesising audio from silent video, without needing to transform to-and-from intermediate features. Our proposed approach, based on GANs, is capable of producing natural-sounding, intelligible speech which is synchronised with the video. The performance of our model is evaluated on the GRID dataset for both speaker-dependent and speaker-independent scenarios. To the best of our knowledge this is the first method that maps video directly to raw audio and the first to produce intelligible speech when tested on previously unseen speakers. We evaluate the synthesised audio not only based on the sound quality but also on the accuracy of the spoken words. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06301v1 |
https://arxiv.org/pdf/1906.06301v1.pdf | |
PWC | https://paperswithcode.com/paper/video-driven-speech-reconstruction-using |
Repo | |
Framework | |
Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment
Title | Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment |
Authors | Frank Meng, Craig A. Morioka, Danne C. Elbers |
Abstract | Sequence alignments are used to capture patterns composed of elements representing multiple conceptual levels through the alignment of sequences that contain overlapping and variable length annotations. The alignments also determine the proper context window of words and phrases that most directly impact the meaning of a given target within a sentence, eliminating the need to predefine a fixed context window of words surrounding the targets. We evaluated the system using the CoNLL-2003 named entity recognition (NER) task. |
Tasks | Named Entity Recognition |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03594v2 |
https://arxiv.org/pdf/1908.03594v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-information-extraction-patterns |
Repo | |
Framework | |
Recursive Visual Sound Separation Using Minus-Plus Net
Title | Recursive Visual Sound Separation Using Minus-Plus Net |
Authors | Xudong Xu, Bo Dai, Dahua Lin |
Abstract | Sounds provide rich semantics, complementary to visual data, for many tasks. However, in practice, sounds from multiple sources are often mixed together. In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. MP-Net separates sounds recursively in the order of average energy, removing the separated sound from the mixture at the end of each prediction, until the mixture becomes empty or contains only noise. In this way, MP-Net could be applied to sound mixtures with arbitrary numbers and types of sounds. Moreover, while MP-Net keeps removing sounds with large energy from the mixture, sounds with small energy could emerge and become clearer, so that the separation is more accurate. Compared to previous methods, MP-Net obtains state-of-the-art results on two large scale datasets, across mixtures with different types and numbers of sounds. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11602v2 |
https://arxiv.org/pdf/1908.11602v2.pdf | |
PWC | https://paperswithcode.com/paper/recursive-visual-sound-separation-using-minus |
Repo | |
Framework | |
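The recursive procedure in the MP-Net abstract above (separate the most energetic sound, subtract it from the mixture, repeat until only noise remains) can be sketched on a toy signal. The learned, visually guided separator is replaced here by a hypothetical stand-in, `dominant_component`, which simply keeps the strongest frequency bin; the threshold `noise_energy` and the cap `max_sources` are also assumptions for illustration.

```python
import numpy as np

def dominant_component(signal):
    """Toy stand-in for a learned separator: keep the strongest frequency bin."""
    spectrum = np.fft.rfft(signal)
    k = int(np.argmax(np.abs(spectrum)))
    kept = np.zeros_like(spectrum)
    kept[k] = spectrum[k]
    return np.fft.irfft(kept, n=len(signal))

def recursive_separation(mixture, noise_energy=1e-6, max_sources=10):
    """Peel off one sound at a time until the residual is empty or only noise."""
    residual = np.asarray(mixture, dtype=float).copy()
    sources = []
    while np.mean(residual ** 2) > noise_energy and len(sources) < max_sources:
        predicted = dominant_component(residual)
        sources.append(predicted)
        residual = residual - predicted    # remove the separated sound
    return sources, residual

# Mixture of three sinusoids with decreasing amplitude (hypothetical test data).
n = 256
t = np.arange(n)
mixture = (3.0 * np.sin(2 * np.pi * 5 * t / n)
           + 2.0 * np.sin(2 * np.pi * 12 * t / n)
           + 1.0 * np.sin(2 * np.pi * 20 * t / n))
sources, residual = recursive_separation(mixture)
energies = [float(np.mean(s ** 2)) for s in sources]
```

The loop does not need to know the number of sources in advance: it stops when the residual energy falls below the noise threshold, which is what lets the scheme handle mixtures of arbitrary size.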
C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds
Title | C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds |
Authors | Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, Vittorio Ferrari |
Abstract | Flow-based generative models have highly desirable properties like exact log-likelihood evaluation and exact latent-variable inference; however, they are still in their infancy and have not received as much attention as alternative generative models. In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multi-modal data modeling. C-Flow is based on a parallel sequence of invertible mappings in which a source flow guides the target flow at every step, enabling fine-grained control over the generation process. We also devise a new strategy to model unordered 3D point clouds that, in combination with the conditioning scheme, makes it possible to address 3D reconstruction from a single image and its inverse problem of rendering an image given a point cloud. We demonstrate our conditioning method to be very adaptable, being also applicable to image manipulation, style transfer and multi-modal image-to-image mapping in a diversity of domains, including RGB images, segmentation maps, and edge masks. |
Tasks | 3D Reconstruction, Style Transfer |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.07009v1 |
https://arxiv.org/pdf/1912.07009v1.pdf | |
PWC | https://paperswithcode.com/paper/c-flow-conditional-generative-flow-models-for |
Repo | |
Framework | |
ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions
Title | ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions |
Authors | Zhalama, Jiji Zhang, Frederick Eberhardt, Wolfgang Mayer, Mark Junjie Li |
Abstract | In recent years the possibility of relaxing the so-called Faithfulness assumption in automated causal discovery has been investigated. The investigation showed (1) that the Faithfulness assumption can be weakened in various ways that in an important sense preserve its power, and (2) that weakening of Faithfulness may help to speed up methods based on Answer Set Programming. However, this line of work has so far only considered the discovery of causal models without latent variables. In this paper, we study weakenings of Faithfulness for constraint-based discovery of semi-Markovian causal models, which accommodate the possibility of latent variables, and show that both (1) and (2) remain the case in this more realistic setting. |
Tasks | Causal Discovery |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02385v1 |
https://arxiv.org/pdf/1906.02385v1.pdf | |
PWC | https://paperswithcode.com/paper/asp-based-discovery-of-semi-markovian-causal |
Repo | |
Framework | |
Mobility-aware Content Preference Learning in Decentralized Caching Networks
Title | Mobility-aware Content Preference Learning in Decentralized Caching Networks |
Authors | Yu Ye, Ming Xiao, Mikael Skoglund |
Abstract | Due to the drastic increase of mobile traffic, wireless caching is proposed to serve repeated requests for content download. To determine the caching scheme for decentralized caching networks, the content preference learning problem based on mobility prediction is studied. We first formulate preference prediction as a decentralized regularized multi-task learning (DRMTL) problem without considering the mobility of mobile terminals (MTs). The problem is solved by an algorithm based on a hybrid Jacobian and Gauss-Seidel proximal multi-block alternating direction method of multipliers (ADMM), which is proven to conditionally converge to the optimal solution with a rate $O(1/k)$. Then we use a Markov renewal process to predict the moving path and sojourn time for MTs, and integrate the mobility pattern with the DRMTL model by reweighting the training samples and introducing a transfer penalty in the objective. We solve the problem and prove that the developed algorithm has the same convergence property but under different conditions. Simulations confirm the convergence analysis of the proposed algorithms. Our experiments driven by real traces show that the mobility-aware DRMTL model predicts geographic preference more accurately than the DRMTL model. In addition, the most popular proactive caching (MPC) policy achieves a higher hit ratio with preferences predicted by mobility-aware DRMTL than either MPC with preferences from DRMTL or random caching (RC) schemes. |
Tasks | Multi-Task Learning |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08576v1 |
https://arxiv.org/pdf/1908.08576v1.pdf | |
PWC | https://paperswithcode.com/paper/mobility-aware-content-preference-learning-in |
Repo | |
Framework | |
Machine Translation from Natural Language to Code using Long-Short Term Memory
Title | Machine Translation from Natural Language to Code using Long-Short Term Memory |
Authors | K. M. Tahsin Hassan Rahit, Rashidul Hasan Nabil, Md Hasibul Huq |
Abstract | Making programming languages more understandable and easier for humans to use is a longstanding problem. From assembly language to present-day object-oriented programming, new concepts have made programming easier so that a programmer can focus on the logic and the architecture rather than the code and language itself. To go a step further in removing the human-computer language barrier, this paper proposes a machine learning approach using a Recurrent Neural Network (RNN) and Long-Short Term Memory (LSTM) to convert human language into programming language code. The programmer writes expressions for code in layman’s language, and the machine learning model translates them into the target programming language. The proposed approach achieves 74.40% accuracy. This can be further improved by incorporating additional techniques, which are also discussed in this paper. |
Tasks | Machine Translation |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11471v1 |
https://arxiv.org/pdf/1910.11471v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-translation-from-natural-language-to |
Repo | |
Framework | |
Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models
Title | Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models |
Authors | Biwei Huang, Kun Zhang, Mingming Gong, Clark Glymour |
Abstract | In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods. |
Tasks | Bayesian Inference, Causal Discovery, Time Series |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10857v2 |
https://arxiv.org/pdf/1905.10857v2.pdf | |
PWC | https://paperswithcode.com/paper/causal-discovery-and-forecasting-in |
Repo | |
Framework | |
Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Title | Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization |
Authors | Huizhuo Yuan, Xiangru Lian, Ji Liu |
Abstract | Stochastic compositional optimization arises in many important machine learning tasks such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting. We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\varepsilon^{-3})$ in the online case. Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm. |
Tasks | Stochastic Optimization |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/1912.13515v2 |
https://arxiv.org/pdf/1912.13515v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-recursive-variance-reduction-for |
Repo | |
Framework | |
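The recursive gradient idea that the abstract above builds on can be sketched in its plain, non-compositional form. The example below applies the standard SARAH estimator to an ordinary least-squares problem; it is not SARAH-Compositional, which additionally tracks estimates of the inner function and its Jacobian. The function names, step size, and synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def least_squares_loss(A, b, w):
    """Mean of 0.5 * (a_i . w - b_i)^2 over all rows of A."""
    return 0.5 * float(np.mean((A @ w - b) ** 2))

def sarah_least_squares(A, b, step=0.01, epochs=20, seed=0):
    """Plain SARAH on least squares (illustrative, not the compositional variant)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        v = A.T @ (A @ w - b) / n                  # full gradient at the snapshot
        w_prev, w = w, w - step * v
        for _ in range(n):
            i = rng.integers(n)
            g_new = A[i] * (A[i] @ w - b[i])       # sample gradient at current point
            g_old = A[i] * (A[i] @ w_prev - b[i])  # same sample at previous point
            v = g_new - g_old + v                  # recursive estimator update
            w_prev, w = w, w - step * v
    return w

# Synthetic, noiseless regression problem (hypothetical data).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((200, 5))
w_true = data_rng.standard_normal(5)
b = A @ w_true
w_hat = sarah_least_squares(A, b)
initial_loss = least_squares_loss(A, b, np.zeros(5))
final_loss = least_squares_loss(A, b, w_hat)
```

Unlike SVRG-style estimators, the correction term here is taken relative to the previous iterate rather than a fixed snapshot, so the estimate `v` is updated recursively within each epoch.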
DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data
Title | DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data |
Authors | Aljaž Božič, Michael Zollhöfer, Christian Theobalt, Matthias Nießner |
Abstract | Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. Unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this problem of lack of data by introducing a novel semi-supervised strategy to obtain dense inter-frame correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 5,533 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms existing non-rigid reconstruction methods that do not use learned data terms, as well as learning-based approaches that only use self-supervision. |
Tasks | 3D Reconstruction |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04302v2 |
https://arxiv.org/pdf/1912.04302v2.pdf | |
PWC | https://paperswithcode.com/paper/deepdeform-learning-non-rigid-rgb-d |
Repo | |
Framework | |
Energy Predictive Models with Limited Data using Transfer Learning
Title | Energy Predictive Models with Limited Data using Transfer Learning |
Authors | Ali Hooshmand, Ratnesh Sharma |
Abstract | In this paper, we consider the problem of developing predictive models with limited data for energy assets such as electricity loads, PV power generation, etc. We specifically investigate the cases where the amount of historical data is not sufficient to effectively train the prediction model. We first develop an energy predictive model based on a convolutional neural network (CNN), which is well suited to capture the intraday, daily, and weekly cyclostationary patterns, trends and seasonalities in energy asset time series. A transfer learning strategy is then proposed to address the challenge of limited training data. We demonstrate our approach on a use case of daily electricity demand forecasting. We show that applying the transfer learning strategy to the CNN model yields a significant improvement over existing forecasting methods. |
Tasks | Time Series, Transfer Learning |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02646v1 |
https://arxiv.org/pdf/1906.02646v1.pdf | |
PWC | https://paperswithcode.com/paper/energy-predictive-models-with-limited-data |
Repo | |
Framework | |
Randomized Shortest Paths with Net Flows and Capacity Constraints
Title | Randomized Shortest Paths with Net Flows and Capacity Constraints |
Authors | Sylvain Courtain, Pierre Leleux, Ilkka Kivimaki, Guillaume Guex, Marco Saerens |
Abstract | This work extends the randomized shortest paths model (RSP) by investigating the net flow RSP and adding capacity constraints on edge flows. The standard RSP is a model of movement, or spread, through a network interpolating between a random walk and a shortest path behavior. The framework assumes a unit flow injected into a source node and collected from a target node with flows minimizing the expected transportation cost together with a relative entropy regularization term. In this context, the present work first develops the net flow RSP model considering that edge flows in opposite directions neutralize each other (as in electrical networks) and proposes an algorithm for computing the expected routing costs between all pairs of nodes. This quantity is called the net flow RSP dissimilarity measure between nodes. Experimental comparisons on node clustering tasks show that the net flow RSP dissimilarity is competitive with other state-of-the-art dissimilarities. In the second part of the paper, it is shown how to introduce capacity constraints on edge flows and a procedure solving this constrained problem by using Lagrangian duality is developed. These two extensions should improve significantly the scope of applications of the RSP framework. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01849v2 |
https://arxiv.org/pdf/1910.01849v2.pdf | |
PWC | https://paperswithcode.com/paper/randomized-shortest-paths-with-net-flows-and |
Repo | |
Framework | |
Multitask Learning to Improve Egocentric Action Recognition
Title | Multitask Learning to Improve Egocentric Action Recognition |
Authors | Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp |
Abstract | In this work we employ multitask learning to capitalize on the structure that exists in related supervised tasks to train complex neural networks. It allows training a network for multiple objectives in parallel, in order to improve performance on at least one of them by capitalizing on a shared representation that is developed to accommodate more information than it otherwise would for a single task. We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. We consider learning the verbs and nouns of which action labels consist, and predict coordinates that capture the hand locations and the gaze-based visual saliency for all the frames of the input video segments. This forces the network to explicitly focus on cues from secondary tasks that it might otherwise have missed, resulting in improved inference. Our experiments on EPIC-Kitchens and EGTEA Gaze+ show consistent improvements when training with multiple tasks over the single-task baseline. Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in action recognition by 3.84%. Apart from actions, our method produces accurate hand and gaze estimations as side tasks, without requiring any additional input at test time other than the RGB video clips. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06761v1 |
https://arxiv.org/pdf/1909.06761v1.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-to-improve-egocentric |
Repo | |
Framework | |