January 27, 2020

Paper Group ANR 1284

Matrix and tensor decompositions for training binary neural networks. Information Robust Dirichlet Networks for Predictive Uncertainty Estimation. Video-Driven Speech Reconstruction using Generative Adversarial Networks. Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment. Recursive V …

Matrix and tensor decompositions for training binary neural networks


Title	Matrix and tensor decompositions for training binary neural networks
Authors	Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic
Abstract	This paper is on improving the training of binary neural networks in which both activations and weights are binary. While prior methods for neural network binarization binarize each filter independently, we propose to instead parametrize the weight tensor of each layer using matrix or tensor decomposition. The binarization process is then performed using this latent parametrization, via a quantization function (e.g. sign function) applied to the reconstructed weights. A key feature of our method is that while the reconstruction is binarized, the computation in the latent factorized space is done in the real domain. This has several advantages: (i) the latent factorization enforces a coupling of the filters before binarization, which significantly improves the accuracy of the trained models. (ii) while at training time, the binary weights of each convolutional layer are parametrized using real-valued matrix or tensor decomposition, during inference we simply use the reconstructed (binary) weights. As a result, our method does not sacrifice any advantage of binary networks in terms of model compression and speeding-up inference. As a further contribution, instead of computing the binary weight scaling factors analytically, as in prior work, we propose to learn them discriminatively via back-propagation. Finally, we show that our approach significantly outperforms existing methods when tested on the challenging tasks of (a) human pose estimation (more than 4% improvements) and (b) ImageNet classification (up to 5% performance gains).
Tasks	Model Compression, Pose Estimation, Quantization
Published	2019-04-16
URL	http://arxiv.org/abs/1904.07852v1
PDF	http://arxiv.org/pdf/1904.07852v1.pdf
PWC	https://paperswithcode.com/paper/matrix-and-tensor-decompositions-for-training
Repo
Framework
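
The idea in the abstract above (keep real-valued factors, reconstruct the weights, then binarize the reconstruction) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain rank-2 matrix factorization W = U V, a per-row scaling factor `alpha` with fixed values (the paper learns its scaling factors by back-propagation), and omits training entirely. All names here are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sign(x):
    """Quantization function: map a real value to +1 or -1 (0 maps to +1)."""
    return 1.0 if x >= 0 else -1.0

def binarize_from_factors(u, v, alpha):
    """Reconstruct W = U V from real-valued factors, then binarize it.

    alpha holds one scaling factor per row (an assumption of this sketch);
    here the factors are simply given numbers, not learned.
    """
    w_real = matmul(u, v)                                # latent, real-valued
    w_bin = [[sign(x) for x in row] for row in w_real]   # what inference would use
    w_scaled = [[a * x for x in row] for a, row in zip(alpha, w_bin)]
    return w_bin, w_scaled

random.seed(0)
rank, rows, cols = 2, 4, 6
u = [[random.gauss(0, 1) for _ in range(rank)] for _ in range(rows)]
v = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rank)]
alpha = [0.5] * rows
w_bin, w_scaled = binarize_from_factors(u, v, alpha)
```

Because every row of the reconstruction is built from the same shared factor `v`, the rows are coupled before the sign function is applied, which is the property the abstract credits for the accuracy gain. Only `w_bin` and `alpha` would be needed at inference time.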

Information Robust Dirichlet Networks for Predictive Uncertainty Estimation


Title	Information Robust Dirichlet Networks for Predictive Uncertainty Estimation
Authors	Theodoros Tsiligkaridis
Abstract	Precise estimation of uncertainty in predictions for AI systems is a critical factor in ensuring trust and safety. Deep neural networks trained with a conventional method are prone to over-confident predictions. In contrast to Bayesian neural networks that learn approximate distributions on weights to infer prediction confidence, we propose a novel method, Information Robust Dirichlet networks, that learn an explicit Dirichlet prior distribution on predictive distributions by minimizing the expected $L_p$ norm of the prediction error and penalizing information flow associated with incorrect outcomes. Properties of the new cost function are derived to indicate how improved uncertainty estimation is achieved. Experiments using real datasets show that our technique outperforms by a large margin state-of-the-art neural networks for estimating within-distribution and out-of-distribution uncertainty, and detecting adversarial examples.
Tasks
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04819v2
PDF	https://arxiv.org/pdf/1910.04819v2.pdf
PWC	https://paperswithcode.com/paper/information-robust-dirichlet-networks-for
Repo
Framework

Video-Driven Speech Reconstruction using Generative Adversarial Networks


Title	Video-Driven Speech Reconstruction using Generative Adversarial Networks
Authors	Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic
Abstract	Speech is a means of communication which relies on both audio and visual information. The absence of one modality can often lead to confusion or misinterpretation of information. In this paper we present an end-to-end temporal model capable of directly synthesising audio from silent video, without needing to transform to and from intermediate features. Our proposed approach, based on GANs, is capable of producing natural-sounding, intelligible speech which is synchronised with the video. The performance of our model is evaluated on the GRID dataset for both speaker-dependent and speaker-independent scenarios. To the best of our knowledge, this is the first method that maps video directly to raw audio and the first to produce intelligible speech when tested on previously unseen speakers. We evaluate the synthesised audio not only based on the sound quality but also on the accuracy of the spoken words.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06301v1
PDF	https://arxiv.org/pdf/1906.06301v1.pdf
PWC	https://paperswithcode.com/paper/video-driven-speech-reconstruction-using
Repo
Framework

Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment


Title	Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment
Authors	Frank Meng, Craig A. Morioka, Danne C. Elbers
Abstract	Sequence alignments are used to capture patterns composed of elements representing multiple conceptual levels through the alignment of sequences that contain overlapping and variable length annotations. The alignments also determine the proper context window of words and phrases that most directly impact the meaning of a given target within a sentence, eliminating the need to predefine a fixed context window of words surrounding the targets. We evaluated the system using the CoNLL-2003 named entity recognition (NER) task.
Tasks	Named Entity Recognition
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03594v2
PDF	https://arxiv.org/pdf/1908.03594v2.pdf
PWC	https://paperswithcode.com/paper/generating-information-extraction-patterns
Repo
Framework

Recursive Visual Sound Separation Using Minus-Plus Net


Title	Recursive Visual Sound Separation Using Minus-Plus Net
Authors	Xudong Xu, Bo Dai, Dahua Lin
Abstract	Sounds provide rich semantics, complementary to visual data, for many tasks. However, in practice, sounds from multiple sources are often mixed together. In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. MP-Net separates sounds recursively in the order of average energy, removing the separated sound from the mixture at the end of each prediction, until the mixture becomes empty or contains only noise. In this way, MP-Net could be applied to sound mixtures with arbitrary numbers and types of sounds. Moreover, while MP-Net keeps removing sounds with large energy from the mixture, sounds with small energy could emerge and become clearer, so that the separation is more accurate. Compared to previous methods, MP-Net obtains state-of-the-art results on two large scale datasets, across mixtures with different types and numbers of sounds.
Tasks
Published	2019-08-30
URL	https://arxiv.org/abs/1908.11602v2
PDF	https://arxiv.org/pdf/1908.11602v2.pdf
PWC	https://paperswithcode.com/paper/recursive-visual-sound-separation-using-minus
Repo
Framework
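
The recursive procedure described in the abstract above (separate the highest-energy sound, subtract it from the mixture, repeat until only noise remains) can be illustrated with a toy loop. This is a sketch under strong assumptions: the learned audio-visual separator is replaced by a hypothetical stand-in that projects the mixture onto known, mutually orthogonal template signals, and `noise_floor` and `max_steps` are illustrative parameters that do not come from the paper.

```python
def energy(signal):
    """Average energy (mean squared amplitude) of a signal."""
    return sum(s * s for s in signal) / len(signal)

def project(mixture, template):
    """Toy separator: least-squares projection of the mixture onto one template."""
    coef = sum(m * t for m, t in zip(mixture, template)) / sum(t * t for t in template)
    return [coef * t for t in template]

def separate_recursively(mixture, templates, noise_floor=1e-9, max_steps=10):
    """Peel off one sound per step, largest average energy first."""
    residual = list(mixture)
    separated = []
    for _ in range(max_steps):
        if energy(residual) <= noise_floor:
            break  # the mixture is empty or contains only noise
        best = max((project(residual, t) for t in templates), key=energy)
        separated.append(best)
        # "Minus" step: remove the separated sound from the mixture.
        residual = [r - s for r, s in zip(residual, best)]
    return separated, residual

# Three orthogonal templates mixed with amplitudes 3, 1 and 0.5.
templates = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
mixture = [4.5, 2.5, 3.5, 1.5]
separated, residual = separate_recursively(mixture, templates)
energies = [energy(s) for s in separated]
```

The loop does not need to know the number of sources in advance; it stops when the residual energy falls below the noise floor, which mirrors the stopping rule stated in the abstract.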

C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds


Title	C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds
Authors	Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, Vittorio Ferrari
Abstract	Flow-based generative models have highly desirable properties like exact log-likelihood evaluation and exact latent-variable inference; however, they are still in their infancy and have not received as much attention as alternative generative models. In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multi-modal data modeling. C-Flow is based on a parallel sequence of invertible mappings in which a source flow guides the target flow at every step, enabling fine-grained control over the generation process. We also devise a new strategy to model unordered 3D point clouds that, in combination with the conditioning scheme, makes it possible to address 3D reconstruction from a single image and its inverse problem of rendering an image given a point cloud. We demonstrate our conditioning method to be very adaptable, being also applicable to image manipulation, style transfer and multi-modal image-to-image mapping in a diversity of domains, including RGB images, segmentation maps, and edge masks.
Tasks	3D Reconstruction, Style Transfer
Published	2019-12-15
URL	https://arxiv.org/abs/1912.07009v1
PDF	https://arxiv.org/pdf/1912.07009v1.pdf
PWC	https://paperswithcode.com/paper/c-flow-conditional-generative-flow-models-for
Repo
Framework

ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions


Title	ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions
Authors	Zhalama, Jiji Zhang, Frederick Eberhardt, Wolfgang Mayer, Mark Junjie Li
Abstract	In recent years the possibility of relaxing the so-called Faithfulness assumption in automated causal discovery has been investigated. The investigation showed (1) that the Faithfulness assumption can be weakened in various ways that in an important sense preserve its power, and (2) that weakening of Faithfulness may help to speed up methods based on Answer Set Programming. However, this line of work has so far only considered the discovery of causal models without latent variables. In this paper, we study weakenings of Faithfulness for constraint-based discovery of semi-Markovian causal models, which accommodate the possibility of latent variables, and show that both (1) and (2) remain the case in this more realistic setting.
Tasks	Causal Discovery
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02385v1
PDF	https://arxiv.org/pdf/1906.02385v1.pdf
PWC	https://paperswithcode.com/paper/asp-based-discovery-of-semi-markovian-causal
Repo
Framework

Mobility-aware Content Preference Learning in Decentralized Caching Networks


Title	Mobility-aware Content Preference Learning in Decentralized Caching Networks
Authors	Yu Ye, Ming Xiao, Mikael Skoglund
Abstract	Due to the drastic increase of mobile traffic, wireless caching is proposed to serve repeated requests for content download. To determine the caching scheme for decentralized caching networks, the content preference learning problem based on mobility prediction is studied. We first formulate preference prediction as a decentralized regularized multi-task learning (DRMTL) problem without considering the mobility of mobile terminals (MTs). The problem is solved by a hybrid Jacobian and Gauss-Seidel proximal multi-block alternating direction method (ADMM) based algorithm, which is proven to conditionally converge to the optimal solution with a rate $O(1/k)$. Then we use a Markov renewal process to predict the moving path and sojourn time for MTs, and integrate the mobility pattern with the DRMTL model by reweighting the training samples and introducing a transfer penalty in the objective. We solve the problem and prove that the developed algorithm has the same convergence property but with different conditions. Simulations illustrate the convergence analysis of the proposed algorithms. Our real-trace-driven experiments illustrate that the mobility-aware DRMTL model can provide a more accurate prediction of geography preference than the DRMTL model. Besides, the hit ratio achieved by the most popular proactive caching (MPC) policy with preference predicted by mobility-aware DRMTL outperforms the MPC with preference from DRMTL and random caching (RC) schemes.
Tasks	Multi-Task Learning
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08576v1
PDF	https://arxiv.org/pdf/1908.08576v1.pdf
PWC	https://paperswithcode.com/paper/mobility-aware-content-preference-learning-in
Repo
Framework

Machine Translation from Natural Language to Code using Long-Short Term Memory


Title	Machine Translation from Natural Language to Code using Long-Short Term Memory
Authors	K. M. Tahsin Hassan Rahit, Rashidul Hasan Nabil, Md Hasibul Huq
Abstract	Making programming languages more understandable and easier for humans is a longstanding problem. From assembly language to present day’s object-oriented programming, concepts came to make programming easier so that a programmer can focus on the logic and the architecture rather than the code and language itself. To go a step further in this journey of removing the human-computer language barrier, this paper proposes a machine learning approach using a Recurrent Neural Network (RNN) and Long-Short Term Memory (LSTM) to convert human language into programming language code. The programmer will write expressions for code in layman’s language, and the machine learning model will translate them to the targeted programming language. The proposed approach yields 74.40% accuracy. This can be further improved by incorporating additional techniques, which are also discussed in this paper.
Tasks	Machine Translation
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11471v1
PDF	https://arxiv.org/pdf/1910.11471v1.pdf
PWC	https://paperswithcode.com/paper/machine-translation-from-natural-language-to
Repo
Framework

Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models


Title	Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models
Authors	Biwei Huang, Kun Zhang, Mingming Gong, Clark Glymour
Abstract	In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.
Tasks	Bayesian Inference, Causal Discovery, Time Series
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10857v2
PDF	https://arxiv.org/pdf/1905.10857v2.pdf
PWC	https://paperswithcode.com/paper/causal-discovery-and-forecasting-in
Repo
Framework

Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization


Title	Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex Compositional Optimization
Authors	Huizhuo Yuan, Xiangru Lian, Ji Liu
Abstract	Stochastic compositional optimization arises in many important machine learning tasks such as value function evaluation in reinforcement learning and portfolio management. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate the stochastic compositional optimization in the general smooth non-convex setting. We employ a recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $\mathcal{O}((n+m)^{1/2} \varepsilon^{-2})$ in the finite-sum case and $\mathcal{O}(\varepsilon^{-3})$ in the online case. Such a complexity is known to be the best one among IFO complexity results for non-convex stochastic compositional optimization, and is believed to be optimal. Our experiments validate the theoretical performance of our algorithm.
Tasks	Stochastic Optimization
Published	2019-12-31
URL	https://arxiv.org/abs/1912.13515v2
PDF	https://arxiv.org/pdf/1912.13515v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-recursive-variance-reduction-for
Repo
Framework
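
The recursive gradient estimator that the abstract above builds on can be sketched for an ordinary finite-sum problem. This is plain SARAH, not the SARAH-Compositional algorithm of the paper (which additionally has to estimate the inner function and its Jacobian); the toy least-squares problem, step size, and loop lengths are assumptions chosen only for illustration.

```python
import random

# Toy finite-sum problem (assumed): f_i(x) = 0.5 * (a_i * x - b_i)^2
A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 4.0, 6.0, 8.0]  # chosen so that the minimizer is x = 2

def grad_i(i, x):
    """Gradient of the i-th component function at x."""
    return A[i] * (A[i] * x - B[i])

def sarah(grad_i, n, x0, step, epochs, inner_steps, rng):
    """Plain SARAH: a full gradient per epoch, then recursive updates."""
    x = x0
    for _ in range(epochs):
        # Start each epoch from the exact (full) gradient.
        v = sum(grad_i(i, x) for i in range(n)) / n
        x_prev, x = x, x - step * v
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Recursive estimator: correct the previous estimate with the
            # change in one sampled component gradient.
            v = grad_i(i, x) - grad_i(i, x_prev) + v
            x_prev, x = x, x - step * v
    return x

x_hat = sarah(grad_i, len(A), x0=0.0, step=0.02, epochs=30,
              inner_steps=3, rng=random.Random(0))
```

Each inner step evaluates only one component gradient at two points, which is where the savings over full-gradient descent come from; the periodic full gradient keeps the recursive estimate from drifting.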

DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data


Title	DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data
Authors	Aljaž Božič, Michael Zollhöfer, Christian Theobalt, Matthias Nießner
Abstract	Applying data-driven approaches to non-rigid 3D reconstruction has been difficult, which we believe can be attributed to the lack of a large-scale training corpus. Unfortunately, this method fails for important cases such as highly non-rigid deformations. We first address this problem of lack of data by introducing a novel semi-supervised strategy to obtain dense inter-frame correspondences from a sparse set of annotations. This way, we obtain a large dataset of 400 scenes, over 390,000 RGB-D frames, and 5,533 densely aligned frame pairs; in addition, we provide a test set along with several metrics for evaluation. Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. Here, we propose a new neural network that operates on RGB-D frames, while maintaining robustness under large non-rigid deformations and producing accurate predictions. Our approach significantly outperforms existing non-rigid reconstruction methods that do not use learned data terms, as well as learning-based approaches that only use self-supervision.
Tasks	3D Reconstruction
Published	2019-12-09
URL	https://arxiv.org/abs/1912.04302v2
PDF	https://arxiv.org/pdf/1912.04302v2.pdf
PWC	https://paperswithcode.com/paper/deepdeform-learning-non-rigid-rgb-d
Repo
Framework

Energy Predictive Models with Limited Data using Transfer Learning


Title	Energy Predictive Models with Limited Data using Transfer Learning
Authors	Ali Hooshmand, Ratnesh Sharma
Abstract	In this paper, we consider the problem of developing predictive models with limited data for energy assets such as electricity loads, PV power generation, etc. We specifically investigate the cases where the amount of historical data is not sufficient to effectively train the prediction model. We first develop an energy predictive model based on a convolutional neural network (CNN), which is well suited to capture the intraday, daily, and weekly cyclostationary patterns, trends and seasonalities in energy asset time series. A transfer learning strategy is then proposed to address the challenge of limited training data. We demonstrate our approach on a use case of daily electricity demand forecasting. We show that applying the transfer learning strategy to the CNN model results in significant improvement over existing forecasting methods.
Tasks	Time Series, Transfer Learning
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02646v1
PDF	https://arxiv.org/pdf/1906.02646v1.pdf
PWC	https://paperswithcode.com/paper/energy-predictive-models-with-limited-data
Repo
Framework

Randomized Shortest Paths with Net Flows and Capacity Constraints


Title	Randomized Shortest Paths with Net Flows and Capacity Constraints
Authors	Sylvain Courtain, Pierre Leleux, Ilkka Kivimaki, Guillaume Guex, Marco Saerens
Abstract	This work extends the randomized shortest paths model (RSP) by investigating the net flow RSP and adding capacity constraints on edge flows. The standard RSP is a model of movement, or spread, through a network interpolating between a random walk and a shortest path behavior. The framework assumes a unit flow injected into a source node and collected from a target node with flows minimizing the expected transportation cost together with a relative entropy regularization term. In this context, the present work first develops the net flow RSP model considering that edge flows in opposite directions neutralize each other (as in electrical networks) and proposes an algorithm for computing the expected routing costs between all pairs of nodes. This quantity is called the net flow RSP dissimilarity measure between nodes. Experimental comparisons on node clustering tasks show that the net flow RSP dissimilarity is competitive with other state-of-the-art dissimilarities. In the second part of the paper, it is shown how to introduce capacity constraints on edge flows, and a procedure solving this constrained problem by using Lagrangian duality is developed. These two extensions should significantly broaden the scope of applications of the RSP framework.
Tasks
Published	2019-10-04
URL	https://arxiv.org/abs/1910.01849v2
PDF	https://arxiv.org/pdf/1910.01849v2.pdf
PWC	https://paperswithcode.com/paper/randomized-shortest-paths-with-net-flows-and
Repo
Framework

Multitask Learning to Improve Egocentric Action Recognition


Title	Multitask Learning to Improve Egocentric Action Recognition
Authors	Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp
Abstract	In this work we employ multitask learning to capitalize on the structure that exists in related supervised tasks to train complex neural networks. It allows training a network for multiple objectives in parallel, in order to improve performance on at least one of them by capitalizing on a shared representation that is developed to accommodate more information than it otherwise would for a single task. We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. We consider learning the verbs and nouns of which action labels consist, and predict coordinates that capture the hand locations and the gaze-based visual saliency for all the frames of the input video segments. This forces the network to explicitly focus on cues from secondary tasks that it might otherwise have missed, resulting in improved inference. Our experiments on EPIC-Kitchens and EGTEA Gaze+ show consistent improvements when training with multiple tasks over the single-task baseline. Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in action recognition by 3.84%. Apart from actions, our method produces accurate hand and gaze estimations as side tasks, without requiring any additional input at test time other than the RGB video clips.
Tasks
Published	2019-09-15
URL	https://arxiv.org/abs/1909.06761v1
PDF	https://arxiv.org/pdf/1909.06761v1.pdf
PWC	https://paperswithcode.com/paper/multitask-learning-to-improve-egocentric
Repo
Framework