April 2, 2020

Paper Group ANR 240

Disentangling Overlapping Beliefs by Structured Matrix Factorization. Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction. Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction. GradMix: Multi-source Transfer across Domains and Tasks. From Speech-to-Speech Translation to …

Disentangling Overlapping Beliefs by Structured Matrix Factorization


Title	Disentangling Overlapping Beliefs by Structured Matrix Factorization
Authors	Chaoqi Yang, Jinyang Li, Ruijie Wang, Shuochao Yao, Huajie Shao, Dongxin Liu, Shengzhong Liu, Tianshi Wang, Tarek F. Abdelzaher
Abstract	Much work on social media opinion polarization focuses on identifying separate or orthogonal beliefs from media traces, thereby missing points of agreement among different communities. This paper develops a new class of Non-negative Matrix Factorization (NMF) algorithms that allow identification of both agreement and disagreement points when beliefs of different communities partially overlap. Specifically, we propose a novel Belief Structured Matrix Factorization algorithm (BSMF) to identify partially overlapping beliefs in polarized public social media. BSMF is totally unsupervised and considers three types of information: (i) who posted which opinion, (ii) keyword-level message similarity, and (iii) empirically observed social dependency graphs (e.g., retweet graphs), to improve belief separation. In the space of unsupervised belief separation algorithms, the emphasis was mostly given to the problem of identifying disjoint (e.g., conflicting) beliefs. The case when individuals with different beliefs agree on some subset of points was less explored. We observe that social beliefs overlap even in polarized scenarios. Our proposed unsupervised algorithm captures both the latent belief intersections and dissimilarities. We discuss the properties of the algorithm and conduct extensive experiments on both synthetic data and real-world datasets. The results show that our model outperforms all compared baselines by a great margin.
Tasks
Published	2020-02-13
URL	https://arxiv.org/abs/2002.05797v1
PDF	https://arxiv.org/pdf/2002.05797v1.pdf
PWC	https://paperswithcode.com/paper/disentangling-overlapping-beliefs-by
Repo
Framework
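
The abstract above builds on Non-negative Matrix Factorization. As a point of reference, here is a minimal sketch of plain NMF with the standard multiplicative updates. It is not BSMF: the structured belief constraints and the social-graph and keyword-similarity terms are not reproduced, and the toy source-by-claim matrix `X` is a hypothetical example in which two groups share one claim.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF, X ~ W @ H, using multiplicative updates (not BSMF)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Hypothetical toy data: rows are sources, columns are claims.
# Both groups of sources endorse the middle claim (the overlap).
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

W, H, errors = nmf(X, rank=2)
print(np.round(H, 2))  # rows of H are the learned claim profiles
```

In this toy case a shared claim can load on both rows of `H`, which is the kind of overlap that a purely disjoint clustering would miss.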

Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction


Title	Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction
Authors	Zhen Wu, Fei Zhao, Xin-Yu Dai, Shujian Huang, Jiajun Chen
Abstract	Target-oriented opinion words extraction (TOWE) is a new subtask of ABSA, which aims to extract the corresponding opinion words for a given opinion target in a sentence. Recently, neural network methods have been applied to this task and have achieved promising results. However, the difficulty of annotation leaves TOWE datasets insufficient, which heavily limits the performance of neural models. By contrast, abundant review sentiment classification data are easily available at online review sites. These reviews contain substantial latent opinion information and semantic patterns. In this paper, we propose a novel model to transfer this opinion knowledge from resource-rich review sentiment classification datasets to the low-resource TOWE task. To address the challenges in the transfer process, we design an effective transformation method to obtain latent opinions, then integrate them into TOWE. Extensive experimental results show that our model achieves better performance compared to other state-of-the-art methods and significantly outperforms the base model without transferring opinion knowledge. Further analysis validates the effectiveness of our model.
Tasks	Sentiment Analysis
Published	2020-01-07
URL	https://arxiv.org/abs/2001.01989v1
PDF	https://arxiv.org/pdf/2001.01989v1.pdf
PWC	https://paperswithcode.com/paper/latent-opinions-transfer-network-for-target
Repo
Framework

Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction


Title	Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction
Authors	Stefan Eggenreich, Christian Payer, Martin Urschler, Darko Štern
Abstract	In addition to its extensive use in clinical medicine, biological age (BA) is used in legal medicine to assess unknown chronological age (CA) in applications where identification documents are not available. Automatic methods for age estimation proposed in the literature predict point estimates, which can be misleading without the quantification of predictive uncertainty. In our multi-factorial age estimation method from MRI data, we used the Variational Inference approach to estimate the uncertainty of a Bayesian CNN model. Distinguishing model uncertainty from data uncertainty, we interpreted data uncertainty as biological variation, i.e. the range of possible CA of subjects having the same BA.
Tasks	Age Estimation
Published	2020-02-25
URL	https://arxiv.org/abs/2002.10819v1
PDF	https://arxiv.org/pdf/2002.10819v1.pdf
PWC	https://paperswithcode.com/paper/variational-inference-and-bayesian-cnns-for
Repo
Framework
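
The abstract above separates model uncertainty from data uncertainty. One common way to do this, assumed here and not confirmed as the authors' exact procedure, is the law of total variance over several stochastic forward passes: the variance of the predicted means is read as model uncertainty, and the average predicted noise variance as data uncertainty. The function name and the numbers are hypothetical.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance into model and data parts.

    means, variances: arrays of shape (n_weight_samples, n_subjects),
    one row per stochastic forward pass of a Bayesian network.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    model_var = means.var(axis=0)      # spread of the means across weight samples
    data_var = variances.mean(axis=0)  # average predicted noise variance
    return model_var, data_var, model_var + data_var

# Hypothetical example: 4 weight samples, 1 subject, ages in years.
means = [[17.0], [18.0], [19.0], [18.0]]
variances = [[1.0], [1.5], [1.0], [0.5]]
model_var, data_var, total_var = decompose_uncertainty(means, variances)
print(model_var, data_var, total_var)  # [0.5] [1.] [1.5]
```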

GradMix: Multi-source Transfer across Domains and Tasks


Title	GradMix: Multi-source Transfer across Domains and Tasks
Authors	Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
Abstract	The computer vision community is witnessing an unprecedented rate of new tasks being proposed and addressed, thanks to the deep convolutional networks’ capability to find complex mappings from X to Y. The advent of each task often accompanies the release of a large-scale annotated dataset, for supervised training of deep networks. However, it is expensive and time-consuming to manually label a sufficient amount of training data. Therefore, it is important to develop algorithms that can leverage off-the-shelf labeled datasets to learn useful knowledge for the target task. While previous works mostly focus on transfer learning from a single source, we study multi-source transfer across domains and tasks (MS-DTT), in a semi-supervised setting. We propose GradMix, a model-agnostic method applicable to any model trained with a gradient-based learning rule, to transfer knowledge via gradient descent by weighting and mixing the gradients from all sources during training. GradMix follows a meta-learning objective, which assigns layer-wise weights to the source gradients, such that the combined gradient follows the direction that minimizes the loss for a small set of samples from the target dataset. In addition, we propose to adaptively adjust the learning rate for each mini-batch based on its importance to the target task, and a pseudo-labeling method to leverage the unlabeled samples in the target domain. We conduct MS-DTT experiments on two tasks: digit recognition and action recognition, and demonstrate the advantageous performance of the proposed method against multiple baselines.
Tasks	Meta-Learning, Transfer Learning
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03264v1
PDF	https://arxiv.org/pdf/2002.03264v1.pdf
PWC	https://paperswithcode.com/paper/gradmix-multi-source-transfer-across-domains
Repo
Framework

From Speech-to-Speech Translation to Automatic Dubbing


Title	From Speech-to-Speech Translation to Automatic Dubbing
Authors	Marcello Federico, Robert Enyedi, Roberto Barra-Chicote, Ritwik Giri, Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf
Abstract	We present enhancements to a speech-to-speech translation pipeline in order to perform automatic dubbing. Our architecture features neural machine translation generating output of preferred length, prosodic alignment of the translation with the original speech segments, neural text-to-speech with fine tuning of the duration of each utterance, and, finally, audio rendering that enriches text-to-speech output with background noise and reverberation extracted from the original audio. We report on a subjective evaluation of automatic dubbing of excerpts of TED Talks from English into Italian, which measures the perceived naturalness of automatic dubbing and the relative importance of each proposed enhancement.
Tasks	Machine Translation
Published	2020-01-19
URL	https://arxiv.org/abs/2001.06785v3
PDF	https://arxiv.org/pdf/2001.06785v3.pdf
PWC	https://paperswithcode.com/paper/from-speech-to-speech-translation-to
Repo
Framework

Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion


Title	Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion
Authors	José Pedro Iglesias, Carl Olsson, Marcus Valtonen Örnhag
Abstract	Fitting a matrix of a given rank to data in a least squares sense can be done very effectively using 2nd order methods such as Levenberg-Marquardt by explicitly optimizing over a bilinear parameterization of the matrix. In contrast, when applying more general singular value penalties, such as weighted nuclear norm priors, direct optimization over the elements of the matrix is typically used. Due to non-differentiability of the resulting objective function, first order sub-gradient or splitting methods are predominantly used. While these offer rapid iterations, it is well known that they become inefficient near the minimum due to zig-zagging, and in practice one is therefore often forced to settle for an approximate solution. In this paper we show that more accurate results can in many cases be achieved with 2nd order methods. Our main result shows how to construct bilinear formulations, for a general class of regularizers including weighted nuclear norm penalties, that are provably equivalent to the original problems. With these formulations the regularizing function becomes twice differentiable and 2nd order methods can be applied. We show experimentally, on a number of structure from motion problems, that our approach outperforms state-of-the-art methods.
Tasks
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10281v1
PDF	https://arxiv.org/pdf/2003.10281v1.pdf
PWC	https://paperswithcode.com/paper/accurate-optimization-of-weighted-nuclear
Repo
Framework
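
The bilinear formulations in the abstract above generalize a well-known identity for the unweighted nuclear norm: the norm of X equals the minimum of (||B||_F^2 + ||C||_F^2) / 2 over all factorizations X = B C^T. The sketch below only checks that unweighted identity numerically on a random matrix; it does not implement the paper's weighted regularizers, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))  # rank-3 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuclear_norm = s.sum()

# Balanced factorization X = B @ C.T built from the SVD attains the minimum.
B = U * np.sqrt(s)     # scale each column of U by sqrt(singular value)
C = Vt.T * np.sqrt(s)  # scale each column of V by sqrt(singular value)
bilinear_value = 0.5 * (np.linalg.norm(B) ** 2 + np.linalg.norm(C) ** 2)

# Rescaling the factors still reproduces X, but the smooth penalty is larger.
B2, C2 = 3.0 * B, C / 3.0
unbalanced_value = 0.5 * (np.linalg.norm(B2) ** 2 + np.linalg.norm(C2) ** 2)

print(nuclear_norm, bilinear_value, unbalanced_value)
```

The penalty on `B` and `C` is a smooth quadratic, which is what makes second-order solvers applicable to the factored problem.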

Data-Driven Discovery of Coarse-Grained Equations


Title	Data-Driven Discovery of Coarse-Grained Equations
Authors	Joseph Bakarji, Daniel M. Tartakovsky
Abstract	We introduce a general method for learning probability density function (PDF) equations from Monte Carlo simulations of partial differential equations with uncertain (random) parameters and forcings. The method relies on sparse linear regression to discover the relevant terms in the PDF equation. Unlike other methods for equation discovery, our approach accounts for salient properties of PDF equations, such as positivity, smoothness and conservation. Our results reveal a promising direction for data-driven discovery of coarse-grained PDEs in general.
Tasks
Published	2020-01-30
URL	https://arxiv.org/abs/2002.00790v3
PDF	https://arxiv.org/pdf/2002.00790v3.pdf
PWC	https://paperswithcode.com/paper/data-driven-discovery-of-coarse-grained
Repo
Framework
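
The abstract above relies on sparse linear regression to select the terms of an equation from a library of candidates. A minimal sketch of that idea follows, using sequential thresholded least squares as an assumed stand-in for the authors' solver. The candidate library, the "true" coefficients, and the threshold are hypothetical, and the sketch does not enforce the positivity, smoothness, or conservation properties mentioned in the abstract.

```python
import numpy as np

def stlsq(theta, target, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit using only the candidate terms that survived thresholding.
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

# Hypothetical library: 4 candidate terms evaluated at 200 sample points.
rng = np.random.default_rng(0)
theta = rng.standard_normal((200, 4))
true_coef = np.array([0.0, -1.0, 0.5, 0.0])  # only two terms are active
target = theta @ true_coef + 0.01 * rng.standard_normal(200)

coef = stlsq(theta, target)
print(np.round(coef, 3))
```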

Overly Optimistic Prediction Results on Imbalanced Data: Flaws and Benefits of Applying Over-sampling


Title	Overly Optimistic Prediction Results on Imbalanced Data: Flaws and Benefits of Applying Over-sampling
Authors	Gilles Vandewiele, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, Filip De Turck, Kristien Roelens, Johan Decruyenaere, Sofie Van Hoecke, Thomas Demeester
Abstract	Information extracted from electrohysterography recordings could potentially prove to be an interesting additional source of information to estimate the risk of preterm birth. Recently, a large number of studies have reported near-perfect results to distinguish between recordings of patients that will deliver term or preterm using a public resource, called the Term/Preterm Electrohysterogram database. However, we argue that these results are overly optimistic due to a methodological flaw being made. In this work, we focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets. We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified. Moreover, we evaluate the actual impact of over-sampling on predictive performance, when applied prior to data partitioning, using the same methodologies of related studies, to provide a realistic view of these methodologies’ generalization capabilities. We make our research reproducible by providing all the code under an open license.
Tasks
Published	2020-01-15
URL	https://arxiv.org/abs/2001.06296v1
PDF	https://arxiv.org/pdf/2001.06296v1.pdf
PWC	https://paperswithcode.com/paper/overly-optimistic-prediction-results-on
Repo
Framework
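
The flaw described in the abstract above can be made concrete with a small sketch. It uses random duplication of minority rows as a simple stand-in for the over-sampling methods used in the studies (an assumption; synthetic-sample methods leak near-duplicates in place of exact copies) and counts how many test rows also appear in the training set. The data and helper names are hypothetical.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (label 1) until both classes are equal in size."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

def split(X, y, rng, test_fraction=0.3):
    """Random partition into disjoint training and test index sets."""
    perm = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, train = perm[:n_test], perm[n_test:]
    return X[train], y[train], X[test], y[test]

def leaked_test_rows(X_train, X_test):
    """Count test rows that are exact copies of some training row."""
    train_rows = {row.tobytes() for row in X_train}
    return sum(row.tobytes() in train_rows for row in X_test)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.array([1] * 10 + [0] * 90)  # 10% minority class

# Flawed order: over-sample first, then split. Copies land on both sides.
X_os, y_os = random_oversample(X, y, rng)
X_tr, y_tr, X_te, y_te = split(X_os, y_os, rng)
leaked_flawed = leaked_test_rows(X_tr, X_te)

# Sound order: split first, then over-sample the training part only.
X_tr, y_tr, X_te, y_te = split(X, y, rng)
X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
leaked_correct = leaked_test_rows(X_tr, X_te)

print(leaked_flawed, leaked_correct)
```

With the flawed order the classifier is evaluated on rows it has effectively already seen, which inflates the reported scores.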

ELM-based Frame Synchronization in Burst-Mode Communication Systems with Nonlinear Distortion


Title	ELM-based Frame Synchronization in Burst-Mode Communication Systems with Nonlinear Distortion
Authors	Chaojin Qing, Wang Yu, Bin Cai, Jiafan Wang, Chuan Huang
Abstract	In burst-mode communication systems, the quality of frame synchronization (FS) at receivers significantly impacts the overall system performance. To guarantee FS, an extreme learning machine (ELM)-based synchronization method is proposed to overcome the nonlinear distortion caused by nonlinear devices or blocks. In the proposed method, a preprocessing step is first performed to capture the coarse features of the synchronization metric (SM) by using empirical knowledge. Then, an ELM-based FS network is employed to reduce the system’s nonlinear distortion and improve SMs. Experimental results indicate that, compared with existing methods, our approach could significantly reduce the error probability of FS while improving robustness and generalization.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.07599v1
PDF	https://arxiv.org/pdf/2002.07599v1.pdf
PWC	https://paperswithcode.com/paper/elm-based-frame-synchronization-in-burst-mode
Repo
Framework

Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames


Title	Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames
Authors	Osamu Shouno
Abstract	Recent advances in deep learning have significantly improved performance of video prediction. However, state-of-the-art methods still suffer from blurriness and distortions in their future predictions, especially when there are large motions between frames. To address these issues, we propose a deep residual network with a hierarchical architecture where each layer makes a prediction of future state at a different spatial resolution, and these predictions of different layers are merged via top-down connections to generate future frames. We trained our model with adversarial and perceptual loss functions, and evaluated it on a natural video dataset captured by car-mounted cameras. Our model quantitatively outperforms state-of-the-art baselines in future frame prediction on video sequences of both largely and slightly changing frames. Furthermore, our model generates future frames with finer details and textures that are perceptually more realistic than the baselines, especially under fast camera motions.
Tasks	Video Prediction
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08635v1
PDF	https://arxiv.org/pdf/2003.08635v1.pdf
PWC	https://paperswithcode.com/paper/photo-realistic-video-prediction-on-natural
Repo
Framework

Learning Attentive Pairwise Interaction for Fine-Grained Classification


Title	Learning Attentive Pairwise Interaction for Fine-Grained Classification
Authors	Peiqin Zhuang, Yali Wang, Yu Qiao
Abstract	Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning a discriminative representation of each individual input image. On the other hand, humans can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which can progressively recognize a pair of fine-grained images by interaction. Specifically, API-Net first learns a mutual feature vector to capture semantic differences in the input pair. It then compares this mutual vector with individual vectors to generate gates for each input image. These distinct gate vectors inherit mutual context on semantic differences, which allow API-Net to attentively capture contrastive clues by pairwise interaction between two images. Additionally, we train API-Net in an end-to-end manner with a score ranking regularization, which can further generalize API-Net by taking feature priorities into account. We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms recent SOTA methods on all five: CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).
Tasks	Fine-Grained Image Classification
Published	2020-02-24
URL	https://arxiv.org/abs/2002.10191v1
PDF	https://arxiv.org/pdf/2002.10191v1.pdf
PWC	https://paperswithcode.com/paper/learning-attentive-pairwise-interaction-for
Repo
Framework

Multiplication fusion of sparse and collaborative-competitive representation for image classification


Title	Multiplication fusion of sparse and collaborative-competitive representation for image classification
Authors	Zi-Qi Li, Jun Sun, Xiao-Jun Wu, He-Feng Yin
Abstract	Representation based classification methods have become a hot research topic during the past few years, and the two most prominent approaches are sparse representation based classification (SRC) and collaborative representation based classification (CRC). CRC reveals that it is the collaborative representation rather than the sparsity that makes SRC successful. Nevertheless, the dense representation of CRC may not be discriminative which will degrade its performance for classification tasks. To alleviate this problem to some extent, we propose a new method called sparse and collaborative-competitive representation based classification (SCCRC) for image classification. Firstly, the coefficients of the test sample are obtained by SRC and CCRC, respectively. Then the fused coefficient is derived by multiplying the coefficients of SRC and CCRC. Finally, the test sample is designated to the class that has the minimum residual. Experimental results on several benchmark databases demonstrate the efficacy of our proposed SCCRC. The source code of SCCRC is accessible at https://github.com/li-zi-qi/SCCRC.
Tasks	Image Classification, Sparse Representation-based Classification
Published	2020-01-20
URL	https://arxiv.org/abs/2001.07090v1
PDF	https://arxiv.org/pdf/2001.07090v1.pdf
PWC	https://paperswithcode.com/paper/multiplication-fusion-of-sparse-and
Repo
Framework
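
The last two steps in the abstract above (multiply the two coefficient vectors, then pick the class with the minimum residual) can be sketched as follows. The dictionary, the test sample, and both coefficient vectors are hypothetical hand-written values; in the actual method they would come from the SRC and CCRC solvers, which are not implemented here, and any normalization the authors apply is not reproduced.

```python
import numpy as np

def fuse_and_classify(D, labels, y, coef_a, coef_b):
    """Fuse two coefficient vectors by element-wise product, then
    assign y to the class whose atoms give the smallest residual."""
    fused = coef_a * coef_b
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y using only the atoms (columns) of class c.
        residuals[int(c)] = float(np.linalg.norm(y - D[:, mask] @ fused[mask]))
    predicted = min(residuals, key=residuals.get)
    return predicted, residuals

# Hypothetical dictionary: columns are training samples, two per class.
D = np.array([
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.0],
])
labels = np.array([0, 0, 1, 1])
y = np.array([1.0, 0.05, 0.0])  # test sample, close to the class-0 atoms

# Stand-ins for the coefficients produced by the two coding schemes.
coef_a = np.array([0.8, 0.3, 0.05, 0.1])
coef_b = np.array([0.9, 0.2, 0.1, 0.0])

predicted, residuals = fuse_and_classify(D, labels, y, coef_a, coef_b)
print(predicted, residuals)
```

The product suppresses any atom that either coding scheme considers irrelevant, so only atoms favored by both contribute to the reconstruction.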

Applying Recent Innovations from NLP to MOOC Student Course Trajectory Modeling


Title	Applying Recent Innovations from NLP to MOOC Student Course Trajectory Modeling
Authors	Clarence Chen, Zachary Pardos
Abstract	This paper presents several strategies that can improve neural network-based predictive methods for MOOC student course trajectory modeling, applying multiple ideas previously applied to tackle NLP (Natural Language Processing) tasks. In particular, this paper investigates LSTM networks enhanced with two forms of regularization, along with the more recently introduced Transformer architecture.
Tasks
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08333v1
PDF	https://arxiv.org/pdf/2001.08333v1.pdf
PWC	https://paperswithcode.com/paper/applying-recent-innovations-from-nlp-to-mooc
Repo
Framework

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction


Title	Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction
Authors	Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li
Abstract	Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to deal with spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and preserve fine details. On the other hand, the multi-level temporal discrete wavelet transform, which operates on the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over state-of-the-art works.
Tasks	Video Prediction
Published	2020-02-23
URL	https://arxiv.org/abs/2002.09905v1
PDF	https://arxiv.org/pdf/2002.09905v1.pdf
PWC	https://paperswithcode.com/paper/exploring-spatial-temporal-multi-frequency
Repo
Framework
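
The temporal wavelet decomposition described in the abstract above can be illustrated with a one-level transform along the time axis, applied twice for a second level. The Haar wavelet is assumed here for simplicity (the abstract does not name the wavelet), and the random "video" is a hypothetical stand-in for real frames.

```python
import numpy as np

def haar_dwt_time(frames):
    """One-level Haar DWT along axis 0 (time). Needs an even frame count."""
    frames = np.asarray(frames, dtype=float)
    even, odd = frames[0::2], frames[1::2]
    low = (even + odd) / np.sqrt(2.0)   # slow motion / static content
    high = (even - odd) / np.sqrt(2.0)  # fast frame-to-frame changes
    return low, high

def haar_idwt_time(low, high):
    """Inverse of haar_dwt_time: interleave the reconstructed frames."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    out = np.empty((2 * low.shape[0],) + low.shape[1:])
    out[0::2], out[1::2] = even, odd
    return out

rng = np.random.default_rng(0)
video = rng.random((8, 4, 4))        # 8 frames of 4x4 pixels

low, high = haar_dwt_time(video)     # level 1: two groups of 4 sub-band frames
low2, high2 = haar_dwt_time(low)     # level 2: split the low band again
recon = haar_idwt_time(low, high)

print(low.shape, low2.shape, np.allclose(recon, video))
```

Applying the same pair of filters along the height and width axes of each frame gives the spatial sub-bands; the transform is orthonormal, so no information is lost.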

VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation


Title	VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation
Authors	Ryan Hoque, Daniel Seita, Ashwin Balakrishna, Aditya Ganapathi, Ajay Kumar Tanwani, Nawid Jamali, Katsu Yamane, Soshi Iba, Ken Goldberg
Abstract	Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery and more. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics, which can be efficiently reused to accomplish a variety of different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks both in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance for cloth manipulation tasks, and results suggest that leveraging RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at https://sites.google.com/view/fabric-vsf/.
Tasks	Video Prediction
Published	2020-03-19
URL	https://arxiv.org/abs/2003.09044v1
PDF	https://arxiv.org/pdf/2003.09044v1.pdf
PWC	https://paperswithcode.com/paper/visuospatial-foresight-for-multi-step-multi
Repo
Framework