January 29, 2020


Paper Group ANR 659


Understand customer reviews with less data and in short time: pretrained language representation and active learning


Title	Understand customer reviews with less data and in short time: pretrained language representation and active learning
Authors	Yanwei Cui, Xavier Illy
Abstract	In this paper, we address customer review understanding problems by using supervised machine learning approaches, in order to achieve fully automatic review aspect categorisation and sentiment analysis. In general, such supervised learning algorithms require domain-specific expert knowledge for generating high-quality labeled training data, and the cost of labeling can be very high. To achieve a production-ready, machine-learning-enabled customer review analysis tool with only a limited amount of data and within a reasonable training data collection time, we propose to use a pre-trained language representation to boost model performance and an active learning framework to accelerate the iterative training process. The results show that, with the integration of both components, fully automatic review analysis can be achieved at a much faster pace.
Tasks	Active Learning, Sentiment Analysis
Published	2019-10-29
URL	https://arxiv.org/abs/1911.01198v1
PDF	https://arxiv.org/pdf/1911.01198v1.pdf
PWC	https://paperswithcode.com/paper/understand-customer-reviews-with-less-data
Repo
Framework
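
The active learning loop mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which acquisition function the authors use, so entropy-based uncertainty sampling is an assumption here, and the names `entropy`, `select_for_labeling`, and `pool_probs` are hypothetical. The idea is that, at each iteration, the current model scores the unlabeled pool and the most uncertain reviews are sent to annotators first.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pool_probs, batch_size):
    """Return indices of the `batch_size` most uncertain pool items.

    Assumption: uncertainty is measured by predictive entropy; the paper
    may use a different acquisition function.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled reviews (3 classes each).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
chosen = select_for_labeling(pool_probs, batch_size=2)
print(chosen)
```

In a full loop the chosen items would be labeled, added to the training set, and the model retrained before the next selection round.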

Crowd Counting Using Scale-Aware Attention Networks


Title	Crowd Counting Using Scale-Aware Attention Networks
Authors	Mohammad Asiful Hossain, Mehrdad Hosseinzadeh, Omit Chanda, Yang Wang
Abstract	In this paper, we consider the problem of crowd counting in images. Given an image of a crowded scene, our goal is to estimate the density map of this image, where each pixel value in the density map corresponds to the crowd density at the corresponding location in the image. Given the estimated density map, the final crowd count can be obtained by summing over all values in the density map. One challenge of crowd counting is the scale variation in images. In this work, we propose a novel scale-aware attention network to address this challenge. Using the attention mechanism popular in recent deep learning architectures, our model can automatically focus on certain global and local scales appropriate for the image. By combining these global and local scale attentions, our model outperforms other state-of-the-art methods for crowd counting on several benchmark datasets.
Tasks	Crowd Counting
Published	2019-03-05
URL	http://arxiv.org/abs/1903.02025v1
PDF	http://arxiv.org/pdf/1903.02025v1.pdf
PWC	https://paperswithcode.com/paper/crowd-counting-using-scale-aware-attention
Repo
Framework
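
The relationship between a density map and the crowd count described in the abstract above can be shown with a small sketch. Only the final step (summing the density map) comes from the abstract; building the map by placing one normalized Gaussian per annotated head is a common convention assumed here, and `gaussian_kernel`, `density_map`, and `crowd_count` are hypothetical helper names.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [
            math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, size=5, sigma=1.0):
    """Place one unit-mass kernel per annotated head position (row, col)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                # Mass falling outside the image is clipped.
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy][dx]
    return dmap


def crowd_count(dmap):
    """The count is the sum over all density values."""
    return sum(sum(row) for row in dmap)


# Three hypothetical head annotations, all far enough from the border
# that no kernel mass is clipped.
heads = [(5, 5), (10, 12), (14, 7)]
dmap = density_map(20, 20, heads)
count = crowd_count(dmap)
print(round(count, 6))
```

Because each head contributes exactly one unit of mass, the sum of the map recovers the number of annotated people; a network trained to regress such maps is evaluated the same way.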

MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets


Title	MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets
Authors	Sungjoo Ha, Martin Kersner, Beomsu Kim, Seokjun Seo, Dongyoung Kim
Abstract	When there is a mismatch between the target identity and the driver identity, face reenactment suffers severe degradation in the quality of the result, especially in a few-shot setting. The identity preservation problem, where the model loses the detailed information of the target leading to a defective output, is the most common failure mode. The problem has several potential sources such as the identity of the driver leaking due to the identity mismatch, or dealing with unseen large poses. To overcome such problems, we introduce components that address the mentioned problem: image attention block, target feature alignment, and landmark transformer. Through attending and warping the relevant features, the proposed architecture, called MarioNETte, produces high-quality reenactments of unseen identities in a few-shot setting. In addition, the landmark transformer dramatically alleviates the identity preservation problem by isolating the expression geometry through landmark disentanglement. Comprehensive experiments are performed to verify that the proposed framework can generate highly realistic faces, outperforming all other baselines, even under a significant mismatch of facial characteristics between the target and the driver.
Tasks	Face Reenactment
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08139v1
PDF	https://arxiv.org/pdf/1911.08139v1.pdf
PWC	https://paperswithcode.com/paper/marionette-few-shot-face-reenactment
Repo
Framework

STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection


Title	STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection
Authors	William McNally, Alexander Wong, John McPhee
Abstract	While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors.
Tasks	Multimodal Activity Recognition, Pose Estimation, Skeleton Based Action Recognition, Temporal Action Localization
Published	2019-02-26
URL	http://arxiv.org/abs/1902.10024v1
PDF	http://arxiv.org/pdf/1902.10024v1.pdf
PWC	https://paperswithcode.com/paper/star-net-action-recognition-using-spatio
Repo
Framework

Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization


Title	Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization
Authors	Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng
Abstract	In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model, termed Cycle-SUM, adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preservation and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bi-directional LSTM network that learns video representations that embed the long-range relationships among video frames. The evaluator defines a learnable information-preserving metric between the original video and the summary video and “supervises” the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN learns to reconstruct the original video from the summary video while the backward GAN learns to invert the processing. The consistency between the outputs of such cycle learning is adopted as the information-preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and such a cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines.
Tasks	Unsupervised Video Summarization, Video Summarization
Published	2019-04-17
URL	http://arxiv.org/abs/1904.08265v1
PDF	http://arxiv.org/pdf/1904.08265v1.pdf
PWC	https://paperswithcode.com/paper/cycle-sum-cycle-consistent-adversarial-lstm
Repo
Framework

Data-driven Modelling of Dynamical Systems Using Tree Adjoining Grammar and Genetic Programming


Title	Data-driven Modelling of Dynamical Systems Using Tree Adjoining Grammar and Genetic Programming
Authors	Dhruv Khandelwal, Maarten Schoukens, Roland Tóth
Abstract	State-of-the-art methods for data-driven modelling of non-linear dynamical systems typically involve interactions with an expert user. In order to partially automate the process of modelling physical systems from data, many EA-based approaches have been proposed for model-structure selection, with special focus on non-linear systems. Recently, an approach for data-driven modelling of non-linear dynamical systems using Genetic Programming (GP) was proposed. The novelty of the method was the modelling of noise and the use of Tree Adjoining Grammar to shape the search-space explored by GP. In this paper, we report results achieved by the proposed method on three case studies. Each of the case studies considered here is based on real physical systems. The case studies pose a variety of challenges. In particular, these challenges range over varying amounts of prior knowledge of the true system, amount of data available, the complexity of the dynamics of the system, and the nature of non-linearities in the system. Based on the results achieved for the case studies, we critically analyse the performance of the proposed method.
Tasks
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03152v1
PDF	http://arxiv.org/pdf/1904.03152v1.pdf
PWC	https://paperswithcode.com/paper/data-driven-modelling-of-dynamical-systems
Repo
Framework

FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment


Title	FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment
Authors	Jiangning Zhang, Xianfang Zeng, Yusu Pan, Yong Liu, Yu Ding, Changjie Fan
Abstract	Recent face reenactment studies have achieved remarkable success either between two identities or in the many-to-one task. However, existing methods have limited scalability when the target person is not a predefined specific identity. To address this limitation, we present a novel many-to-many face reenactment framework, named FaceSwapNet, which allows transferring facial expressions and movements from one source face to arbitrary targets. Our proposed approach is composed of two main modules: the landmark swapper and the landmark-guided generator. Instead of maintaining independent models for each pair of persons, the former module uses two encoders and one decoder to adapt anyone’s face landmark to target persons. Using the neutral expression of the target person as a reference image, the latter module leverages geometry information from the swapped landmark to generate photo-realistic and emotion-alike images. In addition, a novel triplet perceptual loss is proposed to force the generator to learn geometry and appearance information simultaneously. We evaluate our model on the RaFD dataset and the results demonstrate the superior quality of reenacted images as well as the flexibility of transferring facial movements between identities.
Tasks	Face Reenactment
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11805v1
PDF	https://arxiv.org/pdf/1905.11805v1.pdf
PWC	https://paperswithcode.com/paper/faceswapnet-landmark-guided-many-to-many-face
Repo
Framework

Curious iLQR: Resolving Uncertainty in Model-based RL


Title	Curious iLQR: Resolving Uncertainty in Model-based RL
Authors	Sarah Bechtle, Yixin Lin, Akshara Rai, Ludovic Righetti, Franziska Meier
Abstract	Curiosity as a means to explore during reinforcement learning problems has recently become very popular. However, very little progress has been made in utilizing curiosity for learning control. In this work, we propose a model-based reinforcement learning (MBRL) framework that combines Bayesian modeling of the system dynamics with curious iLQR, an iterative LQR approach that considers model uncertainty. During trajectory optimization the curious iLQR attempts to minimize both the task-dependent cost and the uncertainty in the dynamics model. We demonstrate the approach on reaching tasks with 7-DoF manipulators in simulation and on a real robot. Our experiments show that MBRL with curious iLQR reaches desired end-effector targets more reliably and with fewer system rollouts when learning a new task from scratch, and that the learned model generalizes better to new reaching tasks.
Tasks
Published	2019-04-15
URL	https://arxiv.org/abs/1904.06786v2
PDF	https://arxiv.org/pdf/1904.06786v2.pdf
PWC	https://paperswithcode.com/paper/curious-ilqr-resolving-uncertainty-in-model
Repo
Framework

Precomputing Datalog evaluation plans in large-scale scenarios


Title	Precomputing Datalog evaluation plans in large-scale scenarios
Authors	Alessio Fiorentino, Nicola Leone, Marco Manna, Simona Perri, Jessica Zangari
Abstract	With the ever-growing demand for semantic Web services over large databases, efficient evaluation of Datalog queries is attracting renewed interest among researchers and industry experts. In this scenario, to reduce memory consumption and possibly optimize execution times, the paper proposes novel techniques to determine an optimal indexing schema for the underlying database together with suitable body-orderings for the Datalog rules. The new approach is compared with the standard execution plans implemented in DLV over widely used ontological benchmarks. The results confirm that the memory usage can be significantly reduced without paying any cost in efficiency. This paper is under consideration in Theory and Practice of Logic Programming (TPLP).
Tasks
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12495v1
PDF	https://arxiv.org/pdf/1907.12495v1.pdf
PWC	https://paperswithcode.com/paper/precomputing-datalog-evaluation-plans-in
Repo
Framework

A General FOFE-net Framework for Simple and Effective Question Answering over Knowledge Bases


Title	A General FOFE-net Framework for Simple and Effective Question Answering over Knowledge Bases
Authors	Dekun Wu, Nana Nosirova, Hui Jiang, Mingbin Xu
Abstract	Question answering over knowledge bases (KB-QA) has recently become a popular research topic in NLP. One popular way to solve the KB-QA problem is to make use of a pipeline of several NLP modules, including entity discovery and linking (EDL) and relation detection. Recent successes on the KB-QA task usually involve complex network structures with sophisticated heuristics. Inspired by a previous work that builds a strong KB-QA baseline, we propose a simple but general neural model composed of fixed-size ordinally forgetting encoding (FOFE) and deep neural networks, called FOFE-net, to solve the KB-QA problem at different stages. For evaluation, we use two popular KB-QA datasets, SimpleQuestions and WebQSP, and a newly created dataset, FreebaseQA. The experimental results show that FOFE-net performs well on the KB-QA subtasks of entity discovery and linking (EDL) and relation detection, in turn pushing the overall KB-QA system to achieve strong results on all datasets.
Tasks	Question Answering
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12356v1
PDF	http://arxiv.org/pdf/1903.12356v1.pdf
PWC	https://paperswithcode.com/paper/a-general-fofe-net-framework-for-simple-and
Repo
Framework
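
The abstract above names fixed-size ordinally forgetting encoding (FOFE) without spelling it out. As a minimal sketch, FOFE is usually defined by the recursion z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector of the t-th token and alpha is a forgetting factor in (0, 1). The function name `fofe_encode`, the toy vocabulary, and the choice alpha = 0.5 are assumptions for illustration; how FOFE-net feeds these codes into its networks is not shown here.

```python
def fofe_encode(token_ids, vocab_size, alpha=0.5):
    """Encode a variable-length token sequence as one fixed-size vector.

    Applies z_t = alpha * z_{t-1} + e_t, so earlier tokens are
    exponentially down-weighted relative to later ones.
    """
    z = [0.0] * vocab_size
    for token in token_ids:
        z = [alpha * value for value in z]  # forget older context
        z[token] += 1.0                     # add one-hot of current token
    return z


# Toy vocabulary (hypothetical): A=0, B=1, C=2.
code_abc = fofe_encode([0, 1, 2], vocab_size=3)
code_abcbc = fofe_encode([0, 1, 2, 1, 2], vocab_size=3)
print(code_abc)
print(code_abcbc)
```

Both sequences map to vectors of the same length (the vocabulary size), which is what lets a plain feed-forward network consume text of arbitrary length.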

Sparse Least Squares Low Rank Kernel Machines


Title	Sparse Least Squares Low Rank Kernel Machines
Authors	Di Xu, Manjing Fang, Xia Hong, Junbin Gao
Abstract	A general framework of least squares support vector machine with low rank kernels, referred to as LR-LSSVM, is introduced in this paper. The special structure of low rank kernels with a controlled model size brings sparsity as well as computational efficiency to the proposed model. Meanwhile, a two-step optimization algorithm with three different criteria is proposed and various experiments are carried out using the example of the so-called robust RBF kernel to validate the model. The experimental results show that the performance of the proposed algorithm is comparable or superior to several existing kernel machines.
Tasks
Published	2019-01-29
URL	https://arxiv.org/abs/1901.10098v2
PDF	https://arxiv.org/pdf/1901.10098v2.pdf
PWC	https://paperswithcode.com/paper/sparse-least-squares-low-rank-kernel-machines
Repo
Framework

Risk Management via Anomaly Circumvent: Mnemonic Deep Learning for Midterm Stock Prediction


Title	Risk Management via Anomaly Circumvent: Mnemonic Deep Learning for Midterm Stock Prediction
Authors	Xinyi Li, Yinchuan Li, Xiao-Yang Liu, Christina Dan Wang
Abstract	Midterm stock price prediction is crucial for value investments in the stock market. However, most deep learning models are essentially short-term and applying them to midterm predictions encounters large cumulative errors because they cannot avoid anomalies. In this paper, we propose a novel deep neural network Mid-LSTM for midterm stock prediction, which incorporates the market trend as hidden states. First, based on the autoregressive moving average model (ARMA), a midterm ARMA is formulated by taking into consideration both hidden states and the capital asset pricing model. Then, a midterm LSTM-based deep neural network is designed, which consists of three components: LSTM, hidden Markov model and linear regression networks. The proposed Mid-LSTM can avoid anomalies to reduce large prediction errors, and has good explanatory effects on the factors affecting stock prices. Extensive experiments on S&P 500 stocks show that (i) the proposed Mid-LSTM achieves 2-4% improvement in prediction accuracy, and (ii) in portfolio allocation investment, we achieve up to 120.16% annual return and 2.99 average Sharpe ratio.
Tasks	Stock Prediction, Stock Price Prediction
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01112v1
PDF	https://arxiv.org/pdf/1908.01112v1.pdf
PWC	https://paperswithcode.com/paper/risk-management-via-anomaly-circumvent
Repo
Framework
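
The abstract above reports an average Sharpe ratio without stating how it is computed. A minimal sketch of the common per-period definition (mean excess return divided by the sample standard deviation of excess returns) follows. The function name `sharpe_ratio`, the `risk_free` parameter, and the return series are hypothetical, and no annualisation is applied, so the numbers are not comparable with the paper's results.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std.

    Assumption: `risk_free` is a constant per-period rate; the paper's
    exact convention (annualisation, risk-free proxy) is not given.
    """
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical per-period portfolio returns.
returns = [0.01, 0.02, -0.01, 0.03]
sr = sharpe_ratio(returns)
print(round(sr, 4))
```

The ratio is invariant to scaling all returns by a positive constant, which is why it is often used to compare strategies with different leverage.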

Calibration tests in multi-class classification: A unifying framework


Title	Calibration tests in multi-class classification: A unifying framework
Authors	David Widmann, Fredrik Lindsten, Dave Zachariah
Abstract	In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale.
Tasks	Calibration
Published	2019-10-24
URL	https://arxiv.org/abs/1910.11385v2
PDF	https://arxiv.org/pdf/1910.11385v2.pdf
PWC	https://paperswithcode.com/paper/calibration-tests-in-multi-class
Repo
Framework
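
For context on the measures the abstract above generalizes, here is a minimal sketch of the classical binned expected calibration error (ECE) computed on the most confident prediction only. This is the top-label measure that the abstract argues is often insufficient, not the kernel-based estimators the paper proposes. The function name, the equal-width binning, and the example data are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned top-label ECE.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, True where the chosen class was right.
    Returns the weighted average of |accuracy - mean confidence| per bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical predictions: four at 0.9 confidence (3 correct),
# two at 0.6 confidence (1 correct).
confidences = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
correct = [True, True, True, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 4))
```

The result depends on the binning scheme and has no natural unit or scale, which is the interpretability gap the paper's test statistics and p-value bounds aim to close.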

Semantic Image Networks for Human Action Recognition


Title	Semantic Image Networks for Human Action Recognition
Authors	Sunder Ali Khowaja, Seok-Lyong Lee
Abstract	In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to the approximate rank pooling which summarizes the motion characteristics in single or multiple images. It incorporates the background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is to improve the action-motion dynamics by focusing on the region which is important for action recognition and encoding the temporal variances using the frame ranking method. We also propose the sequential combination of Inception-ResNetv2 and long-short-term memory network (LSTM) to leverage the temporal variances for improved recognition performance. Extensive analysis has been carried out on UCF101 and HMDB51 datasets which are widely used in action recognition studies. We show that (i) the semantic image generates better activations and converges faster than its original variant, (ii) using segmentation prior to approximate rank pooling yields better recognition performance, (iii) the use of LSTM leverages the temporal variance information from approximate rank pooling to model the action behavior better than the base network, (iv) the proposed representations can be adaptive as they can be used with existing methods such as temporal segment networks to improve the recognition performance, and (v) our proposed four-stream network architecture comprising semantic images and semantic optical flows achieves state-of-the-art performance, 95.9% and 73.5% recognition accuracy on UCF101 and HMDB51, respectively.
Tasks	Temporal Action Localization
Published	2019-01-21
URL	http://arxiv.org/abs/1901.06792v1
PDF	http://arxiv.org/pdf/1901.06792v1.pdf
PWC	https://paperswithcode.com/paper/semantic-image-networks-for-human-action
Repo
Framework

Adaptive Estimators Show Information Compression in Deep Neural Networks


Title	Adaptive Estimators Show Information Compression in Deep Neural Networks
Authors	Ivan Chelombiev, Conor Houghton, Cian O’Donnell
Abstract	To improve how neural networks function it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions. In contrast, networks with non-saturating activation functions achieved comparable levels of task performance but did not show compression. In this paper we developed more robust mutual information estimation techniques that adapt to the hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. Using these adaptive estimation techniques, we explored compression in networks with a range of different activation functions. Firstly, with two improved methods of estimation, we show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions. We also find that there is a large amount of variation in compression between different network initializations. Secondly, we see that L2 regularization leads to significantly increased compression, while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization.
Tasks	L2 Regularization
Published	2019-02-24
URL	http://arxiv.org/abs/1902.09037v1
PDF	http://arxiv.org/pdf/1902.09037v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-estimators-show-information
Repo
Framework