Paper Group ANR 659
Understand customer reviews with less data and in short time: pretrained language representation and active learning
Title | Understand customer reviews with less data and in short time: pretrained language representation and active learning |
Authors | Yanwei Cui, Xavier Illy |
Abstract | In this paper, we address customer review understanding problems by using supervised machine learning approaches, in order to achieve a fully automatic review aspects categorisation and sentiment analysis. In general, such supervised learning algorithms require domain-specific expert knowledge for generating high quality labeled training data, and the cost of labeling can be very high. To achieve an in-production customer review machine learning enabled analysis tool with only a limited amount of data and within a reasonable training data collection time, we propose to use pre-trained language representation to boost model performance and active learning framework for accelerating the iterative training process. The results show that with integration of both components, the fully automatic review analysis can be achieved at a much faster pace. |
Tasks | Active Learning, Sentiment Analysis |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1911.01198v1, https://arxiv.org/pdf/1911.01198v1.pdf |
PWC | https://paperswithcode.com/paper/understand-customer-reviews-with-less-data |
Repo | |
Framework | |
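The active learning loop mentioned in the abstract above can be illustrated with a small sketch. The abstract does not say which query strategy the authors use; entropy-based uncertainty sampling is assumed here as one common choice, and the function names and probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predictions, batch_size):
    """Return indices of the `batch_size` most uncertain unlabeled reviews.

    `predictions` holds one class-probability list per unlabeled review.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Made-up model outputs for four unlabeled reviews (3 sentiment classes).
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # close to uniform, very uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
queried = select_for_labeling(predictions, batch_size=2)
print(queried)
```

In each round the queried reviews would be sent to an annotator, added to the training set, and the model retrained, so labeling effort goes to the examples the current model finds hardest.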
Crowd Counting Using Scale-Aware Attention Networks
Title | Crowd Counting Using Scale-Aware Attention Networks |
Authors | Mohammad Asiful Hossain, Mehrdad Hosseinzadeh, Omit Chanda, Yang Wang |
Abstract | In this paper, we consider the problem of crowd counting in images. Given an image of a crowded scene, our goal is to estimate the density map of this image, where each pixel value in the density map corresponds to the crowd density at the corresponding location in the image. Given the estimated density map, the final crowd count can be obtained by summing over all values in the density map. One challenge of crowd counting is the scale variation in images. In this work, we propose a novel scale-aware attention network to address this challenge. Using the attention mechanism popular in recent deep learning architectures, our model can automatically focus on certain global and local scales appropriate for the image. By combining these global and local scale attention, our model outperforms other state-of-the-art methods for crowd counting on several benchmark datasets. |
Tasks | Crowd Counting |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.02025v1, http://arxiv.org/pdf/1903.02025v1.pdf |
PWC | https://paperswithcode.com/paper/crowd-counting-using-scale-aware-attention |
Repo | |
Framework | |
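The counting step described in the abstract above (summing all values of the estimated density map) is simple enough to show directly. The sketch below uses a made-up toy density map; the scale-aware attention network that would produce the map is not modelled.

```python
def crowd_count(density_map):
    """Sum every pixel of a 2D density map to get the estimated crowd count."""
    return sum(sum(row) for row in density_map)


# Toy 3x4 density map with made-up values. Each person contributes a total
# mass of 1, spread over the pixels around their location.
density_map = [
    [0.0, 0.25, 0.25, 0.0],
    [0.0, 0.25, 0.25, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
count = crowd_count(density_map)
print(count)
```

Because the count is an integral over the map, a model can be trained with a per-pixel regression loss and still yield a scalar count at inference time.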
MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets
Title | MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets |
Authors | Sungjoo Ha, Martin Kersner, Beomsu Kim, Seokjun Seo, Dongyoung Kim |
Abstract | When there is a mismatch between the target identity and the driver identity, face reenactment suffers severe degradation in the quality of the result, especially in a few-shot setting. The identity preservation problem, where the model loses the detailed information of the target leading to a defective output, is the most common failure mode. The problem has several potential sources such as the identity of the driver leaking due to the identity mismatch, or dealing with unseen large poses. To overcome such problems, we introduce components that address the mentioned problem: image attention block, target feature alignment, and landmark transformer. Through attending and warping the relevant features, the proposed architecture, called MarioNETte, produces high-quality reenactments of unseen identities in a few-shot setting. In addition, the landmark transformer dramatically alleviates the identity preservation problem by isolating the expression geometry through landmark disentanglement. Comprehensive experiments are performed to verify that the proposed framework can generate highly realistic faces, outperforming all other baselines, even under a significant mismatch of facial characteristics between the target and the driver. |
Tasks | Face Reenactment |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08139v1, https://arxiv.org/pdf/1911.08139v1.pdf |
PWC | https://paperswithcode.com/paper/marionette-few-shot-face-reenactment |
Repo | |
Framework | |
STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection
Title | STAR-Net: Action Recognition using Spatio-Temporal Activation Reprojection |
Authors | William McNally, Alexander Wong, John McPhee |
Abstract | While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily-available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors. |
Tasks | Multimodal Activity Recognition, Pose Estimation, Skeleton Based Action Recognition, Temporal Action Localization |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.10024v1, http://arxiv.org/pdf/1902.10024v1.pdf |
PWC | https://paperswithcode.com/paper/star-net-action-recognition-using-spatio |
Repo | |
Framework | |
Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization
Title | Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization |
Authors | Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng |
Abstract | In this paper, we present a novel unsupervised video summarization model that requires no manual annotation. The proposed model termed Cycle-SUM adopts a new cycle-consistent adversarial LSTM architecture that can effectively maximize the information preserving and compactness of the summary video. It consists of a frame selector and a cycle-consistent learning based evaluator. The selector is a bi-direction LSTM network that learns video representations that embed the long-range relationships among video frames. The evaluator defines a learnable information preserving metric between original video and summary video and “supervises” the selector to identify the most informative frames to form the summary video. In particular, the evaluator is composed of two generative adversarial networks (GANs), in which the forward GAN is learned to reconstruct original video from summary video while the backward GAN learns to invert the processing. The consistency between the output of such cycle learning is adopted as the information preserving metric for video summarization. We demonstrate the close relation between mutual information maximization and such cycle learning procedure. Experiments on two video summarization benchmark datasets validate the state-of-the-art performance and superiority of the Cycle-SUM model over previous baselines. |
Tasks | Unsupervised Video Summarization, Video Summarization |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08265v1, http://arxiv.org/pdf/1904.08265v1.pdf |
PWC | https://paperswithcode.com/paper/cycle-sum-cycle-consistent-adversarial-lstm |
Repo | |
Framework | |
Data-driven Modelling of Dynamical Systems Using Tree Adjoining Grammar and Genetic Programming
Title | Data-driven Modelling of Dynamical Systems Using Tree Adjoining Grammar and Genetic Programming |
Authors | Dhruv Khandelwal, Maarten Schoukens, Roland Tóth |
Abstract | State-of-the-art methods for data-driven modelling of non-linear dynamical systems typically involve interactions with an expert user. In order to partially automate the process of modelling physical systems from data, many EA-based approaches have been proposed for model-structure selection, with special focus on non-linear systems. Recently, an approach for data-driven modelling of non-linear dynamical systems using Genetic Programming (GP) was proposed. The novelty of the method was the modelling of noise and the use of Tree Adjoining Grammar to shape the search-space explored by GP. In this paper, we report results achieved by the proposed method on three case studies. Each of the case studies considered here is based on real physical systems. The case studies pose a variety of challenges. In particular, these challenges range over varying amounts of prior knowledge of the true system, amount of data available, the complexity of the dynamics of the system, and the nature of non-linearities in the system. Based on the results achieved for the case studies, we critically analyse the performance of the proposed method. |
Tasks | |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03152v1, http://arxiv.org/pdf/1904.03152v1.pdf |
PWC | https://paperswithcode.com/paper/data-driven-modelling-of-dynamical-systems |
Repo | |
Framework | |
FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment
Title | FaceSwapNet: Landmark Guided Many-to-Many Face Reenactment |
Authors | Jiangning Zhang, Xianfang Zeng, Yusu Pan, Yong Liu, Yu Ding, Changjie Fan |
Abstract | Recent face reenactment studies have achieved remarkable success either between two identities or in the many-to-one task. However, existing methods have limited scalability when the target person is not a predefined specific identity. To address this limitation, we present a novel many-to-many face reenactment framework, named FaceSwapNet, which allows transferring facial expressions and movements from one source face to arbitrary targets. Our proposed approach is composed of two main modules: the landmark swapper and the landmark-guided generator. Instead of maintaining independent models for each pair of persons, the former module uses two encoders and one decoder to adapt anyone’s face landmark to target persons. Using the neutral expression of the target person as a reference image, the latter module leverages geometry information from the swapped landmark to generate photo-realistic and emotion-alike images. In addition, a novel triplet perceptual loss is proposed to force the generator to learn geometry and appearance information simultaneously. We evaluate our model on the RaFD dataset and the results demonstrate the superior quality of reenacted images as well as the flexibility of transferring facial movements between identities. |
Tasks | Face Reenactment |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11805v1, https://arxiv.org/pdf/1905.11805v1.pdf |
PWC | https://paperswithcode.com/paper/faceswapnet-landmark-guided-many-to-many-face |
Repo | |
Framework | |
Curious iLQR: Resolving Uncertainty in Model-based RL
Title | Curious iLQR: Resolving Uncertainty in Model-based RL |
Authors | Sarah Bechtle, Yixin Lin, Akshara Rai, Ludovic Righetti, Franziska Meier |
Abstract | Curiosity as a means to explore during reinforcement learning problems has recently become very popular. However, very little progress has been made in utilizing curiosity for learning control. In this work, we propose a model-based reinforcement learning (MBRL) framework that combines Bayesian modeling of the system dynamics with curious iLQR, an iterative LQR approach that considers model uncertainty. During trajectory optimization the curious iLQR attempts to minimize both the task-dependent cost and the uncertainty in the dynamics model. We demonstrate the approach on reaching tasks with 7-DoF manipulators in simulation and on a real robot. Our experiments show that MBRL with curious iLQR reaches desired end-effector targets more reliably and with fewer system rollouts when learning a new task from scratch, and that the learned model generalizes better to new reaching tasks. |
Tasks | |
Published | 2019-04-15 |
URL | https://arxiv.org/abs/1904.06786v2, https://arxiv.org/pdf/1904.06786v2.pdf |
PWC | https://paperswithcode.com/paper/curious-ilqr-resolving-uncertainty-in-model |
Repo | |
Framework | |
Precomputing Datalog evaluation plans in large-scale scenarios
Title | Precomputing Datalog evaluation plans in large-scale scenarios |
Authors | Alessio Fiorentino, Nicola Leone, Marco Manna, Simona Perri, Jessica Zangari |
Abstract | With the ever-growing demand for semantic Web services over large databases, the efficient evaluation of Datalog queries is attracting renewed interest among researchers and industry experts. In this scenario, to reduce memory consumption and possibly optimize execution times, the paper proposes novel techniques to determine an optimal indexing schema for the underlying database together with suitable body-orderings for the Datalog rules. The new approach is compared with the standard execution plans implemented in DLV over widely used ontological benchmarks. The results confirm that the memory usage can be significantly reduced without paying any cost in efficiency. This paper is under consideration in Theory and Practice of Logic Programming (TPLP). |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12495v1, https://arxiv.org/pdf/1907.12495v1.pdf |
PWC | https://paperswithcode.com/paper/precomputing-datalog-evaluation-plans-in |
Repo | |
Framework | |
A General FOFE-net Framework for Simple and Effective Question Answering over Knowledge Bases
Title | A General FOFE-net Framework for Simple and Effective Question Answering over Knowledge Bases |
Authors | Dekun Wu, Nana Nosirova, Hui Jiang, Mingbin Xu |
Abstract | Question answering over knowledge base (KB-QA) has recently become a popular research topic in NLP. One popular way to solve the KB-QA problem is to make use of a pipeline of several NLP modules, including entity discovery and linking (EDL) and relation detection. Recent successes on the KB-QA task usually involve complex network structures with sophisticated heuristics. Inspired by a previous work that builds a strong KB-QA baseline, we propose a simple but general neural model composed of fixed-size ordinally forgetting encoding (FOFE) and deep neural networks, called FOFE-net, to solve the KB-QA problem at different stages. For evaluation, we use two popular KB-QA datasets, SimpleQuestions and WebQSP, and a newly created dataset, FreebaseQA. The experimental results show that FOFE-net performs well on the KB-QA subtasks, entity discovery and linking (EDL) and relation detection, in turn pushing the overall KB-QA system to achieve strong results on all datasets. |
Tasks | Question Answering |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12356v1, http://arxiv.org/pdf/1903.12356v1.pdf |
PWC | https://paperswithcode.com/paper/a-general-fofe-net-framework-for-simple-and |
Repo | |
Framework | |
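The abstract above names fixed-size ordinally forgetting encoding (FOFE) without defining it. As a rough sketch, assuming the usual FOFE recursion from the earlier FOFE literature, z_t = alpha * z_(t-1) + e_t with e_t the one-hot vector of the t-th token, a variable-length token sequence maps to one vector of vocabulary size. The function name, the value of alpha, and the toy vocabulary are illustrative choices, not details taken from this paper.

```python
def fofe(token_ids, vocab_size, alpha=0.5):
    """Encode a token-id sequence into one fixed-size vector.

    Assumed recursion: z_t = alpha * z_{t-1} + one_hot(token_t),
    where 0 < alpha < 1 is the forgetting factor.
    """
    z = [0.0] * vocab_size
    for token in token_ids:
        z = [alpha * value for value in z]  # decay everything seen so far
        z[token] += 1.0                     # add the current token's one-hot
    return z


# Toy vocabulary: A=0, B=1, C=2.
print(fofe([0, 1, 2], vocab_size=3))        # sequence "ABC"
print(fofe([0, 1, 2, 1, 2], vocab_size=3))  # sequence "ABCBC"
```

Earlier tokens are down-weighted by higher powers of alpha, so the code keeps word-order information while its length stays fixed regardless of the sequence length.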
Sparse Least Squares Low Rank Kernel Machines
Title | Sparse Least Squares Low Rank Kernel Machines |
Authors | Di Xu, Manjing Fang, Xia Hong, Junbin Gao |
Abstract | A general framework of least squares support vector machine with low rank kernels, referred to as LR-LSSVM, is introduced in this paper. The special structure of low rank kernels with a controlled model size brings sparsity as well as computational efficiency to the proposed model. Meanwhile, a two-step optimization algorithm with three different criteria is proposed and various experiments are carried out using the example of the so-called robust RBF kernel to validate the model. The experimental results show that the performance of the proposed algorithm is comparable or superior to several existing kernel machines. |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10098v2, https://arxiv.org/pdf/1901.10098v2.pdf |
PWC | https://paperswithcode.com/paper/sparse-least-squares-low-rank-kernel-machines |
Repo | |
Framework | |
Risk Management via Anomaly Circumvent: Mnemonic Deep Learning for Midterm Stock Prediction
Title | Risk Management via Anomaly Circumvent: Mnemonic Deep Learning for Midterm Stock Prediction |
Authors | Xinyi Li, Yinchuan Li, Xiao-Yang Liu, Christina Dan Wang |
Abstract | Midterm stock price prediction is crucial for value investments in the stock market. However, most deep learning models are essentially short-term and applying them to midterm predictions encounters large cumulative errors because they cannot avoid anomalies. In this paper, we propose a novel deep neural network Mid-LSTM for midterm stock prediction, which incorporates the market trend as hidden states. First, based on the autoregressive moving average model (ARMA), a midterm ARMA is formulated by taking into consideration both hidden states and the capital asset pricing model. Then, a midterm LSTM-based deep neural network is designed, which consists of three components: LSTM, hidden Markov model and linear regression networks. The proposed Mid-LSTM can avoid anomalies to reduce large prediction errors, and has good explanatory effects on the factors affecting stock prices. Extensive experiments on S&P 500 stocks show that (i) the proposed Mid-LSTM achieves 2-4% improvement in prediction accuracy, and (ii) in portfolio allocation investment, we achieve up to 120.16% annual return and 2.99 average Sharpe ratio. |
Tasks | Stock Prediction, Stock Price Prediction |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01112v1, https://arxiv.org/pdf/1908.01112v1.pdf |
PWC | https://paperswithcode.com/paper/risk-management-via-anomaly-circumvent |
Repo | |
Framework | |
Calibration tests in multi-class classification: A unifying framework
Title | Calibration tests in multi-class classification: A unifying framework |
Authors | David Widmann, Fredrik Lindsten, Dave Zachariah |
Abstract | In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is often not sufficient. We propose and study calibration measures for multi-class classification that generalize existing measures such as the expected calibration error, the maximum calibration error, and the maximum mean calibration error. We propose and evaluate empirically different consistent and unbiased estimators for a specific class of measures based on matrix-valued kernels. Importantly, these estimators can be interpreted as test statistics associated with well-defined bounds and approximations of the p-value under the null hypothesis that the model is calibrated, significantly improving the interpretability of calibration measures, which otherwise lack any meaningful unit or scale. |
Tasks | Calibration |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11385v2, https://arxiv.org/pdf/1910.11385v2.pdf |
PWC | https://paperswithcode.com/paper/calibration-tests-in-multi-class |
Repo | |
Framework | |
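For context, the expected calibration error (ECE) that the abstract above generalizes is commonly estimated by binning predictions by confidence. The sketch below is that common top-label, equal-width-bin estimator, not the kernel-based estimators proposed in the paper; the bin count and the sample values are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned top-label ECE: weighted mean of |confidence - accuracy| per bin.

    `confidences` are the model's top-class probabilities in [0, 1];
    `correct` marks whether each top-class prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Made-up predictions: two at 0.9 confidence (both right),
# two at 0.6 confidence (one right, one wrong).
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
print(round(ece, 6))
```

This estimator looks only at the most confident class, which is the limitation the abstract points to when it says that calibration of the most confident predictions alone is often not sufficient.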
Semantic Image Networks for Human Action Recognition
Title | Semantic Image Networks for Human Action Recognition |
Authors | Sunder Ali Khowaja, Seok-Lyong Lee |
Abstract | In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to the approximate rank pooling which summarizes the motion characteristics in single or multiple images. It incorporates the background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is to improve the action-motion dynamics by focusing on the region which is important for action recognition and encoding the temporal variances using the frame ranking method. We also propose the sequential combination of Inception-ResNetv2 and long-short-term memory network (LSTM) to leverage the temporal variances for improved recognition performance. Extensive analysis has been carried out on UCF101 and HMDB51 datasets which are widely used in action recognition studies. We show that (i) the semantic image generates better activations and converges faster than its original variant, (ii) using segmentation prior to approximate rank pooling yields better recognition performance, (iii) the use of LSTM leverages the temporal variance information from approximate rank pooling to model the action behavior better than the base network, (iv) the proposed representations can be adaptive as they can be used with existing methods such as temporal segment networks to improve the recognition performance, and (v) our proposed four-stream network architecture comprising semantic images and semantic optical flows achieves state-of-the-art performance, 95.9% and 73.5% recognition accuracy on UCF101 and HMDB51, respectively. |
Tasks | Temporal Action Localization |
Published | 2019-01-21 |
URL | http://arxiv.org/abs/1901.06792v1, http://arxiv.org/pdf/1901.06792v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-image-networks-for-human-action |
Repo | |
Framework | |
Adaptive Estimators Show Information Compression in Deep Neural Networks
Title | Adaptive Estimators Show Information Compression in Deep Neural Networks |
Authors | Ivan Chelombiev, Conor Houghton, Cian O’Donnell |
Abstract | To improve how neural networks function it is crucial to understand their learning process. The information bottleneck theory of deep learning proposes that neural networks achieve good generalization by compressing their representations to disregard information that is not relevant to the task. However, empirical evidence for this theory is conflicting, as compression was only observed when networks used saturating activation functions. In contrast, networks with non-saturating activation functions achieved comparable levels of task performance but did not show compression. In this paper we developed more robust mutual information estimation techniques that adapt to hidden activity of neural networks and produce more sensitive measurements of activations from all functions, especially unbounded functions. Using these adaptive estimation techniques, we explored compression in networks with a range of different activation functions. With two improved methods of estimation, firstly, we show that saturation of the activation function is not required for compression, and the amount of compression varies between different activation functions. We also find that there is a large amount of variation in compression between different network initializations. Secondly, we see that L2 regularization leads to significantly increased compression, while preventing overfitting. Finally, we show that only compression of the last layer is positively correlated with generalization. |
Tasks | L2 Regularization |
Published | 2019-02-24 |
URL | http://arxiv.org/abs/1902.09037v1, http://arxiv.org/pdf/1902.09037v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-estimators-show-information |
Repo | |
Framework | |