April 2, 2020

Paper Group ANR 329

Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction. Ensembles of Deep Neural Networks for Action Recognition in Still Images. Learning similarity measures from data. Posterior Ratio Estimation for Latent Variables. Learning Spatiotemporal Features via Video and Text Pair Discrimination. Future Video …

Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction


Title	Hyperparameters optimization for Deep Learning based emotion prediction for Human Robot Interaction
Authors	Shruti Jaiswal, Gora Chand Nandi
Abstract	To enable humanoid robots to share our social space, we need to develop technology for easy interaction with the robots using multiple modes such as speech and gestures, and for sharing our emotions with them. We have targeted this research towards the core problem of emotion recognition, aiming for a solution that requires fewer computational resources and far fewer network hyperparameters, so that it is better suited to low-resource social robots communicating in real time. More specifically, we propose an Inception-module-based Convolutional Neural Network architecture that improves accuracy by up to 6% over the existing network architecture for emotion classification when tested jointly over multiple datasets and on humanoid robots in real time. Our proposed model reduces the trainable hyperparameters by 94% compared to a vanilla CNN model, which clearly indicates that it can be used in real-time applications such as human-robot interaction. Rigorous experiments have been performed to validate our methodology, which is sufficiently robust and achieves a high level of accuracy. Finally, the model is implemented on a humanoid robot, NAO, in real time, and the robustness of the model is evaluated.
Tasks	Emotion Classification, Emotion Recognition
Published	2020-01-12
URL	https://arxiv.org/abs/2001.03855v1
PDF	https://arxiv.org/pdf/2001.03855v1.pdf
PWC	https://paperswithcode.com/paper/hyperparameters-optimization-for-deep
Repo
Framework

Ensembles of Deep Neural Networks for Action Recognition in Still Images


Title	Ensembles of Deep Neural Networks for Action Recognition in Still Images
Authors	Sina Mohammadi, Sina Ghofrani Majelan, Shahriar B. Shokouhi
Abstract	Despite the fact that notable improvements have been made recently in the field of feature extraction and classification, human action recognition is still challenging, especially in images, in which, unlike videos, there is no motion. Thus, the methods proposed for recognizing human actions in videos cannot be applied to still images. A big challenge in action recognition in still images is the lack of large enough datasets, which is problematic for training deep Convolutional Neural Networks (CNNs) due to the overfitting issue. In this paper, by taking advantage of pre-trained CNNs, we employ the transfer learning technique to tackle the lack of massive labeled action recognition datasets. Furthermore, since the last layer of the CNN has class-specific information, we apply an attention mechanism on the output feature maps of the CNN to extract more discriminative and powerful features for classification of human actions. Moreover, we use eight different pre-trained CNNs in our framework and investigate their performance on the Stanford 40 dataset. Finally, we propose using the Ensemble Learning technique to enhance the overall accuracy of action classification by combining the predictions of multiple models. The best setting of our method is able to achieve 93.17% accuracy on the Stanford 40 dataset.
Tasks	Action Classification, Action Recognition In Still Images, Temporal Action Localization, Transfer Learning
Published	2020-03-22
URL	https://arxiv.org/abs/2003.09893v1
PDF	https://arxiv.org/pdf/2003.09893v1.pdf
PWC	https://paperswithcode.com/paper/ensembles-of-deep-neural-networks-for-action
Repo
Framework
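
The abstract does not say how the predictions of the individual CNNs are combined. As a minimal sketch, assuming simple averaging of per-class probabilities (soft voting), the idea might look like the following. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def ensemble_average(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities from several models (soft voting)."""
    if not prob_sets:
        raise ValueError("need predictions from at least one model")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must predict the same number of classes")
    n_models = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def ensemble_predict(prob_sets: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = ensemble_average(prob_sets)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical softmax outputs of three models for one image, three classes.
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
predicted_class = ensemble_predict(model_outputs)
```

Here the first model alone would pick class 0, but the averaged probabilities favor class 1, which is the kind of correction an ensemble is meant to provide.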

Learning similarity measures from data


Title	Learning similarity measures from data
Authors	Bjørn Magnus Mathisen, Agnar Aamodt, Kerstin Bach, Helge Langseth
Abstract	Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR) where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case. Describing a similarity measure analytically is challenging, even for domain experts working with CBR experts. However, data sets are typically gathered as part of constructing a CBR or machine learning system. These datasets are assumed to contain the features that correctly identify the solution from the problem features, thus they may also contain the knowledge to construct or learn such a similarity measure. The main motivation for this work is to automate the construction of similarity measures using machine learning, while keeping training time as low as possible. Our objective is to investigate how to apply machine learning to effectively learn a similarity measure. Such a learned similarity measure could be used for CBR systems, but also for clustering data in semi-supervised learning, or one-shot learning tasks. Recent work has advanced towards this goal, but relies on either very long training times or manually modeling parts of the similarity measure. We created a framework to help us analyze current methods for learning similarity measures. This analysis resulted in two novel similarity measure designs. The first design uses a pre-trained classifier as the basis for a similarity measure. The second design uses as little modeling as possible while learning the similarity measure from data and keeping training time low. Both similarity measures were evaluated on 14 different datasets. The evaluation shows that using a classifier as the basis for a similarity measure gives state-of-the-art performance. Finally, the evaluation shows that our fully data-driven similarity measure design outperforms state-of-the-art methods while keeping training time low.
Tasks	One-Shot Learning
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05312v1
PDF	https://arxiv.org/pdf/2001.05312v1.pdf
PWC	https://paperswithcode.com/paper/learning-similarity-measures-from-data
Repo
Framework

Posterior Ratio Estimation for Latent Variables


Title	Posterior Ratio Estimation for Latent Variables
Authors	Yulong Zhang, Mingxuan Yi, Song Liu, Mladen Kolar
Abstract	Density Ratio Estimation has attracted attention from the machine learning community due to its ability to compare the underlying distributions of two datasets. However, in some applications, we want to compare distributions of latent random variables that can only be inferred from observations. In this paper, we study the problem of estimating the ratio between two posterior probability density functions of a latent variable. In particular, we assume the posterior ratio function can be well-approximated by a parametric model, which is then estimated using observed datasets and synthetic prior samples. We prove consistency of our estimator and the asymptotic normality of the estimated parameters as the number of prior samples tends to infinity. Finally, we validate our theories using numerical experiments and demonstrate the usefulness of the proposed method through some real-world applications.
Tasks
Published	2020-02-15
URL	https://arxiv.org/abs/2002.06410v1
PDF	https://arxiv.org/pdf/2002.06410v1.pdf
PWC	https://paperswithcode.com/paper/posterior-ratio-estimation-for-latent
Repo
Framework

Learning Spatiotemporal Features via Video and Text Pair Discrimination


Title	Learning Spatiotemporal Features via Video and Text Pair Discrimination
Authors	Tianhao Li, Limin Wang
Abstract	Current video representations heavily rely on learning from manually annotated video datasets. However, it is expensive and time-consuming to acquire a large-scale well-labeled video dataset. We observe that videos are naturally accompanied with abundant text information such as YouTube titles and movie scripts. In this paper, we leverage this visual-textual connection to learn effective spatiotemporal features in an efficient weakly-supervised manner. We present a general cross-modal pair discrimination (CPD) framework to capture this correlation between a clip and its associated text, and adopt noise-contrastive estimation technique to tackle the computational issues imposed by the huge numbers of pair instance classes. Specifically, we investigate the CPD framework from two sources of video-text pairs, and design a practical curriculum learning strategy to train the CPD. Without further fine tuning, the learned models obtain competitive results for action classification on the Kinetics dataset under the common linear classification protocol. Moreover, our visual model provides a very effective initialization to fine-tune on the downstream task datasets. Experimental results demonstrate that our weakly-supervised pre-training yields a remarkable performance gain for action recognition on the datasets of UCF101 and HMDB51, compared with the state-of-the-art self-supervised training methods. In addition, our CPD model yields a new state of the art for zero-shot action recognition on UCF101 by directly utilizing the learnt visual-textual embedding.
Tasks	Action Classification
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05691v1
PDF	https://arxiv.org/pdf/2001.05691v1.pdf
PWC	https://paperswithcode.com/paper/learning-spatiotemporal-features-via-video
Repo
Framework

Future Video Synthesis with Object Motion Prediction


Title	Future Video Synthesis with Object Motion Prediction
Authors	Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen
Abstract	We present an approach to predict future video frames given a sequence of continuous video frames in the past. Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics by decoupling the background scene and moving objects. The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects. The anticipated appearances are combined to create a reasonable video in the future. With this procedure, our method exhibits far fewer tearing or distortion artifacts than other approaches. Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
Tasks	Motion Prediction, Predict Future Video Frames
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00542v1
PDF	https://arxiv.org/pdf/2004.00542v1.pdf
PWC	https://paperswithcode.com/paper/future-video-synthesis-with-object-motion
Repo
Framework

Robust Policies For Proactive ICU Transfers


Title	Robust Policies For Proactive ICU Transfers
Authors	Julien Grand-Clement, Carri W. Chan, Vineet Goyal, Gabriel Escobar
Abstract	Patients whose transfer to the Intensive Care Unit (ICU) is unplanned are prone to higher mortality rates than those who were admitted directly to the ICU. Recent advances in machine learning to predict patient deterioration have introduced the possibility of proactive transfer from the ward to the ICU. In this work, we study the problem of finding robust patient transfer policies which account for uncertainty in statistical estimates due to data limitations when optimizing to improve overall patient care. We propose a Markov Decision Process model to capture the evolution of patient health, where the states represent a measure of patient severity. Under fairly general assumptions, we show that an optimal transfer policy has a threshold structure, i.e., that it transfers all patients above a certain severity level to the ICU (subject to available capacity). As model parameters are typically determined based on statistical estimations from real-world data, they are inherently subject to misspecification and estimation errors. We account for this parameter uncertainty by deriving a robust policy that optimizes the worst-case reward across all plausible values of the model parameters. We show that the robust policy also has a threshold structure under fairly general assumptions. Moreover, it is more aggressive in transferring patients than the optimal nominal policy, which does not take into account parameter uncertainty. We present computational experiments using a dataset of hospitalizations at 21 KNPC hospitals, and present empirical evidence of the sensitivity of various hospital metrics (mortality, length-of-stay, average ICU occupancy) to small changes in the parameters. Our work provides useful insights into the impact of parameter uncertainty on deriving simple policies for proactive ICU transfer that have strong empirical performance and theoretical guarantees.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06247v1
PDF	https://arxiv.org/pdf/2002.06247v1.pdf
PWC	https://paperswithcode.com/paper/robust-policies-for-proactive-icu-transfers
Repo
Framework
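
The threshold structure described in the abstract can be illustrated with a small sketch. It assumes each patient has a numeric severity score and, when capacity is short, transfers the most severe patients first. That tie-breaking rule, the function name, and the example values are assumptions made for illustration, not the authors' policy.

```python
from typing import List, Sequence


def threshold_transfers(
    severities: Sequence[float], threshold: float, free_beds: int
) -> List[int]:
    """Return indices of patients to transfer to the ICU.

    A patient is a candidate if their severity is strictly above `threshold`.
    Candidates are ranked most severe first (an assumed tie-breaking rule)
    and truncated to the number of free ICU beds.
    """
    candidates = [i for i, s in enumerate(severities) if s > threshold]
    candidates.sort(key=lambda i: severities[i], reverse=True)
    return candidates[: max(free_beds, 0)]


# Hypothetical ward with four patients and two free ICU beds.
ward = [2, 7, 5, 9]
to_transfer = threshold_transfers(ward, threshold=4, free_beds=2)
```

In this toy setting, lowering `threshold` makes the policy transfer more patients, which loosely mirrors the abstract's statement that the robust policy is more aggressive than the nominal one.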

Invariance vs. Robustness of Neural Networks


Title	Invariance vs. Robustness of Neural Networks
Authors	Sandesh Kamath, Amit Deshpande, K V Subrahmanyam
Abstract	We study the performance of neural network models on random geometric transformations and adversarial perturbations. Invariance means that the model’s prediction remains unchanged when a geometric transformation is applied to an input. Adversarial robustness means that the model’s prediction remains unchanged after small adversarial perturbations of an input. In this paper, we show a quantitative trade-off between rotation invariance and robustness. We empirically study the following two cases: (a) change in adversarial robustness as we improve only the invariance of equivariant models via training augmentation, (b) change in invariance as we improve only the adversarial robustness using adversarial training. We observe that the rotation invariance of equivariant models (StdCNNs and GCNNs) improves by training augmentation with progressively larger random rotations but while doing so, their adversarial robustness drops progressively, and very significantly on MNIST. We take adversarially trained LeNet and ResNet models which have good $L_\infty$ adversarial robustness on MNIST and CIFAR-10, respectively, and observe that adversarial training with progressively larger perturbations results in a progressive drop in their rotation invariance profiles. Similar to the trade-off between accuracy and robustness known in previous work, we give a theoretical justification for the invariance vs. robustness trade-off observed in our experiments.
Tasks
Published	2020-02-26
URL	https://arxiv.org/abs/2002.11318v1
PDF	https://arxiv.org/pdf/2002.11318v1.pdf
PWC	https://paperswithcode.com/paper/invariance-vs-robustness-of-neural-networks-1
Repo
Framework

Large-scale Gender/Age Prediction of Tumblr Users


Title	Large-scale Gender/Age Prediction of Tumblr Users
Authors	Yao Zhan, Changwei Hu, Yifan Hu, Tejaswi Kasturi, Shanmugam Ramasamy, Matt Gillingham, Keith Yamamoto
Abstract	Tumblr, as a leading content provider and social media platform, attracts 371 million monthly visits, 280 million blogs and 53.3 million daily posts. The popularity of Tumblr provides great opportunities for advertisers to promote their products through sponsored posts. However, it is a challenging task to target specific demographic groups for ads, since Tumblr does not require user information like gender and age during registration. Hence, to promote ad targeting, it is essential to predict user’s demography using rich content such as posts, images and social connections. In this paper, we propose graph based and deep learning models for age and gender predictions, which take into account user activities and content features. For graph based models, we come up with two approaches, network embedding and label propagation, to generate connection features as well as directly infer user’s demography. For deep learning models, we leverage convolutional neural network (CNN) and multilayer perceptron (MLP) to predict users’ age and gender. Experimental results on real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, by improving the accuracy relatively by 81% for age, and the AUC and accuracy by 5% for gender.
Tasks	Network Embedding
Published	2020-01-02
URL	https://arxiv.org/abs/2001.00594v1
PDF	https://arxiv.org/pdf/2001.00594v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-genderage-prediction-of-tumblr
Repo
Framework
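
The abstract names label propagation as one of the graph-based approaches but gives no details. The following is a generic sketch of the technique, assuming a simple scheme in which each unlabeled node repeatedly takes the mean score of its neighbors while labeled (seed) nodes stay fixed. The graph, the scores, and the function name are hypothetical and do not reproduce the paper's model.

```python
from typing import Dict, List


def propagate_labels(
    adjacency: Dict[str, List[str]], seeds: Dict[str, float], n_iter: int = 50
) -> Dict[str, float]:
    """Iteratively spread seed scores over an undirected graph.

    Seed nodes keep their known score; every other node is updated to the
    mean score of its neighbors. Unlabeled nodes start at 0.5 (no preference).
    """
    scores = {node: seeds.get(node, 0.5) for node in adjacency}
    for _ in range(n_iter):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in seeds:
                updated[node] = seeds[node]
            elif neighbors:
                updated[node] = sum(scores[m] for m in neighbors) / len(neighbors)
            else:
                updated[node] = scores[node]  # isolated node: nothing to learn from
        scores = updated
    return scores


# Hypothetical chain a - b - c - d with known scores only at the two ends.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = propagate_labels(graph, seeds={"a": 1.0, "d": 0.0})
```

On this chain the scores settle so that nodes closer to the seed labeled 1.0 end up with higher scores than nodes closer to the seed labeled 0.0.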

Joint Wasserstein Distribution Matching


Title	Joint Wasserstein Distribution Matching
Authors	JieZhang Cao, Langyuan Mo, Qing Du, Yong Guo, Peilin Zhao, Junzhou Huang, Mingkui Tan
Abstract	The joint distribution matching (JDM) problem, which aims to learn bidirectional mappings to match joint distributions of two domains, occurs in many machine learning and computer vision applications. This problem, however, is very difficult due to two critical challenges: (i) it is often difficult to exploit sufficient information from the joint distribution to conduct the matching; (ii) this problem is hard to formulate and optimize. In this paper, relying on optimal transport theory, we propose to address the JDM problem by minimizing the Wasserstein distance of the joint distributions in two domains. However, the resultant optimization problem is still intractable. We then propose an important theorem to reduce the intractable problem into a simple optimization problem, and develop a novel method (called Joint Wasserstein Distribution Matching (JWDM)) to solve it. In the experiments, we apply our method to unsupervised image translation and cross-domain video synthesis. Both qualitative and quantitative comparisons demonstrate the superior performance of our method over several state-of-the-art methods.
Tasks
Published	2020-03-01
URL	https://arxiv.org/abs/2003.00389v1
PDF	https://arxiv.org/pdf/2003.00389v1.pdf
PWC	https://paperswithcode.com/paper/joint-wasserstein-distribution-matching
Repo
Framework

ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning


Title	ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning
Authors	Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei
Abstract	Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms the penalty parameter into a set of iteration-dependent ones, and the second one adopts a specially designed penalty function, which is based on a piecewise linear function with adjustable slopes. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder for various low density parity check (LDPC) codes with similar computational complexity.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.07601v1
PDF	https://arxiv.org/pdf/2002.07601v1.pdf
PWC	https://paperswithcode.com/paper/admm-based-decoder-for-binary-linear-codes
Repo
Framework

Constraining Temporal Relationship for Action Localization


Title	Constraining Temporal Relationship for Action Localization
Authors	Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Qi Tian
Abstract	Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve predicting three values at each time point, corresponding to the probabilities that the action starts, continues and ends, and post-processing these curves for the final localization. This paper delves deep into this mechanism, and argues that existing approaches mostly ignore the potential relationship between these curves, which results in low-quality action proposals. To alleviate this problem, we add extra constraints to these curves, e.g., the probability of “action continues” should be relatively high between probability peaks of “action starts” and “action ends”, so that the entire framework is aware of these latent constraints during an end-to-end optimization process. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively (in terms of the AR@AN and mAP) and qualitatively (the curves in the testing stage become much smoother). In particular, when we build our constraints beyond TSA-Net and PGCN, we achieve the state-of-the-art performance especially at strict high IoU settings. The code will be available.
Tasks	Action Localization, Temporal Action Localization
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07358v1
PDF	https://arxiv.org/pdf/2002.07358v1.pdf
PWC	https://paperswithcode.com/paper/constraining-temporal-relationship-for-action
Repo
Framework

Algorithms for Hiring and Outsourcing in the Online Labor Market


Title	Algorithms for Hiring and Outsourcing in the Online Labor Market
Authors	Aris Anagnostopoulos, Carlos Castillo, Adriano Fazzone, Stefano Leonardi, Evimaria Terzi
Abstract	Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of “in-sourcing” work continue to be the dominant form of employment. This means that, at least for the time being, freelancing and salaried employment will continue to co-exist. In this paper, we provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. We call this model team formation with outsourcing. In our model, tasks arrive in an online fashion: neither the number nor the composition of the tasks is known a priori. At any point in time, there is a team of hired workers who receive a fixed salary independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired, at some cost. Additionally, some parts of the arriving tasks can be outsourced and thus completed by non-team members, at a premium. Our contribution is an efficient online cost-minimizing algorithm for hiring and firing team members and outsourcing tasks. We present theoretical bounds obtained using a primal-dual scheme proving that our algorithms have a logarithmic competitive approximation ratio. We complement these results with experiments using semi-synthetic datasets based on actual task requirements and worker skills from three large online labor marketplaces.
Tasks
Published	2020-02-16
URL	https://arxiv.org/abs/2002.07618v1
PDF	https://arxiv.org/pdf/2002.07618v1.pdf
PWC	https://paperswithcode.com/paper/algorithms-for-hiring-and-outsourcing-in-the
Repo
Framework

On the Matrix-Free Generation of Adversarial Perturbations for Black-Box Attacks


Title	On the Matrix-Free Generation of Adversarial Perturbations for Black-Box Attacks
Authors	Hisaichi Shibata, Shouhei Hanaoka, Yukihiro Nomura, Naoto Hayashi, Osamu Abe
Abstract	In general, adversarial perturbations superimposed on inputs are realistic threats for a deep neural network (DNN). In this paper, we propose a practical method for generating such adversarial perturbations, to be applied to black-box attacks that demand access to an input-output relationship only. Thus, the attackers generate such perturbations without invoking inner functions and/or accessing the inner states of a DNN. Unlike earlier studies, the algorithm to generate the perturbation presented in this study requires far fewer query trials. Moreover, to show the effectiveness of the extracted adversarial perturbation, we experiment with a DNN for semantic segmentation. The result shows that the network is more easily deceived by the generated perturbation than by uniformly distributed random noise of the same magnitude.
Tasks	Semantic Segmentation
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07317v1
PDF	https://arxiv.org/pdf/2002.07317v1.pdf
PWC	https://paperswithcode.com/paper/on-the-matrix-free-generation-of-adversarial
Repo
Framework

Lake Ice Monitoring with Webcams and Crowd-Sourced Images


Title	Lake Ice Monitoring with Webcams and Crowd-Sourced Images
Authors	Rajanie Prabha, Manu Tom, Mathias Rothermel, Emmanuel Baltsavias, Laura Leal-Taixe, Konrad Schindler
Abstract	Lake ice is a strong climate indicator and has been recognised as part of the Essential Climate Variables (ECV) by the Global Climate Observing System (GCOS). The dynamics of freezing and thawing, and possible shifts of freezing patterns over time, can help in understanding the local and global climate systems. One way to acquire the spatio-temporal information about lake ice formation, independent of clouds, is to analyse webcam images. This paper intends to move towards a universal model for monitoring lake ice with freely available webcam data. We demonstrate good performance, including the ability to generalise across different winters and different lakes, with a state-of-the-art Convolutional Neural Network (CNN) model for semantic image segmentation, Deeplab v3+. Moreover, we design a variant of that model, termed Deep-U-Lab, which predicts sharper, more correct segmentation boundaries. We have tested the model’s ability to generalise with data from multiple camera views and two different winters. On average, it achieves intersection-over-union (IoU) values of ~71% across different cameras and ~69% across different winters, greatly outperforming prior work. Going even further, we show that the model even achieves 60% IoU on arbitrary images scraped from photo-sharing web sites. As part of the work, we introduce a new benchmark dataset of webcam images, Photi-LakeIce, from multiple cameras and two different winters, along with pixel-wise ground truth annotations.
Tasks	Semantic Segmentation
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07875v1
PDF	https://arxiv.org/pdf/2002.07875v1.pdf
PWC	https://paperswithcode.com/paper/lake-ice-monitoring-with-webcams-and-crowd
Repo
Framework
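
The intersection-over-union (IoU) metric quoted in the abstract can be computed per class from a predicted and a ground-truth label map. The sketch below is a plain-Python illustration under the assumption that both maps are equally sized 2D grids of integer class labels. The function name and the tiny example masks are hypothetical and unrelated to the Photi-LakeIce data.

```python
from typing import Sequence


def iou(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]], label: int
) -> float:
    """Intersection-over-union of one class between two label maps."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred = p == label
            in_truth = t == label
            if in_pred and in_truth:
                intersection += 1
            if in_pred or in_truth:
                union += 1
    # If the class appears in neither map, IoU is undefined.
    return intersection / union if union else float("nan")


# Hypothetical 2x3 masks: 1 = ice, 0 = everything else.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
ice_iou = iou(predicted, ground_truth, label=1)
```

Two pixels are labeled ice in both masks and four pixels are labeled ice in at least one of them, so the IoU for the ice class in this example is 0.5.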