January 25, 2020

Paper Group ANR 1675

Estimating 3D Camera Pose from 2D Pedestrian Trajectories. AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition. A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis. Eigenvalue distribution of nonlinear models of random matrices. MOANA: An …

Estimating 3D Camera Pose from 2D Pedestrian Trajectories


Title	Estimating 3D Camera Pose from 2D Pedestrian Trajectories
Authors	Yan Xu, Vivek Roy, Kris Kitani
Abstract	We consider the task of re-calibrating the 3D pose of a static surveillance camera, whose pose may change due to external forces, such as birds, wind, falling objects or earthquakes. Conventionally, camera pose estimation can be solved with a PnP (Perspective-n-Point) method using 2D-to-3D feature correspondences, when 3D points are known. However, 3D point annotations are not always available or practical to obtain in real-world applications. We propose an alternative strategy for extracting 3D information to solve for camera pose by using pedestrian trajectories. We observe that 2D pedestrian trajectories indirectly contain useful 3D information that can be used for inferring camera pose. To leverage this information, we propose a data-driven approach by training a neural network (NN) regressor to model a direct mapping from 2D pedestrian trajectories projected on the image plane to 3D camera pose. We demonstrate that our regressor trained only on synthetic data can be directly applied to real data, thus eliminating the need to label any real data. We evaluate our method across six different scenes from the Town Centre Street and DUKEMTMC datasets. Our method achieves an improvement of $\sim 50\%$ on both position and orientation prediction accuracy when compared to other SOTA methods.
Tasks	Pose Estimation
Published	2019-12-12
URL	https://arxiv.org/abs/1912.05758v2
PDF	https://arxiv.org/pdf/1912.05758v2.pdf
PWC	https://paperswithcode.com/paper/estimating-3d-camera-pose-from-2d-pedestrian
Repo
Framework
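
The abstract's starting observation is that the image-plane track of a pedestrian depends on the camera pose. The following is a minimal sketch of that dependence only, not the authors' regressor: it assumes an ideal pinhole camera at a given height above a flat ground plane, tilted down by a pitch angle. The function names, the coordinate conventions, and the intrinsics (`focal`, `cx`, `cy`) are hypothetical choices made for illustration.

```python
import math


def project_ground_point(x, y, cam_height, pitch, focal, cx, cy):
    """Project the ground-plane point (x, y, 0) into an ideal pinhole camera.

    Assumed world frame: x to the right, y forward, z up. The camera sits at
    (0, 0, cam_height), looks along +y and is tilted down by `pitch` radians.
    Returns (u, v) in pixels (v grows downward), or None if the point lies
    behind the camera.
    """
    # Vector from the camera centre to the ground point, in world coordinates.
    dx, dy, dz = x, y, -cam_height
    c, s = math.cos(pitch), math.sin(pitch)
    # Camera axes in world coordinates:
    #   right = (1, 0, 0), forward = (0, c, -s), down = (0, -s, -c)
    x_cam = dx
    y_cam = -dy * s - dz * c   # component along the camera's "down" axis
    z_cam = dy * c - dz * s    # depth along the optical axis
    if z_cam <= 0.0:
        return None
    return (cx + focal * x_cam / z_cam, cy + focal * y_cam / z_cam)


def project_trajectory(points, cam_height, pitch, focal, cx, cy):
    """Project a list of ground-plane (x, y) positions; skip invisible ones."""
    projected = []
    for x, y in points:
        uv = project_ground_point(x, y, cam_height, pitch, focal, cx, cy)
        if uv is not None:
            projected.append(uv)
    return projected


if __name__ == "__main__":
    # The same straight walk seen under two different camera tilts.
    walk = [(1.0, 8.0 + 0.5 * t) for t in range(5)]
    for pitch_deg in (0.0, 20.0):
        track = project_trajectory(walk, 5.0, math.radians(pitch_deg),
                                   1000.0, 960.0, 540.0)
        print(pitch_deg, [(round(u, 1), round(v, 1)) for u, v in track])
```

Running the script prints two different 2D tracks for the same 3D walk, which is the kind of pose-dependent signal a regressor could in principle learn to invert.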

AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition


Title	AVEC 2019 Workshop and Challenge: State-of-Mind, Detecting Depression with AI, and Cross-Cultural Affect Recognition
Authors	Fabien Ringeval, Björn Schuller, Michel Valstar, Nicholas Cummins, Roddy Cowie, Leili Tavabi, Maximilian Schmitt, Sina Alisamir, Shahin Amiriparian, Eva-Maria Messner, Siyang Song, Shuo Liu, Ziping Zhao, Adria Mallol-Ragolta, Zhao Ren, Mohammad Soleymani, Maja Pantic
Abstract	The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) “State-of-Mind, Detecting Depression with AI, and Cross-cultural Affect Recognition” is the ninth competition event aimed at the comparison of multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the health and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of various approaches to health and emotion recognition from real-life data. This paper presents the major novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline systems on the three proposed tasks: state-of-mind recognition, depression assessment with AI, and cross-cultural affect sensing, respectively.
Tasks	Emotion Recognition
Published	2019-07-10
URL	https://arxiv.org/abs/1907.11510v1
PDF	https://arxiv.org/pdf/1907.11510v1.pdf
PWC	https://paperswithcode.com/paper/avec-2019-workshop-and-challenge-state-of
Repo
Framework

A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis


Title	A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis
Authors	Guojun Yin, Bin Liu, Huihui Zhu, Tao Gong, Nenghai Yu
Abstract	Multiple-object tracking and behavior analysis have been the essential parts of surveillance video analysis for public security and urban management. With billions of surveillance videos captured all over the world, multiple-object tracking and behavior analysis by manual labor are cumbersome and costly. Due to the rapid development of deep learning algorithms in recent years, automatic object tracking and behavior analysis put forward an urgent demand on a large scale well-annotated surveillance video dataset that can reflect the diverse, congested, and complicated scenarios in real applications. This paper introduces an urban surveillance video dataset (USVD) which is by far the largest and most comprehensive. The dataset consists of 16 scenes captured in 7 typical outdoor scenarios: street, crossroads, hospital entrance, school gate, park, pedestrian mall, and public square. Over 200k video frames are annotated carefully, resulting in more than 3.7 million object bounding boxes and about 7.1 thousand trajectories. We further use this dataset to evaluate the performance of typical algorithms for multiple-object tracking and anomaly behavior analysis and explore the robustness of these methods in urban congested scenarios.
Tasks	Multiple Object Tracking, Object Tracking
Published	2019-04-26
URL	http://arxiv.org/abs/1904.11784v1
PDF	http://arxiv.org/pdf/1904.11784v1.pdf
PWC	https://paperswithcode.com/paper/a-large-scale-urban-surveillance-video
Repo
Framework

Eigenvalue distribution of nonlinear models of random matrices


Title	Eigenvalue distribution of nonlinear models of random matrices
Authors	Lucas Benigni, Sandrine Péché
Abstract	This paper is concerned with the asymptotic empirical eigenvalue distribution of a non linear random matrix ensemble. More precisely we consider $M= \frac{1}{m} YY^*$ with $Y=f(WX)$ where $W$ and $X$ are random rectangular matrices with i.i.d. centered entries. The function $f$ is applied pointwise and can be seen as an activation function in (random) neural networks. We compute the asymptotic empirical distribution of this ensemble in the case where $W$ and $X$ have sub-Gaussian tails and $f$ is real analytic. This extends a previous result where the case of Gaussian matrices $W$ and $X$ is considered. We also investigate the same questions in the multi-layer case, regarding neural network applications.
Tasks
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03090v2
PDF	http://arxiv.org/pdf/1904.03090v2.pdf
PWC	https://paperswithcode.com/paper/eigenvalue-distribution-of-nonlinear-models
Repo
Framework

MOANA: An Online Learned Adaptive Appearance Model for Robust Multiple Object Tracking in 3D


Title	MOANA: An Online Learned Adaptive Appearance Model for Robust Multiple Object Tracking in 3D
Authors	Zheng Tang, Jenq-Neng Hwang
Abstract	Multiple object tracking has been a challenging field, mainly due to noisy detection sets and identity switch caused by occlusion and similar appearance among nearby targets. Previous works rely on appearance models built on individual or several selected frames for the comparison of features, but they cannot encode long-term appearance changes caused by pose, viewing angle and lighting conditions. In this work, we propose an adaptive model that learns online a relatively long-term appearance change of each target. The proposed model is compatible with any feature of fixed dimension or their combination, whose learning rates are dynamically controlled by adaptive update and spatial weighting schemes. To handle occlusion and nearby objects sharing similar appearance, we also design cross-matching and re-identification schemes based on the application of the proposed adaptive appearance models. Additionally, the 3D geometry information is effectively incorporated in our formulation for data association. The proposed method outperforms all the state-of-the-art on the MOTChallenge 3D benchmark and achieves real-time computation with only a standard desktop CPU. It has also shown superior performance over the state-of-the-art on the 2D benchmark of MOTChallenge.
Tasks	Multiple Object Tracking, Object Tracking
Published	2019-01-09
URL	http://arxiv.org/abs/1901.02626v2
PDF	http://arxiv.org/pdf/1901.02626v2.pdf
PWC	https://paperswithcode.com/paper/moana-an-online-learned-adaptive-appearance
Repo
Framework

Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee


Title	Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee
Authors	Ming Lin, Shuang Qiu, Jieping Ye, Xiaomin Song, Qi Qian, Liang Sun, Shenghuo Zhu, Rong Jin
Abstract	Factorization machine (FM) is a popular machine learning model to capture the second order feature interactions. The optimal learning guarantee of FM and its generalized version is not yet developed. For a rank $k$ generalized FM of $d$ dimensional input, the previous best known sampling complexity is $\mathcal{O}[k^{3}d\cdot\mathrm{polylog}(kd)]$ under Gaussian distribution. This bound is sub-optimal compared to the information theoretical lower bound $\mathcal{O}(kd)$. In this work, we aim to tighten this bound towards optimal and generalize the analysis to sub-gaussian distribution. We prove that when the input data satisfies the so-called $\tau$-Moment Invertible Property, the sampling complexity of generalized FM can be improved to $\mathcal{O}[k^{2}d\cdot\mathrm{polylog}(kd)/\tau^{2}]$. When the second order self-interaction terms are excluded in the generalized FM, the bound can be improved to the optimal $\mathcal{O}[kd\cdot\mathrm{polylog}(kd)]$ up to the logarithmic factors. Our analysis also suggests that the positive semi-definite constraint in the conventional FM is redundant as it does not improve the sampling complexity while making the model difficult to optimize. We evaluate our improved FM model in a real-time high-precision GPS signal calibration task to validate its superiority.
Tasks	Calibration
Published	2019-01-30
URL	http://arxiv.org/abs/1901.11149v1
PDF	http://arxiv.org/pdf/1901.11149v1.pdf
PWC	https://paperswithcode.com/paper/which-factorization-machine-modeling-is
Repo
Framework
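
For readers unfamiliar with the model the abstract analyses, here is a minimal sketch of the conventional second-order FM prediction, in which self-interaction terms are excluded. It is not the paper's generalized FM or its estimator. It assumes the standard formulation with a bias `w0`, linear weights `w`, and a `d x k` factor matrix `V`, and it relies on the well-known identity that computes the pairwise term in O(kd) rather than O(kd^2). All function names are illustrative.

```python
def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed directly."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity via 0.5 * sum_f [(sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2]."""
    d, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(d)]
        s = sum(terms)
        total += s * s - sum(t * t for t in terms)
    return 0.5 * total


def fm_predict(x, w0, w, V):
    """Bias + linear term + second-order interactions (no self-interactions)."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    return w0 + linear + fm_pairwise_fast(x, V)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # d = 3 features, rank k = 2
    print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
    print(fm_predict(x, 0.5, [1.0, 0.0, -1.0], V))
```

The subtraction inside `fm_pairwise_fast` is what removes the `i == j` terms, so the two pairwise functions agree on any input.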

Approximating Spectral Clustering via Sampling: a Review


Title	Approximating Spectral Clustering via Sampling: a Review
Authors	Nicolas Tremblay, Andreas Loukas
Abstract	Spectral clustering refers to a family of unsupervised learning algorithms that compute a spectral embedding of the original data based on the eigenvectors of a similarity graph. This non-linear transformation of the data is both the key of these algorithms’ success and their Achilles heel: forming a graph and computing its dominant eigenvectors can indeed be computationally prohibitive when dealing with more than a few tens of thousands of points. In this paper, we review the principal research efforts aiming to reduce this computational cost. We focus on methods that come with a theoretical control on the clustering performance and incorporate some form of sampling in their operation. Such methods abound in the machine learning, numerical linear algebra, and graph signal processing literature and, amongst others, include Nyström approximation, landmarks, coarsening, coresets, and compressive spectral clustering. We present the approximation guarantees available for each and discuss practical merits and limitations. Surprisingly, despite the breadth of the literature explored, we conclude that there is still a gap between theory and practice: the most scalable methods are only intuitively motivated or loosely controlled, whereas those that come with end-to-end guarantees rely on strong assumptions or enable a limited gain of computation time.
Tasks
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10204v1
PDF	http://arxiv.org/pdf/1901.10204v1.pdf
PWC	https://paperswithcode.com/paper/approximating-spectral-clustering-via
Repo
Framework

Guided Meta-Policy Search


Title	Guided Meta-Policy Search
Authors	Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
Abstract	Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples because they learn from scratch. Meta-RL aims to address this challenge by leveraging experience from previous tasks in order to more quickly solve new tasks. However, in practice, these algorithms generally also require large amounts of on-policy experience during the meta-training process, making them impractical for use in many problems. To this end, we propose to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks. This involves a nested optimization, with RL in the inner loop and supervised imitation learning in the outer loop. Because the outer loop imitation learning can be done with off-policy data, we can achieve significant gains in meta-learning sample efficiency. In this paper, we show how this general idea can be used both for meta-reinforcement learning and for learning fast RL procedures from multi-task demonstration data. The former results in an approach that can leverage policies learned for previous tasks without significant amounts of on-policy data during meta-training, whereas the latter is particularly useful in cases where demonstrations are easy for a person to provide. Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
Tasks	Continuous Control, Imitation Learning, Meta-Learning
Published	2019-04-01
URL	http://arxiv.org/abs/1904.00956v1
PDF	http://arxiv.org/pdf/1904.00956v1.pdf
PWC	https://paperswithcode.com/paper/guided-meta-policy-search
Repo
Framework

Angular separability of data clusters or network communities in geometrical space and its relevance to hyperbolic embedding


Title	Angular separability of data clusters or network communities in geometrical space and its relevance to hyperbolic embedding
Authors	Alessandro Muscoloni, Carlo Vittorio Cannistraci
Abstract	Analysis of ‘big data’ characterized by high-dimensionality such as word vectors and complex networks often requires their representation in a geometrical space by embedding. Recent developments in machine learning and network geometry have pointed out the hyperbolic space as a useful framework for the representation of this data derived from real complex physical systems. In the hyperbolic space, the radial coordinate of the nodes characterizes their hierarchy, whereas the angular distance between them represents their similarity. Several studies have highlighted the relationship between the angular coordinates of the nodes embedded in the hyperbolic space and the community metadata available. However, such analyses have often been limited to a visual or qualitative assessment. Here, we introduce the angular separation index (ASI), to quantitatively evaluate the separation of node network communities or data clusters over the angular coordinates of a geometrical space. ASI is particularly useful in the hyperbolic space - where it is extensively tested along this study - but can be used in general for any assessment of angular separation regardless of the adopted geometry. ASI is proposed together with an exact test statistic based on a uniformly random null model to assess the statistical significance of the separation. We show that ASI allows the discovery of two significant phenomena in network geometry. The first is that the increase of temperature in 2D hyperbolic network generative models not only reduces the network clustering but also induces a ‘dimensionality jump’ of the network to dimensions higher than two. The second is that ASI can be successfully applied to detect the intrinsic dimensionality of network structures that grow in a hidden geometrical space.
Tasks
Published	2019-06-28
URL	https://arxiv.org/abs/1907.00025v1
PDF	https://arxiv.org/pdf/1907.00025v1.pdf
PWC	https://paperswithcode.com/paper/angular-separability-of-data-clusters-or
Repo
Framework

Globally-Aware Multiple Instance Classifier for Breast Cancer Screening


Title	Globally-Aware Multiple Instance Classifier for Breast Cancer Screening
Authors	Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
Abstract	Deep learning models designed for visual classification tasks on natural images have become prevalent in medical image analysis. However, medical images differ from typical natural images in many ways, such as significantly higher resolutions and smaller regions of interest. Moreover, both the global structure and local details play important roles in medical image analysis tasks. To address these unique properties of medical images, we propose a neural network that is able to classify breast cancer lesions utilizing information from both a global saliency map and multiple local patches. The proposed model outperforms the ResNet-based baseline and achieves radiologist-level performance in the interpretation of screening mammography. Although our model is trained only with image-level labels, it is able to generate pixel-level saliency maps that provide localization of possible malignant findings.
Tasks
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02846v2
PDF	https://arxiv.org/pdf/1906.02846v2.pdf
PWC	https://paperswithcode.com/paper/globally-aware-multiple-instance-classifier
Repo
Framework

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning


Title	Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
Authors	Alexander H. Liu, Tao Tu, Hung-yi Lee, Lin-shan Lee
Abstract	In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances. This is achieved by proper temporal segmentation to make the representations phoneme-synchronized, and proper phonetic clustering to have total number of distinct representations close to the number of phonemes. Mapping between the distinct representations and phonemes is learned from a small amount of annotated paired data. Preliminary experiments on LJSpeech demonstrated the learned representations for vowels have relative locations in latent space in good parallel to that shown in the IPA vowel chart defined by linguistics experts. With less than 20 minutes of annotated speech, our method outperformed existing methods on phoneme recognition and is able to synthesize intelligible speech that beats our baseline model.
Tasks	Quantization, Representation Learning, Speech Recognition
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12729v2
PDF	https://arxiv.org/pdf/1910.12729v2.pdf
PWC	https://paperswithcode.com/paper/towards-unsupervised-speech-recognition-and
Repo
Framework

Cascaded Deep Neural Networks for Retinal Layer Segmentation of Optical Coherence Tomography with Fluid Presence


Title	Cascaded Deep Neural Networks for Retinal Layer Segmentation of Optical Coherence Tomography with Fluid Presence
Authors	Donghuan Lu, Morgan Heisler, Da Ma, Setareh Dabiri, Sieun Lee, Gavin Weiguang Ding, Marinko V. Sarunic, Mirza Faisal Beg
Abstract	Optical coherence tomography (OCT) is a non-invasive imaging technology which can provide micrometer-resolution cross-sectional images of the inner structures of the eye. It is widely used for the diagnosis of ophthalmic diseases with retinal alteration, such as layer deformation and fluid accumulation. In this paper, a novel framework was proposed to segment retinal layers with fluid presence. The main contribution of this study is twofold: 1) we developed a cascaded network framework to incorporate the prior structural knowledge; 2) we proposed a novel deep neural network based on U-Net and fully convolutional network, termed LF-UNet. Cross validation experiments proved that the proposed LF-UNet has superior performance compared with the state-of-the-art methods, and incorporating the relative distance map structural prior information could further improve the performance regardless of the network.
Tasks
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03418v1
PDF	https://arxiv.org/pdf/1912.03418v1.pdf
PWC	https://paperswithcode.com/paper/cascaded-deep-neural-networks-for-retinal
Repo
Framework

CNNs, LSTMs, and Attention Networks for Pathology Detection in Medical Data


Title	CNNs, LSTMs, and Attention Networks for Pathology Detection in Medical Data
Authors	Nora Vogt
Abstract	For the weakly supervised task of electrocardiogram (ECG) rhythm classification, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks are two increasingly popular classification models. This work investigates whether a combination of both architectures to so-called convolutional long short-term memory (ConvLSTM) networks can improve classification performances by explicitly capturing morphological as well as temporal features of raw ECG records. In addition, various attention mechanisms are studied to localize and visualize record sections of abnormal morphology and irregular rhythm. The resulting saliency maps are supposed to not only allow for a better network understanding but to also improve clinicians’ acceptance of automatic diagnosis in order to avoid the technique being labeled as a black box. In further experiments, attention mechanisms are actively incorporated into the training process by learning a few additional attention gating parameters in a CNN model. An 8-fold cross validation is finally carried out on the PhysioNet Computing in Cardiology (CinC) challenge 2017 to compare the performances of standard CNN models, ConvLSTMs, and attention gated CNNs.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00852v1
PDF	https://arxiv.org/pdf/1912.00852v1.pdf
PWC	https://paperswithcode.com/paper/cnns-lstms-and-attention-networks-for
Repo
Framework

Cognitive Assessment Estimation from Behavioral Responses in Emotional Faces Evaluation Task – AI Regression Approach for Dementia Onset Prediction in Aging Societies


Title	Cognitive Assessment Estimation from Behavioral Responses in Emotional Faces Evaluation Task – AI Regression Approach for Dementia Onset Prediction in Aging Societies
Authors	Tomasz M. Rutkowski, Masato S. Abe, Marcin Koculak, Mihoko Otake-Matsuura
Abstract	We present a practical health-theme machine learning (ML) application concerning the ‘AI for social good’ domain for the ‘Producing Good Outcomes’ track. In particular, the solution concerns the problem of predicting potential dementia onset in elderly adults in aging societies. The paper discusses our attempt and encouraging preliminary study results of behavioral responses analysis in a working memory-based emotional evaluation experiment. We focus on the development of digital biomarkers for dementia progress detection and monitoring. We present a behavioral data collection concept for a subsequent AI-based application together with a range of encouraging regression results of Montreal Cognitive Assessment (MoCA) scores in the leave-one-subject-out cross-validation setup. The regressor input variables include experimental subject’s emotional valence and arousal recognition responses, as well as reaction times, together with self-reported education levels and ages, obtained from a group of twenty older adults taking part in the reported data collection project. The presented results showcase the potential social benefits of artificial intelligence application for the elderly and establish a step forward to develop ML approaches, for the subsequent application of simple behavioral objective testing for dementia onset diagnostics replacing subjective MoCA.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.12135v1
PDF	https://arxiv.org/pdf/1911.12135v1.pdf
PWC	https://paperswithcode.com/paper/cognitive-assessment-estimation-from
Repo
Framework
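
The abstract evaluates its regressors in a leave-one-subject-out cross-validation setup. The following is a minimal sketch of how such splits are commonly generated, assuming each sample carries a subject identifier. It is a generic illustration, not the authors' pipeline, and the function name and toy identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    Every sample of the held-out subject goes to the test set, so no subject
    ever appears on both sides of a split.
    """
    # Preserve first-seen order of subjects so the splits are deterministic.
    subjects = []
    for s in subject_ids:
        if s not in subjects:
            subjects.append(s)
    for held_out in subjects:
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy example: five samples recorded from three subjects.
    ids = ["a", "a", "b", "c", "b"]
    for held_out, train, test in leave_one_subject_out(ids):
        print(held_out, train, test)
```

With twenty participants, as in the study described, this scheme would produce twenty splits, each testing on one person's responses only.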

The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives


Title	The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Authors	Elena Voita, Rico Sennrich, Ivan Titov
Abstract	We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We focus on the Transformers for our analysis as they have been shown to be effective on various tasks, including machine translation (MT), standard left-to-right language models (LM) and masked language modeling (MLM). Previous work used black-box probing tasks to show that the representations learned by the Transformer differ significantly depending on the objective. In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and how this process depends on the choice of learning objective. For example, as you go from bottom to top layers, information about the past in left-to-right language models vanishes and predictions about the future get formed. In contrast, for MLM, representations initially acquire information about the context around the token, partially forgetting the token identity and producing a more generalized token representation. The token identity then gets recreated at the top MLM layers.
Tasks	Language Modelling, Machine Translation
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01380v1
PDF	https://arxiv.org/pdf/1909.01380v1.pdf
PWC	https://paperswithcode.com/paper/the-bottom-up-evolution-of-representations-in
Repo
Framework