October 18, 2019


Paper Group ANR 545


Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning

Title Anomaly Detection and Localization in Crowded Scenes by Motion-field Shape Description and Similarity-based Statistical Learning
Authors Xinfeng Zhang, Su Yang, Xinjian Zhang, Weishan Zhang, Jiulong Zhang
Abstract In crowded scenes, detecting and localizing abnormal behaviors is challenging because high crowd density makes object segmentation and tracking extremely difficult. We associate the optical flows of multiple frames to capture short-term trajectories and introduce the histogram-based shape descriptor known as shape context to describe such short-term trajectories. Furthermore, we propose a K-NN similarity-based statistical model to detect anomalies over time and space; it is an unsupervised one-class learning algorithm requiring neither clustering nor any prior assumptions. First, we retrieve the K-NN samples from the training set for each test sample, and then use the similarities between every pair of the K-NN samples to construct a Gaussian model. Finally, the probabilities of the similarities from the test sample to the K-NN samples under the Gaussian model are combined into a joint probability. Abnormal events are detected by judging whether the joint probability falls below predefined thresholds in time and space, separately. Such a scheme adapts to the whole scene, since the probability computed this way is not affected by motion distortions arising from perspective distortion. We conduct experiments on real-world surveillance videos, and the results demonstrate that the proposed method reliably detects and locates abnormal events in video sequences, outperforming state-of-the-art approaches. A minimal code sketch of this scoring scheme follows this entry.
Tasks Anomaly Detection, Semantic Segmentation
Published 2018-05-27
URL http://arxiv.org/abs/1805.10620v1
PDF http://arxiv.org/pdf/1805.10620v1.pdf
PWC https://paperswithcode.com/paper/anomaly-detection-and-localization-in-crowded
Repo
Framework
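
The scoring scheme above is concrete enough to sketch. Below is a hedged, minimal NumPy version, using Euclidean distance over feature vectors as a stand-in for the paper's shape-context matching cost; `knn_gaussian_score` and all constants are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the K-NN similarity-based statistical model:
# fit a Gaussian to the pairwise similarities among a test sample's K
# nearest training neighbours, then score the test sample by the joint
# probability of its own similarities to those neighbours.
import numpy as np

def knn_gaussian_score(x, train, k=8):
    """Joint log-probability of x's similarities to its K-NN."""
    # Nearest neighbours by Euclidean distance (stand-in for the
    # paper's shape-context matching cost).
    d = np.linalg.norm(train - x, axis=1)
    nn = train[np.argsort(d)[:k]]

    # Pairwise similarities among the K neighbours -> Gaussian model.
    sims = [-np.linalg.norm(nn[i] - nn[j])
            for i in range(k) for j in range(i + 1, k)]
    mu, sigma = np.mean(sims), np.std(sims) + 1e-8

    # Joint probability of the test sample's similarities under it.
    s = -np.linalg.norm(nn - x, axis=1)
    logp = -0.5 * ((s - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return logp.sum()  # flag an anomaly when this falls below a threshold

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 16))              # normal short-term trajectories
print(knn_gaussian_score(train[0], train))      # in-distribution: higher score
print(knn_gaussian_score(np.full(16, 5.0), train))  # outlier: much lower score
```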

A Survey of Deep Facial Attribute Analysis

Title A Survey of Deep Facial Attribute Analysis
Authors Xin Zheng, Yanqing Guo, Huaibo Huang, Yi Li, Ran He
Abstract Facial attribute analysis has received considerable attention as deep learning techniques have made remarkable breakthroughs in this field over the past few years. Deep learning based facial attribute analysis consists of two basic sub-issues: facial attribute estimation (FAE), which recognizes whether facial attributes are present in given images, and facial attribute manipulation (FAM), which synthesizes or removes desired facial attributes. In this paper, we provide a comprehensive survey of deep facial attribute analysis from the perspectives of both estimation and manipulation. First, we summarize the general pipeline that deep facial attribute analysis follows, which comprises two stages: data preprocessing and model construction. We also introduce the underlying theories of this two-stage pipeline for both FAE and FAM. Second, we present the datasets and performance metrics commonly used in facial attribute analysis. Third, we create a taxonomy of state-of-the-art methods and review deep FAE and FAM algorithms in detail. Furthermore, we introduce several related facial attribute issues as well as relevant real-world applications. Finally, we discuss open challenges and promising future research directions.
Tasks
Published 2018-12-26
URL https://arxiv.org/abs/1812.10265v3
PDF https://arxiv.org/pdf/1812.10265v3.pdf
PWC https://paperswithcode.com/paper/a-survey-to-deep-facial-attribute-analysis
Repo
Framework

Guided Proceduralization: Optimizing Geometry Processing and Grammar Extraction for Architectural Models

Title Guided Proceduralization: Optimizing Geometry Processing and Grammar Extraction for Architectural Models
Authors Ilke Demir, Daniel G. Aliaga
Abstract We describe a guided proceduralization framework that optimizes geometry processing on architectural input models to extract target grammars. We aim to provide efficient artistic workflows by creating procedural representations from existing 3D models, where the procedural expressiveness is controlled by the user. Architectural reconstruction and modeling tasks have been handled either as time-consuming manual processes or as procedural generation that is difficult to control and influence artistically. We bridge the gap between creation and generation by converting existing, manually modeled architecture into procedurally editable parametrized models, and by carrying the guidance into the procedural domain by letting the user define the target procedural representation. Additionally, we propose various applications of such procedural representations, including guided completion of point cloud models, controllable 3D city modeling, and other benefits of procedural modeling.
Tasks
Published 2018-07-06
URL http://arxiv.org/abs/1807.02578v1
PDF http://arxiv.org/pdf/1807.02578v1.pdf
PWC https://paperswithcode.com/paper/guided-proceduralization-optimizing-geometry
Repo
Framework

Zero-shot Transfer Learning for Semantic Parsing

Title Zero-shot Transfer Learning for Semantic Parsing
Authors Javid Dadashkarimi, Alexander Fabbri, Sekhar Tatikonda, Dragomir R. Radev
Abstract While neural networks have shown impressive performance on large datasets, applying these models to tasks where little data is available remains challenging. In this paper we propose to use feature transfer in a zero-shot experimental setting on the task of semantic parsing. We first introduce a new method for learning a shared space between multiple domains based on predicting the domain label of each example. Our experiments show the superiority of this method in a zero-shot setting, in terms of accuracy, over state-of-the-art techniques. In the second part of the paper we study the impact of individual domains and examples on semantic parsing performance. To this end, we use influence functions and investigate the sensitivity of the domain-label classification loss to each example. Our findings reveal that cross-domain adversarial attacks identify useful training examples even from the domains least similar to the target domain. Augmenting our training data with these influential examples further boosts accuracy at both the token and sequence level. A toy sketch of the domain-label objective appears after this entry.
Tasks Accuracy Metrics, Semantic Parsing, Transfer Learning
Published 2018-08-27
URL http://arxiv.org/abs/1808.09889v1
PDF http://arxiv.org/pdf/1808.09889v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-transfer-learning-for-semantic
Repo
Framework
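
As a rough illustration of the domain-label idea, here is a hedged PyTorch sketch in which an encoder's shared representation is shaped by a domain-classification head. The bag-of-tokens encoder, layer sizes, and training signal are assumptions for the example, not the paper's model.

```python
import torch
import torch.nn as nn

class DomainAwareEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_domains=4):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # bag-of-tokens utterance encoder
        self.shared = nn.Linear(dim, dim)          # shared space across domains
        self.domain_head = nn.Linear(dim, n_domains)

    def forward(self, token_ids, offsets):
        h = torch.tanh(self.shared(self.embed(token_ids, offsets)))
        return h, self.domain_head(h)              # features + domain logits

model = DomainAwareEncoder()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: two utterances as flattened token ids with offsets, plus
# their domain labels. Predicting the domain shapes the shared space h.
tokens = torch.tensor([3, 17, 42, 7, 99])
offsets = torch.tensor([0, 3])
domains = torch.tensor([0, 2])

opt.zero_grad()
h, logits = model(tokens, offsets)
loss = loss_fn(logits, domains)
loss.backward()
opt.step()
```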

Fully Scalable Gaussian Processes using Subspace Inducing Inputs

Title Fully Scalable Gaussian Processes using Subspace Inducing Inputs
Authors Aristeidis Panos, Petros Dellaportas, Michalis K. Titsias
Abstract We introduce fully scalable Gaussian processes, an implementation scheme that tackles a large number of training instances together with high-dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with matrix-preconditioning-based parametrizations of the variational distributions that lead to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems, which carry the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performance together with low computational times on datasets with extremely large numbers of instances and input dimensions.
Tasks Extreme Multi-Label Classification, Gaussian Processes, Multi-Label Classification
Published 2018-07-06
URL http://arxiv.org/abs/1807.02537v2
PDF http://arxiv.org/pdf/1807.02537v2.pdf
PWC https://paperswithcode.com/paper/fully-scalable-gaussian-processes-using
Repo
Framework

Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks

Title Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks
Authors Jonathan Rubin, Deepan Sanghavi, Claire Zhao, Kathy Lee, Ashequl Qadir, Minnan Xu-Wilson
Abstract The MIMIC-CXR dataset is (to date) the largest released chest x-ray dataset, consisting of 473,064 chest x-rays and 206,574 radiology reports collected from 63,478 patients. We present the results of training and evaluating a collection of deep convolutional neural networks on this dataset to recognize multiple common thorax diseases. To the best of our knowledge, this is the first work to train CNNs for this task on such a large collection of chest x-ray images, over four times the size of the largest previously released chest x-ray corpus (ChestX-Ray14). We describe and evaluate individual CNN models trained on frontal and lateral CXR view types. In addition, we present a novel DualNet architecture that emulates routine clinical practice by simultaneously processing both the frontal and lateral CXR images obtained from a radiological exam. Our DualNet architecture shows improved performance in recognizing findings in CXR images compared to applying separate baseline frontal and lateral classifiers. A schematic of the dual-branch idea follows this entry.
Tasks
Published 2018-04-20
URL http://arxiv.org/abs/1804.07839v2
PDF http://arxiv.org/pdf/1804.07839v2.pdf
PWC https://paperswithcode.com/paper/large-scale-automated-reading-of-frontal-and
Repo
Framework
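
A minimal PyTorch sketch of the dual-branch idea: two CNN branches, one per view, whose features are concatenated before a shared classifier. The toy convolutional backbone and the 14-finding label set are assumptions; the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class DualNet(nn.Module):
    def __init__(self, n_findings=14):
        super().__init__()
        def branch():  # tiny stand-in backbone, one per CXR view
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.frontal = branch()
        self.lateral = branch()
        self.classifier = nn.Linear(32 * 2, n_findings)

    def forward(self, frontal, lateral):
        f = torch.cat([self.frontal(frontal), self.lateral(lateral)], dim=1)
        return self.classifier(f)   # one logit per finding

model = DualNet()
logits = model(torch.randn(2, 1, 224, 224), torch.randn(2, 1, 224, 224))
probs = torch.sigmoid(logits)       # independent probability per disease
```

Sigmoid outputs make this a multi-label classifier, matching the several-findings-per-exam nature of chest x-ray reading.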

Real Vector Spaces and the Cauchy-Schwarz Inequality in ACL2(r)

Title Real Vector Spaces and the Cauchy-Schwarz Inequality in ACL2(r)
Authors Carl Kwan, Mark R. Greenstreet
Abstract We present a mechanical proof of the Cauchy-Schwarz inequality in ACL2(r) and a formalisation of the mathematics necessary to undertake such a proof. This includes the formalisation of $\mathbb{R}^n$ as an inner product space. We also provide an application of Cauchy-Schwarz by formalising $\mathbb R^n$ as a metric space and exhibiting continuity for some simple functions $\mathbb R^n\to\mathbb R$. The Cauchy-Schwarz inequality relates the magnitude of a vector to its projection (or inner product) onto another: $\langle u,v\rangle \leq \lVert u\rVert\,\lVert v\rVert$, with equality iff the vectors are linearly dependent. It finds frequent use in many branches of mathematics, including linear algebra, real analysis, functional analysis, and probability. Indeed, the inequality is considered to be among “The Hundred Greatest Theorems” and is listed in the “Formalizing 100 Theorems” project. To the best of our knowledge, our formalisation is the first published proof using ACL2(r) or any other first-order theorem prover. A standard pencil-and-paper derivation is sketched after this entry.
Tasks
Published 2018-10-10
URL http://arxiv.org/abs/1810.04315v1
PDF http://arxiv.org/pdf/1810.04315v1.pdf
PWC https://paperswithcode.com/paper/real-vector-spaces-and-the-cauchy-schwarz
Repo
Framework
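
For reference, the classical discriminant argument behind the inequality (this is the textbook proof, not the ACL2(r) script itself):

```latex
% For all real t, the squared norm of u + t v is non-negative:
\[
0 \le \langle u + t v,\, u + t v \rangle
  = \lVert u \rVert^2 + 2t\,\langle u, v \rangle + t^2 \lVert v \rVert^2 .
\]
% A quadratic in t that is non-negative everywhere has a non-positive
% discriminant:
\[
4\,\langle u, v \rangle^2 - 4\,\lVert u \rVert^2 \lVert v \rVert^2 \le 0
\quad\Longrightarrow\quad
\lvert \langle u, v \rangle \rvert \le \lVert u \rVert\,\lVert v \rVert ,
\]
% with equality iff u + t v = 0 for some t, i.e. u and v are linearly
% dependent (taking v != 0; the case v = 0 is immediate).
```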

Physics-based Scene-level Reasoning for Object Pose Estimation in Clutter

Title Physics-based Scene-level Reasoning for Object Pose Estimation in Clutter
Authors Chaitanya Mitash, Abdeslam Boularias, Kostas Bekris
Abstract This paper focuses on vision-based pose estimation for multiple rigid objects placed in clutter, especially in cases involving occlusions and objects resting on each other. Progress has been achieved recently in object recognition given advancements in deep learning. Nevertheless, such tools typically require a large amount of training data and significant manual effort to label objects. This limits their applicability in robotics, where solutions must scale to a large number of objects and a variety of conditions. Moreover, the combinatorial nature of the scenes that can arise from the placement of multiple objects is hard to capture in a training dataset, so the learned models might not produce the level of precision required for tasks such as robotic manipulation. This work proposes an autonomous process for pose estimation that spans data generation, scene-level reasoning, and self-learning. In particular, the proposed framework first generates a labeled dataset for training a Convolutional Neural Network (CNN) for object detection in clutter. These detections are used to guide a scene-level optimization process, which considers the interactions between the different objects present in the clutter to output pose estimates of high precision. Furthermore, confident estimates are used to label real images online from multiple views and re-train the process in a self-learning pipeline. Experimental results indicate that this process quickly identifies physically consistent object poses in cluttered scenes that are more precise than those found by reasoning over individual object instances. Furthermore, the quality of pose estimates increases over time thanks to the self-learning process. The self-learning loop is sketched after this entry.
Tasks Object Detection, Object Recognition, Pose Estimation
Published 2018-06-25
URL http://arxiv.org/abs/1806.10457v2
PDF http://arxiv.org/pdf/1806.10457v2.pdf
PWC https://paperswithcode.com/paper/physics-based-scene-level-reasoning-for
Repo
Framework
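
A schematic of the generate-detect-optimize-relabel loop described above. Every helper below is a hypothetical placeholder standing in for the authors' actual components (a physics simulator, a CNN detector, and a scene-level pose search); only the control flow is meant to match the abstract.

```python
# Hypothetical stand-ins for the pipeline's real components:
def render_synthetic_scenes(models):
    return [("synthetic_image", m) for m in models]       # simulated, labeled data

def train_detector(dataset):
    return lambda image: ["detection"]                    # stand-in CNN detector

def scene_level_optimization(detections, models):
    return {"poses": detections}, 0.95                    # joint poses + confidence

def self_learning_pose_estimation(real_images, models, rounds=3, tau=0.9):
    dataset = render_synthetic_scenes(models)   # 1) free labels via simulation
    detector = train_detector(dataset)
    for _ in range(rounds):
        labeled = []
        for image in real_images:
            detections = detector(image)
            # 2) joint, physics-consistent reasoning over the whole scene
            poses, score = scene_level_optimization(detections, models)
            if score > tau:                     # 3) confident -> self-label
                labeled.append((image, poses))
        detector = train_detector(dataset + labeled)   # 4) re-train
    return detector

detector = self_learning_pose_estimation(["img1", "img2"], ["mug", "box"])
```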

Live Video Comment Generation Based on Surrounding Frames and Live Comments

Title Live Video Comment Generation Based on Surrounding Frames and Live Comments
Authors Damai Dai
Abstract In this paper, we propose the task of live comment generation. Live comments are a new form of comments on videos, which can be regarded as a mixture of comments and chats. A high-quality live comment should be not only relevant to the video, but also interactive with other users. In this work, we first construct a new dataset for live comment generation. Then, we propose a novel end-to-end model to generate the human-like live comments by referring to the video and the other users’ comments. Finally, we evaluate our model on the constructed dataset. Experimental results show that our method can significantly outperform the baselines.
Tasks
Published 2018-08-13
URL http://arxiv.org/abs/1808.04091v1
PDF http://arxiv.org/pdf/1808.04091v1.pdf
PWC https://paperswithcode.com/paper/live-video-comment-generation-based-on
Repo
Framework

Fast Variance Reduction Method with Stochastic Batch Size

Title Fast Variance Reduction Method with Stochastic Batch Size
Authors Xuanqing Liu, Cho-Jui Hsieh
Abstract In this paper we study a family of variance reduction methods with randomized batch size: at each step, the algorithm first randomly chooses the batch size and then selects a batch of samples to conduct a variance-reduced stochastic update. We give the linear convergence rate of this framework for composite functions, and show that the optimal strategy per data access is to always choose a batch size of 1, which is equivalent to the SAGA algorithm. However, due to cache/disk IO effects in computer architecture, the number of data accesses does not reflect the running time, because 1) random memory access is much slower than sequential access, and 2) when data is too big to fit into memory, disk seeks take even longer. After taking these into account, a batch size of $1$ is no longer optimal, so we propose a new algorithm called SAGA++ and show how to calculate the optimal average batch size theoretically. Our algorithm outperforms SAGA and other existing batched and stochastic solvers on real datasets. In addition, we conduct a precise analysis comparing different update rules for variance reduction methods, showing that SAGA++ converges faster than SVRG in theory. A sketch of the randomized-batch SAGA update follows this entry.
Tasks
Published 2018-08-07
URL http://arxiv.org/abs/1808.02169v1
PDF http://arxiv.org/pdf/1808.02169v1.pdf
PWC https://paperswithcode.com/paper/fast-variance-reduction-method-with
Repo
Framework
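
A hedged NumPy sketch of SAGA with a randomized batch size, the family the abstract describes. SAGA++'s cache-aware choice of the batch-size distribution is not reproduced; the least-squares objective, step size, and uniform batch-size law are assumptions for the example.

```python
import numpy as np

def saga_stochastic_batch(X, y, lr=0.01, steps=20000, max_b=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    table = np.zeros((n, d))        # stored per-sample gradients
    avg = table.mean(axis=0)        # running mean of the table

    for _ in range(steps):
        b = int(rng.integers(1, max_b + 1))     # random batch size
        idx = rng.choice(n, size=b, replace=False)
        g = X[idx] * (X[idx] @ w - y[idx])[:, None]   # per-sample grads
        # Variance-reduced step: fresh grads minus stored grads plus mean.
        w -= lr * ((g - table[idx]).mean(axis=0) + avg)
        avg += (g - table[idx]).sum(axis=0) / n       # maintain the mean
        table[idx] = g
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true
print(np.linalg.norm(saga_stochastic_batch(X, y) - w_true))  # small residual
```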

Predicting the Next Best View for 3D Mesh Refinement

Title Predicting the Next Best View for 3D Mesh Refinement
Authors Luca Morreale, Andrea Romanoni, Matteo Matteucci
Abstract 3D reconstruction is a core task in many applications such as robot navigation or site inspection. Finding the best poses from which to capture part of the scene is one of the most challenging problems in this area, and goes under the name of Next Best View. Recently, many volumetric methods have been proposed; they choose the Next Best View by reasoning over a 3D voxelized space and finding the pose that minimizes the uncertainty encoded in the voxels. Such methods are effective, but they do not scale well, since the underlying representation requires a huge amount of memory. In this paper we propose a novel mesh-based approach that focuses on the worst reconstructed region of the environment mesh. We define a photo-consistency index to evaluate the accuracy of the 3D mesh, and an energy function over the worst regions of the mesh that takes into account the mutual parallax with respect to the previous cameras, the angle of incidence of the viewing ray on the surface, and the visibility of the region. We test our approach on a well-known dataset and achieve state-of-the-art results. An illustrative view-scoring sketch follows this entry.
Tasks 3D Reconstruction, Robot Navigation
Published 2018-05-16
URL http://arxiv.org/abs/1805.06207v1
PDF http://arxiv.org/pdf/1805.06207v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-next-best-view-for-3d-mesh
Repo
Framework
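
To make the three cues concrete, here is an illustrative NumPy scoring function for a candidate view of a poorly reconstructed surface point. The weights and the linear combination are assumptions, not the paper's energy function.

```python
import numpy as np

def view_score(cand_cam, prev_cams, point, normal, visible=True,
               w_parallax=1.0, w_incidence=1.0):
    if not visible:
        return -np.inf                   # region must be visible at all
    ray = point - cand_cam
    ray /= np.linalg.norm(ray)
    # Incidence: prefer viewing rays aligned with the surface normal.
    incidence = abs(np.dot(ray, normal))          # 1 = head-on, 0 = grazing
    # Parallax: angle at the surface point between the candidate camera
    # and each previous camera; reward baseline w.r.t. the closest view.
    angles = []
    for c in prev_cams:
        v = c - point
        v /= np.linalg.norm(v)
        angles.append(np.arccos(np.clip(np.dot(-ray, v), -1.0, 1.0)))
    return w_parallax * min(angles) + w_incidence * incidence

point = np.array([0.0, 0.0, 0.0])
normal = np.array([0.0, 0.0, 1.0])
prev = [np.array([0.0, 0.0, 2.0])]
print(view_score(np.array([1.5, 0.0, 1.5]), prev, point, normal))  # oblique: higher
print(view_score(np.array([0.0, 0.0, 2.0]), prev, point, normal))  # redundant: lower
```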

Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery

Title Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery
Authors Boitumelo Ruf, Laurenz Thiel, Martin Weinmann
Abstract With the technological advancements of aerial imagery and accurate 3D reconstruction of urban environments, more and more attention has been paid to the automated analysis of urban areas. In our work, we examine two aspects that enable live analysis of building structures in city models from oblique aerial imagery: automatic building extraction with convolutional neural networks (CNNs) and selective real-time depth estimation from aerial imagery. We use transfer learning to train the Faster R-CNN method for real-time deep object detection, combining a large ground-based dataset for urban scene understanding with a smaller number of images from an aerial dataset. We achieve an average precision (AP) of about 80% for the task of building extraction on a selected evaluation dataset. Our evaluation covers both dataset-specific learning and transfer learning. Furthermore, we present an algorithm that allows for multi-view depth estimation from aerial imagery in real time. We adopt the semi-global matching (SGM) optimization strategy to preserve sharp edges at object boundaries. In combination with the Faster R-CNN, it allows a selective reconstruction of buildings, identified as regions of interest (RoIs), from oblique aerial imagery. A minimal SGM example appears after this entry.
Tasks 3D Reconstruction, Depth Estimation, Object Detection, Scene Understanding, Transfer Learning
Published 2018-04-23
URL https://arxiv.org/abs/1804.08302v3
PDF https://arxiv.org/pdf/1804.08302v3.pdf
PWC https://paperswithcode.com/paper/deep-cross-domain-building-extraction-for
Repo
Framework
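
Semi-global matching is available off the shelf in OpenCV. The sketch below computes a disparity and depth map for a rectified stereo pair, as a stand-in for the paper's real-time multi-view aerial variant; the file names, penalty settings, and intrinsics are placeholder assumptions.

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # assumed rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,     # must be a multiple of 16
    blockSize=5,
    P1=8 * 5 * 5,          # SGM smoothness penalty for small disparity jumps
    P2=32 * 5 * 5,         # larger penalty for big jumps (keeps edges sharp)
)
# OpenCV returns fixed-point disparities scaled by 16.
disp = sgm.compute(left, right).astype(np.float32) / 16.0

f, baseline = 1000.0, 0.5                    # dummy focal length (px), baseline (m)
depth = np.where(disp > 0, f * baseline / disp, 0.0)   # triangulated depth map
```

In the paper's setting, such depth maps would only be computed inside building RoIs delivered by the Faster R-CNN detector, which is what makes the reconstruction selective.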

Outfit Generation and Style Extraction via Bidirectional LSTM and Autoencoder

Title Outfit Generation and Style Extraction via Bidirectional LSTM and Autoencoder
Authors Takuma Nakamura, Ryosuke Goto
Abstract When creating an outfit, style is a criterion in selecting each fashion item. This means that style can be regarded as a feature of the overall outfit. However, among the many previous studies on outfit generation, few methods focus on the global information obtained from an outfit. To address this deficiency, we incorporate an unsupervised style extraction module into a model that learns outfits. Using the style information of an outfit as a whole, the proposed model generates outfits more flexibly without requiring additional information, and the extracted style information is easy to interpret. The proposed model was evaluated on two human-generated outfit datasets. In a fashion item prediction task (missing-item prediction), the proposed model outperformed a baseline method. In a style extraction task, it extracted easily distinguishable styles. In an outfit generation task, it generated an outfit while controlling its styles, which allows us to generate fashionable outfits according to various preferences. A sketch of this architecture follows this entry.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1807.03133v3
PDF http://arxiv.org/pdf/1807.03133v3.pdf
PWC https://paperswithcode.com/paper/outfit-generation-and-style-extraction-via
Repo
Framework
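
A hedged PyTorch sketch of one plausible reading of the architecture: a bidirectional LSTM over the item sequence for missing-item prediction, plus an autoencoder that bottlenecks the pooled outfit feature into a small style vector. All dimensions and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class OutfitModel(nn.Module):
    def __init__(self, n_items=5000, dim=64, style_dim=8):
        super().__init__()
        self.embed = nn.Embedding(n_items, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.item_head = nn.Linear(2 * dim, n_items)   # missing-item prediction
        # Unsupervised style extraction: autoencode the whole-outfit feature
        # through a small bottleneck, yielding an interpretable style code.
        self.style_enc = nn.Linear(dim, style_dim)
        self.style_dec = nn.Linear(style_dim, dim)

    def forward(self, items):                 # items: (batch, seq) of item ids
        e = self.embed(items)
        h, _ = self.lstm(e)
        logits = self.item_head(h)            # per-position item logits
        outfit = e.mean(dim=1)                # global (outfit-level) feature
        style = self.style_enc(outfit)        # low-dimensional style code
        recon = self.style_dec(style)
        return logits, style, nn.functional.mse_loss(recon, outfit)

model = OutfitModel()
logits, style, ae_loss = model(torch.randint(0, 5000, (2, 4)))
```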

Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously

Title Editable Generative Adversarial Networks: Generating and Editing Faces Simultaneously
Authors Kyungjune Baek, Duhyeon Bang, Hyunjung Shim
Abstract We propose a novel framework for simultaneously generating and manipulating face images with desired attributes. While state-of-the-art attribute editing techniques have achieved impressive performance in creating realistic attribute effects, they only address the image editing problem, using the input image as the condition of the model. Recently, several studies have attempted to tackle both novel face generation and attribute editing with a single solution, but their image quality is still unsatisfactory. Our goal is to develop a single unified model that can simultaneously create and edit high-quality face images with desired attributes. The key idea of our work is to decompose the image into a latent vector and an attribute vector in a low-dimensional representation, and then utilize the GAN framework to map this low-dimensional representation to the image. In this way, we address both the generation and editing problems by learning a single generator. In both qualitative and quantitative evaluations, the proposed algorithm outperforms recent algorithms addressing the same problem. We also show that our model achieves performance competitive with the state-of-the-art attribute editing technique in terms of attribute editing quality. A sketch of this decomposition follows this entry.
Tasks Face Generation
Published 2018-07-20
URL http://arxiv.org/abs/1807.07700v1
PDF http://arxiv.org/pdf/1807.07700v1.pdf
PWC https://paperswithcode.com/paper/editable-generative-adversarial-networks
Repo
Framework
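
A minimal sketch of the decomposition, assuming toy convolutional networks: an encoder maps a face to a latent vector z plus an attribute vector a, a generator maps (z, a) back to an image, and editing amounts to changing a. Losses and the adversarial discriminator are omitted; everything here is illustrative.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, z_dim=64, n_attrs=5):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_z = nn.Linear(64, z_dim)
        self.to_a = nn.Linear(64, n_attrs)

    def forward(self, x):
        h = self.net(x)
        return self.to_z(h), torch.sigmoid(self.to_a(h))  # latent + attributes

class Generator(nn.Module):
    def __init__(self, z_dim=64, n_attrs=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_attrs, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=1))

E, G = Encoder(), Generator()
x = torch.randn(1, 3, 32, 32)       # toy "face" image
z, a = E(x)
a_edit = a.clone()
a_edit[:, 0] = 1.0                   # turn attribute 0 on
edited = G(z, a_edit)                # edited face; G(z, a) would reconstruct
```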

Average Margin Regularization for Classifiers

Title Average Margin Regularization for Classifiers
Authors Matt Olfat, Anil Aswani
Abstract Adversarial robustness has become an important research topic given empirical demonstrations of the lack of robustness of deep neural networks. Unfortunately, recent theoretical results suggest that adversarial training induces a strict tradeoff between classification accuracy and adversarial robustness. In this paper, we propose and study a new regularization for any margin classifier or deep neural network. We motivate this regularization by a novel generalization bound that shows a tradeoff in classifier accuracy between maximizing its margin and its average margin. We thus call our approach average margin (AM) regularization; it consists of a linear term added to the objective. We theoretically show that for certain distributions AM regularization can improve both classifier accuracy and robustness to adversarial attacks. We conclude by using both synthetic and real data to empirically show that AM regularization can strictly improve both accuracy and robustness for support vector machines (SVMs), relative to unregularized and adversarially trained classifiers. A toy implementation is sketched after this entry.
Tasks
Published 2018-10-09
URL https://arxiv.org/abs/1810.03773v3
PDF https://arxiv.org/pdf/1810.03773v3.pdf
PWC https://paperswithcode.com/paper/average-margin-regularization-for-classifiers
Repo
Framework
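
One way to read "a linear term added to the objective" is a penalty rewarding a large mean signed margin alongside a standard hinge-loss SVM. The NumPy subgradient sketch below makes that reading concrete, with the caveat that the sign, weighting, and solver are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def svm_am_subgradient(X, y, lam=0.01, mu=0.1, lr=0.05, steps=2000):
    """Hinge loss + L2 penalty - mu * average signed margin (assumed form)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        m = y * (X @ w + b)                   # per-sample signed margins
        viol = m < 1                          # hinge-active samples
        gw = lam * w
        gw -= (y[viol, None] * X[viol]).sum(axis=0) / n   # hinge subgradient
        gw -= mu * (y[:, None] * X).mean(axis=0)          # AM linear term
        gb = -y[viol].sum() / n - mu * y.mean()
        w -= lr * gw
        b -= lr * gb
    return w, b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1, 1, (50, 2)), rng.normal(-1, 1, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
w, b = svm_am_subgradient(X, y)
print((np.sign(X @ w + b) == y).mean())   # training accuracy on toy data
```

The AM term is linear in (w, b), so it simply tilts the solution toward directions with a large average margin, which is consistent with the abstract's description of the regularizer as a linear addition to the objective.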