July 28, 2019

3459 words 17 mins read

Paper Group ANR 276

Entity Linking with people entity on Wikipedia. Coplanar Repeats by Energy Minimization. Representation Learning by Learning to Count. Energy-based Models for Video Anomaly Detection. Discriminatively Learned Hierarchical Rank Pooling Networks. Successive Embedding and Classification Loss for Aerial Image Classification. A discriminative view of MR …

Entity Linking with people entity on Wikipedia

Title Entity Linking with people entity on Wikipedia
Authors Weiqian Yan, Kanchan Khurad
Abstract This paper introduces a new model that uses named entity recognition, coreference resolution, and entity linking techniques to approach the task of linking people entities on Wikipedia people pages to their corresponding Wikipedia pages where applicable. Our task differs from general and traditional entity linking because we work in a limited domain, namely people entities, and we include pronouns as entities, whereas in the past pronouns were never considered as entities in entity linking. We have built 2 models, both of which outperform our baseline model significantly. The purpose of our project is to build a model that can be used to generate cleaner data for future entity linking tasks. Our contributions include a clean dataset consisting of 50 Wikipedia people pages, and 2 entity linking models specifically tuned for this domain.
Tasks Coreference Resolution, Entity Linking, Named Entity Recognition
Published 2017-05-02
URL http://arxiv.org/abs/1705.01042v1
PDF http://arxiv.org/pdf/1705.01042v1.pdf
PWC https://paperswithcode.com/paper/entity-linking-with-people-entity-on
Repo
Framework
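
The pipeline above has three stages: detect person mentions (including pronouns), group them into coreference clusters, and link each cluster to a Wikipedia page. A minimal toy sketch of that flow follows; all data and helper names are hypothetical, not the authors' models.

```python
# Toy sketch of the three stages (NER -> coreference -> linking).
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "they", "them"}

# Known Wikipedia people pages: canonical name -> page title (assumed input).
KNOWN_PAGES = {"Marie Curie": "Marie_Curie", "Pierre Curie": "Pierre_Curie"}

# Stage 1 output (assumed): person mentions with character offsets.
mentions = [
    {"text": "Marie Curie", "span": (0, 11)},
    {"text": "She", "span": (40, 43)},       # pronouns count as entities here
    {"text": "Pierre Curie", "span": (60, 72)},
]

# Stage 2 output (assumed): coreference clusters as indices into `mentions`.
clusters = [[0, 1], [2]]

def link_clusters(mentions, clusters, pages):
    """Stage 3: link every mention in a cluster to the page of its longest
    non-pronoun mention, so pronouns inherit the cluster's link."""
    links = {}
    for cluster in clusters:
        named = [i for i in cluster
                 if mentions[i]["text"].lower() not in PRONOUNS]
        if not named:
            continue  # pronoun-only cluster: nothing to link against
        rep = max(named, key=lambda i: len(mentions[i]["text"]))
        page = pages.get(mentions[rep]["text"])
        if page is not None:
            for i in cluster:
                links[mentions[i]["span"]] = page
    return links

print(link_clusters(mentions, clusters, KNOWN_PAGES))
# {(0, 11): 'Marie_Curie', (40, 43): 'Marie_Curie', (60, 72): 'Pierre_Curie'}
```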

Coplanar Repeats by Energy Minimization

Title Coplanar Repeats by Energy Minimization
Authors James Pritts, Denys Rozumnyi, M. Pawan Kumar, Ondrej Chum
Abstract This paper proposes an automated method to detect, group and rectify arbitrarily-arranged coplanar repeated elements via energy minimization. The proposed energy functional combines several features that model how planes with coplanar repeats are projected into images and captures global interactions between different coplanar repeat groups and scene planes. An inference framework based on a recent variant of $\alpha$-expansion is described and fast convergence is demonstrated. We compare the proposed method to two widely-used geometric multi-model fitting methods using a new dataset of annotated images containing multiple scene planes with coplanar repeats in varied arrangements. The evaluation shows a significant improvement in the accuracy of rectifications computed from coplanar repeats detected with the proposed method versus those detected with the baseline methods.
Tasks
Published 2017-11-26
URL http://arxiv.org/abs/1711.09432v1
PDF http://arxiv.org/pdf/1711.09432v1.pdf
PWC https://paperswithcode.com/paper/coplanar-repeats-by-energy-minimization
Repo
Framework
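
For readers unfamiliar with the setup, alpha-expansion minimizes labeling energies of the form E(L) = sum_i unary(i, L_i) + sum_{i<j} pairwise(i, j, L_i, L_j). The sketch below shows only that generic form with placeholder costs; the paper's actual unary and pairwise terms model how coplanar repeat groups project into images.

```python
import itertools

def energy(labeling, unary, pairwise):
    """E(L) = sum_i unary[i][L[i]] + sum_{i<j} pairwise(i, j, L[i], L[j])."""
    e = sum(unary[i][l] for i, l in enumerate(labeling))
    e += sum(pairwise(i, j, labeling[i], labeling[j])
             for i, j in itertools.combinations(range(len(labeling)), 2))
    return e

def brute_force_min(n_vars, n_labels, unary, pairwise):
    """Exhaustive minimizer, viable only for tiny instances; alpha-expansion
    replaces this with a sequence of graph cuts at real problem sizes."""
    return min(itertools.product(range(n_labels), repeat=n_vars),
               key=lambda L: energy(L, unary, pairwise))

# Tiny placeholder instance: 3 detected repeats, 2 candidate plane groups.
unary = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.4]]       # data costs (placeholders)
potts = lambda i, j, a, b: 0.0 if a == b else 0.3  # smoothness prior
print(brute_force_min(3, 2, unary, potts))         # -> (0, 1, 1)
```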

Representation Learning by Learning to Count

Title Representation Learning by Learning to Count
Authors Mehdi Noroozi, Hamed Pirsiavash, Paolo Favaro
Abstract We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such a relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par with or exceed the state of the art in transfer learning benchmarks.
Tasks Representation Learning, Transfer Learning
Published 2017-08-22
URL http://arxiv.org/abs/1708.06734v1
PDF http://arxiv.org/pdf/1708.06734v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-by-learning-to-count
Repo
Framework
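
The counting constraint from the abstract can be stated concretely: the feature of a downsampled image should equal the sum of the features of its four tiles, with a contrastive term pushing a different image away. A small numpy sketch, with a placeholder "counting" feature standing in for the paper's CNN:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = lambda img: np.maximum(img.mean(axis=(0, 1)), 0)  # placeholder feature

def counting_loss(x, y, margin=10.0):
    """x, y: HxWxC images (H, W even). Enforces
    phi(downsample(x)) ~= sum of phi over the four tiles of x,
    and pushes a different image y at least `margin` away (contrastive term)."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    tiles = [x[:h, :w], x[:h, w:], x[h:, :w], x[h:, w:]]
    t_sum = sum(phi(t) for t in tiles)
    d = x[::2, ::2]                      # naive 2x downsampling
    pos = np.sum((phi(d) - t_sum) ** 2)  # counts should agree
    neg = max(0.0, margin - np.sum((phi(y[::2, ::2]) - t_sum) ** 2))
    return pos + neg

x, y = rng.random((64, 64, 3)), rng.random((64, 64, 3))
print(counting_loss(x, y))
```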

Energy-based Models for Video Anomaly Detection

Title Energy-based Models for Video Anomaly Detection
Authors Hung Vu, Dinh Phung, Tu Dinh Nguyen, Anthony Trevors, Svetha Venkatesh
Abstract Automated detection of abnormalities in data has been an active research area in recent years because of its diverse applications in practice, including video surveillance, industrial damage detection and network intrusion detection. However, building an effective anomaly detection system is a non-trivial task, since it requires tackling several challenging issues: the shortage of annotated data, the inability to define anomalous objects explicitly, and the expensive cost of feature engineering. Unlike existing approaches, which only partially solve these problems, we develop a unique framework that copes with all of them simultaneously. Instead of dealing with an ambiguous definition of anomalous objects, we propose to work with regular patterns, for which unlabeled data are abundant and usually easy to collect in practice. This allows our system to be trained in a completely unsupervised procedure and liberates us from the need for costly data annotation. By learning a generative model that captures the normality distribution in data, we can isolate abnormal data points that result in low normality scores (high abnormality scores). Moreover, by leveraging the power of generative networks, i.e. energy-based models, we are also able to learn the feature representation automatically rather than relying on hand-crafted features that have dominated anomaly detection research for many decades. We demonstrate our proposal on the specific application of video anomaly detection, and the experimental results indicate that our method performs better than baselines and is comparable with state-of-the-art methods on many benchmark video anomaly detection datasets.
Tasks Anomaly Detection, Feature Engineering, Intrusion Detection, Network Intrusion Detection
Published 2017-08-17
URL http://arxiv.org/abs/1708.05211v1
PDF http://arxiv.org/pdf/1708.05211v1.pdf
PWC https://paperswithcode.com/paper/energy-based-models-for-video-anomaly
Repo
Framework
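
A common instantiation of energy-based scoring is a restricted Boltzmann machine trained on normal data only, with free energy as the (negative) normality score. A hedged sketch with random stand-in weights; the paper learns its model from unlabelled regular video data, and its architecture may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 100, 32
W = rng.normal(0, 0.01, (n_visible, n_hidden))  # stand-in weights, not trained
b = np.zeros(n_visible)   # visible bias
c = np.zeros(n_hidden)    # hidden bias

def free_energy(v):
    """RBM free energy F(v) = -v.b - sum_j log(1 + exp(c_j + (v.W)_j));
    low free energy corresponds to high normality."""
    return -v @ b - np.logaddexp(0, c + v @ W).sum()

def is_anomalous(v, threshold):
    return free_energy(v) > threshold  # high energy = low normality score

# Calibrate the threshold on held-out normal data (e.g., a high percentile).
normal = rng.random((500, n_visible))
thr = np.percentile([free_energy(v) for v in normal], 99)
print(is_anomalous(rng.random(n_visible) * 5.0, thr))
```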

Discriminatively Learned Hierarchical Rank Pooling Networks

Title Discriminatively Learned Hierarchical Rank Pooling Networks
Authors Basura Fernando, Stephen Gould
Abstract In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present “discriminative rank pooling”, in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem. When the frame-level feature vectors are obtained from a convolutional neural network (CNN), we rank-pool the network activations and jointly estimate all parameters of the model, including CNN filters and fully-connected weights, in an end-to-end manner, which we coin “end-to-end trainable rank pooled CNN”. Importantly, this model can make use of any existing convolutional neural network architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. Then, we extend rank pooling to a high-capacity video representation, called “hierarchical rank pooling”. Hierarchical rank pooling consists of a network of rank pooling functions, which encode temporal semantics over arbitrarily long video clips based on rich frame-level features. By stacking non-linear feature functions and temporal sub-sequence encoders one on top of the other, we build a high-capacity encoding network of the dynamic behaviour of the video. The resulting video representation is a fixed-length feature vector describing the entire video clip that can be used as input to standard machine learning classifiers. We demonstrate our approach on the task of action and activity recognition. Obtained results are comparable to state-of-the-art methods on three important activity recognition benchmarks, with classification performance of 76.7% mAP on Hollywood2, 69.4% on HMDB51, and 93.6% on UCF101.
Tasks Activity Recognition, bilevel optimization
Published 2017-05-30
URL http://arxiv.org/abs/1705.10420v1
PDF http://arxiv.org/pdf/1705.10420v1.pdf
PWC https://paperswithcode.com/paper/discriminatively-learned-hierarchical-rank
Repo
Framework
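
Rank pooling itself is compact to state: fit a linear function that orders the (smoothed) frame features by time, and use its parameters as the clip descriptor; hierarchical rank pooling applies this recursively over sub-sequences. The least-squares fit below is a common cheap approximation of the ranking objective, not the paper's exact discriminative formulation:

```python
import numpy as np

def rank_pool(frames):
    """frames: T x D matrix of frame features; returns a D-dim descriptor u
    such that frames @ u approximates the frame indices 1..T."""
    T = frames.shape[0]
    # Time-varying mean smoothing, commonly applied before rank pooling.
    smoothed = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    u, *_ = np.linalg.lstsq(smoothed, t, rcond=None)
    return u

def hierarchical_rank_pool(frames, window=8, stride=4):
    """One extra layer: rank-pool overlapping sub-sequences, then rank-pool
    the resulting sequence of descriptors (stacking layers goes deeper)."""
    subs = [rank_pool(frames[s:s + window])
            for s in range(0, len(frames) - window + 1, stride)]
    return rank_pool(np.stack(subs))

video = np.random.default_rng(2).random((64, 128))  # 64 frames, 128-D features
print(hierarchical_rank_pool(video).shape)          # (128,)
```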

Successive Embedding and Classification Loss for Aerial Image Classification

Title Successive Embedding and Classification Loss for Aerial Image Classification
Authors Jiayun Wang, Patrick Virtue, Stella X. Yu
Abstract Deep neural networks can be an effective means to automatically classify aerial images, but they are prone to overfitting the training data. It is critical for trained neural networks to be robust to variations that exist between training and test environments. To address the overfitting problem in aerial image classification, we consider the neural network as successive transformations of an input image into embedded feature representations and ultimately into a semantic class label, and train neural networks to optimize image representations in the embedded space in addition to optimizing the final classification score. We demonstrate that networks trained with this dual embedding and classification loss outperform networks with classification loss only. We also find that moving the embedding loss from the commonly used feature space to the classifier space, which is the space just before the softmax nonlinearity, leads to the best classification performance for aerial images. Visualizations of the network’s embedded representations reveal that the embedding loss encourages greater separation between target class clusters for both training and testing partitions of two aerial image classification benchmark datasets, MSTAR and AID. Our code is publicly available on GitHub.
Tasks Image Classification
Published 2017-12-05
URL https://arxiv.org/abs/1712.01511v3
PDF https://arxiv.org/pdf/1712.01511v3.pdf
PWC https://paperswithcode.com/paper/joint-embedding-and-classification-for-sar
Repo
Framework
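
One way to realize the dual loss is cross-entropy plus a pairwise embedding term computed on the pre-softmax logits (the "classifier space" the abstract favors). The contrastive form and weights below are illustrative assumptions, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def dual_loss(logits, labels, weight=0.1, margin=5.0):
    """Cross-entropy plus a contrastive embedding term on the logits.
    Assumes the batch contains same-class pairs."""
    ce = F.cross_entropy(logits, labels)
    d = torch.cdist(logits, logits)                 # pairwise L2 distances
    same = labels[:, None] == labels[None, :]
    off_diag = ~torch.eye(len(labels), dtype=torch.bool)
    pull = (d[same & off_diag] ** 2).mean()         # same class: pull together
    push = (F.relu(margin - d[~same]) ** 2).mean()  # different class: push apart
    return ce + weight * (pull + push)

# Two samples per class so both terms are populated.
logits = torch.randn(16, 10, requires_grad=True)
labels = torch.arange(8).repeat_interleave(2)
loss = dual_loss(logits, labels)
loss.backward()
print(loss.item())
```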

A discriminative view of MRF pre-processing algorithms

Title A discriminative view of MRF pre-processing algorithms
Authors Chen Wang, Charles Herrmann, Ramin Zabih
Abstract While Markov Random Fields (MRFs) are widely used in computer vision, they present a quite challenging inference problem. MRF inference can be accelerated by pre-processing techniques like Dead End Elimination (DEE) or QPBO-based approaches which compute the optimal labeling of a subset of variables. These techniques are guaranteed to never wrongly label a variable but they often leave a large number of variables unlabeled. We address this shortcoming by interpreting pre-processing as a classification problem, which allows us to trade off false positives (i.e., giving a variable an incorrect label) versus false negatives (i.e., failing to label a variable). We describe an efficient discriminative rule that finds optimal solutions for a subset of variables. Our technique provides both per-instance and worst-case guarantees concerning the quality of the solution. Empirical studies were conducted over several benchmark datasets. We obtain a speedup factor of 2 to 12 over expansion moves without preprocessing, and on difficult non-submodular energy functions produce slightly lower energy.
Tasks
Published 2017-08-08
URL http://arxiv.org/abs/1708.02668v1
PDF http://arxiv.org/pdf/1708.02668v1.pdf
PWC https://paperswithcode.com/paper/a-discriminative-view-of-mrf-pre-processing
Repo
Framework
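
For context, the classical Dead End Elimination rule the paper builds on discards a label a at variable i whenever some label b beats it under the worst-case assignment of the neighbours. A sketch of one DEE sweep on a tiny pairwise MRF; the paper's contribution replaces such conservative rules with a learned discriminative one:

```python
import numpy as np

def dee_prune(unary, pairwise, neighbors):
    """One sweep of the Goldstein DEE condition. unary[i][a]: unary cost;
    pairwise[(i, j)][a][b]: pairwise cost; returns surviving labels per i."""
    alive = {i: set(range(len(u))) for i, u in enumerate(unary)}
    for i in alive:
        for a in list(alive[i]):
            for b in list(alive[i]):
                if a == b:
                    continue
                # Worst-case (for b) advantage of switching a -> b at i.
                gain = unary[i][b] - unary[i][a]
                gain += sum(max(pairwise[(i, j)][b][l] - pairwise[(i, j)][a][l]
                                for l in alive[j]) for j in neighbors[i])
                if gain < 0:                # b strictly better in every
                    alive[i].discard(a)     # completion, so a is never optimal
                    break
    return alive

# Tiny 2-variable, 2-label chain with Potts-like pairwise costs.
unary = [[0.0, 5.0], [1.0, 0.0]]
pw = np.array([[0.0, 1.0], [1.0, 0.0]])
pairwise = {(0, 1): pw, (1, 0): pw.T}
neighbors = {0: [1], 1: [0]}
print(dee_prune(unary, pairwise, neighbors))  # label 1 at variable 0 is pruned
```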

Dynamic Analysis of Executables to Detect and Characterize Malware

Title Dynamic Analysis of Executables to Detect and Characterize Malware
Authors Michael R. Smith, Joe B. Ingram, Christopher C. Lamb, Timothy J. Draelos, Justin E. Doak, James B. Aimone, Conrad D. James
Abstract Ensuring the integrity of systems that process sensitive information and control many aspects of everyday life is essential. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, alleviating attempts at obfuscation since the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware, including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples, by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training. Namely, the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to gain a better understanding of what differentiates the malware samples from the goodware, which can further be used as a forensics tool to understand what the malware (or goodware) was doing, providing directions for investigation and remediation.
Tasks
Published 2017-11-10
URL http://arxiv.org/abs/1711.03947v2
PDF http://arxiv.org/pdf/1711.03947v2.pdf
PWC https://paperswithcode.com/paper/dynamic-analysis-of-executables-to-detect-and
Repo
Framework
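
The overall recipe, stripped to essentials: featurize each executable's system-call trace (for example, as call n-gram counts) and train a classifier such as a random forest. The traces below are made up, and a faithful reproduction must also split train/test by collection date to expose concept drift, as the paper does:

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

def ngrams(trace, n=2):
    """Counts of length-n windows of consecutive system calls."""
    return Counter(zip(*(trace[i:] for i in range(n))))

# Made-up traces; label 0 = goodware, 1 = malware.
traces = [
    (["open", "read", "close"], 0),
    (["open", "write", "connect", "send"], 1),
    (["read", "read", "close"], 0),
    (["connect", "send", "send", "close"], 1),
]
X_dicts = [{" ".join(k): v for k, v in ngrams(t).items()} for t, _ in traces]
y = [label for _, label in traces]

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(vec.transform(X_dicts[:1])))  # -> [0]
```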

An Ontology to support automated negotiation

Title An Ontology to support automated negotiation
Authors Susel Fernandez, Takayuki Ito
Abstract In this work we propose an ontology to support automated negotiation in multiagent systems. The ontology can be connected with domain-specific ontologies to facilitate negotiation in different domains, such as Intelligent Transportation Systems (ITS), e-commerce, etc. The specific negotiation rules for each type of negotiation strategy can also be defined as part of the ontology, reducing the amount of knowledge hardcoded in the agents and ensuring interoperability. The expressiveness of the ontology was demonstrated in a multiagent architecture for an automatic traffic-light-setting application in ITS.
Tasks
Published 2017-10-28
URL http://arxiv.org/abs/1710.10433v1
PDF http://arxiv.org/pdf/1710.10433v1.pdf
PWC https://paperswithcode.com/paper/an-ontology-to-support-automated-negotiation
Repo
Framework

Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation

Title Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation
Authors Phillip Lord, Robert Stevens
Abstract There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first of these describes a standardized organisation for an ontology, with singly inherited self-standing entities and a number of small taxonomies of refining entities. The former are described and defined in terms of the latter and used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that, as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.
Tasks
Published 2017-11-20
URL http://arxiv.org/abs/1711.07273v1
PDF http://arxiv.org/pdf/1711.07273v1.pdf
PWC https://paperswithcode.com/paper/facets-tiers-and-gems-ontology-patterns-for
Repo
Framework

Predicting Scene Parsing and Motion Dynamics in the Future

Title Predicting Scene Parsing and Motion Dynamics in the Future
Authors Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan
Abstract The ability to predict the future is important for intelligent systems, e.g. autonomous vehicles and robots, to plan early and make decisions accordingly. Future scene parsing and optical flow estimation are two key tasks that help agents better understand their environments, as the former provides dense semantic information, i.e. what objects will be present and where they will appear, while the latter provides dense motion information, i.e. how the objects will move. In this paper, we propose a novel model to simultaneously predict scene parsing and optical flow in unobserved future video frames. To the best of our knowledge, this is the first attempt at jointly predicting scene parsing and motion dynamics. In particular, scene parsing enables structured motion prediction by decomposing optical flow into different groups, while optical flow estimation brings reliable pixel-wise correspondence to scene parsing. By exploiting this mutually beneficial relationship, our model shows significantly better parsing and motion prediction results when compared to well-established baselines and individual prediction models on the large-scale Cityscapes dataset. In addition, we also demonstrate that our model can be used to predict the steering angle of the vehicles, which further verifies the ability of our model to learn latent representations of scene dynamics.
Tasks Autonomous Vehicles, motion prediction, Optical Flow Estimation, Scene Parsing
Published 2017-11-09
URL http://arxiv.org/abs/1711.03270v1
PDF http://arxiv.org/pdf/1711.03270v1.pdf
PWC https://paperswithcode.com/paper/predicting-scene-parsing-and-motion-dynamics
Repo
Framework
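
The coupling during training can be expressed as a two-task objective: cross-entropy on the predicted future parsing plus an endpoint-error term on the predicted future flow. The sketch below shows only that loss structure; the architecture and weighting are the paper's own:

```python
import torch
import torch.nn.functional as F

def joint_loss(parsing_logits, parsing_gt, flow_pred, flow_gt, lam=0.5):
    """parsing_logits: (B, C, H, W); parsing_gt: (B, H, W) class ids;
    flow_pred / flow_gt: (B, 2, H, W) forward flow fields."""
    seg = F.cross_entropy(parsing_logits, parsing_gt)
    epe = torch.norm(flow_pred - flow_gt, dim=1).mean()  # per-pixel endpoint error
    return seg + lam * epe

B, C, H, W = 2, 19, 64, 128  # e.g. Cityscapes' 19 classes
loss = joint_loss(torch.randn(B, C, H, W),
                  torch.randint(0, C, (B, H, W)),
                  torch.randn(B, 2, H, W),
                  torch.randn(B, 2, H, W))
print(loss.item())
```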

Multi-View Stereo with Single-View Semantic Mesh Refinement

Title Multi-View Stereo with Single-View Semantic Mesh Refinement
Authors Andrea Romanoni, Marco Ciccone, Francesco Visin, Matteo Matteucci
Abstract While 3D reconstruction is a well-established and widely explored research topic, semantic 3D reconstruction has only recently witnessed an increasing share of attention from the Computer Vision community. Semantic annotations in fact allow enforcing strong class-dependent priors, such as planarity for ground and walls, which can be exploited to refine the reconstruction, often resulting in non-trivial performance improvements. State-of-the-art methods propose volumetric approaches to fuse RGB image data with semantic labels; even if successful, they do not scale well and fail to output high-resolution meshes. In this paper we propose a novel method to refine both the geometry and the semantic labeling of a given mesh. We refine the mesh geometry by applying a variational method that optimizes a composite energy made of a state-of-the-art pairwise photometric term and a single-view term that models the semantic consistency between the labels of the 3D mesh and those of the segmented images. We also update the semantic labeling through a novel Markov Random Field (MRF) formulation that, together with the classical data and smoothness terms, takes into account class-specific priors estimated directly from the annotated mesh. This is in contrast to state-of-the-art methods, which are typically based on handcrafted or learned priors. We are the first, jointly with the very recent and seminal work of [M. Blaha et al., arXiv:1706.08336, 2017], to propose the use of semantics inside a mesh refinement framework. Differently from [M. Blaha et al., arXiv:1706.08336, 2017], which adopts a more classical pairwise comparison to estimate the flow of the mesh, we apply a single-view comparison between the semantically annotated image and the current 3D mesh labels; this improves the robustness in case of noisy segmentations.
Tasks 3D Reconstruction
Published 2017-08-16
URL http://arxiv.org/abs/1708.04907v2
PDF http://arxiv.org/pdf/1708.04907v2.pdf
PWC https://paperswithcode.com/paper/multi-view-stereo-with-single-view-semantic
Repo
Framework

Object Detection Using Deep CNNs Trained on Synthetic Images

Title Object Detection Using Deep CNNs Trained on Synthetic Images
Authors Param S. Rajpura, Hristo Bojinov, Ravi S. Hegde
Abstract The need for large annotated image datasets for training Convolutional Neural Networks (CNNs) has been a significant impediment to their adoption in computer vision applications. We show that with transfer learning an effective object detector can be trained almost entirely on synthetically rendered datasets. We apply this strategy to detecting packaged food products clustered in refrigerator scenes. Our CNN trained only with 4000 synthetic images achieves a mean average precision (mAP) of 24 on a test set with 55 distinct products as objects of interest and 17 distractor objects. A further increase of 12% in the mAP is obtained by adding only 400 real images to these 4000 synthetic images in the training set. A high degree of photorealism in the synthetic images was not essential to achieving this performance. We analyze factors like training dataset size and 3D model dictionary size for their influence on detection performance. Additionally, training strategies like fine-tuning with selected layers and early stopping, which affect transfer learning from synthetic scenes to real scenes, are explored. Training CNNs with synthetic datasets is a novel application of high-performance computing and a promising approach for object detection applications in domains where there is a dearth of large annotated image data.
Tasks Object Detection, Transfer Learning
Published 2017-06-21
URL http://arxiv.org/abs/1706.06782v2
PDF http://arxiv.org/pdf/1706.06782v2.pdf
PWC https://paperswithcode.com/paper/object-detection-using-deep-cnns-trained-on
Repo
Framework
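
The "fine-tuning with selected layers" strategy mentioned in the abstract typically looks like the following in PyTorch: freeze the early, generic layers of a pretrained backbone and retrain the rest on the synthetic data. The layer split and the head below are assumptions for illustration, not the paper's exact configuration (the paper trains a detector; a classification head stands in here):

```python
import torch
from torchvision import models

# ImageNet-pretrained backbone as the transfer-learning starting point.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the first convolutional blocks; generic edges/textures transfer well.
for param in model.features[:17].parameters():
    param.requires_grad = False

# Replace the final layer for the 55 product classes + background (assumed).
model.classifier[6] = torch.nn.Linear(4096, 56)

# Optimize only the unfrozen parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```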

Sample-Efficient Learning of Mixtures

Title Sample-Efficient Learning of Mixtures
Authors Hassan Ashtiani, Shai Ben-David, Abbas Mehrabian
Abstract We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal F$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal F$. Assuming the existence of a method for learning $\mathcal F$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal F^k$ with sample complexity $O({k\log k \cdot m_{\mathcal F}(\epsilon) }/{\epsilon^{2}})$. Our mixture learning algorithm has the property that, if the $\mathcal F$-learner is proper/agnostic, then the $\mathcal F^k$-learner would be proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}({kd}/{\epsilon ^ 4})$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}({kd^2}/{\epsilon ^ 4})$, which improves the previous known bounds of $\widetilde{O}({k^3d^2}/{\epsilon ^ 4})$ and $\widetilde{O}(k^4d^4/\epsilon ^ 2)$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2}\epsilon^{-(d+9)/2}k)$ samples.
Tasks Density Estimation
Published 2017-06-06
URL http://arxiv.org/abs/1706.01596v3
PDF http://arxiv.org/pdf/1706.01596v3.pdf
PWC https://paperswithcode.com/paper/sample-efficient-learning-of-mixtures
Repo
Framework
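
As a sanity check on the general result, plugging the standard single-distribution rate into the mixture bound recovers the stated axis-aligned Gaussian rate, assuming the usual $\widetilde{O}(d/\epsilon^2)$ sample complexity for learning one axis-aligned Gaussian:

```latex
% One axis-aligned Gaussian in R^d: m_F(eps) = O~(d / eps^2).
% Substituting into the general k-mixture bound:
\[
  m_{\mathcal{F}^k}(\epsilon)
  = O\!\left(\frac{k\log k \cdot m_{\mathcal F}(\epsilon)}{\epsilon^{2}}\right)
  = \widetilde{O}\!\left(\frac{k\log k \cdot d/\epsilon^{2}}{\epsilon^{2}}\right)
  = \widetilde{O}\!\left(\frac{kd}{\epsilon^{4}}\right),
\]
% which matches the abstract's bound (tight in k and d up to log factors)
% for mixtures of k axis-aligned Gaussians.
```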

Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments

Title Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments
Authors Shichao Yang, Yu Song, Michael Kaess, Sebastian Scherer
Abstract Existing simultaneous localization and mapping (SLAM) algorithms are not robust in challenging low-texture environments because there are only a few salient features. The resulting sparse or semi-dense map also conveys little information for motion planning. Though some works utilize plane or scene layout for dense map regularization, they require decent state estimation from other sources. In this paper, we propose a real-time monocular plane SLAM to demonstrate that scene understanding can improve both state estimation and dense mapping, especially in low-texture environments. The plane measurements come from a pop-up 3D plane model applied to each single image. We also combine planes with point-based SLAM to improve robustness. On a public TUM dataset, our algorithm generates a dense semantic 3D model with a pixel depth error of 6.2 cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our method creates a much better 3D model with a state estimation error of 0.67%.
Tasks Motion Planning, Scene Understanding, Simultaneous Localization and Mapping
Published 2017-03-21
URL http://arxiv.org/abs/1703.07334v1
PDF http://arxiv.org/pdf/1703.07334v1.pdf
PWC https://paperswithcode.com/paper/pop-up-slam-semantic-monocular-plane-slam-for
Repo
Framework