January 27, 2020

3465 words 17 mins read

Paper Group ANR 1198


Google street view and deep learning: a new ground truthing approach for crop mapping

Title Google street view and deep learning: a new ground truthing approach for crop mapping
Authors Yulin Yan, Youngryel Ryu
Abstract Ground referencing is essential for supervised crop mapping. However, conventional ground truthing involves extensive field surveys and post-processing, which is costly in terms of time and labor. In this study, we applied a convolutional neural network (CNN) model to explore the efficacy of automatic ground truthing via Google street view (GSV) images in two distinct farming regions: central Illinois and southern California. We further demonstrated the feasibility and reliability of the new ground-referencing technique by performing pixel-based crop mapping with vegetation indices as the model input. The results were evaluated using the United States Department of Agriculture (USDA) crop data layer (CDL) products. From 8,514 GSV images, the CNN model screened out 2,645 target crop images. These images were well classified into crop types, including alfalfa, almond, corn, cotton, grape, soybean, and pistachio. The overall GSV image classification accuracy reached 93% in California and 97% in Illinois. We then shifted the image geographic coordinates using fixed empirical coefficients to produce 8,173 crop reference points, including 1,764 in Illinois and 6,409 in California. Evaluation of these new reference points against CDL products showed satisfactory coherence, with 94 to 97% agreement. CNN-based mapping also captured the general pattern of crop type distributions. The overall differences between CDL products and our mapping results were 4% in California and 5% in Illinois. Thus, using these deep learning and GSV image techniques, we have provided an efficient and cost-effective alternative method for ground referencing and crop mapping.
Tasks Image Classification
Published 2019-12-03
URL https://arxiv.org/abs/1912.05024v1
PDF https://arxiv.org/pdf/1912.05024v1.pdf
PWC https://paperswithcode.com/paper/google-street-view-and-deep-learning-a-new
Repo
Framework
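
The screening step described in the abstract is essentially supervised image classification of street-level photos. A minimal sketch, assuming GSV crops organized into per-class folders and a standard torchvision ResNet as a stand-in for the authors' actual CNN (the path and hyperparameters are illustrative assumptions):

```python
# Hypothetical sketch: fine-tune a pretrained CNN to classify GSV crop images.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("gsv_images/train", transform=transform)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # e.g. corn, soybean, alfalfa, ...

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:               # one illustrative epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```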

MFA is a Waste of Time! Understanding Negative Connotation Towards MFA Applications via User Generated Content

Title MFA is a Waste of Time! Understanding Negative Connotation Towards MFA Applications via User Generated Content
Authors Sanchari Das, Bingxing Wang, L. Jean Camp
Abstract Traditional single-factor authentication possesses several critical security vulnerabilities due to its single point of failure. Multi-factor authentication (MFA) intends to enhance security by providing additional verification steps. However, in practical deployment, users often experience dissatisfaction while using MFA, which leads to non-adoption. In order to understand the current design and usability issues with MFA, we analyze aggregated user-generated comments (N = 12,500) about application-based MFA tools from major distributors such as Amazon, Google Play, the Apple App Store, and others. While some users acknowledge the security benefits of MFA, the majority of them still face problems with initial configuration, understanding the system design, limited device compatibility, and risk trade-offs, leading to non-adoption of MFA. Based on these results, we provide actionable recommendations in technological design, initial training, and risk communication to improve the adoption and user experience of MFA.
Tasks
Published 2019-08-16
URL https://arxiv.org/abs/1908.05902v1
PDF https://arxiv.org/pdf/1908.05902v1.pdf
PWC https://paperswithcode.com/paper/mfa-is-a-waste-of-time-understanding-negative
Repo
Framework

Matrix Completion With Selective Sampling

Title Matrix Completion With Selective Sampling
Authors Christian Parkinson, Kevin Huynh, Deanna Needell
Abstract Matrix completion is a classical problem in data science wherein one attempts to reconstruct a low-rank matrix while only observing some subset of the entries. Previous authors have phrased this problem as a nuclear norm minimization problem. Almost all previous work assumes no explicit structure of the matrix and uses uniform sampling to decide the observed entries. We suggest methods for selective sampling in the case where we have some knowledge about the structure of the matrix and are allowed to design the observation set.
Tasks Matrix Completion
Published 2019-04-17
URL http://arxiv.org/abs/1904.08540v1
PDF http://arxiv.org/pdf/1904.08540v1.pdf
PWC https://paperswithcode.com/paper/matrix-completion-with-selective-sampling
Repo
Framework
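
The abstract frames completion as nuclear norm minimization with a designed (non-uniform) observation set. A toy sketch of that setup, using the standard soft-impute / singular-value-thresholding heuristic rather than the authors' specific method, with a hand-designed column-dependent sampling mask as the "selective" observation set:

```python
# Illustrative sketch: low-rank completion via iterative singular value thresholding
# over a user-designed observation mask. Constants (tau, iteration count, sampling
# rates) are assumptions, not the paper's choices.
import numpy as np

def svt_complete(M_obs, mask, tau=5.0, n_iters=200):
    """M_obs: matrix with observed entries (zeros elsewhere); mask: boolean observation set."""
    X = np.zeros_like(M_obs)
    for _ in range(n_iters):
        X[mask] = M_obs[mask]                       # enforce agreement with observed entries
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values toward low rank
    X[mask] = M_obs[mask]
    return X

# Toy example: rank-5 ground truth, denser sampling of the early columns.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 50))
probs = np.linspace(0.8, 0.2, M.shape[1])           # column-dependent sampling rates
mask = rng.random(M.shape) < probs
M_hat = svt_complete(M * mask, mask)
print("relative error:", np.linalg.norm(M_hat - M) / np.linalg.norm(M))
```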

Structuring Autoencoders

Title Structuring Autoencoders
Authors Marco Rudolph, Bastian Wandt, Bodo Rosenhahn
Abstract In this paper we propose Structuring AutoEncoders (SAE). SAEs are neural networks that learn a low-dimensional representation of data and additionally enrich it with a desired structure in this low-dimensional space. While traditional autoencoders have proven to structure data naturally, they fail to discover semantic structure that is hard to recognize in the raw data. The SAE solves this problem by enhancing a traditional autoencoder with weak supervision to form a structured latent space. In the experiments we demonstrate that the structured latent space allows for a much more efficient data representation for further tasks such as classification with sparsely labeled data, an efficient choice of data to label, and morphing between classes. To demonstrate the general applicability of our method, we show experiments on the benchmark image datasets MNIST, Fashion-MNIST, DeepFashion2 and on a dataset of 3D human shapes.
Tasks
Published 2019-08-07
URL https://arxiv.org/abs/1908.02626v1
PDF https://arxiv.org/pdf/1908.02626v1.pdf
PWC https://paperswithcode.com/paper/structuring-autoencoders
Repo
Framework
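
A minimal sketch of the general idea, assuming the weak supervision takes the form of pulling the latent codes of the (possibly few) labeled samples toward per-class anchor points; this is an illustrative stand-in, not the paper's exact structuring formulation:

```python
# Autoencoder with a weakly supervised structure term on the latent space.
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=2, n_classes=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))
        self.anchors = nn.Parameter(torch.randn(n_classes, latent_dim))  # one target location per class

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def sae_loss(model, x, labels, labeled_mask, lam=0.1):
    z, x_hat = model(x)
    recon = ((x_hat - x) ** 2).mean()
    # Structure term only for the sparsely labeled samples (weak supervision).
    if labeled_mask.any():
        structure = ((z[labeled_mask] - model.anchors[labels[labeled_mask]]) ** 2).mean()
    else:
        structure = torch.zeros((), device=x.device)
    return recon + lam * structure
```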

Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment

Title Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment
Authors Yunxiao Shi, Jing Zhu, Yi Fang, Kuochin Lien, Junli Gu
Abstract Learning to predict scene depth and camera motion from RGB inputs only is a challenging task. Most existing learning-based methods deal with this task in a supervised manner, which requires ground-truth data that is expensive to acquire. More recent approaches explore the possibility of estimating scene depth and camera pose in a self-supervised learning framework. Although encouraging results have been shown, current methods either learn from monocular videos for depth and pose, typically without enforcing multi-view geometry constraints between scene structure and camera motion, or require stereo sequences as input, where the ground-truth between-frame motion parameters need to be known. In this paper we propose to jointly optimize the scene depth and camera motion by incorporating a differentiable Bundle Adjustment (BA) layer that minimizes the feature-metric error, and then form the photometric consistency loss with view synthesis as the final supervisory signal. The proposed approach only needs unlabeled monocular videos as input, and extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art results among self-supervised approaches using monocular videos as input, and even outperforms the line of methods that learn from calibrated stereo sequences (i.e. with pose supervision).
Tasks Depth And Camera Motion
Published 2019-09-28
URL https://arxiv.org/abs/1909.13163v1
PDF https://arxiv.org/pdf/1909.13163v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-of-depth-and-ego
Repo
Framework
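
The view-synthesis supervisory signal mentioned in the abstract is a standard construction: warp a source frame into the target view using predicted depth and relative pose, then penalize the photometric difference. A simplified single-scale L1 sketch (camera intrinsics K and the predicted quantities are assumed inputs; the paper's feature-metric BA layer is not reproduced here):

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """target, source: (B,3,H,W); depth: (B,1,H,W); pose: (B,4,4) target->source; K: (B,3,3)."""
    B, _, H, W = target.shape
    device = target.device
    # Pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1).expand(B, -1, -1)
    # Back-project to 3D in the target camera, then transform into the source camera.
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src_cam = (pose @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:3].clamp(min=1e-6)
    # Normalize to [-1, 1] and reconstruct the target by sampling the source.
    grid_x = 2 * src_pix[:, 0] / (W - 1) - 1
    grid_y = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([grid_x, grid_y], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True, padding_mode="border")
    return (warped - target).abs().mean()
```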

Improving Self-Supervised Single View Depth Estimation by Masking Occlusion

Title Improving Self-Supervised Single View Depth Estimation by Masking Occlusion
Authors Maarten Schellevis
Abstract Single view depth estimation models can be trained from video footage using a self-supervised end-to-end approach with view synthesis as the supervisory signal. This is achieved with a framework that predicts depth and camera motion, with a loss based on reconstructing a target video frame from temporally adjacent frames. In this context, occlusion relates to parts of a scene that can be observed in the target frame but not in a frame used for image reconstruction. Since the image reconstruction is based on sampling from the adjacent frame, and occluded areas by definition cannot be sampled, reconstructed occluded areas corrupt the supervisory signal. In previous work (arXiv:1806.01260) occlusion is handled based on reconstruction error; at each pixel location, only the reconstruction with the lowest error is included in the loss. The current study aims to determine whether performance improvements of depth estimation models can be gained by ignoring, during training, only those regions that are affected by occlusion. In this work we introduce the occlusion mask, a mask that can be used during training to specifically ignore regions that cannot be reconstructed due to occlusions. The occlusion mask is based entirely on predicted depth information. We introduce two novel loss formulations which incorporate the occlusion mask. The method and implementation of arXiv:1806.01260 serve as the foundation for our modifications as well as the baseline in our experiments. We demonstrate that (i) incorporating the occlusion mask in the loss function improves the performance of single image depth prediction models on the KITTI benchmark, and (ii) loss functions that select from reconstructions based on error are able to ignore some of the reprojection error caused by object motion.
Tasks Depth And Camera Motion, Depth Estimation, Image Reconstruction
Published 2019-08-29
URL https://arxiv.org/abs/1908.11112v1
PDF https://arxiv.org/pdf/1908.11112v1.pdf
PWC https://paperswithcode.com/paper/improving-self-supervised-single-view-depth
Repo
Framework
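
One plausible way to derive such a mask from predicted depth alone (an illustrative construction, not necessarily the thesis' exact formulation) is to project each target pixel into the source view and declare it occluded when the source view's own predicted depth says something closer blocks it; the masked pixels are then dropped from the photometric loss:

```python
import torch

def occlusion_mask(proj_depth, src_depth_sampled, eps=0.05):
    """proj_depth: depth of target pixels after projection into the source view.
    src_depth_sampled: source-view predicted depth sampled at those same locations.
    Returns 1 for visible pixels, 0 for pixels occluded in the source view."""
    return (proj_depth <= src_depth_sampled * (1.0 + eps)).float()

def masked_photometric_loss(target, warped, mask):
    per_pixel = (warped - target).abs().mean(dim=1, keepdim=True)   # (B,1,H,W)
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)     # average over visible pixels only
```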

Sparse Representations for Object and Ego-motion Estimation in Dynamic Scenes

Title Sparse Representations for Object and Ego-motion Estimation in Dynamic Scenes
Authors Hirak J Kashyap, Charless Fowlkes, Jeffrey L Krichmar
Abstract Dynamic scenes that contain both object motion and egomotion are a challenge for monocular visual odometry (VO). Another issue with monocular VO is scale ambiguity, i.e. these methods cannot estimate scene depth and camera motion at real scale. Here, we propose a learning-based approach to predict camera motion parameters directly from optic flow, by marginalizing depth-map variations and outliers. This is achieved by learning a sparse overcomplete basis set of egomotion in an autoencoder network, which is able to eliminate irrelevant components of optic flow for the task of camera parameter or motion-field estimation. The model is trained using a sparsity regularizer and a supervised egomotion loss, and achieves state-of-the-art performance on trajectory prediction and camera rotation prediction tasks on the KITTI and Virtual KITTI datasets, respectively. The sparse latent-space egomotion representation learned by the model is robust and requires only 5% of the hidden layer neurons to maintain the best trajectory prediction accuracy on the KITTI dataset. Additionally, in the presence of depth information, the proposed method demonstrates faithful object velocity prediction for a wide range of object sizes and speeds by global compensation of the predicted egomotion and a divisive normalization procedure.
Tasks Depth And Camera Motion, Monocular Visual Odometry, Motion Estimation, Trajectory Prediction, Visual Odometry
Published 2019-03-09
URL http://arxiv.org/abs/1903.03731v1
PDF http://arxiv.org/pdf/1903.03731v1.pdf
PWC https://paperswithcode.com/paper/sparse-representations-for-object-and-ego
Repo
Framework
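
A rough sketch of the training objective described in the abstract: an autoencoder over optic flow with an overcomplete latent code, an L1 sparsity regularizer on that code, and a supervised loss on the 6-DoF egomotion predicted from it. Layer sizes and loss weights are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EgoSparseAE(nn.Module):
    def __init__(self, flow_dim=2 * 64 * 64, latent_dim=4096):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(flow_dim, latent_dim), nn.ReLU())  # overcomplete basis
        self.dec = nn.Linear(latent_dim, flow_dim)
        self.ego_head = nn.Linear(latent_dim, 6)   # 3 translation + 3 rotation parameters

    def forward(self, flow):
        z = self.enc(flow)
        return z, self.dec(z), self.ego_head(z)

def loss_fn(model, flow, ego_gt, sparsity_w=1e-3):
    z, flow_hat, ego_pred = model(flow)
    recon = ((flow_hat - flow) ** 2).mean()
    sparsity = z.abs().mean()                      # encourages few active basis elements
    ego = ((ego_pred - ego_gt) ** 2).mean()        # supervised egomotion loss
    return recon + sparsity_w * sparsity + ego
```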

HEAX: An Architecture for Computing on Encrypted Data

Title HEAX: An Architecture for Computing on Encrypted Data
Authors M. Sadegh Riazi, Kim Laine, Blake Pelton, Wei Dai
Abstract With the rapid increase in cloud computing, concerns surrounding data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners also cannot outsource the computation due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention in cryptography that, unlike traditional cryptosystems, enables computation on encrypted data without ever decrypting it. However, the most critical obstacle to deploying FHE at large scale is the enormous computation overhead. In this paper, we present HEAX, a novel hardware architecture for FHE that achieves unprecedented performance improvement. HEAX leverages multiple levels of parallelism, ranging from the ciphertext level to fine-grained modular arithmetic. Our first contribution is a new, highly parallelizable architecture for the number-theoretic transform (NTT), which can be of independent interest as the NTT is frequently used in many lattice-based cryptography systems. Building on top of the NTT engine, we design a novel architecture for computation on homomorphically encrypted data. We also introduce several techniques to enable an end-to-end, fully pipelined design and to reduce on-chip memory consumption. Our implementation on reconfigurable hardware demonstrates a 164-268x performance improvement for a wide range of FHE parameters.
Tasks
Published 2019-09-20
URL https://arxiv.org/abs/1909.09731v2
PDF https://arxiv.org/pdf/1909.09731v2.pdf
PWC https://paperswithcode.com/paper/190909731
Repo
Framework
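
For readers unfamiliar with the kernel HEAX accelerates, here is a compact reference NTT: the textbook radix-2 algorithm over a prime field, not HEAX's pipelined hardware design. The prime and primitive root below are common illustrative choices:

```python
MOD = 998244353          # 119 * 2^23 + 1, an NTT-friendly prime
ROOT = 3                 # primitive root modulo MOD

def ntt(a, invert=False):
    a = list(a)
    n = len(a)            # n must be a power of two dividing 2^23
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % MOD
                a[k], a[k + length // 2] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a

# Sanity check: forward followed by inverse NTT is the identity.
vec = [5, 1, 4, 1, 5, 9, 2, 6]
assert ntt(ntt(vec), invert=True) == vec
```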

Scene Graph Generation with External Knowledge and Image Reconstruction

Title Scene Graph Generation with External Knowledge and Image Reconstruction
Authors Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling
Abstract Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attribute and relationship prediction, etc. However, existing datasets are biased in terms of object and relationship labels, or often come with noisy and missing annotations, which makes the development of a reliable scene graph prediction model very challenging. In this paper, we propose a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome these dataset issues. In particular, we extract commonsense knowledge from an external knowledge base to refine object and phrase features, improving generalizability in scene graph generation. To address the bias of noisy object annotations, we introduce an auxiliary image reconstruction path to regularize the scene graph generation network. Extensive experiments show that our framework can generate better scene graphs, achieving state-of-the-art performance on two benchmarks: the Visual Relationship Detection and Visual Genome datasets.
Tasks Graph Generation, Image Reconstruction, Object Detection, Scene Graph Generation
Published 2019-04-01
URL http://arxiv.org/abs/1904.00560v1
PDF http://arxiv.org/pdf/1904.00560v1.pdf
PWC https://paperswithcode.com/paper/scene-graph-generation-with-external
Repo
Framework

Multi-View Matrix Completion for Multi-Label Image Classification

Title Multi-View Matrix Completion for Multi-Label Image Classification
Authors Yong Luo, Tongliang Liu, Dacheng Tao, Chao Xu
Abstract There is growing interest in multi-label image classification due to its critical role in web-based image analytics applications, such as large-scale image retrieval and browsing. Matrix completion (MC) has recently been introduced as a method for transductive (semi-supervised) multi-label classification, and has several distinct advantages, including robustness to missing data and background noise in both feature and label space. However, it is limited by considering only data represented by a single-view feature, which cannot precisely characterize images containing several semantic concepts. To utilize multiple features taken from different views, we would have to concatenate the different features into a long vector. But this concatenation is prone to over-fitting and often leads to very high time complexity in MC-based image classification. Therefore, we propose to combine the MC outputs of different views in a weighted fashion, and present the multi-view matrix completion (MVMC) framework for transductive multi-label image classification. To learn the view combination weights effectively, we apply a cross-validation strategy on the labeled set. In the learning process, we adopt the average precision (AP) loss, which is particularly suitable for multi-label image classification. A least-squares loss formulation is also presented for the sake of efficiency, and the robustness of the algorithm based on the AP loss compared with the other losses is investigated. Experimental evaluations on two real-world datasets (PASCAL VOC 2007 and MIR Flickr) demonstrate the effectiveness of MVMC for transductive (semi-supervised) multi-label image classification, and show that MVMC can exploit the complementary properties of different features and output consistent labels for improved multi-label image classification.
Tasks Image Classification, Image Retrieval, Matrix Completion, Multi-Label Classification
Published 2019-04-08
URL http://arxiv.org/abs/1904.03901v1
PDF http://arxiv.org/pdf/1904.03901v1.pdf
PWC https://paperswithcode.com/paper/multi-view-matrix-completion-for-multi-label
Repo
Framework
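
A hedged sketch of the combination step only: given each view's matrix-completion output scores, learn per-view weights on the labeled subset with the least-squares formulation mentioned in the abstract (the AP-loss variant and the per-view MC solver itself are omitted; data shapes and the toy data are assumptions):

```python
import numpy as np

def learn_view_weights(view_scores, labels, labeled_idx):
    """view_scores: list of (n_samples, n_labels) score matrices, one per view.
    labels: (n_samples, n_labels) in {0, 1}; labeled_idx: indices of labeled samples."""
    A = np.stack([s[labeled_idx].ravel() for s in view_scores], axis=1)   # one column per view
    y = labels[labeled_idx].ravel()
    w, *_ = np.linalg.lstsq(A, y, rcond=None)                             # least-squares weights
    w = np.clip(w, 0.0, None)
    return w / max(w.sum(), 1e-12)                                        # normalized, non-negative

def combine(view_scores, weights):
    return sum(wi * si for wi, si in zip(weights, view_scores))

# Toy usage with two synthetic "views".
rng = np.random.default_rng(0)
labels = (rng.random((100, 5)) > 0.7).astype(float)
views = [labels + 0.3 * rng.standard_normal(labels.shape) for _ in range(2)]
w = learn_view_weights(views, labels, labeled_idx=np.arange(40))
fused = combine(views, w)
```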

Stacking with Neural network for Cryptocurrency investment

Title Stacking with Neural network for Cryptocurrency investment
Authors Avinash Barnwal, Hari Pad Bharti, Aasim Ali, Vishal Singh
Abstract Predicting the direction of asset prices has been an active area of study and a difficult task. Machine learning models have been used to build robust models for this task. Ensemble methods are one such approach, showing better results than a single supervised method. In this paper, we have used generative and discriminative classifiers to create the stack, specifically 3 generative and 6 discriminative classifiers, optimized with a one-layer neural network, to model the direction of cryptocurrency prices. The features used are technical indicators, including but not limited to trend, momentum, volume, and volatility indicators; sentiment analysis has also been used to gain useful insight in combination with these features. For cross-validation, purged walk-forward cross-validation has been used. In terms of accuracy, we have done a comparative analysis of the performance of the ensemble method with stacking and the ensemble method with blending. We have also developed a methodology for combined feature importance for the stacked model. Important indicators are also identified based on feature importance.
Tasks Feature Importance, Sentiment Analysis
Published 2019-02-21
URL http://arxiv.org/abs/1902.07855v2
PDF http://arxiv.org/pdf/1902.07855v2.pdf
PWC https://paperswithcode.com/paper/stacking-with-neural-network-for
Repo
Framework
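
A simplified sketch of the stacking structure described above, with a one-hidden-layer neural network as the meta-learner; sklearn's StackingClassifier and a small mix of generative/discriminative base learners stand in for the authors' nine classifiers, and the indicator features, labels, and purged walk-forward cross-validation are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

X = np.random.randn(500, 12)                    # placeholder for technical-indicator features
y = (np.random.rand(500) > 0.5).astype(int)     # placeholder for up/down price direction

stack = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),                   # a generative base learner
        ("logreg", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),
    cv=5,
)
stack.fit(X, y)
print("in-sample accuracy:", stack.score(X, y))
```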

Training large-scale ANNs on simulated resistive crossbar arrays

Title Training large-scale ANNs on simulated resistive crossbar arrays
Authors Malte J. Rasch, Tayfun Gokmen, Wilfried Haensch
Abstract Accelerating the training of artificial neural networks (ANNs) with analog resistive crossbar arrays is a promising idea. While the concept has been verified on very small ANNs and toy datasets (such as MNIST), more realistically sized ANNs and datasets have not yet been tackled. However, it is to be expected that device-material and hardware-design constraints, such as noisy computations, a finite number of resistive states of the device materials, saturating weight and activation ranges, and limited precision of analog-to-digital converters, will cause significant challenges to the successful training of state-of-the-art ANNs. Using analog hardware-aware ANN training simulations, we here explore a number of simple algorithmic compensatory measures to cope with analog noise and limited weight and output ranges and resolutions, which dramatically improve the simulated training performance on RPU arrays for intermediate- to large-scale ANNs.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02698v1
PDF https://arxiv.org/pdf/1906.02698v1.pdf
PWC https://paperswithcode.com/paper/training-large-scale-anns-on-simulated
Repo
Framework
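
To make the hardware constraints concrete, here is a minimal illustration of the kind of effects a "hardware-aware" training simulation injects: weight noise, a saturating weight range, a finite number of resistive states, and limited ADC precision on the outputs. The constants are assumptions, not the paper's device model:

```python
import torch

def apply_analog_constraints(weight, noise_std=0.01, w_max=1.0, n_states=64):
    noisy = weight + noise_std * torch.randn_like(weight)     # device programming/read noise
    clipped = noisy.clamp(-w_max, w_max)                      # saturating weight range
    step = 2 * w_max / (n_states - 1)
    return torch.round(clipped / step) * step                 # finite number of resistive states

def quantize_output(x, out_max=4.0, adc_bits=8):
    levels = 2 ** adc_bits - 1
    step = 2 * out_max / levels
    return torch.round(x.clamp(-out_max, out_max) / step) * step   # limited ADC precision
```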

Predicting and interpreting embeddings for out of vocabulary words in downstream tasks

Title Predicting and interpreting embeddings for out of vocabulary words in downstream tasks
Authors Nicolas Garneau, Jean-Samuel Leboeuf, Luc Lamontagne
Abstract We propose a novel way to handle out-of-vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which they appear. Our model also incorporates an attention mechanism indicating the focus allocated to the left context words, the right context words, or the word’s characters, hence making the prediction more interpretable. The model is a “drop-in” module that is jointly trained with the downstream task’s neural network, thus producing embeddings specialized for the task at hand. When the task is mostly syntactic, we observe that our model focuses most of its attention on surface-form characters. On the other hand, for more semantic tasks, the network allocates more attention to the surrounding words. In all our tests, the module helps the network to achieve better performance in comparison to the use of simple random embeddings.
Tasks
Published 2019-03-02
URL http://arxiv.org/abs/1903.00724v1
PDF http://arxiv.org/pdf/1903.00724v1.pdf
PWC https://paperswithcode.com/paper/predicting-and-interpreting-embeddings-for
Repo
Framework
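
A rough sketch of the idea, assuming the OOV embedding is predicted from three summaries (left context, right context, character form) combined through a learned attention over the three sources, which makes the focus interpretable; the encoders and sizes are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class OOVEmbedder(nn.Module):
    def __init__(self, emb_dim=300, char_vocab=100, char_dim=50):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_rnn = nn.GRU(char_dim, emb_dim, batch_first=True)
        self.attn = nn.Linear(3 * emb_dim, 3)          # one score per source
        self.out = nn.Linear(emb_dim, emb_dim)

    def forward(self, left_ctx, right_ctx, char_ids):
        """left_ctx/right_ctx: (B, emb_dim) pooled context embeddings;
        char_ids: (B, L) character indices of the OOV word."""
        _, h = self.char_rnn(self.char_emb(char_ids))
        char_repr = h.squeeze(0)                                      # (B, emb_dim)
        sources = torch.stack([left_ctx, right_ctx, char_repr], 1)    # (B, 3, emb_dim)
        weights = torch.softmax(self.attn(sources.flatten(1)), -1)    # interpretable attention
        mixed = (weights.unsqueeze(-1) * sources).sum(1)
        return self.out(mixed), weights
```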

Graphical Contrastive Losses for Scene Graph Parsing

Title Graphical Contrastive Losses for Scene Graph Parsing
Authors Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro
Abstract Most scene graph parsers use a two-stage pipeline to detect visual relationships: the first stage detects entities, and the second predicts the predicate for each entity pair using a softmax distribution. We find that such pipelines, trained with only a cross entropy loss over predicate classes, suffer from two common errors. The first, Entity Instance Confusion, occurs when the model confuses multiple instances of the same type of entity (e.g. multiple cups). The second, Proximal Relationship Ambiguity, arises when multiple subject-predicate-object triplets appear in close proximity with the same predicate, and the model struggles to infer the correct subject-object pairings (e.g. mis-pairing musicians and their instruments). We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph parsing problem, collectively termed the Graphical Contrastive Losses. These losses explicitly force the model to disambiguate related and unrelated instances through margin constraints specific to each type of confusion. We further construct a relationship detector, called RelDN, using the aforementioned pipeline to demonstrate the efficacy of our proposed losses. Our model outperforms the winning method of the OpenImages Relationship Detection Challenge by 4.7% (16.5% relative) on the test set. We also show improved results over the best previous methods on the Visual Genome and Visual Relationship Detection datasets.
Tasks Scene Graph Generation
Published 2019-03-07
URL https://arxiv.org/abs/1903.02728v5
PDF https://arxiv.org/pdf/1903.02728v5.pdf
PWC https://paperswithcode.com/paper/graphical-contrastive-losses-for-scene-graph
Repo
Framework
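
A schematic margin loss in the spirit described above (not the paper's exact three loss variants): for each entity, the affinity score of its ground-truth pairing should exceed the score of any confusable, unrelated pairing by a margin:

```python
import torch
import torch.nn.functional as F

def contrastive_margin_loss(pos_scores, neg_scores, margin=0.2):
    """pos_scores: (N,) relatedness of correct subject-object pairs;
    neg_scores: (N, K) relatedness of the same subjects with K confusable objects."""
    violation = margin + neg_scores - pos_scores.unsqueeze(1)   # want pos >= neg + margin
    return F.relu(violation).mean()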

3DPalsyNet: A Facial Palsy Grading and Motion Recognition Framework using Fully 3D Convolutional Neural Networks

Title 3DPalsyNet: A Facial Palsy Grading and Motion Recognition Framework using Fully 3D Convolutional Neural Networks
Authors Gary Storey, Richard Jiang, Shelagh Keogh, Ahmed Bouridane, Chang-Tsun Li
Abstract The capability to perform facial analysis from video sequences has significant potential to positively impact many areas of life. One such area relates to the medical domain, specifically aiding the diagnosis and rehabilitation of patients with facial palsy. With this application in mind, this paper presents an end-to-end framework, named 3DPalsyNet, for the tasks of mouth motion recognition and facial palsy grading. 3DPalsyNet utilizes a 3D CNN architecture with a ResNet backbone for the prediction of these dynamic tasks. Leveraging transfer learning from a 3D CNN pre-trained on the Kinetics dataset for general action recognition, the model is modified to apply joint supervised learning using center and softmax loss concepts. 3DPalsyNet is evaluated on a test set consisting of individuals with varying degrees of facial palsy and mouth motions, and the results show an attractive level of classification accuracy on these tasks of 82% and 86%, respectively. The effects of frame duration and loss function were studied in terms of the predictive qualities of the proposed 3DPalsyNet, where it was found that a shorter frame duration of 8 performed best for this specific task. Center loss combined with softmax showed improvements in spatio-temporal feature learning over softmax loss alone, which is in agreement with earlier work involving the spatial domain.
Tasks Transfer Learning
Published 2019-05-31
URL https://arxiv.org/abs/1905.13607v1
PDF https://arxiv.org/pdf/1905.13607v1.pdf
PWC https://paperswithcode.com/paper/3dpalsynet-a-facial-palsy-grading-and-motion
Repo
Framework
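
The joint supervision mentioned above follows the generic center-loss recipe: softmax cross-entropy for discrimination plus a center loss that pulls features toward their class center. A minimal sketch (dimensions and the weighting factor are assumptions):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, n_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))  # learnable class centers

    def forward(self, features, labels):
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

def joint_loss(logits, features, labels, center_loss, lam=0.01):
    # Softmax cross-entropy for discrimination, center loss for compact clusters.
    return nn.functional.cross_entropy(logits, labels) + lam * center_loss(features, labels)
```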