January 28, 2020

Paper Group ANR 835

Robust Triple-Matrix-Recovery-Based Auto-Weighted Label Propagation for Classification


Title	Robust Triple-Matrix-Recovery-Based Auto-Weighted Label Propagation for Classification
Authors	Huan Zhang, Zhao Zhang, Mingbo Zhao, Qiaolin Ye, Min Zhang, Meng Wang
Abstract	The graph-based semi-supervised label propagation algorithm has delivered impressive classification results. However, the estimated soft labels typically contain mixed signs and noise, which cause inaccurate predictions due to the lack of suitable constraints. Moreover, available methods typically calculate the weights and estimate the labels in the original input space, which typically contains noise and corruption. Thus, the encoded similarities and manifold smoothness may be inaccurate for label estimation. In this paper, we present effective schemes for resolving these issues and propose a novel and robust semi-supervised classification algorithm, namely, the triple-matrix-recovery-based robust auto-weighted label propagation framework (ALP-TMR). Our ALP-TMR introduces a triple matrix recovery mechanism to remove noise or mixed signs from the estimated soft labels and improve the robustness to noise and outliers in the steps of assigning weights and predicting the labels simultaneously. Our method can jointly recover the underlying clean data, clean labels and clean weighting spaces by decomposing the original data, predicted soft labels or weights into a clean part plus an error part by fitting noise. In addition, ALP-TMR integrates the auto-weighting process by minimizing reconstruction errors over the recovered clean data and clean soft labels, which can encode the weights more accurately to improve both data representation and classification. By classifying samples in the recovered clean label and weight spaces, one can potentially improve the label prediction results. The results of extensive experiments demonstrated the satisfactory performance of our ALP-TMR.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08678v1
PDF	https://arxiv.org/pdf/1911.08678v1.pdf
PWC	https://paperswithcode.com/paper/robust-triple-matrix-recovery-based-auto
Repo
Framework
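
For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of classic clamped label propagation on a weighted graph. It is not ALP-TMR: there is no matrix recovery and no auto-weighting, and the function names, the toy graph and the fixed iteration count are assumptions made only for illustration.

```python
def propagate_labels(weights, labels, n_classes, n_iters=100):
    """Classic clamped label propagation on a weighted graph.

    weights: symmetric n x n list of lists of non-negative edge weights.
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    Returns a list of n soft-label rows, each of length n_classes.
    """
    n = len(weights)
    # One-hot rows for labeled nodes, uniform rows for unlabeled nodes.
    soft = [
        [1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
        if labels[i] is not None
        else [1.0 / n_classes] * n_classes
        for i in range(n)
    ]
    for _ in range(n_iters):
        new_soft = []
        for i in range(n):
            degree = sum(weights[i])
            if labels[i] is not None or degree == 0:
                # Known labels stay clamped; isolated nodes receive nothing.
                new_soft.append(soft[i])
                continue
            # Weighted average of the neighbours' current soft labels.
            new_soft.append([
                sum(weights[i][j] * soft[j][c] for j in range(n)) / degree
                for c in range(n_classes)
            ])
        soft = new_soft
    return soft


def predict(soft):
    """Hard labels: index of the largest entry in each soft-label row."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


# Toy chain graph 0 - 1 - 2 - 3 - 4 with only the two end nodes labeled.
W = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
y = [0, None, None, None, 1]
soft_labels = propagate_labels(W, y, n_classes=2)
hard_labels = predict(soft_labels)
```

In this sketch the weights are fixed and taken directly from the input graph, which is exactly the setting the abstract describes as vulnerable to noise and corruption.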

Secure Evaluation of Quantized Neural Networks


Title	Secure Evaluation of Quantized Neural Networks
Authors	Anders Dalskov, Daniel Escudero, Marcel Keller
Abstract	Image classification using Deep Neural Networks that preserve the privacy of both the input image and the model being used has received considerable attention in the last couple of years. Recent work in this area has shown that it is possible to perform image classification with realistically sized networks using, e.g., Garbled Circuits as in XONN (USENIX '19) or MPC (CrypTFlow, Eprint '19). These and other prior works require models to be either trained in a specific way or postprocessed in order to be evaluated securely. We contribute to this line of research by showing that this postprocessing can be handled by standard Machine Learning frameworks. More precisely, we show that quantization as present in Tensorflow suffices to obtain models that can be evaluated directly and as-is in standard off-the-shelf MPC. We implement secure inference of these quantized models in MP-SPDZ, and the generality of our technique means we can demonstrate benchmarks for a wide variety of threat models, something that has not been done before. In particular, we provide a comprehensive comparison between running secure inference of large ImageNet models with active and passive security, as well as honest and dishonest majority. The most efficient inference can be performed using a passive honest majority protocol which takes between 0.9 and 25.8 seconds, depending on the size of the model; for active security and an honest majority, inference is possible between 9.5 and 147.8 seconds.
Tasks	Image Classification, Quantization
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12435v1
PDF	https://arxiv.org/pdf/1910.12435v1.pdf
PWC	https://paperswithcode.com/paper/secure-evaluation-of-quantized-neural
Repo
Framework
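
To make the role of quantization concrete, here is a minimal sketch of affine 8-bit quantization, in which a real value is approximated as `scale * (q - zero_point)` and a dot product is accumulated purely in integers. It assumes the common scale and zero-point scheme for 8-bit inference; the helper names and the example vectors are hypothetical, and neither the paper's exact pipeline nor any MPC protocol is reproduced.

```python
def choose_params(lo, hi, n_bits=8):
    """Pick a scale and zero point so that [lo, hi] maps onto [0, 2^n_bits - 1]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain 0.0
    q_max = 2 ** n_bits - 1
    scale = (hi - lo) / q_max if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, n_bits=8):
    """Map a real value to an unsigned integer code, clamping to the valid range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** n_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Recover the real value that an integer code stands for."""
    return scale * (q - zero_point)


def integer_dot(q_a, zp_a, q_b, zp_b):
    """Dot product accumulated entirely in integer arithmetic."""
    return sum((a - zp_a) * (b - zp_b) for a, b in zip(q_a, q_b))


# Hypothetical weights and activations for a single output neuron.
w_real = [-1.0, 0.5, 0.25]
a_real = [0.0, 2.0, 4.0]

w_scale, w_zp = choose_params(min(w_real), max(w_real))
a_scale, a_zp = choose_params(min(a_real), max(a_real))
w_q = [quantize(w, w_scale, w_zp) for w in w_real]
a_q = [quantize(a, a_scale, a_zp) for a in a_real]

acc = integer_dot(w_q, w_zp, a_q, a_zp)   # integer-only accumulation
approx = w_scale * a_scale * acc          # a single rescale at the end
exact = sum(w * a for w, a in zip(w_real, a_real))
```

The point of the sketch is that the expensive part of inference reduces to integer additions and multiplications, which is the kind of arithmetic that generic MPC protocols handle natively.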

Spiking Neural Network based Region Proposal Networks for Neuromorphic Vision Sensors


Title	Spiking Neural Network based Region Proposal Networks for Neuromorphic Vision Sensors
Authors	Jyotibdha Acharya, Vandana Padala, Arindam Basu
Abstract	This paper presents a three-layer spiking neural network based region proposal network operating on data generated by neuromorphic vision sensors. The proposed architecture consists of refractory, convolution and clustering layers designed with bio-realistic leaky integrate and fire (LIF) neurons and synapses. The proposed algorithm is tested on traffic scene recordings from a DAVIS sensor setup. The performance of the region proposal network has been compared with the event-based mean shift algorithm and is found to be far superior (~50% better) in recall for similar precision (~85%). The computational and memory complexity of the proposed method are also shown to be similar to those of event-based mean shift.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09864v1
PDF	http://arxiv.org/pdf/1902.09864v1.pdf
PWC	https://paperswithcode.com/paper/spiking-neural-network-based-region-proposal
Repo
Framework
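
As background for the LIF neurons mentioned in this abstract, the following is a minimal discrete-time sketch of a single leaky integrate-and-fire neuron with a refractory period. The time constant, threshold, reset value and input currents are assumptions chosen for illustration; the paper's layer design and parameters are not reproduced.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, v_thresh=1.0,
                 v_reset=0.0, refractory_steps=2):
    """Simulate one leaky integrate-and-fire neuron.

    input_current: sequence of input values, one per time step.
    Returns a list of 0/1 spike indicators of the same length.
    """
    v = v_reset
    refractory_left = 0
    spikes = []
    for current in input_current:
        if refractory_left > 0:
            # During the refractory period the neuron ignores its input.
            refractory_left -= 1
            v = v_reset
            spikes.append(0)
            continue
        # Forward-Euler step of dv/dt = (-v + I) / tau (leak plus drive).
        v += dt * (-v + current) / tau
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset
            refractory_left = refractory_steps
        else:
            spikes.append(0)
    return spikes


strong = simulate_lif([5.0] * 50)  # drive well above threshold: regular spiking
weak = simulate_lif([0.5] * 50)    # membrane settles below threshold: silent
```

With a constant drive the membrane potential relaxes toward the input value, so the neuron fires only when that value exceeds the threshold, and the refractory period caps its firing rate.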

Same-Cluster Querying for Overlapping Clusters


Title	Same-Cluster Querying for Overlapping Clusters
Authors	Wasim Huleihel, Arya Mazumdar, Muriel Médard, Soumyabrata Pal
Abstract	Overlapping clusters are common in models of many practical data-segmentation applications. Suppose we are given $n$ elements to be clustered into $k$ possibly overlapping clusters, and an oracle that can interactively answer queries of the form “do elements $u$ and $v$ belong to the same cluster?” The goal is to recover the clusters with the minimum number of such queries. This problem has been of recent interest for the case of disjoint clusters. In this paper, we look at the more practical scenario of overlapping clusters, and provide upper bounds (with algorithms) on the sufficient number of queries. We provide algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions. Our algorithms are parameter free, efficient, and work in the presence of random noise. We also derive information-theoretic lower bounds on the number of queries needed, proving that our algorithms are order optimal. Finally, we test our algorithms over both synthetic and real-world data, showing their practicality and effectiveness.
Tasks
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12490v1
PDF	https://arxiv.org/pdf/1910.12490v1.pdf
PWC	https://paperswithcode.com/paper/same-cluster-querying-for-overlapping
Repo
Framework
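
The query model in this abstract can be illustrated with the simpler disjoint, noise-free case that it mentions as prior work. The sketch below compares each element with one representative per discovered cluster, so it uses at most $n \cdot k$ queries. It is not the paper's algorithm for overlapping clusters, and the oracle, the hidden labels and the function names are hypothetical.

```python
def cluster_with_oracle(elements, same_cluster):
    """Recover disjoint clusters using only pairwise same-cluster queries.

    same_cluster(u, v) must return True iff u and v share a cluster.
    Returns (clusters, n_queries).
    """
    clusters = []
    n_queries = 0
    for x in elements:
        for cluster in clusters:
            n_queries += 1
            # One query against the cluster's first element (its representative).
            if same_cluster(cluster[0], x):
                cluster.append(x)
                break
        else:
            # No existing cluster matched: x starts a new one.
            clusters.append([x])
    return clusters, n_queries


# Hypothetical ground truth that only the oracle can see.
hidden = {0: "a", 1: "b", 2: "a", 3: "c", 4: "b", 5: "a"}


def oracle(u, v):
    return hidden[u] == hidden[v]


clusters, n_queries = cluster_with_oracle(list(hidden), oracle)
```

Once clusters may overlap, a single representative no longer determines membership, which is why the overlapping setting needs different algorithms and bounds.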

Compositional Temporal Visual Grounding of Natural Language Event Descriptions


Title	Compositional Temporal Visual Grounding of Natural Language Event Descriptions
Authors	Jonathan C. Stroud, Ryan McCaffrey, Rada Mihalcea, Jia Deng, Olga Russakovsky
Abstract	Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions. Compositional modeling becomes central: we first ground atomic descriptions “girl eating an apple,” “batter hitting the ball” to short video segments, and then establish the temporal relationships between the segments. This compositional structure enables models to recognize a wider variety of events not seen during training through recognizing their atomic sub-events. Explicit temporal modeling accounts for a wide variety of temporal relationships that can be expressed in language: e.g., in the description “girl stands up from the table after eating an apple” the visual ordering of the events is reversed, with first “eating an apple” followed by “standing up from the table.” We leverage these observations to develop a unified deep architecture, CTG-Net, to perform temporal grounding of natural language event descriptions to videos. We demonstrate that our system outperforms prior state-of-the-art methods on the DiDeMo, Tempo-TL, and Tempo-HL temporal grounding datasets.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02256v1
PDF	https://arxiv.org/pdf/1912.02256v1.pdf
PWC	https://paperswithcode.com/paper/compositional-temporal-visual-grounding-of
Repo
Framework

Waterfall Bandits: Learning to Sell Ads Online


Title	Waterfall Bandits: Learning to Sell Ads Online
Authors	Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian
Abstract	A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then subsequently solve the optimization problem for the given demand model. This will incur a linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
Tasks
Published	2019-04-20
URL	http://arxiv.org/abs/1904.09404v1
PDF	http://arxiv.org/pdf/1904.09404v1.pdf
PWC	https://paperswithcode.com/paper/190409404
Repo
Framework
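
The waterfall mechanism described in this abstract can be made concrete with a small expected-revenue calculation. The sketch assumes that each ad network accepts its offer independently with a known probability, which is a simplification: in the paper the demand is unknown and must be learned online. The function names and the example offers are hypothetical.

```python
from itertools import permutations


def expected_revenue(offers):
    """Expected revenue of a waterfall.

    offers: ordered list of (price, accept_prob) pairs, one per ad network.
    The first network that accepts its offer wins the inventory.
    """
    revenue = 0.0
    p_reach = 1.0  # probability that every earlier network declined
    for price, accept_prob in offers:
        revenue += p_reach * accept_prob * price
        p_reach *= 1.0 - accept_prob
    return revenue


def best_order(offers):
    """Brute-force search over orderings; feasible only for a few networks."""
    return max(permutations(offers), key=expected_revenue)


offers = [(4.0, 0.5), (10.0, 0.5)]
as_given = expected_revenue(offers)
optimized = expected_revenue(best_order(offers))
```

Even in this toy setting the order matters, since an early low-priced offer that is accepted forecloses a later, more valuable one.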

KernelNet: A Data-Dependent Kernel Parameterization for Deep Generative Modeling


Title	KernelNet: A Data-Dependent Kernel Parameterization for Deep Generative Modeling
Authors	Yufan Zhou, Changyou Chen, Jinhui Xu
Abstract	Learning with kernels is a frequently used tool in modern machine learning. Standard approaches for this type of learning use a predefined kernel that requires careful selection of hyperparameters. To mitigate this burden, we propose in this paper a framework to construct and learn a data-dependent kernel based on random features and implicit spectral distributions (Fourier transform of the kernel) parameterized by deep neural networks. We call the constructed network KernelNet, and apply it for deep generative modeling in various scenarios, including variants of the MMD-GAN and an implicit Variational Autoencoder (VAE), the two popular learning paradigms in deep generative models. Extensive experiments show the advantages of the proposed KernelNet, consistently achieving better performance compared to related methods.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00979v1
PDF	https://arxiv.org/pdf/1912.00979v1.pdf
PWC	https://paperswithcode.com/paper/kernelnet-a-data-dependent-kernel
Repo
Framework
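
The random-feature construction that this abstract builds on can be sketched for a fixed kernel. The example below approximates a Gaussian (RBF) kernel with random Fourier features by sampling frequencies from the kernel's spectral distribution, here a Gaussian. KernelNet instead learns that distribution with a neural network, which is not attempted here; the function names, the bandwidth and the feature count are assumptions.

```python
import math
import random


def make_rff(dim, n_features, sigma, seed=0):
    """Build a random Fourier feature map for the RBF kernel with bandwidth sigma."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral distribution, N(0, 1/sigma^2).
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(n_features)]
    # Random phases drawn uniformly from [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            norm * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return features


def rbf_kernel(x, y, sigma):
    """Exact RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


phi = make_rff(dim=3, n_features=4000, sigma=1.0)
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))  # inner product of features
exact = rbf_kernel(x, y, sigma=1.0)
```

Replacing the fixed Gaussian sampler with a learned one changes the spectral distribution, and therefore the kernel itself, without altering the feature map's structure.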

Machine learning in acoustics: theory and applications


Title	Machine learning in acoustics: theory and applications
Authors	Michael J. Bianco, Peter Gerstoft, James Traer, Emma Ozanich, Marie A. Roch, Sharon Gannot, Charles-Alban Deledalle
Abstract	Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science. We survey the recent advances and transformative potential of machine learning (ML), including deep learning, in the field of acoustics. ML is a broad family of techniques, which are often based in statistics, for automatically detecting and utilizing patterns in data. Relative to conventional acoustics and signal processing, ML is data-driven. Given sufficient training data, ML can discover complex relationships between features and desired labels or actions, or between features themselves. With large volumes of training data, ML can discover models describing complex acoustic phenomena such as human speech and reverberation. ML in acoustics is rapidly developing with compelling results and significant future promise. We first introduce ML, then highlight ML developments in four acoustics research areas: source localization in speech processing, source localization in ocean acoustics, bioacoustics, and environmental sounds in everyday scenes.
Tasks
Published	2019-05-11
URL	https://arxiv.org/abs/1905.04418v4
PDF	https://arxiv.org/pdf/1905.04418v4.pdf
PWC	https://paperswithcode.com/paper/machine-learning-in-acoustics-a-review
Repo
Framework

Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds from Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction


Title	Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds from Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction
Authors	Zhizhong Han, Xiyang Wang, Yu-Shen Liu, Matthias Zwicker
Abstract	Unsupervised feature learning for point clouds has been vital for large-scale point cloud understanding. Recent deep learning based methods depend on learning global geometry from self-reconstruction. However, these methods still suffer from ineffective learning of local geometry, which significantly limits the discriminability of learned features. To resolve this issue, we propose MAP-VAE to enable the learning of global and local geometry by jointly leveraging global and local self-supervision. To enable effective local self-supervision, we introduce multi-angle analysis for point clouds. In a multi-angle scenario, we first split a point cloud into a front half and a back half from each angle, and then train MAP-VAE to learn to predict a back half sequence from the corresponding front half sequence. MAP-VAE performs this half-to-half prediction using an RNN to simultaneously learn each local geometry and the spatial relationship among them. In addition, MAP-VAE also learns global geometry via self-reconstruction, where we employ a variational constraint to facilitate novel shape generation. The superior results in four shape analysis tasks show that MAP-VAE can learn more discriminative global or local features than the state-of-the-art methods.
Tasks
Published	2019-07-30
URL	https://arxiv.org/abs/1907.12704v1
PDF	https://arxiv.org/pdf/1907.12704v1.pdf
PWC	https://paperswithcode.com/paper/multi-angle-point-cloud-vae-unsupervised
Repo
Framework

Hybrid Text Feature Modeling for Disease Group Prediction using Unstructured Physician Notes


Title	Hybrid Text Feature Modeling for Disease Group Prediction using Unstructured Physician Notes
Authors	Gokul S Krishnan, Sowmya Kamath S
Abstract	Existing Clinical Decision Support Systems (CDSSs) largely depend on the availability of structured patient data and Electronic Health Records (EHRs) to aid caregivers. However, in hospitals in developing countries, structured patient data formats are not widely adopted, and medical professionals still rely on clinical notes in the form of unstructured text. Such unstructured clinical notes recorded by medical personnel can also be a potential source of rich patient-specific information which can be leveraged to build CDSSs, even for hospitals in developing countries. If such unstructured clinical text can be used, the manual and time-consuming process of EHR generation will no longer be required, with huge savings in person-hours and cost. In this paper, we propose a generic ICD9 disease group prediction CDSS built on unstructured physician notes modeled using hybrid word embeddings. These word embeddings are used to train a deep neural network for effectively predicting ICD9 disease groups. Experimental evaluation showed that the proposed approach outperformed the state-of-the-art disease group prediction model built on structured EHRs by 15% in terms of AUROC and 40% in terms of AUPRC, thus proving our hypothesis and eliminating dependency on availability of structured patient data.
Tasks	Word Embeddings
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11657v1
PDF	https://arxiv.org/pdf/1911.11657v1.pdf
PWC	https://paperswithcode.com/paper/hybrid-text-feature-modeling-for-disease
Repo
Framework

City2City: Translating Place Representations across Cities


Title	City2City: Translating Place Representations across Cities
Authors	Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto, Satish V. Ukkusuri
Abstract	Large mobility datasets collected from various sources have allowed us to observe, analyze, predict and solve a wide range of important urban challenges. In particular, studies have generated place representations (or embeddings) from mobility patterns in a similar manner to word embeddings to better understand the functionality of different places within a city. However, studies have been limited to generating such representations of cities in an individual manner and have lacked an inter-city perspective, which has made it difficult to transfer the insights gained from the place representations across different cities. In this study, we attempt to bridge this research gap by treating cities and languages analogously. We apply methods developed for unsupervised machine language translation tasks to translate place representations across different cities. Real world mobility data collected from mobile phone users in 2 cities in Japan are used to test our place representation translation methods. Translated place representations are validated using landuse data, and results show that our methods were able to accurately translate place representations from one city to another.
Tasks	Word Embeddings
Published	2019-11-26
URL	https://arxiv.org/abs/1911.12143v1
PDF	https://arxiv.org/pdf/1911.12143v1.pdf
PWC	https://paperswithcode.com/paper/city2city-translating-place-representations
Repo
Framework

A Fully-Integrated Sensing and Control System for High-Accuracy Mobile Robotic Building Construction


Title	A Fully-Integrated Sensing and Control System for High-Accuracy Mobile Robotic Building Construction
Authors	Abel Gawel, Hermann Blum, Johannes Pankert, Koen Krämer, Luca Bartolomei, Selen Ercan, Farbod Farshidian, Margarita Chli, Fabio Gramazio, Roland Siegwart, Marco Hutter, Timothy Sandy
Abstract	We present a fully-integrated sensing and control system which enables mobile manipulator robots to execute building tasks with millimeter-scale accuracy on building construction sites. The approach leverages multi-modal sensing capabilities for state estimation, tight integration with digital building models, and integrated trajectory planning and whole-body motion control. A novel method for high-accuracy localization updates relative to the known building structure is proposed. The approach is implemented on a real platform and tested under realistic construction conditions. We show that the system can achieve sub-cm end-effector positioning accuracy during fully autonomous operation using solely on-board sensing.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.01870v1
PDF	https://arxiv.org/pdf/1912.01870v1.pdf
PWC	https://paperswithcode.com/paper/a-fully-integrated-sensing-and-control-system
Repo
Framework

Schedule Earth Observation satellites with Deep Reinforcement Learning


Title	Schedule Earth Observation satellites with Deep Reinforcement Learning
Authors	Adrien Hadj-Salah, Rémi Verdier, Clément Caron, Mathieu Picard, Mikaël Capelle
Abstract	Optical Earth observation satellites acquire images worldwide, covering up to several million square kilometers every day. The complexity of scheduling acquisitions for such systems increases exponentially when considering the interoperability of several satellite constellations together with the uncertainties from weather forecasts. In order to deliver valid images to customers as fast as possible, it is crucial to acquire cloud-free images. Depending on weather forecasts, up to 50% of images acquired by operational satellites can be trashed due to excessive cloud cover, showing there is room for improvement. We propose an acquisition scheduling approach based on Deep Reinforcement Learning and experiment on a simplified environment. We find that it challenges classical methods relying on human-expert heuristics.
Tasks
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05696v1
PDF	https://arxiv.org/pdf/1911.05696v1.pdf
PWC	https://paperswithcode.com/paper/schedule-earth-observation-satellites-with
Repo
Framework

Visual Summarization of Scholarly Videos using Word Embeddings and Keyphrase Extraction


Title	Visual Summarization of Scholarly Videos using Word Embeddings and Keyphrase Extraction
Authors	Hang Zhou, Christian Otto, Ralph Ewerth
Abstract	Effective learning with audiovisual content depends on many factors. Besides the quality of the learning resource’s content, it is essential to discover the most relevant and suitable video in order to support the learning process most effectively. Video summarization techniques facilitate this goal by providing a quick overview of the content. This is especially useful for longer recordings such as conference presentations or lectures. In this paper, we present an approach that generates a visual summary of video content based on semantic word embeddings and keyphrase extraction. For this purpose, we exploit video annotations that are automatically generated by speech recognition and video OCR (optical character recognition).
Tasks	Optical Character Recognition, Speech Recognition, Video Summarization, Word Embeddings
Published	2019-11-25
URL	https://arxiv.org/abs/1912.10809v1
PDF	https://arxiv.org/pdf/1912.10809v1.pdf
PWC	https://paperswithcode.com/paper/visual-summarization-of-scholarly-videos
Repo
Framework

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections


Title	A Cost Efficient Approach to Correct OCR Errors in Large Document Collections
Authors	Deepayan Das, Jerin Philip, Minesh Mathew, C. V. Jawahar
Abstract	The word error rate of an OCR system is often higher than its character error rate. This is especially true when OCR systems are designed by recognizing characters. High word accuracies are critical to tasks like the creation of content in digital libraries and text-to-speech applications. In order to detect and correct the misrecognised words, it is common for an OCR module to employ a post-processor to further improve the word accuracy. However, conventional approaches to post-processing, like looking up a dictionary or using a statistical language model (SLM), are still limited. In many such scenarios, it is often required to remove the outstanding errors manually. We observe that the traditional post-processing schemes look at error words sequentially since OCR systems process documents one at a time. We propose a cost-efficient model to address the error words in batches rather than correcting them individually. We exploit the fact that a collection of documents, unlike a single document, has a structure leading to repetition of words. Such words, if efficiently grouped together and corrected as a whole, can lead to a significant reduction in the cost. Correction can be fully automatic or with a human in the loop. Towards this, we employ a novel clustering scheme to obtain fairly homogeneous clusters. We compare the performance of our model with various baseline approaches including the case where all the errors are removed by a human. We demonstrate the efficacy of our solution empirically by reporting more than 70% reduction in the human effort with near perfect error correction. We validate our method on books from multiple languages.
Tasks	Language Modelling, Optical Character Recognition
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11739v1
PDF	https://arxiv.org/pdf/1905.11739v1.pdf
PWC	https://paperswithcode.com/paper/a-cost-efficient-approach-to-correct-ocr
Repo
Framework