January 29, 2020


Paper Group ANR 548

Machine learning and glioma imaging biomarkers. Splitting Methods for Convex Bi-Clustering and Co-Clustering. Learning Multilingual Word Embeddings Using Image-Text Data. PVSS: A Progressive Vehicle Search System for Video Surveillance Networks. Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces. Towards vision-based robotic skins: a …

Machine learning and glioma imaging biomarkers


Title	Machine learning and glioma imaging biomarkers
Authors	Thomas Booth, Matthew Williams, Aysha Luis, Jorge Cardoso, Ashkan Keyoumars, Haris Shuaib
Abstract	Aim: To review how machine learning (ML) is applied to imaging biomarkers in neuro-oncology, in particular for diagnosis, prognosis, and treatment response monitoring. Materials and Methods: The PubMed and MEDLINE databases were searched for articles published before September 2018 using relevant search terms. The search strategy focused on articles applying ML to high-grade glioma biomarkers for treatment response monitoring, prognosis, and prediction. Results: Magnetic resonance imaging (MRI) is typically used throughout the patient pathway because routine structural imaging provides detailed anatomical and pathological information and advanced techniques provide additional physiological detail. Using carefully chosen image features, ML is frequently used to allow accurate classification in a variety of scenarios. Rather than being chosen by human selection, ML also enables image features to be identified by an algorithm. Much research is applied to determining molecular profiles, histological tumour grade, and prognosis using MRI images acquired at the time that patients first present with a brain tumour. Differentiating a treatment response from a post-treatment-related effect using imaging is clinically important and also an area of active study (described here in one of two Special Issue publications dedicated to the application of ML in glioma imaging). Conclusion: Although pioneering, most of the evidence is of a low level, having been obtained retrospectively and in single centres. Studies applying ML to build neuro-oncology monitoring biomarker models have yet to show an overall advantage over those using traditional statistical methods. Development and validation of ML models applied to neuro-oncology require large, well-annotated datasets, and therefore multidisciplinary and multi-centre collaborations are necessary.
Tasks
Published	2019-08-28
URL	https://arxiv.org/abs/1910.07440v1
PDF	https://arxiv.org/pdf/1910.07440v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-and-glioma-imaging
Repo
Framework

Splitting Methods for Convex Bi-Clustering and Co-Clustering


Title	Splitting Methods for Convex Bi-Clustering and Co-Clustering
Authors	Michael Weylandt
Abstract	Co-Clustering, the problem of simultaneously identifying clusters across multiple aspects of a data set, is a natural generalization of clustering to higher-order structured data. Recent convex formulations of bi-clustering and tensor co-clustering, which shrink estimated centroids together using a convex fusion penalty, allow for global optimality guarantees and precise theoretical analysis, but their computational properties have been less well studied. In this note, we present three efficient operator-splitting methods for the convex co-clustering problem: a standard two-block ADMM, a Generalized ADMM which avoids an expensive tensor Sylvester equation in the primal update, and a three-block ADMM based on the operator splitting scheme of Davis and Yin. Theoretical complexity analysis suggests, and experimental evidence confirms, that the Generalized ADMM is far more efficient for large problems.
Tasks
Published	2019-01-18
URL	https://arxiv.org/abs/1901.06075v4
PDF	https://arxiv.org/pdf/1901.06075v4.pdf
PWC	https://paperswithcode.com/paper/splitting-methods-for-convex-bi-clustering
Repo
Framework

Learning Multilingual Word Embeddings Using Image-Text Data


Title	Learning Multilingual Word Embeddings Using Image-Text Data
Authors	Karan Singhal, Karthik Raman, Balder ten Cate
Abstract	There has been significant interest recently in learning multilingual word embeddings – in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.
Tasks	Multilingual Word Embeddings, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12260v1
PDF	https://arxiv.org/pdf/1905.12260v1.pdf
PWC	https://paperswithcode.com/paper/learning-multilingual-word-embeddings-using
Repo
Framework

PVSS: A Progressive Vehicle Search System for Video Surveillance Networks


Title	PVSS: A Progressive Vehicle Search System for Video Surveillance Networks
Authors	Xinchen Liu, Wu Liu, Huadong Ma, Shuangqun Li
Abstract	This paper focuses on the task of searching for a specific vehicle that has appeared in a surveillance network. Existing methods usually assume the vehicle images are well cropped from the surveillance videos, then use visual attributes, such as colors and types, or license plate numbers to match the target vehicle in the image set. However, a complete vehicle search system should consider the problems of vehicle detection, representation, indexing, storage, matching, and so on. In addition, attribute-based search cannot accurately find the same vehicle because of intra-instance changes across cameras and the highly uncertain environment. Moreover, license plates may be misrecognized in surveillance scenes because of low resolution and noise. In this paper, a Progressive Vehicle Search System, named PVSS, is designed to solve the above problems. PVSS consists of three modules: the crawler, the indexer, and the searcher. The vehicle crawler detects and tracks vehicles in surveillance videos and transfers the captured vehicle images, metadata, and contextual information to the server or cloud. Multi-grained attributes, such as visual features and license plate fingerprints, are then extracted and indexed by the vehicle indexer. Finally, the vehicle searcher takes as input a query triplet consisting of a vehicle image, a time range, and a spatial scope, and searches the database for the target vehicle through a progressive process. Extensive experiments on a public dataset from a real surveillance network validate the effectiveness of PVSS.
Tasks
Published	2019-01-10
URL	http://arxiv.org/abs/1901.03062v1
PDF	http://arxiv.org/pdf/1901.03062v1.pdf
PWC	https://paperswithcode.com/paper/pvss-a-progressive-vehicle-search-system-for
Repo
Framework

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces


Title	Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces
Authors	Senthil Purushwalkam, Abhinav Gupta, Danny M. Kaufman, Bryan Russell
Abstract	We introduce an approach to model surface properties governing bounces in everyday scenes. Our model learns end-to-end, starting from sensor inputs, to predict post-bounce trajectories and infer two underlying physical properties that govern bouncing - restitution and effective collision normals. Our model, Bounce and Learn, comprises two modules – a Physics Inference Module (PIM) and a Visual Inference Module (VIM). VIM learns to infer physical parameters for locations in a scene given a single still image, while PIM learns to model physical interactions for the prediction task given physical parameters and observed pre-collision 3D trajectories. To achieve our results, we introduce the Bounce Dataset comprising 5K RGB-D videos of bouncing trajectories of a foam ball to probe surfaces of varying shapes and materials in everyday scenes including homes and offices. Our proposed model learns from our collected dataset of real-world bounces and is bootstrapped with additional information from simple physics simulations. We show on our newly collected dataset that our model out-performs baselines, including trajectory fitting with Newtonian physics, in predicting post-bounce trajectories and inferring physical properties of a scene.
Tasks
Published	2019-04-15
URL	http://arxiv.org/abs/1904.06827v1
PDF	http://arxiv.org/pdf/1904.06827v1.pdf
PWC	https://paperswithcode.com/paper/bounce-and-learn-modeling-scene-dynamics-with-1
Repo
Framework
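
The two physical properties named in the abstract, restitution and the collision normal, can be illustrated with a textbook frictionless impact model. The sketch below is not the paper's learned Physics Inference Module; the function name `post_bounce_velocity` and the frictionless assumption are illustrative choices made here.

```python
import numpy as np

def post_bounce_velocity(v_in, normal, restitution):
    """Post-bounce velocity under a simple frictionless impact model.

    Assumption of this sketch: the tangential velocity is unchanged and the
    normal velocity is reversed and scaled by the coefficient of restitution.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)          # make sure the normal has unit length
    v = np.asarray(v_in, dtype=float)
    v_normal = np.dot(v, n) * n        # component along the collision normal
    v_tangent = v - v_normal           # component along the surface
    return v_tangent - restitution * v_normal

# A ball hitting a horizontal floor (normal pointing up) with restitution 0.5.
v_out = post_bounce_velocity([1.0, 0.0, -2.0], [0.0, 0.0, 1.0], 0.5)
print(v_out)
```

With restitution 1 the bounce is perfectly elastic, and with restitution 0 the ball keeps only its tangential motion.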

Towards vision-based robotic skins: a data-driven, multi-camera tactile sensor


Title	Towards vision-based robotic skins: a data-driven, multi-camera tactile sensor
Authors	Camill Trueeb, Carmelo Sferrazza, Raffaello D’Andrea
Abstract	This paper describes the design of a multi-camera optical tactile sensor that provides information about the contact force distribution applied to its soft surface. This information is contained in the motion of spherical particles spread within the surface, which deforms when subject to force. The small embedded cameras capture images of the different particle patterns that are then mapped to the three-dimensional contact force distribution through a machine learning architecture. The design proposed in this paper exhibits a larger contact surface and a thinner structure than most of the existing camera-based tactile sensors, without the use of additional reflecting components such as mirrors. A modular implementation of the learning architecture is discussed that facilitates the scalability to larger surfaces such as robotic skins.
Tasks
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14526v2
PDF	https://arxiv.org/pdf/1910.14526v2.pdf
PWC	https://paperswithcode.com/paper/towards-vision-based-robotic-skins-a-data
Repo
Framework

Digital Twin: Acquiring High-Fidelity 3D Avatar from a Single Image


Title	Digital Twin: Acquiring High-Fidelity 3D Avatar from a Single Image
Authors	Ruizhe Wang, Chih-Fan Chen, Hao Peng, Xudong Liu, Oliver Liu, Xin Li
Abstract	We present an approach to generate a high-fidelity 3D face avatar with a high-resolution UV texture map from a single image. To estimate the face geometry, we use a deep neural network to directly predict vertex coordinates of the 3D face model from the given image. The 3D face geometry is further refined by a non-rigid deformation process to more accurately capture facial landmarks before texture projection. A key novelty of our approach is to train the shape regression network on facial images synthetically generated using a high-quality rendering engine. Moreover, our shape estimator fully leverages the discriminative power of deep facial identity features learned from millions of facial images. We have conducted extensive experiments to demonstrate the superiority of our optimized 2D-to-3D rendering approach, especially its excellent generalization property on real-world selfie images. Our proposed system of rendering 3D avatars from 2D images has a wide range of applications from virtual/augmented reality (VR/AR) and telepsychiatry to human-computer interaction and social networks.
Tasks
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03455v1
PDF	https://arxiv.org/pdf/1912.03455v1.pdf
PWC	https://paperswithcode.com/paper/digital-twin-acquiring-high-fidelity-3d
Repo
Framework

Rethinking Atmospheric Turbulence Mitigation


Title	Rethinking Atmospheric Turbulence Mitigation
Authors	Nicholas Chimitt, Zhiyuan Mao, Guanzhe Hong, Stanley H. Chan
Abstract	State-of-the-art atmospheric turbulence image restoration methods utilize standard image processing tools such as optical flow, lucky region and blind deconvolution to restore the images. While promising results have been reported over the past decade, many of the methods are agnostic to the physical model that generates the distortion. In this paper, we revisit the turbulence restoration problem by analyzing the reference frame generation and the blind deconvolution steps in a typical restoration pipeline. By leveraging tools in large deviation theory, we rigorously prove the minimum number of frames required to generate a reliable reference for both static and dynamic scenes. We discuss how a turbulence agnostic model can lead to potential flaws, and how to configure a simple spatial-temporal non-local weighted averaging method to generate references. For blind deconvolution, we present a new data-driven prior by analyzing the distributions of the point spread functions. We demonstrate how a simple prior can outperform state-of-the-art blind deconvolution methods.
Tasks	Image Restoration, Optical Flow Estimation
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07498v1
PDF	https://arxiv.org/pdf/1905.07498v1.pdf
PWC	https://paperswithcode.com/paper/rethinking-atmospheric-turbulence-mitigation
Repo
Framework

COPHY: Counterfactual Learning of Physical Dynamics


Title	COPHY: Counterfactual Learning of Physical Dynamics
Authors	Fabien Baradel, Natalia Neverova, Julien Mille, Greg Mori, Christian Wolf
Abstract	Understanding causes and effects in mechanical systems is an essential component of reasoning in the physical world. This work poses a new problem of counterfactual learning of object mechanics from visual input. We develop the COPHY benchmark to assess the capacity of the state-of-the-art models for causal physical reasoning in a synthetic 3D environment and propose a model for learning the physical dynamics in a counterfactual setting. Having observed a mechanical experiment that involves, for example, a falling tower of blocks, a set of bouncing balls or colliding objects, we learn to predict how its outcome is affected by an arbitrary intervention on its initial conditions, such as displacing one of the objects in the scene. The alternative future is predicted given the altered past and a latent representation of the confounders learned by the model in an end-to-end fashion with no supervision. We compare against feedforward video prediction baselines and show how observing alternative experiences allows the network to capture latent physical properties of the environment, which results in significantly more accurate predictions at the level of superhuman performance.
Tasks	Video Prediction
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12000v1
PDF	https://arxiv.org/pdf/1909.12000v1.pdf
PWC	https://paperswithcode.com/paper/cophy-counterfactual-learning-of-physical
Repo
Framework

A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set


Title	A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set
Authors	Clemens Eppner, Arsalan Mousavian, Dieter Fox
Abstract	Robot grasping is often formulated as a learning problem. With the increasing speed and quality of physics simulations, generating large-scale grasping data sets that feed learning algorithms is becoming more and more popular. An often overlooked question is how to generate the grasps that make up these data sets. In this paper, we review, classify, and compare different grasp sampling strategies. Our evaluation is based on a fine-grained discretization of SE(3) and uses physics-based simulation to evaluate the quality and robustness of the corresponding parallel-jaw grasps. Specifically, we consider more than 1 billion grasps for each of the 21 objects from the YCB data set. This dense data set lets us evaluate existing sampling schemes w.r.t. their bias and efficiency. Our experiments show that some popular sampling schemes contain significant bias and do not cover all possible ways an object can be grasped.
Tasks
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05604v1
PDF	https://arxiv.org/pdf/1912.05604v1.pdf
PWC	https://paperswithcode.com/paper/a-billion-ways-to-grasp-an-evaluation-of
Repo
Framework

Detecting Heterogeneous Treatment Effect with Instrumental Variables


Title	Detecting Heterogeneous Treatment Effect with Instrumental Variables
Authors	Michael Johnson, Jiongyi Cao, Hyunseung Kang
Abstract	There is an increasing interest in estimating heterogeneity in causal effects in randomized and observational studies. However, little research has been conducted to understand heterogeneity in an instrumental variables study. In this work, we present a method to estimate heterogeneous causal effects using an instrumental variable approach. The method has two parts. The first part uses subject-matter knowledge and interpretable machine learning techniques, such as classification and regression trees, to discover potential effect modifiers. The second part uses closed testing to test for the statistical significance of the effect modifiers while strongly controlling the familywise error rate. We applied this method to the Oregon Health Insurance Experiment, estimating the effect of Medicaid on the number of days an individual’s health does not impede their usual activities, and found evidence of heterogeneity in older men who prefer English and don’t self-identify as Asian and younger individuals who have at most a high school diploma or GED and prefer English.
Tasks	Interpretable Machine Learning
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03652v1
PDF	https://arxiv.org/pdf/1908.03652v1.pdf
PWC	https://paperswithcode.com/paper/detecting-heterogeneous-treatment-effect-with
Repo
Framework
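
The closed testing step mentioned in the abstract can be sketched generically: an individual hypothesis is rejected only if every intersection hypothesis containing it is rejected by a local test. The abstract does not say which local test the authors use, so the Bonferroni local test below is an assumption of this sketch, and the function name `closed_testing` is hypothetical. The enumeration is exponential in the number of hypotheses, so it only suits a small set of candidate effect modifiers.

```python
from itertools import combinations

def closed_testing(p_values, alpha=0.05):
    """Return indices of hypotheses rejected by a closed testing procedure.

    Local test (assumed here): Bonferroni, i.e. an intersection hypothesis is
    rejected when its smallest p-value is at most alpha / (size of the subset).
    """
    m = len(p_values)

    def local_reject(subset):
        return min(p_values[i] for i in subset) <= alpha / len(subset)

    rejected = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        # H_i is rejected only if every intersection containing i is rejected.
        if all(
            local_reject((i,) + extra)
            for size in range(m)
            for extra in combinations(others, size)
        ):
            rejected.append(i)
    return rejected

print(closed_testing([0.01, 0.02, 0.5]))
```

With Bonferroni local tests this procedure coincides with Holm's step-down method, which gives a quick way to check small cases by hand.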

ADS-ME: Anomaly Detection System for Micro-expression Spotting


Title	ADS-ME: Anomaly Detection System for Micro-expression Spotting
Authors	Dawood Al Chanti, Alice Caplier
Abstract	Micro-expressions (MEs) are infrequent and uncontrollable facial events that can highlight emotional deception and appear in high-stakes environments. This paper proposes an algorithm for spatiotemporal ME spotting. Since MEs are unusual events, we treat them as abnormal patterns that diverge from expected Normal Facial Behaviour (NFB) patterns. NFBs correspond to facial muscle activations, eye blink/gaze events and mouth opening/closing movements, all of which are facial deformations but not MEs. We propose a probabilistic model to estimate the probability density function that models the spatiotemporal distribution of NFB patterns. To rank the outputs, we compute the negative log-likelihood, and we develop an adaptive thresholding technique to distinguish MEs from NFBs. Because we work only with NFB data, the main challenge is to capture intrinsic spatiotemporal features; hence we design a recurrent convolutional autoencoder for feature representation. Finally, we show that our system is superior to previous works for ME spotting.
Tasks	Anomaly Detection
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04354v1
PDF	http://arxiv.org/pdf/1903.04354v1.pdf
PWC	https://paperswithcode.com/paper/ads-me-anomaly-detection-system-for-micro
Repo
Framework
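
The scoring idea in the abstract (fit a density to normal behaviour only, rank samples by negative log-likelihood, then threshold) can be sketched as follows. This is not the authors' model: the diagonal Gaussian density, the mean-plus-k-standard-deviations threshold, and all function names are assumptions made for illustration, and the features stand in for whatever an autoencoder would produce.

```python
import numpy as np

def fit_gaussian(normal_features):
    """Fit a diagonal Gaussian to features of normal behaviour only."""
    x = np.asarray(normal_features, dtype=float)
    return x.mean(axis=0), x.var(axis=0) + 1e-6   # small floor avoids division by zero

def negative_log_likelihood(features, mean, var):
    """Per-sample negative log-likelihood under the diagonal Gaussian."""
    x = np.asarray(features, dtype=float)
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def spot_anomalies(train_normal, test_features, k=3.0):
    """Flag test samples whose score exceeds a threshold derived from training scores."""
    mean, var = fit_gaussian(train_normal)
    train_scores = negative_log_likelihood(train_normal, mean, var)
    threshold = train_scores.mean() + k * train_scores.std()   # assumed threshold rule
    return negative_log_likelihood(test_features, mean, var) > threshold

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 4))
candidates = np.vstack([np.zeros((1, 4)), np.full((1, 4), 10.0)])
print(spot_anomalies(normal, candidates))
```

The threshold adapts to the spread of scores seen on normal data, so a noisier feature extractor automatically yields a more permissive cut-off.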

Improving EEG based Continuous Speech Recognition


Title	Improving EEG based Continuous Speech Recognition
Authors	Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik
Abstract	In this paper we introduce various techniques to improve the performance of continuous speech recognition (CSR) systems based on electroencephalography (EEG) features. A connectionist temporal classification (CTC) based automatic speech recognition (ASR) system was implemented for performing recognition. We introduce techniques to initialize the weights of the recurrent layers in the encoder of the CTC model with more meaningful weights rather than with random weights, and we make use of an external language model to improve the beam search during decoding. Finally, we study the problem of predicting articulatory features from EEG features.
Tasks	EEG, Language Modelling, Speech Recognition
Published	2019-11-24
URL	https://arxiv.org/abs/1911.11610v6
PDF	https://arxiv.org/pdf/1911.11610v6.pdf
PWC	https://paperswithcode.com/paper/improving-eeg-based-continuous-speech
Repo
Framework
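
For readers unfamiliar with CTC decoding, the simplest variant is greedy (best-path) decoding: take the most likely label per frame, merge consecutive repeats, and drop the blank symbol. The abstract describes beam search with an external language model, which this sketch does not implement; the function name `ctc_greedy_decode` and the choice of index 0 for the blank are assumptions.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decoding for one utterance.

    log_probs: array of shape (time_steps, num_labels) with per-frame scores.
    Returns the label sequence after merging repeats and removing blanks.
    """
    best_path = np.argmax(log_probs, axis=1)
    decoded = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(int(label))
        previous = label
    return decoded

# One-hot scores for the frame-level path: blank, 1, 1, blank, 1, 2, 2, blank.
frames = np.eye(3)[[0, 1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of label 1 is what lets CTC output the same label twice in a row.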

Distributed Black-Box Optimization via Error Correcting Codes


Title	Distributed Black-Box Optimization via Error Correcting Codes
Authors	Burak Bartan, Mert Pilanci
Abstract	We introduce a novel distributed derivative-free optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider black-box adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.
Tasks
Published	2019-07-13
URL	https://arxiv.org/abs/1907.05984v1
PDF	https://arxiv.org/pdf/1907.05984v1.pdf
PWC	https://paperswithcode.com/paper/distributed-black-box-optimization-via-error
Repo
Framework
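
The abstract positions the method as an extension of evolution strategies. As background, a plain evolution-strategies step with antithetic sampling is sketched below, where evaluations from straggling workers are simply dropped. This is a naive baseline under stated assumptions, not the paper's coded search directions or decoding step; `es_step`, `straggler_rate`, and the other parameter names are hypothetical.

```python
import numpy as np

def es_step(f, x, rng, num_directions=32, sigma=0.1, lr=0.05, straggler_rate=0.25):
    """One derivative-free update using only the evaluations that came back."""
    directions = rng.standard_normal((num_directions, x.shape[0]))
    # Simulate stragglers: each worker fails to report with the given probability.
    returned = rng.random(num_directions) >= straggler_rate
    if not returned.any():
        return x                       # nothing came back, keep the current iterate
    dirs = directions[returned]
    # Antithetic finite differences along each returned direction.
    diffs = np.array([f(x + sigma * u) - f(x - sigma * u) for u in dirs])
    grad_estimate = (diffs[:, None] * dirs).mean(axis=0) / (2 * sigma)
    return x - lr * grad_estimate

def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
x = np.ones(5)
for _ in range(200):
    x = es_step(sphere, x, rng)
print(sphere(x))
```

Dropping stragglers keeps each step cheap but wastes the work they had started, which is the inefficiency a coded scheme aims to avoid.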

DeepSignals: Predicting Intent of Drivers Through Visual Signals


Title	DeepSignals: Predicting Intent of Drivers Through Visual Signals
Authors	Davi Frossard, Eric Kee, Raquel Urtasun
Abstract	Detecting the intention of drivers is an essential task in self-driving, necessary to anticipate sudden events like lane changes and stops. Turn signals and emergency flashers communicate such intentions, providing seconds of potentially critical reaction time. In this paper, we propose to detect these signals in video sequences by using a deep neural network that reasons about both spatial and temporal information. Our experiments on more than a million frames show high per-frame accuracy in very challenging scenarios.
Tasks
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01333v1
PDF	https://arxiv.org/pdf/1905.01333v1.pdf
PWC	https://paperswithcode.com/paper/deepsignals-predicting-intent-of-drivers
Repo
Framework