October 16, 2019

3695 words 18 mins read

Paper Group ANR 1108


Unsupervised learning with sparse space-and-time autoencoders

Title Unsupervised learning with sparse space-and-time autoencoders
Authors Benjamin Graham
Abstract We use spatially-sparse two, three and four dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and 3+1=4 dimensional space-time. We evaluate the resulting latent spaces by testing their usefulness for downstream tasks. Applications are to handwriting recognition in 2D, segmentation for parts in 3D objects, segmentation for objects in 3D scenes, and body-part segmentation for 4D wire-frame models generated from motion capture data.
Tasks Motion Capture
Published 2018-11-26
URL http://arxiv.org/abs/1811.10355v1
PDF http://arxiv.org/pdf/1811.10355v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-with-sparse-space-and
Repo
Framework

Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time

Title Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time
Authors Yinghao Huang, Manuel Kaufmann, Emre Aksan, Michael J. Black, Otmar Hilliges, Gerard Pons-Moll
Abstract We demonstrate a novel deep neural network capable of reconstructing human full body pose in real-time from 6 Inertial Measurement Units (IMUs) worn on the user’s body. In doing so, we address several difficult challenges. First, the problem is severely under-constrained as multiple pose parameters produce the same IMU orientations. Second, capturing IMU data in conjunction with ground-truth poses is expensive and difficult to do in many target application scenarios (e.g., outdoors). Third, modeling temporal dependencies through non-linear optimization has proven effective in prior work but makes real-time prediction infeasible. To address this important limitation, we learn the temporal pose priors using deep learning. To learn from sufficient data, we synthesize IMU data from motion capture datasets. A bi-directional RNN architecture leverages past and future information that is available at training time. At test time, we deploy the network in a sliding window fashion, retaining real-time capabilities. To evaluate our method, we recorded DIP-IMU, a dataset consisting of 10 subjects wearing 17 IMUs for validation in 64 sequences with 330,000 time instants; this constitutes the largest IMU dataset publicly available. We quantitatively evaluate our approach on multiple datasets and show results from a real-time implementation. DIP-IMU and the code are available for research purposes.
Tasks Motion Capture
Published 2018-10-10
URL http://arxiv.org/abs/1810.04703v1
PDF http://arxiv.org/pdf/1810.04703v1.pdf
PWC https://paperswithcode.com/paper/deep-inertial-poser-learning-to-reconstruct
Repo
Framework
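The sliding-window deployment described in the abstract can be sketched as follows; `birnn_predict` is a hypothetical stand-in for the trained bi-directional RNN, and the window sizes are illustrative, not the paper's settings:

```python
from collections import deque

# Hypothetical stand-in for the trained bi-RNN: maps a window of IMU
# frames to one pose estimate per frame of the window.
def birnn_predict(window):
    return [sum(frame) / len(frame) for frame in window]  # fake "pose"

def sliding_window_poses(frames, past=20, future=5):
    """Deploy a bi-RNN in sliding-window fashion: a frame's pose is
    emitted once `future` later frames have arrived, so the network can
    use both past and (bounded) future context in real time."""
    buf = deque(maxlen=past + 1 + future)
    poses = []
    for frame in frames:
        buf.append(frame)
        if len(buf) == buf.maxlen:
            # keep only the estimate for the frame `future` steps back
            poses.append(birnn_predict(list(buf))[past])
    return poses

stream = [[0.1 * t, 0.2 * t] for t in range(30)]
print(len(sliding_window_poses(stream)))  # 30 frames, 26-frame window -> 5 poses
```

The `future` parameter trades latency for context: a larger value gives the network more look-ahead at the cost of a longer delay before each pose is emitted.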

ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers

Title ParsRec: A Novel Meta-Learning Approach to Recommending Bibliographic Reference Parsers
Authors Dominika Tkaczyk, Rohit Gupta, Riccardo Cinti, Joeran Beel
Abstract Bibliographic reference parsers extract machine-readable metadata such as author names, title, journal, and year from bibliographic reference strings. To extract the metadata, the parsers apply heuristics or machine learning. However, no reference parser, and no algorithm, consistently gives the best results in every scenario. For instance, one tool may be best in extracting titles in ACM citation style, but only third best when APA is used. Another tool may be best in extracting English author names, while another one is best for noisy data (i.e. inconsistent citation styles). In this paper, which is an extended version of our recent RecSys poster, we address the problem of reference parsing from a recommender-systems and meta-learning perspective. We propose ParsRec, a meta-learning-based recommender system that recommends the potentially most effective parser for a given reference string. ParsRec recommends one out of 10 open-source parsers: Anystyle-Parser, Biblio, CERMINE, Citation, Citation-Parser, GROBID, ParsCit, PDFSSA4MET, Reference Tagger, and Science Parse. We evaluate ParsRec on 105k references from chemistry. We propose two approaches to meta-learning recommendations. The first approach learns the best parser for an entire reference string. The second approach learns the best parser for each metadata type in a reference string. The second approach achieved a 2.6% increase in F1 (0.909 vs. 0.886) over the best single parser (GROBID), reducing the false positive rate by 20.2% (0.075 vs. 0.094), and the false negative rate by 18.9% (0.107 vs. 0.132).
Tasks Meta-Learning, Recommendation Systems
Published 2018-11-26
URL http://arxiv.org/abs/1811.10369v1
PDF http://arxiv.org/pdf/1811.10369v1.pdf
PWC https://paperswithcode.com/paper/parsrec-a-novel-meta-learning-approach-to
Repo
Framework
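The per-metadata-type strategy reduces to a lookup over offline evaluation results. A minimal sketch (the parser names are real, but the F1 scores below are invented for illustration and are not the paper's measurements):

```python
# Invented per-field F1 table standing in for ParsRec's offline evaluation.
FIELD_F1 = {
    "author": {"GROBID": 0.92, "ParsCit": 0.88, "CERMINE": 0.90},
    "title":  {"GROBID": 0.85, "ParsCit": 0.91, "CERMINE": 0.83},
    "year":   {"GROBID": 0.97, "ParsCit": 0.95, "CERMINE": 0.96},
}

def recommend_per_field(fields):
    """Second approach from the abstract: pick the best parser per
    metadata type rather than one parser for the whole string."""
    return {f: max(FIELD_F1[f], key=FIELD_F1[f].get) for f in fields}

print(recommend_per_field(["author", "title", "year"]))
# {'author': 'GROBID', 'title': 'ParsCit', 'year': 'GROBID'}
```

The point of the second approach is visible even in the toy table: no single parser wins every field, so routing each field to its best parser beats the best overall parser.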

SFV: Reinforcement Learning of Physical Skills from Videos

Title SFV: Reinforcement Learning of Physical Skills from Videos
Authors Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine
Abstract Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.
Tasks Motion Capture, Pose Estimation
Published 2018-10-08
URL http://arxiv.org/abs/1810.03599v2
PDF http://arxiv.org/pdf/1810.03599v2.pdf
PWC https://paperswithcode.com/paper/sfv-reinforcement-learning-of-physical-skills
Repo
Framework

Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery

Title Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery
Authors Xiaoxiao Li, Nicha C. Dvornek, Yuan Zhou, Juntang Zhuang, Pamela Ventola, James S. Duncan
Abstract Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understanding just what image features the network is using, which can in turn be used to define the biomarkers. Current methods extract biomarkers, i.e., important features, by looking at how the prediction changes if “ignoring” one feature at a time. In this work, we go beyond looking at only individual features by using Shapley value explanation (SVE) from cooperative game theory. Cooperative game theory is advantageous here because it directly considers the interaction between features and can be applied to any machine learning method, making it a novel, more accurate way of determining instance-wise biomarker importance from deep learning models. A barrier to using SVE is its computational complexity: 2^N given N features. We explicitly reduce the complexity of SVE computation by two approaches based on the underlying graph structure of the input data: 1) considering only the centralized coalition of each feature; 2) using a hierarchical pipeline which first clusters features into small communities, then applies SVE in each community. Monte Carlo approximation can be used for large permutation sets. We first validate our methods on the MNIST dataset and compare to human perception. Next, to ensure plausibility of our biomarker results, we train a Random Forest (RF) to classify ASD/control subjects from fMRI and compare SVE results to standard RF-based feature importance. Finally, we show initial results on ranked fMRI biomarkers using SVE on a deep learning classifier for the ASD/control dataset.
Tasks Feature Importance
Published 2018-12-14
URL http://arxiv.org/abs/1812.06181v2
PDF http://arxiv.org/pdf/1812.06181v2.pdf
PWC https://paperswithcode.com/paper/efficient-interpretation-of-deep-learning
Repo
Framework
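The Monte Carlo approximation mentioned in the abstract, estimating Shapley values by averaging marginal contributions over random feature permutations instead of enumerating all 2^N coalitions, can be sketched generically (the toy value function below stands in for the classifier and is not the paper's model):

```python
import random

def shapley_mc(value_fn, features, samples=2000, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over random permutations of the feature set."""
    rng = random.Random(seed)
    phi = {f: 0.0 for f in features}
    for _ in range(samples):
        perm = features[:]
        rng.shuffle(perm)
        coalition = set()
        prev = value_fn(coalition)
        for f in perm:
            coalition.add(f)
            cur = value_fn(coalition)
            phi[f] += cur - prev  # marginal contribution of f here
            prev = cur
    return {f: v / samples for f, v in phi.items()}

# Toy additive "model": feature A contributes 1.0, B contributes 0.5.
value = lambda S: 1.0 * ("A" in S) + 0.5 * ("B" in S)
print(shapley_mc(value, ["A", "B", "C"]))
```

For an additive value function like the toy one, every permutation yields the same marginal contributions, so the estimate is exact; for real classifiers with feature interactions, the number of sampled permutations controls the accuracy/cost trade-off.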

Challenges in Discriminating Profanity from Hate Speech

Title Challenges in Discriminating Profanity from Hate Speech
Authors Shervin Malmasi, Marcos Zampieri
Abstract In this study we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered. Using a new dataset annotated specifically for this task, we employ supervised classification along with a set of features that includes n-grams, skip-grams and clustering-based word representations. We apply approaches based on single classifiers as well as more advanced ensemble classifiers and stacked generalization, achieving the best result of 80% accuracy for this 3-class classification task. Analysis of the results reveals that discriminating hate speech and profanity is not a simple task, which may require features that capture a deeper understanding of the text not always possible with surface n-grams. The variability of gold labels in the annotated data, due to differences in the subjective adjudications of the annotators, is also an issue. Other directions for future work are discussed.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05495v1
PDF http://arxiv.org/pdf/1803.05495v1.pdf
PWC https://paperswithcode.com/paper/challenges-in-discriminating-profanity-from
Repo
Framework
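The n-gram and skip-gram features mentioned in the abstract can be sketched in a few lines; this is generic feature extraction, not the authors' exact configuration:

```python
def ngrams(tokens, n=2):
    """Contiguous word n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, k=1):
    """Token pairs allowing up to k skipped words between the two,
    capturing slightly longer-range patterns than plain bigrams."""
    pairs = []
    for i in range(len(tokens)):
        for gap in range(1, k + 2):
            if i + gap < len(tokens):
                pairs.append((tokens[i], tokens[i + gap]))
    return pairs

toks = "you are so wrong".split()
print(ngrams(toks))       # [('you', 'are'), ('are', 'so'), ('so', 'wrong')]
print(skipgrams(toks, k=1))
```

These sparse features would then be fed to the single or stacked classifiers the paper evaluates; the skip-grams are what let a surface model link words separated by an intervening token.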

Indoor GeoNet: Weakly Supervised Hybrid Learning for Depth and Pose Estimation

Title Indoor GeoNet: Weakly Supervised Hybrid Learning for Depth and Pose Estimation
Authors Amirreza Farnoosh, Sarah Ostadabbas
Abstract Humans naturally perceive a 3D scene in front of them through accumulation of information obtained from multiple interconnected projections of the scene and by interpreting their correspondence. This phenomenon has inspired artificial intelligence models to extract the depth and view angle of the observed scene by modeling the correspondence between different views of that scene. Our paper builds upon previous work in the field of unsupervised depth and relative camera pose estimation from temporally consecutive video frames using deep learning (DL) models. Our approach uses a hybrid learning framework introduced in a recent work called GeoNet, which leverages geometric constraints in the 3D scenes to synthesize a novel view from intermediate DL-based predicted depth and relative pose. However, the state-of-the-art unsupervised depth and pose estimation DL models are exclusively trained/tested on a few available outdoor scene datasets, and we show they transfer poorly to new scenes, especially from indoor environments, in which estimation requires higher precision and dealing with probable occlusions. This paper introduces “Indoor GeoNet”, a weakly supervised depth and camera pose estimation model targeted for indoor scenes. In Indoor GeoNet, we take advantage of the availability of indoor RGBD datasets collected by human or robot navigators, and add partial (i.e., weak) depth supervision to training. Experimental results showed that our model effectively generalizes to new scenes from different buildings. Indoor GeoNet demonstrated significant depth and pose estimation error reduction when compared to the original GeoNet, while achieving 3 times higher reconstruction accuracy in synthesizing novel views in indoor environments.
Tasks Pose Estimation
Published 2018-11-19
URL http://arxiv.org/abs/1811.07461v1
PDF http://arxiv.org/pdf/1811.07461v1.pdf
PWC https://paperswithcode.com/paper/indoor-geonet-weakly-supervised-hybrid
Repo
Framework

Completely Distributed Power Allocation using Deep Neural Network for Device to Device communication Underlaying LTE

Title Completely Distributed Power Allocation using Deep Neural Network for Device to Device communication Underlaying LTE
Authors Jeehyeong Kim, Joohan Park, Jaewon Noh, Sunghyun Cho
Abstract Device to device (D2D) communication underlaying LTE can be used to distribute traffic loads of eNBs. However, a conventional D2D link is controlled by an eNB, which still places a burden on the eNB. We propose a completely distributed power allocation method for D2D communication underlaying LTE using deep learning. In the proposed scheme, a D2D transmitter can decide the transmit power without any help from other nodes, such as an eNB or another D2D device. Also, the power set, which is delivered from each D2D node independently, can optimize the overall cell throughput. We suggest a distributed deep learning architecture in which the devices are trained as a group but operate independently. The deep learning can optimize total cell throughput while respecting constraints such as interference to the eNB. The proposed scheme, implemented using TensorFlow, provides the same throughput as the conventional method even though it operates in a completely distributed manner.
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.02736v2
PDF http://arxiv.org/pdf/1802.02736v2.pdf
PWC https://paperswithcode.com/paper/completely-distributed-power-allocation-using
Repo
Framework

Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations

Title Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations
Authors Di Wang, Adam Smith, Jinhui Xu
Abstract Minimizing a convex risk function is the main step in many basic learning algorithms. We study protocols for convex optimization which provably leak very little about the individual data points that constitute the loss function. Specifically, we consider differentially private algorithms that operate in the local model, where each data record is stored on a separate user device and randomization is performed locally by those devices. We give new protocols for noninteractive LDP convex optimization, i.e., protocols that require only a single randomized report from each user to an untrusted aggregator. We study our algorithms’ performance with respect to expected loss, either over the data set at hand (empirical risk) or a larger population from which our data set is assumed to be drawn. Our error bounds depend on the form of individuals’ contribution to the expected loss. For the case of generalized linear losses (such as hinge and logistic losses), we give an LDP algorithm whose sample complexity is only linear in the dimensionality p and quasipolynomial in other terms (the privacy parameters ε and δ, and the desired excess risk α). This is the first algorithm for nonsmooth losses with sub-exponential dependence on p. For the Euclidean median problem, where the loss is given by the Euclidean distance to a given data point, we give a protocol whose sample complexity grows quasipolynomially in p. This is the first protocol with sub-exponential dependence on p for a loss that is not a generalized linear loss. Our result for the hinge loss is based on a technique, dubbed polynomial of inner product approximation, which may be applicable to other problems. Our results for generalized linear losses and the Euclidean median are based on new reductions to the case of hinge loss.
Tasks
Published 2018-12-17
URL http://arxiv.org/abs/1812.06825v2
PDF http://arxiv.org/pdf/1812.06825v2.pdf
PWC https://paperswithcode.com/paper/noninteractive-locally-private-learning-of
Repo
Framework
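The noninteractive setting means each user sends a single randomized message. A minimal sketch of such a one-shot local report using the standard Laplace mechanism (generic LDP machinery, not the paper's protocol for convex optimization):

```python
import math
import random

def ldp_report(x, eps, lo=0.0, hi=1.0, rng=random):
    """One-shot local randomizer: clip x to [lo, hi], then add Laplace
    noise with scale (hi - lo) / eps, making the single released
    message eps-differentially private."""
    x = min(max(x, lo), hi)
    u = rng.random() - 0.5  # inverse-CDF Laplace sample
    noise = -((hi - lo) / eps) * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
    return x + noise

# The untrusted aggregator just averages the noisy reports,
# which is an unbiased estimate of the true mean (here 0.3).
rng = random.Random(42)
reports = [ldp_report(0.3, eps=1.0, rng=rng) for _ in range(5000)]
print(round(sum(reports) / len(reports), 2))
```

Each user randomizes once and never interacts again; the protocols in the paper build far more structure on top of this single-report constraint, but the privacy guarantee attaches to the one released message in the same way.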

Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach

Title Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach
Authors Hao-Hsuan Chang, Hao Song, Yang Yi, Jianzhong Zhang, Haibo He, Lingjia Liu
Abstract Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device will face two critical problems: avoiding causing harmful interference to primary users (PUs), and conducting effective interference coordination with other secondary users. These two problems become even more challenging for a distributed DSA network where there are no centralized controllers for SUs. In this paper, we investigate communication strategies of a distributive DSA network under the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn “appropriate” spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs could make spectrum access decisions distributedly, relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method, which assumes knowledge of the system statistics, and converges faster than the Q-learning method when the number of channels is large.
Tasks Q-Learning
Published 2018-10-28
URL http://arxiv.org/abs/1810.11758v1
PDF http://arxiv.org/pdf/1810.11758v1.pdf
PWC https://paperswithcode.com/paper/distributive-dynamic-spectrum-access-through
Repo
Framework
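As a point of reference for the Q-learning baseline the paper compares against, here is a single-state tabular sketch of distributed channel selection; the channel occupancy statistics are invented for illustration:

```python
import random

def q_learning_channel_access(n_channels=4, episodes=5000, seed=1):
    """Single-state tabular Q-learning for channel selection: actions
    are channels, reward is 1 for a collision-free transmission. The
    occupancy probabilities are invented; channel 2 is the quiet one."""
    rng = random.Random(seed)
    busy_prob = [0.9, 0.8, 0.1, 0.7]
    q = [0.0] * n_channels
    alpha, eps = 0.1, 0.1
    for _ in range(episodes):
        if rng.random() < eps:  # epsilon-greedy exploration
            a = rng.randrange(n_channels)
        else:
            a = max(range(n_channels), key=q.__getitem__)
        reward = 0.0 if rng.random() < busy_prob[a] else 1.0
        q[a] += alpha * (reward - q[a])  # one state, so no next-state term
    return q

q = q_learning_channel_access()
print(max(range(4), key=q.__getitem__))
```

Each SU holds only its own Q-table and its own sensing outcomes, which is the distributed flavor the abstract describes; the paper's RC-based DRL agent replaces the table with a recurrent network so that the temporal correlation of sensing results can be exploited.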

Approximation Algorithms for Cascading Prediction Models

Title Approximation Algorithms for Cascading Prediction Models
Authors Matthew Streeter
Abstract We present an approximation algorithm that takes a pool of pre-trained models as input and produces from it a cascaded model with similar accuracy but lower average-case cost. Applied to state-of-the-art ImageNet classification models, this yields up to a 2x reduction in floating point multiplications, and up to a 6x reduction in average-case memory I/O. The auto-generated cascades exhibit intuitive properties, such as using lower-resolution input for easier images and requiring higher prediction confidence when using a computationally cheaper model.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07697v1
PDF http://arxiv.org/pdf/1802.07697v1.pdf
PWC https://paperswithcode.com/paper/approximation-algorithms-for-cascading
Repo
Framework
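The confidence-threshold logic of such a cascade can be sketched as follows; the two-stage model pool below is hypothetical:

```python
def cascade_predict(models, x, thresholds):
    """Run cheap models first; fall through to costlier ones only when
    confidence is below that stage's threshold. The last model in the
    pool always answers."""
    for model, thr in zip(models[:-1], thresholds):
        label, conf = model(x)
        if conf >= thr:
            return label
    return models[-1](x)[0]

# Hypothetical pool: a cheap model that is confident on "easy" inputs,
# and an expensive fallback for everything else.
cheap = lambda x: ("small", 0.95) if x < 10 else ("big", 0.55)
costly = lambda x: ("big", 0.99)

print(cascade_predict([cheap, costly], 3, thresholds=[0.9]))   # "small"
print(cascade_predict([cheap, costly], 42, thresholds=[0.9]))  # "big"
```

The average-case savings come from the easy inputs that never reach the expensive model; choosing the thresholds (and the ordering of the pool) is what the paper's approximation algorithm optimizes.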

Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data

Title Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data
Authors Filipe Rodrigues, Francisco C. Pereira
Abstract Accurately modeling traffic speeds is a fundamental part of efficient intelligent transportation systems. Nowadays, with the widespread deployment of GPS-enabled devices, it has become possible to crowdsource the collection of speed information to road users (e.g. through mobile applications or dedicated in-vehicle devices). Despite its rather wide spatial coverage, crowdsourced speed data also brings very important challenges, such as the highly variable measurement noise in the data due to a variety of driving behaviors and sample sizes. When not properly accounted for, this noise can severely compromise any application that relies on accurate traffic data. In this article, we propose the use of heteroscedastic Gaussian processes (HGP) to model the time-varying uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop a HGP conditioned on sample size and traffic regime (SRC-HGP), which makes use of sample size information (probe vehicles per minute) as well as previous observed speeds, in order to more accurately model the uncertainty in observed speeds. Using 6 months of crowdsourced traffic data from Copenhagen, we empirically show that the proposed heteroscedastic models produce significantly better predictive distributions when compared to current state-of-the-art methods for both speed imputation and short-term forecasting tasks.
Tasks Gaussian Processes, Imputation
Published 2018-12-20
URL http://arxiv.org/abs/1812.08733v1
PDF http://arxiv.org/pdf/1812.08733v1.pdf
PWC https://paperswithcode.com/paper/heteroscedastic-gaussian-processes-for
Repo
Framework
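The core mechanism of a heteroscedastic GP, a per-observation noise variance on the kernel diagonal (so sparsely probed minutes get wider predictive intervals), can be sketched with plain numpy; this is generic GP regression, not the paper's SRC-HGP:

```python
import numpy as np

def gp_predict(X, y, noise_var, Xs, ls=1.0, sf=10.0):
    """GP regression with an RBF kernel and a per-observation noise
    variance on the diagonal (the heteroscedastic ingredient)."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return sf**2 * np.exp(-0.5 * (d / ls) ** 2)
    K = k(X, X) + np.diag(noise_var)  # each speed reading has its own noise
    Ks = k(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = sf**2 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Speeds at three time points; the middle reading came from few probe
# vehicles, so it is assigned a much larger noise variance.
X = np.array([0.0, 1.0, 2.0])
y = np.array([50.0, 45.0, 40.0])
noise = np.array([0.1, 5.0, 0.1])
mean, var = gp_predict(X, y, noise, np.array([0.0, 1.0]))
print(var[1] > var[0])  # True: the sparsely probed point is more uncertain
```

A standard (homoscedastic) GP would use a single scalar on that diagonal; the paper goes a step further by making the noise itself a function of sample size and traffic regime rather than a fixed per-point value.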

Rain Streak Removal for Single Image via Kernel Guided CNN

Title Rain Streak Removal for Single Image via Kernel Guided CNN
Authors Ye-Tao Wang, Xi-Le Zhao, Tai-Xiang Jiang, Liang-Jian Deng, Yi Chang, Ting-Zhu Huang
Abstract Rain streak removal is an important issue and has recently been investigated extensively. Existing methods, especially the newly emerged deep learning methods, can remove rain streaks well in many cases. However, the essential factor in the generative procedure of rain streaks, i.e., the motion blur that leads to their line-pattern appearance, has been neglected by deep-learning-based deraining approaches, resulting in over-deraining or under-deraining. In this paper, we propose a novel rain streak removal approach using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with simple network architectures. We first model the rain streak interference with its motion blur mechanism. Our framework starts by learning the motion blur kernel, determined by two factors (angle and length), with a plain neural network, denoted the parameter net, from a patch of the texture component. Then, after a dimensionality stretching operation, the learned motion blur kernel is stretched into a degradation map with the same spatial size as the rainy patch. The stretched degradation map together with the texture patch is subsequently fed into a derain convolutional network, which is a typical ResNet architecture trained to output the rain streaks under the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed method, which preserves texture and contrast while removing the rain streaks.
Tasks
Published 2018-08-26
URL http://arxiv.org/abs/1808.08545v2
PDF http://arxiv.org/pdf/1808.08545v2.pdf
PWC https://paperswithcode.com/paper/rain-streak-removal-for-single-image-via
Repo
Framework
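The "dimensionality stretching" step, broadcasting the two scalar kernel parameters into constant feature maps matching the patch's spatial size, is simple to illustrate (a sketch of the idea, not the paper's implementation):

```python
def stretch_kernel_params(angle, length, h, w):
    """Broadcast the scalar motion-blur parameters (angle, length) into
    two constant h-by-w maps, ready to be stacked with a texture patch
    as extra input channels for the derain network."""
    return [[[angle] * w for _ in range(h)],
            [[length] * w for _ in range(h)]]

maps = stretch_kernel_params(0.5, 7.0, h=4, w=4)
print(len(maps), len(maps[0]), len(maps[0][0]))  # 2 4 4
```

Concatenating these constant maps channel-wise is the standard trick for injecting global scalar conditioning into a fully convolutional network, since every spatial location then sees the same kernel parameters.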

Iris recognition in cases of eye pathology

Title Iris recognition in cases of eye pathology
Authors Mateusz Trokielewicz, Adam Czajka, Piotr Maciejewicz
Abstract This chapter provides insight on how iris recognition, one of the leading biometric identification technologies in the world, can be impacted by pathologies and illnesses present in the eye, what the possible repercussions of this influence are, and how such effects can be taken into account when matching iris samples. To make this study possible, a special database of iris images has been used, representing more than 20 different medical conditions of the ocular region (including cataract, glaucoma, rubeosis iridis, synechiae, iris defects, corneal pathologies and others) and containing almost 3000 samples collected from 230 distinct irises. Then, with the use of four different iris recognition methods, a series of experiments has been conducted, yielding several important observations. One of the most popular ocular disorders worldwide, the cataract, is shown to worsen genuine comparison scores when results obtained from cataract-affected eyes are compared to those coming from healthy irises. An analysis devoted to different types of impact on eye structures caused by diseases is also carried out, with significant results. The enrollment process is highly sensitive to those eye conditions that make the iris obstructed or introduce geometrical distortions. Disorders affecting iris geometry or producing obstructions are exceptionally capable of degrading the genuine comparison scores, to the extent that the performance of the entire biometric system can be affected. Experiments also reveal that imperfect execution of the image segmentation stage is the most prominent contributor to recognition errors.
Tasks Iris Recognition, Semantic Segmentation
Published 2018-09-04
URL http://arxiv.org/abs/1809.01040v1
PDF http://arxiv.org/pdf/1809.01040v1.pdf
PWC https://paperswithcode.com/paper/iris-recognition-in-cases-of-eye-pathology
Repo
Framework

Machine Learning for Molecular Dynamics on Long Timescales

Title Machine Learning for Molecular Dynamics on Long Timescales
Authors Frank Noé
Abstract Molecular Dynamics (MD) simulation is widely used to analyze the properties of molecules and materials. Most practical applications, such as comparison with experimental measurements, designing drug molecules, or optimizing materials, rely on statistical quantities, which may be prohibitively expensive to compute from direct long-time MD simulations. Classical Machine Learning (ML) techniques have already had a profound impact on the field, especially for learning low-dimensional models of the long-time dynamics and for devising more efficient sampling schemes for computing long-time statistics. Novel ML methods have the potential to revolutionize long-timescale MD and to obtain interpretable models. ML concepts such as statistical estimator theory, end-to-end learning, representation learning and active learning are highly interesting for the MD researcher and will help to develop new solutions to hard MD problems. With the aim of better connecting the MD and ML research areas and spawning new research on this interface, we define the learning problems in long-timescale MD, present successful approaches and outline some of the unsolved ML problems in this application field.
Tasks Active Learning, Representation Learning
Published 2018-12-18
URL http://arxiv.org/abs/1812.07669v1
PDF http://arxiv.org/pdf/1812.07669v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-molecular-dynamics-on
Repo
Framework