October 18, 2019

3307 words 16 mins read

Paper Group ANR 562

Low-Complexity Data-Parallel Earth Mover’s Distance Approximations. Protection Against Reconstruction and Its Applications in Private Federated Learning. Artifacts Detection and Error Block Analysis from Broadcasted Videos. Gradient Agreement as an Optimization Objective for Meta-Learning. Relational Long Short-Term Memory for Video Action Recognit …

Low-Complexity Data-Parallel Earth Mover’s Distance Approximations

Title Low-Complexity Data-Parallel Earth Mover’s Distance Approximations
Authors Kubilay Atasu, Thomas Mittelholzer
Abstract The Earth Mover’s Distance (EMD) is a state-of-the-art metric for comparing discrete probability distributions, but its high distinguishability comes at a high cost in computational complexity. Even though linear-complexity approximation algorithms have been proposed to improve its scalability, these algorithms are either limited to vector spaces with only a few dimensions or they become ineffective when the degree of overlap between the probability distributions is high. We propose novel approximation algorithms that overcome both of these limitations, yet still achieve linear time complexity. All our algorithms are data parallel, and thus, we take advantage of massively parallel computing engines, such as Graphics Processing Units (GPUs). On the popular text-based 20 Newsgroups dataset, the new algorithms are four orders of magnitude faster than a multi-threaded CPU implementation of Word Mover’s Distance and match its nearest-neighbors-search accuracy. On MNIST images, the new algorithms are four orders of magnitude faster than a GPU implementation of Sinkhorn’s algorithm while offering a slightly higher nearest-neighbors-search accuracy.
Tasks
Published 2018-12-05
URL https://arxiv.org/abs/1812.02091v2
PDF https://arxiv.org/pdf/1812.02091v2.pdf
PWC https://paperswithcode.com/paper/low-complexity-data-parallel-earth-movers
Repo
Framework
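
As a hedged illustration of the relaxation idea behind such approximations (a sketch only, not the paper's data-parallel algorithms, and quadratic rather than linear in the support sizes), one can lower-bound the EMD by dropping one of the two marginal constraints and letting each unit of mass flow to its nearest target:

```python
import numpy as np

def relaxed_emd_lower_bound(w_a, pts_a, w_b, pts_b):
    """Lower-bound EMD between two discrete distributions (weights, support points)."""
    # Pairwise Euclidean ground distances between the two supports.
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    # Relax the incoming constraints: every source point ships all its mass to its closest target.
    cost_ab = np.sum(w_a * dists.min(axis=1))
    # Relax in the symmetric direction.
    cost_ba = np.sum(w_b * dists.min(axis=0))
    # The tighter of the two relaxed problems is still a valid lower bound on the EMD.
    return max(cost_ab, cost_ba)
```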

Protection Against Reconstruction and Its Applications in Private Federated Learning

Title Protection Against Reconstruction and Its Applications in Private Federated Learning
Authors Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor, Ryan Rogers
Abstract In large-scale statistical learning, data collection and model fitting are moving increasingly toward peripheral devices—phones, watches, fitness trackers—away from centralized data collection. Concomitant with this rise in decentralized data are increasing challenges of maintaining privacy while allowing enough information to fit accurate, useful statistical models. This motivates local notions of privacy—most significantly, local differential privacy, which provides strong protections against sensitive data disclosures—where data is obfuscated before a statistician or learner can even observe it, providing strong protections to individuals’ data. Yet local privacy as traditionally employed may prove too stringent for practical use, especially in modern high-dimensional statistical and machine learning problems. Consequently, we revisit the types of disclosures and adversaries against which we provide protections, considering adversaries with limited prior information and ensuring that, with high probability, they cannot reconstruct an individual’s data within useful tolerances. By reconceptualizing these protections, we allow more useful data release—large privacy parameters in local differential privacy—and we design new (minimax) optimal locally differentially private mechanisms for statistical learning problems for \emph{all} privacy levels. We thus present practicable approaches to large-scale locally private model training that were previously impossible, showing theoretically and empirically that we can fit large-scale image classification and language models with little degradation in utility.
Tasks Image Classification
Published 2018-12-03
URL https://arxiv.org/abs/1812.00984v2
PDF https://arxiv.org/pdf/1812.00984v2.pdf
PWC https://paperswithcode.com/paper/protection-against-reconstruction-and-its
Repo
Framework
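
For orientation only, the following sketch shows a generic local privatization step of the kind used in private federated learning: each device clips and noises its update before it ever leaves the device. This is a plain clip-and-add-Gaussian-noise illustration, not the minimax-optimal mechanisms designed in the paper; the function name and parameters are placeholders.

```python
import numpy as np

def privatize_update(grad, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a local model update and add Gaussian noise before sending it to the server."""
    rng = np.random.default_rng() if rng is None else rng
    # Bound the update so the noise scale can be calibrated to a fixed sensitivity.
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    # The server only ever observes the obfuscated vector.
    return clipped + rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
```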

Artifacts Detection and Error Block Analysis from Broadcasted Videos

Title Artifacts Detection and Error Block Analysis from Broadcasted Videos
Authors Md Mehedi Hasan, Tasneem Rahman, Kiok Ahn, Oksam Chae
Abstract With the advancement of IPTV and HDTV technology, previously subtle errors in videos are now becoming more prominent because of structure-oriented and compression-based artifacts. In this paper, we focus on the development of a real-time video quality check system. Lightweight edge gradient magnitude information is incorporated to acquire the statistical information, and distorted frames are then estimated based on the characteristics of their surrounding frames. We then apply prominent texture patterns to classify them into different block errors and analyze them not only for video error detection but also for error concealment, restoration and retrieval. Finally, performance evaluations through experiments on prominent datasets and broadcast videos show that the proposed algorithm is highly efficient at detecting errors for video broadcast and surveillance applications, in terms of both computation time and the analysis of distorted frames.
Tasks
Published 2018-08-30
URL http://arxiv.org/abs/1808.10086v1
PDF http://arxiv.org/pdf/1808.10086v1.pdf
PWC https://paperswithcode.com/paper/artifacts-detection-and-error-block-analysis
Repo
Framework
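
A rough sketch of the kind of lightweight edge-gradient statistic and neighbor-based frame comparison the abstract alludes to; the statistic, window size, and threshold below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy import ndimage

def edge_gradient_score(frame_gray):
    """Mean Sobel gradient magnitude of a grayscale frame (2-D float array)."""
    gx = ndimage.sobel(frame_gray, axis=1)
    gy = ndimage.sobel(frame_gray, axis=0)
    return np.mean(np.hypot(gx, gy))

def flag_distorted_frames(scores, window=5, z_thresh=3.0):
    """Flag frames whose edge statistic deviates strongly from their surrounding frames."""
    scores = np.asarray(scores, dtype=float)
    flags = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        neighbors = np.delete(scores[lo:hi], i - lo)
        mu, sigma = neighbors.mean(), neighbors.std() + 1e-9
        flags.append(abs(s - mu) / sigma > z_thresh)
    return flags
```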

Gradient Agreement as an Optimization Objective for Meta-Learning

Title Gradient Agreement as an Optimization Objective for Meta-Learning
Authors Amir Erfan Eshratifar, David Eigen, Massoud Pedram
Abstract This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model for an agent that adapts rapidly when presented with previously unseen tasks. Tasks are sampled from a specific distribution which is assumed to be similar for both seen and unseen tasks. We focus on a family of meta-learning methods that learn the initial parameters of a base model which can then be fine-tuned quickly on a new task by a few gradient steps (MAML). Our approach is based on pushing the parameters of the model in a direction on which tasks agree more. If the gradients of a task agree with the parameter update vector, then their inner product will be a large positive value. As a result, given a batch of tasks to be optimized for, we associate a positive (negative) weight with the loss function of a task if the inner product between its gradients and the average of the gradients of all tasks in the batch is a positive (negative) value. Therefore, the degree of the contribution of a task to the parameter updates is controlled by introducing a set of weights on the loss functions of the tasks. Our method can be easily integrated with current meta-learning algorithms for neural networks. Our experiments demonstrate that it yields models with better generalization compared to MAML and Reptile.
Tasks Meta-Learning
Published 2018-10-18
URL http://arxiv.org/abs/1810.08178v1
PDF http://arxiv.org/pdf/1810.08178v1.pdf
PWC https://paperswithcode.com/paper/gradient-agreement-as-an-optimization
Repo
Framework
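
A minimal NumPy sketch of the gradient-agreement weighting described above: each task's contribution is weighted by the inner product between its meta-gradient and the average meta-gradient of the batch. The normalization and learning rate below are simplifying assumptions rather than the paper's exact formulation.

```python
import numpy as np

def agreement_weights(task_grads):
    """task_grads: list of flattened per-task gradient vectors (all the same length)."""
    G = np.stack(task_grads)            # (num_tasks, num_params)
    mean_grad = G.mean(axis=0)          # tentative update direction
    inner = G @ mean_grad               # agreement of each task with that direction
    # Disagreeing tasks receive negative weights and are pushed against.
    return inner / (np.abs(inner).sum() + 1e-12)

def weighted_update(task_grads, lr=1e-3):
    """Meta-update that scales each task's gradient by its agreement weight."""
    w = agreement_weights(task_grads)
    return -lr * np.sum(w[:, None] * np.stack(task_grads), axis=0)
```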

Relational Long Short-Term Memory for Video Action Recognition

Title Relational Long Short-Term Memory for Video Action Recognition
Authors Zexi Chen, Bharathkumar Ramachandra, Tianfu Wu, Ranga Raju Vatsavai
Abstract Spatial and temporal relationships, both short-range and long-range, between objects in videos are key cues for recognizing actions. It is a challenging problem to model them jointly. In this paper, we first present a new variant of Long Short-Term Memory, namely Relational LSTM, to address the challenge of relation reasoning across space and time between objects. In our Relational LSTM module, we utilize a non-local operation similar in spirit to the recently proposed non-local network to substitute for the fully connected operation in the vanilla LSTM. By doing this, our Relational LSTM is capable of capturing long- and short-range spatio-temporal relations between objects in videos in a principled way. Then, we propose a two-branch neural architecture consisting of the Relational LSTM module as the non-local branch and a spatio-temporal pooling based local branch. The local branch is introduced for capturing local spatial appearance and/or short-term motion features. The two-branch modules are concatenated to learn video-level features from snippet-level ones end-to-end. Experimental results on the UCF-101 and HMDB-51 datasets show that our model achieves state-of-the-art results among LSTM-based methods, while obtaining comparable performance with other state-of-the-art methods (which use schemas that are not directly comparable). Our code will be released.
Tasks Temporal Action Localization
Published 2018-11-16
URL http://arxiv.org/abs/1811.07059v1
PDF http://arxiv.org/pdf/1811.07059v1.pdf
PWC https://paperswithcode.com/paper/relational-long-short-term-memory-for-video
Repo
Framework
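
For intuition, the non-local operation that replaces the fully connected transform can be sketched as a softmax-weighted aggregation over all spatio-temporal positions. The projection matrices and shapes here are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, W_theta, W_phi, W_g):
    """x: (N, C) features for N spatio-temporal positions; W_*: (C, D) projection matrices."""
    q, k, v = x @ W_theta, x @ W_phi, x @ W_g
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))
    return attn @ v   # each position aggregates information from all other positions
```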

A jamming transition from under- to over-parametrization affects loss landscape and generalization

Title A jamming transition from under- to over-parametrization affects loss landscape and generalization
Authors Stefano Spigler, Mario Geiger, Stéphane d’Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart
Abstract We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point — where it displays a cusp — and (iii) slow decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
Tasks
Published 2018-10-22
URL https://arxiv.org/abs/1810.09665v5
PDF https://arxiv.org/pdf/1810.09665v5.pdf
PWC https://paperswithcode.com/paper/a-jamming-transition-from-under-to-over
Repo
Framework

Making Sense of Random Forest Probabilities: a Kernel Perspective

Title Making Sense of Random Forest Probabilities: a Kernel Perspective
Authors Matthew A. Olson, Abraham J. Wyner
Abstract A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.
Tasks
Published 2018-12-14
URL http://arxiv.org/abs/1812.05792v1
PDF http://arxiv.org/pdf/1812.05792v1.pdf
PWC https://paperswithcode.com/paper/making-sense-of-random-forest-probabilities-a
Repo
Framework
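
The proximity kernel at the heart of this connection can be computed directly from a fitted scikit-learn forest: the proximity between two samples is the fraction of trees in which they land in the same leaf. The kernel-regression probability estimate below is a simplified, binary-label illustration of the paper's perspective, not its exact estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def proximity_kernel(forest, X_a, X_b):
    """forest: a fitted RandomForestClassifier; returns a (len(X_a), len(X_b)) proximity matrix."""
    leaves_a = forest.apply(X_a)          # (n_a, n_trees) leaf indices
    leaves_b = forest.apply(X_b)          # (n_b, n_trees)
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return same_leaf.mean(axis=-1)        # average agreement over trees

def kernel_probabilities(forest, X_train, y_train, X_test):
    """Kernel-regression view: weight training labels by their proximity to each test point."""
    K = proximity_kernel(forest, X_test, X_train)
    return (K @ (y_train == 1)) / (K.sum(axis=1) + 1e-12)
```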

Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts

Title Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Authors Rohan Bavishi, Michael Pradel, Koushik Sen
Abstract Most of the JavaScript code deployed in the wild has been minified, a process in which identifier names are replaced with short, arbitrary and meaningless names. Minified code occupies less space, but also makes the code extremely difficult to manually inspect and understand. This paper presents Context2Name, a deep learning-based technique that partially reverses the effect of minification by predicting natural identifier names for minified names. The core idea is to predict from the usage context of a variable a name that captures the meaning of the variable. The approach combines a lightweight, token-based static analysis with an auto-encoder neural network that summarizes usage contexts and a recurrent neural network that predicts natural names for a given usage context. We evaluate Context2Name with a large corpus of real-world JavaScript code and show that it successfully predicts 47.5% of all minified identifiers while taking only 2.9 milliseconds on average to predict a name. A comparison with the state-of-the-art tools JSNice and JSNaughty shows that our approach performs comparably in terms of accuracy while improving in terms of efficiency. Moreover, Context2Name complements the state-of-the-art by predicting 5.3% additional identifiers that are missed by both existing tools.
Tasks
Published 2018-08-31
URL https://arxiv.org/abs/1809.05193v1
PDF https://arxiv.org/pdf/1809.05193v1.pdf
PWC https://paperswithcode.com/paper/context2name-a-deep-learning-based-approach
Repo
Framework
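
As a toy illustration of the "usage context" idea only: collect the tokens surrounding every occurrence of an identifier. The regex tokenizer and window size below are assumptions for the sketch; the real system uses a proper JavaScript tokenizer, an auto-encoder to summarize contexts, and a recurrent network to predict names.

```python
import re
from collections import defaultdict

def usage_contexts(code, window=3):
    """Map each identifier-like token to the lists of tokens around its occurrences."""
    tokens = re.findall(r"[A-Za-z_$][\w$]*|\S", code)
    contexts = defaultdict(list)
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"[A-Za-z_$][\w$]*", tok):
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            contexts[tok].append(before + after)
    return contexts
```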

Instance-based entropy fuzzy support vector machine for imbalanced data

Title Instance-based entropy fuzzy support vector machine for imbalanced data
Authors Poongjin Cho, Minhyuk Lee, Woojin Chang
Abstract Imbalanced classification has been a major challenge for machine learning because many standard classifiers mainly focus on balanced datasets and tend to have biased results towards the majority class. We modify the entropy fuzzy support vector machine (EFSVM) and introduce the instance-based entropy fuzzy support vector machine (IEFSVM). Both EFSVM and IEFSVM use the entropy information of k-nearest neighbors to determine the fuzzy membership value for each sample, which prioritizes the importance of each sample. IEFSVM considers the diversity of entropy patterns for each sample when increasing the size of neighbors, k, while EFSVM uses the single entropy information of a fixed size of neighbors for all samples. By varying k, we can reflect the component change of a sample’s neighbors from near to far distance in the determination of fuzzy membership values. Numerical experiments on 35 public and 12 real-world imbalanced datasets are performed to validate IEFSVM, and the area under the receiver operating characteristic curve (AUC) is used to compare its performance with other SVMs and machine learning methods. IEFSVM shows a much higher AUC value for datasets with a high imbalance ratio, implying that IEFSVM is effective in dealing with the class imbalance problem.
Tasks
Published 2018-07-11
URL http://arxiv.org/abs/1807.03933v1
PDF http://arxiv.org/pdf/1807.03933v1.pdf
PWC https://paperswithcode.com/paper/instance-based-entropy-fuzzy-support-vector
Repo
Framework
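
A simplified, fixed-k sketch of the entropy-based fuzzy membership idea (closer to plain EFSVM than to the instance-based variant that varies k): majority-class samples sitting in class-mixed neighborhoods are down-weighted. The neighbor count, beta, and majority_label parameter are assumptions for the sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def entropy_memberships(X, y, majority_label, k=7, beta=0.1):
    """Fuzzy membership per sample; minority-class samples keep full weight."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                    # first neighbor is the sample itself
    memberships = np.ones(len(X))
    for i in range(len(X)):
        if y[i] != majority_label:
            continue
        p = np.mean(y[idx[i, 1:]] == majority_label)
        entropy = -sum(q * np.log2(q) for q in (p, 1 - p) if q > 0)
        memberships[i] = 1.0 - beta * entropy    # high local entropy -> lower membership
    return memberships

# The memberships can then be passed to an SVM as per-sample weights, e.g.
# sklearn.svm.SVC().fit(X, y, sample_weight=entropy_memberships(X, y, majority_label=0))
```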

Correlation Net: Spatiotemporal multimodal deep learning for action recognition

Title Correlation Net: Spatiotemporal multimodal deep learning for action recognition
Authors Novanto Yudistira, Takio Kurita
Abstract This paper describes a network that captures multimodal correlations over arbitrary timestamps. The proposed scheme operates as a complementary, extended network over a multimodal convolutional neural network (CNN). Spatial and temporal streams are required for action recognition by a deep CNN, but overfitting reduction and fusing these two streams remain open problems. The existing fusion approach averages the two streams. Here we propose a correlation network with a Shannon fusion for learning a pre-trained CNN. A long-range video may consist of spatiotemporal correlations over arbitrary times, which can be captured by forming the correlation network from simple fully connected layers. This approach was found to complement the existing network fusion methods. The importance of multimodal correlation is validated in comparison experiments on the UCF-101 and HMDB-51 datasets. The multimodal correlation enhanced the accuracy of the video recognition results.
Tasks Temporal Action Localization, Video Recognition
Published 2018-07-22
URL https://arxiv.org/abs/1807.08291v6
PDF https://arxiv.org/pdf/1807.08291v6.pdf
PWC https://paperswithcode.com/paper/correlation-net-spatio-temporal-multimodal
Repo
Framework

Deep Learning with Cinematic Rendering: Fine-Tuning Deep Neural Networks Using Photorealistic Medical Images

Title Deep Learning with Cinematic Rendering: Fine-Tuning Deep Neural Networks Using Photorealistic Medical Images
Authors Faisal Mahmood, Richard Chen, Sandra Sudarsky, Daphne Yu, Nicholas J. Durr
Abstract Deep learning has emerged as a powerful artificial intelligence tool to interpret medical images for a growing variety of applications. However, the paucity of medical imaging data with high-quality annotations that is necessary for training such methods ultimately limits their performance. Medical data is challenging to acquire due to privacy issues, shortage of experts available for annotation, limited representation of rare conditions and cost. This problem has previously been addressed by using synthetically generated data. However, networks trained on synthetic data often fail to generalize to real data. Cinematic rendering simulates the propagation and interaction of light passing through tissue models reconstructed from CT data, enabling the generation of photorealistic images. In this paper, we present one of the first applications of cinematic rendering in deep learning, in which we propose to fine-tune synthetic data-driven networks using cinematically rendered CT data for the task of monocular depth estimation in endoscopy. Our experiments demonstrate that: (a) Convolutional Neural Networks (CNNs) trained on synthetic data and fine-tuned on photorealistic cinematically rendered data adapt better to real medical images and demonstrate more robust performance when compared to networks with no fine-tuning, (b) these fine-tuned networks require less training data to converge to an optimal solution, and (c) fine-tuning with data from a variety of photorealistic rendering conditions of the same scene prevents the network from learning patient-specific information and aids in generalizability of the model. Our empirical evaluation demonstrates that networks fine-tuned with cinematically rendered data predict depth with 56.87% less error for rendered endoscopy images and 27.49% less error for real porcine colon endoscopy images.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2018-05-22
URL http://arxiv.org/abs/1805.08400v3
PDF http://arxiv.org/pdf/1805.08400v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-cinematic-rendering-fine
Repo
Framework
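
The fine-tuning step described above can be sketched generically in PyTorch: start from a network pre-trained on purely synthetic data and continue training at a small learning rate on the photorealistic, cinematically rendered set. The model, data loader, loss, and hyperparameters below are placeholders, not the authors' implementation.

```python
import torch

def fine_tune(model, rendered_loader, epochs=5, lr=1e-5, device="cuda"):
    """Continue training a pre-trained depth network on (image, depth) pairs."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()                  # simple depth regression loss
    for _ in range(epochs):
        for images, depths in rendered_loader:
            images, depths = images.to(device), depths.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), depths)
            loss.backward()
            opt.step()
    return model
```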

Combining Stereo Disparity and Optical Flow for Basic Scene Flow

Title Combining Stereo Disparity and Optical Flow for Basic Scene Flow
Authors René Schuster, Christian Bailer, Oliver Wasenmüller, Didier Stricker
Abstract Scene flow is a description of real world motion in 3D that contains more information than optical flow. Because of its complexity there exists no applicable variant for real-time scene flow estimation in an automotive or commercial vehicle context that is sufficiently robust and accurate. Therefore, many applications estimate the 2D optical flow instead. In this paper, we examine the combination of top-performing state-of-the-art optical flow and stereo disparity algorithms in order to achieve a basic scene flow. On the public KITTI Scene Flow Benchmark we demonstrate the reasonable accuracy of the combination approach and show its speed in computation.
Tasks Optical Flow Estimation, Scene Flow Estimation
Published 2018-01-15
URL http://arxiv.org/abs/1801.04720v1
PDF http://arxiv.org/pdf/1801.04720v1.pdf
PWC https://paperswithcode.com/paper/combining-stereo-disparity-and-optical-flow
Repo
Framework
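
A stripped-down sketch of the combination idea: back-project depth from disparity at two time steps and link pixels through the 2D optical flow to obtain per-pixel 3D motion. The intrinsics and baseline defaults are KITTI-like placeholder values, not parameters taken from the paper, and occlusion handling is omitted.

```python
import numpy as np

def scene_flow(disp_t0, disp_t1, flow, f=721.5, baseline=0.54, cx=609.6, cy=172.9):
    """disp_*: (H, W) disparity maps; flow: (H, W, 2) optical flow from t0 to t1."""
    H, W = disp_t0.shape
    u0, v0 = np.meshgrid(np.arange(W), np.arange(H))
    u1, v1 = u0 + flow[..., 0], v0 + flow[..., 1]

    def backproject(u, v, disp):
        z = f * baseline / np.maximum(disp, 1e-6)
        x = (u - cx) * z / f
        y = (v - cy) * z / f
        return np.stack([x, y, z], axis=-1)

    # Sample the t1 disparity at the (nearest) flow-displaced pixel positions.
    ui = np.clip(np.round(u1).astype(int), 0, W - 1)
    vi = np.clip(np.round(v1).astype(int), 0, H - 1)
    p0 = backproject(u0, v0, disp_t0)
    p1 = backproject(u1, v1, disp_t1[vi, ui])
    return p1 - p0                               # per-pixel 3D motion vectors
```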

Adversarial Structure Matching for Structured Prediction Tasks

Title Adversarial Structure Matching for Structured Prediction Tasks
Authors Jyh-Jing Hwang, Tsung-Wei Ke, Jianbo Shi, Stella X. Yu
Abstract Pixel-wise losses, e.g., cross-entropy or L2, have been widely used in structured prediction tasks as a spatial extension of generic image classification or regression. However, their i.i.d. assumption neglects the structural regularity present in natural images. Various attempts have been made to incorporate structural reasoning mostly through structure priors in a cooperative way where co-occurring patterns are encouraged. We, on the other hand, approach this problem from an opposing angle and propose a new framework, Adversarial Structure Matching (ASM), for training such structured prediction networks via an adversarial process, in which we train a structure analyzer that provides the supervisory signals, the ASM loss. The structure analyzer is trained to maximize the ASM loss, or to emphasize recurring multi-scale hard negative structural mistakes among co-occurring patterns. On the contrary, the structured prediction network is trained to reduce those mistakes and is thus enabled to distinguish fine-grained structures. As a result, training structured prediction networks using ASM reduces contextual confusion among objects and improves boundary localization. We demonstrate that our ASM outperforms pixel-wise IID loss or structural prior GAN loss on three different structured prediction tasks: semantic segmentation, monocular depth estimation, and surface normal prediction.
Tasks Depth Estimation, Image Classification, Monocular Depth Estimation, Semantic Segmentation, Structured Prediction
Published 2018-05-18
URL https://arxiv.org/abs/1805.07457v2
PDF https://arxiv.org/pdf/1805.07457v2.pdf
PWC https://paperswithcode.com/paper/adversarial-structure-matching-loss-for-image
Repo
Framework
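
Schematically, the adversarial training alternates between the structure analyzer (maximizing a feature-space mismatch between prediction and ground truth) and the prediction network (minimizing it). Everything below is a placeholder sketch under that reading of the abstract, not the authors' architecture or exact loss.

```python
import torch

def asm_step(predictor, analyzer, opt_pred, opt_ana, images, targets):
    """One alternating update of the analyzer and the structured prediction network."""
    preds = predictor(images)
    # Analyzer update: emphasize structural mistakes by maximizing the mismatch.
    mismatch = ((analyzer(preds.detach()) - analyzer(targets)) ** 2).mean()
    opt_ana.zero_grad()
    (-mismatch).backward()
    opt_ana.step()
    # Predictor update: reduce the same mismatch (a pixel-wise term can be added).
    mismatch = ((analyzer(preds) - analyzer(targets)) ** 2).mean()
    opt_pred.zero_grad()
    mismatch.backward()
    opt_pred.step()
    return float(mismatch)
```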

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

Title AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
Authors Jogendra Nath Kundu, Phani Krishna Uppala, Anuj Pahuja, R. Venkatesh Babu
Abstract Supervised deep learning methods have shown promising results for the task of monocular depth estimation; but acquiring ground truth is costly, and prone to noise as well as inaccuracies. While synthetic datasets have been used to circumvent above problems, the resultant models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adaptation have performed well in mitigating the differences between the source and target domains. But these methods are mostly limited to a classification setup and do not scale well for fully-convolutional architectures. In this work, we propose AdaDepth - an unsupervised domain adaptation strategy for the pixel-wise regression task of monocular depth estimation. The proposed approach is devoid of the above limitations through a) adversarial learning and b) explicit imposition of content consistency on the adapted target representation. Our unsupervised approach performs competitively with other established approaches on depth estimation tasks and achieves state-of-the-art results in a semi-supervised setting.
Tasks Depth Estimation, Domain Adaptation, Monocular Depth Estimation, Unsupervised Domain Adaptation
Published 2018-03-05
URL http://arxiv.org/abs/1803.01599v2
PDF http://arxiv.org/pdf/1803.01599v2.pdf
PWC https://paperswithcode.com/paper/adadepth-unsupervised-content-congruent
Repo
Framework

Structure Learning of Sparse GGMs over Multiple Access Networks

Title Structure Learning of Sparse GGMs over Multiple Access Networks
Authors Mostafa Tavassolipour, Armin Karamzade, Reza Mirzaeifard, Seyed Abolfazl Motahari, Mohammad-Taghi Manzuri Shalmani
Abstract A central machine is interested in estimating the underlying structure of a sparse Gaussian Graphical Model (GGM) from datasets distributed across multiple local machines. The local machines can communicate with the central machine through a wireless multiple access channel. In this paper, we are interested in designing effective strategies where reliable learning is feasible under power and bandwidth limitations. Two approaches are proposed: the Signs method and the Uncoded method. In the Signs method, the local machines quantize their data into binary vectors and an optimal channel coding scheme is used to reliably send the vectors to the central machine, where the structure is learned from the received data. In the Uncoded method, data symbols are scaled and transmitted through the channel, and the central machine uses the received noisy symbols to recover the structure. Theoretical results show that both methods can recover the structure with high probability for a large enough sample size. Experimental results indicate the superiority of the Signs method over the Uncoded method under several circumstances.
Tasks
Published 2018-12-26
URL http://arxiv.org/abs/1812.10437v1
PDF http://arxiv.org/pdf/1812.10437v1.pdf
PWC https://paperswithcode.com/paper/structure-learning-of-sparse-ggms-over
Repo
Framework
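
An illustrative take on the Signs idea (not the paper's coded transmission scheme or estimator): if each local machine sends only the signs of its centered Gaussian samples, the center can recover pairwise correlations through the Gaussian arcsine law and then read a graph off a crude precision estimate. The thresholding step below stands in for a proper sparse estimator such as graphical lasso.

```python
import numpy as np

def correlations_from_signs(sign_data):
    """sign_data: (n_samples, p) matrix with +/-1 entries pooled at the central machine."""
    tau = sign_data.T @ sign_data / len(sign_data)   # sign agreement rates
    return np.sin(np.pi / 2 * tau)                   # invert E[sgn X sgn Y] = (2/pi) arcsin(rho)

def estimate_graph(sign_data, thresh=0.1):
    """Recover an adjacency estimate by thresholding the (pseudo-)inverse correlation matrix."""
    corr = correlations_from_signs(sign_data)
    precision = np.linalg.pinv(corr)                 # crude surrogate for sparse precision estimation
    adj = np.abs(precision) > thresh
    np.fill_diagonal(adj, False)
    return adj
```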