October 18, 2019

Paper Group ANR 490

Realtime Time Synchronized Event-based Stereo

Title Realtime Time Synchronized Event-based Stereo
Authors Alex Zihao Zhu, Yibo Chen, Kostas Daniilidis
Abstract In this work, we propose a novel event-based stereo method which addresses the problem of motion blur for a moving event camera. Our method uses the velocity of the camera and a range of disparities to synchronize the positions of the events, as if they were captured at a single point in time. We represent these events using a pair of novel time synchronized event disparity volumes, which we show remove motion blur for pixels at the correct disparity in the volume, while further blurring pixels at the wrong disparity. We then apply a novel matching cost over these time synchronized event disparity volumes, which both rewards similarity between the volumes and penalizes blurriness. We show that our method outperforms more expensive, smoothing-based event stereo methods by evaluating on the Multi Vehicle Stereo Event Camera dataset.
Tasks
Published 2018-03-24
URL http://arxiv.org/abs/1803.09025v2
PDF http://arxiv.org/pdf/1803.09025v2.pdf
PWC https://paperswithcode.com/paper/realtime-time-synchronized-event-based-stereo
Repo
Framework
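
The synchronization step described in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera translating laterally at a constant, known speed with no rotation; the function name `synchronize_events` and all parameter names are hypothetical and are not taken from the paper. Under these assumptions, a point at depth Z = f * baseline / disparity moves across the image at -v_x * disparity / baseline pixels per second, so each candidate disparity implies a different warp.

```python
import numpy as np

def synchronize_events(x, t, disparity, v_x, baseline, t_ref):
    """Warp event x-coordinates to the common time t_ref for one candidate disparity.

    Assumption (not from the paper): pure lateral translation at speed v_x,
    so the image-plane velocity is -v_x * disparity / baseline pixels/second.
    """
    pixel_velocity = -v_x * disparity / baseline
    return x + pixel_velocity * (t_ref - t)

# Three events produced by one scene point drifting at -200 px/s.
x = np.array([14.0, 12.0, 10.0])
t = np.array([0.00, 0.01, 0.02])

# Correct disparity: the events collapse onto a single pixel (blur removed).
x_right = synchronize_events(x, t, disparity=20.0, v_x=1.0, baseline=0.1, t_ref=0.02)

# Wrong disparity: the events stay spread out (blur remains).
x_wrong = synchronize_events(x, t, disparity=10.0, v_x=1.0, baseline=0.1, t_ref=0.02)
```

Repeating the warp over a range of disparities gives one slice per disparity, which is the idea behind a disparity volume in which only the correct slice is sharp.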

A Joint Sequence Fusion Model for Video Question Answering and Retrieval

Title A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Authors Youngjae Yu, Jongseok Kim, Gunhee Kim
Abstract We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pair of multimodal sequence data (e.g. a video clip and a language sentence). Our multimodal matching network consists of two key components. First, the Joint Semantic Tensor composes a dense pairwise representation of two sequence data into a 3D tensor. Then, the Convolutional Hierarchical Decoder computes their similarity score by discovering hidden hierarchical matches between the two sequence modalities. Both modules leverage hierarchical attention mechanisms that learn to promote well-matched representation patterns while pruning out misaligned ones in a bottom-up manner. Although JSFusion is a universal model applicable to any multimodal sequence data, this work focuses on video-language tasks including multimodal retrieval and video QA. We evaluate the JSFusion model in three retrieval and VQA tasks in LSMDC, for which our model achieves the best performance reported so far. We also perform multiple-choice and movie retrieval tasks for the MSR-VTT dataset, on which our approach outperforms many state-of-the-art methods.
Tasks Question Answering, Semantic Similarity, Semantic Textual Similarity, Video Question Answering, Video Retrieval, Visual Question Answering
Published 2018-08-07
URL http://arxiv.org/abs/1808.02559v1
PDF http://arxiv.org/pdf/1808.02559v1.pdf
PWC https://paperswithcode.com/paper/a-joint-sequence-fusion-model-for-video
Repo
Framework
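
To make the idea of a dense pairwise 3D tensor concrete, here is a minimal sketch. It assumes both sequences are already embedded in the same d-dimensional space and combines every frame/word pair with an element-wise product; that combination rule, the attention-free form, and the name `joint_pairwise_tensor` are illustrative assumptions, not the paper's Joint Semantic Tensor.

```python
import numpy as np

def joint_pairwise_tensor(video_feats, sentence_feats):
    """Return a (T_video, T_words, d) tensor holding one d-vector per frame/word pair."""
    # Broadcasting pairs every video step with every word.
    return video_feats[:, None, :] * sentence_feats[None, :, :]

rng = np.random.default_rng(0)
video = rng.normal(size=(5, 8))   # 5 video steps, 8-dim features (toy sizes)
words = rng.normal(size=(7, 8))   # 7 words, 8-dim features

tensor = joint_pairwise_tensor(video, words)
# Summing over the feature axis recovers the plain dot-product similarity map.
similarity_map = tensor.sum(axis=-1)
```

A convolutional decoder could then be run over the first two axes of `tensor` to look for local matching patterns between the two sequences.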

Attention to Head Locations for Crowd Counting

Title Attention to Head Locations for Crowd Counting
Authors Youmei Zhang, Chunluan Zhou, Faliang Chang, Alex C. Kot
Abstract Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations, which are the most important cue for crowd counting. The attention model estimates a probability map in which high probabilities indicate locations where heads are likely to be present. The estimated probability map is used to suppress non-head regions in feature maps from several multi-scale feature extraction branches of a convolutional neural network for crowd density estimation, which makes our method robust to complex backgrounds, scale variations and non-uniform distributions. In addition, we introduce a relative deviation loss to complement a commonly used training loss, Euclidean distance, to improve the accuracy of sparse crowd density estimation. Experiments on the Shanghai-Tech, UCF_CC_50 and World-Expo’10 data sets demonstrate the effectiveness of our method.
Tasks Crowd Counting, Density Estimation
Published 2018-06-27
URL http://arxiv.org/abs/1806.10287v1
PDF http://arxiv.org/pdf/1806.10287v1.pdf
PWC https://paperswithcode.com/paper/attention-to-head-locations-for-crowd
Repo
Framework
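
The two mechanisms in the abstract can be sketched as follows. The masking step multiplies each feature channel by the head-probability map. The relative error shown is a hypothetical form chosen only to illustrate why a relative term matters for sparse crowds; the paper's actual loss definition is not given in the abstract, and both function names are assumptions.

```python
import numpy as np

def suppress_non_head(features, head_prob):
    """Scale a (C, H, W) feature map by an (H, W) head-probability map in [0, 1]."""
    return features * head_prob[None, :, :]

def relative_deviation(pred_count, true_count):
    """Hypothetical relative error; the +1 avoids division by zero for empty scenes."""
    return abs(pred_count - true_count) / (true_count + 1.0)

features = np.ones((2, 2, 2))
head_prob = np.array([[1.0, 0.0],
                      [0.5, 0.0]])
masked = suppress_non_head(features, head_prob)

# The same absolute error of 5 weighs far more in a sparse scene than a dense one.
sparse_error = relative_deviation(15, 10)
dense_error = relative_deviation(1005, 1000)
```

A plain Euclidean loss would treat the two miscounts above identically, which is the gap a relative term is meant to close.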

Inception-Residual Block based Neural Network for Thermal Image Denoising

Title Inception-Residual Block based Neural Network for Thermal Image Denoising
Authors Seongmin Hwang, Gwanghyun Yu, Huy Toan Nguyen, Nazeer Shahid, Doseong Sin, Jinyoung Kim, Seungyou Na
Abstract Thermal cameras produce noisy images due to their limited thermal resolution, especially for scenes with a low temperature difference. To deal with this noise problem, this paper proposes a novel neural network architecture with repeatable denoising inception-residual blocks (DnIRB) for noise learning. Each DnIRB has two sub-blocks with different receptive fields and one shortcut connection to prevent the vanishing gradient problem. The proposed approach is tested on thermal images. The experimental results indicate that the proposed approach shows the best SQNR performance and reasonable processing time compared with state-of-the-art denoising methods.
Tasks Denoising, Image Denoising, Thermal Image Denoising
Published 2018-10-31
URL http://arxiv.org/abs/1810.13169v2
PDF http://arxiv.org/pdf/1810.13169v2.pdf
PWC https://paperswithcode.com/paper/inception-residual-block-based-neural-network
Repo
Framework

On the Complexity of the Weighted Fused Lasso

Title On the Complexity of the Weighted Fused Lasso
Authors Jose Bento, Ralph Furmaniak, Surjyendu Ray
Abstract The solution path of the 1D fused lasso for an $n$-dimensional input is piecewise linear with $\mathcal{O}(n)$ segments (Hoefling et al. 2010 and Tibshirani et al. 2011). However, existing proofs of this bound do not hold for the weighted fused lasso. At the same time, results for the generalized lasso, of which the weighted fused lasso is a special case, allow $\Omega(3^n)$ segments (Mairal et al. 2012). In this paper, we prove that the number of segments in the solution path of the weighted fused lasso is $\mathcal{O}(n^2)$, and that, for some instances, it is $\Omega(n^2)$. We also give a new, very simple proof of the $\mathcal{O}(n)$ bound for the fused lasso.
Tasks
Published 2018-01-15
URL http://arxiv.org/abs/1801.04987v3
PDF http://arxiv.org/pdf/1801.04987v3.pdf
PWC https://paperswithcode.com/paper/on-the-complexity-of-the-weighted-fused-lasso
Repo
Framework
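
For reference, the weighted 1D fused lasso is commonly written as the problem below. The symbols $y$ (input), $w_i$ (weights) and $\lambda$ (regularization level) follow a standard convention assumed here, and the pure-fusion form without an extra sparsity term is also an assumption; the paper's own notation may differ.

$$
\hat{x}(\lambda) = \arg\min_{x \in \mathbb{R}^n} \; \frac{1}{2} \sum_{i=1}^{n} (x_i - y_i)^2 + \lambda \sum_{i=1}^{n-1} w_i \, |x_{i+1} - x_i|, \qquad w_i > 0 .
$$

Setting every $w_i = 1$ gives the unweighted fused lasso. The solution path is the map $\lambda \mapsto \hat{x}(\lambda)$ for $\lambda \ge 0$, and the bounds in the abstract count its linear segments.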

How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature

Title How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature
Authors Doris Xin, Litian Ma, Shuchen Song, Aditya Parameswaran
Abstract Machine learning workflow development is anecdotally regarded to be an iterative process of trial-and-error with humans-in-the-loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine learning systems. To this end, we conduct a small-scale survey of the applied machine learning literature from five distinct application domains. We collect and distill statistics on the role of iteration within machine learning workflow development, and report preliminary trends and insights from our investigation, as a starting point towards this benchmark. Based on our findings, we finally describe desiderata for effective and versatile human-in-the-loop machine learning systems that can cater to users in diverse domains.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.10311v2
PDF http://arxiv.org/pdf/1803.10311v2.pdf
PWC https://paperswithcode.com/paper/how-developers-iterate-on-machine-learning
Repo
Framework

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

Title Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
Authors Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap
Abstract The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets and explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and explore performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04587v2
PDF http://arxiv.org/pdf/1807.04587v2.pdf
PWC https://paperswithcode.com/paper/assessing-the-scalability-of-biologically
Repo
Framework
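
As a rough illustration of feedback alignment (FA), one of the algorithm families evaluated above, the sketch below trains a tiny two-layer network on a toy regression task. The only change from backpropagation is that the output error reaches the hidden layer through a fixed random matrix `B` rather than the transpose of the forward weights. The network size, learning rate, zero initialization of `W2`, and the function name are all assumptions for the sake of the example, not the paper's setup.

```python
import numpy as np

def train_feedback_alignment(X, Y, hidden=16, lr=0.05, steps=200, seed=0):
    """Train a two-layer tanh network with feedback alignment; return the loss history."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_in, hidden))   # forward weights, layer 1
    W2 = np.zeros((hidden, n_out))                   # forward weights, layer 2
    B = rng.normal(0.0, 0.1, size=(n_out, hidden))   # fixed random feedback weights
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1)          # hidden activations
        err = H @ W2 - Y             # output error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation would send the error back through W2.T;
        # feedback alignment uses the fixed random matrix B instead.
        dH = (err @ B) * (1.0 - H ** 2)
        W2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ dH / len(X)
    return losses

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))
Y = X @ rng.normal(size=(4, 2))      # toy linear regression target
losses = train_feedback_alignment(X, Y)
```

Because `B` never changes, no weight transport between the forward and backward paths is needed; learning works when the forward weights drift into rough alignment with `B`.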

Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification

Title Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification
Authors Guodong Ding, Shanshan Zhang, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli
Abstract Person re-identification aims to match a person’s identity across multiple camera streams. Deep neural networks have been successfully applied to the challenging person re-identification task. One notable bottleneck is that the existing deep models are data hungry and require large amounts of labeled training data. Acquiring manual annotations for pedestrian identity matching in large-scale surveillance camera installations is a highly cumbersome task. Here, we propose the first semi-supervised approach that performs pseudo-labeling by considering complex relationships between unlabeled and labeled training samples in the feature space. Our approach first approximates the actual data manifold by learning a generative model via adversarial training. Given the trained model, data augmentation can be performed by generating new synthetic data samples which are unlabeled. An open research problem is how to effectively use this additional data for improved feature learning. To this end, this work proposes a novel Feature Affinity based Pseudo-Labeling (FAPL) approach with two possible label encodings under a unified setting. Our approach measures the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks. FAPL trains with the joint supervision of cross-entropy loss together with a center regularization term, which not only ensures discriminative feature representation learning but also simultaneously predicts pseudo-labels for unlabeled data. Our extensive experiments on two standard large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate significant performance boosts over closely related competitors and outperform state-of-the-art person re-identification techniques in most cases.
Tasks Data Augmentation, Person Re-Identification, Representation Learning, Semi-Supervised Person Re-Identification
Published 2018-05-16
URL http://arxiv.org/abs/1805.06118v1
PDF http://arxiv.org/pdf/1805.06118v1.pdf
PWC https://paperswithcode.com/paper/feature-affinity-based-pseudo-labeling-for
Repo
Framework

Learning Deep Similarity Metric for 3D MR-TRUS Registration

Title Learning Deep Similarity Metric for 3D MR-TRUS Registration
Authors Grant Haskins, Jochen Kruecker, Uwe Kruger, Sheng Xu, Peter A. Pinto, Brad J. Wood, Pingkun Yan
Abstract Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images for guiding targeted prostate biopsy has significantly improved the biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image registration. However, it is very challenging to obtain a robust automatic MR-TRUS registration due to the large appearance difference between the two imaging modalities. The work presented in this paper aims to tackle this problem by addressing two challenges: (i) the definition of a suitable similarity metric and (ii) the determination of a suitable optimization strategy. Methods: This work proposes the use of a deep convolutional neural network to learn a similarity metric for MR-TRUS registration. We also use a composite optimization strategy that explores the solution space in order to search for a suitable initialization for the second-order optimization of the learned metric. Further, a multi-pass approach is used in order to smooth the metric for optimization. Results: The learned similarity metric outperforms the classical mutual information and also the state-of-the-art MIND feature based methods. The results indicate that the overall registration framework has a large capture range. The proposed deep similarity metric based approach obtained a mean TRE of 3.86mm (with an initial TRE of 16mm) for this challenging problem. Conclusion: A similarity metric that is learned using a deep neural network can be used to assess the quality of any given image registration and can be used in conjunction with the aforementioned optimization framework to perform automatic registration that is robust to poor initialization.
Tasks Image Registration
Published 2018-06-12
URL http://arxiv.org/abs/1806.04548v2
PDF http://arxiv.org/pdf/1806.04548v2.pdf
PWC https://paperswithcode.com/paper/learning-deep-similarity-metric-for-3d-mr
Repo
Framework

Liveness Detection Using Implicit 3D Features

Title Liveness Detection Using Implicit 3D Features
Authors J. Matias Di Martino, Qiang Qiu, Trishul Nagenalli, Guillermo Sapiro
Abstract Spoofing attacks are a threat to modern face recognition systems. In this work we present a simple yet effective liveness detection approach to enhance 2D face recognition methods and make them robust against spoofing attacks. We show that the risk of spoofing attacks can be reduced through the use of an additional source of light, for example a flash. From a pair of input images taken under different illumination, we define discriminative features that implicitly contain facial three-dimensional information. Furthermore, we show that when multiple sources of light are considered, we are able to validate which one has been activated. This makes possible the design of a highly secure active-light authentication framework. Finally, further investigating the use of 3D features without 3D reconstruction, we introduce an approximated disparity-based implicit 3D feature obtained from an uncalibrated stereo-pair of cameras. Validation experiments show that the proposed methods produce state-of-the-art results in challenging scenarios with nearly no feature extraction latency.
Tasks 3D Reconstruction, Face Recognition
Published 2018-04-18
URL http://arxiv.org/abs/1804.06702v2
PDF http://arxiv.org/pdf/1804.06702v2.pdf
PWC https://paperswithcode.com/paper/liveness-detection-using-implicit-3d-features
Repo
Framework

Using Machine Learning to Improve Cylindrical Algebraic Decomposition

Title Using Machine Learning to Improve Cylindrical Algebraic Decomposition
Authors Zongyan Huang, Matthew England, David Wilson, James H. Davenport, Lawrence C. Paulson
Abstract Cylindrical Algebraic Decomposition (CAD) is a key tool in computational algebraic geometry, best known as a procedure to enable Quantifier Elimination over real-closed fields. However, it has a worst case complexity doubly exponential in the size of the input, which is often encountered in practice. It has been observed that for many problems a change in algorithm settings or problem formulation can cause huge differences in runtime costs, changing problem instances from intractable to easy. A number of heuristics have been developed to help with such choices, but the complicated nature of the geometric relationships involved means these are imperfect and can sometimes make poor choices. We investigate the use of machine learning (specifically support vector machines) to make such choices instead. Machine learning is the process of fitting a computer model to a complex function based on properties learned from measured data. In this paper we apply it in two case studies: the first to select between heuristics for choosing a CAD variable ordering; the second to identify when a CAD problem instance would benefit from Groebner Basis preconditioning. These appear to be the first such applications of machine learning to Symbolic Computation. We demonstrate in both cases that the machine learned choice outperforms human developed heuristics.
Tasks
Published 2018-04-26
URL http://arxiv.org/abs/1804.10520v1
PDF http://arxiv.org/pdf/1804.10520v1.pdf
PWC https://paperswithcode.com/paper/using-machine-learning-to-improve-cylindrical
Repo
Framework

Policy Certificates: Towards Accountable Reinforcement Learning

Title Policy Certificates: Towards Accountable Reinforcement Learning
Authors Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill
Abstract The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches (and in some settings slightly improves upon) existing minimax regret bounds.
Tasks
Published 2018-11-07
URL https://arxiv.org/abs/1811.03056v3
PDF https://arxiv.org/pdf/1811.03056v3.pdf
PWC https://paperswithcode.com/paper/policy-certificates-towards-accountable
Repo
Framework

Learning Pixel-wise Labeling from the Internet without Human Interaction

Title Learning Pixel-wise Labeling from the Internet without Human Interaction
Authors Yun Liu, Yujun Shi, JiaWang Bian, Le Zhang, Ming-Ming Cheng, Jiashi Feng
Abstract Deep learning stands at the forefront in many computer vision tasks. However, deep neural networks are usually data-hungry and require a huge amount of well-annotated training samples. Collecting sufficient annotated data is very expensive in many applications, especially for pixel-level prediction tasks such as semantic segmentation. To solve this fundamental issue, we consider a new challenging vision task, Internetly supervised semantic segmentation, which only uses Internet data with noisy image-level supervision of corresponding query keywords for segmentation model training. We address this task by proposing the following solution. A class-specific attention model unifying multiscale forward and backward convolutional features is proposed to provide initial segmentation “ground truth”. The model trained with such noisy annotations is then improved by an online fine-tuning procedure. It achieves state-of-the-art performance under the weakly-supervised setting on PASCAL VOC2012 dataset. The proposed framework also paves a new way towards learning from the Internet without human interaction and could serve as a strong baseline therein. Code and data will be released upon the paper acceptance.
Tasks Semantic Segmentation
Published 2018-05-19
URL http://arxiv.org/abs/1805.07548v1
PDF http://arxiv.org/pdf/1805.07548v1.pdf
PWC https://paperswithcode.com/paper/learning-pixel-wise-labeling-from-the
Repo
Framework

Explore-Exploit: A Framework for Interactive and Online Learning

Title Explore-Exploit: A Framework for Interactive and Online Learning
Authors Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin
Abstract Interactive user interfaces need to continuously evolve based on the interactions that a user has (or does not have) with the system. This may require constant exploration of various options that the system may have for the user and obtaining signals of user preferences on those. However, such an exploration, especially when the set of available options itself can change frequently, can lead to sub-optimal user experiences. We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting that minimizes regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalization ranking, candidate selection and active learning. We demonstrate how to integrate this framework with run-time services to leverage online and interactive machine learning out-of-the-box. We also present results demonstrating the efficiencies that can be achieved using the Explore-Exploit framework.
Tasks Active Learning
Published 2018-12-01
URL http://arxiv.org/abs/1812.00116v1
PDF http://arxiv.org/pdf/1812.00116v1.pdf
PWC https://paperswithcode.com/paper/explore-exploit-a-framework-for-interactive
Repo
Framework

OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages

Title OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages
Authors Jyoti Jha, Sreekavitha Parupalli, Navjyoti Singh
Abstract Following approaches for understanding lexical meaning developed by Yaska, Patanjali and Bhartrihari from Indian linguistic traditions and extending approaches developed by Leibniz and Brentano in modern times, a framework of formal ontology of language was developed. This framework proposes that the meanings of words are informed by intrinsic and extrinsic ontological structures. The paper aims to capture such intrinsic and extrinsic meanings of words for two major Indian languages, namely, Hindi and Telugu. Parts-of-speech have been rendered into sense-types and sense-classes. Using them we have developed a gold-standard annotated lexical resource to support semantic understanding of a language. The resource contains a collection of Hindi and Telugu lexicons, which have been manually annotated by native speakers of the languages following our annotation guidelines. Further, the resource was utilised to derive the adverbial sense-class distribution of verbs and the karaka-verb sense-type distribution. Different corpora (news, novels) were compared using the verb sense-type distribution. Word embedding was used as an aid for the enrichment of the resource. This is a work in progress that aims at extensive lexical coverage of the language.
Tasks
Published 2018-08-02
URL http://arxiv.org/abs/1808.00694v1
PDF http://arxiv.org/pdf/1808.00694v1.pdf
PWC https://paperswithcode.com/paper/ontosensenet-a-verb-centric-ontological
Repo
Framework