Paper Group ANR 490
Realtime Time Synchronized Event-based Stereo
Title | Realtime Time Synchronized Event-based Stereo |
Authors | Alex Zihao Zhu, Yibo Chen, Kostas Daniilidis |
Abstract | In this work, we propose a novel event-based stereo method which addresses the problem of motion blur for a moving event camera. Our method uses the velocity of the camera and a range of disparities to synchronize the positions of the events, as if they were captured at a single point in time. We represent these events using a pair of novel time-synchronized event disparity volumes, which we show remove motion blur for pixels at the correct disparity in the volume, while further blurring pixels at the wrong disparity. We then apply a novel matching cost over these time-synchronized event disparity volumes, which rewards similarity between the volumes while penalizing blurriness. We show that our method outperforms more expensive, smoothing-based event stereo methods by evaluating on the Multi Vehicle Stereo Event Camera dataset. |
Tasks | |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09025v2 |
PDF | http://arxiv.org/pdf/1803.09025v2.pdf |
PWC | https://paperswithcode.com/paper/realtime-time-synchronized-event-based-stereo |
Repo | |
Framework | |
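To make the time-synchronization idea concrete, here is a minimal NumPy sketch that warps events to a reference time using the camera's pixel velocity and builds a per-disparity event-count volume. It is only an illustration under simplifying assumptions (constant pixel velocity, a plain count volume, made-up function and argument names); the paper's actual disparity volumes and matching cost are more involved.

```python
import numpy as np

def time_synchronized_volume(events, t_ref, pixel_velocity, disparities, height, width):
    """Build a per-disparity event-count volume after motion compensation.

    events: array of shape (N, 3) with columns (x, y, t).
    pixel_velocity: (vx, vy) in pixels per second, assumed constant over the window.
    disparities: sequence of candidate disparities, applied as horizontal shifts.
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    vx, vy = pixel_velocity
    # Warp every event to the reference time so camera motion no longer smears it.
    x_sync = x - vx * (t - t_ref)
    y_sync = y - vy * (t - t_ref)

    volume = np.zeros((len(disparities), height, width))
    for k, d in enumerate(disparities):
        # A disparity hypothesis is a horizontal shift of the synchronized events;
        # at the correct disparity the events from both cameras align sharply.
        xs = np.round(x_sync - d).astype(int)
        ys = np.round(y_sync).astype(int)
        valid = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        np.add.at(volume[k], (ys[valid], xs[valid]), 1.0)
    return volume
```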
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Title | A Joint Sequence Fusion Model for Video Question Answering and Retrieval |
Authors | Youngjae Yu, Jongseok Kim, Gunhee Kim |
Abstract | We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pairs of multimodal sequence data (e.g. a video clip and a language sentence). Our multimodal matching network consists of two key components. First, the Joint Semantic Tensor composes a dense pairwise representation of two sequence data into a 3D tensor. Then, the Convolutional Hierarchical Decoder computes their similarity score by discovering hidden hierarchical matches between the two sequence modalities. Both modules leverage hierarchical attention mechanisms that learn to promote well-matched representation patterns while pruning out misaligned ones in a bottom-up manner. Although JSFusion is a universal model applicable to any multimodal sequence data, this work focuses on video-language tasks including multimodal retrieval and video QA. We evaluate the JSFusion model on three retrieval and VQA tasks in LSMDC, for which our model achieves the best performance reported so far. We also perform multiple-choice and movie retrieval tasks on the MSR-VTT dataset, on which our approach outperforms many state-of-the-art methods. |
Tasks | Question Answering, Semantic Similarity, Semantic Textual Similarity, Video Question Answering, Video Retrieval, Visual Question Answering |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02559v1 |
PDF | http://arxiv.org/pdf/1808.02559v1.pdf |
PWC | https://paperswithcode.com/paper/a-joint-sequence-fusion-model-for-video |
Repo | |
Framework | |
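A minimal sketch of the pairwise composition step: per-frame and per-word features are combined into a 3D tensor, here with a simple Hadamard product and an averaging placeholder standing in for the learned Joint Semantic Tensor and Convolutional Hierarchical Decoder. All shapes, names, and the combination rule are assumptions for illustration.

```python
import numpy as np

def joint_semantic_tensor(video_feats, word_feats):
    """Dense pairwise composition of two sequences into a 3D tensor.

    video_feats: (T, D) per-frame features; word_feats: (L, D) per-word features.
    Returns a (T, L, D) tensor whose slice [t, l] combines frame t and word l.
    A Hadamard product is used here; the paper learns this composition.
    """
    return video_feats[:, None, :] * word_feats[None, :, :]

def toy_similarity(tensor):
    """Collapse the joint tensor to a scalar clip-sentence score.

    The real model uses a Convolutional Hierarchical Decoder with learned
    attention; averaging is only a placeholder to make the sketch runnable.
    """
    return float(tensor.sum(axis=-1).mean())

# Usage with random stand-in features.
V = np.random.randn(12, 64)   # 12 video frames
W = np.random.randn(7, 64)    # 7 words
score = toy_similarity(joint_semantic_tensor(V, W))
```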
Attention to Head Locations for Crowd Counting
Title | Attention to Head Locations for Crowd Counting |
Authors | Youmei Zhang, Chunluan Zhou, Faliang Chang, Alex C. Kot |
Abstract | Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations, which are the most important cue for crowd counting. The attention model estimates a probability map in which high probabilities indicate locations where heads are likely to be present. The estimated probability map is used to suppress non-head regions in feature maps from several multi-scale feature extraction branches of a convolutional neural network for crowd density estimation, which makes our method robust to complex backgrounds, scale variations and non-uniform distributions. In addition, we introduce a relative deviation loss to complement a commonly used training loss, the Euclidean distance, and improve the accuracy of sparse crowd density estimation. Experiments on Shanghai-Tech, UCF_CC_50 and World-Expo’10 data sets demonstrate the effectiveness of our method. |
Tasks | Crowd Counting, Density Estimation |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10287v1 |
PDF | http://arxiv.org/pdf/1806.10287v1.pdf |
PWC | https://paperswithcode.com/paper/attention-to-head-locations-for-crowd |
Repo | |
Framework | |
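A small sketch, under assumed shapes, of how a head-location probability map can gate multi-scale feature maps, together with a relative-deviation-style counting loss. The paper's attention model is learned and its exact loss formulation may differ; the functions below are illustrative only.

```python
import numpy as np

def attend_features(branch_features, head_prob):
    """Suppress non-head regions in multi-scale branch features.

    branch_features: list of (C, H, W) feature maps from different scales.
    head_prob: (H, W) probability map; high values where heads are likely.
    """
    return [f * head_prob[None, :, :] for f in branch_features]

def relative_deviation_loss(pred_count, gt_count, eps=1.0):
    """Relative counting error, intended to complement the Euclidean loss and
    improve accuracy on sparse crowds (the exact form here is an assumption)."""
    return abs(pred_count - gt_count) / (gt_count + eps)
```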
Inception-Residual Block based Neural Network for Thermal Image Denoising
Title | Inception-Residual Block based Neural Network for Thermal Image Denoising |
Authors | Seongmin Hwang, Gwanghyun Yu, Huy Toan Nguyen, Nazeer Shahid, Doseong Sin, Jinyoung Kim, Seungyou Na |
Abstract | Thermal cameras produce noisy images due to their limited thermal resolution, especially for scenes with a low temperature difference. To deal with this noise problem, this paper proposes a novel neural network architecture with repeatable denoising inception-residual blocks (DnIRB) for noise learning. Each DnIRB has two sub-blocks with different receptive fields and one shortcut connection to prevent the vanishing gradient problem. The proposed approach is tested on thermal images. The experimental results indicate that the proposed approach shows the best SQNR performance and reasonable processing time compared with state-of-the-art denoising methods. |
Tasks | Denoising, Image Denoising, Thermal Image Denoising |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13169v2 |
PDF | http://arxiv.org/pdf/1810.13169v2.pdf |
PWC | https://paperswithcode.com/paper/inception-residual-block-based-neural-network |
Repo | |
Framework | |
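A structural sketch of a denoising inception-residual block under stand-in operations: fixed box filters of two sizes emulate the two sub-blocks with different receptive fields, and a shortcut connection adds the fused result back to the input. The real blocks use learned convolutions and are trained end-to-end for noise learning; nothing below reproduces the paper's architecture exactly.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dn_irb(x, small=3, large=5):
    """One denoising inception-residual block (DnIRB), structurally sketched.

    Two parallel sub-blocks with different receptive fields (box filters of size
    `small` and `large` stand in for learned convolutions) are fused, and a
    shortcut connection adds the result back to the input to ease gradient flow.
    """
    branch_a = uniform_filter(x, size=small)
    branch_b = uniform_filter(x, size=large)
    residual = 0.5 * (branch_a + branch_b) - x   # stands in for the learned residual mapping
    return x + residual                          # shortcut connection around the sub-blocks

def denoise_sketch(noisy, num_blocks=4):
    """Stack repeatable DnIRBs; in the paper the stack is trained end-to-end to
    learn the noise, whereas here the fixed filters simply smooth the input as
    a structural placeholder."""
    h = np.asarray(noisy, dtype=float)
    for _ in range(num_blocks):
        h = dn_irb(h)
    return h
```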
On the Complexity of the Weighted Fused Lasso
Title | On the Complexity of the Weighted Fused Lasso |
Authors | Jose Bento, Ralph Furmaniak, Surjyendu Ray |
Abstract | The solution path of the 1D fused lasso for an $n$-dimensional input is piecewise linear with $\mathcal{O}(n)$ segments (Hoefling et al. 2010 and Tibshirani et al. 2011). However, existing proofs of this bound do not hold for the weighted fused lasso. At the same time, results for the generalized lasso, of which the weighted fused lasso is a special case, allow $\Omega(3^n)$ segments (Mairal et al. 2012). In this paper, we prove that the number of segments in the solution path of the weighted fused lasso is $\mathcal{O}(n^2)$, and that, for some instances, it is $\Omega(n^2)$. We also give a new, very simple proof of the $\mathcal{O}(n)$ bound for the fused lasso. |
Tasks | |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.04987v3 |
PDF | http://arxiv.org/pdf/1801.04987v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-the-weighted-fused-lasso |
Repo | |
Framework | |
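For reference, one common form of the weighted 1D fused lasso objective whose solution path the paper studies (the paper's exact formulation may differ, e.g. by including per-coordinate $\ell_1$ penalties):

```latex
\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\sum_{i=1}^{n} (y_i - x_i)^2
  \;+\; \lambda \sum_{i=1}^{n-1} w_i \, \lvert x_{i+1} - x_i \rvert ,
\qquad w_i > 0 ,
```

where the solution $x(\lambda)$ traced over $\lambda \ge 0$ is the piecewise-linear path in question; setting $w_i \equiv 1$ recovers the standard 1D fused lasso with its $\mathcal{O}(n)$ segments.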
How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature
Title | How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature |
Authors | Doris Xin, Litian Ma, Shuchen Song, Aditya Parameswaran |
Abstract | Machine learning workflow development is anecdotally regarded as an iterative process of trial-and-error with humans-in-the-loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine learning systems. To this end, we conduct a small-scale survey of the applied machine learning literature from five distinct application domains. We collect and distill statistics on the role of iteration within machine learning workflow development, and report preliminary trends and insights from our investigation, as a starting point towards this benchmark. Based on our findings, we finally describe desiderata for effective and versatile human-in-the-loop machine learning systems that can cater to users in diverse domains. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10311v2 |
PDF | http://arxiv.org/pdf/1803.10311v2.pdf |
PWC | https://paperswithcode.com/paper/how-developers-iterate-on-machine-learning |
Repo | |
Framework | |
Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
Title | Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures |
Authors | Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap |
Abstract | The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets and explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and explore performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04587v2 |
PDF | http://arxiv.org/pdf/1807.04587v2.pdf |
PWC | https://paperswithcode.com/paper/assessing-the-scalability-of-biologically |
Repo | |
Framework | |
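As a concrete example of one of the biologically motivated alternatives evaluated, here is a minimal NumPy sketch of a single feedback alignment (FA) update for a two-layer network: the backward pass uses a fixed random matrix in place of the transposed forward weights, avoiding weight transport. The layer sizes, tanh nonlinearity, and squared-error signal are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network trained with feedback alignment (FA).
n_in, n_hid, n_out = 20, 32, 5
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B  = rng.normal(0, 0.1, (n_hid, n_out))   # fixed random feedback weights, never trained

def fa_step(x, y, lr=0.01):
    """One feedback-alignment update on a single example (squared-error sketch)."""
    h = np.tanh(W1 @ x)          # hidden activity
    y_hat = W2 @ h               # linear output layer
    e = y_hat - y                # output error
    # Backprop would use W2.T @ e; FA uses the fixed random matrix B instead.
    delta_h = (B @ e) * (1.0 - h ** 2)
    W2[...] -= lr * np.outer(e, h)
    W1[...] -= lr * np.outer(delta_h, x)

x = rng.normal(size=n_in)
y = rng.normal(size=n_out)
fa_step(x, y)
```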
Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification
Title | Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification |
Authors | Guodong Ding, Shanshan Zhang, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli |
Abstract | Person re-identification aims to match a person’s identity across multiple camera streams. Deep neural networks have been successfully applied to the challenging person re-identification task. One remarkable bottleneck is that the existing deep models are data-hungry and require large amounts of labeled training data. Acquiring manual annotations for pedestrian identity matches in large-scale surveillance camera installations is a highly cumbersome task. Here, we propose the first semi-supervised approach that performs pseudo-labeling by considering complex relationships between unlabeled and labeled training samples in the feature space. Our approach first approximates the actual data manifold by learning a generative model via adversarial training. Given the trained model, data augmentation can be performed by generating new synthetic data samples which are unlabeled. An open research problem is how to effectively use this additional data for improved feature learning. To this end, this work proposes a novel Feature Affinity based Pseudo-Labeling (FAPL) approach with two possible label encodings under a unified setting. Our approach measures the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks. FAPL trains with the joint supervision of cross-entropy loss together with a center regularization term, which not only ensures discriminative feature representation learning but also simultaneously predicts pseudo-labels for unlabeled data. Our extensive experiments on two standard large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate significant performance boosts over closely related competitors and show that our approach outperforms state-of-the-art person re-identification techniques in most cases. |
Tasks | Data Augmentation, Person Re-Identification, Representation Learning, Semi-Supervised Person Re-Identification |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06118v1 |
PDF | http://arxiv.org/pdf/1805.06118v1.pdf |
PWC | https://paperswithcode.com/paper/feature-affinity-based-pseudo-labeling-for |
Repo | |
Framework | |
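A simplified sketch of the feature-affinity idea: unlabeled (e.g. GAN-generated) samples receive the identity of the nearest labeled class center in feature space, and a center-style regularizer pulls features toward their centers. The paper's two label encodings and its joint training objective are only loosely approximated here; every function name and formula below is an assumption.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature (center) of each labeled identity (each class assumed present)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def pseudo_label(unlabeled_feats, centers):
    """Assign each unlabeled sample the identity of its nearest center in feature
    space -- a simplified stand-in for FAPL's affinity-based label encodings."""
    d = np.linalg.norm(unlabeled_feats[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)

def center_regularization(features, labels, centers):
    """Center-loss-style term pulling features toward their class centers."""
    return float(((features - centers[labels]) ** 2).sum(axis=1).mean())
```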
Learning Deep Similarity Metric for 3D MR-TRUS Registration
Title | Learning Deep Similarity Metric for 3D MR-TRUS Registration |
Authors | Grant Haskins, Jochen Kruecker, Uwe Kruger, Sheng Xu, Peter A. Pinto, Brad J. Wood, Pingkun Yan |
Abstract | Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images for guiding targeted prostate biopsy has significantly improved the biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image registration. However, it is very challenging to obtain a robust automatic MR-TRUS registration due to the large appearance difference between the two imaging modalities. The work presented in this paper aims to tackle this problem by addressing two challenges: (i) the definition of a suitable similarity metric and (ii) the determination of a suitable optimization strategy. Methods: This work proposes the use of a deep convolutional neural network to learn a similarity metric for MR-TRUS registration. We also use a composite optimization strategy that explores the solution space in order to search for a suitable initialization for the second-order optimization of the learned metric. Further, a multi-pass approach is used in order to smooth the metric for optimization. Results: The learned similarity metric outperforms the classical mutual information and also the state-of-the-art MIND feature based methods. The results indicate that the overall registration framework has a large capture range. The proposed deep similarity metric based approach obtained a mean TRE of 3.86 mm (with an initial TRE of 16 mm) for this challenging problem. Conclusion: A similarity metric that is learned using a deep neural network can be used to assess the quality of any given image registration and can be used in conjunction with the aforementioned optimization framework to perform automatic registration that is robust to poor initialization. |
Tasks | Image Registration |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04548v2 |
PDF | http://arxiv.org/pdf/1806.04548v2.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-similarity-metric-for-3d-mr |
Repo | |
Framework | |
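A schematic sketch of the composite optimization strategy: a coarse exhaustive search over transform parameters, scored by a (here dummy) learned similarity metric, supplies the initialization for second-order refinement via BFGS. The placeholder metric, the translation-only parameterization, and the grid are assumptions made to keep the sketch runnable, not the paper's implementation.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def learned_similarity(params, mr_img, trus_img):
    """Placeholder for the CNN-learned similarity metric: returns a scalar
    dissimilarity for a candidate transform (lower is better). A dummy
    quadratic is used here so the sketch runs end to end."""
    return float(np.sum(np.asarray(params) ** 2)) + 0.0 * mr_img.mean() * trus_img.mean()

def register(mr_img, trus_img, grid=np.linspace(-10, 10, 5)):
    """Composite strategy: score a coarse grid of translations to find a good
    initialization, then refine it with a second-order (quasi-Newton) method."""
    candidates = [np.array([tx, ty, tz]) for tx, ty, tz in product(grid, grid, grid)]
    best = min(candidates, key=lambda p: learned_similarity(p, mr_img, trus_img))
    result = minimize(learned_similarity, best, args=(mr_img, trus_img), method="BFGS")
    return result.x

mr, trus = np.zeros((64, 64, 32)), np.zeros((64, 64, 32))
params = register(mr, trus)
```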
Liveness Detection Using Implicit 3D Features
Title | Liveness Detection Using Implicit 3D Features |
Authors | J. Matias Di Martino, Qiang Qiu, Trishul Nagenalli, Guillermo Sapiro |
Abstract | Spoofing attacks are a threat to modern face recognition systems. In this work we present a simple yet effective liveness detection approach to enhance 2D face recognition methods and make them robust against spoofing attacks. We show that the risk of spoofing attacks can be reduced through the use of an additional source of light, for example a flash. From a pair of input images taken under different illumination, we define discriminative features that implicitly contain facial three-dimensional information. Furthermore, we show that when multiple sources of light are considered, we are able to validate which one has been activated. This makes possible the design of a highly secure active-light authentication framework. Finally, further investigating the use of 3D features without 3D reconstruction, we introduce an approximated disparity-based implicit 3D feature obtained from an uncalibrated stereo-pair of cameras. Validation experiments show that the proposed methods produce state-of-the-art results in challenging scenarios with nearly no feature extraction latency. |
Tasks | 3D Reconstruction, Face Recognition |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06702v2 |
PDF | http://arxiv.org/pdf/1804.06702v2.pdf |
PWC | https://paperswithcode.com/paper/liveness-detection-using-implicit-3d-features |
Repo | |
Framework | |
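A toy illustration, under strong simplifying assumptions, of an implicit 3D feature from a flash/no-flash pair: the per-pixel quotient image depends on surface orientation and distance to the flash, so its spatial variation differs between a real face and a flat spoof. The paper learns a classifier over such features; the hand-set score below is only a placeholder.

```python
import numpy as np

def implicit_3d_feature(img_ambient, img_flash, eps=1e-6):
    """Per-pixel ratio of flash to ambient intensity.

    Under the flash, shading depends on surface orientation and distance, so this
    quotient implicitly encodes facial 3D structure without any reconstruction.
    Both inputs are assumed to be aligned grayscale images in [0, 1].
    """
    return (img_flash + eps) / (img_ambient + eps)

def liveness_score(feature):
    """Toy score: a flat printed photo yields a much smoother quotient image than
    a real face; the paper instead feeds such features to a learned classifier."""
    gy, gx = np.gradient(feature)
    return float(np.mean(np.hypot(gx, gy)))
```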
Using Machine Learning to Improve Cylindrical Algebraic Decomposition
Title | Using Machine Learning to Improve Cylindrical Algebraic Decomposition |
Authors | Zongyan Huang, Matthew England, David Wilson, James H. Davenport, Lawrence C. Paulson |
Abstract | Cylindrical Algebraic Decomposition (CAD) is a key tool in computational algebraic geometry, best known as a procedure to enable Quantifier Elimination over real-closed fields. However, it has a worst case complexity doubly exponential in the size of the input, which is often encountered in practice. It has been observed that for many problems a change in algorithm settings or problem formulation can cause huge differences in runtime costs, changing problem instances from intractable to easy. A number of heuristics have been developed to help with such choices, but the complicated nature of the geometric relationships involved means these are imperfect and can sometimes make poor choices. We investigate the use of machine learning (specifically support vector machines) to make such choices instead. Machine learning is the process of fitting a computer model to a complex function based on properties learned from measured data. In this paper we apply it in two case studies: the first to select between heuristics for choosing a CAD variable ordering; the second to identify when a CAD problem instance would benefit from Groebner Basis preconditioning. These appear to be the first such applications of machine learning to Symbolic Computation. We demonstrate in both cases that the machine learned choice outperforms human developed heuristics. |
Tasks | |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10520v1 |
PDF | http://arxiv.org/pdf/1804.10520v1.pdf |
PWC | https://paperswithcode.com/paper/using-machine-learning-to-improve-cylindrical |
Repo | |
Framework | |
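A minimal sketch of the machine-learned choice using scikit-learn's SVC: features of a polynomial system are mapped to the heuristic expected to give the best CAD variable ordering. The particular features, labels, and kernel settings below are illustrative assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features of a polynomial system (e.g. number of polynomials,
# maximum degree, per-variable occurrence counts) and, for each training
# problem, the index of the heuristic whose variable ordering was fastest.
X_train = np.array([[3, 2, 5, 1, 0],
                    [7, 4, 2, 2, 3],
                    [2, 6, 0, 4, 1],
                    [5, 3, 3, 3, 2]], dtype=float)
y_train = np.array([0, 1, 1, 2])          # three candidate heuristics

# A support vector machine, as in the paper's approach; kernel choice is assumed.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

new_problem = np.array([[4, 3, 1, 2, 2]], dtype=float)
chosen_heuristic = int(clf.predict(new_problem)[0])
```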
Policy Certificates: Towards Accountable Reinforcement Learning
Title | Policy Certificates: Towards Accountable Reinforcement Learning |
Authors | Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill |
Abstract | The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches (and in some settings slightly improves upon) existing minimax regret bounds. |
Tasks | |
Published | 2018-11-07 |
URL | https://arxiv.org/abs/1811.03056v3 |
PDF | https://arxiv.org/pdf/1811.03056v3.pdf |
PWC | https://paperswithcode.com/paper/policy-certificates-towards-accountable |
Repo | |
Framework | |
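One way to read the certificate idea in symbols (an informal paraphrase, not the paper's exact definitions): if, before episode $k$, the algorithm outputs a lower bound on the return of its policy $\pi_k$ and an optimistic upper bound on the optimal return, both holding with high probability, then their gap certifies the sub-optimality of $\pi_k$:

```latex
\underline{V}_k(s_1) \le V^{\pi_k}(s_1)
\quad\text{and}\quad
\overline{V}_k(s_1) \ge V^{*}(s_1)
\;\Longrightarrow\;
V^{*}(s_1) - V^{\pi_k}(s_1) \;\le\; \overline{V}_k(s_1) - \underline{V}_k(s_1) \;=:\; \epsilon_k .
```

A human can then intervene whenever the certificate $\epsilon_k$ exceeds an acceptable threshold.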
Learning Pixel-wise Labeling from the Internet without Human Interaction
Title | Learning Pixel-wise Labeling from the Internet without Human Interaction |
Authors | Yun Liu, Yujun Shi, JiaWang Bian, Le Zhang, Ming-Ming Cheng, Jiashi Feng |
Abstract | Deep learning stands at the forefront in many computer vision tasks. However, deep neural networks are usually data-hungry and require a huge amount of well-annotated training samples. Collecting sufficient annotated data is very expensive in many applications, especially for pixel-level prediction tasks such as semantic segmentation. To solve this fundamental issue, we consider a new challenging vision task, Internetly supervised semantic segmentation, which only uses Internet data with noisy image-level supervision of corresponding query keywords for segmentation model training. We address this task by proposing the following solution. A class-specific attention model unifying multiscale forward and backward convolutional features is proposed to provide initial segmentation “ground truth”. The model trained with such noisy annotations is then improved by an online fine-tuning procedure. It achieves state-of-the-art performance under the weakly-supervised setting on PASCAL VOC2012 dataset. The proposed framework also paves a new way towards learning from the Internet without human interaction and could serve as a strong baseline therein. Code and data will be released upon the paper acceptance. |
Tasks | Semantic Segmentation |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07548v1 |
PDF | http://arxiv.org/pdf/1805.07548v1.pdf |
PWC | https://paperswithcode.com/paper/learning-pixel-wise-labeling-from-the |
Repo | |
Framework | |
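A small sketch of how class-specific attention maps might be turned into the initial segmentation "ground truth" mentioned above: each pixel takes the arg-max attention class, with low-confidence pixels mapped to background or ignored. The thresholds and the ignore-label convention are assumptions; the paper's attention model and online fine-tuning procedure are not shown.

```python
import numpy as np

def attention_to_pseudo_labels(attention, fg_thresh=0.5, low_thresh=0.3, ignore_index=255):
    """Turn class-specific attention maps into an initial pixel-wise label map.

    attention: (num_classes, H, W) maps from the class-specific attention model,
    one per query keyword class present in the image. Pixels whose best score is
    below `fg_thresh` become background (label 0).
    """
    best_class = attention.argmax(axis=0) + 1      # reserve 0 for background
    best_score = attention.max(axis=0)
    labels = np.where(best_score >= fg_thresh, best_class, 0)
    # Mark ambiguous pixels so the training loss can ignore them.
    ambiguous = (best_score > low_thresh) & (best_score < fg_thresh)
    return np.where(ambiguous, ignore_index, labels).astype(np.int64)
```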
Explore-Exploit: A Framework for Interactive and Online Learning
Title | Explore-Exploit: A Framework for Interactive and Online Learning |
Authors | Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin |
Abstract | Interactive user interfaces need to continuously evolve based on the interactions that a user has (or does not have) with the system. This may require constant exploration of various options that the system may have for the user and obtaining signals of user preferences on those. However, such an exploration, especially when the set of available options itself can change frequently, can lead to sub-optimal user experiences. We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting that minimizes regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalization ranking, candidate selection and active learning. We demonstrate how to integrate this framework with run-time services to leverage online and interactive machine learning out-of-the-box. We also present results demonstrating the efficiencies that can be achieved using the Explore-Exploit framework. |
Tasks | Active Learning |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00116v1 |
PDF | http://arxiv.org/pdf/1812.00116v1.pdf |
PWC | https://paperswithcode.com/paper/explore-exploit-a-framework-for-interactive |
Repo | |
Framework | |
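As one concrete instance of an explore-exploit operator for candidate selection, here is an epsilon-greedy sketch with incremental reward estimates updated from user feedback. The framework's actual operators, ranking tasks, and run-time integration are not reproduced; the function names and update rule are assumptions.

```python
import random

def explore_exploit_select(candidates, estimated_reward, epsilon=0.1):
    """Pick a candidate to show the user: explore a random option with
    probability epsilon, otherwise exploit the current best estimate."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: estimated_reward.get(c, 0.0))

def update_estimate(estimated_reward, counts, candidate, reward):
    """Incrementally fold observed user feedback into the running reward estimate."""
    counts[candidate] = counts.get(candidate, 0) + 1
    old = estimated_reward.get(candidate, 0.0)
    estimated_reward[candidate] = old + (reward - old) / counts[candidate]
```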
OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages
Title | OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages |
Authors | Jyoti Jha, Sreekavitha Parupalli, Navjyoti Singh |
Abstract | Following approaches for understanding lexical meaning developed by Yaska, Patanjali and Bhartrihari from Indian linguistic traditions, and extending approaches developed by Leibniz and Brentano in modern times, a framework of formal ontology of language was developed. This framework proposes that the meanings of words are in-formed by intrinsic and extrinsic ontological structures. The paper aims to capture such intrinsic and extrinsic meanings of words for two major Indian languages, namely Hindi and Telugu. Parts-of-speech have been rendered into sense-types and sense-classes. Using them we have developed a gold-standard annotated lexical resource to support semantic understanding of a language. The resource contains a collection of Hindi and Telugu lexicons, which have been manually annotated by native speakers of the languages following our annotation guidelines. Further, the resource was utilised to derive the adverbial sense-class distribution of verbs and the karaka-verb sense-type distribution. Different corpora (news, novels) were compared using the verb sense-type distribution. Word embeddings were used as an aid for the enrichment of the resource. This is a work in progress that aims at extensive lexical coverage of the languages. |
Tasks | |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00694v1 |
PDF | http://arxiv.org/pdf/1808.00694v1.pdf |
PWC | https://paperswithcode.com/paper/ontosensenet-a-verb-centric-ontological |
Repo | |
Framework | |
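A toy sketch of the kind of statistic derived from the resource, namely the sense-type distribution of annotated verbs (the lexicon entries and sense-type names below are made up for illustration; the actual annotation scheme follows the paper's guidelines):

```python
from collections import Counter

# A toy annotated lexicon: each verb entry carries the sense-type assigned by a
# native-speaker annotator. Lemmas and sense-type names here are invented.
annotated_verbs = [
    {"lemma": "karna", "sense_type": "means/end"},
    {"lemma": "jana",  "sense_type": "locus"},
    {"lemma": "dena",  "sense_type": "means/end"},
]

def sense_type_distribution(entries):
    """Relative frequency of each verb sense-type, the kind of statistic used to
    compare corpora (e.g. news vs. novels) with the resource."""
    counts = Counter(e["sense_type"] for e in entries)
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}
```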