Paper Group ANR 490
Realtime Time Synchronized Event-based Stereo
Title | Realtime Time Synchronized Event-based Stereo |
Authors | Alex Zihao Zhu, Yibo Chen, Kostas Daniilidis |
Abstract | In this work, we propose a novel event-based stereo method which addresses the problem of motion blur for a moving event camera. Our method uses the velocity of the camera and a range of disparities to synchronize the positions of the events, as if they were captured at a single point in time. We represent these events using a pair of novel time-synchronized event disparity volumes, which we show remove motion blur for pixels at the correct disparity in the volume, while further blurring pixels at the wrong disparity. We then apply a novel matching cost over these time-synchronized event disparity volumes, which rewards similarity between the volumes while penalizing blurriness. We show that our method outperforms more expensive, smoothing-based event stereo methods by evaluating on the Multi Vehicle Stereo Event Camera dataset. |
Tasks | |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09025v2 |
PDF | http://arxiv.org/pdf/1803.09025v2.pdf |
PWC | https://paperswithcode.com/paper/realtime-time-synchronized-event-based-stereo |
Repo | |
Framework | |
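To make the time-synchronization idea concrete, here is a minimal NumPy sketch that warps events to a reference time using the camera's pixel velocity and builds a per-disparity event-count volume. It is only an illustration under simplifying assumptions (constant pixel velocity, a plain count volume, made-up function and argument names); the paper's actual disparity volumes and matching cost are more involved.

```python
import numpy as np

def time_synchronized_volume(events, t_ref, pixel_velocity, disparities, height, width):
    """Build a per-disparity event-count volume after motion compensation.

    events: array of shape (N, 3) with columns (x, y, t).
    pixel_velocity: (vx, vy) in pixels per second, assumed constant over the window.
    disparities: sequence of candidate disparities, applied as horizontal shifts.
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    vx, vy = pixel_velocity
    # Warp every event to the reference time so camera motion no longer smears it.
    x_sync = x - vx * (t - t_ref)
    y_sync = y - vy * (t - t_ref)

    volume = np.zeros((len(disparities), height, width))
    for k, d in enumerate(disparities):
        # A disparity hypothesis is a horizontal shift of the synchronized events;
        # at the correct disparity the events from both cameras align sharply.
        xs = np.round(x_sync - d).astype(int)
        ys = np.round(y_sync).astype(int)
        valid = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        np.add.at(volume[k], (ys[valid], xs[valid]), 1.0)
    return volume
```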
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
Title | A Joint Sequence Fusion Model for Video Question Answering and Retrieval |
Authors | Youngjae Yu, Jongseok Kim, Gunhee Kim |
Abstract | We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pairs of multimodal sequence data (e.g. a video clip and a language sentence). Our multimodal matching network consists of two key components. First, the Joint Semantic Tensor composes a dense pairwise representation of two sequence data into a 3D tensor. Then, the Convolutional Hierarchical Decoder computes their similarity score by discovering hidden hierarchical matches between the two sequence modalities. Both modules leverage hierarchical attention mechanisms that learn to promote well-matched representation patterns while pruning out misaligned ones in a bottom-up manner. Although JSFusion is a universal model applicable to any multimodal sequence data, this work focuses on video-language tasks including multimodal retrieval and video QA. We evaluate the JSFusion model on three retrieval and VQA tasks in LSMDC, for which our model achieves the best performance reported so far. We also perform multiple-choice and movie retrieval tasks on the MSR-VTT dataset, on which our approach outperforms many state-of-the-art methods. |
Tasks | Question Answering, Semantic Similarity, Semantic Textual Similarity, Video Question Answering, Video Retrieval, Visual Question Answering |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02559v1 |
PDF | http://arxiv.org/pdf/1808.02559v1.pdf |
PWC | https://paperswithcode.com/paper/a-joint-sequence-fusion-model-for-video |
Repo | |
Framework | |
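A minimal sketch of the pairwise composition step: per-frame and per-word features are combined into a 3D tensor, here with a simple Hadamard product and an averaging placeholder standing in for the learned Joint Semantic Tensor and Convolutional Hierarchical Decoder. All shapes, names, and the combination rule are assumptions for illustration.

```python
import numpy as np

def joint_semantic_tensor(video_feats, word_feats):
    """Dense pairwise composition of two sequences into a 3D tensor.

    video_feats: (T, D) per-frame features; word_feats: (L, D) per-word features.
    Returns a (T, L, D) tensor whose slice [t, l] combines frame t and word l.
    A Hadamard product is used here; the paper learns this composition.
    """
    return video_feats[:, None, :] * word_feats[None, :, :]

def toy_similarity(tensor):
    """Collapse the joint tensor to a scalar clip-sentence score.

    The real model uses a Convolutional Hierarchical Decoder with learned
    attention; averaging is only a placeholder to make the sketch runnable.
    """
    return float(tensor.sum(axis=-1).mean())

# Usage with random stand-in features.
V = np.random.randn(12, 64)   # 12 video frames
W = np.random.randn(7, 64)    # 7 words
score = toy_similarity(joint_semantic_tensor(V, W))
```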
Attention to Head Locations for Crowd Counting
Title | Attention to Head Locations for Crowd Counting |
Authors | Youmei Zhang, Chunluan Zhou, Faliang Chang, Alex C. Kot |
Abstract | Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations, which are the most important cue for crowd counting. The attention model estimates a probability map in which high probabilities indicate locations where heads are likely to be present. The estimated probability map is used to suppress non-head regions in feature maps from several multi-scale feature extraction branches of a convolutional neural network for crowd density estimation, which makes our method robust to complex backgrounds, scale variations and non-uniform distributions. In addition, we introduce a relative deviation loss to complement a commonly used training loss, the Euclidean distance, and improve the accuracy of sparse crowd density estimation. Experiments on Shanghai-Tech, UCF_CC_50 and World-Expo’10 data sets demonstrate the effectiveness of our method. |
Tasks | Crowd Counting, Density Estimation |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10287v1 |
PDF | http://arxiv.org/pdf/1806.10287v1.pdf |
PWC | https://paperswithcode.com/paper/attention-to-head-locations-for-crowd |
Repo | |
Framework | |
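A small sketch, under assumed shapes, of how a head-location probability map can gate multi-scale feature maps, together with a relative-deviation-style counting loss. The paper's attention model is learned and its exact loss formulation may differ; the functions below are illustrative only.

```python
import numpy as np

def attend_features(branch_features, head_prob):
    """Suppress non-head regions in multi-scale branch features.

    branch_features: list of (C, H, W) feature maps from different scales.
    head_prob: (H, W) probability map; high values where heads are likely.
    """
    return [f * head_prob[None, :, :] for f in branch_features]

def relative_deviation_loss(pred_count, gt_count, eps=1.0):
    """Relative counting error, intended to complement the Euclidean loss and
    improve accuracy on sparse crowds (the exact form here is an assumption)."""
    return abs(pred_count - gt_count) / (gt_count + eps)
```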
Inception-Residual Block based Neural Network for Thermal Image Denoising
Title | Inception-Residual Block based Neural Network for Thermal Image Denoising |
Authors | Seongmin Hwang, Gwanghyun Yu, Huy Toan Nguyen, Nazeer Shahid, Doseong Sin, Jinyoung Kim, Seungyou Na |
Abstract | Thermal cameras produce noisy images due to their limited thermal resolution, especially for scenes with a low temperature difference. To deal with this noise problem, this paper proposes a novel neural network architecture with repeatable denoising inception-residual blocks (DnIRB) for noise learning. Each DnIRB has two sub-blocks with different receptive fields and one shortcut connection to prevent the vanishing gradient problem. The proposed approach is tested on thermal images. The experimental results indicate that the proposed approach shows the best SQNR performance and reasonable processing time compared with state-of-the-art denoising methods. |
Tasks | Denoising, Image Denoising, Thermal Image Denoising |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13169v2 |
PDF | http://arxiv.org/pdf/1810.13169v2.pdf |
PWC | https://paperswithcode.com/paper/inception-residual-block-based-neural-network |
Repo | |
Framework | |
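A structural sketch of a denoising inception-residual block under stand-in operations: fixed box filters of two sizes emulate the two sub-blocks with different receptive fields, and a shortcut connection adds the fused result back to the input. The real blocks use learned convolutions and are trained end-to-end for noise learning; nothing below reproduces the paper's architecture exactly.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dn_irb(x, small=3, large=5):
    """One denoising inception-residual block (DnIRB), structurally sketched.

    Two parallel sub-blocks with different receptive fields (box filters of size
    `small` and `large` stand in for learned convolutions) are fused, and a
    shortcut connection adds the result back to the input to ease gradient flow.
    """
    branch_a = uniform_filter(x, size=small)
    branch_b = uniform_filter(x, size=large)
    residual = 0.5 * (branch_a + branch_b) - x   # stands in for the learned residual mapping
    return x + residual                          # shortcut connection around the sub-blocks

def denoise_sketch(noisy, num_blocks=4):
    """Stack repeatable DnIRBs; in the paper the stack is trained end-to-end to
    learn the noise, whereas here the fixed filters simply smooth the input as
    a structural placeholder."""
    h = np.asarray(noisy, dtype=float)
    for _ in range(num_blocks):
        h = dn_irb(h)
    return h
```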
On the Complexity of the Weighted Fused Lasso
Title | On the Complexity of the Weighted Fused Lasso |
Authors | Jose Bento, Ralph Furmaniak, Surjyendu Ray |
Abstract | The solution path of the 1D fused lasso for an $n$-dimensional input is piecewise linear with $\mathcal{O}(n)$ segments (Hoefling et al. 2010 and Tibshirani et al. 2011). However, existing proofs of this bound do not hold for the weighted fused lasso. At the same time, results for the generalized lasso, of which the weighted fused lasso is a special case, allow $\Omega(3^n)$ segments (Mairal et al. 2012). In this paper, we prove that the number of segments in the solution path of the weighted fused lasso is $\mathcal{O}(n^2)$, and that, for some instances, it is $\Omega(n^2)$. We also give a new, very simple proof of the $\mathcal{O}(n)$ bound for the fused lasso. |
Tasks | |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.04987v3 |
PDF | http://arxiv.org/pdf/1801.04987v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-the-weighted-fused-lasso |
Repo | |
Framework | |
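For reference, one common form of the weighted 1D fused lasso objective whose solution path the paper studies (the paper's exact formulation may differ, e.g. by including per-coordinate $\ell_1$ penalties):

```latex
\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\sum_{i=1}^{n} (y_i - x_i)^2
  \;+\; \lambda \sum_{i=1}^{n-1} w_i \, \lvert x_{i+1} - x_i \rvert ,
\qquad w_i > 0 ,
```

where the solution $x(\lambda)$ traced over $\lambda \ge 0$ is the piecewise-linear path in question; setting $w_i \equiv 1$ recovers the standard 1D fused lasso with its $\mathcal{O}(n)$ segments.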
How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature
Title | How Developers Iterate on Machine Learning Workflows – A Survey of the Applied Machine Learning Literature |
Authors | Doris Xin, Litian Ma, Shuchen Song, Aditya Parameswaran |
Abstract | Machine learning workflow development is anecdotally regarded as an iterative process of trial-and-error with humans-in-the-loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine learning systems. To this end, we conduct a small-scale survey of the applied machine learning literature from five distinct application domains. We collect and distill statistics on the role of iteration within machine learning workflow development, and report preliminary trends and insights from our investigation, as a starting point towards this benchmark. Based on our findings, we finally describe desiderata for effective and versatile human-in-the-loop machine learning systems that can cater to users in diverse domains. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.10311v2 |
PDF | http://arxiv.org/pdf/1803.10311v2.pdf |
PWC | https://paperswithcode.com/paper/how-developers-iterate-on-machine-learning |
Repo | |
Framework | |
Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
Title | Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures |
Authors | Sergey Bartunov, Adam Santoro, Blake A. Richards, Luke Marris, Geoffrey E. Hinton, Timothy Lillicrap |
Abstract | The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets and explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and explore performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04587v2 |
PDF | http://arxiv.org/pdf/1807.04587v2.pdf |
PWC | https://paperswithcode.com/paper/assessing-the-scalability-of-biologically |
Repo | |
Framework | |
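As a concrete example of one of the biologically motivated alternatives evaluated, here is a minimal NumPy sketch of a single feedback alignment (FA) update for a two-layer network: the backward pass uses a fixed random matrix in place of the transposed forward weights, avoiding weight transport. The layer sizes, tanh nonlinearity, and squared-error signal are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network trained with feedback alignment (FA).
n_in, n_hid, n_out = 20, 32, 5
W1 = rng.normal(0, 0.1, (n_hid, n_in))
W2 = rng.normal(0, 0.1, (n_out, n_hid))
B  = rng.normal(0, 0.1, (n_hid, n_out))   # fixed random feedback weights, never trained

def fa_step(x, y, lr=0.01):
    """One feedback-alignment update on a single example (squared-error sketch)."""
    h = np.tanh(W1 @ x)          # hidden activity
    y_hat = W2 @ h               # linear output layer
    e = y_hat - y                # output error
    # Backprop would use W2.T @ e; FA uses the fixed random matrix B instead.
    delta_h = (B @ e) * (1.0 - h ** 2)
    W2[...] -= lr * np.outer(e, h)
    W1[...] -= lr * np.outer(delta_h, x)

x = rng.normal(size=n_in)
y = rng.normal(size=n_out)
fa_step(x, y)
```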
Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification
Title | Feature Affinity based Pseudo Labeling for Semi-supervised Person Re-identification |
Authors | Guodong Ding, Shanshan Zhang, Salman Khan, Zhenmin Tang, Jian Zhang, Fatih Porikli |
Abstract | Person re-identification aims to match a person’s identity across multiple camera streams. Deep neural networks have been successfully applied to the challenging person re-identification task. One remarkable bottleneck is that the existing deep models are data-hungry and require large amounts of labeled training data. Acquiring manual annotations for pedestrian identity matches in large-scale surveillance camera installations is a highly cumbersome task. Here, we propose the first semi-supervised approach that performs pseudo-labeling by considering complex relationships between unlabeled and labeled training samples in the feature space. Our approach first approximates the actual data manifold by learning a generative model via adversarial training. Given the trained model, data augmentation can be performed by generating new synthetic data samples which are unlabeled. An open research problem is how to effectively use this additional data for improved feature learning. To this end, this work proposes a novel Feature Affinity based Pseudo-Labeling (FAPL) approach with two possible label encodings under a unified setting. Our approach measures the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks. FAPL trains with the joint supervision of cross-entropy loss together with a center regularization term, which not only ensures discriminative feature representation learning but also simultaneously predicts pseudo-labels for unlabeled data. Our extensive experiments on two standard large-scale datasets, Market-1501 and DukeMTMC-reID, demonstrate significant performance boosts over closely related competitors and show that our approach outperforms state-of-the-art person re-identification techniques in most cases. |
Tasks | Data Augmentation, Person Re-Identification, Representation Learning, Semi-Supervised Person Re-Identification |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06118v1 |
PDF | http://arxiv.org/pdf/1805.06118v1.pdf |
PWC | https://paperswithcode.com/paper/feature-affinity-based-pseudo-labeling-for |
Repo | |
Framework | |
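A simplified sketch of the feature-affinity idea: unlabeled (e.g. GAN-generated) samples receive the identity of the nearest labeled class center in feature space, and a center-style regularizer pulls features toward their centers. The paper's two label encodings and its joint training objective are only loosely approximated here; every function name and formula below is an assumption.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature (center) of each labeled identity (each class assumed present)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def pseudo_label(unlabeled_feats, centers):
    """Assign each unlabeled sample the identity of its nearest center in feature
    space -- a simplified stand-in for FAPL's affinity-based label encodings."""
    d = np.linalg.norm(unlabeled_feats[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)

def center_regularization(features, labels, centers):
    """Center-loss-style term pulling features toward their class centers."""
    return float(((features - centers[labels]) ** 2).sum(axis=1).mean())
```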
Learning Deep Similarity Metric for 3D MR-TRUS Registration
Title | Learning Deep Similarity Metric for 3D MR-TRUS Registration |
Authors | Grant Haskins, Jochen Kruecker, Uwe Kruger, Sheng Xu, Peter A. Pinto, Brad J. Wood, Pingkun Yan |
Abstract | Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images for guiding targeted prostate biopsy has significantly improved the biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image registration. However, it is very challenging to obtain a robust automatic MR-TRUS registration due to the large appearance difference between the two imaging modalities. The work presented in this paper aims to tackle this problem by addressing two challenges: (i) the definition of a suitable similarity metric and (ii) the determination of a suitable optimization strategy. Methods: This work proposes the use of a deep convolutional neural network to learn a similarity metric for MR-TRUS registration. We also use a composite optimization strategy that explores the solution space in order to search for a suitable initialization for the second-order optimization of the learned metric. Further, a multi-pass approach is used in order to smooth the metric for optimization. Results: The learned similarity metric outperforms the classical mutual information and also the state-of-the-art MIND feature based methods. The results indicate that the overall registration framework has a large capture range. The proposed deep similarity metric based approach obtained a mean TRE of 3.86 mm (with an initial TRE of 16 mm) for this challenging problem. Conclusion: A similarity metric that is learned using a deep neural network can be used to assess the quality of any given image registration and can be used in conjunction with the aforementioned optimization framework to perform automatic registration that is robust to poor initialization. |
Tasks | Image Registration |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04548v2 |
PDF | http://arxiv.org/pdf/1806.04548v2.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-similarity-metric-for-3d-mr |
Repo | |
Framework | |
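A schematic sketch of the composite optimization strategy: a coarse exhaustive search over transform parameters, scored by a (here dummy) learned similarity metric, supplies the initialization for second-order refinement via BFGS. The placeholder metric, the translation-only parameterization, and the grid are assumptions made to keep the sketch runnable, not the paper's implementation.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def learned_similarity(params, mr_img, trus_img):
    """Placeholder for the CNN-learned similarity metric: returns a scalar
    dissimilarity for a candidate transform (lower is better). A dummy
    quadratic is used here so the sketch runs end to end."""
    return float(np.sum(np.asarray(params) ** 2)) + 0.0 * mr_img.mean() * trus_img.mean()

def register(mr_img, trus_img, grid=np.linspace(-10, 10, 5)):
    """Composite strategy: score a coarse grid of translations to find a good
    initialization, then refine it with a second-order (quasi-Newton) method."""
    candidates = [np.array([tx, ty, tz]) for tx, ty, tz in product(grid, grid, grid)]
    best = min(candidates, key=lambda p: learned_similarity(p, mr_img, trus_img))
    result = minimize(learned_similarity, best, args=(mr_img, trus_img), method="BFGS")
    return result.x

mr, trus = np.zeros((64, 64, 32)), np.zeros((64, 64, 32))
params = register(mr, trus)
```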
Liveness Detection Using Implicit 3D Features
Title | Liveness Detection Using Implicit 3D Features |
Authors | J. Matias Di Martino, Qiang Qiu, Trishul Nagenalli, Guillermo Sapiro |
Abstract | Spoofing attacks are a threat to modern face recognition systems. In this work we present a simple yet effective liveness detection approach to enhance 2D face recognition methods and make them robust against spoofing attacks. We show that the risk of spoofing attacks can be reduced through the use of an additional source of light, for example a flash. From a pair of input images taken under different illumination, we define discriminative features that implicitly contain facial three-dimensional information. Furthermore, we show that when multiple sources of light are considered, we are able to validate which one has been activated. This makes possible the design of a highly secure active-light authentication framework. Finally, further investigating the use of 3D features without 3D reconstruction, we introduce an approximated disparity-based implicit 3D feature obtained from an uncalibrated stereo-pair of cameras. Validation experiments show that the proposed methods produce state-of-the-art results in challenging scenarios with nearly no feature extraction latency. |
Tasks | 3D Reconstruction, Face Recognition |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06702v2 |
PDF | http://arxiv.org/pdf/1804.06702v2.pdf |
PWC | https://paperswithcode.com/paper/liveness-detection-using-implicit-3d-features |
Repo | |
Framework | |
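A toy illustration, under strong simplifying assumptions, of an implicit 3D feature from a flash/no-flash pair: the per-pixel quotient image depends on surface orientation and distance to the flash, so its spatial variation differs between a real face and a flat spoof. The paper learns a classifier over such features; the hand-set score below is only a placeholder.

```python
import numpy as np

def implicit_3d_feature(img_ambient, img_flash, eps=1e-6):
    """Per-pixel ratio of flash to ambient intensity.

    Under the flash, shading depends on surface orientation and distance, so this
    quotient implicitly encodes facial 3D structure without any reconstruction.
    Both inputs are assumed to be aligned grayscale images in [0, 1].
    """
    return (img_flash + eps) / (img_ambient + eps)

def liveness_score(feature):
    """Toy score: a flat printed photo yields a much smoother quotient image than
    a real face; the paper instead feeds such features to a learned classifier."""
    gy, gx = np.gradient(feature)
    return float(np.mean(np.hypot(gx, gy)))
```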
Using Machine Learning to Improve Cylindrical Algebraic Decomposition
Title | Using Machine Learning to Improve Cylindrical Algebraic Decomposition |
Authors | Zongyan Huang, Matthew England, David Wilson, James H. Davenport, Lawrence C. Paulson |
Abstract | Cylindrical Algebraic Decomposition (CAD) is a key tool in computational algebraic geometry, best known as a procedure to enable Quantifier Elimination over real-closed fields. However, it has a worst case complexity doubly exponential in the size of the input, which is often encountered in practice. It has been observed that for many problems a change in algorithm settings or problem formulation can cause huge differences in runtime costs, changing problem instances from intractable to easy. A number of heuristics have been developed to help with such choices, but the complicated nature of the geometric relationships involved means these are imperfect and can sometimes make poor choices. We investigate the use of machine learning (specifically support vector machines) to make such choices instead. Machine learning is the process of fitting a computer model to a complex function based on properties learned from measured data. In this paper we apply it in two case studies: the first to select between heuristics for choosing a CAD variable ordering; the second to identify when a CAD problem instance would benefit from Groebner Basis preconditioning. These appear to be the first such applications of machine learning to Symbolic Computation. We demonstrate in both cases that the machine learned choice outperforms human developed heuristics. |
Tasks | |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10520v1 |
PDF | http://arxiv.org/pdf/1804.10520v1.pdf |
PWC | https://paperswithcode.com/paper/using-machine-learning-to-improve-cylindrical |
Repo | |
Framework | |
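A minimal sketch of the machine-learned choice using scikit-learn's SVC: features of a polynomial system are mapped to the heuristic expected to give the best CAD variable ordering. The particular features, labels, and kernel settings below are illustrative assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features of a polynomial system (e.g. number of polynomials,
# maximum degree, per-variable occurrence counts) and, for each training
# problem, the index of the heuristic whose variable ordering was fastest.
X_train = np.array([[3, 2, 5, 1, 0],
                    [7, 4, 2, 2, 3],
                    [2, 6, 0, 4, 1],
                    [5, 3, 3, 3, 2]], dtype=float)
y_train = np.array([0, 1, 1, 2])          # three candidate heuristics

# A support vector machine, as in the paper's approach; kernel choice is assumed.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

new_problem = np.array([[4, 3, 1, 2, 2]], dtype=float)
chosen_heuristic = int(clf.predict(new_problem)[0])
```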
Policy Certificates: Towards Accountable Reinforcement Learning
Title | Policy Certificates: Towards Accountable Reinforcement Learning |
Authors | Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill |
Abstract | The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches (and in some settings slightly improves upon) existing minimax regret bounds. |
Tasks | |
Published | 2018-11-07 |
URL | https://arxiv.org/abs/1811.03056v3 |
PDF | https://arxiv.org/pdf/1811.03056v3.pdf |
PWC | https://paperswithcode.com/paper/policy-certificates-towards-accountable |
Repo | |
Framework | |
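One way to read the certificate idea in symbols (an informal paraphrase, not the paper's exact definitions): if, before episode $k$, the algorithm outputs a lower bound on the return of its policy $\pi_k$ and an optimistic upper bound on the optimal return, both holding with high probability, then their gap certifies the sub-optimality of $\pi_k$:

```latex
\underline{V}_k(s_1) \le V^{\pi_k}(s_1)
\quad\text{and}\quad
\overline{V}_k(s_1) \ge V^{*}(s_1)
\;\Longrightarrow\;
V^{*}(s_1) - V^{\pi_k}(s_1) \;\le\; \overline{V}_k(s_1) - \underline{V}_k(s_1) \;=:\; \epsilon_k .
```

A human can then intervene whenever the certificate $\epsilon_k$ exceeds an acceptable threshold.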
Learning Pixel-wise Labeling from the Internet without Human Interaction
Title | Learning Pixel-wise Labeling from the Internet without Human Interaction |
Authors | Yun Liu, Yujun Shi, JiaWang Bian, Le Zhang, Ming-Ming Cheng, Jiashi Feng |
Abstract | Deep learning stands at the forefront in many computer vision tasks. However, deep neural networks are usually data-hungry and require a huge amount of well-annotated training samples. Collecting sufficient annotated data is very expensive in many applications, especially for pixel-level prediction tasks such as semantic segmentation. To solve this fundamental issue, we consider a new challenging vision task, Internetly supervised semantic segmentation, which only uses Internet data with noisy image-level supervision of corresponding query keywords for segmentation model training. We address this task by proposing the following solution. A class-specific attention model unifying multiscale forward and backward convolutional features is proposed to provide initial segmentation “ground truth”. The model trained with such noisy annotations is then improved by an online fine-tuning procedure. It achieves state-of-the-art performance under the weakly-supervised setting on PASCAL VOC2012 dataset. The proposed framework also paves a new way towards learning from the Internet without human interaction and could serve as a strong baseline therein. Code and data will be released upon the paper acceptance. |
Tasks | Semantic Segmentation |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07548v1 |
PDF | http://arxiv.org/pdf/1805.07548v1.pdf |
PWC | https://paperswithcode.com/paper/learning-pixel-wise-labeling-from-the |
Repo | |
Framework | |
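A small sketch of how class-specific attention maps might be turned into the initial segmentation "ground truth" mentioned above: each pixel takes the arg-max attention class, with low-confidence pixels mapped to background or ignored. The thresholds and the ignore-label convention are assumptions; the paper's attention model and online fine-tuning procedure are not shown.

```python
import numpy as np

def attention_to_pseudo_labels(attention, fg_thresh=0.5, low_thresh=0.3, ignore_index=255):
    """Turn class-specific attention maps into an initial pixel-wise label map.

    attention: (num_classes, H, W) maps from the class-specific attention model,
    one per query keyword class present in the image. Pixels whose best score is
    below `fg_thresh` become background (label 0).
    """
    best_class = attention.argmax(axis=0) + 1      # reserve 0 for background
    best_score = attention.max(axis=0)
    labels = np.where(best_score >= fg_thresh, best_class, 0)
    # Mark ambiguous pixels so the training loss can ignore them.
    ambiguous = (best_score > low_thresh) & (best_score < fg_thresh)
    return np.where(ambiguous, ignore_index, labels).astype(np.int64)
```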
Explore-Exploit: A Framework for Interactive and Online Learning
Title | Explore-Exploit: A Framework for Interactive and Online Learning |
Authors | Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin |
Abstract | Interactive user interfaces need to continuously evolve based on the interactions that a user has (or does not have) with the system. This may require constant exploration of various options that the system may have for the user and obtaining signals of user preferences on those. However, such an exploration, especially when the set of available options itself can change frequently, can lead to sub-optimal user experiences. We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting that minimizes regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalization ranking, candidate selection and active learning. We demonstrate how to integrate this framework with run-time services to leverage online and interactive machine learning out-of-the-box. We also present results demonstrating the efficiencies that can be achieved using the Explore-Exploit framework. |
Tasks | Active Learning |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00116v1 |
PDF | http://arxiv.org/pdf/1812.00116v1.pdf |
PWC | https://paperswithcode.com/paper/explore-exploit-a-framework-for-interactive |
Repo | |
Framework | |
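As one concrete instance of an explore-exploit operator for candidate selection, here is an epsilon-greedy sketch with incremental reward estimates updated from user feedback. The framework's actual operators, ranking tasks, and run-time integration are not reproduced; the function names and update rule are assumptions.

```python
import random

def explore_exploit_select(candidates, estimated_reward, epsilon=0.1):
    """Pick a candidate to show the user: explore a random option with
    probability epsilon, otherwise exploit the current best estimate."""
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda c: estimated_reward.get(c, 0.0))

def update_estimate(estimated_reward, counts, candidate, reward):
    """Incrementally fold observed user feedback into the running reward estimate."""
    counts[candidate] = counts.get(candidate, 0) + 1
    old = estimated_reward.get(candidate, 0.0)
    estimated_reward[candidate] = old + (reward - old) / counts[candidate]
```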
OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages
Title | OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages |
Authors | Jyoti Jha, Sreekavitha Parupalli, Navjyoti Singh |
Abstract | Following approaches for understanding lexical meaning developed by Yaska, Patanjali and Bhartrihari from Indian linguistic traditions, and extending approaches developed by Leibniz and Brentano in modern times, a framework of formal ontology of language was developed. This framework proposes that the meanings of words are in-formed by intrinsic and extrinsic ontological structures. The paper aims to capture such intrinsic and extrinsic meanings of words for two major Indian languages, namely Hindi and Telugu. Parts-of-speech have been rendered into sense-types and sense-classes. Using them we have developed a gold-standard annotated lexical resource to support semantic understanding of a language. The resource contains a collection of Hindi and Telugu lexicons, which have been manually annotated by native speakers of the languages following our annotation guidelines. Further, the resource was utilised to derive the adverbial sense-class distribution of verbs and the karaka-verb sense-type distribution. Different corpora (news, novels) were compared using the verb sense-type distribution. Word embeddings were used as an aid for the enrichment of the resource. This is a work in progress that aims at extensive lexical coverage of the languages. |
Tasks | |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00694v1 |
PDF | http://arxiv.org/pdf/1808.00694v1.pdf |
PWC | https://paperswithcode.com/paper/ontosensenet-a-verb-centric-ontological |
Repo | |
Framework | |
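A toy sketch of the kind of statistic derived from the resource, namely the sense-type distribution of annotated verbs (the lexicon entries and sense-type names below are made up for illustration; the actual annotation scheme follows the paper's guidelines):

```python
from collections import Counter

# A toy annotated lexicon: each verb entry carries the sense-type assigned by a
# native-speaker annotator. Lemmas and sense-type names here are invented.
annotated_verbs = [
    {"lemma": "karna", "sense_type": "means/end"},
    {"lemma": "jana",  "sense_type": "locus"},
    {"lemma": "dena",  "sense_type": "means/end"},
]

def sense_type_distribution(entries):
    """Relative frequency of each verb sense-type, the kind of statistic used to
    compare corpora (e.g. news vs. novels) with the resource."""
    counts = Counter(e["sense_type"] for e in entries)
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}
```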