January 26, 2020


Paper Group ANR 1510


Analysis of Deep Networks for Monocular Depth Estimation Through Adversarial Attacks with Proposal of a Defense Method


Title	Analysis of Deep Networks for Monocular Depth Estimation Through Adversarial Attacks with Proposal of a Defense Method
Authors	Junjie Hu, Takayuki Okatani
Abstract	In this paper, we consider adversarial attacks against a system of monocular depth estimation (MDE) based on convolutional neural networks (CNNs). The motivation is two-fold. One is to study the security of MDE systems, which has not been actively considered in the community. The other is to improve our understanding of the computational mechanism of CNNs performing MDE. Toward this end, we apply the method recently proposed for visualization of MDE to defending attacks. It trains another CNN to predict a saliency map from an input image, such that the CNN for MDE continues to accurately estimate the depth map from the image with its non-salient part masked out. We report the following findings. First, unsurprisingly, attacks by IFGSM (or equivalently PGD) succeed in making the CNNs yield inaccurate depth estimates. Second, the attacks can be defended by masking out non-salient pixels, indicating that the attacks function by perturbing mostly non-salient pixels. However, the prediction of saliency maps is itself vulnerable to the attacks, even though it is not the direct target of the attacks. We show that the attacks can be defended by using a saliency map predicted by a CNN trained to be robust to the attacks. These results provide an effective defense method as well as a clue to understanding the computational mechanism of CNNs for MDE.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08790v1
PDF	https://arxiv.org/pdf/1911.08790v1.pdf
PWC	https://paperswithcode.com/paper/analysis-of-deep-networks-for-monocular-depth
Repo
Framework
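
The IFGSM attack named in the abstract above can be illustrated on a toy problem. The sketch below is not the authors' setup: the linear "model", the squared-error loss, and the values of `eps`, `alpha`, and `steps` are all assumptions chosen so the example runs with NumPy alone. It only shows the general pattern of repeated signed-gradient steps kept inside an L-infinity ball around the clean input.

```python
import numpy as np

def ifgsm(x, grad_fn, eps=0.03, alpha=0.005, steps=10):
    """Iterative FGSM: repeated signed-gradient ascent steps on the loss,
    projected back into the L-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the clean input
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image in [0, 1]
    return x_adv

# Hypothetical toy model (an assumption): a linear map w.x with squared-error loss.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
x_clean = rng.uniform(0.2, 0.8, size=16)
target = float(w @ x_clean) + 0.5  # offset so the initial loss and gradient are non-zero

def loss(x):
    return float((w @ x - target) ** 2)

def loss_grad(x):
    return 2.0 * (w @ x - target) * w

x_adv = ifgsm(x_clean, loss_grad)
print(loss(x_clean), loss(x_adv))
```

In a real monocular depth estimation pipeline the gradient would come from backpropagation through the CNN rather than from a closed-form expression.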

Improved Regret Bounds for Projection-free Bandit Convex Optimization


Title	Improved Regret Bounds for Projection-free Bandit Convex Optimization
Authors	Dan Garber, Ben Kretzu
Abstract	We revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are \textit{projection-free}, i.e., based on the conditional gradient method, whose only access to the feasible decision set is through a linear optimization oracle (as opposed to other methods which require potentially much more computationally-expensive subprocedures, such as computing Euclidean projections). We present the first such algorithm that attains $O(T^{3/4})$ expected regret using only $O(T)$ overall calls to the linear optimization oracle, in expectation, where $T$ is the number of prediction rounds. This improves over the $O(T^{4/5})$ expected regret bound recently obtained by \cite{Karbasi19}, and actually matches the current best regret bound for projection-free online learning in the \textit{full information} setting.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03374v1
PDF	https://arxiv.org/pdf/1910.03374v1.pdf
PWC	https://paperswithcode.com/paper/improved-regret-bounds-for-projection-free
Repo
Framework
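
To make "projection-free" concrete, here is a minimal sketch of the classical offline conditional gradient (Frank-Wolfe) method, which touches the feasible set only through a linear optimization oracle. It is not the bandit algorithm of the paper above: the probability simplex as the feasible set, the quadratic objective, the target `b`, and the step size `2 / (t + 2)` are all assumptions made for illustration.

```python
import numpy as np

def linear_oracle_simplex(grad):
    """Linear optimization oracle for the probability simplex:
    the minimiser of <grad, v> is the vertex at the smallest gradient entry."""
    v = np.zeros_like(grad)
    v[np.argmin(grad)] = 1.0
    return v

def frank_wolfe(grad_fn, x0, steps=200):
    """Conditional gradient method: no projection, only oracle calls."""
    x = x0.copy()
    for t in range(steps):
        v = linear_oracle_simplex(grad_fn(x))
        gamma = 2.0 / (t + 2.0)             # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * v   # convex combination keeps x feasible
    return x

b = np.array([0.1, 0.6, 0.3])               # hypothetical target inside the simplex

def objective(x):
    return float(np.sum((x - b) ** 2))

x0 = np.array([1.0, 0.0, 0.0])
x_fw = frank_wolfe(lambda x: 2.0 * (x - b), x0)
print(x_fw, objective(x_fw))
```

Each iteration costs one oracle call, which for the simplex is a single `argmin`; a Euclidean projection onto a more complicated set would typically be far more expensive.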

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded


Title	Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Authors	Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
Abstract	Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image. In this work, we propose a generic approach called Human Importance-aware Network Tuning (HINT) that effectively leverages human demonstrations to improve visual grounding. HINT encourages deep networks to be sensitive to the same input regions as humans. Our approach optimizes the alignment between human attention maps and gradient-based network importances - ensuring that models learn not just to look at but rather rely on visual concepts that humans found relevant for a task when making predictions. We apply HINT to Visual Question Answering and Image Captioning tasks, outperforming top approaches on splits that penalize over-reliance on language priors (VQA-CP and robust captioning) using human attention demonstrations for just 6% of the training data.
Tasks	Image Captioning, Question Answering, Visual Question Answering
Published	2019-02-11
URL	https://arxiv.org/abs/1902.03751v2
PDF	https://arxiv.org/pdf/1902.03751v2.pdf
PWC	https://paperswithcode.com/paper/taking-a-hint-leveraging-explanations-to-make
Repo
Framework

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection


Title	Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
Authors	Joonas Hämäläinen, Alisson S. C. Alencar, Tommi Kärkkäinen, César L. C. Mattos, Amauri H. Souza Júnior, João P. P. Gomes
Abstract	The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated concerning a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM’s generalization capability; furthermore, we assess several clustering-based methods in regression scenarios. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperformed the standard random selection of the original MLM formulation.
Tasks
Published	2019-09-22
URL	https://arxiv.org/abs/1909.09978v1
PDF	https://arxiv.org/pdf/1909.09978v1.pdf
PWC	https://paperswithcode.com/paper/190909978
Repo
Framework
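
The core idea in the MLM abstract above, a linear map between input-space and output-space distance matrices computed against reference points, can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the toy data `y = 2x`, the evenly spaced reference points, and the candidate-grid search used to recover an output from predicted distances are all assumptions.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n x d) and rows of b (k x d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def mlm_fit(X, Y, ref_idx):
    """Learn the linear map B between input- and output-space distance matrices."""
    R, T = X[ref_idx], Y[ref_idx]          # reference inputs and their outputs
    Dx, Dy = pairwise_dist(X, R), pairwise_dist(Y, T)
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return R, T, B

def mlm_predict(x, R, T, B, candidates):
    """Pick the candidate output whose distances to T best match the predicted ones.
    A candidate search stands in here for a continuous optimisation step."""
    delta = pairwise_dist(x[None, :], R) @ B          # predicted output-space distances
    mismatch = (pairwise_dist(candidates, T) - delta) ** 2
    return candidates[np.argmin(mismatch.sum(axis=1))]

X = np.linspace(0.0, 1.0, 21)[:, None]
Y = 2.0 * X                                # toy regression target y = 2x
R, T, B = mlm_fit(X, Y, ref_idx=np.arange(0, 21, 4))
grid = np.linspace(0.0, 2.0, 201)[:, None]
y_hat = mlm_predict(np.array([0.35]), R, T, B, grid)
print(y_hat)
```

Swapping `ref_idx` for indices chosen by a clustering method (for example, the training points nearest to cluster centroids) would mirror the kind of reference point selection the abstract evaluates.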

A Tree Pattern Matching Algorithm for XML Queries with Structural Preferences


Title	A Tree Pattern Matching Algorithm for XML Queries with Structural Preferences
Authors	Maurice Tchoupé Tchendji, Lionel Tadonfouet, Thomas Tébougang Tchendji
Abstract	In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When they are applied to a semi-structured document or to a document with an overly complex model, the lack or the ignorance of the explicit document model (DTD-Document Type Definition, Schema, etc.) increases the risk of obtaining an empty result set when the query is too specific, or, too large result set when it is too vague (e.g. it contains wildcards such as “*”). The reason is that in both cases, users write queries according to the document model they have in mind; this can be very far from the one that can actually be extracted from the document. Opposed to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluations. Indeed, during their evaluation, certain constraints (the preferences they contain) can be relaxed if necessary to avoid precisely empty results; moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by using an adaptation of the Skyline operator (defined in relational databases) in the context of documents (trees) to incrementally filter into the partial solutions set, those which satisfy the maximum of preferential constraints. The only restriction imposed on documents is No-Self-Containment.
Tasks
Published	2019-06-07
URL	https://arxiv.org/abs/1906.03053v1
PDF	https://arxiv.org/pdf/1906.03053v1.pdf
PWC	https://paperswithcode.com/paper/a-tree-pattern-matching-algorithm-for-xml
Repo
Framework
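
The Skyline operator mentioned in the abstract above keeps exactly those answers that no other answer dominates. The sketch below shows the relational version on plain score tuples, not the tree-adapted, incremental variant the paper describes. The scoring of matches by two preference constraints and the "higher is better" convention are assumptions.

```python
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and strictly better somewhere
    (higher score = better, an assumption of this sketch)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical partial matches, each scored on two preference constraints.
matches = [(1, 3), (3, 1), (2, 2), (1, 1), (0, 3)]
best = skyline(matches)
print(best)  # [(1, 3), (3, 1), (2, 2)]
```

Here `(1, 1)` is dropped because `(2, 2)` beats it on both constraints, and `(0, 3)` is dropped because `(1, 3)` ties on the second constraint and wins on the first.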

Research on the Concept of Liquid State Machine


Title	Research on the Concept of Liquid State Machine
Authors	Gideon Gbenga Oladipupo
Abstract	Liquid State Machine (LSM) is a neural model with real-time computations which transforms a time-varying input stream to a higher dimensional space. The concept of LSM is a novel field of research in biologically inspired computation, with most research effort on training the model as well as finding the optimum learning method. In this review, the performance of the LSM model was investigated using two learning methods: online learning and offline (batch) learning. The review revealed that optimal performance of LSM was recorded through the online method, as the computational space and other complexities associated with batch learning are eliminated.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03354v1
PDF	https://arxiv.org/pdf/1910.03354v1.pdf
PWC	https://paperswithcode.com/paper/research-on-the-concept-of-liquid-state
Repo
Framework

VayuAnukulani: Adaptive Memory Networks for Air Pollution Forecasting


Title	VayuAnukulani: Adaptive Memory Networks for Air Pollution Forecasting
Authors	Divyam Madaan, Radhika Dua, Prerana Mukherjee, Brejesh Lall
Abstract	Air pollution is the leading environmental health hazard globally due to various sources which include factory emissions, car exhaust and cooking stoves. As a precautionary measure, air pollution forecast serves as the basis for taking effective pollution control measures, and accurate air pollution forecasting has become an important task. In this paper, we forecast fine-grained ambient air quality information for 5 prominent locations in Delhi based on the historical and real-time ambient air quality and meteorological data reported by the Central Pollution Control Board. We present the VayuAnukulani system, a novel end-to-end solution to predict air quality for the next 24 hours by estimating the concentration and level of different air pollutants including nitrogen dioxide ($NO_2$) and particulate matter ($PM_{2.5}$ and $PM_{10}$) for Delhi. Extensive experiments on data sources obtained in Delhi demonstrate that the proposed adaptive attention based Bidirectional LSTM Network outperforms several baselines for classification and regression models. The accuracy of the proposed adaptive system is $\sim 15 - 20\%$ better than the same offline trained model. We compare the proposed methodology on several competing baselines, and show that the network outperforms conventional methods by $\sim 3 - 5\%$.
Tasks
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03977v1
PDF	http://arxiv.org/pdf/1904.03977v1.pdf
PWC	https://paperswithcode.com/paper/vayuanukulani-adaptive-memory-networks-for
Repo
Framework

Bipartite Conditional Random Fields for Panoptic Segmentation


Title	Bipartite Conditional Random Fields for Panoptic Segmentation
Authors	Sadeep Jayasumana, Kanchana Ranasinghe, Mayuka Jayawardhana, Sahan Liyanaarachchi, Harsha Ranasinghe
Abstract	We tackle the panoptic segmentation problem with a conditional random field (CRF) model. Panoptic segmentation involves assigning a semantic label and an instance label to each pixel of a given image. At each pixel, the semantic label and the instance label should be compatible. Furthermore, a good panoptic segmentation should have a number of other desirable properties such as the spatial and color consistency of the labeling (similar looking neighboring pixels should have the same semantic label and the instance label). To tackle this problem, we propose a CRF model, named Bipartite CRF or BCRF, with two types of random variables for semantic and instance labels. In this formulation, various energies are defined within and across the two types of random variables to encourage a consistent panoptic segmentation. We propose a mean-field-based efficient inference algorithm for solving the CRF and empirically show its convergence properties. This algorithm is fully differentiable, and therefore, BCRF inference can be included as a trainable module in a deep network. In the experimental evaluation, we quantitatively and qualitatively show that the BCRF yields superior panoptic segmentation results in practice.
Tasks	Panoptic Segmentation
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05307v1
PDF	https://arxiv.org/pdf/1912.05307v1.pdf
PWC	https://paperswithcode.com/paper/bipartite-conditional-random-fields-for
Repo
Framework

Multimodal Machine Translation through Visuals and Speech


Title	Multimodal Machine Translation through Visuals and Speech
Authors	Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
Abstract	Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio and visual modalities, respectively. These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance evaluation. The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality in both the input and output space.
Tasks	Image Captioning, Machine Translation, Multimodal Machine Translation, Speech Recognition, Video Captioning
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12798v1
PDF	https://arxiv.org/pdf/1911.12798v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-machine-translation-through
Repo
Framework

ROMark: A Robust Watermarking System Using Adversarial Training


Title	ROMark: A Robust Watermarking System Using Adversarial Training
Authors	Bingyang Wen, Sergul Aydore
Abstract	The availability and easy access to digital communication increase the risk of copyrighted material piracy. In order to detect illegal use or distribution of data, digital watermarking has been proposed as a suitable tool. It protects the copyright of digital content by embedding imperceptible information into the data in the presence of an adversary. The goal of the adversary is to remove the copyrighted content of the data. Therefore, an efficient watermarking framework must be robust to multiple image-processing operations known as attacks that can alter embedded copyright information. Another line of research, \textit{adversarial machine learning}, also tackles similar problems to guarantee robustness to imperceptible perturbations of the input. In this work, we propose to apply robust optimization from adversarial machine learning to improve the robustness of a CNN-based watermarking framework. Our experimental results on the COCO dataset show that the robustness of a watermarking framework can be improved by utilizing robust optimization in training.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01221v1
PDF	https://arxiv.org/pdf/1910.01221v1.pdf
PWC	https://paperswithcode.com/paper/romark-a-robust-watermarking-system-using
Repo
Framework

Uniform convergence may be unable to explain generalization in deep learning


Title	Uniform convergence may be unable to explain generalization in deep learning
Authors	Vaishnavh Nagarajan, J. Zico Kolter
Abstract	Aimed at explaining the surprisingly good generalization behavior of overparameterized deep networks, recent works have developed a variety of generalization bounds for deep learning, all based on the fundamental learning-theoretic technique of uniform convergence. While it is well-known that many of these existing bounds are numerically large, through numerous experiments, we bring to light a more concerning aspect of these bounds: in practice, these bounds can {\em increase} with the training dataset size. Guided by our observations, we then present examples of overparameterized linear classifiers and neural networks trained by gradient descent (GD) where uniform convergence provably cannot “explain generalization” – even if we take into account the implicit bias of GD {\em to the fullest extent possible}. More precisely, even if we consider only the set of classifiers output by GD, which have test errors less than some small $\epsilon$ in our settings, we show that applying (two-sided) uniform convergence on this set of classifiers will yield only a vacuous generalization guarantee larger than $1-\epsilon$. Through these findings, we cast doubt on the power of uniform convergence-based generalization bounds to provide a complete picture of why overparameterized deep networks generalize well.
Tasks
Published	2019-02-13
URL	https://arxiv.org/abs/1902.04742v3
PDF	https://arxiv.org/pdf/1902.04742v3.pdf
PWC	https://paperswithcode.com/paper/uniform-convergence-may-be-unable-to-explain
Repo
Framework

Fusion of heterogeneous bands and kernels in hyperspectral image processing


Title	Fusion of heterogeneous bands and kernels in hyperspectral image processing
Authors	Muhammad Aminul Islam, Derek T. Anderson, John E. Ball, Nicolas H. Younan
Abstract	Hyperspectral imaging is a powerful technology that is plagued by large dimensionality. Herein, we explore a way to combat that hindrance via non-contiguous and contiguous (simpler to realize sensor) band grouping for dimensionality reduction. Our approach is different in the respect that it is flexible and it follows a well-studied process of visual clustering in high-dimensional spaces. Specifically, we extend the improved visual assessment of cluster tendency and clustering in ordered dissimilarity data unsupervised clustering algorithms for supervised hyperspectral learning. In addition, we propose a way to extract diverse features via the use of different proximity metrics (ways to measure the similarity between bands) and kernel functions. The discovered features are fused with $l_{\infty}$-norm multiple kernel learning. Experiments are conducted on two benchmark datasets and our results are compared to related work. These datasets indicate that contiguous or not is application specific, but heterogeneous features and kernels usually lead to performance gain.
Tasks	Dimensionality Reduction
Published	2019-05-22
URL	https://arxiv.org/abs/1905.09698v1
PDF	https://arxiv.org/pdf/1905.09698v1.pdf
PWC	https://paperswithcode.com/paper/fusion-of-heterogeneous-bands-and-kernels-in
Repo
Framework

Geometry-Aware Maximum Likelihood Estimation of Intrinsic Dimension


Title	Geometry-Aware Maximum Likelihood Estimation of Intrinsic Dimension
Authors	Marina Gomtsyan, Nikita Mokrov, Maxim Panov, Yury Yanovich
Abstract	The existing approaches to intrinsic dimension estimation usually are not reliable when the data are nonlinearly embedded in a high dimensional space. In this work, we show that explicitly accounting for the geometric properties of the unknown support leads to a polynomial correction to the standard maximum likelihood estimate of intrinsic dimension for flat manifolds. The proposed algorithm (GeoMLE) realizes the correction by regression of standard MLEs based on distances to nearest neighbors for different sizes of neighborhoods. Moreover, the proposed approach also efficiently handles the case of nonuniform sampling of the manifold. We perform numerous experiments on different synthetic and real-world datasets. The results show that our algorithm achieves state-of-the-art performance, while also being computationally efficient and robust to noise in the data.
Tasks
Published	2019-04-12
URL	http://arxiv.org/abs/1904.06151v1
PDF	http://arxiv.org/pdf/1904.06151v1.pdf
PWC	https://paperswithcode.com/paper/geometry-aware-maximum-likelihood-estimation
Repo
Framework
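
For context on the "standard MLE" that the GeoMLE abstract above builds on, the following is a minimal sketch of the nearest-neighbour maximum likelihood estimator of intrinsic dimension. It implements only the uncorrected baseline, not GeoMLE's regression-based correction. The synthetic data (a flat 2-D square rotated into 5-D), the sample size, and `k = 20` are assumptions.

```python
import numpy as np

def mle_intrinsic_dim(X, k=20):
    """Standard nearest-neighbour MLE of intrinsic dimension, averaged over points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]         # drop the zero self-distance
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])   # log(T_k / T_j), j = 1..k-1
    per_point = (k - 1) / log_ratios.sum(axis=1)
    return float(per_point.mean())

rng = np.random.default_rng(0)
plane = np.hstack([rng.uniform(size=(500, 2)), np.zeros((500, 3))])  # flat 2-D data in 5-D
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))                         # random rotation
d_hat = mle_intrinsic_dim(plane @ Q)
print(d_hat)  # should land near 2 for this flat example
```

Repeating the estimate for several values of `k` gives the family of neighbourhood-size-dependent MLEs that, according to the abstract, GeoMLE regresses on to correct for curvature and nonuniform sampling.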

Ultrafast Video Attention Prediction with Coupled Knowledge Distillation


Title	Ultrafast Video Attention Prediction with Coupled Knowledge Distillation
Authors	Kui Fu, Peipei Shi, Yafei Song, Shiming Ge, Xiangju Lu, Jia Li
Abstract	Large convolutional neural network models have recently demonstrated impressive performance on video attention prediction. Conventionally, these models require intensive computation and large memory. To address these issues, we design an extremely light-weight network with ultrafast speed, named UVA-Net. The network is constructed based on depth-wise convolutions and takes low-resolution images as input. However, this straightforward acceleration method will decrease performance dramatically. To this end, we propose a coupled knowledge distillation strategy to augment and train the network effectively. With this strategy, the model can further automatically discover and emphasize implicit useful cues contained in the data. Both spatial and temporal knowledge learned by the high-resolution complex teacher networks can also be distilled and transferred into the proposed low-resolution light-weight spatiotemporal network. Experimental results show that the performance of our model is comparable to 11 state-of-the-art models in video attention prediction, while it has a memory footprint of only 0.68 MB and runs at about 10,106 FPS on GPU and 404 FPS on CPU, which is 206 times faster than previous models.
Tasks
Published	2019-04-09
URL	https://arxiv.org/abs/1904.04449v2
PDF	https://arxiv.org/pdf/1904.04449v2.pdf
PWC	https://paperswithcode.com/paper/ultrafast-video-attention-prediction-with
Repo
Framework

Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation


Title	Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation
Authors	Zeyu Wang, Klint Qinami, Yannis Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky
Abstract	Computer vision models learn to perform a task by capturing relevant statistics from training data. It has been shown that models learn spurious age, gender, and race correlations when trained for seemingly unrelated tasks like activity recognition or image captioning. Various mitigation techniques have been presented to prevent models from utilizing or learning such biases. However, there has been little systematic comparison between these techniques. We design a simple but surprisingly effective visual recognition benchmark for studying bias mitigation. Using this benchmark, we provide a thorough analysis of a wide range of techniques. We highlight the shortcomings of popular adversarial training approaches for bias mitigation, propose a simple but similarly effective alternative to the inference-time Reducing Bias Amplification method of Zhao et al., and design a domain-independent training technique that outperforms all other methods. Finally, we validate our findings on the attribute classification task in the CelebA dataset, where attribute presence is known to be correlated with the gender of people in the image, and demonstrate that the proposed technique is effective at mitigating real-world gender bias.
Tasks	Activity Recognition, Image Captioning
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11834v1
PDF	https://arxiv.org/pdf/1911.11834v1.pdf
PWC	https://paperswithcode.com/paper/towards-fairness-in-visual-recognition
Repo
Framework