October 16, 2019

3111 words 15 mins read

Paper Group ANR 1057

Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution. Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View. Self-Improving Visual Odometry. Learning to Detect. Semi-supervised Hashing for Semi-Paired Cross-View Retrieval. Non-attracting Regions of Local Minima in Deep and Wide Neural …

Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution

Title Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution
Authors Dimitrios I. Diochnos, Saeed Mahloujifar, Mohammad Mahmoody
Abstract We study adversarial perturbations when the instances are uniformly distributed over $\{0,1\}^n$. We study both “inherent” bounds that apply to any problem and any classifier for such a problem, as well as bounds that apply to specific problems and specific hypothesis classes. As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy of these definitions based on their goals, and we identify one of them as the definition guaranteeing misclassification by pushing the instances into the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial risk and robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts, may coincide with it. Using the error-region definition of adversarial perturbations, we then study inherent bounds on the risk and robustness of any classifier for any classification problem whose instances are uniformly distributed over $\{0,1\}^n$. Using the isoperimetric inequality for the Boolean hypercube, we show that, for initial error $0.01$, there always exists an adversarial perturbation that changes $O(\sqrt{n})$ bits of the instances to increase the risk to $0.5$, making the classifier’s decisions meaningless. Furthermore, by also using the central limit theorem we show that when $n\to \infty$, at most $c \cdot \sqrt{n}$ bits of perturbation, for a universal constant $c < 1.17$, suffice to increase the risk to $0.5$, and the same $c \cdot \sqrt{n}$ bits of perturbation on average suffice to increase the risk to $1$, hence bounding the robustness by $c \cdot \sqrt{n}$.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.12272v1
PDF http://arxiv.org/pdf/1810.12272v1.pdf
PWC https://paperswithcode.com/paper/adversarial-risk-and-robustness-general
Repo
Framework
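
The constant $c < 1.17$ in the abstract above comes from a central-limit-theorem argument. As a back-of-the-envelope check (not the paper's proof or code), if the error region of measure $0.01$ behaves like a Hamming half-space, it sits about $\Phi^{-1}(0.99) \approx 2.33$ standard deviations of the bit count from the majority boundary, and one standard deviation corresponds to $\sqrt{n}/2$ bits. The sketch below (assuming SciPy is available) evaluates that heuristic.

```python
# Back-of-the-envelope check of the CLT-based constant (illustrative only).
# Assumption: the error region of measure 0.01 behaves like a Hamming
# half-space, so it lies ~Phi^{-1}(0.99) standard deviations of the bit count
# from the majority boundary, and one standard deviation equals sqrt(n)/2 bits.
from scipy.stats import norm

z = norm.ppf(0.99)   # ~2.326
c = z / 2            # bits per sqrt(n) needed to move the measure from 0.01 to 0.5
print(f"c ~= {c:.4f}  (consistent with the paper's universal constant c < 1.17)")

for n in (100, 1_000, 10_000):
    print(f"n = {n:6d}: roughly {c * n ** 0.5:.0f} bit flips to raise the risk to 0.5")
```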

Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View

Title Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View
Authors Albert Pumarola, Antonio Agudo, Lorenzo Porzi, Alberto Sanfeliu, Vincent Lepetit, Francesc Moreno-Noguer
Abstract We propose a method for predicting the 3D shape of a deformable surface from a single view. By contrast with previous approaches, we do not need a pre-registered template of the surface, and our method is robust to the lack of texture and partial occlusions. At the core of our approach is a {\it geometry-aware} deep architecture that tackles the problem as usually done in analytic solutions: first perform 2D detection of the mesh and then estimate a 3D shape that is geometrically consistent with the image. We train this architecture in an end-to-end manner using a large dataset of synthetic renderings of shapes under different levels of deformation, material properties, textures and lighting conditions. We evaluate our approach on a test split of this dataset and available real benchmarks, consistently improving state-of-the-art solutions with a significantly lower computational time.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10305v1
PDF http://arxiv.org/pdf/1809.10305v1.pdf
PWC https://paperswithcode.com/paper/geometry-aware-network-for-non-rigid-shape
Repo
Framework
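
The pipeline in the abstract above first detects the 2D mesh and then lifts it to a 3D shape that is geometrically consistent with the image. A minimal sketch of that consistency constraint, not the authors' network: with assumed pinhole intrinsics and hypothetical per-vertex depths, back-projecting the detected 2D vertices yields a 3D mesh that reprojects exactly onto the detections.

```python
# Minimal back-projection sketch (not the authors' network).  Assumed pinhole
# intrinsics K and hypothetical per-vertex depths; the resulting 3D mesh is
# "geometrically consistent" in the sense that it reprojects onto the 2D detections.
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)

uv = rng.uniform([100, 100], [540, 380], size=(81, 2))   # 9x9 mesh of detected 2D vertices
depth = rng.uniform(0.8, 1.2, size=(81, 1))              # hypothetical per-vertex depths

uv1 = np.hstack([uv, np.ones((81, 1))])                  # homogeneous pixel coordinates
xyz = depth * (K_inv @ uv1.T).T                          # lifted 3D vertices (camera frame)

reproj = (K @ xyz.T).T
reproj = reproj[:, :2] / reproj[:, 2:3]
print("max reprojection error:", np.abs(reproj - uv).max())   # ~0 by construction
```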

Self-Improving Visual Odometry

Title Self-Improving Visual Odometry
Authors Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich
Abstract We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend, a network which computes pointwise data associations across images. Our self-improving method enables a VO frontend to learn over time, unlike other VO and SLAM systems which require time-consuming hand-tuning or expensive data collection to adapt to new environments. Our proposed frontend operates on monocular images and consists of a single multi-task convolutional neural network which outputs 2D keypoint locations, keypoint descriptors, and a novel point stability score. We use the output of VO to create a self-supervised dataset of point correspondences to retrain the frontend. When trained using VO at scale on 2.5 million monocular images from ScanNet, the stability classifier automatically discovers a ranking for keypoints that are not likely to help in VO, such as t-junctions across depth discontinuities, features on shadows and highlights, and dynamic objects like people. The resulting frontend outperforms both traditional methods (SIFT, ORB, AKAZE) and deep learning methods (SuperPoint and LF-Net) in a 3D-to-2D pose estimation task on ScanNet.
Tasks Pose Estimation, Visual Odometry
Published 2018-12-08
URL http://arxiv.org/abs/1812.03245v1
PDF http://arxiv.org/pdf/1812.03245v1.pdf
PWC https://paperswithcode.com/paper/self-improving-visual-odometry
Repo
Framework
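
The self-supervision loop in the abstract above labels keypoints using the output of VO itself. A hedged sketch of one plausible labelling rule (the paper's exact heuristics may differ): treat the VO poses and triangulated 3D points as ground truth and mark a detection as stable when its reprojection error is small; everything below is synthetic.

```python
# Synthetic sketch of self-labelling stability from VO output.  The "VO" poses,
# 3D points and detections are generated here; in the paper, failure modes such
# as t-junctions, shadows and dynamic objects are discovered by training on
# such labels, not hand-coded.
import numpy as np

rng = np.random.default_rng(1)
K = np.array([[400.0, 0.0, 320.0], [0.0, 400.0, 240.0], [0.0, 0.0, 1.0]])

points_3d = rng.uniform([-1, -1, 4], [1, 1, 6], size=(50, 3))   # triangulated by "VO"
t = np.array([0.2, 0.0, 0.0])                                   # camera translation from "VO"

def project(P, trans):
    p = (K @ (P - trans).T).T
    return p[:, :2] / p[:, 2:3]

detections = project(points_3d, t)
detections[::5] += rng.normal(0.0, 8.0, size=(10, 2))   # simulate unstable detections

reproj_err = np.linalg.norm(project(points_3d, t) - detections, axis=1)
stable = reproj_err < 2.0                                # pseudo-labels for the stability head
print(f"{int(stable.sum())} / {len(stable)} detections labelled stable for retraining")
```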

Learning to Detect

Title Learning to Detect
Authors Neev Samuel, Tzvi Diskin, Ami Wiesel
Abstract In this paper we consider Multiple-Input-Multiple-Output (MIMO) detection using deep neural networks. We introduce two different deep architectures: a standard fully connected multi-layer network, and a Detection Network (DetNet) which is specifically designed for the task. The structure of DetNet is obtained by unfolding the iterations of a projected gradient descent algorithm into a network. We compare the accuracy and runtime complexity of the proposed approaches and achieve state-of-the-art performance while maintaining low computational requirements. Furthermore, we manage to train a single network to detect over an entire distribution of channels. Finally, we consider detection with soft outputs and show that the networks can easily be modified to produce soft decisions.
Tasks
Published 2018-05-19
URL http://arxiv.org/abs/1805.07631v1
PDF http://arxiv.org/pdf/1805.07631v1.pdf
PWC https://paperswithcode.com/paper/learning-to-detect
Repo
Framework
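
According to the abstract, DetNet is obtained by unfolding projected gradient descent on the MIMO least-squares objective into network layers. The sketch below shows the underlying fixed iteration for BPSK symbols with a box projection; the step size, channel and noise level are invented, and the learned network replaces these fixed choices with trainable parameters.

```python
# Fixed projected-gradient iteration that DetNet unfolds (per the abstract).
# The channel H, BPSK symbols and noise level are synthetic; DetNet replaces the
# fixed step size and the box projection with trainable layers.
import numpy as np

rng = np.random.default_rng(2)
n_tx, n_rx = 8, 12
H = rng.normal(size=(n_rx, n_tx)) / np.sqrt(n_rx)
x_true = rng.choice([-1.0, 1.0], size=n_tx)
y = H @ x_true + 0.05 * rng.normal(size=n_rx)

x = np.zeros(n_tx)
step = 0.5
for _ in range(30):
    grad = H.T @ (H @ x - y)                    # gradient of 0.5 * ||y - Hx||^2
    x = np.clip(x - step * grad, -1.0, 1.0)     # project back onto the symbol box

x_hat = np.sign(x)                              # hard decision
print("symbol errors:", int(np.sum(x_hat != x_true)))
```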

Semi-supervised Hashing for Semi-Paired Cross-View Retrieval

Title Semi-supervised Hashing for Semi-Paired Cross-View Retrieval
Authors Jun Yu, Xiao-Jun Wu, Josef Kittler
Abstract Recently, hashing techniques have gained importance in large-scale retrieval tasks because of their retrieval speed. Most of the existing cross-view frameworks assume that data are well paired. However, the fully-paired multiview situation is not universal in real applications. The aim of the method proposed in this paper is to learn the hashing function for semi-paired cross-view retrieval tasks. To utilize the label information of partial data, we propose a semi-supervised hashing learning framework which jointly performs feature extraction and classifier learning. The experimental results on two datasets show that our method outperforms several state-of-the-art methods in terms of retrieval accuracy.
Tasks
Published 2018-06-19
URL http://arxiv.org/abs/1806.07155v1
PDF http://arxiv.org/pdf/1806.07155v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-hashing-for-semi-paired-cross
Repo
Framework

Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

Title Non-attracting Regions of Local Minima in Deep and Wide Neural Networks
Authors Henning Petzka, Cristian Sminchisescu
Abstract Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few suboptimal local optima, if any at all? Or could all of them be equally good? We provide a construction to show that suboptimal local minima (i.e. non-global ones), even though degenerate, exist for fully connected neural networks with sigmoid activation functions. The local minima obtained by our proposed construction belong to a connected set of local solutions that can be escaped from via a non-increasing path on the loss curve. For extremely wide neural networks with two hidden layers, we prove that every suboptimal local minimum belongs to such a connected set. This provides a partial explanation for the successful application of deep neural networks. In addition, we also characterize under what conditions the same construction leads to saddle points instead of local minima for deep neural networks.
Tasks
Published 2018-12-16
URL https://arxiv.org/abs/1812.06486v3
PDF https://arxiv.org/pdf/1812.06486v3.pdf
PWC https://paperswithcode.com/paper/non-attracting-regions-of-local-minima-in
Repo
Framework

Analyzing and Characterizing User Intent in Information-seeking Conversations

Title Analyzing and Characterizing User Intent in Information-seeking Conversations
Authors Chen Qu, Liu Yang, W. Bruce Croft, Johanne R. Trippas, Yongfeng Zhang, Minghui Qiu
Abstract Understanding and characterizing how people interact in information-seeking conversations is crucial in developing conversational search systems. In this paper, we introduce a new dataset designed for this purpose and use it to analyze information-seeking conversations by user intent distribution, co-occurrence, and flow patterns. The MSDialog dataset is a labeled dialog dataset of question answering (QA) interactions between information seekers and providers from an online forum on Microsoft products. The dataset contains more than 2,000 multi-turn QA dialogs with 10,000 utterances that are annotated with user intent on the utterance level. Annotations were done using crowdsourcing. With MSDialog, we find some highly recurring patterns in user intent during an information-seeking process. They could be useful for designing conversational search systems. We will make our dataset freely available to encourage exploration of information-seeking conversation models.
Tasks Question Answering
Published 2018-04-23
URL http://arxiv.org/abs/1804.08759v1
PDF http://arxiv.org/pdf/1804.08759v1.pdf
PWC https://paperswithcode.com/paper/analyzing-and-characterizing-user-intent-in
Repo
Framework
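
The analysis described above looks at intent distribution and co-occurrence over utterance-level labels. Below is a toy sketch of such a co-occurrence count, with made-up utterances and label names; the real intent taxonomy and annotations come with the MSDialog release.

```python
# Toy co-occurrence count over utterance-level intent labels.  The utterances
# and label strings are made up; the actual label set ships with MSDialog.
from collections import Counter
from itertools import combinations

utterance_intents = [
    {"Original Question"},
    {"Potential Answer", "Further Details"},
    {"Clarifying Question"},
    {"Potential Answer"},
    {"Positive Feedback", "Gratitude"},
]

pair_counts = Counter()
for labels in utterance_intents:
    for a, b in combinations(sorted(labels), 2):
        pair_counts[(a, b)] += 1

for (a, b), n in pair_counts.most_common():
    print(f"{a} + {b}: {n}")
```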

Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are Complementary to Neural Embeddings

Title Unsupervised Low-Dimensional Vector Representations for Words, Phrases and Text that are Transparent, Scalable, and produce Similarity Metrics that are Complementary to Neural Embeddings
Authors Neil R. Smalheiser, Gary Bonifield
Abstract Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters and hyperparameters. We present here a simple unsupervised method for representing words, phrases or text as a low dimensional vector, in which the meaning and relative importance of dimensions is transparent to inspection. We have created a near-comprehensive vector representation of words, and selected bigrams, trigrams and abbreviations, using the set of titles and abstracts in PubMed as a corpus. This vector is used to create several novel implicit word-word and text-text similarity metrics. The implicit word-word similarity metrics correlate well with human judgement of word pair similarity and relatedness, and outperform or equal all other reported methods on a variety of biomedical benchmarks, including several implementations of neural embeddings trained on PubMed corpora. Our implicit word-word metrics capture different aspects of word-word relatedness than word2vec-based metrics and are only partially correlated (rho = ~0.5-0.8 depending on task and corpus). The vector representations of words, bigrams, trigrams, abbreviations, and PubMed title+abstracts are all publicly available from http://arrowsmith.psych.uic.edu for release under CC-BY-NC license. Several public web query interfaces are also available at the same site, including one which allows the user to specify a given word and view its most closely related terms according to direct co-occurrence as well as different implicit similarity metrics.
Tasks
Published 2018-01-05
URL http://arxiv.org/abs/1801.01884v2
PDF http://arxiv.org/pdf/1801.01884v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-low-dimensional-vector
Repo
Framework

Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging

Title Combining High-Level Features of Raw Audio Waves and Mel-Spectrograms for Audio Tagging
Authors Marcel Lederle, Benjamin Wilhelm
Abstract In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority of ensemble methods combine predictions rather than learned features. We propose a single-model method that combines learned high-level features computed from log-scaled mel-spectrograms and raw audio data. These features are learned separately by two Convolutional Neural Networks, one for each input type, and then combined by densely connected layers within a single network. This relatively simple approach, along with data augmentation, ranks in the top two percent of the Freesound General-Purpose Audio Tagging Challenge on Kaggle.
Tasks Audio Tagging, Data Augmentation
Published 2018-11-26
URL http://arxiv.org/abs/1811.10708v1
PDF http://arxiv.org/pdf/1811.10708v1.pdf
PWC https://paperswithcode.com/paper/combining-high-level-features-of-raw-audio
Repo
Framework
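
The abstract describes a single model with two convolutional branches, one on the raw waveform and one on the log-mel spectrogram, fused by dense layers. The sketch below follows that layout in PyTorch; the layer sizes and the 41-class output (the DCASE 2018 Task 2 label count) are assumptions, not the authors' exact architecture.

```python
# Two-branch, single-model tagger sketch: a Conv1d branch on raw audio and a
# Conv2d branch on log-mel spectrograms, fused by dense layers.  All layer
# sizes are assumptions for illustration.
import torch
import torch.nn as nn

class TwoBranchTagger(nn.Module):
    def __init__(self, n_classes: int = 41):
        super().__init__()
        self.wave_branch = nn.Sequential(            # raw audio: (B, 1, T)
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.mel_branch = nn.Sequential(             # log-mel: (B, 1, n_mels, frames)
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(                   # dense fusion of both feature sets
            nn.Linear(32 + 32, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, wave, mel):
        return self.head(torch.cat([self.wave_branch(wave), self.mel_branch(mel)], dim=1))

model = TwoBranchTagger()
logits = model(torch.randn(2, 1, 16000), torch.randn(2, 1, 64, 128))
print(logits.shape)   # torch.Size([2, 41])
```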

Dynamic Objects Segmentation for Visual Localization in Urban Environments

Title Dynamic Objects Segmentation for Visual Localization in Urban Environments
Authors Guoxiang Zhou, Berta Bescos, Marcin Dymczyk, Mark Pfeiffer, José Neira, Roland Siegwart
Abstract Visual localization and mapping is a crucial capability to address many challenges in mobile robotics. It constitutes a robust, accurate and cost-effective approach for local and global pose estimation within prior maps. Yet, in highly dynamic environments, like crowded city streets, problems arise as major parts of the image can be covered by dynamic objects. Consequently, visual odometry pipelines often diverge and the localization systems malfunction as detected features are not consistent with the precomputed 3D model. In this work, we present an approach to automatically detect dynamic object instances to improve the robustness of vision-based localization and mapping in crowded environments. By training a convolutional neural network model with a combination of synthetic and real-world data, dynamic object instance masks are learned in a semi-supervised way. The real-world data can be collected with a standard camera and requires minimal further post-processing. Our experiments show that a wide range of dynamic objects can be reliably detected using the presented method. Promising performance is demonstrated on our own and also publicly available datasets, which also shows the generalization capabilities of this approach.
Tasks Pose Estimation, Visual Localization, Visual Odometry
Published 2018-07-09
URL http://arxiv.org/abs/1807.02996v1
PDF http://arxiv.org/pdf/1807.02996v1.pdf
PWC https://paperswithcode.com/paper/dynamic-objects-segmentation-for-visual
Repo
Framework
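
One way the predicted dynamic-object masks can harden a localization pipeline, as the abstract suggests, is to discard keypoints that land on dynamic pixels before matching against the prior map. The sketch below shows that filtering step with a synthetic mask and synthetic detections standing in for the CNN output.

```python
# Filtering keypoints with a dynamic-object instance mask (synthetic placeholders).
import numpy as np

h, w = 480, 640
mask = np.zeros((h, w), dtype=bool)
mask[200:400, 250:450] = True                              # pretend a pedestrian was segmented here

rng = np.random.default_rng(3)
keypoints = rng.integers([0, 0], [w, h], size=(500, 2))    # (x, y) detections

on_dynamic = mask[keypoints[:, 1], keypoints[:, 0]]
static_keypoints = keypoints[~on_dynamic]
print(f"kept {len(static_keypoints)} of {len(keypoints)} keypoints for localization")
```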

Training of a Skull-Stripping Neural Network with efficient data augmentation

Title Training of a Skull-Stripping Neural Network with efficient data augmentation
Authors Gabriele Valvano, Nicola Martini, Andrea Leo, Gianmarco Santini, Daniele Della Latta, Emiliano Ricciardi, Dante Chiappino
Abstract Skull-stripping methods aim to remove non-brain tissue from brain scans acquired with magnetic resonance (MR) imaging. Although several methods sharing this common purpose have been presented in the literature, they all suffer from the great variability of MR images. In this work we propose a novel approach based on Convolutional Neural Networks to automatically perform brain extraction, obtaining cutting-edge performance on the public NFBS database. Additionally, we focus on efficient training of the neural network by designing an effective data augmentation pipeline. Results are evaluated in terms of the Dice metric, reaching a value of 96.5%, and processing time, at 4.5 s per volume.
Tasks Data Augmentation, Skull Stripping
Published 2018-10-25
URL http://arxiv.org/abs/1810.10853v1
PDF http://arxiv.org/pdf/1810.10853v1.pdf
PWC https://paperswithcode.com/paper/training-of-a-skull-stripping-neural-network
Repo
Framework
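
The abstract reports a Dice value of 96.5%. For reference, a minimal implementation of the Dice metric on binary masks; the volumes below are random placeholders rather than NFBS data.

```python
# Reference Dice metric on binary masks (placeholder volumes, not NFBS data).
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

rng = np.random.default_rng(4)
gt = rng.random((64, 64, 64)) > 0.5
noisy_pred = gt ^ (rng.random(gt.shape) > 0.95)   # flip ~5% of voxels
print(f"Dice: {dice(noisy_pred, gt):.3f}")
```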

Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing

Title Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing
Authors Jean Maillard, Stephen Clark
Abstract Latent tree learning models represent sentences by composing their words according to an induced parse tree, all based on a downstream task. These models often outperform baselines which use (externally provided) syntax trees to drive the composition order. This work contributes (a) a new latent tree learning model based on shift-reduce parsing, with competitive downstream performance and non-trivial induced trees, and (b) an analysis of the trees learned by our shift-reduce model and by a chart-based model.
Tasks
Published 2018-06-03
URL http://arxiv.org/abs/1806.00840v1
PDF http://arxiv.org/pdf/1806.00840v1.pdf
PWC https://paperswithcode.com/paper/latent-tree-learning-with-differentiable
Repo
Framework
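
The shift-reduce model in the abstract composes word vectors according to a sequence of SHIFT/REDUCE actions. A bare-bones sketch of that mechanism with a toy composition function; in the paper the composition is learned and the action sequence itself is latent and induced from the downstream task.

```python
# Shift-reduce composition sketch.  The composition here is a random ReLU layer;
# in the paper it is a learned cell and the SHIFT/REDUCE actions are latent.
import numpy as np

rng = np.random.default_rng(5)
d = 8
W = rng.normal(scale=0.1, size=(d, 2 * d))           # toy composition parameters

def compose(left, right):
    return np.maximum(0.0, W @ np.concatenate([left, right]))

words = [rng.normal(size=d) for _ in "the cat sat".split()]
actions = ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"]   # tree ((the cat) sat)

stack, buffer = [], list(words)
for act in actions:
    if act == "SHIFT":
        stack.append(buffer.pop(0))
    else:                                             # REDUCE: compose top two items
        right, left = stack.pop(), stack.pop()
        stack.append(compose(left, right))

sentence_vector = stack[0]                            # single root embedding
print(sentence_vector.shape)
```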

Generative Adversarial Frontal View to Bird View Synthesis

Title Generative Adversarial Frontal View to Bird View Synthesis
Authors Xinge Zhu, Zhichao Yin, Jianping Shi, Hongsheng Li, Dahua Lin
Abstract Environment perception is an important task with great practical value, and the bird view is an essential part of creating panoramas of the surrounding environment. Due to the large gap and severe deformation between the frontal view and the bird view, generating a bird view image from a single frontal view is challenging. To tackle this problem, we propose BridgeGAN, a novel generative model for bird view synthesis. First, an intermediate view, i.e., the homography view, is introduced to bridge the large gap. Next, conditioned on the three views (frontal view, homography view and bird view) in our task, a multi-GAN based model is proposed to learn the challenging cross-view translation. Extensive experiments conducted on a synthetic dataset have demonstrated that the images generated by our model are much better than those generated by existing methods, with more consistent global appearance and sharper details. Ablation studies and discussions show its reliability and robustness in some challenging cases.
Tasks Bird View Synthesis, Homography Estimation
Published 2018-08-01
URL http://arxiv.org/abs/1808.00327v3
PDF http://arxiv.org/pdf/1808.00327v3.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-frontal-view-to-bird
Repo
Framework
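
The intermediate homography view mentioned in the abstract is a perspective warp of the frontal image toward a top-down viewpoint. A minimal OpenCV sketch of such a warp follows; the source and target quadrilaterals are invented here, since the abstract does not spell out the exact homography.

```python
# Perspective warp from a frontal image toward a top-down "homography view".
# The synthetic image and the quadrilaterals are invented for illustration.
import cv2
import numpy as np

frontal = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.line(frontal, (280, 480), (310, 240), (255, 255, 255), 3)   # fake lane markings
cv2.line(frontal, (360, 480), (330, 240), (255, 255, 255), 3)

src = np.float32([[280, 480], [360, 480], [330, 240], [310, 240]])  # road trapezoid
dst = np.float32([[280, 480], [360, 480], [360, 0], [280, 0]])      # rectangle from above

H = cv2.getPerspectiveTransform(src, dst)
homography_view = cv2.warpPerspective(frontal, H, (640, 480))
print(H.round(2))
```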

A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization

Title A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization
Authors Sendong Zhao, Ting Liu, Sicheng Zhao, Fei Wang
Abstract State-of-the-art studies have demonstrated the superiority of joint modelling over pipeline implementation for medical named entity recognition and normalization due to the mutual benefits between the two processes. To exploit these benefits in a more sophisticated way, we propose a novel deep neural multi-task learning framework with explicit feedback strategies to jointly model recognition and normalization. On one hand, our method benefits from the general representations of both tasks provided by multi-task learning. On the other hand, our method successfully converts hierarchical tasks into a parallel multi-task setting while maintaining the mutual supports between tasks. Both of these aspects improve the model performance. Experimental results demonstrate that our method performs significantly better than state-of-the-art approaches on two publicly available medical literature datasets.
Tasks Medical Named Entity Recognition, Multi-Task Learning, Named Entity Recognition
Published 2018-12-14
URL http://arxiv.org/abs/1812.06081v1
PDF http://arxiv.org/pdf/1812.06081v1.pdf
PWC https://paperswithcode.com/paper/a-neural-multi-task-learning-framework-to
Repo
Framework
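
The joint model in the abstract shares an encoder between recognition and normalization. A hedged sketch of that multi-task layout (without the paper's explicit feedback strategies); dimensions, vocabulary sizes and label counts below are invented.

```python
# Shared-encoder multi-task sketch (without the paper's explicit feedback).
# Vocabulary size, dimensions and label counts are invented.
import torch
import torch.nn as nn

class JointNERNorm(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_tags=5, n_concepts=200):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(2 * dim, n_tags)        # BIO-style entity recognition
        self.norm_head = nn.Linear(2 * dim, n_concepts)   # normalization to concept IDs

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.tag_head(h), self.norm_head(h)

model = JointNERNorm()
tag_logits, norm_logits = model(torch.randint(0, 5000, (2, 20)))
# In training, the two cross-entropy losses would simply be summed (multi-task learning).
print(tag_logits.shape, norm_logits.shape)   # (2, 20, 5) and (2, 20, 200)
```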

Discovering Fair Representations in the Data Domain

Title Discovering Fair Representations in the Data Domain
Authors Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas
Abstract Interpretability and fairness are critical in computer vision and machine learning applications, in particular when dealing with human outcomes, e.g. inviting or not inviting for a job interview based on application materials that may include photographs. One promising direction to achieve fairness is by learning data representations that remove the semantics of protected characteristics, and are therefore able to mitigate unfair outcomes. All available models, however, learn latent embeddings, which comes at the cost of being uninterpretable. We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is being enforced. Here the data domain can be images, or any tabular data representation. This task would be straightforward if we had fair target data available, but this is not the case. To overcome this, we learn a highly unconstrained mapping by exploiting statistics of residuals - the difference between input data and its translated version - and the protected characteristics. When applied to the CelebA dataset of face images with the gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eyes and lips regions. Intriguingly, on the same dataset we arrive at similar conclusions when using semantic attribute representations of images for translation. On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. In the Adult income dataset, also with a protected gender attribute, our model achieves equality of opportunity by, among others, obfuscating the wife and husband relationship. Analyzing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and prediction performance.
Tasks
Published 2018-10-15
URL http://arxiv.org/abs/1810.06755v2
PDF http://arxiv.org/pdf/1810.06755v2.pdf
PWC https://paperswithcode.com/paper/neural-styling-for-interpretable-fair
Repo
Framework
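
Equality of opportunity, the criterion the abstract enforces, asks that the true-positive rate be equal across protected groups. A minimal sketch of measuring that gap on placeholder predictions; no CelebA, DiF or Adult data is used here.

```python
# Measuring the equality-of-opportunity gap (TPR difference across groups)
# on random placeholder labels, predictions and protected attributes.
import numpy as np

rng = np.random.default_rng(6)
y_true = rng.integers(0, 2, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, 1 - y_true)   # a noisy classifier
group = rng.integers(0, 2, size=1000)                            # protected attribute

def tpr(y, yhat, mask):
    pos = (y == 1) & mask
    return (yhat[pos] == 1).mean()

gap = abs(tpr(y_true, y_pred, group == 0) - tpr(y_true, y_pred, group == 1))
print(f"equality-of-opportunity gap (TPR difference): {gap:.3f}")
```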