Paper Group ANR 136
Translating Visual Art into Music. Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations. WhiteNNer-Blind Image Denoising via Noise Whiteness Priors. Combining Parametric and Nonparametric Models for Off-Policy Evaluation. PUTWorkbench: Analysing Privacy in AI-intensive Systems. A Decentralized Proximal Point-type Method …
Translating Visual Art into Music
Title | Translating Visual Art into Music |
Authors | Maximilian Müller-Eberstein, Nanne van Noord |
Abstract | The Synesthetic Variational Autoencoder (SynVAE) introduced in this research is able to learn a consistent mapping between visual and auditive sensory modalities in the absence of paired datasets. A quantitative evaluation on MNIST as well as the Behance Artistic Media dataset (BAM) shows that SynVAE is capable of retaining sufficient information content during the translation while maintaining cross-modal latent space consistency. In a qualitative evaluation trial, human evaluators were furthermore able to match musical samples with the images which generated them with accuracies of up to 73%. |
Tasks | |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01218v1 |
https://arxiv.org/pdf/1909.01218v1.pdf | |
PWC | https://paperswithcode.com/paper/translating-visual-art-into-music |
Repo | |
Framework | |
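A minimal sketch of the cross-modal idea in the abstract above, under assumed architectures and dimensions (MNIST-sized images, a piano-roll audio representation): an image encoder produces a latent code, an audio decoder renders it as music, and re-encoding that audio enforces cross-modal latent consistency without paired data. The class names, shapes, and loss form are illustrative assumptions, not the authors' implementation; the reconstruction terms of the full VAE are omitted.

```python
# Sketch only: image -> latent -> audio -> latent, with a consistency penalty.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class AudioDecoder(nn.Module):
    """Decodes a latent code into a (timesteps x pitches) piano-roll logit grid."""
    def __init__(self, latent_dim=32, timesteps=16, pitches=61):
        super().__init__()
        self.timesteps, self.pitches = timesteps, pitches
        self.net = nn.Linear(latent_dim, timesteps * pitches)

    def forward(self, z):
        return self.net(z).view(-1, self.timesteps, self.pitches)

class AudioEncoder(nn.Module):
    def __init__(self, latent_dim=32, timesteps=16, pitches=61):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(timesteps * pitches, 256),
                                 nn.ReLU(), nn.Linear(256, latent_dim))

    def forward(self, a):
        return self.net(a)

def synesthetic_step(img, img_enc, aud_dec, aud_enc):
    mu, logvar = img_enc(img)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
    audio = aud_dec(z)                                        # "translate" image to music
    z_back = aud_enc(torch.sigmoid(audio))                    # listen and re-encode
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    consistency = torch.mean((z_back - mu) ** 2)              # cross-modal latent consistency
    return kl + consistency                                   # reconstruction terms omitted

img = torch.rand(8, 1, 28, 28)
loss = synesthetic_step(img, ImageEncoder(), AudioDecoder(), AudioEncoder())
```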
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations
Title | Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations |
Authors | Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li |
Abstract | Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach. |
Tasks | Machine Translation |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01181v1 |
https://arxiv.org/pdf/1906.01181v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-zero-shot-neural-machine-translation |
Repo | |
Framework | |
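A toy sketch of the back-translation idea for the zero-shot direction; `ToyMultilingualNMT`, `translate`, and `train_on` are hypothetical stand-ins, not a real NMT API. Monolingual target-language sentences are translated backwards with the current multilingual model to create synthetic pairs for the direction the system was never trained on.

```python
# Sketch of online back-translation for a zero-shot pair X->Y in a model
# trained only on X<->EN and EN<->Y.
class ToyMultilingualNMT:
    """Stand-in model: stores the (src_lang, tgt_lang, src, tgt) examples it is trained on."""
    def __init__(self):
        self.memory = []

    def translate(self, text, src_lang, tgt_lang):
        # A real system would decode with the target-language ID forced;
        # here we only tag the text so the data flow is visible.
        return f"[{tgt_lang}] {text}"

    def train_on(self, src_lang, tgt_lang, src, tgt):
        self.memory.append((src_lang, tgt_lang, src, tgt))

def back_translation_round(model, mono_y_sentences):
    """Create synthetic X->Y pairs from monolingual Y data for the zero-shot direction."""
    for y in mono_y_sentences:
        synthetic_x = model.translate(y, src_lang="Y", tgt_lang="X")  # backward pass Y->X
        model.train_on("X", "Y", synthetic_x, y)                      # train zero-shot X->Y

model = ToyMultilingualNMT()
back_translation_round(model, ["sentence one in language Y", "sentence two in language Y"])
print(len(model.memory), "synthetic X->Y training pairs")
```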
WhiteNNer-Blind Image Denoising via Noise Whiteness Priors
Title | WhiteNNer-Blind Image Denoising via Noise Whiteness Priors |
Authors | Saeed Izadi, Zahra Mirikharaji, Mengliu Zhao, Ghassan Hamarneh |
Abstract | The accuracy of medical imaging-based diagnostics is directly impacted by the quality of the collected images. A passive approach to improve image quality is one that lags behind improvements in imaging hardware, awaiting better sensor technology of acquisition devices. An alternative, active strategy is to utilize prior knowledge of the imaging system to directly post-process and improve the acquired images. Traditionally, priors about the image properties are taken into account to restrict the solution space. However, few techniques exploit the prior about the noise properties. In this paper, we propose a neural network-based model for disentangling the signal and noise components of an input noisy image, without the need for any ground truth training data. We design a unified loss function that encodes priors about signal as well as noise estimate in the form of regularization terms. Specifically, by using total variation and piecewise constancy priors along with noise whiteness priors such as auto-correlation and stationary losses, our network learns to decouple an input noisy image into the underlying signal and noise components. We compare our proposed method to Noise2Noise and Noise2Self, as well as non-local mean and BM3D, on three public confocal laser endomicroscopy datasets. Experimental results demonstrate the superiority of our network compared to state-of-the-art in terms of PSNR and SSIM. |
Tasks | Denoising, Image Denoising |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03238v2 |
https://arxiv.org/pdf/1908.03238v2.pdf | |
PWC | https://paperswithcode.com/paper/whitenner-blind-image-denoising-via-noise |
Repo | |
Framework | |
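A rough sketch of the kind of unified loss the abstract describes, with assumed forms for each term (the authors' exact regularizers and weights may differ): the residual between the noisy input and the predicted signal is treated as the noise estimate and pushed towards whiteness via auto-correlation and stationarity penalties, while total variation regularizes the signal estimate.

```python
# Sketch of a WhiteNNer-style unconstrained loss; shapes assume (B, 1, H, W) tensors.
import torch
import torch.nn.functional as F

def total_variation(s):
    dh = (s[..., :, 1:] - s[..., :, :-1]).abs().mean()
    dv = (s[..., 1:, :] - s[..., :-1, :]).abs().mean()
    return dh + dv

def autocorrelation_penalty(n, max_shift=3):
    """White noise should be uncorrelated with shifted copies of itself."""
    n = n - n.mean()
    penalty = 0.0
    for dx in range(1, max_shift + 1):
        penalty = penalty + (n[..., :, :-dx] * n[..., :, dx:]).mean().abs()
        penalty = penalty + (n[..., :-dx, :] * n[..., dx:, :]).mean().abs()
    return penalty

def stationarity_penalty(n, patch=8):
    """White noise should have roughly the same variance in every patch."""
    var = F.avg_pool2d(n ** 2, patch) - F.avg_pool2d(n, patch) ** 2
    return var.var()

def whitenner_style_loss(x, s, w_tv=0.1, w_ac=1.0, w_st=1.0):
    n = x - s                              # implied noise estimate
    return (w_tv * total_variation(s)
            + w_ac * autocorrelation_penalty(n)
            + w_st * stationarity_penalty(n))

# usage: x = noisy batch (B, 1, H, W); s = denoiser(x); loss = whitenner_style_loss(x, s)
```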
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
Title | Combining Parametric and Nonparametric Models for Off-Policy Evaluation |
Authors | Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez |
Abstract | We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators. |
Abstract | We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step so as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05787v2 |
https://arxiv.org/pdf/1905.05787v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-parametric-and-nonparametric-models |
Repo | |
Framework | |
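A toy sketch of the mixture-of-experts idea, with made-up stand-in models and error estimators (not the authors' planner): at each simulated step, whichever model has the lower estimated local error is used to advance the rollout that produces the value estimate.

```python
# Sketch: per-step model selection during a simulated rollout for off-policy evaluation.
import numpy as np

def rollout_value(s0, policy, models, error_estimates, horizon=50, gamma=0.99):
    """models / error_estimates: lists of callables (state, action) -> (next_state, reward) / scalar error."""
    s, value = np.asarray(s0, dtype=float), 0.0
    for t in range(horizon):
        a = policy(s)
        # choose the model expected to be most accurate for this (state, action)
        k = int(np.argmin([err(s, a) for err in error_estimates]))
        s_next, r = models[k](s, a)
        value += (gamma ** t) * r
        s = s_next
    return value

# toy example: a biased parametric model vs. a noisy nonparametric (nearest-neighbour-like) one
parametric = lambda s, a: (0.9 * s + a, float(-np.abs(s).sum()))
nonparam   = lambda s, a: (0.9 * s + a + np.random.normal(0, 0.01, s.shape), float(-np.abs(s).sum()))
errs = [lambda s, a: 0.05 + 0.1 * np.linalg.norm(s),            # parametric error grows off-distribution
        lambda s, a: 0.02 if np.linalg.norm(s) < 1 else 0.5]    # nonparametric good near the data
policy = lambda s: -0.1 * s
print(rollout_value(np.array([0.5, -0.2]), policy, [parametric, nonparam], errs))
```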
PUTWorkbench: Analysing Privacy in AI-intensive Systems
Title | PUTWorkbench: Analysing Privacy in AI-intensive Systems |
Authors | Saurabh Srivastava, Vinay P. Namboodiri, T. V. Prabhakar |
Abstract | AI-intensive systems that operate upon user data face the challenge of balancing data utility with privacy concerns. We propose the idea and present the prototype of an open-source tool called Privacy Utility Trade-off (PUT) Workbench, which seeks to aid software practitioners in making such crucial decisions. We pick a simple privacy model that doesn’t require any background knowledge in Data Science and show how even that can achieve significant results on standard and real-life datasets. The tool and the source code are made freely available for extensions and usage. |
Tasks | |
Published | 2019-02-05 |
URL | http://arxiv.org/abs/1902.01580v1 |
http://arxiv.org/pdf/1902.01580v1.pdf | |
PWC | https://paperswithcode.com/paper/putworkbench-analysing-privacy-in-ai |
Repo | |
Framework | |
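The abstract does not specify the privacy model, so the following is only a generic illustration of the kind of privacy-utility trade-off analysis such a workbench supports: quasi-identifier columns are progressively generalized and the utility of a downstream classifier is recorded at each privacy level. The dataset and model here are synthetic stand-ins.

```python
# Generic privacy-utility trade-off illustration (not the PUT Workbench's actual model).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
age = rng.integers(18, 80, 1000)
income = rng.normal(50, 15, 1000)
y = (0.03 * age + 0.05 * income + rng.normal(0, 1, 1000) > 4.5).astype(int)

def utility(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)

for bucket in [1, 5, 10, 20]:                 # coarser buckets = more privacy
    age_gen = (age // bucket) * bucket        # generalize the quasi-identifier
    X = np.column_stack([age_gen, income])
    print(f"age bucket={bucket:>2}  utility={utility(X, y):.3f}")
```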
A Decentralized Proximal Point-type Method for Saddle Point Problems
Title | A Decentralized Proximal Point-type Method for Saddle Point Problems |
Authors | Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng |
Abstract | In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $\rho$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$ where the approximation error depends linearly on $\sqrt{\rho}$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $\mathcal{O}(1/\sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a generative adversarial network (GAN) in a decentralized manner match our theoretical guarantees. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14380v1 |
https://arxiv.org/pdf/1910.14380v1.pdf | |
PWC | https://paperswithcode.com/paper/a-decentralized-proximal-point-type-method |
Repo | |
Framework | |
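A simplified sketch of a decentralized proximal-point-style update on a toy bilinear saddle problem (fully connected gossip weights, inner gradient descent-ascent to approximate the proximal subproblem); the paper's exact update rule, network topology, and assumptions are not reproduced here.

```python
# Sketch: gossip mixing followed by an approximate local proximal-point step at each node.
# Toy objective at node i: f_i(x, y) = x^T A_i y.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, d = 4, 3
A = [rng.normal(size=(d, d)) for _ in range(n_nodes)]
W = np.full((n_nodes, n_nodes), 1.0 / n_nodes)        # fully-connected gossip weights
X = rng.normal(size=(n_nodes, d))                     # one (x, y) pair per node
Y = rng.normal(size=(n_nodes, d))

eta, inner_steps, inner_lr = 0.5, 20, 0.1
for t in range(200):
    Xbar, Ybar = W @ X, W @ Y                         # consensus (mixing) step
    for i in range(n_nodes):
        x, y = Xbar[i].copy(), Ybar[i].copy()
        for _ in range(inner_steps):                  # approximate proximal-point solve
            gx = A[i] @ y + (x - Xbar[i]) / eta       # grad_x of f_i plus prox term
            gy = A[i].T @ x - (y - Ybar[i]) / eta     # grad_y of f_i minus prox term
            x, y = x - inner_lr * gx, y + inner_lr * gy
        X[i], Y[i] = x, y

print("consensus error:", np.linalg.norm(X - X.mean(0)), "| iterate norm:", np.linalg.norm(X.mean(0)))
```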
On the effect of the activation function on the distribution of hidden nodes in a deep network
Title | On the effect of the activation function on the distribution of hidden nodes in a deep network |
Authors | Philip M. Long, Hanie Sedghi |
Abstract | We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{-1, 1\}^N$. We show that, if the activation function $\phi$ satisfies a minimal set of assumptions, satisfied by all activation functions that we know are used in practice, then, as the width of the network gets large, the “length process” converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases, and the activation function $\phi$. We also show that this convergence may fail for $\phi$ that violate our assumptions. |
Tasks | |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.02104v1 |
http://arxiv.org/pdf/1901.02104v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-effect-of-the-activation-function-on |
Repo | |
Framework | |
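A small simulation of the “length process” for one concrete case the assumptions cover, a wide fully connected ReLU network with input drawn from $\{-1, 1\}^N$: the empirical squared length of the pre-activations tracks the deterministic length map $q_{l+1} = \sigma_w^2\, q_l / 2 + \sigma_b^2$ (using $\mathbb{E}[\mathrm{relu}(z)^2] = q/2$ for $z \sim \mathcal{N}(0, q)$). The scaling conventions are assumptions made for the sketch, not taken from the paper.

```python
# Sketch: empirical vs. deterministic length map for a wide random ReLU network.
import numpy as np

rng = np.random.default_rng(0)
width, depth = 4000, 8
sigma_w, sigma_b = np.sqrt(2.0), 0.1

x = rng.choice([-1.0, 1.0], size=width)                 # input in {-1, 1}^N
phi = x                                                 # layer input (post-activation)
q_map = sigma_w ** 2 * np.mean(x ** 2) + sigma_b ** 2   # length of the first pre-activation
for layer in range(1, depth + 1):
    W = rng.normal(0.0, sigma_w / np.sqrt(len(phi)), size=(width, len(phi)))
    b = rng.normal(0.0, sigma_b, size=width)
    z = W @ phi + b                                     # pre-activations
    print(f"layer {layer}: empirical q = {np.mean(z**2):.4f}   length map q = {q_map:.4f}")
    phi = np.maximum(z, 0.0)                            # ReLU
    q_map = sigma_w ** 2 * q_map / 2 + sigma_b ** 2     # E[relu(z)^2] = q/2 for z ~ N(0, q)
```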
Extracting clinical concepts from user queries
Title | Extracting clinical concepts from user queries |
Authors | Yue Zhao, John Handley |
Abstract | Clinical concept extraction often begins with clinical Named Entity Recognition (NER). Often trained on annotated clinical notes, clinical NER models tend to struggle with tagging clinical entities in user queries because of the structural differences between clinical notes and user queries. User queries, unlike clinical notes, are often ungrammatical and incoherent. In many cases, user queries are composed of multiple clinical entities, without commas or conjunction words separating them. Using a mixture of annotated clinical notes and synthesized user queries as the training dataset, we adapt a clinical NER model based on the BiLSTM-CRF architecture for tagging clinical entities in user queries. Our contributions are the following: 1) We found that when trained on a mixture of synthesized user queries and clinical notes, the NER model performs better on both user queries and clinical notes. 2) We provide an end-to-end and easy-to-implement framework for clinical concept extraction from user queries. |
Tasks | Clinical Concept Extraction, Named Entity Recognition |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06262v2 |
https://arxiv.org/pdf/1912.06262v2.pdf | |
PWC | https://paperswithcode.com/paper/extracting-clinical-concepts-from-user |
Repo | |
Framework | |
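A toy sketch of the data-mixing idea, with made-up entities and a hypothetical `synthesize_query` helper (not the authors' pipeline): annotated entities are concatenated without commas or conjunctions to imitate user queries, and the result is mixed with clinical-note sentences before NER training; the BiLSTM-CRF tagger itself is omitted.

```python
# Sketch: synthesize user-query-like NER training examples and mix them with clinical notes.
import random

# toy annotated entities: (tokens, BIO tags)
entities = [
    (["chest", "pain"], ["B-problem", "I-problem"]),
    (["metformin"], ["B-treatment"]),
    (["blood", "glucose", "test"], ["B-test", "I-test", "I-test"]),
]

def synthesize_query(entities, k=2, rng=random):
    """Glue k random entities together with no commas/conjunctions, as users tend to type."""
    tokens, tags = [], []
    for toks, lbls in rng.sample(entities, k):
        tokens.extend(toks)
        tags.extend(lbls)
    return tokens, tags

clinical_note_sentences = [
    (["patient", "denies", "chest", "pain"], ["O", "O", "B-problem", "I-problem"]),
]

random.seed(0)
synthetic_queries = [synthesize_query(entities) for _ in range(100)]
training_mix = clinical_note_sentences + synthetic_queries   # the "mixture" the abstract describes
print(training_mix[1])   # one synthesized query with its BIO tags
# training_mix would then be fed to a BiLSTM-CRF tagger (omitted here).
```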
Clinical Concept Extraction for Document-Level Coding
Title | Clinical Concept Extraction for Document-Level Coding |
Authors | Sarah Wiegreffe, Edward Choi, Sherry Yan, Jimeng Sun, Jacob Eisenstein |
Abstract | The text of clinical notes can be a valuable source of patient information and clinical assessments. Historically, the primary approach for exploiting clinical notes has been information extraction: linking spans of text to concepts in a detailed domain ontology. However, recent work has demonstrated the potential of supervised machine learning to extract document-level codes directly from the raw text of clinical notes. We propose to bridge the gap between the two approaches with two novel syntheses: (1) treating extracted concepts as features, which are used to supplement or replace the text of the note; (2) treating extracted concepts as labels, which are used to learn a better representation of the text. Unfortunately, the resulting concepts do not yield performance gains on the document-level clinical coding task. We explore possible explanations and future research directions. |
Tasks | Clinical Concept Extraction |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03380v1 |
https://arxiv.org/pdf/1906.03380v1.pdf | |
PWC | https://paperswithcode.com/paper/clinical-concept-extraction-for-document |
Repo | |
Framework | |
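A minimal sketch of the two syntheses, with assumed dimensions and a hypothetical model class (not the paper's architecture): extracted concepts are concatenated to the text representation as features, and an auxiliary head predicts them as labels so the encoder representation becomes concept-aware.

```python
# Sketch: concepts as features (concatenation) and concepts as labels (auxiliary head).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAwareCoder(nn.Module):
    def __init__(self, text_dim=256, n_concepts=500, n_codes=50):
        super().__init__()
        self.code_head = nn.Linear(text_dim + n_concepts, n_codes)   # (1) concepts as features
        self.concept_head = nn.Linear(text_dim, n_concepts)          # (2) concepts as labels

    def forward(self, text_repr, concept_bow):
        code_logits = self.code_head(torch.cat([text_repr, concept_bow], dim=-1))
        concept_logits = self.concept_head(text_repr)
        return code_logits, concept_logits

model = ConceptAwareCoder()
text_repr = torch.randn(4, 256)                          # e.g. pooled output of a note encoder
concept_bow = torch.randint(0, 2, (4, 500)).float()      # extracted-concept indicators
codes = torch.randint(0, 2, (4, 50)).float()             # document-level code labels
code_logits, concept_logits = model(text_repr, concept_bow)
loss = (F.binary_cross_entropy_with_logits(code_logits, codes)
        + 0.5 * F.binary_cross_entropy_with_logits(concept_logits, concept_bow))
```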
Illuminated Decision Trees with Lucid
Title | Illuminated Decision Trees with Lucid |
Authors | David Mott, Richard Tomsett |
Abstract | The Lucid methods described by Olah et al. (2018) provide a way to inspect the inner workings of neural networks trained on image classification tasks using feature visualization. Such methods have generally been applied to networks trained on visually rich, large-scale image datasets like ImageNet, which enables them to produce enticing feature visualizations. To investigate these methods further, we applied them to classifiers trained to perform the much simpler (in terms of dataset size and visual richness), yet challenging task of distinguishing between different kinds of white blood cell from microscope images. Such a task makes generating useful feature visualizations difficult, as the discriminative features are inherently hard to identify and interpret. We address this by presenting the “Illuminated Decision Tree” approach, in which we use a neural network trained on the task as a feature extractor, then learn a decision tree based on these features, and provide Lucid visualizations for each node in the tree. We demonstrate our approach with several examples, showing how this approach could be useful both in model development and debugging, and when explaining model outputs to non-experts. |
Tasks | Image Classification |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.05644v1 |
https://arxiv.org/pdf/1909.05644v1.pdf | |
PWC | https://paperswithcode.com/paper/illuminated-decision-trees-with-lucid |
Repo | |
Framework | |
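A small sketch of the Illuminated Decision Tree recipe using scikit-learn and stand-in features (the Lucid rendering step is only indicated in a comment, not called): fit a shallow decision tree on pooled CNN channel activations and record which channel each internal node splits on, since that channel is what would be visualized for the node.

```python
# Sketch: decision tree over pooled CNN channel activations, one visualizable channel per node.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_images, n_channels = 200, 64
rng = np.random.default_rng(0)
# stand-in for globally average-pooled activations of one CNN layer (n_images x n_channels)
features = rng.normal(size=(n_images, n_channels))
labels = (features[:, 3] + 0.5 * features[:, 17] > 0).astype(int)   # toy cell-type labels

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, labels)

t = tree.tree_
for node in range(t.node_count):
    if t.children_left[node] != -1:              # internal (split) node
        ch = t.feature[node]
        print(f"node {node}: splits on channel {ch} at threshold {t.threshold[node]:.2f}")
        # -> render a Lucid feature visualization of channel `ch` in the chosen layer
        #    and attach it to this node of the "illuminated" tree.
```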
Condition-Invariant Multi-View Place Recognition
Title | Condition-Invariant Multi-View Place Recognition |
Authors | Jose M. Facil, Daniel Olid, Luis Montesano, Javier Civera |
Abstract | Visual place recognition is particularly challenging when places suffer changes in their appearance. Such changes are indeed common, e.g., due to weather, night/day or seasons. In this paper we build on recent research using deep networks, and explore how they can be improved by exploiting temporal sequence information. Specifically, we propose 3 different alternatives (Descriptor Grouping, Fusion and Recurrent Descriptors) for deep networks to use several frames of a sequence. We show that our approaches produce more compact and better-performing descriptors than single- and multi-view baselines in the literature on two public databases. |
Tasks | Visual Place Recognition |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09516v1 |
http://arxiv.org/pdf/1902.09516v1.pdf | |
PWC | https://paperswithcode.com/paper/condition-invariant-multi-view-place |
Repo | |
Framework | |
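A minimal sketch of the three alternatives with assumed descriptor shapes (not the paper's exact networks): given per-frame descriptors, Grouping concatenates them, Fusion pools and projects them, and the Recurrent variant runs an LSTM over the sequence.

```python
# Sketch: three ways to turn per-frame descriptors (B, T, D) into one sequence descriptor.
import torch
import torch.nn as nn

class SequenceDescriptor(nn.Module):
    def __init__(self, frame_dim=256, out_dim=128, n_frames=5):
        super().__init__()
        self.grouping = nn.Linear(frame_dim * n_frames, out_dim)  # concatenate then project
        self.fusion = nn.Linear(frame_dim, out_dim)               # pool frames then project
        self.rnn = nn.LSTM(frame_dim, out_dim, batch_first=True)  # recurrent over frames

    def forward(self, frames, mode="fusion"):
        if mode == "grouping":
            return self.grouping(frames.flatten(1))
        if mode == "fusion":
            return self.fusion(frames.mean(dim=1))
        out, _ = self.rnn(frames)
        return out[:, -1]                                         # last hidden state

frames = torch.randn(2, 5, 256)                                   # 2 sequences of 5 frames
model = SequenceDescriptor()
for mode in ["grouping", "fusion", "recurrent"]:
    d = torch.nn.functional.normalize(model(frames, mode), dim=-1)  # L2-normalised descriptor
    print(mode, d.shape)
```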
Feature Graph Learning for 3D Point Cloud Denoising
Title | Feature Graph Learning for 3D Point Cloud Denoising |
Authors | Wei Hu, Xiang Gao, Gene Cheung, Zongming Guo |
Abstract | Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $\mathbf{L}$, but require a large number of signal observations $\mathbf{z}$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $\mathbf{M}$ by minimizing the graph Laplacian regularizer (GLR) $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$, where edge weight is $w_{i,j} = \exp\{-(\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j)\}$, given a single observation $\mathbf{z}$. We optimize diagonal entries via proximal gradient (PG), where we constrain $\mathbf{M}$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $\mathbf{M}$. To keep $\mathbf{M}$ PD, we constrain the Schur complement of sub-matrix $\mathbf{M}_{2,2}$ of $\mathbf{M}$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $\mathbf{M}$, thus ensuring fast computation speed even when feature vector $\mathbf{f}_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments. |
Tasks | Denoising, Image Denoising |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09138v2 |
https://arxiv.org/pdf/1907.09138v2.pdf | |
PWC | https://paperswithcode.com/paper/feature-graph-learning-for-3d-point-cloud |
Repo | |
Framework | |
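A sketch of the objective only (not the authors' proximal-gradient/Gershgorin optimization): build the feature graph from a Mahalanobis metric, form the Laplacian, and evaluate the GLR $\mathbf{z}^{\top}\mathbf{L}\mathbf{z}$ for a single observation, showing that a metric aligned with the feature the signal is smooth in scores better than one aligned with an irrelevant feature.

```python
# Sketch: feature graph from a Mahalanobis metric and the graph Laplacian regularizer.
import numpy as np

def feature_graph_laplacian(F, M):
    """F: (n, d) per-node feature vectors, M: (d, d) PSD metric. Returns the graph Laplacian."""
    diffs = F[:, None, :] - F[None, :, :]                 # (n, n, d) pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", diffs, M, diffs)     # squared Mahalanobis distances
    W = np.exp(-d2)                                       # w_ij = exp(-(f_i - f_j)^T M (f_i - f_j))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def glr(z, F, M):
    """Graph Laplacian regularizer z^T L z for a single observation z."""
    return float(z @ feature_graph_laplacian(F, M) @ z)

rng = np.random.default_rng(0)
F = rng.normal(size=(30, 4))                 # one feature vector per node (e.g. per 3D point)
z = F[:, 0] + 0.1 * rng.normal(size=30)      # a signal that varies smoothly with feature 0

# A metric emphasizing the feature the signal is smooth in gives a lower GLR than one
# emphasizing an irrelevant feature (same trace) -- what the metric optimization exploits.
print("GLR, metric on feature 0:", glr(z, F, np.diag([4.0, 0.1, 0.1, 0.1])))
print("GLR, metric on feature 1:", glr(z, F, np.diag([0.1, 4.0, 0.1, 0.1])))
```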
Fooling Computer Vision into Inferring the Wrong Body Mass Index
Title | Fooling Computer Vision into Inferring the Wrong Body Mass Index |
Authors | Owen Levin, Zihang Meng, Vikas Singh, Xiaojin Zhu |
Abstract | Recently it’s been shown that neural networks can use images of human faces to accurately predict Body Mass Index (BMI), a widely used health indicator. In this paper we demonstrate that a neural network performing BMI inference is indeed vulnerable to test-time adversarial attacks. This extends test-time adversarial attacks from classification tasks to regression. The application we highlight is BMI inference in the insurance industry, where such adversarial attacks imply a danger of insurance fraud. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06916v1 |
https://arxiv.org/pdf/1905.06916v1.pdf | |
PWC | https://paperswithcode.com/paper/fooling-computer-vision-into-inferring-the |
Repo | |
Framework | |
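A generic FGSM-style attack on a regression output, sketched with an untrained stand-in network (the paper's exact attack and model are not reproduced): the input image is nudged along the sign of the gradient of the predicted BMI to push the prediction in a chosen direction while keeping the perturbation small.

```python
# Sketch: FGSM-style test-time attack on a BMI regression model.
import torch
import torch.nn as nn

bmi_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 1))

def fgsm_regression_attack(model, image, epsilon=0.01, increase=True):
    x = image.clone().requires_grad_(True)
    pred = model(x).sum()                   # predicted BMI (summed over the batch for a scalar)
    pred.backward()
    direction = x.grad.sign() if increase else -x.grad.sign()
    return (x + epsilon * direction).clamp(0, 1).detach()

face = torch.rand(1, 3, 64, 64)             # stand-in for a face image
adv_face = fgsm_regression_attack(bmi_model, face, epsilon=0.01, increase=False)
with torch.no_grad():
    print("BMI before:", bmi_model(face).item(), "after:", bmi_model(adv_face).item())
```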
A Dictionary Based Generalization of Robust PCA
Title | A Dictionary Based Generalization of Robust PCA |
Authors | Sirisha Rambhatla, Xingguo Li, Jarvis Haupt |
Abstract | We analyze the decomposition of a data matrix, assumed to be a superposition of a low-rank component and a component which is sparse in a known dictionary, using a convex demixing method. We provide a unified analysis, encompassing both undercomplete and overcomplete dictionary cases, and show that the constituent components can be successfully recovered under some relatively mild assumptions up to a certain $\textit{global}$ sparsity level. Further, we corroborate our theoretical results by presenting empirical evaluations in terms of phase transitions in rank and sparsity for various dictionary sizes. |
Tasks | |
Published | 2019-02-21 |
URL | http://arxiv.org/abs/1902.08171v1 |
http://arxiv.org/pdf/1902.08171v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dictionary-based-generalization-of-robust |
Repo | |
Framework | |
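A sketch of the demixing model $\mathbf{X} \approx \mathbf{L} + \mathbf{R}\mathbf{S}$ with $\mathbf{L}$ low rank and $\mathbf{S}$ sparse in a known dictionary $\mathbf{R}$, solved here with a simple penalized proximal alternating scheme rather than the authors' exact convex program; the data, dictionary, and parameters are synthetic illustrations.

```python
# Sketch: alternate singular-value thresholding on L and soft-thresholding on S for
#   ||L||_* + lam * ||S||_1 + (mu/2) * ||X - L - R S||_F^2.
import numpy as np

def svt(Z, tau):                                   # prox of the nuclear norm
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(Z, tau):                                  # prox of the l1 norm
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def demix(X, R, lam=0.05, mu=5.0, iters=300):
    L = np.zeros_like(X)
    S = np.zeros((R.shape[1], X.shape[1]))
    step = 1.0 / (mu * np.linalg.norm(R, 2) ** 2)  # gradient step size for the S update
    for _ in range(iters):
        L = svt(X - R @ S, 1.0 / mu)               # exact prox step in L
        grad_S = -mu * R.T @ (X - L - R @ S)
        S = soft(S - step * grad_S, step * lam)    # proximal gradient step in S
    return L, S

rng = np.random.default_rng(0)
n, m, k = 40, 60, 80
L_true = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))        # rank-2 component
R = rng.normal(size=(n, k)) / np.sqrt(n)                          # known (overcomplete) dictionary
S_true = rng.normal(size=(k, m)) * (rng.random((k, m)) < 0.02)    # sparse codes
X = L_true + R @ S_true
L_hat, S_hat = demix(X, R)
print("relative error in L:", np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```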
On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
Title | On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization |
Authors | Hao Yu, Rong Jin, Sen Yang |
Abstract | Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent (SGD). A series of recent works study the linear speedup property of distributed SGD variants with reduced communication. The linear speedup property enables us to scale out the computing capability by adding more computing nodes to the system. The reduced communication complexity is desirable since communication overhead is often the performance bottleneck in distributed systems. Recently, momentum methods have become more and more widely adopted in training machine learning models, as they often converge faster and generalize better. For example, many practitioners use distributed SGD with momentum to train deep neural networks with big data. However, it remains unclear whether any distributed momentum SGD possesses the same linear speedup property as distributed SGD and has reduced communication complexity. This paper fills the gap by considering a distributed communication efficient momentum SGD method and proving its linear speedup property. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03817v1 |
https://arxiv.org/pdf/1905.03817v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-linear-speedup-analysis-of |
Repo | |
Framework | |
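A toy sketch of communication-efficient momentum SGD via local updates with periodic averaging (whether and how momentum buffers are averaged is an assumption here, and the paper's exact method may differ): each worker runs heavy-ball momentum SGD on its own least-squares shard, and parameters are synchronized only every `comm_period` steps.

```python
# Sketch: local momentum SGD with periodic parameter averaging on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, comm_period = 8, 10, 16
A = [rng.normal(size=(100, dim)) for _ in range(n_workers)]      # each worker's data shard
b = [Ai @ np.ones(dim) + 0.1 * rng.normal(size=100) for Ai in A]

x = [np.zeros(dim) for _ in range(n_workers)]                    # per-worker parameters
v = [np.zeros(dim) for _ in range(n_workers)]                    # per-worker momentum buffers
lr, beta = 0.001, 0.9

for step in range(1, 801):
    for i in range(n_workers):
        idx = rng.integers(0, 100, size=8)                       # minibatch
        grad = A[i][idx].T @ (A[i][idx] @ x[i] - b[i][idx]) / len(idx)
        v[i] = beta * v[i] + grad                                 # heavy-ball momentum
        x[i] = x[i] - lr * v[i]
    if step % comm_period == 0:                                   # infrequent communication
        x_avg, v_avg = np.mean(x, axis=0), np.mean(v, axis=0)
        x = [x_avg.copy() for _ in range(n_workers)]
        v = [v_avg.copy() for _ in range(n_workers)]

print("distance to least-squares target:", np.linalg.norm(np.mean(x, axis=0) - np.ones(dim)))
```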