October 20, 2019

2984 words 15 mins read

Paper Group AWR 319

Multimodal Differential Network for Visual Question Generation

Title Multimodal Differential Network for Visual Question Generation
Authors Badri N. Patro, Sandeep Kumar, Vinod K. Kurmi, Vinay P. Namboodiri
Abstract Generating natural questions from an image is a semantic task that requires using visual and language modalities to learn multimodal representations. Images can have multiple visual and language contexts that are relevant for generating questions, namely places, captions, and tags. In this paper, we propose the use of exemplars for obtaining the relevant context. We obtain this by using a Multimodal Differential Network to produce natural and engaging questions. The generated questions show a remarkable similarity to the natural questions, as validated by a human study. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on the quantitative metrics (BLEU, METEOR, ROUGE, and CIDEr).
Tasks Question Generation
Published 2018-08-12
URL https://arxiv.org/abs/1808.03986v2
PDF https://arxiv.org/pdf/1808.03986v2.pdf
PWC https://paperswithcode.com/paper/multimodal-differential-network-for-visual
Repo https://github.com/badripatro/MDN-VQG
Framework pytorch
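
A minimal sketch of the exemplar idea, assuming precomputed image and caption features: a joint embedding of the target is pulled toward a supporting exemplar and pushed away from a contrasting one with a triplet loss, and the resulting representation conditions the question decoder. All module names and dimensions below are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    """Fuse image and caption features into one multimodal vector."""
    def __init__(self, img_dim=2048, txt_dim=512, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)

    def forward(self, img_feat, txt_feat):
        return torch.tanh(self.img_proj(img_feat) + self.txt_proj(txt_feat))

embed = JointEmbedding()
triplet = nn.TripletMarginLoss(margin=1.0)

# Target image/caption, a supporting exemplar (similar context) and a
# contrasting exemplar (dissimilar context), as stand-in random features.
anchor   = embed(torch.randn(8, 2048), torch.randn(8, 512))
positive = embed(torch.randn(8, 2048), torch.randn(8, 512))
negative = embed(torch.randn(8, 2048), torch.randn(8, 512))

# Pull the supporting exemplar close, push the contrasting one away.
loss = triplet(anchor, positive, negative)
```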

Accurate and Diverse Sampling of Sequences based on a “Best of Many” Sample Objective

Title Accurate and Diverse Sampling of Sequences based on a “Best of Many” Sample Objective
Authors Apratim Bhattacharyya, Bernt Schiele, Mario Fritz
Abstract For autonomous agents to successfully operate in the real world, anticipation of future events and states of their environment is a key competence. This problem has been formalized as a sequence extrapolation problem, where a number of observations are used to predict the sequence into the future. Real-world scenarios demand a model of uncertainty of such predictions, as predictions become increasingly uncertain – in particular on long time horizons. While impressive results have been shown on point estimates, scenarios that induce multi-modal distributions over future sequences remain challenging. Our work addresses these challenges in a Gaussian Latent Variable model for sequence prediction. Our core contribution is a “Best of Many” sample objective that leads to more accurate and more diverse predictions that better capture the true variations in real-world sequence data. Beyond our analysis of improved model fit, our models also empirically outperform prior work on three diverse tasks ranging from traffic scenes to weather data.
Tasks
Published 2018-06-20
URL http://arxiv.org/abs/1806.07772v2
PDF http://arxiv.org/pdf/1806.07772v2.pdf
PWC https://paperswithcode.com/paper/accurate-and-diverse-sampling-of-sequences
Repo https://github.com/apratimbhattacharyya18/CGM_BestOfMany
Framework tf
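
The "Best of Many" objective is concrete enough to sketch: draw several latent samples, decode each, and backpropagate only through the sample closest to the ground truth, plus the usual KL term. The decoder and shapes below are stand-ins, not the authors' model.

```python
import torch

def best_of_many_loss(decode, mu, logvar, target, k=10):
    """'Best of Many' sample objective (sketch): draw k latent samples,
    decode each, and penalize only the closest prediction.
    `decode` maps a latent sample to a prediction."""
    losses = []
    for _ in range(k):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        pred = decode(z)
        losses.append(((pred - target) ** 2).mean(dim=-1))     # per-example error
    best = torch.stack(losses, dim=0).min(dim=0).values        # best of k samples
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    return (best + kl).mean()

# Toy usage with a linear decoder as a placeholder.
mu, logvar = torch.zeros(4, 16), torch.zeros(4, 16)
decoder = torch.nn.Linear(16, 32)
target = torch.randn(4, 32)
loss = best_of_many_loss(decoder, mu, logvar, target, k=10)
```

Averaging the loss over all k samples instead pulls every sample toward the mean; penalizing only the best one is what keeps the predictions diverse.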

SparseMAP: Differentiable Sparse Structured Inference

Title SparseMAP: Differentiable Sparse Structured Inference
Authors Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie
Abstract Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structures, including implausible ones. Importantly, SparseMAP can be computed using only calls to a MAP oracle, making it applicable to problems with intractable marginal inference, e.g., linear assignment. Sparsity makes gradient backpropagation efficient regardless of the structure, enabling us to augment deep neural networks with generic and sparse structured hidden layers. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.
Tasks Dependency Parsing, Natural Language Inference, Structured Prediction
Published 2018-02-12
URL http://arxiv.org/abs/1802.04223v2
PDF http://arxiv.org/pdf/1802.04223v2.pdf
PWC https://paperswithcode.com/paper/sparsemap-differentiable-sparse-structured
Repo https://github.com/mblondel/projection-losses
Framework none
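
In the special case where all structures can be enumerated and each is represented by a one-hot indicator, SparseMAP reduces to sparsemax over the structure scores, which makes the sparse-selection behavior easy to illustrate (the combinatorial case instead uses an active-set method built on MAP-oracle calls):

```python
import numpy as np

def sparsemax(scores):
    """Euclidean projection of a score vector onto the probability simplex
    (sparsemax). Low-scoring entries get exactly zero mass."""
    z = np.sort(scores)[::-1]                     # scores in decreasing order
    cssv = np.cumsum(z)
    k = np.arange(1, len(z) + 1)
    support = z + (1.0 - cssv) / k > 0            # entries kept in the support
    k_star = k[support][-1]
    tau = (cssv[support][-1] - 1.0) / k_star      # threshold
    return np.maximum(scores - tau, 0.0)

# Scores of four candidate structures: only a few receive nonzero
# probability, unlike softmax, which spreads mass over all of them.
p = sparsemax(np.array([2.0, 1.8, 0.3, -1.0]))
print(p)  # [0.6 0.4 0.  0. ]
```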

Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision

Title Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision
Authors Sathya N. Ravi, Tuan Dinh, Vishnu Sai Rao Lokhande, Vikas Singh
Abstract A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better accuracy to compression. But support for general constraints within widely used libraries remains scarce, and their broader deployment within many applications that can benefit from them remains under-explored. Part of the reason is that stochastic gradient descent (SGD), the workhorse for training deep neural networks, does not natively deal with constraints with global scope very well. In this paper, we revisit a classical first order scheme from numerical optimization, Conditional Gradients (CG), that has thus far had limited applicability in training deep models. We show via rigorous analysis how various constraints can be naturally handled by modifications of this algorithm. We provide convergence guarantees and show a suite of immediate benefits that are possible – from training ResNets with fewer layers but better accuracy simply by substituting in our version of CG to faster training of GANs with 50% fewer epochs in image inpainting applications to provably better generalization guarantees using efficiently implementable forms of recently proposed regularizers.
Tasks Image Inpainting
Published 2018-03-17
URL http://arxiv.org/abs/1803.06453v1
PDF http://arxiv.org/pdf/1803.06453v1.pdf
PWC https://paperswithcode.com/paper/constrained-deep-learning-using-conditional
Repo https://github.com/lokhande-vishnu/deepcg
Framework tf
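
The core update is the classical Frank-Wolfe step: a linear minimization oracle (LMO) over the constraint set replaces projection, so feasibility is preserved by convex combinations. A sketch for an L1-ball constraint, with the radius and step schedule as illustrative assumptions:

```python
import torch

def frank_wolfe_step(params, grads, radius, step):
    """One conditional-gradient (Frank-Wolfe) update under an L1-ball
    constraint ||w||_1 <= radius. The LMO for the L1 ball puts all mass
    on the coordinate with the largest |gradient|."""
    for w, g in zip(params, grads):
        s = torch.zeros_like(w)
        idx = g.abs().view(-1).argmax()               # extreme point of the ball
        s.view(-1)[idx] = -radius * g.view(-1)[idx].sign()
        w.mul_(1 - step).add_(step * s)               # convex combination stays feasible

# Toy usage: minimize ||w - 1||^2 while keeping w inside the L1 ball of radius 5.
w = torch.zeros(10)                                   # feasible starting point
for t in range(100):
    g = 2 * (w - 1.0)                                 # gradient of ||w - 1||^2
    frank_wolfe_step([w], [g], radius=5.0, step=2.0 / (t + 2))
```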

Brain MRI Image Super Resolution using Phase Stretch Transform and Transfer Learning

Title Brain MRI Image Super Resolution using Phase Stretch Transform and Transfer Learning
Authors Sifeng He, Bahram Jalali
Abstract A hallucination-free and computationally efficient algorithm for enhancing the resolution of brain MRI images is demonstrated.
Tasks Image Super-Resolution, Medical Super-Resolution, Super-Resolution, Transfer Learning
Published 2018-07-31
URL http://arxiv.org/abs/1807.11643v1
PDF http://arxiv.org/pdf/1807.11643v1.pdf
PWC https://paperswithcode.com/paper/brain-mri-image-super-resolution-using-phase
Repo https://github.com/JalaliLabUCLA/Jalali-Lab-Implementation-of-RAISR
Framework none

Efficient Augmentation via Data Subsampling

Title Efficient Augmentation via Data Subsampling
Authors Michael Kuchnik, Virginia Smith
Abstract Data augmentation is commonly used to encode invariances in learning methods. However, this process is often performed in an inefficient manner, as artificial examples are created by applying a number of transformations to all points in the training set. The resulting explosion of the dataset size can be an issue in terms of storage and training costs, as well as in selecting and tuning the optimal set of transformations to apply. In this work, we demonstrate that it is possible to significantly reduce the number of data points included in data augmentation while realizing the same accuracy and invariance benefits of augmenting the entire dataset. We propose a novel set of subsampling policies, based on model influence and loss, that can achieve a 90% reduction in augmentation set size while maintaining the accuracy gains of standard data augmentation.
Tasks Data Augmentation, Image Augmentation
Published 2018-10-11
URL http://arxiv.org/abs/1810.05222v2
PDF http://arxiv.org/pdf/1810.05222v2.pdf
PWC https://paperswithcode.com/paper/efficient-augmentation-via-data-subsampling
Repo https://github.com/mkuchnik/Efficient_Augmentation
Framework none
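
A hedged sketch of the loss-based policy: score every training point once, then augment only the hardest fraction. The influence-based policies in the paper work analogously with influence scores in place of losses; the fraction below is illustrative.

```python
import numpy as np

def loss_based_subsample(losses, frac=0.1):
    """Return indices of the hardest `frac` of training points (highest
    loss); only these are augmented, instead of the whole training set."""
    k = max(1, int(frac * len(losses)))
    return np.argsort(losses)[-k:]                    # top-k by loss

# Usage sketch: score the training set once, then apply transformations
# (rotations, crops, ...) only to the selected subset.
losses = np.random.rand(50000)                        # stand-in per-example losses
augment_idx = loss_based_subsample(losses, frac=0.1)
print(len(augment_idx))                               # 5000 points, not all 50000
```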

Improving Generalization via Scalable Neighborhood Component Analysis

Title Improving Generalization via Scalable Neighborhood Component Analysis
Authors Zhirong Wu, Alexei A. Efros, Stella X. Yu
Abstract Current major approaches to visual recognition follow an end-to-end formulation that classifies an input image into one of a pre-determined set of semantic categories. Parametric softmax classifiers are a common choice for such a closed world with fixed categories, especially when big labeled data is available during training. However, this becomes problematic for open-set scenarios where new categories are encountered with very few examples for learning a generalizable parametric classifier. We adopt a non-parametric approach for visual recognition by optimizing feature embeddings instead of parametric classifiers. We use a deep neural network to learn the visual feature that preserves the neighborhood structure in the semantic space, based on the Neighborhood Component Analysis (NCA) criterion. To overcome its computational bottlenecks, we devise a mechanism that uses augmented memory to scale NCA for large datasets and very deep networks. Our experiments deliver not only remarkable performance on ImageNet classification for such a simple non-parametric method, but most importantly a more generalizable feature representation for sub-category discovery and few-shot recognition.
Tasks
Published 2018-08-14
URL http://arxiv.org/abs/1808.04699v1
PDF http://arxiv.org/pdf/1808.04699v1.pdf
PWC https://paperswithcode.com/paper/improving-generalization-via-scalable
Repo https://github.com/Microsoft/snca.pytorch
Framework pytorch
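
A minimal sketch of the scaled NCA loss, assuming an augmented memory bank that stores one embedding per training example (updated elsewhere, e.g., by a momentum average of batch features); the temperature and names are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def nca_memory_loss(feat, labels, memory, memory_labels, idx, tau=0.05):
    """NCA loss against a memory bank (sketch). `feat` holds L2-normalized
    batch embeddings; `memory` holds embeddings of the whole training set;
    `idx` marks the batch rows in the memory so each point is excluded
    as its own neighbor."""
    sims = feat @ memory.t() / tau                        # scaled cosine similarities
    sims.scatter_(1, idx.unsqueeze(1), float('-inf'))     # exclude self-similarity
    p = F.softmax(sims, dim=1)                            # neighbor-picking probabilities
    same = (labels.unsqueeze(1) == memory_labels.unsqueeze(0)).float()
    # Probability of picking a neighbor of the same class.
    p_correct = (p * same).sum(dim=1).clamp_min(1e-8)
    return -p_correct.log().mean()
```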

Invariance Analysis of Saliency Models versus Human Gaze During Scene Free Viewing

Title Invariance Analysis of Saliency Models versus Human Gaze During Scene Free Viewing
Authors Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min
Abstract Most current studies on human gaze and saliency modeling have used high-quality stimuli. In the real world, however, captured images undergo various types of distortions during the whole acquisition, transmission, and displaying chain. Some distortion types include motion blur, lighting variations and rotation. Despite a few efforts, the influence of ubiquitous distortions on visual attention and saliency models has not been systematically investigated. In this paper, we first create a large-scale database including eye movements of 10 observers over 1900 images degraded by 19 types of distortions. Second, by analyzing eye movements and saliency models, we find that: a) observers look at different locations over distorted versus original images, and b) performances of saliency models are drastically hindered over distorted images, with the maximum performance drop belonging to Rotation and Shearing distortions. Finally, we investigate the effectiveness of different distortions when serving as data augmentation transformations. Experimental results verify that data augmentation transformations which preserve the human gaze of reference images can improve deep saliency models against distortions, while transformations that severely change human gaze degrade performance.
Tasks Data Augmentation
Published 2018-10-10
URL http://arxiv.org/abs/1810.04456v1
PDF http://arxiv.org/pdf/1810.04456v1.pdf
PWC https://paperswithcode.com/paper/invariance-analysis-of-saliency-models-versus
Repo https://github.com/CZHQuality/Sal-CFS-GAN
Framework tf
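
As a sketch of the augmentation takeaway, gaze-preserving distortions (e.g., lighting changes, mild blur) can be expressed as standard torchvision transforms; which distortions actually preserve gaze is the paper's empirical finding, and this transform list is only illustrative:

```python
from torchvision import transforms

# Illustrative gaze-preserving distortions used as augmentation; the
# paper's finding is that transforms which keep human gaze stable help,
# while heavy rotation/shearing (which move gaze) hurt saliency models.
gaze_preserving_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3),      # lighting variation
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # mild blur
    transforms.ToTensor(),
])
```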

Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising

Title Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
Authors Borja Balle, Yu-Xiang Wang
Abstract The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime ($\varepsilon \to 0$) and it cannot be extended to the low privacy regime ($\varepsilon \to \infty$). We address these limitations by developing an optimal Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative distribution function instead of a tail bound approximation. We also propose to equip the Gaussian mechanism with a post-processing step based on adaptive estimation techniques by leveraging that the distribution of the perturbation is known. Our experiments show that analytical calibration removes at least a third of the variance of the noise compared to the classical Gaussian mechanism, and that denoising dramatically improves the accuracy of the Gaussian mechanism in the high-dimensional regime.
Tasks Calibration, Denoising
Published 2018-05-16
URL http://arxiv.org/abs/1805.06530v2
PDF http://arxiv.org/pdf/1805.06530v2.pdf
PWC https://paperswithcode.com/paper/improving-the-gaussian-mechanism-for
Repo https://github.com/BorjaBalle/analytic-gaussian-mechanism
Framework none
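
The calibration idea translates directly into code: instead of the classical tail-bound formula, find the smallest sigma satisfying the exact (epsilon, delta) condition from the paper by bisecting on the Gaussian CDF (the paper gives a more refined algorithm; this bisection is a simplified stand-in):

```python
import math

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def classical_sigma(eps, delta, sens=1.0):
    """Classical tail-bound calibration (valid only for eps < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sens / eps

def analytic_sigma(eps, delta, sens=1.0):
    """Smallest sigma with Phi(s/(2*sig) - eps*sig/s)
    - exp(eps) * Phi(-s/(2*sig) - eps*sig/s) <= delta, via bisection."""
    def dp_delta(sigma):
        a = sens / (2.0 * sigma) - eps * sigma / sens
        b = -sens / (2.0 * sigma) - eps * sigma / sens
        return phi(a) - math.exp(eps) * phi(b)
    lo, hi = 1e-6, 1e3
    for _ in range(200):                  # dp_delta decreases in sigma
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dp_delta(mid) > delta else (lo, mid)
    return hi

print(classical_sigma(1.0, 1e-5))         # ~4.8
print(analytic_sigma(1.0, 1e-5))          # about 3.7: noticeably less noise
```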

CLUSE: Cross-Lingual Unsupervised Sense Embeddings

Title CLUSE: Cross-Lingual Unsupervised Sense Embeddings
Authors Ta-Chung Chi, Yun-Nung Chen
Abstract This paper proposes a modularized sense induction and representation learning model that jointly learns bilingual sense embeddings that align well in the vector space, where the cross-lingual signal in the English-Chinese parallel corpus is exploited to capture the collocation and distributed characteristics in the language pair. The model is evaluated on the Stanford Contextual Word Similarity (SCWS) dataset to ensure the quality of monolingual sense embeddings. In addition, we introduce Bilingual Contextual Word Similarity (BCWS), a large and high-quality dataset for evaluating cross-lingual sense embeddings, the first attempt at measuring whether the learned embeddings are indeed aligned well in the vector space. The proposed approach shows the superior quality of sense embeddings evaluated in both monolingual and bilingual spaces.
Tasks Representation Learning
Published 2018-09-15
URL http://arxiv.org/abs/1809.05694v2
PDF http://arxiv.org/pdf/1809.05694v2.pdf
PWC https://paperswithcode.com/paper/cluse-cross-lingual-unsupervised-sense
Repo https://github.com/MiuLab/BCWS
Framework none

Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph

Title Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph
Authors Luca Albergante, Evgeny M. Mirkes, Huidong Chen, Alexis Martin, Louis Faure, Emmanuel Barillot, Luca Pinello, Alexander N. Gorban, Andrei Zinovyev
Abstract Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of the developing embryo being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures that does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.
Tasks
Published 2018-04-20
URL http://arxiv.org/abs/1804.07580v2
PDF http://arxiv.org/pdf/1804.07580v2.pdf
PWC https://paperswithcode.com/paper/robust-and-scalable-learning-of-complex
Repo https://github.com/j-bac/elpigraph-python
Framework none

Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities

Title Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities
Authors Hai Pham, Paul Pu Liang, Thomas Manzini, Louis-Philippe Morency, Barnabas Poczos
Abstract Multimodal sentiment analysis is a core research area that studies speaker sentiment expressed from the language, visual, and acoustic modalities. The central challenge in multimodal learning involves inferring joint representations that can process and relate information from these modalities. However, existing work learns joint representations by requiring all modalities as input and as a result, the learned representations may be sensitive to noisy or missing modalities at test time. With the recent success of sequence to sequence (Seq2Seq) models in machine translation, there is an opportunity to explore new ways of learning joint representations that may not require all input modalities at test time. In this paper, we propose a method to learn robust joint representations by translating between modalities. Our method is based on the key insight that translation from a source to a target modality provides a method of learning joint representations using only the source modality as input. We augment modality translations with a cycle consistency loss to ensure that our joint representations retain maximal information from all modalities. Once our translation model is trained with paired multimodal data, we only need data from the source modality at test time for final sentiment prediction. This ensures that our model remains robust to perturbations or missing information in the other modalities. We train our model with a coupled translation-prediction objective and it achieves new state-of-the-art results on multimodal sentiment analysis datasets: CMU-MOSI, ICT-MMMO, and YouTube. Additional experiments show that our model learns increasingly discriminative joint representations with more input modalities while maintaining robustness to missing or perturbed modalities.
Tasks Machine Translation, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-12-19
URL https://arxiv.org/abs/1812.07809v2
PDF https://arxiv.org/pdf/1812.07809v2.pdf
PWC https://paperswithcode.com/paper/found-in-translation-learning-robust-joint
Repo https://github.com/hainow/MCTN
Framework tf
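
A minimal sketch of the coupled objective, with generic GRU translators standing in for the authors' Seq2Seq architecture: translate the source modality to the target, cycle back to the source, and predict sentiment from the translated representation, so only the source modality is needed at test time.

```python
import torch
import torch.nn as nn

class ModalityTranslator(nn.Module):
    """Stand-in translator: maps one modality's feature sequence to another's."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):
        out, _ = self.net(x)
        return out

fwd = ModalityTranslator()        # language -> visual
bwd = ModalityTranslator()        # visual  -> language (cycle)
head = nn.Linear(128, 1)          # sentiment from the joint representation
mse = nn.MSELoss()

lang, vis = torch.randn(4, 20, 128), torch.randn(4, 20, 128)
y = torch.randn(4, 1)

vis_hat  = fwd(lang)              # translate source -> target modality
lang_hat = bwd(vis_hat)           # cycle back to the source
pred = head(vis_hat.mean(dim=1))  # predict sentiment from the translation

# Coupled translation + cycle-consistency + prediction objective.
loss = mse(vis_hat, vis) + mse(lang_hat, lang) + mse(pred, y)
```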

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Title Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
Authors Filip Radenović, Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondřej Chum
Abstract In this paper we address issues with image retrieval benchmarking on standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with an extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected. An extensive comparison of the state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN-based methods. The best results are achieved by combining the best of both worlds. Most importantly, image retrieval appears far from being solved.
Tasks Image Retrieval
Published 2018-03-29
URL http://arxiv.org/abs/1803.11285v1
PDF http://arxiv.org/pdf/1803.11285v1.pdf
PWC https://paperswithcode.com/paper/revisiting-oxford-and-paris-large-scale-image
Repo https://github.com/tensorflow/models
Framework tf

A Simple Method for Commonsense Reasoning

Title A Simple Method for Commonsense Reasoning
Authors Trieu H. Trinh, Quoc V. Le
Abstract Commonsense reasoning is a long-standing challenge for deep learning. For example, it is difficult to use neural networks to tackle the Winograd Schema dataset (Levesque et al., 2011). In this paper, we present a simple method for commonsense reasoning with neural networks, using unsupervised learning. Key to our method is the use of language models, trained on a massive amount of unlabeled data, to score multiple choice questions posed by commonsense reasoning tests. On both Pronoun Disambiguation and Winograd Schema challenges, our models outperform previous state-of-the-art methods by a large margin, without using expensive annotated knowledge bases or hand-engineered features. We train an array of large RNN language models that operate at word or character level on LM-1-Billion, CommonCrawl, SQuAD, Gutenberg Books, and a customized corpus for this task and show that diversity of training data plays an important role in test performance. Further analysis also shows that our system successfully discovers important features of the context that decide the correct answer, indicating a good grasp of commonsense knowledge.
Tasks Common Sense Reasoning
Published 2018-06-07
URL https://arxiv.org/abs/1806.02847v2
PDF https://arxiv.org/pdf/1806.02847v2.pdf
PWC https://paperswithcode.com/paper/a-simple-method-for-commonsense-reasoning
Repo https://github.com/tensorflow/models/tree/master/research/lm_commonsense
Framework tf
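
The scoring trick is easy to sketch: substitute each candidate referent for the pronoun and keep the sentence the language model finds more likely. The paper trains its own ensemble of RNN LMs; a pretrained GPT-2 via Hugging Face transformers is used here purely as a stand-in.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_score(sentence):
    """Average negative log-likelihood of the sentence under the LM."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Winograd-style resolution: substitute each candidate for the pronoun
# and keep the substitution the LM finds more plausible (lower NLL).
a = "The trophy doesn't fit in the suitcase because the trophy is too big."
b = "The trophy doesn't fit in the suitcase because the suitcase is too big."
print(min([a, b], key=lm_score))  # ideally the 'trophy' reading
```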

Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks

Title Accelerating Prototype-Based Drug Discovery using Conditional Diversity Networks
Authors Shahar Harel, Kira Radinsky
Abstract Designing a new drug is a lengthy and expensive process. As the space of potential molecules is very large (10^23-10^60), a common technique during drug discovery is to start from a molecule which already has some of the desired properties. An interdisciplinary team of scientists generates hypotheses about the required changes to the prototype. In this work, we develop an algorithmic unsupervised approach that automatically generates potential drug molecules given a prototype drug. We show that the molecules generated by the system are valid molecules and significantly different from the prototype drug. Out of the compounds generated by the system, we identified 35 FDA-approved drugs. As an example, our system generated Isoniazid - one of the main drugs for Tuberculosis. The system is currently being deployed for use in collaboration with pharmaceutical companies to further analyze the additional generated molecules.
Tasks Drug Discovery
Published 2018-04-08
URL http://arxiv.org/abs/1804.02668v1
PDF http://arxiv.org/pdf/1804.02668v1.pdf
PWC https://paperswithcode.com/paper/accelerating-prototype-based-drug-discovery
Repo https://github.com/0h-n0/cdn_molecule_pytorch
Framework pytorch