January 31, 2020

3288 words 16 mins read

Paper Group AWR 407

Lane Detection and Classification using Cascaded CNNs

Title Lane Detection and Classification using Cascaded CNNs
Authors Fabio Pizzati, Marco Allodi, Alejandro Barrera, Fernando García
Abstract Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As in many other computer-vision-based tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to identify lane boundaries. However, the position of the lane boundaries w.r.t. the vehicle may not suffice for reliable positioning, as for path planning or localization, information regarding lane types may also be needed. In this work, we present an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real-time. To build the system, 14,336 lane boundary instances of the TuSimple dataset for lane detection have been labelled using 8 different classes. Our dataset and the code for inference are available online.
Tasks Autonomous Vehicles, Lane Detection
Published 2019-07-02
URL https://arxiv.org/abs/1907.01294v2
PDF https://arxiv.org/pdf/1907.01294v2.pdf
PWC https://paperswithcode.com/paper/lane-detection-and-classification-using
Repo https://github.com/fabvio/TuSimple-lane-classes
Framework none
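
A rough sketch of such a two-stage cascade appears below: a first network produces per-instance lane-boundary masks, and a second classifies each detected boundary conditioned on the image plus that boundary's mask. The layer choices and shapes are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

class LaneSegmenter(nn.Module):
    """Stage 1: per-pixel lane-boundary instance segmentation (toy backbone)."""
    def __init__(self, num_instances=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_instances, 1),  # one channel per boundary instance
        )
    def forward(self, img):
        return self.net(img)

class LaneClassifier(nn.Module):
    """Stage 2: classify each detected boundary into one of 8 lane types."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )
    def forward(self, img, instance_mask):
        # Condition the classifier on the image plus one instance's mask.
        return self.net(torch.cat([img, instance_mask], dim=1))

img = torch.randn(1, 3, 256, 512)
masks = LaneSegmenter()(img).sigmoid()
logits = LaneClassifier()(img, masks[:, :1])  # classify the first boundary
```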

TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks

Title TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks
Authors Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu, Doga Gursoy, Francesco De Carlo, Ian Foster
Abstract Synchrotron-based x-ray tomography is a noninvasive imaging technique that allows for reconstructing the internal structure of materials at high spatial resolutions, from tens of micrometers to a few nanometers. In order to resolve sample features at smaller length scales, however, a higher radiation dose is required. Therefore, the limitation on the achievable resolution is set primarily by noise at these length scales. We present TomoGAN, a denoising technique based on generative adversarial networks, for improving the quality of reconstructed images under low-dose imaging conditions. We evaluate our approach in two photon-budget-limited experimental conditions: (1) a sufficient number of low-dose projections (based on Nyquist sampling), and (2) an insufficient or limited number of high-dose projections. In both cases the angular sampling is assumed to be isotropic, and the photon budget throughout the experiment is fixed based on the maximum allowable radiation dose on the sample. Evaluation with both simulated and experimental datasets shows that our approach can significantly reduce noise in reconstructed images, improving the structural similarity score of simulation and experimental data from 0.18 to 0.9 and from 0.18 to 0.41, respectively. Furthermore, the quality of the reconstructed images with filtered back projection followed by our denoising approach exceeds that of reconstructions with the simultaneous iterative reconstruction technique, showing the computational superiority of our approach.
Tasks Denoising
Published 2019-02-20
URL https://arxiv.org/abs/1902.07582v5
PDF https://arxiv.org/pdf/1902.07582v5.pdf
PWC https://paperswithcode.com/paper/tomogan-low-dose-x-ray-tomography-with
Repo https://github.com/ramsesproject/TomoGAN
Framework tf
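
A minimal training sketch of the scheme such a denoiser implies, assuming paired low-dose/normal-dose reconstructions and a standard adversarial-plus-fidelity objective; the toy networks and the pixel-loss weight of 100 are assumptions, not TomoGAN's actual U-Net generator or loss balance (shown in PyTorch for brevity, although the released code is TensorFlow-based).

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Flatten(), nn.Linear(32 * 32 * 32, 1))

opt_g = torch.optim.Adam(G.parameters(), 1e-4)
opt_d = torch.optim.Adam(D.parameters(), 1e-4)
bce = nn.functional.binary_cross_entropy_with_logits

noisy = torch.randn(4, 1, 64, 64)   # low-dose reconstruction (input)
clean = torch.randn(4, 1, 64, 64)   # paired normal-dose target

# Discriminator step: real = clean images, fake = denoised outputs.
fake = G(noisy).detach()
loss_d = bce(D(clean), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: adversarial term plus a pixel-wise fidelity term.
fake = G(noisy)
loss_g = bce(D(fake), torch.ones(4, 1)) + 100.0 * nn.functional.mse_loss(fake, clean)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```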

Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings

Title Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings
Authors Gerhard Wohlgenannt, Artemii Babushkin, Denis Romashov, Igor Ukrainets, Anton Maskaykin, Ilya Shutov
Abstract In this paper, we present Russian language datasets in the digital humanities domain for the evaluation of word embedding techniques or similar language modeling and feature learning algorithms. The datasets are split into two task types, word intrusion and word analogy, and contain 31,362 task units in total. The characteristics of the tasks and datasets are that they build upon small, domain-specific corpora, and that the datasets contain a high number of named entities. The datasets were created manually for two fantasy novel book series (“A Song of Ice and Fire” and “Harry Potter”). We provide baseline evaluations with popular word embedding models trained on the book corpora for the given tasks, both for the Russian and English language versions of the datasets. Finally, we compare and analyze the results and discuss the specifics of the Russian language with regard to the problem setting.
Tasks Language Modelling, Word Embeddings
Published 2019-03-04
URL http://arxiv.org/abs/1903.08739v1
PDF http://arxiv.org/pdf/1903.08739v1.pdf
PWC https://paperswithcode.com/paper/russian-language-datasets-in-the-digitial
Repo https://github.com/ishutov/nlp2018_hp_asoif_rus
Framework none
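
To make the two task types concrete, here is a minimal sketch with gensim; the toy corpus and example words are illustrative, not drawn from the actual datasets.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for the book corpora the paper trains on.
sentences = [["harry", "ron", "hermione", "wand", "spell", "castle"]] * 100
wv = Word2Vec(sentences, vector_size=50, min_count=1, epochs=5).wv

def word_intrusion(words):
    """Return the word with the lowest mean similarity to the others."""
    def mean_sim(w):
        return sum(wv.similarity(w, o) for o in words if o != w) / (len(words) - 1)
    return min(words, key=mean_sim)

def analogy(a, b, c):
    """a is to b as c is to ? -- solved by vector arithmetic."""
    return wv.most_similar(positive=[b, c], negative=[a], topn=1)[0][0]

print(word_intrusion(["harry", "ron", "hermione", "castle"]))
print(analogy("harry", "wand", "ron"))
```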

Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks

Title Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Authors Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda
Abstract Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions, or by using a neural architecture search. We find that some nonlinear decompositions outperform existing ones.
Tasks Model Compression, Neural Architecture Search
Published 2019-08-13
URL https://arxiv.org/abs/1908.04471v2
PDF https://arxiv.org/pdf/1908.04471v2.pdf
PWC https://paperswithcode.com/paper/einconv-exploring-unexplored-tensor
Repo https://github.com/pfnet-research/einconv
Framework pytorch
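
For intuition, here is one well-known member of this decomposition class, the depthwise separable convolution, written explicitly as an einsum contraction over a toy input; the shapes are illustrative.

```python
import torch

x = torch.randn(8, 16, 32, 32)                           # (batch, C_in, H, W)
patches = torch.nn.functional.unfold(x, 3, padding=1)    # (B, C_in*9, H*W)
patches = patches.view(8, 16, 3, 3, 32, 32)              # (B, C_in, kh, kw, H, W)

depthwise = torch.randn(16, 3, 3)                        # per-channel spatial kernel
pointwise = torch.randn(32, 16)                          # 1x1 channel-mixing kernel

# Contract spatial dims per channel, then mix channels: equivalent to
# Conv2d(groups=C_in) followed by Conv2d(kernel_size=1).
y = torch.einsum("bchwxy,chw,oc->boxy", patches, depthwise, pointwise)
print(y.shape)  # torch.Size([8, 32, 32, 32])
```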

Amortized Bethe Free Energy Minimization for Learning MRFs

Title Amortized Bethe Free Energy Minimization for Learning MRFs
Authors Sam Wiseman, Yoon Kim
Abstract We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients. In particular, we optimize a saddle-point objective deriving from the Bethe free energy approximation to the partition function. Unlike much recent work in approximate inference, the derived objective requires no sampling, and can be efficiently computed even for very expressive MRFs. We furthermore amortize this optimization with trained inference networks. Experimentally, we find that the proposed approach compares favorably with loopy belief propagation, but is faster, and it allows for attaining better held-out log-likelihood than other recent approximate inference schemes.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06399v2
PDF https://arxiv.org/pdf/1906.06399v2.pdf
PWC https://paperswithcode.com/paper/amortized-bethe-free-energy-minimization-for
Repo https://github.com/swiseman/bethe-min
Framework pytorch
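
As a sketch of the objective itself (not the authors' amortized inference networks), the Bethe free energy of a small pairwise MRF can be evaluated directly from node and edge beliefs; the marginalization constraints tying edge beliefs to node beliefs are omitted here for brevity.

```python
import torch

K = 3                                   # states per node
edges = [(0, 1), (1, 2)]                # a 3-node chain; node 1 has degree 2
log_psi_node = torch.randn(3, K)        # unary log-potentials
log_psi_edge = {e: torch.randn(K, K) for e in edges}

b_node = torch.softmax(torch.randn(3, K), dim=-1)
b_edge = {e: torch.softmax(torch.randn(K * K), 0).view(K, K) for e in edges}

def bethe_free_energy(b_node, b_edge):
    degree = torch.zeros(3)
    energy, entropy = 0.0, 0.0
    for e in edges:
        degree[e[0]] += 1; degree[e[1]] += 1
        energy -= (b_edge[e] * log_psi_edge[e]).sum()
        entropy -= (b_edge[e] * b_edge[e].log()).sum()
    energy -= (b_node * log_psi_node).sum()
    # Node entropies are over-counted (degree - 1) times by the edge terms.
    entropy += ((degree - 1) * (b_node * b_node.log()).sum(-1)).sum()
    return energy - entropy              # F_Bethe = U_Bethe - H_Bethe

print(bethe_free_energy(b_node, b_edge))
```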

Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality

Title Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality
Authors Sukarna Barua, Xingjun Ma, Sarah Monazam Erfani, Michael E. Houle, James Bailey
Abstract Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which the manifolds of two data distributions coincide with each other. In experiments on 4 benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, and is robust to small-scale noise, image transformations, and sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality.
Tasks
Published 2019-05-02
URL http://arxiv.org/abs/1905.00643v1
PDF http://arxiv.org/pdf/1905.00643v1.pdf
PWC https://paperswithcode.com/paper/quality-evaluation-of-gans-using-cross-local
Repo https://github.com/sukarnabarua/CrossLID
Framework tf
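
A minimal sketch of the underlying estimator, assuming the standard maximum-likelihood LID estimate with the neighborhood of each real point drawn from the generated set (the "cross" in CrossLID); the simple averaging over real points used for aggregation here is an assumption.

```python
import numpy as np

def cross_lid(real, generated, k=20):
    lids = []
    for x in real:
        d = np.linalg.norm(generated - x, axis=1)
        r = np.sort(d)[:k]                        # k nearest generated neighbors
        lids.append(-k / np.sum(np.log(r / r[-1])))  # MLE (Hill) estimator
    return float(np.mean(lids))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 32))
good = rng.normal(size=(1000, 32))                # matches the real manifold
bad = rng.normal(loc=5.0, size=(1000, 32))        # mode-shifted "generator"
# A distribution mismatch inflates the score relative to the matched case.
print(cross_lid(real, good), cross_lid(real, bad))
```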

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data

Title Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Authors Sergei Popov, Stanislav Morozov, Artem Babenko
Abstract Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.
Tasks Representation Learning
Published 2019-09-13
URL https://arxiv.org/abs/1909.06312v2
PDF https://arxiv.org/pdf/1909.06312v2.pdf
PWC https://paperswithcode.com/paper/neural-oblivious-decision-ensembles-for-deep
Repo https://github.com/Qwicen/node
Framework pytorch
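
A toy version of a single differentiable oblivious tree, with softmax standing in for the entmax feature selection used in NODE and illustrative dimensions throughout, might look like this:

```python
import torch
import torch.nn as nn

class SoftObliviousTree(nn.Module):
    """Soft feature choice and soft threshold splits; leaf responses are
    combined via the product of split probabilities along each path."""
    def __init__(self, in_features, depth=3):
        super().__init__()
        self.depth = depth
        self.feat_logits = nn.Parameter(torch.randn(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.leaves = nn.Parameter(torch.randn(2 ** depth))

    def forward(self, x):                       # x: (batch, in_features)
        probs = []
        for d in range(self.depth):
            feat = x @ torch.softmax(self.feat_logits[d], -1)  # soft feature
            p = torch.sigmoid(feat - self.thresholds[d])       # P(go right)
            probs.append(torch.stack([1 - p, p], -1))          # (batch, 2)
        # Outer product over levels -> probability of reaching each leaf.
        leaf_prob = probs[0]
        for p in probs[1:]:
            leaf_prob = (leaf_prob.unsqueeze(-1) * p.unsqueeze(1)).flatten(1)
        return leaf_prob @ self.leaves          # (batch,)

out = SoftObliviousTree(in_features=10)(torch.randn(4, 10))
print(out.shape)  # torch.Size([4])
```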

O-MedAL: Online Active Deep Learning for Medical Image Analysis

Title O-MedAL: Online Active Deep Learning for Medical Image Analysis
Authors Asim Smailagic, Pedro Costa, Alex Gaudio, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Devesh Walawalkar, Susu Xu, Adrian Galdran, Pei Zhang, Aurélio Campilho, Hae Young Noh
Abstract Active Learning methods create an optimized and labeled training set from unlabeled data. We introduce a novel Online Active Deep Learning method for Medical Image Analysis. We extend our MedAL active learning framework to present new results in this paper. Experiments on three medical image datasets show that our novel online active learning model requires significantly fewer labeled examples, is more accurate, and is more robust to class imbalances than existing methods. Our method is also more accurate and computationally efficient than the baseline model. Compared to random sampling and uncertainty sampling, the method uses 275 and 200 (out of 768) fewer labeled examples, respectively. For Diabetic Retinopathy detection, our method attains a 5.88% accuracy improvement over the baseline model when 80% of the dataset is labeled, and the model reaches baseline accuracy when only 40% is labeled.
Tasks Active Learning, Diabetic Retinopathy Detection
Published 2019-08-28
URL https://arxiv.org/abs/1908.10508v1
PDF https://arxiv.org/pdf/1908.10508v1.pdf
PWC https://paperswithcode.com/paper/o-medal-online-active-deep-learning-for
Repo https://github.com/adgaudio/O-MedAL
Framework pytorch
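
A sketch of the online loop this implies, where distance to the mean labeled feature stands in for MedAL's average-distance acquisition criterion (an assumption), and the same model keeps training across rounds instead of being reinitialized:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
feats = model[0]                                 # first layer as "embedding"

X = torch.randn(768, 64)
y = torch.randint(0, 2, (768,))
labeled = list(range(16))                        # small initial seed set
unlabeled = [i for i in range(768) if i not in labeled]

for round_ in range(5):
    # Online step: continue training on the labeled pool, no reinitialization.
    loss = nn.functional.cross_entropy(model(X[labeled]), y[labeled])
    opt.zero_grad(); loss.backward(); opt.step()

    # Acquisition: pick the unlabeled point farthest from labeled features.
    with torch.no_grad():
        center = feats(X[labeled]).mean(0)
        dists = (feats(X[unlabeled]) - center).norm(dim=1)
    labeled.append(unlabeled.pop(int(dists.argmax())))
```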

Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting

Title Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting
Authors Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng
Abstract Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. A sample re-weighting strategy is commonly used to alleviate this issue by designing a weighting function mapping from training loss to sample weight, and then iterating between weight recalculation and classifier updating. Current approaches, however, need to manually pre-specify the weighting function as well as its additional hyper-parameters. This makes them fairly hard to apply in practice, since the proper weighting scheme varies significantly with the investigated problem and training data. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, constituting a universal approximator for almost any continuous function, which makes the method able to fit a wide range of weighting functions, including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method to achieve proper weighting functions in class-imbalance and noisy-label cases, fully complying with the common settings in traditional methods, as well as in more complicated scenarios beyond conventional cases. This naturally leads to better accuracy than other state-of-the-art methods.
Tasks Image Classification, Meta-Learning
Published 2019-02-20
URL https://arxiv.org/abs/1902.07379v6
PDF https://arxiv.org/pdf/1902.07379v6.pdf
PWC https://paperswithcode.com/paper/push-the-student-to-learn-right-progressive
Repo https://github.com/xjtushujun/meta-weight-net
Framework pytorch
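
The weighting function itself is easy to sketch; the bi-level meta-update that trains it (a virtual classifier step followed by a meta-gradient step on the clean meta-data) is the part omitted below, and the hidden size is an illustrative choice.

```python
import torch
import torch.nn as nn

class MetaWeightNet(nn.Module):
    """One-hidden-layer MLP mapping each example's loss to a weight in [0, 1]."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())
    def forward(self, loss):                     # loss: (batch,)
        return self.net(loss.unsqueeze(-1)).squeeze(-1)

classifier = nn.Linear(32, 10)
weight_net = MetaWeightNet()
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
per_example = nn.functional.cross_entropy(classifier(x), y, reduction="none")
with torch.no_grad():
    weights = weight_net(per_example)            # loss -> weight, per example
(weights * per_example).mean().backward()        # weighted classifier update
opt.step()
```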

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

Title DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Authors Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou
Abstract Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task, since it requires the agent to fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually grounded information. Previous models typically leverage single-hop or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we therefore propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain question- and history-aware image features and question- and image-aware dialog history features via a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention mechanism to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin.
Tasks Visual Dialog
Published 2019-12-18
URL https://arxiv.org/abs/1912.08360v1
PDF https://arxiv.org/pdf/1912.08360v1.pdf
PWC https://paperswithcode.com/paper/dmrm-a-dual-channel-multi-hop-reasoning-model
Repo https://github.com/phellonchen/DMRM
Framework pytorch
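
One reasoning hop per channel might be sketched as the residual attention update below; DMRM's actual gating and fusion are richer than this, and all dimensions are illustrative.

```python
import torch

def attention_hop(query, keys):
    # query: (batch, d), keys: (batch, n, d)
    scores = torch.einsum("bd,bnd->bn", query, keys) / keys.size(-1) ** 0.5
    attn = torch.softmax(scores, dim=-1)
    context = torch.einsum("bn,bnd->bd", attn, keys)
    return query + context                       # residual update of the query

q = torch.randn(2, 128)                          # question encoding
img = torch.randn(2, 36, 128)                    # image region features
hist = torch.randn(2, 10, 128)                   # dialog-history features

q_img, q_hist = q, q
for _ in range(3):                               # multi-hop, dual channel
    q_img = attention_hop(q_img, img)            # image channel
    q_hist = attention_hop(q_hist, hist)         # history channel
fused = torch.cat([q_img, q_hist], dim=-1)       # fed to the answer decoder
```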

Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset

Title Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset
Authors Misgina Tsighe Hagos, Shri Kant
Abstract Annotated training data insufficiency remains one of the challenges of applying deep learning to medical data classification problems. Transfer learning from an already trained deep convolutional network can be used to reduce the cost of training from scratch and to train with small training data for deep learning. This raises the question of whether we can use transfer learning to overcome the training data insufficiency problem in deep learning based medical data classification. Deep convolutional networks have been achieving high-performance results on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) image classification challenge. One example is the Inception-V3 model, which was the first runner-up in the ILSVRC 2015 challenge. Inception modules, which help to extract different-sized features of input images in one level of convolution, are the unique feature of Inception-V3. In this work, we have used a pretrained Inception-V3 model to take advantage of its Inception modules for Diabetic Retinopathy detection. In order to tackle the labelled data insufficiency problem, we sub-sampled a smaller version of the Kaggle Diabetic Retinopathy classification challenge dataset for model training, and tested the model’s accuracy on a previously unseen data subset. Our technique could be used in other deep learning based medical image classification problems facing the challenge of labeled training data insufficiency.
Tasks Diabetic Retinopathy Detection, Image Classification, Object Recognition, Transfer Learning
Published 2019-05-17
URL https://arxiv.org/abs/1905.07203v2
PDF https://arxiv.org/pdf/1905.07203v2.pdf
PWC https://paperswithcode.com/paper/transfer-learning-based-detection-of-diabetic
Repo https://github.com/ShubhayanS/Multiclass-Diabetic-Retinopathy-Detection
Framework tf
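
A transfer-learning sketch in this spirit, using torchvision's pretrained Inception-V3 (the authors' own code is TensorFlow-based; this assumes a recent torchvision), freezing the trunk and retraining only a new classification head:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)    # new head: DR vs. no DR
model.aux_logits = False                         # skip the auxiliary head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
model.eval()                                     # frozen BN stats; head still trains

x = torch.randn(4, 3, 299, 299)                  # Inception-V3 input size
y = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```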

Derivative Manipulation for General Example Weighting

Title Derivative Manipulation for General Example Weighting
Authors Xinshao Wang, Elyor Kodirov, Yang Hua, Neil M. Robertson
Abstract We propose derivative manipulation (DM) for training accurate and robust softmax-based deep neural networks, for two reasons: (1) In gradient-based optimisation, manipulating the derivative directly is more straightforward than designing loss functions, and it has a direct impact on the update of a model. (2) A loss function’s derivative magnitude function can be understood as a weighting scheme; the derivative of the loss for an example defines how much impact that example has on the update of a model. Therefore, manipulating the derivative amounts to adjusting the weighting scheme. DM simply modifies the derivative magnitude, including transformation and normalisation, after which the derivative magnitude function is termed the emphasis density function (EDF). An EDF is a formula expressing an example weighting scheme, and we may deduce many options for EDFs from common probability density functions (PDFs). We demonstrate the effectiveness of the DM formulation empirically through extensive experiments on both vision and language tasks, especially when adverse conditions exist, e.g., noisy data and sample imbalance.
Tasks Image Classification, Representation Learning
Published 2019-05-27
URL https://arxiv.org/abs/1905.11233v6
PDF https://arxiv.org/pdf/1905.11233v6.pdf
PWC https://paperswithcode.com/paper/emphasis-regularisation-by-gradient-rescaling
Repo https://github.com/XinshaoAmosWang/Emphasis-Regularisation-by-Gradient-Rescaling
Framework none
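
A sketch of the idea for softmax cross-entropy, where scaling each example's loss by a detached weight re-weights its derivative; the Gaussian-shaped EDF and its constants below are illustrative choices, not the paper's specific scheme.

```python
import torch
import torch.nn as nn

def dm_loss(logits, targets, mode=0.5, spread=0.1):
    # For cross-entropy, the per-example gradient magnitude is governed by
    # (1 - p_y); here we re-weight it with an assumed Gaussian-shaped EDF.
    p = torch.softmax(logits, dim=-1)
    p_y = p[torch.arange(len(targets)), targets]         # prob. of true class
    # Emphasize examples whose p_y is near `mode`; softly ignore the rest
    # (e.g., very hard, likely-noisy examples with tiny p_y).
    edf = torch.exp(-(p_y - mode) ** 2 / (2 * spread ** 2)).detach()
    edf = edf / edf.sum()                                # normalisation step
    ce = nn.functional.cross_entropy(logits, targets, reduction="none")
    return (edf * ce).sum()

logits = torch.randn(16, 10, requires_grad=True)
targets = torch.randint(0, 10, (16,))
dm_loss(logits, targets).backward()
```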

Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images

Title Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images
Authors Ivan Krešo, Josip Krapac, Siniša Šegvić
Abstract Recent progress in deep image classification models has provided great potential to improve state-of-the-art performance in related computer vision tasks. However, the transition to semantic segmentation is hampered by strict memory limitations of contemporary GPUs. The extent of feature map caching required by convolutional backprop poses significant challenges even for moderately sized Pascal images, while requiring careful architectural considerations when the source resolution is in the megapixel range. To address these concerns, we propose a novel DenseNet-based ladder-style architecture which features high modelling power and a very lean upsampling datapath. We also propose to substantially reduce the extent of feature map caching by exploiting the inherent spatial efficiency of the DenseNet feature extractor. The resulting models deliver high performance with fewer parameters than competitive approaches, and allow training at megapixel resolution on commodity hardware. The presented models outperform the state of the art in terms of prediction accuracy and execution speed on the Cityscapes, Pascal VOC 2012, CamVid and ROB 2018 datasets. Source code will be released upon publication.
Tasks Image Classification, Semantic Segmentation
Published 2019-05-14
URL https://arxiv.org/abs/1905.05661v1
PDF https://arxiv.org/pdf/1905.05661v1.pdf
PWC https://paperswithcode.com/paper/190505661
Repo https://github.com/Maligetzus/Semantic-Segmentation-of-Aerial-Photographs
Framework none
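
One lean ladder-style upsampling step might be sketched as below, with cheap 1x1 projections keeping the decoder far lighter than the DenseNet downsampling path; the channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class LadderUp(nn.Module):
    """Upsample deep features, blend with a cheaply projected skip connection."""
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(skip_ch, out_ch, 1)        # cheap skip projection
        self.squeeze = nn.Conv2d(deep_ch, out_ch, 1)
        self.blend = nn.Conv2d(out_ch, out_ch, 3, padding=1)
    def forward(self, deep, skip):
        deep = nn.functional.interpolate(self.squeeze(deep),
                                         size=skip.shape[-2:], mode="bilinear",
                                         align_corners=False)
        return self.blend(deep + self.proj(skip))

deep = torch.randn(1, 512, 16, 16)                       # low-res, high-level
skip = torch.randn(1, 256, 32, 32)                       # earlier DenseNet block
print(LadderUp(512, 256, 128)(deep, skip).shape)         # (1, 128, 32, 32)
```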

An Annotated Corpus of Reference Resolution for Interpreting Common Grounding

Title An Annotated Corpus of Reference Resolution for Interpreting Common Grounding
Authors Takuma Udagawa, Akiko Aizawa
Abstract Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation. However, interpreting the process of common grounding is a challenging task, especially under continuous and partially-observable context where complex ambiguity, uncertainty, partial understandings and misunderstandings are introduced. Interpretation becomes even more challenging when we deal with dialogue systems which still have limited capability of natural language understanding and generation. To address this problem, we consider reference resolution as the central subtask of common grounding and propose a new resource to study its intermediate process. Based on a simple and general annotation schema, we collected a total of 40,172 referring expressions in 5,191 dialogues curated from an existing corpus, along with multiple judgements of referent interpretations. We show that our annotation is highly reliable, captures the complexity of common grounding through a natural degree of reasonable disagreements, and allows for more detailed and quantitative analyses of common grounding strategies. Finally, we demonstrate the advantages of our annotation for interpreting, analyzing and improving common grounding in baseline dialogue systems.
Tasks Coreference Resolution, Goal-Oriented Dialog, Visual Dialog
Published 2019-11-18
URL https://arxiv.org/abs/1911.07588v1
PDF https://arxiv.org/pdf/1911.07588v1.pdf
PWC https://paperswithcode.com/paper/an-annotated-corpus-of-reference-resolution
Repo https://github.com/Alab-NII/onecommon
Framework none

Improving Generative Visual Dialog by Answering Diverse Questions

Title Improving Generative Visual Dialog by Answering Diverse Questions
Authors Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das
Abstract Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this ‘self-talk’ approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Qbot and Abot during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Qbot to ask diverse questions, thus reducing repetitions and in turn enabling Abot to explore a larger state space during RL, i.e., to be exposed to more visual concepts to talk about and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, i.e., dialog that is more diverse (less repetitive), consistent (fewer conflicting exchanges), fluent (more human-like), and detailed, while remaining comparably image-relevant to prior work and ablations.
Tasks Representation Learning, Visual Dialog
Published 2019-09-23
URL https://arxiv.org/abs/1909.10470v2
PDF https://arxiv.org/pdf/1909.10470v2.pdf
PWC https://paperswithcode.com/paper/190910470
Repo https://github.com/vmurahari3/visdial-diversity
Framework pytorch
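
Such an auxiliary objective can be sketched as a penalty on the similarity between successive question states, added to the main RL loss; the cosine form and the weighting below are assumptions, not the paper's exact formulation.

```python
import torch

def diversity_penalty(q_states):
    # q_states: (rounds, batch, d) hidden states used to decode each question.
    prev, curr = q_states[:-1], q_states[1:]
    cos = torch.nn.functional.cosine_similarity(prev, curr, dim=-1)
    return cos.clamp(min=0).mean()               # penalize only similarity

q_states = torch.randn(10, 4, 512, requires_grad=True)
aux = diversity_penalty(q_states)
(0.1 * aux).backward()                           # added to the main RL objective
print(float(aux))
```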