Paper Group AWR 407
Lane Detection and Classification using Cascaded CNNs. TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks. Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings. Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks. Amortized Bethe Free Energy Minimization for Learning MRFs. Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality. Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data. O-MedAL: Online Active Deep Learning for Medical Image Analysis. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog. Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset. Derivative Manipulation for General Example Weighting. Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images. An Annotated Corpus of Reference Resolution for Interpreting Common Grounding. Improving Generative Visual Dialog by Answering Diverse Questions.
Lane Detection and Classification using Cascaded CNNs
Title | Lane Detection and Classification using Cascaded CNNs |
Authors | Fabio Pizzati, Marco Allodi, Alejandro Barrera, Fernando García |
Abstract | Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As in many other computer vision tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to identify lane boundaries. However, the position of the lane boundaries w.r.t. the vehicle may not suffice for reliable positioning, as, for path planning or localization, information regarding lane types may also be needed. In this work, we present an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real-time. To build the system, 14336 lane boundary instances of the TuSimple dataset for lane detection have been labelled using 8 different classes. Our dataset and the code for inference are available online. |
Tasks | Autonomous Vehicles, Lane Detection |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01294v2 |
https://arxiv.org/pdf/1907.01294v2.pdf | |
PWC | https://paperswithcode.com/paper/lane-detection-and-classification-using |
Repo | https://github.com/fabvio/TuSimple-lane-classes |
Framework | none |
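The pipeline is two cascaded networks: the first segments lane-boundary instances, the second classifies each detected boundary into one of the 8 lane types. A minimal sketch of such a cascade in PyTorch is below; the `detector`/`classifier` modules and the mask-gating scheme are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LaneCascade(nn.Module):
    """Two cascaded CNNs: instance segmentation, then per-boundary classification."""
    def __init__(self, detector: nn.Module, classifier: nn.Module):
        super().__init__()
        self.detector = detector      # image -> [B, K, H, W] boundary masks
        self.classifier = classifier  # masked image -> [B, 8] class logits

    def forward(self, image: torch.Tensor):
        masks = self.detector(image)                 # [B, K, H, W]
        preds = []
        for k in range(masks.shape[1]):
            # Gate the input with each boundary mask and classify that boundary.
            masked = image * masks[:, k:k + 1]       # broadcasts over RGB channels
            preds.append(self.classifier(masked))    # [B, 8]
        return masks, torch.stack(preds, dim=1)      # masks and [B, K, 8] logits
```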
TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks
Title | TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks |
Authors | Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu, Doga Gursoy, Francesco De Carlo, Ian Foster |
Abstract | Synchrotron-based x-ray tomography is a noninvasive imaging technique that allows for reconstructing the internal structure of materials at high spatial resolutions, from tens of micrometers to a few nanometers. In order to resolve sample features at smaller length scales, however, a higher radiation dose is required; the limitation on the achievable resolution is therefore set primarily by noise at these length scales. We present TomoGAN, a denoising technique based on generative adversarial networks, for improving the quality of reconstructed images under low-dose imaging conditions. We evaluate our approach in two photon-budget-limited experimental conditions: (1) a sufficient number of low-dose projections (based on Nyquist sampling), and (2) an insufficient or limited number of high-dose projections. In both cases the angular sampling is assumed to be isotropic, and the photon budget throughout the experiment is fixed based on the maximum allowable radiation dose on the sample. Evaluation with both simulated and experimental datasets shows that our approach can significantly reduce noise in reconstructed images, improving the structural similarity score of simulated and experimental data from 0.18 to 0.9 and from 0.18 to 0.41, respectively. Furthermore, the quality of images reconstructed with filtered back projection followed by our denoising approach exceeds that of reconstructions with the simultaneous iterative reconstruction technique, demonstrating the computational superiority of our approach. |
Tasks | Denoising |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07582v5 |
https://arxiv.org/pdf/1902.07582v5.pdf | |
PWC | https://paperswithcode.com/paper/tomogan-low-dose-x-ray-tomography-with |
Repo | https://github.com/ramsesproject/TomoGAN |
Framework | tf |
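As a rough illustration of the approach, a GAN denoiser pairs a generator that maps a noisy (low-dose) reconstruction to a denoised one with a discriminator that distinguishes denoised outputs from high-dose images. The sketch below is a generic adversarial-denoising training step; the `gen`/`disc` networks and the loss weighting are assumptions, and the paper's exact loss composition may include further terms.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, g_opt, d_opt, noisy, clean, adv_weight=0.01):
    # --- discriminator: real high-dose images vs. denoised outputs ---
    d_opt.zero_grad()
    fake = gen(noisy).detach()
    real_logits, fake_logits = disc(clean), disc(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    d_opt.step()

    # --- generator: pixel fidelity plus a small adversarial term ---
    g_opt.zero_grad()
    fake = gen(noisy)
    fake_logits = disc(fake)
    g_loss = F.mse_loss(fake, clean) + adv_weight * F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```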
Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings
Title | Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings |
Authors | Gerhard Wohlgenannt, Artemii Babushkin, Denis Romashov, Igor Ukrainets, Anton Maskaykin, Ilya Shutov |
Abstract | In this paper, we present Russian language datasets in the digital humanities domain for the evaluation of word embedding techniques and similar language modeling and feature learning algorithms. The datasets are split into two task types, word intrusion and word analogy, and contain 31362 task units in total. The tasks and datasets are distinctive in that they build upon small, domain-specific corpora and contain a high number of named entities. The datasets were created manually for two fantasy novel book series (“A Song of Ice and Fire” and “Harry Potter”). We provide baseline evaluations with popular word embedding models trained on the book corpora for the given tasks, both for the Russian and English language versions of the datasets. Finally, we compare and analyze the results and discuss specifics of the Russian language with regard to the problem setting. |
Tasks | Language Modelling, Word Embeddings |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.08739v1 |
http://arxiv.org/pdf/1903.08739v1.pdf | |
PWC | https://paperswithcode.com/paper/russian-language-datasets-in-the-digitial |
Repo | https://github.com/ishutov/nlp2018_hp_asoif_rus |
Framework | none |
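For the word-analogy task type, a baseline evaluation with word embeddings reduces to vector arithmetic over the trained vectors. A hedged sketch using gensim follows; the vector file name and the 4-tuple task format are assumptions based on the abstract.

```python
from gensim.models import KeyedVectors

# Hypothetical path to embeddings trained on the Russian book corpus.
vectors = KeyedVectors.load_word2vec_format("asoif_ru.vec")

def analogy_accuracy(tasks, topn=1):
    """tasks: iterable of (a, b, c, expected), read as a : b :: c : expected."""
    correct = total = 0
    for a, b, c, expected in tasks:
        try:
            candidates = vectors.most_similar(positive=[b, c], negative=[a], topn=topn)
        except KeyError:          # skip tuples with out-of-vocabulary words
            continue
        total += 1
        correct += any(word == expected for word, _ in candidates)
    return correct / max(total, 1)
```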
Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Title | Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks |
Authors | Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda |
Abstract | Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions or by using neural architecture search. We find that some nonlinear decompositions outperform existing ones. |
Tasks | Model Compression, Neural Architecture Search |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04471v2 |
https://arxiv.org/pdf/1908.04471v2.pdf | |
PWC | https://paperswithcode.com/paper/einconv-exploring-unexplored-tensor |
Repo | https://github.com/pfnet-research/einconv |
Framework | pytorch |
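The decomposition class characterized here includes depthwise separable convolution, which factorizes a standard convolution into a per-channel spatial filter and a 1x1 channel mixer. A standard PyTorch instance for reference (the usual construction, not the paper's enumeration code):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Spatial filtering applied per channel (the "depthwise" factor)...
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # ...followed by a 1x1 mixing of channels (the "pointwise" factor).
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```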
Amortized Bethe Free Energy Minimization for Learning MRFs
Title | Amortized Bethe Free Energy Minimization for Learning MRFs |
Authors | Sam Wiseman, Yoon Kim |
Abstract | We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients. In particular, we optimize a saddle-point objective deriving from the Bethe free energy approximation to the partition function. Unlike much recent work in approximate inference, the derived objective requires no sampling, and can be efficiently computed even for very expressive MRFs. We furthermore amortize this optimization with trained inference networks. Experimentally, we find that the proposed approach compares favorably with loopy belief propagation while being faster, and that it attains better held-out log likelihood than other recent approximate inference schemes. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06399v2 |
https://arxiv.org/pdf/1906.06399v2.pdf | |
PWC | https://paperswithcode.com/paper/amortized-bethe-free-energy-minimization-for |
Repo | https://github.com/swiseman/bethe-min |
Framework | pytorch |
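For reference, the Bethe free energy the objective derives from has the standard factor-graph form below, with factors ψ_a, factor beliefs b_a, variable beliefs b_i, and variable degrees d_i; the paper optimizes a saddle point of this approximation with amortized inference networks.

```latex
F_{\text{Bethe}}(b) \;=\; \sum_a \sum_{x_a} b_a(x_a)\,\ln \frac{b_a(x_a)}{\psi_a(x_a)}
\;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i)\,\ln b_i(x_i)
```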
Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality
Title | Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality |
Authors | Sukarna Barua, Xingjun Ma, Sarah Monazam Erfani, Michael E. Houle, James Bailey |
Abstract | Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which the manifolds of two data distributions coincide with each other. In experiments on 4 benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, and is robust to small-scale noise, image transformations, and sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality. |
Tasks | |
Published | 2019-05-02 |
URL | http://arxiv.org/abs/1905.00643v1 |
http://arxiv.org/pdf/1905.00643v1.pdf | |
PWC | https://paperswithcode.com/paper/quality-evaluation-of-gans-using-cross-local |
Repo | https://github.com/sukarnabarua/CrossLID |
Framework | tf |
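The LID of a point can be estimated from its nearest-neighbor distances with the maximum-likelihood (Hill) estimator; CrossLID applies this cross-wise, estimating the LID of real points with respect to neighborhoods drawn from generated samples. A sketch under those assumptions (k and the distance metric here are choices, not the paper's prescribed values):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cross_lid(real, generated, k=20):
    """Mean cross-LID of `real` points w.r.t. neighborhoods in `generated`."""
    nn = NearestNeighbors(n_neighbors=k).fit(generated)
    dists, _ = nn.kneighbors(real)              # [n_real, k], sorted ascending
    r_max = np.maximum(dists[:, -1:], 1e-12)    # distance to the k-th neighbor
    # MLE (Hill) estimator: LID = -( (1/k) * sum_i log(r_i / r_k) )^{-1}
    lid = -1.0 / np.mean(np.log(dists / r_max + 1e-12), axis=1)
    return lid.mean()
```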
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Title | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data |
Authors | Sergei Popov, Stanislav Morozov, Artem Babenko |
Abstract | Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data. |
Tasks | Representation Learning |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06312v2 |
https://arxiv.org/pdf/1909.06312v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-oblivious-decision-ensembles-for-deep |
Repo | https://github.com/Qwicen/node |
Framework | pytorch |
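An oblivious decision tree uses the same feature and threshold at every node of a given depth, so a differentiable version needs only one soft feature choice and one soft threshold per level. A simplified sketch in the spirit of NODE follows; the released package uses entmax transformations and careful initialization, whereas this sketch uses softmax/sigmoid for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftObliviousTree(nn.Module):
    def __init__(self, in_features, depth=4, out_features=1):
        super().__init__()
        self.depth = depth
        self.feature_logits = nn.Parameter(torch.zeros(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.leaf_values = nn.Parameter(torch.randn(2 ** depth, out_features))

    def forward(self, x):                                        # x: [B, in_features]
        # One soft feature selection per level, then a soft threshold decision.
        feats = x @ F.softmax(self.feature_logits, dim=-1).t()   # [B, depth]
        go_right = torch.sigmoid(feats - self.thresholds)        # [B, depth]
        # Probability of each of the 2^depth leaves = product of level decisions.
        probs = torch.ones(x.shape[0], 1, device=x.device)
        for d in range(self.depth):
            left, right = 1 - go_right[:, d:d + 1], go_right[:, d:d + 1]
            probs = torch.cat([probs * left, probs * right], dim=1)
        return probs @ self.leaf_values                          # [B, out_features]
```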
O-MedAL: Online Active Deep Learning for Medical Image Analysis
Title | O-MedAL: Online Active Deep Learning for Medical Image Analysis |
Authors | Asim Smailagic, Pedro Costa, Alex Gaudio, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Devesh Walawalkar, Susu Xu, Adrian Galdran, Pei Zhang, Aurélio Campilho, Hae Young Noh |
Abstract | Active Learning methods create an optimized and labeled training set from unlabeled data. We introduce a novel Online Active Deep Learning method for Medical Image Analysis. We extend our MedAL active learning framework to present new results in this paper. Experiments on three medical image datasets show that our novel online active learning model requires significantly fewer labels, is more accurate, and is more robust to class imbalances than existing methods. Our method is also more accurate and computationally efficient than the baseline model. Compared to random sampling and uncertainty sampling, the method uses 275 and 200 (out of 768) fewer labeled examples, respectively. For Diabetic Retinopathy detection, our method attains a 5.88% accuracy improvement over the baseline model when 80% of the dataset is labeled, and the model reaches baseline accuracy when only 40% is labeled. |
Tasks | Active Learning, Diabetic Retinopathy Detection |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10508v1 |
https://arxiv.org/pdf/1908.10508v1.pdf | |
PWC | https://paperswithcode.com/paper/o-medal-online-active-deep-learning-for |
Repo | https://github.com/adgaudio/O-MedAL |
Framework | pytorch |
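The MedAL acquisition rule that this work builds on selects the unlabeled example whose deep-feature embedding is farthest, on average, from the labeled set; the online variant then avoids retraining from scratch between rounds. A sketch of the acquisition step (the feature extractor and the Euclidean distance are assumptions):

```python
import numpy as np

def select_next(unlabeled_feats, labeled_feats):
    """Both arguments: [n, d] arrays of embeddings from the current model."""
    # Mean Euclidean distance from each unlabeled point to every labeled one.
    diffs = unlabeled_feats[:, None, :] - labeled_feats[None, :, :]
    mean_dist = np.linalg.norm(diffs, axis=-1).mean(axis=1)
    return int(mean_dist.argmax())   # index of the most "novel" unlabeled example
```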
Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting
Title | Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting |
Authors | Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng |
Abstract | Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. A sample re-weighting strategy is commonly used to alleviate this issue: design a weighting function mapping from training loss to sample weight, then iterate between weight recalculation and classifier updating. Current approaches, however, need to manually pre-specify the weighting function as well as its additional hyper-parameters. This makes them hard to apply in practice, since the proper weighting scheme varies significantly with the problem and training data at hand. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, a universal approximator for almost any continuous function, making the method able to fit a wide range of weighting functions, including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and in more complicated scenarios beyond conventional cases. This naturally leads to better accuracy than other state-of-the-art methods. |
Tasks | Image Classification, Meta-Learning |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07379v6 |
https://arxiv.org/pdf/1902.07379v6.pdf | |
PWC | https://paperswithcode.com/paper/push-the-student-to-learn-right-progressive |
Repo | https://github.com/xjtushujun/meta-weight-net |
Framework | pytorch |
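The learned weighting function is, per the abstract, an MLP with one hidden layer mapping a training loss to a sample weight. A minimal sketch follows; the hidden size is an assumption, and the meta-update on held-out meta-data is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaWeightNet(nn.Module):
    """One-hidden-layer MLP: per-example loss -> per-example weight in (0, 1)."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, loss):                                  # loss: [B]
        return self.net(loss.unsqueeze(-1)).squeeze(-1)       # weights: [B]

# Usage: weight the per-example classification loss before reduction.
# losses = F.cross_entropy(logits, targets, reduction="none")
# weighted = (weight_net(losses.detach()) * losses).mean()
```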
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Title | DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog |
Authors | Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou |
Abstract | Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually-grounded information. Previous models typically leverage single-hop or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin. |
Tasks | Visual Dialog |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08360v1 |
https://arxiv.org/pdf/1912.08360v1.pdf | |
PWC | https://paperswithcode.com/paper/dmrm-a-dual-channel-multi-hop-reasoning-model |
Repo | https://github.com/phellonchen/DMRM |
Framework | pytorch |
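A loose sketch of one reasoning channel: the question representation is refined over several hops, attending alternately to dialog-history features and image features. The dimensions and the residual fusion below are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def attend(query, keys):                     # query: [B, D], keys: [B, N, D]
    scores = torch.einsum("bd,bnd->bn", query, keys)
    return torch.einsum("bn,bnd->bd", F.softmax(scores, dim=-1), keys)

def multi_hop(question, history, image, hops=3):
    q = question                             # [B, D]
    for _ in range(hops):
        q = q + attend(q, history)           # history-aware refinement
        q = q + attend(q, image)             # image-aware refinement
    return q
```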
Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset
Title | Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset |
Authors | Misgina Tsighe Hagos, Shri Kant |
Abstract | Annotated training data insufficiency remains one of the challenges of applying deep learning to medical data classification problems. Transfer learning from an already trained deep convolutional network can be used to reduce the cost of training from scratch and to train with small training data for deep learning. This raises the question of whether we can use transfer learning to overcome the training data insufficiency problem in deep learning based medical data classification. Deep convolutional networks have been achieving high-performance results on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) image classification challenge. One example is the Inception-V3 model, the first runner-up in the ILSVRC 2015 challenge. Inception modules, which extract features of different sizes within one level of convolution, are the distinctive feature of Inception-V3. In this work, we have used a pretrained Inception-V3 model to take advantage of its Inception modules for Diabetic Retinopathy detection. In order to tackle the labelled data insufficiency problem, we sub-sampled a smaller version of the Kaggle Diabetic Retinopathy classification challenge dataset for model training, and tested the model’s accuracy on a previously unseen data subset. Our technique could be used in other deep learning based medical image classification problems facing the challenge of labeled training data insufficiency. |
Tasks | Diabetic Retinopathy Detection, Image Classification, Object Recognition, Transfer Learning |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07203v2 |
https://arxiv.org/pdf/1905.07203v2.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-based-detection-of-diabetic |
Repo | https://github.com/ShubhayanS/Multiclass-Diabetic-Retinopathy-Detection |
Framework | tf |
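The recipe is standard transfer learning: load an ImageNet-pretrained Inception-V3, freeze its features, and retrain a fresh classification head on the small DR dataset. A torchvision sketch under those assumptions (the paper lists a TF framework, and the binary output head here is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet weights; Inception-V3 expects 299x299 inputs.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze the pretrained features

# Replace both classification heads with fresh, trainable binary outputs.
model.fc = nn.Linear(model.fc.in_features, 2)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 2)
```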
Derivative Manipulation for General Example Weighting
Title | Derivative Manipulation for General Example Weighting |
Authors | Xinshao Wang, Elyor Kodirov, Yang Hua, Neil M. Robertson |
Abstract | We propose derivative manipulation (DM) for training accurate and robust softmax-based deep neural networks, for two reasons: (1) In gradient-based optimisation, manipulating the derivative directly is more straightforward than designing loss functions, and it has a direct impact on the update of a model. (2) A loss function’s derivative magnitude function can be understood as a weighting scheme; the derivative of an example’s loss defines how much impact it has on the update of a model. Therefore, manipulating the derivative amounts to adjusting the weighting scheme. DM simply modifies the derivative magnitude, including transformation and normalisation, after which the derivative magnitude function is termed the emphasis density function (EDF). An EDF is a formula expressing an example weighting scheme, and we may deduce many options for EDFs from common probability density functions (PDFs). We demonstrate the effectiveness of the DM formulation empirically by extensive experiments on both vision and language tasks, especially under adverse conditions, e.g., noisy data and sample imbalance. |
Tasks | Image Classification, Representation Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11233v6 |
https://arxiv.org/pdf/1905.11233v6.pdf | |
PWC | https://paperswithcode.com/paper/emphasis-regularisation-by-gradient-rescaling |
Repo | https://github.com/XinshaoAmosWang/Emphasis-Regularisation-by-Gradient-Rescaling |
Framework | none |
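Because multiplying a per-example loss by a detached weight rescales that example's gradient, an emphasis density function can be implemented as a loss-weighting scheme. The sketch below uses a Gaussian bump over the true-class probability as the EDF; this particular EDF and the normalisation are assumptions, and the paper's actual formulations may differ.

```python
import torch
import torch.nn.functional as F

def dm_loss(logits, targets, center=0.5, width=0.2):
    ce = F.cross_entropy(logits, targets, reduction="none")            # [B]
    # Probability assigned to the true class of each example.
    p_true = F.softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
    # Emphasis density: a Gaussian bump peaking at p_true == center.
    edf = torch.exp(-((p_true - center) ** 2) / (2 * width ** 2))
    weights = (edf / edf.sum()).detach()         # normalise; no grad through weights
    return (weights * ce).sum()
```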
Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images
Title | Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images |
Authors | Ivan Krešo, Josip Krapac, Siniša Šegvić |
Abstract | Recent progress of deep image classification models has provided great potential to improve state-of-the-art performance in related computer vision tasks. However, the transition to semantic segmentation is hampered by strict memory limitations of contemporary GPUs. The extent of feature map caching required by convolutional backprop poses significant challenges even for moderately sized Pascal images, while requiring careful architectural considerations when the source resolution is in the megapixel range. To address these concerns, we propose a novel DenseNet-based ladder-style architecture which features high modelling power and a very lean upsampling datapath. We also propose to substantially reduce the extent of feature map caching by exploiting the inherent spatial efficiency of the DenseNet feature extractor. The resulting models deliver high performance with fewer parameters than competitive approaches, and allow training at megapixel resolution on commodity hardware. The presented models outperform the state-of-the-art in terms of prediction accuracy and execution speed on the Cityscapes, Pascal VOC 2012, CamVid and ROB 2018 datasets. Source code will be released upon publication. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05661v1 |
https://arxiv.org/pdf/1905.05661v1.pdf | |
PWC | https://paperswithcode.com/paper/190505661 |
Repo | https://github.com/Maligetzus/Semantic-Segmentation-of-Aerial-Photographs |
Framework | none |
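The "lean upsampling datapath" amounts to a decoder step that upsamples deep features and fuses them with a lateral skip connection through cheap convolutions. A minimal sketch of one such ladder step (the channel counts and fusion by addition are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderUp(nn.Module):
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.lateral = nn.Conv2d(skip_ch, out_ch, kernel_size=1)   # cheap skip projection
        self.project = nn.Conv2d(deep_ch, out_ch, kernel_size=1)   # cheap deep projection
        self.blend = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, deep, skip):
        # Upsample the deep path to the skip resolution, then fuse by addition.
        up = F.interpolate(self.project(deep), size=skip.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.blend(up + self.lateral(skip))
```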
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding
Title | An Annotated Corpus of Reference Resolution for Interpreting Common Grounding |
Authors | Takuma Udagawa, Akiko Aizawa |
Abstract | Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation. However, interpreting the process of common grounding is a challenging task, especially under continuous and partially-observable context where complex ambiguity, uncertainty, partial understandings and misunderstandings are introduced. Interpretation becomes even more challenging when we deal with dialogue systems which still have limited capability of natural language understanding and generation. To address this problem, we consider reference resolution as the central subtask of common grounding and propose a new resource to study its intermediate process. Based on a simple and general annotation schema, we collected a total of 40,172 referring expressions in 5,191 dialogues curated from an existing corpus, along with multiple judgements of referent interpretations. We show that our annotation is highly reliable, captures the complexity of common grounding through a natural degree of reasonable disagreements, and allows for more detailed and quantitative analyses of common grounding strategies. Finally, we demonstrate the advantages of our annotation for interpreting, analyzing and improving common grounding in baseline dialogue systems. |
Tasks | Coreference Resolution, Goal-Oriented Dialog, Visual Dialog |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07588v1 |
https://arxiv.org/pdf/1911.07588v1.pdf | |
PWC | https://paperswithcode.com/paper/an-annotated-corpus-of-reference-resolution |
Repo | https://github.com/Alab-NII/onecommon |
Framework | none |
Improving Generative Visual Dialog by Answering Diverse Questions
Title | Improving Generative Visual Dialog by Answering Diverse Questions |
Authors | Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das |
Abstract | Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this ‘self-talk’ approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Qbot and Abot during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Qbot to ask diverse questions, thus reducing repetitions and in turn enabling Abot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, i.e., dialog that is more diverse (less repetitive), consistent (fewer conflicting exchanges), fluent (more human-like), and detailed, while still being comparably image-relevant as prior work and ablations. |
Tasks | Representation Learning, Visual Dialog |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10470v2 |
https://arxiv.org/pdf/1909.10470v2.pdf | |
PWC | https://paperswithcode.com/paper/190910470 |
Repo | https://github.com/vmurahari3/visdial-diversity |
Framework | pytorch |
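One simple way to realize such an auxiliary diversity objective is to penalize pairwise cosine similarity between the question embeddings Qbot produces within a dialog, pushing later questions away from earlier ones. A hedged sketch of that idea; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def diversity_penalty(question_embs):             # [rounds, D] for one dialog
    q = F.normalize(question_embs, dim=-1)
    sim = q @ q.t()                               # pairwise cosine similarities
    n = q.shape[0]
    off_diag = sim.sum() - sim.diagonal().sum()   # drop self-similarity terms
    return off_diag / max(n * (n - 1), 1)         # mean similarity, to be minimized
```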