Paper Group AWR 407
Lane Detection and Classification using Cascaded CNNs. TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks. Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings. Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks. Amortized Bethe Free Energy Minimization for Learning MRFs. Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality. Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data. O-MedAL: Online Active Deep Learning for Medical Image Analysis. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog. Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset. Derivative Manipulation for General Example Weighting. Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images. An Annotated Corpus of Reference Resolution for Interpreting Common Grounding. Improving Generative Visual Dialog by Answering Diverse Questions.
Lane Detection and Classification using Cascaded CNNs
Title | Lane Detection and Classification using Cascaded CNNs |
Authors | Fabio Pizzati, Marco Allodi, Alejandro Barrera, Fernando García |
Abstract | Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As in many other computer vision tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to identify lane boundaries. However, the position of the lane boundaries w.r.t. the vehicle may not suffice for reliable positioning, as, for path planning or localization, information regarding lane types may also be needed. In this work, we present an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real-time. To build the system, 14336 lane boundary instances of the TuSimple dataset for lane detection have been labelled using 8 different classes. Our dataset and the code for inference are available online. |
Tasks | Autonomous Vehicles, Lane Detection |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01294v2 |
https://arxiv.org/pdf/1907.01294v2.pdf | |
PWC | https://paperswithcode.com/paper/lane-detection-and-classification-using |
Repo | https://github.com/fabvio/TuSimple-lane-classes |
Framework | none |
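The pipeline is two cascaded networks: the first segments lane-boundary instances, the second classifies each detected boundary into one of the 8 lane types. A minimal sketch of such a cascade in PyTorch is below; the `detector`/`classifier` modules and the mask-gating scheme are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LaneCascade(nn.Module):
    """Two cascaded CNNs: instance segmentation, then per-boundary classification."""
    def __init__(self, detector: nn.Module, classifier: nn.Module):
        super().__init__()
        self.detector = detector      # image -> [B, K, H, W] boundary masks
        self.classifier = classifier  # masked image -> [B, 8] class logits

    def forward(self, image: torch.Tensor):
        masks = self.detector(image)                 # [B, K, H, W]
        preds = []
        for k in range(masks.shape[1]):
            # Gate the input with each boundary mask and classify that boundary.
            masked = image * masks[:, k:k + 1]       # broadcasts over RGB channels
            preds.append(self.classifier(masked))    # [B, 8]
        return masks, torch.stack(preds, dim=1)      # masks and [B, K, 8] logits
```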
TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks
Title | TomoGAN: Low-Dose Synchrotron X-Ray Tomography with Generative Adversarial Networks |
Authors | Zhengchun Liu, Tekin Bicer, Rajkumar Kettimuthu, Doga Gursoy, Francesco De Carlo, Ian Foster |
Abstract | Synchrotron-based x-ray tomography is a noninvasive imaging technique that allows for reconstructing the internal structure of materials at high spatial resolutions, from tens of micrometers to a few nanometers. In order to resolve sample features at smaller length scales, however, a higher radiation dose is required; the limitation on the achievable resolution is therefore set primarily by noise at these length scales. We present TomoGAN, a denoising technique based on generative adversarial networks, for improving the quality of reconstructed images under low-dose imaging conditions. We evaluate our approach in two photon-budget-limited experimental conditions: (1) a sufficient number of low-dose projections (based on Nyquist sampling), and (2) an insufficient or limited number of high-dose projections. In both cases the angular sampling is assumed to be isotropic, and the photon budget throughout the experiment is fixed based on the maximum allowable radiation dose on the sample. Evaluation with both simulated and experimental datasets shows that our approach can significantly reduce noise in reconstructed images, improving the structural similarity score of simulated and experimental data from 0.18 to 0.9 and from 0.18 to 0.41, respectively. Furthermore, the quality of images reconstructed with filtered back projection followed by our denoising approach exceeds that of reconstructions with the simultaneous iterative reconstruction technique, demonstrating the computational superiority of our approach. |
Tasks | Denoising |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07582v5 |
https://arxiv.org/pdf/1902.07582v5.pdf | |
PWC | https://paperswithcode.com/paper/tomogan-low-dose-x-ray-tomography-with |
Repo | https://github.com/ramsesproject/TomoGAN |
Framework | tf |
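As a rough illustration of the approach, a GAN denoiser pairs a generator that maps a noisy (low-dose) reconstruction to a denoised one with a discriminator that distinguishes denoised outputs from high-dose images. The sketch below is a generic adversarial-denoising training step; the `gen`/`disc` networks and the loss weighting are assumptions, and the paper's exact loss composition may include further terms.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, g_opt, d_opt, noisy, clean, adv_weight=0.01):
    # --- discriminator: real high-dose images vs. denoised outputs ---
    d_opt.zero_grad()
    fake = gen(noisy).detach()
    real_logits, fake_logits = disc(clean), disc(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    d_opt.step()

    # --- generator: pixel fidelity plus a small adversarial term ---
    g_opt.zero_grad()
    fake = gen(noisy)
    fake_logits = disc(fake)
    g_loss = F.mse_loss(fake, clean) + adv_weight * F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```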
Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings
Title | Russian Language Datasets in the Digitial Humanities Domain and Their Evaluation with Word Embeddings |
Authors | Gerhard Wohlgenannt, Artemii Babushkin, Denis Romashov, Igor Ukrainets, Anton Maskaykin, Ilya Shutov |
Abstract | In this paper, we present Russian language datasets in the digital humanities domain for the evaluation of word embedding techniques and similar language modeling and feature learning algorithms. The datasets are split into two task types, word intrusion and word analogy, and contain 31362 task units in total. The tasks and datasets are distinctive in that they build upon small, domain-specific corpora and contain a high number of named entities. The datasets were created manually for two fantasy novel book series (“A Song of Ice and Fire” and “Harry Potter”). We provide baseline evaluations with popular word embedding models trained on the book corpora for the given tasks, both for the Russian and English language versions of the datasets. Finally, we compare and analyze the results and discuss specifics of the Russian language with regard to the problem setting. |
Tasks | Language Modelling, Word Embeddings |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.08739v1 |
http://arxiv.org/pdf/1903.08739v1.pdf | |
PWC | https://paperswithcode.com/paper/russian-language-datasets-in-the-digitial |
Repo | https://github.com/ishutov/nlp2018_hp_asoif_rus |
Framework | none |
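For the word-analogy task type, a baseline evaluation with word embeddings reduces to vector arithmetic over the trained vectors. A hedged sketch using gensim follows; the vector file name and the 4-tuple task format are assumptions based on the abstract.

```python
from gensim.models import KeyedVectors

# Hypothetical path to embeddings trained on the Russian book corpus.
vectors = KeyedVectors.load_word2vec_format("asoif_ru.vec")

def analogy_accuracy(tasks, topn=1):
    """tasks: iterable of (a, b, c, expected), read as a : b :: c : expected."""
    correct = total = 0
    for a, b, c, expected in tasks:
        try:
            candidates = vectors.most_similar(positive=[b, c], negative=[a], topn=topn)
        except KeyError:          # skip tuples with out-of-vocabulary words
            continue
        total += 1
        correct += any(word == expected for word, _ in candidates)
    return correct / max(total, 1)
```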
Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Title | Einconv: Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks |
Authors | Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-ichi Maeda |
Abstract | Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions or by using neural architecture search. We find that some nonlinear decompositions outperform existing ones. |
Tasks | Model Compression, Neural Architecture Search |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04471v2 |
https://arxiv.org/pdf/1908.04471v2.pdf | |
PWC | https://paperswithcode.com/paper/einconv-exploring-unexplored-tensor |
Repo | https://github.com/pfnet-research/einconv |
Framework | pytorch |
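The decomposition class characterized here includes depthwise separable convolution, which factorizes a standard convolution into a per-channel spatial filter and a 1x1 channel mixer. A standard PyTorch instance for reference (the usual construction, not the paper's enumeration code):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Spatial filtering applied per channel (the "depthwise" factor)...
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # ...followed by a 1x1 mixing of channels (the "pointwise" factor).
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```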
Amortized Bethe Free Energy Minimization for Learning MRFs
Title | Amortized Bethe Free Energy Minimization for Learning MRFs |
Authors | Sam Wiseman, Yoon Kim |
Abstract | We propose to learn deep undirected graphical models (i.e., MRFs) with a non-ELBO objective for which we can calculate exact gradients. In particular, we optimize a saddle-point objective deriving from the Bethe free energy approximation to the partition function. Unlike much recent work in approximate inference, the derived objective requires no sampling, and can be efficiently computed even for very expressive MRFs. We furthermore amortize this optimization with trained inference networks. Experimentally, we find that the proposed approach compares favorably with loopy belief propagation while being faster, and that it attains better held-out log likelihood than other recent approximate inference schemes. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06399v2 |
https://arxiv.org/pdf/1906.06399v2.pdf | |
PWC | https://paperswithcode.com/paper/amortized-bethe-free-energy-minimization-for |
Repo | https://github.com/swiseman/bethe-min |
Framework | pytorch |
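For reference, the Bethe free energy the objective derives from has the standard factor-graph form below, with factors ψ_a, factor beliefs b_a, variable beliefs b_i, and variable degrees d_i; the paper optimizes a saddle point of this approximation with amortized inference networks.

```latex
F_{\text{Bethe}}(b) \;=\; \sum_a \sum_{x_a} b_a(x_a)\,\ln \frac{b_a(x_a)}{\psi_a(x_a)}
\;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i)\,\ln b_i(x_i)
```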
Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality
Title | Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality |
Authors | Sukarna Barua, Xingjun Ma, Sarah Monazam Erfani, Michael E. Houle, James Bailey |
Abstract | Generative Adversarial Networks (GANs) are an elegant mechanism for data generation. However, a key challenge when using GANs is how to best measure their ability to generate realistic data. In this paper, we demonstrate that an intrinsic dimensional characterization of the data space learned by a GAN model leads to an effective evaluation metric for GAN quality. In particular, we propose a new evaluation measure, CrossLID, that assesses the local intrinsic dimensionality (LID) of real-world data with respect to neighborhoods found in GAN-generated samples. Intuitively, CrossLID measures the degree to which the manifolds of two data distributions coincide with each other. In experiments on 4 benchmark image datasets, we compare our proposed measure to several state-of-the-art evaluation metrics. Our experiments show that CrossLID is strongly correlated with the progress of GAN training, is sensitive to mode collapse, and is robust to small-scale noise, image transformations, and sample size. Furthermore, we show how CrossLID can be used within the GAN training process to improve generation quality. |
Tasks | |
Published | 2019-05-02 |
URL | http://arxiv.org/abs/1905.00643v1 |
http://arxiv.org/pdf/1905.00643v1.pdf | |
PWC | https://paperswithcode.com/paper/quality-evaluation-of-gans-using-cross-local |
Repo | https://github.com/sukarnabarua/CrossLID |
Framework | tf |
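The LID of a point can be estimated from its nearest-neighbor distances with the maximum-likelihood (Hill) estimator; CrossLID applies this cross-wise, estimating the LID of real points with respect to neighborhoods drawn from generated samples. A sketch under those assumptions (k and the distance metric here are choices, not the paper's prescribed values):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cross_lid(real, generated, k=20):
    """Mean cross-LID of `real` points w.r.t. neighborhoods in `generated`."""
    nn = NearestNeighbors(n_neighbors=k).fit(generated)
    dists, _ = nn.kneighbors(real)              # [n_real, k], sorted ascending
    r_max = np.maximum(dists[:, -1:], 1e-12)    # distance to the k-th neighbor
    # MLE (Hill) estimator: LID = -( (1/k) * sum_i log(r_i / r_k) )^{-1}
    lid = -1.0 / np.mean(np.log(dists / r_max + 1e-12), axis=1)
    return lid.mean()
```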
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
Title | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data |
Authors | Sergei Popov, Stanislav Morozov, Artem Babenko |
Abstract | Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data. |
Tasks | Representation Learning |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06312v2 |
https://arxiv.org/pdf/1909.06312v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-oblivious-decision-ensembles-for-deep |
Repo | https://github.com/Qwicen/node |
Framework | pytorch |
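An oblivious decision tree uses the same feature and threshold at every node of a given depth, so a differentiable version needs only one soft feature choice and one soft threshold per level. A simplified sketch in the spirit of NODE follows; the released package uses entmax transformations and careful initialization, whereas this sketch uses softmax/sigmoid for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftObliviousTree(nn.Module):
    def __init__(self, in_features, depth=4, out_features=1):
        super().__init__()
        self.depth = depth
        self.feature_logits = nn.Parameter(torch.zeros(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.leaf_values = nn.Parameter(torch.randn(2 ** depth, out_features))

    def forward(self, x):                                        # x: [B, in_features]
        # One soft feature selection per level, then a soft threshold decision.
        feats = x @ F.softmax(self.feature_logits, dim=-1).t()   # [B, depth]
        go_right = torch.sigmoid(feats - self.thresholds)        # [B, depth]
        # Probability of each of the 2^depth leaves = product of level decisions.
        probs = torch.ones(x.shape[0], 1, device=x.device)
        for d in range(self.depth):
            left, right = 1 - go_right[:, d:d + 1], go_right[:, d:d + 1]
            probs = torch.cat([probs * left, probs * right], dim=1)
        return probs @ self.leaf_values                          # [B, out_features]
```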
O-MedAL: Online Active Deep Learning for Medical Image Analysis
Title | O-MedAL: Online Active Deep Learning for Medical Image Analysis |
Authors | Asim Smailagic, Pedro Costa, Alex Gaudio, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Devesh Walawalkar, Susu Xu, Adrian Galdran, Pei Zhang, Aurélio Campilho, Hae Young Noh |
Abstract | Active Learning methods create an optimized and labeled training set from unlabeled data. We introduce a novel Online Active Deep Learning method for Medical Image Analysis. We extend our MedAL active learning framework to present new results in this paper. Experiments on three medical image datasets show that our novel online active learning model requires significantly fewer labels, is more accurate, and is more robust to class imbalances than existing methods. Our method is also more accurate and computationally efficient than the baseline model. Compared to random sampling and uncertainty sampling, the method uses 275 and 200 (out of 768) fewer labeled examples, respectively. For Diabetic Retinopathy detection, our method attains a 5.88% accuracy improvement over the baseline model when 80% of the dataset is labeled, and the model reaches baseline accuracy when only 40% is labeled. |
Tasks | Active Learning, Diabetic Retinopathy Detection |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10508v1 |
https://arxiv.org/pdf/1908.10508v1.pdf | |
PWC | https://paperswithcode.com/paper/o-medal-online-active-deep-learning-for |
Repo | https://github.com/adgaudio/O-MedAL |
Framework | pytorch |
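The MedAL acquisition rule that this work builds on selects the unlabeled example whose deep-feature embedding is farthest, on average, from the labeled set; the online variant then avoids retraining from scratch between rounds. A sketch of the acquisition step (the feature extractor and the Euclidean distance are assumptions):

```python
import numpy as np

def select_next(unlabeled_feats, labeled_feats):
    """Both arguments: [n, d] arrays of embeddings from the current model."""
    # Mean Euclidean distance from each unlabeled point to every labeled one.
    diffs = unlabeled_feats[:, None, :] - labeled_feats[None, :, :]
    mean_dist = np.linalg.norm(diffs, axis=-1).mean(axis=1)
    return int(mean_dist.argmax())   # index of the most "novel" unlabeled example
```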
Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting
Title | Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting |
Authors | Jun Shu, Qi Xie, Lixuan Yi, Qian Zhao, Sanping Zhou, Zongben Xu, Deyu Meng |
Abstract | Current deep neural networks (DNNs) can easily overfit to biased training data with corrupted labels or class imbalance. A sample re-weighting strategy is commonly used to alleviate this issue: design a weighting function mapping from training loss to sample weight, then iterate between weight recalculation and classifier updating. Current approaches, however, need to manually pre-specify the weighting function as well as its additional hyper-parameters. This makes them hard to apply in practice, since the proper weighting scheme varies significantly with the problem and training data at hand. To address this issue, we propose a method capable of adaptively learning an explicit weighting function directly from data. The weighting function is an MLP with one hidden layer, a universal approximator for almost any continuous function, making the method able to fit a wide range of weighting functions, including those assumed in conventional research. Guided by a small amount of unbiased meta-data, the parameters of the weighting function can be finely updated simultaneously with the learning process of the classifiers. Synthetic and real experiments substantiate the capability of our method for achieving proper weighting functions in class imbalance and noisy label cases, fully complying with the common settings in traditional methods, and in more complicated scenarios beyond conventional cases. This naturally leads to better accuracy than other state-of-the-art methods. |
Tasks | Image Classification, Meta-Learning |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07379v6 |
https://arxiv.org/pdf/1902.07379v6.pdf | |
PWC | https://paperswithcode.com/paper/push-the-student-to-learn-right-progressive |
Repo | https://github.com/xjtushujun/meta-weight-net |
Framework | pytorch |
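The learned weighting function is, per the abstract, an MLP with one hidden layer mapping a training loss to a sample weight. A minimal sketch follows; the hidden size is an assumption, and the meta-update on held-out meta-data is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaWeightNet(nn.Module):
    """One-hidden-layer MLP: per-example loss -> per-example weight in (0, 1)."""
    def __init__(self, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, loss):                                  # loss: [B]
        return self.net(loss.unsqueeze(-1)).squeeze(-1)       # weights: [B]

# Usage: weight the per-example classification loss before reduction.
# losses = F.cross_entropy(logits, targets, reduction="none")
# weighted = (weight_net(losses.detach()) * losses).mean()
```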
DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog
Title | DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog |
Authors | Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou |
Abstract | Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since it requires the agent to fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually-grounded information. Previous models typically leverage single-hop or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin. |
Tasks | Visual Dialog |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08360v1 |
https://arxiv.org/pdf/1912.08360v1.pdf | |
PWC | https://paperswithcode.com/paper/dmrm-a-dual-channel-multi-hop-reasoning-model |
Repo | https://github.com/phellonchen/DMRM |
Framework | pytorch |
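A loose sketch of one reasoning channel: the question representation is refined over several hops, attending alternately to dialog-history features and image features. The dimensions and the residual fusion below are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def attend(query, keys):                     # query: [B, D], keys: [B, N, D]
    scores = torch.einsum("bd,bnd->bn", query, keys)
    return torch.einsum("bn,bnd->bd", F.softmax(scores, dim=-1), keys)

def multi_hop(question, history, image, hops=3):
    q = question                             # [B, D]
    for _ in range(hops):
        q = q + attend(q, history)           # history-aware refinement
        q = q + attend(q, image)             # image-aware refinement
    return q
```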
Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset
Title | Transfer Learning based Detection of Diabetic Retinopathy from Small Dataset |
Authors | Misgina Tsighe Hagos, Shri Kant |
Abstract | Annotated training data insufficiency remains one of the challenges of applying deep learning to medical data classification problems. Transfer learning from an already trained deep convolutional network can be used to reduce the cost of training from scratch and to train with small training data for deep learning. This raises the question of whether we can use transfer learning to overcome the training data insufficiency problem in deep learning based medical data classification. Deep convolutional networks have been achieving high-performance results on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) image classification challenge. One example is the Inception-V3 model, the first runner-up in the ILSVRC 2015 challenge. Inception modules, which extract features of different sizes within one level of convolution, are the distinctive feature of Inception-V3. In this work, we have used a pretrained Inception-V3 model to take advantage of its Inception modules for Diabetic Retinopathy detection. In order to tackle the labelled data insufficiency problem, we sub-sampled a smaller version of the Kaggle Diabetic Retinopathy classification challenge dataset for model training, and tested the model’s accuracy on a previously unseen data subset. Our technique could be used in other deep learning based medical image classification problems facing the challenge of labeled training data insufficiency. |
Tasks | Diabetic Retinopathy Detection, Image Classification, Object Recognition, Transfer Learning |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07203v2 |
https://arxiv.org/pdf/1905.07203v2.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-based-detection-of-diabetic |
Repo | https://github.com/ShubhayanS/Multiclass-Diabetic-Retinopathy-Detection |
Framework | tf |
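The recipe is standard transfer learning: load an ImageNet-pretrained Inception-V3, freeze its features, and retrain a fresh classification head on the small DR dataset. A torchvision sketch under those assumptions (the paper lists a TF framework, and the binary output head here is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet weights; Inception-V3 expects 299x299 inputs.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                      # freeze the pretrained features

# Replace both classification heads with fresh, trainable binary outputs.
model.fc = nn.Linear(model.fc.in_features, 2)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 2)
```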
Derivative Manipulation for General Example Weighting
Title | Derivative Manipulation for General Example Weighting |
Authors | Xinshao Wang, Elyor Kodirov, Yang Hua, Neil M. Robertson |
Abstract | We propose derivative manipulation (DM) for training accurate and robust softmax-based deep neural networks, for two reasons: (1) In gradient-based optimisation, manipulating the derivative directly is more straightforward than designing loss functions, and it has a direct impact on the update of a model. (2) A loss function’s derivative magnitude function can be understood as a weighting scheme; the derivative of an example’s loss defines how much impact it has on the update of a model. Therefore, manipulating the derivative amounts to adjusting the weighting scheme. DM simply modifies the derivative magnitude, including transformation and normalisation, after which the derivative magnitude function is termed the emphasis density function (EDF). An EDF is a formula expressing an example weighting scheme, and we may deduce many options for EDFs from common probability density functions (PDFs). We demonstrate the effectiveness of the DM formulation empirically by extensive experiments on both vision and language tasks, especially under adverse conditions, e.g., noisy data and sample imbalance. |
Tasks | Image Classification, Representation Learning |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11233v6 |
https://arxiv.org/pdf/1905.11233v6.pdf | |
PWC | https://paperswithcode.com/paper/emphasis-regularisation-by-gradient-rescaling |
Repo | https://github.com/XinshaoAmosWang/Emphasis-Regularisation-by-Gradient-Rescaling |
Framework | none |
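Because multiplying a per-example loss by a detached weight rescales that example's gradient, an emphasis density function can be implemented as a loss-weighting scheme. The sketch below uses a Gaussian bump over the true-class probability as the EDF; this particular EDF and the normalisation are assumptions, and the paper's actual formulations may differ.

```python
import torch
import torch.nn.functional as F

def dm_loss(logits, targets, center=0.5, width=0.2):
    ce = F.cross_entropy(logits, targets, reduction="none")            # [B]
    # Probability assigned to the true class of each example.
    p_true = F.softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
    # Emphasis density: a Gaussian bump peaking at p_true == center.
    edf = torch.exp(-((p_true - center) ** 2) / (2 * width ** 2))
    weights = (edf / edf.sum()).detach()         # normalise; no grad through weights
    return (weights * ce).sum()
```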
Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images
Title | Efficient Ladder-style DenseNets for Semantic Segmentation of Large Images |
Authors | Ivan Krešo, Josip Krapac, Siniša Šegvić |
Abstract | Recent progress of deep image classification models has provided great potential to improve state-of-the-art performance in related computer vision tasks. However, the transition to semantic segmentation is hampered by strict memory limitations of contemporary GPUs. The extent of feature map caching required by convolutional backprop poses significant challenges even for moderately sized Pascal images, while requiring careful architectural considerations when the source resolution is in the megapixel range. To address these concerns, we propose a novel DenseNet-based ladder-style architecture which features high modelling power and a very lean upsampling datapath. We also propose to substantially reduce the extent of feature map caching by exploiting the inherent spatial efficiency of the DenseNet feature extractor. The resulting models deliver high performance with fewer parameters than competitive approaches, and allow training at megapixel resolution on commodity hardware. The presented models outperform the state-of-the-art in terms of prediction accuracy and execution speed on the Cityscapes, Pascal VOC 2012, CamVid and ROB 2018 datasets. Source code will be released upon publication. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05661v1 |
https://arxiv.org/pdf/1905.05661v1.pdf | |
PWC | https://paperswithcode.com/paper/190505661 |
Repo | https://github.com/Maligetzus/Semantic-Segmentation-of-Aerial-Photographs |
Framework | none |
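The "lean upsampling datapath" amounts to a decoder step that upsamples deep features and fuses them with a lateral skip connection through cheap convolutions. A minimal sketch of one such ladder step (the channel counts and fusion by addition are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderUp(nn.Module):
    def __init__(self, deep_ch, skip_ch, out_ch):
        super().__init__()
        self.lateral = nn.Conv2d(skip_ch, out_ch, kernel_size=1)   # cheap skip projection
        self.project = nn.Conv2d(deep_ch, out_ch, kernel_size=1)   # cheap deep projection
        self.blend = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, deep, skip):
        # Upsample the deep path to the skip resolution, then fuse by addition.
        up = F.interpolate(self.project(deep), size=skip.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.blend(up + self.lateral(skip))
```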
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding
Title | An Annotated Corpus of Reference Resolution for Interpreting Common Grounding |
Authors | Takuma Udagawa, Akiko Aizawa |
Abstract | Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation. However, interpreting the process of common grounding is a challenging task, especially under continuous and partially-observable context where complex ambiguity, uncertainty, partial understandings and misunderstandings are introduced. Interpretation becomes even more challenging when we deal with dialogue systems which still have limited capability of natural language understanding and generation. To address this problem, we consider reference resolution as the central subtask of common grounding and propose a new resource to study its intermediate process. Based on a simple and general annotation schema, we collected a total of 40,172 referring expressions in 5,191 dialogues curated from an existing corpus, along with multiple judgements of referent interpretations. We show that our annotation is highly reliable, captures the complexity of common grounding through a natural degree of reasonable disagreements, and allows for more detailed and quantitative analyses of common grounding strategies. Finally, we demonstrate the advantages of our annotation for interpreting, analyzing and improving common grounding in baseline dialogue systems. |
Tasks | Coreference Resolution, Goal-Oriented Dialog, Visual Dialog |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07588v1 |
https://arxiv.org/pdf/1911.07588v1.pdf | |
PWC | https://paperswithcode.com/paper/an-annotated-corpus-of-reference-resolution |
Repo | https://github.com/Alab-NII/onecommon |
Framework | none |
Improving Generative Visual Dialog by Answering Diverse Questions
Title | Improving Generative Visual Dialog by Answering Diverse Questions |
Authors | Vishvak Murahari, Prithvijit Chattopadhyay, Dhruv Batra, Devi Parikh, Abhishek Das |
Abstract | Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this ‘self-talk’ approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Qbot and Abot during self-talk, which are not informative with respect to the image. To improve this, we devise a simple auxiliary objective that incentivizes Qbot to ask diverse questions, thus reducing repetitions and in turn enabling Abot to explore a larger state space during RL, i.e., be exposed to more visual concepts to talk about and varied questions to answer. We evaluate our approach via a host of automatic metrics and human studies, and demonstrate that it leads to better dialog, i.e., dialog that is more diverse (less repetitive), consistent (fewer conflicting exchanges), fluent (more human-like), and detailed, while still being comparably image-relevant as prior work and ablations. |
Tasks | Representation Learning, Visual Dialog |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10470v2 |
https://arxiv.org/pdf/1909.10470v2.pdf | |
PWC | https://paperswithcode.com/paper/190910470 |
Repo | https://github.com/vmurahari3/visdial-diversity |
Framework | pytorch |
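One simple way to realize such an auxiliary diversity objective is to penalize pairwise cosine similarity between the question embeddings Qbot produces within a dialog, pushing later questions away from earlier ones. A hedged sketch of that idea; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def diversity_penalty(question_embs):             # [rounds, D] for one dialog
    q = F.normalize(question_embs, dim=-1)
    sim = q @ q.t()                               # pairwise cosine similarities
    n = q.shape[0]
    off_diag = sim.sum() - sim.diagonal().sum()   # drop self-similarity terms
    return off_diag / max(n * (n - 1), 1)         # mean similarity, to be minimized
```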