July 30, 2019


Paper Group AWR 68



Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

Title Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations
Authors Carole H Sudre, Wenqi Li, Tom Vercauteren, Sébastien Ourselin, M. Jorge Cardoso
Abstract Deep learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, thus resulting in sub-optimal performance. In order to mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function, or the Dice loss function have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.
Tasks
Published 2017-07-11
URL http://arxiv.org/abs/1707.03237v3
PDF http://arxiv.org/pdf/1707.03237v3.pdf
PWC https://paperswithcode.com/paper/generalised-dice-overlap-as-a-deep-learning
Repo https://github.com/neshitov/UNet
Framework pytorch
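
The class re-balancing idea translates directly into code. Below is a minimal PyTorch sketch of the generalized Dice loss, with per-class weights set to the inverse squared reference volume as the paper describes; the tensor layout and the epsilon smoothing are assumptions, not taken from the linked repo.

```python
import torch

def generalized_dice_loss(probs, onehot, eps=1e-6):
    # probs: softmax outputs, onehot: one-hot labels; both (B, C, *spatial).
    dims = (0,) + tuple(range(2, probs.ndim))       # sum over batch and space
    ref_vol = onehot.sum(dims)                      # per-class reference volume
    w = 1.0 / (ref_vol * ref_vol + eps)             # w_l = 1 / (sum_n r_ln)^2
    intersect = (w * (probs * onehot).sum(dims)).sum()
    denom = (w * (probs + onehot).sum(dims)).sum()
    return 1.0 - 2.0 * intersect / (denom + eps)

# Usage: loss = generalized_dice_loss(logits.softmax(dim=1), onehot_labels)
```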

Flow: A Modular Learning Framework for Autonomy in Traffic

Title Flow: A Modular Learning Framework for Autonomy in Traffic
Authors Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen
Abstract The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, due to numerous technical, political, and human-factors challenges, new methodologies are needed to design vehicles and transportation systems for these positive outcomes. This article tackles important technical challenges arising from the partial adoption of autonomy (hence termed mixed autonomy, involving both AVs and human-driven vehicles): partial control, partial observation, complex multi-vehicle interactions, and the sheer variety of traffic settings represented by real-world networks. To enable the study of the full diversity of traffic settings, we first propose to decompose traffic control tasks into modules, which may be configured and composed to create new control tasks of interest. These modules include salient aspects of traffic control tasks: networks, actors, control laws, metrics, initialization, and additional dynamics. Second, we study the potential of model-free deep Reinforcement Learning (RL) methods to address the complexity of traffic dynamics. The resulting modular learning framework is called Flow. Using Flow, we create and study a variety of mixed-autonomy settings, including single-lane, multi-lane, and intersection traffic. In all cases, the learned control law exceeds human driving performance (measured by system-level velocity) by at least 40% with only 5-10% adoption of AVs. In the case of partially-observed single-lane traffic, we show that a low-parameter neural network control law can eliminate commonly observed stop-and-go traffic. In particular, the control laws surpass all known model-based controllers, achieving near-optimal performance across a wide spectrum of vehicle densities (even with a memoryless control law) and generalizing to out-of-distribution vehicle densities.
Tasks Autonomous Vehicles
Published 2017-10-16
URL https://arxiv.org/abs/1710.05465v2
PDF https://arxiv.org/pdf/1710.05465v2.pdf
PWC https://paperswithcode.com/paper/flow-architecture-and-benchmarking-for
Repo https://github.com/cathywu/flow
Framework none
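
The modular decomposition is the heart of the framework, so a configuration sketch may help. The container below mirrors the modules named in the abstract; the field names and the toy car-following law are illustrative assumptions, not taken from the Flow codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class TrafficTask:
    network: str                                 # e.g. a single-lane ring road
    actors: Dict[str, int]                       # vehicle counts per actor type
    control_laws: Dict[str, Optional[Callable]]  # None = learned by RL
    metrics: List[str]                           # e.g. system-level velocity
    initialization: str = "uniform_spacing"
    extra_dynamics: List[str] = field(default_factory=list)

def toy_car_following(headway: float) -> float:
    return min(30.0, 2.0 * headway)              # toy human control law

ring = TrafficTask(
    network="single_lane_ring",
    actors={"human": 21, "av": 1},               # roughly 5% AV adoption
    control_laws={"human": toy_car_following, "av": None},
    metrics=["system_level_velocity"],
)
```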

Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs

Title Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs
Authors Adrian Bulat, Georgios Tzimiropoulos
Abstract This paper addresses 2 challenging tasks: improving the quality of low resolution facial images and accurately locating the facial landmarks on such poor resolution images. To this end, we make the following 5 contributions: (a) we propose Super-FAN: the very first end-to-end system that addresses both tasks simultaneously, i.e. both improves face resolution and detects the facial landmarks. The novelty of Super-FAN lies in incorporating structural information in a GAN-based super-resolution algorithm via integrating a sub-network for face alignment through heatmap regression and optimizing a novel heatmap loss. (b) We illustrate the benefit of training the two networks jointly by reporting good results not only on frontal images (as in prior work) but on the whole spectrum of facial poses, and not only on synthetic low resolution images (as in prior work) but also on real-world images. (c) We improve upon the state-of-the-art in face super-resolution by proposing a new residual-based architecture. (d) Quantitatively, we show large improvement over the state-of-the-art for both face super-resolution and alignment. (e) Qualitatively, we show for the first time good results on real-world low resolution images.
Tasks Face Alignment, Super-Resolution
Published 2017-12-07
URL http://arxiv.org/abs/1712.02765v2
PDF http://arxiv.org/pdf/1712.02765v2.pdf
PWC https://paperswithcode.com/paper/super-fan-integrated-facial-landmark
Repo https://github.com/ZoieMo/Multi-task
Framework none
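
The heatmap loss that couples the two sub-networks is straightforward to sketch: run the face-alignment network on the super-resolved and high-resolution images and compare the resulting heatmaps. The combination weights and the use of HR-image heatmaps as targets are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def superfan_generator_loss(sr, hr, fan, disc, w_pix=1.0, w_adv=1e-3, w_hm=1.0):
    loss_pix = F.mse_loss(sr, hr)                   # pixel reconstruction
    logits = disc(sr)
    loss_adv = F.binary_cross_entropy_with_logits(  # fool the discriminator
        logits, torch.ones_like(logits))
    with torch.no_grad():
        target_hm = fan(hr)                         # landmark heatmaps on HR
    loss_hm = F.mse_loss(fan(sr), target_hm)        # structural consistency
    return w_pix * loss_pix + w_adv * loss_adv + w_hm * loss_hm
```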

On the Expressive Power of Overlapping Architectures of Deep Learning

Title On the Expressive Power of Overlapping Architectures of Deep Learning
Authors Or Sharir, Amnon Shashua
Abstract Expressive efficiency refers to the relation between two architectures A and B, whereby any function realized by B could be replicated by A, but there exist functions realized by A that cannot be replicated by B unless its size grows significantly larger. For example, it is known that deep networks are exponentially efficient with respect to shallow networks, in the sense that a shallow network must grow exponentially large in order to approximate the functions represented by a deep network of polynomial size. In this work, we extend the study of expressive efficiency to the attribute of network connectivity and in particular to the effect of “overlaps” in the convolutional process, i.e., when the stride of the convolution is smaller than its filter size (receptive field). To theoretically analyze this aspect of network design, we focus on a well-established surrogate for ConvNets called Convolutional Arithmetic Circuits (ConvACs), and then demonstrate empirically that our results hold for standard ConvNets as well. Specifically, our analysis shows that having overlapping local receptive fields, and more broadly denser connectivity, results in an exponential increase in the expressive capacity of neural networks. Moreover, while denser connectivity can increase the expressive capacity, we show that the most common types of modern architectures already exhibit an exponential increase in expressivity, without relying on fully-connected layers.
Tasks
Published 2017-03-06
URL http://arxiv.org/abs/1703.02065v4
PDF http://arxiv.org/pdf/1703.02065v4.pdf
PWC https://paperswithcode.com/paper/on-the-expressive-power-of-overlapping
Repo https://github.com/HUJI-Deep/OverlapsAndExpressiveness
Framework none
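
“Overlap” here has a precise meaning: the convolution stride is smaller than the filter size, so neighbouring output units share input pixels. A two-layer PyTorch contrast makes the distinction concrete.

```python
import torch
import torch.nn as nn

overlapping = nn.Conv2d(3, 16, kernel_size=3, stride=1)      # fields overlap
non_overlapping = nn.Conv2d(3, 16, kernel_size=3, stride=3)  # disjoint patches

x = torch.randn(1, 3, 9, 9)
print(overlapping(x).shape)      # torch.Size([1, 16, 7, 7])
print(non_overlapping(x).shape)  # torch.Size([1, 16, 3, 3])
```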

Zonotope hit-and-run for efficient sampling from projection DPPs

Title Zonotope hit-and-run for efficient sampling from projection DPPs
Authors Guillaume Gautier, Rémi Bardenet, Michal Valko
Abstract Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines ideas from combinatorial geometry, linear programming, and Monte Carlo methods to sample from DPPs with a fixed sample cardinality, also called projection DPPs. Our sampler leverages the ability of the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general. Our empirical results demonstrate that this extends to sampling projection DPPs, i.e., our sampler is more sample-efficient than previous approaches which in turn translates to faster convergence when dealing with costly-to-evaluate functions, such as summary extraction in our experiments.
Tasks Point Processes, Recommendation Systems
Published 2017-05-30
URL http://arxiv.org/abs/1705.10498v2
PDF http://arxiv.org/pdf/1705.10498v2.pdf
PWC https://paperswithcode.com/paper/zonotope-hit-and-run-for-efficient-sampling
Repo https://github.com/guilgautier/DPPy
Framework none
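
The linked DPPy library (maintained by the first author) exposes the zonotope hit-and-run sampler directly. The snippet below follows the interface as documented at the time of writing, defining a projection DPP through a feature matrix; constructor arguments may differ across versions, so treat it as a sketch.

```python
import numpy as np
from dppy.finite_dpps import FiniteDPP

rng = np.random.RandomState(0)
r, N = 4, 20
A = rng.randn(r, N)                 # features; the projection DPP has rank r

DPP = FiniteDPP('correlation', projection=True, **{'A_zono': A})
DPP.sample_mcmc('zonotope')         # the paper's hit-and-run chain
print(DPP.list_of_samples)          # fixed-cardinality samples of size r
```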

On Calibration of Modern Neural Networks

Title On Calibration of Modern Neural Networks
Authors Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
Abstract Confidence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling – a single-parameter variant of Platt Scaling – is surprisingly effective at calibrating predictions.
Tasks Calibration, Document Classification
Published 2017-06-14
URL http://arxiv.org/abs/1706.04599v2
PDF http://arxiv.org/pdf/1706.04599v2.pdf
PWC https://paperswithcode.com/paper/on-calibration-of-modern-neural-networks
Repo https://github.com/gpleiss/temperature_scaling
Framework pytorch
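
Temperature scaling fits in a few lines: learn one scalar T on held-out validation logits by minimizing NLL, then divide test logits by T. The sketch below optimizes log T for positivity, a minor deviation from the linked repo (which optimizes T directly); the LBFGS settings are assumptions.

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels):
    log_t = torch.zeros(1, requires_grad=True)     # T = exp(log_t) > 0
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# At test time: probs = F.softmax(test_logits / T, dim=1)
```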

Sharing Residual Units Through Collective Tensor Factorization in Deep Neural Networks

Title Sharing Residual Units Through Collective Tensor Factorization in Deep Neural Networks
Authors Chen Yunpeng, Jin Xiaojie, Kang Bingyi, Feng Jiashi, Yan Shuicheng
Abstract Residual units are widely used for alleviating optimization difficulties when building deep neural networks. However, the performance gain does not adequately compensate for the increase in model size, indicating low parameter efficiency in these residual units. In this work, we first revisit the residual function in several variations of residual units and demonstrate that these residual functions can actually be explained with a unified framework based on generalized block term decomposition. Then, based on the new explanation, we propose a new architecture, Collective Residual Unit (CRU), which enhances the parameter efficiency of deep neural networks through collective tensor factorization. CRU enables knowledge sharing across different residual units using shared factors. Experimental results show that our proposed CRU Network demonstrates outstanding parameter efficiency, achieving comparable classification performance to ResNet-200 with the model size of ResNet-50. By building a deeper network using CRU, we can achieve state-of-the-art single model classification accuracy on ImageNet-1k and Places365-Standard benchmark datasets. (Code and trained models are available on GitHub)
Tasks
Published 2017-03-07
URL http://arxiv.org/abs/1703.02180v2
PDF http://arxiv.org/pdf/1703.02180v2.pdf
PWC https://paperswithcode.com/paper/sharing-residual-units-through-collective
Repo https://github.com/osmr/imgclsmob
Framework mxnet
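
The core mechanism, sharing factors of a factorized convolution across residual units, can be caricatured in a few lines. This toy shares only the input projection and skips the full block term decomposition, so it illustrates the sharing idea rather than the authors' architecture.

```python
import torch.nn as nn

class ToyCollectiveUnit(nn.Module):
    def __init__(self, shared_proj: nn.Conv2d, channels: int, groups: int = 32):
        super().__init__()
        self.proj = shared_proj                          # shared across units
        mid = shared_proj.out_channels
        self.conv = nn.Conv2d(mid, mid, 3, padding=1, groups=groups)
        self.expand = nn.Conv2d(mid, channels, 1)        # unit-specific
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return x + self.bn(self.expand(self.conv(self.proj(x))))

shared = nn.Conv2d(256, 128, 1)                          # one set of parameters
units = nn.Sequential(*[ToyCollectiveUnit(shared, 256) for _ in range(6)])
```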

Latent Variable Dialogue Models and their Diversity

Title Latent Variable Dialogue Models and their Diversity
Authors Kris Cao, Stephen Clark
Abstract We present a dialogue generation model that directly captures the variability in possible responses to a given input, which reduces the ‘boring output’ issue of deterministic dialogue models. Experiments show that our model generates more diverse outputs than baseline models, and also generates more consistently acceptable output than sampling from a deterministic encoder-decoder model.
Tasks Dialogue Generation
Published 2017-02-20
URL http://arxiv.org/abs/1702.05962v1
PDF http://arxiv.org/pdf/1702.05962v1.pdf
PWC https://paperswithcode.com/paper/latent-variable-dialogue-models-and-their
Repo https://github.com/timbmg/DIAL-LV
Framework pytorch
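
A latent-variable encoder-decoder of this kind is typically trained with a reconstruction term plus a KL penalty on the latent code, sampling via the reparameterization trick. The sketch below is the generic CVAE-style objective, not the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def dialogue_vae_loss(logits, target, mu, logvar, kl_weight=1.0):
    # logits: (B, T, V) decoder outputs; target: (B, T) response token ids
    recon = F.cross_entropy(logits.transpose(1, 2), target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kl
```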

Convolutional Gaussian Processes

Title Convolutional Gaussian Processes
Authors Mark van der Wilk, Carl Edward Rasmussen, James Hensman
Abstract We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.
Tasks Gaussian Processes
Published 2017-09-06
URL http://arxiv.org/abs/1709.01894v1
PDF http://arxiv.org/pdf/1709.01894v1.pdf
PWC https://paperswithcode.com/paper/convolutional-gaussian-processes
Repo https://github.com/GPflow/GPflow
Framework tf
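
The convolutional kernel itself is an additive construction: apply a base kernel to every pair of image patches and sum. A NumPy sketch follows (normalized by the number of patch pairs; the weighting between convolutional and RBF components discussed in the paper is omitted).

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def patches(img, size=3):
    # all overlapping size x size patches of a 2D image, flattened
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

def conv_kernel(x, y, size=3):
    # k(x, y) = mean over patch pairs of the base RBF kernel
    return rbf(patches(x, size), patches(y, size)).mean()
```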

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

Title Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
Authors Antti Tarvainen, Harri Valpola
Abstract The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.
Tasks Semi-Supervised Image Classification
Published 2017-03-06
URL http://arxiv.org/abs/1703.01780v6
PDF http://arxiv.org/pdf/1703.01780v6.pdf
PWC https://paperswithcode.com/paper/mean-teachers-are-better-role-models-weight
Repo https://github.com/sud0301/semisup-semseg
Framework pytorch
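
The two ingredients, the exponential moving average of weights and the consistency penalty, are each a few lines of PyTorch. The MSE-on-softmax consistency cost below follows the paper; the alpha value and the loss weighting schedule are left out.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # teacher <- alpha * teacher + (1 - alpha) * student, per parameter
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def consistency_loss(student_logits, teacher_logits):
    # teacher_logits should come from a forward pass under torch.no_grad()
    return F.mse_loss(F.softmax(student_logits, dim=1),
                      F.softmax(teacher_logits, dim=1))
```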

Reproducing and learning new algebraic operations on word embeddings using genetic programming

Title Reproducing and learning new algebraic operations on word embeddings using genetic programming
Authors Roberto Santana
Abstract Word-vector representations associate a high-dimensional real vector with every word from a corpus. Recently, neural-network-based methods have been proposed for learning this representation from large corpora. This type of word-to-vector embedding is able to keep, in the learned vector space, some of the syntactic and semantic relationships present in the original word corpus. This, in turn, serves to address different types of language classification tasks by doing algebraic operations defined on the vectors. The general practice is to assume that the semantic relationships between the words can be inferred by the application of a priori specified algebraic operations. Our general goal in this paper is to show that it is possible to learn methods for word composition in semantic spaces. Instead of expressing the compositional method as an algebraic operation, we will encode it as a program, which can be linear, nonlinear, or involve more intricate expressions. More remarkably, this program will be evolved from a set of initial random programs by means of genetic programming (GP). We show that our method is able to reproduce the same behavior as human-designed algebraic operators. Using a word analogy task as benchmark, we also show that GP-generated programs are able to obtain accuracy values above those produced by the commonly used human-designed rule for algebraic manipulation of word vectors. Finally, we show the robustness of our approach by executing the evolved programs on the word2vec GoogleNews vectors, learned over 3 billion running words, and assessing their accuracy in the same word analogy task.
Tasks Word Embeddings
Published 2017-02-18
URL http://arxiv.org/abs/1702.05624v1
PDF http://arxiv.org/pdf/1702.05624v1.pdf
PWC https://paperswithcode.com/paper/reproducing-and-learning-new-algebraic
Repo https://github.com/rsantana-isg/GP_word2vec
Framework none
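
The human-designed rule that the evolved GP programs are benchmarked against is the usual vector-offset method: answer the analogy a : b :: c : ? with the word nearest in cosine similarity to b - a + c.

```python
import numpy as np

def analogy(emb, vocab, a, b, c):
    # emb: (V, d) embedding matrix; vocab: list of V words
    idx = {w: i for i, w in enumerate(vocab)}
    q = emb[idx[b]] - emb[idx[a]] + emb[idx[c]]
    q = q / np.linalg.norm(q)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ q
    for w in (a, b, c):                  # exclude the query words themselves
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]
```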

Query-based Attention CNN for Text Similarity Map

Title Query-based Attention CNN for Text Similarity Map
Authors Tzu-Chien Liu, Yu-Hsueh Wu, Hung-Yi Lee
Abstract In this paper, we introduce Query-based Attention CNN (QACNN) for Text Similarity Map, an end-to-end neural network for question answering. This network is composed of a compare mechanism, a two-staged CNN architecture with an attention mechanism, and a prediction layer. First, the compare mechanism compares the given passage, query, and multiple answer choices to build similarity maps. Then, the two-staged CNN architecture extracts features at the word level and sentence level. At the same time, the attention mechanism helps the CNN focus more on the important parts of the passage based on the query information. Finally, the prediction layer selects the most probable answer choice. We evaluate this model on the MovieQA dataset using Plot Synopses only, and achieve 79.99% accuracy, which is the state of the art on the dataset.
Tasks Question Answering
Published 2017-09-15
URL http://arxiv.org/abs/1709.05036v2
PDF http://arxiv.org/pdf/1709.05036v2.pdf
PWC https://paperswithcode.com/paper/query-based-attention-cnn-for-text-similarity
Repo https://github.com/coderalo/QACNN-PyTorch
Framework pytorch
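
The similarity maps that the CNN stages consume are word-by-word cosine similarities. A minimal sketch of the compare mechanism (shapes and the choice of cosine similarity are assumptions):

```python
import torch
import torch.nn.functional as F

def similarity_map(passage_emb, query_emb):
    # passage_emb: (P, d), query_emb: (Q, d) -> similarity map: (P, Q)
    p = F.normalize(passage_emb, dim=1)
    q = F.normalize(query_emb, dim=1)
    return p @ q.t()
```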

Towards Adversarial Retinal Image Synthesis

Title Towards Adversarial Retinal Image Synthesis
Authors Pedro Costa, Adrian Galdran, Maria Inês Meyer, Michael David Abràmoff, Meindert Niemeijer, Ana Maria Mendonça, Aurélio Campilho
Abstract Synthesizing images of the eye fundus is a challenging task that has been previously approached by formulating complex models of the anatomy of the eye. New images can then be generated by sampling a suitable parameter space. In this work, we propose a method that learns to synthesize eye fundus images directly from data. For that, we pair true eye fundus images with their respective vessel trees, by means of a vessel segmentation technique. These pairs are then used to learn a mapping from a binary vessel tree to a new retinal image. For this purpose, we use a recent image-to-image translation technique, based on the idea of adversarial learning. Experimental results show that the original and the generated images are visually different in terms of their global appearance, in spite of sharing the same vessel tree. Additionally, a quantitative quality analysis of the synthetic retinal images confirms that the produced images retain a high proportion of the true image set quality.
Tasks Image Generation, Image-to-Image Translation, Medical Image Generation
Published 2017-01-31
URL http://arxiv.org/abs/1701.08974v1
PDF http://arxiv.org/pdf/1701.08974v1.pdf
PWC https://paperswithcode.com/paper/towards-adversarial-retinal-image-synthesis
Repo https://github.com/costapt/vess2ret
Framework tf
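
The vessel-tree-to-image mapping is learned with a pix2pix-style conditional GAN. Below is a sketch of the generator objective under the usual adversarial-plus-L1 formulation; the weight lam is the common pix2pix default, assumed here rather than taken from the repo.

```python
import torch
import torch.nn.functional as F

def vessel2retina_g_loss(generator, disc, vessel_tree, real_image, lam=100.0):
    fake = generator(vessel_tree)
    # conditional discriminator sees (condition, output) pairs
    logits = disc(torch.cat([vessel_tree, fake], dim=1))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return adv + lam * F.l1_loss(fake, real_image)
```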

Chunk-Based Bi-Scale Decoder for Neural Machine Translation

Title Chunk-Based Bi-Scale Decoder for Neural Machine Translation
Authors Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, Jiajun Chen
Abstract In typical neural machine translation (NMT), the decoder generates a sentence word by word, packing all linguistic granularities into the same time-scale of the RNN. In this paper, we propose a new type of decoder for NMT, which splits the decoder state into two parts and updates them on two different time-scales. Specifically, we first predict a chunk time-scale state for phrasal modeling, on top of which multiple word time-scale states are generated. In this way, the target sentence is translated hierarchically from chunks to words, with information at different granularities being leveraged. Experiments show that our proposed model significantly improves translation performance over the state-of-the-art NMT model.
Tasks Machine Translation
Published 2017-05-03
URL http://arxiv.org/abs/1705.01452v1
PDF http://arxiv.org/pdf/1705.01452v1.pdf
PWC https://paperswithcode.com/paper/chunk-based-bi-scale-decoder-for-neural
Repo https://github.com/zhouh/chunk-nmt
Framework none
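
The bi-scale idea, a slow chunk-level state feeding a fast word-level state, can be rendered schematically. The GRU cells and the chunk-boundary handling below are illustrative; the paper's actual update rules differ in detail.

```python
import torch
import torch.nn as nn

class BiScaleDecoder(nn.Module):
    def __init__(self, emb, hid, vocab):
        super().__init__()
        self.chunk_rnn = nn.GRUCell(hid, hid)        # steps once per chunk
        self.word_rnn = nn.GRUCell(emb + hid, hid)   # steps every word
        self.out = nn.Linear(hid, vocab)

    def forward(self, word_embs, chunk_starts, h_chunk, h_word):
        # word_embs: sequence of (B, emb); chunk_starts: sequence of bools
        logits = []
        for w, is_start in zip(word_embs, chunk_starts):
            if is_start:                             # phrasal state update
                h_chunk = self.chunk_rnn(h_word, h_chunk)
            h_word = self.word_rnn(torch.cat([w, h_chunk], -1), h_word)
            logits.append(self.out(h_word))
        return torch.stack(logits), h_chunk, h_word
```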

Interpretation of Neural Networks is Fragile

Title Interpretation of Neural Networks is Fragile
Authors Amirata Ghorbani, Abubakar Abid, James Zou
Abstract In order for machine learning to be deployed and trusted in many applications, it is crucial to be able to reliably explain why the machine learning algorithm makes certain predictions. For example, if an algorithm classifies a given pathology image to be a malignant tumor, then the doctor may need to know which parts of the image led the algorithm to this classification. How to interpret black-box predictors is thus an important and active area of research. A fundamental question is: how much can we trust the interpretation itself? In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptually indistinguishable inputs with the same predicted label can be assigned very different interpretations. We systematically characterize the fragility of several widely-used feature-importance interpretation methods (saliency maps, relevance propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that even small random perturbations can change the feature importance, and new systematic perturbations can lead to dramatically different interpretations without changing the label. We extend these results to show that interpretations based on exemplars (e.g. influence functions) are similarly fragile. Our analysis of the geometry of the Hessian matrix gives insight into why fragility could be a fundamental challenge to the current interpretation approaches.
Tasks Feature Importance
Published 2017-10-29
URL http://arxiv.org/abs/1710.10547v2
PDF http://arxiv.org/pdf/1710.10547v2.pdf
PWC https://paperswithcode.com/paper/interpretation-of-neural-networks-is-fragile
Repo https://github.com/amiratag/InterpretationFragility
Framework tf
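
The systematic perturbations are found by gradient ascent on the change in the saliency map while keeping the perturbation small. The sketch below targets plain gradient saliency with signed-gradient steps; the step sizes are assumptions, and unlike the paper's attacks it does not explicitly constrain the predicted label (it relies on the small epsilon ball).

```python
import torch

def saliency(model, x, label):
    # x: (1, C, H, W); returns |d score / d input| for the given label
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(model(x)[0, label], x)
    return grad.abs()

def perturb_interpretation(model, x, label, eps=8 / 255, steps=20, lr=1 / 255):
    base = saliency(model, x, label).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        xp = x + delta
        grad_x, = torch.autograd.grad(model(xp)[0, label], xp,
                                      create_graph=True)
        change = (grad_x.abs() - base).pow(2).mean()  # push saliency away
        g, = torch.autograd.grad(-change, delta)
        with torch.no_grad():
            delta -= lr * g.sign()
            delta.clamp_(-eps, eps)
    return (x + delta).detach()
```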