October 18, 2019

Paper Group ANR 592

Pathwise Derivatives Beyond the Reparameterization Trick. Dense Multi-path U-Net for Ischemic Stroke Lesion Segmentation in Multiple Image Modalities. Injective State-Image Mapping facilitates Visual Adversarial Imitation Learning. The Convergence of Sparsified Gradient Methods. Magnetic Resonance Spectroscopy Quantification using Deep Learning. On …

Pathwise Derivatives Beyond the Reparameterization Trick


Title	Pathwise Derivatives Beyond the Reparameterization Trick
Authors	Martin Jankowiak, Fritz Obermeyer
Abstract	We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.
Tasks
Published	2018-06-05
URL	http://arxiv.org/abs/1806.01851v2
PDF	http://arxiv.org/pdf/1806.01851v2.pdf
PWC	https://paperswithcode.com/paper/pathwise-derivatives-beyond-the
Repo
Framework
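
As a point of reference for the abstract above, the following is a minimal sketch of the standard reparameterization trick for a univariate Normal distribution, which is the baseline the paper generalizes. It is not the authors' method for Gamma, Beta, or Dirichlet distributions. The function name `pathwise_grads` and the test function f(z) = z**2 are assumptions chosen for illustration.

```python
import random

def pathwise_grads(mu, sigma, n_samples=50000, seed=0):
    """Monte Carlo pathwise gradients of E[z**2] for z ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # parameter-free noise
        z = mu + sigma * eps        # reparameterized sample
        df_dz = 2.0 * z             # derivative of the illustrative f(z) = z**2
        grad_mu += df_dz * 1.0      # chain rule with dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule with dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples

g_mu, g_sigma = pathwise_grads(mu=1.0, sigma=0.5)
# Closed form for comparison: E[z**2] = mu**2 + sigma**2, so the exact
# gradients are 2*mu = 2.0 and 2*sigma = 1.0.
print(g_mu, g_sigma)
```

The estimates should land close to the closed-form values, up to Monte Carlo noise.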

Dense Multi-path U-Net for Ischemic Stroke Lesion Segmentation in Multiple Image Modalities


Title	Dense Multi-path U-Net for Ischemic Stroke Lesion Segmentation in Multiple Image Modalities
Authors	Jose Dolz, Ismail Ben Ayed, Christian Desrosiers
Abstract	Delineating infarcted tissue in ischemic stroke lesions is crucial to determine the extent of damage and optimal treatment for this life-threatening condition. However, this problem remains challenging due to high variability of ischemic strokes’ location and shape. Recently, fully-convolutional neural networks (CNN), in particular those based on U-Net, have led to improved performances for this task. In this work, we propose a novel architecture that improves standard U-Net based methods in three important ways. First, instead of combining the available image modalities at the input, each of them is processed in a different path to better exploit their unique information. Moreover, the network is densely-connected (i.e., each layer is connected to all following layers), both within each path and across different paths, similar to HyperDenseNet. This gives our model the freedom to learn the scale at which modalities should be processed and combined. Finally, inspired by the Inception architecture, we improve standard U-Net modules by extending inception modules with two convolutional blocks with dilated convolutions of different scale. This helps handle the variability in lesion sizes. We split the 93 stroke datasets into training and validation sets containing 83 and 9 examples respectively. Our network was trained on an NVIDIA TITAN XP GPU with 16 GB of RAM, using ADAM as optimizer and a learning rate of $1\times10^{-5}$ for 200 epochs. Training took around 5 hours and segmentation of a whole volume took between 0.2 and 2 seconds, on average. The performance on the test set obtained by our method is compared to several baselines, to demonstrate the effectiveness of our architecture, and to a state-of-the-art architecture that employs factorized dilated convolutions, i.e., ERFNet.
Tasks	Ischemic Stroke Lesion Segmentation, Lesion Segmentation
Published	2018-10-16
URL	http://arxiv.org/abs/1810.07003v1
PDF	http://arxiv.org/pdf/1810.07003v1.pdf
PWC	https://paperswithcode.com/paper/dense-multi-path-u-net-for-ischemic-stroke
Repo
Framework

Injective State-Image Mapping facilitates Visual Adversarial Imitation Learning


Title	Injective State-Image Mapping facilitates Visual Adversarial Imitation Learning
Authors	Subhajit Chaudhury, Daiki Kimura, Asim Munawar, Ryuki Tachibana
Abstract	The growing use of virtual autonomous agents in applications like games and entertainment demands better control policies for natural-looking movements and actions. Unlike the conventional approach of hard-coding motion routines, we propose a deep learning method for obtaining control policies by directly mimicking raw video demonstrations. Previous methods in this domain rely on extracting low-dimensional features from expert videos followed by a separate hand-crafted reward estimation step. We propose an imitation learning framework that reduces the dependence on hand-engineered reward functions by jointly learning the feature extraction and reward estimation steps using Generative Adversarial Networks (GANs). Our main contribution in this paper is to show that under injective mapping between low-level joint state (angles and velocities) trajectories and corresponding raw video stream, performing adversarial imitation learning on video demonstrations is equivalent to learning from the state trajectories. Experimental results show that the proposed adversarial learning method from raw videos produces a similar performance to state-of-the-art imitation learning techniques while frequently outperforming existing hand-crafted video imitation methods. Furthermore, we show that our method can learn action policies by imitating video demonstrations on YouTube with similar performance to learned agents from true reward signals. Please see the supplementary video submission at https://ibm.biz/BdzzNA.
Tasks	Imitation Learning
Published	2018-10-02
URL	https://arxiv.org/abs/1810.01108v2
PDF	https://arxiv.org/pdf/1810.01108v2.pdf
PWC	https://paperswithcode.com/paper/video-imitation-gan-learning-control-policies
Repo
Framework

The Convergence of Sparsified Gradient Methods


Title	The Convergence of Sparsified Gradient Methods
Authors	Dan Alistarh, Torsten Hoefler, Mikael Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
Abstract	Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods - where each node sorts gradients by magnitude, and only communicates a subset of the components, accumulating the rest locally - are known to yield some of the largest practical gains. Such methods can reduce the amount of communication per step by up to three orders of magnitude, while preserving model accuracy. Yet, this family of methods currently has no theoretical justification. This is the question we address in this paper. We prove that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees, for both convex and non-convex smooth objectives, for data-parallel SGD. The main insight is that sparsification methods implicitly maintain bounds on the maximum impact of stale updates, thanks to selection by magnitude. Our analysis and empirical validation also reveal that these methods do require analytical conditions to converge well, justifying existing heuristics.
Tasks	Quantization
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10505v1
PDF	http://arxiv.org/pdf/1809.10505v1.pdf
PWC	https://paperswithcode.com/paper/the-convergence-of-sparsified-gradient
Repo
Framework
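
The scheme described in the abstract above (communicate only the largest-magnitude gradient components and accumulate the rest locally) can be sketched for a single worker as follows. This is a simplified illustration, not the authors' code; the name `sparsify_top_k` and the toy gradients are assumptions.

```python
def sparsify_top_k(gradient, residual, k):
    """Return (sent, new_residual) for one step of top-k sparsification."""
    # Add the locally accumulated error from earlier steps.
    acc = [g + r for g, r in zip(gradient, residual)]
    # Indices of the k components with the largest magnitude.
    top = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k]
    keep = set(top)
    # Only the selected components are communicated ...
    sent = [acc[i] if i in keep else 0.0 for i in range(len(acc))]
    # ... and everything else stays in the local residual.
    new_residual = [0.0 if i in keep else acc[i] for i in range(len(acc))]
    return sent, new_residual

residual = [0.0, 0.0, 0.0, 0.0]
sent1, residual = sparsify_top_k([0.1, -2.0, 0.5, 3.0], residual, k=2)
sent2, residual = sparsify_top_k([0.2, 0.1, 0.6, -0.1], residual, k=2)
print(sent1, sent2, residual)
```

In the second step the components that were held back earlier (indices 0 and 2) have accumulated enough magnitude to be selected, which is the error-correction behaviour the abstract refers to.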

Magnetic Resonance Spectroscopy Quantification using Deep Learning


Title	Magnetic Resonance Spectroscopy Quantification using Deep Learning
Authors	Nima Hatami, Michaël Sdika, Hélène Ratiney
Abstract	Magnetic resonance spectroscopy (MRS) is an important technique in biomedical research and it has the unique capability to give non-invasive access to the biochemical content (metabolites) of scanned organs. In the literature, the quantification (the extraction of the potential biomarkers from the MRS signals) involves the resolution of an inverse problem based on a parametric model of the metabolite signal. However, poor signal-to-noise ratio (SNR), presence of the macromolecule signal or high correlation between metabolite spectral patterns can cause high uncertainties for most of the metabolites, which is one of the main reasons that prevent the use of MRS in clinical routine. In this paper, quantification of metabolites in MR Spectroscopic imaging using deep learning is proposed. A regression framework based on Convolutional Neural Networks (CNN) is introduced for an accurate estimation of spectral parameters. The proposed model learns the spectral features from a large-scale simulated data set with different variations of human brain spectra and SNRs. Experimental results demonstrate the accuracy of the proposed method, compared to a state-of-the-art standard quantification method (QUEST), on concentration of 20 metabolites and the macromolecule.
Tasks
Published	2018-06-19
URL	http://arxiv.org/abs/1806.07237v1
PDF	http://arxiv.org/pdf/1806.07237v1.pdf
PWC	https://paperswithcode.com/paper/magnetic-resonance-spectroscopy
Repo
Framework

On Design of Problem Token Questions in Quality of Experience Surveys


Title	On Design of Problem Token Questions in Quality of Experience Surveys
Authors	Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler
Abstract	User surveys for Quality of Experience (QoE) are a critical source of information. In addition to the common “star rating” used to estimate Mean Opinion Score (MOS), more detailed survey questions (problem tokens) about specific areas provide valuable insight into the factors impacting QoE. This paper explores two aspects of the problem token questionnaire design. First, we study the bias introduced by fixed question order, and second, we study the challenge of selecting a subset of questions to keep the token set small. Based on 900,000 calls gathered using a randomized controlled experiment from a live system, we find that the order bias can be significantly reduced by randomizing the display order of tokens. The difference in response rate varies based on token position and display design. It is worth noting that the users respond to the randomized-order variant at levels that are comparable to the fixed-order variant. The effective selection of a subset of token questions is achieved by extracting tokens that provide the highest information gain over user ratings. This selection is known to be in the class of NP-hard problems. We apply a well-known greedy submodular maximization method on our dataset to capture 94% of the information using just 30% of the questions.
Tasks
Published	2018-08-19
URL	http://arxiv.org/abs/1808.06152v1
PDF	http://arxiv.org/pdf/1808.06152v1.pdf
PWC	https://paperswithcode.com/paper/on-design-of-problem-token-questions-in
Repo
Framework
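
The greedy selection step mentioned in the abstract above can be illustrated with a small sketch. The paper maximizes information gain over user ratings; the code below swaps in set coverage, a simpler monotone submodular objective, purely for illustration. The function `greedy_select`, the token names, and the call IDs are all hypothetical.

```python
def greedy_select(candidates, budget):
    """Greedily pick up to `budget` keys of `candidates` (name -> set of items)
    so that the union of the chosen sets is as large as possible."""
    chosen = []
    covered = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)   # marginal gain of adding this token
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:                  # nothing adds new coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Hypothetical problem tokens mapped to the (hypothetical) calls they flag.
tokens = {
    "audio_echo": {1, 2, 3, 4},
    "video_freeze": {3, 4, 5},
    "call_dropped": {5, 6},
    "audio_noise": {1, 2},
}
chosen, covered = greedy_select(tokens, budget=2)
print(chosen, sorted(covered))
```

With this toy input, two of the four tokens already cover every flagged call, which mirrors the idea of keeping the token set small while retaining most of the information.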

On the Properties of MVR Chain Graphs


Title	On the Properties of MVR Chain Graphs
Authors	Mohammad Ali Javidian, Marco Valtorta
Abstract	Depending on the interpretation of the type of edges, a chain graph can represent different relations between variables and thereby independence models. Three interpretations, known by the acronyms LWF, MVR, and AMP, are prevalent. Multivariate regression chain graphs (MVR CGs) were introduced by Cox and Wermuth in 1993. We review Markov properties for MVR chain graphs and propose an alternative global and local Markov property for them. Except for pairwise Markov properties, we show that for MVR chain graphs all Markov properties in the literature are equivalent for semi-graphoids. We derive a new factorization formula for MVR chain graphs which is more explicit than and different from the proposed factorizations for MVR chain graphs in the literature. Finally, we provide a summary table comparing different features of LWF, AMP, and MVR chain graphs.
Tasks
Published	2018-03-09
URL	http://arxiv.org/abs/1803.04262v7
PDF	http://arxiv.org/pdf/1803.04262v7.pdf
PWC	https://paperswithcode.com/paper/on-the-properties-of-mvr-chain-graphs
Repo
Framework

Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform


Title	Detailed Dense Inference with Convolutional Neural Networks via Discrete Wavelet Transform
Authors	Lingni Ma, Jörg Stückler, Tao Wu, Daniel Cremers
Abstract	Dense pixelwise prediction such as semantic segmentation is an up-to-date challenge for deep convolutional neural networks (CNNs). Many state-of-the-art approaches either tackle the loss of high-resolution information due to pooling in the encoder stage, or use dilated convolutions or high-resolution lanes to maintain detailed feature maps and predictions. Motivated by the structural analogy between multi-resolution wavelet analysis and the pooling/unpooling layers of CNNs, we introduce discrete wavelet transform (DWT) into the CNN encoder-decoder architecture and propose WCNN. The high-frequency wavelet coefficients are computed at the encoder and are later used at the decoder to unpool jointly with coarse-resolution feature maps through the inverse DWT. The DWT/iDWT is further used to develop two wavelet pyramids to capture the global context, where the multi-resolution DWT is applied to successively reduce the spatial resolution and increase the receptive field. In experiments on the Cityscapes dataset, the proposed WCNNs are computationally efficient and improve the accuracy of high-resolution dense pixelwise prediction.
Tasks	Semantic Segmentation
Published	2018-08-06
URL	http://arxiv.org/abs/1808.01834v1
PDF	http://arxiv.org/pdf/1808.01834v1.pdf
PWC	https://paperswithcode.com/paper/detailed-dense-inference-with-convolutional
Repo
Framework
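
To make the pooling/unpooling analogy in the abstract above concrete, here is a minimal one-level, one-dimensional Haar transform. It is a sketch only: the paper applies the DWT to 2D feature maps inside a CNN, and the function names `haar_dwt` and `haar_idwt` are assumptions. The approximation coefficients play the role of a pooled signal, and keeping the detail coefficients allows exact reconstruction.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT of an even-length list: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)          # half-resolution "pooled" signal
reconstructed = haar_idwt(approx, detail)  # "unpooling" with the stored details
print(approx, detail, reconstructed)
```

Unlike max or average pooling, nothing is discarded here: the high-frequency part is stored in `detail` and reused on the way back up.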

Generating Magnetic Resonance Spectroscopy Imaging Data of Brain Tumours from Linear, Non-Linear and Deep Learning Models


Title	Generating Magnetic Resonance Spectroscopy Imaging Data of Brain Tumours from Linear, Non-Linear and Deep Learning Models
Authors	Nathan J Olliverre, Guang Yang, Gregory Slabaugh, Constantino Carlos Reyes-Aldasoro, Eduardo Alonso
Abstract	Magnetic Resonance Spectroscopy (MRS) provides valuable information to help with the identification and understanding of brain tumors, yet MRS is not a widely available medical imaging modality. Aiming to counter this issue, this research draws on the advancements in machine learning techniques in other fields for the generation of artificial data. The generative methods were tested through the evaluation of their output against that of a real-world labelled MRS brain tumor data-set. Furthermore, the outputs of the generative techniques were each used to train separate traditional classifiers, which were tested on a subset of the real MRS brain tumor dataset. The results suggest that there exist methods capable of producing accurate, ground truth based MRS voxels. These findings indicate that through generative techniques, large datasets can be made available for training deep learning models for use in brain tumor diagnosis.
Tasks
Published	2018-08-23
URL	http://arxiv.org/abs/1808.07592v1
PDF	http://arxiv.org/pdf/1808.07592v1.pdf
PWC	https://paperswithcode.com/paper/generating-magnetic-resonance-spectroscopy
Repo
Framework

Projection-Free Algorithms in Statistical Estimation


Title	Projection-Free Algorithms in Statistical Estimation
Authors	Yan Li, Chao Qu, Huan Xu
Abstract	The Frank-Wolfe algorithm (FW) and its variants have gained a surge of interest in the machine learning community due to their projection-free property. Recently, the gradient evaluation complexity of the FW algorithm has been reduced to $\log(\frac{1}{\epsilon})$ for smooth and strongly convex objectives. This complexity result is especially significant in learning problems, as the overwhelming data size makes a single evaluation of the gradient computationally expensive. However, in high-dimensional statistical estimation problems, the objective is typically not strongly convex, and sometimes even non-convex. In this paper, we extend the state-of-the-art FW type algorithms for the large-scale, high-dimensional estimation problem. We show that as long as the objective satisfies restricted strong convexity, and we are not optimizing over the statistical limit of the model, the $\log(\frac{1}{\epsilon})$ gradient evaluation complexity could still be attained.
Tasks
Published	2018-05-20
URL	http://arxiv.org/abs/1805.07844v1
PDF	http://arxiv.org/pdf/1805.07844v1.pdf
PWC	https://paperswithcode.com/paper/projection-free-algorithms-in-statistical
Repo
Framework
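
The projection-free property mentioned in the abstract above can be seen in a minimal sketch of the classical Frank-Wolfe iteration. This is the textbook algorithm with the standard 2/(t+2) step size, not the variance-reduced variants the paper analyzes. The objective (squared distance to a target point), the constraint set (the probability simplex), and the name `frank_wolfe_simplex` are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(target, n_iters=2000):
    """Minimize sum((x_i - target_i)**2) over the probability simplex."""
    n = len(target)
    x = [1.0 / n] * n                       # start at the simplex centre
    for t in range(n_iters):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, target)]
        # Linear minimization oracle: over the simplex, a linear function is
        # minimized at the vertex with the smallest gradient coordinate.
        i_min = min(range(n), key=lambda i: grad[i])
        gamma = 2.0 / (t + 2.0)             # standard step-size schedule
        # Convex combination of the iterate and the vertex; no projection.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x

x = frank_wolfe_simplex([0.2, 0.3, 0.5])
print(x)
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without ever computing a projection.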

Stochastic Learning under Random Reshuffling with Constant Step-sizes


Title	Stochastic Learning under Random Reshuffling with Constant Step-sizes
Authors	Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed
Abstract	In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. This work focuses on the constant step-size case and strongly convex loss function. In this case, convergence is guaranteed to a small neighborhood of the optimizer albeit at a linear rate. The analysis establishes analytically that random reshuffling outperforms uniform sampling by showing explicitly that iterates approach a smaller neighborhood of size $O(\mu^2)$ around the minimizer rather than $O(\mu)$. Furthermore, we derive an analytical expression for the steady-state mean-square-error performance of the algorithm, which helps clarify in greater detail the differences between sampling with and without replacement. We also explain the periodic behavior that is observed in random reshuffling implementations.
Tasks
Published	2018-03-21
URL	http://arxiv.org/abs/1803.07964v2
PDF	http://arxiv.org/pdf/1803.07964v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-learning-under-random-reshuffling
Repo
Framework
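
The two sampling schemes compared in the abstract above differ only in how each epoch's index order is drawn. The sketch below runs constant step-size SGD on a toy strongly convex problem (estimating the mean of a small data set) under both schemes. The function `sgd_mean`, the data, and the step size are assumptions for illustration, and the code makes no claim about reproducing the paper's $O(\mu^2)$ versus $O(\mu)$ result.

```python
import random

def sgd_mean(data, step, n_epochs, reshuffle, seed=0):
    """Constant step-size SGD on the average of 0.5 * (w - x_i)**2."""
    rng = random.Random(seed)
    n = len(data)
    w = 0.0
    for _ in range(n_epochs):
        if reshuffle:
            # Random reshuffling: every sample is visited exactly once per epoch.
            order = list(range(n))
            rng.shuffle(order)
        else:
            # Uniform sampling with replacement: samples may repeat or be missed.
            order = [rng.randrange(n) for _ in range(n)]
        for i in order:
            w -= step * (w - data[i])       # gradient of 0.5 * (w - x_i)**2
    return w

data = [i / 10 for i in range(10)]          # minimizer is the mean, 0.45
w_rr = sgd_mean(data, step=0.05, n_epochs=200, reshuffle=True)
w_uniform = sgd_mean(data, step=0.05, n_epochs=200, reshuffle=False)
print(w_rr, w_uniform)
```

With a constant step size neither run converges exactly; both hover in a neighborhood of the minimizer, which is the regime the paper studies.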

BIRNet: Brain Image Registration Using Dual-Supervised Fully Convolutional Networks


Title	BIRNet: Brain Image Registration Using Dual-Supervised Fully Convolutional Networks
Authors	Jingfan Fan, Xiaohuan Cao, Pew-Thian Yap, Dinggang Shen
Abstract	In this paper, we propose a deep learning approach for image registration by predicting deformation from image appearance. Since obtaining ground-truth deformation fields for training can be challenging, we design a fully convolutional network that is subject to dual-guidance: (1) Coarse guidance using deformation fields obtained by an existing registration method; and (2) Fine guidance using image similarity. The latter guidance helps avoid overly relying on the supervision from the training deformation fields, which could be inaccurate. For effective training, we further improve the deep convolutional network with gap filling, hierarchical loss, and multi-source strategies. Experiments on a variety of datasets show promising registration accuracy and efficiency compared with state-of-the-art methods.
Tasks	Image Registration
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04692v1
PDF	http://arxiv.org/pdf/1802.04692v1.pdf
PWC	https://paperswithcode.com/paper/birnet-brain-image-registration-using-dual
Repo
Framework

Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)


Title	Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract)
Authors	Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach
Abstract	Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data. This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over reliance on the learned prior and image context. We investigate generation of gender specific caption words (e.g. man, woman) based on the person’s appearance or the image context. We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. The resulting model is forced to look at a person rather than use contextual cues to make a gender specific prediction. The losses that comprise our model, the Appearance Confusion Loss and the Confident Loss, are general, and can be added to any description model in order to mitigate impacts of unwanted bias in a description dataset. Our proposed model has lower error than prior work when describing images with people and mentioning their gender and more closely matches the ground truth ratio of sentences including women to sentences including men.
Tasks	Image Captioning
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00517v1
PDF	http://arxiv.org/pdf/1807.00517v1.pdf
PWC	https://paperswithcode.com/paper/women-also-snowboard-overcoming-bias-in
Repo
Framework

Scalable Angular Discriminative Deep Metric Learning for Face Recognition


Title	Scalable Angular Discriminative Deep Metric Learning for Face Recognition
Authors	Bowen Wu, Huaming Wu, Monica M. Y. Zhang
Abstract	With the development of deep learning, Deep Metric Learning (DML) has achieved great improvements in face recognition. Specifically, the widely used softmax loss in the training process often brings large intra-class variations, and feature normalization is only exploited in the testing process to compute the pair similarities. To bridge the gap, we constrain the intra-class cosine similarity between the features and weight vectors in softmax loss to be larger than a margin in the training step, and extend it from four aspects. First, we explore the effect of a hard sample mining strategy. To alleviate the human labor of adjusting the margin hyper-parameter, a self-adaptive margin updating strategy is proposed. Then, a normalized version is given to take full advantage of the cosine similarity constraint. Furthermore, we enhance the former constraint to force the intra-class cosine similarity to be larger than the mean inter-class cosine similarity with a margin in the exponential feature projection space. Extensive experiments on Labeled Faces in the Wild (LFW), YouTube Faces (YTF) and IARPA Janus Benchmark A (IJB-A) datasets demonstrate that the proposed methods outperform the mainstream DML methods and approach the state-of-the-art performance.
Tasks	Face Recognition, Metric Learning
Published	2018-04-29
URL	http://arxiv.org/abs/1804.10899v2
PDF	http://arxiv.org/pdf/1804.10899v2.pdf
PWC	https://paperswithcode.com/paper/scalable-angular-discriminative-deep-metric
Repo
Framework

Online convex optimization and no-regret learning: Algorithms, guarantees and applications


Title	Online convex optimization and no-regret learning: Algorithms, guarantees and applications
Authors	E. Veronica Belmega, Panayotis Mertikopoulos, Romain Negrel, Luca Sanguinetti
Abstract	Spurred by the enthusiasm surrounding the “Big Data” paradigm, the mathematical and algorithmic tools of online optimization have found widespread use in problems where the trade-off between data exploration and exploitation plays a predominant role. This trade-off is of particular importance to several branches and applications of signal processing, such as data mining, statistical inference, multimedia indexing and wireless communications (to name but a few). With this in mind, the aim of this tutorial paper is to provide a gentle introduction to online optimization and learning algorithms that are asymptotically optimal in hindsight - i.e., they approach the performance of a virtual algorithm with unlimited computational power and full knowledge of the future, a property known as no-regret. Particular attention is devoted to identifying the algorithms’ theoretical performance guarantees and to establishing links with classic optimization paradigms (both static and stochastic). To allow a better understanding of this toolbox, we provide several examples throughout the tutorial ranging from metric learning to wireless resource allocation problems.
Tasks	Metric Learning
Published	2018-04-12
URL	http://arxiv.org/abs/1804.04529v1
PDF	http://arxiv.org/pdf/1804.04529v1.pdf
PWC	https://paperswithcode.com/paper/online-convex-optimization-and-no-regret
Repo
Framework