April 1, 2020

3070 words 15 mins read

Paper Group ANR 480

A Computational Investigation on Denominalization

Title A Computational Investigation on Denominalization
Authors Zahra Shekarchi, Yang Xu
Abstract Language is a dynamic system, and word meanings change over time. Whenever a novel concept or sense is introduced, we need to assign a word to express it. Some changes also happen because their result is more desirable for humans, or cognitively easier to use. Finding the patterns of these changes is interesting and can reveal facts about human cognitive evolution. Since we have enough resources for studying this problem, it is well suited to computational modeling, which makes the work easier and allows it to be studied at large scale. In this work, we study nouns that came to be used as verbs some years after their emergence as nouns and look for commonalities among these nouns. In other words, we are interested in finding what potential requirements are essential for this change.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2003.04975v1
PDF https://arxiv.org/pdf/2003.04975v1.pdf
PWC https://paperswithcode.com/paper/a-computational-investigation-on
Repo
Framework
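
A minimal sketch of the kind of diachronic signal such a study relies on: for each lemma, find its first attested noun use and its first attested verb use and measure the lag. The corpus, lemmas, and years below are made up for illustration; the paper's actual data and modeling are not described in the abstract.

```python
from collections import defaultdict

# Toy POS-tagged diachronic corpus: (year, lemma, pos) triples with made-up years,
# standing in for a large historical corpus.
corpus = [
    (1975, "email", "NOUN"), (1982, "email", "NOUN"), (1994, "email", "VERB"),
    (1940, "tape",  "NOUN"), (1952, "tape",  "VERB"), (1960, "tape",  "VERB"),
]

def attestation_lag(corpus, lemma):
    """Years between a lemma's first attested noun use and its first attested verb use."""
    first = defaultdict(lambda: None)
    for year, lem, pos in corpus:
        if lem == lemma and (first[pos] is None or year < first[pos]):
            first[pos] = year
    return first["VERB"] - first["NOUN"]

for lemma in ("email", "tape"):
    print(lemma, "noun-to-verb lag (years):", attestation_lag(corpus, lemma))
```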

A One-Shot Learning Framework for Assessment of Fibrillar Collagen from Second Harmonic Generation Images of an Infarcted Myocardium

Title A One-Shot Learning Framework for Assessment of Fibrillar Collagen from Second Harmonic Generation Images of an Infarcted Myocardium
Authors Qun Liu, Supratik Mukhopadhyay, Maria Ximena Bastidas Rodriguez, Xing Fu, Sushant Sahu, David Burk, Manas Gartia
Abstract Myocardial infarction (MI) is the medical term for a heart attack. In this study, we infer highly relevant second harmonic generation (SHG) cues from collagen fibers exhibiting highly non-centrosymmetric assembly, together with two-photon excited cellular autofluorescence, in infarcted mouse hearts to quantitatively probe fibrosis, especially at an early stage after MI. We present a robust one-shot machine learning algorithm that enables determination of the 2D assembly of collagen with high spatial resolution, along with its structural arrangement in heart tissue post-MI, with spectral specificity and sensitivity. Detection, evaluation, and precise quantification of the extent of fibrosis at an early stage would guide the development of treatment therapies that may prevent further progression and determine heart transplant needs for patient survival.
Tasks One-Shot Learning
Published 2020-01-23
URL https://arxiv.org/abs/2001.08395v2
PDF https://arxiv.org/pdf/2001.08395v2.pdf
PWC https://paperswithcode.com/paper/a-one-shot-learning-framework-for-assessment
Repo
Framework
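
The abstract does not detail the one-shot architecture, so here is a generic siamese-style sketch under assumed inputs: a small CNN embeds image patches, and a query patch is assigned the label of the most similar single support example per class. The network, patch size, and class count are placeholders, not the authors' model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Small CNN mapping an image patch to an L2-normalized embedding."""
    def __init__(self, in_ch=1, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        return F.normalize(self.fc(self.features(x).flatten(1)), dim=1)

def one_shot_classify(net, support, support_labels, query):
    """Label queries by cosine similarity to one labeled example per class."""
    with torch.no_grad():
        prototypes = net(support)                 # (n_classes, dim), one shot per class
        sims = net(query) @ prototypes.t()        # cosine similarity of normalized embeddings
        return support_labels[sims.argmax(dim=1)]

# Toy usage with random tensors standing in for SHG image patches.
net = EmbeddingNet()
support, labels = torch.randn(3, 1, 64, 64), torch.tensor([0, 1, 2])
print(one_shot_classify(net, support, labels, torch.randn(5, 1, 64, 64)))
```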

Plant Stem Segmentation Using Fast Ground Truth Generation

Title Plant Stem Segmentation Using Fast Ground Truth Generation
Authors Changye Yang, Sriram Baireddy, Yuhao Chen, Enyu Cai, Denise Caldwell, Valérian Méline, Anjali S. Iyer-Pascuzzi, Edward J. Delp
Abstract Accurately phenotyping plant wilting is important for understanding responses to environmental stress. Analysis of the shape of plants can potentially be used to accurately quantify the degree of wilting. Plant shape analysis can be enhanced by locating the stem, which serves as a consistent reference point during wilting. In this paper, we show that deep learning methods can accurately segment tomato plant stems. We also propose a control-point-based ground truth method that drastically reduces the resources needed to create a training dataset for a deep learning approach. Experimental results show the viability of both our proposed ground truth approach and deep learning based stem segmentation.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.08854v1
PDF https://arxiv.org/pdf/2001.08854v1.pdf
PWC https://paperswithcode.com/paper/plant-stem-segmentation-using-fast-ground
Repo
Framework
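
One plausible reading of a control-point-based ground truth is that an annotator clicks a handful of points along the stem and a mask is rasterized from them automatically. The sketch below does exactly that with OpenCV; the stem width, point count, and surrounding workflow are assumptions, not the paper's exact annotation tool.

```python
import numpy as np
import cv2

def mask_from_control_points(points, image_shape, stem_width=9):
    """Rasterize an approximate stem mask from a few annotated control points.

    points: list of (x, y) clicks along the stem, top to bottom.
    stem_width: assumed stem thickness in pixels (hypothetical default).
    """
    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    pts = np.array(points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(mask, [pts], isClosed=False, color=255, thickness=stem_width)
    return mask

# Example: a 480x640 image annotated with four clicks roughly along the stem.
mask = mask_from_control_points([(320, 60), (318, 180), (324, 300), (330, 430)], (480, 640))
print(mask.shape, int((mask > 0).sum()), "stem pixels")
```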

Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

Title Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems
Authors Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
Abstract We study the problem of adaptive control in partially observable linear dynamical systems. We propose a novel algorithm, the adaptive control online learning algorithm (AdaptOn), which efficiently explores the environment, estimates the system dynamics episodically, and exploits these estimates to design effective controllers that minimize the cumulative costs. Through interaction with the environment, AdaptOn deploys online convex optimization to optimize the controller while simultaneously learning the system dynamics to improve the accuracy of controller updates. We show that when the cost functions are strongly convex, after $T$ time steps of agent-environment interaction, AdaptOn achieves a regret upper bound of $\text{polylog}\left(T\right)$. To the best of our knowledge, AdaptOn is the first algorithm to achieve $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems, a class which includes linear quadratic Gaussian (LQG) control.
Tasks
Published 2020-03-25
URL https://arxiv.org/abs/2003.11227v1
PDF https://arxiv.org/pdf/2003.11227v1.pdf
PWC https://paperswithcode.com/paper/logarithmic-regret-bound-in-partially
Repo
Framework
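
AdaptOn itself combines online convex optimization with episodic system estimation; the toy loop below only illustrates that general estimate-then-control pattern on a scalar partially observable system: fit an ARX model by least squares at doubling epochs, then apply a certainty-equivalence feedback gain. This is a didactic sketch, not the paper's algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.9, 0.5, 1.0            # true (unknown) scalar system
x, T = 0.0, 2000
ys, us = [], []
a_hat, b_hat = 0.0, 1.0            # crude initial model estimates

for t in range(T):
    y_prev = ys[-1] if ys else 0.0
    # Certainty-equivalence feedback plus a little exploration noise
    # (the noise also keeps the regression below well posed).
    u = -(a_hat / b_hat) * y_prev + 0.05 * rng.standard_normal()
    x = a * x + b * u + 0.1 * rng.standard_normal()
    y = c * x + 0.1 * rng.standard_normal()
    ys.append(y); us.append(u)

    # Episodic re-estimation: fit y_{t+1} ~ a*y_t + b*u_t at t = 15, 31, 63, ...
    if t > 10 and (t & (t + 1)) == 0:
        Phi = np.column_stack([ys[:-1], us[:-1]])
        a_hat, b_hat = np.linalg.lstsq(Phi, np.array(ys[1:]), rcond=None)[0]

print("estimated (a, b):", round(a_hat, 3), round(b_hat, 3))
```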

Improved inter-scanner MS lesion segmentation by adversarial training on longitudinal data

Title Improved inter-scanner MS lesion segmentation by adversarial training on longitudinal data
Authors Mattias Billast, Maria Ines Meyer, Diana M. Sima, David Robben
Abstract The evaluation of white matter lesion progression is an important biomarker in the follow-up of MS patients and plays a crucial role when deciding the course of treatment. Current automated lesion segmentation algorithms are susceptible to variability in image characteristics related to MRI scanner or protocol differences. We propose a model that improves the consistency of MS lesion segmentations in inter-scanner studies. First, we train a CNN base model to approximate the performance of icobrain, an FDA-approved clinically available lesion segmentation software. A discriminator model is then trained to predict if two lesion segmentations are based on scans acquired using the same scanner type or not, achieving a 78% accuracy in this task. Finally, the base model and the discriminator are trained adversarially on multi-scanner longitudinal data to improve the inter-scanner consistency of the base model. The performance of the models is evaluated on an unseen dataset containing manual delineations. The inter-scanner variability is evaluated on test-retest data, where the adversarial network produces improved results over the base model and the FDA-approved solution.
Tasks Lesion Segmentation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00952v1
PDF https://arxiv.org/pdf/2002.00952v1.pdf
PWC https://paperswithcode.com/paper/improved-inter-scanner-ms-lesion-segmentation
Repo
Framework
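
A hedged sketch of the adversarial setup as described: a segmentation network produces lesion maps for a longitudinal pair of scans, a discriminator tries to tell whether the pair came from the same scanner type, and the segmenter is updated to fool it. The tiny 3D networks, losses, and tensor shapes below are placeholders, not icobrain or the paper's models.

```python
import torch
import torch.nn as nn

# Minimal stand-ins: a full 3D segmentation CNN would be used in practice.
seg_model = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv3d(8, 1, 3, padding=1), nn.Sigmoid())

# Discriminator: takes a pair of lesion segmentations (2 channels) and predicts
# whether they were produced from scans of the same scanner type.
disc = nn.Sequential(nn.Conv3d(2, 8, 3, stride=2, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 1))

bce = nn.BCEWithLogitsLoss()
opt_seg = torch.optim.Adam(seg_model.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

def adversarial_step(scan_a, scan_b, same_scanner):
    """One adversarial update on a longitudinal pair from (possibly) different scanners."""
    pair = torch.cat([seg_model(scan_a), seg_model(scan_b)], dim=1)

    # 1) Discriminator learns to spot scanner-induced differences in the segmentations.
    d_loss = bce(disc(pair.detach()), same_scanner)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Segmenter is pushed to make the pair look "same scanner" regardless of origin.
    g_loss = bce(disc(pair), torch.ones_like(same_scanner))
    opt_seg.zero_grad(); g_loss.backward(); opt_seg.step()
    return d_loss.item(), g_loss.item()

# Toy usage with random volumes standing in for co-registered follow-up scans.
a, b = torch.randn(1, 1, 16, 32, 32), torch.randn(1, 1, 16, 32, 32)
print(adversarial_step(a, b, same_scanner=torch.zeros(1, 1)))
```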

STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition

Title STH: Spatio-Temporal Hybrid Convolution for Efficient Action Recognition
Authors Xu Li, Jingwen Wang, Lin Ma, Kaihao Zhang, Fengzong Lian, Zhanhui Kang, Jinjun Wang
Abstract Effective and efficient spatio-temporal modeling is essential for action recognition. Existing methods suffer from the trade-off between model performance and model complexity. In this paper, we present a novel Spatio-Temporal Hybrid Convolution Network (denoted as “STH”) which simultaneously encodes spatial and temporal video information with a small parameter cost. Different from existing works that extract spatial and temporal information sequentially or in parallel with different convolutional layers, we divide the input channels into multiple groups and interleave the spatial and temporal operations in one convolutional layer, which deeply incorporates spatial and temporal clues. Such a design enables efficient spatio-temporal modeling and maintains a small model scale. STH-Conv is a general building block which can be plugged into existing 2D CNN architectures such as ResNet and MobileNet by replacing the conventional 2D-Conv blocks (2D convolutions). The STH network achieves competitive or even better performance than its competitors on benchmark datasets such as Something-Something (V1 & V2), Jester, and HMDB-51. Moreover, STH enjoys performance superiority over 3D CNNs while maintaining an even smaller parameter cost than 2D CNNs.
Tasks
Published 2020-03-18
URL https://arxiv.org/abs/2003.08042v1
PDF https://arxiv.org/pdf/2003.08042v1.pdf
PWC https://paperswithcode.com/paper/sth-spatio-temporal-hybrid-convolution-for
Repo
Framework
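
The channel-splitting idea can be sketched directly: divide the input channels of one layer into a spatial group (1x3x3 convolution) and a temporal group (3x1x1 convolution) and concatenate the results. The group ratio and exact interleaving used in the paper may differ; this is an illustration of the mechanism only.

```python
import torch
import torch.nn as nn

class SpatioTemporalHybridConv(nn.Module):
    """Channel-split hybrid convolution: part of the channels get a spatial 1x3x3 conv,
    the rest get a temporal 3x1x1 conv, within a single layer (group ratio is an assumption)."""
    def __init__(self, channels, spatial_ratio=0.75):
        super().__init__()
        self.c_s = int(channels * spatial_ratio)
        self.c_t = channels - self.c_s
        self.spatial = nn.Conv3d(self.c_s, self.c_s, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(self.c_t, self.c_t, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):                          # x: (batch, channels, time, height, width)
        xs, xt = torch.split(x, [self.c_s, self.c_t], dim=1)
        return torch.cat([self.spatial(xs), self.temporal(xt)], dim=1)

x = torch.randn(2, 64, 8, 56, 56)                  # a clip of 8 frames with 64 channels
print(SpatioTemporalHybridConv(64)(x).shape)       # torch.Size([2, 64, 8, 56, 56])
```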

MFFW: A new dataset for multi-focus image fusion

Title MFFW: A new dataset for multi-focus image fusion
Authors Shuang Xu, Xiaoli Wei, Chunxia Zhang, Junmin Liu, Jiangshe Zhang
Abstract Multi-focus image fusion (MFF) is a fundamental task in the field of computational photography. Current methods have achieved significant performance improvements, but they are typically evaluated on simulated image sets or the Lytro dataset. Recently, a growing number of researchers have paid attention to the defocus spread effect, a phenomenon of real-world multi-focus images. Nonetheless, the defocus spread effect is not obvious in simulated or Lytro datasets, where popular methods perform very similarly. To compare their performance on images with the defocus spread effect, this paper constructs a new dataset called MFF in the wild (MFFW). It contains 19 pairs of multi-focus images collected from the Internet. We register all pairs of source images, and provide focus maps and reference images for some of the pairs. Compared with the Lytro dataset, images in MFFW significantly suffer from the defocus spread effect. In addition, the scenes of MFFW are more complex. The experiments demonstrate that most state-of-the-art methods cannot robustly generate satisfactory fusion images on the MFFW dataset. MFFW can serve as a new baseline dataset to test whether an MFF algorithm is able to deal with the defocus spread effect.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.04780v1
PDF https://arxiv.org/pdf/2002.04780v1.pdf
PWC https://paperswithcode.com/paper/mffw-a-new-dataset-for-multi-focus-image
Repo
Framework
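
For context, a classical focus-measure baseline of the kind such a dataset is meant to stress-test: pick, per pixel, the source image with the larger local Laplacian energy. The defocus spread effect is exactly where this sort of rule breaks down near focus boundaries. File names below are hypothetical.

```python
import cv2
import numpy as np

def laplacian_focus_fusion(img_a, img_b, win=15):
    """Naive multi-focus fusion baseline: pick, per pixel, the source image with the
    larger local Laplacian energy (a simple focus measure). Grayscale inputs assumed."""
    def focus_measure(img):
        lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
        return cv2.blur(lap * lap, (win, win))          # local average of squared Laplacian
    mask = focus_measure(img_a) > focus_measure(img_b)  # True where img_a is sharper
    return np.where(mask, img_a, img_b), mask.astype(np.uint8) * 255

# Hypothetical file names for one registered near/far-focus pair.
a = cv2.imread("mffw_pair01_A.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("mffw_pair01_B.png", cv2.IMREAD_GRAYSCALE)
if a is not None and b is not None:
    fused, focus_map = laplacian_focus_fusion(a, b)
    cv2.imwrite("fused.png", fused)
```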

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

Title A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition
Authors Erik McDermott, Hasim Sak, Ehsan Variani
Abstract This article describes a density ratio approach to integrating external Language Models (LMs) into end-to-end models for Automatic Speech Recognition (ASR). Applied to a Recurrent Neural Network Transducer (RNN-T) ASR model trained on a given domain, a matched in-domain RNN-LM, and a target domain RNN-LM, the proposed method uses Bayes’ Rule to define RNN-T posteriors for the target domain, in a manner directly analogous to the classic hybrid model for ASR based on Deep Neural Networks (DNNs) or LSTMs in the Hidden Markov Model (HMM) framework (Bourlard & Morgan, 1994). The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T. Specifically, an RNN-T model trained on paired audio & transcript data from YouTube is evaluated for its ability to generalize to Voice Search data. The Density Ratio method was found to consistently outperform the dominant approach to LM and end-to-end ASR integration, Shallow Fusion.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2020-02-26
URL https://arxiv.org/abs/2002.11268v3
PDF https://arxiv.org/pdf/2002.11268v3.pdf
PWC https://paperswithcode.com/paper/a-density-ratio-approach-to-language-model
Repo
Framework
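
The density-ratio scoring rule itself is compact: take the end-to-end model's score, subtract a scaled source-domain LM score, and add a scaled target-domain LM score. The sketch below applies it to rescore hypotheses; the weights and log-probabilities are illustrative numbers, not values from the paper.

```python
def density_ratio_score(rnnt_logp, src_lm_logp, tgt_lm_logp,
                        lambda_src=0.5, lambda_tgt=0.5):
    """Density-ratio LM fusion score for one hypothesis.

    rnnt_logp:   log p(y | x) from the RNN-T trained on the source domain
    src_lm_logp: log p(y) under the source-domain (training) LM
    tgt_lm_logp: log p(y) under the target-domain LM
    The weights are tuning parameters (values here are illustrative).
    """
    return rnnt_logp - lambda_src * src_lm_logp + lambda_tgt * tgt_lm_logp

# Rescoring two toy hypotheses with illustrative log-probabilities.
hyps = {"play some music": (-12.0, -9.0, -14.0),
        "play sum music":  (-12.5, -13.0, -20.0)}
best = max(hyps, key=lambda h: density_ratio_score(*hyps[h]))
print("selected hypothesis:", best)
```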

Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data

Title Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data
Authors Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan
Abstract This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as robust principal component analysis (robust PCA), finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-a-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the $\ell_{\infty}$ loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.
Tasks
Published 2020-01-15
URL https://arxiv.org/abs/2001.05484v1
PDF https://arxiv.org/pdf/2001.05484v1.pdf
PWC https://paperswithcode.com/paper/bridging-convex-and-nonconvex-optimization-in
Repo
Framework
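
A minimal numerical sketch of a robust-PCA-with-missing-data program of the kind analyzed: nuclear-norm plus l1 regularization with a data-fidelity term on observed entries, solved here by a simple alternating proximal gradient loop. The formulation, step size, and regularization weights are generic choices, not the paper's exact program or guarantees.

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svt(x, tau):
    """Singular value thresholding = prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    return U @ np.diag(soft(s, tau)) @ Vt

def robust_pca(M, mask, lam_nuc=1.0, lam_l1=0.1, step=0.5, iters=200):
    """Alternating proximal gradient for
       min_{L,S} 0.5*||mask*(L + S - M)||_F^2 + lam_nuc*||L||_* + lam_l1*||S||_1."""
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(iters):
        L = svt(L - step * (mask * (L + S - M)), step * lam_nuc)
        S = soft(S - step * (mask * (L + S - M)), step * lam_l1)
    return L, S

# Toy data: rank-2 matrix + sparse outliers + noise, with 80% of entries observed.
rng = np.random.default_rng(0)
M0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
S0 = 5.0 * (rng.random((50, 50)) < 0.05) * rng.standard_normal((50, 50))
mask = (rng.random((50, 50)) < 0.8).astype(float)
M = M0 + S0 + 0.01 * rng.standard_normal((50, 50))
L, S = robust_pca(M, mask)
print("relative error on low-rank part:", np.linalg.norm(L - M0) / np.linalg.norm(M0))
```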

Learned Threshold Pruning

Title Learned Threshold Pruning
Authors Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort
Abstract This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method enjoys a number of important advantages. First, it learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes less than $30$ epochs for LTP to prune most networks on ImageNet. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable $L_0$ regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since $L_1$ and $L_2$ penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve state-of-the-art compression rates on ImageNet networks such as AlexNet ($26.4\times$ compression with $79.1\%$ Top-5 accuracy) and ResNet50 ($9.1\times$ compression with $92.0\%$ Top-5 accuracy). We also show that LTP effectively prunes newer architectures, such as EfficientNet, MobileNetV2 and MixNet.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2003.00075v1
PDF https://arxiv.org/pdf/2003.00075v1.pdf
PWC https://paperswithcode.com/paper/learned-threshold-pruning
Repo
Framework
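
The core mechanism, per-layer thresholds learned by gradient descent, can be sketched with a soft (sigmoid) keep-mask so the threshold is differentiable, plus a soft L0-style sparsity penalty. The exact soft-pruning function and regularizer in the paper may differ; this is an illustration of the idea only.

```python
import torch
import torch.nn as nn

class SoftThresholdPruning(nn.Module):
    """Wrap a linear layer with a learnable pruning threshold.

    A sigmoid of (w^2 - threshold) acts as a soft keep-mask, so the threshold
    receives gradients (a sketch of the idea, not the paper's exact formulation)."""
    def __init__(self, layer, init_threshold=1e-3, temperature=1e-4):
        super().__init__()
        self.layer = layer
        self.threshold = nn.Parameter(torch.tensor(init_threshold))
        self.temperature = temperature

    def keep_mask(self):
        return torch.sigmoid((self.layer.weight ** 2 - self.threshold) / self.temperature)

    def forward(self, x):
        return nn.functional.linear(x, self.layer.weight * self.keep_mask(), self.layer.bias)

    def soft_l0(self):
        """Differentiable proxy for the number of kept weights (used as a sparsity penalty)."""
        return self.keep_mask().sum()

layer = SoftThresholdPruning(nn.Linear(256, 128))
out = layer(torch.randn(4, 256))
loss = out.pow(2).mean() + 1e-5 * layer.soft_l0()   # toy task loss + sparsity penalty
loss.backward()
print("threshold gradient:", layer.threshold.grad.item())
```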

Sampling and Update Frequencies in Proximal Variance Reduced Stochastic Gradient Methods

Title Sampling and Update Frequencies in Proximal Variance Reduced Stochastic Gradient Methods
Authors Martin Morin, Pontus Giselsson
Abstract Variance reduced stochastic gradient methods have gained popularity in recent times. Several variants exist with different strategies for the storing and sampling of gradients. In this work we focus on the analysis of the interaction of these two aspects. We present and analyze a general proximal variance reduced gradient method under strong convexity assumptions. Special cases of the algorithm include SAGA, L-SVRG, and their proximal variants. Our analysis sheds light on epoch-length selection and the need to balance the convergence of the iterates with how often gradients are stored. The analysis improves on other convergence rates found in the literature and produces a new and faster converging sampling strategy for SAGA. Problem instances for which the predicted rates match the practical rates are presented, together with problems based on real-world data.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05545v2
PDF https://arxiv.org/pdf/2002.05545v2.pdf
PWC https://paperswithcode.com/paper/sampling-and-update-frequencies-in-proximal
Repo
Framework
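
A minimal proximal SAGA sketch for an l1-regularized least-squares problem, showing the two ingredients the paper studies, gradient storage and sampling: a per-sample gradient table, a uniform sampling rule, and a prox step. The step size and problem data are illustrative.

```python
import numpy as np

def prox_l1(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_saga(A, b, lam=0.1, step=None, iters=5000, seed=0):
    """Proximal SAGA for min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    step = step or 1.0 / (3 * np.max(np.sum(A * A, axis=1)))   # conservative step size
    x = np.zeros(d)
    table = np.zeros((n, d))                 # stored per-sample gradients
    avg = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)                  # uniform sampling (other strategies exist)
        g_new = (A[i] @ x - b[i]) * A[i]
        x = prox_l1(x - step * (g_new - table[i] + avg), step * lam)
        avg += (g_new - table[i]) / n        # keep the running average consistent
        table[i] = g_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = prox_saga(A, b)
print("coefficients above 1e-3 in magnitude:", int((np.abs(x_hat) > 1e-3).sum()))
```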

Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack

Title Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack
Authors Jiefu Zhang, Leonardo Zepeda-Núñez, Yuan Yao, Lin Lin
Abstract The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10561v1
PDF https://arxiv.org/pdf/2002.10561v1.pdf
PWC https://paperswithcode.com/paper/learning-the-mapping-mathbfxmapsto-sum_i1d
Repo
Framework
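
A small-scale, hedged reproduction of the setup: train a dense MLP on samples of $f(\mathbf{x})=\sum_i x_i^2$ on $[-1,1]^d$, with and without explicit weight decay, and compare train and test error. The network size, sample count, and optimizer are placeholders; the paper's scaling study in $d$ is far more extensive.

```python
import torch
import torch.nn as nn

def run(d=10, n_train=2000, weight_decay=0.0, epochs=2000, seed=0):
    torch.manual_seed(seed)
    f = lambda x: (x ** 2).sum(dim=1, keepdim=True)       # target: sum_i x_i^2
    x_tr = 2 * torch.rand(n_train, d) - 1                  # uniform on [-1, 1]^d
    x_te = 2 * torch.rand(5000, d) - 1
    net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(epochs):                                # full-batch training (toy scale)
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x_tr), f(x_tr))
        loss.backward(); opt.step()
    with torch.no_grad():
        test = nn.functional.mse_loss(net(x_te), f(x_te)).item()
    return loss.item(), test

for wd in (0.0, 1e-4):   # implicit regularization only vs. explicit weight decay
    tr, te = run(weight_decay=wd)
    print(f"weight_decay={wd}: train MSE={tr:.4f}, test MSE={te:.4f}, gap={te - tr:.4f}")
```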

Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions

Title Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions
Authors Guanya Shi, Wolfgang Hönig, Yisong Yue, Soon-Jo Chung
Abstract In this paper, we present Neural-Swarm, a nonlinear decentralized stable controller for close-proximity flight of multirotor swarms. Close-proximity control is challenging due to the complex aerodynamic interaction effects between multirotors, such as downwash from higher vehicles to lower ones. Conventional methods often fail to properly capture these interaction effects, resulting in controllers that must maintain large safety distances between vehicles, and thus are not capable of close-proximity flight. Our approach combines a nominal dynamics model with a regularized permutation-invariant Deep Neural Network (DNN) that accurately learns the high-order multi-vehicle interactions. We design a stable nonlinear tracking controller using the learned model. Experimental results demonstrate that the proposed controller significantly outperforms a baseline nonlinear tracking controller with up to four times smaller worst-case height tracking errors. We also empirically demonstrate the ability of our learned model to generalize to larger swarm sizes.
Tasks
Published 2020-03-06
URL https://arxiv.org/abs/2003.02992v1
PDF https://arxiv.org/pdf/2003.02992v1.pdf
PWC https://paperswithcode.com/paper/neural-swarm-decentralized-close-proximity
Repo
Framework
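
The learned interaction term can be sketched as a Deep-Sets-style permutation-invariant network over neighbors' relative states, added as a residual to the nominal dynamics. The dimensions and nominal model below are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

class InteractionNet(nn.Module):
    """Permutation-invariant model of aerodynamic interaction forces (Deep-Sets style):
    encode each neighbor's relative state, sum the encodings, then decode to a residual."""
    def __init__(self, rel_dim=6, hidden=32, out_dim=3):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(rel_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, rel_states):             # rel_states: (num_neighbors, rel_dim)
        return self.rho(self.phi(rel_states).sum(dim=0))

def predicted_accel(nominal_accel, interaction_net, rel_states):
    """Nominal multirotor dynamics plus the learned interaction residual (sketch)."""
    return nominal_accel + interaction_net(rel_states)

net = InteractionNet()
neighbors = torch.randn(3, 6)                  # relative position + velocity of 3 neighbors
print(predicted_accel(torch.zeros(3), net, neighbors))
```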

Universal Phone Recognition with a Multilingual Allophone System

Title Universal Phone Recognition with a Multilingual Allophone System
Authors Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W Black, Florian Metze
Abstract Multilingual models can improve language processing, particularly for low-resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This can lead to performance degradation when combining a variety of training languages, as identically annotated phonemes can actually correspond to several different underlying phonetic realizations. In this work, we propose a joint model of both language-independent phone and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute in low-resource conditions. Additionally, because we are explicitly modeling language-independent phones, we can build a (nearly) universal phone recognizer that, when combined with PHOIBLE, a large, manually curated database of phone inventories, can be customized into 2,000 language-dependent recognizers. Experiments on two low-resourced indigenous languages, Inuktitut and Tusom, show that our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
Tasks Speech Recognition
Published 2020-02-26
URL https://arxiv.org/abs/2002.11800v1
PDF https://arxiv.org/pdf/2002.11800v1.pdf
PWC https://paperswithcode.com/paper/universal-phone-recognition-with-a
Repo
Framework
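
The key modeling step, mapping a universal phone distribution to a language's phoneme probabilities through a fixed allophone matrix, can be sketched with a made-up inventory; whether the aggregation over a phoneme's allophones is a max or a sum is an assumption here.

```python
import torch

# Universal phone posteriors from the shared acoustic model (toy, 4 phones).
phones = ["p", "pʰ", "t", "tʰ"]
phone_probs = torch.tensor([0.15, 0.55, 0.10, 0.20])

# Allophone matrix for a hypothetical language where /p/ is realized as [p] or [pʰ]
# and /t/ as [t] or [tʰ]: rows are the language's phonemes, columns are phones.
phonemes = ["/p/", "/t/"]
allophone = torch.tensor([[1., 1., 0., 0.],
                          [0., 0., 1., 1.]])

# Language-dependent phoneme probability = max over that phoneme's allophones.
phoneme_probs = (allophone * phone_probs).max(dim=1).values
print(dict(zip(phonemes, phoneme_probs.tolist())))
```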

Residual Attention Net for Superior Cross-Domain Time Sequence Modeling

Title Residual Attention Net for Superior Cross-Domain Time Sequence Modeling
Authors Seth H. Huang, Xu Lingjie, Jiang Congwei
Abstract We present a novel architecture, the residual attention net (RAN), which merges a sequence architecture, the universal transformer, and a computer vision architecture, the residual net, with a highway architecture for cross-domain sequence modeling. The architecture aims at addressing the long-dependency issue often faced by recurrent-neural-net-based structures. This paper serves as a proof-of-concept for a new architecture, with RAN aiming to provide the model a higher-level understanding of sequence patterns. To the best of our knowledge, we are the first to propose such an architecture. Out of the standard 85 UCR data sets, we have achieved 35 state-of-the-art results, with 10 results matching current state-of-the-art results without further model fine-tuning. The results indicate that such an architecture is promising in complex, long-sequence modeling and may have vast cross-domain applications.
Tasks
Published 2020-01-13
URL https://arxiv.org/abs/2001.04077v1
PDF https://arxiv.org/pdf/2001.04077v1.pdf
PWC https://paperswithcode.com/paper/residual-attention-net-for-superior-cross
Repo
Framework
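
The abstract names the ingredients (universal-transformer-style self-attention, a residual convolutional path, and a highway gate) but not the block design, so the sketch below is one plausible combination for 1D sequences, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    """One plausible block mixing the three named ingredients: self-attention,
    a residual convolution path, and a highway gate (not the paper's exact design)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Sequential(nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU(),
                                  nn.Conv1d(dim, dim, 3, padding=1))
        self.gate = nn.Linear(dim, dim)          # highway gate: blend transform vs. input
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        a, _ = self.attn(x, x, x)
        r = self.conv(x.transpose(1, 2)).transpose(1, 2)   # residual convolution path
        t = torch.sigmoid(self.gate(x))
        return self.norm(t * (a + r) + (1 - t) * x)        # highway mixing

x = torch.randn(8, 128, 64)                      # batch of length-128 series embedded to 64 dims
print(ResidualAttentionBlock()(x).shape)         # torch.Size([8, 128, 64])
```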