Paper Group ANR 85
A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging
Title | A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging |
Authors | Kianoush Falahkheirkhah, Kevin Yeh, Shachi Mittal, Luke Pfister, Rohit Bhargava |
Abstract | Infrared (IR) microscopes measure spectral information that quantifies molecular content to assign the identity of biomedical cells but lack the spatial quality of optical microscopy to appreciate morphologic features. Here, we propose a method to combine the semantic information of cellular identity from IR imaging with the morphologic detail of pathology images in a deep learning-based approach to image super-resolution. Using Generative Adversarial Networks (GANs), we enhance the spatial detail in IR imaging beyond the diffraction limit while retaining its spectral contrast. This technique can be rapidly integrated with modern IR microscopes to provide a framework useful for routine pathology. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.04410v2 |
https://arxiv.org/pdf/1911.04410v2.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-framework-for-morphologic |
Repo | |
Framework | |
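The entry above gives no implementation detail, so the following is only a minimal, generic sketch of the adversarial training step behind GAN-based super-resolution: a toy generator upsamples a low-resolution single-band image and a toy discriminator judges realism. The network definitions, loss weights, and random tensors are placeholders for illustration, not the authors' architecture or data.

```python
# Minimal sketch of a GAN-based super-resolution training step (assumed setup,
# not the paper's architecture): a toy generator upsamples 2x and a toy
# discriminator scores realism; the losses combine an L1 content term with an
# adversarial term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch * 4, 3, padding=1),  # 4x channels for 2x PixelShuffle
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

class ToyDiscriminator(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.body(x)

G, D = ToyGenerator(), ToyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

lr_img = torch.rand(4, 1, 32, 32)   # stand-in for a low-resolution IR band
hr_img = torch.rand(4, 1, 64, 64)   # stand-in for the high-resolution target

# Discriminator step: real vs. generated images.
fake = G(lr_img).detach()
loss_d = bce(D(hr_img), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: content (L1) term plus adversarial term.
sr = G(lr_img)
loss_g = F.l1_loss(sr, hr_img) + 1e-3 * bce(D(sr), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```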
Wavelet Domain Style Transfer for an Effective Perception-distortion Tradeoff in Single Image Super-Resolution
Title | Wavelet Domain Style Transfer for an Effective Perception-distortion Tradeoff in Single Image Super-Resolution |
Authors | Xin Deng, Ren Yang, Mai Xu, Pier Luigi Dragotti |
Abstract | In single image super-resolution (SISR), given a low-resolution (LR) image, one wishes to find a high-resolution (HR) version of it which is both accurate and photo-realistic. Recently, it has been shown that there exists a fundamental tradeoff between low distortion and high perceptual quality, and the generative adversarial network (GAN) is demonstrated to approach the perception-distortion (PD) bound effectively. In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN-based methods. Specifically, we propose to use the 2D stationary wavelet transform (SWT) to decompose one image into low-frequency and high-frequency sub-bands. For the low-frequency sub-band, we improve its objective quality through an enhancement network. For the high-frequency sub-band, we propose to use WDST to effectively improve its perceptual quality. By virtue of the perfect reconstruction property of wavelets, these sub-bands can be re-combined to obtain an image which has simultaneously high objective and perceptual quality. The numerical results on various datasets show that our method achieves the best trade-off between distortion and perceptual quality among the existing state-of-the-art SISR methods. |
Tasks | Image Super-Resolution, Style Transfer, Super-Resolution |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04074v1 |
https://arxiv.org/pdf/1910.04074v1.pdf | |
PWC | https://paperswithcode.com/paper/wavelet-domain-style-transfer-for-an |
Repo | |
Framework | |
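The decompose/recombine scaffolding described in the abstract above can be sketched with PyWavelets: a 2D stationary wavelet transform splits an image into low- and high-frequency sub-bands, each sub-band is processed separately, and the perfect-reconstruction inverse recombines them. In the sketch below the enhancement network and the wavelet-domain style transfer are replaced by identity placeholders, so only the plumbing is shown.

```python
# SWT decompose / process / recombine sketch. The "enhance" and "style
# transfer" steps are placeholders, not the paper's networks.
import numpy as np
import pywt

def enhance_low_freq(cA):
    return cA  # placeholder for the paper's enhancement network

def style_transfer_high_freq(band):
    return band  # placeholder for wavelet-domain style transfer (WDST)

img = np.random.rand(256, 256)  # stand-in for a super-resolved image

# One-level 2D SWT returns [(cA, (cH, cV, cD))] for level=1.
(cA, (cH, cV, cD)), = pywt.swt2(img, wavelet="haar", level=1)

cA = enhance_low_freq(cA)                                   # objective quality
cH, cV, cD = map(style_transfer_high_freq, (cH, cV, cD))    # perceptual quality

recombined = pywt.iswt2([(cA, (cH, cV, cD))], wavelet="haar")
print(np.allclose(recombined, img))  # True with identity placeholders
```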
Two-Pass End-to-End Speech Recognition
Title | Two-Pass End-to-End Speech Recognition |
Authors | Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu |
Abstract | The requirements for many applications of state-of-the-art speech recognition systems include not only low word error rate (WER) but also low latency. Specifically, for many use-cases, the system must be able to decode utterances in a streaming fashion and faster than real-time. Recently, a streaming recurrent neural network transducer (RNN-T) end-to-end (E2E) model has been shown to be a good candidate for on-device speech recognition, with improved WER and latency metrics compared to conventional on-device models [1]. However, this model still lags behind a large state-of-the-art conventional model in quality [2]. On the other hand, a non-streaming E2E Listen, Attend and Spell (LAS) model has shown comparable quality to large conventional models [3]. This work aims to bring the quality of an E2E streaming model closer to that of a conventional system by incorporating a LAS network as a second-pass component, while still abiding by latency constraints. Our proposed two-pass model achieves a 17%-22% relative reduction in WER compared to RNN-T alone and increases latency by a small fraction over RNN-T. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.10992v1 |
https://arxiv.org/pdf/1908.10992v1.pdf | |
PWC | https://paperswithcode.com/paper/two-pass-end-to-end-speech-recognition |
Repo | |
Framework | |
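At the hypothesis level, the two-pass idea can be illustrated as second-pass rescoring of a first-pass n-best list. The sketch below is an assumed, simplified interface: `Hypothesis`, `second_pass_score`, and the log-linear weight are illustrative stubs, not the paper's RNN-T or LAS models.

```python
# Two-pass rescoring sketch: a streaming first pass produces an n-best list
# with scores, a second-pass (LAS-style) scorer re-scores each hypothesis
# given shared encoder features, and the best combined score wins.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log-probability from the streaming first pass

def second_pass_score(hyp: Hypothesis, encoder_features) -> float:
    # Stub: a real system would run an attention decoder over the shared
    # encoder features and sum token log-probabilities for hyp.text.
    return -0.1 * len(hyp.text)

def rescore(nbest, encoder_features, weight=0.5):
    # Log-linear combination of first- and second-pass scores.
    def combined(h):
        return (1 - weight) * h.first_pass_score + weight * second_pass_score(h, encoder_features)
    return max(nbest, key=combined)

nbest = [
    Hypothesis("the cat sat", first_pass_score=-3.2),
    Hypothesis("the cat set", first_pass_score=-3.0),
]
print(rescore(nbest, encoder_features=None).text)
```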
Overlearning Reveals Sensitive Attributes
Title | Overlearning Reveals Sensitive Attributes |
Authors | Congzheng Song, Vitaly Shmatikov |
Abstract | “Overlearning” means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races (even races that are not represented in the training data) and identities. We demonstrate overlearning in several vision and NLP models and analyze its harmful consequences. First, inference-time representations of an overlearned model reveal sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, an overlearned model can be “re-purposed” for a different, privacy-violating task even in the absence of the original training data. We show that overlearning is intrinsic for some tasks and cannot be prevented by censoring unwanted attributes. Finally, we investigate where, when, and why overlearning happens during model training. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11742v3 |
https://arxiv.org/pdf/1905.11742v3.pdf | |
PWC | https://paperswithcode.com/paper/overlearning-reveals-sensitive-attributes |
Repo | |
Framework | |
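The kind of attribute-inference probe the abstract alludes to can be sketched with a small classifier fit on a model's intermediate representations to predict an attribute that was never a training label. The synthetic "representations" below are a placeholder, not the paper's experimental protocol.

```python
# Attribute-inference probe sketch: given intermediate representations from a
# model trained on some main task, fit a simple classifier to predict a
# different, sensitive attribute. Synthetic data stands in for real
# representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 64
sensitive = rng.integers(0, 2, size=n)            # attribute never used as a label
# Pretend the representations leak the attribute along one random direction.
reps = rng.normal(size=(n, d)) + 0.8 * sensitive[:, None] * rng.normal(size=(1, d))

X_tr, X_te, y_tr, y_te = train_test_split(reps, sensitive, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy on the sensitive attribute:", probe.score(X_te, y_te))
```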
Cross-Attention End-to-End ASR for Two-Party Conversations
Title | Cross-Attention End-to-End ASR for Two-Party Conversations |
Authors | Suyoun Kim, Siddharth Dalmia, Florian Metze |
Abstract | We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information. Unlike conventional speech recognition models, our model exploits two speakers’ history of conversational-context information that spans across multiple turns within an end-to-end framework. Specifically, we propose a speaker-specific cross-attention mechanism that can look at the output of the other speaker as well as that of the current speaker to better recognize long conversations. We evaluate the models on the Switchboard conversational speech corpus and show that our model outperforms standard end-to-end speech recognition models. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10726v1 |
https://arxiv.org/pdf/1907.10726v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-attention-end-to-end-asr-for-two-party |
Repo | |
Framework | |
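The cross-attention computation itself can be sketched in its simplest form: the current speaker's decoder state queries the other speaker's history of context vectors, and the attended summary is fused back into the decoder. The single-head, unmasked attention and the projection sizes below are generic assumptions, not necessarily the paper's exact parameterization.

```python
# Speaker-specific cross-attention sketch (single head, no masking).
import torch
import torch.nn.functional as F

d_model = 256
query_proj = torch.nn.Linear(d_model, d_model)
key_proj = torch.nn.Linear(d_model, d_model)
value_proj = torch.nn.Linear(d_model, d_model)

current_state = torch.rand(1, d_model)   # decoder state of the current speaker
other_history = torch.rand(5, d_model)   # context vectors from the other speaker's turns

q = query_proj(current_state)                               # (1, d)
k, v = key_proj(other_history), value_proj(other_history)   # (5, d)

attn = F.softmax(q @ k.t() / d_model ** 0.5, dim=-1)        # (1, 5)
cross_context = attn @ v                                    # (1, d)

fused = torch.cat([current_state, cross_context], dim=-1)   # fed onward in the decoder
print(fused.shape)  # torch.Size([1, 512])
```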
Thirty Years of Machine Learning: The Road to Pareto-Optimal Next-Generation Wireless Networks
Title | Thirty Years of Machine Learning: The Road to Pareto-Optimal Next-Generation Wireless Networks |
Authors | Jingjing Wang, Chunxiao Jiang, Haijun Zhang, Yong Ren, Kwang-Cheng Chen, Lajos Hanzo |
Abstract | Next-generation wireless networks (NGWN) have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning algorithms have achieved great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of machine learning by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning, respectively. Furthermore, we investigate their employment in the compelling applications of NGWNs, including heterogeneous networks (HetNets), cognitive radios (CR), the Internet of things (IoT), machine-to-machine (M2M) networks, and so on. This article aims to assist readers in clarifying the motivation and methodology of the various machine learning algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks. |
Tasks | Decision Making |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1902.01946v1 |
http://arxiv.org/pdf/1902.01946v1.pdf | |
PWC | https://paperswithcode.com/paper/thirty-years-of-machine-learningthe-road-to |
Repo | |
Framework | |
Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation
Title | Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation |
Authors | Mengke Lian, Fabrizio Carpi, Christian Häger, Henry D. Pfister |
Abstract | We consider the weighted belief-propagation (WBP) decoder recently proposed by Nachmani et al. where different weights are introduced for each Tanner graph edge and optimized using machine learning techniques. Our focus is on simple-scaling models that use the same weights across certain edges to reduce the storage and computational burden. The main contribution is to show that simple scaling with few parameters often achieves the same gain as the full parameterization. Moreover, several training improvements for WBP are proposed. For example, it is shown that minimizing average binary cross-entropy is suboptimal in general in terms of bit error rate (BER) and a new “soft-BER” loss is proposed which can lead to better performance. We also investigate parameter adapter networks (PANs) that learn the relation between the signal-to-noise ratio and the WBP parameters. As an example, for the (32,16) Reed-Muller code with a highly redundant parity-check matrix, training a PAN with soft-BER loss gives near-maximum-likelihood performance assuming simple scaling with only three parameters. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08621v1 |
http://arxiv.org/pdf/1901.08621v1.pdf | |
PWC | https://paperswithcode.com/paper/learned-belief-propagation-decoding-with |
Repo | |
Framework | |
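The "simple scaling" idea replaces the full per-edge parameterization with a small number of shared weights. A closely related, widely used instance is the normalized (scaled) min-sum check-node update, sketched below; this is illustrative only, and neither the paper's trained WBP weights nor its soft-BER loss are reproduced here.

```python
# Message scaling in belief-propagation-style decoding: a single shared scale
# applied to check-node messages (normalized min-sum), instead of one weight
# per Tanner-graph edge.
import numpy as np

def scaled_minsum_check_update(incoming_llrs, scale=0.8):
    """Check-node update with a single shared scaling weight.

    incoming_llrs: LLR messages arriving at one check node from its variable
    nodes; returns the outgoing message toward each of them.
    """
    llrs = np.asarray(incoming_llrs, dtype=float)
    out = np.empty_like(llrs)
    for i in range(len(llrs)):
        others = np.delete(llrs, i)           # exclude the target edge
        sign = np.prod(np.sign(others))
        out[i] = scale * sign * np.min(np.abs(others))
    return out

print(scaled_minsum_check_update([2.0, -1.5, 0.7, 3.1]))
```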
Machine Learning and Visualization in Clinical Decision Support: Current State and Future Directions
Title | Machine Learning and Visualization in Clinical Decision Support: Current State and Future Directions |
Authors | Gal Levy-Fix, Gilad J. Kuperman, Noémie Elhadad |
Abstract | Deep learning, an area of machine learning, is set to revolutionize patient care. But it is not yet part of standard of care, especially when it comes to individual patient care. In fact, it is unclear to what extent data-driven techniques are being used for clinical decision support (CDS). Heretofore, there has not been a review of ways in which research in machine learning and other types of data-driven techniques can contribute effectively to clinical care and the types of support they can bring to clinicians. In this paper, we consider ways in which two data-driven domains - machine learning and data visualizations - can contribute to the next generation of clinical decision support systems. We review the literature regarding the ways heuristic knowledge, machine learning, and visualization are - and can be - applied to three types of CDS. There has been substantial research into the use of predictive modeling for alerts; however, current CDS systems are not utilizing these methods. Approaches that leverage interactive visualizations and machine-learning inferences to organize and review patient data are gaining popularity but are still at the prototype stage and are not yet in use. CDS systems that could benefit from prescriptive machine learning (e.g., treatment recommendations for specific patients) have not yet been developed. We discuss potential reasons for the lack of deployment of data-driven methods in CDS and directions for future research. |
Tasks | Decision Making |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02664v1 |
https://arxiv.org/pdf/1906.02664v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-and-visualization-in |
Repo | |
Framework | |
PreCall: A Visual Interface for Threshold Optimization in ML Model Selection
Title | PreCall: A Visual Interface for Threshold Optimization in ML Model Selection |
Authors | Christoph Kinkeldey, Claudia Müller-Birn, Tom Gülenman, Jesse Josua Benjamin, Aaron Halfaker |
Abstract | Machine learning systems are ubiquitous in various kinds of digital applications and have a huge impact on our everyday life. But a lack of explainability and interpretability of such systems hinders meaningful participation by people, especially by those without a technical background. Interactive visual interfaces (e.g., providing means for manipulating parameters in the user interface) can help tackle this challenge. In this paper we present PreCall, an interactive visual interface for ORES, a machine learning-based web service for Wikimedia projects such as Wikipedia. While ORES can be used for a number of settings, it can be challenging to translate requirements from the application domain into formal parameter sets needed to configure the ORES models. Assisting Wikipedia editors in finding damaging edits, for example, can be realized at various stages of automatization, which might impact the precision of the applied model. Our prototype PreCall attempts to close this translation gap by interactively visualizing the relationship between major model metrics (recall, precision, false positive rate) and a parameter (the threshold between valuable and damaging edits). Furthermore, PreCall visualizes the probable results for the current model configuration to improve the human’s understanding of the relationship between metrics and outcome when using ORES. We describe PreCall’s components and present a use case that highlights the benefits of our approach. Finally, we pose further research questions we would like to discuss during the workshop. |
Tasks | Model Selection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05131v1 |
https://arxiv.org/pdf/1907.05131v1.pdf | |
PWC | https://paperswithcode.com/paper/precall-a-visual-interface-for-threshold |
Repo | |
Framework | |
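The core quantity an interface like this visualizes is how recall, precision, and false positive rate move as a single classification threshold changes. The sketch below is a plain threshold sweep over synthetic scores; it does not touch the ORES service or its models.

```python
# Threshold sweep sketch: for a grid of thresholds over a model's scores,
# compute recall, precision and false positive rate. Synthetic scores stand
# in for real model output.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=500), 0, 1)

for t in (0.3, 0.5, 0.7):
    y_pred = (scores >= t).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fpr = fp / max(np.sum(y_true == 0), 1)
    print(f"threshold={t:.1f} "
          f"recall={recall_score(y_true, y_pred):.2f} "
          f"precision={precision_score(y_true, y_pred, zero_division=0):.2f} "
          f"fpr={fpr:.2f}")
```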
Learning the Sampling Pattern for MRI
Title | Learning the Sampling Pattern for MRI |
Authors | Ferdia Sherry, Martin Benning, Juan Carlos De los Reyes, Martin J. Graves, Georg Maierhofer, Guy Williams, Carola-Bibiane Schönlieb, Matthias J. Ehrhardt |
Abstract | The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are “incomplete”. This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 5 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 32% of k-space, yet results in reconstructions with mean SSIM 0.956 on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08754v1 |
https://arxiv.org/pdf/1906.08754v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-sampling-pattern-for-mri |
Repo | |
Framework | |
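The measurement model the paper optimizes over can be illustrated with a binary k-space mask applied to the 2D Fourier transform of an image, followed by a zero-filled reconstruction. The random variable-density mask below is only a stand-in for a learned sampling pattern.

```python
# k-space undersampling sketch: apply a binary sampling mask in the Fourier
# domain and form a zero-filled reconstruction.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((192, 192))                    # stand-in for a brain image

# Variable-density mask: low frequencies are sampled more densely.
ky, kx = np.meshgrid(np.linspace(-1, 1, 192), np.linspace(-1, 1, 192), indexing="ij")
prob = np.clip(0.9 - 0.7 * np.sqrt(kx**2 + ky**2), 0.05, 1.0)
mask = rng.random((192, 192)) < prob
print("sampling rate:", mask.mean())

kspace = np.fft.fftshift(np.fft.fft2(img))      # centered k-space
zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace * mask)).real
print("reconstruction error:", np.abs(zero_filled - img).mean())
```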
Predicting Landscapes from Environmental Conditions Using Generative Networks
Title | Predicting Landscapes from Environmental Conditions Using Generative Networks |
Authors | Christian Requena-Mesa, Markus Reichstein, Miguel Mahecha, Basil Kraft, Joachim Denzler |
Abstract | Landscapes are meaningful ecological units that strongly depend on the environmental conditions. Such dependencies between landscapes and the environment have been noted since the beginning of Earth sciences and cast into conceptual models describing the interdependencies of climate, geology, vegetation and geomorphology. Here, we ask whether landscapes, as seen from space, can be statistically predicted from pertinent environmental conditions. To this end we adapted a deep learning generative model in order to establish the relationship between the environmental conditions and the view of landscapes from the Sentinel-2 satellite. We trained a conditional generative adversarial network to generate multispectral imagery given a set of climatic, terrain and anthropogenic predictors. The generated landscape imagery shares many characteristics with the real imagery. Results based on landscape patch metrics, indicative of landscape composition and structure, show that the proposed generative model creates landscapes that are more similar to the targets than those produced by the baseline models, while overall reflectance and vegetation cover are also predicted better. We demonstrate that for many purposes the generated landscapes behave like real ones, with immediate application for global change studies. We envision the application of machine learning as a tool to forecast the effects of climate change on the spatial features of landscapes, while we assess its limitations and breaking points. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10296v1 |
https://arxiv.org/pdf/1909.10296v1.pdf | |
PWC | https://paperswithcode.com/paper/190910296 |
Repo | |
Framework | |
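At the shape level, conditioning a generator on environmental predictors looks like the sketch below: a vector of climatic, terrain and anthropogenic variables is concatenated with noise and decoded into a multispectral image tensor. The layer sizes, band count, and predictor dimension are made up for illustration and are not the authors' architecture.

```python
# Conditional generator sketch: environmental predictors + noise -> image.
import torch
import torch.nn as nn

n_predictors, n_noise, n_bands = 12, 32, 10   # illustrative sizes only

class ToyConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(n_predictors + n_noise, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_bands, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, predictors, noise):
        x = self.fc(torch.cat([predictors, noise], dim=1)).view(-1, 128, 8, 8)
        return self.decoder(x)

G = ToyConditionalGenerator()
env = torch.rand(2, n_predictors)     # climate, terrain, anthropogenic predictors
z = torch.randn(2, n_noise)
print(G(env, z).shape)                # torch.Size([2, 10, 64, 64])
```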
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
Title | Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space |
Authors | Zhou Fan, Rui Su, Weinan Zhang, Yong Yu |
Abstract | In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning. |
Tasks | |
Published | 2019-03-04 |
URL | https://arxiv.org/abs/1903.01344v3 |
https://arxiv.org/pdf/1903.01344v3.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-actor-critic-reinforcement-learning-in |
Repo | |
Framework | |
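The policy structure for a parameterized action space can be sketched as one discrete head that chooses the action type, parallel continuous heads that emit that action's parameters, and a critic that values the state. Only the network structure and one sampled action are shown below; the dimensions are illustrative and the PPO updates from the paper are omitted.

```python
# Structural sketch of a hybrid actor-critic for a parameterized action space.
import torch
import torch.nn as nn

class HybridActorCritic(nn.Module):
    def __init__(self, obs_dim=8, n_discrete=3, param_dims=(2, 1, 3)):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.discrete_head = nn.Linear(64, n_discrete)                   # action-type logits
        self.param_heads = nn.ModuleList(nn.Linear(64, d) for d in param_dims)
        self.critic = nn.Linear(64, 1)                                   # state value

    def forward(self, obs):
        h = self.shared(obs)
        return self.discrete_head(h), [head(h) for head in self.param_heads], self.critic(h)

model = HybridActorCritic()
obs = torch.rand(1, 8)
logits, params, value = model(obs)
a = torch.distributions.Categorical(logits=logits).sample().item()
print("discrete action:", a, "parameters:", params[a].detach(), "V(s):", value.item())
```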
RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
Title | RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients? |
Authors | Anil Kag, Ziming Zhang, Venkatesh Saligrama |
Abstract | Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate. While a number of works attempt to mitigate this effect through gated recurrent units, well-chosen parametric constraints, and skip-connections, we develop a novel perspective that seeks to evolve the hidden state on the equilibrium manifold of an ordinary differential equation (ODE). We propose a family of novel RNNs, namely Equilibriated Recurrent Neural Networks (ERNNs) that overcome the gradient decay or explosion effect and lead to recurrent models that evolve on the equilibrium manifold. We show that equilibrium points are stable, leading to fast convergence of the discretized ODE to fixed points. Furthermore, ERNNs account for long-term dependencies, and can efficiently recall informative aspects of data from the distant past. We show that ERNNs achieve state-of-the-art accuracy on many challenging data sets with 3-10x speedups, 1.5-3x model size reduction, and with similar prediction cost relative to vanilla RNNs. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08574v2 |
https://arxiv.org/pdf/1908.08574v2.pdf | |
PWC | https://paperswithcode.com/paper/rnns-evolving-in-equilibrium-a-solution-to |
Repo | |
Framework | |
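The flavor of an equilibrium-based recurrent update can be illustrated with a generic fixed-point cell: instead of a single feed-forward transition, each step drives the hidden state to a fixed point of a nonlinear map by simple iteration. The particular map below is one plausible form chosen for illustration, not necessarily the exact ERNN formulation, and the weight scales are arbitrary.

```python
# Generic equilibrium (fixed-point) recurrent cell sketch.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
U = rng.normal(scale=0.3, size=(d_h, d_in))
W = rng.normal(scale=0.1, size=(d_h, d_h))   # small weights help the iteration settle
b = np.zeros(d_h)

def equilibrium_step(x_t, h_prev, n_iter=30):
    # One plausible equilibrium map that keeps dependence on the previous
    # state: solve h = tanh(U x_t + W (h + h_prev) + b) by simple iteration.
    h = h_prev.copy()
    for _ in range(n_iter):
        h = np.tanh(U @ x_t + W @ (h + h_prev) + b)
    return h

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):       # a length-5 input sequence
    h = equilibrium_step(x_t, h)
print(h)
```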
Analysis of effectiveness of thresholding in perfusion ROI detection on T2-weighted MR images with abnormal brain anatomy
Title | Analysis of effectiveness of thresholding in perfusion ROI detection on T2-weighted MR images with abnormal brain anatomy |
Authors | Svitlana Alkhimova, Svitlana Sliusar |
Abstract | Brain perfusion ROI detection is a preliminary step designed to exclude non-brain tissues from analyzed DSC perfusion MR images. Its accuracy is considered the key factor for delivering correct results of perfusion data analysis. Despite the large variety of algorithms developed for brain tissue segmentation, none works reliably and robustly on T2-weighted MR images of a human head with abnormal brain anatomy. Therefore, thresholding is still the state-of-the-art technique that is widely used as a way of managing pixels involved in the brain perfusion ROI. This paper presents an analysis of the effectiveness of thresholding techniques for brain perfusion ROI detection on T2-weighted MR images of a human head with abnormal brain anatomy. Four threshold-based algorithm implementations are considered: the Otsu method as global thresholding, the Niblack method as local thresholding, thresholding in an approximate anatomical brain location, and brute-force thresholding. The analysis is done by comparing qualitative maps produced from thresholded images with those produced from the reference ones. Pearson correlation analysis showed strong positive (r ranged from 0.7123 to 0.8518, p<0.01) and weak positive (r<0.35, p<0.01) relationships in the conducted experiments with the CBF, CBV, MTT and Tmax maps, respectively. Linear regression analysis showed, at the 95% confidence level, that maps produced from thresholded images were subject to scale and offset errors in all conducted experiments. The experimental results showed that widely used thresholding methods are an ineffective way of managing pixels involved in the brain perfusion ROI. Thresholding as a brain segmentation tool can lead to poor placement of the perfusion ROI and, as a result, the produced maps will be subject to artifacts and can cause falsely high or falsely low perfusion parameter assessment. |
Tasks | Brain Segmentation |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.05469v1 |
https://arxiv.org/pdf/1912.05469v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-effectiveness-of-thresholding-in |
Repo | |
Framework | |
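The two classical baselines named in the abstract (Otsu as global thresholding, Niblack as local thresholding) are both available in scikit-image, and a minimal masking sketch is shown below on a synthetic image standing in for a T2-weighted slice. The DSC perfusion pipeline and map computation are out of scope here.

```python
# Global (Otsu) vs. local (Niblack) thresholding to obtain a binary ROI mask.
import numpy as np
from skimage.filters import threshold_otsu, threshold_niblack

rng = np.random.default_rng(0)
slice_img = rng.normal(0.2, 0.05, size=(128, 128))
slice_img[32:96, 32:96] += 0.6                 # bright "brain" region

global_mask = slice_img > threshold_otsu(slice_img)
local_mask = slice_img > threshold_niblack(slice_img, window_size=25, k=0.2)

print("Otsu ROI fraction:   ", global_mask.mean())
print("Niblack ROI fraction:", local_mask.mean())
```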
Datasets for Face and Object Detection in Fisheye Images
Title | Datasets for Face and Object Detection in Fisheye Images |
Authors | Jianglin Fu, Ivan V. Bajic, Rodney G. Vaughan |
Abstract | We present two new fisheye image datasets for training face and object detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway. |
Tasks | Face Detection, Object Detection |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11942v1 |
https://arxiv.org/pdf/1906.11942v1.pdf | |
PWC | https://paperswithcode.com/paper/datasets-for-face-and-object-detection-in |
Repo | |
Framework | |
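To give a flavor of how regular (rectilinear) images can be post-processed into fisheye-like ones, the sketch below applies a simple radial remapping with nearest-neighbor sampling in NumPy. This is a toy equidistant-style distortion for illustration only, not the Matlab mapping model used to build VOC-360 and Wider-360.

```python
# Toy rectilinear-to-fisheye-style warp via radial remapping.
import numpy as np

def to_fisheye(img, strength=0.8):
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized coordinates in [-1, 1] around the image center.
    x = (xx - w / 2) / (w / 2)
    y = (yy - h / 2) / (h / 2)
    r = np.sqrt(x**2 + y**2) + 1e-9
    r_src = np.tan(r * strength) / np.tan(strength)   # radius to sample from
    src_x = np.clip((x * r_src / r + 1) * w / 2, 0, w - 1).astype(int)
    src_y = np.clip((y * r_src / r + 1) * h / 2, 0, h - 1).astype(int)
    return img[src_y, src_x]                          # nearest-neighbor remap

img = np.zeros((200, 200), dtype=np.uint8)
img[::20, :] = img[:, ::20] = 255                      # grid pattern
fisheye = to_fisheye(img)
print(fisheye.shape)
```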