Paper Group ANR 85
A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging
Title | A deep learning framework for morphologic detail beyond the diffraction limit in infrared spectroscopic imaging |
Authors | Kianoush Falahkheirkhah, Kevin Yeh, Shachi Mittal, Luke Pfister, Rohit Bhargava |
Abstract | Infrared (IR) microscopes measure spectral information that quantifies molecular content to assign the identity of biomedical cells but lack the spatial quality of optical microscopy to appreciate morphologic features. Here, we propose a method to combine the semantic information of cellular identity from IR imaging with the morphologic detail of pathology images in a deep learning-based approach to image super-resolution. Using Generative Adversarial Networks (GANs), we enhance the spatial detail in IR imaging beyond the diffraction limit while retaining its spectral contrast. This technique can be rapidly integrated with modern IR microscopes to provide a framework useful for routine pathology. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.04410v2 |
https://arxiv.org/pdf/1911.04410v2.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-framework-for-morphologic |
Repo | |
Framework | |
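The entry above gives no implementation detail, so the following is only a minimal, generic sketch of the adversarial training step behind GAN-based super-resolution: a toy generator upsamples a low-resolution single-band image and a toy discriminator judges realism. The network definitions, loss weights, and random tensors are placeholders for illustration, not the authors' architecture or data.

```python
# Minimal sketch of a GAN-based super-resolution training step (assumed setup,
# not the paper's architecture): a toy generator upsamples 2x and a toy
# discriminator scores realism; the losses combine an L1 content term with an
# adversarial term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch * 4, 3, padding=1),  # 4x channels for 2x PixelShuffle
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

class ToyDiscriminator(nn.Module):
    def __init__(self, ch=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.body(x)

G, D = ToyGenerator(), ToyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

lr_img = torch.rand(4, 1, 32, 32)   # stand-in for a low-resolution IR band
hr_img = torch.rand(4, 1, 64, 64)   # stand-in for the high-resolution target

# Discriminator step: real vs. generated images.
fake = G(lr_img).detach()
loss_d = bce(D(hr_img), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: content (L1) term plus adversarial term.
sr = G(lr_img)
loss_g = F.l1_loss(sr, hr_img) + 1e-3 * bce(D(sr), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```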
Wavelet Domain Style Transfer for an Effective Perception-distortion Tradeoff in Single Image Super-Resolution
Title | Wavelet Domain Style Transfer for an Effective Perception-distortion Tradeoff in Single Image Super-Resolution |
Authors | Xin Deng, Ren Yang, Mai Xu, Pier Luigi Dragotti |
Abstract | In single image super-resolution (SISR), given a low-resolution (LR) image, one wishes to find a high-resolution (HR) version of it which is both accurate and photo-realistic. Recently, it has been shown that there exists a fundamental tradeoff between low distortion and high perceptual quality, and the generative adversarial network (GAN) is demonstrated to approach the perception-distortion (PD) bound effectively. In this paper, we propose a novel method based on wavelet domain style transfer (WDST), which achieves a better PD tradeoff than the GAN-based methods. Specifically, we propose to use the 2D stationary wavelet transform (SWT) to decompose one image into low-frequency and high-frequency sub-bands. For the low-frequency sub-band, we improve its objective quality through an enhancement network. For the high-frequency sub-band, we propose to use WDST to effectively improve its perceptual quality. By virtue of the perfect reconstruction property of wavelets, these sub-bands can be re-combined to obtain an image which has simultaneously high objective and perceptual quality. The numerical results on various datasets show that our method achieves the best trade-off between distortion and perceptual quality among the existing state-of-the-art SISR methods. |
Tasks | Image Super-Resolution, Style Transfer, Super-Resolution |
Published | 2019-10-09 |
URL | https://arxiv.org/abs/1910.04074v1 |
https://arxiv.org/pdf/1910.04074v1.pdf | |
PWC | https://paperswithcode.com/paper/wavelet-domain-style-transfer-for-an |
Repo | |
Framework | |
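The decompose/recombine scaffolding described in the abstract above can be sketched with PyWavelets: a 2D stationary wavelet transform splits an image into low- and high-frequency sub-bands, each sub-band is processed separately, and the perfect-reconstruction inverse recombines them. In the sketch below the enhancement network and the wavelet-domain style transfer are replaced by identity placeholders, so only the plumbing is shown.

```python
# SWT decompose / process / recombine sketch. The "enhance" and "style
# transfer" steps are placeholders, not the paper's networks.
import numpy as np
import pywt

def enhance_low_freq(cA):
    return cA  # placeholder for the paper's enhancement network

def style_transfer_high_freq(band):
    return band  # placeholder for wavelet-domain style transfer (WDST)

img = np.random.rand(256, 256)  # stand-in for a super-resolved image

# One-level 2D SWT returns [(cA, (cH, cV, cD))] for level=1.
(cA, (cH, cV, cD)), = pywt.swt2(img, wavelet="haar", level=1)

cA = enhance_low_freq(cA)                                   # objective quality
cH, cV, cD = map(style_transfer_high_freq, (cH, cV, cD))    # perceptual quality

recombined = pywt.iswt2([(cA, (cH, cV, cD))], wavelet="haar")
print(np.allclose(recombined, img))  # True with identity placeholders
```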
Two-Pass End-to-End Speech Recognition
Title | Two-Pass End-to-End Speech Recognition |
Authors | Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu |
Abstract | The requirements for many applications of state-of-the-art speech recognition systems include not only low word error rate (WER) but also low latency. Specifically, for many use-cases, the system must be able to decode utterances in a streaming fashion and faster than real-time. Recently, a streaming recurrent neural network transducer (RNN-T) end-to-end (E2E) model has been shown to be a good candidate for on-device speech recognition, with improved WER and latency metrics compared to conventional on-device models [1]. However, this model still lags behind a large state-of-the-art conventional model in quality [2]. On the other hand, a non-streaming E2E Listen, Attend and Spell (LAS) model has shown comparable quality to large conventional models [3]. This work aims to bring the quality of an E2E streaming model closer to that of a conventional system by incorporating a LAS network as a second-pass component, while still abiding by latency constraints. Our proposed two-pass model achieves a 17%-22% relative reduction in WER compared to RNN-T alone and increases latency by a small fraction over RNN-T. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.10992v1 |
https://arxiv.org/pdf/1908.10992v1.pdf | |
PWC | https://paperswithcode.com/paper/two-pass-end-to-end-speech-recognition |
Repo | |
Framework | |
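At the hypothesis level, the two-pass idea can be illustrated as second-pass rescoring of a first-pass n-best list. The sketch below is an assumed, simplified interface: `Hypothesis`, `second_pass_score`, and the log-linear weight are illustrative stubs, not the paper's RNN-T or LAS models.

```python
# Two-pass rescoring sketch: a streaming first pass produces an n-best list
# with scores, a second-pass (LAS-style) scorer re-scores each hypothesis
# given shared encoder features, and the best combined score wins.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log-probability from the streaming first pass

def second_pass_score(hyp: Hypothesis, encoder_features) -> float:
    # Stub: a real system would run an attention decoder over the shared
    # encoder features and sum token log-probabilities for hyp.text.
    return -0.1 * len(hyp.text)

def rescore(nbest, encoder_features, weight=0.5):
    # Log-linear combination of first- and second-pass scores.
    def combined(h):
        return (1 - weight) * h.first_pass_score + weight * second_pass_score(h, encoder_features)
    return max(nbest, key=combined)

nbest = [
    Hypothesis("the cat sat", first_pass_score=-3.2),
    Hypothesis("the cat set", first_pass_score=-3.0),
]
print(rescore(nbest, encoder_features=None).text)
```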
Overlearning Reveals Sensitive Attributes
Title | Overlearning Reveals Sensitive Attributes |
Authors | Congzheng Song, Vitaly Shmatikov |
Abstract | “Overlearning” means that a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are (1) not part of the learning objective, and (2) sensitive from a privacy or bias perspective. For example, a binary gender classifier of facial images also learns to recognize races (even races that are not represented in the training data) and identities. We demonstrate overlearning in several vision and NLP models and analyze its harmful consequences. First, inference-time representations of an overlearned model reveal sensitive attributes of the input, breaking privacy protections such as model partitioning. Second, an overlearned model can be “re-purposed” for a different, privacy-violating task even in the absence of the original training data. We show that overlearning is intrinsic for some tasks and cannot be prevented by censoring unwanted attributes. Finally, we investigate where, when, and why overlearning happens during model training. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11742v3 |
https://arxiv.org/pdf/1905.11742v3.pdf | |
PWC | https://paperswithcode.com/paper/overlearning-reveals-sensitive-attributes |
Repo | |
Framework | |
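The kind of attribute-inference probe the abstract alludes to can be sketched with a small classifier fit on a model's intermediate representations to predict an attribute that was never a training label. The synthetic "representations" below are a placeholder, not the paper's experimental protocol.

```python
# Attribute-inference probe sketch: given intermediate representations from a
# model trained on some main task, fit a simple classifier to predict a
# different, sensitive attribute. Synthetic data stands in for real
# representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 64
sensitive = rng.integers(0, 2, size=n)            # attribute never used as a label
# Pretend the representations leak the attribute along one random direction.
reps = rng.normal(size=(n, d)) + 0.8 * sensitive[:, None] * rng.normal(size=(1, d))

X_tr, X_te, y_tr, y_te = train_test_split(reps, sensitive, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy on the sensitive attribute:", probe.score(X_te, y_te))
```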
Cross-Attention End-to-End ASR for Two-Party Conversations
Title | Cross-Attention End-to-End ASR for Two-Party Conversations |
Authors | Suyoun Kim, Siddharth Dalmia, Florian Metze |
Abstract | We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information. Unlike conventional speech recognition models, our model exploits two speakers’ history of conversational-context information that spans across multiple turns within an end-to-end framework. Specifically, we propose a speaker-specific cross-attention mechanism that can look at the output of the other speaker as well as that of the current speaker to better recognize long conversations. We evaluate the models on the Switchboard conversational speech corpus and show that our model outperforms standard end-to-end speech recognition models. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10726v1 |
https://arxiv.org/pdf/1907.10726v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-attention-end-to-end-asr-for-two-party |
Repo | |
Framework | |
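The cross-attention computation itself can be sketched in its simplest form: the current speaker's decoder state queries the other speaker's history of context vectors, and the attended summary is fused back into the decoder. The single-head, unmasked attention and the projection sizes below are generic assumptions, not necessarily the paper's exact parameterization.

```python
# Speaker-specific cross-attention sketch (single head, no masking).
import torch
import torch.nn.functional as F

d_model = 256
query_proj = torch.nn.Linear(d_model, d_model)
key_proj = torch.nn.Linear(d_model, d_model)
value_proj = torch.nn.Linear(d_model, d_model)

current_state = torch.rand(1, d_model)   # decoder state of the current speaker
other_history = torch.rand(5, d_model)   # context vectors from the other speaker's turns

q = query_proj(current_state)                               # (1, d)
k, v = key_proj(other_history), value_proj(other_history)   # (5, d)

attn = F.softmax(q @ k.t() / d_model ** 0.5, dim=-1)        # (1, 5)
cross_context = attn @ v                                    # (1, d)

fused = torch.cat([current_state, cross_context], dim=-1)   # fed onward in the decoder
print(fused.shape)  # torch.Size([1, 512])
```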
Thirty Years of Machine Learning: The Road to Pareto-Optimal Next-Generation Wireless Networks
Title | Thirty Years of Machine Learning: The Road to Pareto-Optimal Next-Generation Wireless Networks |
Authors | Jingjing Wang, Chunxiao Jiang, Haijun Zhang, Yong Ren, Kwang-Cheng Chen, Lajos Hanzo |
Abstract | Next-generation wireless networks (NGWN) have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning algorithms have achieved great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of machine learning by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning, respectively. Furthermore, we investigate their employment in the compelling applications of NGWNs, including heterogeneous networks (HetNets), cognitive radios (CR), the Internet of things (IoT), machine-to-machine (M2M) networks, and so on. This article aims to assist readers in clarifying the motivation and methodology of the various machine learning algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks. |
Tasks | Decision Making |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1902.01946v1 |
http://arxiv.org/pdf/1902.01946v1.pdf | |
PWC | https://paperswithcode.com/paper/thirty-years-of-machine-learningthe-road-to |
Repo | |
Framework | |
Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation
Title | Learned Belief-Propagation Decoding with Simple Scaling and SNR Adaptation |
Authors | Mengke Lian, Fabrizio Carpi, Christian Häger, Henry D. Pfister |
Abstract | We consider the weighted belief-propagation (WBP) decoder recently proposed by Nachmani et al. where different weights are introduced for each Tanner graph edge and optimized using machine learning techniques. Our focus is on simple-scaling models that use the same weights across certain edges to reduce the storage and computational burden. The main contribution is to show that simple scaling with few parameters often achieves the same gain as the full parameterization. Moreover, several training improvements for WBP are proposed. For example, it is shown that minimizing average binary cross-entropy is suboptimal in general in terms of bit error rate (BER) and a new “soft-BER” loss is proposed which can lead to better performance. We also investigate parameter adapter networks (PANs) that learn the relation between the signal-to-noise ratio and the WBP parameters. As an example, for the (32,16) Reed-Muller code with a highly redundant parity-check matrix, training a PAN with soft-BER loss gives near-maximum-likelihood performance assuming simple scaling with only three parameters. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08621v1 |
http://arxiv.org/pdf/1901.08621v1.pdf | |
PWC | https://paperswithcode.com/paper/learned-belief-propagation-decoding-with |
Repo | |
Framework | |
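The "simple scaling" idea replaces the full per-edge parameterization with a small number of shared weights. A closely related, widely used instance is the normalized (scaled) min-sum check-node update, sketched below; this is illustrative only, and neither the paper's trained WBP weights nor its soft-BER loss are reproduced here.

```python
# Message scaling in belief-propagation-style decoding: a single shared scale
# applied to check-node messages (normalized min-sum), instead of one weight
# per Tanner-graph edge.
import numpy as np

def scaled_minsum_check_update(incoming_llrs, scale=0.8):
    """Check-node update with a single shared scaling weight.

    incoming_llrs: LLR messages arriving at one check node from its variable
    nodes; returns the outgoing message toward each of them.
    """
    llrs = np.asarray(incoming_llrs, dtype=float)
    out = np.empty_like(llrs)
    for i in range(len(llrs)):
        others = np.delete(llrs, i)           # exclude the target edge
        sign = np.prod(np.sign(others))
        out[i] = scale * sign * np.min(np.abs(others))
    return out

print(scaled_minsum_check_update([2.0, -1.5, 0.7, 3.1]))
```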
Machine Learning and Visualization in Clinical Decision Support: Current State and Future Directions
Title | Machine Learning and Visualization in Clinical Decision Support: Current State and Future Directions |
Authors | Gal Levy-Fix, Gilad J. Kuperman, Noémie Elhadad |
Abstract | Deep learning, an area of machine learning, is set to revolutionize patient care. But it is not yet part of standard of care, especially when it comes to individual patient care. In fact, it is unclear to what extent data-driven techniques are being used for clinical decision support (CDS). Heretofore, there has not been a review of ways in which research in machine learning and other types of data-driven techniques can contribute effectively to clinical care and the types of support they can bring to clinicians. In this paper, we consider ways in which two data-driven domains - machine learning and data visualizations - can contribute to the next generation of clinical decision support systems. We review the literature regarding the ways heuristic knowledge, machine learning, and visualization are - and can be - applied to three types of CDS. There has been substantial research into the use of predictive modeling for alerts; however, current CDS systems are not utilizing these methods. Approaches that leverage interactive visualizations and machine-learning inferences to organize and review patient data are gaining popularity but are still at the prototype stage and are not yet in use. CDS systems that could benefit from prescriptive machine learning (e.g., treatment recommendations for specific patients) have not yet been developed. We discuss potential reasons for the lack of deployment of data-driven methods in CDS and directions for future research. |
Tasks | Decision Making |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02664v1 |
https://arxiv.org/pdf/1906.02664v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-and-visualization-in |
Repo | |
Framework | |
PreCall: A Visual Interface for Threshold Optimization in ML Model Selection
Title | PreCall: A Visual Interface for Threshold Optimization in ML Model Selection |
Authors | Christoph Kinkeldey, Claudia Müller-Birn, Tom Gülenman, Jesse Josua Benjamin, Aaron Halfaker |
Abstract | Machine learning systems are ubiquitous in various kinds of digital applications and have a huge impact on our everyday life. But a lack of explainability and interpretability of such systems hinders meaningful participation by people, especially by those without a technical background. Interactive visual interfaces (e.g., providing means for manipulating parameters in the user interface) can help tackle this challenge. In this paper we present PreCall, an interactive visual interface for ORES, a machine learning-based web service for Wikimedia projects such as Wikipedia. While ORES can be used for a number of settings, it can be challenging to translate requirements from the application domain into formal parameter sets needed to configure the ORES models. Assisting Wikipedia editors in finding damaging edits, for example, can be realized at various stages of automatization, which might impact the precision of the applied model. Our prototype PreCall attempts to close this translation gap by interactively visualizing the relationship between major model metrics (recall, precision, false positive rate) and a parameter (the threshold between valuable and damaging edits). Furthermore, PreCall visualizes the probable results for the current model configuration to improve the human’s understanding of the relationship between metrics and outcome when using ORES. We describe PreCall’s components and present a use case that highlights the benefits of our approach. Finally, we pose further research questions we would like to discuss during the workshop. |
Tasks | Model Selection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05131v1 |
https://arxiv.org/pdf/1907.05131v1.pdf | |
PWC | https://paperswithcode.com/paper/precall-a-visual-interface-for-threshold |
Repo | |
Framework | |
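The core quantity an interface like this visualizes is how recall, precision, and false positive rate move as a single classification threshold changes. The sketch below is a plain threshold sweep over synthetic scores; it does not touch the ORES service or its models.

```python
# Threshold sweep sketch: for a grid of thresholds over a model's scores,
# compute recall, precision and false positive rate. Synthetic scores stand
# in for real model output.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
scores = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=500), 0, 1)

for t in (0.3, 0.5, 0.7):
    y_pred = (scores >= t).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fpr = fp / max(np.sum(y_true == 0), 1)
    print(f"threshold={t:.1f} "
          f"recall={recall_score(y_true, y_pred):.2f} "
          f"precision={precision_score(y_true, y_pred, zero_division=0):.2f} "
          f"fpr={fpr:.2f}")
```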
Learning the Sampling Pattern for MRI
Title | Learning the Sampling Pattern for MRI |
Authors | Ferdia Sherry, Martin Benning, Juan Carlos De los Reyes, Martin J. Graves, Georg Maierhofer, Guy Williams, Carola-Bibiane Schönlieb, Matthias J. Ehrhardt |
Abstract | The discovery of the theory of compressed sensing brought the realisation that many inverse problems can be solved even when measurements are “incomplete”. This is particularly interesting in magnetic resonance imaging (MRI), where long acquisition times can limit its use. In this work, we consider the problem of learning a sparse sampling pattern that can be used to optimally balance acquisition time versus quality of the reconstructed image. We use a supervised learning approach, making the assumption that our training data is representative enough of new data acquisitions. We demonstrate that this is indeed the case, even if the training data consists of just 5 training pairs of measurements and ground-truth images; with a training set of brain images of size 192 by 192, for instance, one of the learned patterns samples only 32% of k-space, yet results in reconstructions with mean SSIM 0.956 on a test set of similar images. The proposed framework is general enough to learn arbitrary sampling patterns, including common patterns such as Cartesian, spiral and radial sampling. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08754v1 |
https://arxiv.org/pdf/1906.08754v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-sampling-pattern-for-mri |
Repo | |
Framework | |
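The measurement model the paper optimizes over can be illustrated with a binary k-space mask applied to the 2D Fourier transform of an image, followed by a zero-filled reconstruction. The random variable-density mask below is only a stand-in for a learned sampling pattern.

```python
# k-space undersampling sketch: apply a binary sampling mask in the Fourier
# domain and form a zero-filled reconstruction.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((192, 192))                    # stand-in for a brain image

# Variable-density mask: low frequencies are sampled more densely.
ky, kx = np.meshgrid(np.linspace(-1, 1, 192), np.linspace(-1, 1, 192), indexing="ij")
prob = np.clip(0.9 - 0.7 * np.sqrt(kx**2 + ky**2), 0.05, 1.0)
mask = rng.random((192, 192)) < prob
print("sampling rate:", mask.mean())

kspace = np.fft.fftshift(np.fft.fft2(img))      # centered k-space
zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace * mask)).real
print("reconstruction error:", np.abs(zero_filled - img).mean())
```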
Predicting Landscapes from Environmental Conditions Using Generative Networks
Title | Predicting Landscapes from Environmental Conditions Using Generative Networks |
Authors | Christian Requena-Mesa, Markus Reichstein, Miguel Mahecha, Basil Kraft, Joachim Denzler |
Abstract | Landscapes are meaningful ecological units that strongly depend on the environmental conditions. Such dependencies between landscapes and the environment have been noted since the beginning of Earth sciences and cast into conceptual models describing the interdependencies of climate, geology, vegetation and geomorphology. Here, we ask whether landscapes, as seen from space, can be statistically predicted from pertinent environmental conditions. To this end we adapted a deep learning generative model in order to establish the relationship between the environmental conditions and the view of landscapes from the Sentinel-2 satellite. We trained a conditional generative adversarial network to generate multispectral imagery given a set of climatic, terrain and anthropogenic predictors. The generated landscape imagery shares many characteristics with the real imagery. Results based on landscape patch metrics, indicative of landscape composition and structure, show that the proposed generative model creates landscapes that are more similar to the targets than those produced by the baseline models, while overall reflectance and vegetation cover are also predicted better. We demonstrate that for many purposes the generated landscapes behave like real ones, with immediate application for global change studies. We envision the application of machine learning as a tool to forecast the effects of climate change on the spatial features of landscapes, while we assess its limitations and breaking points. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10296v1 |
https://arxiv.org/pdf/1909.10296v1.pdf | |
PWC | https://paperswithcode.com/paper/190910296 |
Repo | |
Framework | |
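At the shape level, conditioning a generator on environmental predictors looks like the sketch below: a vector of climatic, terrain and anthropogenic variables is concatenated with noise and decoded into a multispectral image tensor. The layer sizes, band count, and predictor dimension are made up for illustration and are not the authors' architecture.

```python
# Conditional generator sketch: environmental predictors + noise -> image.
import torch
import torch.nn as nn

n_predictors, n_noise, n_bands = 12, 32, 10   # illustrative sizes only

class ToyConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(n_predictors + n_noise, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_bands, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, predictors, noise):
        x = self.fc(torch.cat([predictors, noise], dim=1)).view(-1, 128, 8, 8)
        return self.decoder(x)

G = ToyConditionalGenerator()
env = torch.rand(2, n_predictors)     # climate, terrain, anthropogenic predictors
z = torch.randn(2, n_noise)
print(G(env, z).shape)                # torch.Size([2, 10, 64, 64])
```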
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
Title | Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space |
Authors | Zhou Fan, Rui Su, Weinan Zhang, Yong Yu |
Abstract | In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning. |
Tasks | |
Published | 2019-03-04 |
URL | https://arxiv.org/abs/1903.01344v3 |
https://arxiv.org/pdf/1903.01344v3.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-actor-critic-reinforcement-learning-in |
Repo | |
Framework | |
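The policy structure for a parameterized action space can be sketched as one discrete head that chooses the action type, parallel continuous heads that emit that action's parameters, and a critic that values the state. Only the network structure and one sampled action are shown below; the dimensions are illustrative and the PPO updates from the paper are omitted.

```python
# Structural sketch of a hybrid actor-critic for a parameterized action space.
import torch
import torch.nn as nn

class HybridActorCritic(nn.Module):
    def __init__(self, obs_dim=8, n_discrete=3, param_dims=(2, 1, 3)):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.discrete_head = nn.Linear(64, n_discrete)                   # action-type logits
        self.param_heads = nn.ModuleList(nn.Linear(64, d) for d in param_dims)
        self.critic = nn.Linear(64, 1)                                   # state value

    def forward(self, obs):
        h = self.shared(obs)
        return self.discrete_head(h), [head(h) for head in self.param_heads], self.critic(h)

model = HybridActorCritic()
obs = torch.rand(1, 8)
logits, params, value = model(obs)
a = torch.distributions.Categorical(logits=logits).sample().item()
print("discrete action:", a, "parameters:", params[a].detach(), "V(s):", value.item())
```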
RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
Title | RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients? |
Authors | Anil Kag, Ziming Zhang, Venkatesh Saligrama |
Abstract | Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate. While a number of works attempt to mitigate this effect through gated recurrent units, well-chosen parametric constraints, and skip-connections, we develop a novel perspective that seeks to evolve the hidden state on the equilibrium manifold of an ordinary differential equation (ODE). We propose a family of novel RNNs, namely Equilibriated Recurrent Neural Networks (ERNNs) that overcome the gradient decay or explosion effect and lead to recurrent models that evolve on the equilibrium manifold. We show that equilibrium points are stable, leading to fast convergence of the discretized ODE to fixed points. Furthermore, ERNNs account for long-term dependencies, and can efficiently recall informative aspects of data from the distant past. We show that ERNNs achieve state-of-the-art accuracy on many challenging data sets with 3-10x speedups, 1.5-3x model size reduction, and with similar prediction cost relative to vanilla RNNs. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08574v2 |
https://arxiv.org/pdf/1908.08574v2.pdf | |
PWC | https://paperswithcode.com/paper/rnns-evolving-in-equilibrium-a-solution-to |
Repo | |
Framework | |
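The flavor of an equilibrium-based recurrent update can be illustrated with a generic fixed-point cell: instead of a single feed-forward transition, each step drives the hidden state to a fixed point of a nonlinear map by simple iteration. The particular map below is one plausible form chosen for illustration, not necessarily the exact ERNN formulation, and the weight scales are arbitrary.

```python
# Generic equilibrium (fixed-point) recurrent cell sketch.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
U = rng.normal(scale=0.3, size=(d_h, d_in))
W = rng.normal(scale=0.1, size=(d_h, d_h))   # small weights help the iteration settle
b = np.zeros(d_h)

def equilibrium_step(x_t, h_prev, n_iter=30):
    # One plausible equilibrium map that keeps dependence on the previous
    # state: solve h = tanh(U x_t + W (h + h_prev) + b) by simple iteration.
    h = h_prev.copy()
    for _ in range(n_iter):
        h = np.tanh(U @ x_t + W @ (h + h_prev) + b)
    return h

h = np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):       # a length-5 input sequence
    h = equilibrium_step(x_t, h)
print(h)
```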
Analysis of effectiveness of thresholding in perfusion ROI detection on T2-weighted MR images with abnormal brain anatomy
Title | Analysis of effectiveness of thresholding in perfusion ROI detection on T2-weighted MR images with abnormal brain anatomy |
Authors | Svitlana Alkhimova, Svitlana Sliusar |
Abstract | Brain perfusion ROI detection is a preliminary step designed to exclude non-brain tissues from analyzed DSC perfusion MR images. Its accuracy is considered the key factor for delivering correct results of perfusion data analysis. Despite the large variety of algorithms developed for brain tissue segmentation, none works reliably and robustly on T2-weighted MR images of a human head with abnormal brain anatomy. Therefore, thresholding is still the state-of-the-art technique that is widely used as a way of managing pixels involved in the brain perfusion ROI. This paper presents an analysis of the effectiveness of thresholding techniques for brain perfusion ROI detection on T2-weighted MR images of a human head with abnormal brain anatomy. Four threshold-based algorithm implementations are considered: the Otsu method as global thresholding, the Niblack method as local thresholding, thresholding in an approximate anatomical brain location, and brute-force thresholding. The analysis is done by comparing qualitative maps produced from thresholded images with those produced from the reference ones. Pearson correlation analysis showed strong positive (r ranged from 0.7123 to 0.8518, p<0.01) and weak positive (r<0.35, p<0.01) relationships in the conducted experiments with the CBF, CBV, MTT and Tmax maps, respectively. Linear regression analysis showed, at the 95% confidence level, that maps produced from thresholded images were subject to scale and offset errors in all conducted experiments. The experimental results showed that widely used thresholding methods are an ineffective way of managing pixels involved in the brain perfusion ROI. Thresholding as a brain segmentation tool can lead to poor placement of the perfusion ROI and, as a result, the produced maps will be subject to artifacts and can cause falsely high or falsely low perfusion parameter assessment. |
Tasks | Brain Segmentation |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.05469v1 |
https://arxiv.org/pdf/1912.05469v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-effectiveness-of-thresholding-in |
Repo | |
Framework | |
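The two classical baselines named in the abstract (Otsu as global thresholding, Niblack as local thresholding) are both available in scikit-image, and a minimal masking sketch is shown below on a synthetic image standing in for a T2-weighted slice. The DSC perfusion pipeline and map computation are out of scope here.

```python
# Global (Otsu) vs. local (Niblack) thresholding to obtain a binary ROI mask.
import numpy as np
from skimage.filters import threshold_otsu, threshold_niblack

rng = np.random.default_rng(0)
slice_img = rng.normal(0.2, 0.05, size=(128, 128))
slice_img[32:96, 32:96] += 0.6                 # bright "brain" region

global_mask = slice_img > threshold_otsu(slice_img)
local_mask = slice_img > threshold_niblack(slice_img, window_size=25, k=0.2)

print("Otsu ROI fraction:   ", global_mask.mean())
print("Niblack ROI fraction:", local_mask.mean())
```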
Datasets for Face and Object Detection in Fisheye Images
Title | Datasets for Face and Object Detection in Fisheye Images |
Authors | Jianglin Fu, Ivan V. Bajic, Rodney G. Vaughan |
Abstract | We present two new fisheye image datasets for training face and object detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway. |
Tasks | Face Detection, Object Detection |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11942v1 |
https://arxiv.org/pdf/1906.11942v1.pdf | |
PWC | https://paperswithcode.com/paper/datasets-for-face-and-object-detection-in |
Repo | |
Framework | |
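To give a flavor of how regular (rectilinear) images can be post-processed into fisheye-like ones, the sketch below applies a simple radial remapping with nearest-neighbor sampling in NumPy. This is a toy equidistant-style distortion for illustration only, not the Matlab mapping model used to build VOC-360 and Wider-360.

```python
# Toy rectilinear-to-fisheye-style warp via radial remapping.
import numpy as np

def to_fisheye(img, strength=0.8):
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized coordinates in [-1, 1] around the image center.
    x = (xx - w / 2) / (w / 2)
    y = (yy - h / 2) / (h / 2)
    r = np.sqrt(x**2 + y**2) + 1e-9
    r_src = np.tan(r * strength) / np.tan(strength)   # radius to sample from
    src_x = np.clip((x * r_src / r + 1) * w / 2, 0, w - 1).astype(int)
    src_y = np.clip((y * r_src / r + 1) * h / 2, 0, h - 1).astype(int)
    return img[src_y, src_x]                          # nearest-neighbor remap

img = np.zeros((200, 200), dtype=np.uint8)
img[::20, :] = img[:, ::20] = 255                      # grid pattern
fisheye = to_fisheye(img)
print(fisheye.shape)
```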