Paper Group ANR 177
REST: Performance Improvement of a Black Box Model via RL-based Spatial Transformation
Title | REST: Performance Improvement of a Black Box Model via RL-based Spatial Transformation |
Authors | Jae Myung Kim, Hyungjin Kim, Chanwoo Park, Jungwoo Lee |
Abstract | In recent years, deep neural networks (DNNs) have become a highly active area of research and have shown remarkable achievements on a variety of computer vision tasks. DNNs, however, are known to often make overconfident yet incorrect predictions on out-of-distribution samples, which can be a major obstacle to real-world deployment because the training dataset is always limited compared to diverse real-world samples. Thus, it is fundamental to provide guarantees of robustness to the distribution shift between training and test time when we construct DNN models in practice. Moreover, in many cases, deep learning models are deployed as black boxes whose performance has already been optimized for a training dataset, so changing the black box itself can lead to performance degradation. Here we study robustness to geometric transformations in the specific setting where a black-box image classifier is given. We propose an additional learner, the REinforcement Spatial Transform learner (REST), that transforms the warped input data into samples regarded as in-distribution by the black-box model. Our work aims to improve robustness by adding a REST module in front of any black box and training only the REST module, without retraining the original black-box model in an end-to-end manner; i.e., we try to convert real-world data into the training distribution for which the black-box model performs best. We use a confidence score obtained from the black-box model to determine whether the transformed input is drawn from the in-distribution. We empirically show that our method has an advantage in generalization to geometric transformations and in sample efficiency. |
Tasks | |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06610v1 |
https://arxiv.org/pdf/2002.06610v1.pdf | |
PWC | https://paperswithcode.com/paper/rest-performance-improvement-of-a-black-box |
Repo | |
Framework | |
Fully-automated Body Composition Analysis in Routine CT Imaging Using 3D Semantic Segmentation Convolutional Neural Networks
Title | Fully-automated Body Composition Analysis in Routine CT Imaging Using 3D Semantic Segmentation Convolutional Neural Networks |
Authors | Sven Koitka, Lennard Kroll, Eugen Malamutmann, Arzu Oezcelik, Felix Nensa |
Abstract | Body tissue composition is a long-known biomarker with high diagnostic and prognostic value in cardiovascular, oncological and orthopaedic diseases, but also in rehabilitation medicine or drug dosage. In this study, the aim was to develop a fully automated, reproducible and quantitative 3D volumetry of body tissue composition from standard CT examinations of the abdomen in order to be able to offer such valuable biomarkers as part of routine clinical imaging. Therefore, an in-house dataset of 40 CTs for training and 10 CTs for testing was fully annotated on every fifth axial slice with five different semantic body regions: abdominal cavity, bones, muscle, subcutaneous tissue, and thoracic cavity. Multi-resolution U-Net 3D neural networks were employed for segmenting these body regions, followed by subclassifying adipose tissue and muscle using known Hounsfield unit limits. The Sørensen-Dice score averaged over all semantic regions was 0.9553, and the intra-class correlation coefficients for subclassified tissues were above 0.99. Our results show that fully-automated body composition analysis on routine CT imaging can provide stable biomarkers across the whole abdomen and not just on L3 slices, which is historically the reference location for analysing body composition in the clinical routine. |
Tasks | 3D Semantic Segmentation, Semantic Segmentation |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10776v1 |
https://arxiv.org/pdf/2002.10776v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-automated-body-composition-analysis-in |
Repo | |
Framework | |
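The Sørensen-Dice score quoted in the abstract above is the standard overlap measure 2|A ∩ B| / (|A| + |B|), computed per semantic region and then averaged. The following is a minimal sketch of that computation on flattened voxel labels; the function names, the toy labels, and the convention for a label missing from both volumes are assumptions for illustration, not the authors' evaluation code.

```python
def dice_score(pred, truth, label):
    """Sørensen-Dice coefficient for one label over two equal-length label sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_hits = sum(1 for p in pred if p == label)
    truth_hits = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_hits + truth_hits == 0:
        # Convention assumed here: a label absent from both volumes counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_hits + truth_hits)


def mean_dice(pred, truth, labels):
    """Average the per-label Dice scores over the given labels."""
    return sum(dice_score(pred, truth, label) for label in labels) / len(labels)


# Toy flattened voxel labels (hypothetical data, not from the paper).
truth = ["muscle", "muscle", "bone", "bone", "bone", "cavity"]
pred = ["muscle", "bone", "bone", "bone", "bone", "cavity"]
print(mean_dice(pred, truth, ["muscle", "bone", "cavity"]))
```

In this toy case one muscle voxel is mislabelled as bone, which lowers the Dice score of both regions while leaving the cavity score at 1.0.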
Transfer between long-term and short-term memory using Conceptors
Title | Transfer between long-term and short-term memory using Conceptors |
Authors | Anthony Strock, Nicolas Rougier, Xavier Hinaut |
Abstract | We introduce a recurrent neural network model of working memory combining short-term and long-term components. The short-term component is modelled using a gated reservoir model that is trained to hold a value from an input stream when a gate signal is on. The long-term component is modelled using conceptors in order to store inner temporal patterns (that correspond to values). We combine these two components to obtain a model where information can go from long-term memory to short-term memory and vice versa, and we show how standard operations on conceptors make it possible to combine long-term memories and describe their effect on short-term memory. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.11640v1 |
https://arxiv.org/pdf/2003.11640v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-between-long-term-and-short-term |
Repo | |
Framework | |
Unbalanced GANs: Pre-training the Generator of Generative Adversarial Network using Variational Autoencoder
Title | Unbalanced GANs: Pre-training the Generator of Generative Adversarial Network using Variational Autoencoder |
Authors | Hyungrok Ham, Tae Joon Jun, Daeyoung Kim |
Abstract | We propose Unbalanced GANs, which pre-train the generator of the generative adversarial network (GAN) using a variational autoencoder (VAE). We guarantee the stable training of the generator by preventing the faster convergence of the discriminator at early epochs. Furthermore, we balance the generator and the discriminator at early epochs and thus maintain the stabilized training of GANs. We apply Unbalanced GANs to well-known public datasets and find that Unbalanced GANs reduce mode collapses. We also show that Unbalanced GANs outperform ordinary GANs in terms of stabilized learning, faster convergence and better image quality at early epochs. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02112v1 |
https://arxiv.org/pdf/2002.02112v1.pdf | |
PWC | https://paperswithcode.com/paper/unbalanced-gans-pre-training-the-generator-of |
Repo | |
Framework | |
An experiment exploring the theoretical and methodological challenges in developing a semi-automated approach to analysis of small-N qualitative data
Title | An experiment exploring the theoretical and methodological challenges in developing a semi-automated approach to analysis of small-N qualitative data |
Authors | Sandro Tsang |
Abstract | This paper experiments with designing a semi-automated qualitative data analysis (QDA) algorithm to analyse 20 transcripts by using freeware. Text-mining (TM) and QDA were guided by frequency and association measures, because these statistics remain robust when the sample size is small. The refined TM algorithm split the text into various sizes based on a manually revised dictionary. This lemmatisation approach may reflect the context of the text better than uniformly tokenising the text into one single size. TM results were used for initial coding. Code repacking was guided by association measures and external data to implement a general inductive QDA approach. The information retrieved by TM and QDA was depicted in subgraphs for comparisons. The analyses were completed in 6-7 days. Both algorithms retrieved contextually consistent and relevant information. However, the QDA algorithm retrieved more specific information than TM alone. The QDA algorithm does not strictly comply with the convention of TM or of QDA, but becomes a more efficient, systematic and transparent text analysis approach than a conventional QDA approach. Scaling up QDA to reliably discover knowledge from text was exactly the research purpose. This paper also sheds light on understanding the relations between information technologies, theory and methodologies. |
Tasks | |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.04513v2 |
https://arxiv.org/pdf/2002.04513v2.pdf | |
PWC | https://paperswithcode.com/paper/an-experiment-exploring-the-theoretical-and |
Repo | |
Framework | |
Serialized Output Training for End-to-End Overlapped Speech Recognition
Title | Serialized Output Training for End-to-End Overlapped Speech Recognition |
Authors | Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka |
Abstract | This paper proposes serialized output training (SOT), a novel framework for multi-speaker overlapped speech recognition based on an attention-based encoder-decoder approach. Instead of having multiple output layers as with the permutation invariant training (PIT), SOT uses a model with only one output layer that generates the transcriptions of multiple speakers one after another. The attention and decoder modules take care of producing multiple transcriptions from overlapped speech. SOT has two advantages over PIT: (1) no limitation in the maximum number of speakers, and (2) an ability to model the dependencies among outputs for different speakers. We also propose a simple trick to reduce the complexity of processing each training sample from $O(S!)$ to $O(1)$, where $S$ is the number of the speakers in the training sample, by using the start times of the constituent source utterances. Experimental results on LibriSpeech corpus show that the SOT models can transcribe overlapped speech with variable numbers of speakers significantly better than PIT-based models. We also show that the SOT models can accurately count the number of speakers in the input audio. |
Tasks | Speech Recognition |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12687v1 |
https://arxiv.org/pdf/2003.12687v1.pdf | |
PWC | https://paperswithcode.com/paper/serialized-output-training-for-end-to-end |
Repo | |
Framework | |
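The complexity reduction described in the abstract above can be sketched as follows: a permutation-based loss has to consider every speaker ordering ($S!$ candidates), whereas ordering the reference transcriptions by utterance start time yields a single target sequence. This is a minimal illustration only; the `<sc>` separator and `<eos>` end marker, the function names, and the toy utterances are assumptions, since the abstract does not name the tokens it uses.

```python
from itertools import permutations

SPEAKER_CHANGE = "<sc>"  # assumed separator between speakers' transcriptions
END = "<eos>"            # assumed end-of-sequence marker


def serialize_by_start_time(utterances):
    """Build one target string from (start_time_seconds, transcript) pairs, earliest first."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {SPEAKER_CHANGE} ".join(text for _, text in ordered) + f" {END}"


def all_serializations(utterances):
    """Every speaker ordering: the S! candidates a permutation-based loss would score."""
    return [
        f" {SPEAKER_CHANGE} ".join(text for _, text in order) + f" {END}"
        for order in permutations(utterances)
    ]


# Hypothetical overlapped mixture of three speakers.
utterances = [(1.2, "good morning"), (0.0, "hello there"), (2.5, "how are you")]
print(serialize_by_start_time(utterances))
print(len(all_serializations(utterances)))
```

With three speakers the exhaustive list holds 3! = 6 candidate targets, while the start-time rule always selects exactly one of them.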
Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception
Title | Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception |
Authors | Michael A Lepori, Chaz Firestone |
Abstract | The rise of sophisticated machine-recognition systems has brought with it a rise in comparisons between human and machine perception. But such comparisons face an asymmetry: Whereas machine perception of some stimulus can often be probed through direct and explicit measures, much of human perceptual knowledge is latent, incomplete, or embedded in unconscious mental processes that may not be available for explicit report. Here, we show how this asymmetry can cause such comparisons to underestimate the overlap in human and machine perception. As a case study, we consider human perception of $\textit{adversarial speech}$ – synthetic audio commands that are recognized as valid messages by automated speech-recognition systems but that human listeners reportedly hear as meaningless noise. In five experiments, we adapt task designs from the human psychophysics literature to show that even when subjects cannot freely transcribe adversarial speech (the previous benchmark for human understanding), they nevertheless $\textit{can}$ discriminate adversarial speech from closely matched non-speech (Experiments 1-2), finish common phrases begun in adversarial speech (Experiments 3-4), and solve simple math problems posed in adversarial speech (Experiment 5) – even for stimuli previously described as “unintelligible to human listeners”. We recommend the adoption of $\textit{sensitive tests}$ of human and machine perception, and discuss the broader consequences of this approach for comparing natural and artificial intelligence. |
Tasks | Speech Recognition |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12362v1 |
https://arxiv.org/pdf/2003.12362v1.pdf | |
PWC | https://paperswithcode.com/paper/can-you-hear-me-textit-now-sensitive |
Repo | |
Framework | |
Emosaic: Visualizing Affective Content of Text at Varying Granularity
Title | Emosaic: Visualizing Affective Content of Text at Varying Granularity |
Authors | Philipp Geuder, Marie Claire Leidinger, Martin von Lupin, Marian Dörk, Tobias Schröder |
Abstract | This paper presents Emosaic, a tool for visualizing the emotional tone of text documents, considering multiple dimensions of emotion and varying levels of semantic granularity. Emosaic is grounded in psychological research on the relationship between language, affect, and color perception. We capitalize on an established three-dimensional model of human emotion: valence (good, nice vs. bad, awful), arousal (calm, passive vs. exciting, active) and dominance (weak, controlled vs. strong, in control). Previously, multi-dimensional models of emotion have been used rarely in visualizations of textual data, due to the perceptual challenges involved. Furthermore, until recently most text visualizations remained at a high level, precluding closer engagement with the deep semantic content of the text. Informed by empirical studies, we introduce a color mapping that translates any point in three-dimensional affective space into a unique color. Emosaic uses affective dictionaries of words annotated with the three emotional parameters of the valence-arousal-dominance model to extract emotional meanings from texts and then assigns to them corresponding color parameters of the hue-saturation-brightness color space. This approach of mapping emotion to color is aimed at helping readers to more easily grasp the emotional tone of the text. Several features of Emosaic allow readers to interactively explore the affective content of the text in more detail; e.g., in aggregated form as histograms, in sequential form following the order of text, and in detail embedded into the text display itself. Interaction techniques have been included to allow for filtering and navigating of text and visualizations. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10096v1 |
https://arxiv.org/pdf/2002.10096v1.pdf | |
PWC | https://paperswithcode.com/paper/emosaic-visualizing-affective-content-of-text |
Repo | |
Framework | |
Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation
Title | Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation |
Authors | Jinghui Qin, Zheng Ye, Jianheng Tang, Xiaodan Liang |
Abstract | Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations. Existing methods mainly rely on single-turn data-driven learning and a simple target-guided strategy without considering semantic or factual knowledge relations among candidate topics/keywords. This results in poor transition smoothness and a low success rate. In this work, we adopt a structured approach that controls the intended content of system responses by introducing coarse-grained keywords, attains smooth conversation transition through turn-level supervised learning and knowledge relations between candidate keywords, and drives a conversation towards a specified target with a discourse-level guiding strategy. Specifically, we propose a novel dynamic knowledge routing network (DKRN) which considers semantic knowledge relations among candidate keywords for accurate next topic prediction of the next discourse. With the help of more accurate keyword prediction, our keyword-augmented response retrieval module can achieve better retrieval performance and more meaningful conversations. Besides, we also propose a novel dual discourse-level target-guided strategy to guide conversations to reach their goals smoothly with a higher success rate. Furthermore, to push the research boundary of target-guided open-domain conversation to match real-world scenarios better, we introduce a new large-scale Chinese target-guided open-domain conversation dataset (more than 900K conversations) crawled from Sina Weibo. Quantitative and human evaluations show our method can produce meaningful and effective target-guided conversations, significantly improving over other state-of-the-art methods by more than 20% in success rate and more than 0.6 in average smoothness score. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01196v2 |
https://arxiv.org/pdf/2002.01196v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-knowledge-routing-network-for-target |
Repo | |
Framework | |
Metric-Free Individual Fairness in Online Learning
Title | Metric-Free Individual Fairness in Online Learning |
Authors | Yahav Bechavod, Christopher Jung, Zhiwei Steven Wu |
Abstract | We study an online learning problem subject to the constraint of individual fairness, which requires that similar individuals are treated similarly. Unlike prior work on individual fairness, we do not assume the similarity measure among individuals is known, nor do we assume that such a measure takes a certain parametric form. Instead, we leverage the existence of an auditor who detects fairness violations without enunciating the quantitative measure. In each round, the auditor examines the learner’s decisions and attempts to identify a pair of individuals that are treated unfairly by the learner. We provide a general reduction framework that reduces online classification in our model to standard online classification, which allows us to leverage existing online learning algorithms to achieve sub-linear regret and a sub-linear number of fairness violations. Surprisingly, in the stochastic setting where the data are drawn independently from a distribution, we are also able to establish PAC-style fairness and accuracy generalization guarantees (Yona and Rothblum [2018]), despite only having access to a very restricted form of fairness feedback. Our fairness generalization bound qualitatively matches the uniform convergence bound of Yona and Rothblum [2018], while also providing a meaningful accuracy generalization guarantee. Our results resolve an open question by Gillen et al. [2018] by showing that online learning under an unknown individual fairness constraint is possible even without assuming a strong parametric form of the underlying similarity measure. |
Tasks | |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05474v2 |
https://arxiv.org/pdf/2002.05474v2.pdf | |
PWC | https://paperswithcode.com/paper/metric-free-individual-fairness-in-online |
Repo | |
Framework | |
Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix
Title | Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix |
Authors | Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier |
Abstract | Realistic recordings of soundscapes often have multiple sound events co-occurring, such as car horns, engines and human voices. Sound event retrieval is a type of content-based search that aims to find audio samples similar to an audio query, based on their acoustic or semantic content. State-of-the-art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording). To address this latter problem, we propose different deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset containing both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.09026v1 |
https://arxiv.org/pdf/2002.09026v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-sound-event-retrieval-using-a |
Repo | |
Framework | |
A Survey of End-to-End Driving: Architectures and Training Methods
Title | A Survey of End-to-End Driving: Architectures and Training Methods |
Authors | Ardi Tampuu, Maksym Semikin, Naveed Muhammad, Dmytro Fishman, Tambet Matiisen |
Abstract | Autonomous driving is of great interest to industry and academia alike. The use of machine learning approaches for autonomous driving has long been studied, but mostly in the context of perception. In this paper we take a deeper look at the so-called end-to-end approaches for autonomous driving, where the entire driving pipeline is replaced with a single neural network. We review the learning methods, input and output modalities, network architectures and evaluation schemes in the end-to-end driving literature. Interpretability and safety are discussed separately, as they remain challenging for this approach. Beyond providing a comprehensive overview of existing methods, we conclude the review with an architecture that combines the most promising elements of the end-to-end autonomous driving systems. |
Tasks | Autonomous Driving |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06404v1 |
https://arxiv.org/pdf/2003.06404v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-end-to-end-driving-architectures |
Repo | |
Framework | |
Inverted-File k-Means Clustering: Performance Analysis
Title | Inverted-File k-Means Clustering: Performance Analysis |
Authors | Kazuo Aoyama, Kazumi Saito, Tetsuo Ikeda |
Abstract | This paper presents an inverted-file k-means clustering algorithm (IVF) suitable for a large-scale sparse data set with potentially numerous classes. Given such a data set, IVF works at high speed and with low memory consumption while producing the same solution as a standard Lloyd’s algorithm. The high performance arises from two distinct data representations. One is a sparse expression for both the object and mean feature vectors. The other is an inverted-file data structure for a set of the mean feature vectors. To confirm the effect of these representations, we design three algorithms using distinct data structures and expressions for comparison. We experimentally demonstrate that IVF achieves better performance than the designed algorithms when they are applied to large-scale real document data sets in a modern computer system equipped with superscalar out-of-order processors and a deep hierarchical memory system. We also introduce a simple yet practical clock-cycles-per-instruction (CPI) model for speed-performance analysis. Analytical results reveal that IVF suppresses three performance degradation factors: the numbers of cache misses, branch mispredictions, and completed instructions. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09094v1 |
https://arxiv.org/pdf/2002.09094v1.pdf | |
PWC | https://paperswithcode.com/paper/inverted-file-k-means-clustering-performance |
Repo | |
Framework | |
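The two representations named in the abstract above, sparse vectors and an inverted file over the mean feature vectors, can be illustrated with a sketch of the assignment step. It assumes that mean vectors are normalized to unit length, so that the largest dot product identifies the nearest mean; the function names and toy data are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def build_inverted_file(centroids):
    """Map each term id to the (centroid id, weight) pairs in which it is non-zero.

    centroids: list of sparse mean vectors, each a {term_id: weight} dict.
    """
    inverted = defaultdict(list)
    for cid, vec in enumerate(centroids):
        for term, weight in vec.items():
            inverted[term].append((cid, weight))
    return inverted


def assign(obj, inverted, k):
    """Return the index of the centroid with the largest dot product to a sparse object."""
    sims = [0.0] * k
    # Only the posting lists of the object's non-zero terms are visited.
    for term, weight in obj.items():
        for cid, centroid_weight in inverted.get(term, ()):
            sims[cid] += weight * centroid_weight
    return max(range(k), key=sims.__getitem__)


# Hypothetical unit-length sparse means and two sparse objects.
centroids = [{0: 1.0}, {1: 0.6, 2: 0.8}]
inverted = build_inverted_file(centroids)
print(assign({2: 1.0}, inverted, len(centroids)))
print(assign({0: 0.9, 1: 0.1}, inverted, len(centroids)))
```

The work per object scales with the lengths of the posting lists for its non-zero terms, not with the full dimensionality multiplied by the number of means, which is the intuition behind using an inverted file for sparse data.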
Subadditivity of Probability Divergences on Bayes-Nets with Applications to Time Series GANs
Title | Subadditivity of Probability Divergences on Bayes-Nets with Applications to Time Series GANs |
Authors | Mucong Ding, Constantinos Daskalakis, Soheil Feizi |
Abstract | GANs for time series data often use sliding windows or self-attention to capture underlying time dependencies. While these techniques have no clear theoretical justification, they are successful in significantly reducing the discriminator size, speeding up the training process, and improving the generation quality. In this paper, we provide both theoretical foundations and a practical framework of GANs for high-dimensional distributions with conditional independence structure captured by a Bayesian network, such as time series data. We prove that several probability divergences satisfy subadditivity properties with respect to the neighborhoods of the Bayes-net graph, providing an upper bound on the distance between two Bayes-nets by the sum of (local) distances between their marginals on every neighborhood of the graph. This leads to our proposed Subadditive GAN framework that uses a set of simple discriminators on the neighborhoods of the Bayes-net, rather than a giant discriminator on the entire network, providing significant statistical and computational benefits. We show that several probability distances including Jensen-Shannon, Total Variation, and Wasserstein, have subadditivity or generalized subadditivity. Moreover, we prove that Integral Probability Metrics (IPMs), which encompass commonly-used loss functions in GANs, also enjoy a notion of subadditivity under some mild conditions. Furthermore, we prove that nearly all f-divergences satisfy local subadditivity in which subadditivity holds when the distributions are relatively close. Our experiments on synthetic as well as real-world datasets verify the proposed theory and the benefits of subadditive GANs. |
Tasks | Time Series |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00652v1 |
https://arxiv.org/pdf/2003.00652v1.pdf | |
PWC | https://paperswithcode.com/paper/subadditivity-of-probability-divergences-on |
Repo | |
Framework | |
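The subadditivity property described in the abstract above can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $P$ and $Q$ are two distributions that factorize over the same Bayes-net, $N_1, \dots, N_n$ are the neighborhoods of its graph, $P_{N_i}$ and $Q_{N_i}$ are the marginals on neighborhood $N_i$, and $\delta$ is a probability divergence.

$$
\delta(P, Q) \;\le\; \sum_{i=1}^{n} \delta\left(P_{N_i},\, Q_{N_i}\right)
$$

Read this way, each term on the right-hand side involves only the variables in one neighborhood, so it can be handled by a small local discriminator, and driving every local term toward zero bounds the divergence on the full joint distribution.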
Single-Stage Object Detection from Top-View Grid Maps on Custom Sensor Setups
Title | Single-Stage Object Detection from Top-View Grid Maps on Custom Sensor Setups |
Authors | Sascha Wirges, Shuxiao Ding, Christoph Stiller |
Abstract | We present our approach to unsupervised domain adaptation for single-stage object detectors on top-view grid maps in automated driving scenarios. Our goal is to train a robust object detector on grid maps generated from custom sensor data and setups. We first introduce a single-stage object detector for grid maps based on RetinaNet. We then extend our model by image- and instance-level domain classifiers at different feature pyramid levels which are trained in an adversarial manner. This allows us to train robust object detectors for unlabeled domains. We evaluate our approach quantitatively on the nuScenes and KITTI benchmarks and present qualitative domain adaptation results for unlabeled measurements recorded by our experimental vehicle. Our results demonstrate that object detection accuracy for unlabeled domains can be improved by applying our domain adaptation strategy. |
Tasks | Domain Adaptation, Object Detection, Unsupervised Domain Adaptation |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.00667v1 |
https://arxiv.org/pdf/2002.00667v1.pdf | |
PWC | https://paperswithcode.com/paper/single-stage-object-detection-from-top-view |
Repo | |
Framework | |