Paper Group ANR 844
The Deterministic plus Stochastic Model of the Residual Signal and its Applications. Multi-Resolution 3D CNN for MRI Brain Tumor Segmentation and Survival Prediction. 4X4 Census Transform. Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?. Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations …
The Deterministic plus Stochastic Model of the Residual Signal and its Applications
Title | The Deterministic plus Stochastic Model of the Residual Signal and its Applications |
Authors | Thomas Drugman, Thierry Dutoit |
Abstract | The modeling of speech production often relies on a source-filter approach. Although methods parameterizing the filter have nowadays reached a certain maturity, there is still a lot to be gained for several speech processing applications in finding an appropriate excitation model. This manuscript presents a Deterministic plus Stochastic Model (DSM) of the residual signal. The DSM consists of two contributions acting in two distinct spectral bands delimited by a maximum voiced frequency. Both components are extracted from an analysis performed on a speaker-dependent dataset of pitch-synchronous residual frames. The deterministic part models the low-frequency contents and arises from an orthonormal decomposition of these frames. As for the stochastic component, it is a high-frequency noise modulated both in time and frequency. Some interesting phonetic and computational properties of the DSM are also highlighted. The applicability of the DSM in two fields of speech processing is then studied. First, it is shown that incorporating the DSM vocoder in HMM-based speech synthesis enhances the delivered quality. The proposed approach turns out to significantly outperform the traditional pulse excitation and provides a quality equivalent to STRAIGHT. In a second application, the potential of glottal signatures derived from the proposed DSM is investigated for speaker identification purposes. Interestingly, these signatures are shown to lead to better recognition rates than other glottal-based methods. |
Tasks | Speaker Identification, Speech Synthesis |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/2001.01000v1 |
https://arxiv.org/pdf/2001.01000v1.pdf | |
PWC | https://paperswithcode.com/paper/the-deterministic-plus-stochastic-model-of |
Repo | |
Framework | |
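To make the decomposition concrete, here is a minimal sketch (our interpretation, not the authors' implementation) of the two DSM components, assuming `frames` is an `(n_frames, frame_len)` NumPy array of pitch-synchronous residual frames, `fs` the sampling rate, and `fmax_voiced` the maximum voiced frequency in Hz with `fs > 2 * fmax_voiced`:

```python
# A hedged sketch of the DSM's two spectral-band components.
import numpy as np
from scipy.signal import butter, lfilter

def dsm_components(frames, fs, fmax_voiced=4000.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Deterministic part: the first component of an orthonormal (PCA-style)
    # decomposition of the residual frames models the low-frequency contents.
    centered = frames - frames.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    b, a = butter(4, fmax_voiced / (fs / 2), btype="low")
    deterministic = lfilter(b, a, vt[0])
    # Stochastic part: white noise high-pass filtered above the maximum voiced
    # frequency, modulated in time by the frames' average energy envelope.
    b, a = butter(4, fmax_voiced / (fs / 2), btype="high")
    noise = lfilter(b, a, rng.standard_normal(frames.shape[1]))
    envelope = np.sqrt(np.mean(frames ** 2, axis=0))
    stochastic = noise * (envelope / envelope.max())
    return deterministic, stochastic
```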
Multi-Resolution 3D CNN for MRI Brain Tumor Segmentation and Survival Prediction
Title | Multi-Resolution 3D CNN for MRI Brain Tumor Segmentation and Survival Prediction |
Authors | Mehdi Amian, Mohammadreza Soltaninejad |
Abstract | In this study, an automated three-dimensional (3D) deep segmentation approach for detecting gliomas in 3D pre-operative MRI scans is proposed. Then, a classification algorithm based on random forests is presented for survival prediction. The objective is to segment the glioma area and produce segmentation labels for its different sub-regions, i.e. the necrotic and non-enhancing tumor core, the peritumoral edema, and the enhancing tumor. The proposed deep architecture for the segmentation task encompasses two parallel streams with two different resolutions. One deep convolutional neural network learns local features of the input data, while the other observes the whole image globally. Deemed complementary, the outputs of the two streams are then merged to provide an ensemble learning of the whole input image. The proposed network takes the whole image as input, rather than patches, in order to consider semantic features throughout the whole volume. The algorithm is trained on BraTS 2019, which includes 335 training cases, and validated on 127 unseen cases from the validation dataset using a blind testing approach. The proposed method was also evaluated on the BraTS 2019 challenge test dataset of 166 cases. The results show that the proposed methods provide promising segmentations as well as survival predictions. The mean Dice overlap measures of automatic brain tumor segmentation on the validation set were 0.84, 0.74 and 0.71 for the whole tumor, core and enhancing tumor, respectively. The corresponding results for the challenge test dataset were 0.82, 0.72, and 0.70, respectively. The overall accuracy of the proposed model on the survival prediction task is 52% for the validation and 49% for the test dataset. |
Tasks | Brain Tumor Segmentation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08388v1 |
https://arxiv.org/pdf/1911.08388v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-resolution-3d-cnn-for-mri-brain-tumor |
Repo | |
Framework | |
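A schematic sketch of the two-stream idea described in the abstract; channel sizes, normalization choices, and the 2x downsampling factor below are our assumptions, not the paper's configuration:

```python
# One local full-resolution 3D CNN stream, one global downsampled stream,
# merged before a per-voxel segmentation head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, padding=1),
        nn.InstanceNorm3d(cout),
        nn.ReLU(inplace=True),
    )

class TwoStream3DSegNet(nn.Module):
    def __init__(self, in_ch=4, n_classes=4):   # 4 MRI modalities, 4 labels
        super().__init__()
        self.local_stream = nn.Sequential(conv_block(in_ch, 16), conv_block(16, 32))
        self.global_stream = nn.Sequential(conv_block(in_ch, 16), conv_block(16, 32))
        self.head = nn.Conv3d(64, n_classes, kernel_size=1)

    def forward(self, x):
        local_feats = self.local_stream(x)
        # The global stream observes a downsampled whole volume, then is
        # upsampled back so the two streams can be concatenated.
        coarse = F.interpolate(x, scale_factor=0.5, mode="trilinear",
                               align_corners=False)
        global_feats = self.global_stream(coarse)
        global_feats = F.interpolate(global_feats, size=x.shape[2:],
                                     mode="trilinear", align_corners=False)
        return self.head(torch.cat([local_feats, global_feats], dim=1))
```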
4X4 Census Transform
Title | 4X4 Census Transform |
Authors | Olivier Rukundo |
Abstract | This paper proposes a 4X4 Census Transform (4X4CT) to encourage further research in computer vision and visual computing. Unlike the traditional 3X3 CT, which uses a nine-pixel kernel, the proposed 4X4CT uses a sixteen-pixel kernel with four overlapping groups of 3X3 kernel size. In each overlapping group, a reference input pixel is compared with its nearest eight pixels to produce an eight-bit binary string, convertible to a grayscale integer for the 4X4CT's output pixel. Preliminary experiments demonstrated greater image textural crispness and contrast than the traditional CT, as well as the potential to enable meaningful alternative solutions. |
Tasks | |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12891v1 |
https://arxiv.org/pdf/1907.12891v1.pdf | |
PWC | https://paperswithcode.com/paper/4x4-census-transform |
Repo | |
Framework | |
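For reference, a minimal NumPy sketch of the traditional 3X3 census transform that both variants build on; the 4X4CT, on one plausible reading of the abstract, applies the same eight-neighbour comparison within four overlapping 3X3 groups of each sixteen-pixel kernel:

```python
# Traditional 3X3 census transform: each pixel is replaced by an 8-bit string
# of comparisons against its eight neighbours, read as a grayscale integer.
import numpy as np

def census3x3(img):
    img = img.astype(np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            out |= (neighbour > center).astype(np.uint8) << bit
            bit += 1
    return out
```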
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
Title | Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion? |
Authors | Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent |
Abstract | Automatic speech recognition (ASR) is a key technology in many services and applications. This typically requires user devices to send their speech data to the cloud for ASR decoding. As the speech signal carries a lot of information about the speaker, this raises serious privacy concerns. As a solution, an encoder may reside on each user device which performs local computations to anonymize the representation. In this paper, we focus on the protection of speaker identity and study the extent to which users can be recognized based on the encoded representation of their speech as obtained by a deep encoder-decoder architecture trained for ASR. Through speaker identification and verification experiments on the Librispeech corpus with open and closed sets of speakers, we show that the representations obtained from a standard architecture still carry a lot of information about speaker identity. We then propose to use adversarial training to learn representations that perform well in ASR while hiding speaker identity. Our results demonstrate that adversarial training dramatically reduces the closed-set classification accuracy, but that this does not translate into increased open-set verification error, and hence into increased protection of the speaker's identity in practice. We suggest several possible reasons behind this negative result. |
Tasks | Representation Learning, Speaker Identification, Speech Recognition |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04913v1 |
https://arxiv.org/pdf/1911.04913v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-adversarial-representation |
Repo | |
Framework | |
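Adversarial training of this kind is commonly implemented with a gradient-reversal layer; the sketch below assumes PyTorch, with placeholder layer sizes and speaker count rather than the paper's configuration:

```python
# Gradient reversal: the speaker classifier minimizes its loss while the
# reversed gradient trains the ASR encoder to hide speaker identity.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class SpeakerAdversary(nn.Module):
    def __init__(self, feat_dim=256, n_speakers=1000, lam=0.5):  # placeholders
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_speakers))

    def forward(self, encoder_output):        # (batch, time, feat_dim)
        pooled = encoder_output.mean(dim=1)   # utterance-level embedding
        return self.clf(GradReverse.apply(pooled, self.lam))

# Training sketch: total_loss = asr_loss + speaker_ce_loss. The reversal makes
# the encoder maximize the adversary's loss while the adversary minimizes it.
```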
Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations
Title | Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations |
Authors | Ozan Arkan Can, Pedro Zuidberg Dos Martires, Andreas Persson, Julian Gaal, Amy Loutfi, Luc De Raedt, Deniz Yuret, Alessandro Saffiotti |
Abstract | Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot’s world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world. |
Tasks | |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13324v1 |
http://arxiv.org/pdf/1904.13324v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-implicit-information-in-natural |
Repo | |
Framework | |
An Overview of the Ludii General Game System
Title | An Overview of the Ludii General Game System |
Authors | Matthew Stephenson, Éric Piette, Dennis J. N. J. Soemers, Cameron Browne |
Abstract | The Digital Ludeme Project (DLP) aims to reconstruct and analyse over 1000 traditional strategy games using modern techniques. One of the key aspects of this project is the development of Ludii, a general game system that will be able to model and play the complete range of games required by this project. Such an undertaking will create a wide range of possibilities for new AI challenges. In this paper we describe many of Ludii's features. These include designing and modifying games using the Ludii game description language, creating agents capable of playing these games, and several advantages the system has over prior general game software. |
Tasks | |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00240v1 |
https://arxiv.org/pdf/1907.00240v1.pdf | |
PWC | https://paperswithcode.com/paper/an-overview-of-the-ludii-general-game-system |
Repo | |
Framework | |
Residual-CNDS for Grand Challenge Scene Dataset
Title | Residual-CNDS for Grand Challenge Scene Dataset |
Authors | Hussein A. Al-Barazanchi, Hussam Qassim, David Feinzimer, Abhishek Verma |
Abstract | Increasing the depth of convolutional neural networks (CNNs) is a highly promising way to increase their accuracy. However, increased depth also means more layers and parameters, leading to slow backpropagation convergence that is prone to overfitting. We trained our model, Residual-CNDS, to classify the very large-scale scene datasets MIT Places 205 and MIT Places 365-Standard. The results on both datasets show that the proposed model effectively handles slow convergence, overfitting, and degradation. CNNs with deep supervision (CNDS) add supplementary supervision branches to the deep convolutional neural network at specified layers to counteract vanishing gradients, effectively addressing delayed convergence and overfitting. Nevertheless, CNDS does not resolve degradation; hence, we add residual learning to CNDS at certain layers, after studying the best places to add it. With this approach we overcome degradation in the very deep network. We built two models, Residual-CNDS 8 and Residual-CNDS 10, tested them on the two large-scale datasets, and compared our results with other recently introduced cutting-edge networks in terms of top-1 and top-5 classification accuracy. Both models show good improvement, which supports the assertion that adding residual connections enhances CNDS network accuracy without adding any computational complexity. |
Tasks | |
Published | 2019-01-13 |
URL | http://arxiv.org/abs/1902.10030v1 |
http://arxiv.org/pdf/1902.10030v1.pdf | |
PWC | https://paperswithcode.com/paper/residual-cnds-for-grand-challenge-scene |
Repo | |
Framework | |
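A schematic PyTorch sketch of the two ingredients the abstract combines, a residual (identity-shortcut) block against degradation and an auxiliary supervision branch at an intermediate layer against slow convergence; layer sizes and the auxiliary loss weight are our placeholders:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))     # identity shortcut

class ResidualCNDS(nn.Module):
    def __init__(self, n_classes=205):           # e.g. MIT Places 205
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3),
                                  nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.stage1 = ResidualBlock(64)
        self.stage2 = ResidualBlock(64)
        # Auxiliary branch supervised at an intermediate layer (the CNDS idea).
        self.aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(64, n_classes))
        self.main_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                       nn.Linear(64, n_classes))

    def forward(self, x):
        h = self.stage1(self.stem(x))
        aux_logits = self.aux_head(h)
        main_logits = self.main_head(self.stage2(h))
        return main_logits, aux_logits

# loss = ce(main_logits, y) + 0.3 * ce(aux_logits, y)   # weighted auxiliary loss
```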
Capturing the Production of the Innovative Ideas: An Online Social Network Experiment and “Idea Geography” Visualization
Title | Capturing the Production of the Innovative Ideas: An Online Social Network Experiment and “Idea Geography” Visualization |
Authors | Yiding Cao, Yingjun Dong, Minjun Kim, Neil G. MacLaren, Ankita Kulkarni, Shelley D. Dionne, Francis J. Yammarino, Hiroki Sayama |
Abstract | Collective design and innovation are crucial in organizations. To investigate how collective design and innovation processes are affected by the diversity of individual members' knowledge and backgrounds, we conducted three collaborative design task experiments involving nearly 300 participants who worked together anonymously in a social network structure using a custom-made computer-mediated collaboration platform. We compared idea generation activity across three background distribution conditions (clustered, random, and dispersed) with the help of the “doc2vec” text representation machine learning algorithm. We also developed a new method called “Idea Geography” to visualize the idea utility terrain on a 2D problem domain. The results showed that groups with random background allocation tended to produce the best design ideas, with the highest utility values. The results also suggest that the diversity of participants' backgrounds and their distribution over the network may interact to affect the diversity of the ideas generated. The proposed idea geography successfully visualized that the collective design processes did find the high-utility area through exploration and exploitation in collaborative work. |
Tasks | |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06353v2 |
https://arxiv.org/pdf/1911.06353v2.pdf | |
PWC | https://paperswithcode.com/paper/capturing-the-production-of-the-innovative |
Repo | |
Framework | |
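The text-representation step is concrete enough to sketch: embed idea texts with gensim's doc2vec, then project to 2D so idea utility can be visualized over the plane. The idea texts and parameters below are placeholders, and the 2D projection (PCA here) is our simplification:

```python
# Embed free-text ideas and place them on a 2D "idea geography" plane.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.decomposition import PCA

ideas = ["use a shared whiteboard", "rotate team members weekly",
         "crowdsource designs from outside the team"]
docs = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(ideas)]
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=100)

vectors = [model.infer_vector(t.split()) for t in ideas]
xy = PCA(n_components=2).fit_transform(vectors)   # 2D idea coordinates
# Interpolating utility ratings over these coordinates yields the terrain.
```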
Hidden Covariate Shift: A Minimal Assumption For Domain Adaptation
Title | Hidden Covariate Shift: A Minimal Assumption For Domain Adaptation |
Authors | Victor Bouvier, Philippe Very, Céline Hudelot, Clément Chastagnol |
Abstract | Unsupervised Domain Adaptation aims to learn a model on a labeled source domain in order to perform well on unlabeled data from a target domain. Current approaches focus on learning \textit{Domain Invariant Representations}, relying on the assumption that such representations are well-suited for learning the supervised task in the target domain. We rather believe that a better and more minimal assumption for performing Domain Adaptation is the \textit{Hidden Covariate Shift} hypothesis. This approach consists of learning a representation of the data such that the label distribution conditioned on this representation is domain invariant. From the Hidden Covariate Shift assumption, we derive an optimization procedure that learns to match an estimated joint distribution on the target domain with a re-weighted joint distribution on the source domain. The re-weighting is done in the representation space and is learned during the optimization procedure. We show on synthetic and real-world data that our approach deals with both \textit{Target Shift} and \textit{Concept Drift}. We report state-of-the-art performance on the Amazon Reviews dataset \cite{blitzer2007biographies}, demonstrating the viability of this approach. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12299v1 |
https://arxiv.org/pdf/1907.12299v1.pdf | |
PWC | https://paperswithcode.com/paper/hidden-covariate-shift-a-minimal-assumption |
Repo | |
Framework | |
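In symbols (our notation, reconstructed from the abstract), the Hidden Covariate Shift assumption and the re-weighted matching it induces read:

```latex
% Hidden Covariate Shift: a representation z = \phi(x) exists such that the
% conditional label distribution is domain invariant, even though the
% marginal of z may shift between source (S) and target (T):
\[
  p_S\big(y \mid \phi(x)\big) = p_T\big(y \mid \phi(x)\big),
  \qquad p_S\big(\phi(x)\big) \neq p_T\big(\phi(x)\big) \text{ in general.}
\]
% The optimization matches the estimated target joint against a re-weighted
% source joint in representation space, the weights being learned:
\[
  p_T\big(\phi(x), y\big) \approx w\big(\phi(x)\big)\, p_S\big(\phi(x), y\big),
  \qquad w(z) \approx \frac{p_T(z)}{p_S(z)}.
\]
```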
Audio De-identification: A New Entity Recognition Task
Title | Audio De-identification: A New Entity Recognition Task |
Authors | Ido Cohn, Itay Laish, Genady Beryozkin, Gang Li, Izhak Shafran, Idan Szpektor, Tzvika Hartman, Avinatan Hassidim, Yossi Matias |
Abstract | Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it. |
Tasks | Named Entity Recognition, Speech Recognition |
Published | 2019-03-17 |
URL | https://arxiv.org/abs/1903.07037v2 |
https://arxiv.org/pdf/1903.07037v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-de-identification-a-new-entity |
Repo | |
Framework | |
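The three-stage pipeline can be summarized in a few lines; `run_asr`, `find_entities`, and `align_words` below are hypothetical stand-ins for an ASR system, an NER tagger, and a forced aligner, not APIs from the paper:

```python
# ASR -> NER on the transcript -> text-to-audio alignment -> redaction spans.
from dataclasses import dataclass

@dataclass
class AudioSpan:
    start_s: float   # redaction interval in seconds
    end_s: float

# Hypothetical stand-ins so the sketch runs; real components go here.
def run_asr(audio): return ["i", "saw", "doctor", "smith"]
def find_entities(words): return [3]   # index of "smith", tagged as a name
def align_words(audio, words):
    return [(i * 0.4, (i + 1) * 0.4) for i in range(len(words))]

def deidentify(audio):
    words = run_asr(audio)               # 1. ASR: transcript as a word list
    entity_idx = find_entities(words)    # 2. NER on the transcript text
    timings = align_words(audio, words)  # 3. text-to-audio alignment
    # Redact the audio spans whose words were tagged as entity mentions.
    return [AudioSpan(*timings[i]) for i in entity_idx]
```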
Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
Title | Learning to Combat Compounding-Error in Model-Based Reinforcement Learning |
Authors | Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller |
Abstract | Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for choosing the planning horizon on a state-dependent basis, where a state’s planning horizon is determined by the maximum cumulative model error around that state. We demonstrate that these state-dependent model errors can be learned with Temporal Difference methods, based on a novel approach of temporally decomposing the cumulative model errors. Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines. |
Tasks | |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11206v1 |
https://arxiv.org/pdf/1912.11206v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-combat-compounding-error-in-model-1 |
Repo | |
Framework | |
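A minimal tabular sketch of the core trick as the abstract states it: decompose cumulative model error temporally so it can be learned with TD updates. Notation and hyperparameters are ours, not the paper's:

```python
# Learn e(s), an estimate of cumulative model error around state s, by TD.
import numpy as np

def td_model_error(transitions, model, n_states, gamma=0.9, lr=0.1):
    """transitions: iterable of (s, a, s_next); model(s, a) -> predicted s_next."""
    e = np.zeros(n_states)
    for s, a, s_next in transitions:
        one_step = float(model(s, a) != s_next)     # local model error
        # TD target: error incurred now plus discounted error downstream.
        target = one_step + gamma * e[s_next]
        e[s] += lr * (target - e[s])
    return e   # the planning horizon can then shrink where e(s) is large
```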
Leveraging Simple Model Predictions for Enhancing its Performance
Title | Leveraging Simple Model Predictions for Enhancing its Performance |
Authors | Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss |
Abstract | There has been recent interest in improving the performance of simple models for multiple reasons, such as interpretability, robust learning from small data, deployment in memory-constrained settings, and environmental considerations. In this paper, we propose a novel method, SRatio, that can utilize information from high-performing complex models (viz. deep neural networks, boosted trees, random forests) to reweight a training dataset for a potentially low-performing simple model, such as a decision tree or a shallow network, enhancing its performance. Our method also leverages the simple model's per-sample hardness estimate, unlike prior works, which primarily consider the complex model's confidences/predictions; it is thus conceptually novel. Moreover, we generalize and formalize the concept of attaching probes to intermediate layers of a neural network to other commonly used classifiers and incorporate this into our method. The benefit of these contributions is witnessed in the experiments, where on 6 UCI datasets and CIFAR-10 we outperform competitors in the majority (16 out of 27) of cases and tie for best performance in the remaining ones. In fact, in a couple of cases, we even approach the complex model's performance. We also conduct further experiments to validate our assertions and to understand intuitively why our method works. Theoretically, we motivate our approach by showing that the weighted loss minimized by simple models using our weighting upper-bounds the loss of the complex model. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13565v2 |
https://arxiv.org/pdf/1905.13565v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-simple-model-predictions-for |
Repo | |
Framework | |
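One plausible reading of the reweighting (ours, not necessarily the paper's exact rule) weights each sample by the ratio of the complex model's confidence to the simple model's, the latter's low confidence serving as the per-sample hardness estimate:

```python
# Illustrative confidence-ratio reweighting; not the paper's exact formula.
import numpy as np

def ratio_weights(p_complex, p_simple, eps=1e-6, clip=10.0):
    """p_complex, p_simple: each model's probability for the true label."""
    w = p_complex / np.clip(p_simple, eps, None)
    return np.clip(w, 0.0, clip)   # clip to keep the reweighted loss stable

# weights = ratio_weights(pc, ps)
# simple_model.fit(X, y, sample_weight=weights)
```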
LS-Tree: Model Interpretation When the Data Are Linguistic
Title | LS-Tree: Model Interpretation When the Data Are Linguistic |
Authors | Jianbo Chen, Michael I. Jordan |
Abstract | We study the problem of interpreting trained classification models in the setting of linguistic data sets. Leveraging a parse tree, we propose to assign least-squares based importance scores to each word of an instance by exploiting syntactic constituency structure. We establish an axiomatic characterization of these importance scores by relating them to the Banzhaf value in coalitional game theory. Based on these importance scores, we develop a principled method for detecting and quantifying interactions between words in a sentence. We demonstrate that the proposed method can aid in interpretability and diagnostics for several widely-used language models. |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.04187v1 |
http://arxiv.org/pdf/1902.04187v1.pdf | |
PWC | https://paperswithcode.com/paper/ls-tree-model-interpretation-when-the-data |
Repo | |
Framework | |
PNUNet: Anomaly Detection using Positive-and-Negative Noise based on Self-Training Procedure
Title | PNUNet: Anomaly Detection using Positive-and-Negative Noise based on Self-Training Procedure |
Authors | Masanari Kimura |
Abstract | We propose a novel framework for anomaly detection in images. Our new framework, PNUNet, is based on many normal samples and few anomalous samples. We assume that some noise is added to the input images and learn to remove it. In addition, the proposed method achieves a significant performance improvement by updating the noise assumed in the inputs using a self-training framework. Experimental results on benchmark datasets show the usefulness of our new anomaly detection framework. |
Tasks | Anomaly Detection |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.10939v1 |
https://arxiv.org/pdf/1905.10939v1.pdf | |
PWC | https://paperswithcode.com/paper/pnunet-anomaly-detection-using-positive-and |
Repo | |
Framework | |
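A simplified sketch of the denoising idea the abstract builds on (our own simplification, not PNUNet itself): train a network to remove noise added to normal images, then score anomalies by reconstruction error, since noise on unfamiliar content is removed poorly:

```python
import torch
import torch.nn as nn

denoiser = nn.Sequential(                 # stand-in for a U-Net style model
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))

def train_step(x_normal, optimizer, noise_std=0.1):
    noisy = x_normal + noise_std * torch.randn_like(x_normal)
    loss = nn.functional.mse_loss(denoiser(noisy), x_normal)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def anomaly_score(x, noise_std=0.1):      # per-image reconstruction error
    noisy = x + noise_std * torch.randn_like(x)
    err = nn.functional.mse_loss(denoiser(noisy), x, reduction="none")
    return err.mean(dim=(1, 2, 3))
```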
Regular Expressions with Backreferences: Polynomial-Time Matching Techniques
Title | Regular Expressions with Backreferences: Polynomial-Time Matching Techniques |
Authors | Markus L. Schmid |
Abstract | Regular expressions with backreferences (regex, for short), as supported by most modern libraries for regular expression matching, have an NP-complete matching problem. We define a complexity parameter of regex, called active variable degree, such that regex with this parameter bounded by a constant can be matched in polynomial-time. Moreover, we formulate a novel type of determinism for regex (on an automaton-theoretic level), which yields the class of memory-deterministic regex that can be matched in time O(|w| · p(|r|)) for a polynomial p (where r is the regex and w the word). Natural extensions of these concepts lead to properties of regex that are intractable to check. |
Tasks | |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05896v1 |
http://arxiv.org/pdf/1903.05896v1.pdf | |
PWC | https://paperswithcode.com/paper/regular-expressions-with-backreferences |
Repo | |
Framework | |
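For a concrete feel for the feature, Python's `re` module supports backreferences. The repeated-word language below is not regular, yet a single backreference captures it; it is this remembering of captured text that makes matching NP-complete when the number of capture groups is unbounded:

```python
import re

# ww (a word repeated) is not a regular language, but \1 expresses it:
print(bool(re.fullmatch(r"(\w+)\1", "hoho")))   # True: group 1 = "ho"
print(bool(re.fullmatch(r"(a+)\1", "aaaa")))    # True: "aa" repeated
print(bool(re.fullmatch(r"(a+)\1", "aaa")))     # False: odd length, no equal split
```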