Paper Group ANR 84
Disentangled Speech Embeddings using Cross-modal Self-supervision
Title | Disentangled Speech Embeddings using Cross-modal Self-supervision |
Authors | Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman |
Abstract | The objective of this paper is to learn representations of speaker identity without access to manually annotated data. To do so, we develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in video. The key idea behind our approach is to tease apart, without annotation, the representations of linguistic content and speaker identity. We construct a two-stream architecture which: (1) shares low-level features common to both representations; and (2) provides a natural mechanism for explicitly disentangling these factors, offering the potential for greater generalisation to novel combinations of content and identity and ultimately producing speaker identity representations that are more robust. We train our method on a large-scale audio-visual dataset of talking heads ‘in the wild’, and demonstrate its efficacy by evaluating the learned speaker representations for standard speaker recognition performance. |
Tasks | Speaker Recognition |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08742v1, https://arxiv.org/pdf/2002.08742v1.pdf |
PWC | https://paperswithcode.com/paper/disentangled-speech-embeddings-using-cross |
Repo | |
Framework | |
Towards Learning Multi-agent Negotiations via Self-Play
Title | Towards Learning Multi-agent Negotiations via Self-Play |
Authors | Yichuan Charlie Tang |
Abstract | Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents’ intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In contrast, deep reinforcement learning (Deep RL) has been very effective at finding policies by simultaneously exploring, interacting, and learning from environments. Leveraging the powerful Deep RL paradigm, we demonstrate that an iterative procedure of self-play can create progressively more diverse environments, leading to the learning of sophisticated and robust multi-agent policies. We demonstrate this in a challenging multi-agent simulation of merging traffic, where agents must interact and negotiate with others in order to successfully merge on or off the road. While the environment starts off simple, we increase its complexity by iteratively adding an increasingly diverse set of agents to the agent “zoo” as training progresses. Qualitatively, we find that through self-play, our policies automatically learn interesting behaviors such as defensive driving, overtaking, yielding, and the use of signal lights to communicate intentions to other agents. In addition, quantitatively, we show a dramatic improvement of the success rate of merging maneuvers from 63% to over 98%. |
Tasks | |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10208v1, https://arxiv.org/pdf/2001.10208v1.pdf |
PWC | https://paperswithcode.com/paper/towards-learning-multi-agent-negotiations-via |
Repo | |
Framework | |
Robust Speaker Recognition Using Speech Enhancement And Attention Model
Title | Robust Speaker Recognition Using Speech Enhancement And Attention Model |
Authors | Yanpei Shi, Qiang Huang, Thomas Hain |
Abstract | In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of processing speech enhancement and speaker recognition individually, the two modules are integrated into one framework by joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker-related features learned from context information in the time and frequency domains. To evaluate the speaker identification and verification performance of the proposed approach, we test it on VoxCeleb1, one of the most widely used benchmark datasets. Moreover, the robustness of our proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference (general noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines not using them in most acoustic conditions in our experiments. |
Tasks | Speaker Identification, Speaker Recognition, Speech Enhancement |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05031v1, https://arxiv.org/pdf/2001.05031v1.pdf |
PWC | https://paperswithcode.com/paper/robust-speaker-recognition-using-speech |
Repo | |
Framework | |
An Experiment in Morphological Development for Learning ANN Based Controllers
Title | An Experiment in Morphological Development for Learning ANN Based Controllers |
Authors | M. Naya-Varela, A. Faina, R. J. Duro |
Abstract | Morphological development is part of the way any human or animal learns. The learning process starts with the morphology at birth and progresses through changing morphologies until adulthood is reached. Biologically, this seems to facilitate learning and make it more robust. However, when this approach is transferred to robotic systems, the results found in the literature are inconsistent: morphological development does not provide a learning advantage in every case. In fact, it can lead to poorer results than when learning with a fixed morphology. In this paper we analyze some of the issues involved by means of a simple but very informative experiment in quadruped walking. From the results obtained, an initial series of insights on when and under what conditions to apply morphological development for learning is presented. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.07195v1, https://arxiv.org/pdf/2003.07195v1.pdf |
PWC | https://paperswithcode.com/paper/an-experiment-in-morphological-development |
Repo | |
Framework | |
Intelligent Arxiv: Sort daily papers by learning users topics preference
Title | Intelligent Arxiv: Sort daily papers by learning users topics preference |
Authors | Ezequiel Alvarez, Federico Lamagna, Cesar Miquel, Manuel Szewc |
Abstract | Current daily paper releases are becoming increasingly large and areas of research are growing in diversity. This makes it harder for scientists to keep up to date with the current state of the art and identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper as a combination of scientific knowledge from diverse topics applied to a new problem. In light of this, we apply the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) to the corpus of papers in a given field to: i) define and extract underlying topics in the corpus; ii) get the topic weight vector for each paper in the corpus; and iii) get the topic weight vector for new papers. By registering papers preferred by a user, we build a user vector of weights using the information of the vectors of the selected papers. Hence, by performing an inner product between the user vector and the vector of each paper in the daily Arxiv release, we can sort the papers according to the user's preference on the underlying topics. We have created the website IArxiv.org where users can read sorted daily Arxiv releases (and more) while the algorithm learns each user's preference, yielding a more accurate sorting every day. The current IArxiv.org version runs on the Arxiv categories astro-ph, gr-qc, hep-ph and hep-th, and we plan to extend it to others. We propose several new useful and relevant implementations to be additionally developed as well as new Machine Learning techniques beyond LDA to further improve the accuracy of this new tool. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02460v1, https://arxiv.org/pdf/2002.02460v1.pdf |
PWC | https://paperswithcode.com/paper/intelligent-arxiv-sort-daily-papers-by |
Repo | |
Framework | |
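The sorting step described in the abstract above (an inner product between a user vector and each paper's topic weight vector) can be sketched as follows. This is a minimal illustration, not the IArxiv.org implementation: the function names, the three-topic example weights, and the choice to build the user vector as a plain average of the preferred papers' vectors are all assumptions, since the abstract does not say how the user vector is combined.

```python
from typing import Dict, List, Sequence


def build_user_vector(liked: Sequence[Sequence[float]]) -> List[float]:
    """Average the topic-weight vectors of the papers a user preferred.

    Averaging is an assumption made for this sketch; the abstract only says
    the user vector is built from the vectors of the selected papers.
    """
    if not liked:
        raise ValueError("need at least one preferred paper")
    n_topics = len(liked[0])
    return [sum(vec[k] for vec in liked) / len(liked) for k in range(n_topics)]


def rank_papers(user: Sequence[float], papers: Dict[str, Sequence[float]]) -> List[str]:
    """Sort paper ids by inner product with the user vector, highest first."""
    def score(paper_id: str) -> float:
        return sum(u * w for u, w in zip(user, papers[paper_id]))

    return sorted(papers, key=score, reverse=True)


# Hypothetical topic weights; in practice they would come from a fitted LDA model.
liked_papers = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
daily_release = {
    "paper_a": [0.1, 0.1, 0.8],
    "paper_b": [0.7, 0.2, 0.1],
    "paper_c": [0.3, 0.6, 0.1],
}

user_vector = build_user_vector(liked_papers)
ranking = rank_papers(user_vector, daily_release)
print(ranking)
```

With these made-up weights the user leans toward the first topic, so the paper dominated by that topic is ranked first.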
Majority Voting and the Condorcet’s Jury Theorem
Title | Majority Voting and the Condorcet’s Jury Theorem |
Authors | Hanan Shteingart, Eran Marom, Igor Itkin, Gil Shabat, Michael Kolomenkin, Moshe Salhov, Liran Katzir |
Abstract | There is a striking relationship between a centuries-old Political Science theorem named “Condorcet’s jury theorem” (1785), which states that majorities are more likely to choose correctly when individual votes are often correct and independent, and a modern Machine Learning concept called “Strength of Weak Learnability” (1990), which describes a method for converting a weak learning algorithm into one that achieves arbitrarily high accuracy and forms the basis of Ensemble Learning. Despite the intuitive statement of Condorcet’s theorem, we could not find a compact and simple rigorous mathematical proof of the theorem either in classical handbooks of Machine Learning or in published papers. By no means do we claim to discover or reinvent a theory or a result. We humbly want to offer a more publicly available simple derivation of the theorem. We will find joy in seeing more teachers of introduction-to-machine-learning courses use the proof we provide here as an exercise to explain the motivation of ensemble learning. |
Tasks | |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03153v2, https://arxiv.org/pdf/2002.03153v2.pdf |
PWC | https://paperswithcode.com/paper/majority-voting-and-the-condorcets-jury |
Repo | |
Framework | |
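The statement of Condorcet's jury theorem quoted in the abstract above can be checked numerically. The sketch below is not the paper's proof; it only evaluates the binomial sum for the probability that a simple majority is correct, assuming an odd number of independent voters who are each correct with the same probability p. The function name is hypothetical.

```python
from math import comb


def majority_correct_probability(n_voters: int, p: float) -> float:
    """Probability that a simple majority of independent voters is correct.

    Assumes n_voters is odd (no ties) and that every voter is independently
    correct with the same probability p.
    """
    if n_voters < 1 or n_voters % 2 == 0:
        raise ValueError("n_voters must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    majority = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p**k * (1.0 - p) ** (n_voters - k)
        for k in range(majority, n_voters + 1)
    )


# With p above 0.5 the majority becomes more reliable as the group grows.
for n in (1, 3, 11, 101):
    print(n, majority_correct_probability(n, 0.6))
```

For p above 0.5 the printed probabilities increase with the group size, which is the intuition behind combining many weak learners by majority vote.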
Local Propagation in Constraint-based Neural Network
Title | Local Propagation in Constraint-based Neural Network |
Authors | Giuseppe Marra, Matteo Tiezzi, Stefano Melacci, Alessandro Betti, Marco Maggini, Marco Gori |
Abstract | In this paper we study a constraint-based representation of neural network architectures. We cast the learning problem in the Lagrangian framework and we investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints, learning from the available supervisions. The computational structure of the proposed Local Propagation (LP) algorithm is based on the search for saddle points in the adjoint space composed of weights, neural outputs, and Lagrange multipliers. All the updates of the model variables are locally performed, so that LP is fully parallelizable over the neural units, circumventing the classic problem of gradient vanishing in deep networks. The implementation of popular neural models is described in the context of LP, together with those conditions that trace a natural connection with Backpropagation. We also investigate the setting in which we tolerate bounded violations of the architectural constraints, and we provide experimental evidence that LP is a feasible approach to train shallow and deep networks, opening the road to further investigations on more complex architectures, easily describable by constraints. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07720v1, https://arxiv.org/pdf/2002.07720v1.pdf |
PWC | https://paperswithcode.com/paper/local-propagation-in-constraint-based-neural |
Repo | |
Framework | |
Gastric histopathology image segmentation using a hierarchical conditional random field
Title | Gastric histopathology image segmentation using a hierarchical conditional random field |
Authors | Changhao Sun, Chen Li, Xiaoyan Li |
Abstract | In this paper, a Hierarchical Conditional Random Field (HCRF) model based Gastric Histopathology Image Segmentation (GHIS) method is proposed, which can localize abnormal (cancer) regions in gastric histopathology images obtained by optical microscope to assist histopathologists in medical work. First, to obtain pixel-level segmentation information, we retrain a Convolutional Neural Network (CNN) to build up our pixel-level potentials. Second, in order to obtain abundant spatial segmentation information at the patch level, we fine-tune another three CNNs to build up our patch-level potentials. Third, based on the pixel- and patch-level potentials, our HCRF model is structured. Finally, graph-based post-processing is applied to further improve our segmentation performance. In the experiment, a segmentation accuracy of 78.91% is achieved on a Hematoxylin and Eosin (H&E) stained gastric histopathological dataset with 560 images, showing the effectiveness and future potential of the proposed GHIS method. |
Tasks | Semantic Segmentation |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01302v2, https://arxiv.org/pdf/2003.01302v2.pdf |
PWC | https://paperswithcode.com/paper/gastric-histopathology-image-segmentation |
Repo | |
Framework | |
Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEs
Title | Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEs |
Authors | Varun Mannam, Arman Kazemi |
Abstract | Extracting large amounts of data from biological samples is not feasible due to radiation issues, and image processing in the small-data regime is one of the critical challenges when working with a limited amount of data. In this work, we applied an existing algorithm, the Variational Auto Encoder (VAE), which pre-trains a latent space representation of the data to capture the features in a lower dimension for small-data regime input. The fine-tuned latent space provides constant weights that are useful for classification. Here we present the performance analysis of the VAE algorithm with different latent space sizes in semi-supervised learning using the CIFAR-10 dataset. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.12164v1, https://arxiv.org/pdf/2002.12164v1.pdf |
PWC | https://paperswithcode.com/paper/performance-analysis-of-semi-supervised |
Repo | |
Framework | |
Proxy Anchor Loss for Deep Metric Learning
Title | Proxy Anchor Loss for Deep Metric Learning |
Authors | Sungyeon Kim, Dongwon Kim, Minsu Cho, Suha Kwak |
Abstract | Existing metric learning losses can be categorized into two classes: pair-based and proxy-based losses. The former class can leverage fine-grained semantic relations between data points, but slows convergence in general due to its high training complexity. In contrast, the latter class enables fast and reliable convergence, but cannot consider the rich data-to-data relations. This paper presents a new proxy-based loss that takes advantage of both pair- and proxy-based methods and overcomes their limitations. Thanks to the use of proxies, our loss boosts the speed of convergence and is robust against noisy labels and outliers. At the same time, it allows embedding vectors of data to interact with each other in its gradients to exploit data-to-data relations. Our method is evaluated on four public benchmarks, where a standard network trained with our loss achieves state-of-the-art performance and converges most quickly. |
Tasks | Metric Learning |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13911v1, https://arxiv.org/pdf/2003.13911v1.pdf |
PWC | https://paperswithcode.com/paper/proxy-anchor-loss-for-deep-metric-learning |
Repo | |
Framework | |
Kalman Recursions Aggregated Online
Title | Kalman Recursions Aggregated Online |
Authors | Eric Adjakossa, Yannig Goude, Olivier Wintenberger |
Abstract | In this article, we aim to improve the prediction of expert aggregation by using the underlying properties of the models that provide expert predictions. We restrict ourselves to the case where expert predictions come from Kalman recursions, fitting state-space models. By using exponential weights, we construct different algorithms of Kalman recursions Aggregated Online (KAO) that compete with the best expert or the best convex combination of experts in a more or less adaptive way. We improve the existing results in the expert aggregation literature when the experts are Kalman recursions by taking advantage of the second-order properties of the Kalman recursions. We apply our approach to Kalman recursions and extend it to the general adversarial expert setting by state-space modeling the errors of the experts. We apply these new algorithms to a real dataset of electricity consumption and show how they can improve forecast performance compared to other exponentially weighted average procedures. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.12173v1, https://arxiv.org/pdf/2002.12173v1.pdf |
PWC | https://paperswithcode.com/paper/kalman-recursions-aggregated-online |
Repo | |
Framework | |
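The abstract above builds on aggregating expert predictions with exponential weights. As background only, here is a sketch of the generic exponentially weighted average forecaster with squared loss. It is not the KAO algorithm, which additionally exploits properties of Kalman recursions; the function name, the learning rate `eta`, and the toy data are assumptions made for illustration.

```python
from math import exp
from typing import List, Sequence, Tuple


def exponential_weights_forecast(
    expert_predictions: Sequence[Sequence[float]],
    outcomes: Sequence[float],
    eta: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Run a generic exponentially weighted average forecaster.

    expert_predictions[t][i] is expert i's prediction at step t.
    Returns the aggregated forecasts and the weights used at the last step.
    """
    n_experts = len(expert_predictions[0])
    cumulative_loss = [0.0] * n_experts
    forecasts: List[float] = []
    weights = [1.0 / n_experts] * n_experts
    for preds, y in zip(expert_predictions, outcomes):
        # Weights depend only on losses observed before the current step.
        best = min(cumulative_loss)  # subtracted for numerical stability
        raw = [exp(-eta * (loss - best)) for loss in cumulative_loss]
        total = sum(raw)
        weights = [r / total for r in raw]
        forecasts.append(sum(w * p for w, p in zip(weights, preds)))
        # Update each expert's cumulative squared loss once y is revealed.
        for i, p in enumerate(preds):
            cumulative_loss[i] += (p - y) ** 2
    return forecasts, weights


# Toy data: expert 0 is always right, expert 1 is always wrong.
predictions = [[1.0, 0.0]] * 5
observed = [1.0] * 5
forecasts, final_weights = exponential_weights_forecast(predictions, observed)
print(forecasts)
```

The aggregated forecast starts at the uniform average of the experts and moves toward the better expert as its cumulative loss advantage grows.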
A multimodal deep learning approach for named entity recognition from social media
Title | A multimodal deep learning approach for named entity recognition from social media |
Authors | Meysam Asgari-Chenaghlu, M. Reza Feizi-Derakhshi, Leili Farzinvash, Cina Motamed |
Abstract | Named Entity Recognition (NER) from social media posts is a challenging task. User-generated content, which forms the nature of social media, is noisy and contains grammatical and linguistic errors. This noisy content makes tasks such as named entity recognition much harder. However, some applications, like automatic journalism or information retrieval from social media, require more information about entities mentioned in groups of social media posts. Conventional methods provide acceptable results when applied to structured and well-typed documents, but they are not satisfactory on new user-generated media. One valuable piece of information about an entity is the image related to the text. Combining this multimodal data reduces ambiguity and provides wider information about the entities mentioned. In order to address this issue, we propose a novel deep learning approach utilizing multimodal deep learning. Our solution is able to provide more accurate results on the named entity recognition task. Experimental results, namely the precision, recall and F1 score metrics, show the superiority of our work compared to other state-of-the-art NER solutions. |
Tasks | Information Retrieval, Named Entity Recognition |
Published | 2020-01-19 |
URL | https://arxiv.org/abs/2001.06888v1, https://arxiv.org/pdf/2001.06888v1.pdf |
PWC | https://paperswithcode.com/paper/a-multimodal-deep-learning-approach-for-named |
Repo | |
Framework | |
A Swiss German Dictionary: Variation in Speech and Writing
Title | A Swiss German Dictionary: Variation in Speech and Writing |
Authors | Larissa Schmidt, Lucy Linder, Sandra Djambazovska, Alexandros Lazaridis, Tanja Samardžić, Claudiu Musat |
Abstract | We introduce a dictionary containing forms of common words in various Swiss German dialects normalized into High German. As Swiss German is, for now, a predominantly spoken language, there is significant variation in the written forms, even between speakers of the same dialect. To alleviate the uncertainty associated with this diversity, we complement the pairs of Swiss German - High German words with the Swiss German phonetic transcriptions (SAMPA). This dictionary thus becomes the first resource to combine large-scale spontaneous translation with phonetic transcriptions. Moreover, we control for the regional distribution and ensure the equal representation of the major Swiss dialects. The coupling of the phonetic and written Swiss German forms is powerful. We show that they are sufficient to train a Transformer-based phoneme-to-grapheme model that generates credible novel Swiss German writings. In addition, we show that the inverse mapping - from graphemes to phonemes - can be modeled with a transformer trained with the novel dictionary. This generation of pronunciations for previously unknown words is key in training extensible automated speech recognition (ASR) systems, which are key beneficiaries of this dictionary. |
Tasks | Speech Recognition |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2004.00139v1, https://arxiv.org/pdf/2004.00139v1.pdf |
PWC | https://paperswithcode.com/paper/a-swiss-german-dictionary-variation-in-speech |
Repo | |
Framework | |
Stochastic Local Interaction Model: Geostatistics without Kriging
Title | Stochastic Local Interaction Model: Geostatistics without Kriging |
Authors | Dionissios T. Hristopulos, Andreas Pavlides, Vasiliki D. Agou, Panagiota Gkafa |
Abstract | Classical geostatistical methods face serious computational challenges when confronted with large sets of spatially distributed data. We present a simplified stochastic local interaction (SLI) model for computationally efficient spatial prediction that can handle large data. The SLI method constructs a spatial interaction matrix (precision matrix) that accounts for the data values, their locations, and the sampling density variations without user input. We show that this precision matrix is strictly positive definite. The SLI approach does not require matrix inversion for parameter estimation, spatial prediction, and uncertainty estimation, leading to procedures that are significantly less computationally intensive than kriging. The precision matrix involves compact kernel functions (spherical, quadratic, etc.) which enable the application of sparse matrix methods, thus improving computational time and memory requirements. We investigate the proposed SLI method with a data set that includes approximately 11500 drill-hole measurements of coal thickness from Campbell County (Wyoming, USA). We also compare SLI with ordinary kriging (OK) in terms of estimation performance, using cross-validation analysis, and computational time. According to the cross-validation measures used, SLI performs slightly better than OK in estimating seam thickness. In terms of computation time, SLI prediction is 3 to 25 times faster than OK (depending on the size of the kriging neighborhood) for the same grid size. |
Tasks | |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.02246v1, https://arxiv.org/pdf/2001.02246v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-local-interaction-model |
Repo | |
Framework | |
Diversity and Inclusion Metrics in Subset Selection
Title | Diversity and Inclusion Metrics in Subset Selection |
Authors | Margaret Mitchell, Dylan Baker, Nyalleng Moorosi, Emily Denton, Ben Hutchinson, Alex Hanna, Timnit Gebru, Jamie Morgenstern |
Abstract | The ethical concept of fairness has recently been applied in machine learning (ML) settings to describe a wide range of constraints and objectives. When considering the relevance of ethical concepts to subset selection problems, the concepts of diversity and inclusion are additionally applicable in order to create outputs that account for social power and access differentials. We introduce metrics based on these concepts, which can be applied together, separately, and in tandem with additional fairness constraints. Results from human subject experiments lend support to the proposed criteria. Social choice methods can additionally be leveraged to aggregate and choose preferable sets, and we detail how these may be applied. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03256v1, https://arxiv.org/pdf/2002.03256v1.pdf |
PWC | https://paperswithcode.com/paper/diversity-and-inclusion-metrics-in-subset |
Repo | |
Framework | |