Paper Group ANR 403
Solving Tree Problems with Category Theory
Title | Solving Tree Problems with Category Theory |
Authors | Rafik Hadfi |
Abstract | Artificial Intelligence (AI) has long pursued models, theories, and techniques to imbue machines with human-like general intelligence. Yet even the currently predominant data-driven approaches in AI seem to lack humans' unique ability to solve wide ranges of problems. This situation raises the question of whether there exist principles that underlie general problem-solving capabilities. We approach this question through the mathematical formulation of analogies across different problems and solutions. We focus in particular on problems that can be represented as tree-like structures. Most importantly, we adopt a category-theoretic approach in formalising tree problems as categories, and in proving the existence of equivalences across apparently unrelated problem domains. We prove the existence of a functor between the category of tree problems and the category of solutions. We also provide a weaker version of the functor by quantifying equivalences of problem categories using a metric on tree problems. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07307v1 |
PDF | http://arxiv.org/pdf/1810.07307v1.pdf |
PWC | https://paperswithcode.com/paper/solving-tree-problems-with-category-theory |
Repo | |
Framework | |
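As a purely illustrative piece of notation (the category names below are hypothetical and not taken from the paper), the functorial correspondence described in the abstract can be sketched as a mapping from tree problems to solutions that preserves composition and identities:

```latex
% Illustrative only: \mathcal{T} stands for a hypothetical category of tree problems,
% \mathcal{S} for a hypothetical category of solutions.
F \colon \mathcal{T} \longrightarrow \mathcal{S}, \qquad
F(g \circ f) = F(g) \circ F(f), \qquad
F(\mathrm{id}_{T}) = \mathrm{id}_{F(T)} \quad \text{for every tree problem } T .
```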
Expectation Propagation for Approximate Inference: Free Probability Framework
Title | Expectation Propagation for Approximate Inference: Free Probability Framework |
Authors | Burak Çakmak, Manfred Opper |
Abstract | We study asymptotic properties of expectation propagation (EP) – a method for approximate inference originally developed in the field of machine learning. Applied to generalized linear models, EP iteratively computes a multivariate Gaussian approximation to the exact posterior distribution. The computational complexity of the repeated update of covariance matrices severely limits the application of EP to large problem sizes. In this study, we present a rigorous analysis by means of free probability theory that allows us to overcome this computational bottleneck if specific data matrices in the problem fulfill certain properties of asymptotic freeness. We demonstrate the relevance of our approach on the gene selection problem of a microarray dataset. |
Tasks | |
Published | 2018-01-16 |
URL | http://arxiv.org/abs/1801.05411v2 |
PDF | http://arxiv.org/pdf/1801.05411v2.pdf |
PWC | https://paperswithcode.com/paper/expectation-propagation-for-approximate |
Repo | |
Framework | |
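For context, and without reproducing the paper's free-probability analysis, the generic EP iteration the abstract refers to can be sketched as follows: the exact posterior factorises into sites t_i, each site is replaced by a Gaussian approximation, and each approximation is refined by moment matching against the cavity distribution. The paper's contribution concerns the large-system behaviour of the covariance computations in this loop, which is not shown here.

```latex
% Generic (Minka-style) EP update, shown only to fix notation:
q(\mathbf{x}) \propto \prod_i \tilde t_i(\mathbf{x}), \qquad
q^{\setminus i}(\mathbf{x}) \propto \frac{q(\mathbf{x})}{\tilde t_i(\mathbf{x})}, \qquad
\tilde t_i^{\,\mathrm{new}}(\mathbf{x}) \propto
\frac{\operatorname{proj}\!\left[\, q^{\setminus i}(\mathbf{x})\, t_i(\mathbf{x}) \,\right]}{q^{\setminus i}(\mathbf{x})},
```

where proj[·] denotes projection onto the Gaussian family by moment matching.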
Effective Parallelisation for Machine Learning
Title | Effective Parallelisation for Machine Learning |
Authors | Michael Kamp, Mario Boley, Olana Missura, Thomas Gärtner |
Abstract | We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other parallelisation techniques, it can be applied to a broad class of learning algorithms without further mathematical derivations and without writing dedicated code, while at the same time maintaining theoretical performance guarantees. Moreover, our parallelisation scheme is able to reduce the runtime of many learning algorithms to polylogarithmic time on quasi-polynomially many processing units. This is a significant step towards a general answer to an open question on the efficient parallelisation of machine learning algorithms in the sense of Nick’s Class (NC). The cost of this parallelisation is in the form of a larger sample complexity. Our empirical study confirms the potential of our parallelisation scheme with fixed numbers of processors and instances in realistic application scenarios. |
Tasks | |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03530v1 |
PDF | http://arxiv.org/pdf/1810.03530v1.pdf |
PWC | https://paperswithcode.com/paper/effective-parallelisation-for-machine |
Repo | |
Framework | |
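A minimal sketch of the black-box idea in the abstract: train the same learner independently on data shards in parallel, then aggregate the resulting hypotheses. Plain parameter averaging is used below purely for illustration; it is a simplification and not necessarily the aggregation scheme that gives the paper its theoretical guarantees. The learner is a stand-in least-squares fit.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def train_base_learner(shard):
    """Stand-in black-box learner: least-squares weights on one data shard."""
    X, y = shard
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def parallel_fit(X, y, n_workers=4):
    """Train one learner per shard in parallel, then aggregate the hypotheses.

    Averaging is used here for illustration only; the paper's scheme aggregates
    hypotheses differently to retain its performance guarantees.
    """
    shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        models = list(pool.map(train_base_learner, shards))
    return np.mean(models, axis=0)
```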
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Title | A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment |
Authors | Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhenhua Ling |
Abstract | Voice conversion (VC) aims at converting speaker characteristics without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC therefore usually involves both speaker similarity and quality evaluation by a human panel. Because this is a time-consuming, expensive, and non-reproducible process, it hinders rapid prototyping of new VC technology. We address artifact assessment using an alternative, objective approach leveraging prior work on spoofing countermeasures (CMs) for automatic speaker verification. Therein, CMs are used for rejecting 'fake' inputs such as replayed, synthetic or converted speech, but their potential for automatic speech artifact assessment remains unknown. This study serves to fill that gap. As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts. Equal error rate (EER) of the CM, a confusability index of VC samples with real human speech, serves as our artifact measure. Two clusters of VCC'18 entries are identified: low-quality ones with detectable artifacts (low EERs), and higher-quality ones with fewer artifacts. None of the VCC'18 systems, however, is perfect: all EERs are < 30% (the 'ideal' value would be 50%). Our preliminary findings suggest the potential of CMs outside their original application, as a supplemental optimization and benchmarking tool to enhance VC technology. |
Tasks | Speaker Verification, Voice Conversion |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08438v2 |
PDF | http://arxiv.org/pdf/1804.08438v2.pdf |
PWC | https://paperswithcode.com/paper/a-spoofing-benchmark-for-the-2018-voice |
Repo | |
Framework | |
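The equal error rate used as the artifact measure can be computed from countermeasure scores as below. This is just the standard EER definition, not the CQCC-based countermeasure itself, and the function and variable names are illustrative.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Equal error rate: the operating point where the false-acceptance rate on
    spoofed/converted trials equals the false-rejection rate on genuine trials.
    Higher scores are assumed to indicate 'more likely genuine human speech'."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, spoof]))
    far = np.array([(spoof >= t).mean() for t in thresholds])   # false acceptance rate
    frr = np.array([(genuine < t).mean() for t in thresholds])  # false rejection rate
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])
```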
Multimodal Deep Domain Adaptation
Title | Multimodal Deep Domain Adaptation |
Authors | Silvia Bucci, Mohammad Reza Loghmani, Barbara Caputo |
Abstract | Typically, a classifier trained on a given dataset (source domain) does not perform well when tested on data acquired in a different setting (target domain). This is the problem that domain adaptation (DA) tries to overcome and, while it is a well-explored topic in computer vision, it is largely ignored in robotic vision, where visual classification methods are usually trained and tested in the same domain. Robots should be able to deal with unknown environments, recognize objects and use them in the correct way, so it is important to explore the domain adaptation scenario in this context as well. The goal of the project is to define a benchmark and a protocol for multi-modal domain adaptation that is valuable for the robot vision community. With this purpose, some of the state-of-the-art DA methods are selected: Deep Adaptation Network (DAN), Domain-Adversarial Training of Neural Networks (DANN), Automatic Domain Alignment Layers (AutoDIAL) and Adversarial Discriminative Domain Adaptation (ADDA). Evaluations have been done using different data types: RGB only, depth only and RGB-D, over the following datasets designed for the robotics community: RGB-D Object Dataset (ROD), Web Object Dataset (WOD), Autonomous Robot Indoor Dataset (ARID), Big Berkeley Instance Recognition Dataset (BigBIRD) and Active Vision Dataset. Although progress has been made on the formulation of effective adaptation algorithms and more realistic object datasets are available, the results obtained show that training a sufficiently good object classifier, especially in the domain adaptation scenario, is still an unsolved problem. The best way to combine depth with RGB information to improve performance is also a point that needs further investigation. |
Tasks | Domain Adaptation |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11697v1 |
PDF | http://arxiv.org/pdf/1807.11697v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-deep-domain-adaptation |
Repo | |
Framework | |
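One of the benchmarked methods, DANN, is commonly implemented with a gradient reversal layer. The sketch below shows that generic building block (assuming PyTorch), not the project's actual evaluation code.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in the backward
    pass so the shared feature extractor learns to fool a domain classifier."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(features, lambd=1.0):
    # Insert between the feature extractor and the domain-classifier head.
    return GradReverse.apply(features, lambd)
```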
On an Immuno-inspired Distributed, Embodied Action-Evolution cum Selection Algorithm
Title | On an Immuno-inspired Distributed, Embodied Action-Evolution cum Selection Algorithm |
Authors | Tushar Semwal, Divya D Kulkarni, Shivashankar B. Nair |
Abstract | Traditional Evolutionary Robotics (ER) employs evolutionary techniques to search for a single monolithic controller which can aid a robot to learn a desired task. These techniques suffer from bootstrap and deception issues when the tasks are too complex for a single controller to learn. Behaviour-decomposition techniques have been used to divide a task into multiple subtasks and evolve separate subcontrollers for each subtask. However, these subcontrollers and the associated subcontroller arbitrator(s) are all evolved off-line. A distributed, fully embodied and evolutionary version of such approaches would greatly aid online learning and help reduce the reality gap. In this paper, we propose an immunology-inspired embodied action-evolution cum selection algorithm that can cater to distributed ER. This algorithm evolves different subcontrollers for different portions of the search space in a distributed manner, just as antibodies are evolved and primed for different antigens in the antigenic space. Experimentation on a collective of real robots embodied with the algorithm showed that a repertoire of antibody-like subcontrollers was created, evolved and shared on-the-fly to cope with different environmental conditions. In addition, instead of the conventionally used approach of broadcasting for sharing, we present an Intelligent Packet Migration scheme that reduces energy consumption. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09789v1 |
PDF | http://arxiv.org/pdf/1806.09789v1.pdf |
PWC | https://paperswithcode.com/paper/on-an-immuno-inspired-distributed-embodied |
Repo | |
Framework | |
Joint PLDA for Simultaneous Modeling of Two Factors
Title | Joint PLDA for Simultaneous Modeling of Two Factors |
Authors | Luciana Ferrer, Mitchell McLaren |
Abstract | Probabilistic linear discriminant analysis (PLDA) is a method used for biometric problems like speaker or face recognition that models the variability of the samples using two latent variables, one that depends on the class of the sample and another one that is assumed independent across samples and models the within-class variability. In this work, we propose a generalization of PLDA that enables joint modeling of two sample-dependent factors: the class of interest and a nuisance condition. The approach does not change the basic form of PLDA but rather modifies the training procedure to consider the dependency across samples of the latent variable that models within-class variability. While the identity of the nuisance condition is needed during training, it is not needed during testing since we propose a scoring procedure that marginalizes over the corresponding latent variable. We show results on a multilingual speaker-verification task, where the language spoken is considered a nuisance condition. We show that the proposed joint PLDA approach leads to significant performance gains in this task for two different datasets, in particular when the training data contains mostly or only monolingual speakers. |
Tasks | Face Recognition, Speaker Verification |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1803.10554v1 |
PDF | http://arxiv.org/pdf/1803.10554v1.pdf |
PWC | https://paperswithcode.com/paper/joint-plda-for-simultaneous-modeling-of-two |
Repo | |
Framework | |
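Schematically, and with a parameterisation that may differ from the one used in the paper, a two-factor extension of PLDA can be written with one latent variable per class (speaker) and one per nuisance condition (language):

```latex
% Illustrative two-factor decomposition; symbols are generic, not the paper's notation.
\phi_{ij} \;=\; \mu \;+\; V\, y_{s(i,j)} \;+\; U\, z_{c(i,j)} \;+\; \varepsilon_{ij},
\qquad y, z \sim \mathcal{N}(0, I), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma),
```

where s(i, j) indexes the speaker and c(i, j) the nuisance condition of sample φ_ij; at test time the condition variable is marginalised out, as described in the abstract.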
Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks
Title | Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks |
Authors | Nathalia Nascimento, Carlos Lucena, Paulo Alencar, Donald Cowan |
Abstract | Several recent papers have reported on applying machine learning (ML) to the automation of software engineering (SE) tasks, such as project management, modeling and development. However, there appear to be no approaches comparing how software engineers fare against machine-learning algorithms as applied to specific software development tasks. Such a comparison is essential to gain insight into which tasks are better performed by humans and which by machine learning, and how cooperative work or human-in-the-loop processes can be implemented more effectively. In this paper, we present an empirical study that compares how software engineers and machine-learning algorithms perform and reuse tasks. The empirical study involves the synthesis of the control structure of an autonomous streetlight application. Our approach consists of four steps. First, we solved the problem using machine learning to determine specific performance and reuse tasks. Second, we asked software engineers with different domain knowledge levels to provide a solution to the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks, based on criteria such as energy consumption and safety. Finally, we analyzed the results to understand which tasks are better performed by humans or algorithms so that they can work together more effectively. Such an understanding, and the resulting human-in-the-loop approaches that take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental to providing a basis for cooperative work in support of software engineering, and in other areas as well. |
Tasks | |
Published | 2018-02-04 |
URL | http://arxiv.org/abs/1802.01096v2 |
PDF | http://arxiv.org/pdf/1802.01096v2.pdf |
PWC | https://paperswithcode.com/paper/software-engineers-vs-machine-learning |
Repo | |
Framework | |
Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
Title | Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics |
Authors | Arindam Jati, Panayiotis Georgiou |
Abstract | Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain many non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker; thus a common representation that can encode the commonalities of both segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce “speaker embeddings” by learning to separate 'same' vs. 'different' speaker pairs generated from unlabeled audio streams. Two sets of experiments are done in different scenarios to evaluate the strength of NPC embeddings and compare them with state-of-the-art in-domain supervised methods. First, two speaker identification experiments with different context lengths are performed in a scenario with comparatively limited within-speaker channel variability. NPC embeddings are found to perform best in the short-duration experiment, and they provide complementary information to i-vectors in the full-utterance experiments. Second, a large-scale speaker verification task with a wide range of within-speaker channel variability is adopted as an upper-bound experiment where comparisons are drawn with in-domain supervised methods. |
Tasks | Speaker Identification, Speaker Recognition, Speaker Verification, Speech Recognition |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07860v2 |
PDF | http://arxiv.org/pdf/1802.07860v2.pdf |
PWC | https://paperswithcode.com/paper/neural-predictive-coding-using-convolutional |
Repo | |
Framework | |
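A toy sketch of the short-term active-speaker stationarity hypothesis used to mine training pairs: temporally adjacent segments from one stream are treated as presumed same-speaker pairs, and segments from different streams as presumed different-speaker pairs. Function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np


def make_pairs(streams, seg_len, rng=None):
    """streams: list of 1-D waveform arrays (unlabeled audio streams).
    Returns (segment_a, segment_b, label) triples; label 1 = presumed same speaker."""
    if rng is None:
        rng = np.random.default_rng(0)
    pairs = []
    for x in streams:
        n = len(x) // seg_len
        for k in range(n - 1):
            a = x[k * seg_len:(k + 1) * seg_len]
            b = x[(k + 1) * seg_len:(k + 2) * seg_len]
            pairs.append((a, b, 1))      # temporally close -> assumed same speaker
    # Negatives: segments drawn from two different streams.
    for _ in range(len(pairs)):
        i, j = rng.choice(len(streams), size=2, replace=False)
        pairs.append((streams[i][:seg_len], streams[j][:seg_len], 0))
    return pairs
```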
Predicting Hurricane Trajectories using a Recurrent Neural Network
Title | Predicting Hurricane Trajectories using a Recurrent Neural Network |
Authors | Sheila Alemany, Jonathan Beltran, Adrian Perez, Sam Ganzfried |
Abstract | Hurricanes are cyclones, originating over tropical and subtropical waters, that circulate about a defined center with closed wind speeds exceeding 75 mph. At landfall, hurricanes can result in severe disasters, so the accuracy of predicting their trajectories is critical to reducing economic loss and saving human lives. Given the complexity and nonlinearity of weather data, a recurrent neural network (RNN) could be beneficial in modeling hurricane behavior. We propose the application of a fully connected RNN to predict the trajectory of hurricanes. We employed the RNN over a fine grid to reduce typical truncation errors. We utilized the latitude, longitude, wind speed, and pressure data publicly provided by the National Hurricane Center (NHC) to predict the trajectory of a hurricane at 6-hour intervals. Results show that this proposed technique is competitive with methods currently employed by the NHC and can predict up to approximately 120 hours of hurricane path. |
Tasks | |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.02548v3 |
PDF | http://arxiv.org/pdf/1802.02548v3.pdf |
PWC | https://paperswithcode.com/paper/predicting-hurricane-trajectories-using-a |
Repo | |
Framework | |
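A minimal model along the lines described in the abstract: a recurrent network mapping 6-hourly (latitude, longitude, wind speed, pressure) observations to the next position. The hidden size, the grid-based coordinate handling, and the training setup from the paper are omitted; everything below is a simplified stand-in assuming PyTorch.

```python
import torch
import torch.nn as nn


class TrajectoryRNN(nn.Module):
    """Predicts the next (latitude, longitude) from a sequence of
    (lat, lon, wind_speed, pressure) observations at 6-hour intervals."""

    def __init__(self, hidden_size=32):
        super().__init__()
        self.rnn = nn.RNN(input_size=4, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, x):                 # x: (batch, time, 4)
        out, _ = self.rnn(x)              # out: (batch, time, hidden)
        return self.head(out[:, -1, :])   # next (lat, lon) after the last step
```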
Fooling End-to-end Speaker Verification by Adversarial Examples
Title | Fooling End-to-end Speaker Verification by Adversarial Examples |
Authors | Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet |
Abstract | Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that they are almost indistinguishable from the original examples by a human listener. Yet the generated waveforms, which sound like speaker A, can be used to fool such a system by claiming that the waveforms were uttered by speaker B. We present white-box attacks on an end-to-end deep network that was trained on either YOHO or NTIMIT. We also present two black-box attacks: one where the adversarial examples were generated with a system trained on YOHO but the attack is on a system trained on NTIMIT, and one where the adversarial examples were generated with a system trained on a Mel-spectrum feature set but the attack is on a system trained on MFCCs. Results suggest that the accuracy of the attacked system was decreased and the false-positive rate was dramatically increased. |
Tasks | Speaker Verification |
Published | 2018-01-10 |
URL | http://arxiv.org/abs/1801.03339v2 |
PDF | http://arxiv.org/pdf/1801.03339v2.pdf |
PWC | https://paperswithcode.com/paper/fooling-end-to-end-speaker-verification-by |
Repo | |
Framework | |
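The attacks in the paper are gradient-based; as a generic illustration (not necessarily the exact procedure used by the authors), a fast-gradient-sign perturbation of a waveform against a verification model looks like this, with the model, loss, and tensor shapes as placeholders:

```python
import torch


def fgsm_waveform(model, loss_fn, waveform, target, epsilon=1e-3):
    """Adds an almost inaudible perturbation that pushes the model's decision
    toward/away from `target`. `model` and `loss_fn` are placeholders."""
    x = waveform.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), target)
    loss.backward()
    # Step in the sign of the gradient; the sign of epsilon is flipped for a
    # targeted rather than untargeted attack.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach()
```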
Extracting News Events from Microblogs
Title | Extracting News Events from Microblogs |
Authors | Øystein Repp, Heri Ramampiaro |
Abstract | The Twitter stream has become a large source of information for many people, but the sheer volume of tweets and the noisy nature of their content have long made harvesting knowledge from Twitter a challenging task for researchers. Aiming at overcoming some of the main challenges of extracting hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and the growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach to detecting current (real) news events delivers state-of-the-art performance. |
Tasks | |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07573v1 |
PDF | http://arxiv.org/pdf/1806.07573v1.pdf |
PWC | https://paperswithcode.com/paper/extracting-news-events-from-microblogs |
Repo | |
Framework | |
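The second and third steps of the pipeline (clustering detected news tweets into events, then ranking the events) can be approximated by a simple threshold-based online clustering over tweet embeddings. This is a simplified stand-in, not the paper's novel streaming algorithm, and the tweet embedding function is assumed to exist upstream.

```python
import numpy as np


def online_cluster(tweet_vectors, threshold=0.6):
    """Greedy single-pass clustering: each tweet joins the closest existing
    event (cosine similarity above `threshold`) or starts a new one."""
    centroids, clusters = [], []
    for v in tweet_vectors:
        v = v / (np.linalg.norm(v) + 1e-12)
        if centroids:
            sims = np.array([c @ v for c in centroids])
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                clusters[best].append(v)
                centroids[best] = np.mean(clusters[best], axis=0)
                centroids[best] /= np.linalg.norm(centroids[best]) + 1e-12
                continue
        centroids.append(v)
        clusters.append([v])
    # Rank events by size; a streaming setting would also weight by growth speed.
    order = np.argsort([-len(c) for c in clusters])
    return [clusters[i] for i in order]
```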
Pay Voice: Point of Sale Recognition for Visually Impaired People
Title | Pay Voice: Point of Sale Recognition for Visually Impaired People |
Authors | Guilherme Folego, Filipe Costa, Bruno Costa, Alan Godoy, Luiz Pita |
Abstract | Millions of visually impaired people depend on relatives and friends to perform their everyday tasks. One relevant step towards self-sufficiency is to provide them with means to verify the value and operation presented on payment machines. In this work, we developed and released a smartphone application, named Pay Voice, that uses image processing, optical character recognition (OCR) and voice synthesis to recognize the value and operation presented on POS and PIN pad machines, informing the user through auditory and visual feedback. The proposed approach presented significant results for value and operation recognition, especially for POS, due to the higher display quality. Importantly, we achieved the key performance indicators, namely, more than 80% accuracy in a real-world scenario and less than 5 seconds of processing time for recognition. Pay Voice is publicly available on Google Play and the App Store for free. |
Tasks | Optical Character Recognition |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.05740v1 |
PDF | http://arxiv.org/pdf/1812.05740v1.pdf |
PWC | https://paperswithcode.com/paper/pay-voice-point-of-sale-recognition-for |
Repo | |
Framework | |
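The abstract describes an OCR-plus-speech-synthesis pipeline; a bare-bones sketch of that flow using off-the-shelf libraries (pytesseract and pyttsx3) is shown below. This is an assumption-laden illustration, not the released Pay Voice implementation, which additionally performs display detection and value/operation parsing.

```python
import pytesseract        # OCR engine wrapper (assumed available)
import pyttsx3            # offline text-to-speech (assumed available)
from PIL import Image


def read_pos_display(image_path):
    """Recognise text on a photo of a POS/PIN pad display and speak it aloud."""
    text = pytesseract.image_to_string(Image.open(image_path))
    engine = pyttsx3.init()
    engine.say(text if text.strip() else "No value recognised")
    engine.runAndWait()
    return text
```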
Model-free, Model-based, and General Intelligence
Title | Model-free, Model-based, and General Intelligence |
Authors | Hector Geffner |
Abstract | During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02308v1 |
PDF | http://arxiv.org/pdf/1806.02308v1.pdf |
PWC | https://paperswithcode.com/paper/model-free-model-based-and-general |
Repo | |
Framework | |
A Sequential Embedding Approach for Item Recommendation with Heterogeneous Attributes
Title | A Sequential Embedding Approach for Item Recommendation with Heterogeneous Attributes |
Authors | Kuan Liu, Xing Shi, Prem Natarajan |
Abstract | Attributes, such as metadata and profile information, carry useful information which in principle can help improve accuracy in recommender systems. However, existing approaches have difficulty fully leveraging attribute information due to practical challenges such as heterogeneity and sparseness. These approaches also fail to combine recurrent neural networks, which have recently shown effectiveness in item recommendation for applications such as video and music browsing. To overcome these challenges and to harvest the advantages of sequence models, we present a novel approach, Heterogeneous Attribute Recurrent Neural Networks (HA-RNN), which incorporates heterogeneous attributes and captures sequential dependencies in both items and attributes. HA-RNN extends recurrent neural networks with 1) a hierarchical attribute combination input layer and 2) an output attribute embedding layer. We conduct extensive experiments on two large-scale datasets. The new approach shows significant improvements over state-of-the-art models. Our ablation experiments demonstrate the effectiveness of the two components in addressing heterogeneous attribute challenges, including variable lengths and attribute sparseness. We further investigate why sequence modeling works well by conducting exploratory studies and show that sequence models are more effective as the data scale increases. |
Tasks | Recommendation Systems |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.11008v1 |
PDF | http://arxiv.org/pdf/1805.11008v1.pdf |
PWC | https://paperswithcode.com/paper/a-sequential-embedding-approach-for-item |
Repo | |
Framework | |
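A schematic reading of the two architectural additions named in the abstract: item and attribute embeddings combined at the input of a recurrent layer, and the item embedding table re-used to score candidates at the output. The layer sizes, the combination rule, the omission of a separate output attribute embedding, and the loss are all assumptions for illustration, not the authors' exact HA-RNN design.

```python
import torch
import torch.nn as nn


class TinyHARNN(nn.Module):
    """Sequence model whose input combines item and attribute embeddings and
    whose output scores candidate items against the final hidden state."""

    def __init__(self, n_items, n_attrs, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.attr_emb = nn.Linear(n_attrs, dim, bias=False)  # multi-hot attributes -> embedding
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, item_ids, attr_multi_hot):
        # item_ids: (batch, time); attr_multi_hot: (batch, time, n_attrs)
        x = self.item_emb(item_ids) + self.attr_emb(attr_multi_hot)
        h, _ = self.rnn(x)                                   # (batch, time, dim)
        # Score every item against the final state, re-using the item embeddings.
        return h[:, -1, :] @ self.item_emb.weight.t()        # (batch, n_items)
```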