October 19, 2019


Paper Group ANR 403


Solving Tree Problems with Category Theory


Title	Solving Tree Problems with Category Theory
Authors	Rafik Hadfi
Abstract	Artificial Intelligence (AI) has long pursued models, theories, and techniques to imbue machines with human-like general intelligence. Yet even the currently predominant data-driven approaches in AI seem to be lacking humans’ unique ability to solve wide ranges of problems. This situation begs the question of the existence of principles that underlie general problem-solving capabilities. We approach this question through the mathematical formulation of analogies across different problems and solutions. We focus in particular on problems that could be represented as tree-like structures. Most importantly, we adopt a category-theoretic approach in formalising tree problems as categories, and in proving the existence of equivalences across apparently unrelated problem domains. We prove the existence of a functor between the category of tree problems and the category of solutions. We also provide a weaker version of the functor by quantifying equivalences of problem categories using a metric on tree problems.
Tasks
Published	2018-10-16
URL	http://arxiv.org/abs/1810.07307v1
PDF	http://arxiv.org/pdf/1810.07307v1.pdf
PWC	https://paperswithcode.com/paper/solving-tree-problems-with-category-theory
Repo
Framework

Expectation Propagation for Approximate Inference: Free Probability Framework


Title	Expectation Propagation for Approximate Inference: Free Probability Framework
Authors	Burak Çakmak, Manfred Opper
Abstract	We study asymptotic properties of expectation propagation (EP) – a method for approximate inference originally developed in the field of machine learning. Applied to generalized linear models, EP iteratively computes a multivariate Gaussian approximation to the exact posterior distribution. The computational complexity of the repeated update of covariance matrices severely limits the application of EP to large problem sizes. In this study, we present a rigorous analysis by means of free probability theory that allows us to overcome this computational bottleneck if specific data matrices in the problem fulfill certain properties of asymptotic freeness. We demonstrate the relevance of our approach on the gene selection problem of a microarray dataset.
Tasks
Published	2018-01-16
URL	http://arxiv.org/abs/1801.05411v2
PDF	http://arxiv.org/pdf/1801.05411v2.pdf
PWC	https://paperswithcode.com/paper/expectation-propagation-for-approximate
Repo
Framework

Effective Parallelisation for Machine Learning


Title	Effective Parallelisation for Machine Learning
Authors	Michael Kamp, Mario Boley, Olana Missura, Thomas Gärtner
Abstract	We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other parallelisation techniques, it can be applied to a broad class of learning algorithms without further mathematical derivations and without writing dedicated code, while at the same time maintaining theoretical performance guarantees. Moreover, our parallelisation scheme is able to reduce the runtime of many learning algorithms to polylogarithmic time on quasi-polynomially many processing units. This is a significant step towards a general answer to an open question on the efficient parallelisation of machine learning algorithms in the sense of Nick’s Class (NC). The cost of this parallelisation is in the form of a larger sample complexity. Our empirical study confirms the potential of our parallelisation scheme with fixed numbers of processors and instances in realistic application scenarios.
Tasks
Published	2018-10-08
URL	http://arxiv.org/abs/1810.03530v1
PDF	http://arxiv.org/pdf/1810.03530v1.pdf
PWC	https://paperswithcode.com/paper/effective-parallelisation-for-machine
Repo
Framework

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment


Title	A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Authors	Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhenhua Ling
Abstract	Voice conversion (VC) aims at converting speaker characteristics without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality evaluation by a human panel. As a time-consuming, expensive, and non-reproducible process, it hinders rapid prototyping of new VC technology. We address artifact assessment using an alternative, objective approach that leverages prior work on spoofing countermeasures (CMs) for automatic speaker verification. Therein, CMs are used for rejecting 'fake' inputs such as replayed, synthetic or converted speech, but their potential for automatic speech artifact assessment remains unknown. This study serves to fill that gap. As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts. Equal error rate (EER) of the CM, a confusability index of VC samples with real human speech, serves as our artifact measure. Two clusters of VCC'18 entries are identified: low-quality ones with detectable artifacts (low EERs), and higher-quality ones with fewer artifacts. None of the VCC'18 systems, however, is perfect: all EERs are < 30 % (the 'ideal' value would be 50 %). Our preliminary findings suggest potential of CMs outside of their original application, as a supplemental optimization and benchmarking tool to enhance VC technology.
Tasks	Speaker Verification, Voice Conversion
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08438v2
PDF	http://arxiv.org/pdf/1804.08438v2.pdf
PWC	https://paperswithcode.com/paper/a-spoofing-benchmark-for-the-2018-voice
Repo
Framework
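
The abstract above uses the countermeasure's equal error rate (EER) as an artifact measure. As a rough illustration of that metric only, here is a minimal sketch that estimates the EER from two lists of detector scores. The function name, the convention that higher scores mean "more likely genuine", and the threshold sweep are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def equal_error_rate(genuine_scores, converted_scores):
    """Estimate the EER, assuming higher scores mean 'more likely genuine'."""
    genuine = np.asarray(genuine_scores, dtype=float)
    converted = np.asarray(converted_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, converted]))
    # False rejection rate: genuine speech scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: converted speech scored at or above the threshold.
    far = np.array([(converted >= t).mean() for t in thresholds])
    # The EER is read off where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)
```

Under this sketch, converted speech that the detector cannot tell apart from genuine speech pushes the EER towards 0.5, while speech with easily detected artifacts pushes it towards 0.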

Multimodal Deep Domain Adaptation


Title	Multimodal Deep Domain Adaptation
Authors	Silvia Bucci, Mohammad Reza Loghmani, Barbara Caputo
Abstract	Typically a classifier trained on a given dataset (source domain) does not perform well if it is tested on data acquired in a different setting (target domain). This is the problem that domain adaptation (DA) tries to overcome and, while it is a well explored topic in computer vision, it is largely ignored in robotic vision, where visual classification methods are usually trained and tested in the same domain. Robots should be able to deal with unknown environments, recognize objects and use them in the correct way, so it is important to explore the domain adaptation scenario in this context as well. The goal of the project is to define a benchmark and a protocol for multi-modal domain adaptation that is valuable for the robot vision community. For this purpose, some of the state-of-the-art DA methods are selected: Deep Adaptation Network (DAN), Domain Adversarial Training of Neural Network (DANN), Automatic Domain Alignment Layers (AutoDIAL) and Adversarial Discriminative Domain Adaptation (ADDA). Evaluations have been done using different data types: RGB only, depth only and RGB-D over the following datasets, designed for the robotic community: RGB-D Object Dataset (ROD), Web Object Dataset (WOD), Autonomous Robot Indoor Dataset (ARID), Big Berkeley Instance Recognition Dataset (BigBIRD) and Active Vision Dataset. Although progress has been made on the formulation of effective adaptation algorithms and more realistic object datasets are available, the results obtained show that training a sufficiently good object classifier, especially in the domain adaptation scenario, is still an unsolved problem. The best way to combine depth with RGB information to improve performance also needs further investigation.
Tasks	Domain Adaptation
Published	2018-07-31
URL	http://arxiv.org/abs/1807.11697v1
PDF	http://arxiv.org/pdf/1807.11697v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-deep-domain-adaptation
Repo
Framework

On an Immuno-inspired Distributed, Embodied Action-Evolution cum Selection Algorithm


Title	On an Immuno-inspired Distributed, Embodied Action-Evolution cum Selection Algorithm
Authors	Tushar Semwal, Divya D Kulkarni, Shivashankar B. Nair
Abstract	Traditional Evolutionary Robotics (ER) employs evolutionary techniques to search for a single monolithic controller which can aid a robot to learn a desired task. These techniques suffer from bootstrap and deception issues when the tasks are too complex for a single controller to learn. Behaviour-decomposition techniques have been used to divide a task into multiple subtasks and evolve separate subcontrollers for each subtask. However, these subcontrollers and the associated subcontroller arbitrator(s) are all evolved off-line. A distributed, fully embodied and evolutionary version of such approaches will greatly aid online learning and help reduce the reality gap. In this paper, we propose an immunology-inspired embodied action-evolution cum selection algorithm that can cater to distributed ER. This algorithm evolves different subcontrollers for different portions of the search space in a distributed manner just as antibodies are evolved and primed for different antigens in the antigenic space. Experimentation on a collective of real robots embodied with the algorithm showed that a repertoire of antibody-like subcontrollers was created, evolved and shared on-the-fly to cope with different environmental conditions. In addition, instead of the conventionally used approach of broadcasting for sharing, we present an Intelligent Packet Migration scheme that reduces energy consumption.
Tasks
Published	2018-06-26
URL	http://arxiv.org/abs/1806.09789v1
PDF	http://arxiv.org/pdf/1806.09789v1.pdf
PWC	https://paperswithcode.com/paper/on-an-immuno-inspired-distributed-embodied
Repo
Framework

Joint PLDA for Simultaneous Modeling of Two Factors


Title	Joint PLDA for Simultaneous Modeling of Two Factors
Authors	Luciana Ferrer, Mitchell McLaren
Abstract	Probabilistic linear discriminant analysis (PLDA) is a method used for biometric problems like speaker or face recognition that models the variability of the samples using two latent variables, one that depends on the class of the sample and another one that is assumed independent across samples and models the within-class variability. In this work, we propose a generalization of PLDA that enables joint modeling of two sample-dependent factors: the class of interest and a nuisance condition. The approach does not change the basic form of PLDA but rather modifies the training procedure to consider the dependency across samples of the latent variable that models within-class variability. While the identity of the nuisance condition is needed during training, it is not needed during testing since we propose a scoring procedure that marginalizes over the corresponding latent variable. We show results on a multilingual speaker-verification task, where the language spoken is considered a nuisance condition. We show that the proposed joint PLDA approach leads to significant performance gains in this task for two different datasets, in particular when the training data contains mostly or only monolingual speakers.
Tasks	Face Recognition, Speaker Verification
Published	2018-03-28
URL	http://arxiv.org/abs/1803.10554v1
PDF	http://arxiv.org/pdf/1803.10554v1.pdf
PWC	https://paperswithcode.com/paper/joint-plda-for-simultaneous-modeling-of-two
Repo
Framework
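
To make the modelling idea in the abstract above concrete, the following is a hedged sketch of one way to write the two models. The symbols ($\mu$, $V$, $U$, $y$, $x$, $\varepsilon$, $s_i$, $c_i$) are notation assumed for this illustration and may differ from the paper's own formulation.

```latex
% Standard PLDA (sketch): sample i belongs to class s_i.
% y is shared by all samples of the same class; x_i is drawn
% independently for every sample and models within-class variability.
m_i = \mu + V\, y_{s_i} + U\, x_i + \varepsilon_i

% Joint PLDA (sketch): the second latent variable is tied to the
% nuisance condition c_i (for example, the language spoken), so it is
% shared by all samples recorded under that condition.
m_i = \mu + V\, y_{s_i} + U\, x_{c_i} + \varepsilon_i
```

Read this way, the form of the model is unchanged and only the tying of the second latent variable across samples differs, which is consistent with the abstract's statement that the training procedure, not the basic form of PLDA, is modified.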

Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks


Title	Software Engineers vs. Machine Learning Algorithms: An Empirical Study Assessing Performance and Reuse Tasks
Authors	Nathalia Nascimento, Carlos Lucena, Paulo Alencar, Donald Cowan
Abstract	Several papers have recently contained reports on applying machine learning (ML) to the automation of software engineering (SE) tasks, such as project management, modeling and development. However, there appear to be no approaches comparing how software engineers fare against machine-learning algorithms as applied to specific software development tasks. Such a comparison is essential to gain insight into which tasks are better performed by humans and which by machine learning and how cooperative work or human-in-the-loop processes can be implemented more effectively. In this paper, we present an empirical study that compares how software engineers and machine-learning algorithms perform and reuse tasks. The empirical study involves the synthesis of the control structure of an autonomous streetlight application. Our approach consists of four steps. First, we solved the problem using machine learning to determine specific performance and reuse tasks. Second, we asked software engineers with different domain knowledge levels to provide a solution to the same tasks. Third, we compared how software engineers fare against machine-learning algorithms when accomplishing the performance and reuse tasks based on criteria such as energy consumption and safety. Finally, we analyzed the results to understand which tasks are better performed by either humans or algorithms so that they can work together more effectively. Such an understanding and the resulting human-in-the-loop approaches, which take into account the strengths and weaknesses of humans and machine-learning algorithms, are fundamental not only to provide a basis for cooperative work in support of software engineering, but also in other areas.
Tasks
Published	2018-02-04
URL	http://arxiv.org/abs/1802.01096v2
PDF	http://arxiv.org/pdf/1802.01096v2.pdf
PWC	https://paperswithcode.com/paper/software-engineers-vs-machine-learning
Repo
Framework

Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics


Title	Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
Authors	Arindam Jati, Panayiotis Georgiou
Abstract	Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain many non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally-close short speech segments belong to the same speaker, and thus a common representation that can encode the commonalities of both segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce "speaker embeddings" by learning to separate 'same' vs. 'different' speaker pairs, which are generated from unlabeled audio streams. Two sets of experiments are done in different scenarios to evaluate the strength of NPC embeddings and compare with state-of-the-art in-domain supervised methods. First, two speaker identification experiments with different context lengths are performed in a scenario with comparatively limited within-speaker channel variability. NPC embeddings are found to perform the best in the short-duration experiment, and they provide complementary information to i-vectors for full-utterance experiments. Second, a large-scale speaker verification task having a wide range of within-speaker channel variability is adopted as an upper-bound experiment where comparisons are drawn with in-domain supervised methods.
Tasks	Speaker Identification, Speaker Recognition, Speaker Verification, Speech Recognition
Published	2018-02-22
URL	http://arxiv.org/abs/1802.07860v2
PDF	http://arxiv.org/pdf/1802.07860v2.pdf
PWC	https://paperswithcode.com/paper/neural-predictive-coding-using-convolutional
Repo
Framework
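
The abstract above describes generating 'same' and 'different' speaker pairs from unlabeled audio. The sketch below shows one plausible way to build such pairs under the short-term stationarity hypothesis. The function name, its parameters, and the choice to draw 'different' pairs from a separate stream are assumptions for illustration; the abstract does not specify how 'different' pairs are drawn.

```python
import random


def sample_pairs(streams, seg_len, max_gap, n_pairs, seed=0):
    """Build (segment_a, segment_b, label) triples from unlabeled streams.

    streams: list of at least two sequences (e.g. audio frames), each at
    least 2 * seg_len + max_gap long. label 1 = assumed same speaker,
    label 0 = assumed different speakers.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = rng.randrange(len(streams))
        # Pick two segments from the same stream, at most max_gap apart.
        last_start = len(streams[a]) - 2 * seg_len - max_gap
        start = rng.randrange(last_start + 1)
        gap = rng.randint(0, max_gap)
        seg1 = streams[a][start:start + seg_len]
        second = start + seg_len + gap
        seg2 = streams[a][second:second + seg_len]
        pairs.append((seg1, seg2, 1))
        # Assumption: a segment from another stream is a different speaker.
        b = rng.choice([i for i in range(len(streams)) if i != a])
        other = rng.randrange(len(streams[b]) - seg_len + 1)
        seg3 = streams[b][other:other + seg_len]
        pairs.append((seg1, seg3, 0))
    return pairs
```

Both labels are noisy by construction: a speaker change can fall inside the gap, and two streams can contain the same speaker. The pairs would then be fed to a siamese network, which is outside the scope of this sketch.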

Predicting Hurricane Trajectories using a Recurrent Neural Network


Title	Predicting Hurricane Trajectories using a Recurrent Neural Network
Authors	Sheila Alemany, Jonathan Beltran, Adrian Perez, Sam Ganzfried
Abstract	Hurricanes are cyclones circulating about a defined center whose closed wind speeds exceed 75 mph originating over tropical and subtropical waters. At landfall, hurricanes can result in severe disasters. The accuracy of predicting their trajectory paths is critical to reduce economic loss and save human lives. Given the complexity and nonlinearity of weather data, a recurrent neural network (RNN) could be beneficial in modeling hurricane behavior. We propose the application of a fully connected RNN to predict the trajectory of hurricanes. We employed the RNN over a fine grid to reduce typical truncation errors. We utilized the latitude, longitude, wind speed, and pressure data publicly provided by the National Hurricane Center (NHC) to predict the trajectory of a hurricane at 6-hour intervals. Results show that this proposed technique is competitive with methods currently employed by the NHC and can predict up to approximately 120 hours of hurricane path.
Tasks
Published	2018-02-01
URL	http://arxiv.org/abs/1802.02548v3
PDF	http://arxiv.org/pdf/1802.02548v3.pdf
PWC	https://paperswithcode.com/paper/predicting-hurricane-trajectories-using-a
Repo
Framework

Fooling End-to-end Speaker Verification by Adversarial Examples


Title	Fooling End-to-end Speaker Verification by Adversarial Examples
Authors	Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet
Abstract	Automatic speaker verification systems are increasingly used as the primary means to authenticate customers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in such a way that they are almost indistinguishable from the original examples by a human listener. Yet the generated waveforms, which sound like speaker A, can be used to fool such a system by claiming that the waveforms were uttered by speaker B. We present white-box attacks on an end-to-end deep network that was trained on either YOHO or NTIMIT. We also present two black-box attacks: one where the adversarial examples were generated with a system that was trained on YOHO, but the attack is on a system that was trained on NTIMIT; and one where the adversarial examples were generated with a system that was trained on a Mel-spectrum feature set, but the attack is on a system that was trained on MFCC. Results suggest that the accuracy of the attacked system was decreased and the false-positive rate was dramatically increased.
Tasks	Speaker Verification
Published	2018-01-10
URL	http://arxiv.org/abs/1801.03339v2
PDF	http://arxiv.org/pdf/1801.03339v2.pdf
PWC	https://paperswithcode.com/paper/fooling-end-to-end-speaker-verification-by
Repo
Framework
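
The abstract above speaks of adding a small, carefully chosen noise to the input. As a toy illustration of that general idea, the sketch below applies a sign-of-gradient perturbation (the fast gradient sign method, a common textbook technique) to a linear scorer. Both the linear scorer and the choice of this particular method are assumptions for illustration; the abstract does not state which procedure the authors used, and the paper's end-to-end network is not reproduced here.

```python
import numpy as np


def linear_score(w, b, x):
    """Toy verification score; the claim is accepted when the score is > 0."""
    return float(np.dot(w, x) + b)


def sign_gradient_perturb(w, x, epsilon):
    """Move x by at most epsilon per feature in the direction that raises the score.

    For a linear scorer the gradient of the score with respect to x is w,
    so the perturbation is simply epsilon * sign(w).
    """
    return x + epsilon * np.sign(w)
```

With `w = [1.0, -2.0, 0.5]`, `b = 0.0` and `x = [0.0, 0.2, 0.0]`, the original score is negative (claim rejected), while the perturbed input with `epsilon = 0.2` scores positive even though no feature moved by more than 0.2.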

Extracting News Events from Microblogs


Title	Extracting News Events from Microblogs
Authors	Øystein Repp, Heri Ramampiaro
Abstract	The Twitter stream has become a large source of information for many people, but the sheer volume of tweets and the noisy nature of their content have long made harvesting knowledge from Twitter a challenging task for researchers. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach to detecting current (real) news events delivers state-of-the-art performance.
Tasks
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07573v1
PDF	http://arxiv.org/pdf/1806.07573v1.pdf
PWC	https://paperswithcode.com/paper/extracting-news-events-from-microblogs
Repo
Framework
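
The third step in the abstract above ranks events by cluster size and by how fast tweet frequency grows. The sketch below shows one simple way such a ranking could be computed. The `EventCluster` type, the linear growth estimate, and the weighted-sum score are all hypothetical choices for illustration; the abstract does not give the actual scoring function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventCluster:
    label: str
    counts_per_window: List[int]  # tweets per time window, oldest first


def growth_speed(counts):
    """Average change in tweet count per window (0.0 for a single window)."""
    if len(counts) < 2:
        return 0.0
    return (counts[-1] - counts[0]) / (len(counts) - 1)


def rank_events(clusters, size_weight=1.0, growth_weight=1.0):
    """Sort clusters by a weighted sum of total size and growth speed."""
    def score(cluster):
        size = sum(cluster.counts_per_window)
        return size_weight * size + growth_weight * growth_speed(cluster.counts_per_window)
    return sorted(clusters, key=score, reverse=True)
```

In this sketch a fast-growing cluster can outrank a larger but fading one, depending on the two weights, which would need tuning on real data.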

Pay Voice: Point of Sale Recognition for Visually Impaired People


Title	Pay Voice: Point of Sale Recognition for Visually Impaired People
Authors	Guilherme Folego, Filipe Costa, Bruno Costa, Alan Godoy, Luiz Pita
Abstract	Millions of visually impaired people depend on relatives and friends to perform their everyday tasks. One relevant step towards self-sufficiency is to provide them with means to verify the value and operation presented in payment machines. In this work, we developed and released a smartphone application, named Pay Voice, that uses image processing, optical character recognition (OCR) and voice synthesis to recognize the value and operation presented in POS and PIN pad machines, and then informs the user with auditory and visual feedback. The proposed approach presented significant results for value and operation recognition, especially for POS, due to the higher display quality. Importantly, we achieved the key performance indicators, namely, more than 80% accuracy in a real-world scenario, and less than 5 seconds of processing time for recognition. Pay Voice is publicly available on Google Play and App Store for free.
Tasks	Optical Character Recognition
Published	2018-12-14
URL	http://arxiv.org/abs/1812.05740v1
PDF	http://arxiv.org/pdf/1812.05740v1.pdf
PWC	https://paperswithcode.com/paper/pay-voice-point-of-sale-recognition-for
Repo
Framework

Model-free, Model-based, and General Intelligence


Title	Model-free, Model-based, and General Intelligence
Authors	Hector Geffner
Abstract	During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.
Tasks
Published	2018-06-06
URL	http://arxiv.org/abs/1806.02308v1
PDF	http://arxiv.org/pdf/1806.02308v1.pdf
PWC	https://paperswithcode.com/paper/model-free-model-based-and-general
Repo
Framework

A Sequential Embedding Approach for Item Recommendation with Heterogeneous Attributes


Title	A Sequential Embedding Approach for Item Recommendation with Heterogeneous Attributes
Authors	Kuan Liu, Xing Shi, Prem Natarajan
Abstract	Attributes, such as metadata and profile, carry useful information which in principle can help improve accuracy in recommender systems. However, existing approaches have difficulty in fully leveraging attribute information due to practical challenges such as heterogeneity and sparseness. These approaches also fail to combine recurrent neural networks, which have recently shown effectiveness in item recommendations in applications such as video and music browsing. To overcome the challenges and to harvest the advantages of sequence models, we present a novel approach, Heterogeneous Attribute Recurrent Neural Networks (HA-RNN), which incorporates heterogeneous attributes and captures sequential dependencies in both items and attributes. HA-RNN extends recurrent neural networks with 1) a hierarchical attribute combination input layer and 2) an output attribute embedding layer. We conduct extensive experiments on two large-scale datasets. The new approach shows significant improvements over the state-of-the-art models. Our ablation experiments demonstrate the effectiveness of the two components in addressing heterogeneous attribute challenges including variable lengths and attribute sparseness. We further investigate why sequence modeling works well by conducting exploratory studies and show that sequence models are more effective when data scale increases.
Tasks	Recommendation Systems
Published	2018-05-28
URL	http://arxiv.org/abs/1805.11008v1
PDF	http://arxiv.org/pdf/1805.11008v1.pdf
PWC	https://paperswithcode.com/paper/a-sequential-embedding-approach-for-item
Repo
Framework