October 17, 2019


Paper Group ANR 742

Clustrophile 2: Guided Visual Clustering Analysis. How to Improve Your Speaker Embeddings Extractor in Generic Toolkits. Can machine learning identify interesting mathematics? An exploration using empirically observed laws. Actor and Action Video Segmentation from a Sentence. Diachronic Usage Relatedness (DURel): A Framework for the Annotation of L …

Clustrophile 2: Guided Visual Clustering Analysis


Title	Clustrophile 2: Guided Visual Clustering Analysis
Authors	Marco Cavallo, Çağatay Demiralp
Abstract	Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The number of possible clusterings for a typical dataset is vast, and navigating this space is also challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfactory clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large clustering space so as to improve the effectiveness of exploratory clustering analysis. We introduce Clustrophile 2, a new interactive tool for guided clustering analysis. Clustrophile 2 guides users in clustering-based exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps users quickly reason about differences between clusterings. To this end, Clustrophile 2 contributes a novel feature, the Clustering Tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate Clustrophile 2 through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson’s disease patients. Results suggest that Clustrophile 2 improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts.
Tasks
Published	2018-04-09
URL	http://arxiv.org/abs/1804.03048v3
PDF	http://arxiv.org/pdf/1804.03048v3.pdf
PWC	https://paperswithcode.com/paper/clustrophile-2-guided-visual-clustering
Repo
Framework

How to Improve Your Speaker Embeddings Extractor in Generic Toolkits


Title	How to Improve Your Speaker Embeddings Extractor in Generic Toolkits
Authors	Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky
Abstract	Recently, speaker embeddings extracted with deep neural networks have become the state-of-the-art method for speaker verification. In this paper we aim to facilitate its implementation in a more generic toolkit than Kaldi, which we anticipate will enable further improvements to the method. We examine several training tricks, such as the effects of normalizing input features and pooled statistics, different methods for preventing overfitting, and alternative non-linearities that can be used instead of Rectified Linear Units. In addition, we investigate the difference in performance between TDNN and CNN, and between two types of attention mechanisms. Experimental results on the Speakers in the Wild, SRE 2016 and SRE 2018 datasets demonstrate the effectiveness of the proposed implementation.
Tasks	Speaker Verification
Published	2018-11-05
URL	http://arxiv.org/abs/1811.02066v1
PDF	http://arxiv.org/pdf/1811.02066v1.pdf
PWC	https://paperswithcode.com/paper/how-to-improve-your-speaker-embeddings
Repo
Framework
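
The "pooled statistics" in the abstract above refer to the layer that turns a variable-length sequence of frame-level activations into one fixed-length utterance vector. A common choice in x-vector-style extractors is to concatenate the per-dimension mean and standard deviation over time. The sketch below illustrates only that generic pooling step in plain Python; it is not the authors' implementation, it does not reproduce the normalization schemes the paper compares, and the function name is our own.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Map T frame-level vectors of size D to one 2*D utterance-level vector.

    The output is [mean_1..mean_D, std_1..std_D], computed over the time axis.
    """
    if not frames:
        raise ValueError("statistics pooling needs at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation per dimension (divide by n, not n - 1).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Two frames of 2-dimensional activations -> one 4-dimensional pooled vector.
print(statistics_pooling([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0, 1.0, 1.0]
```

Because the output length depends only on D, utterances of any duration map to embeddings of the same size, which is what lets a single classifier or backend sit on top.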

Can machine learning identify interesting mathematics? An exploration using empirically observed laws


Title	Can machine learning identify interesting mathematics? An exploration using empirically observed laws
Authors	Chai Wah Wu
Abstract	We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws, Benford’s law and Taylor’s law, and experiment with various classifiers to identify whether a sequence is, for example, nice, important, multiplicative, easy to compute, or related to primes or palindromes.
Tasks
Published	2018-05-18
URL	http://arxiv.org/abs/1805.07431v3
PDF	http://arxiv.org/pdf/1805.07431v3.pdf
PWC	https://paperswithcode.com/paper/can-machine-learning-identify-interesting
Repo
Framework
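
As an illustration of what a Benford-based "fingerprint" of an integer sequence can look like, the sketch below compares a sequence's leading-digit frequencies with Benford's distribution P(d) = log10(1 + 1/d). The L1 distance used as the feature is our own assumption for illustration; the paper's exact feature construction, and its Taylor's-law features, are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def leading_digit(n: int) -> int:
    """Most significant decimal digit of a nonzero integer."""
    n = abs(n)
    while n >= 10:
        n //= 10
    return n


def benford_expected() -> Dict[int, float]:
    """Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_distribution(seq: Iterable[int]) -> Dict[int, float]:
    """Empirical frequency of leading digits 1..9, ignoring zero terms."""
    digits = [leading_digit(x) for x in seq if x != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def benford_distance(seq: Iterable[int]) -> float:
    """L1 distance between observed and Benford first-digit distributions."""
    observed = first_digit_distribution(seq)
    expected = benford_expected()
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))


powers_of_two = [2 ** k for k in range(1, 200)]
print(benford_distance(powers_of_two))       # small: powers of 2 are close to Benford
print(benford_distance(list(range(1, 10))))  # larger: uniform leading digits
```

A vector of such per-digit frequencies (or distances) is the kind of fixed-length input a standard classifier can consume, regardless of how long the original sequence is.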

Actor and Action Video Segmentation from a Sentence


Title	Actor and Action Video Segmentation from a Sentence
Authors	Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek
Abstract	This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside of the actor and action vocabulary. We propose a fully-convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage for traditional actor and action segmentation compared to the state-of-the-art.
Tasks	Action Segmentation, Video Semantic Segmentation
Published	2018-03-20
URL	http://arxiv.org/abs/1803.07485v1
PDF	http://arxiv.org/pdf/1803.07485v1.pdf
PWC	https://paperswithcode.com/paper/actor-and-action-video-segmentation-from-a
Repo
Framework

Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change


Title	Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change
Authors	Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann
Abstract	We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test set for German comprises ratings from five annotators for the relatedness of 1,320 use pairs across 22 target words.
Tasks
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06517v1
PDF	http://arxiv.org/pdf/1804.06517v1.pdf
PWC	https://paperswithcode.com/paper/diachronic-usage-relatedness-durel-a
Repo
Framework

Deep neural network based i-vector mapping for speaker verification using short utterances


Title	Deep neural network based i-vector mapping for speaker verification using short utterances
Authors	Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer Alwan
Abstract	Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector-based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model training methods for i-vector modeling using full-length and short-utterance evaluation tasks. The two methods are Gaussian mixture model (GMM) based and deep neural network (DNN) based. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system under various durations. However, the performance of both systems degrades significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods that train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vector can restore missing information and reduce the variance of the original short-utterance i-vectors. Both proposed methods model the joint representation of short- and long-utterance i-vectors using an autoencoder. Experimental results on the NIST SRE 2010 dataset show that both methods provide significant improvement, yielding up to a 28.43% relative improvement in Equal Error Rate over a baseline system when using a deep encoder with residual blocks and an additional phoneme vector. When the best-validated SRE10 models are further tested on the Speakers in the Wild dataset, the methods yield a 23.12% improvement on arbitrary-duration (1-5 s) short-utterance conditions.
Tasks	Speaker Recognition, Speaker Verification, Text-Independent Speaker Recognition
Published	2018-10-16
URL	http://arxiv.org/abs/1810.07309v1
PDF	http://arxiv.org/pdf/1810.07309v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-based-i-vector-mapping
Repo
Framework
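
The results above are reported as relative improvements in Equal Error Rate (EER), the operating point where the false acceptance and false rejection rates coincide. The sketch below estimates EER with a simple threshold sweep over the observed scores and computes relative improvement as (baseline - new) / baseline. It is a plain illustration under those assumptions, not the NIST scoring tools or the authors' evaluation code.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    target_scores: Sequence[float], nontarget_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold), sweeping every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in thresholds:
        # False rejection: genuine (target) trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor (non-target) trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline_eer - new_eer) / baseline_eer


eer, thr = equal_error_rate([0.4, 0.6, 0.8], [0.2, 0.5, 0.7])
print(eer, thr)  # EER of 1/3 at threshold 0.6
# A drop from 4.0% to 3.0% EER is a 25% relative improvement.
print(relative_improvement(4.0, 3.0))  # 0.25
```

With realistic trial counts, the sweep is usually replaced by interpolation on the DET/ROC curve, but the definition of the operating point is the same.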

Speech Recognition: Keyword Spotting Through Image Recognition


Title	Speech Recognition: Keyword Spotting Through Image Recognition
Authors	Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K Warmuth
Abstract	The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc. We will compare the efficacies of several neural network architectures for the speech recognition problem. In particular, we will build a model to determine whether a one-second audio clip contains a particular word (out of a set of 10), an unknown word, or silence. The models to be implemented are a CNN recommended by the TensorFlow Speech Recognition tutorial, a low-latency CNN, and an adversarially trained CNN. The result is a demonstration of how to convert a problem in audio recognition to the better-studied domain of image classification, where the powerful techniques of convolutional neural networks are well developed. Additionally, we demonstrate the applicability of Virtual Adversarial Training (VAT) to this problem domain, where it functions as a powerful regularizer with promising potential future applications.
Tasks	Image Classification, Keyword Spotting, Speech Recognition
Published	2018-03-10
URL	http://arxiv.org/abs/1803.03759v1
PDF	http://arxiv.org/pdf/1803.03759v1.pdf
PWC	https://paperswithcode.com/paper/speech-recognition-keyword-spotting-through
Repo
Framework
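
The key move in the abstract above is turning a one-dimensional audio clip into a two-dimensional time-frequency "image" that a CNN can classify. A minimal way to do this is a short-time Fourier transform: slice the signal into overlapping windowed frames and take the magnitude spectrum of each. The sketch below uses a naive DFT from the standard library so it has no dependencies; the frame length, hop size, Hann window, and log compression are illustrative assumptions, not the settings of the TensorFlow tutorial or the paper.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    if frame_len < 2 or hop < 1:
        raise ValueError("need frame_len >= 2 and hop >= 1")
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive O(N^2) DFT, keeping the non-negative frequencies 0..frame_len // 2.
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


def log_compress(spec: List[List[float]], eps: float = 1e-10) -> List[List[float]]:
    """Log magnitudes, the kind of 'image' typically fed to a CNN."""
    return [[math.log(v + eps) for v in row] for row in spec]


sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
# Bin spacing is sample_rate / frame_len = 125 Hz, so the 1 kHz tone peaks in bin 8.
print(len(spec), len(spec[0]))  # 7 33
```

In practice one would use an FFT from a numerical library and often a mel filterbank on top, but the resulting frames-by-bins grid is what gets treated as an image.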

Spoken Pass-Phrase Verification in the i-vector Space


Title	Spoken Pass-Phrase Verification in the i-vector Space
Authors	Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky
Abstract	The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances. Besides other applications, pass-phrase verification can complement an independent speaker verification subsystem in text-dependent speaker verification. It can also be used for liveness detection by verifying that the user is able to correctly respond to a randomly prompted phrase. In this paper, we build on our previous work on i-vector based text-dependent speaker verification, where we have shown that i-vectors extracted using phrase-specific Hidden Markov Models (HMMs) or using Deep Neural Network (DNN) based bottleneck (BN) features help to reject utterances with wrong pass-phrases. We apply the same i-vector extraction techniques to the stand-alone task of speaker-independent spoken pass-phrase classification and verification. The experiments on the RSR2015 and RedDots databases show that very simple scoring techniques (e.g., cosine distance scoring) applied to such i-vectors can provide results superior to those previously published on the same data.
Tasks	Speaker Verification, Text-Dependent Speaker Verification
Published	2018-09-28
URL	http://arxiv.org/abs/1809.11068v1
PDF	http://arxiv.org/pdf/1809.11068v1.pdf
PWC	https://paperswithcode.com/paper/spoken-pass-phrase-verification-in-the-i
Repo
Framework
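
The abstract above notes that very simple scoring, such as cosine distance scoring, works well on these i-vectors. The sketch below shows cosine scoring of a test i-vector against a pass-phrase model. Averaging the enrollment i-vectors into one model and the fixed decision threshold are our own simplifying assumptions; real systems typically also apply length normalization or LDA and calibrate the threshold on development data.

```python
import math
from typing import Sequence, Tuple


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two i-vectors."""
    if len(a) != len(b):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_pass_phrase(
    test_ivec: Sequence[float],
    enrollment_ivecs: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[float, bool]:
    """Score a test i-vector against the mean of the enrollment i-vectors."""
    dim = len(test_ivec)
    model = [sum(v[d] for v in enrollment_ivecs) / len(enrollment_ivecs) for d in range(dim)]
    score = cosine_score(test_ivec, model)
    return score, score >= threshold


print(verify_pass_phrase([1.0, 0.0], [[1.0, 0.1], [1.0, -0.1]], threshold=0.9))  # (1.0, True)
```

Cosine scoring ignores vector magnitude, so only the direction of the i-vector, which is where phrase information is expected to live, affects the decision.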

Graph Based Analysis for Gene Segment Organization In a Scrambled Genome


Title	Graph Based Analysis for Gene Segment Organization In a Scrambled Genome
Authors	Mustafa Hajij, Nataša Jonoska, Denys Kukushkin, Masahico Saito
Abstract	DNA rearrangement processes recombine gene segments that are organized on the chromosome in a variety of ways. The segments can overlap, interleave, or one may be a subsegment of another. We use directed graphs to represent segment organizations on a given locus, where contigs containing rearranged segments are the vertices and the edges correspond to the segment relationships. Using graph properties, we associate each graph with a point in a higher-dimensional Euclidean space so that clusters can be formed and analyzed with methods from topological data analysis. The method is applied to a recently sequenced model organism, Oxytricha trifallax, a species of ciliate with a highly scrambled genome that undergoes a massive rearrangement process after conjugation. The analysis shows some emerging star-like graph structures, indicating that segments of a single gene can interleave with, or even contain, all of the segments from fifteen or more other genes in between its segments. We also observe that as many as six genes can have their segments mutually interleaving or overlapping.
Tasks	Topological Data Analysis
Published	2018-01-18
URL	http://arxiv.org/abs/1801.05922v2
PDF	http://arxiv.org/pdf/1801.05922v2.pdf
PWC	https://paperswithcode.com/paper/graph-based-analysis-for-gene-segment
Repo
Framework

Faster Learning by Reduction of Data Access Time


Title	Faster Learning by Reduction of Data Access Time
Authors	Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya
Abstract	Nowadays, the major challenge in machine learning is the Big Data challenge. In big data problems, whether due to a large number of data points, a large number of features per data point, or both, model training has become very slow. Training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second part, i.e., learning from the data. In this paper, we propose one possible solution to handle big data problems in machine learning. The idea is to reduce training time by reducing data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To demonstrate the effectiveness of the proposed sampling techniques, we use Empirical Risk Minimization, a commonly used machine learning problem, in the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (Mini-batched SGD), each with two step-size determination techniques: constant step size and backtracking line search. Theoretical results show that systematic sampling, cyclic sampling, and the widely used random sampling technique have the same convergence in expectation. Experimental results on benchmark datasets demonstrate the efficacy of the proposed sampling techniques and show up to six times faster training.
Tasks
Published	2018-01-18
URL	http://arxiv.org/abs/1801.05931v4
PDF	http://arxiv.org/pdf/1801.05931v4.pdf
PWC	https://paperswithcode.com/paper/faster-learning-by-reduction-of-data-access
Repo
Framework
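
To make the three mini-batch selection schemes in the abstract above concrete, the sketch below generates index batches for cyclic/sequential sampling (contiguous blocks visited in order), systematic sampling (a random start followed by a fixed interval), and the usual uniform random sampling baseline. The systematic variant here follows the textbook survey-sampling definition; the paper's exact variant may differ, and the function names are hypothetical.

```python
import random
from typing import List


def cyclic_batches(n: int, batch_size: int) -> List[List[int]]:
    """Cyclic/sequential sampling: split 0..n-1 into contiguous blocks, visited in order."""
    return [list(range(s, min(s + batch_size, n))) for s in range(0, n, batch_size)]


def systematic_batch(n: int, batch_size: int, rng: random.Random) -> List[int]:
    """Textbook systematic sampling: random start, then every k-th index (k = n // batch_size)."""
    if not 0 < batch_size <= n:
        raise ValueError("batch_size must be in 1..n")
    k = n // batch_size
    start = rng.randrange(k)
    return [start + i * k for i in range(batch_size)]


def random_batch(n: int, batch_size: int, rng: random.Random) -> List[int]:
    """Baseline: uniform random sampling without replacement."""
    return rng.sample(range(n), batch_size)


rng = random.Random(0)
print(cyclic_batches(10, 4))         # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(systematic_batch(10, 5, rng))  # every 2nd index from a random start (0 or 1)
print(random_batch(10, 5, rng))      # 5 distinct indices in arbitrary order
```

The intuition, as we read the abstract, is that ordered index patterns need only one random draw per batch (or none) and touch memory more predictably than fully random indices, which is where the savings in data access time would come from.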

Framewise approach in multimodal emotion recognition in OMG challenge


Title	Framewise approach in multimodal emotion recognition in OMG challenge
Authors	Grigoriy Sterling, Andrey Belyaev, Maxim Ryabov
Abstract	In this report we describe our approach, which achieves 53% unweighted accuracy over 7 emotions and mean squared errors of 0.05 and 0.09 for arousal and valence, respectively, in the OMG emotion recognition challenge. Our results were obtained with an ensemble of single-modality models trained separately on the voice and face data from the videos. We treat each stream as a sequence of frames. We then extract features from each frame and process the sequence with a recurrent neural network. An audio frame is a short 0.4-second spectrogram interval. To extract features from face images, we used our own ResNet neural network pretrained on the AffectNet database. Each short spectrogram was treated as an image and likewise processed by a convolutional network. As the base audio model, we used a ResNet pretrained on a speaker recognition task. Predictions from both modalities were fused at the decision level and improved on the single-channel approaches by a few percent.
Tasks	Emotion Recognition, Multimodal Emotion Recognition, Speaker Recognition
Published	2018-05-03
URL	http://arxiv.org/abs/1805.01369v1
PDF	http://arxiv.org/pdf/1805.01369v1.pdf
PWC	https://paperswithcode.com/paper/framewise-approach-in-multimodal-emotion
Repo
Framework
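
The report above fuses the audio and face models "on decision level", i.e., it combines each model's final predictions rather than their internal features. The sketch below shows one common form of decision-level fusion: a weighted average of per-class probabilities for the emotion label, and the same weighting for continuous arousal/valence predictions. The equal default weight and the example scores are assumptions for illustration; the report does not specify its fusion rule here.

```python
from typing import List, Sequence, Tuple


def fuse_decisions(
    audio_probs: Sequence[float],
    face_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Weighted average of per-class probabilities; returns (argmax class, fused scores)."""
    if len(audio_probs) != len(face_probs):
        raise ValueError("both modalities must score the same classes")
    fused = [
        audio_weight * a + (1.0 - audio_weight) * f
        for a, f in zip(audio_probs, face_probs)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


def fuse_regression(audio_pred: float, face_pred: float, audio_weight: float = 0.5) -> float:
    """Same weighting for continuous targets such as arousal or valence."""
    return audio_weight * audio_pred + (1.0 - audio_weight) * face_pred


# Hypothetical 3-class scores: audio prefers class 0, face prefers class 1.
label, scores = fuse_decisions([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
print(label)  # 1
```

The fusion weight is the natural knob to tune on validation data: it lets the ensemble lean on whichever modality is more reliable.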

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples


Title	Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Authors	Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, Cho-Jui Hsieh
Abstract	Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem since its input space is continuous and output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design some novel loss functions to conduct non-overlapping attack and targeted keyword attack. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make seq2seq models produce the desired outputs with high success rates. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
Tasks	Image Classification, Machine Translation, Text Summarization
Published	2018-03-03
URL	https://arxiv.org/abs/1803.01128v2
PDF	https://arxiv.org/pdf/1803.01128v2.pdf
PWC	https://paperswithcode.com/paper/seq2sick-evaluating-the-robustness-of
Repo
Framework
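
One way to see why a group lasso term helps the attack described above keep the number of changed words small: if the perturbation is split into one group per token (the change to that token's embedding), the penalty is the sum of per-group L2 norms, and its proximal step zeroes out whole tokens rather than individual coordinates. The sketch below shows that penalty, the standard group soft-thresholding step, and a nearest-word projection back to the discrete vocabulary. Using a proximal step and the toy vocabulary are our own assumptions; this is not the paper's full projected-gradient algorithm or its loss functions.

```python
import math
from typing import Dict, List, Sequence


def group_lasso_penalty(perturbation: Sequence[Sequence[float]]) -> float:
    """sum_i ||delta_i||_2, with one group per token position."""
    return sum(math.sqrt(sum(x * x for x in delta)) for delta in perturbation)


def group_soft_threshold(
    perturbation: Sequence[Sequence[float]], lam: float
) -> List[List[float]]:
    """Proximal operator of lam * group lasso: shrink each token's perturbation
    toward zero and drop it entirely when its norm is at most lam."""
    out = []
    for delta in perturbation:
        norm = math.sqrt(sum(x * x for x in delta))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in delta])
    return out


def project_to_vocab(embedding: Sequence[float], vocab: Dict[str, Sequence[float]]) -> str:
    """Snap a continuous embedding back to the nearest word vector (squared Euclidean)."""
    return min(vocab, key=lambda w: sum((e - v) ** 2 for e, v in zip(embedding, vocab[w])))


delta = [[3.0, 4.0], [0.1, 0.0]]
print(group_lasso_penalty(delta))        # about 5.1
print(group_soft_threshold(delta, 1.0))  # only the first token keeps a nonzero change
print(project_to_vocab([0.9, 0.1], {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}))  # cat
```

The projection step is what turns a continuous adversarial embedding back into an actual word substitution in the input sentence.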

Hierarchical Reinforcement Learning with Abductive Planning


Title	Hierarchical Reinforcement Learning with Abductive Planning
Authors	Kazeto Yamamoto, Takashi Onishi, Yoshimasa Tsuruoka
Abstract	One of the key challenges in applying reinforcement learning to real-life problems is that the amount of trial-and-error required to learn a good policy increases drastically as the task becomes more complex. One potential solution to this problem is to combine reinforcement learning with automated symbolic planning and utilize prior knowledge of the domain. However, existing methods have limitations in their applicability and expressiveness. In this paper we propose a hierarchical reinforcement learning method based on abductive symbolic planning. The planner can deal with user-defined evaluation functions and is not based on the Herbrand theorem. Therefore it can utilize prior knowledge of the rewards and can work in a domain where the state space is unknown. We demonstrate empirically that our architecture significantly improves learning efficiency with respect to the number of training examples on the evaluation domain, in which the state space is unknown and there exist multiple goals.
Tasks	Hierarchical Reinforcement Learning
Published	2018-06-28
URL	http://arxiv.org/abs/1806.10792v1
PDF	http://arxiv.org/pdf/1806.10792v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-reinforcement-learning-with
Repo
Framework

Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering


Title	Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering
Authors	Aleksandr I. Panov, Aleksey Skrynnik
Abstract	We introduce a new approach to hierarchy formation and task decomposition in hierarchical reinforcement learning. Our method is based on the Hierarchy of Abstract Machines (HAM) framework, because the HAM approach can be used to design efficient controllers that realize specific behaviors in real robots. The key to our algorithm is the introduction of an internal or “mental” environment in which the state represents the structure of the HAM hierarchy. An internal action in this environment changes the hierarchy of HAMs. We propose applying the classical Q-learning procedure in the internal environment, which allows the agent to obtain an optimal hierarchy. We extend the HAM framework by adding an on-model approach to select the appropriate sub-machine to execute action sequences for a certain class of external environment states. Preliminary experiments demonstrate the promise of the method.
Tasks	Hierarchical Reinforcement Learning, Q-Learning
Published	2018-06-13
URL	http://arxiv.org/abs/1806.05292v1
PDF	http://arxiv.org/pdf/1806.05292v1.pdf
PWC	https://paperswithcode.com/paper/automatic-formation-of-the-structure-of
Repo
Framework
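
The abstract above applies "the classical Q-learning procedure" in the internal environment. For reference, the sketch below shows the tabular Q-learning update, Q(s, a) += alpha * (r + gamma * max_b Q(s', b) - Q(s, a)), together with epsilon-greedy action selection. The state and action labels (a state standing for an encoding of the current HAM hierarchy, an action for an edit to it) are hypothetical placeholders, and the step size and discount are illustrative; the paper's actual state encoding and reward are not reproduced here.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step: Q(s, a) += alpha * (target - Q(s, a))."""
    # Terminal transitions have no bootstrapped future value.
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


# Hypothetical internal-environment labels: "h0"/"h1" stand for encodings of HAM
# hierarchies, and the actions stand for edits to the hierarchy.
Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["add_submachine", "remove_submachine"]
q_update(Q, "h0", "add_submachine", 1.0, "h1", actions, alpha=0.5, gamma=0.9)
print(Q[("h0", "add_submachine")])  # 0.5
```

Because the update only needs sampled transitions (s, a, r, s'), the same procedure applies whether the environment is the external task or the internal environment whose states describe the hierarchy itself.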

Coupling weak and strong supervision for classification of prostate cancer histopathology images


Title	Coupling weak and strong supervision for classification of prostate cancer histopathology images
Authors	Eirini Arvaniti, Manfred Claassen
Abstract	Automated grading of prostate cancer histopathology images is a challenging task, with one key challenge being the scarcity of annotations down to the level of regions of interest (strong labels), as the prostate cancer Gleason score is typically known only for entire tissue slides (weak labels). In this study, we focus on automated Gleason score assignment of prostate cancer whole-slide images on the basis of a large weakly-labeled dataset and a smaller strongly-labeled one. We efficiently leverage information from both label sources by jointly training a classifier on the two datasets and by introducing a gradient update scheme that assigns a different relative importance to each training example, as a means of self-controlling the weak supervision signal. Our approach achieves superior performance when compared with standard Gleason scoring methods.
Tasks
Published	2018-11-16
URL	http://arxiv.org/abs/1811.07013v1
PDF	http://arxiv.org/pdf/1811.07013v1.pdf
PWC	https://paperswithcode.com/paper/coupling-weak-and-strong-supervision-for
Repo
Framework