Paper Group ANR 742
Clustrophile 2: Guided Visual Clustering Analysis
Title | Clustrophile 2: Guided Visual Clustering Analysis |
Authors | Marco Cavallo, Çağatay Demiralp |
Abstract | Data clustering is a common unsupervised learning method frequently used in exploratory data analysis. However, identifying relevant structures in unlabeled, high-dimensional data is nontrivial, requiring iterative experimentation with clustering parameters as well as data features and instances. The number of possible clusterings for a typical dataset is vast, and navigating this space is challenging. The absence of ground-truth labels makes it impossible to define an optimal solution, thus requiring user judgment to establish what can be considered a satisfactory clustering result. Data scientists need adequate interactive tools to effectively explore and navigate the large clustering space so as to improve the effectiveness of exploratory clustering analysis. We introduce Clustrophile 2, a new interactive tool for guided clustering analysis. Clustrophile 2 guides users in clustering-based exploratory analysis, adapts user feedback to improve user guidance, facilitates the interpretation of clusters, and helps quickly reason about differences between clusterings. To this end, Clustrophile 2 contributes a novel feature, the Clustering Tour, to help users choose clustering parameters and assess the quality of different clustering results in relation to current analysis goals and user expectations. We evaluate Clustrophile 2 through a user study with 12 data scientists, who used our tool to explore and interpret sub-cohorts in a dataset of Parkinson's disease patients. Results suggest that Clustrophile 2 improves the speed and effectiveness of exploratory clustering analysis for both experts and non-experts. |
Tasks | |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03048v3 |
http://arxiv.org/pdf/1804.03048v3.pdf | |
PWC | https://paperswithcode.com/paper/clustrophile-2-guided-visual-clustering |
Repo | |
Framework | |
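The Clustering Tour is an interactive feature, but the loop it guides can be sketched in a few lines: enumerate candidate clusterings along one parameter axis, score each candidate, and surface the most promising ones for inspection. The sketch below uses scikit-learn's KMeans and silhouette score purely as stand-ins; it is not part of Clustrophile 2.

```python
# A minimal sketch (not Clustrophile 2 itself) of the parameter sweep that a
# "Clustering Tour" guides interactively: enumerate candidate clusterings,
# score each, and surface the best candidates for user inspection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))           # stand-in for an unlabeled dataset

candidates = []
for k in range(2, 9):                   # one axis of the clustering space
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    candidates.append((silhouette_score(X, labels), k, labels))

# Rank candidates; a guided tool would present these to the analyst in order.
for score, k, _ in sorted(candidates, reverse=True)[:3]:
    print(f"k={k}: silhouette={score:.3f}")
```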
How to Improve Your Speaker Embeddings Extractor in Generic Toolkits
Title | How to Improve Your Speaker Embeddings Extractor in Generic Toolkits |
Authors | Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky |
Abstract | Recently, speaker embeddings extracted with deep neural networks have become the state-of-the-art method for speaker verification. In this paper we aim to facilitate their implementation in a more generic toolkit than Kaldi, which we anticipate will enable further improvements to the method. We examine several training tricks, such as the effects of normalizing input features and pooled statistics, different methods for preventing overfitting, and alternative non-linearities that can be used instead of Rectified Linear Units. In addition, we investigate the difference in performance between TDNN and CNN, and between two types of attention mechanism. Experimental results on the Speakers in the Wild, SRE 2016 and SRE 2018 datasets demonstrate the effectiveness of the proposed implementation. |
Tasks | Speaker Verification |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02066v1 |
http://arxiv.org/pdf/1811.02066v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-improve-your-speaker-embeddings |
Repo | |
Framework | |
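One of the components the paper studies is the pooling of frame-level statistics, optionally weighted by attention. A minimal NumPy sketch of mean-plus-standard-deviation statistics pooling follows; the shapes and the optional weighting are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal NumPy sketch of statistics pooling; the dimensions and the
# attention-weighted variant are assumptions for illustration.
import numpy as np

def stats_pooling(frames, weights=None):
    """Pool frame-level features (T, D) into a fixed utterance vector (2*D,).

    With weights=None this is plain mean+std pooling; passing attention
    weights (T,) gives the weighted variant the paper compares.
    """
    if weights is None:
        weights = np.full(len(frames), 1.0 / len(frames))
    mean = weights @ frames
    var = weights @ (frames - mean) ** 2
    return np.concatenate([mean, np.sqrt(np.maximum(var, 1e-10))])

frames = np.random.randn(200, 512)       # 200 frames of 512-dim features
pooled = stats_pooling(frames)           # (1024,) pooled statistics
print(pooled.shape)
```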
Can machine learning identify interesting mathematics? An exploration using empirically observed laws
Title | Can machine learning identify interesting mathematics? An exploration using empirically observed laws |
Authors | Chai Wah Wu |
Abstract | We explore the possibility of using machine learning to identify interesting mathematical structures by using certain quantities that serve as fingerprints. In particular, we extract features from integer sequences using two empirical laws, Benford's law and Taylor's law, and experiment with various classifiers to identify whether a sequence is, for example, nice, important, multiplicative, easy to compute, or related to primes or palindromes. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07431v3 |
http://arxiv.org/pdf/1805.07431v3.pdf | |
PWC | https://paperswithcode.com/paper/can-machine-learning-identify-interesting |
Repo | |
Framework | |
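Benford's law predicts the leading-digit distribution log10(1 + 1/d) for d = 1..9, so one simple fingerprint for an integer sequence is its deviation from that distribution. The sketch below computes such a feature vector; the exact feature and any downstream classifier choice are illustrative, not the paper's precise setup.

```python
# A minimal sketch of the fingerprinting idea: turn an integer sequence into
# a feature vector measuring its deviation from Benford's law.
import numpy as np

def benford_features(seq):
    """Leading-digit distribution of a sequence minus Benford's prediction."""
    digits = [int(str(abs(n))[0]) for n in seq if n != 0]
    counts = np.bincount(digits, minlength=10)[1:10].astype(float)
    observed = counts / counts.sum()
    benford = np.log10(1 + 1 / np.arange(1, 10))
    return observed - benford            # 9-dim deviation fingerprint

powers_of_two = [2 ** k for k in range(1, 200)]       # famously Benford-like
print(np.abs(benford_features(powers_of_two)).max())  # small deviation
```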
Actor and Action Video Segmentation from a Sentence
Title | Actor and Action Video Segmentation from a Sentence |
Authors | Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek |
Abstract | This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment pairs that are outside of the actor and action vocabulary. We propose a fully convolutional model for pixel-level actor and action segmentation using an encoder-decoder architecture optimized for video. To show the potential of actor and action video segmentation from a sentence, we extend two popular actor and action datasets with more than 7,500 natural language descriptions. Experiments demonstrate the quality of the sentence-guided segmentations, the generalization ability of our model, and its advantage over the state-of-the-art on traditional actor and action segmentation. |
Tasks | Action Segmentation, Video Semantic Segmentation |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07485v1 |
http://arxiv.org/pdf/1803.07485v1.pdf | |
PWC | https://paperswithcode.com/paper/actor-and-action-video-segmentation-from-a |
Repo | |
Framework | |
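One common way to condition segmentation on a sentence is to generate convolutional filters from the sentence encoding and correlate them with the visual feature map. The NumPy sketch below illustrates that dynamic-filter idea with invented shapes and a random (in practice, learned) filter generator; the paper's full encoder-decoder model is considerably richer.

```python
# A minimal sketch of sentence-conditioned segmentation via dynamic filters:
# a filter generated from the sentence encoding is correlated with the visual
# feature map to produce a per-pixel response. All shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 64                                   # feature channels
visual = rng.normal(size=(D, 32, 32))    # CNN feature map for one frame
sentence = rng.normal(size=(300,))       # pooled word embeddings of the query

W = rng.normal(size=(D, 300)) / np.sqrt(300)  # filter generator (learned in practice)
dynamic_filter = np.tanh(W @ sentence)        # (D,) 1x1 convolution kernel

response = np.einsum("d,dhw->hw", dynamic_filter, visual)  # per-pixel score
segmentation = response > 0                                # crude binary mask
print(segmentation.shape, segmentation.mean())
```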
Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change
Title | Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change |
Authors | Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann |
Abstract | We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test set for German comprises ratings from five annotators for the relatedness of 1,320 use pairs across 22 target words. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06517v1 |
http://arxiv.org/pdf/1804.06517v1.pdf | |
PWC | https://paperswithcode.com/paper/diachronic-usage-relatedness-durel-a |
Repo | |
Framework | |
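High inter-annotator agreement is central to the framework, and a standard way to check it is pairwise rank correlation between annotators' relatedness ratings of use pairs. The sketch below computes pairwise Spearman correlations over invented toy ratings; DURel's actual agreement statistics may differ.

```python
# A minimal sketch of the kind of agreement check DURel relies on: pairwise
# Spearman correlation between annotators' use-pair ratings. The toy ratings
# below are invented for illustration.
from itertools import combinations
from scipy.stats import spearmanr

ratings = {                              # annotator -> rating per use pair (1-4 scale)
    "A1": [4, 3, 1, 2, 4, 1],
    "A2": [4, 2, 1, 1, 3, 2],
    "A3": [3, 3, 2, 2, 4, 1],
}

for a, b in combinations(ratings, 2):
    rho, _ = spearmanr(ratings[a], ratings[b])
    print(f"{a} vs {b}: Spearman rho = {rho:.2f}")
```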
Deep neural network based i-vector mapping for speaker verification using short utterances
Title | Deep neural network based i-vector mapping for speaker verification using short utterances |
Authors | Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer Alwan |
Abstract | Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model training methods for i-vector modeling using full-length and short-utterance evaluation tasks. The two methods are Gaussian mixture model (GMM) based and deep neural network (DNN) based. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system across various durations. However, the performance of both systems degrades significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods that train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vectors can restore missing information and reduce the variance of the original short-utterance i-vectors. Both proposed methods model the joint representation of short and long utterance i-vectors using an autoencoder. Experimental results on the NIST SRE 2010 dataset show that both methods provide significant improvement, with a maximum of 28.43% relative improvement in Equal Error Rate over a baseline system when using a deep encoder with residual blocks and adding an additional phoneme vector. When further testing the best-validated models of SRE10 on the Speakers In The Wild dataset, the methods yield a 23.12% improvement in arbitrary-duration (1-5 s) short-utterance conditions. |
Tasks | Speaker Recognition, Speaker Verification, Text-Independent Speaker Recognition |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07309v1 |
http://arxiv.org/pdf/1810.07309v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-based-i-vector-mapping |
Repo | |
Framework | |
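The core idea, mapping short-utterance i-vectors to long-utterance targets with a neural network, can be sketched compactly. The PyTorch stand-in below trains a plain MLP with an MSE objective on random tensors; the paper's models additionally use a joint autoencoder representation, residual blocks, and phoneme vectors.

```python
# A minimal PyTorch sketch of i-vector mapping: a network learns to map
# short-utterance i-vectors to their long-utterance counterparts. This plain
# MLP with an MSE loss is a simplified stand-in for the paper's models.
import torch
import torch.nn as nn

dim = 400                                     # typical i-vector dimension
mapper = nn.Sequential(
    nn.Linear(dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, dim),
)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

short_iv = torch.randn(256, dim)              # stand-in training pairs
long_iv = torch.randn(256, dim)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mapper(short_iv), long_iv)
    loss.backward()
    opt.step()
```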
Speech Recognition: Keyword Spotting Through Image Recognition
Title | Speech Recognition: Keyword Spotting Through Image Recognition |
Authors | Sanjay Krishna Gouda, Salil Kanetkar, David Harrison, Manfred K Warmuth |
Abstract | The problem of identifying voice commands has always been a challenge due to the presence of noise and variability in speed, pitch, etc. We will compare the efficacies of several neural network architectures for the speech recognition problem. In particular, we will build a model to determine whether a one-second audio clip contains a particular word (out of a set of 10), an unknown word, or silence. The models to be implemented are a CNN recommended by the TensorFlow Speech Recognition tutorial, a low-latency CNN, and an adversarially trained CNN. The result is a demonstration of how to convert a problem in audio recognition to the better-studied domain of image classification, where the powerful techniques of convolutional neural networks are fully developed. Additionally, we demonstrate the applicability of the technique of Virtual Adversarial Training (VAT) to this problem domain, functioning as a powerful regularizer with promising potential future applications. |
Tasks | Image Classification, Keyword Spotting, Speech Recognition |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03759v1 |
http://arxiv.org/pdf/1803.03759v1.pdf | |
PWC | https://paperswithcode.com/paper/speech-recognition-keyword-spotting-through |
Repo | |
Framework | |
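The conversion from audio recognition to image classification boils down to rendering each clip as a 2-D log-spectrogram. A minimal NumPy sketch follows; the window and hop sizes are common defaults, not necessarily the paper's.

```python
# A minimal sketch of the problem conversion: render a one-second audio clip
# as a log-spectrogram "image" that any image classifier can consume.
import numpy as np

def log_spectrogram(audio, win=256, hop=128):
    frames = [audio[i:i + win] * np.hanning(win)
              for i in range(0, len(audio) - win, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spec + 1e-10)          # 2-D array, usable as a CNN input

clip = np.random.randn(16000)            # stand-in for 1 s of 16 kHz audio
image = log_spectrogram(clip)
print(image.shape)                       # (time_frames, freq_bins)
```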
Spoken Pass-Phrase Verification in the i-vector Space
Title | Spoken Pass-Phrase Verification in the i-vector Space |
Authors | Hossein Zeinali, Lukas Burget, Hossein Sameti, Jan Cernocky |
Abstract | The task of spoken pass-phrase verification is to decide whether a test utterance contains the same phrase as given enrollment utterances. Besides other applications, pass-phrase verification can complement an independent speaker verification subsystem in text-dependent speaker verification. It can also be used for liveness detection by verifying that the user is able to correctly respond to a randomly prompted phrase. In this paper, we build on our previous work on i-vector based text-dependent speaker verification, where we have shown that i-vectors extracted using phrase-specific Hidden Markov Models (HMMs) or using Deep Neural Network (DNN) based bottleneck (BN) features help to reject utterances with wrong pass-phrases. We apply the same i-vector extraction techniques to the stand-alone task of speaker-independent spoken pass-phrase classification and verification. Experiments on the RSR2015 and RedDots databases show that very simple scoring techniques (e.g. cosine distance scoring) applied to such i-vectors can provide results superior to those previously published on the same data. |
Tasks | Speaker Verification, Text-Dependent Speaker Verification |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11068v1 |
http://arxiv.org/pdf/1809.11068v1.pdf | |
PWC | https://paperswithcode.com/paper/spoken-pass-phrase-verification-in-the-i |
Repo | |
Framework | |
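Cosine distance scoring, the simple technique the paper shows is sufficient, reduces to a normalized dot product between an averaged enrollment i-vector and the test i-vector. A minimal sketch (with an invented decision threshold) follows.

```python
# A minimal sketch of cosine distance scoring over i-vectors; the 0.5
# threshold is an invented placeholder, tuned on development data in practice.
import numpy as np

def cosine_score(enroll_ivectors, test_ivector):
    """Score a test i-vector against averaged enrollment i-vectors."""
    model = np.mean(enroll_ivectors, axis=0)
    return (model @ test_ivector) / (np.linalg.norm(model) * np.linalg.norm(test_ivector))

enroll = np.random.randn(3, 400)          # three enrollment utterances
test = np.random.randn(400)
accept = cosine_score(enroll, test) > 0.5
print(accept)
```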
Graph Based Analysis for Gene Segment Organization In a Scrambled Genome
Title | Graph Based Analysis for Gene Segment Organization In a Scrambled Genome |
Authors | Mustafa Hajij, Nataša Jonoska, Denys Kukushkin, Masahico Saito |
Abstract | DNA rearrangement processes recombine gene segments that are organized on the chromosome in a variety of ways. The segments can overlap, interleave, or one may be a subsegment of another. We use directed graphs to represent segment organizations on a given locus, where contigs containing rearranged segments represent vertices and the edges correspond to the segment relationships. Using graph properties we associate a point in a higher-dimensional Euclidean space to each graph, such that cluster formation and analysis can be performed with methods from topological data analysis. The method is applied to the recently sequenced model organism Oxytricha trifallax, a species of ciliate with a highly scrambled genome that undergoes a massive rearrangement process after conjugation. The analysis shows some emerging star-like graph structures, indicating that a single gene's segments can interleave with, or even contain, all of the segments from fifteen or more other genes. We also observe that as many as six genes can have their segments mutually interleaving or overlapping. |
Tasks | Topological Data Analysis |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.05922v2 |
http://arxiv.org/pdf/1801.05922v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-analysis-for-gene-segment |
Repo | |
Framework | |
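The pipeline maps each segment-organization graph to a point in Euclidean space via graph properties, so that clustering and topological data analysis can be applied to the resulting point cloud. The sketch below uses a handful of simple degree statistics as the coordinates; the paper's actual graph invariants are richer.

```python
# A minimal sketch of the graph-to-point step: represent each directed
# segment-organization graph by a small vector of graph statistics. The
# particular statistics chosen here are illustrative assumptions.
import numpy as np

def graph_point(edges, n_vertices):
    """Map a directed graph (edge list) to a feature vector."""
    out_deg = np.zeros(n_vertices)
    in_deg = np.zeros(n_vertices)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return np.array([len(edges), out_deg.max(), in_deg.max(),
                     out_deg.std(), in_deg.std()])

# A star-like graph of the kind the analysis surfaces: one hub vertex whose
# gene segments interleave with many others.
star = [(0, v) for v in range(1, 16)]
print(graph_point(star, 16))
```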
Faster Learning by Reduction of Data Access Time
Title | Faster Learning by Reduction of Data Access Time |
Authors | Vinod Kumar Chauhan, Anuj Sharma, Kalpana Dahiya |
Abstract | Nowadays, the major challenge in machine learning is the Big Data challenge. In big data problems, due to a large number of data points, a large number of features in each data point, or both, the training of models has become very slow. The training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second part, i.e., learning from the data. In this paper, we propose one possible solution to handle big data problems in machine learning: reducing the training time by reducing data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of the proposed sampling techniques, we use Empirical Risk Minimization, a commonly used machine learning problem, for the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step-size determination techniques, namely constant step size and backtracking line search. Theoretical results prove the same convergence, in expectation, for systematic sampling, cyclic sampling and the widely used random sampling technique. Experimental results on benchmark datasets prove the efficacy of the proposed sampling techniques and show up to six times faster training. |
Tasks | |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.05931v4 |
http://arxiv.org/pdf/1801.05931v4.pdf | |
PWC | https://paperswithcode.com/paper/faster-learning-by-reduction-of-data-access |
Repo | |
Framework | |
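The proposed samplers are easy to state: random sampling draws an arbitrary index set, cyclic/sequential sampling walks through the data in order, and systematic sampling draws one random start and takes equally spaced indices, which keeps data access nearly sequential and hence cache- and disk-friendly. A minimal sketch of all three, under assumed signatures:

```python
# A minimal sketch of the three mini-batch selection schemes compared in the
# paper: random, cyclic/sequential, and systematic sampling.
import numpy as np

def random_batch(n, batch, rng):
    return rng.choice(n, size=batch, replace=False)

def cyclic_batch(n, batch, step):
    start = (step * batch) % n
    return np.arange(start, start + batch) % n

def systematic_batch(n, batch, rng):
    stride = n // batch
    start = rng.integers(stride)         # one random start...
    return start + stride * np.arange(batch)  # ...then equally spaced points

rng = np.random.default_rng(0)
print(systematic_batch(1000, 10, rng))   # 10 equally spaced indices
```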
Framewise approach in multimodal emotion recognition in OMG challenge
Title | Framewise approach in multimodal emotion recognition in OMG challenge |
Authors | Grigoriy Sterling, Andrey Belyaev, Maxim Ryabov |
Abstract | In this report we describe our approach, which achieves 53% unweighted accuracy over 7 emotions and mean squared errors of 0.05 and 0.09 for arousal and valence in the OMG emotion recognition challenge. Our results were obtained with an ensemble of single-modality models trained separately on voice and face data extracted from video. We treat each stream as a sequence of frames, estimate features from the frames, and process them with a recurrent neural network. By an audio frame we mean a short 0.4-second spectrogram interval. For feature estimation from face images we used our own ResNet neural network pretrained on the AffectNet database. Each short spectrogram was likewise treated as a picture and processed by a convolutional network. As the base audio model we used a ResNet pretrained on a speaker recognition task. Predictions from both modalities were fused at the decision level, improving on the single-channel approaches by a few percent. |
Tasks | Emotion Recognition, Multimodal Emotion Recognition, Speaker Recognition |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01369v1 |
http://arxiv.org/pdf/1805.01369v1.pdf | |
PWC | https://paperswithcode.com/paper/framewise-approach-in-multimodal-emotion |
Repo | |
Framework | |
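Decision-level fusion here means combining per-frame class probabilities from the audio and face models into a single utterance-level prediction. The sketch below averages the two streams with a fixed weight; equal weights are an assumption, and in practice the fusion weight would be tuned.

```python
# A minimal sketch of decision-level fusion: per-frame class probabilities
# from two modalities are averaged into one utterance-level prediction.
import numpy as np

def fuse(audio_probs, face_probs, w_audio=0.5):
    """Fuse two (frames, classes) probability streams at the decision level."""
    fused = w_audio * audio_probs + (1 - w_audio) * face_probs
    return fused.mean(axis=0).argmax()   # utterance-level emotion class

audio = np.random.dirichlet(np.ones(7), size=50)   # 50 frames, 7 emotions
face = np.random.dirichlet(np.ones(7), size=50)
print(fuse(audio, face))
```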
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Title | Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples |
Authors | Minhao Cheng, Jinfeng Yi, Pin-Yu Chen, Huan Zhang, Cho-Jui Hsieh |
Abstract | Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem, since its input space is continuous and its output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions for non-overlapping attacks and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks, and verify the effectiveness of the proposed algorithm: by changing fewer than 3 words, we can make a seq2seq model produce desired outputs with high success rates. On the other hand, we recognize that, compared with well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks. |
Tasks | Image Classification, Machine Translation, Text Summarization |
Published | 2018-03-03 |
URL | https://arxiv.org/abs/1803.01128v2 |
https://arxiv.org/pdf/1803.01128v2.pdf | |
PWC | https://paperswithcode.com/paper/seq2sick-evaluating-the-robustness-of |
Repo | |
Framework | |
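The projected gradient method addresses the discrete input space by taking gradient steps in continuous embedding space and then projecting each position back onto a real token embedding. The PyTorch sketch below shows that projection step with a stand-in loss; the paper's group lasso and gradient regularization terms are omitted.

```python
# A minimal PyTorch sketch of the projection idea behind Seq2Sick: step in
# continuous embedding space, then snap each position to the nearest real
# token embedding. The loss here is a placeholder, not the attack objective.
import torch

vocab = torch.randn(5000, 128)            # stand-in embedding table
x = vocab[torch.randint(0, 5000, (12,))].clone().requires_grad_(True)

loss = x.sum()                            # stand-in for the attack loss
loss.backward()

with torch.no_grad():
    x_adv = x - 0.1 * x.grad              # gradient step in embedding space
    dists = torch.cdist(x_adv, vocab)     # (12, 5000) distances to all tokens
    adv_tokens = dists.argmin(dim=1)      # discrete adversarial token ids
print(adv_tokens[:5])
```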
Hierarchical Reinforcement Learning with Abductive Planning
Title | Hierarchical Reinforcement Learning with Abductive Planning |
Authors | Kazeto Yamamoto, Takashi Onishi, Yoshimasa Tsuruoka |
Abstract | One of the key challenges in applying reinforcement learning to real-life problems is that the amount of trial-and-error required to learn a good policy increases drastically as the task becomes complex. One potential solution to this problem is to combine reinforcement learning with automated symbolic planning and utilize prior knowledge of the domain. However, existing methods have limitations in their applicability and expressiveness. In this paper we propose a hierarchical reinforcement learning method based on abductive symbolic planning. The planner can deal with user-defined evaluation functions and is not based on the Herbrand theorem. Therefore it can utilize prior knowledge of the rewards and can work in a domain where the state space is unknown. We demonstrate empirically that our architecture significantly improves learning efficiency with respect to the amount of training examples on the evaluation domain, in which the state space is unknown and there exist multiple goals. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10792v1 |
http://arxiv.org/pdf/1806.10792v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-reinforcement-learning-with |
Repo | |
Framework | |
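The general pattern, a symbolic plan supplying subgoals while reinforcement learning fills in low-level behavior, can be sketched with tabular Q-learning. Everything in the sketch below (environment, plan, reward) is an invented placeholder; the paper's abductive planner is not implemented here.

```python
# A minimal sketch of plan-guided hierarchical RL: a planner's subgoal
# sequence selects which option to train, and tabular Q-learning learns
# the low-level behavior. All environment details are invented placeholders.
import random
from collections import defaultdict

plan = ["get_key", "open_door", "reach_goal"]   # subgoal sequence from a planner
Q = {g: defaultdict(float) for g in plan}       # one table per subgoal/option
actions = ["left", "right"]

def q_update(q, s, a, r, s2, alpha=0.1, gamma=0.99):
    best_next = max(q[(s2, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

for subgoal in plan:                     # learn one low-level option per step
    for episode in range(50):
        s = 0
        for t in range(20):
            a = random.choice(actions)   # epsilon-greedy in a real agent
            s2 = s + (1 if a == "right" else -1)
            r = 1.0 if s2 == 5 else 0.0  # placeholder "subgoal reached" reward
            q_update(Q[subgoal], s, a, r, s2)
            s = s2
            if r > 0:
                break
```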
Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering
Title | Automatic formation of the structure of abstract machines in hierarchical reinforcement learning with state clustering |
Authors | Aleksandr I. Panov, Aleksey Skrynnik |
Abstract | We introduce a new approach to hierarchy formation and task decomposition in hierarchical reinforcement learning. Our method is based on the Hierarchy Of Abstract Machines (HAM) framework, because the HAM approach is able to design efficient controllers that realize specific behaviors in real robots. The key to our algorithm is the introduction of an internal or "mental" environment in which the state represents the structure of the HAM hierarchy. An internal action in this environment changes the hierarchy of HAMs. We apply the classical Q-learning procedure in the internal environment, which allows the agent to obtain an optimal hierarchy. We extend the HAM framework by adding an on-model approach to select the appropriate sub-machine to execute action sequences for a certain class of external environment states. Preliminary experiments demonstrated the prospects of the method. |
Tasks | Hierarchical Reinforcement Learning, Q-Learning |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05292v1 |
http://arxiv.org/pdf/1806.05292v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-formation-of-the-structure-of |
Repo | |
Framework | |
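In the internal environment, a state is a HAM hierarchy and an action edits it, so classical Q-learning can search over hierarchies. The sketch below makes this concrete with invented placeholder structures, edits, and rewards; it only illustrates the shape of the "mental" environment idea.

```python
# A minimal sketch of Q-learning in a "mental" environment where states are
# hierarchy structures and actions edit them. Structures, edits, and the
# reward below are invented placeholders.
import random
from collections import defaultdict

edits = ["add_machine", "remove_machine", "swap_children"]  # internal actions
Q = defaultdict(float)                   # Q over (hierarchy, edit) pairs

def apply_edit(hierarchy, edit):
    return hierarchy + (edit,)           # placeholder transition on the structure

def external_return(hierarchy):
    # In the paper this would be estimated by running the HAM agent in the
    # external environment; here it simply prefers small hierarchies.
    return -float(len(hierarchy))

h = ()                                   # start from an empty hierarchy
for step in range(200):
    e = random.choice(edits)             # epsilon-greedy in a real agent
    h2 = apply_edit(h, e)
    r = external_return(h2)
    best_next = max(Q[(h2, e2)] for e2 in edits)
    Q[(h, e)] += 0.1 * (r + 0.9 * best_next - Q[(h, e)])
    h = h2 if len(h2) < 5 else ()        # reset to bound the toy state space
```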
Coupling weak and strong supervision for classification of prostate cancer histopathology images
Title | Coupling weak and strong supervision for classification of prostate cancer histopathology images |
Authors | Eirini Arvaniti, Manfred Claassen |
Abstract | Automated grading of prostate cancer histopathology images is a challenging task, with one key challenge being the scarcity of annotations down to the level of regions of interest (strong labels), as typically the prostate cancer Gleason score is known only for entire tissue slides (weak labels). In this study, we focus on automated Gleason score assignment of prostate cancer whole-slide images on the basis of a large weakly-labeled dataset and a smaller strongly-labeled one. We efficiently leverage information from both label sources by jointly training a classifier on the two datasets and by introducing a gradient update scheme that assigns different relative importances to each training example, as a means of self-controlling the weak supervision signal. Our approach achieves superior performance when compared with standard Gleason scoring methods. |
Tasks | |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.07013v1 |
http://arxiv.org/pdf/1811.07013v1.pdf | |
PWC | https://paperswithcode.com/paper/coupling-weak-and-strong-supervision-for |
Repo | |
Framework | |
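Jointly training on both label sources amounts to a single loss in which each example carries its own weight. The PyTorch sketch below uses a fixed down-weighting of weakly labeled examples as an illustrative assumption; the paper's gradient update scheme adapts these relative importances rather than fixing them.

```python
# A minimal PyTorch sketch of joint weak/strong training via per-example
# loss weights. The fixed 0.3 weight for weak labels is an assumption, not
# the paper's adaptive self-controlling scheme.
import torch
import torch.nn as nn

model = nn.Linear(128, 5)                 # stand-in Gleason-score classifier
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 128)
y = torch.randint(0, 5, (32,))
is_strong = torch.rand(32) < 0.25         # mark 25% as strongly labeled

per_example = nn.functional.cross_entropy(model(x), y, reduction="none")
weights = torch.where(is_strong, torch.tensor(1.0), torch.tensor(0.3))
loss = (weights * per_example).mean()

opt.zero_grad()
loss.backward()
opt.step()
```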