January 26, 2020

Paper Group ANR 1381

Lower Bounds for Compressed Sensing with Generative Models. Recurrent Attention Walk for Semi-supervised Classification. Classification of spherical objects based on the form function of acoustic echoes. A Generative Model for Punctuation in Dependency Trees. Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in …

Lower Bounds for Compressed Sensing with Generative Models


Title	Lower Bounds for Compressed Sensing with Generative Models
Authors	Akshay Kamath, Sushrut Karmalkar, Eric Price
Abstract	The goal of compressed sensing is to learn a structured signal $x$ from a limited number of noisy linear measurements $y \approx Ax$. In traditional compressed sensing, “structure” is represented by sparsity in some known basis. Inspired by the success of deep learning in modeling images, recent work starting with [BJPD17] has instead considered structure to come from a generative model $G: \mathbb{R}^k \to \mathbb{R}^n$. We present two results establishing the difficulty of this latter task, showing that existing bounds are tight. First, we provide a lower bound matching the [BJPD17] upper bound for compressed sensing from $L$-Lipschitz generative models $G$. In particular, there exists such a function that requires roughly $\Omega(k \log L)$ linear measurements for sparse recovery to be possible. This holds even for the more relaxed goal of nonuniform recovery. Second, we show that generative models generalize sparsity as a representation of structure. In particular, we construct a ReLU-based neural network $G: \mathbb{R}^{2k} \to \mathbb{R}^n$ with $O(1)$ layers and $O(kn)$ activations per layer, such that the range of $G$ contains all $k$-sparse vectors.
Tasks
Published	2019-12-06
URL	https://arxiv.org/abs/1912.02938v1
PDF	https://arxiv.org/pdf/1912.02938v1.pdf
PWC	https://paperswithcode.com/paper/lower-bounds-for-compressed-sensing-with
Repo
Framework
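
As a rough illustration of the measurement model $y \approx Ax$ described in the abstract above, the following sketch draws a $k$-sparse signal and takes a small number of noisy Gaussian measurements of it. It shows only the classical sparse setting, not the paper's lower-bound construction or its ReLU network; the function names, the Gaussian choice of $A$, and all sizes are assumptions made for illustration.

```python
import random


def sparse_signal(n, k, rng):
    """Return a length-n vector with exactly k nonzero Gaussian entries."""
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    return x


def measure(x, m, noise_std, rng):
    """Return (A, y), where A is an m-by-n Gaussian matrix and y = A x + noise."""
    n = len(x)
    # Scale entries by 1/sqrt(m), a common (assumed) normalisation.
    A = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
    y = [
        sum(a * xi for a, xi in zip(row, x)) + rng.gauss(0.0, noise_std)
        for row in A
    ]
    return A, y


rng = random.Random(0)
x = sparse_signal(n=50, k=3, rng=rng)             # structured signal
A, y = measure(x, m=12, noise_std=0.01, rng=rng)  # far fewer measurements than n
```

Recovering `x` from `(A, y)` is the hard part and is deliberately left out; the abstract's results concern how many rows `A` must have for any recovery method to succeed.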

Recurrent Attention Walk for Semi-supervised Classification


Title	Recurrent Attention Walk for Semi-supervised Classification
Authors	Uchenna Akujuobi, Qiannan Zhang, Han Yufei, Xiangliang Zhang
Abstract	In this paper, we study graph-based semi-supervised learning for classifying nodes in attributed networks, where the nodes and edges possess content information. Recent approaches like graph convolution networks and attention mechanisms have been proposed to ensemble the first-order neighbors and incorporate the relevant neighbors. However, it is costly (especially in memory) to consider all neighbors without a prior differentiation. We propose to explore the neighborhood in a reinforcement learning setting and find a walk path well-tuned for classifying the unlabelled target nodes. We let an agent (for the node classification task) walk over the graph and decide where to move next to maximize classification accuracy. We define the graph walk as a partially observable Markov decision process (POMDP). The proposed method is flexible enough to work in both transductive and inductive settings. Extensive experiments on four datasets demonstrate that our proposed method outperforms several state-of-the-art methods. Several case studies also illustrate the meaningful movement trajectory made by the agent.
Tasks	Node Classification
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10266v1
PDF	https://arxiv.org/pdf/1910.10266v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-attention-walk-for-semi-supervised
Repo
Framework

Classification of spherical objects based on the form function of acoustic echoes


Title	Classification of spherical objects based on the form function of acoustic echoes
Authors	Mariia Dmitrieva, Keith E. Brown, Gary J. Heald, David M. Lane
Abstract	One way to recognise an object is to study how the echo has been shaped during the interaction with the target. Wideband sonar allows the study of the energy distribution for a large range of frequencies. The frequency distribution contains information about an object, including its inner structure. This information is a key for automatic recognition. The scattering by a target can be quantitatively described by its Form Function. The Form Function can be calculated based on the data of the initial pulse, the reflected pulse and the parameters of the medium in which the pulse is propagating. In this work, spherical objects are classified based on their filler material: water or air. We limit the study to spherical two-layered targets immersed in water. The Form Function is used as a descriptor and fed into a Neural Network classifier, a Multilayer Perceptron (MLP). The performance of the classifier is compared with a Support Vector Machine (SVM), and the Form Function descriptor is examined in contrast to the Time and Frequency Representation of the echo.
Tasks
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08501v1
PDF	https://arxiv.org/pdf/1910.08501v1.pdf
PWC	https://paperswithcode.com/paper/classification-of-spherical-objects-based-on
Repo
Framework

A Generative Model for Punctuation in Dependency Trees


Title	A Generative Model for Punctuation in Dependency Trees
Authors	Xiang Lisa Li, Dingquan Wang, Jason Eisner
Abstract	Treebanks traditionally treat punctuation marks as ordinary words, but linguists have suggested that a tree’s “true” punctuation marks are not observed (Nunberg, 1990). These latent “underlying” marks serve to delimit or separate constituents in the syntax tree. When the tree’s yield is rendered as a written sentence, a string rewriting mechanism transduces the underlying marks into “surface” marks, which are part of the observed (surface) string but should not be regarded as part of the tree. We formalize this idea in a generative model of punctuation that admits efficient dynamic programming. We train it without observing the underlying marks, by locally maximizing the incomplete data likelihood (similarly to EM). When we use the trained model to reconstruct the tree’s underlying punctuation, the results appear plausible across 5 languages, and in particular, are consistent with Nunberg’s analysis of English. We show that our generative model can be used to beat baselines on punctuation restoration. Also, our reconstruction of a sentence’s underlying punctuation lets us appropriately render the surface punctuation (via our trained underlying-to-surface mechanism) when we syntactically transform the sentence.
Tasks
Published	2019-06-26
URL	https://arxiv.org/abs/1906.11298v1
PDF	https://arxiv.org/pdf/1906.11298v1.pdf
PWC	https://paperswithcode.com/paper/a-generative-model-for-punctuation-in
Repo
Framework

Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in a Comparison with Mainstream Religious Group


Title	Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in a Comparison with Mainstream Religious Group
Authors	Mojtaba Heidarysafa, Kamran Kowsari, Tolu Odukoya, Philip Potter, Laura E. Barnes, Donald E. Brown
Abstract	Online propaganda is central to the recruitment strategies of extremist groups and in recent years these efforts have increasingly extended to women. To investigate ISIS’ approach to targeting women in their online propaganda and uncover implications for counterterrorism, we rely on text mining and natural language processing (NLP). Specifically, we extract articles published in Dabiq and Rumiyah (ISIS’s online English language publications) to identify prominent topics. To identify similarities or differences between these texts and those produced by non-violent religious groups, we extend the analysis to articles from a Catholic forum dedicated to women. We also perform an emotional analysis of both of these resources to better understand the emotional components of propaganda. We rely on Depechemood (a lexical-base emotion analysis method) to detect emotions most likely to be evoked in readers of these materials. The findings indicate that the emotional appeal of ISIS and Catholic materials are similar
Tasks	Emotion Recognition
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03804v1
PDF	https://arxiv.org/pdf/1912.03804v1.pdf
PWC	https://paperswithcode.com/paper/women-in-isis-propaganda-a-natural-language
Repo
Framework

Bimodal Speech Emotion Recognition Using Pre-Trained Language Models


Title	Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
Authors	Verena Heusser, Niklas Freymuth, Stefan Constantin, Alex Waibel
Abstract	Speech emotion recognition is a challenging task and an important step towards more natural human-machine interaction. We show that pre-trained language models can be fine-tuned for text emotion recognition, achieving an accuracy of 69.5% on Task 4A of SemEval 2017, improving upon the previous state of the art by over 3% absolute. We combine these language models with speech emotion recognition, achieving results of 73.5% accuracy when using provided transcriptions and speech data on a subset of four classes of the IEMOCAP dataset. The use of noise-induced transcriptions and speech data results in an accuracy of 71.4%. For our experiments, we created IEmoNet, a modular and adaptable bimodal framework for speech emotion recognition based on pre-trained language models. Lastly, we discuss the idea of using an emotional classifier as a reward for reinforcement learning as a step towards more successful and convenient human-machine interaction.
Tasks	Emotion Recognition, Speech Emotion Recognition
Published	2019-11-29
URL	https://arxiv.org/abs/1912.02610v1
PDF	https://arxiv.org/pdf/1912.02610v1.pdf
PWC	https://paperswithcode.com/paper/bimodal-speech-emotion-recognition-using-pre
Repo
Framework

Random Sampling for Distributed Coded Matrix Multiplication


Title	Random Sampling for Distributed Coded Matrix Multiplication
Authors	Wei-Ting Chang, Ravi Tandon
Abstract	Matrix multiplication is a fundamental building block for large scale computations arising in various applications, including machine learning. There has been significant recent interest in using coding to speed up distributed matrix multiplication in a way that is robust to stragglers (i.e., machines that may perform slower computations). In many scenarios, instead of exact computation, approximate matrix multiplication, i.e., allowing for a tolerable error, is also sufficient. Such approximate schemes make use of randomization techniques to speed up the computation process. In this paper, we initiate the study of approximate coded matrix multiplication, and investigate the joint synergies offered by randomization and coding. Specifically, we propose two coded randomized sampling schemes that use (a) codes to achieve a desired recovery threshold and (b) random sampling to obtain an approximation of the matrix multiplication. Tradeoffs between the recovery threshold and approximation error obtained through random sampling are investigated for a class of coded matrix multiplication schemes.
Tasks
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06942v1
PDF	https://arxiv.org/pdf/1905.06942v1.pdf
PWC	https://paperswithcode.com/paper/random-sampling-for-distributed-coded-matrix
Repo
Framework
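
The "random sampling" half of the abstract above can be illustrated with the standard column/row sampling estimator for approximate matrix multiplication. This is a minimal single-machine sketch of that well-known technique, not the paper's coded schemes: there is no coding, no stragglers, and no recovery threshold here, and the function name and sampling distribution are assumptions chosen for illustration.

```python
import random


def sampled_matmul(A, B, c, rng):
    """Estimate the product A B from c sampled column/row outer products.

    Index i is drawn with probability p_i proportional to
    ||A[:, i]|| * ||B[i, :]||, and each sampled outer product is
    rescaled by 1 / (c * p_i) so that the estimate is unbiased.
    """
    n = len(B)                      # shared inner dimension
    rows, cols = len(A), len(B[0])
    col_norm = [sum(A[r][i] ** 2 for r in range(rows)) ** 0.5 for i in range(n)]
    row_norm = [sum(v * v for v in B[i]) ** 0.5 for i in range(n)]
    weights = [cn * rn for cn, rn in zip(col_norm, row_norm)]
    total = sum(weights)
    est = [[0.0] * cols for _ in range(rows)]
    if total == 0.0:
        return est                  # the exact product is the zero matrix
    probs = [w / total for w in weights]
    for i in rng.choices(range(n), weights=probs, k=c):
        scale = 1.0 / (c * probs[i])
        for r in range(rows):
            for j in range(cols):
                est[r][j] += scale * A[r][i] * B[i][j]
    return est


# Hypothetical usage: approximate a 2x3 times 3x2 product from 2 samples.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
approx = sampled_matmul(A, B, c=2, rng=random.Random(0))
```

Using fewer samples `c` lowers the work per product at the cost of a larger approximation error, which is the kind of tradeoff the abstract studies jointly with the recovery threshold.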

Emotion Recognition for Vietnamese Social Media Text


Title	Emotion Recognition for Vietnamese Social Media Text
Authors	Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
Abstract	Emotion recognition, or emotion prediction, is a finer-grained form or a special case of sentiment analysis. In this task, the result is not expressed as a polarity (positive or negative) or as a rating (from 1 to 5) but at a more detailed level of analysis, with emotions such as sadness, enjoyment, anger, disgust, fear, and surprise. Emotion recognition plays a critical role in measuring the brand value of a product by recognizing specific emotions in customers’ comments. In this study, we have achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with exactly 6,927 emotion-annotated sentences, contributing to emotion recognition research in Vietnamese, which is a low-resource language in natural language processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC corpus. As a result, the CNN model achieved the highest performance with a weighted F1-score of 59.74%. Our corpus is available at our research website.
Tasks	Emotion Recognition, Sentiment Analysis
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09339v2
PDF	https://arxiv.org/pdf/1911.09339v2.pdf
PWC	https://paperswithcode.com/paper/emotion-recognition-for-vietnamese-social
Repo
Framework
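
The weighted F1-score reported in the abstract above is usually computed as the per-class F1 averaged with weights equal to each class's share of the true labels. The sketch below follows that common definition; whether the authors' evaluation matches it exactly is an assumption, and the function name and the toy labels are made up for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)                       # true count per class
    predicted = Counter(y_pred)                     # predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for label, n_true in support.items():
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1                        # weight by class support
    return total / len(y_true)


# Toy example with three emotion labels (illustrative data only).
y_true = ["sadness", "sadness", "enjoyment", "anger"]
y_pred = ["sadness", "enjoyment", "enjoyment", "anger"]
score = weighted_f1(y_true, y_pred)
```

Weighting by support keeps frequent emotions from being drowned out by rare ones, which matters when the label distribution of a corpus is imbalanced.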

Medical Multimodal Classifiers Under Scarce Data Condition


Title	Medical Multimodal Classifiers Under Scarce Data Condition
Authors	Faik Aydin, Maggie Zhang, Michelle Ananda-Rajah, Gholamreza Haffari
Abstract	Data is one of the essential ingredients to power deep learning research. Small datasets, especially those specific to medical institutes, bring challenges to the deep learning training stage. This work aims to develop a practical deep multimodal model that can classify patients into abnormal and normal categories accurately as well as assist radiologists in detecting visual and textual anomalies by locating areas of interest. The detection of the anomalies is achieved through a novel technique which extends the integrated gradients methodology with an unsupervised clustering algorithm. This technique also introduces a tuning parameter which trades off true positive signals to denoise false positive signals in the detection process. To overcome the challenges of the small training dataset, which has only 3K paired frontal X-ray images and medical reports, we have adopted transfer learning for the multimodal model, which concatenates the layers of image and text submodels. The image submodel was trained on the vast ChestX-ray14 dataset, while the text submodel transferred a pretrained word embedding layer from a hospital-specific corpus. Experimental results show that our multimodal model improves the classification accuracy by 4% and 7%, averaged over 50 epochs, compared to the individual text and image models, respectively.
Tasks	Transfer Learning
Published	2019-02-24
URL	http://arxiv.org/abs/1902.08888v1
PDF	http://arxiv.org/pdf/1902.08888v1.pdf
PWC	https://paperswithcode.com/paper/medical-multimodal-classifiers-under-scarce
Repo
Framework

Joint Speech Recognition and Speaker Diarization via Sequence Transduction


Title	Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Authors	Laurent El Shafey, Hagen Soltau, Izhak Shafran
Abstract	Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective functions. Often the SD systems operate directly on the acoustics and are not constrained to respect word boundaries and this deficiency is overcome in an ad hoc manner. Motivated by recent advances in sequence to sequence learning, we propose a novel approach to tackle the two tasks by a joint ASR and SD system using a recurrent neural network transducer. Our approach utilizes both linguistic and acoustic cues to infer speaker roles, as opposed to typical SD systems, which only use acoustic cues. We evaluated the performance of our approach on a large corpus of medical conversations between physicians and patients. Compared to a competitive conventional baseline, our approach improves word-level diarization error rate from 15.8% to 2.2%.
Tasks	Speaker Diarization, Speech Recognition
Published	2019-07-09
URL	https://arxiv.org/abs/1907.05337v1
PDF	https://arxiv.org/pdf/1907.05337v1.pdf
PWC	https://paperswithcode.com/paper/joint-speech-recognition-and-speaker
Repo
Framework
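
To make the "word-level diarization error rate" in the abstract above concrete, here is a deliberately simplified sketch: the fraction of words whose hypothesised speaker label differs from the reference. It assumes the reference and hypothesis words are already aligned one-to-one; how the paper treats inserted or deleted words is not stated in the abstract, so that part is omitted. The function name and the example labels are hypothetical.

```python
def word_diarization_error_rate(ref_speakers, hyp_speakers):
    """Fraction of aligned words that carry the wrong speaker label."""
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("reference and hypothesis must be aligned word-for-word")
    if not ref_speakers:
        return 0.0
    wrong = sum(1 for r, h in zip(ref_speakers, hyp_speakers) if r != h)
    return wrong / len(ref_speakers)


# One speaker label per word of a short doctor/patient exchange (made-up data).
ref = ["doctor", "doctor", "patient", "patient", "patient"]
hyp = ["doctor", "patient", "patient", "patient", "doctor"]
rate = word_diarization_error_rate(ref, hyp)
```

Scoring per word rather than per time segment ties the metric to the transcript, which fits a joint system that emits words and speaker roles together.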

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions


Title	Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions
Authors	Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Abstract	We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words. For speaker diarization, we propose an ultrasound-based time-domain signal which we call estimated tongue activity. For word-alignment, we augment an acoustic model with low-dimensional representations of ultrasound images of the tongue, learned by a convolutional neural network. We conduct our experiments using the Ultrasuite repository of ultrasound and speech recordings for child speech therapy sessions. For both tasks, we observe that systems augmented with ultrasound data outperform corresponding systems using only the audio signal.
Tasks	Speaker Diarization, Word Alignment
Published	2019-07-01
URL	https://arxiv.org/abs/1907.00818v2
PDF	https://arxiv.org/pdf/1907.00818v2.pdf
PWC	https://paperswithcode.com/paper/ultrasound-tongue-imaging-for-diarization-and
Repo
Framework

Encoder-Powered Generative Adversarial Networks


Title	Encoder-Powered Generative Adversarial Networks
Authors	Jiseob Kim, Seungjae Jung, Hyundo Lee, Byoung-Tak Zhang
Abstract	We present an encoder-powered generative adversarial network (EncGAN) that is able to learn both the multi-manifold structure and the abstract features of data. Unlike the conventional decoder-based GANs, EncGAN uses an encoder to model the manifold structure and inverts the encoder to generate data. This unique scheme enables the proposed model to exclude discrete features from the smooth structure modeling and learn multi-manifold data without being hindered by the disconnections. Also, as EncGAN requires a single latent space to carry the information for all the manifolds, it builds abstract features shared among the manifolds in the latent space. For efficient computation, we formulate EncGAN using a simple regularizer, and mathematically prove its validity. We also experimentally demonstrate that EncGAN successfully learns the multi-manifold structure and the abstract features of the MNIST, 3D-chair and UT-Zap50k datasets. Our analysis shows that the learned abstract features are disentangled and yield good style transfer even when the source data is off the trained distribution.
Tasks	Style Transfer
Published	2019-06-03
URL	https://arxiv.org/abs/1906.00541v1
PDF	https://arxiv.org/pdf/1906.00541v1.pdf
PWC	https://paperswithcode.com/paper/190600541
Repo
Framework

Perceptual Embedding Consistency for Seamless Reconstruction of Tilewise Style Transfer


Title	Perceptual Embedding Consistency for Seamless Reconstruction of Tilewise Style Transfer
Authors	Amal Lahiani, Nassir Navab, Shadi Albarqouni, Eldad Klaiman
Abstract	Style transfer is a field with growing interest and use cases in deep learning. Recent work has shown Generative Adversarial Networks (GANs) can be used to create realistic images of virtually stained slide images in digital pathology with clinically validated interpretability. Digital pathology images are typically of extremely high resolution, making tilewise analysis necessary for deep learning applications. It has been shown that image generators with instance normalization can cause a tiling artifact when a large image is reconstructed from the tilewise analysis. We introduce a novel perceptual embedding consistency loss significantly reducing the tiling artifact created in the reconstructed whole slide image (WSI). We validate our results by comparing virtually stained slide images with consecutive real stained tissue slide images. We also demonstrate that our model is more robust to contrast, color and brightness perturbations by running comparative sensitivity analysis tests.
Tasks	Style Transfer
Published	2019-06-03
URL	https://arxiv.org/abs/1906.00617v1
PDF	https://arxiv.org/pdf/1906.00617v1.pdf
PWC	https://paperswithcode.com/paper/190600617
Repo
Framework

Large-Scale Speaker Diarization of Radio Broadcast Archives


Title	Large-Scale Speaker Diarization of Radio Broadcast Archives
Authors	Emre Yılmaz, Adem Derinel, Zhou Kun, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen
Abstract	This paper describes our initial efforts to build a large-scale speaker diarization (SD) and identification system on a recently digitized radio broadcast archive from the Netherlands, which has more than 6500 audio tapes with 3000 hours of Frisian-Dutch speech recorded between 1950 and 2016. The employed large-scale diarization scheme involves two stages: (1) tape-level speaker diarization providing pseudo-speaker identities and (2) speaker linking to relate pseudo-speakers appearing in multiple tapes. Having access to the speaker models of several frequently appearing speakers from the previously collected FAME! speech corpus, we further perform speaker identification by linking these known speakers to the pseudo-speakers identified at the first stage. In this work, we present a recently created longitudinal and multilingual SD corpus designed for large-scale SD research and evaluate the performance of a new speaker linking system using x-vectors with PLDA to quantify cross-tape speaker similarity on this corpus. The performance of this speaker linking system is evaluated on a small subset of the archive which is manually annotated with speaker information. The speaker linking performance reported on this subset (53 hours) and the whole archive (3000 hours) is compared to quantify the impact of scaling up the amount of speech data.
Tasks	Speaker Diarization, Speaker Identification
Published	2019-06-19
URL	https://arxiv.org/abs/1906.07955v2
PDF	https://arxiv.org/pdf/1906.07955v2.pdf
PWC	https://paperswithcode.com/paper/large-scale-speaker-diarization-of-radio
Repo
Framework

Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems


Title	Optimal WDM Power Allocation via Deep Learning for Radio on Free Space Optics Systems
Authors	Zhan Gao, Mark Eisen, Alejandro Ribeiro
Abstract	Radio on Free Space Optics (RoFSO), as a universal platform for heterogeneous wireless services, is able to transmit multiple radio frequency signals at high rates in free space optical networks. This paper investigates the optimal design of power allocation for Wavelength Division Multiplexing (WDM) transmission in RoFSO systems. The proposed problem is a weighted total capacity maximization problem with two constraints: a total power limitation and an eye safety concern. The model-based Stochastic Dual Gradient algorithm is presented first, which solves the problem exactly by exploiting the null duality gap. The model-free Primal-Dual Deep Learning algorithm is then developed to learn and optimize the power allocation policy with Deep Neural Network (DNN) parametrization, which can be utilized without any knowledge of system models. Numerical simulations are performed to show the significant performance gains of our algorithms over the average equal power allocation.
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1906.09981v1
PDF	https://arxiv.org/pdf/1906.09981v1.pdf
PWC	https://paperswithcode.com/paper/optimal-wdm-power-allocation-via-deep
Repo
Framework