October 15, 2019


Paper Group NANR 96

On Four Metaheuristic Applications to Speech Enhancement—Implementing Optimization Algorithms with MATLAB R2018a

Title On Four Metaheuristic Applications to Speech Enhancement—Implementing Optimization Algorithms with MATLAB R2018a
Authors Su-Mei Shiue, Lang-Jyi Huang, Wei-Ho Tsai, Yen-Lin Chen
Abstract
Tasks Speech Enhancement
Published 2018-10-01
URL https://www.aclweb.org/anthology/O18-1026/
PDF https://www.aclweb.org/anthology/O18-1026
PWC https://paperswithcode.com/paper/on-four-metaheuristic-applications-to-speech
Repo
Framework

Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model

Title Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model
Authors Satoru Tsuge, Shunichi Ishihara
Abstract Among the more typical forensic voice comparison (FVC) approaches, the acoustic-phonetic statistical approach is suitable for text-dependent FVC, but it does not fully exploit available time-varying information of speech in its modelling. The automatic approach, on the other hand, essentially deals with text-independent cases, which means temporal information is not explicitly incorporated in the modelling. Text-dependent likelihood ratio (LR)-based FVC studies, in particular those that adopt the automatic approach, are few. This preliminary LR-based FVC study compares two statistical models, the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM), for the calculation of forensic LRs using the same speech data. FVC experiments were carried out using different lengths of Japanese short words under a forensically realistic, but challenging condition: only two speech tokens for model training and LR estimation. Log-likelihood-ratio cost (Cllr) was used as the assessment metric. The study demonstrates that the HMM system consistently outperforms the GMM system in terms of average Cllr values. However, words longer than three mora are needed if the advantage of the HMM is to become evident. With a seven-mora word, for example, the HMM outperformed the GMM by a Cllr value of 0.073.
Tasks
Published 2018-12-01
URL https://www.aclweb.org/anthology/U18-1002/
PDF https://www.aclweb.org/anthology/U18-1002
PWC https://paperswithcode.com/paper/text-dependent-forensic-voice-comparison
Repo
Framework
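The Cllr assessment metric used in this paper has a standard closed form over a system's same-speaker and different-speaker likelihood ratios; a minimal sketch of that computation (function and variable names are illustrative):

```python
import math

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost: penalizes small LRs for same-speaker pairs
    and large LRs for different-speaker pairs. Lower is better; a system
    that always outputs LR = 1 (no information) scores exactly 1."""
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    ds = sum(math.log2(1 + lr) for lr in diff_speaker_lrs) / len(diff_speaker_lrs)
    return 0.5 * (ss + ds)

# An uninformative system (all LRs equal to 1) has Cllr = 1.
print(cllr([1.0, 1.0], [1.0, 1.0]))  # → 1.0
```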

Two-Step Quantization for Low-Bit Neural Networks

Title Two-Step Quantization for Low-Bit Neural Networks
Authors Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, Jian Cheng
Abstract Every bit matters in the hardware design of quantized neural networks. However, extremely-low-bit representation usually causes large accuracy drop. Thus, how to train extremely-low-bit neural networks with high accuracy is of central importance. Most existing network quantization approaches learn transformations (low-bit weights) as well as encodings (low-bit activations) simultaneously. This tight coupling makes the optimization problem difficult, and thus prevents the network from learning optimal representations. In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes. For the first step, we propose the sparse quantization method for code learning. The second step can be formulated as a non-linear least square regression problem with low-bit constraints, which can be solved efficiently in an iterative manner. Extensive experiments on CIFAR-10 and ILSVRC-12 datasets demonstrate that the proposed TSQ is effective and outperforms the state-of-the-art by a large margin. Especially, for 2-bit activation and ternary weight quantization of AlexNet, the accuracy of our TSQ drops only about 0.5 points compared with the full-precision counterpart, outperforming current state-of-the-art by more than 5 points.
Tasks Quantization
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Wang_Two-Step_Quantization_for_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Two-Step_Quantization_for_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/two-step-quantization-for-low-bit-neural
Repo
Framework
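The second TSQ step, fitting a low-bit transformation to already-learned codes, reduces in the simplest ternary-weight case to a scaled least-squares fit. A brute-force sketch of that sub-problem (illustrative only; the paper solves a larger non-linear regression iteratively):

```python
import numpy as np

def ternarize(w):
    """Fit alpha * t with t in {-1, 0, 1}, minimizing ||w - alpha * t||^2.
    Tries every candidate magnitude threshold; for a fixed support, the
    optimal scale alpha is the mean absolute value of the kept weights."""
    best_err, best_alpha, best_t = np.inf, 0.0, np.zeros_like(w)
    for delta in np.sort(np.abs(w)):
        t = np.sign(w) * (np.abs(w) >= delta)      # ternary code for this threshold
        kept = t != 0
        if not kept.any():
            continue
        alpha = np.abs(w[kept]).mean()             # closed-form optimal scale
        err = np.sum((w - alpha * t) ** 2)
        if err < best_err:
            best_err, best_alpha, best_t = err, alpha, t
    return best_alpha, best_t

alpha, t = ternarize(np.array([0.9, -1.1, 0.05, 1.0]))  # small weight pruned to 0
```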

Development of Natural Language Processing Tools for Cook Islands Māori

Title Development of Natural Language Processing Tools for Cook Islands Māori
Authors Rolando Coto-Solano, Sally Akevai Nicholas, Samantha Wray
Abstract This paper presents three ongoing projects for NLP in Cook Islands Māori: Untrained Forced Alignment (approx. 9% error when detecting the center of words), speech-to-text (37% WER in the best trained models) and POS tagging (92% accuracy for the best performing model). Included as part of these projects are new resources filling in a gap in Australasian languages, including gold standard POS-tagged written corpora, transcribed speech corpora, and time-aligned corpora down to the level of phonemes. These are part of efforts to accelerate the documentation of Cook Islands Māori and to increase its vitality amongst its users.
Tasks Machine Translation, Part-Of-Speech Tagging, Speech Recognition
Published 2018-12-01
URL https://www.aclweb.org/anthology/U18-1003/
PDF https://www.aclweb.org/anthology/U18-1003
PWC https://paperswithcode.com/paper/development-of-natural-language-processing
Repo
Framework

YNU-HPCC at SemEval-2018 Task 1: BiLSTM with Attention based Sentiment Analysis for Affect in Tweets

Title YNU-HPCC at SemEval-2018 Task 1: BiLSTM with Attention based Sentiment Analysis for Affect in Tweets
Authors You Zhang, Jin Wang, Xuejie Zhang
Abstract We implemented a sentiment analysis system for all five subtasks in English and Spanish. The subtasks involve emotion or sentiment intensity prediction (regression and ordinal classification) and emotion determination (multi-label classification). A BiLSTM (Bidirectional Long Short-Term Memory) model with an attention mechanism formed the core of our system. We used the BiLSTM to extract word information from both directions, and the attention mechanism to estimate the contribution of each word toward improving the scores. Furthermore, building on BiLSTM-ATT (BiLSTM with attention mechanism), several deep-learning methods were employed for the different subtasks: for the regression and ordinal classification tasks we used domain adaptation and ensemble learning to leverage the base model, while a single base model was used for the multi-label task.
Tasks Domain Adaptation, Multi-Label Classification, Sentiment Analysis
Published 2018-06-01
URL https://www.aclweb.org/anthology/S18-1040/
PDF https://www.aclweb.org/anthology/S18-1040
PWC https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2018-task-1-bilstm-with
Repo
Framework
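The attention step in the abstract above reduces to a softmax-weighted sum over the BiLSTM's per-word hidden states, with the weights directly giving each word's contribution. A numpy sketch of that pooling (the scoring vector v stands in for the learned attention parameters):

```python
import numpy as np

def attention_pool(H, v):
    """H: (T, d) matrix of BiLSTM outputs, one row per word.
    v: (d,) scoring vector (learned in the real model; given here).
    Returns the attention-weighted context vector and per-word weights."""
    scores = H @ v
    a = np.exp(scores - scores.max())   # numerically stable softmax
    a = a / a.sum()
    return a @ H, a

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
context, weights = attention_pool(H, np.zeros(2))  # zero scores -> uniform weights
```

With a zero scoring vector every word gets equal weight, so the context vector is just the mean hidden state; a trained v skews the weights toward informative words.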

Towards Processing of the Oral History Interviews and Related Printed Documents

Title Towards Processing of the Oral History Interviews and Related Printed Documents
Authors Zbyněk Zajíc, Lucie Skorkovská, Petr Neduchal, Pavel Ircing, Josef V. Psutka, Marek Hrúz, Aleš Pražák, Daniel Soutner, Jan Švec, Lukáš Bureš, Luděk Müller
Abstract
Tasks Optical Character Recognition, Speech Recognition
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1331/
PDF https://www.aclweb.org/anthology/L18-1331
PWC https://paperswithcode.com/paper/towards-processing-of-the-oral-history
Repo
Framework

Stacking with Auxiliary Features for Visual Question Answering

Title Stacking with Auxiliary Features for Visual Question Answering
Authors Nazneen Fatema Rajani, Raymond Mooney
Abstract Visual Question Answering (VQA) is a well-known and challenging task that requires systems to jointly reason about natural language and vision. Deep learning models in various forms have been the standard for solving VQA. However, some of these VQA models are better at certain types of image-question pairs than other models. Ensembling VQA models intelligently to leverage their diverse expertise is, therefore, advantageous. Stacking With Auxiliary Features (SWAF) is an intelligent ensembling technique which learns to combine the results of multiple models using features of the current problem as context. We propose four categories of auxiliary features for ensembling for VQA. Three out of the four categories of features can be inferred from an image-question pair and do not require querying the component models. The fourth category of auxiliary features uses model-specific explanations. In this paper, we describe how we use these various categories of auxiliary features to improve performance for VQA. Using SWAF to effectively ensemble three recent systems, we obtain a new state-of-the-art. Our work also highlights the advantages of explainable AI models.
Tasks Common Sense Reasoning, Question Answering, Visual Question Answering, Word Embeddings
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1201/
PDF https://www.aclweb.org/anthology/N18-1201
PWC https://paperswithcode.com/paper/stacking-with-auxiliary-features-for-visual
Repo
Framework

On the Utility of Lay Summaries and AI Safety Disclosures: Toward Robust, Open Research Oversight

Title On the Utility of Lay Summaries and AI Safety Disclosures: Toward Robust, Open Research Oversight
Authors Allen Schmaltz
Abstract In this position paper, we propose that the community consider encouraging researchers to include two riders, a "Lay Summary" and an "AI Safety Disclosure", as part of future NLP papers published in ACL forums that present user-facing systems. The goal is to encourage researchers, via a relatively non-intrusive mechanism, to consider the societal implications of technologies carrying (un)known and/or (un)knowable long-term risks, to highlight failure cases, and to provide a mechanism by which the general public (and scientists in other disciplines) can more readily engage in the discussion in an informed manner. This simple proposal requires minimal additional up-front costs for researchers; the lay summary, at least, has significant precedence in the medical literature and other areas of science; and the proposal is aimed to supplement, rather than replace, existing approaches for encouraging researchers to consider the ethical implications of their work, such as those of the Collaborative Institutional Training Initiative (CITI) Program and institutional review boards (IRBs).
Tasks Machine Translation
Published 2018-06-01
URL https://www.aclweb.org/anthology/W18-0801/
PDF https://www.aclweb.org/anthology/W18-0801
PWC https://paperswithcode.com/paper/on-the-utility-of-lay-summaries-and-ai-safety
Repo
Framework

Viterbi-based Pruning for Sparse Matrix with Fixed and High Index Compression Ratio

Title Viterbi-based Pruning for Sparse Matrix with Fixed and High Index Compression Ratio
Authors Dongsoo Lee, Daehyun Ahn, Taesu Kim, Pierce I. Chuang, Jae-Joon Kim
Abstract Weight pruning has proven to be an effective method in reducing the model size and computation cost while not sacrificing the model accuracy. Conventional sparse matrix formats, however, involve irregular index structures with large storage requirement and sequential reconstruction process, resulting in inefficient use of highly parallel computing resources. Hence, pruning is usually restricted to inference with a batch size of one, for which an efficient parallel matrix-vector multiplication method exists. In this paper, a new class of sparse matrix representation utilizing Viterbi algorithm that has a high, and more importantly, fixed index compression ratio regardless of the pruning rate, is proposed. In this approach, numerous sparse matrix candidates are first generated by the Viterbi encoder, and then the one that aims to minimize the model accuracy degradation is selected by the Viterbi algorithm. The model pruning process based on the proposed Viterbi encoder and Viterbi algorithm is highly parallelizable, and can be implemented efficiently in hardware to achieve low-energy, high-performance index decoding process. Compared with the existing magnitude-based pruning methods, index data storage requirement can be further compressed by 85.2% in MNIST and 83.9% in AlexNet while achieving similar pruning rate. Even compared with the relative index compression technique, our method can still reduce the index storage requirement by 52.7% in MNIST and 35.5% in AlexNet.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=S1D8MPxA-
PDF https://openreview.net/pdf?id=S1D8MPxA-
PWC https://paperswithcode.com/paper/viterbi-based-pruning-for-sparse-matrix-with
Repo
Framework

Human Activity Prediction in Smart Home Environments with LSTM Neural Networks

Title Human Activity Prediction in Smart Home Environments with LSTM Neural Networks
Authors Niek Tax
Abstract In this paper, we investigate the performance of several sequence prediction techniques on the prediction of future events of human behavior in a smart home, as well as the timestamps of those next events. Prediction techniques in smart home environments have several use cases, such as the real-time identification of abnormal behavior, identifying coachable moments for e-coaching, and a plethora of applications in the area of home automation. We give an overview of several sequence prediction techniques, including techniques that originate from the areas of data mining, process mining, and data compression, and we evaluate the predictive accuracy of those techniques on a collection of publicly available real-life datasets from the smart home environments domain. This contrasts our work with existing work on prediction in smart homes, which often evaluates techniques on a single smart home instead of a larger collection of logs. We found that LSTM neural networks outperform the other prediction methods on the task of predicting the next activity as well as on the task of predicting the timestamp of the next event. However, surprisingly, we found that which technique works best for predicting a window of multiple next activities is highly dependent on the dataset.
Tasks Activity Prediction, Activity Recognition, Home Activity Monitoring, Human Activity Recognition, Recognizing And Localizing Human Actions
Published 2018-06-28
URL https://ieeexplore.ieee.org/abstract/document/8595030
PDF https://ieeexplore.ieee.org/abstract/document/8595030
PWC https://paperswithcode.com/paper/human-activity-prediction-in-smart-home
Repo
Framework
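As a point of contrast with the LSTM, the simplest of the sequence-prediction techniques the paper compares against can be sketched as a first-order frequency model over activity transitions (an illustrative baseline; the activity names are made up):

```python
from collections import Counter, defaultdict

class MarkovNextActivity:
    """First-order baseline: predict the activity that most often
    followed the current one in the training sequences."""

    def fit(self, sequences):
        self.transitions = defaultdict(Counter)
        for seq in sequences:
            for current, following in zip(seq, seq[1:]):
                self.transitions[current][following] += 1
        return self

    def predict(self, activity):
        counts = self.transitions.get(activity)
        return counts.most_common(1)[0][0] if counts else None

model = MarkovNextActivity().fit([["sleep", "wake", "coffee"],
                                  ["wake", "coffee", "leave"]])
print(model.predict("wake"))  # → coffee
```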

Specifying Conceptual Models Using Restricted Natural Language

Title Specifying Conceptual Models Using Restricted Natural Language
Authors Bayzid Ashik Hossain, Rolf Schwitter
Abstract The key activity to design an information system is conceptual modelling which brings out and describes the general knowledge that is required to build a system. In this paper we propose a novel approach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted form of natural language. A restricted natural language is a subset of a natural language that has well-defined computational properties and therefore can be translated unambiguously into a formal notation. We will argue that a restricted natural language is suitable for writing precise and consistent specifications that lead to executable conceptual models. Using a restricted natural language will allow the domain experts to describe a scenario in the terminology of the application domain without the need to formally encode this scenario. The resulting textual specification can then be automatically translated into the language of the desired conceptual modelling framework.
Tasks
Published 2018-12-01
URL https://www.aclweb.org/anthology/U18-1005/
PDF https://www.aclweb.org/anthology/U18-1005
PWC https://paperswithcode.com/paper/specifying-conceptual-models-using-restricted
Repo
Framework

Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data

Title Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data
Authors Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali
Abstract Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language. We present a computational technique for creation of grammatically valid artificial CM data based on the Equivalence Constraint Theory. We show that when training examples are sampled appropriately from this synthetic data and presented in certain order (aka training curriculum) along with monolingual and real CM data, it can significantly reduce the perplexity of an RNN-based language model. We also show that randomly generated CM data does not help in decreasing the perplexity of the LMs.
Tasks Language Identification, Language Modelling, Sentiment Analysis, Speech Recognition
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1143/
PDF https://www.aclweb.org/anthology/P18-1143
PWC https://paperswithcode.com/paper/language-modeling-for-code-mixing-the-role-of
Repo
Framework
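Perplexity, the evaluation measure the abstract reports reducing, is the exponentiated average negative log-probability the language model assigns per token; a minimal sketch:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities assigned by an LM.
    Lower is better; equals the effective branching factor of the model."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A model assigning probability 1/4 to each of 4 tokens has perplexity 4.
print(perplexity([math.log(0.25)] * 4))  # → 4.0
```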

A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection

Title A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection
Authors Yiyi Wang, Chilin Shih
Abstract This paper presents a method of combining Conditional Random Fields (CRFs) model with a post-processing layer using Google n-grams statistical information tailored to detect word selection and word order errors made by learners of Chinese as Foreign Language (CFL). We describe the architecture of the model and its performance in the shared task of the ACL 2018 Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). This hybrid approach yields comparably high false positive rate (FPR = 0.1274) and precision (Pd= 0.7519; Pi= 0.6311), but low recall (Rd = 0.3035; Ri = 0.1696 ) in grammatical error detection and identification tasks. Additional statistical information and linguistic rules can be added to enhance the model performance in the future.
Tasks Grammatical Error Detection
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-3728/
PDF https://www.aclweb.org/anthology/W18-3728
PWC https://paperswithcode.com/paper/a-hybrid-approach-combining-statistical
Repo
Framework
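The FPR, precision, and recall figures reported above follow from standard confusion-matrix counts; a small helper sketch (the example counts are made up, not the paper's data):

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, recall, and false positive rate from confusion-matrix
    counts, as reported for error detection/identification (Pd/Rd, FPR)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return precision, recall, fpr

# e.g. 30 true detections, 10 false alarms, 70 misses, 890 correct rejections
p, r, f = detection_metrics(30, 10, 70, 890)
```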

oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis

Title oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis
Authors Samuel K. Ainsworth, Nicholas J. Foti, Adrian K. C. Lee, Emily B. Fox
Abstract Deep generative models have recently yielded encouraging results in producing subjectively realistic samples of complex data. Far less attention has been paid to making these generative models interpretable. In many scenarios, ranging from scientific applications to finance, the observed variables have a natural grouping. It is often of interest to understand systems of interaction amongst these groups, and latent factor models (LFMs) are an attractive approach. However, traditional LFMs are limited by assuming a linear correlation structure. We present an output interpretable VAE (oi-VAE) for grouped data that models complex, nonlinear latent-to-observed relationships. We combine a structured VAE comprised of group-specific generators with a sparsity-inducing prior. We demonstrate that oi-VAE yields meaningful notions of interpretability in the analysis of motion capture and MEG data. We further show that in these situations, the regularization inherent to oi-VAE can actually lead to improved generalization and learned generative processes.
Tasks Motion Capture
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2410
PDF http://proceedings.mlr.press/v80/ainsworth18a/ainsworth18a.pdf
PWC https://paperswithcode.com/paper/oi-vae-output-interpretable-vaes-for
Repo
Framework
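The sparsity-inducing prior described above penalizes whole blocks of latent-to-group weights so that entire factor-to-group interactions can switch off. A group-lasso-style sketch of such a penalty (a simplification for illustration, not oi-VAE's exact hierarchical prior):

```python
import numpy as np

def group_sparsity_penalty(W, groups, lam=1.0):
    """W: (n_outputs, n_latents) weight matrix of the first generator layer.
    groups: lists of output-row indices, one list per observation group.
    Sums one l2 norm per (group, latent factor) block, so the penalty
    drives whole group-factor interactions to zero rather than single weights."""
    return lam * sum(np.linalg.norm(W[g][:, j])
                     for g in groups for j in range(W.shape[1]))

W = np.array([[3.0], [4.0]])          # one group of two outputs, one latent factor
print(group_sparsity_penalty(W, [[0, 1]]))  # → 5.0
```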

A Neural Morphological Analyzer for Arapaho Verbs Learned from a Finite State Transducer

Title A Neural Morphological Analyzer for Arapaho Verbs Learned from a Finite State Transducer
Authors Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden
Abstract We experiment with training an encoder-decoder neural model for mimicking the behavior of an existing hand-written finite-state morphological grammar for Arapaho verbs, a polysynthetic language with a highly complex verbal inflection system. After adjusting for ambiguous parses, we find that the system is able to generalize to unseen forms with accuracies of 98.68% (unambiguous verbs) and 92.90% (all verbs).
Tasks Machine Translation, Morphological Analysis, Speech Recognition
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4802/
PDF https://www.aclweb.org/anthology/W18-4802
PWC https://paperswithcode.com/paper/a-neural-morphological-analyzer-for-arapaho
Repo
Framework