Paper Group NANR 96
On Four Metaheuristic Applications to Speech Enhancement—Implementing Optimization Algorithms with MATLAB R2018a
Title | On Four Metaheuristic Applications to Speech Enhancement—Implementing Optimization Algorithms with MATLAB R2018a |
Authors | Su-Mei Shiue, Lang-Jyi Huang, Wei-Ho Tsai, Yen-Lin Chen |
Abstract | |
Tasks | Speech Enhancement |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/O18-1026/ |
https://www.aclweb.org/anthology/O18-1026 | |
PWC | https://paperswithcode.com/paper/on-four-metaheuristic-applications-to-speech |
Repo | |
Framework | |
Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model
Title | Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model |
Authors | Satoru Tsuge, Shunichi Ishihara |
Abstract | Among the more typical forensic voice comparison (FVC) approaches, the acoustic-phonetic statistical approach is suitable for text-dependent FVC, but it does not fully exploit available time-varying information of speech in its modelling. The automatic approach, on the other hand, essentially deals with text-independent cases, which means temporal information is not explicitly incorporated in the modelling. Text-dependent likelihood ratio (LR)-based FVC studies, in particular those that adopt the automatic approach, are few. This preliminary LR-based FVC study compares two statistical models, the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM), for the calculation of forensic LRs using the same speech data. FVC experiments were carried out using different lengths of Japanese short words under a forensically realistic, but challenging condition: only two speech tokens for model training and LR estimation. Log-likelihood-ratio cost (Cllr) was used as the assessment metric. The study demonstrates that the HMM system consistently outperforms the GMM system in terms of average Cllr values. However, words longer than three morae are needed if the advantage of the HMM is to become evident. With a seven-mora word, for example, the HMM outperformed the GMM by a Cllr value of 0.073. |
Tasks | |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/U18-1002/ |
https://www.aclweb.org/anthology/U18-1002 | |
PWC | https://paperswithcode.com/paper/text-dependent-forensic-voice-comparison |
Repo | |
Framework | |
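The log-likelihood-ratio cost used as the assessment metric above has a standard closed form. A minimal sketch (this is the standard Cllr definition from the forensic-calibration literature; the input LRs would come from the HMM or GMM system, which this snippet only assumes):

```python
import numpy as np

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost (Cllr).

    lr_same: likelihood ratios from same-speaker comparisons
    lr_diff: likelihood ratios from different-speaker comparisons
    """
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Penalise same-speaker LRs below 1 and different-speaker LRs above 1.
    c_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    c_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (c_same + c_diff)
```

A system whose LRs carry no information (all LRs equal to 1) scores Cllr = 1, and well-calibrated, discriminating LRs drive Cllr toward 0, which is why a 0.073 difference between the HMM and GMM systems is meaningful.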
Two-Step Quantization for Low-Bit Neural Networks
Title | Two-Step Quantization for Low-Bit Neural Networks |
Authors | Peisong Wang, Qinghao Hu, Yifan Zhang, Chunjie Zhang, Yang Liu, Jian Cheng |
Abstract | Every bit matters in the hardware design of quantized neural networks. However, extremely-low-bit representation usually causes large accuracy drop. Thus, how to train extremely-low-bit neural networks with high accuracy is of central importance. Most existing network quantization approaches learn transformations (low-bit weights) as well as encodings (low-bit activations) simultaneously. This tight coupling makes the optimization problem difficult, and thus prevents the network from learning optimal representations. In this paper, we propose a simple yet effective Two-Step Quantization (TSQ) framework, by decomposing the network quantization problem into two steps: code learning and transformation function learning based on the learned codes. For the first step, we propose the sparse quantization method for code learning. The second step can be formulated as a non-linear least square regression problem with low-bit constraints, which can be solved efficiently in an iterative manner. Extensive experiments on CIFAR-10 and ILSVRC-12 datasets demonstrate that the proposed TSQ is effective and outperforms the state-of-the-art by a large margin. Especially, for 2-bit activation and ternary weight quantization of AlexNet, the accuracy of our TSQ drops only about 0.5 points compared with the full-precision counterpart, outperforming current state-of-the-art by more than 5 points. |
Tasks | Quantization |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wang_Two-Step_Quantization_for_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Two-Step_Quantization_for_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/two-step-quantization-for-low-bit-neural |
Repo | |
Framework | |
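The two steps of TSQ can be illustrated with a much-simplified sketch: a thresholded sparse quantization for the activation codes, and a closed-form scaled-ternary fit standing in for the paper's iterative low-bit least-squares regression. The thresholds below are common conventions, not the authors' exact procedure:

```python
import numpy as np

def sparse_quantize_activations(a, n_levels=3, sparsity=0.5):
    """Step 1 (sketch): zero the smallest activations, then uniformly
    quantize the survivors to a few levels (2-bit-style codes)."""
    a = np.asarray(a, dtype=float)
    thr = np.quantile(np.abs(a), sparsity)   # sparsity threshold
    kept = np.where(np.abs(a) > thr, a, 0.0)
    scale = np.abs(kept).max() / n_levels if kept.any() else 1.0
    return np.round(kept / scale) * scale    # quantized codes

def fit_ternary_weights(w):
    """Step 2 (sketch): fit {-alpha, 0, +alpha} to a full-precision weight
    vector; for a fixed mask, this alpha minimizes the squared error."""
    w = np.asarray(w, dtype=float)
    thr = 0.7 * np.abs(w).mean()             # common ternary heuristic
    mask = np.abs(w) > thr
    t = np.sign(w) * mask
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * t
```

The point of the decomposition is visible even in the sketch: once the activation codes are fixed, the weight fit is a separate, well-posed regression rather than a joint optimization over codes and transformations.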
Development of Natural Language Processing Tools for Cook Islands Māori
Title | Development of Natural Language Processing Tools for Cook Islands Māori |
Authors | Rolando Coto-Solano, Sally Akevai Nicholas, Samantha Wray |
Abstract | This paper presents three ongoing projects for NLP in Cook Islands Māori: Untrained Forced Alignment (approx. 9% error when detecting the center of words), speech-to-text (37% WER in the best trained models) and POS tagging (92% accuracy for the best performing model). Included as part of these projects are new resources filling in a gap in Australasian languages, including gold standard POS-tagged written corpora, transcribed speech corpora, and corpora time-aligned down to the level of phonemes. These are part of efforts to accelerate the documentation of Cook Islands Māori and to increase its vitality amongst its users. |
Tasks | Machine Translation, Part-Of-Speech Tagging, Speech Recognition |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/U18-1003/ |
https://www.aclweb.org/anthology/U18-1003 | |
PWC | https://paperswithcode.com/paper/development-of-natural-language-processing |
Repo | |
Framework | |
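The 37% WER figure reported above is the standard word error rate, the Levenshtein edit distance over word tokens divided by the reference length. A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance (substitutions + insertions +
    deletions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(ref)
```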
YNU-HPCC at SemEval-2018 Task 1: BiLSTM with Attention based Sentiment Analysis for Affect in Tweets
Title | YNU-HPCC at SemEval-2018 Task 1: BiLSTM with Attention based Sentiment Analysis for Affect in Tweets |
Authors | You Zhang, Jin Wang, Xuejie Zhang |
Abstract | We implemented sentiment systems for all five subtasks in English and Spanish. The subtasks involve emotion or sentiment intensity prediction (regression and ordinal classification) and emotion determination (multi-label classification). A BiLSTM (Bidirectional Long Short-Term Memory) model with an attention mechanism formed the core of our system. We used the BiLSTM to extract word information from both directions, and the attention mechanism to estimate the contribution of each word and thereby improve the scores. Furthermore, on top of BiLSTMATT (BiLSTM with attention mechanism), different deep-learning strategies were employed for different subtasks: for the regression and ordinal classification tasks we used domain adaptation and ensemble learning to strengthen the base model, while a single base model was used for the multi-label task. |
Tasks | Domain Adaptation, Multi-Label Classification, Sentiment Analysis |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1040/ |
https://www.aclweb.org/anthology/S18-1040 | |
PWC | https://paperswithcode.com/paper/ynu-hpcc-at-semeval-2018-task-1-bilstm-with |
Repo | |
Framework | |
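The attention step described above, estimating each word's contribution over the BiLSTM hidden states, reduces to softmax-weighted pooling. A NumPy sketch (the query vector v stands in for the learned attention parameters, which the abstract does not specify):

```python
import numpy as np

def attention_pool(h, v):
    """Attention over hidden states h (T x d): score each timestep against
    a query vector v (d,), softmax the scores, and return the weighted sum
    of states together with the per-word attention weights."""
    scores = h @ v                         # (T,) one score per timestep
    scores = scores - scores.max()         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ h                  # (d,) sentence representation
    return context, weights
```

The attention weights are exactly the "contribution of each word" the system inspects; the context vector then feeds the task-specific output layer.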
Towards Processing of the Oral History Interviews and Related Printed Documents
Title | Towards Processing of the Oral History Interviews and Related Printed Documents |
Authors | Zbyněk Zajíc, Lucie Skorkovská, Petr Neduchal, Pavel Ircing, Josef V. Psutka, Marek Hrúz, Aleš Pražák, Daniel Soutner, Jan Švec, Lukáš Bureš, Luděk Müller |
Abstract | |
Tasks | Optical Character Recognition, Speech Recognition |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1331/ |
https://www.aclweb.org/anthology/L18-1331 | |
PWC | https://paperswithcode.com/paper/towards-processing-of-the-oral-history |
Repo | |
Framework | |
Stacking with Auxiliary Features for Visual Question Answering
Title | Stacking with Auxiliary Features for Visual Question Answering |
Authors | Nazneen Fatema Rajani, Raymond Mooney |
Abstract | Visual Question Answering (VQA) is a well-known and challenging task that requires systems to jointly reason about natural language and vision. Deep learning models in various forms have been the standard for solving VQA. However, some of these VQA models are better at certain types of image-question pairs than other models. Ensembling VQA models intelligently to leverage their diverse expertise is, therefore, advantageous. Stacking With Auxiliary Features (SWAF) is an intelligent ensembling technique which learns to combine the results of multiple models using features of the current problem as context. We propose four categories of auxiliary features for ensembling for VQA. Three out of the four categories of features can be inferred from an image-question pair and do not require querying the component models. The fourth category of auxiliary features uses model-specific explanations. In this paper, we describe how we use these various categories of auxiliary features to improve performance for VQA. Using SWAF to effectively ensemble three recent systems, we obtain a new state-of-the-art. Our work also highlights the advantages of explainable AI models. |
Tasks | Common Sense Reasoning, Question Answering, Visual Question Answering, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1201/ |
https://www.aclweb.org/anthology/N18-1201 | |
PWC | https://paperswithcode.com/paper/stacking-with-auxiliary-features-for-visual |
Repo | |
Framework | |
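SWAF's core move is to hand a meta-classifier one feature vector per candidate answer: the component models' confidences concatenated with the auxiliary features. A sketch with a hand-set logistic stacker (the weights here are hypothetical; the real system learns them from held-out data, and the paper's auxiliary features are richer than a single number):

```python
import math

def swaf_score(confidences, aux_features, weights, bias=0.0):
    """Stacking with Auxiliary Features (sketch): each candidate answer
    becomes one feature vector = component-model confidences + auxiliary
    features of the image-question pair; a logistic meta-classifier
    outputs the probability that the candidate is correct."""
    x = list(confidences) + list(aux_features)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_answer(candidates, weights):
    """Pick the candidate answer the stacker scores highest."""
    return max(candidates, key=lambda c: swaf_score(c["conf"], c["aux"], weights))
```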
On the Utility of Lay Summaries and AI Safety Disclosures: Toward Robust, Open Research Oversight
Title | On the Utility of Lay Summaries and AI Safety Disclosures: Toward Robust, Open Research Oversight |
Authors | Allen Schmaltz |
Abstract | In this position paper, we propose that the community consider encouraging researchers to include two riders, a "Lay Summary" and an "AI Safety Disclosure", as part of future NLP papers published in ACL forums that present user-facing systems. The goal is to encourage researchers, via a relatively non-intrusive mechanism, to consider the societal implications of technologies carrying (un)known and/or (un)knowable long-term risks, to highlight failure cases, and to provide a mechanism by which the general public (and scientists in other disciplines) can more readily engage in the discussion in an informed manner. This simple proposal requires minimal additional up-front costs for researchers; the lay summary, at least, has significant precedent in the medical literature and other areas of science; and the proposal is aimed to supplement, rather than replace, existing approaches for encouraging researchers to consider the ethical implications of their work, such as those of the Collaborative Institutional Training Initiative (CITI) Program and institutional review boards (IRBs). |
Tasks | Machine Translation |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-0801/ |
https://www.aclweb.org/anthology/W18-0801 | |
PWC | https://paperswithcode.com/paper/on-the-utility-of-lay-summaries-and-ai-safety |
Repo | |
Framework | |
Viterbi-based Pruning for Sparse Matrix with Fixed and High Index Compression Ratio
Title | Viterbi-based Pruning for Sparse Matrix with Fixed and High Index Compression Ratio |
Authors | Dongsoo Lee, Daehyun Ahn, Taesu Kim, Pierce I. Chuang, Jae-Joon Kim |
Abstract | Weight pruning has proven to be an effective method in reducing the model size and computation cost while not sacrificing the model accuracy. Conventional sparse matrix formats, however, involve irregular index structures with large storage requirement and sequential reconstruction process, resulting in inefficient use of highly parallel computing resources. Hence, pruning is usually restricted to inference with a batch size of one, for which an efficient parallel matrix-vector multiplication method exists. In this paper, a new class of sparse matrix representation utilizing Viterbi algorithm that has a high, and more importantly, fixed index compression ratio regardless of the pruning rate, is proposed. In this approach, numerous sparse matrix candidates are first generated by the Viterbi encoder, and then the one that aims to minimize the model accuracy degradation is selected by the Viterbi algorithm. The model pruning process based on the proposed Viterbi encoder and Viterbi algorithm is highly parallelizable, and can be implemented efficiently in hardware to achieve low-energy, high-performance index decoding process. Compared with the existing magnitude-based pruning methods, index data storage requirement can be further compressed by 85.2% in MNIST and 83.9% in AlexNet while achieving similar pruning rate. Even compared with the relative index compression technique, our method can still reduce the index storage requirement by 52.7% in MNIST and 35.5% in AlexNet. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1D8MPxA- |
https://openreview.net/pdf?id=S1D8MPxA- | |
PWC | https://paperswithcode.com/paper/viterbi-based-pruning-for-sparse-matrix-with |
Repo | |
Framework | |
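The selection step, choosing one candidate mask per block of weights under a structural constraint, is a textbook Viterbi decode. A toy sketch of that primitive (the "dense mask cannot repeat" transition rule below is an invented stand-in for the encoder's fixed-compression structure, and real candidates come from the Viterbi encoder, not a hand-written list):

```python
import numpy as np

def viterbi_masks(weight_blocks, masks, allowed):
    """Minimal Viterbi DP (sketch): pick one candidate mask per weight
    block so that total retained |weight| is maximal, subject to
    allowed[i][j] == True meaning mask j may follow mask i."""
    T, S = len(weight_blocks), len(masks)
    # Emission score: magnitude kept when applying mask m to block b.
    emit = np.array([[np.abs(b * m).sum() for m in masks]
                     for b in weight_blocks]).astype(float)
    score = emit[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        new_score = np.empty(S)
        for j in range(S):
            prev = [score[i] if allowed[i][j] else -np.inf for i in range(S)]
            back[t, j] = int(np.argmax(prev))
            new_score[j] = prev[back[t, j]] + emit[t, j]
        score = new_score
    path = [int(np.argmax(score))]          # best final mask, then backtrack
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```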
Human Activity Prediction in Smart Home Environments with LSTM Neural Networks
Title | Human Activity Prediction in Smart Home Environments with LSTM Neural Networks |
Authors | Niek Tax |
Abstract | In this paper, we investigate the performance of several sequence prediction techniques on the prediction of future events of human behavior in a smart home, as well as the timestamps of those next events. Prediction techniques in smart home environments have several use cases, such as the real-time identification of abnormal behavior, identifying coachable moments for e-coaching, and a plethora of applications in the area of home automation. We give an overview of several sequence prediction techniques, including techniques that originate from the areas of data mining, process mining, and data compression, and we evaluate the predictive accuracy of those techniques on a collection of publicly available real-life datasets from the smart home environments domain. This contrasts with existing work on prediction in smart homes, which often evaluates techniques on a single smart home instead of a larger collection of logs. We found that LSTM neural networks outperform the other prediction methods on the task of predicting the next activity as well as on the task of predicting the timestamp of the next event. Surprisingly, however, which technique works best for the task of predicting a window of multiple next activities was found to depend strongly on the dataset. |
Tasks | Activity Prediction, Activity Recognition, Home Activity Monitoring, Human Activity Recognition, Recognizing And Localizing Human Actions |
Published | 2018-06-28 |
URL | https://ieeexplore.ieee.org/abstract/document/8595030 |
https://ieeexplore.ieee.org/abstract/document/8595030 | |
PWC | https://paperswithcode.com/paper/human-activity-prediction-in-smart-home |
Repo | |
Framework | |
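Among the data-mining-style baselines the paper compares LSTMs against, the simplest is a first-order Markov predictor over the activity log: count observed transitions and predict the most frequent successor. A sketch:

```python
from collections import Counter, defaultdict

class MarkovActivityPredictor:
    """First-order Markov baseline for next-activity prediction."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, logs):
        # Each trace is a list of activity labels in temporal order.
        for trace in logs:
            for cur, nxt in zip(trace, trace[1:]):
                self.transitions[cur][nxt] += 1

    def predict(self, current_activity):
        successors = self.transitions.get(current_activity)
        if not successors:
            return None                     # unseen activity
        return successors.most_common(1)[0][0]
```

An LSTM improves on this baseline precisely because it can condition on the whole history rather than only the single most recent activity.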
Specifying Conceptual Models Using Restricted Natural Language
Title | Specifying Conceptual Models Using Restricted Natural Language |
Authors | Bayzid Ashik Hossain, Rolf Schwitter |
Abstract | The key activity to design an information system is conceptual modelling which brings out and describes the general knowledge that is required to build a system. In this paper we propose a novel approach to conceptual modelling where the domain experts will be able to specify and construct a model using a restricted form of natural language. A restricted natural language is a subset of a natural language that has well-defined computational properties and therefore can be translated unambiguously into a formal notation. We will argue that a restricted natural language is suitable for writing precise and consistent specifications that lead to executable conceptual models. Using a restricted natural language will allow the domain experts to describe a scenario in the terminology of the application domain without the need to formally encode this scenario. The resulting textual specification can then be automatically translated into the language of the desired conceptual modelling framework. |
Tasks | |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/U18-1005/ |
https://www.aclweb.org/anthology/U18-1005 | |
PWC | https://paperswithcode.com/paper/specifying-conceptual-models-using-restricted |
Repo | |
Framework | |
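The key property claimed above, that each sentence of the restricted language translates unambiguously into a formal notation, can be made concrete with a toy example (a single invented pattern; a real restricted natural language defines many such rules, each with exactly one translation):

```python
import re

def translate(sentence):
    """Toy translator (sketch): map one restricted-English pattern to a
    first-order-logic string. Sentences outside the restricted language
    are rejected rather than guessed at."""
    m = re.fullmatch(r"Every (\w+) is a (\w+)\.", sentence)
    if m:
        a, b = m.group(1).capitalize(), m.group(2).capitalize()
        return f"forall x: {a}(x) -> {b}(x)"
    raise ValueError("sentence is outside the restricted language")
```

Rejecting everything outside the pattern inventory is what gives the language its well-defined computational properties: there is never more than one parse to choose between.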
Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data
Title | Language Modeling for Code-Mixing: The Role of Linguistic Theory based Synthetic Data |
Authors | Adithya Pratapa, Gayatri Bhat, Monojit Choudhury, Sunayana Sitaram, Sandipan Dandapat, Kalika Bali |
Abstract | Training language models for Code-mixed (CM) language is known to be a difficult problem because of lack of data compounded by the increased confusability due to the presence of more than one language. We present a computational technique for creation of grammatically valid artificial CM data based on the Equivalence Constraint Theory. We show that when training examples are sampled appropriately from this synthetic data and presented in certain order (aka training curriculum) along with monolingual and real CM data, it can significantly reduce the perplexity of an RNN-based language model. We also show that randomly generated CM data does not help in decreasing the perplexity of the LMs. |
Tasks | Language Identification, Language Modelling, Sentiment Analysis, Speech Recognition |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1143/ |
https://www.aclweb.org/anthology/P18-1143 | |
PWC | https://paperswithcode.com/paper/language-modeling-for-code-mixing-the-role-of |
Repo | |
Framework | |
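The evaluation quantity here is language-model perplexity. A self-contained bigram sketch (the paper's models are RNNs, so this add-k n-gram version only illustrates the metric, not the models):

```python
import math
from collections import Counter

def bigram_perplexity(train_tokens, test_tokens, smoothing=1.0):
    """Perplexity of an add-k smoothed bigram LM on test_tokens:
    exp of the negative mean log-probability per bigram."""
    vocab = set(train_tokens) | set(test_tokens)
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    log_prob = 0.0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        p = (bigrams[(prev, cur)] + smoothing) / \
            (unigrams[prev] + smoothing * len(vocab))
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(test_tokens) - 1))
```

Lower is better; the paper's claim is that curriculum-ordered synthetic code-mixed data lowers this quantity for an RNN LM while randomly generated code-mixed data does not.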
A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection
Title | A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection |
Authors | Yiyi Wang, Chilin Shih |
Abstract | This paper presents a method of combining a Conditional Random Fields (CRFs) model with a post-processing layer using Google n-grams statistical information, tailored to detect word selection and word order errors made by learners of Chinese as a Foreign Language (CFL). We describe the architecture of the model and its performance in the shared task of the ACL 2018 Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). This hybrid approach yields a comparably high false positive rate (FPR = 0.1274) and precision (Pd = 0.7519; Pi = 0.6311), but low recall (Rd = 0.3035; Ri = 0.1696) in the grammatical error detection and identification tasks. Additional statistical information and linguistic rules can be added to enhance the model performance in the future. |
Tasks | Grammatical Error Detection |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3728/ |
https://www.aclweb.org/anthology/W18-3728 | |
PWC | https://paperswithcode.com/paper/a-hybrid-approach-combining-statistical |
Repo | |
Framework | |
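The reported numbers follow from the usual confusion-matrix definitions; a sketch for computing the three reported metrics from raw counts:

```python
def detection_metrics(tp, fp, tn, fn):
    """False positive rate, precision, and recall from confusion counts,
    as reported for the grammatical error detection task."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"FPR": fpr, "precision": precision, "recall": recall}
```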
oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis
Title | oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis |
Authors | Samuel K. Ainsworth, Nicholas J. Foti, Adrian K. C. Lee, Emily B. Fox |
Abstract | Deep generative models have recently yielded encouraging results in producing subjectively realistic samples of complex data. Far less attention has been paid to making these generative models interpretable. In many scenarios, ranging from scientific applications to finance, the observed variables have a natural grouping. It is often of interest to understand systems of interaction amongst these groups, and latent factor models (LFMs) are an attractive approach. However, traditional LFMs are limited by assuming a linear correlation structure. We present an output interpretable VAE (oi-VAE) for grouped data that models complex, nonlinear latent-to-observed relationships. We combine a structured VAE comprised of group-specific generators with a sparsity-inducing prior. We demonstrate that oi-VAE yields meaningful notions of interpretability in the analysis of motion capture and MEG data. We further show that in these situations, the regularization inherent to oi-VAE can actually lead to improved generalization and learned generative processes. |
Tasks | Motion Capture |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2410 |
http://proceedings.mlr.press/v80/ainsworth18a/ainsworth18a.pdf | |
PWC | https://paperswithcode.com/paper/oi-vae-output-interpretable-vaes-for |
Repo | |
Framework | |
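The sparsity-inducing prior that gives oi-VAE its interpretability acts like a group lasso on the latent-to-group generator weights: whole blocks are driven to zero, so each latent dimension attaches to few observed-variable groups. A sketch of the penalty (a simplification of the paper's hierarchical Bayesian prior):

```python
import numpy as np

def group_sparsity_penalty(w, groups, lam=1.0):
    """Group-lasso style penalty (sketch): sum, over observed-variable
    groups, of the L2 norm of each latent dimension's weights into that
    group. w has shape (latent_dims, observed_vars)."""
    w = np.asarray(w, dtype=float)
    total = 0.0
    for g in groups:                 # g: column indices of one group
        block = w[:, g]              # latent_dims x group_vars
        total += np.linalg.norm(block, axis=1).sum()
    return lam * total
```

Because the penalty is a sum of norms rather than a norm of sums, spreading the same weight energy across several groups costs more than concentrating it in one, which is what yields interpretable latent-to-group assignments.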
A Neural Morphological Analyzer for Arapaho Verbs Learned from a Finite State Transducer
Title | A Neural Morphological Analyzer for Arapaho Verbs Learned from a Finite State Transducer |
Authors | Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden |
Abstract | We experiment with training an encoder-decoder neural model for mimicking the behavior of an existing hand-written finite-state morphological grammar for Arapaho verbs, a polysynthetic language with a highly complex verbal inflection system. After adjusting for ambiguous parses, we find that the system is able to generalize to unseen forms with accuracies of 98.68{%} (unambiguous verbs) and 92.90{%} (all verbs). |
Tasks | Machine Translation, Morphological Analysis, Speech Recognition |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4802/ |
https://www.aclweb.org/anthology/W18-4802 | |
PWC | https://paperswithcode.com/paper/a-neural-morphological-analyzer-for-arapaho |
Repo | |
Framework | |
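The training setup above, a hand-written FST grammar supplying supervision for a neural encoder-decoder, can be illustrated by generating (analysis, surface) pairs from a rule table. The stems and suffixes below are invented placeholders, not real Arapaho, and real Arapaho verb morphology also involves prefixes and stem changes:

```python
def generate_pairs(stems, suffix_rules):
    """Generate (analysis, surface) training pairs from a rule table,
    the way an FST grammar can enumerate supervision for a seq2seq
    morphological analyzer (toy suffix-only sketch)."""
    pairs = []
    for stem in stems:
        for tag, suffix in suffix_rules.items():
            analysis = f"{stem}+{tag}"   # input side: stem + tag
            surface = stem + suffix      # output side: inflected form
            pairs.append((analysis, surface))
    return pairs
```

The neural model is then trained to map one side to the other, and the interesting question the paper answers is how well it generalizes to forms the grammar enumerates but the training sample never showed.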