July 28, 2019


Paper Group ANR 223

Towards Language-Universal End-to-End Speech Recognition. Metric Learning-based Generative Adversarial Network. Global Entity Ranking Across Multiple Languages. Stopping GAN Violence: Generative Unadversarial Networks. Application of Fuzzy Logic in Design of Smart Washing Machine. Convolutional Attention-based Seq2Seq Neural Network for End-to-End …

Towards Language-Universal End-to-End Speech Recognition


Title	Towards Language-Universal End-to-End Speech Recognition
Authors	Suyoun Kim, Michael L. Seltzer
Abstract	Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network’s internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems built with a multi-task learning approach. We also show that this model can be used to initialize a monolingual speech recognizer, and can be used to create a bilingual model for use in code-switching scenarios.
Tasks	End-To-End Speech Recognition, Multi-Task Learning, Speech Recognition
Published	2017-11-06
URL	http://arxiv.org/abs/1711.02207v1
PDF	http://arxiv.org/pdf/1711.02207v1.pdf
PWC	https://paperswithcode.com/paper/towards-language-universal-end-to-end-speech
Repo
Framework
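The abstract names two ingredients: a universal character set shared by all languages, and a language-specific gate that modulates internal representations. The sketch below is a minimal illustration under stated assumptions. The charset is simply the union of characters seen in any language's transcripts plus a hypothetical blank symbol. The gate form g = sigmoid(h W_h + l W_l + b), applied element-wise to the hidden states h with a one-hot language vector l, is an assumed parameterization chosen for illustration, not necessarily the paper's exact equation. The names `build_universal_charset` and `language_gate` are hypothetical.

```python
import numpy as np

BLANK = "<blank>"  # hypothetical blank symbol for a CTC-style output layer

def build_universal_charset(transcripts_by_lang):
    """Union of all characters seen in any language, shared as one output vocabulary."""
    chars = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            chars.update(text)
    return [BLANK] + sorted(chars)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def language_gate(h, lang_onehot, W_h, W_l, b):
    """Assumed gate form: g = sigmoid(h W_h + l W_l + b); output = g * h (element-wise).

    h: (T, d) hidden states, lang_onehot: (L,) language indicator,
    W_h: (d, d), W_l: (L, d), b: (d,).
    """
    g = sigmoid(h @ W_h + lang_onehot @ W_l + b)
    return g * h

# Toy usage: three "languages", random weights, language index 1 selected.
charset = build_universal_charset({"en": ["hello"], "de": ["hallo", "grüße"]})
rng = np.random.default_rng(0)
T, d, L = 4, 8, 3
h = rng.standard_normal((T, d))
out = language_gate(h, np.eye(L)[1], rng.standard_normal((d, d)),
                    rng.standard_normal((L, d)), np.zeros(d))
```

Because every gate value lies strictly between 0 and 1, the gate can only scale hidden units down, which lets the same shared network emphasize different units for different languages.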

Metric Learning-based Generative Adversarial Network


Title	Metric Learning-based Generative Adversarial Network
Authors	Zi-Yi Dou
Abstract	Generative Adversarial Networks (GANs), as a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven to be powerful in a variety of tasks. However, training GANs is well known for being delicate and unstable, partly because of the sigmoid cross-entropy loss function used by the discriminator. To overcome this problem, many researchers have turned their attention to various ways of measuring how close the model distribution and the real distribution are, and have applied different metrics as their objective functions. In this paper, we propose a novel framework to train GANs based on distance metric learning, which we call Metric Learning-based Generative Adversarial Network (MLGAN). The discriminator of MLGANs can dynamically learn an appropriate metric, rather than a static one, to measure the distance between generated samples and real samples. Afterwards, MLGANs update the generator under the newly learned metric. We evaluate our approach on several representative datasets, and the experimental results demonstrate that MLGANs can achieve superior performance compared with several existing state-of-the-art approaches. We also empirically show that MLGANs could increase the stability of training GANs.
Tasks	Metric Learning
Published	2017-11-08
URL	http://arxiv.org/abs/1711.02792v1
PDF	http://arxiv.org/pdf/1711.02792v1.pdf
PWC	https://paperswithcode.com/paper/metric-learning-based-generative-adversarial
Repo
Framework
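The abstract says the discriminator learns a metric dynamically and the generator is then updated under that metric, but it does not give the parameterization. A common building block in distance metric learning is a Mahalanobis distance (x - y)^T M (x - y) with M = L^T L, which keeps M positive semidefinite for any learnable matrix L. The sketch below shows only that building block and an assumed "mean distance between generated and real samples" objective; the function names are hypothetical and this is not presented as MLGAN's actual loss.

```python
import numpy as np

def mahalanobis_sq(x, y, L):
    """Squared learned distance (x - y)^T M (x - y) with M = L^T L.

    x, y: arrays of shape (..., d); L: (k, d) learnable projection.
    Writing M = L^T L guarantees M is positive semidefinite.
    """
    diff = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) @ L.T
    return np.sum(diff ** 2, axis=-1)

def mean_cross_distance(fake, real, L):
    """Mean learned distance between every generated sample and every real sample.

    fake: (n_f, d), real: (n_r, d). Broadcasting builds all n_f x n_r pairs.
    """
    return float(np.mean(mahalanobis_sq(fake[:, None, :], real[None, :, :], L)))
```

In an alternating scheme of this kind, one step would update L (the metric) and the next would update the generator to reduce `mean_cross_distance` under the current L; how MLGAN schedules and regularizes those steps is not stated in the abstract.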

Global Entity Ranking Across Multiple Languages


Title	Global Entity Ranking Across Multiple Languages
Authors	Prantik Bhattacharyya, Nemanja Spasojevic
Abstract	We present work on building a global long-tailed ranking of entities across multiple languages using Wikipedia and Freebase knowledge bases. We identify multiple features and build a model to rank entities using a ground-truth dataset of more than 10 thousand labels. The final system ranks 27 million entities with 75% precision and 48% F1 score. We provide performance evaluation and empirical evidence of the quality of ranking across languages, and open the final ranked lists for future research.
Tasks
Published	2017-03-17
URL	http://arxiv.org/abs/1703.06108v1
PDF	http://arxiv.org/pdf/1703.06108v1.pdf
PWC	https://paperswithcode.com/paper/global-entity-ranking-across-multiple
Repo
Framework
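The abstract reports 75% precision and a 48% F1 score without stating recall. Assuming both numbers are computed on the same labels with the standard definition F1 = 2PR / (P + R), solving for R gives R = F1 * P / (2P - F1), which implies a recall of roughly 35%. The helper names below are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R: F1*P + F1*R = 2PR, so R = F1*P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)

r = implied_recall(0.75, 0.48)
print(round(r, 3))  # 0.353
```

If the paper's F1 is averaged differently (for example per language or per rank bucket), this back-of-the-envelope recall would not apply directly.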

Stopping GAN Violence: Generative Unadversarial Networks


Title	Stopping GAN Violence: Generative Unadversarial Networks
Authors	Samuel Albanie, Sébastien Ehrhardt, João F. Henriques
Abstract	While the costs of human violence have attracted a great deal of attention from the research community, the effects of the network-on-network (NoN) violence popularised by Generative Adversarial Networks have yet to be addressed. In this work, we quantify the financial, social, spiritual, cultural, grammatical and dermatological impact of this aggression and address the issue by proposing a more peaceful approach which we term Generative Unadversarial Networks (GUNs). Under this framework, we simultaneously train two models: a generator G that does its best to capture whichever data distribution it feels it can manage, and a motivator M that helps G to achieve its dream. Fighting is strictly verboten and both models evolve by learning to respect their differences. The framework is both theoretically and electrically grounded in game theory, and can be viewed as a winner-shares-all two-player game in which both players work as a team to achieve the best score. Experiments show that by working in harmony, the proposed model is able to claim both the moral and log-likelihood high ground. Our work builds on a rich history of carefully argued position-papers, published as anonymous YouTube comments, which prove that the optimal solution to NoN violence is more GUNs.
Tasks
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02528v1
PDF	http://arxiv.org/pdf/1703.02528v1.pdf
PWC	https://paperswithcode.com/paper/stopping-gan-violence-generative
Repo
Framework

Application of Fuzzy Logic in Design of Smart Washing Machine


Title	Application of Fuzzy Logic in Design of Smart Washing Machine
Authors	Rao Farhat Masood
Abstract	A washing machine is a great domestic necessity, as it frees us from the burden of washing our clothes and saves a great deal of our time. This paper covers the design and development of a fuzzy-logic-based smart washing machine. A regular (timer-based) washing machine uses a mechanical, multi-turn timer as its start-stop mechanism, which is prone to breakage. In addition to these starting and stopping issues, mechanical timers are inefficient with respect to maintenance and electricity usage. Recent developments have shown that integrating digital electronics to optimize the functionality of this machine is possible, and it is now common in practice. A number of internationally renowned companies have developed such machines with built-in artificial intelligence. Such a machine uses sensors to calculate the run time (washing time) of the main motor, and real-time calculations are also used to optimize the machine's run time. The obvious results are smarter time management, better electricity economy and more efficient operation. This paper deals with the indigenization of a washing machine based on a Fuzzy Logic Controller (FLC), which is capable of automating the inputs and producing the desired output (wash time).
Tasks
Published	2017-01-04
URL	http://arxiv.org/abs/1701.01654v2
PDF	http://arxiv.org/pdf/1701.01654v2.pdf
PWC	https://paperswithcode.com/paper/application-of-fuzzy-logic-in-design-of-smart
Repo
Framework
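To make the idea of an FLC that maps sensor inputs to a wash time concrete, here is a minimal sketch. Everything specific in it is an assumption for illustration: the two inputs (dirtiness and load, both on a 0-100 scale), the triangular fuzzy sets, the rule table in minutes, and the zero-order Sugeno inference (min for AND, weighted average of rule outputs). The paper may use different inputs, membership functions, or Mamdani-style centroid defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership: 1 at b, falling linearly to 0 at a and c (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets on a 0-100 scale, reused for both inputs.
SETS = {"low": (0, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 100)}

# Hypothetical rule table: (dirtiness, load) -> wash time in minutes.
RULES = {
    ("low", "low"): 15, ("low", "medium"): 20, ("low", "high"): 25,
    ("medium", "low"): 25, ("medium", "medium"): 30, ("medium", "high"): 35,
    ("high", "low"): 35, ("high", "medium"): 45, ("high", "high"): 60,
}

def wash_time(dirtiness, load):
    """Zero-order Sugeno inference: each rule fires with min(memberships),
    and the output is the firing-strength-weighted average of rule outputs."""
    num = den = 0.0
    for (d_set, l_set), minutes in RULES.items():
        w = min(tri(dirtiness, *SETS[d_set]), tri(load, *SETS[l_set]))
        num += w * minutes
        den += w
    return num / den if den > 0 else 0.0

print(wash_time(75, 50))  # 37.5: halfway between the medium/medium and high/medium rules
```

The weighted-average output varies smoothly between rule values, which is the practical advantage over a fixed mechanical timer setting.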

Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR


Title	Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR
Authors	Dan Lim
Abstract	This thesis introduces the sequence-to-sequence model with Luong’s attention mechanism for end-to-end ASR. It also describes various neural network algorithms, including batch normalization, dropout and residual networks, which constitute the convolutional attention-based seq2seq neural network. Finally, the proposed model proves its effectiveness for speech recognition, achieving a 15.8% phoneme error rate on the TIMIT dataset.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2017-10-12
URL	http://arxiv.org/abs/1710.04515v1
PDF	http://arxiv.org/pdf/1710.04515v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-attention-based-seq2seq-neural
Repo
Framework
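The thesis builds on Luong's attention. As a reminder of how one decoder step works, the sketch below implements global attention with the "dot" score h_t . h_s and the "general" score h_t^T W_a h_s, the softmax over encoder positions, the context vector as the weighted sum of encoder states, and the attentional hidden state tanh(W_c [c_t; h_t]). Shapes and random weights are illustrative; the convolutional encoder, batch normalization, dropout and residual connections from the thesis are not shown.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def luong_attention(h_t, H_s, W_c, W_a=None):
    """One decoder step of Luong-style global attention.

    h_t: (d,) current decoder state; H_s: (S, d) encoder states;
    W_c: (d, 2d) output projection; W_a: (d, d) for the 'general' score, None for 'dot'.
    Returns (attention weights, context vector, attentional hidden state).
    """
    if W_a is None:
        scores = H_s @ h_t             # dot: h_t . h_s for every source position s
    else:
        scores = H_s @ (W_a.T @ h_t)   # general: h_t^T W_a h_s
    a = softmax(scores)
    c = a @ H_s                        # context = attention-weighted sum of encoder states
    h_tilde = np.tanh(W_c @ np.concatenate([c, h_t]))
    return a, c, h_tilde

rng = np.random.default_rng(0)
d, S = 4, 5
H_s = rng.standard_normal((S, d))
h_t = rng.standard_normal(d)
W_c = rng.standard_normal((d, 2 * d))
a, c, h_tilde = luong_attention(h_t, H_s, W_c)
```

With W_a set to the identity, the "general" score reduces to the "dot" score, which is a quick sanity check on the implementation.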

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems


Title	Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Authors	Yonatan Belinkov, James Glass
Abstract	Neural models have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2017-09-13
URL	http://arxiv.org/abs/1709.04482v1
PDF	http://arxiv.org/pdf/1709.04482v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-hidden-representations-in-end-to
Repo
Framework
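The analysis method described in the abstract is a probing setup: freeze a trained model, extract frame-level features from one layer, and train a separate classifier to predict phone labels; higher probe accuracy suggests that layer encodes more phonetic information. The sketch below shows a linear softmax probe trained by full-batch gradient descent. The features here are synthetic stand-ins (three "phone" classes in 8 dimensions), not outputs of a CTC model, and the paper's classifier may be non-linear; the function names are hypothetical.

```python
import numpy as np

def train_linear_probe(feats, labels, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression trained with full-batch gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]              # one-hot targets
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)      # softmax probabilities
        G = (P - Y) / n                        # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def probe_accuracy(W, b, feats, labels):
    return float(np.mean(np.argmax(feats @ W + b, axis=1) == labels))

# Synthetic stand-in for one layer's frame features: 3 classes in 8 dimensions.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=600)
centers = 3.0 * np.eye(3, 8)
feats = centers[labels] + 0.5 * rng.standard_normal((600, 8))
W, b = train_linear_probe(feats[:400], labels[:400], n_classes=3)
acc = probe_accuracy(W, b, feats[400:], labels[400:])
```

To compare layers as the paper does, one would repeat the same probe training on features taken from each layer and compare held-out accuracies.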

Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case


Title	Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case
Authors	Faraz Faghri, Sayed Hadi Hashemi, Mohammad Babaeizadeh, Mike A. Nalls, Saurabh Sinha, Roy H. Campbell
Abstract	In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of “optimize the common case”.
Tasks	Dimensionality Reduction
Published	2017-09-29
URL	http://arxiv.org/abs/1710.00112v1
PDF	http://arxiv.org/pdf/1710.00112v1.pdf
PWC	https://paperswithcode.com/paper/toward-scalable-machine-learning-and-data
Repo
Framework

Automated Surgical Skill Assessment in RMIS Training


Title	Automated Surgical Skill Assessment in RMIS Training
Authors	Aneeq Zia, Irfan Essa
Abstract	Purpose: Manual feedback in basic RMIS training can consume a significant amount of expert surgeons’ time and is prone to subjectivity. While VR-based training tasks can generate automated score reports, there is no mechanism for generating automated feedback for surgeons performing basic surgical tasks in RMIS training. In this paper, we explore the use of different holistic features for automated skill assessment using only robot kinematic data and propose a weighted feature fusion technique for improving score prediction performance. Methods: We perform our experiments on the publicly available JIGSAWS dataset and evaluate four different types of holistic features from robot kinematic data: Sequential Motion Texture (SMT), Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Approximate Entropy (ApEn). The features are then used for skill classification and exact skill score prediction. Along with using these features individually, we also evaluate the performance using our proposed weighted combination technique. Results: Our results demonstrate that these holistic features outperform all previous HMM-based state-of-the-art methods for skill classification on the JIGSAWS dataset. Also, our proposed feature fusion strategy significantly improves performance for skill score predictions, achieving an average Spearman correlation coefficient of up to 0.61. Conclusions: Holistic features capturing global information from robot kinematic data can successfully be used for evaluating surgeon skill in basic surgical tasks on the da Vinci robot. Using the framework presented can potentially allow for real-time score feedback in RMIS training.
Tasks
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08604v1
PDF	http://arxiv.org/pdf/1712.08604v1.pdf
PWC	https://paperswithcode.com/paper/automated-surgical-skill-assessment-in-rmis
Repo
Framework
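Of the four holistic features, Approximate Entropy (ApEn) is the easiest to state compactly: for embedding dimension m and tolerance r, phi(m) is the average log-fraction of length-m windows that stay within Chebyshev distance r of each window (self-matches included), and ApEn = phi(m) - phi(m + 1). Regular signals score low and irregular ones score high. The sketch below computes it for a single 1-D channel; the defaults m = 2 and r = 0.2 times the standard deviation are common conventions, not necessarily the settings used in the paper, and multichannel kinematic data would be handled by applying it per channel.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate Entropy (Pincus) of a 1-D series.

    m: embedding (window) length; r: tolerance, defaulting to 0.2 * std(x).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(mm):
        # All overlapping windows of length mm, shape (n - mm + 1, mm).
        windows = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance r (self-match included, so never zero).
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(300)
sine = np.sin(np.linspace(0, 12 * np.pi, 300))
```

On these toy inputs the regular sine wave yields a much lower ApEn than white noise, and a constant series yields exactly zero.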

S-OHEM: Stratified Online Hard Example Mining for Object Detection


Title	S-OHEM: Stratified Online Hard Example Mining for Object Detection
Authors	Minne Li, Zhaoning Zhang, Hao Yu, Xinyuan Chen, Dongsheng Li
Abstract	One of the major challenges in object detection is to propose detectors with highly accurate localization of objects. The online sampling of high-loss region proposals (hard examples) uses the multitask loss with equal weight settings across all loss types (e.g., classification and localization, rigid and non-rigid categories) and ignores the influence of different loss distributions throughout the training process, which we find essential to the training efficacy. In this paper, we present the Stratified Online Hard Example Mining (S-OHEM) algorithm for training more efficient and accurate detectors. S-OHEM exploits OHEM with stratified sampling, a widely adopted sampling technique, to choose the training examples according to this influence during hard example mining, and thus enhances the performance of object detectors. We show through systematic experiments that S-OHEM yields an average precision (AP) improvement of 0.5% on rigid categories of PASCAL VOC 2007 for both IoU thresholds of 0.6 and 0.7. For KITTI 2012, the corresponding improvement is 1.6% at both thresholds. Regarding the mean average precision (mAP), a relative increase of 0.3% and 0.5% (1% and 0.5%) is observed for VOC07 (KITTI12) using the same set of IoU thresholds. Also, S-OHEM is easy to integrate with existing region-based detectors and is capable of acting with post-recognition level regressors.
Tasks	Object Detection
Published	2017-05-05
URL	http://arxiv.org/abs/1705.02233v4
PDF	http://arxiv.org/pdf/1705.02233v4.pdf
PWC	https://paperswithcode.com/paper/s-ohem-stratified-online-hard-example-mining
Repo
Framework
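Plain OHEM ranks region proposals by their loss and backpropagates only through the hardest ones. The abstract's point is that the weighting between loss types should follow their changing influence during training rather than stay equal. The sketch below shows weighted hard-example selection with a hypothetical linear schedule that gradually shifts emphasis toward localization. The schedule, its endpoints, and the function names are assumptions for illustration, not S-OHEM's actual stratification; the non-maximum suppression step that OHEM usually applies before selection is also omitted.

```python
import numpy as np

def loss_weights(progress, start=(1.0, 1.0), end=(1.0, 2.0)):
    """Hypothetical schedule: move linearly from `start` to `end` (w_cls, w_loc)
    as training progress goes from 0 to 1."""
    p = min(max(progress, 0.0), 1.0)
    return tuple(s + p * (e - s) for s, e in zip(start, end))

def select_hard_examples(cls_loss, loc_loss, batch_size, progress=0.0):
    """Rank RoIs by a weighted sum of their loss components and keep the top `batch_size`."""
    w_cls, w_loc = loss_weights(progress)
    total = w_cls * np.asarray(cls_loss, dtype=float) + w_loc * np.asarray(loc_loss, dtype=float)
    order = np.argsort(-total, kind="stable")   # highest combined loss first
    return order[:batch_size]

cls_loss = [0.1, 0.9, 0.5]
loc_loss = [0.0, 0.0, 0.3]
early = select_hard_examples(cls_loss, loc_loss, 1, progress=0.0)
late = select_hard_examples(cls_loss, loc_loss, 1, progress=1.0)
```

In this toy case the RoI with the largest classification loss is chosen early in training, while the RoI with a sizeable localization loss wins once localization is weighted more heavily.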

Exploring Neural Transducers for End-to-End Speech Recognition


Title	Exploring Neural Transducers for End-to-End Speech Recognition
Authors	Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
Abstract	In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperform the best reported CTC models with a language model, on the popular Hub5’00 benchmark. On our internal diverse dataset, these trends continue - RNN-Transducer models rescored with a language model after beam search outperform our best CTC models. These results simplify the speech recognition pipeline so that decoding can now be expressed purely as neural network operations. We also study how the choice of encoder architecture affects the performance of the three models - when all encoder layers are forward only, and when encoders downsample the input representation aggressively.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published	2017-07-24
URL	http://arxiv.org/abs/1707.07413v1
PDF	http://arxiv.org/pdf/1707.07413v1.pdf
PWC	https://paperswithcode.com/paper/exploring-neural-transducers-for-end-to-end
Repo
Framework
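For readers new to CTC, the simplest decoder illustrates how frame-level outputs become a label sequence: take the best label per frame, merge consecutive repeats, then drop blanks. A blank between two identical labels is what allows genuine repeated characters to survive. This best-path (greedy) decoder is shown only as background; the paper's comparisons use beam search, and for RNN-Transducer models, language-model rescoring. Index 0 as the blank is an assumption.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path CTC decoding.

    log_probs: (T, V) per-frame scores over the vocabulary, blank included.
    Takes the argmax label per frame, merges consecutive repeats, then drops blanks.
    """
    best = np.argmax(log_probs, axis=1)
    out, prev = [], None
    for label in best:
        label = int(label)
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Frames: blank, a, a, blank, a, b, b, blank  ->  "a a b" (the blank separates the two a's).
frames = [0, 1, 1, 0, 1, 2, 2, 0]
decoded = ctc_greedy_decode(np.eye(3)[frames])
```

Because argmax is unaffected by a monotonic transform, the function works equally on probabilities or log-probabilities.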

Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?


Title	Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?
Authors	Cameron Musco, David P. Woodruff
Abstract	Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative-error rank-$k$ approximation to $K$ is at least as difficult as multiplying the input data matrix $A \in \mathbb{R}^{n \times d}$ by an arbitrary matrix $C \in \mathbb{R}^{d \times k}$. Barring a breakthrough in fast matrix multiplication, when $k$ is not too large, this requires $\Omega(nnz(A)k)$ time where $nnz(A)$ is the number of non-zeros in $A$. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16,MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to $O(nnz(A))$ input sparsity runtimes. At the same time there is hope: we show for the first time that $O(nnz(A))$ time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset.
Tasks
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01596v1
PDF	http://arxiv.org/pdf/1711.01596v1.pdf
PWC	https://paperswithcode.com/paper/is-input-sparsity-time-possible-for-kernel
Repo
Framework
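To ground what "a low-rank approximation $\tilde K$ of the kernel matrix" means in practice, here is the classic Nyström method for a Gaussian kernel: sample m landmark points, compute only the n x m kernel block C and the m x m block W among landmarks, and form C W^+ C^T, which has rank at most m. This is a standard baseline chosen for illustration; it is not the algorithm from the paper and, with uniform sampling, it does not by itself carry the relative-error guarantee the lower bound refers to. The bandwidth `sigma` and function names are assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2)) between rows of A and rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def nystrom(A, m, sigma=1.0, rng=None):
    """Nystrom approximation C W^+ C^T of the n x n kernel matrix, rank at most m.

    Landmarks are sampled uniformly without replacement.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(A.shape[0], size=m, replace=False)
    C = gaussian_kernel(A, A[idx], sigma)   # n x m block: only n*m kernel evaluations
    W = C[idx]                              # m x m block among the landmarks
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
A = 2.0 * rng.standard_normal((20, 3))
K = gaussian_kernel(A, A)
K_full = nystrom(A, 20, rng=0)   # all points as landmarks: recovers K exactly
K_low = nystrom(A, 5, rng=1)     # rank <= 5 approximation
```

When every point is a landmark, C W^+ C^T reduces to K K^+ K = K, which makes a convenient correctness check; with fewer landmarks the cost drops to O(n m d) kernel work at the price of approximation error.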

Virtual Sensor Modelling using Neural Networks with Coefficient-based Adaptive Weights and Biases Search Algorithm for Diesel Engines


Title	Virtual Sensor Modelling using Neural Networks with Coefficient-based Adaptive Weights and Biases Search Algorithm for Diesel Engines
Authors	Kushagra Rastogi, Navreet Saini
Abstract	With the explosion in the field of Big Data and the introduction of more stringent emission norms every three to five years, automotive companies must not only continue to enhance the fuel economy ratings of their products, but also provide valued services to their customers, such as delivering engine performance and health reports at regular intervals. A reasonable solution to both issues is installing a variety of sensors on the engine. Sensor data can be used to develop fuel economy features and will directly indicate engine performance. However, mounting a plethora of sensors is impractical in a very cost-sensitive industry. Virtual sensors can therefore replace physical sensors, reducing cost while still capturing essential engine data.
Tasks
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08319v1
PDF	http://arxiv.org/pdf/1712.08319v1.pdf
PWC	https://paperswithcode.com/paper/virtual-sensor-modelling-using-neural
Repo
Framework

A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging


Title	A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging
Authors	Quercus Hernandez, Diego Gutierrez, Adrian Jarabo
Abstract	Single-Photon Avalanche Diodes (SPADs) are affordable photodetectors capable of detecting extremely fast, low-energy events thanks to their single-photon sensitivity. This makes them very suitable for time-of-flight-based range imaging systems, allowing such systems to reduce costs and power requirements without sacrificing much temporal resolution. In this work we describe a computational model to simulate the behaviour of SPAD sensors, aiming to provide a realistic camera model for time-resolved light transport simulation, with applications in prototyping new reconstruction techniques based on SPAD time-of-flight data. Our model accounts for the major effects of the sensor on the incoming signal. We compare our model against real-world measurements, and apply it to a variety of scenarios, including complex multiply-scattered light transport.
Tasks
Published	2017-02-23
URL	http://arxiv.org/abs/1703.02635v1
PDF	http://arxiv.org/pdf/1703.02635v1.pdf
PWC	https://paperswithcode.com/paper/a-computational-model-of-a-single-photon
Repo
Framework
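As a rough picture of what "the effects of the sensor on the incoming signal" can mean, the sketch below builds a time-of-flight histogram over many laser pulses from a per-bin incident photon rate. The modeled effects are common SPAD effects chosen as assumptions: photon detection efficiency, Poisson photon arrivals, uniform dark counts, Gaussian timing jitter, and first-photon pile-up (only the earliest detection per laser cycle is recorded, which assumes the dead time outlasts the measurement window). The parameter values and the function name are hypothetical, and this is not the paper's model.

```python
import numpy as np

def simulate_spad_histogram(photon_rate, n_pulses, pde=0.3, jitter_bins=1.0,
                            dark_rate=1e-4, rng=None):
    """Toy SPAD time-of-flight histogram.

    photon_rate: expected incident photons per time bin per laser pulse (1-D array).
    pde: photon detection efficiency; dark_rate: dark counts per bin per pulse;
    jitter_bins: std of Gaussian timing jitter, in bins.
    """
    rng = np.random.default_rng(rng)
    rate = np.asarray(photon_rate, dtype=float) * pde + dark_rate
    n_bins = rate.size
    hist = np.zeros(n_bins, dtype=int)
    for _ in range(n_pulses):
        counts = rng.poisson(rate)          # detections per bin in this laser cycle
        fired = np.flatnonzero(counts)
        if fired.size == 0:
            continue
        # First-photon pile-up: only the earliest detection is registered per cycle.
        t = fired[0] + rng.normal(0.0, jitter_bins)
        b = int(round(t))
        if 0 <= b < n_bins:
            hist[b] += 1
    return hist

rate = np.zeros(100)
rate[40] = 0.5                              # a single return peak at bin 40
hist = simulate_spad_histogram(rate, 2000, rng=0)
```

Because each cycle records at most one event, strong early returns suppress later ones at high flux; that pile-up distortion is one reason a sensor model matters for reconstruction methods built on SPAD data.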

SemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and Querying Knowledge Graphs


Title	SemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and Querying Knowledge Graphs
Authors	Paul Cuddihy, Justin McHugh, Jenny Weisenberg Williams, Varish Mulwad, Kareem S. Aggour
Abstract	The relatively recent adoption of Knowledge Graphs as an enabling technology in multiple high-profile artificial intelligence and cognitive applications has led to growing interest in the Semantic Web technology stack. Many semantics-related tools, however, are focused on serving experts with a deep understanding of semantic technologies. For example, triplification of relational data is available, but there is no open-source tool that allows a user unfamiliar with OWL/RDF to import data into a semantic triple store in an intuitive manner. Further, many tools require users to have a working understanding of SPARQL to query data. Casual users interested in benefiting from the power of Knowledge Graphs have few tools available for exploring, querying, and managing semantic data. We present SemTK, the Semantics Toolkit, a user-friendly suite of tools that gives both expert and non-expert semantics users convenient ingestion of relational data, simplified query generation, and more. The exploration of ontologies and instance data is performed through SPARQLgraph, an intuitive web-based user interface in SemTK that a lay user can understand and navigate. The open-source version of SemTK is available at http://semtk.research.ge.com
Tasks	Knowledge Graphs
Published	2017-10-31
URL	http://arxiv.org/abs/1710.11531v2
PDF	http://arxiv.org/pdf/1710.11531v2.pdf
PWC	https://paperswithcode.com/paper/semtk-an-ontology-first-open-source-semantic
Repo
Framework
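The abstract mentions triplification of relational data and querying without hand-written SPARQL. The pure-Python sketch below only illustrates those two ideas: each table row becomes a subject resource keyed by its primary key, each other column becomes a predicate, and a query is a single triple pattern where `None` plays the role of a SPARQL variable. It is not SemTK's API or ingestion mapping; the `http://example.org/` URI scheme, the example table, and the function names are hypothetical.

```python
def triplify(rows, table, key, base="http://example.org/"):
    """Turn relational rows (dicts) into (subject, predicate, object) triples.

    Each row becomes a resource identified by its primary-key column; every other
    column becomes a property. The URI scheme under `base` is hypothetical.
    """
    triples = []
    for row in rows:
        subject = f"{base}{table}/{row[key]}"
        triples.append((subject, "rdf:type", f"{base}{table}"))
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"{base}{table}#{column}", value))
    return triples

def match(triples, s=None, p=None, o=None):
    """Single triple-pattern query; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]

rows = [{"id": "T1", "model": "GT-A", "site": "Plant 1"},
        {"id": "T2", "model": "GT-B", "site": "Plant 1"}]
triples = triplify(rows, "Turbine", "id")
at_plant1 = [s for s, _, _ in match(triples, p="http://example.org/Turbine#site", o="Plant 1")]
```

The pattern passed to `match` corresponds roughly to the SPARQL basic graph pattern `?s <http://example.org/Turbine#site> "Plant 1"`, the kind of query a graphical tool could generate on a user's behalf.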