Paper Group ANR 223
Towards Language-Universal End-to-End Speech Recognition
Title | Towards Language-Universal End-to-End Speech Recognition |
Authors | Suyoun Kim, Michael L. Seltzer |
Abstract | Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network’s internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems built with a multi-task learning approach. We also show that this model can be used to initialize a monolingual speech recognizer, and can be used to create a bilingual model for use in code-switching scenarios. |
Tasks | End-To-End Speech Recognition, Multi-Task Learning, Speech Recognition |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.02207v1 |
http://arxiv.org/pdf/1711.02207v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-language-universal-end-to-end-speech |
Repo | |
Framework | |
Metric Learning-based Generative Adversarial Network
Title | Metric Learning-based Generative Adversarial Network |
Authors | Zi-Yi Dou |
Abstract | Generative Adversarial Networks (GANs), as a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven to be powerful in a variety of tasks. However, training GANs is well known for being delicate and unstable, partially caused by its sigmoid cross entropy loss function for the discriminator. To overcome such a problem, many researchers directed their attention on various ways to measure how close the model distribution and real distribution are and have applied different metrics as their objective functions. In this paper, we propose a novel framework to train GANs based on distance metric learning and we call it Metric Learning-based Generative Adversarial Network (MLGAN). The discriminator of MLGANs can dynamically learn an appropriate metric, rather than a static one, to measure the distance between generated samples and real samples. Afterwards, MLGANs update the generator under the newly learned metric. We evaluate our approach on several representative datasets and the experimental results demonstrate that MLGANs can achieve superior performance compared with several existing state-of-the-art approaches. We also empirically show that MLGANs could increase the stability of training GANs. |
Tasks | Metric Learning |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02792v1 |
http://arxiv.org/pdf/1711.02792v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-learning-based-generative-adversarial |
Repo | |
Framework | |
Global Entity Ranking Across Multiple Languages
Title | Global Entity Ranking Across Multiple Languages |
Authors | Prantik Bhattacharyya, Nemanja Spasojevic |
Abstract | We present work on building a global long-tailed ranking of entities across multiple languages using Wikipedia and Freebase knowledge bases. We identify multiple features and build a model to rank entities using a ground-truth dataset of more than 10 thousand labels. The final system ranks 27 million entities with 75% precision and 48% F1 score. We provide performance evaluation and empirical evidence of the quality of ranking across languages, and open the final ranked lists for future research. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.06108v1 |
http://arxiv.org/pdf/1703.06108v1.pdf | |
PWC | https://paperswithcode.com/paper/global-entity-ranking-across-multiple |
Repo | |
Framework | |
Stopping GAN Violence: Generative Unadversarial Networks
Title | Stopping GAN Violence: Generative Unadversarial Networks |
Authors | Samuel Albanie, Sébastien Ehrhardt, João F. Henriques |
Abstract | While the costs of human violence have attracted a great deal of attention from the research community, the effects of the network-on-network (NoN) violence popularised by Generative Adversarial Networks have yet to be addressed. In this work, we quantify the financial, social, spiritual, cultural, grammatical and dermatological impact of this aggression and address the issue by proposing a more peaceful approach which we term Generative Unadversarial Networks (GUNs). Under this framework, we simultaneously train two models: a generator G that does its best to capture whichever data distribution it feels it can manage, and a motivator M that helps G to achieve its dream. Fighting is strictly verboten and both models evolve by learning to respect their differences. The framework is both theoretically and electrically grounded in game theory, and can be viewed as a winner-shares-all two-player game in which both players work as a team to achieve the best score. Experiments show that by working in harmony, the proposed model is able to claim both the moral and log-likelihood high ground. Our work builds on a rich history of carefully argued position-papers, published as anonymous YouTube comments, which prove that the optimal solution to NoN violence is more GUNs. |
Tasks | |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02528v1 |
http://arxiv.org/pdf/1703.02528v1.pdf | |
PWC | https://paperswithcode.com/paper/stopping-gan-violence-generative |
Repo | |
Framework | |
Application of Fuzzy Logic in Design of Smart Washing Machine
Title | Application of Fuzzy Logic in Design of Smart Washing Machine |
Authors | Rao Farhat Masood |
Abstract | The washing machine is a great domestic necessity, as it frees us from the burden of washing our clothes and saves us ample time. This paper covers the design and development of a Fuzzy Logic based Smart Washing Machine. The regular (timer-based) washing machine makes use of a multi-turned timer-based start-stop mechanism, which is mechanical and prone to breakage. In addition to their starting and stopping issues, mechanical timers are not efficient with respect to maintenance and electricity usage. Recent developments have shown that integrating digital electronics to optimize the operation of this machine is possible and is nowadays in practice. A number of internationally renowned companies have developed such machines with the introduction of smart artificial intelligence. Such a machine makes use of sensors and smartly calculates the amount of run-time (washing time) for the main machine motor. Real-time calculations and processes are also used to optimize the run-time of the machine. The obvious result is smart time management, better economy of electricity and efficiency of work. This paper deals with the indigenization of an FLC (Fuzzy Logic Controller) based Washing Machine, which is capable of automating the inputs and getting the desired output (wash-time). |
Tasks | |
Published | 2017-01-04 |
URL | http://arxiv.org/abs/1701.01654v2 |
http://arxiv.org/pdf/1701.01654v2.pdf | |
PWC | https://paperswithcode.com/paper/application-of-fuzzy-logic-in-design-of-smart |
Repo | |
Framework | |
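The abstract above describes a controller that maps sensor inputs to a wash time but does not list its inputs or rules. The following is a minimal sketch of how such a fuzzy logic controller could work, not the paper's design: the two inputs (`dirt` and `load`, each on a 0 to 100 scale), the triangular membership functions, the rule table and the wash times in minutes are all hypothetical, and defuzzification uses a simple weighted average of rule outputs.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x):
    """Map a 0..100 reading to degrees of 'low', 'medium' and 'high' (hypothetical sets)."""
    x = max(0.0, min(100.0, float(x)))
    return {
        "low": tri(x, -50.0, 0.0, 50.0),
        "medium": tri(x, 0.0, 50.0, 100.0),
        "high": tri(x, 50.0, 100.0, 150.0),
    }


# Hypothetical rule table: (dirt level, load size) -> wash time in minutes.
RULES = {
    ("low", "low"): 10.0, ("low", "medium"): 15.0, ("low", "high"): 25.0,
    ("medium", "low"): 20.0, ("medium", "medium"): 30.0, ("medium", "high"): 40.0,
    ("high", "low"): 35.0, ("high", "medium"): 45.0, ("high", "high"): 60.0,
}


def wash_time(dirt, load):
    """Fire every rule with min() as fuzzy AND, then take the weighted average."""
    d, l = fuzzify(dirt), fuzzify(load)
    num = den = 0.0
    for (d_label, l_label), minutes in RULES.items():
        strength = min(d[d_label], l[l_label])
        num += strength * minutes
        den += strength
    return num / den


print(wash_time(25, 50))  # 22.5
```

With these sets, a reading halfway between "low" and "medium" dirt blends the two matching rules equally, which is why the example returns the midpoint of 15 and 30 minutes.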
Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR
Title | Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR |
Authors | Dan Lim |
Abstract | This thesis introduces the sequence to sequence model with Luong’s attention mechanism for end-to-end ASR. It also describes various neural network algorithms including Batch normalization, Dropout and Residual network which constitute the convolutional attention-based seq2seq neural network. Finally the proposed model proved its effectiveness for speech recognition achieving 15.8% phoneme error rate on TIMIT dataset. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04515v1 |
http://arxiv.org/pdf/1710.04515v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-attention-based-seq2seq-neural |
Repo | |
Framework | |
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Title | Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems |
Authors | Yonatan Belinkov, James Glass |
Abstract | Neural models have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04482v1 |
http://arxiv.org/pdf/1709.04482v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-hidden-representations-in-end-to |
Repo | |
Framework | |
Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case
Title | Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case |
Authors | Faraz Faghri, Sayed Hadi Hashemi, Mohammad Babaeizadeh, Mike A. Nalls, Saurabh Sinha, Roy H. Campbell |
Abstract | In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of “optimize the common case”. |
Tasks | Dimensionality Reduction |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1710.00112v1 |
http://arxiv.org/pdf/1710.00112v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-scalable-machine-learning-and-data |
Repo | |
Framework | |
Automated Surgical Skill Assessment in RMIS Training
Title | Automated Surgical Skill Assessment in RMIS Training |
Authors | Aneeq Zia, Irfan Essa |
Abstract | Purpose: Manual feedback in basic RMIS training can consume a significant amount of time from expert surgeons’ schedules and is prone to subjectivity. While VR-based training tasks can generate automated score reports, there is no mechanism of generating automated feedback for surgeons performing basic surgical tasks in RMIS training. In this paper, we explore the usage of different holistic features for automated skill assessment using only robot kinematic data and propose a weighted feature fusion technique for improving score prediction performance. Methods: We perform our experiments on the publicly available JIGSAWS dataset and evaluate four different types of holistic features from robot kinematic data - Sequential Motion Texture (SMT), Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT) and Approximate Entropy (ApEn). The features are then used for skill classification and exact skill score prediction. Along with using these features individually, we also evaluate the performance using our proposed weighted combination technique. Results: Our results demonstrate that these holistic features outperform all previous HMM-based state-of-the-art methods for skill classification on the JIGSAWS dataset. Also, our proposed feature fusion strategy significantly improves performance for skill score predictions, achieving up to 0.61 average Spearman correlation coefficient. Conclusions: Holistic features capturing global information from robot kinematic data can successfully be used for evaluating surgeon skill in basic surgical tasks on the da Vinci robot. Using the framework presented can potentially allow for real time score feedback in RMIS training. |
Tasks | |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08604v1 |
http://arxiv.org/pdf/1712.08604v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-surgical-skill-assessment-in-rmis |
Repo | |
Framework | |
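The score-prediction result above is reported as a Spearman correlation coefficient. As a reminder of what that metric measures, here is a minimal sketch that computes it as the Pearson correlation of average ranks. The function names and the example scores are illustrative assumptions and are not taken from the paper or the JIGSAWS dataset.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up predicted and expert scores, for illustration only.
predicted = [1, 2, 3, 4, 5]
expert = [2, 1, 4, 3, 5]
print(spearman(predicted, expert))  # 0.8
```

Because only ranks matter, a predictor that orders trainees correctly scores 1.0 even if its absolute scores are miscalibrated.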
S-OHEM: Stratified Online Hard Example Mining for Object Detection
Title | S-OHEM: Stratified Online Hard Example Mining for Object Detection |
Authors | Minne Li, Zhaoning Zhang, Hao Yu, Xinyuan Chen, Dongsheng Li |
Abstract | One of the major challenges in object detection is to propose detectors with highly accurate localization of objects. The online sampling of high-loss region proposals (hard examples) uses the multitask loss with equal weight settings across all loss types (e.g., classification and localization, rigid and non-rigid categories) and ignores the influence of different loss distributions throughout the training process, which we find essential to the training efficacy. In this paper, we present the Stratified Online Hard Example Mining (S-OHEM) algorithm for training detectors with higher efficiency and accuracy. S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling technique, to choose the training examples according to this influence during hard example mining, and thus enhance the performance of object detectors. We show through systematic experiments that S-OHEM yields an average precision (AP) improvement of 0.5% on rigid categories of PASCAL VOC 2007 for both the IoU thresholds of 0.6 and 0.7. For KITTI 2012, both results of the same metric are 1.6%. Regarding the mean average precision (mAP), a relative increase of 0.3% and 0.5% (1% and 0.5%) is observed for VOC07 (KITTI12) using the same set of IoU thresholds. Also, S-OHEM is easy to integrate with existing region-based detectors and is capable of acting with post-recognition level regressors. |
Tasks | Object Detection |
Published | 2017-05-05 |
URL | http://arxiv.org/abs/1705.02233v4 |
http://arxiv.org/pdf/1705.02233v4.pdf | |
PWC | https://paperswithcode.com/paper/s-ohem-stratified-online-hard-example-mining |
Repo | |
Framework | |
Exploring Neural Transducers for End-to-End Speech Recognition
Title | Exploring Neural Transducers for End-to-End Speech Recognition |
Authors | Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu |
Abstract | In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperform the best reported CTC models with a language model, on the popular Hub5’00 benchmark. On our internal diverse dataset, these trends continue - RNN-Transducer models rescored with a language model after beam search outperform our best CTC models. These results simplify the speech recognition pipeline so that decoding can now be expressed purely as neural network operations. We also study how the choice of encoder architecture affects the performance of the three models - when all encoder layers are forward only, and when encoders downsample the input representation aggressively. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07413v1 |
http://arxiv.org/pdf/1707.07413v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-neural-transducers-for-end-to-end |
Repo | |
Framework | |
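The comparison above involves CTC models, whose per-frame outputs have to be collapsed into a label sequence. A minimal sketch of the standard CTC collapsing rule (merge consecutive repeats, then drop blanks) applied to a greedy per-frame path is shown below. It illustrates CTC in general and is not the decoder used in the paper; the function name and the choice of `0` as the blank index are assumptions.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label path: merge consecutive repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


# A blank between two identical labels keeps them as separate outputs.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

The blank symbol is what lets CTC emit the same label twice in a row, which the example shows with the two separate `1` outputs.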
Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?
Title | Is Input Sparsity Time Possible for Kernel Low-Rank Approximation? |
Authors | Cameron Musco, David P. Woodruff |
Abstract | Low-rank approximation is a common tool used to accelerate kernel methods: the $n \times n$ kernel matrix $K$ is approximated via a rank-$k$ matrix $\tilde K$ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation. We show that for a broad class of kernels, including the popular Gaussian and polynomial kernels, computing a relative error $k$-rank approximation to $K$ is at least as difficult as multiplying the input data matrix $A \in \mathbb{R}^{n \times d}$ by an arbitrary matrix $C \in \mathbb{R}^{d \times k}$. Barring a breakthrough in fast matrix multiplication, when $k$ is not too large, this requires $\Omega(nnz(A)k)$ time where $nnz(A)$ is the number of non-zeros in $A$. This lower bound matches, in many parameter regimes, recent work on subquadratic time algorithms for low-rank approximation of general kernels [MM16,MW17], demonstrating that these algorithms are unlikely to be significantly improved, in particular to $O(nnz(A))$ input sparsity runtimes. At the same time there is hope: we show for the first time that $O(nnz(A))$ time approximation is possible for general radial basis function kernels (e.g., the Gaussian kernel) for the closely related problem of low-rank approximation of the kernelized dataset. |
Tasks | |
Published | 2017-11-05 |
URL | http://arxiv.org/abs/1711.01596v1 |
http://arxiv.org/pdf/1711.01596v1.pdf | |
PWC | https://paperswithcode.com/paper/is-input-sparsity-time-possible-for-kernel |
Repo | |
Framework | |
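The lower bound above is stated in terms of the cost of multiplying the data matrix $A$ by a $d \times k$ matrix $C$, which takes $nnz(A) \cdot k$ scalar multiplications with the straightforward sparse algorithm. The sketch below illustrates that count only; it is not an algorithm from the paper. The coordinate-list representation of $A$ and the function name are assumptions made for the example.

```python
def sparse_times_dense(entries, n, C):
    """Multiply a sparse n x d matrix A by a dense d x k matrix C.

    entries: list of (row, col, value) triples for the non-zeros of A.
    Returns the dense n x k product and the number of scalar multiplications.
    """
    k = len(C[0])
    out = [[0.0] * k for _ in range(n)]
    mults = 0
    for i, j, v in entries:
        c_row = C[j]
        out_row = out[i]
        # Each non-zero of A touches one full row of C: k multiplications.
        for t in range(k):
            out_row[t] += v * c_row[t]
        mults += k
    return out, mults


# A = [[1, 0, 2], [0, 0, 3]] has nnz(A) = 3; C is 3 x 2, so k = 2.
A_entries = [(0, 0, 1.0), (0, 2, 2.0), (1, 2, 3.0)]
C = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
product, mults = sparse_times_dense(A_entries, 2, C)
print(product)  # [[11.0, 14.0], [15.0, 18.0]]
print(mults)    # 6, i.e. nnz(A) * k
```

An input-sparsity-time algorithm would instead need work proportional to $nnz(A)$ alone, without the factor of $k$.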
Virtual Sensor Modelling using Neural Networks with Coefficient-based Adaptive Weights and Biases Search Algorithm for Diesel Engines
Title | Virtual Sensor Modelling using Neural Networks with Coefficient-based Adaptive Weights and Biases Search Algorithm for Diesel Engines |
Authors | Kushagra Rastogi, Navreet Saini |
Abstract | With the explosion in the field of Big Data and introduction of more stringent emission norms every three to five years, automotive companies must not only continue to enhance the fuel economy ratings of their products, but also provide valued services to their customers such as delivering engine performance and health reports at regular intervals. A reasonable solution to both issues is installing a variety of sensors on the engine. Sensor data can be used to develop fuel economy features and will directly indicate engine performance. However, mounting a plethora of sensors is impractical in a very cost-sensitive industry. Thus, virtual sensors can replace physical sensors by reducing cost while capturing essential engine data. |
Tasks | |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08319v1 |
http://arxiv.org/pdf/1712.08319v1.pdf | |
PWC | https://paperswithcode.com/paper/virtual-sensor-modelling-using-neural |
Repo | |
Framework | |
A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging
Title | A Computational Model of a Single-Photon Avalanche Diode Sensor for Transient Imaging |
Authors | Quercus Hernandez, Diego Gutierrez, Adrian Jarabo |
Abstract | Single-Photon Avalanche Diodes (SPAD) are affordable photodetectors, capable of collecting extremely fast low-energy events, due to their single-photon sensitivity. This makes them very suitable for time-of-flight-based range imaging systems, allowing costs and power requirements to be reduced without sacrificing much temporal resolution. In this work we describe a computational model to simulate the behaviour of SPAD sensors, aiming to provide a realistic camera model for time-resolved light transport simulation, with applications in prototyping new reconstruction techniques based on SPAD time-of-flight data. Our model accounts for the major effects of the sensor on the incoming signal. We compare our model against real-world measurements, and apply it to a variety of scenarios, including complex multiply-scattered light transport. |
Tasks | |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1703.02635v1 |
http://arxiv.org/pdf/1703.02635v1.pdf | |
PWC | https://paperswithcode.com/paper/a-computational-model-of-a-single-photon |
Repo | |
Framework | |
SemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and Querying Knowledge Graphs
Title | SemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and Querying Knowledge Graphs |
Authors | Paul Cuddihy, Justin McHugh, Jenny Weisenberg Williams, Varish Mulwad, Kareem S. Aggour |
Abstract | The relatively recent adoption of Knowledge Graphs as an enabling technology in multiple high-profile artificial intelligence and cognitive applications has led to growing interest in the Semantic Web technology stack. Many semantics-related tools, however, are focused on serving experts with a deep understanding of semantic technologies. For example, triplification of relational data is available but there is no open source tool that allows a user unfamiliar with OWL/RDF to import data into a semantic triple store in an intuitive manner. Further, many tools require users to have a working understanding of SPARQL to query data. Casual users interested in benefiting from the power of Knowledge Graphs have few tools available for exploring, querying, and managing semantic data. We present SemTK, the Semantics Toolkit, a user-friendly suite of tools that allow both expert and non-expert semantics users convenient ingestion of relational data, simplified query generation, and more. The exploration of ontologies and instance data is performed through SPARQLgraph, an intuitive web-based user interface in SemTK understandable and navigable by a lay user. The open source version of SemTK is available at http://semtk.research.ge.com |
Tasks | Knowledge Graphs |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11531v2 |
http://arxiv.org/pdf/1710.11531v2.pdf | |
PWC | https://paperswithcode.com/paper/semtk-an-ontology-first-open-source-semantic |
Repo | |
Framework | |