Paper Group ANR 38
Towards Real-time Mispronunciation Detection in Kids’ Speech
Title | Towards Real-time Mispronunciation Detection in Kids’ Speech |
Authors | Peter Plantinga, Eric Fosler-Lussier |
Abstract | Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for their ability to run in real time, an important factor in applications that provide rapid feedback. In particular, the state of the art uses bi-directional recurrent networks, where a uni-directional network may be more appropriate. Teacher-student learning is a natural way to improve a uni-directional model, but with a CTC objective it is limited by poor alignment of outputs to evidence. We address this limitation by trying two loss terms for improving the alignments of our models. One is an “alignment loss” term that encourages outputs only when features do not resemble silence. The other uses a uni-directional model as a teacher model to align the bi-directional model. Our proposed model uses these aligned bi-directional models as teacher models. Experiments on the CSLU kids’ corpus show that these changes decrease the latency of the outputs and improve the detection rates, with a trade-off between these goals. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01765v1 |
https://arxiv.org/pdf/2003.01765v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-real-time-mispronunciation-detection |
Repo | |
Framework | |
The Operating System of the Neuromorphic BrainScaleS-1 System
Title | The Operating System of the Neuromorphic BrainScaleS-1 System |
Authors | Eric Müller, Sebastian Schmitt, Christian Mauch, Sebastian Billaudelle, Andreas Grübl, Maurice Güttler, Dan Husmann, Joscha Ilmberger, Sebastian Jeltsch, Jakob Kaiser, Johann Klähn, Mitja Kleider, Christoph Koke, José Montes, Paul Müller, Johannes Partzsch, Felix Passenberg, Hartmut Schmidt, Bernhard Vogginger, Jonas Weidner, Christian Mayr, Johannes Schemmel |
Abstract | BrainScaleS-1 is a wafer-scale mixed-signal accelerated neuromorphic system targeted for research in the fields of computational neuroscience and beyond-von-Neumann computing. The BrainScaleS Operating System (BrainScaleS OS) is a software stack that lets users emulate networks described in the high-level network description language PyNN with minimal knowledge of the system. At the same time, expert usage is facilitated by allowing users to hook into the system at any depth of the stack. We present operation and development methodologies implemented for the BrainScaleS-1 neuromorphic architecture and walk through the individual components of BrainScaleS OS constituting the software stack for BrainScaleS-1 platform operation. |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13749v1 |
https://arxiv.org/pdf/2003.13749v1.pdf | |
PWC | https://paperswithcode.com/paper/the-operating-system-of-the-neuromorphic |
Repo | |
Framework | |
Lipreading using Temporal Convolutional Networks
Title | Lipreading using Temporal Convolutional Networks |
Authors | Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic |
Abstract | Lip-reading has attracted a lot of research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in-the-wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers. In this work, we address the limitations of this model and propose changes which further improve its performance. Firstly, the BGRU layers are replaced with Temporal Convolutional Networks (TCN). Secondly, we greatly simplify the training procedure, which allows us to train the model in a single stage. Thirdly, we show that the current state-of-the-art methodology produces models that do not generalize well to variations in sequence length, and we address this issue by proposing a variable-length augmentation. We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively. Our proposed model yields absolute improvements of 1.2% and 3.2%, respectively, on these datasets, setting a new state of the art. |
Tasks | Lipreading |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08702v1 |
https://arxiv.org/pdf/2001.08702v1.pdf | |
PWC | https://paperswithcode.com/paper/lipreading-using-temporal-convolutional |
Repo | |
Framework | |
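The variable-length augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the procedure, so this assumes the augmentation keeps a random contiguous window of each training clip; the function name `variable_length_crop` and the `min_len` parameter are hypothetical.

```python
import random


def variable_length_crop(frames, min_len, rng=random):
    """Return a random contiguous sub-sequence of `frames`.

    Assumption (not from the abstract): the augmentation keeps a window
    whose length is drawn uniformly from [min_len, len(frames)].
    """
    n = len(frames)
    if not 1 <= min_len <= n:
        raise ValueError("min_len must be between 1 and len(frames)")
    length = rng.randint(min_len, n)       # inclusive on both ends
    start = rng.randint(0, n - length)     # window must fit inside the clip
    return frames[start:start + length]


if __name__ == "__main__":
    clip = list(range(29))                 # stand-in for 29 video frames
    print(variable_length_crop(clip, 20, random.Random(0)))
```

Training on clips of varying length is one plausible way to make a temporal model less sensitive to the fixed clip length of the training set.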
Learning Similarity Metrics for Numerical Simulations
Title | Learning Similarity Metrics for Numerical Simulations |
Authors | Georg Kohl, Kiwon Um, Nils Thuerey |
Abstract | We propose a neural network-based approach that computes a stable and generalizing metric (LSiM), to compare field data from a variety of numerical simulation sources. Our method employs a Siamese network architecture that is motivated by the mathematical properties of a metric. We leverage a controllable data generation setup with partial differential equation (PDE) solvers to create increasingly different outputs from a reference simulation in a controlled environment. A central component of our learned metric is a specialized loss function that introduces knowledge about the correlation between single data samples into the training process. To demonstrate that the proposed approach outperforms existing simple metrics for vector spaces and other learned, image-based metrics, we evaluate the different methods on a large range of test data. Additionally, we analyze benefits for generalization and the impact of an adjustable training data difficulty. The robustness of LSiM is demonstrated via an evaluation on three real-world data sets. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07863v1 |
https://arxiv.org/pdf/2002.07863v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-similarity-metrics-for-numerical-1 |
Repo | |
Framework | |
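To make the link between a Siamese architecture and metric properties concrete, here is a minimal sketch. It is not the LSiM network: `embed` is a hypothetical hand-written feature extractor standing in for the learned, weight-shared branch. Because both inputs pass through the same function and the outputs are compared with a Euclidean distance, the result is non-negative, symmetric, and satisfies the triangle inequality by construction.

```python
import math


def embed(field):
    """Hypothetical shared feature extractor: (mean, RMS) of a flat field."""
    n = len(field)
    mean = sum(field) / n
    rms = math.sqrt(sum(v * v for v in field) / n)
    return (mean, rms)


def siamese_distance(a, b):
    """Euclidean distance between the shared embeddings of two fields."""
    fa, fb = embed(a), embed(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0]
    b = [1.0, 1.0, 1.0, 1.0]
    print(siamese_distance(a, b), siamese_distance(b, a))
```

Strictly, this construction yields a pseudo-metric: two different fields can map to the same embedding and therefore have distance zero.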
Non-dimensional Star-Identification
Title | Non-dimensional Star-Identification |
Authors | Carl Leake, David Arnas, Daniele Mortari |
Abstract | This study introduces a new “Non-Dimensional” star identification algorithm to reliably identify the stars observed by a wide field-of-view star tracker when the focal length and optical axis offset values are known with poor accuracy. This algorithm is particularly suited to complement nominal lost-in-space algorithms when they fail the star identification due to focal length and/or optical axis offset deviations from their nominal operational ranges. These deviations may be caused, for example, by launch vibrations or thermal variations in orbit. The algorithm performance is compared in terms of accuracy, speed, and robustness to the Pyramid algorithm. These comparisons highlight the clear advantages that a combined approach of these methodologies provides. |
Tasks | |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13736v1 |
https://arxiv.org/pdf/2003.13736v1.pdf | |
PWC | https://paperswithcode.com/paper/non-dimensional-star-identification |
Repo | |
Framework | |
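The abstract does not say which quantity the algorithm matches on. As an illustration only, the sketch below assumes the invariant is the set of interior angles of a triangle formed by three star image points. Under a simple pinhole model, a focal length error acts as a uniform scaling of image-plane coordinates and an optical axis offset acts as a translation, and triangle interior angles are unchanged by both. The helper names are hypothetical.

```python
import math


def interior_angles(p1, p2, p3):
    """Interior angles (radians) of the planar triangle p1-p2-p3, ascending."""
    def angle_at(a, b, c):
        # Angle at vertex a between the rays a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return abs(math.atan2(cross, dot))
    return sorted([angle_at(p1, p2, p3), angle_at(p2, p3, p1), angle_at(p3, p1, p2)])


def scale_and_shift(p, scale, offset):
    """Model a focal length error (scale) and an optical axis offset (shift)."""
    return (scale * p[0] + offset[0], scale * p[1] + offset[1])


if __name__ == "__main__":
    stars = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    moved = [scale_and_shift(p, 1.07, (12.5, -3.0)) for p in stars]
    print(interior_angles(*stars))
    print(interior_angles(*moved))
```

Both printed triples agree up to floating-point rounding, which is why angle-based features need no accurate focal length or offset calibration in this simplified model.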
Improving Training on Noisy Stuctured Labels
Title | Improving Training on Noisy Stuctured Labels |
Authors | Abubakar Abid, James Zou |
Abstract | Fine-grained annotations (e.g. dense image labels, image segmentation and text tagging) are useful in many ML applications but they are labor-intensive to generate. Moreover, there are often systematic, structured errors in these fine-grained annotations. For example, a car might be entirely unannotated in the image, or the boundary between a car and street might only be coarsely annotated. Standard ML training on data with such structured errors produces models with biases and poor performance. In this work, we propose a novel framework of Error-Correcting Networks (ECN) to address the challenge of learning in the presence of structured errors in fine-grained annotations. Given a large noisy dataset with commonly occurring structured errors, and a much smaller dataset with more accurate annotations, ECN is able to substantially improve the prediction of fine-grained annotations compared to standard approaches for training on noisy data. It does so by learning to leverage the structures in the annotations and in the noisy labels. Systematic experiments on image segmentation and text tagging demonstrate the strong performance of ECN in improving training on noisy structured labels. |
Tasks | Semantic Segmentation |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03862v1 |
https://arxiv.org/pdf/2003.03862v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-training-on-noisy-stuctured-labels |
Repo | |
Framework | |
Disentangling Image Distortions in Deep Feature Space
Title | Disentangling Image Distortions in Deep Feature Space |
Authors | Simone Bianco, Luigi Celona, Paolo Napoletano, Raimondo Schettini |
Abstract | Previous literature suggests that perceptual similarity is an emergent property shared across deep visual representations. Experiments conducted on a dataset of human-judged image distortions have shown that deep features outperform classic perceptual metrics by a large margin. In this work we take a further step towards a broader understanding of this property by analyzing the capability of deep visual representations to intrinsically characterize different types of image distortions. To this end, we first generate a number of synthetically distorted images by applying three mainstream distortion types to the LIVE database, and then we analyze the features extracted by different layers of different deep network architectures. We observe that a dimension-reduced representation of the features extracted from a given layer makes it possible to efficiently separate types of distortions in the feature space. Moreover, each network layer exhibits a different ability to separate different types of distortions, and this ability varies according to the network architecture. As a further analysis, we evaluate the use of features taken from the layer that best separates image distortions for: i) reduced-reference image quality assessment, and ii) characterization of distortion types and severity levels on both single and multiple distortion databases. Results achieved on both tasks suggest that deep visual representations can be employed without supervision to efficiently characterize various image distortions. |
Tasks | Image Quality Assessment |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11409v1 |
https://arxiv.org/pdf/2002.11409v1.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-image-distortions-in-deep |
Repo | |
Framework | |
Deep No-reference Tone Mapped Image Quality Assessment
Title | Deep No-reference Tone Mapped Image Quality Assessment |
Authors | Chandra Sekhar Ravuri, Rajesh Sureddi, Sathya Veera Reddy Dendi, Shanmuganathan Raman, Sumohana S. Channappayya |
Abstract | The process of rendering high dynamic range (HDR) images to be viewed on conventional displays is called tone mapping. However, tone mapping introduces distortions in the final image which may lead to visual displeasure. To quantify these distortions, we introduce a novel no-reference quality assessment technique for these tone mapped images. This technique is composed of two stages. In the first stage, we employ a convolutional neural network (CNN) to generate quality aware maps (also known as distortion maps) from tone mapped images by training it with the ground truth distortion maps. In the second stage, we model the normalized image and distortion maps using an Asymmetric Generalized Gaussian Distribution (AGGD). The parameters of the AGGD model are then used to estimate the quality score using support vector regression (SVR). We show that the proposed technique delivers competitive performance relative to the state-of-the-art techniques. The novelty of this work is its ability to visualize various distortions as quality maps (distortion maps), especially in the no-reference setting, and to use these maps as features to estimate the quality score of tone mapped images. |
Tasks | Image Quality Assessment |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03165v1 |
https://arxiv.org/pdf/2002.03165v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-no-reference-tone-mapped-image-quality |
Repo | |
Framework | |
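The second stage described in the abstract fits an AGGD and uses its parameters as features. The abstract does not state how the fit is done; the sketch below assumes the moment-matching estimator commonly used in no-reference quality models such as BRISQUE, with a grid search over the shape parameter. The function names and the grid range are illustrative choices, not taken from the paper.

```python
import math
import random


def rho(alpha):
    """Ratio Gamma(2/a)^2 / (Gamma(1/a) * Gamma(3/a)) used for moment matching."""
    return math.gamma(2 / alpha) ** 2 / (math.gamma(1 / alpha) * math.gamma(3 / alpha))


def fit_aggd(samples):
    """Estimate (shape, left sigma, right sigma) of a zero-mode AGGD."""
    left = [x for x in samples if x < 0]
    right = [x for x in samples if x >= 0]
    if not left or not right:
        raise ValueError("need samples on both sides of zero")
    sigma_l = math.sqrt(sum(x * x for x in left) / len(left))
    sigma_r = math.sqrt(sum(x * x for x in right) / len(right))
    g = sigma_l / sigma_r
    n = len(samples)
    r_hat = (sum(abs(x) for x in samples) / n) ** 2 / (sum(x * x for x in samples) / n)
    big_r = r_hat * (g ** 3 + 1) * (g + 1) / (g ** 2 + 1) ** 2
    # Grid search over shape values 0.2 .. 10.0 (range is an assumption).
    grid = [0.2 + 0.001 * k for k in range(9801)]
    alpha = min(grid, key=lambda a: abs(rho(a) - big_r))
    return alpha, sigma_l, sigma_r


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(fit_aggd(data))   # shape should come out close to 2 for Gaussian data
```

In a pipeline like the one described, the fitted parameters from the normalized image and the distortion maps would be concatenated into a feature vector and passed to a regressor such as SVR.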
AlignSeg: Feature-Aligned Segmentation Networks
Title | AlignSeg: Feature-Aligned Segmentation Networks |
Authors | Zilong Huang, Yunchao Wei, Xinggang Wang, Honghui Shi, Wenyu Liu, Thomas S. Huang |
Abstract | Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation. However, most of the current popular network architectures tend to ignore the misalignment issues during the feature aggregation process caused by 1) step-by-step downsampling operations, and 2) indiscriminate contextual information fusion. In this paper, we explore the principles in addressing such feature misalignment issues and inventively propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment issue caused by multiresolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to choose private custom contextual information in an adaptive manner, making the contextual embeddings aligned better to provide appropriate guidance. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6% and 45.95%, respectively. Our source code will be made available. |
Tasks | Semantic Segmentation |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2003.00872v1 |
https://arxiv.org/pdf/2003.00872v1.pdf | |
PWC | https://paperswithcode.com/paper/alignseg-feature-aligned-segmentation |
Repo | |
Framework | |
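The idea of resampling features at learned per-pixel offsets, as attributed to AlignFA in the abstract, can be sketched with plain bilinear interpolation. This is not the AlignSeg implementation: in the paper the offsets would be predicted by a learned layer, whereas here they are simply passed in, and `bilinear_sample` and `warp` are hypothetical names.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2-D list `feat` at fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0][x0] + wx * feat[y0][x1]
    bottom = (1 - wx) * feat[y1][x0] + wx * feat[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp(feat, offsets):
    """Resample `feat` so output pixel (i, j) reads from (i + dy, j + dx).

    `offsets[i][j]` is a (dy, dx) pair; here it is given, not learned.
    """
    h, w = len(feat), len(feat[0])
    return [
        [bilinear_sample(feat, i + offsets[i][j][0], j + offsets[i][j][1]) for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    feat = [[0.0, 1.0], [2.0, 3.0]]
    zero = [[(0.0, 0.0)] * 2 for _ in range(2)]
    print(warp(feat, zero))                  # unchanged
    print(bilinear_sample(feat, 0.5, 0.5))   # average of the four corners
```

Because bilinear interpolation is differentiable with respect to the sampling position, offsets of this kind can be trained by gradient descent in a real network.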
Metric Learning for Ordered Labeled Trees with pq-grams
Title | Metric Learning for Ordered Labeled Trees with pq-grams |
Authors | Hikaru Shindo, Masaaki Nishino, Yasuaki Kobayashi, Akihiro Yamamoto |
Abstract | Computing the similarity between two data points plays a vital role in many machine learning algorithms. Metric learning aims to learn a good metric automatically from data. Most existing studies on metric learning for tree-structured data have adopted the approach of learning the tree edit distance. However, the edit distance is not amenable to big data analysis because of its high computational cost. In this paper, we propose a new metric learning approach for tree-structured data with pq-grams. The pq-gram distance is a distance for ordered labeled trees, and has a much lower computational cost than the tree edit distance. In order to perform metric learning based on pq-grams, we propose a new differentiable parameterized distance, the weighted pq-gram distance. We also propose a way to learn the proposed distance based on Large Margin Nearest Neighbors (LMNN), which is a well-studied and practical metric learning scheme. We formulate the metric learning problem as an optimization problem and use gradient descent to perform metric learning. We empirically show that the proposed approach not only achieves results competitive with state-of-the-art edit distance-based methods in various classification problems, but also solves the classification problems much more rapidly than the edit distance-based methods. |
Tasks | Metric Learning |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.03960v1 |
https://arxiv.org/pdf/2003.03960v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-learning-for-ordered-labeled-trees |
Repo | |
Framework | |
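For readers unfamiliar with pq-grams, the following sketch computes the plain (unweighted) pq-gram distance between two ordered labeled trees. It follows the usual profile construction: pad the tree with dummy `*` nodes, collect every stem of p ancestors joined to q consecutive children, then compare the two bags. The weighted, learnable variant proposed in the paper is not implemented here; the tree encoding `(label, [children])` is an assumption made for this example.

```python
from collections import Counter

STAR = "*"  # dummy label used to pad the tree


def pq_profile(tree, p=2, q=3):
    """Return the bag of pq-grams of `tree`, given as (label, [children])."""
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)       # shift the node label into the stem
        base = (STAR,) * q
        if not children:
            profile[stem + base] += 1    # a leaf gets q dummy children
            return
        for child in children:
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):           # trailing dummy children
            base = base[1:] + (STAR,)
            profile[stem + base] += 1

    visit(tree, (STAR,) * p)
    return profile


def pq_distance(t1, t2, p=2, q=3):
    """1 - 2 * |bag intersection| / |bag union|, a value in [0, 1]."""
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    t1 = ("a", [("b", []), ("c", [])])
    t2 = ("a", [("b", []), ("d", [])])
    print(pq_distance(t1, t1), pq_distance(t1, t2))
```

The profile is built in a single traversal of the tree, which is the source of the cost advantage over the tree edit distance mentioned in the abstract.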
Using Machine Learning for Model Physics: an Overview
Title | Using Machine Learning for Model Physics: an Overview |
Authors | Vladimir Krasnopolsky |
Abstract | In the overview, a generic mathematical object (mapping) is introduced, and its relation to model physics parameterization is explained. Machine learning (ML) tools that can be used to emulate and/or approximate mappings are introduced. Applications of ML to emulate existing parameterizations, to develop new parameterizations, to ensure physical constraints, and control the accuracy of developed applications are described. Some ML approaches that allow developers to go beyond the standard parameterization paradigm are discussed. |
Tasks | |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.00416v1 |
https://arxiv.org/pdf/2002.00416v1.pdf | |
PWC | https://paperswithcode.com/paper/using-machine-learning-for-model-physics-an |
Repo | |
Framework | |
Harmonic Decompositions of Convolutional Networks
Title | Harmonic Decompositions of Convolutional Networks |
Authors | Meyer Scetbon, Zaid Harchaoui |
Abstract | We consider convolutional networks from a reproducing kernel Hilbert space viewpoint. We establish harmonic decompositions of convolutional networks, that is expansions into sums of elementary functions of increasing order. The elementary functions are related to the spherical harmonics, a fundamental class of special functions on spheres. The harmonic decompositions allow us to characterize the integral operators associated with convolutional networks, and obtain as a result statistical bounds for convolutional networks. |
Tasks | |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12756v1 |
https://arxiv.org/pdf/2003.12756v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonic-decompositions-of-convolutional |
Repo | |
Framework | |
Accelerography: Feasibility of Gesture Typing using Accelerometer
Title | Accelerography: Feasibility of Gesture Typing using Accelerometer |
Authors | Arindam Roy Chowdhury, Abhinandan Dalal, Shubhajit Sen |
Abstract | In this paper, we look into the feasibility of constructing alphabets using gestures. The main idea is to construct gestures that are easy to remember, not cumbersome to reproduce, and easily identifiable. We construct gestures for the entire English alphabet and provide an algorithm to identify the gestures, even when they are constructed continuously. We tackle the problem statistically, taking into account the randomness in users’ hand movements, and achieve an average accuracy of 97.33% with the entire English alphabet. |
Tasks | |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.14310v1 |
https://arxiv.org/pdf/2003.14310v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerography-feasibility-of-gesture-typing |
Repo | |
Framework | |
Scalable Bid Landscape Forecasting in Real-time Bidding
Title | Scalable Bid Landscape Forecasting in Real-time Bidding |
Authors | Aritra Ghosh, Saayan Mitra, Somdeb Sarkhel, Jason Xie, Gang Wu, Viswanathan Swaminathan |
Abstract | In programmatic advertising, ad slots are usually sold using second-price (SP) auctions in real time. The highest-bidding advertiser wins but pays only the second-highest bid (known as the winning price). In SP, for a single item, the dominant strategy of each bidder is to bid the true value from the bidder’s perspective. However, in a practical setting, with budget constraints, bidding the true value is a sub-optimal strategy. Hence, to devise an optimal bidding strategy, it is of utmost importance to learn the winning price distribution accurately. Moreover, a demand-side platform (DSP), which bids on behalf of advertisers, observes the winning price only if it wins the auction. For lost auctions, a DSP can only treat its own bidding price as a lower bound on the unknown winning price. In the literature, censored regression is typically used to model such partially observed data. A common assumption in censored regression is that the winning price is drawn from a fixed-variance (homoscedastic) uni-modal distribution (most often Gaussian). However, in reality, these assumptions are often violated. We relax these assumptions and propose a heteroscedastic fully parametric censored regression approach, as well as a mixture density censored network. Our approach not only generalizes censored regression but also provides flexibility to model arbitrarily distributed real-world data. Experimental evaluation on the publicly available dataset for winning price estimation demonstrates the effectiveness of our method. Furthermore, we evaluate our algorithm on one of the largest demand-side platforms and achieve significant improvement over the baseline solutions. |
Tasks | |
Published | 2020-01-18 |
URL | https://arxiv.org/abs/2001.06587v1 |
https://arxiv.org/pdf/2001.06587v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-bid-landscape-forecasting-in-real |
Repo | |
Framework | |
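The censored-regression setup in the abstract can be written down as a likelihood. The sketch below is a minimal illustration under a Gaussian assumption, not the authors' model: a won auction contributes the density of the observed winning price, while a lost auction contributes the probability that the winning price exceeds the DSP's own bid. Passing a separate `sigma` per record is what makes the sketch heteroscedastic; in practice `mu` and `sigma` would come from a model of the bid request features. All function names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_logsf(x, mu, sigma):
    """Log of P(W > x) for W ~ N(mu, sigma^2), floored to avoid log(0)."""
    z = (x - mu) / sigma
    return math.log(max(0.5 * math.erfc(z / math.sqrt(2)), 1e-300))


def censored_nll(records):
    """Negative log-likelihood over (won, price, mu, sigma) records.

    won=True : `price` is the observed winning price.
    won=False: `price` is our losing bid, a lower bound on the winning price.
    """
    total = 0.0
    for won, price, mu, sigma in records:
        if won:
            total -= normal_logpdf(price, mu, sigma)
        else:
            total -= normal_logsf(price, mu, sigma)
    return total


if __name__ == "__main__":
    data = [(True, 4.2, 4.0, 0.5), (False, 3.0, 4.5, 1.5)]
    print(censored_nll(data))
```

Minimizing this quantity with respect to the parameters that produce `mu` and `sigma` uses both won and lost auctions, instead of discarding the lost ones.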
StegColNet: Steganalysis based on an ensemble colorspace approach
Title | StegColNet: Steganalysis based on an ensemble colorspace approach |
Authors | Shreyank N Gowda, Chun Yuan |
Abstract | Image steganography refers to the process of hiding information inside images. Steganalysis is the process of detecting a steganographic image. We introduce a steganalysis approach that uses an ensemble color space model to obtain a weighted concatenated feature activation map. The concatenated map helps to obtain features specific to each color space. We use a Lévy-flight grey wolf optimization strategy to reduce the number of features selected in the map. We then use these features to classify the image into one of two classes: whether or not the given image contains hidden information. Extensive experiments have been done on a large-scale dataset extracted from the Bossbase dataset. We also show that the model can be transferred to different datasets, and perform extensive experiments on a mixture of datasets. Our results show that the proposed approach outperforms recent state-of-the-art deep learning steganalysis approaches by 2.32 percent on average for 0.2 bits per channel (bpc) and 1.87 percent on average for 0.4 bpc. |
Tasks | Image Steganography |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02413v1 |
https://arxiv.org/pdf/2002.02413v1.pdf | |
PWC | https://paperswithcode.com/paper/stegcolnet-steganalysis-based-on-an-ensemble |
Repo | |
Framework | |