October 20, 2019

3228 words 16 mins read

Paper Group ANR 9

Approximate Submodular Functions and Performance Guarantees. Unsupervised Open Relation Extraction. Semantic Segmentation for Urban Planning Maps based on U-Net. Deep Generative Video Compression. Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects. Review of Applications of Generalized …

Approximate Submodular Functions and Performance Guarantees

Title Approximate Submodular Functions and Performance Guarantees
Authors Gaurav Gupta, Sergio Pequito, Paul Bogdan
Abstract We consider the problem of maximizing non-negative non-decreasing set functions. Although most of the recent work focuses on exploiting submodularity, it turns out that several objectives we encounter in practice are not submodular. Nonetheless, we often leverage the greedy algorithms used for submodular functions to determine a solution for non-submodular functions. Here, we propose to address the original problem by \emph{approximating} the non-submodular function and analyze the incurred error, as well as the performance trade-offs. To quantify the approximation error, we introduce a novel concept of $\delta$-approximation of a function, which we use to define the space of submodular functions that lie within an approximation error. We provide necessary conditions on the existence of such $\delta$-approximation functions, which might not be unique. Consequently, we characterize this subspace, which we refer to as the \emph{region of submodularity}. Furthermore, submodular functions are known to lead to different sub-optimality guarantees, so we generalize those dependencies upon a $\delta$-approximation into the notion of \emph{greedy curvature}. Finally, we use this latter notion to simplify some of the existing results, efficiently (i.e., with linear complexity) determine tightened bounds on the sub-optimality guarantees using objective functions commonly used in practical setups, and validate them using real data.
Tasks
Published 2018-06-17
URL http://arxiv.org/abs/1806.06323v1
PDF http://arxiv.org/pdf/1806.06323v1.pdf
PWC https://paperswithcode.com/paper/approximate-submodular-functions-and
Repo
Framework
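
The greedy heuristic that the abstract refers to is the standard one for monotone set functions under a cardinality constraint. A minimal sketch follows; the coverage objective and its data are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's code): the standard greedy heuristic for
# maximizing a monotone set function under a cardinality constraint.
def greedy_maximize(objective, ground_set, budget):
    """Pick `budget` elements, each time adding the one with the largest marginal gain."""
    selected = set()
    for _ in range(budget):
        best_elem, best_gain = None, float("-inf")
        for elem in ground_set - selected:
            gain = objective(selected | {elem}) - objective(selected)
            if gain > best_gain:
                best_elem, best_gain = elem, gain
        selected.add(best_elem)
    return selected

if __name__ == "__main__":
    # Example objective: set coverage (a submodular function) over a few toy sets.
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
    coverage = lambda chosen: len(set().union(*(sets[s] for s in chosen)) if chosen else set())
    print(greedy_maximize(coverage, set(sets), budget=2))  # e.g. {'a', 'c'}
```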

Unsupervised Open Relation Extraction

Title Unsupervised Open Relation Extraction
Authors Hady Elsahar, Elena Demidova, Simon Gottschalk, Christophe Gravier, Frederique Laforest
Abstract We explore methods to extract relations between named entities from free text in an unsupervised setting. In addition to standard feature extraction, we develop a novel method to re-weight word embeddings. We alleviate the problem of feature sparsity using individual feature reduction. Our approach exhibits a significant improvement of 5.8% over the state-of-the-art in relation clustering, scoring an F1-score of 0.416 on the NYT-FB dataset.
Tasks Relation Extraction, Word Embeddings
Published 2018-01-22
URL http://arxiv.org/abs/1801.07174v1
PDF http://arxiv.org/pdf/1801.07174v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-open-relation-extraction
Repo
Framework
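
As a rough illustration of the general pipeline described above (not the authors' implementation), one can build a relation feature per entity-pair context as a re-weighted average of word embeddings and then cluster those features into relation types; the weighting scheme, toy embeddings, and cluster count below are assumptions.

```python
# Sketch only: re-weighted embedding features for the words between two
# entities, clustered into induced relation types.
import numpy as np
from sklearn.cluster import KMeans

def relation_feature(context_tokens, embeddings, weights):
    """Weighted average of word vectors for the words between two entities."""
    vecs = [weights.get(t, 1.0) * embeddings[t] for t in context_tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(next(iter(embeddings.values())).shape)

# Toy embeddings and a toy re-weighting (e.g., down-weight frequent function words).
emb = {w: np.random.randn(50) for w in ["born", "in", "works", "for", "married", "to"]}
w = {"in": 0.2, "for": 0.2, "to": 0.2}
contexts = [["born", "in"], ["works", "for"], ["married", "to"], ["born", "in"]]
X = np.stack([relation_feature(c, emb, w) for c in contexts])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster ids acting as induced relation types
```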

Semantic Segmentation for Urban Planning Maps based on U-Net

Title Semantic Segmentation for Urban Planning Maps based on U-Net
Authors Zhiling Guo, Hiroaki Shengoku, Guangming Wu, Qi Chen, Wei Yuan, Xiaodan Shi, Xiaowei Shao, Yongwei Xu, Ryosuke Shibasaki
Abstract The automatic digitizing of paper maps is a significant and challenging task for both academia and industry. As an important procedure of map digitizing, the semantic segmentation step mainly relies on manual visual interpretation with low efficiency. In this study, we select urban planning maps as a representative sample and investigate the feasibility of utilizing a U-shaped fully convolutional architecture to perform end-to-end map semantic segmentation. The experimental results obtained from the test area in the Shibuya district, Tokyo, demonstrate that our proposed method achieves a very high Jaccard similarity coefficient of 93.63% and an overall accuracy of 99.36%. When implemented on a GPGPU with cuDNN, the required processing time for the whole Shibuya district can be less than three minutes. The results indicate the proposed method can serve as a viable tool for the urban planning map semantic segmentation task with high accuracy and efficiency.
Tasks Semantic Segmentation
Published 2018-09-28
URL http://arxiv.org/abs/1809.10862v2
PDF http://arxiv.org/pdf/1809.10862v2.pdf
PWC https://paperswithcode.com/paper/semantic-segmentation-for-urban-planning-maps
Repo
Framework
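
The two reported metrics are straightforward to compute from binary segmentation masks; a quick sketch follows (mask shapes and data are placeholders, not the paper's maps).

```python
# Jaccard similarity coefficient (IoU) and overall pixel accuracy for binary masks.
import numpy as np

def jaccard_and_accuracy(pred, truth):
    """pred, truth: boolean arrays of the same shape (True = target class)."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    jaccard = intersection / union if union else 1.0
    overall_acc = (pred == truth).mean()
    return jaccard, overall_acc

pred = np.random.rand(256, 256) > 0.5
truth = np.random.rand(256, 256) > 0.5
print(jaccard_and_accuracy(pred, truth))
```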

Deep Generative Video Compression

Title Deep Generative Video Compression
Authors Jun Han, Salvator Lombardo, Christopher Schroers, Stephan Mandt
Abstract The use of deep generative models for image compression has led to impressive performance gains over classical codecs, while neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling approach to compress temporal sequences with a focus on video. Our approach builds upon variational autoencoder (VAE) models for sequential data and combines them with recent work on neural image compression. The approach jointly learns to transform the original sequence into a lower-dimensional representation as well as to discretize and entropy code this representation according to predictions of the sequential VAE. Rate-distortion evaluations on small videos from public data sets with varying complexity and diversity show that our model yields competitive results when trained on generic video content. Extreme compression performance is achieved when training the model on specialized content.
Tasks Image Compression, Video Compression
Published 2018-10-05
URL https://arxiv.org/abs/1810.02845v2
PDF https://arxiv.org/pdf/1810.02845v2.pdf
PWC https://paperswithcode.com/paper/deep-probabilistic-video-compression
Repo
Framework
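
The core trade-off being learned is the usual rate-distortion objective: distortion of the reconstruction plus the code length implied by the model's probabilities for the quantized latents. The schematic below only illustrates that trade-off; it is not the paper's training code, and the arrays, probabilities, and weighting are placeholders.

```python
# Schematic rate-distortion loss: MSE distortion plus an ideal-code-length rate term.
import numpy as np

def rate_distortion_loss(original, reconstruction, symbol_probs, lam=0.1):
    """symbol_probs: model probability assigned to each quantized latent symbol."""
    distortion = np.mean((original - reconstruction) ** 2)
    rate_bits = -np.sum(np.log2(symbol_probs))        # ideal code length of the latents
    return distortion + lam * rate_bits, distortion, rate_bits

x = np.random.rand(8, 64, 64)           # a short "video" of 8 frames
x_hat = x + 0.01 * np.random.randn(*x.shape)
probs = np.full(1024, 1.0 / 512)        # toy per-symbol probabilities
print(rate_distortion_loss(x, x_hat, probs))
```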

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

Title Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects
Authors Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
Abstract We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder. We compared an ideal system that uses manually corrected linguistic features, including phoneme and prosodic information, in the training and test sets against several other systems that use corrupted linguistic features. Both subjective and objective results demonstrate that corrupted linguistic features, especially those in the test set, affected the ideal system’s performance significantly in a statistical sense due to a mismatched condition between the training and test sets. Interestingly, while an utterance-level Turing test showed that listeners had a difficult time differentiating synthetic speech from natural speech, it further indicated that adding noise to the linguistic features in the training set can partially reduce the effect of the mismatch, regularize the model, and help the system perform better when linguistic features of the test set are noisy.
Tasks Denoising, Speech Synthesis
Published 2018-08-02
URL http://arxiv.org/abs/1808.00665v1
PDF http://arxiv.org/pdf/1808.00665v1.pdf
PWC https://paperswithcode.com/paper/investigating-accuracy-of-pitch-accent
Repo
Framework

Review of Applications of Generalized Regression Neural Networks in Identification and Control of Dynamic Systems

Title Review of Applications of Generalized Regression Neural Networks in Identification and Control of Dynamic Systems
Authors Ahmad Jobran Al-Mahasneh, Sreenatha G. Anavatti, Matthew A. Garratt
Abstract This paper presents a brief review of Generalized Regression Neural Network (GRNN) applications in system identification and control of dynamic systems. In addition, a comparison study between the performance of back-propagation neural networks and GRNNs is presented for system identification problems. The results of the comparison confirm that GRNNs have shorter training time and higher accuracy than back-propagation neural networks.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11236v1
PDF http://arxiv.org/pdf/1805.11236v1.pdf
PWC https://paperswithcode.com/paper/review-of-applications-of-generalized
Repo
Framework
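
A GRNN itself is compact enough to sketch: prediction is a Gaussian-kernel-weighted average of the training targets (Nadaraya-Watson regression) with a single smoothing parameter sigma. The toy identification data below are illustrative.

```python
# Minimal GRNN: pattern layer (Gaussian kernels), summation layer, division layer.
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=0.5):
    # Squared Euclidean distances from each query point to every training pattern.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))               # pattern-layer activations
    return (w @ y_train) / w.sum(axis=1)               # weighted average of targets

# Toy 1-D identification example: learn y = sin(x) from noisy samples.
x_tr = np.linspace(0, 6, 60)[:, None]
y_tr = np.sin(x_tr[:, 0]) + 0.05 * np.random.randn(60)
x_q = np.array([[1.0], [2.5], [4.0]])
print(grnn_predict(x_tr, y_tr, x_q))
```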

Hierarchical Modular Reinforcement Learning Method and Knowledge Acquisition of State-Action Rule for Multi-target Problem

Title Hierarchical Modular Reinforcement Learning Method and Knowledge Acquisition of State-Action Rule for Multi-target Problem
Authors Takumi Ichimura, Daisuke Igaue
Abstract Hierarchical Modular Reinforcement Learning (HMRL) consists of two-layered learning, in which Profit Sharing plans a prey position in the higher layer and a Q-learning method trains the state-actions towards the target in the lower layer. In this paper, we extend HMRL to the multi-target problem, taking the distance between targets into consideration. A function called the ‘AT field’ estimates an agent’s interest according to the distance between two agents and the advantage/disadvantage of the other agent. Moreover, the knowledge related to state-action rules is extracted by C4.5, and actions are then decided using the acquired knowledge. To verify the effectiveness of the proposed method, some experimental results are reported.
Tasks Q-Learning
Published 2018-04-08
URL http://arxiv.org/abs/1804.02698v1
PDF http://arxiv.org/pdf/1804.02698v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-modular-reinforcement-learning
Repo
Framework
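
The lower-layer update mentioned above is standard one-step Q-learning; a minimal tabular sketch follows (state/action sizes and parameters are illustrative).

```python
# One-step tabular Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.95):
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((5, 4))            # 5 toy states, 4 actions
Q = q_update(Q, s=0, a=2, reward=1.0, s_next=1)
print(Q[0])
```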

Full Wafer Redistribution and Wafer Embedding as Key Technologies for a Multi-Scale Neuromorphic Hardware Cluster

Title Full Wafer Redistribution and Wafer Embedding as Key Technologies for a Multi-Scale Neuromorphic Hardware Cluster
Authors Kai Zoschke, Maurice Güttler, Lars Böttcher, Andreas Grübl, Dan Husmann, Johannes Schemmel, Karlheinz Meier, Oswin Ehrmann
Abstract Together with the Kirchhoff-Institute for Physics (KIP), the Fraunhofer IZM has developed a full wafer redistribution and embedding technology as the basis for a large-scale neuromorphic hardware system. The paper gives an overview of the neuromorphic computing platform at the KIP and the associated hardware requirements which drove the described technological developments. In the first phase of the project, standard redistribution technologies from wafer level packaging were adapted to enable high-density reticle-to-reticle routing on 200 mm CMOS wafers. Neighboring reticles were interconnected across the scribe lines with an 8 µm pitch routing based on semi-additive copper metallization. Passivation by photosensitive benzocyclobutene was used to enable a second intra-reticle routing layer. Final IO pads with flash gold were generated on top of each reticle. With that concept, neuromorphic systems based on full wafers could be assembled and tested. The fabricated high-density inter-reticle routing revealed a very high yield of more than 99.9%. In order to allow upscaling of the system size to a large number of wafers with feasible effort, a full wafer embedding concept for printed circuit boards was developed and proven in the second phase of the project. The wafers were thinned to 250 µm and laminated with additional prepreg layers and copper foils into a core material. After lamination of the PCB panel, the reticle IOs of the embedded wafer were accessed by micro via drilling, copper electroplating, lithography and subtractive etching of the PCB wiring structure. The created wiring, with 50 µm line width, enabled access to the reticle IOs on the embedded wafer as well as board-level routing. The panels with the embedded wafers were subsequently stressed with up to 1000 thermal cycles between 0 °C and 100 °C and showed no severe failure formation over the cycle time.
Tasks
Published 2018-01-15
URL http://arxiv.org/abs/1801.04734v1
PDF http://arxiv.org/pdf/1801.04734v1.pdf
PWC https://paperswithcode.com/paper/full-wafer-redistribution-and-wafer-embedding
Repo
Framework

DONUT: CTC-based Query-by-Example Keyword Spotting

Title DONUT: CTC-based Query-by-Example Keyword Spotting
Authors Loren Lugosch, Samuel Myer, Vikrant Singh Tomar
Abstract Keyword spotting, or wakeword detection, is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.
Tasks Keyword Spotting
Published 2018-11-26
URL http://arxiv.org/abs/1811.10736v1
PDF http://arxiv.org/pdf/1811.10736v1.pdf
PWC https://paperswithcode.com/paper/donut-ctc-based-query-by-example-keyword
Repo
Framework
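
The final detection step described above reduces to combining per-hypothesis scores into one keyword score. The sketch below uses a log-sum-exp combination purely for illustration; how DONUT actually computes and aggregates the per-hypothesis CTC scores is specified in the paper, not here.

```python
# Combine per-hypothesis log-probabilities into a single detection score
# and compare it against a tuned threshold.
import math

def aggregate_hypothesis_scores(log_probs):
    """Numerically stable log-sum-exp over hypothesis log-probabilities."""
    m = max(log_probs)
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs))

hypothesis_log_probs = [-12.3, -13.1, -15.8]    # toy CTC scores for 3 hypotheses
score = aggregate_hypothesis_scores(hypothesis_log_probs)
print(score > -13.0)                            # fires if the score clears the threshold
```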

Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach

Title Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach
Authors Grant M. Rotskoff, Eric Vanden-Eijnden
Abstract Neural networks, a central tool in machine learning, have demonstrated remarkable, high fidelity performance on image recognition and classification tasks. These successes evince an ability to accurately represent high dimensional functions, but rigorous results about the approximation error of neural networks after training are few. Here we establish conditions for global convergence of the standard optimization algorithm used in machine learning applications, stochastic gradient descent (SGD), and quantify the scaling of its error with the size of the network. This is done by reinterpreting SGD as the evolution of a particle system with interactions governed by a potential related to the objective or “loss” function used to train the network. We show that, when the number $n$ of units is large, the empirical distribution of the particles descends on a convex landscape towards the global minimum at a rate independent of $n$, with a resulting approximation error that universally scales as $O(n^{-1})$. These properties are established in the form of a Law of Large Numbers and a Central Limit Theorem for the empirical distribution. Our analysis also quantifies the scale and nature of the noise introduced by SGD and provides guidelines for the step size and batch size to use when training a neural network. We illustrate our findings on examples in which we train neural networks to learn the energy function of the continuous 3-spin model on the sphere. The approximation error scales as our analysis predicts in as high a dimension as $d=25$.
Tasks
Published 2018-05-02
URL https://arxiv.org/abs/1805.00915v3
PDF https://arxiv.org/pdf/1805.00915v3.pdf
PWC https://paperswithcode.com/paper/neural-networks-as-interacting-particle
Repo
Framework
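
In the standard notation of this mean-field literature, the abstract's claims can be summarized as follows (the precise norms and constants are in the paper and not reproduced here):

```latex
\begin{align*}
  f_n(x) &= \frac{1}{n} \sum_{i=1}^{n} c_i\,\varphi(x;\theta_i)
    && \text{network output as an empirical average over $n$ units (particles)} \\
  \|f - f_n\|^2 &= O\!\left(n^{-1}\right)
    && \text{approximation error scaling after training (LLN and CLT regime)}
\end{align*}
```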

Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders

Title Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders
Authors Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John Quinn, Thomas Niesler
Abstract We compare features for dynamic time warping (DTW) when used to bootstrap keyword spotting (KWS) in an almost zero-resource setting. Such quickly-deployable systems aim to support United Nations (UN) humanitarian relief efforts in parts of Africa with severely under-resourced languages. Our objective is to identify acoustic features that provide acceptable KWS performance in such environments. As supervised resource, we restrict ourselves to a small, easily acquired and independently compiled set of isolated keywords. For feature extraction, a multilingual bottleneck feature (BNF) extractor, trained on well-resourced out-of-domain languages, is integrated with a correspondence autoencoder (CAE) trained on extremely sparse in-domain data. On their own, BNFs and CAE features are shown to achieve a more than 2% absolute performance improvement over baseline MFCCs. However, by using BNFs as input to the CAE, even better performance is achieved, with a more than 11% absolute improvement in ROC AUC over MFCCs and more than twice as many top-10 retrievals for two evaluated languages, English and Luganda. We conclude that integrating BNFs with the CAE allows both large out-of-domain and sparse in-domain resources to be exploited for improved ASR-free keyword spotting.
Tasks Keyword Spotting
Published 2018-11-14
URL https://arxiv.org/abs/1811.08284v2
PDF https://arxiv.org/pdf/1811.08284v2.pdf
PWC https://paperswithcode.com/paper/feature-exploration-for-almost-zero-resource
Repo
Framework
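
The matching step these features feed into is dynamic time warping; a bare-bones sketch follows. In practice the query would be slid over the search utterance (subsequence DTW), and the cosine frame cost shown here is just one common choice.

```python
# Plain DTW between two feature sequences, e.g. bottleneck or CAE features.
import numpy as np

def dtw_distance(a, b):
    """a: (T1, D) and b: (T2, D) feature sequences; returns length-normalized DTW cost."""
    cost = 1.0 - (a @ b.T) / (np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :])
    T1, T2 = cost.shape
    acc = np.full((T1 + 1, T2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[T1, T2] / (T1 + T2)

query = np.random.randn(40, 30)      # e.g. 40 frames of 30-dim bottleneck features
utterance = np.random.randn(200, 30)
print(dtw_distance(query, utterance))
```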

Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs

Title Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs
Authors Wenpeng Yin, Yadollah Yaghoobzadeh, Hinrich Schütze
Abstract Large scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability that is needed for question answering or other NLP tasks that require knowledge about the world. mh-KG reasoning includes diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or given two entities connected by some relation paths, predict the unknown relation between them. We present ROPs, recurrent one-hop predictors, that predict entities at each step of mh-KG paths by using recurrent neural networks and vector representations of entities and relations, with two benefits: (i) modeling mh-paths of arbitrary lengths while updating the entity and relation representations by the training signal at each step; (ii) handling different types of mh-KG reasoning in a unified framework. Our models achieve state-of-the-art results on two important multi-hop KG reasoning tasks: Knowledge Base Completion and Path Query Answering.
Tasks Knowledge Base Completion, Knowledge Graphs, Question Answering
Published 2018-06-12
URL http://arxiv.org/abs/1806.04523v1
PDF http://arxiv.org/pdf/1806.04523v1.pdf
PWC https://paperswithcode.com/paper/recurrent-one-hop-predictions-for-reasoning
Repo
Framework

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting

Title Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting
Authors Zhehuai Chen, Yanmin Qian, Kai Yu
Abstract Speech recognition is a sequence prediction problem. Besides employing various deep learning approaches for frame-level classification, sequence-level discriminative training has proven indispensable to achieving state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR). However, keyword spotting (KWS), as one of the most common speech recognition tasks, benefits almost exclusively from frame-level deep learning due to the difficulty of getting competing sequence hypotheses. The few studies on sequence discriminative training for KWS are limited to fixed-vocabulary or LVCSR-based methods and have not been compared to the state-of-the-art deep learning based KWS approaches. In this paper, a sequence discriminative training framework is proposed for both fixed-vocabulary and unrestricted acoustic KWS. Sequence discriminative training for both sequence-level generative and discriminative models is systematically investigated. By introducing word-independent phone lattices or non-keyword blank symbols to construct competing hypotheses, feasible and efficient sequence discriminative training approaches are proposed for acoustic KWS. Experiments showed that the proposed approaches obtained consistent and significant improvements in both fixed-vocabulary and unrestricted KWS tasks, compared to previous frame-level deep learning based acoustic KWS methods.
Tasks Keyword Spotting, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-08-02
URL http://arxiv.org/abs/1808.00639v1
PDF http://arxiv.org/pdf/1808.00639v1.pdf
PWC https://paperswithcode.com/paper/sequence-discriminative-training-for-deep
Repo
Framework
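
To make the term “sequence discriminative training” concrete, the canonical MMI criterion is shown below; the paper's specific objectives and its competing-hypothesis construction via phone lattices or blank symbols differ in detail.

```latex
% Maximum Mutual Information over utterances u: boost the reference word
% sequence W_u against the total probability of all competing hypotheses W.
\begin{equation*}
  \mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u} \log
    \frac{p_\theta(X_u \mid W_u)\, P(W_u)}
         {\sum_{W} p_\theta(X_u \mid W)\, P(W)}
\end{equation*}
```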

Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence Images

Title Automatic Classification of Defective Photovoltaic Module Cells in Electroluminescence Images
Authors Sergiu Deitsch, Vincent Christlein, Stephan Berger, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess
Abstract Electroluminescence (EL) imaging is a useful modality for the inspection of photovoltaic (PV) modules. EL images provide high spatial resolution, which makes it possible to detect even the finest defects on the surface of PV modules. However, the analysis of EL images is typically a manual process that is expensive, time-consuming, and requires expert knowledge of many different types of defects. In this work, we investigate two approaches for automatic detection of such defects in a single image of a PV cell. The approaches differ in their hardware requirements, which are dictated by their respective application scenarios. The more hardware-efficient approach is based on hand-crafted features that are classified in a Support Vector Machine (SVM). To obtain a strong performance, we investigate and compare various processing variants. The more hardware-demanding approach uses an end-to-end deep Convolutional Neural Network (CNN) that runs on a Graphics Processing Unit (GPU). Both approaches are trained on 1,968 cells extracted from high resolution EL intensity images of mono- and polycrystalline PV modules. The CNN is more accurate, and reaches an average accuracy of 88.42%. The SVM achieves a slightly lower average accuracy of 82.44%, but can run on arbitrary hardware. Both automated approaches make continuous, highly accurate monitoring of PV cells feasible.
Tasks
Published 2018-07-08
URL http://arxiv.org/abs/1807.02894v3
PDF http://arxiv.org/pdf/1807.02894v3.pdf
PWC https://paperswithcode.com/paper/automatic-classification-of-defective
Repo
Framework
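
The hardware-light branch can be pictured as hand-crafted features fed to an SVM; the sketch below uses HOG descriptors as one example of such features, with placeholder data and parameters (the paper's actual feature set and tuning are not reproduced).

```python
# Schematic feature-plus-SVM pipeline for per-cell defect classification.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def cell_features(img):
    """img: 2-D grayscale EL image of one PV cell, already cropped and rectified."""
    return hog(img, orientations=8, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

# Toy stand-in data: 40 random 128x128 "cells" with binary defect labels.
images = np.random.rand(40, 128, 128)
labels = np.random.randint(0, 2, size=40)
X = np.stack([cell_features(im) for im in images])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```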

Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification

Title Differentiable Supervector Extraction for Encoding Speaker and Phrase Information in Text Dependent Speaker Verification
Authors Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
Abstract In this paper, we propose a new differentiable neural network alignment mechanism for text-dependent speaker verification which uses alignment models to produce a supervector representation of an utterance. Unlike previous works with similar approaches, we do not extract the embedding of an utterance from the mean reduction of the temporal dimension. Our system replaces the mean with a phrase alignment model to keep the temporal structure of each phrase, which is relevant in this application since the phonetic information is part of the identity in the verification task. Moreover, we can apply a convolutional neural network as front-end, and, thanks to the alignment process being differentiable, we can train the whole network to produce a supervector for each utterance that is discriminative with respect to the speaker and the phrase simultaneously. As we show, this choice has the advantage that the supervector encodes the phrase and speaker information, providing good performance in text-dependent speaker verification tasks. In this work, verification is performed using a basic similarity metric, chosen for simplicity, in contrast to the more elaborate models that are commonly used. The new model using alignment to produce supervectors was tested on the RSR2015-Part I database for text-dependent speaker verification, providing competitive results compared to similar size networks using the mean to extract embeddings.
Tasks Speaker Verification, Text-Dependent Speaker Verification
Published 2018-12-22
URL http://arxiv.org/abs/1812.09484v1
PDF http://arxiv.org/pdf/1812.09484v1.pdf
PWC https://paperswithcode.com/paper/differentiable-supervector-extraction-for
Repo
Framework
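
The pooling idea can be pictured as follows: rather than a single mean over time, frames are softly assigned to alignment states, pooled per state, and the per-state means are concatenated into a supervector. In the sketch below the alignment posteriors are random placeholders; producing them differentiably is the part the paper contributes.

```python
# Toy alignment-weighted supervector pooling over a sequence of frame features.
import numpy as np

def supervector(frames, align_post, eps=1e-8):
    """frames: (T, D) features; align_post: (T, K) soft alignment to K states."""
    weighted_sums = align_post.T @ frames                    # (K, D) per-state sums
    counts = align_post.sum(axis=0, keepdims=True).T + eps   # (K, 1) soft frame counts
    return (weighted_sums / counts).reshape(-1)              # (K*D,) supervector

T, D, K = 120, 40, 10
frames = np.random.randn(T, D)
post = np.random.dirichlet(np.ones(K), size=T)   # soft assignment of each frame to a state
print(supervector(frames, post).shape)           # (400,)
```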