Paper Group ANR 167
Fractional spectral graph wavelets and their applications
Title | Fractional spectral graph wavelets and their applications |
Authors | Jiasong Wu, Fuzhi Wu, Qihan Yang, Youyong Kong, Xilin Liu, Yan Zhang, Lotfi Senhadji, Huazhong Shu |
Abstract | One of the key challenges in the area of signal processing on graphs is to design transform and dictionary methods that identify and exploit structure in signals on weighted graphs. In this paper, we first generalize the graph Fourier transform (GFT) to the graph fractional Fourier transform (GFRFT), which is then used to define a novel transform named the spectral graph fractional wavelet transform (SGFRWT), a generalized and extended version of the spectral graph wavelet transform (SGWT). A fast algorithm for SGFRWT is also derived and implemented based on Fourier series approximation. Potential applications of SGFRWT are also presented. |
Tasks | |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10471v1 |
http://arxiv.org/pdf/1902.10471v1.pdf | |
PWC | https://paperswithcode.com/paper/fractional-spectral-graph-wavelets-and-their |
Repo | |
Framework | |
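The GFRFT the abstract builds on is commonly defined as a fractional matrix power of the classical GFT matrix. A minimal numpy sketch under that reading (the toy path graph, signal values, and the `gfrft` helper are illustrative, not from the paper); since the GFT matrix is orthogonal and hence normal, its fractional power can be taken through its own eigendecomposition:

```python
import numpy as np

# Toy 4-node path graph; Laplacian L = D - W.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# Classical GFT matrix: transpose of the Laplacian's eigenvector matrix.
_, U = np.linalg.eigh(L)
F = U.T

# F is orthogonal (hence normal), so F^alpha = V diag(lambda^alpha) V^{-1}
# using F's own eigendecomposition.
lam, V = np.linalg.eig(F)
Vinv = np.linalg.inv(V)

def gfrft(alpha):
    """Graph fractional Fourier transform matrix of order alpha."""
    return (V * lam ** alpha) @ Vinv

x = np.array([1.0, -2.0, 0.5, 3.0])   # a graph signal
x_hat = gfrft(0.5) @ x                 # fractional spectrum
x_rec = gfrft(-0.5) @ x_hat            # inverse transform recovers x
```

Setting `alpha = 1` recovers the ordinary GFT, and `alpha = 0` the identity, which is the interpolation property a fractional transform is meant to provide.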
On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods
Title | On the Insufficiency of the Large Margins Theory in Explaining the Performance of Ensemble Methods |
Authors | Waldyn Martinez, J. Brian Gray |
Abstract | Boosting and other ensemble methods combine a large number of weak classifiers through weighted voting to produce stronger predictive models. To explain the successful performance of boosting algorithms, Schapire et al. (1998) showed that AdaBoost is especially effective at increasing the margins of the training data. Schapire et al. (1998) also developed an upper bound on the generalization error of any ensemble based on the margins of the training data, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal (sometimes referred to as the "large margins theory"). Tighter bounds have been derived and have reinforced the large margins theory hypothesis. For instance, Wang et al. (2011) suggest that specific margin instances, such as the equilibrium margin, can better summarize the margin distribution. These results have led many researchers to consider direct optimization of the margins to improve ensemble generalization error, with mixed results. We show that the large margins theory is not sufficient for explaining the performance of voting classifiers. We do this by illustrating how it is possible to improve upon the margin distribution of an ensemble solution, while keeping the complexity fixed, yet not improve the test set performance. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04063v1 |
https://arxiv.org/pdf/1906.04063v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-insufficiency-of-the-large-margins |
Repo | |
Framework | |
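The margin the abstract refers to has a concrete formula: for each training example, it is the voting weight placed on the true class minus the largest weight placed on any other class, normalized to [-1, 1]. A small sketch (the helper name and inputs are ours, not from the paper):

```python
import numpy as np

def voting_margins(preds, weights, y):
    """Margins of a weighted-voting ensemble.

    preds:   (T, n) class predictions of T weak classifiers on n examples
    weights: (T,) non-negative voting weights
    y:       (n,) true labels
    Margin_i = (weight on true class - max weight on any other class) / total.
    """
    preds = np.asarray(preds)
    weights = np.asarray(weights, dtype=float)
    y = np.asarray(y)
    classes = np.unique(np.concatenate([preds.ravel(), y]))
    total = weights.sum()
    margins = np.empty(preds.shape[1])
    for i in range(preds.shape[1]):
        votes = {c: weights[preds[:, i] == c].sum() for c in classes}
        correct = votes.pop(y[i], 0.0)
        margins[i] = (correct - max(votes.values(), default=0.0)) / total
    return margins
```

For example, with three classifiers of weights 0.5, 0.3, 0.2 where the first two vote for the true class, the margin is (0.8 - 0.2) / 1.0 = 0.6. The paper's point is that shifting this distribution upward does not by itself guarantee lower test error.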
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
Title | Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement |
Authors | Mostafa Sadeghi, Xavier Alameda-Pineda |
Abstract | In this paper, we are interested in unsupervised speech enhancement using latent variable generative models. We propose to learn a generative model for the clean speech spectrogram based on a variational autoencoder (VAE), where a mixture of audio and visual networks is used to infer the posterior of the latent variables. This is motivated by the fact that visual data, i.e., lip images of the speaker, provide helpful and complementary information about speech. As such, they can help train a richer inference network. Moreover, during speech enhancement, visual data are used to initialize the latent variables, thus providing a more robust initialization than the noisy speech spectrogram. A variational inference approach is derived to train the proposed VAE. Thanks to the novel inference procedure and the robust initialization, the proposed audio-visual mixture VAE exhibits superior speech enhancement performance compared with the standard audio-only counterpart. |
Tasks | Speech Enhancement |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10647v1 |
https://arxiv.org/pdf/1912.10647v1.pdf | |
PWC | https://paperswithcode.com/paper/mixture-of-inference-networks-for-vae-based |
Repo | |
Framework | |
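The mixture of inference networks described above can be read as a two-component posterior, one Gaussian per modality, from which a latent sample is drawn by first picking a component and then reparameterizing. A toy sketch of that sampling step (the interface, names, and mixture weight are our assumptions, not the paper's implementation):

```python
import numpy as np

def sample_mixture_posterior(mu_a, var_a, mu_v, var_v, pi_a, rng):
    """Draw z from a two-component mixture posterior: with probability
    pi_a use the audio inference network's Gaussian (mu_a, var_a),
    otherwise the visual one (mu_v, var_v). The mu_*/var_* arrays stand
    in for encoder outputs."""
    if rng.random() < pi_a:
        mu, var = mu_a, var_a
    else:
        mu, var = mu_v, var_v
    # Reparameterization: z = mu + sqrt(var) * eps, eps ~ N(0, I).
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)
```

Training-time gradients would flow through the reparameterized sample; the component choice itself is where a mixture posterior differs from a single-encoder VAE.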
Distributed Microphone Speech Enhancement based on Deep Learning
Title | Distributed Microphone Speech Enhancement based on Deep Learning |
Authors | Syu-Siang Wang, Yu-You Liang, Jeih-weih Hung, Yu Tsao, Hsin-Min Wang, Shih-Hau Fang |
Abstract | Speech-related applications deliver inferior performance in complex noise environments. Therefore, this study primarily addresses this problem by introducing speech enhancement (SE) systems based on deep neural networks (DNNs) applied to a distributed microphone architecture. The first system constructs a DNN model for each microphone to enhance the recorded noisy speech signal, and the second system combines all the noisy recordings into a large feature structure that is then enhanced through a DNN model. In the third system, a channel-dependent DNN first enhances the corresponding noisy input, and all the channel-wise enhanced outputs are fed into a DNN fusion model to construct a nearly clean signal. All three DNN SE systems operate in the acoustic frequency domain of speech signals in a diffuse-noise field environment. Evaluation experiments were conducted on the Taiwan Mandarin Hearing in Noise Test (TMHINT) database, and the results indicate that all three DNN-based SE systems improve the speech quality and intelligibility of the noise-corrupted signals, with the third system delivering the highest signal-to-noise ratio (SNR) improvement and the best speech intelligibility. |
Tasks | Speech Enhancement |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08153v2 |
https://arxiv.org/pdf/1911.08153v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-microphone-speech-enhancement |
Repo | |
Framework | |
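The third system above (per-channel enhancement followed by a fusion model) has a simple data flow that can be sketched with stand-in callables for the DNNs; the function and variable names are ours:

```python
import numpy as np

def fused_enhance(noisy_channels, channel_models, fusion_model):
    """Sketch of the channel-wise-then-fusion pipeline: each channel's
    noisy features are enhanced by that channel's own model, the
    channel-wise outputs are stacked into one feature vector per frame,
    and a fusion model maps the stack to the final clean estimate."""
    enhanced = [m(x) for m, x in zip(channel_models, noisy_channels)]
    fused_in = np.concatenate(enhanced, axis=-1)  # (frames, C * bins)
    return fusion_model(fused_in)
```

With real models, `channel_models` and `fusion_model` would be trained networks operating on spectral features; here any callables of the right shape demonstrate the wiring.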
Multilingual End-to-End Speech Translation
Title | Multilingual End-to-End Speech Translation |
Authors | Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe |
Abstract | In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the first time they have been applied to the end-to-end ST problem. We show the effectiveness of multilingual end-to-end ST in two scenarios: one-to-many and many-to-many translations with publicly available data. We experimentally confirm that multilingual end-to-end ST models significantly outperform bilingual ones in both scenarios. The generalization of multilingual training is also evaluated in a transfer learning scenario to a very low-resource language pair. All of our code and the database are publicly available to encourage further research in this emergent multilingual ST topic. |
Tasks | Machine Translation, Speech Recognition, Transfer Learning |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00254v2 |
https://arxiv.org/pdf/1910.00254v2.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-end-to-end-speech-translation |
Repo | |
Framework | |
Necessary and Sufficient Polynomial Constraints on Compatible Triplets of Essential Matrices
Title | Necessary and Sufficient Polynomial Constraints on Compatible Triplets of Essential Matrices |
Authors | E. V. Martyushev |
Abstract | The essential matrix incorporates relative rotation and translation parameters of two calibrated cameras. The well-known algebraic characterization of essential matrices, i.e., necessary and sufficient conditions under which an arbitrary matrix (of rank two) becomes essential, consists of a unique matrix equation of degree three. Based on this equation, a number of efficient algorithmic solutions to different relative pose estimation problems have been proposed. In three views, a possible way to describe the geometry of three calibrated cameras comes from considering compatible triplets of essential matrices. By compatibility we mean the correspondence of a triplet to a certain configuration of calibrated cameras. The main goal of this paper is to give an algebraic characterization of compatible triplets of essential matrices. Specifically, we propose necessary and sufficient polynomial constraints on a triplet of real rank-two essential matrices that ensure its compatibility. The constraints are given in the form of six cubic matrix equations and one quartic and one sextic scalar equation. An important advantage of the proposed constraints is their sufficiency even in the case of cameras with collinear centers. The applications of the constraints may include relative camera pose estimation in three and more views, averaging of essential matrices for incremental structure from motion, multiview camera auto-calibration, etc. |
Tasks | Calibration, Pose Estimation |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.11987v1 |
https://arxiv.org/pdf/1912.11987v1.pdf | |
PWC | https://paperswithcode.com/paper/necessary-and-sufficient-polynomial |
Repo | |
Framework | |
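The "unique matrix equation of degree three" for the two-view case is the classical characterization E E^T E = (1/2) tr(E E^T) E, which holds exactly when E has singular values (s, s, 0). A quick numpy check (helper names are ours), constructing a valid essential matrix as E = [t]_x R:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def is_essential(E, tol=1e-9):
    """Degree-three characterization: E E^T E - (1/2) tr(E E^T) E = 0."""
    C = E @ E.T @ E - 0.5 * np.trace(E @ E.T) * E
    return np.linalg.norm(C) < tol * max(np.linalg.norm(E), 1.0)

# Rotation about z by 30 degrees, plus a translation t.
th = np.pi / 6
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
t = np.array([1.0, 2.0, 0.5])
E = skew(t) @ R   # a valid essential matrix
```

The paper's contribution is the three-view analogue: polynomial constraints that certify a whole triplet of such matrices is realizable by one camera configuration, which a per-matrix check like this cannot do.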
Generalization of feature embeddings transferred from different video anomaly detection domains
Title | Generalization of feature embeddings transferred from different video anomaly detection domains |
Authors | Fernando Pereira dos Santos, Leonardo Sampaio Ferraz Ribeiro, Moacir Antonelli Ponti |
Abstract | Detecting anomalous activity in video surveillance often involves using only normal activity data in order to learn an accurate detector. Due to the lack of annotated data for a specific target domain, one could employ existing data from a source domain to produce better predictions. Hence, transfer learning presents itself as an important tool. But how to analyze the resulting data space? This paper investigates video anomaly detection, in particular feature embeddings of pre-trained CNNs that can be used with non-fully supervised data. By proposing novel cross-domain generalization measures, we study how source features can generalize for different target video domains, as well as analyze unsupervised transfer learning. The proposed generalization measures are not only a theoretical contribution, but also prove useful in practice as a way to understand which datasets can be used or transferred to describe video frames, and with which it is possible to better discriminate between normal and anomalous activity. |
Tasks | Anomaly Detection, Domain Generalization, Transfer Learning |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.09819v1 |
http://arxiv.org/pdf/1901.09819v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-of-feature-embeddings |
Repo | |
Framework | |
Evaluating Competence Measures for Dynamic Regressor Selection
Title | Evaluating Competence Measures for Dynamic Regressor Selection |
Authors | Thiago J. M. Moura, George D. C. Cavalcanti, Luiz S. Oliveira |
Abstract | Dynamic regressor selection (DRS) systems work by selecting the most competent regressors from an ensemble to estimate the target value of a given test pattern. This competence is usually quantified using the performance of the regressors in local regions of the feature space around the test pattern. However, choosing the best measure to calculate the level of competence correctly is not straightforward. The literature on dynamic classifier selection presents a wide variety of competence measures, but these cannot be used or adapted for DRS. In this paper, we review eight measures used with regression problems and adapt them to test the performance of the DRS algorithms found in the literature. Such measures are extracted from a local region of the feature space around the test pattern, called the region of competence, hence the name competence measures. To better compare the competence measures, we perform a comprehensive set of experiments on 15 regression datasets. Three DRS systems were compared against individual regressors and against static systems that use the mean and the median to combine the outputs of the regressors in the ensemble. The DRS systems were assessed while varying the competence measures. Our results show that DRS systems outperform individual regressors and static systems, but the choice of the competence measure is problem-dependent. |
Tasks | |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04645v1 |
http://arxiv.org/pdf/1904.04645v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-competence-measures-for-dynamic |
Repo | |
Framework | |
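The selection scheme the abstract describes (score each regressor in a local region of competence around the test pattern, then predict with the most competent one) can be sketched as follows; the k-NN region and squared-error competence measure are one common choice, not necessarily the paper's:

```python
import numpy as np

def drs_predict(x_test, X_val, y_val, regressors, k=7):
    """Dynamic regressor selection: find the k nearest validation points
    (the region of competence), measure each regressor's squared error
    there, and predict with the locally best one. `regressors` are
    fitted objects with a scikit-learn-style .predict method."""
    d = np.linalg.norm(X_val - x_test, axis=1)
    roc = np.argsort(d)[:k]                       # region of competence
    errs = [np.mean((r.predict(X_val[roc]) - y_val[roc]) ** 2)
            for r in regressors]
    best = regressors[int(np.argmin(errs))]
    return best.predict(x_test[None, :])[0]
```

Swapping the `errs` line for a different local score is exactly the "competence measure" axis the paper evaluates.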
Crowd-aware itinerary recommendation: a game-theoretic approach to optimize social welfare
Title | Crowd-aware itinerary recommendation: a game-theoretic approach to optimize social welfare |
Authors | Junhua Liu, Chu Guo, Kristin L. Wood, Kwan Hui Lim |
Abstract | The demand for itinerary planning has grown rapidly in recent years as the economy and standard of living improve globally. Nonetheless, itinerary recommendation remains a complex and difficult task, especially one that is queuing-time- and crowd-aware. This difficulty is due to the large number of parameters involved, i.e., attraction popularity, queuing time, walking time, operating hours, etc. Many recent works adopt a data-driven approach and propose solutions from a single-person perspective, but do not address problems that arise from natural crowd behavior, such as the selfish routing problem, which describes the ineffective networks and sub-optimal social outcomes that result from leaving agents to decide freely. In this work, we propose the Strategic and Crowd-Aware Itinerary Recommendation (SCAIR) algorithm, which takes a game-theoretic approach to the selfish routing problem and optimizes social welfare in real-world situations. To address the NP-hardness of the social welfare optimization problem, we further propose a Markov decision process (MDP) formulation which enables our simulations to be carried out in polynomial time. We then evaluate the proposed algorithm on real-world data against two intuitive strategies commonly adopted in real life and a recent algorithm from the literature. Our simulation results highlight the existence of the selfish routing problem and show that SCAIR outperforms the benchmarks in handling it. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.07775v2 |
https://arxiv.org/pdf/1909.07775v2.pdf | |
PWC | https://paperswithcode.com/paper/crowd-aware-itinerary-recommendation-a-game |
Repo | |
Framework | |
Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes
Title | Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes |
Authors | Jonas Kemp, Alvin Rajkomar, Andrew M. Dai |
Abstract | Clinical notes in electronic health records contain highly heterogeneous writing styles, including non-standard terminology or abbreviations. Using these notes in predictive modeling has traditionally required preprocessing (e.g. taking frequent terms or topic modeling) that removes much of the richness of the source data. We propose a pretrained hierarchical recurrent neural network model that parses minimally processed clinical notes in an intuitive fashion, and show that it improves performance for discharge diagnosis classification tasks on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset, compared to models that treat the notes as an unordered collection of terms or that conduct no pretraining. We also apply an attribution technique to examples to identify the words that the model uses to make its prediction, and show the importance of the words’ nearby context. |
Tasks | Language Modelling |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03039v3 |
https://arxiv.org/pdf/1909.03039v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-patient-classification-with-language |
Repo | |
Framework | |
Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction
Title | Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction |
Authors | Shichen Liu, Weikai Chen, Tianye Li, Hao Li |
Abstract | Rendering is the process of generating 2D images from 3D assets, simulated in a virtual environment, typically with a graphics pipeline. By inverting such a renderer, one can think of a learning approach to predict a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process from being differentiable and hence from being suitable for learning. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework 'soft rasterizer' as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer is able to generate an approximated silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need for 3D training data. Experimental results demonstrate that our approach significantly outperforms state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer can achieve results comparable to the cutting-edge supervised learning method, and in various cases even better ones, especially for real-world data. |
Tasks | |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.05567v2 |
http://arxiv.org/pdf/1901.05567v2.pdf | |
PWC | https://paperswithcode.com/paper/soft-rasterizer-differentiable-rendering-for |
Repo | |
Framework | |
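The fusion step described above (probabilistic contributions of all triangles per pixel) is commonly written as a probabilistic OR over per-triangle sigmoid maps of signed distance. A minimal sketch under that reading, with the per-pixel signed distances assumed precomputed and the sharpness parameter illustrative:

```python
import numpy as np

def soft_silhouette(signed_dists, sigma=1e-2):
    """signed_dists: (T, H, W) signed distance of each pixel to each of
    T projected triangles (positive inside). Each triangle contributes
    a probability map D_j = sigmoid(d_j / sigma), and the silhouette
    fuses them with the probabilistic OR  S = 1 - prod_j (1 - D_j)."""
    z = np.clip(signed_dists / sigma, -60.0, 60.0)  # avoid exp overflow
    D = 1.0 / (1.0 + np.exp(-z))
    return 1.0 - np.prod(1.0 - D, axis=0)
```

Because every pixel depends smoothly on every triangle's distance, a silhouette loss back-propagates to all vertices, which is what makes the rasterizer differentiable.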
E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings
Title | E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings |
Authors | Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang |
Abstract | Convolutional neural networks (CNNs) have been increasingly deployed to edge devices. Hence, many efforts have been made towards efficient CNN inference in resource-constrained platforms. This paper attempts to explore an orthogonal direction: how to conduct more energy-efficient training of CNNs, so as to enable on-device training. We strive to reduce the energy cost during training, by dropping unnecessary computations from three complementary levels: stochastic mini-batch dropping on the data level; selective layer update on the model level; and sign prediction for low-cost, low-precision back-propagation, on the algorithm level. Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training. For example, when training ResNet-74 on CIFAR-10, we achieve aggressive energy savings of >90% and >60%, while incurring a top-1 accuracy loss of only about 2% and 1.2%, respectively. When training ResNet-110 on CIFAR-100, an over 84% training energy saving is achieved without degrading inference accuracy. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13349v4 |
https://arxiv.org/pdf/1910.13349v4.pdf | |
PWC | https://paperswithcode.com/paper/191013349 |
Repo | |
Framework | |
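Of the three levels in the abstract, the data-level one (stochastic mini-batch dropping) is the simplest to sketch: each incoming mini-batch is skipped independently with some probability, saving its forward/backward cost. A toy training loop under that reading (names and drop rate are illustrative):

```python
import random

def train_with_smd(batches, train_step, p_drop=0.5, seed=0):
    """Stochastic mini-batch dropping: skip each mini-batch
    independently with probability p_drop; surviving batches train as
    usual. Returns how many batches were actually used."""
    rng = random.Random(seed)
    used = 0
    for batch in batches:
        if rng.random() < p_drop:
            continue          # dropped: no forward/backward pass
        train_step(batch)
        used += 1
    return used
```

In expectation this halves (at p_drop = 0.5) the per-epoch compute; the paper combines it with layer-level and algorithm-level savings to reach its reported totals.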
Veni Vidi Dixi: Reliable Wireless Communication with Depth Images
Title | Veni Vidi Dixi: Reliable Wireless Communication with Depth Images |
Authors | Serkut Ayvaşık, H. Murat Gürsu, Wolfgang Kellerer |
Abstract | The upcoming industrial revolution requires deployment of critical wireless sensor networks for automation and monitoring purposes. However, the reliability of wireless communication is rendered unpredictable by mobile elements in the communication environment, such as humans or mobile robots, which lead to dynamically changing radio environments. Changes in the wireless channel can be monitored with frequent pilot transmission. However, that would stress the battery life of sensors. In this work, a new wireless channel estimation technique, Veni Vidi Dixi (VVD), is proposed. VVD leverages the redundant information in depth images obtained from surveillance cameras in the communication environment and utilizes convolutional neural networks (CNNs) to map the depth images of the communication environment to complex wireless channel estimates. VVD increases wireless communication reliability without the need for frequent pilot transmission and with no additional complexity at the receiver. The proposed method is tested by conducting measurements in an indoor environment with a single mobile human. To the best of the authors' knowledge, our work is the first to obtain complex wireless channel estimates from depth images alone, without any pilot transmission. The collected wireless traces, depth images, and code are publicly available. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01879v1 |
https://arxiv.org/pdf/1912.01879v1.pdf | |
PWC | https://paperswithcode.com/paper/veni-vidi-dixi-reliable-wireless |
Repo | |
Framework | |
Contour Loss: Boundary-Aware Learning for Salient Object Segmentation
Title | Contour Loss: Boundary-Aware Learning for Salient Object Segmentation |
Authors | Zixuan Chen, Huajun Zhou, Xiaohua Xie, Jianhuang Lai |
Abstract | We present a learning model that makes full use of boundary information for salient object segmentation. Specifically, we come up with a novel loss function, i.e., Contour Loss, which leverages object contours to guide models to perceive salient object boundaries. Such a boundary-aware network can learn boundary-wise distinctions between salient objects and background, hence effectively facilitating saliency detection. Yet the Contour Loss emphasizes local saliency. We therefore further propose the hierarchical global attention module (HGAM), which forces the model to attend hierarchically to global contexts, thus capturing global visual saliency. Comprehensive experiments on six benchmark datasets show that our method achieves superior performance over state-of-the-art ones. Moreover, our model runs in real time at 26 fps on a TITAN X GPU. |
Tasks | Saliency Detection, Semantic Segmentation |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.01975v1 |
https://arxiv.org/pdf/1908.01975v1.pdf | |
PWC | https://paperswithcode.com/paper/contour-loss-boundary-aware-learning-for |
Repo | |
Framework | |
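A common way to realize a boundary-aware loss like the one described is to weight the per-pixel binary cross-entropy by a map that emphasizes pixels near the ground-truth contour, extracted with a morphological gradient (dilation minus erosion). A sketch under that assumption (the weighting scheme and constants are ours, not the paper's exact loss):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def contour_weight(mask, width=1, lam=5.0):
    """Weight map: 1 everywhere, plus lam on pixels near the object
    contour, found as (dilated mask) AND NOT (eroded mask)."""
    grown = binary_dilation(mask, iterations=width)
    shrunk = binary_erosion(mask, iterations=width)
    return 1.0 + lam * (grown & ~shrunk)

def contour_bce(pred, mask, eps=1e-7):
    """Binary cross-entropy weighted toward the object boundary."""
    w = contour_weight(mask)
    p = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-w * (mask * np.log(p) + (1 - mask) * np.log(1 - p))))

mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True                      # a 3x3 salient object
loss = contour_bce(np.full((7, 7), 0.5), mask)
```

Interior and far-background pixels keep weight 1 while boundary pixels get 1 + lam, so gradients concentrate on exactly the distinctions the abstract says the network should learn.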
Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters
Title | Training Distributed Deep Recurrent Neural Networks with Mixed Precision on GPU Clusters |
Authors | Alexey Svyatkovskiy, Julian Kates-Harbeck, William Tang |
Abstract | In this paper, we evaluate training of deep recurrent neural networks with half-precision floats. We implement a distributed, data-parallel, synchronous training algorithm by integrating TensorFlow and CUDA-aware MPI to enable execution across multiple GPU nodes and making use of high-speed interconnects. We introduce a learning rate schedule facilitating neural network convergence at up to $O(100)$ workers. Strong scaling tests performed on clusters of NVIDIA Pascal P100 GPUs show linear runtime and logarithmic communication time scaling for both single and mixed precision training modes. Performance is evaluated on a scientific dataset taken from the Joint European Torus (JET) tokamak, containing multi-modal time series of sensory measurements leading up to deleterious events called plasma disruptions, and the benchmark Large Movie Review Dataset~\cite{imdb}. Half precision significantly reduces memory and network bandwidth, allowing training of state-of-the-art models with over 70 million trainable parameters while achieving test set performance comparable to that of single precision. |
Tasks | Time Series |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00286v1 |
https://arxiv.org/pdf/1912.00286v1.pdf | |
PWC | https://paperswithcode.com/paper/training-distributed-deep-recurrent-neural |
Repo | |
Framework | |
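Mixed-precision training of the kind described typically keeps float32 "master" weights while gradients make a float16 round-trip, with a loss scale so small gradients do not underflow in half precision. A minimal single-step numpy sketch (the loss scale, learning rate, and gradient values are illustrative, not from the paper):

```python
import numpy as np

def mixed_precision_step(w_master, grad32, lr=0.1, loss_scale=1024.0):
    """One SGD step with loss scaling: scale the gradient, cast it to
    float16 (as it would be on half-precision hardware), then unscale
    in float32 before updating the float32 master weights."""
    grad16 = (grad32 * loss_scale).astype(np.float16)   # scaled half grad
    grad = grad16.astype(np.float32) / loss_scale        # unscale in fp32
    return w_master - lr * grad

w = np.zeros(2, dtype=np.float32)
g = np.array([1e-8, 1e-3], dtype=np.float32)  # first entry underflows in raw fp16
w_new = mixed_precision_step(w, g)
```

Without the scale, the 1e-8 gradient component rounds to zero in float16 and its weight never moves; with it, the update survives, which is the mechanism that lets half precision match single-precision accuracy.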