October 17, 2019


Paper Group ANR 840

Low-Resource Contextual Topic Identification on Speech. Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How. Binary Classification with Karmic, Threshold-Quasi-Concave Metrics. Adaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging Problems. Deep Video-Based Performance Cloning. ROI-10D …

Low-Resource Contextual Topic Identification on Speech


Title	Low-Resource Contextual Topic Identification on Speech
Authors	Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
Abstract	In topic identification (topic ID) on real-world unstructured audio, an audio instance with variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general-purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploiting the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model that is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one of them our attention-based contextual model significantly outperforms the context-independent models.
Tasks
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06204v2
PDF	http://arxiv.org/pdf/1807.06204v2.pdf
PWC	https://paperswithcode.com/paper/low-resource-contextual-topic-identification
Repo
Framework
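The abstract describes the attention-based contextual model only at a high level. As a rough illustration of "leveraging the contexts in a selective manner", here is a minimal NumPy sketch of dot-product attention from one segment over all segments of the same audio instance. The segment feature vectors, the projection matrix `w_query` and the function names are hypothetical; this is not the authors' architecture.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()


def contextual_segment_vector(segments, t, w_query):
    """Attend from segment t over all segments of the same audio instance.

    segments: (T, d) per-segment feature vectors (hypothetical input).
    w_query:  (d, d) projection used to score the segments (hypothetical parameter).
    Returns segment t's own vector concatenated with an attention-weighted
    context vector, plus the attention weights themselves.
    """
    segments = np.asarray(segments, dtype=float)
    query = segments[t] @ w_query        # (d,)
    scores = segments @ query            # (T,) one relevance score per segment
    weights = softmax(scores)            # selective weighting; sums to 1
    context = weights @ segments         # (d,) weighted sum of segment vectors
    return np.concatenate([segments[t], context]), weights


# Usage: 5 segments with 4-dimensional features; a topic classifier would
# consume `features` instead of segs[2] alone.
rng = np.random.default_rng(0)
segs = rng.normal(size=(5, 4))
features, attn = contextual_segment_vector(segs, t=2, w_query=np.eye(4))
```

The non-contextual baseline corresponds to feeding only `segs[t]` to the classifier; the attention weights let distant but relevant segments contribute more than irrelevant neighbors.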

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How


Title	Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How
Authors	Alberto Delmas, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Andreas Moshovos
Abstract	We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if, instead of targeting those weights and activations that are zero, we target different combinations of value stream properties. We demonstrate a practical application with Bit-Tactical (TCL), a hardware accelerator that exploits weight sparsity, per-layer precision variability and dynamic fine-grain precision reduction for activations, and optionally the naturally occurring sparse effectual bit content of activations, to improve performance and energy efficiency. TCL benefits both sparse and dense CNNs, natively supports both convolutional and fully-connected layers, and exploits properties of all activations to reduce storage, communication, and computation demands. While TCL does not require changes to the CNN to deliver benefits, it does reward any technique that amplifies any of the aforementioned weight and activation value properties. Compared to an equivalent data-parallel accelerator for dense CNNs, TCLp, a variant of TCL, improves performance by 5.05x and is 2.98x more energy efficient while requiring 22% more area.
Tasks
Published	2018-03-09
URL	http://arxiv.org/abs/1803.03688v1
PDF	http://arxiv.org/pdf/1803.03688v1.pdf
PWC	https://paperswithcode.com/paper/bit-tactical-exploiting-ineffectual
Repo
Framework
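To make the notion of ineffectual work concrete, the sketch below counts the bit-level work a bit-serial engine would do under three hypothetical accounting schemes: processing every weight/activation pair at full precision, skipping pairs with a zero operand, and processing only the 1-bits of activations paired with non-zero weights. It assumes non-negative integer activations (for example, post-ReLU fixed-point values) and illustrates the idea only; it does not model TCL's scheduling or hardware.

```python
import numpy as np


def bit_serial_work(weights, activations, act_bits=16):
    """Count single-bit multiply steps for weight/activation pairs.

    Returns (dense, zero_skipping, effectual_bits):
      dense          every pair processed at full activation precision
      zero_skipping  pairs with a zero weight or zero activation skipped
      effectual_bits only the 1-bits of activations paired with non-zero weights
    Activations must be non-negative integers that fit in `act_bits` bits.
    """
    w = np.asarray(weights).ravel()
    a = np.asarray(activations, dtype=np.int64).ravel()
    if w.shape != a.shape:
        raise ValueError("weights and activations must pair up one-to-one")
    if (a < 0).any() or (a >= (1 << act_bits)).any():
        raise ValueError("activations must be unsigned act_bits-bit integers")

    dense = w.size * act_bits
    zero_skipping = int(np.count_nonzero((w != 0) & (a != 0))) * act_bits
    popcount = np.array([bin(int(x)).count("1") for x in a], dtype=np.int64)
    effectual_bits = int(popcount[w != 0].sum())
    return dense, zero_skipping, effectual_bits


print(bit_serial_work([0, 3, 1, 2], [5, 0, 4, 1], act_bits=8))  # (32, 16, 2)
```

In this toy example, zero skipping removes half of the dense work, while counting only effectual bits leaves 2 of the 32 bit-steps, loosely the kind of gap between zero-value and bit-level properties that the abstract points to.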

Binary Classification with Karmic, Threshold-Quasi-Concave Metrics


Title	Binary Classification with Karmic, Threshold-Quasi-Concave Metrics
Authors	Bowei Yan, Oluwasanmi Koyejo, Kai Zhong, Pradeep Ravikumar
Abstract	Complex performance measures, beyond the popular measure of accuracy, are increasingly being used in the context of binary classification. These complex performance measures are typically not even decomposable, that is, the loss evaluated on a batch of samples cannot typically be expressed as a sum or average of losses evaluated at individual samples, which in turn requires new theoretical and methodological developments beyond standard treatments of supervised learning. In this paper, we advance this understanding of binary classification for complex performance measures by identifying two key properties: a so-called Karmic property, and a more technical threshold-quasi-concavity property, which we show is milder than existing structural assumptions imposed on performance measures. Under these properties, we show that the Bayes optimal classifier is a threshold function of the conditional probability of positive class. We then leverage this result to come up with a computationally practical plug-in classifier, via a novel threshold estimator, and further, provide a novel statistical analysis of classification error with respect to complex performance measures.
Tasks
Published	2018-06-02
URL	http://arxiv.org/abs/1806.00640v1
PDF	http://arxiv.org/pdf/1806.00640v1.pdf
PWC	https://paperswithcode.com/paper/binary-classification-with-karmic-threshold
Repo
Framework
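The abstract's central result is that, under the stated properties, the Bayes optimal classifier thresholds the conditional probability of the positive class. A generic plug-in recipe follows from that shape: estimate P(y=1|x) with any probabilistic model, then choose the threshold that maximizes the target metric on held-out data. The sketch below uses F1 as the metric and a brute-force search over observed scores; this search is a simple stand-in, not the paper's threshold estimator.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def plug_in_threshold(probs, y_true, metric=f1_score):
    """Return the threshold on estimated P(y=1|x) that maximizes `metric` on held-out data."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    best_t, best_score = 0.5, -np.inf
    for t in np.unique(probs):                     # each observed score gives one prediction set
        score = metric(y_true, (probs >= t).astype(int))
        if score > best_score:
            best_t, best_score = float(t), score
    return best_t, best_score


t, score = plug_in_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(t, score)  # 0.35 0.8
```

Note that the best threshold here is well below 0.5: for non-decomposable metrics such as F1, the optimal cut-off generally differs from the accuracy-optimal one.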

Adaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging Problems


Title	Adaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging Problems
Authors	Franziska Schirrmacher, Thomas Köhler, Christian Riess
Abstract	Inverse problems play a central role in many classical computer vision and image processing tasks. Many inverse problems are ill-posed and hence require a prior to regularize the solution space. However, many existing priors, like total variation, are based on ad-hoc assumptions that have difficulty representing the actual distribution of natural images. Thus, a key challenge in image processing research is to find priors better suited to representing natural images. In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior. It is based on a quantile filter, can be used as a joint filter on guidance data, and can be readily plugged into a wide range of numerical optimization algorithms. We demonstrate the efficacy of the proposed prior in joint RGB/depth upsampling, in RGB/NIR image restoration, and in a comparison with related regularization-by-denoising approaches.
Tasks	Denoising, Image Restoration
Published	2018-04-06
URL	https://arxiv.org/abs/1804.02152v2
PDF	https://arxiv.org/pdf/1804.02152v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-quantile-sparse-image-aquasi-prior
Repo
Framework
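The prior is built on a quantile filter. As an illustration only, the sketch below implements a plain 2D quantile filter (q = 0.5 gives a median filter) and a hypothetical regularizer that penalizes the L1 distance between an image and its quantile-filtered copy. The paper's adaptive, guided formulation and its optimization scheme are not reproduced here.

```python
import numpy as np


def quantile_filter(image, radius=1, q=0.5):
    """Replace each pixel by the q-quantile of its (2r+1) x (2r+1) neighborhood.

    Borders use edge replication; q = 0.5 gives a median filter.
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.quantile(padded[i:i + size, j:j + size], q)
    return out


def quantile_prior(image, radius=1, q=0.5):
    """Hypothetical regularizer: L1 distance between an image and its quantile-filtered copy."""
    img = np.asarray(image, dtype=float)
    return float(np.abs(img - quantile_filter(img, radius, q)).sum())


spike = np.zeros((3, 3))
spike[1, 1] = 9.0
print(quantile_prior(np.ones((4, 4))), quantile_prior(spike))  # 0.0 9.0
```

Flat regions cost nothing while an isolated outlier is penalized by its full magnitude; moving q below or above 0.5 biases the filter toward darker or brighter neighbors.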

Deep Video-Based Performance Cloning


Title	Deep Video-Based Performance Cloning
Authors	Kfir Aberman, Mingyi Shi, Jing Liao, Dani Lischinski, Baoquan Chen, Daniel Cohen-Or
Abstract	We present a new video-based performance cloning technique. After training a deep generative network using a reference video capturing the appearance and dynamics of a target actor, we are able to generate videos where this actor reenacts other performances. All of the training data and the driving performances are provided as ordinary video segments, without motion capture or depth information. Our generative model is realized as a deep neural network with two branches, both of which train the same space-time conditional generator, using shared weights. One branch, responsible for learning to generate the appearance of the target actor in various poses, uses paired training data, self-generated from the reference video. The second branch uses unpaired data to improve generation of temporally coherent video renditions of unseen pose sequences. We demonstrate a variety of promising results, where our method is able to generate temporally coherent videos, for challenging scenarios where the reference and driving videos consist of very different dance performances. Supplementary video: https://youtu.be/JpwsEeqNhhA.
Tasks	Motion Capture
Published	2018-08-21
URL	http://arxiv.org/abs/1808.06847v1
PDF	http://arxiv.org/pdf/1808.06847v1.pdf
PWC	https://paperswithcode.com/paper/deep-video-based-performance-cloning
Repo
Framework

ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape


Title	ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
Authors	Fabian Manhardt, Wadim Kehl, Adrien Gaidon
Abstract	We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval. We propose a novel loss formulation by lifting 2D detection, orientation, and scale estimation into 3D space. Instead of optimizing these quantities separately, the 3D instantiation allows us to properly measure the metric misalignment of boxes. We experimentally show that our 10D lifting of sparse 2D Regions of Interest (RoIs) achieves great results both for 6D pose and for recovery of the textured metric geometry of instances. This further enables 3D synthetic data augmentation via inpainting recovered meshes directly onto the 2D scenes. We evaluate on KITTI3D against other strong monocular methods and demonstrate that our approach doubles the AP on the 3D pose metrics on the official test set, defining the new state of the art.
Tasks	3D Object Detection, Data Augmentation, Object Detection
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02781v3
PDF	http://arxiv.org/pdf/1812.02781v3.pdf
PWC	https://paperswithcode.com/paper/roi-10d-monocular-lifting-of-2d-detection-to
Repo
Framework
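The key step in the abstract is lifting 2D quantities into 3D so that box misalignment can be measured metrically. A minimal sketch under simplifying assumptions: a pinhole camera with intrinsics K, a rotation about the vertical axis only (the paper estimates a full 6D pose), and a loss that averages the L1 distance between corresponding box corners. Function names, the corner ordering and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def backproject(u, v, depth, K):
    """Lift pixel (u, v) observed at metric depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def box_corners(center, dims, yaw):
    """Eight corners of a box with extents dims = (width, height, length).

    Width runs along x, height along y, length along z before rotating by
    `yaw` about the camera's y-axis (an assumed, yaw-only rotation).
    """
    w, h, l = dims
    x = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * w / 2
    y = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * h / 2
    z = np.array([1, -1, 1, -1, 1, -1, 1, -1]) * l / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (R @ np.vstack([x, y, z])).T + np.asarray(center, dtype=float)


def corner_loss(pred, target):
    """Mean over corners of the L1 distance between corresponding 3D corners."""
    return float(np.abs(pred - target).sum(axis=1).mean())


K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
center = backproject(350.0, 260.0, 15.0, K)
pred = box_corners(center, (1.6, 1.5, 3.9), yaw=0.1)
target = box_corners(center + np.array([0.2, 0.0, 0.5]), (1.6, 1.5, 3.9), yaw=0.0)
loss = corner_loss(pred, target)
```

Because an error in depth, rotation or extent shows up as a displacement of the same eight corners, a loss of this kind compares all quantities in one metric unit instead of weighting separate terms by hand.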

Applying Nature-Inspired Optimization Algorithms for Selecting Important Timestamps to Reduce Time Series Dimensionality


Title	Applying Nature-Inspired Optimization Algorithms for Selecting Important Timestamps to Reduce Time Series Dimensionality
Authors	Muhammad Marwan Muhammad Fuad
Abstract	Time series data account for a major part of the data available today. Time series mining handles several tasks such as classification, clustering, query-by-content, and prediction. Performing data mining tasks on raw time series is inefficient because these data are high-dimensional by nature. Instead, time series are first pre-processed using several techniques before different data mining tasks can be performed on them. In general, there are two main approaches to reducing time series dimensionality. The first is what we call landmark methods, which are based on finding characteristic features in the target time series. The second is based on data transformations, which map the time series from the original space into a reduced space where they can be managed more efficiently. The method we present in this paper applies a third approach: it projects a time series onto a lower-dimensional space by selecting important points in the time series. The novelty of our method is that these points are chosen not according to a geometric criterion, which is subjective in most cases, but through an optimization process. The other important characteristic of our method is that these important points are selected at the dataset level and not at the level of a single time series. The direct advantage of this strategy is that the distance defined on the low-dimensional space lower-bounds the original distance applied to the raw data. This enables us to apply the popular GEMINI algorithm. The promising results of our experiments on a wide variety of time series datasets, using different optimizers and applied to the two major data mining tasks, validate our new method.
Tasks	Time Series
Published	2018-12-09
URL	http://arxiv.org/abs/1812.03444v1
PDF	http://arxiv.org/pdf/1812.03444v1.pdf
PWC	https://paperswithcode.com/paper/applying-nature-inspired-optimization
Repo
Framework
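The lower-bounding property follows from a short argument: if the same timestamps are kept for every series in the dataset, the Euclidean distance on the kept points simply drops some non-negative squared terms, so it can never exceed the distance on the raw series. That is what makes a GEMINI-style filter-and-refine search free of false dismissals. The sketch below assumes Euclidean distance and takes the selected indices as given; choosing them with a nature-inspired optimizer, as the paper does, is out of scope, and `selected` is a hypothetical placeholder.

```python
import numpy as np


def euclidean(a, b):
    return float(np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2)))


def range_query(query, dataset, indices, eps):
    """Filter-and-refine range search.

    Filter: compare only the dataset-level selected timestamps `indices`.
    Refine: verify survivors with the full-length distance.
    Because the reduced distance lower-bounds the full one, no true match
    is discarded in the filter step.
    """
    query = np.asarray(query, dtype=float)
    data = np.asarray(dataset, dtype=float)
    reduced = np.sqrt(np.sum((data[:, indices] - query[indices]) ** 2, axis=1))
    candidates = np.flatnonzero(reduced <= eps)
    return [int(i) for i in candidates if euclidean(query, data[i]) <= eps]


rng = np.random.default_rng(1)
dataset = rng.normal(size=(200, 64))
query = dataset[0] + 0.1 * rng.normal(size=64)
selected = [3, 10, 17, 31, 40, 58]      # hypothetical output of the optimizer
matches = range_query(query, dataset, selected, eps=2.0)
```

The bound only holds because `selected` is shared by all series; per-series landmark points would compare different coordinates and lose this guarantee.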

Incremental Learning-to-Learn with Statistical Guarantees


Title	Incremental Learning-to-Learn with Statistical Guarantees
Authors	Giulia Denevi, Carlo Ciliberto, Dimitris Stamos, Massimiliano Pontil
Abstract	In learning-to-learn, the goal is to infer a learning algorithm that works well on a class of tasks sampled from an unknown meta distribution. In contrast to previous work on batch learning-to-learn, we consider a scenario where tasks are presented sequentially and the algorithm needs to adapt incrementally to improve its performance on future tasks. Key to this setting is for the algorithm to rapidly incorporate new observations into the model as they arrive, without keeping them in memory. We focus on the case where the underlying algorithm is ridge regression parameterized by a positive semidefinite matrix. We propose to learn this matrix by applying a stochastic strategy to minimize the empirical error incurred by ridge regression on future tasks sampled from the meta distribution. We study the statistical properties of the proposed algorithm and prove non-asymptotic bounds on its excess transfer risk, that is, the generalization performance on new tasks from the same meta distribution. We compare our online learning-to-learn approach with a state-of-the-art batch method, both theoretically and empirically.
Tasks
Published	2018-03-21
URL	http://arxiv.org/abs/1803.08089v1
PDF	http://arxiv.org/pdf/1803.08089v1.pdf
PWC	https://paperswithcode.com/paper/incremental-learning-to-learn-with
Repo
Framework
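The inner algorithm is ridge regression parameterized by a positive semidefinite matrix D. One common way to write this, assumed here rather than taken from the abstract, is w = argmin (1/n)||Xw - y||^2 + lambda w^T D^+ w over the range of D, with closed form w = D X^T (X D X^T + n lambda I)^{-1} y. With D = I it reduces to ordinary ridge regression, which the last lines of the block compute for comparison. The stochastic procedure for learning D across tasks is not shown.

```python
import numpy as np


def ridge_with_metric(X, y, D, lam):
    """Ridge regression biased by a PSD matrix D (assumed parameterization):

        w = argmin_w (1/n) ||X w - y||^2 + lam * w^T D^+ w,   w in range(D)

    Closed form: w = D X^T (X D X^T + n * lam * I)^{-1} y.
    Only an n x n system is solved, so no inverse of D is ever formed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    gram = X @ D @ X.T + n * lam * np.eye(n)
    return D @ X.T @ np.linalg.solve(gram, y)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=20)
lam = 0.1

# Sanity check: D = I should recover standard ridge, (X^T X + n*lam*I)^{-1} X^T y.
w_metric = ridge_with_metric(X, y, np.eye(5), lam)
w_ridge = np.linalg.solve(X.T @ X + 20 * lam * np.eye(5), X.T @ y)
```

In this formulation a low-rank D confines w to the subspace spanned by D's columns, which is one way a matrix learned across tasks can transfer shared structure to a new task.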

Image Recognition Using Scale Recurrent Neural Networks


Title	Image Recognition Using Scale Recurrent Neural Networks
Authors	Dong-Qing Zhang
Abstract	Convolutional Neural Networks (CNNs) have been widely used for image recognition with great success. However, the current CNN-based image recognition paradigm has a number of limitations. First, the receptive field of a CNN is generally fixed, which limits its recognition capacity when the input image is very large. Second, it lacks the computational scalability to deal with images of different sizes. Third, it is quite different from the human visual system for image recognition, which involves both feedforward and recurrent processing. This paper proposes a different paradigm of image recognition that can take advantage of variable scales of the input images, is more computationally scalable, and is more similar to image recognition by the human visual system. It is based on a recurrent neural network (RNN) defined on image scale with an embedded base CNN, named the Scale Recurrent Neural Network (SRNN). This RNN-based approach makes it easier to deal with images of variable sizes, and allows us to borrow existing RNN techniques, such as LSTM and GRU, to further enhance recognition accuracy. Our experiments show that the recognition accuracy of a base CNN can be significantly boosted using the proposed SRNN models. SRNN also significantly outperforms the scale ensemble method, which integrates the results of applying the CNN to the input image at different scales, while the computational overhead of using SRNN is negligible.
Tasks
Published	2018-03-25
URL	http://arxiv.org/abs/1803.09218v1
PDF	http://arxiv.org/pdf/1803.09218v1.pdf
PWC	https://paperswithcode.com/paper/image-recognition-using-scale-recurrent
Repo
Framework
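The abstract describes an RNN whose time axis is image scale, with a base CNN embedded at each step. The sketch below keeps only that skeleton: an image pyramid built by 2x average pooling, a hypothetical `base_features` function standing in for the base CNN (global statistics, so it accepts any input size), and a plain tanh recurrence run from the coarsest to the finest scale. The processing order, feature function and weights are assumptions; the paper also considers LSTM and GRU cells.

```python
import numpy as np


def downsample2(img):
    """2x average pooling; an odd trailing row/column is cropped."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def base_features(img):
    """Hypothetical stand-in for the embedded base CNN: size-independent global statistics."""
    return np.array([img.mean(), img.std(), img.max()])


def scale_rnn(image, W_x, W_h, b, n_scales=3):
    """Run a plain tanh RNN over an image pyramid, from the coarsest to the finest scale."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(n_scales - 1):
        pyramid.append(downsample2(pyramid[-1]))
    h = np.zeros(W_h.shape[0])
    for img in reversed(pyramid):                 # coarsest scale first
        h = np.tanh(W_x @ base_features(img) + W_h @ h + b)
    return h                                      # final state would feed a classifier


rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(8, 3)), 0.1 * rng.normal(size=(8, 8)), np.zeros(8)
state_large = scale_rnn(rng.random((64, 48)), W_x, W_h, b)
state_small = scale_rnn(rng.random((30, 30)), W_x, W_h, b)   # same weights, different size
```

Since the same weights are reused at every scale, images of different sizes need no architectural change, which mirrors the scalability argument in the abstract.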

Learning to detect dysarthria from raw speech


Title	Learning to detect dysarthria from raw speech
Authors	Juliette Millet, Neil Zeghidour
Abstract	Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, by selecting the relevant information for the task at hand. We explore an alternative to this selection by jointly learning the classifier and the feature extraction. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves over fixed features by 10% absolute accuracy. We also observe a gain over OpenSmile features by jointly learning the feature extraction, the normalization, and the compression factor with the architecture. This constitutes a first attempt at jointly learning all these operations from raw audio for a speech classification task.
Tasks	Speech Recognition
Published	2018-11-27
URL	http://arxiv.org/abs/1811.11101v2
PDF	http://arxiv.org/pdf/1811.11101v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-detect-dysarthria-from-raw-speech
Repo
Framework
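The front end in the abstract learns a filterbank, a normalization factor and a compression power jointly with the classifier. The forward pass of such a front end can be sketched as follows. For simplicity, this sketch applies the filterbank to a short-time power spectrum instead of learning filters directly on the waveform, and it uses a single scalar normalizer; both are assumptions, and training (backpropagating into these parameters) is omitted. The 16 kHz rate, frame sizes and random filterbank are hypothetical.

```python
import numpy as np


def learnable_frontend(waveform, filterbank, norm, power, frame_len=400, hop=160):
    """Framed power spectrum -> filterbank -> normalization -> power-law compression.

    filterbank: (n_filters, frame_len // 2 + 1) non-negative weights.
    norm > 0 and 0 < power <= 1 are scalars.
    In a trainable model all three would be parameters updated by backpropagation;
    here they are plain arrays/floats (hypothetical values).
    """
    x = np.asarray(waveform, dtype=float)
    if len(x) < frame_len:
        raise ValueError("waveform shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, frame_len // 2 + 1)
    energies = spectrum @ filterbank.T                      # (n_frames, n_filters)
    return (energies / norm) ** power                       # compressed features


rng = np.random.default_rng(0)
audio = rng.normal(size=16000)                 # 1 s of noise at an assumed 16 kHz rate
fbank = rng.random((40, 201))                  # 40 hypothetical filters over 201 FFT bins
features = learnable_frontend(audio, fbank, norm=10.0, power=0.3)
```

With fixed mel filters, a fixed normalizer and a log or fixed-power compression this reduces to a conventional feature pipeline; making the three components trainable is what the abstract compares against fixed features.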

Extreme Learning Machine for Graph Signal Processing


Title	Extreme Learning Machine for Graph Signal Processing
Authors	Arun Venkitaraman, Saikat Chatterjee, Peter Händel
Abstract	In this article, we improve extreme learning machines for regression tasks using a graph signal processing based regularization. We assume that the target signal for prediction or regression is a graph signal. With this assumption, we use the regularization to enforce that the output of an extreme learning machine is smooth over a given graph. Simulation results with real data confirm that such regularization helps significantly when the available training data is limited in size and corrupted by noise.
Tasks
Published	2018-03-12
URL	http://arxiv.org/abs/1803.04193v1
PDF	http://arxiv.org/pdf/1803.04193v1.pdf
PWC	https://paperswithcode.com/paper/extreme-learning-machine-for-graph-signal
Repo
Framework
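A minimal sketch of how a graph-smoothness regularizer can enter an extreme learning machine, assuming the smoothness of each predicted graph signal y is measured by y L y^T with a graph Laplacian L, and that the output weights also receive a ridge penalty. This objective is an illustration consistent with the abstract, not necessarily the paper's exact formulation; the path graph and all data below are made up.

```python
import numpy as np


def path_laplacian(m):
    """Combinatorial Laplacian L = D - A of an m-node path graph (example graph)."""
    A = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
    return np.diag(A.sum(axis=1)) - A


def graph_regularized_elm(H, T, L, alpha, beta):
    """Output weights W minimizing
        ||T - H W||_F^2 + alpha ||W||_F^2 + beta * tr((H W) L (H W)^T).

    H: (n, h) hidden-layer outputs, T: (n, m) targets whose rows are graph signals
    on m nodes, L: (m, m) symmetric graph Laplacian. The zero-gradient condition
        (H^T H + alpha I) W + beta H^T H W L = H^T T
    is solved by column-major vectorization, using vec(A X B) = (B^T kron A) vec(X).
    """
    h, m = H.shape[1], T.shape[1]
    A = H.T @ H
    M = np.kron(np.eye(m), A + alpha * np.eye(h)) + beta * np.kron(L.T, A)
    rhs = (H.T @ T).reshape(-1, order="F")
    return np.linalg.solve(M, rhs).reshape(h, m, order="F")


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                          # 30 training inputs
W_in, b_in = rng.normal(size=(3, 10)), rng.normal(size=10)
H = np.tanh(X @ W_in + b_in)                          # fixed random hidden layer (the ELM part)
T = rng.normal(size=(30, 6))                          # noisy targets: graph signals on 6 nodes
L = path_laplacian(6)
W = graph_regularized_elm(H, T, L, alpha=1e-2, beta=1.0)
Y = H @ W                                             # one predicted graph signal per row
```

Setting beta to 0 recovers the usual ridge-regularized ELM solution; larger beta trades data fit for predictions that vary less across connected nodes, which is the behavior the abstract relies on when training data are scarce and noisy.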

Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning


Title	Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning
Authors	Zheng Lian, Ya Li, Jianhua Tao, Jian Huang
Abstract	I have submitted a new version as arXiv:1910.13806. I forgot to choose the option to replace the old version and submitted a new one instead. It’s my mistake.
Tasks	Emotion Recognition, Speech Emotion Recognition, Transfer Learning
Published	2018-11-11
URL	https://arxiv.org/abs/1811.07691v2
PDF	https://arxiv.org/pdf/1811.07691v2.pdf
PWC	https://paperswithcode.com/paper/improving-speech-emotion-recognition-via
Repo
Framework

Faster Convergence & Generalization in DNNs


Title	Faster Convergence & Generalization in DNNs
Authors	Gaurav Singh, John Shawe-Taylor
Abstract	Deep neural networks have gained tremendous popularity in the last few years. They have been applied to classification tasks in almost every domain. Despite this success, deep networks can be incredibly slow to train, even for moderately sized models on sufficiently large datasets. Additionally, these networks require large amounts of data to generalize. The importance of speeding up convergence and improving generalization in deep networks cannot be overstated. In this work, we develop an optimization algorithm based on generalized-optimal updates derived from minibatches that lead to faster convergence. Towards the end, we demonstrate on two benchmark datasets that the proposed method achieves a two-orders-of-magnitude speedup over traditional back-propagation and is more robust to noise/over-fitting.
Tasks
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11414v3
PDF	http://arxiv.org/pdf/1807.11414v3.pdf
PWC	https://paperswithcode.com/paper/faster-convergence-generalization-in-dnns
Repo
Framework

Burst ranking for blind multi-image deblurring


Title	Burst ranking for blind multi-image deblurring
Authors	Fidel A. Guerrero Peña, Pedro D. Marrero Fernández, Tsang Ing Ren, Jorge J. G. Leandro, Ricardo Nishihara
Abstract	We propose a new incremental aggregation algorithm for multi-image deblurring with automatic image selection. The primary motivation is that current burst deblurring methods perform poorly when misaligned or out-of-context frames are present in the burst. These real-life situations result in poor reconstructions or require manual selection of the images that will be used for deblurring. Automatically selecting the best frames within the burst to improve the base reconstruction is challenging because the number of possible image fusions equals the cardinality of the power set of the burst. Here, we approach the multi-image deblurring problem as a two-step process. First, we successfully learn a comparison function to rank a burst of images using a deep convolutional neural network. Then, an incremental Fourier burst accumulation with a reconstruction degradation mechanism is applied, fusing only the less blurred images that are sufficient to maximize the reconstruction quality. Experiments with the proposed algorithm show superior results compared to similar approaches, outperforming other methods described in the literature in the situations described above. We validate our findings on several synthetic and real datasets.
Tasks	Deblurring
Published	2018-10-29
URL	http://arxiv.org/abs/1810.12121v2
PDF	http://arxiv.org/pdf/1810.12121v2.pdf
PWC	https://paperswithcode.com/paper/burst-ranking-for-blind-multi-image
Repo
Framework
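The fusion step builds on Fourier burst accumulation, in which each frequency of the output is a weighted average of the frames' Fourier coefficients, with weights proportional to a power p of their magnitudes, so the least attenuated (sharpest) frame dominates at each frequency. The sketch below implements that weighting without the magnitude smoothing used in the original accumulation method, and wraps it in a hypothetical incremental loop: frames are assumed to be aligned and already sorted by the ranking network, and a simple gradient-energy score stands in for the paper's reconstruction degradation mechanism. The exponent p = 7 is an arbitrary choice.

```python
import numpy as np


def fourier_burst_accumulation(frames, p=7):
    """Fuse aligned frames by a per-frequency weighted average of their spectra.

    weight_i(freq) = |F_i(freq)|^p / sum_j |F_j(freq)|^p
    p = 0 is plain averaging; large p approaches per-frequency max selection.
    """
    F = np.fft.fft2(np.asarray(frames, dtype=float), axes=(-2, -1))   # (k, H, W)
    mag = np.abs(F) ** p
    total = mag.sum(axis=0, keepdims=True)
    weights = np.where(total > 0, mag / np.where(total > 0, total, 1.0), 1.0 / F.shape[0])
    return np.real(np.fft.ifft2((weights * F).sum(axis=0)))


def sharpness(img):
    """Hypothetical quality score: mean squared finite-difference gradient."""
    gy, gx = np.gradient(img)
    return float(np.mean(gx ** 2 + gy ** 2))


def incremental_fba(ranked_frames, p=7):
    """Add frames in rank order (best first); stop when the fused result gets less sharp."""
    selected = [np.asarray(ranked_frames[0], dtype=float)]
    best = selected[0]
    for frame in ranked_frames[1:]:
        candidate = fourier_burst_accumulation(selected + [frame], p)
        if sharpness(candidate) < sharpness(best):
            break                                  # adding this frame degraded the result
        selected.append(np.asarray(frame, dtype=float))
        best = candidate
    return best, len(selected)


rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
blurred = (sharp + np.roll(sharp, 1, axis=0) + np.roll(sharp, 1, axis=1)) / 3.0
fused, n_used = incremental_fba([sharp, blurred, blurred])
```

Ranking first keeps the search linear in the burst length instead of exploring every subset, which is the combinatorial problem the abstract highlights.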

Kid-Net: Convolution Networks for Kidney Vessels Segmentation from CT-Volumes


Title	Kid-Net: Convolution Networks for Kidney Vessels Segmentation from CT-Volumes
Authors	Ahmed Taha, Pechin Lo, Junning Li, Tao Zhao
Abstract	Semantic image segmentation plays an important role in modeling patient-specific anatomy. We propose a convolutional neural network, called Kid-Net, along with a training scheme to segment kidney vessels: artery, vein and collecting system. Such segmentation is vital during the surgical planning phase, in which medical decisions are made before surgical incision. Our main contribution is a training scheme that handles unbalanced data, reduces false positives and enables high-resolution segmentation with a limited memory budget. These objectives are attained using dynamic weighting, random sampling and 3D patch segmentation. Manual medical image annotation is both time-consuming and expensive. Kid-Net reduces kidney vessel segmentation time from a matter of hours to minutes. It is trained end-to-end using 3D patches from volumetric CT images. A complete segmentation for a 512x512x512 CT volume is obtained within a few minutes (1-2 mins) by stitching the output 3D patches together. Feature down-sampling and up-sampling are utilized to achieve higher classification and localization accuracy. Quantitative and qualitative evaluation results on a challenging testing dataset demonstrate Kid-Net's competence.
Tasks	Semantic Segmentation
Published	2018-06-18
URL	http://arxiv.org/abs/1806.06769v1
PDF	http://arxiv.org/pdf/1806.06769v1.pdf
PWC	https://paperswithcode.com/paper/kid-net-convolution-networks-for-kidney
Repo
Framework
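The abstract obtains a full-volume segmentation by stitching the outputs of 3D patches together. A minimal sketch of that tiling-and-stitching step, assuming non-overlapping patches and a hypothetical `predict_patch` callable that returns one label per voxel (a fixed intensity threshold stands in for the trained network here):

```python
import numpy as np


def iter_patches(shape, patch):
    """Yield slice tuples tiling a 3D volume with non-overlapping patches.

    Patches at the far edges are smaller when a dimension is not a multiple of the patch size.
    """
    for z in range(0, shape[0], patch[0]):
        for y in range(0, shape[1], patch[1]):
            for x in range(0, shape[2], patch[2]):
                yield (slice(z, z + patch[0]), slice(y, y + patch[1]), slice(x, x + patch[2]))


def segment_volume(volume, predict_patch, patch=(64, 64, 64)):
    """Apply a patch-level predictor to every tile and stitch the label maps together."""
    labels = np.zeros(volume.shape, dtype=np.int64)
    for region in iter_patches(volume.shape, patch):
        labels[region] = predict_patch(volume[region])
    return labels


def threshold_predictor(patch):
    """Hypothetical stand-in for the network: label 1 where intensity exceeds 0.5."""
    return (patch > 0.5).astype(np.int64)


rng = np.random.default_rng(0)
ct = rng.random((50, 70, 33))               # toy volume; the abstract reports 512x512x512 scans
seg = segment_volume(ct, threshold_predictor, patch=(16, 16, 16))
```

Only one patch needs to be in memory at a time, which is how patch-wise inference fits a limited memory budget. Overlapping patches with averaged predictions are a common variant for reducing seam artifacts, though the abstract does not say which variant the authors use.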