July 28, 2019

Paper Group ANR 295

A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?

Title A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
Authors Minh Le
Abstract Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN – the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03819v1
PDF http://arxiv.org/pdf/1707.03819v1.pdf
PWC https://paperswithcode.com/paper/a-critique-of-a-critique-of-word-similarity
Repo
Framework

On parameters transformations for emulating sparse priors using variational-Laplace inference

Title On parameters transformations for emulating sparse priors using variational-Laplace inference
Authors Jean Daunizeau
Abstract So-called sparse estimators arise in the context of model fitting, when one a priori assumes that only a few (unknown) model parameters deviate from zero. Sparsity constraints can be useful when the estimation problem is under-determined, i.e. when the number of model parameters is much higher than the number of data points. Typically, such constraints are enforced by minimizing the L1 norm, which yields the so-called LASSO estimator. In this work, we propose a simple parameter transform that emulates sparse priors without sacrificing the simplicity and robustness of L2-norm regularization schemes. We show how L1 regularization can be obtained with a “sparsify” remapping of parameters under normal Bayesian priors, and we demonstrate the ensuing variational Laplace approach using Monte-Carlo simulations.
Tasks
Published 2017-03-06
URL http://arxiv.org/abs/1703.07168v1
PDF http://arxiv.org/pdf/1703.07168v1.pdf
PWC https://paperswithcode.com/paper/on-parameters-transformations-for-emulating
Repo
Framework
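
The link between L2 and L1 penalties described in the abstract above can be illustrated with a minimal sketch. It assumes a signed-square remapping θ = x·|x|, which is a hypothetical choice made here for illustration; the paper's own “sparsify” transform may be defined differently. Under this assumption, an L2 penalty on the native parameter x equals an L1 penalty on the remapped parameter θ.

```python
import numpy as np

def sparsify(x):
    """Hypothetical remapping theta = x * |x| (signed square).

    This is an illustrative assumption, not the paper's definition.
    """
    return x * np.abs(x)

# Native parameters (these would carry the Gaussian / L2 prior).
x = np.array([-2.0, -0.5, 0.0, 0.1, 3.0])
theta = sparsify(x)

# Since |theta_i| = x_i^2, the two penalties coincide term by term.
l2_penalty_on_x = np.sum(x ** 2)
l1_penalty_on_theta = np.sum(np.abs(theta))
```

The sketch only compares penalty terms; it does not implement the variational Laplace scheme itself.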

On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition

Title On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition
Authors Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. Meyer
Abstract Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause behind the gain of such a combination is still unknown. We believe these representations provide the deep learning decoder with more discriminable cues. Our aim with this paper is to validate this hypothesis by performing experiments with three different recognition tasks (Aurora 4, CHiME 2 and CHiME 3) and assess the discriminability of the information encoded by Gabor filterbank features. Additionally, to identify the contribution of low, medium and high temporal modulation frequencies, subsets of the Gabor filterbank were used as features (dubbed LTM, MTM and HTM, respectively). With temporal modulation frequencies between 16 and 25 Hz, HTM consistently outperformed the remaining ones in every condition, highlighting the robustness of these representations against channel distortions, low signal-to-noise ratios and acoustically challenging real-life scenarios, with relative improvements from 11 to 56% against a Mel-filterbank-DNN baseline. To explain the results, a measure of similarity between phoneme classes from DNN activations is proposed and linked to their acoustic properties. We find this measure to be consistent with the observed error rates and highlight specific differences at the phoneme level to pinpoint the benefit of the proposed features.
Tasks Speech Recognition
Published 2017-02-14
URL http://arxiv.org/abs/1702.04333v1
PDF http://arxiv.org/pdf/1702.04333v1.pdf
PWC https://paperswithcode.com/paper/on-the-relevance-of-auditory-based-gabor
Repo
Framework

Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles

Title Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles
Authors Anli Lim, Bharath Ramesh, Yue Yang, Cheng Xiang, Zhi Gao, Feng Lin
Abstract This paper describes the development of a novel algorithm to tackle the problem of real-time video stabilization for unmanned aerial vehicles (UAVs). There are two main components in the algorithm: (1) By designing a suitable model for the global motion of the UAV, the proposed algorithm avoids the necessity of estimating the most general motion model, projective transformation, and considers simpler motion models, such as rigid transformation and similarity transformation. (2) To achieve a high processing speed, optical-flow-based tracking is employed in lieu of conventional tracking and matching methods used by state-of-the-art algorithms. These two new ideas resulted in a real-time stabilization algorithm, developed over two stages. Stage I considers processing the whole sequence of frames in the video while achieving an average processing speed of 50fps on several publicly available benchmark videos. Next, Stage II undertakes the task of real-time video stabilization using a multi-threading implementation of the algorithm designed in Stage I.
Tasks Optical Flow Estimation
Published 2017-01-13
URL http://arxiv.org/abs/1701.03572v1
PDF http://arxiv.org/pdf/1701.03572v1.pdf
PWC https://paperswithcode.com/paper/real-time-optical-flow-based-video
Repo
Framework
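
The simpler motion models mentioned in the abstract above can be illustrated with a least-squares fit of a similarity transform (scale, rotation, translation) to matched points. This is a sketch only: it assumes the point correspondences between two frames are already available (for example from an optical-flow tracker), and the function name `estimate_similarity` is hypothetical. It is not the authors' implementation.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares similarity transform mapping src points to dst points.

    src, dst: arrays of shape (N, 2) with matched coordinates (assumed given).
    Returns a 2x3 matrix [[a, -b, tx], [b, a, ty]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Even rows: x' = a*x - b*y + tx
    A[0::2, 0] = src[:, 0]
    A[0::2, 1] = -src[:, 1]
    A[0::2, 2] = 1.0
    # Odd rows: y' = b*x + a*y + ty
    A[1::2, 0] = src[:, 1]
    A[1::2, 1] = src[:, 0]
    A[1::2, 3] = 1.0
    rhs = dst.reshape(-1)  # interleaved [x0', y0', x1', y1', ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

# Synthetic check: apply a known transform, then recover it.
angle, scale = 0.1, 1.2
M_true = np.array([
    [scale * np.cos(angle), -scale * np.sin(angle), 3.0],
    [scale * np.sin(angle),  scale * np.cos(angle), -1.0],
])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = src @ M_true[:, :2].T + M_true[:, 2]
M_est = estimate_similarity(src, dst)
```

A similarity transform has four unknowns against eight for a projective one, which is one reason a restricted motion model can be cheaper to estimate per frame.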

Adaptive Binarization for Weakly Supervised Affordance Segmentation

Title Adaptive Binarization for Weakly Supervised Affordance Segmentation
Authors Johann Sawatzky, Juergen Gall
Abstract The concept of affordance is important to understand the relevance of object parts for a certain functional interaction. Affordance types generalize across object categories and are not mutually exclusive. This makes the segmentation of affordance regions of objects in images a difficult task. In this work, we build on an iterative approach that learns a convolutional neural network for affordance segmentation from sparse keypoints. During this process, the predictions of the network need to be binarized. In this work, we propose an adaptive approach for binarization and estimate the parameters for initialization by approximated cross validation. We evaluate our approach on two affordance datasets where our approach outperforms the state-of-the-art for weakly supervised affordance segmentation.
Tasks
Published 2017-07-10
URL http://arxiv.org/abs/1707.02850v1
PDF http://arxiv.org/pdf/1707.02850v1.pdf
PWC https://paperswithcode.com/paper/adaptive-binarization-for-weakly-supervised
Repo
Framework

A Correspondence Relaxation Approach for 3D Shape Reconstruction

Title A Correspondence Relaxation Approach for 3D Shape Reconstruction
Authors Yong Khoo
Abstract This paper presents a new method for 3D shape reconstruction based on two existing methods. A 3D reconstruction from a single photograph is introduced by both papers: the first one uses a photograph and a set of existing 3D models to generate the 3D object in the photograph, while the second one uses a photograph and a selected similar model to create the 3D object in the photograph. According to their difference, we propose a relaxation-based method for more accurate correspondence establishment and shape recovery. The experiment demonstrates promising results compared to the state-of-the-art work on 3D shape estimation.
Tasks 3D Reconstruction
Published 2017-05-14
URL http://arxiv.org/abs/1705.05016v1
PDF http://arxiv.org/pdf/1705.05016v1.pdf
PWC https://paperswithcode.com/paper/a-correspondence-relaxation-approach-for-3d
Repo
Framework

Learning Multi-Modal Word Representation Grounded in Visual Context

Title Learning Multi-Modal Word Representation Grounded in Visual Context
Authors Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari
Abstract Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.
Tasks Word Embeddings
Published 2017-11-09
URL http://arxiv.org/abs/1711.03483v1
PDF http://arxiv.org/pdf/1711.03483v1.pdf
PWC https://paperswithcode.com/paper/learning-multi-modal-word-representation
Repo
Framework

Towards Robust Neural Networks via Random Self-ensemble

Title Towards Robust Neural Networks via Random Self-ensemble
Authors Xuanqing Liu, Minhao Cheng, Huan Zhang, Cho-Jui Hsieh
Abstract Recent studies have revealed the vulnerability of deep neural networks: A small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent the strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and the proposed training procedure based on noisy stochastic gradient descent can ensure the ensemble model has a good predictive capability. Our algorithm significantly outperforms previous defense techniques on real data sets. For instance, on CIFAR-10 with a VGG network (which has 92% accuracy without any attack), under the strong C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10%, the best previous defense technique has 48% accuracy, while our method still has 86% prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network.
Tasks
Published 2017-12-02
URL http://arxiv.org/abs/1712.00673v2
PDF http://arxiv.org/pdf/1712.00673v2.pdf
PWC https://paperswithcode.com/paper/towards-robust-neural-networks-via-random
Repo
Framework
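
The two ingredients named in the abstract above, a random noise layer and an ensemble over noise draws, can be sketched on a toy linear classifier. Everything here is an assumption made for illustration: the model, the noise level `sigma`, the number of draws, and the function names are hypothetical, and the sketch covers prediction only, not the noisy training procedure or any attack.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def noisy_forward(x, W, b, sigma, rng):
    """One forward pass with a Gaussian noise layer applied to the input."""
    x_noisy = x + rng.normal(0.0, sigma, size=x.shape)
    return softmax(W @ x_noisy + b)

def self_ensemble_predict(x, W, b, sigma=0.1, n_draws=10, seed=0):
    """Average class probabilities over several independent noise draws."""
    rng = np.random.default_rng(seed)
    draws = [noisy_forward(x, W, b, sigma, rng) for _ in range(n_draws)]
    return np.mean(draws, axis=0)

# Toy 3-class linear model on 2-D inputs (illustrative values).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])
probs = self_ensemble_predict(x, W, b)
```

All draws share the same weights, so the averaging adds computation at prediction time but no extra stored parameters.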

Unfolding and Shrinking Neural Machine Translation Ensembles

Title Unfolding and Shrinking Neural Machine Translation Ensembles
Authors Felix Stahlberg, Bill Byrne
Abstract Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems because it is cumbersome and slow. This work aims to reduce the runtime to be on par with a single system without compromising the translation quality. First, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. We show that unfolding can already improve the runtime in practice since more work can be done on the GPU. We proceed by describing a set of techniques to shrink the unfolded network by reducing the dimensionality of layers. On Japanese-English we report that the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system.
Tasks Machine Translation
Published 2017-04-11
URL http://arxiv.org/abs/1704.03279v2
PDF http://arxiv.org/pdf/1704.03279v2.pdf
PWC https://paperswithcode.com/paper/unfolding-and-shrinking-neural-machine
Repo
Framework
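
The unfolding step described in the abstract above can be sketched for one-hidden-layer networks: stacking the hidden layers of the ensemble members and concatenating their output weights gives a single wider network with the same output as the averaged ensemble. The sketch assumes the average is taken over linear (pre-softmax) outputs so that the identity is exact; the toy networks and sizes are hypothetical and are not the NMT models from the paper.

```python
import numpy as np

def mlp(x, W1, W2):
    """One-hidden-layer network with tanh hidden units and linear outputs."""
    return W2 @ np.tanh(W1 @ x)

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n_models = 4, 5, 3, 3
models = [
    (rng.normal(size=(d_hid, d_in)), rng.normal(size=(d_out, d_hid)))
    for _ in range(n_models)
]

# Unfold: stack the hidden layers, concatenate the output weights,
# and fold the 1/n averaging factor into the output layer.
W1_big = np.vstack([W1 for W1, _ in models])             # (n*d_hid, d_in)
W2_big = np.hstack([W2 for _, W2 in models]) / n_models  # (d_out, n*d_hid)

x = rng.normal(size=d_in)
ensemble_out = np.mean([mlp(x, W1, W2) for W1, W2 in models], axis=0)
unfolded_out = mlp(x, W1_big, W2_big)
```

With more than one hidden layer, the intermediate weight matrices would become block-diagonal instead of simply stacked. The unfolded network is n times wider, which is what the shrinking step in the abstract then addresses.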

Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima

Title Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima
Authors Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh
Abstract We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, gradient descent with weight normalization from randomly initialized weights can still be proven to recover the true parameters with constant probability, which can be boosted to probability $1$ with multiple restarts. We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.
Tasks
Published 2017-12-03
URL http://arxiv.org/abs/1712.00779v2
PDF http://arxiv.org/pdf/1712.00779v2.pdf
PWC https://paperswithcode.com/paper/gradient-descent-learns-one-hidden-layer-cnn
Repo
Framework
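
The network in the abstract above, $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$ with ReLU activation $\sigma$, can be evaluated directly. The sketch assumes each column of `Z` holds one non-overlapping patch; that storage layout and the numeric values are illustrative assumptions.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def f(Z, w, a):
    """f(Z, w, a) = sum_j a_j * relu(w^T Z_j), with Z_j the j-th column of Z."""
    return float(a @ relu(Z.T @ w))

Z = np.array([[1.0, -1.0, 2.0],
              [0.0,  2.0, 1.0]])   # 3 patches of dimension 2, one per column
w = np.array([1.0, -1.0])          # shared convolutional filter
a = np.array([0.5, 2.0, -1.0])     # output weights
value = f(Z, w, a)
# Pre-activations w^T Z_j are (1, -3, 1); ReLU gives (1, 0, 1);
# the weighted sum is 0.5*1 + 2.0*0 - 1.0*1 = -0.5.
```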

Asian Stamps Identification and Classification System

Title Asian Stamps Identification and Classification System
Authors Behzad Mahaseni, Nabhan D. Salih
Abstract In this paper, we address the problem of stamp recognition. The goal is to classify a given stamp to a certain country and also identify the year it was published. We propose a new approach for stamp recognition based on describing a given stamp image using color information and texture information. For color information we use a color histogram for the entire image, and for texture we use two features: SIFT, which is based on local feature descriptors, and HOG, which is a dense texture descriptor. As a result, in total we have three different types of features. Our initial evaluation shows that, given this information, we are able to classify the images with reasonable accuracy.
Tasks
Published 2017-09-15
URL http://arxiv.org/abs/1709.05065v1
PDF http://arxiv.org/pdf/1709.05065v1.pdf
PWC https://paperswithcode.com/paper/asian-stamps-identification-and
Repo
Framework
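
The first of the three feature types in the abstract above, a color histogram over the entire image, can be sketched as follows. The per-channel binning, the bin count and the normalization are assumptions made for illustration, and a random array stands in for a real stamp image; the SIFT and HOG features are not covered here.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histogram of an RGB image.

    image: array of shape (H, W, 3) with values in [0, 255].
    Returns a vector of length 3 * bins, normalized to sum to 1.
    """
    image = np.asarray(image)
    hists = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    feature = np.concatenate(hists).astype(float)
    return feature / feature.sum()

rng = np.random.default_rng(0)
stamp = rng.integers(0, 256, size=(32, 32, 3))  # stand-in for a stamp image
feature = color_histogram(stamp)
```

A vector like this could be concatenated with texture descriptors before being passed to a classifier.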

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

Title Prosodic Event Recognition using Convolutional Neural Networks with Context Information
Authors Sabrina Stehwien, Ngoc Thang Vu
Abstract This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also speaker-independent cases.
Tasks
Published 2017-06-02
URL http://arxiv.org/abs/1706.00741v1
PDF http://arxiv.org/pdf/1706.00741v1.pdf
PWC https://paperswithcode.com/paper/prosodic-event-recognition-using
Repo
Framework

The Unconstrained Ear Recognition Challenge

Title The Unconstrained Ear Recognition Challenge
Authors Žiga Emeršič, Dejan Štepec, Vitomir Štruc, Peter Peer, Anjith George, Adil Ahmad, Elshibani Omar, Terrance E. Boult, Reza Safdari, Yuxiang Zhou, Stefanos Zafeiriou, Dogucan Yaman, Fevziye I. Eyiokur, Hazim K. Ekenel
Abstract In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
Tasks Person Recognition
Published 2017-08-23
URL http://arxiv.org/abs/1708.06997v2
PDF http://arxiv.org/pdf/1708.06997v2.pdf
PWC https://paperswithcode.com/paper/the-unconstrained-ear-recognition-challenge
Repo
Framework

Multi-Label Annotation Aggregation in Crowdsourcing

Title Multi-Label Annotation Aggregation in Crowdsourcing
Authors Xuan Wei, Daniel Dajun Zeng, Junming Yin
Abstract As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous annotators. Another challenge stems from the difficulty in evaluating the annotator reliability without even knowing the ground truth, which can be used to build incentive mechanisms in crowdsourcing platforms. When each instance is associated with many possible labels simultaneously, the problem becomes even harder because of its combinatorial nature. In this paper, we present new flexible Bayesian models and efficient inference algorithms for multi-label annotation aggregation by taking both annotator reliability and label dependency into account. Extensive experiments on real-world datasets confirm that the proposed methods outperform other competitive alternatives, and the model can recover the type of the annotators with high accuracy. Besides, we empirically find that the mixture of multiple independent Bernoulli distributions is able to accurately capture label dependency in this unsupervised multi-label annotation aggregation scenario.
Tasks
Published 2017-06-19
URL http://arxiv.org/abs/1706.06120v1
PDF http://arxiv.org/pdf/1706.06120v1.pdf
PWC https://paperswithcode.com/paper/multi-label-annotation-aggregation-in
Repo
Framework

Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition

Title Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition
Authors Soo-Chang Pei, Shih-Gu Huang
Abstract As a generalization of the two-dimensional Fourier transform (2D FT) and 2D fractional Fourier transform, the 2D nonseparable linear canonical transform (2D NsLCT) is useful in optics, signal and image processing. To reduce the digital implementation complexity of the 2D NsLCT, some previous works decomposed the 2D NsLCT into several low-complexity operations, including 2D FT, 2D chirp multiplication (2D CM) and 2D affine transformations. However, 2D affine transformations will introduce interpolation error. In this paper, we propose a new decomposition called CM-CC-CM-CC decomposition, which decomposes the 2D NsLCT into two 2D CMs and two 2D chirp convolutions (2D CCs). No 2D affine transforms are involved. Simulation results show that the proposed methods have higher accuracy, lower computational complexity and smaller error in the additivity property compared with the previous works. In addition, the proposed methods have a perfect reversibility property: one can reconstruct the input signal/image losslessly from the output.
Tasks
Published 2017-05-26
URL http://arxiv.org/abs/1707.03688v1
PDF http://arxiv.org/pdf/1707.03688v1.pdf
PWC https://paperswithcode.com/paper/two-dimensional-nonseparable-discrete-linear
Repo
Framework