Paper Group ANR 295
A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
Title | A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion? |
Authors | Minh Le |
Abstract | Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN – the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal. |
Tasks | |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03819v1 |
http://arxiv.org/pdf/1707.03819v1.pdf | |
PWC | https://paperswithcode.com/paper/a-critique-of-a-critique-of-word-similarity |
Repo | |
Framework | |
On parameters transformations for emulating sparse priors using variational-Laplace inference
Title | On parameters transformations for emulating sparse priors using variational-Laplace inference |
Authors | Jean Daunizeau |
Abstract | So-called sparse estimators arise in the context of model fitting, when one a priori assumes that only a few (unknown) model parameters deviate from zero. Sparsity constraints can be useful when the estimation problem is under-determined, i.e., when the number of model parameters is much higher than the number of data points. Typically, such constraints are enforced by minimizing the L1 norm, which yields the so-called LASSO estimator. In this work, we propose a simple parameter transform that emulates sparse priors without sacrificing the simplicity and robustness of L2-norm regularization schemes. We show how L1 regularization can be obtained with a “sparsify” remapping of parameters under normal Bayesian priors, and we demonstrate the ensuing variational-Laplace approach using Monte-Carlo simulations. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.07168v1 |
http://arxiv.org/pdf/1703.07168v1.pdf | |
PWC | https://paperswithcode.com/paper/on-parameters-transformations-for-emulating |
Repo | |
Framework | |
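As a rough illustration of the idea in the abstract above, the sketch below remaps auxiliary parameters x to model parameters via theta = sign(x)·x², so that a plain L2 (Gaussian-prior) penalty on x behaves exactly like an L1 penalty on theta. The specific transform and penalty weight are illustrative assumptions, not necessarily the paper's exact "sparsify" mapping.

```python
import numpy as np

# Illustrative "sparsify" remapping (an assumption, not necessarily the paper's
# exact transform): theta = sign(x) * x**2, so |theta| = x**2 and a Gaussian
# (L2) penalty on the auxiliary parameters x equals an L1 penalty on theta.

def sparsify(x):
    """Remap auxiliary Gaussian parameters x to 'sparse' parameters theta."""
    return np.sign(x) * x**2

def l2_penalty(x, lam=1.0):
    """Penalty induced by a zero-mean normal prior on x (up to constants)."""
    return lam * np.sum(x**2)

x = np.random.randn(5)
theta = sparsify(x)
# The same penalty, expressed through theta, is exactly an L1 penalty:
assert np.isclose(l2_penalty(x), np.sum(np.abs(theta)))
```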
On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition
Title | On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition |
Authors | Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. Meyer |
Abstract | Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause behind the gain of such a combination is still unknown. We believe these representations provide the deep learning decoder with more discriminable cues. Our aim with this paper is to validate this hypothesis by performing experiments with three different recognition tasks (Aurora 4, CHiME 2 and CHiME 3) and assessing the discriminability of the information encoded by Gabor filterbank features. Additionally, to identify the contribution of low, medium and high temporal modulation frequencies, subsets of the Gabor filterbank were used as features (dubbed LTM, MTM and HTM, respectively). With temporal modulation frequencies between 16 and 25 Hz, HTM consistently outperformed the remaining subsets in every condition, highlighting the robustness of these representations against channel distortions, low signal-to-noise ratios and acoustically challenging real-life scenarios, with relative improvements of 11 to 56% over a Mel-filterbank-DNN baseline. To explain the results, a measure of similarity between phoneme classes, derived from DNN activations, is proposed and linked to their acoustic properties. We find this measure to be consistent with the observed error rates and highlight specific differences at the phoneme level to pinpoint the benefit of the proposed features. |
Tasks | Speech Recognition |
Published | 2017-02-14 |
URL | http://arxiv.org/abs/1702.04333v1 |
http://arxiv.org/pdf/1702.04333v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-relevance-of-auditory-based-gabor |
Repo | |
Framework | |
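To make the feature type above concrete, here is a minimal sketch of filtering a log-Mel spectrogram with a simplified spectro-temporal Gabor filter. The Gaussian envelope, filter sizes, frame rate and the roughly 20 Hz "HTM-like" temporal modulation frequency are illustrative assumptions, not the exact filterbank used in the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_patch(temp_mod_hz, spec_mod_cyc, frame_rate=100.0, size_t=25, size_f=9):
    """Simplified spectro-temporal Gabor filter: Gaussian envelope x cosine carrier."""
    t = np.arange(size_t) - size_t // 2          # time axis (frames)
    f = np.arange(size_f) - size_f // 2          # frequency axis (Mel channels)
    T, F = np.meshgrid(t, f, indexing="ij")
    envelope = np.exp(-(T / (size_t / 4.0)) ** 2 - (F / (size_f / 4.0)) ** 2)
    carrier = np.cos(2 * np.pi * (temp_mod_hz / frame_rate) * T
                     + 2 * np.pi * spec_mod_cyc * F)
    g = envelope * carrier
    return g - g.mean()                          # remove the DC component

# Filter a (time x Mel-channel) log-Mel spectrogram with a high-temporal-
# modulation ("HTM-like", ~20 Hz) filter; the random array stands in for real features.
log_mel = np.random.randn(300, 40)
htm_like = convolve2d(log_mel, gabor_patch(20.0, 0.05), mode="same")
```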
Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles
Title | Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles |
Authors | Anli Lim, Bharath Ramesh, Yue Yang, Cheng Xiang, Zhi Gao, Feng Lin |
Abstract | This paper describes the development of a novel algorithm to tackle the problem of real-time video stabilization for unmanned aerial vehicles (UAVs). There are two main components in the algorithm: (1) By designing a suitable model for the global motion of the UAV, the proposed algorithm avoids the need to estimate the most general motion model, the projective transformation, and instead considers simpler motion models, such as the rigid and similarity transformations. (2) To achieve a high processing speed, optical flow-based tracking is employed in lieu of the conventional tracking and matching methods used by state-of-the-art algorithms. These two ideas resulted in a real-time stabilization algorithm, developed in two stages. Stage I processes the whole sequence of frames in the video and achieves an average processing speed of 50 fps on several publicly available benchmark videos. Stage II then undertakes the task of real-time video stabilization using a multi-threaded implementation of the algorithm designed in Stage I. |
Tasks | Optical Flow Estimation |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03572v1 |
http://arxiv.org/pdf/1701.03572v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-optical-flow-based-video |
Repo | |
Framework | |
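A minimal sketch of the motion-estimation loop described above, assuming OpenCV: sparse corners are tracked with pyramidal Lucas-Kanade optical flow (instead of descriptor matching), and a similarity transform is fitted for each frame pair. The file name, feature parameters and the omitted trajectory-smoothing/warping stage are illustrative, not the authors' implementation.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")          # illustrative file name
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

transforms = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track corners with pyramidal Lucas-Kanade optical flow (no descriptor matching).
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=30)
    if pts is None:
        prev_gray = gray
        continue
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    # Similarity (4-DOF) motion model instead of a full projective transform.
    M, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
    transforms.append(M)
    prev_gray = gray

# Smoothing the accumulated trajectory and warping each frame by the difference
# between the raw and smoothed motion (e.g. with cv2.warpAffine) would complete
# the stabilizer.
```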
Adaptive Binarization for Weakly Supervised Affordance Segmentation
Title | Adaptive Binarization for Weakly Supervised Affordance Segmentation |
Authors | Johann Sawatzky, Juergen Gall |
Abstract | The concept of affordance is important for understanding the relevance of object parts for a certain functional interaction. Affordance types generalize across object categories and are not mutually exclusive. This makes the segmentation of affordance regions of objects in images a difficult task. In this work, we build on an iterative approach that learns a convolutional neural network for affordance segmentation from sparse keypoints. During this process, the predictions of the network need to be binarized. We propose an adaptive approach for binarization and estimate the parameters for initialization by approximated cross-validation. We evaluate our approach on two affordance datasets, where it outperforms the state of the art for weakly supervised affordance segmentation. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02850v1 |
http://arxiv.org/pdf/1707.02850v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-binarization-for-weakly-supervised |
Repo | |
Framework | |
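As an illustration of per-image adaptive binarization (not the paper's exact scheme, which initializes its parameters by approximated cross-validation), the sketch below picks an Otsu-style threshold for each predicted affordance heatmap instead of applying a fixed cut-off.

```python
import numpy as np

def otsu_threshold(heatmap, bins=256):
    """Adaptive threshold maximizing between-class variance of the heatmap values."""
    hist, edges = np.histogram(heatmap.ravel(), bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:i] * centers[:i]).sum() / w0
        mu1 = (hist[i:] * centers[i:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, centers[i]
    return best_t

heatmap = np.random.rand(128, 128)         # stand-in for a network prediction map
mask = heatmap > otsu_threshold(heatmap)   # per-image adaptive binary affordance mask
```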
A Correspondence Relaxation Approach for 3D Shape Reconstruction
Title | A Correspondence Relaxation Approach for 3D Shape Reconstruction |
Authors | Yong Khoo |
Abstract | This paper presents a new method for 3D shape reconstruction that builds on two existing methods. Both prior papers introduce 3D reconstruction from a single photograph: the first uses a photograph and a set of existing 3D models to generate the 3D object in the photograph, while the second uses a photograph and a selected similar model to create the 3D object in the photograph. Building on their differences, we propose a relaxation-based method for more accurate correspondence establishment and shape recovery. The experiments demonstrate promising results compared to state-of-the-art work on 3D shape estimation. |
Tasks | 3D Reconstruction |
Published | 2017-05-14 |
URL | http://arxiv.org/abs/1705.05016v1 |
http://arxiv.org/pdf/1705.05016v1.pdf | |
PWC | https://paperswithcode.com/paper/a-correspondence-relaxation-approach-for-3d |
Repo | |
Framework | |
Learning Multi-Modal Word Representation Grounded in Visual Context
Title | Learning Multi-Modal Word Representation Grounded in Visual Context |
Authors | Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari |
Abstract | Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results. |
Tasks | Word Embeddings |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03483v1 |
http://arxiv.org/pdf/1711.03483v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-multi-modal-word-representation |
Repo | |
Framework | |
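The sketch below shows one way a visual-context term could enter a skip-gram update, as a hedged illustration of the multimodal idea in the abstract above: the usual textual positive-pair gradient is combined with a pull of the word embedding towards a projected visual-context vector. The dimensions, the projection matrix P, the weight alpha and the omission of negative sampling are all simplifying assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, D_IMG = 1000, 100, 512
W_in = 0.01 * rng.standard_normal((V, D))    # word (target) embeddings
W_out = 0.01 * rng.standard_normal((V, D))   # context embeddings
P = 0.01 * rng.standard_normal((D_IMG, D))   # projects visual features into the word space

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multimodal_step(w, c, visual_ctx, lr=0.05, alpha=0.5):
    """One positive-pair skip-gram update plus a visual-context pull (negatives omitted)."""
    score = sigmoid(W_in[w] @ W_out[c])
    g = score - 1.0                          # gradient of -log sigmoid(u_w . v_c)
    grad_in = g * W_out[c]
    grad_out = g * W_in[w]
    v = visual_ctx @ P                       # visual context mapped to embedding space
    grad_in = grad_in + alpha * (W_in[w] - v)
    W_in[w] -= lr * grad_in
    W_out[c] -= lr * grad_out

multimodal_step(w=3, c=17, visual_ctx=rng.standard_normal(D_IMG))
```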
Towards Robust Neural Networks via Random Self-ensemble
Title | Towards Robust Neural Networks via Random Self-ensemble |
Authors | Xuanqing Liu, Minhao Cheng, Huan Zhang, Cho-Jui Hsieh |
Abstract | Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) that combines two important concepts: randomness and ensembling. To protect a targeted model, RSE adds random noise layers to the neural network to prevent strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and that the proposed training procedure based on noisy stochastic gradient descent ensures the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real data sets. For instance, on CIFAR-10 with a VGG network (which has 92% accuracy without any attack), under a strong C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10% and the best previous defense technique achieves 48% accuracy, while our method still achieves 86% prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network. |
Tasks | |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00673v2 |
http://arxiv.org/pdf/1712.00673v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-robust-neural-networks-via-random |
Repo | |
Framework | |
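A minimal sketch of the two ingredients named above, using a tiny stand-in MLP: Gaussian "noise layers" perturb each layer's input, and inference averages the class probabilities over several noise draws. The layer sizes, noise level and number of draws are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 64)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 10)), np.zeros(10)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def noisy_forward(x, sigma=0.1):
    # "Noise layer": perturb the input of each layer with Gaussian noise.
    h = np.maximum(0.0, (x + sigma * rng.standard_normal(x.shape)) @ W1 + b1)
    logits = (h + sigma * rng.standard_normal(h.shape)) @ W2 + b2
    return softmax(logits)

def rse_predict(x, n_draws=20):
    # Self-ensemble: average class probabilities over independent noise draws.
    return np.mean([noisy_forward(x) for _ in range(n_draws)], axis=0)

probs = rse_predict(rng.standard_normal(32))
print(probs.argmax(), probs.max())
```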
Unfolding and Shrinking Neural Machine Translation Ensembles
Title | Unfolding and Shrinking Neural Machine Translation Ensembles |
Authors | Felix Stahlberg, Bill Byrne |
Abstract | Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems because it is cumbersome and slow. This work aims to reduce the runtime to be on par with a single system without compromising the translation quality. First, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. We show that unfolding can already improve the runtime in practice since more work can be done on the GPU. We proceed by describing a set of techniques to shrink the unfolded network by reducing the dimensionality of layers. On Japanese-English we report that the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system. |
Tasks | Machine Translation |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03279v2 |
http://arxiv.org/pdf/1704.03279v2.pdf | |
PWC | https://paperswithcode.com/paper/unfolding-and-shrinking-neural-machine |
Repo | |
Framework | |
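To make the unfolding step concrete: for a single linear layer, the two ensemble members' weight matrices can be stacked block-diagonally so that one large matrix multiply evaluates both members, after which their outputs are averaged. The sketch below (with illustrative sizes) checks this equivalence; the shrinking step, which reduces the dimensionality of the unfolded layers, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W_a = rng.standard_normal((d_in, d_out))   # layer weights of ensemble member A
W_b = rng.standard_normal((d_in, d_out))   # layer weights of ensemble member B

# Block-diagonal "unfolded" layer: one matmul computes both members at once.
W_unfolded = np.block([
    [W_a, np.zeros((d_in, d_out))],
    [np.zeros((d_in, d_out)), W_b],
])                                         # shape (2*d_in, 2*d_out)

x = rng.standard_normal(d_in)
x_dup = np.concatenate([x, x])             # duplicate the input for both blocks
y = x_dup @ W_unfolded
ensemble_avg = 0.5 * (y[:d_out] + y[d_out:])
assert np.allclose(ensemble_avg, 0.5 * (x @ W_a + x @ W_b))
```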
Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima
Title | Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima |
Authors | Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh |
Abstract | We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, gradient descent with weight normalization from randomly initialized weights can still be proven to recover the true parameters with constant probability, which can be boosted to probability $1$ with multiple restarts. We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations. |
Tasks | |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00779v2 |
http://arxiv.org/pdf/1712.00779v2.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-learns-one-hidden-layer-cnn |
Repo | |
Framework | |
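The sketch below instantiates the model from the abstract, $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$ with ReLU $\sigma$ and Gaussian patches, generates labels from a fixed teacher $(\mathbf{w}^*, \mathbf{a}^*)$, and runs plain gradient descent from a random start. The sizes, learning rate and the absence of weight normalization and restarts are simplifications, so a given run may well land in the spurious local minimum the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 4, 10, 512                      # patches per input, patch dim, samples
Z = rng.standard_normal((n, k, p))        # Gaussian non-overlapping patches
w_star = rng.standard_normal(p)           # teacher convolutional weights
a_star = rng.standard_normal(k)           # teacher output weights

def f(Z, w, a):
    """f(Z, w, a) = sum_j a_j * relu(w^T Z_j), evaluated for a batch."""
    return np.maximum(Z @ w, 0.0) @ a     # (n, k) -> (n,)

y = f(Z, w_star, a_star)                  # labels from the teacher network

w, a = rng.standard_normal(p), rng.standard_normal(k)
lr = 0.01
for _ in range(2000):
    pre = Z @ w                           # (n, k) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ a - y                       # (n,) residuals
    grad_a = h.T @ err / n
    grad_w = np.einsum("nk,nkp->p", (err[:, None] * a) * (pre > 0), Z) / n
    a -= lr * grad_a
    w -= lr * grad_w

# May be near zero (true parameters recovered) or not, if this random start
# converges to the spurious local minimum discussed in the paper.
print("train MSE:", np.mean((f(Z, w, a) - y) ** 2))
```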
Asian Stamps Identification and Classification System
Title | Asian Stamps Identification and Classification System |
Authors | Behzad Mahaseni, Nabhan D. Salih |
Abstract | In this paper, we address the problem of stamp recognition. The goal is to assign a given stamp to a country and to identify the year in which it was published. We propose a new approach for stamp recognition based on describing a given stamp image using color and texture information. For color information we use a color histogram over the entire image, and for texture we use two features: SIFT, which is based on local feature descriptors, and HOG, which is a dense texture descriptor. In total, we therefore have three different types of features. Our initial evaluation shows that, given this information, we are able to classify the images with reasonable accuracy. |
Tasks | |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05065v1 |
http://arxiv.org/pdf/1709.05065v1.pdf | |
PWC | https://paperswithcode.com/paper/asian-stamps-identification-and |
Repo | |
Framework | |
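A minimal sketch of extracting the three feature types named in the abstract with OpenCV: a global color histogram, HOG as a dense texture descriptor, and SIFT local descriptors. The image path, resize size and descriptor parameters are illustrative, and the classifier that would consume the features (as well as an encoding of the variable-length SIFT descriptor set) is omitted.

```python
import cv2
import numpy as np

img = cv2.imread("stamp.jpg")              # illustrative path

# 1) Global color histogram (8 bins per BGR channel), normalized.
color_hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                          [0, 256, 0, 256, 0, 256]).flatten()
color_hist /= color_hist.sum() + 1e-8

# 2) HOG on a fixed-size grayscale version (dense texture descriptor).
gray = cv2.cvtColor(cv2.resize(img, (128, 128)), cv2.COLOR_BGR2GRAY)
hog = cv2.HOGDescriptor((128, 128), (16, 16), (8, 8), (8, 8), 9)
hog_feat = hog.compute(gray).flatten()

# 3) SIFT local descriptors (requires a recent opencv-python build); these would
# typically be aggregated, e.g. with a bag-of-visual-words, before classification.
sift = cv2.SIFT_create()
keypoints, sift_desc = sift.detectAndCompute(gray, None)

features = np.concatenate([color_hist, hog_feat])   # plus an encoding of sift_desc
```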
Prosodic Event Recognition using Convolutional Neural Networks with Context Information
Title | Prosodic Event Recognition using Convolutional Neural Networks with Context Information |
Authors | Sabrina Stehwien, Ngoc Thang Vu |
Abstract | This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also speaker-independent cases. |
Tasks | |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00741v1 |
http://arxiv.org/pdf/1706.00741v1.pdf | |
PWC | https://paperswithcode.com/paper/prosodic-event-recognition-using |
Repo | |
Framework | |
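The sketch below illustrates the kind of input construction implied above: frame-based acoustic features for the current word and its neighbouring context words, plus a binary position feature marking which frames belong to the word being classified. The frame counts and feature dimension are illustrative assumptions.

```python
import numpy as np

def build_cnn_input(prev_word, cur_word, next_word):
    """Each argument: (num_frames, num_features) acoustic feature matrix."""
    frames = np.vstack([prev_word, cur_word, next_word])
    position = np.concatenate([
        np.zeros(len(prev_word)),      # context frames
        np.ones(len(cur_word)),        # frames of the word being classified
        np.zeros(len(next_word)),      # context frames
    ])
    # Append the position indicator as an extra feature column.
    return np.hstack([frames, position[:, None]])

x = build_cnn_input(np.random.randn(40, 6),
                    np.random.randn(55, 6),
                    np.random.randn(35, 6))
print(x.shape)   # the CNN would slide its filters over this time x feature matrix
```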
The Unconstrained Ear Recognition Challenge
Title | The Unconstrained Ear Recognition Challenge |
Authors | Žiga Emeršič, Dejan Štepec, Vitomir Štruc, Peter Peer, Anjith George, Adil Ahmad, Elshibani Omar, Terrance E. Boult, Reza Safdari, Yuxiang Zhou, Stefanos Zafeiriou, Dogucan Yaman, Fevziye I. Eyiokur, Hazim K. Ekenel |
Abstract | In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing. |
Tasks | Person Recognition |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.06997v2 |
http://arxiv.org/pdf/1708.06997v2.pdf | |
PWC | https://paperswithcode.com/paper/the-unconstrained-ear-recognition-challenge |
Repo | |
Framework | |
Multi-Label Annotation Aggregation in Crowdsourcing
Title | Multi-Label Annotation Aggregation in Crowdsourcing |
Authors | Xuan Wei, Daniel Dajun Zeng, Junming Yin |
Abstract | As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous annotators. Another challenge stems from the difficulty of evaluating annotator reliability without knowing the ground truth, which can be used to build incentive mechanisms in crowdsourcing platforms. When each instance is associated with many possible labels simultaneously, the problem becomes even harder because of its combinatorial nature. In this paper, we present new flexible Bayesian models and efficient inference algorithms for multi-label annotation aggregation that take both annotator reliability and label dependency into account. Extensive experiments on real-world datasets confirm that the proposed methods outperform other competitive alternatives, and that the model can recover the type of the annotators with high accuracy. Moreover, we empirically find that a mixture of multiple independent Bernoulli distributions is able to accurately capture label dependency in this unsupervised multi-label annotation aggregation scenario. |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06120v1 |
http://arxiv.org/pdf/1706.06120v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-annotation-aggregation-in |
Repo | |
Framework | |
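As a small illustration of the mixture of independent Bernoulli distributions mentioned in the abstract, the sketch below fits such a mixture to binary multi-label vectors with EM on synthetic data. The annotator-reliability modelling and the full Bayesian treatment from the paper are omitted; the component count and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, K = 500, 6, 3                    # items, labels, mixture components
true_mu = rng.uniform(0.1, 0.9, size=(K, L))
z = rng.integers(0, K, size=N)
X = (rng.random((N, L)) < true_mu[z]).astype(float)   # observed binary label vectors

pi = np.full(K, 1.0 / K)               # mixing weights
mu = rng.uniform(0.3, 0.7, size=(K, L))  # per-component Bernoulli means
for _ in range(100):
    # E-step: responsibilities of each component for each item.
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and Bernoulli means.
    Nk = r.sum(axis=0)
    pi = Nk / N
    mu = np.clip((r.T @ X) / Nk[:, None], 1e-3, 1 - 1e-3)
```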
Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition
Title | Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition |
Authors | Soo-Chang Pei, Shih-Gu Huang |
Abstract | As a generalization of the two-dimensional Fourier transform (2D FT) and the 2D fractional Fourier transform, the 2D nonseparable linear canonical transform (2D NsLCT) is useful in optics and in signal and image processing. To reduce the digital implementation complexity of the 2D NsLCT, some previous works decomposed the 2D NsLCT into several low-complexity operations, including the 2D FT, 2D chirp multiplication (2D CM) and 2D affine transformations. However, 2D affine transformations introduce interpolation error. In this paper, we propose a new decomposition called the CM-CC-CM-CC decomposition, which decomposes the 2D NsLCT into two 2D CMs and two 2D chirp convolutions (2D CCs). No 2D affine transforms are involved. Simulation results show that the proposed methods have higher accuracy, lower computational complexity and smaller error in the additivity property compared with previous works. Moreover, the proposed methods have the perfect-reversibility property that the input signal/image can be reconstructed losslessly from the output. |
Tasks | |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1707.03688v1 |
http://arxiv.org/pdf/1707.03688v1.pdf | |
PWC | https://paperswithcode.com/paper/two-dimensional-nonseparable-discrete-linear |
Repo | |
Framework | |
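To give a flavour of the building blocks named above, the sketch below implements a 2D chirp multiplication (CM) and a 2D chirp convolution (CC, as pointwise multiplication in the FFT domain). The chirp-rate matrices and the discretization/normalization are illustrative assumptions; the actual CM-CC-CM-CC parameters are determined by the 2D NsLCT matrix being implemented.

```python
import numpy as np

def chirp_multiplication(img, C):
    """Multiply by a 2D chirp exp(j*pi*(x, y) C (x, y)^T / n) on the sampling grid."""
    n, m = img.shape
    y, x = np.meshgrid(np.arange(n) - n // 2, np.arange(m) - m // 2, indexing="ij")
    phase = C[0, 0] * x**2 + (C[0, 1] + C[1, 0]) * x * y + C[1, 1] * y**2
    return img * np.exp(1j * np.pi * phase / n)   # illustrative normalization

def chirp_convolution(img, C):
    """Circularly convolve with a 2D chirp via pointwise multiplication in the FFT domain."""
    kernel = chirp_multiplication(np.ones_like(img, dtype=complex), C)
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel))

img = np.random.rand(64, 64)
out = chirp_convolution(chirp_multiplication(img, 0.5 * np.eye(2)), 0.25 * np.eye(2))
```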