Paper Group ANR 295
A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?
Title | A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion? |
Authors | Minh Le |
Abstract | Critical evaluation of word similarity datasets is very important for computational lexical semantics. This short report concerns the sanity check proposed in Batchkarov et al. (2016) to evaluate several popular datasets such as MC, RG and MEN – the first two reportedly failed. I argue that this test is unstable, offers no added insight, and needs major revision in order to fulfill its purported goal. |
Tasks | |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03819v1 |
http://arxiv.org/pdf/1707.03819v1.pdf | |
PWC | https://paperswithcode.com/paper/a-critique-of-a-critique-of-word-similarity |
Repo | |
Framework | |
On parameters transformations for emulating sparse priors using variational-Laplace inference
Title | On parameters transformations for emulating sparse priors using variational-Laplace inference |
Authors | Jean Daunizeau |
Abstract | So-called sparse estimators arise in the context of model fitting, when one a priori assumes that only a few (unknown) model parameters deviate from zero. Sparsity constraints can be useful when the estimation problem is under-determined, i.e., when the number of model parameters is much higher than the number of data points. Typically, such constraints are enforced by minimizing the L1 norm, which yields the so-called LASSO estimator. In this work, we propose a simple parameter transform that emulates sparse priors without sacrificing the simplicity and robustness of L2-norm regularization schemes. We show how L1 regularization can be obtained with a “sparsify” remapping of parameters under normal Bayesian priors, and we demonstrate the ensuing variational-Laplace approach using Monte-Carlo simulations. |
Tasks | |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.07168v1 |
http://arxiv.org/pdf/1703.07168v1.pdf | |
PWC | https://paperswithcode.com/paper/on-parameters-transformations-for-emulating |
Repo | |
Framework | |
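As a rough illustration of the idea in the abstract above, the sketch below remaps auxiliary parameters x to model parameters via theta = sign(x)·x², so that a plain L2 (Gaussian-prior) penalty on x behaves exactly like an L1 penalty on theta. The specific transform and penalty weight are illustrative assumptions, not necessarily the paper's exact "sparsify" mapping.

```python
import numpy as np

# Illustrative "sparsify" remapping (an assumption, not necessarily the paper's
# exact transform): theta = sign(x) * x**2, so |theta| = x**2 and a Gaussian
# (L2) penalty on the auxiliary parameters x equals an L1 penalty on theta.

def sparsify(x):
    """Remap auxiliary Gaussian parameters x to 'sparse' parameters theta."""
    return np.sign(x) * x**2

def l2_penalty(x, lam=1.0):
    """Penalty induced by a zero-mean normal prior on x (up to constants)."""
    return lam * np.sum(x**2)

x = np.random.randn(5)
theta = sparsify(x)
# The same penalty, expressed through theta, is exactly an L1 penalty:
assert np.isclose(l2_penalty(x), np.sum(np.abs(theta)))
```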
On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition
Title | On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition |
Authors | Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. Meyer |
Abstract | Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause behind the gain of such a combination is still unknown. We believe these representations provide the deep learning decoder with more discriminable cues. Our aim with this paper is to validate this hypothesis by performing experiments with three different recognition tasks (Aurora 4, CHiME 2 and CHiME 3) and assessing the discriminability of the information encoded by Gabor filterbank features. Additionally, to identify the contribution of low, medium and high temporal modulation frequencies, subsets of the Gabor filterbank were used as features (dubbed LTM, MTM and HTM, respectively). With temporal modulation frequencies between 16 and 25 Hz, HTM consistently outperformed the remaining subsets in every condition, highlighting the robustness of these representations against channel distortions, low signal-to-noise ratios and acoustically challenging real-life scenarios, with relative improvements of 11 to 56% over a Mel-filterbank-DNN baseline. To explain the results, a measure of similarity between phoneme classes, derived from DNN activations, is proposed and linked to their acoustic properties. We find this measure to be consistent with the observed error rates and highlight specific differences at the phoneme level to pinpoint the benefit of the proposed features. |
Tasks | Speech Recognition |
Published | 2017-02-14 |
URL | http://arxiv.org/abs/1702.04333v1 |
http://arxiv.org/pdf/1702.04333v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-relevance-of-auditory-based-gabor |
Repo | |
Framework | |
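To make the feature type above concrete, here is a minimal sketch of filtering a log-Mel spectrogram with a simplified spectro-temporal Gabor filter. The Gaussian envelope, filter sizes, frame rate and the roughly 20 Hz "HTM-like" temporal modulation frequency are illustrative assumptions, not the exact filterbank used in the paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_patch(temp_mod_hz, spec_mod_cyc, frame_rate=100.0, size_t=25, size_f=9):
    """Simplified spectro-temporal Gabor filter: Gaussian envelope x cosine carrier."""
    t = np.arange(size_t) - size_t // 2          # time axis (frames)
    f = np.arange(size_f) - size_f // 2          # frequency axis (Mel channels)
    T, F = np.meshgrid(t, f, indexing="ij")
    envelope = np.exp(-(T / (size_t / 4.0)) ** 2 - (F / (size_f / 4.0)) ** 2)
    carrier = np.cos(2 * np.pi * (temp_mod_hz / frame_rate) * T
                     + 2 * np.pi * spec_mod_cyc * F)
    g = envelope * carrier
    return g - g.mean()                          # remove the DC component

# Filter a (time x Mel-channel) log-Mel spectrogram with a high-temporal-
# modulation ("HTM-like", ~20 Hz) filter; the random array stands in for real features.
log_mel = np.random.randn(300, 40)
htm_like = convolve2d(log_mel, gabor_patch(20.0, 0.05), mode="same")
```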
Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles
Title | Real-Time Optical flow-based Video Stabilization for Unmanned Aerial Vehicles |
Authors | Anli Lim, Bharath Ramesh, Yue Yang, Cheng Xiang, Zhi Gao, Feng Lin |
Abstract | This paper describes the development of a novel algorithm to tackle the problem of real-time video stabilization for unmanned aerial vehicles (UAVs). There are two main components in the algorithm: (1) By designing a suitable model for the global motion of the UAV, the proposed algorithm avoids the need to estimate the most general motion model, the projective transformation, and instead considers simpler motion models, such as the rigid and similarity transformations. (2) To achieve a high processing speed, optical flow-based tracking is employed in lieu of the conventional tracking and matching methods used by state-of-the-art algorithms. These two ideas resulted in a real-time stabilization algorithm, developed in two stages. Stage I processes the whole sequence of frames in the video and achieves an average processing speed of 50 fps on several publicly available benchmark videos. Stage II then undertakes the task of real-time video stabilization using a multi-threaded implementation of the algorithm designed in Stage I. |
Tasks | Optical Flow Estimation |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03572v1 |
http://arxiv.org/pdf/1701.03572v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-optical-flow-based-video |
Repo | |
Framework | |
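A minimal sketch of the motion-estimation loop described above, assuming OpenCV: sparse corners are tracked with pyramidal Lucas-Kanade optical flow (instead of descriptor matching), and a similarity transform is fitted for each frame pair. The file name, feature parameters and the omitted trajectory-smoothing/warping stage are illustrative, not the authors' implementation.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")          # illustrative file name
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

transforms = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track corners with pyramidal Lucas-Kanade optical flow (no descriptor matching).
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=30)
    if pts is None:
        prev_gray = gray
        continue
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    # Similarity (4-DOF) motion model instead of a full projective transform.
    M, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
    transforms.append(M)
    prev_gray = gray

# Smoothing the accumulated trajectory and warping each frame by the difference
# between the raw and smoothed motion (e.g. with cv2.warpAffine) would complete
# the stabilizer.
```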
Adaptive Binarization for Weakly Supervised Affordance Segmentation
Title | Adaptive Binarization for Weakly Supervised Affordance Segmentation |
Authors | Johann Sawatzky, Juergen Gall |
Abstract | The concept of affordance is important for understanding the relevance of object parts for a certain functional interaction. Affordance types generalize across object categories and are not mutually exclusive. This makes the segmentation of affordance regions of objects in images a difficult task. In this work, we build on an iterative approach that learns a convolutional neural network for affordance segmentation from sparse keypoints. During this process, the predictions of the network need to be binarized. We propose an adaptive approach for binarization and estimate the parameters for initialization by approximated cross-validation. We evaluate our approach on two affordance datasets, where it outperforms the state of the art for weakly supervised affordance segmentation. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02850v1 |
http://arxiv.org/pdf/1707.02850v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-binarization-for-weakly-supervised |
Repo | |
Framework | |
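As an illustration of per-image adaptive binarization (not the paper's exact scheme, which initializes its parameters by approximated cross-validation), the sketch below picks an Otsu-style threshold for each predicted affordance heatmap instead of applying a fixed cut-off.

```python
import numpy as np

def otsu_threshold(heatmap, bins=256):
    """Adaptive threshold maximizing between-class variance of the heatmap values."""
    hist, edges = np.histogram(heatmap.ravel(), bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:i] * centers[:i]).sum() / w0
        mu1 = (hist[i:] * centers[i:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, centers[i]
    return best_t

heatmap = np.random.rand(128, 128)         # stand-in for a network prediction map
mask = heatmap > otsu_threshold(heatmap)   # per-image adaptive binary affordance mask
```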
A Correspondence Relaxation Approach for 3D Shape Reconstruction
Title | A Correspondence Relaxation Approach for 3D Shape Reconstruction |
Authors | Yong Khoo |
Abstract | This paper presents a new method for 3D shape reconstruction that builds on two existing methods. Both prior papers introduce 3D reconstruction from a single photograph: the first uses a photograph and a set of existing 3D models to generate the 3D object in the photograph, while the second uses a photograph and a selected similar model to create the 3D object in the photograph. Building on their differences, we propose a relaxation-based method for more accurate correspondence establishment and shape recovery. The experiments demonstrate promising results compared to state-of-the-art work on 3D shape estimation. |
Tasks | 3D Reconstruction |
Published | 2017-05-14 |
URL | http://arxiv.org/abs/1705.05016v1 |
http://arxiv.org/pdf/1705.05016v1.pdf | |
PWC | https://paperswithcode.com/paper/a-correspondence-relaxation-approach-for-3d |
Repo | |
Framework | |
Learning Multi-Modal Word Representation Grounded in Visual Context
Title | Learning Multi-Modal Word Representation Grounded in Visual Context |
Authors | Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari |
Abstract | Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results. |
Tasks | Word Embeddings |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03483v1 |
http://arxiv.org/pdf/1711.03483v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-multi-modal-word-representation |
Repo | |
Framework | |
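The sketch below shows one way a visual-context term could enter a skip-gram update, as a hedged illustration of the multimodal idea in the abstract above: the usual textual positive-pair gradient is combined with a pull of the word embedding towards a projected visual-context vector. The dimensions, the projection matrix P, the weight alpha and the omission of negative sampling are all simplifying assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, D_IMG = 1000, 100, 512
W_in = 0.01 * rng.standard_normal((V, D))    # word (target) embeddings
W_out = 0.01 * rng.standard_normal((V, D))   # context embeddings
P = 0.01 * rng.standard_normal((D_IMG, D))   # projects visual features into the word space

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multimodal_step(w, c, visual_ctx, lr=0.05, alpha=0.5):
    """One positive-pair skip-gram update plus a visual-context pull (negatives omitted)."""
    score = sigmoid(W_in[w] @ W_out[c])
    g = score - 1.0                          # gradient of -log sigmoid(u_w . v_c)
    grad_in = g * W_out[c]
    grad_out = g * W_in[w]
    v = visual_ctx @ P                       # visual context mapped to embedding space
    grad_in = grad_in + alpha * (W_in[w] - v)
    W_in[w] -= lr * grad_in
    W_out[c] -= lr * grad_out

multimodal_step(w=3, c=17, visual_ctx=rng.standard_normal(D_IMG))
```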
Towards Robust Neural Networks via Random Self-ensemble
Title | Towards Robust Neural Networks via Random Self-ensemble |
Authors | Xuanqing Liu, Minhao Cheng, Huan Zhang, Cho-Jui Hsieh |
Abstract | Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) that combines two important concepts: randomness and ensembling. To protect a targeted model, RSE adds random noise layers to the neural network to prevent strong gradient-based attacks, and ensembles the prediction over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models $f_\epsilon$ without any additional memory overhead, and that the proposed training procedure based on noisy stochastic gradient descent ensures the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real data sets. For instance, on CIFAR-10 with a VGG network (which has 92% accuracy without any attack), under a strong C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10% and the best previous defense technique achieves 48% accuracy, while our method still achieves 86% prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network. |
Tasks | |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00673v2 |
http://arxiv.org/pdf/1712.00673v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-robust-neural-networks-via-random |
Repo | |
Framework | |
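A minimal sketch of the two ingredients named above, using a tiny stand-in MLP: Gaussian "noise layers" perturb each layer's input, and inference averages the class probabilities over several noise draws. The layer sizes, noise level and number of draws are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 64)), np.zeros(64)
W2, b2 = rng.standard_normal((64, 10)), np.zeros(10)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def noisy_forward(x, sigma=0.1):
    # "Noise layer": perturb the input of each layer with Gaussian noise.
    h = np.maximum(0.0, (x + sigma * rng.standard_normal(x.shape)) @ W1 + b1)
    logits = (h + sigma * rng.standard_normal(h.shape)) @ W2 + b2
    return softmax(logits)

def rse_predict(x, n_draws=20):
    # Self-ensemble: average class probabilities over independent noise draws.
    return np.mean([noisy_forward(x) for _ in range(n_draws)], axis=0)

probs = rse_predict(rng.standard_normal(32))
print(probs.argmax(), probs.max())
```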
Unfolding and Shrinking Neural Machine Translation Ensembles
Title | Unfolding and Shrinking Neural Machine Translation Ensembles |
Authors | Felix Stahlberg, Bill Byrne |
Abstract | Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems because it is cumbersome and slow. This work aims to reduce the runtime to be on par with a single system without compromising the translation quality. First, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. We show that unfolding can already improve the runtime in practice since more work can be done on the GPU. We proceed by describing a set of techniques to shrink the unfolded network by reducing the dimensionality of layers. On Japanese-English we report that the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system. |
Tasks | Machine Translation |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03279v2 |
http://arxiv.org/pdf/1704.03279v2.pdf | |
PWC | https://paperswithcode.com/paper/unfolding-and-shrinking-neural-machine |
Repo | |
Framework | |
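To make the unfolding step concrete: for a single linear layer, the two ensemble members' weight matrices can be stacked block-diagonally so that one large matrix multiply evaluates both members, after which their outputs are averaged. The sketch below (with illustrative sizes) checks this equivalence; the shrinking step, which reduces the dimensionality of the unfolded layers, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 6
W_a = rng.standard_normal((d_in, d_out))   # layer weights of ensemble member A
W_b = rng.standard_normal((d_in, d_out))   # layer weights of ensemble member B

# Block-diagonal "unfolded" layer: one matmul computes both members at once.
W_unfolded = np.block([
    [W_a, np.zeros((d_in, d_out))],
    [np.zeros((d_in, d_out)), W_b],
])                                         # shape (2*d_in, 2*d_out)

x = rng.standard_normal(d_in)
x_dup = np.concatenate([x, x])             # duplicate the input for both blocks
y = x_dup @ W_unfolded
ensemble_avg = 0.5 * (y[:d_out] + y[d_out:])
assert np.allclose(ensemble_avg, 0.5 * (x @ W_a + x @ W_b))
```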
Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima
Title | Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima |
Authors | Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, Aarti Singh |
Abstract | We consider the problem of learning a one-hidden-layer neural network with non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$, in which both the convolutional weights $\mathbf{w}$ and the output weights $\mathbf{a}$ are parameters to be learned. When the labels are the outputs from a teacher network of the same architecture with fixed weights $(\mathbf{w}^*, \mathbf{a}^*)$, we prove that with Gaussian input $\mathbf{Z}$, there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, gradient descent with weight normalization from randomly initialized weights can still be proven to recover the true parameters with constant probability, which can be boosted to probability $1$ with multiple restarts. We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations. |
Tasks | |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00779v2 |
http://arxiv.org/pdf/1712.00779v2.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-learns-one-hidden-layer-cnn |
Repo | |
Framework | |
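The sketch below instantiates the model from the abstract, $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j a_j\sigma(\mathbf{w}^T\mathbf{Z}_j)$ with ReLU $\sigma$ and Gaussian patches, generates labels from a fixed teacher $(\mathbf{w}^*, \mathbf{a}^*)$, and runs plain gradient descent from a random start. The sizes, learning rate and the absence of weight normalization and restarts are simplifications, so a given run may well land in the spurious local minimum the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p, n = 4, 10, 512                      # patches per input, patch dim, samples
Z = rng.standard_normal((n, k, p))        # Gaussian non-overlapping patches
w_star = rng.standard_normal(p)           # teacher convolutional weights
a_star = rng.standard_normal(k)           # teacher output weights

def f(Z, w, a):
    """f(Z, w, a) = sum_j a_j * relu(w^T Z_j), evaluated for a batch."""
    return np.maximum(Z @ w, 0.0) @ a     # (n, k) -> (n,)

y = f(Z, w_star, a_star)                  # labels from the teacher network

w, a = rng.standard_normal(p), rng.standard_normal(k)
lr = 0.01
for _ in range(2000):
    pre = Z @ w                           # (n, k) pre-activations
    h = np.maximum(pre, 0.0)
    err = h @ a - y                       # (n,) residuals
    grad_a = h.T @ err / n
    grad_w = np.einsum("nk,nkp->p", (err[:, None] * a) * (pre > 0), Z) / n
    a -= lr * grad_a
    w -= lr * grad_w

# May be near zero (true parameters recovered) or not, if this random start
# converges to the spurious local minimum discussed in the paper.
print("train MSE:", np.mean((f(Z, w, a) - y) ** 2))
```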
Asian Stamps Identification and Classification System
Title | Asian Stamps Identification and Classification System |
Authors | Behzad Mahaseni, Nabhan D. Salih |
Abstract | In this paper, we address the problem of stamp recognition. The goal is to assign a given stamp to a country and to identify the year in which it was published. We propose a new approach for stamp recognition based on describing a given stamp image using color and texture information. For color information we use a color histogram over the entire image, and for texture we use two features: SIFT, which is based on local feature descriptors, and HOG, which is a dense texture descriptor. In total, we therefore have three different types of features. Our initial evaluation shows that, given this information, we are able to classify the images with reasonable accuracy. |
Tasks | |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05065v1 |
http://arxiv.org/pdf/1709.05065v1.pdf | |
PWC | https://paperswithcode.com/paper/asian-stamps-identification-and |
Repo | |
Framework | |
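A minimal sketch of extracting the three feature types named in the abstract with OpenCV: a global color histogram, HOG as a dense texture descriptor, and SIFT local descriptors. The image path, resize size and descriptor parameters are illustrative, and the classifier that would consume the features (as well as an encoding of the variable-length SIFT descriptor set) is omitted.

```python
import cv2
import numpy as np

img = cv2.imread("stamp.jpg")              # illustrative path

# 1) Global color histogram (8 bins per BGR channel), normalized.
color_hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                          [0, 256, 0, 256, 0, 256]).flatten()
color_hist /= color_hist.sum() + 1e-8

# 2) HOG on a fixed-size grayscale version (dense texture descriptor).
gray = cv2.cvtColor(cv2.resize(img, (128, 128)), cv2.COLOR_BGR2GRAY)
hog = cv2.HOGDescriptor((128, 128), (16, 16), (8, 8), (8, 8), 9)
hog_feat = hog.compute(gray).flatten()

# 3) SIFT local descriptors (requires a recent opencv-python build); these would
# typically be aggregated, e.g. with a bag-of-visual-words, before classification.
sift = cv2.SIFT_create()
keypoints, sift_desc = sift.detectAndCompute(gray, None)

features = np.concatenate([color_hist, hog_feat])   # plus an encoding of sift_desc
```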
Prosodic Event Recognition using Convolutional Neural Networks with Context Information
Title | Prosodic Event Recognition using Convolutional Neural Networks with Context Information |
Authors | Sabrina Stehwien, Ngoc Thang Vu |
Abstract | This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also speaker-independent cases. |
Tasks | |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00741v1 |
http://arxiv.org/pdf/1706.00741v1.pdf | |
PWC | https://paperswithcode.com/paper/prosodic-event-recognition-using |
Repo | |
Framework | |
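The sketch below illustrates the kind of input construction implied above: frame-based acoustic features for the current word and its neighbouring context words, plus a binary position feature marking which frames belong to the word being classified. The frame counts and feature dimension are illustrative assumptions.

```python
import numpy as np

def build_cnn_input(prev_word, cur_word, next_word):
    """Each argument: (num_frames, num_features) acoustic feature matrix."""
    frames = np.vstack([prev_word, cur_word, next_word])
    position = np.concatenate([
        np.zeros(len(prev_word)),      # context frames
        np.ones(len(cur_word)),        # frames of the word being classified
        np.zeros(len(next_word)),      # context frames
    ])
    # Append the position indicator as an extra feature column.
    return np.hstack([frames, position[:, None]])

x = build_cnn_input(np.random.randn(40, 6),
                    np.random.randn(55, 6),
                    np.random.randn(35, 6))
print(x.shape)   # the CNN would slide its filters over this time x feature matrix
```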
The Unconstrained Ear Recognition Challenge
Title | The Unconstrained Ear Recognition Challenge |
Authors | Žiga Emeršič, Dejan Štepec, Vitomir Štruc, Peter Peer, Anjith George, Adil Ahmad, Elshibani Omar, Terrance E. Boult, Reza Safdari, Yuxiang Zhou, Stefanos Zafeiriou, Dogucan Yaman, Fevziye I. Eyiokur, Hazim K. Ekenel |
Abstract | In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing. |
Tasks | Person Recognition |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.06997v2 |
http://arxiv.org/pdf/1708.06997v2.pdf | |
PWC | https://paperswithcode.com/paper/the-unconstrained-ear-recognition-challenge |
Repo | |
Framework | |
Multi-Label Annotation Aggregation in Crowdsourcing
Title | Multi-Label Annotation Aggregation in Crowdsourcing |
Authors | Xuan Wei, Daniel Dajun Zeng, Junming Yin |
Abstract | As a means of human-based computation, crowdsourcing has been widely used to annotate large-scale unlabeled datasets. One of the obvious challenges is how to aggregate these possibly noisy labels provided by a set of heterogeneous annotators. Another challenge stems from the difficulty of evaluating annotator reliability without knowing the ground truth, which can be used to build incentive mechanisms in crowdsourcing platforms. When each instance is associated with many possible labels simultaneously, the problem becomes even harder because of its combinatorial nature. In this paper, we present new flexible Bayesian models and efficient inference algorithms for multi-label annotation aggregation that take both annotator reliability and label dependency into account. Extensive experiments on real-world datasets confirm that the proposed methods outperform other competitive alternatives, and that the model can recover the type of the annotators with high accuracy. Moreover, we empirically find that a mixture of multiple independent Bernoulli distributions is able to accurately capture label dependency in this unsupervised multi-label annotation aggregation scenario. |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06120v1 |
http://arxiv.org/pdf/1706.06120v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-annotation-aggregation-in |
Repo | |
Framework | |
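As a small illustration of the mixture of independent Bernoulli distributions mentioned in the abstract, the sketch below fits such a mixture to binary multi-label vectors with EM on synthetic data. The annotator-reliability modelling and the full Bayesian treatment from the paper are omitted; the component count and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, K = 500, 6, 3                    # items, labels, mixture components
true_mu = rng.uniform(0.1, 0.9, size=(K, L))
z = rng.integers(0, K, size=N)
X = (rng.random((N, L)) < true_mu[z]).astype(float)   # observed binary label vectors

pi = np.full(K, 1.0 / K)               # mixing weights
mu = rng.uniform(0.3, 0.7, size=(K, L))  # per-component Bernoulli means
for _ in range(100):
    # E-step: responsibilities of each component for each item.
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and Bernoulli means.
    Nk = r.sum(axis=0)
    pi = Nk / N
    mu = np.clip((r.T @ X) / Nk[:, None], 1e-3, 1 - 1e-3)
```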
Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition
Title | Two-dimensional nonseparable discrete linear canonical transform based on CM-CC-CM-CC decomposition |
Authors | Soo-Chang Pei, Shih-Gu Huang |
Abstract | As a generalization of the two-dimensional Fourier transform (2D FT) and the 2D fractional Fourier transform, the 2D nonseparable linear canonical transform (2D NsLCT) is useful in optics and in signal and image processing. To reduce the digital implementation complexity of the 2D NsLCT, some previous works decomposed the 2D NsLCT into several low-complexity operations, including the 2D FT, 2D chirp multiplication (2D CM) and 2D affine transformations. However, 2D affine transformations introduce interpolation error. In this paper, we propose a new decomposition called the CM-CC-CM-CC decomposition, which decomposes the 2D NsLCT into two 2D CMs and two 2D chirp convolutions (2D CCs). No 2D affine transforms are involved. Simulation results show that the proposed methods have higher accuracy, lower computational complexity and smaller error in the additivity property compared with previous works. Moreover, the proposed methods have the perfect-reversibility property that the input signal/image can be reconstructed losslessly from the output. |
Tasks | |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1707.03688v1 |
http://arxiv.org/pdf/1707.03688v1.pdf | |
PWC | https://paperswithcode.com/paper/two-dimensional-nonseparable-discrete-linear |
Repo | |
Framework | |
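To give a flavour of the building blocks named above, the sketch below implements a 2D chirp multiplication (CM) and a 2D chirp convolution (CC, as pointwise multiplication in the FFT domain). The chirp-rate matrices and the discretization/normalization are illustrative assumptions; the actual CM-CC-CM-CC parameters are determined by the 2D NsLCT matrix being implemented.

```python
import numpy as np

def chirp_multiplication(img, C):
    """Multiply by a 2D chirp exp(j*pi*(x, y) C (x, y)^T / n) on the sampling grid."""
    n, m = img.shape
    y, x = np.meshgrid(np.arange(n) - n // 2, np.arange(m) - m // 2, indexing="ij")
    phase = C[0, 0] * x**2 + (C[0, 1] + C[1, 0]) * x * y + C[1, 1] * y**2
    return img * np.exp(1j * np.pi * phase / n)   # illustrative normalization

def chirp_convolution(img, C):
    """Circularly convolve with a 2D chirp via pointwise multiplication in the FFT domain."""
    kernel = chirp_multiplication(np.ones_like(img, dtype=complex), C)
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel))

img = np.random.rand(64, 64)
out = chirp_convolution(chirp_multiplication(img, 0.5 * np.eye(2)), 0.25 * np.eye(2))
```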