Paper Group ANR 313
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Title | What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets |
Authors | Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci, Chu-Ren Huang |
Abstract | In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts compared to other related words. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08701v1 |
http://arxiv.org/pdf/1603.08701v1.pdf | |
PWC | https://paperswithcode.com/paper/what-a-nerd-beating-students-and-vector |
Repo | |
Framework | |
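The abstract above describes APSyn only in words: intersect the top-ranked contexts of two target words and weight each shared context by its rank in both lists. A minimal sketch of that idea follows. The inverse-of-average-rank weight is an assumption about the exact weighting, and the function name `apsyn`, the `top_n` parameter, and the toy context lists are hypothetical.

```python
def apsyn(contexts_a, contexts_b, top_n=1000):
    """Rank-weighted overlap of two ranked context lists (best context first)."""
    # 1-based rank of every context inside the top-N of each list
    rank_a = {ctx: r for r, ctx in enumerate(contexts_a[:top_n], start=1)}
    rank_b = {ctx: r for r, ctx in enumerate(contexts_b[:top_n], start=1)}
    shared = rank_a.keys() & rank_b.keys()
    # Assumed weighting: each shared context adds the inverse of its average rank
    return sum(1.0 / ((rank_a[c] + rank_b[c]) / 2.0) for c in shared)


# Hypothetical ranked contexts, most associated first
dog = ["bark", "leash", "pet", "tail"]
cat = ["purr", "pet", "tail", "whisker"]
car = ["engine", "wheel", "road", "tail"]

print(apsyn(dog, cat) > apsyn(dog, car))  # shared, highly ranked contexts score higher
```

Under this weighting, a context shared near the top of both lists contributes far more than one shared near the bottom, which is the behaviour the abstract's hypothesis calls for.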
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection
Title | PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection |
Authors | Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park |
Abstract | This paper presents how we can achieve state-of-the-art accuracy in the multi-category object detection task while minimizing the computational cost by adapting and combining recent technical innovations. Following the common pipeline of “CNN feature extraction + region proposal + RoI classification”, we mainly redesign the feature extraction part, since the region proposal part is not computationally expensive and the classification part can be efficiently compressed with common techniques like truncated SVD. Our design principle is “less channels with more layers” and adoption of some building blocks including concatenated ReLU, Inception, and HyperNet. The designed network is deep and thin and trained with the help of batch normalization, residual connections, and learning rate scheduling based on plateau detection. We obtained solid results on well-known object detection benchmarks: 83.8% mAP (mean average precision) on VOC2007 and 82.5% mAP on VOC2012 (2nd place), while taking only 750ms/image on Intel i7-6700K CPU with a single core and 46ms/image on NVIDIA Titan X GPU. Theoretically, our network requires only 12.3% of the computational cost compared to ResNet-101, the winner on VOC2012. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.08021v3 |
http://arxiv.org/pdf/1608.08021v3.pdf | |
PWC | https://paperswithcode.com/paper/pvanet-deep-but-lightweight-neural-networks |
Repo | |
Framework | |
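Concatenated ReLU, one of the building blocks named in the PVANET abstract, can be sketched in a few lines. This is only the basic operation on a flat list of activations; any scaling, shifting, or channel layout used in the actual network is not described in the abstract and is left out. The function name `crelu` is hypothetical.

```python
def crelu(channels):
    """Concatenated ReLU: ReLU of the input followed by ReLU of its negation."""
    positive = [max(0.0, x) for x in channels]
    negative = [max(0.0, -x) for x in channels]
    # The output has twice as many channels as the input
    return positive + negative


print(crelu([1.5, -2.0, 0.0]))
```

Because the output keeps both the positive and the negative response of each filter, a layer can use half as many filters for the same output width, which fits the "less channels with more layers" principle quoted above.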
A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps
Title | A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps |
Authors | Amr Suleiman, Zhengdong Zhang, Vivienne Sze |
Abstract | This paper presents a programmable, energy-efficient and real-time object detection accelerator using deformable parts models (DPM), with 2x higher accuracy than traditional rigid body models. With 8 deformable parts detection, three methods are used to address the high computational complexity: classification pruning for 33x fewer parts classification, vector quantization for 15x memory size reduction, and feature basis projection for 2x reduction of the cost of each classification. The chip is implemented in 65nm CMOS technology, and can process HD (1920x1080) images at 30fps without any off-chip storage while consuming only 58.6mW (0.94nJ/pixel, 1168 GOPS/W). The chip has two classification engines to simultaneously detect two different classes of objects. With a tested high throughput of 60fps, the classification engines can be time multiplexed to detect even more than two object classes. It is energy scalable by changing the pruning factor or disabling the parts classification. |
Tasks | Object Detection, Quantization, Real-Time Object Detection |
Published | 2016-07-27 |
URL | http://arxiv.org/abs/1607.08635v1 |
http://arxiv.org/pdf/1607.08635v1.pdf | |
PWC | https://paperswithcode.com/paper/a-586mw-real-time-programmable-object |
Repo | |
Framework | |
Information Utilization Ratio in Heuristic Optimization Algorithms
Title | Information Utilization Ratio in Heuristic Optimization Algorithms |
Authors | Junzhi Li, Ying Tan |
Abstract | Heuristic algorithms are able to optimize objective functions efficiently because they make intelligent use of the information about the objective functions. Thus, information utilization is critical to the performance of heuristics. However, the concept of information utilization has remained vague and abstract because there is no reliable metric to reflect the extent to which the information about the objective function is utilized by heuristic algorithms. In this paper, the metric of information utilization ratio (IUR) is defined, which is the ratio of the utilized information quantity over the acquired information quantity in the search process. The IUR proves to be well-defined. Several examples of typical heuristic algorithms are given to demonstrate the procedure of calculating the IUR. Empirical evidence on the correlation between the IUR and the performance of a heuristic is also provided. The IUR can be an index of how finely an algorithm is designed and guide the invention of new heuristics and the improvement of existing ones. |
Tasks | |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01643v2 |
http://arxiv.org/pdf/1604.01643v2.pdf | |
PWC | https://paperswithcode.com/paper/information-utilization-ratio-in-heuristic |
Repo | |
Framework | |
Neural Dataset Generality
Title | Neural Dataset Generality |
Authors | Ragav Venkatesan, Vijetha Gattupalli, Baoxin Li |
Abstract | Often the filters learned by Convolutional Neural Networks (CNNs) from different datasets appear similar. This is prominent in the first few layers. This similarity of filters is being exploited for the purposes of transfer learning and some studies have been made to analyse such transferability of features. This is also being used as an initialization technique for different tasks in the same dataset or for the same task in similar datasets. Off-the-shelf CNN features have capitalized on this idea to promote their networks as best transferable and most general and are used in a cavalier manner in day-to-day computer vision tasks. It is curious that while the filters learned by these CNNs are related to the atomic structures of the images from which they are learnt, all datasets learn similar looking low-level filters. With the understanding that a dataset containing many such atomic structures learns general filters and is therefore useful to initialize other networks with, we propose a way to analyse and quantify generality among datasets from their accuracies on transferred filters. We applied this metric to several popular character recognition and natural image datasets and a medical image dataset, and arrived at some interesting conclusions. On further experimentation we also discovered that particular classes in a dataset themselves are more general than others. |
Tasks | Transfer Learning |
Published | 2016-05-14 |
URL | http://arxiv.org/abs/1605.04369v1 |
http://arxiv.org/pdf/1605.04369v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-dataset-generality |
Repo | |
Framework | |
Highly-Smooth Zero-th Order Online Optimization
Title | Highly-Smooth Zero-th Order Online Optimization |
Authors | Francis Bach, Vianney Perchet |
Abstract | The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines. In this paper we consider convex optimization with noisy zero-th order information, that is noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as logistic regression. We show that as opposed to gradient-based algorithms, high-order smoothness may be used to improve estimation rates, with a precise dependence of our upper-bounds on the degree of smoothness. In particular, we show that for infinitely differentiable functions, we recover the same dependence on sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for both convex and strongly-convex functions, with finite horizon and anytime algorithms. Finally, we also recover similar results in the online optimization setting. |
Tasks | |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08165v1 |
http://arxiv.org/pdf/1605.08165v1.pdf | |
PWC | https://paperswithcode.com/paper/highly-smooth-zero-th-order-online |
Repo | |
Framework | |
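To make "zero-th order information" concrete: the optimizer may only query function values, so any gradient has to be estimated from evaluations. The sketch below uses plain coordinate-wise central differences on a noise-free function. It is not the estimator analysed in the paper above, whose construction the abstract does not give; `zeroth_order_gradient` and `delta` are hypothetical names.

```python
def zeroth_order_gradient(f, x, delta=1e-3):
    """Estimate the gradient of f at x using function evaluations only."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += delta
        backward[i] -= delta
        # Central difference along coordinate i: two evaluations per dimension
        grad.append((f(forward) - f(backward)) / (2.0 * delta))
    return grad


def sphere(v):
    """Smooth convex test function with gradient 2*v."""
    return sum(a * a for a in v)


print(zeroth_order_gradient(sphere, [1.0, -2.0]))  # close to [2.0, -4.0]
```

With noisy evaluations, the choice of `delta` trades bias against variance: a smaller `delta` reduces the approximation error but amplifies the noise in the difference quotient.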
Hierarchical Quickest Change Detection via Surrogates
Title | Hierarchical Quickest Change Detection via Surrogates |
Authors | Prithwish Chakraborty, Sathappan Muthiah, Ravi Tandon, Naren Ramakrishnan |
Abstract | Change detection (CD) in time series data is a critical problem as it reveals changes in the underlying generative processes driving the time series. Despite having received significant attention, one important unexplored aspect is how to efficiently utilize additional correlated information to improve the detection and the understanding of changepoints. We propose hierarchical quickest change detection (HQCD), a framework that formalizes the process of incorporating additional correlated sources for early changepoint detection. The core ideas behind HQCD are rooted in the theory of quickest detection and HQCD can be regarded as its novel generalization to a hierarchical setting. The sources are classified into targets and surrogates, and HQCD leverages this structure to systematically assimilate observed data to update changepoint statistics across layers. Decisions on actual changepoints are made by minimizing the delay while still maintaining reliability bounds. In addition, HQCD also uncovers interesting relations between changes at targets and changes across surrogates. We validate HQCD for reliability and performance against several state-of-the-art methods on both a synthetic dataset (known changepoints) and several real-life examples (unknown changepoints). Our experiments indicate that we gain significant robustness without loss of detection delay through HQCD. Our real-life experiments also showcase the usefulness of the hierarchical setting by connecting the surrogate sources (such as Twitter chatter) to target sources (such as employment-related protests that ultimately lead to major uprisings). |
Tasks | Time Series |
Published | 2016-03-31 |
URL | http://arxiv.org/abs/1603.09739v1 |
http://arxiv.org/pdf/1603.09739v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-quickest-change-detection-via |
Repo | |
Framework | |
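The HQCD abstract says the method is rooted in the theory of quickest detection. As background, here is a sketch of the classical single-stream CUSUM rule for a Gaussian mean shift. It is not HQCD itself (there is no hierarchy and no surrogates), and the function name, the known pre-change and post-change means, and the threshold are all assumptions made for illustration.

```python
def cusum_alarm(samples, mu0, mu1, threshold, sigma=1.0):
    """Return the index of the first CUSUM alarm, or None if none is raised."""
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) for one sample
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        # Resetting at zero keeps pre-change evidence from accumulating against an alarm
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None


# Toy series: the mean shifts from 0 to 1 at index 5
series = [0.0] * 5 + [1.0] * 5
print(cusum_alarm(series, mu0=0.0, mu1=1.0, threshold=1.2))
```

Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to when it mentions minimizing delay under reliability bounds.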
A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
Title | A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation |
Authors | Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin |
Abstract | In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward_p algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward_pβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward_pβ provides better cluster recovery than both Ward and Ward_p. |
Tasks | |
Published | 2016-11-03 |
URL | http://arxiv.org/abs/1611.01060v1 |
http://arxiv.org/pdf/1611.01060v1.pdf | |
PWC | https://paperswithcode.com/paper/a-ward_pbeta-effective-hierarchical |
Repo | |
Framework | |
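The second contribution above separates the feature weight exponent from the Minkowski exponent. A sketch of what such a dissimilarity could look like is given below. The exact form used by the authors is not stated in the abstract, so the formula (weights raised to `beta`, coordinate differences raised to `p`, no final root) is an assumption, as are the function and parameter names.

```python
def weighted_minkowski(x, centroid, weights, p=2.0, beta=2.0):
    """Feature-weighted Minkowski dissimilarity between a point and a centroid."""
    # Assumed form: sum over features of w_v**beta * |x_v - c_v|**p
    return sum(
        (w ** beta) * abs(a - c) ** p
        for a, c, w in zip(x, centroid, weights)
    )


point = [1.0, 2.0]
centre = [0.0, 0.0]
equal_weights = [0.5, 0.5]

print(weighted_minkowski(point, centre, equal_weights, p=2.0, beta=2.0))
print(weighted_minkowski(point, centre, equal_weights, p=1.0, beta=1.0))
```

Setting `beta` equal to `p` ties the two exponents together, while letting them differ gives the wider family of solutions the abstract mentions.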
An Expressive Probabilistic Temporal Logic
Title | An Expressive Probabilistic Temporal Logic |
Authors | Bruno Woltzenlogel Paleo |
Abstract | This paper argues that a combined treatment of probabilities, time and actions is essential for an appropriate logical account of the notion of probability; and, based on this intuition, describes an expressive probabilistic temporal logic for reasoning about actions with uncertain outcomes. The logic is modal and higher-order: modalities annotated by actions are used to express possibility and necessity of propositions in the next states resulting from the actions, and a higher-order function is needed to express the probability operator. The proposed logic is shown to be an adequate extension of classical mathematical probability theory, and its expressiveness is illustrated through the formalization of the Monty Hall problem. |
Tasks | |
Published | 2016-03-24 |
URL | http://arxiv.org/abs/1603.07453v2 |
http://arxiv.org/pdf/1603.07453v2.pdf | |
PWC | https://paperswithcode.com/paper/an-expressive-probabilistic-temporal-logic |
Repo | |
Framework | |
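The abstract above uses the Monty Hall problem as its illustration. For reference, the probabilities that any formalization has to reproduce can be checked by exact enumeration over the car position, the first pick, and the host's action. The sketch below is ordinary probability bookkeeping in Python, not the paper's logic, and it assumes the standard rules: a uniform car position and pick, and a host who opens a goat door uniformly at random among those allowed.

```python
from fractions import Fraction


def monty_hall():
    """Exact win probabilities for the 'stay' and 'switch' strategies."""
    doors = (0, 1, 2)
    stay = Fraction(0)
    switch = Fraction(0)
    for car in doors:
        for pick in doors:
            # The host may open any door that is neither the pick nor the car
            openable = [d for d in doors if d != pick and d != car]
            for opened in openable:
                prob = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(openable))
                remaining = next(d for d in doors if d not in (pick, opened))
                if pick == car:
                    stay += prob
                if remaining == car:
                    switch += prob
    return stay, switch


print(monty_hall())
```

The three nested loops mirror the three sources of uncertainty in the puzzle, and the outcome probabilities of each action multiply along a path, which is the kind of action-indexed reasoning the abstract describes.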
Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition
Title | Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition |
Authors | Xinglin Piao, Yongli Hu, Yanfeng Sun, Junbin Gao, Baocai Yin |
Abstract | In a sparse representation based recognition scheme, it is critical to learn a desired dictionary, aiming for both good representational power and discriminative performance. In this paper, we propose a new dictionary learning model for recognition applications, in which three strategies are adopted to achieve these two objectives simultaneously. First, a block-diagonal constraint is introduced into the model to eliminate the correlation between classes and enhance the discriminative performance. Second, a low-rank term is adopted to model the coherence within classes for refining the sparse representation of each class. Finally, instead of using the conventional over-complete dictionary, a specific dictionary constructed from the linear combination of the training samples is proposed to enhance the representational power of the dictionary and to improve the robustness of the sparse representation model. The proposed method is tested on several public datasets. The experimental results show the method outperforms most state-of-the-art methods. |
Tasks | Dictionary Learning |
Published | 2016-01-07 |
URL | http://arxiv.org/abs/1601.01432v2 |
http://arxiv.org/pdf/1601.01432v2.pdf | |
PWC | https://paperswithcode.com/paper/block-diagonal-sparse-representation-by |
Repo | |
Framework | |
Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
Title | Fast Stochastic Methods for Nonsmooth Nonconvex Optimization |
Authors | Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola |
Abstract | We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point for constant minibatches. Furthermore, using a variant of these algorithms, we show provably faster convergence than batch proximal gradient descent. Finally, we prove a global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions that subsumes several recent works. This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions. |
Tasks | |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06900v1 |
http://arxiv.org/pdf/1605.06900v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-stochastic-methods-for-nonsmooth |
Repo | |
Framework | |
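The proximal stochastic gradient method mentioned in the abstract above alternates a gradient step on the smooth part with a proximal step on the convex nonsmooth part. The sketch below shows one such step, assuming the nonsmooth term is an L1 penalty so that the proximal operator is soft-thresholding. The penalty choice and all names are illustrative assumptions, and this is the basic method, not the variance-reduced algorithms the paper develops.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1, applied coordinate-wise."""
    out = []
    for a in v:
        if a > tau:
            out.append(a - tau)
        elif a < -tau:
            out.append(a + tau)
        else:
            out.append(0.0)  # small coordinates are set exactly to zero
    return out


def prox_sgd_step(x, minibatch_grad, step, l1_weight):
    """One proximal stochastic gradient step for smooth loss + l1_weight * ||x||_1."""
    # Gradient step on the smooth (possibly nonconvex) part
    moved = [a - step * g for a, g in zip(x, minibatch_grad)]
    # Proximal step on the convex nonsmooth part
    return soft_threshold(moved, step * l1_weight)


print(prox_sgd_step([1.0, -0.25, -2.0], [0.0, 0.0, 0.0], step=0.5, l1_weight=1.0))
```

Here `minibatch_grad` stands for the average gradient of the smooth part over a sampled minibatch; the open question quoted in the abstract concerns whether iterating this step with a constant minibatch size reaches a stationary point.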
Neural Machine Translation Advised by Statistical Machine Translation
Title | Neural Machine Translation Advised by Statistical Machine Translation |
Authors | Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang |
Abstract | Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; Tu et al. 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate an SMT model into the NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and attention history). Then we employ an auxiliary classifier to score the SMT recommendations and a gating function to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets. |
Tasks | Machine Translation |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05150v2 |
http://arxiv.org/pdf/1610.05150v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-advised-by |
Repo | |
Framework | |
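The abstract above mentions a gating function that combines SMT recommendations with NMT generations at each decoding step. One simple way to picture this is a convex combination of two word distributions, sketched below. The interpolation form, the scalar gate, and the toy distributions are assumptions for illustration; in the paper the gate and the SMT scores are produced by trained components that the abstract does not specify.

```python
def gated_mixture(p_nmt, p_smt, gate):
    """Mix two word distributions; gate in [0, 1] is the weight given to SMT."""
    vocab = set(p_nmt) | set(p_smt)
    return {
        word: (1.0 - gate) * p_nmt.get(word, 0.0) + gate * p_smt.get(word, 0.0)
        for word in vocab
    }


# Hypothetical next-word distributions at one decoding step
nmt_probs = {"house": 0.5, "home": 0.5}
smt_probs = {"home": 1.0}

mixed = gated_mixture(nmt_probs, smt_probs, gate=0.5)
print(sorted(mixed.items()))
```

Since both inputs sum to one and the gate lies in [0, 1], the mixture is again a valid distribution, so it can be used directly for beam search or for a likelihood loss.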
An Uncertain Future: Forecasting from Static Images using Variational Autoencoders
Title | An Uncertain Future: Forecasting from Static Images using Variational Autoencoders |
Authors | Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert |
Abstract | In a given scene, humans can often easily predict a set of immediate future events that might happen. However, generalized pixel-level anticipation in computer vision systems is difficult because machine learning struggles with the ambiguity inherent in predicting the future. In this paper, we focus on predicting the dense trajectory of pixels in a scene, specifically what will move in the scene, where it will travel, and how it will deform over the course of one second. We propose a conditional variational autoencoder as a solution to this problem. In this framework, direct inference from the image shapes the distribution of possible trajectories, while latent variables encode any necessary information that is not available in the image. We show that our method is able to successfully predict events in a wide variety of scenes and can produce multiple different predictions when the future is ambiguous. Our algorithm is trained on thousands of diverse, realistic videos and requires absolutely no human labeling. In addition to non-semantic action prediction, we find that our method learns a representation that is applicable to semantic vision tasks. |
Tasks | |
Published | 2016-06-25 |
URL | http://arxiv.org/abs/1606.07873v1 |
http://arxiv.org/pdf/1606.07873v1.pdf | |
PWC | https://paperswithcode.com/paper/an-uncertain-future-forecasting-from-static |
Repo | |
Framework | |
Adversarial Diversity and Hard Positive Generation
Title | Adversarial Diversity and Hard Positive Generation |
Authors | Andras Rozsa, Ethan M. Rudd, Terrance E. Boult |
Abstract | State-of-the-art deep neural networks suffer from a fundamental problem - they misclassify adversarial examples formed by applying small perturbations to inputs. In this paper, we present a new psychometric perceptual adversarial similarity score (PASS) measure for quantifying adversarial images, introduce the notion of hard positive generation, and use a diverse set of adversarial perturbations - not just the closest ones - for data augmentation. We introduce a novel hot/cold approach for adversarial example generation, which provides multiple possible adversarial perturbations for every single image. The perturbations generated by our novel approach often correspond to semantically meaningful image structures, and allow greater flexibility to scale perturbation-amplitudes, which yields an increased diversity of adversarial images. We present adversarial images on several network topologies and datasets, including LeNet on the MNIST dataset, and GoogLeNet and ResidualNet on the ImageNet dataset. Finally, we demonstrate on LeNet and GoogLeNet that fine-tuning with a diverse set of hard positives improves the robustness of these networks compared to training with prior methods of generating adversarial images. |
Tasks | Data Augmentation |
Published | 2016-05-05 |
URL | http://arxiv.org/abs/1605.01775v2 |
http://arxiv.org/pdf/1605.01775v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-diversity-and-hard-positive |
Repo | |
Framework | |
Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks
Title | Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks |
Authors | Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira |
Abstract | Automatic Offline Handwritten Signature Verification has been researched over the last few decades from several perspectives, using insights from graphology, computer vision, and signal processing, among others. In spite of the advancements in the field, building classifiers that can separate genuine signatures from skilled forgeries (forgeries made targeting a particular signature) is still hard. We propose approaching the problem from a feature learning perspective. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, instead of using hand-crafted features that have no resemblance to the signature generation process. To this end, we use Deep Convolutional Neural Networks to learn features in a writer-independent format, and use this model to obtain a feature representation on another set of users, where we train writer-dependent classifiers. We tested our method on two datasets: GPDS-960 and Brazilian PUC-PR. Our experimental results show that the features learned in a subset of the users are discriminative for the other users, including across different datasets, reaching close to the state-of-the-art in the GPDS dataset, and improving the state-of-the-art in the Brazilian PUC-PR dataset. |
Tasks | |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00974v1 |
http://arxiv.org/pdf/1604.00974v1.pdf | |
PWC | https://paperswithcode.com/paper/writer-independent-feature-learning-for |
Repo | |
Framework | |