May 6, 2019


Paper Group ANR 313


What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets


Title	What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Authors	Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci, Chu-Ren Huang
Abstract	In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the shared contexts in the dependency ranked lists. This claim comes from the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts compared to other related words. To prove it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
Tasks
Published	2016-03-29
URL	http://arxiv.org/abs/1603.08701v1
PDF	http://arxiv.org/pdf/1603.08701v1.pdf
PWC	https://paperswithcode.com/paper/what-a-nerd-beating-students-and-vector
Repo
Framework
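
The rank-weighted intersection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract only says that shared contexts are weighted by rank, so the specific weight used here (the inverse of the average rank of each shared context) is an assumption, and the function names and the toy association scores are hypothetical.

```python
import math

def top_contexts(weights, n):
    """Map each of the n strongest contexts to its rank (1 = strongest)."""
    ranked = sorted(weights, key=weights.get, reverse=True)[:n]
    return {ctx: rank for rank, ctx in enumerate(ranked, start=1)}

def apsyn(weights_a, weights_b, n=1000):
    """Rank-weighted overlap of the top-n contexts of two words.

    Assumed weighting: each shared context adds 1 / (average of its two ranks).
    """
    ranks_a = top_contexts(weights_a, n)
    ranks_b = top_contexts(weights_b, n)
    shared = ranks_a.keys() & ranks_b.keys()
    return sum(1.0 / ((ranks_a[c] + ranks_b[c]) / 2.0) for c in shared)

def cosine(weights_a, weights_b):
    """Plain vector cosine over sparse context vectors, for comparison."""
    dot = sum(w * weights_b.get(c, 0.0) for c, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical association scores (e.g. PMI-like values) for three words.
car = {"drive": 5.0, "wheel": 4.0, "road": 3.0, "red": 1.0}
automobile = {"drive": 4.5, "wheel": 4.2, "engine": 3.0, "red": 0.5}
apple = {"eat": 5.0, "red": 4.0, "tree": 3.0}

print(apsyn(car, automobile, n=3), apsyn(car, apple, n=3))
print(cosine(car, automobile), cosine(car, apple))
```

With `n=3`, "car" and "automobile" share two of their top contexts at the same ranks, while "car" and "apple" share none, even though "red" appears in both full vectors and gives them a non-zero cosine.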

PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection


Title	PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection
Authors	Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park
Abstract	This paper presents how we can achieve state-of-the-art accuracy in the multi-category object detection task while minimizing the computational cost by adapting and combining recent technical innovations. Following the common pipeline of “CNN feature extraction + region proposal + RoI classification”, we mainly redesign the feature extraction part, since the region proposal part is not computationally expensive and the classification part can be efficiently compressed with common techniques like truncated SVD. Our design principle is “less channels with more layers” and adoption of some building blocks including concatenated ReLU, Inception, and HyperNet. The designed network is deep and thin and trained with the help of batch normalization, residual connections, and learning rate scheduling based on plateau detection. We obtained solid results on well-known object detection benchmarks: 83.8% mAP (mean average precision) on VOC2007 and 82.5% mAP on VOC2012 (2nd place), while taking only 750ms/image on Intel i7-6700K CPU with a single core and 46ms/image on NVIDIA Titan X GPU. Theoretically, our network requires only 12.3% of the computational cost compared to ResNet-101, the winner on VOC2012.
Tasks	Object Detection, Real-Time Object Detection
Published	2016-08-29
URL	http://arxiv.org/abs/1608.08021v3
PDF	http://arxiv.org/pdf/1608.08021v3.pdf
PWC	https://paperswithcode.com/paper/pvanet-deep-but-lightweight-neural-networks
Repo
Framework
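
Of the building blocks named in the abstract, concatenated ReLU is the simplest to show in isolation. The sketch below covers only the basic operation (apply ReLU to the activations and to their negation, then concatenate along the channel axis); the network in the paper may wrap it in further learned layers. The function name and the plain-list representation of a channel vector are illustrative assumptions.

```python
def crelu(channels):
    """Concatenated ReLU on one vector of channel activations.

    Output has twice as many channels: ReLU(x) followed by ReLU(-x).
    """
    positive = [max(v, 0.0) for v in channels]
    negative = [max(-v, 0.0) for v in channels]
    return positive + negative

print(crelu([1.0, -2.0, 0.5]))  # [1.0, 0.0, 0.5, 0.0, 2.0, 0.0]
```

Because the output carries both the positive and the negative phase of each filter response, a layer can use half as many convolution filters for the same output width, which fits the stated goal of reducing computation.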

A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps


Title	A 58.6mW Real-Time Programmable Object Detector with Multi-Scale Multi-Object Support Using Deformable Parts Model on 1920x1080 Video at 30fps
Authors	Amr Suleiman, Zhengdong Zhang, Vivienne Sze
Abstract	This paper presents a programmable, energy-efficient and real-time object detection accelerator using deformable parts models (DPM), with 2x higher accuracy than traditional rigid body models. With 8 deformable parts detection, three methods are used to address the high computational complexity: classification pruning for 33x fewer parts classification, vector quantization for 15x memory size reduction, and feature basis projection for 2x reduction of the cost of each classification. The chip is implemented in 65nm CMOS technology, and can process HD (1920x1080) images at 30fps without any off-chip storage while consuming only 58.6mW (0.94nJ/pixel, 1168 GOPS/W). The chip has two classification engines to simultaneously detect two different classes of objects. With a tested high throughput of 60fps, the classification engines can be time multiplexed to detect even more than two object classes. It is energy scalable by changing the pruning factor or disabling the parts classification.
Tasks	Object Detection, Quantization, Real-Time Object Detection
Published	2016-07-27
URL	http://arxiv.org/abs/1607.08635v1
PDF	http://arxiv.org/pdf/1607.08635v1.pdf
PWC	https://paperswithcode.com/paper/a-586mw-real-time-programmable-object
Repo
Framework
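
The per-pixel energy figure quoted in the abstract follows directly from the power, resolution, and frame rate. A short check, using only numbers from the abstract (the function name is hypothetical):

```python
def energy_per_pixel_nj(power_w, width, height, fps):
    """Energy spent per processed pixel, in nanojoules."""
    pixels_per_second = width * height * fps
    return power_w / pixels_per_second * 1e9

# 58.6 mW at 1920x1080 and 30 fps
print(round(energy_per_pixel_nj(58.6e-3, 1920, 1080, 30), 2))  # 0.94
```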

Information Utilization Ratio in Heuristic Optimization Algorithms


Title	Information Utilization Ratio in Heuristic Optimization Algorithms
Authors	Junzhi Li, Ying Tan
Abstract	Heuristic algorithms are able to optimize objective functions efficiently because they make intelligent use of information about the objective functions. Thus, information utilization is critical to the performance of heuristics. However, the concept of information utilization has remained vague and abstract because there is no reliable metric to reflect the extent to which the information about the objective function is utilized by heuristic algorithms. In this paper, the metric of information utilization ratio (IUR) is defined, which is the ratio of the utilized information quantity over the acquired information quantity in the search process. The IUR proves to be well-defined. Several examples of typical heuristic algorithms are given to demonstrate the procedure of calculating the IUR. Empirical evidence of the correlation between the IUR and the performance of a heuristic is also provided. The IUR can be an index of how finely an algorithm is designed and guide the invention of new heuristics and the improvement of existing ones.
Tasks
Published	2016-04-06
URL	http://arxiv.org/abs/1604.01643v2
PDF	http://arxiv.org/pdf/1604.01643v2.pdf
PWC	https://paperswithcode.com/paper/information-utilization-ratio-in-heuristic
Repo
Framework

Neural Dataset Generality


Title	Neural Dataset Generality
Authors	Ragav Venkatesan, Vijetha Gattupalli, Baoxin Li
Abstract	Often the filters learned by Convolutional Neural Networks (CNNs) from different datasets appear similar. This is prominent in the first few layers. This similarity of filters is being exploited for the purposes of transfer learning and some studies have been made to analyse such transferability of features. This is also being used as an initialization technique for different tasks in the same dataset or for the same task in similar datasets. Off-the-shelf CNN features have capitalized on this idea to promote their networks as best transferable and most general and are used in a cavalier manner in day-to-day computer vision tasks. It is curious that while the filters learned by these CNNs are related to the atomic structures of the images from which they are learnt, all datasets learn similar looking low-level filters. With the understanding that a dataset that contains many such atomic structures learns general filters and is therefore useful to initialize other networks with, we propose a way to analyse and quantify generality among datasets from their accuracies on transferred filters. We applied this metric to several popular character recognition and natural image datasets and a medical image dataset, and arrived at some interesting conclusions. On further experimentation we also discovered that particular classes in a dataset themselves are more general than others.
Tasks	Transfer Learning
Published	2016-05-14
URL	http://arxiv.org/abs/1605.04369v1
PDF	http://arxiv.org/pdf/1605.04369v1.pdf
PWC	https://paperswithcode.com/paper/neural-dataset-generality
Repo
Framework

Highly-Smooth Zero-th Order Online Optimization


Title	Highly-Smooth Zero-th Order Online Optimization
Authors	Francis Bach, Vianney Perchet
Abstract	The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines. In this paper we consider convex optimization with noisy zero-th order information, that is noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as logistic regression. We show that as opposed to gradient-based algorithms, high-order smoothness may be used to improve estimation rates, with a precise dependence of our upper-bounds on the degree of smoothness. In particular, we show that for infinitely differentiable functions, we recover the same dependence on sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for both convex and strongly-convex functions, with finite horizon and anytime algorithms. Finally, we also recover similar results in the online optimization setting.
Tasks
Published	2016-05-26
URL	http://arxiv.org/abs/1605.08165v1
PDF	http://arxiv.org/pdf/1605.08165v1.pdf
PWC	https://paperswithcode.com/paper/highly-smooth-zero-th-order-online
Repo
Framework
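
To make the setting concrete, here is a minimal sketch of optimization from noisy function evaluations only. It uses the classic two-point random-direction gradient estimator, which is a standard zero-th order technique and not the higher-order smoothing scheme analysed in the paper; the objective, step size, perturbation radius, and all function names are assumptions chosen for illustration.

```python
import math
import random

def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two evaluations along a random unit direction."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    # The factor d makes the estimate unbiased for a uniform direction on the sphere.
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]

def minimize(f, x0, steps=500, lr=0.05, delta=0.1, seed=0):
    """Gradient descent driven only by function evaluations."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_gradient(f, x, delta, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def make_noisy_quadratic(noise_std, seed=1):
    """Smooth convex test objective with minimum at (1, ..., 1) plus Gaussian evaluation noise."""
    rng = random.Random(seed)
    def f(x):
        return sum((xi - 1.0) ** 2 for xi in x) + rng.gauss(0.0, noise_std)
    return f

print(minimize(make_noisy_quadratic(0.01), [0.0, 0.0, 0.0]))
```

With a fixed perturbation radius the noise is amplified by a factor of order `d / delta` in each gradient estimate, which is the kind of trade-off the paper's smoothness-dependent rates quantify.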

Hierarchical Quickest Change Detection via Surrogates


Title	Hierarchical Quickest Change Detection via Surrogates
Authors	Prithwish Chakraborty, Sathappan Muthiah, Ravi Tandon, Naren Ramakrishnan
Abstract	Change detection (CD) in time series data is a critical problem as it reveals changes in the underlying generative processes driving the time series. Despite having received significant attention, one important unexplored aspect is how to efficiently utilize additional correlated information to improve the detection and the understanding of changepoints. We propose hierarchical quickest change detection (HQCD), a framework that formalizes the process of incorporating additional correlated sources for early changepoint detection. The core ideas behind HQCD are rooted in the theory of quickest detection and HQCD can be regarded as its novel generalization to a hierarchical setting. The sources are classified into targets and surrogates, and HQCD leverages this structure to systematically assimilate observed data to update changepoint statistics across layers. The decision on actual changepoints is provided by minimizing the delay while still maintaining reliability bounds. In addition, HQCD also uncovers interesting relations between changes at targets and changes across surrogates. We validate HQCD for reliability and performance against several state-of-the-art methods on both a synthetic dataset (known changepoints) and several real-life examples (unknown changepoints). Our experiments indicate that we gain significant robustness without loss of detection delay through HQCD. Our real-life experiments also showcase the usefulness of the hierarchical setting by connecting the surrogate sources (such as Twitter chatter) to target sources (such as employment-related protests that ultimately lead to major uprisings).
Tasks	Time Series
Published	2016-03-31
URL	http://arxiv.org/abs/1603.09739v1
PDF	http://arxiv.org/pdf/1603.09739v1.pdf
PWC	https://paperswithcode.com/paper/hierarchical-quickest-change-detection-via
Repo
Framework
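
For readers unfamiliar with quickest detection, the classical single-stream CUSUM test gives a feel for the delay-versus-reliability trade-off the abstract refers to. This is the textbook procedure for a known Gaussian mean shift, not HQCD itself (there is no hierarchy and no surrogate source here); the function name and parameter values are illustrative assumptions.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the index at which CUSUM raises an alarm, or None.

    Tests a shift in the mean of Gaussian data from mu0 to mu1 with known sigma.
    """
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of one sample under post-change vs pre-change mean.
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None

# Mean shifts from 0 to 1 at index 10.
series = [0.0] * 10 + [1.0] * 10
print(cusum_alarm(series, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0))  # 14
```

Raising `threshold` lowers the false-alarm rate at the cost of a longer detection delay; in this noise-free toy series the alarm comes five samples after the change.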

A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation


Title	A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
Authors	Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin
Abstract	In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward_p algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward_pβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward_pβ provides better cluster recovery than both Ward and Ward_p.
Tasks
Published	2016-11-03
URL	http://arxiv.org/abs/1611.01060v1
PDF	http://arxiv.org/pdf/1611.01060v1.pdf
PWC	https://paperswithcode.com/paper/a-ward_pbeta-effective-hierarchical
Repo
Framework
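
The second contribution decouples two exponents: one applied to the feature weights and one used by the Minkowski distance. A minimal sketch of such a dissimilarity is shown below. The exact form used in the paper is not given in the abstract, so the formula here (a weighted sum of p-th powers of coordinate differences, without the final root) and the function name are assumptions.

```python
def weighted_minkowski(x, c, weights, p, beta):
    """Feature-weighted Minkowski dissimilarity between point x and centroid c.

    p    : exponent of the Minkowski distance
    beta : exponent applied to the feature weights (may differ from p)
    """
    return sum((w ** beta) * abs(a - b) ** p for a, b, w in zip(x, c, weights))

# Equal weights, p = beta = 2: squared Euclidean distance.
print(weighted_minkowski([0.0, 0.0], [3.0, 4.0], [1.0, 1.0], p=2, beta=2))  # 25.0

# Same points with p = 1 and beta = 2: the two exponents now act independently.
print(weighted_minkowski([0.0, 0.0], [3.0, 4.0], [0.5, 0.5], p=1, beta=2))  # 1.75
```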

An Expressive Probabilistic Temporal Logic


Title	An Expressive Probabilistic Temporal Logic
Authors	Bruno Woltzenlogel Paleo
Abstract	This paper argues that a combined treatment of probabilities, time and actions is essential for an appropriate logical account of the notion of probability; and, based on this intuition, describes an expressive probabilistic temporal logic for reasoning about actions with uncertain outcomes. The logic is modal and higher-order: modalities annotated by actions are used to express possibility and necessity of propositions in the next states resulting from the actions, and a higher-order function is needed to express the probability operator. The proposed logic is shown to be an adequate extension of classical mathematical probability theory, and its expressiveness is illustrated through the formalization of the Monty Hall problem.
Tasks
Published	2016-03-24
URL	http://arxiv.org/abs/1603.07453v2
PDF	http://arxiv.org/pdf/1603.07453v2.pdf
PWC	https://paperswithcode.com/paper/an-expressive-probabilistic-temporal-logic
Repo
Framework
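
The Monty Hall problem used as the paper's running example can be checked by exhaustive enumeration. The sketch below is ordinary probability bookkeeping in Python, not the paper's logic; it assumes the standard rules (the car and the first pick are uniform over three doors, and the host opens a goat door uniformly at random among those available).

```python
from fractions import Fraction
from itertools import product

def win_probability(switch):
    """Exact probability of winning the car when always switching or always staying."""
    doors = range(3)
    total = Fraction(0)
    for car, pick in product(doors, doors):
        weight = Fraction(1, 9)  # car position and first pick are independent and uniform
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            w = weight / len(openable)
            if switch:
                final = next(d for d in doors if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                total += w
    return total

print(win_probability(switch=True))   # 2/3
print(win_probability(switch=False))  # 1/3
```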

Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition


Title	Block-Diagonal Sparse Representation by Learning a Linear Combination Dictionary for Recognition
Authors	Xinglin Piao, Yongli Hu, Yanfeng Sun, Junbin Gao, Baocai Yin
Abstract	In a sparse representation based recognition scheme, it is critical to learn a desired dictionary, aiming for both good representational power and discriminative performance. In this paper, we propose a new dictionary learning model for recognition applications, in which three strategies are adopted to achieve these two objectives simultaneously. First, a block-diagonal constraint is introduced into the model to eliminate the correlation between classes and enhance the discriminative performance. Second, a low-rank term is adopted to model the coherence within classes for refining the sparse representation of each class. Finally, instead of using the conventional over-complete dictionary, a specific dictionary constructed from the linear combination of the training samples is proposed to enhance the representational power of the dictionary and to improve the robustness of the sparse representation model. The proposed method is tested on several public datasets. The experimental results show the method outperforms most state-of-the-art methods.
Tasks	Dictionary Learning
Published	2016-01-07
URL	http://arxiv.org/abs/1601.01432v2
PDF	http://arxiv.org/pdf/1601.01432v2.pdf
PWC	https://paperswithcode.com/paper/block-diagonal-sparse-representation-by
Repo
Framework

Fast Stochastic Methods for Nonsmooth Nonconvex Optimization


Title	Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
Authors	Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola
Abstract	We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle this issue, we develop fast stochastic algorithms that provably converge to a stationary point for constant minibatches. Furthermore, using a variant of these algorithms, we show provably faster convergence than batch proximal gradient descent. Finally, we prove global linear convergence rate for an interesting subclass of nonsmooth nonconvex functions, that subsumes several recent works. This paper builds upon our recent series of papers on fast stochastic methods for smooth nonconvex optimization [22, 23], with a novel analysis for nonconvex and nonsmooth functions.
Tasks
Published	2016-05-23
URL	http://arxiv.org/abs/1605.06900v1
PDF	http://arxiv.org/pdf/1605.06900v1.pdf
PWC	https://paperswithcode.com/paper/fast-stochastic-methods-for-nonsmooth
Repo
Framework
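
The proximal stochastic gradient method mentioned in the abstract alternates a gradient step on the smooth part with a proximal step on the convex nonsmooth part. Below is a minimal sketch of one such step, assuming an L1 penalty as the nonsmooth term (the abstract does not fix a particular regularizer). This is the plain method whose convergence the abstract calls open, not the faster variants the paper develops; the function names are hypothetical.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def prox_sgd_step(x, minibatch_grad, step, lam):
    """One proximal stochastic gradient step for f(x) + lam * ||x||_1.

    minibatch_grad is the gradient of the smooth (possibly nonconvex) part
    averaged over a minibatch; how it is computed is left to the caller.
    """
    moved = [xi - step * gi for xi, gi in zip(x, minibatch_grad)]
    return soft_threshold(moved, step * lam)

print(soft_threshold([3.0, -0.5, -2.0], 1.0))
print(prox_sgd_step([1.0, 0.2], [0.5, 0.5], step=0.2, lam=1.0))
```

The first call shrinks each coordinate toward zero by 1.0 and zeroes the one whose magnitude is below the threshold, which is how the L1 term produces sparse iterates.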

Neural Machine Translation Advised by Statistical Machine Translation


Title	Neural Machine Translation Advised by Statistical Machine Translation
Authors	Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang
Abstract	Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; Tu et al. 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but non-fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate SMT model into NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and attention history). Then we employ an auxiliary classifier to score the SMT recommendations and a gating function to combine the SMT recommendations with NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.
Tasks	Machine Translation
Published	2016-10-17
URL	http://arxiv.org/abs/1610.05150v2
PDF	http://arxiv.org/pdf/1610.05150v2.pdf
PWC	https://paperswithcode.com/paper/neural-machine-translation-advised-by
Repo
Framework
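
The gating idea in the abstract, combining SMT word recommendations with the NMT output distribution at each decoding step, can be illustrated with a simple convex mixture. This is a sketch under strong assumptions: in the paper the gate and the classifier scoring the SMT recommendations are learned jointly with the NMT model, whereas here the gate is a fixed number, the distributions are hand-written dictionaries, and the function name is hypothetical.

```python
def combine(p_nmt, p_smt, gate):
    """Mix two next-word distributions; gate in [0, 1] is the weight given to SMT."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    vocab = set(p_nmt) | set(p_smt)
    return {
        w: (1.0 - gate) * p_nmt.get(w, 0.0) + gate * p_smt.get(w, 0.0)
        for w in vocab
    }

# Toy step: NMT prefers the fluent word, SMT recommends the adequate one.
p_nmt = {"bank": 0.6, "shore": 0.4}
p_smt = {"bank": 0.0, "shore": 1.0}
mixed = combine(p_nmt, p_smt, gate=0.5)
print(max(mixed, key=mixed.get))  # shore
```

Because the result is a convex combination of two probability distributions, it remains a valid distribution for any gate value in the allowed range.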

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders


Title	An Uncertain Future: Forecasting from Static Images using Variational Autoencoders
Authors	Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert
Abstract	In a given scene, humans can often easily predict a set of immediate future events that might happen. However, generalized pixel-level anticipation in computer vision systems is difficult because machine learning struggles with the ambiguity inherent in predicting the future. In this paper, we focus on predicting the dense trajectory of pixels in a scene, specifically what will move in the scene, where it will travel, and how it will deform over the course of one second. We propose a conditional variational autoencoder as a solution to this problem. In this framework, direct inference from the image shapes the distribution of possible trajectories, while latent variables encode any necessary information that is not available in the image. We show that our method is able to successfully predict events in a wide variety of scenes and can produce multiple different predictions when the future is ambiguous. Our algorithm is trained on thousands of diverse, realistic videos and requires absolutely no human labeling. In addition to non-semantic action prediction, we find that our method learns a representation that is applicable to semantic vision tasks.
Tasks
Published	2016-06-25
URL	http://arxiv.org/abs/1606.07873v1
PDF	http://arxiv.org/pdf/1606.07873v1.pdf
PWC	https://paperswithcode.com/paper/an-uncertain-future-forecasting-from-static
Repo
Framework

Adversarial Diversity and Hard Positive Generation


Title	Adversarial Diversity and Hard Positive Generation
Authors	Andras Rozsa, Ethan M. Rudd, Terrance E. Boult
Abstract	State-of-the-art deep neural networks suffer from a fundamental problem - they misclassify adversarial examples formed by applying small perturbations to inputs. In this paper, we present a new psychometric perceptual adversarial similarity score (PASS) measure for quantifying adversarial images, introduce the notion of hard positive generation, and use a diverse set of adversarial perturbations - not just the closest ones - for data augmentation. We introduce a novel hot/cold approach for adversarial example generation, which provides multiple possible adversarial perturbations for every single image. The perturbations generated by our novel approach often correspond to semantically meaningful image structures, and allow greater flexibility to scale perturbation-amplitudes, which yields an increased diversity of adversarial images. We present adversarial images on several network topologies and datasets, including LeNet on the MNIST dataset, and GoogLeNet and ResidualNet on the ImageNet dataset. Finally, we demonstrate on LeNet and GoogLeNet that fine-tuning with a diverse set of hard positives improves the robustness of these networks compared to training with prior methods of generating adversarial images.
Tasks	Data Augmentation
Published	2016-05-05
URL	http://arxiv.org/abs/1605.01775v2
PDF	http://arxiv.org/pdf/1605.01775v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-diversity-and-hard-positive
Repo
Framework

Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks


Title	Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks
Authors	Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira
Abstract	Automatic Offline Handwritten Signature Verification has been researched over the last few decades from several perspectives, using insights from graphology, computer vision, signal processing, among others. In spite of the advancements in the field, building classifiers that can separate genuine signatures from skilled forgeries (forgeries made targeting a particular signature) is still hard. We propose approaching the problem from a feature learning perspective. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, instead of using hand-crafted features that have no resemblance to the signature generation process. To this end, we use Deep Convolutional Neural Networks to learn features in a writer-independent format, and use this model to obtain a feature representation on another set of users, where we train writer-dependent classifiers. We tested our method on two datasets: GPDS-960 and Brazilian PUC-PR. Our experimental results show that the features learned in a subset of the users are discriminative for the other users, including across different datasets, reaching close to the state-of-the-art in the GPDS dataset, and improving the state-of-the-art in the Brazilian PUC-PR dataset.
Tasks
Published	2016-04-04
URL	http://arxiv.org/abs/1604.00974v1
PDF	http://arxiv.org/pdf/1604.00974v1.pdf
PWC	https://paperswithcode.com/paper/writer-independent-feature-learning-for
Repo
Framework