October 21, 2019

Paper Group AWR 132

PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation


Title	PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
Authors	Ana García del Molino, Michael Gygli
Abstract	Highlight detection models are typically trained to identify cues that make visual content appealing or interesting for the general public, with the objective of reducing a video to such moments. However, the “interestingness” of a video segment or image is subjective. Thus, such highlight models provide results of limited relevance for the individual user. On the other hand, training one model per user is inefficient and requires large amounts of personal information which is typically not available. To overcome these limitations, we present a global ranking model which conditions on each particular user’s interests. Rather than training one model per user, our model is personalized via its inputs, which allows it to effectively adapt its predictions, given only a few user-specific examples. To train this model, we create a large-scale dataset of users and the GIFs they created, giving us an accurate indication of their interests. Our experiments show that using the user history substantially improves the prediction accuracy. On our test set of 850 videos, our model improves the recall by 8% with respect to generic highlight detectors. Furthermore, our method proves more precise than the user-agnostic baselines even with just one person-specific example.
Tasks
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06604v2
PDF	http://arxiv.org/pdf/1804.06604v2.pdf
PWC	https://paperswithcode.com/paper/phd-gifs-personalized-highlight-detection-for
Repo	https://github.com/gifs/personalized-highlights-dataset
Framework	none

Boosting in Image Quality Assessment


Title	Boosting in Image Quality Assessment
Authors	Dogancan Temel, Ghassan AlRegib
Abstract	In this paper, we analyze the effect of boosting in image quality assessment through multi-method fusion. Existing multi-method studies focus on proposing a single quality estimator. On the contrary, we investigate the generalizability of multi-method fusion as a framework. In addition to support vector machines that are commonly used in the multi-method fusion, we propose using neural networks in the boosting. To span different types of image quality assessment algorithms, we use quality estimators based on fidelity, perceptually-extended fidelity, structural similarity, spectral similarity, color, and learning. In the experiments, we perform k-fold cross validation using the LIVE, the multiply distorted LIVE, and the TID 2013 databases and the performance of image quality assessment algorithms are measured via accuracy-, linearity-, and ranking-based metrics. Based on the experiments, we show that boosting methods generally improve the performance of image quality assessment and the level of improvement depends on the type of the boosting algorithm. Our experimental results also indicate that boosting the worst performing quality estimator with two or more additional methods leads to statistically significant performance enhancements independent of the boosting technique and neural network-based boosting outperforms support vector machine-based boosting when two or more methods are fused.
Tasks	Image Quality Assessment
Published	2018-11-21
URL	http://arxiv.org/abs/1811.08429v1
PDF	http://arxiv.org/pdf/1811.08429v1.pdf
PWC	https://paperswithcode.com/paper/boosting-in-image-quality-assessment
Repo	https://github.com/olivesgatech/Boosting-in-IQA
Framework	none

DSFD: Dual Shot Face Detector


Title	DSFD: Dual Shot Face Detector
Authors	Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang, Jilin Li, Feiyue Huang
Abstract	In this paper, we propose a novel face detection network with three novel contributions that address three key aspects of face detection, including better feature learning, progressive loss design and anchor assign based data augmentation, respectively. First, we propose a Feature Enhance Module (FEM) for enhancing the original feature maps to extend the single shot detector to dual shot detector. Second, we adopt Progressive Anchor Loss (PAL) computed by two different sets of anchors to effectively facilitate the features. Third, we use an Improved Anchor Matching (IAM) by integrating novel anchor assign strategy into data augmentation to provide better initialization for the regressor. Since these techniques are all related to the two-stream design, we name the proposed network as Dual Shot Face Detector (DSFD). Extensive experiments on popular benchmarks, WIDER FACE and FDDB, demonstrate the superiority of DSFD over the state-of-the-art face detectors.
Tasks	Data Augmentation, Face Detection
Published	2018-10-24
URL	http://arxiv.org/abs/1810.10220v3
PDF	http://arxiv.org/pdf/1810.10220v3.pdf
PWC	https://paperswithcode.com/paper/dsfd-dual-shot-face-detector
Repo	https://github.com/TencentYoutuResearch/FaceDetection-DSFD
Framework	pytorch

A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment


Title	A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment
Authors	Dogancan Temel, Ghassan AlRegib
Abstract	The process of quantifying image quality consists of engineering the quality features and pooling these features to obtain a value or a map. There has been a significant research interest in designing the quality features but pooling is usually overlooked compared to feature design. In this work, we compare the state of the art quality and content-based spatial pooling strategies and show that although features are the key in any image quality assessment, pooling also matters. We also propose a quality-based spatial pooling strategy that is based on linearly weighted percentile pooling (WPP). Pooling strategies are analyzed for squared error, SSIM and PerSIM in LIVE, multiply distorted LIVE and TID2013 image databases.
Tasks	Image Quality Assessment
Published	2018-11-21
URL	http://arxiv.org/abs/1811.08891v1
PDF	http://arxiv.org/pdf/1811.08891v1.pdf
PWC	https://paperswithcode.com/paper/a-comparative-study-of-quality-and-content
Repo	https://github.com/olivesgatech/Spatial-Pooling-in-IQA
Framework	none

ATOM: Accurate Tracking by Overlap Maximization


Title	ATOM: Accurate Tracking by Overlap Maximization
Authors	Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg
Abstract	While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state-of-the-art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking.
Tasks	Visual Object Tracking, Visual Tracking
Published	2018-11-19
URL	http://arxiv.org/abs/1811.07628v2
PDF	http://arxiv.org/pdf/1811.07628v2.pdf
PWC	https://paperswithcode.com/paper/atom-accurate-tracking-by-overlap
Repo	https://github.com/visionml/pytracking
Framework	pytorch
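
The ATOM abstract describes a target estimation component trained to predict the overlap between the target object and an estimated bounding box. As a minimal sketch of that quantity, assuming "overlap" means intersection-over-union and that boxes are given as (x, y, width, height), neither of which the abstract states, it could be computed as below. The name `box_iou` is illustrative and is not taken from the pytracking repository.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner; the abstract does not fix a format.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    # Union area = sum of both areas minus the doubly counted intersection.
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes offset by one unit share a 1x1 region.
    print(box_iou((0, 0, 2, 2), (1, 1, 2, 2)))
```

A learned estimator in this style would regress this value for candidate boxes rather than compute it from ground truth, which is only available during training.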

Online Abstraction with MDP Homomorphisms for Deep Learning


Title	Online Abstraction with MDP Homomorphisms for Deep Learning
Authors	Ondrej Biza, Robert Platt
Abstract	Abstraction of Markov Decision Processes is a useful tool for solving complex problems, as it can ignore unimportant aspects of an environment, simplifying the process of learning an optimal policy. In this paper, we propose a new algorithm for finding abstract MDPs in environments with continuous state spaces. It is based on MDP homomorphisms, a structure-preserving mapping between MDPs. We demonstrate our algorithm’s ability to learn abstractions from collected experience and show how to reuse the abstractions to guide exploration in new tasks the agent encounters. Our novel task transfer method outperforms baselines based on a deep Q-network in the majority of our experiments. The source code is at https://github.com/ondrejba/aamas_19.
Tasks
Published	2018-11-30
URL	http://arxiv.org/abs/1811.12929v2
PDF	http://arxiv.org/pdf/1811.12929v2.pdf
PWC	https://paperswithcode.com/paper/online-abstraction-with-mdp-homomorphisms-for
Repo	https://github.com/ondrejba/aamas_19
Framework	tf

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning


Title	Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
Authors	David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
Abstract	Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
Tasks	Decision Making
Published	2018-10-15
URL	https://arxiv.org/abs/1810.06530v5
PDF	https://arxiv.org/pdf/1810.06530v5.pdf
PWC	https://paperswithcode.com/paper/successor-uncertainties-exploration-and
Repo	https://github.com/DavidJanz/successor_uncertainties_tabular
Framework	pytorch
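
The Successor Uncertainties abstract reports a median human normalised score across Atari games. The sketch below assumes the commonly used definition, (agent - random) / (human - random), which the abstract itself does not spell out. The function names and the example scores are made up for illustration and are not results from the paper.

```python
from statistics import median


def human_normalised(agent, human, random_play):
    """Score rescaled so that random play maps to 0 and human play maps to 1.

    Assumption: this is the usual Atari normalisation; the abstract gives no formula.
    """
    return (agent - random_play) / (human - random_play)


def median_human_normalised(results):
    """Median normalised score over (agent, human, random) triples, one per game."""
    return median([human_normalised(a, h, r) for a, h, r in results])


if __name__ == "__main__":
    # Hypothetical per-game scores: (agent, human, random).
    games = [(30, 20, 10), (15, 20, 10), (50, 20, 10)]
    print(median_human_normalised(games))
```

Under this definition, a normalised score above 1 on a game means the agent exceeded the human reference on that game.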

Deep Factorization Machines for Knowledge Tracing


Title	Deep Factorization Machines for Knowledge Tracing
Authors	Jill-Jênn Vie
Abstract	This paper introduces our solution to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We used deep factorization machines, a wide and deep learning model of pairwise relationships between users, items, skills, and other entities considered. Our solution (AUC 0.815) hopefully managed to beat the logistic regression baseline (AUC 0.774) but not the top performing model (AUC 0.861) and reveals interesting strategies to build upon item response theory models.
Tasks	Knowledge Tracing, Language Acquisition
Published	2018-05-01
URL	http://arxiv.org/abs/1805.00356v1
PDF	http://arxiv.org/pdf/1805.00356v1.pdf
PWC	https://paperswithcode.com/paper/deep-factorization-machines-for-knowledge
Repo	https://github.com/jilljenn/ktm
Framework	tf
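
The abstract above describes deep factorization machines as a model of pairwise relationships between users, items, skills, and other entities. As a rough sketch of the pairwise part only, assuming the standard second-order factorization machine formulation (bias, linear term, and dot products of per-feature embeddings), a prediction could be computed as below. The deep component is omitted, and `fm_predict` and its parameters are hypothetical names, not code from the ktm repository.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine score for one example.

    x:  list of feature values (e.g. one-hot user, item, and skill indicators)
    w0: global bias
    w:  list of linear weights, one per feature
    v:  list of embedding vectors, one per feature, all of the same length
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        # Identity: sum_{i<j} a_i * a_j = 0.5 * ((sum a_i)^2 - sum a_i^2),
        # applied per embedding dimension with a_i = v[i][f] * x[i].
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total = sum(terms)
        pairwise += 0.5 * (total * total - sum(t * t for t in terms))
    return w0 + linear + pairwise


if __name__ == "__main__":
    x = [1, 1, 0]                       # features 0 and 1 are active
    v = [[1.0, 0.0], [0.5, 2.0], [3.0, 3.0]]
    print(fm_predict(x, 0.1, [0.2, 0.3, 0.4], v))
```

The per-dimension identity avoids an explicit loop over all feature pairs, so the cost grows linearly with the number of features.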

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms


Title	Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
Authors	Panagiotis Mandros, Mario Boley, Jilles Vreeken
Abstract	The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the usage of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of time needed for complete branch-and-bound style search.
Tasks
Published	2018-09-14
URL	http://arxiv.org/abs/1809.05467v1
PDF	http://arxiv.org/pdf/1809.05467v1.pdf
PWC	https://paperswithcode.com/paper/discovering-reliable-dependencies-from-data
Repo	https://github.com/pmandros/fodiscovery
Framework	none

Efficient end-to-end learning for quantizable representations


Title	Efficient end-to-end learning for quantizable representations
Authors	Yeonwoo Jeong, Hyun Oh Song
Abstract	Embedding representation learning via neural networks is at the core foundation of modern similarity based search. While much effort has been put in developing algorithms for learning binary hamming code representations for search efficiency, this still requires a linear scan of the entire dataset per each query and trades off the search accuracy through binarization. To this end, we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end which can be used to construct an efficient hash table not only providing significant search reduction in the number of data but also achieving the state of the art search accuracy outperforming previous state of the art deep metric learning methods. We also show that finding the optimal sparse binary hash code in a mini-batch can be computed exactly in polynomial time by solving a minimum cost flow problem. Our results on Cifar-100 and on ImageNet datasets show the state of the art search accuracy in precision@k and NMI metrics while providing up to 98X and 478X search speedup respectively over exhaustive linear search. The source code is available at https://github.com/maestrojeong/Deep-Hash-Table-ICML18
Tasks	Metric Learning, Representation Learning
Published	2018-05-15
URL	http://arxiv.org/abs/1805.05809v3
PDF	http://arxiv.org/pdf/1805.05809v3.pdf
PWC	https://paperswithcode.com/paper/efficient-end-to-end-learning-for-quantizable
Repo	https://github.com/maestrojeong/Deep-Hash-Table-ICML18
Framework	tf

WikiRank: Improving Keyphrase Extraction Based on Background Knowledge


Title	WikiRank: Improving Keyphrase Extraction Based on Background Knowledge
Authors	Yang Yu, Vincent Ng
Abstract	A keyphrase is an efficient representation of the main idea of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we output the optimal keyphrase set. Our method improves on other state-of-the-art models by more than 2% in F1-score.
Tasks
Published	2018-03-23
URL	http://arxiv.org/abs/1803.09000v1
PDF	http://arxiv.org/pdf/1803.09000v1.pdf
PWC	https://paperswithcode.com/paper/wikirank-improving-keyphrase-extraction-based
Repo	https://github.com/keel-keywordextraction-entitylinking/keywordExtraction
Framework	none

Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning


Title	Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning
Authors	Prithviraj Ammanabrolu, Mark O. Riedl
Abstract	Text-based adventure games provide a platform on which to explore reinforcement learning in the context of a combinatorial action space, such as natural language. We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. This graph is used to prune the action space, enabling more efficient exploration. The question of which action to take can be reduced to a question-answering task, a form of transfer learning that pre-trains certain parts of our architecture. In experiments using the TextWorld framework, we show that our proposed technique can learn a control policy faster than baseline alternatives. We have also open-sourced our code at https://github.com/rajammanabrolu/KG-DQN.
Tasks	Efficient Exploration, Question Answering, Transfer Learning
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01628v2
PDF	http://arxiv.org/pdf/1812.01628v2.pdf
PWC	https://paperswithcode.com/paper/playing-text-adventure-games-with-graph-based
Repo	https://github.com/projectzork/Readings
Framework	none

Do CIFAR-10 Classifiers Generalize to CIFAR-10?


Title	Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Authors	Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar
Abstract	Machine learning is currently dominated by largely experimental work focused on improvements in a few key tasks. However, the impressive accuracy numbers of the best performing models are questionable because the same test sets have been used to select these models for multiple years now. To understand the danger of overfitting, we measure the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images. Although we ensure that the new test set is as close to the original data distribution as possible, we find a large drop in accuracy (4% to 10%) for a broad range of deep learning models. Yet more recent models with higher original accuracy show a smaller drop and better overall performance, indicating that this drop is likely not due to overfitting based on adaptivity. Instead, we view our results as evidence that current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution.
Tasks
Published	2018-06-01
URL	http://arxiv.org/abs/1806.00451v1
PDF	http://arxiv.org/pdf/1806.00451v1.pdf
PWC	https://paperswithcode.com/paper/do-cifar-10-classifiers-generalize-to-cifar
Repo	https://github.com/modestyachts/CIFAR-10.1
Framework	none

Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers


Title	Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Authors	Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi
Abstract	Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level transformations are applied to the highest-ranked tokens in order to minimize the edit distance of the perturbation, yet change the original classification. We evaluated DeepWordBug on eight real-world text datasets, including text classification, sentiment analysis, and spam detection. We compare the result of DeepWordBug with two baselines: Random (Black-box) and Gradient (White-box). Our experimental results indicate that DeepWordBug reduces the prediction accuracy of current state-of-the-art deep-learning models, including a decrease of 68% on average for a Word-LSTM model and 48% on average for a Char-CNN model.
Tasks	Adversarial Text, Sentiment Analysis, Text Classification
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04354v5
PDF	http://arxiv.org/pdf/1801.04354v5.pdf
PWC	https://paperswithcode.com/paper/black-box-generation-of-adversarial-text
Repo	https://github.com/alankarj/robust_nlp
Framework	none
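
The DeepWordBug abstract describes applying simple character-level transformations to the highest-ranked tokens while keeping the edit distance of the perturbation small. The sketch below illustrates one such transformation (swapping two adjacent characters) and measures the change with plain Levenshtein distance. It assumes the token to modify has already been chosen, since the abstract does not detail the scoring strategies, and the function names are hypothetical rather than taken from the authors' code.

```python
def swap_adjacent(token, i):
    """Swap the characters at positions i and i + 1; return the token unchanged if out of range."""
    if i < 0 or i + 1 >= len(token):
        return token
    chars = list(token)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def perturb(tokens, index, position=1):
    """Return a copy of tokens with one adjacent-character swap applied to tokens[index]."""
    out = list(tokens)
    out[index] = swap_adjacent(out[index], position)
    return out


if __name__ == "__main__":
    original = ["this", "movie", "was", "terrible"]
    modified = perturb(original, 3)
    print(modified, levenshtein(" ".join(original), " ".join(modified)))
```

Under plain Levenshtein distance an adjacent swap costs two edits; a distance that counts transpositions as a single operation would score it as one.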

TextBugger: Generating Adversarial Text Against Real-world Applications


Title	TextBugger: Generating Adversarial Text Against Real-world Applications
Authors	Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang
Abstract	Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification. Despite its tremendous popularity, the security vulnerabilities of DLTU are still largely unknown, which is highly concerning given its increasing use in security-sensitive applications such as sentiment analysis and toxic content detection. In this paper, we show that DLTU is inherently vulnerable to adversarial text attacks, in which maliciously crafted texts trigger target DLTU systems and services to misbehave. Specifically, we present TextBugger, a general attack framework for generating adversarial texts. In contrast to prior works, TextBugger differs in significant ways: (i) effective – it outperforms state-of-the-art attacks in terms of attack success rate; (ii) evasive – it preserves the utility of benign text, with 94.9% of the adversarial text correctly recognized by human readers; and (iii) efficient – it generates adversarial text with computational complexity sub-linear to the text length. We empirically evaluate TextBugger on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection, demonstrating its effectiveness, evasiveness, and efficiency. For instance, TextBugger achieves 100% success rate on the IMDB dataset based on Amazon AWS Comprehend within 4.61 seconds and preserves 97% semantic similarity. We further discuss possible defense mechanisms to mitigate such attack and the adversary’s potential countermeasures, which leads to promising directions for further research.
Tasks	Adversarial Text, Machine Translation, Question Answering, Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis, Text Classification
Published	2018-12-13
URL	http://arxiv.org/abs/1812.05271v1
PDF	http://arxiv.org/pdf/1812.05271v1.pdf
PWC	https://paperswithcode.com/paper/textbugger-generating-adversarial-text
Repo	https://github.com/CatherineWong/dancin_seq2seq
Framework	pytorch