Paper Group AWR 132
PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation. Boosting in Image Quality Assessment. DSFD: Dual Shot Face Detector. A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment. ATOM: Accurate Tracking by Overlap Maximization. Online Abstraction with MDP Homomorphisms for Deep Learn …
PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
Title | PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation |
Authors | Ana García del Molino, Michael Gygli |
Abstract | Highlight detection models are typically trained to identify cues that make visual content appealing or interesting for the general public, with the objective of reducing a video to such moments. However, the “interestingness” of a video segment or image is subjective. Thus, such highlight models provide results of limited relevance for the individual user. On the other hand, training one model per user is inefficient and requires large amounts of personal information which is typically not available. To overcome these limitations, we present a global ranking model which conditions on each particular user’s interests. Rather than training one model per user, our model is personalized via its inputs, which allows it to effectively adapt its predictions, given only a few user-specific examples. To train this model, we create a large-scale dataset of users and the GIFs they created, giving us an accurate indication of their interests. Our experiments show that using the user history substantially improves the prediction accuracy. On our test set of 850 videos, our model improves the recall by 8% with respect to generic highlight detectors. Furthermore, our method proves more precise than the user-agnostic baselines even with just one person-specific example. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06604v2 |
PDF | http://arxiv.org/pdf/1804.06604v2.pdf |
PWC | https://paperswithcode.com/paper/phd-gifs-personalized-highlight-detection-for |
Repo | https://github.com/gifs/personalized-highlights-dataset |
Framework | none |
Boosting in Image Quality Assessment
Title | Boosting in Image Quality Assessment |
Authors | Dogancan Temel, Ghassan AlRegib |
Abstract | In this paper, we analyze the effect of boosting in image quality assessment through multi-method fusion. Existing multi-method studies focus on proposing a single quality estimator. In contrast, we investigate the generalizability of multi-method fusion as a framework. In addition to support vector machines, which are commonly used in multi-method fusion, we propose using neural networks for boosting. To span different types of image quality assessment algorithms, we use quality estimators based on fidelity, perceptually-extended fidelity, structural similarity, spectral similarity, color, and learning. In the experiments, we perform k-fold cross validation using the LIVE, multiply distorted LIVE, and TID 2013 databases, and we measure the performance of image quality assessment algorithms with accuracy-, linearity-, and ranking-based metrics. The experiments show that boosting methods generally improve the performance of image quality assessment and that the level of improvement depends on the type of boosting algorithm. Our experimental results also indicate that boosting the worst-performing quality estimator with two or more additional methods leads to statistically significant performance enhancements independent of the boosting technique, and that neural network-based boosting outperforms support vector machine-based boosting when two or more methods are fused. |
Tasks | Image Quality Assessment |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08429v1 |
PDF | http://arxiv.org/pdf/1811.08429v1.pdf |
PWC | https://paperswithcode.com/paper/boosting-in-image-quality-assessment |
Repo | https://github.com/olivesgatech/Boosting-in-IQA |
Framework | none |
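The fusion idea above can be sketched in a few lines. This is a minimal sketch, assuming the per-image outputs of several quality estimators are already computed. It deliberately swaps the paper's SVM and neural-network regressors for plain least-squares linear regression and shows only the k-fold protocol: fusion weights are fit on the training folds and evaluated on the held-out fold. The function name `kfold_fusion` and the synthetic data are hypothetical.

```python
import numpy as np


def kfold_fusion(scores, mos, k=5, seed=0):
    """Out-of-fold predictions from a linear fusion of quality estimators.

    scores: array of shape (n_images, n_methods), one column per quality estimator
    mos:    array of shape (n_images,), subjective quality scores
    """
    scores = np.asarray(scores, dtype=float)
    mos = np.asarray(mos, dtype=float)
    n = len(mos)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    preds = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Fit least-squares weights plus a bias term on the training folds only.
        X_train = np.column_stack([scores[train_idx], np.ones(len(train_idx))])
        w, *_ = np.linalg.lstsq(X_train, mos[train_idx], rcond=None)
        # Predict the held-out fold with the fitted weights.
        X_test = np.column_stack([scores[test_idx], np.ones(len(test_idx))])
        preds[test_idx] = X_test @ w
    return preds


# Synthetic example: three estimators whose weighted sum tracks the subjective score.
rng = np.random.default_rng(1)
scores = rng.random((200, 3))
mos = 2 * scores[:, 0] + scores[:, 1] - 0.5 * scores[:, 2] + 0.01 * rng.standard_normal(200)
fused = kfold_fusion(scores, mos)
print(np.corrcoef(fused, mos)[0, 1])
```

Replacing `np.linalg.lstsq` with an SVM or neural-network regressor would keep the same out-of-fold structure.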
DSFD: Dual Shot Face Detector
Title | DSFD: Dual Shot Face Detector |
Authors | Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang, Jilin Li, Feiyue Huang |
Abstract | In this paper, we propose a novel face detection network with three contributions that address three key aspects of face detection: better feature learning, progressive loss design, and anchor-assignment-based data augmentation. First, we propose a Feature Enhance Module (FEM) that enhances the original feature maps, extending the single shot detector to a dual shot detector. Second, we adopt a Progressive Anchor Loss (PAL), computed with two different sets of anchors, to facilitate feature learning more effectively. Third, we use Improved Anchor Matching (IAM), which integrates a novel anchor assignment strategy into data augmentation to provide better initialization for the regressor. Since these techniques are all related to the two-stream design, we name the proposed network the Dual Shot Face Detector (DSFD). Extensive experiments on the popular WIDER FACE and FDDB benchmarks demonstrate the superiority of DSFD over state-of-the-art face detectors. |
Tasks | Data Augmentation, Face Detection |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10220v3 |
PDF | http://arxiv.org/pdf/1810.10220v3.pdf |
PWC | https://paperswithcode.com/paper/dsfd-dual-shot-face-detector |
Repo | https://github.com/TencentYoutuResearch/FaceDetection-DSFD |
Framework | pytorch |
A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment
Title | A Comparative Study of Quality and Content-Based Spatial Pooling Strategies in Image Quality Assessment |
Authors | Dogancan Temel, Ghassan AlRegib |
Abstract | The process of quantifying image quality consists of engineering quality features and pooling these features to obtain a value or a map. There has been significant research interest in designing quality features, but pooling is usually overlooked in comparison. In this work, we compare state-of-the-art quality- and content-based spatial pooling strategies and show that although features are key in any image quality assessment, pooling also matters. We also propose a quality-based spatial pooling strategy based on linearly weighted percentile pooling (WPP). Pooling strategies are analyzed for squared error, SSIM, and PerSIM on the LIVE, multiply distorted LIVE, and TID2013 image databases. |
Tasks | Image Quality Assessment |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08891v1 |
PDF | http://arxiv.org/pdf/1811.08891v1.pdf |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-quality-and-content |
Repo | https://github.com/olivesgatech/Spatial-Pooling-in-IQA |
Framework | none |
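The pooling step described above can be illustrated with a minimal sketch. `percentile_pool` implements plain percentile pooling, which averages the worst fraction `p` of local quality values. `linearly_weighted_pool` is a hypothetical linear rank weighting included only to illustrate quality-based weighting. The abstract does not specify the exact WPP weighting, so this is an assumption and not the authors' formula. All function names are illustrative.

```python
import numpy as np


def _worst_first(quality_map, higher_is_better):
    """Flatten a local quality map and sort it so the worst values come first."""
    values = np.sort(np.asarray(quality_map, dtype=float).ravel())
    # For error maps (e.g. squared error), larger values are worse.
    return values if higher_is_better else values[::-1]


def percentile_pool(quality_map, p=0.1, higher_is_better=True):
    """Average the worst fraction p of local quality values."""
    values = _worst_first(quality_map, higher_is_better)
    n_worst = max(1, int(np.ceil(p * values.size)))
    return float(values[:n_worst].mean())


def linearly_weighted_pool(quality_map, higher_is_better=True):
    """Hypothetical rank weighting: weights n, n-1, ..., 1 from worst to best value."""
    values = _worst_first(quality_map, higher_is_better)
    weights = np.arange(values.size, 0, -1, dtype=float)
    return float(np.dot(weights, values) / weights.sum())


ssim_map = np.array([[1.0, 0.9], [0.2, 0.8]])  # toy local SSIM values
print(percentile_pool(ssim_map, p=0.25))  # only the worst value, 0.2
print(ssim_map.mean(), linearly_weighted_pool(ssim_map))
```

Both variants pull the pooled score below the plain mean whenever a few regions are badly degraded, which is the motivation for quality-based pooling.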
ATOM: Accurate Tracking by Overlap Maximization
Title | ATOM: Accurate Tracking by Overlap Maximization |
Authors | Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg |
Abstract | While recent years have witnessed astonishing improvements in visual tracking robustness, the advancements in tracking accuracy have been limited. As the focus has been directed towards the development of powerful classifiers, the problem of accurate target state estimation has been largely overlooked. In fact, most trackers resort to a simple multi-scale search in order to estimate the target bounding box. We argue that this approach is fundamentally limited since target estimation is a complex task, requiring high-level knowledge about the object. We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. High-level knowledge is incorporated into the target estimation through extensive offline learning. Our target estimation component is trained to predict the overlap between the target object and an estimated bounding box. By carefully integrating target-specific information, our approach achieves previously unseen bounding box accuracy. We further introduce a classification component that is trained online to guarantee high discriminative power in the presence of distractors. Our final tracking framework sets a new state of the art on five challenging benchmarks. On the new large-scale TrackingNet dataset, our tracker ATOM achieves a relative gain of 15% over the previous best approach, while running at over 30 FPS. Code and models are available at https://github.com/visionml/pytracking. |
Tasks | Visual Object Tracking, Visual Tracking |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07628v2 |
PDF | http://arxiv.org/pdf/1811.07628v2.pdf |
PWC | https://paperswithcode.com/paper/atom-accurate-tracking-by-overlap |
Repo | https://github.com/visionml/pytracking |
Framework | pytorch |
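The overlap-maximization idea above can be sketched as follows, assuming boxes in (x, y, w, h) format. `box_iou` computes the overlap that the target estimation component is trained to predict, and `refine_box` adjusts a box to increase a scoring function. In ATOM the score comes from a trained network. Here, as a hypothetical stand-in, the true IoU against a known target is used, and simple random local search replaces the paper's optimization procedure. Function names are illustrative.

```python
import random


def box_iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h), (x, y) = top-left."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def refine_box(box, score_fn, steps=200, step_size=2.0, seed=0):
    """Random local search: keep a perturbed box only if it raises score_fn."""
    rng = random.Random(seed)
    best, best_score = tuple(box), score_fn(box)
    for _ in range(steps):
        cand = tuple(v + rng.gauss(0.0, step_size) for v in best)
        if cand[2] <= 0 or cand[3] <= 0:  # skip degenerate boxes
            continue
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


# Hypothetical stand-in for the learned IoU predictor: true IoU with a known target.
target = (50.0, 50.0, 40.0, 30.0)
initial = (45.0, 55.0, 30.0, 35.0)
refined, score = refine_box(initial, lambda b: box_iou(b, target))
print(box_iou(initial, target), score)
```

Because a candidate is accepted only when it improves the score, the refined box never scores worse than the initial one.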
Online Abstraction with MDP Homomorphisms for Deep Learning
Title | Online Abstraction with MDP Homomorphisms for Deep Learning |
Authors | Ondrej Biza, Robert Platt |
Abstract | Abstraction of Markov Decision Processes is a useful tool for solving complex problems, as it can ignore unimportant aspects of an environment, simplifying the process of learning an optimal policy. In this paper, we propose a new algorithm for finding abstract MDPs in environments with continuous state spaces. It is based on MDP homomorphisms, a structure-preserving mapping between MDPs. We demonstrate our algorithm’s ability to learn abstractions from collected experience and show how to reuse the abstractions to guide exploration in new tasks the agent encounters. Our novel task transfer method outperforms baselines based on a deep Q-network in the majority of our experiments. The source code is at https://github.com/ondrejba/aamas_19. |
Tasks | |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12929v2 |
PDF | http://arxiv.org/pdf/1811.12929v2.pdf |
PWC | https://paperswithcode.com/paper/online-abstraction-with-mdp-homomorphisms-for |
Repo | https://github.com/ondrejba/aamas_19 |
Framework | tf |
Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
Title | Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning |
Authors | David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek |
Abstract | Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those. |
Tasks | Decision Making |
Published | 2018-10-15 |
URL | https://arxiv.org/abs/1810.06530v5 |
PDF | https://arxiv.org/pdf/1810.06530v5.pdf |
PWC | https://paperswithcode.com/paper/successor-uncertainties-exploration-and |
Repo | https://github.com/DavidJanz/successor_uncertainties_tabular |
Framework | pytorch |
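The Atari figures above are expressed with the common human-normalized score, (agent - random) / (human - random), where 1.0 means human-level play and 0.0 means random play. The following is a minimal sketch of how a median normalized score and a count of games above human level can be computed. The game names and raw scores are made up for illustration and are not results from the paper.

```python
from statistics import median


def human_normalized(agent, random_play, human):
    """1.0 means human-level play; 0.0 means random-policy play."""
    return (agent - random_play) / (human - random_play)


def summarize(results):
    """results maps game -> (agent, random, human) raw scores."""
    norm = {game: human_normalized(*raw) for game, raw in results.items()}
    n_above_human = sum(1 for v in norm.values() if v > 1.0)
    return median(norm.values()), n_above_human


# Made-up raw scores for three hypothetical games (not results from the paper).
toy = {"game_a": (110, 10, 60), "game_b": (35, 10, 60), "game_c": (210, 10, 110)}
print(summarize(toy))  # (2.0, 2)
```

The median is used rather than the mean because a few games with very large normalized scores would otherwise dominate the summary.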
Deep Factorization Machines for Knowledge Tracing
Title | Deep Factorization Machines for Knowledge Tracing |
Authors | Jill-Jênn Vie |
Abstract | This paper introduces our solution to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We used deep factorization machines, a wide and deep learning model of pairwise relationships between users, items, skills, and other entities. Our solution (AUC 0.815) beat the logistic regression baseline (AUC 0.774) but not the top-performing model (AUC 0.861), and it reveals interesting strategies to build upon item response theory models. |
Tasks | Knowledge Tracing, Language Acquisition |
Published | 2018-05-01 |
URL | http://arxiv.org/abs/1805.00356v1 |
PDF | http://arxiv.org/pdf/1805.00356v1.pdf |
PWC | https://paperswithcode.com/paper/deep-factorization-machines-for-knowledge |
Repo | https://github.com/jilljenn/ktm |
Framework | tf |
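The pairwise relationships mentioned above are the factorization machine interaction term. It can be evaluated in linear time with a standard identity, shown in the code comments and checked against the naive double loop. This is a minimal sketch, assuming `x` is a one-hot encoding of the user, item, and skill blocks. It covers only the FM component, not the deep part of the paper's model, and all names and sizes are illustrative.

```python
import numpy as np


def fm_pairwise_naive(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed in O(n^2 k)."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total


def fm_pairwise(x, V):
    """Same quantity in O(n k): 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]."""
    xv = x @ V
    x2v2 = (x ** 2) @ (V ** 2)
    return 0.5 * float(np.sum(xv ** 2 - x2v2))


def fm_probability(x, w0, w, V):
    """Bias + linear term + pairwise interactions, squashed to P(correct answer)."""
    logit = w0 + float(x @ w) + fm_pairwise(x, V)
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
n_features, k = 6, 3  # 2 users + 2 items + 2 skills, embedding size 3 (illustrative)
V = rng.standard_normal((n_features, k))
w = rng.standard_normal(n_features)
x = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])  # user 0, item 1, skill 0 (one-hot blocks)
print(fm_pairwise(x, V), fm_pairwise_naive(x, V))
print(fm_probability(x, 0.0, w, V))
```

With one-hot inputs, the pairwise term reduces to the sum of inner products between the active user, item, and skill embeddings.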
Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms
Title | Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms |
Authors | Panagiotis Mandros, Mario Boley, Jilles Vreeken |
Abstract | The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the use of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance for both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound style search. |
Tasks | |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05467v1 |
PDF | http://arxiv.org/pdf/1809.05467v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-reliable-dependencies-from-data |
Repo | https://github.com/pmandros/fodiscovery |
Framework | none |
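The greedy search discussed above can be sketched for discrete attributes. The plain fraction of information, F(X;Y) = (H(Y) - H(Y|X)) / H(Y), can only grow as attributes are added, so on its own it overfits. The paper's reliable score corrects for this with the expected value under a permutation model. This sketch approximates that correction by Monte Carlo, averaging F over random permutations of Y. That approximation is an assumption made for brevity, and the sketch omits the paper's branch-and-bound search and bounding functions entirely. Function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Empirical Shannon entropy in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def fraction_of_information(data, attrs, y):
    """F(X;Y) = (H(Y) - H(Y|X)) / H(Y) for the joint attribute set X = attrs."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0
    groups = defaultdict(list)
    for row, label in zip(data, y):
        groups[tuple(row[a] for a in attrs)].append(label)
    h_y_given_x = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return (h_y - h_y_given_x) / h_y


def corrected_fi(data, attrs, y, n_perm=50, seed=0):
    """Subtract a Monte Carlo estimate of F under random permutations of y (assumption)."""
    rng = random.Random(seed)
    bias = 0.0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        bias += fraction_of_information(data, attrs, y_perm)
    return fraction_of_information(data, attrs, y) - bias / n_perm


def greedy_select(data, y, n_attrs):
    """Add the attribute with the best corrected score until no candidate improves it."""
    selected, best = [], 0.0
    while len(selected) < n_attrs:
        scores = {a: corrected_fi(data, selected + [a], y)
                  for a in range(n_attrs) if a not in selected}
        a_star = max(scores, key=scores.get)
        if scores[a_star] <= best:
            break
        selected.append(a_star)
        best = scores[a_star]
    return selected, best


# Toy data: y is a function of attribute 0; attributes 1-3 are noise.
gen = random.Random(1)
data = [[gen.randrange(3) for _ in range(4)] for _ in range(200)]
y = [row[0] for row in data]
print(greedy_select(data, y, n_attrs=4))
```

Without the correction, adding any noise attribute would never lower the score, so the greedy loop would have no principled stopping point.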
Efficient end-to-end learning for quantizable representations
Title | Efficient end-to-end learning for quantizable representations |
Authors | Yeonwoo Jeong, Hyun Oh Song |
Abstract | Embedding representation learning via neural networks is at the core of modern similarity-based search. While much effort has been put into developing algorithms for learning binary Hamming code representations for search efficiency, these still require a linear scan of the entire dataset per query and trade off search accuracy through binarization. To this end, we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end, which can be used to construct an efficient hash table that not only significantly reduces the number of items searched but also achieves state-of-the-art search accuracy, outperforming previous state-of-the-art deep metric learning methods. We also show that the optimal sparse binary hash code in a mini-batch can be computed exactly in polynomial time by solving a minimum cost flow problem. Our results on CIFAR-100 and ImageNet show state-of-the-art search accuracy in precision@k and NMI metrics while providing up to 98X and 478X search speedup, respectively, over exhaustive linear search. The source code is available at https://github.com/maestrojeong/Deep-Hash-Table-ICML18 |
Tasks | Metric Learning, Representation Learning |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05809v3 |
PDF | http://arxiv.org/pdf/1805.05809v3.pdf |
PWC | https://paperswithcode.com/paper/efficient-end-to-end-learning-for-quantizable |
Repo | https://github.com/maestrojeong/Deep-Hash-Table-ICML18 |
Framework | tf |
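Retrieval with k-sparse binary codes, as described above, can be sketched assuming the embeddings are already trained. Each item's code is the set of its k largest activations, and each active index names a hash bucket. A query then scans only the items in its own buckets before re-ranking them by inner product. The class and method names are hypothetical, this bucket scheme is an illustrative assumption rather than the paper's exact retrieval protocol, and the end-to-end training with min-cost-flow code assignment is not shown.

```python
from collections import defaultdict

import numpy as np


def sparse_code(embedding, k):
    """k-sparse binary hash code: the indices of the k largest activations."""
    return frozenset(np.argsort(embedding)[-k:].tolist())


class SparseHashTable:
    """Hypothetical index: one bucket per active code index, re-rank by inner product."""

    def __init__(self, k):
        self.k = k
        self.buckets = defaultdict(set)
        self.vectors = {}

    def add(self, item_id, embedding):
        self.vectors[item_id] = np.asarray(embedding, dtype=float)
        for idx in sparse_code(self.vectors[item_id], self.k):
            self.buckets[idx].add(item_id)

    def query(self, embedding, top_n=5):
        q = np.asarray(embedding, dtype=float)
        # Only items sharing at least one active index with the query are scanned.
        candidates = set()
        for idx in sparse_code(q, self.k):
            candidates |= self.buckets.get(idx, set())
        ranked = sorted(candidates, key=lambda c: float(q @ self.vectors[c]), reverse=True)
        return ranked[:top_n], len(candidates)


table = SparseHashTable(k=1)
table.add("a", [1.0, 0.0, 0.0, 0.0])
table.add("b", [0.9, 0.1, 0.0, 0.0])
table.add("c", [0.0, 0.0, 1.0, 0.0])
print(table.query([1.0, 0.05, 0.0, 0.0]))  # (['a', 'b'], 2): "c" is never scanned
```

The number of scanned candidates, rather than the dataset size, determines query cost, which is where the speedup over exhaustive linear search comes from.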
WikiRank: Improving Keyphrase Extraction Based on Background Knowledge
Title | WikiRank: Improving Keyphrase Extraction Based on Background Knowledge |
Authors | Yang Yu, Vincent Ng |
Abstract | Keyphrases are an efficient representation of the main ideas of documents. While background knowledge can provide valuable information about documents, it is rarely incorporated into keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we output the optimal keyphrase set. Our method improves over other state-of-the-art models by more than 2% in F1-score. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.09000v1 |
PDF | http://arxiv.org/pdf/1803.09000v1.pdf |
PWC | https://paperswithcode.com/paper/wikirank-improving-keyphrase-extraction-based |
Repo | https://github.com/keel-keywordextraction-entitylinking/keywordExtraction |
Framework | none |
Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning
Title | Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning |
Authors | Prithviraj Ammanabrolu, Mark O. Riedl |
Abstract | Text-based adventure games provide a platform on which to explore reinforcement learning in the context of a combinatorial action space, such as natural language. We present a deep reinforcement learning architecture that represents the game state as a knowledge graph which is learned during exploration. This graph is used to prune the action space, enabling more efficient exploration. The question of which action to take can be reduced to a question-answering task, a form of transfer learning that pre-trains certain parts of our architecture. In experiments using the TextWorld framework, we show that our proposed technique can learn a control policy faster than baseline alternatives. We have also open-sourced our code at https://github.com/rajammanabrolu/KG-DQN. |
Tasks | Efficient Exploration, Question Answering, Transfer Learning |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01628v2 |
PDF | http://arxiv.org/pdf/1812.01628v2.pdf |
PWC | https://paperswithcode.com/paper/playing-text-adventure-games-with-graph-based |
Repo | https://github.com/projectzork/Readings |
Framework | none |
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Title | Do CIFAR-10 Classifiers Generalize to CIFAR-10? |
Authors | Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar |
Abstract | Machine learning is currently dominated by largely experimental work focused on improvements in a few key tasks. However, the impressive accuracy numbers of the best performing models are questionable because the same test sets have been used to select these models for multiple years now. To understand the danger of overfitting, we measure the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images. Although we ensure that the new test set is as close to the original data distribution as possible, we find a large drop in accuracy (4% to 10%) for a broad range of deep learning models. Yet more recent models with higher original accuracy show a smaller drop and better overall performance, indicating that this drop is likely not due to overfitting based on adaptivity. Instead, we view our results as evidence that current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00451v1 |
PDF | http://arxiv.org/pdf/1806.00451v1.pdf |
PWC | https://paperswithcode.com/paper/do-cifar-10-classifiers-generalize-to-cifar |
Repo | https://github.com/modestyachts/CIFAR-10.1 |
Framework | none |
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Title | Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers |
Authors | Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi |
Abstract | Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, that effectively generates small text perturbations in a black-box setting that force a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level transformations are applied to the highest-ranked tokens to minimize the edit distance of the perturbation while still changing the original classification. We evaluated DeepWordBug on eight real-world text datasets spanning text classification, sentiment analysis, and spam detection. We compare the results of DeepWordBug with two baselines: Random (black-box) and Gradient (white-box). Our experimental results indicate that DeepWordBug reduces the prediction accuracy of current state-of-the-art deep-learning models, including an average decrease of 68% for a Word-LSTM model and 48% for a Char-CNN model. |
Tasks | Adversarial Text, Sentiment Analysis, Text Classification |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04354v5 |
PDF | http://arxiv.org/pdf/1801.04354v5.pdf |
PWC | https://paperswithcode.com/paper/black-box-generation-of-adversarial-text |
Repo | https://github.com/alankarj/robust_nlp |
Framework | none |
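The two-stage black-box attack described above can be sketched assuming only query access to a class probability. Token importance is measured by a simplified leave-one-out score, the drop in probability when a token is removed. The highest-ranked tokens then receive a single character-level edit, deletion of the middle character. This is not the paper's exact scoring function or edit set, and the toy keyword classifier and all function names are hypothetical.

```python
POSITIVE = {"great", "good", "love"}  # toy vocabulary for a hypothetical classifier
NEGATIVE = {"bad", "awful", "boring"}


def prob_positive(tokens):
    """Hypothetical black-box classifier: smoothed ratio of positive keywords."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (1 + pos) / (2 + pos + neg)


def token_scores(tokens, prob_fn):
    """Leave-one-out importance: how much the probability drops when a token is removed."""
    base = prob_fn(tokens)
    return [base - prob_fn(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def delete_middle_char(word):
    """Character-level edit: drop the middle character to create a misspelling."""
    if len(word) < 2:
        return word
    mid = len(word) // 2
    return word[:mid] + word[mid + 1:]


def attack(tokens, prob_fn, budget=2):
    """Edit the `budget` highest-scoring tokens using only black-box queries."""
    scores = token_scores(tokens, prob_fn)
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    adversarial = list(tokens)
    for i in order[:budget]:
        adversarial[i] = delete_middle_char(adversarial[i])
    return adversarial


sentence = "the movie was great and i love it".split()
adv = attack(sentence, prob_positive)
print(" ".join(adv), prob_positive(sentence), prob_positive(adv))
```

The misspelled tokens fall out of the classifier's vocabulary, so the score shifts even though a human reader still recognizes the words, and each edit changes the edit distance by only one.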
TextBugger: Generating Adversarial Text Against Real-world Applications
Title | TextBugger: Generating Adversarial Text Against Real-world Applications |
Authors | Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang |
Abstract | Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification. Despite its tremendous popularity, the security vulnerabilities of DLTU are still largely unknown, which is highly concerning given its increasing use in security-sensitive applications such as sentiment analysis and toxic content detection. In this paper, we show that DLTU is inherently vulnerable to adversarial text attacks, in which maliciously crafted texts trigger target DLTU systems and services to misbehave. Specifically, we present TextBugger, a general attack framework for generating adversarial texts. TextBugger differs from prior work in significant ways: (i) effective: it outperforms state-of-the-art attacks in terms of attack success rate; (ii) evasive: it preserves the utility of benign text, with 94.9% of the adversarial text correctly recognized by human readers; and (iii) efficient: it generates adversarial text with computational complexity sub-linear in the text length. We empirically evaluate TextBugger on a set of real-world DLTU systems and services used for sentiment analysis and toxic content detection, demonstrating its effectiveness, evasiveness, and efficiency. For instance, TextBugger achieves a 100% success rate on the IMDB dataset against Amazon AWS Comprehend within 4.61 seconds and preserves 97% semantic similarity. We further discuss possible defense mechanisms to mitigate such attacks and the adversary’s potential countermeasures, which lead to promising directions for further research. |
Tasks | Adversarial Text, Machine Translation, Question Answering, Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis, Text Classification |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05271v1 |
PDF | http://arxiv.org/pdf/1812.05271v1.pdf |
PWC | https://paperswithcode.com/paper/textbugger-generating-adversarial-text |
Repo | https://github.com/CatherineWong/dancin_seq2seq |
Framework | pytorch |