Paper Group AWR 1
Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards
Title | Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards |
Authors | Hou Pong Chan, Wang Chen, Lu Wang, Irwin King |
Abstract | Generating keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Although existing generative models are capable of predicting multiple keyphrases for an input document as well as determining the number of keyphrases to generate, they still suffer from the problem of generating too few keyphrases. To address this problem, we propose a reinforcement learning (RL) approach for keyphrase generation, with an adaptive reward function that encourages a model to generate both sufficient and accurate keyphrases. Furthermore, we introduce a new evaluation method that incorporates name variations of the ground-truth keyphrases using the Wikipedia knowledge base. Thus, our evaluation method can more robustly evaluate the quality of predicted keyphrases. Extensive experiments on five real-world datasets of different scales demonstrate that our RL approach consistently and significantly improves the performance of the state-of-the-art generative models with both conventional and new evaluation methods. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04106v1, https://arxiv.org/pdf/1906.04106v1.pdf |
PWC | https://paperswithcode.com/paper/neural-keyphrase-generation-via-reinforcement |
Repo | https://github.com/kenchan0226/keyphrase-generation-rl |
Framework | pytorch |
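The adaptive reward mentioned in the abstract can be illustrated with a small sketch. It assumes, since the abstract does not spell this out, that the reward is recall while the model has produced fewer keyphrases than the ground truth and F1 once it has produced enough. The function name `adaptive_reward` and the exact-string matching are hypothetical simplifications, not the authors' implementation.

```python
def adaptive_reward(predicted, targets):
    """Recall when too few keyphrases were generated, F1 otherwise (assumed form)."""
    pred = list(dict.fromkeys(predicted))  # drop duplicate predictions, keep order
    gold = set(targets)
    if not pred or not gold:
        return 0.0
    correct = sum(1 for p in pred if p in gold)
    recall = correct / len(gold)
    if len(pred) < len(gold):
        # Too few predictions: reward coverage only, so generating more is encouraged.
        return recall
    precision = correct / len(pred)
    if precision + recall == 0:
        return 0.0
    # Enough predictions: F1 also penalizes wrong or superfluous keyphrases.
    return 2 * precision * recall / (precision + recall)


gold = ["neural networks", "keyphrase generation", "reinforcement learning"]
too_few = adaptive_reward(["keyphrase generation"], gold)
enough = adaptive_reward(
    ["keyphrase generation", "neural networks", "deep learning"], gold
)
```

Under this assumed form, a single correct keyphrase out of three earns only one third, while three predictions with two correct earn two thirds, so the reward favors outputs that are both sufficient and accurate.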
Robustness Assessment for Adversarial Machine Learning: Problems, Solutions and a Survey of Current Neural Networks and Defenses
Title | Robustness Assessment for Adversarial Machine Learning: Problems, Solutions and a Survey of Current Neural Networks and Defenses |
Authors | Danilo Vasconcellos Vargas, Shashank Kotyan |
Abstract | In adversarial machine learning, there are a huge number of attacks of various types, which makes evaluating the robustness of new models and defences a daunting task. To make matters worse, there is an inherent bias in attacks and defences. Here, we organize the problems faced (model dependence, insufficient evaluation, false adversarial samples and perturbation-dependent results) and propose a model-agnostic dual ($L_0$ and $L_\infty$) quality assessment method together with the concept of robustness levels to tackle them. We validate the dual quality assessment on state-of-the-art models (WideResNet, ResNet, AllConv, DenseNet, NIN, LeNet and CapsNet) as well as the current hardest defences proposed at ICLR 2018 and the widely known adversarial training, showing that current models and defences are vulnerable at all levels of robustness. The robustness assessment shows that, depending on the metric used (i.e., $L_0$ or $L_\infty$), the robustness may change significantly, and therefore duality should be taken into account for a correct assessment. Moreover, a mathematical derivation, as well as a counterexample, suggests that $L_1$ and $L_2$ metrics alone are not enough to avoid false adversarial samples. Interestingly, a by-product of the proposed assessment is a novel $L_\infty$ black-box method which requires even less perturbation than the One-Pixel Attack (only 12% of the One-Pixel Attack’s amount of perturbation) to achieve similar results. Thus, this paper elucidates the problems of robustness evaluation, proposes a dual quality assessment to tackle them, and surveys the robustness of current models and defences. Code available at http://bit.ly/DualQualityAssessment. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06026v2, https://arxiv.org/pdf/1906.06026v2.pdf |
PWC | https://paperswithcode.com/paper/model-agnostic-dual-quality-assessment-for |
Repo | https://github.com/shashankkotyan/DualQualityAssessment |
Framework | tf |
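The two metrics behind the dual assessment measure different things: $L_0$ counts how many components were changed, while $L_\infty$ measures the largest single change. The sketch below illustrates this on a toy four-pixel "image"; the helper names and the example values are hypothetical and are not taken from the paper's code.

```python
def l0_norm(original, adversarial, tol=0.0):
    """Number of components (e.g. pixels) whose value was changed."""
    return sum(1 for a, b in zip(original, adversarial) if abs(a - b) > tol)


def linf_norm(original, adversarial):
    """Largest absolute change applied to any single component."""
    return max((abs(a - b) for a, b in zip(original, adversarial)), default=0.0)


image = [0.10, 0.50, 0.90, 0.30]
few_pixel = [0.10, 1.00, 0.90, 0.30]   # one pixel changed by a large amount
threshold = [0.12, 0.48, 0.92, 0.28]   # every pixel changed by a small amount

few_pixel_scores = (l0_norm(image, few_pixel), linf_norm(image, few_pixel))
threshold_scores = (l0_norm(image, threshold), linf_norm(image, threshold))
```

The first perturbation is small in $L_0$ (one pixel) but large in $L_\infty$ (0.5), and the second is the reverse (four pixels, about 0.02 each), which is why a model's measured robustness can depend strongly on the metric.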
Generated Loss, Augmented Training, and Multiscale VAE
Title | Generated Loss, Augmented Training, and Multiscale VAE |
Authors | Jason Chou, Gautam Hathi |
Abstract | The variational autoencoder (VAE) framework remains a popular option for training unsupervised generative models, especially for discrete data where generative adversarial networks (GANs) require workarounds to create gradients for the generator. In our work modeling US postal addresses, we show that our discrete VAE with a tree-recursive architecture has limited capability to capture field correlations within structured data, even after overcoming the challenge of posterior collapse with scheduled sampling and tuning of the KL-divergence weight $\beta$. Worse, the VAE seems to have difficulty mapping its generated samples to the latent space, as their VAE loss lags behind or even increases during the training process. Motivated by this observation, we show that augmenting training data with generated variants (augmented training) and training a VAE with multiple values of $\beta$ simultaneously (multiscale VAE) both improve the generation quality of the VAE. Despite their differences in motivation and emphasis, we show that augmented training and multiscale VAE are actually connected and have similar effects on the model. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10446v1, http://arxiv.org/pdf/1904.10446v1.pdf |
PWC | https://paperswithcode.com/paper/generated-loss-augmented-training-and |
Repo | https://github.com/EIFY/vermont_address |
Framework | none |
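The role of the KL-divergence weight $\beta$ can be seen in a minimal sketch of a $\beta$-weighted VAE objective. It assumes a diagonal Gaussian posterior and a standard normal prior, which the abstract does not state, and the function names are hypothetical. It does not reproduce the paper's tree-recursive model or its multiscale training scheme.

```python
import math


def gaussian_kl(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(recon_loss, mu, log_var, beta):
    """Reconstruction term plus the KL term scaled by beta (assumed objective)."""
    return recon_loss + beta * gaussian_kl(mu, log_var)


mu, log_var = [1.0, 0.0], [0.0, 0.0]
# The same encoder output is penalized differently depending on beta.
losses = {beta: beta_vae_loss(2.0, mu, log_var, beta) for beta in (0.1, 1.0, 4.0)}
```

A small $\beta$ lets the posterior drift away from the prior to improve reconstruction, while a large $\beta$ pulls it toward the prior; tuning this weight is one of the remedies for posterior collapse mentioned in the abstract.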
Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
Title | Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization |
Authors | Navid Azizan, Sahin Lale, Babak Hassibi |
Abstract | Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something that comes for free in the highly overparameterized case), SMD with sufficiently small step size converges to a global minimum that is approximately the closest one in Bregman divergence. On the experimental side, our extensive experiments on standard datasets and models, using various initializations, various mirror descents, and various Bregman divergences, consistently confirm that this phenomenon happens in deep learning. Our experiments further indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms. Experimenting on a standard image dataset and network architecture with SMD with different kinds of implicit regularization, $\ell_1$ to encourage sparsity, $\ell_2$ yielding SGD, and $\ell_{10}$ to discourage large components in the parameter vector, consistently and definitively shows that $\ell_{10}$-SMD has better generalization performance than SGD, which in turn has better generalization performance than $\ell_1$-SMD. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03830v1, https://arxiv.org/pdf/1906.03830v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-mirror-descent-on |
Repo | https://github.com/SahinLale/StochasticMirrorDescent |
Framework | pytorch |
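A single mirror descent update can be sketched as follows, assuming the potential $\psi(w) = \frac{1}{q}\sum_i |w_i|^q$ with $q > 1$, so that the update is $\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta\, g_t$ with $g_t$ a stochastic gradient. The function names are hypothetical and this is not the authors' code; the map is not invertible at $q = 1$, so a sparsity-inducing variant in this sketch would need a value slightly above 1.

```python
import math


def mirror_map(w, q):
    """Gradient of psi(w) = (1/q) * sum(|w_i|^q): sign(w_i) * |w_i|^(q-1)."""
    return [math.copysign(abs(x) ** (q - 1), x) for x in w]


def inverse_mirror_map(z, q):
    """Inverse of mirror_map: sign(z_i) * |z_i|^(1/(q-1)), valid for q > 1."""
    return [math.copysign(abs(v) ** (1.0 / (q - 1)), v) for v in z]


def smd_step(w, grad, lr, q):
    """One mirror descent step: move in the mirrored (dual) space, then map back."""
    z = [zi - lr * gi for zi, gi in zip(mirror_map(w, q), grad)]
    return inverse_mirror_map(z, q)


w = [1.0, -2.0]
g = [0.5, 0.5]             # stand-in for a gradient computed on one mini-batch
sgd_like = smd_step(w, g, lr=0.1, q=2)    # q = 2 reduces to w - lr * g
l10_like = smd_step(w, g, lr=0.1, q=10)   # q = 10 discourages large components
```

With $q = 2$ the mirror map is the identity and the step is ordinary SGD, which matches the abstract's statement that SGD is a special case of SMD.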
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Title | Quantifying and Alleviating the Language Prior Problem in Visual Question Answering |
Authors | Yangyang Guo, Zhiyong Cheng, Liqiang Nie, Yibing Liu, Yinglong Wang, Mohan Kankanhalli |
Abstract | Benefiting from the advancement of computer vision, natural language processing and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a video, has received much attention over the past few years. Although some progress has been achieved so far, several studies have pointed out that current VQA models are heavily affected by the language prior problem, which means they tend to answer questions based on the co-occurrence patterns of question keywords (e.g., how many) and answers (e.g., 2) instead of understanding images and questions. Existing methods attempt to solve this problem by either balancing the biased datasets or forcing models to better understand images. However, only marginal effects and even performance deterioration are observed for the first and second solution, respectively. In addition, another important issue is the lack of a metric to quantify the extent of the language prior effect, which severely hinders the advancement of related techniques. In this paper, we make contributions to solve the above problems from two perspectives. Firstly, we design a metric to quantitatively measure the language prior effect of VQA models. The proposed metric has been demonstrated to be effective in our empirical studies. Secondly, we propose a regularization method (i.e., score regularization module) to enhance current VQA models by alleviating the language prior problem as well as boosting the backbone model performance. The proposed score regularization module adopts a pair-wise learning strategy, which makes the VQA models answer the question based on reasoning about the image (given this question) instead of relying on question-answer patterns observed in the biased training set. The score regularization module can be flexibly integrated into various VQA models. |
Tasks | Information Retrieval, Question Answering, Visual Question Answering |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04877v1, https://arxiv.org/pdf/1905.04877v1.pdf |
PWC | https://paperswithcode.com/paper/quantifying-and-alleviating-the-language |
Repo | https://github.com/guoyang9/vqa-prior |
Framework | pytorch |
Recovery Guarantees for Compressible Signals with Adversarial Noise
Title | Recovery Guarantees for Compressible Signals with Adversarial Noise |
Authors | Jasjeet Dhaliwal, Kyle Hambrook |
Abstract | We provide recovery guarantees for compressible signals that have been corrupted with noise and extend the framework introduced in Bafna et al. (2018) to defend neural networks against $\ell_0$-norm, $\ell_2$-norm, and $\ell_{\infty}$-norm attacks. Our results are general as they can be applied to most unitary transforms used in practice and hold for $\ell_0$-norm, $\ell_2$-norm, and $\ell_\infty$-norm bounded noise. In the case of $\ell_0$-norm noise, we prove recovery guarantees for Iterative Hard Thresholding (IHT) and Basis Pursuit (BP). For $\ell_2$-norm bounded noise, we provide recovery guarantees for BP and for the case of $\ell_\infty$-norm bounded noise, we provide recovery guarantees for Dantzig Selector (DS). These guarantees theoretically bolster the defense framework introduced in Bafna et al. (2018) for defending neural networks against adversarial inputs. Finally, we experimentally demonstrate the effectiveness of this defense framework against an array of $\ell_0$, $\ell_2$ and $\ell_\infty$ norm attacks. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06565v3, https://arxiv.org/pdf/1907.06565v3.pdf |
PWC | https://paperswithcode.com/paper/recovery-guarantees-for-compressible-signals |
Repo | https://github.com/jasjeetIM/recovering_compressible_signals |
Framework | tf |
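Iterative Hard Thresholding, one of the recovery algorithms named in the abstract, can be sketched in its generic textbook form: repeat $x \leftarrow H_s\big(x + \mu A^\top (y - A x)\big)$, where $H_s$ keeps the $s$ largest-magnitude entries. The sketch below is a plain-Python illustration of that generic iteration under the assumption of a measurement model $y = Ax$; the function names, step size and toy matrix are hypothetical, and it is not the defense pipeline from the paper.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def iht(A, y, s, step=1.0, iters=50):
    """Generic IHT: a gradient step on ||y - Ax||^2 followed by hard thresholding."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([x[j] + step * grad[j] for j in range(n)], s)
    return x


# Toy case with a unitary (identity) transform: the small middle entry is
# treated as noise and removed, leaving the two dominant coefficients.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0]
recovered = iht(A, y, s=2)
```

In this toy case the iterate settles on the two dominant coefficients and discards the small one, which is the behavior the recovery guarantees formalize for compressible signals.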
Repurposing Entailment for Multi-Hop Question Answering Tasks
Title | Repurposing Entailment for Multi-Hop Question Answering Tasks |
Authors | Harsh Trivedi, Heeyoung Kwon, Tushar Khot, Ashish Sabharwal, Niranjan Balasubramanian |
Abstract | Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. However, for multi-hop QA tasks, which require reasoning with multiple sentences, it remains unclear how best to utilize entailment models pre-trained on large-scale datasets such as SNLI, which are based on sentence pairs. We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. Multee uses (i) a local module that helps locate important sentences, thereby avoiding distracting information, and (ii) a global module that aggregates information by effectively incorporating importance weights. Importantly, we show that both modules can use entailment functions pre-trained on large-scale NLI datasets. We evaluate performance on MultiRC and OpenBookQA, two multi-hop QA datasets. When using an entailment function pre-trained on NLI datasets, Multee outperforms QA models trained only on the target QA datasets and the OpenAI transformer models. The code is available at https://github.com/StonyBrookNLP/multee. |
Tasks | Question Answering |
Published | 2019-04-20 |
URL | http://arxiv.org/abs/1904.09380v1, http://arxiv.org/pdf/1904.09380v1.pdf |
PWC | https://paperswithcode.com/paper/190409380 |
Repo | https://github.com/StonyBrookNLP/multee |
Framework | pytorch |
Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Title | Learning Temporal Pose Estimation from Sparsely-Labeled Videos |
Authors | Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani |
Abstract | Modern approaches for multi-person pose estimation in video require large amounts of dense annotations. However, labeling every frame in a video is costly and labor intensive. To reduce the need for dense annotations, we propose a PoseWarper network that leverages training videos with sparse annotations (every k frames) to learn to perform dense temporal pose propagation and estimation. Given a pair of video frames—a labeled Frame A and an unlabeled Frame B—we train our model to predict human pose in Frame A using the features from Frame B by means of deformable convolutions to implicitly learn the pose warping between A and B. We demonstrate that we can leverage our trained PoseWarper for several applications. First, at inference time we can reverse the application direction of our network in order to propagate pose information from manually annotated frames to unlabeled frames. This makes it possible to generate pose annotations for the entire video given only a few manually-labeled frames. Compared to modern label propagation methods based on optical flow, our warping mechanism is much more compact (6M vs 39M parameters), and also more accurate (88.7% mAP vs 83.8% mAP). We also show that we can improve the accuracy of a pose estimator by training it on an augmented dataset obtained by adding our propagated poses to the original manual labels. Lastly, we can use our PoseWarper to aggregate temporal pose information from neighboring frames during inference. This allows our system to achieve state-of-the-art pose detection results on the PoseTrack2017 and PoseTrack2018 datasets. Code has been made available at: https://github.com/facebookresearch/PoseWarper. |
Tasks | Multi-Person Pose Estimation, Optical Flow Estimation, Pose Estimation |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.04016v3, https://arxiv.org/pdf/1906.04016v3.pdf |
PWC | https://paperswithcode.com/paper/learning-temporal-pose-estimation-from |
Repo | https://github.com/facebookresearch/PoseWarper |
Framework | pytorch |
False Data Injection Attacks in Internet of Things and Deep Learning enabled Predictive Analytics
Title | False Data Injection Attacks in Internet of Things and Deep Learning enabled Predictive Analytics |
Authors | Gautam Raj Mode, Prasad Calyam, Khaza Anuarul Hoque |
Abstract | Industry 4.0 is the latest industrial revolution, primarily merging automation with advanced manufacturing to reduce direct human effort and resources. Predictive maintenance (PdM) is an Industry 4.0 solution that facilitates predicting faults in a component or a system, powered by state-of-the-art machine learning (ML) algorithms and Internet-of-Things (IoT) sensors. However, IoT sensors and deep learning (DL) algorithms are both known for their vulnerabilities to cyber-attacks. In the context of PdM systems, such attacks can have catastrophic consequences as they are hard to detect due to the nature of the attack. To date, the majority of the published literature focuses on the accuracy of DL-enabled PdM systems and often ignores the effect of such attacks. In this paper, we demonstrate the effect of IoT sensor attacks on a PdM system. First, we use three state-of-the-art DL algorithms, specifically, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), for predicting the Remaining Useful Life (RUL) of a turbofan engine using NASA’s C-MAPSS dataset. The obtained results show that the GRU-based PdM model outperforms some of the recent literature on RUL prediction using the C-MAPSS dataset. Afterward, we model two different types of false data injection attacks (FDIA) on turbofan engine sensor data and evaluate their impact on CNN, LSTM, and GRU-based PdM systems. The obtained results demonstrate that FDI attacks on even a few IoT sensors can strongly degrade the RUL prediction. However, the GRU-based PdM model performs better in terms of accuracy and resiliency. Lastly, we perform a study on the GRU-based PdM model using four different GRU networks with different sequence lengths. Our experiments reveal an interesting relationship between the accuracy, resiliency and sequence length for the GRU-based PdM models. |
Tasks | |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01716v4, https://arxiv.org/pdf/1910.01716v4.pdf |
PWC | https://paperswithcode.com/paper/false-data-injection-attacks-in-internet-of |
Repo | https://github.com/dependable-cps/FDIA-PdM |
Framework | none |
Latent Variable Sentiment Grammar
Title | Latent Variable Sentiment Grammar |
Authors | Liwen Zhang, Kewei Tu, Yue Zhang |
Abstract | Neural models have been investigated for sentiment classification over constituent trees. They learn phrase composition automatically by encoding tree structures but do not explicitly model sentiment composition, which requires encoding sentiment class labels. To this end, we investigate two formalisms with deep sentiment representations that capture sentiment subtype expressions by latent variables and Gaussian mixture vectors, respectively. Experiments on the Stanford Sentiment Treebank (SST) show the effectiveness of sentiment grammar over vanilla neural encoders. Using ELMo embeddings, our method gives the best results on this benchmark. |
Tasks | Sentiment Analysis |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00218v2, https://arxiv.org/pdf/1907.00218v2.pdf |
PWC | https://paperswithcode.com/paper/latent-variable-sentiment-grammar |
Repo | https://github.com/Ehaschia/bi-tree-lstm-crf |
Framework | pytorch |
Faking and Discriminating the Navigation Data of a Micro Aerial Vehicle Using Quantum Generative Adversarial Networks
Title | Faking and Discriminating the Navigation Data of a Micro Aerial Vehicle Using Quantum Generative Adversarial Networks |
Authors | Michel Barbeau, Joaquin Garcia-Alfaro |
Abstract | We show that the Quantum Generative Adversarial Network (QGAN) paradigm can be employed by an adversary to learn to generate data that deceives the monitoring of a Cyber-Physical System (CPS) and to perpetrate a covert attack. As a test case, the ideas are elaborated considering the navigation data of a Micro Aerial Vehicle (MAV). A concrete QGAN design is proposed to generate fake MAV navigation data. Initially, the adversary is entirely ignorant of the dynamics of the CPS, which is the strength of the approach from the adversary's point of view. A design is also proposed to discriminate between genuine and fake MAV navigation data. The designs combine classical optimization, qubit quantum computing and photonic quantum computing. Using the PennyLane software simulation, they are evaluated on a classical computing platform. We assess the learning time and accuracy of the navigation data generator and discriminator versus space complexity, i.e., the amount of quantum memory needed to solve the problem. |
Tasks | |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03038v3, https://arxiv.org/pdf/1907.03038v3.pdf |
PWC | https://paperswithcode.com/paper/faking-and-discriminating-the-navigation-data |
Repo | https://github.com/jgalfaro/mirrored-QGANMAV |
Framework | none |
Please Stop Permuting Features: An Explanation and Alternatives
Title | Please Stop Permuting Features: An Explanation and Alternatives |
Authors | Giles Hooker, Lucas Mentch |
Abstract | This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because of their ability to provide model-agnostic measures that depend only on the pre-trained model output. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to provide an explanation for the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects through various settings where a ground-truth is understood and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features both in variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models does not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods. |
Tasks | |
Published | 2019-05-01 |
URL | http://arxiv.org/abs/1905.03151v1, http://arxiv.org/pdf/1905.03151v1.pdf |
PWC | https://paperswithcode.com/paper/190503151 |
Repo | https://github.com/antonFJohansson/Please-Stop-Permuting-Features |
Framework | none |
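For readers unfamiliar with the procedure being criticized, a permute-and-predict importance score can be sketched as the increase in error after shuffling one feature column of held-out data. This is a generic illustration with hypothetical names and toy data, not code from the paper or from the linked repository.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    rng = random.Random(seed)
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value; all other features are untouched,
    # which is exactly what breaks the dependence between features.
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    return mse(model, X_perm, y) - mse(model, X, y)


X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def model(row):
    return 2.0 * row[0]  # toy model that uses only the first feature


importances = [permutation_importance(model, X, y, c) for c in range(2)]
```

Because each column is shuffled independently of the others, the permuted rows can combine feature values that never occur together in the data, which is the extrapolation problem the abstract describes for strongly dependent features.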
Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments
Title | Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments |
Authors | Vasilis Syrgkanis, Victor Lei, Miruna Oprescu, Maggie Hei, Keith Battocchi, Greg Lewis |
Abstract | We consider the estimation of heterogeneous treatment effects with arbitrary machine learning methods in the presence of unobserved confounders with the aid of a valid instrument. Such settings arise in A/B tests with an intent-to-treat structure, where the experimenter randomizes over which user will receive a recommendation to take an action, and we are interested in the effect of the downstream action. We develop a statistical learning approach to the estimation of heterogeneous effects, reducing the problem to the minimization of an appropriate loss function that depends on a set of auxiliary models (each corresponding to a separate prediction task). The reduction enables the use of all recent algorithmic advances (e.g. neural nets, forests). We show that the estimated effect model is robust to estimation errors in the auxiliary models, by showing that the loss satisfies a Neyman orthogonality criterion. Our approach can be used to estimate projections of the true effect model on simpler hypothesis spaces. When these spaces are parametric, then the parameter estimates are asymptotically normal, which enables construction of confidence sets. We applied our method to estimate the effect of membership on downstream webpage engagement on TripAdvisor, using as an instrument an intent-to-treat A/B test among 4 million TripAdvisor users, where some users received an easier membership sign-up process. We also validate our method on synthetic data and on public datasets for the effects of schooling on income. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10176v3, https://arxiv.org/pdf/1905.10176v3.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-estimation-of-heterogeneous-1 |
Repo | https://github.com/Microsoft/EconML |
Framework | none |
Learning Nonsymmetric Determinantal Point Processes
Title | Learning Nonsymmetric Determinantal Point Processes |
Authors | Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, Syrine Krichene |
Abstract | Determinantal point processes (DPPs) have attracted substantial attention as an elegant probabilistic model that captures the balance between quality and diversity within sets. DPPs are conventionally parameterized by a positive semi-definite kernel matrix, and this symmetric kernel encodes only repulsive interactions between items. These so-called symmetric DPPs have significant expressive power, and have been successfully applied to a variety of machine learning tasks, including recommendation systems, information retrieval, and automatic summarization, among many others. Efficient algorithms for learning symmetric DPPs and sampling from these models have been reasonably well studied. However, relatively little attention has been given to nonsymmetric DPPs, which relax the symmetric constraint on the kernel. Nonsymmetric DPPs allow for both repulsive and attractive item interactions, which can significantly improve modeling power, resulting in a model that may be a better fit for some applications. We present a method that enables a tractable algorithm, based on maximum likelihood estimation, for learning nonsymmetric DPPs from data composed of observed subsets. Our method imposes a particular decomposition of the nonsymmetric kernel that enables such tractable learning algorithms, which we analyze both theoretically and experimentally. We evaluate our model on synthetic and real-world datasets, demonstrating improved predictive performance compared to symmetric DPPs, which have previously shown strong performance on modeling tasks associated with these datasets. |
Tasks | Information Retrieval, Point Processes, Recommendation Systems |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12962v2, https://arxiv.org/pdf/1905.12962v2.pdf |
PWC | https://paperswithcode.com/paper/learning-nonsymmetric-determinantal-point |
Repo | https://github.com/cgartrel/nonsymmetric-DPP-learning |
Framework | pytorch |
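The likelihood that maximum likelihood estimation optimizes for a DPP with kernel $L$ is $P(Y) = \det(L_Y) / \det(L + I)$, where $L_Y$ is the submatrix indexed by the observed subset $Y$. The sketch below evaluates it for a tiny hypothetical nonsymmetric kernel built from a symmetric part plus a skew-symmetric part; the kernel values and function names are illustrative assumptions and do not reproduce the paper's decomposition or its learning algorithm.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion (adequate for the tiny matrices here)."""
    if not M:
        return 1.0  # determinant of the empty matrix, used for the empty subset
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def dpp_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I)."""
    n = len(L)
    L_sub = [[L[i][j] for j in subset] for i in subset]
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(L_sub) / det(L_plus_I)


symmetric = [[1.0, 0.5], [0.5, 1.0]]       # off-diagonals lower det(L_Y): repulsion
nonsymmetric = [[1.0, 0.5], [-0.5, 1.0]]   # skew part raises det(L_Y): attraction

subsets = [c for r in range(3) for c in combinations(range(2), r)]
total_mass = sum(dpp_probability(nonsymmetric, s) for s in subsets)
pair_sym = dpp_probability(symmetric, (0, 1))
pair_nonsym = dpp_probability(nonsymmetric, (0, 1))
```

In this toy kernel the two items are more likely to appear together under the nonsymmetric kernel than under the symmetric one, a small instance of the attractive interactions that symmetric DPPs cannot express.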
Tightness-aware Evaluation Protocol for Scene Text Detection
Title | Tightness-aware Evaluation Protocol for Scene Text Detection |
Authors | Yuliang Liu, Lianwen Jin, Zecheng Xie, Canjie Luo, Shuaitao Zhang, Lele Xie |
Abstract | Evaluation protocols play a key role in the development of text detection methods. There are strict requirements to ensure that the evaluation methods are fair, objective and reasonable. However, existing metrics exhibit some obvious drawbacks: 1) they are not goal-oriented; 2) they cannot recognize the tightness of detection methods; 3) existing one-to-many and many-to-one solutions involve inherent loopholes and deficiencies. Therefore, this paper proposes a novel evaluation protocol called the Tightness-aware Intersect-over-Union (TIoU) metric, which quantifies completeness of ground truth, compactness of detection, and tightness of matching degree. Specifically, instead of merely using the IoU value, two common detection behaviors are properly considered, and the TIoU score is used directly to recognize tightness. In addition, we further propose a straightforward method to address the annotation granularity issue, which can fairly evaluate word and text-line detections simultaneously. By adopting the detection results from published methods and general object detection frameworks, comprehensive experiments on the ICDAR 2013 and ICDAR 2015 datasets are conducted to compare recent metrics and the proposed TIoU metric. The comparison demonstrates some promising new prospects, e.g., determining the methods and frameworks for which the detection is tighter and more beneficial to recognition. Our method is extremely simple; its novelty is that the proposed metric uses simple but reasonable improvements that lead to many interesting and insightful prospects and solve most of the issues of the previous metrics. The code is publicly available at https://github.com/Yuliang-Liu/TIoU-metric . |
Tasks | Object Detection, Scene Text Detection |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1904.00813v1, http://arxiv.org/pdf/1904.00813v1.pdf |
PWC | https://paperswithcode.com/paper/tightness-aware-evaluation-protocol-for-scene |
Repo | https://github.com/Yuliang-Liu/TIoU-metric |
Framework | none |
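The limitation of plain IoU that motivates a tightness-aware metric can be shown with a small sketch. It computes ordinary IoU for axis-aligned boxes given as `(x1, y1, x2, y2)`; the helper names and boxes are hypothetical, real scene text annotations are often quadrilaterals, and this is not the TIoU formula itself.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if it is degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(a, b):
    """Plain intersection over union of two axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 10.0, 2.0)
cut_off = (0.0, 0.0, 5.0, 2.0)     # misses the right half of the text
too_loose = (0.0, 0.0, 10.0, 4.0)  # covers all the text plus extra background

scores = (iou(ground_truth, cut_off), iou(ground_truth, too_loose))
```

Both detections score the same IoU of 0.5 even though one loses half the text and the other only adds background, which is the kind of ambiguity that measuring completeness and compactness separately is meant to resolve.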