Paper Group ANR 1741
Quality Assessment of DIBR-synthesized views: An Overview. L*ReLU: Piece-wise Linear Activation Functions for Deep Fine-grained Visual Categorization. Zooming into Face Forensics: A Pixel-level Analysis. Measuring the Quality of Explanations: The System Causability Scale (SCS). Comparing Human and Machine Explanations. Prediction of Highway Lane Ch …
Quality Assessment of DIBR-synthesized views: An Overview
Title | Quality Assessment of DIBR-synthesized views: An Overview |
Authors | Shishun Tian, Lu Zhang, Wenbin Zou, Xia Li, Ting Su, Luce Morin, Olivier Deforges |
Abstract | Depth-Image-Based Rendering (DIBR) is one of the fundamental techniques for generating new views in 3D video applications, such as Multi-View Videos (MVV), Free-Viewpoint Videos (FVV) and Virtual Reality (VR). However, the quality assessment of DIBR-synthesized views differs considerably from that of traditional 2D images/videos. In recent years, several efforts have been made on this topic, but a detailed survey is still lacking in the literature. In this paper, we provide a comprehensive survey of current approaches to the quality assessment of DIBR-synthesized views. We first review the currently accessible datasets of DIBR-synthesized views, followed by a summary and analysis of representative state-of-the-art objective metrics. Then, the performance of the different objective metrics is evaluated and discussed on all available datasets. Finally, we discuss the potential challenges and suggest possible directions for future research. |
Tasks | |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.07036v1 |
PDF | https://arxiv.org/pdf/1911.07036v1.pdf |
PWC | https://paperswithcode.com/paper/quality-assessment-of-dibr-synthesized-views |
Repo | |
Framework | |
L*ReLU: Piece-wise Linear Activation Functions for Deep Fine-grained Visual Categorization
Title | L*ReLU: Piece-wise Linear Activation Functions for Deep Fine-grained Visual Categorization |
Authors | Mina Basirat, Peter M. Roth |
Abstract | Deep neural networks have paved the way for significant improvements in visual categorization in recent years. However, even though tasks vary widely in complexity and difficulty, existing solutions mostly build on the same architectural decisions. This also applies to the selection of activation functions (AFs), where most approaches build on Rectified Linear Units (ReLUs). In this paper, however, we show that the choice of a proper AF has a significant impact on classification accuracy, in particular when fine, subtle details are of relevance. We therefore propose to model the degree of absence and presence of features via the AF using piece-wise linear functions, which we refer to as L*ReLU. In this way, we can ensure the required properties while still inheriting the computational efficiency of ReLUs. We demonstrate our approach for the task of Fine-grained Visual Categorization (FGVC), running experiments on seven different benchmark datasets. The results not only demonstrate superior performance but also show that for tasks with different characteristics, different AFs are selected. |
Tasks | Fine-Grained Visual Categorization |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12259v1 |
PDF | https://arxiv.org/pdf/1910.12259v1.pdf |
PWC | https://paperswithcode.com/paper/lrelu-piece-wise-linear-activation-functions |
Repo | |
Framework | |
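The abstract above describes the activation only at a high level; as a rough illustration, here is a minimal NumPy sketch of a piece-wise linear, ReLU-style activation. The parameterization and slope value are assumptions for illustration, not the paper's tuned, task-specific settings.

```python
import numpy as np

def l_star_relu(x, neg_slope=0.1):
    # Identity for non-negative inputs; a linear slope for negative inputs.
    # The paper selects the negative-branch shape per task; 0.1 is a placeholder.
    return np.where(x >= 0, x, neg_slope * x)

print(l_star_relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [-0.2  -0.05  0.    1.5 ]
```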
Zooming into Face Forensics: A Pixel-level Analysis
Title | Zooming into Face Forensics: A Pixel-level Analysis |
Authors | Jia Li, Tong Shen, Wei Zhang, Hui Ren, Dan Zeng, Tao Mei |
Abstract | The stunning progress of face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society. Face forensics techniques that can distinguish such tampered images are urgently needed. The large-scale dataset “FaceForensics++” provides enormous training data generated by prominent face manipulation methods to facilitate anti-fake research. However, previous works mostly cast the problem as classification, considering only a global prediction. Through investigation of the problem, we find that training a classification network often fails to capture high-quality features, which may lead to sub-optimal solutions. In this paper, we zoom in on the problem by conducting a pixel-level analysis, i.e., formulating it as a pixel-level segmentation task. By evaluating multiple architectures on both segmentation and classification tasks, we show the superiority of viewing the problem from a segmentation perspective. Several ablation studies are also performed to investigate what makes an effective and efficient anti-fake model. Strong baselines are established, which, we hope, can shed some light on the field of face forensics. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05790v1 |
PDF | https://arxiv.org/pdf/1912.05790v1.pdf |
PWC | https://paperswithcode.com/paper/zooming-into-face-forensics-a-pixel-level |
Repo | |
Framework | |
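To make the segmentation-vs-classification framing concrete, here is a minimal PyTorch sketch of per-pixel supervision. The pooling used to recover an image-level score is an assumption, not necessarily the paper's choice.

```python
import torch
import torch.nn.functional as F

def pixel_level_loss(logits, masks):
    # Supervise every pixel with its manipulation label instead of a single
    # global real/fake label; logits and masks are (N, 1, H, W) tensors.
    return F.binary_cross_entropy_with_logits(logits, masks)

def image_level_score(logits):
    # One simple way (an illustrative choice) to reduce the dense prediction
    # back to an image-level fake probability.
    return torch.sigmoid(logits).mean(dim=(1, 2, 3))
```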
Measuring the Quality of Explanations: The System Causability Scale (SCS). Comparing Human and Machine Explanations
Title | Measuring the Quality of Explanations: The System Causability Scale (SCS). Comparing Human and Machine Explanations |
Authors | Andreas Holzinger, André Carrington, Heimo Müller |
Abstract | Recent success in Artificial Intelligence (AI) and Machine Learning (ML) allows problems to be solved automatically, without human intervention. Autonomous approaches can be very convenient. However, in certain domains, e.g., the medical domain, it is necessary to enable a domain expert to understand why an algorithm came up with a certain result. Consequently, the field of Explainable AI (xAI) has rapidly gained interest worldwide in various domains, particularly in medicine. Explainable AI studies the transparency and traceability of opaque AI/ML, and a huge variety of methods already exists. For example, layer-wise relevance propagation can highlight the relevant parts of the inputs to, and representations in, a neural network that caused a result. This is a first important step towards ensuring that end users, e.g., medical professionals, assume responsibility for decision making with AI/ML, and it is of interest to professionals and regulators. Interactive ML adds human expertise to AI/ML processes by enabling domain experts to re-enact and retrace AI/ML results, e.g., to check them for plausibility. This requires new human-AI interfaces for explainable AI. In order to build effective and efficient interactive human-AI interfaces, we have to address the question of how to evaluate the quality of explanations given by an explainable AI system. In this paper we introduce our System Causability Scale (SCS) for measuring the quality of explanations. It is based on our notion of Causability (Holzinger et al., 2019) combined with concepts adapted from a widely accepted usability scale. |
Tasks | Decision Making |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09024v1 |
PDF | https://arxiv.org/pdf/1912.09024v1.pdf |
PWC | https://paperswithcode.com/paper/measuring-the-quality-of-explanations-the |
Repo | |
Framework | |
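The SCS is adapted from a widely accepted usability scale (in the spirit of the System Usability Scale); as a sketch of how such Likert-based scales are typically aggregated, assuming ten items rated 1-5 — the actual SCS items and scoring procedure are defined in the paper.

```python
def scs_like_score(ratings):
    # Hypothetical SUS-style aggregation: ten responses on a 1-5 Likert
    # scale, normalized to [0, 1]; not the paper's exact scoring rule.
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    return sum(ratings) / (5 * len(ratings))

print(scs_like_score([4, 5, 3, 4, 5, 4, 4, 3, 5, 4]))  # 0.82
```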
Prediction of Highway Lane Changes Based on Prototype Trajectories
Title | Prediction of Highway Lane Changes Based on Prototype Trajectories |
Authors | David Augustin, Marius Hofmann, Ulrich Konigorski |
Abstract | The vision of automated driving is to increase both road safety and efficiency while offering passengers a convenient travel experience. This requires that autonomous systems correctly estimate the current traffic scene and its likely evolution. In highway scenarios, early recognition of cut-in maneuvers is essential for risk-aware maneuver planning. In this paper, a statistical approach is proposed that advantageously utilizes a set of prototypical lane change trajectories to realize both early maneuver detection and uncertainty-aware trajectory prediction for traffic participants. Prototype trajectories are generated from real traffic data via Agglomerative Hierarchical Clustering. During clustering, the alignment of the cluster prototypes to each other is optimized, and the cohesion of the resulting prototype is limited when two clusters merge. In the prediction stage, the similarity of observed vehicle motion to typical lane change patterns in the database is evaluated to construct a set of significant features for maneuver classification via Boosted Decision Trees. The future trajectory is predicted by combining typical lane change realizations in a mixture model. B-spline-based trajectory adaptations guarantee continuity during the transition from actually observed to predicted vehicle states. Quantitative evaluation results demonstrate the proposed concept’s improved performance for both maneuver and trajectory prediction compared to a previously implemented reference approach. |
Tasks | Trajectory Prediction |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11208v1 |
PDF | https://arxiv.org/pdf/1907.11208v1.pdf |
PWC | https://paperswithcode.com/paper/prediction-of-highway-lane-changes-based-on |
Repo | |
Framework | |
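A minimal SciPy sketch of the prototype-generation step — agglomerative hierarchical clustering of trajectories followed by per-cluster averaging. The linkage choice, feature representation, and cluster count are assumptions; the paper additionally optimizes prototype alignment and limits cluster cohesion during merging.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy stand-in: each trajectory is a fixed-length series of lateral offsets,
# flattened into a feature vector.
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(40, 25))  # 40 trajectories, 25 time steps

Z = linkage(trajectories, method="average")      # agglomerative clustering
labels = fcluster(Z, t=5, criterion="maxclust")  # cut into 5 clusters

# A simple prototype per cluster: the element-wise mean trajectory.
prototypes = [trajectories[labels == k].mean(axis=0) for k in np.unique(labels)]
```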
Fast and Accurate Convolutional Object Detectors for Real-time Embedded Platforms
Title | Fast and Accurate Convolutional Object Detectors for Real-time Embedded Platforms |
Authors | Min-Kook Choi, Jaehyung Park, Heechul Jung, Jinhee Lee, Soo-Heang Eo |
Abstract | With improvements in object detection networks, several variants have achieved impressive performance. However, the performance evaluation of most models has focused on detection accuracy, and performance verification is mostly based on high-end GPU hardware. In this paper, we propose real-time object detectors that guarantee balanced performance for real-time systems on embedded platforms. The proposed model utilizes the basic head structure of the RefineDet model, a variant of the single shot object detector (SSD). To ensure real-time performance, CNN models with relatively shallow layers or fewer parameters are used as the backbone structure. In addition to the basic VGGNet and ResNet structures, various backbone structures such as MobileNet, Xception, ResNeXt, Inception-SENet, and SE-ResNeXt are used for this purpose. Successful training of the object detection networks was achieved through an appropriate combination of intermediate layers. The accuracy of the proposed detectors was evaluated on the MS-COCO 2017 object detection dataset, and the inference speed was tested on NVIDIA Drive PX2 and Jetson Xavier boards to verify real-time performance in embedded systems. The experiments show that the proposed models achieve balanced performance in terms of accuracy and inference speed in embedded environments. In addition, unlike high-end GPUs, the use of embedded GPUs involves several additional concerns for efficient inference, which are identified in this work. The codes and models are publicly available on the web (link). |
Tasks | Object Detection |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10798v1 |
PDF | https://arxiv.org/pdf/1909.10798v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-accurate-convolutional-object |
Repo | |
Framework | |
Understanding the Limitations of Variational Mutual Information Estimators
Title | Understanding the Limitations of Variational Mutual Information Estimators |
Authors | Jiaming Song, Stefano Ermon |
Abstract | Variational approaches based on neural networks are showing promise for estimating mutual information (MI) between high-dimensional variables. However, they can be difficult to use in practice due to poorly understood bias/variance tradeoffs. We theoretically show that, under some conditions, estimators such as MINE exhibit variance that can grow exponentially with the true amount of underlying MI. We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results demonstrate that our proposed estimator exhibits improved bias-variance trade-offs on standard benchmark tasks. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06222v2 |
PDF | https://arxiv.org/pdf/1910.06222v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-the-limitations-of-variational |
Repo | |
Framework | |
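For context, a minimal sketch of the Donsker-Varadhan lower bound that MINE optimizes; the log E_q[e^T] term is the one whose Monte Carlo estimate drives the exponential variance the paper analyzes.

```python
import math
import torch

def dv_bound(t_joint, t_marginal):
    # Donsker-Varadhan bound: I(X;Y) >= E_p(x,y)[T] - log E_p(x)p(y)[exp(T)].
    # t_joint / t_marginal are critic outputs on paired and shuffled samples.
    n = t_marginal.shape[0]
    return t_joint.mean() - (torch.logsumexp(t_marginal, dim=0) - math.log(n))
```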
When Explanations Lie: Why Many Modified BP Attributions Fail
Title | When Explanations Lie: Why Many Modified BP Attributions Fail |
Authors | Leon Sixt, Maximilian Granz, Tim Landgraf |
Abstract | Attribution methods aim to explain a neural network’s prediction by highlighting the most relevant image areas. A popular approach is to backpropagate (BP) a custom relevance score using modified rules, rather than the gradient. We analyze an extensive set of modified BP methods: Deep Taylor Decomposition, Layer-wise Relevance Propagation, Excitation BP, PatternAttribution, DeepLIFT, Deconv, RectGrad, and Guided BP. We find empirically that the explanations of all mentioned methods, except for DeepLIFT, are independent of the parameters of later layers. We provide theoretical insights for this surprising behavior and also analyze why DeepLIFT does not suffer from this limitation. Empirically, we measure how information of later layers is ignored by using our new metric, cosine similarity convergence (CSC). The paper provides a framework to assess the faithfulness of new and existing modified BP methods theoretically and empirically. |
Tasks | |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09818v4 |
PDF | https://arxiv.org/pdf/1912.09818v4.pdf |
PWC | https://paperswithcode.com/paper/when-explanations-lie-why-modified-bp |
Repo | |
Framework | |
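A rough sketch of the idea behind the CSC metric (the exact measurement protocol is in the paper): compare attributions computed with the trained later layers against attributions computed after randomizing those layers; cosine similarity near 1 indicates the method effectively ignores the later layers' parameters.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# saliency_trained / saliency_randomized would come from running the same
# attribution method on the original and later-layer-randomized network.
saliency_trained = np.random.rand(224, 224)
saliency_randomized = np.random.rand(224, 224)
print(cosine_similarity(saliency_trained, saliency_randomized))
```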
Span Model for Open Information Extraction on Accurate Corpus
Title | Span Model for Open Information Extraction on Accurate Corpus |
Authors | Junlang Zhan, Hai Zhao |
Abstract | Open information extraction (Open IE) is a challenging task, especially due to its brittle data basis. Most Open IE systems have to be trained on automatically built corpora and evaluated on inaccurate test sets. In this work, we first alleviate this difficulty on both the training and test sides. For the former, we propose an improved model design to more fully exploit the training dataset. For the latter, we present an accurately re-annotated benchmark test set (Re-OIE6) based on a series of linguistic observations and analyses. We then introduce a span model for n-ary Open IE in place of the previously adopted sequence labeling formulation. Our newly introduced model achieves new state-of-the-art performance on both benchmark evaluation datasets. |
Tasks | Open Information Extraction |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1901.10879v6 |
PDF | https://arxiv.org/pdf/1901.10879v6.pdf |
PWC | https://paperswithcode.com/paper/span-based-open-information-extraction |
Repo | |
Framework | |
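To illustrate the difference from sequence labeling, a tiny sketch of span enumeration — a span model scores candidate spans directly rather than tagging tokens one by one (the span length limit and scoring step are assumptions for illustration):

```python
def enumerate_spans(tokens, max_len=4):
    # All candidate (start, end) spans up to max_len tokens; each span would
    # then be scored as a predicate or argument by the model.
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

print(enumerate_spans(["Obama", "was", "born", "in", "Hawaii"], max_len=2))
```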
Stochastic trajectory prediction with social graph network
Title | Stochastic trajectory prediction with social graph network |
Authors | Lidan Zhang, Qi She, Ping Guo |
Abstract | Pedestrian trajectory prediction is a challenging task because of the complexity of real-world human social behaviors and the uncertainty of future motion. For the first issue, existing methods adopt a fully connected topology for modeling social behaviors, ignoring non-symmetric pairwise relationships. To effectively capture the social behaviors of relevant pedestrians, we utilize a directed social graph that is dynamically constructed from current locations and movement directions. Based on the social graph, we further propose a network that collects social effects and accumulates them with individual representations, in order to generate destination-oriented and social-aware representations. For the second issue, instead of modeling the uncertainty of the entire future as a whole, we utilize a temporal stochastic method to sequentially learn a prior model of uncertainty during social interactions. The prediction for the next step is then generated by sampling from the prior model and progressively decoding with hierarchical LSTMs. Experimental results on two public datasets show the effectiveness of our method, especially when predicting trajectories in very crowded scenes. |
Tasks | Trajectory Prediction |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10233v1 |
PDF | https://arxiv.org/pdf/1907.10233v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-trajectory-prediction-with-social |
Repo | |
Framework | |
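A minimal NumPy sketch of constructing a directed social graph from locations and movement directions; the radius and field-of-view rule are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def directed_social_graph(pos, vel, radius=3.0, fov_deg=120.0):
    # Pedestrian j influences pedestrian i if j is within `radius` of i and
    # inside i's field of view along i's movement direction. The relation is
    # non-symmetric, so the adjacency matrix is directed.
    n = len(pos)
    adj = np.zeros((n, n), dtype=bool)
    cos_half_fov = np.cos(np.deg2rad(fov_deg) / 2)
    for i in range(n):
        heading = vel[i] / (np.linalg.norm(vel[i]) + 1e-12)
        for j in range(n):
            if i == j:
                continue
            offset = pos[j] - pos[i]
            dist = np.linalg.norm(offset)
            if dist < radius and offset @ heading > cos_half_fov * dist:
                adj[i, j] = True
    return adj

pos = np.array([[0.0, 0.0], [1.0, 0.5], [10.0, 0.0]])
vel = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
print(directed_social_graph(pos, vel))
```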
IFQ-Net: Integrated Fixed-point Quantization Networks for Embedded Vision
Title | IFQ-Net: Integrated Fixed-point Quantization Networks for Embedded Vision |
Authors | Hongxing Gao, Wei Tao, Dongchao Wen, Tse-Wei Chen, Kinya Osa, Masami Kato |
Abstract | Deploying deep models on embedded devices has been a challenging problem since the great success of deep-learning-based networks. Fixed-point networks, which represent their data with low-bit fixed-point values and thus yield remarkable savings in memory usage, are generally preferred. Even though current fixed-point networks employ relatively few bits (e.g., 8 bits), the memory saving is far from sufficient for embedded devices. On the other hand, quantized deep networks, for example XNOR-Net and HWGQ-Net, quantize the data into 1 or 2 bits, resulting in more significant memory savings, but still contain lots of floating-point data. In this paper, we propose a fixed-point network for embedded vision tasks by converting the floating-point data in a quantized network into fixed-point. Furthermore, to overcome the data loss caused by the conversion, we propose to compose floating-point data operations across multiple layers (e.g., convolution, batch normalization and quantization layers) and convert them into fixed-point. We name the fixed-point networks obtained through such integrated conversion Integrated Fixed-point Quantization Networks (IFQ-Net). We demonstrate that our IFQ-Net gives 2.16x and 18x savings in model size and runtime feature map memory, respectively, with similar accuracy on ImageNet. Furthermore, based on YOLOv2, we design the IFQ-Tinier-YOLO face detector, a fixed-point network with a 256x reduction in model size (246 KB) compared to Tiny-YOLO. We illustrate the promising performance of our face detector in terms of detection rate on the Face Detection Data Set and Benchmark (FDDB) and qualitative results on detecting small faces in the WIDER FACE dataset. |
Tasks | Face Detection, Quantization |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08076v1 |
PDF | https://arxiv.org/pdf/1911.08076v1.pdf |
PWC | https://paperswithcode.com/paper/ifq-net-integrated-fixed-point-quantization |
Repo | |
Framework | |
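A minimal sketch of the basic float-to-fixed-point conversion the paper builds on; the bit widths are illustrative, not the paper's settings, and the paper's key contribution — composing operations across layers before converting — is not shown here.

```python
import numpy as np

def to_fixed_point(x, frac_bits=8, total_bits=16):
    # Map floats to signed fixed-point integers with frac_bits fractional bits.
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_fixed_point(q, frac_bits=8):
    return q.astype(np.float32) / (1 << frac_bits)

w = np.array([0.5, -1.25, 3.14159])
print(from_fixed_point(to_fixed_point(w)))  # values rounded to 1/256 steps
```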
Scalable Block-Diagonal Locality-Constrained Projective Dictionary Learning
Title | Scalable Block-Diagonal Locality-Constrained Projective Dictionary Learning |
Authors | Zhao Zhang, Weiming Jiang, Zheng Zhang, Sheng Li, Guangcan Liu, Jie Qin |
Abstract | We propose a novel structured discriminative block-diagonal dictionary learning method, referred to as scalable Locality-Constrained Projective Dictionary Learning (LC-PDL), for efficient representation and classification. To improve scalability by saving both training and testing time, LC-PDL learns a structured discriminative dictionary and a block-diagonal representation without using costly l0/l1-norms. It also avoids the extra time-consuming sparse reconstruction that many existing models require when coding new samples with the trained dictionary. More importantly, LC-PDL avoids using the complementary data matrix to learn the sub-dictionary over each class. To enhance performance, we incorporate a locality constraint on atoms into the dictionary learning procedure to preserve local information and obtain the codes of samples over each class separately. A block-diagonal discriminative approximation term is also derived to learn a discriminative projection that bridges data with their codes by extracting block-diagonal features from the data, which ensures that the approximation coefficients clearly associate with their label information. A robust multiclass classifier is then trained on the extracted block-diagonal codes for accurate label prediction. Experimental results verify the effectiveness of our algorithm. |
Tasks | Dictionary Learning |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10568v1 |
PDF | https://arxiv.org/pdf/1905.10568v1.pdf |
PWC | https://paperswithcode.com/paper/scalable-block-diagonal-locality-constrained |
Repo | |
Framework | |
It’s easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets
Title | It’s easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets |
Authors | Subhashini Venugopalan, Arunachalam Narayanaswamy, Samuel Yang, Anton Gerashcenko, Scott Lipnick, Nina Makhortova, James Hawrot, Christine Marques, Joao Pereira, Michael Brenner, Lee Rubin, Brian Wainger, Marc Berndl |
Abstract | Confounding variables are a well-known source of nuisance in biomedical studies. They present an even greater challenge when combined with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well, but under careful examination hidden confounders and biases were revealed. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.07661v1 |
PDF | https://arxiv.org/pdf/1912.07661v1.pdf |
PWC | https://paperswithcode.com/paper/its-easy-to-fool-yourself-case-studies-on |
Repo | |
Framework | |
Exploiting Human Social Cognition for the Detection of Fake and Fraudulent Faces via Memory Networks
Title | Exploiting Human Social Cognition for the Detection of Fake and Fraudulent Faces via Memory Networks |
Authors | Tharindu Fernando, Clinton Fookes, Simon Denman, Sridha Sridharan |
Abstract | Advances in computer vision have brought us to the point where realistic fake content can be synthesised. Such approaches are seen as a source of disinformation and mistrust, and raise serious concerns for governments around the world. Convolutional Neural Networks (CNNs) demonstrate encouraging results when detecting fake images that arise from the specific type of manipulation they are trained on. However, this success has not transitioned to unseen manipulation types, leaving a significant gap in the line of defense. We propose a Hierarchical Memory Network (HMN) architecture, which successfully detects faked faces by utilising knowledge stored in neural memories as well as visual cues to reason about the perceived face and anticipate its future semantic embeddings. This yields a generalisable face tampering detection framework. Experimental results demonstrate that the proposed approach achieves superior performance for fake and fraudulent face detection compared to the state of the art. |
Tasks | Face Detection |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.07844v1 |
PDF | https://arxiv.org/pdf/1911.07844v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-human-social-cognition-for-the |
Repo | |
Framework | |
DupNet: Towards Very Tiny Quantized CNN with Improved Accuracy for Face Detection
Title | DupNet: Towards Very Tiny Quantized CNN with Improved Accuracy for Face Detection |
Authors | Hongxing Gao, Wei Tao, Dongchao Wen, Junjie Liu, Tse-Wei Chen, Kinya Osa, Masami Kato |
Abstract | Deploying deep-learning-based face detectors on edge devices is a challenging task due to limited computation resources. Even though binarizing the weights of a very tiny network gives impressive compactness in model size (e.g., 240.9 KB for IFQ-Tinier-YOLO), it is not tiny enough to fit in embedded devices with strict memory constraints. In this paper, we propose DupNet, which consists of two parts. First, we employ weights with duplicated channels for the weight-intensive layers to reduce the model size. Second, for the quantization-sensitive layers whose quantization causes a notable accuracy drop, we duplicate their input feature maps, which allows us to use more weight channels for convolving more representative outputs. Based on that, we propose a very tiny face detector, DupNet-Tinier-YOLO, which is 6.5x smaller in model size and 42.0% less complex in computation than IFQ-Tinier-YOLO, while achieving a 2.4% higher detection rate. Compared with the full-precision Tiny-YOLO, our DupNet-Tinier-YOLO gives 1,694.2x and 389.9x savings in model size and computational complexity, respectively, with only a 4.0% drop in detection rate (0.880 vs. 0.920). Moreover, our DupNet-Tinier-YOLO is only 36.9 KB, which, to the best of our knowledge, is the tiniest deep face detector. |
Tasks | Face Detection, Quantization |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05341v1 |
PDF | https://arxiv.org/pdf/1911.05341v1.pdf |
PWC | https://paperswithcode.com/paper/dupnet-towards-very-tiny-quantized-cnn-with |
Repo | |
Framework | |
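A minimal sketch of the weight-duplication idea from the abstract above; the exact duplication scheme is an assumption — the point is only that the unique filter copies are all that needs to be stored.

```python
import torch

def duplicate_channels(weight, factor=2):
    # Keep C_out // factor unique filters and tile them to rebuild the full
    # (C_out, C_in, kH, kW) convolution weight tensor.
    unique = weight[: weight.shape[0] // factor]
    return unique.repeat(factor, 1, 1, 1)

w = torch.randn(32, 16, 3, 3)
print(duplicate_channels(w).shape)  # torch.Size([32, 16, 3, 3])
```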