October 16, 2019


Paper Group ANR 1103

Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd. Video Compression through Image Interpolation. Convolutional Sparse Coding for High Dynamic Range Imaging. Distribution Discrepancy Maximization for Image Privacy Preserving. Can We Assess Mental Health through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation. Evo …

Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd


Title	Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd
Authors	Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li
Abstract	Pedestrian detection in crowded scenes is a challenging problem since the pedestrians often gather together and occlude each other. In this paper, we propose a new occlusion-aware R-CNN (OR-CNN) to improve the detection accuracy in the crowd. Specifically, we design a new aggregation loss to enforce proposals to be close and locate compactly to the corresponding objects. Meanwhile, we use a new part occlusion-aware region of interest (PORoI) pooling unit to replace the RoI pooling layer in order to integrate the prior structure information of human body with visibility prediction into the network to handle occlusion. Our detector is trained in an end-to-end fashion, which achieves state-of-the-art results on three pedestrian detection datasets, i.e., CityPersons, ETH, and INRIA, and performs on par with the state of the art on Caltech.
Tasks	Pedestrian Detection
Published	2018-07-23
URL	http://arxiv.org/abs/1807.08407v1
PDF	http://arxiv.org/pdf/1807.08407v1.pdf
PWC	https://paperswithcode.com/paper/occlusion-aware-r-cnn-detecting-pedestrians
Repo
Framework

Video Compression through Image Interpolation


Title	Video Compression through Image Interpolation
Authors	Chao-Yuan Wu, Nayan Singhal, Philipp Krähenbühl
Abstract	An ever increasing amount of our digital communication, media consumption, and content creation revolves around videos. We share, watch, and archive many aspects of our lives through them, all of which are powered by strong video compression. Traditional video compression is laboriously hand designed and hand optimized. This paper presents an alternative in an end-to-end deep learning codec. Our codec builds on one simple idea: Video compression is repeated image interpolation. It thus benefits from recent advances in deep image interpolation and generation. Our deep video codec outperforms today’s prevailing codecs, such as H.261, MPEG-4 Part 2, and performs on par with H.264.
Tasks	Video Compression
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06919v1
PDF	http://arxiv.org/pdf/1804.06919v1.pdf
PWC	https://paperswithcode.com/paper/video-compression-through-image-interpolation
Repo
Framework

Convolutional Sparse Coding for High Dynamic Range Imaging


Title	Convolutional Sparse Coding for High Dynamic Range Imaging
Authors	Ana Serrano, Felix Heide, Diego Gutierrez, Gordon Wetzstein, Belen Masia
Abstract	Current HDR acquisition techniques are based on either (i) fusing multibracketed, low dynamic range (LDR) images, (ii) modifying existing hardware and capturing different exposures simultaneously with multiple sensors, or (iii) reconstructing a single image with spatially-varying pixel exposures. In this paper, we propose a novel algorithm to recover high-quality HDRI images from a single, coded exposure. The proposed reconstruction method builds on recently-introduced ideas of convolutional sparse coding (CSC); this paper demonstrates how to make CSC practical for HDR imaging. We demonstrate that the proposed algorithm achieves higher-quality reconstructions than alternative methods, we evaluate optical coding schemes, analyze algorithmic parameters, and build a prototype coded HDR camera that demonstrates the utility of convolutional sparse HDRI coding with a custom hardware platform.
Tasks
Published	2018-06-13
URL	http://arxiv.org/abs/1806.04942v1
PDF	http://arxiv.org/pdf/1806.04942v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-sparse-coding-for-high-dynamic
Repo
Framework

Distribution Discrepancy Maximization for Image Privacy Preserving


Title	Distribution Discrepancy Maximization for Image Privacy Preserving
Authors	Sen Liu, Jianxin Lin, Zhibo Chen
Abstract	With the rapid increase in online photo sharing activities, image obfuscation algorithms become particularly important for protecting the sensitive information in the shared photos. However, existing image obfuscation methods based on hand-crafted principles are challenged by the dramatic development of deep learning techniques. To address this problem, we propose to maximize the distribution discrepancy between the original image domain and the encrypted image domain. Accordingly, we introduce a collaborative training scheme: a discriminator $D$ is trained to discriminate the reconstructed image from the encrypted image, and an encryption model $G_e$ is required to generate these two kinds of images to maximize the recognition rate of $D$, leading to the same training objective for both $D$ and $G_e$. We theoretically prove that such a training scheme maximizes two distributions’ discrepancy. Compared with commonly-used image obfuscation methods, our model can produce satisfactory defense against the attack of deep recognition models indicated by significant accuracy decreases on FaceScrub, Casia-WebFace and LFW datasets.
Tasks
Published	2018-11-18
URL	http://arxiv.org/abs/1811.07335v1
PDF	http://arxiv.org/pdf/1811.07335v1.pdf
PWC	https://paperswithcode.com/paper/distribution-discrepancy-maximization-for
Repo
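Framework

Can We Assess Mental Health through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation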


Title	Can We Assess Mental Health through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation
Authors	Adam Tsakalidis, Maria Liakata, Theo Damoulas, Alexandra I. Cristea
Abstract	Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results being reported across many studies. Such approaches have the potential to revolutionise mental health assessment, if their development and evaluation follow a real world deployment setting. In this work we take a closer look at state-of-the-art approaches, using different mental health datasets and indicators, different feature sources and multiple simulations, in order to assess their ability to generalise. We demonstrate that under a pragmatic evaluation framework, none of the approaches deliver or even approach the reported performances. In fact, we show that current state-of-the-art approaches can barely outperform the most naïve baselines in the real-world setting, posing serious questions not only about their deployment ability, but also about the contribution of the derived features for the mental health assessment task and how to make better use of such data in the future.
Tasks
Published	2018-07-19
URL	http://arxiv.org/abs/1807.07351v1
PDF	http://arxiv.org/pdf/1807.07351v1.pdf
PWC	https://paperswithcode.com/paper/can-we-assess-mental-health-through-social
Repo
Framework

Evolution of Images with Diversity and Constraints Using a Generator Network


Title	Evolution of Images with Diversity and Constraints Using a Generator Network
Authors	Aneta Neumann, Christo Pyromallis, Bradley Alexander
Abstract	Evolutionary search has been extensively used to generate artistic images. Raw images have high dimensionality which makes a direct search for an image challenging. In previous work this problem has been addressed by using compact symbolic encodings or by constraining images with priors. Recent developments in deep learning have enabled a generation of compelling artistic images using generative networks that encode images with lower-dimensional latent spaces. To date this work has focused on the generation of images concordant with one or more classes and transfer of artistic styles. There is currently no work which uses search in this latent space to generate images scoring high or low aesthetic measures. In this paper we use evolutionary methods to search for images in two datasets, faces and butterflies, and demonstrate the effect of optimising aesthetic feature scores in one or two dimensions. The work gives a preliminary indication of which feature measures promote the most interesting images and how some of these measures interact.
Tasks
Published	2018-02-15
URL	http://arxiv.org/abs/1802.05480v1
PDF	http://arxiv.org/pdf/1802.05480v1.pdf
PWC	https://paperswithcode.com/paper/evolution-of-images-with-diversity-and
Repo
Framework

Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks


Title	Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks
Authors	Atsushi Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa
Abstract	In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation. However, large DNN models are needed to achieve state-of-the-art performance, exceeding the capabilities of edge devices. Model reduction is thus needed for practical use. In this paper, we point out that deep learning automatically induces group sparsity of weights, in which all weights connected to an output channel (node) are zero, when training DNNs under the following three conditions: (1) rectified-linear-unit (ReLU) activations, (2) an $L_2$-regularized objective function, and (3) the Adam optimizer. Next, we analyze this behavior both theoretically and experimentally, and propose a simple model reduction method: eliminate the zero weights after training the DNN. In experiments on MNIST and CIFAR-10 datasets, we demonstrate the sparsity with various training setups. Finally, we show that our method can efficiently reduce the model size and performs well relative to methods that use a sparsity-inducing regularizer.
Tasks	Machine Translation, Speech Recognition
Published	2018-12-19
URL	http://arxiv.org/abs/1812.08119v1
PDF	http://arxiv.org/pdf/1812.08119v1.pdf
PWC	https://paperswithcode.com/paper/adam-induces-implicit-weight-sparsity-in
Repo
Framework
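
The reduction step described in the abstract above (dropping output channels whose weights are all zero) can be sketched for a single fully connected ReLU layer. This is only an illustrative sketch, not the authors' code: the function names, the list-of-rows weight layout, and the extra condition that the bias be non-positive (so the removed unit contributes exactly zero after ReLU) are assumptions made here.

```python
def prune_dead_channels(weights, biases, next_weights, tol=0.0):
    """Drop hidden units whose incoming weights are all (near) zero.

    weights:      one row of incoming weights per hidden unit
    biases:       one bias per hidden unit
    next_weights: rows of the following layer, one column per hidden unit
    Layout and names are hypothetical, chosen for illustration only.
    """
    keep = [
        i for i, row in enumerate(weights)
        # A unit is treated as dead when every incoming weight is within tol
        # of zero and its bias is non-positive, so ReLU outputs exactly 0.
        if not (all(abs(w) <= tol for w in row) and biases[i] <= 0.0)
    ]
    pruned_w = [weights[i] for i in keep]
    pruned_b = [biases[i] for i in keep]
    # Remove the matching input columns from the next layer.
    pruned_next = [[row[i] for i in keep] for row in next_weights]
    return pruned_w, pruned_b, pruned_next, keep


def forward(x, weights, biases, next_weights):
    """Two-layer forward pass: ReLU hidden layer followed by a linear layer."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(weights, biases)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in next_weights]
```

With the default `tol=0.0` the pruned network computes the same outputs as the original, because each removed unit always emitted zero; a positive `tol` would make the pruning approximate.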

Ensemble of Multi-sized FCNs to Improve White Matter Lesion Segmentation


Title	Ensemble of Multi-sized FCNs to Improve White Matter Lesion Segmentation
Authors	Zhewei Wang, Charles D. Smith, Jundong Liu
Abstract	In this paper, we develop a two-stage neural network solution for the challenging task of white-matter lesion segmentation. To cope with the vast variability in lesion sizes, we sample brain MR scans with patches at three different dimensions and feed them into separate fully convolutional neural networks (FCNs). In the second stage, we process large and small lesions separately, and use ensemble-nets to combine the segmentation results generated from the FCNs. A novel activation function is adopted in the ensemble-nets to improve the segmentation accuracy measured by Dice Similarity Coefficient. Experiments on MICCAI 2017 White Matter Hyperintensities (WMH) Segmentation Challenge data demonstrate that our two-stage-multi-sized FCN approach, as well as the new activation function, are effective in capturing white-matter lesions in MR images.
Tasks	Lesion Segmentation
Published	2018-07-24
URL	http://arxiv.org/abs/1807.09298v1
PDF	http://arxiv.org/pdf/1807.09298v1.pdf
PWC	https://paperswithcode.com/paper/ensemble-of-multi-sized-fcns-to-improve-white
Repo
Framework
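
The abstract above reports accuracy with the Dice Similarity Coefficient. For readers unfamiliar with the metric, here is a minimal sketch for flat binary masks using the standard definition 2|P ∩ T| / (|P| + |T|). The function name and the convention for two empty masks are assumptions made here, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A 3D segmentation volume can be flattened to a single list before calling the function, since the metric ignores spatial layout.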

Deep Neural Network Augmentation: Generating Faces for Affect Analysis


Title	Deep Neural Network Augmentation: Generating Faces for Affect Analysis
Authors	Dimitrios Kollias, Shiyang Cheng, Evangelos Ververas, Irene Kotsia, Stefanos Zafeiriou
Abstract	This paper presents a novel approach for synthesizing facial affect; either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise), or in terms of valence (i.e., how positive or negative is an emotion) and arousal (i.e., power of the emotion activation). The proposed approach accepts the following inputs: i) a neutral 2D image of a person; ii) a basic facial expression or a pair of valence-arousal (VA) emotional state descriptors to be generated, or a path of affect in the 2D VA Space to be generated as an image sequence. In order to synthesize affect in terms of VA, for this person, $600,000$ frames from the 4DFAB database were annotated. The affect synthesis is implemented by fitting a 3D Morphable Model on the neutral image, then deforming the reconstructed face and adding the inputted affect, and blending the new face with the given affect into the original image. Qualitative experiments illustrate the generation of realistic images, when the neutral image is sampled from thirteen well known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet, RAF-DB; comparisons with Generative Adversarial Networks (GANs) show the higher quality achieved by the proposed approach. Then, quantitative experiments are conducted, in which the synthesized images are used for data augmentation in training Deep Neural Networks to perform affect recognition over all databases; greatly improved performances are achieved when compared with state-of-the-art methods, as well as with GAN-based data augmentation, in all cases.
Tasks	Data Augmentation, Face Generation
Published	2018-11-12
URL	https://arxiv.org/abs/1811.05027v2
PDF	https://arxiv.org/pdf/1811.05027v2.pdf
PWC	https://paperswithcode.com/paper/generating-faces-for-affect-analysis
Repo
Framework

Active covariance estimation by random sub-sampling of variables


Title	Active covariance estimation by random sub-sampling of variables
Authors	Eduardo Pavez, Antonio Ortega
Abstract	We study covariance matrix estimation for the case of partially observed random vectors, where different samples contain different subsets of vector coordinates. Each observation is the product of the variable of interest with a $0-1$ Bernoulli random variable. We analyze an unbiased covariance estimator under this model, and derive an error bound that reveals relations between the sub-sampling probabilities and the entries of the covariance matrix. We apply our analysis in an active learning framework, where the expected number of observed variables is small compared to the dimension of the vector of interest, and propose a design of optimal sub-sampling probabilities and an active covariance matrix estimation algorithm.
Tasks	Active Learning
Published	2018-04-04
URL	http://arxiv.org/abs/1804.01620v1
PDF	http://arxiv.org/pdf/1804.01620v1.pdf
PWC	https://paperswithcode.com/paper/active-covariance-estimation-by-random-sub
Repo
Framework
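
The observation model in the abstract above (each coordinate multiplied by an independent 0-1 Bernoulli variable) admits a simple inverse-probability-weighted estimator, sketched below. This is one standard construction and may differ from the estimator analyzed in the paper. It assumes zero-mean data, masks that are independent across coordinates, and strictly positive sampling probabilities; the function name is hypothetical.

```python
def masked_covariance(observations, probs):
    """Estimate a covariance matrix from Bernoulli-masked samples.

    observations[k][i] = x[k][i] * b[k][i], with b[k][i] ~ Bernoulli(probs[i]).
    Assumes zero-mean x and masks independent across coordinates.
    """
    n = len(observations)
    d = len(probs)
    est = [[0.0] * d for _ in range(d)]
    # Accumulate the raw second moments of the masked observations.
    for y in observations:
        for i in range(d):
            for j in range(d):
                est[i][j] += y[i] * y[j]
    for i in range(d):
        for j in range(d):
            # E[b_i * b_j] is p_i on the diagonal (b_i^2 = b_i) and
            # p_i * p_j off the diagonal, so dividing by it removes the bias.
            scale = probs[i] if i == j else probs[i] * probs[j]
            est[i][j] /= n * scale
    return est
```

When every probability equals 1 the estimator reduces to the ordinary sample second-moment matrix; smaller probabilities inflate the variance of the corresponding entries, which is the trade-off an active sampling design would need to balance.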

TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification


Title	TwoWingOS: A Two-Wing Optimization Strategy for Evidential Claim Verification
Authors	Wenpeng Yin, Dan Roth
Abstract	Determining whether a given claim is supported by evidence is a fundamental NLP problem that is best modeled as Textual Entailment. However, given a large collection of text, finding evidence that could support or refute a given claim is a challenge in itself, amplified by the fact that different evidence might be needed to support or refute a claim. Nevertheless, most prior work decouples evidence identification from determining the truth value of the claim given the evidence. We propose to consider these two aspects jointly. We develop TwoWingOS (two-wing optimization strategy), a system that, while identifying appropriate evidence for a claim, also determines whether or not the claim is supported by the evidence. Given the claim, TwoWingOS attempts to identify a subset of the evidence candidates; given the predicted evidence, it then attempts to determine the truth value of the corresponding claim. We treat this challenge as coupled optimization problems, training a joint model for it. TwoWingOS offers two advantages: (i) Unlike pipeline systems, it facilitates flexible-size evidence set, and (ii) Joint training improves both the claim entailment and the evidence identification. Experiments on a benchmark dataset show state-of-the-art performance. Code: https://github.com/yinwenpeng/FEVER
Tasks	Natural Language Inference
Published	2018-08-10
URL	http://arxiv.org/abs/1808.03465v2
PDF	http://arxiv.org/pdf/1808.03465v2.pdf
PWC	https://paperswithcode.com/paper/twowingos-a-two-wing-optimization-strategy
Repo
Framework

Object Detection with Deep Learning: A Review


Title	Object Detection with Deep Learning: A Review
Authors	Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu
Abstract	Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.
Tasks	Face Detection, Object Detection, Pedestrian Detection, Salient Object Detection
Published	2018-07-15
URL	http://arxiv.org/abs/1807.05511v2
PDF	http://arxiv.org/pdf/1807.05511v2.pdf
PWC	https://paperswithcode.com/paper/object-detection-with-deep-learning-a-review
Repo
Framework

Deep Lip Reading: a comparison of models and an online application


Title	Deep Lip Reading: a comparison of models and an online application
Authors	Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Abstract	The goal of this paper is to develop state-of-the-art models for lip reading – visual speech recognition. We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully convolutional model; and (iii) the recently proposed transformer model. The recurrent and fully convolutional models are trained with a Connectionist Temporal Classification loss and use an explicit language model for decoding; the transformer is a sequence-to-sequence model. Our best performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent. As a further contribution we investigate the fully convolutional model when used for online (real time) lip reading of continuous speech, and show that it achieves high performance with low latency.
Tasks	Language Modelling, Speech Recognition, Visual Speech Recognition
Published	2018-06-15
URL	http://arxiv.org/abs/1806.06053v1
PDF	http://arxiv.org/pdf/1806.06053v1.pdf
PWC	https://paperswithcode.com/paper/deep-lip-reading-a-comparison-of-models-and
Repo
Framework
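
The abstract above compares models by word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch: the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. Whitespace tokenization and the function name are assumptions made here; the benchmark's own scoring scripts may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.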

Is the Pedestrian going to Cross? Answering by 2D Pose Estimation


Title	Is the Pedestrian going to Cross? Answering by 2D Pose Estimation
Authors	Zhijie Fang, Antonio M. López
Abstract	Our recent work suggests that, thanks to today's powerful CNNs, image-based 2D pose estimation is a promising cue for determining pedestrian intentions such as crossing the road in the path of the ego-vehicle, stopping before entering the road, and starting to walk or bending towards the road. This statement is based on the results obtained on non-naturalistic sequences (Daimler dataset), i.e. in sequences choreographed specifically for performing the study. Fortunately, a new publicly available dataset (JAAD) has appeared recently to allow developing methods for detecting pedestrian intentions in naturalistic driving conditions; more specifically, for addressing the relevant question: is the pedestrian going to cross? Accordingly, in this paper we use JAAD to assess the usefulness of 2D pose estimation for answering such a question. We combine CNN-based pedestrian detection, tracking and pose estimation to predict the crossing action from monocular images. Overall, the proposed pipeline provides new state-of-the-art results.
Tasks	Pedestrian Detection, Pose Estimation
Published	2018-07-15
URL	https://arxiv.org/abs/1807.10580v1
PDF	https://arxiv.org/pdf/1807.10580v1.pdf
PWC	https://paperswithcode.com/paper/is-the-pedestrian-going-to-cross-answering-by
Repo
Framework

On dynamic ensemble selection and data preprocessing for multi-class imbalance learning


Title	On dynamic ensemble selection and data preprocessing for multi-class imbalance learning
Authors	Rafael M. O. Cruz, Robert Sabourin, George D. C. Cavalcanti
Abstract	Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class which has a large number of instances. Ensembles of classifiers have been reported to yield promising results. However, the majority of ensemble methods applied to imbalanced learning are static ones. Moreover, they only deal with binary imbalanced problems. Hence, this paper presents an empirical analysis of dynamic selection techniques and data preprocessing methods for dealing with multi-class imbalanced problems. We considered five variations of preprocessing methods and four dynamic selection methods. Our experiments conducted on 26 multi-class imbalanced problems show that the dynamic ensemble improves the F-measure and the G-mean as compared to the static ensemble. Moreover, data preprocessing plays an important role in such cases.
Tasks
Published	2018-03-11
URL	http://arxiv.org/abs/1803.03877v2
PDF	http://arxiv.org/pdf/1803.03877v2.pdf
PWC	https://paperswithcode.com/paper/on-dynamic-ensemble-selection-and-data
Repo
Framework
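
The abstract above evaluates with the F-measure and the G-mean. One common multi-class reading of these metrics is sketched below: a macro-averaged F-measure over classes and the geometric mean of per-class recalls. The paper may use different multi-class definitions, so treat both formulas and the function name as assumptions.

```python
import math


def macro_f_measure_and_g_mean(y_true, y_pred):
    """Return (macro F-measure, G-mean) over the classes present in y_true."""
    labels = sorted(set(y_true))
    recalls, f_scores = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        # tp + fn > 0 because class c occurs in y_true.
        recall = tp / (tp + fn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    f_measure = sum(f_scores) / len(labels)
    # Geometric mean of recalls: a single missed class drives it to zero.
    g_mean = math.prod(recalls) ** (1.0 / len(labels))
    return f_measure, g_mean
```

Both metrics weight every class equally regardless of its size, which is why they are preferred over plain accuracy when a majority class dominates the data.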