April 3, 2020


Paper Group AWR 4


Masked Face Recognition Dataset and Application


Title	Masked Face Recognition Dataset and Application
Authors	Zhongyuan Wang, Guangcheng Wang, Baojin Huang, Zhangyang Xiong, Qi Hong, Hao Wu, Peng Yi, Kui Jiang, Nanxi Wang, Yingjiao Pei, Heling Chen, Yu Miao, Zhibing Huang, Jinbi Liang
Abstract	In order to effectively prevent the spread of the COVID-19 virus, almost everyone wears a mask during the coronavirus epidemic. This almost makes conventional facial recognition technology ineffective in many cases, such as community access control, face access control, facial attendance, facial security checks at train stations, etc. Therefore, it is very urgent to improve the recognition performance of the existing face recognition technology on the masked faces. Most current advanced face recognition approaches are designed based on deep learning, which depend on a large number of face samples. However, at present, there are no publicly available masked face recognition datasets. To this end, this work proposes three types of masked face datasets, including Masked Face Detection Dataset (MFDD), Real-world Masked Face Recognition Dataset (RMFRD) and Simulated Masked Face Recognition Dataset (SMFRD). Among them, to the best of our knowledge, RMFRD is currently the world’s largest real-world masked face dataset. These datasets are freely available to industry and academia, based on which various applications on masked faces can be developed. The multi-granularity masked face recognition model we developed achieves 95% accuracy, exceeding the results reported by the industry. Our datasets are available at: https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset.
Tasks	Face Detection, Face Recognition
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09093v2
PDF	https://arxiv.org/pdf/2003.09093v2.pdf
PWC	https://paperswithcode.com/paper/masked-face-recognition-dataset-and
Repo	https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset
Framework	none

PrivacyFL: A simulator for privacy-preserving and secure federated learning


Title	PrivacyFL: A simulator for privacy-preserving and secure federated learning
Authors	Vaikkunth Mugunthan, Anton Peraire-Bueno, Lalana Kagal
Abstract	Federated learning is a technique that enables distributed clients to collaboratively learn a shared machine learning model while keeping their training data localized. This reduces data privacy risks; however, privacy concerns still exist since it is possible to leak information about the training dataset from the trained model’s weights or parameters. Setting up a federated learning environment, especially with security and privacy guarantees, is a time-consuming process with numerous configurations and parameters that can be manipulated. In order to help clients ensure that collaboration is feasible and to check that it improves their model accuracy, a real-world simulator for privacy-preserving and secure federated learning is required. In this paper, we introduce PrivacyFL, which is an extensible, easily configurable and scalable simulator for federated learning environments. Its key features include latency simulation, robustness to client departure, support for both centralized and decentralized learning, and configurable privacy and security mechanisms based on differential privacy and secure multiparty computation. In this paper, we motivate our research, describe the architecture of the simulator and associated protocols, and discuss its evaluation in numerous scenarios that highlight its wide range of functionality and its advantages. Our paper addresses a significant real-world problem: checking the feasibility of participating in a federated learning environment under a variety of circumstances. It also has a strong practical impact because organizations such as hospitals, banks, and research institutes, which have large amounts of sensitive data and would like to collaborate, would greatly benefit from having a system that enables them to do so in a privacy-preserving and secure manner.
Tasks
Published	2020-02-19
URL	https://arxiv.org/abs/2002.08423v1
PDF	https://arxiv.org/pdf/2002.08423v1.pdf
PWC	https://paperswithcode.com/paper/privacyfl-a-simulator-for-privacy-preserving
Repo	https://github.com/vaikkunth/PrivacyFL
Framework	none
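
The abstract above mentions federated learning combined with mechanisms based on differential privacy. As a rough illustration only, and not PrivacyFL's actual API, the sketch below averages client weight vectors after each client perturbs its weights with Gaussian noise. The function names, the noise scale `sigma`, and the toy weights are hypothetical; a real differentially private setup would also clip updates and calibrate `sigma` to a privacy budget.

```python
import random


def add_gaussian_noise(weights, sigma, rng):
    """Return a copy of `weights` with zero-mean Gaussian noise added (hypothetical helper)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def federated_average(client_weights):
    """Element-wise mean of equally weighted client weight vectors."""
    n_clients = len(client_weights)
    return [sum(ws) / n_clients for ws in zip(*client_weights)]


def private_round(client_weights, sigma, seed=0):
    """One simulated round: each client adds noise locally, then the noisy vectors are averaged."""
    rng = random.Random(seed)
    noisy = [add_gaussian_noise(w, sigma, rng) for w in client_weights]
    return federated_average(noisy)


if __name__ == "__main__":
    clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy weight vectors, one per client
    print(private_round(clients, sigma=0.1))
```

Larger values of `sigma` hide individual contributions better but push the averaged model further from the noise-free mean, which is the privacy/utility trade-off a simulator of this kind lets clients explore.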

Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol


Title	Rethinking Online Action Detection in Untrimmed Videos: A Novel Online Evaluation Protocol
Authors	Marcos Baptista Rios, Roberto J. López-Sastre, Fabian Caba Heilbron, Jan van Gemert, F. Javier Acevedo-Rodríguez, S. Maldonado-Bascón
Abstract	The Online Action Detection (OAD) problem needs to be revisited. Unlike traditional offline action detection approaches, where the evaluation metrics are clear and well established, in the OAD setting we find very few works and no consensus on the evaluation protocols to be used. In this work we propose to rethink the OAD scenario, clearly defining the problem itself and the main characteristics that the models which are considered online must comply with. We also introduce a novel metric: the Instantaneous Accuracy (IA). This new metric exhibits an online nature and solves most of the limitations of the previous metrics. We conduct a thorough experimental evaluation on 3 challenging datasets, where the performance of various baseline methods is compared to that of the state-of-the-art. Our results confirm the problems of the previous evaluation protocols, and suggest that an IA-based protocol is more adequate to the online scenario. The baseline models and a development kit with the novel evaluation protocol are publicly available: https://github.com/gramuah/ia.
Tasks	Action Detection
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12041v1
PDF	https://arxiv.org/pdf/2003.12041v1.pdf
PWC	https://paperswithcode.com/paper/rethinking-online-action-detection-in
Repo	https://github.com/gramuah/ia
Framework	none

A Machine Learning alternative to placebo-controlled clinical trials upon new diseases: A primer


Title	A Machine Learning alternative to placebo-controlled clinical trials upon new diseases: A primer
Authors	Ezequiel Alvarez, Federico Lamagna, Manuel Szewc
Abstract	The appearance of a new dangerous and contagious disease requires the development of a drug therapy faster than what is foreseen by usual mechanisms. Many drug therapy developments consist in investigating, through different clinical trials, the effects of different specific drug combinations by delivering them to a test group of ill patients, while a placebo treatment is delivered to the remaining ill patients, known as the control group. We compare the above technique to a new technique in which all patients receive a different and reasonable combination of drugs and use this outcome to feed a Neural Network. By averaging out fluctuations and recognizing different patient features, the Neural Network learns the pattern that connects the patients' initial state to the outcome of the treatments and therefore can predict the best drug therapy better than the above method. In contrast to many available works, we do not study any detail of drug composition or interaction, but instead pose and solve the problem from a phenomenological point of view, which allows us to compare both methods. Although the conclusion is reached through mathematical modeling and is stable upon any reasonable model, this is a proof-of-concept that should be studied within other fields of expertise before confronting a real scenario. All calculations, tools and scripts have been made open source for the community to test, modify or expand them. Finally it should be mentioned that, although the results presented here are in the context of a new disease in medical sciences, these are useful for any field that requires an experimental technique with a control group.
Tasks
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12454v1
PDF	https://arxiv.org/pdf/2003.12454v1.pdf
PWC	https://paperswithcode.com/paper/a-machine-learning-alternative-to-placebo
Repo	https://github.com/ManuelSzewc/ML4DT
Framework	none

DSGN: Deep Stereo Geometry Network for 3D Object Detection


Title	DSGN: Deep Stereo Geometry Network for 3D Object Detection
Authors	Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia
Abstract	Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors and there remains a large gap in terms of performance between image-based and LiDAR-based methods, caused by inappropriate representation for the prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation – 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates the depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (about 10 higher in terms of AP) and even achieves comparable performance with a few LiDAR-based methods on the KITTI 3D object detection leaderboard. Code will be made available at https://github.com/chenyilun95/DSGN.
Tasks	3D Object Detection, Object Detection
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03398v2
PDF	https://arxiv.org/pdf/2001.03398v2.pdf
PWC	https://paperswithcode.com/paper/dsgn-deep-stereo-geometry-network-for-3d
Repo	https://github.com/chenyilun95/DSGN
Framework	none

Fast Neural Network Adaptation via Parameter Remapping and Architecture Search


Title	Fast Neural Network Adaptation via Parameter Remapping and Architecture Search
Authors	Jiemin Fang, Yuzhu Sun, Kangjian Peng, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
Abstract	Deep neural networks achieve remarkable performance in many computer vision tasks. Most state-of-the-art (SOTA) semantic segmentation and object detection approaches reuse neural network architectures designed for image classification as the backbone, commonly pre-trained on ImageNet. However, performance gains can be achieved by designing network architectures specifically for detection and segmentation, as shown by recent neural architecture search (NAS) research for detection and segmentation. One major challenge though, is that ImageNet pre-training of the search space representation (a.k.a. super network) or the searched networks incurs huge computational cost. In this paper, we propose a Fast Neural Network Adaptation (FNA) method, which can adapt both the architecture and parameters of a seed network (e.g. a high performing manually designed backbone) to become a network with different depth, width, or kernels via a Parameter Remapping technique, making it possible to utilize NAS for detection/segmentation tasks a lot more efficiently. In our experiments, we conduct FNA on MobileNetV2 to obtain new networks for both segmentation and detection that clearly out-perform existing networks designed both manually and by NAS. The total computation cost of FNA is significantly less than SOTA segmentation/detection NAS approaches: 1737× less than DPC, 6.8× less than Auto-DeepLab and 7.4× less than DetNAS. The code is available at https://github.com/JaminFong/FNA.
Tasks	Image Classification, Neural Architecture Search, Object Detection, Semantic Segmentation
Published	2020-01-08
URL	https://arxiv.org/abs/2001.02525v1
PDF	https://arxiv.org/pdf/2001.02525v1.pdf
PWC	https://paperswithcode.com/paper/fast-neural-network-adaptation-via-parameter
Repo	https://github.com/JaminFong/FNA
Framework	pytorch

Video Object Grounding using Semantic Roles in Language Description


Title	Video Object Grounding using Semantic Roles in Language Description
Authors	Arka Sadhu, Kan Chen, Ram Nevatia
Abstract	We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions. Previous methods apply image grounding based algorithms to address VOG, fail to explore the object relation information and suffer from limited generalization. Here, we investigate the role of object relations in VOG and propose a novel framework VOGNet to encode multi-modal object relations via self-attention with relative position encoding. To evaluate VOGNet, we propose novel contrasting sampling methods to generate more challenging grounding input samples, and construct a new dataset called ActivityNet-SRL (ASRL) based on existing caption and grounding datasets. Experiments on ASRL validate the need of encoding object relations in VOG, and our VOGNet outperforms competitive baselines by a significant margin.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10606v1
PDF	https://arxiv.org/pdf/2003.10606v1.pdf
PWC	https://paperswithcode.com/paper/video-object-grounding-using-semantic-roles
Repo	https://github.com/TheShadow29/vognet-pytorch
Framework	pytorch

An empirical investigation of the challenges of real-world reinforcement learning


Title	An empirical investigation of the challenges of real-world reinforcement learning
Authors	Gabriel Dulac-Arnold, Nir Levine, Daniel J. Mankowitz, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester
Abstract	Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark.
Tasks	Continuous Control
Published	2020-03-24
URL	https://arxiv.org/abs/2003.11881v1
PDF	https://arxiv.org/pdf/2003.11881v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-investigation-of-the-challenges
Repo	https://github.com/google-research/realworldrl_suite
Framework	tf

SuperMix: Supervising the Mixing Data Augmentation


Title	SuperMix: Supervising the Mixing Data Augmentation
Authors	Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, Nasser M. Nasrabadi
Abstract	In this paper, we propose a supervised mixing augmentation method, termed SuperMix, which exploits the knowledge of a teacher to mix images based on their salient regions. SuperMix optimizes a mixing objective that considers: i) forcing the class of input images to appear in the mixed image, ii) preserving the local structure of images, and iii) reducing the risk of suppressing important features. To make the mixing suitable for large-scale applications, we develop an optimization technique, 65× faster than gradient descent on the same problem. We validate the effectiveness of SuperMix through extensive evaluations and ablation studies on two tasks of object classification and knowledge distillation. On the classification task, SuperMix provides the same performance as the advanced augmentation methods, such as AutoAugment. On the distillation task, SuperMix sets a new state-of-the-art with a significantly simplified distillation method. Particularly, in six out of eight teacher-student setups from the same architectures, the students trained on the mixed data surpass their teachers with a notable margin.
Tasks	Data Augmentation, Object Classification
Published	2020-03-10
URL	https://arxiv.org/abs/2003.05034v1
PDF	https://arxiv.org/pdf/2003.05034v1.pdf
PWC	https://paperswithcode.com/paper/supermix-supervising-the-mixing-data
Repo	https://github.com/alldbi/SuperMix
Framework	pytorch

DYSAN: Dynamically sanitizing motion sensor data against sensitive inferences through adversarial networks


Title	DYSAN: Dynamically sanitizing motion sensor data against sensitive inferences through adversarial networks
Authors	Antoine Boutet, Carole Frindel, Sébastien Gambs, Théo Jourdan, Claude Rosin Ngueveu
Abstract	With the widespread adoption of the quantified self movement, an increasing number of users rely on mobile applications to monitor their physical activity through their smartphones. Granting applications direct access to sensor data exposes users to privacy risks. Indeed, these motion sensor data are usually transmitted to analytics applications hosted on the cloud, leveraging machine learning models to provide feedback on their health to users. However, nothing prevents the service provider from inferring private and sensitive information about a user such as health or demographic attributes. In this paper, we present DySan, a privacy-preserving framework to sanitize motion sensor data against unwanted sensitive inferences (i.e., improving privacy) while limiting the loss of accuracy on the physical activity monitoring (i.e., maintaining data utility). To ensure a good trade-off between utility and privacy, DySan leverages the framework of Generative Adversarial Network (GAN) to sanitize the sensor data. More precisely, by learning several networks in a competitive manner, DySan is able to build models that sanitize motion data against inferences on a specified sensitive attribute (e.g., gender) while maintaining a high accuracy on activity recognition. In addition, DySan dynamically selects the sanitizing model which maximizes the privacy according to the incoming data. Experiments conducted on real datasets demonstrate that DySan can drastically limit the gender inference to 47% while only reducing the accuracy of activity recognition by 3%.
Tasks	Activity Recognition
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10325v1
PDF	https://arxiv.org/pdf/2003.10325v1.pdf
PWC	https://paperswithcode.com/paper/dysan-dynamically-sanitizing-motion-sensor
Repo	https://github.com/DynamicSanitizer/DySan
Framework	pytorch

Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency


Title	Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency
Authors	Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Ralph Ewerth
Abstract	The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. Photo content can range from decorative, to depicting additional important information, to even containing misleading information. Therefore, automatic approaches to quantify cross-modal consistency of entity representation can support human assessors to evaluate the overall multimodal message, for instance, with regard to bias or sentiment. In some cases such measures could give hints to detect fake news, which is an increasingly important topic in today’s society. In this paper, we introduce a novel task of cross-modal consistency verification in real-world news and present a multimodal approach to quantify the entity coherence between image and text. Named entity linking is applied to extract persons, locations, and events from news texts. Several measures are suggested to calculate cross-modal similarity for these entities using state-of-the-art approaches. In contrast to previous work, our system automatically gathers example data from the Web and is applicable to real-world news. Results on two novel datasets that cover different languages, topics, and domains demonstrate the feasibility of our approach. Datasets and code are publicly available to foster research towards this new direction.
Tasks	Entity Linking
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10421v1
PDF	https://arxiv.org/pdf/2003.10421v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-analytics-for-real-world-news
Repo	https://github.com/TIBHannover/cross-modal_entity_consistency
Framework	none

Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection


Title	Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection
Authors	Yuliang Guo, Guang Chen, Peitao Zhao, Weide Zhang, Jinghao Miao, Jingao Wang, Tae Eun Choe
Abstract	We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. The method, inspired by the latest state-of-the-art 3D-LaneNet, is a unified framework solving image encoding, spatial transform of features and 3D lane prediction in a single network. However, we propose unique designs for Gen-LaneNet in two respects. First, we introduce a new geometry-guided lane anchor representation in a new coordinate frame and apply a specific geometric transformation to directly calculate real 3D lane points from the network output. We demonstrate that aligning the lane points with the underlying top-view features in the new coordinate frame is critical towards a generalized method in handling unfamiliar scenes. Second, we present a scalable two-stage framework that decouples the learning of the image segmentation subnetwork and the geometry encoding subnetwork. Compared to 3D-LaneNet, the proposed Gen-LaneNet drastically reduces the amount of 3D lane labels required to achieve a robust solution in real-world applications. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods. In experiments, we conduct an extensive ablation study to substantiate that the proposed Gen-LaneNet significantly outperforms 3D-LaneNet in average precision (AP) and F-score.
Tasks	Lane Detection, Semantic Segmentation
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10656v1
PDF	https://arxiv.org/pdf/2003.10656v1.pdf
PWC	https://paperswithcode.com/paper/gen-lanenet-a-generalized-and-scalable
Repo	https://github.com/yuliangguo/3D_Lane_Synthetic_Dataset
Framework	none

Differential Evolution with Reversible Linear Transformations


Title	Differential Evolution with Reversible Linear Transformations
Authors	Jakub M. Tomczak, Ewelina Weglarz-Tomczak, Agoston E. Eiben
Abstract	Differential evolution (DE) is a well-known type of evolutionary algorithm (EA). Similarly to other EA variants, it can suffer from small populations and lose diversity too quickly. This paper presents a new approach to mitigate this issue: We propose to generate new candidate solutions by utilizing a reversible linear transformation applied to a triplet of solutions from the population. In other words, the population is enlarged by using newly generated individuals without evaluating their fitness. We assess our methods on three problems: (i) benchmark function optimization, (ii) discovering parameter values of the gene repressilator system, (iii) learning neural networks. The empirical results indicate that the proposed approach outperforms vanilla DE and a version of DE that applies differential mutation three times on all testbeds.
Tasks
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02869v1
PDF	https://arxiv.org/pdf/2002.02869v1.pdf
PWC	https://paperswithcode.com/paper/differential-evolution-with-reversible-linear
Repo	https://github.com/jmtomczak/reversible-de
Framework	none
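
For readers unfamiliar with the "vanilla DE" baseline that the abstract compares against, here is a minimal sketch of classic differential evolution (the common rand/1/bin scheme) on a toy sphere function. It does not implement the paper's reversible linear transformations. The function names, the hyperparameters `f` and `cr`, and the search bounds are illustrative assumptions, not values from the paper.

```python
import random


def sphere(x):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


def differential_evolution(fitness, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Classic DE (rand/1/bin) with greedy one-to-one selection."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick a triplet of distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Differential mutation: base vector plus a scaled difference vector.
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; one forced index guarantees the trial differs from the target.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else pop[i][d]
                     for d in range(dim)]
            trial_score = fitness(trial)
            # Keep the trial only if it is at least as good as the target.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    solution, score = differential_evolution(sphere, dim=3)
    print(solution, score)
```

Each generation here evaluates one trial per individual, so diversity depends entirely on the current population, which is the limitation the abstract says small populations run into.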

Fashion Landmark Detection and Category Classification for Robotics


Title	Fashion Landmark Detection and Category Classification for Robotics
Authors	Thomas Ziegler, Judith Butepage, Michael C. Welle, Anastasiia Varava, Tonci Novkovic, Danica Kragic
Abstract	Research on automated, image-based identification of clothing categories and fashion landmarks has recently gained significant interest due to its potential impact on areas such as robotic clothing manipulation, automated clothes sorting and recycling, and online shopping. Several public and annotated fashion datasets have been created to facilitate research advances in this direction. In this work, we make the first step towards leveraging the data and techniques developed for fashion image analysis in vision-based robotic clothing manipulation tasks. We focus on techniques that can generalize from large-scale fashion datasets to less structured, small datasets collected in a robotic lab. Specifically, we propose training data augmentation methods such as elastic warping, and model adjustments such as rotation invariant convolutions to make the model generalize better. Our experiments demonstrate that our approach outperforms state-of-the-art models with respect to clothing category classification and fashion landmark detection when tested on previously unseen datasets. Furthermore, we present experimental results on a new dataset composed of images where a robot holds different garments, collected in our lab.
Tasks	Data Augmentation
Published	2020-03-26
URL	https://arxiv.org/abs/2003.11827v1
PDF	https://arxiv.org/pdf/2003.11827v1.pdf
PWC	https://paperswithcode.com/paper/fashion-landmark-detection-and-category
Repo	https://github.com/ThomasZiegler/Fashion_Landmark_Detection_and_Category_Classification
Framework	pytorch

On-the-Fly Adaptation of Source Code Models using Meta-Learning


Title	On-the-Fly Adaptation of Source Code Models using Meta-Learning
Authors	Disha Shrivastava, Hugo Larochelle, Daniel Tarlow
Abstract	The ability to adapt to unseen, local contexts is an important challenge that successful models of source code must overcome. One of the most popular approaches for the adaptation of such models is dynamic evaluation. With dynamic evaluation, when running a model on an unseen file, the model is updated immediately after having observed each token in that file. In this work, we propose instead to frame the problem of context adaptation as a meta-learning problem. We aim to train a base source code model that is best able to learn from information in a file to deliver improved predictions of missing tokens. Unlike dynamic evaluation, this formulation allows us to select more targeted information (support tokens) for adaptation, that is both before and after a target hole in a file. We consider an evaluation setting that we call line-level maintenance, designed to reflect the downstream task of code auto-completion in an IDE. Leveraging recent developments in meta-learning such as first-order MAML and Reptile, we demonstrate improved performance in experiments on a large scale Java GitHub corpus, compared to other adaptation baselines including dynamic evaluation. Moreover, our analysis shows that, compared to a non-adaptive baseline, our approach improves performance on identifiers and literals by 44% and 15%, respectively. Our implementation can be found at: https://github.com/shrivastavadisha/meta_learn_source_code
Tasks	Meta-Learning
Published	2020-03-26
URL	https://arxiv.org/abs/2003.11768v1
PDF	https://arxiv.org/pdf/2003.11768v1.pdf
PWC	https://paperswithcode.com/paper/on-the-fly-adaptation-of-source-code-models
Repo	https://github.com/shrivastavadisha/meta_learn_source_code
Framework	tf
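
The abstract above names Reptile as one of the meta-learning methods it builds on. The sketch below shows the basic Reptile update on a deliberately tiny problem: a one-parameter model `y = w * x` adapted to toy regression tasks. It is not the authors' source code model or their support-token selection. The task data, learning rates, and function names are hypothetical.

```python
def inner_sgd(w, data, lr, steps):
    """Adapt w to one task with a few gradient steps on mean squared error."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def reptile(tasks, w0=0.0, inner_lr=0.1, inner_steps=5, meta_lr=0.5, meta_iters=50):
    """Reptile: move the shared initialisation towards each task-adapted parameter."""
    w = w0
    for it in range(meta_iters):
        data = tasks[it % len(tasks)]          # cycle through the tasks
        w_task = inner_sgd(w, data, inner_lr, inner_steps)
        w += meta_lr * (w_task - w)            # interpolate towards the adapted weights
    return w


if __name__ == "__main__":
    # Two toy tasks with true slopes 1.0 and 3.0 (hypothetical data).
    tasks = [
        [(0.5, 0.5), (1.0, 1.0)],
        [(0.5, 1.5), (1.0, 3.0)],
    ]
    print(reptile(tasks))
```

The learned initialisation settles between the two task slopes, so a few inner steps suffice to adapt to either task; the paper applies the same idea to adapting a code model to an unseen file.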