April 3, 2020

Paper Group AWR 30

DSNAS: Direct Neural Architecture Search without Parameter Retraining

Title DSNAS: Direct Neural Architecture Search without Parameter Retraining
Authors Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin
Abstract If NAS methods are solutions, what is the problem? Most existing NAS methods require two-stage parameter optimization. However, the performance of the same architecture in the two stages correlates poorly. Based on this observation, we propose a new problem definition for NAS: task-specific, end-to-end. We argue that, given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely-defined NAS evaluation to i) accuracy on this task and ii) the total computation consumed to finally obtain a model with satisfactory accuracy. Seeing that most existing methods do not solve this problem directly, we propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate. Child networks derived from DSNAS can be deployed directly without parameter retraining. Compared with two-stage methods, DSNAS discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%. Our implementation is available at https://github.com/SNAS-Series/SNAS-Series.
Tasks Neural Architecture Search
Published 2020-02-21
URL https://arxiv.org/abs/2002.09128v2
PDF https://arxiv.org/pdf/2002.09128v2.pdf
PWC https://paperswithcode.com/paper/dsnas-direct-neural-architecture-search
Repo https://github.com/SNAS-Series/SNAS-Series
Framework pytorch
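
DSNAS samples a discrete child network at each step and backpropagates through the sample, updating the architecture distribution and the operation weights together in a single stage. A minimal PyTorch sketch of that idea using a straight-through Gumbel-softmax sample (names like `MixedOp` and the op list are illustrative, not from the SNAS-Series repo):

```python
# Toy single-stage differentiable NAS: architecture logits and operation
# weights are updated in one backward pass (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture logits

    def forward(self, x):
        # Discrete sample via straight-through Gumbel-softmax: the forward
        # pass uses exactly one op, while gradients still flow to alpha.
        z = F.gumbel_softmax(self.alpha, tau=1.0, hard=True)
        return sum(zi * op(x) for zi, op in zip(z, self.ops))

ops = [nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 5, padding=2), nn.Identity()]
cell = MixedOp(ops)
opt = torch.optim.SGD(cell.parameters(), lr=0.01)  # weights and alpha jointly

x, y = torch.randn(4, 8, 16, 16), torch.randn(4, 8, 16, 16)
loss = F.mse_loss(cell(x), y)
loss.backward()
opt.step()  # one step updates both the sampled op's weights and the logits
```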

Two-Sample Testing for Event Impacts in Time Series

Title Two-Sample Testing for Event Impacts in Time Series
Authors Erik Scharwächter, Emmanuel Müller
Abstract In many application domains, time series are monitored to detect extreme events like technical faults, natural disasters, or disease outbreaks. Unfortunately, it is often non-trivial to select both a time series that is informative about events and a powerful detection algorithm: detection may fail because the detection algorithm is not suitable, or because there is no shared information between the time series and the events of interest. In this work, we thus propose a non-parametric statistical test for shared information between a time series and a series of observed events. Our test allows identifying time series that carry information on event occurrences without committing to a specific event detection methodology. In a nutshell, we test for divergences of the value distributions of the time series at increasing lags after event occurrences with a multiple two-sample testing approach. In contrast to related tests, our approach is applicable for time series over arbitrary domains, including multivariate numeric, strings or graphs. We perform a large-scale simulation study to show that it outperforms or is on par with related tests on our task for univariate time series. We also demonstrate the real-world applicability of our approach on datasets from social media and smart home environments.
Tasks Time Series
Published 2020-01-31
URL https://arxiv.org/abs/2001.11930v1
PDF https://arxiv.org/pdf/2001.11930v1.pdf
PWC https://paperswithcode.com/paper/two-sample-testing-for-event-impacts-in-time
Repo https://github.com/diozaka/eitest
Framework none
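
The test boils down to comparing post-event value distributions against the series' marginal at increasing lags, with a multiple-testing correction. A toy version using a Mann-Whitney U test per lag (my simplification, not the eitest implementation):

```python
# Toy lag-wise two-sample test for event impacts (illustrative only).
import numpy as np
from scipy.stats import mannwhitneyu

def event_impact_test(series, event_times, max_lag=5, alpha=0.05):
    pvals = []
    for d in range(1, max_lag + 1):
        idx = [t + d for t in event_times if t + d < len(series)]
        sample = series[idx]  # values d steps after each event
        _, p = mannwhitneyu(sample, series, alternative="two-sided")
        pvals.append(p)
    # Reject if any lag diverges significantly (Bonferroni correction).
    return min(pvals) < alpha / len(pvals), pvals

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
events = rng.choice(900, size=30, replace=False)
x[events + 1] += 2.0  # inject an impact one step after each event
print(event_impact_test(x, list(events)))
```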

Discovering Mathematical Objects of Interest – A Study of Mathematical Notations

Title Discovering Mathematical Objects of Interest – A Study of Mathematical Notations
Authors Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp
Abstract Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today’s systems. In this paper, we present the first in-depth study of the distributions of mathematical notation in two large scientific corpora: the open-access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use cases, for example, assisting semantic extraction systems, improving scientific search engines, and facilitating specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with ‘Jacobi polynomial’); (3) we extend zbMATH’s search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made our source code and data available.
Tasks Information Retrieval, Recommendation Systems
Published 2020-02-07
URL https://arxiv.org/abs/2002.02712v2
PDF https://arxiv.org/pdf/2002.02712v2.pdf
PWC https://paperswithcode.com/paper/discovering-mathematical-objects-of-interest
Repo https://github.com/ag-gipp/FormulaCloudData
Framework tf
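
At its core, the distributional study counts how often mathematical objects occur across a corpus. A toy frequency analysis over LaTeX formulae (illustrative only; the real pipeline operates on billions of objects):

```python
# Toy sketch of counting mathematical objects (not the FormulaCloudData
# pipeline): tokenize LaTeX formulae and rank tokens by frequency.
from collections import Counter
import re

corpus = [r"P_{n}^{(\alpha, \beta)}(x)", r"\sum_{i=1}^{n} x_i", r"\alpha + \beta"]

def tokens(formula):
    # crude LaTeX tokenizer: commands, letters, digits
    return re.findall(r"\\[a-zA-Z]+|[a-zA-Z]|\d+", formula)

counts = Counter(tok for f in corpus for tok in tokens(f))
print(counts.most_common(5))  # the most frequent mathematical objects
```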

Semantic Pyramid for Image Generation

Title Semantic Pyramid for Image Generation
Authors Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel
Abstract We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid – a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained in fine features to high level, semantic information contained in deeper features. More specifically, given a set of features extracted from a reference image, our model generates diverse image samples, each with matching features at each semantic level of the classification model. We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks. These include: generating images with a controllable extent of semantic similarity to a reference image, and different manipulation tasks such as semantically-controlled inpainting and compositing; all achieved with the same model, with no further training.
Tasks Image Generation, Semantic Similarity, Semantic Textual Similarity
Published 2020-03-13
URL https://arxiv.org/abs/2003.06221v2
PDF https://arxiv.org/pdf/2003.06221v2.pdf
PWC https://paperswithcode.com/paper/semantic-pyramid-for-image-generation
Repo https://github.com/rosinality/semantic-pyramid-pytorch
Framework pytorch
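
The key mechanism is generating images whose classifier features match those of a reference image at chosen semantic levels. A hedged PyTorch sketch of such a feature-matching constraint using a pre-trained VGG (the layer indices and the L1 penalty are my choices, not the paper's):

```python
# Toy feature-pyramid matching against a pre-trained classifier
# (illustrative; not the authors' GAN architecture).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

features = vgg16(weights="DEFAULT").features.eval()
levels = [4, 9, 16, 23]  # layer indices treated as the "pyramid" levels

def pyramid_features(x):
    feats, h = [], x
    for i, layer in enumerate(features):
        h = layer(h)
        if i in levels:
            feats.append(h)
    return feats

ref = torch.randn(1, 3, 224, 224)
gen = torch.randn(1, 3, 224, 224, requires_grad=True)
with torch.no_grad():
    ref_feats = pyramid_features(ref)
loss = sum(F.l1_loss(g, r) for g, r in zip(pyramid_features(gen), ref_feats))
loss.backward()  # steers the generated image toward matching semantic features
```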

A Framework for Evaluation of Machine Reading Comprehension Gold Standards

Title A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Authors Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro
Abstract Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of the gold standards that are used to evaluate them. There is a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate, on one hand, the linguistic features present, the reasoning and background knowledge required, and the factual correctness of expected answers, and, on the other hand, the presence of lexical cues as a lower bound for the requirement of understanding. We propose a qualitative annotation schema for the former and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers, and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and the quality of the evaluation data.
Tasks Machine Reading Comprehension, Reading Comprehension
Published 2020-03-10
URL https://arxiv.org/abs/2003.04642v1
PDF https://arxiv.org/pdf/2003.04642v1.pdf
PWC https://paperswithcode.com/paper/a-framework-for-evaluation-of-machine-reading
Repo https://github.com/schlevik/dataset-analysis
Framework none
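
One half of the framework estimates whether questions are answerable from lexical cues alone, via approximative metrics. A toy overlap-based cue metric in that spirit (my definition, not the paper's exact metric):

```python
# Toy lexical-cue check: if the passage sentence with the highest word
# overlap with the question already contains the gold answer, the question
# may be solvable by surface cues alone (illustrative only).
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa), 1)

def cue_detectable(question, sentences, answer):
    best = max(sentences, key=lambda s: overlap(question, s))
    return answer.lower() in best.lower()

q = "Who wrote the paper?"
sents = ["The paper was written by Schlegel et al.", "It was published in 2020."]
print(cue_detectable(q, sents, "Schlegel"))  # True: overlap suffices here
```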

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Title Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
Authors Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré
Abstract Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
Tasks Latent Variable Models
Published 2020-02-27
URL https://arxiv.org/abs/2002.11955v1
PDF https://arxiv.org/pdf/2002.11955v1.pdf
PWC https://paperswithcode.com/paper/fast-and-three-rious-speeding-up-weak
Repo https://github.com/HazyResearch/flyingsquid
Framework pytorch
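
The closed-form solution rests on a triplet identity: for conditionally independent labeling functions with outputs in {-1, +1}, expected pairwise agreements factor through each source's accuracy. A numpy sketch of that estimate (my paraphrase of the idea, not the FlyingSquid code):

```python
# Triplet method toy: E[li*lj] = E[li*y] * E[lj*y] for conditionally
# independent sources, so accuracies have a closed form, no SGD needed.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y = rng.choice([-1, 1], size=n)  # unobserved true labels

def lf(acc):  # labeling function that agrees with y with probability acc
    agree = rng.random(n) < acc
    return np.where(agree, y, -y)

l1, l2, l3 = lf(0.9), lf(0.8), lf(0.7)
e12, e13, e23 = (l1 * l2).mean(), (l1 * l3).mean(), (l2 * l3).mean()
a1 = np.sqrt(e12 * e13 / e23)  # closed-form estimate of E[l1 * y]
print(a1, (l1 * y).mean())     # estimate vs. ground truth (~0.8)
```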

Privacy-Preserving Image Sharing via Sparsifying Layers on Convolutional Groups

Title Privacy-Preserving Image Sharing via Sparsifying Layers on Convolutional Groups
Authors Sohrab Ferdowsi, Behrooz Razeghi, Taras Holotyak, Flavio P. Calmon, Slava Voloshynovskiy
Abstract We propose a practical framework to address the problem of privacy-aware image sharing in large-scale setups. We argue that, while compactness is always desired at scale, this need is even more severe when privacy-sensitive content must also be protected. We therefore encode images such that, on one hand, representations are stored in the public domain without the huge cost of privacy protection, yet are ambiguated and hence leak no discernible content, unless a combinatorially expensive guessing mechanism is available to the attacker. On the other hand, authorized users are provided with very compact keys that can easily be kept secure and used to disambiguate and faithfully reconstruct the corresponding access-granted images. We achieve this with a convolutional autoencoder of our design, where feature maps are passed independently through sparsifying transformations, providing multiple compact codes, each responsible for reconstructing different attributes of the image. The framework is tested on a large-scale database of images, with a public implementation available.
Tasks
Published 2020-02-04
URL https://arxiv.org/abs/2002.01469v1
PDF https://arxiv.org/pdf/2002.01469v1.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-image-sharing-via
Repo https://github.com/sssohrab/sparsifying_groups_imAmbiguation
Framework pytorch
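
To make the public/private split concrete, here is a toy ambiguation scheme in the same spirit (my illustration, not the authors' sparsifying-layer design): the published code is sparse and randomly permuted, and the short permutation key disambiguates it.

```python
# Toy ambiguation with a compact key (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
code = rng.normal(size=64)
k = 8
keep = np.argsort(np.abs(code))[-k:]   # k-sparse support
sparse = np.zeros_like(code)
sparse[keep] = code[keep]

key = rng.permutation(len(code))       # compact secret held by the user
public = sparse[key]                   # stored publicly, content ambiguated

recovered = np.empty_like(public)
recovered[key] = public                # authorized user inverts with the key
assert np.allclose(recovered, sparse)
```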

BioTouchPass2: Touchscreen Password Biometrics Using Time-Aligned Recurrent Neural Networks

Title BioTouchPass2: Touchscreen Password Biometrics Using Time-Aligned Recurrent Neural Networks
Authors Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia
Abstract Passwords are still used on a daily basis for all kinds of applications. However, in many cases they are not secure enough by themselves. This work enhances password scenarios through two-factor authentication, asking users to draw each character of the password instead of typing it as usual. The main contributions of this study are as follows: i) We present the novel MobileTouchDB public database, acquired in an unsupervised mobile scenario with no restrictions in terms of position, posture, or device. This database contains more than 64K on-line character samples performed by 217 users, with 94 different smartphone models, and up to 6 acquisition sessions. ii) We perform a complete analysis of the proposed approach considering both traditional authentication systems such as Dynamic Time Warping (DTW) and novel approaches based on Recurrent Neural Networks (RNNs). In addition, we present a novel approach named Time-Aligned Recurrent Neural Networks (TA-RNNs), which combines the potential of DTW and RNNs to train systems that are more robust against attacks. A complete analysis of the proposed approach is carried out using both the MobileTouchDB and e-BioDigitDB databases. Our proposed TA-RNN system outperforms the state of the art, achieving a final 2.38% Equal Error Rate using just a 4-digit password and one training sample per character. These results encourage the deployment of our proposed approach over traditional typed-based password systems, where an attack would have a 100% success rate under the same impostor scenario.
Tasks
Published 2020-01-28
URL https://arxiv.org/abs/2001.10223v1
PDF https://arxiv.org/pdf/2001.10223v1.pdf
PWC https://paperswithcode.com/paper/biotouchpass2-touchscreen-password-biometrics
Repo https://github.com/BiDAlab/MobileTouchDB
Framework none
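
DTW is both the traditional baseline here and the alignment idea behind TA-RNNs. A textbook dynamic-programming implementation for 1-D stroke signals (a toy, not the authors' code):

```python
# Classic dynamic time warping distance between two sequences.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

enrolled = np.array([0.0, 0.1, 0.5, 0.9, 1.0])  # e.g. x(t) of a drawn character
attempt  = np.array([0.0, 0.4, 0.5, 1.0])
print(dtw(enrolled, attempt))  # small distance suggests a genuine character
```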

$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

Title $M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild
Authors Yuan-Hang Zhang, Rulin Huang, Jiabei Zeng, Shiguang Shan, Xilin Chen
Abstract This report describes a multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020. In the proposed $M^3$T framework, we fuse both visual features from videos and acoustic features from the audio tracks to estimate the valence and arousal. The spatio-temporal visual features are extracted with a 3D convolutional network and a bidirectional recurrent neural network. Considering the correlations between valence/arousal, emotions, and facial actions, we also explore mechanisms to benefit from other tasks. We evaluated the $M^3$T framework on the validation set provided by ABAW, and it significantly outperforms the baseline method.
Tasks Gesture Recognition
Published 2020-02-07
URL https://arxiv.org/abs/2002.02957v1
PDF https://arxiv.org/pdf/2002.02957v1.pdf
PWC https://paperswithcode.com/paper/m3t-multi-modal-continuous-valence-arousal
Repo https://github.com/sailordiary/m3t.pytorch
Framework pytorch
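
Continuous valence-arousal estimation is typically trained and scored with the concordance correlation coefficient (CCC). A minimal PyTorch version (my sketch; the exact loss used in the $M^3$T submission may differ):

```python
# Concordance correlation coefficient: agreement in scale, location, order.
import torch

def ccc(pred, gold):
    pm, gm = pred.mean(), gold.mean()
    pv, gv = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - pm) * (gold - gm)).mean()
    return 2 * cov / (pv + gv + (pm - gm) ** 2)

pred = torch.tensor([0.1, 0.4, 0.3])
gold = torch.tensor([0.2, 0.5, 0.1])
loss = 1 - ccc(pred, gold)  # minimize to maximize concordance
```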

Guessing State Tracking for Visual Dialogue

Title Guessing State Tracking for Visual Dialogue
Authors Wei Pang, Xiaojie Wang
Abstract The Guesser plays an important role in GuessWhat?!-like visual dialogues. It must locate a target object in an image, known only to the Oracle, through a question-answer dialogue between a Questioner and the Oracle. Most existing guessers make one and only one guess after receiving all question-answer pairs in a dialogue with a predefined number of rounds. This paper proposes a guessing state for the guesser and regards guessing as a process in which the guessing state changes over the course of a dialogue. A guess model based on guessing state tracking is therefore proposed. The guessing state is defined as a distribution over the candidate objects in the image. A state update algorithm with three modules is given: UoVR updates the representation of the image according to the current guessing state, QAEncoder encodes the question-answer pairs, and UoGS updates the guessing state by combining information from both the image and the dialogue history. With the guessing state in hand, two loss functions are defined as supervision for model training: early supervision provides supervision to the guesser at early rounds, and incremental supervision brings monotonicity to the guessing state. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms previous models and achieves a new state of the art; in particular, its guessing success rate of 83.3% approaches the human-level performance of 84.4%.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10340v2
PDF https://arxiv.org/pdf/2002.10340v2.pdf
PWC https://paperswithcode.com/paper/guessing-state-tracking-for-visual-dialogue
Repo https://github.com/xubuvd/guesswhat
Framework tf
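
The guessing state itself is just a distribution over candidate objects, re-estimated after every round. A toy update loop (my illustration; the paper's UoVR/QAEncoder/UoGS modules are learned networks, not random scores):

```python
# Toy guessing-state tracking: a distribution over candidate objects,
# updated with per-round evidence and renormalized (illustrative only).
import torch
import torch.nn.functional as F

num_objects = 5
state = torch.full((num_objects,), 1.0 / num_objects)  # uniform initial guess

def update(state, qa_scores):
    # qa_scores: per-object compatibility of this question-answer pair
    logits = torch.log(state + 1e-9) + qa_scores
    return F.softmax(logits, dim=0)  # renormalized guessing state

for _ in range(3):  # three dialogue rounds with stand-in evidence
    state = update(state, torch.randn(num_objects))
print(state.argmax())  # final guess; early rounds can also be supervised
```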

The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives

Title The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives
Authors Nick Ruest, Jimmy Lin, Ian Milligan, Samantha Fritz
Abstract The Archives Unleashed project aims to improve scholarly access to web archives through a multi-pronged strategy involving tool creation, process modeling, and community building - all proceeding concurrently in mutually-reinforcing efforts. As we near the end of our initially-conceived three-year project, we report on our progress and share lessons learned along the way. The main contribution articulated in this paper is a process model that decomposes scholarly inquiries into four main activities: filter, extract, aggregate, and visualize. Based on the insight that these activities can be disaggregated across time, space, and tools, it is possible to generate “derivative products”, using our Archives Unleashed Toolkit, that serve as useful starting points for scholarly inquiry. Scholars can download these products from the Archives Unleashed Cloud and manipulate them just like any other dataset, thus providing access to web archives without requiring any specialized knowledge. Over the past few years, our platform has processed over a thousand different collections from about two hundred users, totaling over 280 terabytes of web archives.
Tasks
Published 2020-01-15
URL https://arxiv.org/abs/2001.05399v1
PDF https://arxiv.org/pdf/2001.05399v1.pdf
PWC https://paperswithcode.com/paper/the-archives-unleashed-project-technology
Repo https://github.com/archivesunleashed/aut
Framework none
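
The filter-extract-aggregate-visualize process model reads as an ordinary data pipeline. A toy rendering in Python (illustrative only; the Archives Unleashed Toolkit itself operates on web-archive records at scale):

```python
# Toy filter -> extract -> aggregate -> visualize pipeline over records.
from collections import Counter

records = [
    {"domain": "example.com", "year": 2019, "text": "web archives matter"},
    {"domain": "example.org", "year": 2020, "text": "scholarly access"},
]

filtered = [r for r in records if r["year"] >= 2020]   # filter
extracted = [r["domain"] for r in filtered]            # extract
aggregated = Counter(extracted)                        # aggregate
for domain, count in aggregated.most_common():         # visualize
    print(domain, count)
```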

Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving

Title Integrating Deep Reinforcement Learning with Model-based Path Planners for Automated Driving
Authors Ekim Yurtsever, Linda Capito, Keith Redmill, Umit Ozguner
Abstract Automated driving in urban settings is challenging, chiefly due to the indeterministic nature of the human participants in traffic. These behaviors are difficult to model, and conventional, rule-based Automated Driving Systems (ADSs) tend to fail when they face unmodeled dynamics. On the other hand, the more recent, end-to-end Deep Reinforcement Learning (DRL) based ADSs have shown promising results. However, pure learning-based approaches lack the hard-coded safety measures of model-based methods. Here we propose a hybrid approach that integrates a model-based path planner into a vision-based DRL framework to alleviate the shortcomings of both worlds. In summary, the DRL agent learns to overrule the model-based planner’s decisions if it predicts that better future rewards can be obtained by doing so, e.g., avoiding an accident. Otherwise, the DRL agent tends to follow the model-based planner as closely as possible. This logic is learned, i.e., no switching model is designed here. The agent learns it by considering two penalties: the penalty for straying from the model-based path planner and the penalty for a collision. The latter has precedence over the former, i.e., its penalty is greater. Therefore, after training, the agent follows the model-based planner when it is safe to do so and is penalized otherwise. It also learns to sacrifice the positive rewards of following the model-based planner to avoid a potentially large negative penalty for causing a collision in the future. Experimental results show that the proposed method can plan its path and navigate while avoiding obstacles between randomly chosen origin-destination points in CARLA, a dynamic urban simulation environment. Our code is open-source and available online.
Tasks
Published 2020-02-02
URL https://arxiv.org/abs/2002.00434v1
PDF https://arxiv.org/pdf/2002.00434v1.pdf
PWC https://paperswithcode.com/paper/integrating-deep-reinforcement-learning-with
Repo https://github.com/Ekim-Yurtsever/Hybrid-DeepRL-Automated-Driving
Framework none
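
The abstract's two-penalty logic can be written down directly. A toy reward function under assumed weights (`alpha` and `beta` are my placeholders, not the paper's values):

```python
# Toy two-penalty reward: straying from the planner costs a little,
# colliding costs a lot, so overruling the planner only pays off when
# it avoids the larger penalty (my reading of the abstract).
def reward(dist_to_planner_path, collided, alpha=0.1, beta=100.0):
    r = -alpha * dist_to_planner_path
    if collided:
        r -= beta  # collision penalty dominates the straying penalty
    return r

print(reward(2.0, False))  # mild penalty for deviating
print(reward(0.0, True))   # on-path but catastrophic
```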

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing

Title TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Authors Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu
Abstract In this paper, we introduce TextBrewer, an open-source knowledge distillation toolkit designed for natural language processing. It works with different neural network models and supports various kinds of tasks, such as text classification, reading comprehension, and sequence labeling. TextBrewer provides a simple and uniform workflow that enables quick setup of distillation experiments with highly flexible configurations. It offers a set of predefined distillation methods and can be extended with custom code. As a case study, we use TextBrewer to distill BERT on several typical NLP tasks. With simple configuration, we achieve results that are comparable to or even better than state-of-the-art performance. Our toolkit is available through: http://textbrewer.hfl-rc.com
Tasks Reading Comprehension, Text Classification
Published 2020-02-28
URL https://arxiv.org/abs/2002.12620v1
PDF https://arxiv.org/pdf/2002.12620v1.pdf
PWC https://paperswithcode.com/paper/textbrewer-an-open-source-knowledge
Repo https://github.com/airaria/TextBrewer
Framework pytorch
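
The core objective such toolkits wrap is small. A generic temperature-scaled logit-distillation sketch (deliberately not TextBrewer's actual API; see the repo for the toolkit's own workflow):

```python
# Generic knowledge distillation loss: soften teacher and student logits
# with a temperature and minimize their KL divergence.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=4.0):
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

s = torch.randn(8, 2, requires_grad=True)  # stand-in student classifier logits
t = torch.randn(8, 2)                      # stand-in frozen teacher logits
distill_loss(s, t).backward()
```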

Adversarial Texture Optimization from RGB-D Scans

Title Adversarial Texture Optimization from RGB-D Scans
Authors Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Max Jiang, Leonidas Guibas, Matthias Nießner, Thomas Funkhouser
Abstract Realistic color texture generation is an important step in RGB-D surface reconstruction, but remains challenging in practice due to inaccuracies in reconstructed geometry, misaligned camera poses, and view-dependent imaging artifacts. In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. Specifically, we propose an approach to produce photorealistic textures for approximate surfaces, even from misaligned images, by learning an objective function that is robust to these errors. The key idea of our approach is to learn a patch-based conditional discriminator which guides the texture optimization to be tolerant to misalignments. Our discriminator takes a synthesized view and a real image, and evaluates whether the synthesized one is realistic under a broadened definition of realism. We train the discriminator by providing as ‘real’ examples pairs of input views and their misaligned versions, so that the learned adversarial loss tolerates errors from the scans. Experiments on synthetic and real data with quantitative and qualitative evaluation demonstrate the advantage of our approach in comparison to the state of the art. Our code is publicly available with a video demonstration.
Tasks Texture Synthesis
Published 2020-03-18
URL https://arxiv.org/abs/2003.08400v1
PDF https://arxiv.org/pdf/2003.08400v1.pdf
PWC https://paperswithcode.com/paper/adversarial-texture-optimization-from-rgb-d
Repo https://github.com/hjwdzh/AdversarialTexture
Framework none
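
The misalignment-tolerance trick lies in how the ‘real’ pairs for the patch discriminator are constructed. A toy PyTorch sketch (my paraphrase; the architecture and the shift used here are stand-ins, not the authors' setup):

```python
# Toy misalignment-tolerant patch discriminator: (view, shifted view)
# pairs are labeled real, so small registration errors stop being punished.
import torch
import torch.nn as nn

# PatchGAN-style discriminator on concatenated image pairs (6 channels in)
D = nn.Sequential(
    nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),  # per-patch realism scores
)

view = torch.randn(1, 3, 64, 64)
shifted = torch.roll(view, shifts=(2, 3), dims=(2, 3))  # stand-in misalignment
synthesized = torch.randn(1, 3, 64, 64)

real_pair = torch.cat([view, shifted], dim=1)       # labeled real
fake_pair = torch.cat([view, synthesized], dim=1)   # labeled fake
bce = nn.BCEWithLogitsLoss()
scores_real, scores_fake = D(real_pair), D(fake_pair)
d_loss = (bce(scores_real, torch.ones_like(scores_real))
          + bce(scores_fake, torch.zeros_like(scores_fake)))
```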

Benchmarking Popular Classification Models’ Robustness to Random and Targeted Corruptions

Title Benchmarking Popular Classification Models’ Robustness to Random and Targeted Corruptions
Authors Utkarsh Desai, Srikanth Tamilselvam, Jassimran Kaur, Senthil Mani, Shreya Khare
Abstract Text classification models, especially neural network based models, have reached very high accuracy on many popular benchmark datasets. Yet, such models, when deployed in real-world applications, tend to perform badly. The primary reason is that these models are not tested against enough real-world natural data. Depending on the application’s users, the vocabulary and style of the model’s input may vary greatly. This emphasizes the need for a model-agnostic test dataset consisting of various corruptions that naturally appear in the wild. Models trained and tested on such benchmark datasets will be more robust against real-world data. However, such datasets are not easily available. In this work, we address this problem by extending the benchmark datasets with naturally occurring corruptions such as Spelling Errors, Text Noise, and Synonyms, and making them publicly available. Through extensive experiments, we compare random and targeted corruption strategies using Local Interpretable Model-Agnostic Explanations (LIME). We report the vulnerabilities of two popular text classification models to these corruptions and also find that targeted corruptions can expose a model’s vulnerabilities better than random choices in most cases.
Tasks Text Classification
Published 2020-01-31
URL https://arxiv.org/abs/2002.00754v1
PDF https://arxiv.org/pdf/2002.00754v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-popular-classification-models
Repo https://github.com/constraint-solvers/benchmark-corruptions
Framework none
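
The corruption types are simple to prototype. Toy generators for spelling noise and synonym substitution (my versions, not the released benchmark data):

```python
# Toy text corruptions: random character-level spelling noise and a
# dictionary-based synonym swap (the real benchmark uses curated resources).
import random

rng = random.Random(0)

def spelling_noise(text, rate=0.1):
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

SYNONYMS = {"movie": "film", "good": "great"}  # would come from a thesaurus

def synonym_swap(text):
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

print(spelling_noise("this movie was good"))
print(synonym_swap("this movie was good"))
```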