February 1, 2020

3186 words 15 mins read

Paper Group AWR 94

Unified Language Model Pre-training for Natural Language Understanding and Generation. Balanced Crossover Operators in Genetic Algorithms. End-to-End Learning of Visual Representations from Uncurated Instructional Videos. Nonlinear Markov Random Fields Learned via Backpropagation. A Delay Metric for Video Object Detection: What Average Precision Fa …

Unified Language Model Pre-training for Natural Language Understanding and Generation

Title Unified Language Model Pre-training for Natural Language Understanding and Generation
Authors Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
Abstract This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
Tasks Abstractive Text Summarization, Document Summarization, Language Modelling, Question Answering, Question Generation, Text Generation, Text Summarization
Published 2019-05-08
URL https://arxiv.org/abs/1905.03197v3
PDF https://arxiv.org/pdf/1905.03197v3.pdf
PWC https://paperswithcode.com/paper/unified-language-model-pre-training-for
Repo https://github.com/microsoft/unilm
Framework pytorch
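
The unified modeling hinges on self-attention masks that decide which context each prediction may condition on. Below is a minimal sketch of how such masks could be built for the three pre-training objectives, assuming a single packed sequence whose first `src_len` tokens form the source segment; this is an illustration only, not the official implementation (which is in the linked repo).

```python
import torch

def unilm_attention_mask(seq_len, mode, src_len=None):
    """Return a (seq_len, seq_len) mask where mask[i, j] = 1 means token i may
    attend to token j. Illustrative sketch of the three pre-training modes."""
    if mode == "bidirectional":
        # Cloze-style objective: every token sees the whole sequence.
        return torch.ones(seq_len, seq_len)
    if mode == "unidirectional":
        # Left-to-right LM: token i sees only tokens j <= i.
        return torch.tril(torch.ones(seq_len, seq_len))
    if mode == "seq2seq":
        # First src_len tokens are the source segment, the rest the target.
        assert src_len is not None
        mask = torch.zeros(seq_len, seq_len)
        mask[:, :src_len] = 1                                   # everyone sees the source
        mask[src_len:, src_len:] = torch.tril(                  # causal within the target;
            torch.ones(seq_len - src_len, seq_len - src_len))   # source rows never see it
        return mask
    raise ValueError(f"unknown mode: {mode}")

# Example: a packed sequence of 4 source tokens followed by 3 target tokens.
print(unilm_attention_mask(7, "seq2seq", src_len=4))
```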

Balanced Crossover Operators in Genetic Algorithms

Title Balanced Crossover Operators in Genetic Algorithms
Authors Luca Manzoni, Luca Mariot, Eva Tuba
Abstract In several combinatorial optimization problems arising in cryptography and design theory, the admissible solutions must often satisfy a balancedness constraint, such as being represented by bitstrings with a fixed number of ones. For this reason, several works in the literature tackling these optimization problems with Genetic Algorithms (GA) introduced new balanced crossover operators which ensure that the offspring has the same balancedness characteristics as the parents. However, the use of such operators has never been thoroughly motivated, except for some generic considerations about search space reduction. In this paper, we undertake a rigorous statistical investigation on the effect of balanced and unbalanced crossover operators against three optimization problems from the area of cryptography and coding theory: nonlinear balanced Boolean functions, binary Orthogonal Arrays (OA) and bent functions. In particular, we consider three different balanced crossover operators (each with two variants: “left-to-right” and “shuffled”), two of which have never been published before, and compare their performance with classic one-point crossover. We are able to confirm that the balanced crossover operators perform better than classic one-point crossover. Furthermore, in two out of three crossovers, the “left-to-right” version performs better than the “shuffled” version.
Tasks Combinatorial Optimization
Published 2019-04-23
URL https://arxiv.org/abs/1904.10494v2
PDF https://arxiv.org/pdf/1904.10494v2.pdf
PWC https://paperswithcode.com/paper/balanced-crossover-operators-in-genetic
Repo https://github.com/rymoah/BalancedCrossoverGA
Framework none
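
For intuition, here is a hedged sketch of how a “left-to-right” balanced crossover can preserve the parents' number of ones: bits are copied from a randomly chosen parent, and the choice is overridden whenever the running count would otherwise make the target balancedness unreachable. This illustrates the general counter-based idea; it is not necessarily one of the three operators evaluated in the paper (see the linked repo for those).

```python
import random

def balanced_crossover(p1, p2, rng=random):
    """Sketch of a left-to-right balanced crossover: copy each bit from a random
    parent, but force the bit whenever the quota of ones is already met or can
    only just be reached with the remaining positions."""
    assert len(p1) == len(p2) and sum(p1) == sum(p2), "parents must be balanced alike"
    n, target_ones = len(p1), sum(p1)
    child, ones = [], 0
    for i in range(n):
        remaining = n - i
        if ones == target_ones:                  # quota of ones reached: only zeros left
            bit = 0
        elif target_ones - ones == remaining:    # must emit ones to stay balanced
            bit = 1
        else:
            bit = rng.choice((p1[i], p2[i]))
        child.append(bit)
        ones += bit
    return child

p1 = [1, 0, 1, 0, 1, 0, 1, 0]
p2 = [0, 1, 1, 0, 0, 1, 0, 1]
child = balanced_crossover(p1, p2)
print(child, "ones:", sum(child))   # the child always has the same number of ones
```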

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Title End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Authors Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic, Andrew Zisserman
Abstract Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video models still rely on manually annotated data. With the recent introduction of the HowTo100M dataset, narrated videos now offer the possibility of learning video representations without manual supervision. In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent to narrated videos. With this approach we are able to learn strong video representations from scratch, without the need for any manual annotation. We evaluate our representations on a wide range of four downstream tasks over eight datasets: action recognition (HMDB-51, UCF-101, Kinetics-700), text-to-video retrieval (YouCook2, MSR-VTT), action localization (YouTube-8M Segments, CrossTask) and action segmentation (COIN). Our method outperforms all published self-supervised approaches for these tasks as well as several fully supervised baselines.
Tasks Action Localization, action segmentation, Video Retrieval
Published 2019-12-13
URL https://arxiv.org/abs/1912.06430v2
PDF https://arxiv.org/pdf/1912.06430v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-visual-representations
Repo https://github.com/antoine77340/S3D_HowTo100M
Framework pytorch
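
The key ingredient is a multiple-instance variant of the NCE loss in which a clip may match any of several candidate narrations (e.g. from neighbouring timestamps) rather than one exactly aligned caption. Below is a hedged sketch of that loss, assuming precomputed embeddings and dot-product similarity; the official S3D/HowTo100M training code is in the linked repo.

```python
import torch

def mil_nce_loss(video_emb, pos_text_emb, neg_text_emb):
    """Sketch of MIL-NCE.
    video_emb:    (B, D)    one embedding per clip
    pos_text_emb: (B, P, D) P candidate narrations per clip (the positive bag)
    neg_text_emb: (B, N, D) N negative narrations per clip
    The numerator sums similarities over the whole positive bag instead of a
    single positive pair, which tolerates misaligned narrations."""
    pos_sim = torch.einsum("bd,bpd->bp", video_emb, pos_text_emb)
    neg_sim = torch.einsum("bd,bnd->bn", video_emb, neg_text_emb)
    numerator = torch.logsumexp(pos_sim, dim=1)
    denominator = torch.logsumexp(torch.cat([pos_sim, neg_sim], dim=1), dim=1)
    return (denominator - numerator).mean()

# Toy usage with random embeddings.
v = torch.randn(4, 16)
pos = torch.randn(4, 3, 16)    # e.g. narrations from neighbouring timestamps
neg = torch.randn(4, 20, 16)
print(mil_nce_loss(v, pos, neg))
```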

Nonlinear Markov Random Fields Learned via Backpropagation

Title Nonlinear Markov Random Fields Learned via Backpropagation
Authors Mikael Brudfors, Yaël Balbastre, John Ashburner
Abstract Although convolutional neural networks (CNNs) currently dominate competitions on image segmentation, for neuroimaging analysis tasks, more classical generative approaches based on mixture models are still used in practice to parcellate brains. To bridge the gap between the two, in this paper we propose a marriage between a probabilistic generative model, which has been shown to be robust to variability among magnetic resonance (MR) images acquired via different imaging protocols, and a CNN. The link is in the prior distribution over the unknown tissue classes, which are classically modelled using a Markov random field. In this work we model the interactions among neighbouring pixels by a type of recurrent CNN, which can encode more complex spatial interactions. We validate our proposed model on publicly available MR data, from different centres, and show that it generalises across imaging protocols. This result demonstrates a successful and principled inclusion of a CNN in a generative model, which in turn could be adapted by any probabilistic generative approach for image segmentation.
Tasks Semantic Segmentation
Published 2019-02-27
URL http://arxiv.org/abs/1902.10747v2
PDF http://arxiv.org/pdf/1902.10747v2.pdf
PWC https://paperswithcode.com/paper/nonlinear-markov-random-fields-learned-via
Repo https://github.com/WCHN/Label-Training
Framework none

A Delay Metric for Video Object Detection: What Average Precision Fails to Tell

Title A Delay Metric for Video Object Detection: What Average Precision Fails to Tell
Authors Huizi Mao, Xiaodong Yang, William J. Dally
Abstract Average precision (AP) is a widely used metric to evaluate detection accuracy of image and video object detectors. In this paper, we analyze object detection from videos and point out that AP alone is not sufficient to capture the temporal nature of video object detection. To tackle this problem, we propose a comprehensive metric, average delay (AD), to measure and compare detection delay. To facilitate delay evaluation, we carefully select a subset of ImageNet VID, which we name ImageNet VIDT, with an emphasis on complex trajectories. By extensively evaluating a wide range of detectors on VIDT, we show that most methods drastically increase the detection delay but still preserve AP well. In other words, AP is not sensitive enough to reflect the temporal characteristics of a video object detector. Our results suggest that video object detection methods should be additionally evaluated with a delay metric, particularly for latency-critical applications such as autonomous vehicle perception.
Tasks Object Detection, Video Object Detection
Published 2019-08-18
URL https://arxiv.org/abs/1908.06368v2
PDF https://arxiv.org/pdf/1908.06368v2.pdf
PWC https://paperswithcode.com/paper/a-delay-metric-for-video-object-detection
Repo https://github.com/RalphMao/VMetrics
Framework none
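
As a rough illustration of what a delay metric measures, the sketch below computes, for each object instance, the number of frames between its first appearance and its first correct detection, then averages over instances. The paper's exact AD definition (confidence thresholds, handling of missed objects, averaging scheme) may differ; the reference implementation is in the linked repo.

```python
def average_delay(first_appearance, first_detection, max_delay=30):
    """Simplified delay metric.
    first_appearance: {instance_id: frame index where the object first appears}
    first_detection:  {instance_id: frame index of the first correct detection,
                       or None if the object is never detected}
    Returns the mean per-instance delay in frames, capped for missed objects."""
    delays = []
    for obj_id, t0 in first_appearance.items():
        t1 = first_detection.get(obj_id)
        if t1 is None:
            delays.append(max_delay)              # penalise objects never detected
        else:
            delays.append(min(t1 - t0, max_delay))
    return sum(delays) / len(delays)

# Toy example: one object detected 3 frames late, one never detected.
print(average_delay({"car_1": 10, "ped_2": 40}, {"car_1": 13, "ped_2": None}))
```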

Free-Lunch Saliency via Attention in Atari Agents

Title Free-Lunch Saliency via Attention in Atari Agents
Authors Dmitry Nikulin, Anastasia Ianina, Vladimir Aliev, Sergey Nikolenko
Abstract We propose a new approach to visualize saliency maps for deep neural network models and apply it to deep reinforcement learning agents trained on Atari environments. Our method adds an attention module that we call FLS (Free Lunch Saliency) to the feature extractor from an established baseline (Mnih et al., 2015). This addition results in a trainable model that can produce saliency maps, i.e., visualizations of the importance of different parts of the input for the agent’s current decision making. We show experimentally that a network with an FLS module exhibits performance similar to the baseline (i.e., it is “free”, with no performance cost) and can be used as a drop-in replacement for reinforcement learning agents. We also design another feature extractor that scores slightly lower but provides higher-fidelity visualizations. In addition to attained scores, we report saliency metrics evaluated on the Atari-HEAD dataset of human gameplay.
Tasks Decision Making
Published 2019-08-07
URL https://arxiv.org/abs/1908.02511v2
PDF https://arxiv.org/pdf/1908.02511v2.pdf
PWC https://paperswithcode.com/paper/free-lunch-saliency-via-attention-in-atari
Repo https://github.com/dniku/free-lunch-saliency
Framework tf
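
A hedged sketch of what a built-in saliency/attention layer of this kind can look like: a 1x1 convolution scores each spatial location of the extractor's output, a softmax turns the scores into a saliency map, and the features are re-weighted by that map before the policy/value head. The actual FLS design differs in detail; see the linked repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Sketch of a 'free-lunch'-style attention layer: the softmaxed scores are
    both the re-weighting applied to the features and the saliency visualization."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                      # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        logits = self.score(feats).view(b, -1)     # (B, H*W)
        saliency = F.softmax(logits, dim=1).view(b, 1, h, w)
        attended = feats * saliency                # re-weighted features for the head
        return attended, saliency                  # saliency is the visualization

# Toy usage on DQN-sized Atari features (e.g. 64 channels on a 7x7 grid).
feats = torch.randn(2, 64, 7, 7)
attended, saliency = SpatialAttention(64)(feats)
print(attended.shape, saliency.shape)
```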

Efficient computation of counterfactual explanations of LVQ models

Title Efficient computation of counterfactual explanations of LVQ models
Authors André Artelt, Barbara Hammer
Abstract The increasing use of machine learning in practice and legal regulations like EU’s GDPR make it necessary to be able to explain the prediction and behavior of machine learning models. A prominent example of particularly intuitive explanations of AI models in the context of decision making is counterfactual explanations. Yet, it is still an open research problem how to efficiently compute counterfactual explanations for many models. We investigate how to efficiently compute counterfactual explanations for an important class of models, prototype-based classifiers such as learning vector quantization models. In particular, we derive specific convex and non-convex programs depending on the used metric.
Tasks Decision Making, Quantization
Published 2019-08-02
URL https://arxiv.org/abs/1908.00735v2
PDF https://arxiv.org/pdf/1908.00735v2.pdf
PWC https://paperswithcode.com/paper/efficient-computation-of-counterfactual
Repo https://github.com/andreArtelt/efficient_computation_counterfactuals_lvq
Framework none
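
For Euclidean-metric LVQ, the search for the closest counterfactual that gets assigned a chosen prototype of another class can be written as a convex quadratic program: each nearest-prototype constraint becomes linear once the squared norms are expanded. Below is a hedged sketch with cvxpy; the solver choice and structure are illustrative assumptions, not the authors' code (which is in the linked repo).

```python
import numpy as np
import cvxpy as cp

def lvq_counterfactual(x, prototypes, labels, target_label):
    """Find the closest x' (in L2) that a nearest-prototype (Euclidean LVQ)
    classifier assigns to target_label. For each target prototype p_j we solve
    one QP with linear constraints ||x'-p_j||^2 <= ||x'-p_i||^2 for all other
    prototypes p_i, then keep the best solution over j. Illustrative sketch."""
    best, best_cost = None, np.inf
    for j, p_j in enumerate(prototypes):
        if labels[j] != target_label:
            continue
        xp = cp.Variable(len(x))
        constraints = []
        for i, p_i in enumerate(prototypes):
            if i == j:
                continue
            # ||x'-p_j||^2 - ||x'-p_i||^2 <= 0 expands to a linear constraint:
            constraints.append(2 * (p_i - p_j) @ xp + p_j @ p_j - p_i @ p_i <= 0)
        prob = cp.Problem(cp.Minimize(cp.sum_squares(xp - x)), constraints)
        prob.solve()
        if prob.status == cp.OPTIMAL and prob.value < best_cost:
            best, best_cost = xp.value, prob.value
    return best

# Toy example: two prototypes per class in 2-D.
protos = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(lvq_counterfactual(np.array([0.5, 0.0]), protos, labels, target_label=1))
```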

Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims

Title Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims
Authors Sihao Chen, Daniel Khashabi, Wenpeng Yin, Chris Callison-Burch, Dan Roth
Abstract One key consequence of the information revolution is a significant increase and a contamination of our information supply. The practice of fact checking won’t suffice to eliminate the biases in text data we observe, as the degree of factuality alone does not determine whether biases exist in the spectrum of opinions visible to us. To better understand controversial issues, one needs to view them from a diverse yet comprehensive set of perspectives. For example, there are many ways to respond to a claim such as “animals should have lawful rights”, and these responses form a spectrum of perspectives, each with a stance relative to this claim and, ideally, with evidence supporting it. Inherently, this is a natural language understanding task, and we propose to address it as such. Specifically, we propose the task of substantiated perspective discovery where, given a claim, a system is expected to discover a diverse set of well-corroborated perspectives that take a stance with respect to the claim. Each perspective should be substantiated by evidence paragraphs which summarize pertinent results and facts. We construct PERSPECTRUM, a dataset of claims, perspectives and evidence, making use of online debate websites to create the initial data collection, and augmenting it using search engines in order to expand and diversify our dataset. We use crowd-sourcing to filter out noise and ensure high-quality data. Our dataset contains 1k claims, accompanied with pools of 10k and 8k perspective sentences and evidence paragraphs, respectively. We provide a thorough analysis of the dataset to highlight key underlying language understanding challenges, and show that human baselines across multiple subtasks far outperform ma-chine baselines built upon state-of-the-art NLP techniques. This poses a challenge and opportunity for the NLP community to address.
Tasks
Published 2019-06-08
URL https://arxiv.org/abs/1906.03538v1
PDF https://arxiv.org/pdf/1906.03538v1.pdf
PWC https://paperswithcode.com/paper/seeing-things-from-a-different-angle
Repo https://github.com/CogComp/perspectrum
Framework none

Visualization of Emergency Department Clinical Data for Interpretable Patient Phenotyping

Title Visualization of Emergency Department Clinical Data for Interpretable Patient Phenotyping
Authors Nathan C. Hurley, Adrian D. Haimovich, R. Andrew Taylor, Bobak J. Mortazavi
Abstract Visual summarization of clinical data collected on patients contained within the electronic health record (EHR) may enable precise and rapid triage at the time of patient presentation to an emergency department (ED). The triage process is critical in the appropriate allocation of resources and in anticipating eventual patient disposition, typically admission to the hospital or discharge home. EHR data are high-dimensional and complex, but offer the opportunity to discover and characterize underlying data-driven patient phenotypes. These phenotypes will enable improved, personalized therapeutic decision making and prognostication. In this work, we focus on the challenge of two-dimensional patient projections. A low dimensional embedding offers visual interpretability lost in higher dimensions. While linear dimensionality reduction techniques such as principal component analysis are often used towards this aim, they are insufficient to describe the variance of patient data. In this work, we employ the newly-described non-linear embedding technique called uniform manifold approximation and projection (UMAP). UMAP seeks to capture both local and global structures in high-dimensional data. We then use Gaussian mixture models to identify clusters in the embedded data and use the adjusted Rand index (ARI) to establish stability in the discovery of these clusters. This technique is applied to five common clinical chief complaints from a real-world ED EHR dataset, describing the emergent properties of discovered clusters. We observe clinically-relevant cluster attributes, suggesting that visual embeddings of EHR data using non-linear dimensionality reduction are a promising approach to reveal data-driven patient phenotypes. In the five chief complaints, we find between 2 and 6 clusters, with the peak mean pairwise ARI between subsequent training iterations ranging from 0.35 to 0.74.
Tasks Decision Making, Dimensionality Reduction
Published 2019-07-05
URL https://arxiv.org/abs/1907.11039v1
PDF https://arxiv.org/pdf/1907.11039v1.pdf
PWC https://paperswithcode.com/paper/visualization-of-emergency-department
Repo https://github.com/nch08a/EDVizPhenotyping
Framework none
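
A hedged sketch of the described pipeline on synthetic data, assuming the umap-learn and scikit-learn packages: embed with UMAP, cluster the embedding with a Gaussian mixture, and check cluster stability across repeated fits with the adjusted Rand index. The authors' code is in the linked repo.

```python
import numpy as np
import umap
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))          # stand-in for high-dimensional EHR features

# 1. Non-linear 2-D embedding for visual interpretability.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

# 2. Cluster the embedded patients with a Gaussian mixture model.
def cluster(emb, n_clusters, seed):
    return GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(emb)

# 3. Stability check: adjusted Rand index between runs with different seeds.
labels_a = cluster(embedding, n_clusters=4, seed=1)
labels_b = cluster(embedding, n_clusters=4, seed=2)
print("ARI between runs:", adjusted_rand_score(labels_a, labels_b))
```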

Cooperation-Aware Reinforcement Learning for Merging in Dense Traffic

Title Cooperation-Aware Reinforcement Learning for Merging in Dense Traffic
Authors Maxime Bouton, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer
Abstract Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system only relying on predefined road priorities and considering other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhanced the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with fewer deadlocks than with online planning methods.
Tasks Autonomous Vehicles, Decision Making
Published 2019-06-26
URL https://arxiv.org/abs/1906.11021v1
PDF https://arxiv.org/pdf/1906.11021v1.pdf
PWC https://paperswithcode.com/paper/cooperation-aware-reinforcement-learning-for
Repo https://github.com/sisl/AutonomousMerging.jl
Framework none

Bi-Directional Cascade Network for Perceptual Edge Detection

Title Bi-Directional Cascade Network for Perceptual Edge Detection
Authors Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang
Abstract Exploiting multi-scale representations is critical to improve edge detection for objects at different scales. To extract edges at dramatically different scales, we propose a Bi-Directional Cascade Network (BDCN) structure, where an individual layer is supervised by labeled edges at its specific scale, rather than directly applying the same supervision to all CNN outputs. Furthermore, to enrich multi-scale representations learned by BDCN, we introduce a Scale Enhancement Module (SEM) which utilizes dilated convolution to generate multi-scale features, instead of using deeper CNNs or explicitly fusing multi-scale edge maps. These new approaches encourage the learning of multi-scale representations in different layers and detect edges that are well delineated by their scales. Learning scale-dedicated layers also results in a compact network with a fraction of parameters. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and Multicue, and achieve an ODS F-measure of 0.828, 1.3% higher than the current state-of-the-art on BSDS500. The code is available at https://github.com/pkuCactus/BDCN.
Tasks Edge Detection
Published 2019-02-28
URL http://arxiv.org/abs/1902.10903v1
PDF http://arxiv.org/pdf/1902.10903v1.pdf
PWC https://paperswithcode.com/paper/bi-directional-cascade-network-for-perceptual
Repo https://github.com/pkuCactus/BDCN
Framework pytorch
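
A hedged PyTorch sketch of the Scale Enhancement Module idea: parallel 3x3 convolutions with increasing dilation rates gather context at several scales without extra depth, and their responses are summed into one enriched feature map. The channel counts and rates here are assumptions; the official BDCN code is at the linked repo.

```python
import torch
import torch.nn as nn

class ScaleEnhancementModule(nn.Module):
    """Sketch of an SEM: a 1x1 reduction followed by parallel dilated 3x3
    convolutions whose outputs are summed into a multi-scale-enhanced feature."""
    def __init__(self, in_channels, mid_channels=32, rates=(1, 2, 4, 8)):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        self.branches = nn.ModuleList([
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        x = torch.relu(self.reduce(x))
        return sum(torch.relu(branch(x)) for branch in self.branches)

# Toy usage on a VGG-like feature map.
feats = torch.randn(1, 256, 64, 64)
print(ScaleEnhancementModule(256)(feats).shape)   # -> (1, 32, 64, 64)
```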

CA-EHN

Title CA-EHN
Authors Peng-Hsuan Li, Tsan-Yu Yang, Wei-Yun Ma
Abstract The title and abstract are abridged to prevent a direct search breaking blind review.
Tasks
Published 2019-08-20
URL https://arxiv.org/abs/1908.07218v3
PDF https://arxiv.org/pdf/1908.07218v3.pdf
PWC https://paperswithcode.com/paper/ca-ehn-commonsense-word-analogy-from-e-hownet
Repo https://github.com/ckiplab/CA-EHN
Framework none

catch22: CAnonical Time-series CHaracteristics

Title catch22: CAnonical Time-series CHaracteristics
Authors Carl H Lubba, Sarab S Sethi, Philip Knaute, Simon R Schultz, Ben D Fulcher, Nick S Jones
Abstract Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
Tasks Dimensionality Reduction, Time Series, Time Series Analysis, Time Series Classification
Published 2019-01-29
URL http://arxiv.org/abs/1901.10200v2
PDF http://arxiv.org/pdf/1901.10200v2.pdf
PWC https://paperswithcode.com/paper/catch22-canonical-time-series-characteristics
Repo https://github.com/benfulcher/hctsa
Framework none
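
A hedged sketch of the selection principle described above: score every candidate feature by its single-feature classification accuracy across the datasets, then reduce redundancy by clustering features on the correlation of those performance profiles and keeping the best performer per cluster. This only illustrates the idea; the actual hctsa/catch22 pipeline is in the linked repo.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def select_features(perf, n_select):
    """perf: (n_features, n_datasets) single-feature classification accuracies.
    Returns indices of a small, minimally redundant subset: features are
    clustered by the correlation of their performance profiles and the best
    performer of each cluster is kept. Illustrative sketch only."""
    corr = np.corrcoef(perf)                        # similarity of behaviour across datasets
    dist = squareform(1.0 - corr, checks=False)     # condensed distance matrix
    clusters = fcluster(linkage(dist, method="average"),
                        t=n_select, criterion="maxclust")
    selected = []
    for c in np.unique(clusters):
        members = np.where(clusters == c)[0]
        selected.append(members[np.argmax(perf[members].mean(axis=1))])
    return sorted(selected)

# Toy example: 200 candidate features evaluated on 93 datasets, keep 22.
perf = np.random.default_rng(0).uniform(0.5, 1.0, size=(200, 93))
print(select_features(perf, n_select=22))
```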

Dynamic Feature Fusion for Semantic Edge Detection

Title Dynamic Feature Fusion for Semantic Edge Detection
Authors Yuan Hu, Yunpeng Chen, Xiang Li, Jiashi Feng
Abstract Features from multiple scales can greatly benefit the semantic edge detection task if they are well fused. However, the prevalent semantic edge detection methods apply a fixed weight fusion strategy where images with different semantics are forced to share the same weights, resulting in universal fusion weights for all images and locations regardless of their different semantics or local context. In this work, we propose a novel dynamic feature fusion strategy that assigns different fusion weights for different input images and locations adaptively. This is achieved by a proposed weight learner to infer proper fusion weights over multi-level features for each location of the feature map, conditioned on the specific input. In this way, the heterogeneity in contributions made by different locations of feature maps and input images can be better considered and thus help produce more accurate and sharper edge predictions. We show that our model with the novel dynamic feature fusion is superior to fixed weight fusion and also the naïve location-invariant weight fusion methods, via comprehensive experiments on the benchmarks Cityscapes and SBD. In particular, our method outperforms all existing well-established methods and achieves new state-of-the-art.
Tasks Edge Detection
Published 2019-02-25
URL http://arxiv.org/abs/1902.09104v1
PDF http://arxiv.org/pdf/1902.09104v1.pdf
PWC https://paperswithcode.com/paper/dynamic-feature-fusion-for-semantic-edge
Repo https://github.com/Lavender105/DFF
Framework pytorch
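
A hedged sketch of a location-adaptive weight learner: a small convolutional head predicts, at every pixel, softmax-normalised fusion weights over the K side outputs, replacing a single fixed weight per level. Sizes and names are assumptions (and per-class edge channels are collapsed to one for brevity); the official DFF code is at the linked repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFusion(nn.Module):
    """Sketch of dynamic fusion: a weight learner predicts K fusion weights at
    every location, conditioned on a feature map of the specific input."""
    def __init__(self, feat_channels, num_side_outputs):
        super().__init__()
        self.weight_learner = nn.Sequential(
            nn.Conv2d(feat_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_side_outputs, kernel_size=1),
        )

    def forward(self, feats, side_outputs):
        # feats: (B, C, H, W); side_outputs: list of K maps, each (B, 1, H, W)
        weights = F.softmax(self.weight_learner(feats), dim=1)   # (B, K, H, W)
        stacked = torch.cat(side_outputs, dim=1)                 # (B, K, H, W)
        return (weights * stacked).sum(dim=1, keepdim=True)      # fused (B, 1, H, W)

# Toy usage: fuse 4 side outputs of an edge network.
feats = torch.randn(2, 128, 80, 80)
sides = [torch.randn(2, 1, 80, 80) for _ in range(4)]
print(DynamicFusion(128, 4)(feats, sides).shape)
```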

Learning Video Representations from Correspondence Proposals

Title Learning Video Representations from Correspondence Proposals
Authors Xingyu Liu, Joon-Young Lee, Hailin Jin
Abstract Correspondences between frames encode rich information about dynamic content in videos. However, it is challenging to effectively capture and learn those due to their irregular structure and complex dynamics. In this paper, we propose a novel neural network that learns video representations by aggregating information from potential correspondences. This network, named $CPNet$, can learn evolving 2D fields with temporal consistency. In particular, it can effectively learn representations for videos by mixing appearance and long-range motion with an RGB-only input. We provide extensive ablation experiments to validate our model. CPNet shows stronger performance than existing methods on Kinetics and achieves the state-of-the-art performance on Something-Something and Jester. We provide analysis towards the behavior of our model and show its robustness to errors in proposals.
Tasks Action Recognition In Videos
Published 2019-05-20
URL https://arxiv.org/abs/1905.07853v1
PDF https://arxiv.org/pdf/1905.07853v1.pdf
PWC https://paperswithcode.com/paper/learning-video-representations-from
Repo https://github.com/xingyul/cpnet
Framework tf
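
A hedged sketch of the correspondence-proposal idea: every feature vector proposes its k most similar vectors from the other frames, each (feature, neighbour, temporal offset) triple is processed by a shared MLP, and the results are max-pooled into an updated feature. Shapes and the tiny MLP are assumptions for illustration; the released TensorFlow code is in the linked repo.

```python
import torch
import torch.nn as nn

class CorrespondenceAggregation(nn.Module):
    """Sketch: top-k cross-frame feature matches are aggregated by a shared MLP
    and max-pooling, mixing appearance with long-range motion cues."""
    def __init__(self, channels, k=4):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels + 1, channels), nn.ReLU(),
            nn.Linear(channels, channels),
        )

    def forward(self, feats, times):
        # feats: (N, C) flattened features from all frames; times: (N,) frame index
        sim = feats @ feats.t()                                             # (N, N)
        sim.masked_fill_(times[:, None] == times[None, :], float("-inf"))   # other frames only
        idx = sim.topk(self.k, dim=1).indices                               # (N, k) proposals
        neigh = feats[idx]                                                  # (N, k, C)
        dt = (times[idx] - times[:, None]).float().unsqueeze(-1)            # (N, k, 1)
        pairs = torch.cat([feats.unsqueeze(1).expand_as(neigh), neigh, dt], dim=-1)
        return feats + self.mlp(pairs).max(dim=1).values                    # residual update

# Toy usage: 2 frames of a 4x4 feature grid with 32 channels.
feats = torch.randn(2 * 16, 32)
times = torch.repeat_interleave(torch.arange(2), 16)
print(CorrespondenceAggregation(32)(feats, times).shape)
```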