January 31, 2020

2839 words 14 mins read

Paper Group AWR 432



The emergence of number and syntax units in LSTM language models

Title The emergence of number and syntax units in LSTM language models
Authors Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni
Abstract Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two ‘number units’. Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs.
Tasks Language Modelling
Published 2019-03-18
URL http://arxiv.org/abs/1903.07435v2
PDF http://arxiv.org/pdf/1903.07435v2.pdf
PWC https://paperswithcode.com/paper/the-emergence-of-number-and-syntax-units-in
Repo https://github.com/FAIRNS/Number_and_syntax_units_in_LSTM_LMs
Framework pytorch
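
The paper’s probing relies on inspecting and perturbing individual LSTM memory cells. Below is a minimal, illustrative sketch of that style of single-unit ablation in PyTorch: the toy vocabulary, the untrained model, and the candidate unit index are assumptions for demonstration, not the authors’ trained LM or their identified number units.

```python
# Single-unit ablation sketch: zero one cell-state unit while unrolling the LM
# and compare the logits of the singular vs. plural verb continuation.
import torch
import torch.nn as nn

vocab = {"the": 0, "boy": 1, "boys": 2, "near": 3, "car": 4, "is": 5, "are": 6}
emb = nn.Embedding(len(vocab), 32)
lstm = nn.LSTM(32, 64, batch_first=True)
decoder = nn.Linear(64, len(vocab))

def verb_logits(prefix_ids, ablate_unit=None):
    """Run the LM over a prefix; optionally zero one cell-state unit each step."""
    h = torch.zeros(1, 1, 64)
    c = torch.zeros(1, 1, 64)
    for tok in prefix_ids:
        x = emb(torch.tensor([[tok]]))
        _, (h, c) = lstm(x, (h, c))
        if ablate_unit is not None:
            c[0, 0, ablate_unit] = 0.0   # knock out a single memory cell
    return decoder(h[0, 0])

# "The boy near the car ___": compare p(is) vs p(are) with and without ablation.
prefix = [vocab["the"], vocab["boy"], vocab["near"], vocab["the"], vocab["car"]]
for unit in (None, 17):                   # 17 is an arbitrary candidate unit
    logits = verb_logits(prefix, ablate_unit=unit)
    print(unit, (logits[vocab["is"]] - logits[vocab["are"]]).item())
```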

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization

Title SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Authors Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka
Abstract We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.
Tasks Image Generation
Published 2019-11-28
URL https://arxiv.org/abs/1911.12861v1
PDF https://arxiv.org/pdf/1911.12861v1.pdf
PWC https://paperswithcode.com/paper/sean-image-synthesis-with-semantic-region
Repo https://github.com/ZPdesu/SEAN
Framework pytorch
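
A rough sketch of the region-adaptive normalization idea: instance-normalize the features, then modulate them with a per-region scale and shift produced from one style code per semantic region and scattered onto the spatial grid via the segmentation mask. Layer sizes and the broadcasting scheme are illustrative assumptions, not the paper’s exact SEAN block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, channels, num_regions, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, mask, styles):
        # x: (B, C, H, W) features, mask: (B, R, H, W) one-hot regions,
        # styles: (B, R, style_dim) one style code per region.
        x = self.norm(x)
        gamma = self.to_gamma(styles)                              # (B, R, C)
        beta = self.to_beta(styles)                                # (B, R, C)
        # Scatter the per-region modulation onto the spatial grid via the mask.
        gamma_map = torch.einsum("brhw,brc->bchw", mask, gamma)
        beta_map = torch.einsum("brhw,brc->bchw", mask, beta)
        return x * (1 + gamma_map) + beta_map

x = torch.randn(2, 64, 32, 32)
mask = F.one_hot(torch.randint(0, 5, (2, 32, 32)), 5).permute(0, 3, 1, 2).float()
styles = torch.randn(2, 5, 128)
out = RegionAdaptiveNorm(64, 5, 128)(x, mask, styles)
```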

What does a Car-ssette tape tell?

Title What does a Car-ssette tape tell?
Authors Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Abstract Captioning has attracted much attention in image and video understanding, while little work examines audio captioning. This paper contributes a manually annotated dataset on car scenes, extending a previously published hospital audio captioning dataset. An encoder-decoder model with pretrained word embeddings and an additional sentence loss is proposed. The model accelerates the training process and generates semantically correct but unseen unique sentences. We test the model on the current car dataset, the previous Hospital Dataset, and the Joint Dataset, indicating its generalization capability across different scenes. Further, we make an effort to provide a better objective evaluation metric, namely the BERT similarity score. It compares semantic-level similarity and compensates for drawbacks of N-gram based metrics like BLEU, namely high scores for word-similar sentences. This new metric demonstrates higher correlation with human evaluation. However, though detailed audio captions can now be automatically generated, human annotations still outperform model captions in many aspects.
Tasks Video Understanding, Word Embeddings
Published 2019-05-31
URL https://arxiv.org/abs/1905.13448v1
PDF https://arxiv.org/pdf/1905.13448v1.pdf
PWC https://paperswithcode.com/paper/what-does-a-car-ssette-tape-tell
Repo https://github.com/richermans/AudioCaption
Framework pytorch
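
A hedged sketch of a BERT-based caption similarity score: mean-pool contextual embeddings of the candidate and reference captions and take their cosine similarity. The checkpoint and the pooling choice are assumptions; the paper’s exact metric may differ.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, T, 768)
    return hidden.mean(dim=1).squeeze(0)                # mean pooling over tokens

cand = embed("a car engine starts and then idles")
ref = embed("an engine is started inside a vehicle")
score = torch.cosine_similarity(cand, ref, dim=0).item()
print(f"BERT similarity: {score:.3f}")
```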

6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

Title 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Authors Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu
Abstract We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data. Our method tracks in real-time novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching. These keypoints are learned end-to-end without manual supervision in order to be most effective for tracking. Our experiments show that our method substantially outperforms existing methods on the NOCS category-level 6D pose estimation benchmark and supports a physical robot to perform simple vision-based closed-loop manipulation tasks. Our code and video are available at https://sites.google.com/view/6packtracking.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGBD, Pose Estimation, Pose Tracking
Published 2019-10-23
URL https://arxiv.org/abs/1910.10750v1
PDF https://arxiv.org/pdf/1910.10750v1.pdf
PWC https://paperswithcode.com/paper/6-pack-category-level-6d-pose-tracker-with
Repo https://github.com/j96w/6-PACK
Framework pytorch
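
The interframe motion step reduces to estimating a rigid transform between matched 3D keypoints. The sketch below is a plain Kabsch/Umeyama least-squares fit; the keypoints are random placeholders rather than the learned, anchor-based keypoints from the paper.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid fit: find R, t with src @ R.T + t ~= dst (both (K, 3))."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

kp_prev = np.random.randn(8, 3)
true_R, true_t = np.eye(3), np.array([0.05, 0.0, -0.02])
kp_curr = kp_prev @ true_R.T + true_t
R, t = rigid_transform(kp_prev, kp_curr)
print(np.allclose(t, true_t, atol=1e-6))
```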

Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

Title Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
Authors Zekun Xu, Deovrat Kakde, Arin Chaudhuri
Abstract In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets.
Tasks Anomaly Detection
Published 2019-02-01
URL http://arxiv.org/abs/1902.00567v1
PDF http://arxiv.org/pdf/1902.00567v1.pdf
PWC https://paperswithcode.com/paper/automatic-hyperparameter-tuning-method-for
Repo https://github.com/vsatyakumar/automatic-local-outlier-factor-tuning
Framework none
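
Not the paper’s heuristic, but a minimal grid-search sketch over the LOF neighborhood size with scikit-learn, scoring each setting by how cleanly the outlier factors separate a small contaminated sample. The candidate grid and the separation criterion are assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)),       # normal points
               rng.normal(6, 0.5, (10, 2))])     # injected anomalies

best = None
for k in (5, 10, 20, 35, 50):
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit_predict(X)
    scores = -lof.negative_outlier_factor_        # larger = more anomalous
    # Simple separation criterion: gap between the top-10 scores and the rest.
    top, rest = np.sort(scores)[-10:], np.sort(scores)[:-10]
    gap = (top.mean() - rest.mean()) / (rest.std() + 1e-9)
    if best is None or gap > best[1]:
        best = (k, gap)
print("chosen n_neighbors:", best[0])
```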

Spotting Macro- and Micro-expression Intervals in Long Video Sequences

Title Spotting Macro- and Micro-expression Intervals in Long Video Sequences
Authors Ying He, Su-Jing Wang, Jingting Li, Moi Hoon Yap
Abstract This paper presents baseline results for the Third Facial Micro-Expression Grand Challenge (MEGC 2020). Both macro- and micro-expression intervals in CAS(ME)$^2$ and SAMM Long Videos are spotted by employing the method of Main Directional Maximal Difference Analysis (MDMD). The MDMD method uses the magnitude maximal difference in the main direction of optical flow features to spot facial movements. The single-frame prediction results of the original MDMD method are post-processed into reasonable video intervals. The metric F1-scores of baseline results are evaluated: for CAS(ME)$^2$, the F1-scores are 0.1196 and 0.0082 for macro- and micro-expressions respectively, and the overall F1-score is 0.0376; for SAMM Long Videos, the F1-scores are 0.0629 and 0.0364 for macro- and micro-expressions respectively, and the overall F1-score is 0.0445. The baseline project codes are publicly available at https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting.
Tasks Optical Flow Estimation
Published 2019-12-18
URL https://arxiv.org/abs/1912.11985v3
PDF https://arxiv.org/pdf/1912.11985v3.pdf
PWC https://paperswithcode.com/paper/spotting-macro-and-micro-expression-intervals
Repo https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting
Framework none
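
A loose sketch of the optical-flow idea behind MDMD spotting: compute Farneback flow between frames separated by a fixed interval, find the dominant flow direction, and flag frames whose magnitude along that direction exceeds a threshold. The interval length, direction binning, and threshold are assumptions, not the MEGC baseline’s parameters.

```python
import cv2
import numpy as np

def spot_frames(gray_frames, interval=10, thresh=1.5):
    flagged = []
    for i in range(len(gray_frames) - interval):
        flow = cv2.calcOpticalFlowFarneback(
            gray_frames[i], gray_frames[i + interval], None,
            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        bins = (ang / (np.pi / 4)).astype(int) % 8        # 8 direction bins
        main_dir = np.bincount(bins.ravel()).argmax()     # dominant direction
        if mag[bins == main_dir].mean() > thresh:
            flagged.append(i)
    return flagged

frames = [np.random.randint(0, 255, (128, 128), dtype=np.uint8) for _ in range(40)]
print(spot_frames(frames))
```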

Lightweight Network Architecture for Real-Time Action Recognition

Title Lightweight Network Architecture for Real-Time Action Recognition
Authors Alexander Kozlov, Vadim Andronov, Yana Gritsenko
Abstract In this work we present a new efficient approach to Human Action Recognition called Video Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural Language Processing and applies them to video understanding. The proposed method allows us to create lightweight CNN models that achieve high accuracy and real-time speed using just an RGB mono camera and general purpose CPU. Furthermore, we explain how to improve accuracy by distilling from multiple models with different modalities into a single model. We conduct a comparison with state-of-the-art methods and show that our approach performs on par with most of them on famous Action Recognition datasets. We benchmark the inference time of the models using the modern inference framework and argue that our approach compares favorably with other methods in terms of speed/accuracy trade-off, running at 56 FPS on CPU. The models and the training code are available.
Tasks Temporal Action Localization, Video Understanding
Published 2019-05-21
URL https://arxiv.org/abs/1905.08711v1
PDF https://arxiv.org/pdf/1905.08711v1.pdf
PWC https://paperswithcode.com/paper/lightweight-network-architecture-for-real
Repo https://github.com/Chrisackerman1/Lightweight-Network-Architecture-for-Real-Time-Action-Recognition
Framework none
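
The multi-modality distillation mentioned in the abstract can be sketched as a standard soft-target KL loss from an ensemble of modality-specific teachers to a single RGB student; the class count, temperature, and teacher set below are placeholders.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, T=4.0):
    # Average the teachers' softened distributions and match them with the student.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean") * T * T

student = torch.randn(8, 400, requires_grad=True)        # e.g. 400 action classes
teachers = [torch.randn(8, 400) for _ in range(2)]       # RGB + flow teachers
loss = distill_loss(student, teachers)
loss.backward()
```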

Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space

Title Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space
Authors Quentin Mérigot, Alex Delalande, Frédéric Chazal
Abstract This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous, when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.05954v1
PDF https://arxiv.org/pdf/1910.05954v1.pdf
PWC https://paperswithcode.com/paper/quantitative-stability-of-optimal-transport
Repo https://github.com/AlxDel/stability_ot_maps_and_linearization_wasserstein_space
Framework none
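
A small numerical sketch of the embedding studied here, assuming the POT library: push a fixed reference sample onto each target sample with an exact discrete optimal transport plan, and use the barycentric image of the reference points as a Euclidean embedding of the measure. Sample sizes and distributions are arbitrary.

```python
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
ref = rng.uniform(-1, 1, (200, 2))                 # reference: uniform on a square

def embed(target):
    n, m = len(ref), len(target)
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)
    M = ot.dist(ref, target)                        # squared Euclidean cost
    P = ot.emd(a, b, M)                             # exact OT plan
    return (P @ target) * n                         # barycentric map of each ref point

mu = rng.normal(0.0, 0.3, (150, 2))
nu = rng.normal(0.5, 0.3, (180, 2))
# Euclidean distance between embeddings approximates W2(mu, nu).
print(np.sqrt(((embed(mu) - embed(nu)) ** 2).sum(1).mean()))
```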

Feature Forwarding for Efficient Single Image Dehazing

Title Feature Forwarding for Efficient Single Image Dehazing
Authors Peter Morales, Tzofi Klinghoffer, Seung Jae Lee
Abstract Haze degrades content and obscures information of images, which can negatively impact vision-based decision-making in real-time systems. In this paper, we propose an efficient fully convolutional neural network (CNN) image dehazing method designed to run on edge graphical processing units (GPUs). We utilize three variants of our architecture to explore the dependency of dehazed image quality on parameter count and model design. The first two variants presented, a small and big version, make use of a single efficient encoder-decoder convolutional feature extractor. The final variant utilizes a pair of encoder-decoders for atmospheric light and transmission map estimation. Each variant ends with an image refinement pyramid pooling network to form the final dehazed image. For the big variant of the single-encoder network, we demonstrate state-of-the-art performance on the NYU Depth dataset. For the small variant, we maintain competitive performance on the super-resolution O/I-HAZE datasets without the need for image cropping. Finally, we examine some challenges presented by the Dense-Haze dataset when leveraging CNN architectures for dehazing of dense haze imagery and examine the impact of loss function selection on image quality. Benchmarks are included to show the feasibility of introducing this approach into real-time systems.
Tasks Decision Making, Image Cropping, Image Dehazing, Single Image Dehazing, Super-Resolution
Published 2019-04-19
URL https://arxiv.org/abs/1904.09059v2
PDF https://arxiv.org/pdf/1904.09059v2.pdf
PWC https://paperswithcode.com/paper/feature-forwarding-for-efficient-single-image
Repo https://github.com/pmm09c/ntire-dehazing
Framework pytorch
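
The image refinement pyramid pooling stage can be sketched as a standard pyramid pooling module: pool the feature map at several scales, project, upsample, and fuse. Channel counts and pooling sizes below are assumptions, not the paper’s architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, out_ch, sizes=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in sizes)
        self.sizes = sizes
        self.fuse = nn.Conv2d(in_ch + out_ch * len(sizes), out_ch,
                              kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for size, conv in zip(self.sizes, self.stages):
            pooled = F.adaptive_avg_pool2d(x, size)      # pool at this scale
            feats.append(F.interpolate(conv(pooled), (h, w),
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

out = PyramidPooling(64, 32)(torch.randn(1, 64, 64, 64))
```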

Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning

Title Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning
Authors Dayiheng Liu, Jie Fu, Yidan Zhang, Chris Pal, Jiancheng Lv
Abstract Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods.
Tasks Style Transfer, Text Style Transfer
Published 2019-05-29
URL https://arxiv.org/abs/1905.12304v3
PDF https://arxiv.org/pdf/1905.12304v3.pdf
PWC https://paperswithcode.com/paper/revision-in-continuous-space-fine-grained
Repo https://github.com/dayihengliu/Fine-Grained-Style-Transfer
Framework tf
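
A bare-bones sketch of gradient-based revision in a latent space: freeze an attribute predictor over z and take gradient steps that raise its score while penalizing drift from the original code (the content-preservation term). The toy modules stand in for the paper’s trained VAE and predictors.

```python
import torch
import torch.nn as nn

latent_dim = 64
attribute_clf = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                              nn.Linear(32, 1))           # score of the target attribute

def revise(z0, steps=30, lr=0.1, content_weight=0.5):
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        score = attribute_clf(z).squeeze()                 # push the attribute up
        content_penalty = content_weight * (z - z0).pow(2).sum()
        loss = -score + content_penalty
        loss.backward()
        with torch.no_grad():
            z -= lr * z.grad
            z.grad.zero_()
    return z.detach()                                      # decode with the VAE afterwards

z_revised = revise(torch.randn(latent_dim))
```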

Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks

Title Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks
Authors Vishnu Sarukkai, Anirudh Jain, Burak Uzkent, Stefano Ermon
Abstract Satellite images hold great promise for continuous environmental monitoring and earth observation. Occlusions cast by clouds, however, can severely limit coverage, making ground information extraction more difficult. Existing pipelines typically perform cloud removal with simple temporal composites and hand-crafted filters. In contrast, we cast the problem of cloud removal as a conditional image synthesis challenge, and we propose a trainable spatiotemporal generator network (STGAN) to remove clouds. We train our model on a new large-scale spatiotemporal dataset that we construct, containing 97640 image pairs covering all continents. We demonstrate experimentally that the proposed STGAN model outperforms standard models and can generate realistic cloud-free images with high PSNR and SSIM values across a variety of atmospheric conditions, leading to improved performance in downstream tasks such as land cover classification.
Tasks Image Generation
Published 2019-12-14
URL https://arxiv.org/abs/1912.06838v1
PDF https://arxiv.org/pdf/1912.06838v1.pdf
PWC https://paperswithcode.com/paper/cloud-removal-in-satellite-images-using
Repo https://github.com/VSAnimator/stgan
Framework pytorch
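
The quantitative evaluation above reports PSNR and SSIM; here is a quick sketch of computing both for a predicted cloud-free image against ground truth with scikit-image. The random arrays are placeholders for real imagery, and `channel_axis` assumes a recent scikit-image release.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

truth = np.random.rand(256, 256, 3)
pred = np.clip(truth + np.random.normal(0, 0.05, truth.shape), 0, 1)
print("PSNR:", peak_signal_noise_ratio(truth, pred, data_range=1.0))
print("SSIM:", structural_similarity(truth, pred, channel_axis=-1, data_range=1.0))
```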

Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks

Title Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks
Authors Yuni Lai, Linfeng Zhang, Donghong Han, Rui Zhou, Guoren Wang
Abstract Microblogs are widely used to express people’s opinions and feelings in daily life. Sentiment analysis (SA) can timely detect personal sentiment polarities through analyzing text. Deep learning approaches have been broadly used in SA but still have not fully exploited syntax information. In this paper, we propose a syntax-based graph convolution network (GCN) model to enhance the understanding of diverse grammatical structures of Chinese microblogs. In addition, a pooling method based on percentile is proposed to improve the accuracy of the model. In experiments, for Chinese microblogs emotion classification categories including happiness, sadness, like, anger, disgust, fear, and surprise, the F-measure of our model reaches 82.32% and exceeds the state-of-the-art algorithm by 5.90%. The experimental results show that our model can effectively utilize the information of dependency parsing to improve the performance of emotion detection. What is more, we annotate a new dataset for Chinese emotion classification, which is open to other researchers.
Tasks Dependency Parsing, Emotion Classification, Sentiment Analysis
Published 2019-12-05
URL https://arxiv.org/abs/1912.02545v1
PDF https://arxiv.org/pdf/1912.02545v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-emotion-classification-of
Repo https://github.com/zhanglinfeng1997/Sentiment-Analysis-via-GCN
Framework pytorch
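
The percentile-based pooling mentioned in the abstract, sketched with `torch.quantile`: instead of max- or mean-pooling node features, keep a chosen percentile per feature dimension. The 0.8 percentile is an arbitrary illustration, not the value tuned in the paper.

```python
import torch

def percentile_pool(node_feats, q=0.8):
    # node_feats: (num_nodes, feat_dim) from a GCN layer -> (feat_dim,) graph vector
    return torch.quantile(node_feats, q, dim=0)

graph_vector = percentile_pool(torch.randn(17, 128))
print(graph_vector.shape)   # torch.Size([128])
```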

A Plug-in Method for Representation Factorization

Title A Plug-in Method for Representation Factorization
Authors Jee Seok Yoon, Myung-Cheol Roh, Heung-Il Suk
Abstract In this work, we focus on decomposing the latent representations in GANs or learned feature representations in deep auto-encoders into semantically controllable factors in a semi-supervised manner, without modifying the original trained models. Specifically, we propose a Factors Decomposer-Entangler Network (FDEN) that learns to decompose a latent representation into mutually independent factors. Given a latent representation, the proposed framework draws a set of interpretable factors, each aligned to independent factors of variation by maximizing their total correlation by information-theoretic means. As a plug-in method, we have applied our proposed FDEN to the existing networks of Adversarially Learned Inference and Pioneer Network and conducted computer vision tasks of image-to-image translation in semantic ways, e.g., changing styles while keeping the identity of a subject, and object classification in a few-shot learning scheme. We have also validated the effectiveness of our method with various ablation studies in qualitative, quantitative, and statistical examination.
Tasks Few-Shot Learning, Image-to-Image Translation, Object Classification, Style Transfer
Published 2019-05-27
URL https://arxiv.org/abs/1905.11088v3
PDF https://arxiv.org/pdf/1905.11088v3.pdf
PWC https://paperswithcode.com/paper/plug-in-factorization-for-latent
Repo https://github.com/wltjr1007/FDEN
Framework tf
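
A very rough sketch of the plug-in idea: a small decomposer maps a frozen network’s latent code into several factor codes and an entangler maps them back, trained with a reconstruction loss plus a crude decorrelation penalty. This simplification stands in for (and is not equivalent to) the paper’s total-correlation objective.

```python
import torch
import torch.nn as nn

latent_dim, num_factors, factor_dim = 128, 4, 16

decomposer = nn.Linear(latent_dim, num_factors * factor_dim)
entangler = nn.Linear(num_factors * factor_dim, latent_dim)
opt = torch.optim.Adam(list(decomposer.parameters()) + list(entangler.parameters()))

def step(z):
    f = decomposer(z).view(-1, num_factors, factor_dim)
    recon = entangler(f.flatten(1))
    # Crude stand-in for an independence objective: penalize off-diagonal
    # covariance across all factor dimensions.
    cov = torch.cov(f.flatten(1).T)
    off_diag = cov - torch.diag(torch.diag(cov))
    loss = (recon - z).pow(2).mean() + 1e-3 * off_diag.pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(step(torch.randn(32, latent_dim)))
```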

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

Title FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance
Authors Matteo Dell’Amico
Abstract FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data into numeric arrays, and letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when few items are added. It is hierarchical: it produces a “flat” clustering which can be expanded to a tree structure, so that users can group and/or divide clusters into sub- or super-clusters when data exploration requires it. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07283v1
PDF https://arxiv.org/pdf/1910.07283v1.pdf
PWC https://paperswithcode.com/paper/fishdbc-flexible-incremental-scalable
Repo https://github.com/matteodellamico/flexible-clustering
Framework none
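
FISHDBC approximates HDBSCAN* while letting users supply any distance. As a point of comparison only, here is density-based clustering of raw strings with an arbitrary edit-distance metric via the hdbscan package on a precomputed distance matrix; this illustrates the use case, not FISHDBC’s incremental algorithm or its API.

```python
import numpy as np
import hdbscan

def edit_distance(a, b):
    # Simple dynamic-programming Levenshtein distance.
    dp = np.arange(len(b) + 1, dtype=float)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

words = ["cluster", "clusters", "clustering", "banana", "bananas", "density"]
D = np.array([[edit_distance(a, b) for b in words] for a in words])
labels = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2).fit_predict(D)
print(dict(zip(words, labels)))
```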

Pre-trained Language Model for Biomedical Question Answering

Title Pre-trained Language Model for Biomedical Question Answering
Authors Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang
Abstract The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers.
Tasks Language Modelling, Question Answering
Published 2019-09-18
URL https://arxiv.org/abs/1909.08229v1
PDF https://arxiv.org/pdf/1909.08229v1.pdf
PWC https://paperswithcode.com/paper/pre-trained-language-model-for-biomedical
Repo https://github.com/dmis-lab/bioasq-biobert
Framework tf
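
A hedged sketch of extractive QA with a SQuAD-finetuned transformer in the Hugging Face API; the checkpoint below is a generic BERT model, not the authors’ released BioBERT weights, and the pre-/post-processing the paper tunes is omitted.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "Which protein does the BRCA1 gene encode?"
context = ("The BRCA1 gene provides instructions for making a protein called "
           "breast cancer type 1 susceptibility protein.")
inputs = tok(question, context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
start = out.start_logits.argmax()          # most likely answer start token
end = out.end_logits.argmax() + 1          # most likely answer end token (inclusive)
print(tok.decode(inputs["input_ids"][0][start:end]))
```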