Paper Group AWR 432
The emergence of number and syntax units in LSTM language models. SEAN: Image Synthesis with Semantic Region-Adaptive Normalization. What does a Car-ssette tape tell?. 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints. Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection. Spotting …
The emergence of number and syntax units in LSTM language models
Title | The emergence of number and syntax units in LSTM language models |
Authors | Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni |
Abstract | Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two 'number units'. Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs. |
Tasks | Language Modelling |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07435v2 (PDF: http://arxiv.org/pdf/1903.07435v2.pdf) |
PWC | https://paperswithcode.com/paper/the-emergence-of-number-and-syntax-units-in |
Repo | https://github.com/FAIRNS/Number_and_syntax_units_in_LSTM_LMs |
Framework | pytorch |
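To make the unit-level analysis concrete, below is a minimal, hedged sketch of the kind of ablation probe the paper applies: zero out a single LSTM cell-state unit during the forward pass and compare the model's preference for a singular versus a plural verb form. The weights, token ids, and the choice of unit 3 are placeholders, not the paper's pre-trained Wikipedia LM (see the repo for the actual code).

```python
# Hedged sketch: ablate one LSTM cell unit and check whether the model's
# number-agreement preference changes. Random weights stand in for the trained LM.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_dim, hid_dim = 1000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
decoder = nn.Linear(hid_dim, vocab_size)

@torch.no_grad()
def agreement_score(token_ids, sing_verb, plur_verb, ablate_unit=None):
    """Feed a prefix through the LM and compare the logits of a singular vs a
    plural verb form, optionally zeroing one cell-state unit at every step."""
    h = torch.zeros(1, 1, hid_dim)
    c = torch.zeros(1, 1, hid_dim)
    x = embed(torch.tensor([token_ids]))
    for t in range(x.size(1)):
        _, (h, c) = lstm(x[:, t:t + 1, :], (h, c))
        if ablate_unit is not None:
            c[..., ablate_unit] = 0.0          # knock out one candidate "number unit"
    logits = decoder(h[-1])
    return (logits[0, sing_verb] - logits[0, plur_verb]).item()

prefix = [5, 42, 7, 99]                        # token ids, e.g. "the boy near the cars"
print(agreement_score(prefix, sing_verb=11, plur_verb=12))
print(agreement_score(prefix, sing_verb=11, plur_verb=12, ablate_unit=3))
```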
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Title | SEAN: Image Synthesis with Semantic Region-Adaptive Normalization |
Authors | Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka |
Abstract | We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region. |
Tasks | Image Generation |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12861v1 (PDF: https://arxiv.org/pdf/1911.12861v1.pdf) |
PWC | https://paperswithcode.com/paper/sean-image-synthesis-with-semantic-region |
Repo | https://github.com/ZPdesu/SEAN |
Framework | pytorch |
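A hedged sketch of the core idea behind region-adaptive normalization, assuming a one-hot region mask and one style code per semantic region: activations are normalized, then modulated by per-region scale and shift parameters predicted from the style codes. This illustrates the mechanism only, not the authors' implementation (see the repo).

```python
# Minimal region-adaptive normalization sketch (not the SEAN reference code).
import torch
import torch.nn as nn

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, mask, styles):
        # x: (B, C, H, W), mask: (B, R, H, W) one-hot regions, styles: (B, R, style_dim)
        x = self.norm(x)
        gamma = torch.einsum("brhw,brc->bchw", mask, self.to_gamma(styles))
        beta = torch.einsum("brhw,brc->bchw", mask, self.to_beta(styles))
        return x * (1 + gamma) + beta          # per-region denormalization

layer = RegionAdaptiveNorm(channels=64, style_dim=32)
x = torch.randn(2, 64, 16, 16)
mask = torch.zeros(2, 5, 16, 16)
mask[:, 0] = 1.0                               # toy mask: every pixel belongs to region 0
styles = torch.randn(2, 5, 32)                 # one style code per region
print(layer(x, mask, styles).shape)            # torch.Size([2, 64, 16, 16])
```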
What does a Car-ssette tape tell?
Title | What does a Car-ssette tape tell? |
Authors | Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu |
Abstract | Captioning has attracted much attention in image and video understanding, while little work examines audio captioning. This paper contributes a manually annotated dataset of car scenes, extending a previously published hospital audio captioning dataset. An encoder-decoder model with pretrained word embeddings and an additional sentence loss is proposed. The current model can accelerate the training process and generate semantically correct but unseen unique sentences. We test the model on the current car dataset, the previous Hospital Dataset, and the Joint Dataset, indicating its generalization capability across different scenes. Further, we make an effort to provide a better objective evaluation metric, namely the BERT similarity score. It compares semantic-level similarity and compensates for drawbacks of N-gram-based metrics like BLEU, namely high scores for word-similar sentences. This new metric demonstrates higher correlation with human evaluation. However, though detailed audio captions can now be automatically generated, human annotations still outperform model captions in many aspects. |
Tasks | Video Understanding, Word Embeddings |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13448v1 (PDF: https://arxiv.org/pdf/1905.13448v1.pdf) |
PWC | https://paperswithcode.com/paper/what-does-a-car-ssette-tape-tell |
Repo | https://github.com/richermans/AudioCaption |
Framework | pytorch |
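The paper's BERT similarity score is its own formulation; the sketch below only illustrates the general idea of a semantic-level caption metric by mean-pooling BERT token embeddings and comparing candidate and reference captions with cosine similarity. The model choice and pooling scheme here are assumptions.

```python
# Hedged sketch of a BERT-embedding-based caption similarity (not the paper's exact metric).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state        # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)          # masked mean pooling

ref = embed("a car engine is running while people talk")
hyp = embed("an engine idles and voices are heard")
score = torch.nn.functional.cosine_similarity(ref, hyp).item()
print(f"BERT similarity: {score:.3f}")
```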
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Title | 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints |
Authors | Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu |
Abstract | We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data. Our method tracks in real-time novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching. These keypoints are learned end-to-end without manual supervision in order to be most effective for tracking. Our experiments show that our method substantially outperforms existing methods on the NOCS category-level 6D pose estimation benchmark and supports a physical robot to perform simple vision-based closed-loop manipulation tasks. Our code and video are available at https://sites.google.com/view/6packtracking. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGBD, Pose Estimation, Pose Tracking |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10750v1 (PDF: https://arxiv.org/pdf/1910.10750v1.pdf) |
PWC | https://paperswithcode.com/paper/6-pack-category-level-6d-pose-tracker-with |
Repo | https://github.com/j96w/6-PACK |
Framework | pytorch |
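The abstract says interframe motion is estimated from matched 3D keypoints. A standard way to perform that step, shown below as a hedged illustration, is the Kabsch/Umeyama least-squares rigid alignment; 6-PACK's keypoints are learned end-to-end, whereas this sketch uses synthetic ones.

```python
# Recover the rigid transform between two sets of matched 3D keypoints (Kabsch).
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t such that Q ~= R @ P + t (P, Q: Nx3)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = cQ - R @ cP
    return R, t

kp_prev = np.random.rand(8, 3)                                # keypoints at frame t-1
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)  # 90-degree yaw
kp_curr = kp_prev @ R_true.T + np.array([0.1, 0.0, 0.2])      # keypoints at frame t
R, t = rigid_transform(kp_prev, kp_curr)
print(np.allclose(R, R_true), t)
```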
Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
Title | Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection |
Authors | Zekun Xu, Deovrat Kakde, Arin Chaudhuri |
Abstract | In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets. |
Tasks | Anomaly Detection |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00567v1 (PDF: http://arxiv.org/pdf/1902.00567v1.pdf) |
PWC | https://paperswithcode.com/paper/automatic-hyperparameter-tuning-method-for |
Repo | https://github.com/vsatyakumar/automatic-local-outlier-factor-tuning |
Framework | none |
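The paper's contribution is a label-free heuristic for choosing LOF hyperparameters, which is not reproduced here. As a point of reference, the sketch below shows the plain tuning-loop structure around scikit-learn's LocalOutlierFactor, using a labeled validation set and ROC AUC as a stand-in selection criterion.

```python
# Generic LOF hyperparameter sweep (illustration only; not the paper's heuristic).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                        # "normal" behaviour only
X_val = np.vstack([rng.normal(size=(100, 2)),              # normal validation points
                   rng.normal(loc=5.0, size=(10, 2))])     # injected anomalies
y_val = np.r_[np.zeros(100), np.ones(10)]

best_k, best_auc = None, -np.inf
for k in (5, 10, 20, 35, 50):
    lof = LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X_train)
    scores = -lof.decision_function(X_val)                 # higher = more anomalous
    auc = roc_auc_score(y_val, scores)
    if auc > best_auc:
        best_k, best_auc = k, auc
print(best_k, round(best_auc, 3))
```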
Spotting Macro- and Micro-expression Intervals in Long Video Sequences
Title | Spotting Macro- and Micro-expression Intervals in Long Video Sequences |
Authors | Ying He, Su-Jing Wang, Jingting Li, Moi Hoon Yap |
Abstract | This paper presents baseline results for the Third Facial Micro-Expression Grand Challenge (MEGC 2020). Both macro- and micro-expression intervals in CAS(ME)$^2$ and SAMM Long Videos are spotted by employing the method of Main Directional Maximal Difference Analysis (MDMD). The MDMD method uses the magnitude maximal difference in the main direction of optical flow features to spot facial movements. The single-frame prediction results of the original MDMD method are post-processed into reasonable video intervals. The metric F1-scores of baseline results are evaluated: for CAS(ME)$^2$, the F1-scores are 0.1196 and 0.0082 for macro- and micro-expressions respectively, and the overall F1-score is 0.0376; for SAMM Long Videos, the F1-scores are 0.0629 and 0.0364 for macro- and micro-expressions respectively, and the overall F1-score is 0.0445. The baseline project codes are publicly available at https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting. |
Tasks | Optical Flow Estimation |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.11985v3 (PDF: https://arxiv.org/pdf/1912.11985v3.pdf) |
PWC | https://paperswithcode.com/paper/spotting-macro-and-micro-expression-intervals |
Repo | https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting |
Framework | none |
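A heavily simplified, hedged sketch of the flow-based spotting idea: compute dense optical flow between a reference frame and the current frame, keep the flow magnitude in the dominant motion direction, and threshold the per-frame scores into candidate frames to be merged into intervals. The frame gap, 8-way direction binning, and threshold are illustrative choices, not the MDMD parameters (see the baseline repo).

```python
# Toy directional-motion spotting over a synthetic "video" (not the full MDMD method).
import numpy as np
import cv2

def motion_score(frame_a, frame_b):
    flow = cv2.calcOpticalFlowFarneback(frame_a, frame_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)   # (H, W, 2)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    bins = (ang / (2 * np.pi / 8)).astype(int) % 8                  # 8 direction bins
    main_dir = np.bincount(bins.ravel()).argmax()                   # dominant direction
    in_main = bins == main_dir
    return float(mag[in_main].mean()) if in_main.any() else 0.0

frames = [np.random.randint(0, 255, (64, 64), np.uint8) for _ in range(30)]
scores = [motion_score(frames[max(0, i - 5)], frames[i]) for i in range(len(frames))]
spotted = [i for i, s in enumerate(scores) if s > np.mean(scores) + np.std(scores)]
print(spotted)
```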
Lightweight Network Architecture for Real-Time Action Recognition
Title | Lightweight Network Architecture for Real-Time Action Recognition |
Authors | Alexander Kozlov, Vadim Andronov, Yana Gritsenko |
Abstract | In this work we present a new efficient approach to Human Action Recognition called Video Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural Language Processing and applies them to video understanding. The proposed method allows us to create lightweight CNN models that achieve high accuracy and real-time speed using just an RGB mono camera and general purpose CPU. Furthermore, we explain how to improve accuracy by distilling from multiple models with different modalities into a single model. We conduct a comparison with state-of-the-art methods and show that our approach performs on par with most of them on famous Action Recognition datasets. We benchmark the inference time of the models using the modern inference framework and argue that our approach compares favorably with other methods in terms of speed/accuracy trade-off, running at 56 FPS on CPU. The models and the training code are available. |
Tasks | Temporal Action Localization, Video Understanding |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08711v1 (PDF: https://arxiv.org/pdf/1905.08711v1.pdf) |
PWC | https://paperswithcode.com/paper/lightweight-network-architecture-for-real |
Repo | https://github.com/Chrisackerman1/Lightweight-Network-Architecture-for-Real-Time-Action-Recognition |
Framework | none |
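The abstract mentions distilling several modality-specific models into a single RGB model. The sketch below shows a generic multi-teacher distillation loss (softened teacher predictions averaged and matched by the student via KL divergence); the paper's exact recipe, temperature, and weighting may differ.

```python
# Generic multi-teacher knowledge-distillation loss (illustrative, not the VTN recipe).
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    soft_targets = torch.stack([F.softmax(t / T, dim=-1)
                                for t in teacher_logits_list]).mean(0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(8, 400)                            # e.g. 400 action classes
teachers = [torch.randn(8, 400), torch.randn(8, 400)]    # RGB and flow teachers (toy logits)
labels = torch.randint(0, 400, (8,))
print(distill_loss(student, teachers, labels).item())
```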
Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space
Title | Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space |
Authors | Quentin Mérigot, Alex Delalande, Frédéric Chazal |
Abstract | This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.05954v1 (PDF: https://arxiv.org/pdf/1910.05954v1.pdf) |
PWC | https://paperswithcode.com/paper/quantitative-stability-of-optimal-transport |
Repo | https://github.com/AlxDel/stability_ot_maps_and_linearization_wasserstein_space |
Framework | none |
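A hedged summary of the object under study, with the exponent and constants deliberately left unspecified (see the paper for the precise statement): each measure is embedded as its optimal transport map from a fixed reference density, and one direction of the (bi-)Hölder property follows immediately because the pair of maps defines a coupling.

```latex
% Reference density \rho on a convex set \Omega; T_\mu is the W_2-optimal map
% pushing \rho forward to \mu, i.e. (T_\mu)_\# \rho = \mu.
\[
  \Phi : \mathcal{P}_2(\Omega) \to L^2(\rho;\mathbb{R}^d), \qquad \Phi(\mu) = T_\mu .
\]
% One direction is immediate, since (T_\mu, T_\nu)_\# \rho couples \mu and \nu:
\[
  W_2(\mu,\nu) \;\le\; \lVert T_\mu - T_\nu \rVert_{L^2(\rho)} .
\]
% The paper's stability result is a reverse inequality of the form
% \lVert T_\mu - T_\nu \rVert_{L^2(\rho)} \le C\, W(\mu,\nu)^{\alpha}
% with a dimension-independent exponent \alpha \in (0,1]; see the paper for details.
```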
Feature Forwarding for Efficient Single Image Dehazing
Title | Feature Forwarding for Efficient Single Image Dehazing |
Authors | Peter Morales, Tzofi Klinghoffer, Seung Jae Lee |
Abstract | Haze degrades content and obscures information of images, which can negatively impact vision-based decision-making in real-time systems. In this paper, we propose an efficient fully convolutional neural network (CNN) image dehazing method designed to run on edge graphical processing units (GPUs). We utilize three variants of our architecture to explore the dependency of dehazed image quality on parameter count and model design. The first two variants presented, a small and big version, make use of a single efficient encoder-decoder convolutional feature extractor. The final variant utilizes a pair of encoder-decoders for atmospheric light and transmission map estimation. Each variant ends with an image refinement pyramid pooling network to form the final dehazed image. For the big variant of the single-encoder network, we demonstrate state-of-the-art performance on the NYU Depth dataset. For the small variant, we maintain competitive performance on the super-resolution O/I-HAZE datasets without the need for image cropping. Finally, we examine some challenges presented by the Dense-Haze dataset when leveraging CNN architectures for dehazing of dense haze imagery and examine the impact of loss function selection on image quality. Benchmarks are included to show the feasibility of introducing this approach into real-time systems. |
Tasks | Decision Making, Image Cropping, Image Dehazing, Single Image Dehazing, Super-Resolution |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09059v2 (PDF: https://arxiv.org/pdf/1904.09059v2.pdf) |
PWC | https://paperswithcode.com/paper/feature-forwarding-for-efficient-single-image |
Repo | https://github.com/pmm09c/ntire-dehazing |
Framework | pytorch |
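The third variant estimates atmospheric light and a transmission map; under the standard atmospheric scattering model I = J·t + A·(1 − t), those two quantities suffice to invert the haze, as in the short numpy sketch below. The t and A values here are placeholders for the network outputs.

```python
# Invert the atmospheric scattering model given estimated transmission t and light A.
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """J = (I - A) / max(t, t_min) + A."""
    t = np.clip(t, t_min, 1.0)[..., None]      # avoid blow-up where transmission is tiny
    return np.clip((I - A) / t + A, 0.0, 1.0)

I = np.random.rand(64, 64, 3)                  # hazy image in [0, 1]
t = np.full((64, 64), 0.6)                     # placeholder transmission map
A = np.array([0.9, 0.9, 0.9])                  # placeholder atmospheric light
print(dehaze(I, t, A).shape)                   # (64, 64, 3)
```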
Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning
Title | Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning |
Authors | Dayiheng Liu, Jie Fu, Yidan Zhang, Chris Pal, Jiancheng Lv |
Abstract | Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12304v3 (PDF: https://arxiv.org/pdf/1905.12304v3.pdf) |
PWC | https://paperswithcode.com/paper/revision-in-continuous-space-fine-grained |
Repo | https://github.com/dayihengliu/Fine-Grained-Style-Transfer |
Framework | tf |
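A hedged, toy-scale sketch of the gradient-based revision step: starting from a sentence's latent code, take gradient steps that raise an attribute predictor's score while penalizing drift in a content representation, then decode the revised code. The tiny networks, dimensions, and loss weights below are placeholders, not the paper's VAE or predictors.

```python
# Revise a latent code by gradient steps on attribute and content predictors (toy models).
import torch
import torch.nn as nn

latent_dim = 32
attribute_clf = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
content_enc = nn.Linear(latent_dim, 16)

z0 = torch.randn(1, latent_dim)                    # latent code from a VAE encoder
content_ref = content_enc(z0).detach()             # content to preserve
z = z0.clone().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(50):
    opt.zero_grad()
    attr_score = attribute_clf(z).squeeze()        # push this up (desired attribute)
    content_drift = ((content_enc(z) - content_ref) ** 2).mean()
    loss = -attr_score + 10.0 * content_drift
    loss.backward()
    opt.step()

# z now (approximately) keeps the content while carrying the target attribute;
# a VAE decoder would map it back to a sentence.
print((z - z0).norm().item())
```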
Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks
Title | Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks |
Authors | Vishnu Sarukkai, Anirudh Jain, Burak Uzkent, Stefano Ermon |
Abstract | Satellite images hold great promise for continuous environmental monitoring and earth observation. Occlusions cast by clouds, however, can severely limit coverage, making ground information extraction more difficult. Existing pipelines typically perform cloud removal with simple temporal composites and hand-crafted filters. In contrast, we cast the problem of cloud removal as a conditional image synthesis challenge, and we propose a trainable spatiotemporal generator network (STGAN) to remove clouds. We train our model on a new large-scale spatiotemporal dataset that we construct, containing 97640 image pairs covering all continents. We demonstrate experimentally that the proposed STGAN model outperforms standard models and can generate realistic cloud-free images with high PSNR and SSIM values across a variety of atmospheric conditions, leading to improved performance in downstream tasks such as land cover classification. |
Tasks | Image Generation |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.06838v1 (PDF: https://arxiv.org/pdf/1912.06838v1.pdf) |
PWC | https://paperswithcode.com/paper/cloud-removal-in-satellite-images-using |
Repo | https://github.com/VSAnimator/stgan |
Framework | pytorch |
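The evaluation reports PSNR and SSIM between generated cloud-free images and ground truth; below is a minimal sketch of how such image-quality metrics are typically computed (toy arrays stand in for real satellite imagery; scikit-image ≥ 0.19 is assumed for the channel_axis argument).

```python
# Compute PSNR and SSIM for a predicted cloud-free image against ground truth.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

truth = np.random.rand(128, 128, 3)
pred = np.clip(truth + 0.05 * np.random.randn(128, 128, 3), 0, 1)
psnr = peak_signal_noise_ratio(truth, pred, data_range=1.0)
ssim = structural_similarity(truth, pred, channel_axis=-1, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.3f}")
```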
Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks
Title | Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks |
Authors | Yuni Lai, Linfeng Zhang, Donghong Han, Rui Zhou, Guoren Wang |
Abstract | Microblogs are widely used to express people’s opinions and feelings in daily life. Sentiment analysis (SA) can timely detect personal sentiment polarities through analyzing text. Deep learning approaches have been broadly used in SA but still have not fully exploited syntax information. In this paper, we propose a syntax-based graph convolution network (GCN) model to enhance the understanding of diverse grammatical structures of Chinese microblogs. In addition, a pooling method based on percentile is proposed to improve the accuracy of the model. In experiments, for Chinese microblogs emotion classification categories including happiness, sadness, like, anger, disgust, fear, and surprise, the F-measure of our model reaches 82.32% and exceeds the state-of-the-art algorithm by 5.90%. The experimental results show that our model can effectively utilize the information of dependency parsing to improve the performance of emotion detection. What is more, we annotate a new dataset for Chinese emotion classification, which is open to other researchers. |
Tasks | Dependency Parsing, Emotion Classification, Sentiment Analysis |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02545v1 (PDF: https://arxiv.org/pdf/1912.02545v1.pdf) |
PWC | https://paperswithcode.com/paper/fine-grained-emotion-classification-of |
Repo | https://github.com/zhanglinfeng1997/Sentiment-Analysis-via-GCN |
Framework | pytorch |
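A hedged sketch of a syntax-aware graph convolution of the kind described above: word vectors are propagated over the dependency-parse adjacency matrix with the standard normalized-adjacency GCN update. The paper additionally proposes percentile-based pooling; plain mean pooling is used here for brevity.

```python
# One GCN layer over a dependency-parse graph (illustrative; not the paper's full model).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))                  # add self-loops
        d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))   # symmetric normalization
        return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ self.lin(H))

# Toy 5-token sentence: A[i, j] = 1 if tokens i and j are linked in the parse.
A = torch.tensor([[0, 1, 0, 0, 0],
                  [1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, 0, 0, 0]], dtype=torch.float)
H = torch.randn(5, 300)                                   # e.g. pretrained word embeddings
sentence_repr = GCNLayer(300, 128)(H, A).mean(0)          # mean pooling before the classifier
print(sentence_repr.shape)                                # torch.Size([128])
```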
A Plug-in Method for Representation Factorization
Title | A Plug-in Method for Representation Factorization |
Authors | Jee Seok Yoon, Myung-Cheol Roh, Heung-Il Suk |
Abstract | In this work, we focus on decomposing the latent representations in GANs or learned feature representations in deep auto-encoders into semantically controllable factors in a semi-supervised manner, without modifying the original trained models. Specifically, we propose a Factors Decomposer-Entangler Network (FDEN) that learns to decompose a latent representation into mutually independent factors. Given a latent representation, the proposed framework draws a set of interpretable factors, each aligned to independent factors of variation, by maximizing their total correlation in an information-theoretic sense. As a plug-in method, we have applied our proposed FDEN to the existing networks of Adversarially Learned Inference and Pioneer Network and conducted computer vision tasks of image-to-image translation in semantic ways, e.g., changing styles while keeping the identity of a subject, and object classification in a few-shot learning scheme. We have also validated the effectiveness of our method with various ablation studies in qualitative, quantitative, and statistical examination. |
Tasks | Few-Shot Learning, Image-to-Image Translation, Object Classification, Style Transfer |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11088v3 (PDF: https://arxiv.org/pdf/1905.11088v3.pdf) |
PWC | https://paperswithcode.com/paper/plug-in-factorization-for-latent |
Repo | https://github.com/wltjr1007/FDEN |
Framework | tf |
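A structural, hedged sketch of the Decomposer-Entangler idea only: a decomposer splits a frozen model's latent code into several factors and an entangler must reconstruct the code from them. The paper's independence objective (total correlation) and its plug-in training details are omitted, and all dimensions are placeholders.

```python
# Decompose a latent code into factors and reassemble it (structure only, toy scale).
import torch
import torch.nn as nn

latent_dim, n_factors, factor_dim = 256, 4, 32
decomposer = nn.ModuleList([nn.Linear(latent_dim, factor_dim) for _ in range(n_factors)])
entangler = nn.Linear(n_factors * factor_dim, latent_dim)

z = torch.randn(8, latent_dim)                     # latent code from a pretrained model
factors = [f(z) for f in decomposer]               # candidate independent factors
z_rec = entangler(torch.cat(factors, dim=1))       # reassemble the original code
recon_loss = ((z_rec - z) ** 2).mean()             # the paper adds independence terms
print(recon_loss.item())
```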
FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance
Title | FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance |
Authors | Matteo Dell’Amico |
Abstract | FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data into numeric arrays and letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when few items are added. It is hierarchical: it produces a “flat” clustering which can be expanded to a tree structure, so that users can group and/or divide clusters into sub- or super-clusters when data exploration requires it. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07283v1 (PDF: https://arxiv.org/pdf/1910.07283v1.pdf) |
PWC | https://paperswithcode.com/paper/fishdbc-flexible-incremental-scalable |
Repo | https://github.com/matteodellamico/flexible-clustering |
Framework | none |
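For contrast with the abstract above, the sketch below shows the straightforward way to cluster with an arbitrary user-defined distance: build a precomputed distance matrix and hand it to a density-based clusterer. This is exactly the O(n²) pattern FISHDBC is designed to avoid; consult the repo for its own incremental API.

```python
# Baseline: arbitrary-distance clustering via a precomputed matrix (quadratic cost).
import numpy as np
from sklearn.cluster import DBSCAN

def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return int(dp[-1])

data = ["kitten", "sitting", "mitten", "banana", "bandana", "cabana"]
D = np.array([[edit_distance(a, b) for b in data] for a in data])
labels = DBSCAN(eps=2.5, min_samples=2, metric="precomputed").fit_predict(D)
print(dict(zip(data, labels)))
```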
Pre-trained Language Model for Biomedical Question Answering
Title | Pre-trained Language Model for Biomedical Question Answering |
Authors | Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang |
Abstract | The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers. |
Tasks | Language Modelling, Question Answering |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08229v1 (PDF: https://arxiv.org/pdf/1909.08229v1.pdf) |
PWC | https://paperswithcode.com/paper/pre-trained-language-model-for-biomedical |
Repo | https://github.com/dmis-lab/bioasq-biobert |
Framework | tf |
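A hedged sketch of extractive biomedical QA with a BERT-style reader through the Hugging Face pipeline API. The checkpoint name is an assumption for illustration; the repo distributes its BioASQ-tuned models separately, and the pre-/post-processing strategies discussed in the paper are not reproduced here.

```python
# Extractive QA with a biomedical BERT reader (checkpoint name is an assumption).
from transformers import pipeline

qa = pipeline("question-answering",
              model="dmis-lab/biobert-base-cased-v1.1-squad")   # assumed checkpoint
result = qa(question="Which protein does rituximab target?",
            context="Rituximab is a chimeric monoclonal antibody that targets the "
                    "CD20 antigen found on the surface of B cells.")
print(result["answer"], result["score"])
```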