Paper Group AWR 432
The emergence of number and syntax units in LSTM language models. SEAN: Image Synthesis with Semantic Region-Adaptive Normalization. What does a Car-ssette tape tell?. 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints. Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection. Spotting …
The emergence of number and syntax units in LSTM language models
Title | The emergence of number and syntax units in LSTM language models |
Authors | Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni |
Abstract | Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the inner mechanics of number tracking in LSTMs at the single neuron level. We discover that long-distance number information is largely managed by two 'number units'. Importantly, the behaviour of these units is partially controlled by other units independently shown to track syntactic structure. We conclude that LSTMs are, to some extent, implementing genuinely syntactic processing mechanisms, paving the way to a more general understanding of grammatical encoding in LSTMs. |
Tasks | Language Modelling |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07435v2 (PDF: http://arxiv.org/pdf/1903.07435v2.pdf) |
PWC | https://paperswithcode.com/paper/the-emergence-of-number-and-syntax-units-in |
Repo | https://github.com/FAIRNS/Number_and_syntax_units_in_LSTM_LMs |
Framework | pytorch |
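To make the unit-level analysis concrete, below is a minimal, hedged sketch of the kind of ablation probe the paper applies: zero out a single LSTM cell-state unit during the forward pass and compare the model's preference for a singular versus a plural verb form. The weights, token ids, and the choice of unit 3 are placeholders, not the paper's pre-trained Wikipedia LM (see the repo for the actual code).

```python
# Hedged sketch: ablate one LSTM cell unit and check whether the model's
# number-agreement preference changes. Random weights stand in for the trained LM.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_dim, hid_dim = 1000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)
lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
decoder = nn.Linear(hid_dim, vocab_size)

@torch.no_grad()
def agreement_score(token_ids, sing_verb, plur_verb, ablate_unit=None):
    """Feed a prefix through the LM and compare the logits of a singular vs a
    plural verb form, optionally zeroing one cell-state unit at every step."""
    h = torch.zeros(1, 1, hid_dim)
    c = torch.zeros(1, 1, hid_dim)
    x = embed(torch.tensor([token_ids]))
    for t in range(x.size(1)):
        _, (h, c) = lstm(x[:, t:t + 1, :], (h, c))
        if ablate_unit is not None:
            c[..., ablate_unit] = 0.0          # knock out one candidate "number unit"
    logits = decoder(h[-1])
    return (logits[0, sing_verb] - logits[0, plur_verb]).item()

prefix = [5, 42, 7, 99]                        # token ids, e.g. "the boy near the cars"
print(agreement_score(prefix, sing_verb=11, plur_verb=12))
print(agreement_score(prefix, sing_verb=11, plur_verb=12, ablate_unit=3))
```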
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Title | SEAN: Image Synthesis with Semantic Region-Adaptive Normalization |
Authors | Peihao Zhu, Rameen Abdal, Yipeng Qin, Peter Wonka |
Abstract | We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region. |
Tasks | Image Generation |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12861v1 (PDF: https://arxiv.org/pdf/1911.12861v1.pdf) |
PWC | https://paperswithcode.com/paper/sean-image-synthesis-with-semantic-region |
Repo | https://github.com/ZPdesu/SEAN |
Framework | pytorch |
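A hedged sketch of the core idea behind region-adaptive normalization, assuming a one-hot region mask and one style code per semantic region: activations are normalized, then modulated by per-region scale and shift parameters predicted from the style codes. This illustrates the mechanism only, not the authors' implementation (see the repo).

```python
# Minimal region-adaptive normalization sketch (not the SEAN reference code).
import torch
import torch.nn as nn

class RegionAdaptiveNorm(nn.Module):
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, mask, styles):
        # x: (B, C, H, W), mask: (B, R, H, W) one-hot regions, styles: (B, R, style_dim)
        x = self.norm(x)
        gamma = torch.einsum("brhw,brc->bchw", mask, self.to_gamma(styles))
        beta = torch.einsum("brhw,brc->bchw", mask, self.to_beta(styles))
        return x * (1 + gamma) + beta          # per-region denormalization

layer = RegionAdaptiveNorm(channels=64, style_dim=32)
x = torch.randn(2, 64, 16, 16)
mask = torch.zeros(2, 5, 16, 16)
mask[:, 0] = 1.0                               # toy mask: every pixel belongs to region 0
styles = torch.randn(2, 5, 32)                 # one style code per region
print(layer(x, mask, styles).shape)            # torch.Size([2, 64, 16, 16])
```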
What does a Car-ssette tape tell?
Title | What does a Car-ssette tape tell? |
Authors | Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu |
Abstract | Captioning has attracted much attention in image and video understanding, while little work examines audio captioning. This paper contributes a manually annotated dataset of car scenes, extending a previously published hospital audio captioning dataset. An encoder-decoder model with pretrained word embeddings and an additional sentence loss is proposed. The current model can accelerate the training process and generate semantically correct but unseen unique sentences. We test the model on the current car dataset, the previous Hospital Dataset, and the Joint Dataset, indicating its generalization capability across different scenes. Further, we make an effort to provide a better objective evaluation metric, namely the BERT similarity score. It compares semantic-level similarity and compensates for drawbacks of N-gram-based metrics like BLEU, namely high scores for word-similar sentences. This new metric demonstrates higher correlation with human evaluation. However, though detailed audio captions can now be automatically generated, human annotations still outperform model captions in many aspects. |
Tasks | Video Understanding, Word Embeddings |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13448v1 (PDF: https://arxiv.org/pdf/1905.13448v1.pdf) |
PWC | https://paperswithcode.com/paper/what-does-a-car-ssette-tape-tell |
Repo | https://github.com/richermans/AudioCaption |
Framework | pytorch |
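The paper's BERT similarity score is its own formulation; the sketch below only illustrates the general idea of a semantic-level caption metric by mean-pooling BERT token embeddings and comparing candidate and reference captions with cosine similarity. The model choice and pooling scheme here are assumptions.

```python
# Hedged sketch of a BERT-embedding-based caption similarity (not the paper's exact metric).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state        # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)          # masked mean pooling

ref = embed("a car engine is running while people talk")
hyp = embed("an engine idles and voices are heard")
score = torch.nn.functional.cosine_similarity(ref, hyp).item()
print(f"BERT similarity: {score:.3f}")
```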
6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints
Title | 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints |
Authors | Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu |
Abstract | We present 6-PACK, a deep learning approach to category-level 6D object pose tracking on RGB-D data. Our method tracks in real-time novel object instances of known object categories such as bowls, laptops, and mugs. 6-PACK learns to compactly represent an object by a handful of 3D keypoints, based on which the interframe motion of an object instance can be estimated through keypoint matching. These keypoints are learned end-to-end without manual supervision in order to be most effective for tracking. Our experiments show that our method substantially outperforms existing methods on the NOCS category-level 6D pose estimation benchmark and supports a physical robot to perform simple vision-based closed-loop manipulation tasks. Our code and video are available at https://sites.google.com/view/6packtracking. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGBD, Pose Estimation, Pose Tracking |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10750v1 (PDF: https://arxiv.org/pdf/1910.10750v1.pdf) |
PWC | https://paperswithcode.com/paper/6-pack-category-level-6d-pose-tracker-with |
Repo | https://github.com/j96w/6-PACK |
Framework | pytorch |
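The abstract says interframe motion is estimated from matched 3D keypoints. A standard way to perform that step, shown below as a hedged illustration, is the Kabsch/Umeyama least-squares rigid alignment; 6-PACK's keypoints are learned end-to-end, whereas this sketch uses synthetic ones.

```python
# Recover the rigid transform between two sets of matched 3D keypoints (Kabsch).
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t such that Q ~= R @ P + t (P, Q: Nx3)."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = cQ - R @ cP
    return R, t

kp_prev = np.random.rand(8, 3)                                # keypoints at frame t-1
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)  # 90-degree yaw
kp_curr = kp_prev @ R_true.T + np.array([0.1, 0.0, 0.2])      # keypoints at frame t
R, t = rigid_transform(kp_prev, kp_curr)
print(np.allclose(R, R_true), t)
```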
Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection
Title | Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection |
Authors | Zekun Xu, Deovrat Kakde, Arin Chaudhuri |
Abstract | In recent years, there have been many practical applications of anomaly detection such as in predictive maintenance, detection of credit fraud, network intrusion, and system failure. The goal of anomaly detection is to identify in the test data anomalous behaviors that are either rare or unseen in the training data. This is a common goal in predictive maintenance, which aims to forecast the imminent faults of an appliance given abundant samples of normal behaviors. Local outlier factor (LOF) is one of the state-of-the-art models used for anomaly detection, but the predictive performance of LOF depends greatly on the selection of hyperparameters. In this paper, we propose a novel, heuristic methodology to tune the hyperparameters in LOF. A tuned LOF model that uses the proposed method shows good predictive performance in both simulations and real data sets. |
Tasks | Anomaly Detection |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00567v1 (PDF: http://arxiv.org/pdf/1902.00567v1.pdf) |
PWC | https://paperswithcode.com/paper/automatic-hyperparameter-tuning-method-for |
Repo | https://github.com/vsatyakumar/automatic-local-outlier-factor-tuning |
Framework | none |
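The paper's contribution is a label-free heuristic for choosing LOF hyperparameters, which is not reproduced here. As a point of reference, the sketch below shows the plain tuning-loop structure around scikit-learn's LocalOutlierFactor, using a labeled validation set and ROC AUC as a stand-in selection criterion.

```python
# Generic LOF hyperparameter sweep (illustration only; not the paper's heuristic).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                        # "normal" behaviour only
X_val = np.vstack([rng.normal(size=(100, 2)),              # normal validation points
                   rng.normal(loc=5.0, size=(10, 2))])     # injected anomalies
y_val = np.r_[np.zeros(100), np.ones(10)]

best_k, best_auc = None, -np.inf
for k in (5, 10, 20, 35, 50):
    lof = LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X_train)
    scores = -lof.decision_function(X_val)                 # higher = more anomalous
    auc = roc_auc_score(y_val, scores)
    if auc > best_auc:
        best_k, best_auc = k, auc
print(best_k, round(best_auc, 3))
```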
Spotting Macro- and Micro-expression Intervals in Long Video Sequences
Title | Spotting Macro- and Micro-expression Intervals in Long Video Sequences |
Authors | Ying He, Su-Jing Wang, Jingting Li, Moi Hoon Yap |
Abstract | This paper presents baseline results for the Third Facial Micro-Expression Grand Challenge (MEGC 2020). Both macro- and micro-expression intervals in CAS(ME)$^2$ and SAMM Long Videos are spotted by employing the method of Main Directional Maximal Difference Analysis (MDMD). The MDMD method uses the magnitude maximal difference in the main direction of optical flow features to spot facial movements. The single-frame prediction results of the original MDMD method are post-processed into reasonable video intervals. The metric F1-scores of baseline results are evaluated: for CAS(ME)$^2$, the F1-scores are 0.1196 and 0.0082 for macro- and micro-expressions respectively, and the overall F1-score is 0.0376; for SAMM Long Videos, the F1-scores are 0.0629 and 0.0364 for macro- and micro-expressions respectively, and the overall F1-score is 0.0445. The baseline project codes are publicly available at https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting. |
Tasks | Optical Flow Estimation |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.11985v3 (PDF: https://arxiv.org/pdf/1912.11985v3.pdf) |
PWC | https://paperswithcode.com/paper/spotting-macro-and-micro-expression-intervals |
Repo | https://github.com/HeyingGithub/Baseline-project-for-MEGC2020_spotting |
Framework | none |
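A heavily simplified, hedged sketch of the flow-based spotting idea: compute dense optical flow between a reference frame and the current frame, keep the flow magnitude in the dominant motion direction, and threshold the per-frame scores into candidate frames to be merged into intervals. The frame gap, 8-way direction binning, and threshold are illustrative choices, not the MDMD parameters (see the baseline repo).

```python
# Toy directional-motion spotting over a synthetic "video" (not the full MDMD method).
import numpy as np
import cv2

def motion_score(frame_a, frame_b):
    flow = cv2.calcOpticalFlowFarneback(frame_a, frame_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)   # (H, W, 2)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    bins = (ang / (2 * np.pi / 8)).astype(int) % 8                  # 8 direction bins
    main_dir = np.bincount(bins.ravel()).argmax()                   # dominant direction
    in_main = bins == main_dir
    return float(mag[in_main].mean()) if in_main.any() else 0.0

frames = [np.random.randint(0, 255, (64, 64), np.uint8) for _ in range(30)]
scores = [motion_score(frames[max(0, i - 5)], frames[i]) for i in range(len(frames))]
spotted = [i for i, s in enumerate(scores) if s > np.mean(scores) + np.std(scores)]
print(spotted)
```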
Lightweight Network Architecture for Real-Time Action Recognition
Title | Lightweight Network Architecture for Real-Time Action Recognition |
Authors | Alexander Kozlov, Vadim Andronov, Yana Gritsenko |
Abstract | In this work we present a new efficient approach to Human Action Recognition called Video Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural Language Processing and applies them to video understanding. The proposed method allows us to create lightweight CNN models that achieve high accuracy and real-time speed using just an RGB mono camera and general purpose CPU. Furthermore, we explain how to improve accuracy by distilling from multiple models with different modalities into a single model. We conduct a comparison with state-of-the-art methods and show that our approach performs on par with most of them on famous Action Recognition datasets. We benchmark the inference time of the models using the modern inference framework and argue that our approach compares favorably with other methods in terms of speed/accuracy trade-off, running at 56 FPS on CPU. The models and the training code are available. |
Tasks | Temporal Action Localization, Video Understanding |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08711v1 (PDF: https://arxiv.org/pdf/1905.08711v1.pdf) |
PWC | https://paperswithcode.com/paper/lightweight-network-architecture-for-real |
Repo | https://github.com/Chrisackerman1/Lightweight-Network-Architecture-for-Real-Time-Action-Recognition |
Framework | none |
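The abstract mentions distilling several modality-specific models into a single RGB model. The sketch below shows a generic multi-teacher distillation loss (softened teacher predictions averaged and matched by the student via KL divergence); the paper's exact recipe, temperature, and weighting may differ.

```python
# Generic multi-teacher knowledge-distillation loss (illustrative, not the VTN recipe).
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    soft_targets = torch.stack([F.softmax(t / T, dim=-1)
                                for t in teacher_logits_list]).mean(0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(8, 400)                            # e.g. 400 action classes
teachers = [torch.randn(8, 400), torch.randn(8, 400)]    # RGB and flow teachers (toy logits)
labels = torch.randint(0, 400, (8,))
print(distill_loss(student, teachers, labels).item())
```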
Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space
Title | Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space |
Authors | Quentin Mérigot, Alex Delalande, Frédéric Chazal |
Abstract | This work studies an explicit embedding of the set of probability measures into a Hilbert space, defined using optimal transport maps from a reference probability density. This embedding linearizes to some extent the 2-Wasserstein space, and enables the direct use of generic supervised and unsupervised learning algorithms on measure data. Our main result is that the embedding is (bi-)Hölder continuous when the reference density is uniform over a convex set, and can be equivalently phrased as a dimension-independent Hölder-stability result for optimal transport maps. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.05954v1 (PDF: https://arxiv.org/pdf/1910.05954v1.pdf) |
PWC | https://paperswithcode.com/paper/quantitative-stability-of-optimal-transport |
Repo | https://github.com/AlxDel/stability_ot_maps_and_linearization_wasserstein_space |
Framework | none |
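A hedged summary of the object under study, with the exponent and constants deliberately left unspecified (see the paper for the precise statement): each measure is embedded as its optimal transport map from a fixed reference density, and one direction of the (bi-)Hölder property follows immediately because the pair of maps defines a coupling.

```latex
% Reference density \rho on a convex set \Omega; T_\mu is the W_2-optimal map
% pushing \rho forward to \mu, i.e. (T_\mu)_\# \rho = \mu.
\[
  \Phi : \mathcal{P}_2(\Omega) \to L^2(\rho;\mathbb{R}^d), \qquad \Phi(\mu) = T_\mu .
\]
% One direction is immediate, since (T_\mu, T_\nu)_\# \rho couples \mu and \nu:
\[
  W_2(\mu,\nu) \;\le\; \lVert T_\mu - T_\nu \rVert_{L^2(\rho)} .
\]
% The paper's stability result is a reverse inequality of the form
% \lVert T_\mu - T_\nu \rVert_{L^2(\rho)} \le C\, W(\mu,\nu)^{\alpha}
% with a dimension-independent exponent \alpha \in (0,1]; see the paper for details.
```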
Feature Forwarding for Efficient Single Image Dehazing
Title | Feature Forwarding for Efficient Single Image Dehazing |
Authors | Peter Morales, Tzofi Klinghoffer, Seung Jae Lee |
Abstract | Haze degrades content and obscures information of images, which can negatively impact vision-based decision-making in real-time systems. In this paper, we propose an efficient fully convolutional neural network (CNN) image dehazing method designed to run on edge graphical processing units (GPUs). We utilize three variants of our architecture to explore the dependency of dehazed image quality on parameter count and model design. The first two variants presented, a small and big version, make use of a single efficient encoder-decoder convolutional feature extractor. The final variant utilizes a pair of encoder-decoders for atmospheric light and transmission map estimation. Each variant ends with an image refinement pyramid pooling network to form the final dehazed image. For the big variant of the single-encoder network, we demonstrate state-of-the-art performance on the NYU Depth dataset. For the small variant, we maintain competitive performance on the super-resolution O/I-HAZE datasets without the need for image cropping. Finally, we examine some challenges presented by the Dense-Haze dataset when leveraging CNN architectures for dehazing of dense haze imagery and examine the impact of loss function selection on image quality. Benchmarks are included to show the feasibility of introducing this approach into real-time systems. |
Tasks | Decision Making, Image Cropping, Image Dehazing, Single Image Dehazing, Super-Resolution |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09059v2 (PDF: https://arxiv.org/pdf/1904.09059v2.pdf) |
PWC | https://paperswithcode.com/paper/feature-forwarding-for-efficient-single-image |
Repo | https://github.com/pmm09c/ntire-dehazing |
Framework | pytorch |
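The third variant estimates atmospheric light and a transmission map; under the standard atmospheric scattering model I = J·t + A·(1 − t), those two quantities suffice to invert the haze, as in the short numpy sketch below. The t and A values here are placeholders for the network outputs.

```python
# Invert the atmospheric scattering model given estimated transmission t and light A.
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """J = (I - A) / max(t, t_min) + A."""
    t = np.clip(t, t_min, 1.0)[..., None]      # avoid blow-up where transmission is tiny
    return np.clip((I - A) / t + A, 0.0, 1.0)

I = np.random.rand(64, 64, 3)                  # hazy image in [0, 1]
t = np.full((64, 64), 0.6)                     # placeholder transmission map
A = np.array([0.9, 0.9, 0.9])                  # placeholder atmospheric light
print(dehaze(I, t, A).shape)                   # (64, 64, 3)
```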
Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning
Title | Revision in Continuous Space: Unsupervised Text Style Transfer without Adversarial Learning |
Authors | Dayiheng Liu, Jie Fu, Yidan Zhang, Chris Pal, Jiancheng Lv |
Abstract | Typical methods for unsupervised text style transfer often rely on two key ingredients: 1) seeking the explicit disentanglement of the content and the attributes, and 2) troublesome adversarial learning. In this paper, we show that neither of these components is indispensable. We propose a new framework that utilizes the gradients to revise the sentence in a continuous space during inference to achieve text style transfer. Our method consists of three key components: a variational auto-encoder (VAE), some attribute predictors (one for each attribute), and a content predictor. The VAE and the two types of predictors enable us to perform gradient-based optimization in the continuous space, which is mapped from sentences in a discrete space, to find the representation of a target sentence with the desired attributes and preserved content. Moreover, the proposed method naturally has the ability to simultaneously manipulate multiple fine-grained attributes, such as sentence length and the presence of specific words, when performing text style transfer tasks. Compared with previous adversarial learning based methods, the proposed method is more interpretable, controllable and easier to train. Extensive experimental studies on three popular text style transfer tasks show that the proposed method significantly outperforms five state-of-the-art methods. |
Tasks | Style Transfer, Text Style Transfer |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12304v3 (PDF: https://arxiv.org/pdf/1905.12304v3.pdf) |
PWC | https://paperswithcode.com/paper/revision-in-continuous-space-fine-grained |
Repo | https://github.com/dayihengliu/Fine-Grained-Style-Transfer |
Framework | tf |
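A hedged, toy-scale sketch of the gradient-based revision step: starting from a sentence's latent code, take gradient steps that raise an attribute predictor's score while penalizing drift in a content representation, then decode the revised code. The tiny networks, dimensions, and loss weights below are placeholders, not the paper's VAE or predictors.

```python
# Revise a latent code by gradient steps on attribute and content predictors (toy models).
import torch
import torch.nn as nn

latent_dim = 32
attribute_clf = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
content_enc = nn.Linear(latent_dim, 16)

z0 = torch.randn(1, latent_dim)                    # latent code from a VAE encoder
content_ref = content_enc(z0).detach()             # content to preserve
z = z0.clone().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(50):
    opt.zero_grad()
    attr_score = attribute_clf(z).squeeze()        # push this up (desired attribute)
    content_drift = ((content_enc(z) - content_ref) ** 2).mean()
    loss = -attr_score + 10.0 * content_drift
    loss.backward()
    opt.step()

# z now (approximately) keeps the content while carrying the target attribute;
# a VAE decoder would map it back to a sentence.
print((z - z0).norm().item())
```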
Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks
Title | Cloud Removal in Satellite Images Using Spatiotemporal Generative Networks |
Authors | Vishnu Sarukkai, Anirudh Jain, Burak Uzkent, Stefano Ermon |
Abstract | Satellite images hold great promise for continuous environmental monitoring and earth observation. Occlusions cast by clouds, however, can severely limit coverage, making ground information extraction more difficult. Existing pipelines typically perform cloud removal with simple temporal composites and hand-crafted filters. In contrast, we cast the problem of cloud removal as a conditional image synthesis challenge, and we propose a trainable spatiotemporal generator network (STGAN) to remove clouds. We train our model on a new large-scale spatiotemporal dataset that we construct, containing 97640 image pairs covering all continents. We demonstrate experimentally that the proposed STGAN model outperforms standard models and can generate realistic cloud-free images with high PSNR and SSIM values across a variety of atmospheric conditions, leading to improved performance in downstream tasks such as land cover classification. |
Tasks | Image Generation |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.06838v1 (PDF: https://arxiv.org/pdf/1912.06838v1.pdf) |
PWC | https://paperswithcode.com/paper/cloud-removal-in-satellite-images-using |
Repo | https://github.com/VSAnimator/stgan |
Framework | pytorch |
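The evaluation reports PSNR and SSIM between generated cloud-free images and ground truth; below is a minimal sketch of how such image-quality metrics are typically computed (toy arrays stand in for real satellite imagery; scikit-image ≥ 0.19 is assumed for the channel_axis argument).

```python
# Compute PSNR and SSIM for a predicted cloud-free image against ground truth.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

truth = np.random.rand(128, 128, 3)
pred = np.clip(truth + 0.05 * np.random.randn(128, 128, 3), 0, 1)
psnr = peak_signal_noise_ratio(truth, pred, data_range=1.0)
ssim = structural_similarity(truth, pred, channel_axis=-1, data_range=1.0)
print(f"PSNR={psnr:.2f} dB, SSIM={ssim:.3f}")
```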
Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks
Title | Fine-Grained Emotion Classification of Chinese Microblogs Based on Graph Convolution Networks |
Authors | Yuni Lai, Linfeng Zhang, Donghong Han, Rui Zhou, Guoren Wang |
Abstract | Microblogs are widely used to express people’s opinions and feelings in daily life. Sentiment analysis (SA) can timely detect personal sentiment polarities through analyzing text. Deep learning approaches have been broadly used in SA but still have not fully exploited syntax information. In this paper, we propose a syntax-based graph convolution network (GCN) model to enhance the understanding of diverse grammatical structures of Chinese microblogs. In addition, a pooling method based on percentile is proposed to improve the accuracy of the model. In experiments, for Chinese microblogs emotion classification categories including happiness, sadness, like, anger, disgust, fear, and surprise, the F-measure of our model reaches 82.32% and exceeds the state-of-the-art algorithm by 5.90%. The experimental results show that our model can effectively utilize the information of dependency parsing to improve the performance of emotion detection. What is more, we annotate a new dataset for Chinese emotion classification, which is open to other researchers. |
Tasks | Dependency Parsing, Emotion Classification, Sentiment Analysis |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02545v1 (PDF: https://arxiv.org/pdf/1912.02545v1.pdf) |
PWC | https://paperswithcode.com/paper/fine-grained-emotion-classification-of |
Repo | https://github.com/zhanglinfeng1997/Sentiment-Analysis-via-GCN |
Framework | pytorch |
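A hedged sketch of a syntax-aware graph convolution of the kind described above: word vectors are propagated over the dependency-parse adjacency matrix with the standard normalized-adjacency GCN update. The paper additionally proposes percentile-based pooling; plain mean pooling is used here for brevity.

```python
# One GCN layer over a dependency-parse graph (illustrative; not the paper's full model).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))                  # add self-loops
        d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))   # symmetric normalization
        return torch.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ self.lin(H))

# Toy 5-token sentence: A[i, j] = 1 if tokens i and j are linked in the parse.
A = torch.tensor([[0, 1, 0, 0, 0],
                  [1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, 0, 0, 0]], dtype=torch.float)
H = torch.randn(5, 300)                                   # e.g. pretrained word embeddings
sentence_repr = GCNLayer(300, 128)(H, A).mean(0)          # mean pooling before the classifier
print(sentence_repr.shape)                                # torch.Size([128])
```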
A Plug-in Method for Representation Factorization
Title | A Plug-in Method for Representation Factorization |
Authors | Jee Seok Yoon, Myung-Cheol Roh, Heung-Il Suk |
Abstract | In this work, we focus on decomposing the latent representations in GANs or learned feature representations in deep auto-encoders into semantically controllable factors in a semi-supervised manner, without modifying the original trained models. Specifically, we propose a Factors Decomposer-Entangler Network (FDEN) that learns to decompose a latent representation into mutually independent factors. Given a latent representation, the proposed framework draws a set of interpretable factors, each aligned to independent factors of variation, by maximizing their total correlation in an information-theoretic sense. As a plug-in method, we have applied our proposed FDEN to the existing networks of Adversarially Learned Inference and Pioneer Network and conducted computer vision tasks of image-to-image translation in semantic ways, e.g., changing styles while keeping the identity of a subject, and object classification in a few-shot learning scheme. We have also validated the effectiveness of our method with various ablation studies in qualitative, quantitative, and statistical examination. |
Tasks | Few-Shot Learning, Image-to-Image Translation, Object Classification, Style Transfer |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11088v3 (PDF: https://arxiv.org/pdf/1905.11088v3.pdf) |
PWC | https://paperswithcode.com/paper/plug-in-factorization-for-latent |
Repo | https://github.com/wltjr1007/FDEN |
Framework | tf |
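A structural, hedged sketch of the Decomposer-Entangler idea only: a decomposer splits a frozen model's latent code into several factors and an entangler must reconstruct the code from them. The paper's independence objective (total correlation) and its plug-in training details are omitted, and all dimensions are placeholders.

```python
# Decompose a latent code into factors and reassemble it (structure only, toy scale).
import torch
import torch.nn as nn

latent_dim, n_factors, factor_dim = 256, 4, 32
decomposer = nn.ModuleList([nn.Linear(latent_dim, factor_dim) for _ in range(n_factors)])
entangler = nn.Linear(n_factors * factor_dim, latent_dim)

z = torch.randn(8, latent_dim)                     # latent code from a pretrained model
factors = [f(z) for f in decomposer]               # candidate independent factors
z_rec = entangler(torch.cat(factors, dim=1))       # reassemble the original code
recon_loss = ((z_rec - z) ** 2).mean()             # the paper adds independence terms
print(recon_loss.item())
```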
FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance
Title | FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance |
Authors | Matteo Dell’Amico |
Abstract | FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data into numeric arrays and letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when few items are added. It is hierarchical: it produces a “flat” clustering which can be expanded to a tree structure, so that users can group and/or divide clusters into sub- or super-clusters when data exploration requires it. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07283v1 (PDF: https://arxiv.org/pdf/1910.07283v1.pdf) |
PWC | https://paperswithcode.com/paper/fishdbc-flexible-incremental-scalable |
Repo | https://github.com/matteodellamico/flexible-clustering |
Framework | none |
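For contrast with the abstract above, the sketch below shows the straightforward way to cluster with an arbitrary user-defined distance: build a precomputed distance matrix and hand it to a density-based clusterer. This is exactly the O(n²) pattern FISHDBC is designed to avoid; consult the repo for its own incremental API.

```python
# Baseline: arbitrary-distance clustering via a precomputed matrix (quadratic cost).
import numpy as np
from sklearn.cluster import DBSCAN

def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    dp = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return int(dp[-1])

data = ["kitten", "sitting", "mitten", "banana", "bandana", "cabana"]
D = np.array([[edit_distance(a, b) for b in data] for a in data])
labels = DBSCAN(eps=2.5, min_samples=2, metric="precomputed").fit_predict(D)
print(dict(zip(data, labels)))
```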
Pre-trained Language Model for Biomedical Question Answering
Title | Pre-trained Language Model for Biomedical Question Answering |
Authors | Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang |
Abstract | The recent success of question answering systems is largely attributed to pre-trained language models. However, as language models are mostly pre-trained on general domain corpora such as Wikipedia, they often have difficulty in understanding biomedical questions. In this paper, we investigate the performance of BioBERT, a pre-trained biomedical language model, in answering biomedical questions including factoid, list, and yes/no type questions. BioBERT uses almost the same structure across various question types and achieved the best performance in the 7th BioASQ Challenge (Task 7b, Phase B). BioBERT pre-trained on SQuAD or SQuAD 2.0 easily outperformed previous state-of-the-art models. BioBERT obtains the best performance when it uses the appropriate pre-/post-processing strategies for questions, passages, and answers. |
Tasks | Language Modelling, Question Answering |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08229v1 (PDF: https://arxiv.org/pdf/1909.08229v1.pdf) |
PWC | https://paperswithcode.com/paper/pre-trained-language-model-for-biomedical |
Repo | https://github.com/dmis-lab/bioasq-biobert |
Framework | tf |
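A hedged sketch of extractive biomedical QA with a BERT-style reader through the Hugging Face pipeline API. The checkpoint name is an assumption for illustration; the repo distributes its BioASQ-tuned models separately, and the pre-/post-processing strategies discussed in the paper are not reproduced here.

```python
# Extractive QA with a biomedical BERT reader (checkpoint name is an assumption).
from transformers import pipeline

qa = pipeline("question-answering",
              model="dmis-lab/biobert-base-cased-v1.1-squad")   # assumed checkpoint
result = qa(question="Which protein does rituximab target?",
            context="Rituximab is a chimeric monoclonal antibody that targets the "
                    "CD20 antigen found on the surface of B cells.")
print(result["answer"], result["score"])
```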