February 1, 2020

Paper Group AWR 310

Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer. Boosting Real-Time Driving Scene Parsing with Shared Semantics. Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions. ViCo: Word Embeddings from Visual Co-occurrences. EdgeCNN: Convolutional Neural Network Classification Model …

Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer

Title Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer
Authors Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran
Abstract Text style transfer is the task of changing the stylistic attributes of a text while preserving its non-stylistic content. In this work we introduce the Generative Style Transformer (GST), a new approach to rewriting sentences to a target style in the absence of parallel style corpora. GST leverages the power of both large unsupervised pre-trained language models and the Transformer. GST is part of a larger 'Delete Retrieve Generate' framework, in which we also propose a novel method of deleting style attributes from the source sentence by exploiting the inner workings of the Transformer. Our models outperform state-of-the-art systems across 5 datasets on sentiment, gender and political slant transfer. We also propose the GLEU metric as an automatic evaluation metric for style transfer, which we found to correlate better with human ratings than the predominantly used BLEU score.
Tasks Style Transfer, Text Generation, Text Style Transfer
Published 2019-08-25
URL https://arxiv.org/abs/1908.09368v1
PDF https://arxiv.org/pdf/1908.09368v1.pdf
PWC https://paperswithcode.com/paper/transforming-delete-retrieve-generate
Repo https://github.com/agaralabs/transformer-drg-style-transfer
Framework pytorch
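
As a rough illustration of the deletion step, the sketch below removes tokens whose attention scores toward a style label exceed a threshold. The paper derives such scores from a Transformer-based style classifier; the scores, threshold, and function name here are invented for illustration.

```python
# Hedged sketch of attention-based style-attribute deletion. The attention
# scores below are invented; the paper obtains them from a Transformer
# style classifier (the "inner workings of the Transformer").

def delete_style_tokens(tokens, attn_scores, threshold=0.15):
    """Drop tokens whose attention mass toward the style label is high."""
    assert len(tokens) == len(attn_scores)
    return [t for t, s in zip(tokens, attn_scores) if s < threshold]

tokens = ["the", "food", "was", "absolutely", "terrible"]
scores = [0.02, 0.05, 0.03, 0.30, 0.45]  # toy per-token attention scores
print(delete_style_tokens(tokens, scores))  # ['the', 'food', 'was']
```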

Boosting Real-Time Driving Scene Parsing with Shared Semantics

Title Boosting Real-Time Driving Scene Parsing with Shared Semantics
Authors Zhenzhen Xiang, Anbo Bao, Jie Li, Jianbo Su
Abstract Real-time scene parsing is a fundamental feature for autonomous driving vehicles with multiple cameras. In this letter we demonstrate that sharing semantics between cameras with different perspectives and overlapped views can boost the parsing performance when compared with traditional methods, which process the frames from each camera individually. Our framework is based on a deep neural network for semantic segmentation, with two kinds of additional modules for sharing and fusing semantics. On the one hand, a semantics sharing module is designed to establish the pixel-wise mapping between the input images; features as well as semantics are shared through this map to reduce duplicated workload, leading to more efficient computation. On the other hand, feature fusion modules are designed to combine different modalities of semantic features, leveraging the information from both inputs for better accuracy. To evaluate the effectiveness of the proposed framework, we apply our network to a dual-camera vision system for driving scene parsing. Experimental results show that our network outperforms the baseline method in parsing accuracy with comparable computation.
Tasks Autonomous Driving, Scene Parsing, Semantic Segmentation
Published 2019-09-16
URL https://arxiv.org/abs/1909.07038v3
PDF https://arxiv.org/pdf/1909.07038v3.pdf
PWC https://paperswithcode.com/paper/boosting-real-time-driving-scene-parsing-with
Repo https://github.com/zhenzhenxiang/SemanticsSharing
Framework pytorch
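
To make the semantics-sharing idea concrete, here is a minimal PyTorch sketch that warps one camera's feature map into another camera's view through a pixel-wise sampling grid. The identity grid and fusion-by-concatenation are stand-ins: the paper learns the actual inter-camera mapping and uses dedicated fusion modules.

```python
import torch
import torch.nn.functional as F

# Warp features from camera 2 into camera 1's view via a pixel-wise map.
# The identity grid here is a placeholder for the learned mapping.
B, C, H, W = 1, 64, 32, 64
feat_cam2 = torch.randn(B, C, H, W)

ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)   # (B, H, W, 2) in [-1, 1]
shared = F.grid_sample(feat_cam2, grid, align_corners=True)

# Naive fusion by concatenation; the paper uses dedicated fusion modules.
feat_cam1 = torch.randn(B, C, H, W)
fused = torch.cat([feat_cam1, shared], dim=1)
print(fused.shape)  # torch.Size([1, 128, 32, 64])
```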

Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions

Title Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions
Authors Weihao Xia, Zhanglin Cheng, Yujiu Yang, Jing-Hao Xue
Abstract Most state-of-the-art semantic segmentation approaches only achieve high accuracy in good conditions. In practically-common but less-discussed adverse environmental conditions, their performance can decrease enormously. Existing studies usually cast the handling of segmentation in adverse conditions as a separate post-processing step after signal restoration, making the segmentation performance largely depend on the quality of restoration. In this paper, we propose a novel deep-learning framework to tackle semantic segmentation and image restoration in adverse environmental conditions in a holistic manner. The proposed approach contains two components: Semantically-Guided Adaptation, which exploits semantic information from degraded images to refine the segmentation; and Exemplar-Guided Synthesis, which restores images from semantic label maps given degraded exemplars as the guidance. Our method cooperatively leverages the complementarity and interdependence of low-level restoration and high-level segmentation in adverse environmental conditions. Extensive experiments on various datasets demonstrate that our approach can not only improve the accuracy of semantic segmentation with degradation cues, but also boost the perceptual quality and structural similarity of image restoration with semantic guidance.
Tasks Image Restoration, Scene Parsing, Semantic Segmentation
Published 2019-11-02
URL https://arxiv.org/abs/1911.00679v3
PDF https://arxiv.org/pdf/1911.00679v3.pdf
PWC https://paperswithcode.com/paper/segment-for-restoration-restore-for
Repo https://github.com/xiaweihao/SR-Net
Framework pytorch
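
A minimal sketch of the holistic training signal, assuming standard tensor shapes: segmentation and restoration are optimized jointly so that each task can guide the other. The loss weights and shapes below are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

ce, l1 = nn.CrossEntropyLoss(), nn.L1Loss()

def cooperative_loss(seg_logits, labels, restored, clean, w_seg=1.0, w_rec=1.0):
    # seg_logits: (B, K, H, W); labels: (B, H, W); images: (B, 3, H, W).
    # Placeholder joint loss; the paper couples the tasks with dedicated
    # Semantically-Guided Adaptation and Exemplar-Guided Synthesis modules.
    return w_seg * ce(seg_logits, labels) + w_rec * l1(restored, clean)

loss = cooperative_loss(torch.randn(2, 19, 32, 32),
                        torch.randint(0, 19, (2, 32, 32)),
                        torch.rand(2, 3, 32, 32),
                        torch.rand(2, 3, 32, 32))
print(float(loss))
```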

ViCo: Word Embeddings from Visual Co-occurrences

Title ViCo: Word Embeddings from Visual Co-occurrences
Authors Tanmay Gupta, Alexander Schwing, Derek Hoiem
Abstract We propose to learn word embeddings from visual co-occurrences. Two words co-occur visually if both words apply to the same image or image region. Specifically, we extract four types of visual co-occurrences between object and attribute words from large-scale, textually-annotated visual databases like VisualGenome and ImageNet. We then train a multi-task log-bilinear model that compactly encodes word “meanings” represented by each co-occurrence type into a single visual word-vector. Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone. We further evaluate our embeddings on five downstream applications, four of which are vision-language tasks. Augmenting GloVe with our embeddings yields gains on all tasks. We also find that random embeddings perform comparably to learned embeddings on all supervised vision-language tasks, contrary to conventional wisdom.
Tasks Word Embeddings
Published 2019-08-22
URL https://arxiv.org/abs/1908.08527v1
PDF https://arxiv.org/pdf/1908.08527v1.pdf
PWC https://paperswithcode.com/paper/vico-word-embeddings-from-visual-co
Repo https://github.com/BigRedT/vico
Framework none
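
The multi-task log-bilinear model is close in spirit to GloVe. The toy fit below, with invented sizes and a single fake co-occurrence matrix, shows the core objective of matching dot products of word vectors to log co-occurrence counts.

```python
import torch

# Toy log-bilinear fit: w_i . w_j + b_i + b_j ~ log X_ij. A single random
# co-occurrence matrix stands in for ViCo's four visual co-occurrence types.
V, D = 100, 16
X = torch.randint(1, 50, (V, V)).float()
w = torch.randn(V, D, requires_grad=True)
b = torch.zeros(V, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)

for step in range(200):
    pred = w @ w.t() + b[:, None] + b[None, :]
    loss = ((pred - X.log()) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))  # decreases as the embeddings absorb the counts
```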

EdgeCNN: Convolutional Neural Network Classification Model with small inputs for Edge Computing

Title EdgeCNN: Convolutional Neural Network Classification Model with small inputs for Edge Computing
Authors Shunzhi Yang, Zheng Gong, Kai Ye, Yungen Wei, Zheng Huang, Zhenhua Huang
Abstract With the development of the Internet of Things (IoT), data increasingly appears at the edge of the network. Processing tasks at the edge can effectively mitigate personal privacy leaks and server overload, so the topic has attracted a great deal of attention and made substantial progress, including efficient convolutional neural network (CNN) models such as MobileNet and ShuffleNet. However, these are general-purpose models that usually must identify multiple targets, so they require large inputs. In some cases only a single target needs to be classified, and a network with small inputs can be designed to reduce computation. In addition, other efficient neural network models are designed primarily for mobile phones, whose faster memory access allows them to use group convolution. In particular, this paper finds that the recently popular group convolution is not suitable for devices with very slow memory access. Therefore, the EdgeCNN of this paper is designed for edge computing devices with low memory access speed and low computing resources. EdgeCNN runs successfully on the Raspberry Pi 3B+ at 1.37 frames per second. Its accuracy on facial expression classification for the FER-2013 and RAF-DB datasets outperforms other proposed networks that are compatible with the Raspberry Pi 3B+. The implementation of EdgeCNN is available at https://github.com/yangshunzhi1994/EdgeCNN
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13522v1
PDF https://arxiv.org/pdf/1909.13522v1.pdf
PWC https://paperswithcode.com/paper/edgecnn-convolutional-neural-network
Repo https://github.com/yangshunzhi1994/EdgeCNN
Framework pytorch
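
For intuition, here is a hedged sketch of a small-input classifier built only from standard (non-group) convolutions, matching the paper's argument against group convolution on slow-memory devices. The exact EdgeCNN layout lives in the linked repo; this layer stack is illustrative.

```python
import torch
import torch.nn as nn

# Illustrative small-input CNN; FER-2013 images are 48x48 grayscale with
# 7 expression classes. Only standard convolutions, no group convolution.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 24
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 7),
)
x = torch.randn(1, 1, 48, 48)
print(model(x).shape)  # torch.Size([1, 7])
```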

KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Title KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
Authors Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter
Abstract In this paper, we describe KT-Speech-Crawler: an approach for automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps, which extract samples that can be used for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, with an estimated 3.5% word error rate in the transcriptions. The automatically collected samples contain read and spontaneous speech recorded in various conditions, including background noise and music, distant microphone recordings, and a variety of accents and reverberation. When training a deep neural network on speech recognition, we observed around 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me/) and the crawler code (https://github.com/EgorLakomkin/KTSpeechCrawler) are publicly available.
Tasks Speech Recognition
Published 2019-03-01
URL http://arxiv.org/abs/1903.00216v1
PDF http://arxiv.org/pdf/1903.00216v1.pdf
PWC https://paperswithcode.com/paper/kt-speech-crawler-automatic-dataset
Repo https://github.com/EgorLakomkin/KTSpeechCrawler
Framework none
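
The filtering steps might look roughly like the sketch below. The thresholds and rules are assumptions for illustration, not the crawler's exact heuristics (those are in the linked repo).

```python
import re

# Assumed caption filters: duration bounds, non-speech annotations such as
# [Music], and simple length/character checks. All thresholds are invented.
def keep_caption(text, duration_s):
    if not 1.0 <= duration_s <= 10.0:
        return False
    if re.search(r"[\[\(](music|applause)", text, re.I):
        return False
    words = text.split()
    return 1 <= len(words) <= 20 and all(w.isascii() for w in words)

print(keep_caption("hello world", 2.5))  # True
print(keep_caption("[Music]", 3.0))      # False
```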

Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling

Title Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling
Authors Jan Kudlicka, Lawrence M. Murray, Fredrik Ronquist, Thomas B. Schön
Abstract We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probabilistic programming languages (PPLs) give phylogeneticists a new and exciting tool: their models can be implemented as probabilistic programs with just a basic knowledge of programming. The general inference methods in PPLs reduce the need for external experts, allow quick prototyping and testing, and accelerate the development and deployment of new models. We show how these birth-death models can be implemented as simple programs in existing PPLs, and demonstrate the usefulness of the proposed inference method for such models. For the popular BiSSE model the method yields an increase of the effective sample size and the conditional acceptance rate by a factor of 30 in comparison with a standard bootstrap particle filter. Although concentrating on phylogenetics, the extended APF is a general inference method that shows its strength in situations where particles are often assigned zero weight. In the case when the weights are always positive, the extra cost of using the APF rather than the bootstrap particle filter is negligible, making our method a suitable drop-in replacement for the bootstrap particle filter in probabilistic programming inference.
Tasks Probabilistic Programming
Published 2019-07-10
URL https://arxiv.org/abs/1907.04615v2
PDF https://arxiv.org/pdf/1907.04615v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-programming-for-birth-death
Repo https://github.com/kudlicka/paper-2019-probabilistic
Framework none
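
The alive-particle-filter idea can be sketched in a few lines: at each step, keep proposing offspring until N particles with strictly positive weight survive, so the filter never collapses when many proposals get zero weight. The toy propagate/weight functions below stand in for a real birth-death model; the full algorithm also draws one extra particle to keep its likelihood estimate unbiased.

```python
import random

def propagate(x):                 # stand-in transition kernel
    return x + random.gauss(0, 1)

def weight(x):                    # zero weight outside a region
    return 1.0 if abs(x) < 2.0 else 0.0

def apf_step(particles, n):
    alive = []
    while len(alive) < n:         # reject until n alive particles survive
        child = propagate(random.choice(particles))
        if weight(child) > 0:
            alive.append(child)
    return alive

ps = [0.0] * 100
for _ in range(10):
    ps = apf_step(ps, 100)
print(len(ps))  # 100 alive particles after every step
```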

Rotation Invariant Householder Parameterization for Bayesian PCA

Title Rotation Invariant Householder Parameterization for Bayesian PCA
Authors Rajbir S. Nirwan, Nils Bertschinger
Abstract We consider probabilistic PCA and related factor models from a Bayesian perspective. These models are in general not identifiable, as the likelihood has a rotational symmetry. This gives rise to complicated posterior distributions with continuous subspaces of equal density, and thus hinders efficiency of inference as well as interpretation of the obtained parameters. In particular, posterior averages over factor loadings become meaningless and only model predictions are unambiguous. Here, we propose a parameterization based on Householder transformations, which remove the rotational symmetry of the posterior. Furthermore, by relying on results from random matrix theory, we establish the parameter distribution which leaves the model unchanged compared to the original rotationally symmetric formulation. In particular, we avoid the need to compute the Jacobian determinant of the parameter transformation. This allows us to efficiently implement probabilistic PCA in a rotation-invariant fashion in any state-of-the-art toolbox. We implement our model in the probabilistic programming language Stan and illustrate it on several examples.
Tasks Probabilistic Programming
Published 2019-05-12
URL https://arxiv.org/abs/1905.04720v1
PDF https://arxiv.org/pdf/1905.04720v1.pdf
PWC https://paperswithcode.com/paper/rotation-invariant-householder
Repo https://github.com/RSNirwan/HouseholderBPCA
Framework none
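
The Householder construction itself is compact. This NumPy sketch builds an orthonormal loading matrix as a product of reflections H_i = I - 2 v_i v_i^T / ||v_i||^2, with the vectors v_i as the free parameters; the paper's Stan implementation uses vectors of decreasing dimension, which this simplified version glosses over.

```python
import numpy as np

def householder(v):
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d, q = 5, 3
U = np.eye(d)
for _ in range(q):                      # product of q reflections
    U = U @ householder(rng.normal(size=d))

W = U[:, :q]                            # orthonormal loading directions
print(np.allclose(W.T @ W, np.eye(q)))  # True: columns are orthonormal
```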

Advancing NLP with Cognitive Language Processing Signals

Title Advancing NLP with Cognitive Language Processing Signals
Authors Nora Hollenstein, Maria Barrett, Marius Troendle, Francesco Bigiolli, Nicolas Langer, Ce Zhang
Abstract When we read, our brain processes language and generates cognitive processing data such as gaze patterns and brain activity. These signals can be recorded while reading. Cognitive language processing data such as eye-tracking features have shown improvements on single NLP tasks. We analyze whether using such human features can show consistent improvement across tasks and data sources. We present an extensive investigation of the benefits and limitations of using cognitive processing data for NLP. Specifically, we use gaze and EEG features to augment models of named entity recognition, relation classification, and sentiment analysis. These methods significantly outperform the baselines and show the potential and current limitations of employing human language processing data for NLP.
Tasks EEG, Eye Tracking, Named Entity Recognition, Relation Classification, Sentiment Analysis
Published 2019-04-04
URL http://arxiv.org/abs/1904.02682v1
PDF http://arxiv.org/pdf/1904.02682v1.pdf
PWC https://paperswithcode.com/paper/advancing-nlp-with-cognitive-language
Repo https://github.com/DS3Lab/zuco-nlp
Framework none
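
A simple way to picture the augmentation, under assumed dimensions: per-token gaze/EEG feature vectors are concatenated to word embeddings before a sequence tagger. All names and sizes below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CognitiveAugmentedTagger(nn.Module):
    """Toy tagger: word embeddings concatenated with cognitive features."""
    def __init__(self, vocab=1000, emb=100, cog=5, hidden=64, tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb + cog, hidden, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden, tags)

    def forward(self, token_ids, cog_feats):
        x = torch.cat([self.emb(token_ids), cog_feats], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)

model = CognitiveAugmentedTagger()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randn(2, 7, 5))
print(logits.shape)  # torch.Size([2, 7, 9])
```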

Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search

Title Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
Authors Xiangxiang Chu, Bo Zhang, Hailong Ma, Ruijun Xu, Jixiang Li, Qingyuan Li
Abstract Deep convolutional neural networks demonstrate impressive results in the super-resolution domain. A series of studies concentrate on improving peak signal-to-noise ratio (PSNR) by using much deeper layers, which are not friendly to constrained resources, and pursuing a trade-off between restoration capacity and model simplicity remains non-trivial. Recent contributions struggle to strike this balance manually, while our work achieves the same goal automatically with neural architecture search. Specifically, we handle super-resolution with a multi-objective approach and propose an elastic search tactic at both the micro and macro levels, based on a hybrid controller that profits from evolutionary computation and reinforcement learning. Quantitative experiments show that our generated models dominate most state-of-the-art methods with respect to FLOPS.
Tasks Neural Architecture Search, Super-Resolution
Published 2019-01-22
URL http://arxiv.org/abs/1901.07261v2
PDF http://arxiv.org/pdf/1901.07261v2.pdf
PWC https://paperswithcode.com/paper/fast-accurate-and-lightweight-super
Repo https://github.com/falsr/FALSR
Framework tf
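
The multi-objective selection reduces to Pareto dominance over (accuracy up, cost down). The toy filter below, with invented PSNR/FLOPS numbers, keeps exactly the non-dominated candidates.

```python
# Toy Pareto front over two objectives: maximize PSNR, minimize FLOPS.
# The candidate values are invented for illustration.
def dominates(a, b):
    return (a["psnr"] >= b["psnr"] and a["flops"] <= b["flops"]
            and (a["psnr"] > b["psnr"] or a["flops"] < b["flops"]))

cands = [
    {"name": "A", "psnr": 37.1, "flops": 120},
    {"name": "B", "psnr": 36.9, "flops": 300},
    {"name": "C", "psnr": 37.3, "flops": 400},
]
front = [c for c in cands if not any(dominates(o, c) for o in cands)]
print([c["name"] for c in front])  # ['A', 'C']; B is dominated by A
```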

Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge

Title Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge
Authors Filippo Aleotti, Matteo Poggi, Fabio Tosi, Stefano Mattoccia
Abstract Scene flow is a challenging task aimed at jointly estimating the 3D structure and motion of the sensed environment. Although deep learning solutions achieve outstanding performance in terms of accuracy, these approaches divide the whole problem into standalone tasks (stereo and optical flow) and address them with independent networks. Such a strategy dramatically increases the complexity of the training procedure and requires power-hungry GPUs to infer scene flow at barely 1 FPS. Conversely, we propose DWARF, a novel and lightweight architecture able to infer full scene flow by jointly reasoning about depth and optical flow, easily and elegantly trainable end-to-end from scratch. Moreover, since ground truth for full scene flow is scarce, we propose to leverage the knowledge learned by networks specialized in stereo or flow, for which much more data are available, to distill proxy annotations. Exhaustive experiments show that i) DWARF runs at about 10 FPS on a single high-end GPU and about 1 FPS on an embedded NVIDIA Jetson TX2 at KITTI resolution, with a moderate drop in accuracy compared to 10x deeper models, and ii) learning from many distilled samples is more effective than from the few annotated ones available. Code available at: https://github.com/FilippoAleotti/Dwarf-Tensorflow
Tasks Optical Flow Estimation
Published 2019-11-22
URL https://arxiv.org/abs/1911.10090v1
PDF https://arxiv.org/pdf/1911.10090v1.pdf
PWC https://paperswithcode.com/paper/learning-end-to-end-scene-flow-by-distilling
Repo https://github.com/FilippoAleotti/Dwarf-Tensorflow
Framework tf
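
The distillation idea can be stated in a few lines, assuming dense teacher predictions are available: where scene-flow ground truth is missing, the joint network is supervised with proxy labels from stereo and optical-flow specialists. The random tensors below stand in for real predictions.

```python
import torch
import torch.nn.functional as F

student_disp = torch.randn(1, 1, 64, 64)   # joint net's disparity (stand-in)
student_flow = torch.randn(1, 2, 64, 64)   # joint net's flow (stand-in)
teacher_disp = torch.randn(1, 1, 64, 64)   # proxy labels from a stereo net
teacher_flow = torch.randn(1, 2, 64, 64)   # proxy labels from a flow net

distill_loss = (F.l1_loss(student_disp, teacher_disp)
                + F.l1_loss(student_flow, teacher_flow))
print(float(distill_loss))
```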

Diffusion Improves Graph Learning

Title Diffusion Improves Graph Learning
Authors Johannes Klicpera, Stefan Weißenberger, Stephan Günnemann
Abstract Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g. spectral clustering) without requiring any changes to the latter or affecting its computational complexity. Our implementation is available online.
Tasks Node Classification
Published 2019-10-28
URL https://arxiv.org/abs/1911.05485v5
PDF https://arxiv.org/pdf/1911.05485v5.pdf
PWC https://paperswithcode.com/paper/diffusion-improves-graph-learning-1
Repo https://github.com/rusty1s/pytorch_geometric
Framework pytorch
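
The personalized-PageRank instance of GDC has a closed form, S = alpha (I - (1 - alpha) T)^(-1) with transition matrix T, followed by sparsification. The small NumPy sketch below uses an invented 4-node graph and an illustrative threshold.

```python
import numpy as np

# Graph diffusion via personalized PageRank, then sparsify small entries.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
T = A / A.sum(axis=0, keepdims=True)   # column-stochastic transition matrix
alpha = 0.15
S = alpha * np.linalg.inv(np.eye(4) - (1 - alpha) * T)
S[S < 0.05] = 0.0                      # threshold value is illustrative
print(np.round(S, 3))                  # dense diffusion turned sparse again
```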

On Symmetric Losses for Learning from Corrupted Labels

Title On Symmetric Losses for Learning from Corrupted Labels
Authors Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
Abstract This paper aims to provide a better understanding of a symmetric loss. First, we emphasize that using a symmetric loss is advantageous in the balanced error rate (BER) minimization and area under the receiver operating characteristic curve (AUC) maximization from corrupted labels. Second, we prove general theoretical properties of symmetric losses, including a classification-calibration condition, excess risk bound, conditional risk minimizer, and AUC-consistency condition. Third, since all nonnegative symmetric losses are non-convex, we propose a convex barrier hinge loss that benefits significantly from the symmetric condition, although it is not symmetric everywhere. Finally, we conduct experiments to validate the relevance of the symmetric condition.
Tasks Calibration
Published 2019-01-27
URL https://arxiv.org/abs/1901.09314v2
PDF https://arxiv.org/pdf/1901.09314v2.pdf
PWC https://paperswithcode.com/paper/on-symmetric-losses-for-learning-from
Repo https://github.com/nolfwin/symloss-ber-auc
Framework pytorch
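
The symmetric condition itself is easy to check numerically: a margin loss l is symmetric when l(z) + l(-z) is constant. The sigmoid loss satisfies this with constant 1, while the hinge loss does not, as the sketch below verifies; the paper's barrier hinge loss is omitted here, since its exact form is best taken from the paper.

```python
import numpy as np

def sigmoid_loss(z):
    return 1.0 / (1.0 + np.exp(z))

def hinge_loss(z):
    return np.maximum(0.0, 1.0 - z)

z = np.linspace(-3, 3, 7)
print(sigmoid_loss(z) + sigmoid_loss(-z))  # all ones: symmetric
print(hinge_loss(z) + hinge_loss(-z))      # varies with z: not symmetric
```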

Left-to-Right Dependency Parsing with Pointer Networks

Title Left-to-Right Dependency Parsing with Pointer Networks
Authors Daniel Fernández-González, Carlos Gómez-Rodríguez
Abstract We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building $n$ attachments, with $n$ being the length of the input sentence. Similarly to the recent stack-pointer parser by Ma et al. (2018), we use the pointer network framework that, given a word, can directly point to a position from the sentence. However, our left-to-right approach is simpler than the original top-down stack-pointer parser (not requiring a stack) and reduces transition sequence length in half, from 2$n$-1 actions to $n$. This results in a quadratic non-projective parser that runs twice as fast as the original while achieving the best accuracy to date on the English PTB dataset (96.04% UAS, 94.43% LAS) among fully-supervised single-model dependency parsers, and improves over the former top-down transition system in the majority of languages tested.
Tasks Dependency Parsing
Published 2019-03-20
URL http://arxiv.org/abs/1903.08445v1
PDF http://arxiv.org/pdf/1903.08445v1.pdf
PWC https://paperswithcode.com/paper/left-to-right-dependency-parsing-with-pointer
Repo https://github.com/danifg/Left2Right-Pointer-Parser
Framework pytorch
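
Decoding then takes exactly n pointer actions, one attachment per word read left to right. The toy below uses a random bilinear scorer as a stand-in for the trained pointer network, so only the control flow is meaningful.

```python
import torch

n, d = 6, 32
word_repr = torch.randn(n + 1, d)   # position 0 plays the role of ROOT
scorer = torch.randn(d, d)          # stand-in for learned attention weights

heads = []
for i in range(1, n + 1):           # n attachments for n words
    scores = word_repr @ scorer @ word_repr[i]
    scores[i] = float("-inf")       # a word cannot be its own head
    heads.append(int(scores.argmax()))
print(heads)                        # pointed-to head position per word
```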

Deep Semi-Supervised Anomaly Detection

Title Deep Semi-Supervised Anomaly Detection
Authors Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, Marius Kloft
Abstract Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically, anomaly detection is treated as an unsupervised learning problem. In practice, however, one may have access not only to a large set of unlabeled samples but also to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation of our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par with or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.
Tasks Anomaly Detection
Published 2019-06-06
URL https://arxiv.org/abs/1906.02694v2
PDF https://arxiv.org/pdf/1906.02694v2.pdf
PWC https://paperswithcode.com/paper/deep-semi-supervised-anomaly-detection
Repo https://github.com/pg2455/AudioAge
Framework none
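
For reference, the Deep SAD objective itself is compact: unlabeled points (and labeled normals, y = +1) are pulled toward a fixed center c, while labeled anomalies (y = -1) are pushed away through an inverse distance penalty weighted by eta. The sketch below uses random embeddings in place of a trained network phi.

```python
import torch

def deep_sad_loss(z_unlab, z_lab, y_lab, c, eta=1.0, eps=1e-6):
    # Distances to the center; eps guards the inverse term for anomalies.
    d_unlab = ((z_unlab - c) ** 2).sum(dim=1)
    d_lab = ((z_lab - c) ** 2).sum(dim=1) + eps
    # y = +1 keeps the distance, y = -1 inverts it (pushes anomalies out).
    return d_unlab.mean() + eta * (d_lab ** y_lab.float()).mean()

c = torch.zeros(8)
loss = deep_sad_loss(torch.randn(16, 8), torch.randn(4, 8),
                     torch.tensor([1, 1, -1, -1]), c)
print(float(loss))
```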