Paper Group AWR 310
Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer
Title | Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer
Authors | Akhilesh Sudhakar, Bhargav Upadhyay, Arjun Maheswaran
Abstract | Text style transfer is the task of transferring the style of text having certain stylistic attributes, while preserving non-stylistic or content information. In this work we introduce the Generative Style Transformer (GST) - a new approach to rewriting sentences to a target style in the absence of parallel style corpora. GST leverages the power of both large unsupervised pre-trained language models and the Transformer. GST is part of a larger 'Delete Retrieve Generate' framework, in which we also propose a novel method of deleting style attributes from the source sentence by exploiting the inner workings of the Transformer. Our models outperform state-of-the-art systems across 5 datasets on sentiment, gender and political slant transfer. We also propose the GLEU metric as an automatic evaluation metric for style transfer, which we found correlates better with human ratings than the predominantly used BLEU score.
Tasks | Style Transfer, Text Generation, Text Style Transfer
Published | 2019-08-25
URL | https://arxiv.org/abs/1908.09368v1 (PDF: https://arxiv.org/pdf/1908.09368v1.pdf)
PWC | https://paperswithcode.com/paper/transforming-delete-retrieve-generate
Repo | https://github.com/agaralabs/transformer-drg-style-transfer
Framework | pytorch
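The attribute-deletion idea lends itself to a short illustration. The sketch below scores tokens by the attention they receive in a plain pre-trained BERT and deletes the highest-scoring ones; the paper instead uses attention from a Transformer-based style classifier, so the model choice and the fixed deletion fraction here are assumptions.

```python
# Hedged sketch of attention-based style-attribute deletion. Assumption:
# plain bert-base-uncased stands in for the paper's fine-tuned classifier.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

def delete_style_tokens(sentence, frac=0.2):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        attn = model(**inputs).attentions          # tuple of (1, heads, seq, seq)
    # Attention each token *receives*, averaged over layers and heads.
    received = torch.stack(attn).mean(dim=(0, 2)).squeeze(0).sum(dim=0)
    ids = inputs["input_ids"][0].tolist()
    special = {tokenizer.cls_token_id, tokenizer.sep_token_id}
    received[[i for i, t in enumerate(ids) if t in special]] = float("-inf")
    k = max(1, int(frac * len(ids)))
    drop = set(received.topk(k).indices.tolist())
    tokens = tokenizer.convert_ids_to_tokens(ids)
    kept = [t for i, t in enumerate(tokens)
            if i not in drop and ids[i] not in special]
    return tokenizer.convert_tokens_to_string(kept)

print(delete_style_tokens("the food was absolutely delicious"))
```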
Boosting Real-Time Driving Scene Parsing with Shared Semantics
Title | Boosting Real-Time Driving Scene Parsing with Shared Semantics
Authors | Zhenzhen Xiang, Anbo Bao, Jie Li, Jianbo Su
Abstract | Real-time scene parsing is a fundamental feature for autonomous driving vehicles with multiple cameras. In this letter we demonstrate that sharing semantics between cameras with different perspectives and overlapped views can boost parsing performance when compared with traditional methods, which process the frames from each camera individually. Our framework is based on a deep neural network for semantic segmentation, extended with two kinds of additional modules for sharing and fusing semantics. On the one hand, a semantics sharing module is designed to establish the pixel-wise mapping between the input images. Features as well as semantics are shared via this map to reduce duplicated workload, which leads to more efficient computation. On the other hand, feature fusion modules are designed to combine different modalities of semantic features, leveraging the information from both inputs for better accuracy. To evaluate the effectiveness of the proposed framework, we applied our network to a dual-camera vision system for driving scene parsing. Experimental results show that our network outperforms the baseline method on parsing accuracy with comparable computation.
Tasks | Autonomous Driving, Scene Parsing, Semantic Segmentation
Published | 2019-09-16
URL | https://arxiv.org/abs/1909.07038v3 (PDF: https://arxiv.org/pdf/1909.07038v3.pdf)
PWC | https://paperswithcode.com/paper/boosting-real-time-driving-scene-parsing-with
Repo | https://github.com/zhenzhenxiang/SemanticsSharing
Framework | pytorch
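As a rough illustration of the two module types, the sketch below warps auxiliary-camera features into the main view with a precomputed pixel-wise sampling grid and fuses them with a small convolutional block. The paper derives the mapping from camera geometry, which is omitted here, and the fusion layers are placeholders.

```python
# Minimal sketch: semantics sharing via feature warping, then fusion.
# Assumption: the pixel-wise mapping is given as a grid_sample grid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShareAndFuse(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Fusion module: combines native and warped features.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_main, feat_aux, grid):
        # Semantics sharing: warp auxiliary-camera features into the
        # main camera's view with the pixel-wise mapping `grid`.
        warped = F.grid_sample(feat_aux, grid, align_corners=False)
        return self.fuse(torch.cat([feat_main, warped], dim=1))

feat_a = torch.randn(1, 64, 32, 64)
feat_b = torch.randn(1, 64, 32, 64)
grid = torch.rand(1, 32, 64, 2) * 2 - 1   # normalized (x, y) sample locations
print(ShareAndFuse()(feat_a, feat_b, grid).shape)  # torch.Size([1, 64, 32, 64])
```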
Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions
Title | Cooperative Semantic Segmentation and Image Restoration in Adverse Environmental Conditions
Authors | Weihao Xia, Zhanglin Cheng, Yujiu Yang, Jing-Hao Xue
Abstract | Most state-of-the-art semantic segmentation approaches only achieve high accuracy in good conditions. In practically-common but less-discussed adverse environmental conditions, their performance can decrease enormously. Existing studies usually cast the handling of segmentation in adverse conditions as a separate post-processing step after signal restoration, making the segmentation performance largely depend on the quality of restoration. In this paper, we propose a novel deep-learning framework to tackle semantic segmentation and image restoration in adverse environmental conditions in a holistic manner. The proposed approach contains two components: Semantically-Guided Adaptation, which exploits semantic information from degraded images to refine the segmentation; and Exemplar-Guided Synthesis, which restores images from semantic label maps given degraded exemplars as the guidance. Our method cooperatively leverages the complementarity and interdependence of low-level restoration and high-level segmentation in adverse environmental conditions. Extensive experiments on various datasets demonstrate that our approach can not only improve the accuracy of semantic segmentation with degradation cues, but also boost the perceptual quality and structural similarity of image restoration with semantic guidance.
Tasks | Image Restoration, Scene Parsing, Semantic Segmentation
Published | 2019-11-02
URL | https://arxiv.org/abs/1911.00679v3 (PDF: https://arxiv.org/pdf/1911.00679v3.pdf)
PWC | https://paperswithcode.com/paper/segment-for-restoration-restore-for
Repo | https://github.com/xiaweihao/SR-Net
Framework | pytorch
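At a schematic level, the cooperation can be pictured as a segmentation branch whose output label map conditions a restoration branch. The toy wiring below uses stand-in networks (the channel counts and the 19-class Cityscapes-style output are assumptions) purely to show the data flow, not the paper's architecture.

```python
# Schematic sketch of the cooperative setup; both networks are stand-ins.
import torch
import torch.nn as nn

segmenter = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 19, 1))     # 19 = Cityscapes classes
restorer = nn.Sequential(nn.Conv2d(19 + 3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 1))

degraded = torch.randn(1, 3, 64, 64)
logits = segmenter(degraded)              # segmentation refined from degraded input
restored = restorer(torch.cat([logits, degraded], dim=1))  # synthesis guided by labels
print(logits.shape, restored.shape)
```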
ViCo: Word Embeddings from Visual Co-occurrences
Title | ViCo: Word Embeddings from Visual Co-occurrences
Authors | Tanmay Gupta, Alexander Schwing, Derek Hoiem
Abstract | We propose to learn word embeddings from visual co-occurrences. Two words co-occur visually if both words apply to the same image or image region. Specifically, we extract four types of visual co-occurrences between object and attribute words from large-scale, textually-annotated visual databases like VisualGenome and ImageNet. We then train a multi-task log-bilinear model that compactly encodes word "meanings" represented by each co-occurrence type into a single visual word-vector. Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone. We further evaluate our embeddings on five downstream applications, four of which are vision-language tasks. Augmenting GloVe with our embeddings yields gains on all tasks. We also find that random embeddings perform comparably to learned embeddings on all supervised vision-language tasks, contrary to conventional wisdom.
Tasks | Word Embeddings
Published | 2019-08-22
URL | https://arxiv.org/abs/1908.08527v1 (PDF: https://arxiv.org/pdf/1908.08527v1.pdf)
PWC | https://paperswithcode.com/paper/vico-word-embeddings-from-visual-co
Repo | https://github.com/BigRedT/vico
Framework | none
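The multi-task log-bilinear objective is easiest to see in a single-task toy form: embed words so that dot products (plus biases) regress log co-occurrence counts, GloVe-style. The counts below are random placeholders, and the real model fits four co-occurrence types jointly.

```python
# Toy single-task log-bilinear fit to (fake) visual co-occurrence counts.
import torch

vocab, dim = 1000, 50
counts = torch.randint(1, 100, (vocab, vocab)).float()  # placeholder counts
w = torch.randn(vocab, dim, requires_grad=True)
b = torch.zeros(vocab, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)

for step in range(200):
    pred = w @ w.t() + b[:, None] + b[None, :]   # dot products + biases
    loss = ((pred - counts.log()) ** 2).mean()   # regress log counts
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```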
EdgeCNN: Convolutional Neural Network Classification Model with small inputs for Edge Computing
Title | EdgeCNN: Convolutional Neural Network Classification Model with small inputs for Edge Computing
Authors | Shunzhi Yang, Zheng Gong, Kai Ye, Yungen Wei, Zheng Huang, Zhenhua Huang
Abstract | With the development of the Internet of Things (IoT), data increasingly appears at the edge of the network. Processing tasks at the network edge can effectively mitigate personal privacy leaks and server overload, and has therefore attracted a great deal of attention and made substantial progress. This progress includes efficient convolutional neural network (CNN) models such as MobileNet and ShuffleNet. However, these are general-purpose models that usually need to identify multiple targets when applied, so their input sizes are large. In some specific cases only a single target needs to be classified, and a small-input network can be designed to reduce computation. In addition, other efficient neural network models are primarily designed for mobile phones, whose faster memory access allows them to use group convolution. In particular, this paper finds that the recently popular group convolution is not suitable for devices with very slow memory access. EdgeCNN is therefore designed for edge computing devices with low memory access speed and limited computing resources. EdgeCNN runs successfully on the Raspberry Pi 3B+ at 1.37 frames per second, and its facial expression classification accuracy on the FER-2013 and RAF-DB datasets outperforms other networks compatible with the Raspberry Pi 3B+. The implementation of EdgeCNN is available at https://github.com/yangshunzhi1994/EdgeCNN
Tasks |
Published | 2019-09-30
URL | https://arxiv.org/abs/1909.13522v1 (PDF: https://arxiv.org/pdf/1909.13522v1.pdf)
PWC | https://paperswithcode.com/paper/edgecnn-convolutional-neural-network
Repo | https://github.com/yangshunzhi1994/EdgeCNN
Framework | pytorch
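In the spirit of the paper's small-input design, here is a hedged sketch of a compact classifier for 48x48 grayscale inputs (the FER-2013 format) that uses only standard, non-group convolutions; all layer sizes are illustrative rather than EdgeCNN's actual configuration.

```python
# Illustrative small-input CNN with standard (non-group) convolutions only.
import torch
import torch.nn as nn

class SmallInputCNN(nn.Module):
    def __init__(self, num_classes=7):  # 7 expression classes in FER-2013
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # 24 -> 12
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

print(SmallInputCNN()(torch.randn(1, 1, 48, 48)).shape)  # torch.Size([1, 7])
```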
KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
Title | KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
Authors | Egor Lakomkin, Sven Magg, Cornelius Weber, Stefan Wermter
Abstract | In this paper, we describe KT-Speech-Crawler: an approach to automatic dataset construction for speech recognition by crawling YouTube videos. We outline several filtering and post-processing steps that extract samples suitable for training end-to-end neural speech recognition systems. In our experiments, we demonstrate that a single-core version of the crawler can obtain around 150 hours of transcribed speech within a day, with an estimated 3.5% word error rate in the transcriptions. The automatically collected samples contain read and spontaneous speech recorded in various conditions, including background noise and music, distant-microphone recordings, and a variety of accents and reverberation. When training a deep neural network for speech recognition, we observed around a 40% word error rate reduction on the Wall Street Journal dataset by integrating 200 hours of the collected samples into the training set. The demo (http://emnlp-demo.lakomkin.me/) and the crawler code (https://github.com/EgorLakomkin/KTSpeechCrawler) are publicly available.
Tasks | Speech Recognition
Published | 2019-03-01
URL | http://arxiv.org/abs/1903.00216v1 (PDF: http://arxiv.org/pdf/1903.00216v1.pdf)
PWC | https://paperswithcode.com/paper/kt-speech-crawler-automatic-dataset
Repo | https://github.com/EgorLakomkin/KTSpeechCrawler
Framework | none
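A filtering step of the kind described might look like the sketch below, which rejects captions with non-speech annotations, digits, or non-alphabetic characters; these concrete rules are plausible examples, not the exact ones in the repository.

```python
# Illustrative caption filter of the kind a speech-data crawler applies.
import re

def accept_caption(text, max_words=20):
    text = text.strip().lower()
    if not text or len(text.split()) > max_words:
        return False
    if re.search(r"\[(music|applause|laughter)\]", text):
        return False            # non-speech annotations
    if re.search(r"\d", text):
        return False            # digits have ambiguous verbalizations
    return bool(re.fullmatch(r"[a-z' ,.?!-]+", text))

print(accept_caption("hello and welcome back to the channel"))  # True
print(accept_caption("[music] intro"))                          # False
```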
Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling
Title | Probabilistic programming for birth-death models of evolution using an alive particle filter with delayed sampling
Authors | Jan Kudlicka, Lawrence M. Murray, Fredrik Ronquist, Thomas B. Schön
Abstract | We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probabilistic programming languages (PPLs) give phylogeneticists a new and exciting tool: their models can be implemented as probabilistic programs with just a basic knowledge of programming. The general inference methods in PPLs reduce the need for external experts, allow quick prototyping and testing, and accelerate the development and deployment of new models. We show how these birth-death models can be implemented as simple programs in existing PPLs, and demonstrate the usefulness of the proposed inference method for such models. For the popular BiSSE model the method yields an increase of the effective sample size and the conditional acceptance rate by a factor of 30 in comparison with a standard bootstrap particle filter. Although concentrating on phylogenetics, the extended APF is a general inference method that shows its strength in situations where particles are often assigned zero weight. In the case when the weights are always positive, the extra cost of using the APF rather than the bootstrap particle filter is negligible, making our method a suitable drop-in replacement for the bootstrap particle filter in probabilistic programming inference.
Tasks | Probabilistic Programming
Published | 2019-07-10
URL | https://arxiv.org/abs/1907.04615v2 (PDF: https://arxiv.org/pdf/1907.04615v2.pdf)
PWC | https://paperswithcode.com/paper/probabilistic-programming-for-birth-death
Repo | https://github.com/kudlicka/paper-2019-probabilistic
Framework | none
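The core of the alive particle filter is easy to sketch for a generic model: keep proposing particles until N with positive weight survive, drawing one extra for an unbiased normalizing-constant estimate. Ancestor selection is uniform here for brevity (a real filter resamples by weight), and the model is a dummy.

```python
# Bare-bones alive particle filter step; assumes acceptance is possible.
import random

def apf_step(particles, propagate, weight, N):
    alive, attempts = [], 0
    while len(alive) < N + 1:      # N alive particles + 1 extra for unbiasedness
        parent = random.choice(particles)
        child = propagate(parent)
        if weight(child) > 0:      # reject zero-weight particles outright
            alive.append(child)
        attempts += 1
    return alive[:N], attempts     # discard the (N+1)-th particle

# Dummy model: a particle survives only if its state stays non-negative.
particles = [0.0] * 100
new, tries = apf_step(particles,
                      propagate=lambda x: x + random.gauss(0, 1),
                      weight=lambda x: 1.0 if x >= 0 else 0.0,
                      N=100)
print(len(new), tries)
```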
Rotation Invariant Householder Parameterization for Bayesian PCA
Title | Rotation Invariant Householder Parameterization for Bayesian PCA
Authors | Rajbir S. Nirwan, Nils Bertschinger
Abstract | We consider probabilistic PCA and related factor models from a Bayesian perspective. These models are in general not identifiable as the likelihood has a rotational symmetry. This gives rise to complicated posterior distributions with continuous subspaces of equal density and thus hinders efficiency of inference as well as interpretation of obtained parameters. In particular, posterior averages over factor loadings become meaningless and only model predictions are unambiguous. Here, we propose a parameterization based on Householder transformations, which removes the rotational symmetry of the posterior. Furthermore, by relying on results from random matrix theory, we establish the parameter distribution which leaves the model unchanged compared to the original rotationally symmetric formulation. In particular, we avoid the need to compute the Jacobian determinant of the parameter transformation. This allows us to efficiently implement probabilistic PCA in a rotation-invariant fashion in any state-of-the-art toolbox. We implemented our model in the probabilistic programming language Stan and illustrate it on several examples.
Tasks | Probabilistic Programming
Published | 2019-05-12
URL | https://arxiv.org/abs/1905.04720v1 (PDF: https://arxiv.org/pdf/1905.04720v1.pdf)
PWC | https://paperswithcode.com/paper/rotation-invariant-householder
Repo | https://github.com/RSNirwan/HouseholderBPCA
Framework | none
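The parameterization itself is compact: an orthogonal basis built as a product of Householder reflections. The NumPy sketch below only demonstrates the construction and its orthogonality; the Stan implementation and the details of how the symmetry is removed are omitted.

```python
# Building an orthogonal matrix from Householder vectors (NumPy only).
import numpy as np

def householder_orthogonal(vs):
    """Product of Householder reflections H = I - 2 v v^T / (v^T v)."""
    d = len(vs[0])
    Q = np.eye(d)
    for v in vs:
        v = v.reshape(-1, 1)
        Q = Q @ (np.eye(d) - 2.0 * (v @ v.T) / (v.T @ v))
    return Q

rng = np.random.default_rng(0)
Q = householder_orthogonal([rng.normal(size=5) for _ in range(3)])
print(np.allclose(Q.T @ Q, np.eye(5)))  # True: columns are orthonormal
```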
Advancing NLP with Cognitive Language Processing Signals
Title | Advancing NLP with Cognitive Language Processing Signals
Authors | Nora Hollenstein, Maria Barrett, Marius Troendle, Francesco Bigiolli, Nicolas Langer, Ce Zhang
Abstract | When we read, our brain processes language and generates cognitive processing data such as gaze patterns and brain activity, which can be recorded while reading. Cognitive language processing data such as eye-tracking features have been shown to improve individual NLP tasks. We analyze whether such human features can yield consistent improvements across tasks and data sources, and present an extensive investigation of the benefits and limitations of using cognitive processing data for NLP. Specifically, we use gaze and EEG features to augment models of named entity recognition, relation classification, and sentiment analysis. These methods significantly outperform the baselines and illustrate both the potential and the current limitations of employing human language processing data for NLP.
Tasks | EEG, Eye Tracking, Named Entity Recognition, Relation Classification, Sentiment Analysis
Published | 2019-04-04
URL | http://arxiv.org/abs/1904.02682v1 (PDF: http://arxiv.org/pdf/1904.02682v1.pdf)
PWC | https://paperswithcode.com/paper/advancing-nlp-with-cognitive-language
Repo | https://github.com/DS3Lab/zuco-nlp
Framework | none
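The general recipe is simple to sketch: concatenate per-token cognitive features onto word embeddings before a standard sequence model. The dimensions and the BiLSTM tagger below are placeholders, not the paper's models.

```python
# Hedged sketch: gaze/EEG features concatenated to word embeddings.
import torch
import torch.nn as nn

class GazeAugmentedTagger(nn.Module):
    def __init__(self, vocab=5000, word_dim=100, gaze_dim=5, tags=9):
        super().__init__()
        self.emb = nn.Embedding(vocab, word_dim)
        self.lstm = nn.LSTM(word_dim + gaze_dim, 64,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(128, tags)

    def forward(self, tokens, gaze):
        # Append per-token cognitive features to the word embeddings.
        x = torch.cat([self.emb(tokens), gaze], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)

tokens = torch.randint(0, 5000, (2, 12))
gaze = torch.randn(2, 12, 5)            # per-token cognitive features
print(GazeAugmentedTagger()(tokens, gaze).shape)  # torch.Size([2, 12, 9])
```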
Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
Title | Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
Authors | Xiangxiang Chu, Bo Zhang, Hailong Ma, Ruijun Xu, Jixiang Li, Qingyuan Li
Abstract | Deep convolutional neural networks demonstrate impressive results in the super-resolution domain. A series of studies concentrates on improving peak signal-to-noise ratio (PSNR) by using much deeper layers, which is not friendly to constrained resources. Pursuing a trade-off between restoration capacity and model simplicity is still non-trivial. Recent contributions struggle to strike this balance manually, while our work achieves the same goal automatically with neural architecture search. Specifically, we handle super-resolution with a multi-objective approach and propose an elastic search tactic at both micro and macro levels, based on a hybrid controller that profits from evolutionary computation and reinforcement learning. Quantitative experiments show that our generated models dominate most of the state-of-the-art methods at comparable FLOPS.
Tasks | Neural Architecture Search, Super-Resolution
Published | 2019-01-22
URL | http://arxiv.org/abs/1901.07261v2 (PDF: http://arxiv.org/pdf/1901.07261v2.pdf)
PWC | https://paperswithcode.com/paper/fast-accurate-and-lightweight-super
Repo | https://github.com/falsr/FALSR
Framework | tf
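The multi-objective selection underlying such a search reduces to Pareto dominance over (quality, cost) pairs. The sketch below keeps architectures that are non-dominated on (PSNR up, FLOPS down); the candidate tuples are made up, and the search controller itself is not shown.

```python
# Pareto-front selection over (PSNR, FLOPS) candidate architectures.
def dominates(a, b):
    # a, b = (psnr, flops); higher PSNR and lower FLOPS is better.
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]

models = [(37.1, 120), (36.8, 60), (37.1, 150), (35.0, 300)]
print(pareto_front(models))  # [(37.1, 120), (36.8, 60)]
```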
Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge
Title | Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge
Authors | Filippo Aleotti, Matteo Poggi, Fabio Tosi, Stefano Mattoccia
Abstract | Scene flow is a challenging task aimed at jointly estimating the 3D structure and motion of the sensed environment. Although deep learning solutions achieve outstanding performance in terms of accuracy, these approaches divide the whole problem into standalone tasks (stereo and optical flow) and address them with independent networks. Such a strategy dramatically increases the complexity of the training procedure and requires power-hungry GPUs to infer scene flow at barely 1 FPS. Conversely, we propose DWARF, a novel and lightweight architecture able to infer full scene flow by jointly reasoning about depth and optical flow, easily and elegantly trainable end-to-end from scratch. Moreover, since ground truth for full scene flow is scarce, we propose to leverage the knowledge learned by networks specialized in stereo or flow, for which much more data are available, to distill proxy annotations. Exhaustive experiments show that i) DWARF runs at about 10 FPS on a single high-end GPU and about 1 FPS on an NVIDIA Jetson TX2 embedded board at KITTI resolution, with a moderate drop in accuracy compared to 10x deeper models, and ii) learning from many distilled samples is more effective than from the few annotated ones available. Code available at: https://github.com/FilippoAleotti/Dwarf-Tensorflow
Tasks | Optical Flow Estimation
Published | 2019-11-22
URL | https://arxiv.org/abs/1911.10090v1 (PDF: https://arxiv.org/pdf/1911.10090v1.pdf)
PWC | https://paperswithcode.com/paper/learning-end-to-end-scene-flow-by-distilling
Repo | https://github.com/FilippoAleotti/Dwarf-Tensorflow
Framework | tf
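Proxy-label distillation can be sketched as an L1 loss against teacher predictions, applied only where the teachers are trusted; all tensors below are stand-ins for the real stereo/flow teacher outputs, and the masking rule is an assumption.

```python
# Hedged sketch of distilling proxy annotations from specialized teachers.
import torch

def distillation_loss(student_disp, student_flow,
                      teacher_disp, teacher_flow, valid_mask):
    # L1 against teacher "proxy annotations", only where the teacher is trusted.
    l_disp = (student_disp - teacher_disp).abs()[valid_mask].mean()
    flow_mask = valid_mask.expand(-1, 2, -1, -1)
    l_flow = (student_flow - teacher_flow).abs()[flow_mask].mean()
    return l_disp + l_flow

d_s, d_t = torch.randn(1, 1, 8, 8), torch.randn(1, 1, 8, 8)   # disparity
f_s, f_t = torch.randn(1, 2, 8, 8), torch.randn(1, 2, 8, 8)   # optical flow
mask = torch.rand(1, 1, 8, 8) > 0.2                           # trusted pixels
print(float(distillation_loss(d_s, f_s, d_t, f_t, mask)))
```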
Diffusion Improves Graph Learning
Title | Diffusion Improves Graph Learning
Authors | Johannes Klicpera, Stefan Weißenberger, Stephan Günnemann
Abstract | Graph convolution is the core of most Graph Neural Networks (GNNs) and usually approximated by message passing between direct (one-hop) neighbors. In this work, we remove the restriction of using only the direct neighbors by introducing a powerful, yet spatially localized graph convolution: Graph diffusion convolution (GDC). GDC leverages generalized graph diffusion, examples of which are the heat kernel and personalized PageRank. It alleviates the problem of noisy and often arbitrarily defined edges in real graphs. We show that GDC is closely related to spectral-based models and thus combines the strengths of both spatial (message passing) and spectral methods. We demonstrate that replacing message passing with graph diffusion convolution consistently leads to significant performance improvements across a wide range of models on both supervised and unsupervised tasks and a variety of datasets. Furthermore, GDC is not limited to GNNs but can trivially be combined with any graph-based model or algorithm (e.g. spectral clustering) without requiring any changes to the latter or affecting its computational complexity. Our implementation is available online.
Tasks | Node Classification
Published | 2019-10-28
URL | https://arxiv.org/abs/1911.05485v5 (PDF: https://arxiv.org/pdf/1911.05485v5.pdf)
PWC | https://paperswithcode.com/paper/diffusion-improves-graph-learning-1
Repo | https://github.com/rusty1s/pytorch_geometric
Framework | pytorch
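GDC with the personalized PageRank kernel fits in a few lines of NumPy: form the symmetric transition matrix, solve for S = alpha * (I - (1 - alpha) T)^(-1), sparsify by keeping the top-k entries per column, and propagate features over S instead of A. The dense inverse is fine for this toy graph but not for large ones, and the renormalization GDC applies after sparsification is omitted.

```python
# Toy graph diffusion convolution with the personalized PageRank kernel.
import numpy as np

def gdc_ppr(A, alpha=0.15, k=4):
    n = A.shape[0]
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    T = D_inv_sqrt @ A @ D_inv_sqrt                 # symmetric transition matrix
    S = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * T)  # PPR diffusion
    # Sparsify: keep only the top-k entries per column, zero the rest.
    idx = np.argsort(S, axis=0)[:-k, :]
    np.put_along_axis(S, idx, 0.0, axis=0)
    return S

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = gdc_ppr(A, k=2)
X = np.random.randn(4, 8)
H = S @ X          # one propagation step over the diffused graph
print(H.shape)     # (4, 8)
```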
On Symmetric Losses for Learning from Corrupted Labels
Title | On Symmetric Losses for Learning from Corrupted Labels
Authors | Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
Abstract | This paper aims to provide a better understanding of a symmetric loss. First, we emphasize that using a symmetric loss is advantageous in the balanced error rate (BER) minimization and area under the receiver operating characteristic curve (AUC) maximization from corrupted labels. Second, we prove general theoretical properties of symmetric losses, including a classification-calibration condition, excess risk bound, conditional risk minimizer, and AUC-consistency condition. Third, since all nonnegative symmetric losses are non-convex, we propose a convex barrier hinge loss that benefits significantly from the symmetric condition, although it is not symmetric everywhere. Finally, we conduct experiments to validate the relevance of the symmetric condition.
Tasks | Calibration
Published | 2019-01-27
URL | https://arxiv.org/abs/1901.09314v2 (PDF: https://arxiv.org/pdf/1901.09314v2.pdf)
PWC | https://paperswithcode.com/paper/on-symmetric-losses-for-learning-from
Repo | https://github.com/nolfwin/symloss-ber-auc
Framework | pytorch
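The symmetric condition, l(z) + l(-z) = constant, can be checked numerically in a couple of lines: the sigmoid loss satisfies it while the convex hinge loss does not, which is exactly the tension the barrier hinge loss (not reproduced here) is designed around.

```python
# Numeric check of the symmetric condition l(z) + l(-z) = const.
import numpy as np

sigmoid_loss = lambda z: 1.0 / (1.0 + np.exp(z))
hinge_loss = lambda z: np.maximum(0.0, 1.0 - z)

z = np.linspace(-3, 3, 7)
print(sigmoid_loss(z) + sigmoid_loss(-z))   # all ones -> symmetric
print(hinge_loss(z) + hinge_loss(-z))       # varies   -> not symmetric
```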
Left-to-Right Dependency Parsing with Pointer Networks
Title | Left-to-Right Dependency Parsing with Pointer Networks
Authors | Daniel Fernández-González, Carlos Gómez-Rodríguez
Abstract | We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building $n$ attachments, with $n$ being the length of the input sentence. Similarly to the recent stack-pointer parser by Ma et al. (2018), we use the pointer network framework that, given a word, can directly point to a position in the sentence. However, our left-to-right approach is simpler than the original top-down stack-pointer parser (not requiring a stack) and cuts the transition sequence length in half, from 2$n$-1 actions to $n$. This results in a quadratic non-projective parser that runs twice as fast as the original while achieving the best accuracy to date on the English PTB dataset (96.04% UAS, 94.43% LAS) among fully-supervised single-model dependency parsers, and improves over the former top-down transition system in the majority of languages tested.
Tasks | Dependency Parsing
Published | 2019-03-20
URL | http://arxiv.org/abs/1903.08445v1 (PDF: http://arxiv.org/pdf/1903.08445v1.pdf)
PWC | https://paperswithcode.com/paper/left-to-right-dependency-parsing-with-pointer
Repo | https://github.com/danifg/Left2Right-Pointer-Parser
Framework | pytorch
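The decoding loop is short enough to sketch: at each position i, a pointer distribution over encoder states selects word i's head, giving exactly n attachments. The dot-product scorer and greedy decoding below are simplifications of the parser's actual scoring and search.

```python
# Toy left-to-right pointer decoding: one head choice per word.
import torch

def greedy_heads(enc):                 # enc: (n+1, d); position 0 is ROOT
    n = enc.shape[0] - 1
    heads = []
    for i in range(1, n + 1):          # exactly n attachments, one per word
        scores = enc @ enc[i]          # stand-in for the parser's real scorer
        scores[i] = float("-inf")      # a word cannot head itself
        heads.append(int(scores.argmax()))
    return heads

enc = torch.randn(6, 16)               # ROOT + 5 encoded words
print(greedy_heads(enc))                # e.g. [0, 3, 0, 2, 3]
```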
Deep Semi-Supervised Anomaly Detection
Title | Deep Semi-Supervised Anomaly Detection
Authors | Lukas Ruff, Robert A. Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, Marius Kloft
Abstract | Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically, anomaly detection is treated as an unsupervised learning problem. In practice, however, one may have access not only to a large set of unlabeled samples but also to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation of our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par with or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only a little labeled data.
Tasks | Anomaly Detection
Published | 2019-06-06
URL | https://arxiv.org/abs/1906.02694v2 (PDF: https://arxiv.org/pdf/1906.02694v2.pdf)
PWC | https://paperswithcode.com/paper/deep-semi-supervised-anomaly-detection
Repo | https://github.com/pg2455/AudioAge
Framework | none
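The Deep SAD objective can be sketched directly: unlabeled points are pulled toward a fixed center c, and labeled anomalies are pushed away via the inverse squared distance. The encoder producing the embeddings, the center, and the weighting eta below are placeholders.

```python
# Hedged sketch of the Deep SAD loss on precomputed embeddings.
import torch

def deep_sad_loss(z, y, c, eta=1.0, eps=1e-6):
    """z: embeddings (B, d); y: +1 labeled normal, -1 anomaly, 0 unlabeled."""
    dist2 = ((z - c) ** 2).sum(dim=1)
    loss_unlabeled = dist2[y == 0]                       # pull toward center
    # dist2 ** (+1) pulls labeled normals in; dist2 ** (-1) pushes anomalies out.
    loss_labeled = (dist2[y != 0] + eps) ** y[y != 0].float()
    return torch.cat([loss_unlabeled, eta * loss_labeled]).mean()

z = torch.randn(8, 32)
y = torch.tensor([0, 0, 0, 0, 1, 1, -1, -1])
c = torch.zeros(32)
print(float(deep_sad_loss(z, y, c)))
```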