January 31, 2020

Paper Group AWR 445

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions. Recursive Estimation for Sparse Gaussian Process Regression. Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings. A Benchmark for Anomaly Segmentation. Style Transfo …

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions


Title	MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions
Authors	Nuo Xu, Pinghui Wang, Long Chen, Jing Tao, Junzhou Zhao
Abstract	Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extract the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves the prediction of state-of-the-art methods.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09558v1
PDF	https://arxiv.org/pdf/1905.09558v1.pdf
PWC	https://paperswithcode.com/paper/mr-gnn-multi-resolution-and-dual-graph-neural
Repo	https://github.com/prometheusXN/MR-GNN
Framework	tf
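
The multi-resolution idea in the abstract (node features drawn from neighborhoods of several sizes) can be illustrated with a minimal sketch. It assumes plain mean aggregation as a stand-in for graph convolution; the function names and the toy graph are hypothetical, and the dual graph-state LSTMs are not shown.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]          # adjacency list: node -> neighbours
Features = Dict[int, List[float]]     # node -> feature vector


def mean_aggregate(graph: Graph, feats: Features) -> Features:
    """One round of neighbourhood averaging (the node itself is included)."""
    out: Features = {}
    for node, nbrs in graph.items():
        group = [feats[node]] + [feats[n] for n in nbrs]
        dim = len(feats[node])
        out[node] = [sum(f[d] for f in group) / len(group) for d in range(dim)]
    return out


def multi_resolution_features(graph: Graph, feats: Features, hops: int) -> Features:
    """Concatenate the features seen after 0, 1, ..., `hops` aggregation rounds."""
    per_node = {node: list(f) for node, f in feats.items()}
    current = feats
    for _ in range(hops):
        current = mean_aggregate(graph, current)
        for node in graph:
            per_node[node] += current[node]
    return per_node


# Path graph 0 - 1 - 2 with one scalar feature per node (made-up values).
graph = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0], 1: [0.0], 2: [0.0]}
multi = multi_resolution_features(graph, feats, hops=2)
```

In this toy graph, node 2 only picks up node 0's signal in the two-hop slot of its feature vector, which is the kind of information a single fixed-size receptive field would miss.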

Recursive Estimation for Sparse Gaussian Process Regression


Title	Recursive Estimation for Sparse Gaussian Process Regression
Authors	Manuel Schürch, Dario Azzimonti, Alessio Benavoli, Marco Zaffalon
Abstract	Gaussian Processes (GPs) are powerful kernelized methods for non-parametric regression used in many applications. However, their plain usage is limited to a few thousand training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. The majority of previous work has focused on the batch setting, whereas in this work we focus on training with mini-batches. In particular, we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation, which enables Kalman Filter and Information Filter like updating for online learning. Moreover, exploiting ideas from distributed estimation, we show how our approach can be distributed. For unknown parameters, we propose a novel approach that relies on recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, we have analytic updates for the mean and covariance of the posterior, thus drastically reducing the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state-of-the-art sequential Gaussian Process regression on synthetic GP as well as real-world data with up to a million data samples.
Tasks	Gaussian Processes
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11711v1
PDF	https://arxiv.org/pdf/1905.11711v1.pdf
PWC	https://paperswithcode.com/paper/recursive-estimation-for-sparse-gaussian
Repo	https://github.com/manuelIDSIA/SRGP
Framework	none
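
The Information Filter style of mini-batch updating mentioned in the abstract can be shown on a much simpler model. The sketch below assumes a scalar Bayesian linear model y = w * phi + noise rather than a sparse GP; all names and data values are hypothetical.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (feature phi, target y) pairs


def recursive_posterior(batches: List[Batch], prior_mean: float,
                        prior_var: float, noise_var: float) -> Tuple[float, float]:
    """Information-form update of the Gaussian posterior over a scalar weight w,
    processing one mini-batch at a time."""
    precision = 1.0 / prior_var      # information matrix (a scalar here)
    shift = prior_mean / prior_var   # information vector (a scalar here)
    for batch in batches:
        precision += sum(phi * phi for phi, _ in batch) / noise_var
        shift += sum(phi * y for phi, y in batch) / noise_var
    var = 1.0 / precision
    return shift * var, var


data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
mini_batches = [data[:2], data[2:]]

# Processing two mini-batches in sequence gives the same posterior as one full batch.
mean_rec, var_rec = recursive_posterior(mini_batches, 0.0, 10.0, 0.25)
mean_all, var_all = recursive_posterior([data], 0.0, 10.0, 0.25)
```

Because the information terms are additive, the same accumulation could in principle be split across workers and summed, which is one way to read the abstract's remark on distributed estimation.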

Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings


Title	Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings
Authors	Hang Jiang, Xianzhe Zhang, Jinho D. Choi
Abstract	Previous works related to automatic personality recognition focus on using traditional classification models with linguistic features. However, attentive neural networks with contextual embeddings, which have achieved huge success in text classification, are rarely explored for this task. In this project, we have two major contributions. First, we create the first dialogue-based personality dataset, FriendsPersona, by annotating 5 personality traits of speakers from the Friends TV show through crowdsourcing. Second, we present a novel approach to automatic personality recognition using pre-trained contextual embeddings (BERT and RoBERTa) and attentive neural networks. Our models largely improve the state-of-the-art results on the monologue Essays dataset by 2.49%, and establish a solid benchmark on our FriendsPersona. By comparing results in two datasets, we demonstrate the challenges of modeling personality in multi-party dialogue.
Tasks	Text Classification
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09304v1
PDF	https://arxiv.org/pdf/1911.09304v1.pdf
PWC	https://paperswithcode.com/paper/automatic-text-based-personality-recognition
Repo	https://github.com/emorynlp/personality-detection
Framework	none

A Benchmark for Anomaly Segmentation


Title	A Benchmark for Anomaly Segmentation
Authors	Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song
Abstract	Detecting out-of-distribution examples is important for safety-critical machine learning applications such as self-driving vehicles. However, existing research mainly focuses on small-scale images where the whole image is considered anomalous. We propose to segment only the anomalous regions within an image, and hence we introduce the Combined Anomalous Object Segmentation benchmark for the more realistic task of large-scale anomaly segmentation. Our benchmark combines two novel datasets for anomaly segmentation that incorporate both realism and anomaly diversity. Using both real images and those from a simulated driving environment, we ensure the background context and a wide variety of anomalous objects are naturally integrated, unlike before. Additionally, we improve out-of-distribution detectors on large-scale multi-class datasets and introduce detectors for the previously unexplored setting of multi-label out-of-distribution detection. These novel baselines along with our anomaly segmentation benchmark open the door to further research in large-scale out-of-distribution detection and segmentation.
Tasks	Out-of-Distribution Detection, Semantic Segmentation
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11132v1
PDF	https://arxiv.org/pdf/1911.11132v1.pdf
PWC	https://paperswithcode.com/paper/a-benchmark-for-anomaly-segmentation
Repo	https://github.com/xksteven/multilabel-ood
Framework	pytorch
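
Segmenting only the anomalous regions amounts to scoring every pixel rather than the whole image. The abstract does not say which detectors are used, so the sketch below assumes a common baseline score, one minus the maximum softmax probability; the function names, logits, and threshold are hypothetical.

```python
import math
from typing import List

Pixel = List[float]  # class logits for one pixel


def softmax(logits: Pixel) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_map(logit_image: List[List[Pixel]]) -> List[List[float]]:
    """Per-pixel anomaly score: 1 - max softmax probability."""
    return [[1.0 - max(softmax(pixel)) for pixel in row] for row in logit_image]


def anomaly_mask(scores: List[List[float]], threshold: float) -> List[List[bool]]:
    """Mark pixels whose score exceeds the threshold as anomalous."""
    return [[score > threshold for score in row] for row in scores]


# A 1 x 2 "image" with three class logits per pixel (made-up values):
# the first pixel is confidently classified, the second is not.
logits = [[[8.0, 0.0, 0.0], [0.1, 0.0, 0.2]]]
scores = anomaly_map(logits)
mask = anomaly_mask(scores, threshold=0.5)
```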

Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation


Title	Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation
Authors	Ning Dai, Jianze Liang, Xipeng Qiu, Xuanjing Huang
Abstract	Disentangling the content and style in the latent space is prevalent in unpaired text style transfer. However, two major issues exist in most of the current neural models. 1) It is difficult to completely strip the style information from the semantics for a sentence. 2) The recurrent neural network (RNN) based encoder and decoder, mediated by the latent representation, cannot well deal with the issue of the long-term dependency, resulting in poor preservation of non-stylistic semantic content. In this paper, we propose the Style Transformer, which makes no assumption about the latent representation of source sentence and equips the power of attention mechanism in Transformer to achieve better style transfer and better content preservation.
Tasks	Style Transfer, Text Style Transfer
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05621v3
PDF	https://arxiv.org/pdf/1905.05621v3.pdf
PWC	https://paperswithcode.com/paper/190505621
Repo	https://github.com/fastnlp/style-transformer
Framework	pytorch

Optimal Projection Guided Transfer Hashing for Image Retrieval


Title	Optimal Projection Guided Transfer Hashing for Image Retrieval
Authors	Ji Liu, Lei Zhang
Abstract	Recently, learning to hash has been widely studied for image retrieval thanks to the computation and storage efficiency of binary codes. For most existing learning to hash methods, sufficient training images are required and used to learn precise hashing codes. However, in some real-world applications, there are not always sufficient training images in the domain of interest. In addition, some existing supervised approaches need a large amount of labeled data, which is expensive in terms of time, labeling effort, and human expertise. To handle such problems, inspired by transfer learning, we propose a simple yet effective unsupervised hashing method named Optimal Projection Guided Transfer Hashing (GTH), where we borrow images from a different but related domain, i.e., the source domain, to help learn precise hashing codes for the domain of interest, i.e., the target domain. Besides, we propose to seek the maximum likelihood estimation (MLE) solution of the hashing functions of the target and source domains due to the domain gap. Furthermore, an alternating optimization method is adopted to obtain the two projections of the target and source domains such that the domain hashing disparity is reduced gradually. Extensive experiments on various benchmark databases verify that our method outperforms many state-of-the-art learning to hash methods. The implementation details are available at https://github.com/liuji93/GTH.
Tasks	Image Retrieval, Transfer Learning
Published	2019-03-01
URL	http://arxiv.org/abs/1903.00252v1
PDF	http://arxiv.org/pdf/1903.00252v1.pdf
PWC	https://paperswithcode.com/paper/optimal-projection-guided-transfer-hashing
Repo	https://github.com/liuji93/GTH
Framework	none
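
For readers new to learning to hash, the retrieval side can be sketched as follows: a projection maps each feature vector to a binary code, and items are ranked by Hamming distance. This is a generic illustration with a single fixed, made-up projection; it does not implement GTH's learned target and source projections or its alternating optimization.

```python
from typing import List


def hash_code(vector: List[float], projection: List[List[float]]) -> List[int]:
    """One bit per projection row, set when the dot product is non-negative."""
    return [1 if sum(w * x for w, x in zip(row, vector)) >= 0 else 0
            for row in projection]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: List[int], database_codes: List[List[int]]) -> List[int]:
    """Database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Hypothetical 3-bit projection over 2-dimensional features.
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
database = [[2.0, 1.0], [-1.0, 3.0], [-2.0, -1.0]]
codes = [hash_code(v, projection) for v in database]
ranking = retrieve(hash_code([1.5, 0.5], projection), codes)
```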

Stabilizing GANs with Octave Convolutions


Title	Stabilizing GANs with Octave Convolutions
Authors	Ricard Durall, Franz-Josef Pfreundt, Janis Keuper
Abstract	In this preliminary report, we present a simple but very effective technique to stabilize the training of CNN based GANs. Motivated by recently published methods using frequency decomposition of convolutions (e.g. Octave Convolutions), we propose a novel convolution scheme to stabilize the training and reduce the likelihood of a mode collapse. The basic idea of our approach is to split convolutional filters into additive high and low frequency parts, while shifting weight updates from low to high during the training. Intuitively, this method forces GANs to learn low frequency coarse image structures before descending into fine (high frequency) details. Our approach is orthogonal and complementary to existing stabilization methods and can simply be plugged into any CNN based GAN architecture. First experiments on the CelebA dataset show the effectiveness of the proposed method.
Tasks
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12534v1
PDF	https://arxiv.org/pdf/1905.12534v1.pdf
PWC	https://paperswithcode.com/paper/stabilizing-gans-with-octave-convolutions
Repo	https://github.com/cc-hpc-itwm/Stabilizing-GANs-with-Octave-Convolutions
Framework	pytorch
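
One possible reading of "additive high and low frequency parts" with a shifting update schedule is sketched below on a 1-D filter. Everything here is an assumption made for illustration: the moving-average split, the linear schedule, and the function names are hypothetical, and the paper's actual scheme builds on Octave Convolutions rather than this toy decomposition.

```python
from typing import List, Tuple


def split_filter(kernel: List[float]) -> Tuple[List[float], List[float]]:
    """Additive split: low = 3-tap moving average of the kernel, high = kernel - low."""
    low = []
    for i in range(len(kernel)):
        window = kernel[max(0, i - 1): i + 2]
        low.append(sum(window) / len(window))
    high = [k - l for k, l in zip(kernel, low)]
    return low, high


def update_scales(step: int, total_steps: int) -> Tuple[float, float]:
    """Linear schedule: early updates go to the low part, late ones to the high part."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - t, t


def sgd_step(low: List[float], high: List[float], grad: List[float],
             lr: float, step: int, total_steps: int) -> Tuple[List[float], List[float]]:
    """The effective filter is low + high, so both parts share the same gradient;
    the schedule decides how much of the update each part receives."""
    low_scale, high_scale = update_scales(step, total_steps)
    new_low = [w - lr * low_scale * g for w, g in zip(low, grad)]
    new_high = [w - lr * high_scale * g for w, g in zip(high, grad)]
    return new_low, new_high


kernel = [1.0, -2.0, 4.0]
low, high = split_filter(kernel)
early_low, early_high = sgd_step(low, high, [1.0, 1.0, 1.0], 0.1, step=0, total_steps=10)
late_low, late_high = sgd_step(low, high, [1.0, 1.0, 1.0], 0.1, step=10, total_steps=10)
```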

Not All Claims are Created Equal: Choosing the Right Approach to Assess Your Hypotheses


Title	Not All Claims are Created Equal: Choosing the Right Approach to Assess Your Hypotheses
Authors	Erfan Sadeqi Azer, Daniel Khashabi, Ashish Sabharwal, Dan Roth
Abstract	Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues. While alternative proposals have been well-debated and adopted in other fields, they remain rarely discussed or used within the NLP community. We address this gap by contrasting various hypothesis assessment techniques, especially those not commonly used in the field (such as evaluations based on Bayesian inference). Since these statistical techniques differ in the hypotheses they can support, we argue that practitioners should first decide their target hypothesis before choosing an assessment method. This is crucial because common fallacies, misconceptions, and misinterpretation surrounding hypothesis assessment methods often stem from a discrepancy between what one would like to claim versus what the method used actually assesses. Our survey reveals that these issues are omnipresent in the NLP research community. As a step forward, we provide best practices and guidelines tailored to NLP research, as well as an easy-to-use package called ‘HyBayes’ for Bayesian assessment of hypotheses, complementing existing tools.
Tasks	Bayesian Inference
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03850v1
PDF	https://arxiv.org/pdf/1911.03850v1.pdf
PWC	https://paperswithcode.com/paper/not-all-claims-are-created-equal-choosing-the
Repo	https://github.com/allenai/HyBayes
Framework	none
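
To make the contrast with p-values concrete, a Bayesian assessment can answer the question "how probable is it that system A is more accurate than system B?" directly. The sketch below is not the HyBayes API; it assumes independent Beta(1, 1) priors, binomial likelihoods, unpaired test sets, and made-up counts.

```python
import random


def prob_a_beats_b(correct_a: int, correct_b: int, n: int,
                   draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(acc_A > acc_B | data) from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        acc_a = rng.betavariate(1 + correct_a, 1 + n - correct_a)
        acc_b = rng.betavariate(1 + correct_b, 1 + n - correct_b)
        wins += acc_a > acc_b
    return wins / draws


# Hypothetical results on 100 test items each.
p_clear = prob_a_beats_b(correct_a=90, correct_b=70, n=100)
p_close = prob_a_beats_b(correct_a=81, correct_b=80, n=100)
```

A one-point accuracy gap on 100 items leaves the posterior probability not far from a coin flip, which states the claim being assessed ("A is better than B") more directly than a p-value does.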

Exploring helical dynamos with machine learning


Title	Exploring helical dynamos with machine learning
Authors	Farrukh Nauman, Joonas Nättilä
Abstract	We use ensemble machine learning algorithms to study the evolution of magnetic fields in magnetohydrodynamic (MHD) turbulence that is helically forced. We perform direct numerical simulations of helically forced turbulence using mean field formalism, with electromotive force (EMF) modeled both as a linear and non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF and the use of more sophisticated algorithms (random forests, MCMC) does not lead to significant improvement in the fits. We conclude that the data we are looking at is effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source IPython notebooks.
Tasks	Bayesian Inference
Published	2019-05-20
URL	https://arxiv.org/abs/1905.08193v3
PDF	https://arxiv.org/pdf/1905.08193v3.pdf
PWC	https://paperswithcode.com/paper/exploring-helical-dynamos-with-machine
Repo	https://github.com/fnauman/ML_alpha2
Framework	none
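
The regularized linear fit described in the abstract can be sketched as a two-feature ridge regression, with the EMF modeled as a linear function of the mean field B and current density J. The data below are synthetic and the coefficient values are invented for illustration; this is not the authors' analysis script.

```python
from typing import List, Tuple


def ridge_fit_2d(rows: List[Tuple[float, float]], targets: List[float],
                 lam: float) -> Tuple[float, float]:
    """Solve (X^T X + lam * I) w = X^T y for two features by Cramer's rule."""
    s11 = sum(b * b for b, _ in rows) + lam
    s22 = sum(j * j for _, j in rows) + lam
    s12 = sum(b * j for b, j in rows)
    t1 = sum(b * y for (b, _), y in zip(rows, targets))
    t2 = sum(j * y for (_, j), y in zip(rows, targets))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


# Synthetic (B, J) samples and an EMF generated from made-up coefficients.
true_b_coef, true_j_coef = 0.5, -0.2
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
emf = [true_b_coef * b + true_j_coef * j for b, j in samples]

exact = ridge_fit_2d(samples, emf, lam=0.0)    # ordinary least squares
shrunk = ridge_fit_2d(samples, emf, lam=1.0)   # ridge penalty pulls coefficients toward zero
```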

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation


Title	Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
Authors	Gedas Bertasius, Lorenzo Torresani
Abstract	We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip. This allows our system to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip. Clip-level instance tracks generated densely for each frame in the sequence are finally aggregated to produce video-level object instance segmentation and classification. Our experiments demonstrate that our clip-level instance segmentation makes our approach robust to motion blur and object occlusions in video. MaskProp achieves the best reported accuracy on the YouTube-VIS dataset, outperforming the ICCV 2019 video instance segmentation challenge winner despite being much simpler and using orders of magnitude less labeled data (1.3M vs 1B images and 860K vs 14M bounding boxes).
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04573v1
PDF	https://arxiv.org/pdf/1912.04573v1.pdf
PWC	https://paperswithcode.com/paper/classifying-segmenting-and-tracking-object
Repo	https://github.com/jiawen9611/Awesome-Video-Instance-Segmentation
Framework	pytorch

Generalized Variational Inference: Three arguments for deriving new Posteriors


Title	Generalized Variational Inference: Three arguments for deriving new Posteriors
Authors	Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas
Abstract	We advocate an optimization-centric view on and introduce a novel generalization of Bayesian inference. Our inspiration is the representation of Bayes’ rule as an infinite-dimensional optimization problem (Csiszar, 1975; Donsker and Varadhan, 1975; Zellner, 1988). First, we use it to prove an optimality result of standard Variational Inference (VI): Under the proposed view, the standard Evidence Lower Bound (ELBO) maximizing VI posterior is preferable to alternative approximations of the Bayesian posterior. Next, we argue for generalizing standard Bayesian inference. The need for this arises in situations of severe misalignment between reality and three assumptions underlying standard Bayesian inference: (1) Well-specified priors, (2) well-specified likelihoods, (3) the availability of infinite computing power. Our generalization addresses these shortcomings with three arguments and is called the Rule of Three (RoT). We derive it axiomatically and recover existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI. In contrast, approximations based on alternative ELBO-like objectives violate the axioms. Finally, we study a special case of the RoT that we call Generalized Variational Inference (GVI). GVI posteriors are a large and tractable family of belief distributions specified by three arguments: A loss, a divergence and a variational family. GVI posteriors have appealing properties, including consistency and an interpretation as approximate ELBO. The last part of the paper explores some attractive applications of GVI in popular machine learning models, including robustness and more appropriate marginals. After deriving black box inference schemes for GVI posteriors, their predictive performance is investigated on Bayesian Neural Networks and Deep Gaussian Processes, where GVI can comprehensively improve upon existing methods.
Tasks	Bayesian Inference, Gaussian Processes
Published	2019-04-03
URL	https://arxiv.org/abs/1904.02063v4
PDF	https://arxiv.org/pdf/1904.02063v4.pdf
PWC	https://paperswithcode.com/paper/generalized-variational-inference
Repo	https://github.com/JeremiasKnoblauch/GVIPublic
Framework	tf
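
The three arguments named in the abstract (a loss, a divergence, and a variational family) can be written as a single optimization problem. The notation below is an assumption chosen for illustration: \ell is the loss, D the divergence, \mathcal{Q} the variational family, \pi the prior, and x_1, ..., x_n the observations.

```latex
q^{*}(\theta) \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}}
\left\{ \mathbb{E}_{q(\theta)}\!\left[ \sum_{i=1}^{n} \ell(\theta, x_i) \right]
\;+\; D\!\left(q \,\|\, \pi\right) \right\}
```

Under this notation, choosing \ell(\theta, x_i) = -\log p(x_i \mid \theta) and D equal to the Kullback-Leibler divergence makes the objective the negative ELBO, so standard VI appears as a special case, in line with the abstract's claim.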

Crosslingual Document Embedding as Reduced-Rank Ridge Regression


Title	Crosslingual Document Embedding as Reduced-Rank Ridge Regression
Authors	Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West
Abstract	There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
Tasks	Document Embedding
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03922v1
PDF	http://arxiv.org/pdf/1904.03922v1.pdf
PWC	https://paperswithcode.com/paper/crosslingual-document-embedding-as-reduced
Repo	https://github.com/epfl-dlab/Cr5
Framework	none

3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization


Title	3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
Authors	Sanath Narayan, Hisham Cholakkal, Fahad Shahbaz Khan, Ling Shao
Abstract	Temporal action localization is a challenging computer vision problem with numerous real-world applications. Most existing methods require laborious frame-level supervision to train action localization models. In this work, we propose a framework, called 3C-Net, which only requires video-level supervision (weak supervision) in the form of action category labels and the corresponding count. We introduce a novel formulation to learn discriminative action features with enhanced localization capabilities. Our joint formulation has three terms: a classification term to ensure the separability of learned action features, an adapted multi-label center loss term to enhance the action feature discriminability and a counting loss term to delineate adjacent action sequences, leading to improved localization. Comprehensive experiments are performed on two challenging benchmarks: THUMOS14 and ActivityNet 1.2. Our approach sets a new state-of-the-art for weakly-supervised temporal action localization on both datasets. On the THUMOS14 dataset, the proposed method achieves an absolute gain of 4.6% in terms of mean average precision (mAP), compared to the state-of-the-art. Source code is available at https://github.com/naraysa/3c-net.
Tasks	Action Localization, Temporal Action Localization, Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08216v2
PDF	https://arxiv.org/pdf/1908.08216v2.pdf
PWC	https://paperswithcode.com/paper/3c-net-category-count-and-center-loss-for
Repo	https://github.com/naraysa/3c-net
Framework	pytorch
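
The three-term joint formulation can be sketched as a weighted sum of a classification term, a center loss term, and a counting term. The abstract does not give the exact definitions, so the functional forms, the weights `w_center` and `w_count`, and all values below are hypothetical.

```python
import math
from typing import List


def classification_loss(probs: List[float], labels: List[int]) -> float:
    """Multi-label binary cross-entropy over action categories."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def center_loss(features: List[List[float]], centers: List[List[float]],
                labels: List[int]) -> float:
    """Mean squared distance between each present class's feature and its center."""
    present = [c for c, y in enumerate(labels) if y]
    total = sum(sum((f - m) ** 2 for f, m in zip(features[c], centers[c]))
                for c in present)
    return total / max(len(present), 1)


def counting_loss(pred_counts: List[float], true_counts: List[int]) -> float:
    """Mean absolute error between predicted and annotated action counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def joint_loss(probs, labels, features, centers, pred_counts, true_counts,
               w_center: float = 1.0, w_count: float = 1.0) -> float:
    return (classification_loss(probs, labels)
            + w_center * center_loss(features, centers, labels)
            + w_count * counting_loss(pred_counts, true_counts))


# One video, two action categories; only category 0 occurs (twice).
labels, true_counts = [1, 0], [2, 0]
features = [[1.0, 0.0], [0.0, 0.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
good = joint_loss([0.9, 0.1], labels, features, centers, [2.0, 0.0], true_counts)
bad = joint_loss([0.9, 0.1], labels, features, centers, [0.0, 1.0], true_counts)
```

With identical classification outputs, the only difference between `good` and `bad` is the count prediction, so the counting term alone separates them.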

WriterForcing: Generating more interesting story endings


Title	WriterForcing: Generating more interesting story endings
Authors	Prakhar Gupta, Vinayshekhar Bannihatti Kumar, Mukul Bhutani, Alan W Black
Abstract	We study the problem of generating interesting endings for stories. Neural generative models have shown promising results for various text generation problems. Sequence to Sequence (Seq2Seq) models are typically trained to generate a single output sequence for a given input sequence. However, in the context of a story, multiple endings are possible. Seq2Seq models tend to ignore the context and generate generic and dull responses. Very few works have studied generating diverse and interesting story endings for a given story context. In this paper, we propose models which generate more diverse and interesting outputs by 1) training models to focus attention on important keyphrases of the story, and 2) promoting generation of non-generic words. We show that the combination of the two leads to more diverse and interesting endings.
Tasks	Text Generation
Published	2019-07-18
URL	https://arxiv.org/abs/1907.08259v1
PDF	https://arxiv.org/pdf/1907.08259v1.pdf
PWC	https://paperswithcode.com/paper/writerforcing-generating-more-interesting
Repo	https://github.com/witerforcing/WriterForcing
Framework	pytorch

Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation


Title	Cascade Attention Guided Residue Learning GAN for Cross-Modal Translation
Authors	Bin Duan, Wei Wang, Hao Tang, Hugo Latapie, Yan Yan
Abstract	Since we were babies, we have intuitively developed the ability to correlate the input from different cognitive sensors such as vision, audio, and text. However, in machine learning, this cross-modal learning is a nontrivial task because different modalities have no homogeneous properties. Previous works discover that there should be bridges among different modalities. From a neurological and psychological perspective, humans have the capacity to link one modality with another, e.g., associating a picture of a bird with only the sound of its singing, and vice versa. Is it possible for machine learning algorithms to recover the scene given the audio signal? In this paper, we propose a novel Cascade Attention-Guided Residue GAN (CAR-GAN), aiming at reconstructing the scenes given the corresponding audio signals. Particularly, we present a residue module to mitigate the gap between different modalities progressively. Moreover, a cascade attention guided network with a novel classification loss function is designed to tackle the cross-modal learning task. Our model keeps the consistency in the high-level semantic label domain and is able to balance two different modalities. The experimental results demonstrate that our model achieves state-of-the-art cross-modal audio-visual generation on the challenging Sub-URMP dataset. Code will be available at https://github.com/tuffr5/CAR-GAN.
Tasks
Published	2019-07-03
URL	https://arxiv.org/abs/1907.01826v2
PDF	https://arxiv.org/pdf/1907.01826v2.pdf
PWC	https://paperswithcode.com/paper/cascade-attention-guided-residue-learning-gan
Repo	https://github.com/tuffr5/CAR-GAN
Framework	pytorch