February 1, 2020

Paper Group AWR 110

Neural Predictor for Neural Architecture Search


Title	Neural Predictor for Neural Architecture Search
Authors	Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, Pieter-Jan Kindermans
Abstract	Neural Architecture Search methods are effective but often use complex algorithms to come up with the best architecture. We propose an approach with three basic steps that is conceptually much simpler. First we train N random architectures to generate N (architecture, validation accuracy) pairs and use them to train a regression model that predicts accuracy based on the architecture. Next, we use this regression model to predict the validation accuracies of a large number of random architectures. Finally, we train the top-K predicted architectures and deploy the model with the best validation result. While this approach seems simple, it is more than 20 times as sample efficient as Regularized Evolution on the NASBench-101 benchmark and can compete on ImageNet with more complex approaches based on weight sharing, such as ProxylessNAS.
Tasks	Neural Architecture Search
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00848v1
PDF	https://arxiv.org/pdf/1912.00848v1.pdf
PWC	https://paperswithcode.com/paper/neural-predictor-for-neural-architecture
Repo	https://github.com/ultmaster/neuralpredictor.pytorch
Framework	pytorch
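
The three steps in the abstract can be sketched in a few lines. This is a toy illustration, not the authors' implementation: architectures are hypothetical 8-bit tuples, `true_accuracy` is an invented stand-in for training a network, and a plain linear regressor replaces whatever predictor the paper uses.

```python
import random

# Hypothetical stand-in for "train this architecture and report validation
# accuracy". A real search would launch a training job here.
WEIGHTS = [0.05, -0.02, 0.04, 0.01, -0.03, 0.06, 0.02, -0.01]

def true_accuracy(arch):
    return 0.7 + sum(w * bit for w, bit in zip(WEIGHTS, arch))

def random_arch(rng, n_bits=8):
    """Encode a toy architecture as a tuple of binary design choices."""
    return tuple(rng.randint(0, 1) for _ in range(n_bits))

def fit_linear_predictor(pairs, epochs=500, lr=0.05):
    """Fit accuracy ~ b + w.x by stochastic gradient descent on squared error."""
    w = [0.0] * len(pairs[0][0])
    b = 0.0
    for _ in range(epochs):
        for arch, acc in pairs:
            err = b + sum(wi * xi for wi, xi in zip(w, arch)) - acc
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, arch)]
    return lambda arch: b + sum(wi * xi for wi, xi in zip(w, arch))

def neural_predictor_search(n_train=30, n_candidates=200, k=5, seed=0):
    rng = random.Random(seed)
    # Step 1: train N random architectures and fit a regression model.
    pairs = [(a, true_accuracy(a)) for a in (random_arch(rng) for _ in range(n_train))]
    predict = fit_linear_predictor(pairs)
    # Step 2: score many random architectures with the cheap predictor.
    candidates = sorted({random_arch(rng) for _ in range(n_candidates)})
    top_k = sorted(candidates, key=predict, reverse=True)[:k]
    # Step 3: actually train only the top-K and keep the best one.
    best = max(top_k, key=true_accuracy)
    return top_k, best

top_k, best = neural_predictor_search()
print(best, round(true_accuracy(best), 3))
```

The point of the design is that the expensive call (`true_accuracy` here) runs only `n_train + k` times, while the cheap predictor ranks all the candidates.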

A New Burrows Wheeler Transform Markov Distance


Title	A New Burrows Wheeler Transform Markov Distance
Authors	Edward Raff, Charles Nicholas, Mark McLean
Abstract	Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.
Tasks	Malware Classification
Published	2019-12-30
URL	https://arxiv.org/abs/1912.13046v1
PDF	https://arxiv.org/pdf/1912.13046v1.pdf
PWC	https://paperswithcode.com/paper/a-new-burrows-wheeler-transform-markov
Repo	https://github.com/EdwardRaff/pyBWMD
Framework	none
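
For readers unfamiliar with the underlying transform, here is a minimal sketch of the Burrows Wheeler Transform itself, using the naive sort-all-rotations construction. It is background only and does not implement BWMD; the `$` sentinel and the function names are assumptions made for this example.

```python
def bwt(text, sentinel="$"):
    """Return the last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, sentinel="$"):
    """Rebuild the input by repeatedly prepending the transform and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

The transform tends to group identical characters together, which is why it appears in compression pipelines and in the compression-inspired distances the abstract refers to.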

Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback


Title	Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback
Authors	Xiaoxiao Guo, Hui Wu, Yupeng Gao, Steven Rennie, Rogerio Feris
Abstract	We contribute a new dataset and a novel method for natural language based fashion image retrieval. Unlike previous fashion datasets, we provide natural language annotations to facilitate the training of interactive image retrieval systems, as well as the commonly used attribute based labels. We propose a novel approach and empirically demonstrate that combining natural language feedback with visual attribute information results in superior user feedback modeling and retrieval performance relative to using either of these modalities. We believe that our dataset can encourage further work on developing more natural and real-world applicable conversational shopping assistants.
Tasks	Image Retrieval
Published	2019-05-30
URL	https://arxiv.org/abs/1905.12794v2
PDF	https://arxiv.org/pdf/1905.12794v2.pdf
PWC	https://paperswithcode.com/paper/the-fashion-iq-dataset-retrieving-images-by
Repo	https://github.com/XiaoxiaoGuo/fashion-iq
Framework	pytorch

An Empirical Study of Spatial Attention Mechanisms in Deep Networks


Title	An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Authors	Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai
Abstract	Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the query and key content comparison in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. A proper combination of deformable convolution with key content only saliency achieves the best accuracy-efficiency tradeoff in self-attention. Our results suggest that there exists much room for improvement in the design of attention mechanisms.
Tasks
Published	2019-04-11
URL	http://arxiv.org/abs/1904.05873v1
PDF	http://arxiv.org/pdf/1904.05873v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-study-of-spatial-attention
Repo	https://github.com/open-mmlab/mmdetection
Framework	pytorch

Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements


Title	Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements
Authors	Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, Hua Huang
Abstract	Removing undesirable reflections from a single image captured through a glass window is of practical importance to visual computing systems. Although state-of-the-art methods can obtain decent results in certain situations, performance declines significantly when tackling more general real-world cases. These failures stem from the intrinsic difficulty of single image reflection removal – the fundamental ill-posedness of the problem, and the insufficiency of densely-labeled training data needed for resolving this ambiguity within learning-based neural network pipelines. In this paper, we address these issues by exploiting targeted network enhancements and the novel use of misaligned data. For the former, we augment a baseline network architecture by embedding context encoding modules that are capable of leveraging high-level contextual clues to reduce indeterminacy within areas containing strong reflections. For the latter, we introduce an alignment-invariant loss function that facilitates exploiting misaligned real-world training data that is much easier to collect. Experimental results collectively show that our method outperforms the state-of-the-art with aligned data, and that significant improvements are possible when using additional misaligned data.
Tasks
Published	2019-04-01
URL	http://arxiv.org/abs/1904.00637v1
PDF	http://arxiv.org/pdf/1904.00637v1.pdf
PWC	https://paperswithcode.com/paper/single-image-reflection-removal-exploiting
Repo	https://github.com/Vandermode/ERRNet
Framework	pytorch

Sparseout: Controlling Sparsity in Deep Networks


Title	Sparseout: Controlling Sparsity in Deep Networks
Authors	Najeeb Khan, Ian Stavness
Abstract	Dropout is commonly used to help reduce overfitting in deep neural networks. Sparsity is a potentially important property of neural networks, but is not explicitly controlled by Dropout-based regularization. In this work, we propose Sparseout, a simple and efficient variant of Dropout that can be used to control the sparsity of the activations in a neural network. We theoretically prove that Sparseout is equivalent to an $L_q$ penalty on the features of a generalized linear model and that Dropout is a special case of Sparseout for neural networks. We empirically demonstrate that Sparseout is computationally inexpensive and is able to control the desired level of sparsity in the activations. We evaluated Sparseout on image classification and language modelling tasks to see the effect of sparsity on these tasks. We found that sparsity of the activations is favorable for language modelling performance while image classification benefits from denser activations. Sparseout provides a way to investigate sparsity in state-of-the-art deep learning models. Source code for Sparseout can be found at https://github.com/najeebkhan/sparseout.
Tasks	Image Classification, Language Modelling
Published	2019-04-17
URL	http://arxiv.org/abs/1904.08050v1
PDF	http://arxiv.org/pdf/1904.08050v1.pdf
PWC	https://paperswithcode.com/paper/sparseout-controlling-sparsity-in-deep
Repo	https://github.com/najeebkhan/sparseout
Framework	pytorch

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks


Title	Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks
Authors	Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans
Abstract	We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the $i$-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to high classification accuracy with a relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space for different classifiers to solve the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. This training set contains $i$-vectors from 3,631 speakers, with only 3 samples for each speaker, thus making speaker recognition a challenging task. When using the train and development set for training both the TNN and baseline model (i.e., similarity evaluation directly on the $i$-vector representation), our proposed model outperforms the baseline by 23%. When reducing the training data to only using the train set, our method results in 309 confusions for the Multi-target speaker identification task, which is 46% better than the baseline model. These results show that the representational power of TNNs is especially evident when training on small datasets with few instances available per class.
Tasks	Speaker Identification, Speaker Recognition
Published	2019-10-01
URL	https://arxiv.org/abs/1910.01463v2
PDF	https://arxiv.org/pdf/1910.01463v2.pdf
PWC	https://paperswithcode.com/paper/latent-space-representation-for-multi-target
Repo	https://github.com/KinWaiCheuk/MCE2018
Framework	tf
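
Triplet networks are usually trained with a margin-based triplet loss. The sketch below shows the common Euclidean form on plain tuples; the margin value and the choice of distance are assumptions for illustration, and the paper's exact loss may differ.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Same speaker close by, different speaker far away: no penalty.
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))  # 0.0
# Roles swapped: the embedding is penalised.
print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))  # 5.0
```

With only a few samples per speaker, each triplet reuses the same examples in many combinations, which is one reason this kind of loss is attractive for small datasets.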

Scalable Neural Dialogue State Tracking


Title	Scalable Neural Dialogue State Tracking
Authors	Vevake Balaraman, Bernardo Magnini
Abstract	A Dialogue State Tracker (DST) is a key component in a dialogue system aiming at estimating the beliefs of possible user goals at each dialogue turn. Most of the current DST trackers make use of recurrent neural networks and are based on complex architectures that manage several aspects of a dialogue, including the user utterance, the system actions, and the slot-value pairs defined in a domain ontology. However, the complexity of such neural architectures incurs considerable latency in the dialogue state prediction, which limits the deployment of the models in real-world applications, particularly when task scalability (i.e., the number of slots) is a crucial factor. In this paper, we propose an innovative neural model for dialogue state tracking, named Global encoder and Slot-Attentive decoders (G-SAT), which can predict the dialogue state with a very low latency time, while maintaining high-level performance. We report experiments on three different languages (English, Italian, and German) of the WoZ2.0 dataset, and show that the proposed approach provides competitive advantages over state-of-the-art DST systems, both in terms of accuracy and in terms of time complexity for predictions, being over 15 times faster than the other systems.
Tasks	Dialogue State Tracking
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09942v1
PDF	https://arxiv.org/pdf/1910.09942v1.pdf
PWC	https://paperswithcode.com/paper/scalable-neural-dialogue-state-tracking
Repo	https://github.com/vevake/GSAT
Framework	pytorch

Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks


Title	Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks
Authors	Srijan Kumar, Xikun Zhang, Jure Leskovec
Abstract	Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space. Here we propose JODIE, a coupled recurrent neural network model that learns the embedding trajectories of users and items. JODIE employs two recurrent neural networks to update the embedding of a user and an item at every interaction. Crucially, JODIE also models the future embedding trajectory of a user/item. To this end, it introduces a novel projection operator that learns to estimate the embedding of the user at any time in the future. These estimated embeddings are then used to predict future user-item interactions. To make the method scalable, we develop a t-Batch algorithm that creates time-consistent batches and leads to 9x faster training. We conduct six experiments to validate JODIE on two prediction tasks—future interaction prediction and state change prediction—using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by at least 20% in predicting future interactions and 12% in state change prediction.
Tasks	Representation Learning
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01207v1
PDF	https://arxiv.org/pdf/1908.01207v1.pdf
PWC	https://paperswithcode.com/paper/predicting-dynamic-embedding-trajectory-in
Repo	https://github.com/srijankr/jodie
Framework	pytorch
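
The abstract mentions batches that are "time-consistent". One plausible batching rule with that property is sketched below: each interaction goes into the batch right after the latest batch that already contains its user or its item, so no user or item appears twice in a batch and per-entity order is preserved. This is an assumption about how such batches might be built, not the authors' code; `t_batch` and its input format are hypothetical.

```python
def t_batch(interactions):
    """Group (user, item) interactions, given in time order, into batches.

    Returns a list of batches, each a list of indices into `interactions`.
    """
    last_user_batch = {}
    last_item_batch = {}
    batches = []
    for idx, (user, item) in enumerate(interactions):
        # Earliest batch that comes after every batch touching this user or item.
        batch_id = 1 + max(last_user_batch.get(user, -1),
                           last_item_batch.get(item, -1))
        if batch_id == len(batches):
            batches.append([])
        batches[batch_id].append(idx)
        last_user_batch[user] = batch_id
        last_item_batch[item] = batch_id
    return batches

events = [("u1", "i1"), ("u2", "i2"), ("u1", "i2"), ("u3", "i1")]
print(t_batch(events))  # [[0, 1], [2, 3]]
```

Interactions within one batch share no user or item, so their embedding updates can run in parallel without one update reading a stale state from another.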

Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering


Title	Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering
Authors	Congyuan Yang, Carey E. Priebe, Youngser Park, David J. Marchette
Abstract	Our problem of interest is to cluster vertices of a graph by identifying its underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods, because it is easy to implement while often outperforming traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating the embedding dimension and number of clusters. This paper attempts to address the issue by establishing a novel model selection framework specifically for vertex clustering on graphs under a stochastic block model. The first contribution is a probabilistic model which approximates the distribution of the extended spectral embedding of a graph. The model is constructed based on a theoretical result of asymptotic normality of the informative part of the embedding, and on a simulation result of limiting behavior of the redundant part of the embedding. The second contribution is a simultaneous model selection framework. In contrast with the traditional approaches, our model selection procedure estimates embedding dimension and number of clusters simultaneously. Based on our proposed distributional model, a theorem on the consistency of the estimates of model parameters is stated and proven. The theorem provides a statistical support for the validity of our method. Heuristic algorithms via the simultaneous model selection framework for vertex clustering are proposed, with good performance shown in the experiment on synthetic data and on the real application of connectome analysis.
Tasks	Graph Clustering, Model Selection, Spectral Graph Clustering
Published	2019-04-05
URL	https://arxiv.org/abs/1904.02926v2
PDF	https://arxiv.org/pdf/1904.02926v2.pdf
PWC	https://paperswithcode.com/paper/simultaneous-dimensionality-and-complexity
Repo	https://github.com/youngser/dhatkhat
Framework	none

Deep learning observables in computational fluid dynamics


Title	Deep learning observables in computational fluid dynamics
Authors	Kjetil O. Lye, Siddhartha Mishra, Deep Ray
Abstract	Many large scale problems in computational fluid dynamics such as uncertainty quantification, Bayesian inversion, data assimilation and PDE constrained optimization are considered very challenging computationally as they require a large number of expensive (forward) numerical solutions of the corresponding PDEs. We propose a machine learning algorithm, based on deep artificial neural networks, that predicts the underlying input parameters to observable map from a few training samples (computed realizations of this map). By a judicious combination of theoretical arguments and empirical observations, we find suitable network architectures and training hyperparameters that result in robust and efficient neural network approximations of the parameters to observable map. Numerical experiments are presented to demonstrate low prediction errors for the trained networks, even when the network has been trained with a few samples, at a computational cost which is several orders of magnitude lower than the underlying PDE solver. Moreover, we combine the proposed deep learning algorithm with Monte Carlo (MC) and Quasi-Monte Carlo (QMC) methods to efficiently compute uncertainty propagation for nonlinear PDEs. Under the assumption that the underlying neural networks generalize well, we prove that the deep learning MC and QMC algorithms are guaranteed to be faster than the baseline (quasi-) Monte Carlo methods. Numerical experiments demonstrating one to two orders of magnitude speed up over baseline QMC and MC algorithms, for the intricate problem of computing probability distributions of the observable, are also presented.
Tasks
Published	2019-03-07
URL	https://arxiv.org/abs/1903.03040v2
PDF	https://arxiv.org/pdf/1903.03040v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-observables-in-computational
Repo	https://github.com/mroberto166/MultilevelMachineLearning
Framework	tf
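
The idea of pairing a cheap learned map with Monte Carlo sampling can be illustrated on a toy problem. In this sketch a piecewise-linear interpolant stands in for the neural network, and `expensive_observable` is an invented one-dimensional function standing in for a PDE solve; none of it comes from the paper's code.

```python
import random

def expensive_observable(y):
    # Hypothetical stand-in for a costly solver mapping a parameter to an observable.
    return y * y

def build_surrogate(n_nodes=11):
    """Evaluate the expensive map on a uniform grid over [0, 1] and interpolate linearly."""
    xs = [i / (n_nodes - 1) for i in range(n_nodes)]
    ys = [expensive_observable(x) for x in xs]

    def surrogate(y):
        i = min(int(y * (n_nodes - 1)), n_nodes - 2)
        t = (y - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]

    return surrogate

def surrogate_monte_carlo(n_samples=20000, seed=0):
    """Estimate E[observable(Y)] for Y ~ Uniform(0, 1) using only the surrogate."""
    rng = random.Random(seed)
    surrogate = build_surrogate()
    return sum(surrogate(rng.random()) for _ in range(n_samples)) / n_samples

print(surrogate_monte_carlo())  # close to the exact mean 1/3
```

Here the expensive map is called 11 times to build the surrogate, while the 20,000 Monte Carlo samples touch only the cheap interpolant. The estimate carries both sampling error and surrogate error, which is why the abstract's speed-up guarantee is conditional on the surrogate generalizing well.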

Image Classification with Hierarchical Multigraph Networks


Title	Image Classification with Hierarchical Multigraph Networks
Authors	Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor
Abstract	Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented translation invariant filters. However, a great advantage of GCNs is the ability to work on irregular inputs, such as superpixels of images. This could significantly reduce the computational cost of image reasoning tasks. Another key advantage inherent to GCNs is the natural ability to model multirelational data. Building upon these two promising properties, in this work, we show best practices for designing GCNs for image classification; in some cases even outperforming CNNs on the MNIST, CIFAR-10 and PASCAL image datasets.
Tasks	Image Classification
Published	2019-07-21
URL	https://arxiv.org/abs/1907.09000v1
PDF	https://arxiv.org/pdf/1907.09000v1.pdf
PWC	https://paperswithcode.com/paper/image-classification-with-hierarchical
Repo	https://github.com/bknyaz/bmvc_2019
Framework	pytorch

Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data


Title	Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data
Authors	N. Benjamin Erichson, Lionel Mathelin, Zhewei Yao, Steven L. Brunton, Michael W. Mahoney, J. Nathan Kutz
Abstract	In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance with traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited.
Tasks
Published	2019-02-20
URL	http://arxiv.org/abs/1902.07358v1
PDF	http://arxiv.org/pdf/1902.07358v1.pdf
PWC	https://paperswithcode.com/paper/shallow-learning-for-fluid-flow
Repo	https://github.com/EiffL/FluidFlowPrediction
Framework	tf

Categorical Metadata Representation for Customized Text Classification


Title	Categorical Metadata Representation for Customized Text Classification
Authors	Jihyeok Kim, Reinald Kim Amplayo, Kyungjae Lee, Sua Sung, Minji Seo, Seung-won Hwang
Abstract	The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail category). To this end, we propose to use basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various datasets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly.
Tasks	Sentiment Analysis, Text Classification, Word Embeddings
Published	2019-02-14
URL	http://arxiv.org/abs/1902.05196v1
PDF	http://arxiv.org/pdf/1902.05196v1.pdf
PWC	https://paperswithcode.com/paper/categorical-metadata-representation-for
Repo	https://github.com/zhou059/w266-project
Framework	none
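
The parameter saving claimed in the abstract can be made concrete with rough arithmetic. The sketch assumes a generic basis decomposition in which each category's weight matrix is a weighted sum of a few shared basis matrices; the sizes below are made up, and the paper's exact parameterisation may differ.

```python
def per_category_params(n_categories, rows, cols):
    """One full weight matrix for every category."""
    return n_categories * rows * cols

def basis_params(n_categories, rows, cols, n_bases):
    """Shared basis matrices plus one mixing coefficient per category per basis."""
    return n_bases * rows * cols + n_categories * n_bases

# Hypothetical sizes: 10,000 users, a 256 x 256 matrix, 4 shared bases.
print(per_category_params(10_000, 256, 256))  # 655360000
print(basis_params(10_000, 256, 256, 4))      # 302144
```

The first count grows with the product of category count and matrix size, while the second grows with their sum (scaled by the number of bases), so the gap widens as categories are added.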

Panoptic Feature Pyramid Networks


Title	Panoptic Feature Pyramid Networks
Authors	Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár
Abstract	The recently introduced panoptic segmentation task has renewed our community’s interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation.
Tasks	Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published	2019-01-08
URL	http://arxiv.org/abs/1901.02446v2
PDF	http://arxiv.org/pdf/1901.02446v2.pdf
PWC	https://paperswithcode.com/paper/panoptic-feature-pyramid-networks
Repo	https://github.com/facebookresearch/detectron2
Framework	pytorch