Paper Group AWR 110
Neural Predictor for Neural Architecture Search
Title | Neural Predictor for Neural Architecture Search |
Authors | Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, Pieter-Jan Kindermans |
Abstract | Neural Architecture Search methods are effective but often use complex algorithms to come up with the best architecture. We propose an approach with three basic steps that is conceptually much simpler. First we train N random architectures to generate N (architecture, validation accuracy) pairs and use them to train a regression model that predicts accuracy based on the architecture. Next, we use this regression model to predict the validation accuracies of a large number of random architectures. Finally, we train the top-K predicted architectures and deploy the model with the best validation result. While this approach seems simple, it is more than 20 times as sample efficient as Regularized Evolution on the NASBench-101 benchmark and can compete on ImageNet with more complex approaches based on weight sharing, such as ProxylessNAS. |
Tasks | Neural Architecture Search |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00848v1, https://arxiv.org/pdf/1912.00848v1.pdf |
PWC | https://paperswithcode.com/paper/neural-predictor-for-neural-architecture |
Repo | https://github.com/ultmaster/neuralpredictor.pytorch |
Framework | pytorch |
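The three steps in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 6-bit architecture encoding, the `evaluate` stand-in for real training, and the linear regressor fitted by SGD are all assumptions made for the example.

```python
import random

# Hypothetical per-bit contribution to accuracy; stands in for real training.
_EFFECTS = [0.05, 0.02, 0.08, 0.01, 0.04, 0.03]


def random_arch(rng, n_bits=6):
    """Sample a random architecture, encoded here as a tuple of 0/1 choices."""
    return tuple(rng.randint(0, 1) for _ in range(n_bits))


def evaluate(arch, rng):
    """Stand-in for 'train the architecture and measure validation accuracy'."""
    signal = sum(e * bit for e, bit in zip(_EFFECTS, arch))
    return 0.6 + signal + rng.gauss(0.0, 0.005)


def fit_predictor(pairs, epochs=500, lr=0.05):
    """Fit a linear regressor (architecture -> accuracy) with plain SGD."""
    weights = [0.0] * len(pairs[0][0])
    bias = 0.0
    for _ in range(epochs):
        for arch, acc in pairs:
            pred = bias + sum(w * x for w, x in zip(weights, arch))
            err = pred - acc
            bias -= lr * err
            weights = [w - lr * err * x for w, x in zip(weights, arch)]
    return lambda arch: bias + sum(w * x for w, x in zip(weights, arch))


def neural_predictor_search(n_train=20, top_k=5, pool_size=200, seed=0):
    rng = random.Random(seed)
    # Step 1: train N random architectures and fit the accuracy predictor.
    train_archs = [random_arch(rng) for _ in range(n_train)]
    pairs = [(arch, evaluate(arch, rng)) for arch in train_archs]
    predict = fit_predictor(pairs)
    # Step 2: score a large pool of random architectures with the predictor.
    pool = {random_arch(rng) for _ in range(pool_size)}
    ranked = sorted(pool, key=predict, reverse=True)
    # Step 3: train only the top-K predicted architectures and keep the best.
    finalists = [(evaluate(arch, rng), arch) for arch in ranked[:top_k]]
    best_acc, best_arch = max(finalists)
    return best_arch, best_acc, n_train + len(finalists)


best_arch, best_acc, n_trained = neural_predictor_search()
print(best_arch, round(best_acc, 3), n_trained)
```

Only `n_train + top_k` architectures are ever trained in this sketch; the large pool is scored by the predictor alone, which is where the sample efficiency described in the abstract would come from.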
A New Burrows Wheeler Transform Markov Distance
Title | A New Burrows Wheeler Transform Markov Distance |
Authors | Edward Raff, Charles Nicholas, Mark McLean |
Abstract | Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods. |
Tasks | Malware Classification |
Published | 2019-12-30 |
URL | https://arxiv.org/abs/1912.13046v1, https://arxiv.org/pdf/1912.13046v1.pdf |
PWC | https://paperswithcode.com/paper/a-new-burrows-wheeler-transform-markov |
Repo | https://github.com/EdwardRaff/pyBWMD |
Framework | none |
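For readers unfamiliar with the Burrows Wheeler Transform that the distance builds on, here is a minimal sketch of the transform and its inverse. It uses the naive sorted-rotations construction and assumes a `$` sentinel that does not occur in the input; it does not reproduce the BWMD itself.

```python
def bwt(text, sentinel="$"):
    """Burrows Wheeler Transform via sorted rotations (naive, O(n^2 log n))."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(row[-1] for row in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the transform by repeatedly prepending and re-sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))           # annb$aa
print(inverse_bwt("annb$aa"))  # banana
```

The transform tends to group identical characters into runs, which is the property that compression-inspired distance measures exploit.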
Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback
Title | Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback |
Authors | Xiaoxiao Guo, Hui Wu, Yupeng Gao, Steven Rennie, Rogerio Feris |
Abstract | We contribute a new dataset and a novel method for natural language based fashion image retrieval. Unlike previous fashion datasets, we provide natural language annotations to facilitate the training of interactive image retrieval systems, as well as the commonly used attribute based labels. We propose a novel approach and empirically demonstrate that combining natural language feedback with visual attribute information results in superior user feedback modeling and retrieval performance relative to using either of these modalities. We believe that our dataset can encourage further work on developing more natural and real-world applicable conversational shopping assistants. |
Tasks | Image Retrieval |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.12794v2, https://arxiv.org/pdf/1905.12794v2.pdf |
PWC | https://paperswithcode.com/paper/the-fashion-iq-dataset-retrieving-images-by |
Repo | https://github.com/XiaoxiaoGuo/fashion-iq |
Framework | pytorch |
An Empirical Study of Spatial Attention Mechanisms in Deep Networks
Title | An Empirical Study of Spatial Attention Mechanisms in Deep Networks |
Authors | Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai |
Abstract | Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the query and key content comparison in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. A proper combination of deformable convolution with key content only saliency achieves the best accuracy-efficiency tradeoff in self-attention. Our results suggest that there exists much room for improvement in the design of attention mechanisms. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05873v1, http://arxiv.org/pdf/1904.05873v1.pdf |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-spatial-attention |
Repo | https://github.com/open-mmlab/mmdetection |
Framework | pytorch |
Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements
Title | Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements |
Authors | Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, Hua Huang |
Abstract | Removing undesirable reflections from a single image captured through a glass window is of practical importance to visual computing systems. Although state-of-the-art methods can obtain decent results in certain situations, performance declines significantly when tackling more general real-world cases. These failures stem from the intrinsic difficulty of single image reflection removal – the fundamental ill-posedness of the problem, and the insufficiency of densely-labeled training data needed for resolving this ambiguity within learning-based neural network pipelines. In this paper, we address these issues by exploiting targeted network enhancements and the novel use of misaligned data. For the former, we augment a baseline network architecture by embedding context encoding modules that are capable of leveraging high-level contextual clues to reduce indeterminacy within areas containing strong reflections. For the latter, we introduce an alignment-invariant loss function that facilitates exploiting misaligned real-world training data that is much easier to collect. Experimental results collectively show that our method outperforms the state-of-the-art with aligned data, and that significant improvements are possible when using additional misaligned data. |
Tasks | |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00637v1, http://arxiv.org/pdf/1904.00637v1.pdf |
PWC | https://paperswithcode.com/paper/single-image-reflection-removal-exploiting |
Repo | https://github.com/Vandermode/ERRNet |
Framework | pytorch |
Sparseout: Controlling Sparsity in Deep Networks
Title | Sparseout: Controlling Sparsity in Deep Networks |
Authors | Najeeb Khan, Ian Stavness |
Abstract | Dropout is commonly used to help reduce overfitting in deep neural networks. Sparsity is a potentially important property of neural networks, but is not explicitly controlled by Dropout-based regularization. In this work, we propose Sparseout, a simple and efficient variant of Dropout that can be used to control the sparsity of the activations in a neural network. We theoretically prove that Sparseout is equivalent to an $L_q$ penalty on the features of a generalized linear model and that Dropout is a special case of Sparseout for neural networks. We empirically demonstrate that Sparseout is computationally inexpensive and is able to control the desired level of sparsity in the activations. We evaluated Sparseout on image classification and language modelling tasks to see the effect of sparsity on these tasks. We found that sparsity of the activations is favorable for language modelling performance while image classification benefits from denser activations. Sparseout provides a way to investigate sparsity in state-of-the-art deep learning models. Source code for Sparseout can be found at https://github.com/najeebkhan/sparseout. |
Tasks | Image Classification, Language Modelling |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08050v1, http://arxiv.org/pdf/1904.08050v1.pdf |
PWC | https://paperswithcode.com/paper/sparseout-controlling-sparsity-in-deep |
Repo | https://github.com/najeebkhan/sparseout |
Framework | pytorch |
Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks
Title | Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks |
Authors | Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans |
Abstract | We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the $i$-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to high classification accuracy with a relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neural Networks (TNNs), to build a latent space for different classifiers to solve the Multi-Target Speaker Detection and Identification Challenge Evaluation 2018 (MCE 2018) dataset. This training set contains $i$-vectors from 3,631 speakers, with only 3 samples for each speaker, thus making speaker recognition a challenging task. When using the train and development set for training both the TNN and baseline model (i.e., similarity evaluation directly on the $i$-vector representation), our proposed model outperforms the baseline by 23%. When reducing the training data to only using the train set, our method results in 309 confusions for the Multi-target speaker identification task, which is 46% better than the baseline model. These results show that the representational power of TNNs is especially evident when training on small datasets with few instances available per class. |
Tasks | Speaker Identification, Speaker Recognition |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.01463v2, https://arxiv.org/pdf/1910.01463v2.pdf |
PWC | https://paperswithcode.com/paper/latent-space-representation-for-multi-target |
Repo | https://github.com/KinWaiCheuk/MCE2018 |
Framework | tf |
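A triplet network is typically trained with a triplet loss, which pulls an anchor towards a sample of the same class and pushes it away from a sample of a different class. The sketch below shows the standard hinge form of that loss on plain Python lists; the Euclidean distance and the margin of 1.0 are assumptions for illustration, and the paper's exact distance and settings may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean(anchor, positive)  # same speaker: should be small
    d_neg = euclidean(anchor, negative)  # different speaker: should be large
    return max(0.0, d_pos - d_neg + margin)


# The positive is already much closer than the negative: zero loss.
print(triplet_loss([0, 0], [0, 1], [3, 4]))  # 0.0
# Roles swapped: the triplet violates the margin and is penalised.
print(triplet_loss([0, 0], [3, 4], [0, 1]))  # 5.0
```

Because the loss depends only on relative distances within a triplet, many training triplets can be formed even when each speaker contributes just a few samples.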
Scalable Neural Dialogue State Tracking
Title | Scalable Neural Dialogue State Tracking |
Authors | Vevake Balaraman, Bernardo Magnini |
Abstract | A Dialogue State Tracker (DST) is a key component in a dialogue system aiming at estimating the beliefs of possible user goals at each dialogue turn. Most current DSTs make use of recurrent neural networks and are based on complex architectures that manage several aspects of a dialogue, including the user utterance, the system actions, and the slot-value pairs defined in a domain ontology. However, the complexity of such neural architectures incurs considerable latency in the dialogue state prediction, which limits the deployment of the models in real-world applications, particularly when task scalability (i.e., the number of slots) is a crucial factor. In this paper, we propose an innovative neural model for dialogue state tracking, named Global encoder and Slot-Attentive decoders (G-SAT), which can predict the dialogue state with very low latency, while maintaining high-level performance. We report experiments on three different languages (English, Italian, and German) of the WoZ2.0 dataset, and show that the proposed approach provides competitive advantages over state-of-the-art DST systems, both in terms of accuracy and in terms of time complexity for predictions, being over 15 times faster than the other systems. |
Tasks | Dialogue State Tracking |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09942v1, https://arxiv.org/pdf/1910.09942v1.pdf |
PWC | https://paperswithcode.com/paper/scalable-neural-dialogue-state-tracking |
Repo | https://github.com/vevake/GSAT |
Framework | pytorch |
Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks
Title | Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks |
Authors | Srijan Kumar, Xikun Zhang, Jure Leskovec |
Abstract | Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space. Here we propose JODIE, a coupled recurrent neural network model that learns the embedding trajectories of users and items. JODIE employs two recurrent neural networks to update the embedding of a user and an item at every interaction. Crucially, JODIE also models the future embedding trajectory of a user/item. To this end, it introduces a novel projection operator that learns to estimate the embedding of the user at any time in the future. These estimated embeddings are then used to predict future user-item interactions. To make the method scalable, we develop a t-Batch algorithm that creates time-consistent batches and leads to 9x faster training. We conduct six experiments to validate JODIE on two prediction tasks—future interaction prediction and state change prediction—using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by at least 20% in predicting future interactions and 12% in state change prediction. |
Tasks | Representation Learning |
Published | 2019-08-03 |
URL | https://arxiv.org/abs/1908.01207v1, https://arxiv.org/pdf/1908.01207v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-dynamic-embedding-trajectory-in |
Repo | https://github.com/srijankr/jodie |
Framework | pytorch |
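The abstract does not spell out the t-Batch procedure. The sketch below shows one way to build time-consistent batches, under the assumption that a user or item may appear at most once per batch and that its interactions must land in batches of increasing index. The `(user, item, timestamp)` tuple format and the function name are hypothetical.

```python
def t_batch(interactions):
    """Group (user, item, timestamp) interactions into time-consistent batches.

    Each interaction goes into the earliest batch that comes after the last
    batch already containing its user or its item.
    """
    last_user_batch = {}  # user -> index of the last batch it appears in
    last_item_batch = {}  # item -> index of the last batch it appears in
    batches = []
    for user, item, ts in sorted(interactions, key=lambda x: x[2]):
        idx = max(last_user_batch.get(user, -1),
                  last_item_batch.get(item, -1)) + 1
        if idx == len(batches):
            batches.append([])
        batches[idx].append((user, item, ts))
        last_user_batch[user] = idx
        last_item_batch[item] = idx
    return batches


log = [("u1", "a", 1), ("u2", "b", 2), ("u1", "b", 3), ("u3", "c", 4)]
for i, batch in enumerate(t_batch(log)):
    print(i, batch)
# 0 [('u1', 'a', 1), ('u2', 'b', 2), ('u3', 'c', 4)]
# 1 [('u1', 'b', 3)]
```

Interactions within one batch touch disjoint users and items, so their embedding updates can run in parallel, while processing batches in order preserves the temporal dependencies of each user and item.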
Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering
Title | Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering |
Authors | Congyuan Yang, Carey E. Priebe, Youngser Park, David J. Marchette |
Abstract | Our problem of interest is to cluster vertices of a graph by identifying its underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods, because it is easy to implement while often outperforming traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating the embedding dimension and number of clusters. This paper attempts to address the issue by establishing a novel model selection framework specifically for vertex clustering on graphs under a stochastic block model. The first contribution is a probabilistic model which approximates the distribution of the extended spectral embedding of a graph. The model is constructed based on a theoretical result of asymptotic normality of the informative part of the embedding, and on a simulation result of limiting behavior of the redundant part of the embedding. The second contribution is a simultaneous model selection framework. In contrast with the traditional approaches, our model selection procedure estimates embedding dimension and number of clusters simultaneously. Based on our proposed distributional model, a theorem on the consistency of the estimates of model parameters is stated and proven. The theorem provides statistical support for the validity of our method. Heuristic algorithms via the simultaneous model selection framework for vertex clustering are proposed, with good performance shown in the experiment on synthetic data and on the real application of connectome analysis. |
Tasks | Graph Clustering, Model Selection, Spectral Graph Clustering |
Published | 2019-04-05 |
URL | https://arxiv.org/abs/1904.02926v2, https://arxiv.org/pdf/1904.02926v2.pdf |
PWC | https://paperswithcode.com/paper/simultaneous-dimensionality-and-complexity |
Repo | https://github.com/youngser/dhatkhat |
Framework | none |
Deep learning observables in computational fluid dynamics
Title | Deep learning observables in computational fluid dynamics |
Authors | Kjetil O. Lye, Siddhartha Mishra, Deep Ray |
Abstract | Many large scale problems in computational fluid dynamics such as uncertainty quantification, Bayesian inversion, data assimilation and PDE constrained optimization are considered very challenging computationally as they require a large number of expensive (forward) numerical solutions of the corresponding PDEs. We propose a machine learning algorithm, based on deep artificial neural networks, that predicts the underlying input-parameters-to-observable map from a few training samples (computed realizations of this map). By a judicious combination of theoretical arguments and empirical observations, we find suitable network architectures and training hyperparameters that result in robust and efficient neural network approximations of the parameters-to-observable map. Numerical experiments are presented to demonstrate low prediction errors for the trained networks, even when the network has been trained with a few samples, at a computational cost which is several orders of magnitude lower than the underlying PDE solver. Moreover, we combine the proposed deep learning algorithm with Monte Carlo (MC) and Quasi-Monte Carlo (QMC) methods to efficiently compute uncertainty propagation for nonlinear PDEs. Under the assumption that the underlying neural networks generalize well, we prove that the deep learning MC and QMC algorithms are guaranteed to be faster than the baseline (quasi-) Monte Carlo methods. Numerical experiments demonstrating one to two orders of magnitude speed up over baseline QMC and MC algorithms, for the intricate problem of computing probability distributions of the observable, are also presented. |
Tasks | |
Published | 2019-03-07 |
URL | https://arxiv.org/abs/1903.03040v2, https://arxiv.org/pdf/1903.03040v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-observables-in-computational |
Repo | https://github.com/mroberto166/MultilevelMachineLearning |
Framework | tf |
Image Classification with Hierarchical Multigraph Networks
Title | Image Classification with Hierarchical Multigraph Networks |
Authors | Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor |
Abstract | Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented translation invariant filters. However, a great advantage of GCNs is the ability to work on irregular inputs, such as superpixels of images. This could significantly reduce the computational cost of image reasoning tasks. Another key advantage inherent to GCNs is the natural ability to model multirelational data. Building upon these two promising properties, in this work, we show best practices for designing GCNs for image classification; in some cases even outperforming CNNs on the MNIST, CIFAR-10 and PASCAL image datasets. |
Tasks | Image Classification |
Published | 2019-07-21 |
URL | https://arxiv.org/abs/1907.09000v1, https://arxiv.org/pdf/1907.09000v1.pdf |
PWC | https://paperswithcode.com/paper/image-classification-with-hierarchical |
Repo | https://github.com/bknyaz/bmvc_2019 |
Framework | pytorch |
Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data
Title | Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data |
Authors | N. Benjamin Erichson, Lionel Mathelin, Zhewei Yao, Steven L. Brunton, Michael W. Mahoney, J. Nathan Kutz |
Abstract | In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data. In this work, we propose a shallow neural network-based learning methodology for such fluid flow reconstruction. Our approach learns an end-to-end mapping between the sensor measurements and the high-dimensional fluid flow field, without any heavy preprocessing on the raw data. No prior knowledge is assumed to be available, and the estimation method is purely data-driven. We demonstrate the performance on three examples in fluid mechanics and oceanography, showing that this modern data-driven approach outperforms traditional modal approximation techniques which are commonly used for flow reconstruction. Not only does the proposed method show superior performance characteristics, it can also produce a comparable level of performance with traditional methods in the area, using significantly fewer sensors. Thus, the mathematical architecture is ideal for emerging global monitoring technologies where measurement data are often limited. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07358v1, http://arxiv.org/pdf/1902.07358v1.pdf |
PWC | https://paperswithcode.com/paper/shallow-learning-for-fluid-flow |
Repo | https://github.com/EiffL/FluidFlowPrediction |
Framework | tf |
Categorical Metadata Representation for Customized Text Classification
Title | Categorical Metadata Representation for Customized Text Classification |
Authors | Jihyeok Kim, Reinald Kim Amplayo, Kyungjae Lee, Sua Sung, Minji Seo, Seung-won Hwang |
Abstract | The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) such that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category, and even such context is often scarce (for tail categories). To this end, we propose to use basis vectors to effectively incorporate categorical metadata on various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various datasets with different properties are performed and show that through our method, we can represent categorical metadata more effectively to customize parts of the model, including unexplored ones, and increase the performance of the model greatly. |
Tasks | Sentiment Analysis, Text Classification, Word Embeddings |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05196v1, http://arxiv.org/pdf/1902.05196v1.pdf |
PWC | https://paperswithcode.com/paper/categorical-metadata-representation-for |
Repo | https://github.com/zhou059/w266-project |
Framework | none |
Panoptic Feature Pyramid Networks
Title | Panoptic Feature Pyramid Networks |
Authors | Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár |
Abstract | The recently introduced panoptic segmentation task has renewed our community’s interest in unifying the tasks of instance segmentation (for thing classes) and semantic segmentation (for stuff classes). However, current state-of-the-art methods for this joint task use separate and dissimilar networks for instance and semantic segmentation, without performing any shared computation. In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. Our approach is to endow Mask R-CNN, a popular instance segmentation method, with a semantic segmentation branch using a shared Feature Pyramid Network (FPN) backbone. Surprisingly, this simple baseline not only remains effective for instance segmentation, but also yields a lightweight, top-performing method for semantic segmentation. In this work, we perform a detailed study of this minimally extended version of Mask R-CNN with FPN, which we refer to as Panoptic FPN, and show it is a robust and accurate baseline for both tasks. Given its effectiveness and conceptual simplicity, we hope our method can serve as a strong baseline and aid future research in panoptic segmentation. |
Tasks | Instance Segmentation, Panoptic Segmentation, Semantic Segmentation |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02446v2, http://arxiv.org/pdf/1901.02446v2.pdf |
PWC | https://paperswithcode.com/paper/panoptic-feature-pyramid-networks |
Repo | https://github.com/facebookresearch/detectron2 |
Framework | pytorch |