Paper Group AWR 36
Latent Intention Dialogue Models. Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation. The iNaturalist Species Classification and Detection Dataset. Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis. Learning S …
Latent Intention Dialogue Models
Title | Latent Intention Dialogue Models |
Authors | Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young |
Abstract | Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning that is not scalable or constructing deterministic models for learning dialogue sentences that fail to capture natural conversational variability. In this paper, we propose a Latent Intention Dialogue Model (LIDM) that employs a discrete latent variable to learn underlying dialogue intentions in the framework of neural variational inference. In a goal-oriented dialogue scenario, these latent intentions can be interpreted as actions guiding the generation of machine responses, which can be further refined autonomously by reinforcement learning. The experimental evaluation of LIDM shows that the model out-performs published benchmarks for both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues. |
Tasks | Latent Variable Models |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10229v1 |
http://arxiv.org/pdf/1705.10229v1.pdf | |
PWC | https://paperswithcode.com/paper/latent-intention-dialogue-models |
Repo | https://github.com/shawnwun/NNDIAL |
Framework | none |
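To make the LIDM idea concrete, here is a minimal, hypothetical PyTorch sketch of a discrete latent intention conditioning a response decoder. The layer sizes, the Gumbel-softmax relaxation used to sample the intention, and all names are illustrative assumptions; the paper itself trains the discrete variable with neural variational inference and later refines it with reinforcement learning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentIntentionSketch(nn.Module):
    """Minimal sketch: a discrete latent 'intention' conditions the response decoder.
    Dimensions and the Gumbel-softmax relaxation are illustrative assumptions, not the
    paper's exact neural-variational-inference estimator."""
    def __init__(self, ctx_dim=128, n_intentions=50, emb_dim=64):
        super().__init__()
        self.inference_net = nn.Linear(ctx_dim, n_intentions)   # q(z | dialogue context)
        self.intention_emb = nn.Embedding(n_intentions, emb_dim)
        self.decoder_init = nn.Linear(ctx_dim + emb_dim, ctx_dim)

    def forward(self, context, tau=1.0):
        logits = self.inference_net(context)                    # [B, K] intention scores
        z = F.gumbel_softmax(logits, tau=tau, hard=True)        # one-hot sample, differentiable
        z_emb = z @ self.intention_emb.weight                   # [B, emb_dim] intention embedding
        dec_state = torch.tanh(self.decoder_init(torch.cat([context, z_emb], dim=-1)))
        return dec_state, logits

# toy usage with fake dialogue-context encodings
model = LatentIntentionSketch()
ctx = torch.randn(4, 128)
state, logits = model(ctx)
print(state.shape, logits.shape)
```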
Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation
Title | Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation |
Authors | Heyi Li, Yunke Tian, Klaus Mueller, Xin Chen |
Abstract | Despite the tremendous achievements of deep convolutional neural networks (CNNs) in many computer vision tasks, understanding how they actually work remains a significant challenge. In this paper, we propose a novel two-step understanding method, namely Salient Relevance (SR) map, which aims to shed light on how deep CNNs recognize images and learn features from areas, referred to as attention areas, therein. Our proposed method starts out with a layer-wise relevance propagation (LRP) step which estimates a pixel-wise relevance map over the input image. Next, we construct a context-aware saliency map, the SR map, from the LRP-generated map, which predicts areas close to the foci of attention instead of the isolated pixels that LRP reveals. In the human visual system, information about regions is more important than information about individual pixels for recognition. Consequently, our proposed approach closely simulates human recognition. Experimental results using the ILSVRC2012 validation dataset in conjunction with two well-established deep CNN models, AlexNet and VGG-16, clearly demonstrate that our proposed approach concisely identifies not only key pixels but also attention areas that contribute to the underlying neural network’s comprehension of the given images. As such, our proposed SR map constitutes a convenient visual interface which unveils the visual attention of the network and reveals which type of objects the model has learned to recognize after training. The source code is available at https://github.com/Hey1Li/Salient-Relevance-Propagation. |
Tasks | Saliency Prediction |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08268v5 |
http://arxiv.org/pdf/1712.08268v5.pdf | |
PWC | https://paperswithcode.com/paper/beyond-saliency-understanding-convolutional |
Repo | https://github.com/Hey1Li/Salient-Relevance-Propagation |
Framework | pytorch |
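The first step of the SR map is layer-wise relevance propagation. Below is a small sketch of the standard LRP epsilon rule for a single dense layer, as a flavour of how relevance is redistributed; the paper applies LRP through full AlexNet/VGG-16 models and then builds the context-aware saliency map on top, which is not shown here.

```python
import numpy as np

def lrp_epsilon_dense(a_in, W, b, R_out, eps=1e-6):
    """LRP epsilon rule for one dense layer (a minimal sketch).
    a_in: [d_in] activations entering the layer
    W, b: layer weights [d_in, d_out] and bias [d_out]
    R_out: [d_out] relevance arriving from the layer above
    Returns R_in: [d_in] relevance redistributed to the inputs."""
    z = a_in @ W + b                          # forward pre-activations
    z = z + eps * np.sign(z)                  # epsilon stabiliser
    s = R_out / z                             # relevance per unit of pre-activation
    return a_in * (W @ s)                     # redistribute proportionally to contributions

# toy usage: relevance is (approximately) conserved across the layer
a = np.random.rand(8)
W = np.random.randn(8, 4)
b = np.zeros(4)
R_out = np.random.rand(4)
R_in = lrp_epsilon_dense(a, W, b, R_out)
print(R_in.sum(), R_out.sum())
```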
The iNaturalist Species Classification and Detection Dataset
Title | The iNaturalist Species Classification and Detection Dataset |
Authors | Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie |
Abstract | Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, feature a large class imbalance, and have been verified by multiple citizen scientists. We discuss the collection of the dataset and present extensive baseline experiments using state-of-the-art computer vision classification and detection models. Results show that current non-ensemble based methods achieve only 67% top one classification accuracy, illustrating the difficulty of the dataset. Specifically, we observe poor results for classes with small numbers of training examples suggesting more attention is needed in low-shot learning. |
Tasks | Image Classification |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06642v2 |
http://arxiv.org/pdf/1707.06642v2.pdf | |
PWC | https://paperswithcode.com/paper/the-inaturalist-species-classification-and |
Repo | https://github.com/tensorflow/models/tree/master/research/object_detection |
Framework | tf |
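Since the abstract emphasizes the long-tailed class distribution and weak performance on rare species, a per-class top-1 accuracy breakdown is the natural diagnostic. The sketch below is a generic illustration with fabricated labels, not code from the dataset's baselines.

```python
import numpy as np

def per_class_top1(y_true, y_pred, n_classes):
    """Per-class top-1 accuracy; on long-tailed data such as iNaturalist the headline
    accuracy can hide much weaker performance on rare classes."""
    acc = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = (y_true == c)
        if mask.any():
            acc[c] = (y_pred[mask] == c).mean()
    return acc

# toy usage with a fake long-tailed label distribution (illustrative only)
rng = np.random.default_rng(0)
y_true = rng.choice(5, size=1000, p=[0.6, 0.2, 0.1, 0.07, 0.03])
y_pred = np.where(rng.random(1000) < 0.7, y_true, rng.integers(0, 5, 1000))
print(per_class_top1(y_true, y_pred, 5))
```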
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Title | Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis |
Authors | Rui Huang, Shu Zhang, Tianyu Li, Ran He |
Abstract | Photorealistic frontal view synthesis from a single face image has a wide range of applications in the field of face recognition. Although data-driven deep learning methods have been proposed to address this problem by seeking solutions from ample face data, this problem is still challenging because it is intrinsically ill-posed. This paper proposes a Two-Pathway Generative Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by simultaneously perceiving global structures and local details. Four landmark located patch networks are proposed to attend to local textures in addition to the commonly used global encoder-decoder network. Except for the novel architecture, we make this ill-posed problem well constrained by introducing a combination of adversarial loss, symmetry loss and identity preserving loss. The combined loss function leverages both frontal face distribution and pre-trained discriminative deep face models to guide an identity preserving inference of frontal views from profiles. Different from previous deep learning methods that mainly rely on intermediate features for recognition, our method directly leverages the synthesized identity preserving image for downstream tasks like face recognition and attribution estimation. Experimental results demonstrate that our method not only presents compelling perceptual results but also outperforms state-of-the-art results on large pose face recognition. |
Tasks | Face Recognition |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04086v2 |
http://arxiv.org/pdf/1704.04086v2.pdf | |
PWC | https://paperswithcode.com/paper/beyond-face-rotation-global-and-local |
Repo | https://github.com/UnrealLink/TP-GAN |
Framework | pytorch |
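A rough sketch of the loss terms named in the abstract (adversarial, symmetry, identity-preserving, plus a pixel-wise reconstruction term commonly used alongside them). The weights and the exact form of each term are assumptions, not the paper's published settings.

```python
import torch
import torch.nn.functional as F

def tp_gan_style_losses(synth, target, d_logits_fake, id_feat_synth, id_feat_target,
                        w_adv=1e-3, w_sym=0.3, w_id=0.02, w_pix=1.0):
    """Sketch of the combined frontal-view synthesis loss (weights are assumptions).
    synth/target: [B, C, H, W] synthesized and ground-truth frontal faces
    d_logits_fake: discriminator logits on the synthesized images
    id_feat_*: features from a pre-trained face-recognition network."""
    l_pix = F.l1_loss(synth, target)                                # pixel-wise reconstruction
    l_sym = F.l1_loss(synth, torch.flip(synth, dims=[3]))           # symmetry of the frontal face
    l_adv = F.binary_cross_entropy_with_logits(
        d_logits_fake, torch.ones_like(d_logits_fake))              # fool the discriminator
    l_id = F.mse_loss(id_feat_synth, id_feat_target)                # identity preservation
    return w_pix * l_pix + w_sym * l_sym + w_adv * l_adv + w_id * l_id
```

The two-pathway architecture itself (a global encoder-decoder plus four landmark-located patch networks) is orthogonal to these loss terms and is not sketched here.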
Learning Structural Node Embeddings Via Diffusion Wavelets
Title | Learning Structural Node Embeddings Via Diffusion Wavelets |
Authors | Claire Donnat, Marinka Zitnik, David Hallac, Jure Leskovec |
Abstract | Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node’s network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave’s real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10321v4 |
http://arxiv.org/pdf/1710.10321v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-structural-node-embeddings-via |
Repo | https://github.com/benedekrozemberczki/karateclub |
Framework | none |
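A compact sketch of the GraphWave construction: heat-kernel wavelets from the graph Laplacian, with each node embedded via the empirical characteristic function of its wavelet coefficients. The scale and the evaluation points are illustrative choices; the full method also addresses scale selection and scalable (approximate) computation.

```python
import numpy as np
import networkx as nx

def graphwave_embeddings(G, taus=(0.5,), t_samples=np.linspace(0, 100, 25)):
    """Sketch of GraphWave: heat-kernel wavelets + empirical characteristic function.
    Scales (taus) and evaluation points (t_samples) are illustrative assumptions."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    lam, U = np.linalg.eigh(L)                          # eigendecomposition of the Laplacian
    embs = []
    for tau in taus:
        heat = U @ np.diag(np.exp(-tau * lam)) @ U.T    # heat kernel; column a = wavelet of node a
        for t in t_samples:
            phi = np.exp(1j * t * heat).mean(axis=0)    # characteristic function over coefficients
            embs.append(phi.real)
            embs.append(phi.imag)
    return np.stack(embs, axis=1)                       # [n_nodes, 2 * len(taus) * len(t_samples)]

# nodes with similar structural roles get similar embeddings, even in different graph regions
G = nx.barbell_graph(5, 3)
X = graphwave_embeddings(G)
print(X.shape)
```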
Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling
Title | Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling |
Authors | Anuj Karpatne, William Watkins, Jordan Read, Vipin Kumar |
Abstract | This paper introduces a novel framework for combining scientific knowledge of physics-based models with neural networks to advance scientific discovery. This framework, termed as physics-guided neural network (PGNN), leverages the output of physics-based model simulations along with observational features to generate predictions using a neural network architecture. Further, this paper presents a novel framework for using physics-based loss functions in the learning objective of neural networks, to ensure that the model predictions not only show lower errors on the training set but are also scientifically consistent with the known physics on the unlabeled set. We illustrate the effectiveness of PGNN for the problem of lake temperature modeling, where physical relationships between the temperature, density, and depth of water are used to design a physics-based loss function. By using scientific knowledge to guide the construction and learning of neural networks, we are able to show that the proposed framework ensures better generalizability as well as scientific consistency of results. |
Tasks | |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11431v2 |
http://arxiv.org/pdf/1710.11431v2.pdf | |
PWC | https://paperswithcode.com/paper/physics-guided-neural-networks-pgnn-an |
Repo | https://github.com/ballcap231/ML-Applied-for-Physics |
Framework | none |
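The lake-temperature physics constraint can be sketched as a loss term: convert predicted temperatures to densities with a standard empirical fresh-water density formula and penalize any profile where density decreases with depth. How this penalty is weighted against the supervised loss is an assumption here.

```python
import torch

def water_density(temp_c):
    """Fresh-water density as a function of temperature (a standard empirical formula)."""
    return 1000.0 * (1 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

def physics_guided_loss(pred_temp_by_depth):
    """Sketch of a physics-based penalty: density computed from predicted temperatures
    should be non-decreasing with depth (denser water lies deeper).
    pred_temp_by_depth: [B, D] predictions ordered from shallow to deep."""
    rho = water_density(pred_temp_by_depth)
    violation = torch.relu(rho[:, :-1] - rho[:, 1:])    # positive where density drops with depth
    return violation.mean()

# total objective = supervised error + lambda_phys * physics_guided_loss(preds)
# (lambda_phys is an assumed hyperparameter, applicable to unlabeled depth profiles as well)
```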
Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling
Title | Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling |
Authors | Samuel Burkart, Franz J Király |
Abstract | Testing (conditional) independence of multivariate random variables is a task central to statistical inference and modelling in general - though unfortunately one for which to date there does not exist a practicable workflow. State-of-the-art workflows suffer from the need for heuristic or subjective manual choices, high computational complexity, or strong parametric assumptions. We address these problems by establishing a theoretical link between multivariate/conditional independence testing, and model comparison in the multivariate predictive modelling aka supervised learning task. This link allows advances in the extensively studied supervised learning workflow to be directly transferred to independence testing workflows - including automated tuning of machine learning type which addresses the need for a heuristic choice, the ability to quantitatively trade-off computational demand with accuracy, and the modern black-box philosophy for checking and interfacing. As a practical implementation of this link between the two workflows, we present a Python package ‘pcit’, which implements our novel multivariate and conditional independence tests, interfacing the supervised learning API of the scikit-learn package. Theory and package also allow for straightforward independence test based learning of graphical model structure. We empirically show that our proposed predictive independence tests outperform or are on par with current practice, and the derived graphical model structure learning algorithms asymptotically recover the ‘true’ graph. This paper, and the ‘pcit’ package accompanying it, thus provide powerful, scalable, generalizable, and easy-to-use methods for multivariate and conditional independence testing, as well as for graphical model structure learning. |
Tasks | |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05869v2 |
http://arxiv.org/pdf/1711.05869v2.pdf | |
PWC | https://paperswithcode.com/paper/predictive-independence-testing-predictive |
Repo | https://github.com/alan-turing-institute/pcit |
Framework | none |
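The core idea can be sketched without the package: test whether adding X improves out-of-sample prediction of Y beyond Z alone. The snippet below only reports the cross-validated error difference with a fixed scikit-learn model; ‘pcit’ itself adds automated model tuning and a proper statistical comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def predictive_ci_score(X, Y, Z, model=None, cv=5):
    """Sketch of predictive conditional independence testing: does adding X improve
    prediction of Y beyond Z alone? (Model choice and the lack of a formal test
    here are simplifications.)"""
    model = model or RandomForestRegressor(n_estimators=100, random_state=0)
    err_z = -cross_val_score(model, Z, Y, cv=cv,
                             scoring="neg_mean_squared_error").mean()
    err_xz = -cross_val_score(model, np.hstack([X, Z]), Y, cv=cv,
                              scoring="neg_mean_squared_error").mean()
    return err_z - err_xz      # clearly > 0 suggests X is predictive of Y given Z

# toy usage: Y depends on Z only, so the improvement from adding X should be near zero
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2)); X = rng.normal(size=(500, 2))
Y = Z[:, 0] + 0.1 * rng.normal(size=500)
print(predictive_ci_score(X, Y, Z))
```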
Deep MR to CT Synthesis using Unpaired Data
Title | Deep MR to CT Synthesis using Unpaired Data |
Authors | Jelmer M. Wolterink, Anna M. Dinkla, Mark H. F. Savenije, Peter R. Seevinck, Cornelis A. T. van den Berg, Ivana Isgum |
Abstract | MR-only radiotherapy treatment planning requires accurate MR-to-CT synthesis. Current deep learning methods for MR-to-CT synthesis depend on pairwise aligned MR and CT training images of the same patient. However, misalignment between paired images could lead to errors in synthesized CT images. To overcome this, we propose to train a generative adversarial network (GAN) with unpaired MR and CT images. A GAN consisting of two synthesis convolutional neural networks (CNNs) and two discriminator CNNs was trained with cycle consistency to transform 2D brain MR image slices into 2D brain CT image slices and vice versa. Brain MR and CT images of 24 patients were analyzed. A quantitative evaluation showed that the model was able to synthesize CT images that closely approximate reference CT images, and was able to outperform a GAN model trained with paired MR and CT images. |
Tasks | |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.01155v1 |
http://arxiv.org/pdf/1708.01155v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-mr-to-ct-synthesis-using-unpaired-data |
Repo | https://github.com/ChengBinJin/SpineC2M |
Framework | tf |
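The unpaired training hinges on cycle consistency between the two synthesis CNNs. A minimal sketch of that term follows; the weight and the L1 form are assumptions, and the full objective also includes the two adversarial losses, which are omitted.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(mr, ct, G_mr2ct, G_ct2mr, lam=10.0):
    """Sketch of the cycle-consistency term for unpaired MR-to-CT training.
    G_mr2ct / G_ct2mr: the two synthesis networks; lam is an assumed weight."""
    cyc_mr = F.l1_loss(G_ct2mr(G_mr2ct(mr)), mr)   # MR -> CT -> MR should recover the MR slice
    cyc_ct = F.l1_loss(G_mr2ct(G_ct2mr(ct)), ct)   # CT -> MR -> CT should recover the CT slice
    return lam * (cyc_mr + cyc_ct)
```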
ELFI: Engine for Likelihood-Free Inference
Title | ELFI: Engine for Likelihood-Free Inference |
Authors | Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Pekka Marttinen, Michael U. Gutmann, Aki Vehtari, Jukka Corander, Samuel Kaski |
Abstract | Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference up to several orders of magnitude by surrogate-modelling the distance. ELFI also has an inbuilt support for output data storing for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality. This makes the adding of new inference methods to ELFI straightforward and automatically compatible with the inbuilt features. |
Tasks | |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00707v3 |
http://arxiv.org/pdf/1708.00707v3.pdf | |
PWC | https://paperswithcode.com/paper/elfi-engine-for-likelihood-free-inference |
Repo | https://github.com/elfi-dev/elfi |
Framework | none |
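To illustrate the components an ELFI graph wires together (prior, simulator, summaries, distance), here is a plain-Python rejection-ABC sketch on a toy Gaussian problem. This is deliberately not the ELFI API; it only shows the likelihood-free workflow that the library organizes and accelerates (e.g., with BOLFI).

```python
import numpy as np

def rejection_abc(observed, prior_sample, simulator, summary, n_draws=10000, quantile=0.01):
    """Plain-Python rejection sampler illustrating the LFI components (prior, simulator,
    summary, distance) that an ELFI graph arranges; not the ELFI API itself."""
    obs_s = summary(observed)
    thetas, dists = [], []
    for _ in range(n_draws):
        theta = prior_sample()
        d = np.linalg.norm(summary(simulator(theta)) - obs_s)   # distance on summaries
        thetas.append(theta); dists.append(d)
    thetas, dists = np.array(thetas), np.array(dists)
    keep = dists <= np.quantile(dists, quantile)                # accept the closest simulations
    return thetas[keep]

# toy example: infer the mean of a Gaussian from 100 observations
rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=100)
post = rejection_abc(observed,
                     prior_sample=lambda: rng.uniform(-5, 5),
                     simulator=lambda mu: rng.normal(mu, 1.0, size=100),
                     summary=lambda x: np.array([x.mean(), x.std()]))
print(post.mean())
```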
A Deep Relevance Matching Model for Ad-hoc Retrieval
Title | A Deep Relevance Matching Model for Ad-hoc Retrieval |
Authors | Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft |
Abstract | In recent years, deep neural networks have led to exciting breakthroughs in speech recognition, computer vision, and natural language processing (NLP) tasks. However, there have been few positive results of deep models on ad-hoc retrieval tasks. This is partially due to the fact that many important characteristics of the ad-hoc retrieval task have not been well addressed in deep models yet. Typically, the ad-hoc retrieval task is formalized as a matching problem between two pieces of text in existing work using deep models, and treated equivalent to many NLP tasks such as paraphrase identification, question answering and automatic conversation. However, we argue that the ad-hoc retrieval task is mainly about relevance matching while most NLP matching tasks concern semantic matching, and there are some fundamental differences between these two matching tasks. Successful relevance matching requires proper handling of the exact matching signals, query term importance, and diverse matching requirements. In this paper, we propose a novel deep relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model employs a joint deep architecture at the query term level for relevance matching. By using matching histogram mapping, a feed forward matching network, and a term gating network, we can effectively deal with the three relevance matching factors mentioned above. Experimental results on two representative benchmark collections show that our model can significantly outperform some well-known retrieval models as well as state-of-the-art deep matching models. |
Tasks | Ad-Hoc Information Retrieval, Paraphrase Identification, Question Answering, Speech Recognition |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08611v1 |
http://arxiv.org/pdf/1711.08611v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-relevance-matching-model-for-ad-hoc |
Repo | https://github.com/faneshion/DRMM |
Framework | none |
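The distinctive input representation in DRMM is the matching histogram: for each query term, cosine similarities against all document terms are binned over [-1, 1] and count-transformed before entering the feed-forward matching network. A sketch for a single query term, with the bin count and the log1p transform as assumptions:

```python
import numpy as np

def matching_histogram(q_vec, doc_vecs, n_bins=30):
    """Sketch of DRMM's matching histogram mapping for one query term."""
    q = q_vec / np.linalg.norm(q_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                                        # cosine similarities in [-1, 1]
    edges = np.linspace(-1.0, 1.0, n_bins + 1)          # last bin captures exact matches (sim = 1)
    counts, _ = np.histogram(np.clip(sims, -1.0, 1.0), bins=edges)
    return np.log1p(counts.astype(float))               # log-count histogram for the matching network

# toy usage with random embeddings (one histogram per query term; a term gating
# network then weights the per-term scores)
rng = np.random.default_rng(0)
h = matching_histogram(rng.normal(size=50), rng.normal(size=(200, 50)))
print(h.shape)
```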
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Title | Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? |
Authors | Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg |
Abstract | Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization. Here we consider a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer, and we propose their use as environments for evaluating and comparing different approaches to reinforcement learning. These games have a number of appealing features: they are challenging for current learning approaches, but they form (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyse multiagent play, and even develop a self play algorithm. Code can be found at: https://github.com/rubai5/ESS_Game |
Tasks | |
Published | 2017-11-07 |
URL | http://arxiv.org/abs/1711.02301v5 |
http://arxiv.org/pdf/1711.02301v5.pdf | |
PWC | https://paperswithcode.com/paper/can-deep-reinforcement-learning-solve-erdos |
Repo | https://github.com/benburk/erdos_selfridge_spencer_games |
Framework | none |
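The "linear closed form solution for optimal behavior" comes from the Erdos-Selfridge potential argument: a piece at level l, i.e. K - l steps from the target level K, contributes 2^-(K - l), and the defender can keep the total potential bounded by always destroying the half of the attacker's partition with the larger potential. A sketch follows; the exact game parametrization used in the paper is simplified here.

```python
def potential(levels, K):
    """Erdos-Selfridge potential: a piece at level l contributes 2 ** -(K - l).
    If the potential is below 1, the defender can force a win."""
    return sum(2.0 ** -(K - l) for l in levels)

def defender_move(set_a, set_b, K):
    """Optimal defender play (a sketch): destroy the half with the larger potential,
    so the surviving potential never exceeds half the pre-move total."""
    return set_a if potential(set_a, K) >= potential(set_b, K) else set_b

# toy usage: K = 3, pieces at levels 0 and 1, attacker splits them into {0} and {1}
K = 3
destroy = defender_move([0], [1], K)
print(destroy, potential([0, 1], K))
```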
Training a Fully Convolutional Neural Network to Route Integrated Circuits
Title | Training a Fully Convolutional Neural Network to Route Integrated Circuits |
Authors | Sambhav R. Jain, Kye Okabe |
Abstract | We present a deep, fully convolutional neural network that learns to route a circuit layout net with appropriate choice of metal tracks and wire class combinations. Inputs to the network are the encoded layouts containing spatial location of pins to be routed. After 15 fully convolutional stages followed by a score comparator, the network outputs 8 layout layers (corresponding to 4 route layers, 3 via layers and an identity-mapped pin layer) which are then decoded to obtain the routed layouts. We formulate this as a binary segmentation problem on a per-pixel per-layer basis, where the network is trained to correctly classify pixels in each layout layer to be ‘on’ or ‘off’. To demonstrate learnability of layout design rules, we train the network on a dataset of 50,000 train and 10,000 validation samples that we generate based on certain pre-defined layout constraints. Precision, recall and $F_1$ score metrics are used to track the training progress. Our network achieves $F_1 \approx 97\%$ on the train set and $F_1 \approx 92\%$ on the validation set. We use PyTorch for implementing our model. Code is made publicly available at https://github.com/sjain-stanford/deep-route . |
Tasks | |
Published | 2017-06-27 |
URL | http://arxiv.org/abs/1706.08948v2 |
http://arxiv.org/pdf/1706.08948v2.pdf | |
PWC | https://paperswithcode.com/paper/training-a-fully-convolutional-neural-network |
Repo | https://github.com/sjain-stanford/deep-route |
Framework | pytorch |
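The evaluation described in the abstract treats routing as per-pixel binary classification on each of the 8 layout layers. A small sketch of the per-layer F1 computation (array shapes and the toy data are assumptions):

```python
import numpy as np

def f1_per_layer(pred, target, eps=1e-9):
    """Sketch of the precision/recall/F1 tracking described in the abstract,
    treating routing as per-pixel binary classification on each layout layer.
    pred, target: [L, H, W] binary arrays (L layout layers)."""
    tp = np.logical_and(pred == 1, target == 1).sum(axis=(1, 2))
    fp = np.logical_and(pred == 1, target == 0).sum(axis=(1, 2))
    fn = np.logical_and(pred == 0, target == 1).sum(axis=(1, 2))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)

# toy usage: 8 layout layers of 32x32 pixels, with a few corrupted rows
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=(8, 32, 32))
pred = target.copy(); pred[:, :2] = 1 - pred[:, :2]
print(f1_per_layer(pred, target))
```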
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
Title | Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification |
Authors | Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen |
Abstract | Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential of a purely attention based local feature integration. Accounting for the characteristics of such features in video classification, we propose a local feature integration framework based on attention clusters, and introduce a shifting operation to capture more diverse signals. We carefully analyze and compare the effect of different attention mechanisms, cluster sizes, and the use of the shifting operation, and also investigate the combination of attention clusters for multimodal integration. We demonstrate the effectiveness of our framework on three real-world video classification datasets. Our model achieves competitive results across all of these. In particular, on the large-scale Kinetics dataset, our framework obtains an excellent single model accuracy of 79.4% in terms of the top-1 and 94.0% in terms of the top-5 accuracy on the validation set. The attention clusters are the backbone of our winner solution at ActivityNet Kinetics Challenge 2017. Code and models will be released soon. |
Tasks | Video Classification |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09550v1 |
http://arxiv.org/pdf/1711.09550v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-clusters-purely-attention-based |
Repo | https://github.com/pomonam/AttentionCluster |
Framework | tf |
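One attention unit with the shifting operation can be sketched as an attention-weighted sum of the local features, followed by a learnable scale and shift and L2 normalization; an attention cluster concatenates several such units. The dimensions and the omission of the per-cluster normalization constant are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionUnitWithShift(nn.Module):
    """Sketch of a single attention unit with the shifting operation (dimensions assumed)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, x):                                 # x: [B, T, feat_dim] local features
        a = torch.softmax(self.score(x), dim=1)           # attention weights over the T features
        v = (a * x).sum(dim=1)                            # attention-weighted aggregation
        return F.normalize(self.alpha * v + self.beta, dim=-1)  # shifting operation

# an attention cluster = concatenation of several units (4 here, an illustrative choice)
units = nn.ModuleList([AttentionUnitWithShift() for _ in range(4)])
x = torch.randn(2, 20, 128)
cluster_out = torch.cat([u(x) for u in units], dim=-1)
print(cluster_out.shape)   # [2, 512]
```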
Unsupervised Learning for Cell-level Visual Representation in Histopathology Images with Generative Adversarial Networks
Title | Unsupervised Learning for Cell-level Visual Representation in Histopathology Images with Generative Adversarial Networks |
Authors | Bo Hu, Ye Tang, Eric I-Chao Chang, Yubo Fan, Maode Lai, Yan Xu |
Abstract | The visual attributes of cells, such as the nuclear morphology and chromatin openness, are critical for histopathology image analysis. By learning cell-level visual representation, we can obtain a rich mix of features that are highly reusable for various tasks, such as cell-level classification, nuclei segmentation, and cell counting. In this paper, we propose a unified generative adversarial networks architecture with a new formulation of loss to perform robust cell-level visual representation learning in an unsupervised setting. Our model is not only label-free and easily trained but also capable of cell-level unsupervised classification with interpretable visualization, which achieves promising results in the unsupervised classification of bone marrow cellular components. Based on the proposed cell-level visual representation learning, we further develop a pipeline that exploits the varieties of cellular elements to perform histopathology image classification, the advantages of which are demonstrated on bone marrow datasets. |
Tasks | Image Classification, Representation Learning |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11317v4 |
http://arxiv.org/pdf/1711.11317v4.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-for-cell-level-visual |
Repo | https://github.com/bohu615/nu_gan |
Framework | pytorch |
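One common way a GAN yields unsupervised classes is an InfoGAN-style categorical latent code that an auxiliary head must recover from the generated image; the sketch below illustrates that mechanism only, and the paper's actual architecture and loss formulation differ in detail.

```python
import torch
import torch.nn.functional as F

def sample_latent(batch, n_classes=5, noise_dim=64):
    """Sketch: latent = continuous noise + a categorical code. Recovering the code from
    the generated cell image (a mutual-information style auxiliary loss) is what makes
    unsupervised cell-level classification possible; dimensions are assumptions."""
    noise = torch.randn(batch, noise_dim)
    cls = torch.randint(0, n_classes, (batch,))
    code = F.one_hot(cls, n_classes).float()
    return torch.cat([noise, code], dim=1), cls

def code_recovery_loss(q_logits, cls):
    """Auxiliary head Q predicts the categorical code back from the generated image."""
    return F.cross_entropy(q_logits, cls)

# toy usage: draw latents for a batch of 8 generated cells
z, cls = sample_latent(8)
print(z.shape, cls.shape)
```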
Temporal Action Detection with Structured Segment Networks
Title | Temporal Action Detection with Structured Segment Networks |
Authors | Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin |
Abstract | Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping (TAG) is devised to generate high quality action proposals. On two challenging benchmarks, THUMOS14 and ActivityNet, our method remarkably outperforms previous state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling actions with various temporal structures. |
Tasks | Action Detection, Action Recognition In Videos |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06228v2 |
http://arxiv.org/pdf/1704.06228v2.pdf | |
PWC | https://paperswithcode.com/paper/temporal-action-detection-with-structured |
Repo | https://github.com/open-mmlab/mmaction |
Framework | pytorch |
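The structured temporal pyramid can be sketched as pooling over a starting stage, a small pyramid over the course (proposal) stage, and an ending stage, with the concatenation fed to the activity and completeness classifiers. The stage extents and pyramid levels below are assumptions.

```python
import numpy as np

def structured_temporal_pyramid(features, start, end, levels=(1, 2)):
    """Sketch of SSN-style structured temporal pyramid pooling for one proposal.
    features: [T, D] per-snippet features; [start, end) is the proposal ('course' stage).
    The proposal is augmented with starting/ending stages (half the proposal length on
    each side, an assumption), and the course stage gets a small pyramid."""
    T, D = features.shape
    span = end - start

    def pool(a, b):
        a, b = max(0, a), min(T, b)
        return features[a:b].mean(axis=0) if b > a else np.zeros(D)

    parts = [pool(start - span // 2, start)]                         # starting stage
    for L in levels:                                                 # pyramid over the course stage
        for i in range(L):
            parts.append(pool(start + i * span // L, start + (i + 1) * span // L))
    parts.append(pool(end, end + span // 2))                         # ending stage
    return np.concatenate(parts)   # fed to the activity and completeness classifiers

# toy usage: 100 snippets with 16-dim features, proposal covering snippets 30..60
feats = np.random.rand(100, 16)
print(structured_temporal_pyramid(feats, 30, 60).shape)   # (80,) = 16 * 5 stages/cells
```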