Paper Group AWR 106
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection. BlitzNet: A Real-Time Deep Network for Scene Understanding. Learning Graph-Level Representation for Drug Discovery. Neural Models for Documents with Metadata. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Local …
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection
Title | Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection |
Authors | Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox |
Abstract | General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB. |
Tasks | Action Classification, Action Localization, Skeleton Based Action Recognition, Spatio-Temporal Action Localization, Temporal Action Localization |
Published | 2017-04-03 |
URL | http://arxiv.org/abs/1704.00616v2 |
http://arxiv.org/pdf/1704.00616v2.pdf | |
PWC | https://paperswithcode.com/paper/chained-multi-stream-networks-exploiting-pose |
Repo | https://github.com/mzolfaghari/chained-multistream-networks |
Framework | none |
BlitzNet: A Real-Time Deep Network for Scene Understanding
Title | BlitzNet: A Real-Time Deep Network for Scene Understanding |
Authors | Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid |
Abstract | Real-time scene understanding has become crucial in many applications such as autonomous driving. In this paper, we propose a deep architecture, called BlitzNet, that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computations. Besides the computational gain of having a single network to perform several tasks, we show that object detection and semantic segmentation benefit from each other in terms of accuracy. Experimental results for VOC and COCO datasets show state-of-the-art performance for object detection and segmentation among real time systems. |
Tasks | Autonomous Driving, Object Detection, Real-Time Object Detection, Real-Time Semantic Segmentation, Scene Understanding, Semantic Segmentation |
Published | 2017-08-09 |
URL | http://arxiv.org/abs/1708.02813v1 |
http://arxiv.org/pdf/1708.02813v1.pdf | |
PWC | https://paperswithcode.com/paper/blitznet-a-real-time-deep-network-for-scene |
Repo | https://github.com/dvornikita/blitznet |
Framework | tf |
Learning Graph-Level Representation for Drug Discovery
Title | Learning Graph-Level Representation for Drug Discovery |
Authors | Junying Li, Deng Cai, Xiaofei He |
Abstract | Predicting macroscopic influences of drugs on the human body, like efficacy and toxicity, is a central problem of small-molecule based drug discovery. Molecules can be represented as an undirected graph, and we can utilize graph convolution networks to predict molecular properties. However, graph convolutional networks and other graph neural networks all focus on learning node-level representation rather than graph-level representation. Previous works simply sum all feature vectors for all nodes in the graph to obtain the graph feature vector for drug prediction. In this paper, we introduce a dummy super node that is connected with all nodes in the graph by a directed edge as the representation of the graph and modify the graph operation to help the dummy super node learn graph-level feature. Thus, we can handle graph-level classification and regression in the same way as node-level classification and regression. In addition, we apply focal loss to address class imbalance in drug datasets. The experiments on MoleculeNet show that our method can effectively improve the performance of molecular property prediction. |
Tasks | Drug Discovery |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03741v2 |
http://arxiv.org/pdf/1709.03741v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-graph-level-representation-for-drug |
Repo | https://github.com/ZJULearning/graph_level_drug_discovery |
Framework | none |
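The abstract above mentions focal loss as a remedy for class imbalance. As a rough illustration only, here is a minimal sketch of the standard binary focal loss, written in plain Python. The function name and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for the example; the abstract does not say which settings the authors used.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for a single binary prediction (illustrative sketch).

    p: predicted probability of the positive class.
    y: ground-truth label, 0 or 1.
    gamma, alpha: assumed defaults, not taken from the abstract.
    """
    # Probability assigned to the true class, and its class weight.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for positive examples, which is a convenient sanity check.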
Neural Models for Documents with Metadata
Title | Neural Models for Documents with Metadata |
Authors | Dallas Card, Chenhao Tan, Noah A. Smith |
Abstract | Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration. |
Tasks | Topic Models |
Published | 2017-05-25 |
URL | http://arxiv.org/abs/1705.09296v2 |
http://arxiv.org/pdf/1705.09296v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-models-for-documents-with-metadata |
Repo | https://github.com/dallascard/scholar |
Framework | pytorch |
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Title | HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization |
Authors | Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan |
Abstract | This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage both consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations. |
Tasks | Action Classification, Action Localization, Temporal Action Localization, Temporal Localization, Transfer Learning |
Published | 2017-12-26 |
URL | https://arxiv.org/abs/1712.09374v3 |
https://arxiv.org/pdf/1712.09374v3.pdf | |
PWC | https://paperswithcode.com/paper/hacs-human-action-clips-and-segments-dataset |
Repo | https://github.com/hangzhaomit/HACS-dataset |
Framework | none |
Improving the Neural GPU Architecture for Algorithm Learning
Title | Improving the Neural GPU Architecture for Algorithm Learning |
Authors | Karlis Freivalds, Renars Liepins |
Abstract | Algorithm learning is a core problem in artificial intelligence with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end. |
Tasks | |
Published | 2017-02-28 |
URL | http://arxiv.org/abs/1702.08727v2 |
http://arxiv.org/pdf/1702.08727v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-the-neural-gpu-architecture-for |
Repo | https://github.com/LUMII-Syslab/DNGPU |
Framework | tf |
Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking
Title | Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking |
Authors | Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides |
Abstract | Traditional generative adversarial networks (GAN) and many of its variants are trained by minimizing the KL or JS-divergence loss that measures how close the generated data distribution is from the true data distribution. A recent advance called the WGAN based on Wasserstein distance can improve on the KL and JS-divergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in the GAN training. In this work, we aim at improving on the WGAN by first generalizing its discriminator loss to a margin-based one, which leads to a better discriminator, and in turn a better generator, and then carrying out a progressive training paradigm involving multiple GANs to contribute to the maximum margin ranking loss so that the GAN at later stages will improve upon early stages. We call this method Gang of GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce the gap between the true data distribution and the generated data distribution by at least half in an optimally trained WGAN. We have also proposed a new way of measuring GAN quality which is based on image completion tasks. We have evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10, and 50K-SSFF, and have seen both visual and quantitative improvement over baseline WGAN. |
Tasks | |
Published | 2017-04-17 |
URL | http://arxiv.org/abs/1704.04865v1 |
http://arxiv.org/pdf/1704.04865v1.pdf | |
PWC | https://paperswithcode.com/paper/gang-of-gans-generative-adversarial-networks |
Repo | https://github.com/human-analysis/RankGAN |
Framework | pytorch |
DCFNet: Discriminant Correlation Filters Network for Visual Tracking
Title | DCFNet: Discriminant Correlation Filters Network for Visual Tracking |
Authors | Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu |
Abstract | Discriminant Correlation Filters (DCF) based methods have become a dominant approach to online object tracking. The features used in these methods, however, are either based on hand-crafted features like HoGs, or convolutional features trained independently from other tasks like image classification. In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously. Specifically, we treat DCF as a special correlation filter layer added in a Siamese network, and carefully derive the backpropagation through it by defining the network output as the probability heatmap of object location. Since the derivation is still carried out in the Fourier frequency domain, the efficiency property of DCF is preserved. This enables our tracker to run at more than 60 FPS during test time, while achieving a significant accuracy gain compared with KCF using HoGs. Extensive evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the proposed DCFNet tracker is competitive with several state-of-the-art trackers, while being more compact and much faster. |
Tasks | Object Tracking, Visual Tracking |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04057v1 |
http://arxiv.org/pdf/1704.04057v1.pdf | |
PWC | https://paperswithcode.com/paper/dcfnet-discriminant-correlation-filters |
Repo | https://github.com/linzhi123/DCFNet |
Framework | none |
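The DCFNet abstract above relies on a correlation filter that is solved and applied in the Fourier domain. The following is a minimal single-channel, one-dimensional sketch of that general idea, not the authors' implementation: it uses a naive DFT from the standard library instead of an FFT, and all function names, the Gaussian target, and the regularizer `lam` are assumptions made for the example.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def gaussian_target(n, sigma=1.0):
    """Desired response: a Gaussian peak at index 0 with circular wrap-around."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]

def train_filter(x, y, lam=1e-4):
    """Closed-form ridge-regression filter, computed per frequency."""
    x_hat, y_hat = dft(x), dft(y)
    return [yh.conjugate() * xh / (abs(xh) ** 2 + lam)
            for xh, yh in zip(x_hat, y_hat)]

def response(w_hat, z):
    """Correlate the learned filter with a search signal z."""
    z_hat = dft(z)
    g_hat = [wh.conjugate() * zh for wh, zh in zip(w_hat, z_hat)]
    return [value.real for value in dft(g_hat, inverse=True)]

def detect_shift(x, z, lam=1e-4):
    """Estimate the circular shift of z relative to the template x."""
    y = gaussian_target(len(x))
    g = response(train_filter(x, y, lam), z)
    return max(range(len(g)), key=g.__getitem__)
```

For instance, with the hypothetical template `x = [1, 3, -2, 5, 0, 4, -1, 2]`, calling `detect_shift(x, x[-3:] + x[:-3])` locates the response peak at index 3, matching the circular shift applied to the template. Because training and detection are element-wise operations on spectra, the cost is dominated by the transforms, which is the efficiency property the abstract refers to.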
An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation
Title | An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation |
Authors | Christian F. Baumgartner, Lisa M. Koch, Marc Pollefeys, Ender Konukoglu |
Abstract | Accurate segmentation of the heart is an important step towards evaluating cardiac function. In this paper, we present a fully automated framework for segmentation of the left (LV) and right (RV) ventricular cavities and the myocardium (Myo) on short-axis cardiac MR images. We investigate the suitability of various state-of-the-art 2D and 3D convolutional neural network architectures, as well as slight modifications thereof, for this task. Experiments were performed on the ACDC 2017 challenge training dataset comprising cardiac MR images of 100 patients, where manual reference segmentations were made available for end-diastolic (ED) and end-systolic (ES) frames. We find that processing the images in a slice-by-slice fashion using 2D networks is beneficial due to a relatively large slice thickness. However, the exact network architecture only plays a minor role. We report mean Dice coefficients of $0.950$ (LV), $0.893$ (RV), and $0.899$ (Myo), respectively, with an average evaluation time of 1.1 seconds per volume on a modern GPU. |
Tasks | Semantic Segmentation |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04496v2 |
http://arxiv.org/pdf/1709.04496v2.pdf | |
PWC | https://paperswithcode.com/paper/an-exploration-of-2d-and-3d-deep-learning |
Repo | https://github.com/baumgach/acdc_segmenter |
Framework | tf |
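The results in the abstract above are reported as Dice coefficients. For readers unfamiliar with the metric, here is a small sketch of how the Dice overlap between a predicted and a reference binary mask can be computed. The function name and the flat 0/1 list representation are assumptions for the example, as is the convention that two empty masks score 1.0.

```python
def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of elements")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    # Total foreground size of both masks.
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the masks coincide and 0.0 means they do not overlap at all, so the reported 0.950 for the LV indicates close agreement with the manual reference.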
Sparse models for Computer Vision
Title | Sparse models for Computer Vision |
Authors | Laurent Perrinet |
Abstract | The representation of images in the brain is known to be sparse. That is, as neural activity is recorded in a visual area —for instance the primary visual cortex of primates— only a few neurons are active at a given time with respect to the whole population. It is believed that such a property reflects the efficient match of the representation with the statistics of natural scenes. Applying such a paradigm to computer vision therefore seems a promising approach towards more biomimetic algorithms. Herein, we will describe a biologically-inspired approach to this problem. First, we will describe an unsupervised learning paradigm which is particularly adapted to the efficient coding of image patches. Then, we will outline a complete multi-scale framework —SparseLets— implementing a biologically inspired sparse representation of natural images. Finally, we will propose novel methods for integrating prior information into these algorithms and provide some preliminary experimental results. We will conclude by giving some perspective on applying such algorithms to computer vision. More specifically, we will propose that bio-inspired approaches may be applied to computer vision using predictive coding schemes, sparse models being one simple and efficient instance of such schemes. |
Tasks | |
Published | 2017-01-24 |
URL | http://arxiv.org/abs/1701.06859v1 |
http://arxiv.org/pdf/1701.06859v1.pdf | |
PWC | https://paperswithcode.com/paper/sparse-models-for-computer-vision |
Repo | https://github.com/bicv/Perrinet2015BICV_sparse |
Framework | none |
General Latent Feature Models for Heterogeneous Datasets
Title | General Latent Feature Models for Heterogeneous Datasets |
Authors | Isabel Valera, Melanie F. Pradier, Maria Lomeli, Zoubin Ghahramani |
Abstract | Latent feature modeling allows capturing the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions either for new values of interest or missing information in the original data, as well as to perform data exploratory analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes that describe each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous databases. In this paper, we introduce a general Bayesian nonparametric latent feature model suitable for heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allow us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in data exploratory analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package of the GLFM is publicly available for other researchers to use and improve. |
Tasks | |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03779v2 |
http://arxiv.org/pdf/1706.03779v2.pdf | |
PWC | https://paperswithcode.com/paper/general-latent-feature-models-for |
Repo | https://github.com/ivaleraM/GLFM |
Framework | none |
Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes
Title | Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes |
Authors | Francois Petitjean, Wray Buntine, Geoffrey I. Webb, Nayyar Zaidi |
Abstract | This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit just on categorical datasets. As data assets become larger, entering the hyped world of “big”, efficient accurate classification requires three main elements: (1) classifiers with low-bias that can capture the fine-detail of large datasets (2) out-of-core learners that can learn from data without having to hold it all in main memory and (3) models that can classify new data very efficiently. The latest Bayesian network classifiers (BNCs) satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is made lower to accurately model classification tasks, so is the accuracy of their parameters’ estimates, as each parameter is estimated from ever decreasing quantities of data. In this paper, we introduce the use of Hierarchical Dirichlet Processes for accurate BNC parameter estimation. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with Random Forest in terms of prediction, while keeping the out-of-core capability and superior classification time. |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07581v3 |
http://arxiv.org/pdf/1708.07581v3.pdf | |
PWC | https://paperswithcode.com/paper/accurate-parameter-estimation-for-bayesian |
Repo | https://github.com/fpetitjean/HDP |
Framework | none |
Forecasting of commercial sales with large scale Gaussian Processes
Title | Forecasting of commercial sales with large scale Gaussian Processes |
Authors | Rodrigo Rivera, Evgeny Burnaev |
Abstract | This paper argues that applications of Gaussian Processes in the fast-moving consumer goods industry have not been discussed enough. Yet this technique can be important: for example, it can provide automatic feature relevance determination, and the posterior mean can unlock insights on the data. Significant challenges are the large size and high dimensionality of commercial data at a point of sale. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales, and shows the value of this type of model as a decision-making tool for management. |
Tasks | Decision Making, Gaussian Processes |
Published | 2017-09-16 |
URL | http://arxiv.org/abs/1709.05548v1 |
http://arxiv.org/pdf/1709.05548v1.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-of-commercial-sales-with-large |
Repo | https://github.com/rodrigorivera/forecastingcommercial |
Framework | none |
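The abstract above cites automatic feature relevance determination as a benefit of Gaussian Processes. One common way to obtain it is a squared-exponential kernel with a separate length scale per input dimension; the sketch below illustrates that idea. Whether the paper uses exactly this kernel is not stated in the abstract, and the function name and parameters are assumptions for the example.

```python
import math

def ard_rbf_kernel(x, x_prime, lengthscales, variance=1.0):
    """Squared-exponential kernel with one length scale per feature."""
    # Each squared difference is scaled by its own length scale, so a very
    # large length scale makes the corresponding feature nearly irrelevant.
    scaled_sq_dist = sum(((a - b) / ell) ** 2
                         for a, b, ell in zip(x, x_prime, lengthscales))
    return variance * math.exp(-0.5 * scaled_sq_dist)
```

When the length scales are fitted to data, features that end up with large values have little influence on the kernel, which is how relevance can be read off the trained model.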
Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)
Title | Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN) |
Authors | Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, Ge Wang |
Abstract | Given the potential X-ray radiation risk to the patient, low-dose CT has attracted considerable interest in the medical imaging field. The current mainstream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction, but they need to access original raw data whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while keeping structural details. Inspired by the idea of deep learning, here we combine the autoencoder, the deconvolution network, and shortcut connections into the residual encoder-decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to state-of-the-art methods in both simulated and clinical cases. In particular, our method has been favorably evaluated in terms of noise suppression, structural preservation and lesion detection. |
Tasks | |
Published | 2017-02-01 |
URL | http://arxiv.org/abs/1702.00288v3 |
http://arxiv.org/pdf/1702.00288v3.pdf | |
PWC | https://paperswithcode.com/paper/low-dose-ct-with-a-residual-encoder-decoder |
Repo | https://github.com/SSinyu/RED_CNN |
Framework | pytorch |
Adposition and Case Supersenses v2.5: Guidelines for English
Title | Adposition and Case Supersenses v2.5: Guidelines for English |
Authors | Nathan Schneider, Jena D. Hwang, Archna Bhatia, Vivek Srikumar, Na-Rae Han, Tim O’Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, Jakob Prange |
Abstract | This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 50 semantic labels (“supersenses”) that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-gu/streusle/; version 4.3 tracks guidelines version 2.5). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2015, 2016) (henceforth “v1”), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. Schneider et al. (2018) summarize the scheme, its application to English corpus data, and an automatic disambiguation task. |
Tasks | |
Published | 2017-04-07 |
URL | https://arxiv.org/abs/1704.02134v6 |
https://arxiv.org/pdf/1704.02134v6.pdf | |
PWC | https://paperswithcode.com/paper/adposition-and-case-supersenses-v2-guidelines |
Repo | https://github.com/nert-nlp/streusle |
Framework | none |