July 29, 2019

3177 words 15 mins read

Paper Group AWR 106

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

Title Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection
Authors Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox
Abstract General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification as well as to spatial and temporal action localization. The two contributions clearly improve the performance over respective baselines. The overall approach achieves state-of-the-art action classification performance on HMDB51, J-HMDB and NTU RGB+D datasets. Moreover, it yields state-of-the-art spatio-temporal action localization results on UCF101 and J-HMDB.
Tasks Action Classification, Action Localization, Skeleton Based Action Recognition, Spatio-Temporal Action Localization, Temporal Action Localization
Published 2017-04-03
URL http://arxiv.org/abs/1704.00616v2
PDF http://arxiv.org/pdf/1704.00616v2.pdf
PWC https://paperswithcode.com/paper/chained-multi-stream-networks-exploiting-pose
Repo https://github.com/mzolfaghari/chained-multistream-networks
Framework none
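
The chaining idea is easy to picture in code: each stream's classifier receives, in addition to its own features, the soft predictions of the previous stream, so cues are integrated successively. Below is a minimal PyTorch sketch of that Markov-chain fusion; the layer sizes, head structure, and cue order are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChainedFusion(nn.Module):
    """Toy chained fusion: pose -> motion -> RGB, each stage refining scores."""
    def __init__(self, feat_dim=512, num_classes=51):
        super().__init__()
        self.pose_head = nn.Linear(feat_dim, num_classes)
        self.motion_head = nn.Linear(feat_dim + num_classes, num_classes)
        self.rgb_head = nn.Linear(feat_dim + num_classes, num_classes)

    def forward(self, pose_feat, motion_feat, rgb_feat):
        s1 = self.pose_head(pose_feat)  # scores from pose alone
        # Each later stream conditions on the previous stream's prediction.
        s2 = self.motion_head(torch.cat([motion_feat, s1.softmax(-1)], -1))
        s3 = self.rgb_head(torch.cat([rgb_feat, s2.softmax(-1)], -1))
        return s3

scores = ChainedFusion()(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 512))
```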

BlitzNet: A Real-Time Deep Network for Scene Understanding

Title BlitzNet: A Real-Time Deep Network for Scene Understanding
Authors Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid
Abstract Real-time scene understanding has become crucial in many applications such as autonomous driving. In this paper, we propose a deep architecture, called BlitzNet, that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computations. Besides the computational gain of having a single network to perform several tasks, we show that object detection and semantic segmentation benefit from each other in terms of accuracy. Experimental results for VOC and COCO datasets show state-of-the-art performance for object detection and segmentation among real time systems.
Tasks Autonomous Driving, Object Detection, Real-Time Object Detection, Real-Time Semantic Segmentation, Scene Understanding, Semantic Segmentation
Published 2017-08-09
URL http://arxiv.org/abs/1708.02813v1
PDF http://arxiv.org/pdf/1708.02813v1.pdf
PWC https://paperswithcode.com/paper/blitznet-a-real-time-deep-network-for-scene
Repo https://github.com/dvornikita/blitznet
Framework tf
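
The efficiency claim rests on sharing one backbone between two task heads, so detection and segmentation cost a single forward pass. A hedged PyTorch sketch of that structure follows; the layer sizes and head designs are placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class TwoTaskNet(nn.Module):
    """Shared encoder feeding a detection head and a segmentation head."""
    def __init__(self, num_classes=21, num_anchors=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: per-location box offsets plus class scores.
        self.det = nn.Conv2d(128, num_anchors * (4 + num_classes), 3, padding=1)
        # Segmentation head: per-pixel logits upsampled back to input size.
        self.seg = nn.Sequential(
            nn.Conv2d(128, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        f = self.backbone(x)             # computed once
        return self.det(f), self.seg(f)  # both tasks reuse it

det_out, seg_out = TwoTaskNet()(torch.randn(1, 3, 128, 128))
```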

Learning Graph-Level Representation for Drug Discovery

Title Learning Graph-Level Representation for Drug Discovery
Authors Junying Li, Deng Cai, Xiaofei He
Abstract Predicting macroscopic influences of drugs on the human body, such as efficacy and toxicity, is a central problem of small-molecule based drug discovery. Molecules can be represented as an undirected graph, and we can utilize graph convolutional networks to predict molecular properties. However, graph convolutional networks and other graph neural networks all focus on learning node-level representation rather than graph-level representation. Previous works simply sum the feature vectors of all nodes in the graph to obtain the graph feature vector for drug prediction. In this paper, we introduce a dummy super node that is connected with all nodes in the graph by a directed edge as the representation of the graph and modify the graph operation to help the dummy super node learn graph-level features. Thus, we can handle graph-level classification and regression in the same way as node-level classification and regression. In addition, we apply focal loss to address class imbalance in drug datasets. The experiments on MoleculeNet show that our method can effectively improve the performance of molecular property prediction.
Tasks Drug Discovery
Published 2017-09-12
URL http://arxiv.org/abs/1709.03741v2
PDF http://arxiv.org/pdf/1709.03741v2.pdf
PWC https://paperswithcode.com/paper/learning-graph-level-representation-for-drug
Repo https://github.com/ZJULearning/graph_level_drug_discovery
Framework none
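
Two mechanisms from this entry are easy to sketch: appending a dummy super node whose embedding stands in for the whole graph, and focal loss for class-imbalanced labels. The NumPy sketch below is illustrative only; the super-node initialization and loss constants are assumptions, not the authors' code.

```python
import numpy as np

def add_super_node(adj, feats):
    """Append a super node that every graph node points to with a directed edge."""
    n = adj.shape[0]
    adj2 = np.zeros((n + 1, n + 1))
    adj2[:n, :n] = adj
    adj2[:n, n] = 1.0                            # node -> super node edges
    feats2 = np.vstack([feats, feats.mean(0)])   # init super node as mean (assumption)
    return adj2, feats2

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights easy examples to fight class imbalance."""
    pt = np.where(y == 1, p, 1 - p)
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt + 1e-12)))

adj2, feats2 = add_super_node(np.eye(4), np.random.randn(4, 8))
print(adj2.shape, focal_loss(np.array([0.9, 0.2]), np.array([1, 0])))
```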

Neural Models for Documents with Metadata

Title Neural Models for Documents with Metadata
Authors Dallas Card, Chenhao Tan, Noah A. Smith
Abstract Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly-used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration.
Tasks Topic Models
Published 2017-05-25
URL http://arxiv.org/abs/1705.09296v2
PDF http://arxiv.org/pdf/1705.09296v2.pdf
PWC https://paperswithcode.com/paper/neural-models-for-documents-with-metadata
Repo https://github.com/dallascard/scholar
Framework pytorch
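
The "flexible incorporation of metadata" can be pictured as feeding covariates into the variational encoder of a neural topic model, so the posterior over topic proportions depends on them. A minimal sketch under that assumption follows; it mirrors the general idea, not the SCHOLAR codebase.

```python
import torch
import torch.nn as nn

class MetaEncoder(nn.Module):
    """VAE-style encoder: bag-of-words plus metadata covariates in, topics out."""
    def __init__(self, vocab=2000, meta_dim=10, n_topics=50):
        super().__init__()
        self.net = nn.Linear(vocab + meta_dim, 2 * n_topics)  # mean and log-variance

    def forward(self, bow, meta):
        mu, logvar = self.net(torch.cat([bow, meta], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return torch.softmax(z, -1)  # document-topic proportions

theta = MetaEncoder()(torch.rand(4, 2000), torch.rand(4, 10))
```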

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

Title HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
Authors Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan
Abstract This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos. We refer to it as HACS (Human Action Clips and Segments). We leverage both consensus and disagreement among visual classifiers to automatically mine candidate short clips from unlabeled videos, which are subsequently validated by human annotators. The resulting dataset is dubbed HACS Clips. Through a separate process we also collect annotations defining action segment boundaries. This resulting dataset is called HACS Segments. Overall, HACS Clips consists of 1.5M annotated clips sampled from 504K untrimmed videos, and HACS Segments contains 139K action segments densely annotated in 50K untrimmed videos spanning 200 action categories. HACS Clips contains more labeled examples than any existing video benchmark. This renders our dataset both a large-scale action recognition benchmark and an excellent source for spatiotemporal feature learning. In our transfer learning experiments on three target datasets, HACS Clips outperforms Kinetics-600, Moments-In-Time and Sports1M as a pretraining source. On HACS Segments, we evaluate state-of-the-art methods of action proposal generation and action localization, and highlight the new challenges posed by our dense temporal annotations.
Tasks Action Classification, Action Localization, Temporal Action Localization, Temporal Localization, Transfer Learning
Published 2017-12-26
URL https://arxiv.org/abs/1712.09374v3
PDF https://arxiv.org/pdf/1712.09374v3.pdf
PWC https://paperswithcode.com/paper/hacs-human-action-clips-and-segments-dataset
Repo https://github.com/hangzhaomit/HACS-dataset
Framework none

Improving the Neural GPU Architecture for Algorithm Learning

Title Improving the Neural GPU Architecture for Algorithm Learning
Authors Karlis Freivalds, Renars Liepins
Abstract Algorithm learning is a core problem in artificial intelligence, with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.
Tasks
Published 2017-02-28
URL http://arxiv.org/abs/1702.08727v2
PDF http://arxiv.org/pdf/1702.08727v2.pdf
PWC https://paperswithcode.com/paper/improving-the-neural-gpu-architecture-for
Repo https://github.com/LUMII-Syslab/DNGPU
Framework tf
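
The "hard nonlinearities with saturation costs" technique can be sketched in a few lines: clamp activations to a hard linear range and add a penalty whenever pre-activations stray outside it, discouraging the saturation that slows training. The limit and cost weight below are illustrative guesses, not the paper's constants.

```python
import torch

def hard_tanh_with_cost(x, limit=1.0, cost_weight=1e-3):
    """Hard (clipped) nonlinearity plus a cost on how far inputs saturate."""
    y = torch.clamp(x, -limit, limit)
    saturation_cost = torch.relu(x.abs() - limit).mean()  # 0 inside the linear range
    return y, cost_weight * saturation_cost

y, cost = hard_tanh_with_cost(torch.randn(8, 16) * 2)  # add `cost` to the training loss
```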

Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking

Title Gang of GANs: Generative Adversarial Networks with Maximum Margin Ranking
Authors Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides
Abstract Traditional generative adversarial networks (GAN) and many of its variants are trained by minimizing the KL or JS-divergence loss that measures how close the generated data distribution is from the true data distribution. A recent advance called the WGAN based on Wasserstein distance can improve on the KL and JS-divergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in the GAN training. In this work, we aim at improving on the WGAN by first generalizing its discriminator loss to a margin-based one, which leads to a better discriminator, and in turn a better generator, and then carrying out a progressive training paradigm involving multiple GANs to contribute to the maximum margin ranking loss so that the GAN at later stages will improve upon early stages. We call this method Gang of GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce the gap between the true data distribution and the generated data distribution by at least half in an optimally trained WGAN. We have also proposed a new way of measuring GAN quality which is based on image completion tasks. We have evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10, and 50K-SSFF, and have seen both visual and quantitative improvement over baseline WGAN.
Tasks
Published 2017-04-17
URL http://arxiv.org/abs/1704.04865v1
PDF http://arxiv.org/pdf/1704.04865v1.pdf
PWC https://paperswithcode.com/paper/gang-of-gans-generative-adversarial-networks
Repo https://github.com/human-analysis/RankGAN
Framework pytorch
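
The margin-based generalization of the WGAN discriminator loss can be sketched as a hinge on the real/fake score gap: gaps already larger than the margin contribute nothing. The margin value here is an arbitrary placeholder, and this shows only the discriminator side of the objective.

```python
import torch

def margin_disc_loss(d_real, d_fake, margin=1.0):
    """Hinge loss: push D(real) above D(fake) by at least `margin`."""
    return torch.relu(margin + d_fake - d_real).mean()

loss = margin_disc_loss(torch.randn(32), torch.randn(32))
```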

DCFNet: Discriminant Correlation Filters Network for Visual Tracking

Title DCFNet: Discriminant Correlation Filters Network for Visual Tracking
Authors Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu
Abstract Discriminant Correlation Filters (DCF) based methods have become a dominant approach to online object tracking. The features used in these methods, however, are either hand-crafted features like HOGs, or convolutional features trained independently on other tasks such as image classification. In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously. Specifically, we treat DCF as a special correlation filter layer added in a Siamese network, and carefully derive the backpropagation through it by defining the network output as the probability heatmap of object location. Since the derivation is still carried out in the Fourier frequency domain, the efficiency property of DCF is preserved. This enables our tracker to run at more than 60 FPS during test time, while achieving a significant accuracy gain compared with KCF using HOGs. Extensive evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the proposed DCFNet tracker is competitive with several state-of-the-art trackers, while being more compact and much faster.
Tasks Object Tracking, Visual Tracking
Published 2017-04-13
URL http://arxiv.org/abs/1704.04057v1
PDF http://arxiv.org/pdf/1704.04057v1.pdf
PWC https://paperswithcode.com/paper/dcfnet-discriminant-correlation-filters
Repo https://github.com/linzhi123/DCFNet
Framework none
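
The DCF core that the paper wraps in a network layer is a closed-form filter in the Fourier domain. A single-channel NumPy sketch of that classic formulation follows (standard DCF algebra, not the DCFNet layer itself): train by a ridge-regularized division, respond by elementwise multiplication and an inverse FFT.

```python
import numpy as np

def dcf_train(x, y, lam=1e-4):
    """Closed-form filter: x is the template patch, y the desired response map."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def dcf_respond(w_hat, z):
    """Correlation response of the learned filter on a search patch z."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

x = np.random.randn(32, 32)
yy, xx = np.mgrid[-16:16, -16:16]
label = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))      # Gaussian peak at center
resp = dcf_respond(dcf_train(x, np.fft.fftshift(label)), x)
print(np.unravel_index(resp.argmax(), resp.shape))   # peak near (0, 0)
```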

An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation

Title An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation
Authors Christian F. Baumgartner, Lisa M. Koch, Marc Pollefeys, Ender Konukoglu
Abstract Accurate segmentation of the heart is an important step towards evaluating cardiac function. In this paper, we present a fully automated framework for segmentation of the left (LV) and right (RV) ventricular cavities and the myocardium (Myo) on short-axis cardiac MR images. We investigate the suitability of various state-of-the-art 2D and 3D convolutional neural network architectures, as well as slight modifications thereof, for this task. Experiments were performed on the ACDC 2017 challenge training dataset comprising cardiac MR images of 100 patients, where manual reference segmentations were made available for end-diastolic (ED) and end-systolic (ES) frames. We find that processing the images in a slice-by-slice fashion using 2D networks is beneficial due to a relatively large slice thickness. However, the exact network architecture only plays a minor role. We report mean Dice coefficients of $0.950$ (LV), $0.893$ (RV), and $0.899$ (Myo), respectively, with an average evaluation time of 1.1 seconds per volume on a modern GPU.
Tasks Semantic Segmentation
Published 2017-09-13
URL http://arxiv.org/abs/1709.04496v2
PDF http://arxiv.org/pdf/1709.04496v2.pdf
PWC https://paperswithcode.com/paper/an-exploration-of-2d-and-3d-deep-learning
Repo https://github.com/baumgach/acdc_segmenter
Framework tf
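
For readers unfamiliar with the metric, the Dice coefficients reported above measure overlap between predicted and reference masks as 2|A∩B| / (|A| + |B|); a minimal sketch for binary masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice overlap between two binary masks, in [0, 1]."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64), int); a[10:30, 10:30] = 1
b = np.zeros((64, 64), int); b[12:32, 12:32] = 1
print(round(dice(a, b), 3))
```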

Sparse models for Computer Vision

Title Sparse models for Computer Vision
Authors Laurent Perrinet
Abstract The representation of images in the brain is known to be sparse. That is, as neural activity is recorded in a visual area —for instance the primary visual cortex of primates— only a few neurons are active at a given time with respect to the whole population. It is believed that such a property reflects the efficient match of the representation with the statistics of natural scenes. Applying such a paradigm to computer vision therefore seems a promising approach towards more biomimetic algorithms. Herein, we will describe a biologically-inspired approach to this problem. First, we will describe an unsupervised learning paradigm which is particularly adapted to the efficient coding of image patches. Then, we will outline a complete multi-scale framework —SparseLets— implementing a biologically inspired sparse representation of natural images. Finally, we will propose novel methods for integrating prior information into these algorithms and provide some preliminary experimental results. We will conclude by giving some perspective on applying such algorithms to computer vision. More specifically, we will propose that bio-inspired approaches may be applied to computer vision using predictive coding schemes, sparse models being one simple and efficient instance of such schemes.
Tasks
Published 2017-01-24
URL http://arxiv.org/abs/1701.06859v1
PDF http://arxiv.org/pdf/1701.06859v1.pdf
PWC https://paperswithcode.com/paper/sparse-models-for-computer-vision
Repo https://github.com/bicv/Perrinet2015BICV_sparse
Framework none
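
The sparse-coding premise (explain a signal with only a few active dictionary atoms) is easiest to see in a greedy pursuit. The matching-pursuit sketch below is the textbook algorithm, shown for intuition; it is not the paper's SparseLets framework.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=3):
    """Greedy sparse coding; dictionary columns are unit-norm atoms."""
    residual, coeffs = signal.copy(), np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = np.abs(corr).argmax()              # pick the best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return coeffs, residual

D = np.random.randn(16, 64)
D /= np.linalg.norm(D, axis=0)
coeffs, res = matching_pursuit(np.random.randn(16), D)
print((coeffs != 0).sum(), np.linalg.norm(res))  # few nonzeros, shrinking residual
```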

General Latent Feature Models for Heterogeneous Datasets

Title General Latent Feature Models for Heterogeneous Datasets
Authors Isabel Valera, Melanie F. Pradier, Maria Lomeli, Zoubin Ghahramani
Abstract Latent feature modeling allows capturing the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions either for new values of interest or for missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets, where all the attributes that describe each object are of the same (continuous or discrete) nature, there is a lack of work on latent feature modeling for heterogeneous databases. In this paper, we introduce a general Bayesian nonparametric latent feature model suitable for heterogeneous datasets, where the attributes describing each object can be discrete, continuous, or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while keeping the properties of conjugate models, which allow us to infer the model in linear time with respect to the number of objects and attributes. Second, its Bayesian nonparametric nature allows us to automatically infer the model complexity from the data, i.e., the number of features necessary to capture the latent structure in the data. Third, the latent features in the model are binary-valued variables, easing the interpretability of the obtained latent features in exploratory data analysis. We show the flexibility of the proposed model by solving both prediction and data analysis tasks on several real-world datasets. Moreover, a software package of the GLFM is publicly available for other researchers to use and improve.
Tasks
Published 2017-06-12
URL http://arxiv.org/abs/1706.03779v2
PDF http://arxiv.org/pdf/1706.03779v2.pdf
PWC https://paperswithcode.com/paper/general-latent-feature-models-for
Repo https://github.com/ivaleraM/GLFM
Framework none
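
The Bayesian nonparametric machinery behind such binary latent feature models is typically an Indian Buffet Process prior, which lets the number of features grow with the data. A toy prior sampler follows (the standard IBP construction, shown for intuition; the GLFM's actual inference handles heterogeneous likelihoods on top of this).

```python
import numpy as np

def sample_ibp(n_objects, alpha=2.0, rng=np.random.default_rng(0)):
    """Draw a binary object-by-feature matrix from the Indian Buffet Process."""
    feature_counts, rows = [], []
    for i in range(1, n_objects + 1):
        # Reuse existing feature k with probability (objects having k) / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        new = rng.poisson(alpha / i)       # brand-new features for object i
        feature_counts += [1] * new
        rows.append(row + [1] * new)
    K = len(feature_counts)
    return np.array([r + [0] * (K - len(r)) for r in rows])

print(sample_ibp(5))
```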

Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes

Title Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes
Authors Francois Petitjean, Wray Buntine, Geoffrey I. Webb, Nayyar Zaidi
Abstract This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit just on categorical datasets. As data assets become larger, entering the hyped world of “big”, efficient accurate classification requires three main elements: (1) classifiers with low bias that can capture the fine detail of large datasets; (2) out-of-core learners that can learn from data without having to hold it all in main memory; and (3) models that can classify new data very efficiently. The latest Bayesian network classifiers (BNCs) satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is lowered to accurately model classification tasks, the accuracy of the parameter estimates drops, as each parameter is estimated from ever-decreasing quantities of data. In this paper, we introduce the use of hierarchical Dirichlet processes for accurate BNC parameter estimation. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with Random Forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
Tasks
Published 2017-08-25
URL http://arxiv.org/abs/1708.07581v3
PDF http://arxiv.org/pdf/1708.07581v3.pdf
PWC https://paperswithcode.com/paper/accurate-parameter-estimation-for-bayesian
Repo https://github.com/fpetitjean/HDP
Framework none
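
The intuition for HDP smoothing of probability tables can be shown with a toy back-off estimator: when a leaf of a conditional probability table has seen little data, its estimate is shrunk toward the distribution one level up the parent hierarchy. This sketch captures the spirit of hierarchical shrinkage, not the paper's collapsed Gibbs sampler.

```python
import numpy as np

def shrunk_estimate(leaf_counts, parent_probs, concentration=5.0):
    """Blend sparse leaf counts with the parent-level distribution."""
    n = leaf_counts.sum()
    return (leaf_counts + concentration * parent_probs) / (n + concentration)

parent = np.array([0.5, 0.3, 0.2])    # estimate one level up the hierarchy
leaf = np.array([1, 0, 0])            # a single observation at this leaf
print(shrunk_estimate(leaf, parent))  # pulled strongly toward the parent
```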

Forecasting of commercial sales with large scale Gaussian Processes

Title Forecasting of commercial sales with large scale Gaussian Processes
Authors Rodrigo Rivera, Evgeny Burnaev
Abstract This paper argues that there has not been enough discussion of applications of Gaussian Processes in the fast-moving consumer goods industry. Yet this technique can be important, as it can, e.g., provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial data at the point of sale. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales, and shows the value of this type of model as a decision-making tool for management.
Tasks Decision Making, Gaussian Processes
Published 2017-09-16
URL http://arxiv.org/abs/1709.05548v1
PDF http://arxiv.org/pdf/1709.05548v1.pdf
PWC https://paperswithcode.com/paper/forecasting-of-commercial-sales-with-large
Repo https://github.com/rodrigorivera/forecastingcommercial
Framework none
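
For context on why approximations are needed at the point-of-sale scale: exact GP regression solves an n-by-n linear system, which is O(n^3) in the number of observations. A minimal NumPy sketch of the exact posterior mean with an RBF kernel (the baseline that large-scale methods approximate):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    return rbf(x_test, x_train) @ np.linalg.solve(K, y_train)  # O(n^3) solve

x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * np.random.randn(50)
print(gp_posterior_mean(x, y, np.array([2.5, 7.5])))
```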

Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)

Title Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)
Authors Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, Ge Wang
Abstract Given the potential X-ray radiation risk to the patient, low-dose CT has attracted considerable interest in the medical imaging field. The current mainstream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction, but they need access to the original raw data, whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while keeping structural details. Inspired by the idea of deep learning, here we combine the autoencoder, the deconvolution network, and shortcut connections into the residual encoder-decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to state-of-the-art methods in both simulated and clinical cases. In particular, our method has been favorably evaluated in terms of noise suppression, structural preservation, and lesion detection.
Tasks
Published 2017-02-01
URL http://arxiv.org/abs/1702.00288v3
PDF http://arxiv.org/pdf/1702.00288v3.pdf
PWC https://paperswithcode.com/paper/low-dose-ct-with-a-residual-encoder-decoder
Repo https://github.com/SSinyu/RED_CNN
Framework pytorch
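
The RED-CNN shape (convolutional encoder, deconvolutional decoder, shortcut connections that add encoder features back in before decoding) reduces, at its smallest, to the toy below. Channel counts and depth are placeholders, far shallower than the paper's configuration; the network learns a residual correction on top of the noisy input.

```python
import torch
import torch.nn as nn

class TinyREDCNN(nn.Module):
    """Two-level residual encoder-decoder for image denoising."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(1, ch, 5, padding=2)
        self.enc2 = nn.Conv2d(ch, ch, 5, padding=2)
        self.dec2 = nn.ConvTranspose2d(ch, ch, 5, padding=2)
        self.dec1 = nn.ConvTranspose2d(ch, 1, 5, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        e1 = self.relu(self.enc1(x))
        e2 = self.relu(self.enc2(e1))
        d2 = self.relu(self.dec2(e2) + e1)   # shortcut from the encoder
        return self.relu(self.dec1(d2) + x)  # residual shortcut from the input

denoised = TinyREDCNN()(torch.rand(1, 1, 64, 64))
```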

Adposition and Case Supersenses v2.5: Guidelines for English

Title Adposition and Case Supersenses v2.5: Guidelines for English
Authors Nathan Schneider, Jena D. Hwang, Archna Bhatia, Vivek Srikumar, Na-Rae Han, Tim O’Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, Jakob Prange
Abstract This document offers a detailed linguistic description of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al., 2018), an inventory of 50 semantic labels (“supersenses”) that characterize the use of adpositions and case markers at a somewhat coarse level of granularity, as demonstrated in the STREUSLE corpus (https://github.com/nert-gu/streusle/; version 4.3 tracks guidelines version 2.5). Though the SNACS inventory aspires to be universal, this document is specific to English; documentation for other languages will be published separately. Version 2 is a revision of the supersense inventory proposed for English by Schneider et al. (2015, 2016) (henceforth “v1”), which in turn was based on previous schemes. The present inventory was developed after extensive review of the v1 corpus annotations for English, plus previously unanalyzed genitive case possessives (Blodgett and Schneider, 2018), as well as consideration of adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et al. (2017) present the theoretical underpinnings of the v2 scheme. Schneider et al. (2018) summarize the scheme, its application to English corpus data, and an automatic disambiguation task.
Tasks
Published 2017-04-07
URL https://arxiv.org/abs/1704.02134v6
PDF https://arxiv.org/pdf/1704.02134v6.pdf
PWC https://paperswithcode.com/paper/adposition-and-case-supersenses-v2-guidelines
Repo https://github.com/nert-nlp/streusle
Framework none