May 7, 2019

2842 words 14 mins read

Paper Group AWR 43

A Deep Multi-Level Network for Saliency Prediction. Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation. ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation. Event-based, 6-DOF Camera Tracking from Photometric Depth Maps. VConv-DAE: Dee …

A Deep Multi-Level Network for Saliency Prediction

Title A Deep Multi-Level Network for Saliency Prediction
Authors Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
Abstract This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network that weights low- and high-level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms competing approaches under all evaluation metrics on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.
Tasks Saliency Prediction
Published 2016-09-05
URL http://arxiv.org/abs/1609.01064v2
PDF http://arxiv.org/pdf/1609.01064v2.pdf
PWC https://paperswithcode.com/paper/a-deep-multi-level-network-for-saliency
Repo https://github.com/marcellacornia/mlnet
Framework pytorch
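
As a rough illustration of the idea in the abstract, the sketch below combines feature maps taken at three depths of a VGG-16 backbone, encodes them jointly, and adds a learned prior before producing the saliency map. It is not the authors' released model; the layer choices, channel sizes, and prior resolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class MultiLevelSaliency(nn.Module):
    """Sketch only: multi-level feature combination for saliency (assumed layers/sizes)."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.stage1 = vgg[:17]    # through pool3 -> 256 channels at 1/8 resolution
        self.stage2 = vgg[17:24]  # through pool4 -> 512 channels at 1/16 resolution
        self.stage3 = vgg[24:31]  # through pool5 -> 512 channels at 1/32 resolution
        self.encode = nn.Conv2d(256 + 512 + 512, 64, kernel_size=3, padding=1)
        self.predict = nn.Conv2d(64, 1, kernel_size=1)
        self.prior = nn.Parameter(torch.zeros(1, 1, 6, 8))  # learned, low-resolution prior

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        size = f1.shape[-2:]
        feats = torch.cat([f1,
                           F.interpolate(f2, size=size, mode='bilinear', align_corners=False),
                           F.interpolate(f3, size=size, mode='bilinear', align_corners=False)],
                          dim=1)                              # fuse low- and high-level maps
        sal = self.predict(F.relu(self.encode(feats)))
        prior = F.interpolate(self.prior, size=size, mode='bilinear', align_corners=False)
        return torch.sigmoid(sal + prior)

saliency = MultiLevelSaliency()(torch.randn(1, 3, 240, 320))  # (1, 1, 30, 40)
```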

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

Title Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation
Authors Golnaz Ghiasi, Charless C. Fowlkes
Abstract CNN architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance detection driven architectures.
Tasks Semantic Segmentation
Published 2016-05-08
URL http://arxiv.org/abs/1605.02264v2
PDF http://arxiv.org/pdf/1605.02264v2.pdf
PWC https://paperswithcode.com/paper/laplacian-pyramid-reconstruction-and
Repo https://github.com/golnazghiasi/LRR
Framework none
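
The sketch below illustrates one refinement step of the Laplacian-pyramid reconstruction idea: a coarse class-score map is upsampled, and a boundary correction predicted from a higher-resolution skip feature map is added after multiplicative gating. It is a simplification, not the released LRR code; the gating form and channel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementStep(nn.Module):
    """Sketch only: one Laplacian-pyramid refinement step (assumed gating form)."""
    def __init__(self, skip_channels, num_classes):
        super().__init__()
        self.boundary = nn.Conv2d(skip_channels, num_classes, 3, padding=1)
        self.gate = nn.Conv2d(num_classes, num_classes, 3, padding=1)

    def forward(self, coarse_scores, skip_feats):
        up = F.interpolate(coarse_scores, size=skip_feats.shape[-2:],
                           mode='bilinear', align_corners=False)  # upsample coarse prediction
        correction = self.boundary(skip_feats)   # high-frequency detail from the skip features
        gate = torch.sigmoid(self.gate(up))      # multiplicative gate from the coarse prediction
        return up + gate * correction            # refined, higher-resolution class scores

step = RefinementStep(skip_channels=256, num_classes=21)
refined = step(torch.randn(1, 21, 16, 16), torch.randn(1, 256, 32, 32))  # (1, 21, 32, 32)
```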

ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation

Title ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation
Authors Xin Jin, Le Wu, Xiaodong Li, Xiaokun Zhang, Jingying Chi, Siwei Peng, Shiming Ge, Geng Zhao, Shuying Li
Abstract In this paper, we address the challenging problem of aesthetic image classification, which is to label an input image as being of high or low aesthetic quality. We take both the local and the global features of images into consideration. A novel deep convolutional neural network named ILGNet is proposed, which combines Inception modules with a connected layer of both local and global features. ILGNet is based on GoogLeNet, so it is easy to start from a GoogLeNet pre-trained for large-scale image classification and fine-tune our connected layers on a large-scale database of aesthetics-related images, AVA, i.e., domain adaptation. The experiments reveal that our model achieves state-of-the-art results on the AVA database. Both the training and testing speeds of our model are higher than those of the original GoogLeNet.
Tasks Domain Adaptation, Image Classification, Image Quality Estimation
Published 2016-10-07
URL http://arxiv.org/abs/1610.02256v3
PDF http://arxiv.org/pdf/1610.02256v3.pdf
PWC https://paperswithcode.com/paper/ilgnet-inception-modules-with-connected-local
Repo https://github.com/BestiVictory/ILGnet
Framework caffe2
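
A rough PyTorch sketch (not the released model) of the connected local/global idea: globally pooled features from an early Inception stage and from the final stage of a GoogLeNet-style backbone are concatenated into one fully connected layer for the binary high/low aesthetic decision. The stage split and classifier size are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class AestheticNet(nn.Module):
    """Sketch only: connected local + global features on a GoogLeNet-style backbone."""
    def __init__(self):
        super().__init__()
        g = torchvision.models.googlenet(weights=None, aux_logits=False, init_weights=True)
        self.local_stage = nn.Sequential(g.conv1, g.maxpool1, g.conv2, g.conv3,
                                         g.maxpool2, g.inception3a, g.inception3b)   # "local"
        self.global_stage = nn.Sequential(g.maxpool3, g.inception4a, g.inception4b,
                                          g.inception4c, g.inception4d, g.inception4e,
                                          g.maxpool4, g.inception5a, g.inception5b)  # "global"
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(480 + 1024, 2)  # concatenated local + global features

    def forward(self, x):
        local_feats = self.local_stage(x)
        global_feats = self.global_stage(local_feats)
        joint = torch.cat([self.pool(local_feats).flatten(1),
                           self.pool(global_feats).flatten(1)], dim=1)
        return self.classifier(joint)              # high / low aesthetic quality logits

logits = AestheticNet()(torch.randn(1, 3, 224, 224))  # (1, 2)
```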

Event-based, 6-DOF Camera Tracking from Photometric Depth Maps

Title Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
Authors Guillermo Gallego, Jon E. A. Lund, Elias Mueggler, Henri Rebecq, Tobi Delbruck, Davide Scaramuzza
Abstract Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
Tasks
Published 2016-07-12
URL http://arxiv.org/abs/1607.03468v2
PDF http://arxiv.org/pdf/1607.03468v2.pdf
PWC https://paperswithcode.com/paper/event-based-6-dof-camera-tracking-from
Repo https://github.com/uzh-rpg/event-based_vision_resources
Framework none

VConv-DAE: Deep Volumetric Shape Learning Without Object Labels

Title VConv-DAE: Deep Volumetric Shape Learning Without Object Labels
Authors Abhishek Sharma, Oliver Grau, Mario Fritz
Abstract With the advent of affordable depth sensors, 3D capture is becoming more and more ubiquitous and has already made its way into commercial products. Yet, capturing the geometry or complete shapes of everyday objects using scanning devices (e.g. Kinect) still comes with several challenges that result in noise or even incomplete shapes. Recent success in deep learning has shown how to learn complex shape distributions in a data-driven way from large-scale 3D CAD model collections and to utilize them for 3D processing on volumetric representations, thereby circumventing problems of topology and tessellation. Prior work has shown encouraging results on problems ranging from shape completion to recognition. We provide an analysis of such approaches and discover that training, as well as the resulting representation, is strongly and unnecessarily tied to the notion of object labels. Thus, we propose a fully convolutional volumetric autoencoder that learns a volumetric representation from noisy data by estimating voxel occupancy grids. The proposed method outperforms prior work on challenging tasks like denoising and shape completion. We also show that the obtained deep embedding gives competitive performance when used for classification and promising results for shape interpolation.
Tasks Denoising
Published 2016-04-13
URL http://arxiv.org/abs/1604.03755v3
PDF http://arxiv.org/pdf/1604.03755v3.pdf
PWC https://paperswithcode.com/paper/vconv-dae-deep-volumetric-shape-learning
Repo https://github.com/diskhkme/VCONV_DAE_TF
Framework tf
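
A minimal sketch of a label-free volumetric denoising autoencoder in the spirit of the abstract: a noisy or partial voxel occupancy grid is encoded with strided 3D convolutions and decoded back to per-voxel occupancy probabilities. The grid size, layer counts, and channel widths are assumptions, and in practice the reconstruction target would be the clean grid.

```python
import torch
import torch.nn as nn

class VoxelDenoisingAE(nn.Module):
    """Sketch only: fully convolutional volumetric denoising autoencoder (assumed sizes)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 30^3 -> 15^3
            nn.Conv3d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # 15^3 -> 8^3
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # 8^3 -> 15^3
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),              # 15^3 -> 30^3
        )

    def forward(self, voxels):
        return torch.sigmoid(self.decoder(self.encoder(voxels)))  # per-voxel occupancy probability

model = VoxelDenoisingAE()
noisy = (torch.rand(8, 1, 30, 30, 30) > 0.5).float()              # batch of noisy 30^3 grids
recon = model(noisy)
loss = nn.functional.binary_cross_entropy(recon, noisy)           # in practice: target = clean grid
```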

Deep Learning with Differential Privacy

Title Deep Learning with Differential Privacy
Authors Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
Abstract Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
Tasks
Published 2016-07-01
URL http://arxiv.org/abs/1607.00133v2
PDF http://arxiv.org/pdf/1607.00133v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-differential-privacy
Repo https://github.com/cyrusgeyer/DiffPrivate_FedLearning
Framework tf
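
The core mechanism described in the abstract is what is now known as DP-SGD: clip each per-example gradient to a fixed L2 norm, add Gaussian noise to the summed gradient, then take a step. The sketch below shows that loop on plain logistic regression in NumPy; the clip norm, noise multiplier, and learning rate are illustrative, and tracking the actual privacy budget (the moments accountant) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_multiplier, lr, batch_size = 1.0, 1.1, 0.1, 32  # assumed hyperparameters

for step in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    preds = 1.0 / (1.0 + np.exp(-xb @ w))
    per_example_grads = (preds - yb)[:, None] * xb               # one gradient per example
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]                  # clip each gradient to L2 <= clip_norm
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)         # Gaussian mechanism on the sum
    w -= lr * noisy_sum / batch_size                              # noisy averaged gradient step
```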

Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation

Title Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation
Authors Petteri Teikari, Marc Santos, Charissa Poon, Kullervo Hynynen
Abstract Recently there has been an increasing trend to use deep learning frameworks for both 2D consumer images and 3D medical images. However, there has been little effort to use deep frameworks for volumetric vascular segmentation. We address this by providing a freely available dataset of 12 annotated two-photon vasculature microscopy stacks. We demonstrate the use of a deep learning framework (ConvNet) consisting of both 2D and 3D convolutional filters. Our hybrid 2D-3D architecture produced promising segmentation results. We derived the architectures from Lee et al., who used the ZNN framework initially designed for electron microscope image segmentation. We hope that by sharing our volumetric vasculature datasets, we will inspire other researchers to experiment with vasculature datasets and improve on the network architectures used.
Tasks Semantic Segmentation
Published 2016-06-08
URL http://arxiv.org/abs/1606.02382v1
PDF http://arxiv.org/pdf/1606.02382v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-convolutional-networks-for
Repo https://github.com/petteriTeikari/vesselNN
Framework tf
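
A minimal sketch of a hybrid 2D-3D ConvNet for volumetric vessel segmentation in the spirit of the abstract: kernels of shape (1, 3, 3) act as slice-wise 2D filters and (3, 3, 3) kernels add through-plane context before a per-voxel vessel/background prediction. Layer counts and channel widths are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Sketch only: hybrid 2D-3D convolutional segmentation network (assumed layers/channels).
hybrid_net = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)), nn.ReLU(),  # slice-wise "2D" filters
    nn.Conv3d(16, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
    nn.Conv3d(16, 32, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),         # true 3D context
    nn.Conv3d(32, 1, kernel_size=1),                                        # voxel-wise logits
)

stack = torch.randn(1, 1, 16, 128, 128)          # one two-photon stack (depth, height, width)
vessel_prob = torch.sigmoid(hybrid_net(stack))   # same spatial size as the input
```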

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

Title Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Authors Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Yu Qiao
Abstract Traditional feature encoding schemes (e.g., Fisher vectors) with local descriptors (e.g., SIFT) and recent convolutional neural networks (CNNs) are two classes of successful methods for image recognition. In this paper, we propose a hybrid representation which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes for image recognition, with a focus on scene recognition. To this end, we make three main contributions. First, we propose a patch-level, end-to-end architecture to model the appearance of local patches, called PatchNet. PatchNet is essentially a customized network trained in a weakly supervised manner, which uses image-level supervision to guide patch-level feature extraction. Second, we present a hybrid visual representation, called VSAD, which utilizes the robust feature representations of PatchNet to describe local patches and exploits the semantic probabilities of PatchNet to aggregate these local patches into a global representation. Third, based on the proposed VSAD representation, we propose a new state-of-the-art scene recognition approach, which achieves excellent performance on two standard benchmarks: MIT Indoor67 (86.2%) and SUN397 (73.0%).
Tasks Scene Recognition
Published 2016-09-01
URL http://arxiv.org/abs/1609.00153v2
PDF http://arxiv.org/pdf/1609.00153v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-patchnets-describing-and
Repo https://github.com/wangzheallen/vsad
Framework none
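
The sketch below shows a simplified version of the aggregation idea, not the full VSAD formulation: patch descriptors are soft-assigned to semantic categories using PatchNet's class probabilities, and a probability-weighted mean descriptor per category is concatenated into one global image representation. All dimensions and the use of random stand-in features are assumptions.

```python
import numpy as np

# Sketch only: probability-weighted aggregation of patch descriptors (simplified VSAD-style pooling).
rng = np.random.default_rng(0)
num_patches, feat_dim, num_classes = 100, 128, 10
descriptors = rng.normal(size=(num_patches, feat_dim))           # stand-in PatchNet patch features
probs = rng.dirichlet(np.ones(num_classes), size=num_patches)    # stand-in semantic probabilities

# probability-weighted average descriptor for every semantic category
weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-12)     # (patches, classes)
per_class = weights.T @ descriptors                               # (classes, feat_dim)

image_repr = per_class.reshape(-1)                                # concatenate per-class descriptors
image_repr /= np.linalg.norm(image_repr) + 1e-12                  # L2-normalize the global representation
```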

Automatic tagging using deep convolutional neural networks

Title Automatic tagging using deep convolutional neural networks
Authors Keunwoo Choi, George Fazekas, Mark Sandler
Abstract We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting only of 2D convolutional layers and subsampling layers. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluated the performance of architectures with varying numbers of layers on a larger dataset (the Million Song Dataset), and found that deeper models outperformed the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
Tasks
Published 2016-06-01
URL http://arxiv.org/abs/1606.00298v1
PDF http://arxiv.org/pdf/1606.00298v1.pdf
PWC https://paperswithcode.com/paper/automatic-tagging-using-deep-convolutional
Repo https://github.com/keunwoochoi/MSD_split_for_tagging
Framework none
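
A minimal sketch of a fully convolutional tagger over log-mel spectrograms, in the spirit of the 4-layer architecture mentioned in the abstract: stacked 2D convolutions and pooling collapse the time-frequency input to a clip-level embedding, and a sigmoid per tag yields multi-label predictions. The filter sizes, pooling shapes, and tag count are assumptions.

```python
import torch
import torch.nn as nn

num_tags = 50  # e.g. a top-50 tag vocabulary (assumption)
# Sketch only: small convolutional tagger over log-mel spectrograms.
tagger = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveMaxPool2d(1), nn.Flatten(),     # collapse time-frequency axes to a clip embedding
    nn.Linear(128, num_tags),
)

mel = torch.randn(4, 1, 96, 1366)              # (batch, 1, mel bins, time frames)
tag_probs = torch.sigmoid(tagger(mel))         # (4, num_tags): one probability per tag
```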

Deep Temporal Linear Encoding Networks

Title Deep Temporal Linear Encoding Networks
Authors Ali Diba, Vivek Sharma, Luc Van Gool
Abstract The CNN-encoding of features from entire videos for the representation of human actions has rarely been addressed. Instead, CNN work has focused on approaches to fuse spatial and temporal networks, but these were typically limited to processing shorter sequences. We present a new video representation, called temporal linear encoding (TLE), embedded inside CNNs as a new layer, which captures the appearance and motion throughout entire videos. It encodes this aggregated information into a robust video feature representation, via end-to-end learning. Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information. We conduct experiments on two challenging human action datasets: HMDB51 and UCF101. The experiments show that TLE outperforms current state-of-the-art methods on both datasets.
Tasks Representation Learning, Video Classification
Published 2016-11-21
URL http://arxiv.org/abs/1611.06678v1
PDF http://arxiv.org/pdf/1611.06678v1.pdf
PWC https://paperswithcode.com/paper/deep-temporal-linear-encoding-networks
Repo https://github.com/bryanyzhu/two-stream-pytorch
Framework pytorch
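
A simplified sketch of the temporal linear encoding idea, not the authors' layer: feature maps from several video segments are aggregated with an element-wise product, and the aggregate is encoded with bilinear (outer-product) pooling followed by signed square root and L2 normalization. The segment count, feature sizes, and the use of full rather than compact bilinear encoding are assumptions.

```python
import torch

# Sketch only: element-wise temporal aggregation followed by bilinear encoding.
segments = torch.randn(3, 256, 7, 7)                          # stand-in features from 3 video segments
aggregated = segments.prod(dim=0)                             # element-wise aggregation across segments

x = aggregated.reshape(256, -1)                               # (channels, spatial locations)
bilinear = (x @ x.t()).reshape(-1) / x.shape[1]               # bilinear (outer-product) encoding
encoded = torch.sign(bilinear) * torch.sqrt(bilinear.abs())   # signed square root
video_descriptor = encoded / (encoded.norm() + 1e-12)         # L2-normalized, 256*256-dim descriptor
```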

Second-Order Stochastic Optimization for Machine Learning in Linear Time

Title Second-Order Stochastic Optimization for Machine Learning in Linear Time
Authors Naman Agarwal, Brian Bullins, Elad Hazan
Abstract First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient based methods, and in certain settings improve upon the overall running time over popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
Tasks Stochastic Optimization
Published 2016-02-12
URL http://arxiv.org/abs/1602.03943v5
PDF http://arxiv.org/pdf/1602.03943v5.pdf
PWC https://paperswithcode.com/paper/second-order-stochastic-optimization-for
Repo https://github.com/darkonhub/darkon
Framework tf
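
A NumPy sketch of the key primitive behind such linear-time second-order methods: estimating a Hessian-inverse-vector product from stochastic Hessian samples via the recursion v <- g + (I - H_t) v, which only ever needs Hessian-vector products. The toy quadratic problem, the scaling that keeps every sample Hessian's norm below one, and the depth/repeat counts are assumptions.

```python
import numpy as np

# Sketch only: stochastic estimation of H^{-1} g on a toy quadratic problem.
rng = np.random.default_rng(0)
d, n = 5, 200
A = rng.normal(size=(n, d))
c = 2.0 * (A ** 2).sum(axis=1).max()               # scale so every sample Hessian has norm <= 1/2
H_samples = np.einsum('ni,nj->nij', A, A) / c      # per-example Hessians a_i a_i^T / c
H = H_samples.mean(axis=0)
g = rng.normal(size=d)

def estimate_inverse_hvp(g, H_samples, depth=400, repeats=20):
    estimates = []
    for _ in range(repeats):
        v = g.copy()
        for _ in range(depth):
            Ht = H_samples[rng.integers(len(H_samples))]  # one stochastic Hessian sample per step
            v = g + v - Ht @ v                            # v <- g + (I - H_t) v
        estimates.append(v)
    return np.mean(estimates, axis=0)                     # average independent estimators

approx = estimate_inverse_hvp(g, H_samples)
exact = np.linalg.solve(H, g)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # small relative error (stochastic)
```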

Context-aware Sentiment Word Identification: sentiword2vec

Title Context-aware Sentiment Word Identification: sentiword2vec
Authors Yushi Yao, Guangjian Li
Abstract Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, emerging informal words and phrases in user-generated content call for context-aware analysis, as they often have special meanings in a particular context. Because of their strength in representing inter-word relations, we use sentiment word vectors to identify such words. Building on the distributed language model word2vec, in this paper we present a novel method for representing the sentiment of a word in a particular context; specifically, we identify words with abnormal sentiment polarity in long answers. Results show that the improved model performs better at representing words with special meanings, while still doing well on idiomatic patterns. Finally, we discuss what the vectors represent in the sentiment domain, which may differ from general object-based settings.
Tasks Language Modelling, Sentiment Analysis
Published 2016-12-12
URL http://arxiv.org/abs/1612.03769v1
PDF http://arxiv.org/pdf/1612.03769v1.pdf
PWC https://paperswithcode.com/paper/context-aware-sentiment-word-identification
Repo https://github.com/brooksyd2/data-science.question-audience-assignment
Framework none
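
One way to read the abstract's idea, sketched below with gensim (not the authors' code): train word2vec on the target corpus, score each word's corpus-specific polarity by its average similarity to positive versus negative seed words, and flag words whose corpus polarity contradicts a prior sentiment lexicon. The toy corpus, seed lists, and thresholding are assumptions.

```python
from gensim.models import Word2Vec

# Sketch only: detect words whose corpus-specific polarity disagrees with a sentiment lexicon.
corpus = [
    ["the", "battery", "life", "is", "terrible", "and", "awful"],
    ["sick", "beat", "amazing", "track", "love", "it"],
    ["this", "phone", "is", "great", "and", "amazing"],
    ["sick", "sound", "awesome", "love", "this"],
] * 50  # tiny toy corpus, repeated so the vocabulary gets some co-occurrence statistics

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=20, seed=1)

pos_seeds, neg_seeds = ["great", "amazing", "love"], ["terrible", "awful"]
dictionary_polarity = {"sick": -1.0}  # prior lexicon says "sick" is negative

def corpus_polarity(word):
    pos = sum(model.wv.similarity(word, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(model.wv.similarity(word, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

for word, prior in dictionary_polarity.items():
    score = corpus_polarity(word)
    if score * prior < 0:  # corpus usage contradicts the dictionary entry
        print(f"'{word}' looks context-dependent: corpus polarity {score:+.2f}, prior {prior:+.1f}")
```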

Structured Sequence Modeling with Graph Convolutional Recurrent Networks

Title Structured Sequence Modeling with Graph Convolutional Recurrent Networks
Authors Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson
Abstract This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep learning model able to predict structured sequences of data. Precisely, GCRN is a generalization of classical recurrent neural networks (RNN) to data structured by an arbitrary graph. Such structured sequences can represent series of frames in videos, spatio-temporal measurements on a network of sensors, or random walks on a vocabulary graph for natural language modeling. The proposed model combines convolutional neural networks (CNN) on graphs to identify spatial structures and RNN to find dynamic patterns. We study two possible architectures of GCRN, and apply the models to two practical problems: predicting moving MNIST data, and modeling natural language with the Penn Treebank dataset. Experiments show that exploiting simultaneously graph spatial and dynamic information about data can improve both precision and learning speed.
Tasks Language Modelling
Published 2016-12-22
URL http://arxiv.org/abs/1612.07659v1
PDF http://arxiv.org/pdf/1612.07659v1.pdf
PWC https://paperswithcode.com/paper/structured-sequence-modeling-with-graph
Repo https://github.com/dariush-salami/gcn-gesture-recognition
Framework pytorch
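
A simplified sketch of the GCRN idea, not the paper's exact cells: each time step's node features pass through one graph convolution with a symmetrically normalized adjacency matrix, and a GRU shared across nodes models the temporal dynamics of the graph-convolved sequence. The random graph, feature sizes, and single-layer graph convolution are assumptions.

```python
import torch
import torch.nn as nn

num_nodes, in_dim, hid_dim, T = 20, 8, 32, 12
adj = (torch.rand(num_nodes, num_nodes) < 0.2).float()
adj = ((adj + adj.t() + torch.eye(num_nodes)) > 0).float()       # symmetric, with self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]      # D^-1/2 (A + I) D^-1/2

class SimpleGCRN(nn.Module):
    """Sketch only: graph convolution for spatial mixing + GRU for temporal dynamics."""
    def __init__(self):
        super().__init__()
        self.gc_weight = nn.Linear(in_dim, hid_dim, bias=False)   # graph convolution weights
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)     # temporal model, shared across nodes
        self.readout = nn.Linear(hid_dim, 1)

    def forward(self, x, a_hat):                                  # x: (T, nodes, in_dim)
        gconv = torch.stack([a_hat @ self.gc_weight(x_t) for x_t in x])  # spatial mixing per step
        seq = gconv.permute(1, 0, 2)                              # (nodes, T, hid_dim)
        out, _ = self.gru(torch.relu(seq))
        return self.readout(out[:, -1])                           # per-node prediction

pred = SimpleGCRN()(torch.randn(T, num_nodes, in_dim), a_hat)     # (num_nodes, 1)
```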

Dual Deep Network for Visual Tracking

Title Dual Deep Network for Visual Tracking
Authors Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang
Abstract Visual tracking addresses the problem of identifying and localizing an unknown target in a video, given the target specified by a bounding box in the first frame. In this paper, we propose a dual network to better utilize features among layers for visual tracking. It is observed that features in higher layers encode semantic context while their counterparts in lower layers are sensitive to discriminative appearance. Thus we exploit the hierarchical features in different layers of a deep model and design a dual structure to obtain better feature representation from various streams, which is rarely investigated in previous work. To highlight geometric contours of the target, we integrate the hierarchical feature maps with an edge detector as coarse prior maps to further embed local details around the target. To leverage the robustness of our dual network, we train it with random patches measuring the similarities between the network activation and target appearance, which serves as a regularization to enforce the dual network to focus on the target object. The proposed dual network is updated online in a unique manner based on the observation that the target being tracked in consecutive frames should share more similar feature representations than those in the surrounding background. It is also found that, for a target object, the prior maps can help further enhance performance by passing messages into the output maps of the dual network. Therefore, independent component analysis with reference (ICA-R) is employed to extract target context using the prior maps as guidance. Online tracking is conducted by maximizing the posterior estimate on the final maps with stochastic and periodic updates. Quantitative and qualitative evaluations on two large-scale benchmark datasets show that the proposed algorithm performs favourably against the state of the art.
Tasks Visual Tracking
Published 2016-12-19
URL http://arxiv.org/abs/1612.06053v1
PDF http://arxiv.org/pdf/1612.06053v1.pdf
PWC https://paperswithcode.com/paper/dual-deep-network-for-visual-tracking
Repo https://github.com/chizhizhen/DNT
Framework none
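
A rough sketch of the dual-stream intuition from the abstract, omitting the training scheme, prior maps, ICA-R step, and online update: one stream uses a lower, appearance-sensitive VGG layer and the other a higher, semantic layer, and each stream's coarse prediction is fused into a single target likelihood map over the search region. The layer choices and head sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class DualStreamTracker(nn.Module):
    """Sketch only: fuse an appearance-sensitive and a semantic feature stream into one target map."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.low = vgg[:17]        # through pool3: appearance-sensitive features (256 channels)
        self.high = vgg[17:24]     # through pool4: semantic features (512 channels)
        self.head_low = nn.Conv2d(256, 1, 3, padding=1)
        self.head_high = nn.Conv2d(512, 1, 3, padding=1)

    def forward(self, search_region):
        f_low = self.low(search_region)
        f_high = self.high(f_low)
        m_low = self.head_low(f_low)
        m_high = F.interpolate(self.head_high(f_high), size=f_low.shape[-2:],
                               mode='bilinear', align_corners=False)
        return torch.sigmoid(m_low + m_high)   # fused target likelihood map

heatmap = DualStreamTracker()(torch.randn(1, 3, 224, 224))  # (1, 1, 28, 28)
```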

Supervised Learning with Quantum-Inspired Tensor Networks

Title Supervised Learning with Quantum-Inspired Tensor Networks
Authors E. Miles Stoudenmire, David J. Schwab
Abstract Tensor networks are efficient representations of high-dimensional tensors which have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing such networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize models for classifying images. For the MNIST data set we obtain less than 1% test set classification error. We discuss how the tensor network form imparts additional structure to the learned model and suggest a possible generative interpretation.
Tasks Tensor Networks
Published 2016-05-18
URL http://arxiv.org/abs/1605.05775v2
PDF http://arxiv.org/pdf/1605.05775v2.pdf
PWC https://paperswithcode.com/paper/supervised-learning-with-quantum-inspired
Repo https://github.com/cylo/uni10-lamps
Framework none
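
A minimal NumPy sketch of how a matrix product state (tensor train) can classify an image, using random, untrained tensors: each pixel is mapped to a two-dimensional local feature vector, and the MPS is contracted against this product state, with an extra tensor in the middle of the chain carrying the label index. The pixel count, bond dimension, and the sin/cos feature map are assumptions.

```python
import numpy as np

# Sketch only: contracting an MPS classifier against a product-state feature map of an image.
rng = np.random.default_rng(0)
n_pixels, bond_dim, n_labels = 16, 10, 10
pixels = rng.random(n_pixels)                                  # grayscale values in [0, 1]
phi = np.stack([np.cos(np.pi * pixels / 2),
                np.sin(np.pi * pixels / 2)], axis=1)           # local feature map, shape (n_pixels, 2)

# MPS site tensors: (left bond, physical=2, right bond); a label tensor sits mid-chain
mps = [rng.normal(size=(1 if i == 0 else bond_dim, 2,
                        1 if i == n_pixels - 1 else bond_dim)) * 0.5
       for i in range(n_pixels)]
label_tensor = rng.normal(size=(bond_dim, n_labels, bond_dim)) * 0.5

left = np.ones((1,))
for i in range(n_pixels):
    if i == n_pixels // 2:
        left = np.einsum('a,alb->lb', left, label_tensor)      # pick up the label index mid-chain
    site = np.einsum('apb,p->ab', mps[i], phi[i])              # contract the physical index
    left = left @ site if left.ndim == 1 else np.einsum('la,ab->lb', left, site)

scores = left.reshape(n_labels)                                # one score per class
print(scores.argmax())
```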