May 7, 2019

2842 words 14 mins read

Paper Group AWR 43

A Deep Multi-Level Network for Saliency Prediction. Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation. ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation. Event-based, 6-DOF Camera Tracking from Photometric Depth Maps. VConv-DAE: Dee …

A Deep Multi-Level Network for Saliency Prediction

Title A Deep Multi-Level Network for Saliency Prediction
Authors Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara
Abstract This paper presents a novel deep architecture for saliency prediction. Current state-of-the-art models for saliency prediction employ fully convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps. We propose an architecture which, instead, combines features extracted at different levels of a Convolutional Neural Network (CNN). Our model is composed of three main blocks: a feature extraction CNN, a feature encoding network that weights low- and high-level feature maps, and a prior learning network. We compare our solution with state-of-the-art saliency models on two public benchmark datasets. Results show that our model outperforms competing approaches under all evaluation metrics on the SALICON dataset, which is currently the largest public dataset for saliency prediction, and achieves competitive results on the MIT300 benchmark.
Tasks Saliency Prediction
Published 2016-09-05
URL http://arxiv.org/abs/1609.01064v2
PDF http://arxiv.org/pdf/1609.01064v2.pdf
PWC https://paperswithcode.com/paper/a-deep-multi-level-network-for-saliency
Repo https://github.com/marcellacornia/mlnet
Framework pytorch
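
As a rough illustration of the idea in the abstract, the sketch below combines feature maps taken at three depths of a VGG-16 backbone, encodes them jointly, and adds a learned prior before producing the saliency map. It is not the authors' released model; the layer choices, channel sizes, and prior resolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class MultiLevelSaliency(nn.Module):
    """Sketch only: multi-level feature combination for saliency (assumed layers/sizes)."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.stage1 = vgg[:17]    # through pool3 -> 256 channels at 1/8 resolution
        self.stage2 = vgg[17:24]  # through pool4 -> 512 channels at 1/16 resolution
        self.stage3 = vgg[24:31]  # through pool5 -> 512 channels at 1/32 resolution
        self.encode = nn.Conv2d(256 + 512 + 512, 64, kernel_size=3, padding=1)
        self.predict = nn.Conv2d(64, 1, kernel_size=1)
        self.prior = nn.Parameter(torch.zeros(1, 1, 6, 8))  # learned, low-resolution prior

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        size = f1.shape[-2:]
        feats = torch.cat([f1,
                           F.interpolate(f2, size=size, mode='bilinear', align_corners=False),
                           F.interpolate(f3, size=size, mode='bilinear', align_corners=False)],
                          dim=1)                              # fuse low- and high-level maps
        sal = self.predict(F.relu(self.encode(feats)))
        prior = F.interpolate(self.prior, size=size, mode='bilinear', align_corners=False)
        return torch.sigmoid(sal + prior)

saliency = MultiLevelSaliency()(torch.randn(1, 3, 240, 320))  # (1, 1, 30, 40)
```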

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

Title Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation
Authors Golnaz Ghiasi, Charless C. Fowlkes
Abstract CNN architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization information. (2) We describe a multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps. This approach yields state-of-the-art semantic segmentation results on the PASCAL VOC and Cityscapes segmentation benchmarks without resorting to more complex random-field inference or instance detection driven architectures.
Tasks Semantic Segmentation
Published 2016-05-08
URL http://arxiv.org/abs/1605.02264v2
PDF http://arxiv.org/pdf/1605.02264v2.pdf
PWC https://paperswithcode.com/paper/laplacian-pyramid-reconstruction-and
Repo https://github.com/golnazghiasi/LRR
Framework none
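
The sketch below illustrates one refinement step of the Laplacian-pyramid reconstruction idea: a coarse class-score map is upsampled, and a boundary correction predicted from a higher-resolution skip feature map is added after multiplicative gating. It is a simplification, not the released LRR code; the gating form and channel sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementStep(nn.Module):
    """Sketch only: one Laplacian-pyramid refinement step (assumed gating form)."""
    def __init__(self, skip_channels, num_classes):
        super().__init__()
        self.boundary = nn.Conv2d(skip_channels, num_classes, 3, padding=1)
        self.gate = nn.Conv2d(num_classes, num_classes, 3, padding=1)

    def forward(self, coarse_scores, skip_feats):
        up = F.interpolate(coarse_scores, size=skip_feats.shape[-2:],
                           mode='bilinear', align_corners=False)  # upsample coarse prediction
        correction = self.boundary(skip_feats)   # high-frequency detail from the skip features
        gate = torch.sigmoid(self.gate(up))      # multiplicative gate from the coarse prediction
        return up + gate * correction            # refined, higher-resolution class scores

step = RefinementStep(skip_channels=256, num_classes=21)
refined = step(torch.randn(1, 21, 16, 16), torch.randn(1, 256, 32, 32))  # (1, 21, 32, 32)
```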

ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation

Title ILGNet: Inception Modules with Connected Local and Global Features for Efficient Image Aesthetic Quality Classification using Domain Adaptation
Authors Xin Jin, Le Wu, Xiaodong Li, Xiaokun Zhang, Jingying Chi, Siwei Peng, Shiming Ge, Geng Zhao, Shuying Li
Abstract In this paper, we address the challenging problem of aesthetic image classification, which is to label an input image as being of high or low aesthetic quality. We take both the local and the global features of images into consideration. A novel deep convolutional neural network named ILGNet is proposed, which combines Inception modules with a connected layer of both local and global features. ILGNet is based on GoogLeNet, so it is easy to start from a GoogLeNet pre-trained for large-scale image classification and fine-tune our connected layers on a large-scale database of aesthetics-related images, AVA, i.e., domain adaptation. The experiments reveal that our model achieves state-of-the-art results on the AVA database. Both the training and testing speeds of our model are higher than those of the original GoogLeNet.
Tasks Domain Adaptation, Image Classification, Image Quality Estimation
Published 2016-10-07
URL http://arxiv.org/abs/1610.02256v3
PDF http://arxiv.org/pdf/1610.02256v3.pdf
PWC https://paperswithcode.com/paper/ilgnet-inception-modules-with-connected-local
Repo https://github.com/BestiVictory/ILGnet
Framework caffe2
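
A rough PyTorch sketch (not the released model) of the connected local/global idea: globally pooled features from an early Inception stage and from the final stage of a GoogLeNet-style backbone are concatenated into one fully connected layer for the binary high/low aesthetic decision. The stage split and classifier size are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class AestheticNet(nn.Module):
    """Sketch only: connected local + global features on a GoogLeNet-style backbone."""
    def __init__(self):
        super().__init__()
        g = torchvision.models.googlenet(weights=None, aux_logits=False, init_weights=True)
        self.local_stage = nn.Sequential(g.conv1, g.maxpool1, g.conv2, g.conv3,
                                         g.maxpool2, g.inception3a, g.inception3b)   # "local"
        self.global_stage = nn.Sequential(g.maxpool3, g.inception4a, g.inception4b,
                                          g.inception4c, g.inception4d, g.inception4e,
                                          g.maxpool4, g.inception5a, g.inception5b)  # "global"
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(480 + 1024, 2)  # concatenated local + global features

    def forward(self, x):
        local_feats = self.local_stage(x)
        global_feats = self.global_stage(local_feats)
        joint = torch.cat([self.pool(local_feats).flatten(1),
                           self.pool(global_feats).flatten(1)], dim=1)
        return self.classifier(joint)              # high / low aesthetic quality logits

logits = AestheticNet()(torch.randn(1, 3, 224, 224))  # (1, 2)
```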

Event-based, 6-DOF Camera Tracking from Photometric Depth Maps

Title Event-based, 6-DOF Camera Tracking from Photometric Depth Maps
Authors Guillermo Gallego, Jon E. A. Lund, Elias Mueggler, Henri Rebecq, Tobi Delbruck, Davide Scaramuzza
Abstract Event cameras are bio-inspired vision sensors that output pixel-level brightness changes instead of standard intensity frames. These cameras do not suffer from motion blur and have a very high dynamic range, which enables them to provide reliable visual information during high-speed motions or in scenes characterized by high dynamic range. These features, along with a very low power consumption, make event cameras an ideal complement to standard cameras for VR/AR and video game applications. With these applications in mind, this paper tackles the problem of accurate, low-latency tracking of an event camera from an existing photometric depth map (i.e., intensity plus depth information) built via classic dense reconstruction pipelines. Our approach tracks the 6-DOF pose of the event camera upon the arrival of each event, thus virtually eliminating latency. We successfully evaluate the method in both indoor and outdoor scenes and show that, because of the technological advantages of the event camera, our pipeline works in scenes characterized by high-speed motion, which are still inaccessible to standard cameras.
Tasks
Published 2016-07-12
URL http://arxiv.org/abs/1607.03468v2
PDF http://arxiv.org/pdf/1607.03468v2.pdf
PWC https://paperswithcode.com/paper/event-based-6-dof-camera-tracking-from
Repo https://github.com/uzh-rpg/event-based_vision_resources
Framework none

VConv-DAE: Deep Volumetric Shape Learning Without Object Labels

Title VConv-DAE: Deep Volumetric Shape Learning Without Object Labels
Authors Abhishek Sharma, Oliver Grau, Mario Fritz
Abstract With the advent of affordable depth sensors, 3D capture is becoming more and more ubiquitous and has already made its way into commercial products. Yet, capturing the geometry or complete shapes of everyday objects using scanning devices (e.g. Kinect) still comes with several challenges that result in noise or even incomplete shapes. Recent success in deep learning has shown how to learn complex shape distributions in a data-driven way from large-scale 3D CAD model collections and to utilize them for 3D processing on volumetric representations, thereby circumventing problems of topology and tessellation. Prior work has shown encouraging results on problems ranging from shape completion to recognition. We provide an analysis of such approaches and discover that training, as well as the resulting representation, is strongly and unnecessarily tied to the notion of object labels. Thus, we propose a fully convolutional volumetric autoencoder that learns a volumetric representation from noisy data by estimating voxel occupancy grids. The proposed method outperforms prior work on challenging tasks like denoising and shape completion. We also show that the obtained deep embedding gives competitive performance when used for classification and promising results for shape interpolation.
Tasks Denoising
Published 2016-04-13
URL http://arxiv.org/abs/1604.03755v3
PDF http://arxiv.org/pdf/1604.03755v3.pdf
PWC https://paperswithcode.com/paper/vconv-dae-deep-volumetric-shape-learning
Repo https://github.com/diskhkme/VCONV_DAE_TF
Framework tf
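
A minimal sketch of a label-free volumetric denoising autoencoder in the spirit of the abstract: a noisy or partial voxel occupancy grid is encoded with strided 3D convolutions and decoded back to per-voxel occupancy probabilities. The grid size, layer counts, and channel widths are assumptions, and in practice the reconstruction target would be the clean grid.

```python
import torch
import torch.nn as nn

class VoxelDenoisingAE(nn.Module):
    """Sketch only: fully convolutional volumetric denoising autoencoder (assumed sizes)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 30^3 -> 15^3
            nn.Conv3d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # 15^3 -> 8^3
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),  # 8^3 -> 15^3
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),              # 15^3 -> 30^3
        )

    def forward(self, voxels):
        return torch.sigmoid(self.decoder(self.encoder(voxels)))  # per-voxel occupancy probability

model = VoxelDenoisingAE()
noisy = (torch.rand(8, 1, 30, 30, 30) > 0.5).float()              # batch of noisy 30^3 grids
recon = model(noisy)
loss = nn.functional.binary_cross_entropy(recon, noisy)           # in practice: target = clean grid
```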

Deep Learning with Differential Privacy

Title Deep Learning with Differential Privacy
Authors Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang
Abstract Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
Tasks
Published 2016-07-01
URL http://arxiv.org/abs/1607.00133v2
PDF http://arxiv.org/pdf/1607.00133v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-differential-privacy
Repo https://github.com/cyrusgeyer/DiffPrivate_FedLearning
Framework tf
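
The core mechanism described in the abstract is what is now known as DP-SGD: clip each per-example gradient to a fixed L2 norm, add Gaussian noise to the summed gradient, then take a step. The sketch below shows that loop on plain logistic regression in NumPy; the clip norm, noise multiplier, and learning rate are illustrative, and tracking the actual privacy budget (the moments accountant) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_multiplier, lr, batch_size = 1.0, 1.1, 0.1, 32  # assumed hyperparameters

for step in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    preds = 1.0 / (1.0 + np.exp(-xb @ w))
    per_example_grads = (preds - yb)[:, None] * xb               # one gradient per example
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]                  # clip each gradient to L2 <= clip_norm
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)         # Gaussian mechanism on the sum
    w -= lr * noisy_sum / batch_size                              # noisy averaged gradient step
```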

Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation

Title Deep Learning Convolutional Networks for Multiphoton Microscopy Vasculature Segmentation
Authors Petteri Teikari, Marc Santos, Charissa Poon, Kullervo Hynynen
Abstract Recently there has been an increasing trend to use deep learning frameworks for both 2D consumer images and 3D medical images. However, there has been little effort to use deep frameworks for volumetric vascular segmentation. We address this by providing a freely available dataset of 12 annotated two-photon vasculature microscopy stacks. We demonstrate the use of a deep learning framework (ConvNet) consisting of both 2D and 3D convolutional filters. Our hybrid 2D-3D architecture produced promising segmentation results. We derived the architectures from Lee et al., who used the ZNN framework initially designed for electron microscope image segmentation. We hope that by sharing our volumetric vasculature datasets, we will inspire other researchers to experiment with vasculature datasets and improve on the network architectures used.
Tasks Semantic Segmentation
Published 2016-06-08
URL http://arxiv.org/abs/1606.02382v1
PDF http://arxiv.org/pdf/1606.02382v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-convolutional-networks-for
Repo https://github.com/petteriTeikari/vesselNN
Framework tf
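
A minimal sketch of a hybrid 2D-3D ConvNet for volumetric vessel segmentation in the spirit of the abstract: kernels of shape (1, 3, 3) act as slice-wise 2D filters and (3, 3, 3) kernels add through-plane context before a per-voxel vessel/background prediction. Layer counts and channel widths are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Sketch only: hybrid 2D-3D convolutional segmentation network (assumed layers/channels).
hybrid_net = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)), nn.ReLU(),  # slice-wise "2D" filters
    nn.Conv3d(16, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
    nn.Conv3d(16, 32, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),         # true 3D context
    nn.Conv3d(32, 1, kernel_size=1),                                        # voxel-wise logits
)

stack = torch.randn(1, 1, 16, 128, 128)          # one two-photon stack (depth, height, width)
vessel_prob = torch.sigmoid(hybrid_net(stack))   # same spatial size as the input
```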

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

Title Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Authors Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, Yu Qiao
Abstract Traditional feature encoding schemes (e.g., Fisher vectors) with local descriptors (e.g., SIFT) and recent convolutional neural networks (CNNs) are two classes of successful methods for image recognition. In this paper, we propose a hybrid representation which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes for image recognition, with a focus on scene recognition. To this end, we make three main contributions. First, we propose a patch-level, end-to-end architecture to model the appearance of local patches, called PatchNet. PatchNet is essentially a customized network trained in a weakly supervised manner, which uses image-level supervision to guide patch-level feature extraction. Second, we present a hybrid visual representation, called VSAD, which utilizes the robust feature representations of PatchNet to describe local patches and exploits the semantic probabilities of PatchNet to aggregate these local patches into a global representation. Third, based on the proposed VSAD representation, we propose a new state-of-the-art scene recognition approach, which achieves excellent performance on two standard benchmarks: MIT Indoor67 (86.2%) and SUN397 (73.0%).
Tasks Scene Recognition
Published 2016-09-01
URL http://arxiv.org/abs/1609.00153v2
PDF http://arxiv.org/pdf/1609.00153v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-patchnets-describing-and
Repo https://github.com/wangzheallen/vsad
Framework none
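
The sketch below shows a simplified version of the aggregation idea, not the full VSAD formulation: patch descriptors are soft-assigned to semantic categories using PatchNet's class probabilities, and a probability-weighted mean descriptor per category is concatenated into one global image representation. All dimensions and the use of random stand-in features are assumptions.

```python
import numpy as np

# Sketch only: probability-weighted aggregation of patch descriptors (simplified VSAD-style pooling).
rng = np.random.default_rng(0)
num_patches, feat_dim, num_classes = 100, 128, 10
descriptors = rng.normal(size=(num_patches, feat_dim))           # stand-in PatchNet patch features
probs = rng.dirichlet(np.ones(num_classes), size=num_patches)    # stand-in semantic probabilities

# probability-weighted average descriptor for every semantic category
weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-12)     # (patches, classes)
per_class = weights.T @ descriptors                               # (classes, feat_dim)

image_repr = per_class.reshape(-1)                                # concatenate per-class descriptors
image_repr /= np.linalg.norm(image_repr) + 1e-12                  # L2-normalize the global representation
```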

Automatic tagging using deep convolutional neural networks

Title Automatic tagging using deep convolutional neural networks
Authors Keunwoo Choi, George Fazekas, Mark Sandler
Abstract We present a content-based automatic music tagging algorithm using fully convolutional neural networks (FCNs). We evaluate different architectures consisting only of 2D convolutional layers and subsampling layers. In the experiments, we measure the AUC-ROC scores of architectures with different complexities and input types using the MagnaTagATune dataset, where a 4-layer architecture shows state-of-the-art performance with mel-spectrogram input. Furthermore, we evaluated the performance of architectures with varying numbers of layers on a larger dataset (the Million Song Dataset), and found that deeper models outperformed the 4-layer architecture. The experiments show that the mel-spectrogram is an effective time-frequency representation for automatic tagging and that more complex models benefit from more training data.
Tasks
Published 2016-06-01
URL http://arxiv.org/abs/1606.00298v1
PDF http://arxiv.org/pdf/1606.00298v1.pdf
PWC https://paperswithcode.com/paper/automatic-tagging-using-deep-convolutional
Repo https://github.com/keunwoochoi/MSD_split_for_tagging
Framework none
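
A minimal sketch of a fully convolutional tagger over log-mel spectrograms, in the spirit of the 4-layer architecture mentioned in the abstract: stacked 2D convolutions and pooling collapse the time-frequency input to a clip-level embedding, and a sigmoid per tag yields multi-label predictions. The filter sizes, pooling shapes, and tag count are assumptions.

```python
import torch
import torch.nn as nn

num_tags = 50  # e.g. a top-50 tag vocabulary (assumption)
# Sketch only: small convolutional tagger over log-mel spectrograms.
tagger = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 4)),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveMaxPool2d(1), nn.Flatten(),     # collapse time-frequency axes to a clip embedding
    nn.Linear(128, num_tags),
)

mel = torch.randn(4, 1, 96, 1366)              # (batch, 1, mel bins, time frames)
tag_probs = torch.sigmoid(tagger(mel))         # (4, num_tags): one probability per tag
```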

Deep Temporal Linear Encoding Networks

Title Deep Temporal Linear Encoding Networks
Authors Ali Diba, Vivek Sharma, Luc Van Gool
Abstract The CNN-encoding of features from entire videos for the representation of human actions has rarely been addressed. Instead, CNN work has focused on approaches to fuse spatial and temporal networks, but these were typically limited to processing shorter sequences. We present a new video representation, called temporal linear encoding (TLE), embedded inside CNNs as a new layer, which captures the appearance and motion throughout entire videos. It encodes this aggregated information into a robust video feature representation, via end-to-end learning. Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information. We conduct experiments on two challenging human action datasets: HMDB51 and UCF101. The experiments show that TLE outperforms current state-of-the-art methods on both datasets.
Tasks Representation Learning, Video Classification
Published 2016-11-21
URL http://arxiv.org/abs/1611.06678v1
PDF http://arxiv.org/pdf/1611.06678v1.pdf
PWC https://paperswithcode.com/paper/deep-temporal-linear-encoding-networks
Repo https://github.com/bryanyzhu/two-stream-pytorch
Framework pytorch
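
A simplified sketch of the temporal linear encoding idea, not the authors' layer: feature maps from several video segments are aggregated with an element-wise product, and the aggregate is encoded with bilinear (outer-product) pooling followed by signed square root and L2 normalization. The segment count, feature sizes, and the use of full rather than compact bilinear encoding are assumptions.

```python
import torch

# Sketch only: element-wise temporal aggregation followed by bilinear encoding.
segments = torch.randn(3, 256, 7, 7)                          # stand-in features from 3 video segments
aggregated = segments.prod(dim=0)                             # element-wise aggregation across segments

x = aggregated.reshape(256, -1)                               # (channels, spatial locations)
bilinear = (x @ x.t()).reshape(-1) / x.shape[1]               # bilinear (outer-product) encoding
encoded = torch.sign(bilinear) * torch.sqrt(bilinear.abs())   # signed square root
video_descriptor = encoded / (encoded.norm() + 1e-12)         # L2-normalized, 256*256-dim descriptor
```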

Second-Order Stochastic Optimization for Machine Learning in Linear Time

Title Second-Order Stochastic Optimization for Machine Learning in Linear Time
Authors Naman Agarwal, Brian Bullins, Elad Hazan
Abstract First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient based methods, and in certain settings improve upon the overall running time over popular first-order methods. Furthermore, our algorithm has the desirable property of being implementable in time linear in the sparsity of the input data.
Tasks Stochastic Optimization
Published 2016-02-12
URL http://arxiv.org/abs/1602.03943v5
PDF http://arxiv.org/pdf/1602.03943v5.pdf
PWC https://paperswithcode.com/paper/second-order-stochastic-optimization-for
Repo https://github.com/darkonhub/darkon
Framework tf
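
A NumPy sketch of the key primitive behind such linear-time second-order methods: estimating a Hessian-inverse-vector product from stochastic Hessian samples via the recursion v <- g + (I - H_t) v, which only ever needs Hessian-vector products. The toy quadratic problem, the scaling that keeps every sample Hessian's norm below one, and the depth/repeat counts are assumptions.

```python
import numpy as np

# Sketch only: stochastic estimation of H^{-1} g on a toy quadratic problem.
rng = np.random.default_rng(0)
d, n = 5, 200
A = rng.normal(size=(n, d))
c = 2.0 * (A ** 2).sum(axis=1).max()               # scale so every sample Hessian has norm <= 1/2
H_samples = np.einsum('ni,nj->nij', A, A) / c      # per-example Hessians a_i a_i^T / c
H = H_samples.mean(axis=0)
g = rng.normal(size=d)

def estimate_inverse_hvp(g, H_samples, depth=400, repeats=20):
    estimates = []
    for _ in range(repeats):
        v = g.copy()
        for _ in range(depth):
            Ht = H_samples[rng.integers(len(H_samples))]  # one stochastic Hessian sample per step
            v = g + v - Ht @ v                            # v <- g + (I - H_t) v
        estimates.append(v)
    return np.mean(estimates, axis=0)                     # average independent estimators

approx = estimate_inverse_hvp(g, H_samples)
exact = np.linalg.solve(H, g)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # small relative error (stochastic)
```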

Context-aware Sentiment Word Identification: sentiword2vec

Title Context-aware Sentiment Word Identification: sentiword2vec
Authors Yushi Yao, Guangjian Li
Abstract Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, emerging informal words and phrases in user-generated content call for context-aware analysis, as they often have special meanings in a particular context. Because of their strength in representing inter-word relations, we use sentiment word vectors to identify such words. Building on the distributed language model word2vec, in this paper we present a novel method for representing the sentiment of a word in a particular context; specifically, we identify words with abnormal sentiment polarity in long answers. Results show that the improved model performs better at representing words with special meanings, while still doing well on idiomatic patterns. Finally, we discuss what the vectors represent in the sentiment domain, which may differ from general object-based settings.
Tasks Language Modelling, Sentiment Analysis
Published 2016-12-12
URL http://arxiv.org/abs/1612.03769v1
PDF http://arxiv.org/pdf/1612.03769v1.pdf
PWC https://paperswithcode.com/paper/context-aware-sentiment-word-identification
Repo https://github.com/brooksyd2/data-science.question-audience-assignment
Framework none
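
One way to read the abstract's idea, sketched below with gensim (not the authors' code): train word2vec on the target corpus, score each word's corpus-specific polarity by its average similarity to positive versus negative seed words, and flag words whose corpus polarity contradicts a prior sentiment lexicon. The toy corpus, seed lists, and thresholding are assumptions.

```python
from gensim.models import Word2Vec

# Sketch only: detect words whose corpus-specific polarity disagrees with a sentiment lexicon.
corpus = [
    ["the", "battery", "life", "is", "terrible", "and", "awful"],
    ["sick", "beat", "amazing", "track", "love", "it"],
    ["this", "phone", "is", "great", "and", "amazing"],
    ["sick", "sound", "awesome", "love", "this"],
] * 50  # tiny toy corpus, repeated so the vocabulary gets some co-occurrence statistics

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=20, seed=1)

pos_seeds, neg_seeds = ["great", "amazing", "love"], ["terrible", "awful"]
dictionary_polarity = {"sick": -1.0}  # prior lexicon says "sick" is negative

def corpus_polarity(word):
    pos = sum(model.wv.similarity(word, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(model.wv.similarity(word, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

for word, prior in dictionary_polarity.items():
    score = corpus_polarity(word)
    if score * prior < 0:  # corpus usage contradicts the dictionary entry
        print(f"'{word}' looks context-dependent: corpus polarity {score:+.2f}, prior {prior:+.1f}")
```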

Structured Sequence Modeling with Graph Convolutional Recurrent Networks

Title Structured Sequence Modeling with Graph Convolutional Recurrent Networks
Authors Youngjoo Seo, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson
Abstract This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep learning model able to predict structured sequences of data. Precisely, GCRN is a generalization of classical recurrent neural networks (RNN) to data structured by an arbitrary graph. Such structured sequences can represent series of frames in videos, spatio-temporal measurements on a network of sensors, or random walks on a vocabulary graph for natural language modeling. The proposed model combines convolutional neural networks (CNN) on graphs to identify spatial structures and RNN to find dynamic patterns. We study two possible architectures of GCRN, and apply the models to two practical problems: predicting moving MNIST data, and modeling natural language with the Penn Treebank dataset. Experiments show that exploiting simultaneously graph spatial and dynamic information about data can improve both precision and learning speed.
Tasks Language Modelling
Published 2016-12-22
URL http://arxiv.org/abs/1612.07659v1
PDF http://arxiv.org/pdf/1612.07659v1.pdf
PWC https://paperswithcode.com/paper/structured-sequence-modeling-with-graph
Repo https://github.com/dariush-salami/gcn-gesture-recognition
Framework pytorch
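
A simplified sketch of the GCRN idea, not the paper's exact cells: each time step's node features pass through one graph convolution with a symmetrically normalized adjacency matrix, and a GRU shared across nodes models the temporal dynamics of the graph-convolved sequence. The random graph, feature sizes, and single-layer graph convolution are assumptions.

```python
import torch
import torch.nn as nn

num_nodes, in_dim, hid_dim, T = 20, 8, 32, 12
adj = (torch.rand(num_nodes, num_nodes) < 0.2).float()
adj = ((adj + adj.t() + torch.eye(num_nodes)) > 0).float()       # symmetric, with self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]      # D^-1/2 (A + I) D^-1/2

class SimpleGCRN(nn.Module):
    """Sketch only: graph convolution for spatial mixing + GRU for temporal dynamics."""
    def __init__(self):
        super().__init__()
        self.gc_weight = nn.Linear(in_dim, hid_dim, bias=False)   # graph convolution weights
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)     # temporal model, shared across nodes
        self.readout = nn.Linear(hid_dim, 1)

    def forward(self, x, a_hat):                                  # x: (T, nodes, in_dim)
        gconv = torch.stack([a_hat @ self.gc_weight(x_t) for x_t in x])  # spatial mixing per step
        seq = gconv.permute(1, 0, 2)                              # (nodes, T, hid_dim)
        out, _ = self.gru(torch.relu(seq))
        return self.readout(out[:, -1])                           # per-node prediction

pred = SimpleGCRN()(torch.randn(T, num_nodes, in_dim), a_hat)     # (num_nodes, 1)
```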

Dual Deep Network for Visual Tracking

Title Dual Deep Network for Visual Tracking
Authors Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang
Abstract Visual tracking addresses the problem of identifying and localizing an unknown target in a video, given the target specified by a bounding box in the first frame. In this paper, we propose a dual network to better utilize features among layers for visual tracking. It is observed that features in higher layers encode semantic context while their counterparts in lower layers are sensitive to discriminative appearance. Thus we exploit the hierarchical features in different layers of a deep model and design a dual structure to obtain better feature representation from various streams, which is rarely investigated in previous work. To highlight geometric contours of the target, we integrate the hierarchical feature maps with an edge detector as coarse prior maps to further embed local details around the target. To leverage the robustness of our dual network, we train it with random patches measuring the similarities between the network activation and target appearance, which serves as a regularization to enforce the dual network to focus on the target object. The proposed dual network is updated online in a unique manner based on the observation that the target being tracked in consecutive frames should share more similar feature representations than those in the surrounding background. It is also found that, for a target object, the prior maps can help further enhance performance by passing messages into the output maps of the dual network. Therefore, independent component analysis with reference (ICA-R) is employed to extract target context using the prior maps as guidance. Online tracking is conducted by maximizing the posterior estimate on the final maps with stochastic and periodic updates. Quantitative and qualitative evaluations on two large-scale benchmark datasets show that the proposed algorithm performs favourably against the state of the art.
Tasks Visual Tracking
Published 2016-12-19
URL http://arxiv.org/abs/1612.06053v1
PDF http://arxiv.org/pdf/1612.06053v1.pdf
PWC https://paperswithcode.com/paper/dual-deep-network-for-visual-tracking
Repo https://github.com/chizhizhen/DNT
Framework none
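
A rough sketch of the dual-stream intuition from the abstract, omitting the training scheme, prior maps, ICA-R step, and online update: one stream uses a lower, appearance-sensitive VGG layer and the other a higher, semantic layer, and each stream's coarse prediction is fused into a single target likelihood map over the search region. The layer choices and head sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class DualStreamTracker(nn.Module):
    """Sketch only: fuse an appearance-sensitive and a semantic feature stream into one target map."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.low = vgg[:17]        # through pool3: appearance-sensitive features (256 channels)
        self.high = vgg[17:24]     # through pool4: semantic features (512 channels)
        self.head_low = nn.Conv2d(256, 1, 3, padding=1)
        self.head_high = nn.Conv2d(512, 1, 3, padding=1)

    def forward(self, search_region):
        f_low = self.low(search_region)
        f_high = self.high(f_low)
        m_low = self.head_low(f_low)
        m_high = F.interpolate(self.head_high(f_high), size=f_low.shape[-2:],
                               mode='bilinear', align_corners=False)
        return torch.sigmoid(m_low + m_high)   # fused target likelihood map

heatmap = DualStreamTracker()(torch.randn(1, 3, 224, 224))  # (1, 1, 28, 28)
```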

Supervised Learning with Quantum-Inspired Tensor Networks

Title Supervised Learning with Quantum-Inspired Tensor Networks
Authors E. Miles Stoudenmire, David J. Schwab
Abstract Tensor networks are efficient representations of high-dimensional tensors which have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing such networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize models for classifying images. For the MNIST data set we obtain less than 1% test set classification error. We discuss how the tensor network form imparts additional structure to the learned model and suggest a possible generative interpretation.
Tasks Tensor Networks
Published 2016-05-18
URL http://arxiv.org/abs/1605.05775v2
PDF http://arxiv.org/pdf/1605.05775v2.pdf
PWC https://paperswithcode.com/paper/supervised-learning-with-quantum-inspired
Repo https://github.com/cylo/uni10-lamps
Framework none
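
A minimal NumPy sketch of how a matrix product state (tensor train) can classify an image, using random, untrained tensors: each pixel is mapped to a two-dimensional local feature vector, and the MPS is contracted against this product state, with an extra tensor in the middle of the chain carrying the label index. The pixel count, bond dimension, and the sin/cos feature map are assumptions.

```python
import numpy as np

# Sketch only: contracting an MPS classifier against a product-state feature map of an image.
rng = np.random.default_rng(0)
n_pixels, bond_dim, n_labels = 16, 10, 10
pixels = rng.random(n_pixels)                                  # grayscale values in [0, 1]
phi = np.stack([np.cos(np.pi * pixels / 2),
                np.sin(np.pi * pixels / 2)], axis=1)           # local feature map, shape (n_pixels, 2)

# MPS site tensors: (left bond, physical=2, right bond); a label tensor sits mid-chain
mps = [rng.normal(size=(1 if i == 0 else bond_dim, 2,
                        1 if i == n_pixels - 1 else bond_dim)) * 0.5
       for i in range(n_pixels)]
label_tensor = rng.normal(size=(bond_dim, n_labels, bond_dim)) * 0.5

left = np.ones((1,))
for i in range(n_pixels):
    if i == n_pixels // 2:
        left = np.einsum('a,alb->lb', left, label_tensor)      # pick up the label index mid-chain
    site = np.einsum('apb,p->ab', mps[i], phi[i])              # contract the physical index
    left = left @ site if left.ndim == 1 else np.einsum('la,ab->lb', left, site)

scores = left.reshape(n_labels)                                # one score per class
print(scores.argmax())
```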