January 26, 2020

Paper Group ANR 1550

Multimodal Semantic Attention Network for Video Captioning


Title	Multimodal Semantic Attention Network for Video Captioning
Authors	Liang Sun, Bing Li, Chunfeng Yuan, Zhengjun Zha, Weiming Hu
Abstract	Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), a new encoder-decoder framework that incorporates multimodal semantic attributes for video captioning. In the encoding phase, we detect and generate multimodal semantic attributes by formulating attribute detection as a multi-label classification problem. Moreover, we add an auxiliary classification loss to the model, which helps it obtain more effective visual features and high-level multimodal semantic attribute distributions for sufficient video encoding. In the decoding phase, we extend each weight matrix of the conventional LSTM to an ensemble of attribute-dependent weight matrices, and employ an attention mechanism to attend to different attributes at each time step of the captioning process. We evaluate our algorithm on two popular public benchmarks, MSVD and MSR-VTT, achieving results competitive with the current state of the art across six evaluation metrics.
Tasks	Multi-Label Classification, Video Captioning
Published	2019-05-08
URL	https://arxiv.org/abs/1905.02963v1
PDF	https://arxiv.org/pdf/1905.02963v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-semantic-attention-network-for
Repo
Framework
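
The decoding step described above, in which each LSTM weight matrix becomes an ensemble of attribute-dependent matrices mixed by attention, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the dot-product scoring against hypothetical attribute embeddings, and the use of the detected attribute probabilities as a log-prior on the attention scores are all choices made here for clarity.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attribute_weighted_matrix(W_bank, attr_probs, h_prev, attr_emb):
    """Mix K attribute-dependent weight matrices into one effective matrix.

    W_bank:     (K, d_out, d_in) bank of attribute-dependent matrices
    attr_probs: (K,) attribute probabilities from the encoder's classifier
    h_prev:     (d_h,) previous decoder hidden state
    attr_emb:   (K, d_h) hypothetical attribute embeddings used for scoring
    """
    # Score each attribute against the current decoder state; the detected
    # probability enters as a log-prior (an assumption of this sketch).
    scores = attr_emb @ h_prev + np.log(attr_probs + 1e-8)
    alpha = softmax(scores)                       # (K,) attention weights
    # Effective matrix: sum_k alpha_k * W_k, recomputed at every time step.
    W_eff = np.tensordot(alpha, W_bank, axes=1)   # (d_out, d_in)
    return W_eff, alpha

rng = np.random.default_rng(0)
K, d_in, d_out, d_h = 3, 4, 5, 6
W_bank = rng.normal(size=(K, d_out, d_in))
W_eff, alpha = attribute_weighted_matrix(
    W_bank, np.array([0.7, 0.2, 0.1]), rng.normal(size=d_h), rng.normal(size=(K, d_h)))
x_t = rng.normal(size=d_in)
gate_input = W_eff @ x_t  # replaces the single input-to-gate product of a plain LSTM
```

In a full decoder, a mixture of this kind would replace each input-to-gate and hidden-to-gate matrix, so the same word input is transformed differently depending on which attributes the attention currently favours.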

Stability and Generalization of Graph Convolutional Neural Networks


Title	Stability and Generalization of Graph Convolutional Neural Networks
Authors	Saurabh Verma, Zhi-Li Zhang
Abstract	Inspired by convolutional neural networks on 1D and 2D data, graph convolutional neural networks (GCNNs) have been developed for various learning tasks on graph data, and have shown superior performance on real-world datasets. Despite their success, there has been little theoretical exploration of GCNN models, for example of their generalization properties. In this paper, we take a first step towards developing a deeper theoretical understanding of GCNN models by analyzing the stability of single-layer GCNN models and deriving their generalization guarantees in a semi-supervised graph learning setting. In particular, we show that the algorithmic stability of a GCNN model depends upon the largest absolute eigenvalue of its graph convolution filter. Moreover, to ensure the uniform stability needed to provide strong generalization guarantees, the largest absolute eigenvalue must be independent of the graph size. Our results offer new insights into the design of new and improved graph convolution filters with guaranteed algorithmic stability. We evaluate the generalization gap and stability on various real-world graph datasets and show that the empirical results indeed support our theoretical findings. To the best of our knowledge, we are the first to study stability bounds on graph learning in a semi-supervised setting and derive generalization bounds for GCNN models.
Tasks
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01004v2
PDF	https://arxiv.org/pdf/1905.01004v2.pdf
PWC	https://paperswithcode.com/paper/stability-and-generalization-of-graph
Repo
Framework
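
The abstract's central claim, that stability depends on the largest absolute eigenvalue of the graph convolution filter and that this eigenvalue must not grow with graph size, can be checked numerically. The comparison below is an illustration chosen here, not taken from the paper: it contrasts the unnormalized filter A + I with the symmetric normalized filter D^{-1/2}(A + I)D^{-1/2}, a filter commonly used in GCNs, on complete graphs of increasing size.

```python
import numpy as np

def largest_abs_eigenvalue(S: np.ndarray) -> float:
    # S is symmetric, so eigvalsh returns real eigenvalues.
    return float(np.max(np.abs(np.linalg.eigvalsh(S))))

def unnormalized_filter(A: np.ndarray) -> np.ndarray:
    # g(L) = A + I: its spectrum grows with node degrees.
    return A + np.eye(A.shape[0])

def normalized_filter(A: np.ndarray) -> np.ndarray:
    # g(L) = D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def complete_graph(n: int) -> np.ndarray:
    # Adjacency matrix of the complete graph on n nodes (no self-loops).
    return np.ones((n, n)) - np.eye(n)

for n in (5, 50, 200):
    A = complete_graph(n)
    print(n, round(largest_abs_eigenvalue(unnormalized_filter(A)), 6),
          round(largest_abs_eigenvalue(normalized_filter(A)), 6))
```

On a complete graph with n nodes the unnormalized filter is the all-ones matrix, whose largest eigenvalue is n, whereas the normalized filter's largest eigenvalue stays at 1 for every n, which is the kind of size independence the abstract asks for.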

Automatic financial feature construction based on neural network


Title	Automatic financial feature construction based on neural network
Authors	Jie Fang, Jianwu Lin, Yong Jiang, Shutao Xia
Abstract	In the automatic financial feature construction task, the state-of-the-art technique represents features as reverse Polish expressions and then uses genetic programming (GP) to carry out their evolution. In this paper, we propose a new framework based on neural networks, the alpha discovery neural network (ADNN), and make several contributions. First, we make full use of neural networks' advantage in feature extraction to construct highly informative features. Second, we use domain knowledge to design the objective function, batch size, and sampling rules. Third, we use pre-training to replace GP's evolution process; according to the universal approximation theorem for neural networks, pre-training can conduct a more effective and explainable evolution process. Experiments show that ADNN produces markedly more diverse and more informative features than GP. In addition, ADNN can serve as a data augmentation algorithm that further improves the performance of financial features constructed by GP.
Tasks	Data Augmentation, Time Series
Published	2019-12-08
URL	https://arxiv.org/abs/1912.06236v2
PDF	https://arxiv.org/pdf/1912.06236v2.pdf
PWC	https://paperswithcode.com/paper/prior-knowledge-neural-network-for-automatic
Repo
Framework
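
The GP baseline mentioned at the start of the abstract encodes each candidate feature as a reverse Polish expression. A minimal evaluator makes that representation concrete; the operator set, the protected division, and the column names below are illustrative assumptions rather than details from the paper.

```python
from typing import Dict, List

Series = List[float]

def _binary(op, a: Series, b: Series) -> Series:
    # Apply a scalar binary operator element-wise to two aligned series.
    return [op(x, y) for x, y in zip(a, b)]

def _safe_div(x: float, y: float) -> float:
    # Protected division, a common convention in genetic programming.
    return x / y if y != 0 else 0.0

OPERATORS = {
    "+": lambda a, b: _binary(lambda x, y: x + y, a, b),
    "-": lambda a, b: _binary(lambda x, y: x - y, a, b),
    "*": lambda a, b: _binary(lambda x, y: x * y, a, b),
    "/": lambda a, b: _binary(_safe_div, a, b),
}

def eval_rpn(tokens: List[str], data: Dict[str, Series]) -> Series:
    """Evaluate a feature written in reverse Polish notation.

    Each token is either a column name in `data` or a binary operator.
    """
    stack: List[Series] = []
    for tok in tokens:
        if tok in OPERATORS:
            b = stack.pop()  # right operand is on top of the stack
            a = stack.pop()
            stack.append(OPERATORS[tok](a, b))
        else:
            stack.append(data[tok])
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

bars = {
    "close": [10.0, 11.0, 12.0],
    "open":  [9.0, 11.0, 13.0],
    "high":  [11.0, 12.0, 14.0],
    "low":   [8.0, 10.0, 11.0],
}
# (close - open) / (high - low), written in postfix order.
feature = eval_rpn(["close", "open", "-", "high", "low", "-", "/"], bars)
```

Because an expression is just a token list, GP mutation and crossover reduce to list edits, which is part of why this representation is convenient for the evolutionary baseline.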

Recognizing License Plates in Real-Time


Title	Recognizing License Plates in Real-Time
Authors	Xuewen Yang, Xin Wang
Abstract	License plate detection and recognition (LPDR) is of growing importance for enabling intelligent transportation and ensuring the security and safety of cities. However, LPDR faces big challenges in practical environments. License plates can have extremely diverse sizes, fonts and colors, and plate images are usually of poor quality because of skewed capture angles, uneven lighting, occlusion, and blurring. Applications such as surveillance often require fast processing. To enable real-time and accurate license plate recognition, in this work we propose a set of techniques: 1) a contour reconstruction method along with edge detection to quickly detect candidate plates; 2) a simple zero-one-alternation scheme to effectively remove the fake top and bottom borders around plates, facilitating more accurate segmentation of the characters on plates; 3) a set of techniques to augment the training data, incorporate SIFT features into the CNN, and exploit transfer learning to obtain initial parameters for more effective training; and 4) a low-cost, two-phase verification procedure to determine the correct plate, which uses statistical filtering in the plate detection stage to quickly remove unwanted candidates and then reuses the accurate character recognition (CR) results to verify plates further without additional processing. We implement a complete LPDR system based on our algorithms. The experimental results demonstrate that our system can accurately recognize license plates in real time. Additionally, it works robustly under various levels of illumination and noise, and in the presence of car movement. Compared to peer schemes, our system is not only among the most accurate but also the fastest, and it can be easily applied to other scenarios.
Tasks	Edge Detection, License Plate Recognition, Transfer Learning
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04376v1
PDF	https://arxiv.org/pdf/1906.04376v1.pdf
PWC	https://paperswithcode.com/paper/recognizing-license-plates-in-real-time
Repo
Framework
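
The abstract does not describe how the zero-one-alternation scheme works. One plausible reading, assumed here and not confirmed by the paper, is that rows crossing the characters of a binarized plate alternate between 0 and 1 many times, whereas fake top and bottom borders (solid frame lines or plain background) alternate rarely. The sketch trims such rows from the top and bottom; `min_transitions` is a hypothetical threshold.

```python
from typing import List

def count_alternations(row: List[int]) -> int:
    # Number of 0->1 or 1->0 changes along a binarized row.
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def trim_fake_borders(plate: List[List[int]], min_transitions: int = 4) -> List[List[int]]:
    """Drop top and bottom rows that lack the alternation typical of characters.

    Assumption (not from the paper): rows through characters cross several
    strokes, so they contain at least `min_transitions` alternations.
    """
    counts = [count_alternations(r) for r in plate]
    top, bottom = 0, len(plate) - 1
    while top <= bottom and counts[top] < min_transitions:
        top += 1
    while bottom >= top and counts[bottom] < min_transitions:
        bottom -= 1
    return plate[top:bottom + 1]

plate = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # solid frame line (fake border)
    [0, 1, 0, 1, 1, 0, 1, 0, 0],  # row through character strokes
    [0, 1, 0, 0, 1, 0, 1, 1, 0],  # row through character strokes
    [0, 0, 0, 0, 0, 0, 0, 0, 0],  # empty background
]
print(len(trim_fake_borders(plate)))  # 2
```

Counting transitions is linear in the number of pixels, which fits the abstract's emphasis on real-time processing.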

Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation


Title	Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation
Authors	Justin Domke, Daniel Sheldon
Abstract	Recent work in variational inference (VI) uses ideas from Monte Carlo estimation to tighten the lower bounds on the log-likelihood that are used as objectives. However, there is no systematic understanding of how optimizing different objectives relates to approximating the posterior distribution. Developing such a connection is important if the ideas are to be applied to inference, i.e., applications that require an approximate posterior and not just an approximation of the log-likelihood. Given a VI objective defined by a Monte Carlo estimator of the likelihood, we use a “divide and couple” procedure to identify augmented proposal and target distributions. The divergence between these is equal to the gap between the VI objective and the log-likelihood. Thus, after maximizing the VI objective, the augmented variational distribution may be used to approximate the posterior distribution.
Tasks
Published	2019-06-24
URL	https://arxiv.org/abs/1906.10115v3
PDF	https://arxiv.org/pdf/1906.10115v3.pdf
PWC	https://paperswithcode.com/paper/divide-and-couple-using-monte-carlo
Repo
Framework
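
The idea of a VI objective defined by a Monte Carlo estimator of the likelihood can be made concrete with the familiar importance-weighted objective E[log((1/K) * sum_k p(x, z_k) / q(z_k))], with z_k drawn from q, which is a lower bound on log p(x). The toy model below (z ~ N(0, 1), x | z ~ N(z, 1), and q equal to the prior) is an assumption chosen because log p(x) is available in closed form. It shows the gap to log p(x), which the abstract identifies with a divergence between augmented distributions, shrinking as K grows; it does not implement the divide-and-couple construction itself.

```python
import math
import random

def log_normal(x: float, mean: float, var: float) -> float:
    # Log density of N(mean, var) evaluated at x.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_mean_exp(values):
    # Numerically stable log((1/n) * sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def mc_objective(x: float, K: int, n_outer: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[log (1/K) sum_k p(x, z_k) / q(z_k)], z_k ~ q."""
    total = 0.0
    for _ in range(n_outer):
        log_w = []
        for _ in range(K):
            z = rng.gauss(0.0, 1.0)  # z ~ q = N(0, 1), the prior in this toy model
            log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
            log_q = log_normal(z, 0.0, 1.0)
            log_w.append(log_joint - log_q)
        total += log_mean_exp(log_w)
    return total / n_outer

x = 1.5
log_px = log_normal(x, 0.0, 2.0)  # exact: x ~ N(0, 2) once z is integrated out
rng = random.Random(0)
results = {K: mc_objective(x, K, 2000, rng) for K in (1, 10, 100)}
for K, value in results.items():
    print(K, round(value, 3), round(log_px, 3))
```

With K = 1 the objective is the standard ELBO; larger K tightens the bound toward log p(x), and the remaining gap is what the divide-and-couple view reinterprets as a divergence.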

Effective transfer learning for hyperspectral image classification with deep convolutional neural networks


Title	Effective transfer learning for hyperspectral image classification with deep convolutional neural networks
Authors	Wojciech Masarczyk, Przemysław Głomb, Bartosz Grabowski, Mateusz Ostaszewski
Abstract	Hyperspectral imaging is a rich source of data, allowing for a multitude of effective applications. On the other hand, such imaging remains challenging because of high data dimensionality and, typically, a small pool of available training examples. While deep learning approaches have been shown to be successful in providing effective classification solutions, especially for high-dimensional problems, they unfortunately work best when many labelled examples are available. To alleviate the need for many labels on a particular dataset, a transfer learning approach can be used: first the network is pre-trained on some dataset with a large number of training labels available, then the actual dataset is used to fine-tune the network. This strategy is not straightforward to apply to hyperspectral images, as often only one particular image of some type or characteristic is available. In this paper, we propose and investigate a simple and effective transfer learning strategy that uses an unsupervised pre-training step without label information. This approach can be applied to many hyperspectral classification problems. Experiments show that it is very effective in improving classification accuracy without being restricted to a particular image type or neural network architecture. An additional advantage of the proposed approach is the unsupervised nature of the pre-training step, which can be done immediately after image acquisition, without requiring potentially costly expert time.
Tasks	Hyperspectral Image Classification, Image Classification, Transfer Learning
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05507v1
PDF	https://arxiv.org/pdf/1909.05507v1.pdf
PWC	https://paperswithcode.com/paper/effective-transfer-learning-for-hyperspectral
Repo
Framework

Budget-Aware Adapters for Multi-Domain Learning


Title	Budget-Aware Adapters for Multi-Domain Learning
Authors	Rodrigo Berriel, Stéphane Lathuilière, Moin Nabi, Tassilo Klein, Thiago Oliveira-Santos, Nicu Sebe, Elisa Ricci
Abstract	Multi-Domain Learning (MDL) refers to the problem of learning a set of models derived from a common deep architecture, each one specialized to perform a task in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL with a particular interest in obtaining domain-specific models with an adjustable budget in terms of the number of network parameters and computational complexity. Our intuition is that, since in real applications the number of domains and tasks can be very large, an effective MDL approach should focus not only on accuracy but also on having as few parameters as possible. To implement this idea, we derive specialized deep models for each domain by adapting a pre-trained architecture but, unlike other methods, we propose a novel strategy to automatically adjust the computational complexity of the network. To this end, we introduce Budget-Aware Adapters that select the most relevant feature channels to better handle data from a novel domain. Constraints on the number of active switches are imposed in order to obtain a network that respects the desired complexity budget. Experimentally, we show that our approach achieves recognition accuracy competitive with state-of-the-art approaches but with much lighter networks, both in terms of storage and computation.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06242v2
PDF	https://arxiv.org/pdf/1905.06242v2.pdf
PWC	https://paperswithcode.com/paper/budget-aware-adapters-for-multi-domain
Repo
Framework
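
Budget-Aware Adapters switch feature channels on or off, with a constraint on the number of active switches. The sketch below shows only the budget arithmetic at inference time and rests on assumptions not stated in the abstract: each channel has a learned switch score, the budget is expressed as a fraction of channels (a simple proxy for parameters and computation), and the highest-scoring channels are kept. In the paper the switches are learned during training under the constraint; `budget_mask` and `apply_mask` are hypothetical names.

```python
import math
from typing import List

def budget_mask(switch_scores: List[float], budget: float) -> List[int]:
    """Binary channel mask keeping at most a `budget` fraction of channels.

    Assumption for illustration: channels are ranked by a learned switch
    score and the highest-scoring ones stay active (at least one channel).
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n_active = max(1, math.floor(budget * len(switch_scores)))
    ranked = sorted(range(len(switch_scores)), key=lambda i: switch_scores[i], reverse=True)
    active = set(ranked[:n_active])
    return [1 if i in active else 0 for i in range(len(switch_scores))]

def apply_mask(channels: List[List[float]], mask: List[int]) -> List[List[float]]:
    # Zero out feature maps of inactive channels (at deployment they can be skipped).
    return [c if m else [0.0] * len(c) for c, m in zip(channels, mask)]

mask = budget_mask([0.9, 0.1, 0.4, 0.7, 0.05, 0.3], budget=0.5)
print(mask)  # [1, 0, 1, 1, 0, 0]
```

Rounding down with `math.floor` keeps the active count within the budget, except for the one-channel minimum that prevents an empty layer.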

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning


Title	Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Authors	Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian
Abstract	Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for video captioning. These methods mainly focus on tailoring sequence learning through RNNs for better caption generation, whereas off-the-shelf visual features are borrowed from CNNs. We argue that careful design of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich captions using Gated Recurrent Units (GRUs). Our method embeds rich temporal dynamics in visual features by hierarchically applying the Short Fourier Transform to CNN features of the whole video. It additionally derives high-level semantics from an object detector to enrich the representation with the spatial dynamics of the detected objects. The final representation is projected to a compact space and fed to a language model. By learning a relatively simple language model comprising two GRU layers, we establish a new state of the art on the MSVD and MSR-VTT datasets for the METEOR and ROUGE_L metrics.
Tasks	Language Modelling, Video Captioning
Published	2019-02-27
URL	http://arxiv.org/abs/1902.10322v2
PDF	http://arxiv.org/pdf/1902.10322v2.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-dynamics-and-semantic
Repo
Framework
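
The abstract states that temporal dynamics are embedded by hierarchically applying a Short Fourier Transform to the CNN features of the whole video. One way to read "hierarchically" is a temporal pyramid, which this NumPy sketch assumes: the frame-feature sequence is split into 1, 2, 4, ... segments, each segment is transformed along time, and a few low-frequency magnitudes are kept. The number of levels, the number of coefficients, and the use of magnitudes are assumptions, not the paper's settings.

```python
import numpy as np

def temporal_fourier_encoding(features: np.ndarray, levels: int = 3, n_coeffs: int = 4) -> np.ndarray:
    """Hierarchical temporal Fourier encoding of per-frame CNN features.

    features: (T, D) array, one D-dimensional feature vector per frame.
    Level l splits the time axis into 2**l segments; each segment is
    transformed with a real FFT along time, and the magnitudes of its first
    `n_coeffs` frequency bins are kept (short segments are zero-padded).
    """
    T = features.shape[0]
    if T < 2 ** (levels - 1):
        raise ValueError("need at least one frame per finest-level segment")
    parts = []
    for level in range(levels):
        for segment in np.array_split(features, 2 ** level, axis=0):
            n_fft = max(len(segment), 2 * n_coeffs)
            spectrum = np.abs(np.fft.rfft(segment, n=n_fft, axis=0))  # (n_fft // 2 + 1, D)
            parts.append(spectrum[:n_coeffs].reshape(-1))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
frame_features = rng.normal(size=(40, 512))  # e.g. 40 frames of 512-d CNN features
encoding = temporal_fourier_encoding(frame_features)
print(encoding.shape)  # (14336,)
```

With three levels there are 1 + 2 + 4 = 7 segments, so the encoding has 7 * n_coeffs * D entries regardless of video length, which is what makes a later projection to a compact space straightforward.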

A Convolutional Neural Network with Mapping Layers for Hyperspectral Image Classification


Title	A Convolutional Neural Network with Mapping Layers for Hyperspectral Image Classification
Authors	Rui Li, Zhibin Pan, Yang Wang, Ping Wang
Abstract	In this paper, we propose a convolutional neural network with mapping layers (MCNN) for hyperspectral image (HSI) classification. The proposed mapping layers map the input patch into a low-dimensional subspace by multilinear algebra. We use these mapping layers to reduce spectral and spatial redundancy while retaining most of the input's energy. The features extracted by the mapping layers also reduce the number of subsequent convolutional layers needed for feature extraction. The MCNN architecture avoids the phenomenon, seen in deep learning models for HSI classification, of accuracy declining as layers are added, and its effective mapping layers also save training time. Furthermore, we apply 3-D convolutional kernels in the convolutional layers to extract spectral-spatial features of the HSI. We tested MCNN on three datasets, Indian Pines, University of Pavia and Salinas, and achieved classification accuracies of 98.3%, 99.5% and 99.3%, respectively. Experimental results demonstrate that the proposed MCNN can significantly improve classification accuracy and substantially reduce running time.
Tasks	Hyperspectral Image Classification, Image Classification
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09526v1
PDF	https://arxiv.org/pdf/1908.09526v1.pdf
PWC	https://paperswithcode.com/paper/a-convolutional-neural-network-with-mapping
Repo
Framework
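
The abstract says the mapping layers project each input patch into a low-dimensional subspace using multilinear algebra. One standard multilinear tool for this is a set of mode-n products with per-mode projection matrices; this NumPy sketch assumes the matrices come from a truncated higher-order SVD of training patches, which may differ from how MCNN actually constructs them. Patch sizes and target ranks are illustrative.

```python
import numpy as np

def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    # Mode-n unfolding: axis `mode` becomes the rows, all other axes the columns.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    # Multiply `matrix` (r x I_mode) along axis `mode` of `tensor`.
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def fit_projections(patches: np.ndarray, ranks):
    """Leading left singular vectors per mode, pooled over training patches.

    patches: (N, H, W, B) stack of spatial-spectral patches.
    ranks:   target sizes (r_H, r_W, r_B) of the mapped patch.
    """
    projections = []
    for mode, r in enumerate(ranks):
        # Place each patch's mode unfolding side by side, then take the SVD.
        stacked = np.concatenate([unfold(p, mode) for p in patches], axis=1)
        U, _, _ = np.linalg.svd(stacked, full_matrices=False)
        projections.append(U[:, :r].T)  # (r, I_mode)
    return projections

def map_patch(patch: np.ndarray, projections) -> np.ndarray:
    # Apply one projection per mode; mode products on different axes commute.
    out = patch
    for mode, P in enumerate(projections):
        out = mode_product(out, P, mode)
    return out

rng = np.random.default_rng(0)
patches = rng.normal(size=(20, 7, 7, 30))  # 20 patches, 7 x 7 pixels, 30 bands
projections = fit_projections(patches, ranks=(5, 5, 10))
mapped = map_patch(patches[0], projections)
print(mapped.shape)  # (5, 5, 10)
```

Keeping the leading singular vectors of each unfolding retains the directions carrying the most energy along that mode, which matches the abstract's goal of reducing redundancy while keeping most of the input's energy.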

Band Attention Convolutional Networks For Hyperspectral Image Classification


Title	Band Attention Convolutional Networks For Hyperspectral Image Classification
Authors	Hongwei Dong, Lamei Zhang, Bin Zou
Abstract	Redundancy and noise exist in the bands of hyperspectral images (HSIs). Thus, the ability to select suitable parts from hundreds of input bands is a desirable property for HSI classification methods. In this letter, a band attention module (BAM) is proposed to implement deep-learning-based HSI classification with the capacity of band selection or weighting. The proposed BAM can be seen as a plug-and-play complementary component for existing classification networks, and it fully considers the adverse effects caused by band redundancy when using convolutional neural networks (CNNs) for HSI classification. Unlike most deep learning methods used for HSIs, the band attention module, which is customized according to the characteristics of hyperspectral images, is embedded in ordinary CNNs for better performance. At the same time, unlike classical band selection or weighting methods, the proposed method is trained end to end instead of in separate stages. Experiments are carried out on two HSI benchmark datasets. Compared to some classical and advanced deep learning methods, numerical simulations under different evaluation criteria show that the proposed method performs well. Last but not least, some advanced CNNs are combined with the proposed BAM for better performance.
Tasks	Hyperspectral Image Classification, Image Classification
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04379v1
PDF	https://arxiv.org/pdf/1906.04379v1.pdf
PWC	https://paperswithcode.com/paper/band-attention-convolutional-networks-for
Repo
Framework
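
The abstract does not give the internal design of the band attention module. A common way to build such a plug-in, assumed here, is a squeeze-and-excitation-style block: average each band over the spatial patch, pass the resulting vector through a small bottleneck MLP, and rescale each band by a sigmoid weight. The reduction ratio `r` and the random weights are placeholders; in the paper the module is trained end to end with the host CNN.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def band_attention(cube, W1, b1, W2, b2):
    """Reweight the spectral bands of an (H, W, B) hyperspectral patch.

    W1: (B // r, B) and W2: (B, B // r) for a reduction ratio r.
    Returns the reweighted patch and the per-band weights in (0, 1).
    """
    squeeze = cube.mean(axis=(0, 1))             # (B,) one statistic per band
    hidden = np.maximum(0.0, W1 @ squeeze + b1)  # ReLU bottleneck
    weights = sigmoid(W2 @ hidden + b2)          # (B,) band weights
    return cube * weights[None, None, :], weights

rng = np.random.default_rng(0)
height, width, bands, r = 9, 9, 200, 8
cube = rng.normal(size=(height, width, bands))
W1, b1 = rng.normal(scale=0.1, size=(bands // r, bands)), np.zeros(bands // r)
W2, b2 = rng.normal(scale=0.1, size=(bands, bands // r)), np.zeros(bands)
reweighted, weights = band_attention(cube, W1, b1, W2, b2)
```

Because the output has the same shape as the input, a block like this can sit in front of an existing CNN without changing its architecture, which is the plug-and-play property the abstract highlights.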

When Attackers Meet AI: Learning-empowered Attacks in Cooperative Spectrum Sensing


Title	When Attackers Meet AI: Learning-empowered Attacks in Cooperative Spectrum Sensing
Authors	Zhengping Luo, Shangqing Zhao, Zhuo Lu, Jie Xu, Yalin E. Sagduyu
Abstract	Defense strategies have been well studied to combat Byzantine attacks that aim to disrupt cooperative spectrum sensing by sending falsified sensing data. However, existing studies usually make network or attack assumptions biased towards the defense (e.g., assuming prior knowledge of the attacks). In practice, attackers can adopt arbitrary behavior and avoid any pre-assumed pattern or assumption used by defense strategies. In this paper, we revisit this traditional security problem and propose a novel learning-empowered framework named Learn-Evaluate-Beat (LEB) to mislead the fusion center. Based on the black-box nature of the fusion center in the cooperative spectrum sensing process, our new perspective is to make adversarial use of machine learning to construct a surrogate model of the fusion center's decision model. Then, we propose a generic algorithm to create malicious sensing data. Our real-world experiments show that the LEB attack is very effective at beating a wide range of existing defense strategies, with a success ratio of up to 82%. Given the gap between the new LEB attack and existing defenses, we introduce a non-invasive, parallel method named the influence-limiting policy, which works alongside existing defenses to defend against LEB-based or other similar attacks and shows strong performance, reducing the overall disruption ratio of LEB attacks by up to 80%.
Tasks
Published	2019-05-04
URL	https://arxiv.org/abs/1905.01430v1
PDF	https://arxiv.org/pdf/1905.01430v1.pdf
PWC	https://paperswithcode.com/paper/190501430
Repo
Framework

Deep Distance Transform for Tubular Structure Segmentation in CT Scans


Title	Deep Distance Transform for Tubular Structure Segmentation in CT Scans
Authors	Yan Wang, Xu Wei, Fengze Liu, Jieneng Chen, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille
Abstract	Tubular structure segmentation in medical images, e.g., segmenting vessels in CT scans, serves as a vital step in the use of computers to aid in screening early stages of related diseases. But automatic tubular structure segmentation in CT scans is a challenging problem, due to issues such as poor contrast, noise and complicated background. A tubular structure usually has a cylinder-like shape which can be well represented by its skeleton and cross-sectional radii (scales). Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. DDT first learns a multi-task network to predict a segmentation mask for a tubular structure and a distance map. Each value in the map represents the distance from a tubular structure voxel to the tubular structure surface. Then the segmentation mask is refined by leveraging the shape prior reconstructed from the distance map. We apply DDT to six medical image datasets. The experiments show that (1) DDT can boost tubular structure segmentation performance significantly (e.g., over 13% improvement measured by DSC for pancreatic duct segmentation), and (2) DDT additionally provides a geometrical measurement for a tubular structure, which is important for clinical diagnosis (e.g., the cross-sectional scale of a pancreatic duct can be an indicator for pancreatic cancer).
Tasks
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03383v1
PDF	https://arxiv.org/pdf/1912.03383v1.pdf
PWC	https://paperswithcode.com/paper/deep-distance-transform-for-tubular-structure
Repo
Framework
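
The distance map that DDT learns to predict can be illustrated with a classical, non-learned distance transform. This brute-force sketch in plain Python assumes one common convention, the Euclidean distance from each foreground pixel to the nearest background pixel, and works in 2-D for readability; a real pipeline would use an optimized routine such as `scipy.ndimage.distance_transform_edt` on 3-D volumes. The maximum of the map across a tube approximates its cross-sectional radius, the kind of geometric measurement the abstract mentions for pancreatic ducts.

```python
import math
from typing import List

def distance_map(mask: List[List[int]]) -> List[List[float]]:
    """Euclidean distance from each foreground pixel to the nearest background pixel.

    Brute force over all pixel pairs, so only suitable for toy inputs;
    assumes the mask contains at least one background (0) pixel.
    """
    background = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v == 0]
    dist = [[0.0] * len(row) for row in mask]
    for i, row in enumerate(mask):
        for j, v in enumerate(row):
            if v:
                dist[i][j] = min(math.hypot(i - bi, j - bj) for bi, bj in background)
    return dist

# A horizontal "vessel" 5 pixels thick (rows 1-5) inside a 7 x 12 image.
mask = [[1 if 1 <= i <= 5 else 0 for _ in range(12)] for i in range(7)]
dist = distance_map(mask)
radius = max(max(row) for row in dist)
print(radius)  # 3.0
```

The centerline value is 3.0 rather than 2.5 because distances are measured between pixel centers; that half-pixel offset is a convention of this sketch, not a property of DDT.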

A novel statistical metric learning for hyperspectral image classification


Title	A novel statistical metric learning for hyperspectral image classification
Authors	Zhiqiang Gong, Ping Zhong, Weidong Hu, Zixuan Xiao, Xuping Yin
Abstract	In this paper, a novel statistical metric learning method is developed for spectral-spatial classification of hyperspectral images. First, the standard variance of the samples of each class in each batch is used to decrease the intra-class variance. Then, the distances between the means of different classes are used to penalize the inter-class variance of the training samples. Finally, the standard variance between the means of different classes is added as an additional diversity term to repulse different classes from each other. Experiments have been conducted on two real-world hyperspectral image datasets, and the experimental results show the effectiveness of the proposed statistical metric learning.
Tasks	Hyperspectral Image Classification, Image Classification, Metric Learning
Published	2019-05-13
URL	https://arxiv.org/abs/1905.05087v1
PDF	https://arxiv.org/pdf/1905.05087v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-statistical-metric-learning-for
Repo
Framework
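
The three terms of the loss can be written down directly for one batch of embeddings. How they are signed and weighted is not specified in the abstract, so this NumPy sketch makes its own assumptions: the intra-class term is the mean per-class standard deviation, the inter-class term is a hinge on distances between class means with a hypothetical `margin`, and the diversity term (the spread of the class means) is subtracted with a hypothetical weight `lam_div`.

```python
from itertools import combinations

import numpy as np

def statistical_metric_loss(emb: np.ndarray, labels: np.ndarray, margin: float = 1.0,
                            lam_inter: float = 1.0, lam_div: float = 0.1) -> float:
    """Batch loss with intra-class, inter-class, and diversity terms.

    emb: (N, d) embeddings; labels: (N,) integer class ids (at least two classes).
    """
    classes = np.unique(labels)
    means = np.stack([emb[labels == c].mean(axis=0) for c in classes])
    # 1) Intra-class term: per-class standard deviation, averaged over classes.
    intra = np.mean([emb[labels == c].std(axis=0).mean() for c in classes])
    # 2) Inter-class term: penalize class means closer than `margin`.
    inter = np.mean([max(0.0, margin - np.linalg.norm(means[a] - means[b]))
                     for a, b in combinations(range(len(classes)), 2)])
    # 3) Diversity term: spread of the class means, subtracted so it is maximized.
    diversity = means.std(axis=0).mean()
    return float(intra + lam_inter * inter - lam_div * diversity)

separated = np.array([[0.0, 0.0], [0.0, 0.01], [5.0, 5.0], [5.0, 5.01]])
mixed = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(statistical_metric_loss(separated, labels), statistical_metric_loss(mixed, labels))
```

Tight, well-separated classes yield a lower loss than overlapping ones; in practice the embeddings are usually normalized so the subtracted diversity term cannot grow without bound.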

On instabilities of deep learning in image reconstruction - Does AI come at a cost?


Title	On instabilities of deep learning in image reconstruction - Does AI come at a cost?
Authors	Vegard Antun, Francesco Renna, Clarice Poon, Ben Adcock, Anders C. Hansen
Abstract	Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with potential to change the field. In this paper we demonstrate a crucial phenomenon: deep learning typically yields unstable methods for image reconstruction. The instabilities usually occur in several forms: (1) tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction, (2) a small structural change, for example a tumour, may not be captured in the reconstructed image and (3) (a counterintuitive type of instability) more samples may yield poorer performance. Our new stability test, with algorithms and easy-to-use software, detects the instability phenomena. The test is aimed at researchers, to test their networks for instabilities, and at government agencies, such as the Food and Drug Administration (FDA), to ensure the safe use of deep learning methods.
Tasks	Image Classification, Image Reconstruction
Published	2019-02-14
URL	http://arxiv.org/abs/1902.05300v1
PDF	http://arxiv.org/pdf/1902.05300v1.pdf
PWC	https://paperswithcode.com/paper/on-instabilities-of-deep-learning-in-image
Repo
Framework
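
The first instability type, tiny perturbations producing large changes in the reconstruction, suggests a simple empirical check: measure how much a reconstruction map amplifies small input perturbations. The sketch below is a generic illustration and not the authors' stability test or software. It uses random perturbations, which only give a lower bound on the worst-case amplification, and a toy linear reconstructor (the pseudo-inverse of a diagonal sampling operator) in place of a trained network; `max_amplification` and `pinv_reconstructor` are hypothetical names.

```python
import numpy as np

def max_amplification(reconstruct, y: np.ndarray, eps: float = 1e-3,
                      trials: int = 200, seed: int = 0) -> float:
    """Largest observed ||f(y + e) - f(y)|| / ||e|| over random e with ||e|| = eps.

    Random search only bounds the worst case from below; a stronger test
    would optimize the perturbation (e.g. by gradient ascent).
    """
    rng = np.random.default_rng(seed)
    base = reconstruct(y)
    worst = 0.0
    for _ in range(trials):
        e = rng.normal(size=y.shape)
        e *= eps / np.linalg.norm(e)  # rescale the perturbation to norm eps
        worst = max(worst, float(np.linalg.norm(reconstruct(y + e) - base)) / eps)
    return worst

def pinv_reconstructor(A: np.ndarray):
    # Toy "reconstruction": apply the pseudo-inverse of the sampling operator A.
    A_pinv = np.linalg.pinv(A)
    return lambda measurements: A_pinv @ measurements

y = np.ones(3)
amp_good = max_amplification(pinv_reconstructor(np.diag([1.0, 0.9, 0.8])), y)
amp_bad = max_amplification(pinv_reconstructor(np.diag([1.0, 0.9, 1e-4])), y)
print(round(amp_good, 2), round(amp_bad))
```

The well-conditioned operator amplifies perturbations by at most 1.25, while the ill-conditioned one amplifies them by thousands, the same qualitative symptom the abstract describes for unstable learned reconstructions.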

MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation


Title	MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation
Authors	Sanghyeon Na, Seungjoo Yoo, Jaegul Choo
Abstract	Unpaired multimodal image-to-image translation is the task of translating a given image in a source domain into diverse images in the target domain, overcoming the limitation of one-to-one mapping. Existing multimodal translation models are mainly based on disentangled representations with an image reconstruction loss. We propose two approaches to improve multimodal translation quality. First, we use a content representation from the source domain conditioned on a style representation from the target domain. Second, rather than using a typical image reconstruction loss, we design MILO (Mutual Information LOss), a new stochastically defined loss function based on information theory. This loss function directly reflects the interpretation of latent variables as random variables. We show through extensive experiments on various real-world datasets that our proposed model, Mutual Information with StOchastic Style Representation (MISO), achieves state-of-the-art performance.
Tasks	Image Reconstruction, Image-to-Image Translation
Published	2019-02-11
URL	http://arxiv.org/abs/1902.03938v1
PDF	http://arxiv.org/pdf/1902.03938v1.pdf
PWC	https://paperswithcode.com/paper/miso-mutual-information-loss-with-stochastic
Repo
Framework