Paper Group ANR 1550
Multimodal Semantic Attention Network for Video Captioning
Title | Multimodal Semantic Attention Network for Video Captioning |
Authors | Liang Sun, Bing Li, Chunfeng Yuan, Zhengjun Zha, Weiming Hu |
Abstract | Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), a new encoder-decoder framework that incorporates multimodal semantic attributes for video captioning. In the encoding phase, we detect and generate multimodal semantic attributes by formulating the task as a multi-label classification problem. Moreover, we add an auxiliary classification loss to our model, which yields more effective visual features and high-level multimodal semantic attribute distributions for a more complete video encoding. In the decoding phase, we extend each weight matrix of the conventional LSTM to an ensemble of attribute-dependent weight matrices, and employ an attention mechanism to attend to different attributes at each time step of the captioning process. We evaluate our algorithm on two popular public benchmarks, MSVD and MSR-VTT, achieving results competitive with the current state of the art across six evaluation metrics. |
Tasks | Multi-Label Classification, Video Captioning |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.02963v1 |
PDF | https://arxiv.org/pdf/1905.02963v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-semantic-attention-network-for |
Repo | |
Framework | |
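The decoder idea above (each LSTM weight matrix expanded into an ensemble of attribute-dependent matrices, mixed by attention) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the per-step matrix is a softmax-weighted sum of K attribute matrices, and the names `combine_weights`, `W_dog`, `W_run` and the attention scores are hypothetical. In the actual model the scores would be computed from the decoder state at each time step.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_weights(attribute_mats, attention_scores):
    """Attention-weighted sum of K attribute-dependent matrices (each rows x cols)."""
    alphas = softmax(attention_scores)
    rows, cols = len(attribute_mats[0]), len(attribute_mats[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for alpha, mat in zip(alphas, attribute_mats):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += alpha * mat[i][j]
    return combined

def matvec(mat, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]

# Two hypothetical attributes (e.g. "dog", "running"), each with its own 2x2 weights.
W_dog = [[1.0, 0.0], [0.0, 1.0]]
W_run = [[0.0, 2.0], [2.0, 0.0]]

# Equal (hypothetical) scores give an element-wise average of the two matrices.
W_t = combine_weights([W_dog, W_run], attention_scores=[0.0, 0.0])
print(W_t)                     # → [[0.5, 1.0], [1.0, 0.5]]
print(matvec(W_t, [1.0, 1.0])) # → [1.5, 1.5]
```

Raising one attribute's score moves the combined matrix toward that attribute's weights, which is how attention lets the decoder emphasize different attributes at different steps.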
Stability and Generalization of Graph Convolutional Neural Networks
Title | Stability and Generalization of Graph Convolutional Neural Networks |
Authors | Saurabh Verma, Zhi-Li Zhang |
Abstract | Inspired by convolutional neural networks on 1D and 2D data, graph convolutional neural networks (GCNNs) have been developed for various learning tasks on graph data and have shown superior performance on real-world datasets. Despite their success, there has been little theoretical exploration of GCNN models, for instance of their generalization properties. In this paper, we take a first step towards developing a deeper theoretical understanding of GCNN models by analyzing the stability of single-layer GCNN models and deriving their generalization guarantees in a semi-supervised graph learning setting. In particular, we show that the algorithmic stability of a GCNN model depends upon the largest absolute eigenvalue of its graph convolution filter. Moreover, to ensure the uniform stability needed to provide strong generalization guarantees, the largest absolute eigenvalue must be independent of the graph size. Our results shed new light on the design of new and improved graph convolution filters with guaranteed algorithmic stability. We evaluate the generalization gap and stability on various real-world graph datasets and show that the empirical results indeed support our theoretical findings. To the best of our knowledge, we are the first to study stability bounds on graph learning in a semi-supervised setting and derive generalization bounds for GCNN models. |
Tasks | |
Published | 2019-05-03 |
URL | https://arxiv.org/abs/1905.01004v2 |
PDF | https://arxiv.org/pdf/1905.01004v2.pdf |
PWC | https://paperswithcode.com/paper/stability-and-generalization-of-graph |
Repo | |
Framework | |
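To make the eigenvalue condition concrete, the sketch below estimates the largest absolute eigenvalue of two common graph filters on star graphs of growing size. The filters (raw A + I versus the symmetrically normalized D^{-1/2}(A + I)D^{-1/2} used by GCN) and the star-graph test case are illustrative assumptions chosen here, not the paper's experiments; the helper names are hypothetical.

```python
import math

def star_graph(n):
    """Adjacency matrix of a star: node 0 linked to nodes 1..n-1."""
    adj = [[0.0] * n for _ in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1.0
    return adj

def add_self_loops(adj):
    """Return A + I."""
    n = len(adj)
    return [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def sym_normalize(adj):
    """D^{-1/2} A D^{-1/2}; assumes every node has positive degree."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def largest_abs_eigenvalue(mat, iters=300):
    """Power iteration plus a Rayleigh-quotient estimate (symmetric matrices only)."""
    n = len(mat)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(mat[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    mx = [sum(mat[i][j] * x[j] for j in range(n)) for i in range(n)]
    return abs(sum(a * b for a, b in zip(x, mx)))

for n in (5, 65):
    a_tilde = add_self_loops(star_graph(n))
    raw = largest_abs_eigenvalue(a_tilde)
    normalized = largest_abs_eigenvalue(sym_normalize(a_tilde))
    print(n, round(raw, 3), round(normalized, 3))
```

For a star with n nodes the raw filter's top eigenvalue is 1 + sqrt(n - 1) (3 for n = 5, 9 for n = 65), so it grows with the graph, while the normalized filter stays at 1: the size-independent behaviour the abstract asks for.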
Automatic financial feature construction based on neural network
Title | Automatic financial feature construction based on neural network |
Authors | Jie Fang, Jianwu Lin, Yong Jiang, Shutao Xia |
Abstract | In the automatic financial feature construction task, the state-of-the-art technique represents features as reverse Polish expressions and then uses genetic programming (GP) to evolve them. In this paper, we propose a new framework based on neural networks, the alpha discovery neural network (ADNN), and make several contributions. First, we make full use of neural networks' advantages in feature extraction to construct highly informative features. Second, we use domain knowledge to design the objective function, batch size, and sampling rules. Third, we use pre-training to replace GP's evolution process; according to the universal approximation theorem for neural networks, pre-training can conduct a more effective and explainable evolution process. Experiments show that ADNN produces markedly more diverse and more informative features than GP. In addition, ADNN can serve as a data augmentation algorithm, further improving the performance of financial features constructed by GP. |
Tasks | Data Augmentation, Time Series |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.06236v2 |
PDF | https://arxiv.org/pdf/1912.06236v2.pdf |
PWC | https://paperswithcode.com/paper/prior-knowledge-neural-network-for-automatic |
Repo | |
Framework | |
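The GP baseline described above encodes each candidate feature as a reverse Polish expression, which the evolutionary process then mutates. A minimal evaluator for such expressions might look like this; the operator set, the protected division, and the `open`/`close`/`volume` fields are hypothetical choices for illustration, not the papers' actual primitives.

```python
import operator

# Hypothetical operator set; real GP feature libraries use many more primitives.
BINARY_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": lambda a, b: a / b if b != 0 else 0.0,  # "protected" division
}

def eval_rpn(tokens, data):
    """Evaluate a reverse-Polish feature expression element-wise over time series."""
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            op = BINARY_OPS[tok]
            stack.append([op(a, b) for a, b in zip(left, right)])
        else:
            stack.append(list(data[tok]))  # operand: a named input series
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

bars = {
    "open": [10.0, 11.0, 12.0],
    "close": [11.0, 11.0, 15.0],
    "volume": [100.0, 200.0, 300.0],
}
# (close - open) / volume, written in reverse Polish notation
feature = eval_rpn(["close", "open", "sub", "volume", "div"], bars)
print(feature)  # → [0.01, 0.0, 0.01]
```

Because every expression is a flat token list, GP can mutate or cross over features by editing sub-sequences, which is the representation the abstract contrasts with ADNN.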
Recognizing License Plates in Real-Time
Title | Recognizing License Plates in Real-Time |
Authors | Xuewen Yang, Xin Wang |
Abstract | License plate detection and recognition (LPDR) is of growing importance for enabling intelligent transportation and ensuring the security and safety of cities. However, LPDR faces big challenges in practical environments. License plates can have extremely diverse sizes, fonts, and colors, and plate images are usually of poor quality due to skewed capture angles, uneven lighting, occlusion, and blurring. Applications such as surveillance also often require fast processing. To enable real-time and accurate license plate recognition, we propose a set of techniques: 1) a contour reconstruction method combined with edge detection to quickly detect candidate plates; 2) a simple zero-one-alternation scheme to effectively remove the fake top and bottom borders around plates, facilitating more accurate segmentation of the characters on plates; 3) a set of techniques to augment the training data, incorporate SIFT features into the CNN, and exploit transfer learning to obtain initial parameters for more effective training; and 4) a low-cost two-phase verification procedure to determine the correct plate, which applies statistical filtering in the plate detection stage to quickly remove unwanted candidates and then reuses the accurate character recognition (CR) results to verify plates further without additional processing. We implement a complete LPDR system based on our algorithms. Experimental results demonstrate that our system can accurately recognize license plates in real time. Additionally, it works robustly under various levels of illumination and noise, and in the presence of car movement. Compared to peer schemes, our system is not only among the most accurate but also the fastest, and it can easily be applied to other scenarios. |
Tasks | Edge Detection, License Plate Recognition, Transfer Learning |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04376v1 |
PDF | https://arxiv.org/pdf/1906.04376v1.pdf |
PWC | https://paperswithcode.com/paper/recognizing-license-plates-in-real-time |
Repo | |
Framework | |
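One plausible reading of the zero-one-alternation idea is that rows crossing characters flip between 0 and 1 many times, while spurious border rows are nearly solid. The sketch below trims top and bottom rows whose transition count falls below a threshold; the threshold value, the function names, and the toy plate are assumptions, and the paper's exact rule may differ.

```python
def transitions(row):
    """Number of 0->1 or 1->0 changes along one binarized row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def trim_borders(binary_img, min_transitions=4):
    """Drop top/bottom rows whose alternation count is below a threshold.

    Assumption: border rows are mostly solid, so they alternate rarely,
    while rows crossing character strokes alternate many times.
    """
    counts = [transitions(r) for r in binary_img]
    top = 0
    while top < len(counts) and counts[top] < min_transitions:
        top += 1
    bottom = len(counts)
    while bottom > top and counts[bottom - 1] < min_transitions:
        bottom -= 1
    return binary_img[top:bottom]

plate = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # solid (fake) top border
    [0, 1, 0, 1, 1, 0, 1, 0, 0],  # character strokes
    [0, 1, 0, 0, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # solid (fake) bottom border
]
print(len(trim_borders(plate)))  # → 2
```

Only rows at the top and bottom are scanned, so interior rows with few strokes (for example gaps between character lines) are never removed.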
Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation
Title | Divide and Couple: Using Monte Carlo Variational Objectives for Posterior Approximation |
Authors | Justin Domke, Daniel Sheldon |
Abstract | Recent work in variational inference (VI) uses ideas from Monte Carlo estimation to tighten the lower bounds on the log-likelihood that are used as objectives. However, there is no systematic understanding of how optimizing different objectives relates to approximating the posterior distribution. Developing such a connection is important if the ideas are to be applied to inference, i.e., applications that require an approximate posterior and not just an approximation of the log-likelihood. Given a VI objective defined by a Monte Carlo estimator of the likelihood, we use a “divide and couple” procedure to identify augmented proposal and target distributions. The divergence between these is equal to the gap between the VI objective and the log-likelihood. Thus, after maximizing the VI objective, the augmented variational distribution may be used to approximate the posterior distribution. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10115v3 |
PDF | https://arxiv.org/pdf/1906.10115v3.pdf |
PWC | https://paperswithcode.com/paper/divide-and-couple-using-monte-carlo |
Repo | |
Framework | |
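The simplest Monte Carlo estimator of the likelihood is plain importance sampling, whose VI objective is the importance-weighted bound E[log((1/K) Σ_k p(x, z_k)/q(z_k))] ≤ log p(x) (by Jensen's inequality). For this case, one way to use the augmented proposal as a posterior approximation is to draw K samples and keep one in proportion to its weight (sampling-importance-resampling). The sketch below shows both on a toy Gaussian model; the model, q, and all function names are assumptions for illustration, not the paper's general procedure.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_mean_exp(values):
    """Stable log((1/K) * sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

# Toy model (assumption): z ~ N(0, 1), x | z ~ N(z, 1).
# Then log p(x) = log N(x; 0, 2) and the exact posterior is z | x ~ N(x/2, 1/2).
X_OBS = 1.0

def log_joint(z):
    return log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(X_OBS, z, 1.0)

def importance_sample(k, rng):
    """Draw K proposals from q(z) = N(0, 1); return samples and log-weights."""
    zs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    log_w = [log_joint(z) - log_normal_pdf(z, 0.0, 1.0) for z in zs]
    return zs, log_w

def mc_objective(k, reps, rng):
    """Average of log((1/K) sum_k w_k): a lower bound on log p(x) that tightens with K."""
    return sum(log_mean_exp(importance_sample(k, rng)[1]) for _ in range(reps)) / reps

def approx_posterior_sample(k, rng):
    """Keep one of the K proposals, chosen in proportion to its weight."""
    zs, log_w = importance_sample(k, rng)
    m = max(log_w)
    weights = [math.exp(v - m) for v in log_w]
    return rng.choices(zs, weights=weights, k=1)[0]

rng = random.Random(0)
log_px = log_normal_pdf(X_OBS, 0.0, 2.0)
print("log p(x)   ", round(log_px, 3))
print("K=1 bound  ", round(mc_objective(1, 2000, rng), 3))
print("K=100 bound", round(mc_objective(100, 200, rng), 3))
draws = [approx_posterior_sample(100, rng) for _ in range(2000)]
print("posterior mean estimate", round(sum(draws) / len(draws), 3), "(exact 0.5)")
```

Here the K = 1 bound is the ordinary ELBO (about 0.4 nats below log p(x)), while the K = 100 bound sits close to log p(x); the resampled draws concentrate around the exact posterior mean.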
Effective transfer learning for hyperspectral image classification with deep convolutional neural networks
Title | Effective transfer learning for hyperspectral image classification with deep convolutional neural networks |
Authors | Wojciech Masarczyk, Przemysław Głomb, Bartosz Grabowski, Mateusz Ostaszewski |
Abstract | Hyperspectral imaging is a rich source of data, allowing for a multitude of effective applications. On the other hand, such imaging remains challenging because of the large data dimension and, typically, the small pool of available training examples. While deep learning approaches have been shown to provide effective classification solutions, especially for high-dimensional problems, they unfortunately work best when many labelled examples are available. To alleviate this requirement for a particular dataset, transfer learning can be used: the network is first pre-trained on a dataset with a large number of training labels, and the actual dataset is then used to fine-tune it. This strategy is not straightforward to apply to hyperspectral images, as often only one image of a particular type or characteristic is available. In this paper, we propose and investigate a simple and effective transfer learning strategy that uses an unsupervised pre-training step without label information. This approach can be applied to many hyperspectral classification problems. Experiments show that it is very effective in improving classification accuracy without being restricted to a particular image type or neural network architecture. An additional advantage of the proposed approach is the unsupervised nature of the pre-training step, which can be done immediately after image acquisition, without the need for potentially costly expert time. |
Tasks | Hyperspectral Image Classification, Image Classification, Transfer Learning |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05507v1 |
PDF | https://arxiv.org/pdf/1909.05507v1.pdf |
PWC | https://paperswithcode.com/paper/effective-transfer-learning-for-hyperspectral |
Repo | |
Framework | |
Budget-Aware Adapters for Multi-Domain Learning
Title | Budget-Aware Adapters for Multi-Domain Learning |
Authors | Rodrigo Berriel, Stéphane Lathuilière, Moin Nabi, Tassilo Klein, Thiago Oliveira-Santos, Nicu Sebe, Elisa Ricci |
Abstract | Multi-Domain Learning (MDL) refers to the problem of learning a set of models derived from a common deep architecture, each one specialized to perform a task in a certain domain (e.g., photos, sketches, paintings). This paper tackles MDL with a particular interest in obtaining domain-specific models with an adjustable budget in terms of the number of network parameters and computational complexity. Our intuition is that, since the number of domains and tasks in real applications can be very large, an effective MDL approach should focus not only on accuracy but also on having as few parameters as possible. To implement this idea, we derive specialized deep models for each domain by adapting a pre-trained architecture but, unlike other methods, we propose a novel strategy to automatically adjust the computational complexity of the network. To this end, we introduce Budget-Aware Adapters that select the most relevant feature channels to better handle data from a novel domain. Constraints on the number of active switches are imposed in order to obtain a network that respects the desired complexity budget. Experimentally, we show that our approach leads to recognition accuracy competitive with state-of-the-art approaches but with much lighter networks, both in terms of storage and computation. |
Tasks | |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06242v2 |
PDF | https://arxiv.org/pdf/1905.06242v2.pdf |
PWC | https://paperswithcode.com/paper/budget-aware-adapters-for-multi-domain |
Repo | |
Framework | |
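The channel-switch mechanism described above can be illustrated with a minimal sketch. It assumes a channel is switched on when a learned score is positive and that the budget is enforced by a hinge penalty on the fraction of active channels; the scores, the penalty form, and all names are hypothetical. During training, the hard threshold would need a gradient surrogate (for example a straight-through estimator), which is not shown here.

```python
def switches_from_scores(scores):
    """Binary channel switches: a channel is active when its score is positive."""
    return [1 if s > 0 else 0 for s in scores]

def budget_penalty(switches, budget):
    """Hinge penalty on the fraction of active channels above the budget."""
    active_fraction = sum(switches) / len(switches)
    return max(0.0, active_fraction - budget)

def apply_switches(features, switches):
    """Zero out the feature channels whose switch is off."""
    return [f * s for f, s in zip(features, switches)]

scores = [0.8, -0.3, 1.2, -0.1, 0.05, -2.0, 0.4, -0.7]  # hypothetical learned scores
features = [1.0] * 8
sw = switches_from_scores(scores)
print(sw)                               # → [1, 0, 1, 0, 1, 0, 1, 0]
print(budget_penalty(sw, budget=0.25))  # → 0.25 (4/8 active vs. a 25% budget)
print(apply_switches(features, sw))     # → [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

Switched-off channels can be skipped entirely at inference time, which is where the storage and computation savings would come from.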
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Title | Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning |
Authors | Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian |
Abstract | Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) for video captioning. These methods mainly focus on tailoring sequence learning through RNNs for better caption generation, whereas off-the-shelf visual features are borrowed from CNNs. We argue that careful design of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich captions using Gated Recurrent Units (GRUs). Our method embeds rich temporal dynamics in visual features by hierarchically applying the Short Fourier Transform to CNN features of the whole video. It additionally derives high-level semantics from an object detector to enrich the representation with the spatial dynamics of the detected objects. The final representation is projected to a compact space and fed to a language model. By learning a relatively simple language model comprising two GRU layers, we establish a new state of the art on the MSVD and MSR-VTT datasets for the METEOR and ROUGE_L metrics. |
Tasks | Language Modelling, Video Captioning |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10322v2 |
PDF | http://arxiv.org/pdf/1902.10322v2.pdf |
PWC | https://paperswithcode.com/paper/spatio-temporal-dynamics-and-semantic |
Repo | |
Framework | |
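The hierarchical temporal encoding mentioned above can be sketched as follows, assuming the video's per-frame CNN features are split into 1, 2, 4, ... equal segments and that the magnitudes of the first few Fourier coefficients of each feature dimension are kept per segment. The segment scheme, coefficient count, and names are hypothetical, and a plain O(n²) DFT is used only to keep the example self-contained.

```python
import cmath

def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a 1-D signal."""
    n = len(signal)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

def hierarchical_fourier(frames, levels=3, n_coeffs=2):
    """frames: list of per-frame feature vectors (T x D).

    Level l splits the time axis into 2**l equal segments and takes DFT
    magnitudes of each feature dimension within each segment.
    Assumes T >= n_coeffs * 2**(levels - 1).
    """
    dims = len(frames[0])
    encoding = []
    for level in range(levels):
        n_seg = 2 ** level
        seg_len = len(frames) // n_seg
        for s in range(n_seg):
            segment = frames[s * seg_len:(s + 1) * seg_len]
            for d in range(dims):
                encoding.extend(dft_magnitudes([f[d] for f in segment], n_coeffs))
    return encoding

frames = [[float(t), 1.0, (-1.0) ** t] for t in range(8)]  # hypothetical 8 frames x 3 dims
enc = hierarchical_fourier(frames, levels=3, n_coeffs=2)
print(len(enc))  # → 42, i.e. (1 + 2 + 4 segments) * 3 dims * 2 coefficients
```

The coarse levels summarize the whole clip while the finer levels keep local motion patterns, and the fixed-length result can be projected and fed to the language model regardless of video length (given enough frames).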
A Convolutional Neural Network with Mapping Layers for Hyperspectral Image Classification
Title | A Convolutional Neural Network with Mapping Layers for Hyperspectral Image Classification |
Authors | Rui Li, Zhibin Pan, Yang Wang, Ping Wang |
Abstract | In this paper, we propose a convolutional neural network with mapping layers (MCNN) for hyperspectral image (HSI) classification. The proposed mapping layers map the input patch into a low-dimensional subspace using multilinear algebra. We use the mapping layers to reduce spectral and spatial redundancy while preserving most of the energy of the input. The features extracted by the mapping layers also reduce the number of subsequent convolutional layers needed for feature extraction. Our MCNN architecture avoids the phenomenon, common in deep learning models for HSI classification, of accuracy declining as layers are added, and its effective mapping layers also save training time. Furthermore, we apply 3-D convolutional kernels in the convolutional layers to extract spectral-spatial features from the HSI. We tested MCNN on three datasets, Indian Pines, University of Pavia, and Salinas, achieving classification accuracies of 98.3%, 99.5%, and 99.3%, respectively. Experimental results demonstrate that the proposed MCNN can significantly improve classification accuracy and substantially reduce computation time. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09526v1 |
PDF | https://arxiv.org/pdf/1908.09526v1.pdf |
PWC | https://paperswithcode.com/paper/a-convolutional-neural-network-with-mapping |
Repo | |
Framework | |
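The core multilinear operation behind projecting a patch into a low-dimensional subspace is the n-mode tensor-matrix product, as in Tucker-style projections. The sketch below applies it along the spectral mode and then a spatial mode of an H x W x B patch. The projection matrices here are fixed, hypothetical averaging matrices; how MCNN obtains its mapping matrices is not specified in the abstract.

```python
def mode_product(tensor, mat, mode):
    """n-mode product of a 3-D tensor (nested lists) with a matrix.

    mat has shape (R x size_of_mode); the result replaces that mode's size by R:
    out[..r..] = sum_m mat[r][m] * tensor[..m..]
    """
    sizes = [len(tensor), len(tensor[0]), len(tensor[0][0])]
    out_sizes = list(sizes)
    out_sizes[mode] = len(mat)
    out = [[[0.0] * out_sizes[2] for _ in range(out_sizes[1])] for _ in range(out_sizes[0])]
    for i in range(out_sizes[0]):
        for j in range(out_sizes[1]):
            for k in range(out_sizes[2]):
                idx = [i, j, k]
                total = 0.0
                for m in range(sizes[mode]):
                    src = list(idx)
                    src[mode] = m
                    total += mat[idx[mode]][m] * tensor[src[0]][src[1]][src[2]]
                out[i][j][k] = total
    return out

def shape(tensor):
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))

# Hypothetical 3 x 3 patch with 4 bands; band b holds the value b everywhere.
patch = [[[float(b) for b in range(4)] for _ in range(3)] for _ in range(3)]
# Spectral mapping: average adjacent band pairs (4 bands -> 2).
u_spec = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
# Spatial mapping along rows: collapse 3 rows into 1 by averaging.
u_row = [[1 / 3, 1 / 3, 1 / 3]]

reduced = mode_product(mode_product(patch, u_spec, mode=2), u_row, mode=0)
print(shape(reduced))  # → (1, 3, 2)
print(reduced[0][0])   # approximately [0.5, 2.5]
```

Each mode is reduced independently, so spectral and spatial redundancy can be removed with small matrices instead of one large projection over the flattened patch.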
Band Attention Convolutional Networks For Hyperspectral Image Classification
Title | Band Attention Convolutional Networks For Hyperspectral Image Classification |
Authors | Hongwei Dong, Lamei Zhang, Bin Zou |
Abstract | Redundancy and noise exist in the bands of hyperspectral images (HSIs). It is therefore desirable for HSI classification methods to be able to select suitable parts from hundreds of input bands. In this letter, a band attention module (BAM) is proposed to give deep learning based HSI classification the capacity for band selection or weighting. The proposed BAM can be seen as a plug-and-play complementary component for existing classification networks, which fully accounts for the adverse effects of band redundancy when using convolutional neural networks (CNNs) for HSI classification. Unlike most deep learning methods used for HSIs, the band attention module, which is customized to the characteristics of hyperspectral images, is embedded in ordinary CNNs for better performance. At the same time, unlike classical band selection or weighting methods, the proposed method is trained end-to-end instead of in separate stages. Experiments are carried out on two HSI benchmark datasets. Compared to some classical and advanced deep learning methods, numerical simulations under different evaluation criteria show that the proposed method performs well. Last but not least, some advanced CNNs are combined with the proposed BAM for better performance. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04379v1 |
PDF | https://arxiv.org/pdf/1906.04379v1.pdf |
PWC | https://paperswithcode.com/paper/band-attention-convolutional-networks-for |
Repo | |
Framework | |
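The abstract does not describe BAM's internals, so the sketch below uses a squeeze-and-excitation style gate over spectral bands as a stand-in to show the general idea of plug-and-play band weighting: pool each band, pass the pooled vector through a small bottleneck, and rescale each band by a gate in (0, 1). The weights, sizes, and names are hypothetical; in a network they would be learned end-to-end.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def band_attention(patch, w1, w2):
    """Reweight the spectral bands of an H x W x B patch (SE-style stand-in).

    w1: hidden x B weights, w2: B x hidden weights (both hypothetical).
    """
    h, w, b = len(patch), len(patch[0]), len(patch[0][0])
    # 1) squeeze: global average pool each band over the spatial window
    pooled = [sum(patch[i][j][k] for i in range(h) for j in range(w)) / (h * w) for k in range(b)]
    # 2) excite: bottleneck B -> hidden with ReLU, then hidden -> B with sigmoid gates
    hidden = [max(0.0, sum(row[k] * pooled[k] for k in range(b))) for row in w1]
    weights = [sigmoid(sum(row[m] * hidden[m] for m in range(len(hidden)))) for row in w2]
    # 3) rescale every band by its gate
    scaled = [[[patch[i][j][k] * weights[k] for k in range(b)] for j in range(w)] for i in range(h)]
    return scaled, weights

patch = [[[1.0, 0.5, 2.0] for _ in range(2)] for _ in range(2)]  # 2 x 2 pixels, 3 bands
w1 = [[1.0, -1.0, 0.5]]       # hypothetical 1 x 3 bottleneck weights
w2 = [[2.0], [-2.0], [0.0]]   # hypothetical 3 x 1 expansion weights
scaled, weights = band_attention(patch, w1, w2)
print([round(x, 3) for x in weights])  # → [0.953, 0.047, 0.5]
```

Because the output has the same shape as the input, a gate like this can be inserted in front of an existing CNN without changing the rest of the architecture, which matches the plug-and-play role the abstract describes.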
When Attackers Meet AI: Learning-empowered Attacks in Cooperative Spectrum Sensing
Title | When Attackers Meet AI: Learning-empowered Attacks in Cooperative Spectrum Sensing |
Authors | Zhengping Luo, Shangqing Zhao, Zhuo Lu, Jie Xu, Yalin E. Sagduyu |
Abstract | Defense strategies have been well studied to combat Byzantine attacks that aim to disrupt cooperative spectrum sensing by sending falsified sensing data. However, existing studies usually make network or attack assumptions biased towards the defense (e.g., assuming that prior knowledge of the attacks is available). In practice, attackers can adopt any arbitrary behavior and avoid any pre-assumed pattern or assumption used by defense strategies. In this paper, we revisit this traditional security problem and propose a novel learning-empowered framework named Learn-Evaluate-Beat (LEB) to mislead the fusion center. Given the black-box nature of the fusion center in the cooperative spectrum sensing process, our new perspective is to make adversarial use of machine learning to construct a surrogate model of the fusion center’s decision model. Then, we propose a generic algorithm to create malicious sensing data. Our real-world experiments show that the LEB attack is very effective at beating a wide range of existing defense strategies, with a success ratio of up to 82%. Given the gap between the new LEB attack and existing defenses, we introduce a non-invasive, parallel method named influence-limiting policy, used alongside existing defenses to defend against LEB-based and similar attacks; it shows strong performance, reducing the overall disruption ratio of LEB attacks by up to 80%. |
Tasks | |
Published | 2019-05-04 |
URL | https://arxiv.org/abs/1905.01430v1 |
PDF | https://arxiv.org/pdf/1905.01430v1.pdf |
PWC | https://paperswithcode.com/paper/190501430 |
Repo | |
Framework | |
Deep Distance Transform for Tubular Structure Segmentation in CT Scans
Title | Deep Distance Transform for Tubular Structure Segmentation in CT Scans |
Authors | Yan Wang, Xu Wei, Fengze Liu, Jieneng Chen, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille |
Abstract | Tubular structure segmentation in medical images, e.g., segmenting vessels in CT scans, serves as a vital step in the use of computers to aid in screening early stages of related diseases. But automatic tubular structure segmentation in CT scans is a challenging problem, due to issues such as poor contrast, noise and complicated background. A tubular structure usually has a cylinder-like shape which can be well represented by its skeleton and cross-sectional radii (scales). Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. DDT first learns a multi-task network to predict a segmentation mask for a tubular structure and a distance map. Each value in the map represents the distance from each tubular structure voxel to the tubular structure surface. Then the segmentation mask is refined by leveraging the shape prior reconstructed from the distance map. We apply our DDT on six medical image datasets. The experiments show that (1) DDT can boost tubular structure segmentation performance significantly (e.g., over 13% improvement measured by DSC for pancreatic duct segmentation), and (2) DDT additionally provides a geometrical measurement for a tubular structure, which is important for clinical diagnosis (e.g., the cross-sectional scale of a pancreatic duct can be an indicator for pancreatic cancer). |
Tasks | |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03383v1 |
PDF | https://arxiv.org/pdf/1912.03383v1.pdf |
PWC | https://paperswithcode.com/paper/deep-distance-transform-for-tubular-structure |
Repo | |
Framework | |
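The distance map described above assigns each structure voxel its distance to the structure surface. The sketch below computes such a map exactly with a multi-source breadth-first search on a 2-D mask using city-block (4-neighbour) distance; it is the kind of target a network like DDT learns to predict, but the 2-D setting, the distance metric, and the names are assumptions, since the abstract does not give DDT's exact metric or quantization.

```python
from collections import deque

def distance_map(mask):
    """City-block distance from every pixel to the nearest background pixel.

    mask: 2-D list with 1 for the tubular structure and 0 for background;
    assumes at least one background pixel exists.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every background pixel at distance 0.
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    # Breadth-first search grows distances inward, one ring per step.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

# A horizontal "vessel" 5 pixels thick inside a 7 x 10 image.
mask = [[1 if 1 <= r <= 5 else 0 for _ in range(10)] for r in range(7)]
dmap = distance_map(mask)
print([dmap[r][4] for r in range(7)])  # → [0, 1, 2, 3, 2, 1, 0]
print(max(max(row) for row in dmap))   # → 3
```

The peak values lie on the structure's centreline (its skeleton), and their magnitude roughly tracks the cross-sectional radius, which is why such a map can serve both as a shape prior and as a geometric measurement.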
A novel statistical metric learning for hyperspectral image classification
Title | A novel statistical metric learning for hyperspectral image classification |
Authors | Zhiqiang Gong, Ping Zhong, Weidong Hu, Zixuan Xiao, Xuping Yin |
Abstract | In this paper, a novel statistical metric learning method is developed for spectral-spatial classification of hyperspectral images. First, the standard variance of the samples of each class in each batch is used to decrease the intra-class variance within each class. Then, the distances between the means of different classes are used to penalize the inter-class variance of the training samples. Finally, the standard variance between the means of different classes is added as an additional diversity term to repulse different classes from each other. Experiments have been conducted on two real-world hyperspectral image datasets, and the results show the effectiveness of the proposed statistical metric learning. |
Tasks | Hyperspectral Image Classification, Image Classification, Metric Learning |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05087v1 |
PDF | https://arxiv.org/pdf/1905.05087v1.pdf |
PWC | https://paperswithcode.com/paper/a-novel-statistical-metric-learning-for |
Repo | |
Framework | |
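The three statistics listed above can be combined into a single batch loss, sketched below. This reads "standard variance" as standard deviation and assumes mean aggregation, Euclidean distances between class means, and signs chosen so that minimizing the loss makes classes compact, pushes their means apart, and spreads the means out; the weights `lam_inter` and `lam_div` and the function name are hypothetical, and the paper's exact formulation may differ. Requires Python 3.8+ for `math.dist` and at least two classes in the batch.

```python
import itertools
import math
import statistics

def statistical_metric_loss(features, labels, lam_inter=0.1, lam_div=0.1):
    """Lower is better: compact classes, distant class means, spread-out means.

    features: list of equal-length feature vectors; labels: class id per vector.
    """
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    means = {y: [statistics.mean(col) for col in zip(*xs)] for y, xs in groups.items()}

    # 1) intra-class spread: per-class, per-dimension standard deviation
    intra = statistics.mean(
        statistics.pstdev(col) for xs in groups.values() for col in zip(*xs)
    )
    # 2) pairwise distances between class means (rewarded, so subtracted)
    inter = statistics.mean(
        math.dist(a, b) for a, b in itertools.combinations(means.values(), 2)
    )
    # 3) diversity: standard deviation of the class means, per dimension
    diversity = statistics.mean(
        statistics.pstdev(col) for col in zip(*means.values())
    )
    return intra - lam_inter * inter - lam_div * diversity

labels = [0, 0, 1, 1]
tight = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]  # compact, well separated
loose = [[0.0, 3.0], [3.0, 0.0], [1.0, 2.0], [2.0, 1.0]]  # overlapping, same means
print(round(statistical_metric_loss(tight, labels), 3))
print(round(statistical_metric_loss(loose, labels), 3))
```

The compact, well-separated batch scores lower than the overlapping one, whose two classes share the same mean and so earn no credit from the inter-class or diversity terms.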
On instabilities of deep learning in image reconstruction - Does AI come at a cost?
Title | On instabilities of deep learning in image reconstruction - Does AI come at a cost? |
Authors | Vegard Antun, Francesco Renna, Clarice Poon, Ben Adcock, Anders C. Hansen |
Abstract | Deep learning, due to its unprecedented success in tasks such as image classification, has emerged as a new tool in image reconstruction with the potential to change the field. In this paper we demonstrate a crucial phenomenon: deep learning typically yields unstable methods for image reconstruction. The instabilities usually occur in several forms: (1) tiny, almost undetectable perturbations, both in the image and sampling domain, may result in severe artefacts in the reconstruction, (2) a small structural change, for example a tumour, may not be captured in the reconstructed image, and (3) (a counterintuitive type of instability) more samples may yield poorer performance. Our new stability test, with algorithms and easy-to-use software, detects these instability phenomena. The test is aimed at researchers, to test their networks for instabilities, and at government agencies, such as the Food and Drug Administration (FDA), to ensure the safe use of deep learning methods. |
Tasks | Image Classification, Image Reconstruction |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05300v1 |
PDF | http://arxiv.org/pdf/1902.05300v1.pdf |
PWC | https://paperswithcode.com/paper/on-instabilities-of-deep-learning-in-image |
Repo | |
Framework | |
MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation
Title | MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation |
Authors | Sanghyeon Na, Seungjoo Yoo, Jaegul Choo |
Abstract | Unpaired multimodal image-to-image translation is the task of translating a given image in a source domain into diverse images in the target domain, overcoming the limitation of one-to-one mapping. Existing multimodal translation models are mainly based on disentangled representations with an image reconstruction loss. We propose two approaches to improve multimodal translation quality. First, we use a content representation from the source domain conditioned on a style representation from the target domain. Second, rather than using a typical image reconstruction loss, we design MILO (Mutual Information LOss), a new stochastically-defined loss function based on information theory. This loss function directly reflects the interpretation of latent variables as random variables. Through extensive experiments on various real-world datasets, we show that our proposed model, Mutual Information with StOchastic Style Representation (MISO), achieves state-of-the-art performance. |
Tasks | Image Reconstruction, Image-to-Image Translation |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03938v1 |
PDF | http://arxiv.org/pdf/1902.03938v1.pdf |
PWC | https://paperswithcode.com/paper/miso-mutual-information-loss-with-stochastic |
Repo | |
Framework | |