Paper Group ANR 1028
Robust Deep Multi-modal Learning Based on Gated Information Fusion Network
Title | Robust Deep Multi-modal Learning Based on Gated Information Fusion Network |
Authors | Jaekyum Kim, Junho Koh, Yecheol Kim, Jaehyung Choi, Youngbae Hwang, Jun Won Choi |
Abstract | The goal of multi-modal learning is to use complementary information on the relevant task provided by multiple modalities to achieve reliable and robust performance. Recently, deep learning has led to significant improvement in multi-modal learning by allowing information fusion at the intermediate feature levels. This paper addresses the problem of designing a robust deep multi-modal learning architecture in the presence of imperfect modalities. We introduce a deep fusion architecture for object detection which processes each modality using a separate convolutional neural network (CNN) and constructs the joint feature map by combining the intermediate features from the CNNs. To improve robustness to degraded modalities, we employ the gated information fusion (GIF) network, which weights the contribution from each modality according to the input feature maps to be fused. The weights are determined through convolutional layers followed by a sigmoid function and trained along with the information fusion network in an end-to-end fashion. Our experiments show that the proposed GIF network offers additional architectural flexibility to achieve robust performance in handling some degraded modalities, and show a significant performance improvement based on the Single Shot Detector (SSD) for the KITTI dataset using the proposed fusion network and data augmentation schemes. |
Tasks | Data Augmentation, Object Detection |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06233v2 |
http://arxiv.org/pdf/1807.06233v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-deep-multi-modal-learning-based-on |
Repo | |
Framework | |
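The gating idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' architecture: the function name `gated_fusion`, the scalar per-modality gates, and the linear gate (a stand-in for the convolutional gate layers) are all assumptions made for illustration. Each gate is a sigmoid of a linear function of the concatenated features, and it scales its own modality before the features are joined.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(feat_a, feat_b, gate_a, gate_b):
    """Fuse two feature vectors with input-dependent scalar gates.

    gate_a and gate_b are hypothetical (weights, bias) pairs applied to the
    concatenated features; they stand in for learned convolutional layers.
    """
    joint = feat_a + feat_b  # list concatenation of both modalities
    w_a = sigmoid(sum(w * x for w, x in zip(gate_a[0], joint)) + gate_a[1])
    w_b = sigmoid(sum(w * x for w, x in zip(gate_b[0], joint)) + gate_b[1])
    # Each modality is scaled by its own gate, then the results are joined.
    fused = [w_a * x for x in feat_a] + [w_b * x for x in feat_b]
    return fused, (w_a, w_b)
```

With a strongly negative bias on one gate, that modality's contribution shrinks toward zero, which is the behaviour one would want for a degraded input.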
A writer-independent approach for offline signature verification using deep convolutional neural networks features
Title | A writer-independent approach for offline signature verification using deep convolutional neural networks features |
Authors | Victor L. F. Souza, Adriano L. I. Oliveira, Robert Sabourin |
Abstract | The use of features extracted using a deep convolutional neural network (CNN) combined with a writer-dependent (WD) SVM classifier resulted in a significant improvement in the performance of handwritten signature verification (HSV) when compared to the previous state-of-the-art methods. This work investigates whether the use of these CNN features provides good results in a writer-independent (WI) HSV context, based on the dichotomy transformation combined with an SVM writer-independent classifier. The experiments performed on the Brazilian and GPDS datasets show that (i) the proposed approach outperformed other WI-HSV methods from the literature, (ii) in the global threshold scenario, the proposed approach was able to outperform the writer-dependent method with CNN features on the Brazilian dataset, and (iii) in a user threshold scenario, the results are similar to those obtained by the writer-dependent method with CNN features. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10755v1 |
http://arxiv.org/pdf/1807.10755v1.pdf | |
PWC | https://paperswithcode.com/paper/a-writer-independent-approach-for-offline |
Repo | |
Framework | |
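The dichotomy transformation mentioned in the abstract above is commonly defined as the element-wise absolute difference between two feature vectors, which turns a many-writer problem into a two-class one (same writer or different writers). The sketch below assumes that definition; the helper names and the 1/0 label convention are illustrative and not taken from the paper.

```python
def dichotomy_transform(x_query, x_reference):
    """Element-wise absolute difference of two feature vectors."""
    return [abs(q - r) for q, r in zip(x_query, x_reference)]


def make_pairs(samples):
    """Build two-class training data from (writer_id, feature_vector) samples.

    Returns (dissimilarity_vector, label) tuples, with label 1 when both
    signatures come from the same writer and 0 otherwise (assumed convention).
    """
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (writer_i, x_i), (writer_j, x_j) = samples[i], samples[j]
            label = 1 if writer_i == writer_j else 0
            pairs.append((dichotomy_transform(x_i, x_j), label))
    return pairs
```

A single classifier trained on such pairs can then be applied to writers unseen during training, which is what makes the setting writer-independent.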
Classification of Functioning, Disability, and Health for Children and Youth: ICF-CY Self Care (SCADI Dataset) Using Predictive Analytics
Title | Classification of Functioning, Disability, and Health for Children and Youth: ICF-CY Self Care (SCADI Dataset) Using Predictive Analytics |
Authors | Avishek Choudhury, Christopher Greene |
Abstract | The International Classification of Functioning, Disability, and Health for Children and Youth (ICF-CY) is a scaffold for designating and systematizing data on functioning and disability. It offers a standard semantic and a theoretical foundation for the demarcation and extent of wellbeing and infirmity. The multidimensional layout of ICF-CY comprises a plethora of information with about 1400 categories, making it difficult to analyze. Our research proposes a predictive model that classifies self-care problems on the Self-Care Activities Dataset based on the ICF-CY. The data used in this study comprises 206 attributes of 70 children with motor and physical disability. Our study implements, compares, and analyzes Random Forest, Support Vector Machine, Naive Bayes, Hoeffding tree, and Lazy locally weighted learning using a two-tailed t-test at the 95% confidence level. The Boruta algorithm used in the study reduces the data dimensionality to select the minimal-optimal set of predictors. Random Forest gave the best classification accuracy of 84.75%, with a root mean squared error of 0.18 and a receiver operating characteristic of 0.99. Predictive analytics can simplify the usage of ICF-CY by automating the classification process of disability, functioning, and health. |
Tasks | |
Published | 2018-12-29 |
URL | https://arxiv.org/abs/1901.00756v3 |
https://arxiv.org/pdf/1901.00756v3.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-functioning-disability-and |
Repo | |
Framework | |
Online Bearing Remaining Useful Life Prediction Based on a Novel Degradation Indicator and Convolutional Neural Networks
Title | Online Bearing Remaining Useful Life Prediction Based on a Novel Degradation Indicator and Convolutional Neural Networks |
Authors | Cheng Cheng, Guijun Ma, Yong Zhang, Mingyang Sun, Fei Teng, Han Ding, Ye Yuan |
Abstract | In industrial applications, nearly half the failures of motors are caused by the degradation of rolling element bearings (REBs). Therefore, accurately estimating the remaining useful life (RUL) of REBs is of crucial importance to ensure the reliability and safety of mechanical systems. To tackle this challenge, model-based approaches are often limited by the complexity of mathematical modeling. Conventional data-driven approaches, on the other hand, require massive efforts to extract the degradation features and construct a health index. In this paper, a novel online data-driven framework is proposed to exploit deep convolutional neural networks (CNNs) in predicting the RUL of bearings. More concretely, the raw vibrations of training bearings are first processed using the Hilbert-Huang transform (HHT), and a novel nonlinear degradation indicator is constructed as the label for learning. The CNN is then employed to identify the hidden pattern between the extracted degradation indicator and the vibration of training bearings, which makes it possible to estimate the degradation of the test bearings automatically. Finally, the RULs of the test bearings are predicted using an $\epsilon$-support vector regression model. The superior performance of the proposed RUL estimation framework, compared with the state-of-the-art approaches, is demonstrated through the experimental results. The generality of the proposed CNN model is also validated by transferring it to bearings undergoing different operating conditions. |
Tasks | |
Published | 2018-12-08 |
URL | http://arxiv.org/abs/1812.03315v1 |
http://arxiv.org/pdf/1812.03315v1.pdf | |
PWC | https://paperswithcode.com/paper/online-bearing-remaining-useful-life |
Repo | |
Framework | |
UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits
Title | UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits |
Authors | Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff |
Abstract | In this work, we address the open problem of finding low-complexity near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g., UCB1) or optimal and computationally complex (e.g., kl-UCB). We propose a boosting approach to Upper Confidence Bound based algorithms for stochastic bandits, which we call UCBoost. Specifically, we propose two types of UCBoost algorithms. We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round as well as a regret guarantee that is $1/e$-close to that of the kl-UCB algorithm. We propose an approximation-based UCBoost algorithm, UCBoost($\epsilon$), that enjoys a regret guarantee $\epsilon$-close to that of kl-UCB as well as $O(\log(1/\epsilon))$ complexity for each arm per round. Hence, our algorithms provide practitioners a practical way to trade optimality against computational complexity. Finally, we present numerical results which show that UCBoost($\epsilon$) can achieve the same regret performance as the standard kl-UCB while incurring only $1\%$ of the computational cost of kl-UCB. |
Tasks | Decision Making |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05929v1 |
http://arxiv.org/pdf/1804.05929v1.pdf | |
PWC | https://paperswithcode.com/paper/ucboost-a-boosting-approach-to-tame |
Repo | |
Framework | |
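The complexity gap between the two baselines named in the abstract above can be made concrete with a small sketch. It shows the standard closed-form UCB1 index next to a kl-UCB index for Bernoulli rewards found by bisection; it does not implement UCBoost itself. The exploration budget `log(t) / pulls` (without a log-log term) and the fixed iteration count are simplifying assumptions.

```python
import math


def ucb1_index(mean, pulls, t):
    """Closed-form UCB1 index: O(1) work per arm."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for safety."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, iters=50):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), via bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo
```

The kl-UCB index needs an inner numerical search for every arm in every round, whereas UCB1 is a single formula; that per-arm cost is what the abstract describes trading against regret.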
Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack
Title | Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness against Adversarial Attack |
Authors | Adnan Siraj Rakin, Zhezhi He, Deliang Fan |
Abstract | Recent developments in the field of Deep Learning have exposed the underlying vulnerability of Deep Neural Networks (DNNs) to adversarial examples. In image classification, an adversarial example is a carefully modified image that is visually indistinguishable from the original image but can cause a DNN model to misclassify it. Training the network with Gaussian noise is an effective technique to perform model regularization, thus improving model robustness against input variation. Inspired by this classical method, we explore the regularization characteristic of noise injection to improve the robustness of DNNs against adversarial attack. In this work, we propose Parametric-Noise-Injection (PNI), which involves trainable Gaussian noise injection at each layer on either activations or weights through solving the min-max optimization problem, embedded with adversarial training. These parameters are trained explicitly to achieve improved robustness. To the best of our knowledge, this is the first work that uses trainable noise injection to improve network robustness against adversarial attacks, rather than manually configuring the injected noise level through cross-validation. The extensive results show that our proposed PNI technique effectively improves the robustness against a variety of powerful white-box and black-box attacks such as PGD, C&W, FGSM, transferable attack and ZOO attack. Last but not least, the PNI method improves both clean- and perturbed-data accuracy in comparison to the state-of-the-art defense methods, outperforming the current unbroken PGD defense by 1.1% and 6.8% on clean test data and perturbed test data respectively, using the ResNet-20 architecture. |
Tasks | Adversarial Attack, Adversarial Defense, Image Classification |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09310v1 |
http://arxiv.org/pdf/1811.09310v1.pdf | |
PWC | https://paperswithcode.com/paper/parametric-noise-injection-trainable |
Repo | |
Framework | |
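A minimal sketch of the noise-injection step described in the abstract above, under stated assumptions: the noise is zero-mean Gaussian with standard deviation equal to that of the tensor being perturbed, and a single scalar `alpha` scales it. In the paper `alpha` is trainable and learned together with adversarial training; here it is a plain argument, and the function name `inject_noise` is hypothetical.

```python
import random
import statistics


def inject_noise(weights, alpha, rng):
    """Return weights perturbed by alpha-scaled Gaussian noise.

    Assumption: the noise standard deviation is tied to the spread of the
    weights themselves, so alpha acts as a relative noise level.
    """
    sigma = statistics.pstdev(weights)
    return [w + alpha * rng.gauss(0.0, sigma) for w in weights]
```

Setting `alpha` to zero recovers the unperturbed weights, so the scaling coefficient directly controls how much randomness the layer sees.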
The unreasonable effectiveness of small neural ensembles in high-dimensional brain
Title | The unreasonable effectiveness of small neural ensembles in high-dimensional brain |
Authors | A. N. Gorban, V. A. Makarov, I. Y. Tyukin |
Abstract | Despite the widespread consensus on brain complexity, sprouts of the single neuron revolution emerged in neuroscience in the 1970s. They brought many unexpected discoveries, including grandmother or concept cells and sparse coding of information in the brain. In machine learning, the famous curse of dimensionality seemed for a long time to be an unsolvable problem. Nevertheless, the idea of the blessing of dimensionality has gradually become more popular. Ensembles of non-interacting or weakly interacting simple units prove to be an effective tool for solving essentially multidimensional problems. This approach is especially useful for one-shot (non-iterative) correction of errors in large legacy artificial intelligence systems. These simplicity revolutions in the era of complexity have deep fundamental reasons grounded in the geometry of multidimensional data spaces. To explore and understand these reasons, we revisit the background ideas of statistical physics. In the course of the 20th century they were developed into the concentration of measure theory. New stochastic separation theorems reveal the fine structure of the data clouds. We review and analyse biological, physical, and mathematical problems at the core of the fundamental question: how can a high-dimensional brain organise reliable and fast learning in a high-dimensional world of data by simple tools? Two critical applications are reviewed to exemplify the approach: one-shot correction of errors in intellectual systems and emergence of static and associative memories in ensembles of single neurons. |
Tasks | |
Published | 2018-09-20 |
URL | http://arxiv.org/abs/1809.07656v2 |
http://arxiv.org/pdf/1809.07656v2.pdf | |
PWC | https://paperswithcode.com/paper/the-unreasonable-effectiveness-of-small |
Repo | |
Framework | |
Hierarchical Visualization of Materials Space with Graph Convolutional Neural Networks
Title | Hierarchical Visualization of Materials Space with Graph Convolutional Neural Networks |
Authors | Tian Xie, Jeffrey C. Grossman |
Abstract | The combination of high-throughput computation and machine learning has led to a new paradigm in materials design by allowing for the direct screening of vast portions of structural, chemical, and property space. The use of these powerful techniques leads to the generation of enormous amounts of data, which in turn calls for new techniques to efficiently explore and visualize the materials space to help identify underlying patterns. In this work, we develop a unified framework to hierarchically visualize the compositional and structural similarities between materials in an arbitrary material space with representations learned from different layers of graph convolutional neural networks. We demonstrate the potential for such a visualization approach by showing that patterns emerge automatically that reflect similarities at different scales in three representative classes of materials: perovskites, elemental boron, and general inorganic crystals, covering material spaces of different compositions, structures, and both. For perovskites, elemental similarities are learned that reflect multiple aspects of atom properties. For elemental boron, structural motifs emerge automatically, showing characteristic boron local environments. For inorganic crystals, the similarity and stability of local coordination environments are shown combining different center and neighbor atoms. The method could help transition to a data-centered exploration of materials space in automated materials design. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03404v2 |
http://arxiv.org/pdf/1807.03404v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-visualization-of-materials-space |
Repo | |
Framework | |
Weakly-Supervised Convolutional Neural Networks for Multimodal Image Registration
Title | Weakly-Supervised Convolutional Neural Networks for Multimodal Image Registration |
Authors | Yipeng Hu, Marc Modat, Eli Gibson, Wenqi Li, Nooshin Ghavami, Ester Bonmati, Guotai Wang, Steven Bandula, Caroline M. Moore, Mark Emberton, Sébastien Ourselin, J. Alison Noble, Dean C. Barratt, Tom Vercauteren |
Abstract | One of the fundamental challenges in supervised learning for multimodal image registration is the lack of ground truth for voxel-level spatial correspondence. This work describes a method to infer voxel-level transformation from higher-level correspondence information contained in anatomical labels. We argue that such labels are more reliable and practical to obtain for reference sets of image pairs than voxel-level correspondence. Typical anatomical labels of interest may include solid organs, vessels, ducts, structure boundaries and other subject-specific ad hoc landmarks. The proposed end-to-end convolutional neural network approach aims to predict displacement fields to align multiple labelled corresponding structures for individual image pairs during training, while only unlabelled image pairs are used as the network input for inference. We highlight the versatility of the proposed training strategy, which can utilise diverse types of anatomical labels that need not be identifiable over all training image pairs. At inference, the resulting 3D deformable image registration algorithm runs in real-time and is fully automated without requiring any anatomical labels or initialisation. Several network architecture variants are compared for registering T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients. A median target registration error of 3.6 mm on landmark centroids and a median Dice of 0.87 on prostate glands are achieved from cross-validation experiments, in which 108 pairs of multimodal images from 76 patients were tested with high-quality anatomical labels. |
Tasks | Image Registration |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03361v1 |
http://arxiv.org/pdf/1807.03361v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-convolutional-neural |
Repo | |
Framework | |
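The two evaluation metrics quoted in the abstract above have simple standard definitions, sketched here on toy data. Dice is twice the overlap divided by the total size of the two binary masks, and the target registration error (TRE) on landmark centroids is taken as the Euclidean distance between the centroids of corresponding landmarks. The flat-list mask layout and the function names are assumptions for illustration.

```python
import math


def dice(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * intersection / total if total else 1.0


def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))


def target_registration_error(points_fixed, points_warped):
    """Euclidean distance between the centroids of two landmark point sets."""
    c_fixed, c_warped = centroid(points_fixed), centroid(points_warped)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_fixed, c_warped)))
```

If the point coordinates are given in millimetres, the TRE is in millimetres as well, matching the unit reported in the abstract.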
Surveillance Face Recognition Challenge
Title | Surveillance Face Recognition Challenge |
Authors | Zhiyi Cheng, Xiatian Zhu, Shaogang Gong |
Abstract | Face recognition (FR) is one of the most extensively investigated problems in computer vision. Significant progress in FR has been made due to the recent introduction of larger-scale FR challenges, particularly with constrained social media web images, e.g. high-resolution photos of celebrity faces taken by professional photo-journalists. However, the more challenging FR in unconstrained and low-resolution surveillance images remains largely under-studied. To facilitate more studies on developing FR models that are effective and robust for low-resolution surveillance facial images, we introduce a new Surveillance Face Recognition Challenge, which we call the QMUL-SurvFace benchmark. To the best of our knowledge, this new benchmark is the largest and, more importantly, the only true surveillance FR benchmark, where low-resolution images are not synthesised by artificial down-sampling of native high-resolution images. This challenge contains 463,507 face images of 15,573 distinct identities captured in real-world uncooperative surveillance scenes over wide space and time. As a consequence, it presents an extremely challenging FR benchmark. We benchmark the FR performance on this challenge using five representative deep learning face recognition models, in comparison to existing benchmarks. We show that the current state of the art is still far from satisfactory for tackling the under-investigated surveillance FR problem in practical forensic scenarios. Face recognition is generally more difficult in an open-set setting, which is typical for surveillance scenarios, owing to a large number of non-target people (distractors) appearing in open-space scenes. This is evident on the new Surveillance FR Challenge, where the top-performing CentreFace deep learning FR model on the MegaFace benchmark can only achieve a 13.2% success rate (at Rank-20) at a 10% false alarm rate. |
Tasks | Face Recognition |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09691v6 |
http://arxiv.org/pdf/1804.09691v6.pdf | |
PWC | https://paperswithcode.com/paper/surveillance-face-recognition-challenge |
Repo | |
Framework | |
Mode matching in GANs through latent space learning and inversion
Title | Mode matching in GANs through latent space learning and inversion |
Authors | Deepak Mishra, Prathosh A. P., Aravind Jayendran, Varun Srivastava, Santanu Chaudhury |
Abstract | Generative adversarial networks (GANs) have shown remarkable success in the generation of unstructured data, such as natural images. However, discovery and separation of modes in the generated space, essential for several tasks beyond naive data generation, is still a challenge. In this paper, we address the problem of imposing desired modal properties on the generated space using a latent distribution, engineered in accordance with the modal properties of the true data distribution. This is achieved by training a latent space inversion network in tandem with the generative network using a divergence loss. The latent space is made to follow a continuous multimodal distribution generated by reparameterization of a pair of continuous and discrete random variables. In addition, the modal priors of the latent distribution are learned to match the true data distribution using minimal supervision, with a negligible increase in the number of learnable parameters. We validate our method on multiple tasks such as mode separation, conditional generation, and attribute discovery on multiple real-world image datasets and demonstrate its efficacy over other state-of-the-art methods. |
Tasks | |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03692v3 |
http://arxiv.org/pdf/1811.03692v3.pdf | |
PWC | https://paperswithcode.com/paper/nemgan-noise-engineered-mode-matching-gan |
Repo | |
Framework | |
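One way to read the "pair of continuous and discrete random variables" in the abstract above is as a mixture-style latent: a discrete variable picks a mode and a continuous variable adds spread around that mode's centre. The sketch below assumes exactly that (a Gaussian mixture with a shared scale and fixed priors); the paper's actual construction and its learned priors may differ, and `sample_latent` is a hypothetical name.

```python
import random


def sample_latent(means, priors, sigma, rng):
    """Draw one latent vector from an assumed Gaussian-mixture latent.

    means:  one centre vector per mode
    priors: mode probabilities (learned in the paper, fixed inputs here)
    sigma:  shared standard deviation around each centre
    """
    # Discrete variable: which mode to draw from.
    k = rng.choices(range(len(means)), weights=priors)[0]
    # Continuous variable: standard normal noise, shifted and scaled.
    z = [m + sigma * rng.gauss(0.0, 1.0) for m in means[k]]
    return k, z
```

Keeping the mode index alongside the latent vector is what allows conditional generation: fixing `k` and resampling the noise stays within one mode.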
Defending against Adversarial Attack towards Deep Neural Networks via Collaborative Multi-task Training
Title | Defending against Adversarial Attack towards Deep Neural Networks via Collaborative Multi-task Training |
Authors | Derek Wang, Chaoran Li, Sheng Wen, Surya Nepal, Yang Xiang |
Abstract | Deep neural networks (DNNs) are known to be vulnerable to adversarial examples which contain imperceptible perturbations. A series of defending methods, either proactive or reactive, have been proposed in recent years. However, most of the methods can only handle specific attacks. For example, proactive defending methods are invalid against grey-box or white-box attacks, while reactive defending methods are challenged by low-distortion adversarial examples or transferring adversarial examples. This becomes a critical problem since a defender usually does not know the type of the attack a priori. Moreover, the two-pronged defence (e.g. MagNet), which takes advantage of both proactive and reactive methods, has been reported as broken under transferring attacks. To address this problem, this paper proposes a novel defensive framework based on collaborative multi-task training, aiming at providing defence for different types of attacks. The proposed defence first encodes training labels into label pairs and counters black-box attacks by leveraging adversarial training supervised by the encoded label pairs. The defence further constructs a detector to identify and reject high-confidence adversarial examples that bypass the black-box defence. In addition, the proposed collaborative architecture can prevent adversaries from finding valid adversarial examples when the defending strategy is exposed. As far as we know, our method is a new two-pronged defence that is resilient to the transferring attack targeting MagNet. |
Tasks | Adversarial Attack |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05123v3 |
http://arxiv.org/pdf/1803.05123v3.pdf | |
PWC | https://paperswithcode.com/paper/defending-against-adversarial-attack-towards |
Repo | |
Framework | |
Multimodal Image Denoising based on Coupled Dictionary Learning
Title | Multimodal Image Denoising based on Coupled Dictionary Learning |
Authors | Pingfan Song, Miguel R. D. Rodrigues |
Abstract | In this paper, we propose a new multimodal image denoising approach to attenuate additive white Gaussian noise in a given image modality with the aid of a guidance image modality. The proposed coupled image denoising approach consists of two stages: coupled sparse coding and reconstruction. The first stage performs a joint sparse transform for multimodal images with respect to a group of learned coupled dictionaries, followed by a shrinkage operation on the sparse representations. Then, in the second stage, the shrunken representations, together with the coupled dictionaries, contribute to the reconstruction of the denoised image via an inverse transform. The proposed denoising scheme demonstrates the capability to capture both the common and distinct features of different data modalities. This capability makes our approach more robust to inconsistencies between the guidance and the target images, thereby overcoming drawbacks such as texture-copying artifacts. Experiments on real multimodal images demonstrate that the proposed approach is able to better employ guidance information to bring notable benefits in the image denoising task with respect to the state of the art. |
Tasks | Denoising, Dictionary Learning, Image Denoising |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.10678v1 |
http://arxiv.org/pdf/1806.10678v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-image-denoising-based-on-coupled |
Repo | |
Framework | |
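The shrink-then-reconstruct pattern in the abstract above can be sketched on a single modality. The abstract does not say which shrinkage rule is used, so soft-thresholding is assumed here as one common choice, and the coupled (multi-dictionary) structure is omitted. The dictionary is represented as a list of atoms, and all names are illustrative.

```python
import math


def soft_threshold(coeffs, lam):
    """Shrink each coefficient toward zero by lam; small ones become zero."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in coeffs]


def reconstruct(dictionary, coeffs):
    """Inverse transform: weighted sum of dictionary atoms.

    dictionary is a list of atoms, each a list of the same signal length.
    """
    n = len(dictionary[0])
    return [sum(z * atom[i] for z, atom in zip(coeffs, dictionary)) for i in range(n)]
```

Small coefficients, which are more likely to carry noise than structure, are zeroed out before the signal is rebuilt from the surviving atoms.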
Scattering Networks for Hybrid Representation Learning
Title | Scattering Networks for Hybrid Representation Learning |
Authors | Edouard Oyallon, Sergey Zagoruyko, Gabriel Huang, Nikos Komodakis, Simon Lacoste-Julien, Matthew Blaschko, Eugene Belilovsky |
Abstract | Scattering networks are a class of designed Convolutional Neural Networks (CNNs) with fixed weights. We argue they can serve as generic representations for modelling images. In particular, by working in scattering space, we achieve competitive results both for supervised and unsupervised learning tasks, while making progress towards constructing more interpretable CNNs. For supervised learning, we demonstrate that the early layers of CNNs do not necessarily need to be learned, and can be replaced with a scattering network instead. Indeed, using hybrid architectures, we achieve the best results with predefined representations to-date, while being competitive with end-to-end learned CNNs. Specifically, even applying a shallow cascade of small-windowed scattering coefficients followed by 1$\times$1-convolutions results in AlexNet accuracy on the ILSVRC2012 classification task. Moreover, by combining scattering networks with deep residual networks, we achieve a single-crop top-5 error of 11.4% on ILSVRC2012. Also, we show they can yield excellent performance in the small sample regime on CIFAR-10 and STL-10 datasets, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. For unsupervised learning, scattering coefficients can be a competitive representation that permits image recovery. We use this fact to train hybrid GANs to generate images. Finally, we empirically analyze several properties related to stability and reconstruction of images from scattering coefficients. |
Tasks | Representation Learning |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06367v1 |
http://arxiv.org/pdf/1809.06367v1.pdf | |
PWC | https://paperswithcode.com/paper/scattering-networks-for-hybrid-representation |
Repo | |
Framework | |
Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions
Title | Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions |
Authors | Quanzeng You, Hailin Jin, Jiebo Luo |
Abstract | Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions of the content of a given image. For human beings, by contrast, image caption writing is quite flexible and diverse, and additional language dimensions, such as emotion, humor and language styles, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating sentiment-conveying image descriptions, which has received little attention. The main challenge is how to effectively inject sentiments into the generated captions without altering the semantic matching between the visual content and the generated descriptions. In this work, we propose two different models, which employ different schemes for injecting sentiments into image captions. Compared with the few existing approaches, the proposed models are much simpler and yet more effective. The experimental results show that our models outperform the state-of-the-art models in generating sentimental (i.e., sentiment-bearing) image captions. In addition, we can also easily manipulate the model by assigning different sentiments to the testing image to generate captions with the corresponding sentiments. |
Tasks | Image Captioning |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.10121v1 |
http://arxiv.org/pdf/1801.10121v1.pdf | |
PWC | https://paperswithcode.com/paper/image-captioning-at-will-a-versatile-scheme |
Repo | |
Framework | |