Paper Group AWR 130
Zoom Out-and-In Network with Recursive Training for Object Proposal
Title | Zoom Out-and-In Network with Recursive Training for Object Proposal |
Authors | Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang |
Abstract | In this paper, we propose a zoom-out-and-in network for generating object proposals. We utilize different resolutions of feature maps in the network to detect object instances of various sizes. Specifically, we divide the anchor candidates into three clusters based on scale and place them on feature maps of distinct strides to detect small, medium and large objects, respectively. Deeper feature maps contain region-level semantics which can help their shallow counterparts to identify small objects. Therefore we design a zoom-in sub-network to increase the resolution of high-level features via a deconvolution operation. The high-level features with high resolution are then combined and merged with low-level features to detect objects. Furthermore, we devise a recursive training pipeline to consecutively regress region proposals at the training stage in order to match the iterative regression at the testing stage. We demonstrate the effectiveness of the proposed method on the ILSVRC DET and MS COCO datasets, where our algorithm outperforms the state of the art on various evaluation metrics. It also increases average precision by around 2% in the detection system. |
Tasks | |
Published | 2017-02-19 |
URL | http://arxiv.org/abs/1702.05711v1 |
http://arxiv.org/pdf/1702.05711v1.pdf | |
PWC | https://paperswithcode.com/paper/zoom-out-and-in-network-with-recursive |
Repo | https://github.com/hli2020/zoom_network |
Framework | none |
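As a rough illustration of the zoom-in idea, the sketch below upsamples a deep feature map with a deconvolution and merges it element-wise with a shallow map before further processing. This is a minimal PyTorch sketch with assumed channel sizes, not the authors' released code (see the repo above for that).

```python
import torch
import torch.nn as nn

class ZoomInMerge(nn.Module):
    """Upsample a deep (low-resolution) feature map with a deconvolution
    and merge it with a shallow (high-resolution) map, as described in
    the abstract. Channel sizes are illustrative assumptions."""
    def __init__(self, deep_ch=512, shallow_ch=256):
        super().__init__()
        # 2x upsampling via transposed convolution
        self.deconv = nn.ConvTranspose2d(deep_ch, shallow_ch,
                                         kernel_size=4, stride=2, padding=1)
        self.fuse = nn.Conv2d(shallow_ch, shallow_ch, kernel_size=3, padding=1)

    def forward(self, deep, shallow):
        up = self.deconv(deep)             # region-level semantics at 2x resolution
        merged = torch.relu(up + shallow)  # element-wise merge with low-level features
        return self.fuse(merged)

feat_deep = torch.randn(1, 512, 16, 16)
feat_shallow = torch.randn(1, 256, 32, 32)
out = ZoomInMerge()(feat_deep, feat_shallow)  # -> (1, 256, 32, 32)
```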
Structured Attentions for Visual Question Answering
Title | Structured Attentions for Visual Question Answering |
Authors | Chen Zhu, Yanpeng Zhao, Shuaiyi Huang, Kewei Tu, Yi Ma |
Abstract | Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable component of most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluate our model on three datasets, where it surpasses the best baseline model on the newly released CLEVR dataset by 9.5%, and the best published model on the VQA dataset by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva. |
Tasks | Visual Question Answering |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02071v1 |
http://arxiv.org/pdf/1708.02071v1.pdf | |
PWC | https://paperswithcode.com/paper/structured-attentions-for-visual-question |
Repo | https://github.com/zhuchen03/vqa-sva |
Framework | pytorch |
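The core idea of unrolling CRF inference into recurrent layers can be sketched as a few mean-field updates over a 4-connected grid of binary attention variables, implemented as differentiable tensor operations. The unary and pairwise potentials below are placeholders; in the paper they are predicted from the question and image.

```python
import torch
import torch.nn.functional as F

def mean_field_attention(unary, pairwise, n_iters=3):
    """Unrolled mean-field inference over a 4-connected grid CRF.
    unary:    (H, W) log-potentials for attending each region
    pairwise: scalar coupling strength between neighbouring regions
    Returns a (H, W) marginal attention map. Toy illustration only."""
    q = torch.sigmoid(unary)  # initial beliefs
    for _ in range(n_iters):
        # sum beliefs of the 4 grid neighbours (zero-padded at the borders)
        nb = (F.pad(q, (0, 0, 1, 0))[:-1] + F.pad(q, (0, 0, 0, 1))[1:] +
              F.pad(q, (1, 0))[:, :-1] + F.pad(q, (0, 1))[:, 1:])
        q = torch.sigmoid(unary + pairwise * nb)  # mean-field update
    return q

att = mean_field_attention(torch.randn(14, 14), pairwise=0.5)
```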
Multilingual Hierarchical Attention Networks for Document Classification
Title | Multilingual Hierarchical Attention Networks for Document Classification |
Authors | Nikolaos Pappas, Andrei Popescu-Belis |
Abstract | Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. However, when multilingual document collections are considered, training such models separately for each language entails linear parameter growth and a lack of cross-language transfer. Learning a single multilingual model with fewer parameters is therefore a challenging but potentially beneficial objective. To this end, we propose multilingual hierarchical attention networks for learning document structures, with shared encoders and/or shared attention mechanisms across languages, using multi-task learning and an aligned semantic space as input. We evaluate the proposed models on multilingual document classification with disjoint label sets, on a large dataset which we provide, with 600k news documents in 8 languages and 5k labels. The multilingual models outperform monolingual ones in low-resource as well as full-resource settings, and use fewer parameters, thus confirming their computational efficiency and the utility of cross-language transfer. |
Tasks | Document Classification, Multi-Task Learning, Transfer Learning |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.00896v4 |
http://arxiv.org/pdf/1707.00896v4.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-hierarchical-attention-networks |
Repo | https://github.com/idiap/gile |
Framework | none |
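A minimal sketch of the shared-encoder/shared-attention design, assuming PyTorch and illustrative dimensions: both GRU levels and the attention scorers are shared across languages, while each language keeps its own classifier head over its disjoint label set.

```python
import torch
import torch.nn as nn

class MultilingualHAN(nn.Module):
    """Sketch of a hierarchical attention network whose encoders and
    attention are shared across languages, with one classifier head per
    language (disjoint label sets). Dimensions are illustrative."""
    def __init__(self, emb_dim=300, hid=100, labels_per_lang={'en': 500, 'de': 300}):
        super().__init__()
        # shared across languages: inputs live in an aligned semantic space
        self.word_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.sent_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)
        self.word_att = nn.Linear(2 * hid, 1)
        self.sent_att = nn.Linear(2 * hid, 1)
        # language-specific classifier heads
        self.heads = nn.ModuleDict({l: nn.Linear(2 * hid, n)
                                    for l, n in labels_per_lang.items()})

    def attend(self, h, scorer):
        w = torch.softmax(scorer(h), dim=1)   # attention weights
        return (w * h).sum(dim=1)             # attention-weighted sum

    def forward(self, doc, lang):             # doc: (sents, words, emb_dim)
        h, _ = self.word_gru(doc)
        sents = self.attend(h, self.word_att).unsqueeze(0)
        h, _ = self.sent_gru(sents)
        return self.heads[lang](self.attend(h, self.sent_att))

model = MultilingualHAN()
logits = model(torch.randn(4, 20, 300), lang='en')  # 4 sentences, 20 words each
```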
Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections
Title | Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections |
Authors | Tomas Wilkinson, Jonas Lindström, Anders Brun |
Abstract | In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals, and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is to help historians find and count specific words in court records that are related to women’s sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task. |
Tasks | |
Published | 2017-03-22 |
URL | http://arxiv.org/abs/1703.07645v2 |
http://arxiv.org/pdf/1703.07645v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-ctrl-f-segmentation-free-query-by |
Repo | https://github.com/tomfalainen/neural-ctrlf |
Framework | torch |
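Once Ctrl-F-Net has embedded the region proposals, searching reduces to nearest-neighbour ranking in the embedding space. The sketch below illustrates that search step with placeholder embeddings; the real model produces them from the page image and the query string.

```python
import numpy as np

def search(query_emb, proposal_embs, top_k=10):
    """Rank region proposals by cosine similarity to an embedded query
    string, as in the word-spotting search step. Embeddings here are
    random placeholders; Ctrl-F-Net produces them from the page image
    and the query string."""
    q = query_emb / np.linalg.norm(query_emb)
    p = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    scores = p @ q                        # cosine similarity to the query
    order = np.argsort(-scores)[:top_k]   # best-matching proposals first
    return order, scores[order]

proposals = np.random.randn(5000, 128)    # one embedding per region proposal
query = np.random.randn(128)              # embedded query string
idx, sim = search(query, proposals)
```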
On the Automatic Generation of Medical Imaging Reports
Title | On the Automatic Generation of Medical Imaging Reports |
Authors | Baoyu Jing, Pengtao Xie, Eric Xing |
Abstract | Medical imaging is widely used in clinical practice for diagnosis and treatment. Report-writing can be error-prone for inexperienced physicians, and time-consuming and tedious for experienced physicians. To address these issues, we study the automatic generation of medical imaging reports. This task presents several challenges. First, a complete report contains multiple heterogeneous forms of information, including findings and tags. Second, abnormal regions in medical images are difficult to identify. Third, the reports are typically long, containing multiple sentences. To cope with these challenges, we (1) build a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, (2) propose a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and (3) develop a hierarchical LSTM model to generate long paragraphs. We demonstrate the effectiveness of the proposed methods on two publicly available datasets. |
Tasks | Medical Report Generation, Multi-Task Learning |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08195v3 |
http://arxiv.org/pdf/1711.08195v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-automatic-generation-of-medical |
Repo | https://github.com/ZexinYan/Medical-Report-Generation |
Framework | pytorch |
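The hierarchical LSTM can be sketched as a two-level decoder: a sentence-level LSTM emits one topic vector per sentence, and a word-level LSTM decodes words from each topic. The version below is a simplified PyTorch illustration that omits the co-attention mechanism and uses assumed dimensions.

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Sketch of a hierarchical LSTM for long-paragraph generation: a
    sentence-level LSTM emits one topic vector per sentence, and a
    word-level LSTM decodes each sentence from its topic. Co-attention
    over visual/tag features is omitted; sizes are illustrative."""
    def __init__(self, ctx_dim=512, hid=512, vocab=1000):
        super().__init__()
        self.sent_lstm = nn.LSTMCell(ctx_dim, hid)   # one step per sentence
        self.word_lstm = nn.LSTMCell(hid, hid)       # one step per word
        self.word_out = nn.Linear(hid, vocab)

    def forward(self, context, n_sents=3, n_words=8):
        h_s = c_s = torch.zeros(context.size(0), 512)
        paragraph = []
        for _ in range(n_sents):
            h_s, c_s = self.sent_lstm(context, (h_s, c_s))  # topic vector h_s
            h_w = c_w = torch.zeros_like(h_s)
            words = []
            for _ in range(n_words):
                h_w, c_w = self.word_lstm(h_s, (h_w, c_w))
                words.append(self.word_out(h_w).argmax(dim=-1))
            paragraph.append(torch.stack(words, dim=1))
        return paragraph  # list of (batch, n_words) token-id tensors

report = HierarchicalDecoder()(torch.randn(1, 512))
```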
PyPhi: A toolbox for integrated information theory
Title | PyPhi: A toolbox for integrated information theory |
Authors | William G. P. Mayner, William Marshall, Larissa Albantakis, Graham Findlay, Robert Marchman, Giulio Tononi |
Abstract | Integrated information theory provides a mathematical framework to fully characterize the cause-effect structure of a physical system. Here, we introduce PyPhi, a Python software package that implements this framework for causal analysis and unfolds the full cause-effect structure of discrete dynamical systems of binary elements. The software allows users to easily study these structures, serves as an up-to-date reference implementation of the formalisms of integrated information theory, and has been applied in research on complexity, emergence, and certain biological questions. We first provide an overview of the main algorithm and demonstrate PyPhi’s functionality in the course of analyzing an example system, and then describe details of the algorithm’s design and implementation. PyPhi can be installed with Python’s package manager via the command ‘pip install pyphi’ on Linux and macOS systems equipped with Python 3.4 or higher. PyPhi is open-source and licensed under the GPLv3; the source code is hosted on GitHub at https://github.com/wmayner/pyphi. Comprehensive and continually updated documentation is available at https://pyphi.readthedocs.io/. The pyphi-users mailing list can be joined at https://groups.google.com/forum/#!forum/pyphi-users. A web-based graphical interface to the software is available at http://integratedinformationtheory.org/calculate.html. |
Tasks | |
Published | 2017-12-27 |
URL | http://arxiv.org/abs/1712.09644v3 |
http://arxiv.org/pdf/1712.09644v3.pdf | |
PWC | https://paperswithcode.com/paper/pyphi-a-toolbox-for-integrated-information |
Repo | https://github.com/wmayner/pyphi |
Framework | none |
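A minimal usage example based on PyPhi's documented interface (the TPM is the three-node example from the documentation; exact function names may vary between versions):

```python
import numpy as np
import pyphi

# State-by-node TPM of a 3-node binary network (2^3 rows, 3 columns),
# taken from PyPhi's documentation example.
tpm = np.array([[0, 0, 0],
                [0, 0, 1],
                [1, 0, 1],
                [1, 0, 0],
                [1, 1, 0],
                [1, 1, 1],
                [1, 1, 1],
                [1, 1, 0]])

network = pyphi.Network(tpm, node_labels=('A', 'B', 'C'))
state = (1, 0, 0)                      # current state of the three elements
subsystem = pyphi.Subsystem(network, state, (0, 1, 2))
print(pyphi.compute.phi(subsystem))    # integrated information of the subsystem
```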
Estimating speech from lip dynamics
Title | Estimating speech from lip dynamics |
Authors | Jithin Donny George, Ronan Keane, Conor Zellmer |
Abstract | The goal of this project is to develop a limited lip reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lips into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying based on the sequences of classified phonemes and visemes. The GRID audiovisual sentence corpus [10][11] is used for our study. |
Tasks | |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.01198v1 |
http://arxiv.org/pdf/1708.01198v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-speech-from-lip-dynamics |
Repo | https://github.com/Dirivian/dynamic_lips |
Framework | none |
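The HMM decoding step can be illustrated with a plain Viterbi implementation over classified viseme observations. Everything below (state count, probabilities) is an illustrative placeholder rather than the project's trained model:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state sequence given a sequence of classified
    viseme/phoneme observations, via the Viterbi algorithm."""
    n_states = trans_p.shape[0]
    T = len(obs)
    logp = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    logp[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logp[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(scores)
            logp[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # trace back the best path from the final time step
    path = [int(np.argmax(logp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

trans = np.array([[0.7, 0.3], [0.4, 0.6]])            # 2 hidden states
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # 3 viseme classes
print(viterbi([0, 2, 1, 2], np.array([0.6, 0.4]), trans, emit))
```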
Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Title | Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs |
Authors | Yi Tay, Luu Anh Tuan, Minh C. Phan, Siu Cheung Hui |
Abstract | Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes such as height, price or population count are able to richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the inherent sparsity and incompleteness problems that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in inherently binary-natured knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and prediction of non-discrete attribute information in a relational setting. Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations and attributes that encode information about both tasks. Moreover, such attributes are central to many predictive tasks, not only as an information source but also as a prediction target. Therefore, models that can encode, incorporate and predict such information in a relational learning context are highly attractive. We show that our approach outperforms many state-of-the-art methods on the tasks of relational triplet classification and attribute value prediction. |
Tasks | Knowledge Graphs, Multi-Task Learning, Relational Reasoning |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04828v1 |
http://arxiv.org/pdf/1708.04828v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-neural-network-for-non-discrete |
Repo | https://github.com/BaeSeulki/WhySoMuch |
Framework | none |
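A sketch of the multi-task setup, assuming PyTorch, a TransE-style triplet score and an MLP regressor as stand-ins for the paper's exact architecture: the two tasks share entity embeddings, so a joint loss shapes representations for both.

```python
import torch
import torch.nn as nn

class MultiTaskKG(nn.Module):
    """Sketch of the two-task setup: a triplet-scoring network and an
    attribute-value regressor that share entity embeddings, trained
    jointly. Scoring function and sizes are illustrative stand-ins."""
    def __init__(self, n_entities=1000, n_relations=50, n_attrs=10, dim=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)     # shared by both tasks
        self.rel = nn.Embedding(n_relations, dim)
        self.attr = nn.Embedding(n_attrs, dim)
        self.reg = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def triplet_score(self, h, r, t):
        # TransE-style plausibility: smaller distance = more plausible
        return -(self.ent(h) + self.rel(r) - self.ent(t)).norm(dim=-1)

    def attr_value(self, e, a):
        # regress a non-discrete attribute value (e.g. height, population)
        x = torch.cat([self.ent(e), self.attr(a)], dim=-1)
        return self.reg(x).squeeze(-1)

m = MultiTaskKG()
h, r, t = torch.tensor([3]), torch.tensor([7]), torch.tensor([42])
loss = -m.triplet_score(h, r, t).mean() + \
       (m.attr_value(h, torch.tensor([2])) - torch.tensor([1.75])).pow(2).mean()
loss.backward()  # the multi-task gradient updates the shared embeddings
```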
Order-Free RNN with Visual Attention for Multi-Label Classification
Title | Order-Free RNN with Visual Attention for Multi-Label Classification |
Authors | Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang |
Abstract | In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model. |
Tasks | Image Captioning, Multi-Label Classification |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05495v3 |
http://arxiv.org/pdf/1707.05495v3.pdf | |
PWC | https://paperswithcode.com/paper/order-free-rnn-with-visual-attention-for |
Repo | https://github.com/EricYangsw/Multi-Label-Classification |
Framework | tf |
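The order-free decoding loop can be sketched as an LSTM that emits one label per step, conditioned on image features and the previously emitted label, until a stop label. The version below decodes greedily for brevity, whereas the paper uses beam search; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class LabelDecoder(nn.Module):
    """Sketch of order-free multi-label prediction: an LSTM emits one
    label per step, conditioned on image features, until a stop label.
    Attention and beam search are simplified away here."""
    def __init__(self, feat_dim=512, n_labels=80, hid=256):
        super().__init__()
        self.stop = n_labels                       # index of the stop label
        self.label_emb = nn.Embedding(n_labels + 1, hid)
        self.cell = nn.LSTMCell(feat_dim + hid, hid)
        self.out = nn.Linear(hid, n_labels + 1)

    def forward(self, feat, max_steps=10):
        h = c = torch.zeros(feat.size(0), 256)
        prev = torch.full((feat.size(0),), self.stop, dtype=torch.long)
        labels = []
        for _ in range(max_steps):
            x = torch.cat([feat, self.label_emb(prev)], dim=-1)
            h, c = self.cell(x, (h, c))
            prev = self.out(h).argmax(dim=-1)      # greedy; the paper uses beam search
            if (prev == self.stop).all():
                break
            labels.append(prev)
        return labels

preds = LabelDecoder()(torch.randn(1, 512))
```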
Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection
Title | Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection |
Authors | Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn |
Abstract | We present a novel approach for vanishing point detection from uncalibrated monocular images. In contrast to the state of the art, we make no a priori assumptions about the observed scene. Our method is based on a convolutional neural network (CNN) which does not use natural images, but a Gaussian sphere representation arising from an inverse gnomonic projection of lines detected in an image. This allows us to rely on synthetic data for training, eliminating the need for labelled images. Our method achieves competitive performance on three horizon estimation benchmark datasets. We further highlight some additional use cases for our vanishing point detection algorithm. |
Tasks | Horizon Line Estimation |
Published | 2017-07-08 |
URL | http://arxiv.org/abs/1707.02427v2 |
http://arxiv.org/pdf/1707.02427v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-vanishing-point-detection |
Repo | https://github.com/fkluger/Vanishing_Points_GCPR17 |
Framework | none |
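The geometric idea behind the input representation can be shown in a few lines: under a pinhole model with the image plane at z = f, an image line corresponds to a great circle on the Gaussian sphere, and two lines' common direction is the cross product of their plane normals. This is a simplified illustration of the projection, not the paper's full pipeline.

```python
import numpy as np

def line_to_sphere_normal(a, b, c, f=1.0):
    """Map an image line a*x + b*y + c = 0 to the unit normal of its
    interpretation plane on the Gaussian sphere (image plane at z = f)."""
    n = np.array([a, b, c / f])
    return n / np.linalg.norm(n)

def vanishing_direction(line1, line2):
    """Common direction of two lines: the intersection of their great
    circles, i.e. the cross product of the plane normals (up to sign)."""
    v = np.cross(line_to_sphere_normal(*line1), line_to_sphere_normal(*line2))
    return v / np.linalg.norm(v)

# two image lines meeting at image point (1, 2):  y = 2  and  x = 1
print(vanishing_direction((0, 1, -2), (1, 0, -1)))  # ~ ±(1, 2, 1)/sqrt(6)
```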
SLING: A framework for frame semantic parsing
Title | SLING: A framework for frame semantic parsing |
Authors | Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira |
Abstract | We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and it is available for download on GitHub. |
Tasks | Semantic Parsing |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07032v1 |
http://arxiv.org/pdf/1710.07032v1.pdf | |
PWC | https://paperswithcode.com/paper/sling-a-framework-for-frame-semantic-parsing |
Repo | https://github.com/google/sling |
Framework | none |
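A conceptual sketch of a transition-based frame parser of this kind: the decoder emits actions such as EVOKE and CONNECT that build the frame graph directly, with no intermediate symbolic representation. This is illustrative pseudo-machinery in Python, not SLING's actual C++/Python API; the toy `next_action` stands in for the TBRU decoder.

```python
def run_transitions(tokens, next_action):
    """Apply predicted transitions to build a frame graph.
    `next_action` stands in for the TBRU decoder's prediction step."""
    frames, attention = [], []          # attention = recently evoked frames
    for i, tok in enumerate(tokens):
        act = next_action(tok, attention)
        if act[0] == 'EVOKE':           # ('EVOKE', frame_type)
            frame = {'type': act[1], 'span': (i, i + 1), 'slots': {}}
            frames.append(frame)
            attention.insert(0, frame)
        elif act[0] == 'CONNECT':       # ('CONNECT', src_idx, role, tgt_idx)
            attention[act[1]]['slots'][act[2]] = attention[act[3]]
    return frames

# toy decoder: evoke a frame for capitalized tokens, otherwise do nothing
toy = lambda tok, att: ('EVOKE', 'entity') if tok[0].isupper() else ('SKIP',)
print(run_transitions('Alice met Bob'.split(), toy))
```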
Class-Splitting Generative Adversarial Networks
Title | Class-Splitting Generative Adversarial Networks |
Authors | Guillermo L. Grinblat, Lucas C. Uzal, Pablo M. Granitto |
Abstract | Generative Adversarial Networks (GANs) produce systematically better quality samples when class label information is provided, i.e. in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation, which stabilizes adversarial training and allows the use of high-capacity network architectures such as ResNet. In this work we show how to boost conditional GANs by augmenting the available class labels. The new classes come from clustering in the representation space learned by the same GAN model. The proposed strategy is also feasible when no class information is available, i.e. in the unsupervised setup. Our generated samples reach state-of-the-art Inception scores on the CIFAR-10 and STL-10 datasets in both the supervised and unsupervised setups. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07359v2 |
http://arxiv.org/pdf/1709.07359v2.pdf | |
PWC | https://paperswithcode.com/paper/class-splitting-generative-adversarial |
Repo | https://github.com/CIFASIS/splitting_gan |
Framework | tf |
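The class-splitting step itself is straightforward: cluster the samples of each class in a learned representation space and relabel accordingly. The sketch below uses scikit-learn's k-means on placeholder features; in the paper the representation comes from the GAN model itself.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_classes(features, labels, splits_per_class=2):
    """Augment class labels by clustering the samples of each class in a
    learned representation space (here: placeholder features). The new
    label is class * splits_per_class + cluster index."""
    new_labels = np.empty_like(labels)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        km = KMeans(n_clusters=splits_per_class, n_init=10).fit(features[idx])
        new_labels[idx] = c * splits_per_class + km.labels_
    return new_labels  # feed these to the conditional GAN instead of `labels`

feats = np.random.randn(1000, 64)        # e.g. features from the GAN
labels = np.random.randint(0, 10, 1000)  # original CIFAR-10 labels
print(np.unique(split_classes(feats, labels)))  # 20 augmented classes
```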
Deep Learning for IoT Big Data and Streaming Analytics: A Survey
Title | Deep Learning for IoT Big Data and Streaming Analytics: A Survey |
Authors | Mehdi Mohammadi, Ala Al-Fuqaha, Sameh Sorour, Mohsen Guizani |
Abstract | In the era of the Internet of Things (IoT), an enormous number of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Depending on the nature of the application, these devices produce big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life-improving technology. In this paper, we provide a thorough overview of using a class of advanced machine learning techniques, namely Deep Learning (DL), to facilitate analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics is then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that leveraged DL in the IoT domain. Smart IoT devices that have incorporated DL in their intelligence background are also discussed. DL implementation approaches on fog and cloud centers in support of IoT applications are also surveyed. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and review of the recent literature. |
Tasks | |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.04301v2 |
http://arxiv.org/pdf/1712.04301v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-iot-big-data-and-streaming |
Repo | https://github.com/avinashbarnwal/Deep-Learning |
Framework | pytorch |
Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network
Title | Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network |
Authors | Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu, Sen Song |
Abstract | Automatically diagnosing lung cancer from Computed Tomography (CT) scans involves two steps: detecting all suspicious lesions (pulmonary nodules) and evaluating the whole-lung/pulmonary malignancy. Currently, there are many studies about the first step, but few about the second. Since the existence of a nodule does not definitively indicate cancer, and the morphology of a nodule has a complicated relationship with cancer, the diagnosis of lung cancer demands careful investigation of every suspicious nodule and integration of information across all nodules. We propose a 3D deep neural network to solve this problem. The model consists of two modules. The first is a 3D region proposal network for nodule detection, which outputs all suspicious nodules for a subject. The second selects the top five nodules based on detection confidence, evaluates their cancer probabilities and combines them with a leaky noisy-or gate to obtain the probability of lung cancer for the subject. The two modules share the same backbone network, a modified U-net. The over-fitting caused by the shortage of training data is alleviated by training the two modules alternately. The proposed model won first place in the Data Science Bowl 2017 competition. The code has been made publicly available. |
Tasks | Computed Tomography (CT) |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08324v1 |
http://arxiv.org/pdf/1711.08324v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluate-the-malignancy-of-pulmonary-nodules |
Repo | https://github.com/Hydron063/Rage |
Framework | pytorch |
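The leaky noisy-or gate has a closed form worth spelling out: the subject is cancer-free only if no nodule causes cancer and a leak term (covering nodules the detector missed) does not fire. A small sketch with placeholder probabilities:

```python
import numpy as np

def leaky_noisy_or(nodule_probs, leak=0.01):
    """Combine per-nodule cancer probabilities into a subject-level
    probability with a leaky noisy-or gate. The leak value here is a
    placeholder, not the trained parameter."""
    p_not = (1 - leak) * np.prod(1 - np.asarray(nodule_probs))
    return 1 - p_not

# top-5 nodules ranked by detection confidence
print(leaky_noisy_or([0.30, 0.12, 0.05, 0.02, 0.01]))  # ~0.44
```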
Hit Song Prediction for Pop Music by Siamese CNN with Ranking Loss
Title | Hit Song Prediction for Pop Music by Siamese CNN with Ranking Loss |
Authors | Lang-Chi Yu, Yi-Hsuan Yang, Yun-Ning Hung, Yi-An Chen |
Abstract | A model for hit song prediction can be used in the pop music industry to identify emerging trends and potential artists or songs before they are marketed to the public. While most previous work formulates hit song prediction as a regression or classification problem, we present in this paper a convolutional neural network (CNN) model that treats it as a ranking problem. Specifically, we use a commercial dataset with daily play-counts to train a multi-objective Siamese CNN model with Euclidean loss and pairwise ranking loss to learn from audio the relative ranking relations among songs. In addition, we devise a number of pair sampling methods based on empirical observations of the data. Our experiments show that the proposed model with a sampling method called A/B sampling leads to much higher accuracy in hit song prediction than the baseline regression model. Moreover, we can further improve the accuracy by using a neural attention mechanism to extract the highlights of songs and by using a separate CNN model to offer high-level features of songs. |
Tasks | |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10814v1 |
http://arxiv.org/pdf/1710.10814v1.pdf | |
PWC | https://paperswithcode.com/paper/hit-song-prediction-for-pop-music-by-siamese |
Repo | https://github.com/OckhamsRazor/HSP_CNN |
Framework | none |
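A sketch of the multi-objective loss, assuming PyTorch: a shared encoder (here a stand-in MLP rather than the Siamese CNN) scores both songs of a pair, a margin ranking loss enforces the play-count ordering, and a Euclidean term anchors the absolute scores. Weights and shapes are illustrative.

```python
import torch
import torch.nn as nn

# stand-in for the Siamese CNN: both songs go through the same encoder
encode = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

def hit_loss(feat_a, feat_b, count_a, count_b, alpha=0.5):
    """Pairwise ranking loss plus Euclidean (regression) loss, sketching
    the multi-objective training signal described in the abstract."""
    score_a, score_b = encode(feat_a), encode(feat_b)
    target = (count_a > count_b).float() * 2 - 1    # which song ranks higher
    rank = nn.functional.margin_ranking_loss(score_a, score_b, target, margin=1.0)
    mse = (score_a - count_a).pow(2).mean() + (score_b - count_b).pow(2).mean()
    return rank + alpha * mse

a, b = torch.randn(8, 128), torch.randn(8, 128)     # audio features of song pairs
loss = hit_loss(a, b, torch.rand(8, 1), torch.rand(8, 1))
loss.backward()
```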