July 29, 2019


Paper Group AWR 130


Zoom Out-and-In Network with Recursive Training for Object Proposal


Title	Zoom Out-and-In Network with Recursive Training for Object Proposal
Authors	Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang
Abstract	In this paper, we propose a zoom-out-and-in network for generating object proposals. We utilize different resolutions of feature maps in the network to detect object instances of various sizes. Specifically, we divide the anchor candidates into three clusters based on the scale size and place them on feature maps of distinct strides to detect small, medium and large objects, respectively. Deeper feature maps contain region-level semantics which can help shallow counterparts to identify small objects. Therefore we design a zoom-in sub-network to increase the resolution of high-level features via a deconvolution operation. The high-level features with high resolution are then combined and merged with low-level features to detect objects. Furthermore, we devise a recursive training pipeline to consecutively regress region proposals at the training stage in order to match the iterative regression at the testing stage. We demonstrate the effectiveness of the proposed method on ILSVRC DET and MS COCO datasets, where our algorithm performs better than the state of the art in various evaluation metrics. It also increases average precision by around 2% in the detection system.
Tasks
Published	2017-02-19
URL	http://arxiv.org/abs/1702.05711v1
PDF	http://arxiv.org/pdf/1702.05711v1.pdf
PWC	https://paperswithcode.com/paper/zoom-out-and-in-network-with-recursive
Repo	https://github.com/hli2020/zoom_network
Framework	none
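The abstract's idea of splitting anchors into three scale clusters, each placed on a feature map with its own stride, can be sketched as below. This is only an illustration: the thresholds, stride values, and the names `LEVELS`, `assign_anchor` and `group_anchors` are hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical scale limits (in pixels) and strides; the abstract does not
# state the values the authors actually use.
LEVELS = [
    ("small", 64.0, 8),
    ("medium", 256.0, 16),
    ("large", float("inf"), 32),
]


def assign_anchor(scale):
    """Return (cluster name, feature-map stride) for an anchor of the given scale."""
    for name, max_scale, stride in LEVELS:
        if scale <= max_scale:
            return name, stride
    # Reached only when the comparison fails for every level (e.g. NaN input).
    raise ValueError("scale must be a real number")


def group_anchors(scales):
    """Group a list of anchor scales by cluster name, preserving input order."""
    groups = {name: [] for name, _, _ in LEVELS}
    for scale in scales:
        name, _ = assign_anchor(scale)
        groups[name].append(scale)
    return groups
```

Small anchors land on the finest (smallest-stride) map in this sketch, which mirrors the abstract's point that small objects need higher-resolution features.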

Structured Attentions for Visual Question Answering


Title	Structured Attentions for Visual Question Answering
Authors	Chen Zhu, Yanpeng Zhao, Shuaiyi Huang, Kewei Tu, Yi Ma
Abstract	Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although the questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on 3 datasets, in which it surpasses the best baseline model of the newly released CLEVR dataset by 9.5%, and the best published model on the VQA dataset by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
Tasks	Visual Question Answering
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02071v1
PDF	http://arxiv.org/pdf/1708.02071v1.pdf
PWC	https://paperswithcode.com/paper/structured-attentions-for-visual-question
Repo	https://github.com/zhuchen03/vqa-sva
Framework	pytorch

Multilingual Hierarchical Attention Networks for Document Classification


Title	Multilingual Hierarchical Attention Networks for Document Classification
Authors	Nikolaos Pappas, Andrei Popescu-Belis
Abstract	Hierarchical attention networks have recently achieved remarkable performance for document classification in a given language. However, when multilingual document collections are considered, training such models separately for each language entails linear parameter growth and lack of cross-language transfer. Learning a single multilingual model with fewer parameters is therefore a challenging but potentially beneficial objective. To this end, we propose multilingual hierarchical attention networks for learning document structures, with shared encoders and/or shared attention mechanisms across languages, using multi-task learning and an aligned semantic space as input. We evaluate the proposed models on multilingual document classification with disjoint label sets, on a large dataset which we provide, with 600k news documents in 8 languages, and 5k labels. The multilingual models outperform monolingual ones in low-resource as well as full-resource settings, and use fewer parameters, thus confirming their computational efficiency and the utility of cross-language transfer.
Tasks	Document Classification, Multi-Task Learning, Transfer Learning
Published	2017-07-04
URL	http://arxiv.org/abs/1707.00896v4
PDF	http://arxiv.org/pdf/1707.00896v4.pdf
PWC	https://paperswithcode.com/paper/multilingual-hierarchical-attention-networks
Repo	https://github.com/idiap/gile
Framework	none

Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections


Title	Neural Ctrl-F: Segmentation-free Query-by-String Word Spotting in Handwritten Manuscript Collections
Authors	Tomas Wilkinson, Jonas Lindström, Anders Brun
Abstract	In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals, and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is to help historians to find and count specific words in court records that are related to women’s sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task.
Tasks
Published	2017-03-22
URL	http://arxiv.org/abs/1703.07645v2
PDF	http://arxiv.org/pdf/1703.07645v2.pdf
PWC	https://paperswithcode.com/paper/neural-ctrl-f-segmentation-free-query-by
Repo	https://github.com/tomfalainen/neural-ctrlf
Framework	torch

On the Automatic Generation of Medical Imaging Reports


Title	On the Automatic Generation of Medical Imaging Reports
Authors	Baoyu Jing, Pengtao Xie, Eric Xing
Abstract	Medical imaging is widely used in clinical practice for diagnosis and treatment. Report-writing can be error-prone for inexperienced physicians, and time-consuming and tedious for experienced physicians. To address these issues, we study the automatic generation of medical imaging reports. This task presents several challenges. First, a complete report contains multiple heterogeneous forms of information, including findings and tags. Second, abnormal regions in medical images are difficult to identify. Third, the reports are typically long, containing multiple sentences. To cope with these challenges, we (1) build a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, (2) propose a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, (3) develop a hierarchical LSTM model to generate long paragraphs. We demonstrate the effectiveness of the proposed methods on two publicly available datasets.
Tasks	Medical Report Generation, Multi-Task Learning
Published	2017-11-22
URL	http://arxiv.org/abs/1711.08195v3
PDF	http://arxiv.org/pdf/1711.08195v3.pdf
PWC	https://paperswithcode.com/paper/on-the-automatic-generation-of-medical
Repo	https://github.com/ZexinYan/Medical-Report-Generation
Framework	pytorch

PyPhi: A toolbox for integrated information theory


Title	PyPhi: A toolbox for integrated information theory
Authors	William G. P. Mayner, William Marshall, Larissa Albantakis, Graham Findlay, Robert Marchman, Giulio Tononi
Abstract	Integrated information theory provides a mathematical framework to fully characterize the cause-effect structure of a physical system. Here, we introduce PyPhi, a Python software package that implements this framework for causal analysis and unfolds the full cause-effect structure of discrete dynamical systems of binary elements. The software allows users to easily study these structures, serves as an up-to-date reference implementation of the formalisms of integrated information theory, and has been applied in research on complexity, emergence, and certain biological questions. We first provide an overview of the main algorithm and demonstrate PyPhi’s functionality in the course of analyzing an example system, and then describe details of the algorithm’s design and implementation. PyPhi can be installed with Python’s package manager via the command ‘pip install pyphi’ on Linux and macOS systems equipped with Python 3.4 or higher. PyPhi is open-source and licensed under the GPLv3; the source code is hosted on GitHub at https://github.com/wmayner/pyphi . Comprehensive and continually-updated documentation is available at https://pyphi.readthedocs.io/ . The pyphi-users mailing list can be joined at https://groups.google.com/forum/#!forum/pyphi-users . A web-based graphical interface to the software is available at http://integratedinformationtheory.org/calculate.html .
Tasks
Published	2017-12-27
URL	http://arxiv.org/abs/1712.09644v3
PDF	http://arxiv.org/pdf/1712.09644v3.pdf
PWC	https://paperswithcode.com/paper/pyphi-a-toolbox-for-integrated-information
Repo	https://github.com/wmayner/pyphi
Framework	none

Estimating speech from lip dynamics


Title	Estimating speech from lip dynamics
Authors	Jithin Donny George, Ronan Keane, Conor Zellmer
Abstract	The goal of this project is to develop a limited lip reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lips into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying based on the sequences of classified phonemes and visemes. The GRID audiovisual sentence corpus [10][11] database is used for our study.
Tasks
Published	2017-08-03
URL	http://arxiv.org/abs/1708.01198v1
PDF	http://arxiv.org/pdf/1708.01198v1.pdf
PWC	https://paperswithcode.com/paper/estimating-speech-from-lip-dynamics
Repo	https://github.com/Dirivian/dynamic_lips
Framework	none

Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs


Title	Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Authors	Yi Tay, Luu Anh Tuan, Minh C. Phan, Siu Cheung Hui
Abstract	Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes such as height, price or population count are able to richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the inherent sparsity and incompleteness problems that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in the inherently binary-natured knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and prediction of non-discrete attribute information in a relational setting. Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations and attributes that encode information about both tasks. Moreover, such attributes are not only central to many predictive tasks as an information source but also as a prediction target. Therefore, models that are able to encode, incorporate and predict such information in a relational learning context are highly attractive as well. We show that our approach outperforms many state-of-the-art methods for the tasks of relational triplet classification and attribute value prediction.
Tasks	Knowledge Graphs, Multi-Task Learning, Relational Reasoning
Published	2017-08-16
URL	http://arxiv.org/abs/1708.04828v1
PDF	http://arxiv.org/pdf/1708.04828v1.pdf
PWC	https://paperswithcode.com/paper/multi-task-neural-network-for-non-discrete
Repo	https://github.com/BaeSeulki/WhySoMuch
Framework	none

Order-Free RNN with Visual Attention for Multi-Label Classification


Title	Order-Free RNN with Visual Attention for Multi-Label Classification
Authors	Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang
Abstract	In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without the prior knowledge of particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
Tasks	Image Captioning, Multi-Label Classification
Published	2017-07-18
URL	http://arxiv.org/abs/1707.05495v3
PDF	http://arxiv.org/pdf/1707.05495v3.pdf
PWC	https://paperswithcode.com/paper/order-free-rnn-with-visual-attention-for
Repo	https://github.com/EricYangsw/Multi-Label-Classification
Framework	tf

Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection


Title	Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection
Authors	Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn
Abstract	We present a novel approach for vanishing point detection from uncalibrated monocular images. In contrast to the state of the art, we make no a priori assumptions about the observed scene. Our method is based on a convolutional neural network (CNN) which does not use natural images, but a Gaussian sphere representation arising from an inverse gnomonic projection of lines detected in an image. This allows us to rely on synthetic data for training, eliminating the need for labelled images. Our method achieves competitive performance on three horizon estimation benchmark datasets. We further highlight some additional use cases for which our vanishing point detection algorithm can be used.
Tasks	Horizon Line Estimation
Published	2017-07-08
URL	http://arxiv.org/abs/1707.02427v2
PDF	http://arxiv.org/pdf/1707.02427v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-vanishing-point-detection
Repo	https://github.com/fkluger/Vanishing_Points_GCPR17
Framework	none
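The inverse gnomonic projection mentioned in the abstract can be illustrated with standard projective geometry: an image point is lifted onto the unit (Gaussian) sphere, and an image line corresponds to a great circle whose plane passes through the sphere's centre. The sketch below is not the authors' code. It assumes image coordinates measured relative to the principal point and a placeholder `focal_length`, and the function names are hypothetical.

```python
import math


def to_gaussian_sphere(x, y, focal_length=1.0):
    """Lift an image point onto the unit sphere centred at the projection centre."""
    norm = math.sqrt(x * x + y * y + focal_length * focal_length)
    return (x / norm, y / norm, focal_length / norm)


def line_to_great_circle_normal(p, q, focal_length=1.0):
    """Unit normal of the plane through the sphere centre and the image line p-q.

    The great circle for the line is the set of unit vectors orthogonal to
    this normal.
    """
    a = (p[0], p[1], focal_length)
    b = (q[0], q[1], focal_length)
    # Cross product of the two homogeneous endpoints.
    n = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    norm = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if norm == 0.0:
        raise ValueError("p and q must be distinct points")
    return (n[0] / norm, n[1] / norm, n[2] / norm)
```

Because every image line becomes a great circle regardless of image content, sphere representations of this kind can be generated from synthetic line sets, which is consistent with the abstract's remark about training without labelled images.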

SLING: A framework for frame semantic parsing


Title	SLING: A framework for frame semantic parsing
Authors	Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira
Abstract	We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and it is available for download on GitHub.
Tasks	Semantic Parsing
Published	2017-10-19
URL	http://arxiv.org/abs/1710.07032v1
PDF	http://arxiv.org/pdf/1710.07032v1.pdf
PWC	https://paperswithcode.com/paper/sling-a-framework-for-frame-semantic-parsing
Repo	https://github.com/google/sling
Framework	none

Class-Splitting Generative Adversarial Networks


Title	Class-Splitting Generative Adversarial Networks
Authors	Guillermo L. Grinblat, Lucas C. Uzal, Pablo M. Granitto
Abstract	Generative Adversarial Networks (GANs) produce systematically better quality samples when class label information is provided, i.e. in the conditional GAN setup. This is still observed for the recently proposed Wasserstein GAN formulation, which stabilizes adversarial training and allows considering high capacity network architectures such as ResNet. In this work we show how to boost conditional GAN by augmenting available class labels. The new classes come from clustering in the representation space learned by the same GAN model. The proposed strategy is also feasible when no class information is available, i.e. in the unsupervised setup. Our generated samples reach state-of-the-art Inception scores for CIFAR-10 and STL-10 datasets in both supervised and unsupervised setup.
Tasks	Conditional Image Generation, Image Generation
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07359v2
PDF	http://arxiv.org/pdf/1709.07359v2.pdf
PWC	https://paperswithcode.com/paper/class-splitting-generative-adversarial
Repo	https://github.com/CIFASIS/splitting_gan
Framework	tf

Deep Learning for IoT Big Data and Streaming Analytics: A Survey


Title	Deep Learning for IoT Big Data and Streaming Analytics: A Survey
Authors	Mehdi Mohammadi, Ala Al-Fuqaha, Sameh Sorour, Mohsen Guizani
Abstract	In the era of the Internet of Things (IoT), an enormous number of sensing devices collect and/or generate various sensory data over time for a wide range of fields and applications. Based on the nature of the application, these devices will result in big or fast/real-time data streams. Applying analytics over such data streams to discover new information, predict future insights, and make control decisions is a crucial process that makes IoT a worthy paradigm for businesses and a quality-of-life improving technology. In this paper, we provide a thorough overview on using a class of advanced machine learning techniques, namely Deep Learning (DL), to facilitate the analytics and learning in the IoT domain. We start by articulating IoT data characteristics and identifying two major treatments for IoT data from a machine learning perspective, namely IoT big data analytics and IoT streaming data analytics. We also discuss why DL is a promising approach to achieve the desired analytics in these types of data and applications. The potential of using emerging DL techniques for IoT data analytics is then discussed, and its promises and challenges are introduced. We present a comprehensive background on different DL architectures and algorithms. We also analyze and summarize major reported research attempts that leveraged DL in the IoT domain. The smart IoT devices that have incorporated DL in their intelligence background are also discussed. DL implementation approaches on the fog and cloud centers in support of IoT applications are also surveyed. Finally, we shed light on some challenges and potential directions for future research. At the end of each section, we highlight the lessons learned based on our experiments and review of the recent literature.
Tasks
Published	2017-12-09
URL	http://arxiv.org/abs/1712.04301v2
PDF	http://arxiv.org/pdf/1712.04301v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-iot-big-data-and-streaming
Repo	https://github.com/avinashbarnwal/Deep-Learning
Framework	pytorch

Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network


Title	Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network
Authors	Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu, Sen Song
Abstract	Automatically diagnosing lung cancer from Computed Tomography (CT) scans involves two steps: detect all suspicious lesions (pulmonary nodules) and evaluate the whole-lung/pulmonary malignancy. Currently, there are many studies about the first step, but few about the second step. Since the existence of a nodule does not definitely indicate cancer, and the morphology of a nodule has a complicated relationship with cancer, the diagnosis of lung cancer demands careful investigation of every suspicious nodule and integration of information from all nodules. We propose a 3D deep neural network to solve this problem. The model consists of two modules. The first one is a 3D region proposal network for nodule detection, which outputs all suspicious nodules for a subject. The second one selects the top five nodules based on the detection confidence, evaluates their cancer probabilities and combines them with a leaky noisy-or gate to obtain the probability of lung cancer for the subject. The two modules share the same backbone network, a modified U-net. The over-fitting caused by the shortage of training data is alleviated by training the two modules alternately. The proposed model won the first place in the Data Science Bowl 2017 competition. The code has been made publicly available.
Tasks	Computed Tomography (CT)
Published	2017-11-22
URL	http://arxiv.org/abs/1711.08324v1
PDF	http://arxiv.org/pdf/1711.08324v1.pdf
PWC	https://paperswithcode.com/paper/evaluate-the-malignancy-of-pulmonary-nodules
Repo	https://github.com/Hydron063/Rage
Framework	pytorch
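The second module described in the abstract (keep the five most confident nodules, then combine their cancer probabilities with a leaky noisy-or gate) can be sketched as follows. The code uses the textbook leaky noisy-or form, in which the subject is cancer-free only if the leak term and every nodule are all "off". The function names and the way `leak_prob` is passed in are assumptions for illustration, not the authors' implementation.

```python
def top_k_by_confidence(nodules, k=5):
    """Keep the k nodules with the highest detection confidence.

    `nodules` is a list of (detection_confidence, cancer_probability) pairs.
    """
    return sorted(nodules, key=lambda nodule: nodule[0], reverse=True)[:k]


def leaky_noisy_or(cancer_probs, leak_prob):
    """Probability of cancer given per-nodule probabilities and a leak term.

    P = 1 - (1 - leak_prob) * prod(1 - p_i)
    """
    p_no_cancer = 1.0 - leak_prob
    for p in cancer_probs:
        p_no_cancer *= 1.0 - p
    return 1.0 - p_no_cancer


def subject_cancer_probability(nodules, leak_prob, k=5):
    """Combine the top-k nodules of one subject into a single probability."""
    selected = top_k_by_confidence(nodules, k)
    return leaky_noisy_or([prob for _, prob in selected], leak_prob)
```

The leak term lets the gate output a non-zero probability even when no detected nodule looks malignant, which covers cancers whose cause was missed by the detector.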

Hit Song Prediction for Pop Music by Siamese CNN with Ranking Loss


Title	Hit Song Prediction for Pop Music by Siamese CNN with Ranking Loss
Authors	Lang-Chi Yu, Yi-Hsuan Yang, Yun-Ning Hung, Yi-An Chen
Abstract	A model for hit song prediction can be used in the pop music industry to identify emerging trends and potential artists or songs before they are marketed to the public. While most previous work formulates hit song prediction as a regression or classification problem, we present in this paper a convolutional neural network (CNN) model that treats it as a ranking problem. Specifically, we use a commercial dataset with daily play-counts to train a multi-objective Siamese CNN model with Euclidean loss and pairwise ranking loss to learn from audio the relative ranking relations among songs. Besides, we devise a number of pair sampling methods according to some empirical observation of the data. Our experiment shows that the proposed model with a sampling method called A/B sampling leads to much higher accuracy in hit song prediction than the baseline regression model. Moreover, we can further improve the accuracy by using a neural attention mechanism to extract the highlights of songs and by using a separate CNN model to offer high-level features of songs.
Tasks
Published	2017-10-30
URL	http://arxiv.org/abs/1710.10814v1
PDF	http://arxiv.org/pdf/1710.10814v1.pdf
PWC	https://paperswithcode.com/paper/hit-song-prediction-for-pop-music-by-siamese
Repo	https://github.com/OckhamsRazor/HSP_CNN
Framework	none
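The multi-objective training described in the abstract pairs a Euclidean (regression) loss with a pairwise ranking loss computed on the two branches of the Siamese network. A minimal sketch on scalar scores is given below. The hinge form of the ranking term, the `margin` and `rank_weight` parameters, and all function names are assumptions, since the abstract does not specify them.

```python
def euclidean_loss(pred, target):
    """Squared-error loss for a single predicted popularity score."""
    return 0.5 * (pred - target) ** 2


def pairwise_ranking_loss(score_a, score_b, margin=1.0):
    """Hinge loss for a pair in which song A should rank above song B.

    The loss is zero once A outscores B by at least `margin`.
    """
    return max(0.0, margin - (score_a - score_b))


def combined_loss(score_a, score_b, target_a, target_b, rank_weight=1.0, margin=1.0):
    """Regression loss on both branches plus a weighted ranking term."""
    regression = euclidean_loss(score_a, target_a) + euclidean_loss(score_b, target_b)
    ranking = pairwise_ranking_loss(score_a, score_b, margin)
    return regression + rank_weight * ranking
```

In this sketch the ranking term penalizes a mis-ordered pair even when each individual score is close to its target, which is the property that distinguishes a ranking formulation from pure regression.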