May 7, 2019

Paper Group ANR 108

An Empirical Study on Academic Commentary and Its Implications on Reading and Writing. Fast Training of Convolutional Neural Networks via Kernel Rescaling. Fusion of EEG and Musical Features in Continuous Music-emotion Recognition. Multi-pretrained Deep Neural Network. Interdependent Scheduling Games. Dense Image Representation with Spatial Pyramid …

An Empirical Study on Academic Commentary and Its Implications on Reading and Writing

Title An Empirical Study on Academic Commentary and Its Implications on Reading and Writing
Authors Tai Wang, Xiangen Hu, Keith Shubeck, Zhiqiang Cai, Jie Tang
Abstract The relationship between reading and writing (RRW) is one of the major themes in learning science. One of its obstacles is that it is difficult to define or measure the latent background knowledge of the individual. However, in an academic research setting, scholars are required to explicitly list their background knowledge in the citation sections of their manuscripts. This unique opportunity was taken advantage of to observe RRW, especially in the published academic commentary scenario. RRW was visualized under a proposed topic process model by using a state of the art version of latent Dirichlet allocation (LDA). The empirical study showed that the academic commentary is modulated both by its target paper and the author’s background knowledge. Although this conclusion was obtained in a unique environment, we suggest its implications can also shed light on other similar interesting areas, such as dialog and conversation, group discussion, and social media.
Published 2016-02-12

Fast Training of Convolutional Neural Networks via Kernel Rescaling

Title Fast Training of Convolutional Neural Networks via Kernel Rescaling
Authors Pedro Porto Buarque de Gusmão, Gianluca Francini, Skjalg Lepsøy, Enrico Magli
Abstract Training deep Convolutional Neural Networks (CNN) is a time consuming task that may take weeks to complete. In this article we propose a novel, theoretically founded method for reducing CNN training time without incurring any loss in accuracy. The basic idea is to begin training with a pre-train network using lower-resolution kernels and input images, and then refine the results at the full resolution by exploiting the spatial scaling property of convolutions. We apply our method to the ImageNet winner OverFeat and to the more recent ResNet architecture and show a reduction in training time of nearly 20% while test set accuracy is preserved in both cases.
Published 2016-10-12

Fusion of EEG and Musical Features in Continuous Music-emotion Recognition

Title Fusion of EEG and Musical Features in Continuous Music-emotion Recognition
Authors Nattapong Thammasan, Ken-ichi Fukui, Masayuki Numao
Abstract Emotion estimation in music listening faces the challenge of capturing the emotion variation of listeners. Recent years have witnessed attempts to exploit multimodality, fusing information from musical content and physiological signals captured from listeners, to improve the performance of emotion recognition. In this paper, we present a study of decision-level fusion of electroencephalogram (EEG) signals, which capture brainwaves at a high temporal resolution, and musical features in recognizing the time-varying binary classes of arousal and valence. Our empirical results showed that the fusion could outperform emotion recognition using only the EEG modality, which suffered from inter-subject variability, suggesting the promise of multimodal fusion in improving the accuracy of music-emotion recognition.
Tasks EEG, Emotion Recognition, Music Emotion Recognition
Published 2016-11-30
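
To make the idea of decision-level fusion concrete, here is a minimal sketch in which each modality has already produced a per-window probability for the positive class (for example, high arousal) and the two are combined by a weighted average. The function name `fuse_decisions`, the weight `w_eeg`, and the sample probabilities are illustrative assumptions, not the fusion rule or data used by the authors.

```python
def fuse_decisions(p_eeg, p_music, w_eeg=0.5):
    """Fuse two per-window positive-class probabilities by weighted averaging.

    p_eeg, p_music: sequences of probabilities, one entry per time window.
    w_eeg: weight given to the EEG classifier (hypothetical parameter).
    Returns the fused probabilities and the resulting binary labels.
    """
    if not 0.0 <= w_eeg <= 1.0:
        raise ValueError("w_eeg must lie in [0, 1]")
    if len(p_eeg) != len(p_music):
        raise ValueError("both modalities must cover the same time windows")
    fused = [w_eeg * a + (1.0 - w_eeg) * b for a, b in zip(p_eeg, p_music)]
    labels = [1 if p >= 0.5 else 0 for p in fused]
    return fused, labels


# Made-up probabilities of "high arousal" for four consecutive windows.
p_eeg = [0.75, 0.25, 0.5, 0.0]
p_music = [0.25, 0.25, 1.0, 1.0]
fused, labels = fuse_decisions(p_eeg, p_music, w_eeg=0.5)
```

Because fusion happens after each classifier has made its decision, the two models can be trained independently, and the weight can be tuned per subject if one modality proves less reliable.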

Multi-pretrained Deep Neural Network

Title Multi-pretrained Deep Neural Network
Authors Zhen Hu, Zhuyin Xue, Tong Cui, Shiqiang Zong, Chenglong He
Abstract Pretraining is widely used in deep neural networks, and one of the most famous pretraining models is the Deep Belief Network (DBN). The optimization formulas differ across pretraining models. In this paper, we pretrained a deep neural network with different pretraining models and investigated the difference between DBN and the Stacked Denoising Autoencoder (SDA) when used as pretraining models. The experimental results show that DBN yields a better initial model. However, the model converges to a relatively worse model after the finetuning process. Yet after being pretrained by SDA a second time, the model converges to a better model when finetuned.
Tasks Denoising
Published 2016-06-02

Interdependent Scheduling Games

Title Interdependent Scheduling Games
Authors Andres Abeliuk, Haris Aziz, Gerardo Berbeglia, Serge Gaspers, Petr Kalina, Nicholas Mattei, Dominik Peters, Paul Stursberg, Pascal Van Hentenryck, Toby Walsh
Abstract We propose a model of interdependent scheduling games in which each player controls a set of services that they schedule independently. A player is free to schedule his own services at any time; however, each of these services only begins to accrue reward for the player when all predecessor services, which may or may not be controlled by the same player, have been activated. This model, where players have interdependent services, is motivated by the problems faced in planning and coordinating large-scale infrastructures, e.g., restoring electricity and gas to residents after a natural disaster or providing medical care in a crisis when different agencies are responsible for the delivery of staff, equipment, and medicine. We undertake a game-theoretic analysis of this setting and in particular consider the issues of welfare maximization, computing best responses, Nash dynamics, and existence and computation of Nash equilibria.
Published 2016-05-31
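
The dependency rule in the abstract above can be sketched as follows: a service starts accruing reward only once it and all of its predecessors have been activated. The reward model (a constant rate per time unit up to a fixed horizon), the function names, and the example data are assumptions made for illustration; the abstract does not specify them.

```python
def accrual_start_times(activation, predecessors):
    """Time at which each service begins to accrue reward.

    activation: dict mapping service -> time at which its owner activates it.
    predecessors: dict mapping service -> iterable of ALL services that must
        be active before it (assumed to be given in full, not just direct ones).
    """
    return {
        s: max([t] + [activation[p] for p in predecessors.get(s, ())])
        for s, t in activation.items()
    }


def player_rewards(activation, predecessors, owner, horizon, rate=1.0):
    """Total reward per player under the assumed constant-rate reward model."""
    rewards = {}
    for s, start in accrual_start_times(activation, predecessors).items():
        gained = rate * max(0, horizon - start)
        rewards[owner[s]] = rewards.get(owner[s], 0.0) + gained
    return rewards


# Hypothetical instance: player P1 controls a and c, player P2 controls b.
activation = {"a": 1, "b": 3, "c": 0}
predecessors = {"b": ["a"], "c": ["a", "b"]}
owner = {"a": "P1", "b": "P2", "c": "P1"}
starts = accrual_start_times(activation, predecessors)
rewards = player_rewards(activation, predecessors, owner, horizon=5)
```

In this instance P1 activates `c` at time 0, but it earns nothing until P2 activates `b` at time 3, which is the kind of interdependence the game-theoretic analysis is concerned with.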

Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning

Title Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
Authors Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada
Abstract The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de-facto standard for image captioning task. However, since CNN features are originally designed for classification task, it is mostly concerned with the main conspicuous element of the image, and often fails to correctly convey information on local, secondary elements. We propose to incorporate coding with vector of locally aggregated descriptors (VLAD) on spatial pyramid for CNN features of sub-regions in order to generate image representations that better reflect the local information of the images. Our results show that our method of compact VLAD coding can match CNN features with as little as 3% of dimensionality and, when combined with spatial pyramid, it results in image captions that more accurately take local elements into account.
Tasks Image Captioning
Published 2016-03-30
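
As a rough illustration of VLAD coding on a spatial pyramid, the sketch below assigns each local descriptor to its nearest codebook centroid, sums the residuals per centroid, and L2-normalizes the flattened result; the pyramid step simply concatenates the encodings of several sub-regions. The codebook, descriptors, and function names are hypothetical, and this is the generic VLAD procedure rather than the authors' exact pipeline (which also involves CNN features and dimensionality reduction).

```python
import math


def vlad_encode(descriptors, centroids):
    """Encode a set of d-dimensional descriptors as a k*d VLAD vector."""
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean).
        nearest = min(
            range(k),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[j])),
        )
        # Accumulate the residual between descriptor and centroid.
        for i in range(d):
            residual_sums[nearest][i] += x[i] - centroids[nearest][i]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def spatial_pyramid_vlad(regions, centroids):
    """Concatenate the VLAD vectors of several sub-regions of one image."""
    encoded = []
    for region_descriptors in regions:
        encoded.extend(vlad_encode(region_descriptors, centroids))
    return encoded


# Toy codebook with two centroids and three 2-D descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [2.0, 0.0], [10.0, 14.0]]
code = vlad_encode(descriptors, centroids)
pyramid_code = spatial_pyramid_vlad([descriptors, descriptors], centroids)
```

Here the residuals sum to (3, 0) for the first centroid and (0, 4) for the second, so the normalized code is approximately (0.6, 0, 0, 0.8).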

A New Theoretical and Technological System of Imprecise-Information Processing

Title A New Theoretical and Technological System of Imprecise-Information Processing
Authors Shiyou Lian
Abstract Imprecise-information processing will play an indispensable role in intelligent systems, especially in anthropomorphic intelligent systems (such as intelligent robots). A new theoretical and technological system of imprecise-information processing, which is different from fuzzy technology, has been founded in Principles of Imprecise-Information Processing: A New Theoretical and Technological System[1]. The system has a clear hierarchy and rigorous structure, which results from the formation principle of imprecise information; it has solid mathematical and logical bases and many advantages beyond fuzzy technology. The system provides a technological platform for relevant applications and lays a theoretical foundation for further research.
Published 2016-10-10

Deep Outdoor Illumination Estimation

Title Deep Outdoor Illumination Estimation
Authors Yannick Hold-Geoffroy, Kalyan Sunkavalli, Sunil Hadap, Emiliano Gambaretto, Jean-François Lalonde
Abstract We present a CNN-based technique to estimate high-dynamic range outdoor illumination from a single low dynamic range image. To train the CNN, we leverage a large dataset of outdoor panoramas. We fit a low-dimensional physically-based outdoor illumination model to the skies in these panoramas giving us a compact set of parameters (including sun position, atmospheric conditions, and camera parameters). We extract limited field-of-view images from the panoramas, and train a CNN with this large set of input image–output lighting parameter pairs. Given a test image, this network can be used to infer illumination parameters that can, in turn, be used to reconstruct an outdoor illumination environment map. We demonstrate that our approach allows the recovery of plausible illumination conditions and enables photorealistic virtual object insertion from a single image. An extensive evaluation on both the panorama dataset and captured HDR environment maps shows that our technique significantly outperforms previous solutions to this problem.
Tasks Outdoor Light Source Estimation
Published 2016-11-19

Demographical Priors for Health Conditions Diagnosis Using Medicare Data

Title Demographical Priors for Health Conditions Diagnosis Using Medicare Data
Authors Fahad Alhasoun, May Alhazzani, Marta C. González
Abstract This paper presents an example of how demographical characteristics of patients influence their susceptibility to certain medical conditions. We investigate the association of health conditions with the age of patients in a heterogeneous population. We show that, besides the symptoms a patient is having, age has the potential to aid the diagnostic process in hospitals. Working with Electronic Health Records (EHR), we show that medical conditions group into clusters that share distinctive population age densities. We use Electronic Health Records from Brazil for a period of 15 months from March of 2013 to July of 2014. The data contain 1.7 million patients and 47 million records. The findings have the potential to help in a setting where an automated system undertakes the task of predicting the condition of a patient given their symptoms and demographical information.
Published 2016-12-07

Deep image mining for diabetic retinopathy screening

Title Deep image mining for diabetic retinopathy screening
Authors Gwenolé Quellec, Katia Charrière, Yassine Boudi, Béatrice Cochener, Mathieu Lamard
Abstract Deep learning is quickly becoming the leading methodology for medical image analysis. Given a large medical archive, where each image is associated with a diagnosis, efficient pathology detectors or classifiers can be trained with virtually no expert knowledge about the target pathologies. However, deep learning algorithms, including the popular ConvNets, are black boxes: little is known about the local patterns analyzed by ConvNets to make a decision at the image level. A solution is proposed in this paper to create heatmaps showing which pixels in images play a role in the image-level predictions. In other words, a ConvNet trained for image-level classification can be used to detect lesions as well. A generalization of the backpropagation method is proposed in order to train ConvNets that produce high-quality heatmaps. The proposed solution is applied to diabetic retinopathy (DR) screening in a dataset of almost 90,000 fundus photographs from the 2015 Kaggle Diabetic Retinopathy competition and a private dataset of almost 110,000 photographs (e-ophtha). For the task of detecting referable DR, very good detection performance was achieved: $A_z = 0.954$ in Kaggle’s dataset and $A_z = 0.949$ in e-ophtha. Performance was also evaluated at the image level and at the lesion level in the DiaretDB1 dataset, where four types of lesions are manually segmented: microaneurysms, hemorrhages, exudates and cotton-wool spots. The proposed detector outperforms recent algorithms trained to detect those lesions specifically, as well as competing heatmap generation algorithms for ConvNets. This detector is part of the Messidor system for mobile eye pathology screening. Because it does not rely on expert knowledge or manual segmentation for detecting relevant patterns, the proposed solution is a promising image mining tool, which has the potential to discover new biomarkers in images.
Published 2016-10-22

Automatic Construction of Discourse Corpora for Dialogue Translation

Title Automatic Construction of Discourse Corpora for Dialogue Translation
Authors Longyue Wang, Xiaojun Zhang, Zhaopeng Tu, Andy Way, Qun Liu
Abstract In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. Firstly, the parallel subtitle data and its corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary from the script data are projected to its subtitle data via an information retrieval approach in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, respectively, and speaker-based language model adaptation can obtain an improvement of around 0.5 BLEU points in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.
Tasks Information Retrieval, Language Modelling, Machine Translation
Published 2016-05-22

Asynchronous Stochastic Block Coordinate Descent with Variance Reduction

Title Asynchronous Stochastic Block Coordinate Descent with Variance Reduction
Authors Bin Gu, Zhouyuan Huo, Heng Huang
Abstract Asynchronous parallel implementations of stochastic optimization have recently achieved great success in theory and practice. Lock-free asynchronous implementations are more efficient than those that use read or write locks. In this paper, we focus on a composite objective function consisting of a smooth convex function $f$ and a block-separable convex function, which arises widely in machine learning and computer vision. We propose an asynchronous stochastic block coordinate descent algorithm with variance reduction (AsySBCDVR), which is lock-free in both implementation and analysis. AsySBCDVR is particularly important because it can scale well with the sample size and dimension simultaneously. We prove that AsySBCDVR achieves a linear convergence rate when the function $f$ has the optimal strong convexity property, and a sublinear rate when $f$ is generally convex. More importantly, a near-linear speedup on a parallel system with shared memory can be obtained.
Tasks Stochastic Optimization
Published 2016-10-29
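
For orientation, the composite objective described in the abstract above can be written in a standard form. The notation here is an assumption for illustration and is not taken from the paper: $f$ is assumed to be a finite sum over $n$ samples (which is what makes stochastic sampling and variance reduction applicable), and the coordinates of $x$ are assumed to be partitioned into $m$ blocks $G_1, \dots, G_m$.

$$
\min_{x \in \mathbb{R}^d} \; F(x) = f(x) + g(x), \qquad
f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad
g(x) = \sum_{j=1}^{m} g_j\!\left(x_{G_j}\right)
$$

Under this reading, each $f_i$ is smooth and convex, and each $g_j$ is convex but possibly nonsmooth (for example, a scaled $\ell_1$ norm on block $G_j$), so a single update touches one block $G_j$ and one sampled component $f_i$ at a time.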

Should Algorithms for Random SAT and Max-SAT be Different?

Title Should Algorithms for Random SAT and Max-SAT be Different?
Authors Sixue Liu, Gerard de Melo
Abstract We analyze to what extent the random SAT and Max-SAT problems differ in their properties. Our findings suggest that for random $k$-CNF with ratio in a certain range, Max-SAT can be solved by any SAT algorithm with subexponential slowdown, while for formulae with ratios greater than some constant, algorithms under the random walk framework require substantially different heuristics. In light of these results, we propose a novel probabilistic approach for random Max-SAT called ProMS. Experimental results illustrate that ProMS outperforms many state-of-the-art local search solvers on random Max-SAT benchmarks.
Published 2016-10-03
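
To show what an algorithm "under the random walk framework" looks like in its simplest form, here is a minimal sketch of a pure random-walk local search for Max-SAT that tracks the best assignment seen so far. It is a generic textbook-style procedure, not ProMS, and the function names, the DIMACS-like clause encoding (positive or negative integers for literals), and the parameters are assumptions for illustration.

```python
import random


def is_satisfied(clause, assignment):
    """A clause is satisfied if any literal agrees with the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def count_unsat(clauses, assignment):
    """Max-SAT cost: the number of clauses left unsatisfied."""
    return sum(1 for c in clauses if not is_satisfied(c, assignment))


def random_walk_maxsat(clauses, n_vars, max_flips=1000, seed=0):
    """Pure random walk: flip a random variable from a random unsatisfied clause."""
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_unsat = dict(assignment), count_unsat(clauses, assignment)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not is_satisfied(c, assignment)]
        if not unsat:
            break  # every clause is satisfied, nothing left to improve
        var = abs(rng.choice(rng.choice(unsat)))
        assignment[var] = not assignment[var]
        current = count_unsat(clauses, assignment)
        if current < best_unsat:
            best, best_unsat = dict(assignment), current
    return best, best_unsat


# Satisfiable toy formula: (x1) AND (NOT x2).
best, best_unsat = random_walk_maxsat([[1], [-2]], n_vars=2)
# Unsatisfiable toy formula: (x1) AND (NOT x1); one clause always fails.
_, contradiction_unsat = random_walk_maxsat([[1], [-1]], n_vars=1)
```

For a satisfiable formula the search can stop as soon as the cost reaches zero, whereas for Max-SAT on unsatisfiable formulas it must keep the best assignment encountered, which is one reason the two settings can call for different heuristics.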

Bridging Neural Machine Translation and Bilingual Dictionaries

Title Bridging Neural Machine Translation and Bilingual Dictionaries
Authors Jiajun Zhang, Chengqing Zong
Abstract Neural Machine Translation (NMT) has become the new state of the art in several language pairs. However, how to integrate NMT with a bilingual dictionary, which mainly contains words rarely or never seen in the bilingual training data, remains a challenging problem. In this paper, we propose two methods to bridge NMT and bilingual dictionaries. The core idea is to design novel models that transform the bilingual dictionaries into adequate sentence pairs, so that NMT can distil latent bilingual mappings from the ample and repetitive phenomena. One method leverages a mixed word/character model and the other attempts to synthesize parallel sentences that guarantee massive occurrence of the translation lexicon. Extensive experiments demonstrate that the proposed methods can remarkably improve the translation quality, and most of the rare words in the test sentences can obtain correct translations if they are covered by the dictionary.
Tasks Machine Translation
Published 2016-10-24

Semantic Image Based Geolocation Given a Map

Title Semantic Image Based Geolocation Given a Map
Authors Arsalan Mousavian, Jana Kosecka
Abstract Visual place recognition is a commonly used strategy for localization. Most successful appearance-based methods typically rely on a large database of views endowed with local or global image descriptors and strive to retrieve the views of the same location. The quality of the results is often affected by the density of the reference views and the robustness of the image representation with respect to viewpoint variations, clutter and seasonal changes. In this work we present an approach for geo-locating a novel view and determining camera location and orientation using a map and a sparse set of geo-tagged reference views. We propose a novel technique for detection and identification of building facades from geo-tagged reference views using the map and geometry of the building facades. We compute the likelihood of camera location and orientation of the query images using the detected landmark (building) identities from reference views, a 2D map of the environment, and the geometry of building facades. We evaluate our approach for building identification and geo-localization on a new challenging outdoor urban dataset exhibiting large variations in appearance and viewpoint.
Tasks Visual Place Recognition
Published 2016-09-01