October 17, 2019

3326 words 16 mins read

Paper Group ANR 767

The LORACs prior for VAEs: Letting the Trees Speak for the Data

Title The LORACs prior for VAEs: Letting the Trees Speak for the Data
Authors Sharad Vikram, Matthew D. Hoffman, Matthew J. Johnson
Abstract In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The “default” prior is an isotropic normal, but if the natural factors of variation in the dataset exhibit discrete structure or are not independent, then the isotropic-normal prior will actually encourage learning representations that mask this structure. To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). To scale learning to large datasets, we develop a new inducing-point approximation and inference algorithm. We then apply the method without supervision to several datasets and examine the interpretability and practical performance of the inferred hierarchies and learned latent space.
Tasks
Published 2018-10-16
URL http://arxiv.org/abs/1810.06891v1
PDF http://arxiv.org/pdf/1810.06891v1.pdf
PWC https://paperswithcode.com/paper/the-loracs-prior-for-vaes-letting-the-trees
Repo
Framework
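
The time-marginalized coalescent at the heart of the LORACs prior places a random binary tree over data points. As a rough illustration (a plain Kingman-style coalescent, not the paper's TMC or its inducing-point inference), such a tree can be simulated by repeatedly merging two lineages chosen uniformly at random:

```python
import random

def sample_coalescent_tree(n_leaves, seed=0):
    """Sample a toy coalescent tree over n_leaves points: repeatedly
    merge two lineages chosen uniformly at random, with exponential
    waiting times, until a single root remains."""
    rng = random.Random(seed)
    t = 0.0
    lineages = list(range(n_leaves))  # leaves are integer ids
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # standard coalescent rate
        i, j = sorted(rng.sample(range(k), 2))
        right = lineages.pop(j)  # pop larger index first
        left = lineages.pop(i)
        lineages.append((left, right, round(t, 4)))  # internal node with merge time
        merges.append((left, right))
    return lineages[0], merges
```

A tree over n leaves always contains n - 1 merge events; in the paper it is this latent hierarchy, rather than an isotropic normal, that shapes the VAE's latent codes.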

Paragraph-based complex networks: application to document classification and authenticity verification

Title Paragraph-based complex networks: application to document classification and authenticity verification
Authors Henrique F. de Arruda, Vanessa Q. Marinho, Luciano da F. Costa, Diego R. Amancio
Abstract With the increasing number of texts made available on the Internet, many applications have relied on text mining tools to tackle a diversity of problems. A relevant model to represent texts is the so-called word adjacency (co-occurrence) representation, which is known to capture mainly syntactical features of texts. In this study, we introduce a novel network representation that considers the semantic similarity between paragraphs. Two main properties of paragraph networks are considered: (i) their ability to incorporate characteristics that can discriminate real from artificial, shuffled manuscripts and (ii) their ability to capture syntactical and semantic textual features. Our results revealed that real texts are organized into communities, which turned out to be an important feature for discriminating them from artificial texts. Interestingly, we also found that, unlike traditional co-occurrence networks, the adopted representation is able to capture semantic features. Additionally, the proposed framework was employed to analyze the Voynich manuscript, which was found to be compatible with texts written in natural languages. Taken together, our findings suggest that the proposed methodology can be combined with traditional network models to improve text classification tasks.
Tasks Document Classification, Semantic Similarity, Semantic Textual Similarity, Text Classification
Published 2018-06-22
URL http://arxiv.org/abs/1806.08467v1
PDF http://arxiv.org/pdf/1806.08467v1.pdf
PWC https://paperswithcode.com/paper/paragraph-based-complex-networks-application
Repo
Framework
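
The paragraph-network idea can be sketched with a minimal bag-of-words pipeline (cosine similarity over raw term counts is an assumption here; the paper uses richer semantic-similarity measures): paragraphs become nodes, and an edge is added when two paragraphs are similar enough.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b.get(w, 0) for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def paragraph_network(paragraphs, threshold=0.3):
    """Nodes are paragraph indices; edges connect paragraphs whose
    bag-of-words cosine similarity reaches the threshold."""
    bows = [Counter(p.lower().split()) for p in paragraphs]
    return [(i, j)
            for i in range(len(bows))
            for j in range(i + 1, len(bows))
            if cosine(bows[i], bows[j]) >= threshold]
```

Community structure in the resulting graph is exactly the kind of feature the authors found discriminates real manuscripts from shuffled ones.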

Statistical Learnability of Generalized Additive Models based on Total Variation Regularization

Title Statistical Learnability of Generalized Additive Models based on Total Variation Regularization
Authors Shin Matsushima
Abstract A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a nonparametric model formed as the sum of univariate functions of each explanatory variable, i.e., $f({\mathbf x}) = \sum f_j(x_j)$, where $x_j\in\mathbb{R}$ is the $j$-th component of a sample ${\mathbf x}\in \mathbb{R}^p$. In this paper, we introduce the total variation (TV) of a function as a measure of the complexity of functions in $L^1_{\rm c}(\mathbb{R})$-space. Our analysis shows that a GAM based on TV-regularization exhibits a Rademacher complexity of $O(\sqrt{\frac{\log p}{m}})$, which is tight in terms of both $m$ and $p$ in the agnostic case of the classification problem. As a result, we obtain generalization error bounds for finite samples according to work by Bartlett and Mendelson (2002).
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.03001v2
PDF http://arxiv.org/pdf/1802.03001v2.pdf
PWC https://paperswithcode.com/paper/statistical-learnability-of-generalized
Repo
Framework
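
On a grid of sample points, the total-variation penalty reduces to a sum of absolute first differences, and the GAM prediction is just the sum of the univariate components. A minimal sketch (discrete TV on a fixed grid is a simplification of the paper's $L^1_{\rm c}(\mathbb{R})$ setting):

```python
def total_variation(values):
    """Discrete total variation of a univariate function sampled on a grid:
    the sum of absolute first differences."""
    return sum(abs(values[k + 1] - values[k]) for k in range(len(values) - 1))

def gam_predict(component_fns, x):
    """GAM prediction f(x) = sum_j f_j(x_j) for one sample x."""
    return sum(f(x_j) for f, x_j in zip(component_fns, x))
```

TV-regularized fitting would add a penalty proportional to the sum of `total_variation` over the fitted components $f_j$.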

Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search

Title Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Authors Karl Kurzer, Chenyang Zhou, J. Marius Zöllner
Abstract Today’s automated vehicles lack the ability to cooperate implicitly with others. This work presents a Monte Carlo Tree Search (MCTS) based approach for decentralized cooperative planning using macro-actions for automated vehicles in heterogeneous environments. Based on cooperative modeling of other agents and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the state-action-values of each agent in a cooperative and decentralized manner, explicitly modeling the interdependence of actions between traffic participants. Macro-actions allow for temporal extension over multiple time steps and increase the effective search depth requiring fewer iterations to plan over longer horizons. Without predefined policies for macro-actions, the algorithm simultaneously learns policies over and within macro-actions. The proposed method is evaluated under several conflict scenarios, showing that the algorithm can achieve effective cooperative planning with learned macro-actions in heterogeneous environments.
Tasks
Published 2018-07-25
URL http://arxiv.org/abs/1807.09530v1
PDF http://arxiv.org/pdf/1807.09530v1.pdf
PWC https://paperswithcode.com/paper/decentralized-cooperative-planning-for
Repo
Framework
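
In Decoupled-UCT, each agent keeps its own action statistics at a joint tree node and selects its action independently with a UCB rule. A minimal per-agent selection step (the exploration constant and the statistics layout are illustrative assumptions):

```python
import math

def ucb1(mean_value, visits, parent_visits, c=1.4):
    """UCB1 score for one action; unvisited actions are explored first."""
    if visits == 0:
        return float("inf")
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)

def decoupled_select(agent_stats, parent_visits):
    """Each agent independently picks its own argmax-UCB action.
    agent_stats: one dict per agent mapping action -> (mean_value, visits)."""
    joint = []
    for stats in agent_stats:
        best = max(stats, key=lambda a: ucb1(stats[a][0], stats[a][1], parent_visits))
        joint.append(best)
    return tuple(joint)
```

The joint action is simply the tuple of per-agent choices, which is what lets the search scale with the number of traffic participants instead of enumerating the joint action space.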

Deep learning in agriculture: A survey

Title Deep learning in agriculture: A survey
Authors Andreas Kamilaris, Francesc X. Prenafeta-Boldu
Abstract Deep learning constitutes a recent, modern technique for image processing and data analysis, with promising results and large potential. As deep learning has been successfully applied in various domains, it has recently entered also the domain of agriculture. In this paper, we perform a survey of 40 research efforts that employ deep learning techniques, applied to various agricultural and food production challenges. We examine the particular agricultural problems under study, the specific models and frameworks employed, the sources, nature and pre-processing of data used, and the overall performance achieved according to the metrics used at each work under study. Moreover, we study comparisons of deep learning with other existing popular techniques, in respect to differences in classification or regression performance. Our findings indicate that deep learning provides high accuracy, outperforming existing commonly used image processing techniques.
Tasks
Published 2018-07-31
URL http://arxiv.org/abs/1807.11809v1
PDF http://arxiv.org/pdf/1807.11809v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-in-agriculture-a-survey
Repo
Framework

Training for ‘Unstable’ CNN Accelerator: A Case Study on FPGA

Title Training for ‘Unstable’ CNN Accelerator: A Case Study on FPGA
Authors KouZi Xing
Abstract With the great advancements of convolutional neural networks (CNN), CNN accelerators are increasingly developed and deployed in major computing systems. To make use of the CNN accelerators, CNN models are first trained via off-line training systems such as Caffe, Pytorch and Tensorflow on multi-core CPUs and GPUs and then compiled to the target accelerators. Although this two-step process seems natural and has been widely applied, it assumes that the accelerators’ behavior can be fully modeled on CPUs and GPUs. This does not always hold: the behavior of a CNN accelerator becomes non-deterministic when the circuit works in an ‘unstable’ mode, such as when it is overclocked or affected by a fault-prone environment like aerospace. The exact behaviors of the accelerators are determined by both the chip fabrication and the working environment or status. In this case, applying the conventional off-line training result to the accelerators directly may lead to considerable accuracy loss. To address this problem, we propose to train for the ‘unstable’ CNN accelerator and have the ‘un-determined behavior’ learned together with the data in the same framework. Basically, it starts from the off-line trained model and then integrates the uncertain circuit behaviors into the CNN models through additional accelerator-specific training. The fine-tuned training makes the CNN models less sensitive to the circuit uncertainty. We apply the design method to both an overclocked CNN accelerator and a faulty accelerator. According to our experiments on a subset of ImageNet, the accelerator-specific training improves the top-5 accuracy by up to 3.4% and by 2.4% on average when the CNN accelerator is at extreme overclocking. When the accelerator is exposed to a faulty environment, the top-5 accuracy improves by up to 6.8% and by 4.28% on average under the most severe fault injection.
Tasks
Published 2018-12-02
URL http://arxiv.org/abs/1812.01689v1
PDF http://arxiv.org/pdf/1812.01689v1.pdf
PWC https://paperswithcode.com/paper/training-for-unstable-cnn-acceleratora-case
Repo
Framework
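
The accelerator-specific training idea can be caricatured as fine-tuning with perturbed weights so the model becomes robust to circuit uncertainty. A toy noise model (Gaussian perturbation is an assumption; the paper injects the accelerator's actual overclocking/fault behavior):

```python
import random

def inject_faults(weights, fault_prob, rng):
    """Randomly perturb a fraction of weights to mimic non-deterministic
    accelerator behavior during an accelerator-aware training pass."""
    return [w + rng.gauss(0.0, abs(w) * 0.5 + 1e-3) if rng.random() < fault_prob else w
            for w in weights]
```

During fine-tuning, the forward pass would run on the perturbed weights while the optimizer updates the clean copy, so the network learns predictions that survive the perturbation.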

Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval

Title Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval
Authors Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, Mark D. Smucker
Abstract This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback. Under the weak assumption that more time and effort is required to review an entire document than a single sentence, simulation results indicate that the use of isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art Baseline Model Implementation (BMI) of the AutoTAR Continuous Active Learning (“CAL”) method employed in the TREC 2015 and 2016 Total Recall Track.
Tasks Active Learning, Information Retrieval
Published 2018-03-23
URL http://arxiv.org/abs/1803.08988v2
PDF http://arxiv.org/pdf/1803.08988v2.pdf
PWC https://paperswithcode.com/paper/evaluating-sentence-level-relevance-feedback
Repo
Framework
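
The core cost argument can be sketched as a ranked-review simulation: review items in score order until a recall target is met, where each review costs less for a sentence than for a full document (the unit costs and the perfect ranking below are illustrative assumptions, not the paper's simulation framework):

```python
def effort_to_recall(scores, labels, target_recall, unit_cost):
    """Review items in decreasing score order until target_recall of the
    relevant items (labels == 1) has been found; return total effort."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    needed = target_recall * sum(labels)
    found, effort = 0, 0.0
    for i in order:
        effort += unit_cost
        found += labels[i]
        if found >= needed:
            break
    return effort
```

With comparable rankings, the sentence-level reviewer reaches the same recall at a fraction of the effort, which is the trade-off the simulations quantify.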

Time Perception Machine: Temporal Point Processes for the When, Where and What of Activity Prediction

Title Time Perception Machine: Temporal Point Processes for the When, Where and What of Activity Prediction
Authors Yatao Zhong, Bicheng Xu, Guang-Tong Zhou, Luke Bornn, Greg Mori
Abstract Numerous powerful point process models have been developed to understand temporal patterns in sequential data from fields such as health-care, electronic commerce, social networks, and natural disaster forecasting. In this paper, we develop novel models for learning the temporal distribution of human activities in streaming data (e.g., videos and person trajectories). We propose an integrated framework of neural networks and temporal point processes for predicting when the next activity will happen. Because point processes are limited to taking event frames as input, we propose a simple yet effective mechanism to extract features at frames of interest while also preserving the rich information in the remaining frames. We evaluate our model on two challenging datasets. The results show that our model outperforms traditional statistical point process approaches significantly, demonstrating its effectiveness in capturing the underlying temporal dynamics as well as the correlation within sequential activities. Furthermore, we also extend our model to a joint estimation framework for predicting the timing, spatial location, and category of the activity simultaneously, to answer the when, where, and what of activity prediction.
Tasks Activity Prediction, Point Processes
Published 2018-08-13
URL http://arxiv.org/abs/1808.04063v2
PDF http://arxiv.org/pdf/1808.04063v2.pdf
PWC https://paperswithcode.com/paper/time-perception-machine-temporal-point
Repo
Framework
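
As a baseline for “when will the next activity happen”, consider the simplest temporal point process, a homogeneous Poisson process: the MLE intensity on an observation window is the event count divided by the window length, and the expected waiting time to the next event is its reciprocal (the paper's neural model learns a far richer, history-dependent intensity):

```python
def poisson_mle_rate(event_times, window_end):
    """MLE intensity for a homogeneous Poisson process observed on [0, window_end]."""
    if window_end <= 0:
        raise ValueError("window_end must be positive")
    return len(event_times) / window_end

def expected_next_gap(rate):
    """Expected waiting time to the next event under constant intensity."""
    return 1.0 / rate
```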

Brain Tumor Segmentation using an Ensemble of 3D U-Nets and Overall Survival Prediction using Radiomic Features

Title Brain Tumor Segmentation using an Ensemble of 3D U-Nets and Overall Survival Prediction using Radiomic Features
Authors Xue Feng, Nicholas Tustison, Craig Meyer
Abstract Accurate segmentation of different sub-regions of gliomas, including peritumoral edema, necrotic core, and enhancing and non-enhancing tumor core, from multimodal MRI scans has important clinical relevance in the diagnosis, prognosis and treatment of brain tumors. However, due to their highly heterogeneous appearance and shape, segmentation of the sub-regions is very challenging. Recent developments using deep learning models have proved effective in past brain segmentation challenges as well as other semantic and medical image segmentation problems. Most models in brain tumor segmentation use a 2D/3D patch to predict the class label for the center voxel, and varying patch sizes and scales are used to improve model performance. However, this approach has low computational efficiency and a limited receptive field. U-Net is a widely used network structure for end-to-end segmentation and can be applied to the entire image or extracted patches to provide classification labels for all input voxels, so it is more efficient and is expected to yield better performance with larger input sizes. Furthermore, instead of picking the single best network structure, an ensemble of multiple models, trained on different datasets or with different hyper-parameters, can generally improve segmentation performance. In this study we propose to use an ensemble of 3D U-Nets with different hyper-parameters for brain tumor segmentation. Preliminary results showed the effectiveness of this model. In addition, we developed a linear model for survival prediction using extracted imaging and non-imaging features, which, despite its simplicity, can effectively reduce overfitting and regression errors.
Tasks Brain Segmentation, Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2018-12-03
URL http://arxiv.org/abs/1812.01049v1
PDF http://arxiv.org/pdf/1812.01049v1.pdf
PWC https://paperswithcode.com/paper/brain-tumor-segmentation-using-an-ensemble-of
Repo
Framework
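
Combining the 3D U-Nets can be as simple as a per-voxel majority vote over the models' label maps (the authors may instead average class probabilities; plain voting is the assumption here):

```python
from collections import Counter

def ensemble_vote(predictions):
    """Per-voxel majority vote. predictions: one flat label list per model,
    all of the same length; returns the fused label list."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Because the ensemble members were trained with different hyper-parameters, their errors tend to be decorrelated, which is what makes the fused map better than any single model.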

Knowing what you know in brain segmentation using Bayesian deep neural networks

Title Knowing what you know in brain segmentation using Bayesian deep neural networks
Authors Patrick McClure, Nao Rho, John A. Lee, Jakub R. Kaczmarzyk, Charles Zheng, Satrajit S. Ghosh, Dylan Nielson, Adam G. Thomas, Peter Bandettini, Francisco Pereira
Abstract In this paper, we describe a Bayesian deep neural network (DNN) for predicting FreeSurfer segmentations of structural MRI volumes, in minutes rather than hours. The network was trained and evaluated on a large dataset (n = 11,480), obtained by combining data from more than a hundred different sites, and also evaluated on another completely held-out dataset (n = 418). The network was trained using a novel spike-and-slab dropout-based variational inference approach. We show that, on these datasets, the proposed Bayesian DNN outperforms previously proposed methods, in terms of the similarity between the segmentation predictions and the FreeSurfer labels, and the usefulness of the estimated uncertainty of these predictions. In particular, we demonstrate that the prediction uncertainty of this network at each voxel is a good indicator of whether the network has made an error and that the uncertainty across the whole brain can predict the manual quality control ratings of a scan. The proposed Bayesian DNN method should be applicable to any new network architecture for addressing the segmentation problem.
Tasks Brain Segmentation
Published 2018-12-03
URL https://arxiv.org/abs/1812.01719v5
PDF https://arxiv.org/pdf/1812.01719v5.pdf
PWC https://paperswithcode.com/paper/knowing-what-you-know-in-brain-segmentation
Repo
Framework
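
Voxelwise uncertainty from a stochastic network can be summarized as the entropy of the mean predictive distribution over several stochastic forward passes (a generic Monte Carlo scheme, not the paper's spike-and-slab dropout variational inference):

```python
import math

def predictive_entropy(prob_samples):
    """Entropy of the mean class distribution across stochastic forward
    passes; high entropy flags voxels where the network is uncertain."""
    n = len(prob_samples)
    k = len(prob_samples[0])
    mean = [sum(p[i] for p in prob_samples) / n for i in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)
```

Voxels where the passes disagree get high entropy, matching the paper's finding that per-voxel uncertainty predicts errors and whole-brain uncertainty predicts quality-control ratings.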

Lesion segmentation using U-Net network

Title Lesion segmentation using U-Net network
Authors Adrien Motsch, Sebastien Motsch, Thibaut Saguet
Abstract This paper explains the method used in the segmentation challenge (Task 1) of the International Skin Imaging Collaboration’s (ISIC) Skin Lesion Analysis Towards Melanoma Detection challenge held in 2018. We trained a U-Net network to perform the segmentation. The key elements of the training were, first, to adjust the loss function to account for the unbalanced proportion of background and, second, to apply a post-processing operation to adjust the contour of the prediction.
Tasks Lesion Segmentation
Published 2018-07-23
URL http://arxiv.org/abs/1807.08844v1
PDF http://arxiv.org/pdf/1807.08844v1.pdf
PWC https://paperswithcode.com/paper/lesion-segmentation-using-u-net-network
Repo
Framework
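
Adjusting the loss for the unbalanced background/lesion proportion typically means up-weighting the foreground term of the binary cross-entropy. A per-pixel sketch (the paper does not specify its exact weighting scheme, so `pos_weight` is an assumption):

```python
import math

def weighted_bce(targets, probs, pos_weight):
    """Binary cross-entropy with the (rare) lesion class up-weighted
    by pos_weight; targets are 0/1, probs are predicted foreground
    probabilities."""
    eps = 1e-7
    loss = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip for numerical stability
        loss += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return loss / len(targets)
```

Setting `pos_weight` near the background-to-lesion pixel ratio keeps the network from collapsing to an all-background prediction.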

Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra

Title Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra
Authors Toru Nakashika, Shinji Takaki, Junichi Yamagishi
Abstract This paper describes a novel energy-based probabilistic distribution that represents complex-valued data and explains how to apply it to direct feature extraction from complex-valued spectra. The proposed model, the complex-valued restricted Boltzmann machine (CRBM), is designed to deal with complex-valued visible units as an extension of the well-known restricted Boltzmann machine (RBM). Like the RBM, the CRBM learns the relationships between visible and hidden units without having connections between units in the same layer, which dramatically improves training efficiency by using Gibbs sampling or contrastive divergence (CD). Another important characteristic is that the CRBM also has connections between real and imaginary parts of each of the complex-valued visible units that help represent the data distribution in the complex domain. In speech signal processing, classification and generation features are often based on amplitude spectra (e.g., MFCC, cepstra, and mel-cepstra) even if they are calculated from complex spectra, and they ignore phase information. In contrast, the proposed feature extractor using the CRBM directly encodes the complex spectra (or another complex-valued representation of the complex spectra) into binary-valued latent features (hidden units). Since the visible-hidden connections are undirected, we can also recover (decode) the complex spectra from the latent features directly. Our speech coding experiments demonstrated that the CRBM outperformed other speech coding methods, such as methods using the conventional RBM, the mel-log spectrum approximate (MLSA) decoder, etc.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.09946v1
PDF http://arxiv.org/pdf/1803.09946v1.pdf
PWC https://paperswithcode.com/paper/complex-valued-restricted-boltzmann-machine
Repo
Framework

Fully Convolutional Networks and Generative Adversarial Networks Applied to Sclera Segmentation

Title Fully Convolutional Networks and Generative Adversarial Networks Applied to Sclera Segmentation
Authors Diego R. Lucio, Rayson Laroca, Evair Severo, Alceu S. Britto Jr., David Menotti
Abstract Due to the world’s demand for security systems, biometrics can be seen as an important topic of research in computer vision. One biometric modality that has been gaining attention is recognition based on the sclera. The initial and paramount step for this type of recognition is the segmentation of the region of interest, i.e. the sclera. In this context, two approaches for this task, based on the Fully Convolutional Network (FCN) and on the Generative Adversarial Network (GAN), are introduced in this work. An FCN is similar to a common convolutional neural network, except that the fully connected layers (i.e., the classification layers) are removed from the end of the network and the output is generated by combining the outputs of pooling layers from different convolutional ones. The GAN is based on game theory, with two networks competing with each other to generate the best segmentation. In order to perform a fair comparison with baselines and quantitative and objective evaluations of the proposed approaches, we provide the scientific community with 1,300 new manually segmented images from two databases. The experiments are performed on the UBIRIS.v2 and MICHE databases, and the best performing configurations of our propositions achieved F-score measures of 87.48% and 88.32%, respectively.
Tasks
Published 2018-06-22
URL http://arxiv.org/abs/1806.08722v3
PDF http://arxiv.org/pdf/1806.08722v3.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-networks-and-generative
Repo
Framework
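
The reported 87.48% and 88.32% F-scores are harmonic means of precision and recall (assuming pixel-level counts over the segmentation masks); from raw counts the measure is:

```python
def f_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```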

A neural interlingua for multilingual machine translation

Title A neural interlingua for multilingual machine translation
Authors Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, Jason Sun
Abstract We incorporate an explicit neural interlingua into a multilingual encoder-decoder neural machine translation (NMT) architecture. We demonstrate that our model learns a language-independent representation by performing direct zero-shot translation (without using pivot translation), and by using the source sentence embeddings to create an English Yelp review classifier that, through the mediation of the neural interlingua, can also classify French and German reviews. Furthermore, we show that, despite using a smaller number of parameters than a pairwise collection of bilingual NMT models, our approach produces comparable BLEU scores for each language pair in WMT15.
Tasks Machine Translation, Sentence Embeddings
Published 2018-04-23
URL http://arxiv.org/abs/1804.08198v3
PDF http://arxiv.org/pdf/1804.08198v3.pdf
PWC https://paperswithcode.com/paper/a-neural-interlingua-for-multilingual-machine
Repo
Framework

The Method of Multimodal MRI Brain Image Segmentation Based on Differential Geometric Features

Title The Method of Multimodal MRI Brain Image Segmentation Based on Differential Geometric Features
Authors Yongpei Zhu, Zicong Zhou, Guojun Liao, Qianxi Yang, Kehong Yuan
Abstract Accurate segmentation of brain tissue in magnetic resonance images (MRI) is a difficult task due to different types of brain abnormalities. Using information and features from multimodal MRI, including T1, T1-weighted inversion recovery (T1-IR) and T2-FLAIR, together with differential geometric features, including the Jacobian determinant (JD) and the curl vector (CV) derived from the T1 modality, can result in a more accurate analysis of brain images. In this paper, we use the differential geometric information, JD and CV, as image characteristics to measure the differences between MRI images; they represent local size changes and local rotations of the brain image, and we use them as an additional CNN channel alongside the three modalities (T1-weighted, T1-IR and T2-FLAIR) to obtain more accurate brain segmentation. We test this method on the IBSR and MRBrainS datasets with the deep voxelwise residual network VoxResNet, and obtain a clear improvement over using a single modality or three modalities alone, increasing the average DSC over Cerebrospinal Fluid (CSF), Gray Matter (GM) and White Matter (WM) by about 1.5% on the well-known MRBrainS18 dataset and about 2.5% on the IBSR dataset. Moreover, we find that a single modality combined with its JD or CV information can match the segmentation performance of the three modalities, which is convenient for clinical diagnosis because only the T1-modality MRI of a patient needs to be acquired. Finally, we compare the segmentation performance of our method in two networks, VoxResNet and U-Net. The results show that VoxResNet performs better than U-Net with our method in brain MRI segmentation. We believe the proposed method can advance performance in brain segmentation and clinical diagnosis.
Tasks Brain Image Segmentation, Brain Segmentation, Semantic Segmentation
Published 2018-11-10
URL http://arxiv.org/abs/1811.04281v5
PDF http://arxiv.org/pdf/1811.04281v5.pdf
PWC https://paperswithcode.com/paper/the-method-of-multimodal-mri-brain-image
Repo
Framework
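
The Jacobian determinant channel measures the local size change of a deformation phi(x, y) = (x + u, y + v) computed from the T1 image. A central-difference sketch on a 2D grid (unit grid spacing is an assumption; the paper works with 3D volumes):

```python
def jacobian_determinant_2d(u, v, i, j):
    """Central-difference Jacobian determinant of phi(x, y) = (x + u, y + v)
    at interior grid point (i, j); u and v are 2D displacement arrays."""
    du_dx = (u[i + 1][j] - u[i - 1][j]) / 2.0
    du_dy = (u[i][j + 1] - u[i][j - 1]) / 2.0
    dv_dx = (v[i + 1][j] - v[i - 1][j]) / 2.0
    dv_dy = (v[i][j + 1] - v[i][j - 1]) / 2.0
    # det of [[1 + du_dx, du_dy], [dv_dx, 1 + dv_dy]]
    return (1.0 + du_dx) * (1.0 + dv_dy) - du_dy * dv_dx
```

A value of 1 means no local volume change; values above or below 1 indicate local expansion or shrinkage, which is the extra information fed to the CNN as an additional channel.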