July 29, 2019


Paper Group AWR 193


Brain Extraction from Normal and Pathological Images: A Joint PCA/Image-Reconstruction Approach

Title Brain Extraction from Normal and Pathological Images: A Joint PCA/Image-Reconstruction Approach
Authors Xu Han, Roland Kwitt, Stephen Aylward, Spyridon Bakas, Bjoern Menze, Alexander Asturias, Paul Vespa, John Van Horn, Marc Niethammer
Abstract Brain extraction from images is a common pre-processing step. Many approaches exist, but they are frequently only designed to perform brain extraction from images without strong pathologies. Extracting the brain from images with strong pathologies, for example, the presence of a tumor or of a traumatic brain injury, is challenging. In such cases, tissue appearance may deviate from that of normal tissue and violate the algorithmic assumptions of these approaches; hence, the brain may not be correctly extracted. This paper proposes a brain extraction approach which can explicitly account for pathologies by jointly modeling normal tissue and pathologies. Specifically, our model uses a three-part image decomposition: (1) normal tissue appearance is captured by principal component analysis, (2) pathologies are captured via a total variation term, and (3) non-brain tissue is captured by a sparse term. Decomposition and image registration steps are alternated to allow statistical modeling in a fixed atlas space. As a beneficial side effect, the model allows for the identification of potential pathologies and the reconstruction of a quasi-normal image in atlas space. We demonstrate the effectiveness of our method on four datasets: the IBSR and LPBA40 datasets, which contain normal images; the BRATS dataset, which contains images with brain tumors; and a dataset of clinical TBI images. We compare the performance with other popular models: ROBEX, BEaST, MASS, BET, BSE and a recently proposed deep learning approach. Our model performs better than these competing methods on all four datasets. Specifically, our model achieves the best median (97.11) and mean (96.88) Dice scores over all datasets. The two best performing competitors, ROBEX and MASS, achieve scores of 96.23/95.62 and 96.67/94.25 respectively. Hence, our approach is an effective method for high quality brain extraction on a wide variety of images.
Tasks Image Reconstruction, Image Registration
Published 2017-11-15
URL http://arxiv.org/abs/1711.05702v2
PDF http://arxiv.org/pdf/1711.05702v2.pdf
PWC https://paperswithcode.com/paper/brain-extraction-from-normal-and-pathological
Repo https://github.com/uncbiag/pstrip
Framework none
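
The PCA component of the decomposition is easy to illustrate in isolation. Below is a minimal numpy sketch on synthetic data: model normal appearance with a PCA basis, reconstruct a quasi-normal image, and flag large residuals as pathology candidates. The paper's full model adds the total-variation and sparse terms and alternates with registration, all of which are omitted here.

```python
# Toy sketch of the PCA part of the decomposition: model normal tissue with a
# PCA basis, then flag large residuals as candidate pathology / non-brain.
# The full method also uses a total-variation term, a sparse term, and
# alternating registration; those are omitted here.
import numpy as np

rng = np.random.default_rng(0)
normals = rng.normal(size=(40, 1024))          # 40 "normal" images, flattened
mean = normals.mean(axis=0)
U, s, Vt = np.linalg.svd(normals - mean, full_matrices=False)
basis = Vt[:10]                                 # top-10 principal components

test = normals[0] + 3.0 * (rng.random(1024) > 0.99)   # inject a "lesion"
coeffs = basis @ (test - mean)
quasi_normal = mean + basis.T @ coeffs          # reconstruction in atlas space
residual = np.abs(test - quasi_normal)
pathology_mask = residual > 3 * residual.std()  # crude pathology candidates
print(pathology_mask.sum(), "voxels flagged")
```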

Deep Image Harmonization

Title Deep Image Harmonization
Authors Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
Abstract Compositing is one of the most common operations in photo editing. To generate realistic composites, the appearances of foreground and background need to be adjusted to make them compatible. Previous approaches to harmonize composites have focused on learning statistical relationships between hand-crafted appearance features of the foreground and background, which is unreliable especially when the contents in the two layers are vastly different. In this work, we propose an end-to-end deep convolutional neural network for image harmonization, which can capture both the context and semantic information of the composite images during harmonization. We also introduce an efficient way to collect large-scale and high-quality training data that can facilitate the training process. Experiments on the synthesized dataset and real composite images show that the proposed network outperforms previous state-of-the-art methods.
Tasks
Published 2017-02-28
URL http://arxiv.org/abs/1703.00069v1
PDF http://arxiv.org/pdf/1703.00069v1.pdf
PWC https://paperswithcode.com/paper/deep-image-harmonization
Repo https://github.com/wasidennis/DeepHarmonization
Framework caffe2
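
A minimal PyTorch sketch of the input conditioning described above: the network sees the composite image concatenated with the foreground mask (4 input channels) and regresses the harmonized image. The layer widths and depths are illustrative, not the paper's architecture (the released code is Caffe).

```python
# Illustrative encoder-decoder conditioned on the composite + foreground mask.
import torch
import torch.nn as nn

class HarmonizationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, composite, mask):
        x = torch.cat([composite, mask], dim=1)   # condition on the mask
        return self.decoder(self.encoder(x))

net = HarmonizationNet()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```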

Semi-supervised sequence tagging with bidirectional language models

Title Semi-supervised sequence tagging with bidirectional language models
Authors Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power
Abstract Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
Tasks Chunking, Named Entity Recognition
Published 2017-04-29
URL http://arxiv.org/abs/1705.00108v1
PDF http://arxiv.org/pdf/1705.00108v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-sequence-tagging-with
Repo https://github.com/flairNLP/flair
Framework pytorch
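
The core idea (concatenating frozen bidirectional-LM hidden states with token embeddings before the supervised tagging RNN) can be sketched in a few lines of PyTorch. Everything below is a stand-in: the paper uses a large pretrained LM and a CRF output layer, neither of which is reproduced here.

```python
# Conceptual sketch of the paper's core idea: feed "context embeddings" from
# a frozen bidirectional LM, concatenated with token embeddings, into the
# supervised tagger. The tiny LSTM here is a stand-in for a pretrained biLM.
import torch
import torch.nn as nn

vocab, emb_dim, lm_dim, hid, n_tags = 1000, 50, 64, 100, 9

word_emb = nn.Embedding(vocab, emb_dim)
frozen_lm = nn.LSTM(emb_dim, lm_dim, bidirectional=True, batch_first=True)
for p in frozen_lm.parameters():      # LM is pretrained and not fine-tuned
    p.requires_grad = False
tagger_rnn = nn.LSTM(emb_dim + 2 * lm_dim, hid, bidirectional=True,
                     batch_first=True)
tag_proj = nn.Linear(2 * hid, n_tags)

tokens = torch.randint(0, vocab, (4, 12))          # batch of token ids
e = word_emb(tokens)
lm_states, _ = frozen_lm(e)                        # contextual LM embeddings
h, _ = tagger_rnn(torch.cat([e, lm_states], dim=-1))
tag_scores = tag_proj(h)                           # (4, 12, n_tags), e.g. NER
print(tag_scores.shape)
```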

Implicit Weight Uncertainty in Neural Networks

Title Implicit Weight Uncertainty in Neural Networks
Authors Nick Pawlowski, Andrew Brock, Matthew C. H. Lee, Martin Rajchl, Ben Glocker
Abstract Modern neural networks tend to be overconfident on unseen, noisy or incorrectly labelled data and do not produce meaningful uncertainty measures. Bayesian deep learning aims to address this shortcoming with variational approximations (such as Bayes by Backprop or Multiplicative Normalising Flows). However, current approaches have limitations regarding flexibility and scalability. We introduce Bayes by Hypernet (BbH), a new method of variational approximation that interprets hypernetworks as implicit distributions. It naturally uses neural networks to model arbitrarily complex distributions and scales to modern deep learning architectures. In our experiments, we demonstrate that our method achieves competitive accuracies and predictive uncertainties on MNIST and a CIFAR5 task, while being the most robust against adversarial attacks.
Tasks Normalising Flows
Published 2017-11-03
URL http://arxiv.org/abs/1711.01297v2
PDF http://arxiv.org/pdf/1711.01297v2.pdf
PWC https://paperswithcode.com/paper/implicit-weight-uncertainty-in-neural
Repo https://github.com/Aiqz/bayes-by-hypernet
Framework tf
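
A hedged PyTorch sketch of the central construction: a hypernetwork maps a noise sample to the weights of a target layer, so each forward pass draws from an implicit weight distribution and repeated passes give predictive uncertainty. All sizes are illustrative.

```python
# Bayes-by-Hypernet sketch: sample z, let a hypernetwork emit layer weights.
import torch
import torch.nn as nn

in_f, out_f, z_dim = 10, 5, 8
hypernet = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_f * in_f + out_f))

def sample_forward(x):
    z = torch.randn(z_dim)                 # one weight sample per forward pass
    theta = hypernet(z)
    W = theta[:out_f * in_f].view(out_f, in_f)
    b = theta[out_f * in_f:]
    return torch.nn.functional.linear(x, W, b)

x = torch.rand(3, in_f)
preds = torch.stack([sample_forward(x) for _ in range(20)])
print(preds.mean(0).shape, preds.std(0).mean())  # predictive mean, uncertainty
```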

Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition

Title Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition
Authors Chee Kheng Chng, Chee Seng Chan
Abstract Text in curve orientation, despite being one of the common text orientations in real-world environments, has close to zero presence in well-received scene text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of conventional horizontal and multi-oriented text, it features curved text. Total-Text is highly diversified in orientations; more than half of its images have a combination of more than two orientations. Recently, a new breed of solutions that cast text detection as a segmentation problem has demonstrated its effectiveness against multi-oriented text. In order to evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmarked it on Total-Text. Total-Text with its annotation is available at https://github.com/cs-chan/Total-Text-Dataset
Tasks Curved Text Detection, Scene Text Detection, Scene Text Recognition
Published 2017-10-28
URL http://arxiv.org/abs/1710.10400v1
PDF http://arxiv.org/pdf/1710.10400v1.pdf
PWC https://paperswithcode.com/paper/total-text-a-comprehensive-dataset-for-scene
Repo https://github.com/cs-chan/Total-Text-Dataset
Framework none

The Impact of Random Models on Clustering Similarity

Title The Impact of Random Models on Clustering Similarity
Authors Alexander J Gates, Yong-Yeol Ahn
Abstract Clustering is a central approach for unsupervised learning. After clustering is applied, the most fundamental analysis is to quantitatively compare clusterings. Such comparisons are crucial for the evaluation of clustering methods as well as other tasks such as consensus clustering. It is often argued that, in order to establish a baseline, clustering similarity should be assessed in the context of a random ensemble of clusterings. The prevailing assumption for the random clustering ensemble is the permutation model in which the number and sizes of clusters are fixed. However, this assumption does not necessarily hold in practice; for example, multiple runs of K-means clustering return clusterings with a fixed number of clusters, while the cluster size distribution varies greatly. Here, we derive corrected variants of two clustering similarity measures (the Rand index and Mutual Information) in the context of two random clustering ensembles in which the number and sizes of clusters vary. In addition, we study the impact of one-sided comparisons in the scenario with a reference clustering. The consequences of different random models are illustrated using synthetic examples, handwriting recognition, and gene expression data. We demonstrate that the choice of random model can have a drastic impact on the ranking of similar clustering pairs, and the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
Tasks
Published 2017-01-23
URL http://arxiv.org/abs/1701.06508v2
PDF http://arxiv.org/pdf/1701.06508v2.pdf
PWC https://paperswithcode.com/paper/the-impact-of-random-models-on-clustering
Repo https://github.com/ajgates42/clusim
Framework none
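
The practical point is that the usual "adjusted" similarity scores bake in the permutation model. A quick scikit-learn illustration follows; the paper's corrected measures for ensembles with varying cluster numbers and sizes are implemented in the authors' clusim package linked above.

```python
# The adjusted Rand index corrects against the permutation-model baseline
# only; the paper derives corrections for other random clustering ensembles.
from sklearn.metrics import rand_score, adjusted_rand_score

a = [0, 0, 1, 1, 2, 2]
b = [0, 0, 1, 2, 2, 2]
print(rand_score(a, b))           # raw agreement, no random baseline
print(adjusted_rand_score(a, b))  # corrected under the permutation model
```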

The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

Title The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
Authors He-Da Wang, Teng Zhang, Ji Wu
Abstract This article describes the final solution of team monkeytyping, which finished in second place in the YouTube-8M video understanding challenge. The dataset used in this challenge is a large-scale benchmark for multi-label video classification. We extend the work in [1] and propose several improvements for frame sequence modeling. We propose a network structure called Chaining that can better capture the interactions between labels. Also, we report our approaches in dealing with multi-scale information and attention pooling. In addition, we find that using the output of model ensemble as a side target in training can boost single model performance. We report our experiments in bagging, boosting, cascade, and stacking, and propose a stacking algorithm called attention weighted stacking. Our final submission is an ensemble that consists of 74 sub models, all of which are listed in the appendix.
Tasks Video Classification, Video Understanding
Published 2017-06-16
URL http://arxiv.org/abs/1706.05150v1
PDF http://arxiv.org/pdf/1706.05150v1.pdf
PWC https://paperswithcode.com/paper/the-monkeytyping-solution-to-the-youtube-8m
Repo https://github.com/wangheda/youtube-8m
Framework tf
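
"Attention weighted stacking" is not specified in the abstract, so the following PyTorch sketch is only one plausible reading of the name: a small learned module assigns per-example weights to each sub-model's class predictions and mixes them. The paper's exact parameterization may differ.

```python
# One plausible reading of attention weighted stacking: learn per-example
# softmax weights over sub-model predictions and mix them.
import torch
import torch.nn as nn

n_models, n_classes = 4, 10
attn = nn.Linear(n_models * n_classes, n_models)

def stack(preds):                       # preds: (batch, n_models, n_classes)
    flat = preds.flatten(1)
    w = torch.softmax(attn(flat), dim=-1)        # per-example model weights
    return (w.unsqueeze(-1) * preds).sum(dim=1)  # weighted mixture

preds = torch.rand(8, n_models, n_classes)
print(stack(preds).shape)  # torch.Size([8, 10])
```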

Proximal Policy Optimization Algorithms

Title Proximal Policy Optimization Algorithms
Authors John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Abstract We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
Tasks Dota 2, Policy Gradient Methods
Published 2017-07-20
URL http://arxiv.org/abs/1707.06347v2
PDF http://arxiv.org/pdf/1707.06347v2.pdf
PWC https://paperswithcode.com/paper/proximal-policy-optimization-algorithms
Repo https://github.com/clwainwright/proximal_policy_optimization
Framework tf
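
The heart of PPO is the clipped surrogate objective from the paper, which is what makes multiple minibatch epochs per batch of sampled data safe. A minimal PyTorch rendering:

```python
# PPO's clipped surrogate objective: limit how far the new policy's
# probability ratio can move the objective away from the old policy.
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # maximize the surrogate

logp_new = torch.randn(32, requires_grad=True)
loss = ppo_clip_loss(logp_new, torch.randn(32), torch.randn(32))
loss.backward()
```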

Scalable Log Determinants for Gaussian Process Kernel Learning

Title Scalable Log Determinants for Gaussian Process Kernel Learning
Authors Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, Andrew Gordon Wilson
Abstract For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \times n$ positive definite matrix, and its derivatives - leading to prohibitive $\mathcal{O}(n^3)$ computations. We propose novel $\mathcal{O}(n)$ approaches to estimating these quantities from only fast matrix vector multiplications (MVMs). These stochastic approximations are based on Chebyshev, Lanczos, and surrogate models, and converge quickly even for kernel matrices that have challenging spectra. We leverage these approximations to develop a scalable Gaussian process approach to kernel learning. We find that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.
Tasks Gaussian Processes, Point Processes
Published 2017-11-09
URL http://arxiv.org/abs/1711.03481v1
PDF http://arxiv.org/pdf/1711.03481v1.pdf
PWC https://paperswithcode.com/paper/scalable-log-determinants-for-gaussian
Repo https://github.com/ericlee0803/GP_Derivatives
Framework none
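
One of the estimators the paper builds on (Hutchinson trace probes plus a Chebyshev approximation of log; the paper finds Lanczos generally better for kernel learning) can be sketched with numpy. The eigenvalue bounds lmin/lmax must be supplied by the caller, and the dense identity below is only for brevity; in a true MVM-only setting the shifted operator would be applied implicitly.

```python
# Stochastic Chebyshev estimator: log det(A) = tr log(A) ~ mean of
# Rademacher quadratic forms v^T p(B) v, where p approximates log on the
# rescaled spectrum and only matrix-vector products are needed.
import numpy as np

def logdet_chebyshev(A, lmin, lmax, deg=40, n_probes=50, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Chebyshev coefficients of log mapped back from [-1, 1] to [lmin, lmax]
    c = np.polynomial.chebyshev.chebinterpolate(
        lambda x: np.log((lmax - lmin) / 2 * x + (lmax + lmin) / 2), deg)
    B = (2 * A - (lmax + lmin) * np.eye(n)) / (lmax - lmin)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=n)        # Rademacher probe
        t_prev, t_cur = v, B @ v                   # Chebyshev recurrence
        acc = c[0] * (v @ t_prev) + c[1] * (v @ t_cur)
        for k in range(2, deg + 1):
            t_prev, t_cur = t_cur, 2 * (B @ t_cur) - t_prev
            acc += c[k] * (v @ t_cur)
        total += acc
    return total / n_probes

# quick check on a small SPD matrix
M = np.random.default_rng(1).normal(size=(50, 50))
A = M @ M.T + 50 * np.eye(50)
w = np.linalg.eigvalsh(A)
print(logdet_chebyshev(A, w[0], w[-1]), np.linalg.slogdet(A)[1])
```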

Unsupervised Image-to-Image Translation Networks

Title Unsupervised Image-to-Image Translation Networks
Authors Ming-Yu Liu, Thomas Breuel, Jan Kautz
Abstract Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can give rise to the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit .
Tasks Domain Adaptation, Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Unsupervised Image-To-Image Translation
Published 2017-03-02
URL http://arxiv.org/abs/1703.00848v6
PDF http://arxiv.org/pdf/1703.00848v6.pdf
PWC https://paperswithcode.com/paper/unsupervised-image-to-image-translation
Repo https://github.com/taki0112/UNIT-Tensorflow
Framework tf
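
The shared-latent-space assumption reduces to: encode in one domain, decode in the other. A deliberately tiny PyTorch sketch of that data flow follows; the actual model couples VAEs and GANs with weight sharing, and all adversarial and reconstruction losses are omitted here.

```python
# Shared-latent translation sketch: x1 -> z (shared) -> x2.
import torch
import torch.nn as nn

z_dim = 16
E1 = nn.Linear(32, z_dim)   # encoder for domain 1
E2 = nn.Linear(32, z_dim)   # encoder for domain 2
G1 = nn.Linear(z_dim, 32)   # generator back into domain 1
G2 = nn.Linear(z_dim, 32)   # generator back into domain 2

x1 = torch.rand(4, 32)
x1_to_2 = G2(E1(x1))        # translate domain 1 -> 2 through shared z
x1_cycle = G1(E2(x1_to_2))  # map back; training encourages x1_cycle ~ x1
print(x1_to_2.shape, x1_cycle.shape)
```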

AutoEncoder Inspired Unsupervised Feature Selection

Title AutoEncoder Inspired Unsupervised Feature Selection
Authors Kai Han, Yunhe Wang, Chao Zhang, Chao Li, Chao Xu
Abstract High-dimensional data in many areas, such as computer vision and machine learning, brings computational and analytical difficulties. Feature selection, which selects a subset of the observed features, is a widely used approach for improving the performance and effectiveness of machine learning models with high-dimensional data. In this paper, we propose a novel AutoEncoder Feature Selector (AEFS) for unsupervised feature selection which combines autoencoder regression and group lasso tasks. Compared to traditional feature selection methods, AEFS can select the most important features by exploiting both linear and nonlinear information among features, which is more flexible than the conventional self-representation method for unsupervised feature selection with only linear assumptions. Experimental results on benchmark datasets show that the proposed method is superior to state-of-the-art methods.
Tasks Feature Selection
Published 2017-10-23
URL http://arxiv.org/abs/1710.08310v3
PDF http://arxiv.org/pdf/1710.08310v3.pdf
PWC https://paperswithcode.com/paper/autoencoder-inspired-unsupervised-feature
Repo https://github.com/NoahLuffy/AEFS
Framework none
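
The AEFS objective combines autoencoder reconstruction with a group-lasso (L2,1) penalty on the first encoder layer, so whole input features can be driven to zero and the surviving weight norms rank the features. A runnable PyTorch sketch on random data, with an illustrative penalty weight:

```python
# AEFS-style objective: reconstruction loss + L2,1 penalty over the columns
# of the first encoder weight (one column per input feature).
import torch
import torch.nn as nn

n_features, hidden = 20, 8
enc = nn.Linear(n_features, hidden)
dec = nn.Linear(hidden, n_features)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

X = torch.rand(256, n_features)
for _ in range(200):
    recon = dec(torch.relu(enc(X)))
    group_lasso = enc.weight.norm(dim=0).sum()   # L2,1 over input features
    loss = nn.functional.mse_loss(recon, X) + 0.01 * group_lasso
    opt.zero_grad(); loss.backward(); opt.step()

scores = enc.weight.norm(dim=0)                  # importance per input feature
print(scores.topk(5).indices)                    # selected features
```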

DLTK: State of the Art Reference Implementations for Deep Learning on Medical Images

Title DLTK: State of the Art Reference Implementations for Deep Learning on Medical Images
Authors Nick Pawlowski, Sofia Ira Ktena, Matthew C. H. Lee, Bernhard Kainz, Daniel Rueckert, Ben Glocker, Martin Rajchl
Abstract We present DLTK, a toolkit providing baseline implementations for efficient experimentation with deep learning methods on biomedical images. It builds on top of TensorFlow and its high modularity and easy-to-use examples allow for a low-threshold access to state-of-the-art implementations for typical medical imaging problems. A comparison of DLTK’s reference implementations of popular network architectures for image segmentation demonstrates new top performance on the publicly available challenge data “Multi-Atlas Labeling Beyond the Cranial Vault”. The average test Dice similarity coefficient of $81.5$ exceeds the previously best performing CNN ($75.7$) and the accuracy of the challenge winning method ($79.0$).
Tasks Semantic Segmentation
Published 2017-11-18
URL http://arxiv.org/abs/1711.06853v1
PDF http://arxiv.org/pdf/1711.06853v1.pdf
PWC https://paperswithcode.com/paper/dltk-state-of-the-art-reference
Repo https://github.com/DLTK/DLTK
Framework tf

Lip2AudSpec: Speech reconstruction from silent lip movements video

Title Lip2AudSpec: Speech reconstruction from silent lip movements video
Authors Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani
Abstract In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip-movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound-generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as the target for our main lip-reading network comprising CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with 98% correlation and also improves the quality of speech reconstructed by the main lip-reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results for reconstructing intelligible speech with superior word recognition accuracy.
Tasks
Published 2017-10-26
URL http://arxiv.org/abs/1710.09798v1
PDF http://arxiv.org/pdf/1710.09798v1.pdf
PWC https://paperswithcode.com/paper/lip2audspec-speech-reconstruction-from-silent
Repo https://github.com/hassanhub/LipReading
Framework tf
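
The two-stage setup reads as: train an autoencoder on auditory spectrograms, then regress its bottleneck codes from video. The PyTorch sketch below compresses both networks to single-layer stand-ins; the paper's lip-reading network is a CNN+LSTM and its decoder reconstructs audible speech from the spectrogram.

```python
# Stand-in for the two-stage Lip2AudSpec setup: autoencoder bottleneck
# features become the regression target for the lip-reading network.
import torch
import torch.nn as nn

spec_dim, bottleneck = 128, 32
ae_enc = nn.Sequential(nn.Linear(spec_dim, bottleneck), nn.ReLU())
ae_dec = nn.Linear(bottleneck, spec_dim)

lip_net = nn.Sequential(                       # stand-in for the CNN+LSTM
    nn.Flatten(), nn.Linear(16 * 16, 64), nn.ReLU(), nn.Linear(64, bottleneck))

spec = torch.rand(8, spec_dim)                 # auditory spectrogram frames
frames = torch.rand(8, 1, 16, 16)              # matching mouth-region frames

target = ae_enc(spec).detach()                 # bottleneck features as target
loss = nn.functional.mse_loss(lip_net(frames), target)
speech = ae_dec(lip_net(frames))               # decode back to a spectrogram
print(loss.item(), speech.shape)
```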

Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models

Title Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models
Authors Hussein S. Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin, Amit Sheth
Abstract Extracting location names from informal and unstructured social media data requires the identification of referent boundaries and partitioning compound names. Variability, particularly systematic variability in location names (Carroll, 1983), challenges the identification task. Some of this variability can be anticipated as operations within a statistical language model, in this case drawn from gazetteers such as OpenStreetMap (OSM), Geonames, and DBpedia. This permits evaluation of an observed n-gram in Twitter targeted text as a legitimate location name variant from the same location-context. Using n-gram statistics and location-related dictionaries, our Location Name Extraction tool (LNEx) handles abbreviations and automatically filters and augments the location names in gazetteers (handling name contractions and auxiliary contents) to help detect the boundaries of multi-word location names and thereby delimit them in texts. We evaluated our approach on 4,500 event-specific tweets from three targeted streams to compare the performance of LNEx against that of ten state-of-the-art taggers that rely on standard semantic, syntactic and/or orthographic features. LNEx improved the average F-Score by 33-179%, outperforming all taggers. Further, LNEx is capable of stream processing.
Tasks Language Modelling
Published 2017-08-10
URL http://arxiv.org/abs/1708.03105v2
PDF http://arxiv.org/pdf/1708.03105v2.pdf
PWC https://paperswithcode.com/paper/location-name-extraction-from-targeted-text
Repo https://github.com/halolimat/LNEx
Framework none
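
Stripped of the statistical modeling, the gazetteer-driven matching can be illustrated with a greedy longest-n-gram scan. The toy gazetteer below is hypothetical, and the real LNEx additionally filters and augments gazetteer entries and handles abbreviations and contractions:

```python
# Toy longest-match n-gram scan against a (hypothetical) gazetteer.
gazetteer = {"new york", "new york city", "brooklyn bridge", "houston"}

def extract_locations(text, max_n=3):
    tokens = text.lower().split()
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):  # longest first
            cand = " ".join(tokens[i:i + n])
            if cand in gazetteer:
                spans.append(cand)
                i += n
                break
        else:
            i += 1
    return spans

print(extract_locations("flooding near brooklyn bridge and new york city"))
# ['brooklyn bridge', 'new york city']
```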

A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction

Title A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Authors Yi Zhang, Xu Sun
Abstract Abbreviation is a common phenomenon across languages, especially in Chinese. In most cases, if an expression can be abbreviated, its abbreviation is used more often than its fully expanded forms, since people tend to convey information in a most concise way. For various language processing tasks, abbreviation is an obstacle to improving the performance, as the textual form of an abbreviation does not express useful information, unless it’s expanded to the full form. Abbreviation prediction means associating the fully expanded forms with their abbreviations. However, due to the deficiency in the abbreviation corpora, such a task is limited in current studies, especially considering general abbreviation prediction should also include those full form expressions that do not have valid abbreviations, namely the negative full forms (NFFs). Corpora incorporating negative full forms for general abbreviation prediction are few in number. In order to promote the research in this area, we build a dataset for general Chinese abbreviation prediction, which needs a few preprocessing steps, and evaluate several different models on the built dataset. The dataset is available at https://github.com/lancopku/Chinese-abbreviation-dataset
Tasks
Published 2017-12-18
URL http://arxiv.org/abs/1712.06289v1
PDF http://arxiv.org/pdf/1712.06289v1.pdf
PWC https://paperswithcode.com/paper/a-chinese-dataset-with-negative-full-forms
Repo https://github.com/lancopku/Chinese-abbreviation-dataset
Framework none