October 18, 2019


Paper Group ANR 508

On the Complexity and Typology of Inflectional Morphological Systems. Scale Space Approximation in Convolutional Neural Networks for Retinal Vessel Segmentation. Liver Segmentation in Abdominal CT Images by Adaptive 3D Region Growing. Monte Carlo Information Geometry: The dually flat case. Precision Highway for Ultra Low-Precision Quantization. Alt …

On the Complexity and Typology of Inflectional Morphological Systems


Title	On the Complexity and Typology of Inflectional Morphological Systems
Authors	Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
Abstract	We quantify the linguistic complexity of different languages’ morphological systems. We verify that there is an empirical trade-off between paradigm size and irregularity: a language’s inflectional paradigms may be either large in size or highly irregular, but never both. Our methodology measures paradigm irregularity as the entropy of the surface realization of a paradigm – how hard it is to jointly predict all the surface forms of a paradigm. We estimate this by a variational approximation. Our measurements are taken on large morphological paradigms from 31 typologically diverse languages.
Tasks
Published	2018-07-08
URL	http://arxiv.org/abs/1807.02747v1
PDF	http://arxiv.org/pdf/1807.02747v1.pdf
PWC	https://paperswithcode.com/paper/on-the-complexity-and-typology-of
Repo
Framework

Scale Space Approximation in Convolutional Neural Networks for Retinal Vessel Segmentation


Title	Scale Space Approximation in Convolutional Neural Networks for Retinal Vessel Segmentation
Authors	Kyoung Jin Noh, Sang Jun Park, Soochahn Lee
Abstract	Retinal images have the highest resolution and clarity among medical images. Thus, vessel analysis in retinal images may facilitate early diagnosis and treatment of many chronic diseases. In this paper, we propose a novel multi-scale residual convolutional neural network structure based on a scale-space approximation (SSA) block of layers, comprising subsampling and subsequent upsampling, for multi-scale representation. Through analysis in the frequency domain, we show that this block structure is a close approximation of Gaussian filtering, the operation to achieve scale variations in scale-space theory. Experimental evaluations demonstrate that the proposed network outperforms current state-of-the-art methods. Ablative analysis shows that the SSA is indeed an important factor in performance improvement.
Tasks	Retinal Vessel Segmentation
Published	2018-06-24
URL	http://arxiv.org/abs/1806.09230v2
PDF	http://arxiv.org/pdf/1806.09230v2.pdf
PWC	https://paperswithcode.com/paper/scale-space-approximation-in-convolutional
Repo
Framework

Liver Segmentation in Abdominal CT Images by Adaptive 3D Region Growing


Title	Liver Segmentation in Abdominal CT Images by Adaptive 3D Region Growing
Authors	Shima Rafiei, Nader Karimi, Behzad Mirmahboub, S. M. Reza Soroushmehr, Banafsheh Felfelian, Shadrokh Samavi, Kayvan Najarian
Abstract	Automatic liver segmentation plays an important role in computer-aided diagnosis and treatment. Manual segmentation of organs is difficult and tedious, and therefore prone to human error. In this paper, we propose an adaptive 3D region growing method with subject-specific conditions. To this end, we use the intensity distribution of the most probable voxels in the prior map along with a location prior. We also incorporate the boundary of target organs to restrict the region growing. In order to obtain strong edges and high contrast, we propose an effective contrast enhancement algorithm to facilitate more accurate segmentation. Our method achieves a Dice score of 92.56%. We compare our method with hard thresholding on the Deeds prior map and with majority voting on Deeds registration with 13 organs.
Tasks	Liver Segmentation
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07794v2
PDF	http://arxiv.org/pdf/1802.07794v2.pdf
PWC	https://paperswithcode.com/paper/liver-segmentation-in-abdominal-ct-images-by
Repo
Framework
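
The abstract above rests on two generic building blocks: 3D region growing and the Dice score. The following is a minimal sketch of both, not the authors' method. The fixed intensity window (`low`, `high`), the 6-connectivity, and the toy volume are all assumptions made for illustration; the paper derives its growing conditions adaptively from a prior map.

```python
from collections import deque

def region_grow_3d(volume, seed, low, high):
    """Grow a 6-connected region from `seed`, accepting voxels whose
    intensity lies in [low, high]. `volume` is a nested list [z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    sz, sy, sx = seed
    if not (low <= volume[sz][sy][sx] <= high):
        return set()
    region = {seed}
    queue = deque([seed])
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbours:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if not inside or (nz, ny, nx) in region:
                continue
            if low <= volume[nz][ny][nx] <= high:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region

def dice_score(predicted, reference):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not predicted and not reference:
        return 1.0
    return 2.0 * len(predicted & reference) / (len(predicted) + len(reference))

# Toy volume: a bright 2x2x2 block plus one isolated bright voxel.
volume = [
    [[100, 100, 10], [100, 100, 10], [10, 10, 10]],
    [[100, 100, 10], [100, 95, 10], [10, 10, 100]],
]
region = region_grow_3d(volume, seed=(0, 0, 0), low=90, high=110)
# Hypothetical reference mask: the grown block plus the isolated voxel.
reference = region | {(1, 2, 2)}
score = dice_score(region, reference)
```

The isolated voxel at `(1, 2, 2)` is never reached because none of its 6-connected neighbours falls inside the intensity window, which is the kind of leakage control that boundary constraints are meant to provide.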

Monte Carlo Information Geometry: The dually flat case


Title	Monte Carlo Information Geometry: The dually flat case
Authors	Frank Nielsen, Gaëtan Hadjeres
Abstract	Exponential families and mixture families are parametric probability models that can be geometrically studied as smooth statistical manifolds with respect to any statistical divergence like the Kullback-Leibler (KL) divergence or the Hellinger divergence. When equipping a statistical manifold with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters. In practice, the corresponding Bregman generators of mixture/exponential families require evaluating definite integrals that can either be too time-consuming (for the exponentially large discrete support case) or admit no closed-form formula (for the continuous support case). In these cases, the dually flat construction remains theoretical and cannot be used by information-geometric algorithms. To bypass this problem, we consider performing stochastic Monte Carlo (MC) estimation of those integral-based mixture/exponential family Bregman generators. We show that, under natural assumptions, these MC generators are almost surely Bregman generators. We define a series of dually flat information geometries, termed Monte Carlo Information Geometries, that increasingly finely approximate the intractable geometry. The advantage of this MCIG is that it allows a practical use of the Bregman algorithmic toolbox on a wide range of probability distribution families. We demonstrate our approach with a clustering task on a mixture family manifold.
Tasks
Published	2018-03-20
URL	http://arxiv.org/abs/1803.07225v1
PDF	http://arxiv.org/pdf/1803.07225v1.pdf
PWC	https://paperswithcode.com/paper/monte-carlo-information-geometry-the-dually
Repo
Framework
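
To make the idea of a Monte Carlo Bregman generator concrete, here is a small sketch for a one-parameter mixture family m(x; η) = (1 − η) p0(x) + η p1(x). It assumes the generator is the negative Shannon entropy F(η) = ∫ m log m dx and estimates it by importance sampling from a fixed proposal. The two Gaussian components, the proposal, the sample size, and every function name are illustrative assumptions, not taken from the paper.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def p0(x):
    return gaussian_pdf(x, -2.0, 1.0)

def p1(x):
    return gaussian_pdf(x, 2.0, 1.0)

def mixture(x, eta):
    """One-parameter mixture family, linear in eta."""
    return (1.0 - eta) * p0(x) + eta * p1(x)

def proposal_pdf(x):
    """Fixed proposal q(x): the balanced mixture of both components."""
    return 0.5 * p0(x) + 0.5 * p1(x)

def draw_samples(n, seed=0):
    """Draw n i.i.d. points from the proposal; reused for every eta."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0 if rng.random() < 0.5 else 2.0, 1.0)
            for _ in range(n)]

def mc_generator(eta, samples):
    """MC estimate of F(eta) = integral of m log m (negative entropy)."""
    total = 0.0
    for x in samples:
        m = mixture(x, eta)
        total += m * math.log(m) / proposal_pdf(x)
    return total / len(samples)

def mc_generator_grad(eta, samples):
    """Exact derivative of the MC generator with respect to eta."""
    total = 0.0
    for x in samples:
        m = mixture(x, eta)
        total += (p1(x) - p0(x)) * (math.log(m) + 1.0) / proposal_pdf(x)
    return total / len(samples)

def mc_bregman(eta_a, eta_b, samples):
    """Bregman divergence induced by the MC generator."""
    return (mc_generator(eta_a, samples) - mc_generator(eta_b, samples)
            - (eta_a - eta_b) * mc_generator_grad(eta_b, samples))

samples = draw_samples(2000)
divergence = mc_bregman(0.3, 0.7, samples)
```

Because the same samples are reused for every η and t log t is convex, the estimated generator is itself convex in η, so the induced divergence is non-negative for any fixed sample set. Expanding the terms shows it equals an importance-sampled estimate of the extended KL divergence between the two mixtures.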

Precision Highway for Ultra Low-Precision Quantization


Title	Precision Highway for Ultra Low-Precision Quantization
Authors	Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo, Peter Vajda
Abstract	Neural network quantization has an inherent problem called accumulated quantization error, which is the key obstacle towards ultra-low precision, e.g., 2- or 3-bit precision. To resolve this problem, we propose precision highway, which forms an end-to-end high-precision information flow while performing the ultra low-precision computation. First, we describe how the precision highway reduces the accumulated quantization error in both convolutional and recurrent neural networks. We also provide a quantitative analysis of the benefit of precision highway and evaluate the overhead on a state-of-the-art hardware accelerator. In the experiments, our proposed method outperforms the best existing quantization methods while offering 3-bit weight/activation quantization with no accuracy loss and 2-bit quantization with a 2.45% top-1 accuracy loss in ResNet-50. We also report that the proposed method significantly outperforms the existing method in the 2-bit quantization of an LSTM for language modeling.
Tasks	Language Modelling, Quantization
Published	2018-12-24
URL	http://arxiv.org/abs/1812.09818v1
PDF	http://arxiv.org/pdf/1812.09818v1.pdf
PWC	https://paperswithcode.com/paper/precision-highway-for-ultra-low-precision
Repo
Framework

Alternating Multi-bit Quantization for Recurrent Neural Networks


Title	Alternating Multi-bit Quantization for Recurrent Neural Networks
Authors	Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, Hongbin Zha
Abstract	Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For applications on the server with large scale concurrent requests, the latency during inference can also be very critical for costly computing resources. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1,+1}. We formulate the quantization as an optimization problem. Under the key observation that once the quantization coefficients are fixed the binary codes can be derived efficiently by a binary search tree, alternating minimization is then applied. We test the quantization for two well-known RNNs, i.e., long short term memory (LSTM) and gated recurrent unit (GRU), on the language models. Compared with the full-precision counterpart, by 2-bit quantization we can achieve ~16x memory saving and ~6x real inference acceleration on CPUs, with only a reasonable loss in the accuracy. By 3-bit quantization, we can achieve almost no loss in the accuracy or even surpass the original model, with ~10.5x memory saving and ~3x real inference acceleration. Both results beat the existing quantization works with large margins. We extend our alternating quantization to image classification tasks. In both RNNs and feedforward neural networks, the method also achieves excellent performance.
Tasks	Image Classification, Quantization
Published	2018-02-01
URL	http://arxiv.org/abs/1802.00150v1
PDF	http://arxiv.org/pdf/1802.00150v1.pdf
PWC	https://paperswithcode.com/paper/alternating-multi-bit-quantization-for
Repo
Framework
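
The alternating scheme described in the abstract can be sketched for the 2-bit case, where each weight is approximated as α1·b1 + α2·b2 with b1, b2 in {-1, +1}. This is an illustrative sketch under stated assumptions, not the authors' implementation: the greedy initialisation and all function names are assumptions, and the code step uses a brute-force search over the four candidate levels in place of the binary search tree mentioned in the abstract.

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def squared_error(weights, alphas, codes):
    return sum((w - alphas[0] * b[0] - alphas[1] * b[1]) ** 2
               for w, b in zip(weights, codes))

def greedy_init(weights):
    """Quantize the weights to 1 bit, then quantize the residual."""
    n = len(weights)
    b1 = [sign(w) for w in weights]
    a1 = sum(abs(w) for w in weights) / n
    residual = [w - a1 * b for w, b in zip(weights, b1)]
    b2 = [sign(r) for r in residual]
    a2 = sum(abs(r) for r in residual) / n
    return [a1, a2], list(zip(b1, b2))

def refit_alphas(weights, codes, fallback):
    """Least-squares coefficients for fixed codes (2x2 normal equations)."""
    n = float(len(weights))
    s = sum(b[0] * b[1] for b in codes)
    det = n * n - s * s
    if det == 0.0:  # both code vectors are collinear; keep old values
        return fallback
    t1 = sum(w * b[0] for w, b in zip(weights, codes))
    t2 = sum(w * b[1] for w, b in zip(weights, codes))
    return [(n * t1 - s * t2) / det, (n * t2 - s * t1) / det]

def reassign_codes(weights, alphas):
    """For fixed coefficients, pick the nearest of the 4 levels per weight."""
    candidates = [(p, q) for p in (-1.0, 1.0) for q in (-1.0, 1.0)]
    return [min(candidates,
                key=lambda c: (w - alphas[0] * c[0] - alphas[1] * c[1]) ** 2)
            for w in weights]

def alternating_quantize(weights, iterations=5):
    alphas, codes = greedy_init(weights)
    errors = [squared_error(weights, alphas, codes)]
    for _ in range(iterations):
        alphas = refit_alphas(weights, codes, alphas)
        codes = reassign_codes(weights, alphas)
        errors.append(squared_error(weights, alphas, codes))
    return alphas, codes, errors

rng = random.Random(1)
weights = [rng.gauss(0.0, 1.0) for _ in range(64)]
alphas, codes, errors = alternating_quantize(weights)
```

Each half-step minimises the squared error with the other block of variables held fixed, so the recorded `errors` never increase, up to floating-point rounding. Storing two bits per weight instead of a 32-bit float is where the roughly 16x memory saving quoted in the abstract comes from.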

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model


Title	Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model
Authors	Suwon Shon, Hao Tang, James Glass
Abstract	In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings. The embedding can be extracted efficiently with linear activation in the embedding layer. To understand how the speaker recognition model operates with text-independent input, we modify the structure to extract frame-level speaker embeddings from each hidden layer. We feed utterances from the TIMIT dataset to the trained network and use several proxy tasks to study the network's ability to represent speech input and differentiate voice identity. We found that the networks are better at discriminating broad phonetic classes than individual phonemes. In particular, frame-level embeddings that belong to the same phonetic classes are similar (based on cosine distance) for the same speaker. The frame-level representation also allows us to analyze the networks at the frame level, and has the potential for other analyses to improve speaker recognition.
Tasks	Speaker Recognition, Text-Independent Speaker Recognition
Published	2018-09-12
URL	http://arxiv.org/abs/1809.04437v1
PDF	http://arxiv.org/pdf/1809.04437v1.pdf
PWC	https://paperswithcode.com/paper/frame-level-speaker-embeddings-for-text
Repo
Framework

An Iterative Path-Breaking Approach with Mutation and Restart Strategies for the MAX-SAT Problem


Title	An Iterative Path-Breaking Approach with Mutation and Restart Strategies for the MAX-SAT Problem
Authors	Zhen-Xing Xu, Kun He, Chu-Min Li
Abstract	Although Path-Relinking is an effective local search method for many combinatorial optimization problems, its application is not straightforward in solving MAX-SAT, an optimization variant of the satisfiability problem (SAT) that has many real-world applications and has gained more and more attention in academia and industry. Indeed, to our knowledge, it has not been used in any recent competitive MAX-SAT algorithm. In this paper, we propose a new local search algorithm called IPBMR for MAX-SAT that remedies the drawbacks of the Path-Relinking method by using a careful combination of three components: a new strategy named Path-Breaking to avoid unpromising regions of the search space when generating trajectories between two elite solutions; weak and strong mutation strategies, together with restarts, to diversify the search; and stochastic path generating steps to avoid premature local optimum solutions. We then present experimental results to show that IPBMR outperforms two of the best state-of-the-art MAX-SAT solvers, and an empirical investigation to identify and explain the effect of the three components in IPBMR.
Tasks	Combinatorial Optimization
Published	2018-08-10
URL	http://arxiv.org/abs/1808.03611v1
PDF	http://arxiv.org/pdf/1808.03611v1.pdf
PWC	https://paperswithcode.com/paper/an-iterative-path-breaking-approach-with
Repo
Framework
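
For readers unfamiliar with the baseline that IPBMR builds on, the following is a minimal sketch of plain greedy Path-Relinking for MAX-SAT. It does not implement Path-Breaking, the mutation strategies, or restarts. The clause encoding (signed 1-based integers, as in DIMACS files), the function names, and the toy instance are assumptions chosen for illustration.

```python
def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal.
    Literal k > 0 means variable k is true; k < 0 means it is false."""
    return sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def path_relink(clauses, start, target):
    """Walk from `start` to `target`, one differing variable at a time,
    always taking the flip that satisfies the most clauses, and return
    the best assignment seen along the path."""
    current = list(start)
    best, best_score = list(start), count_satisfied(clauses, start)
    differing = [i for i in range(len(start)) if start[i] != target[i]]

    def score_after_flip(i):
        current[i] = not current[i]
        score = count_satisfied(clauses, current)
        current[i] = not current[i]  # undo the trial flip
        return score

    while differing:
        chosen = max(differing, key=score_after_flip)
        current[chosen] = target[chosen]
        differing.remove(chosen)
        score = count_satisfied(clauses, current)
        if score > best_score:
            best, best_score = list(current), score
    return best, best_score

# Toy instance with three variables and five clauses.
clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3], [2, 3]]
elite_a = [True, False, False]   # satisfies 3 clauses
elite_b = [False, True, True]    # satisfies 3 clauses
best, best_score = path_relink(clauses, elite_a, elite_b)
```

On this instance the intermediate assignment `[True, False, True]` satisfies all five clauses even though both endpoints satisfy only three, which is the effect Path-Relinking is meant to exploit. Since the walk starts at one elite solution and ends at the other, the returned score is never worse than either endpoint.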

Compressive Light Field Reconstructions using Deep Learning


Title	Compressive Light Field Reconstructions using Deep Learning
Authors	Mayank Gupta, Arjun Jauhari, Kuldeep Kulkarni, Suren Jayasuriya, Alyosha Molnar, Pavan Turaga
Abstract	Light field imaging is limited by the computational processing demands of high sampling in both spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.
Tasks	Compressive Sensing
Published	2018-02-05
URL	http://arxiv.org/abs/1802.01722v1
PDF	http://arxiv.org/pdf/1802.01722v1.pdf
PWC	https://paperswithcode.com/paper/compressive-light-field-reconstructions-using
Repo
Framework

Coulomb Autoencoders


Title	Coulomb Autoencoders
Authors	Emanuele Sansone, Hafiz Tiomoko Ali, Sun Jiacheng
Abstract	Learning the true density in high-dimensional feature spaces is a well-known problem in machine learning. In this work, we consider generative autoencoders based on maximum-mean discrepancy (MMD) and provide theoretical insights. In particular, (i) we prove that MMD coupled with Coulomb kernels has optimal convergence properties, which are similar to convex functionals, thus improving the training of autoencoders, and (ii) we provide a probabilistic bound on the generalization performance, highlighting some fundamental conditions to achieve better generalization. We validate the theory on synthetic examples and on the popular dataset of celebrities’ faces, showing that our model, called Coulomb autoencoders, outperforms the state-of-the-art.
Tasks
Published	2018-02-10
URL	https://arxiv.org/abs/1802.03505v6
PDF	https://arxiv.org/pdf/1802.03505v6.pdf
PWC	https://paperswithcode.com/paper/coulomb-autoencoders
Repo
Framework

chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models


Title	chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models
Authors	Jeremy R. Ash, Jacqueline M. Hughes-Oliver
Abstract	The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of new models. While focused on implementing methods for model fitting and assessment that have been accepted by experts in the cheminformatics field, all of the methods in chemmodlab have broad utility for the machine learning community. chemmodlab contains several assessment utilities including a plotting function that constructs accumulation curves and a function that computes many performance measures. The most novel feature of chemmodlab is the ease with which statistically significant performance differences for many machine learning models are presented by means of the multiple comparisons similarity plot. Differences are assessed using repeated k-fold cross validation where blocking increases precision and multiplicity adjustments are applied.
Tasks
Published	2018-06-30
URL	http://arxiv.org/abs/1807.00243v3
PDF	http://arxiv.org/pdf/1807.00243v3.pdf
PWC	https://paperswithcode.com/paper/chemmodlab-a-cheminformatics-modeling
Repo
Framework
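
The role of blocking in repeated k-fold cross validation can be illustrated with a short sketch. chemmodlab itself is an R package; the Python below is not its API and only shows the general idea, assuming that "blocking" means every model is evaluated on the identical train/test splits so that performance differences are paired. The function name and parameters are hypothetical.

```python
import random

def repeated_kfold_splits(n_samples, k, repeats, seed=0):
    """Return a list of (repeat, train_indices, test_indices) tuples.
    Every repeat reshuffles the data and partitions it into k folds."""
    rng = random.Random(seed)
    splits = []
    for repeat in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Fold i takes every k-th shuffled index starting at position i.
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = sorted(folds[held_out])
            train = sorted(
                index
                for fold_id, fold in enumerate(folds)
                if fold_id != held_out
                for index in fold
            )
            splits.append((repeat, train, test))
    return splits

# Generate the splits once, then reuse them for every model under comparison.
splits = repeated_kfold_splits(n_samples=20, k=5, repeats=3)
```

Because the list is generated once and shared, the split acts as a block: two models are compared within the same split, and split-to-split variation cancels out of their difference.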

Deep Adversarial Context-Aware Landmark Detection for Ultrasound Imaging


Title	Deep Adversarial Context-Aware Landmark Detection for Ultrasound Imaging
Authors	Ahmet Tuysuzoglu, Jeremy Tan, Kareem Eissa, Atilla P. Kiraly, Mamadou Diallo, Ali Kamen
Abstract	Real-time localization of the prostate gland in trans-rectal ultrasound images is a key technology that is required to automate ultrasound-guided prostate biopsy procedures. In this paper, we propose a new deep learning based approach which is aimed at localizing several prostate landmarks efficiently and robustly. We propose a multitask learning approach primarily to make the overall algorithm more contextually aware. In this approach, we not only consider the explicit learning of landmark locations, but also build in a mechanism to learn the contour of the prostate. This multitask learning is further coupled with an adversarial arm to promote the generation of feasible structures. We have trained this network using ~4000 labeled trans-rectal ultrasound images and tested on an independent set of images with ground truth landmark locations. We have achieved an overall Dice score of 92.6% for the adversarially trained multitask approach, which is significantly better than the Dice score of 88.3% obtained by only learning of landmark locations. The overall mean distance error using the adversarial multitask approach has also improved by 20% while reducing the standard deviation of the error compared to learning landmark locations only. In terms of computational complexity, both approaches can process the images in real time using a standard computer with a standard CUDA-enabled GPU.
Tasks
Published	2018-05-28
URL	http://arxiv.org/abs/1805.10737v1
PDF	http://arxiv.org/pdf/1805.10737v1.pdf
PWC	https://paperswithcode.com/paper/deep-adversarial-context-aware-landmark
Repo
Framework

Learning from Multiview Correlations in Open-Domain Videos


Title	Learning from Multiview Correlations in Open-Domain Videos
Authors	Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, Raman Arora
Abstract	An increasing number of datasets contain multiple views, such as video, sound and automatic captions. A basic challenge in representation learning is how to leverage multiple views to learn better representations. This is further complicated by the existence of a latent alignment between views, such as between speech and its transcription, and by the multitude of choices for the learning objective. We explore an advanced, correlation-based representation learning method on a 4-way parallel, multimodal dataset, and assess the quality of the learned representations on retrieval-based tasks. We show that the proposed approach produces rich representations that capture most of the information shared across views. Our best models for speech and textual modalities achieve retrieval rates from 70.7% to 96.9% on open-domain, user-generated instructional videos. This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.
Tasks	Representation Learning
Published	2018-11-21
URL	http://arxiv.org/abs/1811.08890v2
PDF	http://arxiv.org/pdf/1811.08890v2.pdf
PWC	https://paperswithcode.com/paper/learning-from-multiview-correlations-in-open
Repo
Framework

DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery


Title	DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery
Authors	Junming Zhang, Katherine A. Skinner, Ram Vasudevan, Matthew Johnson-Roberson
Abstract	Recent work has shown that convolutional neural networks (CNNs) can be applied successfully in disparity estimation, but these methods still suffer from errors in regions of low-texture, occlusions and reflections. Concurrently, deep learning for semantic segmentation has shown great progress in recent years. In this paper, we design a CNN architecture that combines these two tasks to improve the quality and accuracy of disparity estimation with the help of semantic segmentation. Specifically, we propose a network structure in which these two tasks are highly coupled. One key novelty of this approach is the two-stage refinement process. Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network. The proposed model is trained using an unsupervised approach, in which images from one half of the stereo pair are warped and compared against images from the other camera. Another key advantage of the proposed approach is that a single network is capable of outputting disparity estimates and semantic labels. These outputs are of great use in autonomous vehicle operation; with real-time constraints being key, such performance improvements increase the viability of driving applications. Experiments on KITTI and Cityscapes datasets show that our model can achieve state-of-the-art results and that leveraging embedding learned from semantic segmentation improves the performance of disparity estimation.
Tasks	Disparity Estimation, Semantic Segmentation
Published	2018-09-13
URL	http://arxiv.org/abs/1809.04734v2
PDF	http://arxiv.org/pdf/1809.04734v2.pdf
PWC	https://paperswithcode.com/paper/dispsegnet-leveraging-semantics-for-end-to
Repo
Framework

Semantic Structural Evaluation for Text Simplification


Title	Semantic Structural Evaluation for Text Simplification
Authors	Elior Sulem, Omri Abend, Ari Rappoport
Abstract	Current measures for evaluating text simplification systems focus on evaluating lexical text aspects, neglecting its structural aspects. In this paper we propose the first measure to address structural aspects of text simplification, called SAMSA. It leverages recent advances in semantic parsing to assess simplification quality by decomposing the input based on its semantic structure and comparing it to the output. SAMSA provides a reference-less automatic evaluation procedure, avoiding the problems that reference-based methods face due to the vast space of valid simplifications for a given sentence. Our human evaluation experiments show both SAMSA’s substantial correlation with human judgments, as well as the deficiency of existing reference-based measures in evaluating structural simplification.
Tasks	Semantic Parsing, Text Simplification
Published	2018-10-11
URL	http://arxiv.org/abs/1810.05022v1
PDF	http://arxiv.org/pdf/1810.05022v1.pdf
PWC	https://paperswithcode.com/paper/semantic-structural-evaluation-for-text
Repo
Framework