Paper Group AWR 9
Hamiltonian Monte Carlo for Probabilistic Programs with Discontinuities
Title | Hamiltonian Monte Carlo for Probabilistic Programs with Discontinuities |
Authors | Bradley Gram-Hansen, Yuan Zhou, Tobias Kohn, Tom Rainforth, Hongseok Yang, Frank Wood |
Abstract | Hamiltonian Monte Carlo (HMC) is arguably the dominant statistical inference algorithm used in most popular “first-order differentiable” Probabilistic Programming Languages (PPLs). However, the fact that HMC uses derivative information causes complications when the target distribution is non-differentiable with respect to one or more of the latent variables. In this paper, we show how to use extensions to HMC to perform inference in probabilistic programs that contain discontinuities. To do this, we design a Simple first-order Probabilistic Programming Language (SPPL) that contains a sufficient set of language restrictions together with a compilation scheme. This enables us to preserve both the statistical and syntactic interpretation of if-else statements in the probabilistic program, within the scope of first-order PPLs. We also provide a corresponding mathematical formalism that ensures any joint density denoted in such a language has a suitably low measure of discontinuities. |
Tasks | Probabilistic Programming |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.03523v2 |
PDF | http://arxiv.org/pdf/1804.03523v2.pdf |
PWC | https://paperswithcode.com/paper/hamiltonian-monte-carlo-for-probabilistic |
Repo | https://github.com/bradleygramhansen/pyfo |
Framework | pytorch |
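Since the entry above centers on HMC, the following NumPy sketch shows a vanilla leapfrog HMC transition for a differentiable log density. It is only the standard baseline algorithm that the paper extends; it is not the discontinuity-aware variant, SPPL, or the pyfo implementation, and the step size and trajectory length are arbitrary illustrative choices.

```python
# Minimal sketch of standard HMC with leapfrog integration (NumPy only).
# Illustrates the baseline the paper builds on, NOT the discontinuous-HMC variant.
import numpy as np

def hmc_step(x, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20, rng=np.random):
    """One HMC transition for a differentiable log density."""
    p = rng.standard_normal(x.shape)                 # sample momentum
    x_new, p_new = x.copy(), p.copy()

    # Leapfrog integration of the Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new += step_size * grad_log_prob(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(x_new)

    # Metropolis accept/reject using the Hamiltonian (potential + kinetic energy)
    current_h  = -log_prob(x) + 0.5 * np.dot(p, p)
    proposed_h = -log_prob(x_new) + 0.5 * np.dot(p_new, p_new)
    return x_new if np.log(rng.uniform()) < current_h - proposed_h else x

# Example: sample from a standard 2-D Gaussian
log_prob = lambda x: -0.5 * np.sum(x ** 2)
grad_log_prob = lambda x: -x
samples, x = [], np.zeros(2)
for _ in range(1000):
    x = hmc_step(x, log_prob, grad_log_prob)
    samples.append(x)
```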
Insect cyborgs: Bio-mimetic feature generators improve machine learning accuracy on limited data
Title | Insect cyborgs: Bio-mimetic feature generators improve machine learning accuracy on limited data |
Authors | Charles B Delahunt, J Nathan Kutz |
Abstract | Machine learning (ML) classifiers always benefit from more informative input features. We seek to auto-generate stronger feature sets in order to address the difficulty that ML methods often experience given limited training data. A wide range of biological neural nets (BNNs) excel at fast learning, implying that they are adept at extracting informative features. We can thus look to BNNs for tools to improve ML performance in this low-data regime. The insect olfactory network learns new odors very rapidly, by means of three key elements: A competitive inhibition layer; a high-dimensional sparse plastic layer; and Hebbian updates of synaptic weights. In this work, we deployed MothNet, a computational model of the insect olfactory network, as an automatic feature generator: Attached as a front-end pre-processor, its Readout Neurons provided new features, derived from the original features, for use by standard ML classifiers. We found that these “insect cyborgs”, i.e. classifiers that are part-insect model and part-ML method, had significantly better performance than baseline ML methods alone on a vectorized MNIST dataset. The MothNet feature generator also substantially out-performed other feature generating methods such as PCA, PLS, and NNs, as well as pre-training to initialize NN weights. Cyborgs improved relative test set accuracy by an average of 6% to 33% depending on baseline ML accuracy, while relative reduction in test set error exceeded 50% for higher baseline accuracy ML models. These results indicate the potential value of BNN-inspired feature generators in the ML context. |
Tasks | |
Published | 2018-08-23 |
URL | https://arxiv.org/abs/1808.08124v4 |
PDF | https://arxiv.org/pdf/1808.08124v4.pdf |
PWC | https://paperswithcode.com/paper/insect-cyborgs-bio-mimetic-feature-generators |
Repo | https://github.com/charlesDelahunt/PuttingABugInML |
Framework | none |
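As a rough illustration of the "cyborg" pattern described above (a front-end feature generator whose outputs are appended to the raw features before a standard classifier), here is a hedged Python sketch. The sparse random projection is only a stand-in for MothNet's readout neurons, and all sizes are assumptions.

```python
# Hedged sketch of the "cyborg" pattern: a front-end generator produces extra
# features that are concatenated with the raw features before a standard ML
# classifier. The sparse random projection below is only a placeholder for the
# MothNet readout neurons described in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def generate_features(X, n_new=20, sparsity=0.9, seed=0):
    """Stand-in feature generator: sparse random projection + ReLU."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_new))
    W[rng.random(W.shape) < sparsity] = 0.0      # sparse connectivity
    return np.maximum(X @ W, 0.0)                # ReLU nonlinearity

def fit_cyborg(X_train, y_train):
    X_aug = np.hstack([X_train, generate_features(X_train)])
    return LogisticRegression(max_iter=1000).fit(X_aug, y_train)

# At test time the same generator must be applied before prediction:
# clf.predict(np.hstack([X_test, generate_features(X_test)]))
```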
Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes
Title | Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes |
Authors | Fangneng Zhan, Shijian Lu, Chuhui Xue |
Abstract | The requirement of large amounts of annotated images has become a grand challenge in training deep neural network models for various visual detection and recognition tasks. This paper presents a novel image synthesis technique that aims to generate a large amount of annotated scene text images for training accurate and robust scene text detection and recognition models. The proposed technique consists of three innovative designs. First, it realizes “semantic coherent” synthesis by embedding texts at semantically sensible regions within the background image, where the semantic coherence is achieved by leveraging the semantic annotations of objects and image regions created in prior semantic segmentation research. Second, it exploits visual saliency to determine the embedding locations within each semantically sensible region, which accords with the fact that texts are often placed in homogeneous regions for better visibility in scenes. Third, it designs an adaptive text appearance model that determines the color and brightness of embedded texts by learning adaptively from the features of real scene text images. The proposed technique has been evaluated over five public datasets, and the experiments show its superior performance in training accurate and robust scene text detection and recognition models. |
Tasks | Image Generation, Scene Text Detection, Semantic Segmentation |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03021v2 |
PDF | http://arxiv.org/pdf/1807.03021v2.pdf |
PWC | https://paperswithcode.com/paper/verisimilar-image-synthesis-for-accurate |
Repo | https://github.com/fnzhan/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes |
Framework | none |
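To make one ingredient of this pipeline concrete, the sketch below picks a text-embedding location inside an allowed (semantically sensible) region by preferring homogeneous, low-variance patches. This is a crude stand-in for the saliency-guided placement the abstract describes, not the authors' synthesis code; the patch size and the variance criterion are assumptions.

```python
# Illustrative stand-in for saliency-guided text placement: choose the most
# homogeneous (lowest-variance) patch that lies fully inside the allowed region.
import numpy as np

def pick_placement(gray, region_mask, patch=32):
    """gray: (H, W) grayscale image; region_mask: (H, W) bool array of allowed pixels.
    Returns the (row, col) of the best patch, or None if no patch fits."""
    h, w = gray.shape
    best, best_score = None, np.inf
    for r in range(0, h - patch, patch // 2):
        for c in range(0, w - patch, patch // 2):
            if not region_mask[r:r + patch, c:c + patch].all():
                continue                          # patch leaves the allowed region
            score = gray[r:r + patch, c:c + patch].var()
            if score < best_score:
                best, best_score = (r, c), score
    return best
```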
Exploiting High-Level Semantics for No-Reference Image Quality Assessment of Realistic Blur Images
Title | Exploiting High-Level Semantics for No-Reference Image Quality Assessment of Realistic Blur Images |
Authors | Dingquan Li, Tingting Jiang, Ming Jiang |
Abstract | To guarantee a satisfying Quality of Experience (QoE) for consumers, image quality must be measured efficiently and reliably. Neglecting high-level semantic information may result in a clear blue sky being predicted as bad quality, which is inconsistent with human perception. Therefore, in this paper, we tackle this problem by exploiting high-level semantics and propose a novel no-reference image quality assessment method for realistic blur images. Firstly, the whole image is divided into multiple overlapping patches. Secondly, each patch is represented by high-level features extracted from a pre-trained deep convolutional neural network model. Thirdly, three different kinds of statistical structures are adopted to aggregate the information from different patches, which mainly contain some common statistics (i.e., the mean & standard deviation, quantiles, and moments). Finally, the aggregated features are fed into a linear regression model to predict the image quality. Experiments show that, compared with low-level features, high-level features indeed play a more critical role in resolving this challenging quality-estimation problem. Besides, the proposed method significantly outperforms the state-of-the-art methods on two realistic blur image databases and achieves comparable performance on two synthetic blur image databases. |
Tasks | Blind Image Quality Assessment, Image Quality Assessment, Image Quality Estimation, No-Reference Image Quality Assessment |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.08169v1 |
PDF | http://arxiv.org/pdf/1810.08169v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-high-level-semantics-for-no |
Repo | https://github.com/lidq92/SFA |
Framework | pytorch |
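A hedged sketch of the described pipeline (overlapping patches, pre-trained CNN features, statistical aggregation, linear regression) follows. The ResNet-50 backbone, patch size, stride, and the mean/std aggregation are illustrative assumptions, not necessarily the settings of the SFA repository.

```python
# Sketch: overlapping patches -> pretrained-CNN features -> statistics -> regression.
# Backbone and hyperparameters are assumptions for illustration only.
import numpy as np
import torch
import torchvision.models as models
from sklearn.linear_model import LinearRegression

# Older torchvision API; newer versions use the weights= argument instead.
backbone = models.resnet50(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

def image_feature(img, patch=224, stride=128):
    """img: float tensor (3, H, W), already normalized. Returns an aggregated feature vector."""
    _, H, W = img.shape
    patches = [img[:, r:r + patch, c:c + patch]
               for r in range(0, H - patch + 1, stride)
               for c in range(0, W - patch + 1, stride)]
    with torch.no_grad():
        feats = feature_extractor(torch.stack(patches)).flatten(1)  # (n_patches, 2048)
    # Aggregate patch features with simple statistics (mean and std here).
    return torch.cat([feats.mean(0), feats.std(0)]).numpy()

# X = np.stack([image_feature(img) for img in images])
# quality_model = LinearRegression().fit(X, mos_scores)
```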
A Multi-View Ensemble Classification Model for Clinically Actionable Genetic Mutations
Title | A Multi-View Ensemble Classification Model for Clinically Actionable Genetic Mutations |
Authors | Xi Sheryl Zhang, Dandi Chen, Yongjun Zhu, Chao Che, Chang Su, Sendong Zhao, Xu Min, Fei Wang |
Abstract | This paper presents details of our winning solution to task IV of the NIPS 2017 Competition Track, Classifying Clinically Actionable Genetic Mutations. The machine learning task aims to classify genetic mutations based on text evidence from the clinical literature. We develop a novel multi-view machine learning framework with ensemble classification models to solve the problem. During the challenge, feature combinations derived from three views (document view, entity text view, and entity name view), which complement each other, were comprehensively explored. As the final solution, we submitted an ensemble of nine basic gradient boosting models, which showed the best performance in the evaluation. The approach scores 0.5506 and 0.6694 in terms of logarithmic loss on a fixed split in the stage-1 testing phase and under 5-fold cross-validation, respectively, which ranked us as the top team out of more than 1,300 solutions in NIPS 2017 Competition Track IV. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09737v2 |
PDF | http://arxiv.org/pdf/1806.09737v2.pdf |
PWC | https://paperswithcode.com/paper/the-nips17-competition-a-multi-view-ensemble |
Repo | https://github.com/sheryl-ai/NeurIPS17-Competition-Classifying-Genetic-Mutations |
Framework | none |
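In the spirit of the solution above, the sketch below trains one gradient boosting model per feature view and averages the predicted class probabilities. The three views are assumed to be pre-computed feature matrices; the plain soft-voting combination is an illustration, not the authors' exact nine-model ensemble.

```python
# Minimal multi-view ensemble sketch: one gradient-boosting model per view,
# class probabilities averaged (soft voting), evaluated with logarithmic loss.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

def fit_multiview(views_train, y_train):
    """views_train: list of (n_samples, n_features_i) arrays, one per view."""
    return [GradientBoostingClassifier().fit(X, y_train) for X in views_train]

def predict_multiview(models, views_test):
    probs = [m.predict_proba(X) for m, X in zip(models, views_test)]
    return np.mean(probs, axis=0)                # simple soft-voting ensemble

# loss = log_loss(y_test, predict_multiview(models, views_test))
```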
Virtual CNN Branching: Efficient Feature Ensemble for Person Re-Identification
Title | Virtual CNN Branching: Efficient Feature Ensemble for Person Re-Identification |
Authors | Albert Gong, Qiang Qiu, Guillermo Sapiro |
Abstract | In this paper we introduce an ensemble method for convolutional neural networks (CNNs), called “virtual branching,” which can be implemented with nearly no additional parameters or computation on top of standard CNNs. We propose our method in the context of person re-identification (re-ID). Our CNN model consists of shared bottom layers, followed by “virtual” branches, where neurons from a block of regular convolutional and fully-connected layers are partitioned into multiple sets. Each virtual branch is trained with different data to specialize in different aspects, e.g., a specific body region or pose orientation. In this way, robust ensemble representations are obtained against human body misalignment, deformations, or variations in viewing angles, at nearly no additional cost. The proposed method achieves competitive performance on multiple person re-ID benchmark datasets, including Market-1501, CUHK03, and DukeMTMC-reID. |
Tasks | Person Re-Identification |
Published | 2018-03-15 |
URL | http://arxiv.org/abs/1803.05872v1 |
PDF | http://arxiv.org/pdf/1803.05872v1.pdf |
PWC | https://paperswithcode.com/paper/virtual-cnn-branching-efficient-feature |
Repo | https://github.com/agongt408/vbranch |
Framework | tf |
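A rough sketch of the virtual-branching idea: the final embedding layer is split into K slices ("virtual branches"); during training each sample contributes only through the slice of its assigned branch, and at test time all slices together form the ensemble descriptor. The layer sizes, the masking scheme, and the branch-assignment rule are assumptions for illustration, not the authors' implementation (which is in TensorFlow).

```python
# Hedged sketch of virtual branching: one shared embedding layer whose output
# units are partitioned into branch-specific slices.
import torch
import torch.nn as nn

class VirtualBranchHead(nn.Module):
    def __init__(self, in_dim=512, emb_dim=256, n_branches=4, n_ids=751):
        super().__init__()
        assert emb_dim % n_branches == 0
        self.n_branches, self.slice = n_branches, emb_dim // n_branches
        self.embed = nn.Linear(in_dim, emb_dim)          # shared weights, sliced outputs
        self.classifier = nn.Linear(emb_dim, n_ids)

    def forward(self, shared_feat, branch_id=None):
        emb = self.embed(shared_feat)                    # (N, emb_dim)
        if branch_id is not None:                        # training: keep one slice per sample
            mask = torch.zeros_like(emb)
            for b in range(self.n_branches):
                sel = branch_id == b
                mask[sel, b * self.slice:(b + 1) * self.slice] = 1.0
            emb = emb * mask
        return self.classifier(emb), emb                 # ID logits, ensemble embedding
```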
Dynamic Channel Pruning: Feature Boosting and Suppression
Title | Dynamic Channel Pruning: Feature Boosting and Suppression |
Authors | Xitong Gao, Yiren Zhao, Łukasz Dudziak, Robert Mullins, Cheng-zhong Xu |
Abstract | Making deep convolutional neural networks more accurate typically comes at the cost of increased computational and memory resources. In this paper, we reduce this cost by exploiting the fact that the importance of features computed by convolutional layers is highly input-dependent, and propose feature boosting and suppression (FBS), a new method to predictively amplify salient convolutional channels and skip unimportant ones at run-time. FBS introduces small auxiliary connections to existing convolutional layers. In contrast to channel pruning methods, which permanently remove channels, it preserves the full network structure and accelerates convolution by dynamically skipping unimportant input and output channels. FBS-augmented networks are trained with conventional stochastic gradient descent, making the method readily applicable to many state-of-the-art CNNs. We compare FBS to a range of existing channel pruning and dynamic execution schemes and demonstrate large improvements on ImageNet classification. Experiments show that FBS can respectively provide $5\times$ and $2\times$ savings in compute on VGG-16 and ResNet-18, both with less than $0.6\%$ top-5 accuracy loss. |
Tasks | Model Compression, Network Pruning |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05331v2 |
PDF | http://arxiv.org/pdf/1810.05331v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-channel-pruning-feature-boosting-and |
Repo | https://github.com/deep-fry/mayo |
Framework | tf |
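The following PyTorch sketch illustrates the feature boosting and suppression idea: a small auxiliary predictor estimates per-channel saliency from the pooled input, only the top-k output channels are kept, and the survivors are scaled by their saliency. Channel counts, the keep ratio, and the gating details are assumptions rather than the paper's exact design (whose reference implementation is in TensorFlow).

```python
# Hedged sketch of feature boosting and suppression (FBS) for one conv layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FBSConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, keep_ratio=0.5):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2)
        self.saliency = nn.Linear(c_in, c_out)            # auxiliary predictor
        self.keep = max(1, int(c_out * keep_ratio))

    def forward(self, x):
        pooled = x.mean(dim=(2, 3))                        # (N, c_in) global descriptor
        g = F.relu(self.saliency(pooled))                  # predicted channel saliency (N, c_out)
        topk = torch.topk(g, self.keep, dim=1).indices
        mask = torch.zeros_like(g).scatter_(1, topk, 1.0)  # winner-take-all gate
        y = self.conv(x)                                    # a real kernel would skip pruned channels
        return y * (g * mask).unsqueeze(-1).unsqueeze(-1)   # boost kept channels, suppress the rest
```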
Image-to-image translation for cross-domain disentanglement
Title | Image-to-image translation for cross-domain disentanglement |
Authors | Abel Gonzalez-Garcia, Joost van de Weijer, Yoshua Bengio |
Abstract | Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve a finer control. In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts. The shared part contains information for both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and perform cross-domain retrieval without the need for labeled data, using only paired images. We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets. |
Tasks | Image-to-Image Translation |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09730v3 |
PDF | http://arxiv.org/pdf/1805.09730v3.pdf |
PWC | https://paperswithcode.com/paper/image-to-image-translation-for-cross-domain |
Repo | https://github.com/agonzgarc/cross-domain-disen |
Framework | tf |
Fast Disparity Estimation using Dense Networks
Title | Fast Disparity Estimation using Dense Networks |
Authors | Rowel Atienza |
Abstract | Disparity estimation is a difficult problem in stereo vision because the correspondence technique fails in images with textureless and repetitive regions. A recent body of work using deep convolutional neural networks (CNNs) overcomes this problem with semantics. Most CNN implementations use an autoencoder method; stereo images are encoded, merged, and finally decoded to predict the disparity map. In this paper, we present a CNN implementation inspired by dense networks to reduce the number of parameters. Furthermore, our approach takes into account semantic reasoning in disparity estimation. Our proposed network, called DenseMapNet, is compact, fast, and can be trained end-to-end. DenseMapNet requires only 290k parameters and runs at 30Hz or faster on color stereo images in full resolution. Experimental results show that DenseMapNet accuracy is comparable with other significantly bigger CNN-based methods. |
Tasks | Disparity Estimation |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07499v1 |
PDF | http://arxiv.org/pdf/1805.07499v1.pdf |
PWC | https://paperswithcode.com/paper/fast-disparity-estimation-using-dense |
Repo | https://github.com/roatienza/densemapnet |
Framework | tf |
CGNet: A Light-weight Context Guided Network for Semantic Segmentation
Title | CGNet: A Light-weight Context Guided Network for Semantic Segmentation |
Authors | Tianyi Wu, Sheng Tang, Rui Zhang, Yongdong Zhang |
Abstract | The demand for applying semantic segmentation models on mobile devices has been increasing rapidly. Current state-of-the-art networks have an enormous number of parameters and are hence unsuitable for mobile devices, while other small-memory-footprint models follow the spirit of classification networks and ignore the inherent characteristics of semantic segmentation. To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation. We first propose the Context Guided (CG) block, which learns the joint feature of both the local feature and the surrounding context, and further improves the joint feature with the global context. Based on the CG block, we develop CGNet, which captures contextual information in all stages of the network and is specially tailored for increasing segmentation accuracy. CGNet is also elaborately designed to reduce the number of parameters and save memory footprint. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Extensive experiments on the Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing or multi-scale testing, the proposed CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters. The source code for the complete system can be found at https://github.com/wutianyiRosun/CGNet. |
Tasks | Semantic Segmentation |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08201v2 |
PDF | http://arxiv.org/pdf/1811.08201v2.pdf |
PWC | https://paperswithcode.com/paper/cgnet-a-light-weight-context-guided-network |
Repo | https://github.com/wutianyiRosun/CGNet |
Framework | pytorch |
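A hedged PyTorch sketch of a Context Guided (CG) block as described in the abstract: a 3x3 convolution for the local feature, a dilated 3x3 convolution for the surrounding context, concatenation of the two, and a global-context gate that reweights channels. The channel split, dilation rate, and reduction factor are illustrative assumptions, not necessarily CGNet's exact configuration.

```python
# Sketch of a CG block: local conv + dilated (surrounding-context) conv,
# joint feature, then channel-wise refinement from the global context.
import torch
import torch.nn as nn

class CGBlock(nn.Module):
    def __init__(self, c_in, c_out, dilation=2, reduction=8):
        super().__init__()
        assert c_out % 2 == 0
        half = c_out // 2
        self.local = nn.Conv2d(c_in, half, 3, padding=1, bias=False)
        self.surround = nn.Conv2d(c_in, half, 3, padding=dilation,
                                  dilation=dilation, bias=False)
        self.bn_act = nn.Sequential(nn.BatchNorm2d(c_out), nn.PReLU(c_out))
        self.global_gate = nn.Sequential(                  # global context refinement
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out // reduction, c_out, 1), nn.Sigmoid())

    def forward(self, x):
        joint = torch.cat([self.local(x), self.surround(x)], dim=1)
        joint = self.bn_act(joint)
        return joint * self.global_gate(joint)             # channel-wise reweighting
```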
GPdoemd: a Python package for design of experiments for model discrimination
Title | GPdoemd: a Python package for design of experiments for model discrimination |
Authors | Simon Olofsson, Lukas Hebing, Sebastian Niedenführ, Marc Peter Deisenroth, Ruth Misener |
Abstract | Model discrimination identifies a mathematical model that usefully explains and predicts a given system’s behaviour. Researchers will often have several models, i.e. hypotheses, about an underlying system mechanism, but insufficient experimental data to discriminate between the models, i.e. discard inaccurate models. Given rival mathematical models and an initial experimental data set, optimal design of experiments suggests maximally informative experimental observations that maximise a design criterion weighted by prediction uncertainty. The model uncertainty requires gradients, which may not be readily available for black-box models. This paper (i) proposes a new design criterion using the Jensen-Rényi divergence, and (ii) develops a novel method replacing black-box models with Gaussian process surrogates. Using the surrogates, we marginalise out the model parameters with approximate inference. Results show these contributions working well for both classical and new test instances. We also (iii) introduce and discuss GPdoemd, the open-source implementation of the Gaussian process surrogate method. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02561v3 |
PDF | http://arxiv.org/pdf/1810.02561v3.pdf |
PWC | https://paperswithcode.com/paper/gpdoemd-a-python-package-for-design-of |
Repo | https://github.com/cog-imperial/GPdoemd |
Framework | none |
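To make the design-of-experiments idea concrete, the sketch below scores candidate designs by how strongly the rival models' (surrogate) predictions disagree relative to their predictive uncertainty, and picks the highest-scoring design. This simple pairwise criterion is only a stand-in for the Jensen-Rényi divergence criterion proposed in the paper and is not part of GPdoemd.

```python
# Illustrative design criterion for model discrimination: pick the candidate
# design where the rival models' predictions disagree most, normalised by the
# combined predictive variance of their (e.g. GP) surrogates.
import numpy as np

def next_design(means, variances):
    """means, variances: arrays of shape (n_models, n_candidate_designs)."""
    n_models, n_designs = means.shape
    score = np.zeros(n_designs)
    for i in range(n_models):
        for j in range(i + 1, n_models):
            # squared prediction gap weighted by prediction uncertainty
            score += (means[i] - means[j]) ** 2 / (variances[i] + variances[j])
    return int(np.argmax(score))
```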
Night-to-Day Image Translation for Retrieval-based Localization
Title | Night-to-Day Image Translation for Retrieval-based Localization |
Authors | Asha Anoosheh, Torsten Sattler, Radu Timofte, Marc Pollefeys, Luc Van Gool |
Abstract | Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query’s pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN - a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories. |
Tasks | Image Retrieval, Style Generalization, Visual Localization |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.09767v2 |
PDF | http://arxiv.org/pdf/1809.09767v2.pdf |
PWC | https://paperswithcode.com/paper/night-to-day-image-translation-for-retrieval |
Repo | https://github.com/AAnoosheh/ToDayGAN |
Framework | pytorch |
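The retrieval-based localization loop described above can be summarized in a few lines: translate the night query to a day-like image, describe it, and inherit the 6-DoF pose of the most similar geo-tagged day image. In this sketch the translator and descriptor are opaque placeholder functions, not ToDayGAN or the image descriptor used in the paper.

```python
# Sketch of retrieval-based localization with a night-to-day translator.
import numpy as np

def localize(night_img, day_descriptors, day_poses, translate, describe):
    """day_descriptors: (n_db, d) L2-normalised descriptors; day_poses: (n_db, 6) 6-DoF poses.
    translate/describe are placeholder callables (e.g. a GAN translator and a global descriptor)."""
    q = describe(translate(night_img))            # night -> day translation, then description
    q = q / np.linalg.norm(q)
    sims = day_descriptors @ q                    # cosine similarity against the database
    return day_poses[int(np.argmax(sims))]        # approximate query pose by the best match
```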
InLoc: Indoor Visual Localization with Dense Matching and View Synthesis
Title | InLoc: Indoor Visual Localization with Dense Matching and View Synthesis |
Authors | Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, Akihiko Torii |
Abstract | We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor environments. The method proceeds along three steps: (i) efficient retrieval of candidate poses that ensures scalability to large-scale environments, (ii) pose estimation using dense matching rather than local features to deal with textureless indoor scenes, and (iii) pose verification by virtual view synthesis to cope with significant changes in viewpoint, scene layout, and occluders. Second, we collect a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data. |
Tasks | Pose Estimation, Visual Localization |
Published | 2018-03-28 |
URL | http://arxiv.org/abs/1803.10368v2 |
PDF | http://arxiv.org/pdf/1803.10368v2.pdf |
PWC | https://paperswithcode.com/paper/inloc-indoor-visual-localization-with-dense |
Repo | https://github.com/HajimeTaira/InLoc_demo |
Framework | none |
Particle Filter Networks with Application to Visual Localization
Title | Particle Filter Networks with Application to Visual Localization |
Authors | Peter Karkus, David Hsu, Wee Sun Lee |
Abstract | Particle filtering is a powerful approach to sequential state estimation and finds application in many domains, including robot localization and object tracking. To apply particle filtering in practice, a critical challenge is to construct probabilistic system models, especially for systems with complex dynamics or rich sensory inputs such as camera images. This paper introduces the Particle Filter Network (PF-net), which encodes both a system model and a particle filter algorithm in a single neural network. The PF-net is fully differentiable and trained end-to-end from data. Instead of learning a generic system model, it learns a model optimized for the particle filter algorithm. We apply the PF-net to a visual localization task, in which a robot must localize in a rich 3-D world, using only a schematic 2-D floor map. In simulation experiments, PF-net consistently outperforms alternative learning architectures, as well as a traditional model-based method, under a variety of sensor inputs. Further, PF-net generalizes well to new, unseen environments. |
Tasks | Object Tracking, Visual Localization |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.08975v3 |
PDF | http://arxiv.org/pdf/1805.08975v3.pdf |
PWC | https://paperswithcode.com/paper/particle-filter-networks-with-application-to |
Repo | https://github.com/AdaCompNUS/pfnet |
Framework | tf |
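For reference, here is the classical bootstrap particle filter that PF-net builds on, written in plain NumPy: predict with a motion model, reweight by the observation likelihood, and resample when the effective sample size collapses. PF-net replaces these hand-designed components with learned, differentiable ones inside a single network; this sketch is the conventional algorithm, not PF-net itself.

```python
# One step of a bootstrap particle filter (predict, update, resample).
import numpy as np

def particle_filter_step(particles, weights, motion_model, obs_likelihood, observation, rng):
    """particles: (N, d) states; weights: (N,) normalised weights."""
    particles = motion_model(particles, rng)                     # predict: propagate each particle
    weights = weights * obs_likelihood(particles, observation)   # update: reweight by the observation
    weights /= weights.sum()
    # resample when the effective sample size collapses below half the particle count
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```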
Incorporating Subword Information into Matrix Factorization Word Embeddings
Title | Incorporating Subword Information into Matrix Factorization Word Embeddings |
Authors | Alexandre Salle, Aline Villavicencio |
Abstract | The positive effect of adding subword information to word embeddings has been demonstrated for predictive models. In this paper we investigate whether similar benefits can also be derived from incorporating subwords into counting models. We evaluate the impact of different types of subwords (n-grams and unsupervised morphemes), with results confirming the importance of subword information in learning representations of rare and out-of-vocabulary words. |
Tasks | Word Embeddings |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03710v1 |
PDF | http://arxiv.org/pdf/1805.03710v1.pdf |
PWC | https://paperswithcode.com/paper/incorporating-subword-information-into-matrix |
Repo | https://github.com/alexandres/lexvec |
Framework | none |
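The subword idea above can be illustrated with a fastText-style sketch: a word's vector is the sum of its own vector (if in vocabulary) and the vectors of its hashed character n-grams, so rare and out-of-vocabulary words still receive informative representations. The n-gram range, the hashing trick, and the bucket count are generic assumptions, not LexVec's exact settings.

```python
# Sketch of subword-augmented word vectors: word vector + hashed n-gram vectors.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"                                 # boundary markers, fastText-style
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word, word_vecs, ngram_table, n_buckets=100_000):
    """word_vecs: dict word -> vector; ngram_table: (n_buckets, dim) array of n-gram vectors."""
    vec = word_vecs.get(word, 0.0)                       # in-vocabulary part, if any
    for g in char_ngrams(word):
        # Python's hash() is per-run randomised; a real system would use a fixed hash.
        vec = vec + ngram_table[hash(g) % n_buckets]
    return vec
```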