Paper Group AWR 91
Fast Multiple Landmark Localisation Using a Patch-based Iterative Network. Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering. Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification. Relational Autoencoder for Feature Extraction. Averaging Weights Leads to Wider Optima and Better Generalization …
Fast Multiple Landmark Localisation Using a Patch-based Iterative Network
Title | Fast Multiple Landmark Localisation Using a Patch-based Iterative Network |
Authors | Yuanwei Li, Amir Alansary, Juan J. Cerrolaza, Bishesh Khanal, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert |
Abstract | We propose a new Patch-based Iterative Network (PIN) for fast and accurate landmark localisation in 3D medical volumes. PIN utilises a Convolutional Neural Network (CNN) to learn the spatial relationship between an image patch and anatomical landmark positions. During inference, patches are repeatedly passed to the CNN until the estimated landmark position converges to the true landmark location. PIN is computationally efficient since the inference stage only selectively samples a small number of patches in an iterative fashion rather than a dense sampling at every location in the volume. Our approach adopts a multi-task learning framework that combines regression and classification to improve localisation accuracy. We extend PIN to localise multiple landmarks by using principal component analysis, which models the global anatomical relationships between landmarks. We have evaluated PIN using 72 3D ultrasound images from fetal screening examinations. PIN achieves quantitatively an average landmark localisation error of 5.59mm and a runtime of 0.44s to predict 10 landmarks per volume. Qualitatively, anatomical 2D standard scan planes derived from the predicted landmark locations are visually similar to the clinical ground truth. Source code is publicly available at https://github.com/yuanwei1989/landmark-detection. |
Tasks | Multi-Task Learning |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06987v2 |
PDF | http://arxiv.org/pdf/1806.06987v2.pdf |
PWC | https://paperswithcode.com/paper/fast-multiple-landmark-localisation-using-a |
Repo | https://github.com/yuanwei1989/landmark-detection |
Framework | tf |
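As the abstract describes, PIN inference repeatedly feeds the patch at the current estimate to a CNN and moves the estimate by the predicted displacement until it stops changing. Below is a minimal, framework-agnostic sketch of that loop; the patch size, convergence test, and the `predict_displacement` stub (standing in for the trained CNN and its classification branch) are assumptions for illustration, not the authors' implementation (their TensorFlow code is in the linked repo).

```python
import numpy as np

def predict_displacement(patch):
    """Stub for the trained CNN: returns an estimated 3D offset from the
    patch centre towards the landmark (assumption, not the real model)."""
    return np.zeros(3)

def extract_patch(volume, centre, size=32):
    """Crop a cubic patch centred on the current estimate, clamped to the volume."""
    c = np.clip(np.round(centre).astype(int), size // 2,
                np.array(volume.shape) - size // 2 - 1)
    s = size // 2
    return volume[c[0]-s:c[0]+s, c[1]-s:c[1]+s, c[2]-s:c[2]+s]

def localise(volume, init, max_iters=50, tol=0.5):
    """Iteratively move the estimate by the CNN-predicted displacement
    until the update is smaller than `tol` voxels."""
    pos = np.asarray(init, dtype=float)
    for _ in range(max_iters):
        step = predict_displacement(extract_patch(volume, pos))
        pos += step
        if np.linalg.norm(step) < tol:
            break
    return pos
```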
Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering
Title | Automatic topography of high-dimensional data sets by non-parametric Density Peak clustering |
Authors | Maria d’Errico, Elena Facco, Alessandro Laio, Alex Rodriguez |
Abstract | Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach for charting data spaces, providing a topography of the probability distribution from which the data are harvested. This topography includes information on the number and the height of the probability peaks, the depth of the “valleys” separating them, the relative location of the peaks and their hierarchical organization. The topography is reconstructed by using an unsupervised variant of Density Peak clustering exploiting a non-parametric density estimator, which automatically measures the density in the manifold containing the data. Importantly, the density estimator provides an estimate of the error. This is a key feature, which allows distinguishing genuine probability peaks from density fluctuations due to finite sampling. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10549v1 |
PDF | http://arxiv.org/pdf/1802.10549v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-topography-of-high-dimensional-data |
Repo | https://github.com/alexdepremia/Advanced-Density-Peaks |
Framework | none |
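The paper builds on Density Peak clustering. A minimal NumPy sketch of the classic decision rule is given below: a point is a cluster centre if it has both high local density (rho) and a large distance (delta) to any denser point. The Gaussian kernel density, the fixed bandwidth, and the fixed number of peaks are simplifying assumptions; the paper's contribution is precisely a non-parametric density estimator with error bars and an automatic analysis of peaks and saddles, which this sketch does not implement.

```python
import numpy as np

def density_peaks(X, bandwidth=1.0, n_peaks=2):
    """Classic Density Peak decision rule on a (n_samples, n_dims) array."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    rho = np.exp(-(d / bandwidth) ** 2).sum(axis=1)              # kernel density (assumed)
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if denser.size == 0 else d[i, denser].min()
    peaks = np.argsort(rho * delta)[-n_peaks:]                   # largest gamma = rho * delta
    labels = -np.ones(len(X), dtype=int)
    labels[peaks] = np.arange(n_peaks)
    for i in np.argsort(-rho):                                    # descending density
        if labels[i] >= 0:
            continue
        denser = np.where(rho > rho[i])[0]
        if denser.size == 0:                                      # global density maximum
            denser = peaks
        labels[i] = labels[denser[np.argmin(d[i, denser])]]      # follow nearest denser point
    return labels, rho, delta
```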
Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification
Title | Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification |
Authors | Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu |
Abstract | Sentiment classification typically relies on a large amount of labeled data. In practice, the availability of labels is highly imbalanced among different languages, e.g., more English texts are labeled than texts in any other languages, which creates a considerable inequality in the quality of related information services received by users speaking different languages. To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language). The source and the target languages are usually bridged through off-the-shelf machine translation tools. Through such a channel, cross-language sentiment patterns can be successfully learned from English and transferred into the target languages. This approach, however, often fails to capture sentiment knowledge specific to the target language, and thus compromises the accuracy of the downstream classification task. In this paper, we employ emojis, which are widely available in many languages, as a new channel to learn both the cross-language and the language-specific sentiment patterns. We propose a novel representation learning method that uses emoji prediction as an instrument to learn respective sentiment-aware representations for each language. The learned representations are then integrated to facilitate cross-lingual sentiment classification. The proposed method demonstrates state-of-the-art performance on benchmark datasets, which is sustained even when sentiment labels are scarce. |
Tasks | Machine Translation, Representation Learning, Sentiment Analysis |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02557v2 |
PDF | http://arxiv.org/pdf/1806.02557v2.pdf |
PWC | https://paperswithcode.com/paper/emoji-powered-representation-learning-for |
Repo | https://github.com/sInceraSs/ELSA |
Framework | tf |
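A rough sketch of the pretraining idea, emoji prediction as a distant-supervision task whose encoder is then reused for sentiment, written with tf.keras to match the listed framework. The layer sizes, bi-LSTM encoder, and two-head setup are illustrative assumptions; the paper's ELSA method additionally learns language-specific representations and integrates them for cross-lingual transfer, which this sketch omits.

```python
import tensorflow as tf

# Hypothetical sizes; the real vocabularies and emoji set come from the paper's data.
vocab_size, embed_dim, n_emojis = 20000, 128, 64

# Shared encoder whose representation is transferred to sentiment classification.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
], name="sentence_encoder")

# Pretraining task: predict which emoji accompanies a (stripped) text.
emoji_model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(n_emojis, activation="softmax"),
])
emoji_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Downstream task: reuse the pretrained encoder for binary sentiment classification.
sentiment_model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
sentiment_model.compile(optimizer="adam", loss="binary_crossentropy")
```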
Relational Autoencoder for Feature Extraction
Title | Relational Autoencoder for Feature Extraction |
Authors | Qinxue Meng, Daniel Catchpoole, David Skillicorn, Paul J. Kennedy |
Abstract | Feature extraction becomes increasingly important as data grow high-dimensional. The autoencoder, as a neural-network-based feature extraction method, has achieved great success in generating abstract features of high-dimensional data. However, it fails to consider the relationships among data samples, which may affect the results obtained with both the original and the learned features. In this paper, we propose a Relational Autoencoder model that considers both data features and their relationships. We also extend it to work with other major autoencoder models, including the Sparse Autoencoder, Denoising Autoencoder and Variational Autoencoder. The proposed relational autoencoder models are evaluated on a set of benchmark datasets, and the experimental results show that considering data relationships yields more robust features, which achieve lower reconstruction loss and, in turn, lower error rates in downstream classification compared with the other autoencoder variants. |
Tasks | Denoising, Skeleton Based Action Recognition |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03145v1 |
PDF | http://arxiv.org/pdf/1802.03145v1.pdf |
PWC | https://paperswithcode.com/paper/relational-autoencoder-for-feature-extraction |
Repo | https://github.com/ser-art/RAE-vs-AE |
Framework | none |
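The core idea, penalising the reconstruction of pairwise sample relationships alongside the usual data reconstruction, can be written as a single loss. A minimal NumPy sketch follows; the inner-product similarity and the alpha trade-off are assumptions about how such an objective might be instantiated, not the paper's exact formulation.

```python
import numpy as np

def relational_ae_loss(X, X_hat, alpha=0.5):
    """Combined objective sketched from the abstract: reconstruct the data
    and the pairwise sample relationships (here, inner-product similarities).
    alpha trades off the two terms; the choice of similarity is an assumption."""
    data_term = np.mean((X - X_hat) ** 2)
    R, R_hat = X @ X.T, X_hat @ X_hat.T          # sample-sample relationship matrices
    relation_term = np.mean((R - R_hat) ** 2)
    return (1 - alpha) * data_term + alpha * relation_term
```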
Averaging Weights Leads to Wider Optima and Better Generalization
Title | Averaging Weights Leads to Wider Optima and Better Generalization |
Authors | Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson |
Abstract | Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much flatter solutions than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead. |
Tasks | Image Classification, Stochastic Optimization |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05407v3 |
PDF | http://arxiv.org/pdf/1803.05407v3.pdf |
PWC | https://paperswithcode.com/paper/averaging-weights-leads-to-wider-optima-and |
Repo | https://github.com/kristpapadopoulos/keras-stochastic-weight-averaging |
Framework | none |
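The averaging step itself is tiny. A minimal framework-agnostic sketch, assuming the snapshots collected along the SGD trajectory are exposed as dicts of NumPy arrays:

```python
import copy

def swa_average(snapshots):
    """Average parameter snapshots (dicts of name -> NumPy array) collected
    along the SGD trajectory, e.g. one per epoch at a constant or cyclical
    learning rate, as described in the abstract above."""
    avg = copy.deepcopy(snapshots[0])
    for n, params in enumerate(snapshots[1:], start=2):
        for k in avg:
            avg[k] += (params[k] - avg[k]) / n      # incremental running mean
    return avg
```

As the paper notes, batch-normalisation statistics should be recomputed with a forward pass over the training data after averaging; PyTorch now ships similar utilities in `torch.optim.swa_utils`.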
Revisiting Decomposable Submodular Function Minimization with Incidence Relations
Title | Revisiting Decomposable Submodular Function Minimization with Incidence Relations |
Authors | Pan Li, Olgica Milenkovic |
Abstract | We introduce a new approach to decomposable submodular function minimization (DSFM) that exploits incidence relations. Incidence relations describe which variables effectively influence the component functions, and when properly utilized, they allow for improving the convergence rates of DSFM solvers. Our main results include the precise parametrization of the DSFM problem based on incidence relations, the development of new scalable alternative projections and parallel coordinate descent methods and an accompanying rigorous analysis of their convergence rates. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03851v3 |
PDF | http://arxiv.org/pdf/1803.03851v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-decomposable-submodular-function |
Repo | https://github.com/lipan00123/DSFM-with-incidence-relations |
Framework | none |
How To Extract Fashion Trends From Social Media? A Robust Object Detector With Support For Unsupervised Learning
Title | How To Extract Fashion Trends From Social Media? A Robust Object Detector With Support For Unsupervised Learning |
Authors | Vijay Gabale, Anand Prabhu Subramanian |
Abstract | With the proliferation of social media, fashion inspired by celebrities, reputed designers as well as fashion influencers has shortened the cycle of fashion design and manufacturing. However, with the explosion of fashion-related content and the large number of user-generated fashion photos, it is an arduous task for fashion designers to wade through social media photos and create a digest of trending fashion. This necessitates deep parsing of fashion photos on social media to localize and classify multiple fashion items in a given fashion photo. While object detection competitions such as MSCOCO have thousands of samples for each of the object categories, it is quite difficult to get large labeled datasets for fast-fashion items. Moreover, state-of-the-art object detectors do not have any functionality to ingest the large amount of unlabeled data available on social media in order to fine-tune object detectors with labeled datasets. In this work, we show the application of a generic object detector, which can be pretrained in an unsupervised manner, on 24 categories from the recently released Open Images V4 dataset. We first train the base architecture of the object detector using unsupervised learning on 60K unlabeled photos from 24 categories gathered from social media, and then fine-tune it on 8.2K labeled photos from the Open Images V4 dataset. On 300 x 300 image inputs, we achieve 72.7% mAP on a test dataset of 2.4K photos, performing 11% to 17% better than state-of-the-art object detectors. We show that this improvement is due to our choice of architecture, which allows unsupervised learning and performs significantly better at identifying small objects. |
Tasks | Object Detection |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10787v1 |
PDF | http://arxiv.org/pdf/1806.10787v1.pdf |
PWC | https://paperswithcode.com/paper/how-to-extract-fashion-trends-from-social |
Repo | https://github.com/trhgu/awesome-fashion-contents |
Framework | none |
Target Contrastive Pessimistic Discriminant Analysis
Title | Target Contrastive Pessimistic Discriminant Analysis |
Authors | Wouter M. Kouw, Marco Loog |
Abstract | Domain-adaptive classifiers learn from a source domain and aim to generalize to a target domain. If the classifier’s assumptions on the relationship between domains (e.g. covariate shift) are valid, then it will usually outperform a non-adaptive source classifier. Unfortunately, it can perform substantially worse when its assumptions are invalid. Validating these assumptions requires labeled target samples, which are usually not available. We argue that, in order to make domain-adaptive classifiers more practical, it is necessary to focus on robust methods; robust in the sense that the model still achieves a particular level of performance without making strong assumptions on the relationship between domains. With this objective in mind, we formulate a conservative parameter estimator that only deviates from the source classifier when a lower or equal risk is guaranteed for all possible labellings of the given target samples. We derive the corresponding estimator for a discriminant analysis model, and show that its risk is actually strictly smaller than that of the source classifier. Experiments indicate that our classifier outperforms state-of-the-art classifiers for geographically biased samples. |
Tasks | |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.09463v1 |
PDF | http://arxiv.org/pdf/1806.09463v1.pdf |
PWC | https://paperswithcode.com/paper/target-contrastive-pessimistic-discriminant |
Repo | https://github.com/wmkouw/tcpr |
Framework | none |
Taxi Demand-Supply Forecasting: Impact of Spatial Partitioning on the Performance of Neural Networks
Title | Taxi Demand-Supply Forecasting: Impact of Spatial Partitioning on the Performance of Neural Networks |
Authors | Neema Davis, Gaurav Raina, Krishna Jagannathan |
Abstract | In this paper, we investigate the significance of choosing an appropriate tessellation strategy for a spatio-temporal taxi demand-supply modeling framework. Our study compares (i) the variable-sized polygon based Voronoi tessellation, and (ii) the fixed-sized grid based Geohash tessellation, using taxi demand-supply GPS data for the cities of Bengaluru, India and New York, USA. Long Short-Term Memory (LSTM) networks are used for modeling and incorporating information from spatial neighbors into the model. We find that the LSTM model based on input features extracted from a variable-sized polygon tessellation yields superior performance over the LSTM model based on fixed-sized grid tessellation. Our study highlights the need to explore multiple spatial partitioning techniques for improving the prediction performance in neural network models. |
Tasks | |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03699v1 |
PDF | http://arxiv.org/pdf/1812.03699v1.pdf |
PWC | https://paperswithcode.com/paper/taxi-demand-supply-forecasting-impact-of |
Repo | https://github.com/R4h4/AIforSEA_Traffic_Management |
Framework | none |
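A minimal tf.keras sketch of the basic per-cell demand model assumed here: an LSTM over a window of demand counts for every spatial cell (Voronoi polygon or Geohash box), predicting the next interval. The sizes are invented, and the paper's models additionally fold in information from spatial neighbours, which this sketch omits.

```python
import tensorflow as tf

# Hypothetical setup: demand counts aggregated per spatial cell over the last
# n_steps intervals; the model predicts demand in the next interval per cell.
n_cells, n_steps = 50, 8
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(n_steps, n_cells)),
    tf.keras.layers.Dense(n_cells),     # next-step demand for every cell
])
model.compile(optimizer="adam", loss="mse")
```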
Bag of Tricks for Image Classification with Convolutional Neural Networks
Title | Bag of Tricks for Image Classification with Convolutional Neural Networks |
Authors | Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li |
Abstract | Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. We will show that, by combining these refinements together, we are able to improve various CNN models significantly. For example, we raise ResNet-50’s top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We will also demonstrate that improvement on image classification accuracy leads to better transfer learning performance in other application domains such as object detection and semantic segmentation. |
Tasks | Image Classification, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01187v2 |
PDF | http://arxiv.org/pdf/1812.01187v2.pdf |
PWC | https://paperswithcode.com/paper/bag-of-tricks-for-image-classification-with |
Repo | https://github.com/sherdencooper/tricks-in-deeplearning |
Framework | tf |
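Two of the training refinements the paper evaluates, label smoothing and cosine learning-rate decay, are easy to state exactly. A small NumPy sketch (hyperparameter values are the commonly used defaults, not prescriptions from the paper):

```python
import numpy as np

def smooth_labels(y, n_classes, eps=0.1):
    """Label smoothing: replace a one-hot target with (1 - eps) on the true
    class and eps / n_classes spread uniformly over all classes."""
    onehot = np.eye(n_classes)[y]
    return onehot * (1.0 - eps) + eps / n_classes

def cosine_lr(step, total_steps, base_lr=0.1):
    """Cosine learning-rate decay from base_lr down to 0 over total_steps."""
    return 0.5 * base_lr * (1.0 + np.cos(np.pi * step / total_steps))
```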
LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks
Title | LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks |
Authors | Luca Caltagirone, Mauro Bellone, Lennart Svensson, Mattias Wahde |
Abstract | In this work, a deep learning approach has been developed to carry out road detection by fusing LIDAR point clouds and camera images. An unstructured and sparse point cloud is first projected onto the camera image plane and then upsampled to obtain a set of dense 2D images encoding spatial information. Several fully convolutional neural networks (FCNs) are then trained to carry out road detection, either by using data from a single sensor, or by using three fusion strategies: early, late, and the newly proposed cross fusion. Whereas in the former two fusion approaches, the integration of multimodal information is carried out at a predefined depth level, the cross fusion FCN is designed to directly learn from data where to integrate information; this is accomplished by using trainable cross connections between the LIDAR and the camera processing branches. To further highlight the benefits of using a multimodal system for road detection, a data set consisting of visually challenging scenes was extracted from driving sequences of the KITTI raw data set. It was then demonstrated that, as expected, a purely camera-based FCN severely underperforms on this data set. A multimodal system, on the other hand, is still able to provide high accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI road benchmark where it achieved excellent performance, with a MaxF score of 96.03%, ranking it among the top-performing approaches. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.07941v1 |
PDF | http://arxiv.org/pdf/1809.07941v1.pdf |
PWC | https://paperswithcode.com/paper/lidar-camera-fusion-for-road-detection-using |
Repo | https://github.com/Shuijing725/ece498sm_project |
Framework | tf |
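The cross-fusion idea, trainable connections that let each processing branch read the other at every depth level, can be sketched as a small custom layer. Written with tf.keras to match the listed framework; the channel count, scalar mixing weights, and ReLU convolutions are illustrative assumptions rather than the authors' exact architecture.

```python
import tensorflow as tf

class CrossFusionBlock(tf.keras.layers.Layer):
    """One depth level with trainable cross connections between the camera
    and LIDAR branches, in the spirit of the cross-fusion strategy above."""
    def __init__(self, ch=32):
        super().__init__()
        self.cam_conv = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")
        self.lidar_conv = tf.keras.layers.Conv2D(ch, 3, padding="same", activation="relu")
        # Learned scalars controlling how much each branch reads from the other.
        self.a_cam = self.add_weight(name="a_cam", shape=(), initializer="zeros")
        self.a_lidar = self.add_weight(name="a_lidar", shape=(), initializer="zeros")

    def call(self, cam, lidar):
        cam_out = self.cam_conv(cam + self.a_cam * lidar)
        lidar_out = self.lidar_conv(lidar + self.a_lidar * cam)
        return cam_out, lidar_out
```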
A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
Title | A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models |
Authors | Beilun Wang, Arshdeep Sekhon, Yanjun Qi |
Abstract | We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, arising often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale up to many tasks (large $K$) under a high-dimensional (large $p$) situation. In this paper, we propose a novel Joint Elementary Estimator incorporating additional Knowledge (JEEK) to infer multiple related sparse Gaussian graphical models from large-scale heterogeneous data. Using domain knowledge as weights, we design a novel hybrid norm as the minimization objective to enforce the superposition of two weighted sparsity constraints, one on the shared interactions and the other on the task-specific structural patterns. This enables JEEK to elegantly consider various forms of existing knowledge based on the domain at hand and avoid the need to design knowledge-specific optimization. JEEK is solved through a fast and entry-wise parallelizable solution that largely improves the computational efficiency of the state-of-the-art $O(p^5K^4)$ to $O(p^2K^4)$. We conduct a rigorous statistical analysis showing that JEEK achieves the same convergence rate $O(\log(Kp)/n_{tot})$ as the state-of-the-art estimators that are much harder to compute. Empirically, on multiple synthetic datasets and two real-world datasets, JEEK significantly outperforms the state of the art in speed while achieving the same level of prediction accuracy. Available as an R tool at http://jointnets.org/ |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00548v4 |
PDF | http://arxiv.org/pdf/1806.00548v4.pdf |
PWC | https://paperswithcode.com/paper/a-fast-and-scalable-joint-estimator-for-1 |
Repo | https://github.com/QData/JEEK |
Framework | none |
One-to-one Mapping between Stimulus and Neural State: Memory and Classification
Title | One-to-one Mapping between Stimulus and Neural State: Memory and Classification |
Authors | Sizhong Lan |
Abstract | Synaptic strength can be seen as the probability of propagating an impulse and, according to synaptic plasticity, a function could exist mapping propagation activity to synaptic strength. If this function satisfies constraints such as continuity and monotonicity, a neural network under external stimulus will always converge to a fixed point, and there could be a one-to-one mapping between the external stimulus and the synaptic strengths at the fixed point. In other words, the neural network “memorizes” the external stimulus in its synapses. A biological classifier is proposed to utilize this mapping. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09001v6 |
PDF | http://arxiv.org/pdf/1805.09001v6.pdf |
PWC | https://paperswithcode.com/paper/one-to-one-mapping-between-stimulus-and |
Repo | https://github.com/lansiz/neuron |
Framework | none |
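The abstract's argument is essentially a fixed-point iteration: under a continuous, monotone update of the synaptic strengths, the network settles at a state determined by the stimulus. A toy sketch follows; the update function `f` is invented purely for illustration and is not the paper's function.

```python
import numpy as np

def run_to_fixed_point(stimulus, f, w0, tol=1e-9, max_iters=10000):
    """Iterate synaptic strengths w <- f(stimulus, w) until convergence.
    If f is continuous and monotone (as the abstract assumes), the iteration
    settles at a fixed point that depends only on the stimulus."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        w_next = f(stimulus, w)
        if np.max(np.abs(w_next - w)) < tol:
            return w_next
        w = w_next
    return w

# Toy update (an assumption): strengths relax towards a sigmoid of the stimulus,
# so different stimuli map to different fixed points.
f = lambda s, w: 0.5 * w + 0.5 / (1.0 + np.exp(-s))
print(run_to_fixed_point(np.array([0.2, 1.5, -0.7]), f, np.zeros(3)))
```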
Real-Time MDNet
Title | Real-Time MDNet |
Authors | Ilchae Jung, Jeany Son, Mooyeol Baek, Bohyung Han |
Abstract | We present a fast and accurate visual tracking algorithm based on the multi-domain convolutional neural network (MDNet). The proposed approach accelerates the feature extraction procedure and learns more discriminative models for instance classification; it enhances the representation quality of target and background by maintaining a high-resolution feature map with a large receptive field per activation. We also introduce a novel loss term to differentiate foreground instances across multiple domains and learn a more discriminative embedding of target objects with similar semantics. The proposed techniques are integrated into the pipeline of a well-known CNN-based visual tracking algorithm, MDNet. We accomplish approximately a 25 times speed-up with almost identical accuracy compared to MDNet. Our algorithm is evaluated on multiple popular tracking benchmarks, including OTB2015, UAV123, and TempleColor, and consistently outperforms state-of-the-art real-time tracking methods even without dataset-specific parameter tuning. |
Tasks | Visual Tracking |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08834v1 |
PDF | http://arxiv.org/pdf/1808.08834v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-mdnet |
Repo | https://github.com/IlchaeJung/RT-MDNet.git |
Framework | none |
Empirical Analysis of Foundational Distinctions in Linked Open Data
Title | Empirical Analysis of Foundational Distinctions in Linked Open Data |
Authors | Luigi Asprino, Valerio Basile, Paolo Ciancarini, Valentina Presutti |
Abstract | The Web and its Semantic extension (i.e. Linked Open Data) contain open global-scale knowledge and make it available to potentially intelligent machines that want to benefit from it. Nevertheless, most of Linked Open Data lack ontological distinctions and have sparse axiomatisation. For example, distinctions such as whether an entity is inherently a class or an individual, or whether it is a physical object or not, are hardly expressed in the data, although they have been largely studied and formalised by foundational ontologies (e.g. DOLCE, SUMO). These distinctions belong to common sense too, which is relevant for many artificial intelligence tasks such as natural language understanding, scene recognition, and the like. There is a gap between foundational ontologies, that often formalise or are inspired by pre-existing philosophical theories and are developed with a top-down approach, and Linked Open Data that mostly derive from existing databases or crowd-based effort (e.g. DBpedia, Wikidata). We investigate whether machines can learn foundational distinctions over Linked Open Data entities, and if they match common sense. We want to answer questions such as “does the DBpedia entity for dog refer to a class or to an instance?”. We report on a set of experiments based on machine learning and crowdsourcing that show promising results. |
Tasks | Common Sense Reasoning, Scene Recognition |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09840v2 |
PDF | http://arxiv.org/pdf/1803.09840v2.pdf |
PWC | https://paperswithcode.com/paper/empirical-analysis-of-foundational |
Repo | https://github.com/fdistinctions/ijcai18 |
Framework | none |
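The learning task described above boils down to classifying an entity description as referring to a class or to an individual. A toy scikit-learn sketch with invented examples follows; the paper's actual features, datasets, and crowdsourced labels are much richer than this.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the data: textual descriptions of entities labelled as
# referring to a class or to an instance (examples invented for illustration).
texts = [
    "Dog is a domesticated descendant of the wolf.",          # class
    "Mammal is a group of vertebrate animals.",               # class
    "Laika was a Soviet space dog.",                          # instance
    "Barack Obama is an American politician.",                # instance
]
labels = ["class", "class", "instance", "instance"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
# Tiny toy data, so the prediction is illustrative only.
print(clf.predict(["Rex is a police dog from Vienna."]))
```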