October 18, 2019

3159 words 15 mins read

Paper Group ANR 419

Cats or CAT scans: transfer learning from natural or medical image source datasets?

Title Cats or CAT scans: transfer learning from natural or medical image source datasets?
Authors Veronika Cheplygina
Abstract Transfer learning is a widely used strategy in medical image analysis. Instead of only training a network with a limited amount of data from the target task of interest, we can first train the network with other, potentially larger source datasets, creating a more robust model. The source datasets do not have to be related to the target task. For a classification task in lung CT images, we could use either head CT images or images of cats as the source. While head CT images appear more similar to lung CT images, the number and diversity of cat images might lead to a better model overall. In this survey we review a number of papers that have performed similar comparisons. Although the answer to which strategy is best seems to be “it depends”, we discuss a number of research directions we need to take as a community to gain more understanding of this topic.
Tasks Transfer Learning
Published 2018-10-12
URL http://arxiv.org/abs/1810.05444v2
PDF http://arxiv.org/pdf/1810.05444v2.pdf
PWC https://paperswithcode.com/paper/cats-or-cat-scans-transfer-learning-from
Repo
Framework
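
As a rough illustration of the two strategies the survey compares (this is not the survey's code), here is a minimal PyTorch/torchvision sketch of fine-tuning an ImageNet-pretrained network on a small target task versus training the same architecture from scratch. It assumes a recent torchvision; `NUM_CLASSES` and the target dataset are placeholders.

```python
# Minimal sketch (not the survey's code): fine-tune an ImageNet-pretrained
# ResNet-18 on a small target task, or train the same architecture from scratch.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g. a binary classification task on lung CT patches (placeholder)

def build_model(pretrained_source: bool) -> nn.Module:
    # pretrained_source=True  -> transfer from a natural-image source (ImageNet)
    # pretrained_source=False -> train from scratch on the target data only
    weights = models.ResNet18_Weights.IMAGENET1K_V1 if pretrained_source else None
    model = models.resnet18(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new task-specific head
    return model

model = build_model(pretrained_source=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```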

Learning Noise-Invariant Representations for Robust Speech Recognition

Title Learning Noise-Invariant Representations for Robust Speech Recognition
Authors Davis Liang, Zhiheng Huang, Zachary C. Lipton
Abstract Despite rapid advances in speech recognition, current models remain brittle to superficial perturbations to their inputs. Small amounts of noise can destroy the performance of an otherwise state-of-the-art model. To harden models against background noise, practitioners often perform data augmentation, adding artificially-noised examples to the training set, carrying over the original label. In this paper, we hypothesize that a clean example and its superficially perturbed counterparts shouldn’t merely map to the same class — they should map to the same representation. We propose invariant-representation-learning (IRL): At each training iteration, for each training example, we sample a noisy counterpart. We then apply a penalty term to coerce matched representations at each layer (above some chosen layer). Our key results, demonstrated on the Librispeech dataset, are the following: (i) IRL significantly reduces character error rates (CER) on both ‘clean’ (3.3% vs 6.5%) and ‘other’ (11.0% vs 18.1%) test sets; (ii) on several out-of-domain noise settings (different from those seen during training), IRL’s benefits are even more pronounced. Careful ablations confirm that our results are not simply due to shrinking activations at the chosen layers.
Tasks Data Augmentation, Representation Learning, Robust Speech Recognition, Speech Recognition
Published 2018-07-17
URL http://arxiv.org/abs/1807.06610v1
PDF http://arxiv.org/pdf/1807.06610v1.pdf
PWC https://paperswithcode.com/paper/learning-noise-invariant-representations-for
Repo
Framework
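
A minimal PyTorch sketch of the invariant-representation penalty described in the abstract (an assumed form, not the authors' code): the clean example and its noised counterpart pass through the same model, and hidden activations above a chosen layer are pulled together in addition to the task loss. The `return_hidden` interface is a placeholder assumption.

```python
import torch
import torch.nn.functional as F

def irl_loss(model, clean_batch, noisy_batch, targets, task_loss_fn,
             penalty_weight=1.0, start_layer=2):
    """Sketch of an invariant-representation-learning (IRL) objective.
    Assumes `model(x, return_hidden=True)` returns (output, list_of_hidden_tensors);
    this API is a placeholder, not the paper's implementation."""
    clean_out, clean_hidden = model(clean_batch, return_hidden=True)
    noisy_out, noisy_hidden = model(noisy_batch, return_hidden=True)
    # Usual task loss on both the clean and the noised example (same label).
    task_loss = task_loss_fn(clean_out, targets) + task_loss_fn(noisy_out, targets)
    # Penalty coercing matched representations at every layer above start_layer.
    penalty = sum(F.mse_loss(c, n)
                  for c, n in zip(clean_hidden[start_layer:], noisy_hidden[start_layer:]))
    return task_loss + penalty_weight * penalty
```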

Feature extraction with regularized siamese networks for outlier detection: application to lesion screening in medical imaging

Title Feature extraction with regularized siamese networks for outlier detection: application to lesion screening in medical imaging
Authors Z. Alaverdyan, C. Lartizien
Abstract Computer aided diagnosis (CAD) systems are designed to assist clinicians in various tasks, including highlighting abnormal regions in a medical image. A common approach consists in training a voxel-level binary classifier on a set of feature vectors extracted from normal and pathological areas in patients’ scans. However, many pathologies (such as epilepsy) are characterized by lesions that may be located anywhere in the brain and have various shapes, sizes and textures. An adequate representation of such heterogeneity requires a significant amount of annotated data, which is a major issue in the medical domain. Therefore, we built on a previously proposed approach that considers the epilepsy lesion detection task as a voxel-level outlier detection problem. It consists in building an oc-SVM classifier for each voxel in the brain volume using a small number of clinically-guided features (El Azami et al., 2016). Our goal in this study is to take a step forward by replacing the handcrafted features with automatically learnt representations using neural networks. We propose a novel version of siamese networks trained on patches extracted from healthy patients’ scans only. This network, composed of stacked autoencoders as subnetworks, is regularized by the reconstruction error of the patches. It is designed to learn representations that bring patches centered at the same voxel localization ‘closer’ with respect to the chosen metric (i.e. cosine). Finally, the middle layer representations of the subnetworks are fed to oc-SVM classifiers at the voxel level. The method is validated on 3 patients’ MRI scans with confirmed epilepsy lesions and shows promising performance.
Tasks Outlier Detection
Published 2018-05-04
URL http://arxiv.org/abs/1805.01717v1
PDF http://arxiv.org/pdf/1805.01717v1.pdf
PWC https://paperswithcode.com/paper/feature-extraction-with-regularized-siamese
Repo
Framework
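
A hedged sketch of the training objective described above: a cosine term pulls together the middle-layer codes of two patches centred at the same voxel location in different healthy scans, while an autoencoder reconstruction term regularizes the representation; the learnt codes are later scored by per-voxel one-class SVMs. Network sizes and weights are guesses, not the paper's values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.svm import OneClassSVM

class PatchAutoencoder(nn.Module):
    """Toy stand-in for one stacked-autoencoder subnetwork (layer sizes are guesses)."""
    def __init__(self, in_dim=256, code_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))
    def forward(self, x):
        code = self.enc(x)
        return code, self.dec(code)

def siamese_loss(net, patch_a, patch_b, recon_weight=1.0):
    # patch_a and patch_b are patches centred at the same voxel location in two
    # healthy scans; the cosine term brings their codes 'closer', the
    # reconstruction term regularizes the autoencoder.
    code_a, rec_a = net(patch_a)
    code_b, rec_b = net(patch_b)
    cos = 1.0 - F.cosine_similarity(code_a, code_b).mean()
    recon = F.mse_loss(rec_a, patch_a) + F.mse_loss(rec_b, patch_b)
    return cos + recon_weight * recon

# At test time, one one-class SVM per voxel location scores the learnt codes as
# outliers, e.g. (voxel_codes is a placeholder array of codes from healthy subjects):
# oc_svm = OneClassSVM(kernel="rbf", nu=0.03).fit(voxel_codes)
```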

On the Diachronic Stability of Irregularity in Inflectional Morphology

Title On the Diachronic Stability of Irregularity in Inflectional Morphology
Authors Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
Abstract Many languages’ inflectional morphological systems are replete with irregulars, i.e., words that do not seem to follow standard inflectional rules. In this work, we quantitatively investigate the conditions under which irregulars can survive in a language over the course of time. Using recurrent neural networks to simulate language learners, we test the diachronic relation between frequency of words and their irregularity.
Tasks
Published 2018-04-23
URL http://arxiv.org/abs/1804.08262v1
PDF http://arxiv.org/pdf/1804.08262v1.pdf
PWC https://paperswithcode.com/paper/on-the-diachronic-stability-of-irregularity
Repo
Framework

An Evaluation of Classification and Outlier Detection Algorithms

Title An Evaluation of Classification and Outlier Detection Algorithms
Authors Victoria J. Hodge, Jim Austin
Abstract This paper evaluates the classification and outlier detection accuracy of algorithms on temporal data. We focus on algorithms that train and classify rapidly and can be used for systems that need to incorporate new data regularly. Hence, we compare the accuracy of six fast algorithms using a range of well-known time-series datasets. The analyses demonstrate that the choice of algorithm is task- and data-specific, but that we can derive heuristics for choosing. Gradient Boosting Machines are generally best for classification, while there is no single winner for outlier detection, though Gradient Boosting Machines (again) and Random Forests perform better than the rest. Hence, we recommend running evaluations of a number of algorithms using our heuristics.
Tasks Outlier Detection, Time Series
Published 2018-05-02
URL http://arxiv.org/abs/1805.00811v1
PDF http://arxiv.org/pdf/1805.00811v1.pdf
PWC https://paperswithcode.com/paper/an-evaluation-of-classification-and-outlier
Repo
Framework
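
A minimal, self-contained sketch of this kind of evaluation using scikit-learn: several fast classifiers compared by cross-validation. The candidate set and the synthetic data are placeholders; the paper's six algorithms and datasets may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: X would be windowed time-series features, y the class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)

candidates = {
    "gbm": GradientBoostingClassifier(),
    "rf": RandomForestClassifier(n_estimators=200),
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```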

Combination of Domain Knowledge and Deep Learning for Sentiment Analysis

Title Combination of Domain Knowledge and Deep Learning for Sentiment Analysis
Authors Khuong Vo, Dang Pham, Mao Nguyen, Trung Mai, Tho Quan
Abstract The emerging technique of deep learning has been widely applied in many different areas. However, when adopted in a specific domain, this technique should be combined with domain knowledge to improve efficiency and accuracy. In particular, when analyzing the applications of deep learning in sentiment analysis, we found that current approaches suffer from the following drawbacks: (i) the existing works have not paid much attention to the importance of different types of sentiment terms, which is an important concept in this area; and (ii) the loss function currently employed does not well reflect the degree of error of sentiment misclassification. To overcome these problems, we propose to combine domain knowledge with deep learning. Our proposal includes using sentiment scores, learnt by quadratic programming, to augment the training data, and introducing a penalty matrix to enhance the cross-entropy loss function. In our experiments, we achieved a significant improvement in classification results.
Tasks Sentiment Analysis
Published 2018-06-22
URL http://arxiv.org/abs/1806.08760v3
PDF http://arxiv.org/pdf/1806.08760v3.pdf
PWC https://paperswithcode.com/paper/combination-of-domain-knowledge-and-deep-1
Repo
Framework
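
A hedged sketch of the penalty-matrix idea: a loss in which the cost of a misclassification depends on which class was confused with which, added to standard cross entropy. The matrix values and the weighting below are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Illustrative penalty matrix for 3 sentiment classes (negative, neutral, positive):
# confusing negative with positive costs more than confusing either with neutral.
PENALTY = torch.tensor([[0.0, 1.0, 2.0],
                        [1.0, 0.0, 1.0],
                        [2.0, 1.0, 0.0]])

def penalized_cross_entropy(logits, targets, penalty=PENALTY, alpha=1.0):
    """Standard cross entropy plus an expected-penalty term that weights each
    wrong class by how severe that particular confusion is (assumed form)."""
    probs = F.softmax(logits, dim=-1)
    ce = F.cross_entropy(logits, targets)
    # penalty[targets] selects, for each example, the row of costs for its true class.
    expected_penalty = (probs * penalty[targets]).sum(dim=-1).mean()
    return ce + alpha * expected_penalty
```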

Clustering and Labelling Auction Fraud Data

Title Clustering and Labelling Auction Fraud Data
Authors Ahmad Alzahrani, Samira Sadaoui
Abstract Although shill bidding is a common auction fraud, it is very tough to detect. Due to the lack of training data, in this study we build a high-quality labeled shill bidding dataset based on recently collected auctions from eBay. Labeling shill bidding instances with multidimensional features is a critical phase for the fraud classification task. For this purpose, we introduce a new approach to systematically label the fraud data with the help of the hierarchical clustering algorithm CURE, which returns remarkable results, as illustrated in the experiments.
Tasks
Published 2018-08-22
URL http://arxiv.org/abs/1808.07288v1
PDF http://arxiv.org/pdf/1808.07288v1.pdf
PWC https://paperswithcode.com/paper/clustering-and-labelling-auction-fraud-data
Repo
Framework
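
CURE is not available in scikit-learn, so the sketch below uses plain agglomerative (hierarchical) clustering as a stand-in for the cluster-then-label idea; the feature names, cluster count and labelling criterion are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Placeholder data: one row per bidder/auction with shill-bidding features such as
# bidder tendency, bidding ratio, early bidding, winning ratio (names illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))

# Stand-in for CURE: hierarchical agglomerative clustering from scikit-learn.
# CURE additionally uses several representative points per cluster and shrinks
# them toward the centroid, which this sketch does not reproduce.
labels = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)

# Toy labelling rule: mark the cluster with the more aggressive bidding behaviour
# (here simply the larger mean feature value) as suspected shill bidding.
means = np.array([X[labels == k].mean() for k in range(2)])
fraud_cluster = int(np.argmax(means))
y = (labels == fraud_cluster).astype(int)  # 1 = suspected shill bidding
```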

Pixel personality for dense object tracking in a 2D honeybee hive

Title Pixel personality for dense object tracking in a 2D honeybee hive
Authors Katarzyna Bozek, Laetitia Hebert, Alexander S Mikheyev, Greg J Stephens
Abstract Tracking large numbers of densely-arranged, interacting objects is challenging due to occlusions and the resulting complexity of possible trajectory combinations, as well as the sparsity of relevant, labeled datasets. Here we describe a novel technique of collective tracking in the model environment of a 2D honeybee hive in which sample colonies consist of $N\sim10^3$ highly similar individuals, tightly packed, and in rapid, irregular motion. Such a system offers universal challenges for multi-object tracking, while being conveniently accessible for image recording. We first apply an accurate, segmentation-based object detection method to build initial short trajectory segments by matching object configurations based on class, position and orientation. We then join these tracks into full single object trajectories by creating an object recognition model which is adaptively trained to recognize honeybee individuals through their visual appearance across multiple frames, an attribute we denote as pixel personality. Overall, we reconstruct ~46% of the trajectories in 5 min recordings from two different hives and over 71% of the tracks for at least 2 min. We provide validated trajectories spanning 3000 video frames of 876 unmarked moving bees in two distinct colonies in different locations and filmed with different pixel resolutions, which we expect to be useful in the further development of general-purpose tracking solutions.
Tasks Multi-Object Tracking, Object Detection, Object Recognition, Object Tracking
Published 2018-12-31
URL http://arxiv.org/abs/1812.11797v1
PDF http://arxiv.org/pdf/1812.11797v1.pdf
PWC https://paperswithcode.com/paper/pixel-personality-for-dense-object-tracking
Repo
Framework
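
The first stage described above, building short trajectory segments by matching object configurations across frames, can be sketched as an assignment problem. The following is a hedged, generic sketch (not the authors' code): a cost combining position distance and orientation difference, with class switches forbidden, solved by the Hungarian algorithm from SciPy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev, curr, max_cost=50.0, w_angle=0.5):
    """Frame-to-frame association sketch: each detection is (x, y, orientation_rad,
    class_id). Detections of different classes, or pairs whose combined cost
    exceeds max_cost, are never matched. Thresholds are illustrative."""
    prev, curr = np.asarray(prev, float), np.asarray(curr, float)
    cost = np.linalg.norm(prev[:, None, :2] - curr[None, :, :2], axis=-1)
    dang = np.abs(prev[:, None, 2] - curr[None, :, 2])
    dang = np.minimum(dang, 2 * np.pi - dang)            # wrap angle difference
    cost = cost + w_angle * dang
    cost[prev[:, None, 3] != curr[None, :, 3]] = 1e9     # forbid class switches
    rows, cols = linear_sum_assignment(cost)             # optimal assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```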

Auto-adaptive Resonance Equalization using Dilated Residual Networks

Title Auto-adaptive Resonance Equalization using Dilated Residual Networks
Authors Maarten Grachten, Emmanuel Deruty, Alexandre Tanguy
Abstract In music and audio production, attenuation of spectral resonances is an important step towards a technically correct result. In this paper we present a two-component system to automate the task of resonance equalization. The first component is a dynamic equalizer that automatically detects resonances and offers to attenuate them by a user-specified factor. The second component is a deep neural network that predicts the optimal attenuation factor based on the windowed audio. The network is trained and validated on empirical data gathered from an experiment in which sound engineers choose their preferred attenuation factors for a set of tracks. We test two distinct network architectures for the predictive model and find that a dilated residual network operating directly on the audio signal is on a par with a network architecture that requires a prior audio feature extraction stage. Both architectures predict human-preferred resonance attenuation factors significantly better than a baseline approach.
Tasks
Published 2018-07-23
URL http://arxiv.org/abs/1807.08636v1
PDF http://arxiv.org/pdf/1807.08636v1.pdf
PWC https://paperswithcode.com/paper/auto-adaptive-resonance-equalization-using
Repo
Framework
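
A hedged PyTorch sketch of the kind of architecture the abstract describes: a stack of 1-D dilated residual blocks operating directly on the windowed audio, followed by pooling and a scalar head predicting the attenuation factor. Channel counts, dilation schedule and depth are guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """One 1-D dilated residual block over raw audio samples (sizes are guesses)."""
    def __init__(self, channels=32, dilation=2):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(x + self.conv(x))               # residual connection

class AttenuationPredictor(nn.Module):
    """Dilated residual stack, global pooling, and a scalar head predicting the
    preferred resonance attenuation factor."""
    def __init__(self, channels=32, n_blocks=6):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[DilatedResidualBlock(channels, 2 ** i)
                                      for i in range(n_blocks)])
        self.head = nn.Linear(channels, 1)
    def forward(self, audio):                 # audio: (batch, 1, samples)
        h = self.blocks(self.stem(audio))
        return self.head(h.mean(dim=-1))      # global average pool over time
```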

A Cross-Modal Distillation Network for Person Re-identification in RGB-Depth

Title A Cross-Modal Distillation Network for Person Re-identification in RGB-Depth
Authors Frank Hafner, Amran Bhuiyan, Julian F. P. Kooij, Eric Granger
Abstract Person re-identification involves the recognition over time of individuals captured using multiple distributed sensors. With the advent of powerful deep learning methods able to learn discriminant representations for visual recognition, cross-modal person re-identification based on different sensor modalities has become viable in many challenging applications in, e.g., autonomous driving, robotics and video surveillance. Although some methods have been proposed for re-identification between infrared and RGB images, few address depth and RGB images. In addition to the challenges for each modality associated with occlusion, clutter, misalignment, and variations in pose and illumination, there is a considerable shift across modalities since data from RGB and depth images are heterogeneous. In this paper, a new cross-modal distillation network is proposed for robust person re-identification between RGB and depth sensors. Using a two-step optimization process, the proposed method transfers supervision between modalities such that similar structural features are extracted from both RGB and depth modalities, yielding a discriminative mapping to a common feature space. Our experiments investigate the influence of the dimensionality of the embedding space, compare transfer learning from depth to RGB and vice versa, and compare against other state-of-the-art cross-modal re-identification methods. Results obtained with BIWI and RobotPKU datasets indicate that the proposed method can successfully transfer descriptive structural features from the depth modality to the RGB modality. It can significantly outperform state-of-the-art conventional methods and deep neural networks for cross-modal sensing between RGB and depth, with no impact on computational complexity.
Tasks Autonomous Driving, Cross-Modal Person Re-Identification, Person Re-Identification, Transfer Learning
Published 2018-10-27
URL http://arxiv.org/abs/1810.11641v2
PDF http://arxiv.org/pdf/1810.11641v2.pdf
PWC https://paperswithcode.com/paper/a-cross-modal-distillation-network-for-person
Repo
Framework
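
A hedged sketch of the two-step idea: an embedding network trained on one modality is frozen, and a network for the other modality is trained so that paired images map to nearby points in the shared embedding space. The function below is an assumed simplification, not the authors' training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(rgb_net: nn.Module, depth_net: nn.Module,
                      rgb_batch, depth_batch, optimizer):
    """Assumed second step of a two-step scheme: rgb_net was already trained for
    re-identification and is kept frozen; depth_net is trained so that paired
    RGB/depth images of the same person land close together in the common space."""
    rgb_net.eval()
    with torch.no_grad():
        target_emb = rgb_net(rgb_batch)           # teacher embedding (frozen)
    student_emb = depth_net(depth_batch)          # same person, depth modality
    loss = F.mse_loss(student_emb, target_emb)    # match structural features
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```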

Convolutional Recurrent Predictor: Implicit Representation for Multi-target Filtering and Tracking

Title Convolutional Recurrent Predictor: Implicit Representation for Multi-target Filtering and Tracking
Authors Mehryar Emambakhsh, Alessandro Bay, Eduard Vazquez
Abstract Defining a multi-target motion model, which is an important step of tracking algorithms, can be very challenging. Using fixed models (as in several generative Bayesian algorithms, such as Kalman filters) can fail to accurately predict sophisticated target motions. On the other hand, sequential learning of the motion model (for example, using recurrent neural networks) can be computationally complex and difficult due to the variable unknown number of targets. In this paper, we propose a multi-target filtering and tracking algorithm which learns the motion model, simultaneously for all targets, from an implicitly represented state map and performs spatio-temporal data prediction. To this end, the multi-target state is modelled over a continuous hypothetical target space, using random finite sets and Gaussian mixture probability hypothesis density formulations. The prediction step is recursively performed using a deep convolutional recurrent neural network with a long short-term memory architecture, which is trained as a regression block, on the fly, over “probability density difference” maps. Our approach is evaluated over widely used pedestrian tracking benchmarks, remarkably outperforming state-of-the-art multi-target filtering algorithms, while giving competitive results when compared with other tracking approaches: the proposed approach achieves average optimal sub-pattern assignment (OSPA) errors of 40.40 and 62.29 on the MOT15 and MOT16/17 datasets, respectively, while producing 62.0%, 70.0% and 66.9% multi-object tracking accuracy (MOTA) on the MOT16/17, PNNL Parking Lot and PETS09 pedestrian tracking datasets, respectively, when publicly available detectors are used.
Tasks Multi-Object Tracking, Object Tracking
Published 2018-11-01
URL https://arxiv.org/abs/1811.00313v3
PDF https://arxiv.org/pdf/1811.00313v3.pdf
PWC https://paperswithcode.com/paper/convolutional-recurrent-predictor-implicit
Repo
Framework
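
For readers unfamiliar with the OSPA error quoted above, here is a minimal sketch of the standard OSPA distance between two point sets, computed with SciPy's Hungarian assignment. The cut-off and order parameters below are illustrative defaults, not the values used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ospa(X, Y, c=100.0, p=1):
    """Optimal sub-pattern assignment (OSPA) distance between point sets
    X (m x d) and Y (n x d), with cut-off c and order p."""
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)
    if m > n:                                   # convention: X is the smaller set
        X, Y, m, n = Y, X, n, m
    D = np.minimum(cdist(X, Y), c) ** p         # cut-off base distances
    row, col = linear_sum_assignment(D)         # optimal assignment of X to Y
    cost = D[row, col].sum() + (c ** p) * (n - m)   # cardinality penalty
    return float((cost / n) ** (1.0 / p))
```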

Fast Kernel Approximations for Latent Force Models and Convolved Multiple-Output Gaussian processes

Title Fast Kernel Approximations for Latent Force Models and Convolved Multiple-Output Gaussian processes
Authors Cristian Guarnizo, Mauricio A. Álvarez
Abstract A latent force model is a Gaussian process with a covariance function inspired by a differential operator. Such a covariance function is obtained by performing convolution integrals between the Green’s functions associated with the differential operators and the covariance functions associated with the latent functions. In the classical formulation of latent force models, the covariance functions are obtained analytically by solving a double integral, leading to expressions that involve numerical solutions of different types of error functions. As a consequence, the covariance matrix calculation is considerably expensive, because it requires the evaluation of one or more of these error functions. In this paper, we use random Fourier features to approximate the solution of these double integrals, obtaining simpler analytical expressions for such covariance functions. We show experimental results using ordinary differential operators and provide an extension to build general kernel functions for convolved multiple-output Gaussian processes.
Tasks Gaussian Processes
Published 2018-05-18
URL http://arxiv.org/abs/1805.07460v1
PDF http://arxiv.org/pdf/1805.07460v1.pdf
PWC https://paperswithcode.com/paper/fast-kernel-approximations-for-latent-force
Repo
Framework
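
A minimal sketch of the random Fourier feature trick in its generic form, approximating an RBF kernel by an inner product of randomized cosine features; the paper applies the same idea inside the latent-force convolution integrals, which this sketch does not reproduce.

```python
import numpy as np

def rff_features(X, n_features=500, lengthscale=1.0, seed=0):
    """Random Fourier features phi(X) such that phi(x) @ phi(y) approximates
    the RBF kernel exp(-||x - y||^2 / (2 * lengthscale**2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)             # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(200, 3))
Phi = rff_features(X)
K_approx = Phi @ Phi.T     # approximate Gram (covariance) matrix
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
print(np.abs(K_approx - K_exact).max())   # maximum approximation error
```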

Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios

Title Deconvolutional Networks for Point-Cloud Vehicle Detection and Tracking in Driving Scenarios
Authors Victor Vaquero, Ivan del Pino, Francesc Moreno-Noguer, Joan Solà, Alberto Sanfeliu, Juan Andrade-Cetto
Abstract Vehicle detection and tracking is a core ingredient for developing autonomous driving applications in urban scenarios. Recent image-based Deep Learning (DL) techniques are obtaining breakthrough results in these perceptive tasks. However, DL research has not yet advanced much towards processing 3D point clouds from lidar range-finders. These sensors are very common in autonomous vehicles since, despite not providing as semantically rich information as images, their performance is more robust under harsh weather conditions than vision sensors. In this paper we present a full vehicle detection and tracking system that works with 3D lidar information only. Our detection step uses a Convolutional Neural Network (CNN) that receives as input a featured representation of the 3D information provided by a Velodyne HDL-64 sensor and returns a per-point classification of whether it belongs to a vehicle or not. The classified point cloud is then geometrically processed to generate observations for a multi-object tracking system implemented via a number of Multi-Hypothesis Extended Kalman Filters (MH-EKF) that estimate the position and velocity of the surrounding vehicles. The system is thoroughly evaluated on the KITTI tracking dataset, and we show the performance boost provided by our CNN-based vehicle detector over a standard geometric approach. Our lidar-based approach uses only about 4% of the data needed by an image-based detector, with similarly competitive results.
Tasks Autonomous Driving, Autonomous Vehicles, Multi-Object Tracking, Object Tracking
Published 2018-08-23
URL http://arxiv.org/abs/1808.07935v1
PDF http://arxiv.org/pdf/1808.07935v1.pdf
PWC https://paperswithcode.com/paper/deconvolutional-networks-for-point-cloud
Repo
Framework
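
For orientation, here is a deliberately simplified stand-in for the tracking back-end: a single linear constant-velocity Kalman filter over position and velocity, rather than the multi-hypothesis extended Kalman filters the paper uses. The time step and noise levels are illustrative.

```python
import numpy as np

class ConstantVelocityKF:
    """Linear Kalman filter over state (x, y, vx, vy); a much-simplified stand-in
    for the paper's multi-hypothesis EKFs. dt, q and r are illustrative."""
    def __init__(self, dt=0.1, q=1.0, r=0.5):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)   # motion model
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)   # observe position
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):                              # z: observed (x, y) position
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```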

Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem

Title Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem
Authors Alon Brutzkus, Amir Globerson
Abstract Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we provide theoretical and empirical evidence that, in certain cases, overparameterized convolutional networks generalize better than small networks because of an interplay between weight clustering and feature exploration at initialization. We demonstrate this theoretically for a 3-layer convolutional neural network with max-pooling, in a novel setting which extends the XOR problem. We show that this interplay implies that with overparameterization, gradient descent converges to global minima with better generalization performance compared to global minima of small networks. Empirically, we demonstrate these phenomena for a 3-layer convolutional neural network on the MNIST task.
Tasks
Published 2018-10-06
URL http://arxiv.org/abs/1810.03037v2
PDF http://arxiv.org/pdf/1810.03037v2.pdf
PWC https://paperswithcode.com/paper/why-do-larger-models-generalize-better-a
Repo
Framework

Challenging Images For Minds and Machines

Title Challenging Images For Minds and Machines
Authors Amir Rosenfeld, John K. Tsotsos
Abstract There is no denying the tremendous leap in the performance of machine learning methods in the past half-decade. Some might even say that specific sub-fields in pattern recognition, such as machine vision, are as good as solved, reaching human and super-human levels. Arguably, lack of training data and computation power are all that stand between us and solving the remaining ones. In this position paper we underline cases in vision which are challenging to machines and even to human observers. We do so to show limitations of contemporary models that are hard to ameliorate by following the current trend of increasing training data, network capacity or computational power. Moreover, we claim that attempting to do so is in principle a suboptimal approach. We provide a taster of such examples in the hope of encouraging and challenging the machine learning community to develop new directions to solve the said difficulties.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04834v1
PDF http://arxiv.org/pdf/1802.04834v1.pdf
PWC https://paperswithcode.com/paper/challenging-images-for-minds-and-machines
Repo
Framework