October 17, 2019

3528 words 17 mins read

Paper Group ANR 817

Supervised classification of Dermatological diseases by Deep learning. RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering. Object Ordering with Bidirectional Matchings for Visual Reasoning. 3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification. EgoReID Dataset: Person Re-identification …

Supervised classification of Dermatological diseases by Deep learning

Title Supervised classification of Dermatological diseases by Deep learning
Authors Sourav Mishra, Toshihiko Yamasaki, Hideaki Imaizumi
Abstract This paper introduces an efficient deep-learning-based classifier for common dermatological conditions, aimed at people without easy access to skin specialists. We report approximately 80% accuracy, compared with the 57% success rate attained by primary care doctors according to recent literature. The design is centered on deploying and updating the classifier on handheld devices in the near future. Dermatological diseases are common in every population and span a wide spectrum of severity. With a shortage of dermatological expertise observed in several countries, machine learning solutions can augment medical services and advise on the presence of common diseases. The paper implements supervised classification of nine distinct conditions that have a high occurrence in East Asian countries. Our current attempt establishes that deep-learning-based techniques are a viable avenue for providing preliminary information to aid patients.
Tasks
Published 2018-02-11
URL http://arxiv.org/abs/1802.03752v3
PDF http://arxiv.org/pdf/1802.03752v3.pdf
PWC https://paperswithcode.com/paper/supervised-classification-of-dermatological
Repo
Framework
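
The paper does not include code, but the setup it describes, a supervised nine-class skin-image classifier intended for handheld deployment, maps naturally onto a transfer-learning pipeline. Below is a minimal, hypothetical PyTorch sketch; the dataset path, the ResNet-50 backbone, and the hyperparameters are illustrative assumptions, not details from the paper.

```python
# Minimal transfer-learning sketch for a 9-class skin-condition classifier.
# Dataset layout (data/train/<class_name>/*.jpg) and hyperparameters are
# illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("data/train", transform=tfms)  # hypothetical path
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 9)  # nine dermatological conditions
model = model.to(device)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```

A lighter backbone such as MobileNet would be the more plausible choice for on-device updates, in line with the deployment goal the abstract states.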

RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering

Title RULLS: Randomized Union of Locally Linear Subspaces for Feature Engineering
Authors Namita Lokare, Jorge Silva, Ilknur Kaynar Kabul
Abstract Feature engineering plays an important role in the success of a machine learning model. Most of the effort in training a model goes into data preparation and choosing the right representation. In this paper, we propose a robust feature engineering method, Randomized Union of Locally Linear Subspaces (RULLS). We generate sparse, non-negative, and rotation invariant features in an unsupervised fashion. RULLS aggregates features from a random union of subspaces by describing each point using globally chosen landmarks. These landmarks serve as anchor points for choosing subspaces. Our method provides a way to select features that are relevant in the neighborhood around these chosen landmarks. Distances from each data point to $k$ closest landmarks are encoded in the feature matrix. The final feature representation is a union of features from all chosen subspaces. The effectiveness of our algorithm is shown on various real-world datasets for tasks such as clustering and classification of raw data and in the presence of noise. We compare our method with existing feature generation methods. Results show a high performance of our method on both classification and clustering tasks.
Tasks Feature Engineering
Published 2018-04-25
URL http://arxiv.org/abs/1804.09770v1
PDF http://arxiv.org/pdf/1804.09770v1.pdf
PWC https://paperswithcode.com/paper/rulls-randomized-union-of-locally-linear
Repo
Framework
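
As a rough illustration of the landmark idea the abstract describes (random landmarks as anchor points, distances to the k closest landmarks encoded per point, features concatenated across random repetitions), here is a simplified NumPy sketch. It follows only the abstract's description and omits the subspace-selection details, so it should not be read as the authors' implementation.

```python
# Simplified sketch of the landmark-distance idea behind RULLS: pick random
# landmarks, encode each point by its distances to its k closest landmarks,
# and concatenate the encodings from several random repetitions.
# This follows the abstract's description only; it is not the authors' code.
import numpy as np
from sklearn.metrics import pairwise_distances

def landmark_features(X, n_landmarks=20, k=5, n_repeats=4, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    blocks = []
    for _ in range(n_repeats):
        idx = rng.choice(n, size=n_landmarks, replace=False)
        D = pairwise_distances(X, X[idx])          # distances to all landmarks
        block = np.zeros_like(D)
        nearest = np.argsort(D, axis=1)[:, :k]     # k closest landmarks per point
        rows = np.arange(n)[:, None]
        block[rows, nearest] = D[rows, nearest]    # sparse, non-negative encoding
        blocks.append(block)
    return np.hstack(blocks)                       # union over random repetitions

X = np.random.rand(200, 10)
F = landmark_features(X)
print(F.shape)  # (200, n_landmarks * n_repeats)
```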

Object Ordering with Bidirectional Matchings for Visual Reasoning

Title Object Ordering with Bidirectional Matchings for Visual Reasoning
Authors Hao Tan, Mohit Bansal
Abstract Visual reasoning with compositional natural language instructions, e.g., based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task, where the model needs to have the ability to create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image. Further, this mapping needs to be processed to answer the question in the statement given the ordering and relationship of the objects across three similar images. In this paper, we propose a novel end-to-end neural model for the NLVR task, where we first use joint bidirectional attention to build a two-way conditioning between the visual information and the language phrases. Next, we use an RL-based pointer network to sort and process the varying number of unordered objects (so as to match the order of the statement phrases) in each of the three images and then pool over the three decisions. Our model achieves strong improvements (of 4-6% absolute) over the state-of-the-art on both the structured representation and raw image versions of the dataset.
Tasks Visual Reasoning
Published 2018-04-18
URL http://arxiv.org/abs/1804.06870v2
PDF http://arxiv.org/pdf/1804.06870v2.pdf
PWC https://paperswithcode.com/paper/object-ordering-with-bidirectional-matchings
Repo
Framework
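
The first component the abstract names, joint bidirectional attention between language phrases and image objects, can be illustrated with a small two-way attention function. The sketch below uses arbitrary dimensions and omits the RL-based pointer network that orders the objects.

```python
# Sketch of two-way (bidirectional) attention between language-phrase
# embeddings and object embeddings, the first component described in the
# abstract. Dimensions are illustrative; the RL-based pointer network that
# orders objects is omitted here.
import torch
import torch.nn.functional as F

def bidirectional_attention(phrases, objects):
    # phrases: (T, d) phrase embeddings, objects: (N, d) object embeddings
    scores = phrases @ objects.t()                         # (T, N) similarity matrix
    lang_to_obj = F.softmax(scores, dim=1) @ objects       # each phrase attends to objects
    obj_to_lang = F.softmax(scores.t(), dim=1) @ phrases   # each object attends to phrases
    return lang_to_obj, obj_to_lang

phrases = torch.randn(7, 128)   # e.g., 7 phrases of the statement
objects = torch.randn(5, 128)   # e.g., 5 objects detected in one image
lang_ctx, obj_ctx = bidirectional_attention(phrases, objects)
print(lang_ctx.shape, obj_ctx.shape)  # torch.Size([7, 128]) torch.Size([5, 128])
```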

3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification

Title 3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification
Authors Lin Wu, Yang Wang, Ling Shao, Meng Wang
Abstract In this paper, we introduce a global video representation for video-based person re-identification (re-ID) that aggregates local 3D features across the entire video extent. Most existing methods rely on 2D convolutional networks (ConvNets) to extract frame-wise deep features that are pooled temporally to generate video-level representations. However, 2D ConvNets lose temporal input information immediately after the convolution, and a separate temporal pooling is limited in capturing human motion in shorter sequences. To this end, we present a global video representation (3D PersonVLAD), complementary to 3D ConvNets, as a novel layer to capture the appearance and motion dynamics in full-length videos. However, encoding each video frame in its entirety and computing an aggregate global representation across all frames is tremendously challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with a 3D part alignment module to learn local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3D features are demonstrated to achieve state-of-the-art results on three benchmark datasets: MARS, iLIDS-VID, and PRID 2011.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2018-12-26
URL http://arxiv.org/abs/1812.10222v3
PDF http://arxiv.org/pdf/1812.10222v3.pdf
PWC https://paperswithcode.com/paper/3d-personvlad-learning-deep-global
Repo
Framework
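
The aggregation step can be pictured as a VLAD-style pooling layer sitting on top of 3D ConvNet features. The PyTorch sketch below shows generic NetVLAD-style soft assignment and residual aggregation; the cluster count, feature dimension, and normalization choices are assumptions rather than the exact 3D PersonVLAD layer.

```python
# Generic NetVLAD-style aggregation layer, sketched as it might sit on top of
# 3D ConvNet features; cluster count and feature dimension are assumptions,
# and this is not the authors' exact PersonVLAD layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADPool(nn.Module):
    def __init__(self, dim=512, num_clusters=16):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)        # soft-assignment logits
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, feats):
        # feats: (B, M, dim) local descriptors, e.g. flattened 3D-conv outputs
        a = F.softmax(self.assign(feats), dim=-1)          # (B, M, K)
        residuals = feats.unsqueeze(2) - self.centroids    # (B, M, K, dim)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)    # (B, K, dim)
        vlad = F.normalize(vlad, dim=-1).flatten(1)        # intra-normalize, flatten
        return F.normalize(vlad, dim=-1)                   # (B, K*dim)

feats = torch.randn(2, 196, 512)   # two videos, 196 local descriptors each
print(VLADPool()(feats).shape)     # torch.Size([2, 8192])
```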

EgoReID Dataset: Person Re-identification in Videos Acquired by Mobile Devices with First-Person Point-of-View

Title EgoReID Dataset: Person Re-identification in Videos Acquired by Mobile Devices with First-Person Point-of-View
Authors Emrah Basaran, Yonatan Tariku Tesfaye, Mubarak Shah
Abstract In recent years, the performance of video-based person Re-Identification (ReID) methods has improved considerably. However, most of the work in this area has dealt with videos acquired by fixed cameras with a wider field of view. Recently, the widespread use of wearable cameras and recording devices such as cellphones has opened the door to interesting research in first-person point-of-view (POV) videos (egocentric videos). Nonetheless, analysis of such videos is challenging due to factors such as poor video quality caused by ego-motion, blurriness, severe changes in lighting conditions, and perspective distortions. To facilitate research towards conquering these challenges, this paper contributes a new dataset called EgoReID. The dataset is captured using 3 mobile cellphones with non-overlapping fields of view. It contains 900 IDs and around 10,200 tracks with a total of 176,000 detections. The dataset also contains 12-sensor metadata, e.g., camera orientation, pitch, and rotation, for each video. In addition, we propose a new framework which takes advantage of both visual and sensor metadata to successfully perform person ReID. We extend an image-based re-ID method employing human body parsing trained on ten datasets to video-based re-ID. In our method, frame-level local features are first extracted for each semantic region, and then 3D convolutions are applied to encode the temporal information in each sequence of semantic regions. Additionally, we employ sensor metadata to predict targets’ next camera and their estimated time of arrival, which considerably improves our ReID performance as it significantly reduces our search space.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2018-12-22
URL https://arxiv.org/abs/1812.09570v4
PDF https://arxiv.org/pdf/1812.09570v4.pdf
PWC https://paperswithcode.com/paper/egoreid-person-re-identification-in
Repo
Framework

An Infinitesimal Probabilistic Model for Principal Component Analysis of Manifold Valued Data

Title An Infinitesimal Probabilistic Model for Principal Component Analysis of Manifold Valued Data
Authors Stefan Sommer
Abstract We provide a probabilistic and infinitesimal view of how the principal component analysis procedure (PCA) can be generalized to the analysis of nonlinear manifold valued data. Starting with the probabilistic PCA interpretation of the Euclidean PCA procedure, we show how PCA can be generalized to manifolds in an intrinsic way that does not resort to linearization of the data space. The underlying probability model is constructed by mapping a Euclidean stochastic process to the manifold using stochastic development of Euclidean semimartingales. The construction uses a connection and bundles of covariant tensors to allow global transport of principal eigenvectors, and the model is thereby an example of how principal fiber bundles can be used to handle the lack of a global coordinate system and orientations that characterizes manifold valued statistics. We show how curvature implies non-integrability of the equivalent of Euclidean principal subspaces, and how the stochastic flows provide an alternative to explicit construction of such subspaces. We describe estimation procedures for inference of parameters and prediction of principal components, and we give examples of properties of the model on embedded surfaces.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1801.10341v2
PDF http://arxiv.org/pdf/1801.10341v2.pdf
PWC https://paperswithcode.com/paper/an-infinitesimal-probabilistic-model-for
Repo
Framework
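
For reference, the Euclidean probabilistic PCA model that the abstract takes as its starting point can be written as follows (standard Tipping-Bishop notation); the paper's contribution is to replace the linear map below with stochastic development of a Euclidean process on the manifold.

```latex
% Euclidean probabilistic PCA, the starting point the abstract generalizes.
% Notation is standard; the manifold construction replaces the linear map
% W z + mu below with stochastic development of a Euclidean process.
\begin{aligned}
z &\sim \mathcal{N}(0, I_q), \\
x \mid z &\sim \mathcal{N}(W z + \mu,\; \sigma^2 I_d), \\
x &\sim \mathcal{N}(\mu,\; W W^\top + \sigma^2 I_d).
\end{aligned}
```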

Learning Pose Estimation for High-Precision Robotic Assembly Using Simulated Depth Images

Title Learning Pose Estimation for High-Precision Robotic Assembly Using Simulated Depth Images
Authors Yuval Litvak, Armin Biess, Aharon Bar-Hillel
Abstract Most industrial robotic assembly tasks today require fixed initial conditions for successful assembly. These constraints induce high production costs and low adaptability to new tasks. In this work we aim towards flexible and adaptable robotic assembly by using 3D CAD models for all parts to be assembled. We focus on a generic assembly task - the Siemens Innovation Challenge - in which a robot needs to assemble a gear-like mechanism with high precision into an operating system. To obtain the millimeter accuracy required for this task and industrial settings alike, we use a depth camera mounted near the robot end-effector. We present a high-accuracy two-stage pose estimation procedure based on deep convolutional neural networks, which includes detection, pose estimation, refinement, and handling of near- and full symmetries of parts. The networks are trained on simulated depth images with means to ensure successful transfer to the real robot. We obtain an average pose estimation error of 2.16 millimeters and 0.64 degrees, leading to a 91% success rate for robotic assembly of randomly distributed parts. To the best of our knowledge, this is the first time that the Siemens Innovation Challenge has been fully addressed, with all the parts assembled with high success rates.
Tasks Pose Estimation
Published 2018-09-27
URL http://arxiv.org/abs/1809.10699v2
PDF http://arxiv.org/pdf/1809.10699v2.pdf
PWC https://paperswithcode.com/paper/learning-a-high-precision-robotic-assembly
Repo
Framework
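
To make the pose-estimation stage concrete, here is an illustrative PyTorch sketch of a CNN that regresses a translation and a unit quaternion from a depth crop. The architecture, the 7-dimensional output parameterization, and the omission of detection, refinement, and symmetry handling are simplifying assumptions, not the paper's design.

```python
# Illustrative sketch of a pose-regression CNN on depth crops, in the spirit of
# the two-stage pipeline the abstract describes. Architecture, output
# parameterization, and symmetry handling are simplifying assumptions.
import torch
import torch.nn as nn

class DepthPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, 7)  # 3 for translation, 4 for a unit quaternion

    def forward(self, depth):
        x = self.backbone(depth).flatten(1)
        t, q = self.head(x).split([3, 4], dim=1)
        return t, nn.functional.normalize(q, dim=1)  # keep the quaternion unit-norm

depth = torch.randn(8, 1, 128, 128)   # batch of simulated depth crops
t, q = DepthPoseNet()(depth)
print(t.shape, q.shape)               # torch.Size([8, 3]) torch.Size([8, 4])
```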

Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

Title Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Authors Arghya Pal, Vineeth N Balasubramanian
Abstract The paucity of large curated hand-labeled training data for every domain of interest forms a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology to generate data as well as curated aggregated labels, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.
Tasks Multi-Task Learning, Transfer Learning
Published 2018-03-14
URL http://arxiv.org/abs/1803.05137v1
PDF http://arxiv.org/pdf/1803.05137v1.pdf
PWC https://paperswithcode.com/paper/adversarial-data-programming-using-gans-to
Repo
Framework
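
To make the "weak labeling functions" input concrete, the sketch below shows two hypothetical labeling functions and a naive majority-vote aggregation, i.e., the plain data-programming setting that ADP builds on. The adversarial generator and learned label aggregation of ADP itself are not reproduced here.

```python
# Minimal sketch of weak labeling functions and a naive aggregation step,
# to make the "labeling functions" input of data programming concrete.
# The functions below are hypothetical; ADP's GAN-based generator and learned
# label aggregation are not reproduced here.
import numpy as np

ABSTAIN = -1

def lf_short_text(x):          # weak rule: short strings -> class 0
    return 0 if len(x) < 20 else ABSTAIN

def lf_has_digit(x):           # weak rule: contains a digit -> class 1
    return 1 if any(c.isdigit() for c in x) else ABSTAIN

def aggregate(votes, n_classes=2):
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return int(np.bincount(votes, minlength=n_classes).argmax())  # majority vote

data = ["short one", "a much longer example with the number 42 inside it"]
labels = [aggregate([lf_short_text(x), lf_has_digit(x)]) for x in data]
print(labels)  # [0, 1]
```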

A continuous-time analysis of distributed stochastic gradient

Title A continuous-time analysis of distributed stochastic gradient
Authors Nicholas M. Boffi, Jean-Jacques E. Slotine
Abstract We analyze the effect of synchronization on distributed stochastic gradient algorithms. By exploiting an analogy with dynamical models of biological quorum sensing – where synchronization between agents is induced through communication with a common signal – we quantify how synchronization can significantly reduce the magnitude of the noise felt by the individual distributed agents and by their spatial mean. This noise reduction is in turn associated with a reduction in the smoothing of the loss function imposed by the stochastic gradient approximation. Through simulations on model non-convex objectives, we demonstrate that coupling can stabilize higher noise levels and improve convergence. We provide a convergence analysis for strongly convex functions by deriving a bound on the expected deviation of the spatial mean of the agents from the global minimizer for an algorithm based on quorum sensing, the same algorithm with momentum, and the Elastic Averaging SGD (EASGD) algorithm. We discuss extensions to new algorithms which allow each agent to broadcast its current measure of success and shape the collective computation accordingly. We supplement our theoretical analysis with numerical experiments on convolutional neural networks trained on the CIFAR-10 dataset, where we note a surprising regularizing property of EASGD even when applied to the non-distributed case. This observation suggests alternative second-order in-time algorithms for non-distributed optimization that are competitive with momentum methods.
Tasks Distributed Optimization
Published 2018-12-28
URL https://arxiv.org/abs/1812.10995v3
PDF https://arxiv.org/pdf/1812.10995v3.pdf
PWC https://paperswithcode.com/paper/a-continuous-time-analysis-of-distributed
Repo
Framework
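
The coupling mechanism can be illustrated with a toy simulation: each agent runs noisy gradient descent on a quadratic and is additionally pulled toward the agents' spatial mean, the analogue of the shared quorum signal. The objective, coupling gain, and noise level below are illustrative choices, not the paper's.

```python
# Toy simulation of distributed SGD agents coupled through their spatial mean,
# in the spirit of the quorum-sensing analogy. The quadratic objective,
# coupling gain, and noise level are illustrative choices, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 8, 2
lr, coupling, noise = 0.05, 0.5, 0.3
x = rng.normal(size=(n_agents, dim))        # each row is one agent's iterate

def grad(x):                                 # gradient of f(x) = 0.5 * ||x||^2
    return x

for t in range(500):
    mean = x.mean(axis=0)                    # the shared "quorum" signal
    noisy_grad = grad(x) + noise * rng.normal(size=x.shape)
    x = x - lr * noisy_grad + lr * coupling * (mean - x)

print(np.linalg.norm(x.mean(axis=0)))        # spatial mean ends up near the minimizer 0
```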

Deep Active Learning for Video-based Person Re-identification

Title Deep Active Learning for Video-based Person Re-identification
Authors Menglin Wang, Baisheng Lai, Zhongming Jin, Xiaojin Gong, Jianqiang Huang, Xiansheng Hua
Abstract It is prohibitively expensive to annotate a large-scale video-based person re-identification (re-ID) dataset, which makes fully supervised methods inapplicable to real-world deployment. How to maximally reduce the annotation cost while retaining the re-ID performance becomes an interesting problem. In this paper, we address this problem by integrating an active learning scheme into a deep learning framework. Noticing that the truly matched tracklet-pairs, also denoted as true positives (TP), are the most informative samples for our re-ID model, we propose a sampling criterion to choose the most TP-likely tracklet-pairs for annotation. A view-aware sampling strategy considering view-specific biases is designed to facilitate candidate selection, followed by an adaptive resampling step to leave out the selected candidates that are unnecessary to annotate. Our method learns the re-ID model and updates the annotation set iteratively. The re-ID model is supervised by the tracklets’ pseudo labels that are initialized by treating each tracklet as a distinct class. With the gained annotations of the actively selected candidates, the tracklets’ pseudo labels are updated by label merging and further used to re-train our re-ID model. While being simple, the proposed method demonstrates its effectiveness on three video-based person re-ID datasets. Experimental results show that less than 3% of pairwise annotations are needed for our method to reach comparable performance with the fully supervised setting.
Tasks Active Learning, Person Re-Identification, Video-Based Person Re-Identification
Published 2018-12-14
URL http://arxiv.org/abs/1812.05785v1
PDF http://arxiv.org/pdf/1812.05785v1.pdf
PWC https://paperswithcode.com/paper/deep-active-learning-for-video-based-person
Repo
Framework
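
The iterative loop the abstract describes (score tracklet pairs, annotate the most TP-likely ones, merge pseudo labels, retrain) can be sketched as follows. The pair scores and the "oracle" answers are stand-ins for the learned re-ID model and the human annotator, and the view-aware sampling and adaptive resampling steps are omitted.

```python
# Skeleton of the iterative annotate-and-retrain loop described in the
# abstract: score tracklet pairs, annotate the most true-positive-likely ones,
# merge labels, and retrain. Scoring and "oracle" answers here are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_tracklets = 50
labels = np.arange(n_tracklets)              # init: each tracklet is its own pseudo label
true_id = rng.integers(0, 10, n_tracklets)   # hidden ground truth (toy oracle)

def pair_scores():
    # Stand-in for the model's similarity between tracklet pairs.
    return {(i, j): rng.random() + (true_id[i] == true_id[j])
            for i in range(n_tracklets) for j in range(i + 1, n_tracklets)}

for it in range(5):
    scores = pair_scores()
    # pick the most TP-likely pairs for annotation
    candidates = sorted(scores, key=scores.get, reverse=True)[:20]
    for i, j in candidates:
        if true_id[i] == true_id[j]:                 # oracle says "same person"
            labels[labels == labels[j]] = labels[i]  # label merging
    # (re-training the re-ID model on the merged pseudo labels would go here)

print(len(np.unique(labels)), "pseudo-identities after merging")
```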

From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation

Title From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation
Authors Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao
Abstract In crowdsourced preference aggregation, it is often assumed that all the annotators are subject to a common preference or social utility function which generates their comparison behaviors in experiments. However, in reality annotators are subject to variations due to multi-criteria, abnormal, or a mixture of such behaviors. In this paper, we propose a parsimonious mixed-effects model, which takes into account both the fixed effect that the majority of annotators follow a common linear utility model, and the random effect that some annotators might deviate significantly from the common utility and exhibit strongly personalized preferences. The key algorithm in this paper establishes a dynamic path from the social utility to individual variations, with different levels of sparsity on personalization. The algorithm is based on Linearized Bregman Iterations, which lead to easy parallel implementations to meet the need of large-scale data analysis. In this unified framework, three kinds of random utility models are presented, including the basic linear model with L2 loss, the Bradley-Terry model, and the Thurstone-Mosteller model. The validity of these multi-level models is supported by experiments with both simulated and real-world datasets, which show that the parsimonious multi-level models exhibit improvements in both interpretability and predictive precision compared with traditional HodgeRank.
Tasks
Published 2018-03-08
URL http://arxiv.org/abs/1804.11177v1
PDF http://arxiv.org/pdf/1804.11177v1.pdf
PWC https://paperswithcode.com/paper/from-social-to-individuals-a-parsimonious
Repo
Framework
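
A generic way to write the mixed-effects utility the abstract describes is a common score shared by all annotators plus a sparse per-annotator deviation, plugged into a Bradley-Terry comparison likelihood; the notation below is illustrative rather than the paper's.

```latex
% A generic form of the mixed-effects utility described in the abstract:
% a common (fixed-effect) score s_i shared by all annotators plus a sparse
% per-annotator (random-effect) deviation delta_{a,i}, fed into a
% Bradley-Terry comparison likelihood. Notation is illustrative.
\begin{aligned}
u_{a,i} &= s_i + \delta_{a,i}, \qquad \delta_{a,\cdot}\ \text{sparse for most annotators } a,\\
\Pr\big(a\ \text{prefers}\ i\ \text{over}\ j\big)
  &= \frac{\exp(u_{a,i})}{\exp(u_{a,i}) + \exp(u_{a,j})}.
\end{aligned}
```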

The Incremental Proximal Method: A Probabilistic Perspective

Title The Incremental Proximal Method: A Probabilistic Perspective
Authors Ömer Deniz Akyildiz, Victor Elvira, Joaquin Miguez
Abstract In this work, we highlight a connection between the incremental proximal method and stochastic filters. We begin by showing that the proximal operators coincide with, and hence can be realized by, Bayes updates. We give the explicit form of the updates for the linear regression problem and show that there is a one-to-one correspondence between the proximal operator of the least-squares regression and the Bayes update when the prior and the likelihood are Gaussian. We then carry this observation over to a general sequential setting: we consider the incremental proximal method, which is an algorithm for large-scale optimization, and show that, for a linear-quadratic cost function, it can naturally be realized by the Kalman filter. We then discuss the implications of this idea for nonlinear optimization problems where proximal operators are in general not realizable. In such settings, we argue that the extended Kalman filter can provide a systematic way to derive practical procedures.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04594v1
PDF http://arxiv.org/pdf/1807.04594v1.pdf
PWC https://paperswithcode.com/paper/the-incremental-proximal-method-a
Repo
Framework
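
The Gaussian special case claimed in the abstract can be written out directly: the proximal step for a least-squares cost equals the posterior mean of a Gaussian Bayes update with the previous iterate as the prior mean. The notation below is generic, not the paper's.

```latex
% Worked special case of the correspondence stated in the abstract, with
% f(x) = (1/2) ||y - A x||^2: the proximal step equals the posterior mean of a
% Gaussian Bayes update whose prior mean is the previous iterate x_0.
\begin{aligned}
\operatorname{prox}_{\lambda f}(x_0)
  &= \arg\min_x \tfrac{1}{2}\lVert y - A x\rVert^2
     + \tfrac{1}{2\lambda}\lVert x - x_0\rVert^2
   = \big(A^\top A + \lambda^{-1} I\big)^{-1}\big(A^\top y + \lambda^{-1} x_0\big),\\[4pt]
x \sim \mathcal{N}(x_0, \lambda I),\;\; y \mid x \sim \mathcal{N}(A x, I)
  &\;\Longrightarrow\;
  \mathbb{E}[x \mid y] = \big(A^\top A + \lambda^{-1} I\big)^{-1}\big(A^\top y + \lambda^{-1} x_0\big).
\end{aligned}
```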

Repartitioning of the ComplexWebQuestions Dataset

Title Repartitioning of the ComplexWebQuestions Dataset
Authors Alon Talmor, Jonathan Berant
Abstract Recently, Talmor and Berant (2018) introduced ComplexWebQuestions - a dataset focused on answering complex questions by decomposing them into a sequence of simpler questions and extracting the answer from retrieved web snippets. In their work the authors used a pre-trained reading comprehension (RC) model (Salant and Berant, 2018) to extract the answer from the web snippets. In this short note we show that training an RC model directly on the training data of ComplexWebQuestions reveals a leakage from the training set to the test set that makes it possible to obtain unreasonably high performance. As a solution, we construct a new partitioning of ComplexWebQuestions that does not suffer from this leakage and publicly release it. We also perform an empirical evaluation on these two datasets and show that training an RC model on the training data substantially improves state-of-the-art performance.
Tasks Reading Comprehension
Published 2018-07-25
URL http://arxiv.org/abs/1807.09623v1
PDF http://arxiv.org/pdf/1807.09623v1.pdf
PWC https://paperswithcode.com/paper/repartitioning-of-the-complexwebquestions
Repo
Framework

Multi-scale 3D Convolution Network for Video Based Person Re-Identification

Title Multi-scale 3D Convolution Network for Video Based Person Re-Identification
Authors Jianing Li, Shiliang Zhang, Tiejun Huang
Abstract This paper proposes a two-stream convolution network to extract spatial and temporal cues for video-based person Re-Identification (ReID). The temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN. The resulting M3D convolution network introduces only a fraction of additional parameters into the 2D CNN, but gains the ability of multi-scale temporal feature learning. With this compact architecture, the M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further involves Residual Attention Layers (RAL) to refine the temporal features. By jointly learning spatial-temporal attention masks in a residual manner, RAL identifies the discriminative spatial regions and temporal cues. The other stream in our network is implemented with a 2D CNN for spatial feature extraction. The spatial and temporal features from the two streams are finally fused for video-based person ReID. Evaluations on three widely used benchmark datasets, i.e., MARS, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of our method over existing 3D convolution networks and state-of-the-art methods.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2018-11-19
URL http://arxiv.org/abs/1811.07468v1
PDF http://arxiv.org/pdf/1811.07468v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-3d-convolution-network-for-video
Repo
Framework
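
The core building block, a multi-scale temporal convolution inserted alongside a 2D (spatial) path, can be sketched as a residual block with parallel dilated temporal 3D convolutions. The kernel sizes, dilations, and residual wiring below are illustrative, not the paper's exact M3D/RAL design.

```python
# Sketch of a multi-scale temporal (M3D-style) residual block: a spatial 2D
# path plus parallel temporal 3D convolutions with different dilations.
# Kernel sizes and dilations are illustrative, not the paper's exact design.
import torch
import torch.nn as nn

class M3DBlock(nn.Module):
    def __init__(self, channels=64, dilations=(1, 2, 3)):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                      padding=(d, 0, 0), dilation=(d, 1, 1))
            for d in dilations
        ])
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        out = self.spatial(x)
        out = out + sum(conv(x) for conv in self.temporal)  # multi-scale temporal cues
        return self.relu(out + x)                           # residual connection

clip = torch.randn(2, 64, 8, 32, 16)    # 2 tracklets, 8 frames each
print(M3DBlock()(clip).shape)           # torch.Size([2, 64, 8, 32, 16])
```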

From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec

Title From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec
Authors Ching-Hua Chuan, Kat Agres, Dorien Herremans
Abstract We explore the potential of a popular distributional semantics vector space model, word2vec, for capturing meaningful relationships in ecological (complex polyphonic) music. More precisely, the skip-gram version of word2vec is used to model slices of music from a large corpus spanning eight musical genres. In this newly learned vector space, a metric based on cosine distance is able to distinguish between functional chord relationships, as well as harmonic associations in the music. Evidence, based on cosine distance between chord-pair vectors, suggests that an implicit circle-of-fifths exists in the vector space. In addition, a comparison between pieces in different keys reveals that key relationships are represented in word2vec space. These results suggest that the newly learned embedded vector representation does in fact capture tonal and harmonic characteristics of music, without receiving explicit information about the musical content of the constituent slices. In order to investigate whether proximity in the discovered space of embeddings is indicative of 'semantically related' slices, we explore a music generation task, by automatically replacing existing slices from a given piece of music with new slices. We propose an algorithm to find substitute slices based on spatial proximity and the pitch class distribution inferred in the chosen subspace. The results indicate that the size of the subspace used has a significant effect on whether slices belonging to the same key are selected. In sum, the proposed word2vec model is able to learn music-vector embeddings that capture meaningful tonal and harmonic relationships in music, thereby providing a useful tool for exploring musical properties and comparisons across pieces, as a potential input representation for deep learning models, and as a music generation device.
Tasks Music Generation
Published 2018-11-29
URL http://arxiv.org/abs/1811.12408v1
PDF http://arxiv.org/pdf/1811.12408v1.pdf
PWC https://paperswithcode.com/paper/from-context-to-concept-exploring-semantic
Repo
Framework
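
The modeling step itself is standard skip-gram word2vec applied to tokenized music slices. The gensim sketch below uses a made-up toy corpus of chord tokens purely for illustration; the paper's slices are derived from a large multi-genre corpus.

```python
# Minimal sketch of the approach: train skip-gram word2vec on sequences of
# music "slices" (here, toy chord tokens) and compare chords by cosine
# similarity. The corpus below is made up purely for illustration.
from gensim.models import Word2Vec

corpus = [
    ["C", "F", "G", "C"],        # toy I-IV-V-I progressions standing in for slices
    ["G", "C", "D", "G"],
    ["F", "Bb", "C", "F"],
    ["C", "G", "Am", "F", "C"],
] * 50                            # repeat to give the tiny corpus some weight

model = Word2Vec(corpus, vector_size=16, window=2, sg=1, min_count=1, epochs=50)

# Cosine similarity between chord vectors; in a larger, real corpus,
# fifth-related chords should end up closer than unrelated ones.
print(model.wv.similarity("C", "G"))
print(model.wv.most_similar("C", topn=3))
```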