January 31, 2020

3417 words 17 mins read

Paper Group ANR 99



Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment

Title Improving Head Pose Estimation with a Combined Loss and Bounding Box Margin Adjustment
Authors Mingzhen Shao, Zhun Sun, Mete Ozay, Takayuki Okatani
Abstract We address the problem of estimating the pose of a person’s head from an RGB image. The use of CNNs for this problem has contributed to significant improvements in accuracy in recent work. However, we show that the following two methods, despite their simplicity, can attain further improvement: (i) proper adjustment of the margin of the bounding box of a detected face, and (ii) the choice of loss function. We show that the integration of these two methods achieves a new state of the art on standard benchmark datasets for in-the-wild head pose estimation.
Tasks Head Pose Estimation, Pose Estimation
Published 2019-05-14
URL https://arxiv.org/abs/1905.08609v1
PDF https://arxiv.org/pdf/1905.08609v1.pdf
PWC https://paperswithcode.com/paper/190508609
Repo
Framework
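
The margin adjustment in (i) is easy to picture: widen the detector’s face box by a fixed fraction before cropping, so the network sees more head context. A minimal sketch, with an illustrative margin value rather than the paper’s tuned one:

```python
import numpy as np

def crop_with_margin(image, box, margin=0.3):
    """Crop a face with an enlarged bounding box.

    box: (x1, y1, x2, y2) from a face detector.
    margin: fraction of the box size added on each side (illustrative
    value; the paper tunes this adjustment for head pose estimation).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = margin * w, margin * h
    H, W = image.shape[:2]
    x1 = int(max(0, x1 - dx)); y1 = int(max(0, y1 - dy))
    x2 = int(min(W, x2 + dx)); y2 = int(min(H, y2 + dy))
    return image[y1:y2, x1:x2]

# Example: widen a 100x100 detection inside a 480x640 frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = crop_with_margin(frame, (200, 150, 300, 250), margin=0.3)
print(crop.shape)  # (160, 160, 3)
```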

Learning to aggregate feature representations

Title Learning to aggregate feature representations
Authors Guy Gaziv
Abstract The Algonauts challenge requires constructing a multi-subject encoder of images to brain activity. Deep networks such as ResNet-50 and AlexNet trained for image classification are known to produce feature representations along their intermediate stages which closely mimic the visual hierarchy. However, the challenges introduced in the Algonauts project, including combining data from multiple subjects, relying on very few similarity data points, solving for various ROIs, and multi-modality, require devising a flexible framework which can efficiently accommodate them. Here we build upon a recent state-of-the-art classification network (SE-ResNeXt-50) and construct an adaptive combination of its intermediate representations. While the pretrained network serves as the backbone of our model, we learn how to aggregate feature representations along five stages of the network. During learning, our method can modulate and screen the outputs of each stage of the network as governed by the optimized objective. We applied our method to the Algonauts2019 fMRI and MEG challenges. Using the combined fMRI and MEG data, our approach was rated among the leading five for both challenges. Surprisingly, we find that for both the lower- and higher-order areas (EVC and IT) the adaptive aggregation favors features stemming from later stages of the network.
Tasks Image Classification
Published 2019-07-01
URL https://arxiv.org/abs/1907.01034v3
PDF https://arxiv.org/pdf/1907.01034v3.pdf
PWC https://paperswithcode.com/paper/learning-to-aggregate-feature-representations
Repo
Framework
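
The adaptive aggregation can be sketched as a set of learnable per-stage weights over pooled, projected stage outputs. A minimal PyTorch sketch under that reading; the stage dimensions below are illustrative, not SE-ResNeXt-50’s exact ones:

```python
import torch
import torch.nn as nn

class StageAggregator(nn.Module):
    """Learn a softmax-weighted combination of intermediate features.

    Each backbone stage is globally pooled and projected to a common
    dimension; the model then learns how strongly each stage
    contributes to the aggregated representation.
    """
    def __init__(self, stage_dims=(256, 512, 1024, 2048, 2048), out_dim=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.ModuleList(nn.Linear(d, out_dim) for d in stage_dims)
        self.logits = nn.Parameter(torch.zeros(len(stage_dims)))

    def forward(self, stage_maps):  # list of (B, C_i, H_i, W_i) tensors
        feats = [p(self.pool(x).flatten(1)) for p, x in zip(self.proj, stage_maps)]
        w = torch.softmax(self.logits, dim=0)          # per-stage weights
        return sum(wi * f for wi, f in zip(w, feats))  # (B, out_dim)

# Toy usage with random stage outputs.
maps = [torch.randn(2, c, s, s) for c, s in
        [(256, 56), (512, 28), (1024, 14), (2048, 7), (2048, 7)]]
print(StageAggregator()(maps).shape)  # torch.Size([2, 128])
```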

Almost Uniform Sampling From Neural Networks

Title Almost Uniform Sampling From Neural Networks
Authors Changlong Wu, Narayana Prasad Santhanam
Abstract Given a length $n$ sample from $\mathbb{R}^d$ and a neural network with a fixed architecture with $W$ weights, $k$ neurons, linear threshold activation functions, and binary outputs on each neuron, we study the problem of uniformly sampling from all possible labelings on the sample corresponding to different choices of weights. We provide an algorithm that runs in time polynomial both in $n$ and $W$ such that any labeling appears with probability at least $\left(\frac{W}{2ekn}\right)^W$ for $W<n$. For a single neuron, we also provide a random walk based algorithm that samples exactly uniformly.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.04994v1
PDF https://arxiv.org/pdf/1912.04994v1.pdf
PWC https://paperswithcode.com/paper/almost-uniform-sampling-from-neural-networks
Repo
Framework
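
To make the sampled object concrete, the snippet below enumerates labelings that a single linear-threshold neuron induces on a small sample by drawing random weights. This is not the paper’s almost-uniform sampler; the uneven frequencies it produces are exactly the problem the paper’s polynomial-time algorithm addresses:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))  # n = 6 sample points in R^d with d = 2

def labeling(w, b):
    """Binary labeling of the sample by one linear-threshold neuron."""
    return tuple((X @ w + b > 0).astype(int))

# Naive illustration only: draw random weights and record which of the
# finitely many labelings they induce. Random weights hit labelings
# with very uneven probability, unlike the paper's sampler.
seen = {}
for _ in range(20000):
    w, b = rng.normal(size=2), rng.normal()
    y = labeling(w, b)
    seen[y] = seen.get(y, 0) + 1

print(len(seen), "distinct labelings; top empirical counts:")
print(sorted(seen.values(), reverse=True)[:5])
```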

Ego-CNN: Distributed, Egocentric Representations of Graphs for Detecting Critical Structures

Title Ego-CNN: Distributed, Egocentric Representations of Graphs for Detecting Critical Structures
Authors Ruo-Chun Tzeng, Shan-Hung Wu
Abstract We study the problem of detecting critical structures using a graph embedding model. Existing graph embedding models lack the ability to precisely detect critical structures that are specific to a task at the global scale. In this paper, we propose a novel graph embedding model, called Ego-CNN, that employs ego-convolutions at each layer and stacks layers in an ego-centric way to detect precise critical structures efficiently. An Ego-CNN can be jointly trained with a task model and help explain/discover knowledge for the task. We conduct extensive experiments and the results show that Ego-CNNs (1) lead to comparable task performance as the state-of-the-art graph embedding models, (2) work nicely with CNN visualization techniques to illustrate the detected structures, and (3) are efficient and can incorporate scale-free priors, which commonly occur in social network datasets, to further improve training efficiency.
Tasks Graph Embedding
Published 2019-06-23
URL https://arxiv.org/abs/1906.09602v1
PDF https://arxiv.org/pdf/1906.09602v1.pdf
PWC https://paperswithcode.com/paper/ego-cnn-distributed-egocentric
Repo
Framework
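
One plausible reading of an ego-convolution is a shared filter applied to each node’s egocentric neighborhood, i.e. the node plus k fixed neighbors. A minimal sketch under that assumption; the paper’s neighbor selection and layer stacking are more involved:

```python
import torch
import torch.nn as nn

class EgoConv(nn.Module):
    """One ego-convolution layer: a shared filter over each node's
    egocentric neighborhood (the node plus k neighbors)."""
    def __init__(self, in_dim, out_dim, k=4):
        super().__init__()
        self.k = k
        self.filt = nn.Linear((k + 1) * in_dim, out_dim)

    def forward(self, h, nbrs):
        # h: (N, in_dim) node embeddings; nbrs: (N, k) neighbor indices
        ego = torch.cat([h.unsqueeze(1), h[nbrs]], dim=1)  # (N, k+1, in_dim)
        return torch.relu(self.filt(ego.flatten(1)))       # (N, out_dim)

# Toy graph: 5 nodes, each with 4 (padded) neighbor indices.
h = torch.randn(5, 8)
nbrs = torch.randint(0, 5, (5, 4))
print(EgoConv(8, 16)(h, nbrs).shape)  # torch.Size([5, 16])
```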

Graph Width Measures for CNF-Encodings with Auxiliary Variables

Title Graph Width Measures for CNF-Encodings with Auxiliary Variables
Authors Stefan Mengel, Romain Wallon
Abstract We consider bounded width CNF-formulas where the width is measured by popular graph width measures on graphs associated to CNF-formulas. Such restricted graph classes, in particular those of bounded treewidth, have been extensively studied for their uses in the design of algorithms for various computational problems on CNF-formulas. Here we consider the expressivity of these formulas in the model of clausal encodings with auxiliary variables. We first show that bounding the width for many of the measures from the literature leads to a dramatic loss of expressivity, restricting the formulas to those of low communication complexity. We then show that the width of optimal encodings with respect to different measures is strongly linked: there are two classes of width measures, one containing primal treewidth and the other incidence cliquewidth, such that in each class the width of optimal encodings only differs by constant factors. Moreover, between the two classes the width differs at most by a factor logarithmic in the number of variables. Both these results are in stark contrast to the setting without auxiliary variables where all width measures we consider here differ by more than constant factors and in many cases even by linear factors.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.05290v2
PDF https://arxiv.org/pdf/1905.05290v2.pdf
PWC https://paperswithcode.com/paper/revisiting-graph-width-measures-for-cnf
Repo
Framework
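
For concreteness, primal treewidth (one of the measures discussed) is the treewidth of the formula’s primal graph: one vertex per variable, with an edge between two variables whenever they co-occur in a clause. A small sketch that builds this graph:

```python
from itertools import combinations

def primal_graph(cnf):
    """Primal graph of a CNF formula: one vertex per variable, an edge
    between two variables whenever they occur together in a clause.
    Primal treewidth is the treewidth of this graph."""
    edges = set()
    for clause in cnf:
        vs = {abs(lit) for lit in clause}
        edges.update(combinations(sorted(vs), 2))
    return edges

# (x1 v ~x2) ^ (x2 v x3 v ~x4) in DIMACS-style integer literals.
print(primal_graph([[1, -2], [2, 3, -4]]))
# edges: (1, 2), (2, 3), (2, 4), (3, 4) (set order may vary)
```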

Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction

Title Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction
Authors Steven A. Hicks, Jorunn M. Andersen, Oliwia Witczak, Vajira Thambawita, Pål Halvorsen, Hugo L. Hammer, Trine B. Haugen, Michael A. Riegler
Abstract Methods for automatic analysis of clinical data are usually targeted towards a specific modality and do not make use of all relevant data available. In the field of male human reproduction, clinical and biological data are not used to their fullest potential. Manual evaluation of a semen sample using a microscope is time-consuming and requires extensive training. Furthermore, the validity of manual semen analysis has been questioned due to limited reproducibility and often high inter-personnel variation. The existing computer-aided sperm analyzer systems are not recommended for routine clinical use due to methodological challenges caused by the consistency of the semen sample. Thus, there is a need for an improved methodology. We use modern and classical machine learning techniques together with a dataset consisting of 85 videos of human semen samples and related participant data to automatically predict sperm motility. The techniques used include simple linear regression and more sophisticated methods based on convolutional neural networks. Our results indicate that sperm motility prediction based on deep learning using sperm motility videos is fast to perform and consistent. The algorithms performed worse when participant data were added. In conclusion, machine learning-based automatic analysis may become a valuable tool in male infertility investigation and research.
Tasks
Published 2019-10-29
URL https://arxiv.org/abs/1910.13327v1
PDF https://arxiv.org/pdf/1910.13327v1.pdf
PWC https://paperswithcode.com/paper/191013327
Repo
Framework
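
The “simple linear regression” baseline mentioned in the abstract can be sketched as predicting a motility value from tabular participant data. The features and values below are synthetic placeholders, not the study’s actual variables:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for tabular participant data: 85 samples
# (matching the dataset size) with 4 made-up features.
rng = np.random.default_rng(0)
X = rng.normal(size=(85, 4))
y = 50 + X @ [5.0, -3.0, 2.0, 0.5] + rng.normal(scale=5, size=85)

# Cross-validated mean absolute error of the linear baseline.
model = LinearRegression()
print(cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=5))
```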

Meta-Learning to Cluster

Title Meta-Learning to Cluster
Authors Yibo Jiang, Nakul Verma
Abstract Clustering is one of the most fundamental and widespread techniques in exploratory data analysis. Yet, the basic approach to clustering has not really changed: a practitioner hand-picks a task-specific clustering loss to optimize and fits the given data to reveal the underlying cluster structure. Some losses, such as k-means, its non-linear kernelized version (centroid based), and DBSCAN (density based), are popular choices due to their good empirical performance on a range of applications. Every so often, however, the clustering output using these standard losses fails to reveal the underlying structure, and the practitioner has to custom-design their own variation. In this work we take an intrinsically different approach to clustering: rather than fitting a dataset to a specific clustering loss, we train a recurrent model that learns how to cluster. The model uses as training pairs examples of datasets (as input) and their corresponding cluster identities (as output). By providing multiple types of training datasets as inputs, our model has the ability to generalize well on unseen datasets (new clustering tasks). Our experiments reveal that by training on simple synthetically generated datasets or on existing real datasets, we can achieve better clustering performance on unseen real-world datasets when compared with standard benchmark clustering techniques. Our meta clustering model works well even for small datasets where the usual deep learning models tend to perform worse.
Tasks Meta-Learning
Published 2019-10-30
URL https://arxiv.org/abs/1910.14134v1
PDF https://arxiv.org/pdf/1910.14134v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-to-cluster
Repo
Framework
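
The training setup described, pairs of datasets and their cluster identities, is easy to sketch for the synthetic case. The generator below produces one such pair; the recurrent model that consumes these pairs is not shown:

```python
import numpy as np

def make_clustering_task(rng, n_clusters=3, n_points=60, dim=2):
    """One synthetic training pair for a learning-to-cluster model:
    a dataset (input) and its ground-truth cluster identities (output).
    Gaussian blobs are an illustrative choice of synthetic data."""
    centers = rng.normal(scale=5.0, size=(n_clusters, dim))
    labels = rng.integers(n_clusters, size=n_points)
    points = centers[labels] + rng.normal(size=(n_points, dim))
    return points.astype(np.float32), labels

rng = np.random.default_rng(0)
X, y = make_clustering_task(rng)
print(X.shape, y.shape)  # (60, 2) (60,)
```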

MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models

Title MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models
Authors Itzik Malkiel, Lior Wolf
Abstract Recent state-of-the-art language models utilize a two-phase training procedure composed of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have focused on trying to improve these models by enhancing the pre-training phase, either via a better choice of hyperparameters or by leveraging an improved formulation. However, the pre-training phase is computationally expensive and often done on private datasets. In this work, we present a method that leverages BERT’s fine-tuning phase to its fullest, by applying an extensive number of parallel classifier heads, which are enforced to be orthogonal, while adaptively eliminating the weaker heads during training. Our method allows the model to converge to an optimal number of parallel classifiers, depending on the given dataset at hand. We conduct extensive inter- and intra-dataset evaluations, showing that our method improves the robustness of BERT, sometimes leading to a +9% gain in accuracy. These results highlight the importance of a proper fine-tuning procedure, especially for relatively smaller-sized datasets. Our code is attached as supplementary material and our models will be made completely public.
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.06182v1
PDF https://arxiv.org/pdf/1911.06182v1.pdf
PWC https://paperswithcode.com/paper/mml-maximal-multiverse-learning-for-robust
Repo
Framework
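
The core mechanism, many parallel heads sharing one encoder output plus an orthogonality regularizer between their weights, can be sketched as follows. The adaptive elimination of weak heads during training is omitted, and the penalty form is an assumption, not the paper’s exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiverseHeads(nn.Module):
    """Parallel classifier heads with an orthogonality penalty."""
    def __init__(self, hidden=768, n_classes=2, n_heads=8):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden, n_classes)
                                   for _ in range(n_heads))

    def forward(self, pooled):                 # (B, hidden), e.g. BERT [CLS]
        return [h(pooled) for h in self.heads]

    def orthogonality_penalty(self):
        # Penalize cosine similarity between flattened head weights.
        W = torch.stack([h.weight.flatten() for h in self.heads])
        W = F.normalize(W, dim=1)
        off_diag = W @ W.T - torch.eye(len(self.heads))
        return off_diag.pow(2).sum()

heads = MultiverseHeads()
logits = heads(torch.randn(4, 768))
targets = torch.zeros(4, dtype=torch.long)
loss = sum(F.cross_entropy(l, targets) for l in logits) \
       + 0.1 * heads.orthogonality_penalty()
print(float(loss))
```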

Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM

Title Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM
Authors Hongxu Yin, Guoyang Chen, Yingmin Li, Shuai Che, Weifeng Zhang, Niraj K. Jha
Abstract Many long short-term memory (LSTM) applications need fast yet compact models. Neural network compression approaches, such as the grow-and-prune paradigm, have proved to be promising for cutting down network complexity by skipping insignificant weights. However, current compression strategies are mostly hardware-agnostic and network complexity reduction does not always translate into execution efficiency. In this work, we propose a hardware-guided symbiotic training methodology for compact, accurate, yet execution-efficient inference models. It is based on our observation that hardware may introduce substantial non-monotonic behavior, which we call the latency hysteresis effect, when evaluating network size vs. inference latency. This observation raises questions about the mainstream smaller-dimension-is-better compression strategy, which often leads to a sub-optimal model architecture. By leveraging the hardware-impacted hysteresis effect and sparsity, we are able to achieve the symbiosis of model compactness and accuracy with execution efficiency, thus reducing LSTM latency while increasing its accuracy. We have evaluated our algorithms on language modeling and speech recognition applications. Relative to the traditional stacked LSTM architecture obtained for the Penn Treebank dataset, we reduce the number of parameters by 18.0x (30.5x) and measured run-time latency by up to 2.4x (5.2x) on Nvidia GPUs (Intel Xeon CPUs) without any accuracy degradation. For the DeepSpeech2 architecture obtained for the AN4 dataset, we reduce the number of parameters by 7.0x (19.4x), word error rate from 12.9% to 9.9% (10.4%), and measured run-time latency by up to 1.7x (2.4x) on Nvidia GPUs (Intel Xeon CPUs). Thus, our method yields compact, accurate, yet execution-efficient inference models.
Tasks Language Modelling, Neural Network Compression, Speech Recognition
Published 2019-01-30
URL http://arxiv.org/abs/1901.10997v1
PDF http://arxiv.org/pdf/1901.10997v1.pdf
PWC https://paperswithcode.com/paper/hardware-guided-symbiotic-training-for
Repo
Framework
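
The latency hysteresis effect can be probed directly: measure LSTM inference latency as the hidden dimension grows and look for non-monotonic jumps. A small benchmark sketch; the sizes are illustrative and the observed pattern depends on the hardware:

```python
import time
import torch

# Probe network size vs. inference latency for a single-layer LSTM.
# On real hardware the curve is often not monotonic, which is why
# pruning to the smallest model is not always the fastest choice.
x = torch.randn(1, 32, 128)  # (batch, seq_len, input_dim)
for hidden in range(64, 513, 64):
    lstm = torch.nn.LSTM(128, hidden, batch_first=True).eval()
    with torch.no_grad():
        for _ in range(3):            # warm-up runs
            lstm(x)
        t0 = time.perf_counter()
        for _ in range(20):
            lstm(x)
        dt = (time.perf_counter() - t0) / 20
    print(f"hidden={hidden:4d}  latency={dt * 1e3:.2f} ms")
```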

Local Label Propagation for Large-Scale Semi-Supervised Learning

Title Local Label Propagation for Large-Scale Semi-Supervised Learning
Authors Chengxu Zhuang, Xuehao Ding, Divyanshu Murli, Daniel Yamins
Abstract A significant issue in training deep neural networks to solve supervised learning tasks is the need for large numbers of labelled datapoints. The goal of semi-supervised learning is to leverage ubiquitous unlabelled data, together with small quantities of labelled data, to achieve high task performance. Though substantial recent progress has been made in developing semi-supervised algorithms that are effective for comparatively small datasets, many of these techniques do not scale readily to the large (unlabelled) datasets characteristic of real-world applications. In this paper we introduce a novel approach to scalable semi-supervised learning, called Local Label Propagation (LLP). Extending ideas from recent work on unsupervised embedding learning, LLP first embeds datapoints, labelled and otherwise, in a common latent space using a deep neural network. It then propagates pseudolabels from known to unknown datapoints in a manner that depends on the local geometry of the embedding, taking into account both inter-point distance and local data density as a weighting on propagation likelihood. The parameters of the deep embedding are then trained to simultaneously maximize pseudolabel categorization performance as well as a metric of the clustering of datapoints within each pseudo-label group, iteratively alternating stages of network training and label propagation. We illustrate the utility of the LLP method on the ImageNet dataset, achieving results that outperform previous state-of-the-art scalable semi-supervised learning algorithms by large margins, consistently across a wide variety of training regimes. We also show that the feature representation learned with LLP transfers well to scene recognition in the Places 205 dataset.
Tasks Scene Recognition
Published 2019-05-28
URL https://arxiv.org/abs/1905.11581v1
PDF https://arxiv.org/pdf/1905.11581v1.pdf
PWC https://paperswithcode.com/paper/local-label-propagation-for-large-scale-semi
Repo
Framework
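
The propagation step can be sketched as inverse-distance-weighted voting among a point’s nearest labelled neighbours in the embedding space. This simplification drops LLP’s local-density weighting and the alternation with network retraining:

```python
import numpy as np

def propagate_pseudolabels(emb_l, y_l, emb_u, k=5):
    """Propagate labels from labelled to unlabelled points in an
    embedding space, weighting neighbours by inverse distance."""
    pseudo = []
    for z in emb_u:
        d = np.linalg.norm(emb_l - z, axis=1)   # distances to labelled set
        nn_idx = np.argsort(d)[:k]              # k nearest labelled points
        w = 1.0 / (d[nn_idx] + 1e-8)            # inverse-distance weights
        votes = np.bincount(y_l[nn_idx], weights=w)
        pseudo.append(votes.argmax())
    return np.array(pseudo)

rng = np.random.default_rng(0)
emb_l = rng.normal(size=(20, 8)); y_l = rng.integers(3, size=20)
emb_u = rng.normal(size=(5, 8))
print(propagate_pseudolabels(emb_l, y_l, emb_u))
```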

Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval

Title Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval
Authors Yadan Luo, Ziwei Wang, Zi Huang, Yang Yang, Huimin Lu
Abstract With the increasing number of online stores, there is a pressing need for intelligent search systems to understand the item photos snapped by customers and search against large-scale product databases to find their desired items. However, it is challenging for conventional retrieval systems to match up the item photos captured by customers and the ones officially released by stores, especially for garment images. To bridge the customer- and store-provided garment photos, existing studies have widely exploited clothing attributes (\textit{e.g.,} black) and landmarks (\textit{e.g.,} collar) to learn a common embedding space for garment representations. Unfortunately, they ignore the sequential correlation of attributes and consume a large quantity of human labor to label the landmarks. In this paper, we propose a deep multi-task cross-domain hashing method termed \textit{DMCH}, in which cross-domain embedding and sequential attribute learning are modeled simultaneously. Sequential attribute learning not only provides semantic guidance for the embedding, but also generates rich attention on discriminative local details (\textit{e.g.,} black buttons) of clothing items without requiring extra landmark labels. This leads to promising performance and a 306$\times$ efficiency boost over state-of-the-art models, as demonstrated through rigorous experiments on two public fashion datasets.
Tasks Image Retrieval
Published 2019-04-05
URL http://arxiv.org/abs/1904.02887v1
PDF http://arxiv.org/pdf/1904.02887v1.pdf
PWC https://paperswithcode.com/paper/snap-and-find-deep-discrete-cross-domain
Repo
Framework
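
The efficiency gain of hashing-based retrieval comes from binarized codes compared with Hamming distance. A minimal sketch of that retrieval step; DMCH’s joint training with sequential attribute prediction is omitted:

```python
import numpy as np

# Binarize embeddings with sign() and rank the database by Hamming
# distance, which reduces to cheap elementwise comparisons. The random
# codes here stand in for learned garment embeddings.
rng = np.random.default_rng(0)
db = np.sign(rng.normal(size=(10000, 64)))   # store-side binary codes
query = np.sign(rng.normal(size=64))         # customer photo's code

hamming = np.count_nonzero(db != query, axis=1)
top5 = np.argsort(hamming)[:5]
print(top5, hamming[top5])
```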

VARENN: Graphical representation of spatiotemporal data and application to climate studies

Title VARENN: Graphical representation of spatiotemporal data and application to climate studies
Authors Takeshi Ise, Yurika Oba
Abstract Analyzing and utilizing spatiotemporal big data are essential for studies concerning climate change. However, such data are not fully integrated into climate models owing to limitations in statistical frameworks. Herein, we employ VARENN (visually augmented representation of environment for neural networks) to efficiently summarize monthly observations of climate data for 1901-2016 into 2-dimensional graphical images. Using red, green, and blue channels of color images, three different variables are simultaneously represented in a single image. For global datasets, models were trained via convolutional neural networks. These models successfully classified rises and falls in temperature and precipitation. Moreover, similarities between the input and target variables were observed to have a significant effect on model accuracy. The input variables had both seasonal and interannual variations, whose importance was quantified for model efficacy. VARENN is thus an effective method to summarize spatiotemporal data objectively and accurately.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.09725v1
PDF https://arxiv.org/pdf/1907.09725v1.pdf
PWC https://paperswithcode.com/paper/varenn-graphical-representation-of
Repo
Framework
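
The graphical encoding itself is simple to sketch: min-max scale three variables and pack them into the R, G, and B channels of one image. The variable names and normalization below are illustrative rather than the paper’s exact recipe:

```python
import numpy as np

def to_rgb(temp, precip, pressure):
    """Pack three climate variables into the R, G, B channels of one
    image, after min-max scaling each to [0, 255]."""
    def scale(a):
        a = np.asarray(a, dtype=float)
        return 255 * (a - a.min()) / (a.max() - a.min() + 1e-12)
    return np.stack([scale(temp), scale(precip), scale(pressure)],
                    axis=-1).astype(np.uint8)

# Toy 12x32 grids standing in for monthly gridded observations.
rng = np.random.default_rng(0)
img = to_rgb(rng.normal(size=(12, 32)), rng.normal(size=(12, 32)),
             rng.normal(size=(12, 32)))
print(img.shape, img.dtype)  # (12, 32, 3) uint8
```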

Representation of White- and Black-Box Adversarial Examples in Deep Neural Networks and Humans: A Functional Magnetic Resonance Imaging Study

Title Representation of White- and Black-Box Adversarial Examples in Deep Neural Networks and Humans: A Functional Magnetic Resonance Imaging Study
Authors Chihye Han, Wonjun Yoon, Gihyun Kwon, Seungkyu Nam, Daeshik Kim
Abstract The recent success of brain-inspired deep neural networks (DNNs) in solving complex, high-level visual tasks has led to rising expectations for their potential to match the human visual system. However, DNNs exhibit idiosyncrasies that suggest their visual representation and processing might be substantially different from human vision. One limitation of DNNs is that they are vulnerable to adversarial examples, input images to which subtle, carefully designed noise is added to fool a machine classifier. The robustness of the human visual system against adversarial examples is potentially of great importance, as it could uncover a key mechanistic feature that machine vision has yet to incorporate. In this study, we compare the visual representations of white- and black-box adversarial examples in DNNs and humans by leveraging functional magnetic resonance imaging (fMRI). We find a small but significant difference in representation patterns for different (i.e. white- versus black-box) types of adversarial examples for both humans and DNNs. However, unlike DNNs, human performance on categorical judgment is not degraded by the noise, regardless of its type. These results suggest that adversarial examples may be differentially represented in the human visual system, but are unable to affect the perceptual experience.
Tasks
Published 2019-05-07
URL https://arxiv.org/abs/1905.02422v1
PDF https://arxiv.org/pdf/1905.02422v1.pdf
PWC https://paperswithcode.com/paper/representation-of-white-and-black-box
Repo
Framework
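
To make “white-box adversarial example” concrete, here is the canonical fast gradient sign method (FGSM), which perturbs an input along the sign of the loss gradient. The paper’s stimuli were generated for its own models, not by this exact code:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """Fast gradient sign method: one-step white-box attack that adds
    eps * sign(grad of loss w.r.t. the input)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Toy model and input standing in for an image classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x, y = torch.rand(1, 3, 8, 8), torch.tensor([3])
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())  # perturbation bounded by eps
```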

IMMVP: An Efficient Daytime and Nighttime On-Road Object Detector

Title IMMVP: An Efficient Daytime and Nighttime On-Road Object Detector
Authors Cheng-En Wu, Yi-Ming Chan, Chien-Hung Chen, Wen-Cheng Chen, Chu-Song Chen
Abstract Detecting on-road objects under various lighting conditions is hard. To improve the quality of the classifier, we use three techniques. We define subclasses to separate daytime and nighttime samples. We then skip similar samples in the training set to prevent overfitting. With the help of outside training samples, detection accuracy is further improved. To detect objects on an edge device, the Nvidia Jetson TX2 platform, we use the lightweight ResNet-18 FPN model as the backbone feature extractor. The FPN (Feature Pyramid Network) generates good features for detecting objects over various scales. With the Cascade R-CNN technique, the bounding boxes are iteratively refined for better results.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06573v3
PDF https://arxiv.org/pdf/1910.06573v3.pdf
PWC https://paperswithcode.com/paper/immvp-an-efficient-daytime-and-nighttime-on
Repo
Framework
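
The backbone choice is directly available in torchvision: ResNet-18 wrapped with a Feature Pyramid Network emits multi-scale feature maps for detection. Argument names vary across torchvision versions; this sketch follows the older `pretrained` API:

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-18 + FPN backbone: returns a dict of feature maps at several
# pyramid levels, suited to detecting objects of varying size.
backbone = resnet_fpn_backbone("resnet18", pretrained=False)
feats = backbone(torch.rand(1, 3, 224, 224))
for name, f in feats.items():
    print(name, tuple(f.shape))   # pyramid levels '0'..'3' and 'pool'
```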

Accurate Tissue Interface Segmentation via Adversarial Pre-Segmentation of Anterior Segment OCT Images

Title Accurate Tissue Interface Segmentation via Adversarial Pre-Segmentation of Anterior Segment OCT Images
Authors Jiahong Ouyang, Tejas Sudharshan Mathai, Kira Lathrop, John Galeotti
Abstract Optical Coherence Tomography (OCT) is an imaging modality that has been widely adopted for visualizing corneal, retinal and limbal tissue structure with micron resolution. It can be used to diagnose pathological conditions of the eye, and for developing pre-operative surgical plans. In contrast to the posterior retina, imaging the anterior tissue structures, such as the limbus and cornea, results in B-scans that exhibit increased speckle noise patterns and imaging artifacts. These artifacts, such as shadowing and specularity, pose a challenge during the analysis of the acquired volumes as they substantially obfuscate the location of tissue interfaces. To deal with the artifacts and speckle noise patterns and accurately segment the shallowest tissue interface, we propose a cascaded neural network framework, which comprises a conditional Generative Adversarial Network (cGAN) and a Tissue Interface Segmentation Network (TISN). The cGAN pre-segments OCT B-scans by removing undesired specular artifacts and speckle noise patterns just above the shallowest tissue interface, and the TISN combines the original OCT image with the pre-segmentation to segment the shallowest interface. We show the applicability of the cascaded framework to corneal datasets, demonstrate that it precisely segments the shallowest corneal interface, and also show its generalization capacity to limbal datasets. We also propose a hybrid framework, wherein the cGAN pre-segmentation is passed to a traditional image analysis-based segmentation algorithm, and describe the improved segmentation performance. To the best of our knowledge, this is the first approach to remove severe specular artifacts and speckle noise patterns (prior to the shallowest interface) that affect the interpretation of anterior segment OCT datasets, thereby resulting in the accurate segmentation of the shallowest tissue interface.
Tasks
Published 2019-05-07
URL https://arxiv.org/abs/1905.02378v1
PDF https://arxiv.org/pdf/1905.02378v1.pdf
PWC https://paperswithcode.com/paper/accurate-tissue-interface-segmentation-via
Repo
Framework
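
The cascade’s interface can be sketched as channel-wise concatenation: the second-stage network takes the original B-scan together with the cGAN’s pre-segmentation. The networks below are placeholders, not the paper’s cGAN/TISN architectures:

```python
import torch
import torch.nn as nn

# Stand-in stage 1 (cGAN) and stage 2 (TISN) networks; the real
# architectures are far deeper, but the data flow is the same.
cgan = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())
tisn = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

bscan = torch.rand(1, 1, 128, 128)           # toy OCT B-scan
pre_seg = cgan(bscan)                        # stage 1: pre-segmentation
mask = tisn(torch.cat([bscan, pre_seg], 1))  # stage 2: interface mask
print(mask.shape)  # torch.Size([1, 1, 128, 128])
```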