October 20, 2019

3034 words 15 mins read

Paper Group ANR 25


Search-Guided, Lightly-supervised Training of Structured Prediction Energy Networks. Near-Optimal Coresets of Kernel Density Estimates. Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction. Context-Aware Synthesis and Placement of Object Instances. Adversarial Examples in Remote Sensing. Hyper-Hue and EMAP on Hyperspectral …

Search-Guided, Lightly-supervised Training of Structured Prediction Energy Networks

Title Search-Guided, Lightly-supervised Training of Structured Prediction Energy Networks
Authors Amirmohammad Rooshenas, Dongxu Zhang, Gopal Sharma, Andrew McCallum
Abstract In structured output prediction tasks, labeling ground-truth training output is often expensive. However, for many tasks, even when the true output is unknown, we can evaluate predictions using a scalar reward function, which may be easily assembled from human knowledge or non-differentiable pipelines. But searching through the entire output space to find the best output with respect to this reward function is typically intractable. In this paper, we instead use efficient truncated randomized search in this reward function to train structured prediction energy networks (SPENs), which provide efficient test-time inference using gradient-based search on a smooth, learned representation of the score landscape, and have previously yielded state-of-the-art results in structured prediction. In particular, this truncated randomized search in the reward function yields previously unknown local improvements, providing effective supervision to SPENs, avoiding their traditional need for labeled training data.
Tasks Structured Prediction
Published 2018-12-22
URL https://arxiv.org/abs/1812.09603v2
PDF https://arxiv.org/pdf/1812.09603v2.pdf
PWC https://paperswithcode.com/paper/search-guided-lightly-supervised-training-of
Repo
Framework
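The search-guided supervision described above can be sketched as follows. This is a minimal, hypothetical illustration with a toy binary output space and a stand-in reward function, not the authors' implementation: a truncated randomized search perturbs the current prediction and returns any higher-reward output it finds, and the pair (old output, improved output) then serves as a ranking constraint for training the SPEN energy.

```python
import numpy as np

def truncated_random_search(y, reward, n_samples=50, flip_prob=0.1, rng=None):
    """Perturb a binary structured output y at random and return the best
    local improvement found under a (possibly non-differentiable) reward."""
    rng = rng or np.random.default_rng(0)
    best_y, best_r = y, reward(y)
    for _ in range(n_samples):
        mask = rng.random(y.shape) < flip_prob   # flip a few output bits
        cand = np.where(mask, 1 - y, y)
        r = reward(cand)
        if r > best_r:
            best_y, best_r = cand, r
    return best_y, best_r

# Toy reward: agreement with a hidden target, standing in for a
# human-designed scoring function.
target = np.array([1, 0, 1, 1, 0])
reward = lambda y: float((y == target).mean())

y0 = np.zeros(5, dtype=int)
y1, r1 = truncated_random_search(y0, reward)
# (y0, y1) now forms a ranking pair: the energy network can be trained so
# that E(y1) < E(y0), i.e. higher-reward outputs receive lower energy.
```

The key point is that the search only needs to find a *local* improvement, not the global optimum of the reward, to generate a useful training signal.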

Near-Optimal Coresets of Kernel Density Estimates

Title Near-Optimal Coresets of Kernel Density Estimates
Authors Jeff M. Phillips, Wai Ming Tai
Abstract We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size $O(\sqrt{d}/\varepsilon\cdot \sqrt{\log 1/\varepsilon} )$, and we show a near-matching lower bound of size $\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$. When $d\geq 1/\varepsilon^2$, it is known that the size of the coreset can be $O(1/\varepsilon^2)$. The upper bound is a polynomial-in-$(1/\varepsilon)$ improvement when $d \in [3,1/\varepsilon^2)$ and the lower bound is the first known lower bound to depend on $d$ for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel which can be negative.
Tasks
Published 2018-02-06
URL http://arxiv.org/abs/1802.01751v5
PDF http://arxiv.org/pdf/1802.01751v5.pdf
PWC https://paperswithcode.com/paper/near-optimal-coresets-of-kernel-density
Repo
Framework
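To make the coreset guarantee concrete, here is a small sketch comparing a Gaussian KDE on a full point set against one built from a naive uniformly sampled subset. The paper's construction is far more sophisticated (and achieves near-optimal sizes); uniform sampling shown here only gives roughly $O(1/\sqrt{m})$ error for a coreset of size $m$, and all data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(query, pts, bw=0.5):
    """Gaussian kernel density estimate of `pts` evaluated at `query`."""
    d2 = (query[:, None] - pts[None, :]) ** 2
    return np.exp(-d2 / (2 * bw**2)).mean(axis=1)

# Full point set and a small uniformly sampled "coreset" (a naive baseline).
pts = rng.normal(size=2000)
coreset = rng.choice(pts, size=200, replace=False)

grid = np.linspace(-3, 3, 50)
err = np.max(np.abs(kde(grid, pts) - kde(grid, coreset)))
# err is the L_inf discrepancy between the full KDE and the coreset KDE.
```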

Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction

Title Prioritized Multi-View Stereo Depth Map Generation Using Confidence Prediction
Authors Christian Mostegel, Friedrich Fraundorfer, Horst Bischof
Abstract In this work, we propose a novel approach to prioritize the depth map computation of multi-view stereo (MVS) to obtain compact 3D point clouds of high quality and completeness at low computational cost. Our prioritization approach operates before the MVS algorithm is executed and consists of two steps. In the first step, we aim to find a good set of matching partners for each view. In the second step, we rank the resulting view clusters (i.e. key views with matching partners) according to their impact on the fulfillment of desired quality parameters such as completeness, ground resolution and accuracy. In addition to geometric analysis, we use a novel machine learning technique for training a confidence predictor. The purpose of this confidence predictor is to estimate the chances of a successful depth reconstruction for each pixel in each image for one specific MVS algorithm based on the RGB images and the image constellation. The underlying machine learning technique does not require any ground truth or manually labeled data for training, but instead adapts ideas from depth map fusion for providing a supervision signal. The trained confidence predictor allows us to evaluate the quality of image constellations and their potential impact on the resulting 3D reconstruction and thus builds a solid foundation for our prioritization approach. In our experiments, we are thus able to reach more than 70% of the maximal reachable quality fulfillment using only 5% of the available images as key views. For evaluating our approach within and across different domains, we use two completely different scenarios, i.e. cultural heritage preservation and reconstruction of single family houses.
Tasks 3D Reconstruction
Published 2018-03-22
URL http://arxiv.org/abs/1803.08323v1
PDF http://arxiv.org/pdf/1803.08323v1.pdf
PWC https://paperswithcode.com/paper/prioritized-multi-view-stereo-depth-map
Repo
Framework

Context-Aware Synthesis and Placement of Object Instances

Title Context-Aware Synthesis and Placement of Object Instances
Authors Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
Abstract Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem. Solving it requires (a) determining a location to place an object in the scene and (b) determining its appearance at the location. Such an object insertion model can potentially facilitate numerous image editing and scene parsing applications. In this paper, we propose an end-to-end trainable neural network for the task of inserting an object instance mask of a specified class into the semantic label map of an image. Our network consists of two generative modules where one determines where the inserted object mask should be (i.e., location and scale) and the other determines what the object mask shape (and pose) should look like. The two modules are connected together via a spatial transformation network and jointly trained. We devise a learning procedure that leverages both supervised and unsupervised data and show our model can insert an object at diverse locations with various appearances. We conduct extensive experimental validations with comparisons to strong baselines to verify the effectiveness of the proposed network.
Tasks Scene Parsing
Published 2018-12-06
URL http://arxiv.org/abs/1812.02350v2
PDF http://arxiv.org/pdf/1812.02350v2.pdf
PWC https://paperswithcode.com/paper/context-aware-synthesis-and-placement-of
Repo
Framework

Adversarial Examples in Remote Sensing

Title Adversarial Examples in Remote Sensing
Authors Wojciech Czaja, Neil Fendley, Michael Pekala, Christopher Ratto, I-Jeng Wang
Abstract This paper considers attacks against machine learning algorithms used in remote sensing applications, a domain that presents a suite of challenges that are not fully addressed by current research focused on natural image data such as ImageNet. In particular, we present a new study of adversarial examples in the context of satellite image classification problems. Using a recently curated data set and associated classifier, we provide a preliminary analysis of adversarial examples in settings where the targeted classifier is permitted multiple observations of the same location over time. While our experiments to date are purely digital, our problem setup explicitly incorporates a number of practical considerations that a real-world attacker would need to take into account when mounting a physical attack. We hope this work provides a useful starting point for future studies of potential vulnerabilities in this setting.
Tasks Image Classification
Published 2018-05-28
URL http://arxiv.org/abs/1805.10997v1
PDF http://arxiv.org/pdf/1805.10997v1.pdf
PWC https://paperswithcode.com/paper/adversarial-examples-in-remote-sensing
Repo
Framework
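The adversarial examples studied in this setting are typically generated by gradient-based attacks. As a concrete (and standard) illustration, here is the fast gradient sign method (FGSM) applied to a toy linear classifier; the weights and inputs are random stand-ins, not the paper's satellite-imagery models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, x, y):
    # y in {-1, +1}; negative log-likelihood of a linear classifier
    return -np.log(sigmoid(y * (w @ x)))

def fgsm(w, x, y, eps=0.1):
    """Fast gradient sign attack: step the input in the direction that
    increases the classification loss."""
    grad_x = -y * (1 - sigmoid(y * (w @ x))) * w   # d(loss)/dx
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)        # stand-in for a trained classifier
x = rng.normal(size=8)        # stand-in for an image feature vector
y = 1
x_adv = fgsm(w, x, y)
# For a linear model this perturbation provably increases the loss.
```

The multi-observation setting studied in the paper would extend this to perturbations that remain effective across several images of the same location over time.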

Hyper-Hue and EMAP on Hyperspectral Images for Supervised Layer Decomposition of Old Master Drawings

Title Hyper-Hue and EMAP on Hyperspectral Images for Supervised Layer Decomposition of Old Master Drawings
Authors AmirAbbas Davari, Nikolaos Sakaltras, Armin Haeberle, Sulaiman Vesal, Vincent Christlein, Andreas Maier, Christian Riess
Abstract Old master drawings were mostly created step by step in several layers using different materials. To art historians and restorers, examination of these layers brings various insights into the artistic work process and helps to answer questions about the object, its attribution and its authenticity. However, these layers typically overlap and are oftentimes difficult to differentiate with the unaided eye. For example, a common layer combination is red chalk under ink. In this work, we propose an image processing pipeline that operates on hyperspectral images to separate such layers. Using this pipeline, we show that hyperspectral images enable better layer separation than RGB images, and that spectral focus stacking aids the layer separation. In particular, we propose to use two descriptors in hyperspectral historical document analysis, namely hyper-hue and extended multi-attribute profile (EMAP). Our comparative results with other features underline the efficacy of the three proposed improvements.
Tasks
Published 2018-01-29
URL http://arxiv.org/abs/1801.09472v2
PDF http://arxiv.org/pdf/1801.09472v2.pdf
PWC https://paperswithcode.com/paper/hyper-hue-and-emap-on-hyperspectral-images
Repo
Framework

Token-level and sequence-level loss smoothing for RNN language models

Title Token-level and sequence-level loss smoothing for RNN language models
Authors Maha Elbayad, Laurent Besacier, Jakob Verbeek
Abstract Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from “exposure bias”: during training tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing that encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary, and significantly improve results.
Tasks Image Captioning, Machine Translation
Published 2018-05-14
URL http://arxiv.org/abs/1805.05062v1
PDF http://arxiv.org/pdf/1805.05062v1.pdf
PWC https://paperswithcode.com/paper/token-level-and-sequence-level-loss-smoothing
Repo
Framework
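The core idea of reward-augmented maximum likelihood, at the token level, is to replace the one-hot training target with a distribution that puts mass on tokens similar to the gold token, $q(y) \propto \exp(r(y, y^*)/\tau)$. The sketch below uses embedding similarity as the reward; the vocabulary, embeddings, and temperature are hypothetical stand-ins, not the paper's exact formulation.

```python
import numpy as np

def smoothed_targets(gold_id, emb, tau=0.5):
    """Token-level smoothed target: instead of a one-hot vector, distribute
    mass over tokens similar to the gold token, q(y) ~ exp(sim(y, y*)/tau)."""
    sim = emb @ emb[gold_id]               # similarity of each token to gold
    logits = sim / tau
    logits -= logits.max()                 # numerical stability
    q = np.exp(logits)
    return q / q.sum()

rng = np.random.default_rng(0)
vocab, dim = 10, 4
emb = rng.normal(size=(vocab, dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm embeddings

q = smoothed_targets(gold_id=3, emb=emb)
# q sums to 1 and peaks at the gold token, leaking mass to similar tokens.
```

Training then minimizes cross-entropy against `q` instead of the one-hot target, so near-miss predictions are penalized less than unrelated ones.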

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

Title Blind Source Separation with Optimal Transport Non-negative Matrix Factorization
Authors Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada
Abstract Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than Euclidean NMF, for both isolated voice reconstruction and BSS tasks. Finally, we demonstrate how to use optimal transport for cross domain sound processing tasks, where frequencies represented in the input spectrograms may be different from one spectrogram to another.
Tasks
Published 2018-02-15
URL http://arxiv.org/abs/1802.05429v1
PDF http://arxiv.org/pdf/1802.05429v1.pdf
PWC https://paperswithcode.com/paper/blind-source-separation-with-optimal
Repo
Framework
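For context, here is the standard Euclidean NMF baseline (Lee-Seung multiplicative updates) that the paper improves on by swapping the Euclidean loss for an optimal-transport loss between spectrogram frequencies. The matrix below is a random stand-in for a magnitude spectrogram; this sketch does not include the optimal-transport loss itself.

```python
import numpy as np

def nmf(V, k=4, iters=200, rng=None):
    """Euclidean NMF via multiplicative updates: V ~ W @ H with W, H >= 0.
    Updates monotonically decrease the Frobenius reconstruction error."""
    rng = rng or np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((30, 20))          # stand-in for a magnitude spectrogram
W, H = nmf(V)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The paper's contribution is to measure the reconstruction error with a transport cost that reflects how humans perceive shifts between nearby frequencies, rather than with this entrywise Euclidean distance.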

Image Segmentation Using Subspace Representation and Sparse Decomposition

Title Image Segmentation Using Subspace Representation and Sparse Decomposition
Authors Shervin Minaee
Abstract Image foreground extraction is a classical problem in image processing and vision, with a large range of applications. In this dissertation, we focus on the extraction of text and graphics in mixed-content images, and design novel approaches for various aspects of this problem. We first propose a sparse decomposition framework, which models the background by a subspace containing smooth basis vectors, and foreground as a sparse and connected component. We then formulate an optimization framework to solve this problem, by adding suitable regularizations to the cost function to promote the desired characteristics of each component. We present two techniques to solve the proposed optimization problem, one based on alternating direction method of multipliers (ADMM), and the other one based on robust regression. Promising results are obtained for screen content image segmentation using the proposed algorithm. We then propose a robust subspace learning algorithm for the representation of the background component using training images that could contain both background and foreground components, as well as noise. With the learnt subspace for the background, we can further improve the segmentation results, compared to using a fixed subspace. Lastly, we investigate a different class of signal/image decomposition problem, where only one signal component is active at each signal element. In this case, besides estimating each component, we need to find their supports, which can be specified by a binary mask. We propose a mixed-integer programming problem, that jointly estimates the two components and their supports through an alternating optimization scheme. We show the application of this algorithm on various problems, including image segmentation, video motion segmentation, and also separation of text from textured images.
Tasks Motion Segmentation, Semantic Segmentation
Published 2018-04-06
URL http://arxiv.org/abs/1804.02419v1
PDF http://arxiv.org/pdf/1804.02419v1.pdf
PWC https://paperswithcode.com/paper/image-segmentation-using-subspace
Repo
Framework
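The smooth-background plus sparse-foreground model above can be illustrated with a simplified 1-D analogue: fit the background in a low-degree polynomial subspace and extract the foreground by soft-thresholding the residual. The dissertation's actual solvers use ADMM or robust regression on images; the basis, signal, and threshold below are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def decompose(x, lam=0.3, iters=50):
    """Alternate between a smooth-subspace background fit and a sparse
    foreground obtained by soft-thresholding the residual."""
    n = len(x)
    t = np.linspace(-1, 1, n)
    B = np.vander(t, 4)            # smooth polynomial basis (degree <= 3)
    f = np.zeros(n)
    for _ in range(iters):
        coef, *_ = np.linalg.lstsq(B, x - f, rcond=None)  # background fit
        b = B @ coef
        f = soft_threshold(x - b, lam)                    # sparse foreground
    return b, f

# Synthetic signal: smooth ramp (background) plus a narrow spike (foreground).
t = np.linspace(-1, 1, 100)
x = 0.5 * t + 1.0 * (np.abs(t - 0.3) < 0.02)
b, f = decompose(x)
# f concentrates on the spike; the rest of the residual falls below lam.
```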

OCAPIS: R package for Ordinal Classification And Preprocessing In Scala

Title OCAPIS: R package for Ordinal Classification And Preprocessing In Scala
Authors M. Cristina Heredia-Gómez, Salvador García, Pedro Antonio Gutiérrez, Francisco Herrera
Abstract Ordinal data are data where a natural order exists among the labels. The classification and pre-processing of this type of data is attracting more and more interest in the area of machine learning, due to its presence in many common problems. Traditionally, ordinal classification problems have been approached as nominal problems. However, that implies not taking into account their natural order constraints. In this paper, an innovative R package named ocapis (Ordinal Classification and Preprocessing In Scala) is introduced. Implemented mainly in Scala and available through Github, this library includes four learners and two pre-processing algorithms for ordinal and monotonic data. Main features of the package and examples of installation and use are explained throughout this manuscript.
Tasks
Published 2018-10-23
URL http://arxiv.org/abs/1810.09733v3
PDF http://arxiv.org/pdf/1810.09733v3.pdf
PWC https://paperswithcode.com/paper/ocapis-r-package-for-ordinal-classification
Repo
Framework

A Mixture of Views Network with Applications to the Classification of Breast Microcalcifications

Title A Mixture of Views Network with Applications to the Classification of Breast Microcalcifications
Authors Yaniv Shachor, Hayit Greenspan, Jacob Goldberger
Abstract In this paper we examine data fusion methods for multi-view data classification. We present a decision concept which explicitly takes into account the input multi-view structure, where for each case there is a different subset of relevant views. This data fusion concept, which we dub Mixture of Views, is implemented by a special purpose neural network architecture. It is demonstrated on the task of classifying breast microcalcifications as benign or malignant based on CC and MLO mammography views. The single view decisions are combined by a data-driven decision, according to the relevance of each view in a given case, into a global decision. The method is evaluated on a large multi-view dataset extracted from the standardized digital database for screening mammography (DDSM). The experimental results show that our method outperforms previously suggested fusion methods.
Tasks
Published 2018-03-19
URL http://arxiv.org/abs/1803.06898v1
PDF http://arxiv.org/pdf/1803.06898v1.pdf
PWC https://paperswithcode.com/paper/a-mixture-of-views-network-with-applications
Repo
Framework
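The Mixture of Views decision rule can be sketched as a data-driven gate that scores each view's relevance for the current case and forms a convex combination of the per-view decisions. Everything below (gate parameters, features, view probabilities) is a hypothetical stand-in for the trained network described in the abstract.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_views(view_probs, view_feats, gate_w):
    """Combine per-view class probabilities using gate weights computed
    from each view's features, so the relevance of a view varies per case."""
    gate = softmax(np.array([w @ f for w, f in zip(gate_w, view_feats)]))
    return sum(g * p for g, p in zip(gate, view_probs)), gate

rng = np.random.default_rng(0)
# Two mammography views (e.g. CC and MLO), each with its own classifier output.
p_cc, p_mlo = np.array([0.8, 0.2]), np.array([0.4, 0.6])
feats = [rng.normal(size=5), rng.normal(size=5)]     # per-view features
gate_w = [rng.normal(size=5), rng.normal(size=5)]    # gate parameters

p_final, gate = mixture_of_views([p_cc, p_mlo], feats, gate_w)
# gate sums to 1; p_final is a convex combination of the two view decisions.
```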

Self-Normalization Properties of Language Modeling

Title Self-Normalization Properties of Language Modeling
Authors Jacob Goldberger, Oren Melamud
Abstract Self-normalizing discriminative models approximate the normalized probability of a class without having to compute the partition function. In the context of language modeling, this property is particularly appealing as it may significantly reduce run-times due to large word vocabularies. In this study, we provide a comprehensive investigation of language modeling self-normalization. First, we theoretically analyze the inherent self-normalization properties of Noise Contrastive Estimation (NCE) language models. Then, we compare them empirically to softmax-based approaches, which are self-normalized using explicit regularization, and suggest a hybrid model with compelling properties. Finally, we uncover a surprising negative correlation between self-normalization and perplexity across the board, as well as some regularity in the observed errors, which may potentially be used for improving self-normalization algorithms in the future.
Tasks Language Modelling
Published 2018-06-04
URL http://arxiv.org/abs/1806.00913v1
PDF http://arxiv.org/pdf/1806.00913v1.pdf
PWC https://paperswithcode.com/paper/self-normalization-properties-of-language
Repo
Framework
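For readers unfamiliar with the term: a model with scores $s(w,c)$ defines $p(w\mid c)=\exp(s(w,c))/Z(c)$ with partition function $Z(c)=\sum_{w’} \exp(s(w’,c))$, and it is called self-normalizing when, loosely,

```latex
Z(c) \approx 1 \quad \text{for all contexts } c,
```

so that the raw score $s(w,c)$ can be used directly as a log-probability at run time without summing over the whole vocabulary. This informal statement is a standard gloss of the property, not the paper's precise definition.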

Video Person Re-identification by Temporal Residual Learning

Title Video Person Re-identification by Temporal Residual Learning
Authors Ju Dai, Pingping Zhang, Huchuan Lu, Hongyu Wang
Abstract In this paper, we propose a novel feature learning framework for video person re-identification (re-ID). The proposed framework largely aims to exploit the adequate temporal information of video sequences and tackle the poor spatial alignment of moving pedestrians. More specifically, for exploiting the temporal information, we design a temporal residual learning (TRL) module to simultaneously extract the generic and specific features of consecutive frames. The TRL module is equipped with two bi-directional LSTMs (BiLSTM), which are respectively responsible for describing a moving person in different aspects, providing complementary information for better feature representations. To deal with the poor spatial alignment in video re-ID datasets, we propose a spatial-temporal transformer network (ST^2N) module. Transformation parameters in the ST^2N module are learned by leveraging the high-level semantic information of the current frame as well as the temporal context knowledge from other frames. The proposed ST^2N module with fewer learnable parameters allows effective person alignments under significant appearance changes. Extensive experimental results on the large-scale MARS, PRID2011, ILIDS-VID and SDU-VID datasets demonstrate that the proposed method achieves consistently superior performance and outperforms most of the very recent state-of-the-art methods.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2018-02-22
URL http://arxiv.org/abs/1802.07918v1
PDF http://arxiv.org/pdf/1802.07918v1.pdf
PWC https://paperswithcode.com/paper/video-person-re-identification-by-temporal
Repo
Framework

Partial Person Re-identification with Alignment and Hallucination

Title Partial Person Re-identification with Alignment and Hallucination
Authors Sara Iodice, Krystian Mikolajczyk
Abstract Partial person re-identification involves matching pedestrian frames where only a part of a body is visible in corresponding images. This reflects a practical CCTV surveillance scenario, where full person views are often not available. Missing body parts make the comparison very challenging due to significant misalignment and varying scale of the views. We propose Partial Matching Net (PMN) that detects body joints, aligns partial views and hallucinates the missing parts based on the information present in the frame and a learned model of a person. The aligned and reconstructed views are then combined into a joint representation and used for matching images. We evaluate our approach and compare to other methods on three different datasets, demonstrating significant improvements.
Tasks Person Re-Identification
Published 2018-07-24
URL http://arxiv.org/abs/1807.09162v1
PDF http://arxiv.org/pdf/1807.09162v1.pdf
PWC https://paperswithcode.com/paper/partial-person-re-identification-with
Repo
Framework

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Title Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Authors Sharan Vaswani, Francis Bach, Mark Schmidt
Abstract Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions. We also show that this condition implies that SGD can find a first-order stationary point as efficiently as full gradient descent in non-convex settings. Under interpolation, we further show that all smooth loss functions with a finite-sum structure satisfy a weaker growth condition. Given this weaker condition, we prove that SGD with a constant step-size attains the deterministic convergence rate in both the strongly-convex and convex settings. Under additional assumptions, the above results enable us to prove an O(1/k^2) mistake bound for k iterations of a stochastic perceptron algorithm using the squared-hinge loss. Finally, we validate our theoretical findings with experiments on synthetic and real datasets.
Tasks
Published 2018-10-16
URL http://arxiv.org/abs/1810.07288v3
PDF http://arxiv.org/pdf/1810.07288v3.pdf
PWC https://paperswithcode.com/paper/fast-and-faster-convergence-of-sgd-for-over
Repo
Framework
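The strong growth condition referenced in this abstract is commonly stated as follows for a finite-sum objective $f(x) = \frac{1}{n}\sum_i f_i(x)$ (this is the standard form from the SGD literature, given here as background rather than quoted from the paper):

```latex
\mathbb{E}_i\!\left[\|\nabla f_i(x)\|^2\right] \;\le\; \rho\, \|\nabla f(x)\|^2 \qquad \text{for all } x,
```

with $\rho \ge 1$. Under interpolation, every $f_i$ is minimized at the minimizer of $f$, so all stochastic gradients vanish together at the optimum; this is what removes the usual noise floor and lets constant step-size SGD match deterministic convergence rates.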