Paper Group ANR 779
Soft Correspondences in Multimodal Scene Parsing. Beyond triplet loss: a deep quadruplet network for person re-identification. Deep Interactive Region Segmentation and Captioning. Efficient learning with robust gradient descent. Dominant Sets for “Constrained” Image Segmentation. Survey of Visual Question Answering: Datasets and Techniques. Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound Images. Heinrich Behmann’s Contributions to Second-Order Quantifier Elimination from the View of Computational Logic. On the Distortion of Voting with Multiple Representative Candidates. The Robust Reading Competition Annotation and Evaluation Platform. Map-guided Hyperspectral Image Superpixel Segmentation Using Proportion Maps. ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections. Uncovering Latent Style Factors for Expressive Speech Synthesis. Time Series Compression Based on Adaptive Piecewise Recurrent Autoencoder. AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks.
Soft Correspondences in Multimodal Scene Parsing
Title | Soft Correspondences in Multimodal Scene Parsing |
Authors | Sarah Taghavi Namin, Mohammad Najafi, Mathieu Salzmann, Lars Petersson |
Abstract | Exploiting multiple modalities for semantic scene parsing has been shown to improve accuracy over the single-modality scenario. However, multimodal datasets often suffer from problems such as data misalignment and label inconsistencies, while existing methods assume that corresponding regions in the two modalities must have identical labels. We propose to address this issue by formulating multimodal semantic labeling as inference in a CRF and introducing latent nodes to explicitly model inconsistencies between the two modalities. These latent nodes allow us not only to leverage information from both domains to improve labeling, but also to cut the edges between inconsistent regions. We learn the intra-domain and inter-domain potential functions from training data, which avoids hand-tuning of the model parameters. We evaluate our approach on two publicly available datasets containing 2D and 3D data. Thanks to our latent nodes and our learning strategy, our method outperforms the state of the art in both cases. Moreover, to highlight the benefits of the geometric information and the potential of our method for simultaneous 2D/3D semantic and geometric inference, we performed joint inference of semantic and geometric classes in both 2D and 3D, which yielded satisfactory improvements of the labeling results on both datasets. |
Tasks | Scene Parsing |
Published | 2017-09-28 |
URL | http://arxiv.org/abs/1709.09843v1 |
http://arxiv.org/pdf/1709.09843v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-correspondences-in-multimodal-scene |
Repo | |
Framework | |
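As a rough illustration of the model described above, the energy below sketches a CRF over 2D and 3D region labels with binary latent variables on the inter-domain edges. The notation and the exact form of the potentials are assumptions made for illustration, not the paper's definitions.

```latex
% Sketch (assumed notation): unary, intra-domain pairwise, and
% latent-switched inter-domain potentials.
E(\mathbf{y}^{2\mathrm{D}}, \mathbf{y}^{3\mathrm{D}}, \mathbf{h}) =
    \sum_{i} \phi\bigl(y_i^{2\mathrm{D}}\bigr)
  + \sum_{j} \phi\bigl(y_j^{3\mathrm{D}}\bigr)
  + \sum_{(i,i')} \psi\bigl(y_i^{2\mathrm{D}}, y_{i'}^{2\mathrm{D}}\bigr)
  + \sum_{(j,j')} \psi\bigl(y_j^{3\mathrm{D}}, y_{j'}^{3\mathrm{D}}\bigr)
  + \sum_{(i,j)} h_{ij}\,\theta\bigl(y_i^{2\mathrm{D}}, y_j^{3\mathrm{D}}\bigr)
```

Here $h_{ij} \in \{0,1\}$: setting $h_{ij} = 0$ cuts the edge between a 2D region $i$ and a 3D region $j$ whose labels are inconsistent, so the two domains are coupled only where they agree.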
Beyond triplet loss: a deep quadruplet network for person re-identification
Title | Beyond triplet loss: a deep quadruplet network for person re-identification |
Authors | Weihua Chen, Xiaotang Chen, Jianguo Zhang, Kaiqi Huang |
Abstract | Person re-identification (ReID) is an important task in wide-area video surveillance that focuses on identifying people across different cameras. Recently, deep learning networks with a triplet loss have become a common framework for person ReID. However, the triplet loss mainly attends to obtaining correct orders on the training set; it still suffers from weak generalization from the training set to the testing set, resulting in inferior performance. In this paper, we design a quadruplet loss that leads to model outputs with larger inter-class variation and smaller intra-class variation than the triplet loss. As a result, our model has better generalization ability and achieves higher performance on the testing set. In particular, we propose a quadruplet deep network using margin-based online hard negative mining, built on the quadruplet loss, for person ReID. In extensive experiments, the proposed network outperforms most state-of-the-art algorithms on representative datasets, which clearly demonstrates the effectiveness of our method. |
Tasks | Person Re-Identification |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.01719v1 |
http://arxiv.org/pdf/1704.01719v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-triplet-loss-a-deep-quadruplet-network |
Repo | |
Framework | |
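A minimal PyTorch sketch of the quadruplet loss described in the abstract: the first hinge is the familiar triplet term, while the second pushes the positive pair closer together than a pair drawn from two *other* identities. The margin values and the squared-distance convention are illustrative assumptions, not the paper's tuned settings.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, neg1, neg2, margin1=1.0, margin2=0.5):
    # neg1 and neg2 come from two identities different from the anchor's
    # and from each other; margins here are illustrative defaults.
    d_ap = F.pairwise_distance(anchor, positive) ** 2
    d_an = F.pairwise_distance(anchor, neg1) ** 2
    d_nn = F.pairwise_distance(neg1, neg2) ** 2
    triplet_term = F.relu(d_ap - d_an + margin1)  # usual triplet hinge
    quad_term = F.relu(d_ap - d_nn + margin2)     # quadruplet-specific hinge
    return (triplet_term + quad_term).mean()

# Usage with a batch of 32 embeddings of dimension 128:
a, p, n1, n2 = (torch.randn(32, 128) for _ in range(4))
loss = quadruplet_loss(a, p, n1, n2)
```

The second term uses no anchor on the negative side, which is what enlarges inter-class distances beyond what the triplet constraint alone enforces.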
Deep Interactive Region Segmentation and Captioning
Title | Deep Interactive Region Segmentation and Captioning |
Authors | Ali Sharifi Boroujerdi, Maryam Khanian, Michael Breuss |
Abstract | With recent innovations in dense image captioning, it is now possible to describe every object of a scene with a caption, with objects localized by bounding boxes. However, interpreting such output is not trivial due to the many overlapping bounding boxes. Furthermore, current captioning frameworks do not let the user apply personal preferences to exclude regions that are not of interest. In this paper, we propose a novel hybrid deep learning architecture for interactive region segmentation and captioning, where the user can specify an arbitrary region of the image to be processed. To this end, a dedicated Fully Convolutional Network (FCN) named Lyncean FCN (LFCN) is trained on our special training data to isolate the User Intention Region (UIR) as the output of an efficient segmentation. In parallel, a dense image captioning model provides a wide variety of captions for that region. The UIR is then explained with the caption of the best-matching bounding box. To the best of our knowledge, this is the first work to provide such a comprehensive output. Our experiments show the superiority of the proposed approach over state-of-the-art interactive segmentation methods on several well-known datasets. In addition, replacing the bounding boxes with the result of the interactive segmentation leads to a better understanding of the dense image captioning output, as well as improved object detection accuracy in terms of Intersection over Union (IoU). |
Tasks | Image Captioning, Interactive Segmentation, Object Detection |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08364v1 |
http://arxiv.org/pdf/1707.08364v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-interactive-region-segmentation-and |
Repo | |
Framework | |
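The final step the abstract describes, explaining the UIR with the caption of the best-matching box, can be sketched as a simple IoU match between the segmented region's bounding box and the dense-caption boxes. The matching criterion below is an assumption for illustration; the paper's exact rule may differ.

```python
import numpy as np

def mask_to_box(mask):
    """Bounding box (x1, y1, x2, y2) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def best_caption(uir_mask, boxes, captions):
    """Pick the dense caption whose box best overlaps the segmented UIR."""
    target = mask_to_box(uir_mask)
    scores = [iou(target, b) for b in boxes]
    return captions[int(np.argmax(scores))]
```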
Efficient learning with robust gradient descent
Title | Efficient learning with robust gradient descent |
Authors | Matthew J. Holland, Kazushi Ikeda |
Abstract | Minimizing the empirical risk is a popular training strategy, but for learning tasks where the data may be noisy or heavy-tailed, one may require many observations in order to generalize well. To achieve better performance under less stringent requirements, we introduce a procedure which constructs a robust approximation of the risk gradient for use in an iterative learning routine. Using high-probability bounds on the excess risk of this algorithm, we show that our update does not deviate far from the ideal gradient-based update. Empirical tests using both controlled simulations and real-world benchmark data show that in diverse settings, the proposed procedure can learn more efficiently, using fewer resources (iterations and observations) while generalizing better. |
Tasks | |
Published | 2017-06-01 |
URL | http://arxiv.org/abs/1706.00182v3 |
http://arxiv.org/pdf/1706.00182v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-learning-with-robust-gradient |
Repo | |
Framework | |
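The paper builds its robust gradient with an M-estimator-style procedure; as a simpler stand-in with the same flavor, the sketch below uses a median-of-means estimate of the risk gradient inside a plain gradient descent loop on least squares. This is explicitly a substitute estimator, not the paper's construction.

```python
import numpy as np

def mom_gradient(grads, k=5, rng=None):
    """Median-of-means gradient: partition per-sample gradients into k
    blocks, average each block, take the coordinate-wise median. A robust
    stand-in for the paper's M-estimator-based gradient."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(grads))
    blocks = np.array_split(grads[idx], k)
    means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(means, axis=0)

def robust_gd(X, y, steps=200, lr=0.1, k=5):
    """Gradient descent on squared loss with a robust gradient estimate."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        per_sample = (X @ w - y)[:, None] * X  # per-sample grad of squared loss
        w -= lr * mom_gradient(per_sample, k=k)
    return w
```

Under heavy-tailed noise on `y`, the blockwise median damps the influence of a few extreme per-sample gradients that would otherwise dominate the plain empirical mean.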
Dominant Sets for “Constrained” Image Segmentation
Title | Dominant Sets for “Constrained” Image Segmentation |
Authors | Eyasu Zemene, Leulseged Tesfaye Alemu, Marcello Pelillo |
Abstract | Image segmentation has come a long way since the early days of computer vision, and still remains a challenging task. Modern variations of the classical (purely bottom-up) approach involve, e.g., some form of user assistance (interactive segmentation) or ask for the simultaneous segmentation of two or more images (co-segmentation). At an abstract level, all these variants can be thought of as “constrained” versions of the original formulation, whereby the segmentation process is guided by some external source of information. In this paper, we propose a new approach to tackle this kind of problem in a unified way. Our work is based on some properties of a family of quadratic optimization problems related to dominant sets, a well-known graph-theoretic notion of a cluster which generalizes the concept of a maximal clique to edge-weighted graphs. In particular, we show that by properly controlling a regularization parameter which determines the structure and the scale of the underlying problem, we are in a position to extract groups of dominant-set clusters that are constrained to contain predefined elements. We focus in particular on interactive segmentation and co-segmentation (in both its unsupervised and interactive versions). The proposed algorithm can deal naturally with several types of constraints and input modalities, including scribbles, sloppy contours, and bounding boxes, and is able to robustly handle noisy annotations on the part of the user. Experiments on standard benchmark datasets show the effectiveness of our approach as compared to state-of-the-art algorithms on a variety of natural images under several input conditions and constraints. |
Tasks | Interactive Segmentation, Semantic Segmentation |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.05309v1 |
http://arxiv.org/pdf/1707.05309v1.pdf | |
PWC | https://paperswithcode.com/paper/dominant-sets-for-constrained-image |
Repo | |
Framework | |
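A numpy sketch of the constrained dominant-set idea: penalize the diagonal entries of vertices outside the user-provided seed set by a regularization parameter alpha, then run replicator dynamics on the simplex. The direction of the regularization and the constant shift are our reading for illustration; the precise bound on alpha that guarantees the seed belongs to the extracted cluster is derived in the paper.

```python
import numpy as np

def constrained_dominant_set(A, seed, alpha=10.0, iters=1000, tol=1e-8):
    """A: symmetric non-negative affinity matrix with zero diagonal.
    seed: indices the extracted cluster is constrained to touch."""
    n = len(A)
    B = A.astype(float).copy()
    mask = np.ones(n, dtype=bool)
    mask[list(seed)] = False
    B[mask, mask] -= alpha       # penalize off-seed diagonal entries
    if B.min() < 0:              # constant shift: adds a constant to x^T B x
        B = B - B.min()          # on the simplex, so maximizers are unchanged
    x = np.full(n, 1.0 / n)      # start at the barycenter of the simplex
    for _ in range(iters):
        Bx = B @ x
        x_new = x * Bx / (x @ Bx)   # replicator dynamics step
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return np.where(x > 1e-6)[0]    # support of x = extracted cluster
```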
Survey of Visual Question Answering: Datasets and Techniques
Title | Survey of Visual Question Answering: Datasets and Techniques |
Authors | Akshay Kumar Gupta |
Abstract | Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The first part of the survey details the various datasets for VQA and compares them along some common factors. The second part of this survey details the different approaches for VQA, classified into four types: non-deep learning models, deep learning models without attention, deep learning models with attention, and other models which do not fit into the first three. Finally, we compare the performances of these approaches and provide some directions for future work. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03865v2 |
http://arxiv.org/pdf/1705.03865v2.pdf | |
PWC | https://paperswithcode.com/paper/survey-of-visual-question-answering-datasets |
Repo | |
Framework | |
Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound Images
Title | Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound Images |
Authors | Jan Egger, Dieter Schmalstieg, Xiaojun Chen, Wolfram G. Zoller, Alexander Hann |
Abstract | Ultrasound (US) is the most commonly used liver imaging modality worldwide. Due to its low cost, it is increasingly used in the follow-up of cancer patients with metastases localized in the liver. In this contribution, we present the results of an interactive segmentation approach for liver metastases in US acquisitions. (Semi-)automatic segmentation is still very challenging because of the low image quality and the low contrast between the metastasis and the surrounding liver tissue. Thus, the state of the art in clinical practice is still manual measurement and outlining of the metastases in the US images. We tackle the problem with an interactive segmentation approach that provides real-time feedback of the segmentation results. The approach has been evaluated on typical US acquisitions from the clinical routine; the datasets consisted of pancreatic cancer metastases. Even for difficult cases, satisfactory segmentation results could be achieved thanks to the interactive real-time behavior of the approach. In total, 40 clinical images have been evaluated with our method by comparing the results against manual ground-truth segmentations. This evaluation yielded an average Dice score of 85% and an average Hausdorff distance of 13 pixels. |
Tasks | Interactive Segmentation |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05368v1 |
http://arxiv.org/pdf/1704.05368v1.pdf | |
PWC | https://paperswithcode.com/paper/interactive-outlining-of-pancreatic-cancer |
Repo | |
Framework | |
Heinrich Behmann’s Contributions to Second-Order Quantifier Elimination from the View of Computational Logic
Title | Heinrich Behmann’s Contributions to Second-Order Quantifier Elimination from the View of Computational Logic |
Authors | Christoph Wernhard |
Abstract | For relational monadic formulas (the Löwenheim class), second-order quantifier elimination, which is closely related to computation of uniform interpolants, projection and forgetting - operations that currently receive much attention in knowledge processing - always succeeds. The decidability proof for this class by Heinrich Behmann from 1922 explicitly proceeds by elimination with equivalence-preserving formula rewriting. Here we reconstruct the results from Behmann’s publication in detail and discuss related issues that are relevant in the context of modern approaches to second-order quantifier elimination in computational logic. In addition, an extensive documentation of the letters and manuscripts in Behmann’s bequest that concern second-order quantifier elimination is given, including a commented register and English abstracts of the German sources with focus on technical material. In the late 1920s Behmann attempted to develop an elimination-based decision method for formulas with predicates whose arity is larger than one. His manuscripts and the correspondence with Wilhelm Ackermann show technical aspects that are still of interest today and give insight into the genesis of Ackermann’s landmark paper “Untersuchungen über das Eliminationsproblem der mathematischen Logik” from 1935, which laid the foundation of the two prevailing modern approaches to second-order quantifier elimination. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06868v1 |
http://arxiv.org/pdf/1712.06868v1.pdf | |
PWC | https://paperswithcode.com/paper/heinrich-behmanns-contributions-to-second |
Repo | |
Framework | |
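For a concrete feel of the monadic second-order quantifier elimination the abstract discusses, here is a standard textbook-style instance (illustrative, not taken from the paper):

```latex
% Eliminating the second-order quantifier over the unary predicate P:
\exists P\,\bigl(\forall x\,(A(x) \rightarrow P(x)) \;\wedge\; \forall x\,(P(x) \rightarrow B(x))\bigr)
\;\equiv\;
\forall x\,(A(x) \rightarrow B(x))
% Right-to-left: take P := A as witness.
% Left-to-right: chain the two implications through P.
```

The result is a first-order formula equivalent to the original, which is exactly what elimination by equivalence-preserving rewriting delivers in the monadic case.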
On the Distortion of Voting with Multiple Representative Candidates
Title | On the Distortion of Voting with Multiple Representative Candidates |
Authors | Yu Cheng, Shaddin Dughmi, David Kempe |
Abstract | We study positional voting rules when candidates and voters are embedded in a common metric space, and cardinal preferences are naturally given by distances in the metric space. In a positional voting rule, each candidate receives a score from each ballot based on the ballot’s rank order; the candidate with the highest total score wins the election. The cost of a candidate is his sum of distances to all voters, and the distortion of an election is the ratio between the cost of the elected candidate and the cost of the optimum candidate. We consider the case when candidates are representative of the population, in the sense that they are drawn i.i.d. from the population of the voters, and analyze the expected distortion of positional voting rules. Our main result is a clean and tight characterization of positional voting rules that have constant expected distortion (independent of the number of candidates and the metric space). Our characterization result immediately implies constant expected distortion for Borda Count and elections in which each voter approves a constant fraction of all candidates. On the other hand, we obtain super-constant expected distortion for Plurality, Veto, and approving a constant number of candidates. These results contrast with previous results on voting with metric preferences: When the candidates are chosen adversarially, all of the preceding voting rules have distortion linear in the number of candidates or voters. Thus, the model of representative candidates allows us to distinguish voting rules which seem equally bad in the worst case. |
Tasks | |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07600v1 |
http://arxiv.org/pdf/1711.07600v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-distortion-of-voting-with-multiple |
Repo | |
Framework | |
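A small Monte-Carlo sketch of the setting described above: voters and candidates drawn i.i.d. from the same distribution on a line metric, with a positional rule given by a rank-to-score map. It illustrates (but of course does not prove) the Borda vs. Plurality gap; the line metric and uniform distribution are our simplifying assumptions.

```python
import numpy as np

def expected_distortion(rule_scores, n_voters=200, n_cands=20, trials=500, seed=0):
    """Average distortion of a positional rule with representative
    (i.i.d.) candidates on the line metric [0, 1]."""
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(trials):
        voters = rng.random(n_voters)
        cands = rng.random(n_cands)
        dists = np.abs(voters[:, None] - cands[None, :])        # metric costs
        ranks = np.argsort(np.argsort(dists, axis=1), axis=1)   # 0 = closest
        winner = np.argmax(rule_scores(ranks, n_cands).sum(axis=0))
        cost = dists.sum(axis=0)          # social cost of each candidate
        ratios.append(cost[winner] / cost.min())
    return np.mean(ratios)

borda = lambda ranks, m: (m - 1) - ranks                  # score m-1 down to 0
plurality = lambda ranks, m: (ranks == 0).astype(float)   # top choice only

print(expected_distortion(borda), expected_distortion(plurality))
```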
The Robust Reading Competition Annotation and Evaluation Platform
Title | The Robust Reading Competition Annotation and Evaluation Platform |
Authors | Dimosthenis Karatzas, Lluis Gómez, Anguelos Nicolaou, Marçal Rusiñol |
Abstract | The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become a de facto evaluation standard for robust reading systems and algorithms. Concurrent with its second incarnation in 2011, an ongoing effort was started to develop an on-line framework to facilitate the hosting and management of competitions. This paper outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the competitions. The RRC Annotation and Evaluation Platform is a modular framework, fully accessible through on-line interfaces. It comprises a collection of tools and services for managing all processes involved with defining and evaluating a research task, from dataset definition to annotation management, evaluation specification and results analysis. Although the framework has been designed with robust reading research in mind, many of the provided tools are generic by design. All aspects of the RRC Annotation and Evaluation Framework are available for research use. |
Tasks | |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06617v2 |
http://arxiv.org/pdf/1710.06617v2.pdf | |
PWC | https://paperswithcode.com/paper/the-robust-reading-competition-annotation-and |
Repo | |
Framework | |
Map-guided Hyperspectral Image Superpixel Segmentation Using Proportion Maps
Title | Map-guided Hyperspectral Image Superpixel Segmentation Using Proportion Maps |
Authors | Hao Sun, Alina Zare |
Abstract | A map-guided superpixel segmentation method for hyperspectral imagery is developed and introduced. The proposed approach develops a hyperspectral-appropriate version of the SLIC superpixel segmentation algorithm, leverages map information to guide segmentation, and incorporates the semi-supervised Partial Membership Latent Dirichlet Allocation (sPM-LDA) to obtain a final superpixel segmentation. The proposed method is applied to two real hyperspectral data sets and quantitative cluster validity metrics indicate that the proposed approach outperforms existing hyperspectral superpixel segmentation methods. |
Tasks | |
Published | 2017-01-06 |
URL | http://arxiv.org/abs/1701.01745v1 |
http://arxiv.org/pdf/1701.01745v1.pdf | |
PWC | https://paperswithcode.com/paper/map-guided-hyperspectral-image-superpixel |
Repo | |
Framework | |
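The "hyperspectral-appropriate" flavor of SLIC can be sketched by swapping the color term of SLIC's combined distance for a spectral one. The spectral-angle choice and weighting below are our assumptions for illustration, not necessarily the paper's exact feature distance.

```python
import numpy as np

def spectral_angle(a, b):
    """Spectral angle between spectra: a band-count-independent similarity
    commonly used for hyperspectral pixels (our choice here)."""
    cos = (a * b).sum(-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def slic_distance(px_spec, px_xy, center_spec, center_xy, S, m=0.5):
    """SLIC-style combined distance: spectral term plus spatial term
    normalized by the grid interval S, weighted by compactness m."""
    d_spec = spectral_angle(px_spec, center_spec)
    d_xy = np.linalg.norm(px_xy - center_xy, axis=-1)
    return d_spec + m * d_xy / S

# e.g. distance of every pixel in an (H, W, B) cube to one cluster center:
# d = slic_distance(cube, xy_grid, center_spec, center_xy, S=20)
```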
ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections
Title | ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections |
Authors | Sujith Ravi |
Abstract | Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly “on” device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains two different types of networks: a full trainer neural network (using existing architectures like feed-forward NNs or LSTM RNNs) combined with a simpler “projection” network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question “how many neural bits are required to solve a given task?” using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets. |
Tasks | Image Classification, Text Classification |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00630v2 |
http://arxiv.org/pdf/1708.00630v2.pdf | |
PWC | https://paperswithcode.com/paper/projectionnet-learning-efficient-on-device |
Repo | |
Framework | |
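The core projection step described above, random projections mapping inputs to bits, can be sketched with sign hashes over fixed random hyperplanes, as in locality-sensitive hashing. The trainer network, the joint objective, and the small trainable network that operates on the bits are omitted here.

```python
import numpy as np

class RandomBitProjection:
    """LSH-style sketch of the projection step: fixed random hyperplanes
    map an input (or intermediate representation) to a compact bit vector.
    The projection matrix is never trained."""
    def __init__(self, in_dim, n_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_bits, in_dim))

    def __call__(self, x):
        return (self.W @ x > 0).astype(np.uint8)  # sign bits

proj = RandomBitProjection(in_dim=128, n_bits=64)
bits = proj(np.random.randn(128))  # 64 bits, ~8 bytes if packed
```

Because the projection is fixed and cheap, only the small network in bit space needs to be stored and evaluated on-device.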
Uncovering Latent Style Factors for Expressive Speech Synthesis
Title | Uncovering Latent Style Factors for Expressive Speech Synthesis |
Authors | Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous |
Abstract | Prosodic modeling is a core problem in speech synthesis. The key challenge is producing desirable prosody from textual input containing only phonetic information. In this preliminary study, we introduce the concept of “style tokens” in Tacotron, a recently proposed end-to-end neural speech synthesis model. Using style tokens, we aim to extract independent prosodic styles from training data. We show that without annotation data or an explicit supervision signal, our approach can automatically learn a variety of prosodic variations in a purely data-driven way. Importantly, each style token corresponds to a fixed style factor regardless of the given text sequence. As a result, we can control the prosodic style of synthetic speech in a somewhat predictable and globally consistent way. |
Tasks | Speech Synthesis |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00520v1 |
http://arxiv.org/pdf/1711.00520v1.pdf | |
PWC | https://paperswithcode.com/paper/uncovering-latent-style-factors-for |
Repo | |
Framework | |
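A minimal numpy sketch of what "style tokens" amount to mechanically: a bank of learned embeddings combined by attention weights into a single global conditioning vector. The dot-product scoring and the dimensions are illustrative assumptions, not the model's actual architecture details.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def style_embedding(query, tokens):
    """Attend over a bank of style token embeddings with a text-side
    query; the weighted sum is the global 'style' conditioning vector."""
    weights = softmax(tokens @ query)   # (n_tokens,) attention weights
    return weights @ tokens             # weighted combination of tokens

tokens = np.random.randn(10, 256)   # 10 style tokens, 256-dim each
q = np.random.randn(256)            # query derived from the text encoder
s = style_embedding(q, tokens)      # one global prosody vector per utterance
```

Because each token is a fixed embedding independent of the text sequence, scaling or selecting token weights at synthesis time gives the globally consistent style control the abstract describes.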
Time Series Compression Based on Adaptive Piecewise Recurrent Autoencoder
Title | Time Series Compression Based on Adaptive Piecewise Recurrent Autoencoder |
Authors | Daniel Hsu |
Abstract | Time series account for a large proportion of the data stored in financial, medical, and scientific databases. The efficient storage of time series is important in practical applications. In this paper, we propose a novel compression scheme for time series. The encoder and decoder are both composed of recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks. There is an autoencoder between the encoder and decoder, which encodes the hidden state and input together and decodes them at the decoder side. Moreover, we pre-process the original time series by partitioning it into segments of various lengths that have similar total variation. The experimental study shows that the proposed algorithm achieves a competitive compression ratio on real-world time series. |
Tasks | Time Series |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07961v2 |
http://arxiv.org/pdf/1707.07961v2.pdf | |
PWC | https://paperswithcode.com/paper/time-series-compression-based-on-adaptive |
Repo | |
Framework | |
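The pre-processing step described above, cutting the series into variable-length segments of similar total variation, admits a simple greedy sketch; the paper's exact partitioning rule may differ.

```python
import numpy as np

def partition_by_tv(x, tv_budget):
    """Greedily cut x into segments whose total variation (sum of
    absolute first differences) is roughly tv_budget each."""
    segments, start, acc = [], 0, 0.0
    for i in range(1, len(x)):
        acc += abs(x[i] - x[i - 1])
        if acc >= tv_budget:
            segments.append(x[start:i + 1])
            start, acc = i, 0.0
    if start < len(x) - 1:
        segments.append(x[start:])  # remainder segment
    return segments

series = np.cumsum(np.random.randn(1000))   # random-walk test series
parts = partition_by_tv(series, tv_budget=20.0)
```

Volatile stretches thus end up in short segments and calm stretches in long ones, so each segment presents the recurrent autoencoder with a comparable amount of "activity" to compress.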
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Title | AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks |
Authors | Aditya Devarakonda, Maxim Naumov, Michael Garland |
Abstract | Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer more parallelism and hence better computational efficiency. We have developed a new training approach that, rather than statically choosing a single batch size for all epochs, adaptively increases the batch size during the training process. Our method delivers the convergence rate of small batch sizes while achieving performance similar to large batch sizes. We analyse our approach using the standard AlexNet, ResNet, and VGG networks operating on the popular CIFAR-10, CIFAR-100, and ImageNet datasets. Our results demonstrate that learning with adaptive batch sizes can improve performance by factors of up to 6.25 on 4 NVIDIA Tesla P100 GPUs while changing accuracy by less than 1% relative to training with fixed batch sizes. |
Tasks | |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02029v2 |
http://arxiv.org/pdf/1712.02029v2.pdf | |
PWC | https://paperswithcode.com/paper/adabatch-adaptive-batch-sizes-for-training |
Repo | |
Framework | |
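A simplified sketch of the adaptive-batch-size idea: hold the learning rate fixed and double the batch size on an epoch schedule, which plays roughly the role that learning-rate decay plays in a fixed-batch regime. The actual schedule and any accompanying hyperparameter adjustments are the paper's; this is a stand-in on a toy least-squares problem.

```python
import numpy as np

def sgd_adabatch(grad_fn, w, n_samples, lr=0.05, batch0=64,
                 double_every=5, epochs=20, seed=0):
    """SGD whose batch size doubles every `double_every` epochs."""
    rng = np.random.default_rng(seed)
    batch = batch0
    for epoch in range(epochs):
        if epoch > 0 and epoch % double_every == 0:
            batch *= 2   # larger batches later: less gradient noise,
                         # more parallelism per step
        idx = rng.permutation(n_samples)
        for s in range(0, n_samples, batch):
            w = w - lr * grad_fn(w, idx[s:s + batch])
    return w

# Toy usage: least squares, gradient of mean squared error over a batch.
X = np.random.randn(4096, 10)
y = X @ np.arange(10.0)
mse_grad = lambda w, rows: 2 * X[rows].T @ (X[rows] @ w - y[rows]) / len(rows)
w = sgd_adabatch(mse_grad, np.zeros(10), n_samples=len(X))
```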