July 28, 2019

Paper Group ANR 344

Automatic Quality Estimation for ASR System Combination. Large-scale Image Geo-Localization Using Dominant Sets. The Geometry of Nodal Sets and Outlier Detection. Image Captioning and Classification of Dangerous Situations. DFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer Classification. Data Distillation for Controlling Specificity in …

Automatic Quality Estimation for ASR System Combination


Title	Automatic Quality Estimation for ASR System Combination
Authors	Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi
Abstract	Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, highly depends on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at “segment level” instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment-level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and it is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%.
Tasks	Speech Recognition
Published	2017-06-22
URL	http://arxiv.org/abs/1706.07238v1
PDF	http://arxiv.org/pdf/1706.07238v1.pdf
PWC	https://paperswithcode.com/paper/automatic-quality-estimation-for-asr-system
Repo
Framework
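
The rank-then-fuse idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the hypotheses are already word-aligned into equal-length slot sequences (real ROVER builds that alignment itself), and the names `rank_then_vote` and `qe_scores` are hypothetical. `qe_scores` stands in for segment-level quality predictions where higher means better. Ranking matters here because ties in the vote are resolved in favor of the better-ranked hypothesis.

```python
from collections import Counter


def rank_then_vote(hypotheses, qe_scores):
    """Majority vote over pre-aligned hypotheses; ties go to the better-ranked one.

    hypotheses: list of equal-length word lists ("" marks an empty slot).
    qe_scores:  one predicted quality score per hypothesis (higher is better).
    """
    # Order hypotheses from best to worst predicted quality.
    order = sorted(range(len(hypotheses)), key=lambda i: qe_scores[i], reverse=True)
    ranked = [hypotheses[i] for i in order]

    n_slots = len(ranked[0])
    if any(len(h) != n_slots for h in ranked):
        raise ValueError("hypotheses must be pre-aligned to the same length")

    output = []
    for slot in range(n_slots):
        words = [h[slot] for h in ranked]
        counts = Counter(words)
        top = max(counts.values())
        # The first word in rank order that reaches the top count wins the slot.
        winner = next(w for w in words if counts[w] == top)
        if winner:  # skip empty slots
            output.append(winner)
    return output
```

With two hypotheses that disagree everywhere, the output is simply the one with the higher predicted quality, which is the case where input ordering decides the result.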

Large-scale Image Geo-Localization Using Dominant Sets


Title	Large-scale Image Geo-Localization Using Dominant Sets
Authors	Eyasu Zemene, Yonatan Tariku, Haroon Idrees, Andrea Prati, Marcello Pelillo, Mubarak Shah
Abstract	This paper presents a new approach for the challenging problem of geo-locating an image using image matching in a structured database of city-wide reference images with known GPS coordinates. We cast the geo-localization as a clustering problem on local image features. Akin to existing approaches on the problem, our framework builds on low-level features which allow partial matching between images. For each local feature in the query image, we find its approximate nearest neighbors in the reference set. Next, we cluster the features from reference images using Dominant Set clustering, which affords several advantages over existing approaches. First, it permits a variable number of nodes in the cluster, which we use to dynamically select the number of nearest neighbors (typically coming from multiple reference images) for each query feature based on its discrimination value. Second, as we also quantify in our experiments, this approach is several orders of magnitude faster than existing approaches. Thus, we obtain multiple clusters (different local maximizers) and obtain a robust final solution to the problem using multiple weak solutions through constrained Dominant Set clustering on global image features, where we enforce the constraint that the query image must be included in the cluster. This second level of clustering also bypasses heuristic approaches to voting and selecting the reference image that matches the query. We evaluated the proposed framework on an existing dataset of 102k street view images as well as a new dataset of 300k images, and show that it outperforms the state-of-the-art by 20% and 7%, respectively, on the two datasets.
Tasks
Published	2017-02-04
URL	http://arxiv.org/abs/1702.01238v3
PDF	http://arxiv.org/pdf/1702.01238v3.pdf
PWC	https://paperswithcode.com/paper/large-scale-image-geo-localization-using
Repo
Framework
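
For readers unfamiliar with Dominant Set clustering, a minimal sketch of one standard way to extract a dominant set is given here: discrete replicator dynamics on a symmetric, non-negative affinity matrix with a zero diagonal. The abstract does not say which dynamics the paper uses, so this is an assumption, and the function name `dominant_set` is hypothetical. Nodes whose weight converges to zero fall outside the cluster, which is how the cluster size ends up variable.

```python
def dominant_set(affinity, iters=200):
    """Run replicator dynamics x_i <- x_i * (A x)_i / (x^T A x) from the barycenter.

    affinity: square list-of-lists, symmetric, non-negative, zero diagonal.
    Returns the weight vector; its support approximates a dominant set.
    """
    n = len(affinity)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        payoff = [sum(affinity[i][j] * x[j] for j in range(n)) for i in range(n)]
        average = sum(x[i] * payoff[i] for i in range(n))
        if average == 0.0:  # no affinities at all: nothing to cluster
            break
        x = [x[i] * payoff[i] / average for i in range(n)]
    return x


# Three mutually similar nodes plus one weakly connected outlier (index 3).
A = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
weights = dominant_set(A)
cluster = [i for i, w in enumerate(weights) if w > 1e-6]
```

In this toy example the outlier's weight decays toward zero, leaving the three tightly connected nodes as the cluster.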

The Geometry of Nodal Sets and Outlier Detection


Title	The Geometry of Nodal Sets and Outlier Detection
Authors	Xiuyuan Cheng, Gal Mishne, Stefan Steinerberger
Abstract	Let $(M,g)$ be a compact manifold and let $-\Delta \phi_k = \lambda_k \phi_k$ be the sequence of Laplacian eigenfunctions. We present a curious new phenomenon which, so far, we only managed to understand in a few highly specialized cases: the family of functions $f_N:M \rightarrow \mathbb{R}_{\geq 0}$ $$ f_N(x) = \sum_{k \leq N} \frac{1}{\sqrt{\lambda_k}} \frac{|\phi_k(x)|}{\|\phi_k\|_{L^{\infty}(M)}} $$ seems strangely suited for the detection of anomalous points on the manifold. It may be heuristically interpreted as the sum over distances to the nearest nodal line and potentially hints at a new phenomenon in spectral geometry. We give rigorous statements on the unit square $[0,1]^2$ (where minima localize in $\mathbb{Q}^2$) and on Paley graphs (where $f_N$ recovers the geometry of quadratic residues of the underlying finite field $\mathbb{F}_p$). Numerical examples show that the phenomenon seems to arise on fairly generic manifolds.
Tasks	Outlier Detection
Published	2017-06-05
URL	http://arxiv.org/abs/1706.01362v1
PDF	http://arxiv.org/pdf/1706.01362v1.pdf
PWC	https://paperswithcode.com/paper/the-geometry-of-nodal-sets-and-outlier
Repo
Framework
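
To make the function $f_N$ from this abstract concrete, here is a small sketch on the unit square. Several things are assumptions rather than details taken from the abstract: Dirichlet boundary conditions, so that the eigenfunctions are $\sin(m\pi x)\sin(n\pi y)$ with eigenvalues $\pi^2(m^2+n^2)$ and sup norm 1; the absolute value $|\phi_k(x)|$ in the numerator (needed for $f_N$ to be non-negative); and the hypothetical function name `f_N_unit_square`.

```python
import math


def f_N_unit_square(x, y, N=50):
    """Evaluate f_N at (x, y) in [0, 1]^2 using the first N Dirichlet eigenpairs."""
    # Enumerate modes (m, n) and keep the N with the smallest eigenvalues.
    # Modes with m, n <= N are enough to contain the N smallest ones.
    modes = sorted(
        ((m, n) for m in range(1, N + 1) for n in range(1, N + 1)),
        key=lambda mn: mn[0] ** 2 + mn[1] ** 2,
    )[:N]

    total = 0.0
    for m, n in modes:
        eigenvalue = math.pi ** 2 * (m * m + n * n)
        phi = math.sin(m * math.pi * x) * math.sin(n * math.pi * y)
        # The sup norm of each eigenfunction is 1, so no extra normalization.
        total += abs(phi) / math.sqrt(eigenvalue)
    return total
```

Evaluating this on a grid and looking for local minima is one way to explore the localization behavior the abstract describes; under the Dirichlet assumption the function vanishes on the boundary, so interior minima are the interesting ones.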

Image Captioning and Classification of Dangerous Situations


Title	Image Captioning and Classification of Dangerous Situations
Authors	Octavio Arriaga, Paul Plöger, Matias Valdenegro-Toro
Abstract	Current robot platforms are being employed to collaborate with humans in a wide range of domestic and industrial tasks. These environments require autonomous systems that are able to classify and communicate anomalous situations such as fires, injured persons, and car accidents, or more generally any potentially dangerous situation for humans. In this paper we introduce an anomaly detection dataset for the purpose of robot applications as well as the design and implementation of a deep learning architecture that classifies and describes dangerous situations using only a single image as input. We report a classification accuracy of 97% and a METEOR score of 16.2. We will make the dataset publicly available after this paper is accepted.
Tasks	Anomaly Detection, Image Captioning
Published	2017-11-07
URL	http://arxiv.org/abs/1711.02578v1
PDF	http://arxiv.org/pdf/1711.02578v1.pdf
PWC	https://paperswithcode.com/paper/image-captioning-and-classification-of
Repo
Framework

DFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer Classification


Title	DFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer Classification
Authors	Manu Goyal, Neil D. Reeves, Adrian K. Davison, Satyan Rajbhandari, Jennifer Spragg, Moi Hoon Yap
Abstract	Globally, in 2016, one out of eleven adults suffered from Diabetes Mellitus. Diabetic Foot Ulcers (DFU) are a major complication of this disease, which if not managed properly can lead to amputation. Current clinical approaches to DFU treatment rely on patient and clinician vigilance, which has significant limitations such as the high cost involved in the diagnosis, treatment and lengthy care of the DFU. We collected an extensive dataset of foot images, which contain DFU from different patients. In this paper, we have proposed the use of traditional computer vision features for detecting foot ulcers among diabetic patients, which represent a cost-effective, remote and convenient healthcare solution. Furthermore, we used Convolutional Neural Networks (CNNs) for the first time in DFU classification. We have proposed a novel convolutional neural network architecture, DFUNet, with better feature extraction to identify the feature differences between healthy skin and the DFU. Using 10-fold cross-validation, DFUNet achieved an AUC score of 0.962. This outperformed both the machine learning and deep learning classifiers we have tested. Here we present the development of a novel and highly sensitive DFUNet for objectively detecting the presence of DFUs. This novel approach has the potential to deliver a paradigm shift in diabetic foot care.
Tasks
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10448v2
PDF	http://arxiv.org/pdf/1711.10448v2.pdf
PWC	https://paperswithcode.com/paper/dfunet-convolutional-neural-networks-for
Repo
Framework

Data Distillation for Controlling Specificity in Dialogue Generation


Title	Data Distillation for Controlling Specificity in Dialogue Generation
Authors	Jiwei Li, Will Monroe, Dan Jurafsky
Abstract	People speak at different levels of specificity in different situations, depending on their knowledge, interlocutors, mood, etc. A conversational agent should have this ability and know when to be specific and when to be general. We propose an approach that gives a neural network-based conversational agent this ability. Our approach involves alternating between data distillation and model training: removing training examples that are closest to the responses most commonly produced by the model trained in the last round and then retraining the model on the remaining dataset. Dialogue generation models trained with different degrees of data distillation manifest different levels of specificity. We then train a reinforcement learning system for selecting among this pool of generation models, to choose the best level of specificity for a given input. Compared to the original generative model trained without distillation, the proposed system is capable of generating more interesting and higher-quality responses, in addition to appropriately adjusting specificity depending on the context. Our research constitutes a specific case of a broader approach involving training multiple subsystems from a single dataset distinguished by differences in a specific property one wishes to model. We show that from such a set of subsystems, one can use reinforcement learning to build a system that tailors its output to different input contexts at test time.
Tasks	Dialogue Generation
Published	2017-02-22
URL	http://arxiv.org/abs/1702.06703v1
PDF	http://arxiv.org/pdf/1702.06703v1.pdf
PWC	https://paperswithcode.com/paper/data-distillation-for-controlling-specificity
Repo
Framework

SAR Image Colorization: Converting Single-Polarization to Fully Polarimetric Using Deep Neural Networks


Title	SAR Image Colorization: Converting Single-Polarization to Fully Polarimetric Using Deep Neural Networks
Authors	Qian Song, Feng Xu, Ya-Qiu Jin
Abstract	A deep neural network-based method is proposed to convert single-polarization grayscale SAR images to fully polarimetric ones. It consists of two components: a feature extractor network that extracts hierarchical multi-scale spatial features from the grayscale SAR image, followed by a feature translator network that maps spatial features to polarimetric features, from which the polarimetric covariance matrix of each pixel can be reconstructed. Both qualitative and quantitative experiments with real fully polarimetric data are conducted to show the efficacy of the proposed method. The reconstructed full-pol SAR image agrees well with the true full-pol image. Existing PolSAR applications such as model-based decomposition and unsupervised classification can be applied directly to the reconstructed full-pol SAR images. This framework can be easily extended to the reconstruction of full-pol data from compact-pol data. The experimental results also show that the proposed method could potentially be used for interference removal on the cross-polarization channel.
Tasks	Colorization
Published	2017-07-22
URL	http://arxiv.org/abs/1707.07225v1
PDF	http://arxiv.org/pdf/1707.07225v1.pdf
PWC	https://paperswithcode.com/paper/sar-image-colorization-converting-single
Repo
Framework

Phrase-based Image Captioning with Hierarchical LSTM Model


Title	Phrase-based Image Captioning with Hierarchical LSTM Model
Authors	Ying Hua Tan, Chee Seng Chan
Abstract	Automatic generation of captions to describe the content of an image has been gaining a lot of research interest recently, where most of the existing works treat the image caption as pure sequential data. Natural language, however, possesses a temporal hierarchy structure, with complex dependencies between each subsequence. In this paper, we propose a phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate image descriptions. In contrast to the conventional solutions that generate captions in a pure sequential manner, our proposed model decodes the image caption from phrase to sentence. It consists of a phrase decoder at the bottom hierarchy to decode noun phrases of variable length, and an abbreviated sentence decoder at the upper hierarchy to decode an abbreviated form of the image description. A complete image caption is formed by combining the generated phrases with the sentence during the inference stage. Empirically, our proposed model shows a better or competitive result on the Flickr8k, Flickr30k and MS-COCO datasets in comparison to the state-of-the-art models. We also show that our proposed model is able to generate more novel captions (not seen in the training data) which are richer in word content on all three datasets.
Tasks	Image Captioning
Published	2017-11-11
URL	http://arxiv.org/abs/1711.05557v1
PDF	http://arxiv.org/pdf/1711.05557v1.pdf
PWC	https://paperswithcode.com/paper/phrase-based-image-captioning-with
Repo
Framework

Elliptification of Rectangular Imagery


Title	Elliptification of Rectangular Imagery
Authors	Chamberlain Fong
Abstract	We present and discuss different algorithms for converting rectangular imagery into elliptical regions. We mainly focus on methods that use mathematical mappings with explicit and invertible equations. The key idea is to start with invertible mappings between the square and the circular disc, then extend them to handle rectangles and ellipses. This extension can be done by simply removing the eccentricity and reintroducing it after applying a chosen square-to-disc mapping.
Tasks
Published	2017-09-22
URL	https://arxiv.org/abs/1709.07875v4
PDF	https://arxiv.org/pdf/1709.07875v4.pdf
PWC	https://paperswithcode.com/paper/elliptification-of-rectangular-imagery
Repo
Framework
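
The "remove the eccentricity, map, reintroduce it" recipe from this abstract can be sketched in a few lines. The square-to-disc step below uses the elliptical grid mapping, one well-known explicit mapping; the abstract does not name the mappings the paper covers, so that choice is an assumption, as are the function names and the convention that the rectangle is $[-a, a] \times [-b, b]$ centered at the origin.

```python
import math


def square_to_disc(x, y):
    """Elliptical grid mapping from the square [-1, 1]^2 onto the unit disc."""
    u = x * math.sqrt(1.0 - y * y / 2.0)
    v = y * math.sqrt(1.0 - x * x / 2.0)
    return u, v


def rectangle_to_ellipse(x, y, a, b):
    """Map a point of [-a, a] x [-b, b] into the ellipse with semi-axes a and b."""
    # 1. Remove the eccentricity: rescale the rectangle to the unit square.
    sx, sy = x / a, y / b
    # 2. Apply the chosen square-to-disc mapping.
    u, v = square_to_disc(sx, sy)
    # 3. Reintroduce the eccentricity: rescale the disc to the ellipse.
    return a * u, b * v
```

Boundary points of the rectangle land on the boundary of the ellipse: for example, the corner `(a, b)` maps to a point satisfying $(u/a)^2 + (v/b)^2 = 1$.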

Balancing Explicability and Explanation in Human-Aware Planning


Title	Balancing Explicability and Explanation in Human-Aware Planning
Authors	Tathagata Chakraborti, Sarath Sreedharan, Subbarao Kambhampati
Abstract	Human-aware planning requires an agent to be aware of the intentions, capabilities and mental model of the human in the loop during its decision process. This can involve generating plans that are explicable to a human observer as well as the ability to provide explanations when such plans cannot be generated. This has led to the notion of “multi-model planning”, which aims to incorporate the effects of human expectation in the deliberative process of a planner - either in the form of explicable task planning or explanations produced thereof. In this paper, we bring these two concepts together and show how a planner can account for both these needs and achieve a trade-off during the plan generation process itself by means of a model-space search method MEGA. This in effect provides a comprehensive perspective of what it means for a decision making agent to be “human-aware” by bringing together existing principles of planning under the umbrella of a single plan generation process. We situate our discussion specifically keeping in mind the recent work on explicable planning and explanation generation, and illustrate these concepts in modified versions of two well known planning domains, as well as a demonstration on a robot involved in a typical search and reconnaissance task with an external supervisor.
Tasks	Decision Making
Published	2017-08-01
URL	http://arxiv.org/abs/1708.00543v2
PDF	http://arxiv.org/pdf/1708.00543v2.pdf
PWC	https://paperswithcode.com/paper/balancing-explicability-and-explanation-in
Repo
Framework

Online Learning for Offloading and Autoscaling in Energy Harvesting Mobile Edge Computing


Title	Online Learning for Offloading and Autoscaling in Energy Harvesting Mobile Edge Computing
Authors	Jie Xu, Lixing Chen, Shaolei Ren
Abstract	Mobile edge computing (a.k.a. fog computing) has recently emerged to enable in-situ processing of delay-sensitive applications at the edge of mobile networks. Providing grid power supply in support of mobile edge computing, however, is costly and even infeasible (in certain rugged or under-developed areas), thus mandating on-site renewable energy as a major or even sole power supply in increasingly many scenarios. Nonetheless, the high intermittency and unpredictability of renewable energy make it very challenging to deliver a high quality of service to users in energy harvesting mobile edge computing systems. In this paper, we address the challenge of incorporating renewables into mobile edge computing and propose an efficient reinforcement learning-based resource management algorithm, which learns on-the-fly the optimal policy of dynamic workload offloading (to the centralized cloud) and edge server provisioning to minimize the long-term system cost (including both service delay and operational cost). Our online learning algorithm uses a decomposition of the (offline) value iteration and (online) reinforcement learning, thus achieving a significant improvement of learning rate and run-time performance when compared to standard reinforcement learning algorithms such as Q-learning. We prove the convergence of the proposed algorithm and analytically show that the learned policy has a simple monotone structure amenable to practical implementation. Our simulation results validate the efficacy of our algorithm, which significantly improves the edge computing performance compared to fixed or myopic optimization schemes and conventional reinforcement learning algorithms.
Tasks	Q-Learning
Published	2017-03-17
URL	http://arxiv.org/abs/1703.06060v1
PDF	http://arxiv.org/pdf/1703.06060v1.pdf
PWC	https://paperswithcode.com/paper/online-learning-for-offloading-and
Repo
Framework
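
For context, the baseline named in this abstract, standard Q-learning, reduces to a one-line update rule. The sketch below is that textbook tabular update written for cost minimization (the abstract speaks of minimizing long-term system cost); it is not the paper's decomposed algorithm, and the state and action indices, the function name `q_update`, and the default step size and discount are all hypothetical.

```python
def q_update(Q, state, action, cost, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a cost-minimizing agent.

    Q is a list of per-state lists of action values (expected discounted cost).
    """
    # Greedy estimate of the future cost: the cheapest action in the next state.
    best_next = min(Q[next_state])
    target = cost + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


# Two states, two actions (e.g. hypothetical "offload" vs "process locally").
Q = [[0.0, 0.0], [1.0, 2.0]]
updated = q_update(Q, state=0, action=1, cost=3.0, next_state=1, alpha=0.5, gamma=0.9)
```

Because every state-action pair has to be visited repeatedly before its estimate settles, this plain update learns slowly on large state spaces, which is the weakness the abstract says the proposed method addresses.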

3D Convolutional Neural Networks for Brain Tumor Segmentation: A Comparison of Multi-resolution Architectures


Title	3D Convolutional Neural Networks for Brain Tumor Segmentation: A Comparison of Multi-resolution Architectures
Authors	Adrià Casamitjana, Santi Puch, Asier Aduriz, Verónica Vilaplana
Abstract	This paper analyzes the use of 3D Convolutional Neural Networks for brain tumor segmentation in MR images. We address the problem using three different architectures that combine fine and coarse features to obtain the final segmentation. We compare three different networks that use multi-resolution features in terms of both design and performance and we show that they improve on their single-resolution counterparts.
Tasks	Brain Tumor Segmentation
Published	2017-05-23
URL	http://arxiv.org/abs/1705.08236v1
PDF	http://arxiv.org/pdf/1705.08236v1.pdf
PWC	https://paperswithcode.com/paper/3d-convolutional-neural-networks-for-brain
Repo
Framework

Material Editing Using a Physically Based Rendering Network


Title	Material Editing Using a Physically Based Rendering Network
Authors	Guilin Liu, Duygu Ceylan, Ersin Yumer, Jimei Yang, Jyh-Ming Lien
Abstract	The ability to edit materials of objects in images is desired by many content creators. However, this is an extremely challenging task as it requires disentangling the intrinsic physical properties of an image. We propose an end-to-end network architecture that replicates the forward image formation process to accomplish this task. Specifically, given a single image, the network first predicts intrinsic properties, i.e. shape, illumination, and material, which are then provided to a rendering layer. This layer performs in-network image synthesis, thereby enabling the network to understand the physics behind the image formation process. The proposed rendering layer is fully differentiable, supports both diffuse and specular materials, and thus can be applicable in a variety of problem settings. We demonstrate a rich set of visually plausible material editing examples and provide an extensive comparative study.
Tasks	Image Generation
Published	2017-08-01
URL	http://arxiv.org/abs/1708.00106v2
PDF	http://arxiv.org/pdf/1708.00106v2.pdf
PWC	https://paperswithcode.com/paper/material-editing-using-a-physically-based
Repo
Framework

Parameter identification in Markov chain choice models


Title	Parameter identification in Markov chain choice models
Authors	Arushi Gupta, Daniel Hsu
Abstract	This work studies the parameter identification problem for the Markov chain choice model of Blanchet, Gallego, and Goyal used in assortment planning. In this model, the product selected by a customer is determined by a Markov chain over the products, where the products in the offered assortment are absorbing states. The underlying parameters of the model were previously shown to be identifiable from the choice probabilities for the all-products assortment, together with choice probabilities for assortments of all-but-one products. Obtaining and estimating choice probabilities for such large assortments is not desirable in many settings. The main result of this work is that the parameters may be identified from assortments of sizes two and three, regardless of the total number of products. The result is obtained via a simple and efficient parameter recovery algorithm.
Tasks
Published	2017-06-02
URL	http://arxiv.org/abs/1706.00729v3
PDF	http://arxiv.org/pdf/1706.00729v3.pdf
PWC	https://paperswithcode.com/paper/parameter-identification-in-markov-chain
Repo
Framework
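
The choice mechanism described in this abstract, a Markov chain in which offered products are absorbing states, can be sketched as an absorption-probability computation. The sketch assumes a layout not spelled out in the abstract: state 0 is a no-purchase option that is always absorbing, `arrival[i]` is the probability that a customer starts at state `i`, and `transition[i][j]` is the probability of moving from an unavailable product `i` to `j`. The function name and the fixed iteration count are likewise hypothetical.

```python
def choice_probabilities(arrival, transition, assortment, iters=1000):
    """Probability that each offered product (or no-purchase, state 0) is chosen.

    Probability mass is pushed through the chain until it is absorbed.
    """
    n = len(arrival)
    absorbing = set(assortment) | {0}  # assumed: no-purchase is always absorbing
    mass = list(arrival)               # mass currently sitting at each state
    chosen = [0.0] * n                 # mass absorbed so far
    for _ in range(iters):
        moved = [0.0] * n
        for i in range(n):
            if i in absorbing:
                chosen[i] += mass[i]   # the customer stops here
            else:
                for j in range(n):     # product i is not offered: move on
                    moved[j] += mass[i] * transition[i][j]
        mass = moved
    return {i: chosen[i] for i in sorted(absorbing)}


# States: 0 = no purchase, 1 and 2 = products. Only product 1 is offered.
arrival = [0.2, 0.5, 0.3]
transition = [
    [1.0, 0.0, 0.0],
    [0.3, 0.0, 0.7],
    [0.5, 0.5, 0.0],
]
probs = choice_probabilities(arrival, transition, assortment=[1])
```

Here a customer who arrives wanting product 2 finds it unavailable and splits evenly between leaving and substituting product 1, so product 1 is chosen with probability 0.5 + 0.3 × 0.5 = 0.65.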

Spatio-temporal interaction model for crowd video analysis


Title	Spatio-temporal interaction model for crowd video analysis
Authors	Neha Bhargava, Subhasis Chaudhuri
Abstract	We present an unsupervised approach to analyze crowds at various levels of granularity: individual, group and collective. We also propose a motion model to represent the collective motion of the crowd. The model captures the spatio-temporal interaction pattern of the crowd from the trajectory data captured over a time period. Furthermore, we propose an effective group detection algorithm that utilizes the eigenvectors of the interaction matrix of the model. We also show that the eigenvalues of the interaction matrix characterize various group activities such as being stationary, walking, splitting and approaching. The algorithm is also extended trivially to recognize individual activity. Finally, we discover the overall crowd behavior by classifying a crowd video into one of eight categories. Since the crowd behavior is determined by its constituent groups, we demonstrate the usefulness of group-level features during classification. Extensive experimentation on various datasets demonstrates a superlative performance of our algorithms over the state-of-the-art methods.
Tasks
Published	2017-10-31
URL	http://arxiv.org/abs/1710.11354v1
PDF	http://arxiv.org/pdf/1710.11354v1.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-interaction-model-for-crowd
Repo
Framework