October 16, 2019

Paper Group ANR 975

Paraphrase Detection on Noisy Subtitles in Six Languages. Full-Frame Scene Coordinate Regression for Image-Based Localization. Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems. Generative Adversarial Networks for MR-CT Deformable Image Registration. Blood Vessel Geometry Synthesis using …

Paraphrase Detection on Noisy Subtitles in Six Languages

Title Paraphrase Detection on Noisy Subtitles in Six Languages
Authors Eetu Sjöblom, Mathias Creutz, Mikko Aulamo
Abstract We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six European languages: German, English, Finnish, French, Russian, and Swedish. We train two types of supervised sentence embedding models: a word-averaging (WA) model and a gated recurrent averaging network (GRAN) model. We find that GRAN outperforms WA and is more robust to noisy training data. Better results are obtained with more and noisier data than with less and cleaner data. Additionally, we experiment on other datasets, without reaching the same level of performance, because of domain mismatch between training and test data.
Tasks Sentence Embedding
Published 2018-09-21
URL http://arxiv.org/abs/1809.07978v1
PDF http://arxiv.org/pdf/1809.07978v1.pdf
PWC https://paperswithcode.com/paper/paraphrase-detection-on-noisy-subtitles-in
Repo
Framework
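
The word-averaging (WA) idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy vectors, the function names, the cosine-similarity scoring, and the 0.8 threshold are hypothetical and are not taken from the paper, whose models are trained on Opusparcus data.

```python
import math

# Hypothetical toy word vectors; a real WA model would learn these from data.
TOY_VECTORS = {
    "hello": [1.0, 0.0],
    "hi": [0.9, 0.1],
    "there": [0.0, 1.0],
    "goodbye": [-1.0, 0.0],
}


def average_embedding(sentence, vectors):
    """Embed a sentence as the mean of its known word vectors."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        raise ValueError("sentence contains no known words")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_paraphrase(s1, s2, vectors=TOY_VECTORS, threshold=0.8):
    """Flag a pair as a paraphrase when the embeddings are similar enough.

    The threshold is an assumed value chosen for this toy example.
    """
    e1 = average_embedding(s1, vectors)
    e2 = average_embedding(s2, vectors)
    return cosine_similarity(e1, e2) >= threshold
```

With these toy vectors, `is_paraphrase("hello there", "hi there")` is true and `is_paraphrase("hello there", "goodbye there")` is false. A GRAN model would replace the plain mean with a recurrent, gated combination of the word vectors.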

Full-Frame Scene Coordinate Regression for Image-Based Localization

Title Full-Frame Scene Coordinate Regression for Image-Based Localization
Authors Xiaotian Li, Juha Ylioinas, Juho Kannala
Abstract Image-based localization, or camera relocalization, is a fundamental problem in computer vision and robotics, and it refers to estimating camera pose from an image. Recent state-of-the-art approaches use learning based methods, such as Random Forests (RFs) and Convolutional Neural Networks (CNNs), to regress for each pixel in the image its corresponding position in the scene’s world coordinate frame, and solve the final pose via a RANSAC-based optimization scheme using the predicted correspondences. In this paper, instead of in a patch-based manner, we propose to perform the scene coordinate regression in a full-frame manner to make the computation efficient at test time and, more importantly, to add more global context to the regression process to improve the robustness. To do so, we adopt a fully convolutional encoder-decoder neural network architecture which accepts a whole image as input and produces scene coordinate predictions for all pixels in the image. However, using more global context is prone to overfitting. To alleviate this issue, we propose to use data augmentation to generate more data for training. In addition to the data augmentation in 2D image space, we also augment the data in 3D space. We evaluate our approach on the publicly available 7-Scenes dataset, and experiments show that it has better scene coordinate predictions and achieves state-of-the-art results in localization with improved robustness on the hardest frames (e.g., frames with repeated structures).
Tasks Camera Relocalization, Data Augmentation, Image-Based Localization
Published 2018-02-09
URL http://arxiv.org/abs/1802.03237v2
PDF http://arxiv.org/pdf/1802.03237v2.pdf
PWC https://paperswithcode.com/paper/full-frame-scene-coordinate-regression-for
Repo
Framework

Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems

Title Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems
Authors Seonghan Ryu, Seokhwan Kim, Junhwi Choi, Hwanjo Yu, Gary Geunbae Lee
Abstract To ensure satisfactory user experience, dialog systems must be able to determine whether an input sentence is in-domain (ID) or out-of-domain (OOD). We assume that only ID sentences are available as training data because collecting enough OOD sentences in an unbiased way is a laborious and time-consuming job. This paper proposes a novel neural sentence embedding method that represents sentences in a low-dimensional continuous vector space that emphasizes aspects that distinguish ID cases from OOD cases. We first used a large set of unlabeled text to pre-train word representations that are used to initialize neural sentence embedding. Then we used domain-category analysis as an auxiliary task to train neural sentence embedding for OOD sentence detection. After the sentence representations were learned, we used them to train an autoencoder aimed at OOD sentence detection. We evaluated our method by experimentally comparing it to the state-of-the-art methods in an eight-domain dialog system; our proposed method achieved the highest accuracy in all tests.
Tasks Sentence Embedding
Published 2018-07-27
URL http://arxiv.org/abs/1807.11567v1
PDF http://arxiv.org/pdf/1807.11567v1.pdf
PWC https://paperswithcode.com/paper/neural-sentence-embedding-using-only-in
Repo
Framework

Generative Adversarial Networks for MR-CT Deformable Image Registration

Title Generative Adversarial Networks for MR-CT Deformable Image Registration
Authors Christine Tanner, Firat Ozdemir, Romy Profanter, Valeriy Vishnevsky, Ender Konukoglu, Orcun Goksel
Abstract Deformable Image Registration (DIR) of MR and CT images is one of the most challenging registration tasks, due to the inherent structural differences of the modalities and the missing dense ground truth. Recently, cycle Generative Adversarial Networks (cycle-GANs) have been used to learn the intensity relationship between these two modalities for unpaired brain data. Yet their usefulness for DIR has not been assessed. In this study we evaluate the DIR performance for thoracic and abdominal organs after synthesis by cycle-GAN. We show that geometric changes, which differentiate the two populations (e.g. inhale vs. exhale), are readily synthesized as well. This causes substantial problems for any application which relies on spatial correspondences being preserved between the real and the synthesized image (e.g. plan, segmentation, landmark propagation). To alleviate this problem, we investigated reducing the spatial information provided to the discriminator by decreasing the size of its receptive fields. Image synthesis was learned from 17 unpaired subjects per modality. Registration performance was evaluated with respect to manual segmentations of 11 structures for 3 subjects from the VISERAL challenge. State-of-the-art DIR methods based on Normalized Mutual Information (NMI), Modality Independent Neighborhood Descriptor (MIND) and their novel combination achieved a mean segmentation overlap ratio of 76.7%, 67.7%, and 76.9%, respectively. This dropped to 69.1% or less when registering images synthesized by cycle-GAN based on local correlation, due to the poor performance on the thoracic region, where large lung volume changes were synthesized. Performance for the abdominal region was similar to that of CT-MRI NMI registration (77.4 vs. 78.8%) when using 3D synthesizing MRIs (12 slices) and medium sized receptive fields for the discriminator.
Tasks Image Generation, Image Registration
Published 2018-07-19
URL http://arxiv.org/abs/1807.07349v1
PDF http://arxiv.org/pdf/1807.07349v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-networks-for-mr-ct
Repo
Framework

Blood Vessel Geometry Synthesis using Generative Adversarial Networks

Title Blood Vessel Geometry Synthesis using Generative Adversarial Networks
Authors Jelmer M. Wolterink, Tim Leiner, Ivana Isgum
Abstract Computationally synthesized blood vessels can be used for training and evaluation of medical image analysis applications. We propose a deep generative model to synthesize blood vessel geometries, with an application to coronary arteries in cardiac CT angiography (CCTA). In the proposed method, a Wasserstein generative adversarial network (GAN) consisting of a generator and a discriminator network is trained. While the generator tries to synthesize realistic blood vessel geometries, the discriminator tries to distinguish synthesized geometries from those of real blood vessels. Both real and synthesized blood vessel geometries are parametrized as 1D signals based on the central vessel axis. The generator can optionally be provided with an attribute vector to synthesize vessels with particular characteristics. The GAN was optimized using a reference database with parametrizations of 4,412 real coronary artery geometries extracted from CCTA scans. After training, plausible coronary artery geometries could be synthesized based on random vectors sampled from a latent space. A qualitative analysis showed strong similarities between real and synthesized coronary arteries. A detailed analysis of the latent space showed that the diversity present in coronary artery anatomy was accurately captured by the generator. Results show that Wasserstein generative adversarial networks can be used to synthesize blood vessel geometries.
Tasks
Published 2018-04-12
URL http://arxiv.org/abs/1804.04381v1
PDF http://arxiv.org/pdf/1804.04381v1.pdf
PWC https://paperswithcode.com/paper/blood-vessel-geometry-synthesis-using
Repo
Framework

Unbiased Estimation of the Value of an Optimized Policy

Title Unbiased Estimation of the Value of an Optimized Policy
Authors Elon Portugaly, Joseph J. Pfeiffer III
Abstract Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the A/B test data features to action space and optimizes reward. However, although A/B testing provides an unbiased estimator for the value of deploying B (i.e., switching from policy A to B), direct application of those samples to learn the optimized policy P generally does not provide an unbiased estimator of the value of P, as the samples were observed when constructing P. In situations where the costs and risks associated with deploying a policy are high, such an unbiased estimator is highly desirable. We present a procedure for learning optimized policies and getting unbiased estimates for the value of deploying them. We wrap any policy learning procedure with a bagging process and obtain out-of-bag policy inclusion decisions for each sample. We then prove that the inverse-propensity-weighting effect estimator is unbiased when applied to the optimized subset. Likewise, we apply the same idea to obtain out-of-bag unbiased per-sample value estimates of the measurement that are independent of the randomized treatment, and use these estimates to build an unbiased doubly-robust effect estimator. Lastly, we empirically show that even when the average treatment effect is negative we can find a positive optimized policy.
Tasks
Published 2018-06-07
URL http://arxiv.org/abs/1806.02794v1
PDF http://arxiv.org/pdf/1806.02794v1.pdf
PWC https://paperswithcode.com/paper/unbiased-estimation-of-the-value-of-an
Repo
Framework
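
For readers unfamiliar with the inverse-propensity-weighting (IPW) effect estimator named in the abstract above, here is a minimal sketch of the textbook form for a randomized trial. It is an illustration only: the function name and the sample layout are assumptions, and the sketch does not implement the paper's bagging or out-of-bag policy inclusion step.

```python
def ipw_effect(samples):
    """Textbook IPW estimate of the average treatment effect.

    samples: iterable of (treated, reward, propensity) tuples, where
    propensity is the known probability that the sample was assigned
    to treatment. This layout is an assumption of the sketch.
    """
    total = 0.0
    count = 0
    for treated, reward, propensity in samples:
        if not 0.0 < propensity < 1.0:
            raise ValueError("propensity must lie strictly between 0 and 1")
        if treated:
            # Treated rewards are up-weighted by 1 / P(treated).
            total += reward / propensity
        else:
            # Control rewards are subtracted, weighted by 1 / P(control).
            total -= reward / (1.0 - propensity)
        count += 1
    if count == 0:
        raise ValueError("no samples given")
    return total / count
```

In a 50/50 A/B test where every treated sample earns 2.0 and every control sample earns 1.0, the estimate is 1.0, the difference between the two arms. The paper's contribution, per the abstract, is applying such an estimator only to samples selected by an out-of-bag policy so that it stays unbiased.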

Learning to Personalize in Appearance-Based Gaze Tracking

Title Learning to Personalize in Appearance-Based Gaze Tracking
Authors Erik Lindén, Jonas Sjöstrand, Alexandre Proutiere
Abstract Personal variations severely limit the performance of appearance-based gaze tracking. Adapting to these variations using standard neural network model adaptation methods is difficult. The problems range from overfitting, due to small amounts of training data, to underfitting, due to restrictive model architectures. We tackle these problems by introducing the SPatial Adaptive GaZe Estimator (SPAZE). By modeling personal variations as a low-dimensional latent parameter space, SPAZE provides just enough adaptability to capture the range of personal variations without being prone to overfitting. Calibrating SPAZE for a new person reduces to solving a small optimization problem. SPAZE achieves an error of 2.70 degrees with 9 calibration samples on MPIIGaze, improving on the state-of-the-art by 14 %. We contribute to gaze tracking research by empirically showing that personal variations are well-modeled as a 3-dimensional latent parameter space for each eye. We show that this low-dimensionality is expected by examining model-based approaches to gaze tracking. We also show that accurate head pose-free gaze tracking is possible.
Tasks Calibration, Gaze Estimation
Published 2018-07-02
URL https://arxiv.org/abs/1807.00664v3
PDF https://arxiv.org/pdf/1807.00664v3.pdf
PWC https://paperswithcode.com/paper/appearance-based-3d-gaze-estimation-with
Repo
Framework

Temporal Convolution Networks for Real-Time Abdominal Fetal Aorta Analysis with Ultrasound

Title Temporal Convolution Networks for Real-Time Abdominal Fetal Aorta Analysis with Ultrasound
Authors Nicolò Savioli, Silvia Visentin, Erich Cosmi, Enrico Grisan, Pablo Lamata, Giovanni Montana
Abstract The automatic analysis of ultrasound sequences can substantially improve the efficiency of clinical diagnosis. In this work we present our attempt to automate the challenging task of measuring the vascular diameter of the fetal abdominal aorta from ultrasound images. We propose a neural network architecture consisting of three blocks: a convolutional layer for the extraction of imaging features, a Convolution Gated Recurrent Unit (C-GRU) for enforcing the temporal coherence across video frames and exploiting the temporal redundancy of a signal, and a regularized loss function, called CyclicLoss, to impose our prior knowledge about the periodicity of the observed signal. We present experimental evidence suggesting that the proposed architecture can reach an accuracy substantially superior to previously proposed methods, providing an average reduction of the mean squared error from 0.31 mm² (state of the art) to 0.09 mm², and a relative error reduction from 8.1% to 5.3%. The mean execution speed of the proposed approach of 289 frames per second makes it suitable for real-time clinical use.
Tasks
Published 2018-07-11
URL http://arxiv.org/abs/1807.04056v1
PDF http://arxiv.org/pdf/1807.04056v1.pdf
PWC https://paperswithcode.com/paper/temporal-convolution-networks-for-real-time
Repo
Framework

On exponential convergence of SGD in non-convex over-parametrized learning

Title On exponential convergence of SGD in non-convex over-parametrized learning
Authors Raef Bassily, Mikhail Belkin, Siyuan Ma
Abstract Large over-parametrized models learned via stochastic gradient descent (SGD) methods have become a key element in modern machine learning. Although SGD methods are very effective in practice, most theoretical analyses of SGD suggest slower convergence than what is empirically observed. In our recent work [8] we analyzed how interpolation, common in modern over-parametrized learning, results in exponential convergence of SGD with constant step size for convex loss functions. In this note, we extend those results to a much broader non-convex function class satisfying the Polyak-Lojasiewicz (PL) condition. A number of important non-convex problems in machine learning, including some classes of neural networks, have been recently shown to satisfy the PL condition. We argue that the PL condition provides a relevant and attractive setting for many machine learning problems, particularly in the over-parametrized regime.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02564v1
PDF http://arxiv.org/pdf/1811.02564v1.pdf
PWC https://paperswithcode.com/paper/on-exponential-convergence-of-sgd-in-non
Repo
Framework
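
As a reminder of what the Polyak-Lojasiewicz (PL) condition in the abstract above states, the standard definition is sketched below, together with the generic shape of an exponential convergence bound. The rate symbol \rho is a placeholder introduced here; the paper's actual constants and step-size requirements are not given in the abstract and are not reproduced.

```latex
% PL condition with constant \mu > 0, where f^* = \min_w f(w):
\frac{1}{2}\,\lVert \nabla f(w) \rVert^2 \;\ge\; \mu\,\bigl(f(w) - f^*\bigr)
\quad \text{for all } w.

% Generic form of exponential convergence of the iterates w_t
% (\rho is a placeholder rate, assumed to lie in (0, 1)):
\mathbb{E}\bigl[f(w_t)\bigr] - f^* \;\le\; (1 - \rho)^t\,\bigl(f(w_0) - f^*\bigr).
```

The condition does not require convexity: it only asks that the gradient norm grows at least as fast as the suboptimality gap, which is why it can hold for some non-convex problems.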

Using Mode Connectivity for Loss Landscape Analysis

Title Using Mode Connectivity for Loss Landscape Analysis
Authors Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Abstract Mode connectivity is a recently introduced framework that empirically establishes the connectedness of minima by finding a high accuracy curve between two independently trained models. To investigate the limits of this setup, we examine the efficacy of this technique in extreme cases where the input models are trained or initialized differently. We find that the procedure is resilient to such changes. Given this finding, we propose using the framework for analyzing loss surfaces and training trajectories more generally, and in this direction, study SGD with cosine annealing and restarts (SGDR). We report that while SGDR moves over barriers in its trajectory, propositions claiming that it converges to and escapes from multiple local minima are not substantiated by our empirical results.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06977v1
PDF http://arxiv.org/pdf/1806.06977v1.pdf
PWC https://paperswithcode.com/paper/using-mode-connectivity-for-loss-landscape
Repo
Framework
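
The SGDR schedule studied in the abstract above (cosine annealing with restarts) can be sketched as follows. This is a simplified illustration with a fixed cycle length; the function and parameter names are assumptions, and the original SGDR formulation also allows cycles that grow in length after each restart.

```python
import math


def sgdr_learning_rate(step, cycle_length, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate with warm restarts.

    Within each cycle the rate decays from lr_max towards lr_min along
    half a cosine wave, then jumps back to lr_max at the next restart.
    A fixed cycle_length is an assumption of this sketch.
    """
    if cycle_length <= 0:
        raise ValueError("cycle_length must be positive")
    position = step % cycle_length  # steps elapsed since the last restart
    cosine = math.cos(math.pi * position / cycle_length)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)
```

For example, with `cycle_length=10` and `lr_max=0.1`, the rate starts at 0.1, falls to about 0.05 halfway through the cycle, and returns to 0.1 at step 10, where the restart occurs.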

Discovering Underlying Plans Based on Shallow Models

Title Discovering Underlying Plans Based on Shallow Models
Authors Hankz Hankui Zhuo, Yantian Zha, Subbarao Kambhampati
Abstract Plan recognition aims to discover target plans (i.e., sequences of actions) behind observed actions, with history plan libraries or domain models in hand. Previous approaches either discover plans by maximally “matching” observed actions to plan libraries, assuming target plans are from plan libraries, or infer plans by executing domain models to best explain the observed actions, assuming that complete domain models are available. In real world applications, however, target plans are often not from plan libraries, and complete domain models are often not available, since building complete sets of plans and complete domain models is often difficult or expensive. In this paper we view plan libraries as corpora and learn vector representations of actions using the corpora; we then discover target plans based on the vector representations. Specifically, we propose two approaches, DUP and RNNPlanner, to discover target plans based on vector representations of actions. DUP explores the EM-style framework to capture local contexts of actions and discover target plans by optimizing the probability of target plans, while RNNPlanner aims to leverage long-short term contexts of actions based on the RNN (recurrent neural network) framework to help recognize target plans. In the experiments, we empirically show that our approaches are capable of discovering underlying plans that are not from plan libraries, without requiring domain models to be provided. We demonstrate the effectiveness of our approaches by comparing their performance to traditional plan recognition approaches in three planning domains. We also compare DUP and RNNPlanner to see their advantages and disadvantages.
Tasks
Published 2018-03-04
URL http://arxiv.org/abs/1803.02208v1
PDF http://arxiv.org/pdf/1803.02208v1.pdf
PWC https://paperswithcode.com/paper/discovering-underlying-plans-based-on-shallow
Repo
Framework

Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems

Title Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems
Authors Cumhur Erkan Tuncali, James Kapinski, Hisahiro Ito, Jyotirmoy V. Deshmukh
Abstract We present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers. A linear programming solver is utilized to find a candidate generator function from a set of simulation traces obtained by randomly selecting initial states for the CPS model. A level set of the generator function is then selected to act as a barrier certificate for the system, meaning it demonstrates that no unsafe system states are reachable from a given set of initial states. The barrier certificate properties are verified with an SMT solver. This approach is demonstrated on a case study in which a Dubins car model of an autonomous vehicle is controlled by a neural network to follow a given path.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.03973v1
PDF http://arxiv.org/pdf/1804.03973v1.pdf
PWC https://paperswithcode.com/paper/reasoning-about-safety-of-learning-enabled
Repo
Framework

Sketch-R2CNN: An Attentive Network for Vector Sketch Recognition

Title Sketch-R2CNN: An Attentive Network for Vector Sketch Recognition
Authors Lei Li, Changqing Zou, Youyi Zheng, Qingkun Su, Hongbo Fu, Chiew-Lan Tai
Abstract Freehand sketching is a dynamic process where points are sequentially sampled and grouped as strokes for sketch acquisition on electronic devices. To recognize a sketched object, most existing methods discard this important temporal ordering and grouping information and simply rasterize sketches into binary images for classification. In this paper, we propose a novel single-branch attentive network architecture RNN-Rasterization-CNN (Sketch-R2CNN for short) to fully leverage the dynamics in sketches for recognition. Sketch-R2CNN takes as input only a vector sketch with grouped sequences of points, and uses an RNN for stroke attention estimation in the vector space and a CNN for 2D feature extraction in the pixel space, respectively. To bridge the gap between these two spaces in neural networks, we propose a neural line rasterization module to convert the vector sketch along with the attention estimated by the RNN into a bitmap image, which is subsequently consumed by the CNN. The neural line rasterization module is designed in a differentiable way to yield a unified pipeline for end-to-end learning. We perform experiments on existing large-scale sketch recognition benchmarks and show that by exploiting the sketch dynamics with the attention mechanism, our method is more robust and achieves better performance than the state-of-the-art methods.
Tasks Sketch Recognition
Published 2018-11-20
URL http://arxiv.org/abs/1811.08170v1
PDF http://arxiv.org/pdf/1811.08170v1.pdf
PWC https://paperswithcode.com/paper/sketch-r2cnn-an-attentive-network-for-vector
Repo
Framework

Network Estimation from Point Process Data

Title Network Estimation from Point Process Data
Authors Benjamin Mark, Garvesh Raskutti, Rebecca Willett
Abstract Consider observing a collection of discrete events within a network that reflect how network nodes influence one another. Such data are common in spike trains recorded from biological neural networks, interactions within a social network, and a variety of other settings. Data of this form may be modeled as self-exciting point processes, in which the likelihood of future events depends on the past events. This paper addresses the problem of estimating self-excitation parameters and inferring the underlying functional network structure from self-exciting point process data. Past work in this area was limited by strong assumptions which are addressed by the novel approach here. Specifically, in this paper we (1) incorporate saturation in a point process model which both ensures stability and models non-linear thresholding effects; (2) impose general low-dimensional structural assumptions that include sparsity, group sparsity and low-rankness and that allow bounds to be developed in the high-dimensional setting; and (3) incorporate long-range memory effects through moving average and higher-order auto-regressive components. Using our general framework, we provide a number of novel theoretical guarantees for high-dimensional self-exciting point processes that reflect the role played by the underlying network structure and long-term memory. We also provide simulations and real data examples to support our methodology and main results.
Tasks Point Processes
Published 2018-02-13
URL http://arxiv.org/abs/1802.04838v1
PDF http://arxiv.org/pdf/1802.04838v1.pdf
PWC https://paperswithcode.com/paper/network-estimation-from-point-process-data
Repo
Framework

Deep Frame Prediction for Video Coding

Title Deep Frame Prediction for Video Coding
Authors Hyomin Choi, Ivan V. Bajic
Abstract We propose a novel frame prediction method using a deep neural network (DNN), with the goal of improving video coding efficiency. The proposed DNN makes use of decoded frames, at both encoder and decoder, to predict textures of the current coding block. Unlike conventional inter-prediction, the proposed method does not require any motion information to be transferred between the encoder and the decoder. Still, both uni-directional and bi-directional prediction are possible using the proposed DNN, which is enabled by the use of the temporal index channel, in addition to color channels. In this study, we developed a jointly trained DNN for both uni- and bi-directional prediction, as well as separate networks for uni- and bi-directional prediction, and compared the efficacy of both approaches. The proposed DNNs were compared with the conventional motion-compensated prediction in the latest video coding standard, HEVC, in terms of BD-Bitrate. The experiments show that the proposed joint DNN (for both uni- and bi-directional prediction) reduces the luminance bitrate by about 4.4%, 2.4%, and 2.3% in the Low delay P, Low delay, and Random access configurations, respectively. In addition, using the separately trained DNNs brings further bit savings of about 0.3%-0.5%.
Tasks
Published 2018-12-31
URL https://arxiv.org/abs/1901.00062v3
PDF https://arxiv.org/pdf/1901.00062v3.pdf
PWC https://paperswithcode.com/paper/deep-frame-prediction-for-video-coding
Repo
Framework