October 19, 2019

Paper Group ANR 232

Future-State Predicting LSTM for Early Surgery Type Recognition. The Effect of Context on Metaphor Paraphrase Aptness Judgments. Magnetically Guided Capsule Endoscopy. Patch-Based Image Inpainting with Generative Adversarial Networks. Robust Gradient Descent via Moment Encoding with LDPC Codes. Classification of auditory stimuli from EEG signals wi …

Future-State Predicting LSTM for Early Surgery Type Recognition

Title Future-State Predicting LSTM for Early Surgery Type Recognition
Authors Siddharth Kannan, Gaurav Yengera, Didier Mutter, Jacques Marescaux, Nicolas Padoy
Abstract This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can be beneficial to the development of ‘smart’ OR systems that can provide automatic context-aware assistance, and also enable quick database indexing. The task is however ridden with challenges specific to videos belonging to the domain of laparoscopy, such as high visual similarity across surgeries and large variations in video durations. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first approach is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second approach, referred to as ‘Future-State Predicting LSTM’, trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing 9 types of surgeries (Laparo425), and achieve on average an accuracy of 75% having observed only the first 10 minutes of a surgery. These results are quite promising from a practical standpoint and also encouraging for other types of image-guided surgeries.
Tasks
Published 2018-11-28
URL https://arxiv.org/abs/1811.11727v2
PDF https://arxiv.org/pdf/1811.11727v2.pdf
PWC https://paperswithcode.com/paper/future-state-predicting-lstm-for-early
Repo
Framework

The Effect of Context on Metaphor Paraphrase Aptness Judgments

Title The Effect of Context on Metaphor Paraphrase Aptness Judgments
Authors Yuri Bizzoni, Shalom Lappin
Abstract We conduct two experiments to study the effect of context on metaphor paraphrase aptness judgments. The first is an AMT crowd source task in which speakers rank metaphor paraphrase candidate sentence pairs in short document contexts for paraphrase aptness. In the second we train a composite DNN to predict these human judgments, first in binary classifier mode, and then as gradient ratings. We found that for both mean human judgments and our DNN’s predictions, adding document context compresses the aptness scores towards the center of the scale, raising low out of context ratings and decreasing high out of context scores. We offer a provisional explanation for this compression effect.
Tasks
Published 2018-09-04
URL http://arxiv.org/abs/1809.01060v1
PDF http://arxiv.org/pdf/1809.01060v1.pdf
PWC https://paperswithcode.com/paper/the-effect-of-context-on-metaphor-paraphrase
Repo
Framework

Magnetically Guided Capsule Endoscopy

Title Magnetically Guided Capsule Endoscopy
Authors Thomas Kruezer
Abstract The following research undertakes a historical review of this technology, highlighting its advancement in medical diagnostics as well as the therapeutic functionality of wireless capsule endoscopy. Without restriction to the gastrointestinal tract alone, the review additionally investigates developments in the technology of micro-robots that are guided by magnetic power and are capable of navigating through multiple forms of air- and fluid-filled lumina as well as cavities within the body. All these capabilities are of use in minimally invasive medicine.
Tasks
Published 2018-09-11
URL http://arxiv.org/abs/1809.04130v1
PDF http://arxiv.org/pdf/1809.04130v1.pdf
PWC https://paperswithcode.com/paper/magnetically-guided-capsule-endoscopy
Repo
Framework

Patch-Based Image Inpainting with Generative Adversarial Networks

Title Patch-Based Image Inpainting with Generative Adversarial Networks
Authors Ugur Demir, Gozde Unal
Abstract The area of image inpainting over relatively large missing regions has recently advanced substantially through the adaptation of dedicated deep neural networks. However, current network solutions still introduce undesired artifacts and noise to the repaired regions. We present an image inpainting method that is based on the celebrated generative adversarial network (GAN) framework. The proposed PGGAN method includes a discriminator network that combines a global GAN (G-GAN) architecture with a patchGAN approach. PGGAN first shares network layers between G-GAN and patchGAN, then splits paths to produce two adversarial losses that feed the generator network in order to capture both local continuity of image texture and pervasive global features in images. The proposed framework is evaluated extensively, and the results, including comparison to recent state-of-the-art methods, demonstrate that it achieves considerable improvements on both visual and quantitative evaluations.
Tasks Image Inpainting
Published 2018-03-20
URL http://arxiv.org/abs/1803.07422v1
PDF http://arxiv.org/pdf/1803.07422v1.pdf
PWC https://paperswithcode.com/paper/patch-based-image-inpainting-with-generative
Repo
Framework

Robust Gradient Descent via Moment Encoding with LDPC Codes

Title Robust Gradient Descent via Moment Encoding with LDPC Codes
Authors Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar
Abstract This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of the stragglers, it has been previously proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We, instead, propose to encode the second-moment of the data with a low-density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system. We show that for a random model for stragglers, the proposed moment encoding based gradient descent method can be viewed as the stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment encoding based method is shown to outperform the existing schemes in a real distributed computing setup.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08327v2
PDF http://arxiv.org/pdf/1805.08327v2.pdf
PWC https://paperswithcode.com/paper/robust-gradient-descent-via-moment-encoding
Repo
Framework
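
To make the moment-encoding idea in the abstract above more concrete, here is a minimal sketch for least squares, where the gradient is M w - b with M = XᵀX (the second moment) and b = Xᵀy. It is not the authors' scheme: a single parity row stands in for the LDPC code, exactly one simulated worker straggles per step, and all function names are hypothetical.

```python
import random

def second_moment(X):
    """Return M = X^T X for a matrix given as a list of rows."""
    d = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]

def moment_vector(X, y):
    """Return b = X^T y."""
    d = len(X[0])
    return [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]

def encode_with_parity(M):
    """Append one parity row (the sum of all rows). Assumption: a single
    parity-check code replaces the LDPC code named in the abstract."""
    parity = [sum(col) for col in zip(*M)]
    return M + [parity]

def worker_products(coded_rows, w, straggler):
    """Each simulated worker returns one inner product; the straggler returns None."""
    return [None if k == straggler else sum(a * b for a, b in zip(row, w))
            for k, row in enumerate(coded_rows)]

def decode(results):
    """Recover M w when at most one result is erased."""
    *data, parity = results
    erased = [k for k, v in enumerate(data) if v is None]
    if erased:
        k = erased[0]
        # The parity equals the sum of all data entries, so the gap is the remainder.
        data[k] = parity - sum(v for v in data if v is not None)
    return data

def coded_gradient_descent(X, y, lr=0.3, steps=200, seed=0):
    """Gradient descent on 0.5 * ||Xw - y||^2 using the encoded second moment."""
    rng = random.Random(seed)
    coded = encode_with_parity(second_moment(X))
    b = moment_vector(X, y)
    w = [0.0] * len(b)
    for _ in range(steps):
        straggler = rng.randrange(len(coded))  # one random straggler per step
        Mw = decode(worker_products(coded, w, straggler))
        w = [wi - lr * (mi - bi) for wi, mi, bi in zip(w, Mw, b)]
    return w
```

With `X = [[1, 0], [0, 1], [1, 1]]` and `y = [1, 2, 3]` the iterate converges to approximately `[1, 2]` even though one result is dropped at every step. An LDPC code would tolerate more erasures per step and would be decoded iteratively rather than with a single subtraction.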

Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir

Title Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir
Authors Marc-Antoine Moinnereau, Thomas Brienne, Simon Brodeur, Jean Rouat, Kevin Whittingstall, Eric Plourde
Abstract The use of electroencephalogram (EEG) as the main input signal in brain-machine interfaces has been widely proposed due to the non-invasive nature of the EEG. Here we are specifically interested in interfaces that extract information from the auditory system and more specifically in the task of classifying heard speech from EEGs. To do so, we propose to limit the preprocessing of the EEGs and use machine learning approaches to automatically extract their meaningful characteristics. More specifically, we use a regulated recurrent neural network (RNN) reservoir, which has been shown to outperform classic machine learning approaches when applied to several different bio-signals, and we compare it with a deep neural network approach. Moreover, we also investigate the classification performance as a function of the number of EEG electrodes. A set of 8 subjects were presented randomly with 3 different auditory stimuli (English vowels a, i and u). We obtained an excellent classification rate of 83.2% with the RNN when considering all 64 electrodes. A rate of 81.7% was achieved with only 10 electrodes.
Tasks EEG
Published 2018-04-27
URL http://arxiv.org/abs/1804.10322v1
PDF http://arxiv.org/pdf/1804.10322v1.pdf
PWC https://paperswithcode.com/paper/classification-of-auditory-stimuli-from-eeg
Repo
Framework

Deductron – A Recurrent Neural Network

Title Deductron – A Recurrent Neural Network
Authors Marek Rychlik
Abstract The current paper is a study in Recurrent Neural Networks (RNN), motivated by the lack of examples simple enough so that they can be thoroughly understood theoretically, but complex enough to be realistic. We constructed an example of structured data, motivated by problems from image-to-text conversion (OCR), which requires long-term memory to decode. Our data is a simple writing system, encoding characters ‘X’ and ‘O’ as their upper halves, which is possible due to symmetry of the two characters. The characters can be connected, as in some languages using cursive, such as Arabic (abjad). The string ‘XOOXXO’ may be encoded as ‘${\vee}{\wedge}\kern-1.5pt{\wedge}{\vee}\kern-1.5pt{\vee}{\wedge}$’. It follows that we may need to know arbitrarily long past to decode a current character, thus requiring long-term memory. Subsequently we constructed an RNN capable of decoding sequences encoded in this manner. Rather than by training, we constructed our RNN “by inspection”, i.e. we guessed its weights. This involved a sequence of steps. We wrote a conventional program which decodes the sequences as the example above. Subsequently, we interpreted the program as a neural network (the only example of this kind known to us). Finally, we generalized this neural network to discover a new RNN architecture whose instance is our handcrafted RNN. It turns out to be a 3 layer network, where the middle layer is capable of performing simple logical inferences; thus the name “deductron”. It is demonstrated that it is possible to train our network by simulated annealing. Also, known variants of stochastic gradient descent (SGD) methods are shown to work.
Tasks Optical Character Recognition
Published 2018-06-23
URL https://arxiv.org/abs/1806.09038v3
PDF https://arxiv.org/pdf/1806.09038v3.pdf
PWC https://paperswithcode.com/paper/deductron-a-recurrent-neural-network
Repo
Framework

Faster gaze prediction with dense networks and Fisher pruning

Title Faster gaze prediction with dense networks and Fisher pruning
Authors Lucas Theis, Iryna Korshunova, Alykhan Tejani, Ferenc Huszár
Abstract Predicting human fixations from images has recently seen large improvements by leveraging deep representations which were pretrained for object recognition. However, as we show in this paper, these networks are highly overparameterized for the task of fixation prediction. We first present a simple yet principled greedy pruning method which we call Fisher pruning. Through a combination of knowledge distillation and Fisher pruning, we obtain much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup for the same AUC performance as a state of the art network on the CAT2000 dataset. Speeding up single-image gaze prediction is important for many real-world applications, but it is also a crucial step in the development of video saliency models, where the amount of data to be processed is substantially larger.
Tasks Gaze Estimation, Gaze Prediction, Object Recognition, Saliency Prediction
Published 2018-01-17
URL http://arxiv.org/abs/1801.05787v2
PDF http://arxiv.org/pdf/1801.05787v2.pdf
PWC https://paperswithcode.com/paper/faster-gaze-prediction-with-dense-networks
Repo
Framework

Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation

Title Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Authors Florian Piewak, Peter Pinggera, Manuel Schäfer, David Peter, Beate Schwarz, Nick Schneider, David Pfeiffer, Markus Enzweiler, Marius Zöllner
Abstract Mobile robots and autonomous vehicles rely on multi-modal sensor setups to perceive and understand their surroundings. Aside from cameras, LiDAR sensors represent a central component of state-of-the-art perception systems. In addition to accurate spatial perception, a comprehensive semantic understanding of the environment is essential for efficient and safe operation. In this paper we present a novel deep neural network architecture called LiLaNet for point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network utilizes virtual image projections of the 3D point clouds for efficient inference. Further, we propose an automated process for large-scale cross-modal training data generation called Autolabeling, in order to boost semantic labeling performance while keeping the manual annotation effort low. The effectiveness of the proposed network architecture as well as the automated data generation process is demonstrated on a manually annotated ground truth dataset. LiLaNet is shown to significantly outperform current state-of-the-art CNN architectures for LiDAR data. Applying our automatically generated large-scale training data yields a boost of up to 14 percentage points compared to networks trained on manually annotated data only.
Tasks Autonomous Vehicles
Published 2018-04-26
URL http://arxiv.org/abs/1804.09915v1
PDF http://arxiv.org/pdf/1804.09915v1.pdf
PWC https://paperswithcode.com/paper/boosting-lidar-based-semantic-labeling-by
Repo
Framework

Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning

Title Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning
Authors Aidin Ferdowsi, Samad Ali, Walid Saad, Narayan B. Mandayam
Abstract Autonomous connected vehicles (ACVs) rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication to operate effectively. This reliance on cyber components exposes ACVs to cyber and physical attacks in which an adversary can manipulate sensor readings and physically take control of an ACV. In this paper, a comprehensive framework is proposed to thwart cyber and physical attacks on ACV networks. First, an optimal safe controller for ACVs is derived to maximize the street traffic flow while minimizing the risk of accidents by optimizing ACV speed and inter-ACV spacing. It is proven that the proposed controller is robust to physical attacks which aim at making ACV systems unstable. To improve the cyber-physical security of ACV systems, next, data injection attack (DIA) detection approaches are proposed to address cyber attacks on sensors and their physical impact on the ACV system. To comprehensively design the DIA detection approaches, ACV sensors are characterized in two subsets based on the availability of a-priori information about their data. For sensors having a-priori information, a DIA detection approach is proposed and an optimal threshold level is derived for the difference between the actual and estimated values of sensor data, which enables the ACV to stay robust against cyber attacks. For sensors having no prior information, a novel multi-armed bandit (MAB) algorithm is proposed to enable the ACV to securely control its motion. Simulation results show that the proposed optimal safe controller outperforms current state of the art controllers by maximizing the robustness of ACVs to physical attacks. The results also show that the proposed DIA detection approaches, compared to Kalman filtering, can improve the security of ACV sensors against cyber attacks and ultimately improve the physical robustness of an ACV system.
Tasks
Published 2018-12-13
URL http://arxiv.org/abs/1812.05298v1
PDF http://arxiv.org/pdf/1812.05298v1.pdf
PWC https://paperswithcode.com/paper/cyber-physical-security-and-safety-of
Repo
Framework

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

Title Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
Authors Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang
Abstract The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate the language structure patterns, and thus tend to fall into a stereotype of replicating frequent phrases or sentences and neglect unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings unique advantages: (1) the self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach could utilize a large amount of unlabeled images to boost captioning performance with no additional laborious annotations. We demonstrate the effectiveness of the proposed retrieval-guided method on COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions.
Tasks Image Captioning
Published 2018-03-22
URL http://arxiv.org/abs/1803.08314v3
PDF http://arxiv.org/pdf/1803.08314v3.pdf
PWC https://paperswithcode.com/paper/show-tell-and-discriminate-image-captioning
Repo
Framework

On Evaluating and Comparing Open Domain Dialog Systems

Title On Evaluating and Comparing Open Domain Dialog Systems
Authors Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju
Abstract Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. Alexa Prize provided the academic community with the unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human judgement. The proposed metrics provide granular analysis of the conversational agents, which is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgment. We provide a mechanism to unify the metrics for selecting the top performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, to date it is the largest setting for evaluating agents with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs.
Tasks Goal-Oriented Dialogue Systems
Published 2018-01-11
URL http://arxiv.org/abs/1801.03625v2
PDF http://arxiv.org/pdf/1801.03625v2.pdf
PWC https://paperswithcode.com/paper/on-evaluating-and-comparing-open-domain
Repo
Framework

Fusion of multispectral satellite imagery using a cluster of graphics processing unit

Title Fusion of multispectral satellite imagery using a cluster of graphics processing unit
Authors Anas M. Al-Oraiqat, E. A. Bashkov, V. Babkov, C. Titarenko
Abstract The paper presents a parallel implementation of existing image fusion methods on a graphical cluster. Parallel implementations of methods based on the discrete wavelet transform (Haar and Daubechies wavelets) are developed. Experiments were performed on a cluster using GPU and CPU, and performance gains were estimated for the use of the developed parallel implementations to process images from the Landsat 7 satellite. The implementation on a graphic cluster provides a performance improvement of 2 to 18 times. The quality of the considered methods was evaluated by ERGAS and QNR metrics. The results show performance gains and retained quality with the cluster of GPUs compared to the results obtained by the authors and other researchers for a CPU and a single GPU.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.00737v1
PDF http://arxiv.org/pdf/1803.00737v1.pdf
PWC https://paperswithcode.com/paper/fusion-of-multispectral-satellite-imagery
Repo
Framework
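
As an illustration of the wavelet-based fusion mentioned in the abstract above, the following sketch applies a one-level Haar transform to one image row and fuses a multispectral row with a panchromatic row. The fusion rule (keep the multispectral approximation, take the details from the panchromatic row) is a common substitution scheme assumed here for illustration; the abstract does not state which rule the authors use, and the function names are hypothetical.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar DWT of an even-length sequence.
    Returns (approximation, detail) coefficient lists."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def fuse_rows(ms_row, pan_row):
    """Assumed rule: low-frequency content from the multispectral row,
    high-frequency detail from the panchromatic row."""
    approx_ms, _ = haar_forward(ms_row)
    _, detail_pan = haar_forward(pan_row)
    return haar_inverse(approx_ms, detail_pan)

fused = fuse_rows([4.0, 4.0, 2.0, 2.0], [1.0, 3.0, 5.0, 5.0])
```

Each row (and each column, in the 2D case) is transformed independently of the others, which is what makes this kind of method a natural candidate for the GPU-cluster parallelization the paper describes.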

Tap-based User Authentication for Smartwatches

Title Tap-based User Authentication for Smartwatches
Authors Toan Nguyen, Nasir Memon
Abstract This paper presents TapMeIn, an eyes-free, two-factor authentication method for smartwatches. It allows users to tap a memorable melody (tap-password) of their choice anywhere on the touchscreen to unlock their watch. A user is verified based on the tap-password as well as her physiological and behavioral characteristics when tapping. Results from preliminary experiments with 41 participants show that TapMeIn could achieve an accuracy of 98.7% with a False Positive Rate of only 0.98%. In addition, TapMeIn retains its performance in different conditions such as sitting and walking. In terms of speed, TapMeIn has an average authentication time of 2 seconds. A user study with the System Usability Scale (SUS) tool suggests that TapMeIn has a high usability score.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00482v2
PDF http://arxiv.org/pdf/1807.00482v2.pdf
PWC https://paperswithcode.com/paper/tap-based-user-authentication-for
Repo
Framework
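
For readers unfamiliar with the System Usability Scale mentioned in the abstract above, this short sketch shows the standard SUS scoring rule: ten items answered on a 1 to 5 scale, where odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name is hypothetical and the responses are made up; no data from the paper is reproduced.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses,
    given in questionnaire order (item 1 first)."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

neutral = sus_score([3] * 10)        # every item answered with the midpoint
best = sus_score([5, 1] * 5)         # most favourable answer on every item
```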

Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks

Title Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks
Authors Keiller Nogueira, Mauro Dalla Mura, Jocelyn Chanussot, William R. Schwartz, Jefersson A. dos Santos
Abstract Semantic segmentation requires methods capable of learning high-level features while dealing with large volumes of data. Towards this goal, Convolutional Networks can learn specific and adaptable features based on the data. However, these networks are not capable of processing a whole remote sensing image, given its huge size. To overcome this limitation, the image is processed using fixed size patches. The definition of the input patch size is usually performed empirically (evaluating several sizes) or imposed (by network constraint). Both strategies suffer from drawbacks and may not lead to the best patch size. To alleviate this problem, several works exploited multi-context information by combining networks or layers. This process increases the number of parameters, resulting in a model that is more difficult to train. In this work, we propose a novel technique to perform semantic segmentation of remote sensing images that exploits a multi-context paradigm without increasing the number of parameters while defining, at training time, the best patch size. The main idea is to train a dilated network with distinct patch sizes, allowing it to capture multi-context characteristics from heterogeneous contexts. While processing these varying patches, the network provides a score for each patch size, helping in the definition of the best size for the current scenario. A systematic evaluation of the proposed algorithm is conducted using four high-resolution remote sensing datasets with very distinct properties. Our results show that the proposed algorithm provides improvements in pixelwise classification accuracy when compared to state-of-the-art methods.
Tasks Semantic Segmentation
Published 2018-04-11
URL http://arxiv.org/abs/1804.04020v3
PDF http://arxiv.org/pdf/1804.04020v3.pdf
PWC https://paperswithcode.com/paper/dynamic-multi-scale-segmentation-of-remote
Repo
Framework
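
The abstract above relies on dilated convolutions to take in more context without adding parameters. The sketch below shows that property in one dimension: the same three weights cover an input span of 3, 5, or 7 samples depending on the dilation. It is a generic illustration with a hypothetical function name, not the network from the paper.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D correlation with kernel taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation + 1  # input samples seen by one output
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]

signal = list(range(10))
kernel = [1, 1, 1]  # three parameters in every case below

narrow = dilated_conv1d(signal, kernel, dilation=1)  # each output sees 3 samples
medium = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
wide = dilated_conv1d(signal, kernel, dilation=3)    # each output sees 7 samples
```

Because no layer has a fixed input size, a network built from such layers can be fed patches of different sizes during training, which is the setting the abstract describes.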