Paper Group ANR 232
Future-State Predicting LSTM for Early Surgery Type Recognition. The Effect of Context on Metaphor Paraphrase Aptness Judgments. Magnetically Guided Capsule Endoscopy. Patch-Based Image Inpainting with Generative Adversarial Networks. Robust Gradient Descent via Moment Encoding with LDPC Codes. Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir. Deductron – A Recurrent Neural Network. Faster gaze prediction with dense networks and Fisher pruning. Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation. Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning. Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data. On Evaluating and Comparing Open Domain Dialog Systems. Fusion of multispectral satellite imagery using a cluster of graphics processing unit. Tap-based User Authentication for Smartwatches. Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks.
Future-State Predicting LSTM for Early Surgery Type Recognition
Title | Future-State Predicting LSTM for Early Surgery Type Recognition |
Authors | Siddharth Kannan, Gaurav Yengera, Didier Mutter, Jacques Marescaux, Nicolas Padoy |
Abstract | This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can be beneficial to the development of ‘smart’ OR systems that can provide automatic context-aware assistance, and also enable quick database indexing. The task is, however, riddled with challenges specific to videos belonging to the domain of laparoscopy, such as high visual similarity across surgeries and large variations in video durations. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first approach is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second approach, referred to as ‘Future-State Predicting LSTM’, trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing 9 types of surgeries (Laparo425), and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are quite promising from a practical standpoint and also encouraging for other types of image-guided surgeries. |
Tasks | |
Published | 2018-11-28 |
URL | https://arxiv.org/abs/1811.11727v2 |
PDF | https://arxiv.org/pdf/1811.11727v2.pdf |
PWC | https://paperswithcode.com/paper/future-state-predicting-lstm-for-early |
Repo | |
Framework | |
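As a concrete illustration of the two-headed idea in the abstract, here is a minimal PyTorch sketch: an LSTM runs over per-frame CNN features and feeds both a surgery-type classifier and an auxiliary head trained to regress the features of a frame several steps ahead. The feature dimension, hidden size, horizon and loss weighting are illustrative assumptions, not the authors' implementation (only the 9 classes match Laparo425).

```python
import torch
import torch.nn as nn

class FutureStatePredictingLSTM(nn.Module):
    """Sketch: classify surgery type from per-frame CNN features while an
    auxiliary head predicts the features of a frame `horizon` steps ahead."""
    def __init__(self, feat_dim=512, hidden=256, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls_head = nn.Linear(hidden, n_classes)    # surgery type
        self.future_head = nn.Linear(hidden, feat_dim)  # future-frame features

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)                # (B, T, hidden)
        return self.cls_head(h), self.future_head(h)

def loss_fn(logits, future_pred, feats, labels, horizon=5, alpha=0.5):
    # classification at every time step, using the per-video label
    B, T, C = logits.shape
    ce = nn.functional.cross_entropy(
        logits.reshape(B * T, C), labels.repeat_interleave(T))
    # auxiliary regression: hidden state at t must predict features at t+horizon
    mse = nn.functional.mse_loss(
        future_pred[:, :-horizon], feats[:, horizon:])
    return ce + alpha * mse
```

At test time only the classification head is read, at whatever early-observation cutoff (e.g. 10 minutes) the application allows.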
The Effect of Context on Metaphor Paraphrase Aptness Judgments
Title | The Effect of Context on Metaphor Paraphrase Aptness Judgments |
Authors | Yuri Bizzoni, Shalom Lappin |
Abstract | We conduct two experiments to study the effect of context on metaphor paraphrase aptness judgments. The first is an AMT crowdsourcing task in which speakers rank metaphor paraphrase candidate sentence pairs in short document contexts for paraphrase aptness. In the second we train a composite DNN to predict these human judgments, first in binary classifier mode, and then as gradient ratings. We found that for both mean human judgments and our DNN’s predictions, adding document context compresses the aptness scores towards the center of the scale, raising low out-of-context ratings and decreasing high out-of-context scores. We offer a provisional explanation for this compression effect. |
Tasks | |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01060v1 |
PDF | http://arxiv.org/pdf/1809.01060v1.pdf |
PWC | https://paperswithcode.com/paper/the-effect-of-context-on-metaphor-paraphrase |
Repo | |
Framework | |
Magnetically Guided Capsule Endoscopy
Title | Magnetically Guided Capsule Endoscopy |
Authors | Thomas Kruezer |
Abstract | This review traces the history of wireless capsule endoscopy, highlighting its advances in medical diagnostics as well as its therapeutic functionality. Not restricted to the gastrointestinal tract alone, the review additionally examines developments in magnetically guided micro-robots capable of navigating multiple forms of air- and fluid-filled lumina and cavities within the body. All of these capabilities serve the practice of minimally invasive medicine. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04130v1 |
PDF | http://arxiv.org/pdf/1809.04130v1.pdf |
PWC | https://paperswithcode.com/paper/magnetically-guided-capsule-endoscopy |
Repo | |
Framework | |
Patch-Based Image Inpainting with Generative Adversarial Networks
Title | Patch-Based Image Inpainting with Generative Adversarial Networks |
Authors | Ugur Demir, Gozde Unal |
Abstract | The area of image inpainting over relatively large missing regions has recently advanced substantially through the adaptation of dedicated deep neural networks. However, current network solutions still introduce undesired artifacts and noise to the repaired regions. We present an image inpainting method that is based on the celebrated generative adversarial network (GAN) framework. The proposed PGGAN method includes a discriminator network that combines a global GAN (G-GAN) architecture with a patchGAN approach. PGGAN first shares network layers between G-GAN and patchGAN, then splits paths to produce two adversarial losses that feed the generator network in order to capture both the local continuity of image texture and pervasive global features in images. The proposed framework is evaluated extensively, and the results, including comparisons to the recent state of the art, demonstrate that it achieves considerable improvements in both visual and quantitative evaluations. |
Tasks | Image Inpainting |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07422v1 |
PDF | http://arxiv.org/pdf/1803.07422v1.pdf |
PWC | https://paperswithcode.com/paper/patch-based-image-inpainting-with-generative |
Repo | |
Framework | |
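The shared-trunk/two-head discriminator described in the abstract can be sketched in a few lines of PyTorch. Layer counts and channel widths below are placeholders, not the paper's exact architecture; the point is the shared layers followed by a split into a per-image (global) score and a per-patch score map.

```python
import torch
import torch.nn as nn

class PGGANStyleDiscriminator(nn.Module):
    """Sketch: shared early layers, then a global (G-GAN) head giving one
    real/fake score per image and a PatchGAN head giving one per patch."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.shared = nn.Sequential(                      # shared feature trunk
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # PatchGAN head: one score per local receptive field
        self.patch_head = nn.Conv2d(128, 1, 4, stride=1, padding=1)
        # global head: one score for the whole image
        self.global_head = nn.Sequential(
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1),
        )

    def forward(self, x):
        f = self.shared(x)
        return self.global_head(f), self.patch_head(f)
```

In training, each output would be fed through its own adversarial loss (e.g. BCE with logits), and the two losses are combined to drive the generator toward both global coherence and local texture continuity.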
Robust Gradient Descent via Moment Encoding with LDPC Codes
Title | Robust Gradient Descent via Moment Encoding with LDPC Codes |
Authors | Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar |
Abstract | This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of the stragglers, it has been previously proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We, instead, propose to encode the second moment of the data with a low-density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead, and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system. We show that, for a random model of stragglers, the proposed moment-encoding-based gradient descent method can be viewed as a stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment-encoding-based method is shown to outperform the existing schemes in a real distributed computing setup. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08327v2 |
PDF | http://arxiv.org/pdf/1805.08327v2.pdf |
PWC | https://paperswithcode.com/paper/robust-gradient-descent-via-moment-encoding |
Repo | |
Framework | |
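A toy numpy sketch of the moment-encoding idea, under loud assumptions: for least squares, the gradient only needs the second moment M = XᵀX and b = Xᵀy, so encoded rows of M can be spread across workers and the master can recover M·w from whichever results arrive. We stand in for the LDPC machinery with a random sparse encoding matrix and least-squares decoding, purely to show the data flow; the paper uses a true LDPC code with cheap iterative decoding, and the inexact recovery under stragglers is exactly what makes the iterates behave like SGD.

```python
import numpy as np

# Least-squares gradient needs only M = X^T X and b = X^T y:  grad(w) = M w - b
rng = np.random.default_rng(0)
n, d, workers = 200, 10, 12
X = rng.normal(size=(n, d)); y = rng.normal(size=n)
M, b = X.T @ X, X.T @ y

S = (rng.random((workers, d)) < 0.3).astype(float)  # stand-in encoding matrix
encoded = S @ M                                     # row i is stored on worker i

w = np.zeros(d)
for _ in range(100):
    alive = rng.random(workers) > 0.25              # random stragglers drop out
    partial = encoded[alive] @ w                    # worker results that arrived
    # decode S_alive @ (M w) = partial; inexact when too few workers survive,
    # which makes the update a *stochastic* gradient step
    Mw, *_ = np.linalg.lstsq(S[alive], partial, rcond=None)
    w -= 0.001 * (Mw - b)                           # gradient step at the master
```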
Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir
Title | Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir |
Authors | Marc-Antoine Moinnereau, Thomas Brienne, Simon Brodeur, Jean Rouat, Kevin Whittingstall, Eric Plourde |
Abstract | The use of the electroencephalogram (EEG) as the main input signal in brain-machine interfaces has been widely proposed due to the non-invasive nature of the EEG. Here we are specifically interested in interfaces that extract information from the auditory system, and more specifically in the task of classifying heard speech from EEGs. To do so, we propose to limit the preprocessing of the EEGs and use machine learning approaches to automatically extract their meaningful characteristics. More specifically, we use a regulated recurrent neural network (RNN) reservoir, which has been shown to outperform classic machine learning approaches when applied to several different bio-signals, and we compare it with a deep neural network approach. Moreover, we also investigate the classification performance as a function of the number of EEG electrodes. Eight subjects were randomly presented with 3 different auditory stimuli (the English vowels a, i and u). We obtained an excellent classification rate of 83.2% with the RNN when considering all 64 electrodes. A rate of 81.7% was achieved with only 10 electrodes. |
Tasks | EEG |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10322v1 |
PDF | http://arxiv.org/pdf/1804.10322v1.pdf |
PWC | https://paperswithcode.com/paper/classification-of-auditory-stimuli-from-eeg |
Repo | |
Framework | |
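A reservoir of the kind the abstract leans on can be sketched as a standard leaky echo-state network: fixed random recurrent weights scaled to a chosen spectral radius, with only a linear readout trained. This generic numpy sketch omits the paper's specific regulation mechanism; all sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, n_res=300, rho=0.9, leak=0.3):
    """Leaky echo-state reservoir sketch. `u`: (T, n_in) EEG samples.
    The recurrent weights are random and fixed; nothing here is trained."""
    n_in = u.shape[1]
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    x = np.zeros(n_res); states = []
    for t in range(u.shape[0]):
        pre = np.tanh(W_in @ u[t] + W @ x)
        x = (1 - leak) * x + leak * pre              # leaky integration
        states.append(x.copy())
    return np.array(states)                          # (T, n_res)

# Readout: e.g. ridge regression from the mean (or final) reservoir state
# to the 3 vowel classes -- the only trained component.
```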
Deductron – A Recurrent Neural Network
Title | Deductron – A Recurrent Neural Network |
Authors | Marek Rychlik |
Abstract | The current paper is a study in Recurrent Neural Networks (RNN), motivated by the lack of examples simple enough that they can be thoroughly understood theoretically, yet complex enough to be realistic. We constructed an example of structured data, motivated by problems from image-to-text conversion (OCR), which requires long-term memory to decode. Our data is a simple writing system, encoding the characters ‘X’ and ‘O’ as their upper halves, which is possible due to the symmetry of the two characters. The characters can be connected, as in some languages using cursive, such as Arabic (abjad). The string ‘XOOXXO’ may be encoded as ‘∨∧∧∨∨∧’. It follows that we may need to know an arbitrarily long past to decode a current character, thus requiring long-term memory. Subsequently we constructed an RNN capable of decoding sequences encoded in this manner. Rather than by training, we constructed our RNN “by inspection”, i.e. we guessed its weights. This involved a sequence of steps. We wrote a conventional program which decodes sequences such as the example above. Subsequently, we interpreted the program as a neural network (the only example of this kind known to us). Finally, we generalized this neural network to discover a new RNN architecture whose instance is our handcrafted RNN. It turns out to be a 3-layer network, where the middle layer is capable of performing simple logical inferences; thus the name “deductron”. It is demonstrated that it is possible to train our network by simulated annealing. Also, known variants of stochastic gradient descent (SGD) methods are shown to work. |
Tasks | Optical Character Recognition |
Published | 2018-06-23 |
URL | https://arxiv.org/abs/1806.09038v3 |
PDF | https://arxiv.org/pdf/1806.09038v3.pdf |
PWC | https://paperswithcode.com/paper/deductron-a-recurrent-neural-network |
Repo | |
Framework | |
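To illustrate the "RNN by inspection" step, i.e. hand-setting weights so the network executes a small program, here is a numpy sketch of a single threshold unit wired as a set/reset latch, the kind of long-term memory the decoding task requires. This is not the deductron architecture itself, just the weight-guessing idea in miniature.

```python
import numpy as np

def step(x_t, h, W_xh, W_hh, b):
    """One step of a hand-weighted RNN cell with a hard threshold,
    i.e. a small logical circuit rather than a trained network."""
    return ((W_xh @ x_t + W_hh @ h + b) > 0).astype(float)

# A 1-unit latch: input is [set, reset]; the unit turns on at `set`,
# stays on via its recurrent self-loop, and turns off at `reset`.
W_xh = np.array([[2.0, -4.0]])   # `set` excites, `reset` strongly inhibits
W_hh = np.array([[2.0]])         # self-loop keeps the memory alive
b = np.array([-1.0])

h = np.zeros(1)
for x in [(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]:
    h = step(np.array(x, float), h, W_xh, W_hh, b)
    print(int(h[0]))             # prints 1 1 1 0 0
```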
Faster gaze prediction with dense networks and Fisher pruning
Title | Faster gaze prediction with dense networks and Fisher pruning |
Authors | Lucas Theis, Iryna Korshunova, Alykhan Tejani, Ferenc Huszár |
Abstract | Predicting human fixations from images has recently seen large improvements by leveraging deep representations which were pretrained for object recognition. However, as we show in this paper, these networks are highly overparameterized for the task of fixation prediction. We first present a simple yet principled greedy pruning method which we call Fisher pruning. Through a combination of knowledge distillation and Fisher pruning, we obtain much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup for the same AUC performance as a state-of-the-art network on the CAT2000 dataset. Speeding up single-image gaze prediction is important for many real-world applications, but it is also a crucial step in the development of video saliency models, where the amount of data to be processed is substantially larger. |
Tasks | Gaze Estimation, Gaze Prediction, Object Recognition, Saliency Prediction |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05787v2 |
PDF | http://arxiv.org/pdf/1801.05787v2.pdf |
PWC | https://paperswithcode.com/paper/faster-gaze-prediction-with-dense-networks |
Repo | |
Framework | |
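The pruning signal can be sketched directly: in Fisher pruning, a channel's estimated importance comes from the squared product of its activation and the gradient of the loss with respect to that activation, accumulated over data; channels with the lowest scores are pruned first. The PyTorch sketch below computes such per-channel scores for one convolutional layer; the hook bookkeeping and exact normalization are our simplifications.

```python
import torch
import torch.nn as nn

def fisher_scores(model, layer, data_loader, loss_fn):
    """Sketch: per-channel importance from (activation * activation-gradient)^2,
    accumulated over batches for one conv layer with (B, C, H, W) outputs."""
    acts = {}
    def save_act(mod, inp, out):
        out.retain_grad()                # keep the gradient on this activation
        acts['a'] = out
    handle = layer.register_forward_hook(save_act)

    scores = None
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        a, g = acts['a'], acts['a'].grad              # both (B, C, H, W)
        s = (a * g).sum(dim=(2, 3)).pow(2).mean(0).detach()  # per-channel
        scores = s if scores is None else scores + s
    handle.remove()
    return scores    # prune the channels with the smallest scores
```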
Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Title | Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation |
Authors | Florian Piewak, Peter Pinggera, Manuel Schäfer, David Peter, Beate Schwarz, Nick Schneider, David Pfeiffer, Markus Enzweiler, Marius Zöllner |
Abstract | Mobile robots and autonomous vehicles rely on multi-modal sensor setups to perceive and understand their surroundings. Aside from cameras, LiDAR sensors represent a central component of state-of-the-art perception systems. In addition to accurate spatial perception, a comprehensive semantic understanding of the environment is essential for efficient and safe operation. In this paper we present a novel deep neural network architecture called LiLaNet for point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network utilizes virtual image projections of the 3D point clouds for efficient inference. Further, we propose an automated process for large-scale cross-modal training data generation called Autolabeling, in order to boost semantic labeling performance while keeping the manual annotation effort low. The effectiveness of the proposed network architecture as well as the automated data generation process is demonstrated on a manually annotated ground truth dataset. LiLaNet is shown to significantly outperform current state-of-the-art CNN architectures for LiDAR data. Applying our automatically generated large-scale training data yields a boost of up to 14 percentage points compared to networks trained on manually annotated data only. |
Tasks | Autonomous Vehicles |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.09915v1 |
PDF | http://arxiv.org/pdf/1804.09915v1.pdf |
PWC | https://paperswithcode.com/paper/boosting-lidar-based-semantic-labeling-by |
Repo | |
Framework | |
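The "virtual image projection" that lets a 2D CNN consume LiDAR data amounts to binning each point by azimuth and elevation into an image grid of depth/intensity channels. A numpy sketch, with field-of-view and resolution numbers that are illustrative rather than taken from the paper:

```python
import numpy as np

def project_to_virtual_image(points, h=64, w=870, fov_up=3.0, fov_down=-25.0):
    """Sketch: project a LiDAR cloud (N, 4: x, y, z, intensity) onto a
    spherical virtual image so a 2D CNN (LiLaNet-style) can label every
    point; each pixel stores depth and intensity."""
    x, y, z, inten = points.T
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                           # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-6))       # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    v = ((fu - pitch) / (fu - fd) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w, 2), dtype=np.float32)      # channels: depth, intensity
    img[v, u, 0] = r
    img[v, u, 1] = inten
    return img
```

Per-pixel class predictions from the CNN are then mapped back to the 3D points through the same (v, u) indices.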
Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning
Title | Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning |
Authors | Aidin Ferdowsi, Samad Ali, Walid Saad, Narayan B. Mandayam |
Abstract | Autonomous connected vehicles (ACVs) rely on intra-vehicle sensors such as camera and radar as well as on inter-vehicle communication to operate effectively. This reliance on cyber components exposes ACVs to cyber and physical attacks in which an adversary can manipulate sensor readings and physically take control of an ACV. In this paper, a comprehensive framework is proposed to thwart cyber and physical attacks on ACV networks. First, an optimal safe controller for ACVs is derived to maximize street traffic flow while minimizing the risk of accidents by optimizing ACV speed and inter-ACV spacing. It is proven that the proposed controller is robust to physical attacks which aim at making ACV systems unstable. To improve the cyber-physical security of ACV systems, data injection attack (DIA) detection approaches are next proposed to address cyber attacks on sensors and their physical impact on the ACV system. To comprehensively design the DIA detection approaches, ACV sensors are divided into two subsets based on the availability of a-priori information about their data. For sensors with prior information, a DIA detection approach is proposed and an optimal threshold level is derived for the difference between the actual and estimated values of sensor data, which enables the ACV to stay robust against cyber attacks. For sensors without prior information, a novel multi-armed bandit (MAB) algorithm is proposed to enable the ACV to securely control its motion. Simulation results show that the proposed optimal safe controller outperforms current state-of-the-art controllers by maximizing the robustness of ACVs to physical attacks. The results also show that the proposed DIA detection approaches, compared to Kalman filtering, can improve the security of ACV sensors against cyber attacks and ultimately improve the physical robustness of an ACV system. |
Tasks | |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05298v1 |
PDF | http://arxiv.org/pdf/1812.05298v1.pdf |
PWC | https://paperswithcode.com/paper/cyber-physical-security-and-safety-of |
Repo | |
Framework | |
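For the sensors without prior information, the abstract invokes multi-armed bandit learning. The paper proposes its own MAB algorithm; as a stand-in, here is the textbook UCB1 strategy in Python, where pulling an arm would correspond to trusting (or acting on) a particular sensor and the reward reflects how safe the resulting motion was:

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """Textbook UCB1 sketch (not the paper's algorithm): `pull(a)` returns a
    reward in [0, 1], e.g. a trust/safety signal for choosing sensor `a`."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                                  # play each arm once
        else:
            # exploit the best mean, plus an optimism bonus for rarely
            # tried arms (exploration)
            a = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]         # running average
    return means, counts
```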
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
Title | Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data |
Authors | Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang |
Abstract | The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into a stereotype of replicating frequent phrases or sentences, neglecting unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings unique advantages: (1) the self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach can utilize a large amount of unlabeled images to boost captioning performance with no additional laborious annotations. We demonstrate the effectiveness of the proposed retrieval-guided method on the COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions. |
Tasks | Image Captioning |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08314v3 |
PDF | http://arxiv.org/pdf/1803.08314v3.pdf |
PWC | https://paperswithcode.com/paper/show-tell-and-discriminate-image-captioning |
Repo | |
Framework | |
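The self-retrieval module can be pictured as a reward: a discriminative caption should retrieve its own image from a batch. A hedged PyTorch sketch, assuming some joint caption/image embedding space; the margin form and batch-level distractors are our illustrative choices, not necessarily the authors':

```python
import torch

def self_retrieval_reward(cap_emb, img_emb):
    """Sketch: reward a generated caption for retrieving its paired image.
    `cap_emb`, `img_emb`: (B, D) embeddings from any joint encoder."""
    cap = torch.nn.functional.normalize(cap_emb, dim=1)
    img = torch.nn.functional.normalize(img_emb, dim=1)
    sim = cap @ img.t()                    # (B, B) caption-to-image similarity
    own = sim.diag()                       # similarity to the paired image
    # margin of the paired image over the hardest in-batch distractor
    mask = torch.eye(len(sim), dtype=torch.bool)
    distractor = sim.masked_fill(mask, -1e9).max(1).values
    return own - distractor                # feed into REINFORCE-style training
```

A positive reward means the caption already separates its image from the batch; unlabeled images can join the distractor pool, which is how partially labeled data helps.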
On Evaluating and Comparing Open Domain Dialog Systems
Title | On Evaluating and Comparing Open Domain Dialog Systems |
Authors | Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju |
Abstract | Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million-dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. The Alexa Prize provided the academic community with a unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human judgment. The proposed metrics provide granular analysis of the conversational agents, which is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgment. We provide a mechanism to unify the metrics for selecting the top performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, this is to date the largest setting for evaluating agents, with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs. |
Tasks | Goal-Oriented Dialogue Systems |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03625v2 |
PDF | http://arxiv.org/pdf/1801.03625v2.pdf |
PWC | https://paperswithcode.com/paper/on-evaluating-and-comparing-open-domain |
Repo | |
Framework | |
Fusion of multispectral satellite imagery using a cluster of graphics processing unit
Title | Fusion of multispectral satellite imagery using a cluster of graphics processing unit |
Authors | Anas M. Al-Oraiqat, E. A. Bashkov, V. Babkov, C. Titarenko |
Abstract | The paper presents a parallel implementation of existing image fusion methods on a graphics cluster. Parallel implementations of methods based on the discrete wavelet transformation (the Haar and Daubechies discrete wavelet transforms) are developed. Experiments were performed on a cluster using GPUs and CPUs, and performance gains were estimated for the use of the developed parallel implementations to process satellite images from the Landsat 7 satellite. The implementation on a graphics cluster provides a performance improvement of 2 to 18 times. The quality of the considered methods was evaluated by the ERGAS and QNR metrics. The results show performance gains and retention of quality with the cluster of GPUs compared to the results obtained by the authors and other researchers for a CPU and a single GPU. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00737v1 |
PDF | http://arxiv.org/pdf/1803.00737v1.pdf |
PWC | https://paperswithcode.com/paper/fusion-of-multispectral-satellite-imagery |
Repo | |
Framework | |
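The wavelet fusion being parallelized can be shown at its smallest scale: a single-level 2D Haar transform, with the multispectral band contributing the approximation (color) sub-band and the panchromatic image contributing the detail (sharpness) sub-bands. This plain numpy sketch shows only the per-band arithmetic; the paper's contribution is distributing exactly this kind of work across a GPU cluster, and this particular approximation/detail recombination is one common fusion rule assumed here for illustration.

```python
import numpy as np

def haar2d(a):
    """One level of the 2D Haar transform; `a` needs even height and width."""
    lo = (a[0::2] + a[1::2]) / 2; hi = (a[0::2] - a[1::2]) / 2   # row pairs
    ll, lh = (lo[:, 0::2] + lo[:, 1::2]) / 2, (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl, hh = (hi[:, 0::2] + hi[:, 1::2]) / 2, (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    lo = np.empty((h, 2 * w)); hi = np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    a = np.empty((2 * h, 2 * w))
    a[0::2], a[1::2] = lo + hi, lo - hi
    return a

def fuse(ms_band, pan):
    """Keep the multispectral approximation (colors), inject the
    panchromatic detail sub-bands (spatial sharpness)."""
    ll_ms, *_ = haar2d(ms_band)
    _, lh_p, hl_p, hh_p = haar2d(pan)
    return ihaar2d(ll_ms, lh_p, hl_p, hh_p)
```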
Tap-based User Authentication for Smartwatches
Title | Tap-based User Authentication for Smartwatches |
Authors | Toan Nguyen, Nasir Memon |
Abstract | This paper presents TapMeIn, an eyes-free, two-factor authentication method for smartwatches. It allows users to tap a memorable melody (tap-password) of their choice anywhere on the touchscreen to unlock their watch. A user is verified based on the tap-password as well as her physiological and behavioral characteristics when tapping. Results from preliminary experiments with 41 participants show that TapMeIn could achieve an accuracy of 98.7% with a False Positive Rate of only 0.98%. In addition, TapMeIn retains its performance in different conditions such as sitting and walking. In terms of speed, TapMeIn has an average authentication time of 2 seconds. A user study with the System Usability Scale (SUS) tool suggests that TapMeIn has a high usability score. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00482v2 |
PDF | http://arxiv.org/pdf/1807.00482v2.pdf |
PWC | https://paperswithcode.com/paper/tap-based-user-authentication-for |
Repo | |
Framework | |
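The rhythm-matching half of a tap-password check can be sketched simply: normalize the inter-tap intervals (making the rhythm tempo-invariant) and compare them to an enrolled template. TapMeIn additionally verifies physiological and behavioral tap characteristics; the threshold and features below are illustrative assumptions.

```python
import numpy as np

def verify(tap_times, template, tol=0.12):
    """Sketch: accept if the normalized tap rhythm matches the enrolled
    template. Real systems would fuse this with richer tap features."""
    def intervals(ts):
        d = np.diff(np.asarray(ts, float))
        return d / d.sum()                   # tempo-invariant rhythm profile
    if len(tap_times) != len(template):
        return False                         # wrong number of taps
    return bool(np.abs(intervals(tap_times) - intervals(template)).mean() < tol)

enrolled = [0.00, 0.30, 0.45, 0.95, 1.10]    # an enrolled 5-tap melody
print(verify([0.00, 0.33, 0.49, 1.02, 1.18], enrolled))  # True: same rhythm, slower
```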
Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks
Title | Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks |
Authors | Keiller Nogueira, Mauro Dalla Mura, Jocelyn Chanussot, William R. Schwartz, Jefersson A. dos Santos |
Abstract | Semantic segmentation requires methods capable of learning high-level features while dealing with large volumes of data. Towards this goal, Convolutional Networks can learn specific and adaptable features based on the data. However, these networks are not capable of processing a whole remote sensing image, given its huge size. To overcome this limitation, the image is processed using fixed-size patches. The definition of the input patch size is usually performed empirically (by evaluating several sizes) or imposed (by network constraints). Both strategies suffer from drawbacks and may not lead to the best patch size. To alleviate this problem, several works have exploited multi-context information by combining networks or layers. This process increases the number of parameters, resulting in a model that is more difficult to train. In this work, we propose a novel technique to perform semantic segmentation of remote sensing images that exploits a multi-context paradigm without increasing the number of parameters while defining, at training time, the best patch size. The main idea is to train a dilated network with distinct patch sizes, allowing it to capture multi-context characteristics from heterogeneous contexts. While processing these varying patches, the network provides a score for each patch size, helping in the definition of the best size for the current scenario. A systematic evaluation of the proposed algorithm is conducted using four high-resolution remote sensing datasets with very distinct properties. Our results show that the proposed algorithm provides improvements in pixelwise classification accuracy when compared to state-of-the-art methods. |
Tasks | Semantic Segmentation |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04020v3 |
PDF | http://arxiv.org/pdf/1804.04020v3.pdf |
PWC | https://paperswithcode.com/paper/dynamic-multi-scale-segmentation-of-remote |
Repo | |
Framework | |
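The mechanism that makes training on distinct patch sizes possible is that a fully convolutional dilated network imposes no fixed input size: the same weights produce pixel-aligned outputs for any patch. A minimal PyTorch sketch, where depths, widths and the patch sizes are ours and the paper's per-patch-size scoring is omitted:

```python
import torch
import torch.nn as nn

class DilatedSegNet(nn.Module):
    """Sketch: fully convolutional dilated network with growing receptive
    field and no fixed input size, enabling multi-size patch training."""
    def __init__(self, in_ch=3, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),  # wider context
            nn.Conv2d(64, n_classes, 1),           # pixelwise class scores
        )

    def forward(self, x):                          # any (B, C, H, W)
        return self.net(x)

net = DilatedSegNet()
for size in (32, 64, 128):                         # train over varying patch sizes
    out = net(torch.randn(2, 3, size, size))
    assert out.shape[-2:] == (size, size)          # output stays pixel-aligned
```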