Paper Group ANR 232
Future-State Predicting LSTM for Early Surgery Type Recognition. The Effect of Context on Metaphor Paraphrase Aptness Judgments. Magnetically Guided Capsule Endoscopy. Patch-Based Image Inpainting with Generative Adversarial Networks. Robust Gradient Descent via Moment Encoding with LDPC Codes. Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir. Deductron – A Recurrent Neural Network. Faster gaze prediction with dense networks and Fisher pruning. Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation. Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning. Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data. On Evaluating and Comparing Open Domain Dialog Systems. Fusion of multispectral satellite imagery using a cluster of graphics processing unit. Tap-based User Authentication for Smartwatches. Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks.
Future-State Predicting LSTM for Early Surgery Type Recognition
Title | Future-State Predicting LSTM for Early Surgery Type Recognition |
Authors | Siddharth Kannan, Gaurav Yengera, Didier Mutter, Jacques Marescaux, Nicolas Padoy |
Abstract | This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can be beneficial to the development of ‘smart’ OR systems that can provide automatic context-aware assistance, and also enable quick database indexing. The task is, however, riddled with challenges specific to videos belonging to the domain of laparoscopy, such as high visual similarity across surgeries and large variations in video durations. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first approach is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second approach, referred to as ‘Future-State Predicting LSTM’, trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing 9 types of surgeries (Laparo425), and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are quite promising from a practical standpoint and also encouraging for other types of image-guided surgeries. |
Tasks | |
Published | 2018-11-28 |
URL | https://arxiv.org/abs/1811.11727v2 |
PDF | https://arxiv.org/pdf/1811.11727v2.pdf |
PWC | https://paperswithcode.com/paper/future-state-predicting-lstm-for-early |
Repo | |
Framework | |
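As a concrete illustration of the two-headed idea in the abstract, here is a minimal PyTorch sketch: an LSTM runs over per-frame CNN features and feeds both a surgery-type classifier and an auxiliary head trained to regress the features of a frame several steps ahead. The feature dimension, hidden size, horizon and loss weighting are illustrative assumptions, not the authors' implementation (only the 9 classes match Laparo425).

```python
import torch
import torch.nn as nn

class FutureStatePredictingLSTM(nn.Module):
    """Sketch: classify surgery type from per-frame CNN features while an
    auxiliary head predicts the features of a frame `horizon` steps ahead."""
    def __init__(self, feat_dim=512, hidden=256, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls_head = nn.Linear(hidden, n_classes)    # surgery type
        self.future_head = nn.Linear(hidden, feat_dim)  # future-frame features

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)                # (B, T, hidden)
        return self.cls_head(h), self.future_head(h)

def loss_fn(logits, future_pred, feats, labels, horizon=5, alpha=0.5):
    # classification at every time step, using the per-video label
    B, T, C = logits.shape
    ce = nn.functional.cross_entropy(
        logits.reshape(B * T, C), labels.repeat_interleave(T))
    # auxiliary regression: hidden state at t must predict features at t+horizon
    mse = nn.functional.mse_loss(
        future_pred[:, :-horizon], feats[:, horizon:])
    return ce + alpha * mse
```

At test time only the classification head is read, at whatever early-observation cutoff (e.g. 10 minutes) the application allows.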
The Effect of Context on Metaphor Paraphrase Aptness Judgments
Title | The Effect of Context on Metaphor Paraphrase Aptness Judgments |
Authors | Yuri Bizzoni, Shalom Lappin |
Abstract | We conduct two experiments to study the effect of context on metaphor paraphrase aptness judgments. The first is an AMT crowdsourcing task in which speakers rank metaphor paraphrase candidate sentence pairs in short document contexts for paraphrase aptness. In the second we train a composite DNN to predict these human judgments, first in binary classifier mode, and then as gradient ratings. We found that for both mean human judgments and our DNN’s predictions, adding document context compresses the aptness scores towards the center of the scale, raising low out-of-context ratings and decreasing high out-of-context scores. We offer a provisional explanation for this compression effect. |
Tasks | |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.01060v1 |
PDF | http://arxiv.org/pdf/1809.01060v1.pdf |
PWC | https://paperswithcode.com/paper/the-effect-of-context-on-metaphor-paraphrase |
Repo | |
Framework | |
Magnetically Guided Capsule Endoscopy
Title | Magnetically Guided Capsule Endoscopy |
Authors | Thomas Kruezer |
Abstract | This review traces the history of wireless capsule endoscopy, highlighting its advances in medical diagnostics as well as its therapeutic functionality. Not restricted to the gastrointestinal tract alone, the review additionally examines developments in magnetically guided micro-robots capable of navigating multiple forms of air- and fluid-filled lumina and cavities within the body. All of these capabilities serve the practice of minimally invasive medicine. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04130v1 |
PDF | http://arxiv.org/pdf/1809.04130v1.pdf |
PWC | https://paperswithcode.com/paper/magnetically-guided-capsule-endoscopy |
Repo | |
Framework | |
Patch-Based Image Inpainting with Generative Adversarial Networks
Title | Patch-Based Image Inpainting with Generative Adversarial Networks |
Authors | Ugur Demir, Gozde Unal |
Abstract | The area of image inpainting over relatively large missing regions has recently advanced substantially through the adaptation of dedicated deep neural networks. However, current network solutions still introduce undesired artifacts and noise to the repaired regions. We present an image inpainting method that is based on the celebrated generative adversarial network (GAN) framework. The proposed PGGAN method includes a discriminator network that combines a global GAN (G-GAN) architecture with a patchGAN approach. PGGAN first shares network layers between G-GAN and patchGAN, then splits paths to produce two adversarial losses that feed the generator network in order to capture both the local continuity of image texture and pervasive global features in images. The proposed framework is evaluated extensively, and the results, including comparisons to the recent state of the art, demonstrate that it achieves considerable improvements in both visual and quantitative evaluations. |
Tasks | Image Inpainting |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07422v1 |
PDF | http://arxiv.org/pdf/1803.07422v1.pdf |
PWC | https://paperswithcode.com/paper/patch-based-image-inpainting-with-generative |
Repo | |
Framework | |
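The shared-trunk/two-head discriminator described in the abstract can be sketched in a few lines of PyTorch. Layer counts and channel widths below are placeholders, not the paper's exact architecture; the point is the shared layers followed by a split into a per-image (global) score and a per-patch score map.

```python
import torch
import torch.nn as nn

class PGGANStyleDiscriminator(nn.Module):
    """Sketch: shared early layers, then a global (G-GAN) head giving one
    real/fake score per image and a PatchGAN head giving one per patch."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.shared = nn.Sequential(                      # shared feature trunk
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # PatchGAN head: one score per local receptive field
        self.patch_head = nn.Conv2d(128, 1, 4, stride=1, padding=1)
        # global head: one score for the whole image
        self.global_head = nn.Sequential(
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 1),
        )

    def forward(self, x):
        f = self.shared(x)
        return self.global_head(f), self.patch_head(f)
```

In training, each output would be fed through its own adversarial loss (e.g. BCE with logits), and the two losses are combined to drive the generator toward both global coherence and local texture continuity.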
Robust Gradient Descent via Moment Encoding with LDPC Codes
Title | Robust Gradient Descent via Moment Encoding with LDPC Codes |
Authors | Raj Kumar Maity, Ankit Singh Rawat, Arya Mazumdar |
Abstract | This paper considers the problem of implementing large-scale gradient descent algorithms in a distributed computing setting in the presence of straggling processors. To mitigate the effect of the stragglers, it has been previously proposed to encode the data with an erasure-correcting code and decode at the master server at the end of the computation. We, instead, propose to encode the second moment of the data with a low-density parity-check (LDPC) code. The iterative decoding algorithms for LDPC codes have very low computational overhead, and the number of decoding iterations can be made to automatically adjust with the number of stragglers in the system. We show that, for a random model of stragglers, the proposed moment-encoding-based gradient descent method can be viewed as a stochastic gradient descent method. This allows us to obtain convergence guarantees for the proposed solution. Furthermore, the proposed moment-encoding-based method is shown to outperform the existing schemes in a real distributed computing setup. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08327v2 |
PDF | http://arxiv.org/pdf/1805.08327v2.pdf |
PWC | https://paperswithcode.com/paper/robust-gradient-descent-via-moment-encoding |
Repo | |
Framework | |
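A toy numpy sketch of the moment-encoding idea, under loud assumptions: for least squares, the gradient only needs the second moment M = XᵀX and b = Xᵀy, so encoded rows of M can be spread across workers and the master can recover M·w from whichever results arrive. We stand in for the LDPC machinery with a random sparse encoding matrix and least-squares decoding, purely to show the data flow; the paper uses a true LDPC code with cheap iterative decoding, and the inexact recovery under stragglers is exactly what makes the iterates behave like SGD.

```python
import numpy as np

# Least-squares gradient needs only M = X^T X and b = X^T y:  grad(w) = M w - b
rng = np.random.default_rng(0)
n, d, workers = 200, 10, 12
X = rng.normal(size=(n, d)); y = rng.normal(size=n)
M, b = X.T @ X, X.T @ y

S = (rng.random((workers, d)) < 0.3).astype(float)  # stand-in encoding matrix
encoded = S @ M                                     # row i is stored on worker i

w = np.zeros(d)
for _ in range(100):
    alive = rng.random(workers) > 0.25              # random stragglers drop out
    partial = encoded[alive] @ w                    # worker results that arrived
    # decode S_alive @ (M w) = partial; inexact when too few workers survive,
    # which makes the update a *stochastic* gradient step
    Mw, *_ = np.linalg.lstsq(S[alive], partial, rcond=None)
    w -= 0.001 * (Mw - b)                           # gradient step at the master
```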
Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir
Title | Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir |
Authors | Marc-Antoine Moinnereau, Thomas Brienne, Simon Brodeur, Jean Rouat, Kevin Whittingstall, Eric Plourde |
Abstract | The use of the electroencephalogram (EEG) as the main input signal in brain-machine interfaces has been widely proposed due to the non-invasive nature of the EEG. Here we are specifically interested in interfaces that extract information from the auditory system, and more specifically in the task of classifying heard speech from EEGs. To do so, we propose to limit the preprocessing of the EEGs and use machine learning approaches to automatically extract their meaningful characteristics. More specifically, we use a regulated recurrent neural network (RNN) reservoir, which has been shown to outperform classic machine learning approaches when applied to several different bio-signals, and we compare it with a deep neural network approach. Moreover, we also investigate the classification performance as a function of the number of EEG electrodes. Eight subjects were randomly presented with 3 different auditory stimuli (the English vowels a, i and u). We obtained an excellent classification rate of 83.2% with the RNN when considering all 64 electrodes. A rate of 81.7% was achieved with only 10 electrodes. |
Tasks | EEG |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10322v1 |
PDF | http://arxiv.org/pdf/1804.10322v1.pdf |
PWC | https://paperswithcode.com/paper/classification-of-auditory-stimuli-from-eeg |
Repo | |
Framework | |
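A reservoir of the kind the abstract leans on can be sketched as a standard leaky echo-state network: fixed random recurrent weights scaled to a chosen spectral radius, with only a linear readout trained. This generic numpy sketch omits the paper's specific regulation mechanism; all sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, n_res=300, rho=0.9, leak=0.3):
    """Leaky echo-state reservoir sketch. `u`: (T, n_in) EEG samples.
    The recurrent weights are random and fixed; nothing here is trained."""
    n_in = u.shape[1]
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    x = np.zeros(n_res); states = []
    for t in range(u.shape[0]):
        pre = np.tanh(W_in @ u[t] + W @ x)
        x = (1 - leak) * x + leak * pre              # leaky integration
        states.append(x.copy())
    return np.array(states)                          # (T, n_res)

# Readout: e.g. ridge regression from the mean (or final) reservoir state
# to the 3 vowel classes -- the only trained component.
```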
Deductron – A Recurrent Neural Network
Title | Deductron – A Recurrent Neural Network |
Authors | Marek Rychlik |
Abstract | The current paper is a study in Recurrent Neural Networks (RNN), motivated by the lack of examples simple enough that they can be thoroughly understood theoretically, yet complex enough to be realistic. We constructed an example of structured data, motivated by problems from image-to-text conversion (OCR), which requires long-term memory to decode. Our data is a simple writing system, encoding the characters ‘X’ and ‘O’ as their upper halves, which is possible due to the symmetry of the two characters. The characters can be connected, as in some languages using cursive, such as Arabic (abjad). The string ‘XOOXXO’ may be encoded as ‘∨∧∧∨∨∧’. It follows that we may need to know an arbitrarily long past to decode a current character, thus requiring long-term memory. Subsequently we constructed an RNN capable of decoding sequences encoded in this manner. Rather than by training, we constructed our RNN “by inspection”, i.e. we guessed its weights. This involved a sequence of steps. We wrote a conventional program which decodes sequences such as the example above. Subsequently, we interpreted the program as a neural network (the only example of this kind known to us). Finally, we generalized this neural network to discover a new RNN architecture whose instance is our handcrafted RNN. It turns out to be a 3-layer network, where the middle layer is capable of performing simple logical inferences; thus the name “deductron”. It is demonstrated that it is possible to train our network by simulated annealing. Also, known variants of stochastic gradient descent (SGD) methods are shown to work. |
Tasks | Optical Character Recognition |
Published | 2018-06-23 |
URL | https://arxiv.org/abs/1806.09038v3 |
PDF | https://arxiv.org/pdf/1806.09038v3.pdf |
PWC | https://paperswithcode.com/paper/deductron-a-recurrent-neural-network |
Repo | |
Framework | |
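To illustrate the "RNN by inspection" step, i.e. hand-setting weights so the network executes a small program, here is a numpy sketch of a single threshold unit wired as a set/reset latch, the kind of long-term memory the decoding task requires. This is not the deductron architecture itself, just the weight-guessing idea in miniature.

```python
import numpy as np

def step(x_t, h, W_xh, W_hh, b):
    """One step of a hand-weighted RNN cell with a hard threshold,
    i.e. a small logical circuit rather than a trained network."""
    return ((W_xh @ x_t + W_hh @ h + b) > 0).astype(float)

# A 1-unit latch: input is [set, reset]; the unit turns on at `set`,
# stays on via its recurrent self-loop, and turns off at `reset`.
W_xh = np.array([[2.0, -4.0]])   # `set` excites, `reset` strongly inhibits
W_hh = np.array([[2.0]])         # self-loop keeps the memory alive
b = np.array([-1.0])

h = np.zeros(1)
for x in [(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]:
    h = step(np.array(x, float), h, W_xh, W_hh, b)
    print(int(h[0]))             # prints 1 1 1 0 0
```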
Faster gaze prediction with dense networks and Fisher pruning
Title | Faster gaze prediction with dense networks and Fisher pruning |
Authors | Lucas Theis, Iryna Korshunova, Alykhan Tejani, Ferenc Huszár |
Abstract | Predicting human fixations from images has recently seen large improvements by leveraging deep representations which were pretrained for object recognition. However, as we show in this paper, these networks are highly overparameterized for the task of fixation prediction. We first present a simple yet principled greedy pruning method which we call Fisher pruning. Through a combination of knowledge distillation and Fisher pruning, we obtain much more runtime-efficient architectures for saliency prediction, achieving a 10x speedup for the same AUC performance as a state-of-the-art network on the CAT2000 dataset. Speeding up single-image gaze prediction is important for many real-world applications, but it is also a crucial step in the development of video saliency models, where the amount of data to be processed is substantially larger. |
Tasks | Gaze Estimation, Gaze Prediction, Object Recognition, Saliency Prediction |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05787v2 |
PDF | http://arxiv.org/pdf/1801.05787v2.pdf |
PWC | https://paperswithcode.com/paper/faster-gaze-prediction-with-dense-networks |
Repo | |
Framework | |
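The pruning signal can be sketched directly: in Fisher pruning, a channel's estimated importance comes from the squared product of its activation and the gradient of the loss with respect to that activation, accumulated over data; channels with the lowest scores are pruned first. The PyTorch sketch below computes such per-channel scores for one convolutional layer; the hook bookkeeping and exact normalization are our simplifications.

```python
import torch
import torch.nn as nn

def fisher_scores(model, layer, data_loader, loss_fn):
    """Sketch: per-channel importance from (activation * activation-gradient)^2,
    accumulated over batches for one conv layer with (B, C, H, W) outputs."""
    acts = {}
    def save_act(mod, inp, out):
        out.retain_grad()                # keep the gradient on this activation
        acts['a'] = out
    handle = layer.register_forward_hook(save_act)

    scores = None
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        a, g = acts['a'], acts['a'].grad              # both (B, C, H, W)
        s = (a * g).sum(dim=(2, 3)).pow(2).mean(0).detach()  # per-channel
        scores = s if scores is None else scores + s
    handle.remove()
    return scores    # prune the channels with the smallest scores
```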
Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation
Title | Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation |
Authors | Florian Piewak, Peter Pinggera, Manuel Schäfer, David Peter, Beate Schwarz, Nick Schneider, David Pfeiffer, Markus Enzweiler, Marius Zöllner |
Abstract | Mobile robots and autonomous vehicles rely on multi-modal sensor setups to perceive and understand their surroundings. Aside from cameras, LiDAR sensors represent a central component of state-of-the-art perception systems. In addition to accurate spatial perception, a comprehensive semantic understanding of the environment is essential for efficient and safe operation. In this paper we present a novel deep neural network architecture called LiLaNet for point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network utilizes virtual image projections of the 3D point clouds for efficient inference. Further, we propose an automated process for large-scale cross-modal training data generation called Autolabeling, in order to boost semantic labeling performance while keeping the manual annotation effort low. The effectiveness of the proposed network architecture as well as the automated data generation process is demonstrated on a manually annotated ground truth dataset. LiLaNet is shown to significantly outperform current state-of-the-art CNN architectures for LiDAR data. Applying our automatically generated large-scale training data yields a boost of up to 14 percentage points compared to networks trained on manually annotated data only. |
Tasks | Autonomous Vehicles |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.09915v1 |
PDF | http://arxiv.org/pdf/1804.09915v1.pdf |
PWC | https://paperswithcode.com/paper/boosting-lidar-based-semantic-labeling-by |
Repo | |
Framework | |
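The "virtual image projection" that lets a 2D CNN consume LiDAR data amounts to binning each point by azimuth and elevation into an image grid of depth/intensity channels. A numpy sketch, with field-of-view and resolution numbers that are illustrative rather than taken from the paper:

```python
import numpy as np

def project_to_virtual_image(points, h=64, w=870, fov_up=3.0, fov_down=-25.0):
    """Sketch: project a LiDAR cloud (N, 4: x, y, z, intensity) onto a
    spherical virtual image so a 2D CNN (LiLaNet-style) can label every
    point; each pixel stores depth and intensity."""
    x, y, z, inten = points.T
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                           # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-6))       # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    v = ((fu - pitch) / (fu - fd) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w, 2), dtype=np.float32)      # channels: depth, intensity
    img[v, u, 0] = r
    img[v, u, 1] = inten
    return img
```

Per-pixel class predictions from the CNN are then mapped back to the 3D points through the same (v, u) indices.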
Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning
Title | Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning |
Authors | Aidin Ferdowsi, Samad Ali, Walid Saad, Narayan B. Mandayam |
Abstract | Autonomous connected vehicles (ACVs) rely on intra-vehicle sensors such as camera and radar as well as on inter-vehicle communication to operate effectively. This reliance on cyber components exposes ACVs to cyber and physical attacks in which an adversary can manipulate sensor readings and physically take control of an ACV. In this paper, a comprehensive framework is proposed to thwart cyber and physical attacks on ACV networks. First, an optimal safe controller for ACVs is derived to maximize street traffic flow while minimizing the risk of accidents by optimizing ACV speed and inter-ACV spacing. It is proven that the proposed controller is robust to physical attacks which aim at making ACV systems unstable. To improve the cyber-physical security of ACV systems, data injection attack (DIA) detection approaches are next proposed to address cyber attacks on sensors and their physical impact on the ACV system. To comprehensively design the DIA detection approaches, ACV sensors are divided into two subsets based on the availability of a-priori information about their data. For sensors with prior information, a DIA detection approach is proposed and an optimal threshold level is derived for the difference between the actual and estimated values of sensor data, which enables the ACV to stay robust against cyber attacks. For sensors without prior information, a novel multi-armed bandit (MAB) algorithm is proposed to enable the ACV to securely control its motion. Simulation results show that the proposed optimal safe controller outperforms current state-of-the-art controllers by maximizing the robustness of ACVs to physical attacks. The results also show that the proposed DIA detection approaches, compared to Kalman filtering, can improve the security of ACV sensors against cyber attacks and ultimately improve the physical robustness of an ACV system. |
Tasks | |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05298v1 |
PDF | http://arxiv.org/pdf/1812.05298v1.pdf |
PWC | https://paperswithcode.com/paper/cyber-physical-security-and-safety-of |
Repo | |
Framework | |
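For the sensors without prior information, the abstract invokes multi-armed bandit learning. The paper proposes its own MAB algorithm; as a stand-in, here is the textbook UCB1 strategy in Python, where pulling an arm would correspond to trusting (or acting on) a particular sensor and the reward reflects how safe the resulting motion was:

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """Textbook UCB1 sketch (not the paper's algorithm): `pull(a)` returns a
    reward in [0, 1], e.g. a trust/safety signal for choosing sensor `a`."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                                  # play each arm once
        else:
            # exploit the best mean, plus an optimism bonus for rarely
            # tried arms (exploration)
            a = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]         # running average
    return means, counts
```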
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
Title | Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data |
Authors | Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang |
Abstract | The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into a stereotype of replicating frequent phrases or sentences, neglecting unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings unique advantages: (1) the self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach can utilize a large amount of unlabeled images to boost captioning performance with no additional laborious annotations. We demonstrate the effectiveness of the proposed retrieval-guided method on the COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions. |
Tasks | Image Captioning |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08314v3 |
PDF | http://arxiv.org/pdf/1803.08314v3.pdf |
PWC | https://paperswithcode.com/paper/show-tell-and-discriminate-image-captioning |
Repo | |
Framework | |
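The self-retrieval module can be pictured as a reward: a discriminative caption should retrieve its own image from a batch. A hedged PyTorch sketch, assuming some joint caption/image embedding space; the margin form and batch-level distractors are our illustrative choices, not necessarily the authors':

```python
import torch

def self_retrieval_reward(cap_emb, img_emb):
    """Sketch: reward a generated caption for retrieving its paired image.
    `cap_emb`, `img_emb`: (B, D) embeddings from any joint encoder."""
    cap = torch.nn.functional.normalize(cap_emb, dim=1)
    img = torch.nn.functional.normalize(img_emb, dim=1)
    sim = cap @ img.t()                    # (B, B) caption-to-image similarity
    own = sim.diag()                       # similarity to the paired image
    # margin of the paired image over the hardest in-batch distractor
    mask = torch.eye(len(sim), dtype=torch.bool)
    distractor = sim.masked_fill(mask, -1e9).max(1).values
    return own - distractor                # feed into REINFORCE-style training
```

A positive reward means the caption already separates its image from the batch; unlabeled images can join the distractor pool, which is how partially labeled data helps.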
On Evaluating and Comparing Open Domain Dialog Systems
Title | On Evaluating and Comparing Open Domain Dialog Systems |
Authors | Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju |
Abstract | Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million-dollar university competition where sixteen selected university teams built conversational agents to deliver the best social conversational experience. The Alexa Prize provided the academic community with a unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human judgment. The proposed metrics provide granular analysis of the conversational agents, which is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgment. We provide a mechanism to unify the metrics for selecting the top performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, this is to date the largest setting for evaluating agents, with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs. |
Tasks | Goal-Oriented Dialogue Systems |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03625v2 |
PDF | http://arxiv.org/pdf/1801.03625v2.pdf |
PWC | https://paperswithcode.com/paper/on-evaluating-and-comparing-open-domain |
Repo | |
Framework | |
Fusion of multispectral satellite imagery using a cluster of graphics processing unit
Title | Fusion of multispectral satellite imagery using a cluster of graphics processing unit |
Authors | Anas M. Al-Oraiqat, E. A. Bashkov, V. Babkov, C. Titarenko |
Abstract | The paper presents a parallel implementation of existing image fusion methods on a graphics cluster. Parallel implementations of methods based on the discrete wavelet transformation (the Haar and Daubechies discrete wavelet transforms) are developed. Experiments were performed on a cluster using GPUs and CPUs, and performance gains were estimated for the use of the developed parallel implementations to process satellite images from the Landsat 7 satellite. The implementation on a graphics cluster provides a performance improvement of 2 to 18 times. The quality of the considered methods was evaluated by the ERGAS and QNR metrics. The results show performance gains and retention of quality with the cluster of GPUs compared to the results obtained by the authors and other researchers for a CPU and a single GPU. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00737v1 |
PDF | http://arxiv.org/pdf/1803.00737v1.pdf |
PWC | https://paperswithcode.com/paper/fusion-of-multispectral-satellite-imagery |
Repo | |
Framework | |
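The wavelet fusion being parallelized can be shown at its smallest scale: a single-level 2D Haar transform, with the multispectral band contributing the approximation (color) sub-band and the panchromatic image contributing the detail (sharpness) sub-bands. This plain numpy sketch shows only the per-band arithmetic; the paper's contribution is distributing exactly this kind of work across a GPU cluster, and this particular approximation/detail recombination is one common fusion rule assumed here for illustration.

```python
import numpy as np

def haar2d(a):
    """One level of the 2D Haar transform; `a` needs even height and width."""
    lo = (a[0::2] + a[1::2]) / 2; hi = (a[0::2] - a[1::2]) / 2   # row pairs
    ll, lh = (lo[:, 0::2] + lo[:, 1::2]) / 2, (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl, hh = (hi[:, 0::2] + hi[:, 1::2]) / 2, (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    lo = np.empty((h, 2 * w)); hi = np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    a = np.empty((2 * h, 2 * w))
    a[0::2], a[1::2] = lo + hi, lo - hi
    return a

def fuse(ms_band, pan):
    """Keep the multispectral approximation (colors), inject the
    panchromatic detail sub-bands (spatial sharpness)."""
    ll_ms, *_ = haar2d(ms_band)
    _, lh_p, hl_p, hh_p = haar2d(pan)
    return ihaar2d(ll_ms, lh_p, hl_p, hh_p)
```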
Tap-based User Authentication for Smartwatches
Title | Tap-based User Authentication for Smartwatches |
Authors | Toan Nguyen, Nasir Memon |
Abstract | This paper presents TapMeIn, an eyes-free, two-factor authentication method for smartwatches. It allows users to tap a memorable melody (tap-password) of their choice anywhere on the touchscreen to unlock their watch. A user is verified based on the tap-password as well as her physiological and behavioral characteristics when tapping. Results from preliminary experiments with 41 participants show that TapMeIn could achieve an accuracy of 98.7% with a False Positive Rate of only 0.98%. In addition, TapMeIn retains its performance in different conditions such as sitting and walking. In terms of speed, TapMeIn has an average authentication time of 2 seconds. A user study with the System Usability Scale (SUS) tool suggests that TapMeIn has a high usability score. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00482v2 |
PDF | http://arxiv.org/pdf/1807.00482v2.pdf |
PWC | https://paperswithcode.com/paper/tap-based-user-authentication-for |
Repo | |
Framework | |
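The rhythm-matching half of a tap-password check can be sketched simply: normalize the inter-tap intervals (making the rhythm tempo-invariant) and compare them to an enrolled template. TapMeIn additionally verifies physiological and behavioral tap characteristics; the threshold and features below are illustrative assumptions.

```python
import numpy as np

def verify(tap_times, template, tol=0.12):
    """Sketch: accept if the normalized tap rhythm matches the enrolled
    template. Real systems would fuse this with richer tap features."""
    def intervals(ts):
        d = np.diff(np.asarray(ts, float))
        return d / d.sum()                   # tempo-invariant rhythm profile
    if len(tap_times) != len(template):
        return False                         # wrong number of taps
    return bool(np.abs(intervals(tap_times) - intervals(template)).mean() < tol)

enrolled = [0.00, 0.30, 0.45, 0.95, 1.10]    # an enrolled 5-tap melody
print(verify([0.00, 0.33, 0.49, 1.02, 1.18], enrolled))  # True: same rhythm, slower
```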
Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks
Title | Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks |
Authors | Keiller Nogueira, Mauro Dalla Mura, Jocelyn Chanussot, William R. Schwartz, Jefersson A. dos Santos |
Abstract | Semantic segmentation requires methods capable of learning high-level features while dealing with large volumes of data. Towards this goal, Convolutional Networks can learn specific and adaptable features based on the data. However, these networks are not capable of processing a whole remote sensing image, given its huge size. To overcome this limitation, the image is processed using fixed-size patches. The definition of the input patch size is usually performed empirically (by evaluating several sizes) or imposed (by network constraints). Both strategies suffer from drawbacks and may not lead to the best patch size. To alleviate this problem, several works have exploited multi-context information by combining networks or layers. This process increases the number of parameters, resulting in a model that is more difficult to train. In this work, we propose a novel technique to perform semantic segmentation of remote sensing images that exploits a multi-context paradigm without increasing the number of parameters while defining, at training time, the best patch size. The main idea is to train a dilated network with distinct patch sizes, allowing it to capture multi-context characteristics from heterogeneous contexts. While processing these varying patches, the network provides a score for each patch size, helping in the definition of the best size for the current scenario. A systematic evaluation of the proposed algorithm is conducted using four high-resolution remote sensing datasets with very distinct properties. Our results show that the proposed algorithm provides improvements in pixelwise classification accuracy when compared to state-of-the-art methods. |
Tasks | Semantic Segmentation |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04020v3 |
PDF | http://arxiv.org/pdf/1804.04020v3.pdf |
PWC | https://paperswithcode.com/paper/dynamic-multi-scale-segmentation-of-remote |
Repo | |
Framework | |
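The mechanism that makes training on distinct patch sizes possible is that a fully convolutional dilated network imposes no fixed input size: the same weights produce pixel-aligned outputs for any patch. A minimal PyTorch sketch, where depths, widths and the patch sizes are ours and the paper's per-patch-size scoring is omitted:

```python
import torch
import torch.nn as nn

class DilatedSegNet(nn.Module):
    """Sketch: fully convolutional dilated network with growing receptive
    field and no fixed input size, enabling multi-size patch training."""
    def __init__(self, in_ch=3, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),  # wider context
            nn.Conv2d(64, n_classes, 1),           # pixelwise class scores
        )

    def forward(self, x):                          # any (B, C, H, W)
        return self.net(x)

net = DilatedSegNet()
for size in (32, 64, 128):                         # train over varying patch sizes
    out = net(torch.randn(2, 3, size, size))
    assert out.shape[-2:] == (size, size)          # output stays pixel-aligned
```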