January 31, 2020


Paper Group ANR 124


Autonomous Aerial Cinematography In Unstructured Environments With Learned Artistic Decision-Making


Title	Autonomous Aerial Cinematography In Unstructured Environments With Learned Artistic Decision-Making
Authors	Rogerio Bonatti, Wenshan Wang, Cherie Ho, Aayush Ahuja, Mirko Gschwindt, Efe Camci, Erdal Kayacan, Sanjiban Choudhury, Sebastian Scherer
Abstract	Aerial cinematography is revolutionizing industries that require live and dynamic camera viewpoints such as entertainment, sports, and security. However, safely piloting a drone while filming a moving target in the presence of obstacles is immensely taxing, often requiring multiple expert human operators. Hence, there is demand for an autonomous cinematographer that can reason about both geometry and scene context in real-time. Existing approaches do not address all aspects of this problem; they either require high-precision motion-capture systems or GPS tags to localize targets, rely on prior maps of the environment, plan for short time horizons, or only follow artistic guidelines specified before flight. In this work, we address the problem in its entirety and propose a complete system for real-time aerial cinematography that for the first time combines: (1) vision-based target estimation; (2) 3D signed-distance mapping for occlusion estimation; (3) efficient trajectory optimization for long time-horizon camera motion; and (4) learning-based artistic shot selection. We extensively evaluate our system both in simulation and in field experiments by filming dynamic targets moving through unstructured environments. Our results indicate that our system can operate reliably in the real world without restrictive assumptions. We also provide in-depth analysis and discussions for each module, with the hope that our design tradeoffs can generalize to other related applications. Videos of the complete system can be found at: https://youtu.be/ookhHnqmlaU.
Tasks	Decision Making, Motion Capture
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06988v1
PDF	https://arxiv.org/pdf/1910.06988v1.pdf
PWC	https://paperswithcode.com/paper/autonomous-aerial-cinematography-in
Repo
Framework

Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification


Title	Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification
Authors	Harsh Shrivastava, Eugene Bart, Bob Price, Hanjun Dai, Bo Dai, Srinivas Aluru
Abstract	We propose a new approach, called cooperative neural networks (CoNN), which uses a set of cooperatively trained neural networks to capture latent representations that exploit prior given independence structure. The model is more flexible than traditional graphical models based on exponential family distributions, but incorporates more domain specific prior structure than traditional deep networks or variational autoencoders. The framework is very general and can be used to exploit the independence structure of any graphical model. We illustrate the technique by showing that we can transfer the independence structure of the popular Latent Dirichlet Allocation (LDA) model to a cooperative neural network, CoNN-sLDA. Empirical evaluation of CoNN-sLDA on supervised text classification tasks demonstrates that the theoretical advantages of prior independence structure can be realized in practice: we demonstrate a 23% reduction in error on the challenging MultiSent data set compared to state-of-the-art.
Tasks	Text Classification
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00291v1
PDF	https://arxiv.org/pdf/1906.00291v1.pdf
PWC	https://paperswithcode.com/paper/190600291
Repo
Framework

Exploring applications of deep reinforcement learning for real-world autonomous driving systems


Title	Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Authors	Victor Talpaert, Ibrahim Sobh, B Ravi Kiran, Patrick Mannion, Senthil Yogamani, Ahmad El-Sallab, Patrick Perez
Abstract	Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind’s AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye’s path planning system. However, a vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still in its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Tasks	Autonomous Driving
Published	2019-01-06
URL	http://arxiv.org/abs/1901.01536v3
PDF	http://arxiv.org/pdf/1901.01536v3.pdf
PWC	https://paperswithcode.com/paper/exploring-applications-of-deep-reinforcement
Repo
Framework

Point Attention Network for Semantic Segmentation of 3D Point Clouds


Title	Point Attention Network for Semantic Segmentation of 3D Point Clouds
Authors	Mingtao Feng, Liang Zhang, Xuefei Lin, Syed Zulqarnain Gilani, Ajmal Mian
Abstract	Convolutional Neural Networks (CNNs) have performed extremely well on data represented by regularly arranged grids such as images. However, directly leveraging the classic convolution kernels or parameter sharing mechanisms on sparse 3D point clouds is inefficient due to their irregular and unordered nature. We propose a point attention network that learns rich local shape features and their contextual correlations for 3D point cloud semantic segmentation. Since the geometric distribution of the neighboring points is invariant to the point ordering, we propose a Local Attention-Edge Convolution (LAE-Conv) to construct a local graph based on the neighborhood points searched in multi-directions. We assign attention coefficients to each edge and then aggregate the point features as a weighted sum of its neighbors. The learned LAE-Conv layer features are then given to a point-wise spatial attention module to generate an interdependency matrix of all points regardless of their distances, which captures long-range spatial contextual features contributing to more precise semantic information. The proposed point attention network consists of an encoder and decoder which, together with the LAE-Conv layers and the point-wise spatial attention modules, make it an end-to-end trainable network for predicting dense labels for 3D point cloud segmentation. Experiments on challenging benchmarks of 3D point clouds show that our algorithm can perform on par with or better than the existing state-of-the-art methods.
Tasks	Semantic Segmentation
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12663v1
PDF	https://arxiv.org/pdf/1909.12663v1.pdf
PWC	https://paperswithcode.com/paper/point-attention-network-for-semantic
Repo
Framework
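
The edge-attention step described in the abstract above (attention coefficients per edge, then a weighted sum over neighbors) can be illustrated with a minimal sketch. This is not the authors' LAE-Conv layer: the edge feature (neighbor minus center), the linear scoring vector `score_weights`, and the function name `attention_aggregate` are all assumptions made for illustration.

```python
import math


def attention_aggregate(center, neighbors, score_weights):
    """Aggregate neighbor features as an attention-weighted sum.

    center: feature vector of the center point
    neighbors: list of neighbor feature vectors
    score_weights: hypothetical learned vector used to score each edge
    """
    # Edge feature: neighbor minus center (an assumed choice, not from the paper).
    edges = [[n_i - c_i for n_i, c_i in zip(n, center)] for n in neighbors]
    # Raw score per edge: dot product with the scoring vector.
    scores = [sum(w * e for w, e in zip(score_weights, edge)) for edge in edges]
    # Softmax over the neighborhood so the coefficients sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    coeffs = [e / total for e in exps]
    # Weighted sum of the neighbor features.
    dim = len(center)
    aggregated = [
        sum(a * n[d] for a, n in zip(coeffs, neighbors)) for d in range(dim)
    ]
    return coeffs, aggregated


coeffs, agg = attention_aggregate(
    center=[0.0, 0.0],
    neighbors=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    score_weights=[0.5, -0.5],
)
```

Because the softmax and the sum run over a set of neighbors, reordering the neighbors leaves the aggregated feature unchanged, which is the ordering invariance the abstract relies on.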

Self-supervised speaker embeddings


Title	Self-supervised speaker embeddings
Authors	Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget
Abstract	Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers. In this paper, we explore an alternative training strategy to enable the use of unlabelled utterances in training. We propose to train speaker embedding extractors via reconstructing the frames of a target speech segment, given the inferred embedding of another speech segment of the same utterance. We do this by attaching to the standard speaker embedding extractor a decoder network, which we feed not merely with the speaker embedding, but also with the estimated phone sequence of the target frame sequence. The reconstruction loss can be used either as a single objective, or be combined with the standard speaker classification loss. In the latter case, it acts as a regularizer, encouraging generalizability to speakers unseen during training. In all cases, the proposed architectures are trained from scratch and in an end-to-end fashion. We demonstrate the benefits from the proposed approach on VoxCeleb and Speakers in the wild, and we report notable improvements over the baseline.
Tasks
Published	2019-04-06
URL	http://arxiv.org/abs/1904.03486v2
PDF	http://arxiv.org/pdf/1904.03486v2.pdf
PWC	https://paperswithcode.com/paper/self-supervised-speaker-embeddings
Repo
Framework

High Mutual Information in Representation Learning with Symmetric Variational Inference


Title	High Mutual Information in Representation Learning with Symmetric Variational Inference
Authors	Micha Livne, Kevin Swersky, David J. Fleet
Abstract	We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework. Our key principles are symmetry and mutual information, where symmetry encourages the encoder and decoder to learn different factorizations of the same underlying distribution, and mutual information, to encourage the learning of useful representations for downstream tasks. Our starting point is the symmetric Jensen-Shannon divergence between the encoding and decoding joint distributions, plus a mutual information encouraging regularizer. We show that this can be bounded by a tractable cross entropy loss function between the true model and a parameterized approximation, and relate this to the maximum likelihood framework. We also relate MIM to variational autoencoders (VAEs) and demonstrate that MIM is capable of learning symmetric factorizations, with high mutual information that avoids posterior collapse.
Tasks	Representation Learning
Published	2019-10-04
URL	https://arxiv.org/abs/1910.04153v1
PDF	https://arxiv.org/pdf/1910.04153v1.pdf
PWC	https://paperswithcode.com/paper/high-mutual-information-in-representation
Repo
Framework
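
The abstract above takes the symmetric Jensen-Shannon divergence as its starting point. As a toy illustration only, the sketch below computes that divergence for two small discrete distributions; MIM itself applies it to joint distributions over observations and latent states, and the helper names `kl` and `jsd` are assumptions of this example.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
forward = jsd(p, q)
backward = jsd(q, p)  # equal to `forward`: the divergence is symmetric
```

Unlike the KL divergence, this quantity does not change when the two arguments are swapped, and with natural logarithms it never exceeds log 2.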

Hyperbolic Graph Attention Network


Title	Hyperbolic Graph Attention Network
Authors	Yiding Zhang, Xiao Wang, Xunqiang Jiang, Chuan Shi, Yanfang Ye
Abstract	Graph neural networks (GNNs) have shown superior performance in dealing with graphs and have attracted considerable research attention recently. However, most existing GNN models are primarily designed for graphs in Euclidean spaces. Recent research has proven that graph data exhibits non-Euclidean latent anatomy. Unfortunately, GNNs have rarely been studied in non-Euclidean settings so far. To bridge this gap, in this paper we make a first attempt to study GNNs with an attention mechanism in hyperbolic spaces. Research on hyperbolic GNNs faces some unique challenges: since hyperbolic spaces are not vector spaces, vector operations (e.g., vector addition, subtraction, and scalar multiplication) cannot be carried out. To tackle this problem, we employ gyrovector spaces, which provide an elegant algebraic formalism for hyperbolic geometry, to transform the features in a graph; we then propose a hyperbolic proximity based attention mechanism to aggregate the features. Moreover, as mathematical operations in hyperbolic spaces can be more complicated than those in Euclidean spaces, we further devise a novel acceleration strategy using logarithmic and exponential mappings to improve the efficiency of our proposed model. Comprehensive experimental results on four real-world datasets demonstrate the performance of our proposed hyperbolic graph attention network model in comparison with other state-of-the-art baseline methods.
Tasks
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03046v1
PDF	https://arxiv.org/pdf/1912.03046v1.pdf
PWC	https://paperswithcode.com/paper/hyperbolic-graph-attention-network
Repo
Framework
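
The gyrovector operations and the logarithmic/exponential mappings mentioned in the abstract above can be sketched for the Poincaré ball. This example assumes curvature -1 and maps taken at the origin; the function names and the tangent-space averaging in `tangent_mean` are illustrative assumptions, not the paper's implementation.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * xy + x2 * y2
    return [
        ((1 + 2 * xy + y2) * xi + (1 - x2) * yi) / denom
        for xi, yi in zip(x, y)
    ]


def exp0(v):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        return list(v)
    return [math.tanh(norm) * vi / norm for vi in v]


def log0(y):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    norm = math.sqrt(dot(y, y))
    if norm == 0:
        return list(y)
    return [math.atanh(norm) * yi / norm for yi in y]


def tangent_mean(points, weights):
    """Weighted mean computed in the tangent space at the origin (assumed scheme)."""
    tangents = [log0(p) for p in points]
    dim = len(points[0])
    mean = [sum(w * t[d] for w, t in zip(weights, tangents)) for d in range(dim)]
    return exp0(mean)


x = [0.3, 0.4]
back_to_origin = mobius_add([-xi for xi in x], x)  # approximately [0, 0]
```

Mapping to the tangent space, doing ordinary vector arithmetic there, and mapping back is one common way to avoid chaining many Mobius operations; whether the paper's acceleration strategy works exactly this way is not stated in the abstract.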

Anomaly Detection in High Performance Computers: A Vicinity Perspective


Title	Anomaly Detection in High Performance Computers: A Vicinity Perspective
Authors	Siavash Ghiasvand, Florina M. Ciorba
Abstract	In response to the demand for higher computational power, the number of computing nodes in high performance computers (HPC) increases rapidly. Exascale HPC systems are expected to arrive by 2020. With the drastic increase in the number of HPC system components, a sudden increase in the number of failures is expected, which, consequently, poses a threat to the continuous operation of HPC systems. Detecting failures as early as possible and, ideally, predicting them, is a necessary step to avoid interruptions in HPC systems operation. Anomaly detection is a well-known general purpose approach for failure detection in computing systems. The majority of existing methods are designed for specific architectures, require adjustments on the computing systems hardware and software, need excessive information, or pose a threat to users’ and systems’ privacy. This work proposes a node failure detection mechanism based on a vicinity-based statistical anomaly detection approach using passively collected and anonymized system log entries. Application of the proposed approach on system logs collected over 8 months indicates an anomaly detection precision between 62% and 81%.
Tasks	Anomaly Detection
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04550v1
PDF	https://arxiv.org/pdf/1906.04550v1.pdf
PWC	https://paperswithcode.com/paper/anomaly-detection-in-high-performance
Repo
Framework

Towards Controlled Transformation of Sentiment in Sentences


Title	Towards Controlled Transformation of Sentiment in Sentences
Authors	Wouter Leeftink, Gerasimos Spanakis
Abstract	An obstacle to the development of many natural language processing products is the vast amount of training examples necessary to get satisfactory results. The generation of these examples is often a tedious and time-consuming task. This paper proposes a method to transform the sentiment of sentences in order to limit the work necessary to generate more training data. This means that one sentence can be transformed to an opposite sentiment sentence and should reduce by half the work required in the generation of text. The proposed pipeline consists of a sentiment classifier with an attention mechanism to highlight the short phrases that determine the sentiment of a sentence. Then, these phrases are changed to phrases of the opposite sentiment using a baseline model and an autoencoder approach. Experiments are run on both the separate parts of the pipeline as well as on the end-to-end model. The sentiment classifier is tested on its accuracy and is found to perform adequately. The autoencoder is tested on how well it is able to change the sentiment of an encoded phrase and it was found that such a task is possible. We use human evaluation to judge the performance of the full (end-to-end) pipeline and that reveals that a model using word vectors outperforms the encoder model. Numerical evaluation shows that a success rate of 54.7% is achieved on the sentiment change.
Tasks
Published	2019-01-31
URL	http://arxiv.org/abs/1901.11467v1
PDF	http://arxiv.org/pdf/1901.11467v1.pdf
PWC	https://paperswithcode.com/paper/towards-controlled-transformation-of
Repo
Framework

Bandits with Feedback Graphs and Switching Costs


Title	Bandits with Feedback Graphs and Switching Costs
Authors	Raman Arora, Teodor V. Marinov, Mehryar Mohri
Abstract	We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known results incur a factor proportional to the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domination number of the graph. We further supplement that result with a lower bound. Finally, we also give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
Tasks
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12189v2
PDF	https://arxiv.org/pdf/1907.12189v2.pdf
PWC	https://paperswithcode.com/paper/bandits-with-feedback-graphs-and-switching
Repo
Framework
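
The abstract above contrasts the independence number of the feedback graph with its domination number. The gap between the two can be large, as a brute-force sketch on a small graph shows. The example assumes an undirected graph in which every action also observes itself; the function names are hypothetical, and exhaustive search is only practical for a handful of vertices.

```python
from itertools import combinations


def independence_number(n, edges):
    """Size of the largest vertex set with no edge between any two members."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return k
    return 0


def domination_number(n, edges):
    """Size of the smallest vertex set whose closed neighborhoods cover the graph."""
    closed = {v: {v} for v in range(n)}  # each vertex observes itself
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            covered = set().union(*(closed[v] for v in subset))
            if len(covered) == n:
                return k
    return 0


# Star graph: vertex 0 is connected to the four leaves 1..4.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
alpha = independence_number(5, star)  # the four leaves are mutually non-adjacent
gamma = domination_number(5, star)    # the center alone observes every vertex
```

On a star with many leaves the independence number grows with the number of leaves while the domination number stays at 1, which is the kind of graph where a bound in terms of the domination number is much tighter.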

Multi-Path Learnable Wavelet Neural Network for Image Classification


Title	Multi-Path Learnable Wavelet Neural Network for Image Classification
Authors	D. D. N. De Silva, H. W. M. K. Vithanage, K. S. D. Fernando, I. T. S. Piyatilake
Abstract	Despite the remarkable success of deep learning in pattern recognition, deep network models face the problem of training a large number of parameters. In this paper, we propose and evaluate a novel multi-path wavelet neural network architecture for image classification with far fewer trainable parameters. The model architecture consists of a multi-path layout with several levels of wavelet decompositions performed in parallel followed by fully connected layers. These decomposition operations comprise wavelet neurons with learnable parameters, which are updated during the training phase using the back-propagation algorithm. We evaluate the performance of the introduced network using common image datasets without data augmentation except for SVHN and compare the results with influential deep learning models. Our findings support the possibility of reducing the number of parameters significantly in deep neural networks without compromising their accuracy.
Tasks	Data Augmentation, Image Classification
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09775v1
PDF	https://arxiv.org/pdf/1908.09775v1.pdf
PWC	https://paperswithcode.com/paper/multi-path-learnable-wavelet-neural-network
Repo
Framework
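
To make the phrase "several levels of wavelet decompositions" in the abstract above concrete, here is a minimal one-dimensional sketch using the fixed Haar wavelet. The paper's wavelet neurons have learnable parameters and operate on images; the fixed filters, the 1D signal, and the function names below are simplifying assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One decomposition level: pairwise averages (approx) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exactly undo one decomposition level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_multilevel(signal, levels):
    """Repeatedly decompose the approximation; length must be divisible by 2**levels."""
    current = list(signal)
    details = []
    for _ in range(levels):
        current, detail = haar_step(current)
        details.append(detail)
    return current, details


signal = [4.0, 2.0, 6.0, 8.0]
approx, detail = haar_step(signal)
restored = haar_inverse(approx, detail)  # matches `signal` up to rounding
```

Each level halves the length of the approximation, so a multi-path layout could feed approximations from different levels to separate branches; how the paper wires its paths is not detailed in the abstract.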

Automatic Lumbar Spinal CT Image Segmentation with a Dual Densely Connected U-Net


Title	Automatic Lumbar Spinal CT Image Segmentation with a Dual Densely Connected U-Net
Authors	He Tang, Xiaobing Pei, Shilong Huang, Xin Li, Chao Liu
Abstract	The clinical treatment of degenerative and developmental lumbar spinal stenosis (LSS) is different. Computed tomography (CT) is helpful in distinguishing degenerative and developmental LSS due to its advantage in imaging of osseous and calcified tissues. However, boundaries of the vertebral body, spinal canal and dural sac have low contrast and are hard to identify in a CT image, so the diagnosis depends heavily on the knowledge of expert surgeons and radiologists. In this paper, we develop an automatic lumbar spinal CT image segmentation method to assist LSS diagnosis. The main contributions of this paper are the following: 1) a new lumbar spinal CT image dataset is constructed that contains 2393 axial CT images collected from 279 patients, with the ground truth of pixel-level segmentation labels; 2) a dual densely connected U-shaped neural network (DDU-Net) is used to segment the spinal canal, dural sac and vertebral body in an end-to-end manner; 3) DDU-Net is capable of segmenting tissues with large scale variation, inconspicuous edges (e.g., spinal canal) and extremely small size (e.g., dural sac); and 4) DDU-Net is practical, requiring no image preprocessing such as contrast enhancement, registration and denoising, and the running time reaches 12 FPS. In the experiment, we achieve state-of-the-art performance on the lumbar spinal image segmentation task. We expect that the technique will increase both radiology workflow efficiency and the perceived value of radiology reports for referring clinicians and patients.
Tasks	Computed Tomography (CT), Denoising, Semantic Segmentation
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09198v2
PDF	https://arxiv.org/pdf/1910.09198v2.pdf
PWC	https://paperswithcode.com/paper/automatic-lumbar-spinal-ct-image-segmentation
Repo
Framework

Competitive Coevolution as an Adversarial Approach to Dynamic Optimization


Title	Competitive Coevolution as an Adversarial Approach to Dynamic Optimization
Authors	Xiaofen Lu, Ke Tang, Stefan Menzel, Xin Yao
Abstract	Dynamic optimization, for which the objective functions change over time, has attracted intensive investigations due to the inherent uncertainty associated with many real-world problems. For their robustness with respect to noise, Evolutionary Algorithms (EAs) have been expected to have great potential for dynamic optimization. On the other hand, EAs are also criticized for their high computational complexity, which appears to be contradictory to the core requirement of real-world dynamic optimization, i.e., fast adaptation (typically in terms of wall-clock time) to the environmental change. So far, whether EAs would indeed lead to a truly effective approach for real-world dynamic optimization remains unclear. In this paper, a new framework of employing EAs in the context of dynamic optimization is explored. We suggest that, instead of evolving (searching for) solutions online for the ever-changing objective function, EAs are more suitable for acquiring an archive of solutions in an offline way, which could be adopted to construct a system to provide high-quality solutions efficiently in a dynamic environment. To be specific, we first re-formulate dynamic optimization problems as static set-oriented optimization problems. Then, a particular type of EA, namely competitive coevolution, is employed to search for the archive of solutions in an adversarial way. The general framework is instantiated for continuous dynamic constrained optimization problems, and the empirical results showed the potential of the proposed framework.
Tasks
Published	2019-07-31
URL	https://arxiv.org/abs/1907.13529v2
PDF	https://arxiv.org/pdf/1907.13529v2.pdf
PWC	https://paperswithcode.com/paper/competitive-co-evolution-for-dynamic
Repo
Framework

Organ At Risk Segmentation with Multiple Modality


Title	Organ At Risk Segmentation with Multiple Modality
Authors	Kuan-Lun Tseng, Winston Hsu, Chun-ting Wu, Ya-Fang Shih, Fan-Yun Sun
Abstract	With the development of image segmentation in computer vision, biomedical image segmentation has achieved remarkable progress on brain tumor segmentation and Organ At Risk (OAR) segmentation. However, most of the research uses only a single modality such as Computed Tomography (CT) scans, while in real-world scenarios doctors often use multiple modalities to get more accurate results. To better leverage different modalities, we have collected a large dataset consisting of 136 cases, with CT and MR images, diagnosed with nasopharyngeal cancer. In this paper, we propose to use a Generative Adversarial Network to perform CT to MR transformation to synthesize MR images instead of aligning two modalities. The synthesized MR can be jointly trained with CT to achieve better performance. In addition, we use an instance segmentation model to extend the OAR segmentation task to segment both organs and the tumor region. The collected dataset will be made public soon.
Tasks	Brain Tumor Segmentation, Computed Tomography (CT), Instance Segmentation, Semantic Segmentation
Published	2019-10-17
URL	https://arxiv.org/abs/1910.07800v1
PDF	https://arxiv.org/pdf/1910.07800v1.pdf
PWC	https://paperswithcode.com/paper/organ-at-risk-segmentation-with-multiple
Repo
Framework

Segmentation Criteria in the Problem of Porosity Determination based on CT Scans


Title	Segmentation Criteria in the Problem of Porosity Determination based on CT Scans
Authors	V. Kokhan, M. Grigoriev, A. Buzmakov, V. Uvarov, A. Ingacheva, E. Shvets, M. Chukalina
Abstract	Porous materials are widely used in different applications; in particular, they are used to create various filters. Their quality depends on parameters that characterize the internal structure such as porosity, permeability and so on. Computed tomography (CT) allows one to see the internal structure of a porous object without destroying it. The result of tomography is a gray image. To evaluate the desired parameters, the image should be segmented. Traditional intensity threshold approaches do not reliably produce correct results due to limitations in CT image quality. Errors in the evaluation of characteristics of porous materials based on segmented images can lead to the incorrect estimation of their quality and consequently to the impossibility of exploitation, financial losses and even to accidents. It is difficult to perform segmentation correctly due to the strong difference in voxel intensities of the reconstructed object and the presence of noise. Image filtering as a preprocessing procedure is used to improve the quality of segmentation. Nevertheless, there is a problem of choosing an optimal filter. In this work, a method for selecting an optimal filter based on an attributive indicator of porous objects (they should be free from ‘levitating stones’ inside of pores) is proposed. In this paper, we use real data where beam hardening artifacts are removed, which allows us to focus on the noise reduction process.
Tasks	Computed Tomography (CT)
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07328v1
PDF	https://arxiv.org/pdf/1910.07328v1.pdf
PWC	https://paperswithcode.com/paper/segmentation-criteria-in-the-problem-of
Repo
Framework
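
The criterion in the abstract above (a correct segmentation should contain no 'levitating stones' inside pores) can be sketched on a toy 2D image. The sketch thresholds the image, computes porosity as the pore fraction, and counts solid components that are disconnected from the largest one. The 4-connectivity, the 2D setting, the rule that the largest component is the main body, and all function names are assumptions of this example; real CT volumes are 3D, where a piece that looks isolated in one slice may be attached elsewhere.

```python
from collections import deque


def segment(image, threshold):
    """Intensity threshold: True marks solid material, False marks pore space."""
    return [[value >= threshold for value in row] for row in image]


def porosity(solid):
    """Fraction of pixels classified as pore."""
    total = sum(len(row) for row in solid)
    pores = sum(1 for row in solid for value in row if not value)
    return pores / total


def solid_component_sizes(solid):
    """Sizes of 4-connected solid components, found by breadth-first search."""
    rows, cols = len(solid), len(solid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not solid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and solid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def levitating_stones(solid):
    """Number of solid components other than the largest one."""
    return max(len(solid_component_sizes(solid)) - 1, 0)


image = [
    [9, 9, 9, 9, 9],
    [9, 1, 1, 1, 9],
    [9, 1, 8, 1, 9],  # the bright center pixel floats inside the pore
    [9, 1, 1, 1, 9],
    [9, 9, 9, 9, 9],
]
solid = segment(image, threshold=5)
```

In a filter-selection loop of the kind the abstract describes, one could apply each candidate filter before `segment` and prefer the filter that yields the fewest floating components; the abstract does not specify the exact selection rule.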