January 26, 2020

Paper Group ANR 1377

Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection. Universal Hysteresis Identification Using Extended Preisach Neural Network. Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts. Adaptive versus Standard Descent Methods and Robust …

Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection


Title	Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection
Authors	Changhee Han, Yoshiro Kitamura, Akira Kudo, Akimichi Ichinose, Leonardo Rundo, Yujiro Furukawa, Kazuki Umemoto, Yuanzhong Li, Hideki Nakayama
Abstract	Accurate Computer-Assisted Diagnosis, relying on large-scale annotated pathological images, can alleviate the risk of overlooking the diagnosis. Unfortunately, in medical imaging, most available datasets are small/fragmented. To tackle this, as a Data Augmentation (DA) method, 3D conditional Generative Adversarial Networks (GANs) can synthesize desired realistic/diverse 3D images as additional training data. However, no 3D conditional GAN-based DA approach exists for general bounding box-based 3D object detection, while it can locate disease areas with physicians’ minimum annotation cost, unlike rigorous 3D segmentation. Moreover, since lesions vary in position/size/attenuation, further GAN-based DA performance requires multiple conditions. Therefore, we propose 3D Multi-Conditional GAN (MCGAN) to generate realistic/diverse 32 X 32 X 32 nodules placed naturally on lung Computed Tomography images to boost sensitivity in 3D object detection. Our MCGAN adopts two discriminators for conditioning: the context discriminator learns to classify real vs synthetic nodule/surrounding pairs with noise box-centered surroundings; the nodule discriminator attempts to classify real vs synthetic nodules with size/attenuation conditions. The results show that 3D Convolutional Neural Network-based detection can achieve higher sensitivity under any nodule size/attenuation at fixed False Positive rates and overcome the medical data paucity with the MCGAN-generated realistic nodules—even expert physicians fail to distinguish them from the real ones in Visual Turing Test.
Tasks	3D Object Detection, Data Augmentation, Image Augmentation, Object Detection
Published	2019-06-12
URL	https://arxiv.org/abs/1906.04962v2
PDF	https://arxiv.org/pdf/1906.04962v2.pdf
PWC	https://paperswithcode.com/paper/synthesizing-diverse-lung-nodules-wherever
Repo
Framework

Universal Hysteresis Identification Using Extended Preisach Neural Network


Title	Universal Hysteresis Identification Using Extended Preisach Neural Network
Authors	Mojtaba Farrokh, Mehrdad Shafiei Dizaji, Farzad Shafiei Dizaji, Nazanin Moradinasab
Abstract	Hysteresis phenomena have been observed in different branches of physics and engineering sciences. Several models have therefore been proposed for hysteresis simulation in different fields; however, almost none of them can be used universally. In this paper, drawing on the Preisach Neural Network (itself inspired by the Preisach model, which stems from Madelung's rules) and on the learning capability of neural networks, an adaptive universal model for hysteresis is introduced, called the Extended Preisach Neural Network Model. It comprises an input layer, an output layer, and two hidden layers. The input and output layers contain linear neurons, while the first hidden layer incorporates neurons called Deteriorating Stop neurons, whose activation function follows the Deteriorating Stop operator. Deteriorating Stop operators can generate non-congruent hysteresis loops. The second hidden layer includes sigmoidal neurons. Adding the second hidden layer helps the neural network learn non-Masing and asymmetric hysteresis loops very smoothly. At the input layer, the rate at which the input data changes is included alongside the input data itself, giving the model the capability of learning rate-dependent hysteresis loops. Hence, the proposed approach can simulate both rate-independent and rate-dependent hysteresis, with either congruent or non-congruent loops and with symmetric or asymmetric loops. A new hybridized algorithm has been adopted for training the model, based on a combination of the Genetic Algorithm and the sub-gradient optimization method with space dilatation. The generality of the proposed model has been evaluated by applying it to various hystereses from different areas of engineering with different characteristics. The results show that the model is successful in the identification of the considered hystereses.
Tasks
Published	2019-12-22
URL	https://arxiv.org/abs/2001.01559v1
PDF	https://arxiv.org/pdf/2001.01559v1.pdf
PWC	https://paperswithcode.com/paper/universal-hysteresis-identification-using
Repo
Framework
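
The abstract above does not define the Deteriorating Stop operator, so it is not reproduced here. As a rough illustration of the operator family it extends, the sketch below implements the classical (non-deteriorating) stop operator, in which the output follows input increments but saturates at a threshold. The function names, the threshold `r`, and the sample input are assumptions chosen for illustration and are not taken from the paper.

```python
def clamp(value, r):
    """Limit value to the interval [-r, r]."""
    return max(-r, min(r, value))


def stop_operator(inputs, r):
    """Classical stop operator with threshold r >= 0.

    The output tracks increments of the input and saturates at +/- r.
    This is the textbook operator, NOT the paper's Deteriorating Stop
    operator, whose definition is not given in the abstract.
    """
    if not inputs:
        return []
    y = clamp(inputs[0], r)  # initial state: clamped first input
    outputs = [y]
    for previous, current in zip(inputs, inputs[1:]):
        # Add the input increment, then saturate.
        y = clamp(y + (current - previous), r)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Hypothetical input path: rise to 2.0, then fall to -2.0.
    path = [0.0, 0.5, 1.0, 2.0, 1.5, 0.0, -2.0]
    print(stop_operator(path, r=1.0))
```

On this path the output depends on the input history and not only on the current input value, which is the defining property of hysteresis: the output after the input falls from 2.0 back to 1.5 differs from the saturated value reached while the input was rising past the threshold.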

Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts


Title	Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts
Authors	Zhile Ren, Erik B. Sudderth
Abstract	We develop new representations and algorithms for three-dimensional (3D) object detection and spatial layout prediction in cluttered indoor scenes. We first propose a clouds of oriented gradient (COG) descriptor that links the 2D appearance and 3D pose of object categories, and thus accurately models how perspective projection affects perceived image gradients. To better represent the 3D visual styles of large objects and provide contextual cues to improve the detection of small objects, we introduce latent support surfaces. We then propose a “Manhattan voxel” representation which better captures the 3D room layout geometry of common indoor environments. Effective classification rules are learned via a latent structured prediction framework. Contextual relationships among categories and layout are captured via a cascade of classifiers, leading to holistic scene hypotheses that exceed the state-of-the-art on the SUN RGB-D database.
Tasks	3D Object Detection, Object Detection, Structured Prediction
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04725v1
PDF	https://arxiv.org/pdf/1906.04725v1.pdf
PWC	https://paperswithcode.com/paper/clouds-of-oriented-gradients-for-3d-detection
Repo
Framework

Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples


Title	Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples
Authors	Marc Khoury
Abstract	Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. In this paper we study how the choice of optimization algorithm influences the robustness of the resulting classifier to adversarial examples. Specifically we show an example of a learning problem for which the solution found by adaptive optimization algorithms exhibits qualitatively worse robustness properties against both $L_{2}$- and $L_{\infty}$-adversaries than the solution found by non-adaptive algorithms. Then we fully characterize the geometry of the loss landscape of $L_{2}$-adversarial training in least-squares linear regression. The geometry of the loss landscape is subtle and has important consequences for optimization algorithms. Finally we provide experimental evidence which suggests that non-adaptive methods consistently produce more robust models than adaptive methods.
Tasks
Published	2019-11-09
URL	https://arxiv.org/abs/1911.03784v2
PDF	https://arxiv.org/pdf/1911.03784v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-versus-standard-descent-methods-and
Repo
Framework
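
To make the $L_{2}$-adversarial setting for least-squares linear regression concrete, the sketch below uses a standard fact that follows from the Cauchy-Schwarz inequality: for a linear predictor $w^\top x$ and a perturbation budget $\epsilon$, the worst-case squared loss over $\|\delta\|_2 \le \epsilon$ equals $(|w^\top x - y| + \epsilon \|w\|_2)^2$. This is a minimal illustration of the per-example objective, not the paper's analysis of the loss landscape; the function names and sample values are assumptions.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm2(a):
    return math.sqrt(dot(a, a))


def worst_case_squared_loss(w, x, y, eps):
    """Closed-form max over ||delta||_2 <= eps of (w.(x + delta) - y)^2."""
    return (abs(dot(w, x) - y) + eps * norm2(w)) ** 2


def worst_case_perturbation(w, x, y, eps):
    """A perturbation attaining the maximum: push along w in the
    direction that increases the magnitude of the residual."""
    n = norm2(w)
    if n == 0.0:
        return [0.0] * len(w)  # the prediction ignores x entirely
    sign = 1.0 if dot(w, x) - y >= 0 else -1.0
    return [sign * eps * wi / n for wi in w]


if __name__ == "__main__":
    # Hypothetical example values.
    w, x, y, eps = [3.0, 4.0], [1.0, 1.0], 5.0, 0.5
    delta = worst_case_perturbation(w, x, y, eps)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    print(worst_case_squared_loss(w, x, y, eps))
    print((dot(w, x_adv) - y) ** 2)  # matches the closed form
```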

Deep Convolutions for In-Depth Automated Rock Typing


Title	Deep Convolutions for In-Depth Automated Rock Typing
Authors	E. E. Baraboshkin, L. S. Ismailova, D. M. Orlov, E. A. Zhukovskaya, G. A. Kalmykov, O. V. Khotylev, E. Yu. Baraboshkin, D. A. Koroteev
Abstract	The description of rocks is one of the most time-consuming tasks in the everyday work of a geologist, especially when very accurate description is required. We here present a method that reduces the time needed for accurate description of rocks, enabling the geologist to work more efficiently. We describe the application of methods based on color distribution analysis and feature extraction. Then we focus on a new approach, used by us, which is based on convolutional neural networks. We used several well-known neural network architectures (AlexNet, VGG, GoogLeNet, ResNet) and made a comparison of their performance. The precision of the algorithms is up to 95% on the validation set with GoogLeNet architecture. The best of the proposed algorithms can describe 50 m of full-size core in one minute.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10227v3
PDF	https://arxiv.org/pdf/1909.10227v3.pdf
PWC	https://paperswithcode.com/paper/190910227
Repo
Framework

Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets


Title	Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets
Authors	Homayun Afrabandpey, Tomi Peltola, Samuel Kaski
Abstract	Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation.
Tasks	Decision Making
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09834v2
PDF	http://arxiv.org/pdf/1902.09834v2.pdf
PWC	https://paperswithcode.com/paper/human-in-the-loop-active-covariance-learning
Repo
Framework

Randomized Iterative Methods for Linear Systems: Momentum, Inexactness and Gossip


Title	Randomized Iterative Methods for Linear Systems: Momentum, Inexactness and Gossip
Authors	Nicolas Loizou
Abstract	In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavor, including industry and science. In the last decade there has been a surge in the demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these new large scale problems. In this thesis we are focusing on the design, complexity analysis and efficient implementations of such algorithms. In particular, we are interested in the development of randomized iterative methods for solving large scale linear systems, stochastic quadratic optimization problems, the best approximation problem and quadratic optimization problems. A large part of the thesis is also devoted to the development of efficient methods for obtaining average consensus on large scale networks.
Tasks
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12176v1
PDF	https://arxiv.org/pdf/1909.12176v1.pdf
PWC	https://paperswithcode.com/paper/randomized-iterative-methods-for-linear
Repo
Framework
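
As one concrete instance of a randomized iterative method for linear systems with momentum, the sketch below combines the randomized Kaczmarz method with a heavy-ball momentum term. It is an illustrative sketch only: the abstract does not specify the algorithms, so the update rule, the function name, the row-sampling scheme (probability proportional to squared row norm), and the values of `omega` and `beta` are all assumptions made here.

```python
import random


def kaczmarz_momentum(A, b, iters=3000, omega=1.0, beta=0.1, seed=0):
    """Randomized Kaczmarz with heavy-ball momentum for a consistent
    system A x = b (illustrative sketch; parameters are assumptions).

    Update: x_new = x - omega * (a_i.x - b_i) / ||a_i||^2 * a_i
                      + beta * (x - x_prev)
    """
    rng = random.Random(seed)
    n = len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    indices = list(range(len(A)))
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm
        # (zero rows get weight 0 and are never selected).
        i = rng.choices(indices, weights=row_norms_sq)[0]
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        step = omega * residual / row_norms_sq[i]
        x_new = [
            xj - step * a + beta * (xj - xp)
            for xj, a, xp in zip(x, A[i], x_prev)
        ]
        x_prev, x = x, x_new
    return x


if __name__ == "__main__":
    # Hypothetical consistent system with solution (1, -2).
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, -2.0, -1.0]
    print(kaczmarz_momentum(A, b))
```

Setting `beta=0.0` recovers the plain randomized Kaczmarz iteration, which makes it easy to compare the two variants on the same system.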

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model


Title	Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Authors	Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
Abstract	In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which “translates” ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and the initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates significant improvement in word error rate over the baseline acoustic model with greedy decoding, especially on much noisier dev-other and test-other portions of the evaluation dataset. Our model also outperforms baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with Transformer-XL neural language model.
Tasks	Data Augmentation, Language Modelling, Speech Recognition
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10697v1
PDF	https://arxiv.org/pdf/1910.10697v1.pdf
PWC	https://paperswithcode.com/paper/correction-of-automatic-speech-recognition
Repo
Framework
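
The abstract reports its results as word error rate (WER). For readers unfamiliar with the metric, the sketch below computes WER in its usual form: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. This is the standard definition and not the authors' evaluation script; the function name and the sample sentences are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word)  # substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical ASR output with one substitution and one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```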

Is Attention Interpretable?


Title	Is Attention Interpretable?
Authors	Sofia Serrano, Noah A. Smith
Abstract	Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components’ representations, it is also often assumed that attention can be used to identify information that models found important (e.g., specific contextualized word tokens). We test whether that assumption holds by manipulating attention weights in already-trained text classification models and analyzing the resulting differences in their predictions. While we observe some ways in which higher attention weights correlate with greater impact on model predictions, we also find many ways in which this does not hold, i.e., where gradient-based rankings of attention weights better predict their effects than their magnitudes. We conclude that while attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator.
Tasks	Text Classification
Published	2019-06-09
URL	https://arxiv.org/abs/1906.03731v1
PDF	https://arxiv.org/pdf/1906.03731v1.pdf
PWC	https://paperswithcode.com/paper/is-attention-interpretable
Repo
Framework
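
A toy version of the kind of test the abstract describes, manipulating attention weights and observing how the output changes, can be sketched as follows. Everything here is a simplifying assumption for illustration: the model is reduced to an attention-weighted sum of hypothetical per-token scores, and the manipulation is to zero one weight and renormalize the rest. It is not the authors' experimental code.

```python
def renormalize_without(weights, index):
    """Zero out one attention weight and rescale the rest to sum to 1."""
    kept = [0.0 if i == index else w for i, w in enumerate(weights)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("cannot renormalize: no attention mass remains")
    return [w / total for w in kept]


def attended_score(weights, token_scores):
    """Toy model output: attention-weighted sum of per-token scores."""
    return sum(w * s for w, s in zip(weights, token_scores))


def output_change(weights, token_scores, index):
    """Change in the toy output when one token's attention is removed."""
    base = attended_score(weights, token_scores)
    ablated = attended_score(renormalize_without(weights, index), token_scores)
    return ablated - base


if __name__ == "__main__":
    # Hypothetical attention weights and per-token contributions.
    weights = [0.5, 0.3, 0.2]
    token_scores = [0.1, 2.0, -1.0]
    for i in range(len(weights)):
        print(i, weights[i], output_change(weights, token_scores, i))
```

In this toy setting, removing the token with the largest attention weight changes the output less than removing the second-ranked token, which mirrors the abstract's caution that attention magnitude is a noisy indicator of importance.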

Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping


Title	Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping
Authors	Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki
Abstract	Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction. One of the most basic prediction tasks is view prediction: how would a given scene look from an alternative viewpoint? Humans excel at this task. Our ability to imagine and fill in missing information is tightly coupled with perception: we feel as if we see the world in 3 dimensions, while in fact, information from only the front surface of the world hits our retinas. This paper explores the role of view prediction in the development of 3D visual recognition. We propose neural 3D mapping networks, which take as input 2.5D (color and depth) video streams captured by a moving camera, and lift them to stable 3D feature maps of the scene, by disentangling the scene content from the motion of the camera. The model also projects its 3D feature maps to novel viewpoints, to predict and match against target views. We propose contrastive prediction losses to replace the standard color regression loss, and show that this leads to better performance on complex photorealistic data. We show that the proposed model learns visual representations useful for (1) semi-supervised learning of 3D object detectors, and (2) unsupervised learning of 3D moving object detectors, by estimating the motion of the inferred 3D feature maps in videos of dynamic scenes. To the best of our knowledge, this is the first work that empirically shows view prediction to be a scalable self-supervised task beneficial to 3D object detection.
Tasks	3D Object Detection, Object Detection, Representation Learning
Published	2019-06-10
URL	https://arxiv.org/abs/1906.03764v5
PDF	https://arxiv.org/pdf/1906.03764v5.pdf
PWC	https://paperswithcode.com/paper/embodied-view-contrastive-3d-feature-learning
Repo
Framework

Automatic difficulty management and testing in games using a framework based on behavior trees and genetic algorithms


Title	Automatic difficulty management and testing in games using a framework based on behavior trees and genetic algorithms
Authors	Ciprian Paduraru, Miruna Paduraru
Abstract	The diversity of agent behaviors is an important topic for the quality of video games and virtual environments in general. Offering the most compelling experience for users with different skills is a difficult task, and usually requires significant manual effort to tune existing code. This can get even harder when dealing with adaptive difficulty systems. Our paper’s main purpose is to create a framework that can automatically create behaviors for game agents across different difficulty classes and with enough diversity. In parallel with this, a second purpose is to create more automated tests that expose defects in the source code or possible logic exploits with less human effort.
Tasks
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04368v1
PDF	https://arxiv.org/pdf/1909.04368v1.pdf
PWC	https://paperswithcode.com/paper/automatic-difficulty-management-and-testing
Repo
Framework

An Ensemble Dialogue System for Facts-Based Sentence Generation


Title	An Ensemble Dialogue System for Facts-Based Sentence Generation
Authors	Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee
Abstract	This study aims to generate responses based on real-world facts by conditioning on context and external facts extracted from information websites. Our system is an ensemble that combines three modules: a generation-based module, a retrieval-based module, and a reranking module. As a result, the system can return diverse and meaningful responses from various perspectives. The experiments and evaluations are conducted on the sentence generation task in Dialog System Technology Challenges 7 (DSTC7-Task2). The proposed system performed significantly better than the individual modules and performed well at DSTC7-Task2, specifically on the objective evaluation.
Tasks
Published	2019-02-05
URL	http://arxiv.org/abs/1902.01529v1
PDF	http://arxiv.org/pdf/1902.01529v1.pdf
PWC	https://paperswithcode.com/paper/an-ensemble-dialogue-system-for-facts-based
Repo
Framework

Joint Active and Passive Beamforming Optimization for Intelligent Reflecting Surface Assisted SWIPT under QoS Constraints


Title	Joint Active and Passive Beamforming Optimization for Intelligent Reflecting Surface Assisted SWIPT under QoS Constraints
Authors	Qingqing Wu, Rui Zhang
Abstract	Intelligent reflecting surface (IRS) is a new and revolutionizing technology for achieving spectrum and energy efficient wireless networks. By leveraging massive low-cost passive elements that are able to reflect radio-frequency (RF) signals with adjustable phase shifts, IRS can achieve high passive beamforming gains, which are particularly appealing for improving the efficiency of RF-based wireless power transfer. Motivated by the above, we study in the paper an IRS-assisted simultaneous wireless information and power transfer (SWIPT) system. Specifically, a set of IRSs are deployed to assist in the information/power transfer from a multi-antenna access point (AP) to multiple single-antenna information users (IUs) and energy users (EUs), respectively. We aim to minimize the transmit power at the AP via jointly optimizing its transmit precoders and the reflect phase shifts at all IRSs, subject to the quality-of-service (QoS) constraints at all users, namely, the individual signal-to-interference-plus-noise ratio (SINR) constraints at IUs and energy harvesting constraints at EUs. However, this optimization problem is non-convex with intricately coupled variables, for which the existing alternating optimization approach is shown to be inefficient as the number of QoS constraints increases. To tackle this challenge, we first apply proper transformations on the QoS constraints and then propose an efficient iterative algorithm by applying the penalty-based method. Moreover, by exploiting the short-range coverage of IRSs, we further propose a low-complexity algorithm by optimizing the phase shifts of all IRSs in parallel.
Tasks
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06220v1
PDF	https://arxiv.org/pdf/1910.06220v1.pdf
PWC	https://paperswithcode.com/paper/joint-active-and-passive-beamforming
Repo
Framework

Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction


Title	Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction
Authors	Tu Nguyen
Abstract	This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The key problem we address is how to properly model both low-level (pixel-based) and high-level spatial information while still preserving the temporal relations among the frames. Our approach is inspired by the recent adoption of convolutional features in recurrent neural networks such as LSTMs to jointly capture spatio-temporal dependencies. While this approach has been shown to surpass traditional stacked CNNs (using 2D or 3D kernels) in action recognition, we observe suboptimal performance in the traffic prediction setting. Therefore, we apply a number of adaptations in the frame encoder-decoder layers and in the sampling procedure to better capture the high-resolution trajectories and to increase training efficiency.
Tasks	Traffic Prediction, Video Prediction
Published	2019-10-24
URL	https://arxiv.org/abs/1910.11030v3
PDF	https://arxiv.org/pdf/1910.11030v3.pdf
PWC	https://paperswithcode.com/paper/spatiotemporal-tile-based-attention-guided
Repo
Framework

Flat2Layout: Flat Representation for Estimating Layout of General Room Types


Title	Flat2Layout: Flat Representation for Estimating Layout of General Room Types
Authors	Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen
Abstract	This paper proposes a new approach, Flat2Layout, for estimating general indoor room layout from a single-view RGB image, whereas existing methods can only produce layout topologies captured from box-shaped rooms. The proposed flat representation encodes the layout information into row vectors, which are treated as the training target of the deep model. A dynamic programming based postprocessing step is employed to decode the estimated flat output from the deep model into the final room layout. Flat2Layout achieves state-of-the-art performance on an existing room layout benchmark. This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types. Flat2Layout is applicable to more layout estimation scenarios and would have an impact on applications in Scene Modeling, Robotics, and Augmented Reality.
Tasks
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12571v1
PDF	https://arxiv.org/pdf/1905.12571v1.pdf
PWC	https://paperswithcode.com/paper/flat2layout-flat-representation-for
Repo
Framework