Paper Group ANR 1377
Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection
Title | Synthesizing Diverse Lung Nodules Wherever Massively: 3D Multi-Conditional GAN-based CT Image Augmentation for Object Detection |
Authors | Changhee Han, Yoshiro Kitamura, Akira Kudo, Akimichi Ichinose, Leonardo Rundo, Yujiro Furukawa, Kazuki Umemoto, Yuanzhong Li, Hideki Nakayama |
Abstract | Accurate Computer-Assisted Diagnosis, relying on large-scale annotated pathological images, can alleviate the risk of overlooking the diagnosis. Unfortunately, in medical imaging, most available datasets are small/fragmented. To tackle this, as a Data Augmentation (DA) method, 3D conditional Generative Adversarial Networks (GANs) can synthesize desired realistic/diverse 3D images as additional training data. However, no 3D conditional GAN-based DA approach exists for general bounding box-based 3D object detection, even though such detection can locate disease areas at minimal annotation cost to physicians, unlike rigorous 3D segmentation. Moreover, since lesions vary in position/size/attenuation, improving GAN-based DA performance further requires multiple conditions. Therefore, we propose 3D Multi-Conditional GAN (MCGAN) to generate realistic/diverse 32 × 32 × 32 nodules placed naturally on lung Computed Tomography images to boost sensitivity in 3D object detection. Our MCGAN adopts two discriminators for conditioning: the context discriminator learns to classify real vs synthetic nodule/surrounding pairs with noise box-centered surroundings; the nodule discriminator attempts to classify real vs synthetic nodules with size/attenuation conditions. The results show that 3D Convolutional Neural Network-based detection can achieve higher sensitivity under any nodule size/attenuation at fixed False Positive rates and overcome the medical data paucity with the MCGAN-generated realistic nodules; even expert physicians fail to distinguish them from the real ones in a Visual Turing Test. |
Tasks | 3D Object Detection, Data Augmentation, Image Augmentation, Object Detection |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04962v2 |
PDF | https://arxiv.org/pdf/1906.04962v2.pdf |
PWC | https://paperswithcode.com/paper/synthesizing-diverse-lung-nodules-wherever |
Repo | |
Framework | |
Universal Hysteresis Identification Using Extended Preisach Neural Network
Title | Universal Hysteresis Identification Using Extended Preisach Neural Network |
Authors | Mojtaba Farrokh, Mehrdad Shafiei Dizaji, Farzad Shafiei Dizaji, Nazanin Moradinasab |
Abstract | Hysteresis phenomena have been observed in different branches of physics and engineering sciences. Therefore, several models have been proposed for hysteresis simulation in different fields; however, almost none of them can be utilized universally. In this paper, inspired by the Preisach Neural Network, which was itself inspired by the Preisach model that basically stemmed from Madelung's rules, and using the learning capability of neural networks, an adaptive universal model for hysteresis is introduced and called the Extended Preisach Neural Network Model. It comprises an input layer, an output layer, and two hidden layers. The input and output layers contain linear neurons, while the first hidden layer incorporates neurons called Deteriorating Stop neurons, whose activation function follows the Deteriorating Stop operator. Deteriorating Stop operators can generate non-congruent hysteresis loops. The second hidden layer includes Sigmoidal neurons. Adding the second hidden layer helps the neural network learn non-Masing and asymmetric hysteresis loops very smoothly. At the input layer, besides the input data, the rate at which the input data changes is included as well, in order to give the model the capability of learning rate-dependent hysteresis loops. Hence, the proposed approach is capable of simulating both rate-independent and rate-dependent hysteresis with either congruent or non-congruent loops, as well as symmetric and asymmetric loops. A new hybridized algorithm has been adopted for training the model, based on a combination of the Genetic Algorithm and the sub-gradient optimization method with space dilatation. The generality of the proposed model has been evaluated by applying it to various hystereses from different areas of engineering with different characteristics. The results show that the model is successful in the identification of the considered hystereses. |
Tasks | |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/2001.01559v1 |
PDF | https://arxiv.org/pdf/2001.01559v1.pdf |
PWC | https://paperswithcode.com/paper/universal-hysteresis-identification-using |
Repo | |
Framework | |
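As a rough illustration of the kind of building block the abstract above refers to, the sketch below implements the classical stop operator, which clips an accumulated input increment to a threshold band and thereby produces a history-dependent output. It is not the Deteriorating Stop operator or the network from the paper; the function name `stop_operator`, the zero initial input, and the threshold `r` are assumptions made for this example.

```python
def stop_operator(inputs, r, state=0.0):
    """Classical stop operator with threshold r >= 0 (illustrative sketch).

    The internal state follows the input increments but is clipped to
    [-r, r], so the output depends on the input history, not only on the
    current input value.
    """
    if r < 0:
        raise ValueError("threshold r must be non-negative")
    outputs = []
    prev = 0.0  # assumed initial input value (hypothetical choice)
    for x in inputs:
        # Add the input increment, then saturate at the threshold.
        state = max(-r, min(r, state + (x - prev)))
        prev = x
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    # Rising then falling input: the output saturates and then unloads.
    print(stop_operator([0.0, 0.5, 1.0, 2.0, 1.5, 0.0, -2.0], r=1.0))
```

Note that the same input value can map to different outputs depending on whether the input is rising or falling, which is the hysteresis behaviour that such neurons are meant to capture.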
Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts
Title | Clouds of Oriented Gradients for 3D Detection of Objects, Surfaces, and Indoor Scene Layouts |
Authors | Zhile Ren, Erik B. Sudderth |
Abstract | We develop new representations and algorithms for three-dimensional (3D) object detection and spatial layout prediction in cluttered indoor scenes. We first propose a clouds of oriented gradient (COG) descriptor that links the 2D appearance and 3D pose of object categories, and thus accurately models how perspective projection affects perceived image gradients. To better represent the 3D visual styles of large objects and provide contextual cues to improve the detection of small objects, we introduce latent support surfaces. We then propose a “Manhattan voxel” representation which better captures the 3D room layout geometry of common indoor environments. Effective classification rules are learned via a latent structured prediction framework. Contextual relationships among categories and layout are captured via a cascade of classifiers, leading to holistic scene hypotheses that exceed the state-of-the-art on the SUN RGB-D database. |
Tasks | 3D Object Detection, Object Detection, Structured Prediction |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04725v1 |
PDF | https://arxiv.org/pdf/1906.04725v1.pdf |
PWC | https://paperswithcode.com/paper/clouds-of-oriented-gradients-for-3d-detection |
Repo | |
Framework | |
Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples
Title | Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples |
Authors | Marc Khoury |
Abstract | Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. In this paper we study how the choice of optimization algorithm influences the robustness of the resulting classifier to adversarial examples. Specifically we show an example of a learning problem for which the solution found by adaptive optimization algorithms exhibits qualitatively worse robustness properties against both $L_{2}$- and $L_{\infty}$-adversaries than the solution found by non-adaptive algorithms. Then we fully characterize the geometry of the loss landscape of $L_{2}$-adversarial training in least-squares linear regression. The geometry of the loss landscape is subtle and has important consequences for optimization algorithms. Finally we provide experimental evidence which suggests that non-adaptive methods consistently produce more robust models than adaptive methods. |
Tasks | |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03784v2 |
PDF | https://arxiv.org/pdf/1911.03784v2.pdf |
PWC | https://paperswithcode.com/paper/adaptive-versus-standard-descent-methods-and |
Repo | |
Framework | |
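To make the setting in the abstract above more concrete, the sketch below evaluates the worst-case squared loss of a linear model under an $L_{2}$-bounded input perturbation. For a linear predictor this inner maximization has the standard closed form $(|w \cdot x - y| + \epsilon \lVert w \rVert_{2})^{2}$. This is a textbook identity offered under that assumption, not necessarily the exact formulation analyzed in the paper, and all function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def adversarial_squared_loss(w, x, y, eps):
    """Worst-case squared loss of w.x against y over all ||delta||_2 <= eps."""
    residual = dot(w, x) - y
    return (abs(residual) + eps * math.sqrt(dot(w, w))) ** 2


def worst_case_perturbation(w, x, y, eps):
    """A perturbation of norm eps that attains the worst-case loss."""
    residual = dot(w, x) - y
    w_norm = math.sqrt(dot(w, w))
    if w_norm == 0.0:
        return [0.0] * len(w)  # the prediction does not depend on x
    sign = 1.0 if residual >= 0 else -1.0
    # Push the input along w in the direction that enlarges the residual.
    return [eps * sign * wi / w_norm for wi in w]


if __name__ == "__main__":
    w, x, y, eps = [3.0, 4.0], [1.0, 1.0], 2.0, 0.5
    delta = worst_case_perturbation(w, x, y, eps)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    print(adversarial_squared_loss(w, x, y, eps), (dot(w, x_adv) - y) ** 2)
```

The closed form shows that the adversarial loss penalizes the norm of `w` through a term coupled with the residual, which is one reason the resulting loss landscape differs from ordinary least squares.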
Deep Convolutions for In-Depth Automated Rock Typing
Title | Deep Convolutions for In-Depth Automated Rock Typing |
Authors | E. E. Baraboshkin, L. S. Ismailova, D. M. Orlov, E. A. Zhukovskaya, G. A. Kalmykov, O. V. Khotylev, E. Yu. Baraboshkin, D. A. Koroteev |
Abstract | The description of rocks is one of the most time-consuming tasks in the everyday work of a geologist, especially when a very accurate description is required. We present a method that reduces the time needed for accurate description of rocks, enabling the geologist to work more efficiently. We describe the application of methods based on color distribution analysis and feature extraction. We then focus on our new approach, which is based on convolutional neural networks. We used several well-known neural network architectures (AlexNet, VGG, GoogLeNet, ResNet) and compared their performance. The precision of the algorithms is up to 95% on the validation set with the GoogLeNet architecture. The best of the proposed algorithms can describe 50 m of full-size core in one minute. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10227v3 |
PDF | https://arxiv.org/pdf/1909.10227v3.pdf |
PWC | https://paperswithcode.com/paper/190910227 |
Repo | |
Framework | |
Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets
Title | Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets |
Authors | Homayun Afrabandpey, Tomi Peltola, Samuel Kaski |
Abstract | Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on accuracy and not on efficiency of the process. Another line of work queries about importance of features one at a time, assuming them to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results demonstrate improvement in predictive performance on both simulated and real data, in high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process, based on sequential elicitation. |
Tasks | Decision Making |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09834v2 |
PDF | http://arxiv.org/pdf/1902.09834v2.pdf |
PWC | https://paperswithcode.com/paper/human-in-the-loop-active-covariance-learning |
Repo | |
Framework | |
Randomized Iterative Methods for Linear Systems: Momentum, Inexactness and Gossip
Title | Randomized Iterative Methods for Linear Systems: Momentum, Inexactness and Gossip |
Authors | Nicolas Loizou |
Abstract | In the era of big data, one of the key challenges is the development of novel optimization algorithms that can accommodate vast amounts of data while at the same time satisfying constraints and limitations of the problem under study. The need to solve optimization problems is ubiquitous in essentially all quantitative areas of human endeavor, including industry and science. In the last decade there has been a surge in the demand from practitioners, in fields such as machine learning, computer vision, artificial intelligence, signal processing and data science, for new methods able to cope with these new large scale problems. In this thesis we are focusing on the design, complexity analysis and efficient implementations of such algorithms. In particular, we are interested in the development of randomized iterative methods for solving large scale linear systems, stochastic quadratic optimization problems, the best approximation problem and quadratic optimization problems. A large part of the thesis is also devoted to the development of efficient methods for obtaining average consensus on large scale networks. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12176v1 |
PDF | https://arxiv.org/pdf/1909.12176v1.pdf |
PWC | https://paperswithcode.com/paper/randomized-iterative-methods-for-linear |
Repo | |
Framework | |
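One well-known member of the family of randomized iterative methods for linear systems is the randomized Kaczmarz method. The sketch below adds a heavy-ball momentum term to it, in the spirit of the thesis title. The abstract does not name this specific method, so the choice of Kaczmarz, the function name `kaczmarz_momentum`, and the default values of `omega` and `beta` are assumptions made purely for illustration.

```python
import random


def kaczmarz_momentum(A, b, omega=1.0, beta=0.05, iters=3000, seed=0):
    """Randomized Kaczmarz with heavy-ball momentum for a consistent Ax = b.

    A is a list of rows, b a list of right-hand sides. omega is the
    relaxation parameter and beta the momentum weight (illustrative values).
    """
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    if sum(row_norms_sq) == 0.0:
        raise ValueError("A must have at least one nonzero row")
    x = [0.0] * len(A[0])
    x_prev = list(x)
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms_sq, k=1)[0]
        row = A[i]
        residual = sum(a * xj for a, xj in zip(row, x)) - b[i]
        step = omega * residual / row_norms_sq[i]
        # Relaxed projection onto the sampled equation, plus momentum.
        x_next = [xj - step * a + beta * (xj - xp)
                  for xj, a, xp in zip(x, row, x_prev)]
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
    b = [4.0, 7.0, -1.0]  # consistent system with solution (1, 2)
    print(kaczmarz_momentum(A, b))
```

Each iteration touches a single row of `A`, which is what makes this class of methods attractive for very large systems. Setting `beta=0.0` recovers the plain randomized Kaczmarz iteration.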
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Title | Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model |
Authors | Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg |
Abstract | In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which “translates” ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates significant improvement in word error rate over the baseline acoustic model with greedy decoding, especially on the much noisier dev-other and test-other portions of the evaluation dataset. Our model also outperforms the baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with the Transformer-XL neural language model. |
Tasks | Data Augmentation, Language Modelling, Speech Recognition |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10697v1 |
PDF | https://arxiv.org/pdf/1910.10697v1.pdf |
PWC | https://paperswithcode.com/paper/correction-of-automatic-speech-recognition |
Repo | |
Framework | |
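The word error rate mentioned in the abstract above is conventionally computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard definition; the function name and the whitespace tokenization are assumptions for this example and do not reflect the evaluation code used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A post-processing model of the kind described would be judged by whether the rate computed on its corrected output is lower than the rate computed on the raw ASR output.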
Is Attention Interpretable?
Title | Is Attention Interpretable? |
Authors | Sofia Serrano, Noah A. Smith |
Abstract | Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components’ representations, it is also often assumed that attention can be used to identify information that models found important (e.g., specific contextualized word tokens). We test whether that assumption holds by manipulating attention weights in already-trained text classification models and analyzing the resulting differences in their predictions. While we observe some ways in which higher attention weights correlate with greater impact on model predictions, we also find many ways in which this does not hold, i.e., where gradient-based rankings of attention weights better predict their effects than their magnitudes. We conclude that while attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator. |
Tasks | Text Classification |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03731v1 |
PDF | https://arxiv.org/pdf/1906.03731v1.pdf |
PWC | https://paperswithcode.com/paper/is-attention-interpretable |
Repo | |
Framework | |
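The kind of intervention described in the abstract above can be sketched on a toy model: remove one attention weight, renormalize the remaining weights, and check whether the predicted class changes. The zero-and-renormalize procedure, the tiny linear classifier, and all names below are assumptions made for illustration; the abstract states only that attention weights are manipulated in already-trained models.

```python
def renormalize_without(weights, index):
    """Zero out one attention weight and rescale the rest to sum to 1."""
    kept = [0.0 if i == index else w for i, w in enumerate(weights)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("cannot renormalize: remaining weights are all zero")
    return [w / total for w in kept]


def attend(weights, values):
    """Attention-weighted sum of value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def predict(context, class_weights):
    """Index of the highest-scoring class under a linear output layer."""
    scores = [sum(c * x for c, x in zip(row, context)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    weights = [0.6, 0.3, 0.1]                      # toy attention distribution
    values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy value vectors
    classes = [[1.0, 0.0], [0.0, 1.0]]             # toy output layer
    before = predict(attend(weights, values), classes)
    ablated = renormalize_without(weights, 0)
    after = predict(attend(ablated, values), classes)
    print(before, after)
```

In this toy case, removing the highest-weighted item flips the decision. The abstract's point is that in trained models this does not hold reliably, so the comparison has to be run empirically rather than assumed.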
Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping
Title | Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping |
Authors | Adam W. Harley, Shrinidhi K. Lakshmikanth, Fangyu Li, Xian Zhou, Hsiao-Yu Fish Tung, Katerina Fragkiadaki |
Abstract | Predictive coding theories suggest that the brain learns by predicting observations at various levels of abstraction. One of the most basic prediction tasks is view prediction: how would a given scene look from an alternative viewpoint? Humans excel at this task. Our ability to imagine and fill in missing information is tightly coupled with perception: we feel as if we see the world in 3 dimensions, while in fact, information from only the front surface of the world hits our retinas. This paper explores the role of view prediction in the development of 3D visual recognition. We propose neural 3D mapping networks, which take as input 2.5D (color and depth) video streams captured by a moving camera, and lift them to stable 3D feature maps of the scene, by disentangling the scene content from the motion of the camera. The model also projects its 3D feature maps to novel viewpoints, to predict and match against target views. We propose contrastive prediction losses to replace the standard color regression loss, and show that this leads to better performance on complex photorealistic data. We show that the proposed model learns visual representations useful for (1) semi-supervised learning of 3D object detectors, and (2) unsupervised learning of 3D moving object detectors, by estimating the motion of the inferred 3D feature maps in videos of dynamic scenes. To the best of our knowledge, this is the first work that empirically shows view prediction to be a scalable self-supervised task beneficial to 3D object detection. |
Tasks | 3D Object Detection, Object Detection, Representation Learning |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03764v5 |
PDF | https://arxiv.org/pdf/1906.03764v5.pdf |
PWC | https://paperswithcode.com/paper/embodied-view-contrastive-3d-feature-learning |
Repo | |
Framework | |
Automatic difficulty management and testing in games using a framework based on behavior trees and genetic algorithms
Title | Automatic difficulty management and testing in games using a framework based on behavior trees and genetic algorithms |
Authors | Ciprian Paduraru, Miruna Paduraru |
Abstract | The diversity of agent behaviors is an important topic for the quality of video games and virtual environments in general. Offering the most compelling experience for users with different skills is a difficult task, and usually requires significant manual effort to tune existing code. This can get even harder when dealing with adaptive difficulty systems. Our paper’s main purpose is to create a framework that can automatically create behaviors for game agents of different difficulty classes and with sufficient diversity. In parallel with this, a second purpose is to create more automated tests that reveal defects in the source code or possible logic exploits with less human effort. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04368v1 |
PDF | https://arxiv.org/pdf/1909.04368v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-difficulty-management-and-testing |
Repo | |
Framework | |
An Ensemble Dialogue System for Facts-Based Sentence Generation
Title | An Ensemble Dialogue System for Facts-Based Sentence Generation |
Authors | Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee |
Abstract | This study aims to generate responses based on real-world facts by conditioning on context and external facts extracted from information websites. Our system is an ensemble that combines three modules: a generation-based module, a retrieval-based module, and a reranking module. As a result, the system can return diverse and meaningful responses from various perspectives. The experiments and evaluations are conducted with the sentence generation task in Dialog System Technology Challenges 7 (DSTC7-Task2). The proposed system performed significantly better than any single module and performed well at DSTC7-Task2, specifically on the objective evaluation. |
Tasks | |
Published | 2019-02-05 |
URL | http://arxiv.org/abs/1902.01529v1 |
PDF | http://arxiv.org/pdf/1902.01529v1.pdf |
PWC | https://paperswithcode.com/paper/an-ensemble-dialogue-system-for-facts-based |
Repo | |
Framework | |
Joint Active and Passive Beamforming Optimization for Intelligent Reflecting Surface Assisted SWIPT under QoS Constraints
Title | Joint Active and Passive Beamforming Optimization for Intelligent Reflecting Surface Assisted SWIPT under QoS Constraints |
Authors | Qingqing Wu, Rui Zhang |
Abstract | Intelligent reflecting surface (IRS) is a new and revolutionary technology for achieving spectrum- and energy-efficient wireless networks. By leveraging massive low-cost passive elements that are able to reflect radio-frequency (RF) signals with adjustable phase shifts, IRS can achieve high passive beamforming gains, which are particularly appealing for improving the efficiency of RF-based wireless power transfer. Motivated by the above, in this paper we study an IRS-assisted simultaneous wireless information and power transfer (SWIPT) system. Specifically, a set of IRSs is deployed to assist in the information/power transfer from a multi-antenna access point (AP) to multiple single-antenna information users (IUs) and energy users (EUs), respectively. We aim to minimize the transmit power at the AP by jointly optimizing its transmit precoders and the reflect phase shifts at all IRSs, subject to the quality-of-service (QoS) constraints at all users, namely, the individual signal-to-interference-plus-noise ratio (SINR) constraints at IUs and energy harvesting constraints at EUs. However, this optimization problem is non-convex with intricately coupled variables, for which the existing alternating optimization approach is shown to be inefficient as the number of QoS constraints increases. To tackle this challenge, we first apply proper transformations on the QoS constraints and then propose an efficient iterative algorithm by applying the penalty-based method. Moreover, by exploiting the short-range coverage of IRSs, we further propose a low-complexity algorithm by optimizing the phase shifts of all IRSs in parallel. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06220v1 |
PDF | https://arxiv.org/pdf/1910.06220v1.pdf |
PWC | https://paperswithcode.com/paper/joint-active-and-passive-beamforming |
Repo | |
Framework | |
Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction
Title | Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction |
Authors | Tu Nguyen |
Abstract | This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The key problem we address is how to properly model both low-level (pixel-based) and high-level spatial information while still preserving the temporal relations among the frames. Our approach is inspired by the recent adoption of convolutional features in recurrent neural networks such as LSTMs to jointly capture the spatio-temporal dependency. While this approach has been proven to surpass traditional stacked CNNs (using 2D or 3D kernels) in action recognition, we observe suboptimal performance in the traffic prediction setting. Therefore, we apply a number of adaptations in the frame encoder-decoder layers and in the sampling procedure to better capture the high-resolution trajectories and to increase the training efficiency. |
Tasks | Traffic Prediction, Video Prediction |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11030v3 |
PDF | https://arxiv.org/pdf/1910.11030v3.pdf |
PWC | https://paperswithcode.com/paper/spatiotemporal-tile-based-attention-guided |
Repo | |
Framework | |
Flat2Layout: Flat Representation for Estimating Layout of General Room Types
Title | Flat2Layout: Flat Representation for Estimating Layout of General Room Types |
Authors | Chi-Wei Hsiao, Cheng Sun, Min Sun, Hwann-Tzong Chen |
Abstract | This paper proposes a new approach, Flat2Layout, for estimating general indoor room layout from a single-view RGB image, whereas existing methods can only produce layout topologies captured from box-shaped rooms. The proposed flat representation encodes the layout information into row vectors, which are treated as the training target of the deep model. A dynamic programming based postprocessing step is employed to decode the estimated flat output from the deep model into the final room layout. Flat2Layout achieves state-of-the-art performance on an existing room layout benchmark. This paper also constructs a benchmark for validating the performance on general layout topologies, where Flat2Layout achieves good performance on general room types. Flat2Layout is applicable to more layout estimation scenarios and would have an impact on applications in Scene Modeling, Robotics, and Augmented Reality. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12571v1 |
PDF | https://arxiv.org/pdf/1905.12571v1.pdf |
PWC | https://paperswithcode.com/paper/flat2layout-flat-representation-for |
Repo | |
Framework | |