April 3, 2020

3160 words 15 mins read

Paper Group ANR 35



Feature-Driven Super-Resolution for Object Detection

Title Feature-Driven Super-Resolution for Object Detection
Authors Bin Wang, Tao Lu, Yanduo Zhang
Abstract Although some convolutional neural network (CNN) based super-resolution (SR) algorithms have recently achieved good visual performance on single images, most of them focus on perceptual quality and ignore the specific needs of subsequent detection tasks. This paper proposes a simple but powerful feature-driven super-resolution (FDSR) method to improve the detection performance on low-resolution (LR) images. First, the proposed method uses a feature-domain prior, extracted from an existing detector backbone, to guide HR image reconstruction. Then, with the aligned features, FDSR updates the SR parameters for better detection performance. Compared with several state-of-the-art SR algorithms at a 4$\times$ scale factor, FDSR achieves higher detection mAP on the MS COCO validation and VOC2007 databases, with good generalization to other detection networks.
Tasks Image Reconstruction, Object Detection, Super-Resolution
Published 2020-04-01
URL https://arxiv.org/abs/2004.00554v1
PDF https://arxiv.org/pdf/2004.00554v1.pdf
PWC https://paperswithcode.com/paper/feature-driven-super-resolution-for-object
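The core idea, a pixel loss combined with a feature-alignment term computed by a frozen detector backbone, can be sketched as follows. This is a toy illustration, not the authors' code: the single-filter `backbone_features` convolution stands in for a real detector backbone, and the weighting `alpha` is an assumed hyperparameter.

```python
import numpy as np

def backbone_features(img, kernel):
    """Toy stand-in for a frozen detector backbone: one valid-mode 2D convolution."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def feature_driven_loss(sr_img, hr_img, kernel, alpha=0.5):
    """Pixel loss keeps the SR output close to HR; the feature term aligns
    backbone responses so the reconstruction serves the detector."""
    pixel_loss = np.mean((sr_img - hr_img) ** 2)
    f_sr = backbone_features(sr_img, kernel)
    f_hr = backbone_features(hr_img, kernel)
    feature_loss = np.mean((f_sr - f_hr) ** 2)
    return pixel_loss + alpha * feature_loss
```

In a real pipeline both terms would be backpropagated into the SR network while the detector backbone stays frozen.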

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

Title Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks
Authors Jingdong Li, Hui Zhang, Xueliang Zhang, Changliang Li
Abstract In recent decades, neural network based methods have significantly improved the performance of speech enhancement. Most of them estimate the time-frequency (T-F) representation of the target speech directly or indirectly, then resynthesize the waveform from the estimated T-F representation. In this work, we propose the temporal convolutional recurrent network (TCRN), an end-to-end model that directly maps a noisy waveform to a clean waveform. The TCRN, which combines convolutional and recurrent neural networks, is able to efficiently and effectively leverage both short-term and long-term information. Furthermore, we present an architecture that repeatedly downsamples and upsamples the speech signal during forward propagation. We show that this improves performance compared with existing convolutional recurrent networks. We also present several key techniques to stabilize the training process. Experimental results show that our model consistently outperforms existing speech enhancement approaches in terms of speech intelligibility and quality.
Tasks Speech Enhancement
Published 2020-02-02
URL https://arxiv.org/abs/2002.00319v1
PDF https://arxiv.org/pdf/2002.00319v1.pdf
PWC https://paperswithcode.com/paper/single-channel-speech-enhancement-using-1
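The repeated downsample/upsample pattern in the forward pass can be illustrated with a minimal sketch. This is plain NumPy with no learned layers: the stride-2 decimation and nearest-neighbour expansion are assumptions standing in for learned strided and transposed convolutions.

```python
import numpy as np

def downsample(x):
    # Stride-2 decimation, standing in for a learned strided convolution.
    return x[::2]

def upsample(x, target_len):
    # Nearest-neighbour expansion, standing in for a transposed convolution.
    return np.repeat(x, 2)[:target_len]

def encoder_decoder(waveform, depth=3):
    """Repeatedly halve the temporal resolution, then restore it, so the
    network sees both fine-grained and long-range context."""
    lengths = []
    x = waveform
    for _ in range(depth):
        lengths.append(len(x))
        x = downsample(x)
    # The recurrent layers of the TCRN would operate on the coarsest scale here.
    for n in reversed(lengths):
        x = upsample(x, n)
    return x
```

The key property illustrated is that the output waveform keeps the input length, as an end-to-end enhancement model requires.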

Region Proposal Network with Graph Prior and IoU-Balance Loss for Landmark Detection in 3D Ultrasound

Title Region Proposal Network with Graph Prior and IoU-Balance Loss for Landmark Detection in 3D Ultrasound
Authors Chaoyu Chen, Xin Yang, Ruobing Huang, Wenlong Shi, Shengfeng Liu, Mingrong Lin, Yuhao Huang, Yong Yang, Yuanji Zhang, Huanjia Luo, Yankai Huang, Yi Xiong, Dong Ni
Abstract 3D ultrasound (US) can facilitate detailed prenatal examinations for fetal growth monitoring. To analyze a 3D US volume, it is fundamental to identify the anatomical landmarks of the evaluated organs accurately. Typical deep learning methods usually regress the coordinates directly or involve heatmap matching. However, these methods struggle to deal with volumes of large size and the highly varying positions and orientations of fetuses. In this work, we exploit an object detection framework to detect landmarks in 3D fetal facial US volumes. By regressing multiple parameters of the landmark-centered bounding box (B-box) under strict criteria, the proposed model is able to pinpoint the exact location of the targeted landmarks. Specifically, the model uses a 3D region proposal network (RPN) to generate 3D candidate regions, followed by several 3D classification branches to select the best candidate. It also adopts an IoU-balance loss to improve communication between branches, which benefits the learning process. Furthermore, it leverages a distance-based graph prior to regularize training and help reduce false positive predictions. The performance of the proposed framework is evaluated on a 3D US dataset for detecting five key fetal facial landmarks. Results show that the proposed method outperforms some of the state-of-the-art methods in efficacy and efficiency.
Tasks Object Detection
Published 2020-04-01
URL https://arxiv.org/abs/2004.00207v1
PDF https://arxiv.org/pdf/2004.00207v1.pdf
PWC https://paperswithcode.com/paper/region-proposal-network-with-graph-prior-and
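A distance-based graph prior over landmarks can be sketched as a penalty on deviations of predicted pairwise distances from distances estimated on training data. The function names and the squared-error form are assumptions for illustration; the paper's actual regularizer may differ.

```python
import numpy as np

def pairwise_distances(points):
    # points: (K, 3) landmark centers in the volume.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def graph_prior_penalty(pred_landmarks, prior_dists):
    """Penalize deviation of predicted pairwise landmark distances from a
    prior distance graph, discouraging anatomically implausible (false
    positive) configurations."""
    d = pairwise_distances(pred_landmarks)
    return np.mean((d - prior_dists) ** 2)
```

The penalty is zero when the predicted configuration reproduces the prior distance graph and grows as landmarks drift apart from it.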

VQA-LOL: Visual Question Answering under the Lens of Logic

Title VQA-LOL: Visual Question Answering under the Lens of Logic
Authors Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
Abstract Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate visual question answering (VQA) through the lens of logical transformation and posit that systems that seek to answer questions about images must be robust to these transformations of the question. If a VQA system is able to answer a question, it should also be able to answer the logical composition of questions. We analyze the performance of state-of-the-art models on the VQA task under these logical operations and show that they have difficulty in correctly answering such questions. We then construct an augmentation of the VQA dataset with questions containing logical operations and retrain the same models to establish a baseline. We further propose a novel methodology to train models to learn negation, conjunction, and disjunction and show improvement in learning logical composition and retaining performance on VQA. We suggest this work as a move towards embedding logical connectives in visual understanding, along with the benefits of robustness and generalizability. Our code and dataset are available online at https://www.public.asu.edu/~tgokhale/vqa_lol.html
Tasks Question Answering, Visual Question Answering
Published 2020-02-19
URL https://arxiv.org/abs/2002.08325v1
PDF https://arxiv.org/pdf/2002.08325v1.pdf
PWC https://paperswithcode.com/paper/vqa-lol-visual-question-answering-under-the
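For closed (yes/no) questions, the ground-truth answer to a logical composition follows directly from the component answers, which is what makes this kind of dataset augmentation possible. A minimal sketch of such a rule is below; the question-joining template is an assumption, not the paper's exact augmentation procedure.

```python
def compose_answers(a1, a2, op):
    """Ground-truth answer for a logically composed yes/no question."""
    if op == "and":
        return a1 and a2
    if op == "or":
        return a1 or a2
    raise ValueError(f"unsupported operator: {op}")

def compose_questions(q1, q2, op):
    # Simple template join; a full augmentation would also handle
    # negation ("Is the man not wearing a hat?").
    return f"{q1.rstrip('?')} {op} {q2.rstrip('?').lower()}?"
```

A model robust under the lens of logic should answer the composed question consistently with `compose_answers` applied to its answers on the components.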

Quantum Computing Assisted Deep Learning for Fault Detection and Diagnosis in Industrial Process Systems

Title Quantum Computing Assisted Deep Learning for Fault Detection and Diagnosis in Industrial Process Systems
Authors Akshay Ajagekar, Fengqi You
Abstract Quantum computing (QC) and deep learning techniques have attracted widespread attention in recent years. This paper proposes QC-based deep learning methods for fault diagnosis that exploit their unique capabilities to overcome the computational challenges faced by conventional data-driven approaches performed on classical computers. Deep belief networks are integrated into the proposed fault diagnosis model and are used to extract features at different levels for normal and faulty process operations. The QC-based fault diagnosis model uses a quantum computing assisted generative training process followed by discriminative training to address the shortcomings of classical algorithms. To demonstrate its applicability and efficiency, the proposed fault diagnosis method is applied to process monitoring of a continuous stirred tank reactor (CSTR) and the Tennessee Eastman (TE) process. The proposed QC-based deep learning approach achieves superior fault detection and diagnosis performance, with average fault detection rates of 79.2% and 99.39% for the CSTR and TE process, respectively.
Tasks Fault Detection
Published 2020-02-29
URL https://arxiv.org/abs/2003.00264v1
PDF https://arxiv.org/pdf/2003.00264v1.pdf
PWC https://paperswithcode.com/paper/quantum-computing-assisted-deep-learning-for

Mid-flight Propeller Failure Detection and Control of Propeller-deficient Quadcopter using Reinforcement Learning

Title Mid-flight Propeller Failure Detection and Control of Propeller-deficient Quadcopter using Reinforcement Learning
Authors Rohitkumar Arasanipalai, Aakriti Agrawal, Debasish Ghose
Abstract Quadcopters can suffer the loss of propellers in mid-flight, which requires a system that detects single and multiple propeller failures and an adaptive controller that stabilizes the propeller-deficient quadcopter. This paper presents reinforcement learning based controllers for quadcopters with 4, 3, and 2 (opposing) functional propellers. The system is adaptive, unlike traditional control system based controllers. To develop an end-to-end system, the paper also proposes a novel neural network based propeller fault detection system to detect propeller loss and switch to the appropriate controller. Our simulation results demonstrate a stable quadcopter with efficient waypoint tracking for all controllers. The detection system is able to detect propeller failure within 2.5 seconds, and the controllers stabilize the quadcopter at all heights above 3 meters.
Tasks Fault Detection
Published 2020-02-26
URL https://arxiv.org/abs/2002.11564v1
PDF https://arxiv.org/pdf/2002.11564v1.pdf
PWC https://paperswithcode.com/paper/mid-flight-propeller-failure-detection-and
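The controller-switching logic implied by the abstract (dedicated controllers for 4, 3, and 2 opposing functional propellers) can be sketched as below. The controller names and the propeller indexing convention are hypothetical; the paper's fault detector is a neural network, not a rule.

```python
def select_controller(functional):
    """Pick a controller given which of the 4 propellers still work.
    Propellers are indexed 0-3; (0, 2) and (1, 3) are assumed to be the
    opposing pairs."""
    n = sum(functional)
    if n == 4:
        return "controller_4"
    if n == 3:
        return "controller_3"
    opposing = (functional[0] and functional[2]) or (functional[1] and functional[3])
    if n == 2 and opposing:
        return "controller_2_opposing"
    # Two adjacent propellers (or fewer than two) cannot hover a quadcopter.
    return "unrecoverable"
```

In the end-to-end system, the fault detection network would produce the `functional` mask and this switch would hand control to the matching learned policy.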

DeepMutation: A Neural Mutation Tool

Title DeepMutation: A Neural Mutation Tool
Authors Michele Tufano, Jason Kimko, Shiya Wang, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Denys Poshyvanyk
Abstract Mutation testing can be used to assess the fault-detection capabilities of a given test suite. To this aim, two characteristics of mutation testing frameworks are of paramount importance: (i) they should generate mutants that are representative of real faults; and (ii) they should provide a complete tool chain able to automatically generate, inject, and test the mutants. To address the first point, we recently proposed an approach using a Recurrent Neural Network Encoder-Decoder architecture to learn mutants from ~787k faults mined from real programs. The empirical evaluation of this approach confirmed its ability to generate mutants representative of real faults. In this paper, we address the second point, presenting DeepMutation, a tool wrapping our deep learning model into a fully automated tool chain able to generate, inject, and test mutants learned from real faults. Video: https://sites.google.com/view/learning-mutation/deepmutation
Tasks Fault Detection
Published 2020-02-12
URL https://arxiv.org/abs/2002.04760v2
PDF https://arxiv.org/pdf/2002.04760v2.pdf
PWC https://paperswithcode.com/paper/deepmutation-a-neural-mutation-tool

Learning to Zoom-in via Learning to Zoom-out: Real-world Super-resolution by Generating and Adapting Degradation

Title Learning to Zoom-in via Learning to Zoom-out: Real-world Super-resolution by Generating and Adapting Degradation
Authors Dong Gong, Wei Sun, Qinfeng Shi, Anton van den Hengel, Yanning Zhang
Abstract Most learning-based super-resolution (SR) methods aim to recover a high-resolution (HR) image from a given low-resolution (LR) image via learning on LR-HR image pairs. SR methods learned on synthetic data do not perform well in the real world, due to the domain gap between artificially synthesized and real LR images. Some efforts have thus been made to capture real-world image pairs. However, the captured LR-HR image pairs usually suffer from unavoidable misalignment, which hampers the performance of end-to-end learning. Here, focusing on real-world SR, we ask a different question: since misalignment is unavoidable, can we devise a method that needs no LR-HR image pairing and alignment at all and utilizes real images as they are? We hence propose a framework that learns SR from an arbitrary set of unpaired LR and HR images and see how far such a realistic and “unsupervised” setting can go. To do so, we first train a degradation generation network to generate realistic LR images and, more importantly, to capture their distribution (i.e., learning to zoom out). Instead of assuming the domain gap has been eliminated, we minimize the discrepancy between the generated data and real data while learning a degradation-adaptive SR network (i.e., learning to zoom in). The proposed unpaired method achieves state-of-the-art SR results on real-world images, even on datasets that favor paired-learning methods.
Tasks Super-Resolution
Published 2020-01-08
URL https://arxiv.org/abs/2001.02381v1
PDF https://arxiv.org/pdf/2001.02381v1.pdf
PWC https://paperswithcode.com/paper/learning-to-zoom-in-via-learning-to-zoom-out
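The "zoom out" step can be illustrated with a toy degradation that manufactures pseudo LR-HR pairs from unpaired HR data. In the paper this degradation is a learned generative network matching the real LR distribution; the fixed blur-decimate-noise chain below is purely an assumed stand-in.

```python
import numpy as np

def toy_degrade(hr, scale=4, rng=None):
    """Toy stand-in for the learned degradation generator ("zoom out"):
    blur, decimate, and add sensor-like noise to an HR signal (1D here)."""
    rng = rng or np.random.default_rng(0)
    kernel = np.ones(3) / 3.0
    blurred = np.convolve(hr, kernel, mode="same")
    lr = blurred[::scale]
    return lr + rng.normal(scale=0.01, size=lr.shape)

def make_pseudo_pairs(hr_images, scale=4):
    # Pseudo LR-HR pairs built from unpaired HR data; these would then
    # train the SR network ("zoom in") despite having no captured pairs.
    return [(toy_degrade(hr, scale), hr) for hr in hr_images]
```

The framework additionally minimizes the discrepancy between generated and real LR data rather than trusting the generator to close the domain gap by itself.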

Unsupervised feature learning for speech using correspondence and Siamese networks

Title Unsupervised feature learning for speech using correspondence and Siamese networks
Authors Petri-Johan Last, Herman A. Engelbrecht, Herman Kamper
Abstract In zero-resource settings where transcribed speech audio is unavailable, unsupervised feature learning is essential for downstream speech processing tasks. Here we compare two recent methods for frame-level acoustic feature learning. For both methods, unsupervised term discovery is used to find pairs of word examples of the same unknown type. Dynamic programming is then used to align the feature frames between each word pair, serving as weak top-down supervision for the two models. For the correspondence autoencoder (CAE), matching frames are presented as input-output pairs. The Triamese network uses a contrastive loss to reduce the distance between frames of the same predicted word type while increasing the distance between negative examples. For the first time, these feature extractors are compared on the same discrimination tasks using the same weak supervision pairs. We find that, on the two datasets considered here, the CAE outperforms the Triamese network. However, we show that a new hybrid correspondence-Triamese approach (CTriamese) consistently outperforms both the CAE and Triamese models in terms of average precision and ABX error rates on both English and Xitsonga evaluation data.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12799v1
PDF https://arxiv.org/pdf/2003.12799v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-feature-learning-for-speech
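The contrastive objective described for the Triamese network can be sketched as a triplet-style margin loss on frame embeddings. The margin value and the exact triplet formulation are assumptions; the paper's loss may be parameterized differently.

```python
import numpy as np

def triplet_contrastive_loss(anchor, positive, negative, margin=1.0):
    """Pull frames of the same predicted word type together and push
    negative examples at least `margin` further away than positives."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is sufficiently farther from the anchor than the positive, so only violating triplets contribute gradient.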

Proposal of a Takagi-Sugeno Fuzzy-PI Controller Hardware

Title Proposal of a Takagi-Sugeno Fuzzy-PI Controller Hardware
Authors Sérgio N. Silva, Felipe F. Lopes, Carlos Valderrama, Marcelo A. C. Fernandes
Abstract This work proposes dedicated hardware for an intelligent control system on a Field Programmable Gate Array (FPGA). The intelligent system is represented as a Takagi-Sugeno Fuzzy-PI controller. The implementation uses a fully parallel strategy associated with a hybrid bit-format scheme (part fixed-point and part floating-point). Two hardware designs are proposed; the first uses a single-clock-cycle processing architecture, and the other uses a pipeline scheme. The bit accuracy was tested by simulation with a nonlinear control system of a robotic manipulator. The area, throughput, and dynamic power consumption of the implemented hardware are used to validate and compare the results of this proposal. The results show that the proposed hardware can be used in applications with high-throughput, low-power, and ultra-low-latency requirements, such as teleoperation of robot manipulators, the tactile internet, industrial automation in Industry 4.0, and others.
Published 2020-03-12
URL https://arxiv.org/abs/2003.06420v1
PDF https://arxiv.org/pdf/2003.06420v1.pdf
PWC https://paperswithcode.com/paper/proposal-of-a-takagi-sugeno-fuzzy-pi
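The control law itself (software reference, not the FPGA design) can be sketched as a two-rule Takagi-Sugeno fuzzy-PI controller: fuzzy memberships over the error blend local PI consequents. The membership shapes and gain values below are illustrative assumptions, not the paper's tuned parameters.

```python
def ts_fuzzy_pi(error, integral, e_max=1.0):
    """Two-rule Takagi-Sugeno fuzzy-PI sketch: triangular memberships over
    |error| blend two local PI laws (weighted-average defuzzification)."""
    mu_small = max(0.0, 1.0 - abs(error) / e_max)  # "error is small"
    mu_large = 1.0 - mu_small                       # "error is large"
    u_small = 2.0 * error + 0.5 * integral          # gentle PI near the setpoint
    u_large = 4.0 * error + 1.0 * integral          # aggressive PI far from it
    return (mu_small * u_small + mu_large * u_large) / (mu_small + mu_large)
```

The fully parallel FPGA realization would evaluate both rules and the weighted average in combinational logic, which is what enables single-clock-cycle operation.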

Robust Perceptual Night Vision in Thermal Colorization

Title Robust Perceptual Night Vision in Thermal Colorization
Authors Feras Almasri, Olivier Debeir
Abstract Transforming a thermal infrared image into a robust perceptual colour Visible image is an ill-posed problem due to the differences in their spectral domains and in the objects’ representations. Objects appear in one spectrum but not necessarily in the other, and the thermal signature of a single object may have different colours in its Visible representation. This makes a direct mapping from thermal to Visible images impossible and necessitates a solution that preserves texture captured in the thermal spectrum while predicting the possible colour for certain objects. In this work, a deep learning method to map the thermal signature from the thermal image’s spectrum to a Visible representation in their low-frequency space is proposed. A pan-sharpening method is then used to merge the predicted low-frequency representation with the high-frequency representation extracted from the thermal image. The proposed model generates colour values consistent with the Visible ground truth when the object does not vary much in its appearance and generates averaged grey values in other cases. The proposed method produces robust perceptual night-vision images that preserve the object’s appearance and image context, compared with the existing state-of-the-art.
Tasks Colorization
Published 2020-03-04
URL https://arxiv.org/abs/2003.02204v1
PDF https://arxiv.org/pdf/2003.02204v1.pdf
PWC https://paperswithcode.com/paper/robust-perceptual-night-vision-in-thermal
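The merge step can be illustrated with a minimal 1D pan-sharpening sketch: keep the low-frequency content from the predicted (colour) signal and re-inject the high-frequency detail extracted from the thermal signal. The moving-average filter is an assumed low-pass; real pan-sharpening operates on 2D images, typically per channel.

```python
import numpy as np

def smooth(x, k=5):
    # Moving-average low-pass filter (toy stand-in for the paper's filter).
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def pan_sharpen(predicted_low_freq, thermal):
    """Merge the predicted low-frequency (colour) content with the
    high-frequency detail extracted from the thermal image."""
    thermal_high = thermal - smooth(thermal)
    return predicted_low_freq + thermal_high
```

When the predicted low frequencies match those of the thermal input, the output reduces to the thermal signal itself, confirming that all fine texture comes from the thermal spectrum.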

Capturing and Explaining Trajectory Singularities using Composite Signal Neural Networks

Title Capturing and Explaining Trajectory Singularities using Composite Signal Neural Networks
Authors Hippolyte Dubois, Patrick Le Callet, Antoine Coutrot
Abstract Spatial trajectories are ubiquitous and complex signals. Their analysis is crucial in many research fields, from urban planning to neuroscience. Several approaches have been proposed to cluster trajectories. They rely either on hand-crafted features, which struggle to capture the spatio-temporal complexity of the signal, or on Artificial Neural Networks (ANNs), which can be more efficient but less interpretable. In this paper we present a novel ANN architecture designed to capture the spatio-temporal patterns characteristic of a set of trajectories, while taking into account the demographics of the navigators. Hence, our model extracts markers linked to both behaviour and demographics. We propose a composite signal analyser (CompSNN) combining three simple ANN modules. Each of these modules uses a different signal representation of the trajectory while remaining interpretable. Our CompSNN performs significantly better than its modules taken in isolation and allows us to visualise which parts of the signal were most useful for discriminating the trajectories.
Published 2020-03-24
URL https://arxiv.org/abs/2003.10810v1
PDF https://arxiv.org/pdf/2003.10810v1.pdf
PWC https://paperswithcode.com/paper/capturing-and-explaining-trajectory

Automatic Discovery of Political Meme Genres with Diverse Appearances

Title Automatic Discovery of Political Meme Genres with Diverse Appearances
Authors William Theisen, Joel Brogan, Pamela Bilo Thomas, Daniel Moreira, Pascal Phoa, Tim Weninger, Walter Scheirer
Abstract Forms of human communication are not static — we expect some evolution in the way information is conveyed over time because of advances in technology. One example of this phenomenon is the image-based meme, which has emerged as a dominant form of political messaging in the past decade. While originally used to spread jokes on social media, memes are now having an outsized impact on public perception of world events. A significant challenge in automatic meme analysis has been the development of a strategy to match memes from within a single genre when the appearances of the images vary. Such variation is especially common in memes exhibiting mimicry, for example when voters perform a common hand gesture to signal their support for a candidate. In this paper we introduce a scalable automated visual recognition pipeline for discovering political meme genres of diverse appearance. This pipeline can ingest meme images from a social network, apply computer vision-based techniques to extract local features and index new images into a database, and then organize the memes into related genres. To validate this approach, we perform a large case study on the 2019 Indonesian Presidential Election using a new dataset of over two million images collected from Twitter and Instagram. Results show that this approach can discover new meme genres with visually diverse images that share common stylistic elements, paving the way forward for further work in semantic analysis and content attribution.
Published 2020-01-17
URL https://arxiv.org/abs/2001.06122v1
PDF https://arxiv.org/pdf/2001.06122v1.pdf
PWC https://paperswithcode.com/paper/automatic-discovery-of-political-meme-genres

Covariance-Robust Dynamic Watermarking

Title Covariance-Robust Dynamic Watermarking
Authors Matt Olfat, Stephen Sloan, Pedro Hespanhol, Matt Porter, Ram Vasudevan, Anil Aswani
Abstract Attack detection and mitigation strategies for cyberphysical systems (CPS) are an active area of research, and researchers have developed a variety of attack-detection tools such as dynamic watermarking. However, such methods often make assumptions that are difficult to guarantee, such as exact knowledge of the distribution of measurement noise. Here, we develop a new dynamic watermarking method that we call covariance-robust dynamic watermarking, which is able to handle uncertainties in the covariance of measurement noise. Specifically, we consider two cases. In the first this covariance is fixed but unknown, and in the second this covariance is slowly-varying. For our tests, we only require knowledge of a set within which the covariance lies. Furthermore, we connect this problem to that of algorithmic fairness and the nascent field of fair hypothesis testing, and we show that our tests satisfy some notions of fairness. Finally, we exhibit the efficacy of our tests on empirical examples chosen to reflect values observed in a standard simulation model of autonomous vehicles.
Tasks Autonomous Vehicles
Published 2020-03-31
URL https://arxiv.org/abs/2003.13908v1
PDF https://arxiv.org/pdf/2003.13908v1.pdf
PWC https://paperswithcode.com/paper/covariance-robust-dynamic-watermarking
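The basic dynamic-watermarking idea behind the paper can be sketched as a correlation check: the controller injects a private random signal and verifies that measurements still carry it. The correlation statistic and threshold below are illustrative assumptions; the paper's covariance-robust tests are more involved and handle unknown or slowly varying noise covariance.

```python
import numpy as np

def watermark_attack_test(residuals, watermark, threshold=0.3):
    """Correlate measurement residuals with the injected private watermark;
    loss of correlation suggests a replay/spoofing attack. Returns True
    when an attack is flagged."""
    c = np.corrcoef(residuals, watermark)[0, 1]
    return bool(abs(c) < threshold)
```

An honest plant echoes the watermark (high correlation); an attacker who replays or fabricates measurements cannot reproduce the private signal, so correlation collapses.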

Grid Cells Are Ubiquitous in Neural Networks

Title Grid Cells Are Ubiquitous in Neural Networks
Authors Li Songlin, Deng Yangdong, Wang Zhihua
Abstract Grid cells are believed to play an important role in both spatial and non-spatial cognition tasks. A recent study observed the emergence of grid cells in an LSTM trained for path integration. The connection between biological and artificial neural networks underlying this seeming similarity, as well as the application domain of grid cells in deep neural networks (DNNs), merits further exploration. This work demonstrates that grid cells can be replicated in either purely vision-based or vision-guided path-integration DNNs for navigation under a proper setting of training parameters. We also show that grid-like behaviors arise in feedforward DNNs for non-spatial tasks. Our findings support the view that grid coding is an effective representation for both biological and artificial networks.
Published 2020-03-07
URL https://arxiv.org/abs/2003.03482v1
PDF https://arxiv.org/pdf/2003.03482v1.pdf
PWC https://paperswithcode.com/paper/grid-cells-are-ubiquitous-in-neural-networks