January 30, 2020

3349 words 16 mins read

Paper Group ANR 454

Meta Reasoning over Knowledge Graphs

Title Meta Reasoning over Knowledge Graphs
Authors Hong Wang, Wenhan Xiong, Mo Yu, Xiaoxiao Guo, Shiyu Chang, William Yang Wang
Abstract The ability to reason over learned knowledge is innate to humans, who can easily master new reasoning rules with only a few demonstrations. While most existing studies on knowledge graph (KG) reasoning assume enough training examples, we study the challenging and practical problem of few-shot knowledge graph reasoning under the paradigm of meta-learning. We propose a new meta-learning framework that effectively utilizes task-specific meta information such as local graph neighbors and reasoning paths in KGs. Specifically, we design a meta-encoder that encodes the meta information into task-specific initialization parameters for different tasks. This allows our reasoning module to have diverse starting points when learning to reason over different relations, which is expected to better fit the target task. On two few-shot knowledge base completion benchmarks, we show that the augmented task-specific meta-encoder yields a much better initial point than MAML and outperforms several few-shot learning baselines.
Tasks Few-Shot Learning, Knowledge Base Completion, Knowledge Graphs, Meta-Learning
Published 2019-08-13
URL https://arxiv.org/abs/1908.04877v1
PDF https://arxiv.org/pdf/1908.04877v1.pdf
PWC https://paperswithcode.com/paper/meta-reasoning-over-knowledge-graphs
Repo
Framework
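
To make the meta-encoder idea concrete, here is a minimal sketch: pooled embeddings of a relation's local graph neighbors are mapped to a task-specific initial state for the reasoning module, so each relation starts from its own initialization rather than a single shared one as in MAML. All names, dimensions, and the pooling choice are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MetaEncoder(nn.Module):
    """Hypothetical sketch: encode a relation's local-graph evidence into
    task-specific initialization parameters for a reasoning module."""
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        # Map pooled neighbor/path embeddings to an initial hidden state.
        self.to_init = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.Tanh())

    def forward(self, neighbor_embs):
        # neighbor_embs: (num_neighbors, emb_dim) for one few-shot task.
        pooled = neighbor_embs.mean(dim=0)   # permutation-invariant pooling
        return self.to_init(pooled)          # task-specific starting point

# Instead of one shared initialization for all tasks (as in MAML), each
# relation gets its own starting point for the downstream reasoner.
meta_enc = MetaEncoder(emb_dim=100, hidden_dim=200)
task_neighbors = torch.randn(12, 100)        # stand-in neighbor embeddings
h0 = meta_enc(task_neighbors)                # e.g., init state of an RNN reasoner
```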

Convolutional dictionary learning based auto-encoders for natural exponential-family distributions

Title Convolutional dictionary learning based auto-encoders for natural exponential-family distributions
Authors Bahareh Tolooshams, Andrew H. Song, Simona Temereanca, Demba Ba
Abstract We introduce a class of auto-encoder neural networks tailored to data from the natural exponential family (e.g., count data). The architectures are inspired by the problem of learning the filters in a convolutional generative model with sparsity constraints, often referred to as convolutional dictionary learning (CDL). Our work is the first to combine ideas from convolutional generative models and deep learning for data that are naturally modeled with a non-Gaussian distribution (e.g., binomial and Poisson). This perspective provides us with a scalable and flexible framework that can be re-purposed for a wide range of tasks and assumptions on the generative model. Specifically, the iterative optimization procedure for solving CDL, an unsupervised task, is mapped to an unfolded and constrained neural network, with iterative adjustments to the inputs to account for the generative distribution. We also show that the framework can easily be extended for discriminative training, appropriate for a supervised task. We demonstrate 1) that fitting the generative model to learn, in an unsupervised fashion, the latent stimulus that underlies neural spiking data leads to better goodness-of-fit compared to other baselines, 2) competitive performance compared to state-of-the-art algorithms for supervised Poisson image denoising, with significantly fewer parameters, and 3) an analysis of the gradient dynamics of a shallow binomial auto-encoder.
Tasks Denoising, Dictionary Learning, Image Denoising
Published 2019-07-07
URL https://arxiv.org/abs/1907.03211v3
PDF https://arxiv.org/pdf/1907.03211v3.pdf
PWC https://paperswithcode.com/paper/deep-exponential-family-auto-encoders
Repo
Framework
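
The mapping from CDL optimization to an unfolded network can be sketched as an unrolled ISTA loop whose encoder and decoder share one convolutional dictionary. The sketch below assumes Gaussian-like residuals and omits the paper's iterative input adjustments for exponential-family (e.g., Poisson) data; all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledCDL(nn.Module):
    """Unfolded ISTA-style auto-encoder for convolutional dictionary
    learning; encoder and decoder share the dictionary D (a sketch that
    omits the paper's exponential-family input adjustments)."""
    def __init__(self, num_filters=16, kernel=7, iters=10, alpha=0.1, lam=0.05):
        super().__init__()
        self.D = nn.Parameter(0.1 * torch.randn(num_filters, 1, kernel, kernel))
        self.iters, self.alpha, self.lam = iters, alpha, lam

    def forward(self, x):
        pad = self.D.shape[-1] // 2
        z = x.new_zeros(x.shape[0], self.D.shape[0], x.shape[2], x.shape[3])
        for _ in range(self.iters):                        # unrolled sparse coding
            resid = x - F.conv_transpose2d(z, self.D, padding=pad)
            z = F.softshrink(z + self.alpha * F.conv2d(resid, self.D, padding=pad),
                             self.lam)                     # sparsity via soft-threshold
        return F.conv_transpose2d(z, self.D, padding=pad)  # reconstruction
```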

Robust and interpretable blind image denoising via bias-free convolutional neural networks

Title Robust and interpretable blind image denoising via bias-free convolutional neural networks
Authors Sreyas Mohan, Zahra Kadkhodaie, Eero P. Simoncelli, Carlos Fernandez-Granda
Abstract Deep convolutional networks often append additive constant (“bias”) terms to their convolution operations, enabling a richer repertoire of functional mappings. Biases are also used to facilitate training, by subtracting mean response over batches of training images (a component of “batch normalization”). Recent state-of-the-art blind denoising methods (e.g., DnCNN) seem to require these terms for their success. Here, however, we show that these networks systematically overfit the noise levels for which they are trained: when deployed at noise levels outside the training range, performance degrades dramatically. In contrast, a bias-free architecture – obtained by removing the constant terms in every layer of the network, including those used for batch normalization – generalizes robustly across noise levels, while preserving state-of-the-art performance within the training range. Locally, the bias-free network acts linearly on the noisy image, enabling direct analysis of network behavior via standard linear-algebraic tools. These analyses provide interpretations of network functionality in terms of nonlinear adaptive filtering, and projection onto a union of low-dimensional subspaces, connecting the learning-based method to more traditional denoising methodology.
Tasks Denoising, Image Denoising
Published 2019-06-13
URL https://arxiv.org/abs/1906.05478v3
PDF https://arxiv.org/pdf/1906.05478v3.pdf
PWC https://paperswithcode.com/paper/robust-and-interpretable-blind-image
Repo
Framework
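
The architectural change is simple to express in code: drop every additive constant, including the batch-norm shift. A minimal sketch follows, assuming a DnCNN-like stack; the `BFBatchNorm2d` variant (rescaling by the standard deviation only) is one reading of the abstract, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class BFBatchNorm2d(nn.BatchNorm2d):
    """'Bias-free' batch norm sketch: rescale by the standard deviation only,
    with no mean subtraction and no additive beta term."""
    def forward(self, x):
        var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        return x / torch.sqrt(var + self.eps) * self.weight.view(1, -1, 1, 1)

def bias_free_dncnn(depth=7, width=64):
    """DnCNN-like stack with every additive constant removed (bias=False)."""
    layers = [nn.Conv2d(1, width, 3, padding=1, bias=False), nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(width, width, 3, padding=1, bias=False),
                   BFBatchNorm2d(width), nn.ReLU(inplace=True)]
    layers += [nn.Conv2d(width, 1, 3, padding=1, bias=False)]
    # Every layer is now positively homogeneous, so the network acts locally
    # linearly on its input, which is what enables the paper's analysis.
    return nn.Sequential(*layers)
```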

An Empirical Study on Leveraging Scene Graphs for Visual Question Answering

Title An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Authors Cheng Zhang, Wei-Lun Chao, Dong Xuan
Abstract Visual question answering (Visual QA) has attracted significant attention in recent years. While a variety of algorithms have been proposed, most of them are built upon different combinations of image and language features as well as multi-modal attention and fusion. In this paper, we investigate an alternative approach inspired by conventional QA systems that operate on knowledge graphs. Specifically, we investigate the use of scene graphs derived from images for Visual QA: an image is abstractly represented by a graph with nodes corresponding to object entities and edges to object relationships. We adapt the recently proposed graph network (GN) to encode the scene graph and perform structured reasoning according to the input question. Our empirical studies demonstrate that scene graphs can already capture essential information of images, and that graph networks have the potential to outperform state-of-the-art Visual QA algorithms with a much cleaner architecture. By analyzing the features generated by GNs, we can further interpret the reasoning process, suggesting a promising direction towards explainable Visual QA.
Tasks Knowledge Graphs, Question Answering, Visual Question Answering
Published 2019-07-28
URL https://arxiv.org/abs/1907.12133v1
PDF https://arxiv.org/pdf/1907.12133v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-leveraging-scene-graphs
Repo
Framework
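
A graph network over a scene graph can be sketched as question-conditioned message passing: each edge sends a message computed from the sender node, the edge feature, and the question embedding, and nodes are updated recurrently. The block below is a simplified stand-in for the paper's GN, with illustrative shapes and readout.

```python
import torch
import torch.nn as nn

class SceneGraphGN(nn.Module):
    """Minimal question-conditioned message-passing sketch (assumption:
    a simplified graph network, not the paper's exact GN blocks)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)   # sender node, edge, question
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, nodes, edges, edge_feats, q, steps=3):
        # nodes: (N, dim), edges: (E, 2) index pairs, edge_feats: (E, dim), q: (dim,)
        for _ in range(steps):
            src, dst = edges[:, 0], edges[:, 1]
            m = torch.relu(self.msg(torch.cat(
                [nodes[src], edge_feats, q.expand(len(src), -1)], dim=-1)))
            agg = torch.zeros_like(nodes).index_add_(0, dst, m)  # sum messages per node
            nodes = self.upd(agg, nodes)
        return nodes.max(dim=0).values       # graph readout for an answer head
```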

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Title Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems
Authors Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin R. B. Butler, Joseph Wilson
Abstract Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (black-box) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., the Google Speech API, Bing Speech API, IBM Speech API, and Azure Speaker API), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
Tasks Speaker Recognition
Published 2019-03-18
URL http://arxiv.org/abs/1904.05734v1
PDF http://arxiv.org/pdf/1904.05734v1.pdf
PWC https://paperswithcode.com/paper/practical-hidden-voice-attacks-against-speech
Repo
Framework
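
The key observation, that feature extraction discards information so many different audio signals collapse to similar feature vectors, can be illustrated with a toy phase-scrambling transform: magnitude-spectrum-based features are unchanged while the waveform becomes unintelligible. This mimics the spirit of the attack only; the paper's four perturbation classes are different and operate on real speech.

```python
import numpy as np

def phase_scramble(audio, seed=0):
    """Keep the FFT magnitude spectrum but randomize the phase, making the
    waveform unintelligible while magnitude-based features are unchanged."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(audio)
    magnitude = np.abs(spectrum)
    phase = rng.uniform(0.0, 2.0 * np.pi, magnitude.shape)
    phase[0] = phase[-1] = 0.0           # keep DC and Nyquist bins real
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=len(audio))

# Magnitude spectra (and hence magnitude-based features) match closely:
x = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))   # a 440 Hz tone
x_adv = phase_scramble(x)
assert np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(x_adv)))
```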

Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples

Title Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples
Authors Dufan Wu, Kuang Gong, Kyungsang Kim, Quanzheng Li
Abstract Deep neural networks have proven effective for medical image denoising. Current training methods require both noisy and clean images. However, clean images cannot be acquired for many practical medical applications due to naturally noisy signals, such as dynamic imaging, spectral computed tomography, arterial spin labeling magnetic resonance imaging, etc. In this paper we propose a training method that learns denoising neural networks from noisy training samples only. Training data in the acquisition domain is split into two subsets and the network is trained to map one noisy set to the other. A consensus loss function is further proposed to efficiently combine the outputs from both subsets. We provide a mathematical proof that the proposed training scheme is equivalent to training with noisy and clean samples when the noise in the two subsets is uncorrelated and zero-mean. The method was validated on the Low-dose CT Challenge dataset and an NYU MRI dataset, and achieved improved performance compared to existing unsupervised methods.
Tasks Denoising, Image Denoising
Published 2019-06-09
URL https://arxiv.org/abs/1906.03639v1
PDF https://arxiv.org/pdf/1906.03639v1.pdf
PWC https://paperswithcode.com/paper/consensus-neural-network-for-medical-imaging
Repo
Framework
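
The training scheme can be sketched as a Noise2Noise-style objective with an added agreement term: each noisy split supervises the other, and the two outputs are pulled together. The exact weighting and form of the paper's consensus loss is an assumption here; `lam` and the MSE choices are illustrative.

```python
import torch.nn.functional as F

def consensus_loss(model, noisy_a, noisy_b, lam=1.0):
    """Two noisy splits of the same acquisition supervise each other
    (Noise2Noise-style), and a consensus term pulls the outputs together;
    valid when the noise in the splits is uncorrelated and zero-mean."""
    out_a, out_b = model(noisy_a), model(noisy_b)
    cross = F.mse_loss(out_a, noisy_b) + F.mse_loss(out_b, noisy_a)  # noisy targets
    consensus = F.mse_loss(out_a, out_b)   # both splits share the same clean signal
    return cross + lam * consensus
```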

Learning Deep Image Priors for Blind Image Denoising

Title Learning Deep Image Priors for Blind Image Denoising
Authors Xianxu Hou, Hongming Luo, Jingxin Liu, Bolei Xu, Ke Sun, Yuanhao Gong, Bozhi Liu, Guoping Qiu
Abstract Image denoising is the process of removing noise from noisy images, which is an image-domain transfer task, i.e., from a single or several noise-level domains to a photo-realistic domain. In this paper, we propose an effective image denoising method by learning two image priors from the perspective of domain alignment. We tackle the domain alignment on two levels. 1) The feature-level prior learns domain-invariant features for corrupted images with different noise levels; 2) the pixel-level prior is used to push the denoised images to the natural image manifold. The two image priors are based on $\mathcal{H}$-divergence theory and implemented by learning classifiers in adversarial training manners. We evaluate our approach on multiple datasets. The results demonstrate the effectiveness of our approach for robust image denoising on both synthetic and real-world noisy images. Furthermore, we show that the feature-level prior is capable of alleviating the discrepancy between different noise levels. It can be used to improve the blind denoising performance in terms of distortion measures (PSNR and SSIM), while the pixel-level prior can effectively improve the perceptual quality to ensure realistic outputs, which is further validated by subjective evaluation.
Tasks Denoising, Image Denoising
Published 2019-06-04
URL https://arxiv.org/abs/1906.01259v1
PDF https://arxiv.org/pdf/1906.01259v1.pdf
PWC https://paperswithcode.com/paper/learning-deep-image-priors-for-blind-image
Repo
Framework
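
The two priors can be sketched as standard adversarial losses: a domain classifier that the denoiser tries to confuse (feature level) and a realism discriminator that the denoiser tries to fool (pixel level). The paper derives its priors from $\mathcal{H}$-divergence theory; the cross-entropy forms below are a common GAN-style approximation, and all module names are hypothetical.

```python
import torch
import torch.nn.functional as F

def denoiser_prior_losses(domain_clf, pixel_disc, feats, denoised):
    """Sketch of the two adversarial priors. domain_clf predicts which
    noise-level domain `feats` came from; pixel_disc predicts whether an
    image looks natural. Both outputs are assumed to be (B, 1) logits."""
    # Feature-level prior: push features toward domain-invariance by making
    # the domain classifier maximally uncertain (uniform over two domains).
    d_logit = domain_clf(feats)
    feat_prior = F.binary_cross_entropy_with_logits(
        d_logit, torch.full_like(d_logit, 0.5))
    # Pixel-level prior: push denoised outputs toward the natural-image
    # manifold by fooling the realism discriminator.
    p_logit = pixel_disc(denoised)
    pixel_prior = F.binary_cross_entropy_with_logits(
        p_logit, torch.ones_like(p_logit))
    return feat_prior, pixel_prior
```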

Using natural language processing to extract health-related causality from Twitter messages

Title Using natural language processing to extract health-related causality from Twitter messages
Authors Son Doan, Elly W Yang, Sameer Tilak, Manabu Torii
Abstract Twitter messages (tweets) contain various types of information, which include health-related information. Analysis of health-related tweets would help us understand health conditions and concerns encountered in our daily life. In this work, we evaluated an approach to extracting causal relations from tweets using natural language processing (NLP) techniques. We focused on three health-related topics: “stress”, “insomnia”, and “headache”. We proposed a set of lexico-syntactic patterns based on dependency parser outputs to extract causal information. A large dataset consisting of 24 million tweets was used. The results show that our approach achieved an average precision between 74.59% and 92.27%. Analysis of the extracted relations revealed interesting findings about health-related causality in Twitter messages.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.06488v1
PDF https://arxiv.org/pdf/1911.06488v1.pdf
PWC https://paperswithcode.com/paper/using-natural-language-processing-to-extract-2
Repo
Framework
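
A lexico-syntactic pattern over a dependency parse can be sketched in a few lines with spaCy: find a causal verb and pair its subject with its object. The paper's pattern set is richer and topic-specific; the single "cause" pattern below is only illustrative.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_causal_pairs(text):
    """Toy lexico-syntactic pattern over a dependency parse, in the spirit
    of the paper's approach (the actual pattern set is richer)."""
    pairs = []
    for token in nlp(text):
        if token.lemma_ == "cause" and token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    pairs.append((s.text, o.text))   # (cause, effect)
    return pairs

print(extract_causal_pairs("Stress causes insomnia and headaches."))
# e.g. [('Stress', 'insomnia')]
```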

Flow Guided Short-term Trackers with Cascade Detection for Long-term Tracking

Title Flow Guided Short-term Trackers with Cascade Detection for Long-term Tracking
Authors Han Wu, Xueyuan Yang, Yong Yang, Guizhong Liu
Abstract Object tracking has been studied for decades, but most existing works focus on short-term tracking. In a long sequence, the object is often fully occluded or out of view for a long time; existing short-term tracking algorithms tend to lose the target and find it difficult to re-capture the target even when it reappears. In this paper a novel long-term object tracking algorithm, flow_MDNet_RPN, is proposed, in which a tracking-result judgement module and a detection module are added to a short-term object tracking algorithm. Experiments show that the proposed long-term tracking algorithm effectively handles target disappearance.
Tasks Object Tracking
Published 2019-09-01
URL https://arxiv.org/abs/1909.00319v1
PDF https://arxiv.org/pdf/1909.00319v1.pdf
PWC https://paperswithcode.com/paper/flow-guided-short-term-trackers-with-cascade
Repo
Framework
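
The control flow of a judgement-plus-redetection long-term tracker can be sketched as follows; the module interfaces, the confidence threshold, and the decision rule are illustrative assumptions, since the abstract does not reduce flow_MDNet_RPN to this simple loop.

```python
def long_term_track(frames, short_term_tracker, detector, conf_threshold=0.5):
    """Control-flow sketch of the judgement + re-detection idea
    (names and threshold are illustrative, not the paper's)."""
    box, results = None, []
    for frame in frames:
        if box is not None:
            box, confidence = short_term_tracker.update(frame, box)
        else:
            confidence = 0.0
        # Judgement module: decide whether the short-term result is reliable.
        if box is None or confidence < conf_threshold:
            # Detection module: search the whole frame to re-capture the
            # target after occlusion or out-of-view periods.
            box = detector.detect(frame)
        results.append(box)
    return results
```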

Parallel Black-Box Complexity with Tail Bounds

Title Parallel Black-Box Complexity with Tail Bounds
Authors Per Kristian Lehre, Dirk Sudholt
Abstract We propose a new black-box complexity model for search algorithms evaluating $\lambda$ search points in parallel. The parallel unary unbiased black-box complexity gives lower bounds on the number of function evaluations every parallel unary unbiased black-box algorithm needs to optimise a given problem. It captures the inertia caused by offspring populations in evolutionary algorithms and the total computational effort in parallel metaheuristics. We present complexity results for LeadingOnes and OneMax. Our main result is a general performance limit: we prove that on every function every $\lambda$-parallel unary unbiased algorithm needs at least $\Omega(\frac{\lambda n}{\ln \lambda} + n \log n)$ evaluations to find any desired target set of up to exponential size, with an overwhelming probability. This yields lower bounds for the typical optimisation time on unimodal and multimodal problems, for the time to find any local optimum, and for the time to even get close to any optimum. The power and versatility of this approach is shown for a wide range of illustrative problems from combinatorial optimisation. Our performance limits can guide parameter choice and algorithm design; we demonstrate the latter by presenting an optimal $\lambda$-parallel algorithm for OneMax that uses parallelism most effectively.
Tasks
Published 2019-01-31
URL http://arxiv.org/abs/1902.00107v1
PDF http://arxiv.org/pdf/1902.00107v1.pdf
PWC https://paperswithcode.com/paper/parallel-black-box-complexity-with-tail
Repo
Framework
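
The main lower bound has two regimes worth making explicit; the balance point below is a back-of-the-envelope reading of the bound, not a statement from the paper.

```latex
% Lower bound on the number of evaluations for every lambda-parallel unary
% unbiased black-box algorithm on every function of problem size n:
\[
  T(\lambda, n) = \Omega\!\left(\frac{\lambda n}{\ln \lambda} + n \log n\right)
\]
% The two terms balance roughly when \lambda / \ln \lambda \approx \log n.
% Below that, the n log n term dominates and extra parallelism is essentially
% free; above it, the \lambda n / \ln \lambda term dominates, so total work
% grows almost linearly in \lambda and speed-ups become inefficient.
```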

End-to-End Pixel-Based Deep Active Inference for Body Perception and Action

Title End-to-End Pixel-Based Deep Active Inference for Body Perception and Action
Authors Cansu Sancaktar, Marcel van Gerven, Pablo Lanillos
Abstract We present a pixel-based deep active inference algorithm (PixelAI) inspired by human body perception and action. Our algorithm combines the free-energy principle from neuroscience, rooted in variational inference, with deep convolutional decoders to scale the algorithm to directly deal with raw visual input and provide online adaptive inference. Our approach is validated by studying body perception and action in a simulated and a real Nao robot. Results show that our approach allows the robot to perform 1) dynamical body estimation of its arm using only monocular camera images and 2) autonomous reaching to “imagined” arm poses in the visual space. This suggests that robot and human body perception and action can be efficiently solved by viewing both as an active inference problem guided by ongoing sensory input.
Tasks
Published 2019-12-28
URL https://arxiv.org/abs/2001.05847v2
PDF https://arxiv.org/pdf/2001.05847v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-pixel-based-deep-active-inference
Repo
Framework
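
Perception in this framework amounts to gradient descent on a variational free energy: the belief about the body state is refined so that the decoder's predicted image matches the observation. The sketch below keeps only the visual prediction-error term (the paper also handles action and uses convolutional decoders); the linear stand-in decoder and step size are illustrative.

```python
import torch

def active_inference_step(mu, observation, decoder, lr=0.1):
    """One perception step: update the belief mu about the body state by
    descending the visual prediction error (a simplified free-energy
    gradient; action and precision terms are omitted)."""
    mu = mu.clone().requires_grad_(True)
    prediction = decoder(mu)                    # generative model: state -> image
    error = 0.5 * ((observation - prediction) ** 2).sum()   # Gaussian term
    error.backward()
    with torch.no_grad():
        return mu - lr * mu.grad                # gradient descent on free energy

# Usage sketch with a stand-in decoder and a random "observation":
decoder = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Sigmoid())
mu, obs = torch.zeros(4), torch.rand(64)
for _ in range(50):
    mu = active_inference_step(mu, obs, decoder)
```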

Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics

Title Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics
Authors Hongmei Wang, Zhenzhen Wu, Shuai Ma, Songtao Lu, Han Zhang, Guoru Ding, Shiyin Li
Abstract In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems. Specifically, we propose a flexible communication prototype platform for measuring a real modulation dataset. Then, based on the measured dataset, two DL-based demodulators, called the deep belief network (DBN)-support vector machine (SVM) demodulator and the adaptive boosting (AdaBoost) based demodulator, are proposed. The proposed DBN-SVM based demodulator exploits the advantages of both DBN and SVM: DBN serves as a feature extractor and SVM as a feature classifier. In the DBN-SVM based demodulator, the received signals are normalized before being fed to the DBN network. Furthermore, an AdaBoost based demodulator is developed, which employs the $k$-Nearest Neighbor (KNN) algorithm as a weak classifier to form a strong combined classifier. Finally, experimental results indicate that the proposed DBN-SVM based and AdaBoost based demodulators are superior to single-classifier demodulators based on DBN or SVM, as well as to the maximum likelihood (MLD) based demodulator.
Tasks
Published 2019-03-08
URL http://arxiv.org/abs/1903.04297v1
PDF http://arxiv.org/pdf/1903.04297v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-signal-demodulation-in
Repo
Framework
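
The DBN-SVM pipeline can be approximated with scikit-learn building blocks: an RBM as a single-layer stand-in for the DBN feature extractor, followed by an SVM classifier. A real DBN stacks several RBMs, and the hyperparameters below are illustrative; inputs are assumed normalized to [0, 1], as BernoulliRBM expects and as the abstract's normalization step suggests.

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Sketch: unsupervised feature extraction (RBM standing in for a DBN),
# then SVM symbol classification on the extracted features.
demodulator = Pipeline([
    ("features", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("classifier", SVC(kernel="rbf")),
])
# demodulator.fit(X_train, y_train)    # X: normalized signal windows, y: symbols
# symbols = demodulator.predict(X_test)
```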

Do People Prefer “Natural” code?

Title Do People Prefer “Natural” code?
Authors Casey Casalnuovo, Kevin Lee, Hulin Wang, Prem Devanbu, Emily Morgan
Abstract Natural code is known to be very repetitive (much more so than natural language corpora); furthermore, this repetitiveness persists even after accounting for the simpler syntax of code. However, programming languages are very expressive, allowing a great many different ways (all clear and unambiguous) to express even very simple computations. So why is natural code repetitive? We hypothesize that the reasons for this lie in the fact that code is bimodal: it is executed by machines, but also read by humans. This bimodality, we argue, leads developers to write code in certain preferred ways that would be familiar to code readers. To test this theory, we 1) model familiarity using a language model estimated over a large training corpus and 2) run an experiment applying several meaning-preserving transformations to Java and Python expressions in a distinct test corpus to see if forms more familiar to readers (as predicted by the language models) are in fact the ones actually written. We find that these transformations generally produce program structures that are less common in practice, supporting the theory that the high repetitiveness in code is a matter of deliberate preference. Finally, 3) we use a human subject study to show alignment between language model score and human preference for the first time in code, providing support for using this measure to improve code.
Tasks Language Modelling
Published 2019-10-08
URL https://arxiv.org/abs/1910.03704v1
PDF https://arxiv.org/pdf/1910.03704v1.pdf
PWC https://paperswithcode.com/paper/do-people-prefer-natural-code
Repo
Framework
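
Steps 1) and 2) of the methodology can be miniaturized: train a language model on a code corpus, then compare scores of meaning-equivalent variants. The toy bigram model below (add-one smoothing, per-token normalization) only illustrates the scoring idea; the paper uses much larger corpora and stronger models.

```python
import math
from collections import Counter

def train_bigram_lm(corpus_tokens):
    """Tiny bigram LM with add-one smoothing; returns an average
    per-bigram log-probability scorer for token sequences."""
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    vocab = len(unigrams)
    def avg_logprob(tokens):
        pairs = list(zip(tokens, tokens[1:]))
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in pairs) / max(1, len(pairs))
    return avg_logprob

corpus = "x += 1 ; y += 1 ; x += 2 ; i += 1".split()
score = train_bigram_lm(corpus)
# A familiar form scores higher than a meaning-equivalent rewrite:
print(score("x += 1".split()), score("x = x + 1".split()))
```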

Joint Graph-based Depth Refinement and Normal Estimation

Title Joint Graph-based Depth Refinement and Normal Estimation
Authors Mattia Rossi, Mireille El Gheche, Andreas Kuhn, Pascal Frossard
Abstract Depth estimation is an essential component in understanding the 3D geometry of a scene, with numerous applications in urban and indoor settings. These scenes are characterized by a prevalence of human-made structures, which in most cases are either inherently piece-wise planar or can be approximated as such. In these settings, we devise a novel depth refinement framework that aims at recovering the underlying piece-wise planarity of the inverse depth map. We formulate this task as an optimization problem involving a data fidelity term that minimizes the distance to the input inverse depth map, as well as a regularization that enforces a piece-wise planar solution. As for the regularization term, we model the inverse depth map as a weighted graph between pixels. The proposed regularization is designed to estimate a plane automatically at each pixel, without any need for an a priori estimation of the scene planes, and at the same time it encourages similar pixels to be assigned to the same plane. The resulting optimization problem is efficiently solved with the ADAM algorithm. Experiments show that our method leads to a significant improvement in depth refinement, both visually and numerically, with respect to state-of-the-art algorithms on the Middlebury, KITTI and ETH3D multi-view stereo datasets.
Tasks Depth Estimation
Published 2019-12-03
URL https://arxiv.org/abs/1912.01306v1
PDF https://arxiv.org/pdf/1912.01306v1.pdf
PWC https://paperswithcode.com/paper/joint-graph-based-depth-refinement-and-normal
Repo
Framework
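
The optimization can be sketched directly in PyTorch, with a data fidelity term plus a weighted graph regularizer minimized by ADAM. Note the simplification: instead of the paper's per-pixel plane estimation, the sketch uses a plain weighted smoothness term over 4-neighborhoods, so it illustrates the optimization structure rather than the full method.

```python
import torch

def refine_inverse_depth(d_in, weights, iters=500, lam=0.5, lr=0.01):
    """Simplified graph-regularized refinement. d_in: (H, W) noisy inverse
    depth; weights: dict of edge weights, 'right' of shape (H, W-1) and
    'down' of shape (H-1, W)."""
    d = d_in.clone().requires_grad_(True)
    opt = torch.optim.Adam([d], lr=lr)          # the paper also uses ADAM
    for _ in range(iters):
        opt.zero_grad()
        fidelity = ((d - d_in) ** 2).sum()      # stay close to the input map
        smooth = (weights["right"] * (d[:, 1:] - d[:, :-1]) ** 2).sum() \
               + (weights["down"] * (d[1:, :] - d[:-1, :]) ** 2).sum()
        (fidelity + lam * smooth).backward()
        opt.step()
    return d.detach()
```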

SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation

Title SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation
Authors Yasin Almalioglu, Mehmet Turan, Alp Eren Sari, Muhamad Risqi U. Saputra, Pedro P. B. de Gusmão, Andrew Markham, Niki Trigoni
Abstract In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns to jointly estimate 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabeled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without the need for IMU intrinsic parameters and/or the extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework, comparing its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature.
Tasks Calibration, Depth Estimation, Pose Estimation, Sensor Fusion, Simultaneous Localization and Mapping
Published 2019-11-22
URL https://arxiv.org/abs/1911.09968v1
PDF https://arxiv.org/pdf/1911.09968v1.pdf
PWC https://paperswithcode.com/paper/selfvio-self-supervised-deep-monocular-visual
Repo
Framework
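
The core self-supervision signal behind approaches like SelfVIO is a photometric consistency loss: warp the source frame into the target view using predicted depth and 6-DoF pose, then penalize the difference. The sketch below shows that warp for a pinhole camera; the full method's adversarial training and IMU fusion are omitted, and the pose parameterization (a 3x4 rigid transform per batch element) is an assumption.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """target/source: (B, C, H, W) frames; depth: (B, 1, H, W) predicted
    depth for the target view; pose: (B, 3, 4) rigid transform target->source;
    K: (3, 3) camera intrinsics."""
    B, _, H, W = target.shape
    # Back-project target pixels to 3D, transform by pose, project into source.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)     # (B, 3, HW) 3D points
    cam = pose[:, :, :3] @ cam + pose[:, :, 3:]             # rigid motion
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)          # perspective divide
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], -1) * 2 - 1
    warped = F.grid_sample(source, grid.view(B, H, W, 2), align_corners=True)
    return (target - warped).abs().mean()                   # photometric error
```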