January 30, 2020

Paper Group ANR 405

Scattering Statistics of Generalized Spatial Poisson Point Processes. Confusion matrices and rough set data analysis. Object-oriented state editing for HRL. An Interpretable Compression and Classification System: Theory and Applications. OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference. Deep Neural Network …

Scattering Statistics of Generalized Spatial Poisson Point Processes

Title Scattering Statistics of Generalized Spatial Poisson Point Processes
Authors Michael Perlmutter, Jieqian He, Matthew Hirn
Abstract We present a machine learning model for the analysis of randomly generated discrete signals, which we model as the points of a homogeneous or inhomogeneous, compound Poisson point process. Like the wavelet scattering transform introduced by S. Mallat, our construction is a mathematical model of convolutional neural networks and is naturally invariant to translations and reflections. Our model replaces wavelets with Gabor-type measurements and therefore decouples the roles of scale and frequency. We show that, with suitably chosen nonlinearities, our measurements distinguish Poisson point processes from common self-similar processes, and separate different types of Poisson point processes based on the first and second moments of the arrival intensity $\lambda(t)$, as well as the absolute moments of the charges associated to each point.
Tasks Point Processes
Published 2019-02-10
URL http://arxiv.org/abs/1902.03537v1
PDF http://arxiv.org/pdf/1902.03537v1.pdf
PWC https://paperswithcode.com/paper/scattering-statistics-of-generalized-spatial
Repo
Framework
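
A minimal numerical sketch, not the authors' construction: it samples a 1-D compound Poisson point process, convolves it with a Gabor-type filter (Gaussian window times complex exponential, with scale and frequency set independently), applies a modulus nonlinearity, and summarizes with absolute moments. The filter parameters and moment orders are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Compound homogeneous Poisson point process on [0, 1): random arrival times
# with rate lam, each point carrying an i.i.d. random charge.
lam, n_grid = 200.0, 4096
n_pts = rng.poisson(lam)
times = np.sort(rng.uniform(0.0, 1.0, n_pts))
charges = rng.normal(1.0, 0.3, n_pts)

# Discretize the point process onto a grid as a sum of weighted spikes.
signal = np.zeros(n_grid)
signal[(times * n_grid).astype(int)] += charges

# Gabor-type filter: Gaussian window modulated by a complex exponential,
# so scale (sigma) and frequency (xi) are chosen independently.
t = np.linspace(-0.05, 0.05, 409)
sigma, xi = 0.01, 200.0
gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * xi * t)

# Modulus nonlinearity followed by absolute moments (orders 1 and 2) as
# translation-invariant summary statistics.
filtered = np.abs(np.convolve(signal, gabor, mode="same"))
moments = [np.mean(filtered**p) for p in (1, 2)]
print("absolute moments of |signal * gabor|:", moments)
```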

Confusion matrices and rough set data analysis

Title Confusion matrices and rough set data analysis
Authors Ivo Düntsch, Günther Gediga
Abstract A widespread approach in machine learning to evaluate the quality of a classifier is to cross-classify predicted and actual decision classes in a confusion matrix, also called an error matrix. A classification tool that assumes no distributional parameters, only the information contained in the data, is based on the rough set data model, which assumes that knowledge is given only up to a certain granularity. Using this assumption and the technique of confusion matrices, we define various indices and classifiers based on rough confusion matrices.
Tasks
Published 2019-02-04
URL http://arxiv.org/abs/1902.01487v1
PDF http://arxiv.org/pdf/1902.01487v1.pdf
PWC https://paperswithcode.com/paper/confusion-matrices-and-rough-set-data
Repo
Framework
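
The rough-set indices themselves are defined in the paper; the sketch below only shows the plain confusion matrix they build on, plus the standard indices (accuracy, per-class precision and recall) derived from it.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Cross-classify actual vs. predicted decision classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

def indices(cm):
    """Standard indices derived from a confusion matrix."""
    accuracy = np.trace(cm) / cm.sum()
    # Per-class precision and recall, guarding against empty rows/columns.
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(cm.sum(0) > 0, np.diag(cm) / cm.sum(0), 0.0)
        recall = np.where(cm.sum(1) > 0, np.diag(cm) / cm.sum(1), 0.0)
    return accuracy, precision, recall

actual    = [0, 0, 1, 1, 2, 2, 2, 1]
predicted = [0, 1, 1, 1, 2, 0, 2, 2]
cm = confusion_matrix(actual, predicted, n_classes=3)
print(cm)
print(indices(cm))
```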

Object-oriented state editing for HRL

Title Object-oriented state editing for HRL
Authors Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick
Abstract We introduce agents that use object-oriented reasoning to consider alternate states of the world in order to more quickly find solutions to problems. Specifically, a hierarchical controller directs a low-level agent to behave as if objects in the scene were added, deleted, or modified. The actions taken by the controller are defined over a graph-based representation of the scene, with actions corresponding to adding, deleting, or editing the nodes of a graph. We present preliminary results on three environments, demonstrating that our approach can achieve similar levels of reward as non-hierarchical agents, but with better data efficiency.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1910.14361v1
PDF https://arxiv.org/pdf/1910.14361v1.pdf
PWC https://paperswithcode.com/paper/object-oriented-state-editing-for-hrl
Repo
Framework
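
A toy sketch, under assumed data structures, of the graph-edit action space the abstract describes: the scene is a node-attribute dictionary, and the controller's actions add, delete, or edit nodes to produce an imagined alternate state for the low-level agent. The helper names and attributes are hypothetical.

```python
# Scene graph as a dict: node id -> attribute dict (e.g., object type, position).
scene = {
    0: {"type": "block", "pos": (0.0, 0.0)},
    1: {"type": "goal",  "pos": (1.0, 2.0)},
}

def add_node(graph, node_id, attrs):
    edited = dict(graph)
    edited[node_id] = dict(attrs)
    return edited

def delete_node(graph, node_id):
    return {k: v for k, v in graph.items() if k != node_id}

def edit_node(graph, node_id, **changes):
    edited = dict(graph)
    edited[node_id] = {**edited[node_id], **changes}
    return edited

# The high-level controller imagines an alternate state ("as if" a block were
# added and another moved) and hands it to the low-level agent as its goal.
imagined = add_node(scene, 2, {"type": "block", "pos": (0.5, 0.5)})
imagined = edit_node(imagined, 0, pos=(0.2, 0.0))
print(imagined)
```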

An Interpretable Compression and Classification System: Theory and Applications

Title An Interpretable Compression and Classification System: Theory and Applications
Authors Tzu-Wei Tseng, Kai-Jiun Yang, C.-C. Jay Kuo, Shang-Ho Tsai
Abstract This study proposes a low-complexity interpretable classification system. The proposed system contains three main modules: feature extraction, feature reduction, and classification. All of them are linear. Thanks to this linearity, the extracted and reduced features can be inverted back to the original data, as with a linear transform such as the Fourier transform, so that one can quantify and visualize the contribution of individual features to the original data. The reduced features and this reversibility naturally endow the proposed system with data-compression capability. The system can significantly compress data with only a small percentage deviation between the compressed and the original data. At the same time, when the compressed data are used for classification, it still achieves high testing accuracy. Furthermore, we observe that the extracted features of the proposed system can be approximated by uncorrelated Gaussian random variables, so classical estimation and detection theory can be applied for classification. This motivates us to propose a MAP (maximum a posteriori) based classification method. As a result, the extracted features and the corresponding performance have statistical meaning and are mathematically interpretable. Simulation results show that the proposed classification system not only enjoys significantly reduced training and testing time but also achieves high testing accuracy compared to conventional schemes.
Tasks
Published 2019-07-21
URL https://arxiv.org/abs/1907.08952v1
PDF https://arxiv.org/pdf/1907.08952v1.pdf
PWC https://paperswithcode.com/paper/an-interpretable-compression-and
Repo
Framework
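
A compact sketch of the pipeline structure the abstract outlines, with stand-in choices: a linear feature reduction (plain PCA here, invertible back to the data subspace) followed by a MAP classifier that models the reduced features of each class as uncorrelated Gaussians. The paper's specific transforms differ; this only illustrates the linear-plus-MAP idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes in 20 dimensions.
X0 = rng.normal(0.0, 1.0, (200, 20))
X1 = rng.normal(0.8, 1.0, (200, 20))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Linear feature reduction via PCA; the reduced features map linearly back
# to a compressed reconstruction of the data.
mean = X.mean(0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:5]                        # 5 x 20 linear reduction
Z = (X - mean) @ W.T              # reduced features
X_rec = Z @ W + mean              # compressed reconstruction (linear inverse)
print("reconstruction deviation:", np.linalg.norm(X - X_rec) / np.linalg.norm(X))

# MAP classification assuming uncorrelated Gaussian features per class.
def fit(Z, y):
    stats = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        stats[c] = (Zc.mean(0), Zc.var(0) + 1e-6, len(Zc) / len(Z))
    return stats

def predict(stats, z):
    def log_post(c):
        mu, var, prior = stats[c]
        return np.log(prior) - 0.5 * np.sum(np.log(var) + (z - mu) ** 2 / var)
    return max(stats, key=log_post)

stats = fit(Z, y)
pred = np.array([predict(stats, z) for z in Z])
print("training accuracy:", (pred == y).mean())
```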

OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference

Title OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference
Authors Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum
Abstract In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall into two extremes: either they perform instance-level inference relying on embeddings for (subject, object) pairs, and thus cannot handle pairs absent from any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, and thus cannot achieve satisfactory quality. We propose OpenKI to handle the sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both the KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without entity-specific parameters. Extensive experiments show that this method not only significantly improves the state of the art for conventional OpenIE extractions like ReVerb, but also boosts performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.
Tasks Open Information Extraction
Published 2019-04-12
URL http://arxiv.org/abs/1904.12606v1
PDF http://arxiv.org/pdf/1904.12606v1.pdf
PWC https://paperswithcode.com/paper/190412606
Repo
Framework
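
A stand-alone sketch of the attention-style neighborhood aggregation the abstract mentions: an entity pair is scored for a target relation by attending over the embeddings of relations observed in its neighborhood, with no entity-specific parameters. The dimensions, vocabulary, and scoring rule are illustrative assumptions, not OpenKI's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Embeddings exist only for relation types (no entity-specific parameters).
relation_vocab = ["born_in", "lives_in", "works_for", "located_in"]
rel_emb = {r: rng.normal(size=d) for r in relation_vocab}

def score(neighbor_relations, target_relation):
    """Attention-weighted aggregation of the relations seen around an entity pair."""
    target = rel_emb[target_relation]
    neigh = np.stack([rel_emb[r] for r in neighbor_relations])
    # Attention weights from dot-product similarity to the target relation.
    logits = neigh @ target / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    aggregated = weights @ neigh
    return float(aggregated @ target)     # higher = more compatible

# OpenIE/KB relations observed in the neighborhood of some (subject, object) pair.
neighborhood = ["born_in", "lives_in"]
print(score(neighborhood, "located_in"))
print(score(neighborhood, "works_for"))
```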

Deep Neural Network for Semantic-based Text Recognition in Images

Title Deep Neural Network for Semantic-based Text Recognition in Images
Authors Yi Zheng, Qitong Wang, Margrit Betke
Abstract State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel “semantic-based text recognition” (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-word recognition techniques on both datasets. STR yields a high accuracy rate of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text.
Tasks Text Spotting
Published 2019-08-04
URL https://arxiv.org/abs/1908.01403v3
PDF https://arxiv.org/pdf/1908.01403v3.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-for-semantic-based-text
Repo
Framework
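
A simplified illustration of the grouping step that the Text Grouping and Arranging (TGA) algorithm performs, under assumed inputs: word boxes given as (x, y, w, h) tuples are clustered into lines by vertical overlap and then ordered left-to-right. The real TGA is described in the paper; this is only a plausible baseline for the same task.

```python
def group_and_arrange(boxes, overlap_thresh=0.5):
    """Group word boxes (x, y, w, h) into text lines and order them for reading."""
    def v_overlap(a, b):
        top = max(a[1], b[1])
        bottom = min(a[1] + a[3], b[1] + b[3])
        return max(0.0, bottom - top) / min(a[3], b[3])

    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):       # scan top to bottom
        for line in lines:
            if v_overlap(line[0], box) >= overlap_thresh:
                line.append(box)
                break
        else:
            lines.append([box])
    # Arrange: lines top-to-bottom, words left-to-right within each line.
    return [sorted(line, key=lambda b: b[0]) for line in lines]

boxes = [(120, 10, 40, 20), (10, 12, 50, 20), (15, 60, 60, 22), (70, 11, 45, 20)]
for line in group_and_arrange(boxes):
    print(line)
```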

New Perspective of Interpretability of Deep Neural Networks

Title New Perspective of Interpretability of Deep Neural Networks
Authors Masanari Kimura, Masayuki Tanaka
Abstract Deep neural networks (DNNs) are known as black-box models; in other words, it is difficult to interpret the internal state of the model. Improving the interpretability of DNNs is an active research topic. However, at present, the definition of interpretability for DNNs is vague, and the question of what constitutes a highly explanatory model is still controversial. To address this issue, we provide a definition of the human predictability of a model as a part of the interpretability of DNNs. The human predictability proposed in this paper is defined by how easily one can predict the change in inference when the DNN model is perturbed. In addition, we introduce one example of highly human-predictable DNNs. We discuss how our definition can help research on the interpretability of DNNs across various types of applications.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.07156v1
PDF https://arxiv.org/pdf/1909.07156v1.pdf
PWC https://paperswithcode.com/paper/new-perspective-of-interpretability-of-deep
Repo
Framework
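
A small numerical illustration of the idea behind the proposed human-predictability notion: perturb a model's parameters and check how predictable the resulting change in inference is. Here the "model" is just a linear map, for which the output change is exactly linear in the perturbation; the quantities involved are illustrative assumptions, not the paper's example network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": y = W x. For a linear model, the inference change under a
# parameter perturbation dW is exactly dW @ x, i.e. easy to predict.
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)

dW = 0.01 * rng.normal(size=W.shape)      # small parameter perturbation
y_before = W @ x
y_after = (W + dW) @ x

predicted_change = dW @ x                 # what an observer would predict
actual_change = y_after - y_before
print("prediction error:", np.linalg.norm(actual_change - predicted_change))  # ~0
```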

Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

Title Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory
Authors Arash Rahnama, Andre T. Nguyen, Edward Raff
Abstract Deep neural networks (DNNs) are vulnerable to subtle adversarial perturbations applied to the input. These adversarial perturbations, though imperceptible, can easily mislead the DNN. In this work, we take a control theoretic approach to the problem of robustness in DNNs. We treat each individual layer of the DNN as a nonlinear dynamical system and use Lyapunov theory to prove stability and robustness locally. We then proceed to prove stability and robustness globally for the entire DNN. We develop empirically tight bounds on the response of the output layer, or any hidden layer, to adversarial perturbations added to the input, or the input of hidden layers. Recent works have proposed spectral norm regularization as a solution for improving robustness against l2 adversarial attacks. Our results give new insights into how spectral norm regularization can mitigate the adversarial effects. Finally, we evaluate the power of our approach on a variety of data sets and network architectures and against some of the well-known adversarial attacks.
Tasks
Published 2019-11-12
URL https://arxiv.org/abs/1911.04636v1
PDF https://arxiv.org/pdf/1911.04636v1.pdf
PWC https://paperswithcode.com/paper/robust-design-of-deep-neural-networks-against
Repo
Framework
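
The abstract connects robustness to spectral norm regularization; the sketch below estimates a layer's spectral norm by standard power iteration and forms the corresponding regularization penalty. This is the usual power-iteration estimate, not the paper's Lyapunov analysis; the spectral norm of a weight matrix bounds how much that linear layer can amplify an l2 input perturbation.

```python
import numpy as np

def spectral_norm(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

rng = np.random.default_rng(1)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 16))]

# Regularization penalty: sum of squared spectral norms of the layer weights.
penalty = sum(spectral_norm(W) ** 2 for W in layers)
print("spectral norms:", [round(spectral_norm(W), 3) for W in layers])
print("penalty:", round(penalty, 3))
print("exact:", [round(np.linalg.svd(W, compute_uv=False)[0], 3) for W in layers])
```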

TIGEr: Text-to-Image Grounding for Image Caption Evaluation

Title TIGEr: Text-to-Image Grounding for Image Caption Evaluation
Authors Ming Jiang, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, Jianfeng Gao
Abstract This paper presents a new metric called TIGEr for the automatic evaluation of image captioning systems. Popular metrics, such as BLEU and CIDEr, are based solely on text matching between reference captions and machine-generated captions, potentially leading to biased evaluations because references may not fully cover the image content and natural language is inherently ambiguous. Building upon a machine-learned text-image grounding model, TIGEr allows caption quality to be evaluated not only based on how well a caption represents image content, but also on how well machine-generated captions match human-generated captions. Our empirical tests show that TIGEr has a higher consistency with human judgments than alternative existing metrics. We also comprehensively assess the metric’s effectiveness in caption evaluation by measuring the correlation between human judgments and metric scores.
Tasks Image Captioning, Text Matching
Published 2019-09-04
URL https://arxiv.org/abs/1909.02050v1
PDF https://arxiv.org/pdf/1909.02050v1.pdf
PWC https://paperswithcode.com/paper/tiger-text-to-image-grounding-for-image
Repo
Framework
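
A rough sketch of the grounding-based comparison the abstract describes, with assumed inputs: each caption token is grounded onto image regions via embedding similarity, and the candidate caption's grounding distribution is compared with the reference's. TIGEr's actual scoring is more involved; the embeddings here are random stand-ins for a learned grounding model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_regions = 8, 5
region_emb = rng.normal(size=(n_regions, d))        # image region features

def ground(token_embs, region_emb):
    """Softmax grounding of caption tokens onto image regions, pooled over tokens."""
    sims = token_embs @ region_emb.T
    weights = np.exp(sims - sims.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)                     # one distribution over regions

def grounding_agreement(candidate_embs, reference_embs):
    """Cosine agreement between candidate and reference grounding distributions."""
    p = ground(candidate_embs, region_emb)
    q = ground(reference_embs, region_emb)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

candidate = rng.normal(size=(6, d))   # stand-ins for learned token embeddings
reference = rng.normal(size=(7, d))
print(grounding_agreement(candidate, reference))
```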

Likelihood-free approximate Gibbs sampling

Title Likelihood-free approximate Gibbs sampling
Authors G. S. Rodrigues, D. J. Nott, S. A. Sisson
Abstract Likelihood-free methods such as approximate Bayesian computation (ABC) have extended the reach of statistical inference to problems with computationally intractable likelihoods. Such approaches perform well for small-to-moderate dimensional problems, but suffer a curse of dimensionality in the number of model parameters. We introduce a likelihood-free approximate Gibbs sampler that naturally circumvents the dimensionality issue by focusing on lower-dimensional conditional distributions. These distributions are estimated by flexible regression models either before the sampler is run, or adaptively during sampler implementation. As a result, and in comparison to Metropolis-Hastings based approaches, we are able to fit substantially more challenging statistical models than would otherwise be possible. We demonstrate the sampler’s performance via two simulated examples, and a real analysis of Airbnb rental prices using an intractable, high-dimensional, multivariate non-linear state space model containing 13,140 parameters, which presents a real challenge to standard ABC techniques.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04347v1
PDF https://arxiv.org/pdf/1906.04347v1.pdf
PWC https://paperswithcode.com/paper/likelihood-free-approximate-gibbs-sampling
Repo
Framework
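
A toy likelihood-free Gibbs sweep in the spirit of the abstract, with a crude nearest-neighbour lookup standing in for the flexible regression models used in the paper: each parameter is resampled from an approximate conditional given the observed summary and the other parameters, built from a pre-simulated reference table. The model, summaries, and all settings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    """Stand-in for an intractable-likelihood model: summaries from (mu, sigma)."""
    x = rng.normal(theta[0], abs(theta[1]) + 1e-3, size=50)
    return np.array([x.mean(), x.std()])

# Reference table: draw parameters from the prior and simulate summaries once.
n_table = 5000
thetas = np.column_stack([rng.normal(0, 5, n_table), rng.uniform(0.1, 5, n_table)])
summaries = np.array([simulate(t) for t in thetas])

obs = simulate(np.array([2.0, 1.5]))          # "observed" data summary

def approx_conditional_sample(j, theta_current, k=50):
    """Sample theta_j from an approximate conditional given obs and theta_{-j},
    using nearest neighbours in the reference table as a crude regression model."""
    others = [i for i in range(thetas.shape[1]) if i != j]
    query = np.concatenate([obs, theta_current[others]])
    table = np.column_stack([summaries, thetas[:, others]])
    dist = np.linalg.norm((table - query) / table.std(0), axis=1)
    neighbours = np.argsort(dist)[:k]
    return thetas[rng.choice(neighbours), j]

# Likelihood-free approximate Gibbs sweeps over the two parameters.
theta = np.array([0.0, 1.0])
chain = []
for _ in range(300):
    for j in range(2):
        theta[j] = approx_conditional_sample(j, theta)
    chain.append(theta.copy())
print("posterior mean estimate:", np.mean(chain, axis=0))   # roughly near (2.0, 1.5)
```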

Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

Title Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Authors Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
Abstract Visual events are usually accompanied by sounds in our daily lives. However, can machines learn to correlate the visual scene and sound, and localize the sound source, only by observing them as humans do? To investigate its empirical learnability, in this work we first present a novel unsupervised algorithm to address the problem of localizing sound sources in visual scenes. To achieve this goal, a two-stream network structure that handles each modality with an attention mechanism is developed for sound source localization. The network naturally reveals the localized response in the scene without human annotation. In addition, a new sound source dataset is developed for performance evaluation. Nevertheless, our empirical evaluation shows that the unsupervised method generates false conclusions in some cases. We show that these false conclusions cannot be fixed without human prior knowledge, due to the well-known mismatch between correlation and causality. To fix this issue, we extend our network to supervised and semi-supervised settings via a simple modification enabled by the general architecture of our two-stream network. We show that the false conclusions can be effectively corrected even with a small amount of supervision, i.e., in a semi-supervised setup. Furthermore, we present the versatility of the learned audio and visual embeddings for cross-modal content alignment, and we extend the proposed algorithm to a new application: sound-saliency-based automatic camera view panning in 360-degree videos.
Tasks
Published 2019-11-20
URL https://arxiv.org/abs/1911.09649v1
PDF https://arxiv.org/pdf/1911.09649v1.pdf
PWC https://paperswithcode.com/paper/learning-to-localize-sound-sources-in-visual
Repo
Framework
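
A minimal sketch of the attention-style localization response a two-stream model of this kind can produce, using random stand-ins for the learned features: the audio stream's global embedding is compared against each spatial cell of the visual stream's feature map, and the similarity map indicates where the sound source may be.

```python
import numpy as np

rng = np.random.default_rng(0)
d, H, W = 128, 14, 14

# Stand-ins for the two streams' outputs: a global audio embedding and a
# spatial grid of visual features (e.g., a conv feature map).
audio = rng.normal(size=d)
visual = rng.normal(size=(H, W, d))

def localization_map(audio, visual):
    """Cosine similarity of the sound embedding with each spatial visual feature."""
    a = audio / np.linalg.norm(audio)
    v = visual / np.linalg.norm(visual, axis=-1, keepdims=True)
    return v @ a                                   # H x W localization response

resp = localization_map(audio, visual)
print("peak response at:", np.unravel_index(resp.argmax(), resp.shape))
```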

The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

Title The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?
Authors Toby Shevlane, Allan Dafoe
Abstract There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects. Our theoretical framework elucidates the factors governing whether the published research will be more useful for attackers or defenders, such as the possibility for adequate defensive measures, or the independent discovery of the knowledge outside of the scientific community. The balance will vary across scientific fields. However, we show that the existing conversation within AI has imported concepts and conclusions from prior debates within computer security over the disclosure of software vulnerabilities. While disclosure of software vulnerabilities often favours defence, this cannot be assumed for AI research. The AI research community should consider concepts and policies from a broad set of adjacent fields, and ultimately needs to craft policy well-suited to its particular challenges.
Tasks
Published 2019-12-27
URL https://arxiv.org/abs/2001.00463v2
PDF https://arxiv.org/pdf/2001.00463v2.pdf
PWC https://paperswithcode.com/paper/the-offense-defense-balance-of-scientific
Repo
Framework

Model Decay in Long-Term Tracking

Title Model Decay in Long-Term Tracking
Authors Efstratios Gavves, Ran Tao, Deepak K. Gupta, Arnold W. M. Smeulders
Abstract Updating the tracker model with adverse bounding-box predictions adds an unavoidable bias term to the learning. This bias term, which we refer to as model decay, offsets the learning and causes tracking drift. While its adverse effect might not be visible in short-term tracking, the accumulation of this bias over the long term can eventually lead to a permanent loss of the target. In this paper, we look at the problem of model bias from a mathematical perspective. Further, we briefly examine the effect of various sources of tracking error on model decay, using a correlation filter (ECO) and a Siamese (SINT) tracker. Based on these observations and insights, we propose simple additions that help to reduce model decay in long-term tracking. The proposed tracker is evaluated on four long-term and one short-term tracking benchmarks, demonstrating superior accuracy and robustness, even in 30-minute-long videos.
Tasks
Published 2019-08-05
URL https://arxiv.org/abs/1908.01603v1
PDF https://arxiv.org/pdf/1908.01603v1.pdf
PWC https://paperswithcode.com/paper/model-decay-in-long-term-tracking
Repo
Framework
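
The abstract mentions "simple additions" that reduce model decay but does not spell them out; below is a generic confidence-gated update rule in that spirit, written as an assumption for illustration rather than the authors' method: the appearance model is only updated when the tracker's confidence is high, so adverse predictions do not bias it.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_model(template, observation, confidence, lr=0.02, threshold=0.6):
    """Update the appearance model only when the prediction looks reliable,
    so low-confidence (possibly adverse) boxes do not bias the model."""
    if confidence < threshold:
        return template                     # skip the update to avoid injecting bias
    return (1.0 - lr) * template + lr * observation

template = rng.normal(size=(32, 32))
for step in range(100):
    obs = template + 0.1 * rng.normal(size=template.shape)
    conf = rng.uniform(0.3, 1.0)            # stand-in for a tracker confidence score
    template = update_model(template, obs, conf)
print("template norm after 100 frames:", round(float(np.linalg.norm(template)), 3))
```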

Solving Optimization Problems through Fully Convolutional Networks: an Application to the Travelling Salesman Problem

Title Solving Optimization Problems through Fully Convolutional Networks: an Application to the Travelling Salesman Problem
Authors Zhengxuan Ling, Xinyu Tao, Yu Zhang, Xi Chen
Abstract In the new wave of artificial intelligence, deep learning is impacting various industries. As a closely related area, optimization algorithms greatly contribute to the development of deep learning, but the reverse application is still insufficient. Is there an efficient way to solve certain optimization problems through deep learning? The key is to convert the optimization problem into a representation suitable for deep learning. In this paper, the traveling salesman problem (TSP) is studied. Considering that deep learning is good at image processing, an image representation method is proposed to transform a TSP into an image. Based on samples of a 10-city TSP, a fully convolutional network (FCN) is used to learn the mapping from a feasible region to an optimal solution. The training process is analyzed and interpreted in stages. A visualization method is presented to show how an FCN can understand the training task of a TSP. Once training is completed, no significant effort is required to solve a new TSP, and the prediction is obtained on the scale of milliseconds. The results show good performance in finding the global optimal solution. Moreover, the developed FCN model has been demonstrated on TSPs with different numbers of cities, showing excellent generalization performance.
Tasks
Published 2019-10-27
URL https://arxiv.org/abs/1910.12243v1
PDF https://arxiv.org/pdf/1910.12243v1.pdf
PWC https://paperswithcode.com/paper/solving-optimization-problems-through-fully
Repo
Framework
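
A sketch of the basic conversion step the abstract relies on, under stated assumptions: city coordinates in the unit square are rasterized into an image so a fully convolutional network can consume them. The paper's actual representation may encode more (for example, a target channel for the optimal tour); this only shows the input side of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def tsp_to_image(cities, size=64):
    """Rasterize a TSP instance (city coordinates in [0, 1]^2) into an image,
    so a fully convolutional network can take it as input."""
    img = np.zeros((size, size), dtype=np.float32)
    idx = np.clip((cities * size).astype(int), 0, size - 1)
    img[idx[:, 1], idx[:, 0]] = 1.0          # one bright pixel per city
    return img

cities = rng.uniform(0.0, 1.0, size=(10, 2))  # a 10-city instance
image = tsp_to_image(cities)
print(image.shape, int(image.sum()), "city pixels set")
```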

Improving the convergence of SGD through adaptive batch sizes

Title Improving the convergence of SGD through adaptive batch sizes
Authors Scott Sievert, Zachary Charles
Abstract Mini-batch stochastic gradient descent (SGD) approximates the gradient of an objective function with the average gradient over a batch of constant size. While small batch sizes can yield high-variance gradient estimates that prevent the optimizer from learning a good model, large batches may require more data and computational effort. This work presents a method to adapt the batch size to the model's quality. We show that our method requires the same number of model updates as full-batch gradient descent while requiring the same total number of gradient computations as SGD. While this method requires evaluating the objective function, we present a passive approximation that eliminates this constraint and improves computational efficiency. We provide extensive experiments illustrating that our methods require far fewer model updates without increasing the total amount of computation.
Tasks
Published 2019-10-18
URL https://arxiv.org/abs/1910.08222v1
PDF https://arxiv.org/pdf/1910.08222v1.pdf
PWC https://paperswithcode.com/paper/improving-the-convergence-of-sgd-through
Repo
Framework
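
A minimal sketch of adaptive-batch-size SGD on a least-squares problem, assuming a simple stand-in rule: the batch doubles whenever the objective stops improving, so early updates are cheap and noisy while late updates use low-variance gradients. The paper's actual rule ties batch size to model quality; the doubling heuristic, learning rate, and thresholds here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares objective f(w) = 0.5 * mean((X w - y)^2) on synthetic data.
n, d = 5000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

w = np.zeros(d)
batch, lr = 8, 0.05
prev_loss = loss(w)
for update in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    w -= lr * grad
    # Grow the batch when the objective stalls: later in training, larger
    # batches give lower-variance gradients for the same number of updates.
    cur_loss = loss(w)
    if cur_loss > prev_loss - 1e-4 and batch < n:
        batch = min(2 * batch, n)
    prev_loss = cur_loss

print("final batch size:", batch, "final loss:", round(loss(w), 4))
```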