May 5, 2019


Paper Group ANR 529


Using Artificial Intelligence to Identify State Secrets


Title	Using Artificial Intelligence to Identify State Secrets
Authors	Renato Rocha Souza, Flavio Codeco Coelho, Rohan Shah, Matthew Connelly
Abstract	Whether officials can be trusted to protect national security information has become a matter of great public controversy, reigniting a long-standing debate about the scope and nature of official secrecy. The declassification of millions of electronic records has made it possible to analyze these issues with greater rigor and precision. Using machine-learning methods, we examined nearly a million State Department cables from the 1970s to identify features of records that are more likely to be classified, such as international negotiations, military operations, and high-level communications. Even with incomplete data, algorithms can use such features to identify 90% of classified cables with <11% false positives. But our results also show that there are longstanding problems in the identification of sensitive information. Error analysis reveals many examples of both overclassification and underclassification. This indicates both the need for research on inter-coder reliability among officials as to what constitutes classified material and the opportunity to develop recommender systems to better manage both classification and declassification.
Tasks	Recommendation Systems
Published	2016-11-01
URL	http://arxiv.org/abs/1611.00356v1
PDF	http://arxiv.org/pdf/1611.00356v1.pdf
PWC	https://paperswithcode.com/paper/using-artificial-intelligence-to-identify
Repo
Framework

Hierarchical Question Answering for Long Documents


Title	Hierarchical Question Answering for Long Documents
Authors	Eunsol Choi, Daniel Hewlett, Alexandre Lacoste, Illia Polosukhin, Jakob Uszkoreit, Jonathan Berant
Abstract	We present a framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models. While most successful approaches for reading comprehension rely on recurrent neural networks (RNNs), running them over long documents is prohibitively slow because it is difficult to parallelize over sequences. Inspired by how people first skim the document, identify relevant parts, and carefully read these parts to produce an answer, we combine a coarse, fast model for selecting relevant sentences and a more expensive RNN for producing the answer from those sentences. We treat sentence selection as a latent variable trained jointly from the answer only using reinforcement learning. Experiments demonstrate the state of the art performance on a challenging subset of the Wikireading and on a new dataset, while speeding up the model by 3.5x-6.7x.
Tasks	Question Answering, Reading Comprehension
Published	2016-11-06
URL	http://arxiv.org/abs/1611.01839v2
PDF	http://arxiv.org/pdf/1611.01839v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-question-answering-for-long
Repo
Framework
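
The coarse-to-fine idea in the abstract above (a cheap selector picks a few sentences, and only those are passed to an expensive reader) can be sketched as follows. This is a minimal illustration, not the authors' model: the word-overlap scorer stands in for their learned selector trained with reinforcement learning, and the names `select_sentences`, `answer`, and `reader` are hypothetical.

```python
def select_sentences(question, sentences, k=2):
    """Cheap selector: rank sentences by word overlap with the question.

    Assumption: word overlap is a stand-in for the learned coarse model.
    """
    q_words = set(question.lower().split())

    def score(sentence):
        return len(q_words & set(sentence.lower().split()))

    # sorted() is stable, so ties keep their document order.
    return sorted(sentences, key=score, reverse=True)[:k]


def answer(question, sentences, reader, k=2):
    """Run the expensive reader only on the k selected sentences."""
    selected = select_sentences(question, sentences, k)
    return reader(question, selected)


if __name__ == "__main__":
    doc = [
        "Marie Curie was born in Warsaw .",
        "She won two Nobel prizes .",
        "Paris is in France .",
    ]
    # Hypothetical reader: here it just echoes what it was given.
    print(answer("where was marie curie born", doc, lambda q, s: s, k=1))
```

The cost saving comes from the reader seeing `k` sentences rather than the whole document; how well this works depends entirely on the quality of the selector.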

Encoding Data for HTM Systems


Title	Encoding Data for HTM Systems
Authors	Scott Purdy
Abstract	Hierarchical Temporal Memory (HTM) is a biologically inspired machine intelligence technology that mimics the architecture and processes of the neocortex. In this white paper we describe how to encode data as Sparse Distributed Representations (SDRs) for use in HTM systems. We explain several existing encoders, which are available through the open source project called NuPIC, and we discuss requirements for creating encoders for new types of data.
Tasks
Published	2016-02-18
URL	http://arxiv.org/abs/1602.05925v1
PDF	http://arxiv.org/pdf/1602.05925v1.pdf
PWC	https://paperswithcode.com/paper/encoding-data-for-htm-systems
Repo
Framework
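
To make the notion of encoding data as an SDR concrete, here is a minimal sketch of a scalar encoder written from scratch. It does not use NuPIC or reproduce its API; the function names and the parameters `n_bits` and `n_active` are assumptions chosen for illustration. The encoder maps a number to a bit list with one contiguous run of active bits, so that nearby values share active bits and distant values do not.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=40, n_active=5):
    """Encode a number as n_bits 0/1 values with n_active contiguous ones."""
    if not min_val <= value <= max_val:
        raise ValueError("value outside encoder range")
    # Number of distinct start positions for the run of active bits.
    n_buckets = n_bits - n_active + 1
    fraction = (value - min_val) / (max_val - min_val)
    bucket = int(round(fraction * (n_buckets - 1)))
    bits = [0] * n_bits
    for i in range(bucket, bucket + n_active):
        bits[i] = 1
    return bits


def overlap(a, b):
    """Count positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))


if __name__ == "__main__":
    near = overlap(encode_scalar(50), encode_scalar(51))
    far = overlap(encode_scalar(50), encode_scalar(90))
    print(near, far)
```

Every encoding has the same number of active bits, and the overlap between two encodings falls as the values move apart, which is the property this sketch is meant to show.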

Improving abcdSAT by At-Least-One Recently Used Clause Management Strategy


Title	Improving abcdSAT by At-Least-One Recently Used Clause Management Strategy
Authors	Jingchao Chen
Abstract	We improve further the 2015 version of abcdSAT by various heuristics such as at-least-one recently used strategy, learnt clause database approximation reduction etc. Based on the requirement of different tracks at the SAT Competition 2016, we develop three versions of abcdSAT: drup, inc and lim, which participate in the competition of main (agile), incremental library and no-limit track, respectively.
Tasks
Published	2016-05-05
URL	http://arxiv.org/abs/1605.01622v1
PDF	http://arxiv.org/pdf/1605.01622v1.pdf
PWC	https://paperswithcode.com/paper/improving-abcdsat-by-at-least-one-recently
Repo
Framework

Kalman’s shrinkage for wavelet-based despeckling of SAR images


Title	Kalman’s shrinkage for wavelet-based despeckling of SAR images
Authors	Mario Mastriani, Alberto E. Giraldez
Abstract	In this paper, a new probability density function (pdf) is proposed to model the statistics of wavelet coefficients, and a simple Kalman’s filter is derived from the new pdf using Bayesian estimation theory. Specifically, we decompose the speckled image into wavelet subbands, we apply the Kalman’s filter to the high subbands, and reconstruct a despeckled image from the modified detail coefficients. Experimental results demonstrate that our method compares favorably to several other despeckling methods on test synthetic aperture radar (SAR) images.
Tasks
Published	2016-07-31
URL	http://arxiv.org/abs/1608.00273v1
PDF	http://arxiv.org/pdf/1608.00273v1.pdf
PWC	https://paperswithcode.com/paper/kalmans-shrinkage-for-wavelet-based
Repo
Framework
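
The decompose, shrink, reconstruct pipeline described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's method: it works on a 1-D signal rather than an image, uses a single-level Haar transform, and applies plain soft thresholding to the detail coefficients in place of the Kalman-based shrinkage the authors derive. All function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """One-level Haar transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (stand-in shrinkage rule)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Shrink only the detail (high-frequency) band, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    print(denoise([1.0, 3.0, 5.0, 5.0], threshold=10.0))
```

With a threshold of zero the signal is reconstructed unchanged; with a large threshold every detail coefficient is zeroed and each pair of samples collapses to its mean.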

Generalising the Discriminative Restricted Boltzmann Machine


Title	Generalising the Discriminative Restricted Boltzmann Machine
Authors	Srikanth Cherla, Son N Tran, Tillman Weyde, Artur d’Avila Garcez
Abstract	We present a novel theoretical result that generalises the Discriminative Restricted Boltzmann Machine (DRBM). While originally the DRBM was defined assuming the {0, 1}-Bernoulli distribution in each of its hidden units, this result makes it possible to derive cost functions for variants of the DRBM that utilise other distributions, including some that are often encountered in the literature. This is illustrated with the Binomial and {-1, +1}-Bernoulli distributions here. We evaluate these two DRBM variants and compare them with the original one on three benchmark datasets, namely the MNIST and USPS digit classification datasets, and the 20 Newsgroups document classification dataset. Results show that each of the three compared models outperforms the remaining two in one of the three datasets, thus indicating that the proposed theoretical generalisation of the DRBM may be valuable in practice.
Tasks	Document Classification
Published	2016-04-06
URL	http://arxiv.org/abs/1604.01806v1
PDF	http://arxiv.org/pdf/1604.01806v1.pdf
PWC	https://paperswithcode.com/paper/generalising-the-discriminative-restricted
Repo
Framework

Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection


Title	Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection
Authors	Maurilio Di Cicco, Ciro Potena, Giorgio Grisetti, Alberto Pretto
Abstract	Selective weeding is one of the key challenges in the field of agriculture robotics. To accomplish this task, a farm robot should be able to accurately detect plants and to distinguish them between crop and weeds. Most of the promising state-of-the-art approaches make use of appearance-based models trained on large annotated datasets. Unfortunately, creating large agricultural datasets with pixel-level annotations is an extremely time consuming task, actually penalizing the usage of data-driven techniques. In this paper, we face this problem by proposing a novel and effective approach that aims to dramatically minimize the human intervention needed to train the detection and classification algorithms. The idea is to procedurally generate large synthetic training datasets randomizing the key features of the target environment (i.e., crop and weed species, type of soil, light conditions). More specifically, by tuning these model parameters, and exploiting a few real-world textures, it is possible to render a large amount of realistic views of an artificial agricultural scenario with no effort. The generated data can be directly used to train the model or to supplement real-world images. We validate the proposed methodology by using as testbed a modern deep learning based image segmentation architecture. We compare the classification results obtained using both real and synthetic images as training data. The reported results confirm the effectiveness and the potentiality of our approach.
Tasks	Semantic Segmentation
Published	2016-12-09
URL	http://arxiv.org/abs/1612.03019v3
PDF	http://arxiv.org/pdf/1612.03019v3.pdf
PWC	https://paperswithcode.com/paper/automatic-model-based-dataset-generation-for
Repo
Framework

Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves


Title	Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves
Authors	Fei Tian, Bin Gao, Di He, Tie-Yan Liu
Abstract	We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence. Different from conventional topic models that largely ignore the sequential order of words or their topic coherence, SLRTM gives full characterization to them by using a Recurrent Neural Networks (RNN) based framework. Experimental results have shown that SLRTM outperforms several strong baselines on various tasks. Furthermore, SLRTM can automatically generate sentences given a topic (i.e., topics to sentences), which is a key technology for real world applications such as personalized short text conversation.
Tasks	Short-Text Conversation, Topic Models
Published	2016-04-07
URL	http://arxiv.org/abs/1604.02038v2
PDF	http://arxiv.org/pdf/1604.02038v2.pdf
PWC	https://paperswithcode.com/paper/sentence-level-recurrent-topic-model-letting
Repo
Framework

Using Spatial Pooler of Hierarchical Temporal Memory to classify noisy videos with predefined complexity


Title	Using Spatial Pooler of Hierarchical Temporal Memory to classify noisy videos with predefined complexity
Authors	Maciej Wielgosz, Marcin Pietroń
Abstract	This paper examines the performance of a Spatial Pooler (SP) of a Hierarchical Temporal Memory (HTM) in the task of noisy object recognition. To address this challenge, a dedicated custom-designed system based on the SP, a histogram calculation module and an SVM classifier was implemented. In addition to implementing their own version of HTM, the authors also designed a profiler which is capable of tracing all of the key parameters of the system. This was necessary, since analysis and monitoring of the system performance turned out to be extremely difficult using conventional testing and debugging tools. The system was initially trained on artificially prepared videos without noise and then tested with a set of noisy video streams. This approach was intended to mimic a real-life scenario where an agent or a system trained to deal with ideal objects faces the task of classifying distorted and noisy ones in its regular working conditions. The authors conducted a series of experiments for various macro parameters of HTM SP, as well as for different levels of video reduction ratios. The experiments allowed them to evaluate the performance of two different system setups (i.e. ‘Multiple HTMs’ and ‘Single HTM’) under various noise conditions with 32-frame video files. Results of all the tests were compared to an SVM baseline setup. It was determined that the system featuring SP is capable of achieving approximately 12 times the noise reduction for a video signal with distorted bits accounting for 13% of the total. Furthermore, the system featuring SP also performed better in the experiments without a noise component and achieved a maximum F1 score of 0.96. The experiments also revealed that a rise in the column and synapse number of SP has a substantial impact on the performance of the system. Consequently, the highest F1 score values were obtained for 256 and 4096 synapses and columns respectively.
Tasks	Object Recognition
Published	2016-09-10
URL	http://arxiv.org/abs/1609.03093v2
PDF	http://arxiv.org/pdf/1609.03093v2.pdf
PWC	https://paperswithcode.com/paper/using-spatial-pooler-of-hierarchical-temporal
Repo
Framework

Feature Descriptors for Tracking by Detection: a Benchmark


Title	Feature Descriptors for Tracking by Detection: a Benchmark
Authors	Alessandro Pieropan, Mårten Björkman, Niklas Bergström, Danica Kragic
Abstract	In this paper, we provide an extensive evaluation of the performance of local descriptors for tracking applications. Many different descriptors have been proposed in the literature for a wide range of applications in computer vision such as object recognition and 3D reconstruction. More recently, due to fast key-point detectors, local image features can be used in online tracking frameworks. However, while much effort has been spent on evaluating their performance in terms of distinctiveness and robustness to image transformations, very little has been done in the context of tracking. Our evaluation is performed in terms of distinctiveness, tracking precision and tracking speed. Our results show that binary descriptors like ORB or BRISK have comparable results to SIFT or AKAZE due to a higher number of key-points.
Tasks	3D Reconstruction, Object Recognition
Published	2016-07-21
URL	http://arxiv.org/abs/1607.06178v1
PDF	http://arxiv.org/pdf/1607.06178v1.pdf
PWC	https://paperswithcode.com/paper/feature-descriptors-for-tracking-by-detection
Repo
Framework

Reproducible Pattern Recognition Research: The Case of Optimistic SSL


Title	Reproducible Pattern Recognition Research: The Case of Optimistic SSL
Authors	Jesse H. Krijthe, Marco Loog
Abstract	In this paper, we discuss the approaches we took and trade-offs involved in making a paper on a conceptual topic in pattern recognition research fully reproducible. We discuss our definition of reproducibility, the tools used, how the analysis was set up, show some examples of alternative analyses the code enables and discuss our views on reproducibility.
Tasks
Published	2016-12-27
URL	http://arxiv.org/abs/1612.08650v1
PDF	http://arxiv.org/pdf/1612.08650v1.pdf
PWC	https://paperswithcode.com/paper/reproducible-pattern-recognition-research-the
Repo
Framework

Discrete Distribution Estimation under Local Privacy


Title	Discrete Distribution Estimation under Local Privacy
Authors	Peter Kairouz, Keith Bonawitz, Daniel Ramage
Abstract	The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting wherein service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed K-ary Randomized Response (KRR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of KRR and the existing RAPPOR mechanism at different privacy regimes.
Tasks
Published	2016-02-24
URL	http://arxiv.org/abs/1602.07387v3
PDF	http://arxiv.org/pdf/1602.07387v3.pdf
PWC	https://paperswithcode.com/paper/discrete-distribution-estimation-under-local
Repo
Framework
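
As a concrete illustration of estimating a distribution under local privacy, the sketch below implements basic k-ary randomized response together with its unbiased frequency estimator. It is the plain variant, not the hashed KRR mechanism the abstract introduces, and the function names are hypothetical. Each user reports the true category with probability e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 categories uniformly at random; the server then inverts the known noise to recover the distribution.

```python
import math
import random
from collections import Counter


def krr_probs(k, epsilon):
    """Return (P[report true value], P[report one specific other value])."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def krr_perturb(value, k, epsilon, rng):
    """Randomize one user's category (an int in range(k)) on the client side."""
    p_keep, _ = krr_probs(k, epsilon)
    if rng.random() < p_keep:
        return value
    # Pick uniformly among the k - 1 categories other than the true one.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def krr_estimate(reports, k, epsilon):
    """Unbiased estimate of the true category distribution from noisy reports."""
    p_keep, p_other = krr_probs(k, epsilon)
    n = len(reports)
    counts = Counter(reports)
    return [(counts[v] / n - p_other) / (p_keep - p_other) for v in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0] * 10000 + [1] * 6000 + [2] * 3000 + [3] * 1000
    reports = [krr_perturb(v, 4, 2.0, rng) for v in data]
    print([round(p, 3) for p in krr_estimate(reports, 4, 2.0)])
```

The ratio between the two report probabilities is exactly e^ε, which is what gives the mechanism its ε-local-privacy guarantee. Individual estimates can fall slightly outside [0, 1] on small samples; a production estimator would typically project them back onto the probability simplex.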

On the Sample Complexity of Learning Graphical Games


Title	On the Sample Complexity of Learning Graphical Games
Authors	Jean Honorio
Abstract	We analyze the sample complexity of learning graphical games from purely behavioral data. We assume that we can only observe the players’ joint actions and not their payoffs. We analyze the sufficient and necessary number of samples for the correct recovery of the set of pure-strategy Nash equilibria (PSNE) of the true game. Our analysis focuses on directed graphs with $n$ nodes and at most $k$ parents per node. Sparse graphs correspond to ${k \in O(1)}$ with respect to $n$, while dense graphs correspond to ${k \in O(n)}$. By using VC dimension arguments, we show that if the number of samples is greater than ${O(k n \log^2{n})}$ for sparse graphs or ${O(n^2 \log{n})}$ for dense graphs, then maximum likelihood estimation correctly recovers the PSNE with high probability. By using information-theoretic arguments, we show that if the number of samples is less than ${\Omega(k n \log^2{n})}$ for sparse graphs or ${\Omega(n^2 \log{n})}$ for dense graphs, then any conceivable method fails to recover the PSNE with arbitrary probability.
Tasks
Published	2016-01-27
URL	http://arxiv.org/abs/1601.07243v3
PDF	http://arxiv.org/pdf/1601.07243v3.pdf
PWC	https://paperswithcode.com/paper/on-the-sample-complexity-of-learning
Repo
Framework

Gender and Interest Targeting for Sponsored Post Advertising at Tumblr


Title	Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
Authors	Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan
Abstract	As one of the leading platforms for creative content, Tumblr offers advertisers a unique way of creating brand identity. Advertisers can tell their story through images, animation, text, music, video, and more, and promote that content by sponsoring it to appear as an advertisement in the streams of Tumblr users. In this paper we present a framework that enabled one of the key targeted advertising components for Tumblr, specifically gender and interest targeting. We describe the main challenges involved in development of the framework, which include creating the ground truth for training gender prediction models, as well as mapping Tumblr content to an interest taxonomy. For purposes of inferring user interests we propose a novel semi-supervised neural language model for categorization of Tumblr content (i.e., post tags and post keywords). The model was trained on a large-scale data set consisting of 6.8 billion user posts, with very limited amount of categorized keywords, and was shown to have superior performance over the bag-of-words model. We successfully deployed gender and interest targeting capability in Yahoo production systems, delivering inference for users that cover more than 90% of daily activities at Tumblr. Online performance results indicate advantages of the proposed approach, where we observed 20% lift in user engagement with sponsored posts as compared to untargeted campaigns.
Tasks	Gender Prediction, Language Modelling
Published	2016-06-23
URL	http://arxiv.org/abs/1606.07189v1
PDF	http://arxiv.org/pdf/1606.07189v1.pdf
PWC	https://paperswithcode.com/paper/gender-and-interest-targeting-for-sponsored
Repo
Framework

AMR-to-text generation as a Traveling Salesman Problem


Title	AMR-to-text generation as a Traveling Salesman Problem
Authors	Linfeng Song, Yue Zhang, Xiaochang Peng, Zhiguo Wang, Daniel Gildea
Abstract	The task of AMR-to-text generation is to generate grammatical text that sustains the semantic meaning for a given AMR graph. We attack the task by first partitioning the AMR graph into smaller fragments, and then generating the translation for each fragment, before finally deciding the order by solving an asymmetric generalized traveling salesman problem (AGTSP). A Maximum Entropy classifier is trained to estimate the traveling costs, and a TSP solver is used to find the optimized solution. The final model reports a BLEU score of 22.44 on the SemEval-2016 Task8 dataset.
Tasks	Text Generation
Published	2016-09-23
URL	http://arxiv.org/abs/1609.07451v1
PDF	http://arxiv.org/pdf/1609.07451v1.pdf
PWC	https://paperswithcode.com/paper/amr-to-text-generation-as-a-traveling
Repo
Framework
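
The ordering step in the abstract above can be illustrated with a small sketch that treats fragment ordering as an asymmetric shortest-path problem over all fragments. This is a simplification: it ignores the "generalized" part of AGTSP (choosing one candidate translation per fragment), uses brute-force search instead of a TSP solver, and takes hand-set costs where the paper uses a Maximum Entropy classifier. The function name, fragments, and cost values are all hypothetical.

```python
from itertools import permutations


def best_order(fragments, cost):
    """Find the ordering with the lowest total transition cost.

    cost[(a, b)] is the cost of placing fragment b directly after a.
    Costs are asymmetric: cost[(a, b)] need not equal cost[(b, a)].
    Brute force is only practical for a handful of fragments.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(fragments):
        total = sum(cost[(a, b)] for a, b in zip(perm, perm[1:]))
        if total < best_cost:
            best, best_cost = perm, total
    return list(best), best_cost


if __name__ == "__main__":
    fragments = ["the boy", "wants", "to go"]
    # Hypothetical costs: low for fluent transitions, high otherwise.
    cost = {(a, b): 5.0 for a in fragments for b in fragments if a != b}
    cost[("the boy", "wants")] = 1.0
    cost[("wants", "to go")] = 1.0
    order, total = best_order(fragments, cost)
    print(" ".join(order), total)  # the boy wants to go 2.0
```

Because the search enumerates every permutation, its running time grows factorially with the number of fragments, which is why a dedicated solver is needed at realistic sizes.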