Paper Group ANR 529
Using Artificial Intelligence to Identify State Secrets. Hierarchical Question Answering for Long Documents. Encoding Data for HTM Systems. Improving abcdSAT by At-Least-One Recently Used Clause Management Strategy. Kalman’s shrinkage for wavelet-based despeckling of SAR images. Generalising the Discriminative Restricted Boltzmann Machine. Automati …
Using Artificial Intelligence to Identify State Secrets
Title | Using Artificial Intelligence to Identify State Secrets |
Authors | Renato Rocha Souza, Flavio Codeco Coelho, Rohan Shah, Matthew Connelly |
Abstract | Whether officials can be trusted to protect national security information has become a matter of great public controversy, reigniting a long-standing debate about the scope and nature of official secrecy. The declassification of millions of electronic records has made it possible to analyze these issues with greater rigor and precision. Using machine-learning methods, we examined nearly a million State Department cables from the 1970s to identify features of records that are more likely to be classified, such as international negotiations, military operations, and high-level communications. Even with incomplete data, algorithms can use such features to identify 90% of classified cables with <11% false positives. But our results also show that there are longstanding problems in the identification of sensitive information. Error analysis reveals many examples of both overclassification and underclassification. This indicates both the need for research on inter-coder reliability among officials as to what constitutes classified material and the opportunity to develop recommender systems to better manage both classification and declassification. |
Tasks | Recommendation Systems |
Published | 2016-11-01 |
URL | http://arxiv.org/abs/1611.00356v1 |
http://arxiv.org/pdf/1611.00356v1.pdf | |
PWC | https://paperswithcode.com/paper/using-artificial-intelligence-to-identify |
Repo | |
Framework | |
Hierarchical Question Answering for Long Documents
Title | Hierarchical Question Answering for Long Documents |
Authors | Eunsol Choi, Daniel Hewlett, Alexandre Lacoste, Illia Polosukhin, Jakob Uszkoreit, Jonathan Berant |
Abstract | We present a framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models. While most successful approaches for reading comprehension rely on recurrent neural networks (RNNs), running them over long documents is prohibitively slow because it is difficult to parallelize over sequences. Inspired by how people first skim the document, identify relevant parts, and carefully read these parts to produce an answer, we combine a coarse, fast model for selecting relevant sentences and a more expensive RNN for producing the answer from those sentences. We treat sentence selection as a latent variable trained jointly from the answer only using reinforcement learning. Experiments demonstrate state-of-the-art performance on a challenging subset of WikiReading and on a new dataset, while speeding up the model by 3.5x-6.7x. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01839v2 |
http://arxiv.org/pdf/1611.01839v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-question-answering-for-long |
Repo | |
Framework | |
Encoding Data for HTM Systems
Title | Encoding Data for HTM Systems |
Authors | Scott Purdy |
Abstract | Hierarchical Temporal Memory (HTM) is a biologically inspired machine intelligence technology that mimics the architecture and processes of the neocortex. In this white paper we describe how to encode data as Sparse Distributed Representations (SDRs) for use in HTM systems. We explain several existing encoders, which are available through the open source project called NuPIC, and we discuss requirements for creating encoders for new types of data. |
Tasks | |
Published | 2016-02-18 |
URL | http://arxiv.org/abs/1602.05925v1 |
http://arxiv.org/pdf/1602.05925v1.pdf | |
PWC | https://paperswithcode.com/paper/encoding-data-for-htm-systems |
Repo | |
Framework | |
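The encoding idea in the abstract above can be illustrated with a minimal sketch of a scalar encoder: a value is mapped to a contiguous run of active bits, so that nearby values share many bits and distant values share none. The function names `encode_scalar` and `overlap` and the parameters `n` (total bits) and `w` (active bits) are assumptions chosen for illustration; this is not the NuPIC API.

```python
def encode_scalar(value, min_val, max_val, n=100, w=11):
    """Encode a scalar as n bits with a contiguous run of w active bits."""
    if not (min_val <= value <= max_val):
        raise ValueError("value outside the encoder range")
    # Number of distinct start positions for the run of active bits.
    buckets = n - w + 1
    # Map the value linearly to a start index in [0, buckets - 1].
    start = int(round((value - min_val) / (max_val - min_val) * (buckets - 1)))
    bits = [0] * n
    for j in range(start, start + w):
        bits[j] = 1
    return bits


def overlap(a, b):
    """Count the bits that are active in both representations."""
    return sum(x & y for x, y in zip(a, b))


near = overlap(encode_scalar(50, 0, 100), encode_scalar(51, 0, 100))
far = overlap(encode_scalar(50, 0, 100), encode_scalar(90, 0, 100))
```

In this sketch, semantic similarity shows up as bit overlap: `near` is large because 50 and 51 fall into adjacent buckets, while `far` is zero.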
Improving abcdSAT by At-Least-One Recently Used Clause Management Strategy
Title | Improving abcdSAT by At-Least-One Recently Used Clause Management Strategy |
Authors | Jingchao Chen |
Abstract | We further improve the 2015 version of abcdSAT with various heuristics, such as an at-least-one recently used strategy, learnt clause database approximation reduction, etc. Based on the requirements of the different tracks at the SAT Competition 2016, we develop three versions of abcdSAT: drup, inc and lim, which participate in the main (agile), incremental library and no-limit tracks, respectively. |
Tasks | |
Published | 2016-05-05 |
URL | http://arxiv.org/abs/1605.01622v1 |
http://arxiv.org/pdf/1605.01622v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-abcdsat-by-at-least-one-recently |
Repo | |
Framework | |
Kalman’s shrinkage for wavelet-based despeckling of SAR images
Title | Kalman’s shrinkage for wavelet-based despeckling of SAR images |
Authors | Mario Mastriani, Alberto E. Giraldez |
Abstract | In this paper, a new probability density function (pdf) is proposed to model the statistics of wavelet coefficients, and a simple Kalman’s filter is derived from the new pdf using Bayesian estimation theory. Specifically, we decompose the speckled image into wavelet subbands, we apply the Kalman’s filter to the high subbands, and reconstruct a despeckled image from the modified detail coefficients. Experimental results demonstrate that our method compares favorably to several other despeckling methods on test synthetic aperture radar (SAR) images. |
Tasks | |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00273v1 |
http://arxiv.org/pdf/1608.00273v1.pdf | |
PWC | https://paperswithcode.com/paper/kalmans-shrinkage-for-wavelet-based |
Repo | |
Framework | |
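The three-step pipeline in the abstract above (decompose into wavelet subbands, shrink the detail coefficients, reconstruct) can be sketched as follows. This is a simplified illustration under stated assumptions: it works on a 1D signal of even length with a one-level Haar transform, and it uses plain soft thresholding as a stand-in for the Kalman-based shrinkage rule, which the abstract does not specify. All function names are hypothetical.

```python
import math


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (stand-in shrinkage rule)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(x, t):
    """Decompose, shrink only the detail subband, and reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With `t = 0` the signal is reconstructed unchanged; with a very large `t` every detail coefficient is zeroed and each pair of samples is replaced by its mean.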
Generalising the Discriminative Restricted Boltzmann Machine
Title | Generalising the Discriminative Restricted Boltzmann Machine |
Authors | Srikanth Cherla, Son N Tran, Tillman Weyde, Artur d’Avila Garcez |
Abstract | We present a novel theoretical result that generalises the Discriminative Restricted Boltzmann Machine (DRBM). While originally the DRBM was defined assuming the {0, 1}-Bernoulli distribution in each of its hidden units, this result makes it possible to derive cost functions for variants of the DRBM that utilise other distributions, including some that are often encountered in the literature. This is illustrated with the Binomial and {-1, +1}-Bernoulli distributions here. We evaluate these two DRBM variants and compare them with the original one on three benchmark datasets, namely the MNIST and USPS digit classification datasets, and the 20 Newsgroups document classification dataset. Results show that each of the three compared models outperforms the remaining two in one of the three datasets, thus indicating that the proposed theoretical generalisation of the DRBM may be valuable in practice. |
Tasks | Document Classification |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01806v1 |
http://arxiv.org/pdf/1604.01806v1.pdf | |
PWC | https://paperswithcode.com/paper/generalising-the-discriminative-restricted |
Repo | |
Framework | |
Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection
Title | Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection |
Authors | Maurilio Di Cicco, Ciro Potena, Giorgio Grisetti, Alberto Pretto |
Abstract | Selective weeding is one of the key challenges in the field of agriculture robotics. To accomplish this task, a farm robot should be able to accurately detect plants and to distinguish them between crop and weeds. Most of the promising state-of-the-art approaches make use of appearance-based models trained on large annotated datasets. Unfortunately, creating large agricultural datasets with pixel-level annotations is an extremely time consuming task, actually penalizing the usage of data-driven techniques. In this paper, we face this problem by proposing a novel and effective approach that aims to dramatically minimize the human intervention needed to train the detection and classification algorithms. The idea is to procedurally generate large synthetic training datasets randomizing the key features of the target environment (i.e., crop and weed species, type of soil, light conditions). More specifically, by tuning these model parameters, and exploiting a few real-world textures, it is possible to render a large amount of realistic views of an artificial agricultural scenario with no effort. The generated data can be directly used to train the model or to supplement real-world images. We validate the proposed methodology by using as testbed a modern deep learning based image segmentation architecture. We compare the classification results obtained using both real and synthetic images as training data. The reported results confirm the effectiveness and the potentiality of our approach. |
Tasks | Semantic Segmentation |
Published | 2016-12-09 |
URL | http://arxiv.org/abs/1612.03019v3 |
http://arxiv.org/pdf/1612.03019v3.pdf | |
PWC | https://paperswithcode.com/paper/automatic-model-based-dataset-generation-for |
Repo | |
Framework | |
Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves
Title | Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves |
Authors | Fei Tian, Bin Gao, Di He, Tie-Yan Liu |
Abstract | We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence. Different from conventional topic models that largely ignore the sequential order of words or their topic coherence, SLRTM gives full characterization to them by using a Recurrent Neural Networks (RNN) based framework. Experimental results have shown that SLRTM outperforms several strong baselines on various tasks. Furthermore, SLRTM can automatically generate sentences given a topic (i.e., topics to sentences), which is a key technology for real world applications such as personalized short text conversation. |
Tasks | Short-Text Conversation, Topic Models |
Published | 2016-04-07 |
URL | http://arxiv.org/abs/1604.02038v2 |
http://arxiv.org/pdf/1604.02038v2.pdf | |
PWC | https://paperswithcode.com/paper/sentence-level-recurrent-topic-model-letting |
Repo | |
Framework | |
Using Spatial Pooler of Hierarchical Temporal Memory to classify noisy videos with predefined complexity
Title | Using Spatial Pooler of Hierarchical Temporal Memory to classify noisy videos with predefined complexity |
Authors | Maciej Wielgosz, Marcin Pietroń |
Abstract | This paper examines the performance of a Spatial Pooler (SP) of a Hierarchical Temporal Memory (HTM) in the task of noisy object recognition. To address this challenge, a dedicated custom-designed system based on the SP, histogram calculation module and SVM classifier was implemented. In addition to implementing their own version of HTM, the authors also designed a profiler which is capable of tracing all of the key parameters of the system. This was necessary, since an analysis and monitoring of the system performance turned out to be extremely difficult using conventional testing and debugging tools. The system was initially trained on artificially prepared videos without noise and then tested with a set of noisy video streams. This approach was intended to mimic a real life scenario where an agent or a system trained to deal with ideal objects faces a task of classifying distorted and noisy ones in its regular working conditions. The authors conducted a series of experiments for various macro parameters of HTM SP, as well as for different levels of video reduction ratios. The experiments allowed them to evaluate the performance of two different system setups (i.e. ‘Multiple HTMs’ and ‘Single HTM’) under various noise conditions with 32-frame video files. Results of all the tests were compared to an SVM baseline setup. It was determined that the system featuring SP is capable of achieving approximately 12 times the noise reduction for a video signal with distorted bits accounting for 13% of the total. Furthermore, the system featuring SP also performed better in the experiments without a noise component and achieved a max F1 score of 0.96. The experiments also revealed that a rise of column and synapse number of SP has a substantial impact on the performance of the system. Consequently, the highest F1 score values were obtained for 256 and 4096 synapses and columns respectively. |
Tasks | Object Recognition |
Published | 2016-09-10 |
URL | http://arxiv.org/abs/1609.03093v2 |
http://arxiv.org/pdf/1609.03093v2.pdf | |
PWC | https://paperswithcode.com/paper/using-spatial-pooler-of-hierarchical-temporal |
Repo | |
Framework | |
Feature Descriptors for Tracking by Detection: a Benchmark
Title | Feature Descriptors for Tracking by Detection: a Benchmark |
Authors | Alessandro Pieropan, Mårten Björkman, Niklas Bergström, Danica Kragic |
Abstract | In this paper, we provide an extensive evaluation of the performance of local descriptors for tracking applications. Many different descriptors have been proposed in the literature for a wide range of applications in computer vision such as object recognition and 3D reconstruction. More recently, due to fast key-point detectors, local image features can be used in online tracking frameworks. However, while much effort has been spent on evaluating their performance in terms of distinctiveness and robustness to image transformations, very little has been done in the context of tracking. Our evaluation is performed in terms of distinctiveness, tracking precision and tracking speed. Our results show that binary descriptors like ORB or BRISK have comparable results to SIFT or AKAZE due to a higher number of key-points. |
Tasks | 3D Reconstruction, Object Recognition |
Published | 2016-07-21 |
URL | http://arxiv.org/abs/1607.06178v1 |
http://arxiv.org/pdf/1607.06178v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-descriptors-for-tracking-by-detection |
Repo | |
Framework | |
Reproducible Pattern Recognition Research: The Case of Optimistic SSL
Title | Reproducible Pattern Recognition Research: The Case of Optimistic SSL |
Authors | Jesse H. Krijthe, Marco Loog |
Abstract | In this paper, we discuss the approaches we took and trade-offs involved in making a paper on a conceptual topic in pattern recognition research fully reproducible. We discuss our definition of reproducibility, the tools used, how the analysis was set up, show some examples of alternative analyses the code enables and discuss our views on reproducibility. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08650v1 |
http://arxiv.org/pdf/1612.08650v1.pdf | |
PWC | https://paperswithcode.com/paper/reproducible-pattern-recognition-research-the |
Repo | |
Framework | |
Discrete Distribution Estimation under Local Privacy
Title | Discrete Distribution Estimation under Local Privacy |
Authors | Peter Kairouz, Keith Bonawitz, Daniel Ramage |
Abstract | The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting wherein service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed K-ary Randomized Response (KRR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of KRR and the existing RAPPOR mechanism at different privacy regimes. |
Tasks | |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07387v3 |
http://arxiv.org/pdf/1602.07387v3.pdf | |
PWC | https://paperswithcode.com/paper/discrete-distribution-estimation-under-local |
Repo | |
Framework | |
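As an illustration of the local-privacy setting in the abstract above, here is a minimal sketch of plain k-ary randomized response and its standard unbiased frequency estimator. It does not implement the hashed variant mentioned in the abstract. The function names, the privacy parameter name `epsilon`, and the demo distribution are assumptions for this example.

```python
import math
import random
from collections import Counter


def krr_respond(true_value, k, epsilon, rng):
    """Report the true value with probability e^eps / (e^eps + k - 1);
    otherwise report one of the other k - 1 values uniformly at random."""
    e = math.exp(epsilon)
    if rng.random() < e / (e + k - 1):
        return true_value
    other = rng.randrange(k - 1)
    # Skip over the true value so that only the other values are reported.
    return other if other < true_value else other + 1


def krr_estimate(reports, k, epsilon):
    """Invert E[q_v] = (p_v * (e^eps - 1) + 1) / (e^eps + k - 1) for each v."""
    e = math.exp(epsilon)
    n = len(reports)
    counts = Counter(reports)
    return [((counts.get(v, 0) / n) * (e + k - 1) - 1) / (e - 1) for v in range(k)]


# Demo with a hypothetical 3-category distribution and a fixed seed.
rng = random.Random(0)
true_dist = [0.5, 0.3, 0.2]
data = rng.choices(range(3), weights=true_dist, k=20000)
reports = [krr_respond(v, 3, 2.0, rng) for v in data]
estimates = krr_estimate(reports, 3, 2.0)
```

The service provider only ever sees `reports`, never `data`. The estimates always sum to one by construction, although individual entries can fall slightly outside [0, 1] for small samples.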
On the Sample Complexity of Learning Graphical Games
Title | On the Sample Complexity of Learning Graphical Games |
Authors | Jean Honorio |
Abstract | We analyze the sample complexity of learning graphical games from purely behavioral data. We assume that we can only observe the players’ joint actions and not their payoffs. We analyze the sufficient and necessary number of samples for the correct recovery of the set of pure-strategy Nash equilibria (PSNE) of the true game. Our analysis focuses on directed graphs with $n$ nodes and at most $k$ parents per node. Sparse graphs correspond to ${k \in O(1)}$ with respect to $n$, while dense graphs correspond to ${k \in O(n)}$. By using VC dimension arguments, we show that if the number of samples is greater than ${O(k n \log^2{n})}$ for sparse graphs or ${O(n^2 \log{n})}$ for dense graphs, then maximum likelihood estimation correctly recovers the PSNE with high probability. By using information-theoretic arguments, we show that if the number of samples is less than ${\Omega(k n \log^2{n})}$ for sparse graphs or ${\Omega(n^2 \log{n})}$ for dense graphs, then any conceivable method fails to recover the PSNE with arbitrary probability. |
Tasks | |
Published | 2016-01-27 |
URL | http://arxiv.org/abs/1601.07243v3 |
http://arxiv.org/pdf/1601.07243v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-sample-complexity-of-learning |
Repo | |
Framework | |
Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
Title | Gender and Interest Targeting for Sponsored Post Advertising at Tumblr |
Authors | Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan |
Abstract | As one of the leading platforms for creative content, Tumblr offers advertisers a unique way of creating brand identity. Advertisers can tell their story through images, animation, text, music, video, and more, and promote that content by sponsoring it to appear as an advertisement in the streams of Tumblr users. In this paper we present a framework that enabled one of the key targeted advertising components for Tumblr, specifically gender and interest targeting. We describe the main challenges involved in development of the framework, which include creating the ground truth for training gender prediction models, as well as mapping Tumblr content to an interest taxonomy. For purposes of inferring user interests we propose a novel semi-supervised neural language model for categorization of Tumblr content (i.e., post tags and post keywords). The model was trained on a large-scale data set consisting of 6.8 billion user posts, with very limited amount of categorized keywords, and was shown to have superior performance over the bag-of-words model. We successfully deployed gender and interest targeting capability in Yahoo production systems, delivering inference for users that cover more than 90% of daily activities at Tumblr. Online performance results indicate advantages of the proposed approach, where we observed 20% lift in user engagement with sponsored posts as compared to untargeted campaigns. |
Tasks | Gender Prediction, Language Modelling |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07189v1 |
http://arxiv.org/pdf/1606.07189v1.pdf | |
PWC | https://paperswithcode.com/paper/gender-and-interest-targeting-for-sponsored |
Repo | |
Framework | |
AMR-to-text generation as a Traveling Salesman Problem
Title | AMR-to-text generation as a Traveling Salesman Problem |
Authors | Linfeng Song, Yue Zhang, Xiaochang Peng, Zhiguo Wang, Daniel Gildea |
Abstract | The task of AMR-to-text generation is to generate grammatical text that sustains the semantic meaning for a given AMR graph. We attack the task by first partitioning the AMR graph into smaller fragments, and then generating the translation for each fragment, before finally deciding the order by solving an asymmetric generalized traveling salesman problem (AGTSP). A Maximum Entropy classifier is trained to estimate the traveling costs, and a TSP solver is used to find the optimized solution. The final model reports a BLEU score of 22.44 on the SemEval-2016 Task8 dataset. |
Tasks | Text Generation |
Published | 2016-09-23 |
URL | http://arxiv.org/abs/1609.07451v1 |
http://arxiv.org/pdf/1609.07451v1.pdf | |
PWC | https://paperswithcode.com/paper/amr-to-text-generation-as-a-traveling |
Repo | |
Framework | |
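The ordering step in the abstract above can be sketched as a tiny asymmetric generalized TSP: each fragment is a group of candidate translations, exactly one candidate per group is visited, and the visiting order minimizes the sum of directed transition costs. This sketch uses exhaustive search, which is only feasible for a handful of fragments, and a hand-written cost table in place of the trained classifier. The cost values, the `<s>` and `</s>` boundary markers, and all names are assumptions for illustration.

```python
from itertools import permutations, product

# Hypothetical directed transition costs; unknown pairs get DEFAULT_COST.
COSTS = {
    ("<s>", "the boy"): 0.5,
    ("<s>", "a boy"): 1.0,
    ("the boy", "wants"): 0.5,
    ("a boy", "wants"): 0.5,
    ("wants", "to go"): 0.5,
    ("wants", "going"): 2.0,
    ("to go", "</s>"): 0.5,
    ("going", "</s>"): 0.5,
}
DEFAULT_COST = 5.0


def cost(a, b):
    """Cost of realizing b immediately after a (asymmetric)."""
    return COSTS.get((a, b), DEFAULT_COST)


def best_realization(groups):
    """Pick one candidate per group and an order that minimizes total cost."""
    best_seq, best_cost = None, float("inf")
    for order in permutations(groups):
        for choice in product(*order):
            path = ("<s>",) + choice + ("</s>",)
            total = sum(cost(a, b) for a, b in zip(path, path[1:]))
            if total < best_cost:
                best_seq, best_cost = list(choice), total
    return best_seq, best_cost


# Each inner list holds the candidate translations of one fragment.
groups = [["wants"], ["to go", "going"], ["the boy", "a boy"]]
sentence, total = best_realization(groups)
```

Joining `sentence` with spaces gives the generated text. A dedicated TSP solver would replace the exhaustive loops for graphs of realistic size.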