July 30, 2019

Paper Group AWR 16

Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions

Title Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Authors Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer
Abstract We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an “arguing machines” framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
Tasks Autonomous Driving, Image Classification
Published 2017-10-12
URL http://arxiv.org/abs/1710.04459v2
PDF http://arxiv.org/pdf/1710.04459v2.pdf
PWC https://paperswithcode.com/paper/arguing-machines-human-supervision-of-black
Repo https://github.com/scope-lab-vu/deep-nn-car
Framework tf
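
The disagreement-based arbitration described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `arbitrate` and `run_pipeline` and the `ask_human` callback are hypothetical, and the two systems are represented only by their output labels.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Label = Hashable


def arbitrate(primary: Label, secondary: Label,
              ask_human: Callable[[Label, Label], Label]) -> Tuple[Label, bool]:
    """Return (decision, escalated) for one input.

    If the two black-box systems agree, their shared output is accepted.
    If they disagree, the case is handed to a human supervisor.
    """
    if primary == secondary:
        return primary, False
    return ask_human(primary, secondary), True


def run_pipeline(primary_preds: Sequence[Label], secondary_preds: Sequence[Label],
                 ask_human: Callable[[Label, Label], Label]) -> Tuple[List[Label], int]:
    """Apply arbitration to paired predictions and count escalations."""
    decisions: List[Label] = []
    escalations = 0
    for p, s in zip(primary_preds, secondary_preds):
        decision, escalated = arbitrate(p, s, ask_human)
        decisions.append(decision)
        escalations += int(escalated)
    return decisions, escalations
```

Only the outputs of the two systems are compared, which matches the abstract's point that no knowledge of their internals is required. The number of escalations is the human workload the scheme creates.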

Improving Text Proposals for Scene Images with Fully Convolutional Networks

Title Improving Text Proposals for Scene Images with Fully Convolutional Networks
Authors Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov
Abstract Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.
Tasks Scene Text Recognition
Published 2017-02-16
URL http://arxiv.org/abs/1702.05089v1
PDF http://arxiv.org/pdf/1702.05089v1.pdf
PWC https://paperswithcode.com/paper/improving-text-proposals-for-scene-images
Repo https://github.com/gombru/TextFCN
Framework caffe2

Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs

Title Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs
Authors Alexey A. Novikov, Dimitrios Lenis, David Major, Jiri Hladůvka, Maria Wimmer, Katja Bühler
Abstract The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. We address several open challenges including model overfitting, reducing number of parameters and handling of severely imbalanced data in CXR by fusing recent concepts in convolutional networks and adapting them to the segmentation problem task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization and a large number of high resolution low level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multi-class configuration with three target classes and are trained and tested on the publicly available JSRT database, consisting of 247 X-ray images the ground-truth masks for which are available in the SCR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95.0% for lungs, 86.8% for clavicles and 88.2% for heart. This architecture outperformed the human observer results for lungs and heart.
Tasks Image Classification
Published 2017-01-30
URL http://arxiv.org/abs/1701.08816v4
PDF http://arxiv.org/pdf/1701.08816v4.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-architectures-for-multi
Repo https://github.com/Diganta13/Image-segmentation-by-UNet-Algorithm
Framework none
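
The abstract mentions a Dice-based loss and Jaccard overlap scores. The sketch below computes both overlap measures for binary masks stored as flat lists of 0/1 values. The flat-list representation and the function names are assumptions made for illustration, and the convention of returning 1.0 for two empty masks is one common choice, not something stated in the abstract.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


def jaccard(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| for binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union
```

The two measures are monotonically related by J = D / (2 - D), so a model that improves one also improves the other on a given mask pair.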

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

Title Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Authors Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann
Abstract Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.
Tasks Word Sense Disambiguation
Published 2017-07-21
URL http://arxiv.org/abs/1707.06878v1
PDF http://arxiv.org/pdf/1707.06878v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-knowledge-free-and-interpretable
Repo https://github.com/uhh-lt/wsd
Framework none

Text Generation Based on Generative Adversarial Nets with Latent Variable

Title Text Generation Based on Generative Adversarial Nets with Latent Variable
Authors Heng Wang, Zengchang Qin, Tao Wan
Abstract In this paper, we propose a model using generative adversarial net (GAN) to generate realistic text. Instead of using standard GAN, we combine variational autoencoder (VAE) with generative adversarial net. The use of high-level latent random variables is helpful to learn the data distribution and solve the problem that generative adversarial net always emits the similar data. We propose the VGAN model where the generative model is composed of recurrent neural network and VAE. The discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural network based models, such as recurrent neural network language model and SeqGAN. We evaluate the performance of the model by calculating negative log-likelihood and the BLEU score. We conduct experiments on three benchmark datasets, and results show that our model outperforms other previous models.
Tasks Language Modelling, Text Generation
Published 2017-12-01
URL http://arxiv.org/abs/1712.00170v2
PDF http://arxiv.org/pdf/1712.00170v2.pdf
PWC https://paperswithcode.com/paper/text-generation-based-on-generative
Repo https://github.com/valko073/LyricsGANs
Framework tf

FALKON: An Optimal Large Scale Kernel Method

Title FALKON: An Optimal Large Scale Kernel Method
Authors Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
Abstract Kernel methods provide a principled way to perform nonlinear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points. FALKON is derived by combining several algorithmic principles, namely stochastic subsampling, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. An extensive experimental analysis on large scale datasets shows that, even with a single machine, FALKON outperforms previous state of the art solutions, which exploit parallel/distributed architectures.
Tasks
Published 2017-05-31
URL http://arxiv.org/abs/1705.10958v3
PDF http://arxiv.org/pdf/1705.10958v3.pdf
PWC https://paperswithcode.com/paper/falkon-an-optimal-large-scale-kernel-method
Repo https://github.com/LCSL/FALKON_paper
Framework none

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

Title Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Authors Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra
Abstract We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE trained generative neural dialog models (G) is that they tend to produce ‘safe’ and generic responses (“I don’t know”, “I can’t tell”). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds – the practical usefulness of G and the strong performance of D – via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution – specifically, an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms state-of-the-art on the VisDial dataset by a significant margin (2.67% on recall@10). The source code can be downloaded from https://github.com/jiasenlu/visDial.pytorch.
Tasks Metric Learning, Transfer Learning, Visual Dialog
Published 2017-06-05
URL http://arxiv.org/abs/1706.01554v2
PDF http://arxiv.org/pdf/1706.01554v2.pdf
PWC https://paperswithcode.com/paper/best-of-both-worlds-transferring-knowledge
Repo https://github.com/jiasenlu/visDial.pytorch
Framework pytorch
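
The Gumbel-Softmax sampler mentioned in the abstract can be illustrated without any deep learning framework. The sketch below draws one relaxed sample and its one-hot counterpart. It shows only the forward computation: the function name, the clamp constant and the return convention are assumptions for illustration, and the authors' RNN-based sampler is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def gumbel_softmax(logits: Sequence[float], temperature: float,
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (soft, hard) for one Gumbel-Softmax draw.

    soft: softmax((logits + Gumbel noise) / temperature), a relaxed sample.
    hard: one-hot vector at the argmax of soft, the discrete sample.
    """
    noise = []
    for _ in logits:
        # Gumbel(0, 1) noise by inverse transform; clamp u away from 0
        # so that log(u) is always defined.
        u = max(rng.random(), 1e-12)
        noise.append(-math.log(-math.log(u)))

    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    norm = sum(exps)
    soft = [e / norm for e in exps]

    k = soft.index(max(soft))
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return soft, hard
```

With a straight-through estimator, an autograd framework would use `hard` in the forward pass and the gradient of `soft` in the backward pass. Lower temperatures push `soft` closer to `hard` at the cost of higher gradient variance.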

Automated Latent Fingerprint Recognition

Title Automated Latent Fingerprint Recognition
Authors Kai Cao, Anil K. Jain
Abstract Latent fingerprints are among the most important and widely used types of evidence in law enforcement and forensic agencies worldwide. Yet, NIST evaluations show that the performance of state-of-the-art latent recognition systems is far from satisfactory. An automated latent fingerprint recognition system with high accuracy is essential to compare latents found at crime scenes to a large collection of reference prints to generate a candidate list of possible mates. In this paper, we propose an automated latent fingerprint recognition algorithm that utilizes Convolutional Neural Networks (ConvNets) for ridge flow estimation and minutiae descriptor extraction, and extract complementary templates (two minutiae templates and one texture template) to represent the latent. The comparison scores between the latent and a reference print based on the three templates are fused to retrieve a short candidate list from the reference database. Experimental results show that the rank-1 identification accuracies (query latent is matched with its true mate in the reference database) are 64.7% for the NIST SD27 and 75.3% for the WVU latent databases, against a reference database of 100K rolled prints. These results are the best among published papers on latent recognition and competitive with the performance (66.7% and 70.8% rank-1 accuracies on NIST SD27 and WVU DB, respectively) of a leading COTS latent Automated Fingerprint Identification System (AFIS). By score-level (rank-level) fusion of our system with the commercial off-the-shelf (COTS) latent AFIS, the overall rank-1 identification performance can be improved from 64.7% and 75.3% to 73.3% (74.4%) and 76.6% (78.4%) on NIST SD27 and WVU latent databases, respectively.
Tasks
Published 2017-04-06
URL http://arxiv.org/abs/1704.01925v1
PDF http://arxiv.org/pdf/1704.01925v1.pdf
PWC https://paperswithcode.com/paper/automated-latent-fingerprint-recognition
Repo https://github.com/prip-lab/MSU-LatentAFIS
Framework pytorch
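
Rank-1 identification accuracy and score-level fusion, both reported in the abstract, can be made concrete with a small sketch. The abstract does not state the fusion rule, so the weighted sum used here is an assumption, as are the function names and the dictionary-based score format.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # reference print id -> comparison score


def fuse_scores(scores_a: Scores, scores_b: Scores, weight: float = 0.5) -> Scores:
    """Weighted-sum fusion of two matchers' scores (assumed fusion rule).

    Both matchers must score the same reference ids, and their scores
    are assumed to be on comparable scales already.
    """
    return {ref: weight * scores_a[ref] + (1.0 - weight) * scores_b[ref]
            for ref in scores_a}


def rank1_accuracy(queries: List[Tuple[str, Scores]]) -> float:
    """Fraction of queries whose top-scoring reference is the true mate.

    Each query is (true_mate_id, scores_against_reference_database).
    """
    hits = sum(1 for mate, scores in queries
               if max(scores, key=scores.get) == mate)
    return hits / len(queries)
```

In practice the two matchers' scores would need normalization before summing. Rank-level fusion, also mentioned in the abstract, would combine candidate ranks instead of raw scores.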

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Title Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
Authors Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas
Abstract With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet’s accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate a pre-trained ChemNet that incorporates chemistry domain knowledge, enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.
Tasks Speech Recognition, Transfer Learning
Published 2017-12-07
URL http://arxiv.org/abs/1712.02734v2
PDF http://arxiv.org/pdf/1712.02734v2.pdf
PWC https://paperswithcode.com/paper/using-rule-based-labels-for-weak-supervised
Repo https://github.com/Yindong-Zhang/GraphConvolutionDrugTargetInteration
Framework tf

Dual-Path Convolutional Image-Text Embedding with Instance Loss

Title Dual-Path Convolutional Image-Text Embedding with Instance Loss
Authors Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Yi-Dong Shen
Abstract Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image / text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from the two heterogeneous features to build inter-modal relationship. To address this problem, we propose the instance loss which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image / text group can be viewed as a class. So the network can learn the fine granularity from every image/text group. The experiment shows that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually apply the off-the-shelf features, i.e., word2vec and fixed visual feature. So in a minor contribution, this paper constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to directly learn from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
Tasks Content-Based Image Retrieval, Cross-Modal Retrieval, Person Retrieval, Texture Image Retrieval
Published 2017-11-15
URL http://arxiv.org/abs/1711.05535v3
PDF http://arxiv.org/pdf/1711.05535v3.pdf
PWC https://paperswithcode.com/paper/dual-path-convolutional-image-text-embedding
Repo https://github.com/pshroff04/Dual_Path_CNN
Framework pytorch
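
The ranking loss that the abstract describes as pulling positive image/text pairs together and pushing negatives apart is commonly written as a margin-based hinge. The sketch below is that generic form under assumed names and an assumed margin value. It is not the paper's exact objective and does not include the proposed instance loss.

```python
from typing import Sequence


def ranking_loss(sim_pos: float, sim_negs: Sequence[float],
                 margin: float = 0.2) -> float:
    """Hinge ranking loss for one anchor.

    sim_pos: similarity between the anchor and its matching item.
    sim_negs: similarities between the anchor and non-matching items.
    A negative contributes only if it comes within `margin` of the positive.
    """
    return sum(max(0.0, margin - sim_pos + s) for s in sim_negs)
```

For image-text retrieval this term is typically applied in both directions, once with an image as anchor and once with a sentence as anchor.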

DisSent: Sentence Representation Learning from Explicit Discourse Relations

Title DisSent: Sentence Representation Learning from Explicit Discourse Relations
Authors Allen Nie, Erin D. Bennett, Noah D. Goodman
Abstract Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank’s implicit relation prediction task.
Tasks Dependency Parsing, Representation Learning, Sentence Embeddings
Published 2017-10-12
URL https://arxiv.org/abs/1710.04334v4
PDF https://arxiv.org/pdf/1710.04334v4.pdf
PWC https://paperswithcode.com/paper/dissent-sentence-representation-learning-from
Repo https://github.com/facebookresearch/InferSent
Framework pytorch

HashNet: Deep Learning to Hash by Continuation

Title HashNet: Deep Learning to Hash by Continuation
Authors Zhangjie Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu
Abstract Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the ill-posed gradient difficulty in the optimization with sign activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This work presents HashNet, a novel deep architecture for deep learning to hash by continuation method with convergence guarantees, which learns exactly binary hash codes from imbalanced similarity data. The key idea is to attack the ill-posed gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks.
Tasks Representation Learning
Published 2017-02-02
URL http://arxiv.org/abs/1702.00758v4
PDF http://arxiv.org/pdf/1702.00758v4.pdf
PWC https://paperswithcode.com/paper/hashnet-deep-learning-to-hash-by-continuation
Repo https://github.com/thuml/HashNet
Framework pytorch
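
The continuation idea in the abstract, starting from a smoothed activation and letting it evolve toward the sign function, can be illustrated numerically. The sketch assumes the smoothed activation is tanh(beta * x) with beta increased over training. That specific choice and the helper names are assumptions made here, since the abstract does not name the smoothing function.

```python
import math
from typing import Sequence


def smoothed_sign(x: float, beta: float) -> float:
    """Smooth surrogate for sign(x); sharper as beta grows."""
    return math.tanh(beta * x)


def sign(x: float) -> float:
    """Binary activation with values in {-1, +1}."""
    return 1.0 if x >= 0 else -1.0


def max_gap(values: Sequence[float], beta: float) -> float:
    """Largest distance between the surrogate and sign over `values`."""
    return max(abs(smoothed_sign(v, beta) - sign(v)) for v in values)
```

Evaluating `max_gap` for increasing `beta` shows the surrogate approaching exactly binary outputs, while for small `beta` it stays smooth enough to have useful gradients.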

Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods

Title Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Authors Robert M. Gower, Nicolas Le Roux, Francis Bach
Abstract Our goal is to improve variance reducing stochastic methods through better control variates. We first propose a modification of SVRG which uses the Hessian to track gradients over time, rather than to recondition, increasing the correlation of the control variates and leading to faster theoretical convergence close to the optimum. We then propose accurate and computationally efficient approximations to the Hessian, both using a diagonal and a low-rank matrix. Finally, we demonstrate the effectiveness of our method on a wide range of problems.
Tasks
Published 2017-10-20
URL http://arxiv.org/abs/1710.07462v3
PDF http://arxiv.org/pdf/1710.07462v3.pdf
PWC https://paperswithcode.com/paper/tracking-the-gradients-using-the-hessian-a
Repo https://github.com/gowerrobert/StochOpt
Framework none
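
For context on the control variates the abstract refers to, here is a sketch of plain SVRG on a one-dimensional least-squares problem. It is the standard baseline, not the Hessian-tracking modification the paper proposes. The objective, step size and function names are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (a_i, b_i) for the loss 0.5 * (a_i * w - b_i) ** 2


def grad_i(w: float, a: float, b: float) -> float:
    """Gradient of a single squared-error term with respect to w."""
    return a * (a * w - b)


def full_grad(w: float, data: List[Point]) -> float:
    """Average gradient over the whole dataset."""
    return sum(grad_i(w, a, b) for a, b in data) / len(data)


def svrg(data: List[Point], w0: float, step: float, epochs: int,
         inner_steps: int, rng: random.Random) -> float:
    """Plain SVRG: recompute a full gradient at a snapshot once per epoch."""
    w = w0
    for _ in range(epochs):
        snapshot = w
        mu = full_grad(snapshot, data)
        for _ in range(inner_steps):
            a, b = rng.choice(data)
            # Control variate: the snapshot gradient of the same sample,
            # corrected by the full snapshot gradient so the estimate
            # stays unbiased while its variance shrinks near the optimum.
            g = grad_i(w, a, b) - grad_i(snapshot, a, b) + mu
            w -= step * g
    return w
```

The control variate loses correlation with the current gradient as `w` moves away from the snapshot. The abstract's proposal is to use Hessian information to track the gradients over time and keep that correlation high.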

Translating Neuralese

Title Translating Neuralese
Authors Jacob Andreas, Anca Dragan, Dan Klein
Abstract Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents’ messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language.
Tasks Machine Translation
Published 2017-04-23
URL http://arxiv.org/abs/1704.06960v5
PDF http://arxiv.org/pdf/1704.06960v5.pdf
PWC https://paperswithcode.com/paper/translating-neuralese
Repo https://github.com/jacobandreas/neuralese
Framework tf

Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination

Title Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination
Authors Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret
Abstract The MAP-Elites algorithm produces a set of high-performing solutions that vary according to features defined by the user. This technique has the potential to be a powerful tool for design space exploration, but is limited by the need for numerous evaluations. The Surrogate-Assisted Illumination algorithm (SAIL), introduced here, integrates approximative models and intelligent sampling of the objective function to minimize the number of evaluations required by MAP-Elites. The ability of SAIL to efficiently produce both accurate models and diverse high performing solutions is illustrated on a 2D airfoil design problem. The search space is divided into bins, each holding a design with a different combination of features. In each bin SAIL produces a better performing solution than MAP-Elites, and requires several orders of magnitude fewer evaluations. The CMA-ES algorithm was used to produce an optimal design in each bin: with the same number of evaluations required by CMA-ES to find a near-optimal solution in a single bin, SAIL finds solutions of similar quality in every bin.
Tasks Efficient Exploration
Published 2017-02-13
URL http://arxiv.org/abs/1702.03713v2
PDF http://arxiv.org/pdf/1702.03713v2.pdf
PWC https://paperswithcode.com/paper/data-efficient-exploration-optimization-and
Repo https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework none
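
The MAP-Elites loop that SAIL builds on can be sketched compactly. This is plain MAP-Elites on a toy two-parameter genome with a one-dimensional feature, without the surrogate model or sampling strategy that SAIL adds. The genome encoding, mutation scale and function names are assumptions for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = List[float]


def map_elites(fitness: Callable[[Genome], float],
               feature: Callable[[Genome], float],
               n_bins: int, n_iters: int, rng: random.Random,
               n_init: int = 20) -> Dict[int, Tuple[float, Genome]]:
    """Return an archive mapping feature bin -> (fitness, genome) of its elite.

    `feature` must return a value in [0, 1]; genomes are two values in [0, 1].
    """
    archive: Dict[int, Tuple[float, Genome]] = {}

    def try_insert(genome: Genome) -> None:
        score = fitness(genome)
        bin_index = min(int(feature(genome) * n_bins), n_bins - 1)
        # Keep the candidate only if its bin is empty or it beats the elite.
        if bin_index not in archive or score > archive[bin_index][0]:
            archive[bin_index] = (score, genome)

    for _ in range(n_init):  # random initial population
        try_insert([rng.random(), rng.random()])

    for _ in range(n_iters):  # mutate a randomly chosen elite
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in parent]
        try_insert(child)

    return archive
```

Every call to `fitness` here is a true evaluation, which is the cost the abstract says SAIL reduces by consulting an approximate model instead.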