July 30, 2019

Paper Group AWR 16

Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions. Improving Text Proposals for Scene Images with Fully Convolutional Networks. Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs. Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation. Text Generation Ba …

Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions


Title	Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Authors	Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer
Abstract	We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an “arguing machines” framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
Tasks	Autonomous Driving, Image Classification
Published	2017-10-12
URL	http://arxiv.org/abs/1710.04459v2
PDF	http://arxiv.org/pdf/1710.04459v2.pdf
PWC	https://paperswithcode.com/paper/arguing-machines-human-supervision-of-black
Repo	https://github.com/scope-lab-vu/deep-nn-car
Framework	tf
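
The disagreement rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `arbitrate`, `primary_model`, `secondary_model`, and `human_label` are hypothetical names, and the toy "models" simply classify integers as odd or even.

```python
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def arbitrate(
    primary: Callable[[X], Y],
    secondary: Callable[[X], Y],
    ask_human: Callable[[X], Y],
    x: X,
) -> Tuple[Y, bool]:
    """Return (decision, escalated); escalate only when the two systems disagree."""
    a = primary(x)
    b = secondary(x)
    if a == b:
        return a, False  # agreement: accept the automated decision
    return ask_human(x), True  # disagreement: defer to the human supervisor


# Hypothetical stand-ins: both "models" label integers as odd (1) or even (0),
# but the secondary one always answers 0 for inputs >= 5.
def primary_model(n: int) -> int:
    return n % 2


def secondary_model(n: int) -> int:
    return n % 2 if n < 5 else 0


def human_label(n: int) -> int:
    return n % 2


inputs = list(range(8))
decisions = [arbitrate(primary_model, secondary_model, human_label, n) for n in inputs]
escalated = [n for n, (_, flag) in zip(inputs, decisions) if flag]
print("inputs escalated to the human:", escalated)
```

Neither model is inspected internally; only their outputs are compared, which is what lets the scheme treat both systems as black boxes.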

Improving Text Proposals for Scene Images with Fully Convolutional Networks


Title	Improving Text Proposals for Scene Images with Fully Convolutional Networks
Authors	Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov
Abstract	Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.
Tasks	Scene Text Recognition
Published	2017-02-16
URL	http://arxiv.org/abs/1702.05089v1
PDF	http://arxiv.org/pdf/1702.05089v1.pdf
PWC	https://paperswithcode.com/paper/improving-text-proposals-for-scene-images
Repo	https://github.com/gombru/TextFCN
Framework	caffe2

Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs


Title	Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs
Authors	Alexey A. Novikov, Dimitrios Lenis, David Major, Jiri Hladůvka, Maria Wimmer, Katja Bühler
Abstract	The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs (CXR), namely for lungs, clavicles and heart. We address several open challenges including model overfitting, reducing the number of parameters and handling of severely imbalanced data in CXR by fusing recent concepts in convolutional networks and adapting them to the segmentation task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization and a large number of high resolution low level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multi-class configuration with three target classes and are trained and tested on the publicly available JSRT database, consisting of 247 X-ray images whose ground-truth masks are available in the SCR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95.0% for lungs, 86.8% for clavicles and 88.2% for heart. This architecture outperformed the human observer results for lungs and heart.
Tasks	Image Classification
Published	2017-01-30
URL	http://arxiv.org/abs/1701.08816v4
PDF	http://arxiv.org/pdf/1701.08816v4.pdf
PWC	https://paperswithcode.com/paper/fully-convolutional-architectures-for-multi
Repo	https://github.com/Diganta13/Image-segmentation-by-UNet-Algorithm
Framework	none
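
The abstract trains with a Dice-based loss but reports Jaccard overlap, so it may help to see how the two measures relate. The sketch below computes both for flattened binary masks; `dice_and_jaccard` and the toy masks are illustrative assumptions, and the paper's exact loss formulation is not reproduced here.

```python
from typing import Sequence, Tuple


def dice_and_jaccard(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Dice coefficient and Jaccard index for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size_pred = sum(1 for p in pred if p)
    size_truth = sum(1 for t in truth if t)
    union = size_pred + size_truth - inter
    if union == 0:
        return 1.0, 1.0  # convention chosen here: two empty masks overlap perfectly
    dice = 2.0 * inter / (size_pred + size_truth)
    jaccard = inter / union
    return dice, jaccard


pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]
dice, jaccard = dice_and_jaccard(pred_mask, true_mask)
print(f"Dice = {dice:.3f}, Jaccard = {jaccard:.3f}")
```

For any pair of masks the two scores satisfy J = D / (2 - D), so they rank segmentations identically even though the numeric values differ.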

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation


Title	Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Authors	Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann
Abstract	Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.
Tasks	Word Sense Disambiguation
Published	2017-07-21
URL	http://arxiv.org/abs/1707.06878v1
PDF	http://arxiv.org/pdf/1707.06878v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-knowledge-free-and-interpretable
Repo	https://github.com/uhh-lt/wsd
Framework	none

Text Generation Based on Generative Adversarial Nets with Latent Variable


Title	Text Generation Based on Generative Adversarial Nets with Latent Variable
Authors	Heng Wang, Zengchang Qin, Tao Wan
Abstract	In this paper, we propose a model using a generative adversarial net (GAN) to generate realistic text. Instead of using a standard GAN, we combine a variational autoencoder (VAE) with a generative adversarial net. The use of high-level latent random variables helps to learn the data distribution and to address the problem that a generative adversarial net always emits similar data. We propose the VGAN model, where the generative model is composed of a recurrent neural network and a VAE. The discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural network based models, such as a recurrent neural network language model and SeqGAN. We evaluate the performance of the model by calculating negative log-likelihood and the BLEU score. We conduct experiments on three benchmark datasets, and results show that our model outperforms other previous models.
Tasks	Language Modelling, Text Generation
Published	2017-12-01
URL	http://arxiv.org/abs/1712.00170v2
PDF	http://arxiv.org/pdf/1712.00170v2.pdf
PWC	https://paperswithcode.com/paper/text-generation-based-on-generative
Repo	https://github.com/valko073/LyricsGANs
Framework	tf

FALKON: An Optimal Large Scale Kernel Method


Title	FALKON: An Optimal Large Scale Kernel Method
Authors	Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco
Abstract	Kernel methods provide a principled way to perform nonlinear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points. FALKON is derived by combining several algorithmic principles, namely stochastic subsampling, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. An extensive experimental analysis on large scale datasets shows that, even with a single machine, FALKON outperforms previous state of the art solutions, which exploit parallel/distributed architectures.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1705.10958v3
PDF	http://arxiv.org/pdf/1705.10958v3.pdf
PWC	https://paperswithcode.com/paper/falkon-an-optimal-large-scale-kernel-method
Repo	https://github.com/LCSL/FALKON_paper
Framework	none

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model


Title	Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Authors	Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra
Abstract	We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE-trained generative neural dialog models (G) is that they tend to produce ‘safe’ and generic responses (“I don’t know”, “I can’t tell”). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds – the practical usefulness of G and the strong performance of D – via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution – specifically, an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms state-of-the-art on the VisDial dataset by a significant margin (2.67% on recall@10). The source code can be downloaded from https://github.com/jiasenlu/visDial.pytorch.
Tasks	Metric Learning, Transfer Learning, Visual Dialog
Published	2017-06-05
URL	http://arxiv.org/abs/1706.01554v2
PDF	http://arxiv.org/pdf/1706.01554v2.pdf
PWC	https://paperswithcode.com/paper/best-of-both-worlds-transferring-knowledge
Repo	https://github.com/jiasenlu/visDial.pytorch
Framework	pytorch
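
The Gumbel-Softmax sampler mentioned in the abstract can be illustrated without a deep learning framework. The following is a forward-pass sketch only, under the assumption of a single categorical draw; `gumbel_softmax` is a hypothetical helper and is not taken from the linked repository.

```python
import math
import random
from typing import List, Tuple


def gumbel_softmax(
    logits: List[float], temperature: float, rng: random.Random
) -> Tuple[List[float], List[int]]:
    """Draw one relaxed (soft) sample and its one-hot (hard) counterpart."""
    # Gumbel(0, 1) noise by inverse transform; clamp u away from 0 so log stays finite.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    soft = [e / total for e in exps]
    # Straight-through idea: the forward pass would use the one-hot vector,
    # while an autograd framework would backpropagate through `soft`.
    k = max(range(len(soft)), key=lambda i: soft[i])
    hard = [1 if i == k else 0 for i in range(len(soft))]
    return soft, hard


soft, hard = gumbel_softmax([1.0, 2.0, 0.5], temperature=0.5, rng=random.Random(0))
print("soft sample:", [round(p, 3) for p in soft])
print("hard sample:", hard)
```

Lower temperatures push the soft sample closer to one-hot, at the cost of higher-variance gradients in a real training setup.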

Automated Latent Fingerprint Recognition


Title	Automated Latent Fingerprint Recognition
Authors	Kai Cao, Anil K. Jain
Abstract	Latent fingerprints are among the most important and widely used forms of evidence in law enforcement and forensic agencies worldwide. Yet, NIST evaluations show that the performance of state-of-the-art latent recognition systems is far from satisfactory. An automated latent fingerprint recognition system with high accuracy is essential to compare latents found at crime scenes to a large collection of reference prints to generate a candidate list of possible mates. In this paper, we propose an automated latent fingerprint recognition algorithm that utilizes Convolutional Neural Networks (ConvNets) for ridge flow estimation and minutiae descriptor extraction, and extracts complementary templates (two minutiae templates and one texture template) to represent the latent. The comparison scores between the latent and a reference print based on the three templates are fused to retrieve a short candidate list from the reference database. Experimental results show that the rank-1 identification accuracies (query latent is matched with its true mate in the reference database) are 64.7% for the NIST SD27 and 75.3% for the WVU latent databases, against a reference database of 100K rolled prints. These results are the best among published papers on latent recognition and competitive with the performance (66.7% and 70.8% rank-1 accuracies on NIST SD27 and WVU DB, respectively) of a leading COTS latent Automated Fingerprint Identification System (AFIS). By score-level (rank-level) fusion of our system with the commercial off-the-shelf (COTS) latent AFIS, the overall rank-1 identification performance can be improved from 64.7% and 75.3% to 73.3% (74.4%) and 76.6% (78.4%) on NIST SD27 and WVU latent databases, respectively.
Tasks
Published	2017-04-06
URL	http://arxiv.org/abs/1704.01925v1
PDF	http://arxiv.org/pdf/1704.01925v1.pdf
PWC	https://paperswithcode.com/paper/automated-latent-fingerprint-recognition
Repo	https://github.com/prip-lab/MSU-LatentAFIS
Framework	pytorch
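
Score-level fusion followed by candidate-list retrieval can be sketched as below. The weighted-sum rule, the function name `fuse_and_rank`, and all scores are assumptions made for illustration; the abstract does not state which fusion rule the authors use.

```python
from typing import Dict, List, Sequence


def fuse_and_rank(
    template_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
    top_k: int,
) -> List[str]:
    """Fuse per-template comparison scores by weighted sum and return the top-k reference IDs."""
    fused: Dict[str, float] = {}
    for scores, weight in zip(template_scores, weights):
        for ref_id, score in scores.items():
            fused[ref_id] = fused.get(ref_id, 0.0) + weight * score
    ranked = sorted(fused, key=lambda ref_id: fused[ref_id], reverse=True)
    return ranked[:top_k]


# Made-up scores of one latent against three reference prints, for
# two minutiae templates and one texture template.
minutiae_1 = {"A": 0.9, "B": 0.4, "C": 0.1}
minutiae_2 = {"A": 0.2, "B": 0.8, "C": 0.3}
texture = {"A": 0.7, "B": 0.5, "C": 0.2}

candidates = fuse_and_rank([minutiae_1, minutiae_2, texture], [1.0, 1.0, 1.0], top_k=2)
print("candidate list:", candidates)
```

A rank-1 hit in this setting means the true mate of the latent is the first entry of the returned list.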

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction


Title	Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
Authors	Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas
Abstract	With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet’s accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.
Tasks	Speech Recognition, Transfer Learning
Published	2017-12-07
URL	http://arxiv.org/abs/1712.02734v2
PDF	http://arxiv.org/pdf/1712.02734v2.pdf
PWC	https://paperswithcode.com/paper/using-rule-based-labels-for-weak-supervised
Repo	https://github.com/Yindong-Zhang/GraphConvolutionDrugTargetInteration
Framework	tf

Dual-Path Convolutional Image-Text Embedding with Instance Loss


Title	Dual-Path Convolutional Image-Text Embedding with Instance Loss
Authors	Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Yi-Dong Shen
Abstract	Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image / text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from the two heterogeneous features to build inter-modal relationship. To address this problem, we propose the instance loss which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image / text group can be viewed as a class. So the network can learn the fine granularity from every image/text group. The experiment shows that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually apply the off-the-shelf features, i.e., word2vec and fixed visual feature. So in a minor contribution, this paper constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to directly learn from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
Tasks	Content-Based Image Retrieval, Cross-Modal Retrieval, Person Retrieval, Texture Image Retrieval
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05535v3
PDF	http://arxiv.org/pdf/1711.05535v3.pdf
PWC	https://paperswithcode.com/paper/dual-path-convolutional-image-text-embedding
Repo	https://github.com/pshroff04/Dual_Path_CNN
Framework	pytorch

DisSent: Sentence Representation Learning from Explicit Discourse Relations


Title	DisSent: Sentence Representation Learning from Explicit Discourse Relations
Authors	Allen Nie, Erin D. Bennett, Noah D. Goodman
Abstract	Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank’s implicit relation prediction task.
Tasks	Dependency Parsing, Representation Learning, Sentence Embeddings
Published	2017-10-12
URL	https://arxiv.org/abs/1710.04334v4
PDF	https://arxiv.org/pdf/1710.04334v4.pdf
PWC	https://paperswithcode.com/paper/dissent-sentence-representation-learning-from
Repo	https://github.com/facebookresearch/InferSent
Framework	pytorch

HashNet: Deep Learning to Hash by Continuation


Title	HashNet: Deep Learning to Hash by Continuation
Authors	Zhangjie Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu
Abstract	Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the ill-posed gradient difficulty in the optimization with sign activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This work presents HashNet, a novel deep architecture for deep learning to hash by continuation method with convergence guarantees, which learns exactly binary hash codes from imbalanced similarity data. The key idea is to attack the ill-posed gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks.
Tasks	Representation Learning
Published	2017-02-02
URL	http://arxiv.org/abs/1702.00758v4
PDF	http://arxiv.org/pdf/1702.00758v4.pdf
PWC	https://paperswithcode.com/paper/hashnet-deep-learning-to-hash-by-continuation
Repo	https://github.com/thuml/HashNet
Framework	pytorch
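
The continuation idea in the abstract, starting from a smooth activation and gradually recovering the sign function, can be seen numerically. This sketch assumes a `tanh(beta * x)` smoothing with a growing scale `beta`; the schedule values and helper names are illustrative, not the repository's settings.

```python
import math


def smoothed_sign(x: float, beta: float) -> float:
    """Smooth surrogate for sign(x); approaches sign(x) as beta grows."""
    return math.tanh(beta * x)


def sign(x: float) -> float:
    return 1.0 if x >= 0 else -1.0


activation = 0.3
betas = [1.0, 4.0, 16.0, 64.0]  # hypothetical continuation schedule
gaps = [abs(sign(activation) - smoothed_sign(activation, b)) for b in betas]
for b, gap in zip(betas, gaps):
    print(f"beta = {b:5.1f}  |sign - tanh| = {gap:.6f}")
```

Training on the early, smoother stages keeps gradients informative, while the late stages make the outputs effectively binary without a separate binarization step.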

Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods


Title	Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Authors	Robert M. Gower, Nicolas Le Roux, Francis Bach
Abstract	Our goal is to improve variance reducing stochastic methods through better control variates. We first propose a modification of SVRG which uses the Hessian to track gradients over time, rather than to recondition, increasing the correlation of the control variates and leading to faster theoretical convergence close to the optimum. We then propose accurate and computationally efficient approximations to the Hessian, both using a diagonal and a low-rank matrix. Finally, we demonstrate the effectiveness of our method on a wide range of problems.
Tasks
Published	2017-10-20
URL	http://arxiv.org/abs/1710.07462v3
PDF	http://arxiv.org/pdf/1710.07462v3.pdf
PWC	https://paperswithcode.com/paper/tracking-the-gradients-using-the-hessian-a
Repo	https://github.com/gowerrobert/StochOpt
Framework	none
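
To make the control-variate idea concrete, the sketch below compares a plain SVRG gradient estimate with one whose control variate is updated by a Hessian-vector term, on a toy one-dimensional finite sum. The problem data and function names are assumptions for illustration, the exact Hessian is used rather than the diagonal or low-rank approximations the abstract proposes, and the quadratic setting is the extreme case where the tracked control variate matches each gradient exactly.

```python
# Toy finite sum: f_i(x) = 0.5 * A[i] * (x - B[i]) ** 2, so derivatives are closed-form.
A = [1.0, 2.0, 4.0]
B = [0.0, 1.0, -1.0]
N = len(A)


def grad_i(i: int, x: float) -> float:
    return A[i] * (x - B[i])


def hess_i(i: int) -> float:
    return A[i]


def full_grad(x: float) -> float:
    return sum(grad_i(j, x) for j in range(N)) / N


def svrg_estimate(i: int, x: float, x_ref: float) -> float:
    """Standard SVRG: the control variate is the gradient frozen at x_ref."""
    return grad_i(i, x) - grad_i(i, x_ref) + full_grad(x_ref)


def hessian_tracked_estimate(i: int, x: float, x_ref: float) -> float:
    """Control variate follows the gradient via a first-order Taylor step from x_ref."""
    step = x - x_ref
    tracked_i = grad_i(i, x_ref) + hess_i(i) * step
    tracked_mean = sum(grad_i(j, x_ref) + hess_i(j) * step for j in range(N)) / N
    return grad_i(i, x) - tracked_i + tracked_mean


x, x_ref = 0.7, 0.2
target = full_grad(x)
svrg_errors = [abs(svrg_estimate(i, x, x_ref) - target) for i in range(N)]
tracked_errors = [abs(hessian_tracked_estimate(i, x, x_ref) - target) for i in range(N)]
print("SVRG errors:           ", [round(e, 4) for e in svrg_errors])
print("Hessian-tracked errors:", [round(e, 4) for e in tracked_errors])
```

Both estimators are unbiased when `i` is drawn uniformly; the difference lies in how much each single-sample estimate deviates from the full gradient as `x` moves away from `x_ref`.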

Translating Neuralese


Title	Translating Neuralese
Authors	Jacob Andreas, Anca Dragan, Dan Klein
Abstract	Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents’ messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language.
Tasks	Machine Translation
Published	2017-04-23
URL	http://arxiv.org/abs/1704.06960v5
PDF	http://arxiv.org/pdf/1704.06960v5.pdf
PWC	https://paperswithcode.com/paper/translating-neuralese
Repo	https://github.com/jacobandreas/neuralese
Framework	tf
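
The belief-matching criterion in the abstract can be sketched as choosing the natural language string whose induced listener belief is closest to the belief induced by the agent's message. Using KL divergence over a small discrete state space is an assumption of this sketch, as are the function names and the toy belief vectors.

```python
import math
from typing import Dict, List


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; q is clamped to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def translate(message_belief: List[float], candidate_beliefs: Dict[str, List[float]]) -> str:
    """Return the candidate string whose induced belief best matches the message's."""
    return min(
        candidate_beliefs,
        key=lambda s: kl_divergence(message_belief, candidate_beliefs[s]),
    )


# Made-up listener beliefs over three world states.
neuralese_belief = [0.7, 0.2, 0.1]
english_beliefs = {
    "go left": [0.8, 0.1, 0.1],
    "go right": [0.1, 0.1, 0.8],
}
translation = translate(neuralese_belief, english_beliefs)
print("translation:", translation)
```

No parallel corpus is needed in this formulation, since the two messages are compared only through the beliefs they induce.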

Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination


Title	Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination
Authors	Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret
Abstract	The MAP-Elites algorithm produces a set of high-performing solutions that vary according to features defined by the user. This technique has the potential to be a powerful tool for design space exploration, but is limited by the need for numerous evaluations. The Surrogate-Assisted Illumination algorithm (SAIL), introduced here, integrates approximative models and intelligent sampling of the objective function to minimize the number of evaluations required by MAP-Elites. The ability of SAIL to efficiently produce both accurate models and diverse high performing solutions is illustrated on a 2D airfoil design problem. The search space is divided into bins, each holding a design with a different combination of features. In each bin SAIL produces a better performing solution than MAP-Elites, and requires several orders of magnitude fewer evaluations. The CMA-ES algorithm was used to produce an optimal design in each bin: with the same number of evaluations required by CMA-ES to find a near-optimal solution in a single bin, SAIL finds solutions of similar quality in every bin.
Tasks	Efficient Exploration
Published	2017-02-13
URL	http://arxiv.org/abs/1702.03713v2
PDF	http://arxiv.org/pdf/1702.03713v2.pdf
PWC	https://paperswithcode.com/paper/data-efficient-exploration-optimization-and
Repo	https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework	none
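
The binning scheme described in the abstract is easiest to see in plain MAP-Elites, which SAIL builds on. The sketch below is a minimal MAP-Elites loop on a made-up two-parameter problem; it does not include SAIL's surrogate models, and the objective, feature, and hyperparameters are all hypothetical.

```python
import random
from typing import Dict, Tuple

Genome = Tuple[float, float]


def fitness(genome: Genome) -> float:
    """Hypothetical objective: best value 0 is reached at y = 0.5."""
    return -((genome[1] - 0.5) ** 2)


def feature_bin(genome: Genome, n_bins: int) -> int:
    """User-defined feature: the first parameter, discretized into n_bins bins."""
    return min(int(genome[0] * n_bins), n_bins - 1)


def clamp01(v: float) -> float:
    return min(1.0, max(0.0, v))


def map_elites(
    n_bins: int = 10, n_init: int = 20, n_iters: int = 2000, sigma: float = 0.1, seed: int = 0
) -> Dict[int, Tuple[Genome, float]]:
    rng = random.Random(seed)
    archive: Dict[int, Tuple[Genome, float]] = {}

    def try_insert(genome: Genome) -> None:
        b = feature_bin(genome, n_bins)
        f = fitness(genome)
        # Keep the candidate only if its bin is empty or it beats the current elite.
        if b not in archive or f > archive[b][1]:
            archive[b] = (genome, f)

    for _ in range(n_init):  # random initialization
        try_insert((rng.random(), rng.random()))
    for _ in range(n_iters):  # select a random elite, mutate it, try to insert the child
        parent = archive[rng.choice(sorted(archive))][0]
        child = (
            clamp01(parent[0] + rng.gauss(0.0, sigma)),
            clamp01(parent[1] + rng.gauss(0.0, sigma)),
        )
        try_insert(child)
    return archive


archive = map_elites()
for b in sorted(archive):
    genome, f = archive[b]
    print(f"bin {b}: x = {genome[0]:.2f}, y = {genome[1]:.2f}, fitness = {f:.5f}")
```

Every call to `fitness` here stands in for one expensive evaluation, which is the cost that a surrogate-assisted variant aims to reduce.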