Paper Group AWR 16
Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Title | Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions |
Authors | Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer |
Abstract | We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an “arguing machines” framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision. |
Tasks | Autonomous Driving, Image Classification |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04459v2, http://arxiv.org/pdf/1710.04459v2.pdf |
PWC | https://paperswithcode.com/paper/arguing-machines-human-supervision-of-black |
Repo | https://github.com/scope-lab-vu/deep-nn-car |
Framework | tf |
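The disagreement rule described in the abstract above can be illustrated with a small sketch. It assumes both systems emit comparable discrete decisions and that a human oracle is available; `arbitrate`, `ask_human`, and the toy labels are hypothetical names for illustration and are not taken from the paper.

```python
def arbitrate(primary_preds, secondary_preds, ask_human):
    """Keep the primary decision when both systems agree; otherwise escalate.

    ask_human is a hypothetical callback that returns the human decision
    for the sample at a given index.
    """
    decisions = []
    escalated = 0
    for i, (p, s) in enumerate(zip(primary_preds, secondary_preds)):
        if p == s:
            decisions.append(p)
        else:
            escalated += 1
            decisions.append(ask_human(i))
    return decisions, escalated


def error_rate(preds, labels):
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


# Toy data (made up): the primary system errs on samples 2 and 4,
# the secondary system errs only on sample 4.
labels = ["cat", "dog", "cat", "bird", "dog", "cat"]
primary = ["cat", "dog", "dog", "bird", "cat", "cat"]
secondary = ["cat", "dog", "cat", "bird", "cat", "cat"]

# The human is assumed to be always correct in this toy setting.
decisions, escalated = arbitrate(primary, secondary, lambda i: labels[i])
print(escalated, error_rate(primary, labels), error_rate(decisions, labels))
```

In this toy run only the disagreement on sample 2 reaches the human. The error on sample 4, where both systems agree on the wrong answer, is not caught, which is why the independence of the secondary system matters.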
Improving Text Proposals for Scene Images with Fully Convolutional Networks
Title | Improving Text Proposals for Scene Images with Fully Convolutional Networks |
Authors | Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov |
Abstract | Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art. |
Tasks | Scene Text Recognition |
Published | 2017-02-16 |
URL | http://arxiv.org/abs/1702.05089v1, http://arxiv.org/pdf/1702.05089v1.pdf |
PWC | https://paperswithcode.com/paper/improving-text-proposals-for-scene-images |
Repo | https://github.com/gombru/TextFCN |
Framework | caffe2 |
Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs
Title | Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs |
Authors | Alexey A. Novikov, Dimitrios Lenis, David Major, Jiri Hladůvka, Maria Wimmer, Katja Bühler |
Abstract | The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs (CXR), namely lungs, clavicles and heart. We address several open challenges, including model overfitting, reducing the number of parameters and handling severely imbalanced data in CXR, by fusing recent concepts in convolutional networks and adapting them to the segmentation task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization and a large number of high resolution low level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multi-class configuration with three target classes and are trained and tested on the publicly available JSRT database, consisting of 247 X-ray images whose ground-truth masks are available in the SCR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95.0% for lungs, 86.8% for clavicles and 88.2% for heart. This architecture outperformed the human observer results for lungs and heart. |
Tasks | Image Classification |
Published | 2017-01-30 |
URL | http://arxiv.org/abs/1701.08816v4, http://arxiv.org/pdf/1701.08816v4.pdf |
PWC | https://paperswithcode.com/paper/fully-convolutional-architectures-for-multi |
Repo | https://github.com/Diganta13/Image-segmentation-by-UNet-Algorithm |
Framework | none |
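The abstract above mentions a Dice-based training loss and Jaccard overlap as the evaluation score. A minimal sketch of both on flattened binary masks follows. The smoothing constant `eps` and the function names are assumptions, and the paper's exact multi-class formulation may differ.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Soft Dice overlap between two flattened masks with values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def dice_loss(pred, target):
    """Loss that decreases as the overlap grows."""
    return 1.0 - dice_coefficient(pred, target)


def jaccard(pred_mask, target_mask):
    """Intersection over union for binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, target_mask) if p or t)
    return intersection / union if union else 1.0


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
d = dice_coefficient(pred, target)
j = jaccard(pred, target)
print(d, j)
```

For binary masks the two scores are related by J = D / (2 - D), so a model trained on Dice is also pushed toward a high Jaccard score.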
Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Title | Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation |
Authors | Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli, Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann |
Abstract | Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration. |
Tasks | Word Sense Disambiguation |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06878v1, http://arxiv.org/pdf/1707.06878v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-knowledge-free-and-interpretable |
Repo | https://github.com/uhh-lt/wsd |
Framework | none |
Text Generation Based on Generative Adversarial Nets with Latent Variable
Title | Text Generation Based on Generative Adversarial Nets with Latent Variable |
Authors | Heng Wang, Zengchang Qin, Tao Wan |
Abstract | In this paper, we propose a model using a generative adversarial net (GAN) to generate realistic text. Instead of using a standard GAN, we combine a variational autoencoder (VAE) with a generative adversarial net. The use of high-level latent random variables helps to learn the data distribution and to address the problem that a generative adversarial net always emits similar data. We propose the VGAN model, where the generative model is composed of a recurrent neural network and a VAE. The discriminative model is a convolutional neural network. We train the model via policy gradient. We apply the proposed model to the task of text generation and compare it to other recent neural network based models, such as the recurrent neural network language model and SeqGAN. We evaluate the performance of the model by calculating negative log-likelihood and the BLEU score. We conduct experiments on three benchmark datasets, and results show that our model outperforms other previous models. |
Tasks | Language Modelling, Text Generation |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00170v2, http://arxiv.org/pdf/1712.00170v2.pdf |
PWC | https://paperswithcode.com/paper/text-generation-based-on-generative |
Repo | https://github.com/valko073/LyricsGANs |
Framework | tf |
FALKON: An Optimal Large Scale Kernel Method
Title | FALKON: An Optimal Large Scale Kernel Method |
Authors | Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco |
Abstract | Kernel methods provide a principled way to perform nonlinear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points. FALKON is derived by combining several algorithmic principles, namely stochastic subsampling, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. An extensive experimental analysis on large scale datasets shows that, even with a single machine, FALKON outperforms previous state of the art solutions, which exploit parallel/distributed architectures. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1705.10958v3, http://arxiv.org/pdf/1705.10958v3.pdf |
PWC | https://paperswithcode.com/paper/falkon-an-optimal-large-scale-kernel-method |
Repo | https://github.com/LCSL/FALKON_paper |
Framework | none |
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Title | Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model |
Authors | Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra |
Abstract | We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE trained generative neural dialog models (G) is that they tend to produce ‘safe’ and generic responses (“I don’t know”, “I can’t tell”). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds – the practical usefulness of G and the strong performance of D – via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss of the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution – specifically, an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms state-of-the-art on the VisDial dataset by a significant margin (2.67% on recall@10). The source code can be downloaded from https://github.com/jiasenlu/visDial.pytorch. |
Tasks | Metric Learning, Transfer Learning, Visual Dialog |
Published | 2017-06-05 |
URL | http://arxiv.org/abs/1706.01554v2, http://arxiv.org/pdf/1706.01554v2.pdf |
PWC | https://paperswithcode.com/paper/best-of-both-worlds-transferring-knowledge |
Repo | https://github.com/jiasenlu/visDial.pytorch |
Framework | pytorch |
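The Gumbel-Softmax sampler with a straight-through estimator named in the abstract above can be sketched in plain Python. This shows the forward pass only and is not the authors' implementation. The logits, temperature, and function names are illustrative assumptions. In an autograd framework, the hard one-hot vector would be used in the forward pass while gradients flow through the soft sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed (soft) one-hot sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def straight_through(soft):
    """Discretize the soft sample to a one-hot vector (forward pass only)."""
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


rng = random.Random(0)
soft = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
hard = straight_through(soft)
print(soft, hard)
```

Lower temperatures push the soft sample closer to one-hot, at the cost of higher-variance gradients.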
Automated Latent Fingerprint Recognition
Title | Automated Latent Fingerprint Recognition |
Authors | Kai Cao, Anil K. Jain |
Abstract | Latent fingerprints are among the most important and widely used sources of evidence in law enforcement and forensic agencies worldwide. Yet, NIST evaluations show that the performance of state-of-the-art latent recognition systems is far from satisfactory. An automated latent fingerprint recognition system with high accuracy is essential to compare latents found at crime scenes to a large collection of reference prints to generate a candidate list of possible mates. In this paper, we propose an automated latent fingerprint recognition algorithm that utilizes Convolutional Neural Networks (ConvNets) for ridge flow estimation and minutiae descriptor extraction, and extracts complementary templates (two minutiae templates and one texture template) to represent the latent. The comparison scores between the latent and a reference print based on the three templates are fused to retrieve a short candidate list from the reference database. Experimental results show that the rank-1 identification accuracies (query latent is matched with its true mate in the reference database) are 64.7% for the NIST SD27 and 75.3% for the WVU latent databases, against a reference database of 100K rolled prints. These results are the best among published papers on latent recognition and competitive with the performance (66.7% and 70.8% rank-1 accuracies on NIST SD27 and WVU DB, respectively) of a leading commercial off-the-shelf (COTS) latent Automated Fingerprint Identification System (AFIS). By score-level (rank-level) fusion of our system with the COTS latent AFIS, the overall rank-1 identification performance can be improved from 64.7% and 75.3% to 73.3% (74.4%) and 76.6% (78.4%) on NIST SD27 and WVU latent databases, respectively. |
Tasks | |
Published | 2017-04-06 |
URL | http://arxiv.org/abs/1704.01925v1, http://arxiv.org/pdf/1704.01925v1.pdf |
PWC | https://paperswithcode.com/paper/automated-latent-fingerprint-recognition |
Repo | https://github.com/prip-lab/MSU-LatentAFIS |
Framework | pytorch |
Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction
Title | Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction |
Authors | Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas |
Abstract | With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet’s accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate that a pre-trained ChemNet that incorporates chemistry domain knowledge enables the development of generalizable neural networks for more accurate prediction of novel chemical properties. |
Tasks | Speech Recognition, Transfer Learning |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02734v2, http://arxiv.org/pdf/1712.02734v2.pdf |
PWC | https://paperswithcode.com/paper/using-rule-based-labels-for-weak-supervised |
Repo | https://github.com/Yindong-Zhang/GraphConvolutionDrugTargetInteration |
Framework | tf |
Dual-Path Convolutional Image-Text Embedding with Instance Loss
Title | Dual-Path Convolutional Image-Text Embedding with Instance Loss |
Authors | Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Yi-Dong Shen |
Abstract | Matching images and sentences demands a fine understanding of both modalities. In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image / text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss is hard for network learning, since it starts from the two heterogeneous features to build the inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image / text group can be viewed as a class, so the network can learn the fine granularity from every image / text group. The experiment shows that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. In addition, existing works usually apply off-the-shelf features, i.e., word2vec and fixed visual features. So in a minor contribution, this paper constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to directly learn from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available. |
Tasks | Content-Based Image Retrieval, Cross-Modal Retrieval, Person Retrieval, Texture Image Retrieval |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05535v3, http://arxiv.org/pdf/1711.05535v3.pdf |
PWC | https://paperswithcode.com/paper/dual-path-convolutional-image-text-embedding |
Repo | https://github.com/pshroff04/Dual_Path_CNN |
Framework | pytorch |
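The two losses contrasted in the abstract above can be sketched as follows. This is a simplified reading: the ranking loss is shown in one direction only (image to text) with cosine similarity and a hypothetical margin of 0.2, and the instance loss is shown as softmax cross-entropy over instance identifiers. The paper's exact formulation may differ.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranking_loss(image, pos_text, neg_text, margin=0.2):
    """Hinge loss: the matching text should score higher than a non-matching one."""
    return max(0.0, margin - cosine(image, pos_text) + cosine(image, neg_text))


def instance_loss(logits, instance_id):
    """Cross-entropy where every image/text group is treated as its own class."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[instance_id]


good = ranking_loss([1.0, 0.0], pos_text=[1.0, 0.0], neg_text=[0.0, 1.0])
bad = ranking_loss([1.0, 0.0], pos_text=[0.0, 1.0], neg_text=[1.0, 0.0])
uniform = instance_loss([0.0, 0.0], instance_id=0)
print(good, bad, uniform)
```

The ranking loss only compares pairs across modalities, while the instance loss gives each embedding a classification target of its own, which is the intra-modal signal the abstract refers to.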
DisSent: Sentence Representation Learning from Explicit Discourse Relations
Title | DisSent: Sentence Representation Learning from Explicit Discourse Relations |
Authors | Allen Nie, Erin D. Bennett, Noah D. Goodman |
Abstract | Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank’s implicit relation prediction task. |
Tasks | Dependency Parsing, Representation Learning, Sentence Embeddings |
Published | 2017-10-12 |
URL | https://arxiv.org/abs/1710.04334v4, https://arxiv.org/pdf/1710.04334v4.pdf |
PWC | https://paperswithcode.com/paper/dissent-sentence-representation-learning-from |
Repo | https://github.com/facebookresearch/InferSent |
Framework | pytorch |
HashNet: Deep Learning to Hash by Continuation
Title | HashNet: Deep Learning to Hash by Continuation |
Authors | Zhangjie Cao, Mingsheng Long, Jianmin Wang, Philip S. Yu |
Abstract | Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the ill-posed gradient difficulty in the optimization with sign activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This work presents HashNet, a novel deep architecture for deep learning to hash by continuation method with convergence guarantees, which learns exactly binary hash codes from imbalanced similarity data. The key idea is to attack the ill-posed gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks. |
Tasks | Representation Learning |
Published | 2017-02-02 |
URL | http://arxiv.org/abs/1702.00758v4, http://arxiv.org/pdf/1702.00758v4.pdf |
PWC | https://paperswithcode.com/paper/hashnet-deep-learning-to-hash-by-continuation |
Repo | https://github.com/thuml/HashNet |
Framework | pytorch |
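The continuation idea in the abstract above, starting from a smoothed activation and letting it evolve toward the sign function, can be illustrated numerically. The sketch assumes `tanh(beta * z)` as the smoothed activation and a hypothetical schedule of `beta` values; the abstract itself names neither.

```python
import math


def smoothed_sign(z, beta):
    """Smooth surrogate that approaches sign(z) as beta grows."""
    return math.tanh(beta * z)


def sign(z):
    return 1.0 if z >= 0 else -1.0


def max_gap(activations, beta):
    """Largest deviation between the surrogate and exactly binary codes."""
    return max(abs(smoothed_sign(z, beta) - sign(z)) for z in activations)


activations = [-1.5, -0.05, 0.3, 2.0]  # made-up pre-activation values
betas = [1, 10, 100, 1000]             # hypothetical continuation schedule
gaps = [max_gap(activations, beta) for beta in betas]
print(gaps)
```

Early in the schedule the surrogate has useful gradients everywhere. By the end its outputs are numerically indistinguishable from binary codes, so no separate binarization step is needed.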
Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods
Title | Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods |
Authors | Robert M. Gower, Nicolas Le Roux, Francis Bach |
Abstract | Our goal is to improve variance reducing stochastic methods through better control variates. We first propose a modification of SVRG which uses the Hessian to track gradients over time, rather than to recondition, increasing the correlation of the control variates and leading to faster theoretical convergence close to the optimum. We then propose accurate and computationally efficient approximations to the Hessian, both using a diagonal and a low-rank matrix. Finally, we demonstrate the effectiveness of our method on a wide range of problems. |
Tasks | |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07462v3, http://arxiv.org/pdf/1710.07462v3.pdf |
PWC | https://paperswithcode.com/paper/tracking-the-gradients-using-the-hessian-a |
Repo | https://github.com/gowerrobert/StochOpt |
Framework | none |
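One way to read "using the Hessian to track gradients" in the abstract above is as a first-order Taylor correction of the SVRG control variate. The sketch below compares the plain SVRG estimator with such a Hessian-tracked variant on a one-dimensional logistic loss. The data, the reference point, and the exact form of the estimator are assumptions made for illustration, and the exact Hessian is used rather than the diagonal or low-rank approximations the authors propose.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def grad_i(w, x, y):
    """Derivative of log(1 + exp(-y * x * w)) with respect to w."""
    return -y * x * sigmoid(-y * x * w)


def hess_i(w, x, y):
    """Second derivative of the same loss (y is +1 or -1)."""
    s = sigmoid(-y * x * w)
    return x * x * s * (1.0 - s)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def estimators(w, w_ref, data):
    """Per-sample gradient estimates at w, using a snapshot taken at w_ref."""
    g_ref = [grad_i(w_ref, x, y) for x, y in data]
    h_ref = [hess_i(w_ref, x, y) for x, y in data]
    g_bar, h_bar = mean(g_ref), mean(h_ref)
    step = w - w_ref
    plain, tracked = [], []
    for (x, y), g0, h0 in zip(data, g_ref, h_ref):
        g = grad_i(w, x, y)
        plain.append(g - g0 + g_bar)  # standard SVRG
        tracked.append(g - (g0 + h0 * step) + (g_bar + h_bar * step))
    return plain, tracked


data = [(0.5, 1), (-1.2, -1), (2.0, 1), (0.3, -1), (-0.7, 1)]  # made-up samples
w_ref, w = 0.20, 0.25
plain, tracked = estimators(w, w_ref, data)
full_grad = mean([grad_i(w, x, y) for x, y in data])
print(variance(plain), variance(tracked))
```

Both estimators average to the full gradient. Near the reference point the tracked control variate follows each sample's gradient more closely, so its variance is much smaller in this toy setting.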
Translating Neuralese
Title | Translating Neuralese |
Authors | Jacob Andreas, Anca Dragan, Dan Klein |
Abstract | Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel. While these policies are effective for many tasks, interpretation of their induced communication strategies has remained a challenge. Here we propose to interpret agents’ messages by translating them. Unlike in typical machine translation problems, we have no parallel data to learn from. Instead we develop a translation model based on the insight that agent messages and natural language strings mean the same thing if they induce the same belief about the world in a listener. We present theoretical guarantees and empirical evidence that our approach preserves both the semantics and pragmatics of messages by ensuring that players communicating through a translation layer do not suffer a substantial loss in reward relative to players with a common language. |
Tasks | Machine Translation |
Published | 2017-04-23 |
URL | http://arxiv.org/abs/1704.06960v5, http://arxiv.org/pdf/1704.06960v5.pdf |
PWC | https://paperswithcode.com/paper/translating-neuralese |
Repo | https://github.com/jacobandreas/neuralese |
Framework | tf |
Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination
Title | Data-Efficient Exploration, Optimization, and Modeling of Diverse Designs through Surrogate-Assisted Illumination |
Authors | Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret |
Abstract | The MAP-Elites algorithm produces a set of high-performing solutions that vary according to features defined by the user. This technique has the potential to be a powerful tool for design space exploration, but is limited by the need for numerous evaluations. The Surrogate-Assisted Illumination algorithm (SAIL), introduced here, integrates approximative models and intelligent sampling of the objective function to minimize the number of evaluations required by MAP-Elites. The ability of SAIL to efficiently produce both accurate models and diverse high performing solutions is illustrated on a 2D airfoil design problem. The search space is divided into bins, each holding a design with a different combination of features. In each bin SAIL produces a better performing solution than MAP-Elites, and requires several orders of magnitude fewer evaluations. The CMA-ES algorithm was used to produce an optimal design in each bin: with the same number of evaluations required by CMA-ES to find a near-optimal solution in a single bin, SAIL finds solutions of similar quality in every bin. |
Tasks | Efficient Exploration |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03713v2, http://arxiv.org/pdf/1702.03713v2.pdf |
PWC | https://paperswithcode.com/paper/data-efficient-exploration-optimization-and |
Repo | https://github.com/DanieleGravina/divergence-and-quality-diversity |
Framework | none |
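For context, the baseline MAP-Elites loop that SAIL builds on can be sketched on a toy problem. This is plain MAP-Elites only: the surrogate model and sampling strategy that SAIL adds are not shown. The two-parameter design, the fitness and feature functions, and all hyperparameters are made up for illustration.

```python
import random


def map_elites(fitness, feature, n_bins, n_iters, rng, n_init=20, sigma=0.1):
    """Keep the best solution found so far in each feature bin."""
    archive = {}  # bin index -> (fitness, solution)
    for it in range(n_iters):
        if it < n_init:
            # Seed the archive with random designs in the unit square.
            child = (rng.random(), rng.random())
        else:
            # Mutate a randomly chosen elite, clipped to the unit square.
            _, parent = rng.choice(list(archive.values()))
            child = tuple(
                min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in parent
            )
        b = min(n_bins - 1, int(feature(child) * n_bins))
        f = fitness(child)
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, child)
    return archive


def toy_fitness(design):
    return -(design[1] - 0.5) ** 2  # best when the second parameter is 0.5


def toy_feature(design):
    return design[0]  # the first parameter defines the bin


archive = map_elites(toy_fitness, toy_feature, n_bins=10, n_iters=2000,
                     rng=random.Random(0))
for b in sorted(archive):
    print(b, archive[b])
```

Every call to `fitness` here counts as one evaluation. When each evaluation is an expensive simulation, that count is the cost the abstract says a surrogate is meant to reduce.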