Paper Group AWR 71
Learning SO(3) Equivariant Representations with Spherical CNNs. Open-World Knowledge Graph Completion. Should Robots be Obedient?. Learning a Generative Model for Validity in Complex Discrete Structures. Trainable Greedy Decoding for Neural Machine Translation. Wavelet Convolutional Neural Networks for Texture Classification. Fast Meta-Learning for Adaptive Hierarchical Classifier Design. End-to-end optimization of goal-driven and visually grounded dialogue systems. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Visual Feature Attribution using Wasserstein GANs. Action-dependent Control Variates for Policy Optimization via Stein’s Identity. Deep Image Prior. NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs. MACA: A Modular Architecture for Conversational Agents. DARLA: Improving Zero-Shot Transfer in Reinforcement Learning.
Learning SO(3) Equivariant Representations with Spherical CNNs
Title | Learning SO(3) Equivariant Representations with Spherical CNNs |
Authors | Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, Kostas Daniilidis |
Abstract | We address the problem of 3D rotation equivariance in convolutional neural networks. 3D rotations have been a challenging nuisance in 3D classification tasks, requiring higher capacity and extended data augmentation to tackle them. We model 3D data with multi-valued spherical functions and propose a novel spherical convolutional network that implements exact convolutions on the sphere by realizing them in the spherical harmonic domain. The resulting filters have local symmetry and are localized by enforcing smooth spectra. We apply a novel pooling in the spectral domain, and our operations are independent of the underlying spherical resolution throughout the network. We show that networks with much lower capacity and no data augmentation can achieve performance comparable to the state of the art on standard retrieval and classification benchmarks. |
Tasks | Data Augmentation |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06721v3 |
http://arxiv.org/pdf/1711.06721v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-so3-equivariant-representations-with |
Repo | https://github.com/aidinhass/tf-sphcnn |
Framework | tf |
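The spectral operations the abstract describes are easy to sketch: convolution with a zonal (rotation-symmetric) filter becomes a per-degree multiplication of spherical-harmonic coefficients, and pooling becomes bandwidth truncation. Below is a minimal NumPy illustration of those two ideas, with hypothetical function names and the per-degree normalization constants omitted; it is not the authors' code.

```python
import numpy as np

def zonal_filter(f_lm, h_l):
    """Spectral convolution with a zonal (rotation-symmetric) filter.

    f_lm : list of length L, where f_lm[l] holds the 2l+1 complex
           spherical-harmonic coefficients of degree l.
    h_l  : array of length L with one filter coefficient per degree;
           a smooth profile over l yields a spatially localized filter.
    (Per-degree normalization constants are omitted for clarity.)
    """
    return [f_lm[l] * h_l[l] for l in range(len(f_lm))]

def spectral_pool(f_lm, new_L):
    """Pooling in the spectral domain: truncate to bandwidth new_L.

    Dropping high degrees is a resolution-independent low-pass step,
    so later layers need not know the input grid size.
    """
    return f_lm[:new_L]

# Toy usage: a random band-limited signal, filtered then pooled.
L = 16
rng = np.random.default_rng(0)
f_lm = [rng.standard_normal(2 * l + 1) + 1j * rng.standard_normal(2 * l + 1)
        for l in range(L)]
h_l = np.exp(-0.05 * np.arange(L) ** 2)   # smooth spectrum -> localized filter
g_lm = spectral_pool(zonal_filter(f_lm, h_l), L // 2)
print(len(g_lm), g_lm[3].shape)           # 8 degrees, (7,) coefficients
```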
Open-World Knowledge Graph Completion
Title | Open-World Knowledge Graph Completion |
Authors | Baoxu Shi, Tim Weninger |
Abstract | Knowledge Graphs (KGs) have been applied to many tasks including Web search, link prediction, recommendation, natural language processing, and entity linking. However, most KGs are far from complete and are growing at a rapid pace. To address these problems, Knowledge Graph Completion (KGC) has been proposed to improve KGs by filling in their missing connections. Unlike existing methods which hold a closed-world assumption, i.e., where KGs are fixed and new entities cannot be easily added, in the present work we relax this assumption and propose a new open-world KGC task. As a first attempt to solve this task, we introduce an open-world KGC model called ConMask. This model learns embeddings of an entity’s name and parts of its text description to connect unseen entities to the KG. To mitigate the presence of noisy text descriptions, ConMask uses relationship-dependent content masking to extract relevant snippets and then trains a fully convolutional neural network to fuse the extracted snippets with entities in the KG. Experiments on large datasets, both old and new, show that ConMask performs well on the open-world KGC task and even outperforms existing KGC models on the standard closed-world KGC task. |
Tasks | Entity Linking, Knowledge Graph Completion, Knowledge Graphs, Link Prediction |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03438v1 |
http://arxiv.org/pdf/1711.03438v1.pdf | |
PWC | https://paperswithcode.com/paper/open-world-knowledge-graph-completion |
Repo | https://github.com/bxshi/ConMask |
Framework | tf |
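The relationship-dependent content masking step can be illustrated with a simplified similarity-based stand-in: score each description word against the relationship name and zero out the rest. The paper's actual masking scheme is more elaborate, so treat this NumPy sketch (all names hypothetical) as the idea only.

```python
import numpy as np

def content_mask(desc_vecs, rel_vecs, keep=0.5):
    """Simplified relationship-dependent content masking.

    desc_vecs : (n_words, d) embeddings of the entity description.
    rel_vecs  : (m_words, d) embeddings of the relationship name.
    Each description word is scored by its best cosine similarity to
    any relationship word; the lowest-scoring words are zeroed out so
    downstream layers see only relation-relevant snippets.
    """
    d = desc_vecs / np.linalg.norm(desc_vecs, axis=1, keepdims=True)
    r = rel_vecs / np.linalg.norm(rel_vecs, axis=1, keepdims=True)
    scores = (d @ r.T).max(axis=1)              # (n_words,)
    cutoff = np.quantile(scores, 1.0 - keep)    # keep the top fraction
    return desc_vecs * (scores >= cutoff)[:, None]

rng = np.random.default_rng(0)
masked = content_mask(rng.standard_normal((30, 64)),
                      rng.standard_normal((3, 64)))
print(masked.shape, int(masked.any(axis=1).sum()), "words kept")
```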
Should Robots be Obedient?
Title | Should Robots be Obedient? |
Authors | Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell |
Abstract | Intuitively, obedience – following the order that a human gives – seems like a good property for a robot to have. But we humans are not perfect, and we may give orders that are not well aligned with our preferences. We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human’s underlying preferences can always perform better than a robot that simply follows the human’s literal orders. Thus, there is a tradeoff between the obedience of a robot and the value it can attain for its owner. We investigate how this tradeoff is impacted by the way the robot infers the human’s preferences, showing that some methods err more on the side of obedience than others. We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or of the level of rationality of the human. Finally, we study how robots can start detecting such model misspecification. Overall, our work suggests that there might be a middle ground in which robots intelligently decide when to obey human orders, but err on the side of obedience. |
Tasks | |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.09990v1 |
http://arxiv.org/pdf/1705.09990v1.pdf | |
PWC | https://paperswithcode.com/paper/should-robots-be-obedient |
Repo | https://github.com/smilli/obedience |
Framework | none |
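A toy simulation makes the obedience tradeoff concrete: a Boltzmann-rational human (rationality beta) orders one of two actions, and a robot with its own noisy utility estimate either obeys or acts on a sampled posterior over preferences. The setup and all parameters below are illustrative, not the paper's model; at low beta inference tends to beat obedience, and the gap closes as the human becomes more rational.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(beta, obs_noise=0.5, trials=2000, n_post=2000):
    """Average value attained by an obedient robot vs. an inferring one."""
    obey_val, infer_val = 0.0, 0.0
    for _ in range(trials):
        u = rng.standard_normal(2)                    # true utilities
        p_order = np.exp(beta * u) / np.exp(beta * u).sum()
        order = rng.choice(2, p=p_order)              # noisy human order
        est = u + obs_noise * rng.standard_normal(2)  # robot's own evidence
        # Importance-sample the posterior mean utility given both signals.
        u_s = rng.standard_normal((n_post, 2))        # prior samples
        w = np.exp(beta * u_s[:, order]) / np.exp(beta * u_s).sum(axis=1)
        w *= np.exp(-((u_s - est) ** 2).sum(axis=1) / (2 * obs_noise ** 2))
        post_mean = (w[:, None] * u_s).sum(axis=0) / w.sum()
        obey_val += u[order]                          # follow the order
        infer_val += u[post_mean.argmax()]            # act on inference
    return obey_val / trials, infer_val / trials

for beta in (0.5, 2.0, 10.0):
    print(beta, simulate(beta))
```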
Learning a Generative Model for Validity in Complex Discrete Structures
Title | Learning a Generative Model for Validity in Complex Discrete Structures |
Authors | David Janz, Jos van der Westhuizen, Brooks Paige, Matt J. Kusner, José Miguel Hernández-Lobato |
Abstract | Deep generative models have been successfully used to learn representations for high-dimensional discrete spaces by representing discrete objects as sequences and employing powerful sequence-based deep models. Unfortunately, these sequence-based models often produce invalid sequences: sequences which do not represent any underlying discrete structure; invalid sequences hinder the utility of such models. As a step towards solving this problem, we propose to learn a deep recurrent validator model, which can estimate whether a partial sequence can function as the beginning of a full, valid sequence. This validator provides insight as to how individual sequence elements influence the validity of the overall sequence, and can be used to constrain sequence-based models to generate valid sequences – and thus faithfully model discrete objects. Our approach is inspired by reinforcement learning, where an oracle which can evaluate validity of complete sequences provides a sparse reward signal. We demonstrate its effectiveness as a generative model of Python 3 source code for mathematical expressions, and in improving the ability of a variational autoencoder trained on SMILES strings to decode valid molecular structures. |
Tasks | |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01664v4 |
http://arxiv.org/pdf/1712.01664v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-generative-model-for-validity-in |
Repo | https://github.com/DavidJanz/molecule_grammar_rnn |
Framework | pytorch |
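The mechanism is easy to demonstrate on balanced parentheses, with a hand-coded oracle standing in for the learned recurrent validator: at each step the generator's next-token distribution is multiplied by a validity mask, so only prefixes of valid strings are ever sampled. A minimal sketch, with an artificial uniform "generator" and hypothetical names:

```python
import numpy as np

TOKENS = ["(", ")", "<eos>"]

def valid_next(prefix):
    """Hand-coded oracle standing in for the learned recurrent validator:
    a 0/1 mask over tokens that can still lead to a balanced string.
    (It ignores the length budget, so max-length samples may truncate.)"""
    depth = prefix.count("(") - prefix.count(")")
    return np.array([
        1.0,                                    # "(" always extendable
        1.0 if depth > 0 else 0.0,              # ")" needs an open paren
        1.0 if depth == 0 and prefix else 0.0,  # <eos> only when balanced
    ])

def sample_valid(max_len=12, seed=None):
    """Sample from a generator (uniform here) masked by the validator."""
    rng = np.random.default_rng(seed)
    out = ""
    for _ in range(max_len):
        probs = np.ones(len(TOKENS)) * valid_next(out)
        probs /= probs.sum()
        tok = TOKENS[rng.choice(len(TOKENS), p=probs)]
        if tok == "<eos>":
            break
        out += tok
    return out

print([sample_valid(seed=i) for i in range(5)])
```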
Trainable Greedy Decoding for Neural Machine Translation
Title | Trainable Greedy Decoding for Neural Machine Translation |
Authors | Jiatao Gu, Kyunghyun Cho, Victor O. K. Li |
Abstract | Recent research in neural machine translation has largely focused on two aspects: neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we focus solely on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding algorithm for any specific decoding objective, we propose the idea of a trainable decoding algorithm, in which we train a decoding algorithm to find a translation that maximizes an arbitrary decoding objective. More specifically, we design an actor that observes and manipulates the hidden state of the neural machine translation decoder, and we propose to train it using a variant of deterministic policy gradient. We extensively evaluate the proposed algorithm using four language pairs and two decoding objectives, and show that we can indeed train a trainable greedy decoder that generates a better translation (in terms of a target decoding objective) with minimal computational overhead. |
Tasks | Machine Translation |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02429v1 |
http://arxiv.org/pdf/1702.02429v1.pdf | |
PWC | https://paperswithcode.com/paper/trainable-greedy-decoding-for-neural-machine |
Repo | https://github.com/kyunghyuncho/rl-pong |
Framework | pytorch |
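Where the actor plugs in can be shown in a few lines of PyTorch: a small deterministic network nudges the decoder's hidden state before the frozen output projection, and decoding stays greedy. The shapes and the one-layer MLP below are illustrative choices, not the paper's architecture; training with the deterministic-policy-gradient variant is omitted.

```python
import torch
import torch.nn as nn

class GreedyActor(nn.Module):
    """Deterministic actor that nudges the decoder hidden state before
    the output projection; the pretrained NMT model stays frozen."""
    def __init__(self, hidden_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, hidden_size))

    def forward(self, h):
        return h + self.net(h)   # manipulate, don't replace, the state

hidden, vocab = 256, 10000
decoder_proj = nn.Linear(hidden, vocab)      # stand-in for the NMT output layer
decoder_proj.requires_grad_(False)           # pretrained model is frozen
actor = GreedyActor(hidden)

h_t = torch.randn(1, hidden)                 # decoder state at step t
token = decoder_proj(actor(h_t)).argmax(-1)  # greedy pick on the nudged state
print(token)
```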
Wavelet Convolutional Neural Networks for Texture Classification
Title | Wavelet Convolutional Neural Networks for Texture Classification |
Authors | Shin Fujieda, Kohei Takayama, Toshiya Hachisuka |
Abstract | Texture classification is an important and challenging problem in many image processing applications. While convolutional neural networks (CNNs) have achieved significant success in image classification, texture classification remains a difficult problem since textures usually do not contain enough information regarding the shape of objects. In image processing, texture classification has traditionally been studied with spectral analyses, which exploit the repeated structures in many textures. Since CNNs process images as-is in the spatial domain whereas spectral analyses process images in the frequency domain, these models have different performance characteristics. We propose a novel CNN architecture, wavelet CNNs, which integrates a spectral analysis into CNNs. Our insight is that the pooling layer and the convolution layer can be viewed as a limited form of a spectral analysis. Based on this insight, we generalize both layers to perform a spectral analysis with the wavelet transform. Wavelet CNNs allow us to utilize spectral information which is lost in conventional CNNs but useful in texture classification. Experiments demonstrate that our model achieves better accuracy in texture classification than existing models. We also show that our model has significantly fewer parameters than conventional CNNs, making it easier to train with less memory. |
Tasks | Image Classification, Texture Classification |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07394v1 |
http://arxiv.org/pdf/1707.07394v1.pdf | |
PWC | https://paperswithcode.com/paper/wavelet-convolutional-neural-networks-for |
Repo | https://github.com/shinfj/WaveletCNN_for_TextureClassification |
Framework | caffe2 |
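The core building block, generalizing pooling to a spectral analysis, can be sketched as a one-level Haar transform implemented with strided convolution: the low-pass band is what average pooling keeps, and the detail bands are what wavelet CNNs feed back in as extra channels. A minimal PyTorch sketch (Haar chosen for brevity):

```python
import torch
import torch.nn.functional as F

def haar_decompose(x):
    """One level of a 2D Haar wavelet transform via strided convolution.

    x: (B, 1, H, W) with even H, W. Returns (B, 4, H/2, W/2): the
    low-pass LL band (what average pooling keeps) plus the LH/HL/HH
    detail bands that ordinary pooling discards.
    """
    lo = torch.tensor([1.0, 1.0]) / 2
    hi = torch.tensor([1.0, -1.0]) / 2
    kernels = torch.stack([
        torch.outer(lo, lo), torch.outer(lo, hi),
        torch.outer(hi, lo), torch.outer(hi, hi)]).unsqueeze(1)  # (4,1,2,2)
    return F.conv2d(x, kernels, stride=2)

x = torch.randn(1, 1, 32, 32)
bands = haar_decompose(x)
print(bands.shape)  # torch.Size([1, 4, 16, 16])
# Multiresolution input: decompose the LL band again for the next level.
print(haar_decompose(bands[:, :1]).shape)
```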
Fast Meta-Learning for Adaptive Hierarchical Classifier Design
Title | Fast Meta-Learning for Adaptive Hierarchical Classifier Design |
Authors | Gerrit J. J. van den Burg, Alfred O. Hero |
Abstract | We propose a new splitting criterion for a meta-learning approach to multiclass classifier design that adaptively merges the classes into a tree-structured hierarchy of increasingly difficult binary classification problems. The classification tree is constructed from empirical estimates of the Henze-Penrose bounds on the pairwise Bayes misclassification rates that rank the binary subproblems in terms of difficulty of classification. The proposed empirical estimates of the Bayes error rate are computed from the minimal spanning tree (MST) of the samples from each pair of classes. Moreover, a meta-learning technique is presented for quantifying the one-vs-rest Bayes error rate for each individual class from a single MST on the entire dataset. Extensive simulations on benchmark datasets show that the proposed hierarchical method can often be learned much faster than competing methods, while achieving competitive accuracy. |
Tasks | Meta-Learning |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03512v1 |
http://arxiv.org/pdf/1711.03512v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-meta-learning-for-adaptive-hierarchical |
Repo | https://github.com/HeroResearchGroup/SmartSVM |
Framework | none |
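The MST-based estimate is compact enough to sketch: build the Euclidean minimal spanning tree of the pooled samples, count cross-class edges (the Friedman-Rafsky statistic), and plug into the Henze-Penrose divergence to bracket the Bayes error. The constants below follow the equal-prior bounds of Berisha & Hero; treat this as a sketch of the idea, not the SmartSVM implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix

def hp_bayes_error_bounds(X, y):
    """Bound the pairwise Bayes error rate from a minimal spanning tree.

    Counts MST edges joining the two classes (Friedman-Rafsky statistic),
    plugs them into a Henze-Penrose divergence estimate, and returns
    (lower, upper) bounds on the Bayes misclassification rate.
    """
    m, n = (y == 0).sum(), (y == 1).sum()
    mst = minimum_spanning_tree(distance_matrix(X, X)).tocoo()
    cross = int((y[mst.row] != y[mst.col]).sum())       # cross-class edges
    d_hp = max(0.0, 1.0 - cross * (m + n) / (2.0 * m * n))
    return 0.5 * (1 - np.sqrt(d_hp)), 0.5 * (1 - d_hp)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))])
y = np.repeat([0, 1], 100)
print(hp_bayes_error_bounds(X, y))   # harder pairs give wider, higher bounds
```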
End-to-end optimization of goal-driven and visually grounded dialogue systems
Title | End-to-end optimization of goal-driven and visually grounded dialogue systems |
Authors | Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin |
Abstract | End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to capture the intrinsic planning problem of dialogue as well as its grounded nature, which makes the context of a dialogue larger than the dialogue history alone. This is why only chit-chat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture. |
Tasks | Dialogue Management, Visual Question Answering |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05423v1 |
http://arxiv.org/pdf/1703.05423v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-optimization-of-goal-driven-and |
Repo | https://github.com/ibrahimSouleiman/GuessWhat |
Framework | tf |
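The reinforcement learning step reduces to REINFORCE over generated tokens: sample an utterance from the policy, receive a scalar task reward at the end of the dialogue (e.g. the object was found), and reinforce the sampled tokens against a baseline. The PyTorch sketch below uses a toy GRU policy and made-up sizes, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

vocab, hidden = 500, 128
embed = nn.Embedding(vocab, hidden)
gru = nn.GRUCell(hidden, hidden)
out = nn.Linear(hidden, vocab)
opt = torch.optim.Adam(list(embed.parameters()) + list(gru.parameters())
                       + list(out.parameters()), lr=1e-3)

h = torch.zeros(1, hidden)
tok = torch.zeros(1, dtype=torch.long)           # <bos> token
log_probs = []
for _ in range(10):                              # sample one utterance
    h = gru(embed(tok), h)
    dist = torch.distributions.Categorical(logits=out(h))
    tok = dist.sample()
    log_probs.append(dist.log_prob(tok))

reward = 1.0                                     # task success at the end
baseline = 0.5                                   # e.g. a running mean reward
loss = -(reward - baseline) * torch.stack(log_probs).sum()
opt.zero_grad(); loss.backward(); opt.step()     # REINFORCE update
print(float(loss))
```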
Remote Sensing Image Scene Classification: Benchmark and State of the Art
Title | Remote Sensing Image Scene Classification: Benchmark and State of the Art |
Authors | Gong Cheng, Junwei Han, Xiaoqiang Lu |
Abstract | Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. Over the past years, significant effort has been made to develop various datasets and present a variety of approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small number of scene classes and images, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely hinder the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed “NWPU-RESISC45”, which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in the number of scene classes and total images, (ii) holds large variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset, and the results are reported as a useful baseline for future research. |
Tasks | Scene Classification |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00121v1 |
http://arxiv.org/pdf/1703.00121v1.pdf | |
PWC | https://paperswithcode.com/paper/remote-sensing-image-scene-classification |
Repo | https://github.com/ArealTeamM2AIC/Remote-Sensing-Image |
Framework | pytorch |
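Since the dataset is distributed as one folder per scene class (45 classes, 700 images each, as the abstract states), a standard ImageFolder pipeline is enough to get started; the root path below is hypothetical.

```python
import torch
from torchvision import datasets, transforms

# NWPU-RESISC45 unpacks as 45 class folders of 700 images each, so the
# standard ImageFolder layout applies. "NWPU-RESISC45/" is a placeholder.
tfm = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
data = datasets.ImageFolder("NWPU-RESISC45/", transform=tfm)
assert len(data.classes) == 45 and len(data) == 45 * 700

# Hold out 100 images per class (4500 total) for validation.
train, val = torch.utils.data.random_split(data, [len(data) - 4500, 4500])
loader = torch.utils.data.DataLoader(train, batch_size=32, shuffle=True)
```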
Visual Feature Attribution using Wasserstein GANs
Title | Visual Feature Attribution using Wasserstein GANs |
Authors | Christian F. Baumgartner, Lisa M. Koch, Kerem Can Tezcan, Jia Xi Ang, Ender Konukoglu |
Abstract | Attributing the pixels of an input image to a certain category is an important and well-studied problem in computer vision, with applications ranging from weakly supervised localisation to understanding hidden effects in the data. In recent years, approaches based on interpreting a previously trained neural network classifier have become the de facto state-of-the-art and are commonly used on medical as well as natural image datasets. In this paper, we discuss a limitation of these approaches which may lead to only a subset of the category specific features being detected. To address this problem we develop a novel feature attribution technique based on Wasserstein Generative Adversarial Networks (WGAN), which does not suffer from this limitation. We show that our proposed method performs substantially better than the state-of-the-art for visual attribution on a synthetic dataset and on real 3D neuroimaging data from patients with mild cognitive impairment (MCI) and Alzheimer’s disease (AD). For AD patients the method produces compellingly realistic disease effect maps which are very close to the observed effects. |
Tasks | |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.08998v3 |
http://arxiv.org/pdf/1711.08998v3.pdf | |
PWC | https://paperswithcode.com/paper/visual-feature-attribution-using-wasserstein |
Repo | https://github.com/orobix/Visual-Feature-Attribution-Using-Wasserstein-GANs-Pytorch |
Framework | pytorch |
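The attribution idea fits in a short sketch: learn an additive map M so that affected images x + M(x) fool a Wasserstein critic into seeing the unaffected class, while a penalty keeps the change small; M(x) itself is then the attribution map. The toy networks below (and the omitted Lipschitz constraint on the critic) are illustrative, not the paper's models.

```python
import torch
import torch.nn as nn

M = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))          # additive map
critic = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2), nn.ReLU(),
                       nn.Flatten(), nn.LazyLinear(1))     # Wasserstein critic

x_sick = torch.randn(8, 1, 32, 32)     # images from the affected class
x_healthy = torch.randn(8, 1, 32, 32)  # images from the unaffected class

# Critic step (gradient penalty / weight clipping omitted for brevity):
d_loss = critic(x_sick + M(x_sick).detach()).mean() - critic(x_healthy).mean()

# Map step: look healthy to the critic, but change as little as possible.
mapped = x_sick + M(x_sick)
g_loss = -critic(mapped).mean() + 0.1 * M(x_sick).abs().mean()
print(float(d_loss), float(g_loss))    # M(x_sick) is the attribution map
```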
Action-dependent Control Variates for Policy Optimization via Stein’s Identity
Title | Action-dependent Control Variates for Policy Optimization via Stein’s Identity |
Authors | Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu |
Abstract | Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from high variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce the variance of policy gradient methods. Motivated by Stein’s identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method significantly improves the sample efficiency of state-of-the-art policy gradient approaches. |
Tasks | Policy Gradient Methods |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.11198v4 |
http://arxiv.org/pdf/1710.11198v4.pdf | |
PWC | https://paperswithcode.com/paper/action-depedent-control-variates-for-policy |
Repo | https://github.com/brain-research/mirage-rl |
Framework | tf |
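The estimator can be verified on a one-step Gaussian-policy bandit. For a Gaussian policy, Stein's identity implies E[∇μ log π(a) φ(a)] = E[φ′(a)], so subtracting an action-dependent baseline φ and adding back its derivative keeps the gradient unbiased while reducing variance. A NumPy check, with a toy reward and baseline chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.3, 1.0
r = lambda a: -(a - 2.0) ** 2            # reward
phi = lambda a: -(a - 2.0) ** 2          # action-dependent baseline
dphi = lambda a: -2.0 * (a - 2.0)        # its derivative w.r.t. the action

a = mu + sigma * rng.standard_normal(100000)   # actions from N(mu, sigma^2)
score = (a - mu) / sigma ** 2                  # grad_mu log pi(a)

g_vanilla = score * r(a)                       # plain REINFORCE estimator
g_stein = score * (r(a) - phi(a)) + dphi(a)    # Stein control variate

print("means:", g_vanilla.mean(), g_stein.mean())  # agree: both unbiased
print("vars :", g_vanilla.var(), g_stein.var())    # Stein is far smaller
```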
Deep Image Prior
Title | Deep Image Prior |
Authors | Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky |
Abstract | Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity. Code and supplementary material are available at https://dmitryulyanov.github.io/deep_image_prior . |
Tasks | Denoising, Image Denoising, Image Generation, Image Inpainting, Image Restoration, Jpeg Compression Artifact Reduction, Super-Resolution |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10925v3 |
http://arxiv.org/pdf/1711.10925v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-image-prior |
Repo | https://github.com/rsin46/deep-image-prior-keras |
Framework | none |
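The method itself is a few lines: fix a random input, fit a randomly initialized conv net to the single corrupted image, and stop early, before the network starts fitting the noise. The tiny architecture and iteration count below are illustrative; the authors' full code is at the link in the abstract.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))

clean = torch.zeros(1, 1, 32, 32)
clean[:, :, 8:24, 8:24] = 1.0                    # toy "image": a square
noisy = clean + 0.3 * torch.randn_like(clean)    # the only data we fit

z = torch.randn(1, 8, 32, 32)                    # fixed random input
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for step in range(300):                          # early stopping is the prior
    loss = ((net(z) - noisy) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("error vs clean:", float(((net(z) - clean) ** 2).mean()))
```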
NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs
Title | NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs |
Authors | Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini |
Abstract | Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 Gops/s, and an energy efficiency of 17 Gops/W. Thanks to our heterogeneous computing model, our platform improves upon the state-of-the-art, achieving a frame rate of 5.5 fps on the end-to-end execution of VGG-16, and 6.6 fps on ResNet-18. |
Tasks | Speech Recognition |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.00994v1 |
http://arxiv.org/pdf/1712.00994v1.pdf | |
PWC | https://paperswithcode.com/paper/neuraghe-exploiting-cpu-fpga-synergies-for |
Repo | https://github.com/Dhananjayadmd/DNN_MP |
Framework | none |
MACA: A Modular Architecture for Conversational Agents
Title | MACA: A Modular Architecture for Conversational Agents |
Authors | Hoai Phuoc Truong, Prasanna Parthasarathi, Joelle Pineau |
Abstract | We propose a software architecture designed to ease the implementation of dialogue systems. The Modular Architecture for Conversational Agents (MACA) uses a plug-n-play style that allows quick prototyping, thereby facilitating the development of new techniques and the reproduction of previous work. The architecture separates the domain of the conversation from the agent’s dialogue strategy, and as such can be easily extended to multiple domains. MACA provides tools to host dialogue agents on Amazon Mechanical Turk (mTurk) for data collection and allows processing of other sources of training data. The current version of the framework already incorporates several domains and existing dialogue strategies from the recent literature. |
Tasks | |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00673v2 |
http://arxiv.org/pdf/1705.00673v2.pdf | |
PWC | https://paperswithcode.com/paper/maca-a-modular-architecture-for |
Repo | https://github.com/ppartha03/MACA |
Framework | none |
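The plug-n-play separation can be pictured with two independent interfaces, one for the conversation domain and one for the dialogue strategy, so either side can be swapped without touching the other. Class and method names below are hypothetical illustrations, not MACA's actual API.

```python
from abc import ABC, abstractmethod

class Domain(ABC):
    """Conversation domain: owns domain-specific text handling."""
    @abstractmethod
    def preprocess(self, utterance: str) -> str: ...

class Agent(ABC):
    """Dialogue strategy: owns response generation, domain-agnostic."""
    @abstractmethod
    def respond(self, utterance: str) -> str: ...

class MovieDomain(Domain):
    def preprocess(self, utterance):
        return utterance.lower().strip()

class EchoAgent(Agent):
    def respond(self, utterance):
        return f"you said: {utterance}"

def run_turn(domain: Domain, agent: Agent, user_input: str) -> str:
    # Any Domain can be paired with any Agent.
    return agent.respond(domain.preprocess(user_input))

print(run_turn(MovieDomain(), EchoAgent(), "  Recommend a movie  "))
```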
DARLA: Improving Zero-Shot Transfer in Reinforcement Learning
Title | DARLA: Improving Zero-Shot Transfer in Reinforcement Learning |
Authors | Irina Higgins, Arka Pal, Andrei A. Rusu, Loic Matthey, Christopher P Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, Alexander Lerchner |
Abstract | Domain adaptation is an important open problem in deep reinforcement learning (RL). In many scenarios of interest data is hard to obtain, so agents may learn a source policy in a setting where data is readily available, with the hope that it generalises well to the target domain. We propose a new multi-stage RL agent, DARLA (DisentAngled Representation Learning Agent), which learns to see before learning to act. DARLA’s vision is based on learning a disentangled representation of the observed environment. Once DARLA can see, it is able to acquire source policies that are robust to many domain shifts - even with no access to the target domain. DARLA significantly outperforms conventional baselines in zero-shot domain adaptation scenarios, an effect that holds across a variety of RL environments (Jaco arm, DeepMind Lab) and base RL algorithms (DQN, A3C and EC). |
Tasks | Domain Adaptation, Representation Learning |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08475v2 |
http://arxiv.org/pdf/1707.08475v2.pdf | |
PWC | https://paperswithcode.com/paper/darla-improving-zero-shot-transfer-in |
Repo | https://github.com/BCHoagland/DARLA |
Framework | pytorch |
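Both stages fit in a short sketch: first a beta-VAE (beta > 1 pressures the code toward disentanglement) learns to see; then the encoder is frozen and a policy is trained on its latents, which is what transfers across visually shifted domains. Toy sizes below, not the paper's networks.

```python
import torch
import torch.nn as nn

enc = nn.Linear(64, 32)          # outputs (mu, logvar), 16 latents each
dec = nn.Linear(16, 64)
beta = 4.0                       # beta > 1 encourages disentanglement

x = torch.rand(8, 64)                                  # flattened toy frames
mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
recon = dec(z)

recon_loss = ((recon - x) ** 2).sum(dim=1).mean()
kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1).mean()
vae_loss = recon_loss + beta * kl                      # beta-VAE objective

# Stage 2: freeze the vision module and learn to act on its latents.
enc.requires_grad_(False)
policy = nn.Linear(16, 4)                              # 4 discrete actions
print(vae_loss.item(), policy(mu.detach()).shape)
```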