October 21, 2019

3150 words 15 mins read

Paper Group AWR 124

ADef: an Iterative Algorithm to Construct Adversarial Deformations. Deep Bayesian Inversion. HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs. Seq2RDF: An end-to-end application for deriving Triples from Natural Language Text. Expeditious Generation of Knowledge Graph Embeddings. Margin-based Parallel Corpus Mining wit …

ADef: an Iterative Algorithm to Construct Adversarial Deformations

Title ADef: an Iterative Algorithm to Construct Adversarial Deformations
Authors Rima Alaifari, Giovanni S. Alberti, Tandri Gauksson
Abstract While deep neural networks have proven to be a powerful tool for many recognition and classification tasks, their stability properties are still not well understood. In the past, image classifiers have been shown to be vulnerable to so-called adversarial attacks, which are created by additively perturbing the correctly classified image. In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. We demonstrate our results on MNIST with convolutional neural networks and on ImageNet with Inception-v3 and ResNet-101.
Tasks Adversarial Attack
Published 2018-04-20
URL http://arxiv.org/abs/1804.07729v3
PDF http://arxiv.org/pdf/1804.07729v3.pdf
PWC https://paperswithcode.com/paper/adef-an-iterative-algorithm-to-construct
Repo https://github.com/nicholasma88/CS260Final
Framework pytorch
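
For readers who want to experiment, here is a minimal, hedged PyTorch sketch of the idea described in the abstract, not the released ADef implementation: the image is warped by a small displacement field via grid sampling, and a gradient step on that field (rather than an additive perturbation) pushes the classifier away from the correct label. The model, image and step size are placeholders.

```python
# Hedged sketch: one gradient step of a deformation-based attack.
# Not the exact ADef update; model, image and step size are placeholders.
import torch
import torch.nn.functional as F

def deformation_step(model, image, label, step=0.01):
    """image: (1, C, H, W) tensor; label: (1,) tensor holding the true class."""
    _, _, h, w = image.shape
    # Identity sampling grid in [-1, 1]^2, shape (1, H, W, 2).
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)

    # Small displacement field tau, optimised instead of an additive perturbation.
    tau = torch.zeros_like(base_grid, requires_grad=True)
    warped = F.grid_sample(image, base_grid + tau, align_corners=True)
    loss = F.cross_entropy(model(warped), label)
    loss.backward()

    # Normalised gradient-ascent step on the field, pushing towards misclassification.
    with torch.no_grad():
        new_tau = tau + step * tau.grad / (tau.grad.norm() + 1e-12)
    return F.grid_sample(image, base_grid + new_tau, align_corners=True)
```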

Deep Bayesian Inversion

Title Deep Bayesian Inversion
Authors Jonas Adler, Ozan Öktem
Abstract Characterizing statistical properties of solutions of inverse problems is essential for decision making. Bayesian inversion offers a tractable framework for this purpose, but current approaches are computationally infeasible for most realistic imaging applications in the clinic. We introduce two novel deep-learning-based methods for solving large-scale inverse problems using Bayesian inversion: a sampling-based method using a WGAN with a novel mini-discriminator, and a direct approach that trains a neural network using a novel loss function. The performance of both methods is demonstrated on image reconstruction in ultra-low-dose 3D helical CT. We compute the posterior mean and standard deviation of the 3D images, followed by a hypothesis test to assess whether a “dark spot” in the liver of a cancer-stricken patient is present. Both methods are computationally efficient, and our evaluation shows very promising performance that clearly supports the claim that Bayesian inversion is usable for 3D imaging in time-critical applications.
Tasks Decision Making, Image Reconstruction
Published 2018-11-14
URL http://arxiv.org/abs/1811.05910v1
PDF http://arxiv.org/pdf/1811.05910v1.pdf
PWC https://paperswithcode.com/paper/deep-bayesian-inversion
Repo https://github.com/JamesGlare/Neural-Net-LabView-DLL
Framework tf
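
The uncertainty quantification step described in the abstract can be sketched independently of how the posterior samples are produced. Below is a hedged numpy illustration, assuming an array of posterior draws is already available (in the paper these come from a conditional WGAN sampler); the region mask and the "darker than background" test are simplified stand-ins for the clinical hypothesis test.

```python
# Hedged sketch: posterior summaries and a simple region test from posterior samples.
# `samples` stands in for reconstructions drawn from a learned posterior sampler;
# the mask and the test itself are illustrative, not the paper's clinical protocol.
import numpy as np

def posterior_summaries(samples):
    """samples: (n_samples, *image_shape) array of posterior draws."""
    return samples.mean(axis=0), samples.std(axis=0)

def dark_spot_probability(samples, region_mask, background_mask):
    """Posterior probability that the region is darker than its background,
    a crude stand-in for testing whether a suspected 'dark spot' is real."""
    region = samples[:, region_mask].mean(axis=1)
    background = samples[:, background_mask].mean(axis=1)
    return float((region < background).mean())

rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 64, 64))       # fake posterior draws
mask = np.zeros((64, 64), dtype=bool)
mask[30:34, 30:34] = True                      # suspected region
mean, std = posterior_summaries(samples)
print(mean.shape, std.shape, dark_spot_probability(samples, mask, ~mask))
```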

HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs

Title HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs
Authors Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar
Abstract In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise. Hypergraphs provide a flexible and natural way to model such complex relationships. The prevalence of such complex relationships in many real-world networks naturally motivates the problem of learning with hypergraphs. A popular learning paradigm is hypergraph-based semi-supervised learning (SSL), where the goal is to assign labels to initially unlabeled vertices in a hypergraph. Motivated by the fact that a graph convolutional network (GCN) has been effective for graph-based SSL, we propose HyperGCN, a novel GCN for SSL on attributed hypergraphs. Additionally, we show how HyperGCN can be used as a learning-based approach for combinatorial optimisation on NP-hard hypergraph problems. We demonstrate HyperGCN’s effectiveness through detailed experimentation on real-world hypergraphs.
Tasks
Published 2018-09-07
URL https://arxiv.org/abs/1809.02589v4
PDF https://arxiv.org/pdf/1809.02589v4.pdf
PWC https://paperswithcode.com/paper/hypergcn-hypergraph-convolutional-networks
Repo https://github.com/malllabiisc/HyperGCN
Framework pytorch
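
As a rough, hedged sketch of the reduction the paper is built on (simplified; the full method also uses mediator vertices and trainable GCN layers), each hyperedge is approximated by a single edge between its two most distant vertices under the current representations, and a standard GCN propagation step is then run on the reduced graph. Shapes and weights below are illustrative.

```python
# Hedged sketch of a simplified HyperGCN-style reduction (no mediators):
# each hyperedge becomes one edge between its two most distant members,
# then a symmetric-normalised GCN layer is applied to the reduced graph.
import torch

def hyperedges_to_edges(h, hyperedges):
    """h: (n, d) node features; hyperedges: list of lists of node indices."""
    edges = []
    for e in hyperedges:
        members = torch.tensor(e)
        dist = torch.cdist(h[members], h[members])    # pairwise distances inside e
        i, j = divmod(int(dist.argmax()), len(e))     # most distant pair
        edges.append((e[i], e[j]))
    return edges

def gcn_layer(h, edges, weight):
    """One symmetric-normalised GCN propagation step over the reduced graph."""
    n = h.size(0)
    adj = torch.eye(n)                                # self loops
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    deg_inv_sqrt = adj.sum(1).rsqrt()
    norm_adj = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
    return torch.relu(norm_adj @ h @ weight)

h = torch.randn(6, 8)
hyperedges = [[0, 1, 2], [2, 3, 4, 5]]
out = gcn_layer(h, hyperedges_to_edges(h, hyperedges), torch.randn(8, 4))
print(out.shape)   # torch.Size([6, 4])
```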

Seq2RDF: An end-to-end application for deriving Triples from Natural Language Text

Title Seq2RDF: An end-to-end application for deriving Triples from Natural Language Text
Authors Yue Liu, Tongtao Zhang, Zhicheng Liang, Heng Ji, Deborah L. McGuinness
Abstract We present an end-to-end approach that takes unstructured textual input and generates structured output compliant with a given vocabulary. Inspired by recent successes in neural machine translation, we treat the triples within a given knowledge graph as an independent graph language and propose an encoder-decoder framework with an attention mechanism that leverages knowledge graph embeddings. Our model learns the mapping from natural language text to triple representation in the form of subject-predicate-object using the selected knowledge graph vocabulary. Experiments on three different data sets show that we achieve competitive F1 scores relative to the baselines with our simple yet effective approach. A demo video is included.
Tasks Knowledge Graph Embeddings
Published 2018-07-04
URL http://arxiv.org/abs/1807.01763v3
PDF http://arxiv.org/pdf/1807.01763v3.pdf
PWC https://paperswithcode.com/paper/seq2rdf-an-end-to-end-application-for
Repo https://github.com/abhinavnagpal/KNOWLEDGE-GRAPH-PAPERS
Framework none

Expeditious Generation of Knowledge Graph Embeddings

Title Expeditious Generation of Knowledge Graph Embeddings
Authors Tommaso Soru, Stefano Ruberto, Diego Moussallem, André Valdestilhas, Alexander Bigerl, Edgard Marx, Diego Esteves
Abstract Knowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on tasks such as link prediction, entity recommendation, question answering, and triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases without needing state-of-the-art computational resources. In this paper, we propose KG2Vec, a simple and fast approach to Knowledge Graph Embedding based on the skip-gram model. Instead of using a predefined scoring function, we learn it using Long Short-Term Memory networks. We show that our embeddings achieve results comparable with the most scalable approaches on knowledge graph completion as well as on a new metric. Yet, KG2Vec can embed large graphs in less time, processing more than 250 million triples in under 7 hours on common hardware.
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graph Embeddings, Link Prediction, Question Answering
Published 2018-03-21
URL http://arxiv.org/abs/1803.07828v2
PDF http://arxiv.org/pdf/1803.07828v2.pdf
PWC https://paperswithcode.com/paper/expeditious-generation-of-knowledge-graph
Repo https://github.com/AKSW/KG2Vec
Framework none
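
A hedged sketch of the embedding step, assuming gensim >= 4: each triple is treated as a three-token "sentence" and skip-gram is run over the resulting corpus. The paper additionally learns an LSTM-based scoring function, which is not shown, and the toy triples are purely illustrative.

```python
# Hedged sketch of the embedding step only: triples as short "sentences",
# skip-gram over the resulting corpus (gensim >= 4 API assumed).
from gensim.models import Word2Vec

triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Paris", "capitalOf", "France"),
    ("Germany", "locatedIn", "Europe"),
    ("France", "locatedIn", "Europe"),
]

# One pseudo-sentence per triple: subject, predicate, object tokens.
corpus = [list(t) for t in triples]

model = Word2Vec(
    sentences=corpus,
    vector_size=16,   # embedding dimensionality
    window=2,         # the whole triple fits in the context window
    min_count=1,
    sg=1,             # skip-gram, as in the paper
    epochs=100,
)

print(model.wv["Germany"].shape)                 # (16,)
print(model.wv.most_similar("Germany", topn=2))  # nearby entities/relations
```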

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

Title Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings
Authors Mikel Artetxe, Holger Schwenk
Abstract Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora. In this paper, we propose a new method for this task based on multilingual sentence embeddings. In contrast to previous approaches, which rely on nearest neighbor retrieval with a hard threshold over cosine similarity, our proposed method accounts for the scale inconsistencies of this measure, considering the margin between a given sentence pair and its closest candidates instead. Our experiments show large improvements over existing methods. We outperform the best published results on the BUCC mining task and the UN reconstruction task by more than 10 F1 and 30 precision points, respectively. Filtering the English-German ParaCrawl corpus with our approach, we obtain 31.2 BLEU points on newstest2014, an improvement of more than one point over the best official filtered version.
Tasks Cross-Lingual Bitext Mining, Machine Translation, Parallel Corpus Mining, Sentence Embeddings
Published 2018-11-03
URL https://arxiv.org/abs/1811.01136v2
PDF https://arxiv.org/pdf/1811.01136v2.pdf
PWC https://paperswithcode.com/paper/margin-based-parallel-corpus-mining-with
Repo https://github.com/transducens/LASERtrain
Framework pytorch
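
The margin criterion itself is compact enough to show in code. The following numpy sketch (an illustrative reimplementation, not the authors' released mining pipeline) scores each candidate pair by its cosine similarity divided by the average similarity of each side to its k nearest neighbours, i.e. the ratio form of the margin over cosine similarity that the paper builds on.

```python
# Hedged sketch of ratio-margin scoring over cosine similarity.
# x and y stand in for multilingual sentence embeddings of the two languages.
import numpy as np

def margin_scores(x, y, k=4):
    """x: (n, d) source embeddings, y: (m, d) target embeddings, L2-normalised."""
    sim = x @ y.T                                        # cosine similarities
    # Average similarity of each source/target sentence to its k nearest neighbours.
    knn_x = np.sort(sim, axis=1)[:, -k:].mean(axis=1)    # (n,)
    knn_y = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # (m,)
    return sim / ((knn_x[:, None] + knn_y[None, :]) / 2.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32)); x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.normal(size=(10, 32)); y /= np.linalg.norm(y, axis=1, keepdims=True)
scores = margin_scores(x, y)
best_target = scores.argmax(axis=1)   # forward retrieval with margin scoring
print(scores.shape, best_target)
```

Unlike a hard threshold on raw cosine similarity, the denominator rescales each pair by how similar its two sides are to everything else, which is what makes the scores comparable across sentences.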

Fourier RNNs for Sequence Prediction

Title Fourier RNNs for Sequence Prediction
Authors Moritz Wolter, Angela Yao
Abstract Fourier methods have a long and proven track record as an excellent tool in data processing. We propose to integrate Fourier methods into complex recurrent neural network architectures and show accuracy improvements on prediction tasks as well as computational load reductions. We predict synthetic data drawn from synthetic equations as well as real-world power load data.
Tasks
Published 2018-12-13
URL https://arxiv.org/abs/1812.05645v2
PDF https://arxiv.org/pdf/1812.05645v2.pdf
PWC https://paperswithcode.com/paper/fourier-rnns-for-sequence-analysis-and
Repo https://github.com/v0lta/fourier-prediction
Framework tf
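
The abstract is terse, so the sketch below is only one plausible reading of combining Fourier features with a recurrent model, not the authors' architecture: slices of a signal are moved into the frequency domain with an FFT, and a GRU then predicts the spectrum of the next window. All sizes are arbitrary.

```python
# Hedged, simplified sketch: windowed FFT features fed to a GRU for sequence prediction.
# This is an illustration of the general idea, not the paper's Fourier RNN.
import torch

def fourier_features(signal, window=32):
    """signal: (batch, length); returns (batch, n_windows, window + 2) real features."""
    windows = signal.unfold(1, window, window)            # (B, T, window)
    spec = torch.fft.rfft(windows, dim=-1)                # complex spectrum per window
    return torch.cat((spec.real, spec.imag), dim=-1)      # stack real/imag parts

signal = torch.randn(4, 256)
feats = fourier_features(signal)                 # (4, 8, 34)
rnn = torch.nn.GRU(input_size=feats.size(-1), hidden_size=64, batch_first=True)
head = torch.nn.Linear(64, feats.size(-1))       # predict the next window's spectrum
out, _ = rnn(feats[:, :-1])                      # condition on all but the last window
pred = head(out[:, -1])                          # prediction for the final window
loss = torch.nn.functional.mse_loss(pred, feats[:, -1])
print(pred.shape, float(loss))
```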

CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

Title CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions
Authors Kevin Tian, Teng Zhang, James Zou
Abstract Word embedding is a useful approach to capture co-occurrence structures in large text corpora. However, in addition to the text data itself, we often have additional covariates associated with individual corpus documents (e.g., the demographics of the author, or the time and venue of publication), and we would like the embedding to naturally capture this information. We propose CoVeR, a new tensor decomposition model for vector embeddings with covariates. CoVeR jointly learns a base embedding for all the words as well as a weighted diagonal matrix to model how each covariate affects the base embedding. To obtain an author- or venue-specific embedding, for example, we can then simply multiply the base embedding by the associated transformation matrix. The main advantages of our approach are data efficiency and interpretability of the covariate transformation. Our experiments demonstrate that our joint model learns substantially better covariate-specific embeddings compared to the standard approach of learning a separate embedding for each covariate using only the relevant subset of data, as well as other related methods. Furthermore, CoVeR encourages the embeddings to be “topic-aligned” in that the dimensions have specific independent meanings. This allows our covariate-specific embeddings to be compared by topic, enabling downstream differential analysis. We empirically evaluate the benefits of our algorithm on datasets, and demonstrate how it can be used to address many natural questions about covariate effects. Accompanying code to this paper can be found at http://github.com/kjtian/CoVeR.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07839v2
PDF http://arxiv.org/pdf/1802.07839v2.pdf
PWC https://paperswithcode.com/paper/cover-learning-covariate-specific-vector
Repo https://github.com/kjtian/CoVeR
Framework none
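
The key operation from the abstract, multiplying a shared base embedding by a covariate-specific diagonal weight matrix, is easy to illustrate. The sketch below uses random stand-ins for trained parameters and a toy vocabulary; it shows how covariate-specific vectors are obtained and compared, not the tensor-decomposition training itself.

```python
# Hedged sketch: covariate-specific embeddings as dimension-wise rescalings of a
# shared base embedding. All tensors are random stand-ins for trained parameters.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["court", "ruling", "goal", "match"]
covariates = ["legal_news", "sports_news"]

base = rng.normal(size=(len(vocab), 8))                    # shared base embeddings, (V, d)
weights = np.abs(rng.normal(size=(len(covariates), 8)))    # diagonal covariate weights, (C, d)

def covariate_embedding(word, covariate):
    v = vocab.index(word)
    c = covariates.index(covariate)
    return base[v] * weights[c]          # multiply base embedding by diag(weights[c])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The same word can drift between covariates; the weight vectors make that explicit.
print(cosine(covariate_embedding("court", "legal_news"),
             covariate_embedding("court", "sports_news")))
```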

The relativistic discriminator: a key element missing from standard GAN

Title The relativistic discriminator: a key element missing from standard GAN
Authors Alexia Jolicoeur-Martineau
Abstract In the standard generative adversarial network (SGAN), the discriminator estimates the probability that the input data is real. The generator is trained to increase the probability that fake data is real. We argue that it should also simultaneously decrease the probability that real data is real because 1) this would account for a priori knowledge that half of the data in the mini-batch is fake, 2) this would be observed with divergence minimization, and 3) in optimal settings, SGAN would be equivalent to integral probability metric (IPM) GANs. We show that this property can be induced by using a relativistic discriminator which estimates the probability that given real data is more realistic than randomly sampled fake data. We also present a variant in which the discriminator estimates the probability that given real data is more realistic than fake data, on average. We generalize both approaches to non-standard GAN loss functions and refer to them respectively as Relativistic GANs (RGANs) and Relativistic average GANs (RaGANs). We show that IPM-based GANs are a subset of RGANs which use the identity function. Empirically, we observe that 1) RGANs and RaGANs are significantly more stable and generate higher quality data samples than their non-relativistic counterparts, 2) standard RaGAN with gradient penalty generates data of better quality than WGAN-GP while only requiring a single discriminator update per generator update (reducing the time taken to reach the state of the art by 400%), and 3) RaGANs are able to generate plausible high-resolution images (256x256) from a very small sample (N=2011), while GAN and LSGAN cannot; these images are of significantly better quality than the ones generated by WGAN-GP and SGAN with spectral normalization.
Tasks Image Generation
Published 2018-07-02
URL http://arxiv.org/abs/1807.00734v3
PDF http://arxiv.org/pdf/1807.00734v3.pdf
PWC https://paperswithcode.com/paper/the-relativistic-discriminator-a-key-element
Repo https://github.com/eriklindernoren/PyTorch-GAN
Framework pytorch
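
A hedged PyTorch sketch of the relativistic average losses for the standard (cross-entropy) GAN, written directly from the description in the abstract; the discriminator and generator networks themselves are placeholders and only the loss computation is shown.

```python
# Hedged sketch of Relativistic average GAN losses for the standard GAN formulation,
# operating on discriminator logits C(x). Networks are placeholders.
import torch
import torch.nn.functional as F

def ragan_d_loss(real_logits, fake_logits):
    # Real data should look more realistic than fake data on average, and vice versa.
    real_vs_fake = real_logits - fake_logits.mean()
    fake_vs_real = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(real_vs_fake, torch.ones_like(real_vs_fake))
            + F.binary_cross_entropy_with_logits(fake_vs_real, torch.zeros_like(fake_vs_real)))

def ragan_g_loss(real_logits, fake_logits):
    # The generator pushes the other way: fakes more realistic than reals on average,
    # and (unlike a standard GAN) reals less realistic than fakes.
    real_vs_fake = real_logits - fake_logits.mean()
    fake_vs_real = fake_logits - real_logits.mean()
    return (F.binary_cross_entropy_with_logits(fake_vs_real, torch.ones_like(fake_vs_real))
            + F.binary_cross_entropy_with_logits(real_vs_fake, torch.zeros_like(real_vs_fake)))

real_logits = torch.randn(16)   # stand-ins for C(x_real) and C(x_fake)
fake_logits = torch.randn(16)
print(float(ragan_d_loss(real_logits, fake_logits)),
      float(ragan_g_loss(real_logits, fake_logits)))
```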

Neural Arithmetic Logic Units

Title Neural Arithmetic Logic Units
Authors Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom
Abstract Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates. We call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors. Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00508v1
PDF http://arxiv.org/pdf/1808.00508v1.pdf
PWC https://paperswithcode.com/paper/neural-arithmetic-logic-units
Repo https://github.com/grananqvist/NALU-tf
Framework tf
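
A hedged PyTorch sketch of the NALU cell as described above: a neural accumulator whose weights are pushed towards {-1, 0, 1} handles addition and subtraction, the same weights applied in log space handle multiplication and division, and a learned gate mixes the two paths. Initialisation and dimensions are illustrative.

```python
# Hedged sketch of a NALU cell: additive (NAC) path, log-space multiplicative path,
# and a learned gate between them. Initialisation is illustrative.
import torch
import torch.nn as nn

class NALU(nn.Module):
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        # Effective weights constrained towards {-1, 0, 1}.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        a = x @ W.t()                                           # additive (NAC) path
        m = torch.exp(torch.log(x.abs() + self.eps) @ W.t())    # multiplicative path
        g = torch.sigmoid(x @ self.G.t())                       # gate between the two
        return g * a + (1 - g) * m

cell = NALU(2, 1)
x = torch.tensor([[3.0, 4.0]])
print(cell(x))   # untrained output; training pushes it towards e.g. x1 + x2 or x1 * x2
```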

SparseFool: a few pixels make a big difference

Title SparseFool: a few pixels make a big difference
Authors Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard
Abstract Deep neural networks have achieved extraordinary results on image classification tasks, but have been shown to be vulnerable to attacks with carefully crafted perturbations of the input data. Although most attacks usually change the values of many of an image’s pixels, it has been shown that deep networks are also vulnerable to sparse alterations of the input. However, no computationally efficient method has been proposed to compute sparse perturbations. In this paper, we exploit the low mean curvature of the decision boundary and propose SparseFool, a geometry-inspired sparse attack that controls the sparsity of the perturbations. Extensive evaluations show that our approach computes sparse perturbations very fast and scales efficiently to high-dimensional data. We further analyze the transferability and the visual effects of the perturbations, and show the existence of shared semantic information across the images and the networks. Finally, we show that adversarial training can only slightly improve the robustness against sparse additive perturbations computed with SparseFool.
Tasks Image Classification
Published 2018-11-06
URL https://arxiv.org/abs/1811.02248v4
PDF https://arxiv.org/pdf/1811.02248v4.pdf
PWC https://paperswithcode.com/paper/sparsefool-a-few-pixels-make-a-big-difference
Repo https://github.com/LTS4/SparseFool
Framework pytorch

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

Title Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
Authors Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer
Abstract Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, key application characteristics such as frames per second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet. Both the accelerator and ConvNet are tailored to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with only 1$\times$1 convolutions, while spatial convolutions are replaced by more efficient shift operations. DiracDeltaNet achieves competitive accuracy on ImageNet (88.7% top-5), but with 42$\times$ fewer parameters and 48$\times$ fewer OPs than VGG16. We further quantize DiracDeltaNet’s weights and activations to 4 bits, with less than 1% accuracy loss. This quantization is well suited to the nature of FPGA hardware. In short, DiracDeltaNet’s small model size, low computational OP count, low precision and simplified operators allow us to co-design a highly customized computing unit for an FPGA. We implement the computing units for DiracDeltaNet on an Ultra96 SoC system through high-level synthesis. Our accelerator’s final top-5 accuracy of 88.1% on ImageNet is higher than that of all previously reported embedded FPGA accelerators. In addition, the accelerator reaches an inference speed of 96.5 FPS on the ImageNet classification task, surpassing prior works with similar accuracy by at least 16.9$\times$.
Tasks
Published 2018-11-21
URL https://arxiv.org/abs/1811.08634v3
PDF https://arxiv.org/pdf/1811.08634v3.pdf
PWC https://paperswithcode.com/paper/synetgy-algorithm-hardware-co-design-for
Repo https://github.com/Yang-YiFan/DiracDeltaNet
Framework pytorch
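
A hedged sketch of the building block the abstract describes: spatial convolutions replaced by a parameter-free channel-wise shift followed by a 1x1 convolution. This illustration uses torch.roll (which wraps around, whereas a hardware shift would pad with zeros) and a fixed assignment of shift directions to channels; it is not the DiracDeltaNet code from the linked repository.

```python
# Hedged sketch of a shift + 1x1 convolution block: each channel is shifted by one
# pixel in a fixed direction (or not at all), then a pointwise conv mixes channels.
import torch
import torch.nn as nn

class ShiftConv1x1(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Assign each input channel one of four shift directions, or no shift.
        dirs = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
        self.shifts = [dirs[c % len(dirs)] for c in range(in_ch)]
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        shifted = torch.stack(
            [torch.roll(x[:, c], shifts=(dy, dx), dims=(-2, -1))
             for c, (dy, dx) in enumerate(self.shifts)], dim=1)
        return self.pointwise(shifted)   # all learned parameters live in the 1x1 conv

block = ShiftConv1x1(8, 16)
print(block(torch.randn(1, 8, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])
```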

SOM-VAE: Interpretable Discrete Representation Learning on Time Series

Title SOM-VAE: Interpretable Discrete Representation Learning on Time Series
Authors Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch
Abstract High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data.
Tasks Dimensionality Reduction, Representation Learning, Time Series, Time Series Clustering
Published 2018-06-06
URL http://arxiv.org/abs/1806.02199v7
PDF http://arxiv.org/pdf/1806.02199v7.pdf
PWC https://paperswithcode.com/paper/som-vae-interpretable-discrete-representation
Repo https://github.com/JustGlowing/minisom
Framework none
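
A hedged minimal sketch of the discrete quantisation at the heart of the method: the encoder output is snapped to its nearest node in a 2D grid of embeddings. The full model adds SOM neighbourhood updates, reconstruction and commitment losses, the proposed gradient-based training scheme and the Markov transition model, none of which appear here; shapes are illustrative.

```python
# Hedged sketch: nearest-node quantisation of encodings onto a 2D SOM grid.
# Training losses, neighbourhood updates and the Markov model are omitted.
import torch

def quantise(z_e, som_grid):
    """z_e: (batch, d) encodings; som_grid: (rows, cols, d) node embeddings."""
    rows, cols, d = som_grid.shape
    nodes = som_grid.reshape(-1, d)                      # (rows * cols, d)
    dists = torch.cdist(z_e, nodes)                      # distance to every node
    idx = dists.argmin(dim=1)                            # index of the winning node
    z_q = nodes[idx]                                     # discrete representation
    coords = torch.stack((torch.div(idx, cols, rounding_mode="floor"),
                          idx % cols), dim=1)            # 2D grid coordinates
    return z_q, coords

z_e = torch.randn(5, 16)
som_grid = torch.randn(8, 8, 16)
z_q, coords = quantise(z_e, som_grid)
print(z_q.shape, coords)   # (5, 16) and the winning grid cells
```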

Combining Distant and Direct Supervision for Neural Relation Extraction

Title Combining Distant and Direct Supervision for Neural Relation Extraction
Authors Iz Beltagy, Kyle Lo, Waleed Ammar
Abstract In relation extraction with distant supervision, noisy labels make it difficult to train quality models. Previous neural models addressed this problem using an attention mechanism that attends to sentences that are likely to express the relations. We improve such models by combining the distant supervision data with additional directly supervised data, which we use as supervision for the attention weights. We find that joint training on both types of supervision leads to a better model because it improves the model’s ability to identify noisy sentences. In addition, we find that sigmoidal attention weights with max pooling achieve better performance than the commonly used weighted-average attention in this setup. Our proposed method achieves a new state-of-the-art result on the widely used FB-NYT dataset.
Tasks Relation Extraction
Published 2018-10-30
URL http://arxiv.org/abs/1810.12956v2
PDF http://arxiv.org/pdf/1810.12956v2.pdf
PWC https://paperswithcode.com/paper/improving-distant-supervision-with-maxpooled
Repo https://github.com/allenai/comb_dist_direct_relex
Framework pytorch
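
The aggregation choice highlighted in the abstract is compact enough to sketch. Below, per-sentence sigmoid attention weights are combined with max pooling over a bag of sentence representations, shown next to the more common softmax weighted average for contrast; the scoring layer and dimensions are illustrative, not the authors' implementation.

```python
# Hedged sketch: sigmoid attention + max pooling over a bag of sentence representations,
# versus the usual softmax weighted average. Scoring layer and sizes are illustrative.
import torch
import torch.nn as nn

class SigmoidAttentionMaxPool(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # unnormalised relevance score per sentence

    def forward(self, sent_reprs):
        """sent_reprs: (n_sentences, dim) representations for one entity pair's bag."""
        weights = torch.sigmoid(self.score(sent_reprs))   # (n, 1), each in (0, 1)
        weighted = weights * sent_reprs                    # down-weight noisy sentences
        return weighted.max(dim=0).values                  # max pool over the bag

def softmax_weighted_average(sent_reprs, score):
    """The more common alternative: softmax attention followed by a weighted mean."""
    weights = torch.softmax(score(sent_reprs), dim=0)
    return (weights * sent_reprs).sum(dim=0)

bag = torch.randn(7, 32)   # 7 sentences mentioning the same entity pair
agg = SigmoidAttentionMaxPool(32)
print(agg(bag).shape, softmax_weighted_average(bag, agg.score).shape)   # both (32,)
```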

Accelerating Incremental Gradient Optimization with Curvature Information

Title Accelerating Incremental Gradient Optimization with Curvature Information
Authors Hoi-To Wai, Wei Shi, Cesar A. Uribe, Angelia Nedich, Anna Scaglione
Abstract This paper studies an acceleration technique for the incremental aggregated gradient (IAG) method through the use of curvature information for solving strongly convex finite sum optimization problems. These optimization problems of interest arise in large-scale learning applications. Our technique utilizes a curvature-aided gradient tracking step to produce accurate gradient estimates incrementally using Hessian information. We propose and analyze two methods utilizing the new technique, the curvature-aided IAG (CIAG) method and the accelerated CIAG (A-CIAG) method, which are analogous to the gradient method and Nesterov’s accelerated gradient method, respectively. Setting $\kappa$ to be the condition number of the objective function, we prove $R$-linear convergence rates of $1 - \frac{4c_0 \kappa}{(\kappa+1)^2}$ for the CIAG method, and $1 - \sqrt{\frac{c_1}{2\kappa}}$ for the A-CIAG method, where $c_0,c_1 \leq 1$ are constants inversely proportional to the distance between the initial point and the optimal solution. When the initial iterate is close to the optimal solution, the $R$-linear convergence rates match those of the gradient and accelerated gradient methods, albeit CIAG and A-CIAG operate in an incremental setting with strictly lower computation complexity. Numerical experiments confirm our findings. The source codes used for this paper can be found at http://github.com/hoitowai/ciag/.
Tasks
Published 2018-05-31
URL https://arxiv.org/abs/1806.00125v2
PDF https://arxiv.org/pdf/1806.00125v2.pdf
PWC https://paperswithcode.com/paper/on-curvature-aided-incremental-aggregated
Repo https://github.com/hoitowai/ciag
Framework none
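
A hedged numpy sketch of the curvature-aided idea on a least-squares finite sum: stale component gradients are corrected with stored Hessian information, and one component is refreshed per iteration in cyclic order. The step size and problem are illustrative, and this is not the authors' code (see the linked repository for that). Since each f_i here is quadratic, the Hessian correction in fact recovers the exact full gradient.

```python
# Hedged sketch of curvature-aided incremental aggregation on a least-squares sum:
# aggregate = sum_i [ grad_i(y_i) + H_i(y_i) (x - y_i) ], refreshing one component
# per iteration. Problem, step size and iteration count are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)                                  # f_i(x) = 0.5 * (a_i @ x - b_i)^2

x = np.zeros(d)
y = np.tile(x, (n, 1))                                  # stored point per component
grads = A * (A @ x - b)[:, None]                        # stored gradients a_i (a_i @ x - b_i)
hessians = A[:, :, None] * A[:, None, :]                # stored Hessians a_i a_i^T
step = 1.0 / np.linalg.norm(A, 2) ** 2                  # 1 / L for the full sum

for k in range(500):
    # Curvature-aided aggregate: sum_i [ grad_i + H_i (x - y_i) ].
    agg = grads.sum(axis=0) + np.einsum("ijk,ik->j", hessians, x - y)
    x = x - step * agg
    i = k % n                                           # refresh one component (cyclic)
    y[i] = x
    grads[i] = A[i] * (A[i] @ x - b[i])
    hessians[i] = np.outer(A[i], A[i])                  # constant here, since f_i is quadratic

print(np.linalg.norm(A @ x - b))                        # close to the least-squares residual
```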