July 29, 2019

2973 words 14 mins read

Paper Group AWR 100



Evaluation of Croatian Word Embeddings

Title Evaluation of Croatian Word Embeddings
Authors Lukas Svoboda, Slobodan Beliga
Abstract Croatian is a poorly resourced and highly inflected language from the Slavic language family. Nowadays, research focuses mostly on English. We created a new word analogy corpus based on the original English Word2vec word analogy corpus and added some linguistic aspects specific to the Croatian language. Next, we created Croatian WordSim353 and RG65 corpora for a basic evaluation of word similarities. We compared the created corpora on two popular word representation models, based on the Word2Vec and fastText tools. The models were trained on a 1.37B-token training corpus and tested on the new robust Croatian word analogy corpus. Results show that the models are able to create meaningful word representations. This research has shown that the free word order and higher morphological complexity of the Croatian language influence the quality of the resulting word embeddings.
Tasks Word Embeddings
Published 2017-11-06
URL http://arxiv.org/abs/1711.01804v2
PDF http://arxiv.org/pdf/1711.01804v2.pdf
PWC https://paperswithcode.com/paper/evaluation-of-croatian-word-embeddings
Repo https://github.com/Svobikl/cr-analogy
Framework none
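
As a rough illustration of the evaluation pipeline described above, the sketch below runs an analogy query and a similarity lookup with gensim. The model path, the example Croatian words, and the analogy file name are placeholders; the actual benchmark files live in the linked cr-analogy repository.

```python
# Minimal sketch of word-analogy and word-similarity evaluation with gensim.
from gensim.models import KeyedVectors

# Hypothetical path to pretrained Croatian vectors (word2vec text format).
kv = KeyedVectors.load_word2vec_format("cc.hr.300.vec", binary=False)

# Single analogy query: kralj - muškarac + žena ~ kraljica (king/queen).
print(kv.most_similar(positive=["kralj", "žena"], negative=["muškarac"], topn=3))

# Pairwise similarity, as in the WordSim353/RG65-style evaluation.
print(kv.similarity("auto", "automobil"))

# Whole analogy benchmark in the word2vec questions-words format.
score, sections = kv.evaluate_word_analogies("croatian_analogy.txt")
print(f"overall analogy accuracy: {score:.3f}")
```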

Generalising Random Forest Parameter Optimisation to Include Stability and Cost

Title Generalising Random Forest Parameter Optimisation to Include Stability and Cost
Authors C. H. Bryan Liu, Benjamin Paul Chamberlain, Duncan A. Little, Angelo Cardoso
Abstract Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between error, stability, and cost using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics.
Tasks Bayesian Optimisation
Published 2017-06-29
URL http://arxiv.org/abs/1706.09865v2
PDF http://arxiv.org/pdf/1706.09865v2.pdf
PWC https://paperswithcode.com/paper/generalising-random-forest-parameter
Repo https://github.com/liuchbryan/generalised_forest_tuning
Framework none
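
The snippet below is a hedged sketch of the multi-criteria idea on synthetic data with scikit-learn: the same forest is retrained under several seeds, and held-out error, run-to-run prediction spread, and a crude cost proxy are returned together. The specific stability metric and cost proxy here are illustrative, not the paper's definitions; a Bayesian optimiser would then minimise a weighted combination of the three.

```python
# Sketch: error, stability, and cost of one random-forest configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def evaluate(n_estimators, max_depth, n_runs=5):
    probs, errs = [], []
    for seed in range(n_runs):
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    max_depth=max_depth, random_state=seed)
        rf.fit(X_tr, y_tr)
        probs.append(rf.predict_proba(X_te)[:, 1])
        errs.append(1.0 - rf.score(X_te, y_te))
    probs = np.array(probs)
    stability = probs.std(axis=0).mean()      # run-to-run prediction spread
    cost = n_estimators * (max_depth or 20)   # crude training-cost proxy
    return np.mean(errs), stability, cost

err, stab, cost = evaluate(n_estimators=100, max_depth=8)
print(f"error={err:.3f} instability={stab:.4f} cost={cost}")
```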

Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data

Title Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data
Authors Boris P. Hejblum, Chariff Alkhassim, Raphael Gottardo, François Caron, Rodolphe Thiébaut
Abstract Flow cytometry is a high-throughput technology used to quantify multiple surface and intracellular markers at the level of a single cell. This makes it possible to identify cell sub-types and to determine their relative proportions. Improvements of this technology make it possible to describe millions of individual cells from a blood sample using multiple markers. This results in high-dimensional datasets, whose manual analysis is highly time-consuming and poorly reproducible. While several methods have been developed to perform automatic recognition of cell populations, most of them treat and analyze each sample independently. However, in practice, individual samples are rarely independent (e.g., in longitudinal studies). Here, we propose to use a Bayesian nonparametric approach with Dirichlet process mixtures (DPM) of multivariate skew $t$-distributions to perform model-based clustering of flow-cytometry data. DPM models directly estimate the number of cell populations from the data, avoiding model selection issues, and skew $t$-distributions provide robustness to outliers and to the non-elliptical shapes of cell populations. To accommodate repeated measurements, we propose a sequential strategy relying on a parametric approximation of the posterior. We illustrate the good performance of our method on simulated data, on an experimental benchmark dataset, and on new longitudinal data from the DALIA-1 trial, which evaluates a therapeutic vaccine against HIV. On the benchmark dataset, the sequential strategy outperforms all other methods evaluated, and it similarly leads to improved performance on the DALIA-1 data. We have made the method available to the community in the R package NPflow.
Tasks Model Selection
Published 2017-02-14
URL http://arxiv.org/abs/1702.04407v4
PDF http://arxiv.org/pdf/1702.04407v4.pdf
PWC https://paperswithcode.com/paper/sequential-dirichlet-process-mixtures-of
Repo https://github.com/borishejblum/NPflow
Framework none
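
The authors' implementation is the R package NPflow; as a rough Python analogue, the sketch below fits a Dirichlet process mixture with scikit-learn's BayesianGaussianMixture, which likewise infers the effective number of clusters from the data. Note it uses Gaussian components, not the paper's skew $t$-distributions.

```python
# Gaussian DPM stand-in: infer the number of "cell populations" from data.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy 2-D "cytometry" data: three populations of unequal size.
data = np.vstack([rng.normal([0, 0], 0.5, (500, 2)),
                  rng.normal([4, 1], 0.7, (300, 2)),
                  rng.normal([1, 5], 0.4, (200, 2))])

dpm = BayesianGaussianMixture(
    n_components=10,  # truncation level, not the final cluster count
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full", random_state=0)
labels = dpm.fit_predict(data)

# Components with non-negligible weight ~ estimated number of populations.
print("active components:", (dpm.weights_ > 0.01).sum())
```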

Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages

Title Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
Authors Michael Sejr Schlichtkrull, Anders Søgaard
Abstract In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also achieves an absolute improvement of 2.25% averaged across 10 languages compared to the previous state of the art.
Tasks Dependency Parsing
Published 2017-01-06
URL http://arxiv.org/abs/1701.01623v1
PDF http://arxiv.org/pdf/1701.01623v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-dependency-parsing-with-late
Repo https://github.com/MichSchli/Tensor-LSTM
Framework none
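
A minimal sketch of the late-decoding idea: project a source-language matrix of edge scores through a (here randomly generated, hypothetical) soft word alignment, and only decode a tree afterwards.

```python
# Project dependency edge scores across a word alignment before decoding.
import numpy as np

n_src, n_tgt = 5, 4
scores_src = np.random.rand(n_src, n_src)   # scores_src[h, d]: head -> dependent

# Soft alignment A[t, s]: probability that target word t aligns to source s.
A = np.random.rand(n_tgt, n_src)
A /= A.sum(axis=1, keepdims=True)

# Projected target-side edge scores: sum over aligned head/dependent pairs.
scores_tgt = A @ scores_src @ A.T            # shape (n_tgt, n_tgt)

# A target-language parser can be trained to reproduce scores_tgt, or a tree
# decoded from it directly (e.g., with a maximum spanning tree algorithm).
print(scores_tgt.round(2))
```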

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

Title A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA
Authors Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
Abstract In recent years, deep learning algorithms have shown extremely high performance on machine learning tasks such as image classification and speech recognition. In support of such applications, various FPGA accelerator architectures have been proposed for convolutional neural networks (CNNs) that enable high performance for classification tasks at lower power than CPU and GPU processors. However, to date, there has been little research on the use of FPGA implementations of deconvolutional neural networks (DCNNs). DCNNs, also known as generative CNNs, encode high-dimensional probability distributions and have been widely used for computer vision applications such as scene completion, scene segmentation, image creation, image denoising, and super-resolution imaging. We propose an FPGA architecture for deconvolutional networks built around an accelerator which effectively handles the complex memory access patterns needed to perform strided deconvolutions, and which supports convolution as well. We also develop a three-step design optimization method that systematically exploits statistical analysis, design space exploration, and VLSI optimization. To verify our FPGA deconvolutional accelerator design methodology, we train DCNNs offline on two representative datasets using the generative adversarial network (GAN) method run on TensorFlow, and then map these DCNNs to an FPGA DCNN-plus-accelerator implementation to perform generative inference on a Xilinx Zynq-7000 FPGA. Our DCNN implementation achieves a peak performance density of 0.012 GOPs/DSP.
Tasks Denoising, Image Classification, Image Denoising, Scene Segmentation, Speech Recognition, Super-Resolution
Published 2017-05-07
URL http://arxiv.org/abs/1705.02583v1
PDF http://arxiv.org/pdf/1705.02583v1.pdf
PWC https://paperswithcode.com/paper/a-design-methodology-for-efficient
Repo https://github.com/chl218/DCNN-on-FPGA
Framework none
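
For intuition about why the accelerator's memory access patterns are complex, the sketch below implements a strided deconvolution (transposed convolution) directly: every input pixel scatters a scaled copy of the kernel into overlapping output windows. Shapes and stride are illustrative.

```python
# Strided deconvolution by output scattering (single channel, no padding).
import numpy as np

def deconv2d(x, k, stride):
    h, w = x.shape
    kh, kw = k.shape
    out = np.zeros((stride * (h - 1) + kh, stride * (w - 1) + kw))
    for i in range(h):
        for j in range(w):
            # Each input pixel contributes a scaled kernel to an overlapping
            # output window; these overlaps are the irregular-access problem.
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += x[i, j] * k
    return out

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.ones((4, 4))
print(deconv2d(x, k, stride=2).shape)  # (8, 8): upsampled feature map
```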

Robust Kronecker-Decomposable Component Analysis for Low-Rank Modeling

Title Robust Kronecker-Decomposable Component Analysis for Low-Rank Modeling
Authors Mehdi Bahri, Yannis Panagakis, Stefanos Zafeiriou
Abstract Dictionary learning and component analysis are part of one of the most well-studied and active research fields, at the intersection of signal and image processing, computer vision, and statistical machine learning. In dictionary learning, the current methods of choice are arguably K-SVD and its variants, which learn a dictionary (i.e., a decomposition) for sparse coding via Singular Value Decomposition. In robust component analysis, leading methods derive from Principal Component Pursuit (PCP), which recovers a low-rank matrix from sparse corruptions of unknown magnitude and support. However, K-SVD is sensitive to the presence of noise and outliers in the training set. Additionally, PCP does not provide a dictionary that respects the structure of the data (e.g., images), and requires expensive SVD computations when solved by convex relaxation. In this paper, we introduce a new robust decomposition of images by combining ideas from sparse dictionary learning and PCP. We propose a novel Kronecker-decomposable component analysis which is robust to gross corruption, can be used for low-rank modeling, and leverages separability to solve significantly smaller problems. We design an efficient learning algorithm by drawing links with a restricted form of tensor factorization. The effectiveness of the proposed approach is demonstrated on real-world applications, namely background subtraction and image denoising, by performing a thorough comparison with the current state of the art.
Tasks Denoising, Dictionary Learning, Image Denoising
Published 2017-03-22
URL http://arxiv.org/abs/1703.07886v2
PDF http://arxiv.org/pdf/1703.07886v2.pdf
PWC https://paperswithcode.com/paper/robust-kronecker-decomposable-component
Repo https://github.com/mbahri/KDRSDL
Framework none
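
For orientation, here is a heavily simplified numpy sketch of the PCP baseline the paper builds on, splitting a matrix into low-rank plus sparse parts via an inexact augmented Lagrangian with singular-value and soft thresholding. This is the generic baseline, not the authors' Kronecker-decomposable method; parameter choices follow common defaults.

```python
# Principal Component Pursuit: X ~ L (low-rank) + S (sparse corruptions).
import numpy as np

def soft(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0)

def svt(M, t):  # singular value thresholding (the expensive SVD step)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft(s, t)) @ Vt

def pcp(X, n_iter=200):
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(X).sum()
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = soft(X - L + Y / mu, lam / mu)
        Y += mu * (X - L - S)  # dual ascent on the equality constraint
    return L, S

X = np.outer(np.random.randn(50), np.random.randn(40))  # rank-1 signal
X[np.random.rand(*X.shape) < 0.05] += 10                 # gross corruptions
L, S = pcp(X)
print("recovered rank:", np.linalg.matrix_rank(L, tol=1e-6))
```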

Emergent Complexity via Multi-Agent Competition

Title Emergent Complexity via Multi-Agent Competition
Authors Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch
Abstract Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. In this paper, we point out that a competitive multi-agent environment trained with self-play can produce behaviors that are far more complex than the environment itself. We also point out that such environments come with a natural curriculum, because for any skill level, an environment full of agents of this level will have the right level of difficulty. This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple. The skills include behaviors such as running, blocking, ducking, tackling, fooling opponents, kicking, and defending using both arms and legs. A highlight of the learned behaviors can be found here: https://goo.gl/eR7fbX
Tasks
Published 2017-10-10
URL http://arxiv.org/abs/1710.03748v3
PDF http://arxiv.org/pdf/1710.03748v3.pdf
PWC https://paperswithcode.com/paper/emergent-complexity-via-multi-agent
Repo https://github.com/openai/multiagent-competition
Framework tf
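
The sketch below illustrates the opponent-sampling curriculum on a deliberately tiny stand-in, rock-paper-scissors, rather than the paper's 3D physics environments: a policy is trained with REINFORCE against opponents drawn from a pool of its own past snapshots, so the opposition is always at roughly the right skill level.

```python
# Self-play with an opponent pool of past policy snapshots (toy RPS game).
import numpy as np

rng = np.random.default_rng(0)
PAYOFF = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock/paper/scissors

def probs(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(3)
pool = [logits.copy()]                      # pool of past policy snapshots

for step in range(2000):
    opp = pool[rng.integers(len(pool))]     # sample an opponent from the past
    p, q = probs(logits), probs(opp)
    a, b = rng.choice(3, p=p), rng.choice(3, p=q)
    reward = PAYOFF[a, b]
    # REINFORCE ascent on expected payoff against the sampled opponent.
    logits += 0.05 * reward * (np.eye(3)[a] - p)
    if step % 200 == 0:
        pool.append(logits.copy())          # snapshot -> natural curriculum

print("final policy:", probs(logits).round(2))  # tends toward uniform play
```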

Wasserstein Distance Guided Representation Learning for Domain Adaptation

Title Wasserstein Distance Guided Representation Learning for Domain Adaptation
Authors Jian Shen, Yanru Qu, Weinan Zhang, Yong Yu
Abstract Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate the empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.
Tasks Domain Adaptation, Image Classification, Representation Learning
Published 2017-07-05
URL http://arxiv.org/abs/1707.01217v4
PDF http://arxiv.org/pdf/1707.01217v4.pdf
PWC https://paperswithcode.com/paper/wasserstein-distance-guided-representation
Repo https://github.com/yjhong89/Domain-Adaptation
Framework tf
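
A condensed PyTorch sketch of the WDGRL objective, under assumed network sizes and random stand-in batches: an inner loop trains the domain critic (with a gradient penalty) to estimate the Wasserstein distance between source and target features, then the feature extractor takes a step to shrink it. The task classification loss that would normally be trained jointly is omitted.

```python
# Adversarial Wasserstein-distance minimization between domain features.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 32))
critic = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(feat.parameters(), lr=1e-4)

xs, xt = torch.randn(64, 100), torch.randn(64, 100)  # source/target batches

for _ in range(5):                  # inner critic steps: maximise the distance
    hs, ht = feat(xs).detach(), feat(xt).detach()
    wd = critic(hs).mean() - critic(ht).mean()
    alpha = torch.rand(hs.size(0), 1)
    h_mix = (alpha * hs + (1 - alpha) * ht).requires_grad_(True)
    grads = torch.autograd.grad(critic(h_mix).sum(), h_mix, create_graph=True)[0]
    gp = ((grads.norm(2, dim=1) - 1) ** 2).mean()    # gradient penalty
    opt_c.zero_grad(); (-wd + 10.0 * gp).backward(); opt_c.step()

# Feature extractor step: minimise the estimated Wasserstein distance
# (jointly with a task classification loss, omitted here).
wd = critic(feat(xs)).mean() - critic(feat(xt)).mean()
opt_f.zero_grad(); wd.backward(); opt_f.step()
```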

Slanted Stixels: Representing San Francisco’s Steepest Streets

Title Slanted Stixels: Representing San Francisco’s Steepest Streets
Authors Daniel Hernandez-Juarez, Lukas Schneider, Antonio Espinosa, David Vázquez, Antonio M. López, Uwe Franke, Marc Pollefeys, Juan C. Moure
Abstract In this work we present a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced that uses an extremely efficient over-segmentation. In doing so, the computational complexity of the Stixel inference algorithm is reduced significantly, achieving real-time computation capabilities with only a slight drop in accuracy. We evaluate the proposed approach in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.
Tasks
Published 2017-07-17
URL http://arxiv.org/abs/1707.05397v1
PDF http://arxiv.org/pdf/1707.05397v1.pdf
PWC https://paperswithcode.com/paper/slanted-stixels-representing-san-franciscos
Repo https://github.com/dhernandez0/stixels
Framework none

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

Title Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
Authors Zhen Yang, Wei Chen, Feng Wang, Bo Xu
Abstract This paper proposes an approach for applying GANs to NMT. We build a conditional sequence generative adversarial net which comprises two adversarial sub-models, a generator and a discriminator. The generator aims to generate sentences which are hard to discriminate from human-translated sentences (i.e., the golden target sentences), and the discriminator tries to distinguish the machine-generated sentences from human-translated ones. The two sub-models play a minimax game and achieve a win-win situation when they reach a Nash equilibrium. Additionally, the static sentence-level BLEU is utilized as the reinforced objective for the generator, which biases the generation towards high BLEU scores. During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences and feed the evaluations back to guide the learning of the generator. Experimental results show that the proposed model consistently outperforms the traditional RNNSearch and the newly emerged state-of-the-art Transformer on English-German and Chinese-English translation tasks.
Tasks Machine Translation
Published 2017-03-15
URL http://arxiv.org/abs/1703.04887v4
PDF http://arxiv.org/pdf/1703.04887v4.pdf
PWC https://paperswithcode.com/paper/improving-neural-machine-translation-with
Repo https://github.com/qiuguoxia/chatbotmodal
Framework none
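
The schematic sketch below shows just the generator update under toy, assumed components: sample a translation, score it with both the dynamic discriminator and static sentence-level BLEU, and reinforce the sampled tokens. The toy generator ignores the source sentence and the discriminator consumes raw token ids, which a real implementation would not do.

```python
# Generator policy-gradient step mixing discriminator and BLEU rewards.
import torch
import torch.nn as nn
from nltk.translate.bleu_score import sentence_bleu

vocab, T = 50, 6
gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, T * vocab))
disc = nn.Sequential(nn.Linear(T, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

src = torch.randn(1, 8)                       # stand-in source encoding
ref = [3, 7, 7, 1, 4, 9]                      # stand-in reference translation

logits = gen(src).view(T, vocab)
dist = torch.distributions.Categorical(logits=logits)
sample = dist.sample()                        # machine-"translated" sentence

# Bigram BLEU keeps the toy example non-degenerate at this sentence length.
bleu = sentence_bleu([list(map(str, ref))], list(map(str, sample.tolist())),
                     weights=(0.5, 0.5))
d_score = disc(sample.float().unsqueeze(0)).item()
reward = 0.5 * d_score + 0.5 * bleu           # dynamic + static objectives

loss = -(reward * dist.log_prob(sample).sum())  # REINFORCE on sampled tokens
opt.zero_grad(); loss.backward(); opt.step()
```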

Tensor Product Generation Networks for Deep NLP Modeling

Title Tensor Product Generation Networks for Deep NLP Modeling
Authors Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Wu
Abstract We present a new approach to the design of deep networks for natural language processing (NLP), based on the general technique of Tensor Product Representations (TPRs) for encoding and processing symbol structures in distributed neural networks. A network architecture — the Tensor Product Generation Network (TPGN) — is proposed which is capable in principle of carrying out TPR computation, but which uses unconstrained deep learning to design its internal representations. Instantiated in a model for image-caption generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset. The TPR-capable structure enables interpretation of internal representations and operations, which prove to contain considerable grammatical content. Our caption-generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09118v5
PDF http://arxiv.org/pdf/1709.09118v5.pdf
PWC https://paperswithcode.com/paper/tensor-product-generation-networks-for-deep
Repo https://github.com/ggeorgea/TPRcaption
Framework pytorch
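
The core TPR operation that TPGN is designed to be capable of, sketched in numpy: bind filler vectors to role vectors with outer products, superpose the bindings into one matrix, and unbind with the (here orthonormal, hence self-dual) role vectors. Dimensions are illustrative.

```python
# Tensor Product Representation: binding, superposition, and unbinding.
import numpy as np

d_f, d_r, n = 4, 3, 3
fillers = np.random.randn(n, d_f)             # e.g., word/category embeddings
roles = np.linalg.qr(np.random.randn(d_r, d_r))[0][:n]  # orthonormal roles

# Binding: T = sum_i f_i (outer) r_i, stored as a d_f x d_r matrix.
T = sum(np.outer(fillers[i], roles[i]) for i in range(n))

# Unbinding: with orthonormal roles, T @ r_i recovers filler i exactly.
print(np.allclose(T @ roles[1], fillers[1]))  # True
```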

Learning Hierarchical Features from Generative Models

Title Learning Hierarchical Features from Generative Models
Authors Shengjia Zhao, Jiaming Song, Stefano Ermon
Abstract Deep neural networks have been shown to be very successful at learning feature hierarchies in supervised learning tasks. Generative models, on the other hand, have benefited less from hierarchical models with multiple layers of latent variables. In this paper, we prove that hierarchical latent variable models do not take advantage of the hierarchical structure when trained with existing variational methods, and we characterize limitations on the kinds of features existing models can learn. Finally, we propose an alternative architecture that does not suffer from these limitations. Our model is able to learn highly interpretable and disentangled hierarchical features on several natural image datasets with no task-specific regularization or prior knowledge.
Tasks Latent Variable Models
Published 2017-02-27
URL http://arxiv.org/abs/1702.08396v2
PDF http://arxiv.org/pdf/1702.08396v2.pdf
PWC https://paperswithcode.com/paper/learning-hierarchical-features-from
Repo https://github.com/Michedev/VLAE
Framework tf

Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions

Title Spatial Variational Auto-Encoding via Matrix-Variate Normal Distributions
Authors Zhengyang Wang, Hao Yuan, Shuiwang Ji
Abstract The key idea of variational auto-encoders (VAEs) resembles that of traditional auto-encoder models in which spatial information is supposed to be explicitly encoded in the latent space. However, the latent variables in VAEs are vectors, which can be interpreted as multiple feature maps of size 1x1. Such representations can only convey spatial information implicitly when coupled with powerful decoders. In this work, we propose spatial VAEs that use feature maps of larger size as latent variables to explicitly capture spatial information. This is achieved by allowing the latent variables to be sampled from matrix-variate normal (MVN) distributions whose parameters are computed from the encoder network. To increase dependencies among locations on latent feature maps and reduce the number of parameters, we further propose spatial VAEs via low-rank MVN distributions. Experimental results show that the proposed spatial VAEs outperform original VAEs in capturing rich structural and spatial information.
Tasks
Published 2017-05-18
URL http://arxiv.org/abs/1705.06821v2
PDF http://arxiv.org/pdf/1705.06821v2.pdf
PWC https://paperswithcode.com/paper/spatial-variational-auto-encoding-via-matrix
Repo https://github.com/divelab/svae
Framework tf
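
A small numpy sketch of the sampling step: drawing a latent feature map Z = M + A E B^T with E i.i.d. standard normal yields Z ~ MN(M, AA^T, BB^T), so the row and column covariances factorise exactly as the paper exploits. The covariance factors below are arbitrary stand-ins for encoder outputs.

```python
# Sampling a latent feature map from a matrix-variate normal distribution.
import numpy as np

h = w = 8                                      # latent feature-map size
M = np.zeros((h, w))                           # mean map (from the encoder)
A = np.linalg.cholesky(0.5 * np.eye(h) + 0.5)  # row-covariance factor
B = np.linalg.cholesky(0.5 * np.eye(w) + 0.5)  # column-covariance factor

E = np.random.randn(h, w)                      # i.i.d. standard normal noise
Z = M + A @ E @ B.T                            # one latent feature map

# The low-rank variant replaces A and B with tall, skinny factors plus a
# diagonal term, cutting parameters while keeping cross-location dependency.
print(Z.shape)
```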

Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Title Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
Authors Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud
Abstract Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
Tasks Latent Variable Models
Published 2017-10-31
URL http://arxiv.org/abs/1711.00123v3
PDF http://arxiv.org/pdf/1711.00123v3.pdf
PWC https://paperswithcode.com/paper/backpropagation-through-the-void-optimizing
Repo https://github.com/brain-research/mirage-rl
Framework tf
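
A deliberately simplified sketch of the core idea: a score-function estimator whose control variate is trained jointly to minimise the estimator's variance. For unbiasedness, this toy version uses a sample-independent scalar control variate; the paper's RELAX estimator instead feeds a neural network a continuously relaxed sample and adds correction terms.

```python
# REINFORCE with a control variate trained to minimise estimator variance.
import torch

f = lambda b: (b - 0.45) ** 2                  # black-box objective
theta = torch.tensor(0.0, requires_grad=True)  # Bernoulli parameter (logit)
c = torch.tensor(0.0, requires_grad=True)      # learned control variate
opt_c = torch.optim.Adam([c], lr=1e-2)

for _ in range(500):
    dist = torch.distributions.Bernoulli(logits=theta)
    b = dist.sample()
    # Score-function gradient estimate with the control variate subtracted;
    # a sample-independent c keeps the estimator unbiased.
    g = torch.autograd.grad((f(b) - c) * dist.log_prob(b),
                            theta, create_graph=True)[0]
    # Train c on a single-sample variance proxy of the estimator itself.
    opt_c.zero_grad()
    (g ** 2).backward()
    opt_c.step()
    with torch.no_grad():                      # SGD step on the estimate
        theta -= 0.1 * g

print(torch.sigmoid(theta).item())             # p(b=1) -> 0, since f(0) < f(1)
```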

Neural Discourse Structure for Text Categorization

Title Neural Discourse Structure for Text Categorization
Authors Yangfeng Ji, Noah Smith
Abstract We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization. Our approach uses a recursive neural network and a newly proposed attention mechanism to compute a representation of the text that focuses on salient content, from the perspective of both RST and the task. Experiments consider variants of the approach and illustrate its strengths and weaknesses.
Tasks Text Categorization
Published 2017-02-07
URL http://arxiv.org/abs/1702.01829v2
PDF http://arxiv.org/pdf/1702.01829v2.pdf
PWC https://paperswithcode.com/paper/neural-discourse-structure-for-text
Repo https://github.com/jiyfeng/disco4textcat
Framework none
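
A toy sketch of the model shape, with a hand-made tree and random EDU embeddings standing in for discourse-parser output: leaf vectors are composed bottom-up along an RST-style binary tree by a recursive cell, and attention over all node vectors produces the document representation fed to the classifier. The attention here is a generic scorer, not the paper's proposed mechanism.

```python
# Recursive composition over a discourse tree plus attention for categorization.
import torch
import torch.nn as nn

d = 16
compose = nn.Linear(2 * d, d)                  # recursive composition cell
attn = nn.Linear(d, 1)                         # salience scorer
clf = nn.Linear(d, 2)                          # category classifier

def encode(node, nodes):
    if isinstance(node, torch.Tensor):         # leaf: an EDU embedding
        h = node
    else:                                      # internal node: (left, right)
        l, r = encode(node[0], nodes), encode(node[1], nodes)
        h = torch.tanh(compose(torch.cat([l, r])))
    nodes.append(h)
    return h

edus = [torch.randn(d) for _ in range(3)]      # stand-in EDU embeddings
tree = ((edus[0], edus[1]), edus[2])           # ((e0 e1) e2)

nodes = []
encode(tree, nodes)
H = torch.stack(nodes)                         # all node representations
weights = torch.softmax(attn(H).squeeze(-1), dim=0)
doc = (weights.unsqueeze(-1) * H).sum(0)       # attention-weighted document
print(clf(doc))                                # logits over text categories
```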