January 24, 2020

3298 words 16 mins read

Paper Group NANR 116

VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning. DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion. $A^*$ sampling with probability matching. Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains …

VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning

Title VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning
Authors Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou
Abstract To extract and relate visual and linguistic concepts from images and textual descriptions for text-based zero-shot learning (ZSL), we develop variational hetero-encoder (VHE) that decodes text via a deep probabilistic topic model, the variational posterior of whose local latent variables is encoded from an image via a Weibull distribution based inference network. To further improve VHE and add an image generator, we propose VHE randomized generative adversarial net (VHEGAN) that exploits the synergy between VHE and GAN through their shared latent space. After training with a hybrid stochastic-gradient MCMC/variational inference/stochastic gradient descent inference algorithm, VHEGAN can be used in a variety of settings, such as text generation/retrieval conditioning on an image, image generation/retrieval conditioning on a document/image, and generation of text-image pairs. The efficacy of VHEGAN is demonstrated quantitatively with experiments on both conventional and generalized ZSL tasks, and qualitatively on (conditional) image and/or text generation/retrieval.
Tasks Image Generation, Text Generation, Zero-Shot Learning
Published 2019-05-01
URL https://openreview.net/forum?id=S1eX-nA5KX
PDF https://openreview.net/pdf?id=S1eX-nA5KX
PWC https://paperswithcode.com/paper/vhegan-variational-hetero-encoder-randomized
Repo
Framework
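
The Weibull-based inference network mentioned in the abstract relies on the fact that a Weibull draw can be reparameterized through its inverse CDF, so the variational posterior stays differentiable. Below is a minimal sketch of that trick in PyTorch; it illustrates the general mechanism only, is not the authors' released code, and the shape/scale tensors are stand-ins for encoder outputs.

```python
import torch
import torch.nn.functional as F

def weibull_reparameterized_sample(k: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
    """Draw z ~ Weibull(k, lam) differentiably via the inverse CDF:
    z = lam * (-log(1 - u))**(1/k), with u ~ Uniform(0, 1)."""
    u = torch.rand_like(k).clamp(max=1.0 - 1e-6)   # avoid log(0)
    return lam * (-torch.log1p(-u)) ** (1.0 / k)

# Illustrative encoder outputs: positive shape/scale parameters per topic.
k = F.softplus(torch.randn(32, 50)) + 1e-3    # Weibull shape, one per (image, topic)
lam = F.softplus(torch.randn(32, 50)) + 1e-3  # Weibull scale
z = weibull_reparameterized_sample(k, lam)    # non-negative topic weights, shape (32, 50)
```

A Weibull posterior is a natural fit here because it is supported on the non-negative reals, matching the topic-weight latent variables of the probabilistic topic model.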

DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion

Title DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion
Authors Tirana Fatyanosa, Al Hafiz Akbar Maulana Siagian, Masayoshi Aritsugi
Abstract This paper describes the participation of the DBMS-KU team in SemEval 2019 Task 9, that is, suggestion mining from online reviews and forums. To deal with this task, we explore several machine learning approaches, i.e., Random Forest (RF), Logistic Regression (LR), Multinomial Naive Bayes (MNB), Linear Support Vector Classification (LSVC), Sublinear Support Vector Classification (SSVC), Convolutional Neural Network (CNN), and Variable Length Chromosome Genetic Algorithm-Naive Bayes (VLCGA-NB). Our system obtains reasonable F1-Scores of 0.47 and 0.37 on the evaluation data in Subtask A and Subtask B, respectively. In particular, our results outperform the baseline in Subtask A. Interestingly, the results seem to show that our system performs well in classifying the Non-suggestion class.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2208/
PDF https://www.aclweb.org/anthology/S19-2208
PWC https://paperswithcode.com/paper/dbms-ku-at-semeval-2019-task-9-exploring
Repo
Framework
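
Most of the classifiers listed in the abstract are standard scikit-learn models over sparse text features. The sketch below shows one plausible configuration (TF-IDF with sublinear term frequency feeding Logistic Regression); it is a hedged reconstruction for illustration, not the team's submitted system, and the toy data is invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score

# Toy stand-in data: 1 = suggestion, 0 = non-suggestion.
train_texts = ["Please add a dark mode option", "The app crashed again yesterday"]
train_labels = [1, 0]
test_texts = ["It would be nice to export results to CSV"]
test_labels = [1]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(train_texts, train_labels)
print("F1:", f1_score(test_labels, clf.predict(test_texts)))
```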

$A^*$ sampling with probability matching

Title $A^*$ sampling with probability matching
Authors Yichi Zhou, Jun Zhu
Abstract Probabilistic methods often need to draw samples from a nontrivial distribution. $A^*$ sampling is an elegant algorithm built upon a top-down construction of a Gumbel process, where a large state space is divided into subsets and at each round $A^*$ sampling selects a subset to process. However, the selection rule depends on a bound function, which can be intractable. Moreover, we show that such a selection criterion can be inefficient. This paper aims to improve $A^*$ sampling by addressing these issues. To design a suitable selection rule, we apply Probability Matching, a widely used method for decision making, to $A^*$ sampling. We provide insights into the relationship between $A^*$ sampling and probability matching by analyzing a nontrivial special case in which the state space is partitioned into two subsets. We show that in this case probability matching is optimal within a constant gap. Furthermore, as directly applying probability matching to $A^*$ sampling is time consuming, we design an approximate version based on Monte-Carlo estimators. We also present an efficient implementation by leveraging special properties of Gumbel distributions and well-designed balanced trees. Empirical results show that our method saves a significant amount of computational resources on suboptimal regions compared with $A^*$ sampling.
Tasks Decision Making
Published 2019-05-01
URL https://openreview.net/forum?id=HygQro05KX
PDF https://openreview.net/pdf?id=HygQro05KX
PWC https://paperswithcode.com/paper/a-sampling-with-probability-matching
Repo
Framework
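
One way to see the connection between Gumbel processes and probability matching is the Gumbel-max trick: if each region carries a Gumbel-distributed optimum whose location is the region's log mass, then taking the argmax of one Gumbel draw per region selects each region exactly with the probability that it holds the overall maximum. The snippet below illustrates only that selection rule; the algorithm in the paper additionally maintains bound functions, Monte-Carlo estimators, and balanced trees, which are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def probability_matching_select(log_masses: np.ndarray) -> int:
    """Pick a region index with probability proportional to its (unnormalized) mass,
    i.e. with the probability that it contains the maximum of the Gumbel process."""
    return int(np.argmax(log_masses + rng.gumbel(size=log_masses.shape)))

# Three candidate regions with unnormalized probability mass 0.5, 0.3, 0.2.
log_masses = np.log(np.array([0.5, 0.3, 0.2]))
counts = np.bincount(
    [probability_matching_select(log_masses) for _ in range(10_000)], minlength=3
)
print(counts / counts.sum())  # approximately [0.5, 0.3, 0.2]
```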

Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains

Title Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains
Authors Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema
Abstract In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. While RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter. Using ReStA, we study four recent and successful neural language models, and evaluate how sensitive their internal representations are to the amount of prior context. Using RSA, we perform a systematic study of how similar the representational spaces in the first and second (or higher) layers of these models are to each other and to patterns of activation in the human brain. Our results reveal surprisingly strong differences between language models, and give insights into where the deep linguistic processing that integrates information over multiple sentences is happening in these models. The combination of ReStA and RSA on models and brains allows us to start addressing the important question of what kind of linguistic processes we can hope to observe in fMRI brain imaging data. In particular, our results suggest that the data on story reading from Wehbe et al. (2014) contains a signal of shallow linguistic processing, but shows no evidence of the more interesting deep linguistic processing.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4820/
PDF https://www.aclweb.org/anthology/W19-4820
PWC https://paperswithcode.com/paper/blackbox-meets-blackbox-representational-1
Repo
Framework
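
For readers unfamiliar with RSA, the core computation is small: build a representational dissimilarity matrix (RDM) for each system from its responses to the same stimuli, then correlate the two RDMs. The sketch below is a generic minimal version using correlation distance and Spearman correlation; the paper's exact distance measures, layer choices, and fMRI preprocessing are not reproduced, and the random arrays are placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations: np.ndarray) -> np.ndarray:
    """Condensed RDM: pairwise correlation distance between stimulus representations."""
    return pdist(activations, metric="correlation")

def rsa_score(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Second-order similarity of two systems over the same stimuli."""
    rho, _ = spearmanr(rdm(acts_a), rdm(acts_b))
    return float(rho)

# Placeholder data: 20 stimuli seen by a model layer (768-d) and by 500 voxels.
model_layer = np.random.randn(20, 768)
brain_voxels = np.random.randn(20, 500)
print(rsa_score(model_layer, brain_voxels))
```

ReStA then applies the same machinery to two instances of the same model that differ in a single parameter, such as the amount of prior context.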

Network Compression Using Correlation Analysis of Layer Responses

Title Network Compression Using Correlation Analysis of Layer Responses
Authors Xavier Suau, Luca Zappella, Nicholas Apostoloff
Abstract Principal Filter Analysis (PFA) is an easy to implement, yet effective method for neural network compression. PFA exploits the intrinsic correlation between filter responses within network layers to recommend a smaller network footprint. We propose two compression algorithms: the first allows a user to specify the proportion of the original spectral energy that should be preserved in each layer after compression, while the second is a heuristic that leads to a parameter-free approach that automatically selects the compression used at each layer. Both algorithms are evaluated against several architectures and datasets, and we show considerable compression rates without compromising accuracy, e.g., for VGG-16 on CIFAR-10, CIFAR-100 and ImageNet, PFA achieves a compression rate of 8x, 3x, and 1.4x with an accuracy gain of 0.4%, 1.4% points, and 2.4% respectively. In our tests we also demonstrate that networks compressed with PFA achieve an accuracy that is very close to the empirical upper bound for a given compression ratio. Finally, we show how PFA is an effective tool for simultaneous compression and domain adaptation.
Tasks Domain Adaptation, Neural Network Compression
Published 2019-05-01
URL https://openreview.net/forum?id=rkl42iA5t7
PDF https://openreview.net/pdf?id=rkl42iA5t7
PWC https://paperswithcode.com/paper/network-compression-using-correlation-1
Repo
Framework
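
The first PFA variant in the abstract keeps as many filters as are needed to preserve a chosen fraction of the spectral energy of a layer's responses. A hedged sketch of that recommendation step is below; it illustrates only the energy criterion, not the full compression pipeline (filter selection and retraining) from the paper.

```python
import numpy as np

def recommended_filter_count(responses: np.ndarray, energy_to_keep: float = 0.95) -> int:
    """responses: (num_samples, num_filters) activations of one layer.
    Returns how many filters are needed to retain the requested spectral energy."""
    cov = np.cov(responses, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]           # descending spectrum
    cumulative_energy = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative_energy, energy_to_keep) + 1)

# Example: 256 filters whose responses really live in a ~40-dimensional subspace.
latent = np.random.randn(10_000, 40)
responses = latent @ np.random.randn(40, 256) + 0.01 * np.random.randn(10_000, 256)
print(recommended_filter_count(responses, energy_to_keep=0.95))  # close to 40
```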

Neural Network Cost Landscapes as Quantum States

Title Neural Network Cost Landscapes as Quantum States
Authors Abdulah Fawaz, Sebastien Piat, Paul Klein, Peter Mountney, Simone Severini
Abstract Quantum computers promise significant advantages over classical computers for a number of different applications. We show that the complete loss function landscape of a neural network can be represented as the quantum state output by a quantum computer. We demonstrate this explicitly for a binary neural network and, further, show how a quantum computer can train the network by manipulating this state using quantum amplitude amplification, a well-known algorithm. We further show that, with minor adaptation, this method can also represent the meta-loss landscape of a number of neural network architectures simultaneously. We search this meta-loss landscape with the same method to simultaneously train and design a binary neural network.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=SyxvSiCcFQ
PDF https://openreview.net/pdf?id=SyxvSiCcFQ
PWC https://paperswithcode.com/paper/neural-network-cost-landscapes-as-quantum
Repo
Framework

Infinitely Deep Infinite-Width Networks

Title Infinitely Deep Infinite-Width Networks
Authors Jovana Mitrovic, Peter Wirnsberger, Charles Blundell, Dino Sejdinovic, Yee Whye Teh
Abstract Infinite-width neural networks have been extensively used to study the theoretical properties underlying the extraordinary empirical success of standard, finite-width neural networks. Nevertheless, until now, infinite-width networks have been limited to at most two hidden layers. To address this shortcoming, we study the initialisation requirements of these networks and show that the main challenge for constructing them is defining the appropriate sampling distributions for the weights. Based on these observations, we propose a principled approach to weight initialisation that correctly accounts for the functional nature of the hidden layer activations and facilitates the construction of arbitrarily many infinite-width layers, thus enabling the construction of arbitrarily deep infinite-width networks. The main idea of our approach is to iteratively reparametrise the hidden-layer activations into appropriately defined reproducing kernel Hilbert spaces and use the canonical way of constructing probability distributions over these spaces for specifying the required weight distributions in a principled way. Furthermore, we examine the practical implications of this construction for standard, finite-width networks. In particular, we derive a novel weight initialisation scheme for standard, finite-width networks that takes into account the structure of the data and information about the task at hand. We demonstrate the effectiveness of this weight initialisation approach on the MNIST, CIFAR-10 and Year Prediction MSD datasets.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=SkGT6sRcFX
PDF https://openreview.net/pdf?id=SkGT6sRcFX
PWC https://paperswithcode.com/paper/infinitely-deep-infinite-width-networks
Repo
Framework
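
As background for readers new to infinite-width networks: in the infinite-width limit, the covariance of hidden activations can be propagated through depth in closed form. The sketch below shows the standard ReLU (arc-cosine) NNGP kernel recursion, which is the usual starting point for such constructions; it is background material and not the paper's RKHS-based reparameterisation or its derived initialisation scheme.

```python
import numpy as np

def relu_nngp_kernel(x: np.ndarray, depth: int, sw2: float = 2.0, sb2: float = 0.0) -> np.ndarray:
    """Infinite-width ReLU NNGP kernel over the rows of x after `depth` hidden layers."""
    k = sw2 * (x @ x.T) / x.shape[1] + sb2          # covariance after the input layer
    for _ in range(depth):
        diag = np.sqrt(np.diag(k))
        norm = np.outer(diag, diag)
        cos_theta = np.clip(k / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # arc-cosine recursion for the post-ReLU covariance of the next layer
        k = sw2 / (2 * np.pi) * norm * (np.sin(theta) + (np.pi - theta) * cos_theta) + sb2
    return k

x = np.random.randn(5, 10)
print(relu_nngp_kernel(x, depth=3).shape)  # (5, 5)
```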

Training Variational Auto Encoders with Discrete Latent Representations using Importance Sampling

Title Training Variational Auto Encoders with Discrete Latent Representations using Importance Sampling
Authors Alexander Bartler, Felix Wiewel, Bin Yang, Lukas Mauch
Abstract The Variational Auto Encoder (VAE) is a popular generative latent variable model that is often applied for representation learning. Standard VAEs assume continuous-valued latent variables and are trained by maximization of the evidence lower bound (ELBO). Conventional methods obtain a differentiable estimate of the ELBO with reparametrized sampling and optimize it with Stochastic Gradient Descent (SGD). However, this is not possible if we want to train VAEs with discrete-valued latent variables, since reparametrized sampling is not possible. Until now, there have been no simple solutions to circumvent this problem. In this paper, we propose an easy method to train VAEs with binary or categorically valued latent representations. To this end, we use a differentiable estimator for the ELBO which is based on importance sampling. In experiments, we verify the approach and train two different VAE architectures with Bernoulli and categorically distributed latent representations on two different benchmark datasets.
Tasks Representation Learning
Published 2019-05-01
URL https://openreview.net/forum?id=SkNSOjR9Y7
PDF https://openreview.net/pdf?id=SkNSOjR9Y7
PWC https://paperswithcode.com/paper/training-variational-auto-encoders-with
Repo
Framework
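
The trick in the abstract is that an importance-sampling estimate of the ELBO can be made differentiable in the encoder parameters by drawing the discrete codes from a fixed proposal and reweighting each term by q(z|x)/proposal. The sketch below shows that idea for Bernoulli latents; whether it matches the authors' exact estimator is an assumption, and the encoder, decoder, and reconstruction term are illustrative.

```python
import torch

def bernoulli_log_prob(z: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
    return (z * torch.log(probs + 1e-8) + (1 - z) * torch.log(1 - probs + 1e-8)).sum(-1)

def importance_sampled_elbo(x, encoder, decoder, num_samples: int = 16):
    q_probs = encoder(x)                                  # (batch, latent_dim), values in (0, 1)
    proposal = torch.full_like(q_probs, 0.5)              # fixed proposal, independent of encoder
    total = 0.0
    for _ in range(num_samples):
        z = torch.bernoulli(proposal)                     # discrete sample, no gradient needed
        log_w = bernoulli_log_prob(z, q_probs) - bernoulli_log_prob(z, proposal)
        log_px_z = -((decoder(z) - x) ** 2).sum(-1)       # illustrative Gaussian-style likelihood
        log_pz = bernoulli_log_prob(z, torch.full_like(z, 0.5))  # uniform Bernoulli prior
        log_qz = bernoulli_log_prob(z, q_probs)
        total = total + torch.exp(log_w) * (log_px_z + log_pz - log_qz)
    return (total / num_samples).mean()

# Toy usage: gradients flow through q_probs even though z itself is discrete.
enc = torch.nn.Sequential(torch.nn.Linear(784, 20), torch.nn.Sigmoid())
dec = torch.nn.Linear(20, 784)
loss = -importance_sampled_elbo(torch.rand(8, 784), enc, dec, num_samples=8)
loss.backward()
```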

Deep reinforcement learning with relational inductive biases

Title Deep reinforcement learning with relational inductive biases
Authors Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia
Abstract We introduce an approach for augmenting model-free deep reinforcement learning agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability. Our architecture encodes an image as a set of vectors, and applies an iterative message-passing procedure to discover and reason about relevant entities and relations in a scene. In six of seven StarCraft II Learning Environment mini-games, our agent achieved state-of-the-art performance, and surpassed human grandmaster level on four. In a novel navigation and planning task, our agent’s performance and learning efficiency far exceeded non-relational baselines, and it was able to generalize to more complex scenes than it had experienced during training. Moreover, when we examined its learned internal representations, they reflected important structure about the problem and the agent’s intentions. The main contribution of this work is to introduce techniques for representing and reasoning about states in model-free deep reinforcement learning agents via relational inductive biases. Our experiments show this approach can offer advantages in efficiency, generalization, and interpretability, and can scale up to meet some of the most challenging test environments in modern artificial intelligence.
Tasks Relational Reasoning, Starcraft, Starcraft II
Published 2019-05-01
URL https://openreview.net/forum?id=HkxaFoC9KQ
PDF https://openreview.net/pdf?id=HkxaFoC9KQ
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-with-relational
Repo
Framework
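
The iterative message-passing procedure over entity vectors is essentially repeated self-attention across the set of entities extracted from the image. Below is a minimal, hedged sketch of one such relational block in PyTorch; the agents' full architecture (feature extraction, policy/value heads, and the RL training loop) is not shown, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class RelationalBlock(nn.Module):
    """One round of message passing: every entity attends to every other entity."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, num_entities, dim), e.g. flattened CNN feature-map positions
        attended, _ = self.attn(entities, entities, entities)
        entities = self.norm1(entities + attended)
        return self.norm2(entities + self.mlp(entities))

entities = torch.randn(8, 25, 64)   # e.g. a 5x5 grid of per-location feature vectors
out = RelationalBlock()(entities)   # same shape; stack or repeat the block for more rounds
```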

Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification

Title Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification
Authors Juan Manuel Pérez, Franco M. Luque
Abstract In this article, we describe our participation in HatEval, a shared task aimed at the detection of hate speech against immigrants and women. We focused on Spanish subtasks, building from our previous experiences on sentiment analysis in this language. We trained linear classifiers and Recurrent Neural Networks, using classic features, such as bag-of-words, bag-of-characters, and word embeddings, and also with recent techniques such as contextualized word representations. In particular, we trained robust task-oriented subword-aware embeddings and computed tweet representations using a weighted-averaging strategy. In the final evaluation, our systems showed competitive results for both Spanish subtasks ES-A and ES-B, achieving the first and fourth places respectively.
Tasks Sentiment Analysis, Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2008/
PDF https://www.aclweb.org/anthology/S19-2008
PWC https://paperswithcode.com/paper/atalaya-at-semeval-2019-task-5-robust
Repo
Framework
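
The tweet representations described above come from averaging subword-aware word vectors with per-token weights. The sketch below uses a SIF-style a/(a + p(w)) weighting as a stand-in; the team's exact weighting scheme and embeddings are assumptions here, and the toy vectors are placeholders.

```python
import numpy as np

def tweet_vector(tokens, word_vectors, word_freq, a=1e-3, dim=300):
    """Weighted average of word vectors, down-weighting very frequent tokens.
    tokens: list[str]; word_vectors: dict str -> np.ndarray; word_freq: dict str -> prob."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(word_vectors[tok])
            weights.append(a / (a + word_freq.get(tok, 1e-6)))
    if not vecs:
        return np.zeros(dim)
    return np.average(np.stack(vecs), axis=0, weights=weights)

# Placeholder embeddings; in practice these would be task-tuned subword-aware vectors.
wv = {"odio": np.random.randn(300), "inmigrantes": np.random.randn(300)}
freq = {"odio": 1e-4, "inmigrantes": 5e-5}
print(tweet_vector(["odio", "a", "los", "inmigrantes"], wv, freq).shape)  # (300,)
```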

An Exhaustive Analysis of Lazy vs. Eager Learning Methods for Real-Estate Property Investment

Title An Exhaustive Analysis of Lazy vs. Eager Learning Methods for Real-Estate Property Investment
Authors Setareh Rafatirad, Maryam Heidari
Abstract Accurate rent prediction in real-estate investment can help generate capital gains and guarantee financial success. In this paper, we carry out a comprehensive analysis and study of eleven machine learning algorithms for rent prediction, including Linear Regression, Multilayer Perceptron, Random Forest, KNN, ML-KNN, Locally Weighted Learning, SMO, SVM, J48, lazy Decision Tree (i.e., lazy DT), and KStar algorithms. Our contribution in this paper is twofold: (1) We present a comprehensive analysis of internal and external attributes of a real-estate housing dataset and their correlation with rental prices. (2) We use rental prediction as a platform to study and compare the performance of eager vs. lazy machine learning methods using a myriad of ML algorithms. We train our rent prediction models using a Zillow dataset of 4K real-estate properties in the US state of Virginia, covering three house types: single-family, townhouse, and condo. Each data instance in the dataset has 21 internal attributes (e.g., area, price, number of beds/baths, rent, school rating, and so forth). In addition to the Zillow data, external attributes such as walk/transit score and crime rate are collected from online data sources. A subset of the collected features, determined by the PCA technique, is selected to tune the parameters of the prediction models. We employ a hierarchical clustering approach to cluster the data based on two factors: house type and the average rent estimate of each zip code. We evaluate and compare the efficacy of the tuned prediction models based on two metrics, R-squared and Mean Absolute Error, applied to unseen data. Based on our study, lazy models like KStar lead to higher accuracy and lower prediction error compared to eager methods like J48 and LR. However, this does not necessarily hold as an overarching conclusion across all the lazy and eager methods compared in this work.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=r1ge8sCqFX
PDF https://openreview.net/pdf?id=r1ge8sCqFX
PWC https://paperswithcode.com/paper/an-exhaustive-analysis-of-lazy-vs-eager
Repo
Framework
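
The lazy-vs-eager comparison boils down to fitting both families on the same engineered features and scoring them with R-squared and MAE on held-out data. The sketch below mirrors that protocol with KNN (lazy) against a decision tree (eager); the synthetic features merely stand in for the Zillow-derived attributes, which are not reproduced here.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_absolute_error

# Synthetic stand-in for the 21 internal plus external property features.
X, y = make_regression(n_samples=4000, n_features=21, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = [("lazy: KNN", KNeighborsRegressor(n_neighbors=5)),
          ("eager: DecisionTree", DecisionTreeRegressor(max_depth=8, random_state=0))]
for name, model in models:
    pred = model.fit(X_train, y_train).predict(X_test)
    print(f"{name:22s} R2={r2_score(y_test, pred):.3f}  MAE={mean_absolute_error(y_test, pred):.1f}")
```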

Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex

Title Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex
Authors Jianghong Shi, Eric Shea-Brown, Michael Buice
Abstract Partially inspired by features of computation in visual cortex, deep neural networks compute hierarchical representations of their inputs. While these networks have been highly successful in machine learning, it is still unclear to what extent they can aid our understanding of cortical function. Several groups have developed metrics that provide a quantitative comparison between representations computed by networks and representations measured in cortex. At the same time, neuroscience is well into an unprecedented phase of large-scale data collection, as evidenced by projects such as the Allen Brain Observatory. Despite the magnitude of these efforts, in a given experiment only a fraction of units are recorded, limiting the information available about the cortical representation. Moreover, only a finite number of stimuli can be shown to an animal over the course of a realistic experiment. These limitations raise the question of how and whether metrics that compare representations of deep networks are meaningful on these data sets. Here, we empirically quantify the capabilities and limitations of these metrics due to limited image and neuron sample spaces. We find that the comparison procedure is robust to different choices of stimuli set and the level of sub-sampling that one might expect in a large scale brain survey with thousands of neurons. Using these results, we compare the representations measured in the Allen Brain Observatory in response to natural image presentations. We show that the visual cortical areas are relatively high order representations (in that they map to deeper layers of convolutional neural networks). Furthermore, we see evidence of a broad, more parallel organization rather than a sequential hierarchy, with the primary area VisP (V1) being lower order relative to the other areas.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8813-comparison-against-task-driven-artificial-neural-networks-reveals-functional-properties-in-mouse-visual-cortex
PDF http://papers.nips.cc/paper/8813-comparison-against-task-driven-artificial-neural-networks-reveals-functional-properties-in-mouse-visual-cortex.pdf
PWC https://paperswithcode.com/paper/comparison-against-task-driven-artificial
Repo
Framework

Human Action Recognition Based on Spatial-Temporal Attention

Title Human Action Recognition Based on Spatial-Temporal Attention
Authors Wensong Chan, Zhiqiang Tian, Xuguang Lan
Abstract Many state-of-the-art methods for recognizing human actions are based on attention mechanisms, which shows the importance of attention in action recognition. With the rapid development of neural networks, human action recognition has achieved great improvement by using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this paper, we propose a model based on a spatial-temporal attention weighted LSTM. The model attends to the key regions in each video frame and also focuses on the important frames in each video sequence; thus the central challenge for our model is how to find the key regions spatially and the key frames temporally. We present a feasible architecture that solves these two problems effectively and achieves satisfactory results. Our model is trained and tested on three datasets: UCF-11, UCF-101, and HMDB51. The results demonstrate the high performance of our model in human action recognition.
Tasks Temporal Action Localization
Published 2019-05-01
URL https://openreview.net/forum?id=Byx7LjRcYm
PDF https://openreview.net/pdf?id=Byx7LjRcYm
PWC https://paperswithcode.com/paper/human-action-recognition-based-on-spatial
Repo
Framework
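
The temporal half of such a model scores every LSTM output frame, turns the scores into attention weights, and pools the sequence into a single clip descriptor before classification. A minimal sketch of that branch is below; the spatial-attention branch, the frame-feature CNN, and the paper's exact architecture are omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TemporalAttentionLSTM(nn.Module):
    def __init__(self, feat_dim: int = 2048, hidden: int = 512, num_classes: int = 101):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)            # one relevance score per frame
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(frame_features)              # (batch, num_frames, hidden)
        alpha = torch.softmax(self.score(h), dim=1)   # attention weights over time
        clip = (alpha * h).sum(dim=1)                 # attention-weighted temporal pooling
        return self.classifier(clip)

logits = TemporalAttentionLSTM()(torch.randn(4, 16, 2048))   # (4, 101) class scores
```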

Query selection methods for automated corpora construction with a use case in food-drug interactions

Title Query selection methods for automated corpora construction with a use case in food-drug interactions
Authors Georgeta Bordea, Tsanta Randriatsitohaina, Fleur Mougin, Natalia Grabar, Thierry Hamon
Abstract In this paper, we address the problem of automatically constructing a relevant corpus of scientific articles about food-drug interactions. There is a growing number of scientific publications that describe food-drug interactions but currently building a high-coverage corpus that can be used for information extraction purposes is not trivial. We investigate several methods for automating the query selection process using an expert-curated corpus of food-drug interactions. Our experiments show that index term features along with a decision tree classifier are the best approach for this task and that feature selection approaches and in particular gain ratio outperform frequency-based methods for query selection.
Tasks Feature Selection
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5013/
PDF https://www.aclweb.org/anthology/W19-5013
PWC https://paperswithcode.com/paper/query-selection-methods-for-automated-corpora
Repo
Framework
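
The best configuration reported above pairs index-term features with filter-style feature selection and a decision tree. The sketch below mirrors that pipeline in scikit-learn; since sklearn has no built-in gain ratio, chi-squared scoring stands in for it (an assumption), and the tiny labeled examples are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline

# Invented index-term strings per article; 1 = relevant to food-drug interactions.
index_terms = [
    "grapefruit juice drug interactions cytochrome",
    "food effect on drug absorption pharmacokinetics",
    "protein folding simulation methods",
    "image segmentation neural network",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("terms", CountVectorizer(binary=True)),
    ("select", SelectKBest(chi2, k=5)),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
pipeline.fit(index_terms, labels)
print(pipeline.predict(["warfarin food drug interaction case report"]))
```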

Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions

Title Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions
Authors Ali Ramezani-Kebrya, Ashish Khisti, and Ben Liang
Abstract While momentum-based methods, in conjunction with the stochastic gradient descent, are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In practice, the momentum parameter is often chosen in a heuristic fashion with little theoretical guidance. In this work, we use the framework of algorithmic stability to provide an upper-bound on the generalization error for the class of strongly convex loss functions, under mild technical assumptions. Our bound decays to zero inversely with the size of the training set, and increases as the momentum parameter is increased. We also develop an upper-bound on the expected true risk, in terms of the number of training steps, the size of the training set, and the momentum parameter.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=S1lwRjR9YX
PDF https://openreview.net/pdf?id=S1lwRjR9YX
PWC https://paperswithcode.com/paper/stability-of-stochastic-gradient-method-with
Repo
Framework
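
For reference, the update analysed above is the stochastic heavy-ball iteration (notation assumed, since the abstract does not fix it):

$$
v_{t+1} = \mu\, v_t - \eta\, \nabla_w \ell(w_t; z_{i_t}), \qquad w_{t+1} = w_t + v_{t+1},
$$

where $\mu$ is the momentum parameter, $\eta$ the step size, and $z_{i_t}$ the training example drawn at step $t$. According to the abstract, the resulting stability bound on the generalization error decays on the order of $1/n$ in the training-set size $n$ and grows as $\mu$ increases; the precise constants and assumptions are given in the paper.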