Paper Group NANR 116
VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning. DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion. $A^*$ sampling with probability matching. Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains …
VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning
Title | VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning |
Authors | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou |
Abstract | To extract and relate visual and linguistic concepts from images and textual descriptions for text-based zero-shot learning (ZSL), we develop a variational hetero-encoder (VHE) that decodes text via a deep probabilistic topic model, the variational posterior of whose local latent variables is encoded from an image via a Weibull-distribution-based inference network. To further improve VHE and add an image generator, we propose the VHE randomized generative adversarial net (VHEGAN), which exploits the synergy between VHE and GAN through their shared latent space. After training with a hybrid stochastic-gradient MCMC/variational inference/stochastic gradient descent inference algorithm, VHEGAN can be used in a variety of settings, such as text generation/retrieval conditioned on an image, image generation/retrieval conditioned on a document/image, and generation of text-image pairs. The efficacy of VHEGAN is demonstrated quantitatively with experiments on both conventional and generalized ZSL tasks, and qualitatively on (conditional) image and/or text generation/retrieval. |
Tasks | Image Generation, Text Generation, Zero-Shot Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1eX-nA5KX |
https://openreview.net/pdf?id=S1eX-nA5KX | |
PWC | https://paperswithcode.com/paper/vhegan-variational-hetero-encoder-randomized |
Repo | |
Framework | |
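The detail that makes the VHE encoder above trainable is the reparameterization of a Weibull variational posterior, which has a simple inverse-CDF transform. The sketch below (plain NumPy, with hypothetical shape/scale values standing in for an encoder's outputs) illustrates only that transform; it is not the authors' code.

```python
import numpy as np

def sample_weibull(k, lam, rng=np.random.default_rng(0)):
    """Reparameterized draw from Weibull(k, lam): z = lam * (-log(1 - u))**(1/k), u ~ Uniform(0, 1).
    The transform is differentiable in k and lam, which is what lets a
    Weibull-based inference network be trained with pathwise gradients."""
    u = rng.uniform(size=np.shape(lam))
    return lam * (-np.log1p(-u)) ** (1.0 / k)

# Toy encoder output for one image: per-topic shape and scale (hypothetical values).
k = np.full(10, 2.0)
lam = np.linspace(0.5, 2.0, 10)
z = sample_weibull(k, lam)   # non-negative topic weights, one posterior sample
print(z.round(3))
```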
DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion
Title | DBMS-KU at SemEval-2019 Task 9: Exploring Machine Learning Approaches in Classifying Text as Suggestion or Non-Suggestion |
Authors | Tirana Fatyanosa, Al Hafiz Akbar Maulana Siagian, Masayoshi Aritsugi |
Abstract | This paper describes the participation of the DBMS-KU team in SemEval 2019 Task 9, that is, suggestion mining from online reviews and forums. To deal with this task, we explore several machine learning approaches, i.e., Random Forest (RF), Logistic Regression (LR), Multinomial Naive Bayes (MNB), Linear Support Vector Classification (LSVC), Sublinear Support Vector Classification (SSVC), Convolutional Neural Network (CNN), and Variable Length Chromosome Genetic Algorithm-Naive Bayes (VLCGA-NB). Our system obtains reasonable results, with F1-scores of 0.47 and 0.37 on the evaluation data in Subtask A and Subtask B, respectively. In particular, our results outperform the baseline in Subtask A. Interestingly, the results suggest that our system performs well in classifying the Non-suggestion class. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2208/ |
https://www.aclweb.org/anthology/S19-2208 | |
PWC | https://paperswithcode.com/paper/dbms-ku-at-semeval-2019-task-9-exploring |
Repo | |
Framework | |
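As a concrete illustration of the classical-ML side of the system described above, the sketch below builds TF-IDF features and compares several of the listed classifiers with scikit-learn. The tiny corpus is made up for the example; the real data are the SemEval-2019 Task 9 suggestion/non-suggestion labels, and the paper's SSVC and VLCGA-NB variants are not reproduced here.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Tiny hypothetical split; 1 = suggestion, 0 = non-suggestion.
X_train = ["please add dark mode", "the app crashed twice today",
           "you should support offline sync", "great update, thanks"]
y_train = [1, 0, 1, 0]
X_test = ["it would be nice to have keyboard shortcuts", "the login screen is slow"]
y_test = [1, 0]

for clf in (LogisticRegression(max_iter=1000), MultinomialNB(),
            LinearSVC(), RandomForestClassifier(n_estimators=100)):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(X_train, y_train)
    print(type(clf).__name__, f1_score(y_test, model.predict(X_test)))
```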
$A^*$ sampling with probability matching
Title | $A^*$ sampling with probability matching |
Authors | Yichi Zhou, Jun Zhu |
Abstract | Probabilistic methods often need to draw samples from a nontrivial distribution. $A^*$ sampling is an elegant algorithm built upon a top-down construction of a Gumbel process, in which a large state space is divided into subsets and at each round $A^*$ sampling selects a subset to process. However, the selection rule depends on a bound function, which can be intractable. Moreover, we show that such a selection criterion can be inefficient. This paper aims to improve $A^*$ sampling by addressing these issues. To design a suitable selection rule, we apply \emph{Probability Matching}, a widely used method for decision making, to $A^*$ sampling. We provide insights into the relationship between $A^*$ sampling and probability matching by analyzing a nontrivial special case in which the state space is partitioned into two subsets. We show that in this case probability matching is optimal within a constant gap. Furthermore, as directly applying probability matching to $A^*$ sampling is time consuming, we design an approximate version based on Monte-Carlo estimators. We also present an efficient implementation by leveraging special properties of Gumbel distributions and well-designed balanced trees. Empirical results show that our method saves a significant amount of computational resources on suboptimal regions compared with $A^*$ sampling. |
Tasks | Decision Making |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HygQro05KX |
https://openreview.net/pdf?id=HygQro05KX | |
PWC | https://paperswithcode.com/paper/a-sampling-with-probability-matching |
Repo | |
Framework | |
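The Gumbel process that $A^*$ sampling builds on generalizes the finite Gumbel-max trick, which is easy to state in a few lines. The sketch below shows only that discrete building block (exact sampling by perturb-and-argmax); the paper's contributions (the top-down construction over subsets, the bound function, and the probability-matching selection rule) are not reproduced here.

```python
import numpy as np

def gumbel_max_sample(log_probs, rng):
    """Exact sample from a discrete distribution via the Gumbel-max trick:
    argmax_i (log p_i + G_i), with G_i ~ Gumbel(0, 1), is distributed as p."""
    return int(np.argmax(log_probs + rng.gumbel(size=len(log_probs))))

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.3, 0.4])
draws = [gumbel_max_sample(np.log(p), rng) for _ in range(20000)]
print(np.bincount(draws, minlength=4) / len(draws))   # empirical frequencies, close to p
```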
Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains
Title | Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains |
Authors | Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema |
Abstract | In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. While RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter. Using ReStA, we study four recent and successful neural language models and evaluate how sensitive their internal representations are to the amount of prior context. Using RSA, we perform a systematic study of how similar the representational spaces in the first and second (or higher) layers of these models are to each other and to patterns of activation in the human brain. Our results reveal surprisingly strong differences between language models, and give insights into where the deep linguistic processing that integrates information over multiple sentences is happening in these models. The combination of ReStA and RSA on models and brains allows us to start addressing the important question of what kind of linguistic processes we can hope to observe in fMRI brain imaging data. In particular, our results suggest that the story-reading data from Wehbe et al. (2014) contain a signal of shallow linguistic processing, but show no evidence of the more interesting deep linguistic processing. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4820/ |
https://www.aclweb.org/anthology/W19-4820 | |
PWC | https://paperswithcode.com/paper/blackbox-meets-blackbox-representational-1 |
Repo | |
Framework | |
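For readers who want to see the mechanics, the snippet below is a minimal version of the standard RSA recipe the abstract builds on: compute a representational dissimilarity matrix (RDM) for each system, then correlate the RDMs. ReStA would apply the same second-order comparison between instances of one model while varying a single parameter (e.g., the amount of prior context); the toy "brain" data here are synthetic.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(acts):
    """Condensed representational dissimilarity matrix: 1 - Pearson correlation
    between the representations of every pair of stimuli (rows of `acts`)."""
    return pdist(acts, metric="correlation")

def rsa_score(acts_a, acts_b):
    """Second-order similarity: Spearman correlation between the two RDMs."""
    return spearmanr(rdm(acts_a), rdm(acts_b)).correlation

rng = np.random.default_rng(0)
model_layer = rng.normal(size=(50, 300))                  # 50 stimuli x 300 model units
fake_brain = model_layer @ rng.normal(size=(300, 80))     # toy "voxel" responses
print(rsa_score(model_layer, fake_brain))                 # high: one is a linear map of the other
```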
NETWORK COMPRESSION USING CORRELATION ANALYSIS OF LAYER RESPONSES
Title | NETWORK COMPRESSION USING CORRELATION ANALYSIS OF LAYER RESPONSES |
Authors | Xavier Suau, Luca Zappella, Nicholas Apostoloff |
Abstract | Principal Filter Analysis (PFA) is an easy-to-implement yet effective method for neural network compression. PFA exploits the intrinsic correlation between filter responses within network layers to recommend a smaller network footprint. We propose two compression algorithms: the first allows a user to specify the proportion of the original spectral energy that should be preserved in each layer after compression, while the second is a heuristic that leads to a parameter-free approach that automatically selects the compression used at each layer. Both algorithms are evaluated against several architectures and datasets, and we show considerable compression rates without compromising accuracy; e.g., for VGG-16 on CIFAR-10, CIFAR-100, and ImageNet, PFA achieves compression rates of 8x, 3x, and 1.4x with accuracy gains of 0.4%, 1.4%, and 2.4%, respectively. In our tests we also demonstrate that networks compressed with PFA achieve an accuracy that is very close to the empirical upper bound for a given compression ratio. Finally, we show how PFA is an effective tool for simultaneous compression and domain adaptation. |
Tasks | Domain Adaptation, Neural Network Compression |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rkl42iA5t7 |
https://openreview.net/pdf?id=rkl42iA5t7 | |
PWC | https://paperswithcode.com/paper/network-compression-using-correlation-1 |
Repo | |
Framework | |
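The first of the two algorithms described above (user-specified spectral energy) can be sketched in a few lines: look at the covariance of a layer's filter responses and ask how many filters are needed to retain a target fraction of its spectral energy. The code below is only that counting step on synthetic responses, not the full PFA recommendation or the parameter-free heuristic.

```python
import numpy as np

def filters_for_energy(responses, energy=0.95):
    """Smallest number of filters whose spectral energy (eigenvalues of the
    response covariance) reaches the requested proportion `energy`."""
    cov = np.cov(responses, rowvar=False)            # C x C covariance of filter responses
    eigvals = np.linalg.eigvalsh(cov)[::-1]          # eigenvalues, descending
    cum = np.cumsum(eigvals) / eigvals.sum()
    keep = int(np.searchsorted(cum, energy)) + 1
    return keep, keep / responses.shape[1]           # count and resulting keep ratio

rng = np.random.default_rng(0)
# Toy layer: 512 filters whose responses effectively live in a ~64-dim subspace.
latent = rng.normal(size=(2000, 64))
responses = latent @ rng.normal(size=(64, 512)) + 0.01 * rng.normal(size=(2000, 512))
print(filters_for_energy(responses, energy=0.95))
```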
Neural Network Cost Landscapes as Quantum States
Title | Neural Network Cost Landscapes as Quantum States |
Authors | Abdulah Fawaz, Sebastien Piat, Paul Klein, Peter Mountney, Simone Severini |
Abstract | Quantum computers promise significant advantages over classical computers for a number of different applications. We show that the complete loss-function landscape of a neural network can be represented as the quantum state output by a quantum computer. We demonstrate this explicitly for a binary neural network and, further, show how a quantum computer can train the network by manipulating this state using the well-known quantum amplitude amplification algorithm. We further show that, with minor adaptation, this method can also represent the meta-loss landscape of a number of neural network architectures simultaneously. We search this meta-loss landscape with the same method to simultaneously train and design a binary neural network. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SyxvSiCcFQ |
https://openreview.net/pdf?id=SyxvSiCcFQ | |
PWC | https://paperswithcode.com/paper/neural-network-cost-landscapes-as-quantum |
Repo | |
Framework | |
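The core representational idea (every assignment of binary weights indexed by a basis state, with an amplitude derived from its loss) can be illustrated classically on a toy network. The sketch below enumerates the full landscape of a 3-weight binary network and packs it into a normalized amplitude vector; it is a classical illustration of the representation only, with an arbitrary loss-to-amplitude mapping, and does not implement amplitude amplification or the paper's circuits.

```python
import numpy as np
from itertools import product

# Tiny binary network: 3 inputs (the last acts as a bias) -> 1 output, weights in {-1, +1}.
X = np.array([[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, 1]])
y = np.array([1, 1, -1, -1])                       # toy targets depending on the first input

def loss(w):
    return np.mean(np.sign(X @ np.array(w)) != y)  # 0/1 loss over the toy dataset

# One loss value per point of the weight landscape (2**3 = 8 configurations).
landscape = np.array([loss(w) for w in product([-1, 1], repeat=3)])

# Encode the landscape as a normalized "state": low-loss configurations get large
# amplitudes; a quantum computer could then boost them with amplitude amplification.
amplitudes = np.sqrt(1.0 - landscape)
state = amplitudes / np.linalg.norm(amplitudes)
print(np.round(landscape, 2), np.round(state, 3))
```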
Infinitely Deep Infinite-Width Networks
Title | Infinitely Deep Infinite-Width Networks |
Authors | Jovana Mitrovic, Peter Wirnsberger, Charles Blundell, Dino Sejdinovic, Yee Whye Teh |
Abstract | Infinite-width neural networks have been extensively used to study the theoretical properties underlying the extraordinary empirical success of standard, finite-width neural networks. Nevertheless, until now, infinite-width networks have been limited to at most two hidden layers. To address this shortcoming, we study the initialisation requirements of these networks and show that the main challenge for constructing them is defining the appropriate sampling distributions for the weights. Based on these observations, we propose a principled approach to weight initialisation that correctly accounts for the functional nature of the hidden layer activations and facilitates the construction of arbitrarily many infinite-width layers, thus enabling the construction of arbitrarily deep infinite-width networks. The main idea of our approach is to iteratively reparametrise the hidden-layer activations into appropriately defined reproducing kernel Hilbert spaces and use the canonical way of constructing probability distributions over these spaces for specifying the required weight distributions in a principled way. Furthermore, we examine the practical implications of this construction for standard, finite-width networks. In particular, we derive a novel weight initialisation scheme for standard, finite-width networks that takes into account the structure of the data and information about the task at hand. We demonstrate the effectiveness of this weight initialisation approach on the MNIST, CIFAR-10 and Year Prediction MSD datasets. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SkGT6sRcFX |
https://openreview.net/pdf?id=SkGT6sRcFX | |
PWC | https://paperswithcode.com/paper/infinitely-deep-infinite-width-networks |
Repo | |
Framework | |
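For context, the standard object this line of work extends is the infinite-width Gaussian-process kernel of a deep network, computed by a layer-by-layer recursion. The sketch below implements the well-known arc-cosine (ReLU) recursion for that kernel; it is not the authors' RKHS-based reparametrisation or their initialisation scheme, just the baseline construction the abstract refers to.

```python
import numpy as np

def deep_relu_nngp_kernel(x1, x2, depth=10, sigma_w2=2.0, sigma_b2=0.0):
    """Infinite-width GP kernel of a deep ReLU network after `depth` hidden layers,
    via the Cho-Saul arc-cosine recursion. With sigma_w2 = 2 the diagonal is preserved."""
    d = len(x1)
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2) / d
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2) / d
    for _ in range(depth):
        c = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
        theta = np.arccos(c)
        j = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k12 = sigma_b2 + sigma_w2 * np.sqrt(k11 * k22) * j   # E[relu(u) relu(v)] term
        k11 = sigma_b2 + sigma_w2 * k11 / 2                  # E[relu(u)^2] = k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12

x1, x2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(deep_relu_nngp_kernel(x1, x2, depth=1), deep_relu_nngp_kernel(x1, x2, depth=50))
```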
Training Variational Auto Encoders with Discrete Latent Representations using Importance Sampling
Title | Training Variational Auto Encoders with Discrete Latent Representations using Importance Sampling |
Authors | Alexander Bartler, Felix Wiewel, Bin Yang, Lukas Mauch |
Abstract | The Variational Auto Encoder (VAE) is a popular generative latent variable model that is often applied for representation learning. Standard VAEs assume continuous-valued latent variables and are trained by maximization of the evidence lower bound (ELBO). Conventional methods obtain a differentiable estimate of the ELBO with reparametrized sampling and optimize it with Stochastic Gradient Descent (SGD). However, this is not possible if we want to train VAEs with discrete-valued latent variables, since reparametrized sampling is not possible. To date, no simple solutions exist to circumvent this problem. In this paper, we propose an easy method to train VAEs with binary or categorically valued latent representations. To this end, we use a differentiable estimator of the ELBO based on importance sampling. In experiments, we verify the approach and train two different VAE architectures with Bernoulli- and categorically-distributed latent representations on two different benchmark datasets. |
Tasks | Representation Learning |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SkNSOjR9Y7 |
https://openreview.net/pdf?id=SkNSOjR9Y7 | |
PWC | https://paperswithcode.com/paper/training-variational-auto-encoders-with |
Repo | |
Framework | |
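The estimator idea described above (keep the latents discrete but make the ELBO estimate differentiable by importance-weighting samples drawn from a fixed proposal) can be sketched for Bernoulli latents as below. This is a minimal NumPy illustration of that idea under a uniform proposal, with a made-up joint density; it is not the authors' exact estimator or training loop.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elbo_importance_estimate(logits_q, log_joint, n_samples=2000, seed=0):
    """Importance-sampled ELBO for D Bernoulli latents.

    Samples z from the fixed proposal r(z) = 0.5**D and weights each term by
    q(z|x) / r(z), so the estimate depends on the encoder only through smooth
    functions of `logits_q` -- no reparameterized discrete sampling is needed."""
    rng = np.random.default_rng(seed)
    D = len(logits_q)
    q = sigmoid(logits_q)
    z = rng.integers(0, 2, size=(n_samples, D))                       # z ~ r
    log_q = (z * np.log(q) + (1 - z) * np.log(1 - q)).sum(axis=1)
    w = np.exp(log_q - D * np.log(0.5))                               # importance weights q/r
    log_p = np.array([log_joint(zi) for zi in z])
    return np.mean(w * (log_p - log_q))                               # ELBO estimate

# Hypothetical joint: p(z) uniform, p(x|z) prefers latents with many active bits.
log_joint = lambda z: 0.5 * z.sum() - len(z) * np.log(2.0)
print(elbo_importance_estimate(np.array([0.2, -0.1, 0.4]), log_joint))
```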
Deep reinforcement learning with relational inductive biases
Title | Deep reinforcement learning with relational inductive biases |
Authors | Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia |
Abstract | We introduce an approach for augmenting model-free deep reinforcement learning agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability. Our architecture encodes an image as a set of vectors and applies an iterative message-passing procedure to discover and reason about relevant entities and relations in a scene. In six of seven StarCraft II Learning Environment mini-games, our agent achieved state-of-the-art performance, and it surpassed human grandmaster level on four. In a novel navigation and planning task, our agent’s performance and learning efficiency far exceeded those of non-relational baselines, and it was able to generalize to more complex scenes than it had experienced during training. Moreover, when we examined its learned internal representations, they reflected important structure about the problem and the agent’s intentions. The main contribution of this work is to introduce techniques for representing and reasoning about states in model-free deep reinforcement learning agents via relational inductive biases. Our experiments show that this approach can offer advantages in efficiency, generalization, and interpretability, and can scale up to meet some of the most challenging test environments in modern artificial intelligence. |
Tasks | Relational Reasoning, Starcraft, Starcraft II |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=HkxaFoC9KQ |
https://openreview.net/pdf?id=HkxaFoC9KQ | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-with-relational |
Repo | |
Framework | |
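The iterative message-passing step mentioned above is, in practice, a self-attention block applied to a set of entity vectors. The sketch below shows one such round in plain NumPy with randomly initialized projections; in the agent, the entities come from CNN feature-map positions and the block is stacked and trained end-to-end, none of which is reproduced here.

```python
import numpy as np

def relational_block(entities, d_k=16, seed=0):
    """One round of self-attention message passing over N entity vectors (N x d):
    every entity attends to all entities and aggregates the attended values."""
    rng = np.random.default_rng(seed)
    n, d = entities.shape
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d_k)) for _ in range(3))
    q, k, v = entities @ Wq, entities @ Wk, entities @ Wv
    scores = q @ k.T / np.sqrt(d_k)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over entities
    return attn @ v                                  # aggregated messages, N x d_k

entities = np.random.default_rng(1).normal(size=(5, 32))   # 5 toy "entities" from an image
print(relational_block(entities).shape)                    # (5, 16)
```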
Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification
Title | Atalaya at SemEval 2019 Task 5: Robust Embeddings for Tweet Classification |
Authors | Juan Manuel Pérez, Franco M. Luque |
Abstract | In this article, we describe our participation in HatEval, a shared task aimed at the detection of hate speech against immigrants and women. We focused on the Spanish subtasks, building on our previous experience with sentiment analysis in this language. We trained linear classifiers and Recurrent Neural Networks, using classic features such as bag-of-words, bag-of-characters, and word embeddings, as well as recent techniques such as contextualized word representations. In particular, we trained robust task-oriented subword-aware embeddings and computed tweet representations using a weighted-averaging strategy. In the final evaluation, our systems showed competitive results for both Spanish subtasks ES-A and ES-B, achieving first and fourth place, respectively. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2008/ |
https://www.aclweb.org/anthology/S19-2008 | |
PWC | https://paperswithcode.com/paper/atalaya-at-semeval-2019-task-5-robust |
Repo | |
Framework | |
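The weighted-averaging strategy for tweet representations mentioned above reduces to a few lines once word vectors and per-token weights are available. The sketch below uses random vectors and made-up IDF-style weights as stand-ins for the trained subword-aware embeddings, just to show the composition step.

```python
import numpy as np

def tweet_vector(tokens, embeddings, token_weights):
    """Weighted average of word vectors as a tweet representation.
    `embeddings`: token -> vector; `token_weights`: token -> weight (e.g., IDF)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(len(next(iter(embeddings.values()))))
    weights = np.array([token_weights.get(t, 1.0) for t in tokens if t in embeddings])
    return (np.array(vecs) * weights[:, None]).sum(axis=0) / weights.sum()

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["esta", "aplicacion", "es", "horrible"]}
idf = {"horrible": 3.0, "es": 0.3}
print(tweet_vector("esta aplicacion es horrible".split(), emb, idf).shape)   # (50,)
```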
An Exhaustive Analysis of Lazy vs. Eager Learning Methods for Real-Estate Property Investment
Title | An Exhaustive Analysis of Lazy vs. Eager Learning Methods for Real-Estate Property Investment |
Authors | Setareh Rafatirad, Maryam Heidari |
Abstract | Accurate rent prediction in real-estate investment can help generate capital gains and guarantee financial success. In this paper, we carry out a comprehensive analysis and study of eleven machine learning algorithms for rent prediction, including Linear Regression, Multilayer Perceptron, Random Forest, KNN, ML-KNN, Locally Weighted Learning, SMO, SVM, J48, lazy Decision Tree (i.e., lazy DT), and KStar. Our contribution in this paper is twofold: (1) we present a comprehensive analysis of internal and external attributes of a real-estate housing dataset and their correlation with rental prices; (2) we use rent prediction as a platform to study and compare the performance of eager vs. lazy machine learning methods across a range of ML algorithms. We train our rent prediction models using a Zillow dataset of 4K real-estate properties in the US state of Virginia, covering three house types: single-family, townhouse, and condo. Each data instance in the dataset has 21 internal attributes (e.g., area, price, number of beds/baths, rent, school rating, and so forth). In addition to the Zillow data, external attributes such as walk/transit score and crime rate are collected from online data sources. A subset of the collected features, determined by PCA, is selected to tune the parameters of the prediction models. We employ a hierarchical clustering approach to cluster the data based on two factors: house type and the average rent estimate of each zip code. We evaluate and compare the efficacy of the tuned prediction models based on two metrics, R-squared and Mean Absolute Error, applied to unseen data. Based on our study, lazy models such as KStar lead to higher accuracy and lower prediction error compared to eager methods such as J48 and LR, although this does not hold as an overarching conclusion across all the lazy and eager methods compared in this work. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1ge8sCqFX |
https://openreview.net/pdf?id=r1ge8sCqFX | |
PWC | https://paperswithcode.com/paper/an-exhaustive-analysis-of-lazy-vs-eager |
Repo | |
Framework | |
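The lazy-vs-eager comparison described above can be mimicked on synthetic data with scikit-learn stand-ins: a k-nearest-neighbour regressor for the lazy family, and a decision tree and linear regression for the eager family, scored with the same two metrics. The data, features, and models below are placeholders (the paper uses a Zillow dataset and Weka implementations such as KStar and J48).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))   # stand-ins for area, beds, baths, school rating, walk score, crime
rent = 1500 + 300 * X[:, 0] + 120 * X[:, 1] - 80 * X[:, 5] + rng.normal(scale=100, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, rent, random_state=0)

models = {
    "KNN (lazy)": KNeighborsRegressor(n_neighbors=5),
    "Decision tree (eager)": DecisionTreeRegressor(random_state=0),
    "Linear regression (eager)": LinearRegression(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: R2={r2_score(y_te, pred):.3f}  MAE={mean_absolute_error(y_te, pred):.1f}")
```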
Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex
Title | Comparison Against Task Driven Artificial Neural Networks Reveals Functional Properties in Mouse Visual Cortex |
Authors | Jianghong Shi, Eric Shea-Brown, Michael Buice |
Abstract | Partially inspired by features of computation in visual cortex, deep neural networks compute hierarchical representations of their inputs. While these networks have been highly successful in machine learning, it is still unclear to what extent they can aid our understanding of cortical function. Several groups have developed metrics that provide a quantitative comparison between representations computed by networks and representations measured in cortex. At the same time, neuroscience is well into an unprecedented phase of large-scale data collection, as evidenced by projects such as the Allen Brain Observatory. Despite the magnitude of these efforts, in a given experiment only a fraction of units are recorded, limiting the information available about the cortical representation. Moreover, only a finite number of stimuli can be shown to an animal over the course of a realistic experiment. These limitations raise the question of how and whether metrics that compare representations of deep networks are meaningful on these data sets. Here, we empirically quantify the capabilities and limitations of these metrics due to limited image and neuron sample spaces. We find that the comparison procedure is robust to different choices of stimuli set and the level of sub-sampling that one might expect in a large scale brain survey with thousands of neurons. Using these results, we compare the representations measured in the Allen Brain Observatory in response to natural image presentations. We show that the visual cortical areas are relatively high order representations (in that they map to deeper layers of convolutional neural networks). Furthermore, we see evidence of a broad, more parallel organization rather than a sequential hierarchy, with the primary area VisP (V1) being lower order relative to the other areas. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8813-comparison-against-task-driven-artificial-neural-networks-reveals-functional-properties-in-mouse-visual-cortex |
http://papers.nips.cc/paper/8813-comparison-against-task-driven-artificial-neural-networks-reveals-functional-properties-in-mouse-visual-cortex.pdf | |
PWC | https://paperswithcode.com/paper/comparison-against-task-driven-artificial |
Repo | |
Framework | |
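One of the questions the abstract raises (whether representation-comparison metrics remain meaningful when only a fraction of neurons is recorded) can be probed directly by subsampling units and re-computing the score. The sketch below does this with a simple RDM-correlation metric on synthetic data; it illustrates the robustness check only, not the metrics or data used in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm_similarity(a, b):
    """Spearman correlation between the stimulus-dissimilarity structure of two
    response matrices (stimuli x units): one simple comparison metric."""
    return spearmanr(pdist(a, "correlation"), pdist(b, "correlation")).correlation

rng = np.random.default_rng(0)
model_layer = rng.normal(size=(100, 256))                               # 100 stimuli x 256 units
cortex = model_layer @ rng.normal(size=(256, 400)) + 2.0 * rng.normal(size=(100, 400))

# How stable is the score when only a fraction of the "recorded" units is kept?
for frac in (1.0, 0.5, 0.1):
    keep = rng.choice(400, size=int(frac * 400), replace=False)
    print(frac, round(rdm_similarity(model_layer, cortex[:, keep]), 3))
```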
Human Action Recognition Based on Spatial-Temporal Attention
Title | Human Action Recognition Based on Spatial-Temporal Attention |
Authors | Wensong Chan, Zhiqiang Tian, Xuguang Lan |
Abstract | Many state-of-the-art methods for recognizing human actions are based on attention mechanisms, which shows the importance of attention for action recognition. With the rapid development of neural networks, human action recognition has been greatly improved by convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this paper, we propose a model based on a spatial-temporal attention-weighted LSTM. The model attends to the key parts of each video frame and also focuses on the important frames in each video sequence; the central question for our model is therefore how to find the key regions spatially and the key frames temporally. We present a feasible architecture that solves these two problems effectively and achieves satisfactory results. Our model is trained and tested on three datasets: UCF-11, UCF-101, and HMDB51. The results demonstrate the high performance of our model on human action recognition. |
Tasks | Temporal Action Localization |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Byx7LjRcYm |
https://openreview.net/pdf?id=Byx7LjRcYm | |
PWC | https://paperswithcode.com/paper/human-action-recognition-based-on-spatial |
Repo | |
Framework | |
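The two attention stages described above (key regions within a frame, key frames within a clip) amount to two softmax-weighted pooling steps over per-frame CNN features. The sketch below shows that pooling with hypothetical, randomly initialized score vectors; the paper's model learns these scores and couples them with an LSTM, which is omitted here.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_pool(features, w_spatial, w_temporal):
    """features: T x R x d (T frames, R regions per frame, d channels).
    Spatial attention picks key regions per frame, temporal attention picks key
    frames, yielding a single d-dimensional clip descriptor."""
    spatial_attn = softmax(features @ w_spatial, axis=1)                  # T x R
    frame_feats = (spatial_attn[..., None] * features).sum(axis=1)        # T x d
    temporal_attn = softmax(frame_feats @ w_temporal, axis=0)             # T
    return (temporal_attn[:, None] * frame_feats).sum(axis=0)             # d

rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 49, 128))     # 16 frames, 7x7 spatial grid, 128 channels
clip = spatial_temporal_pool(feats, rng.normal(size=128), rng.normal(size=128))
print(clip.shape)                          # (128,)
```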
Query selection methods for automated corpora construction with a use case in food-drug interactions
Title | Query selection methods for automated corpora construction with a use case in food-drug interactions |
Authors | Georgeta Bordea, Tsanta Randriatsitohaina, Fleur Mougin, Natalia Grabar, Thierry Hamon |
Abstract | In this paper, we address the problem of automatically constructing a relevant corpus of scientific articles about food-drug interactions. There is a growing number of scientific publications that describe food-drug interactions but currently building a high-coverage corpus that can be used for information extraction purposes is not trivial. We investigate several methods for automating the query selection process using an expert-curated corpus of food-drug interactions. Our experiments show that index term features along with a decision tree classifier are the best approach for this task and that feature selection approaches and in particular gain ratio outperform frequency-based methods for query selection. |
Tasks | Feature Selection |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5013/ |
https://www.aclweb.org/anthology/W19-5013 | |
PWC | https://paperswithcode.com/paper/query-selection-methods-for-automated-corpora |
Repo | |
Framework | |
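The best-performing configuration reported above (index-term features, feature selection, decision tree) maps naturally onto a small scikit-learn pipeline. The sketch below uses binary term features and mutual information as a stand-in for gain ratio (which scikit-learn does not provide), on made-up documents; it shows the shape of the approach, not the paper's corpus or exact measures.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

# Toy relevance-labelled abstracts; 1 = about food-drug interactions, 0 = not.
docs = [
    "grapefruit juice inhibits cytochrome enzymes and alters drug metabolism",
    "green tea polyphenols modify the pharmacokinetics of nadolol",
    "a survey of dietary habits in adolescents",
    "hospital staffing levels and patient outcomes",
]
labels = [1, 1, 0, 0]

model = make_pipeline(
    CountVectorizer(binary=True),                    # index-term-style binary features
    SelectKBest(mutual_info_classif, k=10),          # information-based term selection
    DecisionTreeClassifier(random_state=0),          # classifier reported as best in the paper
)
model.fit(docs, labels)
print(model.predict(["st john's wort interacts with many prescribed drugs"]))
```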
Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions
Title | Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions |
Authors | Ali Ramezani-Kebrya, Ashish Khisti, and Ben Liang |
Abstract | While momentum-based methods, in conjunction with stochastic gradient descent, are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In practice, the momentum parameter is often chosen in a heuristic fashion with little theoretical guidance. In this work, we use the framework of algorithmic stability to provide an upper bound on the generalization error for the class of strongly convex loss functions, under mild technical assumptions. Our bound decays to zero inversely with the size of the training set and increases as the momentum parameter is increased. We also develop an upper bound on the expected true risk, in terms of the number of training steps, the size of the training set, and the momentum parameter. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1lwRjR9YX |
https://openreview.net/pdf?id=S1lwRjR9YX | |
PWC | https://paperswithcode.com/paper/stability-of-stochastic-gradient-method-with |
Repo | |
Framework | |
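A concrete way to read the result above is in terms of the heavy-ball update and the algorithmic-stability experiment it bounds: train twice on datasets differing in a single example and measure how far the parameters drift. The sketch below does this for ridge regression (a strongly convex loss) with a standard momentum update; it is an illustrative experiment in the spirit of the analysis, not the paper's proof or exact setting.

```python
import numpy as np

def sgd_momentum(X, y, lr=0.01, mu=0.9, steps=2000, seed=0):
    """Heavy-ball SGD on a ridge-regression loss: v <- mu*v - lr*grad_i(w); w <- w + v,
    with the example index i sampled uniformly at each step."""
    rng = np.random.default_rng(seed)                  # same seed => same sampled indices
    w = np.zeros(X.shape[1])
    v = np.zeros_like(w)
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i] + 0.1 * w      # per-example gradient + L2 term
        v = mu * v - lr * grad
        w = w + v
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Stability-style check: perturb one training example and compare the learned parameters
# (the bound in the abstract shrinks with the training-set size and grows with mu).
X2, y2 = X.copy(), y.copy()
y2[0] += 5.0
for mu in (0.0, 0.5, 0.9):
    diff = np.linalg.norm(sgd_momentum(X, y, mu=mu) - sgd_momentum(X2, y2, mu=mu))
    print(mu, round(diff, 4))
```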