January 27, 2020

3228 words 16 mins read

Paper Group ANR 1082

Designing Game of Theorems. Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering. TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning. Grouping Capsules Based Different Types. 3D Hand Shape and Pose from Images in the Wild. StacNAS: Towards Stable and Consistent Differentiable Neural Architec …

Designing Game of Theorems

Title Designing Game of Theorems
Authors Yutaka Nagashima
Abstract “Theorem proving is similar to the game of Go. So, we can probably improve our provers using deep learning, like DeepMind built the super-human computer Go program, AlphaGo.” Such optimism has been observed among participants of AITP2017. But is theorem proving really similar to Go? In this paper, we first identify the similarities and differences between them and then propose a system in which various provers keep competing against each other and changing themselves until they prove conjectures provided by users.
Tasks Automated Theorem Proving, Game of Go
Published 2019-06-20
URL https://arxiv.org/abs/1906.08549v1
PDF https://arxiv.org/pdf/1906.08549v1.pdf
PWC https://paperswithcode.com/paper/designing-game-of-theorems
Repo
Framework

Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering

Title Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering
Authors Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang
Abstract The BERT model has been successfully applied to open-domain QA tasks. However, previous work trains BERT by viewing passages corresponding to the same question as independent training instances, which may cause incomparable scores for answers from different passages. To tackle this issue, we propose a multi-passage BERT model that globally normalizes answer scores across all passages of the same question, and this change enables our QA model to find better answers by utilizing more passages. In addition, we find that splitting articles into passages of 100 words with a sliding window improves performance by 4%. By leveraging a passage ranker to select high-quality passages, multi-passage BERT gains an additional 2%. Experiments on four standard benchmarks showed that our multi-passage BERT outperforms all state-of-the-art models on all benchmarks. In particular, on the OpenSQuAD dataset, our model gains 21.4% EM and 21.5% $F_1$ over all non-BERT models, and 5.8% EM and 6.5% $F_1$ over BERT-based models.
Tasks Open-Domain Question Answering, Question Answering
Published 2019-08-22
URL https://arxiv.org/abs/1908.08167v2
PDF https://arxiv.org/pdf/1908.08167v2.pdf
PWC https://paperswithcode.com/paper/multi-passage-bert-a-globally-normalized-bert
Repo
Framework
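
To make the global normalization idea concrete, here is a minimal NumPy sketch (not the authors' code) contrasting per-passage softmax normalization with a single softmax over the candidate spans of all passages for one question; the logits below are illustrative placeholders.

```python
import numpy as np

def per_passage_probs(logits_per_passage):
    """Normalize answer-span logits independently within each passage
    (the baseline setup the paper argues against)."""
    return [np.exp(l - l.max()) / np.exp(l - l.max()).sum() for l in logits_per_passage]

def global_probs(logits_per_passage):
    """Globally normalize span logits across all passages of one question,
    so scores from different passages are directly comparable."""
    flat = np.concatenate(logits_per_passage)
    flat = np.exp(flat - flat.max())            # numerically stable softmax
    probs = flat / flat.sum()
    sizes = np.cumsum([len(l) for l in logits_per_passage])[:-1]
    return np.split(probs, sizes)               # split back into passages

# toy example: two passages, each with three candidate spans
logits = [np.array([2.0, 0.5, 0.1]), np.array([5.0, 1.0, 0.2])]
print(per_passage_probs(logits))   # each passage sums to 1 on its own
print(global_probs(logits))        # one distribution over all six spans
```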

TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning

Title TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning
Authors Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, Shanqing Cai
Abstract TensorFlow Eager is a multi-stage, Python-embedded domain-specific language for hardware-accelerated machine learning, suitable for both interactive research and production. TensorFlow, which TensorFlow Eager extends, requires users to represent computations as dataflow graphs; this permits compiler optimizations and simplifies deployment but hinders rapid prototyping and run-time dynamism. TensorFlow Eager eliminates these usability costs without sacrificing the benefits furnished by graphs: It provides an imperative front-end to TensorFlow that executes operations immediately and a JIT tracer that translates Python functions composed of TensorFlow operations into executable dataflow graphs. TensorFlow Eager thus offers a multi-stage programming model that makes it easy to interpolate between imperative and staged execution in a single package.
Tasks
Published 2019-02-27
URL http://arxiv.org/abs/1903.01855v1
PDF http://arxiv.org/pdf/1903.01855v1.pdf
PWC https://paperswithcode.com/paper/tensorflow-eager-a-multi-stage-python
Repo
Framework
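
A minimal sketch of the two execution modes the paper describes, using the public TensorFlow 2.x API, where eager execution is the default and `tf.function` plays the role of the JIT tracer that stages Python functions into dataflow graphs.

```python
import tensorflow as tf  # TF 2.x, where eager execution is on by default

# Imperative (eager) execution: ops run immediately, like NumPy.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(tf.matmul(x, x))          # the result is available right away

# Staged execution: tf.function traces the Python function into a
# dataflow graph on first call, then reuses the graph on later calls.
@tf.function
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

w = tf.random.normal([2, 3])
b = tf.zeros([3])
print(dense_relu(x, w, b))      # runs the traced graph
```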

Grouping Capsules Based Different Types

Title Grouping Capsules Based Different Types
Authors Qiang Ren
Abstract The capsule network was introduced as a new neural network architecture that encodes features as capsules to overcome the lack of equivariance in convolutional neural networks. It uses a dynamic routing algorithm to train the parameters of different capsule layers, but the dynamic routing algorithm still needs improvement. In this paper, we propose a novel capsule network architecture and discuss the effect of the initialization of the coupling coefficients $c_{ij}$ on the model. First, we analyze the rate of change from the initial value of $c_{ij}$ as the dynamic routing algorithm iterates: the larger the initial value of $c_{ij}$, the better the model performs. We then propose an improvement, training different types of capsules by grouping capsules according to their type, which adjusts the initial value of $c_{ij}$ to a more suitable level. We evaluated our improvements on several computer vision datasets and achieved better results than the original capsule network.
Tasks
Published 2019-11-12
URL https://arxiv.org/abs/1911.04820v1
PDF https://arxiv.org/pdf/1911.04820v1.pdf
PWC https://paperswithcode.com/paper/grouping-capsules-based-different-types
Repo
Framework
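
For context, here is a minimal NumPy sketch of the standard dynamic-routing loop (Sabour et al.), where the coupling coefficients $c_{ij}$ are a softmax over routing logits. With zero-initialized logits, each $c_{ij}$ starts at 1/n_out, so routing within smaller groups starts from a larger initial value, which is the effect the paper investigates. This is the generic routing procedure, not the paper's grouped architecture.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Non-linearity that keeps capsule vector lengths in [0, 1)."""
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat: predictions from lower capsules, shape (n_in, n_out, dim)."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                           # routing logits b_ij
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # c_ij, starts at 1/n_out
        s = (c[..., None] * u_hat).sum(axis=0)            # weighted vote sum
        v = squash(s)                                     # upper-capsule outputs
        b = b + np.einsum('iod,od->io', u_hat, v)         # agreement update
    return v, c

# routing over all 8 upper capsules at once vs. within a group of 4:
u_hat = np.random.randn(6, 8, 16)
_, c_all = dynamic_routing(u_hat)               # initial c_ij = 1/8
_, c_grp = dynamic_routing(u_hat[:, :4, :])     # per group: initial c_ij = 1/4
print(c_all.shape, c_grp.shape)
```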

3D Hand Shape and Pose from Images in the Wild

Title 3D Hand Shape and Pose from Images in the Wild
Authors Adnane Boukhayma, Rodrigo de Bem, Philip H. S. Torr
Abstract We present in this work the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild. Our network consists of the concatenation of a deep convolutional encoder, and a fixed model-based decoder. Given an input image, and optionally 2D joint detections obtained from an independent CNN, the encoder predicts a set of hand and view parameters. The decoder has two components: A pre-computed articulated mesh deformation hand model that generates a 3D mesh from the hand parameters, and a re-projection module controlled by the view parameters that projects the generated hand into the image domain. We show that using the shape and pose prior knowledge encoded in the hand model within a deep learning framework yields state-of-the-art performance in 3D pose prediction from images on standard benchmarks, and produces geometrically valid and plausible 3D reconstructions. Additionally, we show that training with weak supervision in the form of 2D joint annotations on datasets of images in the wild, in conjunction with full supervision in the form of 3D joint annotations on limited available datasets allows for good generalization to 3D shape and pose predictions on images in the wild.
Tasks Pose Prediction
Published 2019-02-09
URL http://arxiv.org/abs/1902.03451v1
PDF http://arxiv.org/pdf/1902.03451v1.pdf
PWC https://paperswithcode.com/paper/3d-hand-shape-and-pose-from-images-in-the
Repo
Framework
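
A minimal sketch of the decoder's re-projection step under a weak-perspective camera assumption: the 3D joints produced by the hand model are rotated, scaled, and translated into the image plane so they can be compared with 2D joint annotations. The parameterization below is illustrative, not the paper's exact module.

```python
import numpy as np

def weak_perspective_project(joints_3d, rotvec, scale, trans2d):
    """Project 3D joints (N, 3) into the image plane: rotate (axis-angle
    rotvec via Rodrigues), keep x-y, scale, and translate by trans2d."""
    theta = np.linalg.norm(rotvec) + 1e-8
    k = rotvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    rotated = joints_3d @ R.T
    return scale * rotated[:, :2] + trans2d      # (N, 2) image coordinates

# toy hand: 21 random 3D joints, a mild camera rotation
joints_3d = np.random.randn(21, 3) * 0.05
joints_2d = weak_perspective_project(joints_3d,
                                     rotvec=np.array([0.0, 0.1, 0.0]),
                                     scale=800.0,
                                     trans2d=np.array([128.0, 128.0]))
print(joints_2d.shape)   # (21, 2), comparable to 2D joint annotations
```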

StacNAS: Towards Stable and Consistent Differentiable Neural Architecture Search

Title StacNAS: Towards Stable and Consistent Differentiable Neural Architecture Search
Authors Guilin Li, Xing Zhang, Zitong Wang, Zhenguo Li, Tong Zhang
Abstract Differentiable Neural Architecture Search algorithms such as DARTS have attracted much attention due to the low search cost and competitive accuracy. However, it has been observed that DARTS can be unstable, especially when applied to new problems. One cause of the instability is the difficulty of two-level optimization. In addition, we identify two other causes: (1) Multicollinearity of correlated/similar operations leads to unpredictable change of the architecture parameters during search; (2) The optimization complexity gap between the proxy search stage and the final training leads to suboptimal architectures. Based on these findings, we propose a two-stage grouped variable pruning algorithm using one-level optimization. In the first stage, the best group is activated, and in the second stage, the best operation in the activated group is selected. Extensive experiments verify the superiority of the proposed method both for accuracy and for stability. For the DARTS search space, the proposed strategy obtains state-of-the-art accuracies on CIFAR-10, CIFAR-100 and ImageNet. Code is available at https://github.com/susan0199/stacnas.
Tasks Neural Architecture Search
Published 2019-09-26
URL https://arxiv.org/abs/1909.11926v4
PDF https://arxiv.org/pdf/1909.11926v4.pdf
PWC https://paperswithcode.com/paper/stacnas-towards-stable-and-consistent
Repo
Framework
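
A toy illustration of the two-stage selection logic: operations are grouped, the group with the largest total architecture weight is activated, and the strongest operation inside that group is kept. The grouping and weights below are placeholders, and the sketch compresses the paper's two search stages into a single selection step.

```python
import numpy as np

# Illustrative operation groups (an assumption for this toy,
# not the paper's exact partition of the DARTS search space).
groups = {
    "conv": ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3"],
    "pool": ["max_pool_3x3", "avg_pool_3x3"],
    "skip": ["skip_connect", "none"],
}

def two_stage_select(alpha):
    """alpha: dict op_name -> architecture weight learned on one edge.
    Stage 1 activates the group with the largest total weight; stage 2
    keeps the strongest operation inside that group."""
    group_score = {g: sum(alpha[op] for op in ops) for g, ops in groups.items()}
    best_group = max(group_score, key=group_score.get)
    best_op = max(groups[best_group], key=lambda op: alpha[op])
    return best_group, best_op

all_ops = sum(groups.values(), [])
alpha = dict(zip(all_ops, np.random.dirichlet(np.ones(len(all_ops)))))
print(two_stage_select(alpha))
```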

Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference

Title Several Experiments on Investigating Pretraining and Knowledge-Enhanced Models for Natural Language Inference
Authors Tianda Li, Xiaodan Zhu, Quan Liu, Qian Chen, Zhigang Chen, Si Wei
Abstract Natural language inference (NLI) is among the most challenging tasks in natural language understanding. Recent work on unsupervised pretraining that leverages unsupervised signals such as language-model and sentence-prediction objectives has been shown to be very effective on a wide range of NLP problems. It would still be desirable to further understand how it helps NLI; e.g., whether it learns artifacts in data annotation or instead learns true inference knowledge. In addition, external knowledge that does not exist in the limited amount of NLI training data may be added to NLI models in two typical ways, e.g., from human-created resources or from an unsupervised pretraining paradigm. We run several experiments here to investigate whether they help NLI in the same way and, if not, how.
Tasks Language Modelling, Natural Language Inference
Published 2019-04-27
URL http://arxiv.org/abs/1904.12104v1
PDF http://arxiv.org/pdf/1904.12104v1.pdf
PWC https://paperswithcode.com/paper/several-experiments-on-investigating
Repo
Framework

Deep Connectomics Networks: Neural Network Architectures Inspired by Neuronal Networks

Title Deep Connectomics Networks: Neural Network Architectures Inspired by Neuronal Networks
Authors Nicholas Roberts, Dian Ang Yap, Vinay Uday Prabhu
Abstract The interplay between inter-neuronal network topology and cognition has been studied deeply by connectomics researchers and network scientists, which is crucial towards understanding the remarkable efficacy of biological neural networks. Curiously, the deep learning revolution that revived neural networks has not paid much attention to topological aspects. The architectures of deep neural networks (DNNs) do not resemble their biological counterparts in the topological sense. We bridge this gap by presenting initial results of Deep Connectomics Networks (DCNs) as DNNs with topologies inspired by real-world neuronal networks. We show high classification accuracy obtained by DCNs whose architecture was inspired by the biological neuronal networks of C. Elegans and the mouse visual cortex.
Tasks
Published 2019-12-19
URL https://arxiv.org/abs/1912.08986v1
PDF https://arxiv.org/pdf/1912.08986v1.pdf
PWC https://paperswithcode.com/paper/deep-connectomics-networks-neural-network
Repo
Framework
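
One simple way to impose a wiring diagram on a layer, offered as a hedged toy rather than the paper's DCN architecture: a binary adjacency matrix (random here, standing in for a connectome) masks the weight matrix so that only connections present in the graph carry signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "connectome": binary adjacency over 32 pre- and 16 post-neurons.
adjacency = (rng.random((16, 32)) < 0.2).astype(np.float32)

# Dense weights masked by the adjacency, so absent edges carry no signal.
weights = rng.normal(size=(16, 32)).astype(np.float32) * adjacency
bias = np.zeros(16, dtype=np.float32)

def masked_layer(x):
    """Forward pass of a layer whose connectivity follows the graph."""
    return np.maximum(weights @ x + bias, 0.0)   # ReLU

x = rng.normal(size=32).astype(np.float32)
print(masked_layer(x).shape, "active edges:", int(adjacency.sum()))
```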

Understanding Dataset Design Choices for Multi-hop Reasoning

Title Understanding Dataset Design Choices for Multi-hop Reasoning
Authors Jifan Chen, Greg Durrett
Abstract Learning multi-hop reasoning has been a key challenge for reading comprehension models, leading to the design of datasets that explicitly focus on it. Ideally, a model should not be able to perform well on a multi-hop question answering task without doing multi-hop reasoning. In this paper, we investigate two recently proposed datasets, WikiHop and HotpotQA. First, we explore sentence-factored models for these tasks; by design, these models cannot do multi-hop reasoning, but they are still able to solve a large number of examples in both datasets. Furthermore, we find spurious correlations in the unmasked version of WikiHop, which make it easy to achieve high performance considering only the questions and answers. Finally, we investigate one key difference between these datasets, namely span-based vs. multiple-choice formulations of the QA task. Multiple-choice versions of both datasets can be easily gamed, and two models we examine only marginally exceed a baseline in this setting. Overall, while these datasets are useful testbeds, high-performing models may not be learning as much multi-hop reasoning as previously thought.
Tasks Question Answering, Reading Comprehension
Published 2019-04-27
URL http://arxiv.org/abs/1904.12106v1
PDF http://arxiv.org/pdf/1904.12106v1.pdf
PWC https://paperswithcode.com/paper/understanding-dataset-design-choices-for
Repo
Framework
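
A toy illustration of what "sentence-factored" means here: each sentence is scored against the question in isolation and the single best sentence decides the answer, so evidence from two sentences can never be combined. The word-overlap scorer below is a stand-in for the neural encoders the paper actually studies.

```python
def sentence_factored_score(question, sentence):
    """Toy per-sentence score: word overlap with the question. A real
    sentence-factored model would use a neural encoder, but the key
    property is the same: each sentence is scored in isolation."""
    q, s = set(question.lower().split()), set(sentence.lower().split())
    return len(q & s)

def answer_by_best_sentence(question, sentences):
    # The prediction depends only on the single best sentence; evidence
    # from two different sentences is never combined (no multi-hop).
    return max(sentences, key=lambda s: sentence_factored_score(question, s))

sentences = [
    "The Eiffel Tower is located in Paris.",
    "Paris is the capital of France.",
]
print(answer_by_best_sentence("In which country is the Eiffel Tower?", sentences))
```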

An online supervised learning algorithm based on triple spikes for spiking neural networks

Title An online supervised learning algorithm based on triple spikes for spiking neural networks
Authors Guojun Chen, Xianghong Lin, Guoen Wang
Abstract By using the precise time of every spike, spiking supervised learning captures complex spatial-temporal patterns better than supervised learning based only on neuronal firing rates. The purpose of spiking supervised learning after spatial-temporal encoding is to emit desired spike trains with precise times. Existing spiking supervised learning algorithms have excellent performance, but their mechanisms still have some problems, such as restrictions on neuron types and complex computation. Based on an online regulative mechanism of biological synapses, this paper proposes an online supervised learning algorithm for multiple spike trains in spiking neural networks. With a spatial-temporal transformation, the proposed algorithm performs a simple, direct regulation of synaptic weights as soon as the firing time of an output spike is obtained, and it is not restricted to particular types of spiking neuron models. The relationship among the desired output, actual output, and input spike trains is first analyzed and synthesized to select a pair-spike unit for this direct regulation, and a computational method is then constructed based on simple triple spikes using this regulation. Compared with other learning algorithms, experimental results show that the proposed algorithm has higher learning accuracy and efficiency.
Tasks
Published 2019-01-06
URL http://arxiv.org/abs/1901.01549v2
PDF http://arxiv.org/pdf/1901.01549v2.pdf
PWC https://paperswithcode.com/paper/an-online-supervised-learning-algorithm-based
Repo
Framework

The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric

Title The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric
Authors Nathan Kallus, Angela Zhou
Abstract Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We use xAUC analysis to audit predictive risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.
Tasks
Published 2019-02-15
URL https://arxiv.org/abs/1902.05826v2
PDF https://arxiv.org/pdf/1902.05826v2.pdf
PWC https://paperswithcode.com/paper/the-fairness-of-risk-scores-beyond
Repo
Framework
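
The xAUC disparity described in the abstract can be estimated directly from scores, labels, and group membership: it is the probability that a random positive from group a outranks a random negative from group b, minus the reverse. A NumPy sketch with hypothetical variable names follows.

```python
import numpy as np

def xauc(scores_pos_a, scores_neg_b):
    """Estimate P(score of a random positive from group a
    > score of a random negative from group b)."""
    diff = scores_pos_a[:, None] - scores_neg_b[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def xauc_disparity(scores, labels, groups):
    """xAUC(a, b) - xAUC(b, a) for binary groups 'a'/'b'."""
    pos_a = scores[(labels == 1) & (groups == "a")]
    neg_a = scores[(labels == 0) & (groups == "a")]
    pos_b = scores[(labels == 1) & (groups == "b")]
    neg_b = scores[(labels == 0) & (groups == "b")]
    return xauc(pos_a, neg_b) - xauc(pos_b, neg_a)

rng = np.random.default_rng(1)
n = 1000
groups = rng.choice(["a", "b"], size=n)
labels = rng.integers(0, 2, size=n)
# a risk score that is systematically inflated for group "a":
scores = labels + 0.3 * (groups == "a") + rng.normal(0, 0.5, size=n)
print(round(xauc_disparity(scores, labels, groups), 3))
```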

Question-Agnostic Attention for Visual Question Answering

Title Question-Agnostic Attention for Visual Question Answering
Authors Moshiur R Farazi, Salman H Khan, Nick Barnes
Abstract Visual Question Answering (VQA) models employ attention mechanisms to discover image locations that are most relevant for answering a specific question. For this purpose, several multimodal fusion strategies have been proposed, ranging from relatively simple operations (e.g., linear sum) to more complex ones (e.g., Block). The resulting multimodal representations define an intermediate feature space for capturing the interplay between visual and semantic features that is helpful in selectively focusing on image content. In this paper, we propose a question-agnostic attention mechanism that is complementary to the existing question-dependent attention mechanisms. Our proposed model parses object instances to obtain an ‘object map’ and applies this map to the visual features to generate Question-Agnostic Attention (QAA) features. In contrast to question-dependent attention approaches that are learned end-to-end, the proposed QAA does not involve question-specific training and can be easily included in almost any existing VQA model as a generic lightweight pre-processing step, thereby adding minimal computational overhead for training. Further, when used in complement with question-dependent attention, QAA allows the model to focus on regions containing objects that might have been overlooked by the learned attention representation. Through extensive evaluation on the VQAv1, VQAv2 and TDIUC datasets, we show that incorporating complementary QAA allows state-of-the-art VQA models to perform better, and provides a significant boost to simplistic VQA models, enabling them to perform on par with highly sophisticated fusion strategies.
Tasks Question Answering, Visual Question Answering
Published 2019-08-09
URL https://arxiv.org/abs/1908.03289v1
PDF https://arxiv.org/pdf/1908.03289v1.pdf
PWC https://paperswithcode.com/paper/question-agnostic-attention-for-visual
Repo
Framework
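
A minimal sketch of the pre-processing step as described in the abstract: an object map (e.g., produced by an instance-segmentation model) is applied to the grid of visual features before fusion with the question, so no question-specific training is involved. The shapes and names below are illustrative.

```python
import numpy as np

def question_agnostic_attention(features, object_map):
    """features: (H, W, C) visual feature grid.
    object_map: (H, W) mask in [0, 1] marking detected object instances.
    Returns object-weighted region features; no question is involved."""
    weighted = features * object_map[..., None]
    return weighted.reshape(-1, features.shape[-1])   # (H*W, C)

H, W, C = 14, 14, 512
features = np.random.randn(H, W, C)
object_map = np.zeros((H, W))
object_map[4:9, 6:12] = 1.0        # one detected object instance
qaa_features = question_agnostic_attention(features, object_map)
print(qaa_features.shape)          # (196, 512)
```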

Scalable Bayesian Non-linear Matrix Completion

Title Scalable Bayesian Non-linear Matrix Completion
Authors Xiangju Qin, Paul Blomstedt, Samuel Kaski
Abstract Matrix completion aims to predict missing elements in a partially observed data matrix which in typical applications, such as collaborative filtering, is large and extremely sparsely observed. A standard solution is matrix factorization, which predicts unobserved entries as linear combinations of latent variables. We generalize to non-linear combinations in massive-scale matrices. Bayesian approaches have been proven beneficial in linear matrix completion, but not applied in the more general non-linear case, due to limited scalability. We introduce a Bayesian non-linear matrix completion algorithm, which is based on a recent Bayesian formulation of Gaussian process latent variable models. To solve the challenges regarding scalability and computation, we propose a data-parallel distributed computational approach with a restricted communication scheme. We evaluate our method on challenging out-of-matrix prediction tasks using both simulated and real-world data.
Tasks Latent Variable Models, Matrix Completion
Published 2019-07-31
URL https://arxiv.org/abs/1908.01009v1
PDF https://arxiv.org/pdf/1908.01009v1.pdf
PWC https://paperswithcode.com/paper/scalable-bayesian-non-linear-matrix
Repo
Framework

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

Title Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition
Authors Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
Abstract This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin into noise or speech by training a deep neural network (DNN). The performance of ASR, however, is degraded in an unknown noisy environment. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper, we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is to comprehensively investigate the performance of ASR obtained by various types of spatial filters, i.e., time-invariant and time-variant versions of MVDR beamformers and those of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.
Tasks Speech Enhancement, Speech Recognition
Published 2019-03-22
URL http://arxiv.org/abs/1903.09341v2
PDF http://arxiv.org/pdf/1903.09341v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-speech-enhancement-based-on
Repo
Framework
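
The MVDR beamformer referenced in the abstract has a closed form given the speech steering vector $d$ and the noise SCM $R_n$: $w = R_n^{-1} d / (d^H R_n^{-1} d)$. Below is a NumPy sketch for a single frequency bin, with the MNMF-estimated quantities replaced by random placeholders.

```python
import numpy as np

def mvdr_weights(steering, noise_scm):
    """MVDR beamformer for one frequency bin:
    w = R_n^{-1} d / (d^H R_n^{-1} d)."""
    num = np.linalg.solve(noise_scm, steering)       # R_n^{-1} d
    return num / (steering.conj() @ num)

rng = np.random.default_rng(0)
M = 4                                               # microphones
steering = rng.normal(size=M) + 1j * rng.normal(size=M)
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
noise_scm = A @ A.conj().T + 1e-3 * np.eye(M)       # Hermitian PSD placeholder

w = mvdr_weights(steering, noise_scm)
print(np.allclose(w.conj() @ steering, 1.0))        # distortionless: w^H d = 1
```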

Can Machine Learning Identify Governing Laws For Dynamics in Complex Engineered Systems ? : A Study in Chemical Engineering

Title Can Machine Learning Identify Governing Laws For Dynamics in Complex Engineered Systems ? : A Study in Chemical Engineering
Authors Renganathan Subramanian, Shweta Singh
Abstract Machine learning has recently been used to identify the governing equations for dynamics in physical systems. The promising results from applications to systems such as fluid dynamics and chemical kinetics inspire further investigation of these methods on complex engineered systems. The dynamics of these systems play a crucial role in design and operations, so it would be advantageous to learn about the mechanisms that may be driving their complex behavior. In this work, our research question addressed the open question of the applicability and usefulness of novel machine learning approaches for identifying the governing dynamical equations of engineered systems. We focused on the distillation column, a ubiquitous unit operation in chemical engineering that exhibits complex dynamics, i.e., its dynamics are a combination of heuristics and fundamental physical laws. We tested the method of Sparse Identification of Non-Linear Dynamics (SINDy) because of its ability to produce white-box models with terms that can be used for physical interpretation of the dynamics. Time series data for the dynamics were generated from a simulation of the distillation column in ASPEN Dynamics. One promising result was the reduction of the number of equations for dynamic simulation from thousands in ASPEN to only 13, one for each state variable. Prediction accuracy was high on test data from the system within the perturbation range; however, outside the perturbation range the equations did not perform well. In terms of physical-law extraction, some terms were interpretable as related to Fick’s law of diffusion (with concentration terms) and Henry’s law (with a ratio of concentration and pressure terms). While some terms were interpretable, we conclude that more research is needed on combining engineered systems with machine learning approaches to improve understanding of the governing laws for unknown dynamics.
Tasks Time Series
Published 2019-07-18
URL https://arxiv.org/abs/1907.07755v1
PDF https://arxiv.org/pdf/1907.07755v1.pdf
PWC https://paperswithcode.com/paper/can-machine-learning-identify-governing-laws
Repo
Framework
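
A minimal sketch of SINDy's core loop, sequentially thresholded least squares over a library of candidate terms, applied to a toy damped oscillator rather than the distillation-column data used in the paper.

```python
import numpy as np

def library(X):
    """Candidate terms: [1, x, y, x^2, x*y, y^2] for a 2-state system."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def sindy(X, dXdt, threshold=0.05, iters=10):
    """Sequentially thresholded least squares (the core of SINDy)."""
    Theta = library(X)
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        Xi[np.abs(Xi) < threshold] = 0.0
        for k in range(dXdt.shape[1]):               # refit the active terms
            big = np.abs(Xi[:, k]) >= threshold
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi

# toy data from a damped oscillator: dx/dt = -0.1x + 2y, dy/dt = -2x - 0.1y
t = np.linspace(0, 10, 2000)
X = np.column_stack([np.exp(-0.1 * t) * np.cos(2 * t),
                     -np.exp(-0.1 * t) * np.sin(2 * t)])
dXdt = np.gradient(X, t, axis=0)
print(np.round(sindy(X, dXdt), 2))   # nonzero entries recover the linear terms
```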