January 27, 2020


Paper Group ANR 1277

Data Smashing 2.0: Sequence Likelihood (SL) Divergence For Fast Time Series Comparison. Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings. Relaxed Parameter Sharing: Effectively Modeling Time-Varying Relationships in Clinical Time-Series. Multi-Task Bidirectional Transformer Representations for Irony De …

Data Smashing 2.0: Sequence Likelihood (SL) Divergence For Fast Time Series Comparison


Title	Data Smashing 2.0: Sequence Likelihood (SL) Divergence For Fast Time Series Comparison
Authors	Yi Huang, Ishanu Chattopadhyay
Abstract	Recognizing subtle historical patterns is central to modeling and forecasting problems in time series analysis. Here we introduce and develop a new approach to quantify deviations in the underlying hidden generators of observed data streams, resulting in a new efficiently computable universal metric for time series. The proposed metric is universal in the sense that we can compare and contrast data streams regardless of where and how they are generated and without any feature engineering step. The approach proposed in this paper is conceptually distinct from our previous work on data smashing, and vastly improves discrimination performance and computing speed. The core idea here is the generalization of the notion of KL divergence often used to compare probability distributions to a notion of divergence in time series. We call this the sequence likelihood (SL) divergence, which may be used to measure deviations within a well-defined class of discrete-valued stochastic processes. We devise efficient estimators of SL divergence from finite sample paths and subsequently formulate a universal metric useful for computing distance between time series produced by hidden stochastic generators.
Tasks	Feature Engineering, Time Series, Time Series Analysis
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12243v2
PDF	https://arxiv.org/pdf/1909.12243v2.pdf
PWC	https://paperswithcode.com/paper/data-smashing-20-sequence-likelihood-sl
Repo
Framework
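
The abstract above describes SL divergence as a generalization of KL divergence. As a reference point only, the classical KL divergence between two discrete distributions can be sketched as below. This is not the SL divergence estimator from the paper; the function names `empirical_distribution` and `kl_divergence` and the toy symbol streams are illustrative assumptions.

```python
import math
from collections import Counter


def empirical_distribution(symbols, alphabet):
    """Relative frequency of each alphabet symbol in a finite sample."""
    counts = Counter(symbols)
    total = len(symbols)
    return [counts[a] / total for a in alphabet]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy streams over a binary alphabet (hypothetical data).
p = empirical_distribution("aabab", "ab")  # frequencies 3/5 and 2/5
q = empirical_distribution("abbbb", "ab")  # frequencies 1/5 and 4/5
d = kl_divergence(p, q)
```

Note that this compares only symbol frequencies and ignores temporal order, which is the limitation a divergence defined on stochastic processes is meant to address.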

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings


Title	Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Authors	Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Abstract	Since Bahdanau et al. [1] first introduced attention for neural machine translation, most sequence-to-sequence models made use of attention mechanisms [2, 3, 4]. While they produce soft-alignment matrices that could be interpreted as alignment between target and source languages, we lack metrics to quantify their quality, and it remains unclear which approach produces the best alignments. This paper presents an empirical evaluation of 3 main sequence-to-sequence models (CNN, RNN and Transformer-based) for word discovery from unsegmented phoneme sequences. This task consists in aligning word sequences in a source language with phoneme sequences in a target language, inferring from it word segmentation on the target side [5]. Evaluating word segmentation quality can be seen as an extrinsic evaluation of the soft-alignment matrices produced during training. Our experiments in a low-resource scenario on Mboshi and English languages (both aligned to French) show that RNNs surprisingly outperform CNNs and Transformer for this task. Our results are confirmed by an intrinsic evaluation of alignment quality through the use of Average Normalized Entropy (ANE). Lastly, we improve our best word discovery model by using an alignment entropy confidence measure that accumulates ANE over all the occurrences of a given alignment pair in the collection.
Tasks	Machine Translation
Published	2019-06-29
URL	https://arxiv.org/abs/1907.00184v2
PDF	https://arxiv.org/pdf/1907.00184v2.pdf
PWC	https://paperswithcode.com/paper/empirical-evaluation-of-sequence-to-sequence
Repo
Framework
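
The abstract above uses Average Normalized Entropy (ANE) as an intrinsic measure of soft-alignment quality. A minimal sketch follows, assuming each row of the alignment matrix is a probability distribution over source tokens, that a row's entropy is normalized by the log of the row length, and that ANE is the mean over rows. The abstract does not spell out the definition, so these choices and the function names are assumptions.

```python
import math


def normalized_entropy(row):
    """Entropy of one attention row divided by its maximum, log(len(row))."""
    n = len(row)
    if n < 2:
        return 0.0  # a single-source row carries no uncertainty
    h = -sum(p * math.log(p) for p in row if p > 0)
    return h / math.log(n)


def average_normalized_entropy(matrix):
    """Mean normalized entropy over all rows of a soft-alignment matrix."""
    return sum(normalized_entropy(r) for r in matrix) / len(matrix)


# Hypothetical alignments: one-hot rows versus uniform rows.
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
diffuse = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
```

Under this definition, lower values indicate more peaked alignments: the one-hot matrix scores 0 and the uniform matrix scores 1.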


Relaxed Parameter Sharing: Effectively Modeling Time-Varying Relationships in Clinical Time-Series


Title	Relaxed Parameter Sharing: Effectively Modeling Time-Varying Relationships in Clinical Time-Series
Authors	Jeeheh Oh, Jiaxuan Wang, Shengpu Tang, Michael Sjoding, Jenna Wiens
Abstract	Recurrent neural networks (RNNs) are commonly applied to clinical time-series data with the goal of learning patient risk stratification models. Their effectiveness is due, in part, to their use of parameter sharing over time (i.e., cells are repeated hence the name recurrent). We hypothesize, however, that this trait also contributes to the increased difficulty such models have with learning relationships that change over time. Conditional shift, i.e., changes in the relationship between the input X and the output y, arises when risk factors associated with the event of interest change over the course of a patient admission. While in theory, RNNs and gated RNNs (e.g., LSTMs) in particular should be capable of learning time-varying relationships, when training data are limited, such models often fail to accurately capture these dynamics. We illustrate the advantages and disadvantages of complete parameter sharing (RNNs) by comparing an LSTM with shared parameters to a sequential architecture with time-varying parameters on prediction tasks involving three clinically-relevant outcomes: acute respiratory failure (ARF), shock, and in-hospital mortality. In experiments using synthetic data, we demonstrate how parameter sharing in LSTMs leads to worse performance in the presence of conditional shift. To improve upon the dichotomy between complete parameter sharing and no parameter sharing, we propose a novel RNN formulation based on a mixture model in which we relax parameter sharing over time. The proposed method outperforms standard LSTMs and other state-of-the-art baselines across all tasks. In settings with limited data, relaxed parameter sharing can lead to improved patient risk stratification performance.
Tasks	Time Series
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02898v2
PDF	https://arxiv.org/pdf/1906.02898v2.pdf
PWC	https://paperswithcode.com/paper/relaxed-weight-sharing-effectively-modeling
Repo
Framework
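
The abstract above proposes relaxing parameter sharing through a mixture model. One way to picture this, as a sketch under assumptions rather than the paper's formulation, is to compute the weights used at each time step as a convex combination of a small number of parameter sets. The names `mix_parameters`, `W`, and `lam`, and the toy values, are hypothetical.

```python
def mix_parameters(param_sets, mixing_weights):
    """Return sum_k mixing_weights[k] * param_sets[k] for same-shaped matrices."""
    rows, cols = len(param_sets[0]), len(param_sets[0][0])
    return [
        [
            sum(w * m[i][j] for w, m in zip(mixing_weights, param_sets))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Two hypothetical 2x2 parameter sets.
W = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
# Mixing coefficients per time step; each row sums to 1.
lam = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Effective parameters at time steps 0, 1, 2.
W_t = [mix_parameters(W, l) for l in lam]
```

In this picture, keeping `lam` constant over time recovers complete sharing, while one-hot rows with as many parameter sets as time steps recovers no sharing; intermediate choices sit between the two extremes the abstract contrasts.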

Multi-Task Bidirectional Transformer Representations for Irony Detection


Title	Multi-Task Bidirectional Transformer Representations for Irony Detection
Authors	Chiyu Zhang, Muhammad Abdul-Mageed
Abstract	Supervised deep learning requires large amounts of training data. In the context of the FIRE2019 Arabic irony detection shared task (IDAT@FIRE2019), we show how we mitigate this need by fine-tuning the pre-trained bidirectional encoders from transformers (BERT) on gold data in a multi-task setting. We further improve our models by further pre-training BERT on 'in-domain' data, thus alleviating an issue of dialect mismatch in the Google-released BERT model. Our best model achieves a macro F1 score of 82.4, and has the unique advantage of being feature-engineering free (i.e., based exclusively on deep learning).
Tasks	Feature Engineering
Published	2019-09-08
URL	https://arxiv.org/abs/1909.03526v3
PDF	https://arxiv.org/pdf/1909.03526v3.pdf
PWC	https://paperswithcode.com/paper/multi-task-bidirectional-transformer
Repo
Framework

Natural Alpha Embeddings


Title	Natural Alpha Embeddings
Authors	Riccardo Volpi, Luigi Malagò
Abstract	Learning an embedding for a large collection of items is a popular approach to overcome the computational limitations associated to one-hot encodings. The aim of item embedding is to learn a low dimensional space for the representations, able to capture with its geometry relevant features or relationships for the data at hand. This can be achieved for example by exploiting adjacencies among items in large sets of unlabelled data. In this paper we interpret in an Information Geometric framework the item embeddings obtained from conditional models. By exploiting the $\alpha$-geometry of the exponential family, first introduced by Amari, we introduce a family of natural $\alpha$-embeddings represented by vectors in the tangent space of the probability simplex, which includes as a special case standard approaches available in the literature. A typical example is given by word embeddings, commonly used in natural language processing, such as Word2Vec and GloVe. In our analysis, we show how the $\alpha$-deformation parameter can impact on standard evaluation tasks.
Tasks	Word Embeddings
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02280v2
PDF	https://arxiv.org/pdf/1912.02280v2.pdf
PWC	https://paperswithcode.com/paper/natural-alpha-embeddings
Repo
Framework

Importance Weighted Adversarial Variational Autoencoders for Spike Inference from Calcium Imaging Data


Title	Importance Weighted Adversarial Variational Autoencoders for Spike Inference from Calcium Imaging Data
Authors	Daniel Jiwoong Im, Sridhama Prakhya, Jinyao Yan, Srinivas Turaga, Kristin Branson
Abstract	The Importance Weighted Auto Encoder (IWAE) objective has been shown to improve the training of generative models over the standard Variational Auto Encoder (VAE) objective. Here, we derive importance weighted extensions to AVB and AAE. These latent variable models use implicitly defined inference networks whose approximate posterior density $q_\phi(z \mid x)$ cannot be directly evaluated, an essential ingredient for importance weighting. We show improved training and inference in latent variable models with our adversarially trained importance weighting method, and derive new theoretical connections between adversarial generative model training criteria and marginal likelihood based methods. We apply these methods to the important problem of inferring spiking neural activity from calcium imaging data, a challenging posterior inference problem in neuroscience, and show that posterior samples from the adversarial methods outperform factorized posteriors used in VAEs.
Tasks	Latent Variable Models
Published	2019-06-07
URL	https://arxiv.org/abs/1906.03214v3
PDF	https://arxiv.org/pdf/1906.03214v3.pdf
PWC	https://paperswithcode.com/paper/importance-weighted-adversarial-variational
Repo
Framework
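
The abstract above builds on the IWAE objective, whose standard form estimates the log of the average importance weight over K posterior samples. A minimal sketch of that standard estimate follows. It assumes the log-density of the approximate posterior can be evaluated, which is exactly what the abstract says implicit inference networks lack, so this illustrates the baseline objective and not the adversarial extension. Function names and numbers are illustrative.

```python
import math


def log_mean_exp(log_weights):
    """Numerically stable log of the mean of exp(log_weights)."""
    m = max(log_weights)
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))


def iwae_bound_estimate(log_joint, log_q):
    """Single-datapoint IWAE estimate from K samples.

    log_joint[k] = log p(x, z_k), log_q[k] = log q(z_k | x).
    """
    log_w = [lp - lq for lp, lq in zip(log_joint, log_q)]
    return log_mean_exp(log_w)


# Hypothetical log-densities for K = 3 samples.
estimate = iwae_bound_estimate([-2.0, -4.0, -6.0], [-1.0, -2.0, -3.0])
```

With a single sample the estimate reduces to the usual one-sample ELBO term, and by Jensen's inequality the K-sample value is never below the plain average of the log-weights.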

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog


Title	Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Authors	Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao
Abstract	This paper presents a new model for visual dialog, Recurrent Dual Attention Network (ReDAN), using multi-step reasoning to answer a series of questions about an image. In each question-answering turn of a dialog, ReDAN infers the answer progressively through multiple reasoning steps. In each step of the reasoning process, the semantic representation of the question is updated based on the image and the previous dialog history, and the recurrently-refined representation is used for further reasoning in the subsequent step. On the VisDial v1.0 dataset, the proposed ReDAN model achieves a new state-of-the-art of 64.47% NDCG score. Visualization on the reasoning process further demonstrates that ReDAN can locate context-relevant visual and textual clues via iterative refinement, which can lead to the correct answer step-by-step.
Tasks	Question Answering, Visual Dialog
Published	2019-02-01
URL	https://arxiv.org/abs/1902.00579v2
PDF	https://arxiv.org/pdf/1902.00579v2.pdf
PWC	https://paperswithcode.com/paper/multi-step-reasoning-via-recurrent-dual
Repo
Framework

Masked Gradient-Based Causal Structure Learning


Title	Masked Gradient-Based Causal Structure Learning
Authors	Ignavier Ng, Zhuangyan Fang, Shengyu Zhu, Zhitang Chen, Jun Wang
Abstract	This paper studies the problem of learning causal structures from observational data. We reformulate the Structural Equation Model (SEM) in an augmented form with a binary graph adjacency matrix and show that, if the original SEM is identifiable, then this augmented form can be identified up to super-graphs of the true causal graph under mild conditions. Three methods are further provided to remove spurious edges to recover the true graph. We next utilize the augmented form to develop a masked structure learning method that can be efficiently trained using gradient-based optimization methods, by leveraging a smooth characterization on acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. It is found that the obtained entries are typically near zero or one, and can be easily thresholded to identify the edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method and show that the method can readily include different smooth functions to model causal relationships.
Tasks	Causal Discovery, Causal Inference
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08527v2
PDF	https://arxiv.org/pdf/1910.08527v2.pdf
PWC	https://paperswithcode.com/paper/masked-gradient-based-causal-structure
Repo
Framework
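
The abstract above relies on a smooth characterization of acyclicity without naming it. The sketch below assumes the trace-of-matrix-exponential function h(A) = tr(exp(A ∘ A)) − d introduced in NOTEARS (Zheng et al., 2018), where ∘ is the elementwise product and d the number of nodes, and approximates it with a truncated power series. The helper names and the example graphs are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def acyclicity(adj, terms=20):
    """Approximate h(A) = tr(exp(A * A)) - d via a truncated power series."""
    d = len(adj)
    sq = [[v * v for v in row] for row in adj]  # elementwise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = 0.0
    factorial = 1.0
    for k in range(terms):
        if k > 0:
            power = matmul(power, sq)  # (A * A)^k
            factorial *= k             # k!
        trace_sum += sum(power[i][i] for i in range(d)) / factorial
    return trace_sum - d


dag = [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # 0 -> 1 -> 2, 0 -> 2
cyc = [[0.0, 1.0], [1.0, 0.0]]                              # 0 <-> 1
h_dag = acyclicity(dag)
h_cyc = acyclicity(cyc)
```

The function is zero for the acyclic graph and strictly positive for the two-node cycle, which is what makes it usable as a differentiable penalty or constraint during gradient-based training.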

The Northumberland Dolphin Dataset: A Multimedia Individual Cetacean Dataset for Fine-Grained Categorisation


Title	The Northumberland Dolphin Dataset: A Multimedia Individual Cetacean Dataset for Fine-Grained Categorisation
Authors	Cameron Trotter, Georgia Atkinson, Matthew Sharpe, A. Stephen McGough, Nick Wright, Per Berggren
Abstract	Methods for cetacean research include photo-identification (photo-id) and passive acoustic monitoring (PAM) which generate thousands of images per expedition that are currently hand categorised by researchers into the individual dolphins sighted. With the vast amount of data obtained it is crucially important to develop a system that is able to categorise this quickly. The Northumberland Dolphin Dataset (NDD) is an on-going novel dataset project made up of above and below water images of, and spectrograms of whistles from, white-beaked dolphins. These are produced by photo-id and PAM data collection methods applied off the coast of Northumberland, UK. This dataset will aid in building cetacean identification models, reducing the number of human-hours required to categorise images. Example use cases and areas identified for speed up are examined.
Tasks
Published	2019-08-07
URL	https://arxiv.org/abs/1908.02669v1
PDF	https://arxiv.org/pdf/1908.02669v1.pdf
PWC	https://paperswithcode.com/paper/the-northumberland-dolphin-dataset-a
Repo
Framework

Efficient Local Causal Discovery Based on Markov Blanket


Title	Efficient Local Causal Discovery Based on Markov Blanket
Authors	Shuai Yang, Hao Wang, Xuegang Hu
Abstract	We study the problem of local causal discovery learning which identifies direct causes and effects of a target variable of interest in a causal network. The existing constraint-based local causal discovery approaches are inefficient, since these approaches do not take a triangular structure formed by a given variable and its child variables into account in learning local causal structure, and hence need to spend much time in distinguishing several direct effects. Additionally, these approaches depend on the standard MB (Markov Blanket) or PC (Parent and Children) discovery algorithms, which require many conditional independence tests to obtain the MB or PC sets. To overcome the above problems, in this paper, we propose a novel Efficient Local Causal Discovery algorithm via MB (ELCD) to identify direct causes and effects of a given variable. More specifically, we design a new algorithm for Efficient Oriented MB discovery, named EOMB. EOMB not only utilizes fewer conditional independence tests to identify MB, but also is able to identify more direct effects of a given variable with the help of triangular causal structures and determine several direct causes as much as possible. In addition, based on the proposed EOMB, ELCD is presented to learn a local causal structure around a target variable. The benefits of ELCD are that it not only can determine the direct causes and effects of a given variable accurately, but also runs faster than other local causal discovery algorithms. Experimental results on eight Bayesian networks (BNs) show that our proposed approach performs better than state-of-the-art baseline methods.
Tasks	Causal Discovery
Published	2019-10-03
URL	https://arxiv.org/abs/1910.01288v2
PDF	https://arxiv.org/pdf/1910.01288v2.pdf
PWC	https://paperswithcode.com/paper/efficient-local-causal-discovery-based-on
Repo
Framework

Ensemble learning based linear power flow


Title	Ensemble learning based linear power flow
Authors	Ren Hu, QiFeng Li
Abstract	This paper develops an ensemble learning-based linearization approach for power flow, which differs from the network-parameter based direct current (DC) power flow or other extended versions of linearization. As a novel data-driven linearization through data mining, it first applies polynomial regression (PR) as a basic learner to capture the linear relationships between the bus voltage as the independent variable and the active or reactive power as the dependent variable in rectangular coordinates. Then, gradient boosting (GB) and bagging as ensemble learning methods are introduced to combine all basic learners to boost the model performance. The fitted linear power flow model is also relaxed to compute the optimal power flow (OPF). The simulation results on standard IEEE cases indicate that (1) ensemble learning methods outperform PR, and GB works better than bagging; (2) for solving OPF, the data-driven model outperforms the DC model and the SDP relaxation in computational accuracy, and runs faster than ACOPF and SDPOPF.
Tasks
Published	2019-10-18
URL	https://arxiv.org/abs/1910.08655v1
PDF	https://arxiv.org/pdf/1910.08655v1.pdf
PWC	https://paperswithcode.com/paper/ensemble-learning-based-linear-power-flow
Repo
Framework

Towards Aggregating Weighted Feature Attributions


Title	Towards Aggregating Weighted Feature Attributions
Authors	Umang Bhatt, Pradeep Ravikumar, Jose M. F. Moura
Abstract	Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction. In this work, we discuss an algorithm, AVA: Aggregate Valuation of Antecedents, that fuses these two explanation classes to form a new approach to feature attribution that not only retrieves local explanations but also captures global patterns learned by a model. Our experimentation convincingly favors weighting and aggregating feature attributions via AVA.
Tasks
Published	2019-01-20
URL	http://arxiv.org/abs/1901.10040v1
PDF	http://arxiv.org/pdf/1901.10040v1.pdf
PWC	https://paperswithcode.com/paper/towards-aggregating-weighted-feature
Repo
Framework

Pedestrian re-identification based on Tree branch network with local and global learning


Title	Pedestrian re-identification based on Tree branch network with local and global learning
Authors	Hui Li, Meng Yang, Zhihui Lai, Weishi Zheng, Zitong Yu
Abstract	Deep part-based methods in recent literature have revealed the great potential of learning local part-level representation for pedestrian image in the task of person re-identification. However, global features that capture discriminative holistic information of human body are usually ignored or not well exploited. This motivates us to investigate joint learning global and local features from pedestrian images. Specifically, in this work, we propose a novel framework termed tree branch network (TBN) for person re-identification. Given a pedestrian image, the feature maps generated by the backbone CNN are partitioned recursively into several pieces, each of which is followed by a bottleneck structure that learns finer-grained features for each level in the hierarchical tree-like framework. In this way, representations are learned in a coarse-to-fine manner and finally assembled to produce more discriminative image descriptions. Experimental results demonstrate the effectiveness of the global and local feature learning method in the proposed TBN framework. We also show significant improvement in performance over state-of-the-art methods on three public benchmarks: Market-1501, CUHK-03 and DukeMTMC.
Tasks	Person Re-Identification
Published	2019-03-31
URL	http://arxiv.org/abs/1904.00355v1
PDF	http://arxiv.org/pdf/1904.00355v1.pdf
PWC	https://paperswithcode.com/paper/pedestrian-re-identification-based-on-tree
Repo
Framework
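
The abstract above describes recursively partitioning backbone feature maps into a tree of pieces, from the whole map down to finer parts. The sketch below shows only that partitioning step, assuming a binary split along the height axis at every level. The split factor, the axis, and the name `partition_levels` are assumptions, and the bottleneck layers applied to each piece are omitted.

```python
def partition_levels(rows, depth):
    """Split a sequence of feature-map rows into halves, level by level.

    Level 0 holds the whole map (global view); level k holds 2**k pieces.
    """
    levels = [[rows]]
    for _ in range(depth):
        next_level = []
        for piece in levels[-1]:
            mid = len(piece) // 2
            next_level.extend([piece[:mid], piece[mid:]])
        levels.append(next_level)
    return levels


# Stand-in for a feature map with 8 rows; each int represents one row.
levels = partition_levels(list(range(8)), depth=2)
```

Here `levels[0]` corresponds to the global feature and deeper levels to progressively smaller local regions, mirroring the coarse-to-fine hierarchy the abstract describes.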

Unsupervised Model Selection for Variational Disentangled Representation Learning


Title	Unsupervised Model Selection for Variational Disentangled Representation Learning
Authors	Sunny Duan, Loic Matthey, Andre Saraiva, Nicholas Watters, Christopher P. Burgess, Alexander Lerchner, Irina Higgins
Abstract	Disentangled representations have recently been shown to improve fairness, data efficiency and generalisation in simple supervised and reinforcement learning tasks. To extend the benefits of disentangled representations to more complex domains and practical applications, it is important to enable hyperparameter tuning and model selection of existing unsupervised approaches without requiring access to ground truth attribute labels, which are not available for most datasets. This paper addresses this problem by introducing a simple yet robust and reliable method for unsupervised disentangled model selection. Our approach, Unsupervised Disentanglement Ranking (UDR), leverages the recent theoretical results that explain why variational autoencoders disentangle (Rolinek et al, 2019), to quantify the quality of disentanglement by performing pairwise comparisons between trained model representations. We show that our approach performs comparably to the existing supervised alternatives across 5,400 models from six state of the art unsupervised disentangled representation learning model classes. Furthermore, we show that the ranking produced by our approach correlates well with the final task performance on two different domains.
Tasks	Model Selection, Representation Learning
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12614v4
PDF	https://arxiv.org/pdf/1905.12614v4.pdf
PWC	https://paperswithcode.com/paper/a-heuristic-for-unsupervised-model-selection
Repo
Framework

Ensemble Feature for Person Re-Identification


Title	Ensemble Feature for Person Re-Identification
Authors	Jiabao Wang, Yang Li, Zhuang Miao
Abstract	In person re-identification (re-ID), the key task is feature representation, which is used to compute distance or similarity in prediction. Person re-ID achieves great improvement when deep learning methods are introduced to tackle this problem. The features extracted by convolutional neural networks (CNN) are more effective and discriminative than the hand-crafted features. However, deep feature extracted by a single CNN network is not robust enough in testing stage. To improve the ability of feature representation, we propose a new ensemble network (EnsembleNet) by dividing a single network into multiple end-to-end branches. The ensemble feature is obtained by concatenating each of the branch features to represent a person. EnsembleNet is designed based on ResNet-50 and its backbone shares most of the parameters for saving computation and memory cost. Experimental results show that our EnsembleNet achieves the state-of-the-art performance on the public Market1501, DukeMTMC-reID and CUHK03 person re-ID benchmarks.
Tasks	Person Re-Identification
Published	2019-01-17
URL	http://arxiv.org/abs/1901.05798v1
PDF	http://arxiv.org/pdf/1901.05798v1.pdf
PWC	https://paperswithcode.com/paper/ensemble-feature-for-person-re-identification
Repo
Framework