February 1, 2020

Paper Group AWR 149

Learning the Tangent Space of Dynamical Instabilities from Data

Title Learning the Tangent Space of Dynamical Instabilities from Data
Authors Antoine Blanchard, Themistoklis P. Sapsis
Abstract For a large class of dynamical systems, the optimally time-dependent (OTD) modes, a set of deformable orthonormal tangent vectors that track directions of instabilities along any trajectory, are known to depend “pointwise” on the state of the system on the attractor, and not on the history of the trajectory. We leverage the power of neural networks to learn this “pointwise” mapping from phase space to OTD space directly from data. The result of the learning process is a cartography of directions associated with strongest instabilities in phase space. Implications for data-driven prediction and control of dynamical instabilities are discussed.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10413v2
PDF https://arxiv.org/pdf/1907.10413v2.pdf
PWC https://paperswithcode.com/paper/machine-learning-the-tangent-space-of
Repo https://github.com/ablancha/deep-OTD
Framework none

Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation

Title Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
Authors Loïc Vial, Benjamin Lecouteux, Didier Schwab
Abstract In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduce the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our method, we present a WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperform the state of the art on all WSD evaluation tasks.
Tasks Word Sense Disambiguation
Published 2019-05-14
URL https://arxiv.org/abs/1905.05677v3
PDF https://arxiv.org/pdf/1905.05677v3.pdf
PWC https://paperswithcode.com/paper/sense-vocabulary-compression-through-the
Repo https://github.com/getalp/disambiguate
Framework pytorch

Deeply-supervised Knowledge Synergy

Title Deeply-supervised Knowledge Synergy
Authors Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao
Abstract Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, the prevailing training scheme still follows the earlier practice of adding supervision only to the last layer of the network and propagating error information up layer-by-layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model with a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS.
Tasks Image Classification
Published 2019-06-03
URL https://arxiv.org/abs/1906.00675v2
PDF https://arxiv.org/pdf/1906.00675v2.pdf
PWC https://paperswithcode.com/paper/190600675
Repo https://github.com/sundw2014/DKS
Framework pytorch
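
The pairwise knowledge matching mentioned in the abstract can be illustrated with a toy calculation. This is a minimal sketch and not the authors' implementation (the Repo link above holds that). It assumes each supervision branch emits class logits, and it uses the cross-entropy between every ordered pair of branch distributions as a stand-in for the synergy loss. The function names and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k * log(q_k), with p acting as the target."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q))

def pairwise_matching_loss(branch_logits):
    """Sum H(p_i, p_j) over all ordered pairs of distinct branches.

    Using ordered pairs means each branch serves both as a target and
    as a learner, which mirrors the "both directions" wording of the
    abstract. The exact loss used in the paper may differ.
    """
    probs = [softmax(logits) for logits in branch_logits]
    loss = 0.0
    for i, p in enumerate(probs):
        for j, q in enumerate(probs):
            if i != j:
                loss += cross_entropy(p, q)
    return loss

# Hypothetical logits from three supervision branches for one image.
branches = [[2.0, 0.5, -1.0], [1.5, 0.7, -0.5], [2.2, 0.1, -1.2]]
print(pairwise_matching_loss(branches))
```

The loss is smallest when all branches predict the same distribution, so minimizing it pulls the branch predictions toward each other.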

Using LSTMs for climate change assessment studies on droughts and floods

Title Using LSTMs for climate change assessment studies on droughts and floods
Authors Frederik Kratzert, Daniel Klotz, Johannes Brandstetter, Pieter-Jan Hoedt, Grey Nearing, Sepp Hochreiter
Abstract Climate change affects occurrences of floods and droughts worldwide. However, predicting climate impacts over individual watersheds is difficult, primarily because accurate hydrological forecasts require models that are calibrated to past data. In this work we present a large-scale LSTM-based modeling approach that – by training on large data sets – learns a diversity of hydrological behaviors. Previous work shows that this model is more accurate than current state-of-the-art models, even when the LSTM-based approach operates out-of-sample and the latter in-sample. In this work, we show how this model can assess the sensitivity of the underlying systems with regard to extreme (high and low) flows in individual watersheds over the continental US.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03941v2
PDF https://arxiv.org/pdf/1911.03941v2.pdf
PWC https://paperswithcode.com/paper/using-lstms-for-climate-change-assessment
Repo https://github.com/kratzert/neurips2019_climate_change_workshop
Framework none

Adaptive Morphological Reconstruction for Seeded Image Segmentation

Title Adaptive Morphological Reconstruction for Seeded Image Segmentation
Authors Tao Lei, Xiaohong Jia, Tongliang Liu, Shigang Liu, Hongying Meng, Asoke K. Nandi
Abstract Morphological reconstruction (MR) is often employed by seeded image segmentation algorithms such as watershed transform and power watershed as it is able to filter seeds (regional minima) to reduce over-segmentation. However, MR might mistakenly filter meaningful seeds that are required for generating accurate segmentation and it is also sensitive to the scale because a single-scale structuring element is employed. In this paper, a novel adaptive morphological reconstruction (AMR) operation is proposed that has three advantages. Firstly, AMR can adaptively filter useless seeds while preserving meaningful ones. Secondly, AMR is insensitive to the scale of structuring elements because multiscale structuring elements are employed. Finally, AMR has two attractive properties: monotonic increasingness and convergence that help seeded segmentation algorithms to achieve a hierarchical segmentation. Experiments clearly demonstrate that AMR is useful for improving algorithms of seeded image segmentation and seed-based spectral segmentation. Compared to several state-of-the-art algorithms, the proposed algorithms provide better segmentation results requiring less computing time. Source code is available at https://github.com/SUST-reynole/AMR.
Tasks Semantic Segmentation
Published 2019-04-08
URL http://arxiv.org/abs/1904.03973v1
PDF http://arxiv.org/pdf/1904.03973v1.pdf
PWC https://paperswithcode.com/paper/adaptive-morphological-reconstruction-for
Repo https://github.com/SUST-reynole/AMR
Framework none

On Model Stability as a Function of Random Seed

Title On Model Stability as a Function of Random Seed
Authors Pranava Madhyastha, Rishabh Jain
Abstract In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate model based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA) and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA) which improve the stability of models over random seeds. With our ASWA and NASWA based optimization, we are able to improve the robustness of the original model, on average reducing the standard deviation of the model’s performance by 72%.
Tasks
Published 2019-09-23
URL https://arxiv.org/abs/1909.10447v1
PDF https://arxiv.org/pdf/1909.10447v1.pdf
PWC https://paperswithcode.com/paper/190910447
Repo https://github.com/rishj97/ModelStability
Framework pytorch
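
The abstract names weight averaging without giving the update rule, so the following is only a sketch of the general idea. It assumes that "aggressive" averaging means folding the weights into a running mean after every optimizer step, which is an assumption and may not match the paper. The snapshot values and function name are hypothetical.

```python
def running_average(avg, new, n):
    """Fold the n-th weight snapshot (1-indexed) into the running mean."""
    return [a + (w - a) / n for a, w in zip(avg, new)]

# Hypothetical weight vectors captured after three successive updates.
snapshots = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

avg = [0.0] * len(snapshots[0])
for n, weights in enumerate(snapshots, start=1):
    avg = running_average(avg, weights, n)

# The averaged weights are what would be used at evaluation time.
print(avg)
```

Averaging smooths out step-to-step fluctuations in the weights, which is one plausible reason it could reduce variance across random seeds.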

Review-Driven Answer Generation for Product-Related Questions in E-Commerce

Title Review-Driven Answer Generation for Product-Related Questions in E-Commerce
Authors Shiqian Chen, Chenliang Li, Feng Ji, Wei Zhou, Haiqing Chen
Abstract Users often have many product-related questions before they make a purchase decision in E-commerce. However, it is often time-consuming to examine each user review to identify the desired information. In this paper, we propose a novel review-driven framework for answer generation for product-related questions in E-commerce, named RAGE. We develop RAGE on the basis of the multi-layer convolutional architecture to facilitate speed-up of answer generation with the parallel computation. For each question, RAGE first extracts the relevant review snippets from the reviews of the corresponding product. Then, we devise a mechanism to identify the relevant information from the noise-prone review snippets and incorporate this information to guide the answer generation. The experiments on two real-world E-Commerce datasets show that the proposed RAGE significantly outperforms the existing alternatives in producing more accurate and informative answers in natural language. Moreover, RAGE takes much less time for both model training and answer generation than the existing RNN based generation models.
Tasks
Published 2019-04-27
URL http://arxiv.org/abs/1905.01994v1
PDF http://arxiv.org/pdf/1905.01994v1.pdf
PWC https://paperswithcode.com/paper/190501994
Repo https://github.com/WHUIR/RAGE
Framework tf

Describing like humans: on diversity in image captioning

Title Describing like humans: on diversity in image captioning
Authors Qingzhong Wang, Antoni B. Chan
Abstract Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting for different humans. Therefore only evaluating accuracy is not sufficient for measuring the performance of captioning models; the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between the model and human performance in terms of both accuracy and diversity, and that the models that have optimized accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions.
Tasks Image Captioning
Published 2019-03-28
URL https://arxiv.org/abs/1903.12020v3
PDF https://arxiv.org/pdf/1903.12020v3.pdf
PWC https://paperswithcode.com/paper/describing-like-humans-on-diversity-in-image
Repo https://github.com/qingzwang/DiversityMetrics
Framework tf

Explaining Deep Classification of Time-Series Data with Learned Prototypes

Title Explaining Deep Classification of Time-Series Data with Learned Prototypes
Authors Alan H. Gee, Diego Garcia-Olano, Joydeep Ghosh, David Paydarfar
Abstract The emergence of deep learning networks raises a need for explainable AI so that users and domain experts can be confident applying them to high-risk decisions. In this paper, we leverage data from the latent space induced by deep learning models to learn stereotypical representations or “prototypes” during training to elucidate the algorithmic decision-making process. We study how leveraging prototypes affects classification decisions of two-dimensional time-series data in a few different settings: (1) electrocardiogram (ECG) waveforms to detect clinical bradycardia, a slowing of heart rate, in preterm infants, (2) respiration waveforms to detect apnea of prematurity, and (3) audio waveforms to classify spoken digits. We improve upon existing models by optimizing for increased prototype diversity and robustness, visualize how these prototypes in the latent space are used by the model to distinguish classes, and show that prototypes are capable of learning features on two-dimensional time-series data to produce explainable insights during classification tasks. We show that the prototypes are capable of learning real-world features (bradycardia in ECG, apnea in respiration, and articulation in speech) as well as features within sub-classes. Our novel work leverages a learned prototypical framework on two-dimensional time-series data to produce explainable insights during classification tasks.
Tasks Decision Making, Time Series
Published 2019-04-18
URL https://arxiv.org/abs/1904.08935v3
PDF https://arxiv.org/pdf/1904.08935v3.pdf
PWC https://paperswithcode.com/paper/explaining-deep-classification-of-time-series
Repo https://github.com/alangee/ijcai19-ts-prototypes
Framework none

The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas

Title The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas
Authors Michael P. J. Camilleri, Christopher K. I. Williams
Abstract While label fusion from multiple noisy annotations is a well understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas, without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains (compared with DS) in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864).
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01251v2
PDF https://arxiv.org/pdf/1906.01251v2.pdf
PWC https://paperswithcode.com/paper/the-extended-dawid-skene-model-fusing
Repo https://github.com/michael-camilleri/ISAR-Inter_Schema_AdapteR
Framework none
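
For readers unfamiliar with the baseline the abstract compares against, the classical Dawid-Skene model fuses noisy labels by estimating a confusion matrix per annotator with EM. The sketch below covers only that baseline, not the ISAR extension proposed in the paper. The data, the smoothing constant, and the function name are hypothetical, and every item is assumed to have at least one vote.

```python
def dawid_skene(votes, num_classes, iterations=20):
    """Classical Dawid-Skene label fusion via EM.

    votes[i][a] is the label annotator a gave item i, or None if missing.
    Returns, for each item, a posterior distribution over the true class.
    """
    n_items, n_annot = len(votes), len(votes[0])
    # Initialise posteriors from vote proportions (a soft majority vote).
    post = []
    for row in votes:
        counts = [sum(1 for v in row if v == k) for k in range(num_classes)]
        total = sum(counts)
        post.append([c / total for c in counts])
    for _ in range(iterations):
        # M-step: class priors and smoothed per-annotator confusion matrices.
        prior = [sum(p[k] for p in post) / n_items for k in range(num_classes)]
        conf = [[[1e-2] * num_classes for _ in range(num_classes)]
                for _ in range(n_annot)]
        for i, row in enumerate(votes):
            for a, v in enumerate(row):
                if v is not None:
                    for k in range(num_classes):
                        conf[a][k][v] += post[i][k]
        for a in range(n_annot):
            for k in range(num_classes):
                s = sum(conf[a][k])
                conf[a][k] = [c / s for c in conf[a][k]]
        # E-step: posterior over the true class of each item.
        for i, row in enumerate(votes):
            scores = []
            for k in range(num_classes):
                s = prior[k]
                for a, v in enumerate(row):
                    if v is not None:
                        s *= conf[a][k][v]
                scores.append(s)
            z = sum(scores)
            post[i] = [s / z for s in scores]
    return post

# Hypothetical data: annotators 0 and 1 always agree, annotator 2 is noisy.
votes = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
posteriors = dawid_skene(votes, num_classes=2)
labels = [max(range(2), key=lambda k: p[k]) for p in posteriors]
print(labels)
```

Unlike a plain majority vote, the learned confusion matrices let the model down-weight annotators whose votes carry little information about the true class.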

Neural Jump Stochastic Differential Equations

Title Neural Jump Stochastic Differential Equations
Authors Junteng Jia, Austin R. Benson
Abstract Many time series are effectively generated by a combination of deterministic continuous flows along with discrete jumps sparked by stochastic events. However, we usually do not have the equation of motion describing the flows, or how they are affected by jumps. To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump. Our approach extends the framework of Neural Ordinary Differential Equations with a stochastic process term that models discrete events. We then model temporal point processes with a piecewise-continuous latent trajectory, where the discontinuities are caused by stochastic events whose conditional intensity depends on the latent state. We demonstrate the predictive capabilities of our model on a range of synthetic and real-world marked point process datasets, including classical point processes (such as Hawkes processes), awards on Stack Overflow, medical records, and earthquake monitoring.
Tasks Point Processes, Time Series
Published 2019-05-24
URL https://arxiv.org/abs/1905.10403v3
PDF https://arxiv.org/pdf/1905.10403v3.pdf
PWC https://paperswithcode.com/paper/neural-jump-stochastic-differential-equations
Repo https://github.com/mitmath/18S096SciML
Framework none
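
The abstract cites Hawkes processes as a classical example of a point process whose conditional intensity depends on past events. As a point of reference, the sketch below evaluates the standard exponential-kernel Hawkes intensity, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). It is not the neural model from the paper. The parameter values and event times are illustrative.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of an exponential-kernel Hawkes process.

    mu    : baseline rate
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    Only events strictly before t contribute.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# Hypothetical event history.
events = [1.0, 2.5, 2.7]
for t in (0.5, 2.6, 10.0):
    print(t, hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5))
```

Each event raises the intensity by a discrete jump that then decays smoothly, which is the flow-plus-jump behaviour the abstract describes in a more general, learned form.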

Track to Reconstruct and Reconstruct to Track

Title Track to Reconstruct and Reconstruct to Track
Authors Jonathon Luiten, Tobias Fischer, Bastian Leibe
Abstract Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. Our approach first builds up short tracklets using 2D optical flow, and then fuses these into dynamic 3D object reconstructions. The precise 3D object motion of these reconstructions is used to merge tracklets through occlusion into long-term tracks, and to locate objects when detections are missing. On KITTI, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%, and outperforms all previous approaches for both bounding box and segmentation tracking.
Tasks 3D Reconstruction, Multi-Object Tracking, Object Tracking, Optical Flow Estimation
Published 2019-09-30
URL https://arxiv.org/abs/1910.00130v2
PDF https://arxiv.org/pdf/1910.00130v2.pdf
PWC https://paperswithcode.com/paper/track-to-reconstruct-and-reconstruct-to-track
Repo https://github.com/tobiasfshr/MOTSFusion
Framework tf

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning

Title MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
Authors Shaohuai Shi, Xiaowen Chu, Bo Li
Abstract Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this paper, we observe that many DNNs have a large number of layers with only a small amount of data to be communicated at each layer in distributed training, which could make WFBP inefficient. Based on the fact that merging some short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem to minimize the training time in pipelining communications and computations. We derive an optimal solution that can be solved efficiently without affecting the training performance. We then apply the solution to propose a distributed training algorithm named merged-gradient WFBP (MG-WFBP) and implement it on two platforms, Caffe and PyTorch. Extensive experiments in three GPU clusters are conducted to verify the effectiveness of MG-WFBP. We further exploit the trace-based simulation of 64 GPUs to explore the potential scaling efficiency of MG-WFBP. Experimental results show that MG-WFBP achieves much better scaling performance than existing methods.
Tasks
Published 2019-12-18
URL https://arxiv.org/abs/1912.09268v1
PDF https://arxiv.org/pdf/1912.09268v1.pdf
PWC https://paperswithcode.com/paper/mg-wfbp-merging-gradients-wisely-for
Repo https://github.com/HKBU-HPML/MG-WFBP
Framework pytorch
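
The abstract's premise, that merging short communication tasks reduces total communication time, follows from a simple latency-plus-bandwidth cost model. The sketch below assumes each message costs a fixed startup time plus a per-byte time. That linear model, the numbers, and the function names are illustrative assumptions and are not taken from the paper. The sketch also ignores the overlap with computation that the paper's optimization accounts for.

```python
def comm_time(num_bytes, startup, per_byte):
    """Time for one message under a linear latency-plus-bandwidth model."""
    return startup + per_byte * num_bytes

def separate_time(sizes, startup, per_byte):
    """Total time when every layer's gradient is sent as its own message."""
    return sum(comm_time(s, startup, per_byte) for s in sizes)

def merged_time(sizes, startup, per_byte):
    """Total time when all gradients are merged into a single message."""
    return comm_time(sum(sizes), startup, per_byte)

# Hypothetical per-layer gradient sizes in bytes and network parameters.
layer_sizes = [4_000, 8_000, 2_000, 16_000]
startup = 50e-6    # 50 microseconds of fixed cost per message
per_byte = 1e-9    # 1 nanosecond per byte

print(separate_time(layer_sizes, startup, per_byte))
print(merged_time(layer_sizes, startup, per_byte))
```

Under this model, merging n messages saves n - 1 startup costs. Merging also delays sending until the last gradient in the group is ready, which is presumably why the abstract speaks of merging only some tasks and frames the choice as an optimization problem.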

A Flexible Generative Framework for Graph-based Semi-supervised Learning

Title A Flexible Generative Framework for Graph-based Semi-supervised Learning
Authors Jiaqi Ma, Weijing Tang, Ji Zhu, Qiaozhu Mei
Abstract We consider a family of problems that are concerned with making predictions for the majority of unlabeled, graph-structured data samples based on a small proportion of labeled samples. Relational information among the data samples, often encoded in the graph/network structure, is shown to be helpful for these semi-supervised learning tasks. However, conventional graph-based regularization methods and recent graph neural networks do not fully leverage the interrelations between the features, the graph, and the labels. In this work, we propose a flexible generative framework for graph-based semi-supervised learning, which approaches the joint distribution of the node features, labels, and the graph structure. Borrowing insights from random graph models in network science literature, this joint distribution can be instantiated using various distribution families. For the inference of missing labels, we exploit recent advances in scalable variational inference techniques to approximate the Bayesian posterior. We conduct thorough experiments on benchmark datasets for graph-based semi-supervised learning. Results show that the proposed methods outperform the state-of-the-art models in most settings.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10769v2
PDF https://arxiv.org/pdf/1905.10769v2.pdf
PWC https://paperswithcode.com/paper/a-flexible-generative-framework-for-graph
Repo https://github.com/jiaqima/GenGNN
Framework pytorch

Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks

Title Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
Authors Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
Abstract Sequential labeling-based NER approaches restrict each word to belonging to at most one entity mention, which poses a serious problem when recognizing nested entity mentions. In this paper, we propose to resolve this problem by modeling and leveraging the head-driven phrase structures of entity mentions, i.e., although a mention can nest other mentions, they will not share the same head word. Specifically, we propose Anchor-Region Networks (ARNs), a sequence-to-nuggets architecture for nested mention detection. ARNs first identify anchor words (i.e., possible head words) of all mentions, and then recognize the mention boundaries for each anchor word by exploiting regular phrase structures. Furthermore, we also design Bag Loss, an objective function which can train ARNs in an end-to-end manner without using any anchor word annotation. Experiments show that ARNs achieve the state-of-the-art performance on three standard nested entity mention detection benchmarks.
Tasks Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition
Published 2019-06-10
URL https://arxiv.org/abs/1906.03783v1
PDF https://arxiv.org/pdf/1906.03783v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-nuggets-nested-entity-mention
Repo https://github.com/sanmusunrise/ARNs
Framework pytorch