Paper Group AWR 149
Learning the Tangent Space of Dynamical Instabilities from Data
Title | Learning the Tangent Space of Dynamical Instabilities from Data |
Authors | Antoine Blanchard, Themistoklis P. Sapsis |
Abstract | For a large class of dynamical systems, the optimally time-dependent (OTD) modes, a set of deformable orthonormal tangent vectors that track directions of instabilities along any trajectory, are known to depend “pointwise” on the state of the system on the attractor, and not on the history of the trajectory. We leverage the power of neural networks to learn this “pointwise” mapping from phase space to OTD space directly from data. The result of the learning process is a cartography of directions associated with strongest instabilities in phase space. Implications for data-driven prediction and control of dynamical instabilities are discussed. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10413v2 |
PDF | https://arxiv.org/pdf/1907.10413v2.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-the-tangent-space-of |
Repo | https://github.com/ablancha/deep-OTD |
Framework | none |
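
To make the idea concrete, here is a minimal sketch (not the authors' deep-OTD code) of a network that maps a phase-space state to an orthonormal set of tangent vectors; the dimensions, the QR-based orthonormalization, and the stand-in training targets are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeepOTDSketch(nn.Module):
    """Pointwise map from a state x to k orthonormal tangent vectors."""
    def __init__(self, dim=3, k=2, hidden=64):
        super().__init__()
        self.dim, self.k = dim, k
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim * k),
        )

    def forward(self, x):
        v = self.net(x).view(-1, self.dim, self.k)  # k raw tangent vectors
        q, _ = torch.linalg.qr(v)                   # enforce orthonormal columns
        return q

model = DeepOTDSketch()
x = torch.randn(8, 3)                              # states on the attractor
u_true = torch.linalg.qr(torch.randn(8, 3, 2))[0]  # stand-in OTD targets
loss = ((model(x) - u_true) ** 2).sum(dim=(1, 2)).mean()
loss.backward()
```
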
Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
Title | Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation |
Authors | Loïc Vial, Benjamin Lecouteux, Didier Schwab |
Abstract | In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduce the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our methods, we present a WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperform the state of the art on all WSD evaluation tasks. |
Tasks | Word Sense Disambiguation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05677v3 |
PDF | https://arxiv.org/pdf/1905.05677v3.pdf |
PWC | https://paperswithcode.com/paper/sense-vocabulary-compression-through-the |
Repo | https://github.com/getalp/disambiguate |
Framework | pytorch |
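
As a rough illustration of hypernymy-based sense compression (a sketch in the spirit of the paper, not its actual algorithm), the snippet below collapses WordNet senses onto ancestor synsets, shrinking the tag vocabulary a neural WSD classifier must predict; the grouping depth is an arbitrary choice:

```python
# Requires: nltk with the WordNet corpus (nltk.download("wordnet"))
from nltk.corpus import wordnet as wn

def compressed_tag(synset, depth=4):
    """Map a synset to the ancestor reached by climbing `depth` hypernym links."""
    node = synset
    for _ in range(depth):
        hypernyms = node.hypernyms()
        if not hypernyms:
            break
        node = hypernyms[0]   # follow the first hypernym path
    return node.name()

# Distinct senses that climb to the same ancestor share one compressed tag,
# so the classifier's output vocabulary shrinks:
for s in wn.synsets("dog", pos=wn.NOUN)[:3]:
    print(s.name(), "->", compressed_tag(s))
```
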
Deeply-supervised Knowledge Synergy
Title | Deeply-supervised Knowledge Synergy |
Authors | Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao |
Abstract | Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, the current prevailing training scheme follows the previous way of adding supervision to the last layer of the network only and propagating error information up layer-by-layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model with a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS. |
Tasks | Image Classification |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00675v2 |
PDF | https://arxiv.org/pdf/1906.00675v2.pdf |
PWC | https://paperswithcode.com/paper/190600675 |
Repo | https://github.com/sundw2014/DKS |
Framework | pytorch |
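
The synergy loss is the heart of DKS. A minimal sketch of dense pairwise knowledge matching between supervision branches might look as follows; the temperature and weighting are assumptions, and the official repo holds the authors' implementation:

```python
import torch
import torch.nn.functional as F

def synergy_loss(branch_logits, T=2.0):
    """branch_logits: list of [batch, classes] tensors, one per supervision branch."""
    loss = 0.0
    for i, zi in enumerate(branch_logits):
        for j, zj in enumerate(branch_logits):
            if i == j:
                continue
            log_p_i = F.log_softmax(zi / T, dim=1)   # "student" side, gets gradients
            p_j = F.softmax(zj / T, dim=1).detach()  # "teacher" side, frozen target
            loss = loss + F.kl_div(log_p_i, p_j, reduction="batchmean") * T * T
    return loss

branches = [torch.randn(4, 10, requires_grad=True) for _ in range(3)]
print(synergy_loss(branches))   # matches every branch pair in both directions
```
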
Using LSTMs for climate change assessment studies on droughts and floods
Title | Using LSTMs for climate change assessment studies on droughts and floods |
Authors | Frederik Kratzert, Daniel Klotz, Johannes Brandstetter, Pieter-Jan Hoedt, Grey Nearing, Sepp Hochreiter |
Abstract | Climate change affects occurrences of floods and droughts worldwide. However, predicting climate impacts over individual watersheds is difficult, primarily because accurate hydrological forecasts require models that are calibrated to past data. In this work we present a large-scale LSTM-based modeling approach that – by training on large data sets – learns a diversity of hydrological behaviors. Previous work shows that this model is more accurate than current state-of-the-art models, even when the LSTM-based approach operates out-of-sample and the latter in-sample. In this work, we show how this model can assess the sensitivity of the underlying systems with regard to extreme (high and low) flows in individual watersheds over the continental US. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03941v2 |
PDF | https://arxiv.org/pdf/1911.03941v2.pdf |
PWC | https://paperswithcode.com/paper/using-lstms-for-climate-change-assessment |
Repo | https://github.com/kratzert/neurips2019_climate_change_workshop |
Framework | none |
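
A toy sketch of the modeling setup described in the abstract: an LSTM maps a year of daily meteorological forcings to discharge. The input size and sequence length are illustrative; the authors' full setup (including static catchment attributes) lives in the linked repo:

```python
import torch
import torch.nn as nn

class RunoffLSTM(nn.Module):
    def __init__(self, n_forcings=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_forcings, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: [batch, days, n_forcings] -> discharge prediction for the final day
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

model = RunoffLSTM()
forcings = torch.randn(16, 365, 5)   # one year of daily forcings per basin
print(model(forcings).shape)         # torch.Size([16, 1])
```
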
Adaptive Morphological Reconstruction for Seeded Image Segmentation
Title | Adaptive Morphological Reconstruction for Seeded Image Segmentation |
Authors | Tao Lei, Xiaohong Jia, Tongliang Liu, Shigang Liu, Hongying Meng, Asoke K. Nandi |
Abstract | Morphological reconstruction (MR) is often employed by seeded image segmentation algorithms such as watershed transform and power watershed as it is able to filter seeds (regional minima) to reduce over-segmentation. However, MR might mistakenly filter meaningful seeds that are required for generating accurate segmentation and it is also sensitive to the scale because a single-scale structuring element is employed. In this paper, a novel adaptive morphological reconstruction (AMR) operation is proposed that has three advantages. Firstly, AMR can adaptively filter useless seeds while preserving meaningful ones. Secondly, AMR is insensitive to the scale of structuring elements because multiscale structuring elements are employed. Finally, AMR has two attractive properties, monotonic increasingness and convergence, which help seeded segmentation algorithms achieve a hierarchical segmentation. Experiments clearly demonstrate that AMR is useful for improving algorithms of seeded image segmentation and seed-based spectral segmentation. Compared to several state-of-the-art algorithms, the proposed algorithms provide better segmentation results while requiring less computing time. Source code is available at https://github.com/SUST-reynole/AMR. |
Tasks | Semantic Segmentation |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03973v1 |
PDF | http://arxiv.org/pdf/1904.03973v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-morphological-reconstruction-for |
Repo | https://github.com/SUST-reynole/AMR |
Framework | none |
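
To illustrate the multiscale idea (a sketch, not the paper's exact AMR operator), the snippet below combines opening-by-reconstruction at several structuring-element scales so that no single scale has to be chosen; the scale range and the pointwise-max combination are assumptions:

```python
import numpy as np
from skimage.morphology import disk, erosion, reconstruction

def adaptive_reconstruction(image, scales=range(1, 6)):
    """Combine opening-by-reconstruction across structuring-element radii."""
    result = np.zeros_like(image, dtype=float)
    for r in scales:
        seed = erosion(image, disk(r))                   # erosion guarantees seed <= mask
        rec = reconstruction(seed, image, method="dilation")
        result = np.maximum(result, rec)                 # pointwise combination over scales
    return result

img = np.random.rand(64, 64)
filtered = adaptive_reconstruction(img)   # fewer spurious regional minima as seeds
```
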
On Model Stability as a Function of Random Seed
Title | On Model Stability as a Function of Random Seed |
Authors | Pranava Madhyastha, Rishabh Jain |
Abstract | In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate model based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models, resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA) and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA) which improve the stability of models over random seeds. With our ASWA and NASWA based optimization, we are able to improve the robustness of the original model, on average reducing the standard deviation of the model’s performance by 72%. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10447v1 |
PDF | https://arxiv.org/pdf/1909.10447v1.pdf |
PWC | https://paperswithcode.com/paper/190910447 |
Repo | https://github.com/rishj97/ModelStability |
Framework | pytorch |
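
A minimal sketch of the weight-averaging idea behind ASWA (the averaging schedule and the norm filter of NASWA are omitted here): keep a running average of the parameters during training and use it as the final model:

```python
import copy
import torch

def update_average(avg_model, model, n_averaged):
    """Fold `model`'s current weights into the running average, in place."""
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg += (p - p_avg) / (n_averaged + 1)

model = torch.nn.Linear(10, 2)
avg_model = copy.deepcopy(model)
for step in range(100):
    # ... one optimizer step on `model` would happen here ...
    update_average(avg_model, model, n_averaged=step)
# `avg_model` is the stabilized model used at evaluation time.
```
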
Review-Driven Answer Generation for Product-Related Questions in E-Commerce
Title | Review-Driven Answer Generation for Product-Related Questions in E-Commerce |
Authors | Shiqian Chen, Chenliang Li, Feng Ji, Wei Zhou, Haiqing Chen |
Abstract | Users often have many product-related questions before they make a purchase decision in E-commerce. However, it is often time-consuming to examine each user review to identify the desired information. In this paper, we propose a novel review-driven framework for answer generation for product-related questions in E-commerce, named RAGE. We develop RAGE on the basis of the multi-layer convolutional architecture to speed up answer generation through parallel computation. For each question, RAGE first extracts the relevant review snippets from the reviews of the corresponding product. Then, we devise a mechanism to identify the relevant information from the noise-prone review snippets and incorporate this information to guide the answer generation. The experiments on two real-world E-Commerce datasets show that the proposed RAGE significantly outperforms the existing alternatives in producing more accurate and informative answers in natural language. Moreover, RAGE takes much less time for both model training and answer generation than the existing RNN based generation models. |
Tasks | |
Published | 2019-04-27 |
URL | http://arxiv.org/abs/1905.01994v1 |
PDF | http://arxiv.org/pdf/1905.01994v1.pdf |
PWC | https://paperswithcode.com/paper/190501994 |
Repo | https://github.com/WHUIR/RAGE |
Framework | tf |
Describing like humans: on diversity in image captioning
Title | Describing like humans: on diversity in image captioning |
Authors | Qingzhong Wang, Antoni B. Chan |
Abstract | Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting for different humans. Therefore, only evaluating accuracy is not sufficient for measuring the performance of captioning models — the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between the model and human performance in terms of both accuracy and diversity, and that models optimized for accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions. |
Tasks | Image Captioning |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12020v3 |
PDF | https://arxiv.org/pdf/1903.12020v3.pdf |
PWC | https://paperswithcode.com/paper/describing-like-humans-on-diversity-in-image |
Repo | https://github.com/qingzwang/DiversityMetrics |
Framework | tf |
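
As a rough stand-in for the proposed metric (which is kernelized with CIDEr; plain TF-IDF is used below), an LSA-style diversity score can be read off the singular-value spectrum of the caption matrix: a single dominant singular value means near-duplicate captions, a flat spectrum means diverse ones:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def diversity_score(captions):
    X = TfidfVectorizer().fit_transform(captions).toarray()
    s = np.linalg.svd(X, compute_uv=False)
    s = s / s.sum()
    return -np.log(s[0])   # 0 for identical captions, larger for diverse sets

print(diversity_score(["a dog runs", "a dog runs", "a dog runs"]))       # ~0.0
print(diversity_score(["a dog runs", "people on a beach", "a red car"])) # > 0
```
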
Explaining Deep Classification of Time-Series Data with Learned Prototypes
Title | Explaining Deep Classification of Time-Series Data with Learned Prototypes |
Authors | Alan H. Gee, Diego Garcia-Olano, Joydeep Ghosh, David Paydarfar |
Abstract | The emergence of deep learning networks raises a need for explainable AI so that users and domain experts can be confident applying them to high-risk decisions. In this paper, we leverage data from the latent space induced by deep learning models to learn stereotypical representations or “prototypes” during training to elucidate the algorithmic decision-making process. We study how leveraging prototypes affects classification decisions for two-dimensional time-series data in a few different settings: (1) electrocardiogram (ECG) waveforms to detect clinical bradycardia, a slowing of heart rate, in preterm infants, (2) respiration waveforms to detect apnea of prematurity, and (3) audio waveforms to classify spoken digits. We improve upon existing models by optimizing for increased prototype diversity and robustness, visualize how these prototypes in the latent space are used by the model to distinguish classes, and show that prototypes are capable of learning features on two-dimensional time-series data to produce explainable insights during classification tasks. We show that the prototypes are capable of learning real-world features - bradycardia in ECG, apnea in respiration, and articulation in speech - as well as features within sub-classes. Our work thus leverages a learned prototypical framework on two-dimensional time-series data to produce explainable insights during classification tasks. |
Tasks | Decision Making, Time Series |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08935v3 |
PDF | https://arxiv.org/pdf/1904.08935v3.pdf |
PWC | https://paperswithcode.com/paper/explaining-deep-classification-of-time-series |
Repo | https://github.com/alangee/ijcai19-ts-prototypes |
Framework | none |
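
A minimal sketch of a prototype layer with a diversity penalty, in the spirit of the paper (the sizes, penalty form, and loss weight are assumptions): classification is driven by distances between latent codes and learned prototypes, which are pushed apart so they cover distinct patterns:

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, latent_dim=32, n_prototypes=8, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        self.classifier = nn.Linear(n_prototypes, n_classes)

    def forward(self, z):
        d = torch.cdist(z, self.prototypes)   # [batch, n_prototypes] distances
        return self.classifier(-d)            # closer prototype -> larger score

    def diversity_penalty(self, margin=1.0):
        d = torch.cdist(self.prototypes, self.prototypes)
        d = d + torch.eye(len(self.prototypes)) * 1e6   # mask self-distances
        return torch.relu(margin - d.min()).pow(2)      # push the closest pair apart

head = PrototypeHead()
z = torch.randn(4, 32)   # latent codes from some encoder
loss = nn.functional.cross_entropy(head(z), torch.tensor([0, 1, 0, 1]))
loss = loss + 0.1 * head.diversity_penalty()
```
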
The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas
Title | The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas |
Authors | Michael P. J. Camilleri, Christopher K. I. Williams |
Abstract | While label fusion from multiple noisy annotations is a well understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas, without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains (compared with DS) in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864). |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01251v2 |
PDF | https://arxiv.org/pdf/1906.01251v2.pdf |
PWC | https://paperswithcode.com/paper/the-extended-dawid-skene-model-fusing |
Repo | https://github.com/michael-camilleri/ISAR-Inter_Schema_AdapteR |
Framework | none |
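
A small numpy illustration of the inter-schema adapter idea: a binary matrix maps the full label-set onto an annotator's coarser schema, so a distribution over the annotator's labels follows from a distribution over the full labels. The behaviour schemas below are invented for illustration:

```python
import numpy as np

full_labels = ["groom", "scratch", "rest", "walk"]   # full behavioural label-set
# Annotator A only distinguishes "active" vs "inactive"; the adapter maps
# each full label onto A's schema:
adapter_A = np.array([
    [1, 0],   # groom   -> active
    [1, 0],   # scratch -> active
    [0, 1],   # rest    -> inactive
    [1, 0],   # walk    -> active
], dtype=float)

p_full = np.array([0.5, 0.2, 0.2, 0.1])   # model's distribution over full labels
p_schema_A = p_full @ adapter_A           # induced distribution over A's labels
print(p_schema_A)                         # [0.8 0.2]
```
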
Neural Jump Stochastic Differential Equations
Title | Neural Jump Stochastic Differential Equations |
Authors | Junteng Jia, Austin R. Benson |
Abstract | Many time series are effectively generated by a combination of deterministic continuous flows along with discrete jumps sparked by stochastic events. However, we usually do not have the equation of motion describing the flows, or how they are affected by jumps. To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump. Our approach extends the framework of Neural Ordinary Differential Equations with a stochastic process term that models discrete events. We then model temporal point processes with a piecewise-continuous latent trajectory, where the discontinuities are caused by stochastic events whose conditional intensity depends on the latent state. We demonstrate the predictive capabilities of our model on a range of synthetic and real-world marked point process datasets, including classical point processes (such as Hawkes processes), awards on Stack Overflow, medical records, and earthquake monitoring. |
Tasks | Point Processes, Time Series |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10403v3 |
PDF | https://arxiv.org/pdf/1905.10403v3.pdf |
PWC | https://paperswithcode.com/paper/neural-jump-stochastic-differential-equations |
Repo | https://github.com/mitmath/18S096SciML |
Framework | none |
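
Conceptually, the model flows between events and jumps at them. A sketch of that simulation loop is below; torchdiffeq is an assumed dependency, the networks are placeholders, and the learned conditional intensity that drives event times is omitted:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # assumed dependency (pip install torchdiffeq)

flow = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))
jump = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))

def dynamics(t, z):
    return flow(z)   # continuous latent dynamics dz/dt

z = torch.zeros(1, 4)
t_prev = 0.0
for t_event in [0.5, 1.3, 2.0]:          # observed event times
    ts = torch.tensor([t_prev, t_event])
    z = odeint(dynamics, z, ts)[-1]      # flow up to the next event
    z = z + jump(z)                      # discrete jump at the event
    t_prev = t_event
```
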
Track to Reconstruct and Reconstruct to Track
Title | Track to Reconstruct and Reconstruct to Track |
Authors | Jonathon Luiten, Tobias Fischer, Bastian Leibe |
Abstract | Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. Our approach first builds up short tracklets using 2D optical flow, and then fuses these into dynamic 3D object reconstructions. The precise 3D object motion of these reconstructions is used to merge tracklets through occlusion into long-term tracks, and to locate objects when detections are missing. On KITTI, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%, and outperforms all previous approaches for both bounding box and segmentation tracking. |
Tasks | 3D Reconstruction, Multi-Object Tracking, Object Tracking, Optical Flow Estimation |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1910.00130v2 |
PDF | https://arxiv.org/pdf/1910.00130v2.pdf |
PWC | https://paperswithcode.com/paper/track-to-reconstruct-and-reconstruct-to-track |
Repo | https://github.com/tobiasfshr/MOTSFusion |
Framework | tf |
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
Title | MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning |
Authors | Shaohuai Shi, Xiaowen Chu, Bo Li |
Abstract | Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this paper, we observe that many DNNs have a large number of layers with only a small amount of data to be communicated at each layer in distributed training, which could make WFBP inefficient. Based on the fact that merging some short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem to minimize the training time in pipelining communications and computations. We derive an optimal solution that can be solved efficiently without affecting the training performance. We then apply the solution to propose a distributed training algorithm named merged-gradient WFBP (MG-WFBP) and implement it in two platforms, Caffe and PyTorch. Extensive experiments on three GPU clusters are conducted to verify the effectiveness of MG-WFBP. We further exploit trace-based simulation of 64 GPUs to explore the potential scaling efficiency of MG-WFBP. Experimental results show that MG-WFBP achieves much better scaling performance than existing methods. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.09268v1 |
PDF | https://arxiv.org/pdf/1912.09268v1.pdf |
PWC | https://paperswithcode.com/paper/mg-wfbp-merging-gradients-wisely-for |
Repo | https://github.com/HKBU-HPML/MG-WFBP |
Framework | pytorch |
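
The intuition behind gradient merging can be seen with a toy communication-cost model t(m) = a + b·m (startup latency a plus per-byte cost b): sending many small gradients pays the startup cost repeatedly, so merging them wins. The numbers below are illustrative; MG-WFBP solves for the optimal grouping rather than merging everything, so communication still overlaps with backpropagation:

```python
def comm_time(nbytes, a=1e-4, b=1e-9):
    """Toy cost model: startup latency plus per-byte transfer time."""
    return a + b * nbytes

grad_sizes = [4_000, 8_000, 2_000_000]           # bytes per layer
separate = sum(comm_time(m) for m in grad_sizes) # one message per layer
merged = comm_time(sum(grad_sizes))              # one merged message
print(f"separate: {separate:.6f}s  merged: {merged:.6f}s")
```
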
A Flexible Generative Framework for Graph-based Semi-supervised Learning
Title | A Flexible Generative Framework for Graph-based Semi-supervised Learning |
Authors | Jiaqi Ma, Weijing Tang, Ji Zhu, Qiaozhu Mei |
Abstract | We consider a family of problems concerned with making predictions for the majority of unlabeled, graph-structured data samples based on a small proportion of labeled samples. Relational information among the data samples, often encoded in the graph/network structure, is shown to be helpful for these semi-supervised learning tasks. However, conventional graph-based regularization methods and recent graph neural networks do not fully leverage the interrelations between the features, the graph, and the labels. In this work, we propose a flexible generative framework for graph-based semi-supervised learning, which models the joint distribution of the node features, labels, and the graph structure. Borrowing insights from random graph models in the network science literature, this joint distribution can be instantiated using various distribution families. For the inference of missing labels, we exploit recent advances in scalable variational inference techniques to approximate the Bayesian posterior. We conduct thorough experiments on benchmark datasets for graph-based semi-supervised learning. Results show that the proposed methods outperform the state-of-the-art models in most settings. |
Tasks | |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10769v2 |
PDF | https://arxiv.org/pdf/1905.10769v2.pdf |
PWC | https://paperswithcode.com/paper/a-flexible-generative-framework-for-graph |
Repo | https://github.com/jiaqima/GenGNN |
Framework | pytorch |
Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
Title | Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks |
Authors | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun |
Abstract | Sequential labeling-based NER approaches restrict each word to belong to at most one entity mention, which leads to a serious problem when recognizing nested entity mentions. In this paper, we propose to resolve this problem by modeling and leveraging the head-driven phrase structures of entity mentions, i.e., although a mention can nest other mentions, they will not share the same head word. Specifically, we propose Anchor-Region Networks (ARNs), a sequence-to-nuggets architecture for nested mention detection. ARNs first identify anchor words (i.e., possible head words) of all mentions, and then recognize the mention boundaries for each anchor word by exploiting regular phrase structures. Furthermore, we also design Bag Loss, an objective function that can train ARNs in an end-to-end manner without using any anchor word annotation. Experiments show that ARNs achieve state-of-the-art performance on three standard nested entity mention detection benchmarks. |
Tasks | Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03783v1 |
PDF | https://arxiv.org/pdf/1906.03783v1.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-nuggets-nested-entity-mention |
Repo | https://github.com/sanmusunrise/ARNs |
Framework | pytorch |
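
A minimal sketch of the Bag Loss idea (the paper's exact formulation differs in details): with no anchor-word annotation available, every word inside a mention is a candidate anchor, and only the best-scoring candidate in the bag has to predict the mention's type:

```python
import torch
import torch.nn.functional as F

def bag_loss(word_logits, mention_span, mention_type):
    """word_logits: [seq_len, n_types]; mention_span: (start, end), inclusive."""
    start, end = mention_span
    log_probs = F.log_softmax(word_logits[start:end + 1], dim=-1)
    # Only the most confident candidate anchor in the bag carries the supervision:
    return -log_probs[:, mention_type].max()

logits = torch.randn(10, 5, requires_grad=True)   # scores for a 10-word sentence
loss = bag_loss(logits, mention_span=(2, 4), mention_type=1)
loss.backward()
```
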