Paper Group AWR 149
Learning the Tangent Space of Dynamical Instabilities from Data
Title | Learning the Tangent Space of Dynamical Instabilities from Data |
Authors | Antoine Blanchard, Themistoklis P. Sapsis |
Abstract | For a large class of dynamical systems, the optimally time-dependent (OTD) modes, a set of deformable orthonormal tangent vectors that track directions of instabilities along any trajectory, are known to depend “pointwise” on the state of the system on the attractor, and not on the history of the trajectory. We leverage the power of neural networks to learn this “pointwise” mapping from phase space to OTD space directly from data. The result of the learning process is a cartography of directions associated with strongest instabilities in phase space. Implications for data-driven prediction and control of dynamical instabilities are discussed. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10413v2 |
PDF | https://arxiv.org/pdf/1907.10413v2.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-the-tangent-space-of |
Repo | https://github.com/ablancha/deep-OTD |
Framework | none |
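
To make the idea concrete, here is a minimal sketch (not the authors' deep-OTD code) of a network that maps a phase-space state to an orthonormal set of tangent vectors; the dimensions, the QR-based orthonormalization, and the stand-in training targets are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeepOTDSketch(nn.Module):
    """Pointwise map from a state x to k orthonormal tangent vectors."""
    def __init__(self, dim=3, k=2, hidden=64):
        super().__init__()
        self.dim, self.k = dim, k
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, dim * k),
        )

    def forward(self, x):
        v = self.net(x).view(-1, self.dim, self.k)  # k raw tangent vectors
        q, _ = torch.linalg.qr(v)                   # enforce orthonormal columns
        return q

model = DeepOTDSketch()
x = torch.randn(8, 3)                              # states on the attractor
u_true = torch.linalg.qr(torch.randn(8, 3, 2))[0]  # stand-in OTD targets
loss = ((model(x) - u_true) ** 2).sum(dim=(1, 2)).mean()
loss.backward()
```
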
Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation
Title | Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation |
Authors | Loïc Vial, Benjamin Lecouteux, Didier Schwab |
Abstract | In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database. We propose two different methods that greatly reduce the size of neural WSD models, with the benefit of improving their coverage without additional training data, and without impacting their precision. In addition to our methods, we present a WSD system which relies on pre-trained BERT word vectors in order to achieve results that significantly outperform the state of the art on all WSD evaluation tasks. |
Tasks | Word Sense Disambiguation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05677v3 |
PDF | https://arxiv.org/pdf/1905.05677v3.pdf |
PWC | https://paperswithcode.com/paper/sense-vocabulary-compression-through-the |
Repo | https://github.com/getalp/disambiguate |
Framework | pytorch |
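
As a rough illustration of hypernymy-based sense compression (a sketch in the spirit of the paper, not its actual algorithm), the snippet below collapses WordNet senses onto ancestor synsets, shrinking the tag vocabulary a neural WSD classifier must predict; the grouping depth is an arbitrary choice:

```python
# Requires: nltk with the WordNet corpus (nltk.download("wordnet"))
from nltk.corpus import wordnet as wn

def compressed_tag(synset, depth=4):
    """Map a synset to the ancestor reached by climbing `depth` hypernym links."""
    node = synset
    for _ in range(depth):
        hypernyms = node.hypernyms()
        if not hypernyms:
            break
        node = hypernyms[0]   # follow the first hypernym path
    return node.name()

# Distinct senses that climb to the same ancestor share one compressed tag,
# so the classifier's output vocabulary shrinks:
for s in wn.synsets("dog", pos=wn.NOUN)[:3]:
    print(s.name(), "->", compressed_tag(s))
```
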
Deeply-supervised Knowledge Synergy
Title | Deeply-supervised Knowledge Synergy |
Authors | Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao |
Abstract | Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, the current prevailing training scheme follows the previous way of adding supervision to the last layer of the network only and propagating error information up layer-by-layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model with a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS. |
Tasks | Image Classification |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00675v2 |
PDF | https://arxiv.org/pdf/1906.00675v2.pdf |
PWC | https://paperswithcode.com/paper/190600675 |
Repo | https://github.com/sundw2014/DKS |
Framework | pytorch |
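
The synergy loss is the heart of DKS. A minimal sketch of dense pairwise knowledge matching between supervision branches might look as follows; the temperature and weighting are assumptions, and the official repo holds the authors' implementation:

```python
import torch
import torch.nn.functional as F

def synergy_loss(branch_logits, T=2.0):
    """branch_logits: list of [batch, classes] tensors, one per supervision branch."""
    loss = 0.0
    for i, zi in enumerate(branch_logits):
        for j, zj in enumerate(branch_logits):
            if i == j:
                continue
            log_p_i = F.log_softmax(zi / T, dim=1)   # "student" side, gets gradients
            p_j = F.softmax(zj / T, dim=1).detach()  # "teacher" side, frozen target
            loss = loss + F.kl_div(log_p_i, p_j, reduction="batchmean") * T * T
    return loss

branches = [torch.randn(4, 10, requires_grad=True) for _ in range(3)]
print(synergy_loss(branches))   # matches every branch pair in both directions
```
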
Using LSTMs for climate change assessment studies on droughts and floods
Title | Using LSTMs for climate change assessment studies on droughts and floods |
Authors | Frederik Kratzert, Daniel Klotz, Johannes Brandstetter, Pieter-Jan Hoedt, Grey Nearing, Sepp Hochreiter |
Abstract | Climate change affects occurrences of floods and droughts worldwide. However, predicting climate impacts over individual watersheds is difficult, primarily because accurate hydrological forecasts require models that are calibrated to past data. In this work we present a large-scale LSTM-based modeling approach that – by training on large data sets – learns a diversity of hydrological behaviors. Previous work shows that this model is more accurate than current state-of-the-art models, even when the LSTM-based approach operates out-of-sample and the latter in-sample. In this work, we show how this model can assess the sensitivity of the underlying systems with regard to extreme (high and low) flows in individual watersheds over the continental US. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03941v2 |
PDF | https://arxiv.org/pdf/1911.03941v2.pdf |
PWC | https://paperswithcode.com/paper/using-lstms-for-climate-change-assessment |
Repo | https://github.com/kratzert/neurips2019_climate_change_workshop |
Framework | none |
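
A toy sketch of the modeling setup described in the abstract: an LSTM maps a year of daily meteorological forcings to discharge. The input size and sequence length are illustrative; the authors' full setup (including static catchment attributes) lives in the linked repo:

```python
import torch
import torch.nn as nn

class RunoffLSTM(nn.Module):
    def __init__(self, n_forcings=5, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_forcings, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: [batch, days, n_forcings] -> discharge prediction for the final day
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

model = RunoffLSTM()
forcings = torch.randn(16, 365, 5)   # one year of daily forcings per basin
print(model(forcings).shape)         # torch.Size([16, 1])
```
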
Adaptive Morphological Reconstruction for Seeded Image Segmentation
Title | Adaptive Morphological Reconstruction for Seeded Image Segmentation |
Authors | Tao Lei, Xiaohong Jia, Tongliang Liu, Shigang Liu, Hongying Meng, Asoke K. Nandi |
Abstract | Morphological reconstruction (MR) is often employed by seeded image segmentation algorithms such as watershed transform and power watershed as it is able to filter seeds (regional minima) to reduce over-segmentation. However, MR might mistakenly filter meaningful seeds that are required for generating accurate segmentation and it is also sensitive to the scale because a single-scale structuring element is employed. In this paper, a novel adaptive morphological reconstruction (AMR) operation is proposed that has three advantages. Firstly, AMR can adaptively filter useless seeds while preserving meaningful ones. Secondly, AMR is insensitive to the scale of structuring elements because multiscale structuring elements are employed. Finally, AMR has two attractive properties, monotonic increasingness and convergence, which help seeded segmentation algorithms achieve a hierarchical segmentation. Experiments clearly demonstrate that AMR is useful for improving algorithms of seeded image segmentation and seed-based spectral segmentation. Compared to several state-of-the-art algorithms, the proposed algorithms provide better segmentation results while requiring less computing time. Source code is available at https://github.com/SUST-reynole/AMR. |
Tasks | Semantic Segmentation |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03973v1 |
PDF | http://arxiv.org/pdf/1904.03973v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-morphological-reconstruction-for |
Repo | https://github.com/SUST-reynole/AMR |
Framework | none |
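
To illustrate the multiscale idea (a sketch, not the paper's exact AMR operator), the snippet below combines opening-by-reconstruction at several structuring-element scales so that no single scale has to be chosen; the scale range and the pointwise-max combination are assumptions:

```python
import numpy as np
from skimage.morphology import disk, erosion, reconstruction

def adaptive_reconstruction(image, scales=range(1, 6)):
    """Combine opening-by-reconstruction across structuring-element radii."""
    result = np.zeros_like(image, dtype=float)
    for r in scales:
        seed = erosion(image, disk(r))                   # erosion guarantees seed <= mask
        rec = reconstruction(seed, image, method="dilation")
        result = np.maximum(result, rec)                 # pointwise combination over scales
    return result

img = np.random.rand(64, 64)
filtered = adaptive_reconstruction(img)   # fewer spurious regional minima as seeds
```
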
On Model Stability as a Function of Random Seed
Title | On Model Stability as a Function of Random Seed |
Authors | Pranava Madhyastha, Rishabh Jain |
Abstract | In this paper, we focus on quantifying model stability as a function of random seed by investigating the effects of the induced randomness on model performance and the robustness of the model in general. We specifically perform a controlled study on the effect of random seeds on the behaviour of attention, gradient-based and surrogate model based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models, resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA) and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA) which improve the stability of models over random seeds. With our ASWA and NASWA based optimization, we are able to improve the robustness of the original model, on average reducing the standard deviation of the model’s performance by 72%. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10447v1 |
PDF | https://arxiv.org/pdf/1909.10447v1.pdf |
PWC | https://paperswithcode.com/paper/190910447 |
Repo | https://github.com/rishj97/ModelStability |
Framework | pytorch |
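
A minimal sketch of the weight-averaging idea behind ASWA (the averaging schedule and the norm filter of NASWA are omitted here): keep a running average of the parameters during training and use it as the final model:

```python
import copy
import torch

def update_average(avg_model, model, n_averaged):
    """Fold `model`'s current weights into the running average, in place."""
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg += (p - p_avg) / (n_averaged + 1)

model = torch.nn.Linear(10, 2)
avg_model = copy.deepcopy(model)
for step in range(100):
    # ... one optimizer step on `model` would happen here ...
    update_average(avg_model, model, n_averaged=step)
# `avg_model` is the stabilized model used at evaluation time.
```
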
Review-Driven Answer Generation for Product-Related Questions in E-Commerce
Title | Review-Driven Answer Generation for Product-Related Questions in E-Commerce |
Authors | Shiqian Chen, Chenliang Li, Feng Ji, Wei Zhou, Haiqing Chen |
Abstract | Users often have many product-related questions before they make a purchase decision in E-commerce. However, it is often time-consuming to examine each user review to identify the desired information. In this paper, we propose a novel review-driven framework for answer generation for product-related questions in E-commerce, named RAGE. We develop RAGE on the basis of the multi-layer convolutional architecture to speed up answer generation through parallel computation. For each question, RAGE first extracts the relevant review snippets from the reviews of the corresponding product. Then, we devise a mechanism to identify the relevant information from the noise-prone review snippets and incorporate this information to guide the answer generation. The experiments on two real-world E-Commerce datasets show that the proposed RAGE significantly outperforms the existing alternatives in producing more accurate and informative answers in natural language. Moreover, RAGE takes much less time for both model training and answer generation than the existing RNN based generation models. |
Tasks | |
Published | 2019-04-27 |
URL | http://arxiv.org/abs/1905.01994v1 |
PDF | http://arxiv.org/pdf/1905.01994v1.pdf |
PWC | https://paperswithcode.com/paper/190501994 |
Repo | https://github.com/WHUIR/RAGE |
Framework | tf |
Describing like humans: on diversity in image captioning
Title | Describing like humans: on diversity in image captioning |
Authors | Qingzhong Wang, Antoni B. Chan |
Abstract | Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and multiple levels of detail, and thus there is a variety of captions that express different concepts and details that might be interesting for different humans. Therefore, only evaluating accuracy is not sufficient for measuring the performance of captioning models — the diversity of the generated captions should also be considered. In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. We conduct extensive experiments to re-evaluate recent captioning models in the context of both diversity and accuracy. We find that there is still a large gap between the model and human performance in terms of both accuracy and diversity, and that models optimized for accuracy (CIDEr) have low diversity. We also show that balancing the cross-entropy loss and CIDEr reward in reinforcement learning during training can effectively control the tradeoff between diversity and accuracy of the generated captions. |
Tasks | Image Captioning |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12020v3 |
PDF | https://arxiv.org/pdf/1903.12020v3.pdf |
PWC | https://paperswithcode.com/paper/describing-like-humans-on-diversity-in-image |
Repo | https://github.com/qingzwang/DiversityMetrics |
Framework | tf |
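
As a rough stand-in for the proposed metric (which is kernelized with CIDEr; plain TF-IDF is used below), an LSA-style diversity score can be read off the singular-value spectrum of the caption matrix: a single dominant singular value means near-duplicate captions, a flat spectrum means diverse ones:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def diversity_score(captions):
    X = TfidfVectorizer().fit_transform(captions).toarray()
    s = np.linalg.svd(X, compute_uv=False)
    s = s / s.sum()
    return -np.log(s[0])   # 0 for identical captions, larger for diverse sets

print(diversity_score(["a dog runs", "a dog runs", "a dog runs"]))       # ~0.0
print(diversity_score(["a dog runs", "people on a beach", "a red car"])) # > 0
```
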
Explaining Deep Classification of Time-Series Data with Learned Prototypes
Title | Explaining Deep Classification of Time-Series Data with Learned Prototypes |
Authors | Alan H. Gee, Diego Garcia-Olano, Joydeep Ghosh, David Paydarfar |
Abstract | The emergence of deep learning networks raises a need for explainable AI so that users and domain experts can be confident applying them to high-risk decisions. In this paper, we leverage data from the latent space induced by deep learning models to learn stereotypical representations or “prototypes” during training to elucidate the algorithmic decision-making process. We study how leveraging prototypes affects classification decisions for two-dimensional time-series data in a few different settings: (1) electrocardiogram (ECG) waveforms to detect clinical bradycardia, a slowing of heart rate, in preterm infants, (2) respiration waveforms to detect apnea of prematurity, and (3) audio waveforms to classify spoken digits. We improve upon existing models by optimizing for increased prototype diversity and robustness, visualize how these prototypes in the latent space are used by the model to distinguish classes, and show that prototypes are capable of learning features on two-dimensional time-series data to produce explainable insights during classification tasks. We show that the prototypes are capable of learning real-world features - bradycardia in ECG, apnea in respiration, and articulation in speech - as well as features within sub-classes. Our work thus leverages a learned prototypical framework on two-dimensional time-series data to produce explainable insights during classification tasks. |
Tasks | Decision Making, Time Series |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08935v3 |
PDF | https://arxiv.org/pdf/1904.08935v3.pdf |
PWC | https://paperswithcode.com/paper/explaining-deep-classification-of-time-series |
Repo | https://github.com/alangee/ijcai19-ts-prototypes |
Framework | none |
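
A minimal sketch of a prototype layer with a diversity penalty, in the spirit of the paper (the sizes, penalty form, and loss weight are assumptions): classification is driven by distances between latent codes and learned prototypes, which are pushed apart so they cover distinct patterns:

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, latent_dim=32, n_prototypes=8, n_classes=2):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, latent_dim))
        self.classifier = nn.Linear(n_prototypes, n_classes)

    def forward(self, z):
        d = torch.cdist(z, self.prototypes)   # [batch, n_prototypes] distances
        return self.classifier(-d)            # closer prototype -> larger score

    def diversity_penalty(self, margin=1.0):
        d = torch.cdist(self.prototypes, self.prototypes)
        d = d + torch.eye(len(self.prototypes)) * 1e6   # mask self-distances
        return torch.relu(margin - d.min()).pow(2)      # push the closest pair apart

head = PrototypeHead()
z = torch.randn(4, 32)   # latent codes from some encoder
loss = nn.functional.cross_entropy(head(z), torch.tensor([0, 1, 0, 1]))
loss = loss + 0.1 * head.diversity_penalty()
```
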
The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas
Title | The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas |
Authors | Michael P. J. Camilleri, Christopher K. I. Williams |
Abstract | While label fusion from multiple noisy annotations is a well understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas, without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains (compared with DS) in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864). |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01251v2 |
PDF | https://arxiv.org/pdf/1906.01251v2.pdf |
PWC | https://paperswithcode.com/paper/the-extended-dawid-skene-model-fusing |
Repo | https://github.com/michael-camilleri/ISAR-Inter_Schema_AdapteR |
Framework | none |
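
A small numpy illustration of the inter-schema adapter idea: a binary matrix maps the full label-set onto an annotator's coarser schema, so a distribution over the annotator's labels follows from a distribution over the full labels. The behaviour schemas below are invented for illustration:

```python
import numpy as np

full_labels = ["groom", "scratch", "rest", "walk"]   # full behavioural label-set
# Annotator A only distinguishes "active" vs "inactive"; the adapter maps
# each full label onto A's schema:
adapter_A = np.array([
    [1, 0],   # groom   -> active
    [1, 0],   # scratch -> active
    [0, 1],   # rest    -> inactive
    [1, 0],   # walk    -> active
], dtype=float)

p_full = np.array([0.5, 0.2, 0.2, 0.1])   # model's distribution over full labels
p_schema_A = p_full @ adapter_A           # induced distribution over A's labels
print(p_schema_A)                         # [0.8 0.2]
```
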
Neural Jump Stochastic Differential Equations
Title | Neural Jump Stochastic Differential Equations |
Authors | Junteng Jia, Austin R. Benson |
Abstract | Many time series are effectively generated by a combination of deterministic continuous flows along with discrete jumps sparked by stochastic events. However, we usually do not have the equation of motion describing the flows, or how they are affected by jumps. To this end, we introduce Neural Jump Stochastic Differential Equations that provide a data-driven approach to learn continuous and discrete dynamic behavior, i.e., hybrid systems that both flow and jump. Our approach extends the framework of Neural Ordinary Differential Equations with a stochastic process term that models discrete events. We then model temporal point processes with a piecewise-continuous latent trajectory, where the discontinuities are caused by stochastic events whose conditional intensity depends on the latent state. We demonstrate the predictive capabilities of our model on a range of synthetic and real-world marked point process datasets, including classical point processes (such as Hawkes processes), awards on Stack Overflow, medical records, and earthquake monitoring. |
Tasks | Point Processes, Time Series |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10403v3 |
PDF | https://arxiv.org/pdf/1905.10403v3.pdf |
PWC | https://paperswithcode.com/paper/neural-jump-stochastic-differential-equations |
Repo | https://github.com/mitmath/18S096SciML |
Framework | none |
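
Conceptually, the model flows between events and jumps at them. A sketch of that simulation loop is below; torchdiffeq is an assumed dependency, the networks are placeholders, and the learned conditional intensity that drives event times is omitted:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint   # assumed dependency (pip install torchdiffeq)

flow = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))
jump = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))

def dynamics(t, z):
    return flow(z)   # continuous latent dynamics dz/dt

z = torch.zeros(1, 4)
t_prev = 0.0
for t_event in [0.5, 1.3, 2.0]:          # observed event times
    ts = torch.tensor([t_prev, t_event])
    z = odeint(dynamics, z, ts)[-1]      # flow up to the next event
    z = z + jump(z)                      # discrete jump at the event
    t_prev = t_event
```
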
Track to Reconstruct and Reconstruct to Track
Title | Track to Reconstruct and Reconstruct to Track |
Authors | Jonathon Luiten, Tobias Fischer, Bastian Leibe |
Abstract | Object tracking and 3D reconstruction are often performed together, with tracking used as input for reconstruction. However, the obtained reconstructions also provide useful information for improving tracking. We propose a novel method that closes this loop, first tracking to reconstruct, and then reconstructing to track. Our approach, MOTSFusion (Multi-Object Tracking, Segmentation and dynamic object Fusion), exploits the 3D motion extracted from dynamic object reconstructions to track objects through long periods of complete occlusion and to recover missing detections. Our approach first builds up short tracklets using 2D optical flow, and then fuses these into dynamic 3D object reconstructions. The precise 3D object motion of these reconstructions is used to merge tracklets through occlusion into long-term tracks, and to locate objects when detections are missing. On KITTI, our reconstruction-based tracking reduces the number of ID switches of the initial tracklets by more than 50%, and outperforms all previous approaches for both bounding box and segmentation tracking. |
Tasks | 3D Reconstruction, Multi-Object Tracking, Object Tracking, Optical Flow Estimation |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1910.00130v2 |
PDF | https://arxiv.org/pdf/1910.00130v2.pdf |
PWC | https://paperswithcode.com/paper/track-to-reconstruct-and-reconstruct-to-track |
Repo | https://github.com/tobiasfshr/MOTSFusion |
Framework | tf |
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
Title | MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning |
Authors | Shaohuai Shi, Xiaowen Chu, Bo Li |
Abstract | Distributed synchronous stochastic gradient descent has been widely used to train deep neural networks (DNNs) on computer clusters. With the increase of computational power, network communications generally limit the system scalability. Wait-free backpropagation (WFBP) is a popular solution to overlap communications with computations during the training process. In this paper, we observe that many DNNs have a large number of layers with only a small amount of data to be communicated at each layer in distributed training, which could make WFBP inefficient. Based on the fact that merging some short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem to minimize the training time in pipelining communications and computations. We derive an optimal solution that can be solved efficiently without affecting the training performance. We then apply the solution to propose a distributed training algorithm named merged-gradient WFBP (MG-WFBP) and implement it in two platforms, Caffe and PyTorch. Extensive experiments on three GPU clusters are conducted to verify the effectiveness of MG-WFBP. We further exploit trace-based simulation of 64 GPUs to explore the potential scaling efficiency of MG-WFBP. Experimental results show that MG-WFBP achieves much better scaling performance than existing methods. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.09268v1 |
PDF | https://arxiv.org/pdf/1912.09268v1.pdf |
PWC | https://paperswithcode.com/paper/mg-wfbp-merging-gradients-wisely-for |
Repo | https://github.com/HKBU-HPML/MG-WFBP |
Framework | pytorch |
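
The intuition behind gradient merging can be seen with a toy communication-cost model t(m) = a + b·m (startup latency a plus per-byte cost b): sending many small gradients pays the startup cost repeatedly, so merging them wins. The numbers below are illustrative; MG-WFBP solves for the optimal grouping rather than merging everything, so communication still overlaps with backpropagation:

```python
def comm_time(nbytes, a=1e-4, b=1e-9):
    """Toy cost model: startup latency plus per-byte transfer time."""
    return a + b * nbytes

grad_sizes = [4_000, 8_000, 2_000_000]           # bytes per layer
separate = sum(comm_time(m) for m in grad_sizes) # one message per layer
merged = comm_time(sum(grad_sizes))              # one merged message
print(f"separate: {separate:.6f}s  merged: {merged:.6f}s")
```
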
A Flexible Generative Framework for Graph-based Semi-supervised Learning
Title | A Flexible Generative Framework for Graph-based Semi-supervised Learning |
Authors | Jiaqi Ma, Weijing Tang, Ji Zhu, Qiaozhu Mei |
Abstract | We consider a family of problems concerned with making predictions for the majority of unlabeled, graph-structured data samples based on a small proportion of labeled samples. Relational information among the data samples, often encoded in the graph/network structure, is shown to be helpful for these semi-supervised learning tasks. However, conventional graph-based regularization methods and recent graph neural networks do not fully leverage the interrelations between the features, the graph, and the labels. In this work, we propose a flexible generative framework for graph-based semi-supervised learning, which models the joint distribution of the node features, labels, and the graph structure. Borrowing insights from random graph models in the network science literature, this joint distribution can be instantiated using various distribution families. For the inference of missing labels, we exploit recent advances in scalable variational inference techniques to approximate the Bayesian posterior. We conduct thorough experiments on benchmark datasets for graph-based semi-supervised learning. Results show that the proposed methods outperform the state-of-the-art models in most settings. |
Tasks | |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10769v2 |
PDF | https://arxiv.org/pdf/1905.10769v2.pdf |
PWC | https://paperswithcode.com/paper/a-flexible-generative-framework-for-graph |
Repo | https://github.com/jiaqima/GenGNN |
Framework | pytorch |
Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
Title | Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks |
Authors | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun |
Abstract | Sequential labeling-based NER approaches restrict each word to belong to at most one entity mention, which leads to a serious problem when recognizing nested entity mentions. In this paper, we propose to resolve this problem by modeling and leveraging the head-driven phrase structures of entity mentions, i.e., although a mention can nest other mentions, they will not share the same head word. Specifically, we propose Anchor-Region Networks (ARNs), a sequence-to-nuggets architecture for nested mention detection. ARNs first identify anchor words (i.e., possible head words) of all mentions, and then recognize the mention boundaries for each anchor word by exploiting regular phrase structures. Furthermore, we also design Bag Loss, an objective function that can train ARNs in an end-to-end manner without using any anchor word annotation. Experiments show that ARNs achieve state-of-the-art performance on three standard nested entity mention detection benchmarks. |
Tasks | Named Entity Recognition, Nested Mention Recognition, Nested Named Entity Recognition |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03783v1 |
PDF | https://arxiv.org/pdf/1906.03783v1.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-nuggets-nested-entity-mention |
Repo | https://github.com/sanmusunrise/ARNs |
Framework | pytorch |
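
A minimal sketch of the Bag Loss idea (the paper's exact formulation differs in details): with no anchor-word annotation available, every word inside a mention is a candidate anchor, and only the best-scoring candidate in the bag has to predict the mention's type:

```python
import torch
import torch.nn.functional as F

def bag_loss(word_logits, mention_span, mention_type):
    """word_logits: [seq_len, n_types]; mention_span: (start, end), inclusive."""
    start, end = mention_span
    log_probs = F.log_softmax(word_logits[start:end + 1], dim=-1)
    # Only the most confident candidate anchor in the bag carries the supervision:
    return -log_probs[:, mention_type].max()

logits = torch.randn(10, 5, requires_grad=True)   # scores for a 10-word sentence
loss = bag_loss(logits, mention_span=(2, 4), mention_type=1)
loss.backward()
```
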