Paper Group ANR 1668
Natural Language Generation Challenges for Explainable AI
Title | Natural Language Generation Challenges for Explainable AI |
Authors | Ehud Reiter |
Abstract | Good quality explanations of artificial intelligence (XAI) reasoning must be written (and evaluated) for an explanatory purpose, targeted towards their readers, have a good narrative and causal structure, and highlight where uncertainty and data quality affect the AI output. I discuss these challenges from a Natural Language Generation (NLG) perspective, and highlight four specific NLG for XAI research challenges. |
Tasks | Text Generation |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08794v1 |
https://arxiv.org/pdf/1911.08794v1.pdf | |
PWC | https://paperswithcode.com/paper/natural-language-generation-challenges-for |
Repo | |
Framework | |
STG2Seq: Spatial-temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting
Title | STG2Seq: Spatial-temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting |
Authors | Lei Bai, Lina Yao, Salil S. Kanhere, Xianzhi Wang, Quan Z. Sheng |
Abstract | Multi-step passenger demand forecasting is a crucial task in on-demand vehicle sharing services. However, predicting passenger demand over multiple time horizons is generally challenging due to the nonlinear and dynamic spatial-temporal dependencies. In this work, we propose to model multi-step citywide passenger demand prediction based on a graph and use a hierarchical graph convolutional structure to capture both spatial and temporal correlations simultaneously. Our model consists of three parts: 1) a long-term encoder to encode historical passenger demands; 2) a short-term encoder to derive the next-step prediction for generating multi-step prediction; 3) an attention-based output module to model the dynamic temporal and channel-wise information. Experiments on three real-world datasets show that our model consistently outperforms many baseline methods and state-of-the-art models. |
Tasks | Graph-to-Sequence |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10069v1 |
https://arxiv.org/pdf/1905.10069v1.pdf | |
PWC | https://paperswithcode.com/paper/stg2seq-spatial-temporal-graph-to-sequence |
Repo | |
Framework | |
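The hierarchical graph-convolutional idea above rests on aggregating each city region's demand with that of its neighboring regions on a graph. A minimal sketch of that basic aggregation step (not the authors' code; regions, adjacency, and demand values are hypothetical):

```python
# One step of neighborhood aggregation over a city region graph, the basic
# operation underlying graph-convolutional demand models. Real models learn
# weights on top of this; here the aggregation is a plain mean.

def aggregate_demand(adjacency, demand):
    """Average each region's demand with its neighbors' (self included)."""
    out = []
    for node, neighbors in enumerate(adjacency):
        vals = [demand[node]] + [demand[n] for n in neighbors]
        out.append(sum(vals) / len(vals))
    return out

# Three regions in a line: 0-1 adjacent, 1-2 adjacent.
adjacency = [[1], [0, 2], [1]]
demand = [4.0, 8.0, 0.0]
smoothed = aggregate_demand(adjacency, demand)
```

Stacking such layers lets information propagate across the city graph, which is how the model captures spatial correlation among regions.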
HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing
Title | HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing |
Authors | Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang |
Abstract | This paper describes a simple UCCA semantic graph parsing approach. The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery. In this way, we can make use of existing syntactic parsing techniques. Based on the data statistics, we recover discontinuous nodes directly according to the output labels of the constituent parser and use a biaffine classification model to recover the more complex remote edges. The classification model and the constituent parser are simultaneously trained under the multi-task learning framework. We use multilingual BERT as extra features in the open tracks. Our system ranks first in the six English/German closed/open tracks among seven participating systems. For the seventh cross-lingual track, where there is little training data for French, we propose a language embedding approach to utilize English and German training data, and our result ranks second. |
Tasks | Multi-Task Learning |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04153v2 |
http://arxiv.org/pdf/1903.04153v2.pdf | |
PWC | https://paperswithcode.com/paper/hltsuda-at-semeval-2019-task-1-ucca-graph |
Repo | |
Framework | |
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
Title | Multilevel Initialization for Layer-Parallel Deep Neural Network Training |
Authors | Eric C. Cyr, Stefanie Günther, Jacob B. Schroder |
Abstract | This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain that is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel “nested iteration” strategies for network training, showing supporting numerical evidence of reduced run time for equivalent accuracy. In addition, we study whether the initialization strategies provide a regularizing effect on the overall training process and reduce sensitivity to hyperparameters and randomness in initial network parameters. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.08974v1 |
https://arxiv.org/pdf/1912.08974v1.pdf | |
PWC | https://paperswithcode.com/paper/multilevel-initialization-for-layer-parallel |
Repo | |
Framework | |
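The time-domain refinement described above can be pictured as doubling the layer grid and initializing each inserted layer from its coarse neighbors. A minimal sketch with scalar "weights" per layer (the actual method operates on full parameter vectors of an ODE discretization; this is an illustration, not the authors' code):

```python
# Multilevel "nested iteration" sketch: a network trained on a coarse layer
# grid seeds a finer network by inserting a midpoint layer between each pair
# of coarse layers, initialized as the average of its neighbors.

def refine_layers(coarse):
    """Double the layer resolution, interpolating new layer parameters."""
    fine = [coarse[0]]
    for a, b in zip(coarse, coarse[1:]):
        fine.append((a + b) / 2.0)  # new layer from coarse neighbors
        fine.append(b)
    return fine

# Three coarse layers become five, with interpolated initial parameters.
fine_init = refine_layers([0.0, 2.0, 4.0])
```

In the continuous-time view, this is piecewise-linear interpolation of the control (the weights) onto a finer time grid, so the refined network starts close to the coarse optimum rather than at random.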
Distributed creation of Machine learning agents for Blockchain analysis
Title | Distributed creation of Machine learning agents for Blockchain analysis |
Authors | Zvezdin Besarabov, Todor Kolev |
Abstract | Creating efficient deep neural networks involves repetitive manual optimization of the topology and the hyperparameters. This human intervention significantly inhibits the process. Recent publications propose various Neural Architecture Search (NAS) algorithms that automate this work. We have applied a customized NAS algorithm with network morphism and Bayesian optimization to the problem of cryptocurrency predictions, where it achieved results on par with our best manually designed models. This is consistent with the findings of other teams, while several known experiments suggest that given enough computing power, NAS algorithms can surpass state-of-the-art neural network models designed by humans. In this paper, we propose a blockchain network protocol that incentivises independent computing nodes to run NAS algorithms and compete in finding better neural network models for a particular task. If implemented, such a network can be an autonomous and self-improving source of machine learning models, significantly boosting and democratizing access to AI capabilities for many industries. |
Tasks | Neural Architecture Search |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03848v1 |
https://arxiv.org/pdf/1909.03848v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-creation-of-machine-learning |
Repo | |
Framework | |
GAN- vs. JPEG2000 Image Compression for Distributed Automotive Perception: Higher Peak SNR Does Not Mean Better Semantic Segmentation
Title | GAN- vs. JPEG2000 Image Compression for Distributed Automotive Perception: Higher Peak SNR Does Not Mean Better Semantic Segmentation |
Authors | Jonas Löhdefink, Andreas Bär, Nico M. Schmidt, Fabian Hüger, Peter Schlicht, Tim Fingscheidt |
Abstract | The large number of sensors required for autonomous driving poses enormous challenges on the capacity of automotive bus systems. There is a need to understand tradeoffs between bitrate and perception performance. In this paper, we compare the image compression standards JPEG, JPEG2000, and WebP to a modern encoder/decoder image compression approach based on generative adversarial networks (GANs). We evaluate not only the pure compression performance using typical metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and others, but also the performance of a subsequent perception function, namely semantic segmentation (characterized by the mean intersection over union (mIoU) measure). Not surprisingly, for all investigated compression methods, a higher bitrate means better results in all investigated quality metrics. Interestingly, however, we show that the semantic segmentation mIoU of the GAN autoencoder in the highly relevant low-bitrate regime (at 0.0625 bit/pixel) is better by 3.9% absolute than JPEG2000, although the latter is still considerably better in terms of PSNR (5.91 dB difference). This effect can be greatly enlarged by training the semantic segmentation model with images originating from the decoder, so that the mIoU using the segmentation model trained by GAN reconstructions exceeds the use of the model trained with original images by almost 20% absolute. We conclude that distributed perception in future autonomous driving will most probably not provide a solution to the automotive bus capacity bottleneck by using standard compression schemes such as JPEG2000, but requires modern coding approaches, with the GAN encoder/decoder method being a promising candidate. |
Tasks | Autonomous Driving, Image Compression, Semantic Segmentation |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04311v1 |
http://arxiv.org/pdf/1902.04311v1.pdf | |
PWC | https://paperswithcode.com/paper/gan-vs-jpeg2000-image-compression-for |
Repo | |
Framework | |
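PSNR, the metric whose limitations the paper highlights, is computed from the mean squared error relative to the peak signal value. A minimal sketch over flattened pixel lists (illustrative, not the authors' evaluation code):

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# A per-pixel error of 1 gray level gives roughly 48.13 dB at 8-bit depth.
value = psnr([255.0, 255.0], [254.0, 254.0])
```

The paper's point is that a higher value of this metric need not translate into better downstream mIoU, which is why it evaluates the perception task directly.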
Distributional Negative Sampling for Knowledge Base Completion
Title | Distributional Negative Sampling for Knowledge Base Completion |
Authors | Sarthak Dash, Alfio Gliozzo |
Abstract | State-of-the-art approaches for Knowledge Base Completion (KBC) exploit deep neural networks trained with both false and true assertions: positive assertions are explicitly taken from the knowledge base, whereas negative ones are generated by random sampling of entities. In this paper, we argue that random sampling is not a good training strategy since it is highly likely to generate a huge number of nonsensical assertions during training, which does not provide relevant training signal to the system. Hence, it slows down the learning process and decreases accuracy. To address this issue, we propose an alternative approach called Distributional Negative Sampling that generates meaningful negative examples which are highly likely to be false. Our approach achieves a significant improvement in Mean Reciprocal Rank values amongst two different KBC algorithms in three standard academic benchmarks. |
Tasks | Knowledge Base Completion |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06178v1 |
https://arxiv.org/pdf/1908.06178v1.pdf | |
PWC | https://paperswithcode.com/paper/distributional-negative-sampling-for |
Repo | |
Framework | |
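The abstract does not spell out the sampling distribution, but the core idea of drawing negatives that are plausible for the relation rather than uniformly random can be sketched as corrupting a triple's object with another entity observed as an object of the same relation. The triples below are hypothetical and this is not the authors' implementation:

```python
import random

def corrupt_object(triple, all_triples, rng):
    """Replace the object with an entity seen as an object of the same
    relation elsewhere, so the negative is plausible rather than nonsensical."""
    head, rel, tail = triple
    candidates = {t for (h, r, t) in all_triples if r == rel and t != tail}
    if not candidates:  # fall back to corruption from all observed objects
        candidates = {t for (h, r, t) in all_triples if t != tail}
    return (head, rel, rng.choice(sorted(candidates)))

triples = [("paris", "capital_of", "france"),
           ("rome", "capital_of", "italy"),
           ("einstein", "born_in", "ulm")]
negative = corrupt_object(("paris", "capital_of", "france"), triples,
                          random.Random(0))
```

Here the negative ("paris", "capital_of", "italy") is meaningful and false, whereas uniform entity sampling would mostly yield nonsensical assertions such as ("paris", "capital_of", "einstein").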
Analysis of User Dwell Time by Category in News Application
Title | Analysis of User Dwell Time by Category in News Application |
Authors | Yoshifumi Seki, Mitsuo Yoshida |
Abstract | Dwell time indicates how long a user looked at a page, and it is used especially in fields such as search engines, recommender systems, and advertisements, where ratings from users are important. Despite the importance of this index, however, its characteristics are not well known. In this paper, we analyze the dwell time of news pages according to category in a smartphone application. Our aim is to clarify the characteristics of dwell time and the relation between the length of a news page and dwell time, for each category. The results indicated different dwell time trends for each category. For example, compared to other categories, the social category had fewer news pages with dwell times shorter than the peak, and there were a few news pages with remarkably short dwell time. We also found a large difference by category in the correlation value between dwell time and length of news page. Specifically, political news had the highest correlation value and technology news had the lowest. In addition, we found that a user tends to get sufficient information about the news content from the news title within a short dwell time. |
Tasks | Recommendation Systems |
Published | 2019-08-23 |
URL | https://arxiv.org/abs/1908.08690v1 |
https://arxiv.org/pdf/1908.08690v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-user-dwell-time-by-category-in |
Repo | |
Framework | |
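The per-category relation between dwell time and page length reported above is a correlation value, standardly the Pearson coefficient. A minimal sketch with illustrative data (not the authors' analysis code):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-category data: page lengths vs. dwell times in seconds.
lengths = [200, 400, 600, 800]
dwell = [10.0, 21.0, 29.0, 41.0]
r = pearson(lengths, dwell)
```

Computing this coefficient separately per news category is what surfaces the reported contrast between political news (highest) and technology news (lowest).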
The accuracy vs. coverage trade-off in patient-facing diagnosis models
Title | The accuracy vs. coverage trade-off in patient-facing diagnosis models |
Authors | Anitha Kannan, Jason Alan Fries, Eric Kramer, Jen Jen Chen, Nigam Shah, Xavier Amatriain |
Abstract | A third of adults in America use the Internet to diagnose medical concerns, and online symptom checkers are increasingly part of this process. These tools are powered by diagnosis models similar to clinical decision support systems, with the primary difference being the coverage of symptoms and diagnoses. To be useful to patients and physicians, these models must have high accuracy while covering a meaningful space of symptoms and diagnoses. To the best of our knowledge, this paper is the first to study the trade-off between the coverage of the model and its performance for diagnosis. To this end, we learn diagnosis models with different coverage from EHR data. We find a 1% drop in top-3 accuracy for every 10 diseases added to the coverage. We also observe that model complexity does not affect performance, with linear models performing as well as neural networks. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.08041v1 |
https://arxiv.org/pdf/1912.08041v1.pdf | |
PWC | https://paperswithcode.com/paper/the-accuracy-vs-coverage-trade-off-in-patient |
Repo | |
Framework | |
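The top-3 accuracy metric underlying the reported trade-off simply asks whether the true diagnosis appears among the model's three highest-ranked candidates. A minimal sketch with hypothetical ranked predictions (not the authors' evaluation code):

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of cases whose true label appears in the top-k ranked list."""
    hits = sum(1 for preds, truth in zip(ranked_predictions, true_labels)
               if truth in preds[:k])
    return hits / len(true_labels)

# Two hypothetical cases: the first true diagnosis is ranked 3rd (a hit),
# the second does not appear in the top 3 (a miss).
preds = [["flu", "cold", "covid", "asthma"],
         ["asthma", "flu", "cold", "covid"]]
acc = top_k_accuracy(preds, ["covid", "migraine"], k=3)
```

Sweeping the number of covered diseases while recomputing this metric is how the reported roughly 1% drop per 10 added diseases would be measured.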
Application of Deep Learning in Generating Desired Design Options: Experiments Using Synthetic Training Dataset
Title | Application of Deep Learning in Generating Desired Design Options: Experiments Using Synthetic Training Dataset |
Authors | Zohreh Shaghaghian, Wei Yan |
Abstract | Most design methods follow a forward framework, asking for the primary specifications of a building to generate an output or assess its performance. However, architects often have specific objectives in mind while being uncertain of the proper design parameters. Deep Learning (DL) algorithms provide an intelligent workflow in which the system can learn from sequential training experiments. This study applies a method using DL algorithms toward generating desired design options. First, an object recognition problem is investigated to predict the labels of unseen sample images based on a training dataset consisting of different types of synthetic 2D shapes; later, a generative DL algorithm is trained to generate new shapes for given labels. In the next step, the algorithm is trained to generate a window/wall pattern for a desired light/shadow performance based on the spatial daylight autonomy (sDA) metric. The experiments show promising results both in predicting unseen sample shapes and in generating new design options. |
Tasks | Object Recognition |
Published | 2019-12-28 |
URL | https://arxiv.org/abs/2001.05849v1 |
https://arxiv.org/pdf/2001.05849v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-deep-learning-in-generating |
Repo | |
Framework | |
DISCO: Depth Inference from Stereo using Context
Title | DISCO: Depth Inference from Stereo using Context |
Authors | Kunal Swami, Kaushik Raghavan, Nikhilanj Pelluri, Rituparna Sarkar, Pankaj Bajpai |
Abstract | Recent deep learning based approaches have outperformed classical stereo matching methods. However, current deep learning based end-to-end stereo matching methods adopt a generic encoder-decoder style network with skip connections. To limit computational requirements, many networks perform excessive downsampling, which results in a significant loss of useful low-level information. Additionally, many network designs do not exploit the rich multi-scale contextual information. In this work, we address these problems by carefully designing the network architecture to preserve the required spatial information throughout the network, while at the same time achieving a large effective receptive field to extract multi-scale contextual information. For the first time, we create a synthetic disparity dataset reflecting real-life images captured using a smartphone; this enables us to obtain state-of-the-art results on common real-life images. The proposed model, DISCO, is pre-trained on the synthetic Scene Flow dataset and evaluated on popular benchmarks and our in-house dataset of challenging real-life images. The proposed model outperforms existing state-of-the-art methods in terms of qualitative as well as quantitative metrics. |
Tasks | Stereo Matching |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00050v1 |
https://arxiv.org/pdf/1906.00050v1.pdf | |
PWC | https://paperswithcode.com/paper/190600050 |
Repo | |
Framework | |
Design of Communication Systems using Deep Learning: A Variational Inference Perspective
Title | Design of Communication Systems using Deep Learning: A Variational Inference Perspective |
Authors | Vishnu Raj, Sheetal Kalyani |
Abstract | Recent research in the design of end-to-end communication systems using deep learning has produced models which can outperform traditional communication schemes. Most of these architectures leveraged autoencoders to design the encoder at the transmitter and decoder at the receiver and train them jointly by modeling transmit symbols as latent codes from the encoder. However, in communication systems, the receiver has to work with noise-corrupted versions of transmit symbols. Traditional autoencoders are not designed to work with latent codes corrupted with noise. In this work, we provide a framework to design end-to-end communication systems that accounts for the existence of noise-corrupted transmit symbols. The proposed method uses a deep neural architecture. An objective function for optimizing these models is derived based on the concepts of variational inference. Further, domain knowledge such as channel type can be systematically integrated into the objective. Through numerical simulation, the proposed method is shown to consistently produce models with better packing density, and to achieve it faster, in multiple popular channel models as compared to previous works leveraging deep learning models. |
Tasks | |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08559v3 |
https://arxiv.org/pdf/1904.08559v3.pdf | |
PWC | https://paperswithcode.com/paper/design-of-communication-systems-using-deep |
Repo | |
Framework | |
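The "noise-corrupted latent codes" setting can be pictured with the simplest classical channel model: map bits to antipodal symbols, add Gaussian noise, and decode by sign. This is standard BPSK over an AWGN channel, shown only to illustrate the setting; the paper's learned encoders and variational objective are far richer:

```python
import random

def transmit_bpsk(bits, noise_std, rng):
    """Map bits to +/-1 symbols, add Gaussian channel noise, decode by sign.

    The received values are noise-corrupted versions of the transmitted
    symbols, which is exactly the situation the receiver must handle.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    received = [s + rng.gauss(0.0, noise_std) for s in symbols]
    return [1 if r >= 0.0 else 0 for r in received]

# At low noise, sign decoding recovers the bits exactly.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = transmit_bpsk(bits, 0.1, random.Random(42))
```

A learned end-to-end system replaces both the fixed +/-1 mapping and the sign decoder with neural networks trained jointly, with the channel noise injected between them during training.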
On the Optimality of Trees Generated by ID3
Title | On the Optimality of Trees Generated by ID3 |
Authors | Alon Brutzkus, Amit Daniely, Eran Malach |
Abstract | Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees. However, its theoretical properties remain poorly understood. In this work, we introduce a novel metric of a decision tree algorithm’s performance, called mean iteration statistical consistency (MIC), which measures optimality of trees generated by ID3. As opposed to previous metrics, MIC can differentiate between different decision tree algorithms and compare their performance. We provide theoretical and empirical evidence that the TopDown variant of ID3, introduced by Kearns and Mansour (1996), has near-optimal MIC in various settings for learning read-once DNFs under product distributions. In contrast, another widely used variant of ID3 has MIC which is not near-optimal. We show that the MIC analysis predicts well the performance of these algorithms in practice. Our results present a novel view of decision tree algorithms which may lead to better and more practical guarantees for these algorithms. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05444v2 |
https://arxiv.org/pdf/1907.05444v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-optimality-of-trees-generated-by-id3 |
Repo | |
Framework | |
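The greedy step that the MIC analysis studies is ID3's choice of the feature maximizing information gain at each node. A minimal sketch of that criterion (illustrative; the TopDown variant of Kearns and Mansour generalizes the impurity function beyond entropy):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """ID3's splitting criterion: parent entropy minus the weighted
    entropy of the children induced by splitting on the feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(x[feature] for x in examples):
        subset = [y for x, y in zip(examples, labels) if x[feature] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# A feature that perfectly separates the labels has gain equal to the
# parent entropy (1 bit for a balanced binary labeling).
examples = [{"f": 0}, {"f": 0}, {"f": 1}, {"f": 1}]
g = information_gain(examples, [0, 0, 1, 1], "f")
```

ID3 applies this greedily at every node, which is precisely the behavior whose optimality (or lack of it) the MIC metric is designed to quantify.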
Revisiting Stochastic Extragradient
Title | Revisiting Stochastic Extragradient |
Authors | Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky |
Abstract | We fix a fundamental issue in the stochastic extragradient method by providing a new sampling strategy that is motivated by approximating implicit updates. Since the existing stochastic extragradient algorithm, Mirror-Prox (Juditsky et al., 2011), diverges on a simple bilinear problem when the domain is not bounded, we prove guarantees for solving variational inequalities that go beyond existing settings. Furthermore, we illustrate numerically that the proposed variant converges faster than many other methods on bilinear saddle-point problems. We also discuss how extragradient can be applied to training Generative Adversarial Networks (GANs) and how it compares to other methods. Our experiments on GANs demonstrate that the introduced approach may make the training faster in terms of data passes, while its higher iteration complexity makes the advantage smaller. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11373v2 |
https://arxiv.org/pdf/1905.11373v2.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-stochastic-extragradient |
Repo | |
Framework | |
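The bilinear saddle-point problem mentioned above, min over x and max over y of x*y, is exactly where plain simultaneous gradient steps spiral outward while the (deterministic) extragradient method converges. A minimal sketch with an illustrative step size, not the stochastic variant the paper analyzes:

```python
def extragradient_bilinear(x, y, step=0.1, iters=500):
    """Extragradient on min_x max_y x*y: take an extrapolation step,
    then update using the gradients at the extrapolated point."""
    for _ in range(iters):
        x_mid = x - step * y              # extrapolate (descent in x)
        y_mid = y + step * x              # extrapolate (ascent in y)
        x, y = x - step * y_mid, y + step * x_mid  # update at midpoint
    return x, y

# Starting away from the saddle (0, 0), iterates contract toward it.
x, y = extragradient_bilinear(1.0, 1.0)
```

For comparison, simultaneous gradient descent-ascent on this problem multiplies the iterate norm by sqrt(1 + step^2) every step, so it diverges for any positive step size; the extrapolation is what restores contraction.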
Learning to Ask Unanswerable Questions for Machine Reading Comprehension
Title | Learning to Ask Unanswerable Questions for Machine Reading Comprehension |
Authors | Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu |
Abstract | Machine reading comprehension with unanswerable questions is a challenging task. In this work, we propose a data augmentation technique that automatically generates relevant unanswerable questions from an answerable question paired with the paragraph that contains its answer. We introduce a pair-to-sequence model for unanswerable question generation, which effectively captures the interactions between the question and the paragraph. We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset. Experimental results show that the pair-to-sequence model performs consistently better than the sequence-to-sequence baseline. We further use the automatically generated unanswerable questions for data augmentation on the SQuAD 2.0 dataset, yielding a 1.9-point absolute F1 improvement with the BERT-base model and a 1.7-point absolute F1 improvement with the BERT-large model. |
Tasks | Data Augmentation, Machine Reading Comprehension, Question Generation, Reading Comprehension |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06045v1 |
https://arxiv.org/pdf/1906.06045v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-ask-unanswerable-questions-for |
Repo | |
Framework | |