January 26, 2020


Paper Group ANR 1357

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. Continual Learning Using World Models for Pseudo-Rehearsal. Layer Dynamics of Linearised Neural Nets. Locally Differentially Private Minimum Finding. Optimizing Stochastic Gradient Descent in Text Classification Based on Fine-Tuning Hyper-Parameters Approach. A Case …

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems


Title	SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Authors	Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
Abstract	In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
Tasks	Transfer Learning
Published	2019-05-02
URL	https://arxiv.org/abs/1905.00537v3
PDF	https://arxiv.org/pdf/1905.00537v3.pdf
PWC	https://paperswithcode.com/paper/superglue-a-stickier-benchmark-for-general
Repo
Framework

Continual Learning Using World Models for Pseudo-Rehearsal


Title	Continual Learning Using World Models for Pseudo-Rehearsal
Authors	Nicholas Ketz, Soheil Kolouri, Praveen Pilly
Abstract	The utility of learning a dynamics/world model of the environment in reinforcement learning has been shown in many ways. When using neural networks, however, these models suffer catastrophic forgetting when learned in a lifelong or continual fashion. Current solutions to the continual learning problem require experience to be segmented and labeled as discrete tasks; in continuous experience, however, it is generally unclear what a sufficient segmentation of tasks would be. Here we propose a method to continually learn these internal world models through the interleaving of internally generated episodes of past experiences (i.e., pseudo-rehearsal). We show this method can sequentially learn unsupervised temporal prediction, without task labels, in a disparate set of Atari games. Empirically, this interleaving of the internally generated rollouts with the external environment’s observations leads to a consistent reduction in temporal prediction loss compared to non-interleaved learning and is preserved over repeated random exposures to various tasks. Similarly, using a network distillation approach, we show that modern policy gradient based reinforcement learning algorithms can use this internal model to continually learn to optimize reward based on the world model’s representation of the environment.
Tasks	Atari Games, Continual Learning
Published	2019-03-06
URL	https://arxiv.org/abs/1903.02647v2
PDF	https://arxiv.org/pdf/1903.02647v2.pdf
PWC	https://paperswithcode.com/paper/using-world-models-for-pseudo-rehearsal-in
Repo
Framework

Layer Dynamics of Linearised Neural Nets


Title	Layer Dynamics of Linearised Neural Nets
Authors	Saurav Basu, Koyel Mukherjee, Shrihari Vasudevan
Abstract	Despite the phenomenal success of deep learning in recent years, there remains a gap in understanding the fundamental mechanics of neural nets. More research is focussed on handcrafting complex and larger networks, and the design decisions are often ad-hoc and based on intuition. Some recent research has aimed to demystify the learning dynamics in neural nets by attempting to build a theory from first principles, such as characterising the non-linear dynamics of specialised linear deep neural nets (such as orthogonal networks). In this work, we expand and derive properties of learning dynamics respected by general multi-layer linear neural nets. Although an over-parameterisation of a single layer linear network, linear multi-layer neural nets offer interesting insights that explain how learning dynamics proceed in small pockets of the data space. We show in particular that multiple layers in linear nets grow at approximately the same rate, and there are distinct phases of learning with markedly different layer growth. We then apply a linearisation process to a general ReLU neural net and show how nonlinearity breaks down the growth symmetry observed in linear neural nets. Overall, our work can be viewed as an initial step in building a theory for understanding the effect of layer design on the learning dynamics from first principles.
Tasks
Published	2019-04-24
URL	http://arxiv.org/abs/1904.10689v1
PDF	http://arxiv.org/pdf/1904.10689v1.pdf
PWC	https://paperswithcode.com/paper/layer-dynamics-of-linearised-neural-nets
Repo
Framework
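
The claim that layers of a linear net grow at roughly the same rate can be illustrated on the smallest possible case. The sketch below is not the paper's derivation: it assumes a scalar two-layer linear net with loss 0.5 * (w2 * w1 - target)^2 and plain gradient descent, and all names and numbers are hypothetical. Under gradient flow, w1^2 and w2^2 change at the same rate, so their difference stays constant; with a small discrete step size it drifts only slightly.

```python
def train_two_layer_scalar(w1, w2, target, lr=0.01, steps=5000):
    """Gradient descent on 0.5 * (w2 * w1 - target)**2 for a scalar two-layer linear net."""
    for _ in range(steps):
        err = w2 * w1 - target   # residual of the end-to-end linear map
        grad_w1 = err * w2       # dLoss/dw1
        grad_w2 = err * w1       # dLoss/dw2
        # Simultaneous update of both layers.
        w1, w2 = w1 - lr * grad_w1, w2 - lr * grad_w2
    return w1, w2


# Hypothetical initial weights and target, chosen only for illustration.
w1_init, w2_init = 0.5, 0.3
w1_final, w2_final = train_two_layer_scalar(w1_init, w2_init, target=2.0)

gap_before = w1_init ** 2 - w2_init ** 2
gap_after = w1_final ** 2 - w2_final ** 2
print("product:", w1_final * w2_final)
print("w1^2 - w2^2 before/after:", gap_before, gap_after)
```

Both squared weights grow by nearly the same amount during training, which is why the gap between them barely moves even though each weight changes substantially.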

Locally Differentially Private Minimum Finding


Title	Locally Differentially Private Minimum Finding
Authors	Kazuto Fukuchi, Chia-Mu Yu, Arashi Haishima, Jun Sakuma
Abstract	We investigate a problem of finding the minimum, in which each user has a real value and we want to estimate the minimum of these values under the local differential privacy constraint. We reveal that this problem is fundamentally difficult, and we cannot construct a mechanism that is consistent in the worst case. Instead of considering the worst case, we aim to construct a private mechanism whose error rate is adaptive to the easiness of estimation of the minimum. As a measure of easiness, we introduce a parameter $\alpha$ that characterizes the fatness of the minimum-side tail of the user data distribution. As a result, we reveal that the mechanism can achieve $O((\ln^6N/\epsilon^2N)^{1/2\alpha})$ error without knowledge of $\alpha$ and the error rate is near-optimal in the sense that any mechanism incurs $\Omega((1/\epsilon^2N)^{1/2\alpha})$ error. Furthermore, we demonstrate that our mechanism outperforms a naive mechanism by empirical evaluations on synthetic datasets. Also, we conducted experiments on the MovieLens dataset and a purchase history dataset and demonstrate that our algorithm achieves $\tilde{O}((1/N)^{1/2\alpha})$ error adaptively to $\alpha$.
Tasks
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11067v1
PDF	https://arxiv.org/pdf/1905.11067v1.pdf
PWC	https://paperswithcode.com/paper/locally-differentially-private-minimum
Repo
Framework
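
For readers unfamiliar with local differential privacy, the sketch below shows a standard building block, binary randomized response, used to estimate what fraction of users hold a value at or below a threshold. It is not the mechanism proposed in the paper; the function names, the threshold, and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction_below(values, threshold, epsilon, rng):
    """Debiased estimate of the fraction of values <= threshold from noisy bits."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    reports = [randomized_response(int(v <= threshold), epsilon, rng) for v in values]
    mean_report = sum(reports) / len(reports)
    # E[report] = (1 - p) + f * (2p - 1), so invert for f.
    return (mean_report - (1.0 - p)) / (2.0 * p - 1.0)


# Synthetic user data (hypothetical): uniform values in [0, 1).
rng = random.Random(0)
user_values = [rng.random() for _ in range(20000)]
estimate = estimate_fraction_below(user_values, threshold=0.25, epsilon=1.0, rng=rng)
print("estimated fraction below 0.25:", estimate)
```

Each user only ever releases one noisy bit, so the server never sees a raw value. The estimate becomes noisier as the true fraction shrinks relative to the noise, which gives some intuition for why locating the minimum privately is hard.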

Optimizing Stochastic Gradient Descent in Text Classification Based on Fine-Tuning Hyper-Parameters Approach. A Case Study on Automatic Classification of Global Terrorist Attacks


Title	Optimizing Stochastic Gradient Descent in Text Classification Based on Fine-Tuning Hyper-Parameters Approach. A Case Study on Automatic Classification of Global Terrorist Attacks
Authors	Shadi Diab
Abstract	The objective of this research is to enhance the performance of the Stochastic Gradient Descent (SGD) algorithm in text classification. In our research, we proposed using SGD learning with a grid-search approach to fine-tune hyper-parameters in order to enhance the performance of SGD classification. We explored different settings for representation, transformation and weighting features from the summary description of terrorist attack incidents obtained from the Global Terrorism Database as a pre-classification step, and validated SGD learning on Support Vector Machine (SVM), Logistic Regression and Perceptron classifiers by stratified 10-fold cross-validation to compare the performance of different classifiers embedded in the SGD algorithm. The research concludes that using a grid search to find the hyper-parameters optimizes SGD classification, not only in the pre-classification settings but also in the performance of the classifiers in terms of accuracy and execution time.
Tasks	Text Classification
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06542v2
PDF	http://arxiv.org/pdf/1902.06542v2.pdf
PWC	https://paperswithcode.com/paper/optimizing-stochastic-gradient-descent-in
Repo
Framework
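
The general idea of tuning SGD hyper-parameters by grid search with cross-validation can be sketched in a few lines. This is a minimal standard-library illustration, not the author's pipeline: it assumes a logistic-regression loss, a toy two-dimensional dataset instead of text features, 5 folds instead of 10, and a hypothetical grid over learning rate (`lr`) and L2 penalty (`alpha`).

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sgd_logistic(X, y, lr, alpha, epochs, rng):
    """Fit logistic regression with per-sample SGD and an L2 penalty."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(z) - y[i]
            w = [wj - lr * (err * xj + alpha * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def accuracy(w, b, X, y):
    hits = 0
    for xi, yi in zip(X, y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        hits += int(predicted == yi)
    return hits / len(y)


def cross_val_score(X, y, lr, alpha, k=5, epochs=20, seed=0):
    """Mean accuracy over k folds; fold membership is index modulo k."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        w, b = sgd_logistic([X[i] for i in train], [y[i] for i in train],
                            lr, alpha, epochs, random.Random(seed))
        scores.append(accuracy(w, b, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


def grid_search(X, y, grid):
    """Return (best mean CV accuracy, best parameter dict) over the grid."""
    best = None
    for lr, alpha in itertools.product(grid["lr"], grid["alpha"]):
        score = cross_val_score(X, y, lr, alpha)
        if best is None or score > best[0]:
            best = (score, {"lr": lr, "alpha": alpha})
    return best


# Toy data (hypothetical): labels alternate, so each modulo-5 fold is class-balanced.
data_rng = random.Random(42)
X, y = [], []
for i in range(60):
    label = i % 2
    center = 2.0 if label else -2.0
    X.append([data_rng.gauss(center, 0.5), data_rng.gauss(center, 0.5)])
    y.append(label)

best_score, best_params = grid_search(X, y, {"lr": [0.01, 0.1], "alpha": [0.0, 0.01]})
print(best_score, best_params)
```

In practice a library routine would normally replace the hand-written loops; the point here is only that every grid point is scored by cross-validation and the best-scoring combination is kept.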

Sampling-Free Learning of Bayesian Quantized Neural Networks


Title	Sampling-Free Learning of Bayesian Quantized Neural Networks
Authors	Jiahao Su, Milan Cvitkovic, Furong Huang
Abstract	Bayesian learning of model parameters in neural networks is important in scenarios where estimates with well-calibrated uncertainty are important. In this paper, we propose Bayesian quantized networks (BQNs), quantized neural networks (QNNs) for which we learn a posterior distribution over their discrete parameters. We provide a set of efficient algorithms for learning and prediction in BQNs without the need to sample from their parameters or activations, which not only allows for differentiable learning in QNNs, but also reduces the variance in gradients. We evaluate BQNs on MNIST, Fashion-MNIST, KMNIST and CIFAR10 image classification datasets, compared against bootstrap ensemble of QNNs (E-QNN). We demonstrate BQNs achieve both lower predictive errors and better-calibrated uncertainties than E-QNN (with less than 20% of the negative log-likelihood).
Tasks	Image Classification
Published	2019-12-06
URL	https://arxiv.org/abs/1912.02992v1
PDF	https://arxiv.org/pdf/1912.02992v1.pdf
PWC	https://paperswithcode.com/paper/sampling-free-learning-of-bayesian-quantized-1
Repo
Framework

SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis


Title	SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis
Authors	Pavel Efimov, Leonid Boytsov, Pavel Braslavski
Abstract	SberQuAD, a large-scale analog of Stanford SQuAD in the Russian language, is a valuable resource that has not been properly presented to the scientific community. We fill this gap by providing a description, a thorough analysis, and baseline experimental results.
Tasks	Reading Comprehension
Published	2019-12-20
URL	https://arxiv.org/abs/1912.09723v2
PDF	https://arxiv.org/pdf/1912.09723v2.pdf
PWC	https://paperswithcode.com/paper/sberquad-russian-reading-comprehension
Repo
Framework

Entity-Consistent End-to-end Task-Oriented Dialogue System with KB Retriever


Title	Entity-Consistent End-to-end Task-Oriented Dialogue System with KB Retriever
Authors	Libo Qin, Yijia Liu, Wanxiang Che, Haoyang Wen, Yangming Li, Ting Liu
Abstract	Querying the knowledge base (KB) has long been a challenge in the end-to-end task-oriented dialogue system. Previous sequence-to-sequence (Seq2Seq) dialogue generation work treats the KB query as an attention over the entire KB, without the guarantee that the generated entities are consistent with each other. In this paper, we propose a novel framework which queries the KB in two steps to improve the consistency of generated entities. In the first step, inspired by the observation that a response can usually be supported by a single KB row, we introduce a KB retrieval component which explicitly returns the most relevant KB row given a dialogue history. The retrieval result is further used to filter the irrelevant entities in a Seq2Seq response generation model to improve the consistency among the output entities. In the second step, we further apply an attention mechanism to address the most correlated KB column. Two methods are proposed to make the training feasible without labeled retrieval data, namely distant supervision and the Gumbel-Softmax technique. Experiments on two publicly available task-oriented dialog datasets show the effectiveness of our model by outperforming the baseline systems and producing entity-consistent responses.
Tasks	Dialogue Generation
Published	2019-09-15
URL	https://arxiv.org/abs/1909.06762v2
PDF	https://arxiv.org/pdf/1909.06762v2.pdf
PWC	https://paperswithcode.com/paper/entity-consistent-end-to-end-task-oriented
Repo
Framework
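
The abstract mentions the Gumbel-Softmax technique as one way to train a discrete row-selection step without retrieval labels. The sketch below shows the generic Gumbel-Softmax relaxation only; it is not the authors' retriever, and the scores, temperature, and function name are assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a soft one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical relevance scores for three KB rows.
row_scores = [2.0, 0.5, -1.0]
row_weights = gumbel_softmax(row_scores, temperature=0.5, rng=random.Random(0))
print(row_weights)
```

Lower temperatures push the weights closer to a one-hot choice of a single row, while higher temperatures spread the weight more evenly. In an automatic-differentiation framework the same expression stays differentiable with respect to the scores.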

Designovel’s system description for Fashion-IQ challenge 2019


Title	Designovel’s system description for Fashion-IQ challenge 2019
Authors	Jianri Li, Jae-whan Lee, Woo-sang Song, Ki-young Shin, Byung-hyun Go
Abstract	This paper describes Designovel’s systems which are submitted to the Fashion IQ Challenge 2019. The goal of the challenge is to build an image retrieval system where the input query is a candidate image plus two text phrases that describe the user’s feedback about visual differences between the candidate image and the search target. We built the systems by combining methods from recent work on deep metric learning, multi-modal retrieval and natural language processing. First, we encode both candidate and target images with CNNs into high-level representations, and encode text descriptions to a single text vector using a Transformer-based encoder. Then we compose the candidate image vector and text representation into a single vector which is expected to be biased toward the target image vector. Finally, we compute cosine similarities between the composed vector and the encoded vectors of the whole dataset, and rank them in descending order to get a ranked list. We experimented with the Fashion IQ 2019 dataset in various hyperparameter settings and achieved 39.12% average recall with a single model and 43.67% average recall with an ensemble of 16 models on the test dataset.
Tasks	Image Retrieval, Metric Learning
Published	2019-10-21
URL	https://arxiv.org/abs/1910.11119v1
PDF	https://arxiv.org/pdf/1910.11119v1.pdf
PWC	https://paperswithcode.com/paper/designovels-system-description-for-fashion-iq
Repo
Framework
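
The final ranking step described in the abstract (cosine similarity between a composed query vector and every encoded image, sorted in descending order) can be sketched as follows. The vectors and item names are made up for illustration; the actual encoders and the composition function are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, gallery):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical encoded gallery images and a hypothetical composed query vector.
gallery = {
    "dress_a": [1.0, 0.0, 0.0],
    "dress_b": [0.6, 0.8, 0.0],
    "dress_c": [0.0, 0.0, 1.0],
}
composed_query = [0.5, 0.5, 0.0]
ranking = rank_by_cosine(composed_query, gallery)
print(ranking)
```

Recall at k is then the fraction of queries whose true target appears among the first k entries of such a ranking.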

Lung segmentation on chest x-ray images in patients with severe abnormal findings using deep learning


Title	Lung segmentation on chest x-ray images in patients with severe abnormal findings using deep learning
Authors	Mizuho Nishio, Koji Fujimoto, Kaori Togashi
Abstract	Rationale and objectives: Several studies have evaluated the usefulness of deep learning for lung segmentation using chest x-ray (CXR) images with small- or medium-sized abnormal findings. Here, we built a database including both CXR images with severe abnormalities and experts’ lung segmentation results, and aimed to evaluate our network’s efficacy in lung segmentation from these images. Materials and Methods: For lung segmentation, CXR images from the Japanese Society of Radiological Technology (JSRT, N = 247) and Montgomery databases (N = 138) were included, and 65 additional images depicting severe abnormalities from a public database were evaluated and annotated by a radiologist, thereby adding lung segmentation results to these images. Baseline U-net was used to segment the lungs in images from the three databases. Subsequently, the U-net network architecture was automatically optimized for lung segmentation from CXR images using Bayesian optimization. Dice similarity coefficient (DSC) was calculated to confirm segmentation. Results: Our results demonstrated that using baseline U-net yielded poorer lung segmentation results in our database than those in the JSRT and Montgomery databases, implying that robust segmentation of lungs may be difficult because of severe abnormalities. The DSC values with baseline U-net for the JSRT, Montgomery and our databases were 0.979, 0.941, and 0.889, respectively, and with optimized U-net, 0.976, 0.973, and 0.932, respectively. Conclusion: For robust lung segmentation, the U-net architecture was optimized via Bayesian optimization, and our results demonstrate that the optimized U-net was more robust than baseline U-net in lung segmentation from CXR images with large-sized abnormalities.
Tasks
Published	2019-08-21
URL	https://arxiv.org/abs/1908.07704v1
PDF	https://arxiv.org/pdf/1908.07704v1.pdf
PWC	https://paperswithcode.com/paper/lung-segmentation-on-chest-x-ray-images-in
Repo
Framework
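
The Dice similarity coefficient reported above is defined as twice the overlap between the predicted and reference masks divided by the sum of their sizes. A minimal sketch for binary masks stored as nested lists follows; the tiny masks are invented for illustration and have nothing to do with the study's data.

```python
def dice_coefficient(predicted_mask, reference_mask):
    """Dice = 2 * |P intersect R| / (|P| + |R|) for binary masks of equal shape."""
    predicted = [pixel for row in predicted_mask for pixel in row]
    reference = [pixel for row in reference_mask for pixel in row]
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # convention used here: two empty masks agree perfectly
    return 2.0 * overlap / total


# Hypothetical 2x3 masks: 1 marks a lung pixel, 0 marks background.
predicted_mask = [[0, 1, 1],
                  [0, 1, 0]]
reference_mask = [[0, 1, 1],
                  [0, 0, 0]]
dice_score = dice_coefficient(predicted_mask, reference_mask)
print(dice_score)
```

Here two pixels overlap, the prediction has three foreground pixels and the reference has two, so the score is 2 * 2 / (3 + 2) = 0.8.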

Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning


Title	Deep Semantic Parsing of Freehand Sketches with Homogeneous Transformation, Soft-Weighted Loss, and Staged Learning
Authors	Ying Zheng, Hongxun Yao, Xiaoshuai Sun
Abstract	In this paper, we propose a novel deep framework for part-level semantic parsing of freehand sketches, which makes three main contributions that are experimentally shown to have substantial practical merit. First, we introduce a new idea named homogeneous transformation to address the problem of domain adaptation. For the task of sketch parsing, there is no available data of labeled freehand sketches that can be directly used for model training. An alternative solution is to learn from the existing parsing data of real images, while the domain adaptation is an inevitable problem. Unlike existing methods that utilize the edge maps of real images to approximate freehand sketches, the proposed homogeneous transformation method transforms the data from two different domains into a homogeneous space to minimize the semantic gap. Second, we design a soft-weighted loss function as guidance for the training process, which gives attention to both the ambiguous label boundary and class imbalance. Third, we present a staged learning strategy to improve the parsing performance of the trained model, which takes advantage of the shared information and specific characteristic from different sketch categories. Extensive experimental results demonstrate the effectiveness of these methods. Specifically, to evaluate the generalization ability of our homogeneous transformation method, additional experiments at the task of sketch-based image retrieval are conducted on the QMUL FG-SBIR dataset. By integrating the proposed three methods into a unified framework, our final deep semantic sketch parsing (DeepSSP) model achieves the state-of-the-art on the public SketchParse dataset.
Tasks	Domain Adaptation, Image Retrieval, Semantic Parsing, Sketch-Based Image Retrieval
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06023v1
PDF	https://arxiv.org/pdf/1910.06023v1.pdf
PWC	https://paperswithcode.com/paper/deep-semantic-parsing-of-freehand-sketches
Repo
Framework

Modelling Semantic Categories using Conceptual Neighborhood


Title	Modelling Semantic Categories using Conceptual Neighborhood
Authors	Zied Bouraoui, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
Abstract	While many methods for learning vector space embeddings have been proposed in the field of Natural Language Processing, these methods typically do not distinguish between categories and individuals. Intuitively, if individuals are represented as vectors, we can think of categories as (soft) regions in the embedding space. Unfortunately, meaningful regions can be difficult to estimate, especially since we often have few examples of individuals that belong to a given category. To address this issue, we rely on the fact that different categories are often highly interdependent. In particular, categories often have conceptual neighbors, which are disjoint from but closely related to the given category (e.g., fruit and vegetable). Our hypothesis is that more accurate category representations can be learned by relying on the assumption that the regions representing such conceptual neighbors should be adjacent in the embedding space. We propose a simple method for identifying conceptual neighbors and then show that incorporating these conceptual neighbors indeed leads to more accurate region-based representations.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01220v1
PDF	https://arxiv.org/pdf/1912.01220v1.pdf
PWC	https://paperswithcode.com/paper/modelling-semantic-categories-using
Repo
Framework

TraffickCam: Explainable Image Matching For Sex Trafficking Investigations


Title	TraffickCam: Explainable Image Matching For Sex Trafficking Investigations
Authors	Abby Stylianou, Richard Souvenir, Robert Pless
Abstract	Investigations of sex trafficking sometimes have access to photographs of victims in hotel rooms. These images directly link victims to places, which can help verify where victims have been trafficked or where traffickers might operate in the future. Current machine learning approaches give promising results in image search to find the matching hotel. This paper explores approaches to make this end-to-end system better support government and law enforcement requirements, including improved performance, visualization approaches that explain what parts of the image led to a match, and infrastructure to support exporting the results of a query.
Tasks	Image Retrieval
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03455v1
PDF	https://arxiv.org/pdf/1910.03455v1.pdf
PWC	https://paperswithcode.com/paper/traffickcam-explainable-image-matching-for
Repo
Framework

TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir


Title	TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir
Authors	Tao B. Schardl, Siddharth Samsi
Abstract	This work introduces TapirXLA, a replacement for TensorFlow’s XLA compiler that embeds recursive fork-join parallelism into XLA’s low-level representation of code. Machine-learning applications rely on efficient parallel processing to achieve performance, and they employ a variety of technologies to improve performance, including compiler technology. But compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to lose performance by missing optimizations on parallel computation. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir’s representation of fork-join parallelism. TapirXLA also exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir’s representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11338v1
PDF	https://arxiv.org/pdf/1908.11338v1.pdf
PWC	https://paperswithcode.com/paper/tapirxla-embedding-fork-join-parallelism-into
Repo
Framework
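
Speedups across several benchmarks are summarized above with a geometric mean, which is the usual choice for averaging ratios. The sketch below shows the arithmetic on invented timings; the numbers are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical running times in seconds for four benchmarks.
baseline_seconds = [4.0, 9.0, 2.0, 8.0]
optimized_seconds = [2.0, 6.0, 1.6, 5.0]

# Per-benchmark speedup is baseline time divided by optimized time.
speedups = [b / o for b, o in zip(baseline_seconds, optimized_seconds)]
mean_speedup = geometric_mean(speedups)
print(speedups, mean_speedup)
```

Unlike the arithmetic mean, the geometric mean gives the same summary whether ratios are expressed as speedups or as slowdowns, so a single large ratio does not dominate the result.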

Attention-Gated Graph Convolutions for Extracting Drug Interaction Information from Drug Labels


Title	Attention-Gated Graph Convolutions for Extracting Drug Interaction Information from Drug Labels
Authors	Tung Tran, Ramakanth Kavuluru, Halil Kilicoglu
Abstract	Preventable adverse events as a result of medical errors present a growing concern in the healthcare system. As drug-drug interactions (DDIs) may lead to preventable adverse events, being able to extract DDIs from drug labels into a machine-processable form is an important step toward effective dissemination of drug safety information. In this study, we tackle the problem of jointly extracting drugs and their interactions, including interaction outcome, from drug labels. Our deep learning approach entails composing various intermediate representations including sequence and graph based context, where the latter is derived using graph convolutions (GC) with a novel attention-based gating mechanism (holistically called GCA). These representations are then composed in meaningful ways to handle all subtasks jointly. To overcome scarcity in training data, we additionally propose transfer learning by pre-training on related DDI data. Our model is trained and evaluated on the 2018 TAC DDI corpus. Our GCA model in conjunction with transfer learning performs at 39.20% F1 and 26.09% F1 on entity recognition (ER) and relation extraction (RE) respectively on the first official test set and at 45.30% F1 and 27.87% F1 on ER and RE respectively on the second official test set corresponding to an improvement over our prior best results by up to 6 absolute F1 points. After controlling for available training data, our model exhibits state-of-the-art performance by improving over the next comparable best outcome by roughly three F1 points in ER and 1.5 F1 points in RE evaluation across two official test sets.
Tasks	Relation Extraction, Transfer Learning
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12419v2
PDF	https://arxiv.org/pdf/1910.12419v2.pdf
PWC	https://paperswithcode.com/paper/attention-gated-graph-convolution-for
Repo
Framework