April 3, 2020

Paper Group AWR 59

Beheshti-NER: Persian Named Entity Recognition Using BERT

Title Beheshti-NER: Persian Named Entity Recognition Using BERT
Authors Ehsan Taher, Seyed Abbas Hoseini, Mehrnoush Shamsfard
Abstract Named entity recognition is a natural language processing task that recognizes and extracts spans of text associated with named entities and classifies them into semantic categories. Google BERT is a deep bidirectional language model, pre-trained on large corpora, that can be fine-tuned to solve many NLP tasks such as question answering, named entity recognition, and part-of-speech tagging. In this paper, we use the pre-trained deep bidirectional network, BERT, to build a model for named entity recognition in Persian. We also compare the results of our model with the previous state-of-the-art results achieved on Persian NER. Our evaluation metric is the CoNLL 2003 score at two levels, word and phrase. This model achieved second place in the NSURL-2019 task 7 competition, which is associated with NER for the Persian language. Our results in this competition are 83.5 and 88.4 F1 CoNLL score in phrase-level and word-level evaluation, respectively.
Tasks Language Modelling, Named Entity Recognition, Part-Of-Speech Tagging, Question Answering
Published 2020-03-19
URL https://arxiv.org/abs/2003.08875v1
PDF https://arxiv.org/pdf/2003.08875v1.pdf
PWC https://paperswithcode.com/paper/beheshti-ner-persian-named-entity-recognition
Repo https://github.com/sEhsanTaher/Beheshti-NER
Framework none
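
The abstract reports F1 at two levels, word and phrase. The following is a minimal sketch of how the two can differ, assuming BIO tags, exact span-and-type matching at the phrase level, and per-token type matching at the word level. The function names and the toy tag sequences are illustrative assumptions, not the competition's scoring script.

```python
def extract_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans


def f1(num_correct, num_pred, num_gold):
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if num_correct == 0:
        return 0.0
    precision, recall = num_correct / num_pred, num_correct / num_gold
    return 2 * precision * recall / (precision + recall)


def phrase_f1(gold, pred):
    """A predicted entity counts only if its span and type both match exactly."""
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))


def word_f1(gold, pred):
    """Each non-O token counts on its own if its entity type matches."""
    g = {(i, t[2:]) for i, t in enumerate(gold) if t != "O"}
    p = {(i, t[2:]) for i, t in enumerate(pred) if t != "O"}
    return f1(len(g & p), len(p), len(g))


# Toy example: the prediction truncates the second entity by one token.
gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
phrase_score = phrase_f1(gold, pred)
word_score = word_f1(gold, pred)
```

In this toy case the truncated entity counts as fully wrong at the phrase level but mostly right at the word level, which is one reason a word-level score can be higher than a phrase-level one.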

Learnergy: Energy-based Machine Learners

Title Learnergy: Energy-based Machine Learners
Authors Mateus Roder, Gustavo Henrique de Rosa, João Paulo Papa
Abstract In recent years, machine learning techniques have been widely promoted in the context of deep learning architectures. One interesting algorithm, the Restricted Boltzmann Machine, relies on an energy-based and probabilistic nature to tackle diverse applications, such as classification, reconstruction, and generation of images and signals. Nevertheless, such models are not as well known as other deep learning techniques, e.g., Convolutional Neural Networks. This leads to a lack of research and implementations in the literature coping with the challenge of sufficiently comprehending these energy-based systems. Therefore, in this paper, we propose a Python-inspired framework in the context of energy-based architectures, denoted as Learnergy. Essentially, Learnergy is built upon PyTorch to provide a friendlier environment and a faster prototyping workspace, as well as the possibility of using CUDA computations to speed up computational time.
Published 2020-03-16
URL https://arxiv.org/abs/2003.07443v1
PDF https://arxiv.org/pdf/2003.07443v1.pdf
PWC https://paperswithcode.com/paper/learnergy-energy-based-machine-learners
Repo https://github.com/gugarosa/learnergy
Framework pytorch

ReZero is All You Need: Fast Convergence at Large Depth

Title ReZero is All You Need: Fast Convergence at Large Depth
Authors Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley
Abstract Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12 layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR 10.
Tasks Language Modelling
Published 2020-03-10
URL https://arxiv.org/abs/2003.04887v1
PDF https://arxiv.org/pdf/2003.04887v1.pdf
PWC https://paperswithcode.com/paper/rezero-is-all-you-need-fast-convergence-at
Repo https://github.com/fabio-deep/ReZero-ResNet
Framework pytorch
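
The identity initialization described in the abstract can be illustrated with a minimal sketch. It assumes the residual form x + alpha * F(x), with one scalar alpha per layer that starts at zero. The `tanh_block` function is a hypothetical stand-in for a real layer (attention or an MLP), and the value 0.1 for "trained" alphas is an arbitrary illustration, not a result from the paper.

```python
import math


def tanh_block(x):
    # Hypothetical stand-in for a layer's transformation F.
    return [math.tanh(2.0 * xi + 0.5) for xi in x]


def rezero_layer(x, f, alpha):
    """One residual step x + alpha * f(x), applied elementwise to a list."""
    return [xi + alpha * fi for xi, fi in zip(x, f(x))]


def forward(x, alphas):
    """Stack one residual layer per entry in alphas."""
    for alpha in alphas:
        x = rezero_layer(x, tanh_block, alpha)
    return x


x0 = [0.3, -1.2, 0.8]
depth = 128
out_at_init = forward(x0, [0.0] * depth)      # every alpha at its initial value
out_after_change = forward(x0, [0.1] * depth)  # alphas moved away from zero
```

With every alpha at zero, the 128-layer stack returns its input unchanged, so the signal passes through regardless of depth. Each layer contributes only once its alpha moves away from zero.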

Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits

Title Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits
Authors Michael Chesser, Asangi Jayatilaka, Renuka Visvanathan, Christophe Fumeaux, Alanson Sample, Damith C. Ranasinghe
Abstract Falls have serious consequences and are prevalent in acute hospitals and nursing homes caring for older people. Most falls occur in bedrooms and near the bed. Technological interventions to mitigate the risk of falling aim to automatically monitor bed-exit events and subsequently alert healthcare personnel to provide timely supervision. We observe that frequency-domain information related to patient activities exists predominantly in very low frequencies. Therefore, we recognise the potential to employ a low resolution acceleration sensing modality in contrast to powering and sensing with a conventional MEMS (Micro Electro Mechanical System) accelerometer. Consequently, we investigate a batteryless sensing modality with low cost wirelessly powered Radio Frequency Identification (RFID) technology with the potential for convenient integration into clothing, such as hospital gowns. We design and build a passive accelerometer-based RFID sensor embodiment, ID-Sensor, for our study. The sensor design allows deriving ultra low resolution acceleration data from the rate of change of unique RFID tag identifiers in accordance with the movement of a patient’s upper body. We investigate two convolutional neural network architectures for learning from raw RFID-only data streams and compare performance with a traditional shallow classifier with engineered features. We evaluate performance with 23 hospitalized older patients. We demonstrate, for the first time and to the best of our knowledge, that: i) the low resolution acceleration data embedded in the RF powered ID-Sensor data stream can provide a practicable method for activity recognition; and ii) highly discriminative features can be efficiently learned from the raw RFID-only data stream using a fully convolutional network architecture.
Tasks Activity Recognition
Published 2020-03-19
URL https://arxiv.org/abs/2003.08530v1
PDF https://arxiv.org/pdf/2003.08530v1.pdf
PWC https://paperswithcode.com/paper/super-low-resolution-rf-powered
Repo https://github.com/AdelaideAuto-IDLab/ID-Sensor
Framework tf

GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers

Title GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers
Authors Fabio Massimo Zanzotto, Viviana Bono, Paola Vocca, Andrea Santilli, Danilo Croce, Giorgio Gambosi, Roberto Basili
Abstract Creativity is one of the driving forces of humankind, as it allows us to break with current understanding and envision new ideas, which may revolutionize entire fields of knowledge. Scientific research offers a challenging environment in which to learn a model of the creative process. In fact, scientific research is a creative act in the formal setting of the scientific method, and this creative act is described in articles. In this paper, we dare to introduce the novel, scientifically and philosophically challenging task of Generating Abstracts of Scientific Papers from abstracts of cited papers (GASP) as a text-to-text task to investigate scientific creativity. To foster research in this novel, challenging task, we prepared a dataset by using services that solve the problem of copyright; hence, the dataset is publicly available with its standard split. Finally, we experimented with two vanilla summarization systems to start the analysis of the complexity of the GASP task.
Published 2020-02-28
URL https://arxiv.org/abs/2003.04996v1
PDF https://arxiv.org/pdf/2003.04996v1.pdf
PWC https://paperswithcode.com/paper/gasp-generating-abstracts-of-scientific
Repo https://github.com/ART-Group-it/GASP
Framework none

Adaptive Parameterization for Neural Dialogue Generation

Title Adaptive Parameterization for Neural Dialogue Generation
Authors Hengyi Cai, Hongshen Chen, Cheng Zhang, Yonghao Song, Xiaofang Zhao, Dawei Yin
Abstract Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generate generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations.
Tasks Dialogue Generation
Published 2020-01-18
URL https://arxiv.org/abs/2001.06626v1
PDF https://arxiv.org/pdf/2001.06626v1.pdf
PWC https://paperswithcode.com/paper/adaptive-parameterization-for-neural-dialogue-1
Repo https://github.com/hengyicai/AdaND
Framework pytorch

A Deep Generative Model for Fragment-Based Molecule Generation

Title A Deep Generative Model for Fragment-Based Molecule Generation
Authors Marco Podda, Davide Bacciu, Alessio Micheli
Abstract Molecule generation is a challenging open problem in cheminformatics. Currently, deep generative approaches addressing the challenge belong to two broad categories, differing in how molecules are represented. One approach encodes molecular graphs as strings of text, and learns their corresponding character-based language model. Another, more expressive, approach operates directly on the molecular graph. In this work, we address two limitations of the former: generation of invalid and duplicate molecules. To improve validity rates, we develop a language model for small molecular substructures called fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug Design. In other words, we generate molecules fragment by fragment, instead of atom by atom. To improve uniqueness rates, we present a frequency-based masking strategy that helps generate molecules with infrequent fragments. We show experimentally that our model largely outperforms other language model-based competitors, reaching state-of-the-art performances typical of graph-based approaches. Moreover, generated molecules display molecular properties similar to those in the training sample, even in absence of explicit task-specific supervision.
Tasks Language Modelling
Published 2020-02-28
URL https://arxiv.org/abs/2002.12826v1
PDF https://arxiv.org/pdf/2002.12826v1.pdf
PWC https://paperswithcode.com/paper/a-deep-generative-model-for-fragment-based
Repo https://github.com/marcopodda/fragment-based-dgm
Framework none

Self-Adaptive Training: beyond Empirical Risk Minimization

Title Self-Adaptive Training: beyond Empirical Risk Minimization
Authors Lang Huang, Chao Zhang, Hongyang Zhang
Abstract We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost, to improve generalization of deep learning for potentially corrupted training data. This problem is crucial for robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit the noise and thus suffers from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently discovered double-descent phenomenon in ERM, which might be a result of overfitting the noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at https://github.com/LayneH/self-adaptive-training.
Published 2020-02-24
URL https://arxiv.org/abs/2002.10319v1
PDF https://arxiv.org/pdf/2002.10319v1.pdf
PWC https://paperswithcode.com/paper/self-adaptive-training-beyond-empirical-risk
Repo https://github.com/LayneH/self-adaptive-training
Framework pytorch

Do We Need Zero Training Loss After Achieving Zero Training Error?

Title Do We Need Zero Training Loss After Achieving Zero Training Error?
Authors Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama
Abstract Overparameterized deep networks have the capacity to memorize training data with zero training error. Even after memorization, the training loss continues to approach zero, making the model overconfident and the test performance degraded. Since existing regularizers do not directly aim to avoid zero training loss, they often fail to maintain a moderate level of training loss, ending up with a too small or too large loss. We propose a direct solution called flooding that intentionally prevents further reduction of the training loss when it reaches a reasonably small value, which we call the flooding level. Our approach makes the loss float around the flooding level by doing mini-batched gradient descent as usual but gradient ascent if the training loss is below the flooding level. This can be implemented with one line of code, and is compatible with any stochastic optimizer and other regularizers. With flooding, the model will continue to “random walk” with the same non-zero training loss, and we expect it to drift into an area with a flat loss landscape that leads to better generalization. We experimentally show that flooding improves performance and as a byproduct, induces a double descent curve of the test loss.
Published 2020-02-20
URL https://arxiv.org/abs/2002.08709v1
PDF https://arxiv.org/pdf/2002.08709v1.pdf
PWC https://paperswithcode.com/paper/do-we-need-zero-training-loss-after-achieving
Repo https://github.com/takashiishida/flooding
Framework pytorch
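
The flooding rule in the abstract (descend as usual, but ascend when the training loss is below the flooding level) can be sketched on a toy problem. The sketch assumes the flooded objective |loss - b| + b, whose gradient equals the ordinary gradient above b and its negation below b. The quadratic loss, learning rate, and flooding level 0.04 are illustrative assumptions, not values from the paper.

```python
def flooded_loss(loss, flood_level):
    """Flooded objective: equal to the loss above the level, mirrored below it."""
    return abs(loss - flood_level) + flood_level


def train(w, flood_level, lr=0.1, steps=200):
    """Minimize the toy loss w**2 with flooding; return the raw loss per step."""
    losses = []
    for _ in range(steps):
        loss = w * w
        grad = 2.0 * w
        # Gradient of the flooded objective: descent above the level, ascent below.
        direction = 1.0 if loss > flood_level else -1.0
        w -= lr * direction * grad
        losses.append(w * w)
    return losses


flooded = train(w=1.0, flood_level=0.04)
plain = train(w=1.0, flood_level=0.0)  # level 0 reduces to ordinary descent
```

In this toy run the flooded loss keeps hovering near the flooding level, while the unflooded run drives the loss toward zero.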

Deep Autotuner: a Pitch Correcting Network for Singing Performances

Title Deep Autotuner: a Pitch Correcting Network for Singing Performances
Authors Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim
Abstract We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.
Published 2020-02-12
URL https://arxiv.org/abs/2002.05511v1
PDF https://arxiv.org/pdf/2002.05511v1.pdf
PWC https://paperswithcode.com/paper/deep-autotuner-a-pitch-correcting-network-for
Repo https://github.com/sannawag/autotuner
Framework none
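
For contrast with the continuous-pitch approach in the abstract, here is a minimal sketch of the baseline it mentions: mapping a sung frequency to the closest of the twelve equal-tempered pitches. It assumes A4 = 440 Hz as the reference tuning, and the function names are illustrative. This is not the paper's model, which predicts shifts from the spectrograms of the singing and the accompaniment.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def nearest_semitone_shift_cents(freq_hz):
    """Shift, in cents, that moves freq_hz onto the nearest equal-tempered pitch."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / A4_HZ)
    nearest = round(semitones_from_a4)
    return 100.0 * (nearest - semitones_from_a4)


def apply_shift(freq_hz, cents):
    """Apply a pitch shift in cents (1200 cents per octave)."""
    return freq_hz * 2.0 ** (cents / 1200.0)


sung_hz = 450.0  # slightly sharp of A4
shift = nearest_semitone_shift_cents(sung_hz)
corrected_hz = apply_shift(sung_hz, shift)
```

Snapping of this kind always pulls a note onto the grid, so it would also remove an intentional bend or slide, which is the limitation the abstract's continuous treatment of pitch aims to avoid.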

Block-wise Scrambled Image Recognition Using Adaptation Network

Title Block-wise Scrambled Image Recognition Using Adaptation Network
Authors Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa
Abstract In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is proposed to recognize those scrambled images. Experimental comparisons conducted using CIFAR datasets demonstrated that the proposed adaptation network performed well in incorporating simple perceptual information hiding into DNN-based image classification.
Tasks Image Classification, Object Recognition
Published 2020-01-21
URL https://arxiv.org/abs/2001.07761v1
PDF https://arxiv.org/pdf/2001.07761v1.pdf
PWC https://paperswithcode.com/paper/block-wise-scrambled-image-recognition-using
Repo https://github.com/MADONOKOUKI/Block-wise-Scrambled-Image-Recognition
Framework pytorch
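
One simple form of block-wise scrambling is to permute pixel positions inside each fixed-size block using a keyed permutation. The sketch below assumes that form, with one permutation shared by all blocks and a seed acting as the key. The paper's exact scrambling scheme may differ, and the function name, block size, and toy image are illustrative.

```python
import random


def scramble_blocks(image, block, seed):
    """Permute pixel positions inside every block x block tile of a 2D list."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    # One keyed permutation of within-block positions, shared by all tiles.
    perm = list(range(block * block))
    random.Random(seed).shuffle(perm)
    out = [row[:] for row in image]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            tile = [image[by + i // block][bx + i % block]
                    for i in range(block * block)]
            for dst, src in enumerate(perm):
                out[by + dst // block][bx + dst % block] = tile[src]
    return out


# Toy 8x8 "image" whose pixel values are just their row-major indices.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
scrambled = scramble_blocks(image, block=4, seed=1234)
```

Each block keeps exactly the same pixel values, so block-level statistics survive while the spatial layout inside the block is hidden from anyone without the key.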

iDLG: Improved Deep Leakage from Gradients

Title iDLG: Improved Deep Leakage from Gradients
Authors Bo Zhao, Konda Reddy Mopuri, Hakan Bilen
Abstract It is widely believed that sharing gradients will not leak private training data in distributed learning systems such as Collaborative Learning and Federated Learning, etc. Recently, Zhu et al. presented an approach which shows the possibility to obtain private training data from the publicly shared gradients. In their Deep Leakage from Gradient (DLG) method, they synthesize the dummy data and corresponding labels with the supervision of shared gradients. However, DLG has difficulty in convergence and discovering the ground-truth labels consistently. In this paper, we find that sharing gradients definitely leaks the ground-truth labels. We propose a simple but reliable approach to extract accurate data from the gradients. Particularly, our approach can certainly extract the ground-truth labels as opposed to DLG, hence we name it Improved DLG (iDLG). Our approach is valid for any differentiable model trained with cross-entropy loss over one-hot labels. We mathematically illustrate how our method can extract ground-truth labels from the gradients and empirically demonstrate the advantages over DLG.
Published 2020-01-08
URL https://arxiv.org/abs/2001.02610v1
PDF https://arxiv.org/pdf/2001.02610v1.pdf
PWC https://paperswithcode.com/paper/idlg-improved-deep-leakage-from-gradients
Repo https://github.com/PatrickZH/Improved-Deep-Leakage-from-Gradients
Framework pytorch
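
The claim that gradients reveal the ground-truth label follows from a standard identity: for softmax cross-entropy with a one-hot label, the gradient with respect to the logits is softmax(z) minus the one-hot vector, so only the true class has a negative entry. The sketch below checks this directly on the logit gradient, which is a simplification. How the paper reads the sign from the shared weight gradients is not shown here, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_logits(logits, label):
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - onehot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def infer_label(grad):
    """Recover the label as the index of the single negative gradient entry."""
    negatives = [k for k, g in enumerate(grad) if g < 0]
    if len(negatives) != 1:
        raise ValueError("expected exactly one negative entry")
    return negatives[0]


logits = [2.0, -1.0, 0.5, 3.0]
recovered = [infer_label(grad_wrt_logits(logits, y)) for y in range(len(logits))]
```

The recovery does not depend on the model predicting the right class: the true index is negative because its probability is below one, and every other entry is a positive probability.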

Understanding Self-Training for Gradual Domain Adaptation

Title Understanding Self-Training for Gradual Domain Adaptation
Authors Ananya Kumar, Tengyu Ma, Percy Liang
Abstract Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces. We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain. We prove the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error. The theoretical analysis leads to algorithmic insights, highlighting that regularization and label sharpening are essential even when we have infinite data, and suggesting that self-training works particularly well for shifts with small Wasserstein-infinity distance. Leveraging the gradual shift structure leads to higher accuracies on a rotating MNIST dataset and a realistic Portraits dataset.
Tasks Domain Adaptation
Published 2020-02-26
URL https://arxiv.org/abs/2002.11361v1
PDF https://arxiv.org/pdf/2002.11361v1.pdf
PWC https://paperswithcode.com/paper/understanding-self-training-for-gradual
Repo https://github.com/p-lambda/gradual_domain_adaptation
Framework tf
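
The difference between gradual and direct adaptation can be sketched with a toy one-dimensional problem. It assumes a nearest-centroid classifier, hard pseudo-labels, and two classes whose positions shift by 0.5 per intermediate domain. All names and numbers are illustrative assumptions, not the paper's experimental setup.

```python
def make_domain(shift):
    """Two 1-D classes centered at shift - 1 and shift + 1."""
    offsets = [-1.2, -1.0, -0.8, 0.8, 1.0, 1.2]
    return [shift + o for o in offsets], [0, 0, 0, 1, 1, 1]


def predict(centroids, x):
    """Label of the nearest class centroid."""
    return min((0, 1), key=lambda c: abs(x - centroids[c]))


def self_train(centroids, points):
    """Pseudo-label unlabeled points, then refit each centroid to its group."""
    pseudo = [predict(centroids, x) for x in points]
    updated = list(centroids)
    for c in (0, 1):
        members = [x for x, y in zip(points, pseudo) if y == c]
        if members:  # keep the old centroid if no point was assigned to it
            updated[c] = sum(members) / len(members)
    return updated


def accuracy(centroids, points, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(points, labels))
    return hits / len(points)


source_centroids = [-1.0, 1.0]  # classifier fitted on the source domain
target_points, target_labels = make_domain(3.0)

# Direct adaptation: self-train on the far-away target in one jump.
direct = self_train(source_centroids, target_points)

# Gradual adaptation: self-train through intermediate domains in order.
gradual = source_centroids
for step in range(1, 7):
    points, _ = make_domain(0.5 * step)  # labels are never used
    gradual = self_train(gradual, points)

direct_acc = accuracy(direct, target_points, target_labels)
gradual_acc = accuracy(gradual, target_points, target_labels)
```

In this toy setting the direct jump assigns every target point to one class, while the gradual path keeps the pseudo-labels correct at each small step and ends with both classes separated.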

Hierarchically Decoupled Imitation for Morphological Transfer

Title Hierarchically Decoupled Imitation for Morphological Transfer
Authors Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto
Abstract Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning. For such tasks, we argue that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. To this end, we propose a hierarchical decoupling of policies into two parts: an independently learned low-level policy and a transferable high-level policy. To remedy poor transfer performance due to mismatch in morphologies, we contribute two key ideas. First, we show that incentivizing a complex agent’s low-level to imitate a simpler agent’s low-level significantly improves zero-shot high-level transfer. Second, we show that KL-regularized training of the high level stabilizes learning and prevents mode-collapse. Finally, on a suite of publicly released navigation and manipulation environments, we demonstrate the applicability of hierarchical transfer on long-range tasks across morphologies. Our code and videos can be found at https://sites.google.com/berkeley.edu/morphology-transfer.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01709v1
PDF https://arxiv.org/pdf/2003.01709v1.pdf
PWC https://paperswithcode.com/paper/hierarchically-decoupled-imitation-for
Repo https://github.com/jhejna/hierarchical_morphology_transfer
Framework tf

A Benchmark for Systematic Generalization in Grounded Language Understanding

Title A Benchmark for Systematic Generalization in Grounded Language Understanding
Authors Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, Brenden M. Lake
Abstract Human language users easily interpret expressions that describe unfamiliar situations composed from familiar parts (“greet the pink brontosaurus by the ferris wheel”). Modern neural networks, by contrast, struggle to interpret compositions unseen in training. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in models of situated language understanding. We take inspiration from standard models of meaning composition in formal linguistics. Going beyond an earlier related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world. This allows us to build novel generalization tasks that probe the acquisition of linguistically motivated rules. For example, agents must understand how adjectives such as ‘small’ are interpreted relative to the current world state or how adverbs such as ‘cautiously’ combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05161v1
PDF https://arxiv.org/pdf/2003.05161v1.pdf
PWC https://paperswithcode.com/paper/a-benchmark-for-systematic-generalization-in
Repo https://github.com/LauraRuis/multimodal_seq2seq_gSCAN
Framework pytorch