Paper Group AWR 59
Beheshti-NER: Persian Named Entity Recognition Using BERT
Title | Beheshti-NER: Persian Named Entity Recognition Using BERT |
Authors | Ehsan Taher, Seyed Abbas Hoseini, Mehrnoush Shamsfard |
Abstract | Named entity recognition is a natural language processing task to recognize and extract spans of text associated with named entities and classify them into semantic categories. Google BERT is a deep bidirectional language model, pre-trained on large corpora, that can be fine-tuned to solve many NLP tasks such as question answering, named entity recognition, and part-of-speech tagging. In this paper, we use the pre-trained deep bidirectional network BERT to build a model for named entity recognition in Persian. We also compare our results with the previous state-of-the-art results achieved on Persian NER. Our evaluation metric is the CONLL 2003 score at the word and phrase levels. This model achieved second place in the NSURL-2019 task 7 competition, which was associated with NER for the Persian language. Our results in this competition are 83.5 and 88.4 F1 CONLL scores in phrase-level and word-level evaluation, respectively. |
Tasks | Language Modelling, Named Entity Recognition, Part-Of-Speech Tagging, Question Answering |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08875v1 |
https://arxiv.org/pdf/2003.08875v1.pdf | |
PWC | https://paperswithcode.com/paper/beheshti-ner-persian-named-entity-recognition |
Repo | https://github.com/sEhsanTaher/Beheshti-NER |
Framework | none |
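The abstract reports CoNLL scores at two levels: word level (per-token tag accuracy over entity tokens) and phrase level (an entity counts only if its full span and type match). A minimal sketch of that distinction on hypothetical BIO tag sequences (not the authors' evaluation code):

```python
# Sketch of word-level vs phrase-level (exact entity span) NER scoring on
# BIO tags. The tag sequences below are hypothetical examples.
def spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start, typ = [], None, None
    for i, t in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if start is not None and (t == "O" or t.startswith("B-")):
            out.append((start, i, typ))
            start, typ = None, None
        if t.startswith("B-"):
            start, typ = i, t[2:]
    return out

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]

# Word level: every non-O token tag is scored independently.
word_hits = sum(g == p != "O" for g, p in zip(gold, pred))
# Phrase level: an entity counts only when its full span and type match.
phrase_hits = len(set(spans(gold)) & set(spans(pred)))
```

Here the partially matched PER entity earns word-level credit but no phrase-level credit, which is why phrase-level F1 (83.5) is lower than word-level F1 (88.4).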
Learnergy: Energy-based Machine Learners
Title | Learnergy: Energy-based Machine Learners |
Authors | Mateus Roder, Gustavo Henrique de Rosa, João Paulo Papa |
Abstract | Throughout recent years, machine learning techniques have been broadly encouraged in the context of deep learning architectures. An interesting algorithm denoted the Restricted Boltzmann Machine relies on an energy- and probability-based nature to tackle the most diverse applications, such as classification, reconstruction, and generation of images and signals. Nevertheless, these models are not as renowned as other well-known deep learning techniques, e.g., Convolutional Neural Networks. This promotes a lack of research and implementations in the literature, making it hard to sufficiently comprehend these energy-based systems. Therefore, in this paper, we propose a Python-inspired framework for energy-based architectures, denoted Learnergy. Essentially, Learnergy is built upon PyTorch to provide a friendlier environment and a faster prototyping workspace, as well as the possibility of using CUDA computations to speed up computational time. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07443v1 |
https://arxiv.org/pdf/2003.07443v1.pdf | |
PWC | https://paperswithcode.com/paper/learnergy-energy-based-machine-learners |
Repo | https://github.com/gugarosa/learnergy |
Framework | pytorch |
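The Restricted Boltzmann Machine at the core of such frameworks is defined by a joint energy over visible and hidden units. A minimal numpy sketch of that energy and one Gibbs-sampling step, illustrative of the underlying math rather than Learnergy's API:

```python
import numpy as np

# Toy RBM: the energy E(v, h) = -a.v - b.h - v^T W h, and the conditional
# p(h_j = 1 | v) used for block Gibbs sampling. Shapes are illustrative.
rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(scale=0.1, size=(n_v, n_h))   # visible-hidden weights
a = np.zeros(n_v)                            # visible biases
b = np.zeros(n_h)                            # hidden biases

def energy(v, h):
    """Joint energy of a visible/hidden configuration."""
    return -a @ v - b @ h - v @ W @ h

def sample_h(v):
    """One Gibbs step: p(h_j = 1 | v) = sigmoid(b_j + v . W[:, j])."""
    p = 1.0 / (1.0 + np.exp(-(b + v @ W)))
    return (rng.random(n_h) < p).astype(float), p

v = rng.integers(0, 2, n_v).astype(float)
h, p_h = sample_h(v)
```

Training (e.g., contrastive divergence) lowers the energy of observed data relative to samples from the model, which is what makes these "energy-based" learners.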
ReZero is All You Need: Fast Convergence at Large Depth
Title | ReZero is All You Need: Fast Convergence at Large Depth |
Authors | Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley |
Abstract | Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12-layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR-10. |
Tasks | Language Modelling |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04887v1 |
https://arxiv.org/pdf/2003.04887v1.pdf | |
PWC | https://paperswithcode.com/paper/rezero-is-all-you-need-fast-convergence-at |
Repo | https://github.com/fabio-deep/ReZero-ResNet |
Framework | pytorch |
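The architectural change is a one-line residual update, x_{i+1} = x_i + alpha_i * F(x_i), with the per-layer scalar alpha_i initialized to zero so every layer starts as the identity map. A minimal numpy sketch, where the sublayer F is a toy stand-in for attention or an MLP:

```python
import numpy as np

# ReZero residual block: a single learned scalar gates the sublayer output,
# starting at 0 so the block is the identity at initialization.
class ReZeroBlock:
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(dim, dim)) / np.sqrt(dim)
        self.alpha = 0.0              # the single learned gate, initialized to 0

    def f(self, x):                   # toy stand-in for attention/MLP sublayer
        return np.tanh(x @ self.W)

    def forward(self, x):
        return x + self.alpha * self.f(x)

x = np.ones(8)
block = ReZeroBlock(8)
y0 = block.forward(x)                 # exactly the input: identity at init
block.alpha = 0.1                     # during training, alpha moves away from 0
y1 = block.forward(x)
```

Because a stack of such blocks computes the identity at initialization, signal magnitudes are preserved regardless of depth, which is what enables training networks over a hundred layers.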
Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits
Title | Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits |
Authors | Michael Chesser, Asangi Jayatilaka, Renuka Visvanathan, Christophe Fumeaux, Alanson Sample, Damith C. Ranasinghe |
Abstract | Falls have serious consequences and are prevalent in acute hospitals and nursing homes caring for older people. Most falls occur in bedrooms and near the bed. Technological interventions to mitigate the risk of falling aim to automatically monitor bed-exit events and subsequently alert healthcare personnel to provide timely supervision. We observe that frequency-domain information related to patient activities exists predominantly in very low frequencies. Therefore, we recognise the potential to employ a low resolution acceleration sensing modality in contrast to powering and sensing with a conventional MEMS (Micro Electro Mechanical System) accelerometer. Consequently, we investigate a batteryless sensing modality with low cost wirelessly powered Radio Frequency Identification (RFID) technology with the potential for convenient integration into clothing, such as hospital gowns. We design and build a passive accelerometer-based RFID sensor embodiment, ID-Sensor, for our study. The sensor design allows deriving ultra low resolution acceleration data from the rate of change of unique RFID tag identifiers in accordance with the movement of a patient's upper body. We investigate two convolutional neural network architectures for learning from raw RFID-only data streams and compare performance with a traditional shallow classifier with engineered features. We evaluate performance with 23 hospitalized older patients. We demonstrate, for the first time and to the best of our knowledge, that: i) the low resolution acceleration data embedded in the RF powered ID-Sensor data stream can provide a practicable method for activity recognition; and ii) highly discriminative features can be efficiently learned from the raw RFID-only data stream using a fully convolutional network architecture. |
Tasks | Activity Recognition |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08530v1 |
https://arxiv.org/pdf/2003.08530v1.pdf | |
PWC | https://paperswithcode.com/paper/super-low-resolution-rf-powered |
Repo | https://github.com/AdelaideAuto-IDLab/ID-Sensor |
Framework | tf |
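The key trick is deriving a coarse motion signal from the rate of change of observed tag identifiers rather than from a conventional accelerometer. A toy sketch of that idea, counting tag-ID transitions per time window over a hypothetical (timestamp, tag) stream; the paper's pipeline feeds such streams into convolutional networks:

```python
import math

# Hypothetical RFID read stream: (timestamp in seconds, tag identifier).
# More ID changes per window indicate more movement of the tagged garment.
stream = [(0.1, "A"), (0.4, "A"), (0.9, "B"), (1.2, "C"), (1.3, "B"), (2.1, "B")]

def changes_per_window(stream, width=1.0):
    """Count tag-identifier transitions in each fixed-width time window."""
    end = max(t for t, _ in stream)
    n = math.ceil(end / width)
    counts = [0] * n
    prev = None
    for t, tag in stream:
        idx = min(int(t / width), n - 1)
        if prev is not None and tag != prev:
            counts[idx] += 1
        prev = tag
    return counts

signal = changes_per_window(stream)
```

This yields an ultra-low-resolution activity signal (here, one value per second) without any battery-powered sensing.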
GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers
Title | GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers |
Authors | Fabio Massimo Zanzotto, Viviana Bono, Paola Vocca, Andrea Santilli, Danilo Croce, Giorgio Gambosi, Roberto Basili |
Abstract | Creativity is one of the driving forces of humankind, as it allows us to break current understanding and envision new ideas that may revolutionize entire fields of knowledge. Scientific research offers a challenging environment in which to learn a model of the creative process. In fact, scientific research is a creative act within the formal settings of the scientific method, and this creative act is described in articles. In this paper, we dare to introduce the novel, scientifically and philosophically challenging task of Generating Abstracts of Scientific Papers from abstracts of cited papers (GASP) as a text-to-text task to investigate scientific creativity. To foster research in this novel, challenging task, we prepared a dataset by using services that solve the problem of copyright; hence, the dataset is publicly available with its standard split. Finally, we experimented with two vanilla summarization systems to start the analysis of the complexity of the GASP task. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.04996v1 |
https://arxiv.org/pdf/2003.04996v1.pdf | |
PWC | https://paperswithcode.com/paper/gasp-generating-abstracts-of-scientific |
Repo | https://github.com/ART-Group-it/GASP |
Framework | none |
Adaptive Parameterization for Neural Dialogue Generation
Title | Adaptive Parameterization for Neural Dialogue Generation |
Authors | Hengyi Cai, Hongshen Chen, Cheng Zhang, Yonghao Song, Xiaofang Zhao, Dawei Yin |
Abstract | Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generate generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations. |
Tasks | Dialogue Generation |
Published | 2020-01-18 |
URL | https://arxiv.org/abs/2001.06626v1 |
https://arxiv.org/pdf/2001.06626v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-parameterization-for-neural-dialogue-1 |
Repo | https://github.com/hengyicai/AdaND |
Framework | pytorch |
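The context-aware parameterization amounts to a hypernetwork: a small module that maps a context embedding to the weights of another layer, so each conversation gets its own parameters. A minimal numpy sketch; all names and shapes are illustrative, not AdaND's actual architecture:

```python
import numpy as np

# Toy hypernetwork: a linear map from a context embedding to a flattened
# weight matrix, reshaped into conversation-specific layer parameters.
rng = np.random.default_rng(0)
ctx_dim, in_dim, out_dim = 4, 3, 2
H = rng.normal(scale=0.1, size=(ctx_dim, in_dim * out_dim))  # hypernetwork weights

def generate_params(context):
    """Produce a (in_dim, out_dim) weight matrix conditioned on the context."""
    return (context @ H).reshape(in_dim, out_dim)

ctx_a = rng.normal(size=ctx_dim)       # embedding of one conversation context
ctx_b = rng.normal(size=ctx_dim)       # a different conversation
W_a = generate_params(ctx_a)           # conversation-specific weights
W_b = generate_params(ctx_b)
```

Different contexts yield different generated weights, which is how the model escapes the single-parameter-set bottleneck described in the abstract; the topic-aware variant conditions on inferred latent topics instead of the raw context.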
A Deep Generative Model for Fragment-Based Molecule Generation
Title | A Deep Generative Model for Fragment-Based Molecule Generation |
Authors | Marco Podda, Davide Bacciu, Alessio Micheli |
Abstract | Molecule generation is a challenging open problem in cheminformatics. Currently, deep generative approaches addressing the challenge belong to two broad categories, differing in how molecules are represented. One approach encodes molecular graphs as strings of text, and learns their corresponding character-based language model. Another, more expressive, approach operates directly on the molecular graph. In this work, we address two limitations of the former: generation of invalid and duplicate molecules. To improve validity rates, we develop a language model for small molecular substructures called fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug Design. In other words, we generate molecules fragment by fragment, instead of atom by atom. To improve uniqueness rates, we present a frequency-based masking strategy that helps generate molecules with infrequent fragments. We show experimentally that our model largely outperforms other language model-based competitors, reaching state-of-the-art performances typical of graph-based approaches. Moreover, generated molecules display molecular properties similar to those in the training sample, even in absence of explicit task-specific supervision. |
Tasks | Language Modelling |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12826v1 |
https://arxiv.org/pdf/2002.12826v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-generative-model-for-fragment-based |
Repo | https://github.com/marcopodda/fragment-based-dgm |
Framework | none |
Self-Adaptive Training: beyond Empirical Risk Minimization
Title | Self-Adaptive Training: beyond Empirical Risk Minimization |
Authors | Lang Huang, Chao Zhang, Hongyang Zhang |
Abstract | We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost, to improve generalization of deep learning for potentially corrupted training data. This problem is crucial for robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit noise and thus suffers from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently-discovered double-descent phenomenon in ERM, which might be a result of overfitting of noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at \url{https://github.com/LayneH/self-adaptive-training}. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10319v1 |
https://arxiv.org/pdf/2002.10319v1.pdf | |
PWC | https://paperswithcode.com/paper/self-adaptive-training-beyond-empirical-risk |
Repo | https://github.com/LayneH/self-adaptive-training |
Framework | pytorch |
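The label-correction mechanism can be sketched as an exponential moving average: each example keeps a soft target that blends its given (possibly noisy) label with the model's own running predictions, so consistent predictions gradually override a wrong label. The momentum value and update form below are illustrative, not the authors' exact scheme:

```python
import numpy as np

# EMA-style soft-target update: the target drifts from the given label
# toward what the model consistently predicts.
def update_target(target, prediction, momentum=0.9):
    return momentum * target + (1 - momentum) * prediction

noisy_onehot = np.array([1.0, 0.0, 0.0])   # given label, possibly wrong
target = noisy_onehot.copy()
model_pred = np.array([0.1, 0.85, 0.05])   # model consistently prefers class 1
for _ in range(50):                        # repeated epochs of training
    target = update_target(target, model_pred)
```

After enough updates the target's argmax follows the model's confident prediction rather than the noisy label, while remaining a valid probability vector, with no extra forward or backward passes required.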
Do We Need Zero Training Loss After Achieving Zero Training Error?
Title | Do We Need Zero Training Loss After Achieving Zero Training Error? |
Authors | Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama |
Abstract | Overparameterized deep networks have the capacity to memorize training data with zero training error. Even after memorization, the training loss continues to approach zero, making the model overconfident and degrading test performance. Since existing regularizers do not directly aim to avoid zero training loss, they often fail to maintain a moderate level of training loss, ending up with a too small or too large loss. We propose a direct solution called flooding that intentionally prevents further reduction of the training loss when it reaches a reasonably small value, which we call the flooding level. Our approach makes the loss float around the flooding level by doing mini-batched gradient descent as usual but gradient ascent if the training loss is below the flooding level. This can be implemented with one line of code, and is compatible with any stochastic optimizer and other regularizers. With flooding, the model will continue to "random walk" with the same non-zero training loss, and we expect it to drift into an area with a flat loss landscape that leads to better generalization. We experimentally show that flooding improves performance and as a byproduct, induces a double descent curve of the test loss. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08709v1 |
https://arxiv.org/pdf/2002.08709v1.pdf | |
PWC | https://paperswithcode.com/paper/do-we-need-zero-training-loss-after-achieving |
Repo | https://github.com/takashiishida/flooding |
Framework | pytorch |
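The one-line implementation mentioned in the abstract reflects the loss around the flooding level b: above b the loss is unchanged, and below b its gradient flips sign, turning descent into ascent. A minimal sketch:

```python
# Flooding: once the training loss falls below the flooding level b,
# minimizing the flooded loss performs gradient ascent on the original loss.
def flooded(loss, b):
    return abs(loss - b) + b

b = 0.25                       # flooding level (a hyperparameter)
high = flooded(1.0, b)         # above b: unchanged, gradient descent as usual
low = flooded(0.0625, b)       # below b: reflected, so the gradient flips sign
```

In a framework like PyTorch this is typically written as `loss = (loss - b).abs() + b` before calling `backward()`, which is why it composes with any optimizer and regularizer.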
Deep Autotuner: a Pitch Correcting Network for Singing Performances
Title | Deep Autotuner: a Pitch Correcting Network for Singing Performances |
Authors | Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim |
Abstract | We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05511v1 |
https://arxiv.org/pdf/2002.05511v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-autotuner-a-pitch-correcting-network-for |
Repo | https://github.com/sannawag/autotuner |
Framework | none |
Block-wise Scrambled Image Recognition Using Adaptation Network
Title | Block-wise Scrambled Image Recognition Using Adaptation Network |
Authors | Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa |
Abstract | In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is proposed to recognize those scrambled images. Experimental comparisons conducted using CIFAR datasets demonstrated that the proposed adaptation network performed well in incorporating simple perceptual information hiding into DNN-based image classification. |
Tasks | Image Classification, Object Recognition |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07761v1 |
https://arxiv.org/pdf/2001.07761v1.pdf | |
PWC | https://paperswithcode.com/paper/block-wise-scrambled-image-recognition-using |
Repo | https://github.com/MADONOKOUKI/Block-wise-Scrambled-Image-Recognition |
Framework | pytorch |
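The perceptual hiding step can be sketched as splitting the image into fixed-size blocks and permuting them with a secret key. Block size and key handling below are illustrative; the paper's scheme also transforms pixels within blocks, and the adaptation network then learns to classify such scrambled inputs:

```python
import numpy as np

# Block-wise scrambling: cut the image into block x block tiles and
# rearrange them according to a keyed pseudorandom permutation.
def scramble(img, block=8, seed=42):
    h, w = img.shape[:2]
    tiles = [img[i:i + block, j:j + block]
             for i in range(0, h, block)
             for j in range(0, w, block)]
    perm = np.random.default_rng(seed).permutation(len(tiles))
    out = np.zeros_like(img)
    k = 0
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = tiles[perm[k]]
            k += 1
    return out

img = np.arange(32 * 32).reshape(32, 32)   # toy single-channel "image"
enc = scramble(img)
```

The pixel content is preserved (only rearranged), and the same seed always yields the same scrambling, so a key holder can invert it while a third party sees only shuffled blocks.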
iDLG: Improved Deep Leakage from Gradients
Title | iDLG: Improved Deep Leakage from Gradients |
Authors | Bo Zhao, Konda Reddy Mopuri, Hakan Bilen |
Abstract | It is widely believed that sharing gradients will not leak private training data in distributed learning systems such as Collaborative Learning and Federated Learning. Recently, Zhu et al. presented an approach which shows the possibility of obtaining private training data from the publicly shared gradients. In their Deep Leakage from Gradients (DLG) method, they synthesize dummy data and corresponding labels under the supervision of shared gradients. However, DLG has difficulty converging and discovering the ground-truth labels consistently. In this paper, we find that sharing gradients definitely leaks the ground-truth labels. We propose a simple but reliable approach to extract accurate data from the gradients. In particular, our approach can certainly extract the ground-truth labels, as opposed to DLG; hence we name it Improved DLG (iDLG). Our approach is valid for any differentiable model trained with cross-entropy loss over one-hot labels. We mathematically illustrate how our method can extract ground-truth labels from the gradients and empirically demonstrate the advantages over DLG. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02610v1 |
https://arxiv.org/pdf/2001.02610v1.pdf | |
PWC | https://paperswithcode.com/paper/idlg-improved-deep-leakage-from-gradients |
Repo | https://github.com/PatrickZH/Improved-Deep-Leakage-from-Gradients |
Framework | pytorch |
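The core observation can be shown in a few lines: for cross-entropy over one-hot labels, the gradient with respect to the last layer's bias is softmax(z) - onehot(y), so the single negative entry reveals the ground-truth label. A toy numpy sketch with made-up logits, not the paper's code:

```python
import numpy as np

# Label leakage from gradients: only the ground-truth class has a negative
# entry in the bias gradient of the final (logits) layer.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5, 1.0])    # logits for 4 classes (toy values)
y = 2                                   # ground-truth label
grad_b = softmax(z) - np.eye(4)[y]      # dL/db for cross-entropy loss

leaked = int(np.argmin(grad_b))         # the negative entry identifies y
```

Since softmax probabilities are strictly positive and less than one, grad_b[y] = p_y - 1 is the only negative component, so an eavesdropper holding the shared gradient recovers the label with certainty, without any optimization.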
Understanding Self-Training for Gradual Domain Adaptation
Title | Understanding Self-Training for Gradual Domain Adaptation |
Authors | Ananya Kumar, Tengyu Ma, Percy Liang |
Abstract | Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces. We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain. We prove the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error. The theoretical analysis leads to algorithmic insights, highlighting that regularization and label sharpening are essential even when we have infinite data, and suggesting that self-training works particularly well for shifts with small Wasserstein-infinity distance. Leveraging the gradual shift structure leads to higher accuracies on a rotating MNIST dataset and a realistic Portraits dataset. |
Tasks | Domain Adaptation |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11361v1 |
https://arxiv.org/pdf/2002.11361v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-self-training-for-gradual |
Repo | https://github.com/p-lambda/gradual_domain_adaptation |
Framework | tf |
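The gradual self-training loop can be sketched with a nearest-centroid classifier on a noise-free rotating toy distribution: fit on the source, then repeatedly pseudo-label the next slightly shifted domain and refit. The data and classifier are illustrative, not the paper's experiments:

```python
import numpy as np

# Two class means rotate from angle 0 to 90 degrees; self-training on each
# intermediate domain lets the centroids track the shift.
def pseudo_label(points, c0, c1):
    d0 = np.linalg.norm(points - c0, axis=1)
    d1 = np.linalg.norm(points - c1, axis=1)
    return (d1 < d0).astype(int)       # 0 if nearer c0, 1 if nearer c1

def means(angle):
    m = np.array([np.cos(angle), np.sin(angle)])
    return np.stack([m, -m])           # class 0 mean and class 1 mean

c0, c1 = means(0.0)                    # classifier fit on the source domain
for angle in np.linspace(0, np.pi / 2, 7)[1:]:   # six 15-degree shifts
    points = means(angle)              # unlabeled intermediate-domain data
    labels = pseudo_label(points, c0, c1)
    c0 = points[labels == 0].mean(axis=0)
    c1 = points[labels == 1].mean(axis=0)
```

Because each 15-degree step keeps the shifted class means closest to their previous centroids, every pseudo-label is correct and the classifier ends up aligned with the 90-degree target domain, whereas jumping straight from source to target would leave the two classes equidistant from both centroids.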
Hierarchically Decoupled Imitation for Morphological Transfer
Title | Hierarchically Decoupled Imitation for Morphological Transfer |
Authors | Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto |
Abstract | Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning. For such tasks, we argue that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. To this end, we propose a hierarchical decoupling of policies into two parts: an independently learned low-level policy and a transferable high-level policy. To remedy poor transfer performance due to mismatch in morphologies, we contribute two key ideas. First, we show that incentivizing a complex agent’s low-level to imitate a simpler agent’s low-level significantly improves zero-shot high-level transfer. Second, we show that KL-regularized training of the high level stabilizes learning and prevents mode-collapse. Finally, on a suite of publicly released navigation and manipulation environments, we demonstrate the applicability of hierarchical transfer on long-range tasks across morphologies. Our code and videos can be found at https://sites.google.com/berkeley.edu/morphology-transfer. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01709v1 |
https://arxiv.org/pdf/2003.01709v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchically-decoupled-imitation-for |
Repo | https://github.com/jhejna/hierarchical_morphology_transfer |
Framework | tf |
A Benchmark for Systematic Generalization in Grounded Language Understanding
Title | A Benchmark for Systematic Generalization in Grounded Language Understanding |
Authors | Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, Brenden M. Lake |
Abstract | Human language users easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret compositions unseen in training. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in models of situated language understanding. We take inspiration from standard models of meaning composition in formal linguistics. Going beyond an earlier related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world. This allows us to build novel generalization tasks that probe the acquisition of linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method, finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05161v1 |
https://arxiv.org/pdf/2003.05161v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-for-systematic-generalization-in |
Repo | https://github.com/LauraRuis/multimodal_seq2seq_gSCAN |
Framework | pytorch |