Paper Group AWR 59
Beheshti-NER: Persian Named Entity Recognition Using BERT
Title | Beheshti-NER: Persian Named Entity Recognition Using BERT |
Authors | Ehsan Taher, Seyed Abbas Hoseini, Mehrnoush Shamsfard |
Abstract | Named entity recognition is a natural language processing task to recognize and extract spans of text associated with named entities and classify them into semantic categories. Google BERT is a deep bidirectional language model, pre-trained on large corpora, that can be fine-tuned to solve many NLP tasks such as question answering, named entity recognition, and part-of-speech tagging. In this paper, we use the pre-trained deep bidirectional network BERT to build a model for named entity recognition in Persian. We also compare our results with the previous state-of-the-art results achieved on Persian NER. Our evaluation metric is the CONLL 2003 score at the word and phrase levels. This model achieved second place in the NSURL-2019 task 7 competition, which was associated with NER for the Persian language. Our results in this competition are 83.5 and 88.4 F1 CONLL scores in phrase-level and word-level evaluation, respectively. |
Tasks | Language Modelling, Named Entity Recognition, Part-Of-Speech Tagging, Question Answering |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08875v1 |
https://arxiv.org/pdf/2003.08875v1.pdf | |
PWC | https://paperswithcode.com/paper/beheshti-ner-persian-named-entity-recognition |
Repo | https://github.com/sEhsanTaher/Beheshti-NER |
Framework | none |
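The abstract reports CoNLL scores at two levels: word level (per-token tag accuracy over entity tokens) and phrase level (an entity counts only if its full span and type match). A minimal sketch of that distinction on hypothetical BIO tag sequences (not the authors' evaluation code):

```python
# Sketch of word-level vs phrase-level (exact entity span) NER scoring on
# BIO tags. The tag sequences below are hypothetical examples.
def spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start, typ = [], None, None
    for i, t in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if start is not None and (t == "O" or t.startswith("B-")):
            out.append((start, i, typ))
            start, typ = None, None
        if t.startswith("B-"):
            start, typ = i, t[2:]
    return out

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]

# Word level: every non-O token tag is scored independently.
word_hits = sum(g == p != "O" for g, p in zip(gold, pred))
# Phrase level: an entity counts only when its full span and type match.
phrase_hits = len(set(spans(gold)) & set(spans(pred)))
```

Here the partially matched PER entity earns word-level credit but no phrase-level credit, which is why phrase-level F1 (83.5) is lower than word-level F1 (88.4).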
Learnergy: Energy-based Machine Learners
Title | Learnergy: Energy-based Machine Learners |
Authors | Mateus Roder, Gustavo Henrique de Rosa, João Paulo Papa |
Abstract | Throughout recent years, machine learning techniques have been broadly encouraged in the context of deep learning architectures. An interesting algorithm denoted the Restricted Boltzmann Machine relies on an energy- and probability-based nature to tackle the most diverse applications, such as classification, reconstruction, and generation of images and signals. Nevertheless, these models are not as renowned as other well-known deep learning techniques, e.g., Convolutional Neural Networks. This promotes a lack of research and implementations in the literature, making it hard to sufficiently comprehend these energy-based systems. Therefore, in this paper, we propose a Python-inspired framework for energy-based architectures, denoted Learnergy. Essentially, Learnergy is built upon PyTorch to provide a friendlier environment and a faster prototyping workspace, as well as the possibility of using CUDA computations to speed up computational time. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07443v1 |
https://arxiv.org/pdf/2003.07443v1.pdf | |
PWC | https://paperswithcode.com/paper/learnergy-energy-based-machine-learners |
Repo | https://github.com/gugarosa/learnergy |
Framework | pytorch |
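The Restricted Boltzmann Machine at the core of such frameworks is defined by a joint energy over visible and hidden units. A minimal numpy sketch of that energy and one Gibbs-sampling step, illustrative of the underlying math rather than Learnergy's API:

```python
import numpy as np

# Toy RBM: the energy E(v, h) = -a.v - b.h - v^T W h, and the conditional
# p(h_j = 1 | v) used for block Gibbs sampling. Shapes are illustrative.
rng = np.random.default_rng(0)
n_v, n_h = 6, 4
W = rng.normal(scale=0.1, size=(n_v, n_h))   # visible-hidden weights
a = np.zeros(n_v)                            # visible biases
b = np.zeros(n_h)                            # hidden biases

def energy(v, h):
    """Joint energy of a visible/hidden configuration."""
    return -a @ v - b @ h - v @ W @ h

def sample_h(v):
    """One Gibbs step: p(h_j = 1 | v) = sigmoid(b_j + v . W[:, j])."""
    p = 1.0 / (1.0 + np.exp(-(b + v @ W)))
    return (rng.random(n_h) < p).astype(float), p

v = rng.integers(0, 2, n_v).astype(float)
h, p_h = sample_h(v)
```

Training (e.g., contrastive divergence) lowers the energy of observed data relative to samples from the model, which is what makes these "energy-based" learners.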
ReZero is All You Need: Fast Convergence at Large Depth
Title | ReZero is All You Need: Fast Convergence at Large Depth |
Authors | Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley |
Abstract | Deep networks have enabled significant performance gains across domains, but they often suffer from vanishing/exploding gradients. This is especially true for Transformer architectures where depth beyond 12 layers is difficult to train without large datasets and computational budgets. In general, we find that inefficient signal propagation impedes learning in deep networks. In Transformers, multi-head self-attention is the main cause of this poor signal propagation. To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. We apply this technique to language modeling and find that we can easily train ReZero-Transformer networks over a hundred layers. When applied to 12-layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR-10. |
Tasks | Language Modelling |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04887v1 |
https://arxiv.org/pdf/2003.04887v1.pdf | |
PWC | https://paperswithcode.com/paper/rezero-is-all-you-need-fast-convergence-at |
Repo | https://github.com/fabio-deep/ReZero-ResNet |
Framework | pytorch |
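The architectural change is a one-line residual update, x_{i+1} = x_i + alpha_i * F(x_i), with the per-layer scalar alpha_i initialized to zero so every layer starts as the identity map. A minimal numpy sketch, where the sublayer F is a toy stand-in for attention or an MLP:

```python
import numpy as np

# ReZero residual block: a single learned scalar gates the sublayer output,
# starting at 0 so the block is the identity at initialization.
class ReZeroBlock:
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(dim, dim)) / np.sqrt(dim)
        self.alpha = 0.0              # the single learned gate, initialized to 0

    def f(self, x):                   # toy stand-in for attention/MLP sublayer
        return np.tanh(x @ self.W)

    def forward(self, x):
        return x + self.alpha * self.f(x)

x = np.ones(8)
block = ReZeroBlock(8)
y0 = block.forward(x)                 # exactly the input: identity at init
block.alpha = 0.1                     # during training, alpha moves away from 0
y1 = block.forward(x)
```

Because a stack of such blocks computes the identity at initialization, signal magnitudes are preserved regardless of depth, which is what enables training networks over a hundred layers.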
Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits
Title | Super Low Resolution RF Powered Accelerometers for Alerting on Hospitalized Patient Bed Exits |
Authors | Michael Chesser, Asangi Jayatilaka, Renuka Visvanathan, Christophe Fumeaux, Alanson Sample, Damith C. Ranasinghe |
Abstract | Falls have serious consequences and are prevalent in acute hospitals and nursing homes caring for older people. Most falls occur in bedrooms and near the bed. Technological interventions to mitigate the risk of falling aim to automatically monitor bed-exit events and subsequently alert healthcare personnel to provide timely supervision. We observe that frequency-domain information related to patient activities exists predominantly in very low frequencies. Therefore, we recognise the potential to employ a low resolution acceleration sensing modality in contrast to powering and sensing with a conventional MEMS (Micro Electro Mechanical System) accelerometer. Consequently, we investigate a batteryless sensing modality with low cost wirelessly powered Radio Frequency Identification (RFID) technology with the potential for convenient integration into clothing, such as hospital gowns. We design and build a passive accelerometer-based RFID sensor embodiment, ID-Sensor, for our study. The sensor design allows deriving ultra low resolution acceleration data from the rate of change of unique RFID tag identifiers in accordance with the movement of a patient's upper body. We investigate two convolutional neural network architectures for learning from raw RFID-only data streams and compare performance with a traditional shallow classifier with engineered features. We evaluate performance with 23 hospitalized older patients. We demonstrate, for the first time and to the best of our knowledge, that: i) the low resolution acceleration data embedded in the RF powered ID-Sensor data stream can provide a practicable method for activity recognition; and ii) highly discriminative features can be efficiently learned from the raw RFID-only data stream using a fully convolutional network architecture. |
Tasks | Activity Recognition |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08530v1 |
https://arxiv.org/pdf/2003.08530v1.pdf | |
PWC | https://paperswithcode.com/paper/super-low-resolution-rf-powered |
Repo | https://github.com/AdelaideAuto-IDLab/ID-Sensor |
Framework | tf |
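The key trick is deriving a coarse motion signal from the rate of change of observed tag identifiers rather than from a conventional accelerometer. A toy sketch of that idea, counting tag-ID transitions per time window over a hypothetical (timestamp, tag) stream; the paper's pipeline feeds such streams into convolutional networks:

```python
import math

# Hypothetical RFID read stream: (timestamp in seconds, tag identifier).
# More ID changes per window indicate more movement of the tagged garment.
stream = [(0.1, "A"), (0.4, "A"), (0.9, "B"), (1.2, "C"), (1.3, "B"), (2.1, "B")]

def changes_per_window(stream, width=1.0):
    """Count tag-identifier transitions in each fixed-width time window."""
    end = max(t for t, _ in stream)
    n = math.ceil(end / width)
    counts = [0] * n
    prev = None
    for t, tag in stream:
        idx = min(int(t / width), n - 1)
        if prev is not None and tag != prev:
            counts[idx] += 1
        prev = tag
    return counts

signal = changes_per_window(stream)
```

This yields an ultra-low-resolution activity signal (here, one value per second) without any battery-powered sensing.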
GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers
Title | GASP! Generating Abstracts of Scientific Papers from Abstracts of Cited Papers |
Authors | Fabio Massimo Zanzotto, Viviana Bono, Paola Vocca, Andrea Santilli, Danilo Croce, Giorgio Gambosi, Roberto Basili |
Abstract | Creativity is one of the driving forces of humankind, as it allows us to break current understanding and envision new ideas that may revolutionize entire fields of knowledge. Scientific research offers a challenging environment in which to learn a model of the creative process. In fact, scientific research is a creative act within the formal settings of the scientific method, and this creative act is described in articles. In this paper, we dare to introduce the novel, scientifically and philosophically challenging task of Generating Abstracts of Scientific Papers from abstracts of cited papers (GASP) as a text-to-text task to investigate scientific creativity. To foster research in this novel, challenging task, we prepared a dataset by using services that solve the problem of copyright; hence, the dataset is publicly available with its standard split. Finally, we experimented with two vanilla summarization systems to start the analysis of the complexity of the GASP task. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.04996v1 |
https://arxiv.org/pdf/2003.04996v1.pdf | |
PWC | https://paperswithcode.com/paper/gasp-generating-abstracts-of-scientific |
Repo | https://github.com/ART-Group-it/GASP |
Framework | none |
Adaptive Parameterization for Neural Dialogue Generation
Title | Adaptive Parameterization for Neural Dialogue Generation |
Authors | Hengyi Cai, Hongshen Chen, Cheng Zhang, Yonghao Song, Xiaofang Zhao, Dawei Yin |
Abstract | Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generate generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations. |
Tasks | Dialogue Generation |
Published | 2020-01-18 |
URL | https://arxiv.org/abs/2001.06626v1 |
https://arxiv.org/pdf/2001.06626v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-parameterization-for-neural-dialogue-1 |
Repo | https://github.com/hengyicai/AdaND |
Framework | pytorch |
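The context-aware parameterization amounts to a hypernetwork: a small module that maps a context embedding to the weights of another layer, so each conversation gets its own parameters. A minimal numpy sketch; all names and shapes are illustrative, not AdaND's actual architecture:

```python
import numpy as np

# Toy hypernetwork: a linear map from a context embedding to a flattened
# weight matrix, reshaped into conversation-specific layer parameters.
rng = np.random.default_rng(0)
ctx_dim, in_dim, out_dim = 4, 3, 2
H = rng.normal(scale=0.1, size=(ctx_dim, in_dim * out_dim))  # hypernetwork weights

def generate_params(context):
    """Produce a (in_dim, out_dim) weight matrix conditioned on the context."""
    return (context @ H).reshape(in_dim, out_dim)

ctx_a = rng.normal(size=ctx_dim)       # embedding of one conversation context
ctx_b = rng.normal(size=ctx_dim)       # a different conversation
W_a = generate_params(ctx_a)           # conversation-specific weights
W_b = generate_params(ctx_b)
```

Different contexts yield different generated weights, which is how the model escapes the single-parameter-set bottleneck described in the abstract; the topic-aware variant conditions on inferred latent topics instead of the raw context.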
A Deep Generative Model for Fragment-Based Molecule Generation
Title | A Deep Generative Model for Fragment-Based Molecule Generation |
Authors | Marco Podda, Davide Bacciu, Alessio Micheli |
Abstract | Molecule generation is a challenging open problem in cheminformatics. Currently, deep generative approaches addressing the challenge belong to two broad categories, differing in how molecules are represented. One approach encodes molecular graphs as strings of text, and learns their corresponding character-based language model. Another, more expressive, approach operates directly on the molecular graph. In this work, we address two limitations of the former: generation of invalid and duplicate molecules. To improve validity rates, we develop a language model for small molecular substructures called fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug Design. In other words, we generate molecules fragment by fragment, instead of atom by atom. To improve uniqueness rates, we present a frequency-based masking strategy that helps generate molecules with infrequent fragments. We show experimentally that our model largely outperforms other language model-based competitors, reaching state-of-the-art performances typical of graph-based approaches. Moreover, generated molecules display molecular properties similar to those in the training sample, even in absence of explicit task-specific supervision. |
Tasks | Language Modelling |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12826v1 |
https://arxiv.org/pdf/2002.12826v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-generative-model-for-fragment-based |
Repo | https://github.com/marcopodda/fragment-based-dgm |
Framework | none |
Self-Adaptive Training: beyond Empirical Risk Minimization
Title | Self-Adaptive Training: beyond Empirical Risk Minimization |
Authors | Lang Huang, Chao Zhang, Hongyang Zhang |
Abstract | We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost, to improve generalization of deep learning for potentially corrupted training data. This problem is crucial for robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit noise and thus suffers from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently-discovered double-descent phenomenon in ERM, which might be a result of overfitting of noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at \url{https://github.com/LayneH/self-adaptive-training}. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10319v1 |
https://arxiv.org/pdf/2002.10319v1.pdf | |
PWC | https://paperswithcode.com/paper/self-adaptive-training-beyond-empirical-risk |
Repo | https://github.com/LayneH/self-adaptive-training |
Framework | pytorch |
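The label-correction mechanism can be sketched as an exponential moving average: each example keeps a soft target that blends its given (possibly noisy) label with the model's own running predictions, so consistent predictions gradually override a wrong label. The momentum value and update form below are illustrative, not the authors' exact scheme:

```python
import numpy as np

# EMA-style soft-target update: the target drifts from the given label
# toward what the model consistently predicts.
def update_target(target, prediction, momentum=0.9):
    return momentum * target + (1 - momentum) * prediction

noisy_onehot = np.array([1.0, 0.0, 0.0])   # given label, possibly wrong
target = noisy_onehot.copy()
model_pred = np.array([0.1, 0.85, 0.05])   # model consistently prefers class 1
for _ in range(50):                        # repeated epochs of training
    target = update_target(target, model_pred)
```

After enough updates the target's argmax follows the model's confident prediction rather than the noisy label, while remaining a valid probability vector, with no extra forward or backward passes required.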
Do We Need Zero Training Loss After Achieving Zero Training Error?
Title | Do We Need Zero Training Loss After Achieving Zero Training Error? |
Authors | Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama |
Abstract | Overparameterized deep networks have the capacity to memorize training data with zero training error. Even after memorization, the training loss continues to approach zero, making the model overconfident and degrading test performance. Since existing regularizers do not directly aim to avoid zero training loss, they often fail to maintain a moderate level of training loss, ending up with a too small or too large loss. We propose a direct solution called flooding that intentionally prevents further reduction of the training loss when it reaches a reasonably small value, which we call the flooding level. Our approach makes the loss float around the flooding level by doing mini-batched gradient descent as usual but gradient ascent if the training loss is below the flooding level. This can be implemented with one line of code, and is compatible with any stochastic optimizer and other regularizers. With flooding, the model will continue to "random walk" with the same non-zero training loss, and we expect it to drift into an area with a flat loss landscape that leads to better generalization. We experimentally show that flooding improves performance and as a byproduct, induces a double descent curve of the test loss. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08709v1 |
https://arxiv.org/pdf/2002.08709v1.pdf | |
PWC | https://paperswithcode.com/paper/do-we-need-zero-training-loss-after-achieving |
Repo | https://github.com/takashiishida/flooding |
Framework | pytorch |
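The one-line implementation mentioned in the abstract reflects the loss around the flooding level b: above b the loss is unchanged, and below b its gradient flips sign, turning descent into ascent. A minimal sketch:

```python
# Flooding: once the training loss falls below the flooding level b,
# minimizing the flooded loss performs gradient ascent on the original loss.
def flooded(loss, b):
    return abs(loss - b) + b

b = 0.25                       # flooding level (a hyperparameter)
high = flooded(1.0, b)         # above b: unchanged, gradient descent as usual
low = flooded(0.0625, b)       # below b: reflected, so the gradient flips sign
```

In a framework like PyTorch this is typically written as `loss = (loss - b).abs() + b` before calling `backward()`, which is why it composes with any optimizer and regularizer.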
Deep Autotuner: a Pitch Correcting Network for Singing Performances
Title | Deep Autotuner: a Pitch Correcting Network for Singing Performances |
Authors | Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim |
Abstract | We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05511v1 |
https://arxiv.org/pdf/2002.05511v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-autotuner-a-pitch-correcting-network-for |
Repo | https://github.com/sannawag/autotuner |
Framework | none |
Block-wise Scrambled Image Recognition Using Adaptation Network
Title | Block-wise Scrambled Image Recognition Using Adaptation Network |
Authors | Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa |
Abstract | In this study, a perceptually hidden object-recognition method is investigated to generate secure images recognizable by humans but not machines. Hence, both the perceptual information hiding and the corresponding object recognition methods should be developed. Block-wise image scrambling is introduced to hide perceptual information from a third party. In addition, an adaptation network is proposed to recognize those scrambled images. Experimental comparisons conducted using CIFAR datasets demonstrated that the proposed adaptation network performed well in incorporating simple perceptual information hiding into DNN-based image classification. |
Tasks | Image Classification, Object Recognition |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07761v1 |
https://arxiv.org/pdf/2001.07761v1.pdf | |
PWC | https://paperswithcode.com/paper/block-wise-scrambled-image-recognition-using |
Repo | https://github.com/MADONOKOUKI/Block-wise-Scrambled-Image-Recognition |
Framework | pytorch |
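The perceptual hiding step can be sketched as splitting the image into fixed-size blocks and permuting them with a secret key. Block size and key handling below are illustrative; the paper's scheme also transforms pixels within blocks, and the adaptation network then learns to classify such scrambled inputs:

```python
import numpy as np

# Block-wise scrambling: cut the image into block x block tiles and
# rearrange them according to a keyed pseudorandom permutation.
def scramble(img, block=8, seed=42):
    h, w = img.shape[:2]
    tiles = [img[i:i + block, j:j + block]
             for i in range(0, h, block)
             for j in range(0, w, block)]
    perm = np.random.default_rng(seed).permutation(len(tiles))
    out = np.zeros_like(img)
    k = 0
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i + block, j:j + block] = tiles[perm[k]]
            k += 1
    return out

img = np.arange(32 * 32).reshape(32, 32)   # toy single-channel "image"
enc = scramble(img)
```

The pixel content is preserved (only rearranged), and the same seed always yields the same scrambling, so a key holder can invert it while a third party sees only shuffled blocks.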
iDLG: Improved Deep Leakage from Gradients
Title | iDLG: Improved Deep Leakage from Gradients |
Authors | Bo Zhao, Konda Reddy Mopuri, Hakan Bilen |
Abstract | It is widely believed that sharing gradients will not leak private training data in distributed learning systems such as Collaborative Learning and Federated Learning. Recently, Zhu et al. presented an approach which shows the possibility of obtaining private training data from the publicly shared gradients. In their Deep Leakage from Gradients (DLG) method, they synthesize dummy data and corresponding labels under the supervision of shared gradients. However, DLG has difficulty converging and discovering the ground-truth labels consistently. In this paper, we find that sharing gradients definitely leaks the ground-truth labels. We propose a simple but reliable approach to extract accurate data from the gradients. In particular, our approach can certainly extract the ground-truth labels, as opposed to DLG; hence we name it Improved DLG (iDLG). Our approach is valid for any differentiable model trained with cross-entropy loss over one-hot labels. We mathematically illustrate how our method can extract ground-truth labels from the gradients and empirically demonstrate the advantages over DLG. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02610v1 |
https://arxiv.org/pdf/2001.02610v1.pdf | |
PWC | https://paperswithcode.com/paper/idlg-improved-deep-leakage-from-gradients |
Repo | https://github.com/PatrickZH/Improved-Deep-Leakage-from-Gradients |
Framework | pytorch |
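The core observation can be shown in a few lines: for cross-entropy over one-hot labels, the gradient with respect to the last layer's bias is softmax(z) - onehot(y), so the single negative entry reveals the ground-truth label. A toy numpy sketch with made-up logits, not the paper's code:

```python
import numpy as np

# Label leakage from gradients: only the ground-truth class has a negative
# entry in the bias gradient of the final (logits) layer.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5, 1.0])    # logits for 4 classes (toy values)
y = 2                                   # ground-truth label
grad_b = softmax(z) - np.eye(4)[y]      # dL/db for cross-entropy loss

leaked = int(np.argmin(grad_b))         # the negative entry identifies y
```

Since softmax probabilities are strictly positive and less than one, grad_b[y] = p_y - 1 is the only negative component, so an eavesdropper holding the shared gradient recovers the label with certainty, without any optimization.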
Understanding Self-Training for Gradual Domain Adaptation
Title | Understanding Self-Training for Gradual Domain Adaptation |
Authors | Ananya Kumar, Tengyu Ma, Percy Liang |
Abstract | Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces. We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain. We prove the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error. The theoretical analysis leads to algorithmic insights, highlighting that regularization and label sharpening are essential even when we have infinite data, and suggesting that self-training works particularly well for shifts with small Wasserstein-infinity distance. Leveraging the gradual shift structure leads to higher accuracies on a rotating MNIST dataset and a realistic Portraits dataset. |
Tasks | Domain Adaptation |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11361v1 |
https://arxiv.org/pdf/2002.11361v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-self-training-for-gradual |
Repo | https://github.com/p-lambda/gradual_domain_adaptation |
Framework | tf |
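The gradual self-training loop can be sketched with a nearest-centroid classifier on a noise-free rotating toy distribution: fit on the source, then repeatedly pseudo-label the next slightly shifted domain and refit. The data and classifier are illustrative, not the paper's experiments:

```python
import numpy as np

# Two class means rotate from angle 0 to 90 degrees; self-training on each
# intermediate domain lets the centroids track the shift.
def pseudo_label(points, c0, c1):
    d0 = np.linalg.norm(points - c0, axis=1)
    d1 = np.linalg.norm(points - c1, axis=1)
    return (d1 < d0).astype(int)       # 0 if nearer c0, 1 if nearer c1

def means(angle):
    m = np.array([np.cos(angle), np.sin(angle)])
    return np.stack([m, -m])           # class 0 mean and class 1 mean

c0, c1 = means(0.0)                    # classifier fit on the source domain
for angle in np.linspace(0, np.pi / 2, 7)[1:]:   # six 15-degree shifts
    points = means(angle)              # unlabeled intermediate-domain data
    labels = pseudo_label(points, c0, c1)
    c0 = points[labels == 0].mean(axis=0)
    c1 = points[labels == 1].mean(axis=0)
```

Because each 15-degree step keeps the shifted class means closest to their previous centroids, every pseudo-label is correct and the classifier ends up aligned with the 90-degree target domain, whereas jumping straight from source to target would leave the two classes equidistant from both centroids.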
Hierarchically Decoupled Imitation for Morphological Transfer
Title | Hierarchically Decoupled Imitation for Morphological Transfer |
Authors | Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto |
Abstract | Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning. For such tasks, we argue that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. To this end, we propose a hierarchical decoupling of policies into two parts: an independently learned low-level policy and a transferable high-level policy. To remedy poor transfer performance due to mismatch in morphologies, we contribute two key ideas. First, we show that incentivizing a complex agent’s low-level to imitate a simpler agent’s low-level significantly improves zero-shot high-level transfer. Second, we show that KL-regularized training of the high level stabilizes learning and prevents mode-collapse. Finally, on a suite of publicly released navigation and manipulation environments, we demonstrate the applicability of hierarchical transfer on long-range tasks across morphologies. Our code and videos can be found at https://sites.google.com/berkeley.edu/morphology-transfer. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01709v1 |
https://arxiv.org/pdf/2003.01709v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchically-decoupled-imitation-for |
Repo | https://github.com/jhejna/hierarchical_morphology_transfer |
Framework | tf |
A Benchmark for Systematic Generalization in Grounded Language Understanding
Title | A Benchmark for Systematic Generalization in Grounded Language Understanding |
Authors | Laura Ruis, Jacob Andreas, Marco Baroni, Diane Bouchacourt, Brenden M. Lake |
Abstract | Human language users easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret compositions unseen in training. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in models of situated language understanding. We take inspiration from standard models of meaning composition in formal linguistics. Going beyond an earlier related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world. This allows us to build novel generalization tasks that probe the acquisition of linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method, finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05161v1 |
https://arxiv.org/pdf/2003.05161v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-for-systematic-generalization-in |
Repo | https://github.com/LauraRuis/multimodal_seq2seq_gSCAN |
Framework | pytorch |