February 2, 2020

Paper Group AWR 32

Subject Cross Validation in Human Activity Recognition. Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis. BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification. Integrating Relation Constraints with Neural Relation Extractors. Speaker Adaptive Training using Model Agnostic Meta-Learning. Meta-learning fo …

Subject Cross Validation in Human Activity Recognition


Title	Subject Cross Validation in Human Activity Recognition
Authors	Akbar Dehghani, Tristan Glatard, Emad Shihab
Abstract	K-fold Cross Validation is commonly used to evaluate classifiers and tune their hyperparameters. However, it assumes that data points are Independent and Identically Distributed (i.i.d.) so that samples used in the training and test sets can be selected randomly and uniformly. In Human Activity Recognition datasets, we note that the samples produced by the same subjects are likely to be correlated due to diverse factors. Hence, k-fold cross validation may overestimate the performance of activity recognizers, in particular when overlapping sliding windows are used. In this paper, we investigate the effect of Subject Cross Validation on the performance of Human Activity Recognition, both with non-overlapping and with overlapping sliding windows. Results show that k-fold cross validation artificially increases the performance of recognizers by about 10%, and even by 16% when overlapping windows are used. In addition, we do not observe any performance gain from the use of overlapping windows. We conclude that Human Activity Recognition systems should be evaluated by Subject Cross Validation, and that overlapping windows are not worth their extra computational cost.
Tasks	Activity Recognition, Human Activity Recognition
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02666v2
PDF	http://arxiv.org/pdf/1904.02666v2.pdf
PWC	https://paperswithcode.com/paper/subject-cross-validation-in-human-activity
Repo	https://github.com/big-data-lab-team/paper-generalizability-window-size
Framework	none
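
The evaluation protocol the abstract argues for can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `sliding_windows` and `subject_folds`, the toy recordings, and the window size and overlap are all assumptions. It cuts each subject's signal into (optionally overlapping) windows, then builds folds in which every window from a held-out subject lands in the test set, so no subject contributes to both sides of a split.

```python
from collections import defaultdict

def sliding_windows(signal, size, overlap):
    """Cut a 1-D signal into fixed-size windows; overlap is a fraction in [0, 1)."""
    step = max(1, int(size * (1 - overlap)))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def subject_folds(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject per fold."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [i for sid, idxs in by_subject.items() if sid != held_out for i in idxs]
        yield train, test

# Toy data (hypothetical): three subjects, each with a 10-sample recording.
recordings = {
    "s1": list(range(0, 10)),
    "s2": list(range(10, 20)),
    "s3": list(range(20, 30)),
}

windows, subjects = [], []
for sid, signal in recordings.items():
    for window in sliding_windows(signal, size=4, overlap=0.5):
        windows.append(window)
        subjects.append(sid)

folds = list(subject_folds(subjects))
for train, test in folds:
    print(len(train), "training windows,", len(test), "test windows")
```

With plain k-fold splitting, overlapping windows from the same subject (which share raw samples) could fall on both sides of a split, which is the leakage the abstract warns about.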

Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis


Title	Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis
Authors	Zongwei Zhou, Vatsal Sodha, Md Mahfuzur Rahman Siddiquee, Ruibin Feng, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang
Abstract	Transfer learning from natural image to medical image has been established as one of the most practical paradigms in deep learning for medical image analysis. However, to fit this paradigm, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information and inevitably compromising the performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learned by self-supervision), and generic (served as source models for generating application-specific target models). Our extensive experiments demonstrate that our Models Genesis significantly outperform learning from scratch in all five target 3D applications covering both segmentation and classification. More importantly, learning a model from scratch simply in 3D may not necessarily yield performance better than transfer learning from ImageNet in 2D, but our Models Genesis consistently top any 2D approaches including fine-tuning the models pre-trained from ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and significance of our Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated yet recurrent anatomy in medical images can serve as strong supervision signals for deep models to learn common anatomical representation automatically via self-supervision. As open science, all pre-trained Models Genesis are available at https://github.com/MrGiovanni/ModelsGenesis.
Tasks	Brain Tumor Segmentation, Liver Segmentation, Lung Nodule Detection, Lung Nodule Segmentation, Pulmonary Embolism Detection, Transfer Learning
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06912v1
PDF	https://arxiv.org/pdf/1908.06912v1.pdf
PWC	https://paperswithcode.com/paper/models-genesis-generic-autodidactic-models
Repo	https://github.com/cswin/AWC
Framework	pytorch

BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification


Title	BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification
Authors	Marc Rußwurm, Sébastien Lefèvre, Marco Körner
Abstract	This dataset challenges the time series community with the task of satellite-based vegetation identification on a large-scale real-world dataset of satellite data acquired during one entire year. It consists of time series data with associated crop types from 580k field parcels in Brittany, France (Breizh in local language). Along with this dataset, we provide results and code of a Long Short-Term Memory network and Transformer network as baselines. We release the dataset, along with preprocessing scripts and baseline models, at https://github.com/TUM-LMF/BreizhCrops and encourage methodical researchers to benchmark and develop novel methods applied to satellite-based crop monitoring.
Tasks	Time Series
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11893v1
PDF	https://arxiv.org/pdf/1905.11893v1.pdf
PWC	https://paperswithcode.com/paper/breizhcrops-a-satellite-time-series-dataset
Repo	https://github.com/TUM-LMF/BreizhCrops
Framework	pytorch

Integrating Relation Constraints with Neural Relation Extractors


Title	Integrating Relation Constraints with Neural Relation Extractors
Authors	Yuan Ye, Yansong Feng, Bingfeng Luo, Yuxuan Lai, Dongyan Zhao
Abstract	Recent years have seen rapid progress in identifying predefined relationships between entity pairs using neural networks (NNs). However, such models often make predictions for each entity pair individually, and thus often fail to resolve the inconsistency among different predictions, which can be characterized by discrete relation constraints. These constraints are often defined over combinations of entity-relation-entity triples, since there is often a lack of explicitly well-defined type and cardinality requirements for the relations. In this paper, we propose a unified framework to integrate relation constraints with NNs by introducing a new loss term, ConstraintLoss. Particularly, we develop two efficient methods to capture how well the local predictions from multiple instance pairs satisfy the relation constraints. Experiments on both English and Chinese datasets show that our approach can help NNs learn from discrete relation constraints to reduce inconsistency among local predictions, and outperform popular neural relation extraction (NRE) models even when they are enhanced with extra post-processing. Our source code and datasets will be released at https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020.
Tasks	Relation Extraction
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11493v1
PDF	https://arxiv.org/pdf/1911.11493v1.pdf
PWC	https://paperswithcode.com/paper/integrating-relation-constraints-with-neural
Repo	https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020
Framework	none

Speaker Adaptive Training using Model Agnostic Meta-Learning


Title	Speaker Adaptive Training using Model Agnostic Meta-Learning
Authors	Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Abstract	Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions. Conventionally, model-based speaker adaptive training is performed by having a set of speaker dependent parameters that are jointly optimised with speaker independent parameters in order to remove speaker variation. However, this does not scale well if all neural network weights are to be adapted to the speaker. In this paper we formulate speaker adaptive training as a meta-learning task, in which an adaptation process using gradient descent is encoded directly into the training of the model. We compare our approach with test-only adaptation of a standard baseline model and a SAT-LHUC model with a learned speaker adaptation schedule and demonstrate that the meta-learning approach achieves comparable results.
Tasks	Meta-Learning
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10605v1
PDF	https://arxiv.org/pdf/1910.10605v1.pdf
PWC	https://paperswithcode.com/paper/speaker-adaptive-training-using-model
Repo	https://github.com/ondrejklejch/learning_to_adapt
Framework	tf
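
The idea of encoding a gradient-descent adaptation step directly into training can be sketched on a toy problem. This is a sketch under stated assumptions, not the paper's acoustic model: a scalar model `y = w * x`, three hypothetical "speakers" that differ only in their slope, one inner step with rate `alpha`, and a meta step with rate `beta`. Because the loss is quadratic, the derivative of the adapted weight with respect to the initial weight is available in closed form.

```python
def mse(w, data):
    """Mean squared error of the scalar model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Derivative of mse with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def adapt(w, support, alpha):
    """Inner loop: one gradient step on a speaker's adaptation data."""
    return w - alpha * mse_grad(w, support)

def meta_grad(w, support, query, alpha):
    """Derivative of the post-adaptation query loss with respect to w.

    The chain rule gives d(adapted w)/dw = 1 - alpha * curvature, where
    curvature is the second derivative of the support loss.
    """
    curvature = sum(2 * x * x for x, _ in support) / len(support)
    return mse_grad(adapt(w, support, alpha), query) * (1 - alpha * curvature)

def meta_loss(w, tasks, alpha):
    """Average query loss after adapting to each speaker."""
    return sum(mse(adapt(w, s, alpha), q) for s, q in tasks) / len(tasks)

def make_task(slope):
    """Hypothetical speaker: adaptation (support) and evaluation (query) data."""
    support = [(x, slope * x) for x in (0.5, 1.0)]
    query = [(x, slope * x) for x in (1.5, 2.0)]
    return support, query

tasks = [make_task(slope) for slope in (1.0, 2.0, 3.0)]
alpha, beta, w = 0.1, 0.05, 0.0

loss_before = meta_loss(w, tasks, alpha)
for _ in range(200):
    # Outer loop: descend on the loss measured after adaptation.
    g = sum(meta_grad(w, s, q, alpha) for s, q in tasks) / len(tasks)
    w -= beta * g
loss_after = meta_loss(w, tasks, alpha)

print(f"meta-loss before: {loss_before:.3f}, after: {loss_after:.3f}, w: {w:.3f}")
```

The outer loop optimises the starting point for adaptation rather than the unadapted fit, which is the distinction from test-only adaptation of a standard baseline.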

Meta-learning for fast classifier adaptation to new users of Signature Verification systems


Title	Meta-learning for fast classifier adaptation to new users of Signature Verification systems
Authors	Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira
Abstract	Offline Handwritten Signature verification presents a challenging Pattern Recognition problem, where only knowledge of the positive class is available for training. While classifiers have access to a few genuine signatures for training, during generalization they also need to discriminate forgeries. This is particularly challenging for skilled forgeries, where a forger practices imitating the user’s signature, and often is able to create forgeries visually close to the original signatures. Most work in the literature addresses this issue by training for a surrogate objective: discriminating genuine signatures of a user and random forgeries (signatures from other users). In this work, we propose a solution for this problem based on meta-learning, where there are two levels of learning: a task-level (where a task is to learn a classifier for a given user) and a meta-level (learning across tasks). In particular, the meta-learner guides the adaptation (learning) of a classifier for each user, which is a lightweight operation that only requires genuine signatures. The meta-learning procedure learns what is common for the classification across different users. In a scenario where skilled forgeries from a subset of users are available, the meta-learner can guide classifiers to be discriminative of skilled forgeries even if the classifiers themselves do not use skilled forgeries for learning. Experiments conducted on the GPDS-960 dataset show improved performance compared to Writer-Independent systems, and achieve results comparable to state-of-the-art Writer-Dependent systems in the regime of few samples per user (5 reference signatures).
Tasks	Meta-Learning
Published	2019-10-17
URL	https://arxiv.org/abs/1910.08060v1
PDF	https://arxiv.org/pdf/1910.08060v1.pdf
PWC	https://paperswithcode.com/paper/meta-learning-for-fast-classifier-adaptation
Repo	https://github.com/luizgh/sigver
Framework	pytorch

Deep Back-Projection Networks for Single Image Super-resolution


Title	Deep Back-Projection Networks for Single Image Super-resolution
Authors	Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita
Abstract	Previous feed-forward architectures of recently proposed deep super-resolution networks learn the features of low-resolution inputs and the non-linear mapping from those to a high-resolution output. However, this approach does not fully address the mutual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), the winner of two image super-resolution challenges (NTIRE2018 and PIRM2018), that exploit iterative up- and down-sampling layers. These layers are formed as a unit providing an error feedback mechanism for projection errors. We construct mutually-connected up- and down-sampling units, each of which represents different types of image degradation and high-resolution components. We also show that extending this idea to several variants that apply the latest deep network trends, such as recurrent networks, dense connections, and residual learning, improves the performance. The experiments yield superior results and, in particular, establish new state-of-the-art results across multiple data sets, especially for large scaling factors such as 8x.
Tasks	Image Super-Resolution, Super-Resolution
Published	2019-04-04
URL	http://arxiv.org/abs/1904.05677v1
PDF	http://arxiv.org/pdf/1904.05677v1.pdf
PWC	https://paperswithcode.com/paper/deep-back-projection-networks-for-single
Repo	https://github.com/alterzero/DBPN-Pytorch
Framework	pytorch
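
The error-feedback mechanism of an up-projection unit can be sketched in one dimension. This is an illustrative assumption-laden sketch, not the released DBPN code: the learned deconvolution and strided convolution are replaced by fixed 2x linear interpolation (`upsample`) and 2x average pooling (`downsample`), and because those operators are fixed rather than learned, the residual uses the classical back-projection sign (input minus re-projected estimate).

```python
def upsample(x):
    """2x linear-interpolation upsampling (stand-in for a learned deconvolution)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v  # repeat the last sample at the edge
        out += [v, (v + nxt) / 2]
    return out

def downsample(x):
    """2x average pooling (stand-in for a learned strided convolution)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def up_projection(low):
    """One up-projection step with error feedback."""
    high0 = upsample(low)                               # initial high-res estimate
    low0 = downsample(high0)                            # project it back down
    residual = [a - b for a, b in zip(low, low0)]       # low-res projection error
    high1 = upsample(residual)                          # lift the error to high-res
    return [a + b for a, b in zip(high0, high1)]        # corrected estimate

def l1_error(a, b):
    """Sum of absolute differences between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b))

low = [1.0, 4.0, 2.0, 8.0]
naive = upsample(low)
refined = up_projection(low)

err_naive = l1_error(downsample(naive), low)
err_refined = l1_error(downsample(refined), low)
print("re-projection error without feedback:", err_naive)
print("re-projection error with feedback:", err_refined)
```

The refined estimate is more consistent with the low-resolution input once projected back down, which is the property the projection units are built around.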

Multimodal Speech Emotion Recognition and Ambiguity Resolution


Title	Multimodal Speech Emotion Recognition and Ambiguity Resolution
Authors	Gaurav Sahu
Abstract	Identifying emotion from speech is a non-trivial task pertaining to the ambiguous definition of emotion itself. In this work, we adopt a feature-engineering based approach to tackle the task of speech emotion recognition. Formalizing our problem as a multi-class classification problem, we compare the performance of two categories of models. For both, we extract eight hand-crafted features from the audio signal. In the first approach, the extracted features are used to train six traditional machine learning classifiers, whereas the second approach is based on deep learning wherein a baseline feed-forward neural network and an LSTM-based classifier are trained over the same features. In order to resolve ambiguity in communication, we also include features from the text domain. We report accuracy, f-score, precision, and recall for the different experiment settings we evaluated our models in. Overall, we show that lighter machine learning based models trained over a few hand-crafted features are able to achieve performance comparable to the current deep learning based state-of-the-art method for emotion recognition.
Tasks	Emotion Recognition, Feature Engineering, Multimodal Emotion Recognition, Speech Emotion Recognition
Published	2019-04-12
URL	http://arxiv.org/abs/1904.06022v1
PDF	http://arxiv.org/pdf/1904.06022v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-speech-emotion-recognition-and
Repo	https://github.com/Demfier/multimodal-speech-emotion-recognition
Framework	pytorch
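
The four metrics the abstract reports can be computed for a multi-class problem as sketched below. The abstract does not say how per-class scores are averaged, so macro-averaging is an assumption here, as are the function name `macro_report` and the toy emotion labels.

```python
def macro_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }

# Hypothetical predictions over three emotion classes.
y_true = ["angry", "happy", "sad", "happy", "sad", "angry"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "sad"]

report = macro_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```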

Development of Use-specific High Performance Cyber-Nanomaterial Optical Detectors by Effective Choice of Machine Learning Algorithms


Title	Development of Use-specific High Performance Cyber-Nanomaterial Optical Detectors by Effective Choice of Machine Learning Algorithms
Authors	Davoud Hejazi, Shuangjun Liu, Amirreza Farnoosh, Sarah Ostadabbas, Swastik Kar
Abstract	Due to their inherent variabilities, nanomaterial-based sensors are challenging to translate into real-world applications, where reliability/reproducibility is key. Recently we showed Bayesian inference can be employed on engineered variability in layered nanomaterial-based optical transmission filters to determine optical wavelengths with high accuracy/precision. In many practical applications the sensing cost/speed and long-term reliability can be equal or more important considerations. Though various machine learning tools are frequently used on sensor/detector networks to address these, their effectiveness on nanomaterial-based sensors has not been explored. Here we show the best choice of ML algorithm in a cyber-nanomaterial detector is mainly determined by specific use considerations, e.g., accuracy, computational cost, speed, and resilience against drifts/ageing effects. When sufficient data/computing resources are provided, highest sensing accuracy can be achieved by the kNN and Bayesian inference algorithms, but they can be computationally expensive for real-time applications. In contrast, artificial neural networks are computationally expensive to train, but provide the fastest result under testing conditions and remain reasonably accurate. When data is limited, SVMs perform well even with small training sets, while other algorithms show considerable reduction in accuracy if data is scarce, hence setting a lower limit on the size of required training data. We show that by tracking/modeling the long-term drifts of the detector performance over a long (1 year) period, it is possible to improve the predictive accuracy with no need for recalibration. Our research shows for the first time that if the ML algorithm is chosen specific to the use case, low-cost solution-processed cyber-nanomaterial detectors can be practically implemented under diverse operational requirements, despite their inherent variabilities.
Tasks	Bayesian Inference
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11751v3
PDF	https://arxiv.org/pdf/1912.11751v3.pdf
PWC	https://paperswithcode.com/paper/development-of-use-specific-high-performance
Repo	https://github.com/ostadabbas/Machine-Learning-for-Precise-Optical-Wavelength-Estimation
Framework	pytorch

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds


Title	RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
Authors	Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, Andrew Markham
Abstract	We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches are only able to be trained and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200X faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI.
Tasks	3D Semantic Segmentation, Semantic Segmentation
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11236v2
PDF	https://arxiv.org/pdf/1911.11236v2.pdf
PWC	https://paperswithcode.com/paper/191111236
Repo	https://github.com/QingyongHu/RandLA-Net
Framework	tf

Conditional LSTM-GAN for Melody Generation from Lyrics


Title	Conditional LSTM-GAN for Melody Generation from Lyrics
Authors	Yi Yu, Simon Canales
Abstract	Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, which enables learning and discovering the latent relationship between interesting lyrics and accompanying melody. Unfortunately, the limited availability of paired lyrics-melody datasets with alignment information has hindered the research progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs, each with paired lyrics and melody alignment, through leveraging different music sources where the alignment relationship between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, conditional Long Short-Term Memory - Generative Adversarial Network (LSTM-GAN) for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator both conditioned on lyrics. In particular, lyrics-conditioned melody and the alignment relationship between syllables of given lyrics and notes of predicted melody are generated simultaneously. Experimental results have proved the effectiveness of our proposed lyrics-to-melody generative model, where plausible and tuneful sequences can be inferred from lyrics.
Tasks
Published	2019-08-15
URL	https://arxiv.org/abs/1908.05551v1
PDF	https://arxiv.org/pdf/1908.05551v1.pdf
PWC	https://paperswithcode.com/paper/conditional-lstm-gan-for-melody-generation
Repo	https://github.com/rachit221195/melody-generation-from-lyrics
Framework	pytorch

Boosting Scene Character Recognition by Learning Canonical Forms of Glyphs


Title	Boosting Scene Character Recognition by Learning Canonical Forms of Glyphs
Authors	Yizhi Wang, Zhouhui Lian, Yingmin Tang, Jianguo Xiao
Abstract	As one of the fundamental problems in document analysis, scene character recognition has attracted considerable interests in recent years. But the problem is still considered to be extremely challenging due to many uncontrollable factors including glyph transformation, blur, noisy background, uneven illumination, etc. In this paper, we propose a novel methodology for boosting scene character recognition by learning canonical forms of glyphs, based on the fact that characters appearing in scene images are all derived from their corresponding canonical forms. Our key observation is that more discriminative features can be learned by solving specially-designed generative tasks compared to traditional classification-based feature learning frameworks. Specifically, we design a GAN-based model to make the learned deep feature of a given scene character be capable of reconstructing corresponding glyphs in a number of standard font styles. In this manner, we obtain deep features for scene characters that are more discriminative in recognition and less sensitive against the above-mentioned factors. Our experiments conducted on several publicly-available databases demonstrate the superiority of our method compared to the state of the art.
Tasks
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05577v2
PDF	https://arxiv.org/pdf/1907.05577v2.pdf
PWC	https://paperswithcode.com/paper/boosting-scene-character-recognition-by
Repo	https://github.com/Actasidiot/CGRN
Framework	tf

CenterFace: Joint Face Detection and Alignment Using Face as Point


Title	CenterFace: Joint Face Detection and Alignment Using Face as Point
Authors	Yuanyuan Xu, Wan Yan, Haixin Sun, Genke Yang, Jiliang Luo
Abstract	Face detection and alignment in unconstrained environment is always deployed on edge devices which have limited memory storage and low computing power. This paper proposes a one-stage method named CenterFace to simultaneously predict facial box and landmark location with real-time speed and high accuracy. The proposed method also belongs to the anchor free category. This is achieved by: (a) learning face existing possibility by the semantic maps, (b) learning bounding box, offsets and five landmarks for each position that potentially contains a face. Specifically, the method can run in real-time on a single CPU core and at 200 FPS using NVIDIA 2080TI for VGA-resolution images, and can simultaneously achieve superior accuracy (WIDER FACE Val/Test-Easy: 0.935/0.932, Medium: 0.924/0.921, Hard: 0.875/0.873 and FDDB discontinuous: 0.980, continuous: 0.732). A demo of CenterFace is available at https://github.com/Star-Clouds/CenterFace.
Tasks	Face Detection
Published	2019-11-09
URL	https://arxiv.org/abs/1911.03599v1
PDF	https://arxiv.org/pdf/1911.03599v1.pdf
PWC	https://paperswithcode.com/paper/centerface-joint-face-detection-and-alignment
Repo	https://github.com/Star-Clouds/CenterFace
Framework	none

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization


Title	ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization
Authors	Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh
Abstract	We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems that covers both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al, 2017) and consist of two steps: a proximal gradient and an averaging step making them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption of the nonconvex objective term and additional bounded variance assumption if applied to expectation problems. They work with both constant and adaptive step-sizes, while allowing single sample and mini-batches. In all these cases, we prove that our algorithms can achieve the best-known complexity bounds. One key step of our methods is new constant and adaptive step-sizes that help to achieve desired complexity bounds while improving practical performance. Our constant step-size is much larger than existing methods including proximal SVRG schemes in the single sample case. We also specify the algorithm to the non-composite case that covers existing state-of-the-arts in terms of complexity bounds. Our update also allows one to trade-off between step-sizes and mini-batch sizes to improve performance. We test the proposed algorithms on two composite nonconvex problems and neural networks using several well-known datasets.
Tasks
Published	2019-02-15
URL	http://arxiv.org/abs/1902.05679v2
PDF	http://arxiv.org/pdf/1902.05679v2.pdf
PWC	https://paperswithcode.com/paper/proxsarah-an-efficient-algorithmic-framework
Repo	https://github.com/unc-optimization/StochasticProximalMethods
Framework	tf
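
The two-step update described in the abstract (a proximal gradient step on the SARAH estimator, followed by an averaging step) can be sketched on a scalar toy problem. Everything concrete here is an assumption for illustration: the least-squares components with an L1 penalty (a convex stand-in, not one of the paper's nonconvex test problems), the step size `eta`, the averaging weight `gamma`, and the loop lengths. None of these values come from the paper's analysis.

```python
import random

def soft_threshold(x, tau):
    """Proximal operator of tau * |x|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def grad_i(w, a, b):
    """Gradient of one component f_i(w) = 0.5 * (a * w - b) ** 2."""
    return a * (a * w - b)

def objective(w, data, lam):
    """Average of the smooth components plus the L1 penalty."""
    smooth = sum(0.5 * (a * w - b) ** 2 for a, b in data) / len(data)
    return smooth + lam * abs(w)

def prox_sarah(w, data, lam, eta, gamma, epochs, inner_steps, rng):
    for _ in range(epochs):
        # Each epoch restarts the recursive estimator from a full gradient.
        v = sum(grad_i(w, a, b) for a, b in data) / len(data)
        w_prev = w
        w_hat = soft_threshold(w - eta * v, eta * lam)   # proximal gradient step
        w = (1 - gamma) * w + gamma * w_hat              # averaging step
        for _ in range(inner_steps):
            a, b = rng.choice(data)                      # single-sample variant
            # SARAH recursion: correct v with the gradient difference at one sample.
            v += grad_i(w, a, b) - grad_i(w_prev, a, b)
            w_prev = w
            w_hat = soft_threshold(w - eta * v, eta * lam)
            w = (1 - gamma) * w + gamma * w_hat
    return w

# Toy data: 20 components that all prefer w = 2 before the penalty is applied.
data = [(0.5 + i / 19, 2 * (0.5 + i / 19)) for i in range(20)]
lam = 0.1
rng = random.Random(0)

w_final = prox_sarah(0.0, data, lam, eta=0.05, gamma=0.5,
                     epochs=20, inner_steps=10, rng=rng)
print("objective at start:", objective(0.0, data, lam))
print("objective at end:  ", objective(w_final, data, lam))
```

The averaging step is what separates this scheme from a plain proximal step on the same estimator: the iterate moves only a fraction `gamma` of the way toward the proximal point.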

Convolutional Neural Network with Median Layers for Denoising Salt-and-Pepper Contaminations


Title	Convolutional Neural Network with Median Layers for Denoising Salt-and-Pepper Contaminations
Authors	Luming Liang, Sen Deng, Lionel Gueguen, Mingqiang Wei, Xinming Wu, Jing Qin
Abstract	We propose a deep fully convolutional neural network with a new type of layer, named median layer, to restore images contaminated by the salt-and-pepper (s&p) noise. A median layer simply performs median filtering on all feature channels. By adding this kind of layer into some widely used fully convolutional deep neural networks, we develop an end-to-end network that removes the extremely high-level s&p noise without performing any non-trivial preprocessing tasks, which is different from all the existing literature in s&p noise removal. Experiments show that inserting median layers into a simple fully-convolutional network with the L2 loss significantly boosts the signal-to-noise ratio. Quantitative comparisons testify that our network outperforms the state-of-the-art methods with a limited amount of training data. The source code has been released for public evaluation and use (https://github.com/llmpass/medianDenoise).
Tasks	Denoising
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06452v1
PDF	https://arxiv.org/pdf/1908.06452v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-neural-network-with-median
Repo	https://github.com/llmpass/medianDenoise
Framework	tf
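
A median layer as described in the abstract (median filtering applied to every feature channel) can be sketched without any deep learning framework. The 3 x 3 window, the edge-replication padding, and the nested-list `[channel][row][column]` layout are assumptions made for this illustration, not details taken from the released code.

```python
from statistics import median

def median_layer(features, k=3):
    """Apply a k x k median filter to every channel of a [C][H][W] nested list.

    Borders are handled by clamping coordinates (edge replication).
    """
    r = k // 2
    out = []
    for channel in features:
        h, w = len(channel), len(channel[0])
        filtered = []
        for y in range(h):
            row = []
            for x in range(w):
                patch = [
                    channel[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1)
                    for dx in range(-r, r + 1)
                ]
                row.append(median(patch))
            filtered.append(row)
        out.append(filtered)
    return out

# One 4 x 4 channel of constant grey with a salt pixel (1.0) and a pepper pixel (0.0).
channel = [[0.5] * 4 for _ in range(4)]
channel[1][1] = 1.0
channel[2][3] = 0.0

restored = median_layer([channel])
for row in restored[0]:
    print(row)
```

Because the layer has no trainable parameters, it can be placed between convolutional layers without changing what the rest of the network has to learn.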