Paper Group AWR 32
Subject Cross Validation in Human Activity Recognition
Title | Subject Cross Validation in Human Activity Recognition |
Authors | Akbar Dehghani, Tristan Glatard, Emad Shihab |
Abstract | K-fold Cross Validation is commonly used to evaluate classifiers and tune their hyperparameters. However, it assumes that data points are Independent and Identically Distributed (i.i.d.) so that samples used in the training and test sets can be selected randomly and uniformly. In Human Activity Recognition datasets, we note that the samples produced by the same subjects are likely to be correlated due to diverse factors. Hence, k-fold cross validation may overestimate the performance of activity recognizers, in particular when overlapping sliding windows are used. In this paper, we investigate the effect of Subject Cross Validation on the performance of Human Activity Recognition, both with non-overlapping and with overlapping sliding windows. Results show that k-fold cross validation artificially increases the performance of recognizers by about 10%, and even by 16% when overlapping windows are used. In addition, we do not observe any performance gain from the use of overlapping windows. We conclude that Human Activity Recognition systems should be evaluated by Subject Cross Validation, and that overlapping windows are not worth their extra computational cost. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02666v2 |
http://arxiv.org/pdf/1904.02666v2.pdf | |
PWC | https://paperswithcode.com/paper/subject-cross-validation-in-human-activity |
Repo | https://github.com/big-data-lab-team/paper-generalizability-window-size |
Framework | none |
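As a quick illustration of the evaluation issue discussed above, the sketch below contrasts standard k-fold cross-validation with subject-wise cross-validation using scikit-learn's `GroupKFold`. The synthetic data, windowing, and classifier are placeholders, not the paper's pipeline.

```python
# Hedged sketch: k-fold vs. subject cross-validation on synthetic "windowed" data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, windows_per_subject, n_features = 10, 50, 12
X = rng.normal(size=(n_subjects * windows_per_subject, n_features))
y = rng.integers(0, 4, size=X.shape[0])                          # 4 activity classes
groups = np.repeat(np.arange(n_subjects), windows_per_subject)   # subject id per window

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Standard k-fold: windows from the same subject can land in both train and test folds.
kfold_acc = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Subject cross-validation: all windows from a subject stay in the same fold.
subject_acc = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"k-fold accuracy:     {kfold_acc.mean():.3f}")
print(f"subject CV accuracy: {subject_acc.mean():.3f}")
```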
Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis
Title | Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis |
Authors | Zongwei Zhou, Vatsal Sodha, Md Mahfuzur Rahman Siddiquee, Ruibin Feng, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang |
Abstract | Transfer learning from natural images to medical images has become established as one of the most practical paradigms in deep learning for medical image analysis. However, to fit this paradigm, 3D imaging tasks in the most prominent imaging modalities (e.g., CT and MRI) have to be reformulated and solved in 2D, losing rich 3D anatomical information and inevitably compromising performance. To overcome this limitation, we have built a set of models, called Generic Autodidactic Models, nicknamed Models Genesis, because they are created ex nihilo (with no manual labeling), self-taught (learned by self-supervision), and generic (serving as source models for generating application-specific target models). Our extensive experiments demonstrate that our Models Genesis significantly outperform learning from scratch in all five target 3D applications covering both segmentation and classification. More importantly, while learning a model from scratch in 3D may not necessarily yield better performance than transfer learning from ImageNet in 2D, our Models Genesis consistently top all 2D approaches, including fine-tuning models pre-trained on ImageNet as well as fine-tuning the 2D versions of our Models Genesis, confirming the importance of 3D anatomical information and the significance of our Models Genesis for 3D medical imaging. This performance is attributed to our unified self-supervised learning framework, built on a simple yet powerful observation: the sophisticated yet recurrent anatomy in medical images can serve as strong supervision signals for deep models to learn common anatomical representations automatically via self-supervision. As open science, all pre-trained Models Genesis are available at https://github.com/MrGiovanni/ModelsGenesis. |
Tasks | Brain Tumor Segmentation, Liver Segmentation, Lung Nodule Detection, Lung Nodule Segmentation, Pulmonary Embolism Detection, Transfer Learning |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06912v1 |
https://arxiv.org/pdf/1908.06912v1.pdf | |
PWC | https://paperswithcode.com/paper/models-genesis-generic-autodidactic-models |
Repo | https://github.com/cswin/AWC |
Framework | pytorch |
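The core idea is a restoration pretext task: corrupt a 3D sub-volume and train an encoder-decoder to recover the original. The sketch below illustrates this with one simplified transformation (local pixel shuffling) and a toy 3D network; it is an assumption-laden illustration, not the released Models Genesis code.

```python
# Hedged sketch of self-supervised restoration pretraining on 3D sub-volumes.
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_pixel_shuffle(vol, num_blocks=50, block=8):
    """Shuffle voxels inside small random windows of one (C, D, H, W) volume."""
    vol = vol.clone()
    _, d, h, w = vol.shape
    for _ in range(num_blocks):
        z = torch.randint(0, d - block, (1,)).item()
        y = torch.randint(0, h - block, (1,)).item()
        x = torch.randint(0, w - block, (1,)).item()
        patch = vol[:, z:z+block, y:y+block, x:x+block]
        perm = torch.randperm(patch.numel())
        vol[:, z:z+block, y:y+block, x:x+block] = patch.reshape(-1)[perm].reshape(patch.shape)
    return vol

# Toy 3D encoder-decoder standing in for the U-Net-style model used in the paper.
model = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

volumes = torch.rand(2, 1, 32, 64, 64)                       # fake CT sub-volumes
corrupted = torch.stack([local_pixel_shuffle(v) for v in volumes])
loss = F.mse_loss(model(corrupted), volumes)                 # restore the original volume
loss.backward()
opt.step()
```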
BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification
Title | BreizhCrops: A Satellite Time Series Dataset for Crop Type Identification |
Authors | Marc Rußwurm, Sébastien Lefèvre, Marco Körner |
Abstract | This dataset challenges the time series community with the task of satellite-based vegetation identification on a large-scale, real-world dataset of satellite data acquired over one entire year. It consists of time series data with associated crop types from 580k field parcels in Brittany, France (Breizh in the local language). Along with this dataset, we provide results and code for a Long Short-Term Memory network and a Transformer network as baselines. We release the dataset, along with preprocessing scripts and baseline models, at https://github.com/TUM-LMF/BreizhCrops and encourage researchers to benchmark and develop novel methods for satellite-based crop monitoring. |
Tasks | Time Series |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11893v1 |
https://arxiv.org/pdf/1905.11893v1.pdf | |
PWC | https://paperswithcode.com/paper/breizhcrops-a-satellite-time-series-dataset |
Repo | https://github.com/TUM-LMF/BreizhCrops |
Framework | pytorch |
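A minimal PyTorch sketch of the kind of LSTM baseline mentioned above, classifying a parcel's crop type from its per-acquisition spectral band values. The number of bands, time steps, and classes are assumptions, not the dataset's exact specification.

```python
# Hedged sketch: LSTM classifier over parcel-level satellite time series.
import torch
import torch.nn as nn

class LSTMCropClassifier(nn.Module):
    def __init__(self, n_bands=13, hidden=128, n_classes=9):
        super().__init__()
        self.lstm = nn.LSTM(n_bands, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, bands)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # classify from the last time step

model = LSTMCropClassifier()
x = torch.rand(32, 45, 13)                     # 32 parcels, 45 acquisitions, 13 bands (assumed)
logits = model(x)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 9, (32,)))
loss.backward()
```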
Integrating Relation Constraints with Neural Relation Extractors
Title | Integrating Relation Constraints with Neural Relation Extractors |
Authors | Yuan Ye, Yansong Feng, Bingfeng Luo, Yuxuan Lai, Dongyan Zhao |
Abstract | Recent years have seen rapid progress in identifying predefined relationships between entity pairs using neural networks (NNs). However, such models often make predictions for each entity pair individually, and thus often fail to resolve inconsistencies among different predictions, which can be characterized by discrete relation constraints. These constraints are often defined over combinations of entity-relation-entity triples, since the relations often lack explicitly well-defined type and cardinality requirements. In this paper, we propose a unified framework to integrate relation constraints with NNs by introducing a new loss term, ConstraintLoss. In particular, we develop two efficient methods to capture how well the local predictions from multiple instance pairs satisfy the relation constraints. Experiments on both English and Chinese datasets show that our approach can help NNs learn from discrete relation constraints to reduce inconsistency among local predictions, and outperform popular neural relation extraction (NRE) models, even when they are enhanced with extra post-processing. Our source code and datasets will be released at https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020. |
Tasks | Relation Extraction |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11493v1 |
https://arxiv.org/pdf/1911.11493v1.pdf | |
PWC | https://paperswithcode.com/paper/integrating-relation-constraints-with-neural |
Repo | https://github.com/PKUYeYuan/Constraint-Loss-AAAI-2020 |
Framework | none |
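To make the loss-term idea concrete, here is a hedged sketch of adding a soft constraint penalty to the usual cross-entropy. The specific constraint (two relations treated as mutually exclusive for entity pairs assumed to share a head entity) and all dimensions are illustrative assumptions, not the paper's constraint sets or methods.

```python
# Hedged sketch: cross-entropy plus a soft relation-constraint penalty.
import torch
import torch.nn.functional as F

def constraint_loss(probs_a, probs_b, rel_a, rel_b):
    """Penalty that grows when relations rel_a and rel_b are both predicted with
    high probability for two entity pairs assumed to share a head entity."""
    return (probs_a[:, rel_a] * probs_b[:, rel_b]).mean()

n_relations = 53                                       # assumed label-set size
logits_a = torch.randn(16, n_relations, requires_grad=True)
logits_b = torch.randn(16, n_relations, requires_grad=True)
labels_a = torch.randint(0, n_relations, (16,))
labels_b = torch.randint(0, n_relations, (16,))

ce = F.cross_entropy(logits_a, labels_a) + F.cross_entropy(logits_b, labels_b)
penalty = constraint_loss(logits_a.softmax(-1), logits_b.softmax(-1), rel_a=3, rel_b=7)
loss = ce + 0.5 * penalty                              # weighted constraint term added to the usual loss
loss.backward()
```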
Speaker Adaptive Training using Model Agnostic Meta-Learning
Title | Speaker Adaptive Training using Model Agnostic Meta-Learning |
Authors | Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals |
Abstract | Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions. Conventionally, model-based speaker adaptive training is performed by having a set of speaker dependent parameters that are jointly optimised with speaker independent parameters in order to remove speaker variation. However, this does not scale well if all neural network weights are to be adapted to the speaker. In this paper we formulate speaker adaptive training as a meta-learning task, in which an adaptation process using gradient descent is encoded directly into the training of the model. We compare our approach with test-only adaptation of a standard baseline model and a SAT-LHUC model with a learned speaker adaptation schedule and demonstrate that the meta-learning approach achieves comparable results. |
Tasks | Meta-Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10605v1 |
https://arxiv.org/pdf/1910.10605v1.pdf | |
PWC | https://paperswithcode.com/paper/speaker-adaptive-training-using-model |
Repo | https://github.com/ondrejklejch/learning_to_adapt |
Framework | tf |
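A first-order (FOMAML-style) sketch of the idea: adapt a copy of the acoustic model on a speaker's adaptation data with one gradient step, then use the adapted model's loss on that speaker's held-out data to update the shared initialization. The data loader, model, and dimensions are hypothetical, and the paper's actual recipe and second-order details are not reproduced.

```python
# Hedged, first-order sketch of speaker adaptive training as meta-learning.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

acoustic_model = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 100))
meta_opt = torch.optim.Adam(acoustic_model.parameters(), lr=1e-3)
inner_lr = 0.01

def speaker_batches():
    """Hypothetical loader: yields (adaptation, test) batches for one speaker at a time."""
    for _ in range(4):
        yield (torch.rand(8, 40), torch.randint(0, 100, (8,))), \
              (torch.rand(8, 40), torch.randint(0, 100, (8,)))

meta_opt.zero_grad()
for (x_adapt, y_adapt), (x_test, y_test) in speaker_batches():
    learner = copy.deepcopy(acoustic_model)            # speaker-specific copy
    # Inner loop: one gradient-descent adaptation step on the speaker's adaptation data.
    adapt_loss = F.cross_entropy(learner(x_adapt), y_adapt)
    grads = torch.autograd.grad(adapt_loss, learner.parameters())
    with torch.no_grad():
        for p, g in zip(learner.parameters(), grads):
            p -= inner_lr * g
    # Outer loop: evaluate the adapted model and push its (first-order) gradients back.
    test_loss = F.cross_entropy(learner(x_test), y_test)
    test_loss.backward()
    for p, q in zip(acoustic_model.parameters(), learner.parameters()):
        p.grad = q.grad if p.grad is None else p.grad + q.grad
meta_opt.step()
```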
Meta-learning for fast classifier adaptation to new users of Signature Verification systems
Title | Meta-learning for fast classifier adaptation to new users of Signature Verification systems |
Authors | Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira |
Abstract | Offline handwritten signature verification presents a challenging pattern recognition problem, where only knowledge of the positive class is available for training. While classifiers have access to a few genuine signatures for training, during generalization they also need to discriminate forgeries. This is particularly challenging for skilled forgeries, where a forger practices imitating the user's signature and is often able to create forgeries visually close to the original signatures. Most work in the literature addresses this issue by training for a surrogate objective: discriminating genuine signatures of a user and random forgeries (signatures from other users). In this work, we propose a solution to this problem based on meta-learning, where there are two levels of learning: a task level (where a task is to learn a classifier for a given user) and a meta level (learning across tasks). In particular, the meta-learner guides the adaptation (learning) of a classifier for each user, which is a lightweight operation that only requires genuine signatures. The meta-learning procedure learns what is common for classification across different users. In a scenario where skilled forgeries from a subset of users are available, the meta-learner can guide classifiers to be discriminative of skilled forgeries even if the classifiers themselves do not use skilled forgeries for learning. Experiments conducted on the GPDS-960 dataset show improved performance compared to Writer-Independent systems and results comparable to state-of-the-art Writer-Dependent systems in the regime of few samples per user (5 reference signatures). |
Tasks | Meta-Learning |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08060v1 |
https://arxiv.org/pdf/1910.08060v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-fast-classifier-adaptation |
Repo | https://github.com/luizgh/sigver |
Framework | pytorch |
Deep Back-Projection Networks for Single Image Super-resolution
Title | Deep Back-Projection Networks for Single Image Super-resolution |
Authors | Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita |
Abstract | Previous feed-forward architectures of recently proposed deep super-resolution networks learn the features of low-resolution inputs and the non-linear mapping from those to a high-resolution output. However, this approach does not fully address the mutual dependencies of low- and high-resolution images. We propose Deep Back-Projection Networks (DBPN), the winner of two image super-resolution challenges (NTIRE2018 and PIRM2018), which exploit iterative up- and down-sampling layers. These layers are formed as a unit providing an error feedback mechanism for projection errors. We construct mutually-connected up- and down-sampling units, each of which represents different types of image degradation and high-resolution components. We also show that extending this idea to several variants that apply the latest deep network trends, such as recurrent networks, dense connections, and residual learning, further improves performance. The experiments yield superior results, in particular establishing new state-of-the-art results across multiple datasets, especially for large scaling factors such as 8x. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.05677v1 |
http://arxiv.org/pdf/1904.05677v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-back-projection-networks-for-single |
Repo | https://github.com/alterzero/DBPN-Pytorch |
Framework | pytorch |
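The sketch below shows one up-projection unit in the spirit of DBPN: up-sample, project back down, and use the back-projection error to correct the up-sampled features. The 2x kernel/stride/padding settings follow common choices and are assumptions rather than a copy of the official implementation.

```python
# Hedged sketch of a DBPN-style up-projection unit (2x scale).
import torch
import torch.nn as nn

class UpProjection(nn.Module):
    def __init__(self, channels=64, kernel=6, stride=2, padding=2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(channels, channels, kernel, stride, padding)
        self.down = nn.Conv2d(channels, channels, kernel, stride, padding)
        self.up2 = nn.ConvTranspose2d(channels, channels, kernel, stride, padding)
        self.act = nn.PReLU()

    def forward(self, lr):
        hr0 = self.act(self.up1(lr))       # project LR features up
        lr0 = self.act(self.down(hr0))     # project back down
        err = lr0 - lr                     # back-projection error in LR space
        hr1 = self.act(self.up2(err))      # project the error up
        return hr0 + hr1                   # error-corrected HR features

feat = torch.rand(1, 64, 32, 32)
print(UpProjection()(feat).shape)          # torch.Size([1, 64, 64, 64])
```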
Multimodal Speech Emotion Recognition and Ambiguity Resolution
Title | Multimodal Speech Emotion Recognition and Ambiguity Resolution |
Authors | Gaurav Sahu |
Abstract | Identifying emotion from speech is a non-trivial task, owing in part to the ambiguous definition of emotion itself. In this work, we adopt a feature-engineering-based approach to tackle the task of speech emotion recognition. Formalizing our problem as a multi-class classification problem, we compare the performance of two categories of models. For both, we extract eight hand-crafted features from the audio signal. In the first approach, the extracted features are used to train six traditional machine learning classifiers, whereas the second approach is based on deep learning, wherein a baseline feed-forward neural network and an LSTM-based classifier are trained over the same features. In order to resolve ambiguity in communication, we also include features from the text domain. We report accuracy, F-score, precision, and recall for the different experimental settings in which we evaluated our models. Overall, we show that lighter machine learning models trained over a few hand-crafted features are able to achieve performance comparable to the current deep-learning-based state-of-the-art method for emotion recognition. |
Tasks | Emotion Recognition, Feature Engineering, Multimodal Emotion Recognition, Speech Emotion Recognition |
Published | 2019-04-12 |
URL | http://arxiv.org/abs/1904.06022v1 |
http://arxiv.org/pdf/1904.06022v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-speech-emotion-recognition-and |
Repo | https://github.com/Demfier/multimodal-speech-emotion-recognition |
Framework | pytorch |
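A small sketch of the feature-engineering route described above: compute a handful of hand-crafted audio features and train a light classifier. The features shown are common stand-ins rather than the paper's exact eight, and the file paths and labels are hypothetical.

```python
# Hedged sketch: hand-crafted audio features feeding a light classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def extract_features(path):
    y, sr = librosa.load(path, sr=16000)
    return np.array([
        librosa.feature.rms(y=y).mean(),                       # energy
        librosa.feature.zero_crossing_rate(y).mean(),          # noisiness
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(),  # brightness
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(),    # rough timbre summary
    ])

# Hypothetical file list and labels (replace with real utterance paths and emotion ids).
paths, labels = ["angry_001.wav", "sad_002.wav"], [0, 1]
X = np.stack([extract_features(p) for p in paths])
clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```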
Development of Use-specific High Performance Cyber-Nanomaterial Optical Detectors by Effective Choice of Machine Learning Algorithms
Title | Development of Use-specific High Performance Cyber-Nanomaterial Optical Detectors by Effective Choice of Machine Learning Algorithms |
Authors | Davoud Hejazi, Shuangjun Liu, Amirreza Farnoosh, Sarah Ostadabbas, Swastik Kar |
Abstract | Due to their inherent variabilities, nanomaterial-based sensors are challenging to translate into real-world applications, where reliability/reproducibility is key. Recently we showed that Bayesian inference can be employed on engineered variability in layered nanomaterial-based optical transmission filters to determine optical wavelengths with high accuracy/precision. In many practical applications, the sensing cost/speed and long-term reliability can be equally or more important considerations. Though various machine learning tools are frequently used on sensor/detector networks to address these, their effectiveness on nanomaterial-based sensors has not been explored. Here we show that the best choice of ML algorithm in a cyber-nanomaterial detector is mainly determined by specific use considerations, e.g., accuracy, computational cost, speed, and resilience against drift/ageing effects. When sufficient data/computing resources are provided, the highest sensing accuracy can be achieved by the kNN and Bayesian inference algorithms, but these can be computationally expensive for real-time applications. In contrast, artificial neural networks are computationally expensive to train, but provide the fastest result under testing conditions and remain reasonably accurate. When data is limited, SVMs perform well even with small training sets, while other algorithms show a considerable reduction in accuracy if data is scarce, hence setting a lower limit on the size of the required training data. We show that by tracking/modeling the long-term drift of the detector performance over a large (1 year) period, it is possible to improve the predictive accuracy with no need for recalibration. Our research shows for the first time that if the ML algorithm is chosen specific to the use case, low-cost solution-processed cyber-nanomaterial detectors can be practically implemented under diverse operational requirements, despite their inherent variabilities. |
Tasks | Bayesian Inference |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.11751v3 |
https://arxiv.org/pdf/1912.11751v3.pdf | |
PWC | https://paperswithcode.com/paper/development-of-use-specific-high-performance |
Repo | https://github.com/ostadabbas/Machine-Learning-for-Precise-Optical-Wavelength-Estimation |
Framework | pytorch |
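The sketch below mimics the kind of comparison described above, timing kNN, SVM, and neural-network regressors on a wavelength-estimation task. The synthetic transmission data is purely illustrative and unrelated to the paper's measured spectra.

```python
# Hedged sketch: comparing estimators for wavelength estimation on synthetic data.
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
wavelengths = rng.uniform(400, 800, size=2000)                    # nm
transmission = np.stack([np.sin(wavelengths / w) for w in (50, 75, 110, 160)], axis=1)
transmission += rng.normal(scale=0.02, size=transmission.shape)   # sensor noise

X_tr, X_te, y_tr, y_te = train_test_split(transmission, wavelengths, random_state=0)
for name, model in [("kNN", KNeighborsRegressor(n_neighbors=5)),
                    ("SVM", SVR(C=10.0)),
                    ("MLP", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000))]:
    t0 = time.time()
    model.fit(X_tr, y_tr)
    err = np.abs(model.predict(X_te) - y_te).mean()
    print(f"{name}: mean abs error {err:.1f} nm, fit+predict {time.time() - t0:.2f}s")
```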
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
Title | RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds |
Authors | Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, Andrew Markham |
Abstract | We study the problem of efficient semantic segmentation for large-scale 3D point clouds. By relying on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches are only able to be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computationally and memory efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Extensive experiments show that our RandLA-Net can process 1 million points in a single pass, up to 200x faster than existing approaches. Moreover, our RandLA-Net clearly surpasses state-of-the-art approaches for semantic segmentation on two large-scale benchmarks, Semantic3D and SemanticKITTI. |
Tasks | 3D Semantic Segmentation, Semantic Segmentation |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11236v2 |
https://arxiv.org/pdf/1911.11236v2.pdf | |
PWC | https://paperswithcode.com/paper/191111236 |
Repo | https://github.com/QingyongHu/RandLA-Net |
Framework | tf |
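A minimal sketch of the random point sampling that makes RandLA-Net cheap: each downsampling step simply keeps a random subset of points and their features. The local feature aggregation module is not reproduced here.

```python
# Hedged sketch: one random downsampling step on a toy point cloud.
import torch

def random_sample(points, features, ratio=0.25):
    """points: (N, 3), features: (N, C) -> randomly keep ratio*N points."""
    n_keep = int(points.shape[0] * ratio)
    idx = torch.randperm(points.shape[0])[:n_keep]
    return points[idx], features[idx]

pts = torch.rand(1_000_000, 3)          # one million points in a unit cube
feats = torch.rand(1_000_000, 8)
pts_ds, feats_ds = random_sample(pts, feats)
print(pts_ds.shape, feats_ds.shape)     # torch.Size([250000, 3]) torch.Size([250000, 8])
```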
Conditional LSTM-GAN for Melody Generation from Lyrics
Title | Conditional LSTM-GAN for Melody Generation from Lyrics |
Authors | Yi Yu, Simon Canales |
Abstract | Melody generation from lyrics has been a challenging research issue in the field of artificial intelligence and music, as it requires learning and discovering latent relationships between lyrics and the accompanying melody. Unfortunately, the limited availability of paired lyrics-melody datasets with alignment information has hindered research progress. To address this problem, we create a large dataset consisting of 12,197 MIDI songs, each with paired lyrics and melody alignment, by leveraging different music sources from which the alignment between syllables and music attributes is extracted. Most importantly, we propose a novel deep generative model, the conditional Long Short-Term Memory Generative Adversarial Network (LSTM-GAN), for melody generation from lyrics, which contains a deep LSTM generator and a deep LSTM discriminator, both conditioned on lyrics. In particular, the lyrics-conditioned melody and the alignment between syllables of the given lyrics and notes of the predicted melody are generated simultaneously. Experimental results demonstrate the effectiveness of our proposed lyrics-to-melody generative model, from which plausible and tuneful sequences can be inferred. |
Tasks | |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05551v1 |
https://arxiv.org/pdf/1908.05551v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-lstm-gan-for-melody-generation |
Repo | https://github.com/rachit221195/melody-generation-from-lyrics |
Framework | pytorch |
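A hedged sketch of a lyrics-conditioned LSTM generator: at each step the syllable embedding is concatenated with noise, and the LSTM emits note attributes (e.g., pitch, duration, rest). All dimensions and the output parameterization are assumptions, and the discriminator and GAN training loop are omitted.

```python
# Hedged sketch: lyrics-conditioned LSTM generator producing one note per syllable.
import torch
import torch.nn as nn

class ConditionalLSTMGenerator(nn.Module):
    def __init__(self, syllable_dim=20, noise_dim=16, hidden=128, note_dim=3):
        super().__init__()
        self.noise_dim = noise_dim
        self.lstm = nn.LSTM(syllable_dim + noise_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, note_dim)

    def forward(self, syllable_emb):                       # (batch, T, syllable_dim)
        b, t, _ = syllable_emb.shape
        noise = torch.randn(b, t, self.noise_dim)          # fresh noise at every syllable
        h, _ = self.lstm(torch.cat([syllable_emb, noise], dim=-1))
        return self.out(h)                                 # note attribute vector per syllable

gen = ConditionalLSTMGenerator()
melody = gen(torch.rand(4, 30, 20))                        # 4 lyric lines, 30 syllables each
print(melody.shape)                                        # torch.Size([4, 30, 3])
```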
Boosting Scene Character Recognition by Learning Canonical Forms of Glyphs
Title | Boosting Scene Character Recognition by Learning Canonical Forms of Glyphs |
Authors | Yizhi Wang, Zhouhui Lian, Yingmin Tang, Jianguo Xiao |
Abstract | As one of the fundamental problems in document analysis, scene character recognition has attracted considerable interest in recent years. However, the problem is still considered extremely challenging due to many uncontrollable factors including glyph transformation, blur, noisy background, uneven illumination, etc. In this paper, we propose a novel methodology for boosting scene character recognition by learning canonical forms of glyphs, based on the fact that characters appearing in scene images are all derived from their corresponding canonical forms. Our key observation is that more discriminative features can be learned by solving specially-designed generative tasks compared to traditional classification-based feature learning frameworks. Specifically, we design a GAN-based model to make the learned deep feature of a given scene character capable of reconstructing the corresponding glyphs in a number of standard font styles. In this manner, we obtain deep features for scene characters that are more discriminative for recognition and less sensitive to the above-mentioned factors. Our experiments conducted on several publicly available databases demonstrate the superiority of our method compared to the state of the art. |
Tasks | |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05577v2 |
https://arxiv.org/pdf/1907.05577v2.pdf | |
PWC | https://paperswithcode.com/paper/boosting-scene-character-recognition-by |
Repo | https://github.com/Actasidiot/CGRN |
Framework | tf |
CenterFace: Joint Face Detection and Alignment Using Face as Point
Title | CenterFace: Joint Face Detection and Alignment Using Face as Point |
Authors | Yuanyuan Xu, Wan Yan, Haixin Sun, Genke Yang, Jiliang Luo |
Abstract | Face detection and alignment in unconstrained environments are typically deployed on edge devices with limited memory storage and low computing power. This paper proposes a one-stage method named CenterFace to simultaneously predict facial boxes and landmark locations with real-time speed and high accuracy. The proposed method belongs to the anchor-free category. This is achieved by: (a) learning the probability that a face exists via semantic maps, and (b) learning the bounding box, offsets, and five landmarks for each position that potentially contains a face. Specifically, the method runs in real time on a single CPU core and at 200 FPS on an NVIDIA 2080TI for VGA-resolution images, and simultaneously achieves superior accuracy (WIDER FACE Val/Test-Easy: 0.935/0.932, Medium: 0.924/0.921, Hard: 0.875/0.873 and FDDB discontinuous: 0.980, continuous: 0.732). A demo of CenterFace is available at https://github.com/Star-Clouds/CenterFace. |
Tasks | Face Detection |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03599v1 |
https://arxiv.org/pdf/1911.03599v1.pdf | |
PWC | https://paperswithcode.com/paper/centerface-joint-face-detection-and-alignment |
Repo | https://github.com/Star-Clouds/CenterFace |
Framework | none |
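To illustrate the anchor-free "face as point" formulation, the sketch below decodes a center heatmap plus size and offset maps into boxes by taking local maxima. The output stride, threshold, and tensor layouts are assumptions, not the released CenterFace heads.

```python
# Hedged sketch: decoding a center heatmap into face boxes (anchor-free style).
import torch
import torch.nn.functional as F

def decode(heatmap, sizes, offsets, stride=4, top_k=100, thresh=0.35):
    """heatmap: (1, H, W); sizes, offsets: (2, H, W) -> (n, 5) boxes [x1, y1, x2, y2, score]."""
    width = heatmap.shape[-1]
    peaks = (heatmap == F.max_pool2d(heatmap[None], 3, 1, 1)[0]) * heatmap   # max-pool as NMS
    scores, idx = peaks.flatten().topk(top_k)
    keep = scores > thresh
    scores, idx = scores[keep], idx[keep]
    ys, xs = idx // width, idx % width
    w, h = sizes[0, ys, xs], sizes[1, ys, xs]
    cx = (xs.float() + offsets[0, ys, xs]) * stride        # map grid cell back to image pixels
    cy = (ys.float() + offsets[1, ys, xs]) * stride
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, scores], dim=1)

heat = torch.rand(1, 120, 160)                     # stand-in for network outputs after sigmoid
boxes = decode(heat, torch.rand(2, 120, 160) * 60, torch.rand(2, 120, 160))
print(boxes.shape)                                 # (n_detections, 5)
```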
ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization
Title | ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization |
Authors | Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Quoc Tran-Dinh |
Abstract | We propose a new stochastic first-order algorithmic framework to solve stochastic composite nonconvex optimization problems covering both finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al., 2017) and consist of two steps: a proximal gradient step and an averaging step, making them different from existing nonconvex proximal-type algorithms. The algorithms only require an average smoothness assumption on the nonconvex objective term and an additional bounded-variance assumption when applied to expectation problems. They work with both constant and adaptive step sizes, while allowing single samples and mini-batches. In all these cases, we prove that our algorithms achieve the best-known complexity bounds. One key ingredient of our methods is new constant and adaptive step sizes that help achieve the desired complexity bounds while improving practical performance. Our constant step size is much larger than in existing methods, including proximal SVRG schemes, in the single-sample case. We also specialize the algorithms to the non-composite case, covering existing state-of-the-art methods in terms of complexity bounds. Our update also allows one to trade off between step sizes and mini-batch sizes to improve performance. We test the proposed algorithms on two composite nonconvex problems and on neural networks using several well-known datasets. |
Tasks | |
Published | 2019-02-15 |
URL | http://arxiv.org/abs/1902.05679v2 |
http://arxiv.org/pdf/1902.05679v2.pdf | |
PWC | https://paperswithcode.com/paper/proxsarah-an-efficient-algorithmic-framework |
Repo | https://github.com/unc-optimization/StochasticProximalMethods |
Framework | tf |
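A NumPy sketch of the two-step update described above: a SARAH-style recursive gradient estimator, a proximal gradient step, and an averaging step, shown on a toy l1-regularized least-squares problem. Step sizes and loop lengths are illustrative, not the paper's analyzed constants.

```python
# Hedged sketch of a ProxSARAH-style loop on a toy composite problem.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 50)), rng.normal(size=200)
lam, eta, gamma = 0.1, 0.05, 0.9

def grad_i(w, i):                       # stochastic gradient of one component f_i
    return A[i] * (A[i] @ w - b[i])

def prox_l1(x, t):                      # prox of t * lam * ||x||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

w = np.zeros(50)
for epoch in range(20):
    v = A.T @ (A @ w - b) / len(b)      # full gradient at the start of each outer loop
    w_prev = w.copy()
    for _ in range(100):
        i = rng.integers(len(b))
        v = grad_i(w, i) - grad_i(w_prev, i) + v          # SARAH recursive estimator
        w_hat = prox_l1(w - eta * v, eta)                 # proximal gradient step
        w_prev, w = w, (1 - gamma) * w + gamma * w_hat    # averaging step
print("objective:", 0.5 * np.mean((A @ w - b) ** 2) + lam * np.abs(w).sum())
```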
Convolutional Neural Network with Median Layers for Denoising Salt-and-Pepper Contaminations
Title | Convolutional Neural Network with Median Layers for Denoising Salt-and-Pepper Contaminations |
Authors | Luming Liang, Sen Deng, Lionel Gueguen, Mingqiang Wei, Xinming Wu, Jing Qin |
Abstract | We propose a deep fully convolutional neural network with a new type of layer, named the median layer, to restore images contaminated by salt-and-pepper (s&p) noise. A median layer simply performs median filtering on all feature channels. By adding this kind of layer into some widely used fully convolutional deep neural networks, we develop an end-to-end network that removes extremely high-level s&p noise without performing any non-trivial preprocessing tasks, which differs from all existing work on s&p noise removal. Experiments show that inserting median layers into a simple fully convolutional network with the L2 loss significantly boosts the signal-to-noise ratio. Quantitative comparisons show that our network outperforms state-of-the-art methods with a limited amount of training data. The source code has been released for public evaluation and use (https://github.com/llmpass/medianDenoise). |
Tasks | Denoising |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06452v1 |
https://arxiv.org/pdf/1908.06452v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-network-with-median |
Repo | https://github.com/llmpass/medianDenoise |
Framework | tf |
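Since the median layer is just channel-wise median filtering, it is easy to sketch in PyTorch with `unfold`; below it is dropped between ordinary convolutions. The surrounding architecture is a toy, not the released model.

```python
# Hedged sketch: a 3x3 median layer applied to every feature channel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MedianLayer(nn.Module):
    def __init__(self, kernel_size=3):
        super().__init__()
        self.k = kernel_size

    def forward(self, x):                                  # x: (N, C, H, W)
        n, c, h, w = x.shape
        pad = self.k // 2
        patches = F.unfold(x, self.k, padding=pad)         # (N, C*k*k, H*W)
        patches = patches.view(n, c, self.k * self.k, h * w)
        med = patches.median(dim=2).values                 # median over each k x k window
        return med.view(n, c, h, w)

denoiser = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    MedianLayer(),                                         # median filtering on all channels
    nn.Conv2d(32, 1, 3, padding=1),
)
noisy = torch.rand(2, 1, 64, 64)
print(denoiser(noisy).shape)                               # torch.Size([2, 1, 64, 64])
```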