Paper Group ANR 491
Collaborative Learning for Deep Neural Networks. Do Better ImageNet Models Transfer Better?. Boosting for Comparison-Based Learning. Recurrent Multiresolution Convolutional Networks for VHR Image Classification. An empirical evaluation of AMR parsing for legal documents. Spline Error Weighting for Robust Visual-Inertial Fusion. Sign-Perturbed Sums: …
Collaborative Learning for Deep Neural Networks
Title | Collaborative Learning for Deep Neural Networks |
Authors | Guocong Song, Wei Chai |
Abstract | We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It acquires the strengths from auxiliary training, multi-task learning and knowledge distillation. There are two important mechanisms involved in collaborative learning. First, the consensus of multiple views from different classifier heads on the same example provides supplementary information as well as regularization to each classifier, thereby improving generalization. Second, intermediate-level representation (ILR) sharing with backpropagation rescaling aggregates the gradient flows from all heads, which not only reduces training computational complexity, but also facilitates supervision to the shared layers. The empirical results on CIFAR and ImageNet datasets demonstrate that deep neural networks learned as a group in a collaborative way significantly reduce the generalization error and increase the robustness to label noise. |
Tasks | Multi-Task Learning |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11761v2 |
http://arxiv.org/pdf/1805.11761v2.pdf | |
PWC | https://paperswithcode.com/paper/collaborative-learning-for-deep-neural |
Repo | |
Framework | |
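The consensus mechanism described in the abstract can be sketched as a per-head loss that mixes a hard-label cross-entropy with a soft-target term built from the other heads. This is a minimal NumPy sketch, not the authors' implementation: the mixing weight `beta`, the `temperature`, and the choice of averaging the other heads' logits to form the consensus are all assumptions made for illustration, and backpropagation rescaling of the shared layers is not shown.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def collaborative_loss(head_logits, labels, beta=0.5, temperature=2.0):
    """Per-head loss for H >= 2 classifier heads sharing one input batch.

    head_logits: array of shape (H, N, C); labels: int array of shape (N,).
    beta and temperature are hypothetical hyperparameters.
    """
    num_heads, n, _ = head_logits.shape
    losses = np.empty(num_heads)
    for h in range(num_heads):
        # Hard-label cross-entropy for this head.
        p = softmax(head_logits[h])
        hard = -np.log(p[np.arange(n), labels]).mean()
        # Consensus soft target: softened average of the OTHER heads' logits.
        others = np.delete(head_logits, h, axis=0).mean(axis=0)
        q = softmax(others, temperature)
        log_p_soft = np.log(softmax(head_logits[h], temperature))
        soft = -(q * log_p_soft).sum(axis=-1).mean()
        losses[h] = beta * hard + (1.0 - beta) * soft
    return losses
```

At inference time only one head would be kept, which is consistent with the abstract's claim of no extra inference cost.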
Do Better ImageNet Models Transfer Better?
Title | Do Better ImageNet Models Transfer Better? |
Authors | Simon Kornblith, Jonathon Shlens, Quoc V. Le |
Abstract | Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks. However, this hypothesis has never been systematically tested. Here, we compare the performance of 16 classification networks on 12 image classification datasets. We find that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy ($r = 0.99$ and $0.96$, respectively). In the former setting, we find that this relationship is very sensitive to the way in which networks are trained on ImageNet; many common forms of regularization slightly improve ImageNet accuracy but yield penultimate layer features that are much worse for transfer learning. Additionally, we find that, on two small fine-grained image classification datasets, pretraining on ImageNet provides minimal benefits, indicating the learned features from ImageNet do not transfer well to fine-grained tasks. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested. |
Tasks | Fine-Grained Image Classification, Image Classification, Transfer Learning |
Published | 2018-05-23 |
URL | https://arxiv.org/abs/1805.08974v3 |
https://arxiv.org/pdf/1805.08974v3.pdf | |
PWC | https://paperswithcode.com/paper/do-better-imagenet-models-transfer-better |
Repo | |
Framework | |
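The correlation figures quoted in the abstract are of the kind computed below. The abstract does not say exactly how $r$ is obtained, so this sketch assumes a plain Pearson correlation, and the accuracy values are hypothetical placeholders rather than results from the paper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical accuracies for five networks (NOT taken from the paper).
imagenet_top1 = [0.71, 0.74, 0.76, 0.78, 0.80]
transfer_acc = [0.80, 0.83, 0.84, 0.87, 0.88]
r = pearson_r(imagenet_top1, transfer_acc)
```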
Boosting for Comparison-Based Learning
Title | Boosting for Comparison-Based Learning |
Authors | Michaël Perrot, Ulrike von Luxburg |
Abstract | We consider the problem of classification in a comparison-based setting: given a set of objects, we only have access to triplet comparisons of the form “object $x_i$ is closer to object $x_j$ than to object $x_k$.” In this paper we introduce TripletBoost, a new method that can learn a classifier just from such triplet comparisons. The main idea is to aggregate the triplets information into weak classifiers, which can subsequently be boosted to a strong classifier. Our method has two main advantages: (i) it is applicable to data from any metric space, and (ii) it can deal with large scale problems using only passively obtained and noisy triplets. We derive theoretical generalization guarantees and a lower bound on the number of necessary triplets, and we empirically show that our method is both competitive with state of the art approaches and resistant to noise. |
Tasks | |
Published | 2018-10-31 |
URL | https://arxiv.org/abs/1810.13333v2 |
https://arxiv.org/pdf/1810.13333v2.pdf | |
PWC | https://paperswithcode.com/paper/boosting-for-comparison-based-learning |
Repo | |
Framework | |
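To make the idea of turning triplet comparisons into a weak classifier concrete, here is a small sketch. It is only loosely modelled on the abstract and is not the TripletBoost algorithm itself: the oracle is simulated with Euclidean distance, and the majority-vote rule on each side of a reference pair is an assumption chosen for illustration. Function names are hypothetical.

```python
import numpy as np
from collections import Counter

def closer_to_first(x_i, x_j, x_k):
    """Triplet oracle: is x_i closer to x_j than to x_k?

    Simulated here with Euclidean distance; in the comparison-based
    setting only this yes/no answer would be observed.
    """
    return bool(np.linalg.norm(x_i - x_j) < np.linalg.norm(x_i - x_k))

def fit_triplet_stump(X, y, j, k):
    """Weak classifier for the reference pair (j, k).

    Each training point falls on one of two sides, depending on the
    oracle's answer; each side predicts its majority label.
    """
    sides = np.array([closer_to_first(x, X[j], X[k]) for x in X])
    side_labels = {}
    for side in (True, False):
        members = y[sides == side]
        side_labels[side] = (
            Counter(members.tolist()).most_common(1)[0][0] if len(members) else None
        )
    return side_labels

def predict_triplet_stump(side_labels, x, x_j, x_k):
    """Predict the label of x using only one triplet comparison."""
    return side_labels[closer_to_first(x, x_j, x_k)]
```

A boosting procedure would then weight and combine many such stumps built from different reference pairs.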
Recurrent Multiresolution Convolutional Networks for VHR Image Classification
Title | Recurrent Multiresolution Convolutional Networks for VHR Image Classification |
Authors | John Ray Bergado, Claudio Persello, Alfred Stein |
Abstract | Classification of very high resolution (VHR) satellite images has three major challenges: 1) inherent low intra-class and high inter-class spectral similarities, 2) mismatching resolution of available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classification map regularization. These processing stages, however, are not jointly optimizing the classification task at hand. In this study, we propose a single-stage framework embedding the processing stages in a recurrent multiresolution convolutional network trained in an end-to-end manner. The feedforward version of the network, called FuseNet, aims to match the resolution of the panchromatic and multispectral bands in a VHR image using convolutional layers with corresponding downsampling and upsampling operations. Contextual label information is incorporated into FuseNet by means of a recurrent version called ReuseNet. We compared FuseNet and ReuseNet against the use of separate processing steps for both image fusion, e.g. pansharpening and resampling through interpolation, and map regularization such as conditional random fields. We carried out our experiments on a land cover classification task using a Worldview-03 image of Quezon City, Philippines and the ISPRS 2D semantic labeling benchmark dataset of Vaihingen, Germany. FuseNet and ReuseNet surpass the baseline approaches in both quantitative and qualitative results. |
Tasks | Image Classification |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05793v1 |
http://arxiv.org/pdf/1806.05793v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-multiresolution-convolutional |
Repo | |
Framework | |
An empirical evaluation of AMR parsing for legal documents
Title | An empirical evaluation of AMR parsing for legal documents |
Authors | Sinh Vu Trong, Minh Nguyen Le |
Abstract | Many approaches have recently been proposed to tackle the problem of Abstract Meaning Representation (AMR) parsing, which helps solve various natural language processing problems. In our paper, we provide an overview of different AMR parsing methods and their performance when analyzing legal documents. We conduct experiments with different AMR parsers on our annotated dataset extracted from the English version of the Japanese Civil Code. Our results show the limitations of current parsing techniques, and the room for improvement, when they are applied to this complicated domain. |
Tasks | AMR Parsing |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08078v1 |
http://arxiv.org/pdf/1811.08078v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-evaluation-of-amr-parsing-for |
Repo | |
Framework | |
Spline Error Weighting for Robust Visual-Inertial Fusion
Title | Spline Error Weighting for Robust Visual-Inertial Fusion |
Authors | Hannes Ovrén, Per-Erik Forssén |
Abstract | In this paper we derive and test a probability-based weighting that can balance residuals of different types in spline fitting. In contrast to previous formulations, the proposed spline error weighting scheme also incorporates a prediction of the approximation error of the spline fit. We demonstrate the effectiveness of the prediction in a synthetic experiment, and apply it to visual-inertial fusion on rolling shutter cameras. This results in a method that can estimate 3D structure with metric scale on generic first-person videos. We also propose a quality measure for spline fitting, that can be used to automatically select the knot spacing. Experiments verify that the obtained trajectory quality corresponds well with the requested quality. Finally, by linearly scaling the weights, we show that the proposed spline error weighting minimizes the estimation errors on real sequences, in terms of scale and end-point errors. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04820v1 |
http://arxiv.org/pdf/1804.04820v1.pdf | |
PWC | https://paperswithcode.com/paper/spline-error-weighting-for-robust-visual-1 |
Repo | |
Framework | |
Sign-Perturbed Sums: A New System Identification Approach for Constructing Exact Non-Asymptotic Confidence Regions in Linear Regression Models
Title | Sign-Perturbed Sums: A New System Identification Approach for Constructing Exact Non-Asymptotic Confidence Regions in Linear Regression Models |
Authors | Balázs Cs. Csáji, Marco C. Campi, Erik Weyer |
Abstract | We propose a new system identification method, called Sign-Perturbed Sums (SPS), for constructing non-asymptotic confidence regions under mild statistical assumptions. SPS is introduced for linear regression models, including but not limited to FIR systems, and we show that the SPS confidence regions have exact confidence probabilities, i.e., they contain the true parameter with a user-chosen exact probability for any finite data set. Moreover, we also prove that the SPS regions are star convex with the Least-Squares (LS) estimate as a star center. The main assumptions of SPS are that the noise terms are independent and symmetrically distributed about zero, but they can be nonstationary, and their distributions need not be known. The paper also proposes a computationally efficient ellipsoidal outer approximation algorithm for SPS. Finally, SPS is demonstrated through a number of simulation experiments. |
Tasks | |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08216v1 |
http://arxiv.org/pdf/1807.08216v1.pdf | |
PWC | https://paperswithcode.com/paper/sign-perturbed-sums-a-new-system |
Repo | |
Framework | |
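A membership test for an SPS confidence region can be sketched as follows, for the linear regression model $y_t = \varphi_t^\top \theta^* + n_t$. This is a minimal sketch under stated assumptions, not the authors' reference code: the parameter names `m` and `q` (giving confidence $1 - q/m$), the use of a Cholesky factor in place of the matrix square root, and the seeding scheme are choices made here. The random signs and the tie-breaking permutation must be the same for every candidate $\theta$, which the fixed `seed` ensures.

```python
import numpy as np

def sps_contains(theta, Phi, y, m=100, q=5, seed=0):
    """Return True if theta lies in the (1 - q/m) SPS confidence region.

    Phi: (n, d) regressor matrix; y: (n,) outputs.
    """
    rng = np.random.default_rng(seed)
    n, _ = Phi.shape
    R = Phi.T @ Phi / n
    L = np.linalg.cholesky(R)  # R = L L^T, so ||L^{-1} s||^2 = s^T R^{-1} s
    residuals = y - Phi @ theta
    # Row 0 keeps the original signs (reference sum); rows 1..m-1 are perturbed.
    signs = np.vstack([np.ones(n), rng.choice([-1.0, 1.0], size=(m - 1, n))])
    sums = (signs * residuals) @ Phi / n          # shape (m, d)
    norms = (np.linalg.solve(L, sums.T) ** 2).sum(axis=0)
    # Rank of the reference norm in ascending order, ties broken at random.
    tie_break = rng.permutation(m)
    order = np.lexsort((tie_break, norms))
    rank = int(np.where(order == 0)[0][0]) + 1
    return rank <= m - q
```

At the least-squares estimate the reference sum vanishes, so that point is always accepted, in line with the abstract's statement that the LS estimate is a star center of the region.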
Gated Recurrent Unit Based Acoustic Modeling with Future Context
Title | Gated Recurrent Unit Based Acoustic Modeling with Future Context |
Authors | Jie Li, Xiaorui Wang, Yuanyuan Zhao, Yan Li |
Abstract | The use of future contextual information is typically shown to be helpful for acoustic modeling. However, for the recurrent neural network (RNN), it is not easy to model the future temporal context effectively while keeping model latency low. In this paper, we attempt to design an RNN acoustic model that is capable of utilizing the future context effectively and directly, with the model latency and computation cost as low as possible. The proposed model is based on the minimal gated recurrent unit (mGRU) with an input projection layer inserted in it. Two context modules, temporal encoding and temporal convolution, are specifically designed for this architecture to model the future context. Experimental results on the Switchboard task and an internal Mandarin ASR task show that the proposed model performs much better than long short-term memory (LSTM) and mGRU models, while enabling online decoding with a maximum latency of 170 ms. This model even outperforms a very strong baseline, TDNN-LSTM, with smaller model latency and almost half as many parameters. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07024v1 |
http://arxiv.org/pdf/1805.07024v1.pdf | |
PWC | https://paperswithcode.com/paper/gated-recurrent-unit-based-acoustic-modeling |
Repo | |
Framework | |
Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only
Title | Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only |
Authors | Sergey Zakharov, Benjamin Planche, Ziyan Wu, Andreas Hutter, Harald Kosch, Slobodan Ilic |
Abstract | With the increasing availability of large databases of 3D CAD models, depth-based recognition methods can be trained on an uncountable number of synthetically rendered images. However, discrepancies with the real data acquired from various depth sensors still noticeably impede progress. Previous works adopted unsupervised approaches to generate more realistic depth data, but they all require real scans for training, even if unlabeled. This still represents a strong requirement, especially when considering real-life/industrial settings where real training images are hard or impossible to acquire, but texture-less 3D models are available. We thus propose a novel approach leveraging only CAD models to bridge the realism gap. Purely trained on synthetic data, playing against an extensive augmentation pipeline in an unsupervised manner, our generative adversarial network learns to effectively segment depth images and recover the clean synthetic-looking depth information even from partial occlusions. As our solution is not only fully decoupled from the real domains but also from the task-specific analytics, the pre-processed scans can be handed to any kind and number of recognition methods also trained on synthetic data. Through various experiments, we demonstrate how this simplifies their training and consistently enhances their performance, with results on par with the same methods trained on real data, and better than usual approaches doing the reverse mapping. |
Tasks | |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09113v2 |
http://arxiv.org/pdf/1804.09113v2.pdf | |
PWC | https://paperswithcode.com/paper/keep-it-unreal-bridging-the-realism-gap-for |
Repo | |
Framework | |
Data Poisoning Attacks in Contextual Bandits
Title | Data Poisoning Attacks in Contextual Bandits |
Authors | Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu |
Abstract | We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm. |
Tasks | Data Poisoning, Multi-Armed Bandits |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05760v2 |
http://arxiv.org/pdf/1808.05760v2.pdf | |
PWC | https://paperswithcode.com/paper/data-poisoning-attacks-in-contextual-bandits |
Repo | |
Framework | |
State-of-the-Art Economic Load Dispatch of Power Systems Using Particle Swarm Optimization
Title | State-of-the-Art Economic Load Dispatch of Power Systems Using Particle Swarm Optimization |
Authors | Mahamad Nabab Alam |
Abstract | The metaheuristic particle swarm optimization (PSO) algorithm has emerged as one of the most promising optimization techniques for solving highly constrained non-linear and non-convex optimization problems in different areas of electrical engineering. Economic operation of the power system is one of the most important areas of electrical engineering where PSO has been used efficiently to solve various issues of practical systems. In this paper, a comprehensive survey of research on solving various aspects of economic load dispatch (ELD) problems in power system engineering using different types of PSO algorithms is presented. Five important areas of ELD problems have been identified, and the papers published in the general area of ELD using PSO have been classified into these five sections. These five areas are (i) single objective economic load dispatch, (ii) dynamic economic load dispatch, (iii) economic load dispatch with non-conventional sources, (iv) multi-objective environmental/economic dispatch, and (v) economic load dispatch of microgrids. At the end of each category, a table is provided which briefly describes the main features of the papers. Promising directions for future work are given at the conclusion of the review. |
Tasks | |
Published | 2018-12-30 |
URL | http://arxiv.org/abs/1812.11610v1 |
http://arxiv.org/pdf/1812.11610v1.pdf | |
PWC | https://paperswithcode.com/paper/state-of-the-art-economic-load-dispatch-of |
Repo | |
Framework | |
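For readers unfamiliar with the setup, a single-objective ELD problem solved by a basic global-best PSO can be sketched as below. This is an illustrative sketch, not a method from the survey: the quadratic fuel-cost model, the absolute-value penalty for the power-balance constraint, and the inertia and acceleration constants (0.7, 1.5, 1.5) are common textbook choices assumed here, and any generator data passed in would be hypothetical.

```python
import numpy as np

def dispatch_cost(P, a, b, c, demand, penalty=1e3):
    """Quadratic fuel cost plus a penalty for violating sum(P) == demand."""
    fuel = np.sum(a + b * P + c * P ** 2, axis=-1)
    return fuel + penalty * np.abs(P.sum(axis=-1) - demand)

def pso_dispatch(a, b, c, p_min, p_max, demand,
                 n_particles=30, n_iter=200, seed=0):
    """Global-best PSO over generator outputs, clipped to [p_min, p_max]."""
    rng = np.random.default_rng(seed)
    dim = len(a)
    x = rng.uniform(p_min, p_max, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = dispatch_cost(x, a, b, c, demand)
    g = pbest[pbest_cost.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, p_min, p_max)
        cost = dispatch_cost(x, a, b, c, demand)
        improved = cost < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest[pbest_cost.argmin()].copy()
    return g, float(pbest_cost.min())
```

The returned vector always respects the generator limits because of the clipping step; how closely the power balance is met depends on the penalty weight and the number of iterations.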
Machine-learned epidemiology: real-time detection of foodborne illness at scale
Title | Machine-learned epidemiology: real-time detection of foodborne illness at scale |
Authors | Adam Sadilek, Stephanie Caty, Lauren DiPrete, Raed Mansour, Tom Schenk Jr, Mark Bergtholdt, Ashish Jha, Prem Ramaswami, Evgeniy Gabrilovich |
Abstract | Machine learning has become an increasingly powerful tool for solving complex problems, yet it remains underutilized in public health. The objective of this study is to test the efficacy of a machine-learned model of foodborne illness detection in a real-world setting. To this end, we built FINDER, a machine-learned model for real-time detection of foodborne illness using anonymous and aggregated web search and location data. We computed the fraction of people who visited a particular restaurant and later searched for terms indicative of food poisoning to identify potentially unsafe restaurants. We used this information to focus restaurant inspections in two cities and demonstrated that FINDER improves the accuracy of health inspections; restaurants identified by FINDER are 3.1 times as likely to be deemed unsafe during the inspection as restaurants identified by existing methods. Additionally, FINDER enables us to ascertain previously intractable epidemiological information; for example, in 38% of cases the restaurant potentially causing food poisoning was not the last one visited, which may explain the lower precision of complaint-based inspections. We found that FINDER is able to reliably identify restaurants that have an active lapse in food safety, allowing for implementation of corrective actions that would prevent the potential spread of foodborne illness. |
Tasks | Epidemiology |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01813v1 |
http://arxiv.org/pdf/1812.01813v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learned-epidemiology-real-time |
Repo | |
Framework | |
Model Selection Techniques – An Overview
Title | Model Selection Techniques – An Overview |
Authors | Jie Ding, Vahid Tarokh, Yuhong Yang |
Abstract | In the era of big data, analysts usually explore various statistical models or machine learning methods for observed data in order to facilitate scientific discoveries or gain predictive power. Whatever data and fitting procedures are employed, a crucial step is to select the most appropriate model or method from a set of candidates. Model selection is a key ingredient in data analysis for reliable and reproducible statistical inference or prediction, and thus central to scientific studies in fields such as ecology, economics, engineering, finance, political science, biology, and epidemiology. There has been a long history of model selection techniques arising from research in statistics, information theory, and signal processing. A considerable number of methods have been proposed, following different philosophies and exhibiting varying performance. The purpose of this article is to provide a comprehensive overview of them, in terms of their motivation, large sample performance, and applicability. We provide integrated and practically relevant discussions on theoretical properties of state-of-the-art model selection approaches. We also share our thoughts on some controversial views on the practice of model selection. |
Tasks | Epidemiology, Model Selection |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09583v1 |
http://arxiv.org/pdf/1810.09583v1.pdf | |
PWC | https://paperswithcode.com/paper/model-selection-techniques-an-overview |
Repo | |
Framework | |
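As a concrete instance of selecting a model from a set of candidates, the sketch below scores polynomial regressions of increasing degree with AIC and BIC. It assumes Gaussian errors with unknown variance, for which both criteria reduce (up to an additive constant shared by all candidates) to $n \ln(\mathrm{RSS}/n)$ plus a complexity penalty. The polynomial candidate family and the function names are illustrative choices, not taken from the article.

```python
import numpy as np

def information_criteria(x, y, degree):
    """Return (AIC, BIC) for a least-squares polynomial fit of given degree."""
    n = len(y)
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    k = degree + 1  # number of regression coefficients
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return aic, bic

def select_degree(x, y, max_degree=5, criterion="bic"):
    """Pick the polynomial degree that minimizes the chosen criterion."""
    idx = 0 if criterion == "aic" else 1
    scores = [information_criteria(x, y, d)[idx] for d in range(max_degree + 1)]
    return int(np.argmin(scores))
```

Because the BIC penalty per parameter, $\ln n$, exceeds the AIC penalty of 2 once $n \geq 8$, BIC never selects a larger model than AIC on the same candidates (barring exact ties).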
Mechanisms for Integrated Feature Normalization and Remaining Useful Life Estimation Using LSTMs Applied to Hard-Disks
Title | Mechanisms for Integrated Feature Normalization and Remaining Useful Life Estimation Using LSTMs Applied to Hard-Disks |
Authors | Sanchita Basak, Saptarshi Sengupta, Abhishek Dubey |
Abstract | With emerging smart communities, improving overall system availability is becoming a major concern. In order to improve the reliability of the components in a system, we propose an inference model to predict the Remaining Useful Life (RUL) of those components. In this paper we work with components of backend data servers, such as hard disks, that are subject to degradation. A deep Long Short-Term Memory (LSTM) network is used as the backbone of this fast, data-driven decision framework and dynamically captures the pattern of the incoming data. In the article, we discuss the architecture of the neural network and describe the mechanisms to choose the various hyper-parameters. Further, we describe the challenges faced in extracting effective training sets from highly unorganized and class-imbalanced big data, and establish methods for online predictions with extensive data pre-processing, feature extraction and validation through online simulation sets with unknown remaining useful lives of the hard disks. Our algorithm performs especially well in predicting RUL near the critical zone of a device approaching failure. With the proposed approach we are able to predict whether a disk is going to fail in the next ten days with an average precision of 0.8435. We also show that the architecture trained on a particular model can be used to predict RUL for devices of different models from the same manufacturer through transfer learning. |
Tasks | Transfer Learning |
Published | 2018-10-21 |
URL | https://arxiv.org/abs/1810.08985v3 |
https://arxiv.org/pdf/1810.08985v3.pdf | |
PWC | https://paperswithcode.com/paper/mechanisms-for-integrated-feature |
Repo | |
Framework | |
Convolutional Recurrent Neural Networks for Glucose Prediction
Title | Convolutional Recurrent Neural Networks for Glucose Prediction |
Authors | Kezhi Li, John Daniels, Chengyuan Liu, Pau Herrero, Pantelis Georgiou |
Abstract | Control of blood glucose is essential for diabetes management. Current digital therapeutic approaches for subjects with Type 1 diabetes mellitus (T1DM) such as the artificial pancreas and insulin bolus calculators leverage machine learning techniques for predicting subcutaneous glucose for improved control. Deep learning has recently been applied in healthcare and medical research to achieve state-of-the-art results in a range of tasks including disease diagnosis and patient state prediction, among others. In this work, we present a deep learning model that is capable of forecasting glucose levels with leading accuracy for simulated patient cases (RMSE = 9.38$\pm$0.71 [mg/dL] over a 30-minute horizon, RMSE = 18.87$\pm$2.25 [mg/dL] over a 60-minute horizon) and real patient cases (RMSE = 21.07$\pm$2.35 [mg/dL] for 30-minute, RMSE = 33.27$\pm$4.79% for 60-minute). In addition, the model provides competitive performance in providing effective prediction horizon ($PH_{eff}$) with minimal time lag both in a simulated patient dataset ($PH_{eff}$ = 29.0$\pm$0.7 for 30-min and $PH_{eff}$ = 49.8$\pm$2.9 for 60-min) and in a real patient dataset ($PH_{eff}$ = 19.3$\pm$3.1 for 30-min and $PH_{eff}$ = 29.3$\pm$9.4 for 60-min). This approach is evaluated on a dataset of 10 simulated cases generated from the UVa/Padova simulator and a clinical dataset of 10 real cases each containing glucose readings, insulin bolus, and meal (carbohydrate) data. Performance of the recurrent convolutional neural network is benchmarked against four algorithms. The proposed algorithm is implemented on an Android mobile phone, with an execution time of 6 ms on a phone compared to an execution time of 780 ms on a laptop. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03043v5 |
http://arxiv.org/pdf/1807.03043v5.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-recurrent-neural-networks-for-3 |
Repo | |
Framework | |
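The RMSE figures quoted in the abstract measure forecast error at a fixed prediction horizon. The sketch below shows how such a number could be computed; the persistence forecast (predicting that glucose stays at its current value) is a hypothetical stand-in used only to produce predictions, and is not claimed to be one of the four benchmark algorithms or the paper's model.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def persistence_forecast(series, steps_ahead):
    """Naive forecast: the value steps_ahead samples later equals the current one.

    Returns (targets, predictions) aligned for direct comparison.
    """
    series = np.asarray(series, dtype=float)
    return series[steps_ahead:], series[:-steps_ahead]
```

For example, assuming readings every 5 minutes, a 30-minute horizon would correspond to `steps_ahead=6`, and `rmse(*persistence_forecast(readings, 6))` would give the baseline error in the units of the readings.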