Paper Group ANR 1013
Exploring Frame Segmentation Networks for Temporal Action Localization. Segregation Dynamics with Reinforcement Learning and Agent Based Modeling. Learning Action Models from Disordered and Noisy Plan Traces. Object-Guided Instance Segmentation for Biological Images. Learning an Adaptive Learning Rate Schedule. Shape-aware Feature Extraction for In …
Exploring Frame Segmentation Networks for Temporal Action Localization
Title | Exploring Frame Segmentation Networks for Temporal Action Localization |
Authors | Ke Yang, Xiaolong Shen, Peng Qiao, Shijie Li, Dongsheng Li, Yong Dou |
Abstract | Temporal action localization is an important task in computer vision. Though many methods have been proposed, how to predict the temporal location of action segments precisely remains an open question. Most state-of-the-art works train action classifiers on video segments pre-determined by action proposals. However, recent work found that a desirable model should move beyond the segment level and make dense predictions at a fine granularity in time to determine precise temporal boundaries. In this paper, we propose a Frame Segmentation Network (FSN) that places a temporal CNN on top of 2D spatial CNNs. The spatial CNNs are responsible for abstracting semantics in the spatial dimension, while the temporal CNN is responsible for introducing temporal context information and performing dense predictions. The proposed FSN can make dense frame-level predictions for a video clip using both spatial and temporal context information. FSN is trained in an end-to-end manner, so the model can be optimized jointly in the spatial and temporal domains. We also adapt FSN to a weakly supervised scenario (WFSN), where only video-level labels are provided during training. Experimental results on a public dataset show that FSN achieves superior performance in both frame-level action localization and temporal action localization. |
Tasks | Action Localization, Temporal Action Localization |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05488v1 |
http://arxiv.org/pdf/1902.05488v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-frame-segmentation-networks-for |
Repo | |
Framework | |
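As a rough illustration of the architecture described in the abstract (not the authors' exact network), the sketch below runs a small 2D CNN independently over each frame and a 1D temporal CNN over the resulting frame features to emit per-frame action logits; all layer sizes are made up.

```python
import torch
import torch.nn as nn

class FrameSegmentationSketch(nn.Module):
    """Toy frame-level action localizer: a 2D CNN abstracts each frame,
    a 1D temporal CNN adds context and emits per-frame class scores."""
    def __init__(self, num_classes: int, feat_dim: int = 64):
        super().__init__()
        # Spatial branch: applied independently to every frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, feat_dim, 1, 1)
        )
        # Temporal branch: 1D convolutions over the frame axis.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(feat_dim, num_classes, kernel_size=1),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W) -> per-frame logits (batch, frames, classes)
        b, t, c, h, w = clip.shape
        feats = self.spatial(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        logits = self.temporal(feats.transpose(1, 2)).transpose(1, 2)
        return logits

if __name__ == "__main__":
    model = FrameSegmentationSketch(num_classes=21)
    out = model(torch.randn(2, 16, 3, 112, 112))
    print(out.shape)  # torch.Size([2, 16, 21])
```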
Segregation Dynamics with Reinforcement Learning and Agent Based Modeling
Title | Segregation Dynamics with Reinforcement Learning and Agent Based Modeling |
Authors | Egemen Sert, Yaneer Bar-Yam, Alfredo J. Morales |
Abstract | Societies are complex. Properties of social systems can be explained by the interplay and weaving of individual actions. Incentives are key to understanding people's choices and decisions. For instance, individual preferences of where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent Based Models (ABM) in order to address the self-organizing dynamics of social segregation and explore the space of possibilities that emerge from considering different types of incentives. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that want to segregate from each other. For this purpose, agents use Deep Q-Networks to make decisions based on the rules of the Schelling Segregation model and the Predator-Prey model. Despite the segregation incentive, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more likely to host older people than diverse areas, which attract younger ones. Through this work, we show that the combination of RL and ABMs can create an artificial environment for policy makers to observe potential and existing behaviors associated with incentives. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08711v1 |
https://arxiv.org/pdf/1909.08711v1.pdf | |
PWC | https://paperswithcode.com/paper/segregation-dynamics-with-reinforcement |
Repo | |
Framework | |
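The following toy sketch, written as an assumption about how such a hybrid could be wired up rather than a reproduction of the paper's model, combines a Schelling-style neighbourhood reward with an interdependence bonus and a small Deep Q-Network that would drive agent moves.

```python
import numpy as np
import torch
import torch.nn as nn

def schelling_reward(grid: np.ndarray, r: int, c: int, similarity_weight: float = 1.0,
                     interdependence_weight: float = 0.5) -> float:
    """Toy per-agent reward: Schelling-style preference for same-kind neighbours,
    plus a predator-prey-style bonus for having at least one other-kind neighbour
    (the interdependence the abstract argues promotes integration)."""
    kind = grid[r, c]
    neigh = [grid[(r + dr) % grid.shape[0], (c + dc) % grid.shape[1]]
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    neigh = [k for k in neigh if k != 0]          # 0 = empty cell
    if not neigh:
        return 0.0
    same = sum(k == kind for k in neigh) / len(neigh)
    other = any(k != kind for k in neigh)
    return similarity_weight * same + interdependence_weight * float(other)

class QNet(nn.Module):
    """Small Deep Q-Network mapping a flattened local observation to move values."""
    def __init__(self, obs_dim: int = 25, n_actions: int = 5):  # stay/N/S/E/W
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

grid = np.random.choice([0, 1, 2], size=(20, 20), p=[0.2, 0.4, 0.4])
grid[5, 5] = 1  # ensure the scored cell holds an agent of kind 1
print(schelling_reward(grid, 5, 5))
print(QNet()(torch.randn(1, 25)).shape)  # torch.Size([1, 5])
```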
Learning Action Models from Disordered and Noisy Plan Traces
Title | Learning Action Models from Disordered and Noisy Plan Traces |
Authors | Hankz Hankui Zhuo, Jing Peng, Subbarao Kambhampati |
Abstract | There is increasing awareness in the planning community that the burden of specifying complete domain models is too high, which impedes the applicability of planning technology in many real-world domains. Although there are many learning systems that help automatically learn domain models, most existing work assumes that the input traces are completely correct. A more realistic situation is that the plan traces are disordered and noisy, such as plan traces described in natural language. In this paper we propose and evaluate an approach for learning action models from such traces. Our approach takes as input a set of plan traces with disordered actions and noise and outputs action models that can best explain the plan traces. We use a MAX-SAT framework for learning, where the constraints are derived from the given plan traces. Unlike traditional action model learners, our approach allows the states in plan traces to be partially observable and noisy, and the actions to be disordered and parallel. We demonstrate the effectiveness of our approach through a systematic empirical evaluation with both IPC domains and a real-world dataset extracted from natural language documents. |
Tasks | |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09800v2 |
https://arxiv.org/pdf/1908.09800v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-action-models-from-disordered-and |
Repo | |
Framework | |
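A minimal sketch of the MAX-SAT framing mentioned in the abstract: candidate model facts (e.g., "p is a precondition of a") become Boolean variables, trace-derived evidence becomes weighted soft clauses, and the learner keeps the assignment of maximal weight. The variables and clauses below are invented for illustration, and the brute-force search stands in for a real MAX-SAT solver.

```python
from itertools import product

# Hypothetical candidate facts about the domain model.
variables = ["pre(unstack, clear)", "pre(unstack, handempty)", "eff(unstack, holding)"]

# Soft clauses (weight, clause); each clause is a list of (var, polarity) literals.
# In the paper's setting these would be derived from (possibly disordered, noisy)
# plan traces; here they are made-up examples for illustration only.
soft_clauses = [
    (3.0, [("pre(unstack, clear)", True)]),            # trace evidence supports this
    (1.0, [("pre(unstack, handempty)", False)]),       # weak noisy evidence against
    (2.0, [("eff(unstack, holding)", True),
           ("pre(unstack, clear)", False)]),           # correlation constraint
]

def weight(assignment: dict) -> float:
    """Total weight of satisfied soft clauses under a truth assignment."""
    total = 0.0
    for w, clause in soft_clauses:
        if any(assignment[v] == pol for v, pol in clause):
            total += w
    return total

# Brute-force weighted MAX-SAT (fine for a toy example; real learners call a solver).
best = max((dict(zip(variables, vals))
            for vals in product([False, True], repeat=len(variables))), key=weight)
print(best, weight(best))
```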
Object-Guided Instance Segmentation for Biological Images
Title | Object-Guided Instance Segmentation for Biological Images |
Authors | Jingru Yi, Hui Tang, Pengxiang Wu, Bo Liu, Daniel J. Hoeppner, Dimitris N. Metaxas, Lianyi Han, Wei Fan |
Abstract | Instance segmentation of biological images is essential for studying object behaviors and properties. Challenges such as clustering, occlusion, and adhesion of the objects make instance segmentation a non-trivial task. Current box-free instance segmentation methods typically rely on local pixel-level information. Due to the lack of a global object view, these methods are prone to over- or under-segmentation. In contrast, box-based instance segmentation methods incorporate object detection into the segmentation and perform better in identifying individual instances. In this paper, we propose a new box-based instance segmentation method. Specifically, we locate the object bounding boxes from their center points. The object features are subsequently reused in the segmentation branch as a guide to separate the clustered instances within an RoI patch. Along with instance normalization, the model is able to recover the target object distribution and suppress the distribution of neighboring attached objects. Consequently, the proposed model performs excellently in segmenting clustered objects while retaining the target object details. The proposed method achieves state-of-the-art performance on three biological datasets: cell nuclei, plant phenotyping, and neural cells. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09199v1 |
https://arxiv.org/pdf/1911.09199v1.pdf | |
PWC | https://paperswithcode.com/paper/object-guided-instance-segmentation-for |
Repo | |
Framework | |
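A hedged sketch of the two ingredients the abstract highlights, locating boxes from center points and an instance-normalized mask branch operating on an RoI crop; the layer choices and the single-RoI interface are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CenterGuidedSegSketch(nn.Module):
    """Toy box-from-center detector with an instance-normalized mask branch."""
    def __init__(self, in_ch: int = 64, num_classes: int = 1):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, in_ch, 3, padding=1), nn.ReLU())
        # Detection heads: a center heatmap and per-pixel box width/height.
        self.center = nn.Conv2d(in_ch, num_classes, 1)
        self.wh = nn.Conv2d(in_ch, 2, 1)
        # Mask branch: instance norm helps suppress neighbouring-object statistics
        # inside each cropped RoI, as the abstract describes.
        self.mask_head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.InstanceNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, img, roi):
        feats = self.backbone(img)
        heat, wh = torch.sigmoid(self.center(feats)), self.wh(feats)
        y0, y1, x0, x1 = roi  # a single RoI crop for illustration
        mask_logits = self.mask_head(feats[:, :, y0:y1, x0:x1])
        return heat, wh, mask_logits

model = CenterGuidedSegSketch()
heat, wh, mask = model(torch.randn(1, 3, 96, 96), roi=(10, 42, 20, 52))
print(heat.shape, wh.shape, mask.shape)
```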
Learning an Adaptive Learning Rate Schedule
Title | Learning an Adaptive Learning Rate Schedule |
Authors | Zhen Xu, Andrew M. Dai, Jonas Kemp, Luke Metz |
Abstract | The learning rate is one of the most important hyper-parameters for model training and generalization. However, current hand-designed parametric learning rate schedules offer limited flexibility, and the predefined schedule may not match the training dynamics of high-dimensional and non-convex optimization problems. In this paper, we propose a reinforcement learning based framework that can automatically learn an adaptive learning rate schedule by leveraging information from past training histories. The learning rate changes dynamically based on the current training dynamics. To validate this framework, we conduct experiments with different neural network architectures on the Fashion-MNIST and CIFAR-10 datasets. Experimental results show that the auto-learned learning rate controller can achieve better test results. In addition, the trained controller network is generalizable: it can be trained on one dataset and transferred to new problems. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09712v1 |
https://arxiv.org/pdf/1909.09712v1.pdf | |
PWC | https://paperswithcode.com/paper/190909712 |
Repo | |
Framework | |
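A toy sketch of the controller interface the abstract implies: an RL policy observes recent losses and the current learning rate, and acts by scaling the learning rate; here a random placeholder policy stands in for the trained controller, and the reward is the per-step loss improvement.

```python
import numpy as np

def train_step(w, lr, X, y):
    """One SGD step on a toy least-squares problem; returns new weights and loss."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w = w - lr * grad
    loss = float(np.mean((X @ w - y) ** 2))
    return w, loss

def controller_policy(observation):
    """Placeholder for the learned controller (the paper trains this with RL).
    Observation: recent losses and current lr; action: a multiplicative lr change."""
    actions = [0.5, 0.9, 1.0, 1.1, 2.0]
    return np.random.choice(actions)   # random policy, for interface illustration only

rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(256, 10)), rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=256)
w, lr, losses = np.zeros(10), 0.01, []

for step in range(50):
    w, loss = train_step(w, lr, X, y)
    losses.append(loss)
    reward = losses[-2] - losses[-1] if len(losses) > 1 else 0.0  # would train the controller
    obs = (losses[-5:], lr)
    lr = float(np.clip(lr * controller_policy(obs), 1e-4, 0.05))  # controller adapts the schedule
print(f"final lr={lr:.4f}, final loss={losses[-1]:.4f}")
```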
Shape-aware Feature Extraction for Instance Segmentation
Title | Shape-aware Feature Extraction for Instance Segmentation |
Authors | Hao Ding, Siyuan Qiao, Wei Shen, Alan Yuille |
Abstract | Modern instance segmentation approaches mainly adopt a sequential "detect then segment" paradigm, as popularized by Mask R-CNN, and have achieved considerable progress. However, they usually struggle to segment huddled instances, i.e., instances that are crowded together. The essential reason is that the detection step is learned only under box-level supervision. Without guidance from mask-level supervision, the features extracted from regions containing huddled instances are noisy and ambiguous, which makes the detection problem ill-posed. To address this issue, we propose a new region-of-interest (RoI) feature extraction strategy, named Shape-aware RoIAlign, which focuses feature extraction within a region aligned well with the shape of the instance-of-interest rather than a rectangular RoI. We instantiate Shape-aware RoIAlign by introducing a novel refining module built upon Mask R-CNN, which takes the mask predicted by Mask R-CNN as the region to guide the computation of Shape-aware RoIAlign. Based on the RoI features re-computed by Shape-aware RoIAlign, the refining module updates the bounding box as well as the mask predicted by Mask R-CNN. Experimental results show that the refining module equipped with Shape-aware RoIAlign achieves consistent and remarkable improvements over Mask R-CNN models with different backbones on the challenging COCO dataset. The code will be released. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11263v1 |
https://arxiv.org/pdf/1911.11263v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-aware-feature-extraction-for-instance |
Repo | |
Framework | |
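A simplified sketch of the shape-aware idea, under the assumption that weighting RoI features by a predicted instance mask approximates extracting features only within the instance shape; the real Shape-aware RoIAlign operator is more involved.

```python
import torch
import torch.nn.functional as F

def shape_aware_roi_features(feat_map, box, inst_mask, out_size=14):
    """Sketch of shape-aware RoI feature extraction: crop the RoI, resize it,
    and weight the features by the (resized) predicted instance mask so that
    pixels outside the instance shape contribute little.

    feat_map: (C, H, W) feature map; box: (x0, y0, x1, y1) in feature coords;
    inst_mask: (H, W) soft mask in [0, 1] predicted by a first-stage mask head."""
    x0, y0, x1, y1 = box
    roi = feat_map[:, y0:y1, x0:x1]                       # (C, h, w)
    roi_mask = inst_mask[y0:y1, x0:x1]                    # (h, w)
    roi = F.interpolate(roi[None], size=(out_size, out_size),
                        mode="bilinear", align_corners=False)[0]
    roi_mask = F.interpolate(roi_mask[None, None].float(), size=(out_size, out_size),
                             mode="bilinear", align_corners=False)[0, 0]
    return roi * roi_mask                                  # mask-weighted RoI features

feat = torch.randn(256, 50, 50)
mask = (torch.rand(50, 50) > 0.5).float()
out = shape_aware_roi_features(feat, (5, 5, 30, 35), mask)
print(out.shape)  # torch.Size([256, 14, 14])
```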
Equalization Loss for Large Vocabulary Instance Segmentation
Title | Equalization Loss for Large Vocabulary Instance Segmentation |
Authors | Jingru Tan, Changbao Wang, Quanquan Li, Junjie Yan |
Abstract | Recent object detection and instance segmentation tasks mainly focus on datasets with a relatively small set of categories, e.g., Pascal VOC with 20 classes and COCO with 80 classes. The new large-vocabulary dataset LVIS brings new challenges to conventional methods. In this work, we propose an equalization loss to address the long-tail problem of rare categories. Combined with exploiting data from detection datasets to alleviate missing-annotation problems during training, our method achieves a 5.1% overall AP gain and an 11.4% AP gain on rare categories on the LVIS benchmark, without any bells and whistles, compared to the Mask R-CNN baseline. Finally, we achieve 28.9 mask AP on the LVIS test set and rank 1st in the LVIS Challenge 2019. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04692v1 |
https://arxiv.org/pdf/1911.04692v1.pdf | |
PWC | https://paperswithcode.com/paper/equalization-loss-for-large-vocabulary |
Repo | |
Framework | |
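A hedged sketch of an equalization-style loss: per-class sigmoid cross-entropy in which the negative term is suppressed for rare categories, so abundant negatives do not overwhelm rare-category classifiers. The thresholding and weighting details below are assumptions that follow the idea in the abstract, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def equalization_loss_sketch(logits, targets, class_freq, freq_thresh=1e-3,
                             is_foreground=None):
    """logits, targets: (N, C); class_freq: (C,) relative frequency of each class;
    is_foreground: (N,) bool mask of foreground proposals (all True if None)."""
    if is_foreground is None:
        is_foreground = torch.ones(logits.shape[0], dtype=torch.bool)
    rare = (class_freq < freq_thresh).float()                      # (C,)
    # Weight is zero only where: rare class AND negative label AND foreground proposal,
    # i.e. the negative gradient for rare classes is ignored.
    w = 1.0 - rare[None, :] * (1.0 - targets) * is_foreground[:, None].float()
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (w * bce).sum() / max(targets.sum().item(), 1.0)

logits = torch.randn(8, 1203)                 # e.g. an LVIS-sized label space
targets = torch.zeros(8, 1203)
targets[torch.arange(8), torch.randint(0, 1203, (8,))] = 1
freq = torch.rand(1203) * 1e-2                # made-up per-class frequencies
print(equalization_loss_sketch(logits, targets, freq).item())
```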
Communication-Efficient Local Decentralized SGD Methods
Title | Communication-Efficient Local Decentralized SGD Methods |
Authors | Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang |
Abstract | Recently, the technique of local updates has become a powerful tool for improving communication efficiency in centralized settings via periodic communication. For decentralized settings, it is still unclear how to efficiently combine local updates and decentralized communication. In this work, we propose an algorithm named LD-SGD, which incorporates arbitrary update schemes that alternate between multiple local updates and multiple decentralized SGD steps, and we provide an analytical framework for LD-SGD. Under this framework, we present a sufficient condition to guarantee convergence. We show that LD-SGD converges to a critical point for a wide range of update schemes when the objective is non-convex and the training data are not independent and identically distributed. Moreover, our framework brings many insights into the design of update schemes for decentralized optimization. As examples, we specify two update schemes and show how they help improve communication efficiency. The first scheme alternates the number of local and global update steps; from our analysis, the ratio of the number of local updates to that of decentralized SGD steps trades off communication and computation. The second scheme periodically shrinks the length of local updates. We show that this decaying strategy helps improve communication efficiency both theoretically and empirically. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09126v3 |
https://arxiv.org/pdf/1910.09126v3.pdf | |
PWC | https://paperswithcode.com/paper/communication-efficient-decentralized |
Repo | |
Framework | |
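A toy sketch of the alternating structure the abstract describes: each worker performs several local SGD steps, then the workers run a few rounds of gossip averaging over a ring mixing matrix. The fixed schedule and quadratic objectives are illustrative assumptions; the paper studies general and decaying schedules.

```python
import numpy as np

def ld_sgd_sketch(grad_fn, x0, n_workers=4, rounds=10, local_steps=5,
                  gossip_steps=2, lr=0.05):
    """Alternate local SGD steps with decentralized (gossip) averaging."""
    X = np.tile(x0, (n_workers, 1))                      # one parameter copy per worker
    # Ring mixing matrix: each worker averages with its two neighbours.
    W = np.zeros((n_workers, n_workers))
    for i in range(n_workers):
        W[i, i] = 0.5
        W[i, (i - 1) % n_workers] = 0.25
        W[i, (i + 1) % n_workers] = 0.25
    for _ in range(rounds):
        for _ in range(local_steps):                      # "local updates" phase
            for i in range(n_workers):
                X[i] -= lr * grad_fn(X[i], worker=i)
        for _ in range(gossip_steps):                     # decentralized mixing phase
            X = W @ X
    return X.mean(axis=0)

# Quadratic objectives with worker-dependent (non-identical) optima.
rng = np.random.default_rng(0)
opts = rng.normal(size=(4, 3))
grad = lambda x, worker: 2 * (x - opts[worker])
print(ld_sgd_sketch(grad, np.zeros(3)))   # approaches the average of the workers' optima
```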
Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models
Title | Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models |
Authors | Xin He, Shihao Wang, Shaohuai Shi, Zhenheng Tang, Yuxin Wang, Zhihao Zhao, Jing Dai, Ronghao Ni, Xiaofeng Zhang, Xiaoming Liu, Zhili Wu, Wu Yu, Xiaowen Chu |
Abstract | Skin disease is one of the most common types of human disease and may affect anyone regardless of age, gender, or race. Due to the high visual diversity, human diagnosis relies heavily on personal experience, and there is a serious shortage of experienced dermatologists in many countries. To alleviate this problem, computer-aided diagnosis with state-of-the-art (SOTA) machine learning techniques would be a promising solution. In this paper, we aim at understanding the performance of convolutional neural network (CNN) based approaches. We first build two versions of skin disease datasets from Internet images: (a) Skin-10, which contains 10 common classes of skin disease with a total of 10,218 images; (b) Skin-100, a larger dataset that consists of 19,807 images of 100 skin disease classes. Based on these datasets, we benchmark several SOTA CNN models and show that the accuracy on Skin-100 is much lower than the accuracy on Skin-10. We then implement an ensemble method based on several CNN models and achieve the best accuracy of 79.01% for Skin-10 and 53.54% for Skin-100. We also present an object detection based approach by introducing bounding boxes into the Skin-10 dataset. Our results show that object detection can help improve the accuracy of some skin disease classes. |
Tasks | Object Detection |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08705v1 |
https://arxiv.org/pdf/1911.08705v1.pdf | |
PWC | https://paperswithcode.com/paper/computer-aided-clinical-skin-disease |
Repo | |
Framework | |
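A minimal sketch of a probability-averaging CNN ensemble, in the spirit of the ensemble method mentioned in the abstract (the paper's exact combination rule and backbones may differ; the tiny stand-in classifiers here are placeholders).

```python
import torch
import torch.nn as nn

def ensemble_predict(models, images):
    """Average the softmax probabilities of several classifiers, then take the argmax."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(images), dim=1) for m in models]).mean(dim=0)
    return probs.argmax(dim=1), probs

# Stand-in classifiers; in practice these would be SOTA CNNs fine-tuned on Skin-10/Skin-100.
def tiny_cnn(n_classes=10):
    return nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes))

models = [tiny_cnn().eval() for _ in range(3)]
preds, probs = ensemble_predict(models, torch.randn(4, 3, 224, 224))
print(preds.shape, probs.shape)  # torch.Size([4]) torch.Size([4, 10])
```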
Automated Computer Evaluation of Acute Ischemic Stroke and Large Vessel Occlusion
Title | Automated Computer Evaluation of Acute Ischemic Stroke and Large Vessel Occlusion |
Authors | Jia You, Philip L. H. Yu, Anderson C. O. Tsang, Eva L. H. Tsui, Pauline P. S. Woo, Gilberto K. K. Leung |
Abstract | Large vessel occlusion (LVO) plays an important role in the diagnosis of acute ischemic stroke. Identifying LVO in patients at an early stage on admission would significantly lower the probability of severe stroke-related outcomes or even save their lives. In this paper, we utilized both structural and imaging data from all recorded acute ischemic stroke patients in Hong Kong. A total of 300 patients (200 training and 100 testing) are used in this study. We established three hierarchical models based on demographic data, clinical data, and features obtained from computerized tomography (CT) scans. The first two stages of modeling are based solely on demographic and clinical data, while the third model utilizes additional CT imaging features obtained from a deep learning model. The optimal cutoff is determined at the maximal Youden index based on 10-fold cross-validation. With both clinical and imaging features, the Level-3 model achieved the best performance on the testing data. The sensitivity, specificity, Youden index, accuracy, and area under the curve (AUC) are 0.930, 0.684, 0.614, 0.790, and 0.850, respectively. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.08059v1 |
https://arxiv.org/pdf/1906.08059v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-computer-evaluation-of-acute |
Repo | |
Framework | |
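A short sketch of selecting a cutoff by the maximal Youden index (J = sensitivity + specificity - 1), as the abstract describes; the synthetic scores and the exhaustive threshold search are illustrative, and the paper selects the cutoff via 10-fold cross-validation.

```python
import numpy as np

def optimal_cutoff_by_youden(scores, labels):
    """Return the score threshold that maximizes J = sensitivity + specificity - 1."""
    best_j, best_t = -1.0, 0.5
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1)); fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0)); fp = np.sum(pred & (labels == 0))
        sens = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=300)
scores = np.clip(labels * 0.3 + rng.normal(0.4, 0.2, size=300), 0, 1)  # synthetic model scores
print(optimal_cutoff_by_youden(scores, labels))
```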
Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation
Title | Selecting Artificially-Generated Sentences for Fine-Tuning Neural Machine Translation |
Authors | Alberto Poncelas, Andy Way |
Abstract | Neural Machine Translation (NMT) models tend to achieve the best performance when larger sets of parallel sentences are provided for training. For this reason, augmenting the training set with artificially-generated sentence pairs can boost performance. Nonetheless, performance can also be improved with a small number of sentences if they are in the same domain as the test set. Accordingly, we explore the use of artificially-generated sentences along with data-selection algorithms to improve German-to-English NMT models trained solely with authentic data. In this work, we show how artificially-generated sentences can be more beneficial than authentic pairs, and demonstrate their advantages when used in combination with data-selection algorithms. |
Tasks | Machine Translation |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12016v1 |
https://arxiv.org/pdf/1909.12016v1.pdf | |
PWC | https://paperswithcode.com/paper/selecting-artificially-generated-sentences |
Repo | |
Framework | |
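A much simplified sketch of data selection: candidate (possibly synthetic) sentences are ranked by decayed n-gram overlap with an in-domain seed, loosely in the spirit of feature-decay-style selection. This is an assumption about the general mechanism, not the specific algorithms evaluated in the paper.

```python
from collections import Counter

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_sentences(candidates, seed_corpus, k=2, n=2):
    """Greedily pick k candidates that cover seed-corpus n-grams, decaying the value
    of n-grams that are already covered by earlier selections."""
    seed_ngrams = Counter(ng for s in seed_corpus for ng in ngrams(s.split(), n))
    coverage = Counter()
    selected = []
    for _ in range(min(k, len(candidates))):
        def score(sent):
            return sum(1.0 / (1 + coverage[ng]) for ng in ngrams(sent.split(), n)
                       if ng in seed_ngrams)
        best = max((c for c in candidates if c not in selected), key=score)
        selected.append(best)
        coverage.update(ngrams(best.split(), n))
    return selected

seed = ["the patient received a new treatment", "the treatment was effective"]
pool = ["a new treatment was tested on the patient",
        "stock markets fell sharply today",
        "the patient responded to the new treatment"]
print(select_sentences(pool, seed))
```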
The Extended Kalman Filter is a Natural Gradient Descent in Trajectory Space
Title | The Extended Kalman Filter is a Natural Gradient Descent in Trajectory Space |
Authors | Yann Ollivier |
Abstract | The extended Kalman filter is perhaps the most standard tool to estimate in real time the state of a dynamical system from noisy measurements of some function of the system, with extensive practical applications (such as position tracking via GPS). While the plain Kalman filter for linear systems is well-understood, the extended Kalman filter relies on linearizations which have been debated. We recover the exact extended Kalman filter equations from first principles in statistical learning: the extended Kalman filter is equal to Amari’s online natural gradient, applied in the space of trajectories of the system. Namely, each possible trajectory of the dynamical system defines a probability law over possible observations. In principle this makes it possible to treat the underlying trajectory as the parameter of a statistical model of the observations. Then the parameter can be learned by gradient ascent on the log-likelihood of observations, as they become available. Using Amari’s natural gradient from information geometry (a gradient descent preconditioned with the Fisher matrix, which provides parameterization-invariance) exactly recovers the extended Kalman filter. This applies only to a particular choice of process noise in the Kalman filter, namely, taking noise proportional to the posterior covariance - a canonical choice in the absence of specific model information. |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00696v1 |
http://arxiv.org/pdf/1901.00696v1.pdf | |
PWC | https://paperswithcode.com/paper/the-extended-kalman-filter-is-a-natural |
Repo | |
Framework | |
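For reference, the two updates whose correspondence the abstract asserts can be written side by side; the notation below is generic rather than the paper's exact statement, with F_t the Fisher matrix, H_t the observation Jacobian, and P, R the usual EKF covariances.

```latex
% Online natural gradient ascent on the log-likelihood of the observation y_t,
% preconditioned with the Fisher matrix (Amari), versus the extended Kalman filter
% update with gain K_t; the paper shows these coincide for a particular choice of
% process noise (proportional to the posterior covariance).
\begin{aligned}
\text{natural gradient:}\quad
\theta_{t+1} &= \theta_t + \eta\, F_t^{-1}\,\nabla_\theta \log p(y_t \mid \theta_t),\\
\text{EKF update:}\quad
\hat{x}_{t|t} &= \hat{x}_{t|t-1} + K_t\,\bigl(y_t - h(\hat{x}_{t|t-1})\bigr),\qquad
K_t = P_{t|t-1} H_t^{\top}\,\bigl(H_t P_{t|t-1} H_t^{\top} + R_t\bigr)^{-1}.
\end{aligned}
```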
Towards Understanding and Modeling Empathy for Use in Motivational Design Thinking
Title | Towards Understanding and Modeling Empathy for Use in Motivational Design Thinking |
Authors | Gloria Washington, Rouzbeh Shirvani |
Abstract | Design Thinking workshops are used by companies to help generate new ideas for technologies and products by engaging subjects in exercises to understand their users' wants and become more empathetic towards their needs. The "aha moment" experienced during these thought-provoking, step-outside-yourself activities occurs when a group of people iterate over several problems and converge upon a solution that will fit seamlessly into everyday life. With the increasing use and cost of Design Thinking workshops, it is important that technology be developed to help identify empathy and its onset in humans. This position paper presents an approach to modeling empathy using Gaussian mixture models over heart rate and skin conductance. It also presents an updated approach to Design Thinking that helps ensure participants think outside of their own race's, culture's, or other affiliations' motives. |
Tasks | |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12001v1 |
https://arxiv.org/pdf/1907.12001v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-and-modeling-empathy |
Repo | |
Framework | |
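A minimal sketch of fitting a Gaussian mixture over heart rate and skin conductance samples, as the abstract proposes; the synthetic data, two-component choice, and "aroused state" interpretation are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a two-component Gaussian mixture over (heart rate, skin conductance) samples and
# read off the posterior over the two physiological states for a new measurement.
rng = np.random.default_rng(0)
baseline = rng.normal([70.0, 2.0], [5.0, 0.3], size=(200, 2))   # bpm, microsiemens
aroused = rng.normal([85.0, 4.0], [6.0, 0.5], size=(80, 2))
X = np.vstack([baseline, aroused])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
probs = gmm.predict_proba([[82.0, 3.8]])   # posterior over the two mixture components
print(gmm.means_.round(1), probs.round(3))
```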
Adversarial Deep Embedded Clustering: on a better trade-off between Feature Randomness and Feature Drift
Title | Adversarial Deep Embedded Clustering: on a better trade-off between Feature Randomness and Feature Drift |
Authors | Nairouz Mrabah, Mohamed Bouguessa, Riadh Ksantini |
Abstract | Clustering using deep autoencoders has been thoroughly investigated in recent years. Current approaches rely on simultaneously learning embedded features and clustering the data points in the latent space. Although numerous deep clustering approaches outperform shallow models in achieving favorable results on several high-semantic datasets, a critical weakness of such models has been overlooked. In the absence of concrete supervisory signals, the embedded clustering objective function may distort the latent space by learning from unreliable pseudo-labels. Thus, the network can learn non-representative features, which in turn undermines the discriminative ability and yields worse pseudo-labels. In order to alleviate the effect of random discriminative features, modern autoencoder-based clustering papers propose using the reconstruction loss for pretraining and as a regularizer during the clustering phase. Nevertheless, the clustering-reconstruction trade-off can cause the Feature Drift phenomenon. In this paper, we propose ADEC (Adversarial Deep Embedded Clustering), a novel autoencoder-based clustering model which addresses a dual problem, namely Feature Randomness and Feature Drift, using adversarial training. We empirically demonstrate the suitability of our model in handling these problems on benchmark real datasets. Experimental results validate that our model outperforms state-of-the-art autoencoder-based clustering methods. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11832v1 |
https://arxiv.org/pdf/1909.11832v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-deep-embedded-clustering-on-a |
Repo | |
Framework | |
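A compact sketch of how an adversarially regularized embedded-clustering objective could be assembled: DEC-style soft assignments with a KL clustering loss on the embeddings, plus an adversarial term from a discriminator standing in for the reconstruction regularizer. The architecture, the discriminator's role, and the loss weights are illustrative assumptions, not ADEC's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_assign(z, centers, alpha=1.0):
    """DEC-style soft cluster assignments via a Student-t kernel."""
    d2 = torch.cdist(z, centers).pow(2)
    q = (1.0 + d2 / alpha).pow(-(alpha + 1) / 2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened target used as the self-training signal for the clustering loss."""
    p = q.pow(2) / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
discriminator = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
centers = nn.Parameter(torch.randn(10, 10))   # 10 clusters in a 10-d latent space

x = torch.rand(32, 784)
z = encoder(x)
q = soft_assign(z, centers)
cluster_loss = F.kl_div(q.log(), target_distribution(q).detach(), reduction="batchmean")
# Adversarial term: the encoder is pushed to produce embeddings the discriminator
# accepts, which replaces the reconstruction regularizer in this sketch.
adv_loss = F.binary_cross_entropy_with_logits(discriminator(z), torch.ones(32, 1))
total = cluster_loss + 0.1 * adv_loss
print(float(cluster_loss), float(adv_loss), float(total))
```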
Automatic techniques for cochlear implant CT image analysis
Title | Automatic techniques for cochlear implant CT image analysis |
Authors | Yiyuan Zhao |
Abstract | The goals of this dissertation are to fully automate the image processing techniques needed in the post-operative stage of IGCIP and to perform a thorough analysis of (a) the robustness of the automatic image processing techniques used in IGCIP and (b) assess the sensitivity of the IGCIP process as a whole to individual components. The automatic methods that have been developed include the automatic localization of both closely- and distantly-spaced CI electrode arrays in post-implantation CTs and the automatic selection of electrode configurations based on the stimulation patterns. Together with the existing automatic techniques developed for IGCIP, the proposed automatic methods enable an end-to-end IGCIP process that takes pre- and post-implantation CT images as input and produces a patient-customized electrode configuration as output. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10922v1 |
https://arxiv.org/pdf/1909.10922v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-techniques-for-cochlear-implant-ct |
Repo | |
Framework | |