Paper Group ANR 1013
A Comprehensive Survey of Deep Learning for Image Captioning
Title | A Comprehensive Survey of Deep Learning for Image Captioning |
Authors | Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga |
Abstract | Generating a description of an image is called image captioning. Image captioning requires recognizing the important objects, their attributes and their relationships in an image. It also needs to generate syntactically and semantically correct sentences. Deep learning-based techniques are capable of handling the complexities and challenges of image captioning. In this survey paper, we aim to present a comprehensive review of existing deep learning-based image captioning techniques. We discuss the foundation of the techniques to analyze their performances, strengths and limitations. We also discuss the datasets and the evaluation metrics popularly used in deep learning-based automatic image captioning. |
Tasks | Image Captioning |
Published | 2018-10-06 |
URL | http://arxiv.org/abs/1810.04020v2 |
http://arxiv.org/pdf/1810.04020v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comprehensive-survey-of-deep-learning-for |
Repo | |
Framework | |
Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention
Title | Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention |
Authors | Ming Zeng, Haoxiang Gao, Tong Yu, Ole J. Mengshoel, Helge Langseth, Ian Lane, Xiaobing Liu |
Abstract | Deep neural networks, including recurrent networks, have been successfully applied to human activity recognition. Unfortunately, the final representation learned by recurrent networks might encode some noise (irrelevant signal components, unimportant sensor modalities, etc.). Besides, it is difficult to interpret the recurrent networks to gain insight into the models’ behavior. To address these issues, we propose two attention models for human activity recognition: temporal attention and sensor attention. These two mechanisms adaptively focus on important signals and sensor modalities. To further improve the understandability and mean F1 score, we add continuity constraints, considering that continuous sensor signals are more robust than discrete ones. We evaluate the approaches on three datasets and obtain state-of-the-art results. Furthermore, qualitative analysis shows that the attention learned by the models agrees well with human intuition. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2018-10-07 |
URL | http://arxiv.org/abs/1810.04038v1 |
http://arxiv.org/pdf/1810.04038v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-and-improving-recurrent |
Repo | |
Framework | |
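As a rough illustration of the temporal attention idea described in the abstract above, the sketch below weights per-time-step hidden states with a softmax over scalar scores and sums them into a single representation. The function name `temporal_attention`, the dot-product scoring against a `query` vector, and the toy numbers are assumptions made for illustration only; the paper's actual scoring function, sensor attention, and continuity constraints are not reproduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(hidden_states, query):
    """Return (weights, context) for a sequence of hidden-state vectors.

    Scoring by dot product with `query` is an assumed, generic choice,
    not the scoring function used in the paper.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, one output value per hidden unit.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Toy sequence: three time steps, two hidden units each (made-up numbers).
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
weights, context = temporal_attention(states, query=[1.0, 1.0])
print(weights, context)
```

The weights sum to one, so they can be read as how much each time step contributes to the final representation, which is the kind of quantity one would inspect when interpreting the model.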
Arianna+: Scalable Human Activity Recognition by Reasoning with a Network of Ontologies
Title | Arianna+: Scalable Human Activity Recognition by Reasoning with a Network of Ontologies |
Authors | Syed Yusha Kareem, Luca Buoncompagni, Fulvio Mastrogiovanni |
Abstract | Aging population ratios are rising significantly. Meanwhile, smart home based health monitoring services are evolving rapidly to become a viable alternative to traditional healthcare solutions. Such services can augment qualitative analyses done by gerontologists with quantitative data. Hence, the recognition of Activities of Daily Living (ADL) has become an active domain of research in recent times. For a system to perform human activity recognition in a real-world environment, multiple requirements exist, such as scalability, robustness, ability to deal with uncertainty (e.g., missing sensor data), to operate with multi-occupants and to take into account their privacy and security. This paper attempts to address the requirements of scalability and robustness, by describing a reasoning mechanism based on modular spatial and/or temporal context models as a network of ontologies. The reasoning mechanism has been implemented in a smart home system referred to as Arianna+. The paper presents and discusses a use case, and experiments are performed on a simulated dataset, to showcase Arianna+'s modularity feature, internal working, and computational performance. Results indicate scalability and robustness for human activity recognition processes. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08208v1 |
http://arxiv.org/pdf/1809.08208v1.pdf | |
PWC | https://paperswithcode.com/paper/arianna-scalable-human-activity-recognition |
Repo | |
Framework | |
Linear Span Network for Object Skeleton Detection
Title | Linear Span Network for Object Skeleton Detection |
Authors | Chang Liu, Wei Ke, Fei Qin, Qixiang Ye |
Abstract | Robust object skeleton detection requires exploring rich representative visual features and effective feature fusion strategies. In this paper, we first re-visit the implementation of HED, the essential principle of which can be ideally described with a linear reconstruction model. Hinted by this, we formalize a Linear Span framework, and propose Linear Span Network (LSN) modified by Linear Span Units (LSUs), which minimize the reconstruction error of the convolutional network. LSN further utilizes subspace linear span besides the feature linear span to increase the independence of convolutional features and the efficiency of feature integration, which enlarges the capability of fitting complex ground-truth. As a result, LSN can effectively suppress the cluttered backgrounds and reconstruct object skeletons. Experimental results validate the state-of-the-art performance of the proposed LSN. |
Tasks | Object Skeleton Detection |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09601v1 |
http://arxiv.org/pdf/1807.09601v1.pdf | |
PWC | https://paperswithcode.com/paper/linear-span-network-for-object-skeleton |
Repo | |
Framework | |
State-space analysis of an Ising model reveals contributions of pairwise interactions to sparseness, fluctuation, and stimulus coding of monkey V1 neurons
Title | State-space analysis of an Ising model reveals contributions of pairwise interactions to sparseness, fluctuation, and stimulus coding of monkey V1 neurons |
Authors | Jimmy Gaudreault, Hideaki Shimazaki |
Abstract | In this study, we analyzed the activity of monkey V1 neurons responding to grating stimuli of different orientations using inference methods for a time-dependent Ising model. The method provides optimal estimation of time-dependent neural interactions with credible intervals according to the sequential Bayes estimation algorithm. Furthermore, it allows us to trace dynamics of macroscopic network properties such as entropy, sparseness, and fluctuation. Here we report that, in all examined stimulus conditions, pairwise interactions contribute to increasing sparseness and fluctuation. We then demonstrate that the orientation of the grating stimulus is in part encoded in the pairwise interactions of the neural populations. These results demonstrate the utility of the state-space Ising model in assessing contributions of neural interactions during stimulus processing. |
Tasks | |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08900v1 |
http://arxiv.org/pdf/1807.08900v1.pdf | |
PWC | https://paperswithcode.com/paper/state-space-analysis-of-an-ising-model |
Repo | |
Framework | |
Bootstrapping Multilingual Intent Models via Machine Translation for Dialog Automation
Title | Bootstrapping Multilingual Intent Models via Machine Translation for Dialog Automation |
Authors | Nicholas Ruiz, Srinivas Bangalore, John Chen |
Abstract | With the resurgence of chat-based dialog systems in consumer and enterprise applications, there has been much success in developing data-driven and rule-based natural language models to understand human intent. Since these models require large amounts of data and in-domain knowledge, expanding an equivalent service into new markets is disrupted by language barriers that inhibit dialog automation. This paper presents a user study to evaluate the utility of out-of-the-box machine translation technology to (1) rapidly bootstrap multilingual spoken dialog systems and (2) enable existing human analysts to understand foreign language utterances. We additionally evaluate the utility of machine translation in human assisted environments, where a portion of the traffic is processed by analysts. In English->Spanish experiments, we observe a high potential for dialog automation, as well as the potential for human analysts to process foreign language utterances with high accuracy. |
Tasks | Machine Translation |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04453v1 |
http://arxiv.org/pdf/1805.04453v1.pdf | |
PWC | https://paperswithcode.com/paper/bootstrapping-multilingual-intent-models-via |
Repo | |
Framework | |
Invariants of multidimensional time series based on their iterated-integral signature
Title | Invariants of multidimensional time series based on their iterated-integral signature |
Authors | Joscha Diehl, Jeremy Reizenstein |
Abstract | We introduce a novel class of features for multidimensional time series, that are invariant with respect to transformations of the ambient space. The general linear group, the group of rotations and the group of permutations of the axes are considered. The starting point for their construction is Chen’s iterated-integral signature. |
Tasks | Time Series |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.06104v2 |
http://arxiv.org/pdf/1801.06104v2.pdf | |
PWC | https://paperswithcode.com/paper/invariants-of-multidimensional-time-series |
Repo | |
Framework | |
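To make the notion of a signature-based invariant concrete, the sketch below computes the first two levels of the iterated-integral signature of a piecewise-linear path and checks numerically that the antisymmetric part of level two (twice the signed, or Lévy, area) is unchanged by a planar rotation. This is one classical example chosen for illustration; the function names and the toy path are assumptions, and the sketch does not claim to reproduce the specific invariants constructed in the paper.

```python
import math

def signature_level2(path):
    """Level-1 and level-2 iterated integrals of a piecewise-linear path.

    `path` is a list of d-dimensional points. Returns (s1, s2) where
    s1[i] is the integral of dx^i and s2[i][j] is the integral of
    (x^i - x^i_0) dx^j along the path.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for c, p in zip(cur, prev)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one linear segment.
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2

def rotate(path, angle):
    """Rotate a 2D path about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y] for x, y in path]

def signed_area_invariant(path):
    """Antisymmetric part of level 2 for a 2D path (twice the Levy area)."""
    _, s2 = signature_level2(path)
    return s2[0][1] - s2[1][0]

# Made-up planar path with three segments.
path = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-0.5, 1.0]]
original = signed_area_invariant(path)
rotated = signed_area_invariant(rotate(path, 0.7))
print(original, rotated)
```

The two printed values agree up to floating-point error, whereas the individual level-two entries generally change under the rotation.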
3D FCN Feature Driven Regression Forest-Based Pancreas Localization and Segmentation
Title | 3D FCN Feature Driven Regression Forest-Based Pancreas Localization and Segmentation |
Authors | Masahiro Oda, Natsuki Shimizu, Holger R. Roth, Ken’ichi Karasawa, Takayuki Kitasaka, Kazunari Misawa, Michitaka Fujiwara, Daniel Rueckert, Kensaku Mori |
Abstract | This paper presents a fully automated atlas-based pancreas segmentation method from CT volumes utilizing 3D fully convolutional network (FCN) feature-based pancreas localization. Segmentation of the pancreas is difficult because it has larger inter-patient spatial variations than other organs. Previous pancreas segmentation methods failed to deal with such variations. We propose a fully automated pancreas segmentation method that contains novel localization and segmentation. Since the pancreas neighbors many other organs, its position and size are strongly related to the positions of the surrounding organs. We estimate the position and the size of the pancreas (localized) from global features by regression forests. As global features, we use intensity differences and 3D FCN deep learned features, which include automatically extracted essential features for segmentation. We chose 3D FCN features from a trained 3D U-Net, which is trained to perform multi-organ segmentation. The global features include both the pancreas and surrounding organ information. After localization, a patient-specific probabilistic atlas-based pancreas segmentation is performed. In evaluation results with 146 CT volumes, we achieved 60.6% of the Jaccard index and 73.9% of the Dice overlap. |
Tasks | Automated Pancreas Segmentation, Pancreas Segmentation |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03019v1 |
http://arxiv.org/pdf/1806.03019v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-fcn-feature-driven-regression-forest-based |
Repo | |
Framework | |
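The two overlap scores reported in the abstract above can be computed for a single predicted and reference segmentation as in the following sketch. Representing each mask as a set of voxel coordinates, the function name `jaccard_and_dice`, and the toy masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
def jaccard_and_dice(pred, truth):
    """Overlap scores for two sets of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    intersection = len(pred & truth)
    union = len(pred | truth)
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    jaccard = intersection / union
    dice = 2 * intersection / (len(pred) + len(truth))
    return jaccard, dice

# Made-up masks: 3 shared voxels, 6 voxels in the union.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}
truth = {(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 1, 2), (1, 2, 2)}
j, d = jaccard_and_dice(pred, truth)
print(f"Jaccard={j:.3f}, Dice={d:.3f}")  # Jaccard=0.500, Dice=0.667
```

For a single pair of masks the two scores are tied by Dice = 2J / (1 + J); scores averaged over many volumes need not satisfy this identity exactly.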
Tighter Variational Bounds are Not Necessarily Better
Title | Tighter Variational Bounds are Not Necessarily Better |
Authors | Tom Rainforth, Adam R. Kosiorek, Tuan Anh Le, Chris J. Maddison, Maximilian Igl, Frank Wood, Yee Whye Teh |
Abstract | We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04537v3 |
http://arxiv.org/pdf/1802.04537v3.pdf | |
PWC | https://paperswithcode.com/paper/tighter-variational-bounds-are-not |
Repo | |
Framework | |
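For readers unfamiliar with the bounds being compared, the sketch below evaluates the standard K-sample importance-weighted (IWAE) bound, log((1/K) Σ w_k), alongside the average of single-sample ELBO estimates on the same log-weights. The log-weights are made-up numbers and the function names are assumptions; the PIWAE, MIWAE, and CIWAE objectives proposed in the paper are not implemented here.

```python
import math

def log_mean_exp(values):
    """Stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def iwae_bound(log_weights):
    """K-sample bound from log w_k = log p(x, z_k) - log q(z_k | x)."""
    return log_mean_exp(log_weights)

def elbo_estimate(log_weights):
    """Average of single-sample ELBO estimates over the same samples."""
    return sum(log_weights) / len(log_weights)

# Made-up log importance weights for K = 4 samples.
log_w = [-3.2, -1.1, -2.5, -0.7]
print(elbo_estimate(log_w), iwae_bound(log_w))
```

By Jensen's inequality the IWAE value is never below the averaged ELBO value on the same samples, and the two coincide for K = 1. The abstract's point is that this tightness does not by itself guarantee a better gradient signal for the inference network.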
Multi-Class Lesion Diagnosis with Pixel-wise Classification Network
Title | Multi-Class Lesion Diagnosis with Pixel-wise Classification Network |
Authors | Manu Goyal, Jiahua Ng, Moi Hoon Yap |
Abstract | Lesion diagnosis of skin lesions is a very challenging task due to high inter-class similarities and intra-class variations in terms of color, size, site and appearance among different skin lesions. With the emergence of computer vision, especially deep learning algorithms, lesion diagnosis is made possible using these algorithms trained on dermoscopic images. Usually, deep classification networks are used for the lesion diagnosis to determine different types of skin lesions. In this work, we used a pixel-wise classification network to provide lesion diagnosis rather than a classification network. We propose to use DeeplabV3+ for multi-class lesion diagnosis in dermoscopic images of Task 3 of ISIC Challenge 2018. We used various post-processing methods with DeeplabV3+ to determine the lesion diagnosis in this challenge and submitted the test results. |
Tasks | |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09227v1 |
http://arxiv.org/pdf/1807.09227v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-class-lesion-diagnosis-with-pixel-wise |
Repo | |
Framework | |
DFTerNet: Towards 2-bit Dynamic Fusion Networks for Accurate Human Activity Recognition
Title | DFTerNet: Towards 2-bit Dynamic Fusion Networks for Accurate Human Activity Recognition |
Authors | Zhan Yang, Osolo Ian Raymond, ChengYuan Zhang, Ying Wan, Jun Long |
Abstract | Deep Convolutional Neural Networks (DCNNs) are currently popular in human activity recognition applications. However, in the face of modern artificial intelligence sensor-based games, many research achievements cannot be practically applied on portable devices. DCNNs are typically resource-intensive and too large to be deployed on portable devices, which limits the practical application of complex activity detection. In addition, since portable devices do not possess high-performance Graphic Processing Units (GPUs), there is hardly any improvement in Action Game (ACT) experience. Besides, in order to deal with multi-sensor collaboration, all previous human activity recognition models typically treated the representations from different sensor signal sources equally. However, distinct types of activities should adopt different fusion strategies. In this paper, a novel scheme is proposed. This scheme is used to train 2-bit Convolutional Neural Networks with weights and activations constrained to {-0.5,0,0.5}. It takes into account the correlation between different sensor signal sources and the activity types. This model, which we refer to as DFTerNet, aims at producing a more reliable inference and better trade-offs for practical applications. Our basic idea is to exploit quantization of weights and activations directly in pre-trained filter banks and adopt dynamic fusion strategies for different activity types. Experiments demonstrate that using the dynamic fusion strategy can exceed the baseline model performance by up to ~5% on activity recognition datasets such as OPPORTUNITY and PAMAP2. Using the proposed quantization method, we were able to achieve performance closer to that of the full-precision counterpart. These results were also verified using the UniMiB-SHAR dataset. In addition, the proposed method can achieve ~9x acceleration on CPUs and ~11x memory saving. |
Tasks | Action Detection, Activity Detection, Activity Recognition, Human Activity Recognition, Quantization |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1808.04228v2 |
http://arxiv.org/pdf/1808.04228v2.pdf | |
PWC | https://paperswithcode.com/paper/dfternet-towards-2-bit-dynamic-fusion |
Repo | |
Framework | |
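The constraint of weights and activations to {-0.5, 0, 0.5} mentioned in the abstract can be pictured with a simple threshold quantizer, sketched below. The fixed `threshold` of 0.25 and the function name `ternarize` are assumptions chosen for illustration; the paper's actual quantization rule, training procedure, and dynamic fusion strategy are not reproduced.

```python
def ternarize(values, threshold=0.25, scale=0.5):
    """Map each value to -scale, 0.0, or +scale using a symmetric threshold."""
    out = []
    for v in values:
        if v > threshold:
            out.append(scale)
        elif v < -threshold:
            out.append(-scale)
        else:
            out.append(0.0)
    return out

# Made-up full-precision weights.
weights = [0.8, -0.1, 0.3, -0.6, 0.0]
print(ternarize(weights))  # [0.5, 0.0, 0.5, -0.5, 0.0]
```

Because each quantized value takes one of three levels, it can be stored in two bits, which is where the memory saving in such schemes comes from.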
Deep Transfer Learning for Cross-domain Activity Recognition
Title | Deep Transfer Learning for Cross-domain Activity Recognition |
Authors | Jindong Wang, Vincent W. Zheng, Yiqiang Chen, Meiyu Huang |
Abstract | Human activity recognition plays an important role in people’s daily life. However, it is often expensive and time-consuming to acquire sufficient labeled activity data. To solve this problem, transfer learning leverages the labeled samples from the source domain to annotate the target domain, which has few or no labels. Unfortunately, when there are several source domains available, it is difficult to select the right source domains for transfer. The right source domain is the one whose properties are most similar to those of the target domain; this higher similarity facilitates transfer learning. Choosing the right source domain helps the algorithm perform well and prevents negative transfer. In this paper, we propose an effective Unsupervised Source Selection algorithm for Activity Recognition (USSAR). USSAR is able to select the most similar $K$ source domains from a list of available domains. After this, we propose an effective Transfer Neural Network to perform knowledge transfer for Activity Recognition (TNNAR). TNNAR could capture both the time and spatial relationship between activities while transferring knowledge. Experiments on three public activity recognition datasets demonstrate that: 1) The USSAR algorithm is effective in selecting the best source domains. 2) The TNNAR method can reach high accuracy when performing activity knowledge transfer. |
Tasks | Activity Recognition, Cross-Domain Activity Recognition, Human Activity Recognition, Transfer Learning |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07963v2 |
http://arxiv.org/pdf/1807.07963v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-transfer-learning-for-cross-domain |
Repo | |
Framework | |
Signature moments to characterize laws of stochastic processes
Title | Signature moments to characterize laws of stochastic processes |
Authors | Ilya Chevyrev, Harald Oberhauser |
Abstract | The normalized sequence of moments characterizes the law of any finite-dimensional random variable. We prove an analogous result for path-valued random variables, that is stochastic processes, by using the normalized sequence of signature moments. We use this to define a metric for laws of stochastic processes. This metric can be efficiently estimated from finite samples, even if the stochastic processes themselves evolve in high-dimensional state spaces. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10971v1 |
http://arxiv.org/pdf/1810.10971v1.pdf | |
PWC | https://paperswithcode.com/paper/signature-moments-to-characterize-laws-of |
Repo | |
Framework | |
DeepSymmetry : Using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures
Title | DeepSymmetry : Using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures |
Authors | Guillaume Pagès, Sergei Grudinin |
Abstract | Motivation: Thanks to the recent advances in structural biology, nowadays three-dimensional structures of various proteins are solved on a routine basis. A large portion of these contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. Results: We present DeepSymmetry, a versatile method based on three-dimensional (3D) convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order, and also the corresponding symmetry axes. Detection of symmetry axes is based on learning six-dimensional Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem repeated proteins and also with symmetrical assemblies. For example, we have discovered over 10,000 putative tandem repeat proteins that are not currently present in the RepeatsDB database. Availability: The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12026v1 |
http://arxiv.org/pdf/1810.12026v1.pdf | |
PWC | https://paperswithcode.com/paper/deepsymmetry-using-3d-convolutional-networks |
Repo | |
Framework | |
The Benefits of Population Diversity in Evolutionary Algorithms: A Survey of Rigorous Runtime Analyses
Title | The Benefits of Population Diversity in Evolutionary Algorithms: A Survey of Rigorous Runtime Analyses |
Authors | Dirk Sudholt |
Abstract | Population diversity is crucial in evolutionary algorithms to enable global exploration and to avoid poor performance due to premature convergence. This book chapter reviews runtime analyses that have shown benefits of population diversity, either through explicit diversity mechanisms or through naturally emerging diversity. These works show that the benefits of diversity are manifold: diversity is important for global exploration and the ability to find several global optima. Diversity enhances crossover and enables crossover to be more effective than mutation. Diversity can be crucial in dynamic optimization, when the problem landscape changes over time. And, finally, it facilitates search for the whole Pareto front in evolutionary multiobjective optimization. The presented analyses rigorously quantify the performance of evolutionary algorithms in the light of population diversity, laying the foundation for a rigorous understanding of how search dynamics are affected by the presence or absence of population diversity and the introduction of diversity mechanisms. |
Tasks | Multiobjective Optimization |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.10087v1 |
http://arxiv.org/pdf/1801.10087v1.pdf | |
PWC | https://paperswithcode.com/paper/the-benefits-of-population-diversity-in |
Repo | |
Framework | |