January 29, 2020


Paper Group ANR 741

Multitask Learning of Temporal Connectionism in Convolutional Networks using a Joint Distribution Loss Function to Simultaneously Identify Tools and Phase in Surgical Videos. Extensions of Generic DOL for Generic Ontology Design Patterns. Machine Learning-based Signal Detection for PMH Signals in Load-modulated MIMO System. Recurrent Existence Dete …

Multitask Learning of Temporal Connectionism in Convolutional Networks using a Joint Distribution Loss Function to Simultaneously Identify Tools and Phase in Surgical Videos


Title	Multitask Learning of Temporal Connectionism in Convolutional Networks using a Joint Distribution Loss Function to Simultaneously Identify Tools and Phase in Surgical Videos
Authors	Shanka Subhra Mondal, Rachana Sathish, Debdoot Sheet
Abstract	Surgical workflow analysis is important for understanding the onset and persistence of surgical phases and individual tool usage across a surgery and within each phase. It is beneficial for clinical quality control and helps hospital administrators understand surgery planning. Video acquired during surgery can typically be leveraged for this task. Currently, a combination of a convolutional neural network (CNN) and a recurrent neural network (RNN) is popularly used for video analysis in general, not only for surgical videos. In this paper, we propose a multi-task learning framework using a CNN followed by a bi-directional long short-term memory (Bi-LSTM) network to capture both forward and backward temporal dependencies. Further, the joint distribution indicating the set of tools associated with a phase is used as an additional loss during learning to correct for their co-occurrence in any predictions. Experimental evaluation is performed using the Cholec80 dataset. We report mean average precision (mAP) scores of 0.99 and 0.86 for tool and phase identification respectively, which are higher than prior art in the field.
Tasks	Multi-Task Learning
Published	2019-05-20
URL	https://arxiv.org/abs/1905.08315v2
PDF	https://arxiv.org/pdf/1905.08315v2.pdf
PWC	https://paperswithcode.com/paper/multitask-learning-of-temporal-connectionism
Repo
Framework

Extensions of Generic DOL for Generic Ontology Design Patterns


Title	Extensions of Generic DOL for Generic Ontology Design Patterns
Authors	Mihai Codescu, Bernd Krieg-Brückner, Till Mossakowski
Abstract	Generic ontologies were introduced as an extension (Generic DOL) of the Distributed Ontology, Modeling and Specification Language, DOL, with the aim to provide a language for Generic Ontology Design Patterns. In this paper we present a number of new language constructs that increase the expressivity and the generality of Generic DOL, among them sequential and optional parameters, list parameters with recursion, and local sub-patterns. These are illustrated with non-trivial patterns: generic value sets and (nested) qualitatively graded relations, demonstrated as definitional building blocks in an application domain.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06275v1
PDF	https://arxiv.org/pdf/1906.06275v1.pdf
PWC	https://paperswithcode.com/paper/extensions-of-generic-dol-for-generic
Repo
Framework

Machine Learning-based Signal Detection for PMH Signals in Load-modulated MIMO System


Title	Machine Learning-based Signal Detection for PMH Signals in Load-modulated MIMO System
Authors	Jinle Zhu, Qiang Li, Li Hu, Hongyang Chen, Nirwan Ansari
Abstract	Phase Modulation on the Hypersphere (PMH) is a power-efficient modulation scheme for load-modulated multiple-input multiple-output (MIMO) transmitters with central power amplifiers (CPA). However, it is difficult to obtain precise channel state information (CSI), and the traditional optimal maximum likelihood (ML) detection scheme incurs high complexity, which increases exponentially with the number of antennas and the number of bits carried per antenna in the PMH modulation. To detect PMH signals without prior knowledge of the CSI, we first propose a signal detection scheme, termed the hypersphere clustering scheme based on the expectation maximization (EM) algorithm with maximum likelihood detection (HEM-ML). By leveraging machine learning, the proposed detection scheme can accurately obtain channel information from a few received symbols with little resource cost and achieve detection results comparable to those of the optimal ML detector. To further reduce the computational complexity of the ML detection in HEM-ML, we also propose a second signal detection scheme, termed the hypersphere clustering scheme based on the EM algorithm with KD-tree detection (HEM-KD). The CSI obtained from the EM algorithm is used to build a spatial KD-tree receiver codebook, and the signal detection problem can be transformed into a nearest neighbor search (NNS) problem. The detection complexity of HEM-KD is significantly reduced without any detection performance loss compared to HEM-ML. Extensive simulation results verify the effectiveness of our proposed detection schemes.
Tasks
Published	2019-11-24
URL	https://arxiv.org/abs/1911.13238v1
PDF	https://arxiv.org/pdf/1911.13238v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-based-signal-detection-for
Repo
Framework
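
The abstract above recasts detection as a nearest neighbor search over a KD-tree codebook. A minimal sketch of that general idea follows; it is not the authors' implementation. The codebook here is random real-valued data standing in for a receiver codebook, and the names `build_kdtree`, `nearest`, and `sq_dist` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(items, depth=0):
    """Build a KD-tree from (index, point) pairs, splitting on axes in turn."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return {
        "index": items[mid][0],
        "point": items[mid][1],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (squared distance, codebook index) of the closest stored point."""
    if node is None:
        return best
    d = sq_dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["index"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


# Assumption: a toy 4-dimensional codebook with 64 entries (illustrative only).
random.seed(0)
codebook = [tuple(random.gauss(0, 1) for _ in range(4)) for _ in range(64)]
tree = build_kdtree(list(enumerate(codebook)))

# A "received" vector: a codeword plus small noise.
received = tuple(x + random.gauss(0, 0.05) for x in codebook[10])
distance, detected_index = nearest(tree, received)
```

The pruning step is what saves work relative to an exhaustive search: a subtree is skipped whenever the splitting plane is already farther away than the best match found so far, while the returned distance stays identical to the brute-force minimum.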

Recurrent Existence Determination Through Policy Optimization


Title	Recurrent Existence Determination Through Policy Optimization
Authors	Baoxiang Wang
Abstract	Binary determination of the presence of objects is one of the problems where humans perform extraordinarily better than computer vision systems, in terms of both speed and preciseness. One possible reason is that humans can skip most of the clutter and attend only to salient regions. Recurrent attention models (RAM) are the first computational models to imitate the way humans process images via the REINFORCE algorithm. Although RAM was originally designed for image recognition, we extend it and present recurrent existence determination, an attention-based mechanism to solve the existence determination problem. Our algorithm employs a novel $k$-maximum aggregation layer and a new reward mechanism to address the issue of delayed rewards, which would otherwise destabilize the training process. The experimental analysis demonstrates significant efficiency and accuracy improvements over existing approaches on both synthetic and real-world datasets.
Tasks
Published	2019-05-29
URL	https://arxiv.org/abs/1905.13551v2
PDF	https://arxiv.org/pdf/1905.13551v2.pdf
PWC	https://paperswithcode.com/paper/190513551
Repo
Framework
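
The abstract above mentions a $k$-maximum aggregation layer without defining it. One plausible reading, shown here purely as an assumption, is that per-glimpse scores are pooled by averaging the k largest values. The function name `k_max_aggregate` and the example scores are hypothetical.

```python
def k_max_aggregate(scores, k):
    """Average the k largest scores.

    Assumption: this is one plausible reading of "k-maximum aggregation",
    not a definition taken from the paper.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    top_k = sorted(scores, reverse=True)[:k]
    return sum(top_k) / k


# Hypothetical per-glimpse existence scores from an attention model.
glimpse_scores = [0.1, 0.9, 0.2, 0.8, 0.05]
existence_score = k_max_aggregate(glimpse_scores, k=2)
```

Under this reading, k = 1 reduces to max pooling and k equal to the number of glimpses reduces to mean pooling, so k trades off sensitivity to a single strong glimpse against robustness to one spurious score.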

Fast and Effective Adaptation of Facial Action Unit Detection Deep Model


Title	Fast and Effective Adaptation of Facial Action Unit Detection Deep Model
Authors	Mihee Lee, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic
Abstract	Detecting facial action units (AUs) is one of the fundamental steps in automatic recognition of facial expressions of emotions and cognitive states. Although a variety of approaches have been proposed for this task, most of these models are trained only for specific target AUs, and as such they do not adapt easily to recognizing new AUs (i.e., those not initially used to train the target models). In this paper, we propose a deep learning approach for facial AU detection that can adapt easily and quickly to a new AU or target subject by leveraging only a few labeled samples from the new task (either an AU or subject). To this end, we propose a modeling approach based on model-agnostic meta-learning, originally proposed for general image recognition/detection tasks (e.g., character recognition on the Omniglot dataset). Specifically, each subject and/or AU is treated as a new learning task and the model learns to adapt based on the knowledge of the previous tasks (the AUs and subjects used to pre-train the target models). Thus, given a new subject or AU, this meta-knowledge (that is shared among training and test tasks) is used to adapt the model to the new task using deep learning and model-agnostic meta-learning. We show on two benchmark datasets (BP4D and DISFA) for facial AU detection that the proposed approach can be easily adapted to new tasks (AUs/subjects). Using only a few labeled examples from these tasks, the model achieves large improvements over the baselines (i.e., non-adapted models).
Tasks	Action Unit Detection, Facial Action Unit Detection, Meta-Learning, Omniglot
Published	2019-09-26
URL	https://arxiv.org/abs/1909.12158v2
PDF	https://arxiv.org/pdf/1909.12158v2.pdf
PWC	https://paperswithcode.com/paper/fast-and-effective-adaptation-of-facial
Repo
Framework

Structured Summarization of Academic Publications


Title	Structured Summarization of Academic Publications
Authors	Alexios Gidiotis, Grigorios Tsoumakas
Abstract	We propose SUSIE, a novel summarization method that can work with state-of-the-art summarization models in order to produce structured scientific summaries for academic articles. We also created PMC-SA, a new dataset of academic publications, suitable for the task of structured summarization with neural networks. We apply SUSIE combined with three different summarization models on the new PMC-SA dataset and we show that the proposed method improves the performance of all models by as much as 4 ROUGE points.
Tasks
Published	2019-05-19
URL	https://arxiv.org/abs/1905.07695v2
PDF	https://arxiv.org/pdf/1905.07695v2.pdf
PWC	https://paperswithcode.com/paper/structured-summarization-of-academic
Repo
Framework

Noise2Blur: Online Noise Extraction and Denoising


Title	Noise2Blur: Online Noise Extraction and Denoising
Authors	Huangxing Lin, Weihong Zeng, Xinghao Ding, Xueyang Fu, Yue Huang, John Paisley
Abstract	We propose a new framework called Noise2Blur (N2B) for training robust image denoising models without pre-collected paired noisy/clean images. Training the model requires only some (or even one) noisy images, some random unpaired clean images, and noise-free but blurred labels obtained by predefined filtering of the noisy images. The N2B model consists of two parts: a denoising network and a noise extraction network. First, the noise extraction network learns to output a noise map using the noise information from the denoising network under the guidance of the blurred labels. Then, the noise map is added to a clean image to generate a new "noisy/clean" image pair. Using the new image pair, the denoising network learns to generate clean and high-quality images from noisy observations. These two networks are trained simultaneously and mutually aid each other to learn the mappings of noise to clean/blur. Experiments on several denoising tasks show that the denoising performance of N2B is close to that of other denoising CNNs trained with pre-collected paired data.
Tasks	Denoising, Image Denoising
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01158v1
PDF	https://arxiv.org/pdf/1912.01158v1.pdf
PWC	https://paperswithcode.com/paper/noise2blur-online-noise-extraction-and
Repo
Framework

Learning Sub-Sampling and Signal Recovery with Applications in Ultrasound Imaging


Title	Learning Sub-Sampling and Signal Recovery with Applications in Ultrasound Imaging
Authors	Iris A. M. Huijben, Bastiaan S. Veeling, Kees Janse, Massimo Mischi, Ruud J. G. van Sloun
Abstract	Limitations on bandwidth and power consumption impose strict bounds on data rates of diagnostic imaging systems. Consequently, the design of suitable (i.e. task- and data-aware) compression and reconstruction techniques has attracted considerable attention in recent years. Compressed sensing emerged as a popular framework for sparse signal reconstruction from a small set of compressed measurements. However, typical compressed sensing designs measure a (non)linearly weighted combination of all input signal elements, which poses practical challenges. These designs are also not necessarily task-optimal. In addition, real-time recovery is hampered by the iterative and time-consuming nature of sparse recovery algorithms. Recently, deep learning methods have shown promise for fast recovery from compressed measurements, but the design of adequate and practical sensing strategies remains a challenge. Here, we propose a deep learning solution termed Deep Probabilistic Sub-sampling (DPS), that learns a task-driven sub-sampling pattern, while jointly training a subsequent task model. Once learned, the task-based sub-sampling patterns are fixed and straightforwardly implementable, e.g. by non-uniform analog-to-digital conversion, sparse array design, or slow-time ultrasound pulsing schemes. The effectiveness of our framework is demonstrated in-silico for sparse signal recovery from partial Fourier measurements, and in-vivo for both anatomical image and tissue-motion (Doppler) reconstruction from sub-sampled medical ultrasound imaging data.
Tasks
Published	2019-08-15
URL	https://arxiv.org/abs/1908.05764v2
PDF	https://arxiv.org/pdf/1908.05764v2.pdf
PWC	https://paperswithcode.com/paper/learning-sub-sampling-and-signal-recovery
Repo
Framework

Generating fMRI volumes from T1-weighted volumes using 3D CycleGAN


Title	Generating fMRI volumes from T1-weighted volumes using 3D CycleGAN
Authors	David Abramian, Anders Eklund
Abstract	Registration between an fMRI volume and a T1-weighted volume is challenging, since fMRI volumes contain geometric distortions. Here we present preliminary results showing that 3D CycleGAN can be used to synthesize fMRI volumes from T1-weighted volumes, and vice versa, which can facilitate registration.
Tasks
Published	2019-07-19
URL	https://arxiv.org/abs/1907.08533v2
PDF	https://arxiv.org/pdf/1907.08533v2.pdf
PWC	https://paperswithcode.com/paper/generating-fmri-volumes-from-t1-weighted
Repo
Framework

Learning to Generate Grounded Image Captions without Localization Supervision


Title	Learning to Generate Grounded Image Captions without Localization Supervision
Authors	Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
Abstract	When generating a sentence description for an image, it frequently remains unclear how well the generated caption is grounded in the image or if the model hallucinates based on priors in the dataset and/or the language model. The most common way of relating image regions with words in caption models is through an attention mechanism over the regions that is used as input to predict the next word. The model must therefore learn to predict the attention without knowing the word it should localize. In this work, we propose a novel cyclical training regimen that forces the model to localize each word in the image after the sentence decoder generates it and then reconstruct the sentence from the localized image region(s) to match the ground-truth. The initial decoder and the proposed reconstructor share parameters during training and are learned jointly with the localizer, allowing the model to regularize the attention mechanism. Our proposed framework only requires learning one extra fully-connected layer (the localizer), a layer that can be removed at test time. We show that our model significantly improves grounding accuracy without relying on grounding supervision or introducing extra computation during inference.
Tasks	Image Captioning, Language Modelling
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00283v1
PDF	https://arxiv.org/pdf/1906.00283v1.pdf
PWC	https://paperswithcode.com/paper/190600283
Repo
Framework

Efficient Curvature Estimation for Oriented Point Clouds


Title	Efficient Curvature Estimation for Oriented Point Clouds
Authors	Yueqi Cao, Didong Li, Huafei Sun, Amir H Assadi, Shiqiang Zhang
Abstract	There is an immense literature focused on estimating the curvature of an unknown surface from a point cloud dataset. Most existing algorithms estimate the curvature indirectly, that is, they estimate the surface locally with some basis functions and then calculate the curvature of that surface as an estimate of the curvature. Recently, several methods have been proposed to estimate the curvature directly. However, these algorithms lack theoretical guarantees on the estimation error for small to moderate datasets. In this paper, we propose a direct and efficient method to estimate the curvature for oriented point cloud data without any surface approximation. In fact, we estimate the Weingarten map using a least-squares method, so that Gaussian curvature, mean curvature and principal curvatures can be obtained automatically from the Weingarten map. We show that the convergence rate of our Weingarten Map Estimation (WME) algorithm is $n^{-2/3}$ both theoretically and numerically. Finally, we apply our method to point cloud simplification and surface reconstruction.
Tasks
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10725v1
PDF	https://arxiv.org/pdf/1905.10725v1.pdf
PWC	https://paperswithcode.com/paper/efficient-curvature-estimation-for-oriented
Repo
Framework
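
The abstract above notes that Gaussian, mean and principal curvatures follow directly from the Weingarten map. A minimal sketch of that last step, assuming the map has already been estimated and is expressed as a symmetric 2x2 matrix [[a, b], [b, c]] in an orthonormal tangent basis: the Gaussian curvature is the determinant, the mean curvature is half the trace, and the principal curvatures are the eigenvalues. The function name is hypothetical, and the least-squares estimation itself is not shown.

```python
import math


def curvatures_from_weingarten(a, b, c):
    """Curvatures from a symmetric 2x2 Weingarten matrix [[a, b], [b, c]].

    Assumption: the matrix is given in an orthonormal basis of the tangent
    plane; the sign of the mean and principal curvatures depends on the
    chosen normal orientation.
    """
    gaussian = a * c - b * b              # determinant
    mean = 0.5 * (a + c)                  # half the trace
    # mean^2 - gaussian equals ((a - c) / 2)^2 + b^2, so it is non-negative;
    # max() only guards against rounding error.
    disc = math.sqrt(max(mean * mean - gaussian, 0.0))
    return {
        "gaussian": gaussian,
        "mean": mean,
        "principal": (mean + disc, mean - disc),
    }


# Sphere of radius 2: both principal curvatures are 1/2 (up to orientation).
sphere = curvatures_from_weingarten(0.5, 0.0, 0.5)

# Saddle point with principal curvatures +1 and -1.
saddle = curvatures_from_weingarten(1.0, 0.0, -1.0)
```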

Systematic improvement of user engagement with academic titles using computational linguistics


Title	Systematic improvement of user engagement with academic titles using computational linguistics
Authors	Nim Dvir, Ruti Gafni
Abstract	This paper describes a novel approach to systematically improve information interactions based solely on their wording. Following an interdisciplinary literature review, we recognized three key attributes of words that drive user engagement: (1) Novelty, (2) Familiarity, and (3) Emotionality. Based on these attributes, we developed a model to systematically improve given content using computational linguistics, natural language processing (NLP) and text analysis (word frequency, sentiment analysis and lexical substitution). We conducted a pilot study (n=216) in which the model was used to formalize evaluation and optimization of academic titles. A between-group design (A/B testing) was used to compare responses to the original and modified (treatment) titles. Data was collected for selection and evaluation (User Engagement Scale). The pilot results suggest that user engagement with digital information is fostered by, and perhaps dependent upon, the wording being used. They also provide empirical support that engaging content can be systematically evaluated and produced. The preliminary results show that the modified (treatment) titles had significantly higher scores for information use and user engagement (selection and evaluation). We propose that computational linguistics is a useful approach for optimizing information interactions. The empirically based insights can inform the development of digital content strategies, thereby improving the success of information interactions.
Tasks	Sentiment Analysis
Published	2019-06-23
URL	https://arxiv.org/abs/1906.09569v1
PDF	https://arxiv.org/pdf/1906.09569v1.pdf
PWC	https://paperswithcode.com/paper/systematic-improvement-of-user-engagement
Repo
Framework

Image Classification base on PCA of Multi-view Deep Representation


Title	Image Classification base on PCA of Multi-view Deep Representation
Authors	Yaoqi Sun, Liang Li, Liang Zheng, Ji Hu, Yatong Jiang, Chenggang Yan
Abstract	In the age of information explosion, image classification is a key technology for handling and organizing large amounts of image data. Currently, classical image classification algorithms are mostly based on RGB or grayscale images and fail to make good use of the depth information about objects or scenes. The depth information in the images has a strong complementary effect, which can enhance the classification accuracy significantly. In this paper, we propose an image classification technique using principal component analysis based on multi-view depth characters. In detail, first, the depth image of the original image is estimated; second, depth characters are extracted from the RGB views and the depth view separately, and their dimensionality is then reduced with PCA. Finally, an SVM is applied for image classification. The experimental results show that the method has good performance.
Tasks	Image Classification
Published	2019-03-12
URL	http://arxiv.org/abs/1903.04814v1
PDF	http://arxiv.org/pdf/1903.04814v1.pdf
PWC	https://paperswithcode.com/paper/image-classification-base-on-pca-of-multi
Repo
Framework

Machine Learning Software Engineering in Practice: An Industrial Case Study


Title	Machine Learning Software Engineering in Practice: An Industrial Case Study
Authors	Md Saidur Rahman, Emilio Rivera, Foutse Khomh, Yann-Gaël Guéhéneuc, Bernd Lehnert
Abstract	SAP is the market leader in enterprise software, offering an end-to-end suite of applications and services to enable their customers worldwide to operate their business. In particular, retail customers of SAP deal with millions of sales transactions in their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and then sent to central servers for validation and other business operations. A considerable proportion of the retail transactions may have inconsistencies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and may lead to further errors due to incorrect modifications. This is not only a performance overhead on the customers’ business workflow but also incurs high operational costs. Thus, automated detection and correction of transaction errors are very important given their potential business value and the improvement in the business workflow. In this paper, we present an industrial case study where we apply machine learning (ML) to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from three distinct perspectives: Software Engineering, Machine Learning, and industry-academia collaboration. We report on our experience and insights from the project with guidelines for the identified challenges. We believe that our findings and recommendations can help researchers and practitioners embarking on similar endeavors.
Tasks
Published	2019-06-17
URL	https://arxiv.org/abs/1906.07154v1
PDF	https://arxiv.org/pdf/1906.07154v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-software-engineering-in
Repo
Framework

Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer


Title	Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer
Authors	Merlijn Blaauw, Jordi Bonada
Abstract	We propose a sequence-to-sequence singing synthesizer, which avoids the need for training data with pre-aligned phonetic and acoustic features. Rather than the more common approach of a content-based attention mechanism combined with an autoregressive decoder, we use a different mechanism suitable for feed-forward synthesis. Given that phonetic timings in singing are highly constrained by the musical score, we derive an approximate initial alignment with the help of a simple duration model. Then, using a decoder based on a feed-forward variant of the Transformer model, a series of self-attention and convolutional layers refines the result of the initial alignment to reach the target acoustic features. Advantages of this approach include faster inference and avoiding the exposure bias issues that affect autoregressive models trained by teacher forcing. We evaluate the effectiveness of this model compared to an autoregressive baseline, the importance of self-attention, and the importance of the accuracy of the duration model.
Tasks
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09989v2
PDF	https://arxiv.org/pdf/1910.09989v2.pdf
PWC	https://paperswithcode.com/paper/sequence-to-sequence-singing-synthesis-using
Repo
Framework