October 19, 2019


Paper Group ANR 323

Datalog: Bag Semantics via Set Semantics. Skin disease identification from dermoscopy images using deep convolutional neural network. Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction. Efficient Greedy Coordinate Descent for Composite Problems. A Survey of Deep Learning Techniques for Mobile Robot Applicati …

Datalog: Bag Semantics via Set Semantics


Title	Datalog: Bag Semantics via Set Semantics
Authors	Leopoldo Bertossi, Georg Gottlob, Reinhard Pichler
Abstract	Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called warded Datalog±, under set semantics. From a theoretical point of view, this allows us to reason about bag semantics by making use of the well-established theoretical foundations of set semantics. From a practical point of view, this allows us to handle the bag semantics of Datalog with powerful, existing query engines for the required extension of Datalog. This use of Datalog± is extended to give a set semantics to duplicates in Datalog± itself. We investigate the properties of the resulting Datalog± programs, the problem of deciding multiplicities, and the expressibility of some bag operations. Moreover, the proposed translation has potential for interesting applications, such as to Multiset Relational Algebra and to the semantic web query language SPARQL with bag semantics.
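To make the core idea concrete, here is a minimal Python sketch of why bag multiplicities can be recovered under set semantics. It is an illustration only, not the paper's warded Datalog± translation: each copy of a tuple gets a distinct identifier, a derived fact's identifier is the pair of its parents' identifiers (standing in for a fresh value invented per derivation), and counting distinct identifiers per tuple yields the bag result. The relation names R and S and all helper functions are hypothetical.

```python
from collections import Counter

def bag_join(r, s):
    """Bag-semantics join R(x, y), S(y, z) -> T(x, z): multiplicities multiply."""
    out = Counter()
    for (x, y1), m in r.items():
        for (y2, z), n in s.items():
            if y1 == y2:
                out[(x, z)] += m * n
    return out

def to_set(bag, name):
    """Encode a bag as a *set* of (id, tuple) facts, one distinct id per copy."""
    return {((name, t, i), t) for t, m in bag.items() for i in range(m)}

def set_join(r_ids, s_ids):
    """Set-semantics join over id-tagged facts. The derived id is the pair of
    parent ids, so every distinct derivation yields a distinct fact."""
    return {((rid, sid), (x, z))
            for rid, (x, y1) in r_ids
            for sid, (y2, z) in s_ids
            if y1 == y2}

def multiplicities(id_facts):
    """Recover bag multiplicities by counting distinct ids per tuple."""
    return Counter(t for _, t in id_facts)
```

For example, with two copies of R(a, b) and three copies of S(b, c), the set-based join produces six distinct id pairs for T(a, c), matching the bag multiplicity 2 × 3.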
Tasks
Published	2018-03-17
URL	http://arxiv.org/abs/1803.06445v3
PDF	http://arxiv.org/pdf/1803.06445v3.pdf
PWC	https://paperswithcode.com/paper/datalog-bag-semantics-via-set-semantics
Repo
Framework

Skin disease identification from dermoscopy images using deep convolutional neural network


Title	Skin disease identification from dermoscopy images using deep convolutional neural network
Authors	Anabik Pal, Sounak Ray, Utpal Garain
Abstract	In this paper, a deep neural network based ensemble method is evaluated for the automatic identification of skin disease from dermoscopic images. The developed algorithm is applied to Task 3 of the ISIC 2018 challenge dataset (Skin Lesion Analysis Towards Melanoma Detection).
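The abstract does not say how the ensemble members are combined. One common choice, assumed here purely for illustration, is soft voting: average the class-probability vectors of the individual CNNs and take the most likely class. The function name `soft_vote` is hypothetical.

```python
from typing import List, Sequence, Tuple

def soft_vote(prob_lists: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-model class probabilities (soft voting) and return the
    index of the winning class together with the averaged probabilities."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all models must score the same set of classes")
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg
```

Soft voting lets a confident model outweigh two lukewarm ones, which plain majority voting over hard labels cannot do.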
Tasks
Published	2018-07-24
URL	http://arxiv.org/abs/1807.09163v1
PDF	http://arxiv.org/pdf/1807.09163v1.pdf
PWC	https://paperswithcode.com/paper/skin-disease-identification-from-dermoscopy
Repo
Framework

Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction


Title	Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction
Authors	Luowei Zhou, Nathan Louis, Jason J. Corso
Abstract	We study weakly-supervised video object grounding: given a video segment and a corresponding descriptive sentence, the goal is to localize the objects mentioned in the sentence within the video. During training, no object bounding boxes are available, but the set of possible objects to be grounded is known beforehand. Existing approaches in the image domain use Multiple Instance Learning (MIL) to ground objects by enforcing matches between visual and semantic features. A naive extension of this approach to the video domain is to treat the entire segment as a bag of spatial object proposals. However, an object that appears sparsely across multiple frames might not be detected completely, since spotting it in a single frame is enough to trigger a satisfactory match. To address this, we propagate the weak supervisory signal from the segment level to frames that likely contain the target object. For frames that are unlikely to contain the target objects, we use an alternative penalty loss. We also leverage the interactions among objects as a textual guide for the grounding. We evaluate our model on the newly collected benchmark YouCook2-BoundingBox and show improvements over competitive baselines.
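The weakness of the naive MIL extension can be seen in a few lines of NumPy. This sketch assumes proposal features of shape (frames, proposals, dim) and a single query embedding for the mentioned object; cosine similarity and the function names are illustrative choices, not the paper's model. The segment-level score saturates as soon as one frame matches, while per-frame scores expose frames where the object was never found, which is the signal the paper propagates to the frame level.

```python
import numpy as np

def cosine(proposals, query):
    """Cosine similarity between every proposal feature and the query."""
    p = proposals / np.linalg.norm(proposals, axis=-1, keepdims=True)
    q = query / np.linalg.norm(query)
    return p @ q  # shape (frames, proposals)

def naive_segment_score(proposals, query):
    """Naive MIL: the whole segment is one bag, so only the best match counts."""
    return float(cosine(proposals, query).max())

def frame_scores(proposals, query):
    """Best-matching proposal within each frame, shape (frames,)."""
    return cosine(proposals, query).max(axis=1)
```

With a matching proposal in frame 0 only, `naive_segment_score` is already 1.0, even though `frame_scores` is 0 in the remaining frames.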
Tasks	Multiple Instance Learning
Published	2018-05-08
URL	http://arxiv.org/abs/1805.02834v2
PDF	http://arxiv.org/pdf/1805.02834v2.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-video-object-grounding-from
Repo
Framework

Efficient Greedy Coordinate Descent for Composite Problems


Title	Efficient Greedy Coordinate Descent for Composite Problems
Authors	Sai Praneeth Karimireddy, Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi
Abstract	Coordinate descent with random coordinate selection is the current state of the art for many large-scale optimization problems. However, greedy selection of the steepest coordinate on smooth problems can yield convergence rates independent of the dimension $n$, requiring up to $n$ times fewer iterations. In this paper, we consider greedy updates based on subgradients for a class of non-smooth composite problems, which includes $L_1$-regularized problems, SVMs and related applications. For these problems we (i) provide the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case. This was previously conjectured to be true for a stronger greedy coordinate selection strategy. Furthermore, we show that (ii) our new selection rule can be mapped to instances of maximum inner product search, allowing standard nearest-neighbor algorithms to be leveraged to speed up the implementation. We demonstrate the validity of the approach through extensive numerical experiments.
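As a concrete reference point, here is a minimal NumPy sketch of greedy coordinate descent on the L1-regularized least-squares (Lasso) problem 0.5‖Ax − b‖² + λ‖x‖₁. It uses a subgradient-based Gauss-Southwell-style score (the magnitude of the minimal-norm subgradient per coordinate), which may differ in detail from the paper's rule, and it recomputes the full gradient each step for clarity rather than using the maximum-inner-product-search speedup. It assumes A has no all-zero columns.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.|."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def greedy_scores(x, g, lam):
    """Minimal-norm subgradient magnitude per coordinate of the Lasso objective."""
    return np.where(x != 0,
                    np.abs(g + lam * np.sign(x)),
                    np.maximum(np.abs(g) - lam, 0.0))

def greedy_cd_lasso(A, b, lam, iters=100, tol=1e-10):
    n = A.shape[1]
    x = np.zeros(n)
    L = (A ** 2).sum(axis=0)          # coordinate-wise Lipschitz constants ||A_i||^2
    r = A @ x - b                     # residual, updated incrementally
    for _ in range(iters):
        g = A.T @ r                   # gradient of the smooth part
        scores = greedy_scores(x, g, lam)
        i = int(np.argmax(scores))    # greedy (steepest) coordinate
        if scores[i] <= tol:          # zero subgradient available: optimal
            break
        new_xi = soft_threshold(x[i] - g[i] / L[i], lam / L[i])  # exact coordinate minimization
        r += A[:, i] * (new_xi - x[i])
        x[i] = new_xi
    return x
```

With A equal to the identity, the problem is separable and the sketch returns the soft-thresholded vector; for example b = [3, -0.5, 1] with λ = 1 gives [2, 0, 0].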
Tasks
Published	2018-10-16
URL	http://arxiv.org/abs/1810.06999v1
PDF	http://arxiv.org/pdf/1810.06999v1.pdf
PWC	https://paperswithcode.com/paper/efficient-greedy-coordinate-descent-for
Repo
Framework

A Survey of Deep Learning Techniques for Mobile Robot Applications


Title	A Survey of Deep Learning Techniques for Mobile Robot Applications
Authors	Jahanzaib Shabbir, Tarique Anwer
Abstract	Advances in deep learning over the years have attracted research into how deep artificial neural networks can be used in robotic systems. This survey summarizes current research, with a specific focus on the gains and obstacles in applying deep learning to mobile robotics.
Tasks
Published	2018-03-20
URL	http://arxiv.org/abs/1803.07608v1
PDF	http://arxiv.org/pdf/1803.07608v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-of-deep-learning-techniques-for
Repo
Framework

A Novel Multi-Task Tensor Correlation Neural Network for Facial Attribute Prediction


Title	A Novel Multi-Task Tensor Correlation Neural Network for Facial Attribute Prediction
Authors	Mingxing Duan, Kenli Li, Qi Tian
Abstract	Face multi-attribute prediction benefits substantially from multi-task learning (MTL), which learns multiple face attributes simultaneously to achieve shared or mutually related representations of different attributes. The most widely used MTL convolutional neural network is designed heuristically or empirically by sharing all of the convolutional layers and splitting at the fully connected layers for task-specific losses. However, it is inappropriate to treat the low- and mid-level features of different attributes as identical, especially when these attributes are only loosely related. In this paper, we propose a novel multi-attribute tensor correlation neural network (MTCN) for face attribute prediction. The structure shares the information in low-level features (e.g., the first two convolutional layers) but splits that in high-level features (e.g., from the third convolutional layer to the fully connected layer). At the same time, during high-level feature extraction, each subnetwork (e.g., Age-Net, Gender-Net, …, and Smile-Net) extracts closely related features from the other subnetworks to enhance its own features. Then, we project the features of the C9 layers of the fine-tuned subnetworks into a highly correlated space using a novel tensor correlation analysis algorithm (NTCCA). The final face attribute prediction is made based on the correlation matrix. Experimental results on benchmarks with multiple face attributes (CelebA and LFWA) show that the proposed approach outperforms state-of-the-art methods.
Tasks	Multi-Task Learning
Published	2018-04-09
URL	http://arxiv.org/abs/1804.02810v1
PDF	http://arxiv.org/pdf/1804.02810v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-multi-task-tensor-correlation-neural
Repo
Framework

Transformationally Identical and Invariant Convolutional Neural Networks through Symmetric Element Operators


Title	Transformationally Identical and Invariant Convolutional Neural Networks through Symmetric Element Operators
Authors	Shih Chung B. Lo, Matthew T. Freedman, Seong K. Mun, Shuo Gu
Abstract	Mathematically speaking, a transformationally invariant operator, such as a transformationally identical (TI) matrix kernel (i.e., K = T{K}), commutes with the transformation (T{.}) itself when they operate on the first operand matrix. We found that by consistently applying the same type of TI kernels in a convolutional neural network (CNN) system, the commutative property holds throughout all layers of convolution processes, with and without an activation function and/or a 1D convolution across channels within a layer. We further found that any CNN possessing the same TI kernel property for all convolution layers, followed by a flatten layer with weight sharing among the transformation-corresponding elements, would output the same result for all transformed versions of the original input vector. In short, CNN[ Vi ] = CNN[ T{Vi} ] provided every K = T{K} in the CNN, where Vi denotes the input vector and CNN[.] represents the whole CNN process as a function of the input vector that produces an output vector. With such a transformationally identical CNN (TI-CNN) system, each transformation that is not associated with a predefined TI used in data augmentation would inherently include all of its corresponding transformed versions of the input vector in the training. Hence the use of the same TI property for every kernel in the CNN would serve as an orientation- or translation-independent training guide in conjunction with error backpropagation during training. This TI kernel property is desirable for applications requiring highly consistent output across corresponding transformed versions of an input. Several C programming routines are provided to help interested parties use the TI-CNN technique, which is expected to generalize better than its ordinary CNN counterpart.
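The commutation property K = T{K} ⇒ conv(T{V}, K) = T{conv(V, K)} is easy to check numerically. This sketch assumes T is a left-right flip (the paper considers other transformations too) and uses a plain sum as the final readout, the simplest flatten layer whose weights are shared between mirrored positions. The function names are illustrative, not the routines shipped with the paper.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, as computed by a CNN convolution layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def make_ti_kernel(raw):
    """Symmetrize a kernel so that K = T{K} for T = left-right flip."""
    return 0.5 * (raw + np.fliplr(raw))

def ti_cnn_output(image, kernels):
    """TI convolution layers with ReLU, then a flip-symmetric readout (a sum)."""
    x = image
    for k in kernels:
        x = np.maximum(correlate2d_valid(x, k), 0.0)  # ReLU commutes with the flip
    return x.sum()
```

Because every layer commutes with the flip and the readout treats mirrored positions identically, `ti_cnn_output(V, ks)` equals `ti_cnn_output(np.fliplr(V), ks)` for any input V.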
Tasks	Data Augmentation
Published	2018-06-10
URL	http://arxiv.org/abs/1806.03636v3
PDF	http://arxiv.org/pdf/1806.03636v3.pdf
PWC	https://paperswithcode.com/paper/transformationally-identical-and-invariant-1
Repo
Framework

Epipolar Geometry based Learning of Multi-view Depth and Ego-Motion from Monocular Sequences


Title	Epipolar Geometry based Learning of Multi-view Depth and Ego-Motion from Monocular Sequences
Authors	Vignesh Prasad, Dipanjan Das, Brojeshwar Bhowmick
Abstract	Deep approaches to predicting monocular depth and ego-motion have grown in popularity in recent years due to their ability to produce dense depth from monocular images. The main idea behind them is to optimize the photometric consistency over image sequences by warping one view into another, similar to direct visual odometry methods. One major drawback is that these methods infer depth from a single view, which might not effectively capture the relation between pixels. Moreover, simply minimizing the photometric loss does not ensure proper pixel correspondences, which are a key factor for accurate depth and pose estimation. In contrast, we propose a 2-view depth network that infers the scene depth from consecutive frames, thereby learning inter-pixel relationships. To ensure better correspondences, and thereby better geometric understanding, we propose incorporating epipolar constraints to make the learning more geometrically sound. We use the Essential matrix obtained with Nistér's Five Point Algorithm to enforce meaningful geometric constraints, rather than using it as training labels. This allows us to use fewer trainable parameters than state-of-the-art methods. The proposed method produces better depth images and pose estimates, which capture the scene structure and motion more accurately. Such geometrically constrained learning succeeds even in cases where simply minimizing the photometric error would fail.
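The epipolar constraint the abstract relies on states that corresponding normalized image points satisfy x2ᵀ E x1 = 0, with E = [t]× R. The NumPy sketch below builds E from a known relative pose instead of estimating it with Nistér's five-point algorithm, and computes the algebraic residual that a geometry-aware loss could penalize; how the paper weights or applies it is not specified here, so treat the residual-as-loss idea as an assumption.

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, t):
    """E = [t]_x R for a relative pose with X2 = R @ X1 + t."""
    return skew(t) @ R

def epipolar_residuals(E, x1, x2):
    """Algebraic epipolar error x2^T E x1 per correspondence.

    x1, x2: (N, 3) homogeneous normalized image coordinates (intrinsics removed).
    """
    return np.einsum("ni,ij,nj->n", x2, E, x1)
```

True correspondences give residuals at machine precision, while mismatched pairs generally do not, which is what makes the constraint useful for enforcing correct pixel correspondences.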
Tasks	Visual Odometry
Published	2018-12-23
URL	http://arxiv.org/abs/1812.11922v3
PDF	http://arxiv.org/pdf/1812.11922v3.pdf
PWC	https://paperswithcode.com/paper/epipolar-geometry-based-learning-of-multi
Repo
Framework

Tree-structured multi-stage principal component analysis (TMPCA): theory and applications


Title	Tree-structured multi-stage principal component analysis (TMPCA): theory and applications
Authors	Yuanhang Su, Ruiyuan Lin, C. -C. Jay Kuo
Abstract	A PCA-based sequence-to-vector (seq2vec) dimension reduction method for the text classification problem, called tree-structured multi-stage principal component analysis (TMPCA), is presented in this paper. The theoretical analysis and applicability of TMPCA are demonstrated as an extension of our previous work (Su, Huang & Kuo). Unlike conventional word-to-vector embedding methods, the TMPCA method conducts dimension reduction at the sequence level without labeled training data. Furthermore, it can preserve the sequential structure of input sequences. We show that TMPCA is computationally efficient, and we show mathematically that it facilitates sequence-based text classification by preserving strong mutual information between its input and output. Experimental results also demonstrate that a dense (fully connected) network trained on TMPCA-preprocessed data achieves better performance than state-of-the-art fastText and other neural-network-based solutions.
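A tree-structured, multi-stage reduction can be sketched as follows, under our reading of TMPCA: at each stage, adjacent pairs of vectors are concatenated and mapped back to the original dimension by a PCA fitted on that stage's pairs, halving the sequence length until one vector per sequence remains. Details such as mean-centering and the power-of-two length requirement are assumptions of this sketch, and `tmpca_fit_transform` is a hypothetical name.

```python
import numpy as np

def pca_projection(data, out_dim):
    """Mean and top-`out_dim` principal directions of row-sample data (via SVD)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:out_dim].T  # shapes (in_dim,), (in_dim, out_dim)

def tmpca_fit_transform(seqs):
    """seqs: (num_seqs, length, dim) with length a power of two.

    Each stage concatenates adjacent vector pairs (2*dim features), projects
    them back to dim with a PCA fitted on that stage's pairs, and halves the
    sequence length. Returns one dim-vector per sequence plus the stage maps.
    """
    n, length, dim = seqs.shape
    if length < 1 or length & (length - 1):
        raise ValueError("sequence length must be a power of two")
    x = seqs
    stages = []
    while x.shape[1] > 1:
        flat = x.reshape(-1, 2 * dim)  # each row = (left, right) adjacent pair
        mean, proj = pca_projection(flat, dim)
        stages.append((mean, proj))
        x = ((flat - mean) @ proj).reshape(n, -1, dim)
    return x[:, 0, :], stages
```

Because adjacent pairs are merged in order, the tree keeps some of the sequential structure that a bag-of-words average would discard, and no labels are needed at any stage.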
Tasks	Dimensionality Reduction, Text Classification
Published	2018-07-22
URL	http://arxiv.org/abs/1807.08228v2
PDF	http://arxiv.org/pdf/1807.08228v2.pdf
PWC	https://paperswithcode.com/paper/tree-structured-multi-stage-principal
Repo
Framework

Event-Based Features Selection and Tracking from Intertwined Estimation of Velocity and Generative Contours


Title	Event-Based Features Selection and Tracking from Intertwined Estimation of Velocity and Generative Contours
Authors	Laurent Dardelet, Sio-Hoi Ieng, Ryad Benosman
Abstract	This paper presents a new event-based method for detecting and tracking features from the output of an event-based camera. Unlike many tracking algorithms from the computer vision community, this process does not target particular predefined shapes such as corners. It relies on a dual, intertwined, iterative and continuous (purely event-based) estimation of the velocity vector and a Bayesian description of the generative feature contours. By projecting along estimated speeds updated for each incoming event, it is possible to identify and determine the spatial location and generative contour of the tracked feature while iteratively updating the estimate of the velocity vector. Results are shown in several environments with large variations in luminosity, speed, nature and size of the tracked features. Using speed instead of position provides much faster feedback, allowing for very fast convergence rates.
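The "projecting along estimated speeds" step can be illustrated with generic motion compensation: each event (x, y, t) is shifted back along a velocity estimate to a common reference time. With the correct velocity, events generated by the same contour point collapse together; with a wrong one they smear. This NumPy sketch assumes a constant velocity and omits the paper's Bayesian contour model entirely.

```python
import numpy as np

def project_events(events, velocity, t_ref):
    """events: (N, 3) array of (x, y, t). Shift each event back along the
    velocity estimate to where it would have been at time t_ref."""
    xy = events[:, :2]
    dt = events[:, 2:3] - t_ref
    return xy - dt * np.asarray(velocity, dtype=float)[None, :]

def spread(points):
    """Sum of per-axis variances: small when projected events line up."""
    return float(points.var(axis=0).sum())
```

Minimizing `spread` over candidate velocities is one simple way to see why a good velocity estimate sharpens the tracked contour, which is the coupling the paper exploits.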
Tasks
Published	2018-11-19
URL	http://arxiv.org/abs/1811.07839v1
PDF	http://arxiv.org/pdf/1811.07839v1.pdf
PWC	https://paperswithcode.com/paper/event-based-features-selection-and-tracking
Repo
Framework

Geometric-based Line Segment Tracking for HDR Stereo Sequences


Title	Geometric-based Line Segment Tracking for HDR Stereo Sequences
Authors	Ruben Gomez-Ojeda, Javier Gonzalez-Jimenez
Abstract	In this work, we propose a purely geometrical approach for the robust matching of line segments in challenging stereo streams with severe illumination changes or High Dynamic Range (HDR) environments. To that end, we exploit the univocal nature of the matching problem, i.e., every observation must be matched to a single feature or not matched at all. We state the problem as a sparse, convex L1-minimization of the matching vector, regularized by the geometric constraints. This formulation allows robust tracking of line segments along sequences where traditional appearance-based matching techniques tend to fail due to dynamic changes in illumination conditions. Moreover, the proposed matching algorithm also yields a considerable speed-up over previous state-of-the-art techniques, making it suitable for real-time applications such as Visual Odometry (VO). This, of course, comes at the expense of a slightly lower number of matches compared with appearance-based methods, and it also limits the method's application to continuous video sequences, as it is rather constrained to small pose increments between consecutive frames. We validate the claimed advantages by first evaluating the matching performance on challenging video sequences, and then testing the method in a benchmarked point- and line-based VO algorithm.
Tasks	Visual Odometry
Published	2018-09-25
URL	http://arxiv.org/abs/1809.09368v1
PDF	http://arxiv.org/pdf/1809.09368v1.pdf
PWC	https://paperswithcode.com/paper/geometric-based-line-segment-tracking-for-hdr
Repo
Framework

The Aqualoc Dataset: Towards Real-Time Underwater Localization from a Visual-Inertial-Pressure Acquisition System


Title	The Aqualoc Dataset: Towards Real-Time Underwater Localization from a Visual-Inertial-Pressure Acquisition System
Authors	Maxime Ferrera, Julien Moras, Pauline Trouvé-Peloux, Vincent Creuze, Denis Dégez
Abstract	This paper presents a new underwater dataset acquired from a visual-inertial-pressure acquisition system and meant to be used to benchmark visual odometry, visual SLAM and multi-sensor SLAM solutions. The dataset is publicly available and contains ground-truth trajectories for evaluation.
Tasks	Visual Odometry
Published	2018-09-19
URL	http://arxiv.org/abs/1809.07076v1
PDF	http://arxiv.org/pdf/1809.07076v1.pdf
PWC	https://paperswithcode.com/paper/the-aqualoc-dataset-towards-real-time
Repo
Framework

Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors


Title	Recurrent Binary Embedding for GPU-Enabled Exhaustive Retrieval from Billion-Scale Semantic Vectors
Authors	Ying Shan, Jian Jiao, Jie Zhu, JC Mao
Abstract	Rapid advances in GPU hardware and multiple areas of Deep Learning open up a new opportunity for billion-scale information retrieval with exhaustive search. Building on top of the powerful concept of semantic learning, this paper proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval. The model has the unique ability to refine a base binary vector by progressively adding binary residual vectors to meet the desired accuracy. The refined vector enables efficient implementation of exhaustive similarity computation with bit-wise operations, followed by a near-lossless k-NN selection algorithm, also proposed in this paper. The proposed algorithms are integrated into an end-to-end multi-GPU system that retrieves thousands of top items from over a billion candidates in real-time. The RBE model and the retrieval system were evaluated with data from a major paid search engine. When measured against the state-of-the-art model for binary representation and the full precision model for semantic embedding, RBE significantly outperformed the former, and filled in over 80% of the AUC gap in-between. Experiments comparing with our production retrieval system also demonstrated superior performance. While the primary focus of this paper is to build RBE based on a particular class of semantic models, generalizing to other types is straightforward, as exemplified by two different models at the end of the paper.
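Two ideas from the abstract, refining a binary code with binary residual vectors and computing similarity with bit-wise operations, can be sketched together. The residual step below is a generic greedy residual binarization (a stand-in: RBE learns its residual vectors inside the model), and the similarity uses the identity a·b = d − 2·popcount(a XOR b) for {−1, +1} vectors packed into integers. All function names are hypothetical.

```python
import numpy as np

def binarize_residual(v, num_bits):
    """Approximate v by sum_k alpha_k * b_k with b_k in {-1, +1}^d, each stage
    binarizing the residual left by the earlier stages."""
    residual = np.asarray(v, dtype=float).copy()
    codes, scales = [], []
    for _ in range(num_bits):
        b = np.where(residual >= 0, 1.0, -1.0)
        alpha = np.abs(residual).mean()   # least-squares scale for sign(residual)
        codes.append(b)
        scales.append(alpha)
        residual -= alpha * b
    return codes, scales

def pack(b):
    """Pack a {-1, +1} vector into an int bitmask (bit i set where b[i] > 0)."""
    return sum(1 << i for i, s in enumerate(b) if s > 0)

def pm1_dot(packed_a, packed_b, dim):
    """Dot product of two {-1, +1}^dim vectors via XOR and popcount."""
    return dim - 2 * bin(packed_a ^ packed_b).count("1")

def approx_dot(codes_q, scales_q, codes_x, scales_x):
    """Inner product of two residual-binarized vectors using only bit ops per code pair."""
    dim = len(codes_q[0])
    total = 0.0
    for bq, aq in zip(codes_q, scales_q):
        pq = pack(bq)
        for bx, ax in zip(codes_x, scales_x):
            total += aq * ax * pm1_dot(pq, pack(bx), dim)
    return total
```

Each extra residual code can only shrink the reconstruction error, since the optimal scale removes d·α² from the squared residual norm; this mirrors the abstract's "adding binary residual vectors to meet the desired accuracy". In a real system the codes would be packed once and scored exhaustively on the GPU.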
Tasks	Information Retrieval
Published	2018-02-18
URL	http://arxiv.org/abs/1802.06466v1
PDF	http://arxiv.org/pdf/1802.06466v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-binary-embedding-for-gpu-enabled
Repo
Framework

Pulse Sequence Resilient Fast Brain Segmentation


Title	Pulse Sequence Resilient Fast Brain Segmentation
Authors	Amod Jog, Bruce Fischl
Abstract	Accurate automatic segmentation of brain anatomy from $T_1$-weighted ($T_1$-w) magnetic resonance images (MRI) has been a computationally intensive bottleneck in neuroimaging pipelines, with state-of-the-art results obtained by unsupervised intensity modeling-based methods and multi-atlas registration and label fusion. With the advent of powerful supervised convolutional neural network (CNN)-based learning algorithms, it is now possible to produce a high-quality brain segmentation within seconds. However, the very supervised nature of these methods makes it difficult to generalize them to data different from what they were trained on. Modern neuroimaging studies are necessarily multi-center initiatives with a wide variety of acquisition protocols. Despite stringent protocol harmonization practices, it is not possible to standardize the whole gamut of MRI imaging parameters that affect image contrast across scanners, field strengths, receive coils, etc. In this paper we propose a CNN-based segmentation algorithm that, in addition to being highly accurate and fast, is also resilient to variation in the input $T_1$-w acquisition. Our approach relies on building approximate forward models of the $T_1$-w pulse sequences that produce a typical test image. We use the forward models to augment the training data with test-data-specific training examples. These augmented data can be used to update and/or build a more robust segmentation model that is better attuned to the imaging properties of the test data. Our method generates highly accurate, state-of-the-art segmentation results (overall Dice overlap = 0.94) within seconds and is consistent across a wide range of protocols.
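For reference, the Dice overlap quoted above is 2|A ∩ B| / (|A| + |B|) for a predicted and a reference label region. The abstract does not say how the "overall" figure aggregates structures, so the unweighted mean over labels below is an assumption, as is scoring a label absent from both volumes as 1.0.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for one structure label."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # structure absent from both volumes: treat as agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def mean_dice(seg_a, seg_b, labels):
    """Unweighted mean Dice over a list of structure labels (assumed aggregation)."""
    return float(np.mean([dice(seg_a, seg_b, lab) for lab in labels]))
```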
Tasks	Brain Segmentation
Published	2018-07-30
URL	http://arxiv.org/abs/1807.11598v1
PDF	http://arxiv.org/pdf/1807.11598v1.pdf
PWC	https://paperswithcode.com/paper/pulse-sequence-resilient-fast-brain
Repo
Framework

A Modality-Adaptive Method for Segmenting Brain Tumors and Organs-at-Risk in Radiation Therapy Planning


Title	A Modality-Adaptive Method for Segmenting Brain Tumors and Organs-at-Risk in Radiation Therapy Planning
Authors	Mikael Agn, Per Munck af Rosenschöld, Oula Puonti, Michael J. Lundemann, Laura Mancini, Anastasia Papadaki, Steffi Thust, John Ashburner, Ian Law, Koen Van Leemput
Abstract	In this paper we present a method for simultaneously segmenting brain tumors and an extensive set of organs-at-risk for radiation therapy planning of glioblastomas. The method combines a contrast-adaptive generative model for whole-brain segmentation with a new spatial regularization model of tumor shape using convolutional restricted Boltzmann machines. We demonstrate experimentally that the method is able to adapt to image acquisitions that differ substantially from any available training data, ensuring its applicability across treatment sites; that its tumor segmentation accuracy is comparable to that of the current state of the art; and that it captures most organs-at-risk sufficiently well for radiation therapy planning purposes. The proposed method may be a valuable step towards automating the delineation of brain tumors and organs-at-risk in glioblastoma patients undergoing radiation therapy.
Tasks	Brain Segmentation
Published	2018-07-18
URL	http://arxiv.org/abs/1807.10588v2
PDF	http://arxiv.org/pdf/1807.10588v2.pdf
PWC	https://paperswithcode.com/paper/a-modality-adaptive-method-for-segmenting
Repo
Framework