Paper Group ANR 762
Learning and T-Norms Theory. Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery. Efficient Object Detection Model for Real-Time UAV Applications. Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core. Scene Graph based Image Retrieval – A case study on the CLEVR Dataset. ROI Regularization for Semi-supervised and Supervised Learning. Scaling Matters in Deep Structured-Prediction Models. SecureBoost: A Lossless Federated Learning Framework. Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates. A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus. SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition. Gaussian Processes for Analyzing Positioned Trajectories in Sports. PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement. Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification.
Learning and T-Norms Theory
Title | Learning and T-Norms Theory |
Authors | Francesco Giannini, Giuseppe Marra, Michelangelo Diligenti, Marco Maggini, Marco Gori |
Abstract | Neuro-symbolic approaches have recently gained popularity to inject prior knowledge into a learner without requiring it to induce this knowledge from data. These approaches can potentially learn competitive solutions with a significant reduction of the amount of supervised data. A large class of neuro-symbolic approaches is based on First-Order Logic to represent prior knowledge, relaxed to a differentiable form using fuzzy logic. This paper shows that the loss function expressing these neuro-symbolic learning tasks can be unambiguously determined given the selection of a t-norm generator. When restricted to supervised learning, the presented theoretical apparatus provides a clean justification for the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. However, the proposed learning formulation extends the advantages of the cross-entropy loss to the general knowledge that can be represented by a neuro-symbolic method. Therefore, the methodology allows the development of a novel class of loss functions, which are shown in the experimental results to lead to faster convergence rates than the approaches previously proposed in the literature. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11468v2 |
https://arxiv.org/pdf/1907.11468v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-t-norms-theory |
Repo | |
Framework | |
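The abstract's central claim is that fixing a t-norm generator fixes the loss. As a hedged illustration (our notation, not the paper's: f is the network output and y_i the target class of example x_i), the product t-norm has additive generator g(x) = -log x, and the generator-based loss for a set of supervised atoms reduces to the familiar cross-entropy:

```latex
% Sketch under the assumption of the product t-norm generator g(x) = -\log x:
\[
  L(f) \;=\; \sum_{i} g\!\left( f_{y_i}(x_i) \right)
       \;=\; -\sum_{i} \log f_{y_i}(x_i),
\]
% i.e. the cross-entropy between the one-hot targets and the network outputs,
% which is the supervised special case discussed in the abstract.
```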
Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery
Title | Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery |
Authors | Jimin Tan, Anastasios Noulas, Diego Sáez, Rossano Schifanella |
Abstract | Rendering Wikipedia content through mobile and augmented reality mediums can enable new forms of interaction in urban-focused user communities facilitating learning, communication and knowledge exchange. With this objective in mind, in this work we develop a mobile application that allows for the recognition of notable sites featured on Wikipedia. The application is powered by a deep neural network that has been trained on crowd-sourced imagery describing sites of interest, such as buildings, statues, museums or other physical entities that are present and visually accessible in an urban environment. We describe an end-to-end pipeline covering data collection, model training and evaluation of our application, considering online and real-world scenarios. We identify a number of challenges in the site recognition task which arise due to visual similarities amongst the classified sites as well as due to noise introduced by the surrounding built environment. We demonstrate how using mobile contextual information, such as user location, orientation and attention patterns, can significantly alleviate such challenges. Moreover, we present an unsupervised learning technique to de-noise crowd-sourced imagery, which improves classification performance further. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09705v2 |
https://arxiv.org/pdf/1910.09705v2.pdf | |
PWC | https://paperswithcode.com/paper/notable-site-recognition-on-mobile-devices |
Repo | |
Framework | |
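The abstract reports that mobile context (user location, orientation, attention) alleviates confusion between visually similar sites. A minimal sketch of one such use, assuming hypothetical site coordinates and a 300 m search radius (neither is from the paper), is to discard candidate sites far from the user's GPS fix and renormalize the classifier scores:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rerank_by_location(scores, site_coords, user_lat, user_lon, radius_km=0.3):
    """Drop sites farther than radius_km from the user and renormalize the scores.

    scores:      dict mapping site name -> softmax probability from the CNN.
    site_coords: dict mapping site name -> (lat, lon).
    """
    nearby = {
        name: p for name, p in scores.items()
        if haversine_km(user_lat, user_lon, *site_coords[name]) <= radius_km
    }
    total = sum(nearby.values()) or 1.0
    return {name: p / total for name, p in nearby.items()}

# Example: two visually plausible candidates, only one of which is nearby.
scores = {"Flatiron Building": 0.55, "Tower Bridge": 0.45}
coords = {"Flatiron Building": (40.7411, -73.9897), "Tower Bridge": (51.5055, -0.0754)}
print(rerank_by_location(scores, coords, user_lat=40.7410, user_lon=-73.9900))
```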
Efficient Object Detection Model for Real-Time UAV Applications
Title | Efficient Object Detection Model for Real-Time UAV Applications |
Authors | Subrahmanyam Vaddi, Chandan Kumar, Ali Jannesari |
Abstract | Unmanned Aerial Vehicles (UAVs), especially drones equipped with vision techniques, have become very popular in recent years, with their extensive use in a wide range of applications. Many of these applications require computer vision techniques, particularly object detection from the information captured by the on-board camera. In this paper, we propose an end-to-end object detection model running on a UAV platform which is suitable for real-time applications. We propose a deep feature pyramid architecture which makes use of inherent properties of features extracted from Convolutional Networks by capturing more generic features in the images (such as edges and color) along with the minute detailed features specific to the classes contained in our problem. We use the VisDrone-18 dataset for our studies, which contains different objects such as pedestrians, vehicles and bicycles. We describe the software and hardware architecture of the platform used in this study. We implemented our model with both ResNet and MobileNet as convolutional bases. Our model, combined with a modified focal loss function, produced a desirable performance of 30.6 mAP for object detection with an inference speed of 14 fps. We compared our results with RetinaNet-ResNet-50 and HAL-RetinaNet and showed that our model combined with MobileNet as the backend feature extractor gave the best results in terms of accuracy, speed and memory efficiency, and is best suited for real-time object detection with drones. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1906.00786v1 |
https://arxiv.org/pdf/1906.00786v1.pdf | |
PWC | https://paperswithcode.com/paper/190600786 |
Repo | |
Framework | |
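The abstract couples the detector with a modified focal loss but does not spell the modification out here, so the sketch below only restates the standard focal loss of Lin et al. in NumPy as a point of reference; the alpha/gamma defaults are the usual ones, not the paper's settings:

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Standard binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs:   predicted foreground probabilities, shape (N,).
    targets: binary ground-truth labels, shape (N,).
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

# Easy, confident predictions are down-weighted relative to hard ones:
print(focal_loss(np.array([0.9, 0.1, 0.2]), np.array([1, 0, 1])))
```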
Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core
Title | Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core |
Authors | Wadim Kehl, Federico Tombari, Slobodan Ilic, Nassir Navab |
Abstract | We present a novel method to track 3D models in color and depth data. To this end, we introduce approximations that accelerate the state-of-the-art in region-based tracking by an order of magnitude while retaining similar accuracy. Furthermore, we show how the method can be made more robust in the presence of depth data and consequently formulate a new joint contour and ICP tracking energy. We present better results than the state-of-the-art while being much faster than most other methods and achieving all of the above on a single CPU core. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10249v1 |
https://arxiv.org/pdf/1911.10249v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-3d-model-tracking-in-color-and-1 |
Repo | |
Framework | |
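The joint contour and ICP energy itself is not given in the abstract; purely for orientation, a commonly used form of such a combination (our assumption, not the paper's exact formulation) is a weighted sum of a region-based contour term and a point-to-plane ICP term over the depth data:

```latex
\[
  E(\xi) \;=\; E_{\mathrm{contour}}(\xi) \;+\; \lambda\, E_{\mathrm{ICP}}(\xi),
  \qquad
  E_{\mathrm{ICP}}(\xi) \;=\; \sum_{i} \bigl( n_i^{\top} \bigl( T(\xi)\, p_i - q_i \bigr) \bigr)^{2},
\]
% T(\xi): rigid transform for pose \xi; p_i: model points; q_i: closest depth
% points; n_i: corresponding normals; \lambda: relative weight of the two terms.
```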
Scene Graph based Image Retrieval – A case study on the CLEVR Dataset
Title | Scene Graph based Image Retrieval – A case study on the CLEVR Dataset |
Authors | Sahana Ramnath, Amrita Saha, Soumen Chakrabarti, Mitesh M. Khapra |
Abstract | With the proliferation of multimodal interaction in various domains, there has recently been much interest in text-based image retrieval in the computer vision community. However, most state-of-the-art techniques model this problem in a purely neural way, which makes it difficult to incorporate pragmatic strategies when searching a large-scale catalog, especially when the search requirements are insufficient and the model needs to resort to an interactive retrieval process through multiple iterations of question answering. Motivated by this, we propose a neural-symbolic approach for one-shot retrieval of images from a large-scale catalog, given the caption description. To facilitate this, we represent the catalog and caption as scene graphs and model the retrieval task as a learnable graph matching problem, trained end-to-end with a REINFORCE algorithm. Further, we briefly describe an extension of this pipeline to an iterative retrieval framework, based on interactive questioning and answering. |
Tasks | Graph Matching, Image Retrieval, Question Answering |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00850v1 |
https://arxiv.org/pdf/1911.00850v1.pdf | |
PWC | https://paperswithcode.com/paper/scene-graph-based-image-retrieval-a-case |
Repo | |
Framework | |
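The retrieval policy is trained end-to-end with REINFORCE. The snippet below is a generic score-function surrogate loss for sampling one catalog image per caption; the reward definition, the scoring network behind the logits, and the baseline are placeholder assumptions rather than the paper's implementation:

```python
import torch

def reinforce_loss(logits, reward, baseline=0.0):
    """Score-function (REINFORCE) surrogate loss for one retrieval decision.

    logits:   unnormalized matching scores over catalog images, shape (N,).
    reward:   scalar reward, e.g. 1.0 if the sampled image matches the caption.
    baseline: scalar variance-reduction baseline (e.g. a running mean of rewards).
    """
    probs = torch.softmax(logits, dim=-1)
    action = torch.multinomial(probs, num_samples=1)       # sample an image index
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    return -(reward - baseline) * log_prob.squeeze()

# Toy usage: three candidate images scored by some graph-matching network.
logits = torch.randn(3, requires_grad=True)
loss = reinforce_loss(logits, reward=1.0)
loss.backward()            # gradients flow back into the scoring network
print(loss.item())
```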
ROI Regularization for Semi-supervised and Supervised Learning
Title | ROI Regularization for Semi-supervised and Supervised Learning |
Authors | Hiroshi Kaizuka, Yasuhiro Nagasaki, Ryo Sako |
Abstract | We propose ROI regularization (ROIreg) as a semi-supervised learning method for image classification. ROIreg focuses on the maximum probability of the posterior probability distribution g(x) obtained when an unlabeled data sample x is fed into a convolutional neural network (CNN). ROIreg divides the pixel set of x into multiple blocks and evaluates, for each block, its contribution to the maximum probability. A masked data sample x_ROI is generated by replacing blocks with relatively small contributions with random images. Then, ROIreg trains the CNN so that g(x_ROI) changes as little as possible from g(x). Therefore, ROIreg can be said to further refine the classification ability of the CNN. On the other hand, Virtual Adversarial Training (VAT), which is an excellent semi-supervised learning method, generates a data sample x_VAT by perturbing x in the direction in which g(x) changes most. Then, VAT trains the CNN so that g(x_VAT) changes as little as possible from g(x). Therefore, VAT can be said to be a method that remedies the CNN's weaknesses. Thus, ROIreg and VAT have complementary training effects. In fact, the combination of VAT and ROIreg improves the results obtained when using VAT or ROIreg alone. This combination also improves the state-of-the-art on “SVHN with and without data augmentation” and “CIFAR-10 without data augmentation”. We also propose ROI augmentation (ROIaug) as a method to apply ROIreg to data augmentation in supervised learning. However, the evaluation function used there is different from the standard cross-entropy. ROIaug improves the performance of supervised learning for both SVHN and CIFAR-10. Finally, we investigate the performance degradation of VAT and VAT+ROIreg when data samples not belonging to the classification classes are included in the unlabeled data. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.08615v1 |
https://arxiv.org/pdf/1905.08615v1.pdf | |
PWC | https://paperswithcode.com/paper/190508615 |
Repo | |
Framework | |
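The masking procedure in the abstract is concrete enough to sketch. In the hedged sketch below, each block's contribution is estimated by occlusion (mask the block, measure the drop in the maximum class probability), the least important blocks are replaced by random pixels, and a KL term keeps g(x_ROI) close to g(x); the occlusion estimate, block size, keep ratio and KL choice are our assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def roireg_masked_input(model, x, block=8, keep_ratio=0.5):
    """Build x_ROI for one image x of shape (C, H, W) under the assumptions above."""
    with torch.no_grad():
        p_max = F.softmax(model(x.unsqueeze(0)), dim=1).max()
        c, h, w = x.shape
        drops, coords = [], []
        for i in range(0, h, block):
            for j in range(0, w, block):
                occluded = x.clone()
                occluded[:, i:i + block, j:j + block] = torch.rand(c, min(block, h - i), min(block, w - j))
                p = F.softmax(model(occluded.unsqueeze(0)), dim=1).max()
                drops.append(p_max - p)                    # large drop = important block
                coords.append((i, j))
        order = torch.argsort(torch.stack(drops))          # least important first
        x_roi = x.clone()
        n_replace = int(len(coords) * (1.0 - keep_ratio))
        for k in order[:n_replace]:
            i, j = coords[int(k)]
            x_roi[:, i:i + block, j:j + block] = torch.rand(c, min(block, h - i), min(block, w - j))
    return x_roi

def roireg_loss(model, x):
    """Consistency term: keep g(x_ROI) close to g(x); KL divergence is a stand-in."""
    x_roi = roireg_masked_input(model, x)
    p = F.softmax(model(x.unsqueeze(0)), dim=1).detach()
    log_q = F.log_softmax(model(x_roi.unsqueeze(0)), dim=1)
    return F.kl_div(log_q, p, reduction="batchmean")
```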
Scaling Matters in Deep Structured-Prediction Models
Title | Scaling Matters in Deep Structured-Prediction Models |
Authors | Aleksandr Shevchenko, Anton Osokin |
Abstract | Deep structured-prediction energy-based models combine the expressive power of learned representations and the ability to embed knowledge about the task at hand into the system. A common way to learn the parameters of such models is a multistage procedure where different combinations of components are trained at different stages. The joint end-to-end training of the whole system is then done as the last fine-tuning stage. This multistage approach is time-consuming and cumbersome as it requires multiple runs until convergence and multiple rounds of hyperparameter tuning. From this point of view, it is beneficial to start the joint training procedure from the beginning. However, such approaches often unexpectedly fail and deliver results worse than the multistage ones. In this paper, we hypothesize that one reason for joint training of deep energy-based models to fail is the incorrect relative normalization of different components in the energy function. We propose online and offline scaling algorithms that fix the joint training and demonstrate their efficacy on three different tasks. |
Tasks | Structured Prediction |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11088v1 |
http://arxiv.org/pdf/1902.11088v1.pdf | |
PWC | https://paperswithcode.com/paper/scaling-matters-in-deep-structured-prediction |
Repo | |
Framework | |
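One way to read "online scaling" of energy components, sketched here purely as an assumption since the abstract does not describe the algorithms, is to keep a running estimate of each component's magnitude and divide by it so that all terms enter the joint energy on a comparable scale:

```python
class OnlineEnergyScaler:
    """Hedged sketch: exponential-moving-average rescaling of energy components.

    The EMA rule, momentum and epsilon are our assumptions, not the paper's
    online/offline scaling algorithms.
    """

    def __init__(self, n_terms, momentum=0.99, eps=1e-8):
        self.scale = [1.0] * n_terms
        self.momentum = momentum
        self.eps = eps

    def __call__(self, terms):
        """terms: list of scalar energy components for the current batch."""
        total = 0.0
        for k, value in enumerate(terms):
            self.scale[k] = self.momentum * self.scale[k] + (1 - self.momentum) * abs(value)
            total += value / (self.scale[k] + self.eps)
        return total

# Two components with very different magnitudes end up on a comparable scale.
scaler = OnlineEnergyScaler(n_terms=2)
print(scaler([1200.0, 0.03]))
```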
SecureBoost: A Lossless Federated Learning Framework
Title | SecureBoost: A Lossless Federated Learning Framework |
Authors | Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Qiang Yang |
Abstract | The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks with data sharing that do not violate user privacy. To meet this goal, in this paper, we propose a novel lossless privacy-preserving tree-boosting system known as SecureBoost in the setting of federated learning. This federated-learning system allows a learning process to be jointly conducted over multiple parties with partially common user samples but different feature sets, which corresponds to a vertically partitioned virtual data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about any individual private data provider. We theoretically prove that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that bring the data into one place. In addition, along with a proof of security, we discuss what would be required to make the protocols completely secure. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08755v1 |
http://arxiv.org/pdf/1901.08755v1.pdf | |
PWC | https://paperswithcode.com/paper/secureboost-a-lossless-federated-learning |
Repo | |
Framework | |
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
Title | Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates |
Authors | Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, Javier Jorge, Nahuel Roselló, Adrià Giménez, Albert Sanchis, Jorge Civera, Alfons Juan |
Abstract | Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable. |
Tasks | Machine Translation, Speech Recognition |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03167v3 |
https://arxiv.org/pdf/1911.03167v3.pdf | |
PWC | https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech |
Repo | |
Framework | |
A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus
Title | A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus |
Authors | Adam Rambousek, Ales Horak, Vit Suchomel, Vit Baisa |
Abstract | This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus, i.e. a multilingual terminology dictionary enriched with relationships between the terms themselves to form a thesaurus. The system allows current terminology expert groups, where most of the editing decisions still come from introspection, to radically enhance their workflow. The presented system supplements the lexicographic process with natural language processing techniques, which are seamlessly integrated into the thesaurus editing environment. The system’s methodology and the resulting thesaurus are closely connected to new domain corpora in the six languages involved. They are used for term usage examples as well as for the automatic extraction of new candidate terms. The terminological thesaurus is now accessible via a web-based application, which a) presents rich detailed information on each term, b) visualizes term relations, and c) displays real-life usage examples of the term in domain-related documents and in contextually similar terms. Furthermore, the specialized corpora are used to detect candidate translations of terms from the central language (Czech) to the other languages (English, French, German, Russian and Slovak) as well as to detect broader Czech terms, which help to place new terms in the actual thesaurus hierarchy. This project has been realized as a terminological thesaurus of land surveying, but the presented tools and methodology are reusable for other terminology domains. |
Tasks | |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10921v2 |
http://arxiv.org/pdf/1903.10921v2.pdf | |
PWC | https://paperswithcode.com/paper/a-new-approach-for-semi-automatic-building |
Repo | |
Framework | |
SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion
Title | SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion |
Authors | Peng Liu, Ruogu Fang |
Abstract | With substantial public concerns on potential cancer risks and health hazards caused by the accumulated radiation exposure in medical imaging, reducing radiation dose in X-ray based medical imaging such as Computed Tomography Perfusion (CTP) has raised significant research interests. In this paper, we embrace the deep Convolutional Neural Networks (CNN) based approaches and introduce Smoothed Dense-Convolution Neural Network (SDCNet) to recover high-dose quality CTP images from low-dose ones. SDCNet is composed of sub-network blocks cascaded by skip-connections to infer the noise (differentials) from paired low/high-dose CT scans. SDCNet can effectively remove the noise in real low-dose CT scans and enhance the quality of medical images. We evaluate the proposed architecture on thousands of CT perfusion frames for both reconstructed image denoising and perfusion map quantification including cerebral blood flow (CBF) and cerebral blood volume (CBV). SDCNet achieves high performance in both visual and quantitative results with promising computational efficiency, comparing favorably with state-of-the-art approaches. The code is available at https://github.com/cswin/RC-Nets. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08364v1 |
https://arxiv.org/pdf/1910.08364v1.pdf | |
PWC | https://paperswithcode.com/paper/sdcnet-smoothed-dense-convolution-network-for |
Repo | |
Framework | |
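The "infer the noise (differentials)" idea lends itself to a short residual-learning sketch: a network predicts the noise in a low-dose frame and the restored frame is the input minus that prediction, supervised by the difference between paired low- and high-dose scans. The tiny convolutional stack below is a placeholder, not the actual SDCNet architecture:

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Placeholder residual denoiser: predict the noise, subtract it from the input."""

    def __init__(self, channels=1, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, low_dose):
        noise = self.body(low_dose)        # estimated noise ("differential")
        return low_dose - noise, noise     # restored frame and the noise estimate

# Training target is the difference between paired low- and high-dose frames.
model = ResidualDenoiser()
low, high = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
restored, noise = model(low)
loss = nn.functional.mse_loss(noise, low - high)
loss.backward()
```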
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Title | Learning Adversarial MDPs with Bandit Feedback and Unknown Transition |
Authors | Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu |
Abstract | We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $\mathcal{\tilde{O}}(LX\sqrt{AT})$ regret with high probability, where $L$ is the horizon, $X$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. To the best of our knowledge, our algorithm is the first to ensure $\mathcal{\tilde{O}}(\sqrt{T})$ regret in this challenging setting; in fact it achieves the same regret bound as (Rosenberg & Mansour, 2019a) that considers an easier setting with full-information feedback. Our key technical contributions are two-fold: a tighter confidence set for the transition function, and an optimistic loss estimator that is inversely weighted by an $\textit{upper occupancy bound}$. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01192v3 |
https://arxiv.org/pdf/1912.01192v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-adversarial-mdps-with-bandit |
Repo | |
Framework | |
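For orientation, the "optimistic loss estimator that is inversely weighted by an upper occupancy bound" can be written as below; this is our reconstruction from the abstract, with the implicit-exploration offset $\gamma$ assumed rather than quoted:

```latex
\[
  \hat{\ell}_t(x,a) \;=\; \frac{\ell_t(x,a)}{u_t(x,a) + \gamma}\;
  \mathbb{1}\bigl\{ (x,a) \text{ is visited in episode } t \bigr\},
\]
% u_t(x,a): upper occupancy bound, i.e. the largest probability of visiting
% (x,a) under the current policy over all transitions in the confidence set;
% \gamma > 0: an offset (assumed here) that biases the estimator optimistically.
```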
Gaussian Processes for Analyzing Positioned Trajectories in Sports
Title | Gaussian Processes for Analyzing Positioned Trajectories in Sports |
Authors | Yuxin Zhao, Feng Yin, Fredrik Gunnarsson, Fredrik Hultkrantz |
Abstract | Kernel-based machine learning approaches have been gaining increasing interest for exploring and modeling large datasets in recent years. The Gaussian process (GP) is one example of such kernel-based approaches, which can provide very good performance for nonlinear modeling problems. In this work, we first propose a grey-box modeling approach to analyze the forces in cross country skiing races. To be more precise, a disciplined set of kinetic motion model formulae is combined with a data-driven Gaussian process regression model, which accounts for everything unknown in the system. Then, a modeling approach is proposed to analyze the kinetic flow of both individual and clusters of skiers. The proposed approaches can be generally applied to use cases where positioned trajectories and kinetic measurements are available. The proposed approaches are evaluated using data collected from the Falun Nordic World Ski Championships 2015, in particular the Men’s cross country $4\times10$ km relay. Forces during the cross country skiing races are analyzed and compared. Velocity models for skiers at different competition stages are also evaluated. Finally, comparisons between the grey-box and black-box approaches are carried out, where the grey-box approach can reduce the predictive uncertainty by 30% to 40%. |
Tasks | Gaussian Processes |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03043v1 |
https://arxiv.org/pdf/1907.03043v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-processes-for-analyzing-positioned |
Repo | |
Framework | |
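The grey-box idea (a physics term plus a GP for "everything unknown in the system") can be sketched with scikit-learn: fit a GP to the residuals between observations and a kinetic model, then add the GP correction back at prediction time. The constant-speed kinetic_model, the kernel and the synthetic data below are illustrative assumptions, not the paper's formulae or data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def kinetic_model(t):
    """Stand-in physics term: constant-speed position (metres) at time t (seconds)."""
    return 4.0 * t

# Synthetic observations = physics + a slow unmodeled effect + noise.
rng = np.random.default_rng(0)
t_train = np.linspace(0, 60, 40).reshape(-1, 1)
pos_obs = kinetic_model(t_train).ravel() + 5 * np.sin(t_train.ravel() / 7) + rng.normal(0, 1, 40)

# Grey-box: the GP models only what the physics term cannot explain.
residuals = pos_obs - kinetic_model(t_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(), normalize_y=True)
gp.fit(t_train, residuals)

t_test = np.array([[65.0]])
mean, std = gp.predict(t_test, return_std=True)
prediction = kinetic_model(t_test).ravel() + mean   # physics + learned correction
print(prediction, std)                              # std quantifies predictive uncertainty
```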
PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement
Title | PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement |
Authors | Jesus Zarzar, Silvio Giancola, Bernard Ghanem |
Abstract | In autonomous driving pipelines, perception modules provide a visual understanding of the surrounding road scene. Among the perception tasks, vehicle detection is of paramount importance for safe driving as it identifies the position of other agents sharing the road. In our work, we propose PointRGCN: a graph-based 3D object detection pipeline based on graph convolutional networks (GCNs) which operates exclusively on 3D LiDAR point clouds. To perform more accurate 3D object detection, we leverage a graph representation that performs proposal feature and context aggregation. We integrate residual GCNs in a two-stage 3D object detection pipeline, where 3D object proposals are refined using a novel graph representation. In particular, R-GCN is a residual GCN that classifies and regresses 3D proposals, and C-GCN is a contextual GCN that further refines proposals by sharing contextual information between multiple proposals. We integrate our refinement modules into a novel 3D detection pipeline, PointRGCN, and achieve state-of-the-art performance on the easy difficulty of the bird's eye view detection task. |
Tasks | 3D Object Detection, Autonomous Driving, Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12236v1 |
https://arxiv.org/pdf/1911.12236v1.pdf | |
PWC | https://paperswithcode.com/paper/pointrgcn-graph-convolution-networks-for-3d |
Repo | |
Framework | |
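A residual graph-convolution block of the kind R-GCN and C-GCN are built from can be sketched in a few lines: aggregate neighbor features through the adjacency matrix, transform them, and add a skip connection. The feature size, mean aggregation and toy graph are our assumptions, not the paper's layers:

```python
import torch
import torch.nn as nn

class ResidualGraphConv(nn.Module):
    """Mean-aggregation graph convolution with a residual (skip) connection."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, adj):
        # x: (num_nodes, dim) proposal features; adj: (num_nodes, num_nodes) 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        aggregated = (adj @ x) / deg                  # mean over graph neighbors
        return x + self.act(self.linear(aggregated))  # residual connection

# Toy usage: 5 proposal nodes with 16-dimensional features on a random graph.
x = torch.rand(5, 16)
adj = (torch.rand(5, 5) > 0.5).float()
layer = ResidualGraphConv(16)
print(layer(x, adj).shape)   # torch.Size([5, 16])
```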
Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification
Title | Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification |
Authors | Dino Ienco, Roberto Interdonato, Raffaele Gaetano |
Abstract | Recurrent Neural Networks (RNNs) can be seriously impacted by the initial parameter assignment, which may result in poor generalization performance on new unseen data. With the objective of tackling this crucial issue in the context of RNN-based classification, we propose a new supervised layer-wise pretraining strategy to initialize network parameters. The proposed approach leverages a data-aware strategy that sets up a taxonomy of classification problems automatically derived from the model behavior. To the best of our knowledge, despite the great interest in RNN-based classification, this is the first data-aware strategy dealing with the initialization of such models. The proposed strategy has been tested on four benchmarks coming from two different domains, i.e., Speech Recognition and Remote Sensing. Results underline the significance of our approach and point out that data-aware strategies positively support the initialization of Recurrent Neural Network based classification models. |
Tasks | Speech Recognition |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01071v1 |
https://arxiv.org/pdf/1911.01071v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-level-wise-pretraining-for |
Repo | |
Framework | |
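A stripped-down sketch of supervised layer-wise pretraining for an RNN classifier follows: train a one-layer GRU with a classification head, reuse its weights when a second layer is added, and continue training. The paper's data-aware taxonomy of auxiliary classification problems is omitted here and plain target supervision is used instead, so this is only a simplified assumption of the general recipe:

```python
import torch
import torch.nn as nn

def pretrain_level_wise(x, y, input_dim, hidden, n_classes, levels=2, epochs=50):
    """Train a GRU classifier one layer at a time, reusing lower-layer weights."""
    prev_state = None
    for depth in range(1, levels + 1):
        rnn = nn.GRU(input_dim, hidden, num_layers=depth, batch_first=True)
        if prev_state is not None:
            rnn.load_state_dict(prev_state, strict=False)   # keep already-trained layers
        head = nn.Linear(hidden, n_classes)
        opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)
        for _ in range(epochs):
            _, h = rnn(x)                                    # h: (depth, batch, hidden)
            loss = nn.functional.cross_entropy(head(h[-1]), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        prev_state = rnn.state_dict()
    return rnn, head

# Toy usage: 32 sequences of length 20 with 8 features, 3 classes.
x, y = torch.rand(32, 20, 8), torch.randint(0, 3, (32,))
rnn, head = pretrain_level_wise(x, y, input_dim=8, hidden=16, n_classes=3)
```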