Paper Group ANR 762
Learning and T-Norms Theory. Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery. Efficient Object Detection Model for Real-Time UAV Applications. Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core. Scene Graph based Image Retrieval – A case study on the CLEVR Dataset. ROI Regularization for Semi-supervised and Supervised Learning. Scaling Matters in Deep Structured-Prediction Models. SecureBoost: A Lossless Federated Learning Framework. Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates. A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus. SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition. Gaussian Processes for Analyzing Positioned Trajectories in Sports. PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement. Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification.
Learning and T-Norms Theory
Title | Learning and T-Norms Theory |
Authors | Francesco Giannini, Giuseppe Marra, Michelangelo Diligenti, Marco Maggini, Marco Gori |
Abstract | Neuro-symbolic approaches have recently gained popularity to inject prior knowledge into a learner without requiring it to induce this knowledge from data. These approaches can potentially learn competitive solutions with a significant reduction of the amount of supervised data. A large class of neuro-symbolic approaches is based on First-Order Logic to represent prior knowledge, relaxed to a differentiable form using fuzzy logic. This paper shows that the loss function expressing these neuro-symbolic learning tasks can be unambiguously determined given the selection of a t-norm generator. When restricted to supervised learning, the presented theoretical apparatus provides a clean justification for the popular cross-entropy loss, which has been shown to provide faster convergence and to reduce the vanishing gradient problem in very deep structures. However, the proposed learning formulation extends the advantages of the cross-entropy loss to the general knowledge that can be represented by a neuro-symbolic method. Therefore, the methodology allows the development of a novel class of loss functions, which are shown in the experimental results to lead to faster convergence rates than the approaches previously proposed in the literature. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11468v2 |
https://arxiv.org/pdf/1907.11468v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-t-norms-theory |
Repo | |
Framework | |
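The abstract's central claim is that fixing a t-norm generator fixes the loss. As a hedged illustration (our notation, not the paper's: f is the network output and y_i the target class of example x_i), the product t-norm has additive generator g(x) = -log x, and the generator-based loss for a set of supervised atoms reduces to the familiar cross-entropy:

```latex
% Sketch under the assumption of the product t-norm generator g(x) = -\log x:
\[
  L(f) \;=\; \sum_{i} g\!\left( f_{y_i}(x_i) \right)
       \;=\; -\sum_{i} \log f_{y_i}(x_i),
\]
% i.e. the cross-entropy between the one-hot targets and the network outputs,
% which is the supervised special case discussed in the abstract.
```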
Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery
Title | Mobile Recognition of Wikipedia Featured Sites using Deep Learning and Crowd-sourced Imagery |
Authors | Jimin Tan, Anastasios Noulas, Diego Sáez, Rossano Schifanella |
Abstract | Rendering Wikipedia content through mobile and augmented reality mediums can enable new forms of interaction in urban-focused user communities facilitating learning, communication and knowledge exchange. With this objective in mind, in this work we develop a mobile application that allows for the recognition of notable sites featured on Wikipedia. The application is powered by a deep neural network that has been trained on crowd-sourced imagery describing sites of interest, such as buildings, statues, museums or other physical entities that are present and visually accessible in an urban environment. We describe an end-to-end pipeline covering data collection, model training and evaluation of our application, considering online and real-world scenarios. We identify a number of challenges in the site recognition task which arise due to visual similarities amongst the classified sites as well as due to noise introduced by the surrounding built environment. We demonstrate how using mobile contextual information, such as user location, orientation and attention patterns, can significantly alleviate such challenges. Moreover, we present an unsupervised learning technique to de-noise crowd-sourced imagery, which improves classification performance further. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09705v2 |
https://arxiv.org/pdf/1910.09705v2.pdf | |
PWC | https://paperswithcode.com/paper/notable-site-recognition-on-mobile-devices |
Repo | |
Framework | |
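The abstract reports that mobile context (user location, orientation, attention) alleviates confusion between visually similar sites. A minimal sketch of one such use, assuming hypothetical site coordinates and a 300 m search radius (neither is from the paper), is to discard candidate sites far from the user's GPS fix and renormalize the classifier scores:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rerank_by_location(scores, site_coords, user_lat, user_lon, radius_km=0.3):
    """Drop sites farther than radius_km from the user and renormalize the scores.

    scores:      dict mapping site name -> softmax probability from the CNN.
    site_coords: dict mapping site name -> (lat, lon).
    """
    nearby = {
        name: p for name, p in scores.items()
        if haversine_km(user_lat, user_lon, *site_coords[name]) <= radius_km
    }
    total = sum(nearby.values()) or 1.0
    return {name: p / total for name, p in nearby.items()}

# Example: two visually plausible candidates, only one of which is nearby.
scores = {"Flatiron Building": 0.55, "Tower Bridge": 0.45}
coords = {"Flatiron Building": (40.7411, -73.9897), "Tower Bridge": (51.5055, -0.0754)}
print(rerank_by_location(scores, coords, user_lat=40.7410, user_lon=-73.9900))
```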
Efficient Object Detection Model for Real-Time UAV Applications
Title | Efficient Object Detection Model for Real-Time UAV Applications |
Authors | Subrahmanyam Vaddi, Chandan Kumar, Ali Jannesari |
Abstract | Unmanned Aerial Vehicles (UAVs), especially drones equipped with vision techniques, have become very popular in recent years, with their extensive use in a wide range of applications. Many of these applications require computer vision techniques, particularly object detection from the information captured by the on-board camera. In this paper, we propose an end-to-end object detection model running on a UAV platform which is suitable for real-time applications. We propose a deep feature pyramid architecture which makes use of inherent properties of features extracted from Convolutional Networks by capturing more generic features in the images (such as edges and color) along with the minute detailed features specific to the classes contained in our problem. We use the VisDrone-18 dataset for our studies, which contains different objects such as pedestrians, vehicles and bicycles. We describe the software and hardware architecture of the platform used in this study. We implemented our model with both ResNet and MobileNet as convolutional bases. Our model, combined with a modified focal loss function, produced a desirable performance of 30.6 mAP for object detection with an inference speed of 14 fps. We compared our results with RetinaNet-ResNet-50 and HAL-RetinaNet and showed that our model combined with MobileNet as the backend feature extractor gave the best results in terms of accuracy, speed and memory efficiency, and is best suited for real-time object detection with drones. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1906.00786v1 |
https://arxiv.org/pdf/1906.00786v1.pdf | |
PWC | https://paperswithcode.com/paper/190600786 |
Repo | |
Framework | |
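The abstract couples the detector with a modified focal loss but does not spell the modification out here, so the sketch below only restates the standard focal loss of Lin et al. in NumPy as a point of reference; the alpha/gamma defaults are the usual ones, not the paper's settings:

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Standard binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs:   predicted foreground probabilities, shape (N,).
    targets: binary ground-truth labels, shape (N,).
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

# Easy, confident predictions are down-weighted relative to hard ones:
print(focal_loss(np.array([0.9, 0.1, 0.2]), np.array([1, 0, 1])))
```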
Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core
Title | Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core |
Authors | Wadim Kehl, Federico Tombari, Slobodan Ilic, Nassir Navab |
Abstract | We present a novel method to track 3D models in color and depth data. To this end, we introduce approximations that accelerate the state-of-the-art in region-based tracking by an order of magnitude while retaining similar accuracy. Furthermore, we show how the method can be made more robust in the presence of depth data and consequently formulate a new joint contour and ICP tracking energy. We present better results than the state-of-the-art while being much faster than most other methods and achieving all of the above on a single CPU core. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10249v1 |
https://arxiv.org/pdf/1911.10249v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-3d-model-tracking-in-color-and-1 |
Repo | |
Framework | |
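The joint contour and ICP energy itself is not given in the abstract; purely for orientation, a commonly used form of such a combination (our assumption, not the paper's exact formulation) is a weighted sum of a region-based contour term and a point-to-plane ICP term over the depth data:

```latex
\[
  E(\xi) \;=\; E_{\mathrm{contour}}(\xi) \;+\; \lambda\, E_{\mathrm{ICP}}(\xi),
  \qquad
  E_{\mathrm{ICP}}(\xi) \;=\; \sum_{i} \bigl( n_i^{\top} \bigl( T(\xi)\, p_i - q_i \bigr) \bigr)^{2},
\]
% T(\xi): rigid transform for pose \xi; p_i: model points; q_i: closest depth
% points; n_i: corresponding normals; \lambda: relative weight of the two terms.
```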
Scene Graph based Image Retrieval – A case study on the CLEVR Dataset
Title | Scene Graph based Image Retrieval – A case study on the CLEVR Dataset |
Authors | Sahana Ramnath, Amrita Saha, Soumen Chakrabarti, Mitesh M. Khapra |
Abstract | With the proliferation of multimodal interaction in various domains, there has recently been much interest in text-based image retrieval in the computer vision community. However, most state-of-the-art techniques model this problem in a purely neural way, which makes it difficult to incorporate pragmatic strategies when searching a large-scale catalog, especially when the search requirements are insufficient and the model needs to resort to an interactive retrieval process through multiple iterations of question answering. Motivated by this, we propose a neural-symbolic approach for one-shot retrieval of images from a large-scale catalog, given the caption description. To facilitate this, we represent the catalog and caption as scene graphs and model the retrieval task as a learnable graph matching problem, trained end-to-end with a REINFORCE algorithm. Further, we briefly describe an extension of this pipeline to an iterative retrieval framework, based on interactive questioning and answering. |
Tasks | Graph Matching, Image Retrieval, Question Answering |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00850v1 |
https://arxiv.org/pdf/1911.00850v1.pdf | |
PWC | https://paperswithcode.com/paper/scene-graph-based-image-retrieval-a-case |
Repo | |
Framework | |
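The retrieval policy is trained end-to-end with REINFORCE. The snippet below is a generic score-function surrogate loss for sampling one catalog image per caption; the reward definition, the scoring network behind the logits, and the baseline are placeholder assumptions rather than the paper's implementation:

```python
import torch

def reinforce_loss(logits, reward, baseline=0.0):
    """Score-function (REINFORCE) surrogate loss for one retrieval decision.

    logits:   unnormalized matching scores over catalog images, shape (N,).
    reward:   scalar reward, e.g. 1.0 if the sampled image matches the caption.
    baseline: scalar variance-reduction baseline (e.g. a running mean of rewards).
    """
    probs = torch.softmax(logits, dim=-1)
    action = torch.multinomial(probs, num_samples=1)       # sample an image index
    log_prob = torch.log_softmax(logits, dim=-1)[action]
    return -(reward - baseline) * log_prob.squeeze()

# Toy usage: three candidate images scored by some graph-matching network.
logits = torch.randn(3, requires_grad=True)
loss = reinforce_loss(logits, reward=1.0)
loss.backward()            # gradients flow back into the scoring network
print(loss.item())
```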
ROI Regularization for Semi-supervised and Supervised Learning
Title | ROI Regularization for Semi-supervised and Supervised Learning |
Authors | Hiroshi Kaizuka, Yasuhiro Nagasaki, Ryo Sako |
Abstract | We propose ROI regularization (ROIreg) as a semi-supervised learning method for image classification. ROIreg focuses on the maximum probability of the posterior probability distribution g(x) obtained when an unlabeled data sample x is fed into a convolutional neural network (CNN). ROIreg divides the pixel set of x into multiple blocks and evaluates, for each block, its contribution to the maximum probability. A masked data sample x_ROI is generated by replacing blocks with relatively small contributions with random images. Then, ROIreg trains the CNN so that g(x_ROI) changes as little as possible from g(x). Therefore, ROIreg can be said to further refine the classification ability of the CNN. On the other hand, Virtual Adversarial Training (VAT), which is an excellent semi-supervised learning method, generates a data sample x_VAT by perturbing x in the direction in which g(x) changes most. Then, VAT trains the CNN so that g(x_VAT) changes as little as possible from g(x). Therefore, VAT can be said to be a method that remedies the CNN's weaknesses. Thus, ROIreg and VAT have complementary training effects. In fact, the combination of VAT and ROIreg improves the results obtained when using VAT or ROIreg alone. This combination also improves the state-of-the-art on “SVHN with and without data augmentation” and “CIFAR-10 without data augmentation”. We also propose ROI augmentation (ROIaug) as a method to apply ROIreg to data augmentation in supervised learning. However, the evaluation function used there is different from the standard cross-entropy. ROIaug improves the performance of supervised learning for both SVHN and CIFAR-10. Finally, we investigate the performance degradation of VAT and VAT+ROIreg when data samples not belonging to the classification classes are included in the unlabeled data. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.08615v1 |
https://arxiv.org/pdf/1905.08615v1.pdf | |
PWC | https://paperswithcode.com/paper/190508615 |
Repo | |
Framework | |
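The masking procedure in the abstract is concrete enough to sketch. In the hedged sketch below, each block's contribution is estimated by occlusion (mask the block, measure the drop in the maximum class probability), the least important blocks are replaced by random pixels, and a KL term keeps g(x_ROI) close to g(x); the occlusion estimate, block size, keep ratio and KL choice are our assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def roireg_masked_input(model, x, block=8, keep_ratio=0.5):
    """Build x_ROI for one image x of shape (C, H, W) under the assumptions above."""
    with torch.no_grad():
        p_max = F.softmax(model(x.unsqueeze(0)), dim=1).max()
        c, h, w = x.shape
        drops, coords = [], []
        for i in range(0, h, block):
            for j in range(0, w, block):
                occluded = x.clone()
                occluded[:, i:i + block, j:j + block] = torch.rand(c, min(block, h - i), min(block, w - j))
                p = F.softmax(model(occluded.unsqueeze(0)), dim=1).max()
                drops.append(p_max - p)                    # large drop = important block
                coords.append((i, j))
        order = torch.argsort(torch.stack(drops))          # least important first
        x_roi = x.clone()
        n_replace = int(len(coords) * (1.0 - keep_ratio))
        for k in order[:n_replace]:
            i, j = coords[int(k)]
            x_roi[:, i:i + block, j:j + block] = torch.rand(c, min(block, h - i), min(block, w - j))
    return x_roi

def roireg_loss(model, x):
    """Consistency term: keep g(x_ROI) close to g(x); KL divergence is a stand-in."""
    x_roi = roireg_masked_input(model, x)
    p = F.softmax(model(x.unsqueeze(0)), dim=1).detach()
    log_q = F.log_softmax(model(x_roi.unsqueeze(0)), dim=1)
    return F.kl_div(log_q, p, reduction="batchmean")
```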
Scaling Matters in Deep Structured-Prediction Models
Title | Scaling Matters in Deep Structured-Prediction Models |
Authors | Aleksandr Shevchenko, Anton Osokin |
Abstract | Deep structured-prediction energy-based models combine the expressive power of learned representations and the ability to embed knowledge about the task at hand into the system. A common way to learn the parameters of such models is a multistage procedure where different combinations of components are trained at different stages. The joint end-to-end training of the whole system is then done as the last fine-tuning stage. This multistage approach is time-consuming and cumbersome as it requires multiple runs until convergence and multiple rounds of hyperparameter tuning. From this point of view, it is beneficial to start the joint training procedure from the beginning. However, such approaches often unexpectedly fail and deliver results worse than the multistage ones. In this paper, we hypothesize that one reason for joint training of deep energy-based models to fail is the incorrect relative normalization of different components in the energy function. We propose online and offline scaling algorithms that fix the joint training and demonstrate their efficacy on three different tasks. |
Tasks | Structured Prediction |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11088v1 |
http://arxiv.org/pdf/1902.11088v1.pdf | |
PWC | https://paperswithcode.com/paper/scaling-matters-in-deep-structured-prediction |
Repo | |
Framework | |
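One way to read "online scaling" of energy components, sketched here purely as an assumption since the abstract does not describe the algorithms, is to keep a running estimate of each component's magnitude and divide by it so that all terms enter the joint energy on a comparable scale:

```python
class OnlineEnergyScaler:
    """Hedged sketch: exponential-moving-average rescaling of energy components.

    The EMA rule, momentum and epsilon are our assumptions, not the paper's
    online/offline scaling algorithms.
    """

    def __init__(self, n_terms, momentum=0.99, eps=1e-8):
        self.scale = [1.0] * n_terms
        self.momentum = momentum
        self.eps = eps

    def __call__(self, terms):
        """terms: list of scalar energy components for the current batch."""
        total = 0.0
        for k, value in enumerate(terms):
            self.scale[k] = self.momentum * self.scale[k] + (1 - self.momentum) * abs(value)
            total += value / (self.scale[k] + self.eps)
        return total

# Two components with very different magnitudes end up on a comparable scale.
scaler = OnlineEnergyScaler(n_terms=2)
print(scaler([1200.0, 0.03]))
```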
SecureBoost: A Lossless Federated Learning Framework
Title | SecureBoost: A Lossless Federated Learning Framework |
Authors | Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Qiang Yang |
Abstract | The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks with data sharing that do not violate user privacy. To meet this goal, in this paper, we propose a novel lossless privacy-preserving tree-boosting system known as SecureBoost in the setting of federated learning. This federated-learning system allows a learning process to be jointly conducted over multiple parties with partially common user samples but different feature sets, which corresponds to a vertically partitioned virtual data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about any individual private data provider. We theoretically prove that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that bring the data into one place. In addition, along with a proof of security, we discuss what would be required to make the protocols completely secure. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08755v1 |
http://arxiv.org/pdf/1901.08755v1.pdf | |
PWC | https://paperswithcode.com/paper/secureboost-a-lossless-federated-learning |
Repo | |
Framework | |
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates
Title | Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates |
Authors | Javier Iranzo-Sánchez, Joan Albert Silvestre-Cerdà, Javier Jorge, Nahuel Roselló, Adrià Giménez, Albert Sanchis, Jorge Civera, Alfons Juan |
Abstract | Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable. |
Tasks | Machine Translation, Speech Recognition |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03167v3 |
https://arxiv.org/pdf/1911.03167v3.pdf | |
PWC | https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech |
Repo | |
Framework | |
A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus
Title | A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus |
Authors | Adam Rambousek, Ales Horak, Vit Suchomel, Vit Baisa |
Abstract | This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus, i.e. a multilingual terminology dictionary enriched with relationships between the terms themselves to form a thesaurus. The system allows current terminology expert groups, where most of the editing decisions still come from introspection, to radically enhance their workflow. The presented system supplements the lexicographic process with natural language processing techniques, which are seamlessly integrated into the thesaurus editing environment. The system’s methodology and the resulting thesaurus are closely connected to new domain corpora in the six languages involved. They are used for term usage examples as well as for the automatic extraction of new candidate terms. The terminological thesaurus is now accessible via a web-based application, which a) presents rich detailed information on each term, b) visualizes term relations, and c) displays real-life usage examples of the term in domain-related documents and in contextually similar terms. Furthermore, the specialized corpora are used to detect candidate translations of terms from the central language (Czech) to the other languages (English, French, German, Russian and Slovak) as well as to detect broader Czech terms, which help to place new terms in the actual thesaurus hierarchy. This project has been realized as a terminological thesaurus of land surveying, but the presented tools and methodology are reusable for other terminology domains. |
Tasks | |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.10921v2 |
http://arxiv.org/pdf/1903.10921v2.pdf | |
PWC | https://paperswithcode.com/paper/a-new-approach-for-semi-automatic-building |
Repo | |
Framework | |
SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion
Title | SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion |
Authors | Peng Liu, Ruogu Fang |
Abstract | With substantial public concerns on potential cancer risks and health hazards caused by the accumulated radiation exposure in medical imaging, reducing radiation dose in X-ray based medical imaging such as Computed Tomography Perfusion (CTP) has raised significant research interests. In this paper, we embrace the deep Convolutional Neural Networks (CNN) based approaches and introduce Smoothed Dense-Convolution Neural Network (SDCNet) to recover high-dose quality CTP images from low-dose ones. SDCNet is composed of sub-network blocks cascaded by skip-connections to infer the noise (differentials) from paired low/high-dose CT scans. SDCNet can effectively remove the noise in real low-dose CT scans and enhance the quality of medical images. We evaluate the proposed architecture on thousands of CT perfusion frames for both reconstructed image denoising and perfusion map quantification including cerebral blood flow (CBF) and cerebral blood volume (CBV). SDCNet achieves high performance in both visual and quantitative results with promising computational efficiency, comparing favorably with state-of-the-art approaches. The code is available at https://github.com/cswin/RC-Nets. |
Tasks | Denoising, Image Denoising |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08364v1 |
https://arxiv.org/pdf/1910.08364v1.pdf | |
PWC | https://paperswithcode.com/paper/sdcnet-smoothed-dense-convolution-network-for |
Repo | |
Framework | |
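The "infer the noise (differentials)" idea lends itself to a short residual-learning sketch: a network predicts the noise in a low-dose frame and the restored frame is the input minus that prediction, supervised by the difference between paired low- and high-dose scans. The tiny convolutional stack below is a placeholder, not the actual SDCNet architecture:

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Placeholder residual denoiser: predict the noise, subtract it from the input."""

    def __init__(self, channels=1, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, low_dose):
        noise = self.body(low_dose)        # estimated noise ("differential")
        return low_dose - noise, noise     # restored frame and the noise estimate

# Training target is the difference between paired low- and high-dose frames.
model = ResidualDenoiser()
low, high = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
restored, noise = model(low)
loss = nn.functional.mse_loss(noise, low - high)
loss.backward()
```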
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Title | Learning Adversarial MDPs with Bandit Feedback and Unknown Transition |
Authors | Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu |
Abstract | We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $\mathcal{\tilde{O}}(LX\sqrt{AT})$ regret with high probability, where $L$ is the horizon, $X$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. To the best of our knowledge, our algorithm is the first to ensure $\mathcal{\tilde{O}}(\sqrt{T})$ regret in this challenging setting; in fact it achieves the same regret bound as (Rosenberg & Mansour, 2019a) that considers an easier setting with full-information feedback. Our key technical contributions are two-fold: a tighter confidence set for the transition function, and an optimistic loss estimator that is inversely weighted by an $\textit{upper occupancy bound}$. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01192v3 |
https://arxiv.org/pdf/1912.01192v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-adversarial-mdps-with-bandit |
Repo | |
Framework | |
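For orientation, the "optimistic loss estimator that is inversely weighted by an upper occupancy bound" can be written as below; this is our reconstruction from the abstract, with the implicit-exploration offset $\gamma$ assumed rather than quoted:

```latex
\[
  \hat{\ell}_t(x,a) \;=\; \frac{\ell_t(x,a)}{u_t(x,a) + \gamma}\;
  \mathbb{1}\bigl\{ (x,a) \text{ is visited in episode } t \bigr\},
\]
% u_t(x,a): upper occupancy bound, i.e. the largest probability of visiting
% (x,a) under the current policy over all transitions in the confidence set;
% \gamma > 0: an offset (assumed here) that biases the estimator optimistically.
```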
Gaussian Processes for Analyzing Positioned Trajectories in Sports
Title | Gaussian Processes for Analyzing Positioned Trajectories in Sports |
Authors | Yuxin Zhao, Feng Yin, Fredrik Gunnarsson, Fredrik Hultkrantz |
Abstract | Kernel-based machine learning approaches have been gaining increasing interest for exploring and modeling large datasets in recent years. The Gaussian process (GP) is one example of such kernel-based approaches, which can provide very good performance for nonlinear modeling problems. In this work, we first propose a grey-box modeling approach to analyze the forces in cross country skiing races. To be more precise, a disciplined set of kinetic motion model formulae is combined with a data-driven Gaussian process regression model, which accounts for everything unknown in the system. Then, a modeling approach is proposed to analyze the kinetic flow of both individual and clusters of skiers. The proposed approaches can be generally applied to use cases where positioned trajectories and kinetic measurements are available. The proposed approaches are evaluated using data collected from the Falun Nordic World Ski Championships 2015, in particular the Men’s cross country $4\times10$ km relay. Forces during the cross country skiing races are analyzed and compared. Velocity models for skiers at different competition stages are also evaluated. Finally, comparisons between the grey-box and black-box approaches are carried out, where the grey-box approach can reduce the predictive uncertainty by 30% to 40%. |
Tasks | Gaussian Processes |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03043v1 |
https://arxiv.org/pdf/1907.03043v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-processes-for-analyzing-positioned |
Repo | |
Framework | |
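The grey-box idea (a physics term plus a GP for "everything unknown in the system") can be sketched with scikit-learn: fit a GP to the residuals between observations and a kinetic model, then add the GP correction back at prediction time. The constant-speed kinetic_model, the kernel and the synthetic data below are illustrative assumptions, not the paper's formulae or data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def kinetic_model(t):
    """Stand-in physics term: constant-speed position (metres) at time t (seconds)."""
    return 4.0 * t

# Synthetic observations = physics + a slow unmodeled effect + noise.
rng = np.random.default_rng(0)
t_train = np.linspace(0, 60, 40).reshape(-1, 1)
pos_obs = kinetic_model(t_train).ravel() + 5 * np.sin(t_train.ravel() / 7) + rng.normal(0, 1, 40)

# Grey-box: the GP models only what the physics term cannot explain.
residuals = pos_obs - kinetic_model(t_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(), normalize_y=True)
gp.fit(t_train, residuals)

t_test = np.array([[65.0]])
mean, std = gp.predict(t_test, return_std=True)
prediction = kinetic_model(t_test).ravel() + mean   # physics + learned correction
print(prediction, std)                              # std quantifies predictive uncertainty
```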
PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement
Title | PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement |
Authors | Jesus Zarzar, Silvio Giancola, Bernard Ghanem |
Abstract | In autonomous driving pipelines, perception modules provide a visual understanding of the surrounding road scene. Among the perception tasks, vehicle detection is of paramount importance for safe driving as it identifies the position of other agents sharing the road. In our work, we propose PointRGCN: a graph-based 3D object detection pipeline based on graph convolutional networks (GCNs) which operates exclusively on 3D LiDAR point clouds. To perform more accurate 3D object detection, we leverage a graph representation that performs proposal feature and context aggregation. We integrate residual GCNs in a two-stage 3D object detection pipeline, where 3D object proposals are refined using a novel graph representation. In particular, R-GCN is a residual GCN that classifies and regresses 3D proposals, and C-GCN is a contextual GCN that further refines proposals by sharing contextual information between multiple proposals. We integrate our refinement modules into a novel 3D detection pipeline, PointRGCN, and achieve state-of-the-art performance on the easy difficulty of the bird's eye view detection task. |
Tasks | 3D Object Detection, Autonomous Driving, Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12236v1 |
https://arxiv.org/pdf/1911.12236v1.pdf | |
PWC | https://paperswithcode.com/paper/pointrgcn-graph-convolution-networks-for-3d |
Repo | |
Framework | |
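A residual graph-convolution block of the kind R-GCN and C-GCN are built from can be sketched in a few lines: aggregate neighbor features through the adjacency matrix, transform them, and add a skip connection. The feature size, mean aggregation and toy graph are our assumptions, not the paper's layers:

```python
import torch
import torch.nn as nn

class ResidualGraphConv(nn.Module):
    """Mean-aggregation graph convolution with a residual (skip) connection."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, adj):
        # x: (num_nodes, dim) proposal features; adj: (num_nodes, num_nodes) 0/1 adjacency.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        aggregated = (adj @ x) / deg                  # mean over graph neighbors
        return x + self.act(self.linear(aggregated))  # residual connection

# Toy usage: 5 proposal nodes with 16-dimensional features on a random graph.
x = torch.rand(5, 16)
adj = (torch.rand(5, 5) > 0.5).float()
layer = ResidualGraphConv(16)
print(layer(x, adj).shape)   # torch.Size([5, 16])
```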
Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification
Title | Supervised level-wise pretraining for recurrent neural network initialization in multi-class classification |
Authors | Dino Ienco, Roberto Interdonato, Raffaele Gaetano |
Abstract | Recurrent Neural Networks (RNNs) can be seriously impacted by the initial parameter assignment, which may result in poor generalization performance on new unseen data. With the objective of tackling this crucial issue in the context of RNN-based classification, we propose a new supervised layer-wise pretraining strategy to initialize network parameters. The proposed approach leverages a data-aware strategy that sets up a taxonomy of classification problems automatically derived from the model behavior. To the best of our knowledge, despite the great interest in RNN-based classification, this is the first data-aware strategy dealing with the initialization of such models. The proposed strategy has been tested on four benchmarks coming from two different domains, i.e., Speech Recognition and Remote Sensing. Results underline the significance of our approach and point out that data-aware strategies positively support the initialization of Recurrent Neural Network based classification models. |
Tasks | Speech Recognition |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01071v1 |
https://arxiv.org/pdf/1911.01071v1.pdf | |
PWC | https://paperswithcode.com/paper/supervised-level-wise-pretraining-for |
Repo | |
Framework | |
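A stripped-down sketch of supervised layer-wise pretraining for an RNN classifier follows: train a one-layer GRU with a classification head, reuse its weights when a second layer is added, and continue training. The paper's data-aware taxonomy of auxiliary classification problems is omitted here and plain target supervision is used instead, so this is only a simplified assumption of the general recipe:

```python
import torch
import torch.nn as nn

def pretrain_level_wise(x, y, input_dim, hidden, n_classes, levels=2, epochs=50):
    """Train a GRU classifier one layer at a time, reusing lower-layer weights."""
    prev_state = None
    for depth in range(1, levels + 1):
        rnn = nn.GRU(input_dim, hidden, num_layers=depth, batch_first=True)
        if prev_state is not None:
            rnn.load_state_dict(prev_state, strict=False)   # keep already-trained layers
        head = nn.Linear(hidden, n_classes)
        opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)
        for _ in range(epochs):
            _, h = rnn(x)                                    # h: (depth, batch, hidden)
            loss = nn.functional.cross_entropy(head(h[-1]), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        prev_state = rnn.state_dict()
    return rnn, head

# Toy usage: 32 sequences of length 20 with 8 features, 3 classes.
x, y = torch.rand(32, 20, 8), torch.randint(0, 3, (32,))
rnn, head = pretrain_level_wise(x, y, input_dim=8, hidden=16, n_classes=3)
```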