Paper Group ANR 1647
Multi-scale Cross-form Pyramid Network for Stereo Matching. Just Ask: An Interactive Learning Framework for Vision and Language Navigation. A Two-Stage Stochastic Programming Model for Car-Sharing Problem using Kernel Density Estimation. Unsupervised Adversarial Graph Alignment with Graph Embedding. Demystifying the MLPerf Benchmark Suite. Amobee at …
Multi-scale Cross-form Pyramid Network for Stereo Matching
Title | Multi-scale Cross-form Pyramid Network for Stereo Matching |
Authors | Zhidong Zhu, Mingyi He, Yuchao Dai, Zhibo Rao, Bo Li |
Abstract | Stereo matching plays an indispensable part in autonomous driving, robotics and 3D scene reconstruction. We propose CFP-Net, a novel deep learning architecture: a Cross-Form Pyramid stereo matching network that regresses disparity from a rectified pair of stereo images. The network consists of three modules: a Multi-Scale 2D local feature extraction module, a Cross-form spatial pyramid module, and a Multi-Scale 3D feature matching and fusion module. The Multi-Scale 2D local feature extraction module extracts rich multi-scale features. The Cross-form spatial pyramid module aggregates context information at different scales and locations to form a cost volume, and is shown to be more effective than SPP and ASPP in ill-posed regions. The Multi-Scale 3D feature matching and fusion module regularizes the cost volume using two parallel 3D deconvolution structures with different receptive fields. Our proposed method has been evaluated on the Scene Flow and KITTI datasets and achieves state-of-the-art performance on the KITTI 2012 and 2015 benchmarks. |
Tasks | 3D Feature Matching, 3D Scene Reconstruction, Autonomous Driving, Stereo Matching, Stereo Matching Hand |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11309v3 |
https://arxiv.org/pdf/1904.11309v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-cross-form-pyramid-network-for |
Repo | |
Framework | |
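The abstract above describes aggregating multi-scale context into a cost volume that is then regularized by 3D (de)convolutions. Below is a minimal sketch of the standard concatenation-based cost-volume construction used by such stereo networks; the feature resolution, channel count, and maximum disparity are illustrative assumptions, not values from the paper.

```python
import torch

def build_cost_volume(left_feat: torch.Tensor, right_feat: torch.Tensor, max_disp: int) -> torch.Tensor:
    """Concatenation-based cost volume over candidate disparities.

    left_feat, right_feat: (B, C, H, W) feature maps from a 2D extractor
    (shapes are assumptions, not paper values).
    Returns a volume of shape (B, 2C, max_disp, H, W) for 3D regularization.
    """
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = left_feat
            volume[:, c:, d] = right_feat
        else:
            # Shift the right features by d pixels before pairing with the left ones.
            volume[:, :c, d, :, d:] = left_feat[:, :, :, d:]
            volume[:, c:, d, :, d:] = right_feat[:, :, :, :w - d]
    return volume
```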
Just Ask: An Interactive Learning Framework for Vision and Language Navigation
Title | Just Ask: An Interactive Learning Framework for Vision and Language Navigation |
Authors | Ta-Chung Chi, Mihail Eric, Seokhwan Kim, Minmin Shen, Dilek Hakkani-tur |
Abstract | In the vision and language navigation task, the agent may encounter ambiguous situations that are hard to interpret by relying on visual information and natural language instructions alone. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent, with different levels of complexity. The simplest, model-confusion-based method lets the agent ask questions based on its confusion, relying on a predefined confidence threshold of the next-action prediction model. Building on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning and discover when and where to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward-shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further by utilizing its interaction history with a human. We demonstrate that the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques. |
Tasks | Continual Learning, Data Augmentation |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00915v1 |
https://arxiv.org/pdf/1912.00915v1.pdf | |
PWC | https://paperswithcode.com/paper/just-askan-interactive-learning-framework-for |
Repo | |
Framework | |
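A minimal sketch of the confusion-based asking rule described in the abstract above: the agent asks for help only when the confidence of its next-action predictor falls below a threshold. The function name, the 0.5 threshold, and the example logits are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def should_ask_for_help(action_logits: np.ndarray, threshold: float = 0.5) -> bool:
    """Return True when the next-action distribution is too flat to act confidently.

    action_logits: unnormalized scores over candidate actions from the navigation
    policy (a hypothetical predictor, not the paper's exact model).
    """
    probs = np.exp(action_logits - action_logits.max())
    probs /= probs.sum()
    return float(probs.max()) < threshold

# Example: a nearly uniform distribution over four actions triggers a question.
print(should_ask_for_help(np.array([0.1, 0.0, 0.05, 0.08])))  # True
print(should_ask_for_help(np.array([4.0, 0.0, 0.1, 0.2])))    # False
```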
A Two-Stage Stochastic Programming Model for Car-Sharing Problem using Kernel Density Estimation
Title | A Two-Stage Stochastic Programming Model for Car-Sharing Problem using Kernel Density Estimation |
Authors | Xiaoming Li, Chun Wang, Xiao Huang |
Abstract | The car-sharing problem is a popular research topic in the sharing economy. In this paper, we investigate the car-sharing re-balancing problem under uncertain demands. We propose an innovative framework that integrates a non-parametric approach, kernel density estimation (KDE), with a two-stage stochastic programming (SP) model. Specifically, probability distributions are derived from New York taxi trip data sets by KDE and used as the uncertain input parameters of the SP model. The car-sharing problem is then formulated as a two-stage SP model that aims to maximize the overall profit. A Monte Carlo method called sample average approximation (SAA) and a Benders decomposition algorithm are introduced to solve the large-scale optimization model. Finally, experimental validation shows that the proposed framework outperforms existing works in terms of outcomes. |
Tasks | Density Estimation |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09293v1 |
https://arxiv.org/pdf/1909.09293v1.pdf | |
PWC | https://paperswithcode.com/paper/a-two-stage-stochastic-programming-model-for |
Repo | |
Framework | |
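A minimal sketch of the first step described above: fitting a KDE to historical demand and drawing demand scenarios for a sample average approximation of the two-stage model. The station count, Poisson placeholder data, and sample sizes are illustrative assumptions; the actual SP formulation and Benders decomposition are not reproduced here.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical historical demand for 3 stations (the paper uses NYC taxi trip data).
historical_demand = rng.poisson(lam=[20, 35, 50], size=(500, 3)).astype(float)

# Fit a multivariate Gaussian KDE to the joint demand distribution.
kde = gaussian_kde(historical_demand.T)

# Draw demand scenarios for the sample average approximation (SAA).
n_scenarios = 100
scenarios = kde.resample(n_scenarios).T        # shape: (n_scenarios, 3)
scenarios = np.clip(np.round(scenarios), 0, None)

# Each scenario enters the second stage with equal probability 1 / n_scenarios.
print(scenarios[:5])
```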
Unsupervised Adversarial Graph Alignment with Graph Embedding
Title | Unsupervised Adversarial Graph Alignment with Graph Embedding |
Authors | Chaoqi Chen, Weiping Xie, Tingyang Xu, Yu Rong, Wenbing Huang, Xinghao Ding, Yue Huang, Junzhou Huang |
Abstract | Graph alignment, also known as network alignment, is a fundamental task in social network analysis. Many recent works have relied on partially labeled cross-graph node correspondences, i.e., anchor links. However, due to privacy and security issues, the manual labeling of anchor links for diverse scenarios may be prohibitive. Aligning two graphs without any anchor links is a crucial and challenging task. In this paper, we propose an Unsupervised Adversarial Graph Alignment (UAGA) framework to learn a cross-graph alignment between the embedding spaces of two different graphs in a fully unsupervised fashion (i.e., no existing anchor links and no users' personal profile or attribute information are available). The proposed framework learns the embedding space of each graph and then attempts to align the two spaces via adversarial training, followed by a refinement procedure. We further extend our UAGA method to incremental UAGA (iUAGA), which iteratively reveals unobserved user links based on pseudo anchor links. This can be used to further improve both the embedding quality and the alignment accuracy. Moreover, the proposed methods benefit real-world applications, e.g., link prediction in social networks. Comprehensive experiments on real-world data demonstrate the effectiveness of our proposed approaches UAGA and iUAGA for unsupervised graph alignment. |
Tasks | Graph Embedding, Link Prediction |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00544v1 |
https://arxiv.org/pdf/1907.00544v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-adversarial-graph-alignment-with |
Repo | |
Framework | |
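A minimal PyTorch sketch of the adversarial alignment step described above: a linear mapper projects node embeddings of graph A into the space of graph B while a discriminator tries to distinguish mapped embeddings from real ones. Dimensions, optimizers, batch sizes, and the random placeholder embeddings are illustrative assumptions; the embedding learning, refinement, and iUAGA steps are omitted.

```python
import torch
import torch.nn as nn

dim = 64
mapper = nn.Linear(dim, dim, bias=False)               # W: graph-A space -> graph-B space
discriminator = nn.Sequential(
    nn.Linear(dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

emb_a = torch.randn(1000, dim)   # node embeddings of graph A (placeholder data)
emb_b = torch.randn(1200, dim)   # node embeddings of graph B (placeholder data)

for step in range(200):
    xa = emb_a[torch.randint(0, emb_a.size(0), (256,))]
    xb = emb_b[torch.randint(0, emb_b.size(0), (256,))]

    # Discriminator: label mapped A-embeddings as 0, real B-embeddings as 1.
    d_loss = bce(discriminator(mapper(xa).detach()), torch.zeros(256, 1)) + \
             bce(discriminator(xb), torch.ones(256, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Mapper: fool the discriminator into labeling mapped embeddings as real.
    m_loss = bce(discriminator(mapper(xa)), torch.ones(256, 1))
    opt_m.zero_grad(); m_loss.backward(); opt_m.step()
```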
Demystifying the MLPerf Benchmark Suite
Title | Demystifying the MLPerf Benchmark Suite |
Authors | Snehil Verma, Qinzhe Wu, Bagus Hanindhito, Gunjan Jha, Eugene B. John, Ramesh Radhakrishnan, Lizy K. John |
Abstract | MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of machine learning applications. We present a study of its characteristics and of how the MLPerf benchmarks differ from previous deep learning benchmarks such as DAWNBench and DeepBench. We find that application benchmarks such as MLPerf (although rich in kernels) exhibit different features compared to kernel benchmarks such as DeepBench. The MLPerf benchmark suite contains a diverse set of models, which allows unveiling various bottlenecks in the system. Based on our findings, a dedicated low-latency interconnect between GPUs in multi-GPU systems is required for optimal distributed deep learning training. We also observe variation in scaling efficiency across the MLPerf models. The variation exhibited by the different models highlights the importance of smart scheduling strategies for multi-GPU training. Another observation is that CPU utilization increases with the number of GPUs used for training. Corroborating prior work, we also observe and quantify improvements made possible by compiler optimizations, mixed-precision training and the use of Tensor Cores. |
Tasks | |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09207v1 |
https://arxiv.org/pdf/1908.09207v1.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-the-mlperf-benchmark-suite |
Repo | |
Framework | |
Amobee at SemEval-2019 Tasks 5 and 6: Multiple Choice CNN Over Contextual Embedding
Title | Amobee at SemEval-2019 Tasks 5 and 6: Multiple Choice CNN Over Contextual Embedding |
Authors | Alon Rozental, Dadi Biton |
Abstract | This article describes Amobee’s participation in “HatEval: Multilingual detection of hate speech against immigrants and women in Twitter” (task 5) and “OffensEval: Identifying and Categorizing Offensive Language in Social Media” (task 6). The goal of task 5 was to detect hate speech targeted at women and immigrants. The goal of task 6 was to identify and categorize offensive language in social media and to identify the offense target. We present a novel type of convolutional neural network called “Multiple Choice CNN” (MC-CNN) that we use on top of our newly developed contextual embedding (Rozental et al., 2019). We used this architecture for both tasks, achieving 4th place out of 69 participants with an F1 score of 0.53 in task 5; in task 6 we achieved 2nd place (out of 75) in Sub-task B, the automatic categorization of offense types (our model reached places 18/2/7 out of 103/75/65 for sub-tasks A, B and C respectively). |
Tasks | |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08292v1 |
http://arxiv.org/pdf/1904.08292v1.pdf | |
PWC | https://paperswithcode.com/paper/amobee-at-semeval-2019-tasks-5-and-6-multiple |
Repo | |
Framework | |
RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving
Title | RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving |
Authors | Mohamed Ramzy, Hazem Rashed, Ahmad El Sallab, Senthil Yogamani |
Abstract | Moving Object Detection (MOD) is a critical task for autonomous vehicles, as moving objects represent a higher collision risk than static ones. The trajectory of the ego-vehicle is planned based on the future states of detected moving objects. This is quite challenging, as the ego-motion has to be modelled and compensated in order to understand the motion of the surrounding objects. In this work, we propose a real-time end-to-end CNN architecture for MOD that utilizes spatio-temporal context to improve robustness. We construct a novel time-aware architecture that exploits temporal motion information embedded within sequential images, in addition to explicit motion maps from optical flow images. We demonstrate the impact of our algorithm on the KITTI dataset, where we obtain an improvement of 8% relative to the baselines. We compare our algorithm with state-of-the-art methods and achieve competitive results on the KITTI-Motion dataset in terms of accuracy at a three times faster run-time. The proposed algorithm runs at 23 fps on a standard desktop GPU, targeting deployment on embedded platforms. |
Tasks | Autonomous Driving, Autonomous Vehicles, Object Detection, Optical Flow Estimation |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00438v1 |
https://arxiv.org/pdf/1912.00438v1.pdf | |
PWC | https://paperswithcode.com/paper/rst-modnet-real-time-spatio-temporal-moving |
Repo | |
Framework | |
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning
Title | A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning |
Authors | Filippos Gouidis, Alexandros Vassiliades, Theodore Patkos, Antonis Argyros, Nick Bassiliades, Dimitris Plexousakis |
Abstract | Object perception is a fundamental sub-field of Computer Vision, covering a multitude of individual areas and having contributed high-impact results. While Machine Learning has been traditionally applied to address related problems, recent works also seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects, their properties and their relations with their environment. In this paper, we attempt a systematic investigation of how knowledge-based methods contribute to diverse object perception tasks. We review the latest achievements and identify prominent research directions. |
Tasks | |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.11861v2 |
https://arxiv.org/pdf/1912.11861v2.pdf | |
PWC | https://paperswithcode.com/paper/a-review-on-intelligent-object-perception |
Repo | |
Framework | |
Quantum-theoretic Modeling in Computer Science: A complex Hilbert space model for entangled concepts in corpuses of documents
Title | Quantum-theoretic Modeling in Computer Science: A complex Hilbert space model for entangled concepts in corpuses of documents |
Authors | Diederik Aerts, Lester Beltran, Suzette Geriente, Sandro Sozzo |
Abstract | We work out a quantum-theoretic model in complex Hilbert space of a recently performed test on co-occurrences of two concepts and their combination in retrieval processes on specific corpuses of documents. The test violated the Clauser-Horne-Shimony-Holt version of the Bell inequalities (‘CHSH inequality’), thus indicating the presence of entanglement between the combined concepts. We make use of a recently elaborated ‘entanglement scheme’ and represent the collected data in the tensor product of the Hilbert spaces of the individual concepts, showing that the identified violation is due to the occurrence of a strong form of entanglement, involving both states and measurements and reflecting the meaning connection between the component concepts. These results provide a significant confirmation of the presence of quantum structures in corpuses of documents, as is the case for the entanglement identified in human cognition. |
Tasks | |
Published | 2019-01-05 |
URL | http://arxiv.org/abs/1901.04299v1 |
http://arxiv.org/pdf/1901.04299v1.pdf | |
PWC | https://paperswithcode.com/paper/quantum-theoretic-modeling-in-computer |
Repo | |
Framework | |
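A minimal sketch of the CHSH test mentioned in the abstract above: the statistic S = E(A,B) − E(A,B′) + E(A′,B) + E(A′,B′) is computed from four expectation values, and |S| > 2 signals a violation of the classical (local) bound. The coincidence counts below are placeholders, not the paper's data.

```python
def expectation(counts):
    """Expectation value E from coincidence counts ordered (+1,+1), (+1,-1), (-1,+1), (-1,-1)."""
    pp, pm, mp, mm = counts
    total = pp + pm + mp + mm
    return (pp - pm - mp + mm) / total

def chsh(e_ab, e_abp, e_apb, e_apbp):
    # CHSH combination; the classical bound is |S| <= 2, Tsirelson's bound is 2*sqrt(2).
    return e_ab - e_abp + e_apb + e_apbp

# Placeholder counts for the four measurement settings (not data from the paper).
e1 = expectation((40, 10, 10, 40))
e2 = expectation((10, 40, 40, 10))
e3 = expectation((40, 10, 10, 40))
e4 = expectation((40, 10, 10, 40))
s = chsh(e1, e2, e3, e4)
print(s, "violates the classical bound" if abs(s) > 2 else "no violation")
```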
Transfer Representation Learning with TSK Fuzzy System
Title | Transfer Representation Learning with TSK Fuzzy System |
Authors | Peng Xu, Zhaohong Deng, Jun Wang, Qun Zhang, Shitong Wang |
Abstract | Transfer learning can address learning tasks on unlabeled data in the target domain by leveraging plenty of labeled data from a different but related source domain. A core issue in transfer learning is to learn a shared feature space in which the distributions of the data from the two domains are matched. This learning process is referred to as transfer representation learning (TRL). Feature transformation methods are crucial to the success of TRL. The most commonly used feature transformation method in TRL is a kernel-based nonlinear mapping to a high-dimensional space followed by linear dimensionality reduction. However, kernel functions lack interpretability and are difficult to select. To this end, the TSK fuzzy system (TSK-FS) is combined with transfer learning, and a more intuitive and interpretable modeling method, called transfer representation learning with TSK-FS (TRL-TSK-FS), is proposed in this paper. Specifically, TRL-TSK-FS realizes TRL from two aspects. On the one hand, the data in the source and target domains are transformed into a fuzzy feature space in which the distribution distance of the data between the two domains is minimized. On the other hand, discriminant information and geometric properties of the data are preserved by linear discriminant analysis and principal component analysis. In addition, another advantage arises with the proposed method: the nonlinear transformation is realized by constructing a fuzzy mapping with the antecedent part of the TSK-FS instead of kernel functions, which are difficult to select. Extensive experiments are conducted on text and image datasets. The results clearly show the superiority of the proposed method. |
Tasks | Dimensionality Reduction, Representation Learning, Transfer Learning |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02703v1 |
http://arxiv.org/pdf/1901.02703v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-representation-learning-with-tsk |
Repo | |
Framework | |
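The abstract above minimizes a distribution distance between source and target data in the transformed feature space. The paper's exact distance measure is not reproduced here; the sketch below uses maximum mean discrepancy (MMD) with an RBF kernel as a commonly used stand-in, on illustrative synthetic data.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared MMD between samples X (n, d) and Y (m, d) under an RBF kernel."""
    def rbf(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    return rbf(X, X).mean() + rbf(Y, Y).mean() - 2 * rbf(X, Y).mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 5))   # hypothetical source-domain features
target = rng.normal(0.5, 1.0, size=(200, 5))   # hypothetical (shifted) target-domain features

print(rbf_mmd2(source, target))                         # clearly > 0: domains differ
print(rbf_mmd2(source, source[rng.permutation(200)]))   # ~ 0: same distribution
```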
Hallucinated Adversarial Learning for Robust Visual Tracking
Title | Hallucinated Adversarial Learning for Robust Visual Tracking |
Authors | Qiangqiang Wu, Zhihui Chen, Lin Cheng, Yan Yan, Bo Li, Hanzi Wang |
Abstract | Humans can easily learn new concepts from just a single exemplar, mainly due to their remarkable ability to imagine or hallucinate what the unseen exemplar may look like in different settings. Incorporating such an ability to hallucinate diverse new samples of the tracked instance can help trackers alleviate the over-fitting problem in the low-data tracking regime. To achieve this, we propose an effective adversarial approach, denoted the adversarial “hallucinator” (AH), for robust visual tracking. The proposed AH is designed to first learn transferable non-linear deformations between a pair of same-identity instances, and then apply these deformations to an unseen tracked instance in order to generate diverse positive training samples. By incorporating AH into an online tracking-by-detection framework, we propose the hallucinated adversarial tracker (HAT), which jointly optimizes AH with an online classifier (e.g., MDNet) in an end-to-end manner. In addition, a novel selective deformation transfer (SDT) method is presented to better select the deformations that are more suitable for transfer. Extensive experiments on three popular benchmarks demonstrate that our HAT achieves state-of-the-art performance. |
Tasks | Visual Tracking |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07008v1 |
https://arxiv.org/pdf/1906.07008v1.pdf | |
PWC | https://paperswithcode.com/paper/hallucinated-adversarial-learning-for-robust |
Repo | |
Framework | |
A Compression Objective and a Cycle Loss for Neural Image Compression
Title | A Compression Objective and a Cycle Loss for Neural Image Compression |
Authors | Caglar Aytekin, Francesco Cricri, Antti Hallapuro, Jani Lainema, Emre Aksu, Miska Hannuksela |
Abstract | In this manuscript we propose two objective terms for neural image compression: a compression objective and a cycle loss. These terms are applied to the encoder output of an autoencoder and are used in combination with reconstruction losses. The compression objective encourages sparsity and low entropy in the activations. The cycle loss term represents the distortion between encoder outputs computed from the original image and from the reconstructed image (code-domain distortion). We train different autoencoders by using the compression objective in combination with different losses: a) MSE, b) MSE and MS-SSIM, c) MSE, MS-SSIM and cycle loss. We observe that images encoded by these differently trained autoencoders fall on different points of the perception-distortion curve (while having similar bit-rates). In particular, MSE-only training favors low image-domain distortion, whereas cycle-loss training favors high perceptual quality. |
Tasks | Image Compression |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10371v1 |
https://arxiv.org/pdf/1905.10371v1.pdf | |
PWC | https://paperswithcode.com/paper/a-compression-objective-and-a-cycle-loss-for |
Repo | |
Framework | |
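A minimal PyTorch sketch of the two terms described above: a compression objective on the encoder activations (here an L1 sparsity penalty, one possible reading of "sparsity and low entropy") and a cycle loss comparing encoder outputs for the original and reconstructed images. The toy autoencoder and the loss weights are illustrative assumptions, not the paper's architecture or settings.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Placeholder autoencoder; the paper's actual architecture is not reproduced."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 8, 3, stride=2, padding=1))
        self.dec = nn.Sequential(nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, x):
        code = self.enc(x)
        return code, self.dec(code)

model = TinyAE()
x = torch.rand(4, 3, 64, 64)                      # a batch of images (placeholder data)

code, recon = model(x)
recon_code, _ = model(recon)                      # re-encode the reconstruction

mse = nn.functional.mse_loss(recon, x)            # image-domain distortion
compression = code.abs().mean()                   # sparsity-style compression objective (assumed form)
cycle = nn.functional.mse_loss(recon_code, code)  # code-domain distortion (cycle loss)

loss = mse + 0.01 * compression + 0.1 * cycle     # weights are illustrative
loss.backward()
```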
Relaxed 2-D Principal Component Analysis by $L_p$ Norm for Face Recognition
Title | Relaxed 2-D Principal Component Analysis by $L_p$ Norm for Face Recognition |
Authors | Xiao Chen, Zhi-Gang Jia, Yunfeng Cai, Mei-Xiang Zhao |
Abstract | A relaxed two-dimensional principal component analysis (R2DPCA) approach is proposed for face recognition. Different from 2DPCA, 2DPCA-$L_1$ and G2DPCA, R2DPCA utilizes the label information (if known) of the training samples to calculate a relaxation vector and assigns a weight to each subset of the training data. A new relaxed scatter matrix is defined, and the computed projection axes are able to increase the accuracy of face recognition. The optimal $L_p$-norms are selected in a reasonable range. Numerical experiments on practical face databases indicate that R2DPCA has high generalization ability and can achieve a higher recognition rate than state-of-the-art methods. |
Tasks | Face Recognition |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06458v1 |
https://arxiv.org/pdf/1905.06458v1.pdf | |
PWC | https://paperswithcode.com/paper/relaxed-2-d-principal-component-analysis-by |
Repo | |
Framework | |
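A minimal sketch of the plain 2DPCA backbone that R2DPCA builds on: an image scatter matrix is formed directly from the 2-D training images and its leading eigenvectors define the projection axes. The relaxation vector, subset weights, and $L_p$-norm selection that distinguish R2DPCA are not reproduced here; the data shapes are illustrative.

```python
import numpy as np

def two_d_pca(images, n_components):
    """Plain 2DPCA: images has shape (n, h, w); returns (w, n_components) projection axes."""
    mean_img = images.mean(axis=0)
    centered = images - mean_img
    # Image scatter matrix built from 2-D images without vectorizing them.
    scatter = sum(a.T @ a for a in centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, ::-1][:, :n_components]    # top eigenvectors

rng = np.random.default_rng(0)
faces = rng.random((100, 32, 32))                # placeholder "face" images
axes = two_d_pca(faces, n_components=8)
features = faces @ axes                          # each image projected to (32, 8)
print(features.shape)                            # (100, 32, 8)
```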
Dynamic Deep Multi-modal Fusion for Image Privacy Prediction
Title | Dynamic Deep Multi-modal Fusion for Image Privacy Prediction |
Authors | Ashwini Tonge, Cornelia Caragea |
Abstract | With millions of images shared online on social networking sites, effective methods for image privacy prediction are highly needed. In this paper, we propose an approach for fusing object, scene context, and image tag modalities derived from convolutional neural networks for accurately predicting the privacy of images shared online. Specifically, our approach identifies the set of most competent modalities on the fly, according to each new target image whose privacy has to be predicted. The approach considers three stages to predict the privacy of a target image: we first identify the neighborhood images that are visually similar and/or have similar sensitive content as the target image; we then estimate the competence of the modalities based on the neighborhood images; finally, we fuse the decisions of the most competent modalities and predict the privacy label for the target image. Experimental results show that our approach predicts sensitive (or private) content more accurately than models trained on individual modalities (object, scene, and tags) and prior privacy prediction works. Our approach also outperforms strong baselines that train meta-classifiers to obtain an optimal combination of modalities. |
Tasks | |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10796v2 |
http://arxiv.org/pdf/1902.10796v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-deep-multi-modal-fusion-for-image |
Repo | |
Framework | |
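A minimal sketch of the three-stage procedure described above: find neighborhood images for the target, estimate each modality's competence as its accuracy on that neighborhood, and fuse the decisions of the most competent modalities. The neighbor search, competence margin, and all placeholder data and predictions are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n_train, n_modalities = 500, 3                       # modalities: object, scene, tags

train_feats = rng.random((n_train, 64))              # placeholder image features
train_labels = rng.integers(0, 2, n_train)           # 1 = private, 0 = public (placeholder)
# Cached per-modality decisions on the training set (placeholder classifier outputs).
modality_train_preds = rng.integers(0, 2, (n_modalities, n_train))

nn_index = NearestNeighbors(n_neighbors=20).fit(train_feats)

def predict_privacy(target_feat, target_modality_preds):
    """target_modality_preds: the three modality decisions (0/1) for the target image."""
    # Stage 1: neighborhood of similar images.
    _, idx = nn_index.kneighbors(target_feat.reshape(1, -1))
    idx = idx[0]
    # Stage 2: competence of each modality = its accuracy on the neighborhood.
    competence = (modality_train_preds[:, idx] == train_labels[idx]).mean(axis=1)
    # Stage 3: fuse the decisions of the most competent modalities (margin is an assumption).
    chosen = np.flatnonzero(competence >= competence.max() - 0.05)
    return int(target_modality_preds[chosen].mean() >= 0.5)

print(predict_privacy(rng.random(64), np.array([1, 0, 1])))
```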
3D-BEVIS: Bird’s-Eye-View Instance Segmentation
Title | 3D-BEVIS: Bird’s-Eye-View Instance Segmentation |
Authors | Cathrin Elich, Francis Engelmann, Theodora Kontogianni, Bastian Leibe |
Abstract | Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. Much progress has been made in object classification and semantic segmentation, but the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird’s-eye view representation. |
Tasks | 3D Semantic Instance Segmentation, Instance Segmentation, Object Classification, Semantic Segmentation |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02199v3 |
https://arxiv.org/pdf/1904.02199v3.pdf | |
PWC | https://paperswithcode.com/paper/3d-bevis-birds-eye-view-instance-segmentation |
Repo | |
Framework | |
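A minimal sketch of the grouping step described above: per-point instance embeddings predicted by a network are clustered into object instances, here with mean shift applied per predicted semantic class. The embedding network itself is omitted, and the semantic labels, embedding dimensionality, and bandwidth are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
n_points = 2000

# Placeholder network outputs: a semantic label and a 2-D instance embedding per point.
semantic = rng.integers(0, 3, n_points)                    # e.g., chair / table / floor
embeddings = rng.normal(size=(n_points, 2)) + 4 * rng.integers(0, 3, n_points)[:, None]

instance_ids = np.full(n_points, -1)
next_id = 0
for cls in np.unique(semantic):
    mask = semantic == cls
    # Cluster the embedding space of this class into individual object instances.
    labels = MeanShift(bandwidth=2.0).fit_predict(embeddings[mask])
    instance_ids[mask] = labels + next_id
    next_id = instance_ids.max() + 1

print(len(np.unique(instance_ids)), "instances found")
```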