Paper Group ANR 1149
On Boosting Semantic Street Scene Segmentation with Weak Supervision. Intra-clip Aggregation for Video Person Re-identification. Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence. Multi-Scale Body-Part Mask Guided Attention for Person Re-identification. Person Re-identification with Metric Learning using Privileged Information …
On Boosting Semantic Street Scene Segmentation with Weak Supervision
Title | On Boosting Semantic Street Scene Segmentation with Weak Supervision |
Authors | Panagiotis Meletis, Gijs Dubbelman |
Abstract | Training convolutional networks for semantic segmentation requires per-pixel ground truth labels, which are very time-consuming and hence costly to obtain. Therefore, in this work, we research and develop a hierarchical deep network architecture and the corresponding loss for semantic segmentation that can be trained from weak supervision, such as bounding boxes or image-level labels, as well as from strong per-pixel supervision. We demonstrate that the hierarchical structure and the simultaneous training on strong (per-pixel) and weak (bounding box) labels, even from separate datasets, consistently improves performance over per-pixel-only training. Moreover, we explore the more challenging case of adding weak image-level labels. We collect street scene images and weak labels from the immense Open Images dataset to generate the OpenScapes dataset, and we use this novel dataset to increase segmentation performance on two established per-pixel labeled datasets, Cityscapes and Vistas. We report performance gains up to +13.2% mIoU on crucial street scene classes, and inference speed of 20 fps on a Titan V GPU for Cityscapes at 512 x 1024 resolution. Our network and OpenScapes dataset are shared with the research community. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.03462v2 |
https://arxiv.org/pdf/1903.03462v2.pdf | |
PWC | https://paperswithcode.com/paper/on-boosting-semantic-street-scene |
Repo | |
Framework | |
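The mixed strong/weak training described in the abstract can be illustrated with a small sketch. This is not the authors' released code; the tensor shapes, the box format, and the max-over-region weak loss are assumptions used here to show how per-pixel and bounding-box supervision can share one objective:

```python
import torch
import torch.nn.functional as F

def strong_loss(logits, pixel_labels):
    # Standard per-pixel cross-entropy on a strongly labeled image.
    # logits: (C, H, W), pixel_labels: (H, W) with class indices.
    return F.cross_entropy(logits.unsqueeze(0), pixel_labels.unsqueeze(0))

def weak_box_loss(logits, boxes):
    # For each bounding box (class_id, y0, y1, x0, x1), encourage the network
    # to predict that class somewhere inside the box by rewarding the maximum
    # class probability over the box region.
    probs = logits.softmax(dim=0)                      # (C, H, W)
    loss = logits.new_zeros(())
    for cls, y0, y1, x0, x1 in boxes:
        region = probs[cls, y0:y1, x0:x1]
        loss = loss - torch.log(region.max() + 1e-8)   # at least one pixel of this class
    return loss / max(len(boxes), 1)

# Both terms are ordinary differentiable losses, so strongly and weakly
# labeled images (even from separate datasets) can share one gradient step.
```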
Intra-clip Aggregation for Video Person Re-identification
Title | Intra-clip Aggregation for Video Person Re-identification |
Authors | Takashi Isobe, Jian Han, Fang Zhu, Yali Li, Shengjin Wang |
Abstract | Video-based person re-id has drawn much attention in recent years due to its prospective applications in video surveillance. Most existing methods concentrate on how to represent discriminative clip-level features. Moreover, clip-level data augmentation is also important, especially for the temporal aggregation task. Inconsistent intra-clip augmentation collapses inter-frame alignment, thus introducing additional noise. To tackle the above-mentioned problems, we design a novel framework for video-based person re-id, which consists of two main modules: Synchronized Transformation (ST) and Intra-clip Aggregation (ICA). The former module augments intra-clip frames with the same probability and the same operation, while the latter leverages two-level intra-clip encoding to generate more discriminative clip-level features. To confirm the advantage of synchronized transformation, we conduct an ablation study with different synchronized transformation schemes. We also perform cross-dataset experiments to better understand the generality of our method. Extensive experiments on three benchmark datasets demonstrate that our framework outperforms most recent state-of-the-art methods. |
Tasks | Data Augmentation, Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.01722v1 |
https://arxiv.org/pdf/1905.01722v1.pdf | |
PWC | https://paperswithcode.com/paper/intra-clip-aggregation-for-video-person-re |
Repo | |
Framework | |
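A minimal sketch of the Synchronized Transformation idea, assuming PIL-image clips and torchvision; the crop size and flip probability are illustrative, not taken from the paper. The key point is that the random parameters are sampled once per clip, not once per frame:

```python
import random
import torchvision.transforms.functional as TF

def synchronized_transform(clip, crop_size=(256, 128)):
    """Apply the *same* randomly sampled augmentation to every frame of a clip.

    clip: list of PIL images belonging to one tracklet. Sampling the flip and
    crop parameters once keeps inter-frame alignment intact, which is the
    point of the ST module described above.
    """
    flip = random.random() < 0.5
    w, h = clip[0].size
    top = random.randint(0, max(h - crop_size[0], 0))
    left = random.randint(0, max(w - crop_size[1], 0))

    out = []
    for frame in clip:
        if flip:
            frame = TF.hflip(frame)
        frame = TF.crop(frame, top, left, crop_size[0], crop_size[1])
        out.append(frame)
    return out
```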
Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence
Title | Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence |
Authors | Hsueh-Ying Lai, Yi-Hsuan Tsai, Wei-Chen Chiu |
Abstract | Stereo matching and flow estimation are two essential tasks for scene understanding, spatially in 3D and temporally in motion. Existing approaches have focused on the unsupervised setting due to the limited resources available for obtaining large-scale ground-truth data. To construct a self-learnable objective, co-related tasks are often linked together to form a joint framework. However, prior work usually utilizes independent networks for each task, and thus cannot learn shared feature representations across models. In this paper, we propose a single, principled network to jointly learn spatiotemporal correspondence for stereo matching and flow estimation, with a newly designed geometric connection as the unsupervised signal for temporally adjacent stereo pairs. We show that our method performs favorably against several state-of-the-art baselines for both unsupervised depth and flow estimation on the KITTI benchmark dataset. |
Tasks | Optical Flow Estimation, Scene Understanding, Stereo Matching |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09265v1 |
https://arxiv.org/pdf/1905.09265v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-stereo-matching-and-optical-flow-via-1 |
Repo | |
Framework | |
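The paper's specific geometric connection is its contribution; what a shared warping-based unsupervised signal looks like in general can be sketched as follows. This is a generic photometric-consistency sketch in PyTorch, not the paper's exact objective; shapes and the L1 penalty are assumptions:

```python
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    # Backward-warp `image` (B, C, H, W) with a dense flow field (B, 2, H, W).
    # Stereo disparity (a purely horizontal flow) and optical flow can both
    # reuse this one operation, which is what lets a single network handle
    # both kinds of correspondence.
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(image.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    # Normalize coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)            # (B, H, W, 2)
    return F.grid_sample(image, grid_n, align_corners=True)

def photometric_loss(target, source, flow):
    # Unsupervised signal: the warped source frame should look like the target.
    return (target - warp_with_flow(source, flow)).abs().mean()
```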
Multi-Scale Body-Part Mask Guided Attention for Person Re-identification
Title | Multi-Scale Body-Part Mask Guided Attention for Person Re-identification |
Authors | Honglong Cai, Zhiguan Wang, Jinxing Cheng |
Abstract | Person re-identification is becoming an increasingly important task due to its wide applications. In practice, it remains challenging due to variations in person pose, lighting, occlusion, misalignment, background clutter, etc. In this paper, we propose a multi-scale body-part mask guided attention network (MMGA), which jointly learns whole-body and body-part attention to help extract global and local features simultaneously. In MMGA, body-part masks are used to guide the training of the corresponding attention. Experiments show that our proposed method can reduce the negative influence of pose variation, misalignment and background clutter. Our method achieves rank-1/mAP of 95.0%/87.2% on the Market1501 dataset and 89.5%/78.1% on the DukeMTMC-reID dataset, outperforming current state-of-the-art methods. |
Tasks | Person Re-Identification |
Published | 2019-04-24 |
URL | http://arxiv.org/abs/1904.11041v1 |
http://arxiv.org/pdf/1904.11041v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-body-part-mask-guided-attention |
Repo | |
Framework | |
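A minimal sketch of mask-guided attention under assumed shapes: a predicted spatial attention map reweights backbone features and is supervised by a body-part mask. The single-map, binary-cross-entropy formulation here is an illustrative simplification of the paper's multi-scale design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskGuidedAttention(nn.Module):
    """Minimal sketch of mask-guided attention (not the authors' code).

    A 1x1 conv predicts a spatial attention map from backbone features;
    during training the map is pulled toward a body-part mask, so the
    attention learns to suppress background clutter and misaligned regions.
    """
    def __init__(self, channels):
        super().__init__()
        self.att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat, part_mask=None):
        a = torch.sigmoid(self.att(feat))          # (B, 1, H, W) in (0, 1)
        out = feat * a                             # reweight features
        guide_loss = None
        if part_mask is not None:                  # (B, 1, H, W) in [0, 1]
            guide_loss = F.binary_cross_entropy(a, part_mask)
        return out, guide_loss
```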
Person Re-identification with Metric Learning using Privileged Information
Title | Person Re-identification with Metric Learning using Privileged Information |
Authors | Xun Yang, Meng Wang, Dacheng Tao |
Abstract | Despite the promising progress made in recent years, person re-identification remains a challenging task due to complex variations in human appearance across camera views. This paper presents a logistic discriminant metric learning method for this challenging problem. Unlike most existing metric learning algorithms, it exploits both original data and auxiliary data during training, motivated by the new machine learning paradigm of Learning Using Privileged Information. Such privileged information is auxiliary knowledge that is only available during training. Our goal is to learn an optimal distance function by constructing a locally adaptive decision rule with the help of privileged information. We jointly learn two distance metrics by minimizing an empirical loss that penalizes the difference between the distance in the original space and that in the privileged space. In our setting, the distance in the privileged space functions as a local decision threshold, which guides decision making in the original space like a teacher. The metric learned from the original space is used to compute the distance between a probe image and a gallery image during testing. In addition, we extend the proposed approach to a multi-view setting that can exploit the complementarity of multiple feature representations. In the multi-view setting, multiple metrics corresponding to different original features are jointly learned, guided by the same privileged information. Furthermore, an effective iterative optimization scheme is introduced to simultaneously optimize the metrics and their assigned weights. Experimental results on several widely used datasets demonstrate that the proposed approach is superior to global-decision-threshold-based methods and outperforms most state-of-the-art results. |
Tasks | Decision Making, Metric Learning, Person Re-Identification |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05005v1 |
http://arxiv.org/pdf/1904.05005v1.pdf | |
PWC | https://paperswithcode.com/paper/person-re-identification-with-metric-learning |
Repo | |
Framework | |
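The core loss can be sketched directly from the description above: the privileged-space distance acts as a per-pair threshold for the original-space distance. The sign convention and the plain logistic form below are assumptions made for illustration, not the paper's exact formulation:

```python
import numpy as np

def lupi_logistic_loss(M, M_star, X, X_star, pairs, labels):
    """Sketch of a privileged-information metric loss (assumed conventions).

    M, M_star: PSD metric matrices for the original and privileged spaces.
    X, X_star: (N, d) features in the two spaces. pairs: index pairs (i, j);
    labels[k] = -1 for a matched pair, +1 for a mismatched one. The squared
    distance in the privileged space acts as a per-pair decision threshold
    ("teacher") for the distance in the original space.
    """
    loss = 0.0
    for (i, j), y in zip(pairs, labels):
        d = X[i] - X[j]
        d_star = X_star[i] - X_star[j]
        dist = d @ M @ d                    # distance in original space
        thresh = d_star @ M_star @ d_star   # local threshold from privileged space
        # Matched pairs should fall below the threshold, mismatched above.
        loss += np.logaddexp(0.0, -y * (dist - thresh))
    return loss / len(pairs)
```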
Interpretable Encrypted Searchable Neural Networks
Title | Interpretable Encrypted Searchable Neural Networks |
Authors | Kai Chen, Zhongrui Lin, Jian Wan, Chungen Xu |
Abstract | In cloud security, traditional searchable encryption (SE) requires high computation and communication overhead for dynamic search and update. The clever combination of machine learning (ML) and SE may be a new way to solve this problem. This paper proposes interpretable encrypted searchable neural networks (IESNN) to explore probabilistic query, balanced index tree construction and automatic weight update in an encrypted cloud environment. In IESNN, probabilistic learning is used to obtain a search ranking for the searchable index, and probabilistic queries are performed over the ciphertext index, which significantly reduces query complexity. Compared to traditional SE, adversarial learning and automatic weight updates are proposed so that the index responds to users' timely queries over the latest dataset without expensive communication overhead. The proposed IESNN performs better than previous works, bringing the query complexity closer to $O(\log N)$ and introducing low overhead in computation and communication. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.04998v1 |
https://arxiv.org/pdf/1908.04998v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-encrypted-searchable-neural |
Repo | |
Framework | |
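A toy illustration (encryption omitted entirely) of why a balanced index tree gives roughly $O(\log N)$ query complexity: each internal node stores an upper bound on its subtree's relevance scores, so a top-1 query descends one branch per level. The node layout is invented for this sketch, not taken from IESNN:

```python
class IndexNode:
    """Node of a balanced score-index tree (illustrative only)."""
    def __init__(self, bound, doc_id=None, left=None, right=None):
        self.bound, self.doc_id = bound, doc_id
        self.left, self.right = left, right

def build(scores):
    # Bottom-up construction: pair adjacent nodes, propagating max bounds.
    nodes = [IndexNode(s, doc_id=i) for i, s in enumerate(scores)]
    while len(nodes) > 1:
        paired = [IndexNode(max(a.bound, b.bound), left=a, right=b)
                  for a, b in zip(nodes[::2], nodes[1::2])]
        if len(nodes) % 2:
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]

def query_top1(node):
    # Greedy descent toward the larger bound: one branch per level.
    while node.doc_id is None:
        node = node.left if node.left.bound >= node.right.bound else node.right
    return node.doc_id, node.bound

root = build([0.2, 0.9, 0.1, 0.5])
print(query_top1(root))   # (1, 0.9)
```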
Convolutional Temporal Attention Model for Video-based Person Re-identification
Title | Convolutional Temporal Attention Model for Video-based Person Re-identification |
Authors | Tanzila Rahman, Mrigank Rochan, Yang Wang |
Abstract | The goal of video-based person re-identification is to match two input videos such that their distance is small if they contain the same person. A common approach is to first extract image features for all frames in a video and then aggregate them into a video-level feature; the video-level features of two videos can then be used to compute the distance between them. In this paper, we propose a temporal attention approach for aggregating frame-level features into a video-level feature vector for re-identification. Our method is motivated by the fact that not all frames in a video are equally informative. We propose a fully convolutional temporal attention model for generating the attention scores. Fully convolutional networks (FCNs) have been widely used in semantic segmentation for generating 2D output maps. In this paper, we formulate video-based person re-identification as a sequence labeling problem analogous to semantic segmentation; we establish this connection and modify the FCN to generate attention scores representing the importance of each frame. Extensive experiments on three benchmark datasets (i.e. iLIDS-VID, PRID-2011 and SDU-VID) show that our proposed method outperforms other state-of-the-art approaches. |
Tasks | Person Re-Identification, Semantic Segmentation, Video-Based Person Re-Identification |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04492v2 |
http://arxiv.org/pdf/1904.04492v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-temporal-attention-model-for |
Repo | |
Framework | |
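A minimal PyTorch sketch of the aggregation step as described: a small 1-D FCN scores each frame, and the softmax-normalized scores weight the frame features into one video-level vector. Layer sizes are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ConvTemporalAttention(nn.Module):
    """Sketch of fully convolutional temporal attention (assumed layout).

    Frame-level features (B, T, D) are treated as a 1-D labeling problem
    over time, analogous to 2-D semantic segmentation: an FCN outputs one
    score per frame, and the normalized scores weight the frames.
    """
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv1d(dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, frame_feats):                # (B, T, D)
        x = frame_feats.transpose(1, 2)            # (B, D, T) for Conv1d
        scores = self.fcn(x).softmax(dim=-1)       # (B, 1, T), one weight per frame
        video_feat = (frame_feats * scores.transpose(1, 2)).sum(dim=1)
        return video_feat                          # (B, D)
```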
Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently
Title | Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently |
Authors | Dan Liu, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Feiyue Huang, Siwei Lyu |
Abstract | Existing hand detection methods usually follow a multi-stage pipeline with high computation cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN) trained in an end-to-end fashion to detect hands efficiently. Specifically, we merge the feature maps from high to low layers in an iterative way, which handles different scales of hands better with less time overhead compared to simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To deal with rotated hand detection, we present the rotation map to get rid of complex rotation and derotation layers. In addition, we design a multi-scale loss scheme that significantly accelerates the training process by adding supervision to the intermediate layers of the network. Compared with the state-of-the-art methods, our algorithm shows comparable accuracy while running 4.23 times faster on the VIVA dataset, and achieves better average precision on the Oxford hand detection dataset at a speed of 62.5 fps. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04634v1 |
https://arxiv.org/pdf/1906.04634v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-invariant-fully-convolutional-network |
Repo | |
Framework | |
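The iterative high-to-low merging can be sketched as a simple top-down loop, assuming a factor-2 feature pyramid whose levels already share a channel count (e.g. after 1x1 lateral convs); the single shared smoothing conv is an illustrative simplification:

```python
import torch.nn as nn

class IterativeFusion(nn.Module):
    """Sketch of iterative high-to-low feature merging (assumptions noted above).

    Instead of concatenating all pyramid levels at once, each higher-level
    map is upsampled and fused into the next lower one in sequence, which
    keeps the channel count (and time overhead) small.
    """
    def __init__(self, channels):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, pyramid):
        # pyramid: feature maps ordered from deepest/coarsest to
        # shallowest/finest, all with the same channel count.
        fused = pyramid[0]
        for level in pyramid[1:]:
            fused = self.smooth(self.up(fused) + level)
        return fused
```

A multi-scale loss in the spirit of the paper would attach a detection head and supervision to each intermediate `fused` map rather than only the last one.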
Handheld Mobile Photography in Very Low Light
Title | Handheld Mobile Photography in Very Low Light |
Authors | Orly Liba, Kiran Murthy, Yun-Ta Tsai, Tim Brooks, Tianfan Xue, Nikhil Karnad, Qiurui He, Jonathan T. Barron, Dillon Sharlet, Ryan Geiss, Samuel W. Hasinoff, Yael Pritch, Marc Levoy |
Abstract | Taking photographs in low light using a mobile phone is challenging and rarely produces pleasing results. Aside from the physical limits imposed by read noise and photon shot noise, these cameras are typically handheld, have small apertures and sensors, use mass-produced analog electronics that cannot easily be cooled, and are commonly used to photograph subjects that move, like children and pets. In this paper we describe a system for capturing clean, sharp, colorful photographs in light as low as 0.3 lux, where human vision becomes monochromatic and indistinct. To permit handheld photography without flash illumination, we capture, align, and combine multiple frames. Our system employs “motion metering”, which uses an estimate of motion magnitudes (whether due to handshake or moving objects) to identify the number of frames and the per-frame exposure times that together minimize both noise and motion blur in a captured burst. We combine these frames using robust alignment and merging techniques that are specialized for high-noise imagery. To ensure accurate colors in such low light, we employ a learning-based auto white balancing algorithm. To prevent the photographs from looking like they were shot in daylight, we use tone mapping techniques inspired by illusionistic painting: increasing contrast, crushing shadows to black, and surrounding the scene with darkness. All of these processes are performed using the limited computational resources of a mobile device. Our system can be used by novice photographers to produce shareable pictures in a few seconds based on a single shutter press, even in environments so dim that humans cannot see clearly. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11336v1 |
https://arxiv.org/pdf/1910.11336v1.pdf | |
PWC | https://paperswithcode.com/paper/handheld-mobile-photography-in-very-low-light |
Repo | |
Framework | |
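A toy version of the motion-metering trade-off described above; all constants are invented for illustration, and the real system estimates motion and noise far more carefully:

```python
def motion_metering(motion_px_per_s, total_exposure_s, max_frames=13,
                    blur_budget_px=1.5):
    """Toy version of "motion metering" (numbers are illustrative).

    Given an estimate of motion magnitude, cap the per-frame exposure so
    that motion blur stays within a pixel budget, then take as many frames
    as needed to reach the total exposure (which controls noise).
    """
    if motion_px_per_s <= 0:
        return 1, total_exposure_s
    per_frame = min(blur_budget_px / motion_px_per_s, total_exposure_s)
    n_frames = min(max_frames, max(1, round(total_exposure_s / per_frame)))
    return n_frames, per_frame

# Fast motion -> many short frames; a still scene -> few long ones.
print(motion_metering(motion_px_per_s=30.0, total_exposure_s=1.0))  # (13, 0.05)
```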
Use of OWL and Semantic Web Technologies at Pinterest
Title | Use of OWL and Semantic Web Technologies at Pinterest |
Authors | Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, David Temple |
Abstract | Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pinterest, to help both content recommendation and ads targeting. In this paper, we present the engineering of an OWL ontology, the Pinterest Taxonomy, that forms the core of Pinterest's knowledge graph, the Pinterest Taste Graph. We describe modeling choices and enhancements to WebProtégé that we used for the creation of the ontology. In two months, eight Pinterest engineers, without prior experience of OWL and WebProtégé, revamped an existing taxonomy of noisy terms into an OWL ontology. We share our experience and present the key aspects of our work that we believe will be useful for others working in this area. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02106v1 |
https://arxiv.org/pdf/1907.02106v1.pdf | |
PWC | https://paperswithcode.com/paper/use-of-owl-and-semantic-web-technologies-at |
Repo | |
Framework | |
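For readers unfamiliar with OWL tooling, a tiny taxonomy in the same spirit can also be built programmatically. The sketch below uses rdflib (≥ 6, where serialize returns a string) with invented class names; it is not Pinterest's actual ontology or WebProtégé:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Illustrative only: a three-class taxonomy with subclass links,
# the basic shape of an interest hierarchy.
PIN = Namespace("https://example.org/pinterest-taxonomy#")
g = Graph()
g.bind("pin", PIN)

for cls in ("Interest", "Recipe", "Dessert"):
    g.add((PIN[cls], RDF.type, OWL.Class))
g.add((PIN.Recipe, RDFS.subClassOf, PIN.Interest))
g.add((PIN.Dessert, RDFS.subClassOf, PIN.Recipe))
g.add((PIN.Dessert, RDFS.label, Literal("Dessert recipes")))

print(g.serialize(format="turtle"))
```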
Visualization, Discriminability and Applications of Interpretable Saak Features
Title | Visualization, Discriminability and Applications of Interpretable Saak Features |
Authors | Abinaya Manimaran, Thiyagarajan Ramanathan, Suya You, C-C Jay Kuo |
Abstract | In this work, we study the power of Saak features as an effort toward interpretable deep learning. Inspired by the operations of the convolutional layers of convolutional neural networks, the multi-stage Saak transform was proposed. On this foundation, we provide an in-depth examination of Saak features, which are the coefficients of the Saak transform, by analyzing their properties through visualization and demonstrating their applications in image classification. Like CNN features, Saak features at later stages have larger receptive fields, yet they are obtained in a one-pass feedforward manner without backpropagation. The whole feature extraction process is transparent and of extremely low complexity. The discriminant power of Saak features is demonstrated, and their classification performance on three well-known datasets (namely, MNIST, CIFAR-10 and STL-10) is shown through experimental results. |
Tasks | Image Classification |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09107v3 |
http://arxiv.org/pdf/1902.09107v3.pdf | |
PWC | https://paperswithcode.com/paper/visualization-discriminability-and |
Repo | |
Framework | |
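A simplified single-stage sketch of the Saak idea, assuming flattened local cubes as input: kernels come from a one-pass PCA (no backpropagation), each kernel is paired with its negative, and ReLU resolves the resulting sign ambiguity. The real transform includes details (e.g. DC separation and the cascading of stages over spatial blocks) omitted here:

```python
import numpy as np
from sklearn.decomposition import PCA

def saak_stage(blocks, n_kernels):
    """One Saak-style stage (simplified sketch, not the authors' code).

    blocks: (N, d) flattened local cubes. PCA gives the transform kernels
    in a single feedforward pass; sign augmentation plus ReLU keeps the
    representation non-negative without losing sign information.
    """
    pca = PCA(n_components=n_kernels).fit(blocks)
    kernels = np.concatenate([pca.components_, -pca.components_])  # sign augmentation
    coeffs = blocks @ kernels.T                                    # (N, 2 * n_kernels)
    return np.maximum(coeffs, 0.0)                                 # ReLU
```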
Towards unstructured mortality prediction with free-text clinical notes
Title | Towards unstructured mortality prediction with free-text clinical notes |
Authors | Mohammad Hashir, Rapinder Sawhney |
Abstract | Healthcare data continues to flourish, yet a relatively small portion of it, mostly structured data, is utilized effectively for predicting clinical outcomes. The rich subjective information available in unstructured clinical notes can facilitate higher discrimination but tends to be under-utilized in mortality prediction. This work assesses the gain in performance when multiple minimally preprocessed notes are used as input for prediction. A hierarchical architecture consisting of both convolutional and recurrent layers is used to concurrently model the different notes compiled in an individual hospital stay. This approach is evaluated on predicting in-hospital mortality on the MIMIC-III dataset. Compared to approaches utilizing structured data, it achieved higher metrics despite requiring less cleaning and preprocessing. This demonstrates the potential of unstructured data to enhance mortality prediction and signifies the need to incorporate more raw unstructured data into current clinical prediction methods. |
Tasks | Mortality Prediction |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08437v1 |
https://arxiv.org/pdf/1911.08437v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-unstructured-mortality-prediction |
Repo | |
Framework | |
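A minimal sketch of the hierarchical convolutional-recurrent architecture described above; all dimensions, the max-pooling, and the single GRU layer are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class NotesMortalityModel(nn.Module):
    """Sketch of a hierarchical notes model (dimensions are assumptions).

    A 1-D CNN encodes each note from its token embeddings, a GRU runs over
    the sequence of note encodings from one hospital stay, and the final
    state predicts in-hospital mortality.
    """
    def __init__(self, vocab, emb=128, conv=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=0)
        self.conv = nn.Conv1d(emb, conv, kernel_size=3, padding=1)
        self.gru = nn.GRU(conv, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, notes):                        # (B, n_notes, n_tokens) token ids
        b, n, t = notes.shape
        x = self.embed(notes.view(b * n, t))         # (B*n, t, emb)
        x = self.conv(x.transpose(1, 2)).relu()      # (B*n, conv, t)
        x = x.max(dim=-1).values.view(b, n, -1)      # max-pool each note
        _, h = self.gru(x)                           # recur over the note sequence
        return torch.sigmoid(self.out(h[-1]))        # P(in-hospital mortality)
```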
Genetic Algorithm for a class of Knapsack Problems
Title | Genetic Algorithm for a class of Knapsack Problems |
Authors | Shalin Shah |
Abstract | The 0/1 knapsack problem is weakly NP-hard in that there exist pseudo-polynomial time algorithms based on dynamic programming that can solve it exactly. There are also core branch-and-bound algorithms that can solve large randomly generated instances in a very short amount of time. However, as the correlation between the variables increases, the difficulty of the problem increases. Recently, D. Pisinger introduced a new class of knapsack problems called spanner knapsack instances. These instances are unsolvable by core branch-and-bound algorithms, and as the size of the coefficients and the capacity constraint increase, they become unsolvable even by dynamic-programming-based algorithms. In this paper, a genetic algorithm is presented for spanner knapsack instances. Results show that the algorithm is capable of delivering optimum solutions within a reasonable amount of computation time. |
Tasks | |
Published | 2019-02-15 |
URL | http://arxiv.org/abs/1903.03494v1 |
http://arxiv.org/pdf/1903.03494v1.pdf | |
PWC | https://paperswithcode.com/paper/genetic-algorithm-for-a-class-of-knapsack |
Repo | |
Framework | |
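To make the setup concrete, here is a plain genetic algorithm for 0/1 knapsack; the operators (one-point crossover, bit-flip mutation, random repair) are textbook choices, not necessarily the paper's:

```python
import random

def ga_knapsack(values, weights, capacity, pop=100, gens=500, pmut=0.02):
    """Plain GA for 0/1 knapsack (illustrative; not Pisinger's solvers)."""
    n = len(values)

    def repair(x):                       # drop random items until feasible
        idx = [i for i in range(n) if x[i]]
        random.shuffle(idx)
        while sum(w for i, w in enumerate(weights) if x[i]) > capacity and idx:
            x[idx.pop()] = 0
        return x

    def fitness(x):
        return sum(v for i, v in enumerate(values) if x[i])

    popu = [repair([random.randint(0, 1) for _ in range(n)]) for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=fitness, reverse=True)
        elite = popu[: pop // 2]                     # keep the fitter half
        children = []
        while len(children) < pop - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < pmut else g for g in child]
            children.append(repair(child))
        popu = elite + children
    return max(popu, key=fitness)
```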
Risks from Learned Optimization in Advanced Machine Learning Systems
Title | Risks from Learned Optimization in Advanced Machine Learning Systems |
Authors | Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant |
Abstract | We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be - how will it differ from the loss function it was trained under - and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01820v2 |
https://arxiv.org/pdf/1906.01820v2.pdf | |
PWC | https://paperswithcode.com/paper/risks-from-learned-optimization-in-advanced |
Repo | |
Framework | |
Divide and Generate: Neural Generation of Complex Sentences
Title | Divide and Generate: Neural Generation of Complex Sentences |
Authors | Tomoya Ogata, Mamoru Komachi, Tomoya Takatani |
Abstract | We propose a task to generate a complex sentence from a simple sentence in order to amplify various kinds of responses in the database. We first divide a complex sentence into a main clause and a subordinate clause to learn a generator model of modifiers, and then use the model to generate a modifier clause to create a complex sentence from a simple sentence. We present an automatic evaluation metric to estimate the quality of the models and show that a pipeline model outperforms an end-to-end model. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10196v1 |
http://arxiv.org/pdf/1901.10196v1.pdf | |
PWC | https://paperswithcode.com/paper/divide-and-generate-neural-generation-of |
Repo | |
Framework | |
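A rough sketch of the divide step using a dependency parse; the paper trains a neural generator for the modifier clause, while this only illustrates splitting a complex sentence into a main and a subordinate clause. spaCy and its small English model are assumed to be installed, and the clause labels below are illustrative heuristics:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def divide(sentence):
    """Peel an adverbial or relative clause off the main clause.

    This is only a heuristic analogue of the paper's division step: the
    first advcl/relcl subtree is treated as the subordinate clause and the
    remaining tokens as the main clause.
    """
    doc = nlp(sentence)
    sub = [t for t in doc if t.dep_ in ("advcl", "relcl")]
    if not sub:
        return sentence, None
    sub_tokens = set(sub[0].subtree)
    clause = " ".join(t.text for t in sub[0].subtree)
    main = " ".join(t.text for t in doc if t not in sub_tokens)
    return main, clause

print(divide("I stayed home because it was raining."))
# e.g. -> ('I stayed home .', 'because it was raining')
```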