January 26, 2020

3203 words 16 mins read

Paper Group ANR 1513

Towards Train-Test Consistency for Semi-supervised Temporal Action Localization. Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning. Shared Feelings: Understanding Facebook Reactions to Scholarly Articles. Human Action Recognition with Multi-Laplacian Graph Convolutional Networks. Detecting retail products in si …

Towards Train-Test Consistency for Semi-supervised Temporal Action Localization

Title Towards Train-Test Consistency for Semi-supervised Temporal Action Localization
Authors Xudong Lin, Zheng Shou, Shih-Fu Chang
Abstract Recently, Weakly-supervised Temporal Action Localization (WTAL) has been densely studied, but there is still a large gap between weakly-supervised models and fully-supervised models. It is practical and intuitive to annotate temporal boundaries of a few examples and utilize them to help WTAL models better detect actions. However, the train-test discrepancy of the action localization strategy prevents WTAL models from leveraging semi-supervision for further improvement. At training time, attention or multiple instance learning is used to aggregate predictions of each snippet for video-level classification; at test time, they first obtain action score sequences over time, then truncate segments of scores higher than a fixed threshold, and post-process action segments. The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time. In this paper, we propose a Train-Test Consistent framework, TTC-Loc. At both training and testing time, our TTC-Loc localizes actions by comparing scores of action classes and a predicted threshold, which enables it to be trained with semi-supervision. By fixing the train-test discrepancy, our TTC-Loc significantly outperforms the state-of-the-art performance on THUMOS’14, ActivityNet 1.2 and 1.3 when only video-level labels are provided for training. With full annotations of only one video per class and video-level labels for the other videos, our TTC-Loc further boosts the performance and achieves 33.4% mAP (IoU threshold 0.5) on THUMOS’14.
Tasks Action Localization, Multiple Instance Learning, Temporal Action Localization, Video Classification, Weakly-supervised Temporal Action Localization
Published 2019-10-24
URL https://arxiv.org/abs/1910.11285v3
PDF https://arxiv.org/pdf/1910.11285v3.pdf
PWC https://paperswithcode.com/paper/lpat-learning-to-predict-adaptive-threshold
Repo
Framework
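
The core idea of the entry above is to localize actions, at both training and test time, by comparing per-class snippet scores with a predicted threshold. Below is a minimal, hypothetical NumPy sketch of that comparison-and-grouping step; the threshold here is a fixed placeholder rather than the per-video threshold that TTC-Loc actually learns.

```python
import numpy as np

def localize_segments(class_scores, thresholds):
    """Toy illustration of threshold-based localization.

    class_scores: (T, C) array of per-snippet action scores.
    thresholds:   (C,) per-class thresholds (given here; TTC-Loc predicts them).
    Returns a dict: class index -> list of (start, end) snippet intervals.
    """
    T, C = class_scores.shape
    segments = {}
    for c in range(C):
        active = class_scores[:, c] > thresholds[c]   # snippets above threshold
        segs, start = [], None
        for t in range(T):                            # group consecutive active snippets
            if active[t] and start is None:
                start = t
            elif not active[t] and start is not None:
                segs.append((start, t))
                start = None
        if start is not None:
            segs.append((start, T))
        segments[c] = segs
    return segments

scores = np.random.rand(100, 20)          # 100 snippets, 20 action classes
thr = np.full(20, 0.8)                    # placeholder "predicted" thresholds
print(localize_segments(scores, thr)[0])
```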

Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning

Title Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning
Authors Nicolas Garcia Trillos, Zach Kaplan, Daniel Sanz-Alonso
Abstract The aim of this paper is to provide new theoretical and computational understanding of two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. Under this unified light, the optimization schemes for the local entropy and heat regularized losses differ only in which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This allows replacing the traditional back-propagation calculation of gradients with sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks.
Tasks
Published 2019-01-29
URL http://arxiv.org/abs/1901.10082v1
PDF http://arxiv.org/pdf/1901.10082v1.pdf
PWC https://paperswithcode.com/paper/variational-characterizations-of-local
Repo
Framework
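
The abstract above describes a two-step optimization scheme in which local entropy leads to a moment-matching update that requires only samples, not gradients. The toy sketch below, under the assumption of a simple low-dimensional loss and a crude random-walk Metropolis sampler, illustrates that gradient-free update; it is not the paper's algorithm for training full networks.

```python
import numpy as np

def loss(w):
    # toy non-convex loss standing in for a network's training loss
    return np.sum(w ** 2) + np.sin(3 * w).sum()

def local_entropy_step(w, gamma=0.5, n_samples=2000, step=0.2, rng=None):
    """One moment-matching update for the local-entropy-regularized loss.

    Samples (via random-walk Metropolis) from the density
        p(w') ~ exp(-loss(w')) * N(w' | w, gamma * I)
    and returns its mean, i.e. the best Gaussian approximation by moment
    matching. No gradients of `loss` are needed.
    """
    rng = rng or np.random.default_rng(0)
    x = w.copy()
    log_p = lambda z: -loss(z) - np.sum((z - w) ** 2) / (2 * gamma)
    samples = []
    for _ in range(n_samples):
        prop = x + step * rng.standard_normal(x.shape)
        if np.log(rng.random()) < log_p(prop) - log_p(x):
            x = prop
        samples.append(x.copy())
    return np.mean(samples, axis=0)

w = np.array([2.0, -1.5])
for _ in range(10):
    w = local_entropy_step(w)
print(w)
```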

Shared Feelings: Understanding Facebook Reactions to Scholarly Articles

Title Shared Feelings: Understanding Facebook Reactions to Scholarly Articles
Authors Cole Freeman, Mrinal Kanti Roy, Michele Fattoruso, Hamed Alhoori
Abstract Research on social-media platforms has tended to rely on textual analysis to perform research tasks. While text-based approaches have significantly increased our understanding of online behavior and social dynamics, they overlook features on these platforms that have grown in prominence in the past few years: click-based responses to content. In this paper, we present a new dataset of Facebook Reactions to scholarly content. We give an overview of its structure, analyze some of the statistical trends in the data, and use it to train and test two supervised learning algorithms. Our preliminary tests suggest the presence of stratification in the number of users following pages, divisions that seem to fall in line with distinctions in the subject matter of those pages.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.10975v1
PDF https://arxiv.org/pdf/1905.10975v1.pdf
PWC https://paperswithcode.com/paper/shared-feelings-understanding-facebook
Repo
Framework

Human Action Recognition with Multi-Laplacian Graph Convolutional Networks

Title Human Action Recognition with Multi-Laplacian Graph Convolutional Networks
Authors Ahmed Mazari, Hichem Sahbi
Abstract Convolutional neural networks are nowadays witnessing a major success in different pattern recognition problems. These learning models were basically designed to handle vectorial data such as images, but their extension to non-vectorial and semi-structured data (namely graphs with variable sizes, topology, etc.) remains a major challenge, though a few interesting solutions are currently emerging. In this paper, we introduce MLGCN, a novel spectral Multi-Laplacian Graph Convolutional Network. The main contribution of this method resides in a new design principle that learns graph Laplacians as convex combinations of elementary Laplacians, each one dedicated to a particular topology of the input graphs. We also introduce a novel pooling operator on graphs that proceeds in two steps: context-dependent node expansion, followed by global average pooling; the strength of this two-step process resides in its ability to preserve the discrimination power of nodes while achieving permutation invariance. Experiments conducted on the SBU and UCF-101 datasets show the validity of our method for the challenging task of action recognition.
Tasks Temporal Action Localization
Published 2019-10-15
URL https://arxiv.org/abs/1910.06934v1
PDF https://arxiv.org/pdf/1910.06934v1.pdf
PWC https://paperswithcode.com/paper/human-action-recognition-with-multi-laplacian
Repo
Framework
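
The key design principle above is to learn a graph Laplacian as a convex combination of elementary Laplacians, each tied to a different topology. The NumPy sketch below illustrates that combination with fixed (rather than learned) mixing logits and a basic first-order spectral convolution; the toy graphs and shapes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def convex_laplacian(laplacians, logits):
    """Combine elementary Laplacians with softmax (hence convex) weights."""
    alphas = np.exp(logits) / np.exp(logits).sum()
    return sum(a * L for a, L in zip(alphas, laplacians))

def spectral_conv(X, L, W):
    """A first-order spectral graph convolution: ReLU((I - L) X W)."""
    n = L.shape[0]
    return np.maximum((np.eye(n) - L) @ X @ W, 0.0)

# toy graph with 5 nodes and two candidate Laplacians (different topologies)
rng = np.random.default_rng(0)
A1 = (rng.random((5, 5)) > 0.5).astype(float); A1 = np.triu(A1, 1); A1 += A1.T
A2 = (rng.random((5, 5)) > 0.7).astype(float); A2 = np.triu(A2, 1); A2 += A2.T
lap = lambda A: np.diag(A.sum(1)) - A
L = convex_laplacian([lap(A1), lap(A2)], logits=np.array([0.3, -0.1]))
X = rng.standard_normal((5, 8))                    # node features
H = spectral_conv(X, L, rng.standard_normal((8, 4)))
print(H.shape)                                     # (5, 4)
```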

Detecting retail products in situ using CNN without human effort labeling

Title Detecting retail products in situ using CNN without human effort labeling
Authors Wei Yi, Yaoran Sun, Tao Ding, Sailing He
Abstract CNNs are a powerful tool for many computer vision tasks, achieving much better results than traditional methods. Since a CNN has a very large capacity, training such a neural network often requires a large amount of data, but it is often expensive to obtain labeled images in practice, especially for object detection, where collecting bounding boxes for every object in the training set requires considerable human effort. This is the case in the detection of retail products, where there can be many different categories. In this paper, we focus on applying a CNN to detect products from 324 categories in situ, while requiring no extra bounding-box labeling effort for any image. Our approach is based on an algorithm that extracts bounding boxes from an in-vitro dataset and an algorithm that simulates occlusion. We have successfully shown the effectiveness and usefulness of our methods in building a Faster RCNN detection model. A similar idea is also applicable in other scenarios.
Tasks Object Detection
Published 2019-04-22
URL http://arxiv.org/abs/1904.09781v1
PDF http://arxiv.org/pdf/1904.09781v1.pdf
PWC https://paperswithcode.com/paper/detecting-retail-products-in-situ-using-cnn
Repo
Framework
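
The abstract mentions an algorithm to simulate occlusion when building training data from in-vitro product images, without specifying it. The snippet below is a hypothetical minimal version of such a step (pasting a random rectangle of noise over a crop); the actual occlusion model used in the paper may be quite different.

```python
import numpy as np

def simulate_occlusion(image, max_frac=0.3, rng=None):
    """Paste a random noise rectangle over the image to mimic occlusion.

    image:    HxWx3 uint8 array (e.g. an in-vitro product crop).
    max_frac: maximum fraction of each side the occluder may cover.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    oh = rng.integers(1, max(2, int(h * max_frac)))
    ow = rng.integers(1, max(2, int(w * max_frac)))
    y = rng.integers(0, h - oh)
    x = rng.integers(0, w - ow)
    out = image.copy()
    out[y:y + oh, x:x + ow] = rng.integers(0, 256, size=(oh, ow, 3), dtype=np.uint8)
    return out

crop = np.zeros((128, 96, 3), dtype=np.uint8)   # placeholder product crop
occluded = simulate_occlusion(crop)
```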

Wasserstein Collaborative Filtering for Item Cold-start Recommendation

Title Wasserstein Collaborative Filtering for Item Cold-start Recommendation
Authors Yitong Meng, Guangyong Chen, Benben Liao, Jun Guo, Weiwen Liu
Abstract The item cold-start problem seriously limits the recommendation performance of Collaborative Filtering (CF) methods when new items have few or no interactions. To solve this issue, many modern Internet applications propose to predict a new item’s interactions from its content. However, it is difficult to design and learn a map between an item’s interaction history and the corresponding content. In this paper, we apply the Wasserstein distance to address the item cold-start problem. Given item content information, we can calculate the similarity between the interacted items and cold-start ones, so that a user’s preference on cold-start items can be inferred by minimizing the Wasserstein distance between the distributions over these two types of items. We further adopt the idea of CF and propose Wasserstein CF (WCF) to improve the recommendation performance on cold-start items. Experimental results demonstrate the superiority of WCF over state-of-the-art approaches.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04266v1
PDF https://arxiv.org/pdf/1909.04266v1.pdf
PWC https://paperswithcode.com/paper/wasserstein-collaborative-filtering-for-item
Repo
Framework
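
The mechanism sketched in the abstract scores a cold-start item by how close it is, in Wasserstein distance, to the distribution over items the user has interacted with. The toy below shows that idea in one dimension using scipy's wasserstein_distance; real item content would be high-dimensional, and the paper's WCF model involves more than this simple nearest-distribution scoring.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def cold_start_scores(user_item_counts, item_content, candidate_ids):
    """Score cold-start candidates by Wasserstein closeness to a user's history.

    user_item_counts: dict item_id -> interaction count (the user's history).
    item_content:     dict item_id -> scalar content feature (1-D simplification).
    candidate_ids:    cold-start items with content but no interactions.
    """
    hist_items = list(user_item_counts)
    hist_vals = [item_content[i] for i in hist_items]
    hist_wts = [user_item_counts[i] for i in hist_items]
    scores = {}
    for c in candidate_ids:
        # distance between the user's content distribution and a point mass
        # at the candidate's content value; smaller distance -> higher preference
        d = wasserstein_distance(hist_vals, [item_content[c]], u_weights=hist_wts)
        scores[c] = -d
    return scores

content = {"a": 0.1, "b": 0.2, "c": 0.9, "new1": 0.15, "new2": 0.8}
history = {"a": 3, "b": 1}
print(cold_start_scores(history, content, ["new1", "new2"]))
```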

Dual IV: A Single Stage Instrumental Variable Regression

Title Dual IV: A Single Stage Instrumental Variable Regression
Authors Krikamol Muandet, Arash Mehrjou, Si Kai Lee, Anant Raj
Abstract We present a novel single-stage procedure for instrumental variable (IV) regression called DualIV, which simplifies the traditional two-stage regression via a dual formulation. We show that the common two-stage procedure can alternatively be solved via generalized least squares. Our formulation circumvents the first-stage regression, which can be a bottleneck in modern two-stage procedures for IV regression. We also show that our framework is closely related to the generalized method of moments (GMM) under specific assumptions. This highlights the fundamental connection between GMM and two-stage procedures in the IV literature. Using the proposed framework, we develop a simple kernel-based algorithm with consistency guarantees. Lastly, we give empirical results illustrating the advantages of our method over existing two-stage algorithms.
Tasks
Published 2019-10-27
URL https://arxiv.org/abs/1910.12358v1
PDF https://arxiv.org/pdf/1910.12358v1.pdf
PWC https://paperswithcode.com/paper/dual-iv-a-single-stage-instrumental-variable
Repo
Framework
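
For context on the "common two-stage procedure" that DualIV is designed to simplify, here is a minimal NumPy sketch of classic two-stage least squares on synthetic confounded data. This is the baseline being referred to, not the paper's single-stage, kernel-based method.

```python
import numpy as np

def two_stage_least_squares(X, Z, Y):
    """Classic 2SLS, the two-stage baseline that DualIV avoids.

    X: (n, d) endogenous regressors, Z: (n, k) instruments, Y: (n,) outcomes.
    Stage 1 regresses X on Z; stage 2 regresses Y on the fitted X-hat.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]     # stage 1 fitted values
    beta = np.linalg.lstsq(X_hat, Y, rcond=None)[0]      # stage 2 coefficients
    return beta

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal((n, 1))                  # instrument
u = rng.standard_normal(n)                       # hidden confounder
x = z[:, 0] + u + 0.1 * rng.standard_normal(n)
y = 2.0 * x + u + 0.1 * rng.standard_normal(n)   # true causal effect: 2.0
print(two_stage_least_squares(x[:, None], z, y)) # close to [2.0]
```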

Collaborative Attention Network for Person Re-identification

Title Collaborative Attention Network for Person Re-identification
Authors Wenpeng Li, Yongli Sun, Jinjun Wang, Han Xu, Xiangru Yang, Long Cui
Abstract Jointly utilizing global and local features to improve model accuracy is becoming a popular approach for the person re-identification (ReID) problem, because previous works using global features alone have very limited capacity at extracting discriminative local patterns in the obtained feature representation. Existing works that attempt to collect local patterns either explicitly slice the global feature into several local pieces in a handcrafted way, or apply the attention mechanism to implicitly infer the importance of different local regions. In this paper, we show that by explicitly learning the importance of small local parts and part combinations, we can further improve the final feature representation for ReID. Specifically, we first separate the global feature into multiple local slices at different scales with a proposed multi-branch structure. Then we introduce the Collaborative Attention Network (CAN) to automatically learn the combination of features from adjacent slices. In this way, the combination keeps the intrinsic relation between adjacent features across local regions and scales, without losing information by partitioning the global features. Experimental results on several widely used public datasets, including Market-1501, DukeMTMC-ReID and CUHK03, prove that the proposed method outperforms many existing state-of-the-art methods.
Tasks Person Re-Identification
Published 2019-11-29
URL https://arxiv.org/abs/1911.13008v1
PDF https://arxiv.org/pdf/1911.13008v1.pdf
PWC https://paperswithcode.com/paper/collaborative-attention-network-for-person-re
Repo
Framework
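
The entry above slices the global feature into local parts at several scales and then learns how to combine adjacent slices. The sketch below shows the slicing-and-pooling step plus a hand-weighted stand-in for the adjacent-slice combination; in the paper those weights come from the Collaborative Attention Network, and the shapes, scales, and names here are illustrative assumptions.

```python
import numpy as np

def multi_scale_slices(feature_map, scales=(2, 4)):
    """Slice a CxHxW feature map into horizontal stripes at several scales
    and average-pool each stripe into a local descriptor.

    Returns a dict: scale -> (scale, C) array of stripe descriptors.
    """
    out = {}
    for p in scales:
        stripes = np.array_split(feature_map, p, axis=1)   # split along height
        out[p] = np.stack([s.mean(axis=(1, 2)) for s in stripes])
    return out

def adjacent_combination(stripes, weights):
    """Blend each stripe with its lower neighbour using fixed weights
    (a stand-in for the learned attention over adjacent slices)."""
    combined = []
    for i in range(len(stripes) - 1):
        a = weights[i]
        combined.append(a * stripes[i] + (1 - a) * stripes[i + 1])
    return np.stack(combined)

fmap = np.random.rand(256, 24, 8)            # e.g. a backbone's output feature map
slices = multi_scale_slices(fmap)
print(adjacent_combination(slices[4], weights=[0.6, 0.5, 0.7]).shape)  # (3, 256)
```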

Ordered or Orderless: A Revisit for Video based Person Re-Identification

Title Ordered or Orderless: A Revisit for Video based Person Re-Identification
Authors Le Zhang, Zenglin Shi, Joey Tianyi Zhou, Ming-Ming Cheng, Yun Liu, Jia-Wang Bian, Zeng Zeng, Chunhua Shen
Abstract Is a recurrent network really necessary for learning a good visual representation for video-based person re-identification (VPRe-id)? In this paper, we first show that the common practice of employing recurrent neural networks (RNNs) to aggregate temporal-spatial features may not be optimal. Specifically, with a diagnostic analysis, we show that the recurrent structure may not be as effective at learning temporal dependencies as expected and implicitly yields an orderless representation. Based on this observation, we then present a simple yet surprisingly powerful approach for VPRe-id, where we treat VPRe-id as an efficient orderless ensemble of image-based person re-identification problems. More specifically, we divide videos into individual images and re-identify persons with an ensemble of image-based rankers. Under the i.i.d. assumption, we provide an error bound that sheds light on how we could improve VPRe-id. Our work also presents a promising way to bridge the gap between video- and image-based person re-identification. Comprehensive experimental evaluations demonstrate that the proposed solution achieves state-of-the-art performance on multiple widely used datasets (iLIDS-VID, PRID 2011, and MARS).
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2019-12-24
URL https://arxiv.org/abs/1912.11236v1
PDF https://arxiv.org/pdf/1912.11236v1.pdf
PWC https://paperswithcode.com/paper/ordered-or-orderless-a-revisit-for-video
Repo
Framework
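
Treating video re-identification as an orderless ensemble of image-based comparisons can be made concrete with very little code: the sketch below scores a query tracklet against a gallery tracklet by averaging frame-to-frame distances. It is a simplified reading of the ensemble idea, not the exact ranker aggregation used in the paper.

```python
import numpy as np

def video_distance(query_frames, gallery_frames):
    """Orderless ensemble of image-based comparisons: the distance between two
    tracklets is the mean pairwise distance of their L2-normalized frame features.

    query_frames, gallery_frames: (n, d) and (m, d) frame-feature arrays.
    """
    q = query_frames / np.linalg.norm(query_frames, axis=1, keepdims=True)
    g = gallery_frames / np.linalg.norm(gallery_frames, axis=1, keepdims=True)
    # squared Euclidean distance between every query frame and every gallery frame
    d2 = ((q[:, None, :] - g[None, :, :]) ** 2).sum(-1)
    return d2.mean()

rng = np.random.default_rng(0)
query = rng.standard_normal((16, 128))       # 16 frames, 128-D image features each
gallery = rng.standard_normal((24, 128))
print(video_distance(query, gallery))
```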

Segmentation of blood vessels in retinal fundus images

Title Segmentation of blood vessels in retinal fundus images
Authors Michiel Straat, Jorrit Oosterhof
Abstract In recent years, several automatic segmentation methods have been proposed for blood vessels in retinal fundus images, ranging from cheap and fast trainable filters to complicated neural networks and even deep learning. One example of a filter-based segmentation method is B-COSFIRE. In this approach, the image filter is trained with example prototype patterns, to which it becomes selective by finding points of large intensity variation in a Difference-of-Gaussians response on circles around the center. In this paper we discuss and evaluate several of these vessel segmentation methods. We take a closer look at B-COSFIRE, study its performance on the recently published IOSTAR dataset through experiments, and examine how the parameter values affect the performance. In the experiments we reach a segmentation accuracy of 0.9419. Based on our findings we discuss when B-COSFIRE is the preferred method to use and in which circumstances it could be beneficial to use a more (computationally) complex segmentation method. We also briefly discuss areas beyond blood vessel segmentation where these methods can be used to segment elongated structures, such as rivers in satellite images or nerves of a leaf.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12596v1
PDF https://arxiv.org/pdf/1905.12596v1.pdf
PWC https://paperswithcode.com/paper/segmentation-of-blood-vessels-in-retinal
Repo
Framework
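
B-COSFIRE builds on Difference-of-Gaussians responses sampled on circles around a center point. The snippet below computes only the basic DoG response plus a crude threshold, as a hedged illustration of that building block; it is not an implementation of B-COSFIRE itself.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image, sigma=2.0, k=0.5):
    """Difference-of-Gaussians response, the building block of B-COSFIRE-style
    filters: a narrow center Gaussian minus a wider surround Gaussian."""
    center = gaussian_filter(image.astype(float), sigma=k * sigma)
    surround = gaussian_filter(image.astype(float), sigma=sigma)
    return center - surround

def binarize(response, threshold=0.01):
    """Threshold the response into a crude vessel mask (toy post-processing)."""
    return (response > threshold).astype(np.uint8)

fundus = np.random.rand(256, 256)        # placeholder for a green-channel fundus image
mask = binarize(dog_response(fundus))
print(mask.sum(), "pixels marked as vessel-like")
```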

Fast Large-Scale Discrete Optimization Based on Principal Coordinate Descent

Title Fast Large-Scale Discrete Optimization Based on Principal Coordinate Descent
Authors Huan Xiong, Mengyang Yu, Li Liu, Fan Zhu, Fumin Shen, Ling Shao
Abstract Binary optimization, a representative subclass of discrete optimization, plays an important role in mathematical optimization and has various applications in computer vision and machine learning. Usually, binary optimization problems are NP-hard and difficult to solve due to the binary constraints, especially when the number of variables is very large. Existing methods often suffer from high computational costs or large accumulated quantization errors, or are only designed for specific tasks. In this paper, we propose a fast algorithm to find effective approximate solutions for general binary optimization problems. The proposed algorithm iteratively solves minimization problems related to the linear surrogates of loss functions, which leads to the updating of some binary variables most impacting the value of loss functions in each step. Our method supports a wide class of empirical objective functions with/without restrictions on the numbers of $1$s and $-1$s in the binary variables. Furthermore, the theoretical convergence of our algorithm is proven, and the explicit convergence rates are derived, for objective functions with Lipschitz continuous gradients, which are commonly adopted in practice. Extensive experiments on several binary optimization tasks and large-scale datasets demonstrate the superiority of the proposed algorithm over several state-of-the-art methods in terms of both effectiveness and efficiency.
Tasks Quantization
Published 2019-09-16
URL https://arxiv.org/abs/1909.07079v1
PDF https://arxiv.org/pdf/1909.07079v1.pdf
PWC https://paperswithcode.com/paper/fast-large-scale-discrete-optimization-based
Repo
Framework
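
The algorithm described above repeatedly minimizes a surrogate built from the loss gradient and flips the binary variables that most affect it. The toy sketch below follows that pattern for a quadratic objective with a known Lipschitz constant; the surrogate, the selection rule, and the constant k are simplified assumptions rather than the paper's exact update.

```python
import numpy as np

def surrogate_binary_descent(grad_fn, b0, L, k=5, iters=100):
    """Toy surrogate-based binary descent: at each step minimize
    f(b_t) + <g, b - b_t> + (L/2)||b - b_t||^2 over {-1, +1}^n by flipping
    the (at most k) coordinates with the largest positive improvement."""
    b = b0.copy()
    for _ in range(iters):
        g = grad_fn(b)
        gain = 2 * b * g - 2 * L          # surrogate decrease from flipping coordinate i
        idx = np.argsort(gain)[-k:]       # best k candidates
        flip = idx[gain[idx] > 0]
        if flip.size == 0:                # no flip improves the surrogate: stop
            break
        b[flip] = -b[flip]
    return b

# toy objective: min_b ||b - t||^2 over b in {-1, +1}^n, whose optimum is sign(t)
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
grad_fn = lambda b: 2 * (b - t)           # gradient of the quadratic; Lipschitz constant 2
b = surrogate_binary_descent(grad_fn, np.ones(200), L=2.0)
print(np.mean(b == np.sign(t)))           # recovers sign(t): prints 1.0
```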

Space-time error estimates for deep neural network approximations for differential equations

Title Space-time error estimates for deep neural network approximations for differential equations
Authors Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philipp Zimmermann
Abstract Over the last few years deep artificial neural networks (DNNs) have very successfully been used in numerical simulations for a wide variety of computational problems including computer vision, image classification, speech recognition, natural language processing, as well as computational advertisement. In addition, it has recently been proposed to approximate solutions of partial differential equations (PDEs) by means of stochastic learning problems involving DNNs. There are now also a few rigorous mathematical results in the scientific literature which provide error estimates for such deep learning based approximation methods for PDEs. All of these articles provide spatial error estimates for neural network approximations for PDEs but do not provide error estimates for the entire space-time error for the considered neural network approximations. It is the subject of the main result of this article to provide space-time error estimates for DNN approximations of Euler approximations of certain perturbed differential equations. Our proof of this result is based (i) on a certain artificial neural network (ANN) calculus and (ii) on ANN approximation results for products of the form $[0,T]\times \mathbb{R}^d\ni (t,x)\mapsto tx\in \mathbb{R}^d$ where $T\in (0,\infty)$, $d\in \mathbb{N}$, which we both develop within this article.
Tasks Image Classification, Speech Recognition
Published 2019-08-11
URL https://arxiv.org/abs/1908.03833v1
PDF https://arxiv.org/pdf/1908.03833v1.pdf
PWC https://paperswithcode.com/paper/space-time-error-estimates-for-deep-neural
Repo
Framework

Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect

Title Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect
Authors Xinyang Jiang, Yifei Gong, Xiaowei Guo, Qize Yang, Feiyue Huang, Weishi Zheng, Feng Zheng, Xing Sun
Abstract Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video. However, existing video-based ReID methods do not consider the semantic difference brought by the outputs of different network stages, which potentially compromises the information richness of the person features. Furthermore, traditional methods ignore important relationships among frames, which causes information redundancy in fusion along the time axis. To address these issues, we propose a novel general temporal fusion framework to aggregate frame features along both the semantic aspect and the time aspect. For the semantic aspect, a multi-stage fusion network is explored to fuse richer frame features at multiple semantic levels, which can effectively reduce the information loss caused by traditional single-stage fusion. For the time aspect, the existing intra-frame attention method is improved by adding a novel inter-frame attention module, which effectively reduces the information redundancy in temporal fusion by taking the relationships among frames into consideration. The experimental results show that our approach can effectively improve video-based re-identification accuracy, achieving state-of-the-art performance.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2019-11-28
URL https://arxiv.org/abs/1911.12512v1
PDF https://arxiv.org/pdf/1911.12512v1.pdf
PWC https://paperswithcode.com/paper/rethinking-temporal-fusion-for-video-based
Repo
Framework
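
The inter-frame attention module above is described only at a high level: it uses relationships among frames to reduce redundancy in temporal fusion. The sketch below is one plausible, hypothetical reading of that idea (down-weighting frames that are highly similar to the rest before averaging); the actual module in the paper is likely more involved.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def inter_frame_fusion(frame_features, tau=1.0):
    """Aggregate frame features with weights derived from inter-frame relations:
    frames that are similar to many other frames (redundant) get lower weight.

    frame_features: (T, d) array of per-frame descriptors.
    """
    f = frame_features / np.linalg.norm(frame_features, axis=1, keepdims=True)
    sim = f @ f.T                                   # (T, T) cosine similarities
    redundancy = (sim.sum(1) - 1.0) / (len(f) - 1)  # mean similarity to the other frames
    weights = softmax(-redundancy / tau)            # down-weight redundant frames
    return weights @ frame_features                 # (d,) fused person descriptor

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 256))              # 8 frames of 256-D features
print(inter_frame_fusion(frames).shape)             # (256,)
```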

Automatic segmentation of texts into units of meaning for reading assistance

Title Automatic segmentation of texts into units of meaning for reading assistance
Authors Jean-Claude Houbart, Solen Quiniou, Marion Berthaut, Béatrice Daille, Claire Salomé
Abstract The emergence of the digital book is a major step forward in providing access to reading, and therefore often to the common culture and the labour market. By allowing texts to be enriched with cognitive crutches, EPUB 3-compatible accessibility formats such as FROG have proven their effectiveness in alleviating and even reducing dyslexic disorders. In this paper, we show how Artificial Intelligence, and particularly Transfer Learning with Google BERT, can automate the division of texts into units of meaning, and thus facilitate the creation of enriched digital books at a moderate cost.
Tasks Transfer Learning
Published 2019-10-11
URL https://arxiv.org/abs/1910.05014v1
PDF https://arxiv.org/pdf/1910.05014v1.pdf
PWC https://paperswithcode.com/paper/automatic-segmentation-of-texts-into-units-of
Repo
Framework
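
The abstract says that Transfer Learning with BERT can automate segmentation into units of meaning, without giving the formulation. One natural, but assumed rather than confirmed, framing is token classification with B/I tags marking the start of each unit; the Hugging Face sketch below shows that framing. The model name, label set, and grouping logic are illustrative, and the classification head is untrained here, so the predictions are meaningless until the model is fine-tuned on annotated books.

```python
# pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical setup: tag each token as Beginning of a unit of meaning or Inside one.
labels = ["B-UNIT", "I-UNIT"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels)
)

sentence = "The emergence of the digital book is a major step forward in providing access to reading."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # (1, seq_len, 2), random before fine-tuning
pred = logits.argmax(-1)[0].tolist()

# group tokens into units wherever a B-UNIT tag is predicted
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
units, current = [], []
for tok, p in zip(tokens, pred):
    if tok in tokenizer.all_special_tokens:
        continue
    if labels[p] == "B-UNIT" and current:
        units.append(current)
        current = []
    current.append(tok)
if current:
    units.append(current)
print(units)
```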

Adaptive Graph Representation Learning for Video Person Re-identification

Title Adaptive Graph Representation Learning for Video Person Re-identification
Authors Yiming Wu, Omar El Farouk Bourahla, Xi Li, Fei Wu, Qi Tian
Abstract Recent years have witnessed a great development of deep learning based video person re-identification (Re-ID). A key factor for video person Re-ID is how to effectively construct discriminative video feature representations that are robust to complicated situations such as occlusion. Recent part-based approaches employ spatial and temporal attention to extract representative local features. However, the correlations between the parts are ignored in previous methods. To leverage the relations between different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables contextual interactions between relevant regional features. Specifically, we exploit a pose alignment connection and a feature affinity connection to construct an adaptive structure-aware adjacency graph, which models the intrinsic relations between graph nodes. We perform feature propagation on the adjacency graph to iteratively refine the original regional features, so that information from neighboring nodes is taken into account in each part's feature representation. To learn compact and discriminative representations, we further propose a novel temporal resolution-aware regularization, which enforces consistency among different temporal resolutions for the same identities. We conduct extensive evaluations on four benchmarks, i.e., iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID; the experimental results show competitive performance, which demonstrates the effectiveness of our proposed method.
Tasks Graph Representation Learning, Person Re-Identification, Representation Learning, Video-Based Person Re-Identification
Published 2019-09-05
URL https://arxiv.org/abs/1909.02240v1
PDF https://arxiv.org/pdf/1909.02240v1.pdf
PWC https://paperswithcode.com/paper/adaptive-graph-representation-learning-for
Repo
Framework
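
The entry above builds an adjacency graph from a pose-alignment connection and a feature-affinity connection, then propagates regional features over it. The NumPy sketch below shows that propagation pattern with a hand-built pose graph and a softmax affinity graph; the blending weight, the number of propagation steps, and the graph construction are assumptions for illustration, not the paper's learned components.

```python
import numpy as np

def affinity_adjacency(features, temperature=1.0):
    """Feature-affinity edges: row-wise softmax over pairwise cosine similarities."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def propagate(features, pose_adj, alpha=0.5, steps=2):
    """Refine regional features by propagation on a blended adjacency:
    a fixed pose-alignment graph mixed with the feature-affinity graph."""
    X = features
    for _ in range(steps):
        A = alpha * pose_adj + (1 - alpha) * affinity_adjacency(X)
        X = A @ X                        # each node absorbs its neighbours' features
    return X

rng = np.random.default_rng(0)
regions = rng.standard_normal((6, 64))   # 6 body regions x 64-D features per frame
# toy pose-alignment adjacency: each region connected to itself and its neighbours
P = np.eye(6) + np.eye(6, k=1) + np.eye(6, k=-1)
P = P / P.sum(axis=1, keepdims=True)
print(propagate(regions, P).shape)       # (6, 64)
```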