Paper Group ANR 311
Superpixel Segmentation via Convolutional Neural Networks with Regularized Information Maximization. Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective. Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding. Path Planning Using Probability Tensor Flows. Dual Temporal Memory Network for Efficient Video Object Segmentation. Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning. Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs. Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation. Domain-Liftability of Relational Marginal Polytopes. User Profiling Using Hinge-loss Markov Random Fields. Mel-spectrogram augmentation for sequence to sequence voice conversion. Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space. Numerical Solution of Inverse Problems by Weak Adversarial Networks. Attribute-guided Feature Learning Network for Vehicle Re-identification. Automatic Frame Selection using CNN in Ultrasound Elastography.
Superpixel Segmentation via Convolutional Neural Networks with Regularized Information Maximization
Title | Superpixel Segmentation via Convolutional Neural Networks with Regularized Information Maximization |
Authors | Teppei Suzuki |
Abstract | We propose an unsupervised superpixel segmentation method that optimizes a randomly initialized convolutional neural network (CNN) at inference time. Our method generates superpixels via a CNN from a single image, without any labels, by minimizing a proposed objective function for superpixel segmentation at inference time. Compared with many existing methods, our approach has three advantages: (i) it leverages the image prior of a CNN for superpixel segmentation, (ii) it adaptively changes the number of superpixels according to the given image, and (iii) it controls the properties of the superpixels by adding an auxiliary cost to the objective function. We verify these advantages quantitatively and qualitatively on the BSDS500 and SBD datasets. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06765v1 |
PDF | https://arxiv.org/pdf/2002.06765v1.pdf |
PWC | https://paperswithcode.com/paper/superpixel-segmentation-via-convolutional |
Repo | |
Framework | |
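As an illustration of the regularized-information-maximization idea described in the abstract, the sketch below defines a loss over soft superpixel assignments produced by a CNN: per-pixel entropy is minimized, the marginal entropy over superpixel channels is maximized, and an edge-aware smoothness term plays the role of the auxiliary cost. The specific terms, weights, and horizontal-only smoothness are simplifying assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def rim_superpixel_loss(logits, image, smooth_weight=1.0):
    """RIM-style objective over soft superpixel assignments.

    logits: (1, K, H, W) output of a randomly initialized CNN for one image.
    image:  (1, 3, H, W) input image scaled to [0, 1].
    """
    p = F.softmax(logits, dim=1)
    # (i) per-pixel entropy: push each pixel toward a single superpixel
    pixel_entropy = -(p * torch.log(p + 1e-8)).sum(1).mean()
    # (ii) marginal entropy: encourage many superpixel channels to be used
    marginal = p.mean(dim=(2, 3))
    marginal_entropy = -(marginal * torch.log(marginal + 1e-8)).sum()
    # (iii) edge-aware smoothness (horizontal differences only, for brevity)
    dp = (p[..., 1:] - p[..., :-1]).abs().mean(1)
    di = (image[..., 1:] - image[..., :-1]).abs().mean(1)
    smooth = (dp * torch.exp(-di)).mean()
    return pixel_entropy - marginal_entropy + smooth_weight * smooth

# Usage: minimize this loss with Adam over the CNN's parameters for a few
# hundred steps on a single image, then take argmax over K as superpixel labels.
```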
Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective
Title | Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective |
Authors | Shun-ichi Amari |
Abstract | It is known that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large. There are sophisticated theories and discussions concerning this striking fact, but rigorous theories are very complicated. We give an elementary geometrical proof by using a simple model for the purpose of elucidating its structure. We show that high-dimensional geometry plays a magical role: When we project a high-dimensional sphere of radius 1 to a low-dimensional subspace, the uniform distribution over the sphere reduces to a Gaussian distribution of negligibly small covariances. |
Tasks | |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.06931v2 |
PDF | https://arxiv.org/pdf/2001.06931v2.pdf |
PWC | https://paperswithcode.com/paper/any-target-function-exists-in-a-neighborhood |
Repo | |
Framework | |
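The geometric claim in the abstract is easy to check numerically. The snippet below (dimensions chosen arbitrarily) samples points uniformly from a high-dimensional unit sphere, projects them onto a 2-D coordinate subspace, and verifies that the projected coordinates look Gaussian with covariance close to (1/n) times the identity.

```python
import numpy as np

n, num_samples = 10_000, 50_000
rng = np.random.default_rng(0)

# Uniform samples on the unit sphere S^{n-1}: normalize standard Gaussians.
x = rng.standard_normal((num_samples, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)

proj = x[:, :2]             # orthogonal projection onto a 2-D subspace
print(np.cov(proj.T) * n)   # approximately the 2x2 identity matrix
```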
Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding
Title | Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding |
Authors | Zhecheng Wang, Haoyuan Li, Ram Rajagopal |
Abstract | Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities require a comprehensive representation of urban neighborhoods. Existing works rely on either inter- or intra-region connectivities to generate neighborhood representations but fail to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework that incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag-of-words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish semantic similarity between a neighborhood (the “document”) and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec achieves performance better than baseline models and comparable to fully supervised methods on downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model’s interpretability, generalization capability, and value in neighborhood similarity analysis. |
Tasks | Document Embedding, Semantic Similarity, Semantic Textual Similarity |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.11101v1 |
PDF | https://arxiv.org/pdf/2001.11101v1.pdf |
PWC | https://paperswithcode.com/paper/urban2vec-incorporating-street-view-imagery |
Repo | |
Framework | |
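A minimal sketch of the POI branch described in the abstract: each neighborhood acts as a “document” whose embedding is pulled toward the embeddings of words from its surrounding POIs and pushed away from randomly sampled words. The sizes, the triplet-style loss, and the margin are placeholder assumptions; the full model additionally trains a street-view CNN branch with geospatial sampling.

```python
import torch
import torch.nn.functional as F

num_hoods, vocab_size, dim = 1000, 5000, 64      # placeholder sizes
hood_emb = torch.nn.Embedding(num_hoods, dim)    # neighborhood ("document") vectors
word_emb = torch.nn.Embedding(vocab_size, dim)   # POI category/rating/review words

def poi_context_loss(hood_ids, pos_word_ids, neg_word_ids, margin=0.2):
    """Pull neighborhoods toward words from their surrounding POIs,
    push them away from negative (randomly sampled) words."""
    h = F.normalize(hood_emb(hood_ids), dim=-1)
    w_pos = F.normalize(word_emb(pos_word_ids), dim=-1)
    w_neg = F.normalize(word_emb(neg_word_ids), dim=-1)
    return F.relu(margin - (h * w_pos).sum(-1) + (h * w_neg).sum(-1)).mean()
```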
Path Planning Using Probability Tensor Flows
Title | Path Planning Using Probability Tensor Flows |
Authors | Francesco A. N. Palmieri, Krishna R. Pattipati, Giovanni Fioretti, Giovanni Di Gennaro, Amedeo Buonanno |
Abstract | Probability models have been proposed in the literature to account for “intelligent” behavior in many contexts. In this paper, probability propagation is applied to model an agent’s motion in potentially complex scenarios that include goals and obstacles. The backward flow provides valuable background information for the agent’s behavior: inferences coming from the future determine the agent’s actions. Probability tensors are layered in time in both directions, in a manner similar to convolutional neural networks. The discussion is carried out with reference to a set of simulated grids where, despite the apparent task complexity, a solution, if feasible, is always found. The original model proposed by Attias has been extended to include non-absorbing obstacles, multiple goals, and multiple agents. The emerging behaviors are very realistic and demonstrate the great potential of applying this framework to real environments. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02774v1 |
PDF | https://arxiv.org/pdf/2003.02774v1.pdf |
PWC | https://paperswithcode.com/paper/path-planning-using-probability-tensor-flows |
Repo | |
Framework | |
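The backward flow mentioned in the abstract can be illustrated on a toy grid. In the sketch below (layout, horizon, and obstacle penalty are arbitrary choices, not the paper's tensor formulation), messages propagate backward from the goal, obstacles are non-absorbing but down-weight any message passing through them, and a greedy agent simply climbs the resulting surface.

```python
import numpy as np

# Toy grid: 0 = free, 1 = obstacle (illustrative layout).
H, W = 8, 8
grid = np.zeros((H, W)); grid[3, 2:6] = 1.0
goal = (7, 7)

def neighbors(r, c):
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        if 0 <= r + dr < H and 0 <= c + dc < W:
            yield r + dr, c + dc

def backward_flow(steps=40, obstacle_penalty=0.1):
    """Backward messages from the goal: on an obstacle-free grid this is the
    probability of reaching the absorbing goal within `steps` moves under a
    uniform random walk; obstacle cells are non-absorbing but heavily
    down-weight the messages they relay."""
    b = np.zeros((H, W)); b[goal] = 1.0
    for _ in range(steps):
        new = np.zeros_like(b)
        for r in range(H):
            for c in range(W):
                msg = np.mean([b[n] for n in neighbors(r, c)])
                new[r, c] = msg * (obstacle_penalty if grid[r, c] else 1.0)
        new[goal] = 1.0
        b = new
    return b

b = backward_flow()
# Greedy agent: from any free cell, step to the neighbor with the largest backward value.
```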
Dual Temporal Memory Network for Efficient Video Object Segmentation
Title | Dual Temporal Memory Network for Efficient Video Object Segmentation |
Authors | Kaihua Zhang, Long Wang, Dong Liu, Bo Liu, Qingshan Liu, Zhu Li |
Abstract | Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask for the first frame, the task of VOS is to track and segment the single or multiple objects of interest in the remaining frames of the video at the pixel level. One of the fundamental challenges in VOS is how to make the most of the temporal information to boost performance. We present an end-to-end network that stores short- and long-term video sequence information preceding the current frame as temporal memories to address temporal modeling in VOS. Our network consists of two temporal sub-networks: a short-term memory sub-network and a long-term memory sub-network. The short-term memory sub-network models the fine-grained spatial-temporal interactions between local regions across neighboring frames via a graph-based learning framework, which can well preserve the visual consistency of local regions over time. The long-term memory sub-network models the long-range evolution of an object via a Simplified-Gated Recurrent Unit (S-GRU), making the segmentation robust against occlusions and drift errors. In our experiments, we show that our proposed method achieves favorable and competitive performance, in terms of both speed and accuracy, on three frequently used VOS datasets: DAVIS 2016, DAVIS 2017, and Youtube-VOS. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06125v1 |
PDF | https://arxiv.org/pdf/2003.06125v1.pdf |
PWC | https://paperswithcode.com/paper/dual-temporal-memory-network-for-efficient |
Repo | |
Framework | |
Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning
Title | Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning |
Authors | Han-Jia Ye, Hong-You Chen, De-Chuan Zhan, Wei-Lun Chao |
Abstract | We investigate learning a ConvNet classifier with class-imbalanced data. We found that a ConvNet significantly over-fits the minor classes that do not have sufficient training instances, even when it is trained using vanilla empirical risk minimization (ERM). We conduct a series of analyses and argue that feature deviation between the training and test instances is the main cause. We propose to incorporate class-dependent temperatures (CDT) in learning a ConvNet: CDT forces the minor-class instances to have larger decision values in training, so as to compensate for the effect of feature deviation at test time. We validate our approach on several benchmark datasets and achieve promising results. Our studies further suggest that class-imbalanced data affects traditional machine learning and recent deep learning in very different ways. We hope that our insights can inspire new ways of thinking about resolving class imbalance in deep learning. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01385v2 |
PDF | https://arxiv.org/pdf/2001.01385v2.pdf |
PWC | https://paperswithcode.com/paper/identifying-and-compensating-for-feature |
Repo | |
Framework | |
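The class-dependent temperature idea from the abstract amounts to a small change to the training loss: minor-class logits are divided by larger temperatures during training, so the network must produce larger decision values for them, while plain logits are used at test time. The (N_max / N_c)^gamma schedule in the sketch below is an assumption about the exact form and may differ from the paper's.

```python
import torch
import torch.nn.functional as F

def cdt_loss(logits, targets, class_counts, gamma=0.3):
    """Cross-entropy with class-dependent temperatures.

    logits:        (B, C) raw decision values.
    class_counts:  number of training instances per class, length C.
    Rarer classes get larger temperatures, forcing larger decision values
    for them during training; inference uses the unscaled logits.
    """
    counts = torch.as_tensor(class_counts, dtype=logits.dtype, device=logits.device)
    temps = (counts.max() / counts) ** gamma   # >= 1, larger for rarer classes
    return F.cross_entropy(logits / temps, targets)
```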
Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs
Title | Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs |
Authors | Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand |
Abstract | Significant progress has been made recently in developing few-shot object segmentation methods. Learning has been shown to be successful in few-shot segmentation settings that use pixel-level, scribble, and bounding-box supervision. This paper takes another approach, requiring only image-level classification data for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot object segmentation that uses a co-attention mechanism over both visual and word embeddings. Using image-level labels, our model achieves a 4.8% improvement over previously proposed image-level few-shot object segmentation methods. It also outperforms state-of-the-art methods that use weak bounding-box supervision on PASCAL-$5^i$. Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual-semantic processing with weak image-level labels. We further propose a novel setup, Temporal Object Segmentation for Few-shot Learning (TOSFL), for videos. TOSFL requires only image-level labels for the first frame in order to segment objects in the following frames. TOSFL provides a novel benchmark for video segmentation, which can be used on a variety of public video data such as Youtube-VOS, as demonstrated in our experiments. |
Tasks | Few-Shot Learning, Semantic Segmentation, Video Semantic Segmentation, Word Embeddings |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09540v2 |
PDF | https://arxiv.org/pdf/2001.09540v2.pdf |
PWC | https://paperswithcode.com/paper/weakly-supervised-few-shot-object |
Repo | |
Framework | |
Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation
Title | Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation |
Authors | Sangdon Park, Osbert Bastani, James Weimer, Insup Lee |
Abstract | Reliable uncertainty estimates are an important tool for helping autonomous agents or human decision makers understand and leverage predictive models. However, existing approaches to estimating uncertainty largely ignore the possibility of covariate shift, i.e., that the real-world data distribution may differ from the training distribution. As a consequence, existing algorithms can overestimate certainty, possibly yielding a false sense of confidence in the predictive model. We propose an algorithm for calibrating predictions that accounts for the possibility of covariate shift, given labeled examples from the training distribution and unlabeled examples from the real-world distribution. Our algorithm uses importance weighting to correct for the shift from the training to the real-world distribution. However, importance weighting relies on the training and real-world distributions being sufficiently close. Building on ideas from domain adaptation, we additionally learn a feature map that tries to equalize these two distributions. In an empirical evaluation, we show that our proposed approach outperforms existing approaches to calibrated prediction when there is covariate shift. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00343v1 |
PDF | https://arxiv.org/pdf/2003.00343v1.pdf |
PWC | https://paperswithcode.com/paper/calibrated-prediction-with-covariate-shift |
Repo | |
Framework | |
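A simplified sketch of calibration under covariate shift along the lines of the abstract: a domain classifier estimates importance weights between training and real-world features, and a softmax temperature is then fit by minimizing the importance-weighted negative log-likelihood on labeled training data. The paper additionally learns a feature map to bring the two distributions closer; the function names and the use of temperature scaling here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

def importance_weights(feat_src, feat_tgt):
    """Estimate w(x) = p_target(x) / p_source(x) with a domain classifier."""
    X = np.vstack([feat_src, feat_tgt])
    d = np.concatenate([np.zeros(len(feat_src)), np.ones(len(feat_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(feat_src)[:, 1]
    return p / (1 - p + 1e-8)

def weighted_temperature(logits_src, labels_src, weights):
    """Fit a softmax temperature by minimizing importance-weighted NLL on
    labeled source data, so calibration reflects the target distribution."""
    def nll(t):
        logp = log_softmax(logits_src / t, axis=1)
        return -(weights * logp[np.arange(len(labels_src)), labels_src]).mean()
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
```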
Domain-Liftability of Relational Marginal Polytopes
Title | Domain-Liftability of Relational Marginal Polytopes |
Authors | Ondrej Kuzelka, Yuyi Wang |
Abstract | We study computational aspects of relational marginal polytopes which are statistical relational learning counterparts of marginal polytopes, well-known from probabilistic graphical models. Here, given some first-order logic formula, we can define its relational marginal statistic to be the fraction of groundings that make this formula true in a given possible world. For a list of first-order logic formulas, the relational marginal polytope is the set of all points that correspond to the expected values of the relational marginal statistics that are realizable. In this paper, we study the following two problems: (i) Do domain-liftability results for the partition functions of Markov logic networks (MLNs) carry over to the problem of relational marginal polytope construction? (ii) Is the relational marginal polytope containment problem hard under some plausible complexity-theoretic assumptions? Our positive results have consequences for lifted weight learning of MLNs. In particular, we show that weight learning of MLNs is domain-liftable whenever the computation of the partition function of the respective MLNs is domain-liftable (this result has not been rigorously proven before). |
Tasks | Relational Reasoning |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05198v1 |
PDF | https://arxiv.org/pdf/2001.05198v1.pdf |
PWC | https://paperswithcode.com/paper/domain-liftability-of-relational-marginal |
Repo | |
Framework | |
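The relational marginal statistic defined in the abstract is easy to compute for a toy possible world: it is simply the fraction of groundings of a first-order formula that hold in that world. The formula and world below are illustrative, not taken from the paper.

```python
from itertools import product

# A toy possible world over a domain of 4 constants.
domain = range(4)
smokes = {0, 1}
friends = {(0, 1), (1, 2), (2, 3)}

def statistic():
    """Relational marginal statistic of friends(x, y) & smokes(x) -> smokes(y):
    the fraction of groundings (x, y) that make the formula true in this world."""
    sat = sum(1 for x, y in product(domain, domain)
              if not ((x, y) in friends and x in smokes) or y in smokes)
    return sat / (len(domain) ** 2)

print(statistic())   # 15/16: only the grounding (1, 2) violates the rule
```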
User Profiling Using Hinge-loss Markov Random Fields
Title | User Profiling Using Hinge-loss Markov Random Fields |
Authors | Golnoosh Farnadi, Lise Getoor, Marie-Francine Moens, Martine De Cock |
Abstract | A variety of approaches have been proposed to automatically infer the profiles of users from their digital footprint in social media. Most of the proposed approaches focus on mining a single type of information, while ignoring other sources of available user-generated content (UGC). In this paper, we propose a mechanism to infer a variety of user characteristics, such as age, gender, and personality traits, which can then be compiled into a user profile. To this end, we model social media users by incorporating and reasoning over multiple sources of UGC as well as social relations. Our model is based on a statistical relational learning framework using Hinge-loss Markov Random Fields (HL-MRFs), a class of probabilistic graphical models that can be defined using a set of first-order logical rules. We validate our approach on data from Facebook with more than 5k users and almost 725k relations. We show how HL-MRFs can be used to develop a generic and extensible user-profiling framework by leveraging textual, visual, and relational content in the form of status updates, profile pictures, and Facebook page likes. Our experimental results demonstrate that our proposed model successfully incorporates multiple sources of information and outperforms competing methods that use only one source of information, or an ensemble across the different sources, for modeling users in social media. |
Tasks | Relational Reasoning |
Published | 2020-01-05 |
URL | https://arxiv.org/abs/2001.01177v1 |
PDF | https://arxiv.org/pdf/2001.01177v1.pdf |
PWC | https://paperswithcode.com/paper/user-profiling-using-hinge-loss-markov-random |
Repo | |
Framework | |
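For context, the hinge-loss potentials underlying HL-MRFs are simple to write down: under the Lukasiewicz relaxation, a weighted rule body → head over [0, 1]-valued atoms contributes a potential proportional to its distance to satisfaction. The rule in the example is hypothetical, not one of the paper's profiling rules.

```python
def hinge_potential(body, head, weight=1.0, squared=True):
    """Hinge-loss potential for a soft logical rule body -> head, with truth
    values in [0, 1]: the rule's distance to satisfaction is max(0, body - head)."""
    dist = max(0.0, body - head)
    return weight * (dist ** 2 if squared else dist)

# e.g., a hypothetical rule "posts_about_sports(u) -> age_under_30(u)":
print(hinge_potential(body=0.9, head=0.4, weight=2.0))
```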
Mel-spectrogram augmentation for sequence to sequence voice conversion
Title | Mel-spectrogram augmentation for sequence to sequence voice conversion |
Authors | Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Insoo Oh, Seong-Whan Lee |
Abstract | When training a sequence-to-sequence voice conversion model, we need to handle the issue of insufficient data, i.e., the limited number of speech tuples containing the same utterance. This study experimentally investigates the effects of Mel-spectrogram augmentation on the sequence-to-sequence voice conversion model. For Mel-spectrogram augmentation, we adopt the policies proposed in SpecAugment. In addition, we propose new policies for more data variation. To find the optimal hyperparameters of the augmentation policies for voice conversion, we experiment based on a new metric, namely the deformation-per-deteriorating ratio. We observe the effect of these policies through experiments over various training-set sizes and combinations of augmentation policies. In the experimental results, the time-axis warping policies show better performance than the other policies. |
Tasks | Voice Conversion |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01401v1 |
PDF | https://arxiv.org/pdf/2001.01401v1.pdf |
PWC | https://paperswithcode.com/paper/mel-spectrogram-augmentation-for-sequence-to |
Repo | |
Framework | |
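A rough sketch of SpecAugment-style Mel-spectrogram policies mentioned in the abstract: frequency masking, time masking, and a crude global time stretch standing in for time warping (the paper's time-axis warping policies are more refined; all sizes here are placeholder values).

```python
import numpy as np

def augment_mel(mel, rng, max_freq_mask=8, max_time_mask=20, max_stretch=0.1):
    """Apply toy augmentation policies to a mel spectrogram of shape (n_mels, T)."""
    mel = mel.copy()
    n_mels, T = mel.shape
    # frequency mask: zero out a band of mel bins
    f = rng.integers(0, max_freq_mask + 1)
    f0 = rng.integers(0, n_mels - f + 1)
    mel[f0:f0 + f, :] = 0.0
    # time mask: zero out a span of frames
    t = rng.integers(0, max_time_mask + 1)
    t0 = rng.integers(0, max(T - t, 0) + 1)
    mel[:, t0:t0 + t] = 0.0
    # global time stretch via linear interpolation along the time axis
    new_T = int(T * (1 + rng.uniform(-max_stretch, max_stretch)))
    old_idx = np.linspace(0, T - 1, new_T)
    return np.stack([np.interp(old_idx, np.arange(T), row) for row in mel])

rng = np.random.default_rng(0)
aug = augment_mel(np.random.rand(80, 200), rng)
```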
Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space
Title | Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space |
Authors | Prateek Verma, Kenneth Salisbury |
Abstract | Audio perception is key to solving a variety of problems ranging from acoustic scene analysis, music meta-data extraction, and recommendation to synthesis and analysis. It can potentially also augment computers in doing tasks that humans do effortlessly in day-to-day activities. This paper builds upon key ideas to build a perception of touch sounds without access to any ground-truth data. We show how we can leverage ideas from classical signal processing to obtain large amounts of data for any sound of interest with high precision. These sounds are then used, along with images, to map the sounds to a clustered space of the latent representation of these images. This approach not only allows us to learn semantic representations of the possible sounds of interest, but also allows association of different modalities with the learned distinctions. The model trained to map sounds to this clustered representation gives reasonable performance, as opposed to expensive methods that collect large amounts of human-annotated data. Such approaches can be used to build a state-of-the-art perceptual model for any sound of interest described using a few signal processing features. Daisy-chaining high-precision sound event detectors built with signal processing, combined with neural architectures and high-dimensional clustering of unlabelled data, is a powerful idea that can be explored in a variety of ways in the future. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04076v1 |
PDF | https://arxiv.org/pdf/2002.04076v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-audio-perception-for |
Repo | |
Framework | |
Numerical Solution of Inverse Problems by Weak Adversarial Networks
Title | Numerical Solution of Inverse Problems by Weak Adversarial Networks |
Authors | Gang Bao, Xiaojing Ye, Yaohua Zang, Haomin Zhou |
Abstract | We consider a weak adversarial network approach to numerically solve a class of inverse problems, including electrical impedance tomography and dynamic electrical impedance tomography problems. We leverage the weak formulation of the PDE in the given inverse problem, and parameterize the solution and the test function as deep neural networks. The weak formulation and the boundary conditions induce a minimax problem over a saddle function of the network parameters. As the parameters are alternately updated, the network gradually approximates the solution of the inverse problem. We provide theoretical justification for the convergence of the proposed algorithm. Our method is completely mesh-free, without any spatial discretization, and is particularly suitable for problems with high dimensionality and low regularity of solutions. Numerical experiments on a variety of test inverse problems demonstrate the promising accuracy and efficiency of our approach. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11340v1 |
PDF | https://arxiv.org/pdf/2002.11340v1.pdf |
PWC | https://paperswithcode.com/paper/numerical-solution-of-inverse-problems-by |
Repo | |
Framework | |
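To make the weak formulation concrete, the sketch below writes a Monte-Carlo estimate of the weak residual for a simple elliptic problem -div(a grad u) = f, with the solution u and the test function phi both parameterized as small networks; alternating gradient steps minimize the squared residual over u and maximize it over phi. The toy coefficients, network sizes, and the omission of boundary terms and test-function normalization are simplifications relative to the paper.

```python
import torch

u_net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
phi_net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

a_fn = lambda x: torch.ones(len(x), 1)   # toy coefficient a(x) = 1
f_fn = lambda x: torch.ones(len(x), 1)   # toy source term f(x) = 1

def weak_residual(x):
    """Monte-Carlo estimate of int_Omega a * grad(u).grad(phi) - f * phi dx
    for -div(a grad u) = f.  x: (N, 2) interior samples from Omega."""
    x = x.clone().requires_grad_(True)
    u, phi = u_net(x), phi_net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    grad_phi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]
    integrand = a_fn(x) * (grad_u * grad_phi).sum(1, keepdim=True) - f_fn(x) * phi
    return integrand.mean()

# Alternating updates: one step minimizing weak_residual(x)**2 over u_net's
# parameters, then one step maximizing it over phi_net's parameters.
```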
Attribute-guided Feature Learning Network for Vehicle Re-identification
Title | Attribute-guided Feature Learning Network for Vehicle Re-identification |
Authors | Huibing Wang, Jinjia Peng, Dongyan Chen, Guangqi Jiang, Tongtong Zhao, Xianping Fu |
Abstract | Vehicle re-identification (reID) plays an important role in the automatic analysis of the growing volume of urban surveillance video and has become a hot topic in recent years. However, it poses a critical but challenging problem caused by the various viewpoints of vehicles, diverse illumination, and complicated environments. To date, most existing vehicle reID approaches focus on learning metrics or ensembles to derive better representations, and take only the identity labels of vehicles into consideration. However, vehicle attributes, which contain detailed descriptions, are beneficial for training a reID model. Hence, this paper proposes a novel Attribute-Guided Network (AGNet), which can learn a global representation together with abundant attribute features in an end-to-end manner. Specifically, an attribute-guided module is proposed in AGNet to generate an attribute mask, which in turn guides the selection of discriminative features for category classification. In addition, an attribute-based label smoothing (ALS) loss is presented to better train the reID model; it strengthens the discriminative ability of the reID model by regularizing AGNet according to the attributes. Comprehensive experimental results clearly demonstrate that our method achieves excellent performance on both the VehicleID and VeRi-776 datasets. |
Tasks | Vehicle Re-Identification |
Published | 2020-01-12 |
URL | https://arxiv.org/abs/2001.03872v1 |
PDF | https://arxiv.org/pdf/2001.03872v1.pdf |
PWC | https://paperswithcode.com/paper/attribute-guided-feature-learning-network-for |
Repo | |
Framework | |
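A toy sketch of attribute-guided gating in the spirit of the abstract: an attribute branch predicts attribute logits from the backbone feature map, and a mask derived from them re-weights the features before identity classification. The layer sizes and the 1x1-convolution mask head are assumptions, not AGNet's actual design.

```python
import torch

class AttributeGuidedGate(torch.nn.Module):
    """Attribute branch produces a spatial mask that re-weights backbone features."""
    def __init__(self, channels, num_attrs):
        super().__init__()
        self.attr_head = torch.nn.Conv2d(channels, num_attrs, 1)
        self.to_mask = torch.nn.Conv2d(num_attrs, 1, 1)

    def forward(self, feat):                  # feat: (B, C, H, W)
        attr_logits = self.attr_head(feat)    # supervised with attribute labels
        mask = torch.sigmoid(self.to_mask(attr_logits))
        return feat * mask, attr_logits       # gated features + attribute predictions
```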
Automatic Frame Selection using CNN in Ultrasound Elastography
Title | Automatic Frame Selection using CNN in Ultrasound Elastography |
Authors | Abdelrahman Zayed, Guy Cloutier, Hassan Rivaz |
Abstract | Ultrasound elastography is used to estimate the mechanical properties of the tissue by monitoring its response to an internal or external force. Different levels of deformation are obtained from different tissue types depending on their mechanical properties, where stiffer tissues deform less. Given two radio frequency (RF) frames collected before and after some deformation, we estimate displacement and strain images by comparing the RF frames. The quality of the strain image is dependent on the type of motion that occurs during deformation. In-plane axial motion results in high-quality strain images, whereas out-of-plane motion results in low-quality strain images. In this paper, we introduce a new method using a convolutional neural network (CNN) to determine the suitability of a pair of RF frames for elastography in only 5.4 ms. Our method could also be used to automatically choose the best pair of RF frames, yielding a high-quality strain image. The CNN was trained on 3,818 pairs of RF frames, while testing was done on 986 new unseen pairs, achieving an accuracy of more than 91%. The RF frames were collected from both phantom and in vivo data. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06734v1 |
PDF | https://arxiv.org/pdf/2002.06734v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-frame-selection-using-cnn-in |
Repo | |
Framework | |
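A minimal sketch of the frame-selection idea in the abstract: a small CNN takes a candidate pair of RF frames stacked as two input channels and outputs a suitability score for elastography. The architecture and input representation are placeholders; the paper's network and preprocessing differ.

```python
import torch

class FramePairClassifier(torch.nn.Module):
    """Score a pair of RF frames for elastography suitability (placeholder sizes)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(2, 16, 5, stride=2), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 5, stride=2), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, 1))

    def forward(self, frame_pair):            # frame_pair: (B, 2, H, W)
        return torch.sigmoid(self.net(frame_pair))   # suitability in (0, 1)
```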