Paper Group ANR 130
A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI
Title | A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI |
Authors | Jens Behley, Andres Milioto, Cyrill Stachniss |
Abstract | Panoptic segmentation is the recently introduced task that tackles semantic segmentation and instance segmentation jointly. In this paper, we present an extension of SemanticKITTI, which is a large-scale dataset providing dense point-wise semantic labels for all sequences of the KITTI Odometry Benchmark, for training and evaluation of laser-based panoptic segmentation. We provide the data and discuss the processing steps needed to enrich a given semantic annotation with temporally consistent instance information, i.e., instance information that supplements the semantic labels and identifies the same instance over sequences of LiDAR point clouds. Additionally, we present two strong baselines that combine state-of-the-art LiDAR-based semantic segmentation approaches with a state-of-the-art detector enriching the segmentation with instance information and that allow other researchers to compare their approaches against. We hope that our extension of SemanticKITTI with strong baselines enables the creation of novel algorithms for LiDAR-based panoptic segmentation as much as it has for the original semantic segmentation and semantic scene completion tasks. Data, code, and an online evaluation using a hidden test set will be published on http://semantic-kitti.org. |
Tasks | Instance Segmentation, Panoptic Segmentation, Semantic Segmentation |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02371v1 |
PDF | https://arxiv.org/pdf/2003.02371v1.pdf |
PWC | https://paperswithcode.com/paper/a-benchmark-for-lidar-based-panoptic |
Repo | |
Framework | |
Deep Learning for MIR Tutorial
Title | Deep Learning for MIR Tutorial |
Authors | Alexander Schindler, Thomas Lidy, Sebastian Böck |
Abstract | Deep Learning has become state of the art in visual computing and continuously emerges into the Music Information Retrieval (MIR) and audio retrieval domain. In order to bring attention to this topic we propose an introductory tutorial on deep learning for MIR. Besides a general introduction to neural networks, the proposed tutorial covers a wide range of MIR relevant deep learning approaches. Convolutional Neural Networks are currently a de-facto standard for deep learning based audio retrieval. Recurrent Neural Networks have proven to be effective in onset detection tasks such as beat or audio-event detection. Siamese Networks have been shown effective in learning audio representations and distance functions specific for music similarity retrieval. We will incorporate both academic and industrial points of view into the tutorial. Accompanying the tutorial, we will create a GitHub repository for the content presented at the tutorial as well as references to state of the art work and literature for further reading. This repository will remain public after the conference. |
Tasks | Information Retrieval, Music Information Retrieval |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05266v1 |
PDF | https://arxiv.org/pdf/2001.05266v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-mir-tutorial |
Repo | |
Framework | |
Cost-aware Bayesian Optimization
Title | Cost-aware Bayesian Optimization |
Authors | Eric Hans Lee, Valerio Perrone, Cedric Archambeau, Matthias Seeger |
Abstract | Bayesian optimization (BO) is a class of global optimization algorithms, suitable for minimizing an expensive objective function in as few function evaluations as possible. While BO budgets are typically given in iterations, this implicitly measures convergence in terms of iteration count and assumes each evaluation has identical cost. In practice, evaluation costs may vary in different regions of the search space. For example, the cost of neural network training increases quadratically with layer size, which is a typical hyperparameter. Cost-aware BO measures convergence with alternative cost metrics such as time, energy, or money, for which vanilla BO methods are unsuited. We introduce Cost Apportioned BO (CArBO), which attempts to minimize an objective function in as little cost as possible. CArBO combines a cost-effective initial design with a cost-cooled optimization phase which depreciates a learned cost model as iterations proceed. On a set of 20 black-box function optimization problems we show that, given the same cost budget, CArBO finds significantly better hyperparameter configurations than competing methods. |
Tasks | |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.10870v1 |
PDF | https://arxiv.org/pdf/2003.10870v1.pdf |
PWC | https://paperswithcode.com/paper/cost-aware-bayesian-optimization |
Repo | |
Framework | |
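The abstract above describes a cost-cooled phase that depreciates a learned cost model as iterations proceed. A minimal sketch of one plausible reading, not the authors' implementation: divide an acquisition value (here, expected improvement) by the predicted cost raised to an exponent `alpha` that decays from 1 to 0 as the budget is consumed. The function names, the linear decay schedule, and the two toy candidates are all assumptions made for illustration.

```python
def cost_cooled_acquisition(expected_improvement, predicted_cost,
                            budget_used, budget_total):
    """Expected improvement divided by cost**alpha (assumed form).

    alpha starts at 1 (cost matters fully) and decays linearly to 0
    (cost is ignored) as the budget is used up. The linear schedule is
    an assumption for this sketch.
    """
    alpha = max(0.0, 1.0 - budget_used / budget_total)
    return expected_improvement / (predicted_cost ** alpha)


# Hypothetical candidates: (expected improvement, predicted cost).
candidates = {
    "cheap": (0.10, 1.0),
    "expensive": (0.30, 9.0),
}


def best_candidate(budget_used, budget_total=100.0):
    """Pick the candidate with the highest cost-cooled acquisition value."""
    return max(
        candidates,
        key=lambda name: cost_cooled_acquisition(
            *candidates[name], budget_used, budget_total
        ),
    )


early_choice = best_candidate(budget_used=0.0)
late_choice = best_candidate(budget_used=100.0)
```

With these toy numbers, the cheap candidate wins while the budget is untouched, and the expensive candidate with the larger expected improvement wins once the cost term has fully cooled.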
A Bounded Measure for Estimating the Benefit of Visualization
Title | A Bounded Measure for Estimating the Benefit of Visualization |
Authors | Min Chen, Mateu Sbert, Alfie Abdul-Rahman, Deborah Silver |
Abstract | Information theory can be used to analyze the cost-benefit of visualization processes. However, the current measure of benefit contains an unbounded term that is neither easy to estimate nor intuitive to interpret. In this work, we propose to revise the existing cost-benefit measure by replacing the unbounded term with a bounded one. We examine a number of bounded measures that include the Jensen-Shannon divergence and a new divergence measure formulated as part of this work. We use visual analysis to support the multi-criteria comparison, enabling the selection of the most logical and intuitive option. We applied the revised cost-benefit measure to two case studies, demonstrating its uses in practical scenarios, while the collected real world data further informs the selection of a bounded measure. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05282v1 |
PDF | https://arxiv.org/pdf/2002.05282v1.pdf |
PWC | https://paperswithcode.com/paper/a-bounded-measure-for-estimating-the-benefit |
Repo | |
Framework | |
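The abstract for this entry names the Jensen-Shannon divergence as one candidate bounded measure. A minimal sketch of why it is bounded, not taken from the paper's code: with base-2 logarithms it stays in [0, 1] bits. The Kullback-Leibler divergence appears here only as a familiar example of an unbounded divergence; the abstract does not name it. Function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; infinite if q is 0 where p > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded by 1 for base-2 logs."""
    mixture = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Two distributions with disjoint support: the extreme case.
p = [1.0, 0.0]
q = [0.0, 1.0]
kl_value = kl_divergence(p, q)   # unbounded
js_value = js_divergence(p, q)   # reaches the upper bound
```

For distributions with disjoint support, the KL divergence is infinite while the Jensen-Shannon divergence reaches its maximum of 1 bit.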
Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships
Title | Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships |
Authors | Chundra Aroor Cathcart |
Abstract | This paper addresses a series of complex and unresolved issues in the historical phonology of West Iranian languages. The West Iranian languages (Persian, Kurdish, Balochi, and other languages) display a high degree of non-Lautgesetzlich behavior. Most of this irregularity is undoubtedly due to language contact; we argue, however, that an oversimplified view of the processes at work has prevailed in the literature on West Iranian dialectology, with specialists assuming that deviations from an expected outcome in a given non-Persian language are due to lexical borrowing from some chronological stage of Persian. It is demonstrated that this qualitative approach yields at times problematic conclusions stemming from the lack of explicit probabilistic inferences regarding the distribution of the data: Persian may not be the sole donor language; additionally, borrowing at the lexical level is not always the mechanism that introduces irregularity. In many cases, the possibility that West Iranian languages show different reflexes in different conditioning environments remains under-explored. We employ a novel Bayesian approach designed to overcome these problems and tease apart the different determinants of irregularity in patterns of West Iranian sound change. Our methodology allows us to provisionally resolve a number of outstanding questions in the literature on West Iranian dialectology concerning the dialectal affiliation of certain sound changes. We outline future directions for work of this sort. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.05297v1 |
PDF | https://arxiv.org/pdf/2001.05297v1.pdf |
PWC | https://paperswithcode.com/paper/dialectal-layers-in-west-iranian-a |
Repo | |
Framework | |
Learning Shape Representations for Clothing Variations in Person Re-Identification
Title | Learning Shape Representations for Clothing Variations in Person Re-Identification |
Authors | Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani |
Abstract | Person re-identification (re-ID) aims to recognize instances of the same person contained in multiple images taken across different cameras. Existing methods for re-ID tend to rely heavily on the assumption that both query and gallery images of the same person have the same clothing. Unfortunately, this assumption may not hold for datasets captured over long periods of time (e.g., weeks, months or years). To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns. We call our model the Color Agnostic Shape Extraction Network (CASE-Net). CASE-Net learns a representation of identity that depends only on body shape via adversarial learning and feature disentanglement. Due to the lack of large-scale re-ID datasets which contain clothing changes for the same person, we propose two synthetic datasets for evaluation. We create a rendered dataset SMPL-reID with different clothing patterns and a synthesized dataset Div-Market with different clothing colors to simulate two types of clothing changes. The quantitative and qualitative results across 5 datasets (SMPL-reID, Div-Market, two benchmark re-ID datasets, a cross-modality re-ID dataset) confirm the robustness and superiority of our approach against several state-of-the-art approaches. |
Tasks | Person Re-Identification, Representation Learning |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07340v1 |
PDF | https://arxiv.org/pdf/2003.07340v1.pdf |
PWC | https://paperswithcode.com/paper/learning-shape-representations-for-clothing |
Repo | |
Framework | |
Certification of Semantic Perturbations via Randomized Smoothing
Title | Certification of Semantic Perturbations via Randomized Smoothing |
Authors | Marc Fischer, Maximilian Baader, Martin Vechev |
Abstract | We introduce a novel certification method for parametrized perturbations by generalizing randomized smoothing. Using this method, we construct a provable classifier that can establish state-of-the-art robustness against semantic perturbations including geometric transformations (e.g., rotation, translation), for different types of interpolation, and, for the first time, volume changes on audio data. Our experimental results indicate that the method is practically effective: for ResNet-50 on ImageNet, it achieves rotational robustness provable up to $\pm 30^\circ$ for 28% of images. |
Tasks | |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12463v1 |
PDF | https://arxiv.org/pdf/2002.12463v1.pdf |
PWC | https://paperswithcode.com/paper/certification-of-semantic-perturbations-via |
Repo | |
Framework | |
Symmetry & critical points for a model shallow neural network
Title | Symmetry & critical points for a model shallow neural network |
Authors | Yossi Arjevani, Michael Field |
Abstract | A detailed analysis is given of a family of critical points determining spurious minima for a model student-teacher 2-layer neural network, with ReLU activation function, and a natural $\Gamma = S_k \times S_k$-symmetry. For a $k$-neuron shallow network of this type, analytic equations are given which, for example, determine the critical points of the spurious minima described by Safran and Shamir (2018) for $6 \le k \le 20$. These critical points have isotropy (conjugate to) the diagonal subgroup $\Delta S_{k-1}\subset \Delta S_k$ of $\Gamma$. It is shown that critical points of this family can be expressed as an infinite series in $1/\sqrt{k}$ (for large enough $k$) and, as an application, the critical values decay like $a k^{-1}$, where $a \approx 0.3$. Other non-trivial families of critical points are also described with isotropy conjugate to $\Delta S_{k-1}, \Delta S_k$ and $\Delta (S_2\times S_{k-2})$ (the latter giving spurious minima for $k\ge 9$). The methods used depend on symmetry breaking, bifurcation, and algebraic geometry, notably Artin’s implicit function theorem, and are applicable to other families of critical points that occur in this network. |
Tasks | |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10576v1 |
PDF | https://arxiv.org/pdf/2003.10576v1.pdf |
PWC | https://paperswithcode.com/paper/symmetry-critical-points-for-a-model-shallow |
Repo | |
Framework | |
Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning
Title | Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning |
Authors | Chongwen Huang, Ronghong Mo, Chau Yuen |
Abstract | Recently, the reconfigurable intelligent surface (RIS), benefiting from the breakthrough in the fabrication of programmable meta-material, has been speculated as one of the key enabling technologies for the future sixth generation (6G) wireless communication systems scaled up beyond massive multiple input multiple output (Massive-MIMO) technology to achieve smart radio environments. Employed as reflecting arrays, RIS is able to assist MIMO transmissions without the need of radio frequency chains resulting in considerable reduction in power consumption. In this paper, we investigate the joint design of transmit beamforming matrix at the base station and the phase shift matrix at the RIS, by leveraging recent advances in deep reinforcement learning (DRL). We first develop a DRL based algorithm, in which the joint design is obtained through trial-and-error interactions with the environment by observing predefined rewards, in the context of continuous state and action. Unlike most reported works, which use alternating optimization techniques to alternately obtain the transmit beamforming and phase shifts, the proposed DRL based algorithm obtains the joint design simultaneously as the output of the DRL neural network. Simulation results show that the proposed algorithm is not only able to learn from the environment and gradually improve its behavior, but also achieves performance comparable to two state-of-the-art benchmarks. It is also observed that appropriate neural network parameter settings significantly improve the performance and convergence rate of the proposed algorithm. |
Tasks | |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10072v1 |
PDF | https://arxiv.org/pdf/2002.10072v1.pdf |
PWC | https://paperswithcode.com/paper/reconfigurable-intelligent-surface-assisted |
Repo | |
Framework | |
Annotation of Emotion Carriers in Personal Narratives
Title | Annotation of Emotion Carriers in Personal Narratives |
Authors | Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi |
Abstract | We are interested in the problem of understanding personal narratives (PN) - spoken or written - recollections of facts, events, and thoughts. In PN, emotion carriers are the speech or text segments that best explain the emotional state of the user. Such segments may include entities, verb or noun phrases. Advanced automatic understanding of PNs requires not only the prediction of the user's emotional state but also the identification of which events (e.g. “the loss of relative” or “the visit of grandpa”) or people (e.g. “the old group of high school mates”) carry the emotion manifested during the personal recollection. This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives. Compared to other text genres such as news and microblogs, spoken PNs are particularly challenging because a narrative is usually unstructured, involving multiple sub-events and characters as well as thoughts and associated emotions perceived by the narrator. In this work, we experiment with annotating emotion carriers from speech transcriptions in the Ulm State-of-Mind in Speech (USoMS) corpus, a dataset of German PNs. We believe this resource could be used for experiments in the automatic extraction of emotion carriers from PN, a task that could provide further advancements in narrative understanding. |
Tasks | |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12196v2 |
PDF | https://arxiv.org/pdf/2002.12196v2.pdf |
PWC | https://paperswithcode.com/paper/annotation-of-emotion-carriers-in-personal |
Repo | |
Framework | |
A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems
Title | A Set-Theoretic Study of the Relationships of Image Models and Priors for Restoration Problems |
Authors | Bihan Wen, Yanjun Li, Yuqi Li, Yoram Bresler |
Abstract | Image prior modeling is the key issue in image recovery, computational imaging, compressed sensing, and other inverse problems. Recent algorithms combining multiple effective priors, such as the sparse or low-rank models, have demonstrated superior performance in various applications. However, the relationships among the popular image models are unclear, and no theory in general is available to demonstrate their connections. In this paper, we present a theoretical analysis on the image models, to bridge the gap between applications and image prior understanding, including sparsity, group-wise sparsity, joint sparsity, and low-rankness, etc. We systematically study how effective each image model is for image restoration. Furthermore, we relate the denoising performance improvement by combining multiple models, to the image model relationships. Extensive experiments are conducted to compare the denoising results which are consistent with our analysis. On top of the model-based methods, we quantitatively demonstrate the image properties that are implicitly exploited by deep learning methods, which can further boost the denoising performance when combined with their complementary image models. |
Tasks | Denoising, Image Restoration |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12985v1 |
PDF | https://arxiv.org/pdf/2003.12985v1.pdf |
PWC | https://paperswithcode.com/paper/a-set-theoretic-study-of-the-relationships-of |
Repo | |
Framework | |
Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions
Title | Top Comment or Flop Comment? Predicting and Explaining User Engagement in Online News Discussions |
Authors | Julian Risch, Ralf Krestel |
Abstract | Comment sections below online news articles enjoy growing popularity among readers. However, the overwhelming number of comments makes it infeasible for the average news consumer to read all of them and hinders engaging discussions. Most platforms display comments in chronological order, which neglects that some of them are more relevant to users and are better conversation starters. In this paper, we systematically analyze user engagement in the form of the upvotes and replies that a comment receives. Based on comment texts, we train a model to distinguish comments that have either a high or low chance of receiving many upvotes and replies. Our evaluation on user comments from TheGuardian.com compares recurrent and convolutional neural network models, and a traditional feature-based classifier. Further, we investigate what makes some comments more engaging than others. To this end, we identify engagement triggers and arrange them in a taxonomy. Explanation methods for neural networks reveal which input words have the strongest influence on our model’s predictions. In addition, we evaluate on a dataset of product reviews, which exhibit properties similar to those of user comments, such as featuring upvotes for helpfulness. |
Tasks | |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11949v1 |
PDF | https://arxiv.org/pdf/2003.11949v1.pdf |
PWC | https://paperswithcode.com/paper/top-comment-or-flop-comment-predicting-and |
Repo | |
Framework | |
Modeling Product Search Relevance in e-Commerce
Title | Modeling Product Search Relevance in e-Commerce |
Authors | Rahul Radhakrishnan Iyer, Rohan Kohli, Shrimai Prabhumoye |
Abstract | With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, there is still a big gap between the products that customers really desire to purchase and relevance of products that are suggested in response to a query from the customer. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques involving machine learning, natural language processing and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec and paragraph2vec. We share some of our insights and findings from our experiments. |
Tasks | Information Retrieval |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04980v1 |
PDF | https://arxiv.org/pdf/2001.04980v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-product-search-relevance-in-e |
Repo | |
Framework | |
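The abstract above lists BM25 among the conventional retrieval baselines. A minimal sketch of Okapi BM25 scoring for a query against product titles, not the paper's setup: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the three toy product titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            tf = term_freq[term]
            # Length normalization relative to the average document length.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical product titles.
products = ["red running shoes", "blue running jacket", "red leather wallet"]
scores = bm25_scores("red shoes", products)
ranking = sorted(range(len(products)), key=lambda i: scores[i], reverse=True)
```

Here the title matching both query terms ranks first, the title matching only "red" ranks second, and the title matching neither term scores zero.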
Foundations of Explainable Knowledge-Enabled Systems
Title | Foundations of Explainable Knowledge-Enabled Systems |
Authors | Shruthi Chari, Daniel M. Gruen, Oshani Seneviratne, Deborah L. McGuinness |
Abstract | Explainability has been an important goal since the early days of Artificial Intelligence. Several approaches for producing explanations have been developed. However, many of these approaches were tightly coupled with the capabilities of the artificial intelligence systems at the time. With the proliferation of AI-enabled systems in sometimes critical settings, there is a need for them to be explainable to end-users and decision-makers. We present a historical overview of explainable artificial intelligence systems, with a focus on knowledge-enabled systems, spanning the expert systems, cognitive assistants, semantic applications, and machine learning domains. Additionally, borrowing from the strengths of past approaches and identifying gaps needed to make explanations user- and context-focused, we propose new definitions for explanations and explainable knowledge-enabled systems. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07520v1 |
PDF | https://arxiv.org/pdf/2003.07520v1.pdf |
PWC | https://paperswithcode.com/paper/foundations-of-explainable-knowledge-enabled |
Repo | |
Framework | |
Lattice-based Improvements for Voice Triggering Using Graph Neural Networks
Title | Lattice-based Improvements for Voice Triggering Using Graph Neural Networks |
Authors | Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams |
Abstract | Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using graph neural networks (GNN). The proposed approach uses the fact that the decoding lattice of falsely triggered audio exhibits uncertainties in terms of many alternative paths and unexpected words on the lattice arcs as compared to the lattice of correctly triggered audio. A pure trigger-phrase detector model doesn’t fully utilize the intent of the user speech whereas by using the complete decoding lattice of user audio, we can effectively mitigate speech not intended for the smart assistant. We deploy two variants of GNNs in this paper based on 1) graph convolution layers and 2) self-attention mechanism respectively. Our experiments demonstrate that GNNs are highly accurate in the FTM task, mitigating ~87% of false triggers at 99% true positive rate (TPR). Furthermore, the proposed models are fast to train and efficient in parameter requirements. |
Tasks | Speech Recognition |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.10822v1 |
PDF | https://arxiv.org/pdf/2001.10822v1.pdf |
PWC | https://paperswithcode.com/paper/lattice-based-improvements-for-voice |
Repo | |
Framework | |