Paper Group ANR 302
Restricted Linearized Augmented Lagrangian Method for Euler’s Elastica Model
Title | Restricted Linearized Augmented Lagrangian Method for Euler’s Elastica Model |
Authors | Yinghui Zhang, Xiaojuan Deng, Jun Zhang, Hongwei Li |
Abstract | Euler’s elastica model has been extensively studied and applied to image processing tasks. However, due to the high nonlinearity and nonconvexity of the involved curvature term, conventional algorithms suffer from slow convergence and high computational cost. Various fast algorithms have been proposed, among which, the augmented Lagrangian based ones are very popular in the community. However, parameter tuning might be very challenging for these methods. In this paper, a simple cutting-off strategy is introduced into the augmented Lagrangian based algorithms for minimizing the Euler’s elastica energy, which leads to easy parameter tuning and fast convergence. The cutting-off strategy is based on an observation of inconsistency inside the augmented Lagrangian based algorithms. When the weighting parameter of the curvature term goes to zero, the energy functional boils down to the ROF model. So, a natural requirement is that its augmented Lagrangian based algorithms should also approach the augmented Lagrangian based algorithms formulated directly for solving the ROF model from the very beginning. Unfortunately, this is not the case for certain existing augmented Lagrangian based algorithms. The proposed cutting-off strategy helps to decouple the tricky dependence between the auxiliary splitting variables, so as to remove the observed inconsistency. Numerical experiments suggest that the proposed algorithm enjoys easier parameter-tuning, faster convergence and even higher quality of image restorations. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01429v1, https://arxiv.org/pdf/1908.01429v1.pdf |
PWC | https://paperswithcode.com/paper/restricted-linearized-augmented-lagrangian |
Repo | |
Framework | |
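For readers unfamiliar with the two models named in the abstract above, the following is a sketch of the energies in one commonly used form. The symbols $a$, $b$, $\lambda$, the noisy image $f$, and the domain $\Omega$ are notation assumed here for illustration; the paper itself may place the weights differently.

```latex
% Euler's elastica energy for image restoration (one common form; notation assumed)
E_{\text{elastica}}(u) = \int_{\Omega} \left( a + b\,\kappa^{2} \right) |\nabla u| \, dx
  + \frac{\lambda}{2} \int_{\Omega} (u - f)^{2} \, dx,
\qquad
\kappa = \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right).

% Setting the curvature weight b to zero leaves the ROF (total variation) model
E_{\text{ROF}}(u) = a \int_{\Omega} |\nabla u| \, dx
  + \frac{\lambda}{2} \int_{\Omega} (u - f)^{2} \, dx.
```

In this form the consistency requirement stated in the abstract is easy to read off: as $b \to 0$ the curvature term vanishes, so an algorithm for the elastica energy should reduce to one for the ROF energy.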
Hybrid Physical-Deep Learning Model for Astronomical Inverse Problems
Title | Hybrid Physical-Deep Learning Model for Astronomical Inverse Problems |
Authors | Francois Lanusse, Peter Melchior, Fred Moolekamp |
Abstract | We present a Bayesian machine learning architecture that combines a physically motivated parametrization and an analytic error model for the likelihood with a deep generative model providing a powerful data-driven prior for complex signals. This combination yields an interpretable and differentiable generative model, allows the incorporation of prior knowledge, and can be utilized for observations with different data quality without having to retrain the deep network. We demonstrate our approach with an example of astronomical source separation in current imaging data, yielding a physical and interpretable model of astronomical scenes. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03980v1, https://arxiv.org/pdf/1912.03980v1.pdf |
PWC | https://paperswithcode.com/paper/hybrid-physical-deep-learning-model-for |
Repo | |
Framework | |
Extracting Interpretable Concept-Based Decision Trees from CNNs
Title | Extracting Interpretable Concept-Based Decision Trees from CNNs |
Authors | Conner Chyung, Michael Tsang, Yan Liu |
Abstract | In an attempt to gather a deeper understanding of how convolutional neural networks (CNNs) reason about human-understandable concepts, we present a method to infer labeled concept data from hidden layer activations and interpret the concepts through a shallow decision tree. The decision tree can provide information about which concepts a model deems important, as well as provide an understanding of how the concepts interact with each other. Experiments demonstrate that the extracted decision tree is capable of accurately representing the original CNN’s classifications at low tree depths, thus encouraging human-in-the-loop understanding of discriminative concepts. |
Tasks | |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04664v2, https://arxiv.org/pdf/1906.04664v2.pdf |
PWC | https://paperswithcode.com/paper/extracting-interpretable-concept-based |
Repo | |
Framework | |
To See in the Dark: N2DGAN for Background Modeling in Nighttime Scene
Title | To See in the Dark: N2DGAN for Background Modeling in Nighttime Scene |
Authors | Zhenfeng Zhu, Yingying Meng, Deqiang Kong, Xingxing Zhang, Yandong Guo, Yao Zhao |
Abstract | Due to the deteriorated conditions of insufficient illumination and uneven lighting, nighttime images have lower contrast and higher noise than their daytime counterparts of the same scene, which seriously limits the performance of conventional background modeling methods. For such a challenging problem of background modeling in nighttime scenes, an innovative and reasonable solution is proposed in this paper, which paves a new way completely different from the existing ones. To make background modeling in nighttime scenes perform as well as in daytime conditions, we put forward a promising generation-based background modeling framework for foreground surveillance. With a pre-specified daytime reference image as background frame, the GAN based generation model, called N2DGAN, is trained to transfer each frame of nighttime video to a virtual daytime image with the same scene as the reference image except for the foreground region. Specifically, to balance the preservation of background scene and the foreground object(s) in generating the virtual daytime image, we present a two-pathway generation model, in which the global and local sub-networks are well combined with spatial and temporal consistency constraints. For the sequence of generated virtual daytime images, a multi-scale Bayes model is further proposed to characterize pertinently the temporal variation of background. We evaluate on collected datasets with manually labeled ground truth, which provides a valuable resource for related research community. The impressive results illustrated in both the main paper and supplementary show the efficacy of our proposed approach. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06556v1, https://arxiv.org/pdf/1912.06556v1.pdf |
PWC | https://paperswithcode.com/paper/to-see-in-the-dark-n2dgan-for-background |
Repo | |
Framework | |
Towards Explainable Music Emotion Recognition: The Route via Mid-level Features
Title | Towards Explainable Music Emotion Recognition: The Route via Mid-level Features |
Authors | Shreyan Chowdhury, Andreu Vall, Verena Haunschmid, Gerhard Widmer |
Abstract | Emotional aspects play an important part in our interaction with music. However, modelling these aspects in MIR systems has been notoriously challenging since emotion is an inherently abstract and subjective experience, thus making it difficult to quantify or predict in the first place, and to make sense of the predictions in the next. In an attempt to create a model that can give a musically meaningful and intuitive explanation for its predictions, we propose a VGG-style deep neural network that learns to predict emotional characteristics of a musical piece together with (and based on) human-interpretable, mid-level perceptual features. We compare this to predicting emotion directly with an identical network that does not take into account the mid-level features and observe that the loss in predictive performance of going through the mid-level features is surprisingly low, on average. The design of our network allows us to visualize the effects of perceptual features on individual emotion predictions, and we argue that the small loss in performance in going through the mid-level features is justified by the gain in explainability of the predictions. |
Tasks | Emotion Recognition, Music Emotion Recognition |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03572v1, https://arxiv.org/pdf/1907.03572v1.pdf |
PWC | https://paperswithcode.com/paper/towards-explainable-music-emotion-recognition |
Repo | |
Framework | |
An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation
Title | An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation |
Authors | Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin |
Abstract | We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails in adapting the representations of the tail classes or small objects for semantic segmentation since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature space adversarial perturbations. Specifically, we first perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) on each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with attack on the domain discriminator. Our approach achieves the state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. |
Tasks | Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08954v1, https://arxiv.org/pdf/1912.08954v1.pdf |
PWC | https://paperswithcode.com/paper/an-adversarial-perturbation-oriented-domain |
Repo | |
Framework | |
Causal Embeddings for Recommendation: An Extended Abstract
Title | Causal Embeddings for Recommendation: An Extended Abstract |
Authors | Stephen Bonner, Flavian Vasile |
Abstract | Recommendations are commonly used to modify user’s natural behavior, for example, increasing product sales or the time spent on a website. This results in a gap between the ultimate business objective and the classical setup where recommendations are optimized to be coherent with past user behavior. To bridge this gap, we propose a new learning setup for recommendation that optimizes for the Incremental Treatment Effect (ITE) of the policy. We show this is equivalent to learning to predict recommendation outcomes under a fully random recommendation policy and propose a new domain adaptation algorithm that learns from logged data containing outcomes from a biased recommendation policy and predicts recommendation outcomes according to random exposure. We compare our method against state-of-the-art factorization methods, in addition to new approaches of causal recommendation and show significant improvements. |
Tasks | Domain Adaptation |
Published | 2019-04-10 |
URL | https://arxiv.org/abs/1904.05165v2, https://arxiv.org/pdf/1904.05165v2.pdf |
PWC | https://paperswithcode.com/paper/causal-embeddings-for-recommendation-an |
Repo | |
Framework | |
Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement
Title | Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement |
Authors | Mostafa Razavi Ghods, Mohammad Hossein Moattar, Yahya Forghani |
Abstract | Distance metric learning can be viewed as one of the fundamental interests in pattern recognition and machine learning, which plays a pivotal role in the performance of many learning methods. One of the effective methods in learning such a metric is to learn it from a set of labeled training samples. The issue of data imbalance is the most important challenge of recent methods. This research tries not only to preserve the local structures but also to cover the issue of imbalanced datasets. To do this, the proposed method first tries to extract a low dimensional manifold from the input data. Then, it learns the local neighborhood structures and the relationship of the data points in the ambient space based on the adjacencies of the same data points on the embedded low dimensional manifold. Using the local neighborhood relationships extracted from the manifold space, the proposed method learns the distance metric in a way which minimizes the distance between similar data and maximizes their distance from the dissimilar data points. The evaluations of the proposed method on numerous datasets from the UCI repository of machine learning, and also the KDDCup98 dataset as the most imbalanced dataset, justify the supremacy of the proposed approach in comparison with other approaches especially when the imbalance factor is high. |
Tasks | Dimensionality Reduction, Metric Learning |
Published | 2019-02-09 |
URL | https://arxiv.org/abs/1902.03453v2, https://arxiv.org/pdf/1902.03453v2.pdf |
PWC | https://paperswithcode.com/paper/distance-metric-learning-based-on-structural |
Repo | |
Framework | |
Two-level Explanations in Music Emotion Recognition
Title | Two-level Explanations in Music Emotion Recognition |
Authors | Verena Haunschmid, Shreyan Chowdhury, Gerhard Widmer |
Abstract | Current ML models for music emotion recognition, while generally working quite well, do not give meaningful or intuitive explanations for their predictions. In this work, we propose a 2-step procedure to arrive at spectrogram-level explanations that connect certain aspects of the audio to interpretable mid-level perceptual features, and these to the actual emotion prediction. That makes it possible to focus on specific musical reasons for a prediction (in terms of perceptual features), and to trace these back to patterns in the audio that can be interpreted visually and acoustically. |
Tasks | Emotion Recognition, Music Emotion Recognition |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11760v1, https://arxiv.org/pdf/1905.11760v1.pdf |
PWC | https://paperswithcode.com/paper/two-level-explanations-in-music-emotion |
Repo | |
Framework | |
Distributed Cooperative Online Estimation With Random Observation Matrices, Communication Graphs and Time-Delays
Title | Distributed Cooperative Online Estimation With Random Observation Matrices, Communication Graphs and Time-Delays |
Authors | Jiexiang Wang, Tao Li, Xiwei Zhang |
Abstract | We analyze convergence of distributed cooperative online estimation algorithms by a network of multiple nodes via information exchanging in an uncertain environment. Each node has a linear observation of an unknown parameter with randomly time-varying observation matrices. The underlying communication network is modeled by a sequence of random digraphs and is subjected to nonuniform random time-varying delays in channels. Each node runs an online estimation algorithm consisting of a consensus term taking a weighted sum of its own estimate and delayed estimates of neighbors, and an innovation term processing its own new measurement at each time step. By stochastic time-varying system, martingale convergence theories and the binomial expansion of random matrix products, we transform the convergence analysis of the algorithm into that of the mathematical expectation of random matrix products. Firstly, for the delay-free case, we show that the algorithm gains can be designed properly such that all nodes’ estimates converge to the real parameter in mean square and almost surely if the observation matrices and communication graphs satisfy the stochastic spatial-temporal persistence of excitation condition. Especially, this condition holds for Markovian switching communication graphs and observation matrices, if the stationary graph is balanced with a spanning tree and the measurement model is spatially-temporally jointly observable. Secondly, for the case with time-delays, we introduce delay matrices to model the random time-varying communication delays between nodes, and propose a mean square convergence condition, which quantitatively shows the intensity of spatial-temporal persistence of excitation to overcome time-delays. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08245v5, https://arxiv.org/pdf/1908.08245v5.pdf |
PWC | https://paperswithcode.com/paper/distributed-cooperative-online-estimation |
Repo | |
Framework | |
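The consensus-plus-innovation update described in the abstract above can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: it assumes the delay-free case, a fixed undirected ring graph rather than random digraphs, Gaussian random observation rows, and a hand-picked decaying gain. The function name `run_consensus_innovation` and all parameter values are hypothetical.

```python
import numpy as np

def run_consensus_innovation(theta, adjacency, steps=5000, noise_std=0.1, seed=0):
    """Simplified delay-free sketch: every node estimates the same parameter theta."""
    rng = np.random.default_rng(seed)
    num_nodes = adjacency.shape[0]
    dim = theta.shape[0]
    estimates = np.zeros((num_nodes, dim))          # one row per node
    degree = adjacency.sum(axis=1, keepdims=True)   # number of neighbours per node
    for k in range(steps):
        gain = 0.3 / (k + 1) ** 0.6                 # assumed decaying gain schedule
        # Random, time-varying observation row for each node (assumed Gaussian).
        h = rng.standard_normal((num_nodes, dim))
        y = h @ theta + noise_std * rng.standard_normal(num_nodes)
        # Innovation term: each node corrects its estimate using its own new measurement.
        residual = y - np.sum(h * estimates, axis=1)
        innovation = h * residual[:, None]
        # Consensus term: sum over neighbours j of (x_j - x_i).
        consensus = adjacency @ estimates - degree * estimates
        estimates = estimates + gain * (innovation + consensus)
    return estimates

# Four nodes on a ring, estimating a two-dimensional parameter.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
theta_true = np.array([1.0, -2.0])
final_estimates = run_consensus_innovation(theta_true, ring)
max_error = np.max(np.abs(final_estimates - theta_true))
```

Each node sees only a scalar measurement per step, so no single measurement identifies `theta_true`; the estimates approach it because the random observation rows excite every direction over time, which is the intuition behind the persistence of excitation condition mentioned in the abstract.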
Uncertainty-Aware Anticipation of Activities
Title | Uncertainty-Aware Anticipation of Activities |
Authors | Yazan Abu Farha, Juergen Gall |
Abstract | Anticipating future activities in video is a task with many practical applications. While earlier approaches are limited to just a few seconds in the future, the prediction time horizon has just recently been extended to several minutes in the future. However, as the predicted time horizon increases, the future becomes more uncertain and models that generate a single prediction fail at capturing the different possible future activities. In this paper, we address the uncertainty modelling for predicting long-term future activities. Both an action model and a length model are trained to model the probability distribution of the future activities. At test time, we sample from the predicted distributions multiple samples that correspond to the different possible sequences of future activities. Our model is evaluated on two challenging datasets and shows a good performance in capturing the multi-modal future activities without compromising the accuracy when predicting a single sequence of future activities. |
Tasks | |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09540v2, https://arxiv.org/pdf/1908.09540v2.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-aware-anticipation-of-activities |
Repo | |
Framework | |
Nonverbal Robot Feedback for Human Teachers
Title | Nonverbal Robot Feedback for Human Teachers |
Authors | Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan |
Abstract | Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are. Being informative is unfortunately very challenging, because during teaching, people typically get no transparency into what the robot already knows or has learned so far. In contrast, human students naturally provide a wealth of nonverbal feedback that reveals their level of understanding and engagement. In this work, we study how a robot can similarly provide feedback that is minimally disruptive, yet gives human teachers a better mental model of the robot learner, and thus enables them to teach more effectively. Our idea is that at any point, the robot can indicate what it thinks the correct next action is, shedding light on its current estimate of the human’s preferences. We analyze how useful this feedback is, both in theory and with two user studies—one with a virtual character that tests the feedback itself, and one with a PR2 robot that uses gaze as the feedback mechanism. We find that feedback can be useful for improving both the quality of teaching and teachers’ understanding of the robot’s capability. |
Tasks | |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02320v1, https://arxiv.org/pdf/1911.02320v1.pdf |
PWC | https://paperswithcode.com/paper/nonverbal-robot-feedback-for-human-teachers |
Repo | |
Framework | |
Neural networks are a priori biased towards Boolean functions with low entropy
Title | Neural networks are a priori biased towards Boolean functions with low entropy |
Authors | Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis |
Abstract | Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks – a single-layer perceptron with n input neurons, one output neuron, and no threshold bias term – we prove that upon random initialisation of weights, the a priori probability P(t) that it represents a Boolean function that classifies t points in {0,1}^n as 1 has a remarkably simple form: P(t) = 2^{-n} for 0 ≤ t < 2^n. Since a perceptron can express far fewer Boolean functions with small or large values of t (low entropy) than with intermediate values of t (high entropy) there is, on average, a strong intrinsic a-priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed t, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect. |
Tasks | |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11522v3, https://arxiv.org/pdf/1909.11522v3.pdf |
PWC | https://paperswithcode.com/paper/neural-networks-are-textita-priori-biased |
Repo | |
Framework | |
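The uniform distribution P(t) = 2^{-n} stated in the abstract above can be checked empirically with a small Monte Carlo sketch. The assumptions here are illustrative and not taken from the paper: weights are drawn from a standard Gaussian, a point is classified as 1 when the weighted sum is strictly positive, and the function name `t_distribution` is hypothetical.

```python
import numpy as np
from itertools import product

def t_distribution(n, num_samples, seed=0):
    """Estimate P(t) for a bias-free perceptron with random Gaussian weights."""
    rng = np.random.default_rng(seed)
    # All 2^n Boolean input points, one per row.
    inputs = np.array(list(product([0, 1], repeat=n)))
    # One random weight vector per sample (assumed standard Gaussian).
    weights = rng.standard_normal((num_samples, n))
    # A point is classified as 1 when w . x > 0; the all-zeros point is never 1.
    outputs = (weights @ inputs.T) > 0
    # t = number of input points classified as 1 by each sampled perceptron.
    t = outputs.sum(axis=1)
    counts = np.bincount(t, minlength=2 ** n)
    return counts / num_samples

# For n = 3 the abstract's formula gives P(t) = 1/8 for t = 0, ..., 7.
freqs = t_distribution(n=3, num_samples=80000)
```

Because the all-zeros input always yields a weighted sum of zero, t can never reach 2^n, which matches the range 0 ≤ t < 2^n in the abstract. The estimated frequencies should each land close to 0.125, up to sampling noise.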
Report on UG^2+ Challenge Track 1: Assessing Algorithms to Improve Video Object Detection and Classification from Unconstrained Mobility Platforms
Title | Report on UG^2+ Challenge Track 1: Assessing Algorithms to Improve Video Object Detection and Classification from Unconstrained Mobility Platforms |
Authors | Sreya Banerjee, Rosaura G. VidalMata, Zhangyang Wang, Walter J. Scheirer |
Abstract | How can we effectively engineer a computer vision system that is able to interpret videos from unconstrained mobility platforms like UAVs? One promising option is to make use of image restoration and enhancement algorithms from the area of computational photography to improve the quality of the underlying frames in a way that also improves automatic visual recognition. Along these lines, exploratory work is needed to find out which image pre-processing algorithms, in combination with the strongest features and supervised machine learning approaches, are good candidates for difficult scenarios like motion blur, weather, and mis-focus — all common artifacts in UAV acquired images. This paper summarizes the protocols and results of Track 1 of the UG^2+ Challenge held in conjunction with IEEE/CVF CVPR 2019. The challenge looked at two separate problems: (1) object detection improvement in video, and (2) object classification improvement in video. The challenge made use of the UG^2 (UAV, Glider, Ground) dataset, which is an established benchmark for assessing the interplay between image restoration and enhancement and visual recognition. 16 algorithms were submitted by academic and corporate teams, and a detailed analysis of how they performed on each challenge problem is reported here. |
Tasks | Image Restoration, Object Classification, Object Detection, Video Object Detection |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11529v3, https://arxiv.org/pdf/1907.11529v3.pdf |
PWC | https://paperswithcode.com/paper/report-on-ug2-challenge-track-1-assessing |
Repo | |
Framework | |
Mode Collapse and Regularity of Optimal Transportation Maps
Title | Mode Collapse and Regularity of Optimal Transportation Maps |
Authors | Na Lei, Yang Guo, Dongsheng An, Xin Qi, Zhongxuan Luo, Shing-Tung Yau, Xianfeng Gu |
Abstract | This work builds the connection between the regularity theory of optimal transportation map, Monge-Ampère equation and GANs, which gives a theoretic understanding of the major drawbacks of GANs: convergence difficulty and mode collapse. According to the regularity theory of Monge-Ampère equation, if the support of the target measure is disconnected or just non-convex, the optimal transportation mapping is discontinuous. General DNNs can only approximate continuous mappings. This intrinsic conflict leads to the convergence difficulty and mode collapse in GANs. We test our hypothesis that the supports of real data distribution are in general non-convex, therefore the discontinuity is unavoidable using an Autoencoder combined with discrete optimal transportation map (AE-OT framework) on the CelebA data set. The testing result is positive. Furthermore, we propose to approximate the continuous Brenier potential directly based on discrete Brenier theory to tackle mode collapse. Compared with existing methods, this method is more accurate and effective. |
Tasks | |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.02934v1, http://arxiv.org/pdf/1902.02934v1.pdf |
PWC | https://paperswithcode.com/paper/mode-collapse-and-regularity-of-optimal |
Repo | |
Framework | |