January 25, 2020


Paper Group ANR 1643

Implicit Priors for Knowledge Sharing in Bayesian Neural Networks. Spatial Analysis Made Easy with Linear Regression and Kernels. Gauge Equivariant Convolutional Networks and the Icosahedral CNN. Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features. Scalable Facial Image Compression with …


Title	Implicit Priors for Knowledge Sharing in Bayesian Neural Networks
Authors	Jack K Fitzsimons, Sebastian M Schmon, Stephen J Roberts
Abstract	Bayesian interpretations of neural networks have a long history, dating back to early work in the 1990s, and have recently regained attention because of desirable properties such as uncertainty estimation, model robustness and regularisation. We discuss here the application of Bayesian models to knowledge sharing between neural networks. Knowledge sharing comes in different facets, such as transfer learning, model distillation and shared embeddings. All of these tasks have in common that learned “features” ought to be shared across different networks. Theoretically rooted in the concepts of Bayesian neural networks, this work has widespread application to general deep learning.
Tasks	Transfer Learning
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00874v1
PDF	https://arxiv.org/pdf/1912.00874v1.pdf
PWC	https://paperswithcode.com/paper/implicit-priors-for-knowledge-sharing-in
Repo
Framework

Spatial Analysis Made Easy with Linear Regression and Kernels


Title	Spatial Analysis Made Easy with Linear Regression and Kernels
Authors	Philip Milton, Emanuele Giorgi, Samir Bhatt
Abstract	Kernel methods are an incredibly popular technique for extending linear models to non-linear problems via a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they are still subject to cubic cost in the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the currently available computational power. This paper aims to provide an overview of kernel methods from first principles (with a focus on ridge regression), before progressing to a review of random Fourier features (RFF), a set of methods that enable the scaling of kernel methods to big datasets. At each stage, the associated R code is provided. We begin by illustrating how the dual representation of ridge regression relies solely on inner products and permits the use of kernels to map the data into high-dimensional spaces. We progress to RFFs, showing how only a few lines of code provide a significant computational speed-up for a negligible cost to accuracy. We provide an example of the implementation of RFFs on a simulated spatial data set to illustrate these properties. Lastly, we summarise the main issues with RFFs and highlight some of the advanced techniques aimed at alleviating them.
Tasks
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08679v1
PDF	http://arxiv.org/pdf/1902.08679v1.pdf
PWC	https://paperswithcode.com/paper/spatial-analysis-made-easy-with-linear
Repo
Framework
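The random Fourier feature idea summarised in the abstract above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the paper, it is not the authors' code, and it assumes the standard cosine feature map for a Gaussian (RBF) kernel. The function names, the lengthscale, the feature count and the two test points are all hypothetical choices made for illustration.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact Gaussian (RBF) kernel between two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def sample_rff(dim, num_features, lengthscale=1.0, seed=0):
    """Draw the random frequencies and phases that define the feature map."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's spectral density: N(0, 1 / lengthscale^2).
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Map a point to its random Fourier feature vector."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


# Hypothetical 2-D locations; the inner product of their feature vectors
# should approximate the exact kernel value.
x, y = [0.3, -0.2], [0.1, 0.4]
freqs, phases = sample_rff(dim=2, num_features=4000)
zx, zy = rff_map(x, freqs, phases), rff_map(y, freqs, phases)
approx = sum(a * b for a, b in zip(zx, zy))
exact = rbf_kernel(x, y)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because the features are explicit, a linear ridge regression on them stands in for kernel ridge regression, and the cost then grows with the number of features rather than cubically with the number of points.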

Gauge Equivariant Convolutional Networks and the Icosahedral CNN


Title	Gauge Equivariant Convolutional Networks and the Icosahedral CNN
Authors	Taco S. Cohen, Maurice Weiler, Berkay Kicanaoglu, Max Welling
Abstract	The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
Tasks	Semantic Segmentation
Published	2019-02-11
URL	https://arxiv.org/abs/1902.04615v3
PDF	https://arxiv.org/pdf/1902.04615v3.pdf
PWC	https://paperswithcode.com/paper/gauge-equivariant-convolutional-networks-and
Repo
Framework

Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features


Title	Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features
Authors	Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li
Abstract	This paper describes a conditional neural network architecture for Mandarin Chinese polyphone disambiguation. The system is composed of a bidirectional recurrent neural network component acting as a sentence encoder to accumulate the context correlations, followed by a prediction network that maps the polyphonic character embeddings along with the conditions to corresponding pronunciations. We obtain the word-level condition from a pre-trained word-to-vector lookup table. One goal of polyphone disambiguation is to address the homograph problem existing in the front-end processing of Mandarin Chinese text-to-speech systems. Our system achieves an accuracy of 94.69% on a publicly available polyphonic character dataset. To further validate our choices on the conditional feature, we investigate polyphone disambiguation systems with multi-level conditions respectively. The experimental results show that both the sentence-level and the word-level conditional embedding features are able to attain good performance for Mandarin Chinese polyphone disambiguation.
Tasks
Published	2019-07-03
URL	https://arxiv.org/abs/1907.01749v1
PDF	https://arxiv.org/pdf/1907.01749v1.pdf
PWC	https://paperswithcode.com/paper/polyphone-disambiguation-for-mandarin-chinese
Repo
Framework

Scalable Facial Image Compression with Deep Feature Reconstruction


Title	Scalable Facial Image Compression with Deep Feature Reconstruction
Authors	Shurun Wang, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao
Abstract	In this paper, we propose a scalable image compression scheme, including the base layer for feature representation and enhancement layer for texture representation. More specifically, the base layer is designed as the deep learning feature for analysis purposes, and it can also be converted to the fine structure with deep feature reconstruction. The enhancement layer, which serves to compress the residuals between the input image and the signals generated from the base layer, aims to faithfully reconstruct the input texture. The proposed scheme can feasibly inherit the advantages of both compress-then-analyze and analyze-then-compress schemes in surveillance applications. The performance of this framework is validated with facial images, and the conducted experiments provide useful evidence that the proposed framework can achieve better rate-accuracy and rate-distortion performance than conventional image compression schemes.
Tasks	Image Compression
Published	2019-03-14
URL	https://arxiv.org/abs/1903.05921v2
PDF	https://arxiv.org/pdf/1903.05921v2.pdf
PWC	https://paperswithcode.com/paper/scalable-facial-image-compression-with-deep
Repo
Framework

Effect of shapes of activation functions on predictability in the echo state network


Title	Effect of shapes of activation functions on predictability in the echo state network
Authors	Hanten Chang, Shinji Nakaoka, Hiroyasu Ando
Abstract	We investigate the time-series prediction accuracy of echo state networks with respect to several kinds of activation functions. We found that some kinds of activation functions with an appropriate nonlinearity show high performance compared to the conventional sigmoid function.
Tasks	Time Series
Published	2019-05-22
URL	https://arxiv.org/abs/1905.09419v1
PDF	https://arxiv.org/pdf/1905.09419v1.pdf
PWC	https://paperswithcode.com/paper/effect-of-shapes-of-activation-functions-on
Repo
Framework

Mesh Variational Autoencoders with Edge Contraction Pooling


Title	Mesh Variational Autoencoders with Edge Contraction Pooling
Authors	Yu-Jie Yuan, Yu-Kun Lai, Jie Yang, Hongbo Fu, Lin Gao
Abstract	3D shape analysis is an important research topic in computer vision and graphics. While existing methods have generalized image-based deep learning to meshes using graph-based convolutions, the lack of an effective pooling operation restricts the learning capability of their networks. In this paper, we propose a novel pooling operation for mesh datasets with the same connectivity but different geometry, by building a mesh hierarchy using mesh simplification. For this purpose, we develop a modified mesh simplification method to avoid generating highly irregularly sized triangles. Our pooling operation effectively encodes the correspondence between coarser and finer meshes in the hierarchy. We then present a variational auto-encoder structure with the edge contraction pooling and graph-based convolutions, to explore probability latent spaces of 3D surfaces. Our network requires far fewer parameters than the original mesh VAE and thus can handle denser models thanks to our new pooling operation and convolutional kernels. Our evaluation also shows that our method has better generalization ability and is more reliable in various applications, including shape generation, shape interpolation and shape embedding.
Tasks	3D Shape Analysis
Published	2019-08-07
URL	https://arxiv.org/abs/1908.02507v1
PDF	https://arxiv.org/pdf/1908.02507v1.pdf
PWC	https://paperswithcode.com/paper/mesh-variational-autoencoders-with-edge
Repo
Framework

Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality


Title	Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality
Authors	Aria Khademi, Sanghack Lee, David Foley, Vasant Honavar
Abstract	As virtually all aspects of our lives are increasingly impacted by algorithmic decision making systems, it is incumbent upon us as a society to ensure such systems do not become instruments of unfair discrimination on the basis of gender, race, ethnicity, religion, etc. We consider the problem of determining whether the decisions made by such systems are discriminatory, through the lens of causal models. We introduce two definitions of group fairness grounded in causality: fair on average causal effect (FACE), and fair on average causal effect on the treated (FACT). We use the Rubin-Neyman potential outcomes framework for the analysis of cause-effect relationships to robustly estimate FACE and FACT. We demonstrate the effectiveness of our proposed approach on synthetic data. Our analyses of two real-world data sets, the Adult income data set from the UCI repository (with gender as the protected attribute), and the NYC Stop and Frisk data set (with race as the protected attribute), show that the evidence of discrimination obtained by FACE and FACT, or lack thereof, is often in agreement with the findings from other studies. We further show that FACT, being somewhat more nuanced compared to FACE, can yield findings of discrimination that differ from those obtained using FACE.
Tasks	Decision Making
Published	2019-03-27
URL	http://arxiv.org/abs/1903.11719v1
PDF	http://arxiv.org/pdf/1903.11719v1.pdf
PWC	https://paperswithcode.com/paper/fairness-in-algorithmic-decision-making-an
Repo
Framework
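The two fairness criteria named in the abstract above can be written out as a sketch in potential-outcomes notation. The symbols are assumptions made here for illustration and are not taken from the abstract: $A \in \{0, 1\}$ is the protected attribute, and $Y^{(a)}$ is the potential outcome of the decision had the attribute been set to $a$. Under that reading, FACE asks for a zero average causal effect over the whole population, while FACT asks for a zero average effect among the treated group.

```latex
\begin{align}
\text{FACE:} \quad & \mathbb{E}\big[\,Y^{(1)} - Y^{(0)}\,\big] = 0 \\
\text{FACT:} \quad & \mathbb{E}\big[\,Y^{(1)} - Y^{(0)} \,\big|\, A = 1\,\big] = 0
\end{align}
```

Conditioning on $A = 1$ is what allows FACT to disagree with FACE, since the two expectations average the same individual-level effect over different populations.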

Dynamic Traffic Scene Classification with Space-Time Coherence


Title	Dynamic Traffic Scene Classification with Space-Time Coherence
Authors	Athma Narayanan, Isht Dwivedi, Behzad Dariush
Abstract	This paper examines the problem of dynamic traffic scene classification under space-time variations in viewpoint that arise from video captured on-board a moving vehicle. Solutions to this problem are important for realization of effective driving assistance technologies required to interpret or predict road user behavior. Currently, dynamic traffic scene classification has not been adequately addressed due to a lack of benchmark datasets that consider spatiotemporal evolution of traffic scenes resulting from a vehicle’s ego-motion. This paper has three main contributions. First, an annotated dataset is released to enable dynamic scene classification that includes 80 hours of diverse high quality driving video data clips collected in the San Francisco Bay area. The dataset includes temporal annotations for road places, road types, weather, and road surface conditions. Second, we introduce novel and baseline algorithms that utilize semantic context and temporal nature of the dataset for dynamic classification of road scenes. Finally, we showcase algorithms and experimental results that highlight how extracted features from scene classification serve as strong priors and help with tactical driver behavior understanding. The results show significant improvement from previously reported driving behavior detection baselines in the literature.
Tasks	Scene Classification
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12708v1
PDF	https://arxiv.org/pdf/1905.12708v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-traffic-scene-classification-with
Repo
Framework

Domain Randomization for Active Pose Estimation


Title	Domain Randomization for Active Pose Estimation
Authors	Xinyi Ren, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Abhishek Gupta, Aviv Tamar, Pieter Abbeel
Abstract	Accurate state estimation is a fundamental component of robotic control. In robotic manipulation tasks, as is our focus in this work, state estimation is essential for identifying the positions of objects in the scene, forming the basis of the manipulation plan. However, pose estimation typically requires expensive 3D cameras or additional instrumentation such as fiducial markers to perform accurately. Recently, Tobin et al. introduced an approach to pose estimation based on domain randomization, where a neural network is trained to predict pose directly from a 2D image of the scene. The network is trained on computer-generated images with a high variation in textures and lighting, thereby generalizing to real-world images. In this work, we investigate how to improve the accuracy of domain randomization based pose estimation. Our main idea is that active perception – moving the robot to get a better estimate of pose – can be trained in simulation and transferred to real using domain randomization. In our approach, the robot trains in a domain-randomized simulation how to estimate pose from a sequence of images. We show that our approach can significantly improve the accuracy of standard pose estimation in several scenarios: when the robot holding an object moves, when reference objects are moved in the scene, or when the camera is moved around the object.
Tasks	Pose Estimation
Published	2019-03-10
URL	http://arxiv.org/abs/1903.03953v1
PDF	http://arxiv.org/pdf/1903.03953v1.pdf
PWC	https://paperswithcode.com/paper/domain-randomization-for-active-pose
Repo
Framework

Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes


Title	Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes
Authors	Mandar Dixit, Yunsheng Li, Nuno Vasconcelos
Abstract	The transfer of a neural network (CNN) trained to recognize objects to the task of scene classification is considered. A Bag-of-Semantics (BoS) representation is first induced, by feeding scene image patches to the object CNN, and representing the scene image by the ensuing bag of posterior class probability vectors (semantic posteriors). The encoding of the BoS with a Fisher vector (FV) is then studied. A link is established between the FV of any probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. A network implementation of the MFA Fisher Score (MFA-FS), denoted as the MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach has state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification, using a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification. The modeling of local object semantics appears to be at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined, and outperforming all previous scene classification approaches by a sizeable margin.
Tasks	Scene Classification
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11539v1
PDF	https://arxiv.org/pdf/1905.11539v1.pdf
PWC	https://paperswithcode.com/paper/semantic-fisher-scores-for-task-transfer
Repo
Framework

Evolution Strategies Converges to Finite Differences


Title	Evolution Strategies Converges to Finite Differences
Authors	John C. Raisbeck, Matthew Allen, Ralph Weissleder, Hyungsoon Im, Hakho Lee
Abstract	Since the debut of Evolution Strategies (ES) as a tool for Reinforcement Learning by Salimans et al. 2017, there has been interest in determining the exact relationship between the Evolution Strategies gradient and the gradient of a similar class of algorithms, Finite Differences (FD) (Zhang et al. 2017, Lehman et al. 2018). Several investigations into the subject have been performed, investigating the formal motivational differences (Lehman et al. 2018) between ES and FD, as well as the differences in a standard benchmark problem in Machine Learning, the MNIST classification problem (Zhang et al. 2017). This paper proves that while the gradients are different, they converge as the dimension of the vector under optimization increases.
Tasks
Published	2019-12-27
URL	https://arxiv.org/abs/2001.01684v1
PDF	https://arxiv.org/pdf/2001.01684v1.pdf
PWC	https://paperswithcode.com/paper/evolution-strategies-converges-to-finite
Repo
Framework
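To make the two gradient estimators compared in the abstract above concrete, here is a minimal Python sketch. It does not reproduce the paper's proof or its setting. It assumes a toy sum-of-squares objective, a central finite-difference scheme, and an antithetic Gaussian-perturbation form of the ES estimator. The function names, step sizes and sample count are hypothetical.

```python
import random


def f(theta):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(t * t for t in theta)


def fd_gradient(func, theta, delta=1e-4):
    """Central finite-difference gradient, one coordinate at a time."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += delta
        down = list(theta)
        down[i] -= delta
        grad.append((func(up) - func(down)) / (2.0 * delta))
    return grad


def es_gradient(func, theta, sigma=0.1, num_samples=20000, seed=0):
    """Antithetic Evolution Strategies estimate from Gaussian perturbations."""
    rng = random.Random(seed)
    grad = [0.0] * len(theta)
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = func([t + sigma * e for t, e in zip(theta, eps)])
        minus = func([t - sigma * e for t, e in zip(theta, eps)])
        # Each sample contributes its perturbation direction, weighted by
        # the change in the objective along that direction.
        weight = (plus - minus) / (2.0 * sigma * num_samples)
        for i, e in enumerate(eps):
            grad[i] += weight * e
    return grad


theta = [1.0, 2.0, 3.0]
true_grad = [2.0 * t for t in theta]
fd = fd_gradient(f, theta)
es = es_gradient(f, theta)
print("true:", true_grad)
print("FD:  ", [round(g, 4) for g in fd])
print("ES:  ", [round(g, 4) for g in es])
```

FD probes each coordinate axis deterministically, whereas ES averages over random directions, so the ES output is a noisy estimate whose accuracy depends on the number of samples.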

Can We Automate Diagrammatic Reasoning?


Title	Can We Automate Diagrammatic Reasoning?
Authors	Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad
Abstract	Learning to solve diagrammatic reasoning (DR) can be a challenging but interesting problem for the computer vision research community. It is believed that next generation pattern recognition applications should be able to simulate the human brain to understand and analyze reasoning of images. However, due to the lack of benchmarks of diagrammatic reasoning, the present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we also propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks have been used to compare with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues.
Tasks	Visual Reasoning
Published	2019-02-13
URL	http://arxiv.org/abs/1902.04955v1
PDF	http://arxiv.org/pdf/1902.04955v1.pdf
PWC	https://paperswithcode.com/paper/can-we-automate-diagrammatic-reasoning
Repo
Framework

Heterogeneous Multi-task Metric Learning across Multiple Domains


Title	Heterogeneous Multi-task Metric Learning across Multiple Domains
Authors	Yong Luo, Yonggang Wen, Dacheng Tao
Abstract	Distance metric learning (DML) plays a crucial role in diverse machine learning algorithms and applications. When the labeled information in the target domain is limited, transfer metric learning (TML) helps to learn the metric by leveraging the sufficient information from other related domains. Multi-task metric learning (MTML), which can be regarded as a special case of TML, performs transfer across all related domains. Current TML tools usually assume that the same feature representation is exploited for different domains. However, in real-world applications, data may be drawn from heterogeneous domains. Heterogeneous transfer learning approaches can be adopted to remedy this drawback by deriving a metric from the learned transformation across different domains. But they are often limited in that only two domains can be handled. To appropriately handle multiple domains, we develop a novel heterogeneous multi-task metric learning (HMTML) framework. In HMTML, the metrics of all different domains are learned together. The transformations derived from the metrics are utilized to induce a common subspace, and the high-order covariance among the predictive structures of these domains is maximized in this subspace. There do exist a few heterogeneous transfer learning approaches that deal with multiple domains, but the high-order statistics (correlation information), which can only be exploited by simultaneously examining all domains, are ignored in these approaches. Compared with them, the proposed HMTML can effectively explore such high-order information, thus obtaining more reliable feature transformations and metrics. The effectiveness of our method is validated by extensive experiments on text categorization, scene classification, and social image annotation.
Tasks	Metric Learning, Scene Classification, Text Categorization, Transfer Learning
Published	2019-04-08
URL	http://arxiv.org/abs/1904.04081v1
PDF	http://arxiv.org/pdf/1904.04081v1.pdf
PWC	https://paperswithcode.com/paper/heterogeneous-multi-task-metric-learning
Repo
Framework


Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations


Title	Examining Untempered Social Media: Analyzing Cascades of Polarized Conversations
Authors	Arunkumar Bagavathi, Pedram Bashiri, Shannon Reid, Matthew Phillips, Siddharth Krishnan
Abstract	Online social media periodically serves as a platform for cascading polarizing topics of conversation. The inherent community structure present in online social networks (homophily) and the advent of fringe outlets like Gab have created online “echo chambers” that amplify the effects of polarization, which fuels detrimental behavior. Recently, in October 2018, Gab made headlines when it was revealed that Robert Bowers, the individual behind the Pittsburgh Synagogue massacre, was an active member of this social media site and used it to express his anti-Semitic views and discuss conspiracy theories. Thus, to address the need for automated data-driven analyses of such fringe outlets, this research proposes novel methods to discover topics that are prevalent in Gab and how they cascade within the network. Specifically, using approximately 34 million posts and 3.7 million cascading conversation threads with close to 300k users, we demonstrate that there are essentially five cascading patterns that manifest in Gab and the most “viral” ones begin with an echo-chamber pattern and grow out to the entire network. Also, we empirically show, through two models viz. Susceptible-Infected and Bass, how the cascades structurally evolve from one of the five patterns to the other based on the topic of the conversation with up to 84% accuracy.
Tasks
Published	2019-06-10
URL	https://arxiv.org/abs/1906.04261v1
PDF	https://arxiv.org/pdf/1906.04261v1.pdf
PWC	https://paperswithcode.com/paper/examining-untempered-social-media-analyzing
Repo
Framework