Paper Group ANR 786
Replacing the do-calculus with Bayes rule. Joint group and residual sparse coding for image compressive sensing. Design Space Exploration of Hardware Spiking Neurons for Embedded Artificial Intelligence. Multi-Resolution Weak Supervision for Sequential Data. Galaxy classification: A machine learning analysis of GAMA catalogue data. STEERAGE: Synthe …
Replacing the do-calculus with Bayes rule
Title | Replacing the do-calculus with Bayes rule |
Authors | Finnian Lattimore, David Rohde |
Abstract | The concept of causality has a controversial history. The question of whether it is possible to represent and address causal problems with probability theory, or if fundamentally new mathematics such as the do calculus is required, has been hotly debated, e.g. Pearl (2001) states “the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptoms do not cause disease” and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus”. This has led to a dichotomy between advocates of causal graphical modeling and the do calculus, and researchers applying Bayesian methods. In this paper we demonstrate that, while it is critical to explicitly model our assumptions on the impact of intervening in a system, provided we do so, estimating causal effects can be done entirely within the standard Bayesian paradigm. The invariance assumptions underlying causal graphical models can be encoded in ordinary probabilistic graphical models, allowing causal estimation with Bayesian statistics, equivalent to the do calculus. Elucidating the connections between these approaches is a key step toward enabling the insights provided by each to be combined to solve real problems. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07125v2 |
https://arxiv.org/pdf/1906.07125v2.pdf | |
PWC | https://paperswithcode.com/paper/replacing-the-do-calculus-with-bayes-rule |
Repo | |
Framework | |
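The paper's central claim — that interventions can be handled with ordinary probability once the intervention is modeled explicitly — can be illustrated on a toy discrete graph. All numbers below are hypothetical, not from the paper: intervening on X replaces its mechanism (cutting the Z → X edge), which reduces to the familiar back-door adjustment.

```python
# Toy discrete graph Z -> X, Z -> Y, X -> Y with hypothetical numbers.
p_z = {0: 0.5, 1: 0.5}                 # p(z)
p_x1_given_z = {0: 0.1, 1: 0.9}        # p(x=1 | z)
p_y1_given_x1_z = {0: 0.2, 1: 0.8}     # p(y=1 | x=1, z)

# Intervening replaces the mechanism for X, so Z no longer influences it:
# p(y=1 | do(x=1)) = sum_z p(z) p(y=1 | x=1, z)   (back-door adjustment)
p_do = sum(p_z[z] * p_y1_given_x1_z[z] for z in (0, 1))

# Observing x=1 instead makes Z informative about X, via Bayes rule:
p_x1 = sum(p_z[z] * p_x1_given_z[z] for z in (0, 1))
p_obs = sum(p_z[z] * p_x1_given_z[z] / p_x1 * p_y1_given_x1_z[z]
            for z in (0, 1))
```

Both quantities are plain sums and Bayes rule; the causal content lives entirely in the assumption about which mechanism the intervention replaces.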
Joint group and residual sparse coding for image compressive sensing
Title | Joint group and residual sparse coding for image compressive sensing |
Authors | Lizhao Li, Song Xiao |
Abstract | Nonlocal self-similarity and group sparsity have been widely utilized in image compressive sensing (CS). However, when the sampling rate is low, the internal prior information of degraded images may not be enough for accurate restoration, resulting in loss of image edges and details. In this paper, we propose a joint group and residual sparse coding method for CS image recovery (JGRSC-CS). In the proposed JGRSC-CS, the patch group is treated as the basic unit of sparse coding, and two dictionaries (namely internal and external dictionaries) are applied to exploit the sparse representation of each group simultaneously. The internal self-adaptive dictionary is used to remove artifacts, and an external Gaussian Mixture Model (GMM) dictionary, learned from clean training images, is used to enhance details and texture. To make the proposed method effective and robust, the split Bregman method is adopted to reconstruct the whole image. Experimental results show that the proposed JGRSC-CS algorithm outperforms existing state-of-the-art methods in both peak signal-to-noise ratio (PSNR) and visual quality. |
Tasks | Compressive Sensing |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07720v1 |
http://arxiv.org/pdf/1901.07720v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-group-and-residual-sparse-coding-for |
Repo | |
Framework | |
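The split Bregman reconstruction the paper adopts repeatedly solves an l1-regularized subproblem whose closed-form solution is elementwise soft-thresholding (shrinkage). A minimal sketch of that core step — not the full JGRSC-CS algorithm:

```python
def soft_threshold(x, t):
    """Elementwise shrinkage: the closed-form minimizer of
    0.5*(a - v)**2 + t*|a| for each entry v, the core update inside
    split Bregman / ISTA-style sparse coding solvers."""
    out = []
    for v in x:
        m = abs(v) - t
        out.append(0.0 if m <= 0 else (m if v > 0 else -m))
    return out

coeffs = [1.5, -0.2, 0.05, -3.0]
shrunk = soft_threshold(coeffs, 0.5)   # -> [1.0, 0.0, 0.0, -2.5]
```

Small coefficients are zeroed exactly, which is what produces the sparse group representations the method relies on.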
Design Space Exploration of Hardware Spiking Neurons for Embedded Artificial Intelligence
Title | Design Space Exploration of Hardware Spiking Neurons for Embedded Artificial Intelligence |
Authors | Nassim Abderrahmane, Edgar Lemaire, Benoît Miramond |
Abstract | Machine learning is yielding unprecedented interest in research and industry, due to recent success in many applied contexts such as image classification and object recognition. However, the deployment of these systems requires huge computing capabilities, thus making them unsuitable for embedded systems. To deal with this limitation, many researchers are investigating brain-inspired computing, which would be a perfect alternative to conventional Von Neumann architecture-based computers (CPU/GPU), which meet the requirements for computing performance but not for energy efficiency. Therefore, neuromorphic hardware circuits that are adaptable for both parallel and distributed computations need to be designed. In this paper, we focus on Spiking Neural Networks (SNNs) with a comprehensive study of information coding methods and hardware exploration. In this context, we propose a framework for neuromorphic hardware design space exploration, which makes it possible to define a suitable architecture based on application-specific constraints, starting from a wide variety of possible architectural choices. For this framework, we have developed a behavioral-level simulator for neuromorphic hardware architectural exploration named NAXT. Moreover, we propose modified versions of the standard Rate Coding technique to make trade-offs with the Time Coding paradigm, which is characterized by the low number of spikes propagating in the network. Thus, we are able to reduce the number of spikes while keeping the same neuron model, which results in an SNN with fewer events to process. By doing so, we seek to reduce the amount of power consumed by the hardware. Furthermore, we present three neuromorphic hardware architectures in order to quantitatively study the implementation of SNNs. These architectures are derived from a novel funnel-like Design Space Exploration framework for neuromorphic hardware. |
Tasks | Image Classification, Object Recognition |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.01010v1 |
https://arxiv.org/pdf/1910.01010v1.pdf | |
PWC | https://paperswithcode.com/paper/design-space-exploration-of-hardware-spiking |
Repo | |
Framework | |
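Rate coding, the baseline the paper modifies, maps an input intensity to a spike count over a time window, so lowering the rate directly lowers the number of events to process. A deterministic accumulator sketch — illustrative only, the paper's coding schemes differ in detail:

```python
def rate_code(intensity, timesteps):
    """Deterministic rate coding: accumulate the input each step and
    emit a spike whenever the accumulator crosses 1.  Higher intensity
    means more spikes per window."""
    acc, spikes = 0.0, []
    for _ in range(timesteps):
        acc += intensity
        if acc >= 1.0:
            spikes.append(1)
            acc -= 1.0
        else:
            spikes.append(0)
    return spikes

full = rate_code(0.8, 10)         # high-rate coding of one input value
reduced = rate_code(0.8 / 2, 10)  # halving the rate roughly halves the events
```

Fewer spikes with an unchanged neuron model means fewer events for the hardware to process, which is the power argument the abstract makes.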
Multi-Resolution Weak Supervision for Sequential Data
Title | Multi-Resolution Weak Supervision for Sequential Data |
Authors | Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré |
Abstract | Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence. We propose Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data. Theoretically, we prove that Dugong, under mild conditions, can uniquely recover the unobserved accuracy and correlation parameters and use parameter sharing to improve sample complexity. Our method assigns clinician-validated labels to population-scale biomedical video repositories, helping outperform traditional supervision by 36.8 F1 points and addressing a key use case where machine learning has been severely limited by the lack of expert labeled data. On average, Dugong improves over traditional supervision by 16.0 F1 points and existing weak supervision approaches by 24.2 F1 points across several video and sensor classification tasks. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09505v1 |
https://arxiv.org/pdf/1910.09505v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-resolution-weak-supervision-for |
Repo | |
Framework | |
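Dugong learns the accuracies and correlations of such sources without labels. As a rough, accuracy-agnostic point of comparison only, aggregating multi-resolution sources by majority vote — with a sequence-level source broadcasting its label to every frame — looks like this (all votes hypothetical):

```python
# Hypothetical votes on a 4-frame clip: labels +1 / -1, 0 = abstain.
frame_votes = [
    [+1, +1, -1, +1],   # source 1: per-frame labels
    [+1, 0, -1, -1],    # source 2: abstains on frame 1
]
sequence_vote = +1       # source 3: one label for the whole clip

def vote(frame):
    # The sequence-level source contributes its vote to every frame.
    tally = sequence_vote + sum(src[frame] for src in frame_votes)
    return +1 if tally > 0 else -1

labels = [vote(f) for f in range(4)]
```

Dugong replaces this uniform tally with learned per-source accuracy and correlation parameters, which is what the paper's recovery guarantees are about.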
Galaxy classification: A machine learning analysis of GAMA catalogue data
Title | Galaxy classification: A machine learning analysis of GAMA catalogue data |
Authors | Aleke Nolte, Lingyu Wang, Maciej Bilicki, Benne Holwerda, Michael Biehl |
Abstract | We present a machine learning analysis of five labelled galaxy catalogues from the Galaxy And Mass Assembly (GAMA): The SersicCatVIKING and SersicCatUKIDSS catalogues containing morphological features, the GaussFitSimple catalogue containing spectroscopic features, the MagPhys catalogue including physical parameters for galaxies, and the Lambdar catalogue, which contains photometric measurements. Extending work previously presented at the ESANN 2018 conference - in an analysis based on Generalized Relevance Matrix Learning Vector Quantization and Random Forests - we find that neither the data from the individual catalogues nor a combined dataset based on all 5 catalogues fully supports the visual-inspection-based galaxy classification scheme employed to categorise the galaxies. In particular, only one class, the Little Blue Spheroids, is consistently separable from the other classes. To aid further insight into the nature of the employed visual-based classification scheme with respect to physical and morphological features, we present the galaxy parameters that are discriminative for the achieved class distinctions. |
Tasks | Quantization |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07749v1 |
http://arxiv.org/pdf/1903.07749v1.pdf | |
PWC | https://paperswithcode.com/paper/galaxy-classification-a-machine-learning |
Repo | |
Framework | |
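Generalized Relevance Matrix LVQ builds on prototype-based classification: each class is represented by prototypes that are attracted toward same-class samples and pushed away from others. A minimal LVQ1-style update, with toy 2-D features and made-up class names rather than GAMA catalogue data:

```python
def lvq1_step(prototypes, x, y, lr=0.1):
    """One LVQ1 update: move the nearest prototype toward x if its
    class matches the sample's label, away from x otherwise."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    k = min(range(len(prototypes)), key=lambda i: dist2(prototypes[i][0], x))
    w, c = prototypes[k]
    sign = 1.0 if c == y else -1.0
    prototypes[k] = ([wi + sign * lr * (xi - wi) for wi, xi in zip(w, x)], c)

# Two prototypes for two hypothetical classes in a 2-D feature space.
protos = [([0.0, 0.0], "LBS"), ([1.0, 1.0], "Elliptical")]
lvq1_step(protos, [0.2, 0.1], "LBS")   # nearest prototype matches -> attract
```

The "relevance matrix" part of GRLVQ additionally learns a metric over the features, which is what yields the discriminative galaxy parameters the abstract mentions.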
STEERAGE: Synthesis of Neural Networks Using Architecture Search and Grow-and-Prune Methods
Title | STEERAGE: Synthesis of Neural Networks Using Architecture Search and Grow-and-Prune Methods |
Authors | Shayan Hassantabar, Xiaoliang Dai, Niraj K. Jha |
Abstract | Neural networks (NNs) have been successfully deployed in many applications. However, architectural design of these models is still a challenging problem. Moreover, neural networks are known to have a lot of redundancy. This increases the computational cost of inference and poses an obstacle to deployment on Internet-of-Things sensors and edge devices. To address these challenges, we propose the STEERAGE synthesis methodology. It consists of two complementary approaches: efficient architecture search, and grow-and-prune NN synthesis. The first step, covered in a global search module, uses an accuracy predictor to efficiently navigate the architectural search space. The predictor is built using boosted decision tree regression, iterative sampling, and efficient evolutionary search. The second step involves local search. By using various grow-and-prune methodologies for synthesizing convolutional and feed-forward NNs, it reduces the network redundancy, while boosting its performance. We have evaluated STEERAGE performance on various datasets, including MNIST and CIFAR-10. On the MNIST dataset, our CNN architecture achieves an error rate of 0.66%, with 8.6x fewer parameters compared to the LeNet-5 baseline. For the CIFAR-10 dataset, we used the ResNet architectures as the baseline. Our STEERAGE-synthesized ResNet-18 has a 2.52% accuracy improvement over the original ResNet-18, 1.74% over ResNet-101, and 0.16% over ResNet-1001, while having a comparable number of parameters and FLOPs to the original ResNet-18. This shows that instead of just increasing the number of layers to increase accuracy, an alternative is to use a better NN architecture with fewer layers. In addition, STEERAGE achieves an error rate of just 3.86% with a variant of ResNet architecture with 40 layers. To the best of our knowledge, this is the highest accuracy obtained by ResNet-based architectures on the CIFAR-10 dataset. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05831v1 |
https://arxiv.org/pdf/1912.05831v1.pdf | |
PWC | https://paperswithcode.com/paper/steerage-synthesis-of-neural-networks-using |
Repo | |
Framework | |
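The pruning half of grow-and-prune synthesis typically removes low-magnitude weights. A toy magnitude-pruning sketch — a stand-in for intuition, not STEERAGE's actual criterion:

```python
def prune_by_magnitude(weights, keep_ratio):
    """Zero out the smallest-magnitude weights, keeping a fraction of
    them: the redundancy-removal step of grow-and-prune synthesis."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
pruned = prune_by_magnitude(w, 0.5)   # keep the 3 largest magnitudes
```

The complementary "grow" phase adds connections where gradients indicate they would help, so the final network keeps accuracy with far fewer parameters.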
Gated recurrent units viewed through the lens of continuous time dynamical systems
Title | Gated recurrent units viewed through the lens of continuous time dynamical systems |
Authors | Ian D. Jordan, Piotr Aleksander Sokol, Il Memming Park |
Abstract | Gated recurrent units (GRUs) are specialized memory elements for building recurrent neural networks. Despite their incredible success in natural language, speech, and video processing, little is understood about the specific dynamics representable in a GRU network, along with the constraints these dynamics impose when generalizing a specific task. As a result, it is difficult to know a priori how well a GRU network will perform on a given task. Using a continuous time analysis, we gain intuition on the inner workings of GRU networks. We restrict our presentation to low dimensions to allow for a comprehensive visualization. We found a surprisingly rich repertoire of dynamical features that includes stable limit cycles (nonlinear oscillations), multi-stable dynamics with various topologies, and homoclinic orbits. We contextualize the usefulness of the different kinds of dynamics and experimentally test their existence. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01005v1 |
https://arxiv.org/pdf/1906.01005v1.pdf | |
PWC | https://paperswithcode.com/paper/gated-recurrent-units-viewed-through-the-lens |
Repo | |
Framework | |
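The continuous-time view reads the discrete GRU update h ← (1 − z)⊙h + z⊙h̃ as an Euler step of the flow dh/dt = z⊙(h̃ − h). A 1-D sketch with hypothetical scalar weights, whose trajectory settles onto a stable fixed point — the simplest of the dynamical features the paper catalogues:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_flow(h, dt=0.1):
    """One Euler step of the continuous-time 1-D GRU
    dh/dt = z * (h_cand - h), with made-up scalar weights."""
    z = sigmoid(0.5 * h + 0.1)             # update gate
    r = sigmoid(1.0 * h)                   # reset gate
    h_cand = math.tanh(2.0 * r * h + 0.5)  # candidate state
    return h + dt * z * (h_cand - h)

h = 0.0
for _ in range(200):
    h = gru_flow(h)   # flows toward a stable fixed point inside (-1, 1)
```

Richer choices of weights (and more than one dimension) yield the limit cycles and multi-stable regimes the abstract describes.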
Hello, It’s GPT-2 – How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems
Title | Hello, It’s GPT-2 – How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems |
Authors | Paweł Budzianowski, Ivan Vulić |
Abstract | Data scarcity is a long-standing and crucial challenge that hinders quick development of task-oriented dialogue systems across multiple domains: task-oriented dialogue models are expected to learn grammar, syntax, dialogue reasoning, decision making, and language generation from absurdly small amounts of task-specific data. In this paper, we demonstrate that recent progress in language modeling pre-training and transfer learning shows promise to overcome this problem. We propose a task-oriented dialogue model that operates solely on text input: it effectively bypasses explicit policy and language generation modules. Building on top of the TransferTransfo framework (Wolf et al., 2019) and generative model pre-training (Radford et al., 2019), we validate the approach on complex multi-domain task-oriented dialogues from the MultiWOZ dataset. Our automatic and human evaluations show that the proposed model is on par with a strong task-specific neural baseline. In the long run, our approach holds promise to mitigate the data scarcity problem, and to support the construction of more engaging and more eloquent task-oriented conversational agents. |
Tasks | Decision Making, Language Modelling, Task-Oriented Dialogue Systems, Text Generation, Transfer Learning |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05774v2 |
https://arxiv.org/pdf/1907.05774v2.pdf | |
PWC | https://paperswithcode.com/paper/hello-its-gpt-2-how-can-i-help-you-towards |
Repo | |
Framework | |
Kernelized Complete Conditional Stein Discrepancy
Title | Kernelized Complete Conditional Stein Discrepancy |
Authors | Raghav Singhal, Xintian Han, Saad Lahlou, Rajesh Ranganath |
Abstract | Much of machine learning relies on comparing distributions with discrepancy measures. Stein’s method creates discrepancy measures between two distributions that require only the unnormalized density of one and samples from the other. Stein discrepancies can be combined with kernels to define kernelized Stein discrepancies (KSDs). While kernels make Stein discrepancies tractable, they pose several challenges in high dimensions. We introduce kernelized complete conditional Stein discrepancies (KCC-SDs). Complete conditionals turn a multivariate distribution into multiple univariate distributions. We show that KCC-SDs distinguish distributions. We empirically show that KCC-SDs detect non-convergence where KSDs fail. Our experiments illustrate the efficacy of KCC-SDs compared to KSDs for comparing high-dimensional distributions. |
Tasks | |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04478v3 |
https://arxiv.org/pdf/1904.04478v3.pdf | |
PWC | https://paperswithcode.com/paper/kernelized-complete-conditional-stein |
Repo | |
Framework | |
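Stein discrepancies rest on Stein's identity: for x ~ p and suitable f, E[score(x)·f(x) + f′(x)] = 0, so a nonzero estimate signals a mismatch between samples and density. A 1-D Monte Carlo sketch for the standard normal — illustrative of the principle only, not a KSD or KCC-SD implementation:

```python
import random

random.seed(0)
n = 50_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

# For N(0, 1) the score is d/dx log p(x) = -x; with f(x) = x the Stein
# summand is score(x)*f(x) + f'(x) = 1 - x**2, which has mean zero
# under the correct distribution.
matched = sum(1.0 - x * x for x in samples) / n

# Samples from the wrong distribution (shifted by 0.5) break the
# identity, so the same statistic moves away from zero.
mismatched = sum(1.0 - (x + 0.5) ** 2 for x in samples) / n
```

KSDs take a supremum of such expectations over a kernel-defined function class; KCC-SDs apply the idea to each complete conditional, which is what restores discriminative power in high dimensions.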
Bilinear Models for Machine Learning
Title | Bilinear Models for Machine Learning |
Authors | Tayssir Doghri, Leszek Szczecinski, Jacob Benesty, Amar Mitiche |
Abstract | In this work we define and analyze bilinear models, which replace the conventional linear operation used in many building blocks of machine learning (ML). The main idea is to devise ML algorithms that are adapted to the objects they treat. In the case of monochromatic images, we show that the bilinear operation exploits the structure of the image better than the conventional linear operation, which ignores the spatial relationship between the pixels. This translates into a significantly smaller number of parameters required to yield the same performance. We show numerical examples of classification on the MNIST data set. |
Tasks | |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03354v1 |
https://arxiv.org/pdf/1912.03354v1.pdf | |
PWC | https://paperswithcode.com/paper/bilinear-models-for-machine-learning |
Repo | |
Framework | |
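The parameter saving is easy to see: a bilinear functional y = uᵀXv on an m×n image needs m + n parameters, while a linear functional wᵀvec(X) needs m·n. A toy 2×3 example with made-up weights:

```python
# Bilinear map y = u^T X v on a 2x3 "image" (hypothetical values).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
u = [1.0, -1.0]          # m = 2 row weights
v = [0.5, 0.0, 0.5]      # n = 3 column weights

Xv = [sum(row[j] * v[j] for j in range(3)) for row in X]   # X @ v
y = sum(u[i] * Xv[i] for i in range(2))                    # u . (X @ v)

params_bilinear = len(u) + len(v)     # m + n = 5
params_linear = len(X) * len(X[0])    # m * n = 6, and the gap grows fast
```

For a 28×28 MNIST image the comparison is 56 versus 784 parameters per output unit, which is the source of the savings the abstract reports.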
Mixed-Variable Bayesian Optimization
Title | Mixed-Variable Bayesian Optimization |
Authors | Erik Daxberger, Anastasia Makarova, Matteo Turchetta, Andreas Krause |
Abstract | The optimization of expensive-to-evaluate, black-box, mixed-variable functions, i.e. functions that have continuous and discrete inputs, is a difficult and yet pervasive problem in science and engineering. In Bayesian optimization (BO), special cases of this problem that consider fully continuous or fully discrete domains have been widely studied. However, few methods exist for mixed-variable domains. In this paper, we introduce MiVaBo, a novel BO algorithm for the efficient optimization of mixed-variable functions that combines a linear surrogate model based on expressive feature representations with Thompson sampling. We propose two methods to optimize its acquisition function, a challenging problem for mixed-variable domains, and we show that MiVaBo can handle complex constraints over the discrete part of the domain that other methods cannot take into account. Moreover, we provide the first convergence analysis of a mixed-variable BO algorithm. Finally, we show that MiVaBo is significantly more sample efficient than state-of-the-art mixed-variable BO algorithms on hyperparameter tuning tasks. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01329v3 |
https://arxiv.org/pdf/1907.01329v3.pdf | |
PWC | https://paperswithcode.com/paper/mixed-variable-bayesian-optimization |
Repo | |
Framework | |
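Thompson sampling on a linear surrogate means drawing weights from the posterior and maximizing the sampled model over candidates. A heavily simplified sketch, with an independent Gaussian posterior and hypothetical feature vectors — not MiVaBo's actual surrogate, feature construction, or constraint handling:

```python
import random

random.seed(1)

# Hypothetical posterior over 3 surrogate weights.
posterior_mean = [0.8, -0.3, 0.5]
posterior_std = [0.05, 0.1, 0.05]

# Mixed-variable candidates as features [continuous value, one-hot choice].
candidates = [[0.2, 1.0, 0.0], [0.9, 0.0, 1.0], [0.5, 1.0, 0.0]]

# Thompson sampling: draw one weight vector from the posterior, then
# pick the candidate that maximizes the sampled linear model.
w = [random.gauss(m, s) for m, s in zip(posterior_mean, posterior_std)]
scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)
```

In the real algorithm the inner maximization over the mixed domain is itself hard — that is the acquisition-optimization problem for which the paper proposes two methods.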
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
Title | NEZHA: Neural Contextualized Representation for Chinese Language Understanding |
Authors | Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu |
Abstract | The pre-trained language models have achieved great successes in various natural language understanding (NLU) tasks due to their capacity to capture the deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and finetuning for Chinese NLU tasks. The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models. The experimental results show that NEZHA achieves state-of-the-art performance when finetuned on several representative Chinese tasks, including named entity recognition (People’s Daily NER), sentence matching (LCQMC), Chinese sentiment classification (ChnSenti) and natural language inference (XNLI). |
Tasks | Named Entity Recognition, Natural Language Inference, Sentiment Analysis |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.00204v2 |
https://arxiv.org/pdf/1909.00204v2.pdf | |
PWC | https://paperswithcode.com/paper/nezha-neural-contextualized-representation |
Repo | |
Framework | |
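Whole Word Masking, one of the listed improvements, masks every sub-word piece of a chosen word rather than individual pieces in isolation. A sketch with a hypothetical WordPiece-style tokenization (the tokens and the convention that "##" marks a continuation piece are illustrative):

```python
tokens = ["huawei", "no", "##ah", "ark", "lab"]   # "##" marks sub-word pieces
word_starts = [i for i, t in enumerate(tokens) if not t.startswith("##")]

def mask_whole_word(tokens, start):
    """Mask the piece at `start` and every following continuation
    piece, so the model must predict the whole word from context."""
    out = list(tokens)
    i = start
    out[i] = "[MASK]"
    i += 1
    while i < len(out) and out[i].startswith("##"):
        out[i] = "[MASK]"
        i += 1
    return out

masked = mask_whole_word(tokens, 1)   # masks "no" and "##ah" together
```

Masking whole words prevents the trivial shortcut of reconstructing a word from its own visible pieces, which is why it improves pre-training.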
Low-bit Quantization of Neural Networks for Efficient Inference
Title | Low-bit Quantization of Neural Networks for Efficient Inference |
Authors | Yoni Choukroun, Eli Kravchik, Fan Yang, Pavel Kisilev |
Abstract | Recent machine learning methods use increasingly large deep neural networks to achieve state-of-the-art results in various tasks. The gains in performance come at the cost of a substantial increase in computation and storage requirements. This makes real-time implementation on resource-limited hardware a challenging task. One popular approach to address this challenge is to perform low-bit precision computations via neural network quantization. However, aggressive quantization generally entails a severe penalty in terms of accuracy, and often requires retraining of the network, or resorting to higher bit precision quantization. In this paper, we formalize the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit precision inference without the need for full network retraining. The main contributions of our approach are the optimizations of the constrained MSE problem at each layer of the network, the hardware-aware partitioning of the network parameters, and the use of multiple low precision quantized tensors for poorly approximated layers. The proposed approach allows 4-bit integer (INT4) quantization for deployment of pretrained models on limited hardware resources. Multiple experiments on various network architectures show that the suggested method yields state-of-the-art results with minimal loss of task accuracy. |
Tasks | Quantization |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06822v2 |
http://arxiv.org/pdf/1902.06822v2.pdf | |
PWC | https://paperswithcode.com/paper/low-bit-quantization-of-neural-networks-for |
Repo | |
Framework | |
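The core formulation — choose quantization parameters by minimizing reconstruction MSE — can be sketched as a grid search over the scale of a signed 4-bit (INT4) linear quantizer. A toy stand-in for the paper's per-layer optimization, with made-up weights:

```python
def quantize(x, scale):
    """Linear quantization to the signed 4-bit integer range [-8, 7],
    then reconstruction back to a real value."""
    q = max(-8, min(7, round(x / scale)))
    return q * scale

def mmse_scale(values, grid):
    """Pick the scale minimizing the mean squared reconstruction error,
    a grid-search stand-in for the MMSE optimization."""
    def mse(s):
        return sum((v - quantize(v, s)) ** 2 for v in values) / len(values)
    return min(grid, key=mse)

weights = [0.03, -0.41, 0.22, 0.75, -0.6, 0.11]
grid = [0.02 + 0.01 * i for i in range(15)]   # candidate scales
best_scale = mmse_scale(weights, grid)
```

Too small a scale clips large weights; too large a scale wastes the few integer levels on empty range — the MMSE objective balances the two without retraining.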
On Measuring and Mitigating Biased Inferences of Word Embeddings
Title | On Measuring and Mitigating Biased Inferences of Word Embeddings |
Authors | Sunipa Dev, Tao Li, Jeff Phillips, Vivek Srikumar |
Abstract | Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them. We use this observation to design a mechanism for measuring stereotypes using the task of natural language inference. We demonstrate a reduction in invalid inferences via bias mitigation strategies on static word embeddings (GloVe). Further, we show that for gender bias, these techniques extend to contextualized embeddings when applied selectively only to the static components of contextualized embeddings (ELMo, BERT). |
Tasks | Natural Language Inference, Word Embeddings |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09369v3 |
https://arxiv.org/pdf/1908.09369v3.pdf | |
PWC | https://paperswithcode.com/paper/on-measuring-and-mitigating-biased-inferences |
Repo | |
Framework | |
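A standard mechanism behind such mitigation strategies for static embeddings is projecting out a bias direction from each word vector. A toy 3-D sketch — the direction and vectors are made up, and the paper's measurement and mitigation methods are more involved:

```python
def remove_component(v, direction):
    """Subtract the projection of v onto a unit-norm direction, so the
    result is orthogonal to (carries no component along) that direction."""
    dot = sum(a * b for a, b in zip(v, direction))
    return [a - dot * b for a, b in zip(v, direction)]

bias_dir = [1.0, 0.0, 0.0]     # pretend gender direction (unit norm)
word = [0.7, 0.2, -0.4]        # pretend static word embedding
debiased = remove_component(word, bias_dir)
```

Applying this only to the static components is how the paper extends the technique to contextualized embeddings such as ELMo and BERT.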
4-D Scene Alignment in Surveillance Video
Title | 4-D Scene Alignment in Surveillance Video |
Authors | Robert Wagner, Daniel Crispell, Patrick Feeney, Joe Mundy |
Abstract | Designing robust activity detectors for fixed camera surveillance video requires knowledge of the 3-D scene. This paper presents an automatic camera calibration process that provides a mechanism to reason about the spatial proximity between objects at different times. It combines a CNN-based camera pose estimator with a vertical scale provided by pedestrian observations to establish the 4-D scene geometry. Unlike some previous methods, the people do not need to be tracked nor do the head and feet need to be explicitly detected. It is robust to individual height variations and camera parameter estimation errors. |
Tasks | Calibration |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01675v2 |
https://arxiv.org/pdf/1906.01675v2.pdf | |
PWC | https://paperswithcode.com/paper/4-d-scene-alignment-in-surveillance-video |
Repo | |
Framework | |