January 25, 2020

2991 words 15 mins read

Paper Group NAWR 32

Towards Hardware-Aware Tractable Learning of Probabilistic Models. Language Models are Unsupervised Multitask Learners. Online-Within-Online Meta-Learning. GesturePod: Enabling On-device Gesture-based Interaction for White Cane Users. Clustering-Based Article Identification in Historical Newspapers. Semi-supervised deep embedded clustering. Qsparse …

Towards Hardware-Aware Tractable Learning of Probabilistic Models

Title Towards Hardware-Aware Tractable Learning of Probabilistic Models
Authors Laura I. Galindez Olascoaga, Wannes Meert, Nimish Shah, Marian Verhelst, Guy Van Den Broeck
Abstract Smart portable applications increasingly rely on edge computing due to privacy and latency concerns. But guaranteeing always-on functionality comes with two major challenges: heavily resource-constrained hardware and dynamic application conditions. Probabilistic models present an ideal solution to these challenges: they are robust to missing data, allow for joint predictions, and have small data needs. In addition, ongoing efforts in the field of tractable learning have resulted in probabilistic models with strict inference efficiency guarantees. However, the current notions of tractability are often limited to model complexity, disregarding the hardware’s specifications and constraints. We propose a novel resource-aware cost metric that takes the hardware’s properties into consideration when determining whether the inference task can be efficiently deployed. We use this metric to evaluate the performance versus resource trade-off relevant to the application of interest, and we propose a strategy that selects the device settings that can optimally meet users’ requirements. We showcase our framework on a mobile activity recognition scenario and on a variety of benchmark datasets representative of the field of tractable learning and of the applications of interest.
Tasks Activity Recognition
Published 2019-12-01
URL http://papers.nips.cc/paper/9525-towards-hardware-aware-tractable-learning-of-probabilistic-models
PDF http://papers.nips.cc/paper/9525-towards-hardware-aware-tractable-learning-of-probabilistic-models.pdf
PWC https://paperswithcode.com/paper/towards-hardware-aware-tractable-learning-of
Repo https://github.com/laurago894/HwAwareProb
Framework none
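
The paper's selection step amounts to keeping only the operating points that are not dominated on the accuracy-versus-cost plane. A minimal sketch of that idea, assuming hypothetical configuration names and numbers (this is not the authors' code from the repo above):

```python
# Minimal sketch: keep Pareto-optimal (accuracy, hardware cost) operating
# points among candidate model/device configurations. All names and
# numbers are illustrative.

def pareto_front(candidates):
    """Return candidates not dominated in (higher accuracy, lower cost)."""
    front = []
    for name, acc, cost in candidates:
        dominated = any(a >= acc and c <= cost and (a > acc or c < cost)
                        for _, a, c in candidates)
        if not dominated:
            front.append((name, acc, cost))
    return sorted(front, key=lambda t: t[2])

# Hypothetical (accuracy, cost-in-operations) pairs for pruned models:
configs = [("full", 0.94, 1.0e6), ("pruned-a", 0.93, 4.0e5),
           ("pruned-b", 0.90, 1.5e5), ("pruned-c", 0.89, 3.0e5)]
print(pareto_front(configs))  # pruned-c is dominated by pruned-b
```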

Language Models are Unsupervised Multitask Learners

Title Language Models are Unsupervised Multitask Learners
Authors Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Abstract Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset, matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Tasks Common Sense Reasoning, Document Summarization, Language Modelling, Machine Translation, Question Answering, Reading Comprehension, Text Generation
Published 2019-02-14
URL https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
PDF https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf
PWC https://paperswithcode.com/paper/language-models-are-unsupervised-multitask
Repo https://github.com/openai/gpt-2
Framework tf
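
Zero-shot behavior comes purely from conditioning the model on a task-shaped prompt; the paper's summarization trick, for instance, appends "TL;DR:" to the document. A hedged sketch using the Hugging Face `transformers` port of GPT-2 (the official repo above is TensorFlow; the generation settings here are illustrative):

```python
# Hedged sketch: zero-shot conditional generation from a GPT-2 checkpoint
# via the Hugging Face `transformers` port, not the authors' TF release.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "TL;DR:" is the paper's prompt for coaxing zero-shot summarization.
prompt = "The quick brown fox jumped over the lazy dog. TL;DR:"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
out = model.generate(input_ids, max_length=60, do_sample=True, top_k=40,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```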

Online-Within-Online Meta-Learning

Title Online-Within-Online Meta-Learning
Authors Giulia Denevi, Dimitris Stamos, Carlo Ciliberto, Massimiliano Pontil
Abstract We study the problem of learning a series of tasks in a fully online Meta-Learning setting. The goal is to exploit similarities among the tasks to incrementally adapt an inner online algorithm in order to incur a low averaged cumulative error over the tasks. We focus on a family of inner algorithms based on a parametrized variant of online Mirror Descent. The inner algorithm is incrementally adapted by an online Mirror Descent meta-algorithm using the corresponding within-task minimum regularized empirical risk as the meta-loss. In order to keep the process fully online, we approximate the meta-subgradients by the online inner algorithm. An upper bound on the approximation error allows us to derive a cumulative error bound for the proposed method. Our analysis can also be converted to the statistical setting by online-to-batch arguments. We instantiate two examples of the framework in which the meta-parameter is either a common bias vector or feature map. Finally, preliminary numerical experiments confirm our theoretical findings.
Tasks Meta-Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/9468-online-within-online-meta-learning
PDF http://papers.nips.cc/paper/9468-online-within-online-meta-learning.pdf
PWC https://paperswithcode.com/paper/online-within-online-meta-learning
Repo https://github.com/dstamos/Adversarial-LTL
Framework none
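
A rough sketch of the online-within-online loop under simplifying assumptions: the inner algorithm is plain online gradient descent regularized toward a bias vector h (the Euclidean special case of mirror descent), and after each task the meta-learner nudges h toward the task's solution as a cheap stand-in for the meta-subgradient. All names and step sizes are illustrative:

```python
# Euclidean simplification of the paper's mirror-descent framework;
# not the authors' code.
import numpy as np

def inner_ogd(task_xy, h, lam=1.0, eta=0.1):
    w = h.copy()                       # start at the bias (regularization center)
    for x, y in task_xy:
        g = (w @ x - y) * x + lam * (w - h)   # squared loss + bias regularizer
        w -= eta * g
    return w

rng = np.random.default_rng(0)
h = np.zeros(5)                        # meta-parameter (common bias)
w_star = rng.normal(size=5)            # tasks share this underlying vector
for t in range(50):                    # stream of related regression tasks
    xs = rng.normal(size=(20, 5))
    ys = xs @ (w_star + 0.1 * rng.normal(size=5))
    w_t = inner_ogd(list(zip(xs, ys)), h)
    h += (0.5 / (t + 1)) * (w_t - h)   # outer (meta) online update of the bias
print(np.linalg.norm(h - w_star))      # bias drifts toward the shared vector
```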

GesturePod: Enabling On-device Gesture-based Interaction for White Cane Users

Title GesturePod: Enabling On-device Gesture-based Interaction for White Cane Users
Authors Shishir G. Patil, Don Dennis, Chirag Pabbaraju, Nadeem Shaheer, Harsha Vardhan Simhadri, Vivek Seshadri, Manik Varma, Prateek Jain
Abstract People using white canes for navigation find it challenging to concurrently access devices such as smartphones. Building on prior research on abandonment of specialized devices, we explore a new touch-free mode of interaction wherein a person with visual impairment can perform gestures on their existing white cane to trigger tasks on their smartphone. We present GesturePod, an easy-to-integrate device that clips onto any white cane, and detects gestures performed with the cane. With GesturePod, a user can perform common tasks on their smartphone without touch or even removing the phone from their pocket or bag. We discuss the challenges in building the device and our design choices. We propose a novel, efficient machine learning pipeline to train and deploy the gesture recognition model. Our in-lab study shows that GesturePod achieves 92% gesture recognition accuracy and can help perform common smartphone tasks faster. Our in-wild study suggests that GesturePod is a promising tool to improve smartphone access for people with visual impairments, especially in constrained outdoor scenarios.
Tasks Gesture Recognition, Time Series, Time Series Classification
Published 2019-10-20
URL https://dl.acm.org/citation.cfm?id=3347881
PDF https://github.com/microsoft/EdgeML/blob/master/docs/publications/GesturePod-UIST19.pdf
PWC https://paperswithcode.com/paper/gesturepod-enabling-on-device-gesture-based
Repo https://github.com/Microsoft/EdgeML
Framework tf
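
At its core, the recognition task is time-series classification over fixed-length IMU windows. An illustrative sketch only (the real pipeline uses Microsoft's EdgeML tooling on a microcontroller; window length, features, and class count are all assumptions):

```python
# Toy windowed-feature pipeline standing in for the on-device gesture model.
import numpy as np

def window_features(imu, win=128, hop=64):
    """imu: (T, 6) accelerometer+gyroscope stream -> per-window statistics."""
    feats = []
    for start in range(0, len(imu) - win + 1, hop):
        w = imu[start:start + win]
        feats.append(np.concatenate([w.mean(0), w.std(0),
                                     np.abs(np.diff(w, axis=0)).mean(0)]))
    return np.array(feats)                      # (n_windows, 18)

# A tiny linear classifier stands in for the trained gesture model:
rng = np.random.default_rng(0)
X = window_features(rng.normal(size=(1024, 6)))
W = rng.normal(size=(18, 5)) * 0.01             # 5 hypothetical gesture classes
pred = (X @ W).argmax(axis=1)
```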

Clustering-Based Article Identification in Historical Newspapers

Title Clustering-Based Article Identification in Historical Newspapers
Authors Martin Riedl, Daniela Betz, Sebastian Padó
Abstract This article focuses on the problem of identifying articles and recovering their text from within and across newspaper pages when OCR delivers just one text file per page. We frame the task as a segmentation step plus a clustering step. Our results on a sample of the 1912 New York Tribune magazine show that performing the clustering based on similarities computed with word embeddings outperforms a similarity measure based on character n-grams and words. Furthermore, the automatic segmentation based on the text yields low scores, due to the low quality of some OCRed documents.
Tasks Optical Character Recognition, Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2502/
PDF https://www.aclweb.org/anthology/W19-2502
PWC https://paperswithcode.com/paper/clustering-based-article-identification-in
Repo https://github.com/riedlma/cluster_identification
Framework none
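
The clustering step can be pictured as: embed each OCRed segment by averaging word vectors, then merge segments whose embeddings are similar enough. A toy sketch (the embedding lookup, the greedy merge, and the threshold are assumptions, not the authors' pipeline):

```python
import numpy as np

def embed(segment, vectors, dim=100):
    """Average the word vectors of a segment (zero vector if none found)."""
    hits = [vectors[w] for w in segment.lower().split() if w in vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

def cluster(segments, vectors, threshold=0.8):
    """Greedily assign segments to the same article if cosine-similar."""
    embs = [embed(s, vectors) for s in segments]
    labels = list(range(len(segments)))          # each segment starts alone
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            num = embs[i] @ embs[j]
            den = np.linalg.norm(embs[i]) * np.linalg.norm(embs[j])
            if den > 0 and num / den > threshold:
                labels[j] = labels[i]            # merge j into i's article
    return labels
```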

Semi-supervised deep embedded clustering

Title Semi-supervised deep embedded clustering
Authors Yazhou Ren, Kangrong Hu, Xinyi Dai, Lili Pan, Steven C.H. Hoi, Zenglin Xu
Abstract Clustering is an important topic in machine learning and data mining. Recently, deep clustering, which learns feature representations for clustering tasks using deep neural networks, has attracted increasing attention for various clustering applications. Deep embedded clustering (DEC) is one of the state-of-the-art deep clustering methods. However, DEC does not make use of prior knowledge to guide the learning process. In this paper, we propose a new scheme of semi-supervised deep embedded clustering (SDEC) to overcome this limitation. Concretely, SDEC learns feature representations that favor the clustering tasks and performs clustering assignments simultaneously. In contrast to DEC, SDEC incorporates pairwise constraints in the feature learning process such that data samples belonging to the same cluster are close to each other and data samples belonging to different clusters are far away from each other in the learned feature space. Extensive experiments on real benchmark data sets validate the effectiveness and robustness of the proposed method.
Tasks
Published 2019-01-24
URL https://www.sciencedirect.com/science/article/pii/S0925231218312049
PDF https://www.sciencedirect.com/science/article/pii/S0925231218312049/pdfft?md5=8a996048c4341a32dc25417827d6bf2e&pid=1-s2.0-S0925231218312049-main.pdf
PWC https://paperswithcode.com/paper/semi-supervised-deep-embedded-clustering
Repo https://github.com/yongzx/SDEC-Keras
Framework none
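
Under stated assumptions, the SDEC objective combines DEC's KL term between soft assignments and sharpened targets with a pairwise constraint penalty. A PyTorch sketch, where `a` is a matrix with +1 for must-link pairs, -1 for cannot-link pairs, and 0 otherwise (the weight `gamma` is illustrative):

```python
import torch

def soft_assign(z, mu, alpha=1.0):
    # Student's t kernel over distances to cluster centers (as in DEC)
    d2 = torch.cdist(z, mu) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

def sdec_loss(z, mu, a, gamma=0.1):
    q = soft_assign(z, mu)
    p = target_distribution(q).detach()
    kl = (p * (p.log() - q.log())).sum(dim=1).mean()
    pair = (a * torch.cdist(z, z) ** 2).mean()   # +1 attracts, -1 repels
    return kl + gamma * pair
```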

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations

Title Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations
Authors Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
Abstract The communication bottleneck has been identified as a significant issue in the distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression and computing local models and mixing them iteratively. In this paper we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation, along with error compensation, by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of Qsparse-local-SGD. We analyze convergence of Qsparse-local-SGD in the distributed setting, for smooth non-convex and convex objective functions. We demonstrate that Qsparse-local-SGD converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use Qsparse-local-SGD to train ResNet-50 on ImageNet, and show that it results in significant savings over the state of the art in the number of bits transmitted to reach a target accuracy.
Tasks Distributed Optimization, Quantization
Published 2019-12-01
URL http://papers.nips.cc/paper/9610-qsparse-local-sgd-distributed-sgd-with-quantization-sparsification-and-local-computations
PDF http://papers.nips.cc/paper/9610-qsparse-local-sgd-distributed-sgd-with-quantization-sparsification-and-local-computations.pdf
PWC https://paperswithcode.com/paper/qsparse-local-sgd-distributed-sgd-with-1
Repo https://github.com/karakusc/horovod
Framework tf
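
The compressor at the heart of the method composes top-k sparsification with quantization and keeps an error-feedback memory of whatever was discarded. A minimal numpy sketch of that single step (the quantizer and k are illustrative; the distributed synchronous/asynchronous protocols are in the paper):

```python
import numpy as np

def qsparse(grad, memory, k=10):
    corrected = grad + memory                  # error compensation
    idx = np.argsort(np.abs(corrected))[-k:]   # keep the top-k coordinates
    sparse = np.zeros_like(corrected)
    scale = np.abs(corrected[idx]).mean()      # sign/magnitude-style quantizer
    sparse[idx] = scale * np.sign(corrected[idx])
    memory[:] = corrected - sparse             # remember the residual
    return sparse

rng = np.random.default_rng(0)
mem = np.zeros(1000)
g = rng.normal(size=1000)
compressed = qsparse(g, mem)                   # ~99% of coordinates dropped
```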

Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization

Title Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization
Authors Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck Cadambe
Abstract Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms to train large neural networks. In recent years, there has been a great deal of research to alleviate communication cost by compressing the gradient vector or using local updates and periodic model averaging. In this paper, we advocate the use of redundancy in communication-efficient distributed stochastic algorithms for non-convex optimization. In particular, we show, both theoretically and practically, that by properly infusing redundancy into the training data with model averaging, it is possible to significantly reduce the number of communication rounds. More precisely, we show that redundancy reduces the residual error in local averaging, thereby reaching the same level of accuracy with fewer rounds of communication than previous algorithms. Empirical studies on the CIFAR10, CIFAR100 and ImageNet datasets in a distributed environment complement our theoretical results; they show that our algorithms have additional beneficial aspects, including tolerance to failures as well as greater gradient diversity.
Tasks Distributed Optimization
Published 2019-07-03
URL http://proceedings.mlr.press/v97/haddadpour19a.html
PDF http://proceedings.mlr.press/v97/haddadpour19a/haddadpour19a.pdf
PWC https://paperswithcode.com/paper/trading-redundancy-for-communication-speeding
Repo https://github.com/mmkamani7/RI-SGD
Framework tf
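
The training loop being analyzed is local SGD with periodic model averaging over (partially redundant) data shards. A toy numpy sketch for least-squares workers, assuming hypothetical shard and step-size choices (the paper's contribution is the analysis of how shard redundancy shrinks the residual error of the averaging):

```python
import numpy as np

def local_sgd(shards, steps=100, H=10, eta=0.05, dim=5):
    workers = [np.zeros(dim) for _ in shards]
    for t in range(steps):
        for w, (X, y) in zip(workers, shards):
            i = t % len(X)
            w -= eta * (w @ X[i] - y[i]) * X[i]     # one local SGD step
        if (t + 1) % H == 0:                        # periodic model averaging
            avg = np.mean(workers, axis=0)
            workers = [avg.copy() for _ in workers]
    return np.mean(workers, axis=0)

rng = np.random.default_rng(1)
w_true = rng.normal(size=5)
shards = []
for _ in range(4):                                  # 4 workers
    X = rng.normal(size=(40, 5))
    shards.append((X, X @ w_true))
w_avg = local_sgd(shards)
```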

Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks

Title Surround Modulation: A Bio-inspired Connectivity Structure for Convolutional Neural Networks
Authors Hosein Hasani, Mahdieh Soleymani, Hamid Aghajan
Abstract Numerous neurophysiological studies have revealed that a large number of the primary visual cortex neurons operate in a regime called surround modulation. Surround modulation has a substantial effect on various perceptual tasks, and it also plays a crucial role in the efficient neural coding of the visual cortex. Inspired by the notion of surround modulation, we designed new excitatory-inhibitory connections between a unit and its surrounding units in the convolutional neural network (CNN) to achieve a more biologically plausible network. Our experiments show that this simple mechanism can considerably improve both the performance and training speed of traditional CNNs in visual tasks. We further explore additional outcomes of the proposed structure. We first evaluate the model under several visual challenges, such as the presence of clutter or changes in lighting conditions, and show its superior generalization capability in handling these challenging situations. We then study possible changes in the statistics of neural activities, such as sparsity and decorrelation, and provide further insight into the underlying efficiencies of surround modulation. Experimental results show that importing surround modulation into the convolutional layers produces various effects analogous to those induced by surround modulation in the visual cortex.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks
PDF http://papers.nips.cc/paper/9719-surround-modulation-a-bio-inspired-connectivity-structure-for-convolutional-neural-networks.pdf
PWC https://paperswithcode.com/paper/surround-modulation-a-bio-inspired
Repo https://github.com/HoseinHasani/SM-CNN
Framework none
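
One way to read the mechanism: a fixed difference-of-Gaussians kernel gives each unit an excitatory center and an inhibitory surround. A hedged PyTorch sketch applying such a kernel depthwise to a feature map (kernel size and sigmas are assumptions, not the paper's exact parameterization):

```python
import math
import torch
import torch.nn.functional as F

def dog_kernel(size=5, sigma_e=1.0, sigma_i=2.0):
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    d2 = ax.reshape(-1, 1) ** 2 + ax.reshape(1, -1) ** 2  # squared distances
    def gauss(s):
        return torch.exp(-d2 / (2 * s ** 2)) / (2 * math.pi * s ** 2)
    return gauss(sigma_e) - gauss(sigma_i)   # excitatory center, inhibitory surround

def surround_modulate(fmap):
    """fmap: (N, C, H, W); applies the fixed DoG kernel depthwise."""
    c = fmap.shape[1]
    k = dog_kernel().reshape(1, 1, 5, 5).repeat(c, 1, 1, 1)
    return F.relu(fmap + F.conv2d(fmap, k, padding=2, groups=c))
```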

Online Continual Learning with Maximal Interfered Retrieval

Title Online Continual Learning with Maximal Interfered Retrieval
Authors Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia
Abstract Continual learning, the setting where a learning agent is faced with a never-ending stream of data, continues to be a great challenge for modern machine learning systems. In particular the online or “single-pass through the data” setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameter update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at https://github.com/optimass/Maximally_Interfered_Retrieval
Tasks Continual Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/9357-online-continual-learning-with-maximal-interfered-retrieval
PDF http://papers.nips.cc/paper/9357-online-continual-learning-with-maximal-interfered-retrieval.pdf
PWC https://paperswithcode.com/paper/online-continual-learning-with-maximal
Repo https://github.com/optimass/Maximally_Interfered_Retrieval
Framework pytorch
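
The retrieval criterion, in the experience-replay variant, scores each memory sample by how much a virtual gradient step on the incoming batch would increase its loss. A sketch with assumed names, where `loss_fn` must return per-sample losses (e.g. `torch.nn.CrossEntropyLoss(reduction="none")`) and `k` must not exceed the memory size:

```python
import copy
import torch

def most_interfered(model, loss_fn, mem_x, mem_y, new_x, new_y, lr=0.1, k=10):
    with torch.no_grad():
        loss_before = loss_fn(model(mem_x), mem_y)        # per-sample losses
    virtual = copy.deepcopy(model)                        # lookahead copy
    loss_fn(virtual(new_x), new_y).mean().backward()
    with torch.no_grad():
        for p in virtual.parameters():                    # one virtual SGD step
            p -= lr * p.grad
        loss_after = loss_fn(virtual(mem_x), mem_y)
    idx = torch.topk(loss_after - loss_before, k).indices # most interfered
    return mem_x[idx], mem_y[idx]
```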

Self-Routing Capsule Networks

Title Self-Routing Capsule Networks
Authors Taeyoung Hahn, Myeongjang Pyeon, Gunhee Kim
Abstract Capsule networks have recently gained a great deal of interest as a new architecture of neural networks that can be more robust to input perturbations than similar-sized CNNs. Capsule networks have two major distinctions from conventional CNNs: (i) each layer consists of a set of capsules that specialize in disjoint regions of the feature space and (ii) routing-by-agreement coordinates connections between adjacent capsule layers. Although routing-by-agreement is capable of filtering out noisy predictions of capsules by dynamically adjusting their influence, its unsupervised clustering nature causes two weaknesses: (i) high computational complexity and (ii) a cluster assumption that may not hold in the presence of heavy input noise. In this work, we propose a novel and surprisingly simple routing strategy called self-routing, where each capsule is routed independently by its subordinate routing network. Therefore, agreement between capsules is no longer required; both the poses and activations of upper-level capsules are obtained in a way similar to Mixture-of-Experts. Our experiments on CIFAR-10, SVHN and SmallNORB show that self-routing performs more robustly against white-box adversarial attacks and affine transformations, while requiring less computation.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8982-self-routing-capsule-networks
PDF http://papers.nips.cc/paper/8982-self-routing-capsule-networks.pdf
PWC https://paperswithcode.com/paper/self-routing-capsule-networks
Repo https://github.com/coder3000/SR-CapsNet
Framework pytorch
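
The gist of self-routing: each lower capsule's own small routing network produces its coefficients over upper capsules, so no iterative agreement step is needed, much like a mixture-of-experts gate. An illustrative PyTorch sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

class SelfRouting(nn.Module):
    def __init__(self, n_lower=32, n_upper=10, d_pose=16):
        super().__init__()
        self.route = nn.Linear(d_pose, n_upper)   # per-capsule routing network
        self.transform = nn.Parameter(
            torch.randn(n_lower, n_upper, d_pose, d_pose) * 0.05)

    def forward(self, u):                          # u: (batch, n_lower, d_pose)
        c = torch.softmax(self.route(u), dim=-1)   # routing coefficients
        votes = torch.einsum("bld,lude->blue", u, self.transform)
        # Upper-level poses: routing-weighted sum of votes, no agreement loop.
        return (c.unsqueeze(-1) * votes).sum(dim=1)  # (batch, n_upper, d_pose)
```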

Implicit Generation and Modeling with Energy Based Models

Title Implicit Generation and Modeling with Energy Based Models
Authors Yilun Du, Igor Mordatch
Abstract Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train. We present techniques to scale MCMC based EBM training on continuous neural networks, and we show its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving better samples than other likelihood models and nearing the performance of contemporary GAN approaches, while covering all modes of the data. We highlight some unique capabilities of implicit generation such as compositionality and corrupt image reconstruction and inpainting. Finally, we show that EBMs are useful models across a wide variety of tasks, achieving state-of-the-art out-of-distribution classification, adversarially robust classification, state-of-the-art continual online class learning, and coherent long term predicted trajectory rollouts.
Tasks Image Reconstruction
Published 2019-12-01
URL http://papers.nips.cc/paper/8619-implicit-generation-and-modeling-with-energy-based-models
PDF http://papers.nips.cc/paper/8619-implicit-generation-and-modeling-with-energy-based-models.pdf
PWC https://paperswithcode.com/paper/implicit-generation-and-modeling-with-energy
Repo https://github.com/openai/ebm_code_release
Framework tf
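
Implicit generation draws samples by running Langevin dynamics on the energy function. A minimal sketch with illustrative step size and noise scale (the paper's sampler additionally uses tricks such as a sample replay buffer):

```python
import torch

def langevin_sample(energy, x, steps=60, step_size=0.05, noise=0.01):
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - step_size * grad + noise * torch.randn_like(x)
    return x.detach()

# Toy quadratic energy: samples contract toward the low-energy region.
x0 = torch.randn(16, 2)
samples = langevin_sample(lambda x: (x ** 2).sum(dim=1), x0)
```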

Exploring Algorithmic Fairness in Robust Graph Covering Problems

Title Exploring Algorithmic Fairness in Robust Graph Covering Problems
Authors Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, Eric Rice, Bryan Wilder, Amulya Yadav, Milind Tambe
Abstract Fueled by algorithmic advances, AI algorithms are increasingly being deployed in settings subject to unanticipated challenges with complex social effects. Motivated by real-world deployments of AI-driven, social-network-based suicide prevention and landslide risk management interventions, this paper focuses on a robust graph covering problem subject to group fairness constraints. We show that, in the absence of fairness constraints, state-of-the-art algorithms for the robust graph covering problem result in biased node coverage: they tend to discriminate against individuals (nodes) based on membership in traditionally marginalized groups. To remediate this issue, we propose a novel formulation of the robust covering problem with fairness constraints and a tractable approximation scheme applicable to real-world instances. We provide a formal analysis of the price of group fairness (PoF) for this problem, where we show that uncertainty can lead to greater PoF. We demonstrate the effectiveness of our approach on several real-world social networks. Our method yields competitive node coverage while significantly improving group fairness relative to state-of-the-art methods.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9707-exploring-algorithmic-fairness-in-robust-graph-covering-problems
PDF http://papers.nips.cc/paper/9707-exploring-algorithmic-fairness-in-robust-graph-covering-problems.pdf
PWC https://paperswithcode.com/paper/exploring-algorithmic-fairness-in-robust
Repo https://github.com/Aida-Rahmattalabi/Fair-and-Robust-Graph-Covering-Problem
Framework none
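
To make the fairness constraint concrete, here is a heavily simplified greedy coverage sketch with a per-group coverage floor. It illustrates the group-fairness idea only; the paper's actual method handles worst-case node failures via a robust optimization formulation and a tractable approximation scheme, which this toy does not attempt:

```python
def fair_greedy_cover(adj, group, budget, floor=0.3):
    """adj: node -> set of nodes it covers; group: covered node -> group label."""
    members = {}
    for v, g in group.items():
        members.setdefault(g, set()).add(v)
    chosen, covered = set(), set()

    def score(v):
        new = adj[v] - covered
        # Prefer covering members of groups still below the coverage floor,
        # then fall back to raw marginal coverage.
        deficit = sum(
            1 for u in new
            if len(covered & members[group[u]]) < floor * len(members[group[u]]))
        return (deficit, len(new))

    for _ in range(budget):
        v = max(set(adj) - chosen, key=score)
        chosen.add(v)
        covered |= adj[v]
    return chosen
```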

An Open Online Dictionary for Endangered Uralic Languages

Title An Open Online Dictionary for Endangered Uralic Languages
Authors Mika Hämäläinen, Jack Rueter
Abstract We describe a MediaWiki-based online dictionary for endangered Uralic languages. The system makes it possible to synchronize edits done in XML-based dictionaries with edits done in the MediaWiki system, allowing integration with the existing open-source Giellatekno infrastructure, which provides and utilizes XML-formatted dictionaries for a variety of NLP tasks. As our system provides an online dictionary, the XML-based dictionaries become available to a wider audience, and the dictionary editing process can be crowdsourced for community engagement, with full integration into the existing XML dictionaries. We present how new automatically produced data is encoded and incorporated into our system, in addition to our preliminary experiences with crowdsourcing.
Tasks
Published 2019-10-01
URL https://researchportal.helsinki.fi/en/publications/an-open-online-dictionary-for-endangered-uralic-languages
PDF https://helda.helsinki.fi//bitstream/handle/10138/305873/eLex_2019_46.pdf?sequence=1
PWC https://paperswithcode.com/paper/an-open-online-dictionary-for-endangered
Repo https://github.com/mikahama/akusanat
Framework none

Object detection deep learning networks for Optical Character Recognition

Title Object detection deep learning networks for Optical Character Recognition
Authors Christopher Bourez, Aurelien Coquard
Abstract In this article, we show how we applied a simple approach from deep learning networks for object detection to the task of optical character recognition, in order to build image features tailored for documents. In contrast to scene-text reading in natural images using networks pretrained on ImageNet, our document reading is performed with small networks inspired by the MNIST digit recognition challenge, at a small computational budget and a small stride. Modern object detection frameworks allow direct end-to-end training, with no algorithm other than deep learning and non-max suppression to filter duplicate predictions. The trained weights can be used for higher-level models such as document classification or document segmentation.
Tasks Document Classification, Object Detection, Optical Character Recognition
Published 2019-05-01
URL https://openreview.net/forum?id=S1ej8o05tm
PDF https://openreview.net/pdf?id=S1ej8o05tm
PWC https://paperswithcode.com/paper/object-detection-deep-learning-networks-for
Repo https://github.com/Ivalua/object_detection_ocr
Framework tf
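
Since the pipeline's only post-processing beyond the networks is non-max suppression, a standard NMS sketch (not the authors' code) shows that filtering step: keep the highest-scoring box, drop boxes that overlap it beyond an IoU threshold, repeat:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                 # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])     # intersection rectangle
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]       # drop heavy overlaps
    return keep
```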