January 27, 2020

Paper Group ANR 1098

Better Approximate Inference for Partial Likelihood Models with a Latent Structure. Communication Complexity of Estimating Correlations. On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms. Derived Codebooks for High-Accuracy Nearest Neighbor Search. A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed …

Better Approximate Inference for Partial Likelihood Models with a Latent Structure


Title	Better Approximate Inference for Partial Likelihood Models with a Latent Structure
Authors	Amrith Setlur, Barnabás Póczós
Abstract	Temporal Point Processes (TPPs) with partial likelihoods involving a latent structure often entail an intractable marginalization, which makes inference hard. We propose a novel approach to Maximum Likelihood Estimation (MLE) involving approximate inference over the latent variables by minimizing a tight upper bound on the approximation gap. Given a discrete latent variable $Z$, the proposed approximation reduces inference complexity from $O(Z^c)$ to $O(Z)$. We use convex conjugates to determine this upper bound in closed form and show that adding it to the optimization objective improves results for models that assume proportional hazards, as in Survival Analysis.
Tasks	Point Processes, Survival Analysis
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10211v2
PDF	https://arxiv.org/pdf/1910.10211v2.pdf
PWC	https://paperswithcode.com/paper/better-approximate-inference-for-partial
Repo
Framework
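The partial likelihood mentioned above is easiest to see in the classic Cox proportional-hazards model, where the unknown baseline hazard cancels out of every factor. The sketch below is background only: it contains no latent variable, it is not the authors' approximation, and the function name `cox_partial_log_likelihood` is hypothetical. Each observed event contributes its risk score minus a log-sum-exp over the subjects still at risk; presumably this inner sum is where a latent variable would have to be marginalized in the paper's setting.

```python
import math
from typing import Sequence


def cox_partial_log_likelihood(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Cox partial log-likelihood (Breslow handling of ties), hypothetical helper.

    times[i]: observed time for subject i
    events[i]: 1 if the event was observed at times[i], 0 if censored
    risk_scores[i]: linear predictor eta_i, so the hazard is h0(t) * exp(eta_i)
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects only appear inside risk sets
        # Risk set: every subject still under observation at time t_i.
        risk_set = [eta for t_j, eta in zip(times, risk_scores) if t_j >= t_i]
        # Log-sum-exp with a max shift for numerical stability.
        m = max(risk_set)
        log_denom = m + math.log(sum(math.exp(eta - m) for eta in risk_set))
        total += risk_scores[i] - log_denom
    return total
```

With three subjects, equal risk scores and all events observed, the terms are $-\log 3$, $-\log 2$ and $0$, so the total is $-\log 6$.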

Communication Complexity of Estimating Correlations


Title	Communication Complexity of Estimating Correlations
Authors	Uri Hadar, Jingbo Liu, Yury Polyanskiy, Ofer Shayevitz
Abstract	We characterize the communication complexity of the following distributed estimation problem. Alice and Bob observe infinitely many iid copies of $\rho$-correlated unit-variance (Gaussian or $\pm1$ binary) random variables, with unknown $\rho\in[-1,1]$. By interactively exchanging $k$ bits, Bob wants to produce an estimate $\hat\rho$ of $\rho$. We show that the best possible performance (optimized over interaction protocol $\Pi$ and estimator $\hat \rho$) satisfies $\inf_{\Pi,\hat\rho}\sup_\rho \mathbb{E}\left[|\rho-\hat\rho|^2\right] = \tfrac{1}{k}\left(\tfrac{1}{2 \ln 2} + o(1)\right)$. Curiously, the number of samples in our achievability scheme is exponential in $k$; by contrast, a naive scheme exchanging $k$ samples achieves the same $\Omega(1/k)$ rate but with a suboptimal prefactor. Our protocol achieving optimal performance is one-way (non-interactive). We also prove the $\Omega(1/k)$ bound even when $\rho$ is restricted to any small open sub-interval of $[-1,1]$ (i.e. a local minimax lower bound). Our proof techniques rely on symmetric strong data-processing inequalities and various tensorization techniques from information-theoretic interactive common-randomness extraction. Our results also imply an $\Omega(n)$ lower bound on the information complexity of the Gap-Hamming problem, for which we show a direct information-theoretic proof.
Tasks
Published	2019-01-25
URL	http://arxiv.org/abs/1901.09100v2
PDF	http://arxiv.org/pdf/1901.09100v2.pdf
PWC	https://paperswithcode.com/paper/communication-complexity-of-estimating
Repo
Framework
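To make the "naive scheme" concrete, here is one natural reading of it for the $\pm1$ binary case (our assumption, not the authors' protocol): Alice sends her first $k$ samples verbatim, one bit each, and Bob averages $X_iY_i$. Since $\mathbb{E}[XY]=\rho$ and $\operatorname{Var}(XY)=1-\rho^2$, the worst-case squared error is $1/k$ (at $\rho=0$), compared with the optimal prefactor $\tfrac{1}{2\ln 2}\approx 0.72$. The simulation and the name `naive_correlation_estimate` are illustrative.

```python
import random


def naive_correlation_estimate(k: int, rho: float, seed: int = 0) -> float:
    """Naive one-way protocol for +/-1 binary samples (illustrative sketch).

    Alice sends X_1..X_k verbatim (k bits); Bob averages X_i * Y_i.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(k):
        x = rng.choice((-1, 1))
        # Y agrees with X with probability (1 + rho) / 2, so E[XY] = rho.
        y = x if rng.random() < (1 + rho) / 2 else -x
        total += x * y  # Bob only needs the single bit x from Alice
    return total / k
```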

On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms


Title	On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms
Authors	Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
Abstract	We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions. We start with the MAML method and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challenges, not only do we provide the first theoretical guarantees for MAML and FO-MAML in nonconvex settings, but we also answer some open questions about the implementation of these algorithms, including how to choose their learning rate and the batch sizes for both tasks and the datasets corresponding to tasks. In particular, we show that MAML can find an $\epsilon$-first-order stationary point ($\epsilon$-FOSP) for any positive $\epsilon$ after at most $\mathcal{O}(1/\epsilon^2)$ iterations at the expense of requiring second-order information. We also show that FO-MAML, which ignores the second-order information required in the update of MAML, cannot achieve an arbitrarily small level of accuracy, i.e., FO-MAML cannot find an $\epsilon$-FOSP for every $\epsilon>0$. We further propose a new variant of the MAML algorithm called Hessian-free MAML, which preserves all theoretical guarantees of MAML without requiring access to second-order information.
Tasks	Meta-Learning
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10400v3
PDF	https://arxiv.org/pdf/1908.10400v3.pdf
PWC	https://paperswithcode.com/paper/on-the-convergence-theory-of-gradient-based
Repo
Framework
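The difference between the three update rules can be sketched on hypothetical scalar quadratic tasks $f_i(w)=\tfrac12 a_i(w-b_i)^2$. For the MAML objective $F_i(w)=f_i(w-\alpha\nabla f_i(w))$, the exact gradient carries a factor $(1-\alpha\nabla^2 f_i(w))$; FO-MAML drops it, and the Hessian-free variant below replaces the Hessian-vector product with a central finite difference of gradients. That last step is in the spirit of the paper's Hessian-free MAML, not necessarily its exact estimator; all names are illustrative.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy task: f(w) = 0.5 * a * (w - b)**2, stored as (a, b).
Task = Tuple[float, float]


def grad(task: Task, w: float) -> float:
    a, b = task
    return a * (w - b)


def hess(task: Task, w: float) -> float:
    return task[0]  # constant Hessian of a quadratic


def maml_grad(task: Task, w: float, alpha: float) -> float:
    """Exact gradient of F(w) = f(w - alpha * grad f(w)); needs the Hessian."""
    w_adapted = w - alpha * grad(task, w)
    return (1 - alpha * hess(task, w)) * grad(task, w_adapted)


def fo_maml_grad(task: Task, w: float, alpha: float) -> float:
    """First-order approximation: drop the (1 - alpha * Hessian) factor."""
    return grad(task, w - alpha * grad(task, w))


def hf_maml_grad(task: Task, w: float, alpha: float, delta: float = 1e-5) -> float:
    """Approximate the Hessian-vector product with gradient differences."""
    v = grad(task, w - alpha * grad(task, w))
    hvp = (grad(task, w + delta * v) - grad(task, w - delta * v)) / (2 * delta)
    return v - alpha * hvp


def meta_step(tasks: Sequence[Task], w: float, alpha: float, beta: float,
              g: Callable[[Task, float, float], float]) -> float:
    """One outer (meta) update averaged over a batch of tasks."""
    return w - beta * sum(g(t, w, alpha) for t in tasks) / len(tasks)
```

For the task $(a,b)=(2,1)$ at $w=3$ with $\alpha=0.1$, MAML gives $0.8\cdot 3.2=2.56$ while FO-MAML gives $3.2$; the finite-difference variant recovers $2.56$.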

Derived Codebooks for High-Accuracy Nearest Neighbor Search


Title	Derived Codebooks for High-Accuracy Nearest Neighbor Search
Authors	Fabien André, Anne-Marie Kermarrec, Nicolas Le Scouarnec
Abstract	High-dimensional Nearest Neighbor (NN) search is central in multimedia search systems. Product Quantization (PQ) is a widespread NN search technique which has a high performance and good scalability. PQ compresses high-dimensional vectors into compact codes thanks to a combination of quantizers. Large databases can, therefore, be stored entirely in RAM, enabling fast responses to NN queries. In almost all cases, PQ uses 8-bit quantizers as they offer low response times. In this paper, we advocate the use of 16-bit quantizers. Compared to 8-bit quantizers, 16-bit quantizers boost accuracy but they increase response time by a factor of 3 to 10. We propose a novel approach that allows 16-bit quantizers to offer the same response time as 8-bit quantizers, while still providing a boost of accuracy. Our approach builds on two key ideas: (i) the construction of derived codebooks that allow a fast and approximate distance evaluation, and (ii) a two-pass NN search procedure which builds a candidate set using the derived codebooks, and then refines it using 16-bit quantizers. On 1 billion SIFT vectors, with an inverted index, our approach offers a Recall@100 of 0.85 in 5.2 ms. By contrast, 16-bit quantizers alone offer a Recall@100 of 0.85 in 39 ms, and 8-bit quantizers a Recall@100 of 0.82 in 3.8 ms.
Tasks	Quantization
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06900v1
PDF	https://arxiv.org/pdf/1905.06900v1.pdf
PWC	https://paperswithcode.com/paper/derived-codebooks-for-high-accuracy-nearest
Repo
Framework
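As background for the abstract above, a minimal sketch of plain product quantization with asymmetric distance computation (ADC), assuming codebooks are already given. Here they are random rather than k-means-trained, and all function names are hypothetical. An 8-bit sub-quantizer has 256 centroids per codebook and a 16-bit one has 65,536, so each per-query lookup table is 256 times larger; the derived-codebook and two-pass ideas of the paper are not shown.

```python
import random
from typing import List

Vector = List[float]


def split(x: Vector, m: int) -> List[Vector]:
    """Cut a D-dimensional vector into m contiguous sub-vectors of size D/m."""
    d = len(x) // m
    return [x[i * d:(i + 1) * d] for i in range(m)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def encode(x: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Replace each sub-vector by the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(x, len(codebooks)), codebooks)]


def adc_tables(q: Vector, codebooks: List[List[Vector]]) -> List[List[float]]:
    """Per-query tables: distance from each query sub-vector to every centroid."""
    return [[sq_dist(sub, c) for c in cb]
            for sub, cb in zip(split(q, len(codebooks)), codebooks)]


def adc_distance(code: List[int], tables: List[List[float]]) -> float:
    """Asymmetric distance: m table lookups and additions per database vector."""
    return sum(table[c] for table, c in zip(tables, code))


# Usage with random codebooks: 3 sub-quantizers x 4 centroids x 2 dimensions.
rng = random.Random(0)
codebooks = [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(4)] for _ in range(3)]
database = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(5)]
codes = [encode(x, codebooks) for x in database]  # offline, done once
query = [rng.gauss(0, 1) for _ in range(6)]
tables = adc_tables(query, codebooks)             # once per query
nearest = min(range(len(codes)), key=lambda i: adc_distance(codes[i], tables))
```

The ADC distance equals the exact squared distance between the query and the reconstructed (quantized) database vector, which is why scanning compact codes needs only table lookups.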

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation


Title	A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation
Authors	Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Caiwen Ding
Abstract	The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size of DNNs make them prohibitive for embedded systems and IoT devices, where low power consumption is required. To address these challenges, spin orbit torque magnetic random-access memory (SOT-MRAM) and SOT-MRAM-based Processing-In-Memory (PIM) engines have been used to reduce the power consumption of DNNs, since SOT-MRAM offers near-zero standby power, high density, and non-volatility. However, drawbacks of SOT-MRAM-based PIM engines, such as high write latency and the need for low bit-width data, make them less attractive as energy-efficient DNN accelerators. To mitigate these drawbacks, we propose an ultra energy-efficient framework that applies model compression techniques, including weight pruning and quantization, at the software level while taking the SOT-MRAM PIM architecture into account. We also incorporate the alternating direction method of multipliers (ADMM) into the training phase to guarantee solution feasibility and satisfy SOT-MRAM hardware constraints. Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced while the overall system throughput increases, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems and IoT devices. Our experimental results show that the accuracy and compression rate of our proposed framework consistently outperform the reference works, while the efficiency (area and power) and throughput of the SOT-MRAM PIM engine are significantly improved.
Tasks	Model Compression, Quantization
Published	2019-11-24
URL	https://arxiv.org/abs/1912.05416v1
PDF	https://arxiv.org/pdf/1912.05416v1.pdf
PWC	https://paperswithcode.com/paper/a-sot-mram-based-processing-in-memory-engine
Repo
Framework
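The role of ADMM in the abstract is to split "minimize the loss" from "stay inside the hardware-friendly set". A minimal sketch, assuming a hypothetical quadratic loss $\tfrac12\|w-t\|^2$ in place of a network's training loss: the W-step has a closed form, the Z-step is a Euclidean projection (onto sparse weights or onto fixed quantization levels), and the dual variable accumulates the remaining violation. This is the generic ADMM template, not the authors' training code.

```python
from typing import Callable, List

Vector = List[float]


def project_sparse(v: Vector, k: int) -> Vector:
    """Projection onto {at most k non-zeros}: keep the k largest magnitudes."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def project_levels(v: Vector, levels: Vector) -> Vector:
    """Projection onto a fixed set of quantization levels."""
    return [min(levels, key=lambda q: abs(x - q)) for x in v]


def admm(target: Vector, project: Callable[[Vector], Vector],
         rho: float = 1.0, iters: int = 100):
    """ADMM for min 0.5 * ||w - target||^2 subject to w in S.

    The quadratic loss is a hypothetical stand-in for a DNN training loss;
    `project` maps onto the constraint set S (pruned and/or quantized weights).
    """
    n = len(target)
    w, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        # W-step: minimizer of loss + (rho / 2) * ||w - z + u||^2 (closed form here)
        w = [(t + rho * (zi - ui)) / (1 + rho) for t, zi, ui in zip(target, z, u)]
        # Z-step: project w + u onto the constraint set
        z = project([wi + ui for wi, ui in zip(w, u)])
        # Dual update: accumulate the remaining constraint violation w - z
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


w, z = admm([3.0, -1.0, 0.5], lambda v: project_sparse(v, 1))
# On this toy problem w and z both converge to [3.0, 0.0, 0.0].
```

With a real network the W-step would be a few epochs of SGD on the loss plus the quadratic penalty, since no closed form exists.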

Learning to Screen


Title	Learning to Screen
Authors	Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran
Abstract	Imagine a large firm with multiple departments that plans a large recruitment. Candidates arrive one-by-one, and for each candidate the firm decides, based on her data (CV, skills, experience, etc.), whether to summon her for an interview. The firm wants to recruit the best candidates while minimizing the number of interviews. We model such scenarios as an assignment problem between items (candidates) and categories (departments): the items arrive one-by-one in an online manner, and upon processing each item the algorithm decides, based on its value and the categories it can be matched with, whether to retain or discard it (this decision is irrevocable). The goal is to retain as few items as possible while guaranteeing that the set of retained items contains an optimal matching. We consider two variants of this problem: (i) In the first variant, it is assumed that the $n$ items are drawn independently from an unknown distribution $D$. (ii) In the second variant, it is assumed that before the process starts, the algorithm has access to a training set of $n$ items drawn independently from the same unknown distribution (e.g., data of candidates from previous recruitment seasons). We give tight bounds on the minimum possible number of retained items in each of these variants. These results demonstrate that one can retain exponentially fewer items in the second variant (with the training set).
Tasks
Published	2019-02-13
URL	https://arxiv.org/abs/1902.04741v3
PDF	https://arxiv.org/pdf/1902.04741v3.pdf
PWC	https://paperswithcode.com/paper/learning-and-generalization-for-matching
Repo
Framework
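The retain-or-discard model is easiest to see in its smallest instance: one category with a single opening and no training set. A safe online rule (an illustration, not the authors' algorithm) keeps an item only if it beats every item seen so far; a discarded item is never better than some already retained one, so the retained set always contains an optimal item. For i.i.d. continuous values arriving in random order, the expected number of such "records" is the harmonic number $H_n \approx \ln n$. The name `retain_records` is hypothetical.

```python
from typing import List, Sequence


def retain_records(values: Sequence[float]) -> List[int]:
    """Online screening for one category with one opening (toy case).

    Keep an arriving item only if it beats every item seen so far.
    Returns the indices of retained items.
    """
    retained: List[int] = []
    best = float("-inf")
    for i, v in enumerate(values):
        if v > best:  # irrevocable decision made on arrival
            retained.append(i)
            best = v
    return retained


print(retain_records([3, 1, 4, 1, 5, 9, 2, 6]))  # [0, 2, 4, 5]
```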

Distributed Low Precision Training Without Mixed Precision


Title	Distributed Low Precision Training Without Mixed Precision
Authors	Zehua Cheng, Weiyang Wang, Yan Pan, Thomas Lukasiewicz
Abstract	Low precision training is one of the most popular strategies for deploying deep models on limited hardware resources. A fixed-point implementation of deep convolutional networks (DCNs) has the potential to reduce complexity and facilitate deployment on embedded hardware. However, most low precision training solutions are based on a mixed precision strategy. In this paper, we present an ablation study of different low precision training strategies and propose a solution that uses the IEEE FP16 format throughout the training process. We tested ResNet-50 on a 128-GPU cluster with the full ImageNet dataset. We observe that it is not essential to use the FP32 format to train deep models. We also observe that communication cost reduction, model compression, and large-scale distributed training are three coupled problems.
Tasks	Model Compression
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07384v2
PDF	https://arxiv.org/pdf/1911.07384v2.pdf
PWC	https://paperswithcode.com/paper/distributed-low-precision-training-without
Repo
Framework
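Why most low precision training keeps an FP32 copy of the weights can be seen in a few lines. This sketch only illustrates the difficulty that an FP16-only scheme has to overcome; it is not the paper's solution. It uses Python's `struct` half-precision format `'e'` to round values to IEEE binary16; the helper names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sgd_updates(w0: float, update: float, steps: int, fp16: bool) -> float:
    """Apply `steps` additive weight updates, optionally storing w in FP16."""
    w = to_fp16(w0) if fp16 else w0
    for _ in range(steps):
        w = w + update
        if fp16:
            w = to_fp16(w)  # weight rounded back to FP16 after every step
    return w


# FP16 spacing near 1.0 is 2**-10 (about 9.8e-4), so a 1e-4 update is rounded
# away entirely; FP32 master weights in mixed precision avoid this.
print(sgd_updates(1.0, 1e-4, 100, fp16=True))   # 1.0
print(sgd_updates(1.0, 1e-4, 100, fp16=False))  # approximately 1.01
# Very small gradients underflow to zero in FP16 (smallest subnormal ~6e-8).
print(to_fp16(1e-8))  # 0.0
```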

Closing the Accuracy Gap in an Event-Based Visual Recognition Task


Title	Closing the Accuracy Gap in an Event-Based Visual Recognition Task
Authors	Bodo Rückauer, Nicolas Känzig, Shih-Chii Liu, Tobi Delbruck, Yulia Sandamirskaya
Abstract	Mobile and embedded applications require neural-network-based pattern recognition systems to perform well under a tight computational budget. In contrast to commonly used synchronous, frame-based vision systems and CNNs, asynchronous spiking neural networks driven by event-based visual input respond with low latency to sparse, salient features in the input, leading to high efficiency at run-time. The discrete nature of event-based data streams makes direct training of asynchronous neural networks challenging. This paper studies asynchronous spiking neural networks obtained by conversion from a conventional CNN trained on frame-based data. As an example, we consider a CNN trained to steer a robot to follow a moving target. We identify possible pitfalls of the conversion and demonstrate how the proposed solutions bring the classification accuracy of the asynchronous network to only 3% below the performance of the original synchronous CNN, while requiring 12x fewer computations. Although applied to a simple task, this work is an important step towards low-power, fast, and embedded neural-network-based vision solutions for robotic applications.
Tasks
Published	2019-05-06
URL	https://arxiv.org/abs/1906.08859v1
PDF	https://arxiv.org/pdf/1906.08859v1.pdf
PWC	https://paperswithcode.com/paper/closing-the-accuracy-gap-in-an-event-based
Repo
Framework
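The core idea behind CNN-to-SNN conversion, in its common rate-based form, is that an integrate-and-fire neuron driven by a constant input fires at a rate that approximates a ReLU clipped at 1. The sketch below shows only that generic correspondence under this assumption; the paper's event-driven pipeline and its specific pitfalls and fixes are not reproduced, and `if_neuron_rate` is a hypothetical name.

```python
def if_neuron_rate(x: float, steps: int = 100, threshold: float = 1.0) -> float:
    """Firing rate of an integrate-and-fire neuron driven by a constant input x.

    Uses reset-by-subtraction and allows at most one spike per time step, so the
    rate approximates clip(x / threshold, 0, 1), i.e. a ReLU capped at 1.
    """
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += x              # integrate the input current
        if v >= threshold:  # fire, then subtract the threshold
            spikes += 1
            v -= threshold
    return spikes / steps


print(if_neuron_rate(0.25))  # 0.25
print(if_neuron_rate(-0.5))  # 0.0 (negative input never fires, like ReLU)
print(if_neuron_rate(1.5))   # 1.0 (saturates at one spike per step)
```

Reset-by-subtraction keeps the residual membrane potential, which is why the rate tracks the input closely instead of being quantized down.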

ASCAI: Adaptive Sampling for acquiring Compact AI


Title	ASCAI: Adaptive Sampling for acquiring Compact AI
Authors	Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar
Abstract	This paper introduces ASCAI, a novel adaptive sampling methodology that can learn how to effectively compress Deep Neural Networks (DNNs) for accelerated inference on resource-constrained platforms. Modern DNN compression techniques comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing such hyperparameters is cumbersome as the pertinent search space grows exponentially with the number of model layers. To effectively traverse this large space, we devise an intelligent sampling mechanism that adapts the sampling strategy using customized operations inspired by genetic algorithms. As a special case, we consider the space of model compression as a vector space. The adaptively selected samples enable ASCAI to automatically learn how to tune per-layer compression hyperparameters to optimize the accuracy/model-size trade-off. Our extensive evaluations show that ASCAI outperforms rule-based and reinforcement learning methods in terms of compression rate and/or accuracy.
Tasks	Model Compression
Published	2019-11-15
URL	https://arxiv.org/abs/1911.06471v1
PDF	https://arxiv.org/pdf/1911.06471v1.pdf
PWC	https://paperswithcode.com/paper/ascai-adaptive-sampling-for-acquiring-compact
Repo
Framework
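A generic genetic-algorithm search over per-layer compression ratios gives a feel for the "operations inspired by genetic algorithms" mentioned above. Everything here is an assumption: the per-layer sensitivities, the fitness proxy (which stands in for actually fine-tuning and evaluating a compressed network), and the operators (elitism, one-point crossover, Gaussian mutation). It is not ASCAI's sampling mechanism.

```python
import random
from typing import List, Sequence

# Hypothetical per-layer sensitivities: how fast accuracy drops when a layer is pruned.
SENSITIVITY = [0.9, 0.2, 0.5, 0.1]
MAX_RATIO = 0.95


def fitness(ratios: Sequence[float]) -> float:
    """Toy proxy for the accuracy/model-size trade-off (not a real accuracy model)."""
    size_reward = sum(ratios) / len(ratios)
    accuracy_penalty = sum(s * r * r for s, r in zip(SENSITIVITY, ratios))
    return size_reward - accuracy_penalty


def evolve(generations: int = 30, pop_size: int = 16, seed: int = 0) -> List[float]:
    """Return the best per-layer pruning-ratio vector found."""
    rng = random.Random(seed)
    n = len(SENSITIVITY)
    pop = [[rng.uniform(0, MAX_RATIO) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: keep the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)         # Gaussian mutation of one layer's ratio
            child[i] = min(MAX_RATIO, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
```

Because the best half survives each generation, the best fitness never decreases; more sensitive layers (here the first one) should end up with lower pruning ratios.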

A Computing Kernel for Network Binarization on PyTorch


Title	A Computing Kernel for Network Binarization on PyTorch
Authors	Xianda Xu, Marco Pedersoli
Abstract	Deep neural networks have achieved state-of-the-art results in a wide range of tasks, including image classification and object detection. However, they are both computationally expensive and memory intensive, making them difficult to deploy on low-power devices. Network binarization is one of the effective existing techniques for model compression and acceleration, but there is as yet no computing kernel supporting it in PyTorch. In this paper, we develop a computing kernel that supports 1-bit XNOR and bitcount computation in PyTorch. Experimental results show that our kernel accelerates inference of the binarized neural network by 3 times on GPU and by 4.5 times on CPU compared with the control group.
Tasks	Image Classification, Model Compression, Object Detection
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04477v1
PDF	https://arxiv.org/pdf/1911.04477v1.pdf
PWC	https://paperswithcode.com/paper/a-computing-kernel-for-network-binarization
Repo
Framework
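The XNOR/bitcount trick that such a kernel relies on can be sketched in pure Python: pack $\pm1$ vectors into bit masks, XNOR marks positions where the signs agree, and the dot product equals agreements minus disagreements, i.e. $2\cdot\mathrm{popcount}(\mathrm{XNOR}) - n$. A real kernel would work on 32- or 64-bit machine words with a hardware popcount; Python integers and the function names here are for illustration only.

```python
from typing import Sequence


def pack_signs(v: Sequence[int]) -> int:
    """Pack a +/-1 vector into an integer bit mask (bit i set <=> v[i] == +1)."""
    bits = 0
    for i, x in enumerate(v):
        if x == 1:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed +/-1 vectors of length n.

    dot = agreements - disagreements = 2 * popcount(XNOR) - n
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the n valid bits
    return 2 * bin(agree).count("1") - n


a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
print(xnor_dot(pack_signs(a), pack_signs(b), len(a)))  # 1, same as sum(x * y)
```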

On the Equivalence of Forward Mode Automatic Differentiation and Symbolic Differentiation


Title	On the Equivalence of Forward Mode Automatic Differentiation and Symbolic Differentiation
Authors	Soeren Laue
Abstract	We show that forward mode automatic differentiation and symbolic differentiation are equivalent in the sense that they both perform the same operations when computing derivatives. This is in stark contrast to the common claim that they are substantially different. The difference is often illustrated by claiming that symbolic differentiation suffers from “expression swell”, the phenomenon of a much larger representation of the derivative than of the original function, whereas automatic differentiation does not. Here, we show that this statement is not true.
Tasks
Published	2019-04-05
URL	https://arxiv.org/abs/1904.02990v2
PDF	https://arxiv.org/pdf/1904.02990v2.pdf
PWC	https://paperswithcode.com/paper/on-the-equivalence-of-forward-mode-automatic
Repo
Framework
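For readers unfamiliar with forward mode, a minimal dual-number sketch: every value carries its derivative, and each elementary operation applies the same rule (sum, product, chain) that symbolic differentiation would, except that the intermediate results are numbers rather than expressions. It only illustrates forward mode; it does not reproduce the paper's equivalence argument, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative; arithmetic applies the chain rule."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x: float) -> float:
    """Forward mode: seed dx/dx = 1 and propagate it through f."""
    return f(Dual(x, 1.0)).der


print(derivative(lambda x: x * x * x + 2 * x, 2.0))  # 14.0, since 3*2**2 + 2 = 14
```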

SubCharacter Chinese-English Neural Machine Translation with Wubi encoding


Title	SubCharacter Chinese-English Neural Machine Translation with Wubi encoding
Authors	Wei Zhang, Feifei Lin, Xiaodong Wang, Zhenshuang Liang, Zhen Huang
Abstract	Neural machine translation (NMT) is one of the best methods for understanding the differences in semantic rules between two languages. Especially for Indo-European languages, subword-level models have achieved impressive results. However, when the translation task involves Chinese, semantic granularity remains at the word and character level, so a more fine-grained translation model for Chinese is still needed. In this paper, we introduce a simple and effective method for Chinese translation at the sub-character level. Our approach uses the Wubi method to encode Chinese characters as sequences of Latin letters, to which byte-pair encoding (BPE) is then applied. Our method for Chinese-English translation eliminates the need for a complicated word segmentation algorithm during preprocessing. Furthermore, our method allows for sub-character-level neural translation based on a recurrent neural network (RNN) architecture, without preprocessing. The empirical results show that for Chinese-English translation tasks, our sub-character-level model achieves a BLEU score comparable to that of the subword model, despite having a much smaller vocabulary. Additionally, the small vocabulary is highly advantageous for NMT model compression.
Tasks	Machine Translation, Model Compression
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02737v1
PDF	https://arxiv.org/pdf/1911.02737v1.pdf
PWC	https://paperswithcode.com/paper/subcharacter-chinese-english-neural-machine
Repo
Framework
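The BPE step mentioned above is the standard greedy merge procedure: count adjacent symbol pairs weighted by word frequency and repeatedly merge the most frequent one. A minimal sketch follows, run on a toy English word list; in the paper's pipeline the inputs would instead be Wubi letter sequences, and the exact BPE implementation the authors used may differ. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def merge_word(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every left-to-right occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Greedy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        vocab = {merge_word(s, best): f for s, f in vocab.items()}
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3))
# [('e', 's'), ('es', 't'), ('l', 'o')]
```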

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Title	A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
Authors	Marzieh Mozafari, Reza Farahbakhsh, Noel Crespi
Abstract	Hateful and toxic content generated by a portion of social media users is a rising phenomenon that has motivated researchers to dedicate substantial effort to the challenging problem of identifying hateful content. We need not only an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train such a model. The lack of sufficient labelled hate speech data, along with existing biases, has been the main issue in this research area. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate the ability of BERT to capture hateful context within social media content by using new fine-tuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. The results show that our solution achieves considerable performance on these datasets in terms of precision and recall compared with existing approaches. Moreover, our model can capture some biases in the data annotation and collection process and can potentially lead to a more accurate model.
Tasks	Hate Speech Detection, Language Modelling, Transfer Learning
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12574v1
PDF	https://arxiv.org/pdf/1910.12574v1.pdf
PWC	https://paperswithcode.com/paper/a-bert-based-transfer-learning-approach-for
Repo
Framework

ErrorNet: Learning error representations from limited data to improve vascular segmentation


Title	ErrorNet: Learning error representations from limited data to improve vascular segmentation
Authors	Nima Tajbakhsh, Brian Lai, Shilpa Ananth, Xiaowei Ding
Abstract	Deep convolutional neural networks have proved effective in segmenting lesions and anatomies in various medical imaging modalities. However, in the presence of small sample size and domain shift problems, these models often produce masks with non-intuitive segmentation mistakes. In this paper, we propose a segmentation framework called ErrorNet, which learns to correct these segmentation mistakes through the repeated process of injecting systematic segmentation errors into the segmentation result based on a learned shape prior, followed by attempting to predict the injected error. During inference, ErrorNet corrects the segmentation mistakes by adding the predicted error map to the initial segmentation result. ErrorNet has advantages over alternatives based on domain adaptation or CRF-based post-processing, because it requires neither domain-specific parameter tuning nor any data from the target domains. We evaluated ErrorNet using five public datasets for the task of retinal vessel segmentation. The selected datasets differ in size and patient population, allowing us to evaluate the effectiveness of ErrorNet in handling small sample size and domain shift problems. Our experiments demonstrate that ErrorNet outperforms a base segmentation model, a CRF-based post-processing scheme, and a domain adaptation method, with a greater performance gain in the presence of the aforementioned dataset limitations.
Tasks	Domain Adaptation, Retinal Vessel Segmentation
Published	2019-10-10
URL	https://arxiv.org/abs/1910.04814v4
PDF	https://arxiv.org/pdf/1910.04814v4.pdf
PWC	https://paperswithcode.com/paper/errornet-learning-error-representations-from
Repo
Framework
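The inference rule in the abstract ("adding the predicted error map to the initial segmentation result") and the training target it implies can be sketched on plain nested lists. Assumptions: the error map is signed (negative values remove false-positive vessel pixels, positive values fill in missed ones), the corrected map is clipped to [0, 1] and binarized at 0.5, and both function names are hypothetical; the network that predicts the error map and the shape-prior error injection are not shown.

```python
from typing import List

Mask = List[List[float]]


def error_target(corrupted: Mask, ground_truth: Mask) -> Mask:
    """Signed error a network would be trained to predict: truth minus corrupted mask."""
    return [[g - c for c, g in zip(rc, rg)] for rc, rg in zip(corrupted, ground_truth)]


def apply_error_map(initial: Mask, predicted_error: Mask,
                    threshold: float = 0.5) -> List[List[int]]:
    """Add the signed error map to a soft segmentation, clip to [0, 1], binarize."""
    corrected = []
    for row_p, row_e in zip(initial, predicted_error):
        corrected.append([
            1 if min(1.0, max(0.0, p + e)) >= threshold else 0
            for p, e in zip(row_p, row_e)
        ])
    return corrected


initial = [[0.9, 0.2], [0.6, 0.1]]
predicted = [[-0.7, 0.5], [0.0, 0.0]]  # removes one false positive, adds one missed pixel
print(apply_error_map(initial, predicted))  # [[0, 1], [1, 0]]
```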

VAIS Hate Speech Detection System: A Deep Learning based Approach for System Combination


Title	VAIS Hate Speech Detection System: A Deep Learning based Approach for System Combination
Authors	Thai Binh Nguyen, Quang Minh Nguyen, Thu Hien Nguyen, Ngoc Phuong Pham, The Loc Nguyen, Quoc Truong Do
Abstract	Nowadays, social network sites (SNSs) such as Facebook and Twitter are common places where people express their opinions and sentiments and share information with others. However, some people use SNSs to post abuse and harassment that discourage other users from expressing themselves and seeking different opinions. To deal with this problem, SNSs must spend considerable resources, including human moderators, to remove such content. In this paper, we propose a supervised learning model based on an ensemble method to detect hate content on SNSs, with the aim of making conversations on SNSs more effective. Our proposed model ranked first on the public leaderboard with a macro-F1 score of 0.730 and third on the private leaderboard with a macro-F1 score of 0.584 at the sixth International Workshop on Vietnamese Language and Speech Processing (VLSP 2019).
Tasks	Hate Speech Detection
Published	2019-10-12
URL	https://arxiv.org/abs/1910.05608v1
PDF	https://arxiv.org/pdf/1910.05608v1.pdf
PWC	https://paperswithcode.com/paper/vais-hate-speech-detection-system-a-deep
Repo
Framework
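The leaderboard numbers above are macro-F1 scores, i.e. the unweighted mean of per-class F1, so rare classes (such as hate) count as much as the majority class. The sketch computes macro-F1 and also shows soft voting, one common way to combine several classifiers' probability outputs; soft voting is our assumption for illustration, since the abstract describes the combination as deep-learning-based and does not specify it. Function names are hypothetical.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(prob_lists: List[List[List[float]]]) -> List[int]:
    """Average class probabilities across models, then take the arg-max per example.

    prob_lists[m][i] is model m's probability vector for example i.
    """
    n_models = len(prob_lists)
    preds = []
    for per_model in zip(*prob_lists):  # one probability vector per model
        avg = [sum(col) / n_models for col in zip(*per_model)]
        preds.append(max(range(len(avg)), key=avg.__getitem__))
    return preds


# Per-class F1 here: 0.5, 0.8 and 2/3, so macro-F1 is about 0.656.
print(macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```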