January 27, 2020


Paper Group ANR 1296

Graph Pruning for Model Compression. CARS: Continuous Evolution for Efficient Neural Architecture Search. Visual Wake Words Dataset. Bypassing Backdoor Detection Algorithms in Deep Learning. A Random Gossip BMUF Process for Neural Language Modeling. AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning. Unpaired Image Enhancement Featur …

Graph Pruning for Model Compression


Title	Graph Pruning for Model Compression
Authors	Mingyang Zhang, Xinyi Yu, Jingtao Rong, Linlin Ou, Weidong Zhang
Abstract	Previous AutoML pruning works utilized individual layer features to automatically prune filters. We analyze the correlation for two layers from different blocks which have a short-cut structure. It is found that, in one block, the deeper layer has many redundant filters which can be represented by filters in the former layer, so it is necessary to take information from other layers into account when pruning. In this paper, a graph pruning approach is proposed, which views any deep model as a topology graph. Graph PruningNet based on the graph convolution network is designed to automatically extract neighboring information for each node. To extract features from various topologies, Graph PruningNet is connected with Pruned Network by an individual fully connected layer for each node and jointly trained on a training dataset from scratch. Thus, we can obtain reasonable weights for any size of sub-network. We then search for the best configuration of the Pruned Network by reinforcement learning. Different from previous work, we take the node features from well-trained Graph PruningNet, instead of the hand-crafted features, as the states in reinforcement learning. Compared with other AutoML pruning works, our method has achieved the state-of-the-art under the same conditions on ImageNet-2012. The code will be released on GitHub.
Tasks	AutoML, Model Compression
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09817v1
PDF	https://arxiv.org/pdf/1911.09817v1.pdf
PWC	https://paperswithcode.com/paper/graph-pruning-for-model-compression
Repo
Framework

CARS: Continuous Evolution for Efficient Neural Architecture Search


Title	CARS: Continuous Evolution for Efficient Neural Architecture Search
Authors	Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, Chang Xu
Abstract	Searching techniques in most of existing neural architecture search (NAS) algorithms are mainly dominated by differentiable methods for the efficiency reason. In contrast, we develop an efficient continuous evolutionary approach for searching neural networks. Architectures in the population that share parameters within one SuperNet in the latest generation will be tuned over the training dataset with a few epochs. The searching in the next evolution generation will directly inherit both the SuperNet and the population, which accelerates the optimal network generation. The non-dominated sorting strategy is further applied to preserve only results on the Pareto front for accurately updating the SuperNet. Several neural networks with different model sizes and performances will be produced after the continuous search with only 0.4 GPU days. As a result, our framework provides a series of networks with the number of parameters ranging from 3.7M to 5.1M under mobile settings. These networks surpass those produced by the state-of-the-art methods on the benchmark ImageNet dataset.
Tasks	Neural Architecture Search
Published	2019-09-11
URL	https://arxiv.org/abs/1909.04977v6
PDF	https://arxiv.org/pdf/1909.04977v6.pdf
PWC	https://paperswithcode.com/paper/cars-continuous-evolution-for-efficient
Repo
Framework
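
The non-dominated sorting step mentioned in the abstract keeps only candidates on the Pareto front. The following is a minimal sketch of that filter, assuming two objectives (maximize accuracy, minimize parameter count). The candidate names and all numbers are made up for illustration, and the paper's own sorting variant and SuperNet update are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical candidate record: (name, accuracy in %, parameters in millions).
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


# Toy population with invented values (not results from the paper).
population = [
    ("net_a", 72.0, 3.0),
    ("net_b", 74.0, 4.0),
    ("net_c", 73.0, 4.5),  # dominated by net_b: less accurate and larger
    ("net_d", 75.0, 5.0),
    ("net_e", 71.0, 3.5),  # dominated by net_a: less accurate and larger
]
front = pareto_front(population)
print([name for name, _, _ in front])
```

Full non-dominated sorting would repeat this filter on the remaining candidates to rank successive fronts; only the first front is computed here.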

Visual Wake Words Dataset


Title	Visual Wake Words Dataset
Authors	Aakanksha Chowdhery, Pete Warden, Jonathon Shlens, Andrew Howard, Rocky Rhodes
Abstract	The emergence of Internet of Things (IoT) applications requires intelligence on the edge. Microcontrollers provide a low-cost compute platform to deploy intelligent IoT applications using machine learning at scale, but have extremely limited on-chip memory and compute capability. To deploy computer vision on such devices, we need tiny vision models that fit within a few hundred kilobytes of memory footprint in terms of peak usage and model size on device storage. To facilitate the development of microcontroller friendly models, we present a new dataset, Visual Wake Words, that represents a common microcontroller vision use-case of identifying whether a person is present in the image or not, and provides a realistic benchmark for tiny vision models. Within a limited memory footprint of 250 KB, several state-of-the-art mobile models achieve accuracy of 85-90% on the Visual Wake Words dataset. We anticipate the proposed dataset will advance the research on tiny vision models that can push the pareto-optimal boundary in terms of accuracy versus memory usage for microcontroller applications.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05721v1
PDF	https://arxiv.org/pdf/1906.05721v1.pdf
PWC	https://paperswithcode.com/paper/visual-wake-words-dataset
Repo
Framework

Bypassing Backdoor Detection Algorithms in Deep Learning


Title	Bypassing Backdoor Detection Algorithms in Deep Learning
Authors	Te Juin Lester Tan, Reza Shokri
Abstract	Deep learning models are known to be vulnerable to various adversarial manipulations of the training data, model parameters, and input data. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so the model behaves according to the adversary’s objective if the input contains the backdoor features (e.g., a stamp on an image). The poisoned model’s behavior on clean data, however, remains unchanged. Many detection algorithms are designed to detect backdoors on input samples or model activation functions, in order to remove the backdoor. These algorithms rely on the statistical difference between the latent representations of backdoor-enabled and clean input data in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass the existing detection algorithms including the state-of-the-art techniques (published in IEEE S&P 2019 and NeurIPS 2018). We design a strategic adversarial training that optimizes the original loss function of the model, and also maximizes the indistinguishability of the hidden representations of poisoned data and clean data. We show the effectiveness of our attack on multiple datasets and model architectures. This work calls for designing adversary-aware defense mechanisms for backdoor detection algorithms.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13409v1
PDF	https://arxiv.org/pdf/1905.13409v1.pdf
PWC	https://paperswithcode.com/paper/bypassing-backdoor-detection-algorithms-in
Repo
Framework

A Random Gossip BMUF Process for Neural Language Modeling


Title	A Random Gossip BMUF Process for Neural Language Modeling
Authors	Yiheng Huang, Jinchuan Tian, Lei Han, Guangsen Wang, Xingcheng Song, Dan Su, Dong Yu
Abstract	The neural network language model (NNLM) is an essential component of industrial ASR systems. One important challenge of training an NNLM is to strike a balance between scaling the learning process and handling big data. Conventional approaches such as block momentum provide a blockwise model update filtering (BMUF) process and achieve almost linear speedups with no performance degradation for speech recognition. However, BMUF needs to calculate the model average from all computing nodes (e.g., GPUs), and when the number of computing nodes is large, learning suffers from severe communication latency. As a consequence, BMUF is not suitable under restricted network conditions. In this paper, we present a decentralized BMUF process, in which the model is split into different components, each of which is updated by communicating with some randomly chosen neighbor nodes holding the same component, followed by a BMUF-like process. We apply this method to several LSTM language modeling tasks. Experimental results show that our approach achieves consistently better performance than conventional BMUF. In particular, we obtain a lower perplexity than the single-GPU baseline on the wiki-text-103 benchmark using 4 GPUs. In addition, no performance degradation is observed when scaling to 8 and 16 GPUs.
Tasks	Language Modelling, Speech Recognition
Published	2019-09-19
URL	https://arxiv.org/abs/1909.09010v3
PDF	https://arxiv.org/pdf/1909.09010v3.pdf
PWC	https://paperswithcode.com/paper/a-random-gossip-bmuf-process-for-neural
Repo
Framework
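
The decentralized idea in the abstract, where nodes average with randomly chosen neighbors instead of computing a global average, can be illustrated with plain randomized gossip averaging. This is only a sketch under simplifying assumptions: each node holds a single scalar "parameter", any two nodes may talk to each other, and neither the component splitting nor the BMUF-like block-momentum step from the paper is modeled.

```python
import random
from typing import List


def gossip_round(values: List[float], rng: random.Random) -> None:
    """Pick two distinct nodes at random and replace both values with their average."""
    i, j = rng.sample(range(len(values)), 2)
    mean = (values[i] + values[j]) / 2.0
    values[i] = mean
    values[j] = mean


rng = random.Random(0)
params = [1.0, 5.0, 9.0, 13.0]       # one toy scalar parameter per node
target = sum(params) / len(params)   # the global average no node computes directly

for _ in range(200):
    gossip_round(params, rng)

spread = max(params) - min(params)
print(params, target, spread)
```

Each pairwise exchange preserves the sum of all values, so the nodes drift toward the global mean without any all-to-all communication step.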

AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning


Title	AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning
Authors	Yunhui Guo, Yandong Li, Liqiang Wang, Tajana Rosing
Abstract	There is an increasing number of pre-trained deep neural network models. However, it is still unclear how to effectively use these models for a new task. Transfer learning, which aims to transfer knowledge from source tasks to a target task, is an effective solution to this problem. Fine-tuning is a popular transfer learning technique for deep neural networks where a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task. Despite its popularity, in this paper, we show that fine-tuning suffers from several drawbacks. We propose an adaptive fine-tuning approach, called AdaFilter, which selects only a part of the convolutional filters in the pre-trained model to optimize on a per-example basis. We use a recurrent gated network to selectively fine-tune convolutional filters based on the activations of the previous layer. We experiment with 7 public image classification datasets and the results show that AdaFilter can reduce the average classification error of the standard fine-tuning by 2.54%.
Tasks	Image Classification, Transfer Learning
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09659v2
PDF	https://arxiv.org/pdf/1911.09659v2.pdf
PWC	https://paperswithcode.com/paper/adafilter-adaptive-filter-fine-tuning-for
Repo
Framework

Unpaired Image Enhancement Featuring Reinforcement-Learning-Controlled Image Editing Software


Title	Unpaired Image Enhancement Featuring Reinforcement-Learning-Controlled Image Editing Software
Authors	Satoshi Kosugi, Toshihiko Yamasaki
Abstract	This paper tackles unpaired image enhancement, a task of learning a mapping function which transforms input images into enhanced images in the absence of input-output image pairs. Our method is based on generative adversarial networks (GANs), but instead of simply generating images with a neural network, we enhance images utilizing image editing software such as Adobe Photoshop for the following three benefits: enhanced images have no artifacts, the same enhancement can be applied to larger images, and the enhancement is interpretable. To incorporate image editing software into a GAN, we propose a reinforcement learning framework where the generator works as the agent that selects the software’s parameters and is rewarded when it fools the discriminator. Our framework can use high-quality non-differentiable filters present in image editing software, which enables image enhancement with high performance. We apply the proposed method to two unpaired image enhancement tasks: photo enhancement and face beautification. Our experimental results demonstrate that the proposed method achieves better performance, compared to the performances of the state-of-the-art methods based on unpaired learning.
Tasks	Image Enhancement
Published	2019-12-17
URL	https://arxiv.org/abs/1912.07833v1
PDF	https://arxiv.org/pdf/1912.07833v1.pdf
PWC	https://paperswithcode.com/paper/unpaired-image-enhancement-featuring
Repo
Framework

FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference


Title	FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference
Authors	Ruizhou Ding, Zeye Liu, Ting-Wu Chin, Diana Marculescu, R. D. Blanton
Abstract	To improve the throughput and energy efficiency of Deep Neural Networks (DNNs) on customized hardware, lightweight neural networks constrain the weights of DNNs to be a limited combination (denoted as $k\in\{1,2\}$) of powers of 2. In such networks, the multiply-accumulate operation can be replaced with a single shift operation, or two shifts and an add operation. To provide even more design flexibility, the $k$ for each convolutional filter can be optimally chosen instead of being fixed for every filter. In this paper, we formulate the selection of $k$ to be differentiable, and describe model training for determining $k$-based weights on a per-filter basis. Over 46 FPGA-design experiments involving eight configurations and four data sets reveal that lightweight neural networks with a flexible $k$ value (dubbed FLightNNs) fully utilize the hardware resources on Field Programmable Gate Arrays (FPGAs). Our experimental results show that FLightNNs can achieve a 2$\times$ speedup when compared to lightweight NNs with $k=2$, with only 0.1% accuracy degradation. Compared to a 4-bit fixed-point quantization, FLightNNs achieve higher accuracy and up to 2$\times$ inference speedup, due to their lightweight shift operations. In addition, our experiments also demonstrate that FLightNNs can achieve higher computational energy efficiency for ASIC implementation.
Tasks	Quantization
Published	2019-04-05
URL	http://arxiv.org/abs/1904.02835v1
PDF	http://arxiv.org/pdf/1904.02835v1.pdf
PWC	https://paperswithcode.com/paper/flightnns-lightweight-quantized-deep-neural
Repo
Framework
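
To make the "combination of $k$ powers of 2" constraint concrete, here is a small sketch that approximates a weight by a sum of $k$ signed powers of two. The greedy residual rounding used below is an assumption chosen for illustration; the paper's differentiable, per-filter selection of $k$ is not reproduced.

```python
import math
from typing import List


def nearest_power_of_two(x: float) -> float:
    """Signed power of two closest to x in the log2 domain (0.0 maps to 0.0)."""
    if x == 0.0:
        return 0.0
    exponent = round(math.log2(abs(x)))
    return math.copysign(2.0 ** exponent, x)


def quantize(weight: float, k: int) -> List[float]:
    """Greedily approximate weight as a sum of k signed powers of two."""
    terms = []
    residual = weight
    for _ in range(k):
        term = nearest_power_of_two(residual)
        terms.append(term)
        residual -= term  # the next term approximates what is still missing
    return terms


one_term = quantize(0.75, k=1)   # a single shift
two_terms = quantize(0.75, k=2)  # two shifts and an add
print(one_term, two_terms, sum(two_terms))
```

With $k=1$ the weight 0.75 is rounded to a single power of two, so a multiplication becomes one shift; with $k=2$ a second term corrects the residual, at the cost of a second shift and an add.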

Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds


Title	Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds
Authors	Bei Wang, Jianping An, Jiayan Cao
Abstract	Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-FPN, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. The encoder extracts multi-scale voxel information in a bottom-up manner, while the decoder fuses multiple feature maps from various scales in a top-down way. Extensive experiments show that the proposed method has better performance on extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.
Tasks	3D Object Detection, Autonomous Driving, Object Detection
Published	2019-06-28
URL	https://arxiv.org/abs/1907.05286v2
PDF	https://arxiv.org/pdf/1907.05286v2.pdf
PWC	https://paperswithcode.com/paper/voxel-fpn-multi-scale-voxel-feature
Repo
Framework
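
As a rough illustration of what "multi-scale voxel information" starts from, the sketch below bins the same toy point cloud into voxel grids of two different sizes. The point coordinates and voxel sizes are invented, and the encoder, decoder, and region proposal network described in the abstract are not modeled.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxelize(points: List[Point], voxel_size: float) -> Dict[VoxelKey, List[Point]]:
    """Group points by the integer index of the cubic voxel that contains them."""
    grid = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


# Toy point cloud (hypothetical coordinates in metres).
points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (0.9, 0.1, 0.1), (1.5, 0.2, 0.0)]

fine = voxelize(points, voxel_size=0.5)    # smaller voxels, more spatial detail
coarse = voxelize(points, voxel_size=1.0)  # larger voxels, more context per voxel
print(len(fine), len(coarse))
```

A multi-scale detector would compute features per voxel at each resolution and then fuse them; here only the grouping step is shown.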

Conditions for Unnecessary Logical Constraints in Kernel Machines


Title	Conditions for Unnecessary Logical Constraints in Kernel Machines
Authors	Francesco Giannini, Marco Maggini
Abstract	A main property of support vector machines consists in the fact that only a small portion of the training data is significant to determine the maximum margin separating hyperplane in the feature space, the so called support vectors. In a similar way, in the general scheme of learning from constraints, where possibly several constraints are considered, some of them may turn out to be unnecessary with respect to the learning optimization, even if they are active for a given optimal solution. In this paper we extend the definition of support vector to support constraint and we provide some criteria to determine which constraints can be removed from the learning problem still yielding the same optimal solutions. In particular, we discuss the case of logical constraints expressed by Lukasiewicz logic, where both inferential and algebraic arguments can be considered. Some theoretical results that characterize the concept of unnecessary constraint are proved and explained by means of examples.
Tasks
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00216v2
PDF	https://arxiv.org/pdf/1909.00216v2.pdf
PWC	https://paperswithcode.com/paper/conditions-for-unnecessary-logical
Repo
Framework
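
For readers unfamiliar with Lukasiewicz logic, the standard connectives on truth degrees in [0, 1] can be written down directly. The sketch below uses the textbook definitions; how the paper turns such formulas into learning constraints for kernel machines is not shown, and the predicate outputs in the example are invented.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz strong conjunction (t-norm)."""
    return max(0.0, a + b - 1.0)


def luk_or(a: float, b: float) -> float:
    """Lukasiewicz strong disjunction (t-conorm)."""
    return min(1.0, a + b)


def luk_not(a: float) -> float:
    """Lukasiewicz negation."""
    return 1.0 - a


def luk_implies(a: float, b: float) -> float:
    """Lukasiewicz implication (the residuum of the t-norm)."""
    return min(1.0, 1.0 - a + b)


# Hypothetical predicate outputs for one sample x: A(x) = 0.9, B(x) = 0.4.
truth_of_rule = luk_implies(0.9, 0.4)  # degree to which "A(x) implies B(x)" holds
print(truth_of_rule)
```

A rule such as "A(x) implies B(x)" is fully satisfied when its truth degree is 1, which happens exactly when A(x) is not larger than B(x).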

SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data driven decisions


Title	SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data driven decisions
Authors	Danny Rorabaugh, Mario Guevara, Ricardo Llamas, Joy Kitson, Rodrigo Vargas, Michela Taufer
Abstract	The current availability of soil moisture data over large areas comes from satellite remote sensing technologies (i.e., radar-based systems), but these data have coarse resolution and often exhibit large spatial information gaps. Where data are too coarse or sparse for a given need (e.g., precision agriculture), one can leverage machine-learning techniques coupled with other sources of environmental information (e.g., topography) to generate gap-free information and at a finer spatial resolution (i.e., increased granularity). To this end, we develop a spatial inference engine consisting of modular stages for processing spatial environmental data, generating predictions with machine-learning techniques, and analyzing these predictions. We demonstrate the functionality of this approach and the effects of data processing choices via multiple prediction maps over a United States ecological region with a highly diverse soil moisture profile (i.e., the Middle Atlantic Coastal Plains). The relevance of our work derives from a pressing need to improve the spatial representation of soil moisture for applications in environmental sciences (e.g., ecological niche modeling, carbon monitoring systems, and other Earth system models) and precision agriculture (e.g., optimizing irrigation practices and other land management decisions).
Tasks
Published	2019-04-16
URL	https://arxiv.org/abs/1904.07754v2
PDF	https://arxiv.org/pdf/1904.07754v2.pdf
PWC	https://paperswithcode.com/paper/somospie-a-modular-soil-moisture-spatial
Repo
Framework

Almost Boltzmann Exploration


Title	Almost Boltzmann Exploration
Authors	Harsh Gupta, Seo Taek Kong, R. Srikant, Weina Wang
Abstract	Boltzmann exploration is widely used in reinforcement learning to provide a trade-off between exploration and exploitation. Recently, in (Cesa-Bianchi et al., 2017) it has been shown that pure Boltzmann exploration does not perform well from a regret perspective, even in the simplest setting of stochastic multi-armed bandit (MAB) problems. In this paper, we show that a simple modification to Boltzmann exploration, motivated by a variation of the standard doubling trick, achieves $O(K\log^{1+\alpha} T)$ regret for a stochastic MAB problem with $K$ arms, where $\alpha>0$ is a parameter of the algorithm. This improves on the result in (Cesa-Bianchi et al., 2017), where an algorithm inspired by the Gumbel-softmax trick achieves $O(K\log^2 T)$ regret. We also show that our algorithm achieves $O(\beta(G) \log^{1+\alpha} T)$ regret in stochastic MAB problems with graph-structured feedback, without knowledge of the graph structure, where $\beta(G)$ is the independence number of the feedback graph. Additionally, we present extensive experimental results on real datasets and applications for multi-armed bandits with both traditional bandit feedback and graph-structured feedback. In all cases, our algorithm performs as well as or better than the state-of-the-art.
Tasks	Multi-Armed Bandits
Published	2019-01-25
URL	http://arxiv.org/abs/1901.08708v2
PDF	http://arxiv.org/pdf/1901.08708v2.pdf
PWC	https://paperswithcode.com/paper/almost-boltzmann-exploration
Repo
Framework
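
For context, the baseline that the abstract starts from, pure Boltzmann (softmax) exploration, can be sketched in a few lines. This sketch assumes Bernoulli rewards and a fixed temperature, both chosen for illustration; it is the unmodified baseline, not the paper's doubling-trick variant, and the arm means are made up.

```python
import math
import random
from typing import List


def boltzmann_probs(estimates: List[float], temperature: float) -> List[float]:
    """Softmax over estimated arm means; subtracting the max keeps exp() stable."""
    scaled = [m / temperature for m in estimates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(true_means: List[float], horizon: int,
               temperature: float, seed: int = 0) -> List[int]:
    """Play a Bernoulli bandit with fixed-temperature Boltzmann exploration."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(horizon):
        probs = boltzmann_probs(estimates, temperature)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts


# Hypothetical three-armed bandit.
counts = run_bandit([0.2, 0.5, 0.8], horizon=1000, temperature=0.1)
print(counts)
```

A lower temperature makes the policy greedier, while a higher one spreads pulls more evenly across arms.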

Credibility-based Fake News Detection


Title	Credibility-based Fake News Detection
Authors	Niraj Sitaula, Chilukuri K. Mohan, Jennifer Grygiel, Xinyi Zhou, Reza Zafarani
Abstract	Fake news can significantly misinform people who often rely on online sources and social media for their information. Current research on fake news detection has mostly focused on analyzing fake news content and how it propagates on a network of users. In this paper, we emphasize the detection of fake news by assessing its credibility. By analyzing public fake news data, we show that information on news sources (and authors) can be a strong indicator of credibility. Our findings suggest that an author’s history of association with fake news, and the number of authors of a news article, can play a significant role in detecting fake news. Our approach can help improve traditional fake news detection methods, wherein content features are often used to detect fake news.
Tasks	Fake News Detection
Published	2019-11-02
URL	https://arxiv.org/abs/1911.00643v1
PDF	https://arxiv.org/pdf/1911.00643v1.pdf
PWC	https://paperswithcode.com/paper/credibility-based-fake-news-detection
Repo
Framework

SenseFitting: Sense Level Semantic Specialization of Word Embeddings for Word Sense Disambiguation


Title	SenseFitting: Sense Level Semantic Specialization of Word Embeddings for Word Sense Disambiguation
Authors	Manuel Stoeckel, Sajawel Ahmed, Alexander Mehler
Abstract	We introduce a neural network-based system of Word Sense Disambiguation (WSD) for German that is based on SenseFitting, a novel method for optimizing WSD. We outperform knowledge-based WSD methods by up to 25% F1-score and produce a new state-of-the-art on the German sense-annotated dataset WebCAGe. Our method uses three feature vectors consisting of a) sense, b) gloss, and c) relational vectors to represent target senses and to compare them with the vector centroids of sample contexts. Utilizing widely available word embeddings and lexical resources, we are able to compensate for the lower resource availability of German. SenseFitting builds upon the recently introduced semantic specialization procedure Attract-Repel, and leverages sense level semantic constraints from lexical-semantic networks (e.g. GermaNet) or online social dictionaries (e.g. Wiktionary) to produce high-quality sense embeddings from pre-trained word embeddings. We evaluate our sense embeddings with a new SimLex-999 based similarity dataset, called SimSense, that we developed for this work. We achieve results that outperform current lemma-based specialization methods for German, making them comparable to results achieved for English.
Tasks	Word Embeddings, Word Sense Disambiguation
Published	2019-07-30
URL	https://arxiv.org/abs/1907.13237v1
PDF	https://arxiv.org/pdf/1907.13237v1.pdf
PWC	https://paperswithcode.com/paper/sensefitting-sense-level-semantic
Repo
Framework
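
The comparison step described in the abstract, matching candidate sense representations against the vector centroid of a sample context, can be sketched with cosine similarity. This is a simplified illustration: it uses a single vector per sense rather than the three feature vectors (sense, gloss, relational) the abstract mentions, and the sense labels and 3-dimensional embeddings are invented.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the context word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def disambiguate(context: List[Vector], senses: Dict[str, Vector]) -> str:
    """Return the sense whose vector is most similar to the context centroid."""
    center = centroid(context)
    return max(senses, key=lambda name: cosine(center, senses[name]))


# Toy embeddings for two senses of the German word "Bank" (hypothetical values).
senses = {"bank_finance": [1.0, 0.0, 0.0], "bank_bench": [0.0, 1.0, 0.0]}
context = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]  # vectors of the surrounding words
best = disambiguate(context, senses)
print(best)
```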

Enhancement of Energy-Based Swing-Up Controller via Entropy Search


Title	Enhancement of Energy-Based Swing-Up Controller via Entropy Search
Authors	Chang Sik Lee, Dong Eui Chang
Abstract	An energy-based approach for stabilizing a mechanical system offers a simple yet powerful control scheme. However, since it imposes no strong constraints on the parameter space of the controller, finding appropriate parameter values for an optimal controller is known to be hard. This paper aims to generate an optimal energy-based controller for swinging up a rotary inverted pendulum, also known as the Furuta pendulum, by applying a Bayesian optimization method called Entropy Search. Simulations and experiments show that the optimal controller has improved performance compared to a nominal controller for various initial conditions.
Tasks
Published	2019-04-02
URL	http://arxiv.org/abs/1904.01214v2
PDF	http://arxiv.org/pdf/1904.01214v2.pdf
PWC	https://paperswithcode.com/paper/enhancement-of-energy-based-swing-up
Repo
Framework