July 30, 2019

Paper Group AWR 64

Hello Edge: Keyword Spotting on Microcontrollers

Title Hello Edge: Keyword Spotting on Microcontrollers
Authors Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
Abstract Keyword spotting (KWS) is a critical component for enabling speech-based user interactions on smart devices. It requires real-time response and high accuracy for good user experience. Recently, neural networks have become an attractive choice for KWS architecture because of their superior accuracy compared to traditional speech processing algorithms. Due to its always-on nature, a KWS application has a highly constrained power budget and typically runs on tiny microcontrollers with limited memory and compute capability. The design of neural network architecture for KWS must consider these constraints. In this work, we perform neural network architecture evaluation and exploration for running KWS on resource-constrained microcontrollers. We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements. We show that it is possible to optimize these neural network architectures to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy. We further explore the depthwise separable convolutional neural network (DS-CNN) and compare it against other neural network architectures. DS-CNN achieves an accuracy of 95.4%, which is ~10% higher than the DNN model with a similar number of parameters.
Tasks Keyword Spotting
Published 2017-11-20
URL http://arxiv.org/abs/1711.07128v3
PDF http://arxiv.org/pdf/1711.07128v3.pdf
PWC https://paperswithcode.com/paper/hello-edge-keyword-spotting-on
Repo https://github.com/Danila-github/ML-KWS-for-MCU
Framework tf
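
The abstract's interest in DS-CNN under tight memory budgets can be illustrated with a parameter count. The sketch below compares a standard convolution with a depthwise separable one (a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution). The kernel size and channel counts are hypothetical values chosen for illustration, not taken from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 64 output channels.
standard = standard_conv_params(3, 64, 64)
separable = depthwise_separable_params(3, 64, 64)
print(standard, separable)
```

For this hypothetical layer the separable form needs roughly 8 times fewer weights, which suggests why such layers suit microcontroller-class memory.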

DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples

Title DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples
Authors Ji Gao, Beilun Wang, Zeming Lin, Weilin Xu, Yanjun Qi
Abstract Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. It was observed that an adversary could easily generate adversarial samples by making a small perturbation on irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use to generate adversarial samples and therefore increases the robustness against such inputs. Compared with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.
Tasks
Published 2017-02-22
URL http://arxiv.org/abs/1702.06763v8
PDF http://arxiv.org/pdf/1702.06763v8.pdf
PWC https://paperswithcode.com/paper/deepcloak-masking-deep-neural-network-models
Repo https://github.com/QData/DeepCloak
Framework none

Count-ception: Counting by Fully Convolutional Redundant Counting

Title Count-ception: Counting by Fully Convolutional Redundant Counting
Authors Joseph Paul Cohen, Genevieve Boucher, Craig A. Glastonbury, Henry Z. Lo, Yoshua Bengio
Abstract Counting objects in digital images is a process that should be replaced by machines. This tedious task is time consuming and prone to errors due to fatigue of human annotators. The goal is to have a system that takes as input an image and returns a count of the objects inside and justification for the prediction in the form of object localization. We repose a problem, originally posed by Lempitsky and Zisserman, to instead predict a count map which contains redundant counts based on the receptive field of a smaller regression network. The regression network predicts a count of the objects that exist inside this frame. By processing the image in a fully convolutional way, each pixel is accounted for as many times as there are windows that include it, which equals the area of each window (e.g., 32x32 = 1024). To recover the true count we take the average over the redundant predictions. Our contribution is redundant counting instead of predicting a density map in order to average over errors. We also propose a novel deep neural network architecture adapted from the Inception family of networks called the Count-ception network. Together, our approach results in a 20% relative improvement (2.9 to 2.3 MAE) over the state of the art method by Xie, Noble, and Zisserman in 2016.
Tasks Object Localization
Published 2017-03-25
URL http://arxiv.org/abs/1703.08710v2
PDF http://arxiv.org/pdf/1703.08710v2.pdf
PWC https://paperswithcode.com/paper/count-ception-counting-by-fully-convolutional
Repo https://github.com/ieee8023/countception
Framework none
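
The redundant-counting arithmetic in the abstract can be sketched in a few lines. The example below builds an idealized count map directly from point annotations, standing in for what the regression network would predict, and recovers the object count by dividing the map's sum by the window area. The function names, the padding choice, and the toy image size are assumptions made for illustration.

```python
def redundant_count_map(points, height, width, window):
    """Idealized count map: entry (i, j) holds the number of objects inside the
    window x window frame whose bottom-right corner sits at (i, j) in an image
    padded by window - 1 pixels, so every pixel is covered window**2 times."""
    out_h = height + window - 1
    out_w = width + window - 1
    count_map = [[0] * out_w for _ in range(out_h)]
    for r, c in points:
        # An object at (r, c) falls inside exactly window * window frames.
        for i in range(r, r + window):
            for j in range(c, c + window):
                count_map[i][j] += 1
    return count_map


def recover_count(count_map, window):
    """Average out the redundancy: total of the map divided by the window area."""
    total = sum(sum(row) for row in count_map)
    return total / (window * window)


# Toy example: three objects in a 5 x 5 image, 3 x 3 window (hypothetical sizes).
cmap = redundant_count_map([(0, 0), (2, 3), (4, 4)], height=5, width=5, window=3)
print(recover_count(cmap, window=3))
```

With a learned network, each entry of the map carries its own error, and the division averages over those errors rather than relying on a single prediction.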

Human Pose Regression by Combining Indirect Part Detection and Contextual Information

Title Human Pose Regression by Combining Indirect Part Detection and Contextual Information
Authors Diogo C. Luvizon, Hedi Tabia, David Picard
Abstract In this paper, we propose an end-to-end trainable regression approach for human pose estimation from still images. We use the proposed Soft-argmax function to convert feature maps directly to joint coordinates, resulting in a fully differentiable framework. Our method is able to learn heat map representations indirectly, without additional steps of artificial ground truth generation. Consequently, contextual information can be included in the pose predictions in a seamless way. We evaluated our method on two very challenging datasets, the Leeds Sports Poses (LSP) and the MPII Human Pose datasets, reaching the best performance among all the existing regression methods and comparable results to the state-of-the-art detection based approaches.
Tasks Pose Estimation
Published 2017-10-06
URL http://arxiv.org/abs/1710.02322v1
PDF http://arxiv.org/pdf/1710.02322v1.pdf
PWC https://paperswithcode.com/paper/human-pose-regression-by-combining-indirect
Repo https://github.com/dluvizon/pose-regression
Framework tf
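
A soft-argmax, in its usual form, applies a softmax over a heat map and returns the expected coordinate under the resulting distribution, which keeps the mapping from heat map to joint position differentiable. The sketch below is a plain-Python illustration of that general idea; returning pixel indices rather than normalized coordinates is an assumption, and the paper's exact formulation may differ.

```python
import math


def soft_argmax_2d(heatmap):
    """Return (x, y) as the softmax-weighted mean of column and row indices."""
    peak = max(max(row) for row in heatmap)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Hypothetical 3 x 4 heat map with a strong response at row 1, column 2.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 50.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(soft_argmax_2d(heatmap))
```

A sharply peaked map yields coordinates very close to the hard argmax, while a flat map yields the center of the grid.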

Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks

Title Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks
Authors Nghi D. Q. Bui, Lingxiao Jiang, Yijun Yu
Abstract Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language. The combination layer of the networks recognizes the similarities and differences among code in different programming languages. The BiTBCNNs are trained using the source code in different languages but known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets from 6 algorithms we crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task to tell whether any given two code snippets implement the same algorithm. Also, for the algorithm classification task, i.e., to predict which one of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
Tasks
Published 2017-10-17
URL http://arxiv.org/abs/1710.06159v2
PDF http://arxiv.org/pdf/1710.06159v2.pdf
PWC https://paperswithcode.com/paper/cross-language-learning-for-program
Repo https://github.com/bdqnghi/bi-tbcnn
Framework tf

Adversarial Examples for Semantic Segmentation and Object Detection

Title Adversarial Examples for Semantic Segmentation and Object Detection
Authors Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille
Abstract It has been well demonstrated that adversarial examples, i.e., natural images with visually imperceptible perturbations added, generally exist for deep networks to fail on image classification. In this paper, we extend adversarial examples to semantic segmentation and object detection which are much more difficult. Our observation is that both segmentation and detection are based on classifying multiple targets on an image (e.g., the basic target is a pixel or a receptive field in segmentation, and an object proposal in detection), which inspires us to optimize a loss function over a set of pixels/proposals for generating adversarial perturbations. Based on this idea, we propose a novel algorithm named Dense Adversary Generation (DAG), which generates a large family of adversarial examples, and applies to a wide range of state-of-the-art deep networks for segmentation and detection. We also find that the adversarial perturbations can be transferred across networks with different training data, based on different architectures, and even for different recognition tasks. In particular, the transferability across networks with the same architecture is more significant than in other cases. Besides, summing up heterogeneous perturbations often leads to better transfer performance, which provides an effective method of black-box adversarial attack.
Tasks Adversarial Attack, Object Detection, Semantic Segmentation
Published 2017-03-24
URL http://arxiv.org/abs/1703.08603v3
PDF http://arxiv.org/pdf/1703.08603v3.pdf
PWC https://paperswithcode.com/paper/adversarial-examples-for-semantic
Repo https://github.com/cihangxie/DAG
Framework none

AttGAN: Facial Attribute Editing by Only Changing What You Want

Title AttGAN: Facial Attribute Editing by Only Changing What You Want
Authors Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan, Xilin Chen
Abstract Facial attribute editing aims to manipulate single or multiple attributes of a face image, i.e., to generate a new face with desired attributes while preserving other details. Recently, generative adversarial net (GAN) and encoder-decoder architecture are usually incorporated to handle this task with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth and distorted generation. Instead of imposing constraints on the latent representation, in this work we apply an attribute classification constraint to the generated image to just guarantee the correct change of desired attributes, i.e., to “change what you want”. Meanwhile, the reconstruction learning is introduced to preserve attribute-excluding details, in other words, to “only change what you want”. Besides, the adversarial learning is employed for visually realistic editing. These three components cooperate with each other, forming an effective framework for high quality facial attribute editing, referred to as AttGAN. Furthermore, our method is also directly applicable for attribute intensity control and can be naturally extended for attribute style manipulation. Experiments on CelebA dataset show that our method outperforms the state of the art on realistic attribute editing with facial details well preserved.
Tasks
Published 2017-11-29
URL http://arxiv.org/abs/1711.10678v3
PDF http://arxiv.org/pdf/1711.10678v3.pdf
PWC https://paperswithcode.com/paper/attgan-facial-attribute-editing-by-only
Repo https://github.com/kszu/UniGAN
Framework tf

Neural Reranking for Named Entity Recognition

Title Neural Reranking for Named Entity Recognition
Authors Jie Yang, Yue Zhang, Fei Dong
Abstract We propose a neural reranking system for named entity recognition (NER). The basic idea is to leverage recurrent neural network models to learn sentence-level patterns that involve named entity mentions. In particular, given an output sentence produced by a baseline NER model, we replace all entity mentions, such as Barack Obama, with their entity types, such as PER. The resulting sentence patterns contain direct output information, yet are less sparse without specific named entities. For example, “PER was born in LOC” can be such a pattern. LSTM and CNN structures are utilised for learning deep representations of such sentences for reranking. Results show that our system can significantly improve the NER accuracies over two different baselines, giving the best reported results on a standard benchmark.
Tasks Named Entity Recognition
Published 2017-07-17
URL http://arxiv.org/abs/1707.05127v1
PDF http://arxiv.org/pdf/1707.05127v1.pdf
PWC https://paperswithcode.com/paper/neural-reranking-for-named-entity-recognition
Repo https://github.com/jiesutd/RerankNER
Framework none
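
The pattern construction described in the abstract, replacing each entity mention with its type, can be sketched as below. The span format (start, end, label) with an exclusive end index, the function name, and the example sentence are assumptions for illustration; the actual output format of the baseline NER model is not given here.

```python
def to_sentence_pattern(tokens, entities):
    """Collapse each entity span into its type label.

    entities: non-overlapping (start, end, label) spans, end exclusive.
    """
    spans = {start: (end, label) for start, end, label in entities}
    pattern = []
    i = 0
    while i < len(tokens):
        if i in spans:
            end, label = spans[i]
            pattern.append(label)   # one label replaces the whole mention
            i = end
        else:
            pattern.append(tokens[i])
            i += 1
    return pattern


# Hypothetical baseline output for one sentence.
tokens = "Barack Obama was born in Hawaii".split()
entities = [(0, 2, "PER"), (5, 6, "LOC")]
print(" ".join(to_sentence_pattern(tokens, entities)))
```

Each candidate labeling from the baseline would yield its own pattern, and the reranker scores those patterns rather than the raw sentences.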

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models

Title Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Authors Sergey Ioffe
Abstract Batch Normalization is quite effective at accelerating and improving the training of deep models. However, its effectiveness diminishes when the training minibatches are small, or do not consist of independent samples. We hypothesize that this is due to the dependence of model layer inputs on all the examples in the minibatch, and different activations being produced between training and inference. We propose Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch. Models trained with Batch Renormalization perform substantially better than batchnorm when training with small or non-i.i.d. minibatches. At the same time, Batch Renormalization retains the benefits of batchnorm such as insensitivity to initialization and training efficiency.
Tasks
Published 2017-02-10
URL http://arxiv.org/abs/1702.03275v2
PDF http://arxiv.org/pdf/1702.03275v2.pdf
PWC https://paperswithcode.com/paper/batch-renormalization-towards-reducing
Repo https://github.com/mf1024/Batch-Renormalization-PyTorch
Framework pytorch
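
The abstract does not state the correction itself. The sketch below follows the formulation commonly attributed to Batch Renormalization: normalize with minibatch statistics, then apply a clipped scale r and shift d derived from the moving statistics. The function name, default clipping bounds, and scalar (single-feature) setting are assumptions; the learnable scale and shift, the moving-average updates, and the treatment of r and d as constants during backpropagation are omitted.

```python
import math


def clip(value, low, high):
    return max(low, min(high, value))


def batch_renorm(batch, moving_mean, moving_std, r_max=3.0, d_max=5.0, eps=1e-5):
    """Normalize one feature over a minibatch with a renormalization correction."""
    n = len(batch)
    mu_b = sum(batch) / n
    var_b = sum((x - mu_b) ** 2 for x in batch) / n
    sigma_b = math.sqrt(var_b + eps)
    # r and d pull the minibatch normalization toward the moving statistics.
    r = clip(sigma_b / moving_std, 1.0 / r_max, r_max)
    d = clip((mu_b - moving_mean) / moving_std, -d_max, d_max)
    return [(x - mu_b) / sigma_b * r + d for x in batch]


# Hypothetical minibatch and moving statistics.
out = batch_renorm([1.0, 2.0, 3.0, 6.0], moving_mean=2.5, moving_std=2.0)
print(out)
```

When neither r nor d is clipped, the output reduces algebraically to (x - moving_mean) / moving_std, the same normalization used at inference, which matches the abstract's goal of making training and inference outputs agree per example.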

Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

Title Distributed Deep Neural Networks over the Cloud, the Edge and End Devices
Authors Surat Teerapittayanon, Bradley McDanel, H. T. Kung
Abstract We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed computing hierarchy, a DDNN can scale up in neural network size and scale out in geographical span. Due to their distributed nature, DDNNs enhance sensor fusion, system fault tolerance and data privacy for DNN applications. In implementing a DDNN, we map sections of a DNN onto a distributed computing hierarchy. By jointly training these sections, we minimize communication and resource usage for devices and maximize usefulness of extracted features which are utilized in the cloud. The resulting system has built-in support for automatic sensor fusion and fault tolerance. As a proof of concept, we show a DDNN can exploit geographical diversity of sensors to improve object recognition accuracy and reduce communication cost. In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.
Tasks Object Recognition, Sensor Fusion
Published 2017-09-06
URL http://arxiv.org/abs/1709.01921v1
PDF http://arxiv.org/pdf/1709.01921v1.pdf
PWC https://paperswithcode.com/paper/distributed-deep-neural-networks-over-the
Repo https://github.com/kunglab/ddnn
Framework pytorch
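
One way to picture "fast and localized inference" at an end device is a confidence-gated exit: the shallow local classifier answers when it is confident and defers to the cloud otherwise. The sketch below uses normalized entropy of the local class probabilities as the confidence measure. That criterion, the threshold value, and the function names are assumptions for illustration and are not stated in the abstract.

```python
import math


def normalized_entropy(probs):
    """Entropy of a probability vector scaled to [0, 1] by log(number of classes)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def route(local_probs, threshold=0.5):
    """Return ("local", class_index) when the local exit is confident enough,
    otherwise ("cloud", None) to signal that features should be sent upstream."""
    if normalized_entropy(local_probs) < threshold:
        best = max(range(len(local_probs)), key=local_probs.__getitem__)
        return "local", best
    return "cloud", None


# Hypothetical outputs of a shallow on-device classifier.
print(route([0.97, 0.02, 0.01]))   # confident prediction
print(route([0.40, 0.30, 0.30]))   # ambiguous prediction
```

Raising the threshold keeps more samples on the device and lowers communication, at the cost of accepting less certain local answers.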

Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction

Title Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction
Authors Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, Tie-Yan Liu
Abstract Stock trend prediction plays a critical role in seeking maximized profit from stock investment. However, precise trend prediction is very difficult because of the highly volatile and non-stationary nature of the stock market. Exploding information on the Internet, together with the advancing development of natural language processing and text mining techniques, has enabled investors to unveil market trends and volatility from online content. Unfortunately, the quality, trustworthiness and comprehensiveness of online content related to the stock market vary drastically, and a large portion consists of low-quality news, comments, or even rumors. To address this challenge, we imitate the learning process of human beings facing such chaotic online news, driven by three principles: sequential content dependency, diverse influence, and effective and efficient learning. In this paper, to capture the first two principles, we designed Hybrid Attention Networks to predict the stock trend based on the sequence of recent related news. Moreover, we apply the self-paced learning mechanism to imitate the third principle. Extensive experiments on real-world stock market data demonstrate the effectiveness of our approach.
Tasks Stock Trend Prediction
Published 2017-12-06
URL http://arxiv.org/abs/1712.02136v3
PDF http://arxiv.org/pdf/1712.02136v3.pdf
PWC https://paperswithcode.com/paper/listening-to-chaotic-whispers-a-deep-learning
Repo https://github.com/Pie33000/stock-prediction
Framework none

Separating Self-Expression and Visual Content in Hashtag Supervision

Title Separating Self-Expression and Visual Content in Hashtag Supervision
Authors Andreas Veit, Maximilian Nickel, Serge Belongie, Laurens van der Maaten
Abstract The variety, abundance, and structured nature of hashtags make them an interesting data source for training vision models. For instance, hashtags have the potential to significantly reduce the problem of manual supervision and annotation when learning vision models for a large number of concepts. However, a key challenge when learning from hashtags is that they are inherently subjective because they are provided by users as a form of self-expression. As a consequence, hashtags may have synonyms (different hashtags referring to the same visual content) and may be ambiguous (the same hashtag referring to different visual content). These challenges limit the effectiveness of approaches that simply treat hashtags as image-label pairs. This paper presents an approach that extends upon modeling simple image-label pairs by modeling the joint distribution of images, hashtags, and users. We demonstrate the efficacy of such approaches in image tagging and retrieval experiments, and show how the joint model can be used to perform user-conditional retrieval and tagging.
Tasks
Published 2017-11-27
URL http://arxiv.org/abs/1711.09825v1
PDF http://arxiv.org/pdf/1711.09825v1.pdf
PWC https://paperswithcode.com/paper/separating-self-expression-and-visual-content
Repo https://github.com/weiyinwei/GCN_PHR
Framework pytorch

A Knowledge-Grounded Neural Conversation Model

Title A Knowledge-Grounded Neural Conversation Model
Authors Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley
Abstract Neural network models are capable of generating extremely natural sounding conversational interactions. Nevertheless, these models have yet to demonstrate that they can incorporate content in the form of factual information or entity-grounded opinion that would enable them to serve in more task-oriented conversational applications. This paper presents a novel, fully data-driven, and knowledge-grounded neural conversation model aimed at producing more contentful responses without slot filling. We generalize the widely-used Seq2Seq approach by conditioning responses on both conversation history and external “facts”, allowing the model to be versatile and applicable in an open-domain setting. Our approach yields significant improvements over a competitive Seq2Seq baseline. Human judges found that our outputs are significantly more informative.
Tasks Slot Filling
Published 2017-02-07
URL http://arxiv.org/abs/1702.01932v2
PDF http://arxiv.org/pdf/1702.01932v2.pdf
PWC https://paperswithcode.com/paper/a-knowledge-grounded-neural-conversation
Repo https://github.com/DSTC-MSR-NLP/DSTC7-End-to-End-Conversation-Modeling
Framework none

Textual Entailment with Structured Attentions and Composition

Title Textual Entailment with Structured Attentions and Composition
Authors Kai Zhao, Liang Huang, Mingbo Ma
Abstract Deep learning techniques are increasingly popular in the textual entailment task, overcoming the fragility of traditional discrete models with hard alignments and logics. In particular, the recently proposed attention models (Rocktäschel et al., 2015; Wang and Jiang, 2015) achieve state-of-the-art accuracy by computing soft word alignments between the premise and hypothesis sentences. However, there remains a major limitation: this line of work completely ignores syntax and recursion, which are helpful in many traditional efforts. We show that it is beneficial to extend the attention model to tree nodes between premise and hypothesis. More importantly, this subtree-level attention reveals information about entailment relation. We study the recursive composition of this subtree-level entailment relation, which can be viewed as a soft version of the Natural Logic framework (MacCartney and Manning, 2009). Experiments show that our structured attention and entailment composition model can correctly identify and infer entailment relations from the bottom up, and bring significant improvements in accuracy.
Tasks Natural Language Inference
Published 2017-01-04
URL http://arxiv.org/abs/1701.01126v1
PDF http://arxiv.org/pdf/1701.01126v1.pdf
PWC https://paperswithcode.com/paper/textual-entailment-with-structured-attentions
Repo https://github.com/kaayy/structured-attention
Framework torch

A Pursuit of Temporal Accuracy in General Activity Detection

Title A Pursuit of Temporal Accuracy in General Activity Detection
Authors Yuanjun Xiong, Yue Zhao, Limin Wang, Dahua Lin, Xiaoou Tang
Abstract Detecting activities in untrimmed videos is an important but challenging task. The performance of existing methods remains unsatisfactory, e.g., they often meet difficulties in locating the beginning and end of a long complex action. In this paper, we propose a generic framework that can accurately detect a wide variety of activities from untrimmed videos. Our first contribution is a novel proposal scheme that can efficiently generate candidates with accurate temporal boundaries. The other contribution is a cascaded classification pipeline that explicitly distinguishes between relevance and completeness of a candidate instance. On two challenging temporal activity detection datasets, THUMOS14 and ActivityNet, the proposed framework significantly outperforms the existing state-of-the-art methods, demonstrating superior accuracy and strong adaptivity in handling activities with various temporal structures.
Tasks Action Detection, Activity Detection, Temporal Action Localization
Published 2017-03-08
URL http://arxiv.org/abs/1703.02716v1
PDF http://arxiv.org/pdf/1703.02716v1.pdf
PWC https://paperswithcode.com/paper/a-pursuit-of-temporal-accuracy-in-general
Repo https://github.com/yjxiong/action-detection
Framework pytorch