October 21, 2019

Paper Group AWR 116

A Closer Look at Structured Pruning for Neural Network Compression. Relaxed Quantization for Discretized Neural Networks. Latent Space Autoregression for Novelty Detection. Subword Encoding in Lattice LSTM for Chinese Word Segmentation. A Large-scale Attribute Dataset for Zero-shot Learning. Aesthetic Discrimination of Graph Layouts. CSRNet: Dilate …

A Closer Look at Structured Pruning for Neural Network Compression

Title A Closer Look at Structured Pruning for Neural Network Compression
Authors Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O’Boyle
Abstract Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two interesting observations: (i) reduced networks—smaller versions of the original network trained from scratch—consistently outperform pruned networks; (ii) if one takes the architecture of a pruned network and then trains it from scratch it is significantly more competitive. Furthermore, these architectures are easy to approximate: we can prune once and obtain a family of new, scalable network architectures that can simply be trained from scratch. Finally, we compare the inference speed of reduced and pruned networks on hardware, and show that reduced networks are significantly faster. Code is available at https://github.com/BayesWatch/pytorch-prunes.
Tasks Network Pruning, Neural Network Compression
Published 2018-10-10
URL https://arxiv.org/abs/1810.04622v3
PDF https://arxiv.org/pdf/1810.04622v3.pdf
PWC https://paperswithcode.com/paper/pruning-neural-networks-is-it-time-to-nip-it
Repo https://github.com/NatGr/Master_Thesis
Framework tf
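
The channel-removal step described in the abstract can be sketched in a few lines. This is only an illustration, not the authors' code: the abstract does not say how channels are ranked, so the L1-norm criterion and the names `l1_norm` and `prune_channels` are assumptions, and the fine-tuning step that alternates with pruning is omitted.

```python
def l1_norm(filter_weights):
    """Sum of absolute values of one output channel's weights."""
    return sum(abs(w) for w in filter_weights)


def prune_channels(layer, keep):
    """Keep the `keep` output channels with the largest L1 norm.

    `layer` is a list of channels, each a flat list of weights.
    The L1 ranking is an assumed criterion, chosen for illustration.
    """
    ranked = sorted(range(len(layer)), key=lambda i: l1_norm(layer[i]), reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original channel order
    return [layer[i] for i in kept]


# A toy layer with four output channels of two weights each.
layer = [[0.5, -0.5], [0.01, 0.02], [1.0, 2.0], [-0.1, 0.1]]
pruned = prune_channels(layer, keep=2)
```

Here `pruned` holds the first and third channels, which have the largest norms. A "reduced network" in the abstract's sense would instead shrink every layer's width up front and train from scratch.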

Relaxed Quantization for Discretized Neural Networks

Title Relaxed Quantization for Discretized Neural Networks
Authors Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling
Abstract Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices. In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. Differentiability can be achieved by transforming continuous distributions over the weights and activations of the network to categorical distributions over the quantization grid. These are subsequently relaxed to continuous surrogates that can allow for efficient gradient-based optimization. We further show that stochastic rounding can be seen as a special case of the proposed approach and that under this formulation the quantization grid itself can also be optimized with gradient descent. We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification.
Tasks Quantization
Published 2018-10-03
URL http://arxiv.org/abs/1810.01875v1
PDF http://arxiv.org/pdf/1810.01875v1.pdf
PWC https://paperswithcode.com/paper/relaxed-quantization-for-discretized-neural
Repo https://github.com/newwhitecheng/compress-all-nn
Framework tf
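
The abstract notes that stochastic rounding is a special case of the proposed procedure. As a point of reference, plain stochastic rounding onto a uniform grid can be sketched as below. This shows only that special case, not the relaxed quantization method itself; the function name `stochastic_round` and the `step` and `rng` parameters are hypothetical.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`.

    The value rounds up with probability equal to its fractional
    distance above the lower grid point, so the result equals x
    in expectation.
    """
    scaled = x / step
    lower = math.floor(scaled)
    p_up = scaled - lower
    return (lower + (1 if rng.random() < p_up else 0)) * step


# Averaging many draws recovers the input value approximately.
rng = random.Random(0)
samples = [stochastic_round(0.3, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Each sample is either 0.0 or 1.0, and the mean lands close to 0.3, which is why this kind of rounding is unbiased.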

Latent Space Autoregression for Novelty Detection

Title Latent Space Autoregression for Novelty Detection
Authors Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara
Abstract Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is utterly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure. We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, extensive experiments of our model on publicly available datasets deliver on-par or superior performance compared to state-of-the-art methods in one-class and video anomaly detection settings. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.
Tasks Anomaly Detection
Published 2018-07-04
URL http://arxiv.org/abs/1807.01653v2
PDF http://arxiv.org/pdf/1807.01653v2.pdf
PWC https://paperswithcode.com/paper/latent-space-autoregression-for-novelty
Repo https://github.com/aimagelab/novelty-detection
Framework pytorch

Subword Encoding in Lattice LSTM for Chinese Word Segmentation

Title Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Authors Jie Yang, Yue Zhang, Shuailong Liang
Abstract We investigate a lattice LSTM network for Chinese word segmentation (CWS) to utilize words or subwords. It integrates the character sequence features with all subsequence information matched from a lexicon. The matched subsequences serve as information shortcut tunnels which link their start and end characters directly. Gated units are used to control the contribution of multiple input links. Through formula derivation and comparison, we show that the lattice LSTM is an extension of the standard LSTM with the ability to take multiple inputs. Whereas the previous lattice LSTM model takes word embeddings as the lexicon input, we prove that subword encoding gives comparable performance and has the benefit of not relying on any external segmentor. The contribution of the lattice LSTM comes from both lexicon and pretrained embedding information; through controlled experiments, we find that the lexicon information contributes more than the pretrained embedding information. Our experiments show that the lattice structure with subword encoding gives results competitive with or better than previous state-of-the-art methods on four segmentation benchmarks. Detailed analyses are conducted to compare the performance of word encoding and subword encoding in lattice LSTM. We also investigate the performance of the lattice LSTM structure under different circumstances and when this model works or fails.
Tasks Chinese Word Segmentation, Word Embeddings
Published 2018-10-30
URL http://arxiv.org/abs/1810.12594v1
PDF http://arxiv.org/pdf/1810.12594v1.pdf
PWC https://paperswithcode.com/paper/subword-encoding-in-lattice-lstm-for-chinese
Repo https://github.com/jiesutd/SubwordEncoding-CWS
Framework pytorch

A Large-scale Attribute Dataset for Zero-shot Learning

Title A Large-scale Attribute Dataset for Zero-shot Learning
Authors Bo Zhao, Yanwei Fu, Rui Liang, Jiahong Wu, Yonggang Wang, Yizhou Wang
Abstract Zero-Shot Learning (ZSL) has attracted huge research attention over the past few years; it aims to learn new concepts that have never been seen before. In classical ZSL algorithms, attributes are introduced as the intermediate semantic representation to realize the knowledge transfer from seen classes to unseen classes. Previous ZSL algorithms are tested on several benchmark datasets annotated with attributes. However, these datasets are defective in terms of the image distribution and attribute diversity. In addition, we argue that the “co-occurrence bias problem” of existing datasets, which is caused by the biased co-occurrence of objects, significantly hinders models from correctly learning the concept. To overcome these problems, we propose a Large-scale Attribute Dataset (LAD). Our dataset has 78,017 images of 5 super-classes and 230 classes. The number of images in LAD is larger than the sum of the four most popular attribute datasets. 359 attributes of visual, semantic and subjective properties are defined and annotated at the instance level. We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. The experimental results reveal the challenge of implementing zero-shot learning on our dataset.
Tasks Transfer Learning, Zero-Shot Learning
Published 2018-04-12
URL http://arxiv.org/abs/1804.04314v2
PDF http://arxiv.org/pdf/1804.04314v2.pdf
PWC https://paperswithcode.com/paper/a-large-scale-attribute-dataset-for-zero-shot
Repo https://github.com/PatrickZH/Zero-shot-Learning
Framework none

Aesthetic Discrimination of Graph Layouts

Title Aesthetic Discrimination of Graph Layouts
Authors Moritz Klammler, Tamara Mchedlidze, Alexey Pak
Abstract This paper addresses the following basic question: given two layouts of the same graph, which one is more aesthetically pleasing? We propose a neural network-based discriminator model trained on a labeled dataset that decides which of two layouts has a higher aesthetic quality. The feature vectors used as inputs to the model are based on known graph drawing quality metrics, classical statistics, information-theoretical quantities, and two-point statistics inspired by methods of condensed matter physics. The large corpus of layout pairs used for training and testing is constructed using force-directed drawing algorithms and the layouts that naturally stem from the process of graph generation. It is further extended using data augmentation techniques. The mean prediction accuracy of our model is 95.70%, outperforming discriminators based on stress and on the linear combination of popular quality metrics by a statistically significant margin.
Tasks Data Augmentation, Graph Generation
Published 2018-09-04
URL http://arxiv.org/abs/1809.01017v1
PDF http://arxiv.org/pdf/1809.01017v1.pdf
PWC https://paperswithcode.com/paper/aesthetic-discrimination-of-graph-layouts
Repo https://github.com/5gon12eder/msc-graphstudy
Framework tf

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Title CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
Authors Yuhong Li, Xiaofan Zhang, Deming Chen
Abstract We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven and deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is an easy-to-train model because of its pure convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO’10 dataset, and the UCSD dataset) and we deliver state-of-the-art performance. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality with 15.4% lower MAE than the previous state-of-the-art approach.
Tasks Scene Recognition
Published 2018-02-27
URL http://arxiv.org/abs/1802.10062v4
PDF http://arxiv.org/pdf/1802.10062v4.pdf
PWC https://paperswithcode.com/paper/csrnet-dilated-convolutional-neural-networks
Repo https://github.com/DiaoXY/CSRnet
Framework tf
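
To see why dilated kernels enlarge the receptive field without pooling, it helps to work the arithmetic. The sketch below uses the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The helper names and the example layer stacks are illustrative assumptions and are not CSRNet's actual configuration.

```python
def effective_kernel(k, dilation):
    """Span, in input positions, of a size-k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


# Three 3x3 layers, first without dilation, then with dilation 2.
plain = receptive_field([(3, 1)] * 3)
dilated = receptive_field([(3, 2)] * 3)
```

With the same number of weights per layer, the dilated stack sees a 13-wide window instead of a 7-wide one, and no resolution is lost to pooling.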

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

Title Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria
Authors Keze Wang, Liang Lin, Xiaopeng Yan, Ziliang Chen, Dongyu Zhang, Lei Zhang
Abstract Though quite challenging, leveraging large-scale unlabeled or partially labeled data in learning systems (e.g., model/classifier training) has attracted increasing attention due to its fundamental importance. To address this problem, many active learning (AL) methods have been proposed that employ up-to-date detectors to retrieve representative minority samples according to predefined confidence or uncertainty thresholds. However, these AL methods cause the detectors to ignore the remaining majority samples (i.e., those with low uncertainty or high prediction confidence). In this work, by developing a principled active sample mining (ASM) framework, we demonstrate that cost-effectively mining samples from these unlabeled majority data is key to training more powerful object detectors while minimizing user effort. Specifically, our ASM framework involves a switchable sample selection mechanism for determining whether an unlabeled sample should be manually annotated via AL or automatically pseudo-labeled via a novel self-learning process. The proposed process can be compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input) for object detection. In addition, a few samples with low-confidence predictions are selected and annotated via AL. Notably, our method is suitable for object categories that are not seen in the unlabeled data during the learning process. Extensive experiments clearly demonstrate that our ASM framework can achieve performance comparable to that of alternative methods but with significantly fewer annotations.
Tasks Active Learning, Object Detection
Published 2018-06-30
URL http://arxiv.org/abs/1807.00147v3
PDF http://arxiv.org/pdf/1807.00147v3.pdf
PWC https://paperswithcode.com/paper/cost-effective-object-detection-active-sample
Repo https://github.com/yanxp/ASM
Framework none

Robust Adversarial Learning via Sparsifying Front Ends

Title Robust Adversarial Learning via Sparsifying Front Ends
Authors Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani
Abstract It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks. In this paper, we take a bottom-up signal processing perspective to this problem and show that a systematic exploitation of sparsity in natural data is a promising tool for defense. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K/N$ where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to deep networks, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. We also devise attacks based on the locally linear model that outperform the well-known FGSM attack. We supplement our theoretical results with experiments on the MNIST handwritten digit database, showing the efficacy of the proposed sparsity-based defense schemes.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1810.10625v2
PDF http://arxiv.org/pdf/1810.10625v2.pdf
PWC https://paperswithcode.com/paper/toward-robust-neural-networks-via
Repo https://github.com/soorya19/sparsity-based-defenses
Framework tf
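
A sparsifying front end of the kind the abstract describes can be pictured as a projection onto the K largest coefficients of an N-dimensional input. The sketch below is a simplified illustration under stated assumptions: it sparsifies directly in the input coordinates rather than in a learned or wavelet basis, and the name `sparsify` is hypothetical rather than taken from the authors' repository.

```python
def sparsify(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    if k <= 0:
        return [0.0] * len(x)
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


# N = 4 inputs, K = 2 retained coefficients.
signal = [0.1, -3.0, 0.2, 2.0]
front_end_output = sparsify(signal, 2)
```

Because only K of the N coordinates pass through, a small perturbation spread across all coordinates mostly lands on entries that are zeroed, which is the intuition behind the roughly K/N reduction in distortion quoted above.
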

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

Title M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Authors Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao
Abstract Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards. From these trajectories, the network is improved in an off-policy manner using Q-learning, which modifies the RNN policy via parameter sharing. Our proposed RL algorithm repeatedly applies this policy-improvement step to learn the model. At test time, MCTS is combined with the neural policy to predict the target node. Experimental results on several graph-walking benchmarks show that M-Walk is able to learn better policies than other RL-based methods, which are mainly based on policy gradients. M-Walk also outperforms traditional KBC baselines.
Tasks Knowledge Base Completion, Link Prediction, Q-Learning
Published 2018-02-12
URL http://arxiv.org/abs/1802.04394v5
PDF http://arxiv.org/pdf/1802.04394v5.pdf
PWC https://paperswithcode.com/paper/m-walk-learning-to-walk-over-graphs-using
Repo https://github.com/ciferlv/Papers
Framework none

Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference

Title Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference
Authors Mike Wu, Milan Mosse, Noah Goodman, Chris Piech
Abstract In modern computer science education, massive open online courses (MOOCs) log thousands of hours of data about how students solve coding challenges. Being so rich in data, these platforms have garnered the interest of the machine learning community, with many new algorithms attempting to autonomously provide feedback to help future students learn. But what about those first hundred thousand students? In most educational contexts (i.e. classrooms), assignments do not have enough historical data for supervised learning. In this paper, we introduce a human-in-the-loop “rubric sampling” approach to tackle the “zero shot” feedback challenge. We are able to provide autonomous feedback for the first students working on an introductory programming assignment with accuracy that substantially outperforms data-hungry algorithms and approaches human level fidelity. Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student’s solution and can articulate a student’s misconceptions in the language of the instructor. Deep learning inference enables rubric sampling to further improve as more assignment specific student data is acquired. We demonstrate our results on a novel dataset from Code.org, the world’s largest programming education platform.
Tasks Zero-Shot Learning
Published 2018-09-05
URL http://arxiv.org/abs/1809.01357v2
PDF http://arxiv.org/pdf/1809.01357v2.pdf
PWC https://paperswithcode.com/paper/zero-shot-learning-for-code-education-rubric
Repo https://github.com/mhw32/rubric-sampling-public
Framework pytorch

TSM: Temporal Shift Module for Efficient Video Understanding

Title TSM: Temporal Shift Module for Efficient Video Understanding
Authors Ji Lin, Chuang Gan, Song Han
Abstract The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of a 3D CNN but maintain a 2D CNN’s complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to the online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.
Tasks Action Recognition In Videos, Object Detection, Video Object Detection, Video Recognition, Video Understanding
Published 2018-11-20
URL https://arxiv.org/abs/1811.08383v3
PDF https://arxiv.org/pdf/1811.08383v3.pdf
PWC https://paperswithcode.com/paper/temporal-shift-module-for-efficient-video
Repo https://github.com/niveditarahurkar/CS231N-ActionRecognition
Framework pytorch
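
The shift operation in the abstract can be illustrated on a toy tensor. The sketch below drops the spatial dimensions and treats each frame as a flat list of channel values. The two-direction split and the `fold` parameter are assumptions made for illustration (the abstract only says that part of the channels is shifted), and `temporal_shift` is a hypothetical name. It requires `2 * fold` to be at most the number of channels.

```python
def temporal_shift(frames, fold):
    """Shift some channels along the time axis.

    `frames` is a list over time of per-frame channel lists.
    Channels [0, fold) take their value from the next frame,
    channels [fold, 2 * fold) from the previous frame, and the
    remaining channels stay in place. Frames outside the clip
    contribute zeros.
    """
    t_len = len(frames)
    out = [list(f) for f in frames]
    for t in range(t_len):
        for c in range(fold):
            out[t][c] = frames[t + 1][c] if t + 1 < t_len else 0
        for c in range(fold, 2 * fold):
            out[t][c] = frames[t - 1][c] if t - 1 >= 0 else 0
    return out


# Three frames with three channels each; shift one channel each way.
frames = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
shifted = temporal_shift(frames, fold=1)
```

After the shift, each frame carries one channel from its successor and one from its predecessor, so a subsequent 2D convolution mixes information across time. The shift itself only moves data, which is why it adds no parameters or multiply-accumulate operations.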

Hypernetwork Knowledge Graph Embeddings

Title Hypernetwork Knowledge Graph Embeddings
Authors Ivana Balažević, Carl Allen, Timothy M. Hospedales
Abstract Knowledge graphs are graphical representations of large databases of facts, which typically suffer from incompleteness. Inferring missing relations (links) between entities (nodes) is the task of link prediction. A recent state-of-the-art approach to link prediction, ConvE, implements a convolutional neural network to extract features from concatenated subject and relation vectors. Whilst results are impressive, the method is unintuitive and poorly understood. We propose a hypernetwork architecture that generates simplified relation-specific convolutional filters that (i) outperforms ConvE and all previous approaches across standard datasets; and (ii) can be framed as tensor factorization and thus set within a well established family of factorization models for link prediction. We thus demonstrate that convolution simply offers a convenient computational means of introducing sparsity and parameter tying to find an effective trade-off between non-linear expressiveness and the number of parameters to learn.
Tasks Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction
Published 2018-08-21
URL https://arxiv.org/abs/1808.07018v5
PDF https://arxiv.org/pdf/1808.07018v5.pdf
PWC https://paperswithcode.com/paper/hypernetwork-knowledge-graph-embeddings
Repo https://github.com/ibalazevic/HypER
Framework pytorch

Backpropagating through Structured Argmax using a SPIGOT

Title Backpropagating through Structured Argmax using a SPIGOT
Authors Hao Peng, Sam Thomson, Noah A. Smith
Abstract We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers. SPIGOT requires no marginal inference, unlike structured attention networks (Kim et al., 2017) and some reinforcement learning-inspired solutions (Yogatama et al., 2017). Like so-called straight-through estimators (Hinton, 2012), SPIGOT defines gradient-like quantities associated with intermediate nondifferentiable operations, allowing backpropagation before and after them; SPIGOT’s proxy aims to ensure that, after a parameter update, the intermediate structure will remain well-formed. We experiment on two structured NLP pipelines: syntactic-then-semantic dependency parsing, and semantic parsing followed by sentiment classification. We show that training with SPIGOT leads to a larger improvement on the downstream task than a modularly-trained pipeline, the straight-through estimator, and structured attention, reaching a new state of the art on semantic dependency parsing.
Tasks Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing, Sentiment Analysis
Published 2018-05-12
URL http://arxiv.org/abs/1805.04658v1
PDF http://arxiv.org/pdf/1805.04658v1.pdf
PWC https://paperswithcode.com/paper/backpropagating-through-structured-argmax
Repo https://github.com/Noahs-ARK/SPIGOT
Framework none

GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices

Title GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices
Authors Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri
Abstract This paper addresses the problem of recovering projective camera matrices from collections of fundamental matrices in multiview settings. We make two main contributions. First, given ${n \choose 2}$ fundamental matrices computed for $n$ images, we provide a complete algebraic characterization in the form of conditions that are both necessary and sufficient to enable the recovery of camera matrices. These conditions are based on arranging the fundamental matrices as blocks in a single matrix, called the $n$-view fundamental matrix, and characterizing this matrix in terms of the signs of its eigenvalues and rank structures. Second, we propose a concrete algorithm for projective structure-from-motion that utilizes this characterization. Given a complete or partial collection of measured fundamental matrices, our method seeks camera matrices that minimize a global algebraic error for the measured fundamental matrices. In contrast to existing methods, our optimization, without any initialization, produces a consistent set of fundamental matrices that corresponds to a unique set of cameras (up to a choice of projective frame). Our experiments indicate that our method achieves state-of-the-art performance in both accuracy and running time.
Tasks
Published 2018-12-02
URL http://arxiv.org/abs/1812.00426v3
PDF http://arxiv.org/pdf/1812.00426v3.pdf
PWC https://paperswithcode.com/paper/gpsfm-global-projective-sfm-using-algebraic
Repo https://github.com/amnonge/GPSFM-Code
Framework none