October 21, 2019

Paper Group AWR 116

A Closer Look at Structured Pruning for Neural Network Compression. Relaxed Quantization for Discretized Neural Networks. Latent Space Autoregression for Novelty Detection. Subword Encoding in Lattice LSTM for Chinese Word Segmentation. A Large-scale Attribute Dataset for Zero-shot Learning. Aesthetic Discrimination of Graph Layouts. CSRNet: Dilate …

A Closer Look at Structured Pruning for Neural Network Compression


Title	A Closer Look at Structured Pruning for Neural Network Compression
Authors	Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O’Boyle
Abstract	Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning, reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two interesting observations: (i) reduced networks (smaller versions of the original network trained from scratch) consistently outperform pruned networks; (ii) if one takes the architecture of a pruned network and then trains it from scratch, it is significantly more competitive. Furthermore, these architectures are easy to approximate: we can prune once and obtain a family of new, scalable network architectures that can simply be trained from scratch. Finally, we compare the inference speed of reduced and pruned networks on hardware, and show that reduced networks are significantly faster. Code is available at https://github.com/BayesWatch/pytorch-prunes.
Tasks	Network Pruning, Neural Network Compression
Published	2018-10-10
URL	https://arxiv.org/abs/1810.04622v3
PDF	https://arxiv.org/pdf/1810.04622v3.pdf
PWC	https://paperswithcode.com/paper/pruning-neural-networks-is-it-time-to-nip-it
Repo	https://github.com/NatGr/Master_Thesis
Framework	tf
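
The channel-removal step described in the abstract can be illustrated with a minimal sketch. It assumes an L1-norm ranking of channels, which is one common criterion and not necessarily the one used in the paper; the function names and toy weights are hypothetical, and the fine-tuning step is omitted.

```python
def channel_l1_norms(weights):
    """Return the L1 norm of each output channel (one row per channel)."""
    return [sum(abs(w) for w in channel) for channel in weights]


def prune_channels(weights, n_remove):
    """Drop the n_remove channels with the smallest L1 norm.

    Returns the indices of the surviving channels (in their original
    order) and the corresponding rows of the weight matrix.
    """
    norms = channel_l1_norms(weights)
    # Channel indices sorted from weakest to strongest.
    order = sorted(range(len(weights)), key=lambda i: norms[i])
    removed = set(order[:n_remove])
    kept = [i for i in range(len(weights)) if i not in removed]
    return kept, [weights[i] for i in kept]


# Toy layer with four output channels (hypothetical values).
layer = [
    [0.9, -1.1, 0.4],
    [0.01, 0.02, -0.01],
    [-0.5, 0.6, 0.7],
    [0.1, -0.1, 0.05],
]
kept, pruned = prune_channels(layer, n_remove=2)
print(kept)
```

In a real pruning-and-tuning loop, this step would alternate with fine-tuning; a "reduced" network in the paper's sense would instead be a narrower architecture trained from scratch.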

Relaxed Quantization for Discretized Neural Networks


Title	Relaxed Quantization for Discretized Neural Networks
Authors	Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling
Abstract	Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices. In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. Differentiability can be achieved by transforming continuous distributions over the weights and activations of the network to categorical distributions over the quantization grid. These are subsequently relaxed to continuous surrogates that can allow for efficient gradient-based optimization. We further show that stochastic rounding can be seen as a special case of the proposed approach and that under this formulation the quantization grid itself can also be optimized with gradient descent. We experimentally validate the performance of our method on MNIST, CIFAR 10 and Imagenet classification.
Tasks	Quantization
Published	2018-10-03
URL	http://arxiv.org/abs/1810.01875v1
PDF	http://arxiv.org/pdf/1810.01875v1.pdf
PWC	https://paperswithcode.com/paper/relaxed-quantization-for-discretized-neural
Repo	https://github.com/newwhitecheng/compress-all-nn
Framework	tf
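
The abstract notes that stochastic rounding is a special case of the proposed approach. The sketch below shows only plain stochastic rounding onto a uniform grid, not the relaxed categorical procedure itself; the function name, the `step` parameter, and the sample count are assumptions made for illustration.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`.

    x is rounded up with probability equal to its fractional distance
    from the lower grid point, so the result equals x in expectation.
    """
    scaled = x / step
    lower = math.floor(scaled)
    p_up = scaled - lower
    return (lower + (1 if rng.random() < p_up else 0)) * step


# Empirical check of unbiasedness with a fixed seed.
rng = random.Random(0)
samples = [stochastic_round(0.3, step=1.0, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)
```

Each sample lands on a grid point (0 or 1 here), while the average over many samples stays close to the original value of 0.3.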

Latent Space Autoregression for Novelty Detection


Title	Latent Space Autoregression for Novelty Detection
Authors	Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara
Abstract	Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is utterly complex due to the unpredictable nature of novelties and their inaccessibility during the training procedure, factors which expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure. We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, extensive experiments of our model on publicly available datasets deliver performance on par with or superior to state-of-the-art methods in one-class and video anomaly detection settings. Unlike prior work, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.
Tasks	Anomaly Detection
Published	2018-07-04
URL	http://arxiv.org/abs/1807.01653v2
PDF	http://arxiv.org/pdf/1807.01653v2.pdf
PWC	https://paperswithcode.com/paper/latent-space-autoregression-for-novelty
Repo	https://github.com/aimagelab/novelty-detection
Framework	pytorch

Subword Encoding in Lattice LSTM for Chinese Word Segmentation


Title	Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Authors	Jie Yang, Yue Zhang, Shuailong Liang
Abstract	We investigate a lattice LSTM network for Chinese word segmentation (CWS) to utilize words or subwords. It integrates the character sequence features with all subsequence information matched from a lexicon. The matched subsequences serve as information shortcut tunnels which link their start and end characters directly. Gated units are used to control the contribution of multiple input links. Through formula derivation and comparison, we show that the lattice LSTM is an extension of the standard LSTM with the ability to take multiple inputs. The previous lattice LSTM model takes word embeddings as the lexicon input; we prove that subword encoding can give comparable performance and has the benefit of not relying on any external segmentor. The contribution of lattice LSTM comes from both lexicon and pretrained embedding information; through controlled experiments, we find that the lexicon information contributes more than the pretrained embedding information. Our experiments show that the lattice structure with subword encoding gives results competitive with or better than previous state-of-the-art methods on four segmentation benchmarks. Detailed analyses are conducted to compare the performance of word encoding and subword encoding in lattice LSTM. We also investigate the performance of the lattice LSTM structure under different circumstances and when this model works or fails.
Tasks	Chinese Word Segmentation, Word Embeddings
Published	2018-10-30
URL	http://arxiv.org/abs/1810.12594v1
PDF	http://arxiv.org/pdf/1810.12594v1.pdf
PWC	https://paperswithcode.com/paper/subword-encoding-in-lattice-lstm-for-chinese
Repo	https://github.com/jiesutd/SubwordEncoding-CWS
Framework	pytorch

A Large-scale Attribute Dataset for Zero-shot Learning


Title	A Large-scale Attribute Dataset for Zero-shot Learning
Authors	Bo Zhao, Yanwei Fu, Rui Liang, Jiahong Wu, Yonggang Wang, Yizhou Wang
Abstract	Zero-Shot Learning (ZSL) has attracted huge research attention over the past few years; it aims to learn new concepts that have never been seen before. In classical ZSL algorithms, attributes are introduced as the intermediate semantic representation to realize the knowledge transfer from seen classes to unseen classes. Previous ZSL algorithms are tested on several benchmark datasets annotated with attributes. However, these datasets are defective in terms of the image distribution and attribute diversity. In addition, we argue that the “co-occurrence bias problem” of existing datasets, which is caused by the biased co-occurrence of objects, significantly hinders models from correctly learning the concept. To overcome these problems, we propose a Large-scale Attribute Dataset (LAD). Our dataset has 78,017 images across 5 super-classes and 230 classes. The number of images in LAD is larger than the sum of the four most popular attribute datasets. 359 attributes of visual, semantic and subjective properties are defined and annotated at the instance level. We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. The experimental results reveal the challenge of implementing zero-shot learning on our dataset.
Tasks	Transfer Learning, Zero-Shot Learning
Published	2018-04-12
URL	http://arxiv.org/abs/1804.04314v2
PDF	http://arxiv.org/pdf/1804.04314v2.pdf
PWC	https://paperswithcode.com/paper/a-large-scale-attribute-dataset-for-zero-shot
Repo	https://github.com/PatrickZH/Zero-shot-Learning
Framework	none

Aesthetic Discrimination of Graph Layouts


Title	Aesthetic Discrimination of Graph Layouts
Authors	Moritz Klammler, Tamara Mchedlidze, Alexey Pak
Abstract	This paper addresses the following basic question: given two layouts of the same graph, which one is more aesthetically pleasing? We propose a neural network-based discriminator model trained on a labeled dataset that decides which of two layouts has a higher aesthetic quality. The feature vectors used as inputs to the model are based on known graph drawing quality metrics, classical statistics, information-theoretical quantities, and two-point statistics inspired by methods of condensed matter physics. The large corpus of layout pairs used for training and testing is constructed using force-directed drawing algorithms and the layouts that naturally stem from the process of graph generation. It is further extended using data augmentation techniques. The mean prediction accuracy of our model is 95.70%, outperforming discriminators based on stress and on the linear combination of popular quality metrics by a statistically significant margin.
Tasks	Data Augmentation, Graph Generation
Published	2018-09-04
URL	http://arxiv.org/abs/1809.01017v1
PDF	http://arxiv.org/pdf/1809.01017v1.pdf
PWC	https://paperswithcode.com/paper/aesthetic-discrimination-of-graph-layouts
Repo	https://github.com/5gon12eder/msc-graphstudy
Framework	tf

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes


Title	CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
Authors	Yuhong Li, Xiaofan Zhang, Deming Chen
Abstract	We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven and deep learning method that can understand highly congested scenes and perform accurate count estimation as well as present high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is easy to train because of its purely convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO’10 dataset, and the UCSD dataset) and we deliver state-of-the-art performance. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality with 15.4% lower MAE than the previous state-of-the-art approach.
Tasks	Scene Recognition
Published	2018-02-27
URL	http://arxiv.org/abs/1802.10062v4
PDF	http://arxiv.org/pdf/1802.10062v4.pdf
PWC	https://paperswithcode.com/paper/csrnet-dilated-convolutional-neural-networks
Repo	https://github.com/DiaoXY/CSRnet
Framework	tf
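
To see why dilated kernels enlarge the receptive field without pooling, the following sketch computes the receptive field of a stack of stride-1 convolutions. The layer configurations are illustrative assumptions and are not taken from the CSRNet architecture; the helper names are hypothetical.

```python
def effective_kernel(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer
    widens the field by (effective kernel size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


plain = receptive_field([(3, 1)] * 3)    # three ordinary 3x3 layers
dilated = receptive_field([(3, 2)] * 3)  # same layers with dilation 2
print(plain, dilated)
```

With the same number of 3x3 weights per layer, dilation 2 grows the receptive field from 7 to 13 pixels while the feature-map resolution is left unchanged.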

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria


Title	Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria
Authors	Keze Wang, Liang Lin, Xiaopeng Yan, Ziliang Chen, Dongyu Zhang, Lei Zhang
Abstract	Though quite challenging, leveraging large-scale unlabeled or partially labeled data in learning systems (e.g., model/classifier training) has attracted increasing attention due to its fundamental importance. To address this problem, many active learning (AL) methods have been proposed that employ up-to-date detectors to retrieve representative minority samples according to predefined confidence or uncertainty thresholds. However, these AL methods cause the detectors to ignore the remaining majority samples (i.e., those with low uncertainty or high prediction confidence). In this work, by developing a principled active sample mining (ASM) framework, we demonstrate that cost-effectively mining samples from these unlabeled majority data is key to training more powerful object detectors while minimizing user effort. Specifically, our ASM framework involves a switchable sample selection mechanism for determining whether an unlabeled sample should be manually annotated via AL or automatically pseudo-labeled via a novel self-learning process. The proposed process can be compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input) for object detection. In addition, a few samples with low-confidence predictions are selected and annotated via AL. Notably, our method is suitable for object categories that are not seen in the unlabeled data during the learning process. Extensive experiments clearly demonstrate that our ASM framework can achieve performance comparable to that of alternative methods but with significantly fewer annotations.
Tasks	Active Learning, Object Detection
Published	2018-06-30
URL	http://arxiv.org/abs/1807.00147v3
PDF	http://arxiv.org/pdf/1807.00147v3.pdf
PWC	https://paperswithcode.com/paper/cost-effective-object-detection-active-sample
Repo	https://github.com/yanxp/ASM
Framework	none

Robust Adversarial Learning via Sparsifying Front Ends


Title	Robust Adversarial Learning via Sparsifying Front Ends
Authors	Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani
Abstract	It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks. In this paper, we take a bottom-up signal processing perspective to this problem and show that a systematic exploitation of sparsity in natural data is a promising tool for defense. For linear classifiers, we show that a sparsifying front end is provably effective against $\ell_{\infty}$-bounded attacks, reducing output distortion due to the attack by a factor of roughly $K/N$ where $N$ is the data dimension and $K$ is the sparsity level. We then extend this concept to deep networks, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. We also devise attacks based on the locally linear model that outperform the well-known FGSM attack. We supplement our theoretical results with experiments on the MNIST handwritten digit database, showing the efficacy of the proposed sparsity-based defense schemes.
Tasks
Published	2018-10-24
URL	http://arxiv.org/abs/1810.10625v2
PDF	http://arxiv.org/pdf/1810.10625v2.pdf
PWC	https://paperswithcode.com/paper/toward-robust-neural-networks-via
Repo	https://github.com/soorya19/sparsity-based-defenses
Framework	tf
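
A toy illustration of the sparsifying idea: keep the K largest-magnitude coefficients of the input and zero the rest, so that a bounded perturbation survives on only K of the N coordinates. This is a sketch under strong assumptions: the data is taken to be sparse in the identity basis, the function name and values are hypothetical, and the example does not reproduce the paper's analysis.

```python
def sparsify(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    top = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    keep = set(top)
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


# A 2-sparse signal in N = 8 dimensions (hypothetical values).
x = [5.0, 0.0, 0.0, -4.0, 0.0, 0.0, 0.0, 0.0]
eps = 0.1
# An l_inf-bounded perturbation: every coordinate moves by +/- eps.
perturbation = [eps if i % 2 == 0 else -eps for i in range(len(x))]
x_adv = [a + b for a, b in zip(x, perturbation)]

cleaned = sparsify(x_adv, k=2)
residual = [c - a for c, a in zip(cleaned, x)]

l1_before = sum(abs(p) for p in perturbation)  # N * eps
l1_after = sum(abs(r) for r in residual)       # about K * eps
print(l1_before, l1_after)
```

In this toy case the perturbation that reaches the classifier shrinks from N coordinates to K, which mirrors the K/N factor mentioned in the abstract only informally.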

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search


Title	M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Authors	Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao
Abstract	Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards. From these trajectories, the network is improved in an off-policy manner using Q-learning, which modifies the RNN policy via parameter sharing. Our proposed RL algorithm repeatedly applies this policy-improvement step to learn the model. At test time, MCTS is combined with the neural policy to predict the target node. Experimental results on several graph-walking benchmarks show that M-Walk is able to learn better policies than other RL-based methods, which are mainly based on policy gradients. M-Walk also outperforms traditional KBC baselines.
Tasks	Knowledge Base Completion, Link Prediction, Q-Learning
Published	2018-02-12
URL	http://arxiv.org/abs/1802.04394v5
PDF	http://arxiv.org/pdf/1802.04394v5.pdf
PWC	https://paperswithcode.com/paper/m-walk-learning-to-walk-over-graphs-using
Repo	https://github.com/ciferlv/Papers
Framework	none

Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference


Title	Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference
Authors	Mike Wu, Milan Mosse, Noah Goodman, Chris Piech
Abstract	In modern computer science education, massive open online courses (MOOCs) log thousands of hours of data about how students solve coding challenges. Being so rich in data, these platforms have garnered the interest of the machine learning community, with many new algorithms attempting to autonomously provide feedback to help future students learn. But what about those first hundred thousand students? In most educational contexts (i.e. classrooms), assignments do not have enough historical data for supervised learning. In this paper, we introduce a human-in-the-loop “rubric sampling” approach to tackle the “zero shot” feedback challenge. We are able to provide autonomous feedback for the first students working on an introductory programming assignment with accuracy that substantially outperforms data-hungry algorithms and approaches human level fidelity. Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student’s solution and can articulate a student’s misconceptions in the language of the instructor. Deep learning inference enables rubric sampling to further improve as more assignment specific student data is acquired. We demonstrate our results on a novel dataset from Code.org, the world’s largest programming education platform.
Tasks	Zero-Shot Learning
Published	2018-09-05
URL	http://arxiv.org/abs/1809.01357v2
PDF	http://arxiv.org/pdf/1809.01357v2.pdf
PWC	https://paperswithcode.com/paper/zero-shot-learning-for-code-education-rubric
Repo	https://github.com/mhw32/rubric-sampling-public
Framework	pytorch

TSM: Temporal Shift Module for Efficient Video Understanding


Title	TSM: Temporal Shift Module for Efficient Video Understanding
Authors	Ji Lin, Chuang Gan, Song Han
Abstract	The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNN but maintain 2D CNN’s complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extended TSM to the online setting, which enables real-time low-latency online video recognition and video object detection. TSM is accurate and efficient: it ranked first on the Something-Something leaderboard upon publication; on Jetson Nano and Galaxy Note8, it achieves a low latency of 13ms and 35ms for online video recognition. The code is available at: https://github.com/mit-han-lab/temporal-shift-module.
Tasks	Action Recognition In Videos, Object Detection, Video Object Detection, Video Recognition, Video Understanding
Published	2018-11-20
URL	https://arxiv.org/abs/1811.08383v3
PDF	https://arxiv.org/pdf/1811.08383v3.pdf
PWC	https://paperswithcode.com/paper/temporal-shift-module-for-efficient-video
Repo	https://github.com/niveditarahurkar/CS231N-ActionRecognition
Framework	pytorch
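
The shift operation described in the abstract can be sketched on a tiny clip stored as nested lists. This is an illustrative assumption, not the authors' implementation: the function name, the `fold` parameter (how many channels move in each direction), and the zero padding at the clip boundaries are choices made for the example.

```python
def temporal_shift(frames, fold):
    """Shift part of the channels along the time axis.

    `frames` is a list over time of per-frame channel lists.
    The first `fold` channels take their value from the next frame,
    the following `fold` channels from the previous frame, and the
    remaining channels are left in place. Positions with no source
    frame are filled with zeros.
    """
    n_frames = len(frames)
    n_channels = len(frames[0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for t in range(n_frames):
        for c in range(n_channels):
            if c < fold:
                src = t + 1      # pull from the future frame
            elif c < 2 * fold:
                src = t - 1      # pull from the past frame
            else:
                src = t          # unshifted channel
            if 0 <= src < n_frames:
                out[t][c] = frames[src][c]
    return out


# Three frames with four channels each (hypothetical values).
clip = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
shifted = temporal_shift(clip, fold=1)
print(shifted)
```

The operation only moves values between frames, with no multiplications and no learned weights, which is the sense in which the abstract describes it as adding zero computation and zero parameters. A following 2D convolution then mixes the shifted channels, so each frame sees information from its neighbors.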

Hypernetwork Knowledge Graph Embeddings


Title	Hypernetwork Knowledge Graph Embeddings
Authors	Ivana Balažević, Carl Allen, Timothy M. Hospedales
Abstract	Knowledge graphs are graphical representations of large databases of facts, which typically suffer from incompleteness. Inferring missing relations (links) between entities (nodes) is the task of link prediction. A recent state-of-the-art approach to link prediction, ConvE, implements a convolutional neural network to extract features from concatenated subject and relation vectors. Whilst results are impressive, the method is unintuitive and poorly understood. We propose a hypernetwork architecture that generates simplified relation-specific convolutional filters that (i) outperforms ConvE and all previous approaches across standard datasets; and (ii) can be framed as tensor factorization and thus set within a well established family of factorization models for link prediction. We thus demonstrate that convolution simply offers a convenient computational means of introducing sparsity and parameter tying to find an effective trade-off between non-linear expressiveness and the number of parameters to learn.
Tasks	Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction
Published	2018-08-21
URL	https://arxiv.org/abs/1808.07018v5
PDF	https://arxiv.org/pdf/1808.07018v5.pdf
PWC	https://paperswithcode.com/paper/hypernetwork-knowledge-graph-embeddings
Repo	https://github.com/ibalazevic/HypER
Framework	pytorch

Backpropagating through Structured Argmax using a SPIGOT


Title	Backpropagating through Structured Argmax using a SPIGOT
Authors	Hao Peng, Sam Thomson, Noah A. Smith
Abstract	We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers. SPIGOT requires no marginal inference, unlike structured attention networks (Kim et al., 2017) and some reinforcement learning-inspired solutions (Yogatama et al., 2017). Like so-called straight-through estimators (Hinton, 2012), SPIGOT defines gradient-like quantities associated with intermediate nondifferentiable operations, allowing backpropagation before and after them; SPIGOT’s proxy aims to ensure that, after a parameter update, the intermediate structure will remain well-formed. We experiment on two structured NLP pipelines: syntactic-then-semantic dependency parsing, and semantic parsing followed by sentiment classification. We show that training with SPIGOT leads to a larger improvement on the downstream task than a modularly-trained pipeline, the straight-through estimator, and structured attention, reaching a new state of the art on semantic dependency parsing.
Tasks	Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing, Sentiment Analysis
Published	2018-05-12
URL	http://arxiv.org/abs/1805.04658v1
PDF	http://arxiv.org/pdf/1805.04658v1.pdf
PWC	https://paperswithcode.com/paper/backpropagating-through-structured-argmax
Repo	https://github.com/Noahs-ARK/SPIGOT
Framework	none

GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices


Title	GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices
Authors	Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri
Abstract	This paper addresses the problem of recovering projective camera matrices from collections of fundamental matrices in multiview settings. We make two main contributions. First, given ${n \choose 2}$ fundamental matrices computed for $n$ images, we provide a complete algebraic characterization in the form of conditions that are both necessary and sufficient to enable the recovery of camera matrices. These conditions are based on arranging the fundamental matrices as blocks in a single matrix, called the $n$-view fundamental matrix, and characterizing this matrix in terms of the signs of its eigenvalues and rank structures. Second, we propose a concrete algorithm for projective structure-from-motion that utilizes this characterization. Given a complete or partial collection of measured fundamental matrices, our method seeks camera matrices that minimize a global algebraic error for the measured fundamental matrices. In contrast to existing methods, our optimization, without any initialization, produces a consistent set of fundamental matrices that corresponds to a unique set of cameras (up to a choice of projective frame). Our experiments indicate that our method achieves state-of-the-art performance in both accuracy and running time.
Tasks
Published	2018-12-02
URL	http://arxiv.org/abs/1812.00426v3
PDF	http://arxiv.org/pdf/1812.00426v3.pdf
PWC	https://paperswithcode.com/paper/gpsfm-global-projective-sfm-using-algebraic
Repo	https://github.com/amnonge/GPSFM-Code
Framework	none