January 30, 2020

Paper Group ANR 263

Extreme Language Model Compression with Optimal Subwords and Shared Projections. Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship. SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads. Prediction with Unpredictable Feature Evolution. Permutation-invariant Feature Restructurin …

Extreme Language Model Compression with Optimal Subwords and Shared Projections


Title	Extreme Language Model Compression with Optimal Subwords and Shared Projections
Authors	Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
Abstract	Pre-trained deep neural network language models such as ELMo, GPT, BERT and XLNet have recently achieved state-of-the-art performance on a variety of language understanding tasks. However, their size makes them impractical for a number of scenarios, especially on mobile and edge devices. In particular, the input word embedding matrix accounts for a significant proportion of the model’s memory footprint, due to the large input vocabulary and embedding dimensions. Knowledge distillation techniques have had success at compressing large neural network models, but they are ineffective at yielding student models with vocabularies different from the original teacher models. We introduce a novel knowledge distillation technique for training a student model with a significantly smaller vocabulary as well as lower embedding and hidden state dimensions. Specifically, we employ a dual-training mechanism that trains the teacher and student models simultaneously to obtain optimal word embeddings for the student vocabulary. We combine this approach with learning shared projection matrices that transfer layer-wise knowledge from the teacher model to the student model. Our method is able to compress the BERT_BASE model by more than 60x, with only a minor drop in downstream task metrics, resulting in a language model with a footprint of under 7MB. Experimental results also demonstrate higher compression efficiency and accuracy when compared with other state-of-the-art compression techniques.
Tasks	Language Modelling, Model Compression, Word Embeddings
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11687v1
PDF	https://arxiv.org/pdf/1909.11687v1.pdf
PWC	https://paperswithcode.com/paper/extreme-language-model-compression-with-1
Repo
Framework
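
To make the layer-wise transfer concrete, here is a minimal Python sketch of a projection-based distillation loss. It assumes a single up-projection matrix `U` (student width to teacher width) shared across all layers and a plain mean-squared error. The names `shared_projection_loss` and `U` and the toy dimensions are hypothetical, and the paper's exact loss and projection setup may differ.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def shared_projection_loss(
    student_layers: List[Vector],
    teacher_layers: List[Vector],
    U: Matrix,
) -> float:
    """Mean squared error between U @ h_student and h_teacher, averaged over layers.

    Hypothetical simplification: one projection U (d_teacher x d_student) is
    shared by every layer pair, so adding layers adds no projection parameters.
    """
    assert len(student_layers) == len(teacher_layers)
    total = 0.0
    for h_s, h_t in zip(student_layers, teacher_layers):
        projected = matvec(U, h_s)  # lift the student state to teacher width
        total += sum((p - t) ** 2 for p, t in zip(projected, h_t)) / len(h_t)
    return total / len(student_layers)

# Toy example: student width 2, teacher width 3, two aligned layers.
U = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
student = [[1.0, 2.0], [0.5, -0.5]]
teacher = [[1.0, 2.0, 3.0], [0.5, -0.5, 1.0]]
print(shared_projection_loss(student, teacher, U))
```

During training, a loss of this kind would be added to the task loss and minimized jointly over the student weights and `U`.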

Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship


Title	Simultaneous Subspace Clustering and Cluster Number Estimating based on Triplet Relationship
Authors	Jie Liang, Jufeng Yang, Ming-Ming Cheng, Paul L. Rosin, Liang Wang
Abstract	In this paper we propose a unified framework to simultaneously discover the number of clusters and group the data points into them using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and used to iteratively assign the data points to clusters. The three samples in each triplet are encouraged to be highly correlated and are treated as a meta-element during clustering, which is more robust than pairwise relationships when segmenting two densely distributed subspaces. Based on the triplet relationship, we propose a unified optimization scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
Tasks	Model Selection
Published	2019-01-23
URL	http://arxiv.org/abs/1901.07689v1
PDF	http://arxiv.org/pdf/1901.07689v1.pdf
PWC	https://paperswithcode.com/paper/simultaneous-subspace-clustering-and-cluster
Repo
Framework
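
As a rough illustration of how triplets might be read off a self-representation matrix, the sketch below symmetrizes the coefficient matrix `C` into an affinity `W = |C| + |C|^T` and keeps every triple of samples in which each member lies in the other two members' top-`m` neighbor lists. This mutual-neighbor rule and the names `find_triplets` and `m` are assumptions for illustration, not the paper's exact definition.

```python
from itertools import combinations
from typing import List, Set, Tuple

def top_neighbors(W: List[List[float]], i: int, m: int) -> Set[int]:
    """Indices of the m samples with the largest affinity to sample i (excluding i)."""
    others = [j for j in range(len(W)) if j != i]
    others.sort(key=lambda j: W[i][j], reverse=True)
    return set(others[:m])

def find_triplets(C: List[List[float]], m: int) -> List[Tuple[int, int, int]]:
    n = len(C)
    # Symmetric, non-negative affinity built from the self-representation coefficients.
    W = [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]
    nbrs = [top_neighbors(W, i, m) for i in range(n)]
    triplets = []
    for i, j, k in combinations(range(n), 3):
        # Keep the triple only if all three pairs are mutual top-m neighbors.
        if (j in nbrs[i] and k in nbrs[i] and
                i in nbrs[j] and k in nbrs[j] and
                i in nbrs[k] and j in nbrs[k]):
            triplets.append((i, j, k))
    return triplets

# Two well-separated groups: strong coefficients within a group, weak across groups.
groups = [0, 0, 0, 1, 1, 1]
C = [[0.0 if i == j else (0.5 if groups[i] == groups[j] else 0.01)
      for j in range(6)] for i in range(6)]
print(find_triplets(C, m=2))
```

Each triplet found this way could then be treated as a single meta-element when clusters are assigned.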

SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads


Title	SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads
Authors	Sam Likun Xi, Yuan Yao, Kshitij Bhardwaj, Paul Whatmough, Gu-Yeon Wei, David Brooks
Abstract	In recent years, there have been tremendous advances in hardware acceleration of deep neural networks. However, most of the research has focused on optimizing accelerator microarchitecture for higher performance and energy efficiency on a per-layer basis. We find that for overall single-batch inference latency, the accelerator may account for only 25-40%, with the rest spent on data movement and in the deep learning software framework. Thus far, it has been very difficult to study end-to-end DNN performance during early stage design (before RTL is available) because there are no existing DNN frameworks that support end-to-end simulation with easy custom hardware accelerator integration. To address this gap in research infrastructure, we present SMAUG, the first DNN framework that is purpose-built for simulation of end-to-end deep learning applications. SMAUG offers researchers a wide range of capabilities for evaluating DNN workloads, from diverse network topologies to easy accelerator modeling and SoC integration. To demonstrate the power and value of SMAUG, we present case studies that show how we can optimize overall performance and energy efficiency, achieving speedups of up to 1.8-5x over a baseline system without changing any part of the accelerator microarchitecture, and show how SMAUG can tune an SoC for a camera-powered deep learning pipeline.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04481v2
PDF	https://arxiv.org/pdf/1912.04481v2.pdf
PWC	https://paperswithcode.com/paper/smaug-end-to-end-full-stack-simulation
Repo
Framework
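
The 25-40% figure has a direct consequence that Amdahl's law makes explicit: if only a fraction `f` of end-to-end latency runs on the accelerator, even an infinitely fast accelerator cannot speed up inference by more than `1 / (1 - f)`. The short Python calculation below is a back-of-the-envelope illustration added here, not part of SMAUG.

```python
import math

def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of runtime is accelerated by a factor s.

    Amdahl's law: 1 / ((1 - f) + f / s). s may be math.inf.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    return 1.0 / ((1.0 - f) + (0.0 if math.isinf(s) else f / s))

for f in (0.25, 0.40):
    print(f"accelerator share {f:.0%}: "
          f"10x faster -> {amdahl_speedup(f, 10):.2f}x, "
          f"infinitely fast -> {amdahl_speedup(f, math.inf):.2f}x")
```

Under this simple model, speeding up the accelerator alone caps the end-to-end gain at roughly 1.33x to 1.67x. This is why the remaining data movement and framework time matter.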

Prediction with Unpredictable Feature Evolution


Title	Prediction with Unpredictable Feature Evolution
Authors	Bo-Jian Hou, Lijun Zhang, Zhi-Hua Zhou
Abstract	Feature space can change or evolve when learning with streaming data. Several recent works have studied feature evolvable learning. They usually assume that features do not vanish or appear in an arbitrary way. For example, when the battery lifespan is known, old and new features, represented by data gathered by sensors, disappear and emerge at the same time because all sensors are replaced simultaneously. However, different sensors can have different lifespans, so feature evolution can be unpredictable. In this paper, we propose a novel paradigm: Prediction with Unpredictable Feature Evolution (PUFE). We first complete the data from the unpredictable overlapping period into an organized matrix and give a theoretical bound on the minimum number of observed entries. Then we learn a mapping from the completed matrix to recover the data in the old feature space when observing data in the new feature space. With predictions on the recovered data, our model can exploit the advantages of the old feature space and is always comparable with any combination of the predictions on the current instance. Experiments on synthetic and real datasets validate the effectiveness of our method.
Tasks
Published	2019-04-27
URL	http://arxiv.org/abs/1904.12171v1
PDF	http://arxiv.org/pdf/1904.12171v1.pdf
PWC	https://paperswithcode.com/paper/prediction-with-unpredictable-feature
Repo
Framework
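
The recovery step can be sketched as ordinary least squares. The idea is to fit a linear map `M` on the (already completed) overlap period so that `X_new @ M ≈ X_old`, then apply `M` to later instances that only expose the new features. The linear model, the NumPy implementation, and the function names are assumptions. The matrix-completion step and the paper's theoretical bound are not reproduced.

```python
import numpy as np

def fit_recovery_map(X_new_overlap: np.ndarray, X_old_overlap: np.ndarray) -> np.ndarray:
    """Least-squares M minimising ||X_new_overlap @ M - X_old_overlap||_F."""
    M, *_ = np.linalg.lstsq(X_new_overlap, X_old_overlap, rcond=None)
    return M

def recover_old_features(X_new: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Estimate the vanished old-space features from new-space observations."""
    return X_new @ M

rng = np.random.default_rng(0)
true_map = rng.normal(size=(4, 3))          # hidden relation: 4 new -> 3 old features
X_new_overlap = rng.normal(size=(50, 4))     # completed overlap-period data
X_old_overlap = X_new_overlap @ true_map
M = fit_recovery_map(X_new_overlap, X_old_overlap)

X_new_later = rng.normal(size=(5, 4))        # only new features are observed now
X_old_hat = recover_old_features(X_new_later, M)
print(np.abs(X_old_hat - X_new_later @ true_map).max())
```

A model trained on the old feature space could then be applied to `X_old_hat`, and its predictions combined with those of a model on the new features.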

Permutation-invariant Feature Restructuring for Correlation-aware Image Set-based Recognition


Title	Permutation-invariant Feature Restructuring for Correlation-aware Image Set-based Recognition
Authors	Xiaofeng Liu, Zhenhua Guo, Site Li, Lingsheng Kong, Ping Jia, Jane You, B. V. K. Kumar
Abstract	We consider the problem of comparing the similarity of image sets with variable quantity and quality and with unordered, heterogeneous images. We use feature restructuring to exploit the correlations of both inner-set and inter-set images. Specifically, residual self-attention effectively restructures the features, using the other features within a set to emphasize the discriminative images and eliminate redundancy. Then, a sparse/collaborative learning-based dependency-guided representation scheme reconstructs the probe features conditioned on the gallery features in order to adaptively align the two sets. This makes our framework compatible with both verification and open-set identification. We show that the parametric self-attention network and non-parametric dictionary learning can be trained end-to-end by a unified alternating optimization scheme, and that the full framework is permutation-invariant. In our numerical experiments, our method achieves top performance on competitive image set/video-based face recognition and person re-identification benchmarks.
Tasks	Dictionary Learning, Face Recognition, Person Re-Identification
Published	2019-08-03
URL	https://arxiv.org/abs/1908.01174v1
PDF	https://arxiv.org/pdf/1908.01174v1.pdf
PWC	https://paperswithcode.com/paper/permutation-invariant-feature-restructuring
Repo
Framework
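
Permutation invariance is easy to check numerically. The NumPy sketch below applies a single-head residual self-attention to a set of feature vectors and then mean-pools them; reordering the images in the set leaves the pooled descriptor unchanged. The projection matrices `Wq`, `Wk`, `Wv` are random stand-ins for learned parameters. The single-head design with mean pooling is a simplification, not the paper's exact architecture.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def restructure_set(X, Wq, Wk, Wv):
    """Residual self-attention over a set X of shape (n_images, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)   # (n, n) attention weights
    return X + A @ V                                     # each feature uses the others

def set_descriptor(X, Wq, Wk, Wv):
    return restructure_set(X, Wq, Wk, Wv).mean(axis=0)    # order-independent pooling

rng = np.random.default_rng(1)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
X = rng.normal(size=(5, d))                              # a set of 5 image features
perm = rng.permutation(5)
print(np.allclose(set_descriptor(X, Wq, Wk, Wv), set_descriptor(X[perm], Wq, Wk, Wv)))
```

The attention step itself is permutation-equivariant: permuting the inputs permutes its outputs in the same way. The symmetric pooling then turns this into invariance.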

A Scale Invariant Flatness Measure for Deep Network Minima


Title	A Scale Invariant Flatness Measure for Deep Network Minima
Authors	Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, Dzung Phan, Sang H. Chin, Trac D. Tran
Abstract	It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescalings of the network parameters that leave the computed function unchanged. This means that the measure of flatness/sharpness can be made arbitrarily small or large through rescaling, rendering these quantitative measures meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure for flatness that is invariant to rescaling. We use this new measure to confirm the proposition that Large-Batch SGD minima are indeed sharper than Small-Batch SGD minima.
Tasks
Published	2019-02-06
URL	http://arxiv.org/abs/1902.02434v1
PDF	http://arxiv.org/pdf/1902.02434v1.pdf
PWC	https://paperswithcode.com/paper/a-scale-invariant-flatness-measure-for-deep
Repo
Framework
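
The rescaling problem already shows up in a one-neuron ReLU network `f(x) = w2 * relu(w1 * x)`. Replacing `(w1, w2)` by `(a * w1, w2 / a)` for any `a > 0` leaves `f` unchanged. Yet the trace of the Hessian of the squared loss at a minimum grows without bound as `a` grows. This toy example is added here for illustration and is not from the paper. It uses the closed-form Hessian for a single training pair with `x > 0` and `w1 > 0`.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def f(w1: float, w2: float, x: float) -> float:
    """One-hidden-unit ReLU network."""
    return w2 * relu(w1 * x)

def hessian_trace(w1: float, w2: float, x: float) -> float:
    """Trace of the Hessian of (f - y)^2 w.r.t. (w1, w2), valid when w1 * x > 0.

    In that region f = w1 * w2 * x, so d2L/dw1^2 = 2 (w2 x)^2 and
    d2L/dw2^2 = 2 (w1 x)^2; the trace does not depend on the target y.
    """
    assert w1 * x > 0, "closed form assumes the ReLU is active"
    return 2.0 * x * x * (w1 * w1 + w2 * w2)

x, y = 1.0, 2.0
w1, w2 = 1.0, 2.0                  # a global minimum: f(x) = 2 = y
for a in (1.0, 10.0, 100.0):
    v1, v2 = a * w1, w2 / a        # positive rescaling, same function
    print(f"a={a:>5}: f(x)={f(v1, v2, x):.4f}, Hessian trace={hessian_trace(v1, v2, x):.2f}")
```

The network output stays at the target while this naive curvature-based sharpness changes by orders of magnitude. This is the kind of ambiguity a rescaling-invariant measure is meant to remove.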

Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition


Title	Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
Authors	Bing Xu, Andrew Tulloch, Yunpeng Chen, Xiaomeng Yang, Lin Qiao
Abstract	We propose a new building block, IdleBlock, which naturally prunes connections within the block. To fully utilize the IdleBlock, we break the tradition of monotonic design in state-of-the-art networks and introduce hybrid composition with IdleBlock. We study hybrid composition on MobileNet v3 and EfficientNet-B0, two of the most efficient networks. Without any neural architecture search, the deeper “MobileNet v3” with hybrid composition design possibly surpasses all state-of-the-art image recognition networks designed by human experts or neural architecture search algorithms. Similarly, the hybridized EfficientNet-B0 networks are more efficient than previous state-of-the-art networks with similar computation budgets. These results suggest a new simpler and more efficient direction for network design and neural architecture search.
Tasks	Neural Architecture Search
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08609v1
PDF	https://arxiv.org/pdf/1911.08609v1.pdf
PWC	https://paperswithcode.com/paper/hybrid-composition-with-idleblock-more
Repo
Framework
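
One way to read "naturally prunes connections" is a channel split: only part of the input channels is transformed, while the remaining "idle" channels pass straight through to the output. The NumPy sketch below is a hypothetical illustration of that reading. The `idle_ratio` parameter, the single 1x1 convolution plus ReLU standing in for the block's real layers, and all function names are assumptions, not the paper's definition.

```python
import numpy as np

def pointwise_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def idle_style_block(x: np.ndarray, w: np.ndarray, idle_ratio: float) -> np.ndarray:
    """Transform only the first (1 - idle_ratio) of the channels; pass the rest through.

    Hypothetical sketch: a single 1x1 conv + ReLU stands in for the real block.
    """
    c = x.shape[0]
    n_active = c - int(round(idle_ratio * c))
    active, idle = x[:n_active], x[n_active:]
    out_active = np.maximum(pointwise_conv(active, w), 0.0)
    return np.concatenate([out_active, idle], axis=0)

C, height, width = 8, 4, 4
idle_ratio = 0.5
n_active = C - int(round(idle_ratio * C))
rng = np.random.default_rng(2)
x = rng.normal(size=(C, height, width))
w = rng.normal(size=(n_active, n_active))
y = idle_style_block(x, w, idle_ratio)
print(y.shape, np.array_equal(y[n_active:], x[n_active:]))
print("1x1 weights, full vs idle-style:", C * C, "vs", n_active * n_active)
```

With half the channels idle, the 1x1 layer in this sketch has a quarter of the weights of a full-width layer.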
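
Unsupervised Concatenation Hashing via Combining Subspace Learning and Graph Embedding for Cross-Modal Image Retrieval

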
Title	Unsupervised Concatenation Hashing via Combining Subspace Learning and Graph Embedding for Cross-Modal Image Retrieval
Authors	Jun Yu, Xiao-Jun Wu
Abstract	Unlike content-based image retrieval methods, cross-modal image retrieval methods uncover the rich semantic-level information of social images to further understand image contents. As multiple modalities depict a common object from multiple perspectives, many works focus on learning a unified subspace representation. Recently, hash representation has received much attention in the retrieval field. How to directly preserve the local manifold structure among objects in a common Hamming space has become an interesting problem. Most unsupervised hashing methods attempt to solve it by constructing a neighborhood graph on each modality separately. However, it is hard to decide the weight factor of each graph to obtain the optimal graph. To overcome this problem, we adopt concatenated features to represent the common object, since the information implied by different modalities is complementary. In our framework, Locally Linear Embedding and Locality Preserving Projection are introduced to reconstruct the manifold structure of the original space. In addition, an ℓ2,1-norm constraint is imposed on the projection matrices to explore discriminative hashing functions. Extensive experiments are performed on three public datasets, and the results show that our method outperforms several classic unsupervised hashing models.
Tasks	Content-Based Image Retrieval, Cross-Modal Retrieval, Graph Embedding, Image Retrieval
Published	2019-03-26
URL	https://arxiv.org/abs/1904.00726v2
PDF	https://arxiv.org/pdf/1904.00726v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-concatenation-hashing-with
Repo
Framework
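
Once projection matrices are learned, encoding and retrieval are simple. The sketch below concatenates an image feature and a text feature, projects the result with a matrix `P`, takes signs to obtain a binary code, and ranks database items by Hamming distance. `P` here is a random placeholder. Learning it with Locally Linear Embedding, Locality Preserving Projection, and the ℓ2,1-norm constraint, as the paper does, is not shown, and all names are hypothetical.

```python
import random
from typing import List

def encode(img_feat: List[float], txt_feat: List[float], P: List[List[float]]) -> List[int]:
    """Concatenate modalities, project with P (n_bits x d), and binarise with sign."""
    z = img_feat + txt_feat                         # concatenated representation
    return [1 if sum(p * v for p, v in zip(row, z)) >= 0 else -1 for row in P]

def hamming(a: List[int], b: List[int]) -> int:
    return sum(1 for u, v in zip(a, b) if u != v)

def rank_by_hamming(query: List[int], database: List[List[int]]) -> List[int]:
    """Indices of database codes, smallest Hamming distance first."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))

random.seed(0)
n_bits, d_img, d_txt = 16, 4, 3
P = [[random.gauss(0, 1) for _ in range(d_img + d_txt)] for _ in range(n_bits)]
items = [([random.gauss(0, 1) for _ in range(d_img)],
          [random.gauss(0, 1) for _ in range(d_txt)]) for _ in range(5)]
database = [encode(img, txt, P) for img, txt in items]
print([hamming(database[2], code) for code in database])
print(rank_by_hamming(database[2], database))
```

Hamming distance on short binary codes can be computed with XOR and popcount, which is where hashing gets its retrieval speed. The list-based version above just keeps the logic explicit.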

Transferable Feature Representation for Visible-to-Infrared Cross-Dataset Human Action Recognition


Title	Transferable Feature Representation for Visible-to-Infrared Cross-Dataset Human Action Recognition
Authors	Yang Liu, Zhaoyang Lu, Jing Li, Chao Yao, Yanzi Deng
Abstract	Recently, infrared human action recognition has attracted increasing attention because infrared imaging has several advantages over visible light, namely robustness to illumination changes and shadows. However, infrared action data remain limited, which degrades the performance of infrared action recognition. Motivated by transfer learning, we propose an infrared human action recognition framework that uses auxiliary visible-light data to address the shortage of infrared action data. In the proposed framework, we first construct a novel Cross-Dataset Feature Alignment and Generalization (CDFAG) framework to map the infrared data and visible light data into a common feature space, where Kernel Manifold Alignment (KEMA) and a dual aligned-to-generalized encoders (AGE) model are employed to represent the features. Then, a support vector machine (SVM) is trained on both the infrared data and the visible light data and used to classify the features derived from infrared data. The proposed method is evaluated on InfAR, a publicly available infrared human action dataset. To build up auxiliary data, we set up a novel visible light action dataset, XD145. Experimental results show that the proposed method achieves state-of-the-art performance compared with several transfer learning and domain adaptation methods.
Tasks	Domain Adaptation, Temporal Action Localization, Transfer Learning
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08297v1
PDF	https://arxiv.org/pdf/1909.08297v1.pdf
PWC	https://paperswithcode.com/paper/transferable-feature-representation-for
Repo
Framework

Neural Style-Preserving Visual Dubbing


Title	Neural Style-Preserving Visual Dubbing
Authors	Hyeongwoo Kim, Mohamed Elgharib, Michael Zollhöfer, Hans-Peter Seidel, Thabo Beeler, Christian Richardt, Christian Theobalt
Abstract	Dubbing is a technique for translating video content from one language to another. However, state-of-the-art visual dubbing techniques directly copy facial expressions from source to target actors without considering identity-specific idiosyncrasies such as a unique type of smile. We present a style-preserving visual dubbing approach from single video inputs, which maintains the signature style of target actors when modifying facial expressions, including mouth motions, to match foreign languages. At the heart of our approach is the concept of motion style, in particular for facial expressions, i.e., the person-specific expression change that is yet another essential factor beyond visual accuracy in face editing applications. Our method is based on a recurrent generative adversarial network that captures the spatiotemporal co-activation of facial expressions, and enables generating and modifying the facial expressions of the target actor while preserving their style. We train our model with unsynchronized source and target videos in an unsupervised manner using cycle-consistency and mouth expression losses, and synthesize photorealistic video frames using a layered neural face renderer. Our approach generates temporally coherent results, and handles dynamic backgrounds. Our results show that our dubbing approach maintains the idiosyncratic style of the target actor better than previous approaches, even for widely differing source and target actors.
Tasks
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02518v2
PDF	https://arxiv.org/pdf/1909.02518v2.pdf
PWC	https://paperswithcode.com/paper/neural-style-preserving-visual-dubbing
Repo
Framework

Deep Neural Networks for Choice Analysis: Architectural Design with Alternative-Specific Utility Functions


Title	Deep Neural Networks for Choice Analysis: Architectural Design with Alternative-Specific Utility Functions
Authors	Shenhao Wang, Jinhua Zhao
Abstract	Whereas deep neural networks (DNNs) are increasingly applied to choice analysis, it is challenging to reconcile domain-specific behavioral knowledge with general-purpose DNNs, to improve DNNs’ interpretability and predictive power, and to identify effective regularization methods for specific tasks. This study designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge. Unlike a fully connected DNN (F-DNN), which computes the utility value of an alternative k using the attributes of all the alternatives, ASU-DNN computes it using only k’s own attributes. Theoretically, ASU-DNN can dramatically reduce the estimation error of F-DNN because of its lighter architecture and sparser connectivity. Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN over the whole hyperparameter space in a private dataset that we collected in Singapore and a public dataset in the R mlogit package. The alternative-specific connectivity constraint, as a domain-knowledge-based regularization method, is more effective than the most popular general-purpose explicit and implicit regularization methods and architectural hyperparameters. ASU-DNN is also more interpretable because it provides a more regular substitution pattern of travel mode choices than F-DNN does. The comparison between ASU-DNN and F-DNN can also aid in testing the behavioral knowledge. Our results reveal that individuals are more likely to compute utility by using an alternative’s own attributes, supporting the long-standing practice in choice modeling. Overall, this study demonstrates that prior behavioral knowledge could be used to guide the architecture design of DNNs, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07481v1
PDF	https://arxiv.org/pdf/1909.07481v1.pdf
PWC	https://paperswithcode.com/paper/deep-neural-networks-for-choice-analysis
Repo
Framework
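
The structural difference between ASU-DNN and F-DNN fits in a few lines. In the sketch below, each alternative's utility is computed by its own small network that sees only that alternative's attributes. Choice probabilities then follow from a softmax over utilities, as in a multinomial logit. The layer sizes, the tanh activation, the random weights, the example attributes, and the function names are assumptions for illustration.

```python
import math
import random
from typing import Dict, List

def mlp_utility(x: List[float], W1: List[List[float]], w2: List[float]) -> float:
    """One hidden tanh layer mapping an alternative's own attributes to a scalar utility."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W1]
    return sum(w * h for w, h in zip(w2, hidden))

def asu_choice_probs(attrs: Dict[str, List[float]], params: Dict[str, tuple]) -> Dict[str, float]:
    """Alternative-specific utilities: V_k depends only on attrs[k]; softmax gives P(k)."""
    utils = {k: mlp_utility(attrs[k], *params[k]) for k in attrs}
    m = max(utils.values())                       # subtract max for numerical stability
    exp_u = {k: math.exp(u - m) for k, u in utils.items()}
    total = sum(exp_u.values())
    return {k: e / total for k, e in exp_u.items()}

random.seed(42)
alts = {"walk": [0.5, 1.0], "bus": [1.2, 0.3, 0.8], "car": [2.0, 0.1]}  # e.g. time, cost
params = {k: ([[random.gauss(0, 0.5) for _ in x] for _ in range(4)],
              [random.gauss(0, 0.5) for _ in range(4)]) for k, x in alts.items()}
probs = asu_choice_probs(alts, params)
print({k: round(p, 3) for k, p in probs.items()})
```

Because each utility sees only its own attributes, alternatives may even have different numbers of attributes. In an F-DNN, by contrast, every utility would be a function of the concatenated attributes of all alternatives.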

On the Merge of k-NN Graph


Title	On the Merge of k-NN Graph
Authors	Peng-Cheng Lin, Wan-Lei Zhao
Abstract	The k-nearest neighbor graph is a fundamental data structure in many disciplines, such as information retrieval, data mining, pattern recognition, and machine learning. In the literature, considerable research has focused on how to efficiently build an approximate k-nearest neighbor graph (k-NN graph) for a fixed dataset. Unfortunately, a closely related issue has long been overlooked: few works in the literature cover how to merge two existing k-NN graphs. In this paper, we address the k-NN graph merge issue in two different scenarios. First, the symmetric merge is proposed to address the problem of merging two approximate k-NN graphs into one. This makes large-scale parallel approximate k-NN graph computation possible. Second, the problem of merging a raw set into a built k-NN graph is addressed by the joint merge. It allows the approximate k-NN graph to be built incrementally and therefore supports approximate k-NN graph construction for an open set. Moreover, derived from the joint merge, a hierarchical approximate k-NN graph construction approach is presented. With the support of the produced graph hierarchy, superior performance is observed on the large-scale NN search task across various data types and data dimensions, and under different distance measures.
Tasks	graph construction, Information Retrieval
Published	2019-08-02
URL	https://arxiv.org/abs/1908.00814v4
PDF	https://arxiv.org/pdf/1908.00814v4.pdf
PWC	https://paperswithcode.com/paper/on-the-merge-of-k-nn-graph
Repo
Framework
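
A useful reference point for the merge problem is an exact, quadratic-cost merge. For two disjoint subsets, every point's k nearest neighbors in the union are contained in the union of its k nearest neighbors within its own subset and within the other subset. Merging therefore reduces to finding cross-subset candidates and keeping the best k. The Python sketch below does that cross search by brute force. It is a baseline for checking correctness, not the paper's symmetric merge, which instead computes cross-subset neighbors approximately. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Graph = Dict[int, List[Tuple[float, int]]]   # node id -> sorted [(distance, neighbor id)]

def brute_knn(ids: List[int], pts: Dict[int, Point], k: int) -> Graph:
    """Exact k-NN graph restricted to the given ids."""
    g = {}
    for i in ids:
        cand = [(math.dist(pts[i], pts[j]), j) for j in ids if j != i]
        g[i] = sorted(cand)[:k]
    return g

def merge_knn(g1: Graph, g2: Graph, pts: Dict[int, Point], k: int) -> Graph:
    """Merge k-NN graphs over disjoint subsets into a k-NN graph over their union."""
    merged = {}
    for own, other in ((g1, g2), (g2, g1)):
        for i, nbrs in own.items():
            # Cross-subset candidates by brute force (the expensive step).
            cross = [(math.dist(pts[i], pts[j]), j) for j in other]
            merged[i] = sorted(nbrs + cross)[:k]
    return merged

pts = {i: (math.cos(i), math.sin(2 * i)) for i in range(12)}
A, B = list(range(0, 12, 2)), list(range(1, 12, 2))
k = 3
merged = merge_knn(brute_knn(A, pts, k), brute_knn(B, pts, k), pts, k)
print(merged == brute_knn(list(range(12)), pts, k))
```

An approximate merge can be validated against this baseline on small inputs by measuring recall of the true neighbor lists.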

Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian: A Lyapunov-based approach


Title	Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian: A Lyapunov-based approach
Authors	Dongsheng Ding, Mihailo R. Jovanović
Abstract	For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian. The result holds when the differentiable part of the objective function is strongly convex with a Lipschitz continuous gradient; the non-differentiable part is proper, lower semi-continuous, and convex; and the matrix in the linear constraint has full row rank. Our quadratic Lyapunov function generalizes recent results from strongly convex problems with either affine equality or inequality constraints to a broader class of composite optimization problems with nonsmooth regularizers, and it provides a worst-case lower bound on the exponential decay rate. Finally, we use computational experiments to demonstrate that our convergence rate estimate is less conservative than the existing alternatives.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00783v1
PDF	https://arxiv.org/pdf/1910.00783v1.pdf
PWC	https://paperswithcode.com/paper/global-exponential-stability-of-primal-dual
Repo
Framework
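
To show the kind of dynamics being analyzed, the sketch below integrates a primal-dual gradient flow with forward Euler. It drops the nonsmooth term altogether and uses the classical augmented Lagrangian `L(x, y) = f(x) + y(a·x - b) + (rho/2)(a·x - b)^2`. This is a smooth special case chosen for illustration, not the paper's proximal construction. We take `f(x) = 0.5 ||x - p||^2`, which is strongly convex with a Lipschitz gradient, and a single constraint `a·x = b`, which has full row rank. The exact solution is then the projection of `p` onto the hyperplane. The problem data, step size, and `rho` are arbitrary choices.

```python
from typing import List, Tuple

def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def primal_dual_flow(p, a, b, rho=1.0, dt=0.01, steps=5000) -> Tuple[List[float], float]:
    """Forward-Euler integration of
        dx/dt = -grad_x L(x, y) = -(x - p) - a*y - rho*a*(a.x - b)
        dy/dt = +grad_y L(x, y) = a.x - b
    for L(x, y) = 0.5||x - p||^2 + y(a.x - b) + (rho/2)(a.x - b)^2.
    """
    x = [0.0] * len(p)
    y = 0.0
    for _ in range(steps):
        r = dot(a, x) - b                                   # constraint residual
        grad_x = [xi - pi + ai * y + rho * ai * r for xi, pi, ai in zip(x, p, a)]
        x = [xi - dt * gi for xi, gi in zip(x, grad_x)]     # primal descent
        y = y + dt * r                                      # dual ascent
    return x, y

p, a, b = [3.0, 1.0], [1.0, 2.0], 1.0
x, y = primal_dual_flow(p, a, b)
y_star = (dot(a, p) - b) / dot(a, a)                        # closed-form multiplier
x_star = [pi - y_star * ai for pi, ai in zip(p, a)]         # projection of p onto a.x = b
print(x, y, x_star, y_star)
```

For these data the closed form gives `x* = (2.2, -0.6)` and `y* = 0.8`, and the Euler iterates settle on that saddle point. The paper's contribution is to certify exponential convergence of the continuous-time flow when a nonsmooth regularizer is present as well.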

Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models


Title	Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models
Authors	Reza Mohammadi, Matthew Pratola, Maurits Kaptein
Abstract	Decision trees are flexible models that are well suited for many statistical regression problems. In a Bayesian framework for regression trees, Markov Chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities. The critical component of such an MCMC algorithm is the construction of good Metropolis-Hastings steps for updating the tree topology. However, such algorithms frequently suffer from local mode stickiness and poor mixing. As a result, the algorithms are slow to converge. Hitherto, authors have primarily used discrete-time birth/death mechanisms for Bayesian (sums of) regression tree models to explore the model space. These algorithms are efficient only if the acceptance rate is high, which is not always the case. Here we overcome this issue by developing a new search algorithm based on a continuous-time birth-death Markov process. This search algorithm explores the model space by jumping between parameter spaces corresponding to different tree structures. In the proposed algorithm, the moves between models are always accepted, which can dramatically improve the convergence and mixing properties of the MCMC algorithm. We provide theoretical support for the algorithm for Bayesian regression tree models and demonstrate its performance.
Tasks
Published	2019-04-19
URL	http://arxiv.org/abs/1904.09339v1
PDF	http://arxiv.org/pdf/1904.09339v1.pdf
PWC	https://paperswithcode.com/paper/190409339
Repo
Framework
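
The mechanics of a continuous-time birth-death sampler can be shown on a toy target where the answer is known. In the sketch below the "model" is just an integer k, which could be thought of as a number of tree leaves. Births occur at rate 1 and deaths at rate k / lam, which by detailed balance makes Poisson(lam) the stationary distribution. Every jump is accepted. Instead of rejecting moves, each visited state is weighted by its expected holding time, 1 / (total rate), when estimating expectations. The toy target and rates are assumptions for illustration, not the paper's tree posterior or rate definitions.

```python
import random

def bdmcmc_mean(lam: float, n_jumps: int, seed: int = 0) -> float:
    """Time-weighted mean of k under a continuous-time birth-death chain.

    Birth rate 1 and death rate k / lam give a Poisson(lam) stationary law,
    so the returned value should be close to lam for long runs.
    """
    rng = random.Random(seed)
    k = 0
    weighted_sum = 0.0
    total_time = 0.0
    for _ in range(n_jumps):
        birth, death = 1.0, k / lam
        total_rate = birth + death
        holding = 1.0 / total_rate           # expected time spent in state k
        weighted_sum += k * holding
        total_time += holding
        # Pick the move with probability proportional to its rate; it is never rejected.
        if rng.random() < birth / total_rate:
            k += 1
        else:
            k -= 1
    return weighted_sum / total_time

print(bdmcmc_mean(lam=2.0, n_jumps=200_000))
```

The jump chain alone over-visits states with high total rate. Weighting by holding time corrects this, which is what lets every proposed move be accepted.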

Deep Semantic Multimodal Hashing Network for Scalable Multimedia Retrieval


Title	Deep Semantic Multimodal Hashing Network for Scalable Multimedia Retrieval
Authors	Zechao Li, Lu Jin, Jinhui Tang
Abstract	Hashing has been widely applied to multimodal retrieval on large-scale multimedia data due to its efficiency in computation and storage. In particular, deep hashing has received unprecedented research attention in recent years owing to its strong retrieval performance. However, most existing deep hashing methods learn binary hash codes by preserving the similarity relationship without exploiting the semantic labels of data points, which results in suboptimal binary codes. In this work, we propose a novel Deep Semantic Multimodal Hashing Network (DSMHN) for scalable multimodal retrieval. In DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both the inter-modality similarities and the intra-modality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for task-specific classification, two-stream networks are jointly trained to learn the hash functions by embedding the semantic labels on the resultant hash codes. Unlike previous deep hashing methods, which are tied to particular forms of loss functions, the proposed deep hashing framework can be flexibly integrated with different types of loss functions. In addition, the bit balance property is investigated to generate binary codes in which each bit has a 50% probability of being 1 or -1. Moreover, a unified deep multimodal hashing framework is proposed to learn compact and high-quality hash codes by simultaneously exploiting feature representation learning, inter-modality similarity preserving learning, semantic label preserving learning, and hash function learning with a bit balance constraint. We conduct extensive experiments for both unimodal and cross-modal retrieval tasks on three widely used multimodal retrieval datasets. The experimental results demonstrate that DSMHN significantly outperforms state-of-the-art methods.
Tasks	Cross-Modal Retrieval, Representation Learning
Published	2019-01-09
URL	https://arxiv.org/abs/1901.02662v2
PDF	https://arxiv.org/pdf/1901.02662v2.pdf
PWC	https://paperswithcode.com/paper/deep-semantic-multimodal-hashing-network-for
Repo
Framework
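
The bit balance property can be turned into a penalty on the code matrix. In the sketch below each row of `B` is one item's code in {-1, +1}. A bit is balanced when its column sums to zero, meaning half the items take each value. The squared column means therefore give a penalty that is zero exactly when every bit is balanced. This particular penalty and the helper names are an illustration. The paper's constraint may be formulated differently, for example on relaxed real-valued network outputs during training.

```python
from typing import List

def sign_codes(outputs: List[List[float]]) -> List[List[int]]:
    """Quantise real-valued network outputs to {-1, +1} hash codes."""
    return [[1 if v >= 0 else -1 for v in row] for row in outputs]

def bit_balance_penalty(B: List[List[float]]) -> float:
    """Sum over bits of (mean value of that bit across items)^2."""
    n = len(B)
    n_bits = len(B[0])
    return sum((sum(row[j] for row in B) / n) ** 2 for j in range(n_bits))

balanced = [[1, -1], [-1, 1], [1, 1], [-1, -1]]   # each bit is +1 for half the items
skewed = [[1, 1], [1, -1], [1, 1], [-1, 1]]
print(bit_balance_penalty(balanced), bit_balance_penalty(skewed))
print(sign_codes([[0.3, -2.0], [-0.1, 0.0]]))
```

Balanced bits split the database roughly in half, so each bit carries close to one bit of information about an item.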