Paper Group AWR 118
Time-varying Autoregression with Low Rank Tensors. EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency. Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks. Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization. MedMentions: A Large Biomedical Corpus Annot …
Time-varying Autoregression with Low Rank Tensors
Title | Time-varying Autoregression with Low Rank Tensors |
Authors | Kameron Decker Harris, Aleksandr Aravkin, Rajesh Rao, Bingni Wen Brunton |
Abstract | We present a windowed technique to learn parsimonious time-varying autoregressive models from multivariate time series. This unsupervised method uncovers spatiotemporal structure in data via non-smooth and non-convex optimization. In each time window, we assume the data follow a linear model parameterized by a potentially different system matrix, and we model this stack of system matrices as a low rank tensor. Because of its structure, the model is scalable to high-dimensional data and can easily incorporate priors such as smoothness over time. We find the components of the tensor using alternating minimization and prove that any stationary point of this algorithm is a local minimum. In a test case, our method identifies the true rank of a switching linear system in the presence of noise. We illustrate our model’s utility and superior scalability over extant methods when applied to several synthetic and real examples, including a nonlinear dynamical system, worm behavior, sea surface temperature, and monkey brain recordings. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08389v1 |
https://arxiv.org/pdf/1905.08389v1.pdf | |
PWC | https://paperswithcode.com/paper/time-varying-autoregression-with-low-rank |
Repo | https://github.com/kharris/tvart |
Framework | none |
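The abstract's core construction, a stack of per-window system matrices constrained to be low rank, can be illustrated with a simplified numpy sketch. This is not the authors' alternating-minimization solver (see the repo above); it fits each window by least squares and then truncates the SVD of the unfolded stack, and the function name is hypothetical:

```python
import numpy as np

def windowed_ar_lowrank(X, window, rank):
    """Fit an AR(1) matrix per time window by least squares, then project
    the stack of matrices onto its top singular components (a crude
    stand-in for the paper's low-rank tensor model).

    X: (n, T) multivariate time series."""
    n, T = X.shape
    mats = []
    for start in range(0, T - window, window):
        Xp = X[:, start:start + window - 1]        # predictors x_t
        Xf = X[:, start + 1:start + window]        # targets x_{t+1}
        # Solve Xf ≈ A @ Xp in the least-squares sense.
        At, *_ = np.linalg.lstsq(Xp.T, Xf.T, rcond=None)
        mats.append(At.T)
    M = np.stack(mats)                             # (windows, n, n)
    U, s, Vt = np.linalg.svd(M.reshape(len(mats), -1), full_matrices=False)
    low = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # rank-truncated unfolding
    return low.reshape(M.shape)
```

On noise-free data from a single linear system, every window recovers the same matrix and a rank-1 truncation reproduces it exactly, mirroring the rank-identification experiment in the abstract.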
EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency
Title | EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency |
Authors | Muhammad Faisal, Ijaz Akhter, Mohsen Ali, Richard Hartley |
Abstract | The existing approaches for salient motion segmentation are unable to explicitly learn geometric cues and often give false detections on prominent static objects. We exploit multiview geometric constraints to avoid such shortcomings. To handle nonrigid backgrounds such as a sea, we also propose a robust fusion mechanism between motion and appearance-based features. We find dense trajectories, covering every pixel in the video, and propose trajectory-based epipolar distances to distinguish between background and foreground regions. Trajectory epipolar distances are data-independent and can be readily computed given a few feature correspondences between the images. We show that by combining epipolar distances with optical flow, a powerful motion network can be learned. To enable the network to leverage both of these features, we propose a simple mechanism we call input-dropout. Among motion-only networks, ours outperforms the previous state of the art on the DAVIS-2016 dataset by 5.2% in mean IoU. By robustly fusing our motion network with an appearance network using the input-dropout mechanism, we also outperform the previous methods on the DAVIS-2016, DAVIS-2017 and SegTrackv2 datasets. |
Tasks | Motion Segmentation, Optical Flow Estimation |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13258v2 |
https://arxiv.org/pdf/1909.13258v2.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-geometric-constraints-on-dense |
Repo | https://github.com/mfaisal59/EpONet |
Framework | pytorch |
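The trajectory-based epipolar distance the abstract relies on reduces, per point pair, to the distance of a point from its correspondence's epipolar line. A hedged numpy sketch (the paper aggregates these over dense trajectories; this shows only the per-correspondence distance, and the function name is an assumption):

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance of points x2 (image 2) from the epipolar lines F @ x1
    induced by their correspondences x1 (image 1).

    F: (3, 3) fundamental matrix; x1, x2: (N, 2) pixel coordinates."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])    # homogeneous coordinates
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    lines = h1 @ F.T                               # epipolar lines a*x + b*y + c = 0
    num = np.abs(np.sum(lines * h2, axis=1))       # |x2^T F x1|
    den = np.hypot(lines[:, 0], lines[:, 1])       # line normal length
    return num / den
```

A static background point scores near zero, while an independently moving foreground point violates the epipolar constraint and scores high, which is what makes the distance a useful motion-saliency cue.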
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks
Title | Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks |
Authors | Xiang Li, Xiaolin Hu, Jian Yang |
Abstract | Convolutional Neural Networks (CNNs) generate feature representations of complex objects by collecting hierarchical semantic sub-features. These sub-features can usually be distributed in grouped form in the feature vector of each layer, representing various semantic entities. However, the activation of these sub-features is often spatially affected by similar patterns and noisy backgrounds, resulting in erroneous localization and identification. We propose a Spatial Group-wise Enhance (SGE) module that can adjust the importance of each sub-feature by generating an attention factor for each spatial location in each semantic group, so that every individual group can autonomously enhance its learnt expression and suppress possible noise. The attention factors are guided only by the similarities between the global and local feature descriptors inside each group, so the SGE module is extremely lightweight, with \emph{almost no extra parameters and calculations}. Despite being trained with only category supervision, the SGE component is extremely effective in highlighting multiple active areas with various high-order semantics (such as a dog’s eyes, nose, etc.). When integrated with popular CNN backbones, SGE can significantly boost the performance of image recognition tasks. Specifically, with a ResNet50 backbone, SGE achieves a 1.2% Top-1 accuracy improvement on the ImageNet benchmark and a 1.0$\sim$2.0% AP gain on the COCO benchmark across a wide range of detectors (Faster/Mask/Cascade RCNN and RetinaNet). Code and pretrained models are available at https://github.com/implus/PytorchInsight. |
Tasks | Image Classification, Object Detection |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09646v2 |
https://arxiv.org/pdf/1905.09646v2.pdf | |
PWC | https://paperswithcode.com/paper/spatial-group-wise-enhance-improving-semantic |
Repo | https://github.com/implus/PytorchInsight |
Framework | pytorch |
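The SGE operation described above, a per-group spatial gate driven by the similarity between each position's feature and the group's global descriptor, is compact enough to sketch in numpy. The paper's learnable per-group scale and shift are fixed to 1 and 0 here, so this is an illustrative approximation rather than the released module:

```python
import numpy as np

def sge(x, groups=4, eps=1e-5):
    """Spatial Group-wise Enhance sketch. x: (N, C, H, W).
    The learnable per-group scale/shift are fixed to 1/0."""
    n, c, h, w = x.shape
    xg = x.reshape(n * groups, c // groups, h * w)
    g = xg.mean(axis=2, keepdims=True)             # global group descriptor
    attn = (g * xg).sum(axis=1)                    # similarity map, (N*G, H*W)
    attn = (attn - attn.mean(axis=1, keepdims=True)) / (
        attn.std(axis=1, keepdims=True) + eps)     # normalize per sample/group
    attn = 1.0 / (1.0 + np.exp(-attn))             # sigmoid gate in (0, 1)
    out = xg * attn[:, None, :]
    return out.reshape(n, c, h, w)
```

Because the gate depends only on dot products with a pooled descriptor, the module adds essentially no parameters, which matches the "almost no extra parameters and calculations" claim.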
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
Title | Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization |
Authors | Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder |
Abstract | Optimization of Binarized Neural Networks (BNNs) currently relies on real-valued latent weights to accumulate small update steps. In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead, their main role is to provide inertia during training. We interpret current methods in terms of inertia and provide novel insights into the optimization of BNNs. We subsequently introduce the first optimizer specifically designed for BNNs, Binary Optimizer (Bop), and demonstrate its performance on CIFAR-10 and ImageNet. Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open the way for further improvements in training methodologies for BNNs. Code is available at: https://github.com/plumerai/rethinking-bnn-optimization |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02107v2 |
https://arxiv.org/pdf/1906.02107v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-weights-do-not-exist-rethinking |
Repo | https://github.com/larq/larq |
Framework | tf |
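The inertia view translates into a very small update rule. A hedged sketch of a Bop-style step, where a gradient moving average replaces the latent weight entirely (constants are illustrative, and the flip condition follows the paper's description as summarized above):

```python
import numpy as np

def bop_step(w, grad, m, gamma=1e-3, tau=1e-6):
    """Bop-style step for binary weights w in {-1, +1}: track an
    exponential moving average m of the gradient, and flip a weight only
    when m is both confident (|m| > tau) and pushing against w's sign.
    No real-valued latent weight is stored; m plays the inertia role."""
    m = (1 - gamma) * m + gamma * grad
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))
    return np.where(flip, -w, w), m
```

A persistent gradient eventually accumulates enough evidence in `m` to flip a weight, while transient gradient noise is absorbed, which is the inertia behavior the abstract describes.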
MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts
Title | MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts |
Authors | Sunil Mohan, Donghui Li |
Abstract | This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described. |
Tasks | Entity Linking, Named Entity Recognition |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09476v1 |
http://arxiv.org/pdf/1902.09476v1.pdf | |
PWC | https://paperswithcode.com/paper/medmentions-a-large-biomedical-corpus |
Repo | https://github.com/chanzuckerberg/MedMentions |
Framework | none |
City-GAN: Learning architectural styles using a custom Conditional GAN architecture
Title | City-GAN: Learning architectural styles using a custom Conditional GAN architecture |
Authors | Maximilian Bachl, Daniel C. Ferreira |
Abstract | Generative Adversarial Networks (GANs) are a well-known technique that is trained on samples (e.g. pictures of fruits) and, after training, is able to generate realistic new samples. Conditional GANs (CGANs) additionally provide label information for subclasses (e.g. apple, orange, pear), which enables the GAN to learn more easily and increases the quality of its output samples. We use GANs to learn architectural features of major cities and to generate images of buildings which do not exist. We show that currently available GAN and CGAN architectures are unsuited for this task; we therefore propose a custom architecture, demonstrate its superior performance on this task, and verify its capabilities with extensive experiments. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.05280v1 |
https://arxiv.org/pdf/1907.05280v1.pdf | |
PWC | https://paperswithcode.com/paper/city-gan-learning-architectural-styles-using |
Repo | https://github.com/muxamilian/city-gan |
Framework | none |
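As a minimal illustration of the CGAN conditioning the abstract contrasts with plain GANs: the standard trick is to append a one-hot class label (here, a city id) to the generator's noise vector, with the discriminator receiving the same label alongside the image. This sketch shows only the input construction, with hypothetical names:

```python
import numpy as np

def cgan_inputs(z, labels, num_classes):
    """Build a conditional generator input by concatenating latent noise
    z (N, latent_dim) with a one-hot encoding of each sample's integer
    class label (N,), so one generator can learn per-class styles."""
    onehot = np.eye(num_classes)[labels]           # (N, num_classes)
    return np.concatenate([z, onehot], axis=1)
```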
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Title | Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck |
Authors | Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann |
Abstract | The ability of policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent’s policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL. In particular, we focus on regularization techniques relying on the injection of noise into the learned function, a family that includes some of the most widely used approaches such as Dropout and Batch Normalization. To adapt them to RL, we propose Selective Noise Injection (SNI), which maintains the regularizing effect of the injected noise while mitigating its adverse effects on gradient quality. Furthermore, we demonstrate that the Information Bottleneck (IB) is a particularly well suited regularization technique for RL as it is effective in the low-data regime encountered early on in training RL agents. Combining the IB with SNI, we significantly outperform current state of the art results, including on the recently proposed generalization benchmark CoinRun. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12911v1 |
https://arxiv.org/pdf/1910.12911v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-in-reinforcement-learning-with |
Repo | https://github.com/maximecb/gym-minigrid |
Framework | pytorch |
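The core SNI pattern — act with noise disabled, then update with a mixture of the noise-free and noise-injected gradients — can be sketched on a toy linear softmax policy. This is a deliberately simplified REINFORCE-style stand-in: the paper's full estimator adds importance weighting, and all names and constants here are assumptions:

```python
import numpy as np

def sni_update(W, s, action, advantage, lam=0.5, lr=0.1, p=0.5, rng=None):
    """Toy SNI sketch: the action was chosen by the noise-free policy,
    and the update mixes the policy gradient of the noise-free and the
    dropout-injected forward passes with weight lam."""
    rng = rng if rng is not None else np.random.default_rng(0)

    def pg(feats):                                 # REINFORCE gradient wrt W
        logits = W @ feats
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        g = -probs
        g[action] += 1.0                           # d log pi(action|s) / d logits
        return advantage * np.outer(g, feats)

    mask = (rng.random(s.shape) < p) / p           # inverted-dropout noise
    return W + lr * (lam * pg(s) + (1 - lam) * pg(s * mask))
```

Setting `lam=1` recovers the plain noise-free gradient; `lam=0` trains purely on the noisy pass, the regime whose gradient-quality problems SNI is designed to mitigate.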
DiCENet: Dimension-wise Convolutions for Efficient Networks
Title | DiCENet: Dimension-wise Convolutions for Efficient Networks |
Authors | Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari |
Abstract | We introduce a novel and generic convolutional unit, DiCE unit, that is built using dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor while dimension-wise fusion efficiently combines these dimension-wise representations; allowing the DiCE unit to efficiently encode spatial and channel-wise information contained in the input tensor. The DiCE unit is simple and can be easily plugged into any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks including image classification, object detection, and semantic segmentation. On the ImageNet dataset, the DiCENet delivers either the same or better performance than existing models with fewer floating-point operations (FLOPs). Notably, for a network size of about 70 MFLOPs, DiCENet outperforms the state-of-the-art neural search architecture, MNASNet, by 4% on the ImageNet dataset. Our code is open source and available at \url{https://github.com/sacmehta/EdgeNets} |
Tasks | Image Classification, Neural Architecture Search, Object Detection, Real-Time Object Detection, Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03516v2 |
https://arxiv.org/pdf/1906.03516v2.pdf | |
PWC | https://paperswithcode.com/paper/dicenet-dimension-wise-convolutions-for |
Repo | https://github.com/adichaloo/EdgeNet |
Framework | pytorch |
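Dimension-wise convolution, as described, filters the input tensor along each of its dimensions separately before fusing the responses. A numpy sketch with a fixed smoothing kernel and average fusion (the real DiCE unit learns both the filters and the fusion; names here are illustrative):

```python
import numpy as np

def conv1d_along(x, kernel, axis):
    """'Same'-padded 1-D cross-correlation of tensor x along one axis."""
    x = np.moveaxis(x, axis, -1)
    pad = len(kernel) // 2
    xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(pad, pad)])
    out = sum(w * xp[..., i:i + x.shape[-1]] for i, w in enumerate(kernel))
    return np.moveaxis(out, -1, axis)

def dice_unit(x, kernel=(0.25, 0.5, 0.25)):
    """DiCE-unit sketch: filter a (C, H, W) tensor independently along
    the channel, height and width dimensions, then fuse the three
    responses by averaging (standing in for the learned fusion)."""
    responses = [conv1d_along(x, np.asarray(kernel), axis) for axis in range(3)]
    return np.mean(responses, axis=0)
```

Each pass touches only one dimension at a time, which is why the unit's cost stays close to that of depth-wise separable convolutions while still mixing spatial and channel information.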
Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
Title | Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos |
Authors | Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen |
Abstract | Detecting manipulated images and videos is an important topic in digital media forensics. Most detection methods use binary classification to determine the probability of a query being manipulated. Another important topic is locating manipulated regions (i.e., performing segmentation), which are mostly created by three commonly used attacks: removal, copy-move, and splicing. We have designed a convolutional neural network that uses a multi-task learning approach to simultaneously detect manipulated images and videos and locate the manipulated regions for each query. Information gained by performing one task is shared with the other task, thereby enhancing the performance of both. A semi-supervised learning approach is used to improve the network’s generalizability. The network includes an encoder and a Y-shaped decoder. Activation of the encoded features is used for the binary classification. The output of one branch of the decoder is used for segmenting the manipulated regions while that of the other branch is used for reconstructing the input, which helps improve overall performance. Experiments using the FaceForensics and FaceForensics++ databases demonstrated the network’s effectiveness against facial reenactment attacks and face swapping attacks as well as its ability to deal with the mismatch condition for previously seen attacks. Moreover, fine-tuning using just a small amount of data enables the network to deal with unseen attacks. |
Tasks | Face Swapping, Multi-Task Learning |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06876v1 |
https://arxiv.org/pdf/1906.06876v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-detecting-and |
Repo | https://github.com/nii-yamagishilab/ClassNSeg |
Framework | pytorch |
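The Y-shaped design above combines three objectives: classification from the encoder activation, segmentation from one decoder branch, and reconstruction from the other. A hedged sketch of such a joint loss — the specific loss forms and weights are assumptions, not the paper's exact choices:

```python
import numpy as np

def multitask_loss(cls_logit, label, seg_pred, seg_mask, recon, image,
                   weights=(1.0, 1.0, 1.0)):
    """Joint objective sketch for a shared encoder with a Y-shaped
    decoder: binary real/fake classification + manipulation-mask
    segmentation + input reconstruction. Simple BCE/MSE terms stand in
    for whatever losses the paper actually uses."""
    p = 1.0 / (1.0 + np.exp(-cls_logit))           # sigmoid
    bce = -(label * np.log(p + 1e-12) + (1 - label) * np.log(1 - p + 1e-12))
    seg = np.mean((seg_pred - seg_mask) ** 2)      # stand-in for per-pixel CE
    rec = np.mean((recon - image) ** 2)
    return weights[0] * bce + weights[1] * seg + weights[2] * rec
```

Because all three terms backpropagate through the shared encoder, evidence gathered for one task (e.g. where a region was spliced) also shapes the features used by the other, which is the sharing effect the abstract credits for the improvement.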
DEMO-Net: Degree-specific Graph Neural Networks for Node and Graph Classification
Title | DEMO-Net: Degree-specific Graph Neural Networks for Node and Graph Classification |
Authors | Jun Wu, Jingrui He, Jiejun Xu |
Abstract | Graph data widely exist in many high-impact applications. Inspired by the success of deep learning in grid-structured data, graph neural network models have been proposed to learn powerful node-level or graph-level representation. However, most of the existing graph neural networks suffer from the following limitations: (1) there is limited analysis regarding the graph convolution properties, such as seed-oriented, degree-aware and order-free; (2) the node’s degree-specific graph structure is not explicitly expressed in graph convolution for distinguishing structure-aware node neighborhoods; (3) the theoretical explanation regarding the graph-level pooling schemes is unclear. To address these problems, we propose a generic degree-specific graph neural network named DEMO-Net, motivated by the Weisfeiler-Lehman graph isomorphism test that recursively identifies 1-hop neighborhood structures. In order to explicitly capture the graph topology integrated with node attributes, we argue that graph convolution should have three properties: seed-oriented, degree-aware, order-free. To this end, we propose multi-task graph convolution where each task represents node representation learning for nodes with a specific degree value, thus preserving the degree-specific graph structure. In particular, we design two multi-task learning methods: degree-specific weight and hashing functions for graph convolution. In addition, we propose a novel graph-level pooling/readout scheme for learning graph representation provably lying in a degree-specific Hilbert kernel space. The experimental results on several node and graph classification benchmark data sets demonstrate the effectiveness and efficiency of our proposed DEMO-Net over state-of-the-art graph neural network models. |
Tasks | Graph Classification, Multi-Task Learning, Representation Learning |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02319v1 |
https://arxiv.org/pdf/1906.02319v1.pdf | |
PWC | https://paperswithcode.com/paper/demo-net-degree-specific-graph-neural |
Repo | https://github.com/jwu4sml/DEMO-Net |
Framework | tf |
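The degree-specific weight variant mentioned in the abstract can be sketched in a few lines: aggregate neighbor features as usual, but pick the transformation matrix by node degree so that same-degree nodes share parameters. A hedged numpy illustration (the paper also proposes a hashing variant, not shown):

```python
import numpy as np

def degree_specific_conv(A, X, weights):
    """DEMO-Net-style sketch: aggregate 1-hop neighbor features, then
    transform each node with a weight matrix selected by its degree.

    A: (N, N) adjacency; X: (N, F) features; weights: {degree: (F, F_out)}."""
    deg = A.sum(axis=1).astype(int)
    agg = A @ X                                    # sum over 1-hop neighbors
    f_out = next(iter(weights.values())).shape[1]
    out = np.zeros((X.shape[0], f_out))
    for i, d in enumerate(deg):
        out[i] = agg[i] @ weights[d]               # degree-aware transform
    return out
```

Treating each degree value as its own task is what makes the convolution "degree-aware" while remaining order-free over each neighborhood.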
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
Title | FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning |
Authors | Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina |
Abstract | The computational demands of computer vision tasks based on state-of-the-art Convolutional Neural Network (CNN) image classification far exceed the energy budgets of mobile devices. This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part and the programmable part being learned on the target dataset. Experimental results demonstrate that FixyNN hardware can achieve very high energy efficiencies of up to 26.6 TOPS/W ($4.81 \times$ better than an iso-area programmable accelerator). Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of $<1\%$, resulting in up to 11.2 TOPS/W - nearly $2 \times$ more efficient than a conventional programmable CNN accelerator of the same area. |
Tasks | Image Classification, Transfer Learning |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.11128v1 |
http://arxiv.org/pdf/1902.11128v1.pdf | |
PWC | https://paperswithcode.com/paper/fixynn-efficient-hardware-for-mobile-computer |
Repo | https://github.com/ARM-software/DeepFreeze |
Framework | tf |
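FixyNN's split means only the dataset-specific part is ever trained, and the frozen extractor's outputs can be precomputed once per image. A softmax-regression head over fixed features illustrates that training loop (the actual programmable part is a CNN; this linear head is a deliberate simplification):

```python
import numpy as np

def train_head(features, labels, num_classes, lr=0.1, steps=200):
    """Train only the dataset-specific head on features produced by the
    fixed (frozen) extractor: plain softmax regression via gradient
    descent. features: (N, D); labels: (N,) integer classes."""
    W = np.zeros((features.shape[1], num_classes))
    Y = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        W -= lr * features.T @ (P - Y) / len(labels)
    return W
```

Freezing the shared front end is exactly what lets the hardware hard-wire those weights, trading flexibility for the energy efficiency the abstract reports.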
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
Title | ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks |
Authors | Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu |
Abstract | Recently, the channel attention mechanism has been demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for achieving better performance, which inevitably increases model complexity. To overcome the paradox of the performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via $1D$ convolution. Furthermore, we develop a method to adaptively select the kernel size of the $1D$ convolution, determining the coverage of local cross-channel interaction. The proposed ECA module is efficient yet effective; e.g., against a ResNet50 backbone, our module adds 80 parameters (vs. 24.37M) and 4.7e-4 GFLOPs (vs. 3.86 GFLOPs), while the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show our module is more efficient while performing favorably against its counterparts. |
Tasks | Dimensionality Reduction, Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03151v3 |
https://arxiv.org/pdf/1910.03151v3.pdf | |
PWC | https://paperswithcode.com/paper/eca-net-efficient-channel-attention-for-deep |
Repo | https://github.com/JinLi711/Convolution_Variants |
Framework | tf |
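The whole ECA pipeline is: global average pool, a k-tap 1-D convolution across the channel descriptor, sigmoid, rescale, with k chosen adaptively from the channel count. A numpy sketch where a fixed averaging kernel stands in for the learned 1-D filter (the kernel-size rule uses the paper's defaults gamma=2, b=1):

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA's adaptive 1-D kernel size: grows with log2(C), forced odd."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x, gamma=2, b=1):
    """ECA sketch. x: (N, C, H, W). An averaging kernel stands in for
    the learned k-tap 1-D convolution across channels."""
    n, c, h, w = x.shape
    k = eca_kernel_size(c, gamma, b)
    y = x.mean(axis=(2, 3))                        # (N, C) channel descriptor
    pad = k // 2
    yp = np.pad(y, ((0, 0), (pad, pad)), mode="edge")
    conv = np.stack([yp[:, i:i + c] for i in range(k)], axis=0).mean(axis=0)
    gate = 1.0 / (1.0 + np.exp(-conv))             # per-channel sigmoid gate
    return x * gate[:, :, None, None]
```

Only the k kernel taps are learned in the real module, which is how ECA reaches tens of parameters where an SE block needs thousands.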
Information-Theoretic Understanding of Population Risk Improvement with Model Compression
Title | Information-Theoretic Understanding of Population Risk Improvement with Model Compression |
Authors | Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal V. Veeravalli |
Abstract | We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We first prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted $K$-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions. |
Tasks | Model Compression |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09421v1 |
http://arxiv.org/pdf/1901.09421v1.pdf | |
PWC | https://paperswithcode.com/paper/information-theoretic-understanding-of |
Repo | https://github.com/wgao9/weight_quant |
Framework | pytorch |
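The Hessian-weighted K-means compression the abstract refers to clusters scalar weights with per-weight importance scores. A hedged 1-D sketch (plain weighted Lloyd iterations; the paper's proposed improvement additionally regularizes the distance between cluster centers, which is omitted here):

```python
import numpy as np

def weighted_kmeans_1d(w, importance, k, iters=20):
    """Importance-weighted k-means for weight quantization: cluster the
    scalar weights w with per-weight importance (e.g. the Hessian
    diagonal), so important weights pull the centers harder.
    Returns the centers and the quantized weights."""
    centers = np.quantile(w, np.linspace(0, 1, k))  # spread initial centers
    for _ in range(iters):
        idx = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            mask = idx == j
            if mask.any():
                centers[j] = np.average(w[mask], weights=importance[mask])
    return centers, centers[idx]
```

With uniform importance this reduces to ordinary k-means; a skewed importance vector visibly drags the nearest center toward the high-importance weights.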
Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization
Title | Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization |
Authors | Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan |
Abstract | We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data. |
Tasks | Intent Classification, Semantic Similarity, Semantic Textual Similarity, Sentence Embedding, Sentence Embeddings |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06731v2 |
https://arxiv.org/pdf/1909.06731v2.pdf | |
PWC | https://paperswithcode.com/paper/emu-enhancing-multilingual-sentence |
Repo | https://github.com/megagonlabs/emu |
Framework | none |
Query Learning Algorithm for Residual Symbolic Finite Automata
Title | Query Learning Algorithm for Residual Symbolic Finite Automata |
Authors | Kaizaburo Chubachi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara |
Abstract | We propose a query learning algorithm for residual symbolic finite automata (RSFAs). Symbolic finite automata (SFAs) are finite automata whose transitions are labeled by predicates over a Boolean algebra, in which a large collection of characters sharing the same transition may be represented by a single predicate. Residual finite automata (RFAs) are a special type of non-deterministic finite automata which can be exponentially smaller than the minimum deterministic finite automata and have a favorable property for learning algorithms. RSFAs have the properties of both SFAs and RFAs and can have a more succinct representation of transitions and fewer states than RFAs and deterministic SFAs accepting the same language. The implementation of our algorithm efficiently learns RSFAs over a huge alphabet and outperforms an existing learning algorithm for deterministic SFAs. The result also shows that the efficiency benefit of non-determinism is even larger in learning SFAs than in learning non-symbolic automata. |
Tasks | |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07417v3 |
https://arxiv.org/pdf/1902.07417v3.pdf | |
PWC | https://paperswithcode.com/paper/query-learning-algorithm-for-residual |
Repo | https://github.com/ushitora/RSFA-QueryLearning |
Framework | none |
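To make the object being learned concrete: in a symbolic automaton each transition carries a predicate covering a whole block of the alphabet, and residuality allows non-determinism. A membership-test sketch of such an automaton (this illustrates the data structure only, not the paper's query learning algorithm):

```python
def sfa_accepts(transitions, initial, accepting, word):
    """Membership test for a (possibly non-deterministic) symbolic finite
    automaton. transitions: list of (src, predicate, dst); a single
    predicate edge can stand for millions of characters, so huge
    alphabets need only a few edges."""
    states = {initial}                             # track all reachable states
    for ch in word:
        states = {dst for (src, pred, dst) in transitions
                  if src in states and pred(ch)}
    return bool(states & accepting)
```

For example, two predicate edges suffice for "any run of digits followed by a nonempty run of letters" over the entire Unicode alphabet.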