Paper Group AWR 118
Time-varying Autoregression with Low Rank Tensors. EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency. Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks. Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization. MedMentions: A Large Biomedical Corpus Annot …
Time-varying Autoregression with Low Rank Tensors
Title | Time-varying Autoregression with Low Rank Tensors |
Authors | Kameron Decker Harris, Aleksandr Aravkin, Rajesh Rao, Bingni Wen Brunton |
Abstract | We present a windowed technique to learn parsimonious time-varying autoregressive models from multivariate time series. This unsupervised method uncovers spatiotemporal structure in data via non-smooth and non-convex optimization. In each time window, we assume the data follow a linear model parameterized by a potentially different system matrix, and we model this stack of system matrices as a low rank tensor. Because of its structure, the model is scalable to high-dimensional data and can easily incorporate priors such as smoothness over time. We find the components of the tensor using alternating minimization and prove that any stationary point of this algorithm is a local minimum. In a test case, our method identifies the true rank of a switching linear system in the presence of noise. We illustrate our model’s utility and superior scalability over extant methods when applied to several synthetic and real examples, including a nonlinear dynamical system, worm behavior, sea surface temperature, and monkey brain recordings. |
Tasks | |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08389v1 |
https://arxiv.org/pdf/1905.08389v1.pdf | |
PWC | https://paperswithcode.com/paper/time-varying-autoregression-with-low-rank |
Repo | https://github.com/kharris/tvart |
Framework | none |
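The abstract's core construction, a stack of per-window system matrices constrained to be low rank, can be illustrated with a simplified numpy sketch. This is not the authors' alternating-minimization solver (see the repo above); it fits each window by least squares and then truncates the SVD of the unfolded stack, and the function name is hypothetical:

```python
import numpy as np

def windowed_ar_lowrank(X, window, rank):
    """Fit an AR(1) matrix per time window by least squares, then project
    the stack of matrices onto its top singular components (a crude
    stand-in for the paper's low-rank tensor model).

    X: (n, T) multivariate time series."""
    n, T = X.shape
    mats = []
    for start in range(0, T - window, window):
        Xp = X[:, start:start + window - 1]        # predictors x_t
        Xf = X[:, start + 1:start + window]        # targets x_{t+1}
        # Solve Xf ≈ A @ Xp in the least-squares sense.
        At, *_ = np.linalg.lstsq(Xp.T, Xf.T, rcond=None)
        mats.append(At.T)
    M = np.stack(mats)                             # (windows, n, n)
    U, s, Vt = np.linalg.svd(M.reshape(len(mats), -1), full_matrices=False)
    low = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # rank-truncated unfolding
    return low.reshape(M.shape)
```

On noise-free data from a single linear system, every window recovers the same matrix and a rank-1 truncation reproduces it exactly, mirroring the rank-identification experiment in the abstract.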
EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency
Title | EpO-Net: Exploiting Geometric Constraints on Dense Trajectories for Motion Saliency |
Authors | Muhammad Faisal, Ijaz Akhter, Mohsen Ali, Richard Hartley |
Abstract | The existing approaches for salient motion segmentation are unable to explicitly learn geometric cues and often give false detections on prominent static objects. We exploit multiview geometric constraints to avoid such shortcomings. To handle nonrigid backgrounds such as a sea, we also propose a robust fusion mechanism between motion and appearance-based features. We find dense trajectories, covering every pixel in the video, and propose trajectory-based epipolar distances to distinguish between background and foreground regions. Trajectory epipolar distances are data-independent and can be readily computed given a few feature correspondences between the images. We show that by combining epipolar distances with optical flow, a powerful motion network can be learned. To enable the network to leverage both of these features, we propose a simple mechanism we call input-dropout. Among motion-only networks, ours outperforms the previous state of the art on the DAVIS-2016 dataset by 5.2% in mean IoU. By robustly fusing our motion network with an appearance network using the input-dropout mechanism, we also outperform the previous methods on the DAVIS-2016, DAVIS-2017 and SegTrackv2 datasets. |
Tasks | Motion Segmentation, Optical Flow Estimation |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13258v2 |
https://arxiv.org/pdf/1909.13258v2.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-geometric-constraints-on-dense |
Repo | https://github.com/mfaisal59/EpONet |
Framework | pytorch |
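The trajectory-based epipolar distance the abstract relies on reduces, per point pair, to the distance of a point from its correspondence's epipolar line. A hedged numpy sketch (the paper aggregates these over dense trajectories; this shows only the per-correspondence distance, and the function name is an assumption):

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance of points x2 (image 2) from the epipolar lines F @ x1
    induced by their correspondences x1 (image 1).

    F: (3, 3) fundamental matrix; x1, x2: (N, 2) pixel coordinates."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])    # homogeneous coordinates
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    lines = h1 @ F.T                               # epipolar lines a*x + b*y + c = 0
    num = np.abs(np.sum(lines * h2, axis=1))       # |x2^T F x1|
    den = np.hypot(lines[:, 0], lines[:, 1])       # line normal length
    return num / den
```

A static background point scores near zero, while an independently moving foreground point violates the epipolar constraint and scores high, which is what makes the distance a useful motion-saliency cue.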
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks
Title | Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks |
Authors | Xiang Li, Xiaolin Hu, Jian Yang |
Abstract | Convolutional Neural Networks (CNNs) generate feature representations of complex objects by collecting hierarchical semantic sub-features. These sub-features can usually be distributed in grouped form in the feature vector of each layer, representing various semantic entities. However, the activation of these sub-features is often spatially affected by similar patterns and noisy backgrounds, resulting in erroneous localization and identification. We propose a Spatial Group-wise Enhance (SGE) module that can adjust the importance of each sub-feature by generating an attention factor for each spatial location in each semantic group, so that every individual group can autonomously enhance its learnt expression and suppress possible noise. The attention factors are guided only by the similarities between the global and local feature descriptors inside each group, so the SGE module is extremely lightweight, with \emph{almost no extra parameters and calculations}. Despite being trained with only category supervision, the SGE component is extremely effective in highlighting multiple active areas with various high-order semantics (such as a dog’s eyes, nose, etc.). When integrated with popular CNN backbones, SGE can significantly boost the performance of image recognition tasks. Specifically, with a ResNet50 backbone, SGE achieves a 1.2% Top-1 accuracy improvement on the ImageNet benchmark and a 1.0$\sim$2.0% AP gain on the COCO benchmark across a wide range of detectors (Faster/Mask/Cascade RCNN and RetinaNet). Code and pretrained models are available at https://github.com/implus/PytorchInsight. |
Tasks | Image Classification, Object Detection |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09646v2 |
https://arxiv.org/pdf/1905.09646v2.pdf | |
PWC | https://paperswithcode.com/paper/spatial-group-wise-enhance-improving-semantic |
Repo | https://github.com/implus/PytorchInsight |
Framework | pytorch |
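The SGE operation described above, a per-group spatial gate driven by the similarity between each position's feature and the group's global descriptor, is compact enough to sketch in numpy. The paper's learnable per-group scale and shift are fixed to 1 and 0 here, so this is an illustrative approximation rather than the released module:

```python
import numpy as np

def sge(x, groups=4, eps=1e-5):
    """Spatial Group-wise Enhance sketch. x: (N, C, H, W).
    The learnable per-group scale/shift are fixed to 1/0."""
    n, c, h, w = x.shape
    xg = x.reshape(n * groups, c // groups, h * w)
    g = xg.mean(axis=2, keepdims=True)             # global group descriptor
    attn = (g * xg).sum(axis=1)                    # similarity map, (N*G, H*W)
    attn = (attn - attn.mean(axis=1, keepdims=True)) / (
        attn.std(axis=1, keepdims=True) + eps)     # normalize per sample/group
    attn = 1.0 / (1.0 + np.exp(-attn))             # sigmoid gate in (0, 1)
    out = xg * attn[:, None, :]
    return out.reshape(n, c, h, w)
```

Because the gate depends only on dot products with a pooled descriptor, the module adds essentially no parameters, which matches the "almost no extra parameters and calculations" claim.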
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
Title | Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization |
Authors | Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder |
Abstract | Optimization of Binarized Neural Networks (BNNs) currently relies on real-valued latent weights to accumulate small update steps. In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead, their main role is to provide inertia during training. We interpret current methods in terms of inertia and provide novel insights into the optimization of BNNs. We subsequently introduce the first optimizer specifically designed for BNNs, Binary Optimizer (Bop), and demonstrate its performance on CIFAR-10 and ImageNet. Together, the redefinition of latent weights as inertia and the introduction of Bop enable a better understanding of BNN optimization and open the way for further improvements in training methodologies for BNNs. Code is available at: https://github.com/plumerai/rethinking-bnn-optimization |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02107v2 |
https://arxiv.org/pdf/1906.02107v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-weights-do-not-exist-rethinking |
Repo | https://github.com/larq/larq |
Framework | tf |
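The inertia view translates into a very small update rule. A hedged sketch of a Bop-style step, where a gradient moving average replaces the latent weight entirely (constants are illustrative, and the flip condition follows the paper's description as summarized above):

```python
import numpy as np

def bop_step(w, grad, m, gamma=1e-3, tau=1e-6):
    """Bop-style step for binary weights w in {-1, +1}: track an
    exponential moving average m of the gradient, and flip a weight only
    when m is both confident (|m| > tau) and pushing against w's sign.
    No real-valued latent weight is stored; m plays the inertia role."""
    m = (1 - gamma) * m + gamma * grad
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))
    return np.where(flip, -w, w), m
```

A persistent gradient eventually accumulates enough evidence in `m` to flip a weight, while transient gradient noise is absorbed, which is the inertia behavior the abstract describes.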
MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts
Title | MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts |
Authors | Sunil Mohan, Donghui Li |
Abstract | This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described. |
Tasks | Entity Linking, Named Entity Recognition |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09476v1 |
http://arxiv.org/pdf/1902.09476v1.pdf | |
PWC | https://paperswithcode.com/paper/medmentions-a-large-biomedical-corpus |
Repo | https://github.com/chanzuckerberg/MedMentions |
Framework | none |
City-GAN: Learning architectural styles using a custom Conditional GAN architecture
Title | City-GAN: Learning architectural styles using a custom Conditional GAN architecture |
Authors | Maximilian Bachl, Daniel C. Ferreira |
Abstract | Generative Adversarial Networks (GANs) are a well-known technique that is trained on samples (e.g. pictures of fruits) and, after training, is able to generate realistic new samples. Conditional GANs (CGANs) additionally provide label information for subclasses (e.g. apple, orange, pear), which enables the GAN to learn more easily and increases the quality of its output samples. We use GANs to learn architectural features of major cities and to generate images of buildings which do not exist. We show that currently available GAN and CGAN architectures are unsuited for this task; we therefore propose a custom architecture, demonstrate its superior performance on this task, and verify its capabilities with extensive experiments. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.05280v1 |
https://arxiv.org/pdf/1907.05280v1.pdf | |
PWC | https://paperswithcode.com/paper/city-gan-learning-architectural-styles-using |
Repo | https://github.com/muxamilian/city-gan |
Framework | none |
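As a minimal illustration of the CGAN conditioning the abstract contrasts with plain GANs: the standard trick is to append a one-hot class label (here, a city id) to the generator's noise vector, with the discriminator receiving the same label alongside the image. This sketch shows only the input construction, with hypothetical names:

```python
import numpy as np

def cgan_inputs(z, labels, num_classes):
    """Build a conditional generator input by concatenating latent noise
    z (N, latent_dim) with a one-hot encoding of each sample's integer
    class label (N,), so one generator can learn per-class styles."""
    onehot = np.eye(num_classes)[labels]           # (N, num_classes)
    return np.concatenate([z, onehot], axis=1)
```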
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
Title | Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck |
Authors | Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann |
Abstract | The ability of policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent’s policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL. In particular, we focus on regularization techniques relying on the injection of noise into the learned function, a family that includes some of the most widely used approaches such as Dropout and Batch Normalization. To adapt them to RL, we propose Selective Noise Injection (SNI), which maintains the regularizing effect of the injected noise while mitigating its adverse effects on gradient quality. Furthermore, we demonstrate that the Information Bottleneck (IB) is a particularly well suited regularization technique for RL as it is effective in the low-data regime encountered early on in training RL agents. Combining the IB with SNI, we significantly outperform current state of the art results, including on the recently proposed generalization benchmark CoinRun. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12911v1 |
https://arxiv.org/pdf/1910.12911v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-in-reinforcement-learning-with |
Repo | https://github.com/maximecb/gym-minigrid |
Framework | pytorch |
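The core SNI pattern — act with noise disabled, then update with a mixture of the noise-free and noise-injected gradients — can be sketched on a toy linear softmax policy. This is a deliberately simplified REINFORCE-style stand-in: the paper's full estimator adds importance weighting, and all names and constants here are assumptions:

```python
import numpy as np

def sni_update(W, s, action, advantage, lam=0.5, lr=0.1, p=0.5, rng=None):
    """Toy SNI sketch: the action was chosen by the noise-free policy,
    and the update mixes the policy gradient of the noise-free and the
    dropout-injected forward passes with weight lam."""
    rng = rng if rng is not None else np.random.default_rng(0)

    def pg(feats):                                 # REINFORCE gradient wrt W
        logits = W @ feats
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        g = -probs
        g[action] += 1.0                           # d log pi(action|s) / d logits
        return advantage * np.outer(g, feats)

    mask = (rng.random(s.shape) < p) / p           # inverted-dropout noise
    return W + lr * (lam * pg(s) + (1 - lam) * pg(s * mask))
```

Setting `lam=1` recovers the plain noise-free gradient; `lam=0` trains purely on the noisy pass, the regime whose gradient-quality problems SNI is designed to mitigate.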
DiCENet: Dimension-wise Convolutions for Efficient Networks
Title | DiCENet: Dimension-wise Convolutions for Efficient Networks |
Authors | Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari |
Abstract | We introduce a novel and generic convolutional unit, DiCE unit, that is built using dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor while dimension-wise fusion efficiently combines these dimension-wise representations; allowing the DiCE unit to efficiently encode spatial and channel-wise information contained in the input tensor. The DiCE unit is simple and can be easily plugged into any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks including image classification, object detection, and semantic segmentation. On the ImageNet dataset, the DiCENet delivers either the same or better performance than existing models with fewer floating-point operations (FLOPs). Notably, for a network size of about 70 MFLOPs, DiCENet outperforms the state-of-the-art neural search architecture, MNASNet, by 4% on the ImageNet dataset. Our code is open source and available at \url{https://github.com/sacmehta/EdgeNets} |
Tasks | Image Classification, Neural Architecture Search, Object Detection, Real-Time Object Detection, Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2019-06-08 |
URL | https://arxiv.org/abs/1906.03516v2 |
https://arxiv.org/pdf/1906.03516v2.pdf | |
PWC | https://paperswithcode.com/paper/dicenet-dimension-wise-convolutions-for |
Repo | https://github.com/adichaloo/EdgeNet |
Framework | pytorch |
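Dimension-wise convolution, as described, filters the input tensor along each of its dimensions separately before fusing the responses. A numpy sketch with a fixed smoothing kernel and average fusion (the real DiCE unit learns both the filters and the fusion; names here are illustrative):

```python
import numpy as np

def conv1d_along(x, kernel, axis):
    """'Same'-padded 1-D cross-correlation of tensor x along one axis."""
    x = np.moveaxis(x, axis, -1)
    pad = len(kernel) // 2
    xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(pad, pad)])
    out = sum(w * xp[..., i:i + x.shape[-1]] for i, w in enumerate(kernel))
    return np.moveaxis(out, -1, axis)

def dice_unit(x, kernel=(0.25, 0.5, 0.25)):
    """DiCE-unit sketch: filter a (C, H, W) tensor independently along
    the channel, height and width dimensions, then fuse the three
    responses by averaging (standing in for the learned fusion)."""
    responses = [conv1d_along(x, np.asarray(kernel), axis) for axis in range(3)]
    return np.mean(responses, axis=0)
```

Each pass touches only one dimension at a time, which is why the unit's cost stays close to that of depth-wise separable convolutions while still mixing spatial and channel information.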
Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos
Title | Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos |
Authors | Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen |
Abstract | Detecting manipulated images and videos is an important topic in digital media forensics. Most detection methods use binary classification to determine the probability of a query being manipulated. Another important topic is locating manipulated regions (i.e., performing segmentation), which are mostly created by three commonly used attacks: removal, copy-move, and splicing. We have designed a convolutional neural network that uses a multi-task learning approach to simultaneously detect manipulated images and videos and locate the manipulated regions for each query. Information gained by performing one task is shared with the other task, thereby enhancing the performance of both. A semi-supervised learning approach is used to improve the network’s generalizability. The network includes an encoder and a Y-shaped decoder. Activation of the encoded features is used for the binary classification. The output of one branch of the decoder is used for segmenting the manipulated regions while that of the other branch is used for reconstructing the input, which helps improve overall performance. Experiments using the FaceForensics and FaceForensics++ databases demonstrated the network’s effectiveness against facial reenactment attacks and face swapping attacks as well as its ability to deal with the mismatch condition for previously seen attacks. Moreover, fine-tuning using just a small amount of data enables the network to deal with unseen attacks. |
Tasks | Face Swapping, Multi-Task Learning |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06876v1 |
https://arxiv.org/pdf/1906.06876v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-detecting-and |
Repo | https://github.com/nii-yamagishilab/ClassNSeg |
Framework | pytorch |
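The Y-shaped design above combines three objectives: classification from the encoder activation, segmentation from one decoder branch, and reconstruction from the other. A hedged sketch of such a joint loss — the specific loss forms and weights are assumptions, not the paper's exact choices:

```python
import numpy as np

def multitask_loss(cls_logit, label, seg_pred, seg_mask, recon, image,
                   weights=(1.0, 1.0, 1.0)):
    """Joint objective sketch for a shared encoder with a Y-shaped
    decoder: binary real/fake classification + manipulation-mask
    segmentation + input reconstruction. Simple BCE/MSE terms stand in
    for whatever losses the paper actually uses."""
    p = 1.0 / (1.0 + np.exp(-cls_logit))           # sigmoid
    bce = -(label * np.log(p + 1e-12) + (1 - label) * np.log(1 - p + 1e-12))
    seg = np.mean((seg_pred - seg_mask) ** 2)      # stand-in for per-pixel CE
    rec = np.mean((recon - image) ** 2)
    return weights[0] * bce + weights[1] * seg + weights[2] * rec
```

Because all three terms backpropagate through the shared encoder, evidence gathered for one task (e.g. where a region was spliced) also shapes the features used by the other, which is the sharing effect the abstract credits for the improvement.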
DEMO-Net: Degree-specific Graph Neural Networks for Node and Graph Classification
Title | DEMO-Net: Degree-specific Graph Neural Networks for Node and Graph Classification |
Authors | Jun Wu, Jingrui He, Jiejun Xu |
Abstract | Graph data widely exist in many high-impact applications. Inspired by the success of deep learning in grid-structured data, graph neural network models have been proposed to learn powerful node-level or graph-level representation. However, most of the existing graph neural networks suffer from the following limitations: (1) there is limited analysis regarding the graph convolution properties, such as seed-oriented, degree-aware and order-free; (2) the node’s degree-specific graph structure is not explicitly expressed in graph convolution for distinguishing structure-aware node neighborhoods; (3) the theoretical explanation regarding the graph-level pooling schemes is unclear. To address these problems, we propose a generic degree-specific graph neural network named DEMO-Net, motivated by the Weisfeiler-Lehman graph isomorphism test that recursively identifies 1-hop neighborhood structures. In order to explicitly capture the graph topology integrated with node attributes, we argue that graph convolution should have three properties: seed-oriented, degree-aware, order-free. To this end, we propose multi-task graph convolution where each task represents node representation learning for nodes with a specific degree value, thus preserving the degree-specific graph structure. In particular, we design two multi-task learning methods: degree-specific weight and hashing functions for graph convolution. In addition, we propose a novel graph-level pooling/readout scheme for learning graph representation provably lying in a degree-specific Hilbert kernel space. The experimental results on several node and graph classification benchmark data sets demonstrate the effectiveness and efficiency of our proposed DEMO-Net over state-of-the-art graph neural network models. |
Tasks | Graph Classification, Multi-Task Learning, Representation Learning |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02319v1 |
https://arxiv.org/pdf/1906.02319v1.pdf | |
PWC | https://paperswithcode.com/paper/demo-net-degree-specific-graph-neural |
Repo | https://github.com/jwu4sml/DEMO-Net |
Framework | tf |
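The degree-specific weight variant mentioned in the abstract can be sketched in a few lines: aggregate neighbor features as usual, but pick the transformation matrix by node degree so that same-degree nodes share parameters. A hedged numpy illustration (the paper also proposes a hashing variant, not shown):

```python
import numpy as np

def degree_specific_conv(A, X, weights):
    """DEMO-Net-style sketch: aggregate 1-hop neighbor features, then
    transform each node with a weight matrix selected by its degree.

    A: (N, N) adjacency; X: (N, F) features; weights: {degree: (F, F_out)}."""
    deg = A.sum(axis=1).astype(int)
    agg = A @ X                                    # sum over 1-hop neighbors
    f_out = next(iter(weights.values())).shape[1]
    out = np.zeros((X.shape[0], f_out))
    for i, d in enumerate(deg):
        out[i] = agg[i] @ weights[d]               # degree-aware transform
    return out
```

Treating each degree value as its own task is what makes the convolution "degree-aware" while remaining order-free over each neighborhood.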
FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning
Title | FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning |
Authors | Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina |
Abstract | The computational demands of computer vision tasks based on state-of-the-art Convolutional Neural Network (CNN) image classification far exceed the energy budgets of mobile devices. This paper proposes FixyNN, which consists of a fixed-weight feature extractor that generates ubiquitous CNN features and a conventional programmable CNN accelerator which processes a dataset-specific CNN. Image classification models for FixyNN are trained end-to-end via transfer learning, with the common feature extractor representing the transferred part and the programmable part being learned on the target dataset. Experimental results demonstrate that FixyNN hardware can achieve very high energy efficiencies of up to 26.6 TOPS/W ($4.81 \times$ better than an iso-area programmable accelerator). Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of $<1\%$, resulting in up to 11.2 TOPS/W - nearly $2 \times$ more efficient than a conventional programmable CNN accelerator of the same area. |
Tasks | Image Classification, Transfer Learning |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.11128v1 |
http://arxiv.org/pdf/1902.11128v1.pdf | |
PWC | https://paperswithcode.com/paper/fixynn-efficient-hardware-for-mobile-computer |
Repo | https://github.com/ARM-software/DeepFreeze |
Framework | tf |
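FixyNN's split means only the dataset-specific part is ever trained, and the frozen extractor's outputs can be precomputed once per image. A softmax-regression head over fixed features illustrates that training loop (the actual programmable part is a CNN; this linear head is a deliberate simplification):

```python
import numpy as np

def train_head(features, labels, num_classes, lr=0.1, steps=200):
    """Train only the dataset-specific head on features produced by the
    fixed (frozen) extractor: plain softmax regression via gradient
    descent. features: (N, D); labels: (N,) integer classes."""
    W = np.zeros((features.shape[1], num_classes))
    Y = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        W -= lr * features.T @ (P - Y) / len(labels)
    return W
```

Freezing the shared front end is exactly what lets the hardware hard-wire those weights, trading flexibility for the energy efficiency the abstract reports.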
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
Title | ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks |
Authors | Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, Wangmeng Zuo, Qinghua Hu |
Abstract | Recently, the channel attention mechanism has been demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for achieving better performance, which inevitably increases model complexity. To overcome the paradox of the performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via $1D$ convolution. Furthermore, we develop a method to adaptively select the kernel size of the $1D$ convolution, determining the coverage of local cross-channel interaction. The proposed ECA module is efficient yet effective; e.g., against a ResNet50 backbone, our module adds 80 parameters (vs. 24.37M) and 4.7e-4 GFLOPs (vs. 3.86 GFLOPs), while the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show our module is more efficient while performing favorably against its counterparts. |
Tasks | Dimensionality Reduction, Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03151v3 |
https://arxiv.org/pdf/1910.03151v3.pdf | |
PWC | https://paperswithcode.com/paper/eca-net-efficient-channel-attention-for-deep |
Repo | https://github.com/JinLi711/Convolution_Variants |
Framework | tf |
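The whole ECA pipeline is: global average pool, a k-tap 1-D convolution across the channel descriptor, sigmoid, rescale, with k chosen adaptively from the channel count. A numpy sketch where a fixed averaging kernel stands in for the learned 1-D filter (the kernel-size rule uses the paper's defaults gamma=2, b=1):

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA's adaptive 1-D kernel size: grows with log2(C), forced odd."""
    t = int(abs((np.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(x, gamma=2, b=1):
    """ECA sketch. x: (N, C, H, W). An averaging kernel stands in for
    the learned k-tap 1-D convolution across channels."""
    n, c, h, w = x.shape
    k = eca_kernel_size(c, gamma, b)
    y = x.mean(axis=(2, 3))                        # (N, C) channel descriptor
    pad = k // 2
    yp = np.pad(y, ((0, 0), (pad, pad)), mode="edge")
    conv = np.stack([yp[:, i:i + c] for i in range(k)], axis=0).mean(axis=0)
    gate = 1.0 / (1.0 + np.exp(-conv))             # per-channel sigmoid gate
    return x * gate[:, :, None, None]
```

Only the k kernel taps are learned in the real module, which is how ECA reaches tens of parameters where an SE block needs thousands.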
Information-Theoretic Understanding of Population Risk Improvement with Model Compression
Title | Information-Theoretic Understanding of Population Risk Improvement with Model Compression |
Authors | Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal V. Veeravalli |
Abstract | We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We first prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the Hessian-weighted $K$-means clustering compression approach can be improved by regularizing the distance between the clustering centers. We provide experiments with neural networks to support our theoretical assertions. |
Tasks | Model Compression |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09421v1 |
http://arxiv.org/pdf/1901.09421v1.pdf | |
PWC | https://paperswithcode.com/paper/information-theoretic-understanding-of |
Repo | https://github.com/wgao9/weight_quant |
Framework | pytorch |
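The Hessian-weighted K-means compression the abstract refers to clusters scalar weights with per-weight importance scores. A hedged 1-D sketch (plain weighted Lloyd iterations; the paper's proposed improvement additionally regularizes the distance between cluster centers, which is omitted here):

```python
import numpy as np

def weighted_kmeans_1d(w, importance, k, iters=20):
    """Importance-weighted k-means for weight quantization: cluster the
    scalar weights w with per-weight importance (e.g. the Hessian
    diagonal), so important weights pull the centers harder.
    Returns the centers and the quantized weights."""
    centers = np.quantile(w, np.linspace(0, 1, k))  # spread initial centers
    for _ in range(iters):
        idx = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            mask = idx == j
            if mask.any():
                centers[j] = np.average(w[mask], weights=importance[mask])
    return centers, centers[idx]
```

With uniform importance this reduces to ordinary k-means; a skewed importance vector visibly drags the nearest center toward the high-importance weights.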
Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization
Title | Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization |
Authors | Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan |
Abstract | We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The semantic classifier improves the semantic similarity of related sentences, whereas the language discriminator enhances the multilinguality of the embeddings via multilingual adversarial training. Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data. |
Tasks | Intent Classification, Semantic Similarity, Semantic Textual Similarity, Sentence Embedding, Sentence Embeddings |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06731v2 |
https://arxiv.org/pdf/1909.06731v2.pdf | |
PWC | https://paperswithcode.com/paper/emu-enhancing-multilingual-sentence |
Repo | https://github.com/megagonlabs/emu |
Framework | none |
Query Learning Algorithm for Residual Symbolic Finite Automata
Title | Query Learning Algorithm for Residual Symbolic Finite Automata |
Authors | Kaizaburo Chubachi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara |
Abstract | We propose a query learning algorithm for residual symbolic finite automata (RSFAs). Symbolic finite automata (SFAs) are finite automata whose transitions are labeled by predicates over a Boolean algebra, in which a large collection of characters sharing the same transition may be represented by a single predicate. Residual finite automata (RFAs) are a special type of non-deterministic finite automata which can be exponentially smaller than the minimum deterministic finite automata and have a favorable property for learning algorithms. RSFAs have the properties of both SFAs and RFAs and can have a more succinct representation of transitions and fewer states than RFAs and deterministic SFAs accepting the same language. The implementation of our algorithm efficiently learns RSFAs over a huge alphabet and outperforms an existing learning algorithm for deterministic SFAs. The result also shows that the efficiency benefit of non-determinism is even larger in learning SFAs than in learning non-symbolic automata. |
Tasks | |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07417v3 |
https://arxiv.org/pdf/1902.07417v3.pdf | |
PWC | https://paperswithcode.com/paper/query-learning-algorithm-for-residual |
Repo | https://github.com/ushitora/RSFA-QueryLearning |
Framework | none |
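To make the object being learned concrete: in a symbolic automaton each transition carries a predicate covering a whole block of the alphabet, and residuality allows non-determinism. A membership-test sketch of such an automaton (this illustrates the data structure only, not the paper's query learning algorithm):

```python
def sfa_accepts(transitions, initial, accepting, word):
    """Membership test for a (possibly non-deterministic) symbolic finite
    automaton. transitions: list of (src, predicate, dst); a single
    predicate edge can stand for millions of characters, so huge
    alphabets need only a few edges."""
    states = {initial}                             # track all reachable states
    for ch in word:
        states = {dst for (src, pred, dst) in transitions
                  if src in states and pred(ch)}
    return bool(states & accepting)
```

For example, two predicate edges suffice for "any run of digits followed by a nonempty run of letters" over the entire Unicode alphabet.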