October 21, 2019

Paper Group AWR 16

Novel Prediction Techniques Based on Clusterwise Linear Regression

Title Novel Prediction Techniques Based on Clusterwise Linear Regression
Authors Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski
Abstract In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle to using the fitted regression models for prediction on unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of the target variable are unknown. In this paper we propose two novel approaches to solving this problem. The first approach, predictive CLR, builds a separate classification model to predict test CLR labels. The second approach, constrained CLR, utilizes a set of user-specified constraints that force certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while incurring only a $\approx 20$ times increase in computational time over linear regression.
Tasks
Published 2018-04-28
URL http://arxiv.org/abs/1804.10742v1
PDF http://arxiv.org/pdf/1804.10742v1.pdf
PWC https://paperswithcode.com/paper/novel-prediction-techniques-based-on
Repo https://github.com/Kipok/clr_prediction
Framework none
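
The CLR objective described in the abstract can be illustrated with a small alternating procedure: fit one linear regression per cluster, then reassign each point to the cluster whose regression gives the smallest squared residual. The following is only a minimal sketch under assumptions, not the authors' implementation; the function names `fit_clr` and `predict_with_labels`, the random initialization, and the stopping rule are hypothetical choices made for illustration.

```python
import numpy as np


def fit_clr(X, y, k, n_iter=20, seed=0):
    """Alternate between per-cluster least squares and residual-based reassignment.

    Hypothetical helper for illustration; returns (coefs, labels), where
    coefs[c] holds the weights and intercept of cluster c.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])      # append an intercept column
    labels = rng.integers(0, k, size=n)       # assumption: random initial partition
    coefs = np.zeros((k, Xb.shape[1]))
    for _ in range(n_iter):
        # Step 1: refit a linear regression inside every cluster.
        for c in range(k):
            mask = labels == c
            if mask.sum() >= Xb.shape[1]:     # skip clusters that are too small to fit
                coefs[c] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        # Step 2: move each point to the cluster with the smallest squared residual.
        residuals = (Xb @ coefs.T - y[:, None]) ** 2
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return coefs, labels


def predict_with_labels(X, coefs, labels):
    """Predict with the regression of each point's cluster (labels must be known)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sum(Xb * coefs[labels], axis=1)
```

Note that `predict_with_labels` needs a cluster label for every point. Step 2 above uses the target `y` to pick labels, which is exactly what is unavailable for unseen test points and is the gap the two approaches in the abstract are meant to close.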

Multitask Learning for Fundamental Frequency Estimation in Music

Title Multitask Learning for Fundamental Frequency Estimation in Music
Authors Rachel M. Bittner, Brian McFee, Juan P. Bello
Abstract Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line estimation, and is trained using a large, semi-automatically annotated dataset. We show that the multitask model outperforms its single-task counterparts, and explore the effect of various design decisions in our approach, and show that it performs better or at least competitively when compared against strong baseline methods.
Tasks
Published 2018-09-02
URL http://arxiv.org/abs/1809.00381v1
PDF http://arxiv.org/pdf/1809.00381v1.pdf
PWC https://paperswithcode.com/paper/multitask-learning-for-fundamental-frequency
Repo https://github.com/rabitt/multitask-f0
Framework none

QuaterNet: A Quaternion-based Recurrent Model for Human Motion

Title QuaterNet: A Quaternion-based Recurrent Model for Human Motion
Authors Dario Pavllo, David Grangier, Michael Auli
Abstract Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. Our recurrent network, QuaterNet, represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. On short-term predictions, QuaterNet improves the state-of-the-art quantitatively. For long-term generation, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature.
Tasks 3D Human Pose Estimation, Motion Estimation
Published 2018-05-16
URL http://arxiv.org/abs/1805.06485v2
PDF http://arxiv.org/pdf/1805.06485v2.pdf
PWC https://paperswithcode.com/paper/quaternet-a-quaternion-based-recurrent-model
Repo https://github.com/facebookresearch/QuaterNet
Framework pytorch
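
To make the idea of a position-based loss concrete, the sketch below rotates bone offsets with unit quaternions, runs forward kinematics along a simple chain, and compares joint positions instead of angles. It is a simplified illustration under assumptions, not the QuaterNet code: the chain topology (each joint is the child of the previous one), the `(w, x, y, z)` storage order, the mean Euclidean distance, and all function names are hypothetical.

```python
import numpy as np


def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, av = a[0], a[1:]
    bw, bv = b[0], b[1:]
    w = aw * bw - np.dot(av, bv)
    v = aw * bv + bw * av + np.cross(av, bv)
    return np.concatenate(([w], v))


def quat_rotate(q, v):
    """Rotate the 3D vector v by quaternion q (normalized here for safety)."""
    q = q / np.linalg.norm(q)
    w, u = q[0], q[1:]
    t = 2.0 * np.cross(u, v)
    return v + w * t + np.cross(u, t)


def forward_kinematics(local_rotations, offsets):
    """Joint positions of a chain rooted at the origin.

    local_rotations[i] is the rotation of joint i relative to its parent and
    offsets[i] is the rest-pose bone vector from joint i to joint i + 1.
    """
    position = np.zeros(3)
    rotation = np.array([1.0, 0.0, 0.0, 0.0])  # identity quaternion
    positions = [position]
    for q, offset in zip(local_rotations, offsets):
        rotation = quat_mul(rotation, q)        # accumulate rotations down the chain
        position = position + quat_rotate(rotation, offset)
        positions.append(position)
    return np.stack(positions)


def position_loss(pred_rotations, target_rotations, offsets):
    """Mean Euclidean distance between predicted and target joint positions."""
    pred = forward_kinematics(pred_rotations, offsets)
    target = forward_kinematics(target_rotations, offsets)
    return np.mean(np.linalg.norm(pred - target, axis=1))
```

Because bone lengths come from the fixed `offsets`, the positions produced this way cannot stretch bones, while an error in a joint near the root is penalized through every joint further down the chain.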

Quantizing deep convolutional networks for efficient inference: A whitepaper

Title Quantizing deep convolutional networks for efficient inference: A whitepaper
Authors Raghuraman Krishnamoorthi
Abstract We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8-bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks. We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
Tasks Quantization
Published 2018-06-21
URL http://arxiv.org/abs/1806.08342v1
PDF http://arxiv.org/pdf/1806.08342v1.pdf
PWC https://paperswithcode.com/paper/quantizing-deep-convolutional-networks-for
Repo https://github.com/li-weihua/notes
Framework none
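
As a rough illustration of post-training per-channel weight quantization, the sketch below maps each output channel of a float32 weight tensor to int8 with its own scale. It assumes a symmetric scheme (zero point fixed at 0) and that output channels lie on axis 0; both are assumptions made for this example, and the helper names are hypothetical rather than taken from the TensorFlow tooling mentioned in the abstract.

```python
import numpy as np


def quantize_per_channel(w, num_bits=8):
    """Symmetric per-channel quantization along axis 0 (assumed output-channel axis)."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for 8 bits
    max_abs = np.abs(w.reshape(w.shape[0], -1)).max(axis=1)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)  # one scale per channel
    shape = (-1,) + (1,) * (w.ndim - 1)                 # broadcast scale over the channel
    q = np.clip(np.round(w / scale.reshape(shape)), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to floating point using the per-channel scales."""
    shape = (-1,) + (1,) * (q.ndim - 1)
    return q.astype(np.float32) * scale.reshape(shape)
```

Storing `q` as int8 instead of float32 is where the factor-of-4 size reduction comes from, and the rounding error in each channel is bounded by half of that channel's scale, which is why a channel with small weights is not penalized by a large weight elsewhere in the tensor.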

Deep Reinforcement Learning for Event-Triggered Control

Title Deep Reinforcement Learning for Event-Triggered Control
Authors Dominik Baumann, Jia-Jie Zhu, Georg Martius, Sebastian Trimpe
Abstract Event-triggered control (ETC) methods can achieve high-performance control with a significantly lower number of samples compared to usual, time-triggered methods. These frameworks are often based on a mathematical model of the system and specific designs of controller and event trigger. In this paper, we show how deep reinforcement learning (DRL) algorithms can be leveraged to simultaneously learn control and communication behavior from scratch, and present a DRL approach that is particularly suitable for ETC. To our knowledge, this is the first work to apply DRL to ETC. We validate the approach on multiple control tasks and compare it to model-based event-triggering frameworks. In particular, we demonstrate that it can, unlike many model-based ETC designs, be straightforwardly applied to nonlinear systems.
Tasks
Published 2018-09-13
URL http://arxiv.org/abs/1809.05152v1
PDF http://arxiv.org/pdf/1809.05152v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-event
Repo https://github.com/jj-zhu/resource_aware_control_rl
Framework none

RTSeg: Real-time Semantic Segmentation Comparative Study

Title RTSeg: Real-time Semantic Segmentation Comparative Study
Authors Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand
Abstract Semantic segmentation benefits robotics-related applications, especially autonomous driving. Most of the research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. The few works conducted in this direction do not provide principled methods to evaluate the different design choices for segmentation. In this paper, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods. The framework comprises different network architectures for feature extraction such as VGG16, Resnet18, MobileNet, and ShuffleNet. It also comprises multiple meta-architectures for segmentation that define the decoding methodology. These include SkipNet, UNet, and Dilation Frontend. Experimental results are presented on the Cityscapes dataset for urban scenes. The modular design allows novel architectures to emerge that lead to a 143x GFLOPs reduction in comparison to SegNet. This benchmarking framework is publicly available at https://github.com/MSiam/TFSegmentation.
Tasks Autonomous Driving, Real-Time Semantic Segmentation, Semantic Segmentation
Published 2018-03-07
URL https://arxiv.org/abs/1803.02758v4
PDF https://arxiv.org/pdf/1803.02758v4.pdf
PWC https://paperswithcode.com/paper/rtseg-real-time-semantic-segmentation
Repo https://github.com/Davidnet/TFSegmentation
Framework tf

Image Inpainting via Generative Multi-column Convolutional Neural Networks

Title Image Inpainting via Generative Multi-column Convolutional Neural Networks
Authors Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia
Abstract In this paper, we propose a generative multi-column network for image inpainting. This network synthesizes different image components in a parallel manner within one stage. To better characterize global structures, we design a confidence-driven reconstruction loss while an implicit diversified MRF regularization is adopted to enhance local details. The multi-column network combined with the reconstruction and MRF loss propagates local and global information derived from context to the target inpainting regions. Extensive experiments on challenging street view, face, natural objects and scenes show that our method produces visually compelling results even without previously common post-processing.
Tasks Image Inpainting
Published 2018-10-20
URL http://arxiv.org/abs/1810.08771v1
PDF http://arxiv.org/pdf/1810.08771v1.pdf
PWC https://paperswithcode.com/paper/image-inpainting-via-generative-multi-column
Repo https://github.com/tlatkowski/inpainting-gmcnn-keras
Framework tf

Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction

Title Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction
Authors Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Costas Bekas, Alpha A Lee
Abstract Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
Tasks Chemical Reaction Prediction, Machine Translation
Published 2018-11-06
URL https://arxiv.org/abs/1811.02633v2
PDF https://arxiv.org/pdf/1811.02633v2.pdf
PWC https://paperswithcode.com/paper/molecular-transformer-for-chemical-reaction
Repo https://github.com/pschwllr/MolecularTransformer
Framework pytorch

Position-aware Self-attention with Relative Positional Encodings for Slot Filling

Title Position-aware Self-attention with Relative Positional Encodings for Slot Filling
Authors Ivan Bilan, Benjamin Roth
Abstract This paper describes how to apply self-attention with relative positional encodings to the task of relation extraction. We propose to use the self-attention encoder layer together with an additional position-aware attention layer that takes into account positions of the query and the object in the sentence. The self-attention encoder also uses a custom implementation of relative positional encodings which allow each word in the sentence to take into account its left and right context. The evaluation of the model is done on the TACRED dataset. The proposed model relies only on attention (no recurrent or convolutional layers are used), while improving performance w.r.t. the previous state of the art.
Tasks Relation Extraction, Slot Filling
Published 2018-07-09
URL http://arxiv.org/abs/1807.03052v1
PDF http://arxiv.org/pdf/1807.03052v1.pdf
PWC https://paperswithcode.com/paper/position-aware-self-attention-with-relative
Repo https://github.com/ivan-bilan/tac-self-attention
Framework pytorch
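
The abstract mentions a custom implementation of relative positional encodings without giving details. As a generic illustration only, the sketch below builds the index matrix used by one common scheme, in which the signed distance between two tokens is clipped to a maximum and then used to look up a learned embedding. The clipping rule, the function names, and the randomly filled embedding table in the usage note are assumptions for this example and are not taken from the paper's code.

```python
import numpy as np


def relative_position_index(seq_len, max_distance):
    """Index matrix idx[i, j] for the clipped signed distance j - i.

    Distances are clipped to [-max_distance, max_distance] and shifted so the
    result is a valid row index into a table with 2 * max_distance + 1 rows.
    """
    positions = np.arange(seq_len)
    rel = positions[None, :] - positions[:, None]   # rel[i, j] = j - i
    return np.clip(rel, -max_distance, max_distance) + max_distance


def lookup_relative_embeddings(table, seq_len):
    """Gather one embedding per (query, key) pair from a (2k + 1, d) table."""
    max_distance = (table.shape[0] - 1) // 2
    return table[relative_position_index(seq_len, max_distance)]
```

For a sentence of 6 tokens and a table of shape `(5, 16)`, `lookup_relative_embeddings` returns an array of shape `(6, 6, 16)`: tokens to the left and to the right of a word receive different embeddings, which is how each word can distinguish its left context from its right context.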

MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip

Title MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip
Authors Somdip Dey, Grigorios Kalliatakis, Sangeet Saha, Amit Kumar Singh, Shoaib Ehsan, Klaus McDonald-Maier
Abstract Intelligent Transportation Systems (ITS) have become an important pillar of the modern “smart city” framework, which demands intelligent involvement of machines. Traffic load recognition can be categorized as an important and challenging issue for such systems. Recently, Convolutional Neural Network (CNN) models have drawn a considerable amount of interest in many areas, such as weather classification and human rights violation detection through images, due to their accurate prediction capabilities. This work tackles the real-life traffic load recognition problem on a System-On-a-Programmable-Chip (SOPC) platform and names the approach MAT-CNN-SOPC, which uses an intelligent re-training mechanism of the CNN with known environments. The proposed methodology is capable of enhancing the efficacy of the approach by 2.44x in comparison to the state of the art, as shown through experimental analysis. We also introduce a mathematical equation that quantifies the suitability of one CNN model over another for a particular application-based implementation.
Tasks
Published 2018-07-05
URL http://arxiv.org/abs/1807.02098v2
PDF http://arxiv.org/pdf/1807.02098v2.pdf
PWC https://paperswithcode.com/paper/mat-cnn-sopc-motionless-analysis-of-traffic
Repo https://github.com/somdipdey/MAT-CNN-SOPC
Framework none

Universal Sentence Encoder

Title Universal Sentence Encoder
Authors Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
Abstract We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
Tasks Semantic Textual Similarity, Sentence Embeddings, Sentiment Analysis, Subjectivity Analysis, Text Classification, Transfer Learning, Word Embeddings
Published 2018-03-29
URL http://arxiv.org/abs/1803.11175v2
PDF http://arxiv.org/pdf/1803.11175v2.pdf
PWC https://paperswithcode.com/paper/universal-sentence-encoder
Repo https://github.com/facebookresearch/InferSent
Framework pytorch

Selective Refinement Network for High Performance Face Detection

Title Selective Refinement Network for High Performance Face Detection
Authors Cheng Chi, Shifeng Zhang, Junliang Xing, Zhen Lei, Stan Z. Li, Xudong Zou
Abstract High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. In particular, the SRN consists of two modules: the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module. The STC aims to filter out most simple negative anchors from low level detection layers to reduce the search space for the subsequent classifier, while the STR is designed to coarsely adjust the locations and sizes of anchors from high level detection layers to provide better initialization for the subsequent regressor. Moreover, we design a Receptive Field Enhancement (RFE) block to provide more diverse receptive fields, which helps to better capture faces in some extreme poses. As a consequence, the proposed SRN detector achieves state-of-the-art performance on all the widely used face detection benchmarks, including AFW, PASCAL face, FDDB, and WIDER FACE datasets. Codes will be released to facilitate further studies on the face detection problem.
Tasks Face Detection
Published 2018-09-07
URL http://arxiv.org/abs/1809.02693v1
PDF http://arxiv.org/pdf/1809.02693v1.pdf
PWC https://paperswithcode.com/paper/selective-refinement-network-for-high
Repo https://github.com/faridSam/srn
Framework pytorch

Shape Robust Text Detection with Progressive Scale Expansion Network

Title Shape Robust Text Detection with Progressive Scale Expansion Network
Authors Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang
Abstract The challenges of shape robust text detection lie in two aspects: 1) most existing quadrangular bounding box based detectors are difficult to locate texts with arbitrary shapes, which are hard to be enclosed perfectly in a rectangle; 2) most pixel-wise segmentation-based detectors may not separate the text instances that are very close to each other. To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. These predictions correspond to different 'kernels' produced by shrinking the original text instance into various scales. Consequently, the final detection can be conducted through our progressive scale expansion algorithm which gradually expands the kernels with minimal scales to the text instances with maximal and complete shapes. Due to the fact that there are large geometrical margins among these minimal kernels, our method is effective to distinguish the adjacent text instances and is robust to arbitrary shapes. The state-of-the-art results on ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the great effectiveness of PSENet. Notably, PSENet outperforms the previous best record by absolute 6.37% on the curve text dataset SCUT-CTW1500. Code will be available in https://github.com/whai362/PSENet.
Tasks Curved Text Detection, Scene Text Detection
Published 2018-06-07
URL http://arxiv.org/abs/1806.02559v1
PDF http://arxiv.org/pdf/1806.02559v1.pdf
PWC https://paperswithcode.com/paper/shape-robust-text-detection-with-progressive
Repo https://github.com/whai362/PSENet
Framework tf
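
The progressive scale expansion step described in the abstract can be sketched as a breadth-first search: connected components of the smallest kernel seed the instance labels, and the labels then grow into each larger kernel in turn. The code below is a minimal sketch under assumptions, not the released PSENet code; it assumes boolean kernel masks ordered from smallest to largest, 4-connectivity, and a first-come-first-served rule when two instances reach the same pixel. The function names are hypothetical.

```python
from collections import deque

import numpy as np

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def label_components(mask):
    """Label 4-connected components of a boolean mask with 1, 2, ... (0 = background)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                current += 1
                labels[r, c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel into each larger kernel."""
    labels = label_components(kernels[0])
    h, w = labels.shape
    for kernel in kernels[1:]:
        # Start the search from every pixel that already belongs to an instance.
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                # A pixel is claimed by whichever instance reaches it first.
                if 0 <= ny < h and 0 <= nx < w and kernel[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]
                    queue.append((ny, nx))
    return labels
```

Two text instances that touch in the largest mask stay separate as long as their smallest kernels are disjoint, since each pixel keeps the label of the first instance to reach it.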

Robust Face Detection via Learning Small Faces on Hard Images

Title Robust Face Detection via Learning Small Faces on Hard Images
Authors Zhishuai Zhang, Wei Shen, Siyuan Qiao, Yan Wang, Bo Wang, Alan Yuille
Abstract Recent anchor-based deep face detectors have achieved promising performance, but they are still struggling to detect hard faces, such as small, blurred and partially occluded faces. A reason is that they treat all images and faces equally, without putting more effort on hard ones; however, many training images only contain easy faces, which are less helpful to achieve better performance on hard images. In this paper, we propose that the robustness of a face detector against hard faces can be improved by learning small faces on hard images. Our intuitions are (1) hard images are the images which contain at least one hard face, thus they facilitate training robust face detectors; (2) most hard faces are small faces and other types of hard faces can be easily converted to small faces by shrinking. We build an anchor-based deep face detector, which only outputs a single feature map with small anchors, to specifically learn small faces and train it by a novel hard image mining strategy. Extensive experiments have been conducted on WIDER FACE, FDDB, Pascal Faces, and AFW datasets to show the effectiveness of our method. Our method achieves APs of 95.7, 94.9 and 89.7 on easy, medium and hard WIDER FACE val dataset respectively, which surpass the previous state of the art, especially on the hard subset. Code and model are available at https://github.com/bairdzhang/smallhardface.
Tasks Face Detection
Published 2018-11-28
URL http://arxiv.org/abs/1811.11662v1
PDF http://arxiv.org/pdf/1811.11662v1.pdf
PWC https://paperswithcode.com/paper/robust-face-detection-via-learning-small
Repo https://github.com/bairdzhang/smallhardface
Framework none

Few-shot Object Detection via Feature Reweighting

Title Few-shot Object Detection via Feature Reweighting
Authors Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
Abstract Conventional training of a deep CNN based object detector demands a large number of bounding box annotations, which may be unavailable for rare categories. In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. Our proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature learner and a reweighting module within a one-stage detection architecture. The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples. The reweighting module transforms a few support examples from the novel classes to a global vector that indicates the importance or relevance of meta features for detecting the corresponding objects. These two modules, together with a detection prediction module, are trained end-to-end based on an episodic few-shot learning scheme and a carefully designed loss function. Through extensive experiments we demonstrate that our model outperforms well-established baselines by a large margin for few-shot object detection, on multiple datasets and settings. We also present analysis on various aspects of our proposed model, aiming to provide some inspiration for future few-shot detection works.
Tasks Few-Shot Learning, Few-Shot Object Detection, Image Classification, Meta-Learning, Object Detection
Published 2018-12-05
URL https://arxiv.org/abs/1812.01866v2
PDF https://arxiv.org/pdf/1812.01866v2.pdf
PWC https://paperswithcode.com/paper/few-shot-object-detection-via-feature
Repo https://github.com/Ze-Yang/Context-Transformer
Framework pytorch
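
The reweighting step in the abstract, where a per-class vector indicates the importance of each meta feature, can be pictured as a channel-wise multiplication. The sketch below is an illustration under assumptions rather than the authors' implementation: it assumes meta features of shape `(C, H, W)`, one reweighting vector of length `C` per novel class, and a hypothetical function name.

```python
import numpy as np


def reweight_features(meta_features, reweighting_vectors):
    """Scale every channel of the meta features by a class-specific coefficient.

    meta_features: array of shape (C, H, W) from the feature learner.
    reweighting_vectors: array of shape (N, C), one vector per class.
    Returns an array of shape (N, C, H, W): one reweighted feature map per class.
    """
    return meta_features[None, :, :, :] * reweighting_vectors[:, :, None, None]
```

With 3 novel classes and 8 feature channels, the call returns 3 class-specific copies of the same feature map, each of which would then be passed to the detection prediction module to detect objects of the corresponding class.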