Paper Group AWR 16
Novel Prediction Techniques Based on Clusterwise Linear Regression. Multitask Learning for Fundamental Frequency Estimation in Music. QuaterNet: A Quaternion-based Recurrent Model for Human Motion. Quantizing deep convolutional networks for efficient inference: A whitepaper. Deep Reinforcement Learning for Event-Triggered Control. RTSeg: Real-time Semantic Segmentation Comparative Study. Image Inpainting via Generative Multi-column Convolutional Neural Networks. Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction. Position-aware Self-attention with Relative Positional Encodings for Slot Filling. MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip. Universal Sentence Encoder. Selective Refinement Network for High Performance Face Detection. Shape Robust Text Detection with Progressive Scale Expansion Network. Robust Face Detection via Learning Small Faces on Hard Images. Few-shot Object Detection via Feature Reweighting.
Novel Prediction Techniques Based on Clusterwise Linear Regression
Title | Novel Prediction Techniques Based on Clusterwise Linear Regression |
Authors | Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski |
Abstract | In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find a partition of the data into $k$ clusters such that linear regressions fitted to each of the clusters minimize the overall mean squared error on the whole data. The main obstacle preventing the use of the found regression models for prediction on unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of the target variable are unknown. In this paper we propose two novel approaches to this problem. The first approach, predictive CLR, builds a separate classification model to predict test CLR labels. The second approach, constrained CLR, utilizes a set of user-specified constraints that force certain points into the same clusters. Assuming the constraint values are known for the test points, they can be used directly to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on the UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while requiring only $\approx 20$ times the computational time of linear regression. |
Tasks | |
Published | 2018-04-28 |
URL | http://arxiv.org/abs/1804.10742v1 |
http://arxiv.org/pdf/1804.10742v1.pdf | |
PWC | https://paperswithcode.com/paper/novel-prediction-techniques-based-on |
Repo | https://github.com/Kipok/clr_prediction |
Framework | none |
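
The predictive-CLR idea translates into a short sketch: alternate between fitting per-cluster linear regressions and reassigning each point to its best-fitting regression, then train a classifier to predict cluster labels for unseen points. The helper below is a minimal illustration under those assumptions, not the authors' implementation (see the linked repo for that); it assumes the random initialization seeds every cluster with enough points.

```python
# Minimal sketch of predictive CLR: alternating fit/reassign, then a
# label-predicting classifier for test points. Illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

def fit_clr(X, y, k=3, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))        # random initial partition
    models = [LinearRegression() for _ in range(k)]
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= 2:
                models[j].fit(X[mask], y[mask])
        # reassign each point to the regression with the smallest error
        errors = np.stack([(y - m.predict(X)) ** 2 for m in models], axis=1)
        labels = errors.argmin(axis=1)
    return models, labels

def predictive_clr(X_train, y_train, X_test, k=3):
    models, labels = fit_clr(X_train, y_train, k)
    clf = RandomForestClassifier().fit(X_train, labels)   # CLR label predictor
    test_labels = clf.predict(X_test)
    return np.array([models[j].predict(x[None])[0]
                     for j, x in zip(test_labels, X_test)])
```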
Multitask Learning for Fundamental Frequency Estimation in Music
Title | Multitask Learning for Fundamental Frequency Estimation in Music |
Authors | Rachel M. Bittner, Brian McFee, Juan P. Bello |
Abstract | Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line estimation, and is trained using a large, semi-automatically annotated dataset. We show that the multitask model outperforms its single-task counterparts, explore the effect of various design decisions in our approach, and show that it performs better than, or at least competitively with, strong baseline methods. |
Tasks | |
Published | 2018-09-02 |
URL | http://arxiv.org/abs/1809.00381v1 |
http://arxiv.org/pdf/1809.00381v1.pdf | |
PWC | https://paperswithcode.com/paper/multitask-learning-for-fundamental-frequency |
Repo | https://github.com/rabitt/multitask-f0 |
Framework | none |
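
A minimal PyTorch sketch of the multitask setup described above: a shared convolutional trunk over a time-frequency input with one salience head per task, trained with a joint loss. The 5-channel HCQT-like input, layer sizes, and random targets are illustrative assumptions, not the paper's exact architecture.

```python
# Shared encoder + per-task salience heads: the multitask pattern, sketched.
import torch
import torch.nn as nn

class MultitaskF0(nn.Module):
    def __init__(self, tasks=("multif0", "melody", "vocal", "bass"), in_ch=5):
        super().__init__()
        self.shared = nn.Sequential(                 # shared feature trunk
            nn.Conv2d(in_ch, 32, 5, padding=2), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 32, 5, padding=2), nn.BatchNorm2d(32), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({                 # one salience map per task
            t: nn.Conv2d(32, 1, 3, padding=1) for t in tasks
        })

    def forward(self, x):   # x: (batch, in_ch, freq_bins, time_frames)
        h = self.shared(x)
        return {t: torch.sigmoid(head(h)).squeeze(1)
                for t, head in self.heads.items()}

model = MultitaskF0()
sal = model(torch.randn(2, 5, 360, 50))              # per-task salience maps
loss = sum(nn.functional.binary_cross_entropy(s, torch.rand_like(s))
           for s in sal.values())                    # joint multitask loss
```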
QuaterNet: A Quaternion-based Recurrent Model for Human Motion
Title | QuaterNet: A Quaternion-based Recurrent Model for Human Motion |
Authors | Dario Pavllo, David Grangier, Michael Auli |
Abstract | Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. Our recurrent network, QuaterNet, represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. On short-term predictions, QuaterNet improves the state-of-the-art quantitatively. For long-term generation, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature. |
Tasks | 3D Human Pose Estimation, Motion Estimation |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06485v2 |
http://arxiv.org/pdf/1805.06485v2.pdf | |
PWC | https://paperswithcode.com/paper/quaternet-a-quaternion-based-recurrent-model |
Repo | https://github.com/facebookresearch/QuaterNet |
Framework | pytorch |
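
The key ingredients — unit-quaternion rotations and a forward-kinematics loss on joint positions rather than angles — can be sketched in a few lines of PyTorch. The toy 3-joint chain, bone offsets, and targets below are illustrative assumptions; the paper's full recurrent model lives in the linked repo.

```python
# Quaternion FK loss sketch: normalize predicted quaternions, run forward
# kinematics over a toy chain, penalize absolute position error.
import torch

def qmul(q, r):
    # Hamilton product of quaternions (..., 4) in (w, x, y, z) order.
    w1, x1, y1, z1 = q.unbind(-1)
    w2, x2, y2, z2 = r.unbind(-1)
    return torch.stack((w1*w2 - x1*x2 - y1*y2 - z1*z2,
                        w1*x2 + x1*w2 + y1*z2 - z1*y2,
                        w1*y2 - x1*z2 + y1*w2 + z1*x2,
                        w1*z2 + x1*y2 - y1*x2 + z1*w2), dim=-1)

def qrot(q, v):
    # Rotate vectors v (..., 3) by unit quaternions q (..., 4).
    uv = torch.cross(q[..., 1:], v, dim=-1)
    uuv = torch.cross(q[..., 1:], uv, dim=-1)
    return v + 2 * (q[..., :1] * uv + uuv)

def fk(quats, offsets, parents):
    # quats: (B, J, 4) local rotations; offsets: (J, 3); parents[j] < j.
    B, J, _ = quats.shape
    world_q, pos = [], []
    for j in range(J):
        p = parents[j]
        if p < 0:                                    # root joint at origin
            world_q.append(quats[:, j])
            pos.append(torch.zeros(B, 3))
        else:
            world_q.append(qmul(world_q[p], quats[:, j]))
            pos.append(pos[p] + qrot(world_q[p], offsets[j].expand(B, 3)))
    return torch.stack(pos, dim=1)                   # (B, J, 3) positions

raw = torch.randn(2, 3, 4, requires_grad=True)       # e.g. RNN outputs
quats = raw / raw.norm(dim=-1, keepdim=True)         # unit quaternions
offsets = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 1., 0.]])
target = torch.zeros(2, 3, 3)
loss = ((fk(quats, offsets, [-1, 0, 1]) - target) ** 2).mean()
loss.backward()                                      # positional, not angular
```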
Quantizing deep convolutional networks for efficient inference: A whitepaper
Title | Quantizing deep convolutional networks for efficient inference: A whitepaper |
Authors | Raghuraman Krishnamoorthi |
Abstract | We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported. This can be achieved with simple post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits, with accuracy losses ranging from 2% to 10%, the accuracy drop being higher for smaller networks. We introduce tools in TensorFlow and TensorFlow Lite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits. |
Tasks | Quantization |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08342v1 |
http://arxiv.org/pdf/1806.08342v1.pdf | |
PWC | https://paperswithcode.com/paper/quantizing-deep-convolutional-networks-for |
Repo | https://github.com/li-weihua/notes |
Framework | none |
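
The recommended scheme for weights — per-channel quantization — boils down to one scale per output channel over a symmetric integer range. Below is a minimal numpy sketch of that arithmetic only, separate from the TensorFlow / TensorFlow Lite tooling the whitepaper introduces.

```python
# Per-channel symmetric 8-bit weight quantization, illustrated in numpy.
import numpy as np

def quantize_per_channel(w, n_bits=8):
    """w: (out_channels, ...) float weights -> int8 weights + per-channel scales."""
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for int8
    flat = w.reshape(w.shape[0], -1)
    scale = np.abs(flat).max(axis=1) / qmax           # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)          # guard all-zero channels
    q = np.round(flat / scale[:, None]).clip(-qmax - 1, qmax).astype(np.int8)
    return q.reshape(w.shape), scale

def dequantize(q, scale):
    shape = (-1,) + (1,) * (q.ndim - 1)
    return q.astype(np.float32) * scale.reshape(shape)

w = np.random.randn(64, 3, 3, 3).astype(np.float32)   # toy conv weights
q, s = quantize_per_channel(w)
err = np.abs(dequantize(q, s) - w).max()               # small reconstruction error
```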
Deep Reinforcement Learning for Event-Triggered Control
Title | Deep Reinforcement Learning for Event-Triggered Control |
Authors | Dominik Baumann, Jia-Jie Zhu, Georg Martius, Sebastian Trimpe |
Abstract | Event-triggered control (ETC) methods can achieve high-performance control with significantly fewer samples than the usual time-triggered methods. These frameworks are often based on a mathematical model of the system and specific designs of the controller and event trigger. In this paper, we show how deep reinforcement learning (DRL) algorithms can be leveraged to simultaneously learn control and communication behavior from scratch, and present a DRL approach that is particularly suitable for ETC. To our knowledge, this is the first work to apply DRL to ETC. We validate the approach on multiple control tasks and compare it to model-based event-triggering frameworks. In particular, we demonstrate that, unlike many model-based ETC designs, it can be straightforwardly applied to nonlinear systems. |
Tasks | |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05152v1 |
http://arxiv.org/pdf/1809.05152v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-event |
Repo | https://github.com/jj-zhu/resource_aware_control_rl |
Framework | none |
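
The core idea — the agent decides both the control input and whether to communicate it, with a penalty per transmission — can be illustrated on a toy scalar plant. The random policy and constants below are stand-ins for the paper's DRL agent, purely for illustration.

```python
# Toy event-triggered loop: skipping communication holds the last input
# and avoids a per-transmission penalty. A learned policy would replace
# the random "communicate" decision.
import numpy as np

rng = np.random.default_rng(0)
a_sys, b_sys = 1.1, 1.0        # unstable scalar plant x' = a*x + b*u
x, u_held = 1.0, 0.0
comm_cost = 0.1                # reward penalty per transmission
total_reward = 0.0

for t in range(100):
    communicate = rng.random() < 0.5          # learned by DRL in the paper
    if communicate:
        u_held = -1.0 * x                      # fresh control input
    x = a_sys * x + b_sys * u_held + 0.01 * rng.standard_normal()
    total_reward += -(x ** 2) - comm_cost * communicate
```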
RTSeg: Real-time Semantic Segmentation Comparative Study
Title | RTSeg: Real-time Semantic Segmentation Comparative Study |
Authors | Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand |
Abstract | Semantic segmentation benefits robotics-related applications, especially autonomous driving. Most research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. The little work conducted in this direction does not provide principled methods to evaluate the different design choices for segmentation. In this paper, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods. The framework comprises different network architectures for feature extraction, such as VGG16, ResNet18, MobileNet, and ShuffleNet, and multiple meta-architectures for segmentation that define the decoding methodology, including SkipNet, UNet, and Dilation Frontend. Experimental results are presented on the Cityscapes dataset for urban scenes. The modular design allows novel architectures to emerge that lead to a 143x reduction in GFLOPs in comparison to SegNet. This benchmarking framework is publicly available at https://github.com/MSiam/TFSegmentation. |
Tasks | Autonomous Driving, Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-03-07 |
URL | https://arxiv.org/abs/1803.02758v4 |
https://arxiv.org/pdf/1803.02758v4.pdf | |
PWC | https://paperswithcode.com/paper/rtseg-real-time-semantic-segmentation |
Repo | https://github.com/Davidnet/TFSegmentation |
Framework | tf |
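
The decoupled design reduces to a small factory that pairs any registered feature extractor with any decoding meta-architecture. The PyTorch stubs below only illustrate the pattern; the actual repo is a TensorFlow implementation with real VGG16/ResNet18/MobileNet/ShuffleNet encoders and SkipNet/UNet/Dilation decoders.

```python
# Encoder/decoder factory sketch: the decoupled benchmarking pattern.
import torch
import torch.nn as nn

class SkipNetDecoder(nn.Module):
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, n_classes, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )
    def forward(self, feats):
        return self.head(feats)

ENCODERS = {  # stand-in for VGG16 / ResNet18 / MobileNet / ShuffleNet
    "mobilenet": lambda: nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=8, padding=1), nn.ReLU()),
}
DECODERS = {"skipnet": SkipNetDecoder}

def build_segmentor(encoder, decoder, n_classes=19):  # 19 Cityscapes classes
    return nn.Sequential(ENCODERS[encoder](), DECODERS[decoder](64, n_classes))

model = build_segmentor("mobilenet", "skipnet")
out = model(torch.randn(1, 3, 512, 1024))   # (1, 19, 512, 1024)
```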
Image Inpainting via Generative Multi-column Convolutional Neural Networks
Title | Image Inpainting via Generative Multi-column Convolutional Neural Networks |
Authors | Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia |
Abstract | In this paper, we propose a generative multi-column network for image inpainting. This network synthesizes different image components in a parallel manner within one stage. To better characterize global structures, we design a confidence-driven reconstruction loss, while an implicit diversified MRF regularization is adopted to enhance local details. The multi-column network, combined with the reconstruction and MRF losses, propagates local and global information derived from context to the target inpainting regions. Extensive experiments on challenging street view, face, natural object and scene images show that our method produces visually compelling results even without the previously common post-processing. |
Tasks | Image Inpainting |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08771v1 |
http://arxiv.org/pdf/1810.08771v1.pdf | |
PWC | https://paperswithcode.com/paper/image-inpainting-via-generative-multi-column |
Repo | https://github.com/tlatkowski/inpainting-gmcnn-keras |
Framework | tf |
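
A minimal sketch of the multi-column idea: parallel branches with different receptive fields process the masked image in one stage and are merged into a single prediction. Branch depths and widths are illustrative assumptions, and the paper's confidence-driven reconstruction and ID-MRF losses are not reproduced here.

```python
# Multi-column generator sketch: parallel branches, one merge.
import torch
import torch.nn as nn

class MultiColumnGenerator(nn.Module):
    def __init__(self, kernel_sizes=(3, 5, 7), width=32):
        super().__init__()
        self.columns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(4, width, k, padding=k // 2), nn.ELU(),  # RGB + mask
                nn.Conv2d(width, width, k, padding=k // 2), nn.ELU())
            for k in kernel_sizes
        ])
        self.merge = nn.Conv2d(width * len(kernel_sizes), 3, 3, padding=1)

    def forward(self, image, mask):
        x = torch.cat([image * (1 - mask), mask], dim=1)  # masked input
        feats = torch.cat([col(x) for col in self.columns], dim=1)
        return torch.tanh(self.merge(feats))

gen = MultiColumnGenerator()
img, mask = torch.rand(1, 3, 128, 128), torch.zeros(1, 1, 128, 128)
mask[..., 32:96, 32:96] = 1.0                        # hole to fill
out = gen(img, mask)                                 # (1, 3, 128, 128)
```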
Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction
Title | Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction |
Authors | Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Costas Bekas, Alpha A Lee |
Abstract | Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable. |
Tasks | Chemical Reaction Prediction, Machine Translation |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02633v2 |
https://arxiv.org/pdf/1811.02633v2.pdf | |
PWC | https://paperswithcode.com/paper/molecular-transformer-for-chemical-reaction |
Repo | https://github.com/pschwllr/MolecularTransformer |
Framework | pytorch |
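
One way to read the uncertainty estimation: score a decoded SMILES string by the product of its per-token probabilities. That definition is an assumption here — see the paper and repo for the exact formulation — but any seq2seq model that exposes token log-probabilities can supply the inputs.

```python
# Sequence-level confidence from per-token log-probabilities (assumed score).
import math

def sequence_confidence(token_logprobs):
    """token_logprobs: log p(token_i | prefix, source) for the decoded product."""
    return math.exp(sum(token_logprobs))

# e.g. a confidently decoded product vs. an uncertain one (toy numbers)
high = sequence_confidence([-0.01, -0.02, -0.01])   # ~0.96
low = sequence_confidence([-0.5, -1.2, -0.8])       # ~0.08
```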
Position-aware Self-attention with Relative Positional Encodings for Slot Filling
Title | Position-aware Self-attention with Relative Positional Encodings for Slot Filling |
Authors | Ivan Bilan, Benjamin Roth |
Abstract | This paper describes how to apply self-attention with relative positional encodings to the task of relation extraction. We propose to use the self-attention encoder layer together with an additional position-aware attention layer that takes into account the positions of the query and the object in the sentence. The self-attention encoder also uses a custom implementation of relative positional encodings that allows each word in the sentence to take into account its left and right context. The model is evaluated on the TACRED dataset. The proposed model relies only on attention (no recurrent or convolutional layers are used), while improving performance w.r.t. the previous state of the art. |
Tasks | Relation Extraction, Slot Filling |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03052v1 |
http://arxiv.org/pdf/1807.03052v1.pdf | |
PWC | https://paperswithcode.com/paper/position-aware-self-attention-with-relative |
Repo | https://github.com/ivan-bilan/tac-self-attention |
Framework | pytorch |
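
A minimal single-head PyTorch sketch of self-attention with relative positional encodings in the spirit of the paper: a learned embedding per clipped relative distance contributes to the attention logits. The dimensions and clipping window are illustrative assumptions.

```python
# Self-attention with relative position embeddings added to the logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    def __init__(self, d_model=64, max_dist=8):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.rel = nn.Embedding(2 * max_dist + 1, d_model)  # one per distance
        self.max_dist = max_dist
        self.scale = d_model ** -0.5

    def forward(self, x):                       # x: (batch, seq, d_model)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # clipped relative distances j - i in [-max_dist, max_dist]
        idx = torch.arange(T, device=x.device)
        rel_idx = (idx[None, :] - idx[:, None]).clamp(
            -self.max_dist, self.max_dist) + self.max_dist
        r = self.rel(rel_idx)                   # (T, T, D)
        logits = q @ k.transpose(-2, -1) * self.scale
        logits = logits + torch.einsum("btd,tsd->bts", q, r) * self.scale
        return F.softmax(logits, dim=-1) @ v

attn = RelativeSelfAttention()
out = attn(torch.randn(2, 10, 64))              # (2, 10, 64)
```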
MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip
Title | MAT-CNN-SOPC: Motionless Analysis of Traffic Using Convolutional Neural Networks on System-On-a-Programmable-Chip |
Authors | Somdip Dey, Grigorios Kalliatakis, Sangeet Saha, Amit Kumar Singh, Shoaib Ehsan, Klaus McDonald-Maier |
Abstract | Intelligent Transportation Systems (ITS) have become an important pillar of the modern “smart city” framework, which demands intelligent involvement of machines. Traffic load recognition is an important and challenging problem for such systems. Recently, Convolutional Neural Network (CNN) models have drawn considerable interest in many areas, such as weather classification and human rights violation detection through images, due to their accurate prediction capabilities. This work tackles the real-life traffic load recognition problem on a System-On-a-Programmable-Chip (SOPC) platform, coined MAT-CNN-SOPC, which uses an intelligent re-training mechanism of the CNN with known environments. The proposed methodology enhances the efficacy of the approach by 2.44x in comparison to the state of the art, as demonstrated through experimental analysis. We also introduce a mathematical equation that quantifies the suitability of one CNN model over another for a particular application-based implementation. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02098v2 |
http://arxiv.org/pdf/1807.02098v2.pdf | |
PWC | https://paperswithcode.com/paper/mat-cnn-sopc-motionless-analysis-of-traffic |
Repo | https://github.com/somdipdey/MAT-CNN-SOPC |
Framework | none |
Universal Sentence Encoder
Title | Universal Sentence Encoder |
Authors | Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil |
Abstract | We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings, as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub. |
Tasks | Semantic Textual Similarity, Sentence Embeddings, Sentiment Analysis, Subjectivity Analysis, Text Classification, Transfer Learning, Word Embeddings |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11175v2 |
http://arxiv.org/pdf/1803.11175v2.pdf | |
PWC | https://paperswithcode.com/paper/universal-sentence-encoder |
Repo | https://github.com/facebookresearch/InferSent |
Framework | pytorch |
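
Since the pre-trained encoders are published on TF Hub, a short usage sketch needs only a few lines (the module URL below is the commonly used v4 release; check TF Hub for the current one; requires the `tensorflow` and `tensorflow_hub` packages).

```python
# Load the Universal Sentence Encoder from TF Hub and compare two sentences.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
emb = embed(["The quick brown fox.", "A fast auburn fox."]).numpy()

# cosine similarity between the two sentence embeddings
sim = emb[0] @ emb[1] / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
```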
Selective Refinement Network for High Performance Face Detection
Title | Selective Refinement Network for High Performance Face Detection |
Authors | Cheng Chi, Shifeng Zhang, Junliang Xing, Zhen Lei, Stan Z. Li, Xudong Zou |
Abstract | High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. In particular, the SRN consists of two modules: the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module. The STC aims to filter out most simple negative anchors from low-level detection layers to reduce the search space for the subsequent classifier, while the STR is designed to coarsely adjust the locations and sizes of anchors from high-level detection layers to provide better initialization for the subsequent regressor. Moreover, we design a Receptive Field Enhancement (RFE) block to provide more diverse receptive fields, which helps to better capture faces in some extreme poses. As a consequence, the proposed SRN detector achieves state-of-the-art performance on all the widely used face detection benchmarks, including the AFW, PASCAL face, FDDB, and WIDER FACE datasets. Code will be released to facilitate further studies on the face detection problem. |
Tasks | Face Detection |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02693v1 |
http://arxiv.org/pdf/1809.02693v1.pdf | |
PWC | https://paperswithcode.com/paper/selective-refinement-network-for-high |
Repo | https://github.com/faridSam/srn |
Framework | pytorch |
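
The Selective Two-step Classification step can be sketched as a cheap first-pass filter over anchors followed by a second scoring pass on the survivors only. The thresholds and random scores below are toy assumptions, not the trained detector.

```python
# STC-style filtering sketch: discard easy negatives, rescore survivors.
import torch

first_scores = torch.rand(10000)            # first-step objectness per anchor
keep = first_scores > 0.01                  # drop easy negative anchors early
surviving_anchors = torch.nonzero(keep).squeeze(1)

# second step runs only on the reduced search space
second_scores = torch.rand(surviving_anchors.numel())
detections = surviving_anchors[second_scores > 0.5]
```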
Shape Robust Text Detection with Progressive Scale Expansion Network
Title | Shape Robust Text Detection with Progressive Scale Expansion Network |
Authors | Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang |
Abstract | The challenges of shape robust text detection lie in two aspects: 1) most existing quadrangular bounding-box-based detectors have difficulty locating text with arbitrary shapes, which is hard to enclose perfectly in a rectangle; 2) most pixel-wise segmentation-based detectors may not separate text instances that are very close to each other. To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance. These predictions correspond to different 'kernels' produced by shrinking the original text instance to various scales. Consequently, the final detection can be conducted through our progressive scale expansion algorithm, which gradually expands the kernels with minimal scales to the text instances with maximal and complete shapes. Because there are large geometrical margins among these minimal kernels, our method effectively distinguishes adjacent text instances and is robust to arbitrary shapes. The state-of-the-art results on the ICDAR 2015 and ICDAR 2017 MLT benchmarks further confirm the great effectiveness of PSENet. Notably, PSENet outperforms the previous best record by an absolute 6.37% on the curved text dataset SCUT-CTW1500. Code will be available at https://github.com/whai362/PSENet. |
Tasks | Curved Text Detection, Scene Text Detection |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02559v1 |
http://arxiv.org/pdf/1806.02559v1.pdf | |
PWC | https://paperswithcode.com/paper/shape-robust-text-detection-with-progressive |
Repo | https://github.com/whai362/PSENet |
Framework | tf |
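
The progressive scale expansion step itself is a breadth-first search: seed instances from the connected components of the smallest kernel map, then grow them one binary kernel map at a time, first-come-first-served at contested pixels. Below is a simplified numpy/scipy sketch of that algorithm, not the paper's optimized implementation.

```python
# Progressive scale expansion as BFS over successively larger kernel maps.
import numpy as np
from collections import deque
from scipy.ndimage import label

def progressive_scale_expansion(kernels):
    """kernels: list of binary maps, smallest kernel first."""
    labels, _ = label(kernels[0])                 # seed text instances
    h, w = labels.shape
    for kern in kernels[1:]:
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kern[ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[y, x]  # expand this instance
                    queue.append((ny, nx))
    return labels
```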
Robust Face Detection via Learning Small Faces on Hard Images
Title | Robust Face Detection via Learning Small Faces on Hard Images |
Authors | Zhishuai Zhang, Wei Shen, Siyuan Qiao, Yan Wang, Bo Wang, Alan Yuille |
Abstract | Recent anchor-based deep face detectors have achieved promising performance, but they are still struggling to detect hard faces, such as small, blurred and partially occluded faces. A reason is that they treat all images and faces equally, without putting more effort into hard ones; however, many training images only contain easy faces, which are less helpful for achieving better performance on hard images. In this paper, we propose that the robustness of a face detector against hard faces can be improved by learning small faces on hard images. Our intuitions are (1) hard images are the images which contain at least one hard face, thus they facilitate training robust face detectors; (2) most hard faces are small faces, and other types of hard faces can be easily converted to small faces by shrinking. We build an anchor-based deep face detector, which outputs only a single feature map with small anchors, to specifically learn small faces, and train it with a novel hard image mining strategy. Extensive experiments have been conducted on the WIDER FACE, FDDB, Pascal Faces, and AFW datasets to show the effectiveness of our method. Our method achieves APs of 95.7, 94.9 and 89.7 on the easy, medium and hard WIDER FACE val subsets respectively, surpassing the previous state of the art, especially on the hard subset. Code and model are available at https://github.com/bairdzhang/smallhardface. |
Tasks | Face Detection |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11662v1 |
http://arxiv.org/pdf/1811.11662v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-face-detection-via-learning-small |
Repo | https://github.com/bairdzhang/smallhardface |
Framework | none |
Few-shot Object Detection via Feature Reweighting
Title | Few-shot Object Detection via Feature Reweighting |
Authors | Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell |
Abstract | Conventional training of a deep CNN based object detector demands a large number of bounding box annotations, which may be unavailable for rare categories. In this work we develop a few-shot object detector that can learn to detect novel objects from only a few annotated examples. Our proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature learner and a reweighting module within a one-stage detection architecture. The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples. The reweighting module transforms a few support examples from the novel classes to a global vector that indicates the importance or relevance of meta features for detecting the corresponding objects. These two modules, together with a detection prediction module, are trained end-to-end based on an episodic few-shot learning scheme and a carefully designed loss function. Through extensive experiments we demonstrate that our model outperforms well-established baselines by a large margin for few-shot object detection, on multiple datasets and settings. We also present analysis on various aspects of our proposed model, aiming to provide some inspiration for future few-shot detection works. |
Tasks | Few-Shot Learning, Few-Shot Object Detection, Image Classification, Meta-Learning, Object Detection |
Published | 2018-12-05 |
URL | https://arxiv.org/abs/1812.01866v2 |
https://arxiv.org/pdf/1812.01866v2.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-object-detection-via-feature |
Repo | https://github.com/Ze-Yang/Context-Transformer |
Framework | pytorch |
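
A minimal PyTorch sketch of the reweighting idea: each support example is mapped to a per-channel coefficient vector that modulates the meta features channel-wise before detection. The shapes and the tiny reweighting net are illustrative assumptions, not the paper's exact modules.

```python
# Feature reweighting sketch: support examples -> channel-wise coefficients.
import torch
import torch.nn as nn

class ReweightingModule(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(                    # support image+mask -> w
            nn.Conv2d(4, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, support):                      # (n_cls, 4, H, W)
        return self.net(support).flatten(1)          # (n_cls, channels)

meta_features = torch.randn(2, 256, 32, 32)          # from the meta feature learner
support = torch.randn(3, 4, 128, 128)                # 3 novel classes, RGB + box mask
w = ReweightingModule()(support)                     # (3, 256)

# channel-wise modulation: one reweighted map per (image, class) pair
rew = meta_features[:, None] * w[None, :, :, None, None]  # (2, 3, 256, 32, 32)
```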