October 20, 2019

3015 words · 15 mins read

Paper Group AWR 330

NCRF++: An Open-source Neural Sequence Labeling Toolkit

Title NCRF++: An Open-source Neural Sequence Labeling Toolkit
Authors Jie Yang, Yue Zhang
Abstract This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models, such as LSTM-CRF, facilitating the reproduction and refinement of those methods.
Tasks Chunking, Named Entity Recognition, Part-Of-Speech Tagging
Published 2018-06-14
URL http://arxiv.org/abs/1806.05626v2
PDF http://arxiv.org/pdf/1806.05626v2.pdf
PWC https://paperswithcode.com/paper/ncrf-an-open-source-neural-sequence-labeling
Repo https://github.com/jiesutd/NCRFpp
Framework pytorch
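
The CRF inference layer that NCRF++ builds on boils down to Viterbi decoding over per-token emission scores and a learned tag-transition matrix. As a hedged illustration (not NCRF++'s actual API; the tensor shapes and random inputs below are placeholders), a minimal NumPy version looks like this:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags); transitions[i, j]: score of tag i -> j.
    Returns the highest-scoring tag sequence as a list of tag indices."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i, then moving to tag j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(candidate.argmax(axis=0))
        score = candidate.max(axis=0)
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

emissions = np.random.randn(5, 4)          # 5 tokens, 4 tags (toy values)
transitions = np.random.randn(4, 4)
print(viterbi_decode(emissions, transitions))
```

In NCRF++ itself this decoding runs batched on the GPU, and the configuration file selects which character- and word-level feature extractors produce the emission scores.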

PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image

Title PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image
Authors Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz
Abstract This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss enforcing consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in plane detection, segmentation, and reconstruction metrics. PlaneRCNN makes an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications including robotics, augmented reality, and virtual reality.
Tasks 3D Plane Detection, 3D Reconstruction
Published 2018-12-10
URL http://arxiv.org/abs/1812.04072v2
PDF http://arxiv.org/pdf/1812.04072v2.pdf
PWC https://paperswithcode.com/paper/planercnn-3d-plane-detection-and
Repo https://github.com/NVlabs/planercnn
Framework pytorch
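
As a geometric aside to the abstract above: once a plane's parameters are known, depth at every pixel follows from ray-plane intersection, the kind of relation a plane-aware consistency loss can exploit. A minimal sketch, assuming a pinhole camera with intrinsics K and a plane {X : n·X = d} in camera coordinates (all numeric values are illustrative):

```python
import numpy as np

def plane_depth_map(normal, offset, K, height, width):
    """Depth induced by the plane {X : normal . X = offset} at every pixel."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pixels       # back-projected pixel rays
    denom = normal @ rays                  # n . ray, one value per pixel
    depth = offset / np.where(np.abs(denom) < 1e-8, np.nan, denom)
    return depth.reshape(height, width)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
# a fronto-parallel plane 2 m from the camera yields a constant depth of 2
print(plane_depth_map(np.array([0.0, 0.0, 1.0]), 2.0, K, 480, 640)[0, 0])
```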

Building a Conversational Agent Overnight with Dialogue Self-Play

Title Building a Conversational Agent Overnight with Dialogue Self-Play
Authors Pararth Shah, Dilek Hakkani-Tür, Gokhan Tür, Abhinav Rastogi, Ankur Bapna, Neha Nayak, Larry Heck
Abstract We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz approach for data collection, M2M achieves greater diversity and coverage of salient dialogue flows while maintaining the naturalness of individual utterances. In the first phase, a simulated user bot and a domain-agnostic system bot converse to exhaustively generate dialogue “outlines”, i.e. sequences of template utterances and their semantic parses. In the second phase, crowd workers provide contextual rewrites of the dialogues to make the utterances more natural while preserving their meaning. The entire process can finish within a few hours. We propose a new corpus of 3,000 dialogues spanning 2 domains collected with M2M, and present comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows.
Tasks
Published 2018-01-15
URL http://arxiv.org/abs/1801.04871v1
PDF http://arxiv.org/pdf/1801.04871v1.pdf
PWC https://paperswithcode.com/paper/building-a-conversational-agent-overnight
Repo https://github.com/marcomanciniunitn/Master-Thesis-Project
Framework none
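
To make the two-phase pipeline concrete, here is a deliberately tiny, rule-based sketch of phase one: dialogue self-play producing outlines from a task schema. The schema, dialogue acts, and turn order are invented for illustration; the real M2M bots are far richer:

```python
schema = {"task": "book_movie", "slots": ["movie", "time", "num_tickets"]}

def self_play(schema):
    """A user bot and a system bot exchange template dialogue acts."""
    outline = [("user", f"intent({schema['task']})")]
    for slot in schema["slots"]:
        outline.append(("system", f"request({slot})"))
        outline.append(("user", f"inform({slot}=<{slot}>)"))
    outline.append(("system", "confirm(booking)"))
    outline.append(("user", "affirm()"))
    return outline

for speaker, act in self_play(schema):
    print(f"{speaker}: {act}")
```

Phase two would hand each generated outline to crowd workers, who rewrite the template utterances into natural language while their semantic parses are preserved.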

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

Title Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Authors Tanya Nair, Doina Precup, Douglas L. Arnold, Tal Arbel
Abstract Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge datasets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that the uncertainty measures consistently allow us to choose superior operating points compared to only using the network’s sigmoid output as a probability.
Tasks Lesion Segmentation
Published 2018-08-03
URL http://arxiv.org/abs/1808.01200v2
PDF http://arxiv.org/pdf/1808.01200v2.pdf
PWC https://paperswithcode.com/paper/exploring-uncertainty-measures-in-deep
Repo https://github.com/tanyanair/segmentation_uncertainty
Framework tf
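
The core mechanism, MC dropout, is easy to state: keep dropout stochastic at test time, run several forward passes, and treat the spread of the outputs as uncertainty. A minimal PyTorch sketch follows; the stand-in network below is not the paper's 3D segmentation CNN, and variance is just one of the four measures the authors study:

```python
import torch
import torch.nn as nn

# stand-in for the paper's 3D lesion segmentation CNN
model = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Dropout3d(0.5), nn.Conv3d(8, 1, 3, padding=1))

def mc_dropout(model, volume, samples=20):
    """Stochastic forward passes with dropout left on; mean and variance."""
    model.train()                          # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(volume))
                             for _ in range(samples)])
    return probs.mean(0), probs.var(0)     # prediction / uncertainty maps

volume = torch.randn(1, 1, 16, 32, 32)     # (batch, channel, D, H, W)
mean, var = mc_dropout(model, volume)
print(mean.shape, var.shape)
```

Lesion-wise uncertainty is then obtained by accumulating these voxel-wise values inside each detected lesion, as the abstract describes.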

Scan2CAD: Learning CAD Model Alignment in RGB-D Scans

Title Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
Authors Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Nießner
Abstract We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. For a 3D reconstruction of an indoor scene, our method takes as input a set of CAD models, and predicts a 9DoF pose that aligns each model to the underlying scan geometry. To tackle this problem, we create a new scan-to-CAD alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoint pairs between 14225 CAD models from ShapeNet and their counterpart objects in the scans. Our method selects a set of representative keypoints in a 3D scan for which we find correspondences to the CAD geometry. To this end, we design a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects, and from this predicts a correspondence heatmap. Based on these correspondence heatmaps, we formulate a variational energy minimization that aligns a given set of CAD models to the reconstruction. We evaluate our approach on our newly introduced Scan2CAD benchmark, where we outperform both handcrafted feature descriptors and state-of-the-art CNN-based methods by 21.39%.
Tasks 3D Reconstruction
Published 2018-11-27
URL http://arxiv.org/abs/1811.11187v1
PDF http://arxiv.org/pdf/1811.11187v1.pdf
PWC https://paperswithcode.com/paper/scan2cad-learning-cad-model-alignment-in-rgb
Repo https://github.com/skanti/Scan2CAD
Framework pytorch
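
For intuition about the alignment step: once scan-to-CAD keypoint correspondences exist, a similarity transform can be fit to them in closed form. The sketch below uses the classic Umeyama solution with a single isotropic scale, a simplification of the paper's 9DoF pose (which allows anisotropic scale) and of its variational energy minimization:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Closed-form similarity transform (scale, R, t) mapping src onto dst.
    src, dst: (N, 3) arrays of corresponding keypoints."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # avoid reflections
    R = U @ S @ Vt
    var_s = ((src - mu_s) ** 2).sum() / len(src)
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t

src = np.random.randn(10, 3)
dst = 0.5 * src + np.array([1.0, 2.0, 3.0])   # known transform, R = I
scale, R, t = umeyama_alignment(src, dst)
print(round(scale, 3), t.round(3))            # recovers 0.5 and the shift
```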

Grid R-CNN

Title Grid R-CNN
Authors Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan
Abstract This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid-guided localization mechanism for accurate object detection. Unlike traditional regression-based methods, Grid R-CNN captures spatial information explicitly and enjoys the position-sensitive property of fully convolutional architectures. Instead of using only two independent points, we design a multi-point supervision formulation to encode more clues, reducing the impact of inaccurate predictions of specific points. To take full advantage of the correlation of points in a grid, we propose a two-stage information fusion strategy to fuse feature maps of neighboring grid points. The grid-guided localization approach is easily extended to different state-of-the-art detection frameworks. Grid R-CNN leads to high-quality object localization, and experiments demonstrate that it achieves a 4.1% AP gain at IoU=0.8 and a 10.0% AP gain at IoU=0.9 on the COCO benchmark compared to Faster R-CNN with a Res50 backbone and FPN architecture.
Tasks Object Detection, Object Localization
Published 2018-11-29
URL http://arxiv.org/abs/1811.12030v1
PDF http://arxiv.org/pdf/1811.12030v1.pdf
PWC https://paperswithcode.com/paper/grid-r-cnn
Repo https://github.com/STVIR/Grid-R-CNN
Framework pytorch
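
To illustrate grid-guided localization, the toy sketch below reads each grid point's location from the argmax of its predicted heatmap and fuses the boundary points of a 3x3 grid into a box. The median fusion and raw heatmap coordinates are illustrative simplifications; the paper additionally fuses feature maps of neighboring points and maps coordinates back through the RoI:

```python
import numpy as np

def grid_points_to_box(heatmaps):
    """heatmaps: (9, H, W), one per point of a 3x3 grid in row-major order."""
    pts = np.array([np.unravel_index(h.argmax(), h.shape) for h in heatmaps])
    ys, xs = pts[:, 0], pts[:, 1]
    # rows 0 and 2 of the grid bound y; columns 0 and 2 bound x
    top    = np.median(ys[[0, 1, 2]])
    bottom = np.median(ys[[6, 7, 8]])
    left   = np.median(xs[[0, 3, 6]])
    right  = np.median(xs[[2, 5, 8]])
    return left, top, right, bottom

heatmaps = np.random.rand(9, 56, 56)       # toy per-point heatmaps
print(grid_points_to_box(heatmaps))
```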

Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder

Title Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder
Authors Jeffrey M. Ede
Abstract We present an atrous convolutional encoder-decoder trained to denoise 512$\times$512 crops from electron micrographs. It consists of a modified Xception backbone, an atrous convolutional spatial pyramid pooling module, and a multi-stage decoder. Our neural network was trained end-to-end to remove Poisson noise applied to low-dose ($\ll$ 300 counts ppx) micrographs created from a new dataset of 17267 2048$\times$2048 high-dose ($>$ 2500 counts ppx) micrographs, and then fine-tuned for ordinary doses (200-2500 counts ppx). Its performance is benchmarked against bilateral, non-local means, total variation, wavelet, Wiener and other restoration methods with their default parameters. Our network outperforms their best mean squared error and structural similarity index performances by 24.6% and 9.6% for low doses and by 43.7% and 5.5% for ordinary doses. In both cases, our network’s mean squared error has the lowest variance. Source code and links to our new high-quality dataset and trained network have been made publicly available at https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11234v2
PDF http://arxiv.org/pdf/1807.11234v2.pdf
PWC https://paperswithcode.com/paper/improving-electron-micrograph-signal-to-noise
Repo https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser
Framework tf
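
The low-dose training data described above can be simulated from high-dose micrographs along these lines (an assumption about the general procedure, not the author's exact code): rescale to a target mean count per pixel and sample Poisson noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_low_dose(micrograph, counts_ppx=200):
    """micrograph: non-negative 2D array; returns a noisy copy whose
    statistics match a mean electron count of counts_ppx per pixel."""
    scale = counts_ppx / micrograph.mean()
    noisy = rng.poisson(micrograph * scale).astype(np.float64)
    return noisy / scale                   # back to original intensity range

clean = np.abs(rng.standard_normal((512, 512))) + 1.0   # stand-in micrograph
print(simulate_low_dose(clean).mean())     # close to clean.mean(), but noisy
```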

Discrimination-aware Channel Pruning for Deep Neural Networks

Title Discrimination-aware Channel Pruning for Deep Neural Networks
Authors Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
Abstract Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose the channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers, and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Finally, we propose a greedy algorithm to conduct channel selection and parameter optimization iteratively. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with a 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.
Tasks Model Compression
Published 2018-10-28
URL http://arxiv.org/abs/1810.11809v3
PDF http://arxiv.org/pdf/1810.11809v3.pdf
PWC https://paperswithcode.com/paper/discrimination-aware-channel-pruning-for-deep
Repo https://github.com/SCUT-AILab/DCP
Framework pytorch
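
A hedged sketch of the discrimination-aware idea: attach a classification loss to an intermediate layer and score each channel by how strongly that loss pulls on it. The gradient-magnitude scoring below is a simplification of the paper's criterion, which also folds in the reconstruction error:

```python
import torch
import torch.nn as nn

features = torch.randn(32, 64, 8, 8, requires_grad=True)  # maps (N, C, H, W)
labels = torch.randint(0, 10, (32,))

# additional discrimination-aware loss attached to this intermediate layer
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
nn.functional.cross_entropy(head(features), labels).backward()

# score channels by the gradient the extra loss sends through them
importance = features.grad.pow(2).sum(dim=(0, 2, 3)).sqrt()
keep = importance.topk(k=32).indices       # retain the 32 highest-scoring
print(sorted(keep.tolist()))
```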

Domain Adaptation with Adversarial Training and Graph Embeddings

Title Domain Adaptation with Adversarial Training and Graph Embeddings
Authors Firoj Alam, Shafiq Joty, Muhammad Imran
Abstract The success of deep neural networks (DNNs) is heavily dependent on the availability of labeled data. However, obtaining labeled data is a big challenge in many real-world problems. In such scenarios, a DNN model can leverage labeled and unlabeled data from a related domain, but it has to deal with the shift in data distributions between the source and the target domains. In this paper, we study the problem of classifying social media posts during a crisis event (e.g., Earthquake). For that, we use labeled and unlabeled data from past similar events (e.g., Flood) and unlabeled data for the current event. We propose a novel model that performs adversarial learning based domain adaptation to deal with distribution drifts and graph based semi-supervised learning to leverage unlabeled data within a single unified deep learning framework. Our experiments with two real-world crisis datasets collected from Twitter demonstrate significant improvements over several baselines.
Tasks Domain Adaptation
Published 2018-05-14
URL http://arxiv.org/abs/1805.05151v1
PDF http://arxiv.org/pdf/1805.05151v1.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-with-adversarial-training
Repo https://github.com/firojalam/domain-adaptation
Framework tf
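
The adversarial component of such models is commonly implemented with a gradient reversal layer: a domain classifier is trained on the shared features, but its gradient is flipped before reaching the encoder, pushing the features toward domain invariance. A minimal sketch of that standard mechanism (consistent with, though not copied from, the paper):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign going back."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

features = torch.randn(16, 128, requires_grad=True)  # shared encoder output
domain_head = torch.nn.Linear(128, 2)                # source vs. target
domain_logits = domain_head(GradReverse.apply(features))
domain_logits.sum().backward()
print(features.grad.shape)   # gradients reach the encoder with flipped sign
```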

Generating Fine-Grained Open Vocabulary Entity Type Descriptions

Title Generating Fine-Grained Open Vocabulary Entity Type Descriptions
Authors Rajarshi Bhowmik, Gerard de Melo
Abstract While large-scale knowledge graphs provide vast amounts of structured facts about entities, a short textual description can often be useful to succinctly characterize an entity and its type. Unfortunately, many knowledge graph entities lack such textual descriptions. In this paper, we introduce a dynamic memory-based network that generates a short open vocabulary description of an entity by jointly leveraging induced fact embeddings as well as the dynamic context of the generated sequence of words. We demonstrate the ability of our architecture to discern relevant information for more accurate generation of type descriptions by pitting the system against several strong baselines.
Tasks Knowledge Graphs
Published 2018-05-27
URL http://arxiv.org/abs/1805.10564v1
PDF http://arxiv.org/pdf/1805.10564v1.pdf
PWC https://paperswithcode.com/paper/generating-fine-grained-open-vocabulary
Repo https://github.com/kingsaint/Open-vocabulary-entity-type-description
Framework pytorch
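
As rough intuition for fact-conditioned generation, the sketch below attends over embeddings of an entity's facts with the current decoder state to form a context vector for the next word. The dot-product attention and dimensions are illustrative assumptions, not the paper's dynamic memory module:

```python
import torch
import torch.nn.functional as F

fact_embeddings = torch.randn(12, 64)   # one embedding per (s, p, o) fact
decoder_state = torch.randn(64)         # state while generating the next word

scores = fact_embeddings @ decoder_state   # relevance of each fact
weights = F.softmax(scores, dim=0)
context = weights @ fact_embeddings        # fact-aware context vector
print(context.shape)                       # would feed the word predictor
```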

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

Title Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
Authors Barbara Plank, Željko Agić
Abstract We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.
Tasks Part-Of-Speech Tagging
Published 2018-08-29
URL http://arxiv.org/abs/1808.09733v1
PDF http://arxiv.org/pdf/1808.09733v1.pdf
PWC https://paperswithcode.com/paper/distant-supervision-from-disparate-sources
Repo https://github.com/bplank/bilstm-aux
Framework none
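
One of the distant-supervision signals is easy to picture: a tag dictionary that constrains which part-of-speech tags a word may receive. A toy sketch, with dictionary entries and masking scheme invented for illustration:

```python
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ", "DET"]
tag_dict = {"the": {"DET"}, "dog": {"NOUN"}, "runs": {"VERB", "NOUN"}}

def constrained_tag(word, logits):
    """Mask out tags the dictionary rules out, then take the argmax."""
    allowed = tag_dict.get(word, set(TAGS))   # unknown words: unconstrained
    mask = np.array([0.0 if t in allowed else -np.inf for t in TAGS])
    return TAGS[int(np.argmax(logits + mask))]

print(constrained_tag("runs", np.random.randn(len(TAGS))))
```

DsDs combines this signal with annotation projection, instance selection, morphological lexicons, and distributed representations in one uniform framework.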

Knowledge-based Transfer Learning Explanation

Title Knowledge-based Transfer Learning Explanation
Authors Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Ian Horrocks, Huajun Chen
Abstract Machine learning explanation can significantly boost machine learning’s application in decision making, but the usability of current methods is limited in human-centric explanation, especially for transfer learning, an important machine learning branch that aims at utilizing knowledge from one learning domain (i.e., a pair of dataset and prediction task) to enhance prediction model training in another learning domain. In this paper, we propose an ontology-based approach for human-centric explanation of transfer learning. Three kinds of knowledge-based explanatory evidence, with different granularities, including general factors, particular narrators, and core contexts, are first proposed and then inferred with both local ontologies and external knowledge bases. The evaluation with US flight data and DBpedia demonstrates their confidence and availability in explaining the transferability of feature representations in flight departure delay forecasting.
Tasks Decision Making, Transfer Learning
Published 2018-07-22
URL http://arxiv.org/abs/1807.08372v1
PDF http://arxiv.org/pdf/1807.08372v1.pdf
PWC https://paperswithcode.com/paper/knowledge-based-transfer-learning-explanation
Repo https://github.com/ChenJiaoyan/X-TL
Framework tf

ExpNet: Landmark-Free, Deep, 3D Facial Expressions

Title ExpNet: Landmark-Free, Deep, 3D Facial Expressions
Authors Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni
Abstract We describe a deep learning based method for estimating 3D facial expression coefficients. Unlike previous work, our process does not rely on facial landmark detection methods as a proxy step. Recent methods have shown that a CNN can be trained to regress accurate and discriminative 3D morphable model (3DMM) representations, directly from image intensities. By foregoing facial landmark detection, these methods were able to estimate shapes for occluded faces appearing in unprecedented in-the-wild viewing conditions. We build on those methods by showing that facial expressions can also be estimated by a robust, deep, landmark-free approach. Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients. We propose a unique method for collecting data to train this network, leveraging the robustness of deep networks to training label noise. We further offer a novel means of evaluating the accuracy of estimated expression coefficients: by measuring how well they capture facial emotions on the CK+ and EmotiW-17 emotion recognition benchmarks. We show that our ExpNet produces expression coefficients which better discriminate between facial emotions than those obtained using state-of-the-art facial landmark detection techniques. Moreover, this advantage grows as image scales drop, demonstrating that our ExpNet is more robust to scale changes than landmark detection methods. Finally, at the same level of accuracy, our ExpNet is orders of magnitude faster than its alternatives.
Tasks 3D Facial Expression Recognition, Emotion Recognition, Facial Landmark Detection
Published 2018-02-02
URL http://arxiv.org/abs/1802.00542v1
PDF http://arxiv.org/pdf/1802.00542v1.pdf
PWC https://paperswithcode.com/paper/expnet-landmark-free-deep-3d-facial
Repo https://github.com/fengju514/Expression-Net
Framework tf
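
The 29D output plugs into the standard linear 3DMM formulation: a face is the mean shape plus identity and expression blendshapes weighted by their coefficients. A minimal sketch with random stand-ins for the real bases:

```python
import numpy as np

n_vertices = 5000
mean_shape = np.random.randn(3 * n_vertices)        # stand-in for the mean
shape_basis = np.random.randn(3 * n_vertices, 99)   # identity basis (stand-in)
expr_basis = np.random.randn(3 * n_vertices, 29)    # expression basis

alpha = np.random.randn(99)   # identity coefficients, e.g. from a shape CNN
eta = np.random.randn(29)     # the 29D vector ExpNet regresses

face = mean_shape + shape_basis @ alpha + expr_basis @ eta
print(face.reshape(-1, 3).shape)                    # (5000, 3) vertices
```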

Gradient Harmonized Single-stage Detector

Title Gradient Harmonized Single-stage Detector
Authors Buyu Li, Yu Liu, Xiaogang Wang
Abstract Despite the great success of two-stage detectors, the single-stage detector is still a more elegant and efficient approach, yet it suffers from the two well-known disharmonies during training, i.e. the huge difference in quantity between positive and negative examples, as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) to hedge against the two disharmonies. The philosophy behind GHM can be easily embedded into both classification loss functions like cross-entropy (CE) and regression loss functions like the smooth-$L_1$ ($SL_1$) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balance the gradient flow for anchor classification and bounding box refinement, respectively. Ablation studies on MS COCO demonstrate that, without laborious hyper-parameter tuning, both GHM-C and GHM-R can bring substantial improvements for single-stage detectors. Without bells and whistles, our model achieves 41.6 mAP on the COCO test-dev set, surpassing the state-of-the-art method, Focal Loss (FL) + $SL_1$, by 0.8.
Tasks Object Detection
Published 2018-11-13
URL http://arxiv.org/abs/1811.05181v1
PDF http://arxiv.org/pdf/1811.05181v1.pdf
PWC https://paperswithcode.com/paper/gradient-harmonized-single-stage-detector
Repo https://github.com/xialuxi/GHMLoss-caffe
Framework none
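
A hedged sketch of GHM-C's reweighting for binary classification: bin examples by the gradient norm |p - y| and weight each example inversely to its bin's density, so the mass of easy examples (and the few very hard outliers) stop dominating. The bin count and the mean-one normalization below are illustrative choices:

```python
import torch

def ghm_c_weights(pred_prob, target, bins=10):
    """Per-example weights for a binary cross-entropy loss."""
    g = (pred_prob - target).abs()         # gradient norm of CE w.r.t. logit
    n = g.numel()
    edges = torch.linspace(0, 1, bins + 1)
    weights = torch.zeros_like(g)
    for i in range(bins):
        upper = edges[i + 1] if i < bins - 1 else 1.0 + 1e-6
        in_bin = (g >= edges[i]) & (g < upper)
        count = in_bin.sum()
        if count > 0:
            # density ~ count / bin_width, so weight = n / density
            weights[in_bin] = n / (count * bins)
    return weights / weights.mean()        # normalize to mean 1 (a choice)

probs = torch.rand(1000)                   # toy predictions
targets = (torch.rand(1000) > 0.9).float() # mostly negatives, as in detection
print(ghm_c_weights(probs, targets)[:5])
```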

OBOE: Collaborative Filtering for AutoML Model Selection

Title OBOE: Collaborative Filtering for AutoML Model Selection
Authors Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell
Abstract Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
Tasks Active Learning, AutoML, Matrix Completion, Model Selection
Published 2018-08-09
URL https://arxiv.org/abs/1808.03233v2
PDF https://arxiv.org/pdf/1808.03233v2.pdf
PWC https://paperswithcode.com/paper/oboe-collaborative-filtering-for-automl
Repo https://github.com/udellgroup/oboe
Framework none
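
OBOE's core linear algebra can be sketched compactly (this is the idea, not the library's API; the error matrix and probe indices below are random stand-ins): factor a dataset-by-model error matrix into low-rank embeddings, then infer a new dataset's embedding from a few cheap probe models by least squares.

```python
import numpy as np

# offline: factor a (datasets x models) matrix of cross-validated errors
errors = np.random.rand(50, 200)
U, s, Vt = np.linalg.svd(errors, full_matrices=False)
rank = 5
model_emb = Vt[:rank]                      # latent features per model config

# online: a new dataset arrives; run only a few fast probe models on it
true_emb = np.random.randn(rank)           # its unknown latent features
new_errors = true_emb @ model_emb          # ground-truth errors (hidden)
probe = [0, 3, 17, 42, 99]                 # the cheap models actually run
new_emb, *_ = np.linalg.lstsq(model_emb[:, probe].T,
                              new_errors[probe], rcond=None)

predicted = new_emb @ model_emb            # errors predicted for all models
print(np.allclose(predicted, new_errors), int(predicted.argmin()))
```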