October 20, 2019

3015 words · 15 mins read

Paper Group AWR 330

NCRF++: An Open-source Neural Sequence Labeling Toolkit

Title NCRF++: An Open-source Neural Sequence Labeling Toolkit
Authors Jie Yang, Yue Zhang
Abstract This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models, such as LSTM-CRF, facilitating the reproduction and refinement of those methods.
Tasks Chunking, Named Entity Recognition, Part-Of-Speech Tagging
Published 2018-06-14
URL http://arxiv.org/abs/1806.05626v2
PDF http://arxiv.org/pdf/1806.05626v2.pdf
PWC https://paperswithcode.com/paper/ncrf-an-open-source-neural-sequence-labeling
Repo https://github.com/jiesutd/NCRFpp
Framework pytorch
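
The CRF inference layer that NCRF++ builds on boils down to Viterbi decoding over per-token emission scores and a learned tag-transition matrix. As a hedged illustration (not NCRF++'s actual API; the tensor shapes and random inputs below are placeholders), a minimal NumPy version looks like this:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags); transitions[i, j]: score of tag i -> j.
    Returns the highest-scoring tag sequence as a list of tag indices."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i, then moving to tag j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(candidate.argmax(axis=0))
        score = candidate.max(axis=0)
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

emissions = np.random.randn(5, 4)          # 5 tokens, 4 tags (toy values)
transitions = np.random.randn(4, 4)
print(viterbi_decode(emissions, transitions))
```

In NCRF++ itself this decoding runs batched on the GPU, and the configuration file selects which character- and word-level feature extractors produce the emission scores.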

PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image

Title PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image
Authors Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz
Abstract This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss enforcing consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in plane detection, segmentation, and reconstruction metrics. PlaneRCNN makes an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications including robotics, augmented reality, and virtual reality.
Tasks 3D Plane Detection, 3D Reconstruction
Published 2018-12-10
URL http://arxiv.org/abs/1812.04072v2
PDF http://arxiv.org/pdf/1812.04072v2.pdf
PWC https://paperswithcode.com/paper/planercnn-3d-plane-detection-and
Repo https://github.com/NVlabs/planercnn
Framework pytorch
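
As a geometric aside to the abstract above: once a plane's parameters are known, depth at every pixel follows from ray-plane intersection, the kind of relation a plane-aware consistency loss can exploit. A minimal sketch, assuming a pinhole camera with intrinsics K and a plane {X : n·X = d} in camera coordinates (all numeric values are illustrative):

```python
import numpy as np

def plane_depth_map(normal, offset, K, height, width):
    """Depth induced by the plane {X : normal . X = offset} at every pixel."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pixels       # back-projected pixel rays
    denom = normal @ rays                  # n . ray, one value per pixel
    depth = offset / np.where(np.abs(denom) < 1e-8, np.nan, denom)
    return depth.reshape(height, width)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
# a fronto-parallel plane 2 m from the camera yields a constant depth of 2
print(plane_depth_map(np.array([0.0, 0.0, 1.0]), 2.0, K, 480, 640)[0, 0])
```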

Building a Conversational Agent Overnight with Dialogue Self-Play

Title Building a Conversational Agent Overnight with Dialogue Self-Play
Authors Pararth Shah, Dilek Hakkani-Tür, Gokhan Tür, Abhinav Rastogi, Ankur Bapna, Neha Nayak, Larry Heck
Abstract We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz approach for data collection, M2M achieves greater diversity and coverage of salient dialogue flows while maintaining the naturalness of individual utterances. In the first phase, a simulated user bot and a domain-agnostic system bot converse to exhaustively generate dialogue “outlines”, i.e. sequences of template utterances and their semantic parses. In the second phase, crowd workers provide contextual rewrites of the dialogues to make the utterances more natural while preserving their meaning. The entire process can finish within a few hours. We propose a new corpus of 3,000 dialogues spanning 2 domains collected with M2M, and present comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows.
Tasks
Published 2018-01-15
URL http://arxiv.org/abs/1801.04871v1
PDF http://arxiv.org/pdf/1801.04871v1.pdf
PWC https://paperswithcode.com/paper/building-a-conversational-agent-overnight
Repo https://github.com/marcomanciniunitn/Master-Thesis-Project
Framework none
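
To make the two-phase pipeline concrete, here is a deliberately tiny, rule-based sketch of phase one: dialogue self-play producing outlines from a task schema. The schema, dialogue acts, and turn order are invented for illustration; the real M2M bots are far richer:

```python
schema = {"task": "book_movie", "slots": ["movie", "time", "num_tickets"]}

def self_play(schema):
    """A user bot and a system bot exchange template dialogue acts."""
    outline = [("user", f"intent({schema['task']})")]
    for slot in schema["slots"]:
        outline.append(("system", f"request({slot})"))
        outline.append(("user", f"inform({slot}=<{slot}>)"))
    outline.append(("system", "confirm(booking)"))
    outline.append(("user", "affirm()"))
    return outline

for speaker, act in self_play(schema):
    print(f"{speaker}: {act}")
```

Phase two would hand each generated outline to crowd workers, who rewrite the template utterances into natural language while their semantic parses are preserved.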

Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation

Title Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Authors Tanya Nair, Doina Precup, Douglas L. Arnold, Tal Arbel
Abstract Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge datasets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that the uncertainty measures consistently allow us to choose superior operating points compared to only using the network’s sigmoid output as a probability.
Tasks Lesion Segmentation
Published 2018-08-03
URL http://arxiv.org/abs/1808.01200v2
PDF http://arxiv.org/pdf/1808.01200v2.pdf
PWC https://paperswithcode.com/paper/exploring-uncertainty-measures-in-deep
Repo https://github.com/tanyanair/segmentation_uncertainty
Framework tf
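
The core mechanism, MC dropout, is easy to state: keep dropout stochastic at test time, run several forward passes, and treat the spread of the outputs as uncertainty. A minimal PyTorch sketch follows; the stand-in network below is not the paper's 3D segmentation CNN, and variance is just one of the four measures the authors study:

```python
import torch
import torch.nn as nn

# stand-in for the paper's 3D lesion segmentation CNN
model = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Dropout3d(0.5), nn.Conv3d(8, 1, 3, padding=1))

def mc_dropout(model, volume, samples=20):
    """Stochastic forward passes with dropout left on; mean and variance."""
    model.train()                          # keep dropout active at test time
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(volume))
                             for _ in range(samples)])
    return probs.mean(0), probs.var(0)     # prediction / uncertainty maps

volume = torch.randn(1, 1, 16, 32, 32)     # (batch, channel, D, H, W)
mean, var = mc_dropout(model, volume)
print(mean.shape, var.shape)
```

Lesion-wise uncertainty is then obtained by accumulating these voxel-wise values inside each detected lesion, as the abstract describes.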

Scan2CAD: Learning CAD Model Alignment in RGB-D Scans

Title Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
Authors Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Nießner
Abstract We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. For a 3D reconstruction of an indoor scene, our method takes as input a set of CAD models, and predicts a 9DoF pose that aligns each model to the underlying scan geometry. To tackle this problem, we create a new scan-to-CAD alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoint pairs between 14225 CAD models from ShapeNet and their counterpart objects in the scans. Our method selects a set of representative keypoints in a 3D scan for which we find correspondences to the CAD geometry. To this end, we design a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects, and from this predicts a correspondence heatmap. Based on these correspondence heatmaps, we formulate a variational energy minimization that aligns a given set of CAD models to the reconstruction. We evaluate our approach on our newly introduced Scan2CAD benchmark, where we outperform both handcrafted feature descriptors and state-of-the-art CNN-based methods by 21.39%.
Tasks 3D Reconstruction
Published 2018-11-27
URL http://arxiv.org/abs/1811.11187v1
PDF http://arxiv.org/pdf/1811.11187v1.pdf
PWC https://paperswithcode.com/paper/scan2cad-learning-cad-model-alignment-in-rgb
Repo https://github.com/skanti/Scan2CAD
Framework pytorch
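
For intuition about the alignment step: once scan-to-CAD keypoint correspondences exist, a similarity transform can be fit to them in closed form. The sketch below uses the classic Umeyama solution with a single isotropic scale, a simplification of the paper's 9DoF pose (which allows anisotropic scale) and of its variational energy minimization:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Closed-form similarity transform (scale, R, t) mapping src onto dst.
    src, dst: (N, 3) arrays of corresponding keypoints."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # avoid reflections
    R = U @ S @ Vt
    var_s = ((src - mu_s) ** 2).sum() / len(src)
    scale = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t

src = np.random.randn(10, 3)
dst = 0.5 * src + np.array([1.0, 2.0, 3.0])   # known transform, R = I
scale, R, t = umeyama_alignment(src, dst)
print(round(scale, 3), t.round(3))            # recovers 0.5 and the shift
```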

Grid R-CNN

Title Grid R-CNN
Authors Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan
Abstract This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid-guided localization mechanism for accurate object detection. Unlike traditional regression-based methods, Grid R-CNN captures spatial information explicitly and enjoys the position-sensitive property of fully convolutional architectures. Instead of using only two independent points, we design a multi-point supervision formulation to encode more clues, reducing the impact of inaccurate predictions of specific points. To take full advantage of the correlation of points in a grid, we propose a two-stage information fusion strategy to fuse feature maps of neighboring grid points. The grid-guided localization approach is easily extended to different state-of-the-art detection frameworks. Grid R-CNN leads to high-quality object localization, and experiments demonstrate that it achieves a 4.1% AP gain at IoU=0.8 and a 10.0% AP gain at IoU=0.9 on the COCO benchmark compared to Faster R-CNN with a Res50 backbone and FPN architecture.
Tasks Object Detection, Object Localization
Published 2018-11-29
URL http://arxiv.org/abs/1811.12030v1
PDF http://arxiv.org/pdf/1811.12030v1.pdf
PWC https://paperswithcode.com/paper/grid-r-cnn
Repo https://github.com/STVIR/Grid-R-CNN
Framework pytorch
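
To illustrate grid-guided localization, the toy sketch below reads each grid point's location from the argmax of its predicted heatmap and fuses the boundary points of a 3x3 grid into a box. The median fusion and raw heatmap coordinates are illustrative simplifications; the paper additionally fuses feature maps of neighboring points and maps coordinates back through the RoI:

```python
import numpy as np

def grid_points_to_box(heatmaps):
    """heatmaps: (9, H, W), one per point of a 3x3 grid in row-major order."""
    pts = np.array([np.unravel_index(h.argmax(), h.shape) for h in heatmaps])
    ys, xs = pts[:, 0], pts[:, 1]
    # rows 0 and 2 of the grid bound y; columns 0 and 2 bound x
    top    = np.median(ys[[0, 1, 2]])
    bottom = np.median(ys[[6, 7, 8]])
    left   = np.median(xs[[0, 3, 6]])
    right  = np.median(xs[[2, 5, 8]])
    return left, top, right, bottom

heatmaps = np.random.rand(9, 56, 56)       # toy per-point heatmaps
print(grid_points_to_box(heatmaps))
```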

Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder

Title Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder
Authors Jeffrey M. Ede
Abstract We present an atrous convolutional encoder-decoder trained to denoise 512$\times$512 crops from electron micrographs. It consists of a modified Xception backbone, an atrous convolutional spatial pyramid pooling module, and a multi-stage decoder. Our neural network was trained end-to-end to remove Poisson noise applied to low-dose ($\ll$ 300 counts ppx) micrographs created from a new dataset of 17267 2048$\times$2048 high-dose ($>$ 2500 counts ppx) micrographs, and then fine-tuned for ordinary doses (200-2500 counts ppx). Its performance is benchmarked against bilateral, non-local means, total variation, wavelet, Wiener and other restoration methods with their default parameters. Our network outperforms their best mean squared error and structural similarity index performances by 24.6% and 9.6% for low doses and by 43.7% and 5.5% for ordinary doses. In both cases, our network’s mean squared error has the lowest variance. Source code and links to our new high-quality dataset and trained network have been made publicly available at https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11234v2
PDF http://arxiv.org/pdf/1807.11234v2.pdf
PWC https://paperswithcode.com/paper/improving-electron-micrograph-signal-to-noise
Repo https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser
Framework tf
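
The low-dose training data described above can be simulated from high-dose micrographs along these lines (an assumption about the general procedure, not the author's exact code): rescale to a target mean count per pixel and sample Poisson noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_low_dose(micrograph, counts_ppx=200):
    """micrograph: non-negative 2D array; returns a noisy copy whose
    statistics match a mean electron count of counts_ppx per pixel."""
    scale = counts_ppx / micrograph.mean()
    noisy = rng.poisson(micrograph * scale).astype(np.float64)
    return noisy / scale                   # back to original intensity range

clean = np.abs(rng.standard_normal((512, 512))) + 1.0   # stand-in micrograph
print(simulate_low_dose(clean).mean())     # close to clean.mean(), but noisy
```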

Discrimination-aware Channel Pruning for Deep Neural Networks

Title Discrimination-aware Channel Pruning for Deep Neural Networks
Authors Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
Abstract Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose the channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers, and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Finally, we propose a greedy algorithm to conduct channel selection and parameter optimization iteratively. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with a 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.
Tasks Model Compression
Published 2018-10-28
URL http://arxiv.org/abs/1810.11809v3
PDF http://arxiv.org/pdf/1810.11809v3.pdf
PWC https://paperswithcode.com/paper/discrimination-aware-channel-pruning-for-deep
Repo https://github.com/SCUT-AILab/DCP
Framework pytorch
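
A hedged sketch of the discrimination-aware idea: attach a classification loss to an intermediate layer and score each channel by how strongly that loss pulls on it. The gradient-magnitude scoring below is a simplification of the paper's criterion, which also folds in the reconstruction error:

```python
import torch
import torch.nn as nn

features = torch.randn(32, 64, 8, 8, requires_grad=True)  # maps (N, C, H, W)
labels = torch.randint(0, 10, (32,))

# additional discrimination-aware loss attached to this intermediate layer
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))
nn.functional.cross_entropy(head(features), labels).backward()

# score channels by the gradient the extra loss sends through them
importance = features.grad.pow(2).sum(dim=(0, 2, 3)).sqrt()
keep = importance.topk(k=32).indices       # retain the 32 highest-scoring
print(sorted(keep.tolist()))
```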

Domain Adaptation with Adversarial Training and Graph Embeddings

Title Domain Adaptation with Adversarial Training and Graph Embeddings
Authors Firoj Alam, Shafiq Joty, Muhammad Imran
Abstract The success of deep neural networks (DNNs) is heavily dependent on the availability of labeled data. However, obtaining labeled data is a big challenge in many real-world problems. In such scenarios, a DNN model can leverage labeled and unlabeled data from a related domain, but it has to deal with the shift in data distributions between the source and the target domains. In this paper, we study the problem of classifying social media posts during a crisis event (e.g., Earthquake). For that, we use labeled and unlabeled data from past similar events (e.g., Flood) and unlabeled data for the current event. We propose a novel model that performs adversarial learning based domain adaptation to deal with distribution drifts and graph based semi-supervised learning to leverage unlabeled data within a single unified deep learning framework. Our experiments with two real-world crisis datasets collected from Twitter demonstrate significant improvements over several baselines.
Tasks Domain Adaptation
Published 2018-05-14
URL http://arxiv.org/abs/1805.05151v1
PDF http://arxiv.org/pdf/1805.05151v1.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-with-adversarial-training
Repo https://github.com/firojalam/domain-adaptation
Framework tf
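
The adversarial component of such models is commonly implemented with a gradient reversal layer: a domain classifier is trained on the shared features, but its gradient is flipped before reaching the encoder, pushing the features toward domain invariance. A minimal sketch of that standard mechanism (consistent with, though not copied from, the paper):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign going back."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

features = torch.randn(16, 128, requires_grad=True)  # shared encoder output
domain_head = torch.nn.Linear(128, 2)                # source vs. target
domain_logits = domain_head(GradReverse.apply(features))
domain_logits.sum().backward()
print(features.grad.shape)   # gradients reach the encoder with flipped sign
```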

Generating Fine-Grained Open Vocabulary Entity Type Descriptions

Title Generating Fine-Grained Open Vocabulary Entity Type Descriptions
Authors Rajarshi Bhowmik, Gerard de Melo
Abstract While large-scale knowledge graphs provide vast amounts of structured facts about entities, a short textual description can often be useful to succinctly characterize an entity and its type. Unfortunately, many knowledge graph entities lack such textual descriptions. In this paper, we introduce a dynamic memory-based network that generates a short open vocabulary description of an entity by jointly leveraging induced fact embeddings as well as the dynamic context of the generated sequence of words. We demonstrate the ability of our architecture to discern relevant information for more accurate generation of type descriptions by pitting the system against several strong baselines.
Tasks Knowledge Graphs
Published 2018-05-27
URL http://arxiv.org/abs/1805.10564v1
PDF http://arxiv.org/pdf/1805.10564v1.pdf
PWC https://paperswithcode.com/paper/generating-fine-grained-open-vocabulary
Repo https://github.com/kingsaint/Open-vocabulary-entity-type-description
Framework pytorch
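
As rough intuition for fact-conditioned generation, the sketch below attends over embeddings of an entity's facts with the current decoder state to form a context vector for the next word. The dot-product attention and dimensions are illustrative assumptions, not the paper's dynamic memory module:

```python
import torch
import torch.nn.functional as F

fact_embeddings = torch.randn(12, 64)   # one embedding per (s, p, o) fact
decoder_state = torch.randn(64)         # state while generating the next word

scores = fact_embeddings @ decoder_state   # relevance of each fact
weights = F.softmax(scores, dim=0)
context = weights @ fact_embeddings        # fact-aware context vector
print(context.shape)                       # would feed the word predictor
```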

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

Title Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
Authors Barbara Plank, Željko Agić
Abstract We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.
Tasks Part-Of-Speech Tagging
Published 2018-08-29
URL http://arxiv.org/abs/1808.09733v1
PDF http://arxiv.org/pdf/1808.09733v1.pdf
PWC https://paperswithcode.com/paper/distant-supervision-from-disparate-sources
Repo https://github.com/bplank/bilstm-aux
Framework none
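
One of the distant-supervision signals is easy to picture: a tag dictionary that constrains which part-of-speech tags a word may receive. A toy sketch, with dictionary entries and masking scheme invented for illustration:

```python
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ", "DET"]
tag_dict = {"the": {"DET"}, "dog": {"NOUN"}, "runs": {"VERB", "NOUN"}}

def constrained_tag(word, logits):
    """Mask out tags the dictionary rules out, then take the argmax."""
    allowed = tag_dict.get(word, set(TAGS))   # unknown words: unconstrained
    mask = np.array([0.0 if t in allowed else -np.inf for t in TAGS])
    return TAGS[int(np.argmax(logits + mask))]

print(constrained_tag("runs", np.random.randn(len(TAGS))))
```

DsDs combines this signal with annotation projection, instance selection, morphological lexicons, and distributed representations in one uniform framework.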

Knowledge-based Transfer Learning Explanation

Title Knowledge-based Transfer Learning Explanation
Authors Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Ian Horrocks, Huajun Chen
Abstract Machine learning explanation can significantly boost machine learning’s application in decision making, but the usability of current methods is limited in human-centric explanation, especially for transfer learning, an important machine learning branch that aims at utilizing knowledge from one learning domain (i.e., a pair of dataset and prediction task) to enhance prediction model training in another learning domain. In this paper, we propose an ontology-based approach for human-centric explanation of transfer learning. Three kinds of knowledge-based explanatory evidence, with different granularities, including general factors, particular narrators, and core contexts, are first proposed and then inferred with both local ontologies and external knowledge bases. The evaluation with US flight data and DBpedia demonstrates their confidence and availability in explaining the transferability of feature representations in flight departure delay forecasting.
Tasks Decision Making, Transfer Learning
Published 2018-07-22
URL http://arxiv.org/abs/1807.08372v1
PDF http://arxiv.org/pdf/1807.08372v1.pdf
PWC https://paperswithcode.com/paper/knowledge-based-transfer-learning-explanation
Repo https://github.com/ChenJiaoyan/X-TL
Framework tf

ExpNet: Landmark-Free, Deep, 3D Facial Expressions

Title ExpNet: Landmark-Free, Deep, 3D Facial Expressions
Authors Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni
Abstract We describe a deep learning based method for estimating 3D facial expression coefficients. Unlike previous work, our process does not rely on facial landmark detection methods as a proxy step. Recent methods have shown that a CNN can be trained to regress accurate and discriminative 3D morphable model (3DMM) representations, directly from image intensities. By foregoing facial landmark detection, these methods were able to estimate shapes for occluded faces appearing in unprecedented in-the-wild viewing conditions. We build on those methods by showing that facial expressions can also be estimated by a robust, deep, landmark-free approach. Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients. We propose a unique method for collecting data to train this network, leveraging the robustness of deep networks to training label noise. We further offer a novel means of evaluating the accuracy of estimated expression coefficients: by measuring how well they capture facial emotions on the CK+ and EmotiW-17 emotion recognition benchmarks. We show that our ExpNet produces expression coefficients which better discriminate between facial emotions than those obtained using state-of-the-art facial landmark detection techniques. Moreover, this advantage grows as image scales drop, demonstrating that our ExpNet is more robust to scale changes than landmark detection methods. Finally, at the same level of accuracy, our ExpNet is orders of magnitude faster than its alternatives.
Tasks 3D Facial Expression Recognition, Emotion Recognition, Facial Landmark Detection
Published 2018-02-02
URL http://arxiv.org/abs/1802.00542v1
PDF http://arxiv.org/pdf/1802.00542v1.pdf
PWC https://paperswithcode.com/paper/expnet-landmark-free-deep-3d-facial
Repo https://github.com/fengju514/Expression-Net
Framework tf
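
The 29D output plugs into the standard linear 3DMM formulation: a face is the mean shape plus identity and expression blendshapes weighted by their coefficients. A minimal sketch with random stand-ins for the real bases:

```python
import numpy as np

n_vertices = 5000
mean_shape = np.random.randn(3 * n_vertices)        # stand-in for the mean
shape_basis = np.random.randn(3 * n_vertices, 99)   # identity basis (stand-in)
expr_basis = np.random.randn(3 * n_vertices, 29)    # expression basis

alpha = np.random.randn(99)   # identity coefficients, e.g. from a shape CNN
eta = np.random.randn(29)     # the 29D vector ExpNet regresses

face = mean_shape + shape_basis @ alpha + expr_basis @ eta
print(face.reshape(-1, 3).shape)                    # (5000, 3) vertices
```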

Gradient Harmonized Single-stage Detector

Title Gradient Harmonized Single-stage Detector
Authors Buyu Li, Yu Liu, Xiaogang Wang
Abstract Despite the great success of two-stage detectors, the single-stage detector is still a more elegant and efficient approach, yet it suffers from the two well-known disharmonies during training, i.e. the huge difference in quantity between positive and negative examples, as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) to hedge against the two disharmonies. The philosophy behind GHM can be easily embedded into both classification loss functions like cross-entropy (CE) and regression loss functions like the smooth-$L_1$ ($SL_1$) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balance the gradient flow for anchor classification and bounding box refinement, respectively. Ablation studies on MS COCO demonstrate that, without laborious hyper-parameter tuning, both GHM-C and GHM-R can bring substantial improvements for single-stage detectors. Without bells and whistles, our model achieves 41.6 mAP on the COCO test-dev set, surpassing the state-of-the-art method, Focal Loss (FL) + $SL_1$, by 0.8.
Tasks Object Detection
Published 2018-11-13
URL http://arxiv.org/abs/1811.05181v1
PDF http://arxiv.org/pdf/1811.05181v1.pdf
PWC https://paperswithcode.com/paper/gradient-harmonized-single-stage-detector
Repo https://github.com/xialuxi/GHMLoss-caffe
Framework none
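
A hedged sketch of GHM-C's reweighting for binary classification: bin examples by the gradient norm |p - y| and weight each example inversely to its bin's density, so the mass of easy examples (and the few very hard outliers) stop dominating. The bin count and the mean-one normalization below are illustrative choices:

```python
import torch

def ghm_c_weights(pred_prob, target, bins=10):
    """Per-example weights for a binary cross-entropy loss."""
    g = (pred_prob - target).abs()         # gradient norm of CE w.r.t. logit
    n = g.numel()
    edges = torch.linspace(0, 1, bins + 1)
    weights = torch.zeros_like(g)
    for i in range(bins):
        upper = edges[i + 1] if i < bins - 1 else 1.0 + 1e-6
        in_bin = (g >= edges[i]) & (g < upper)
        count = in_bin.sum()
        if count > 0:
            # density ~ count / bin_width, so weight = n / density
            weights[in_bin] = n / (count * bins)
    return weights / weights.mean()        # normalize to mean 1 (a choice)

probs = torch.rand(1000)                   # toy predictions
targets = (torch.rand(1000) > 0.9).float() # mostly negatives, as in detection
print(ghm_c_weights(probs, targets)[:5])
```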

OBOE: Collaborative Filtering for AutoML Model Selection

Title OBOE: Collaborative Filtering for AutoML Model Selection
Authors Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell
Abstract Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
Tasks Active Learning, AutoML, Matrix Completion, Model Selection
Published 2018-08-09
URL https://arxiv.org/abs/1808.03233v2
PDF https://arxiv.org/pdf/1808.03233v2.pdf
PWC https://paperswithcode.com/paper/oboe-collaborative-filtering-for-automl
Repo https://github.com/udellgroup/oboe
Framework none
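
OBOE's core linear algebra can be sketched compactly (this is the idea, not the library's API; the error matrix and probe indices below are random stand-ins): factor a dataset-by-model error matrix into low-rank embeddings, then infer a new dataset's embedding from a few cheap probe models by least squares.

```python
import numpy as np

# offline: factor a (datasets x models) matrix of cross-validated errors
errors = np.random.rand(50, 200)
U, s, Vt = np.linalg.svd(errors, full_matrices=False)
rank = 5
model_emb = Vt[:rank]                      # latent features per model config

# online: a new dataset arrives; run only a few fast probe models on it
true_emb = np.random.randn(rank)           # its unknown latent features
new_errors = true_emb @ model_emb          # ground-truth errors (hidden)
probe = [0, 3, 17, 42, 99]                 # the cheap models actually run
new_emb, *_ = np.linalg.lstsq(model_emb[:, probe].T,
                              new_errors[probe], rcond=None)

predicted = new_emb @ model_emb            # errors predicted for all models
print(np.allclose(predicted, new_errors), int(predicted.argmin()))
```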