Paper Group AWR 330
NCRF++: An Open-source Neural Sequence Labeling Toolkit. PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image. Building a Conversational Agent Overnight with Dialogue Self-Play. Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation. Scan2CAD: Learning CAD Model Alignment in RGB-D Scans …
NCRF++: An Open-source Neural Sequence Labeling Toolkit
Title | NCRF++: An Open-source Neural Sequence Labeling Toolkit |
Authors | Jie Yang, Yue Zhang |
Abstract | This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building custom model structures through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models such as LSTM-CRF, facilitating the reproduction and refinement of those methods. |
Tasks | Chunking, Named Entity Recognition, Part-Of-Speech Tagging |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05626v2 |
http://arxiv.org/pdf/1806.05626v2.pdf | |
PWC | https://paperswithcode.com/paper/ncrf-an-open-source-neural-sequence-labeling |
Repo | https://github.com/jiesutd/NCRFpp |
Framework | pytorch |
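Since the defining component of NCRF++-style models is the CRF inference layer, here is a minimal sketch of Viterbi decoding for a linear-chain CRF. It illustrates the standard algorithm rather than the toolkit's own code, and the emission and transition scores are random stand-ins for what a BiLSTM encoder and a learned transition matrix would provide.

```python
import torch

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags); transitions: (num_tags, num_tags)."""
    seq_len, num_tags = emissions.shape
    score = emissions[0]                       # best score ending in each tag at t = 0
    backpointers = []
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + emissions[t, j] for every tag pair (i, j)
        broadcast = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = broadcast.max(dim=0)
        backpointers.append(best_prev)
    # Trace back the highest-scoring tag sequence.
    best_last = int(score.argmax())
    best_path = [best_last]
    for best_prev in reversed(backpointers):
        best_last = int(best_prev[best_last])
        best_path.append(best_last)
    return list(reversed(best_path)), float(score.max())

# Toy usage: 5 time steps, 3 tags, random scores.
path, path_score = viterbi_decode(torch.randn(5, 3), torch.randn(3, 3))
```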
PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image
Title | PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image |
Authors | Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz |
Abstract | This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. PlaneRCNN then jointly refines all the segmentation masks with a novel loss enforcing the consistency with a nearby view during training. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in plane detection, segmentation, and reconstruction metrics. PlaneRCNN makes an important step towards robust plane extraction, which would have an immediate impact on a wide range of applications including Robotics, Augmented Reality, and Virtual Reality. |
Tasks | 3D Plane Detection, 3D Reconstruction |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.04072v2 |
http://arxiv.org/pdf/1812.04072v2.pdf | |
PWC | https://paperswithcode.com/paper/planercnn-3d-plane-detection-and |
Repo | https://github.com/NVlabs/planercnn |
Framework | pytorch |
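To make the plane parameterization concrete, the sketch below converts one detected plane, written as n·X = d with unit normal n and offset d, into the depth map it implies under a pinhole camera. This is a generic geometric illustration with assumed intrinsics, not PlaneRCNN code.

```python
import numpy as np

def plane_to_depth(normal, offset, K, height, width):
    """normal: unit 3-vector n; offset: scalar d with n.X = d; K: 3x3 intrinsics."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pixels                                      # 3 x N camera rays (z = 1)
    denom = normal @ rays                                                 # n . ray, per pixel
    depth = np.where(np.abs(denom) > 1e-6, offset / denom, 0.0)           # z such that n.(z*ray) = d
    return depth.reshape(height, width)

# Example intrinsics and a fronto-parallel plane 2 m in front of the camera.
K = np.array([[320.0, 0, 320], [0, 320, 240], [0, 0, 1]])
depth = plane_to_depth(np.array([0.0, 0.0, 1.0]), 2.0, K, 480, 640)
```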
Building a Conversational Agent Overnight with Dialogue Self-Play
Title | Building a Conversational Agent Overnight with Dialogue Self-Play |
Authors | Pararth Shah, Dilek Hakkani-Tür, Gokhan Tür, Abhinav Rastogi, Ankur Bapna, Neha Nayak, Larry Heck |
Abstract | We propose Machines Talking To Machines (M2M), a framework combining automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with just a task schema and an API client from the dialogue system developer, but it is also customizable to cater to task-specific interactions. Compared to the Wizard-of-Oz approach for data collection, M2M achieves greater diversity and coverage of salient dialogue flows while maintaining the naturalness of individual utterances. In the first phase, a simulated user bot and a domain-agnostic system bot converse to exhaustively generate dialogue “outlines”, i.e. sequences of template utterances and their semantic parses. In the second phase, crowd workers provide contextual rewrites of the dialogues to make the utterances more natural while preserving their meaning. The entire process can finish within a few hours. We propose a new corpus of 3,000 dialogues spanning 2 domains collected with M2M, and present comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows. |
Tasks | |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.04871v1 |
http://arxiv.org/pdf/1801.04871v1.pdf | |
PWC | https://paperswithcode.com/paper/building-a-conversational-agent-overnight |
Repo | https://github.com/marcomanciniunitn/Master-Thesis-Project |
Framework | none |
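The outline-generation phase can be pictured with a toy self-play loop: a rule-based user bot holding a goal and a schema-driven system bot exchange dialogue acts until the task's slots are filled, producing a template-level outline that crowd workers would later paraphrase. The schema, slot names, and act labels below are invented for illustration and are not the M2M framework's actual API.

```python
# Hypothetical task schema and user goal for a movie-booking domain.
schema = {"task": "book_movie", "slots": ["movie", "date", "num_tickets"]}
user_goal = {"movie": "<movie_1>", "date": "<date_1>", "num_tickets": "<number_1>"}

def self_play(schema, goal):
    outline = [("user", "inform_intent", {"task": schema["task"]})]
    filled = {}
    for slot in schema["slots"]:                     # the system bot requests each unfilled slot
        outline.append(("system", "request", {"slot": slot}))
        filled[slot] = goal[slot]                    # the user bot answers from its goal
        outline.append(("user", "inform", {slot: goal[slot]}))
    outline.append(("system", "confirm", dict(filled)))
    outline.append(("user", "affirm", {}))
    return outline                                   # template acts, later rewritten by crowd workers

for turn in self_play(schema, user_goal):
    print(turn)
```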
Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation
Title | Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation |
Authors | Tanya Nair, Doina Precup, Douglas L. Arnold, Tal Arbel |
Abstract | Deep learning (DL) networks have recently been shown to outperform other segmentation methods on various public, medical-image challenge datasets [3,11,16], especially for large pathologies. However, in the context of diseases such as Multiple Sclerosis (MS), monitoring all the focal lesions visible on MRI sequences, even very small ones, is essential for disease staging, prognosis, and evaluating treatment efficacy. Moreover, producing deterministic outputs hinders DL adoption into clinical routines. Uncertainty estimates for the predictions would permit subsequent revision by clinicians. We present the first exploration of multiple uncertainty estimates based on Monte Carlo (MC) dropout [4] in the context of deep networks for lesion detection and segmentation in medical images. Specifically, we develop a 3D MS lesion segmentation CNN, augmented to provide four different voxel-based uncertainty measures based on MC dropout. We train the network on a proprietary, large-scale, multi-site, multi-scanner, clinical MS dataset, and compute lesion-wise uncertainties by accumulating evidence from voxel-wise uncertainties within detected lesions. We analyze the performance of voxel-based segmentation and lesion-level detection by choosing operating points based on the uncertainty. Empirical evidence suggests that uncertainty measures consistently allow us to choose superior operating points compared to only using the network’s sigmoid output as a probability. |
Tasks | Lesion Segmentation |
Published | 2018-08-03 |
URL | http://arxiv.org/abs/1808.01200v2 |
http://arxiv.org/pdf/1808.01200v2.pdf | |
PWC | https://paperswithcode.com/paper/exploring-uncertainty-measures-in-deep |
Repo | https://github.com/tanyanair/segmentation_uncertainty |
Framework | tf |
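A minimal sketch of the MC dropout recipe the paper builds on: keep dropout stochastic at test time, run several forward passes, and derive voxel-wise uncertainty measures such as predictive entropy and sample variance. The tiny stand-in model below only makes the snippet runnable; the paper's network is a full 3D MS lesion segmentation CNN.

```python
import torch

def mc_dropout_uncertainty(model, x, num_samples=20):
    model.train()                      # keep dropout layers stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(num_samples)])  # T stochastic passes
    mean_p = probs.mean(dim=0)
    entropy = -(mean_p * torch.log(mean_p + 1e-8)
                + (1 - mean_p) * torch.log(1 - mean_p + 1e-8))   # voxel-wise predictive entropy
    variance = probs.var(dim=0)                                  # voxel-wise sample variance
    return mean_p, entropy, variance

# Toy usage with a stand-in 3D network: dropout makes repeated passes differ.
model = torch.nn.Sequential(torch.nn.Conv3d(1, 4, 3, padding=1), torch.nn.ReLU(),
                            torch.nn.Dropout3d(0.5), torch.nn.Conv3d(4, 1, 1))
mean_p, entropy, variance = mc_dropout_uncertainty(model, torch.randn(1, 1, 8, 32, 32))
```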
Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
Title | Scan2CAD: Learning CAD Model Alignment in RGB-D Scans |
Authors | Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Nießner |
Abstract | We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. For a 3D reconstruction of an indoor scene, our method takes as input a set of CAD models, and predicts a 9DoF pose that aligns each model to the underlying scan geometry. To tackle this problem, we create a new scan-to-CAD alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoint pairs between 14225 CAD models from ShapeNet and their counterpart objects in the scans. Our method selects a set of representative keypoints in a 3D scan for which we find correspondences to the CAD geometry. To this end, we design a novel 3D CNN architecture that learns a joint embedding between real and synthetic objects, and from this predicts a correspondence heatmap. Based on these correspondence heatmaps, we formulate a variational energy minimization that aligns a given set of CAD models to the reconstruction. We evaluate our approach on our newly introduced Scan2CAD benchmark where we outperform both handcrafted feature descriptors and state-of-the-art CNN-based methods by 21.39%. |
Tasks | 3D Reconstruction |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11187v1 |
http://arxiv.org/pdf/1811.11187v1.pdf | |
PWC | https://paperswithcode.com/paper/scan2cad-learning-cad-model-alignment-in-rgb |
Repo | https://github.com/skanti/Scan2CAD |
Framework | pytorch |
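Once keypoint correspondences between scan and CAD geometry are available, a pose can be estimated from them. The sketch below uses the closed-form Umeyama similarity transform (rotation, translation, and one uniform scale) as a simplified stand-in for the paper's full 9DoF alignment with anisotropic scale and variational energy minimization; the correspondences are synthetic.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """src, dst: N x 3 corresponding points; returns scale s, rotation R, translation t
    minimising ||dst - (s * R @ src + t)||^2."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # avoid reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_dst - scale * R @ mu_src
    return scale, R, t

# Toy check: recover a known pose from noiseless correspondences.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
src = np.random.rand(50, 3)
dst = 2.0 * src @ R_true.T + np.array([1.0, -0.5, 0.3])
scale, R, t = umeyama_alignment(src, dst)
```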
Grid R-CNN
Title | Grid R-CNN |
Authors | Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan |
Abstract | This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid guided localization mechanism for accurate object detection. Different from traditional regression-based methods, Grid R-CNN captures spatial information explicitly and enjoys the position-sensitive property of fully convolutional architectures. Instead of using only two independent points, we design a multi-point supervision formulation to encode more clues in order to reduce the impact of inaccurate prediction of specific points. To take full advantage of the correlation of points in a grid, we propose a two-stage information fusion strategy to fuse feature maps of neighboring grid points. The grid guided localization approach is easily extended to different state-of-the-art detection frameworks. Grid R-CNN leads to high quality object localization, and experiments demonstrate that it achieves a 4.1% AP gain at IoU=0.8 and a 10.0% AP gain at IoU=0.9 on the COCO benchmark compared to Faster R-CNN with a Res50 backbone and FPN architecture. |
Tasks | Object Detection, Object Localization |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12030v1 |
http://arxiv.org/pdf/1811.12030v1.pdf | |
PWC | https://paperswithcode.com/paper/grid-r-cnn |
Repo | https://github.com/STVIR/Grid-R-CNN |
Framework | pytorch |
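A simplified sketch of grid-guided localization: each grid point is predicted as a heatmap over the RoI, the argmax of each heatmap is mapped back to image coordinates, and the box edges are read off the outer points. The paper's confidence-weighted fusion of the edge points and the two-stage feature fusion between neighboring grid points are omitted here.

```python
import torch

def grid_points_to_box(heatmaps, roi):
    """heatmaps: (9, H, W) grid-point heatmaps for one RoI; roi: (x1, y1, x2, y2)."""
    num_points, H, W = heatmaps.shape
    x1, y1, x2, y2 = roi
    flat_idx = heatmaps.view(num_points, -1).argmax(dim=1)
    ys = torch.div(flat_idx, W, rounding_mode="floor")        # heatmap row of each grid point
    xs = flat_idx % W                                         # heatmap column of each grid point
    img_x = x1 + (xs.float() + 0.5) / W * (x2 - x1)           # map back to image coordinates
    img_y = y1 + (ys.float() + 0.5) / H * (y2 - y1)
    # Simplification: take the extreme points as box edges instead of confidence-weighted fusion.
    return torch.stack([img_x.min(), img_y.min(), img_x.max(), img_y.max()])

box = grid_points_to_box(torch.rand(9, 56, 56), torch.tensor([100.0, 80.0, 300.0, 240.0]))
```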
Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder
Title | Improving Electron Micrograph Signal-to-Noise with an Atrous Convolutional Encoder-Decoder |
Authors | Jeffrey M. Ede |
Abstract | We present an atrous convolutional encoder-decoder trained to denoise 512$\times$512 crops from electron micrographs. It consists of a modified Xception backbone, an atrous convolutional spatial pyramid pooling module and a multi-stage decoder. Our neural network was trained end-to-end to remove Poisson noise applied to low-dose ($\ll$ 300 counts ppx) micrographs created from a new dataset of 17267 2048$\times$2048 high-dose ($>$ 2500 counts ppx) micrographs and then fine-tuned for ordinary doses (200-2500 counts ppx). Its performance is benchmarked against bilateral, non-local means, total variation, wavelet, Wiener and other restoration methods with their default parameters. Our network outperforms their best mean squared error and structural similarity index performances by 24.6% and 9.6% for low doses and by 43.7% and 5.5% for ordinary doses. In both cases, our network’s mean squared error has the lowest variance. Source code and links to our new high-quality dataset and trained network have been made publicly available at https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser |
Tasks | |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11234v2 |
http://arxiv.org/pdf/1807.11234v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-electron-micrograph-signal-to-noise |
Repo | https://github.com/Jeffrey-Ede/Electron-Micrograph-Denoiser |
Framework | tf |
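The training data generation hinges on applying Poisson noise at a chosen dose. The sketch below shows that step generically (it is not the paper's pipeline): intensities are scaled to expected electron counts per pixel, counts are sampled from a Poisson distribution, and the result is rescaled back to the original range.

```python
import numpy as np

def add_poisson_noise(clean, counts_per_pixel):
    """clean: float image in [0, 1]; counts_per_pixel: expected counts for a pixel of intensity 1."""
    expected_counts = clean * counts_per_pixel
    noisy_counts = np.random.poisson(expected_counts)   # shot noise at the chosen dose
    return noisy_counts.astype(np.float64) / counts_per_pixel

clean = np.clip(np.random.rand(512, 512), 0, 1)         # stand-in for a 512x512 high-dose crop
low_dose = add_poisson_noise(clean, counts_per_pixel=100)
```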
Discrimination-aware Channel Pruning for Deep Neural Networks
Title | Discrimination-aware Channel Pruning for Deep Neural Networks |
Authors | Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu |
Abstract | Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy. |
Tasks | Model Compression |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11809v3 |
http://arxiv.org/pdf/1810.11809v3.pdf | |
PWC | https://paperswithcode.com/paper/discrimination-aware-channel-pruning-for-deep |
Repo | https://github.com/SCUT-AILab/DCP |
Framework | pytorch |
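A rough sketch of the channel-selection criterion described in the abstract, under simplifying assumptions: for one layer, a reconstruction loss against the pre-trained feature maps is combined with an auxiliary classification loss, and input channels are ranked by the squared gradient norm of the joint loss with respect to the weight slice that reads each channel. The layer sizes, auxiliary head, and single-step selection are placeholders, not the released DCP implementation.

```python
import torch
import torch.nn.functional as F

def select_channels(conv, features, target_features, labels, aux_head, keep_ratio=0.7, lam=1.0):
    conv.zero_grad()
    out = conv(features)
    recon = F.mse_loss(out, target_features)                           # match pre-trained feature maps
    disc = F.cross_entropy(aux_head(out.mean(dim=(2, 3))), labels)     # added discriminative loss
    (recon + lam * disc).backward()
    grad = conv.weight.grad                                            # (out_c, in_c, kH, kW)
    importance = grad.pow(2).sum(dim=(0, 2, 3))                        # one score per input channel
    k = max(1, int(keep_ratio * importance.numel()))
    return importance.topk(k).indices                                  # channels to keep

conv = torch.nn.Conv2d(16, 32, 3, padding=1)
aux_head = torch.nn.Linear(32, 10)
keep = select_channels(conv, torch.randn(8, 16, 14, 14), torch.randn(8, 32, 14, 14),
                       torch.randint(0, 10, (8,)), aux_head)
```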
Domain Adaptation with Adversarial Training and Graph Embeddings
Title | Domain Adaptation with Adversarial Training and Graph Embeddings |
Authors | Firoj Alam, Shafiq Joty, Muhammad Imran |
Abstract | The success of deep neural networks (DNNs) is heavily dependent on the availability of labeled data. However, obtaining labeled data is a big challenge in many real-world problems. In such scenarios, a DNN model can leverage labeled and unlabeled data from a related domain, but it has to deal with the shift in data distributions between the source and the target domains. In this paper, we study the problem of classifying social media posts during a crisis event (e.g., Earthquake). For that, we use labeled and unlabeled data from past similar events (e.g., Flood) and unlabeled data for the current event. We propose a novel model that performs adversarial learning based domain adaptation to deal with distribution drifts and graph based semi-supervised learning to leverage unlabeled data within a single unified deep learning framework. Our experiments with two real-world crisis datasets collected from Twitter demonstrate significant improvements over several baselines. |
Tasks | Domain Adaptation |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05151v1 |
http://arxiv.org/pdf/1805.05151v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptation-with-adversarial-training |
Repo | https://github.com/firojalam/domain-adaptation |
Framework | tf |
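Adversarial domain adaptation of this kind is commonly implemented with a gradient-reversal layer: features pass through unchanged in the forward direction, while the domain classifier's gradient is negated on the way back so the shared feature extractor is pushed toward domain-invariant representations. The sketch below shows that mechanism with placeholder layer sizes; it does not include the paper's graph-based semi-supervised component.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)            # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # negate the gradient flowing to the features

features = torch.nn.Sequential(torch.nn.Linear(300, 128), torch.nn.ReLU())
domain_clf = torch.nn.Linear(128, 2)          # source vs. target domain
x = torch.randn(4, 300)
h = features(x)
domain_logits = domain_clf(GradReverse.apply(h, 1.0))
loss = torch.nn.functional.cross_entropy(domain_logits, torch.tensor([0, 0, 1, 1]))
loss.backward()                               # feature extractor receives reversed gradients
```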
Generating Fine-Grained Open Vocabulary Entity Type Descriptions
Title | Generating Fine-Grained Open Vocabulary Entity Type Descriptions |
Authors | Rajarshi Bhowmik, Gerard de Melo |
Abstract | While large-scale knowledge graphs provide vast amounts of structured facts about entities, a short textual description can often be useful to succinctly characterize an entity and its type. Unfortunately, many knowledge graph entities lack such textual descriptions. In this paper, we introduce a dynamic memory-based network that generates a short open vocabulary description of an entity by jointly leveraging induced fact embeddings as well as the dynamic context of the generated sequence of words. We demonstrate the ability of our architecture to discern relevant information for more accurate generation of type descriptions by pitting the system against several strong baselines. |
Tasks | Knowledge Graphs |
Published | 2018-05-27 |
URL | http://arxiv.org/abs/1805.10564v1 |
http://arxiv.org/pdf/1805.10564v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-fine-grained-open-vocabulary |
Repo | https://github.com/kingsaint/Open-vocabulary-entity-type-description |
Framework | pytorch |
Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging
Title | Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging |
Authors | Barbara Plank, Željko Agić |
Abstract | We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data. |
Tasks | Part-Of-Speech Tagging |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09733v1 |
http://arxiv.org/pdf/1808.09733v1.pdf | |
PWC | https://paperswithcode.com/paper/distant-supervision-from-disparate-sources |
Repo | https://github.com/bplank/bilstm-aux |
Framework | none |
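One of the distant-supervision ingredients, instance selection against a tag dictionary, can be illustrated with a toy filter that keeps only projected sentences whose tags agree with the dictionary for every covered word. The dictionary and sentences below are invented; the paper combines this signal with annotation projection, lexicons, and distributed representations in a single neural tagger.

```python
# Hypothetical tag dictionary: the set of tags allowed for each known word.
tag_dictionary = {"the": {"DET"}, "dog": {"NOUN"}, "runs": {"VERB", "NOUN"}}

def consistent_with_dictionary(words, projected_tags, tag_dict):
    """Keep a projected sentence only if every dictionary-covered word got an allowed tag."""
    return all(tag in tag_dict[w] for w, tag in zip(words, projected_tags) if w in tag_dict)

projected = [
    (["the", "dog", "runs"], ["DET", "NOUN", "VERB"]),   # kept
    (["the", "dog", "runs"], ["NOUN", "NOUN", "VERB"]),  # dropped: "the" projected as NOUN
]
selected = [s for s in projected if consistent_with_dictionary(*s, tag_dictionary)]
```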
Knowledge-based Transfer Learning Explanation
Title | Knowledge-based Transfer Learning Explanation |
Authors | Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Ian Horrocks, Huajun Chen |
Abstract | Machine learning explanation can significantly boost machine learning’s application in decision making, but the usability of current methods is limited in human-centric explanation, especially for transfer learning, an important machine learning branch that aims at utilizing knowledge from one learning domain (i.e., a pair of dataset and prediction task) to enhance prediction model training in another learning domain. In this paper, we propose an ontology-based approach for human-centric explanation of transfer learning. Three kinds of knowledge-based explanatory evidence, with different granularities, including general factors, particular narrators and core contexts are first proposed and then inferred with both local ontologies and external knowledge bases. The evaluation with US flight data and DBpedia has presented their confidence and availability in explaining the transferability of feature representation in flight departure delay forecasting. |
Tasks | Decision Making, Transfer Learning |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08372v1 |
http://arxiv.org/pdf/1807.08372v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-based-transfer-learning-explanation |
Repo | https://github.com/ChenJiaoyan/X-TL |
Framework | tf |
ExpNet: Landmark-Free, Deep, 3D Facial Expressions
Title | ExpNet: Landmark-Free, Deep, 3D Facial Expressions |
Authors | Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni |
Abstract | We describe a deep learning based method for estimating 3D facial expression coefficients. Unlike previous work, our process does not rely on facial landmark detection methods as a proxy step. Recent methods have shown that a CNN can be trained to regress accurate and discriminative 3D morphable model (3DMM) representations, directly from image intensities. By foregoing facial landmark detection, these methods were able to estimate shapes for occluded faces appearing in unprecedented in-the-wild viewing conditions. We build on those methods by showing that facial expressions can also be estimated by a robust, deep, landmark-free approach. Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients. We propose a unique method for collecting data to train this network, leveraging the robustness of deep networks to training label noise. We further offer a novel means of evaluating the accuracy of estimated expression coefficients: by measuring how well they capture facial emotions on the CK+ and EmotiW-17 emotion recognition benchmarks. We show that our ExpNet produces expression coefficients which better discriminate between facial emotions than those obtained using state-of-the-art facial landmark detection techniques. Moreover, this advantage grows as image scales drop, demonstrating that our ExpNet is more robust to scale changes than landmark detection methods. Finally, at the same level of accuracy, our ExpNet is orders of magnitude faster than its alternatives. |
Tasks | 3D Facial Expression Recognition, Emotion Recognition, Facial Landmark Detection |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00542v1 |
http://arxiv.org/pdf/1802.00542v1.pdf | |
PWC | https://paperswithcode.com/paper/expnet-landmark-free-deep-3d-facial |
Repo | https://github.com/fengju514/Expression-Net |
Framework | tf |
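The landmark-free idea reduces to a CNN regression head: the network reads the raw intensities of a cropped face and outputs a 29D vector of 3DMM expression coefficients, with no landmark detection in between. The sketch below uses an off-the-shelf ResNet-18 purely as a placeholder backbone; it is not the architecture or the weights from the paper.

```python
import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 29)   # regress 29 expression coefficients

face = torch.randn(1, 3, 224, 224)        # stand-in for a cropped face image
expression_coeffs = backbone(face)        # shape (1, 29)
```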
Gradient Harmonized Single-stage Detector
Title | Gradient Harmonized Single-stage Detector |
Authors | Buyu Li, Yu Liu, Xiaogang Wang |
Abstract | Despite the great success of two-stage detectors, the single-stage detector remains a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient. Further, we propose a novel gradient harmonizing mechanism (GHM) to hedge against the two disharmonies. The philosophy behind GHM can be easily embedded into both classification loss functions like cross-entropy (CE) and regression loss functions like smooth-$L_1$ ($SL_1$) loss. To this end, two novel loss functions called GHM-C and GHM-R are designed to balance the gradient flow for anchor classification and bounding box refinement, respectively. An ablation study on MS COCO demonstrates that, without laborious hyper-parameter tuning, both GHM-C and GHM-R bring substantial improvements for single-stage detectors. Without bells and whistles, our model achieves 41.6 mAP on the COCO test-dev set, surpassing the state-of-the-art method, Focal Loss (FL) + $SL_1$, by 0.8. |
Tasks | Object Detection |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05181v1 |
http://arxiv.org/pdf/1811.05181v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-harmonized-single-stage-detector |
Repo | https://github.com/xialuxi/GHMLoss-caffe |
Framework | none |
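A minimal sketch of the GHM-C idea as described in the abstract (not the authors' released code): the gradient norm g = |sigmoid(logit) − target| is computed per example, the g values are binned, and examples falling in densely populated bins are down-weighted so that neither the overwhelming easy negatives nor the outlier-hard examples dominate the loss.

```python
import torch
import torch.nn.functional as F

def ghm_c_loss(logits, targets, num_bins=10):
    g = (torch.sigmoid(logits).detach() - targets).abs()              # per-example gradient norm
    edges = torch.linspace(0, 1, num_bins + 1)
    weights = torch.zeros_like(logits)
    n_total, n_nonempty = logits.numel(), 0
    for i in range(num_bins):
        upper = edges[i + 1] + (1e-6 if i == num_bins - 1 else 0)      # include g == 1 in the last bin
        in_bin = (g >= edges[i]) & (g < upper)
        num_in_bin = int(in_bin.sum())
        if num_in_bin > 0:
            weights[in_bin] = n_total / num_in_bin                     # inverse gradient density
            n_nonempty += 1
    weights = weights / max(n_nonempty, 1)
    loss = F.binary_cross_entropy_with_logits(logits, targets, weight=weights, reduction="sum")
    return loss / n_total

loss = ghm_c_loss(torch.randn(256), torch.randint(0, 2, (256,)).float())
```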
OBOE: Collaborative Filtering for AutoML Model Selection
Title | OBOE: Collaborative Filtering for AutoML Model Selection |
Authors | Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell |
Abstract | Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood. |
Tasks | Active Learning, AutoML, Matrix Completion, Model Selection |
Published | 2018-08-09 |
URL | https://arxiv.org/abs/1808.03233v2 |
https://arxiv.org/pdf/1808.03233v2.pdf | |
PWC | https://paperswithcode.com/paper/oboe-collaborative-filtering-for-automl |
Repo | https://github.com/udellgroup/oboe |
Framework | none |
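The collaborative-filtering core can be sketched in a few lines: factor the dataset-by-model error matrix with a truncated SVD, fit a new dataset's latent vector by least squares from the errors of a few cheap models run on it, and rank all models by their predicted errors. The error matrix and observed values below are random stand-ins, and the experiment-design heuristic for choosing which models to run is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.random((40, 25))                       # cross-validated errors: 40 datasets x 25 models
rank = 5

# Low-rank factorization of the error matrix: one latent vector per model.
U, s, Vt = np.linalg.svd(errors, full_matrices=False)
model_factors = (np.diag(s[:rank]) @ Vt[:rank]).T   # 25 x rank

observed_models = [0, 3, 7]                         # fast, informative models run on the new dataset
observed_errors = rng.random(len(observed_models))  # their measured errors (stand-in values)

# Least-squares fit of the new dataset's latent vector from the observed entries.
dataset_vec, *_ = np.linalg.lstsq(model_factors[observed_models], observed_errors, rcond=None)
predicted_errors = model_factors @ dataset_vec      # predicted error of every model
ranking = np.argsort(predicted_errors)              # most promising models first
```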