Paper Group NANR 182
3D Object Detection With Latent Support Surfaces. Neural Models for Key Phrase Extraction and Question Generation. Learning Semantic Representations for Unsupervised Domain Adaptation. Multi-Level Fusion Based 3D Object Detection From Monocular Images. Multiplicative Tree-Structured Long Short-Term Memory Networks for Semantic Representations. Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention. Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation. Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery. Unsupervised Video Object Segmentation with Motion-based Bilateral Networks. MaskGAN: Better Text Generation via Filling in the _______. Decoupling the Layers in Residual Networks. Bidirectional Retrieval Made Simple. A Simple and Effective Approach to Coverage-Aware Neural Machine Translation. A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks. Bootstrapping the Performance of Webly Supervised Semantic Segmentation.
3D Object Detection With Latent Support Surfaces
Title | 3D Object Detection With Latent Support Surfaces |
Authors | Zhile Ren, Erik B. Sudderth |
Abstract | We develop a 3D object detection algorithm that uses latent support surfaces to capture contextual relationships in indoor scenes. Existing 3D representations for RGB-D images capture the local shape and appearance of object categories, but have limited power to represent objects with different visual styles. The detection of small objects is also challenging because the search space is very large in 3D scenes. However, we observe that much of the shape variation within 3D object categories can be explained by the location of a latent support surface, and smaller objects are often supported by larger objects. Therefore, we explicitly use latent support surfaces to better represent the 3D appearance of large objects, and provide contextual cues to improve the detection of small objects. We evaluate our model with 19 object categories from the SUN RGB-D database, and demonstrate state-of-the-art performance. |
Tasks | 3D Object Detection, Object Detection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Ren_3D_Object_Detection_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Ren_3D_Object_Detection_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/3d-object-detection-with-latent-support |
Repo | |
Framework | |
Neural Models for Key Phrase Extraction and Question Generation
Title | Neural Models for Key Phrase Extraction and Question Generation |
Authors | Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Adam Trischler, Yoshua Bengio |
Abstract | We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings. |
Tasks | Question Answering, Question Generation, Reading Comprehension |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-2609/ |
PWC | https://paperswithcode.com/paper/neural-models-for-key-phrase-extraction-and |
Repo | |
Framework | |
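A minimal sketch of the two-stage pipeline described in the abstract above. The BIO tagging scheme, model sizes, and span decoding are illustrative assumptions, not the authors' configuration; stage two (the copy-mechanism seq2seq) is only indicated in a comment.

```python
# Stage 1: a BiLSTM key-phrase extractor whose predicted spans then
# condition a downstream question-generation seq2seq (stage 2, not shown).
import torch
import torch.nn as nn

class KeyPhraseExtractor(nn.Module):
    """Tags each document token as B/I/O for candidate answer spans."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256, n_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # B, I, O

    def forward(self, token_ids):
        h, _ = self.rnn(self.emb(token_ids))
        return self.out(h)  # (batch, seq_len, 3) tag logits

def decode_spans(tag_ids):
    """Turn a BIO tag sequence (0=B, 1=I, 2=O) into (start, end) spans."""
    spans, start = [], None
    for i, t in enumerate(tag_ids):
        if t == 0:                           # B: open a new span
            if start is not None:
                spans.append((start, i))
            start = i
        elif t == 2 and start is not None:   # O: close the open span
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tag_ids)))
    return spans

# Stage 2 (not shown): each predicted span is marked in the document and the
# annotated sequence is fed to a seq2seq generator with a copy mechanism,
# which emits a question for that target answer.
doc = torch.randint(0, 1000, (1, 20))
extractor = KeyPhraseExtractor(vocab_size=1000)
tags = extractor(doc).argmax(-1)[0].tolist()
print(decode_spans(tags))
```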
Learning Semantic Representations for Unsupervised Domain Adaptation
Title | Learning Semantic Representations for Unsupervised Domain Adaptation |
Authors | Shaoan Xie, Zibin Zheng, Liang Chen, Chuan Chen |
Abstract | It is important to transfer knowledge from a label-rich source domain to an unlabeled target domain, given the expensive cost of manual labeling. Prior domain adaptation methods address this problem by aligning the global distribution statistics between the source and target domains, but they ignore the semantic information contained in samples; e.g., features of backpacks in the target domain might be mapped near features of cars in the source domain. In this paper, we present a moving semantic transfer network, which learns semantic representations for unlabeled target samples by aligning labeled source centroids and pseudo-labeled target centroids. Features in the same class but from different domains are expected to be mapped nearby, resulting in improved target classification accuracy. The moving-average centroid alignment is carefully designed to compensate for the insufficient categorical information within each mini-batch. Experiments show that our model yields state-of-the-art results on standard datasets. |
Tasks | Domain Adaptation, Learning Semantic Representations, Unsupervised Domain Adaptation |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=1961 |
PDF | http://proceedings.mlr.press/v80/xie18c/xie18c.pdf |
PWC | https://paperswithcode.com/paper/learning-semantic-representations-for |
Repo | |
Framework | |
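The moving-average centroid alignment in the abstract above lends itself to a short sketch. The decay value, feature dimension, and the exact loss form are assumptions; only the mechanism (EMA centroids per class, aligned across domains) follows the abstract.

```python
# Per-class source centroids (true labels) and target centroids
# (pseudo-labels) are tracked with an exponential moving average, and the
# semantic loss pulls matching class centroids together.
import torch

class CentroidAligner:
    def __init__(self, n_classes, feat_dim, decay=0.7):  # decay is an assumption
        self.decay = decay
        self.src_c = torch.zeros(n_classes, feat_dim)
        self.tgt_c = torch.zeros(n_classes, feat_dim)

    @staticmethod
    def _batch_centroids(feats, labels, n_classes):
        c = torch.zeros(n_classes, feats.size(1))
        for k in range(n_classes):
            mask = labels == k
            if mask.any():
                c[k] = feats[mask].mean(0)
        return c

    def semantic_loss(self, src_feats, src_labels, tgt_feats, tgt_pseudo):
        n = self.src_c.size(0)
        # The moving average compensates for classes missing from a mini-batch.
        src_c = self.decay * self.src_c + (1 - self.decay) * self._batch_centroids(src_feats, src_labels, n)
        tgt_c = self.decay * self.tgt_c + (1 - self.decay) * self._batch_centroids(tgt_feats, tgt_pseudo, n)
        # Store detached copies so gradients flow only through this batch.
        self.src_c, self.tgt_c = src_c.detach(), tgt_c.detach()
        return ((src_c - tgt_c) ** 2).sum(1).mean()

aligner = CentroidAligner(n_classes=10, feat_dim=256)
loss = aligner.semantic_loss(
    torch.randn(32, 256), torch.randint(0, 10, (32,)),
    torch.randn(32, 256), torch.randint(0, 10, (32,)))
print(loss.item())
```

Detaching the stored centroids is one reasonable way to keep the moving average stable across training steps.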
Multi-Level Fusion Based 3D Object Detection From Monocular Images
Title | Multi-Level Fusion Based 3D Object Detection From Monocular Images |
Authors | Bin Xu, Zhenzhong Chen |
Abstract | In this paper, we present an end-to-end deep learning based framework for 3D object detection from a single monocular image. A deep convolutional neural network is introduced for simultaneous 2D and 3D object detection. First, 2D region proposals are generated through a region proposal network. Then the shared features are learned within the proposals to predict the class probability, 2D bounding box, orientation, dimension, and 3D location. We adopt a stand-alone module to predict the disparity and extract features from the computed point cloud. Thus features from the original image and the point cloud will be fused in different levels for accurate 3D localization. The estimated disparity is also used for front view feature encoding to enhance the input image, regarded as an input-fusion process. The proposed algorithm can directly output both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as the input. The experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms the state-of-the-art methods with only monocular images. |
Tasks | 3D Object Detection, 3D Object Detection From Monocular Images, Object Detection |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Xu_Multi-Level_Fusion_Based_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Multi-Level_Fusion_Based_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/multi-level-fusion-based-3d-object-detection |
Repo | |
Framework | |
Multiplicative Tree-Structured Long Short-Term Memory Networks for Semantic Representations
Title | Multiplicative Tree-Structured Long Short-Term Memory Networks for Semantic Representations |
Authors | Nam Khanh Tran, Weiwei Cheng |
Abstract | Tree-structured LSTMs have shown advantages in learning semantic representations by exploiting syntactic information. Most existing methods model tree structures by bottom-up combinations of constituent nodes using the same shared compositional function, often making use of input word information only. Because they cannot capture the richness of compositionality, these models lack expressive power. In this paper, we propose multiplicative tree-structured LSTMs to tackle this problem. Our model makes use of not only word information but also relation information between words. It is more expressive, as different combination functions can be used for each child node. In addition to syntactic trees, we also investigate the use of Abstract Meaning Representation in tree-structured models, in order to incorporate both syntactic and semantic information from the sentence. Experimental results on common NLP tasks show that the proposed models lead to better sentence representations and that AMR brings benefits in complex tasks. |
Tasks | Learning Semantic Representations, Machine Translation, Relation Extraction, Sentiment Analysis, Text Classification |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-2032/ |
PWC | https://paperswithcode.com/paper/multiplicative-tree-structured-long-short |
Repo | |
Framework | |
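A loose sketch of the multiplicative idea in the abstract above: relation embeddings modulate each child's contribution elementwise, so different children are combined by effectively different functions. This compresses the paper's tree-LSTM gating into a single composition step; the relation inventory and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MultiplicativeComposer(nn.Module):
    def __init__(self, n_relations, dim):
        super().__init__()
        self.rel_emb = nn.Embedding(n_relations, dim)  # one modulator per relation
        self.child_proj = nn.Linear(dim, dim)
        self.word_proj = nn.Linear(dim, dim)

    def forward(self, word_vec, child_states, child_relations):
        # child_states: (n_children, dim); child_relations: (n_children,)
        # Each child's projection is scaled elementwise by its relation
        # embedding before the children are summed into the parent state.
        modulated = self.child_proj(child_states) * self.rel_emb(child_relations)
        return torch.tanh(self.word_proj(word_vec) + modulated.sum(0))

composer = MultiplicativeComposer(n_relations=40, dim=64)
parent = composer(torch.randn(64),
                  torch.randn(2, 64),
                  torch.tensor([3, 17]))  # e.g. hypothetical nsubj/dobj ids
print(parent.shape)
```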
Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention
Title | Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention |
Authors | Yue Zhao, Xiaolong Jin, Yuanzhuo Wang, Xueqi Cheng |
Abstract | Document-level information is very important for event detection, even at the sentence level. In this paper, we propose a novel Document Embedding Enhanced Bi-RNN model, called DEEB-RNN, to detect events in sentences. This model first learns event-detection-oriented embeddings of documents through a hierarchical and supervised attention based RNN, which pays word-level attention to event triggers and sentence-level attention to those sentences containing events. It then uses the learned document embedding to enhance another bidirectional RNN model to identify event triggers and their types in sentences. Through experiments on the ACE-2005 dataset, we demonstrate the effectiveness and merits of the proposed DEEB-RNN model via comparison with state-of-the-art methods. |
Tasks | Document Embedding |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2066/ |
PWC | https://paperswithcode.com/paper/document-embedding-enhanced-event-detection |
Repo | |
Framework | |
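A sketch of the hierarchical attention that could produce the document embedding described above: word-level attention pools each sentence, sentence-level attention pools the document. The supervised-attention losses on triggers and event sentences are only noted in comments; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class HierAttnDocEncoder(nn.Module):
    def __init__(self, emb_dim=100, hidden=128):
        super().__init__()
        self.word_rnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.sent_rnn = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.word_attn = nn.Linear(2 * hidden, 1)
        self.sent_attn = nn.Linear(2 * hidden, 1)

    @staticmethod
    def _attend(states, scorer):
        # Softmax-normalized scores weight the RNN states into one vector.
        weights = torch.softmax(scorer(states).squeeze(-1), dim=-1)
        return torch.einsum('...sh,...s->...h', states, weights), weights

    def forward(self, doc):                 # doc: (n_sents, n_words, emb_dim)
        word_states, _ = self.word_rnn(doc)
        sent_vecs, word_w = self._attend(word_states, self.word_attn)
        sent_states, _ = self.sent_rnn(sent_vecs.unsqueeze(0))
        doc_vec, sent_w = self._attend(sent_states, self.sent_attn)
        # In DEEB-RNN, word_w / sent_w are additionally trained against gold
        # trigger / event-sentence indicators (supervised attention, not shown).
        return doc_vec.squeeze(0)

enc = HierAttnDocEncoder()
print(enc(torch.randn(4, 12, 100)).shape)   # -> torch.Size([256])
```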
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
Title | Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation |
Authors | Xiaoxue Zang, Ashwini Pokle, Marynel Vázquez, Kevin Chen, Juan Carlos Niebles, Alvaro Soto, Silvio Savarese |
Abstract | We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user instructions and a topological representation of the environment. We evaluate our model's performance on a new dataset containing 10,050 pairs of navigation instructions. Our model significantly outperforms baseline approaches. Furthermore, our results suggest that it is possible to leverage the environment map as a relevant knowledge base to facilitate the translation of free-form navigation instructions. |
Tasks | Common Sense Reasoning, Robot Navigation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1286/ |
PWC | https://paperswithcode.com/paper/translating-navigation-instructions-in |
Repo | |
Framework | |
Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery
Title | Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery |
Authors | Grégoire Payen de La Garanderie, Amir Atapour Abarghouei, Toby P. Breckon |
Abstract | Recent automotive vision work has focused almost exclusively on processing forward-facing cameras. However, future autonomous vehicles will not be viable without more comprehensive surround sensing, akin to that of a human driver, as can be provided by 360° panoramic cameras. We present an approach to adapt contemporary deep network architectures developed on conventional rectilinear imagery to work on equirectangular 360° panoramic imagery. To address the lack of annotated panoramic automotive datasets, we adapt contemporary automotive datasets, via style and projection transformations, to facilitate the cross-domain retraining of contemporary algorithms for panoramic imagery. Following this approach, we retrain and adapt existing architectures to recover scene depth and the 3D pose of vehicles from monocular panoramic imagery without any panoramic training labels or calibration parameters. Our approach is evaluated qualitatively on crowd-sourced panoramic images and quantitatively using an automotive environment simulator to provide the first benchmark for such techniques within panoramic imagery. |
Tasks | 3D Object Detection, Autonomous Vehicles, Calibration, Depth Estimation, Monocular Depth Estimation, Object Detection |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Gregoire_Payen_de_La_Garanderie_Eliminating_the_Dreaded_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Gregoire_Payen_de_La_Garanderie_Eliminating_the_Dreaded_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/eliminating-the-blind-spot-adapting-3d-object |
Repo | |
Framework | |
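The projection side of the adaptation above can be illustrated with a standard equirectangular-to-rectilinear mapping. This is textbook spherical geometry, not the authors' code; the focal length, sizes, and nearest-neighbour sampling are assumptions, and the style transformations are not shown.

```python
import numpy as np

def rectilinear_from_equirect(pano, out_h, out_w, focal):
    """Render a forward-facing pinhole view of size (out_h, out_w) from an
    equirectangular panorama `pano` of shape (H, W, 3)."""
    H, W = pano.shape[:2]
    v, u = np.mgrid[0:out_h, 0:out_w].astype(np.float64)
    x = (u - out_w / 2) / focal                 # ray direction per pixel
    y = (v - out_h / 2) / focal
    z = np.ones_like(x)
    lon = np.arctan2(x, z)                      # in [-pi, pi]
    lat = np.arctan2(y, np.hypot(x, z))         # in [-pi/2, pi/2]
    # Map angles to panorama pixel coordinates (nearest neighbour).
    pu = ((lon / np.pi + 1) / 2 * (W - 1)).round().astype(int)
    pv = ((lat / (np.pi / 2) + 1) / 2 * (H - 1)).round().astype(int)
    return pano[pv, pu]

pano = np.random.rand(512, 1024, 3)
view = rectilinear_from_equirect(pano, 256, 256, focal=256.0)
print(view.shape)  # (256, 256, 3)
```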
Unsupervised Video Object Segmentation with Motion-based Bilateral Networks
Title | Unsupervised Video Object Segmentation with Motion-based Bilateral Networks |
Authors | Siyang Li, Bryan Seybold, Alexey Vorobyov, Xuejing Lei, C.-C. Jay Kuo |
Abstract | In this work, we study the unsupervised video object segmentation problem, where moving objects are segmented without prior knowledge of these objects. First, we propose a motion-based bilateral network to estimate the background based on the motion pattern of non-object regions. The bilateral network reduces false positive regions by accurately identifying background objects. Then, we integrate the background estimate from the bilateral network with instance embeddings into a graph, which allows multi-frame reasoning with graph edges linking pixels from different frames. We classify graph nodes by defining and minimizing a cost function, and segment the video frames based on the node labels. The proposed method outperforms previous state-of-the-art unsupervised video object segmentation methods on the DAVIS 2016 and FBMS-59 datasets. |
Tasks | Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Siyang_Li_Unsupervised_Video_Object_ECCV_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_ECCV_2018/papers/Siyang_Li_Unsupervised_Video_Object_ECCV_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-video-object-segmentation-with-1 |
Repo | |
Framework | |
MaskGAN: Better Text Generation via Filling in the _______
Title | MaskGAN: Better Text Generation via Filling in the _______ |
Authors | William Fedus, Ian Goodfellow, Andrew M. Dai |
Abstract | Neural text generation models are often autoregressive language models or seq2seq models. Neural autoregressive and seq2seq models that generate text by sampling words sequentially, with each word conditioned on the previous words, are state-of-the-art for several machine translation and summarization benchmarks. These benchmarks are often defined by validation perplexity, even though this is not a direct measure of sample quality. Language models are typically trained via maximum likelihood and most often with teacher forcing. Teacher forcing is well-suited to optimizing perplexity but can result in poor sample quality because generating text requires conditioning on sequences of words that were never observed at training time. We propose to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high-quality samples and have shown much success in image generation. GANs were originally designed to output differentiable values, so discrete language generation is challenging for them. We introduce an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context. We show qualitative and quantitative evidence that this produces more realistic text samples compared to a maximum-likelihood-trained model. |
Tasks | Image Generation, Machine Translation, Multivariate Time Series Imputation, Text Generation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=ByOExmWAb |
PDF | https://openreview.net/pdf?id=ByOExmWAb |
PWC | https://paperswithcode.com/paper/maskgan-better-text-generation-via-filling-in-1 |
Repo | |
Framework | |
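A sketch of the fill-in-the-blank setup and an actor-critic style loss, under loudly stated assumptions: the per-token reward definition, the toy tensors, and the mask placement are illustrative, not the paper's exact objective.

```python
import torch

def mask_span(tokens, start, length, mask_id):
    """Replace tokens[:, start:start+length] with the mask token."""
    masked = tokens.clone()
    masked[:, start:start + length] = mask_id
    return masked

def actor_critic_loss(log_probs, disc_scores, critic_values, fill_mask):
    """log_probs: generator log p(chosen token) per position, (B, T).
    disc_scores: per-token reward from the discriminator on the filled
    sequence, (B, T). critic_values: learned baseline, (B, T).
    fill_mask: 1.0 on filled-in positions, 0.0 elsewhere."""
    # The critic baseline turns per-token rewards into advantages for a
    # policy-gradient update on the discrete generator samples.
    advantage = (disc_scores - critic_values).detach()
    actor = -(log_probs * advantage * fill_mask).sum() / fill_mask.sum()
    critic = ((disc_scores.detach() - critic_values) ** 2 * fill_mask).mean()
    return actor, critic

tokens = torch.randint(0, 100, (2, 10))
masked = mask_span(tokens, start=3, length=4, mask_id=100)
fill = (masked == 100).float()
actor, critic = actor_critic_loss(torch.randn(2, 10), torch.randn(2, 10),
                                  torch.randn(2, 10), fill)
print(actor.item(), critic.item())
```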
Decoupling the Layers in Residual Networks
Title | Decoupling the Layers in Residual Networks |
Authors | Ricky Fok, Aijun An, Zana Rashidi, Xiaogang Wang |
Abstract | We propose a Warped Residual Network (WarpNet) using a parallelizable warp operator for forward and backward propagation to distant layers that trains faster than the original residual neural network. We apply a perturbation theory on residual networks and decouple the interactions between residual units. The resulting warp operator is a first-order approximation of the output over multiple layers. The first-order perturbation theory exhibits properties such as binomial path lengths and exponential gradient scaling found experimentally by Veit et al. (2016). We demonstrate through an extensive performance study that the proposed network achieves comparable predictive performance to the original residual network with the same number of parameters, while achieving a significant speed-up in total training time. As WarpNet performs model parallelism in residual network training, in which weights are distributed over different GPUs, it offers a speed-up and the capability to train larger networks compared to the original residual networks. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=SyMvJrdaW |
PDF | https://openreview.net/pdf?id=SyMvJrdaW |
PWC | https://paperswithcode.com/paper/decoupling-the-layers-in-residual-networks |
Repo | |
Framework | |
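The first-order intuition in the abstract above can be checked numerically: expanding a residual stack y = x + F1(x) + F2(x + F1(x)) + … to first order in the residual branches gives y ≈ x + Σᵢ Fᵢ(x), so the block functions can be evaluated in parallel from the same input. The toy blocks below are assumptions; the full method's correction terms are omitted.

```python
import torch
import torch.nn as nn

dim = 16
blocks = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim)) for _ in range(3))
# Shrink the residual branches so the first-order expansion is a good regime.
with torch.no_grad():
    for b in blocks:
        for p in b.parameters():
            p.mul_(0.1)

x = torch.randn(4, dim)

# Exact sequential residual stack.
h = x
for b in blocks:
    h = h + b(h)

# First-order "warp": every block sees the same input, outputs are summed,
# so the three block evaluations could run in parallel (e.g. on separate GPUs).
warp = x + sum(b(x) for b in blocks)

print((h - warp).abs().max().item())  # small when residual branches are small
```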
Bidirectional Retrieval Made Simple
Title | Bidirectional Retrieval Made Simple |
Authors | Jônatas Wehrmann, Rodrigo C. Barros |
Abstract | This paper provides a very simple yet effective character-level architecture for learning bidirectional retrieval models. Aligning multimodal content is particularly challenging considering the difficulty in finding semantic correspondence between images and descriptions. We introduce an efficient character-level inception module, designed to learn textual semantic embeddings by convolving raw characters at distinct granularity levels. Our approach is capable of explicitly encoding hierarchical information from distinct base-level representations (e.g., characters, words, and sentences) into a shared multimodal space, where it maps the semantic correspondence between images and descriptions via a contrastive pairwise loss function that minimizes order-violations. Models generated by our approach are far more robust to input noise than state-of-the-art strategies based on word embeddings. Despite being conceptually much simpler and requiring fewer parameters, our models outperform the state-of-the-art approaches by 4.8% in the task of description retrieval and 2.7% (absolute R@1 values) in the task of image retrieval on the popular MS COCO retrieval dataset. Finally, we show that our models present solid performance for text classification as well, especially in multilingual and noisy domains. |
Tasks | Image Retrieval, Text Classification, Word Embeddings |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Wehrmann_Bidirectional_Retrieval_Made_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Wehrmann_Bidirectional_Retrieval_Made_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-retrieval-made-simple |
Repo | |
Framework | |
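The "contrastive pairwise loss function that minimizes order-violations" can be sketched in an order-embeddings style; which side of the partial order images versus descriptions take, and the margin, are assumptions here, and the hinge sums over all in-batch negatives rather than the hardest ones.

```python
import torch

def order_violation(a, b):
    """Penalty for `a` NOT preceding `b` in the partial order (lower = better
    match). a: (N, D), b: (M, D) -> (N, M) pairwise penalties."""
    diff = torch.clamp(a.unsqueeze(1) - b.unsqueeze(0), min=0)
    return (diff ** 2).sum(-1)

def contrastive_loss(img, txt, margin=0.05):
    # abs() keeps embeddings in the positive orthant where the order is defined.
    cost = order_violation(torch.abs(img), torch.abs(txt))
    pos = cost.diag()
    # Hinge: each positive pair should beat every negative pair by the margin,
    # in both retrieval directions.
    neg_i = torch.clamp(margin + pos.unsqueeze(0) - cost, min=0)
    neg_t = torch.clamp(margin + pos.unsqueeze(1) - cost, min=0)
    off_diag = 1 - torch.eye(cost.size(0))
    return ((neg_i + neg_t) * off_diag).sum()

loss = contrastive_loss(torch.randn(8, 64), torch.randn(8, 64))
print(loss.item())
```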
A Simple and Effective Approach to Coverage-Aware Neural Machine Translation
Title | A Simple and Effective Approach to Coverage-Aware Neural Machine Translation |
Authors | Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, Jingbo Zhu |
Abstract | We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT). Unlike the popular length normalization and coverage models, our method requires neither training nor reranking of the limited n-best outputs. Moreover, it is robust to large beam sizes, which is not well studied in previous work. On the Chinese-English and English-German translation tasks, our approach yields improvements of +0.4 to 1.5 BLEU over the state-of-the-art baselines. |
Tasks | Machine Translation |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-2047/ |
PWC | https://paperswithcode.com/paper/a-simple-and-effective-approach-to-coverage |
Repo | |
Framework | |
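A sketch of coverage-aware hypothesis scoring in the spirit of the abstract above, using the familiar GNMT-style length and coverage penalties rather than the paper's exact feature; alpha and beta are illustrative assumptions.

```python
import math

def rescore(log_prob, attention, alpha=0.6, beta=0.2):
    """log_prob: sum of token log-probabilities of the hypothesis.
    attention: attention[j][i] = weight on source word i at target step j."""
    tgt_len = len(attention)
    src_len = len(attention[0])
    # Length normalization keeps long hypotheses from being unfairly penalized.
    length_penalty = ((5 + tgt_len) / 6) ** alpha
    # Coverage rewards hypotheses whose attention has touched every source word.
    coverage = sum(math.log(min(1.0, sum(step[i] for step in attention)))
                   for i in range(src_len))
    return log_prob / length_penalty + beta * coverage

attn = [[0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7]]   # 3 target steps x 3 source words
print(rescore(log_prob=-4.2, attention=attn))
```

Because the score is computed from quantities beam search already maintains, this kind of feature can be applied during decoding rather than as an n-best reranking pass, which is the balance the paper argues for.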
A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks
Title | A Multi-Domain Framework for Textual Similarity. A Case Study on Question-to-Question and Question-Answering Similarity Tasks |
Authors | Amir Hazem, Basma El Amal Boussaha, Nicolas Hernandez |
Abstract | |
Tasks | Community Question Answering, Natural Language Inference, Question Answering, Question Similarity, Word Embeddings |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1432/ |
PWC | https://paperswithcode.com/paper/a-multi-domain-framework-for-textual |
Repo | |
Framework | |
Bootstrapping the Performance of Webly Supervised Semantic Segmentation
Title | Bootstrapping the Performance of Webly Supervised Semantic Segmentation |
Authors | Tong Shen, Guosheng Lin, Chunhua Shen, Ian Reid |
Abstract | Fully supervised methods for semantic segmentation require pixel-level class masks to train, the creation of which is expensive in terms of manual labour and time. In this work, we focus on weak supervision, developing a method for training a high-quality pixel-level classifier for semantic segmentation, using only image-level class labels as the provided ground truth. Our method is formulated as a two-stage approach in which we first aim to create accurate pixel-level masks for the training images via a bootstrapping process, and then use these now-accurately segmented images as a proxy ground truth in a more standard supervised setting. The key driver for our work is that in the target dataset we typically have reliable ground-truth image-level labels, while data crawled from the web may have unreliable labels but can be filtered to comprise only images that are easy to segment, and therefore have reliable boundaries. These two forms of information are complementary, and we use this observation to build a novel bi-directional transfer learning framework. This framework transfers knowledge between two domains, the target domain and the web domain, bootstrapping the performance of weakly supervised semantic segmentation. Conducting experiments on the popular benchmark dataset PASCAL VOC 2012 with both a VGG16 network and a ResNet50, we reach state-of-the-art performance with scores of 60.2% IoU and 63.9% IoU respectively. |
Tasks | Semantic Segmentation, Transfer Learning, Weakly-Supervised Semantic Segmentation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Shen_Bootstrapping_the_Performance_CVPR_2018_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2018/papers/Shen_Bootstrapping_the_Performance_CVPR_2018_paper.pdf |
PWC | https://paperswithcode.com/paper/bootstrapping-the-performance-of-webly |
Repo | |
Framework | |
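One way to make the "filtered to comprise only easy images" step concrete is a confidence-and-coverage filter on predicted masks; the thresholds and the criterion itself are assumptions, sketched here only to illustrate the bootstrapping loop (easy web masks become proxy ground truth for the next training round).

```python
import numpy as np

def is_easy(prob_map, class_id, conf_thresh=0.9, area_thresh=0.2):
    """prob_map: (n_classes, H, W) softmax output for one crawled web image.
    Keep the image only if the expected class covers enough area and is
    confidently segmented, suggesting reliable object boundaries."""
    pred = prob_map.argmax(0)
    conf = prob_map.max(0)
    cls_pixels = pred == class_id
    if cls_pixels.mean() < area_thresh:            # class must cover enough area
        return False
    return conf[cls_pixels].mean() > conf_thresh   # and be confidently segmented

# Toy check with a random softmax map over 21 PASCAL-style classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(21), size=(64, 64)).transpose(2, 0, 1)
print(is_easy(probs, class_id=3))
```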