October 20, 2019

3378 words 16 mins read

Paper Group AWR 176

Enhancing Sentence Embedding with Generalized Pooling. A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines. TI-CNN: Convolutional Neural Networks for Fake News Detection. Relative Saliency and Ranking: Models, Metrics, Data, and Benchmarks. From Volcano to Toyshop: Adaptive Discriminative Region Discovery for …

Enhancing Sentence Embedding with Generalized Pooling

Title Enhancing Sentence Embedding with Generalized Pooling
Authors Qian Chen, Zhen-Hua Ling, Xiaodan Zhu
Abstract Pooling is an essential component of a wide variety of sentence representation and embedding models. This paper explores generalized pooling methods to enhance sentence embedding. We propose vector-based multi-head attention that includes the widely used max pooling, mean pooling, and scalar self-attention as special cases. The model benefits from properly designed penalization terms to reduce redundancy in multi-head attention. We evaluate the proposed model on three different tasks: natural language inference (NLI), author profiling, and sentiment classification. The experiments show that the proposed model achieves significant improvement over strong sentence-encoding-based methods, resulting in state-of-the-art performances on four datasets. The proposed approach can be easily implemented for more problems than we discuss in this paper.
Tasks Natural Language Inference, Sentence Embedding, Sentiment Analysis
Published 2018-06-26
URL http://arxiv.org/abs/1806.09828v1
PDF http://arxiv.org/pdf/1806.09828v1.pdf
PWC https://paperswithcode.com/paper/enhancing-sentence-embedding-with-generalized
Repo https://github.com/lukecq1231/generalized-pooling
Framework none
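
The central idea, vector-based multi-head attention pooling, is simple to sketch. Below is a minimal PyTorch illustration under our own naming; the module, layer sizes, and the absence of the paper's redundancy penalization terms are all simplifications, not the authors' implementation (that is in the repo linked above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorMultiHeadPooling(nn.Module):
    """Pool a sequence of token vectors into a fixed-size sentence embedding.

    Each head produces a *vector* of attention weights (one weight per hidden
    dimension per token), so max pooling, mean pooling, and scalar
    self-attention are recoverable as special cases.
    """
    def __init__(self, hidden_dim, num_heads=4, attn_dim=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, attn_dim)
        # one vector-valued attention map per head
        self.heads = nn.ModuleList(
            [nn.Linear(attn_dim, hidden_dim) for _ in range(num_heads)]
        )

    def forward(self, h, mask=None):
        # h: (batch, seq_len, hidden_dim); mask: (batch, seq_len), 1 = real token
        z = torch.tanh(self.proj(h))
        pooled = []
        for head in self.heads:
            scores = head(z)                       # (batch, seq_len, hidden_dim)
            if mask is not None:
                scores = scores.masked_fill(mask.unsqueeze(-1) == 0, -1e9)
            alpha = F.softmax(scores, dim=1)       # normalise over the sequence
            pooled.append((alpha * h).sum(dim=1))  # (batch, hidden_dim)
        return torch.cat(pooled, dim=-1)           # (batch, num_heads * hidden_dim)

sent = VectorMultiHeadPooling(hidden_dim=300)(torch.randn(2, 10, 300))
print(sent.shape)  # torch.Size([2, 1200])
```

Constraining how the per-dimension weights are produced recovers the familiar pooling operators as special cases, which is the generalization the abstract refers to.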

A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines

Title A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines
Authors Pedro Miraldo, Tiago Dias, Srikumar Ramalingam
Abstract We propose a minimal solution for pose estimation using both points and lines for a multi-perspective camera. In this paper, we treat the multi-perspective camera as a collection of rigidly attached perspective cameras. These types of imaging devices are useful for several computer vision applications that require large coverage, such as surveillance, self-driving cars, and motion-capture studios. While prior methods have considered the cases using solely points or lines, the hybrid case involving both points and lines has not been solved for multi-perspective cameras. We present solutions for two cases. In the first case, we are given 2D-to-3D correspondences for two points and one line. In the second case, we are given 2D-to-3D correspondences for one point and two lines. We show that the solution for the case of two points and one line can be formulated as a fourth-degree equation. This is interesting because we can obtain a closed-form solution and thereby achieve high computational efficiency. The latter case, involving two lines and one point, can be mapped to an eighth-degree equation. We show simulations and real experiments to demonstrate the advantages and benefits over existing methods.
Tasks Motion Capture, Pose Estimation, Self-Driving Cars
Published 2018-07-26
URL http://arxiv.org/abs/1807.09970v1
PDF http://arxiv.org/pdf/1807.09970v1.pdf
PWC https://paperswithcode.com/paper/a-minimal-closed-form-solution-for-multi
Repo https://github.com/pmiraldo/MinimalMultiPerspectivePose
Framework none
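
Because the two-points-one-line case reduces to a fourth-degree polynomial, candidate poses can be recovered by solving a quartic, which admits closed-form (or trivially cheap numerical) roots. The sketch below shows only that final root-finding step; the quartic coefficients are placeholders, since deriving them from the 2D-3D correspondences is the paper's actual contribution and is not reproduced here:

```python
import numpy as np

def real_roots(coeffs, tol=1e-9):
    """Return the real roots of a polynomial given its coefficients
    (highest degree first), e.g. the quartic produced by the
    two-points + one-line minimal case."""
    roots = np.roots(coeffs)
    return roots.real[np.abs(roots.imag) < tol]

# hypothetical quartic coefficients; in the paper these are functions of the
# point/line correspondences, whose derivation is not reproduced here
quartic = [1.0, -3.2, 0.7, 4.1, -1.5]
for r in real_roots(quartic):
    print(f"candidate solution parameter: {r:.4f}")
```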

TI-CNN: Convolutional Neural Networks for Fake News Detection

Title TI-CNN: Convolutional Neural Networks for Fake News Detection
Authors Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, Philip S. Yu
Abstract With the development of social networks, fake news for various commercial and political purposes has appeared in large numbers and spread widely in the online world. With deceptive words, people can be misled by fake news very easily and will share it without any fact-checking. For instance, during the 2016 US presidential election, various kinds of fake news about the candidates spread widely through both official news media and online social networks. Such fake news is usually released either to smear the opponents or to support the candidate on their side. The erroneous information in fake news is usually written to stir up voters’ irrational emotions and enthusiasm. Fake news of this kind can sometimes bring about devastating effects, and an important goal in improving the credibility of online social networks is to identify fake news in a timely manner. In this paper, we propose to study the fake news detection problem. Automatic fake news identification is extremely hard, since purely model-based fact-checking for news is still an open problem, and few existing models can be applied to solve the problem. Through a thorough investigation of fake news data, many useful explicit features are identified from both the text and the images used in fake news. Besides the explicit features, there also exist hidden patterns in the words and images used in fake news, which can be captured with a set of latent features extracted via the multiple convolutional layers in our model. A model named TI-CNN (Text and Image information based Convolutional Neural Network) is proposed in this paper. By projecting the explicit and latent features into a unified feature space, TI-CNN is trained with both the text and image information simultaneously. Extensive experiments carried out on real-world fake news datasets demonstrate the effectiveness of TI-CNN.
Tasks Fake News Detection
Published 2018-06-03
URL http://arxiv.org/abs/1806.00749v1
PDF http://arxiv.org/pdf/1806.00749v1.pdf
PWC https://paperswithcode.com/paper/ti-cnn-convolutional-neural-networks-for-fake
Repo https://github.com/AIRLegend/fakenews
Framework tf
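
A rough sketch of the two-branch design (latent text features from 1-D convolutions, latent image features from 2-D convolutions, plus explicit hand-crafted features, all projected into a unified space) is given below in Keras. Layer sizes, input shapes, and the explicit-feature dimensionality are placeholders rather than the paper's configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Text branch: 1-D convolutions over word embeddings (latent text features).
text_in = layers.Input(shape=(1000,), name="token_ids")          # placeholder length
t = layers.Embedding(input_dim=50000, output_dim=100)(text_in)
t = layers.Conv1D(128, 5, activation="relu")(t)
t = layers.GlobalMaxPooling1D()(t)

# Image branch: 2-D convolutions over the article image (latent image features).
img_in = layers.Input(shape=(224, 224, 3), name="image")
i = layers.Conv2D(32, 3, activation="relu")(img_in)
i = layers.MaxPooling2D()(i)
i = layers.Conv2D(64, 3, activation="relu")(i)
i = layers.GlobalAveragePooling2D()(i)

# Explicit (hand-crafted) features, e.g. punctuation counts or image resolution.
explicit_in = layers.Input(shape=(20,), name="explicit_features")

# Project everything into a unified feature space and classify fake vs. real.
merged = layers.concatenate([t, i, explicit_in])
merged = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="is_fake")(merged)

model = Model(inputs=[text_in, img_in, explicit_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```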

Relative Saliency and Ranking: Models, Metrics, Data, and Benchmarks

Title Relative Saliency and Ranking: Models, Metrics, Data, and Benchmarks
Authors Mahmoud Kalash, Md Amirul Islam, Neil D. B. Bruce
Abstract Salient object detection is a problem that has been considered in detail and many solutions have been proposed. In this paper, we argue that work to date has addressed a problem that is relatively ill-posed. Specifically, there is not universal agreement about what constitutes a salient object when multiple observers are queried. This implies that some objects are more likely to be judged salient than others, and implies that a relative rank exists among salient objects. Initially, we present a novel deep learning solution based on a hierarchical representation of relative saliency and stage-wise refinement. Further to this, we present data, analysis and baseline benchmark results towards addressing the problem of salient object ranking. Methods for deriving suitable ranked salient object instances are presented, along with metrics suitable for measuring algorithm performance. In addition, we show how a derived dataset can be successively refined to provide cleaned results that correlate well with pristine ground truth in its characteristics and value for training and testing models. Finally, we provide a comparison among prevailing algorithms that address salient object ranking or detection to establish initial baselines, providing a basis for comparison with future efforts addressing this problem. The source code and data are publicly available via our project page: https://ryersonvisionlab.github.io/cocosalrank.html
Tasks Object Detection, Salient Object Detection
Published 2018-10-03
URL https://arxiv.org/abs/1810.02426v2
PDF https://arxiv.org/pdf/1810.02426v2.pdf
PWC https://paperswithcode.com/paper/relative-saliency-and-ranking-models-metrics
Repo https://github.com/islamamirul/COCO-SalRank
Framework none
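
Evaluating salient object ranking requires comparing a predicted rank order against the ground-truth order agreed on by observers. One natural choice is a rank correlation such as Spearman's rho, sketched below; consult the paper for the exact metrics it proposes, since this snippet is only an illustration of the idea:

```python
import numpy as np
from scipy.stats import spearmanr

def rank_agreement(gt_ranks, pred_saliency):
    """Rank correlation between ground-truth object ranks and the rank order
    implied by predicted per-object saliency scores (higher = more salient)."""
    pred_ranks = np.argsort(np.argsort(-np.asarray(pred_saliency))) + 1
    rho, _ = spearmanr(gt_ranks, pred_ranks)
    return rho

# toy example: 4 object instances in one image
gt = [1, 2, 3, 4]                 # 1 = most salient according to observers
pred = [0.9, 0.7, 0.75, 0.1]      # model's per-instance saliency scores
print(rank_agreement(gt, pred))   # imperfect agreement: objects 2 and 3 swapped
```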

From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition

Title From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition
Authors Zhengyu Zhao, Martha Larson
Abstract As deep learning approaches to scene recognition emerge, they have continued to leverage discriminative regions at multiple scales, building on practices established by conventional image classification research. However, approaches remain largely generic, and do not carefully consider the special properties of scenes. In this paper, inspired by the intuitive differences between scenes and objects, we propose Adi-Red, an adaptive approach to discriminative region discovery for scene recognition. Adi-Red uses a CNN classifier, which was pre-trained using only image-level scene labels, to discover discriminative image regions directly. These regions are then used as a source of features to perform scene recognition. The use of the CNN classifier makes it possible to adapt the number of discriminative regions per image using a simple, yet elegant, threshold, at relatively low computational cost. Experimental results on the scene recognition benchmark dataset SUN397 demonstrate the ability of Adi-Red to outperform the state of the art. Additional experimental analysis on the Places dataset reveals the advantages of Adi-Red and highlights how they are specific to scenes. We attribute the effectiveness of Adi-Red to the ability of adaptive region discovery to avoid introducing noise, while also not missing out on important information.
Tasks Image Classification, Scene Recognition
Published 2018-07-23
URL http://arxiv.org/abs/1807.08624v2
PDF http://arxiv.org/pdf/1807.08624v2.pdf
PWC https://paperswithcode.com/paper/from-volcano-to-toyshop-adaptive
Repo https://github.com/ZhengyuZhao/Adi-Red-Scene
Framework pytorch
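
The adaptive step (selecting a variable number of discriminative regions per image by thresholding an activation map produced by a classifier trained with image-level labels only) can be sketched as follows. The map resolution, peak-picking rule, and threshold are placeholders, not the paper's exact procedure:

```python
import numpy as np

def discover_regions(activation_map, threshold=0.6):
    """Select local maxima of a coarse discriminative activation map whose
    value exceeds a threshold; the number of regions adapts to the image."""
    lo, hi = activation_map.min(), activation_map.max()
    amap = (activation_map - lo) / (hi - lo + 1e-8)    # normalise to [0, 1]
    peaks = []
    h, w = amap.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = amap[y - 1:y + 2, x - 1:x + 2]
            if amap[y, x] >= threshold and amap[y, x] == patch.max():
                peaks.append((y, x, float(amap[y, x])))
    return peaks  # each peak becomes the centre of a crop fed to the recognizer

# toy activation map (in Adi-Red this comes from a scene classifier pre-trained
# with image-level scene labels only)
np.random.seed(0)
amap = np.random.rand(14, 14)
print(len(discover_regions(amap, threshold=0.8)), "regions discovered")
```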

Segmenting Unknown 3D Objects from Real Depth Images using Mask R-CNN Trained on Synthetic Data

Title Segmenting Unknown 3D Objects from Real Depth Images using Mask R-CNN Trained on Synthetic Data
Authors Michael Danielczuk, Matthew Matl, Saurabh Gupta, Andrew Li, Andrew Lee, Jeffrey Mahler, Ken Goldberg
Abstract The ability to segment unknown objects in depth images has potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand-labeled datasets are available. As generating these datasets is time consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can transfer successfully to the real world. We present a method for automated dataset generation and rapidly generate a synthetic training dataset of 50,000 depth images and 320,000 object masks using simulated heaps of 3D CAD models. We train a variant of Mask R-CNN with domain randomization on the generated dataset to perform category-agnostic instance segmentation without any hand-labeled data and we evaluate the trained network, which we refer to as Synthetic Depth (SD) Mask R-CNN, on a set of real, high-resolution depth images of challenging, densely-cluttered bins containing objects with highly-varied geometry. SD Mask R-CNN outperforms point cloud clustering baselines by an absolute 15% in Average Precision and 20% in Average Recall on COCO benchmarks, and achieves performance levels similar to a Mask R-CNN trained on a massive, hand-labeled RGB dataset and fine-tuned on real images from the experimental setup. We deploy the model in an instance-specific grasping pipeline to demonstrate its usefulness in a robotics application. Code, the synthetic training dataset, and supplementary material are available at https://bit.ly/2letCuE.
Tasks Instance Segmentation, Object Tracking, Semantic Segmentation
Published 2018-09-16
URL http://arxiv.org/abs/1809.05825v2
PDF http://arxiv.org/pdf/1809.05825v2.pdf
PWC https://paperswithcode.com/paper/segmenting-unknown-3d-objects-from-real-depth
Repo https://github.com/BerkeleyAutomation/sd-maskrcnn
Framework none
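
One practical detail worth illustrating: a single-channel metric depth image has to be normalised and replicated to three channels before it can be fed to a Mask R-CNN backbone that expects RGB-shaped input. The depth range below is a placeholder, not the values used in the paper:

```python
import numpy as np

def depth_to_net_input(depth_m, d_min=0.25, d_max=1.0):
    """Normalise a metric depth image and replicate it to three channels so it
    can be consumed by a network designed for RGB-shaped input."""
    d = np.clip(depth_m, d_min, d_max)
    d = (d - d_min) / (d_max - d_min)              # scale to [0, 1]
    d8 = (255.0 * d).astype(np.uint8)
    return np.stack([d8, d8, d8], axis=-1)         # (H, W, 3)

depth = np.random.uniform(0.3, 0.9, size=(480, 640)).astype(np.float32)
print(depth_to_net_input(depth).shape)             # (480, 640, 3)
```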

Visual Object Networks: Image Generation with Disentangled 3D Representation

Title Visual Object Networks: Image Generation with Disentangled 3D Representation
Authors Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum, William T. Freeman
Abstract Recent progress in deep generative models has led to tremendous breakthroughs in image generation. However, while existing models can synthesize photorealistic images, they lack an understanding of our underlying 3D world. We present a new generative model, Visual Object Networks (VON), synthesizing natural images of objects with a disentangled 3D representation. Inspired by classic graphics rendering pipelines, we unravel our image formation process into three conditionally independent factors—shape, viewpoint, and texture—and present an end-to-end adversarial learning framework that jointly models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes that are indistinguishable from real shapes. It then renders the object’s 2.5D sketches (i.e., silhouette and depth map) from its shape under a sampled viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches to generate natural images. The VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations such as changing the viewpoint of a generated image, editing of shape and texture, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
Tasks Image Generation
Published 2018-12-06
URL http://arxiv.org/abs/1812.02725v1
PDF http://arxiv.org/pdf/1812.02725v1.pdf
PWC https://paperswithcode.com/paper/visual-object-networks-image-generation-with
Repo https://github.com/junyanz/VON
Framework pytorch

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling

Title Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
Authors Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Tianfan Xue, Joshua B. Tenenbaum, William T. Freeman
Abstract We study 3D shape modeling from a single image and make contributions to it in three aspects. First, we present Pix3D, a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc. Building such a large-scale dataset, however, is highly challenging; existing datasets either contain only synthetic data, or lack precise alignment between 2D images and 3D shapes, or only have a small number of images. Second, we calibrate the evaluation criteria for 3D shape reconstruction through behavioral studies, and use them to objectively and systematically benchmark cutting-edge reconstruction algorithms on Pix3D. Third, we design a novel model that simultaneously performs 3D reconstruction and pose estimation; our multi-task learning approach achieves state-of-the-art performance on both tasks.
Tasks 3D Reconstruction, 3D Shape Modeling, Multi-Task Learning, Pose Estimation, Viewpoint Estimation
Published 2018-04-12
URL http://arxiv.org/abs/1804.04610v1
PDF http://arxiv.org/pdf/1804.04610v1.pdf
PWC https://paperswithcode.com/paper/pix3d-dataset-and-methods-for-single-image-3d
Repo https://github.com/xingyuansun/pix3d
Framework tf

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Title Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Authors Sijie Yan, Yuanjun Xiong, Dahua Lin
Abstract Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization. In this work, we propose a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data. This formulation not only leads to greater expressive power but also stronger generalization capability. On two large datasets, Kinetics and NTU-RGBD, it achieves substantial improvements over mainstream methods.
Tasks 3D Human Pose Estimation, Action Recognition In Videos, Multimodal Activity Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2018-01-23
URL http://arxiv.org/abs/1801.07455v2
PDF http://arxiv.org/pdf/1801.07455v2.pdf
PWC https://paperswithcode.com/paper/spatial-temporal-graph-convolutional-networks-1
Repo https://github.com/ZhangNYG/ST-GCN
Framework pytorch
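
One ST-GCN block alternates a graph convolution over the skeleton joints with a temporal convolution over frames. The PyTorch sketch below simplifies the adjacency normalisation and omits the paper's spatial partitioning strategies and edge-importance weighting:

```python
import torch
import torch.nn as nn

class SimpleSTGCNBlock(nn.Module):
    """One simplified spatial-temporal block: graph conv over joints, then a
    temporal conv over frames. Input: (batch, channels, frames, joints)."""
    def __init__(self, in_ch, out_ch, adjacency, t_kernel=9):
        super().__init__()
        a = adjacency + torch.eye(adjacency.size(0))       # add self-loops
        d_inv = a.sum(dim=1).pow(-1.0)
        self.register_buffer("A", torch.diag(d_inv) @ a)   # row-normalised adjacency
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch,
                                  kernel_size=(t_kernel, 1),
                                  padding=((t_kernel - 1) // 2, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.spatial(x)                                 # mix channels per joint
        x = torch.einsum("nctv,vw->nctw", x, self.A)        # aggregate over the graph
        return self.relu(self.temporal(x))                  # mix information over time

# toy skeleton: 5 joints in a chain, 16 frames of 3-channel (x, y, confidence) input
A = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1
block = SimpleSTGCNBlock(3, 64, A)
print(block(torch.randn(2, 3, 16, 5)).shape)   # torch.Size([2, 64, 16, 5])
```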

Embedding Transfer for Low-Resource Medical Named Entity Recognition: A Case Study on Patient Mobility

Title Embedding Transfer for Low-Resource Medical Named Entity Recognition: A Case Study on Patient Mobility
Authors Denis Newman-Griffis, Ayah Zirikly
Abstract Functioning is gaining recognition as an important indicator of global health, but remains under-studied in medical natural language processing research. We present the first analysis of automatically extracting descriptions of patient mobility, using a recently-developed dataset of free text electronic health records. We frame the task as a named entity recognition (NER) problem, and investigate the applicability of NER techniques to mobility extraction. As text corpora focused on patient functioning are scarce, we explore domain adaptation of word embeddings for use in a recurrent neural network NER system. We find that embeddings trained on a small in-domain corpus perform nearly as well as those learned from large out-of-domain corpora, and that domain adaptation techniques yield additional improvements in both precision and recall. Our analysis identifies several significant challenges in extracting descriptions of patient mobility, including the length and complexity of annotated entities and high linguistic variability in mobility descriptions.
Tasks Domain Adaptation, Medical Named Entity Recognition, Named Entity Recognition, Word Embeddings
Published 2018-06-07
URL http://arxiv.org/abs/1806.02814v1
PDF http://arxiv.org/pdf/1806.02814v1.pdf
PWC https://paperswithcode.com/paper/embedding-transfer-for-low-resource-medical
Repo https://github.com/drgriffis/NeuralVecmap
Framework tf
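
The embedding-transfer step (aligning a small in-domain embedding space with a large out-of-domain one through the vocabulary they share) can be approximated with a simple least-squares linear map. This is a generic sketch with toy data, not the specific method implemented in the NeuralVecmap repository:

```python
import numpy as np

def fit_linear_map(src_vecs, tgt_vecs):
    """Least-squares W such that src_vecs @ W ≈ tgt_vecs, fit on words that
    appear in both embedding spaces (rows aligned by shared vocabulary)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

# toy data: 500 shared words, 100-d in-domain vectors, 300-d out-of-domain vectors
rng = np.random.default_rng(0)
in_domain = rng.normal(size=(500, 100))
out_domain = in_domain @ rng.normal(size=(100, 300)) + 0.01 * rng.normal(size=(500, 300))
W = fit_linear_map(in_domain, out_domain)

# project a new in-domain word vector into the out-of-domain space
projected = rng.normal(size=(1, 100)) @ W
print(projected.shape)   # (1, 300)
```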

Covariance Pooling For Facial Expression Recognition

Title Covariance Pooling For Facial Expression Recognition
Authors Dinesh Acharya, Zhiwu Huang, Danda Paudel, Luc Van Gool
Abstract Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance are better able to capture such distortions in regional facial features. In this work, we explore the benefits of using a manifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such manifold networks in conjunction with traditional convolutional networks for spatial pooling within individual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the validation set of the Real-World Affective Faces (RAF) Database. Both of these results are the best results we are aware of. Besides, we leverage covariance pooling to capture the temporal evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network of covariance pooling on top of convolutional network layers.
Tasks Facial Expression Recognition
Published 2018-05-13
URL http://arxiv.org/abs/1805.04855v1
PDF http://arxiv.org/pdf/1805.04855v1.pdf
PWC https://paperswithcode.com/paper/covariance-pooling-for-facial-expression
Repo https://github.com/d-acharya/CovPoolFER
Framework tf
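
Covariance pooling itself is easy to state: treat every spatial location of a convolutional feature map as a sample and pool the map into a channel-by-channel covariance matrix, which then feeds the manifold network layers. A minimal PyTorch sketch (the SPD manifold network that follows it in the paper is not shown):

```python
import torch

def covariance_pool(feature_map, eps=1e-5):
    """Second-order (covariance) pooling of a conv feature map.

    feature_map: (batch, channels, height, width)
    returns:     (batch, channels, channels) covariance over spatial locations
    """
    b, c, h, w = feature_map.shape
    x = feature_map.reshape(b, c, h * w)              # each location is a sample
    x = x - x.mean(dim=2, keepdim=True)               # centre per channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)         # (batch, c, c)
    return cov + eps * torch.eye(c, device=x.device)  # regularise towards SPD

feats = torch.randn(4, 64, 7, 7)
print(covariance_pool(feats).shape)   # torch.Size([4, 64, 64])
```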

LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks

Title LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks
Authors Christian Bartz, Haojin Yang, Joseph Bethge, Christoph Meinel
Abstract Recently, deep neural networks have achieved remarkable performance on the task of object detection and recognition. The reason for this success is mainly grounded in the availability of large-scale, fully annotated datasets, but the creation of such a dataset is a complicated and costly task. In this paper, we propose a novel method for weakly supervised object detection that simplifies the process of gathering data for training an object detector. We train an ensemble of two models that work together in a student-teacher fashion. Our student (localizer) is a model that learns to localize an object; the teacher (assessor) assesses the quality of the localization and provides feedback to the student. The student uses this feedback to learn how to localize objects and is thus entirely supervised by the teacher, as we use no labels for training the localizer. In our experiments, we show that our model is very robust to noise and reaches competitive performance compared to a state-of-the-art fully supervised approach. We also show the simplicity of creating a new dataset, based on a few videos (e.g. downloaded from YouTube) and artificially generated data.
Tasks Object Detection, Weakly Supervised Object Detection
Published 2018-11-14
URL http://arxiv.org/abs/1811.05773v2
PDF http://arxiv.org/pdf/1811.05773v2.pdf
PWC https://paperswithcode.com/paper/loans-weakly-supervised-object-detection-with
Repo https://github.com/Bartzi/loans
Framework none

Progressive Operational Perceptron with Memory

Title Progressive Operational Perceptron with Memory
Authors Dat Thanh Tran, Serkan Kiranyaz, Moncef Gabbouj, Alexandros Iosifidis
Abstract The Generalized Operational Perceptron (GOP) was proposed to generalize the linear neuron model in the traditional Multilayer Perceptron (MLP); this model can mimic the synaptic connections of biological neurons that have nonlinear neurochemical behaviours. The Progressive Operational Perceptron (POP) is a multilayer network composed of GOPs that is formed layer-wise progressively. In this work, we propose major modifications that can accelerate as well as augment the progressive learning procedure of POP by incorporating an information-preserving, linear projection path from the input to the output layer at each progressive step. The proposed extensions can be interpreted as a mechanism that provides the network with direct information extracted from the previously learned layers, hence the term “memory”. This allows the network to learn deeper architectures with better data representations. An extensive set of experiments shows that the proposed modifications can surpass the learning capability of the original POPs and other related algorithms.
Tasks
Published 2018-08-20
URL https://arxiv.org/abs/1808.06377v3
PDF https://arxiv.org/pdf/1808.06377v3.pdf
PWC https://paperswithcode.com/paper/progressive-operational-perceptron-with
Repo https://github.com/viebboy/PyGOP
Framework tf

Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods for Multimodal Optimization

Title Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods for Multimodal Optimization
Authors S. C. Maree, T. Alderliesten, D. Thierens, P. A. N. Bosman
Abstract This report presents benchmarking results of the latest version of the Hill-Valley Evolutionary Algorithm (HillVallEA) on the CEC2013 niching benchmark suite. The benchmarking follows the restrictions required by the GECCO 2018 competition on Niching methods for Multimodal Optimization; in particular, no problem-dependent parameter tuning is performed. A number of adjustments have been made relative to the original publication of HillVallEA, and these are discussed in this report.
Tasks
Published 2018-06-30
URL http://arxiv.org/abs/1807.00188v2
PDF http://arxiv.org/pdf/1807.00188v2.pdf
PWC https://paperswithcode.com/paper/benchmarking-the-hill-valley-evolutionary
Repo https://github.com/scmaree/HillVallEA
Framework none
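
At the core of HillVallEA is the hill-valley test: two solutions are considered to belong to the same niche if no point sampled on the line segment between them is worse than the worse of the two. A minimal sketch for minimisation, with the number of test points and the surrounding clustering logic simplified relative to the actual algorithm:

```python
import numpy as np

def hill_valley_test(x, y, f, fx, fy, n_test_points=5):
    """Return True if x and y appear to lie in the same valley of the
    (minimisation) landscape f: no interior point on the segment x-y is worse
    than the worse of the two endpoints."""
    worst = max(fx, fy)
    for k in range(1, n_test_points + 1):
        t = k / (n_test_points + 1)
        if f(x + t * (y - x)) > worst:
            return False          # a hill separates them: different niches
    return True

# toy bimodal function with minima near -1 and +1
f = lambda v: float((v[0] ** 2 - 1.0) ** 2)
a, b = np.array([-1.0]), np.array([1.0])
print(hill_valley_test(a, b, f, f(a), f(b)))            # False: the hump at 0 separates them
c = np.array([-0.8])
print(hill_valley_test(a, c, f, f(a), f(c)))            # True: same basin as -1
```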

Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons

Title Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons
Authors Sven Buechel, Udo Hahn
Abstract In the past years, sentiment analysis has increasingly shifted attention to representational frameworks more expressive than semantic polarity (being positive, negative or neutral). However, these richer formats (like Basic Emotions or Valence-Arousal-Dominance, and variants thereof), rooted in psychological research, tend to proliferate the number of representation schemes for emotion encoding. Thus, a large number of representationally incompatible emotion lexicons have been developed by various research groups, each adopting one or the other emotion representation format. As a consequence, the reusability of these resources decreases, as does the comparability of systems using them. In this paper, we propose to solve this dilemma with methods and tools that map different representation formats onto each other for the sake of mutual compatibility and interoperability of language resources. We present the first large-scale investigation of such representation mappings for four typologically diverse languages and find evidence that our approach produces (near-)gold-quality emotion lexicons, even in cross-lingual settings. Finally, we use our models to create new lexicons for eight typologically diverse languages.
Tasks Sentiment Analysis
Published 2018-07-02
URL http://arxiv.org/abs/1807.00775v1
PDF http://arxiv.org/pdf/1807.00775v1.pdf
PWC https://paperswithcode.com/paper/representation-mapping-a-novel-approach-to
Repo https://github.com/JULIELab/EmoMap
Framework none
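
The representation mapping itself boils down to learning a function from one emotion format to another on the words annotated in both, then applying it to the rest of the source lexicon. The sketch below uses Valence-Arousal-Dominance as the source format, five basic-emotion intensities as the target, ridge regression as a stand-in model, and synthetic data; all of these choices are placeholders rather than the paper's setup:

```python
import numpy as np
from sklearn.linear_model import Ridge

# toy seed lexicon: rows are words annotated in BOTH formats
# source: Valence-Arousal-Dominance (3 dims), target: 5 basic-emotion intensities
rng = np.random.default_rng(1)
vad = rng.uniform(1, 9, size=(200, 3))
basic_emotions = vad @ rng.normal(size=(3, 5)) + rng.normal(scale=0.1, size=(200, 5))

# fit one regressor per target emotion dimension
mappers = [Ridge(alpha=1.0).fit(vad, basic_emotions[:, j]) for j in range(5)]

# map a word that only has VAD annotations into the basic-emotion format
new_word_vad = np.array([[7.5, 6.0, 5.5]])
mapped = np.column_stack([m.predict(new_word_vad) for m in mappers])
print(mapped.round(2))
```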