October 20, 2019

3508 words 17 mins read

Paper Group AWR 286

Predicting protein inter-residue contacts using composite likelihood maximization and deep learning

Title Predicting protein inter-residue contacts using composite likelihood maximization and deep learning
Authors Haicang Zhang, Qi Zhang, Fusong Ju, Jianwei Zhu, Shiwei Sun, Yujuan Gao, Ziwei Xie, Minghua Deng, Shiwei Sun, Wei-Mou Zheng, Dongbo Bu
Abstract Accurate prediction of the inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proved effective for inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of the MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge. In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA, which uses pseudo-likelihood, i.e., the product of the conditional probabilities of individual residues, our approach uses composite likelihood, i.e., the product of the conditional probabilities of all residue pairs. Composite likelihood has been theoretically shown to be a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including the PSICOV and CASP-11 datasets, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy; ii) when equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA is further improved significantly, suggesting the suitability of clmDCA for a subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset. Accessibility: The software clmDCA and a server are publicly accessible through http://protein.ict.ac.cn/clmDCA/.
Tasks
Published 2018-08-31
URL http://arxiv.org/abs/1809.00083v1
PDF http://arxiv.org/pdf/1809.00083v1.pdf
PWC https://paperswithcode.com/paper/predicting-protein-inter-residue-contacts
Repo https://github.com/zhanghaicang/MRF-suite
Framework none
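The distinction the abstract draws between pseudo-likelihood and composite likelihood can be written out schematically. Below is the contrast in generic Potts/MRF notation (fields h_i, couplings J_ij); the exact parameterization and regularization used by clmDCA may differ from this sketch.

```latex
% Pseudo-likelihood (plmDCA): product of single-site conditionals
\ell_{\mathrm{pseudo}}(h, J) = \sum_{i=1}^{L} \log P\!\left(x_i \mid x_{\setminus i};\, h, J\right)

% Composite likelihood (clmDCA, per the abstract): product of pairwise conditionals
\ell_{\mathrm{comp}}(h, J) = \sum_{1 \le i < j \le L} \log P\!\left(x_i, x_j \mid x_{\setminus \{i,j\}};\, h, J\right)
```

Each pairwise conditional normalizes over only the (roughly 21 x 21) table of amino-acid pairs at positions i and j, which is why the composite objective stays tractable to maximize while sitting closer to the full likelihood than the single-site factorization.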

Multi-Scale Gradual Integration CNN for False Positive Reduction in Pulmonary Nodule Detection

Title Multi-Scale Gradual Integration CNN for False Positive Reduction in Pulmonary Nodule Detection
Authors Bum-Chae Kim, Jun-Sik Choi, Heung-Il Suk
Abstract Lung cancer is a global and dangerous disease, and its early detection is crucial to reducing the risks of mortality. In this regard, there has been great interest in developing computer-aided systems for detecting pulmonary nodules on thoracic CT scans as early as possible. In general, a nodule detection system involves two steps: (i) candidate nodule detection at a high sensitivity, which captures many false positives, and (ii) false positive reduction from the candidates. However, due to the high variation of nodule morphological characteristics and the possibility of mistaking them for neighboring organs, candidate nodule detection remains a challenge. In this study, we propose a novel Multi-scale Gradual Integration Convolutional Neural Network (MGI-CNN), designed with three main strategies: (1) to use multi-scale inputs with different levels of contextual information, (2) to use abstract information inherent in different input scales with gradual integration, and (3) to learn multi-stream feature integration in an end-to-end manner. To verify the efficacy of the proposed network, we conducted exhaustive experiments on the LUNA16 challenge datasets by comparing the performance of the proposed method with state-of-the-art methods in the literature. On two candidate subsets of the LUNA16 dataset, i.e., V1 and V2, our method achieved an average CPM of 0.908 (V1) and 0.942 (V2), outperforming comparable methods by a large margin. Our MGI-CNN is implemented in Python using TensorFlow and the source code is available from https://github.com/ku-milab/MGICNN.
Tasks
Published 2018-07-24
URL http://arxiv.org/abs/1807.10581v1
PDF http://arxiv.org/pdf/1807.10581v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-gradual-integration-cnn-for-false
Repo https://github.com/ku-milab/MGICNN
Framework tf
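To make the "gradual integration" strategy in the abstract concrete, here is a minimal, runnable numpy sketch of the data flow: features from the smallest-context patch are fused with the next scale before further processing, rather than concatenating all scales at once. The conv_block function is a hypothetical stand-in (a random linear map with ReLU) for a real 3D convolutional stage, and the patch sizes and zoom-in ordering are assumptions, not the paper's configuration.

```python
# A minimal sketch of gradual multi-scale integration for false positive
# reduction; all weights are random stand-ins, purely to show the data flow.
import numpy as np

rng = np.random.default_rng(0)

def conv_block(x, out_dim):
    """Stand-in for a convolutional stage: flatten and apply a fixed random projection + ReLU."""
    w = rng.standard_normal((x.size, out_dim)) / np.sqrt(x.size)
    return np.maximum(x.reshape(-1) @ w, 0.0)

# Three nodule-centered patches with growing context (sizes are illustrative).
scales = [rng.standard_normal((8, 8, 8)),
          rng.standard_normal((16, 16, 16)),
          rng.standard_normal((24, 24, 24))]

feat = conv_block(scales[0], 64)                  # start from the smallest context
for patch in scales[1:]:                          # gradually fold in wider context
    wider = conv_block(patch, 64)
    feat = conv_block(np.concatenate([feat, wider]), 64)

nodule_score = 1.0 / (1.0 + np.exp(-feat.sum()))  # toy false-positive-reduction score
print(nodule_score)
```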

cilantro: A Lean, Versatile, and Efficient Library for Point Cloud Data Processing

Title cilantro: A Lean, Versatile, and Efficient Library for Point Cloud Data Processing
Authors Konstantinos Zampogiannis, Cornelia Fermuller, Yiannis Aloimonos
Abstract We introduce cilantro, an open-source C++ library for geometric and general-purpose point cloud data processing. The library provides functionality that covers low-level point cloud operations, spatial reasoning, various methods for point cloud segmentation and generic data clustering, flexible algorithms for robust or local geometric alignment, model fitting, as well as powerful visualization tools. To accommodate all kinds of workflows, cilantro is almost fully templated, and most of its generic algorithms operate in arbitrary data dimension. At the same time, the library is easy to use and highly expressive, promoting a clean and concise coding style. cilantro is highly optimized, has a minimal set of external dependencies, and supports rapid development of performant point cloud processing software in a wide variety of contexts.
Tasks
Published 2018-07-01
URL http://arxiv.org/abs/1807.00399v3
PDF http://arxiv.org/pdf/1807.00399v3.pdf
PWC https://paperswithcode.com/paper/cilantro-a-lean-versatile-and-efficient
Repo https://github.com/kzampog/cilantro
Framework none

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

Title Deep Reasoning with Knowledge Graph for Social Relationship Understanding
Authors Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin
Abstract Social relationships (e.g., friends, couples, etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears great potential for intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret the social relationships within a group based not only on the people themselves; the interplay between such social relationships and the contextual information around the people also plays a significant role. However, these additional cues are largely overlooked by previous studies. We found that the interplay between these two factors can be effectively modeled by a novel structured knowledge graph with proper message propagation and attention. This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between persons of interest and the contextual objects. Meanwhile, a graph attention mechanism is introduced to explicitly reason about the discriminative objects to promote recognition. Extensive experiments on public benchmarks demonstrate the superiority of our method over the existing leading competitors.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00504v1
PDF http://arxiv.org/pdf/1807.00504v1.pdf
PWC https://paperswithcode.com/paper/deep-reasoning-with-knowledge-graph-for
Repo https://github.com/HCPLab-SYSU/SR
Framework pytorch
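The abstract names two mechanisms: message propagation over a structured knowledge graph and a graph attention readout over context objects. The numpy sketch below shows roughly what that combination looks like; the graph size, feature dimensions, and update rule are illustrative assumptions rather than the paper's exact GRM.

```python
# Gated-style message propagation over a person-object graph, then an
# attention readout; all numbers are random stand-ins for illustration.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, steps = 6, 16, 3               # nodes 0-1: persons, rest: context objects
A = (rng.random((num_nodes, num_nodes)) > 0.5).astype(float)  # knowledge-graph adjacency
np.fill_diagonal(A, 0.0)

H = rng.standard_normal((num_nodes, dim))       # initial node features
W_msg = rng.standard_normal((dim, dim)) / np.sqrt(dim)
W_upd = rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)

for _ in range(steps):                          # propagate messages along graph edges
    M = A @ (H @ W_msg)                         # aggregate neighbour messages
    H = np.tanh(np.concatenate([H, M], axis=1) @ W_upd)

# Attention over context-object nodes, conditioned on the person nodes.
person = H[:2].mean(axis=0)
scores = H[2:] @ person                         # relevance of each object to the pair
alpha = np.exp(scores) / np.exp(scores).sum()   # attention weights over objects
graph_repr = np.concatenate([person, (alpha[:, None] * H[2:]).sum(axis=0)])
print(alpha.round(3), graph_repr.shape)
```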

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

Title Learning Monocular Depth by Distilling Cross-domain Stereo Networks
Authors Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang
Abstract Monocular depth estimation aims at estimating a pixel-wise depth map for a single image, which has wide applications in scene understanding and autonomous driving. Existing supervised and unsupervised methods face great challenges. Supervised methods require large amounts of depth measurement data, which are generally difficult to obtain, while unsupervised methods are usually limited in estimation accuracy. Synthetic data generated by graphics engines provide a possible solution for collecting large amounts of depth data. However, the large domain gaps between synthetic and realistic data make directly training with them challenging. In this paper, we propose to use a stereo matching network as a proxy to learn depth from synthetic data and to use the predicted stereo disparity maps for supervising the monocular depth estimation network. Cross-domain synthetic data can be fully utilized in this novel framework. Different strategies are proposed to ensure that the learned depth perception capability transfers well across different domains. Our extensive experiments show state-of-the-art results for monocular depth estimation on the KITTI dataset.
Tasks Autonomous Driving, Depth Estimation, Monocular Depth Estimation, Scene Understanding, Stereo Matching, Stereo Matching Hand
Published 2018-08-20
URL http://arxiv.org/abs/1808.06586v1
PDF http://arxiv.org/pdf/1808.06586v1.pdf
PWC https://paperswithcode.com/paper/learning-monocular-depth-by-distilling-cross
Repo https://github.com/xy-guo/Learning-Monocular-Depth-by-Stereo
Framework pytorch
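The supervision scheme in the abstract, distilling a stereo network's disparity into a monocular network, boils down to a per-pixel regression against pseudo-labels. The snippet below is a toy illustration with random stand-ins for both networks; the L1 objective and the KITTI-like calibration constants are assumptions for illustration, not the paper's exact setup.

```python
# Distilling stereo disparity into a monocular depth network (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 5

disp_stereo = np.abs(rng.standard_normal((H, W))) + 0.1   # proxy disparity from the stereo net
disp_mono   = np.abs(rng.standard_normal((H, W))) + 0.1   # current monocular prediction

distill_loss = np.abs(disp_mono - disp_stereo).mean()      # supervise the mono net with stereo output

# Disparity relates to metric depth through the calibration: depth = f * B / disparity.
focal, baseline = 721.5, 0.54                               # KITTI-like values (assumed)
depth_pseudo = focal * baseline / disp_stereo
print(distill_loss, depth_pseudo.mean())
```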

Women also Snowboard: Overcoming Bias in Captioning Models

Title Women also Snowboard: Overcoming Bias in Captioning Models
Authors Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach
Abstract Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data (e.g., if a word is present in 60% of training sentences, it might be predicted in 70% of sentences at test time). This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over-reliance on the learned prior and image context. In this work we investigate the generation of gender-specific caption words (e.g. man, woman) based on the person’s appearance or the image context. We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. The resulting model is forced to look at a person rather than use contextual cues to make gender-specific predictions. The losses that comprise our model, the Appearance Confusion Loss and the Confident Loss, are general and can be added to any description model in order to mitigate the impacts of unwanted bias in a description dataset. Our proposed model has lower error than prior work when describing images with people and mentioning their gender, and more closely matches the ground-truth ratio of sentences including women to sentences including men. We also show that, unlike other approaches, our model is indeed more often looking at people when predicting their gender.
Tasks Image Captioning
Published 2018-03-26
URL http://arxiv.org/abs/1803.09797v4
PDF http://arxiv.org/pdf/1803.09797v4.pdf
PWC https://paperswithcode.com/paper/women-also-snowboard-overcoming-bias-in-1
Repo https://github.com/dtak/local-independence-public
Framework tf
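The two losses named in the abstract can be sketched in a few lines: with the person occluded, the gendered-word probabilities should match; with the person visible, the correct word should dominate. The formulations below (an absolute-difference confusion term and a quotient-style confidence term) are assumptions chosen to illustrate the intent, not the paper's exact definitions.

```python
# Toy versions of the Appearance Confusion Loss and Confident Loss.
import numpy as np

def appearance_confusion_loss(p_woman_masked, p_man_masked):
    # With the person blocked out, the two probabilities should be close.
    return abs(p_woman_masked - p_man_masked)

def confident_loss(p_correct, p_wrong, eps=1e-6):
    # With the person visible, the correct gendered word should dominate.
    return p_wrong / (p_correct + eps)

print(appearance_confusion_loss(0.48, 0.52))   # small: suitably unsure on the occluded image
print(confident_loss(0.90, 0.05))              # small: confident on the full image
```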

Video Re-localization

Title Video Re-localization
Authors Yang Feng, Lin Ma, Wei Liu, Tong Zhang, Jiebo Luo
Abstract Many methods have been developed to help people find the video contents they want efficiently. However, there are still some unsolved problems in this area. For example, given a query video and a reference video, how can we accurately localize a segment in the reference video such that the segment semantically corresponds to the query video? We define a distinctively new task, namely video re-localization, to address this scenario. Video re-localization is an important emerging technology implicating many applications, such as fast seeking in videos, video copy detection, video surveillance, etc. Meanwhile, it is also a challenging research task because the visual appearance of a semantic concept in videos can have large variations. The first hurdle to clear for the video re-localization task is the lack of existing datasets. It is labor-intensive to collect pairs of videos with semantic coherence or correspondence and to label the corresponding segments. We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with localized boundary information. Subsequently, we propose an innovative cross gated bilinear matching model such that every time-step in the reference video is matched against the attentively weighted query video. Consequently, the prediction of the starting and ending time is formulated as a classification problem based on the matching results. Extensive experimental results show that the proposed method outperforms the competing methods. Our code is available at: https://github.com/fengyang0317/video_reloc.
Tasks
Published 2018-08-05
URL http://arxiv.org/abs/1808.01575v1
PDF http://arxiv.org/pdf/1808.01575v1.pdf
PWC https://paperswithcode.com/paper/video-re-localization
Repo https://github.com/fengyang0317/video_reloc
Framework tf
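As a rough illustration of the matching described in the abstract, the sketch below lets each reference time-step attend over the query video and scores the match with a bilinear form. The gating, the start/end classifier, and the toy threshold decoding are simplifications and assumptions, not the paper's model.

```python
# Attentive bilinear matching of a reference video against a query video (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
Tq, Tr, d = 6, 12, 8
Q = rng.standard_normal((Tq, d))        # query-video features
R = rng.standard_normal((Tr, d))        # reference-video features
W = rng.standard_normal((d, d)) / np.sqrt(d)

att = np.exp(R @ Q.T)                   # attention of each reference step over query frames
att /= att.sum(axis=1, keepdims=True)
q_ctx = att @ Q                         # attentively weighted query per reference step

match = np.einsum('td,de,te->t', R, W, q_ctx)   # bilinear matching score per step
segment = np.flatnonzero(match > match.mean())  # toy decoding of the re-localized segment
print(match.round(2), segment)
```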

Parsing R-CNN for Instance-Level Human Analysis

Title Parsing R-CNN for Instance-Level Human Analysis
Authors Lu Yang, Qing Song, Zhihui Wang, Ming Jiang
Abstract Instance-level human analysis is common in real-life scenarios and has multiple manifestations, such as human part segmentation, dense pose estimation, human-object interactions, etc. Models need to distinguish different human instances in the image panel and learn rich features to represent the details of each instance. In this paper, we present an end-to-end pipeline for instance-level human analysis, named Parsing R-CNN. It processes a set of human instances simultaneously by comprehensively considering the characteristics of the region-based approach and the appearance of a human, thus allowing the details of instances to be represented. Parsing R-CNN is very flexible and efficient, and is applicable to many issues in human instance analysis. Our approach outperforms all state-of-the-art methods on the CIHP (Crowd Instance-level Human Parsing), MHP v2.0 (Multi-Human Parsing) and DensePose-COCO datasets. Based on the proposed Parsing R-CNN, we reached 1st place in the COCO 2018 Challenge DensePose Estimation task. Code and models are publicly available.
Tasks Human-Object Interaction Detection, Human Parsing, Human Part Segmentation, Multi-Human Parsing, Pose Estimation
Published 2018-11-30
URL http://arxiv.org/abs/1811.12596v1
PDF http://arxiv.org/pdf/1811.12596v1.pdf
PWC https://paperswithcode.com/paper/parsing-r-cnn-for-instance-level-human
Repo https://github.com/soeaver/Parsing-R-CNN
Framework pytorch

LSTA: Long Short-Term Attention for Egocentric Action Recognition

Title LSTA: Long Short-Term Attention for Egocentric Action Recognition
Authors Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Abstract Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires fine-grained discrimination of small objects and their manipulation. While some methods rely on strong supervision and attention mechanisms, they are either annotation-hungry or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features from spatially relevant parts while attention is being tracked smoothly across the video sequence. We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks.
Tasks Activity Recognition, Egocentric Activity Recognition, Temporal Action Localization
Published 2018-11-26
URL http://arxiv.org/abs/1811.10698v3
PDF http://arxiv.org/pdf/1811.10698v3.pdf
PWC https://paperswithcode.com/paper/lsta-long-short-term-attention-for-egocentric
Repo https://github.com/swathikirans/LSTA
Framework pytorch
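A toy picture of the idea in the abstract, an attention map over spatial locations that is tracked smoothly across frames while a recurrent state pools the attended features, is sketched below. This is not the LSTA cell itself; the convex update of the attention map and the plain RNN-style state update are assumptions for illustration.

```python
# Smoothly tracked spatial attention inside a recurrent loop (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
T, HW, d = 5, 49, 16                       # frames, 7x7 spatial grid, channels
frames = rng.standard_normal((T, HW, d))

W_att = rng.standard_normal(d) / np.sqrt(d)
W_h = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)

att = np.full(HW, 1.0 / HW)                # attention map, carried across time
h = np.zeros(d)                            # recurrent state
for x in frames:
    new_att = np.exp(x @ W_att)
    new_att /= new_att.sum()
    att = 0.7 * att + 0.3 * new_att        # smooth tracking of attention across frames
    pooled = (att[:, None] * x).sum(axis=0)
    h = np.tanh(np.concatenate([h, pooled]) @ W_h)

print(att.argmax(), h.shape)
```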

Pop Music Highlighter: Marking the Emotion Keypoints

Title Pop Music Highlighter: Marking the Emotion Keypoints
Authors Yu-Siang Huang, Szu-Yu Chou, Yi-Hsuan Yang
Abstract The goal of music highlight extraction is to get a short consecutive segment of a piece of music that provides an effective representation of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, for Pop songs. The rationale behind that approach is that the highlight of a song is usually the most emotional part. This paper extends our previous work in the following two aspects. First, methodology-wise we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods against a heuristic energy-based method, a structural repetition-based method, and a few other simple feature-based methods for this task. Due to the lack of public-domain labeled data for highlight extraction, following our previous work we use the RWC POP 100-song data set to evaluate how the detected highlights overlap with any chorus sections of the songs. The experiments demonstrate the effectiveness of our methods over competing methods. For reproducibility, we open source the code and pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.
Tasks Emotion Classification
Published 2018-02-28
URL http://arxiv.org/abs/1802.10495v2
PDF http://arxiv.org/pdf/1802.10495v2.pdf
PWC https://paperswithcode.com/paper/pop-music-highlighter-marking-the-emotion
Repo https://github.com/remyhuang/pop-music-highlighter
Framework tf
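The attention-based highlighting idea reduces to scoring short chunks of a song and picking the consecutive window with the largest total attention mass, which the sketch below illustrates with random stand-in scores; the chunk length, window length, and the scores themselves are assumptions, not outputs of the trained model.

```python
# Picking a highlight as the window with the most attention mass (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
n_chunks, highlight_len = 60, 10              # e.g. 1-second chunks, 10-second highlight

att = np.exp(rng.standard_normal(n_chunks))   # stand-in attention scores per chunk
att /= att.sum()

window_mass = np.convolve(att, np.ones(highlight_len), mode="valid")
start = int(window_mass.argmax())             # highlight = window with most attention mass
print(f"highlight: chunks [{start}, {start + highlight_len})")
```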

Constrained Neural Style Transfer for Decorated Logo Generation

Title Constrained Neural Style Transfer for Decorated Logo Generation
Authors Gantugs Atarsaikhan, Brian Kenji Iwana, Seiichi Uchida
Abstract Making decorated logos requires image-editing skills; without sufficient skills, it can be a time-consuming task. While there are many online web services for making new logos, they have limited designs and duplicates can be made. We propose using neural style transfer with clip art and text for the creation of new and genuine logos. We introduce a new loss function based on the distance transform of the input image, which allows the preservation of the silhouettes of text and objects. The proposed method constrains style transfer to the region around the designated area. We demonstrate the characteristics of the proposed method. Finally, we show the results of logo generation with various input images.
Tasks Style Transfer
Published 2018-03-02
URL http://arxiv.org/abs/1803.00686v2
PDF http://arxiv.org/pdf/1803.00686v2.pdf
PWC https://paperswithcode.com/paper/constrained-neural-style-transfer-for
Repo https://github.com/gttugsuu/Constrained-Neural-Style-Transfer-for-Decorated-Logo-Generation
Framework tf
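The distance-transform constraint in the abstract can be illustrated directly with scipy: compute the distance of every pixel to the text/clip-art silhouette and use it to weight how much each pixel is allowed to change. The exponential falloff and the squared-error penalty below are assumptions for illustration; the paper's exact loss is not reproduced here.

```python
# Distance-transform-weighted constraint on stylization (toy sketch).
import numpy as np
from scipy.ndimage import distance_transform_edt

mask = np.zeros((32, 32))
mask[12:20, 8:24] = 1.0                        # toy "text" silhouette

dist = distance_transform_edt(mask == 0)        # distance of each pixel to the silhouette
weight = np.exp(-dist / 4.0)                    # allow more change close to the silhouette

content = np.random.default_rng(0).random((32, 32))
stylized = content + 0.5 * np.random.default_rng(1).standard_normal((32, 32))

# Deviations from the input are cheap near the silhouette and expensive far
# away, which keeps the background and letter shapes intact.
constrained_loss = ((1.0 - weight) * (stylized - content) ** 2).mean()
print(round(constrained_loss, 4))
```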

FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

Title FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
Authors Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
Abstract A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson’s trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
Tasks Density Estimation, Image Generation
Published 2018-10-02
URL http://arxiv.org/abs/1810.01367v3
PDF http://arxiv.org/pdf/1810.01367v3.pdf
PWC https://paperswithcode.com/paper/ffjord-free-form-continuous-dynamics-for
Repo https://github.com/rtqichen/ffjord
Framework pytorch
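The key trick the abstract mentions, Hutchinson's trace estimator, is easy to verify numerically: for probe vectors with zero mean and identity covariance, E[eps^T J eps] = tr(J). FFJORD evaluates eps^T J with vector-Jacobian products from autodiff so the Jacobian is never materialized; the check below uses a small explicit Jacobian purely as an illustrative simplification.

```python
# Numerical check of the Hutchinson trace estimator used for the log-density.
import numpy as np

rng = np.random.default_rng(0)
d = 5
J = rng.standard_normal((d, d))             # stand-in Jacobian df/dz of the ODE dynamics

true_trace = np.trace(J)

n_samples = 20000
eps = rng.choice([-1.0, 1.0], size=(n_samples, d))    # Rademacher probe vectors
estimates = np.einsum('nd,de,ne->n', eps, J, eps)      # eps^T J eps per sample

print(true_trace, estimates.mean(), estimates.std() / np.sqrt(n_samples))
```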

MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction

Title MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction
Authors Soumya Sanyal, Janakiraman Balachandran, Naganand Yadati, Abhishek Kumar, Padmini Rajagopalan, Suchismita Sanyal, Partha Talukdar
Abstract Developing accurate, transferable and computationally inexpensive machine learning models can rapidly accelerate the discovery and development of new materials. Some of the major challenges involved in developing such models are: (i) the limited availability of materials data as compared to other fields, and (ii) the lack of a universal descriptor of materials for predicting their various properties. The limited availability of materials data can be addressed through transfer learning, while the generic representation was recently addressed by Xie and Grossman [1], who developed a crystal graph convolutional neural network (CGCNN) that provides a unified representation of crystals. In this work, we develop a new model (MT-CGCNN) by integrating CGCNN with transfer learning based on multi-task (MT) learning. We demonstrate the effectiveness of MT-CGCNN by simultaneously predicting various material properties such as Formation Energy, Band Gap and Fermi Energy for a wide range of inorganic crystals (46774 materials). MT-CGCNN is able to reduce the test error by up to 8% when employed on correlated properties. The model prediction has lower test error compared to CGCNN, even when the training data is reduced by 10%. We also demonstrate our model’s better performance through an end-user scenario: metal/non-metal classification. These results encourage further development of machine learning approaches which leverage multi-task learning to address the aforementioned challenges in the discovery of new materials. We make MT-CGCNN’s source code available to encourage reproducible research.
Tasks Band Gap, Formation Energy, Multi-Task Learning, Transfer Learning
Published 2018-11-14
URL http://arxiv.org/abs/1811.05660v1
PDF http://arxiv.org/pdf/1811.05660v1.pdf
PWC https://paperswithcode.com/paper/mt-cgcnn-integrating-crystal-graph
Repo https://github.com/soumyasanyal/mt-cgcnn
Framework pytorch
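The multi-task setup in the abstract amounts to one shared crystal representation feeding several property heads trained with a combined loss. The sketch below uses a random linear map as a stand-in for the CGCNN encoder, and the equal task weighting is an assumption rather than the paper's choice.

```python
# Shared encoder with per-property heads and a combined multi-task loss (toy sketch).
import numpy as np

rng = np.random.default_rng(0)
n_crystals, in_dim, hid = 4, 32, 16
X = rng.standard_normal((n_crystals, in_dim))           # stand-in crystal-graph features
targets = {"formation_energy": rng.standard_normal(n_crystals),
           "band_gap": rng.standard_normal(n_crystals),
           "fermi_energy": rng.standard_normal(n_crystals)}

W_shared = rng.standard_normal((in_dim, hid)) / np.sqrt(in_dim)
heads = {k: rng.standard_normal(hid) / np.sqrt(hid) for k in targets}

H = np.tanh(X @ W_shared)                                # one representation shared by all tasks
losses = {k: np.mean((H @ w - targets[k]) ** 2) for k, w in heads.items()}
total = sum(losses.values()) / len(losses)                # equally weighted multi-task objective
print({k: round(v, 3) for k, v in losses.items()}, round(total, 3))
```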

Efficient Structured Pruning and Architecture Searching for Group Convolution

Title Efficient Structured Pruning and Architecture Searching for Group Convolution
Authors Ruizhe Zhao, Wayne Luk
Abstract Efficient inference for convolutional neural networks has recently become a thriving topic. When deploying a pre-trained model, it is desirable to achieve the maximal test accuracy under given inference budget constraints. Network pruning is a commonly used technique, but it may produce irregular sparse models that can hardly gain actual speed-up. Group convolution is a promising pruning target due to its regular structure; however, incorporating such structure into the pruning procedure is challenging, because structural constraints are hard to describe and can make pruning intractable to solve. The need to configure the group convolution architecture, i.e., the number of groups, so as to maximise test accuracy adds further difficulty. This paper presents an efficient method to address this challenge. We formulate group convolution pruning as finding the optimal channel permutation to impose structural constraints, and solve it efficiently by heuristics. We also apply local search to explore group configurations based on estimated pruning cost to maximise test accuracy. Compared to prior work, results show that our method produces competitive group convolution models for various tasks within a shorter pruning period and enables rapid group configuration exploration subject to inference budget constraints.
Tasks Domain Adaptation, Network Pruning
Published 2018-11-23
URL https://arxiv.org/abs/1811.09341v4
PDF https://arxiv.org/pdf/1811.09341v4.pdf
PWC https://paperswithcode.com/paper/learning-grouped-convolution-for-efficient
Repo https://github.com/kumasento/gconv-prune
Framework pytorch
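The structured-pruning target in the abstract, turning a dense convolution into a grouped one while keeping as much useful weight as possible, can be illustrated on a single weight matrix. The greedy block assignment below is a simplification, not the paper's channel-permutation heuristic, and it skips the balance constraint (an equal number of filters per group) that a real grouped layer would need.

```python
# Toy structured pruning toward a group-convolution sparsity pattern.
import numpy as np

rng = np.random.default_rng(0)
c_out, c_in, groups = 8, 8, 2
W = rng.standard_normal((c_out, c_in))                    # dense conv weights (per spatial tap)

in_groups = np.array_split(np.arange(c_in), groups)       # fixed input-channel partition
mask = np.zeros_like(W)
for o in range(c_out):
    block_mass = [np.abs(W[o, idx]).sum() for idx in in_groups]
    g = int(np.argmax(block_mass))                         # keep the heaviest block per filter
    mask[o, in_groups[g]] = 1.0

W_grouped = W * mask                                       # group-convolution-compatible sparsity
kept = np.abs(W_grouped).sum() / np.abs(W).sum()
print(f"retained {kept:.1%} of weight magnitude with {groups} groups")
```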

Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

Title Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Authors Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
Abstract First-person vision is gaining interest as it offers a unique viewpoint on people’s interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict nonscripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. Dataset and Project page: http://epic-kitchens.github.io
Tasks
Published 2018-04-08
URL http://arxiv.org/abs/1804.02748v2
PDF http://arxiv.org/pdf/1804.02748v2.pdf
PWC https://paperswithcode.com/paper/scaling-egocentric-vision-the-epic-kitchens
Repo https://github.com/antoninofurnari/rulstm
Framework pytorch