January 31, 2020

2869 words 14 mins read

Paper Group AWR 413

Rep the Set: Neural Networks for Learning Set Representations. Efficient Learning on Point Clouds with Basis Point Sets. Incremental Transformer with Deliberation Decoder for Document Grounded Conversations. Model structures and fitting criteria for system identification with neural networks. DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch …

Rep the Set: Neural Networks for Learning Set Representations

Title Rep the Set: Neural Networks for Learning Set Representations
Authors Konstantinos Skianis, Giannis Nikolentzos, Stratis Limnios, Michalis Vazirgiannis
Abstract In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this kind of representation, since sets may vary in cardinality and their elements lack a meaningful ordering. In this paper, we present a new neural network architecture, called RepSet, that can handle examples represented as sets of vectors. The proposed model computes the correspondences between an input set and some hidden sets by solving a series of network flow problems. This representation is then fed to a standard neural network architecture to produce the output. The architecture allows end-to-end gradient-based learning. We demonstrate RepSet on classification tasks, including text categorization and graph classification, and we show that the proposed neural network achieves performance better than or comparable to state-of-the-art algorithms.
Tasks Graph Classification, Text Categorization
Published 2019-04-03
URL https://arxiv.org/abs/1904.01962v2
PDF https://arxiv.org/pdf/1904.01962v2.pdf
PWC https://paperswithcode.com/paper/rep-the-set-neural-networks-for-learning-set
Repo https://github.com/giannisnik/repset
Framework pytorch
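
The correspondence step described in the abstract can be approximated in a few lines: for each hidden set, compute a maximum-weight bipartite matching between input vectors and hidden vectors under inner-product weights, and use the matching values as features. A minimal NumPy/SciPy sketch, substituting the Hungarian algorithm for the paper's network-flow formulation (function and variable names are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def repset_features(X, hidden_sets):
    """Map an input set X (n x d) to one scalar per hidden set: the value
    of a maximum-weight bipartite matching under inner-product weights."""
    feats = []
    for H in hidden_sets:                        # each hidden set H is (m x d)
        W = X @ H.T                              # pairwise edge weights
        rows, cols = linear_sum_assignment(-W)   # Hungarian: maximize weight
        feats.append(W[rows, cols].sum())
    return np.array(feats)                       # fed to a standard MLP downstream
```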

Efficient Learning on Point Clouds with Basis Point Sets

Title Efficient Learning on Point Clouds with Basis Point Sets
Authors Sergey Prokudin, Christoph Lassner, Javier Romero
Abstract With the increased availability of 3D scanning technology, point clouds are moving into the focus of computer vision as a rich representation of everyday scenes. However, they are hard to handle for machine learning algorithms due to their unordered structure. One common approach is to apply occupancy grid mapping, which dramatically increases the amount of data stored and at the same time loses details through discretization. Recently, deep learning models were proposed to handle point clouds directly and achieve input permutation invariance. However, these architectures often use a large number of parameters and are computationally inefficient. In this work, we propose basis point sets (BPS) as a highly efficient and fully general way to process point clouds with machine learning algorithms. The basis point set representation is a residual representation that can be computed efficiently and can be used with standard neural network architectures and other machine learning algorithms. Using the proposed representation as the input to a simple fully connected network allows us to match the performance of PointNet on a shape classification task while using three orders of magnitude fewer floating-point operations. In a second experiment, we show how the proposed representation can be used for registering high-resolution meshes to noisy 3D scans. Here, we present the first method for single-pass high-resolution mesh registration, avoiding time-consuming per-scan optimization and allowing real-time execution.
Tasks
Published 2019-08-24
URL https://arxiv.org/abs/1908.09186v1
PDF https://arxiv.org/pdf/1908.09186v1.pdf
PWC https://paperswithcode.com/paper/efficient-learning-on-point-clouds-with-basis
Repo https://github.com/sergeyprokudin/bps
Framework pytorch
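
The BPS idea is easy to state in code: fix one random set of basis points, shared across all clouds, and encode every cloud by the distance from each basis point to its nearest cloud point, yielding a fixed-length vector regardless of input size. A minimal NumPy sketch (basis size and sampling scheme are illustrative):

```python
import numpy as np

def bps_encode(cloud, basis):
    """cloud: (n, 3) points; basis: (k, 3) fixed points shared across all
    clouds. Returns a k-vector of nearest-neighbor distances."""
    d = np.linalg.norm(basis[:, None, :] - cloud[None, :, :], axis=-1)
    return d.min(axis=1)

rng = np.random.default_rng(0)
basis = rng.uniform(-1.0, 1.0, size=(512, 3))            # sampled once, then frozen
feature = bps_encode(rng.normal(size=(2048, 3)), basis)  # shape (512,)
```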

Incremental Transformer with Deliberation Decoder for Document Grounded Conversations

Title Incremental Transformer with Deliberation Decoder for Document Grounded Conversations
Authors Zekang Li, Cheng Niu, Fandong Meng, Yang Feng, Qian Li, Jie Zhou
Abstract Document Grounded Conversations is a task to generate dialogue responses when chatting about the content of a given document. Document knowledge plays a critical role in this task, yet existing dialogue models do not exploit such knowledge effectively. In this paper, we propose a novel Transformer-based architecture for multi-turn document grounded conversations. In particular, we devise an Incremental Transformer to encode multi-turn utterances along with knowledge in related documents. Motivated by the human cognitive process, we design a two-pass decoder (Deliberation Decoder) to improve context coherence and knowledge correctness. Our empirical study on a real-world Document Grounded Dataset shows that responses generated by our model significantly outperform competitive baselines in both context coherence and knowledge relevance.
Tasks
Published 2019-07-20
URL https://arxiv.org/abs/1907.08854v3
PDF https://arxiv.org/pdf/1907.08854v3.pdf
PWC https://paperswithcode.com/paper/incremental-transformer-with-deliberation
Repo https://github.com/lizekang/ITDD
Framework pytorch
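
The Deliberation Decoder's two-pass structure can be sketched with stock PyTorch modules: the first pass attends to the encoded dialogue context, and the second pass revises the draft against the encoded document. This is a structural sketch under assumed dimensions, not the paper's exact architecture (attention masks are omitted for brevity):

```python
import torch.nn as nn

class DeliberationDecoder(nn.Module):
    """Two-pass decoding sketch: pass 1 targets context coherence,
    pass 2 revises the draft using encoded document knowledge."""
    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        layer = lambda: nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.first_pass = nn.TransformerDecoder(layer(), num_layers)
        self.second_pass = nn.TransformerDecoder(layer(), num_layers)

    def forward(self, tgt, context_memory, knowledge_memory):
        draft = self.first_pass(tgt, context_memory)      # pass 1: context
        return self.second_pass(draft, knowledge_memory)  # pass 2: knowledge
```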

Model structures and fitting criteria for system identification with neural networks

Title Model structures and fitting criteria for system identification with neural networks
Authors Marco Forgione, Dario Piga
Abstract This paper focuses on the identification of dynamical systems with tailor-made model structures, where neural networks are used to approximate uncertain components and domain knowledge is retained, if available. These model structures are fitted to measured data using different criteria, including a computationally efficient approach that minimizes a regularized multi-step-ahead simulation error. In this approach, the neural network parameters are estimated along with the initial conditions used to simulate the output signal in small-size subsequences. A regularization term is included in the fitting cost to enforce consistency of these initial conditions with the estimated system dynamics. Pitfalls and limitations of naive one-step prediction and simulation error minimization are also discussed.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13034v1
PDF https://arxiv.org/pdf/1911.13034v1.pdf
PWC https://paperswithcode.com/paper/model-structures-and-fitting-criteria-for
Repo https://github.com/forgi86/sysid-neural-structures-fitting
Framework pytorch
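
The fitting criterion lends itself to a compact illustration: a toy neural state-space model is simulated over short subsequences from free (learnable) initial conditions, and a penalty keeps those initial conditions consistent with estimates of the state at each subsequence start. A hedged PyTorch sketch, with the output map and all dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

class NeuralStateSpace(nn.Module):
    """Tiny neural state-space model x[k+1] = x[k] + f(x[k], u[k]) (sketch)."""
    def __init__(self, nx=2, nu=1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(nx + nu, 32), nn.Tanh(), nn.Linear(32, nx))

    def simulate(self, x0, u):                    # x0: (B, nx), u: (B, T, nu)
        x, xs = x0, []
        for k in range(u.shape[1]):
            x = x + self.f(torch.cat([x, u[:, k]], dim=-1))
            xs.append(x)
        return torch.stack(xs, dim=1)             # (B, T, nx)

def multistep_loss(model, x0_hat, u, y, x_meas0, alpha=1.0):
    """Regularized multi-step simulation error over subsequences: fit the
    simulated output and keep the free initial conditions x0_hat close to
    state estimates x_meas0 at each subsequence start (illustrative)."""
    y_sim = model.simulate(x0_hat, u)[..., :1]    # output = first state (toy choice)
    return ((y_sim - y) ** 2).mean() + alpha * ((x0_hat - x_meas0) ** 2).mean()
```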

DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch

Title DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch
Authors Shivam Duggal, Shenlong Wang, Wei-Chiu Ma, Rui Hu, Raquel Urtasun
Abstract Our goal is to significantly speed up the runtime of current state-of-the-art stereo algorithms to enable real-time inference. Towards this goal, we developed a differentiable PatchMatch module that allows us to discard most disparities without requiring full cost-volume evaluation. We then exploit this representation to learn which range to prune for each pixel. By progressively reducing the search space and effectively propagating such information, we are able to efficiently compute the cost volume for high-likelihood hypotheses and achieve savings in both memory and computation. Finally, an image-guided refinement module is exploited to further improve the performance. Since all our components are differentiable, the full network can be trained end-to-end. Our experiments show that our method achieves competitive results on the KITTI and SceneFlow datasets while running in real time at 62 ms.
Tasks Stereo Matching, Stereo Matching Hand
Published 2019-09-12
URL https://arxiv.org/abs/1909.05845v1
PDF https://arxiv.org/pdf/1909.05845v1.pdf
PWC https://paperswithcode.com/paper/deeppruner-learning-efficient-stereo-matching
Repo https://github.com/uber-research/DeepPruner
Framework pytorch
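
The pruning idea reduces to evaluating matching costs only at a handful of disparity hypotheses inside a per-pixel band, rather than over the full range. A hedged PyTorch sketch of such a sparse cost volume (the differentiable-PatchMatch band prediction itself is omitted, and the tensor layout is assumed):

```python
import torch
import torch.nn.functional as F

def sparse_cost_volume(feat_l, feat_r, d_lo, d_hi, k=8):
    """Correlate left/right features at k disparity hypotheses per pixel,
    linearly spaced inside a predicted band [d_lo, d_hi].
    feat_l, feat_r: (B, C, H, W); d_lo, d_hi: (B, H, W)."""
    B, C, H, W = feat_l.shape
    t = torch.linspace(0, 1, k, device=feat_l.device).view(1, k, 1, 1)
    disp = d_lo.unsqueeze(1) + t * (d_hi - d_lo).unsqueeze(1)   # (B, k, H, W)
    xs = torch.arange(W, device=feat_l.device).view(1, 1, 1, W).float()
    ys = torch.arange(H, device=feat_l.device).view(1, 1, H, 1).float()
    gx = (xs - disp) / (W - 1) * 2 - 1          # sample the right image at x - d
    gy = (ys / (H - 1) * 2 - 1).expand_as(gx)
    grid = torch.stack([gx, gy], dim=-1).view(B * k, H, W, 2)
    fr = feat_r.unsqueeze(1).expand(-1, k, -1, -1, -1).reshape(B * k, C, H, W)
    warped = F.grid_sample(fr, grid, align_corners=True).view(B, k, C, H, W)
    return (feat_l.unsqueeze(1) * warped).mean(dim=2)           # (B, k, H, W)
```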

Group-wise Correlation Stereo Network

Title Group-wise Correlation Stereo Network
Authors Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li
Abstract Stereo matching estimates the disparity between a rectified image pair, which is of great importance to depth sensing, autonomous driving, and other related tasks. Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then utilized a 2D or 3D convolutional neural network to regress the disparity maps. In this paper, we propose to construct the cost volume by group-wise correlation. The left features and the right features are divided into groups along the channel dimension, and correlation maps are computed within each group to obtain multiple matching-cost proposals, which are then packed into a cost volume. Group-wise correlation provides efficient representations for measuring feature similarities and, unlike full correlation, does not lose too much information. It also better preserves performance when the number of parameters is reduced, compared with previous methods. The 3D stacked hourglass network proposed in previous works is improved to boost the performance and decrease the inference computational cost. Experiment results show that our method outperforms previous methods on the Scene Flow, KITTI 2012, and KITTI 2015 datasets. The code is available at https://github.com/xy-guo/GwcNet
Tasks Autonomous Driving, Stereo Matching, Stereo Matching Hand
Published 2019-03-10
URL http://arxiv.org/abs/1903.04025v1
PDF http://arxiv.org/pdf/1903.04025v1.pdf
PWC https://paperswithcode.com/paper/group-wise-correlation-stereo-network
Repo https://github.com/xy-guo/GwcNet
Framework pytorch
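
The core operation is compact: split the channel dimension into groups and, for each candidate disparity, average the per-group inner product between left features and horizontally shifted right features. A minimal PyTorch sketch, assuming a (B, C, H, W) feature layout:

```python
import torch

def groupwise_correlation_volume(fl, fr, num_groups, max_disp):
    """Returns a (B, num_groups, max_disp, H, W) cost volume where each
    entry is the mean per-group correlation at one disparity level."""
    B, C, H, W = fl.shape
    ch = C // num_groups
    vol = fl.new_zeros(B, num_groups, max_disp, H, W)
    for d in range(max_disp):
        l = fl[..., d:].reshape(B, num_groups, ch, H, W - d)
        r = fr[..., :W - d].reshape(B, num_groups, ch, H, W - d)
        vol[:, :, d, :, d:] = (l * r).mean(dim=2)   # average within each group
    return vol
```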

CARAFE: Content-Aware ReAssembly of FEatures

Title CARAFE: Content-Aware ReAssembly of FEatures
Authors Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin
Abstract Feature upsampling is a key operation in a number of modern convolutional network architectures, e.g. feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE has several appealing properties: (1) Large field of view. Unlike previous works (e.g. bilinear interpolation) that only exploit sub-pixel neighborhoods, CARAFE can aggregate contextual information within a large receptive field. (2) Content-aware handling. Instead of using a fixed kernel for all samples (e.g. deconvolution), CARAFE enables instance-specific content-aware handling, which generates adaptive kernels on the fly. (3) Lightweight and fast to compute. CARAFE introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting. CARAFE shows consistent and substantial gains across all the tasks (1.2%, 1.3%, 1.8%, and 1.1 dB respectively) with negligible computational overhead. It has great potential to serve as a strong building block for future research. Code and models are available at https://github.com/open-mmlab/mmdetection.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-05-06
URL https://arxiv.org/abs/1905.02188v3
PDF https://arxiv.org/pdf/1905.02188v3.pdf
PWC https://paperswithcode.com/paper/carafe-content-aware-reassembly-of-features
Repo https://github.com/smallsunsun1/custom_ops
Framework tf
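
The reassembly step can be written with unfold plus an einsum: each output pixel is a softmax-weighted sum over a k x k input neighborhood, with per-location kernels predicted by a small branch (not shown here). A hedged sketch of the reassembly under an assumed kernel-tensor layout:

```python
import torch
import torch.nn.functional as F

def carafe_reassemble(x, kernels, scale=2, k=5):
    """x: (B, C, H, W); kernels: (B, scale^2 * k^2, H, W), one predicted
    k x k kernel per output sub-pixel. Returns (B, C, H*scale, W*scale)."""
    B, C, H, W = x.shape
    w = F.softmax(kernels.view(B, scale * scale, k * k, H, W), dim=2)
    patches = F.unfold(x, k, padding=k // 2).view(B, C, k * k, H, W)
    out = torch.einsum('bckhw,bskhw->bcshw', patches, w)
    out = out.view(B, C, scale, scale, H, W)       # split s into (sh, sw)
    out = out.permute(0, 1, 4, 2, 5, 3)            # pixel-shuffle into space
    return out.reshape(B, C, H * scale, W * scale)
```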

Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Title Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis
Authors Feiyang Chen, Ziqian Luo, Yanyan Xu, Dengfeng Ke
Abstract Sentiment analysis, mostly based on text, has been rapidly developing in the last decade and has attracted widespread attention in both academia and industry. However, information in the real world usually comes from multiple modalities, such as audio and text. Therefore, in this paper, based on audio and text, we consider the task of multimodal sentiment analysis and propose a novel fusion strategy, including both multi-feature fusion and multi-modality fusion, to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, which consists of two parallel branches, the audio-modality-based branch and the text-modality-based branch. Its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show the very competitive results of our DFF-ATMF model. Furthermore, by virtue of attention-weight distribution heatmaps, we also demonstrate that the deep features learned by DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also generalizes well to multimodal emotion recognition.
Tasks Emotion Recognition, Multimodal Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2019-04-17
URL https://arxiv.org/abs/1904.08138v5
PDF https://arxiv.org/pdf/1904.08138v5.pdf
PWC https://paperswithcode.com/paper/audio-text-sentiment-analysis-using-deep
Repo https://github.com/Eurus-Holmes/DFF-ATMF
Framework none
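
The modality-fusion mechanism can be illustrated as attention-weighted pooling over per-branch feature vectors ahead of a joint classifier. This is a generic sketch of attention-based late fusion, not the exact DFF-ATMF architecture; all dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Project audio and text features to a shared space, weight them by a
    learned modality attention, and classify the fused vector."""
    def __init__(self, d_audio=128, d_text=128, d=128, n_cls=2):
        super().__init__()
        self.proj_a, self.proj_t = nn.Linear(d_audio, d), nn.Linear(d_text, d)
        self.attn = nn.Linear(d, 1)
        self.head = nn.Linear(d, n_cls)

    def forward(self, a, t):                                   # (B, d_audio), (B, d_text)
        feats = torch.stack([self.proj_a(a), self.proj_t(t)], dim=1)  # (B, 2, d)
        w = torch.softmax(self.attn(feats), dim=1)             # modality attention
        return self.head((w * feats).sum(dim=1))
```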

Domain-invariant Stereo Matching Networks

Title Domain-invariant Stereo Matching Networks
Authors Feihu Zhang, Xiaojuan Qi, Ruigang Yang, Victor Prisacariu, Benjamin Wah, Philip Torr
Abstract State-of-the-art stereo matching networks have difficulties in generalizing to new unseen environments due to significant domain differences, such as color, illumination, contrast, and texture. In this paper, we aim at designing a domain-invariant stereo matching network (DSMNet) that generalizes well to unseen scenes. To achieve this goal, we propose i) a novel “domain normalization” approach that regularizes the distribution of learned representations to allow them to be invariant to domain differences, and ii) a trainable non-local graph-based filter for extracting robust structural and geometric representations that can further enhance domain-invariant generalizations. When trained on synthetic data and generalized to real test sets, our model performs significantly better than all state-of-the-art models. It even outperforms some deep learning models (e.g. MC-CNN) fine-tuned with test-domain data.
Tasks Stereo Matching
Published 2019-11-29
URL https://arxiv.org/abs/1911.13287v1
PDF https://arxiv.org/pdf/1911.13287v1.pdf
PWC https://paperswithcode.com/paper/domain-invariant-stereo-matching-networks
Repo https://github.com/feihuzhang/DSMNet
Framework pytorch
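
One plausible reading of "domain normalization" is to whiten each sample's feature maps over the spatial dimensions (suppressing image-level statistics such as brightness and contrast) and then L2-normalize along channels. A sketch under that assumption, which may differ from the paper's exact formulation:

```python
import torch

def domain_norm(x, eps=1e-5):
    """x: (B, C, H, W). Per-sample, per-channel spatial whitening followed
    by channel-wise L2 normalization at every spatial location."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), keepdim=True)
    x = (x - mu) / torch.sqrt(var + eps)            # remove style statistics
    return x / (x.norm(dim=1, keepdim=True) + eps)  # unit-norm along channels
```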

Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

Title Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
Authors Sam Ade Jacobs, Brian Van Essen, David Hysom, Jae-Seung Yeom, Tim Moon, Rushil Anirudh, Jayaraman J. Thiagaranjan, Shusen Liu, Peer-Timo Bremer, Jim Gaffney, Tom Benson, Peter Robinson, Luc Peterson, Brian Spears
Abstract Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of parallelism and exploits some of the world's largest supercomputers. We demonstrate our framework by creating a complex predictive model based on multi-variate data from high-energy-density physics containing hundreds of millions of images and hundreds of millions of scalar values derived from tens of millions of simulations of inertial confinement fusion. Our approach combines an HPC workflow and extends LBANN with optimized data ingestion and the new tournament-style training algorithm to produce a scalable neural network architecture using a CORAL-class supercomputer. Experimental results show that 64 trainers (1024 GPUs) achieve a 70.2x speedup over a single-trainer (16 GPUs) baseline, with an effective parallel efficiency of 109%.
Tasks
Published 2019-10-05
URL https://arxiv.org/abs/1910.02270v1
PDF https://arxiv.org/pdf/1910.02270v1.pdf
PWC https://paperswithcode.com/paper/parallelizing-training-of-deep-generative
Repo https://github.com/rushilanirudh/macc
Framework tf
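
The tournament-style training step can be pictured as a periodic exchange among trainers: each pairs with a partner, swaps model parameters, and keeps whichever model scores better on its local tournament set. A hedged mpi4py sketch of one such round (the pairing scheme and evaluation interface are assumptions, not LBANN's API):

```python
from mpi4py import MPI

def tournament_step(comm, params, evaluate):
    """One tournament round: exchange parameters with a partner trainer and
    keep whichever model scores higher on the local tournament set.
    evaluate: callable mapping params -> validation score (higher is better)."""
    rank = comm.Get_rank()
    partner = rank ^ 1                # pair adjacent ranks (assumes even world size)
    theirs = comm.sendrecv(params, dest=partner, source=partner)
    return params if evaluate(params) >= evaluate(theirs) else theirs
```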

Neural Stored-program Memory

Title Neural Stored-program Memory
Authors Hung Le, Truyen Tran, Svetha Venkatesh
Abstract Neural networks augmented with external memory can simulate computer behaviors. These models, which use the memory to store data for a neural controller, can learn algorithms and other complex tasks. In this paper, we introduce a new memory that stores weights for the controller, analogous to the stored-program memory in modern computer architectures. The proposed model, dubbed Neural Stored-program Memory, augments current memory-augmented neural networks, creating differentiable machines that can switch programs through time, adapt to variable contexts, and thus resemble the Universal Turing Machine. A wide range of experiments demonstrate that the resulting machines not only excel in classical algorithmic problems but also have potential for compositional, continual, few-shot learning and question-answering tasks.
Tasks Few-Shot Learning, Question Answering
Published 2019-05-25
URL https://arxiv.org/abs/1906.08862v2
PDF https://arxiv.org/pdf/1906.08862v2.pdf
PWC https://paperswithcode.com/paper/neural-stored-program-memory
Repo https://github.com/thaihungle/NSM
Framework pytorch
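
The stored-program idea can be sketched as content-based attention over a bank of flattened weight matrices: the controller emits a query, attends over program keys, and the blended "program" becomes the layer's weights for that step. A minimal PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class StoredProgramMemory(nn.Module):
    """Bank of flattened weight matrices ('programs') addressed by content."""
    def __init__(self, n_prog=4, d_key=16, d_in=32, d_out=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_prog, d_key))
        self.programs = nn.Parameter(torch.randn(n_prog, d_in * d_out))
        self.d_in, self.d_out = d_in, d_out

    def forward(self, query):                        # query: (B, d_key)
        attn = torch.softmax(query @ self.keys.T, dim=-1)
        w = attn @ self.programs                     # blend programs
        return w.view(-1, self.d_in, self.d_out)     # per-sample weight matrix

# usage sketch: h_next = torch.bmm(h.unsqueeze(1), spm(query)).squeeze(1)
```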

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Title DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Authors Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan
Abstract We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain performance close to human level in both automatic and human evaluation of single-turn dialogue. We show that conversational systems leveraging DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.
Tasks Conversational Response Generation
Published 2019-11-01
URL https://arxiv.org/abs/1911.00536v1
PDF https://arxiv.org/pdf/1911.00536v1.pdf
PWC https://paperswithcode.com/paper/dialogpt-large-scale-generative-pre-training
Repo https://github.com/microsoft/DialoGPT
Framework pytorch
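
Because the pre-trained model is public, a single-turn response can be generated with the Hugging Face transformers package. A short usage sketch, assuming the microsoft/DialoGPT-medium checkpoint and one reasonable set of generation settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Each dialogue turn ends with the EOS token.
ids = tok.encode("Does money buy happiness?" + tok.eos_token, return_tensors="pt")
out = model.generate(ids, max_length=100, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True))
```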

DFNets: Spectral CNNs for Graphs with Feedback-Looped Filters

Title DFNets: Spectral CNNs for Graphs with Feedback-Looped Filters
Authors Asiri Wijesinghe, Qing Wang
Abstract We propose a novel spectral convolutional neural network (CNN) model on graph structured data, namely Distributed Feedback-Looped Networks (DFNets). This model incorporates a robust class of spectral graph filters, called feedback-looped filters, to provide better localization on vertices, while still attaining fast convergence and linear memory requirements. Theoretically, feedback-looped filters can guarantee convergence w.r.t. a specified error bound and can be applied universally to any graph without knowing its structure. Furthermore, the propagation rule of this model can diversify features from the preceding layers to produce strong gradient flows. We have evaluated our model using two benchmark tasks: semi-supervised document classification on citation networks and semi-supervised entity classification on a knowledge graph. The experimental results show that our model considerably outperforms the state-of-the-art methods in both benchmark tasks over all datasets.
Tasks Document Classification, Node Classification
Published 2019-10-24
URL https://arxiv.org/abs/1910.10866v5
PDF https://arxiv.org/pdf/1910.10866v5.pdf
PWC https://paperswithcode.com/paper/dfnets-spectral-cnns-for-graphs-with-feedback
Repo https://github.com/wokas36/DFNets
Framework tf
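
A feedback-looped (rational) spectral filter can be read as Y = (I + sum_k a_k L^k)^-1 (sum_j b_j L^j) X, which a fixed-point iteration approximates without forming a matrix inverse. A rough NumPy sketch of that reading; coefficients and iteration count are illustrative, and this is not claimed to be the paper's exact propagation rule:

```python
import numpy as np

def feedback_filter(L, X, a, b, iters=20):
    """Approximate Y = (I + sum_k a_k L^k)^-1 (sum_j b_j L^j) X by iterating
    Y <- forward - sum_k a_k L^k Y (converges when the feedback term is a
    contraction). L: (n, n) graph Laplacian, X: (n, d) node features."""
    forward = sum(bj * np.linalg.matrix_power(L, j) @ X for j, bj in enumerate(b))
    Y = forward.copy()
    for _ in range(iters):
        Y = forward - sum(ak * np.linalg.matrix_power(L, k) @ Y
                          for k, ak in enumerate(a, start=1))
    return Y
```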

Ensembles of Locally Independent Prediction Models

Title Ensembles of Locally Independent Prediction Models
Authors Andrew Slavin Ross, Weiwei Pan, Leo Anthony Celi, Finale Doshi-Velez
Abstract Ensembles depend on diversity for improved performance. Many ensemble training methods, therefore, attempt to optimize for diversity, which they almost always define in terms of differences in training set predictions. In this paper, however, we demonstrate that diversity of predictions on the training set does not necessarily imply diversity under mild covariate shift, which can harm generalization in practical settings. To address this issue, we introduce a new diversity metric and an associated method of training ensembles of models that extrapolate differently on local patches of the data manifold. Across a variety of synthetic and real-world tasks, we find that our method improves generalization and diversity in qualitatively novel ways, especially under data limits and covariate shift.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01291v3
PDF https://arxiv.org/pdf/1911.01291v3.pdf
PWC https://paperswithcode.com/paper/ensembles-of-locally-independent-prediction
Repo https://github.com/dtak/lit
Framework tf
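
One concrete way to measure "extrapolating differently" is to compare input gradients: if two members' gradients align everywhere, the models extrapolate the same way off the data. A hedged PyTorch sketch of a gradient-alignment penalty in that spirit (the paper's exact metric may differ):

```python
import torch
import torch.nn.functional as F

def local_independence_penalty(model_a, model_b, x):
    """Penalize cosine alignment of two models' input gradients at x,
    encouraging locally different extrapolation. x: (B, D); models: D -> 1."""
    x = x.requires_grad_(True)
    ga = torch.autograd.grad(model_a(x).sum(), x, create_graph=True)[0]
    gb = torch.autograd.grad(model_b(x).sum(), x, create_graph=True)[0]
    return (F.cosine_similarity(ga, gb, dim=-1) ** 2).mean()
```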

On Finding Gray Pixels

Title On Finding Gray Pixels
Authors Yanlin Qian, Joni-Kristian Kämäräinen, Jarno Nikkanen, Jiri Matas
Abstract We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. The grayness index, GI in short, is derived using the Dichromatic Reflection Model and is learning-free. GI makes it possible to estimate one or multiple illumination sources in color-biased images. On standard single-illumination and multiple-illumination estimation benchmarks, GI outperforms state-of-the-art statistical methods and many recent deep methods. GI is simple and fast: written in a few dozen lines of code, it processes a 1080p image in ~0.4 seconds with non-optimized Matlab code.
Tasks
Published 2019-01-09
URL https://arxiv.org/abs/1901.03198v3
PDF https://arxiv.org/pdf/1901.03198v3.pdf
PWC https://paperswithcode.com/paper/on-finding-gray-pixels
Repo https://github.com/yanlinqian/Grayness-Index
Framework none
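
A rough, learning-free illustration of a grayness index: under the dichromatic model, genuinely gray pixels have log-channel local contrasts that agree across R, G, and B, so deviation from the mean contrast scores grayness, and the lowest-scoring pixels vote for the illuminant. This is one plausible reading, not the paper's exact GI definition:

```python
import numpy as np
from scipy.ndimage import laplace

def grayness_index(img, eps=1e-6):
    """img: (H, W, 3) linear RGB in (0, 1]. Lower scores = 'more gray'."""
    contrast = np.stack([np.abs(laplace(np.log(img[..., c] + eps)))
                         for c in range(3)], axis=-1)
    mean = contrast.mean(axis=-1, keepdims=True)
    return np.linalg.norm(contrast - mean, axis=-1) / (mean[..., 0] + eps)

def estimate_illuminant(img, gi, frac=0.001):
    """Average the color of the lowest-GI pixels (top 0.1% by default)."""
    idx = gi.ravel().argsort()[: max(1, int(gi.size * frac))]
    return img.reshape(-1, 3)[idx].mean(axis=0)
```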