October 20, 2019

3157 words 15 mins read

Paper Group AWR 325

Reverse Attention for Salient Object Detection. Predicting Twitter User Socioeconomic Attributes with Network and Language Information. DeepIM: Deep Iterative Matching for 6D Pose Estimation. Debugging Neural Machine Translations. CapsuleGAN: Generative Adversarial Capsule Network. PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation. A Survey …

Reverse Attention for Salient Object Detection

Title Reverse Attention for Salient Object Detection
Authors Shuhan Chen, Xiuli Tan, Ben Wang, Xuelong Hu
Abstract Benefiting from the rapid development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, two major challenges still hinder its application in embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction from the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from side-output features, the network eventually explores the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in simplicity, efficiency (45 FPS) and model size (81 MB).
Tasks Object Detection, Saliency Prediction, Salient Object Detection
Published 2018-07-26
URL http://arxiv.org/abs/1807.09940v2
PDF http://arxiv.org/pdf/1807.09940v2.pdf
PWC https://paperswithcode.com/paper/reverse-attention-for-salient-object
Repo https://github.com/lhaof/fast-salient-object-detection
Framework none
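
A minimal sketch of the reverse-attention idea described in the abstract (not the authors' exact implementation): the complement of the sigmoid-activated coarse prediction re-weights side-output features, so residual learning focuses on regions the network has not yet explained.

```python
import torch
import torch.nn.functional as F

def reverse_attention(side_feat, coarse_pred):
    """Weight side-output features by the complement of the current
    saliency prediction, steering residual learning to missed regions."""
    # Upsample the coarser prediction to the side feature's resolution.
    pred = F.interpolate(coarse_pred, size=side_feat.shape[2:],
                         mode='bilinear', align_corners=False)
    rev = 1.0 - torch.sigmoid(pred)   # reverse attention weights
    return side_feat * rev            # erase already-predicted salient regions

# Toy usage: 64-channel side features, 1-channel coarse prediction.
feat = torch.randn(1, 64, 56, 56)
coarse = torch.randn(1, 1, 28, 28)
residual_input = reverse_attention(feat, coarse)
```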

Predicting Twitter User Socioeconomic Attributes with Network and Language Information

Title Predicting Twitter User Socioeconomic Attributes with Network and Language Information
Authors Nikolaos Aletras, Benjamin Paul Chamberlain
Abstract Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users’ social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
Tasks Recommendation Systems
Published 2018-04-11
URL http://arxiv.org/abs/1804.04095v1
PDF http://arxiv.org/pdf/1804.04095v1.pdf
PWC https://paperswithcode.com/paper/predicting-twitter-user-socioeconomic
Repo https://github.com/melifluos/income-prediction
Framework none
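
A hedged sketch of the overall pipeline: concatenate per-user graph embeddings with language features and train a standard classifier. The dimensions, the random stand-in data, and the plain logistic-regression model below are illustrative, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
graph_emb = rng.normal(size=(500, 128))   # low-dimensional user vectors from the extended network
text_feat = rng.normal(size=(500, 300))   # language features (e.g. bag-of-words)
y = rng.integers(0, 9, size=500)          # 9 occupational classes (placeholder labels)

X = np.hstack([graph_emb, text_feat])     # combine network + language information
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```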

DeepIM: Deep Iterative Matching for 6D Pose Estimation

Title DeepIM: Deep Iterative Matching for 6D Pose Estimation
Authors Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, Dieter Fox
Abstract Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published 2018-03-31
URL https://arxiv.org/abs/1804.00175v4
PDF https://arxiv.org/pdf/1804.00175v4.pdf
PWC https://paperswithcode.com/paper/deepim-deep-iterative-matching-for-6d-pose
Repo https://github.com/liyi14/mx-DeepIM
Framework mxnet
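
The iterative matching loop can be sketched as follows; `render` and `pose_net` are hypothetical stand-ins for the renderer and the trained matching network, and the pose is a (rotation, translation) pair updated in the paper's untangled fashion.

```python
def refine_pose(pose, observed, render, pose_net, n_iters=4):
    """Sketch of DeepIM-style iterative refinement. `pose` is an (R, t)
    pair; `render` produces an image of the object under a given pose;
    `pose_net` predicts a relative transform from (rendered, observed)."""
    for _ in range(n_iters):
        rendered = render(pose)                        # image under current estimate
        delta_rot, delta_trans = pose_net(rendered, observed)
        # "Untangled" update: rotation and translation applied separately.
        pose = (delta_rot @ pose[0], pose[1] + delta_trans)
    return pose
```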

Debugging Neural Machine Translations

Title Debugging Neural Machine Translations
Authors Matīss Rikters
Abstract In this paper, we describe a tool for debugging the output and attention weights of neural machine translation (NMT) systems, and for improved estimation of confidence in the output based on the attention. The purpose of the tool is to help researchers and developers find weak and faulty example translations that their NMT systems produce, without the need for reference translations. Our tool also includes an option to directly compare translation outputs from two different NMT engines or experiments. In addition, we present a demo website of our tool with examples of good and bad translations: http://attention.lielakeda.lv
Tasks Machine Translation
Published 2018-08-08
URL http://arxiv.org/abs/1808.02733v1
PDF http://arxiv.org/pdf/1808.02733v1.pdf
PWC https://paperswithcode.com/paper/debugging-neural-machine-translations
Repo https://github.com/M4t1ss/SoftAlignments
Framework tf
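
One simple attention-based confidence heuristic in the spirit of the tool is a coverage-deviation score computed directly from the attention matrix; the exact metrics the tool uses may differ, so treat this as an illustration.

```python
import numpy as np

def attention_confidence(attn):
    """Illustrative confidence heuristic over an NMT attention matrix
    (target_len x source_len): penalise source tokens whose total
    received attention deviates from 1, flagging likely over- or
    under-translation."""
    coverage = attn.sum(axis=0)                        # attention mass per source token
    return -np.log(1.0 + (1.0 - coverage) ** 2).mean() # higher is more confident

attn = np.random.dirichlet(np.ones(6), size=8)         # 8 target x 6 source rows
print(attention_confidence(attn))
```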

CapsuleGAN: Generative Adversarial Capsule Network

Title CapsuleGAN: Generative Adversarial Capsule Network
Authors Ayush Jaiswal, Wael AbdAlmageed, Yue Wu, Premkumar Natarajan
Abstract We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional-GAN at modeling image data distribution on MNIST and CIFAR-10 datasets, evaluated on the generative adversarial metric and at semi-supervised image classification.
Tasks Image Classification, Semi-Supervised Image Classification
Published 2018-02-17
URL http://arxiv.org/abs/1802.06167v7
PDF http://arxiv.org/pdf/1802.06167v7.pdf
PWC https://paperswithcode.com/paper/capsulegan-generative-adversarial-capsule
Repo https://github.com/CPUFronz/CapsVoxGAN
Framework pytorch
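
A minimal sketch of the updated discriminator objective: binary cross-entropy is replaced by the CapsNet margin loss over the length of the discriminator's output capsule. The margins and down-weighting factor below follow the usual CapsNet defaults and are assumptions here.

```python
import torch

def capsule_margin_loss(lengths, labels, m_pos=0.9, m_neg=0.1, lam=0.5):
    """CapsNet margin loss used in place of BCE for the GAN discriminator.
    `lengths` are capsule output norms in [0, 1]; `labels` are 1 for real
    images and 0 for generated ones."""
    pos = labels * torch.clamp(m_pos - lengths, min=0) ** 2
    neg = lam * (1 - labels) * torch.clamp(lengths - m_neg, min=0) ** 2
    return (pos + neg).mean()

lengths = torch.sigmoid(torch.randn(16))      # stand-in capsule norms
labels = torch.randint(0, 2, (16,)).float()
print(capsule_margin_loss(lengths, labels))
```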

PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation

Title PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation
Authors Sida Peng, Yuan Liu, Qixing Huang, Hujun Bao, Xiaowei Zhou
Abstract This paper addresses the challenge of 6DoF pose estimation from a single RGB image under severe occlusion or truncation. Many recent works have shown that a two-stage approach, which first detects keypoints and then solves a Perspective-n-Point (PnP) problem for pose estimation, achieves remarkable performance. However, most of these methods only localize a set of sparse keypoints by regressing their image coordinates or heatmaps, which are sensitive to occlusion and truncation. Instead, we introduce a Pixel-wise Voting Network (PVNet) to regress pixel-wise unit vectors pointing to the keypoints and use these vectors to vote for keypoint locations using RANSAC. This creates a flexible representation for localizing occluded or truncated keypoints. Another important feature of this representation is that it provides uncertainties of keypoint locations that can be further leveraged by the PnP solver. Experiments show that the proposed approach outperforms the state of the art on the LINEMOD, Occlusion LINEMOD and YCB-Video datasets by a large margin, while being efficient for real-time pose estimation. We further create a Truncation LINEMOD dataset to validate the robustness of our approach against truncation. The code will be available at https://zju-3dv.github.io/pvnet/.
Tasks 6D Pose Estimation using RGB, Pose Estimation
Published 2018-12-31
URL http://arxiv.org/abs/1812.11788v1
PDF http://arxiv.org/pdf/1812.11788v1.pdf
PWC https://paperswithcode.com/paper/pvnet-pixel-wise-voting-network-for-6dof-pose
Repo https://github.com/zju3dv/pvnet
Framework pytorch
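
The RANSAC voting step can be sketched as follows, assuming each pixel already carries a unit vector pointing toward the keypoint; hypotheses are intersections of two randomly chosen rays, scored by how many other pixels agree.

```python
import numpy as np

def ransac_vote(pixels, vectors, n_hyp=128, thresh=0.99):
    """Sketch of RANSAC-style keypoint voting from a pixel-wise unit-vector
    field. `pixels` is (N, 2) coordinates; `vectors` is (N, 2) unit vectors."""
    rng = np.random.default_rng(0)
    best, best_score = None, -1
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), 2, replace=False)
        # Intersect the two rays p + t*v by solving a 2x2 linear system.
        A = np.stack([vectors[i], -vectors[j]], axis=1)
        if abs(np.linalg.det(A)) < 1e-6:
            continue  # near-parallel rays, skip
        t = np.linalg.solve(A, pixels[j] - pixels[i])
        hyp = pixels[i] + t[0] * vectors[i]
        # Inliers: pixels whose vector points (almost) exactly at the hypothesis.
        d = hyp - pixels
        d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
        score = ((d * vectors).sum(axis=1) > thresh).sum()
        if score > best_score:
            best, best_score = hyp, score
    return best
```

As the abstract notes, the spread of these voted hypotheses also yields a per-keypoint uncertainty that the downstream PnP solver can exploit.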

A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Title A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task
Authors Josef Michalek, Jan Vanek
Abstract In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus for its popularity and broad availability in the community; it also simulates a low-resource scenario that is relevant for minor languages. We prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT; however, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation of this paper: to provide baseline DNNs, with open-source scripts, so that future papers can easily replicate the baseline results with the lowest possible PERs. To the best of our knowledge, the best PER achieved in this survey is better than the best published PER to date.
Tasks Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-06-19
URL http://arxiv.org/abs/1806.07974v1
PDF http://arxiv.org/pdf/1806.07974v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-recent-dnn-architectures-on-the
Repo https://github.com/OrcusCZ/NNAcousticModeling
Framework none

DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation

Title DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation
Authors Bharath Bhushan Damodaran, Benjamin Kellenberger, Rémi Flamary, Devis Tuia, Nicolas Courty
Abstract In computer vision, one is often confronted with problems of domain shifts, which occur when one applies a classifier trained on a source dataset to target data sharing similar characteristics (e.g. same classes), but also different latent data structures (e.g. different acquisition conditions). In such a situation, the model will perform poorly on the new data, since the classifier is specialized to recognize visual cues specific to the source domain. In this work we explore a solution, named DeepJDOT, to tackle this problem: through a measure of discrepancy on joint deep representations/labels based on optimal transport, we not only learn new data representations aligned between the source and target domain, but also simultaneously preserve the discriminative information used by the classifier. We applied DeepJDOT to a series of visual recognition tasks, where it compares favorably against state-of-the-art deep domain adaptation methods.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2018-03-27
URL http://arxiv.org/abs/1803.10081v3
PDF http://arxiv.org/pdf/1803.10081v3.pdf
PWC https://paperswithcode.com/paper/deepjdot-deep-joint-distribution-optimal
Repo https://github.com/bbdamodaran/deepJDOT
Framework tf
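
As a rough illustration of the joint-distribution coupling, here is a minimal sketch using the POT library. The feature/label batches and the alpha/beta trade-off weights are placeholders; the actual method optimizes this transport loss jointly with the network in a stochastic, batch-wise fashion.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
gs, ys = rng.normal(size=(32, 64)), rng.integers(0, 10, 32)   # source features g(x), labels y
gt, ft = rng.normal(size=(32, 64)), rng.integers(0, 10, 32)   # target features, predicted labels f(x)

# Ground cost mixes feature distance and label disagreement (illustrative weights).
alpha, beta = 0.001, 1.0
feat_cost = ot.dist(gs, gt)                                   # squared Euclidean by default
label_cost = (ys[:, None] != ft[None, :]).astype(float)       # 0/1 label loss
C = alpha * feat_cost + beta * label_cost

a = np.full(32, 1 / 32)                                       # uniform batch weights
gamma = ot.emd(a, a, C)                                       # optimal coupling
print((gamma * C).sum())                                      # transport-based alignment loss
```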

Estimating 6D Pose From Localizing Designated Surface Keypoints

Title Estimating 6D Pose From Localizing Designated Surface Keypoints
Authors Zelin Zhao, Gao Peng, Haoyu Wang, Hao-Shu Fang, Chengkun Li, Cewu Lu
Abstract In this paper, we present an accurate yet efficient solution for 6D pose estimation from an RGB image. The core of our approach is that we first designate a set of surface points on the target object model as keypoints and then train a keypoint detector (KPD) to localize them. Finally, a PnP algorithm can recover the 6D pose according to the 2D-3D relationship of the keypoints. Different from recent state-of-the-art CNN-based approaches that rely on a time-consuming post-processing procedure, our method can achieve competitive accuracy without any refinement after pose prediction. Meanwhile, we obtain a 30% relative improvement in terms of ADD accuracy among methods without using refinement. Moreover, we succeed in handling heavy occlusion by selecting the most confident keypoints to recover the 6D pose. For the sake of reproducibility, we will make our code and models publicly available soon.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation, Pose Prediction
Published 2018-12-04
URL http://arxiv.org/abs/1812.01387v1
PDF http://arxiv.org/pdf/1812.01387v1.pdf
PWC https://paperswithcode.com/paper/estimating-6d-pose-from-localizing-designated
Repo https://github.com/why2011btv/6d_pose_estimation
Framework pytorch
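
The final step, recovering the pose from detected 2D keypoints and their designated 3D model points, can be sketched with OpenCV's RANSAC PnP solver. The keypoint coordinates and camera intrinsics below are placeholders.

```python
import numpy as np
import cv2

object_pts = np.random.rand(8, 3).astype(np.float32)       # designated 3D model keypoints
image_pts = np.random.rand(8, 2).astype(np.float32) * 640  # detected 2D keypoints (placeholder)
K = np.array([[600, 0, 320],                               # placeholder camera intrinsics
              [0, 600, 240],
              [0, 0, 1]], dtype=np.float32)

# Robust PnP: recovers rotation (as a Rodrigues vector) and translation.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)   # rotation matrix + translation vector = 6D pose
```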

Efficient Neural Audio Synthesis

Title Efficient Neural Audio Synthesis
Authors Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu
Abstract Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24kHz 16-bit audio 4x faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an orthogonal method for increasing sampling efficiency.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2018-02-23
URL http://arxiv.org/abs/1802.08435v2
PDF http://arxiv.org/pdf/1802.08435v2.pdf
PWC https://paperswithcode.com/paper/efficient-neural-audio-synthesis
Repo https://github.com/CorentinJ/Real-Time-Voice-Cloning
Framework tf
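
A small sketch of the dual-softmax idea: a 16-bit sample is split into two 8-bit halves, coarse and fine, each predicted by its own 256-way softmax, so no single 65536-way output layer is needed.

```python
import numpy as np

def split_16bit(sample):
    """Split a sample in [0, 65535] into coarse and fine 8-bit parts."""
    return sample // 256, sample % 256

def join_16bit(coarse, fine):
    """Reassemble the 16-bit sample from its two 8-bit halves."""
    return coarse * 256 + fine

s = int(np.uint16(40000))
c, f = split_16bit(s)
assert join_16bit(c, f) == 40000
```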

Deep Part Induction from Articulated Object Pairs

Title Deep Part Induction from Articulated Object Pairs
Authors Li Yi, Haibin Huang, Difan Liu, Evangelos Kalogerakis, Hao Su, Leonidas Guibas
Abstract Object functionality is often expressed through part articulation – as when the two rigid parts of a pair of scissors pivot against each other to perform the cutting function. Such articulations are often similar across objects within the same functional category. In this paper, we explore how the observation of different articulation states provides evidence for part structure and motion of 3D objects. Our method takes as input a pair of unsegmented shapes representing two different articulation states of two functionally related objects, and induces their common parts along with their underlying rigid motion. This is a challenging setting, as we assume no prior shape structure, no prior shape category information, no consistent shape orientation, and articulation states that may belong to objects of different geometry; moreover, we allow inputs to be noisy and partial scans, or point clouds lifted from RGB images. Our method learns a neural network architecture with three modules that respectively propose correspondences, estimate 3D deformation flows, and perform segmentation. To achieve optimal performance, our architecture alternates between correspondence, deformation flow, and segmentation prediction iteratively in an ICP-like fashion. Our results demonstrate that our method significantly outperforms state-of-the-art techniques in the task of discovering articulated parts of objects. In addition, our part induction is object-class agnostic and successfully generalizes to new and unseen objects.
Tasks
Published 2018-09-19
URL http://arxiv.org/abs/1809.07417v1
PDF http://arxiv.org/pdf/1809.07417v1.pdf
PWC https://paperswithcode.com/paper/deep-part-induction-from-articulated-object
Repo https://github.com/ericyi/articulated-part-induction
Framework tf
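
The ICP-like alternation the abstract describes can be sketched as follows; the three modules (correspondence proposal, flow estimation, segmentation) are hypothetical callables here, and only the iteration structure is shown.

```python
def induce_parts(shape_a, shape_b, corr_net, flow_net, seg_net, n_iters=3):
    """Sketch of the alternating refinement over two articulation states."""
    seg = None
    for _ in range(n_iters):
        corr = corr_net(shape_a, shape_b, seg)    # propose point correspondences
        flow = flow_net(shape_a, shape_b, corr)   # estimate 3D deformation flow
        seg = seg_net(shape_a, flow)              # group points into rigid parts
    return seg, flow
```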

DVAE++: Discrete Variational Autoencoders with Overlapping Transformations

Title DVAE++: Discrete Variational Autoencoders with Overlapping Transformations
Authors Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
Abstract Training of discrete latent variable models remains challenging because passing gradient information through discrete units is difficult. We propose a new class of smoothing transformations based on a mixture of two overlapping distributions, and show that the proposed transformation can be used for training binary latent models with either directed or undirected priors. We derive a new variational bound to efficiently train with Boltzmann machine priors. Using this bound, we develop DVAE++, a generative model with a global discrete prior and a hierarchy of convolutional continuous variables. Experiments on several benchmarks show that overlapping transformations outperform other recent continuous relaxations of discrete latent variables including Gumbel-Softmax (Maddison et al., 2016; Jang et al., 2016), and discrete variational autoencoders (Rolfe 2016).
Tasks Latent Variable Models
Published 2018-02-14
URL http://arxiv.org/abs/1802.04920v2
PDF http://arxiv.org/pdf/1802.04920v2.pdf
PWC https://paperswithcode.com/paper/dvae-discrete-variational-autoencoders-with-1
Repo https://github.com/QuadrantAI/dvae
Framework tf
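
A hedged sketch of one overlapping transformation: a binary unit z is relaxed into a continuous zeta in [0, 1], drawn from one of two overlapping exponential densities via inverse-CDF sampling so gradients can flow through the uniform noise. The sharpness parameter b is an assumption.

```python
import numpy as np

def smooth(z, u, b=8.0):
    """Relax binary z with a mixture of two overlapping exponentials:
    r(zeta|z=1) ~ exp(b*(zeta-1)) and r(zeta|z=0) ~ exp(-b*zeta) on [0, 1],
    sampled by inverting the CDF of the z=1 branch and mirroring for z=0."""
    zeta1 = np.log1p(u * (np.exp(b) - 1.0)) / b   # inverse CDF for z = 1
    return np.where(z == 1, zeta1, 1.0 - zeta1)   # mirrored for z = 0

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=5)
print(smooth(z, rng.uniform(size=5)))
```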

Learning Factorized Multimodal Representations

Title Learning Factorized Multimodal Representations
Authors Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov
Abstract Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information, there are two key challenges to address when learning from multimodal data: 1) models must learn the complex intra-modal and cross-modal interactions for prediction and 2) models must be robust to unexpected missing or noisy modalities during testing. In this paper, we propose to optimize for a joint generative-discriminative objective across multimodal data and labels. We introduce a model that factorizes representations into two sets of independent factors: multimodal discriminative and modality-specific generative factors. Multimodal discriminative factors are shared across all modalities and contain joint multimodal features required for discriminative tasks such as sentiment prediction. Modality-specific generative factors are unique for each modality and contain the information required for generating data. Experimental results show that our model is able to learn meaningful multimodal representations that achieve state-of-the-art or competitive performance on six multimodal datasets. Our model demonstrates flexible generative capabilities by conditioning on independent factors and can reconstruct missing modalities without significantly impacting performance. Lastly, we interpret our factorized representations to understand the interactions that influence multimodal learning.
Tasks Representation Learning
Published 2018-06-16
URL https://arxiv.org/abs/1806.06176v3
PDF https://arxiv.org/pdf/1806.06176v3.pdf
PWC https://paperswithcode.com/paper/learning-factorized-multimodal
Repo https://github.com/pliang279/factorized
Framework pytorch
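
A minimal sketch of the factorization: each modality gets a private (generative) encoder, all modalities contribute to a shared discriminative factor, decoders reconstruct from [shared, private], and a classifier reads only the shared factor. The layer sizes and simple linear encoders are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FactorizedEncoder(nn.Module):
    def __init__(self, dims=(300, 74), d_shared=32, d_private=16, n_cls=2):
        super().__init__()
        self.shared = nn.ModuleList([nn.Linear(d, d_shared) for d in dims])
        self.private = nn.ModuleList([nn.Linear(d, d_private) for d in dims])
        self.decoders = nn.ModuleList(
            [nn.Linear(d_shared + d_private, d) for d in dims])
        self.classifier = nn.Linear(d_shared, n_cls)

    def forward(self, xs):
        # Multimodal discriminative factor: fused across all modalities.
        shared = sum(enc(x) for enc, x in zip(self.shared, xs))
        # Modality-specific generative factors: reconstruct each modality.
        recons = [dec(torch.cat([shared, pe(x)], -1))
                  for dec, pe, x in zip(self.decoders, self.private, xs)]
        return self.classifier(shared), recons

model = FactorizedEncoder()
logits, recons = model([torch.randn(4, 300), torch.randn(4, 74)])
```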

HOUDINI: Lifelong Learning as Program Synthesis

Title HOUDINI: Lifelong Learning as Program Synthesis
Authors Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, Swarat Chaudhuri
Abstract We present a neurosymbolic framework for the lifelong learning of algorithmic tasks that mix perception and procedural reasoning. Reusing high-level concepts across domains and learning complex procedures are key challenges in lifelong learning. We show that a program synthesis approach that combines gradient descent with combinatorial search over programs can be a more effective response to these challenges than purely neural methods. Our framework, called HOUDINI, represents neural networks as strongly typed, differentiable functional programs that use symbolic higher-order combinators to compose a library of neural functions. Our learning algorithm consists of: (1) a symbolic program synthesizer that performs a type-directed search over parameterized programs, and decides on the library functions to reuse, and the architectures to combine them, while learning a sequence of tasks; and (2) a neural module that trains these programs using stochastic gradient descent. We evaluate HOUDINI on three benchmarks that combine perception with the algorithmic tasks of counting, summing, and shortest-path computation. Our experiments show that HOUDINI transfers high-level concepts more effectively than traditional transfer learning and progressive neural networks, and that the typed representation of networks significantly accelerates the search.
Tasks Program Synthesis, Transfer Learning
Published 2018-03-31
URL http://arxiv.org/abs/1804.00218v2
PDF http://arxiv.org/pdf/1804.00218v2.pdf
PWC https://paperswithcode.com/paper/houdini-lifelong-learning-as-program
Repo https://github.com/capergroup/houdini
Framework pytorch
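
A toy sketch of type-directed program enumeration: library functions carry (input, output) types, and the synthesizer composes only those whose types align. Real HOUDINI programs are differentiable and use higher-order combinators; the library entries below are hypothetical and only the type-guided search is shown.

```python
# Hypothetical typed library; 'map_perceive' stands for a map combinator
# already applied to a learned perception module.
library = {
    'perceive':     ('image', 'digit'),
    'count':        ('digit_list', 'int'),
    'map_perceive': ('image_list', 'digit_list'),
}

def enumerate_programs(in_type, out_type, depth=3):
    """Enumerate type-correct compositions of library functions."""
    if depth == 0:
        return []
    progs = []
    for name, (t_in, t_out) in library.items():
        if t_in == in_type:
            if t_out == out_type:
                progs.append([name])
            for rest in enumerate_programs(t_out, out_type, depth - 1):
                progs.append([name] + rest)
    return progs

print(enumerate_programs('image_list', 'int'))  # [['map_perceive', 'count']]
```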

Simultaneous Coherent Structure Coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity

Title Simultaneous Coherent Structure Coloring facilitates interpretable clustering of scientific data by amplifying dissimilarity
Authors Brooke E. Husic, Kristy L. Schlueter-Kuck, John O. Dabiri
Abstract The clustering of data into physically meaningful subsets often requires assumptions regarding the number, size, or shape of the subgroups. Here, we present a new method, simultaneous coherent structure coloring (sCSC), which accomplishes the task of unsupervised clustering without a priori guidance regarding the underlying structure of the data. sCSC performs a sequence of binary splittings on the dataset such that the most dissimilar data points are required to be in separate clusters. To achieve this, we obtain a set of orthogonal coordinates along which dissimilarity in the dataset is maximized from a generalized eigenvalue problem based on the pairwise dissimilarity between the data points to be clustered. This sequence of bifurcations produces a binary tree representation of the system, from which the number of clusters in the data and their interrelationships naturally emerge. To illustrate the effectiveness of the method in the absence of a priori assumptions, we apply it to three exemplary problems in fluid dynamics. Then, we illustrate its capacity for interpretability using a high-dimensional protein folding simulation dataset. While we restrict our examples to dynamical physical systems in this work, we anticipate straightforward translation to other fields where existing analysis tools require ad hoc assumptions on the data structure, lack the interpretability of the present method, or in which the underlying processes are less accessible, such as genomics and neuroscience.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04427v3
PDF http://arxiv.org/pdf/1807.04427v3.pdf
PWC https://paperswithcode.com/paper/simultaneous-coherent-structure-coloring
Repo https://github.com/brookehus/sCSC
Framework none
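
A hedged sketch of one sCSC splitting step: from a pairwise dissimilarity matrix W, solve a generalized eigenvalue problem and split on the sign of the leading eigenvector, which pushes the most dissimilar points into opposite clusters. The Laplacian-style formulation below (L = D - W with D the row-sum diagonal) is an assumption about the exact matrices used.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
W = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)  # pairwise dissimilarity

D = np.diag(W.sum(axis=1))          # degree-like normalization
L = D - W
vals, vecs = eigh(L, D)             # generalized symmetric eigenproblem L x = lam D x
labels = (vecs[:, -1] > 0).astype(int)  # binary split on the leading eigenvector
print(labels)
```

Repeating this splitting recursively on each resulting cluster yields the binary tree from which the number of clusters and their interrelationships emerge.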