Paper Group ANR 686
A Single-shot-per-pose Camera-Projector Calibration System For Imperfect Planar Targets
Title | A Single-shot-per-pose Camera-Projector Calibration System For Imperfect Planar Targets |
Authors | Bingyao Huang, Samed Ozdemir, Ying Tang, Chunyuan Liao, Haibin Ling |
Abstract | Existing camera-projector calibration methods typically warp feature points from a camera image to a projector image using estimated homographies, and often suffer from errors in camera parameters and noise due to imperfect planarity of the calibration target. In this paper we propose a simple yet robust solution that explicitly deals with these challenges. Following the structured light (SL) camera-projector calibration framework, a carefully designed correspondence algorithm is built on top of the De Bruijn patterns. Such correspondence is then used for initial camera-projector calibration. Then, to gain more robustness against noise, especially that from an imperfect planar calibration board, a bundle adjustment algorithm is developed to jointly optimize the estimated camera and projector models. Aside from the robustness, our solution requires only one shot of SL pattern for each calibration board pose, which is much more convenient than multi-shot solutions in practice. Validations are conducted on both synthetic and real datasets, and our method shows clear advantages over existing methods in all experiments. |
Tasks | Calibration |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09058v2 |
http://arxiv.org/pdf/1803.09058v2.pdf | |
PWC | https://paperswithcode.com/paper/a-single-shot-per-pose-camera-projector |
Repo | |
Framework | |
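For intuition, the sketch below (not the authors' code) shows the homography-warping baseline step that this paper improves on: camera-detected board corners are mapped into the projector image via a homography estimated from structured-light correspondences, and each device is then calibrated with OpenCV. The joint camera-projector bundle adjustment that makes the paper robust to non-planar boards is only noted in a comment; function names and data layout are assumptions.

```python
# Minimal sketch (not the authors' implementation): homography-based warping of camera
# feature points into the projector image, followed by independent device calibration.
import numpy as np
import cv2

def warp_corners_to_projector(cam_pts, proj_pts, board_corners_cam):
    """cam_pts/proj_pts: Nx2 matched SL correspondences; board_corners_cam: Mx2 corners."""
    H, _ = cv2.findHomography(cam_pts, proj_pts, cv2.RANSAC)
    corners = board_corners_cam.reshape(-1, 1, 2).astype(np.float32)
    return cv2.perspectiveTransform(corners, H).reshape(-1, 2)

def calibrate(obj_pts, cam_img_pts, proj_img_pts, cam_size, proj_size):
    """Per-pose 3D board points plus their 2D camera/projector locations -> intrinsics.
    The paper instead jointly refines both models with bundle adjustment, which also
    absorbs imperfect board planarity."""
    _, K_cam, d_cam, _, _ = cv2.calibrateCamera(obj_pts, cam_img_pts, cam_size, None, None)
    _, K_proj, d_proj, _, _ = cv2.calibrateCamera(obj_pts, proj_img_pts, proj_size, None, None)
    return (K_cam, d_cam), (K_proj, d_proj)
```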
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
Title | Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning |
Authors | Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang |
Abstract | Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus, and do not generalize to open vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, e.g., word selection, semantic construction, and style of expression, which poses a great challenge to depicting novel activities without paired training data. Meanwhile, similar activities share some of those aspects in common. We therefore propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning, which learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned from seen activities to unseen ones. In addition, we leverage an external topic-related text corpus to construct the topic embedding for each activity, which embodies the most relevant semantic vectors within the topic. Empirical results not only validate the effectiveness of our method in utilizing semantic knowledge for video captioning, but also show its strong generalization ability when describing novel activities. |
Tasks | Video Captioning |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02765v2 |
http://arxiv.org/pdf/1811.02765v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-compose-topic-aware-mixture-of |
Repo | |
Framework | |
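A hedged sketch of the two ideas in the abstract, not the authors' TAMoE implementation: a topic embedding built by averaging word vectors of topic-related terms, and a vocabulary head whose experts are composed with topic-conditioned gating weights. Dimensions, the gating form, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def topic_embedding(topic_words, word_vectors):
    """Average pretrained vectors of the most relevant words for an activity topic."""
    vecs = [word_vectors[w] for w in topic_words if w in word_vectors]
    return torch.stack(vecs).mean(dim=0)

class TopicMoEHead(nn.Module):
    """Vocabulary projection composed from K experts, gated by the topic embedding."""
    def __init__(self, hidden_dim, vocab_size, topic_dim, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(num_experts)])
        self.gate = nn.Linear(topic_dim, num_experts)

    def forward(self, decoder_state, topic_emb):
        weights = F.softmax(self.gate(topic_emb), dim=-1)                       # (K,)
        logits = torch.stack([e(decoder_state) for e in self.experts], dim=0)   # (K, B, V)
        return torch.einsum('k,kbv->bv', weights, logits)                       # topic-weighted mixture
```

Because the gate sees only the topic embedding, an unseen activity with a topic embedding close to seen ones reuses a similar composition of experts, which is the transfer mechanism the abstract alludes to.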
Which Facial Expressions Can Reveal Your Gender? A Study With 3D Faces
Title | Which Facial Expressions Can Reveal Your Gender? A Study With 3D Faces |
Authors | Baiqiang Xia |
Abstract | Humans exhibit rich gender cues in both appearance and behavior. In the computer vision domain, gender recognition from facial appearance has been extensively studied, while studies of gender recognition from facial behavior remain rare. In this work, we first demonstrate that facial expressions influence the gender patterns presented in 3D faces, and that gender recognition performance increases when training and testing within the same expression. We further design experiments that directly extract the morphological changes resulting from facial expressions as features for expression-based gender recognition. Experimental results demonstrate that gender can be recognized with considerable accuracy from the Happy and Disgust expressions, while the Surprise and Sad expressions do not convey much gender-related information. This is the first work in the literature to investigate expression-based gender classification with 3D faces, and it reveals the strength of the gender patterns incorporated in different types of expressions, namely the Happy, Disgust, Surprise and Sad expressions. |
Tasks | |
Published | 2018-05-01 |
URL | http://arxiv.org/abs/1805.00371v1 |
http://arxiv.org/pdf/1805.00371v1.pdf | |
PWC | https://paperswithcode.com/paper/which-facial-expressions-can-reveal-your |
Repo | |
Framework | |
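Illustrative sketch only (the paper's exact morphological features are not reproduced): expression-induced change is approximated as per-vertex displacement magnitudes between a registered neutral and expressive 3D scan, and a standard SVM is trained on those features. The data below is synthetic and registered meshes are assumed.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def morph_change_features(neutral_verts, expressive_verts):
    """Per-vertex Euclidean displacement between registered neutral/expressive meshes."""
    return np.linalg.norm(expressive_verts - neutral_verts, axis=1)

rng = np.random.default_rng(0)
n_subjects, n_vertices = 100, 500
X = np.stack([morph_change_features(rng.normal(size=(n_vertices, 3)),
                                    rng.normal(size=(n_vertices, 3)))
              for _ in range(n_subjects)])
y = rng.integers(0, 2, size=n_subjects)          # 0 = female, 1 = male (synthetic labels)
print(cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean())
```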
Feature learning based on visual similarity triplets in medical image analysis: A case study of emphysema in chest CT scans
Title | Feature learning based on visual similarity triplets in medical image analysis: A case study of emphysema in chest CT scans |
Authors | Silas Nyboe Ørting, Jens Petersen, Veronika Cheplygina, Laura H. Thomsen, Mathilde M W Wille, Marleen de Bruijne |
Abstract | Supervised feature learning using convolutional neural networks (CNNs) can provide concise and disease-relevant representations of medical images. However, training CNNs requires annotated image data. Annotating medical images can be a time-consuming task, and even expert annotations are subject to substantial inter- and intra-rater variability. Assessing visual similarity of images instead of indicating specific pathologies or estimating disease severity could allow non-experts to participate, help uncover new patterns, and possibly reduce rater variability. We consider the task of assessing emphysema extent in chest CT scans. We derive visual similarity triplets from visually assessed emphysema extent and learn a low-dimensional embedding using CNNs. We evaluate the networks on 973 images, and show that the CNNs can learn disease-relevant feature representations from the derived similarity triplets. To our knowledge, this is the first medical image application where similarity triplets have been used to learn a feature representation that can be used for embedding unseen test images. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07131v1 |
http://arxiv.org/pdf/1806.07131v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-learning-based-on-visual-similarity |
Repo | |
Framework | |
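A minimal sketch of learning an embedding from visual-similarity triplets, in the spirit of the abstract above; the architecture, patch size, margin, and random data are placeholders rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class PatchEmbedder(nn.Module):
    def __init__(self, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, embed_dim))

    def forward(self, x):
        return self.net(x)

model = PatchEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Anchor and positive have similar visually assessed emphysema extent; negative differs.
anchor, positive, negative = (torch.randn(8, 1, 64, 64) for _ in range(3))
loss = triplet_loss(model(anchor), model(positive), model(negative))
opt.zero_grad(); loss.backward(); opt.step()
```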
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings
Title | Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings |
Authors | Zhongwei Xie, Lin Li, Xian Zhong, Luo Zhong |
Abstract | Image-to-video person re-identification identifies a target person by a probe image from quantities of pedestrian videos captured by non-overlapping cameras. Despite the great progress achieved, it is still challenging to match in the multimodal scenario, i.e., between image and video. Current state-of-the-art approaches mainly focus on task-specific data, neglecting the extra information available from different but related tasks. In this paper, we propose an end-to-end neural network framework for image-to-video person re-identification that leverages cross-modal embeddings learned from extra information. Concretely, cross-modal embeddings from image captioning and video captioning models are reused to help project the learned features into a coordinated space, where similarity can be computed directly. In addition, training steps from the fixed model reuse approach are integrated into our framework, which incorporates beneficial information and eventually makes the target networks independent of existing models. Our framework also resorts to CNNs and LSTMs for extracting visual and spatiotemporal features, and combines the strengths of identification and verification models to improve the discriminative ability of the learned features. Experimental results demonstrate the effectiveness of our framework in narrowing the gap between heterogeneous data and in obtaining observable improvements in image-to-video person re-identification. |
Tasks | Image Captioning, Image-To-Video Person Re-Identification, Person Re-Identification, Video-Based Person Re-Identification, Video Captioning |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.03989v2 |
http://arxiv.org/pdf/1810.03989v2.pdf | |
PWC | https://paperswithcode.com/paper/image-to-video-person-re-identification-by |
Repo | |
Framework | |
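A rough sketch of the coordinated-space idea only: an image branch and a video branch are projected into a shared embedding and trained with an identification (softmax) loss plus a verification (contrastive-style) loss. The captioning-model reuse described in the paper is omitted here, and all dimensions, names, and the exact loss form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinatedReID(nn.Module):
    def __init__(self, img_dim=2048, frame_dim=2048, embed_dim=256, num_ids=100):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.video_rnn = nn.LSTM(frame_dim, embed_dim, batch_first=True)
        self.classifier = nn.Linear(embed_dim, num_ids)

    def forward(self, img_feat, video_frames):
        img_emb = F.normalize(self.img_proj(img_feat), dim=-1)       # image branch
        _, (h, _) = self.video_rnn(video_frames)                     # video branch (LSTM over frames)
        vid_emb = F.normalize(h[-1], dim=-1)
        return img_emb, vid_emb

def loss_fn(model, img_emb, vid_emb, labels, same_id):
    # Identification: classify identities from both modalities.
    ident = F.cross_entropy(model.classifier(img_emb), labels) \
          + F.cross_entropy(model.classifier(vid_emb), labels)
    # Verification: pull matched image/video pairs together, push mismatched pairs apart.
    dist = (img_emb - vid_emb).pow(2).sum(dim=-1)
    verif = torch.where(same_id, dist, F.relu(1.0 - dist.sqrt()).pow(2)).mean()
    return ident + verif
```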
Fine-Grained Facial Expression Analysis Using Dimensional Emotion Model
Title | Fine-Grained Facial Expression Analysis Using Dimensional Emotion Model |
Authors | Feng Zhou, Shu Kong, Charless Fowlkes, Tao Chen, Baiying Lei |
Abstract | Automated facial expression analysis has a variety of applications in human-computer interaction. Traditional methods mainly analyze prototypical facial expressions of no more than eight discrete emotions as a classification task. However, in practice, spontaneous facial expressions in naturalistic environments can represent not only a wide range of emotions, but also different intensities within an emotion family. In such situations, these methods are not reliable or adequate. In this paper, we propose to train deep convolutional neural networks (CNNs) to analyze facial expressions explainable in a dimensional emotion model. The proposed method accommodates not only a set of basic emotion expressions, but also a full range of other emotions and subtle emotion intensities that we both feel in ourselves and perceive in others in our daily life. Specifically, we first mapped facial expressions into dimensional measures, thereby transforming facial expression analysis from a classification problem into a regression one. We then tested our CNN-based methods for facial expression regression, and these methods demonstrated promising performance. Moreover, we improved our method with bilinear pooling, which encodes second-order statistics of features. We showed that such bilinear-CNN models significantly outperformed their respective baselines. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.01024v1 |
http://arxiv.org/pdf/1805.01024v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-facial-expression-analysis-using |
Repo | |
Framework | |
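The sketch below shows only the bilinear pooling step mentioned in the abstract (outer product of CNN feature maps, signed square root, L2 normalization) wired into a small regression head for dimensional emotion values. It is not the authors' full network; the backbone, channel count, and the assumption of a valence/arousal output are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bilinear_pool(feat):                                  # feat: (B, C, H, W)
    B, C, H, W = feat.shape
    f = feat.view(B, C, H * W)
    x = torch.bmm(f, f.transpose(1, 2)) / (H * W)         # (B, C, C) second-order statistics
    x = x.view(B, -1)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-8)   # signed square root
    return F.normalize(x, dim=-1)                          # L2 normalization

class EmotionRegressor(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(channels * channels, 2)      # predicts (valence, arousal)

    def forward(self, images):
        return self.head(bilinear_pool(self.backbone(images)))
```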
Clustering-driven Deep Embedding with Pairwise Constraints
Title | Clustering-driven Deep Embedding with Pairwise Constraints |
Authors | Sharon Fogel, Hadar Averbuch-Elor, Jacov Goldberger, Daniel Cohen-Or |
Abstract | Recently, there has been increasing interest in leveraging the competence of neural networks to analyze data. In particular, new clustering methods that employ deep embeddings have been presented. In this paper, we depart from centroid-based models and suggest a new framework, called Clustering-driven deep embedding with PAirwise Constraints (CPAC), for non-parametric clustering using a neural network. We present a clustering-driven embedding based on a Siamese network that encourages pairs of data points to output similar representations in the latent space. Our pair-based model allows augmenting the information with labeled pairs to constitute a semi-supervised framework. Our approach is based on analyzing the losses associated with each pair to refine the set of constraints. We show that clustering performance increases when using this scheme, even with a limited amount of user queries. We demonstrate how our architecture is adapted for various types of data and present the first deep framework to cluster 3D shapes. |
Tasks | |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08457v5 |
http://arxiv.org/pdf/1803.08457v5.pdf | |
PWC | https://paperswithcode.com/paper/clustering-driven-deep-embedding-with |
Repo | |
Framework | |
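For intuition, here is a generic sketch of the pair-based ingredient: a Siamese encoder maps two points into the latent space, and a pairwise loss pulls "same-cluster" pairs together and pushes "different-cluster" pairs apart. The paper's constraint-refinement scheme and its specific loss are not reproduced; the contrastive form, dimensions, and names below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, x):
        return self.net(x)

def pairwise_loss(z1, z2, must_link, margin=1.0):
    """must_link: 1.0 if the pair should share a cluster, 0.0 otherwise."""
    d = F.pairwise_distance(z1, z2)
    return (must_link * d.pow(2) +
            (1 - must_link) * F.relu(margin - d).pow(2)).mean()

encoder = SiameseEncoder()
x1, x2 = torch.randn(16, 784), torch.randn(16, 784)
must_link = torch.randint(0, 2, (16,)).float()
loss = pairwise_loss(encoder(x1), encoder(x2), must_link)
```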
Obligation and Prohibition Extraction Using Hierarchical RNNs
Title | Obligation and Prohibition Extraction Using Hierarchical RNNs |
Authors | Ilias Chalkidis, Ion Androutsopoulos, Achilleas Michos |
Abstract | We consider the task of detecting contractual obligations and prohibitions. We show that a self-attention mechanism improves the performance of a BILSTM classifier, the previous state of the art for this task, by allowing it to focus on indicative tokens. We also introduce a hierarchical BILSTM, which converts each sentence to an embedding, and processes the sentence embeddings to classify each sentence. Apart from being faster to train, the hierarchical BILSTM outperforms the flat one, even when the latter considers surrounding sentences, because the hierarchical model has a broader discourse view. |
Tasks | Sentence Embeddings |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.03871v1 |
http://arxiv.org/pdf/1805.03871v1.pdf | |
PWC | https://paperswithcode.com/paper/obligation-and-prohibition-extraction-using |
Repo | |
Framework | |
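A hedged sketch of a hierarchical BiLSTM with self-attention for sentence classification, following the description above: a word-level BiLSTM with attention builds each sentence embedding, and a sentence-level BiLSTM classifies every sentence in its section context. Hyperparameters, the three-way label set, and names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierBiLSTM(nn.Module):
    def __init__(self, vocab, emb=100, hid=128, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.sent_lstm = nn.LSTM(2 * hid, hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hid, num_classes)     # e.g. obligation / prohibition / none

    def forward(self, tokens):                          # tokens: (sections, sentences, words)
        S, N, W = tokens.shape
        h, _ = self.word_lstm(self.embed(tokens.view(S * N, W)))   # (S*N, W, 2*hid)
        a = F.softmax(self.attn(h), dim=1)                          # word-level self-attention
        sent_emb = (a * h).sum(dim=1).view(S, N, -1)                # sentence embeddings
        ctx, _ = self.sent_lstm(sent_emb)                           # sentence-level context
        return self.out(ctx)                                        # per-sentence logits
```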
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning
Title | Learning a Prior over Intent via Meta-Inverse Reinforcement Learning |
Authors | Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn |
Abstract | A significant challenge for the practical application of reinforcement learning in the real world is the need to specify an oracle reward function that correctly defines a task. Inverse reinforcement learning (IRL) seeks to avoid this challenge by instead inferring a reward function from expert behavior. While appealing, it can be impractically expensive to collect datasets of demonstrations that cover the variation common in the real world (e.g. opening any type of door). Thus in practice, IRL must commonly be performed with only a limited set of demonstrations where it can be exceedingly difficult to unambiguously recover a reward function. In this work, we exploit the insight that demonstrations from other tasks can be used to constrain the set of possible reward functions by learning a “prior” that is specifically optimized for the ability to infer expressive reward functions from limited numbers of demonstrations. We demonstrate that our method can efficiently recover rewards from images for novel tasks and provide intuition as to how our approach is analogous to learning a prior. |
Tasks | |
Published | 2018-05-31 |
URL | https://arxiv.org/abs/1805.12573v5 |
https://arxiv.org/pdf/1805.12573v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-prior-over-intent-via-meta-inverse |
Repo | |
Framework | |
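A toy sketch of the "prior over intent" idea: meta-learn an initialization of reward parameters such that one gradient step on a handful of demonstrations from a new task already yields a useful reward. The linear reward model, the demonstration format, and the losses below are drastically simplified placeholders, not the paper's image-based formulation.

```python
import torch
import torch.nn.functional as F

def demo_nll(theta, states, actions, next_feats):
    """Negative log-likelihood of demonstrated actions under a softmax policy whose
    per-action score is r(s, a) = next_feats[s, a] @ theta (toy linear reward)."""
    logits = next_feats @ theta                     # (num_states, num_actions)
    return F.cross_entropy(logits[states], actions)

def meta_step(theta, tasks, inner_lr=0.1, outer_lr=0.01):
    """MAML-style outer update of the reward-parameter initialization across tasks."""
    outer_grad = torch.zeros_like(theta)
    for train_demo, test_demo in tasks:             # each task: (adaptation demos, held-out demos)
        theta_ = theta.clone().requires_grad_(True)
        inner = demo_nll(theta_, *train_demo)
        g, = torch.autograd.grad(inner, theta_, create_graph=True)
        adapted = theta_ - inner_lr * g             # one-step adaptation to this task
        outer = demo_nll(adapted, *test_demo)       # how good is the adapted reward?
        outer_grad += torch.autograd.grad(outer, theta_)[0]
    return theta - outer_lr * outer_grad / len(tasks)
```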
Product Kernel Interpolation for Scalable Gaussian Processes
Title | Product Kernel Interpolation for Scalable Gaussian Processes |
Authors | Jacob R. Gardner, Geoff Pleiss, Ruihan Wu, Kilian Q. Weinberger, Andrew Gordon Wilson |
Abstract | Recent work shows that inference for Gaussian processes can be performed efficiently using iterative methods that rely only on matrix-vector multiplications (MVMs). Structured Kernel Interpolation (SKI) exploits these techniques by deriving approximate kernels with very fast MVMs. Unfortunately, such strategies suffer badly from the curse of dimensionality. We develop a new technique for MVM based learning that exploits product kernel structure. We demonstrate that this technique is broadly applicable, resulting in linear rather than exponential runtime with dimension for SKI, as well as state-of-the-art asymptotic complexity for multi-task GPs. |
Tasks | Gaussian Processes |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08903v1 |
http://arxiv.org/pdf/1802.08903v1.pdf | |
PWC | https://paperswithcode.com/paper/product-kernel-interpolation-for-scalable |
Repo | |
Framework | |
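The snippet below illustrates only the underlying principle stated in the abstract, namely MVM-based GP inference, not SKI or the paper's product-kernel algorithm: the GP linear system (K + sigma^2 I) alpha = y is solved with conjugate gradients using nothing but matrix-vector products, so any kernel structure that makes the MVM fast (such as per-dimension product/interpolation structure) directly accelerates inference. The kernel and data are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.normal(size=500)
noise = 0.01

def rbf_mvm(v, lengthscale=0.5):
    """Dense RBF kernel MVM; a structured kernel would replace this with a fast routine."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq / lengthscale ** 2)
    return K @ v + noise * v                       # (K + sigma^2 I) v

A = LinearOperator((500, 500), matvec=rbf_mvm)
alpha, info = cg(A, y)                             # representer weights for the posterior mean
print(info)                                        # 0 means CG converged
```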
Vector Learning for Cross Domain Representations
Title | Vector Learning for Cross Domain Representations |
Authors | Shagan Sah, Chi Zhang, Thang Nguyen, Dheeraj Kumar Peri, Ameya Shringi, Raymond Ptucha |
Abstract | Recently, generative adversarial networks have gained a lot of popularity for image generation tasks. However, such models are associated with complex learning mechanisms and demand very large relevant datasets. This work borrows concepts from image and video captioning models to form an image generative framework. The model is trained in a similar fashion to a recurrent captioning model and uses the learned weights for image generation. This is done in the inverse direction, where the input is a caption and the output is an image. The vector representations of the sentence and frames are extracted from an encoder-decoder model that is initially trained on similar sentence and image pairs. Our model conditions image generation on a natural language caption. We leverage a sequence-to-sequence model to generate synthetic captions with the same meaning, making image generation more robust. One key advantage of our method is that traditional image captioning datasets can be used to generate synthetic sentence paraphrases. Results indicate that images generated through multiple captions are better at capturing the semantic meaning of the family of captions. |
Tasks | Image Captioning, Image Generation, Video Captioning |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10312v1 |
http://arxiv.org/pdf/1809.10312v1.pdf | |
PWC | https://paperswithcode.com/paper/vector-learning-for-cross-domain |
Repo | |
Framework | |
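A very rough sketch of the direction described above: a sentence embedding taken from a pretrained captioning/paraphrase encoder conditions a deconvolutional image generator. The encoder is stubbed out with a random vector, and the architecture and dimensions are illustrative rather than the authors' model.

```python
import torch
import torch.nn as nn

class CaptionToImage(nn.Module):
    def __init__(self, sent_dim=512, img_channels=3):
        super().__init__()
        self.fc = nn.Linear(sent_dim, 128 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),           # 8x8
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),            # 16x16
            nn.ConvTranspose2d(32, img_channels, 4, stride=2, padding=1), nn.Tanh())  # 32x32

    def forward(self, sent_emb):
        h = self.fc(sent_emb).view(-1, 128, 4, 4)
        return self.deconv(h)

# sent_emb would come from the encoder of a pretrained captioning / paraphrase model;
# here it is random, purely for shape checking.
images = CaptionToImage()(torch.randn(2, 512))
print(images.shape)   # torch.Size([2, 3, 32, 32])
```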
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation
Title | On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation |
Authors | Behzad Salami, Osman Unsal, Adrian Cristal |
Abstract | Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular, fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of various components of the RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that corrects bit flips 47.3% more effectively than state-of-the-art methods. |
Tasks | |
Published | 2018-06-14 |
URL | https://arxiv.org/abs/1806.09679v1 |
https://arxiv.org/pdf/1806.09679v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-resilience-of-rtl-nn-accelerators |
Repo | |
Framework | |
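As a software-level illustration of the fault-injection idea (the paper works at RTL, which cannot be reproduced in a few lines), the sketch below flips random bits in fixed-point (int8) weights and observes the effect on a toy layer's outputs. The data representation and flip rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_bit_flips(weights_int8, flip_prob=1e-3):
    """Flip each bit of each int8 weight independently with probability flip_prob."""
    w = weights_int8.astype(np.uint8).copy()       # reinterpret as raw two's-complement bits
    for bit in range(8):
        mask = rng.random(w.shape) < flip_prob
        w[mask] ^= np.uint8(1 << bit)
    return w.astype(np.int8)

W = rng.integers(-128, 128, size=(64, 32), dtype=np.int8)   # quantized layer weights
x = rng.integers(-128, 128, size=(32,), dtype=np.int8)
clean = W.astype(np.int32) @ x.astype(np.int32)
faulty = inject_bit_flips(W).astype(np.int32) @ x.astype(np.int32)
print("mean absolute output deviation:", np.abs(clean - faulty).mean())
```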
Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information
Title | Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information |
Authors | Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, Willem Zuidema |
Abstract | How do neural language models keep track of number agreement between subject and verb? We show that 'diagnostic classifiers', trained to predict number from the internal states of a language model, provide a detailed understanding of how, when, and where this information is represented. Moreover, they give us insight into when and where number information is corrupted in cases where the language model ends up making agreement errors. To demonstrate the causal role played by the representations we find, we then use agreement information to influence the course of the LSTM during the processing of difficult sentences. Results from such an intervention reveal a large increase in the language model’s accuracy. Together, these results show that diagnostic classifiers give us an unrivalled detailed look into the representation of linguistic information in neural models, and demonstrate that this knowledge can be used to improve their performance. |
Tasks | Language Modelling |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08079v2 |
http://arxiv.org/pdf/1808.08079v2.pdf | |
PWC | https://paperswithcode.com/paper/under-the-hood-using-diagnostic-classifiers |
Repo | |
Framework | |
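A bare-bones sketch of a diagnostic classifier as described above: take the hidden states a language model produces while reading sentences and train a simple probe to predict the subject's number from each state. The language model is stubbed with random states here; in practice the states would come from the trained LSTM LM, and the stub labels mean the printed accuracy is only a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, hidden_dim = 5000, 650
hidden_states = rng.normal(size=(n_tokens, hidden_dim))    # stub for LSTM hidden states
number_labels = rng.integers(0, 2, size=n_tokens)          # 0 = singular, 1 = plural

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, number_labels, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy well above chance would indicate the states encode agreement information;
# the probe can also be used to nudge states during processing, as the paper describes.
print("probe accuracy:", probe.score(X_te, y_te))
```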
A Bayesian framework for the analog reconstruction of kymographs from fluorescence microscopy data
Title | A Bayesian framework for the analog reconstruction of kymographs from fluorescence microscopy data |
Authors | Denis K. Samuylov, Gábor Székely, Grégory Paul |
Abstract | Kymographs are widely used to represent and analyse spatio-temporal dynamics of fluorescence markers along curvilinear biological compartments. These objects have a singular geometry, thus kymograph reconstruction is inherently an analog image processing task. However, the existing approaches are essentially digital: the kymograph photometry is sampled directly from the time-lapse images. As a result, such kymographs rely on raw image data that suffer from the degradations entailed by the image formation process and the spatio-temporal resolution of the imaging setup. In this work, we address these limitations and introduce a well-grounded Bayesian framework for the analog reconstruction of kymographs. To handle the movement of the object, we introduce an intrinsic description of kymographs using differential geometry: a kymograph is a photometry defined on a parameter space that is embedded in physical space by a time-varying map that follows the object geometry. We model the kymograph photometry as a Lévy innovation process, a flexible class of non-parametric signal priors. We account for the image formation process using the virtual microscope framework. We formulate a computationally tractable representation of the associated maximum a posteriori problem and solve it using a class of efficient and modular algorithms based on the alternating split Bregman. We assess the performance of our Bayesian framework on synthetic data and apply it to reconstruct the fluorescence dynamics along microtubules in vivo in the budding yeast S. cerevisiae. We demonstrate that our framework allows revealing patterns from single time-lapse data that are invisible on standard digital kymographs. |
Tasks | |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01590v1 |
http://arxiv.org/pdf/1809.01590v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-framework-for-the-analog |
Repo | |
Framework | |
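This is not the paper's virtual-microscope model; it is just a minimal split-Bregman sketch of the kind of algorithm referenced above, here solving a 1D TV-regularized denoising problem, min_u |Du|_1 + (mu/2)||u - f||^2, for a noisy photometry profile f along a compartment. All parameters and the piecewise-constant test signal are illustrative.

```python
import numpy as np

def split_bregman_tv_1d(f, mu=10.0, lam=1.0, n_iter=100):
    n = f.size
    D = np.diff(np.eye(n), axis=0)                       # forward-difference operator
    A = mu * np.eye(n) + lam * D.T @ D                   # system matrix of the u-update
    u, d, b = f.copy(), np.zeros(n - 1), np.zeros(n - 1)
    for _ in range(n_iter):
        u = np.linalg.solve(A, mu * f + lam * D.T @ (d - b))
        Du = D @ u
        d = np.sign(Du + b) * np.maximum(np.abs(Du + b) - 1.0 / lam, 0.0)  # shrinkage
        b = b + Du - d                                    # Bregman update
    return u

# Piecewise-constant intensity profile along the compartment, observed with noise.
rng = np.random.default_rng(0)
truth = np.concatenate([np.full(50, 1.0), np.full(50, 3.0), np.full(50, 2.0)])
u = split_bregman_tv_1d(truth + 0.3 * rng.normal(size=truth.size))
print(np.abs(u - truth).mean())
```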
Do GANs leave artificial fingerprints?
Title | Do GANs leave artificial fingerprints? |
Authors | Francesco Marra, Diego Gragnaniello, Luisa Verdoliva, Giovanni Poggi |
Abstract | In the last few years, generative adversarial networks (GAN) have shown tremendous potential for a number of applications in computer vision and related fields. With the current pace of progress, it is a sure bet they will soon be able to generate high-quality images and videos, virtually indistinguishable from real ones. Unfortunately, realistic GAN-generated images pose serious threats to security, to begin with a possible flood of fake multimedia, and multimedia forensic countermeasures are in urgent need. In this work, we show that each GAN leaves its specific fingerprint in the images it generates, just like real-world cameras mark acquired images with traces of their photo-response non-uniformity pattern. Source identification experiments with several popular GANs show such fingerprints to represent a precious asset for forensic analyses. |
Tasks | |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11842v1 |
http://arxiv.org/pdf/1812.11842v1.pdf | |
PWC | https://paperswithcode.com/paper/do-gans-leave-artificial-fingerprints |
Repo | |
Framework | |
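To make the PRNU-style intuition in the abstract concrete, the sketch below estimates a GAN "fingerprint" as the average high-pass residual of many images from that GAN and attributes a test image by correlating its residual with each candidate fingerprint. This is not the authors' pipeline: the denoiser is a crude Gaussian blur, the data is synthetic, and all parameters are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def residual(img, sigma=1.0):
    """High-frequency residual left after subtracting a smoothed version of the image."""
    return img - gaussian_filter(img, sigma)

def fingerprint(images):
    return np.mean([residual(im) for im in images], axis=0)

def ncc(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
pattern = 0.05 * np.sin(np.arange(64))[None, :]          # synthetic deterministic artifact of "GAN A"
gan_a = [rng.normal(size=(64, 64)) + pattern for _ in range(50)]
gan_b = [rng.normal(size=(64, 64)) for _ in range(50)]
fp_a, fp_b = fingerprint(gan_a), fingerprint(gan_b)
test = rng.normal(size=(64, 64)) + pattern               # test image from "GAN A"
print("corr with A:", ncc(residual(test), fp_a), " corr with B:", ncc(residual(test), fp_b))
```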