July 27, 2019

3543 words 17 mins read

Paper Group ANR 658

Can Image Retrieval help Visual Saliency Detection?. Top-Down Saliency Detection Driven by Visual Classification. SMC Faster R-CNN: Toward a scene-specialized multi-object detector. Self-explanatory Deep Salient Object Detection. Similarity Function Tracking using Pairwise Comparisons. The Morphospace of Consciousness. KGAN: How to Break The Minima …

Can Image Retrieval help Visual Saliency Detection?

Title Can Image Retrieval help Visual Saliency Detection?
Authors Shuang Li, Peter Mathews
Abstract We propose a novel image retrieval framework for visual saliency detection using information about salient objects contained within bounding box annotations for similar images. For each test image, we train a customized SVM from similar example images to predict the saliency values of its object proposals and generate an external saliency map (ES) by aggregating the regional scores. To overcome limitations caused by the size of the training dataset, we also propose an internal optimization module which computes an internal saliency map (IS) by measuring the low-level contrast information of the test image. The two maps, ES and IS, have complementary properties so we take a weighted combination to further improve the detection performance. Experimental results on several challenging datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
Tasks Image Retrieval, Saliency Detection
Published 2017-09-24
URL http://arxiv.org/abs/1709.08172v1
PDF http://arxiv.org/pdf/1709.08172v1.pdf
PWC https://paperswithcode.com/paper/can-image-retrieval-help-visual-saliency
Repo
Framework
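
The final map here is a weighted blend of the external map ES (SVM scores aggregated over object proposals) and the internal map IS (low-level contrast). A minimal sketch of that last combination step, assuming both maps arrive as 2-D arrays and treating the weight `alpha` as a free parameter the abstract does not specify:

```python
import numpy as np

def combine_saliency(es: np.ndarray, is_map: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted combination of an external (ES) and internal (IS) saliency map.

    Both maps are normalized to [0, 1] first so the weight alpha is the only
    free parameter; the abstract does not say how the weight is chosen, so
    alpha here is a placeholder.
    """
    def normalize(m):
        m = m.astype(np.float64)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    return alpha * normalize(es) + (1.0 - alpha) * normalize(is_map)

# Example: blend two toy 64x64 maps, leaning on the external map
es = np.random.rand(64, 64)
is_map = np.random.rand(64, 64)
final = combine_saliency(es, is_map, alpha=0.6)
```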

Top-Down Saliency Detection Driven by Visual Classification

Title Top-Down Saliency Detection Driven by Visual Classification
Authors Francesca Murabito, Concetto Spampinato, Simone Palazzo, Konstantin Pogorelov, Michael Riegler
Abstract This paper presents an approach for top-down saliency detection guided by visual classification tasks. We first learn how to compute visual saliency when a specific visual task has to be accomplished, as opposed to most state-of-the-art methods which assess saliency merely through bottom-up principles. Afterwards, we investigate if and to what extent visual saliency can support visual classification in nontrivial cases. To achieve this, we propose SalClassNet, a CNN framework consisting of two networks jointly trained: a) the first one computing top-down saliency maps from input images, and b) the second one exploiting the computed saliency maps for visual classification. To test our approach, we collected a dataset of eye-gaze maps, using a Tobii T60 eye tracker, by asking several subjects to look at images from the Stanford Dogs dataset, with the objective of distinguishing dog breeds. Performance analysis on our dataset and other saliency benchmarking datasets, such as POET, showed that SalClassNet outperforms state-of-the-art saliency detectors, such as SalNet and SALICON. Finally, we analyzed the performance of SalClassNet in a fine-grained recognition task and found that it generalizes better than existing visual classifiers. The achieved results thus demonstrate that 1) conditioning saliency detectors on object classes reaches state-of-the-art performance, and 2) explicitly providing top-down saliency maps to visual classifiers enhances classification accuracy.
Tasks Saliency Detection
Published 2017-09-15
URL http://arxiv.org/abs/1709.05307v3
PDF http://arxiv.org/pdf/1709.05307v3.pdf
PWC https://paperswithcode.com/paper/top-down-saliency-detection-driven-by-visual
Repo
Framework
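
A toy rendering of the two-network idea: a saliency branch predicts a per-pixel map, and the classifier sees the image concatenated with that map, so both heads can be supervised jointly. All layer sizes, and the concatenation itself, are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class SalClassNetSketch(nn.Module):
    """Joint saliency + classification sketch; not the published network."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.saliency = nn.Sequential(            # image -> 1-channel saliency map
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.classifier = nn.Sequential(          # image + map -> class logits
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        sal = self.saliency(x)
        logits = self.classifier(torch.cat([x, sal], dim=1))
        return sal, logits   # saliency loss + classification loss train both

model = SalClassNetSketch(num_classes=120)   # e.g. 120 Stanford Dogs breeds
sal, logits = model(torch.randn(2, 3, 224, 224))
```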

SMC Faster R-CNN: Toward a scene-specialized multi-object detector

Title SMC Faster R-CNN: Toward a scene-specialized multi-object detector
Authors Ala Mhalla, Thierry Chateau, Houda Maamatou, Sami Gazzah, Najoua Essoukri Ben Amara
Abstract Generally, the performance of a generic detector decreases significantly when it is tested on a specific scene, due to the large variation between the source training dataset and the samples from the target scene. To solve this problem, we propose a new formalism of transfer learning based on the theory of the Sequential Monte Carlo (SMC) filter to automatically specialize a scene-specific Faster R-CNN detector. The suggested framework uses different strategies based on the SMC filter steps to iteratively approximate the target distribution as a set of samples, in order to specialize the Faster R-CNN detector towards a target scene. Moreover, we put forward a likelihood function that combines spatio-temporal information extracted from the target video sequence with the confidence score given by the output layer of the Faster R-CNN, to favor the selection of target samples associated with the right label. The effectiveness of the suggested framework is demonstrated through experiments on several public traffic datasets. Compared with state-of-the-art specialization frameworks, the proposed framework presents encouraging results for both single- and multi-object traffic detection.
Tasks Transfer Learning
Published 2017-06-30
URL http://arxiv.org/abs/1706.10217v1
PDF http://arxiv.org/pdf/1706.10217v1.pdf
PWC https://paperswithcode.com/paper/smc-faster-r-cnn-toward-a-scene-specialized
Repo
Framework
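
The SMC steps described in the abstract map onto a simple loop: run the detector on the target video (prediction), weight each detection by confidence times a spatio-temporal cue (likelihood), resample by weight, and fine-tune on the selected pseudo-labels (update). A sketch under stated assumptions: the detector interface and the cue below are placeholders, not a real Faster R-CNN API or the paper's exact likelihood.

```python
import random

def motion_consistency(frame, box):
    # Placeholder for the spatio-temporal cue (e.g. overlap with a
    # background-subtraction mask); a dummy value here.
    return random.uniform(0.5, 1.0)

def resample(samples, k=100):
    # Simple multinomial resampling by likelihood weight.
    if not samples:
        return []
    weights = [w for *_, w in samples]
    return random.choices(samples, weights=weights, k=min(k, len(samples)))

def smc_specialize(detector, frames, n_iterations=3, threshold=0.6):
    """Sketch of the SMC specialization loop. `detector` is an assumed
    interface with .predict(frame) -> [(box, score)] and .finetune(samples)."""
    for _ in range(n_iterations):
        samples = []
        for frame in frames:
            for box, score in detector.predict(frame):        # prediction step
                lik = score * motion_consistency(frame, box)  # likelihood step
                if lik > threshold:
                    samples.append((frame, box, lik))
        detector.finetune(resample(samples))                  # update step
    return detector
```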

Self-explanatory Deep Salient Object Detection

Title Self-explanatory Deep Salient Object Detection
Authors Huaxin Xiao, Jiashi Feng, Yunchao Wei, Maojun Zhang
Abstract Salient object detection has seen remarkable progress driven by deep learning techniques. However, most deep-learning-based salient object detection methods are black-box in nature and lack interpretability. This paper proposes the first self-explanatory saliency detection network that explicitly exploits low- and high-level features for salient object detection. We demonstrate that such supportive clues not only significantly enhance the performance of salient object detection but also give better-justified detection results. More specifically, we develop a multi-stage saliency encoder to extract multi-scale features which contain both low- and high-level saliency context. Dense short- and long-range connections are introduced to reuse these features iteratively. Benefiting from direct access to low- and high-level features, the proposed saliency encoder can not only model the object context but also preserve the boundary. Furthermore, a self-explanatory generator is proposed to interpret how the proposed saliency encoder or other deep saliency models make decisions. The generator simulates the absence of interesting features by preventing these features from contributing to the saliency classifier, and estimates the corresponding saliency prediction without these features. A comparison function, the saliency explanation, is defined to measure the prediction changes between deep saliency models and the corresponding generator. By visualizing the differences, we can interpret the capability of different deep-neural-network-based saliency detection models and demonstrate that our proposed model indeed uses a more reasonable structure for salient object detection. Extensive experiments on five popular benchmark datasets and the visualized saliency explanations demonstrate that the proposed method achieves new state-of-the-art performance.
Tasks Object Detection, Saliency Detection, Saliency Prediction, Salient Object Detection
Published 2017-08-18
URL http://arxiv.org/abs/1708.05595v1
PDF http://arxiv.org/pdf/1708.05595v1.pdf
PWC https://paperswithcode.com/paper/self-explanatory-deep-salient-object
Repo
Framework
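
The comparison function amounts to an ablation: predict with all features, predict again with a chosen feature set prevented from contributing, and look at the difference. A generic sketch of that idea, not the paper's exact generator:

```python
import torch

def saliency_explanation(model, features, mask):
    """Change in a model's saliency prediction when a set of features is
    'absent'. `model` maps a feature tensor to a saliency map; `mask` is 1
    where features are kept, 0 where they are suppressed."""
    with torch.no_grad():
        full = model(features)
        ablated = model(features * mask)   # simulate absence of features
    return full - ablated                  # large values = influential features

# Example with a toy model: ablate the first four feature channels
model = torch.nn.Conv2d(8, 1, 3, padding=1)
feats = torch.randn(1, 8, 32, 32)
mask = torch.ones_like(feats)
mask[:, :4] = 0
delta = saliency_explanation(model, feats, mask)
```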

Similarity Function Tracking using Pairwise Comparisons

Title Similarity Function Tracking using Pairwise Comparisons
Authors Kristjan Greenewald, Stephen Kelley, Brandon Oselio, Alfred O. Hero III
Abstract Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes in the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates, and demonstrate parameter-free RICE-OCELAD metric learning on both synthetic data and a highly nonstationary Twitter dataset. We show significant performance improvements and increased robustness to nonstationary effects relative to previously proposed batch and online distance metric learning algorithms.
Tasks Metric Learning
Published 2017-01-07
URL http://arxiv.org/abs/1701.02804v1
PDF http://arxiv.org/pdf/1701.02804v1.pdf
PWC https://paperswithcode.com/paper/similarity-function-tracking-using-pairwise
Repo
Framework
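
The RICE construction runs parallel learners at geometrically spaced learning rates so that some learner is always well matched to the current rate of drift. A simplified regression analogue of that idea (plain online gradient descent per learner, recent-loss weighting as a stand-in for OCELAD's strongly adaptive rule, not the paper's metric-learning algorithm):

```python
import numpy as np

class OnlineLearner:
    """One learner at a fixed rate: online gradient descent on squared loss."""
    def __init__(self, dim, lr):
        self.w = np.zeros(dim)
        self.lr = lr

    def update(self, x, y):
        grad = 2 * (self.w @ x - y) * x
        self.w -= self.lr * grad

class RateEnsemble:
    """Parallel learners with different learning rates; the ensemble follows
    whichever learner has the lowest exponentially weighted recent loss."""
    def __init__(self, dim, rates=(0.001, 0.01, 0.1), decay=0.9):
        self.learners = [OnlineLearner(dim, r) for r in rates]
        self.loss = np.zeros(len(rates))
        self.decay = decay

    def step(self, x, y):
        for i, l in enumerate(self.learners):
            err = (l.w @ x - y) ** 2
            self.loss[i] = self.decay * self.loss[i] + (1 - self.decay) * err
            l.update(x, y)
        return self.learners[int(np.argmin(self.loss))].w  # current best

ens = RateEnsemble(dim=5)
for _ in range(100):
    x = np.random.randn(5)
    w_best = ens.step(x, x.sum())   # a nonstationary stream would drift here
```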

The Morphospace of Consciousness

Title The Morphospace of Consciousness
Authors Xerxes D. Arsiwalla, Ricard Sole, Clement Moulin-Frier, Ivan Herreros, Marti Sanchez-Fibla, Paul Verschure
Abstract We construct a complexity-based morphospace to study systems-level properties of conscious & intelligent systems. The axes of this space label 3 complexity types: autonomous, cognitive & social. Given recent proposals to synthesize consciousness, a generic complexity-based conceptualization provides a useful framework for identifying defining features of conscious & synthetic systems. Based on current clinical scales of consciousness that measure cognitive awareness and wakefulness, we take a perspective on how contemporary artificially intelligent machines & synthetically engineered life forms measure on these scales. It turns out that awareness & wakefulness can be associated with computational & autonomous complexity respectively. Subsequently, building on insights from cognitive robotics, we examine the function that consciousness serves, & argue for the role of consciousness as an evolutionary game-theoretic strategy. This makes the case for a third type of complexity for describing consciousness: social complexity. Having identified these complexity types allows for a representation of both biological & synthetic systems in a common morphospace. A consequence of this classification is a taxonomy of possible conscious machines. We identify four types of consciousness, based on embodiment: (i) biological consciousness, (ii) synthetic consciousness, (iii) group consciousness (resulting from group interactions), & (iv) simulated consciousness (embodied by virtual agents within a simulated reality). This taxonomy helps in the investigation of comparative signatures of consciousness across domains, in order to highlight design principles necessary to engineer conscious machines. This is particularly relevant in the light of recent developments at the crossroads of cognitive neuroscience, biomedical engineering, artificial intelligence & biomimetics.
Tasks
Published 2017-05-31
URL http://arxiv.org/abs/1705.11190v3
PDF http://arxiv.org/pdf/1705.11190v3.pdf
PWC https://paperswithcode.com/paper/the-morphospace-of-consciousness
Repo
Framework

KGAN: How to Break The Minimax Game in GAN

Title KGAN: How to Break The Minimax Game in GAN
Authors Trung Le, Tu Dinh Nguyen, Dinh Phung
Abstract Generative Adversarial Networks (GANs) were intuitively and attractively explained from the perspective of game theory, wherein the two parties involved are a discriminator and a generator. In this game, the task of the discriminator is to discriminate the real and generated (i.e., fake) data, whilst the task of the generator is to generate fake data that maximally confuses the discriminator. In this paper, we propose a new viewpoint for GANs, which we term the minimizing-general-loss viewpoint. This viewpoint shows a connection between the general loss of a classification problem with a convex loss function and an f-divergence between the true and fake data distributions. Mathematically, we propose a setting for the classification problem of the true and fake data, wherein we can prove that the general loss of this classification problem is exactly the negative f-divergence for a certain convex function f. This allows us to interpret the problem of learning a generator that minimizes the f-divergence between the true and fake data distributions as that of maximizing the general loss, which recovers the min-max problem in GANs when the logistic loss is used in the classification problem. This viewpoint strengthens GANs in two ways. First, it allows us to employ any convex loss function for the discriminator. Second, it suggests that, rather than limiting ourselves to NN-based discriminators, we can alternatively utilize other powerful families. Bearing this viewpoint, we then propose using the kernel-based family for discriminators. This family has two appealing features: i) a powerful capacity for classifying non-linear data, and ii) convexity in the feature space. Using the convexity of this family, we can further develop Fenchel duality to equivalently transform the max-min problem into the max-max dual problem.
Tasks
Published 2017-11-06
URL http://arxiv.org/abs/1711.01744v1
PDF http://arxiv.org/pdf/1711.01744v1.pdf
PWC https://paperswithcode.com/paper/kgan-how-to-break-the-minimax-game-in-gan
Repo
Framework
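
The loss/divergence link the abstract describes can be checked against its best-known instance: with the logistic loss, the classical GAN objective at the optimal discriminator is, up to a constant, the Jensen-Shannon divergence, itself an f-divergence. A worked statement of that standard case (the textbook GAN result, not KGAN's general construction):

```latex
% With the logistic loss, the inner maximization over discriminators gives
\max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
        + \mathbb{E}_{x \sim p_g}[\log(1 - D(x))]
  \;=\; 2\,\mathrm{JS}\!\left(p_{\mathrm{data}} \,\|\, p_g\right) - \log 4 ,
% i.e. the f-divergence generated by f(u) = u \log u - (u+1)\log(u+1).
% Training the generator against this maximized loss therefore minimizes
% an f-divergence between the true and fake data distributions.
```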

Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval

Title Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval
Authors Yu-An Chung, Wei-Hung Weng
Abstract Deep neural networks have been investigated for learning latent representations of medical images, yet most studies limit their approach to a single supervised convolutional neural network (CNN), which usually relies heavily on a large-scale annotated dataset for training. To learn image representations with less supervision involved, we propose a deep Siamese CNN (SCNN) architecture that can be trained with only binary image-pair information. We evaluated the learned image representations on a task of content-based medical image retrieval using a publicly available multiclass diabetic retinopathy fundus image dataset. The experimental results show that our proposed deep SCNN is comparable to the state-of-the-art single supervised CNN while requiring much less supervision for training.
Tasks Content-Based Image Retrieval, Image Retrieval, Medical Image Retrieval
Published 2017-11-22
URL http://arxiv.org/abs/1711.08490v2
PDF http://arxiv.org/pdf/1711.08490v2.pdf
PWC https://paperswithcode.com/paper/learning-deep-representations-of-medical
Repo
Framework
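
The abstract only states that binary pair labels supervise training, so the particular pair loss is an assumption; a common choice for Siamese training is the contrastive loss, sketched here with a shared encoder applied to both images of each pair:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb1, emb2, same: torch.Tensor, margin: float = 1.0):
    """Hadsell-style contrastive loss; `same` is 1 for matching pairs, 0 otherwise."""
    d = F.pairwise_distance(emb1, emb2)
    pos = same * d.pow(2)                         # pull matching pairs together
    neg = (1 - same) * F.relu(margin - d).pow(2)  # push non-matches beyond the margin
    return (pos + neg).mean()

# Toy usage: one shared encoder embeds both sides of each pair
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
a, b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(encoder(a), encoder(b), labels)
```

Retrieval then reduces to nearest-neighbor search in the learned embedding space.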

Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database

Title Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database
Authors Adriana Fernandez-Lopez, Oriol Martinez, Federico M. Sukno
Abstract Speech is the most used communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signals, although the video can provide information that is complementary to the audio. Exploiting the visual information, however, has proven challenging. On one hand, researchers have reported that the mapping between phonemes and visemes (visual units) is one-to-many, because some phonemes are visually similar and indistinguishable from one another. On the other hand, it is known that some people are very good lip-readers (e.g., deaf people). We study the limit of visual-only speech recognition in controlled conditions. With this goal, we designed a new database in which the speakers are aware of being lip-read and aim to facilitate lip-reading. Since the literature reports discrepancies on whether hearing-impaired people are better lip-readers than normal-hearing people, we also analyze whether there are differences between the lip-reading abilities of 9 hearing-impaired and 15 normal-hearing people. Finally, human abilities are compared with the performance of a visual automatic speech recognition system. In our tests, hearing-impaired participants outperformed the normal-hearing participants, but without reaching statistical significance. Human observers were able to decode 44% of the spoken message. In contrast, the visual-only automatic system achieved a 20% word recognition rate. However, if we repeat the comparison in terms of phonemes, both obtained very similar recognition rates, just above 50%. This suggests that the gap between human lip-reading and automatic speech-reading might be more related to the use of context than to the ability to interpret mouth appearance.
Tasks Speech Recognition, Visual Speech Recognition
Published 2017-04-26
URL http://arxiv.org/abs/1704.08028v1
PDF http://arxiv.org/pdf/1704.08028v1.pdf
PWC https://paperswithcode.com/paper/towards-estimating-the-upper-bound-of-visual
Repo
Framework
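
The contrast between a 44% word-level rate and phoneme-level rates just above 50% is easy to make concrete: a hypothesis can get most phonemes right while missing the words they belong to. A toy scoring sketch, assuming a simple matched-in-order count (real studies typically use edit-distance-based scoring) and crude character strings as stand-ins for phoneme sequences:

```python
def recognition_rate(reference, hypothesis):
    """Fraction of reference units recovered in order, via longest common
    subsequence; an illustrative stand-in for proper WER-style scoring."""
    m, n = len(reference), len(hypothesis)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if reference[i] == hypothesis[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    return lcs[m][n] / m if m else 0.0

ref_words = "please place the bag".split()
hyp_words = "please brace the back".split()
print(recognition_rate(ref_words, hyp_words))   # 0.5: only 2 of 4 words match

ref_ph = list("pleasplacethebag")    # crude stand-in for phoneme strings
hyp_ph = list("pleasbracetheback")
print(recognition_rate(ref_ph, hyp_ph))         # > 0.8: most units still match
```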

Robust Face Tracking using Multiple Appearance Models and Graph Relational Learning

Title Robust Face Tracking using Multiple Appearance Models and Graph Relational Learning
Authors Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger
Abstract This paper addresses the problem of appearance matching across different challenges while performing visual face tracking in real-world scenarios. We propose FaceTrack, a tracker that utilizes multiple appearance models with long-term and short-term appearance memory for efficient face tracking. It demonstrates robustness to deformation, in-plane and out-of-plane rotation, scale, distractors and background clutter. It capitalizes on the advantages of tracking-by-detection by using a face detector that tackles drastic scale changes in a face's appearance. The detector also helps to reinitialize FaceTrack during drift. A weighted score-level fusion strategy is proposed to obtain the face tracking output with the highest fusion score by generating candidates around possible face locations. The tracker shows impressive performance when initiated automatically, outperforming many state-of-the-art trackers and trailing only Struck, by a very small margin: 0.001 in precision and 0.017 in success, respectively.
Tasks Relational Reasoning
Published 2017-06-29
URL http://arxiv.org/abs/1706.09806v2
PDF http://arxiv.org/pdf/1706.09806v2.pdf
PWC https://paperswithcode.com/paper/robust-face-tracking-using-multiple
Repo
Framework
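
Weighted score-level fusion reduces to scoring every candidate location under each appearance model and keeping the candidate with the highest weighted sum. A minimal sketch; the scorer set and weights below are placeholders, not the paper's models or learned values:

```python
import numpy as np

def fuse_and_select(candidates, scorers, weights):
    """Score each candidate box under every appearance model and return the
    candidate with the highest weighted fusion score."""
    weights = np.asarray(weights, dtype=float)
    fused = [weights @ np.array([s(c) for s in scorers]) for c in candidates]
    return candidates[int(np.argmax(fused))], max(fused)

# Toy usage: candidates are (x, y, w, h) boxes; scorers are arbitrary callables
candidates = [(10, 10, 50, 50), (12, 11, 50, 50), (80, 90, 50, 50)]
scorers = [lambda c: 1.0 / (1 + abs(c[0] - 11)),   # e.g. short-term memory model
           lambda c: 1.0 / (1 + abs(c[1] - 11))]   # e.g. long-term memory model
best_box, best_score = fuse_and_select(candidates, scorers, weights=[0.6, 0.4])
```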

3D Deep Learning for Biological Function Prediction from Physical Fields

Title 3D Deep Learning for Biological Function Prediction from Physical Fields
Authors Vladimir Golkov, Marcin J. Skwark, Atanas Mirchev, Georgi Dikov, Alexander R. Geanes, Jeffrey Mendenhall, Jens Meiler, Daniel Cremers
Abstract Predicting the biological function of molecules, be it proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is by spatial interactions that molecules interact with each other, both in terms of steric complementarity, as well as intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the “raw fingerprint” of how this molecule can fit to binding partners. In this paper, we show that deep learning can predict biological function of molecules directly from their raw 3D approximated electron density and electrostatic potential fields. Protein function based on EC numbers is predicted from the approximated electron density field. In another experiment, the activity of small molecules is predicted with quality comparable to state-of-the-art descriptor-based methods. We propose several alternative computational models for the GPU with different memory and runtime requirements for different sizes of molecules and of databases. We also propose application-specific multi-channel data representations. With future improvements of training datasets and neural network settings in combination with complementary information sources (sequence, genomic context, expression level), deep learning can be expected to show its generalization power and revolutionize the field of molecular function prediction.
Tasks
Published 2017-04-13
URL http://arxiv.org/abs/1704.04039v1
PDF http://arxiv.org/pdf/1704.04039v1.pdf
PWC https://paperswithcode.com/paper/3d-deep-learning-for-biological-function
Repo
Framework
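
The input representation is the key idea: molecules become multi-channel 3D grids (one channel per physical field), which a 3D CNN can consume directly. A toy model under stated assumptions: two channels stand for the approximated electron density and electrostatic potential, and the grid size and layers are illustrative, not the paper's GPU models:

```python
import torch
import torch.nn as nn

class FieldNet3D(nn.Module):
    """Toy 3D CNN over voxelized molecular fields."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):            # x: (batch, 2, D, H, W) voxel grids
        return self.net(x)

model = FieldNet3D(num_classes=6)    # e.g. top-level EC classes
logits = model(torch.randn(4, 2, 32, 32, 32))
```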

Sketch-pix2seq: a Model to Generate Sketches of Multiple Categories

Title Sketch-pix2seq: a Model to Generate Sketches of Multiple Categories
Authors Yajing Chen, Shikui Tu, Yuqi Yi, Lei Xu
Abstract Sketching is an important medium for humans to communicate ideas, reflecting the superiority of human intelligence. Studies on sketches can be roughly divided into recognition and generation. Existing image recognition models have failed to obtain satisfactory performance on sketch classification. For sketch generation, however, a recent study proposed a sequence-to-sequence variational autoencoder (VAE) model called sketch-rnn, which was able to generate sketches based on human inputs. The model achieved impressive results when asked to learn one category of object, such as an animal or a vehicle, but its performance dropped when multiple categories were fed into it. Here, we propose a model called sketch-pix2seq which can learn and draw multiple categories of sketches. Two modifications were made to improve the sketch-rnn model: one is to replace the bidirectional recurrent neural network (BRNN) encoder with a convolutional neural network (CNN); the other is to remove the Kullback-Leibler divergence from the objective function of the VAE. Experimental results showed that models with CNN encoders outperformed those with RNN encoders in generating human-style sketches. Visualization of the latent space illustrated that the removal of the KL-divergence made the encoder learn a posterior over the latent space that reflected the features of different categories. Moreover, the combination of the CNN encoder and the removal of the KL-divergence, i.e., the sketch-pix2seq model, performed better at learning and generating sketches of multiple categories and showed promising results in creativity tasks.
Tasks
Published 2017-09-13
URL http://arxiv.org/abs/1709.04121v1
PDF http://arxiv.org/pdf/1709.04121v1.pdf
PWC https://paperswithcode.com/paper/sketch-pix2seq-a-model-to-generate-sketches
Repo
Framework
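
The two modifications are easy to isolate. A sketch of the first (a CNN encoder over a rasterized sketch in place of the BRNN encoder, with illustrative sizes), plus a comment capturing the second (dropping the KL term from the VAE objective):

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """CNN encoder replacing sketch-rnn's BRNN encoder: a rasterized sketch
    image goes in, a latent vector comes out. Sizes are illustrative."""
    def __init__(self, z_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(16 * 12 * 12, z_dim)   # for 48x48 inputs

    def forward(self, img):
        return self.fc(self.conv(img))

# Second modification, in the objective: sketch-rnn minimizes
#   reconstruction + KL(q(z|x) || p(z));
# sketch-pix2seq keeps only the reconstruction term, so there is no
# KL regularizer pulling the latent codes toward the prior.
enc = CNNEncoder()
z = enc(torch.randn(2, 1, 48, 48))   # latent codes for two sketch rasters
```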

Be Your Own Prada: Fashion Synthesis with Structural Coherence

Title Be Your Own Prada: Fashion Synthesis with Structural Coherence
Authors Shizhan Zhu, Sanja Fidler, Raquel Urtasun, Dahua Lin, Chen Change Loy
Abstract We present a novel and effective approach for generating new clothing on a wearer through generative adversarial learning. Given an input image of a person and a sentence describing a different outfit, our model “redresses” the person as desired, while at the same time keeping the wearer and her/his pose unchanged. Generating new outfits with precise regions conforming to a language description while retaining wearer’s body structure is a new challenging task. Existing generative adversarial networks are not ideal in ensuring global coherence of structure given both the input photograph and language description as conditions. We address this challenge by decomposing the complex generative process into two conditional stages. In the first stage, we generate a plausible semantic segmentation map that obeys the wearer’s pose as a latent spatial arrangement. An effective spatial constraint is formulated to guide the generation of this semantic segmentation map. In the second stage, a generative model with a newly proposed compositional mapping layer is used to render the final image with precise regions and textures conditioned on this map. We extended the DeepFashion dataset [8] by collecting sentence descriptions for 79K images. We demonstrate the effectiveness of our approach through both quantitative and qualitative evaluations. A user study is also conducted. The codes and the data are available at http://mmlab.ie.cuhk.edu.hk/projects/FashionGAN/.
Tasks Semantic Segmentation
Published 2017-10-19
URL http://arxiv.org/abs/1710.07346v1
PDF http://arxiv.org/pdf/1710.07346v1.pdf
PWC https://paperswithcode.com/paper/be-your-own-prada-fashion-synthesis-with
Repo
Framework
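
The decomposition is the core design choice: stage 1 maps (pose, text) to a semantic segmentation map; stage 2 renders textures conditioned on that map. An outline with stand-in modules (the stubs below ignore the text embedding and are not FashionGAN's generators):

```python
import torch
import torch.nn as nn

class Stage1(nn.Module):
    """Stand-in for the segmentation-map generator; a real model would also
    condition on the text embedding and the wearer's pose."""
    def __init__(self, parts: int = 7):          # 7 hypothetical region labels
        super().__init__()
        self.conv = nn.Conv2d(3, parts, 3, padding=1)
    def forward(self, photo, text_emb):
        return torch.softmax(self.conv(photo), dim=1)

class Stage2(nn.Module):
    """Stand-in for the renderer with the compositional mapping layer."""
    def __init__(self, parts: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(parts, 3, 3, padding=1)
    def forward(self, seg_map, text_emb):
        return torch.tanh(self.conv(seg_map))

def two_stage_synthesis(photo, text_emb, stage1, stage2):
    seg_map = stage1(photo, text_emb)    # stage 1: layout obeying the pose
    image = stage2(seg_map, text_emb)    # stage 2: textures per region
    return seg_map, image

seg, img = two_stage_synthesis(torch.randn(1, 3, 128, 128),
                               torch.randn(1, 100), Stage1(), Stage2())
```

Splitting layout from rendering is what lets each stage be supervised with its own conditional constraint.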

A two-dimensional decomposition approach for matrix completion through gossip

Title A two-dimensional decomposition approach for matrix completion through gossip
Authors Mukul Bhutani, Bamdev Mishra
Abstract Factoring a matrix into two low-rank matrices is at the heart of many problems. Matrix completion in particular uses it to decompose a sparse matrix into two non-sparse, low-rank matrices, which can then be used to predict unknown entries of the original matrix. We present a scalable and decentralized approach in which, instead of learning two factors for the original input matrix, we decompose the original matrix into a grid of blocks, each of whose factors can be learned individually just by communicating (gossiping) with neighboring blocks. This eliminates any need for a central server. We show that our algorithm performs well on both synthetic and real datasets.
Tasks Matrix Completion
Published 2017-11-21
URL http://arxiv.org/abs/1711.07684v2
PDF http://arxiv.org/pdf/1711.07684v2.pdf
PWC https://paperswithcode.com/paper/a-two-dimensional-decomposition-approach-for
Repo
Framework
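
In a grid layout, block (i, j) is factored as U_i V_j^T, so blocks along a row must agree on U_i and blocks along a column on V_j; "gossip" is how that agreement is reached. A centralized simplification of the idea (plain averaging in place of real message passing, local gradient steps in place of the paper's optimization):

```python
import numpy as np

def gossip_complete(M, mask, grid=2, rank=5, iters=50, lr=0.01):
    """Sketch: split M into grid x grid blocks, factor block (i, j) as
    U[i][j] @ V[i][j].T locally, then average ('gossip') the factors shared
    along each block row/column."""
    n, m = M.shape
    rs = np.array_split(np.arange(n), grid)
    cs = np.array_split(np.arange(m), grid)
    U = [[np.random.randn(len(rs[i]), rank) * 0.1 for _ in range(grid)] for i in range(grid)]
    V = [[np.random.randn(len(cs[j]), rank) * 0.1 for j in range(grid)] for _ in range(grid)]
    for _ in range(iters):
        for i in range(grid):
            for j in range(grid):
                Mb = M[np.ix_(rs[i], cs[j])]
                Wb = mask[np.ix_(rs[i], cs[j])]
                E = Wb * (U[i][j] @ V[i][j].T - Mb)       # residual on observed entries
                U[i][j] -= lr * E @ V[i][j]               # local gradient steps
                V[i][j] -= lr * E.T @ U[i][j]
        for i in range(grid):                             # gossip: row blocks agree on U_i
            Ubar = sum(U[i]) / grid
            U[i] = [Ubar.copy() for _ in range(grid)]
        for j in range(grid):                             # column blocks agree on V_j
            Vbar = sum(V[i][j] for i in range(grid)) / grid
            for i in range(grid):
                V[i][j] = Vbar.copy()
    return U, V

M = np.random.randn(40, 6) @ np.random.randn(6, 40)      # rank-6 ground truth
mask = (np.random.rand(40, 40) < 0.5).astype(float)      # observe half the entries
U, V = gossip_complete(M, mask, grid=2, rank=6)
```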

Classification and Optimization Algorithms: The LIA/ADOC Participation in DEFT’14

Title Classification and Optimization Algorithms: The LIA/ADOC Participation in DEFT’14 (original title: Algorithmes de classification et d’optimisation : participation du LIA/ADOC à DEFT’14)
Authors Luis Adrián Cabrera-Diego, Stéphane Huet, Bassam Jabaian, Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-Bèze, Barthélémy Durette
Abstract This year, the DEFT campaign (Défi Fouille de Textes) incorporates a task which aims at identifying the session in which articles from previous TALN conferences were presented. We describe the three statistical systems developed at LIA/ADOC for this task. A fusion of these systems enables us to obtain interesting results (a micro-precision score of 0.76 measured on the test corpus).
Tasks
Published 2017-02-21
URL http://arxiv.org/abs/1702.06510v1
PDF http://arxiv.org/pdf/1702.06510v1.pdf
PWC https://paperswithcode.com/paper/algorithmes-de-classification-et
Repo
Framework
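
The abstract reports a fused result scored by micro-precision. A small sketch of both pieces; the majority-vote rule is a hypothetical stand-in (the abstract does not specify the fusion scheme), and for single-label classification micro-precision reduces to the fraction of correctly assigned articles:

```python
from collections import Counter

def fuse_majority(predictions):
    """Fuse several systems' per-article labels by majority vote
    (an assumed fusion rule, not necessarily the one used at DEFT'14)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def micro_precision(pred, gold):
    """Micro-averaged precision; for single-label outputs this equals the
    proportion of articles assigned to the correct session."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

sys_a = ["s1", "s2", "s1", "s3"]      # toy session labels per article
sys_b = ["s1", "s2", "s2", "s3"]
sys_c = ["s1", "s1", "s1", "s3"]
gold  = ["s1", "s2", "s1", "s3"]
fused = fuse_majority([sys_a, sys_b, sys_c])
print(micro_precision(fused, gold))   # 1.0 on this toy example
```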