Paper Group ANR 1049
Prediction of Dashed Café Wall illusion by the Classical Receptive Field Model
Title | Prediction of Dashed Café Wall illusion by the Classical Receptive Field Model |
Authors | Nasim Nematzadeh, David M. W. Powers |
Abstract | The Café Wall illusion is one of a class of tilt illusions where lines that are parallel appear to be tilted. We demonstrate that a simple Differences of Gaussians model provides an explanatory mechanism for the illusory tilt perceived in a family of Café Wall illusions, and that this explanation generalizes to the dashed versions of the Café Wall. Our explanation models the visual mechanisms in low-level stages that can reveal tilt cues in geometrical distortion illusions such as Tile illusions, particularly Café Wall illusions. For this, we simulate the activation of retinal/cortical simple cells in response to these patterns based on a Classical Receptive Field (CRF) model to explain the tilt effects in these illusions. Previously, it was assumed that all these visual experiences of tilt arise from the orientation selectivity properties described for more complex cortical cells. An estimation of the overall tilt angle perceived in these illusions is based on the integration of the local tilts detected by simple cells, which is presumed to be a key mechanism utilized by the complex cells to create our final perception of tilt. |
Tasks | |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03739v1 |
http://arxiv.org/pdf/1902.03739v1.pdf | |
PWC | https://paperswithcode.com/paper/prediction-of-dashed-cafe-wall-illusion-by |
Repo | |
Framework | |
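As a rough illustration of the CRF mechanism the abstract describes, here is a minimal Difference-of-Gaussians sketch in Python. The toy pattern, kernel widths and center/surround ratio are illustrative assumptions, not the authors' parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image, sigma_center, surround_ratio=2.0):
    """Classical-receptive-field response: center Gaussian minus surround Gaussian."""
    center = gaussian_filter(image, sigma_center)
    surround = gaussian_filter(image, sigma_center * surround_ratio)
    return center - surround

# Toy Café Wall-like pattern: offset rows of dark/light tiles with thin "mortar" lines.
tile, rows, cols = 16, 6, 8
img = np.ones((rows * tile, cols * tile))
for r in range(rows):
    shift = (r % 2) * tile // 2              # half-tile offset between rows
    for c in range(0, cols, 2):
        x0 = (c * tile + shift) % (cols * tile)
        img[r * tile:(r + 1) * tile, x0:x0 + tile] = 0.0
    img[r * tile, :] = 0.5                   # grey mortar line

edges = dog_response(img, sigma_center=2.0)  # local tilt cues appear in this map
print(edges.shape, edges.min(), edges.max())
```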
EEG-Based Emotion Recognition Using Regularized Graph Neural Networks
Title | EEG-Based Emotion Recognition Using Regularized Graph Neural Networks |
Authors | Peixiang Zhong, Di Wang, Chunyan Miao |
Abstract | EEG signals measure the neuronal activities on different brain regions via electrodes. Many existing studies on EEG-based emotion recognition do not exploit the topological structure of EEG signals. In this paper, we propose a regularized graph neural network (RGNN) for EEG-based emotion recognition, which is biologically supported and captures both local and global inter-channel relations. Specifically, we model the inter-channel relations in EEG signals via an adjacency matrix in our graph neural network where the connection and sparseness of the adjacency matrix are supported by the neuroscience theories of human brain organization. In addition, we propose two regularizers, namely node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL), to improve the robustness of our model against cross-subject EEG variations and noisy labels, respectively. To thoroughly evaluate our model, we conduct extensive experiments in both subject-dependent and subject-independent classification settings on two public datasets: SEED and SEED-IV. Our model obtains better performance than competitive baselines such as SVM, DBN, DGCNN, BiDANN, and the state-of-the-art BiHDM in most experimental settings. Our model analysis demonstrates that the proposed biologically supported adjacency matrix and two regularizers contribute consistent and significant gains in performance. Investigations of the neuronal activities reveal that pre-frontal, parietal and occipital regions may be the most informative regions for emotion recognition, which is consistent with relevant prior studies. In addition, experimental results suggest that global inter-channel relations between the left and right hemispheres are important for emotion recognition and local inter-channel relations between (FP1, AF3), (F6, F8) and (FP2, AF4) may also provide useful information. |
Tasks | EEG, Emotion Recognition |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.07835v2 |
https://arxiv.org/pdf/1907.07835v2.pdf | |
PWC | https://paperswithcode.com/paper/eeg-based-emotion-recognition-using |
Repo | |
Framework | |
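To make the adjacency idea concrete, here is a minimal sketch of one graph-convolution step over EEG channels with a sparse, distance-based adjacency. The electrode coordinates, decay profile and single linear layer are illustrative assumptions, not the RGNN architecture.

```python
import torch

n_channels, n_features, n_hidden = 62, 5, 32    # e.g. 62 SEED electrodes, 5 band features

# Sparse local adjacency: connect electrodes whose (toy) 3-D positions are close,
# mirroring the neuroscience-motivated connection/sparseness prior in the abstract.
pos = torch.randn(n_channels, 3)                # stand-in for real electrode coordinates
dist = torch.cdist(pos, pos)
adj = torch.exp(-dist) * (dist < 1.0)           # short-range connections only
adj.fill_diagonal_(1.0)                         # self-loops

# Symmetric normalisation: D^{-1/2} A D^{-1/2}
deg = adj.sum(dim=1)
norm_adj = adj / torch.sqrt(deg[:, None] * deg[None, :])

W = torch.nn.Linear(n_features, n_hidden)
x = torch.randn(n_channels, n_features)         # one EEG sample: channels x features
h = torch.relu(norm_adj @ W(x))                 # one graph-convolution step
print(h.shape)                                  # torch.Size([62, 32])
```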
Active inference body perception and action for humanoid robots
Title | Active inference body perception and action for humanoid robots |
Authors | Guillermo Oliver, Pablo Lanillos, Gordon Cheng |
Abstract | Providing artificial agents with the same computational models of biological systems is a way to understand how intelligent behaviours may emerge. We present an active inference body perception and action model working for the first time in a humanoid robot. The model relies on the free energy principle proposed for the brain, where the goal of both perception and action is to minimise the prediction error through gradient descent on the variational free energy bound. The body state (latent variable) is inferred by minimising the difference between the observed (visual and proprioceptive) sensor values and the predicted ones. Simultaneously, action adjusts sensory sampling to better correspond to the predictions made by the inner model. We formalised and implemented the algorithm on the iCub robot and tested it in 2D and 3D visual spaces for online adaptation to visual changes, sensory noise and discrepancies between the model and the real robot. We also compared our approach with classical inverse kinematics in a reaching task, analysing the suitability of such a neuroscience-inspired approach for real-world interaction. The algorithm gave the robot adaptive body perception and upper body reaching with head object tracking (toddler-like), and was able to incorporate visual features online (in a closed-loop manner) without increasing the computational complexity. Moreover, our model predicted involuntary actions in the presence of sensorimotor conflicts, showing a path toward a potential proof of active inference in humans. |
Tasks | Object Tracking |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03022v3 |
https://arxiv.org/pdf/1906.03022v3.pdf | |
PWC | https://paperswithcode.com/paper/active-inference-body-perception-and-action |
Repo | |
Framework | |
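A toy one-degree-of-freedom sketch of the perception/action updates the abstract describes, with gradient descent on precision-weighted prediction errors. The generative model, precisions and goal prior below are illustrative assumptions, not the iCub implementation.

```python
import numpy as np

# One-DoF toy: latent joint angle mu; proprioception y_p = mu, vision y_v = sin(mu).
g, dg = np.sin, np.cos
pi_p, pi_v, lr = 1.0, 1.0, 0.1      # sensory precisions and step size
goal = 1.2                          # desired (prior) joint angle

true_angle, mu = 0.0, 0.3
for _ in range(300):
    y_p, y_v = true_angle, g(true_angle)        # noiseless sensors, for clarity
    # Perception: gradient descent on the free-energy bound w.r.t. the latent state,
    # i.e. follow the precision-weighted prediction errors.
    mu += lr * (pi_p * (y_p - mu) + pi_v * (y_v - g(mu)) * dg(mu))
    # Action: change the world so sensations match the goal-biased prediction.
    action = pi_p * (goal - y_p)
    true_angle += lr * action

print(f"mu = {mu:.3f}, joint angle = {true_angle:.3f}  (goal 1.2)")
```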
Invariant Tensor Feature Coding
Title | Invariant Tensor Feature Coding |
Authors | Yusuke Mukuta, Tatsuya Harada |
Abstract | We propose a novel feature coding method that exploits invariance. We consider the setting where the transformations that preserve the image contents compose a finite group of orthogonal matrices. This is the case in many image transformations, such as image rotations and image flipping. We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier using convex loss minimization. From this result, we propose novel feature models for principal component analysis and k-means clustering, which underlie most feature coding methods, as well as global feature functions that explicitly consider the group action. Although the global feature functions are complex nonlinear functions in general, we can calculate the group action on this space easily by constructing the functions as the tensor product representations of basic representations, resulting in the explicit form of invariant feature functions. We demonstrate the effectiveness of our methods on several image datasets. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01857v2 |
https://arxiv.org/pdf/1906.01857v2.pdf | |
PWC | https://paperswithcode.com/paper/invariant-tensor-feature-coding |
Repo | |
Framework | |
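The simplest way to see the invariance idea is group averaging: a descriptor averaged over a finite orthogonal group (here the dihedral group D4 of rotations and flips) is exactly invariant. This sketch does not reproduce the paper's tensor-product construction; it only illustrates the invariance property the method builds on.

```python
import numpy as np

def d4_orbit(patch):
    """All 8 images of a square patch under the dihedral group D4 (rotations + flips)."""
    out, p = [], patch
    for _ in range(4):
        out.extend([p, np.fliplr(p)])
        p = np.rot90(p)
    return out

def invariant_feature(patch, phi):
    """Group-average a (non-invariant) descriptor phi over the orbit:
    the result is exactly invariant to every transform in the group."""
    return np.mean([phi(q) for q in d4_orbit(patch)], axis=0)

phi = lambda p: p.ravel()                     # raw pixels: clearly not invariant alone
patch = np.random.rand(16, 16)
f1 = invariant_feature(patch, phi)
f2 = invariant_feature(np.rot90(np.fliplr(patch)), phi)
print(np.allclose(f1, f2))                    # True
```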
Adversarial Video Generation on Complex Datasets
Title | Adversarial Video Generation on Complex Datasets |
Authors | Aidan Clark, Jeff Donahue, Karen Simonyan |
Abstract | Generative models of natural images have progressed towards high fidelity samples by the strong leveraging of scale. We attempt to carry this success to the field of video modeling by showing that large Generative Adversarial Networks trained on the complex Kinetics-600 dataset are able to produce video samples of substantially higher complexity and fidelity than previous work. Our proposed model, Dual Video Discriminator GAN (DVD-GAN), scales to longer and higher resolution videos by leveraging a computationally efficient decomposition of its discriminator. We evaluate on the related tasks of video synthesis and video prediction, and achieve new state-of-the-art Fréchet Inception Distance for prediction for Kinetics-600, as well as state-of-the-art Inception Score for synthesis on the UCF-101 dataset, alongside establishing a strong baseline for synthesis on Kinetics-600. |
Tasks | Video Generation, Video Prediction |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06571v2 |
https://arxiv.org/pdf/1907.06571v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-video-generation-on-complex |
Repo | |
Framework | |
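The key efficiency trick is the discriminator decomposition: a spatial discriminator sees a few full-resolution frames while a temporal one sees the whole clip at reduced resolution. A minimal sketch of how such inputs could be prepared; the sampling counts, downscaling factor and discriminator architectures are assumptions here.

```python
import torch
import torch.nn.functional as F

def dual_discriminator_inputs(video, k=8, down=2):
    """Split one video batch into the two cheap views of a DVD-GAN-style
    dual-discriminator decomposition.  video: (B, T, C, H, W)."""
    b, t, c, h, w = video.shape
    # Spatial view: k randomly sampled full-resolution frames per clip.
    idx = torch.randint(t, (k,))
    spatial = video[:, idx].reshape(b * k, c, h, w)
    # Temporal view: the whole clip, spatially downsampled by `down`.
    temporal = F.avg_pool2d(video.reshape(b * t, c, h, w), down)
    temporal = temporal.reshape(b, t, c, h // down, w // down)
    return spatial, temporal

clip = torch.randn(4, 48, 3, 64, 64)       # 48 frames of 64x64 RGB
s, tmp = dual_discriminator_inputs(clip)
print(s.shape, tmp.shape)                  # (32, 3, 64, 64) (4, 48, 3, 32, 32)
```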
Distilling Policy Distillation
Title | Distilling Policy Distillation |
Authors | Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg |
Abstract | The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stronger performance faster, on harder domains [26, 32, 5, 8]. Despite the widespread use and conceptual simplicity of distillation, many different formulations are used in practice, and the subtle variations between them can often drastically change the performance and the resulting objective that is being optimised. In this work, we rigorously explore the entire landscape of policy distillation, comparing the motivations and strengths of each variant through theoretical and empirical analysis. Our results point to three distillation techniques that are preferred depending on the specifics of the task. In particular, a newly proposed expected entropy regularised distillation allows for quicker learning in a wide range of situations, while still guaranteeing convergence. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02186v1 |
http://arxiv.org/pdf/1902.02186v1.pdf | |
PWC | https://paperswithcode.com/paper/distilling-policy-distillation |
Repo | |
Framework | |
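One representative point in that landscape, sketched below: a KL-based distillation loss with an entropy regularizer. Which KL direction, which policy controls the trajectory, and how the terms mix are exactly the design axes the paper maps, so this particular combination is only illustrative.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_logits, student_logits, entropy_coef=0.01):
    """Per-state KL(teacher || student) plus an entropy bonus on the student --
    one illustrative policy-distillation objective, not *the* paper's variant."""
    t = F.softmax(teacher_logits, dim=-1)
    log_s = F.log_softmax(student_logits, dim=-1)
    kl = (t * (t.clamp_min(1e-8).log() - log_s)).sum(-1)
    entropy = -(log_s.exp() * log_s).sum(-1)
    return (kl - entropy_coef * entropy).mean()

teacher = torch.randn(32, 6)                   # batch of 32 states, 6 discrete actions
student = torch.randn(32, 6, requires_grad=True)
loss = distill_loss(teacher, student)
loss.backward()                                # gradients flow into the student only
print(float(loss))
```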
Permutation Recovery from Multiple Measurement Vectors in Unlabeled Sensing
Title | Permutation Recovery from Multiple Measurement Vectors in Unlabeled Sensing |
Authors | Hang Zhang, Martin Slawski, Ping Li |
Abstract | In “Unlabeled Sensing”, one observes a set of linear measurements of an underlying signal with incomplete or missing information about their ordering, which can be modeled in terms of an unknown permutation. Previous work on the case of a single noisy measurement vector has exposed two main challenges: 1) a high requirement concerning the \emph{signal-to-noise ratio} (snr), i.e., approximately of the order of $n^{5}$, and 2) a massive computational burden in light of NP-hardness in general. In this paper, we study the case of \emph{multiple} noisy measurement vectors (MMVs) resulting from a \emph{common} permutation and investigate to what extent the number of MMVs $m$ facilitates permutation recovery by “borrowing strength”. The above two challenges have at least partially been resolved within our work. First, we show that a large stable rank of the signal significantly reduces the required snr which can drop from a polynomial in $n$ for $m = 1$ to a constant for $m = \Omega(\log n)$, where $m$ denotes the number of MMVs and $n$ denotes the number of measurements per MV. This bound is shown to be sharp and is associated with a phase transition phenomenon. Second, we propose computational schemes for recovering the unknown permutation in practice. For the “oracle case” with the known signal, the maximum likelihood (ML) estimator reduces to a linear assignment problem whose global optimum can be obtained efficiently. For the case in which both the signal and permutation are unknown, the problem is reformulated as a bi-convex optimization problem with an auxiliary variable, which can be solved by the Alternating Direction Method of Multipliers (ADMM). Numerical experiments based on the proposed computational schemes confirm the tightness of our theoretical analysis. |
Tasks | |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02496v1 |
https://arxiv.org/pdf/1909.02496v1.pdf | |
PWC | https://paperswithcode.com/paper/permutation-recovery-from-multiple |
Repo | |
Framework | |
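The "oracle case" is easy to make concrete: with the signal known, ML permutation recovery reduces to a linear assignment problem, solvable exactly with the Hungarian algorithm. A minimal sketch under Gaussian noise; the dimensions and noise level are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d, m = 50, 10, 5                      # measurements, dimension, number of MMVs
X = rng.standard_normal((n, d))
B = rng.standard_normal((d, m))          # "oracle": the signal B is known
perm = rng.permutation(n)                # unknown row permutation
Y = (X @ B)[perm] + 0.01 * rng.standard_normal((n, m))

# Under Gaussian noise, the ML estimate of the permutation maximises row-wise
# inner products between Y and X @ B -- a linear assignment problem.
cost = -Y @ (X @ B).T                    # maximise correlation = minimise negative
row, col = linear_sum_assignment(cost)
print("recovered correctly:", np.array_equal(col, perm))
```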
LinesToFacePhoto: Face Photo Generation from Lines with Conditional Self-Attention Generative Adversarial Network
Title | LinesToFacePhoto: Face Photo Generation from Lines with Conditional Self-Attention Generative Adversarial Network |
Authors | Yuhang Li, Xuejin Chen, Feng Wu, Zheng-Jun Zha |
Abstract | In this paper, we explore the task of generating photo-realistic face images from lines. Previous methods based on conditional generative adversarial networks (cGANs) have shown their power to generate visually plausible images when a conditional image and an output image share well-aligned structures. However, these models fail to synthesize face images with a whole set of well-defined structures, e.g. eyes, noses, mouths, etc., especially when the conditional line map lacks one or several parts. To address this problem, we propose a conditional self-attention generative adversarial network (CSAGAN). We introduce a conditional self-attention mechanism to cGANs to capture long-range dependencies between different regions in faces. We also build a multi-scale discriminator. The large-scale discriminator enforces the completeness of global structures and the small-scale discriminator encourages fine details, thereby enhancing the realism of generated face images. We evaluate the proposed model on the CelebA-HD dataset via two perceptual user studies and three quantitative metrics. The experimental results demonstrate that our method generates high-quality facial images while preserving facial structures. Our results outperform state-of-the-art methods both quantitatively and qualitatively. |
Tasks | |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08914v1 |
https://arxiv.org/pdf/1910.08914v1.pdf | |
PWC | https://paperswithcode.com/paper/linestofacephoto-face-photo-generation-from |
Repo | |
Framework | |
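For concreteness, here is a minimal single-head self-attention block in which condition features are concatenated into the queries and keys. How the condition enters is an assumption on my part, not necessarily CSAGAN's exact formulation; the residual gate follows the common SAGAN convention.

```python
import torch
import torch.nn as nn

class CondSelfAttention(nn.Module):
    """Single-head self-attention over spatial positions, with condition features
    (e.g. from the line map) concatenated into queries/keys -- a hedged sketch."""
    def __init__(self, c_feat, c_cond, c_attn=32):
        super().__init__()
        self.q = nn.Conv2d(c_feat + c_cond, c_attn, 1)
        self.k = nn.Conv2d(c_feat + c_cond, c_attn, 1)
        self.v = nn.Conv2d(c_feat, c_feat, 1)
        self.gamma = nn.Parameter(torch.zeros(1))    # zero-init residual gate

    def forward(self, x, cond):
        b, c, h, w = x.shape
        xc = torch.cat([x, cond], dim=1)
        q = self.q(xc).flatten(2).transpose(1, 2)    # (B, HW, C')
        k = self.k(xc).flatten(2)                    # (B, C', HW)
        v = self.v(x).flatten(2)                     # (B, C, HW)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return x + self.gamma * out                  # long-range context, residually

layer = CondSelfAttention(c_feat=64, c_cond=1)
y = layer(torch.randn(2, 64, 16, 16), torch.randn(2, 1, 16, 16))
print(y.shape)                                       # torch.Size([2, 64, 16, 16])
```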
Sub-policy Adaptation for Hierarchical Reinforcement Learning
Title | Sub-policy Adaptation for Hierarchical Reinforcement Learning |
Authors | Alexander C. Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel |
Abstract | Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Leaving the skills fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and results are available at sites.google.com/view/hippo-rl |
Tasks | Decision Making, Hierarchical Reinforcement Learning |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05862v3 |
https://arxiv.org/pdf/1906.05862v3.pdf | |
PWC | https://paperswithcode.com/paper/sub-policy-adaptation-for-hierarchical |
Repo | |
Framework | |
Improving Unsupervised Domain Adaptation with Variational Information Bottleneck
Title | Improving Unsupervised Domain Adaptation with Variational Information Bottleneck |
Authors | Yuxuan Song, Lantao Yu, Zhangjie Cao, Zhiming Zhou, Jian Shen, Shuo Shao, Weinan Zhang, Yong Yu |
Abstract | Domain adaptation aims to leverage the supervision signal of the source domain to obtain an accurate model for the target domain, where labels are not available. To leverage and adapt the label information from the source domain, most existing methods employ a feature-extracting function and match the marginal distributions of the source and target domains in a shared feature space. In this paper, from the perspective of information theory, we show that representation matching is actually an insufficient constraint on the feature space for obtaining a model with good generalization performance in the target domain. We then propose variational bottleneck domain adaptation (VBDA), a new domain adaptation method which improves feature transferability by explicitly enforcing the feature extractor to ignore the task-irrelevant factors and focus on the information that is essential to the task of interest for both source and target domains. Extensive experimental results demonstrate that VBDA significantly outperforms state-of-the-art methods across three domain adaptation benchmark datasets. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09310v1 |
https://arxiv.org/pdf/1911.09310v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-unsupervised-domain-adaptation-with |
Repo | |
Framework | |
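A minimal sketch of the bottleneck term: a reparameterised code z with a KL rate penalty added to the source task loss. VBDA additionally matches source and target representations, which this sketch omits; all dimensions and the weight beta are illustrative.

```python
import torch
import torch.nn.functional as F

enc = torch.nn.Linear(32, 2 * 16)       # encoder -> (mu, logvar) of q(z|x)
head = torch.nn.Linear(16, 4)           # classifier on the bottleneck code z

def vib_loss(x, labels, beta=1e-3):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterisation
    task = F.cross_entropy(head(z), labels)                  # source-domain task loss
    # KL(q(z|x) || N(0, I)): the rate term that discards task-irrelevant factors.
    rate = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return task + beta * rate

x, y = torch.randn(8, 32), torch.randint(4, (8,))
print(float(vib_loss(x, y)))
```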
Direct Nonlinear Acceleration
Title | Direct Nonlinear Acceleration |
Authors | Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik |
Abstract | Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as regularized nonlinear acceleration (RNA) of Scieur et al., were proposed and shown to accelerate fixed point iterations. In contrast to RNA which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA). In DNA, we aim to minimize (an approximation of) the function value at the extrapolated point instead. We adopt a regularized approach with regularizers designed to prevent the model from entering a region in which the functional approximation is less precise. While the computational cost of DNA is comparable to that of RNA, our direct approach significantly outperforms RNA on both synthetic and real-world datasets. While the focus of this paper is on convex problems, we obtain very encouraging results in accelerating the training of neural networks. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11692v1 |
https://arxiv.org/pdf/1905.11692v1.pdf | |
PWC | https://paperswithcode.com/paper/direct-nonlinear-acceleration |
Repo | |
Framework | |
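For context, a minimal numpy sketch of RNA-style extrapolation, the baseline the abstract contrasts with: coefficients are chosen to minimise the norm of combined residuals under Tikhonov regularization, whereas DNA instead targets (an approximation of) the function value at the extrapolated point. The toy fixed-point iteration and window size are illustrative.

```python
import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    """Affine combination of iterates with weights c (summing to 1) that minimise
    ||R c||^2 + lam ||c||^2, where R collects successive residuals x_{i+1} - x_i."""
    X = np.stack(xs, axis=1)                       # (d, k+1) window of iterates
    R = np.diff(X, axis=1)                         # residuals, shape (d, k)
    k = R.shape[1]
    z = np.linalg.solve(R.T @ R + lam * np.eye(k), np.ones(k))
    c = z / z.sum()
    return X[:, :k] @ c

# Toy contractive fixed-point iteration x <- A x + b.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A *= 0.9 / np.linalg.norm(A, 2)                    # spectral norm 0.9 -> contraction
b = rng.standard_normal(5)
x_star = np.linalg.solve(np.eye(5) - A, b)

xs = [np.zeros(5)]
for _ in range(8):
    xs.append(A @ xs[-1] + b)

print("plain iterate error: ", np.linalg.norm(xs[-1] - x_star))
print("extrapolated error:  ", np.linalg.norm(rna_extrapolate(xs) - x_star))
```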
Rational Kernels: A survey
Title | Rational Kernels: A survey |
Authors | Abhishek Ghose |
Abstract | Many kinds of data are naturally amenable to being treated as sequences. An example is text data, where a text may be seen as a sequence of words. Another example is clickstream data, where a data instance is a sequence of clicks made by a visitor to a website. This is also common for data originating in the domains of speech processing and computational biology. Using such data with statistical learning techniques can often prove to be cumbersome since most of them only allow fixed-length feature vectors as input. In casting the data to fixed-length feature vectors to suit these techniques, we lose the convenience, and possibly information, a good sequence-based representation can offer. The framework of rational kernels partly addresses this problem by providing an elegant representation for sequences, for algorithms that use kernel functions. In this report, we take a comprehensive look at this framework, its various extensions and applications. We start with an overview of the core ideas, where we look at the characterization of rational kernels, and then extend our discussion to extensions, applications and use at scale. Rational kernels represent a family of kernels, and thus, learning an appropriate rational kernel instead of picking one suggests a convenient way to use them; we explore this idea in our concluding section. Rational kernels are not as popular as many other learning techniques in use today; however, we hope that this summary effectively shows that not only is their theory well-developed, but also that various practical aspects have been carefully studied over time. |
Tasks | |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.13800v1 |
https://arxiv.org/pdf/1910.13800v1.pdf | |
PWC | https://paperswithcode.com/paper/rational-kernels-a-survey |
Repo | |
Framework | |
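Rational kernels are computed via weighted finite-state transducer composition; one widely used member of the family is the n-gram kernel. A minimal pure-Python sketch of that special case, counting directly rather than through transducers:

```python
from collections import Counter

def ngram_kernel(s, t, n=2):
    """K(s, t) = sum over shared n-grams of count_s * count_t: positive definite,
    and a classic instance of a rational kernel (normally evaluated by composing
    weighted transducers rather than by direct counting, as done here)."""
    cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return sum(cs[g] * ct[g] for g in cs.keys() & ct.keys())

print(ngram_kernel("abcab", "cabab"))   # 5 = "ab" (2*2) + "ca" (1*1)
```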
Feature Fusion for Online Mutual Knowledge Distillation
Title | Feature Fusion for Online Mutual Knowledge Distillation |
Authors | Jangho Kim, Minsung Hyun, Inseop Chung, Nojun Kwak |
Abstract | We propose a learning framework named Feature Fusion Learning (FFL) that efficiently trains a powerful classifier through a fusion module which combines the feature maps generated from parallel neural networks. Specifically, we train a number of parallel neural networks as sub-networks, then we combine the feature maps from each sub-network using a fusion module to create a more meaningful feature map. The fused feature map is passed into the fused classifier for overall classification. Unlike existing feature fusion methods, in our framework, an ensemble of sub-network classifiers transfers its knowledge to the fused classifier and then the fused classifier delivers its knowledge back to each sub-network, mutually teaching one another in an online knowledge-distillation manner. This mutual teaching not only improves the performance of the fused classifier but also yields performance gains in each sub-network. Moreover, our model is more beneficial because different types of networks can be used for the sub-networks. We have performed a variety of experiments on multiple datasets such as CIFAR-10, CIFAR-100 and ImageNet and showed that our method is more effective than other alternative methods in terms of performance of both the sub-networks and the fused classifier. |
Tasks | |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09058v1 |
http://arxiv.org/pdf/1904.09058v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-fusion-for-online-mutual-knowledge |
Repo | |
Framework | |
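A minimal sketch of the training signal: cross-entropy on each classifier plus two-way softened KL terms between the sub-network ensemble and the fused classifier. Fusion by concatenation, the logit-averaged ensemble and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

sub1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
sub2 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head1, head2 = nn.Linear(64, 10), nn.Linear(64, 10)
fuse = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # fusion over concatenated maps
fused_head = nn.Linear(64, 10)

def ffl_step(x, y, T=3.0):
    f1, f2 = sub1(x), sub2(x)
    l1, l2 = head1(f1), head2(f2)
    lf = fused_head(fuse(torch.cat([f1, f2], dim=-1)))
    ce = sum(F.cross_entropy(l, y) for l in (l1, l2, lf))
    # Mutual online distillation: the sub-net ensemble teaches the fused
    # classifier, and the fused classifier teaches each sub-net back.
    ens = ((l1 + l2) / 2).detach()
    kd_to_fused = F.kl_div(F.log_softmax(lf / T, -1), F.softmax(ens / T, -1),
                           reduction="batchmean")
    kd_back = sum(F.kl_div(F.log_softmax(l / T, -1), F.softmax(lf.detach() / T, -1),
                           reduction="batchmean") for l in (l1, l2))
    return ce + T * T * (kd_to_fused + kd_back)

x, y = torch.randn(8, 32), torch.randint(10, (8,))
print(float(ffl_step(x, y)))
```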
Large Scale Landmark Recognition via Deep Metric Learning
Title | Large Scale Landmark Recognition via Deep Metric Learning |
Authors | Andrei Boiarov, Eduard Tyantov |
Abstract | This paper presents a novel approach for landmark recognition in images that we've successfully deployed at Mail.ru. This method enables us to recognize famous places, buildings, monuments, and other landmarks in user photos. The main challenge lies in the fact that it's very complicated to give a precise definition of what is and what is not a landmark. Some buildings, statues and natural objects are landmarks; others are not. There's also no database with a fairly large number of landmarks to train a recognition model. A key feature of using landmark recognition in a production environment is that the number of photos containing landmarks is extremely small. This is why the model should have a very low false positive rate as well as high recognition accuracy. We propose a metric learning-based approach that successfully deals with existing challenges and efficiently handles a large number of landmarks. Our method uses a deep neural network and requires a single pass inference that makes it fast to use in production. We also describe an algorithm for cleaning the landmark database, which is essential for training a metric learning model. We provide an in-depth description of basic components of our method like the neural network architecture, the learning strategy, and the features of our metric learning approach. We show the results of the proposed solutions in tests that emulate the distribution of photos with and without landmarks from a user collection. We compare our method with others during these tests. The described system has been deployed as a part of a photo recognition solution at Cloud Mail.ru, which is the photo sharing and storage service at Mail.ru Group. |
Tasks | Metric Learning |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10192v3 |
https://arxiv.org/pdf/1908.10192v3.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-landmark-recognition-via-deep |
Repo | |
Framework | |
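The deployment constraint (very low false positives on ordinary photos) maps naturally onto nearest-centroid lookup with a rejection threshold in the learned embedding space. A minimal sketch with made-up centroids and threshold; the paper's actual architecture and decision rule may differ.

```python
import numpy as np

def classify_landmark(embedding, centroids, names, threshold=0.7):
    """Nearest-centroid lookup in a metric-learned embedding space. The rejection
    threshold keeps the false-positive rate low on the overwhelming majority of
    photos that contain no landmark (values here are illustrative)."""
    e = embedding / np.linalg.norm(embedding)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = c @ e                        # cosine similarity to every landmark centroid
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else None   # None: "not a landmark"

rng = np.random.default_rng(0)
centroids = rng.standard_normal((3, 128))
names = ["Eiffel Tower", "Colosseum", "Taj Mahal"]
query = centroids[1] + 0.1 * rng.standard_normal(128)         # near the 2nd centroid
print(classify_landmark(query, centroids, names))             # Colosseum
random_photo = rng.standard_normal(128)
print(classify_landmark(random_photo, centroids, names))      # almost surely None
```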
Global-Local Temporal Representations For Video Person Re-Identification
Title | Global-Local Temporal Representations For Video Person Re-Identification |
Authors | Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang |
Abstract | This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated into the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body part cues or metric learning on four widely-used video ReID datasets. For instance, it achieves a Rank-1 Accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state-of-the-art. |
Tasks | Metric Learning, Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10049v1 |
https://arxiv.org/pdf/1908.10049v1.pdf | |
PWC | https://paperswithcode.com/paper/global-local-temporal-representations-for |
Repo | |
Framework | |
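A minimal sketch of the two temporal stages: parallel dilated 1-D convolutions for short-term cues followed by temporal self-attention for long-term relations. Channel sizes, dilation rates and head layout are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class GLTRLike(nn.Module):
    """Sketch of the GLTR idea: parallel dilated temporal convolutions, then
    temporal self-attention, then average pooling into one clip-level feature."""
    def __init__(self, c=128, rates=(1, 2, 4)):
        super().__init__()
        self.dilated = nn.ModuleList(
            nn.Conv1d(c, c, kernel_size=3, padding=r, dilation=r) for r in rates)
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, x):                  # x: (B, T, C) per-frame features
        h = x.transpose(1, 2)              # (B, C, T) for Conv1d
        short = sum(conv(h) for conv in self.dilated).transpose(1, 2)
        long, _ = self.attn(short, short, short)   # long-term relations
        return (short + long).mean(dim=1)  # temporal pooling -> clip feature

feats = torch.randn(2, 16, 128)            # 16 frames of 128-d per-frame features
print(GLTRLike()(feats).shape)             # torch.Size([2, 128])
```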