January 30, 2020

3269 words 16 mins read

Paper Group ANR 235

Understanding Ancient Coin Images. Rodent: Relevance determination in differential equations. Inverse reinforcement learning conditioned on brain scan. Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints. Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation an …

Understanding Ancient Coin Images

Title Understanding Ancient Coin Images
Authors Jessica Cooper, Ognjen Arandjelovic
Abstract In recent years, a range of problems under the broad umbrella of automatic, computer vision-based analysis of ancient coins has been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and far from sufficiently well performing for any practical purpose. In the present paper we present a series of contributions which we believe will benefit the interested community. Firstly, we explain that the approach of visual matching of coins, universally adopted in all existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g. online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel method which uses real-world multimodal input to extract and associate semantic concepts with the correct coin images, and which then uses a novel convolutional neural network to learn the appearance of these concepts. Through empirical evidence on a real-world data set of ancient coins, by far the largest of its kind, we demonstrate highly promising results.
Tasks
Published 2019-03-07
URL http://arxiv.org/abs/1903.02665v2
PDF http://arxiv.org/pdf/1903.02665v2.pdf
PWC https://paperswithcode.com/paper/understanding-ancient-coin-images
Repo
Framework
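
The abstract frames coin understanding as learning the visual appearance of semantic concepts rather than matching whole coin types. Below is a minimal sketch of the multi-label formulation this implies; the tiny network and the concept vocabulary are illustrative stand-ins, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Illustrative concept vocabulary; the paper mines such concepts from
# real-world multimodal (image + text) data.
CONCEPTS = ["horse", "cornucopia", "eagle", "shield", "patera"]

class ConceptNet(nn.Module):
    """Tiny CNN that predicts which semantic concepts appear on a coin.
    A stand-in for the paper's network, not a reproduction of it."""
    def __init__(self, num_concepts: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_concepts)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # one logit per concept

model = ConceptNet(len(CONCEPTS))
images = torch.randn(4, 3, 128, 128)                       # batch of coin images
targets = torch.randint(0, 2, (4, len(CONCEPTS))).float()  # concept presence labels
loss = nn.BCEWithLogitsLoss()(model(images), targets)      # multi-label loss
loss.backward()
```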

Rodent: Relevance determination in differential equations

Title Rodent: Relevance determination in differential equations
Authors Niklas Heim, Václav Šmídl, Tomáš Pevný
Abstract We aim to identify the generating ordinary differential equation (ODE) from a set of trajectories of a partially observed system. Our approach does not need prescribed basis functions to learn the ODE model, but only a rich set of Neural Arithmetic Units. For maximal explainability of the learnt model, we minimise the state size of the ODE as well as the number of non-zero parameters that are needed to solve the problem. This sparsification is realized through a combination of the Variational Auto-Encoder (VAE) and Automatic Relevance Determination (ARD). We show that it is possible to learn not only one specific model for a single process, but a manifold of models representing harmonic signals as well as a manifold of Lotka-Volterra systems.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.00656v2
PDF https://arxiv.org/pdf/1912.00656v2.pdf
PWC https://paperswithcode.com/paper/rodent-relevance-determination-in-ode
Repo
Framework
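
As a toy illustration of the sparsification idea, the sketch below fits a linear ODE $\dot{x} = Ax$ to a simulated trajectory while penalizing non-zero parameters. Note the assumptions: the paper uses a VAE with an ARD prior and Neural Arithmetic Units, whereas this stand-in uses a plain L1 penalty and Euler integration.

```python
import torch

# Ground-truth harmonic oscillator: dx/dt = A_true x
A_true = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
dt, steps = 0.05, 100
x = torch.tensor([1.0, 0.0])
traj = [x]
for _ in range(steps):
    x = x + dt * (A_true @ x)          # Euler step
    traj.append(x)
traj = torch.stack(traj)

# Learn A by matching the trajectory; the L1 term is a simple stand-in
# for the paper's ARD-driven sparsification.
A = torch.zeros(2, 2, requires_grad=True)
opt = torch.optim.Adam([A], lr=0.05)
for _ in range(300):
    x_hat, preds = traj[0], []
    for _ in range(steps):
        x_hat = x_hat + dt * (A @ x_hat)
        preds.append(x_hat)
    loss = ((torch.stack(preds) - traj[1:]) ** 2).mean() + 1e-3 * A.abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(A.detach())  # recovers a sparse, rotation-like matrix
```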

Inverse reinforcement learning conditioned on brain scan

Title Inverse reinforcement learning conditioned on brain scan
Authors Tofara Moyo
Abstract We outline a way for an agent to learn the dispositions of a particular individual through inverse reinforcement learning, where the state space at time t includes an fMRI scan of the individual to represent his brain state at that time. The fundamental assumption is that the information shown on an fMRI scan of an individual is conditioned on his thoughts and thought processes. The system models both long- and short-term memory, as well as any internal dynamics of the human brain that we may not be aware of. The human expert will wear a sensor suit for a set duration, and the sensor readings will be used to train a policy network, while a generative model will be trained to produce the next fMRI scan image conditioned on the present one and the state of the environment. During operation, the humanoid robot's actions will be conditioned on this evolving fMRI and the environment it is in.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.09770v1
PDF https://arxiv.org/pdf/1906.09770v1.pdf
PWC https://paperswithcode.com/paper/inverse-reinforcement-learning-conditioned-on
Repo
Framework
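
A minimal sketch of the state construction the abstract describes: the policy's input is the environment observation concatenated with an encoding of the current fMRI scan. All layer sizes, the 2D scan stand-in, and the action space are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BrainConditionedPolicy(nn.Module):
    """Policy whose state is the environment observation concatenated with
    an encoding of the current fMRI scan, as the abstract outlines.
    Every dimension and layer here is illustrative."""
    def __init__(self, env_dim=16, scan_channels=1, n_actions=6):
        super().__init__()
        self.scan_encoder = nn.Sequential(   # compress the fMRI scan image
            nn.Conv2d(scan_channels, 8, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.policy = nn.Sequential(
            nn.Linear(env_dim + 8, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, env_state, scan):
        z = self.scan_encoder(scan)
        return self.policy(torch.cat([env_state, z], dim=-1))  # action logits

policy = BrainConditionedPolicy()
logits = policy(torch.randn(1, 16), torch.randn(1, 1, 64, 64))
action = logits.argmax(dim=-1)
```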

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

Title Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
Authors Mengtian Li, Ersin Yumer, Deva Ramanan
Abstract In most practical settings and theoretical analyses, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Indeed, current approaches for hyper-parameter tuning and neural architecture search tend to be limited by practical resource constraints. Therefore, we introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e., budgeted training. We analyze the following problem: “given a dataset, algorithm, and fixed resource budget, what is the best achievable performance?” We focus on the number of optimization iterations as the representative resource. Under such a setting, we show that it is critical to adjust the learning rate schedule according to the given budget. Among budget-aware learning schedules, we find simple linear decay to be both robust and high-performing. We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Kinetics (video classification), MS COCO (object detection and instance segmentation), and Cityscapes (semantic segmentation). We also analyze our results and find that the key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget. We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under the practical but under-explored budgeted training setting.
Tasks Image Classification, Instance Segmentation, Neural Architecture Search, Object Detection, Semantic Segmentation, Video Classification
Published 2019-05-12
URL https://arxiv.org/abs/1905.04753v3
PDF https://arxiv.org/pdf/1905.04753v3.pdf
PWC https://paperswithcode.com/paper/budgeted-training-rethinking-deep-neural
Repo
Framework
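
The paper's main practical recommendation, a learning rate that decays linearly to zero over the given iteration budget, is easy to state in code. A minimal PyTorch sketch; the base learning rate, budget, and toy model are illustrative.

```python
import torch

def linear_decay(step: int, budget: int, base_lr: float) -> float:
    """Budget-aware schedule: the LR decays linearly and hits zero exactly
    at the budget, the paper's recommended schedule."""
    return base_lr * max(0.0, 1.0 - step / budget)

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
budget = 1000  # total optimization iterations available
for step in range(budget):
    for group in opt.param_groups:
        group["lr"] = linear_decay(step, budget, base_lr=0.1)
    opt.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()
    loss.backward()
    opt.step()
```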

Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection

Title Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection
Authors Sang-Sang Tan, Jin-Cheon Na
Abstract Semantic parsing is the task of transforming sentences from natural language into formal representations of predicate-argument structures. Under this research area, frame-semantic parsing has attracted much interest. This parsing approach leverages the lexical information defined in FrameNet to associate marked predicates or targets with semantic frames, thereby assigning semantic roles to sentence components based on pre-specified frame elements in FrameNet. In this paper, a deep neural network architecture known as Positional Attention-based Frame Identification with BERT (PAFIBERT) is presented as a solution to the frame identification subtask in frame-semantic parsing. Although the importance of this subtask is well-established, prior research has yet to find a robust solution that works satisfactorily for both in-domain and out-of-domain data. This study thus set out to improve frame identification in light of recent advancements in language modeling and transfer learning in natural language processing. The proposed method is partially empowered by BERT, a pre-trained language model that excels at capturing contextual information in texts. By combining the language representation power of BERT with a position-based attention mechanism, PAFIBERT is able to attend to target-specific contexts in sentences for disambiguating targets and associating them with the most suitable semantic frames. Under various experimental settings, PAFIBERT outperformed existing solutions by a significant margin, achieving new state-of-the-art results for both in-domain and out-of-domain benchmark test sets.
Tasks Language Modelling, Semantic Parsing, Transfer Learning
Published 2019-10-31
URL https://arxiv.org/abs/1910.14549v1
PDF https://arxiv.org/pdf/1910.14549v1.pdf
PWC https://paperswithcode.com/paper/positional-attention-based-frame
Repo
Framework
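
A sketch of the position-based attention idea: tokens receive a learned bias that depends on their distance to the marked target, and the attention-pooled representation is classified into a frame. BERT is replaced here by random token states and all sizes are illustrative, so this is an interpretation of the mechanism, not the authors' exact model.

```python
import torch
import torch.nn as nn

class PositionalAttentionFrameId(nn.Module):
    """Position-biased attention pooling: each token's attention score gets
    a learned bias depending on its distance to the marked target, and the
    pooled vector is classified into a semantic frame."""
    def __init__(self, hidden=64, max_dist=50, n_frames=100):
        super().__init__()
        self.max_dist = max_dist
        self.dist_bias = nn.Embedding(max_dist, 1)  # bias per distance bucket
        self.score = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, n_frames)

    def forward(self, token_states, target_idx):
        seq_len = token_states.size(1)
        dist = (torch.arange(seq_len) - target_idx).abs().clamp(max=self.max_dist - 1)
        scores = self.score(token_states).squeeze(-1) + self.dist_bias(dist).squeeze(-1)
        attn = torch.softmax(scores, dim=-1)                 # target-specific weights
        pooled = (attn.unsqueeze(-1) * token_states).sum(dim=1)
        return self.classifier(pooled)                       # frame logits

model = PositionalAttentionFrameId()
token_states = torch.randn(1, 20, 64)   # stand-in for BERT token states
frame_logits = model(token_states, target_idx=7)
print(frame_logits.shape)  # torch.Size([1, 100])
```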

Triplet-Based Deep Hashing Network for Cross-Modal Retrieval

Title Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Authors Cheng Deng, Zhaojia Chen, Xianglong Liu, Xinbo Gao, Dacheng Tao
Abstract Given the benefits of its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention. In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However, almost all existing methods employing cross-modal hashing cannot obtain powerful hash codes because they ignore the relative similarity between heterogeneous data, which contains richer semantic information, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we utilize triplet labels, which describe the relative relationships among three instances, as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative abilities of the hash codes. Finally, graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets.
Tasks Cross-Modal Retrieval, Semantic Similarity, Semantic Textual Similarity
Published 2019-04-04
URL http://arxiv.org/abs/1904.02449v1
PDF http://arxiv.org/pdf/1904.02449v1.pdf
PWC https://paperswithcode.com/paper/triplet-based-deep-hashing-network-for-cross
Repo
Framework
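
A sketch of the inter-modal triplet term under illustrative shapes: an image's relaxed hash code should be closer to the code of a matching text than to that of a non-matching one. The paper's full objective also includes an intra-modal view and graph regularization, which are omitted here.

```python
import torch
import torch.nn.functional as F

def cross_modal_triplet_loss(img_codes, txt_pos, txt_neg, margin=1.0):
    """Inter-modal triplet loss on relaxed hash codes: pull the matching
    text code toward the image code, push the non-matching one away.
    Codes are tanh-relaxed so training stays differentiable."""
    d_pos = (img_codes - txt_pos).pow(2).sum(dim=1)
    d_neg = (img_codes - txt_neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

bits = 32
img = torch.tanh(torch.randn(8, bits, requires_grad=True))   # image branch output
txt_p = torch.tanh(torch.randn(8, bits))                     # similar text codes
txt_n = torch.tanh(torch.randn(8, bits))                     # dissimilar text codes
loss = cross_modal_triplet_loss(img, txt_p, txt_n)
loss.backward()
binary = img.detach().sign()  # final hash codes in {-1, +1}
```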

Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View

Title Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View
Authors Benoit Massé, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud
Abstract In this paper we address the problems of detecting objects of interest in a video and of estimating their locations solely from the gaze directions of people present in the video. Objects may be located either inside or outside the camera field of view. We refer to this problem as extended gaze following. The contributions of the paper are the following. First, we propose a novel spatial representation of the gaze directions adopting a top-view perspective. Second, we develop several convolutional encoder/decoder networks to predict object locations and compare them with heuristics and with classical learning-based approaches. Third, in order to train the proposed models, we generate a very large number of synthetic scenarios employing a probabilistic formulation. Finally, our methodology is empirically validated using a publicly available dataset.
Tasks
Published 2019-02-28
URL http://arxiv.org/abs/1902.10953v1
PDF http://arxiv.org/pdf/1902.10953v1.pdf
PWC https://paperswithcode.com/paper/extended-gaze-following-detecting-objects-in
Repo
Framework
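
A toy sketch of one plausible reading of the top-view representation: gaze rays are traced across a ground-plane grid and cells accumulate evidence, so objects of interest emerge where rays from several people intersect. Grid size, extent, and the ray-marching scheme are all assumptions, not the paper's exact encoding.

```python
import numpy as np

def topview_gaze_map(positions, directions, grid=64, extent=10.0, steps=200):
    """Rasterize 2D gaze rays into a top-view heatmap: each person's gaze
    direction is traced across the ground plane and cells accumulate
    evidence along the ray."""
    heat = np.zeros((grid, grid))
    for p, d in zip(positions, directions):
        d = d / np.linalg.norm(d)
        for t in np.linspace(0, 2 * extent, steps):
            x, y = p + t * d
            i = int((x + extent) / (2 * extent) * grid)
            j = int((y + extent) / (2 * extent) * grid)
            if 0 <= i < grid and 0 <= j < grid:
                heat[i, j] += 1.0
    return heat

# Two people looking at roughly the same point (the origin):
heat = topview_gaze_map(
    positions=[np.array([-5.0, -2.0]), np.array([4.0, -3.0])],
    directions=[np.array([5.0, 2.0]), np.array([-4.0, 3.0])],
)
print(np.unravel_index(heat.argmax(), heat.shape))  # cell where the rays cross
```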

Learning Embodied Semantics via Music and Dance Semiotic Correlations

Title Learning Embodied Semantics via Music and Dance Semiotic Correlations
Authors Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro
Abstract Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition, by considering dance as a proxy for music perception, in a statistical computational model that learns semiotic correlations between music audio and dance video. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and show that the model can recommend music audio for dance video queries and vice versa.
Tasks Cross-Modal Retrieval
Published 2019-03-25
URL http://arxiv.org/abs/1903.10534v1
PDF http://arxiv.org/pdf/1903.10534v1.pdf
PWC https://paperswithcode.com/paper/learning-embodied-semantics-via-music-and
Repo
Framework

Semantic Similarity Based Softmax Classifier for Zero-Shot Learning

Title Semantic Similarity Based Softmax Classifier for Zero-Shot Learning
Authors Shabnam Daghaghi, Tharun Medini, Anshumali Shrivastava
Abstract Zero-Shot Learning (ZSL) is a classification task where we do not have even a single labeled training example from a set of unseen classes. Instead, we only have prior information (or a description) about seen and unseen classes, often in the form of physically realizable or descriptive attributes. The lack of any training examples from a set of classes prohibits the use of standard classification techniques and losses, including the popular cross-entropy loss. Currently, state-of-the-art approaches encode the prior class information into dense vectors and optimize some distance between the learned projections of the input vector and the corresponding class vector (collectively known as embedding models). In this paper, we propose a novel architecture that casts zero-shot learning as training a standard neural network with cross-entropy loss. During training, our approach performs soft labeling by combining the observed training data for the seen classes with similarity information from the attributes of the unseen classes, for which we have no training data. To the best of our knowledge, such similarity-based soft labeling has not been explored in the field of deep learning. We evaluate the proposed model on four benchmark datasets for zero-shot learning (AwA, aPY, SUN and CUB) and show that it consistently achieves significant improvement over state-of-the-art methods in the Generalized-ZSL and ZSL settings on all of these datasets.
Tasks Semantic Similarity, Semantic Textual Similarity, Zero-Shot Learning
Published 2019-09-10
URL https://arxiv.org/abs/1909.04790v1
PDF https://arxiv.org/pdf/1909.04790v1.pdf
PWC https://paperswithcode.com/paper/semantic-similarity-based-softmax-classifier
Repo
Framework
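
A minimal sketch of the similarity-based soft labeling described above: attribute similarity between classes turns a seen-class one-hot label into a soft target over all classes, seen and unseen, which a standard softmax classifier is then trained against. The attribute matrix, temperature, and feature dimension are illustrative.

```python
import torch
import torch.nn.functional as F

# Illustrative class-attribute matrix: rows are classes, columns attributes.
# Classes 0-2 are "seen"; classes 3-4 are "unseen" (no training images).
attributes = F.normalize(torch.randn(5, 85), dim=1)
sim = attributes @ attributes.t()                  # class-class similarity

def soft_labels(true_class: int, temperature: float = 0.1):
    """Soften the seen-class label by attribute similarity so unseen
    classes receive probability mass during training."""
    return F.softmax(sim[true_class] / temperature, dim=0)

model = torch.nn.Linear(2048, 5)                   # logits over ALL classes
features = torch.randn(4, 2048)                    # seen-class image features
labels = torch.stack([soft_labels(c) for c in [0, 1, 2, 0]])
log_probs = F.log_softmax(model(features), dim=1)
loss = -(labels * log_probs).sum(dim=1).mean()     # cross-entropy w/ soft targets
loss.backward()
```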

Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies

Title Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
Authors Meinard Müller, Andreas Arzt, Stefan Balke, Matthias Dorfer, Gerhard Widmer
Abstract There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore large music collections in a convenient way. More precisely, there is a need for cross-modal retrieval algorithms that, given a query in one modality (e.g., a short audio excerpt), find corresponding information and entities in other modalities (e.g., the name of the piece and the sheet music). This goes beyond exact audio identification and subsequent retrieval of metainformation as performed by commercial applications like Shazam [1].
Tasks Cross-Modal Retrieval
Published 2019-02-12
URL http://arxiv.org/abs/1902.04397v1
PDF http://arxiv.org/pdf/1902.04397v1.pdf
PWC https://paperswithcode.com/paper/cross-modal-music-retrieval-and-applications
Repo
Framework

Self-Supervised Visual Representations for Cross-Modal Retrieval

Title Self-Supervised Visual Representations for Cross-Modal Retrieval
Authors Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Abstract Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are usually limited to discrete sets of popular visual classes that may not be representative of the richer semantics found in large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text on the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is most likely to appear as an illustration (global context), and (2) the semantic context of its caption (local context). Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like image classification and object detection, but that the learned representations are better for cross-modal retrieval when compared to supervised pre-training of the network on the ImageNet dataset.
Tasks Cross-Modal Retrieval, Image Classification, Object Detection
Published 2019-01-31
URL http://arxiv.org/abs/1902.00378v1
PDF http://arxiv.org/pdf/1902.00378v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-visual-representations-for
Repo
Framework
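
A sketch of the self-supervision signal: a CNN is trained to predict the semantic (topic) context of the article its image illustrates. How the topic targets are computed from the article text is abstracted away here; random soft targets and a deliberately tiny CNN stand in for the real pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicPredictionCNN(nn.Module):
    """Predict the topic context of the article an image illustrates.
    The architecture is a toy stand-in, not the paper's network."""
    def __init__(self, n_topics=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_topics),
        )

    def forward(self, x):
        return self.net(x)

model = TopicPredictionCNN()
images = torch.randn(8, 3, 224, 224)
topic_targets = F.softmax(torch.randn(8, 40), dim=1)   # article topic mixtures
log_probs = F.log_softmax(model(images), dim=1)
loss = F.kl_div(log_probs, topic_targets, reduction="batchmean")
loss.backward()
```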

Maximizing Drift is Not Optimal for Solving OneMax

Title Maximizing Drift is Not Optimal for Solving OneMax
Authors Nathan Buskulic, Carola Doerr
Abstract It seems very intuitive that for the maximization of the OneMax problem $f(x):=\sum_{i=1}^n{x_i}$ the best that an elitist unary unbiased search algorithm can do is to store a best so far solution, and to modify it with the operator that yields the best possible expected progress in function value. This assumption has been implicitly used in several empirical works. In [Doerr, Doerr, Yang: GECCO 2016] it was formally proven that this approach is indeed almost optimal. In this work we prove that drift maximization is \emph{not} optimal. More precisely, we show that for most fitness levels $n/2 < \ell < 2n/3$ the optimal mutation strengths are larger than the drift-maximizing ones. This implies that the optimal RLS is more risk-affine than the variant maximizing the step-wise expected progress. We show similar results for the mutation rates of the classic (1+1) Evolutionary Algorithm (EA) and its resampling variant, the (1+1) EA$_{>0}$. As a result of independent interest we show that the optimal mutation strengths, unlike the drift-maximizing ones, can be even.
Tasks
Published 2019-04-16
URL http://arxiv.org/abs/1904.07818v1
PDF http://arxiv.org/pdf/1904.07818v1.pdf
PWC https://paperswithcode.com/paper/maximizing-drift-is-not-optimal-for-solving
Repo
Framework
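
The drift-maximizing strategy the abstract argues against is easy to compute exactly on OneMax. A sketch: with $\ell$ one-bits, flipping $k$ distinct bits of which $m$ are ones changes the fitness by $k - 2m$, and the elitist drift is the expectation of the positive part under the hypergeometric law.

```python
from math import comb

def elitist_drift(n: int, fitness: int, k: int) -> float:
    """Expected fitness gain on OneMax when flipping exactly k distinct bits
    and keeping the offspring only if it is strictly better (elitist RLS)."""
    ones, zeros = fitness, n - fitness
    drift = 0.0
    for m in range(k + 1):  # m = number of one-bits among the k flipped
        if m > ones or k - m > zeros:
            continue
        p = comb(ones, m) * comb(zeros, k - m) / comb(n, k)
        gain = k - 2 * m
        if gain > 0:
            drift += p * gain
    return drift

n, level = 100, 60  # a fitness level between n/2 and 2n/3
best_k = max(range(1, 20, 2), key=lambda k: elitist_drift(n, level, k))
print(best_k, elitist_drift(n, level, best_k))
# The paper's point: the k maximizing this one-step drift is not the k
# minimizing the total expected optimization time.
```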

AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation

Title AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation
Authors Pierrick Coupé, Boris Mansencal, Michaël Clément, Rémi Giraud, Baudouin Denis de Senneville, Vinh-Thong Ta, Vincent Lepetit, José V. Manjon
Abstract Whole brain segmentation using deep learning (DL) is a very challenging task since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a global convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two “assemblies” of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and reaching a consensus quickly. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an “amendment” procedure performed by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. When using the same 45 training images, AssemblyNet outperforms a global U-Net by 28% in terms of the Dice metric, patch-based joint label fusion by 15% and SLANT-27 by 10%. Finally, AssemblyNet demonstrates a high capacity to deal with limited training data, achieving whole brain segmentation in practical training and testing times.
Tasks Brain Segmentation, Decision Making
Published 2019-06-05
URL https://arxiv.org/abs/1906.01862v1
PDF https://arxiv.org/pdf/1906.01862v1.pdf
PWC https://paperswithcode.com/paper/assemblynet-a-novel-deep-decision-making
Repo
Framework
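
The final majority-voting step is straightforward to sketch: each U-Net in the assembly votes a label per voxel and the most frequent label wins. Shapes and label count below are illustrative, and the knowledge-sharing and amendment stages are not modeled.

```python
import numpy as np

def majority_vote(label_maps):
    """Per-voxel majority vote over an ensemble of segmentations:
    the most frequently predicted label wins in each voxel."""
    stacked = np.stack(label_maps)                 # (n_models, D, H, W)
    n_labels = stacked.max() + 1
    votes = np.zeros((n_labels,) + stacked.shape[1:], dtype=np.int32)
    for lab in range(n_labels):
        votes[lab] = (stacked == lab).sum(axis=0)  # count votes per label
    return votes.argmax(axis=0)                    # per-voxel winner

# Three toy "U-Net" segmentations of a 4x4x4 volume with 3 labels:
rng = np.random.default_rng(0)
maps = [rng.integers(0, 3, size=(4, 4, 4)) for _ in range(3)]
consensus = majority_vote(maps)
print(consensus.shape)  # (4, 4, 4)
```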

BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning

Title BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning
Authors Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, Christian Wachinger
Abstract Access to sufficient annotated data is a common challenge in training deep neural networks on medical images. As annotating data is expensive and time-consuming, it is difficult for an individual medical center to reach large enough sample sizes to build their own, personalized models. As an alternative, data from all centers could be pooled to train a centralized model that everyone can use. However, such a strategy is often infeasible due to the privacy-sensitive nature of medical data. Recently, federated learning (FL) has been introduced to collaboratively learn a shared prediction model across centers without the need for sharing data. In FL, clients locally train models on site-specific datasets for a few epochs and then share their model weights with a central server, which orchestrates the overall training process. Importantly, the sharing of models does not compromise patient privacy. A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients. In this paper, we introduce BrainTorrent, a new FL framework without a central server, particularly targeted towards medical applications. BrainTorrent presents a highly dynamic peer-to-peer environment, where all centers directly interact with each other without depending on a central body. We demonstrate the overall effectiveness of FL for the challenging task of whole brain segmentation and observe that the proposed server-less BrainTorrent approach not only outperforms the traditional server-based one but also reaches a similar performance to a model trained on pooled data.
Tasks Brain Segmentation
Published 2019-05-16
URL https://arxiv.org/abs/1905.06731v1
PDF https://arxiv.org/pdf/1905.06731v1.pdf
PWC https://paperswithcode.com/paper/braintorrent-a-peer-to-peer-environment-for
Repo
Framework
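
A minimal sketch of one server-less round as the abstract describes it: a randomly chosen peer fetches the other peers' models, merges them with its own, and then continues local training. Averaging all weights uniformly is a simplification; the paper's version-based bookkeeping is omitted here.

```python
import random
import torch

def p2p_round(peers, learner_idx):
    """One BrainTorrent-style round: the chosen peer averages the weights
    of all other peers with its own, then would train locally on its
    site-specific data. No central server orchestrates anything."""
    learner = peers[learner_idx]
    others = [p for i, p in enumerate(peers) if i != learner_idx]
    with torch.no_grad():
        for name, param in learner.named_parameters():
            stacked = torch.stack(
                [dict(p.named_parameters())[name] for p in others] + [param]
            )
            param.copy_(stacked.mean(dim=0))       # merge with peers
    # ... local training of `learner` on its own data would follow here.

peers = [torch.nn.Linear(10, 2) for _ in range(5)]   # one model per center
p2p_round(peers, learner_idx=random.randrange(len(peers)))
```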

Composite Shape Modeling via Latent Space Factorization

Title Composite Shape Modeling via Latent Space Factorization
Authors Anastasia Dubrovina, Fei Xia, Panos Achlioptas, Mira Shalah, Raphael Groscot, Leonidas Guibas
Abstract We present a novel neural network architecture, termed Decomposer-Composer, for semantic structure-aware 3D shape modeling. Our method utilizes an auto-encoder-based pipeline, and produces a novel factorized shape embedding space, where the semantic structure of the shape collection translates into a data-dependent sub-space factorization, and where shape composition and decomposition become simple linear operations on the embedding coordinates. We further propose to model shape assembly using an explicit learned part deformation module, which utilizes a 3D spatial transformer network to perform an in-network volumetric grid deformation, and which allows us to train the whole system end-to-end. The resulting network allows us to perform part-level shape manipulation, unattainable by existing approaches. Our extensive ablation study, comparison to baseline methods and qualitative analysis demonstrate the improved performance of the proposed method.
Tasks 3D Shape Modeling
Published 2019-01-09
URL https://arxiv.org/abs/1901.02968v2
PDF https://arxiv.org/pdf/1901.02968v2.pdf
PWC https://paperswithcode.com/paper/composite-shape-modeling-via-latent-space
Repo
Framework
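
A sketch of what "composition and decomposition become simple linear operations" can look like: learned projection matrices split a whole-shape embedding into per-part factors, and composing is just summing factors, so parts can be swapped between shapes in latent space. The dimensions and the two-part split are illustrative assumptions, not the paper's exact factorization.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Learned linear projections split a whole-shape embedding into
    per-part sub-space factors; composition is a sum of factors."""
    def __init__(self, dim=128, n_parts=2):
        super().__init__()
        self.projections = nn.ModuleList(
            nn.Linear(dim, dim, bias=False) for _ in range(n_parts)
        )

    def decompose(self, z):
        return [proj(z) for proj in self.projections]   # per-part factors

    def compose(self, parts):
        return torch.stack(parts).sum(dim=0)            # composition = sum

emb = FactorizedEmbedding()
z_chair_a, z_chair_b = torch.randn(1, 128), torch.randn(1, 128)
legs_a, seat_a = emb.decompose(z_chair_a)
legs_b, seat_b = emb.decompose(z_chair_b)
z_mixed = emb.compose([legs_b, seat_a])   # chair with B's legs and A's seat
```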