Paper Group ANR 235
Understanding Ancient Coin Images. Rodent: Relevance determination in differential equations. Inverse reinforcement learning conditioned on brain scan. Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints. Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection. Triplet-Based Deep Hashing Network for Cross-Modal Retrieval. Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View. Learning Embodied Semantics via Music and Dance Semiotic Correlations. Semantic Similarity Based Softmax Classifier for Zero-Shot Learning. Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies. Self-Supervised Visual Representations for Cross-Modal Retrieval. Maximizing Drift is Not Optimal for Solving OneMax. AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation. BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning. Composite Shape Modeling via Latent Space Factorization.
Understanding Ancient Coin Images
Title | Understanding Ancient Coin Images |
Authors | Jessica Cooper, Ognjen Arandjelovic |
Abstract | In recent years, a range of problems within the broad umbrella of automatic, computer vision based analysis of ancient coins has been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and far from performing well enough for any practical purpose. In the present paper we make a series of contributions which we believe will benefit the interested community. Firstly, we explain that the approach of visual matching of coins, universally adopted in all existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g. online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on understanding the semantic content of coins. Hence, we describe a novel method which uses real-world multimodal input to extract and associate semantic concepts with the correct coin images and then, using a novel convolutional neural network, learns the appearance of these concepts. Using a real-world data set of ancient coins, by far the largest of its kind, we demonstrate highly promising empirical results. |
Tasks | |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.02665v2 |
PDF | http://arxiv.org/pdf/1903.02665v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-ancient-coin-images |
Repo | |
Framework | |
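The first stage of the method — mining semantic concepts from unstructured coin descriptions and attaching them to images as multi-label training targets — can be illustrated with a minimal sketch. The concept vocabulary and the substring-matching rule below are hypothetical simplifications, not the authors' multimodal extraction procedure:

```python
import numpy as np

# Hypothetical concept vocabulary; the paper derives concepts from
# real-world multimodal data rather than a fixed hand-picked list.
CONCEPTS = ["horse", "cornucopia", "patera", "eagle", "shield"]

def concept_targets(description: str) -> np.ndarray:
    """Binary multi-label vector marking which concepts a description mentions."""
    text = description.lower()
    return np.array([float(c in text) for c in CONCEPTS])

desc = "Emperor on horseback left, holding patera and shield."
print(dict(zip(CONCEPTS, concept_targets(desc))))
# {'horse': 1.0, 'cornucopia': 0.0, 'patera': 1.0, 'eagle': 0.0, 'shield': 1.0}
```

A CNN trained with a per-concept binary cross-entropy loss against such targets would then learn the visual appearance of each concept.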
Rodent: Relevance determination in differential equations
Title | Rodent: Relevance determination in differential equations |
Authors | Niklas Heim, Václav Šmídl, Tomáš Pevný |
Abstract | We aim to identify the generating ordinary differential equation (ODE) from a set of trajectories of a partially observed system. Our approach does not need prescribed basis functions to learn the ODE model, but only a rich set of Neural Arithmetic Units. For maximal explainability of the learnt model, we minimise the state size of the ODE as well as the number of non-zero parameters that are needed to solve the problem. This sparsification is realized through a combination of the Variational Auto-Encoder (VAE) and Automatic Relevance Determination (ARD). We show that it is possible to learn not only one specific model for a single process, but a manifold of models representing harmonic signals as well as a manifold of Lotka-Volterra systems. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00656v2 |
PDF | https://arxiv.org/pdf/1912.00656v2.pdf |
PWC | https://paperswithcode.com/paper/rodent-relevance-determination-in-ode |
Repo | |
Framework | |
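The role ARD plays — driving irrelevant parameters to zero — can be shown on a toy version of the problem. The sketch below is not the paper's VAE-based Rodent; it is a classical iteratively-reweighted ARD regression that recovers a sparse linear ODE from a single trajectory:

```python
import numpy as np

def ard_fit(X, y, n_iter=50, eps=1e-6):
    """Sparse linear fit y ≈ X w with per-weight ARD-style precisions."""
    alpha = np.ones(X.shape[1])                   # one precision per weight
    for _ in range(n_iter):
        w = np.linalg.solve(X.T @ X + np.diag(alpha), X.T @ y)
        alpha = 1.0 / (w ** 2 + eps)              # irrelevant weights -> huge precision
    w[np.abs(w) < 1e-3] = 0.0                     # prune effectively-zero weights
    return w

t = np.linspace(0, 10, 500)
traj = np.stack([np.sin(t), np.cos(t)], axis=1)   # harmonic-oscillator trajectory
dxdt = np.gradient(traj, t, axis=0)               # crude derivative estimates
A = np.stack([ard_fit(traj, dxdt[:, i]) for i in range(2)])
print(A.round(2))                                 # ≈ [[0, 1], [-1, 0]], sparse structure recovered
```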
Inverse reinforcement learning conditioned on brain scan
Title | Inverse reinforcement learning conditioned on brain scan |
Authors | Tofara Moyo |
Abstract | We outline a way for an agent to learn the dispositions of a particular individual through inverse reinforcement learning, where the state space at time t includes an fMRI scan of the individual to represent his brain state at that time. The fundamental assumption is that the information shown on an fMRI scan of an individual is conditioned on his thoughts and thought processes. The system models both long and short term memory, as well as any internal dynamics of the human brain that we may not be aware of. The human expert will wear a sensor suit for a set duration, and the sensor readings will be used to train a policy network, while a generative model will be trained to produce the next fMRI scan image conditioned on the present one and the state of the environment. During operation, the humanoid robot's actions will be conditioned on this evolving fMRI and the environment it is in. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09770v1 |
PDF | https://arxiv.org/pdf/1906.09770v1.pdf |
PWC | https://paperswithcode.com/paper/inverse-reinforcement-learning-conditioned-on |
Repo | |
Framework | |
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
Title | Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints |
Authors | Mengtian Li, Ersin Yumer, Deva Ramanan |
Abstract | In most practical settings and theoretical analyses, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Indeed, current approaches for hyper-parameter tuning and neural architecture search tend to be limited by practical resource constraints. Therefore, we introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e., budgeted training. We analyze the following problem: “given a dataset, algorithm, and fixed resource budget, what is the best achievable performance?” We focus on the number of optimization iterations as the representative resource. Under such a setting, we show that it is critical to adjust the learning rate schedule according to the given budget. Among budget-aware learning schedules, we find simple linear decay to be both robust and high-performing. We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Kinetics (video classification), MS COCO (object detection and instance segmentation), and Cityscapes (semantic segmentation). We also analyze our results and find that the key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget. We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under (the practical but under-explored) budgeted training setting. |
Tasks | Image Classification, Instance Segmentation, Neural Architecture Search, Object Detection, Semantic Segmentation, Video Classification |
Published | 2019-05-12 |
URL | https://arxiv.org/abs/1905.04753v3 |
PDF | https://arxiv.org/pdf/1905.04753v3.pdf |
PWC | https://paperswithcode.com/paper/budgeted-training-rethinking-deep-neural |
Repo | |
Framework | |
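The paper's headline recommendation — linear learning-rate decay scaled to the iteration budget — fits in a few lines. A minimal sketch, where `base_lr` and the budget of 100 steps are placeholder values:

```python
def budgeted_lr(step: int, budget: int, base_lr: float = 0.1) -> float:
    """Anneal the learning rate linearly to zero over the given budget."""
    return base_lr * (1.0 - step / budget)

budget = 100
print([round(budgeted_lr(s, budget), 4) for s in (0, 50, 99)])
# [0.1, 0.05, 0.001]
```

The schedule forces update sizes to shrink to zero exactly when the budget runs out, which is consistent with the "budgeted convergence" phenomenon the authors describe.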
Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection
Title | Positional Attention-based Frame Identification with BERT: A Deep Learning Approach to Target Disambiguation and Semantic Frame Selection |
Authors | Sang-Sang Tan, Jin-Cheon Na |
Abstract | Semantic parsing is the task of transforming sentences from natural language into formal representations of predicate-argument structures. Under this research area, frame-semantic parsing has attracted much interest. This parsing approach leverages the lexical information defined in FrameNet to associate marked predicates or targets with semantic frames, thereby assigning semantic roles to sentence components based on pre-specified frame elements in FrameNet. In this paper, a deep neural network architecture known as Positional Attention-based Frame Identification with BERT (PAFIBERT) is presented as a solution to the frame identification subtask in frame-semantic parsing. Although the importance of this subtask is well-established, prior research has yet to find a robust solution that works satisfactorily for both in-domain and out-of-domain data. This study thus set out to improve frame identification in light of recent advances in language modeling and transfer learning in natural language processing. The proposed method is partially empowered by BERT, a pre-trained language model that excels at capturing contextual information in texts. By combining the language representation power of BERT with a position-based attention mechanism, PAFIBERT is able to attend to target-specific contexts in sentences for disambiguating targets and associating them with the most suitable semantic frames. Under various experimental settings, PAFIBERT outperformed existing solutions by a significant margin, achieving new state-of-the-art results for both in-domain and out-of-domain benchmark test sets. |
Tasks | Language Modelling, Semantic Parsing, Transfer Learning |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14549v1 |
PDF | https://arxiv.org/pdf/1910.14549v1.pdf |
PWC | https://paperswithcode.com/paper/positional-attention-based-frame |
Repo | |
Framework | |
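The core idea — attention biased toward the marked target's position, so the pooled representation is target-specific — can be sketched generically. This is not the exact PAFIBERT head: the hidden states below are random stand-ins for BERT outputs, and the distance penalty `decay` is a hypothetical parameter:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def position_biased_pool(H, target_idx, decay=0.5):
    """Pool token states H (T x d) with attention biased toward the target token."""
    T, d = H.shape
    content = H @ H[target_idx] / np.sqrt(d)               # similarity to the target token
    position = -decay * np.abs(np.arange(T) - target_idx)  # penalty grows with distance
    w = softmax(content + position)
    return w @ H                                           # target-specific sentence vector

H = np.random.randn(12, 16)      # stand-in for BERT hidden states of a 12-token sentence
v = position_biased_pool(H, target_idx=4)
# `v` would feed a softmax classifier over FrameNet frames.
```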
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Title | Triplet-Based Deep Hashing Network for Cross-Modal Retrieval |
Authors | Cheng Deng, Zhaojia Chen, Xianglong Liu, Xinbo Gao, Dacheng Tao |
Abstract | Given the benefits of its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention. In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However, almost all existing cross-modal hashing methods cannot obtain powerful hash codes because they ignore the relative similarity between heterogeneous data, which carries richer semantic information, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we utilize triplet labels, which describe the relative relationships among three instances, as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative abilities of the hash codes. Finally, graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets. |
Tasks | Cross-Modal Retrieval, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02449v1 |
PDF | http://arxiv.org/pdf/1904.02449v1.pdf |
PWC | https://paperswithcode.com/paper/triplet-based-deep-hashing-network-for-cross |
Repo | |
Framework | |
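The triplet supervision at the heart of TDH is the standard ranking loss, applied across modalities. A minimal sketch on relaxed (pre-binarization) codes, with squared Euclidean distance as a surrogate for Hamming distance and an illustrative margin:

```python
import numpy as np

def triplet_hash_loss(anchor, pos, neg, margin=2.0):
    """Pull the anchor closer to the positive than to the negative by `margin`.

    E.g. anchor = image code, pos = matching text code, neg = non-matching text code.
    """
    d_pos = np.sum((anchor - pos) ** 2)
    d_neg = np.sum((anchor - neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

rng = np.random.default_rng(0)
a, p, n = rng.standard_normal((3, 32))    # 32-dimensional relaxed hash codes
print(triplet_hash_loss(a, p, n))
```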
Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View
Title | Extended Gaze Following: Detecting Objects in Videos Beyond the Camera Field of View |
Authors | Benoit Massé, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud |
Abstract | In this paper we address the problems of detecting objects of interest in a video and of estimating their locations, solely from the gaze directions of people present in the video. Objects can be located either inside or outside the camera field of view. We refer to this problem as extended gaze following. The contributions of the paper are as follows. First, we propose a novel spatial representation of the gaze directions adopting a top-view perspective. Second, we develop several convolutional encoder/decoder networks to predict object locations and compare them with heuristics and with classical learning-based approaches. Third, in order to train the proposed models, we generate a very large number of synthetic scenarios employing a probabilistic formulation. Finally, our methodology is empirically validated using a publicly available dataset. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.10953v1 |
PDF | http://arxiv.org/pdf/1902.10953v1.pdf |
PWC | https://paperswithcode.com/paper/extended-gaze-following-detecting-objects-in |
Repo | |
Framework | |
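The top-view representation can be pictured as ray casting: each person's gaze direction is rasterised into a ground-plane grid, and cells crossed by many rays are likely object locations, whether or not they fall inside the camera's field of view. A toy sketch — the grid size, ray length, and the counting heuristic itself are simplifications; the paper trains encoder/decoder networks on such representations:

```python
import numpy as np

def gaze_heatmap(positions, directions, grid=64, half=5.0, ray_len=8.0, steps=200):
    """Accumulate top-view gaze rays into a grid; peaks suggest attended objects."""
    heat = np.zeros((grid, grid))
    for p, d in zip(positions, directions):
        d = d / np.linalg.norm(d)
        for s in np.linspace(0.0, ray_len, steps):
            x, y = p + s * d
            i = int((x + half) / (2 * half) * grid)
            j = int((y + half) / (2 * half) * grid)
            if 0 <= i < grid and 0 <= j < grid:
                heat[i, j] += 1
    return heat

# Two people looking at the same point (the origin) from different sides:
heat = gaze_heatmap(np.array([[-4.0, 0.0], [0.0, -4.0]]),
                    np.array([[1.0, 0.0], [0.0, 1.0]]))
print(np.unravel_index(heat.argmax(), heat.shape))   # cell near the grid centre
```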
Learning Embodied Semantics via Music and Dance Semiotic Correlations
Title | Learning Embodied Semantics via Music and Dance Semiotic Correlations |
Authors | Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro |
Abstract | Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition, by considering dance as a proxy for music perception, in a statistical computational model that learns semiotic correlations between music audio and dance video. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and show the model can recommend music audio for dance video queries and vice versa. |
Tasks | Cross-Modal Retrieval |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10534v1 |
PDF | http://arxiv.org/pdf/1903.10534v1.pdf |
PWC | https://paperswithcode.com/paper/learning-embodied-semantics-via-music-and |
Repo | |
Framework | |
Semantic Similarity Based Softmax Classifier for Zero-Shot Learning
Title | Semantic Similarity Based Softmax Classifier for Zero-Shot Learning |
Authors | Shabnam Daghaghi, Tharun Medini, Anshumali Shrivastava |
Abstract | Zero-Shot Learning (ZSL) is a classification task where we do not have even a single labeled training example from a set of unseen classes. Instead, we only have prior information (or descriptions) about seen and unseen classes, often in the form of physically realizable or descriptive attributes. The lack of any training example from a set of classes prohibits the use of standard classification techniques and losses, including the popular cross-entropy loss. Currently, state-of-the-art approaches encode the prior class information into dense vectors and optimize some distance between the learned projections of the input vector and the corresponding class vector (collectively known as embedding models). In this paper, we propose a novel architecture that casts zero-shot learning as a standard neural network with a cross-entropy loss. During training, our approach performs soft-labeling by combining the observed training data for the seen classes with the similarity information from the attributes of the unseen classes, for which we have no training data. To the best of our knowledge, such similarity-based soft-labeling has not been explored in the field of deep learning. We evaluate the proposed model on four benchmark datasets for zero-shot learning (AwA, aPY, SUN and CUB) and show that it consistently achieves significant improvement over state-of-the-art methods in both the Generalized-ZSL and ZSL settings. |
Tasks | Semantic Similarity, Semantic Textual Similarity, Zero-Shot Learning |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04790v1 |
PDF | https://arxiv.org/pdf/1909.04790v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-similarity-based-softmax-classifier |
Repo | |
Framework | |
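The soft-labeling step can be made concrete: a training example of a seen class keeps most of the probability mass, and the remainder is spread over unseen classes in proportion to attribute similarity, after which an ordinary cross-entropy loss applies. In this sketch, `mass` and `temp` are hypothetical hyper-parameters, not values from the paper:

```python
import numpy as np

def soft_labels(seen_idx, attributes, unseen_ids, mass=0.2, temp=5.0):
    """Soft target over all classes for an example of seen class `seen_idx`.

    attributes: (num_classes, num_attributes) class description matrix.
    """
    a = attributes / np.linalg.norm(attributes, axis=1, keepdims=True)
    sim = a[unseen_ids] @ a[seen_idx]     # cosine similarity to each unseen class
    p = np.exp(temp * sim)
    p /= p.sum()
    y = np.zeros(attributes.shape[0])
    y[seen_idx] = 1.0 - mass              # most mass stays on the true class
    y[unseen_ids] = mass * p              # the rest is similarity-weighted
    return y                              # train with cross-entropy against y

attrs = np.random.rand(5, 12)             # 5 classes, 12 attributes
print(soft_labels(seen_idx=0, attributes=attrs, unseen_ids=[3, 4]).round(3))
```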
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
Title | Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies |
Authors | Meinard Müller, Andreas Arzt, Stefan Balke, Matthias Dorfer, Gerhard Widmer |
Abstract | There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore large music collections in a convenient way. More precisely, there is a need for cross-modal retrieval algorithms that, given a query in one modality (e.g., a short audio excerpt), find corresponding information and entities in other modalities (e.g., the name of the piece and the sheet music). This goes beyond exact audio identification and subsequent retrieval of metainformation as performed by commercial applications like Shazam [1]. |
Tasks | Cross-Modal Retrieval |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04397v1 |
PDF | http://arxiv.org/pdf/1902.04397v1.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-music-retrieval-and-applications |
Repo | |
Framework | |
Self-Supervised Visual Representations for Cross-Modal Retrieval
Title | Self-Supervised Visual Representations for Cross-Modal Retrieval |
Authors | Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar |
Abstract | Cross-modal retrieval methods have been significantly improved in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are usually limited to discrete sets of popular visual classes that may not be representative of the richer semantics found on large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text on the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is more likely to appear as an illustration (global context), and (2) the semantic context of its caption (local context). Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like image classification and object detection, but that the learned representations are better for cross-modal retrieval when compared to supervised pre-training of the network on the ImageNet dataset. |
Tasks | Cross-Modal Retrieval, Image Classification, Object Detection |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1902.00378v1 |
PDF | http://arxiv.org/pdf/1902.00378v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-visual-representations-for |
Repo | |
Framework | |
Maximizing Drift is Not Optimal for Solving OneMax
Title | Maximizing Drift is Not Optimal for Solving OneMax |
Authors | Nathan Buskulic, Carola Doerr |
Abstract | It seems very intuitive that for the maximization of the OneMax problem $f(x):=\sum_{i=1}^n{x_i}$ the best that an elitist unary unbiased search algorithm can do is to store a best so far solution, and to modify it with the operator that yields the best possible expected progress in function value. This assumption has been implicitly used in several empirical works. In [Doerr, Doerr, Yang: GECCO 2016] it was formally proven that this approach is indeed almost optimal. In this work we prove that drift maximization is \emph{not} optimal. More precisely, we show that for most fitness levels $n/2 < \ell < 2n/3$ the optimal mutation strengths are larger than the drift-maximizing ones. This implies that the optimal RLS is more risk-affine than the variant maximizing the step-wise expected progress. We show similar results for the mutation rates of the classic (1+1) Evolutionary Algorithm (EA) and its resampling variant, the (1+1) EA$_{>0}$. As a result of independent interest, we show that the optimal mutation strengths, unlike the drift-maximizing ones, can be even. |
Tasks | |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07818v1 |
PDF | http://arxiv.org/pdf/1904.07818v1.pdf |
PWC | https://paperswithcode.com/paper/maximizing-drift-is-not-optimal-for-solving |
Repo | |
Framework | |
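For context, the elitist (1+1) EA on OneMax that the paper analyzes looks like this. The sketch uses the classic static mutation rate 1/n — the textbook baseline, not the optimal or drift-maximizing strategies compared in the paper:

```python
import numpy as np

def one_plus_one_ea(n=100, seed=0):
    """(1+1) EA on OneMax f(x) = sum(x): elitist, standard-bit mutation."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)
    evals = 0
    while x.sum() < n:
        flip = rng.random(n) < 1.0 / n     # flip each bit with probability 1/n
        y = np.where(flip, 1 - x, x)
        evals += 1
        if y.sum() >= x.sum():             # elitist acceptance
            x = y
    return evals

print(one_plus_one_ea())                   # roughly e * n * ln(n) evaluations
```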
AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation
Title | AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation |
Authors | Pierrick Coupé, Boris Mansencal, Michaël Clément, Rémi Giraud, Baudouin Denis de Senneville, Vinh-Thong Ta, Vincent Lepetit, José V. Manjon |
Abstract | Whole brain segmentation using deep learning (DL) is a very challenging task since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a global convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two “assemblies” of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and reaching a consensus quickly. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an “amendment” procedure made by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. When using the same 45 training images, AssemblyNet outperforms global U-Net by 28% in terms of the Dice metric, patch-based joint label fusion by 15% and SLANT-27 by 10%. Finally, AssemblyNet demonstrates a high capacity to deal with limited training data while achieving whole brain segmentation in practical training and testing times. |
Tasks | Brain Segmentation, Decision Making |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01862v1 |
PDF | https://arxiv.org/pdf/1906.01862v1.pdf |
PWC | https://paperswithcode.com/paper/assemblynet-a-novel-deep-decision-making |
Repo | |
Framework | |
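The final fusion step — many U-Nets, each segmenting an overlapping sub-volume, combined by majority voting — is easy to sketch in isolation. This omits the paper's two-assembly "amendment" procedure and knowledge sharing; voxels outside a network's patch are marked -1 here by convention:

```python
import numpy as np

def majority_vote(label_maps, n_labels):
    """Fuse per-network label maps (int arrays of equal shape) by majority vote."""
    votes = np.zeros((n_labels,) + label_maps[0].shape, dtype=np.int32)
    for m in label_maps:
        valid = m >= 0                                  # -1 = voxel outside this net's patch
        np.add.at(votes, (m[valid], *np.nonzero(valid)), 1)
    return votes.argmax(axis=0)

maps = [np.array([1, 1, 2]), np.array([1, 2, 2]), np.array([-1, 2, 0])]
print(majority_vote(maps, n_labels=3))                  # [1 2 2]
```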
BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning
Title | BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning |
Authors | Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, Christian Wachinger |
Abstract | Access to sufficient annotated data is a common challenge in training deep neural networks on medical images. As annotating data is expensive and time-consuming, it is difficult for an individual medical center to reach large enough sample sizes to build their own, personalized models. As an alternative, data from all centers could be pooled to train a centralized model that everyone can use. However, such a strategy is often infeasible due to the privacy-sensitive nature of medical data. Recently, federated learning (FL) has been introduced to collaboratively learn a shared prediction model across centers without the need for sharing data. In FL, clients locally train models on site-specific datasets for a few epochs and then share their model weights with a central server, which orchestrates the overall training process. Importantly, the sharing of models does not compromise patient privacy. A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients. In this paper, we introduce BrainTorrent, a new FL framework without a central server, particularly targeted towards medical applications. BrainTorrent presents a highly dynamic peer-to-peer environment, where all centers directly interact with each other without depending on a central body. We demonstrate the overall effectiveness of FL for the challenging task of whole brain segmentation and observe that the proposed server-less BrainTorrent approach not only outperforms the traditional server-based one but also reaches a similar performance to a model trained on pooled data. |
Tasks | Brain Segmentation |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06731v1 |
PDF | https://arxiv.org/pdf/1905.06731v1.pdf |
PWC | https://paperswithcode.com/paper/braintorrent-a-peer-to-peer-environment-for |
Repo | |
Framework | |
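A single server-less training round can be sketched as follows. This loosely follows the BrainTorrent idea — a client pings its peers, pulls models with newer versions than the one it holds, and merges them before continuing local training — but the plain averaging and the noise stand-in for local SGD are simplifications, not the paper's exact merge rule:

```python
import numpy as np

def braintorrent_round(weights, versions, client, rng):
    """One peer-to-peer round for `client`; no central server involved."""
    peers = [i for i in range(len(weights)) if i != client]
    fresh = [weights[i] for i in peers if versions[i] > versions[client]]
    if fresh:                                   # merge newer peer models (simple mean)
        weights[client] = np.mean([weights[client]] + fresh, axis=0)
    # stand-in for a few epochs of local training on the client's private data:
    weights[client] = weights[client] + 0.01 * rng.standard_normal(weights[client].shape)
    versions[client] += 1

rng = np.random.default_rng(0)
W = [rng.standard_normal(4) for _ in range(3)]  # three centres, toy 4-d "models"
V = [0, 0, 0]
for t in range(9):
    braintorrent_round(W, V, client=t % 3, rng=rng)
```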
Composite Shape Modeling via Latent Space Factorization
Title | Composite Shape Modeling via Latent Space Factorization |
Authors | Anastasia Dubrovina, Fei Xia, Panos Achlioptas, Mira Shalah, Raphael Groscot, Leonidas Guibas |
Abstract | We present a novel neural network architecture, termed Decomposer-Composer, for semantic structure-aware 3D shape modeling. Our method utilizes an auto-encoder-based pipeline, and produces a novel factorized shape embedding space, where the semantic structure of the shape collection translates into a data-dependent sub-space factorization, and where shape composition and decomposition become simple linear operations on the embedding coordinates. We further propose to model shape assembly using an explicit learned part deformation module, which utilizes a 3D spatial transformer network to perform an in-network volumetric grid deformation, and which allows us to train the whole system end-to-end. The resulting network allows us to perform part-level shape manipulation, unattainable by existing approaches. Our extensive ablation study, comparison to baseline methods and qualitative analysis demonstrate the improved performance of the proposed method. |
Tasks | 3D Shape Modeling |
Published | 2019-01-09 |
URL | https://arxiv.org/abs/1901.02968v2 |
PDF | https://arxiv.org/pdf/1901.02968v2.pdf |
PWC | https://paperswithcode.com/paper/composite-shape-modeling-via-latent-space |
Repo | |
Framework | |
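The claim that composition and decomposition "become simple linear operations on the embedding coordinates" can be illustrated directly. The projectors below are random rather than learned, and the part semantics are invented; the point is only the algebra of a factorized embedding space:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_parts = 32, 4
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthonormal basis
blocks = np.array_split(np.arange(d), n_parts)
P = [Q[:, b] @ Q[:, b].T for b in blocks]          # per-part sub-space projectors

z_chair, z_table = rng.standard_normal((2, d))     # stand-in shape embeddings
parts = [Pi @ z_chair for Pi in P]                 # decomposition is a linear map
assert np.allclose(sum(parts), z_chair)            # the projectors sum to identity
# Composition: take part 0 (say, the legs) from the table, the rest from the chair
z_mix = P[0] @ z_table + sum(Pi @ z_chair for Pi in P[1:])
```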