January 26, 2020

3245 words 16 mins read

Paper Group ANR 1394

Information criteria for non-normalized models. Adaptive Embedding Gate for Attention-Based Scene Text Recognition. Music Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions. Grounding Language Attributes to Objects using Bayesian Eigenobjects. Positive-Unlabeled Reward Learning. Fusing Visual, Textual and Connectivity …

Information criteria for non-normalized models

Title Information criteria for non-normalized models
Authors Takeru Matsuda, Masatoshi Uehara, Aapo Hyvarinen
Abstract Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which do not require explicit computation of the normalization constant, such as noise contrastive estimation (NCE) and score matching. However, model selection methods for general non-normalized models have not been proposed so far. In this study, we develop information criteria for non-normalized models estimated by NCE or score matching. They are derived as approximately unbiased estimators of discrepancy measures for non-normalized models. Experimental results demonstrate that the proposed criteria enable selection of the appropriate non-normalized model in a data-driven manner. Extension to a finite mixture of non-normalized models is also discussed.
Tasks Model Selection
Published 2019-05-15
URL https://arxiv.org/abs/1905.05976v1
PDF https://arxiv.org/pdf/1905.05976v1.pdf
PWC https://paperswithcode.com/paper/information-criteria-for-non-normalized
Repo
Framework
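
The NCE estimator this abstract builds on can be sketched as logistic classification between data and noise samples, with the intractable log-normalizer replaced by a free parameter `c`. This is a minimal illustration of plain NCE, not the paper's information criteria; `log_phi` and `log_pn` are hypothetical stand-ins for the unnormalized model log-density and the noise log-density:

```python
import math

def nce_loss(data, noise, log_phi, log_pn, c):
    """Noise contrastive estimation objective: classify data vs. noise
    by logistic regression on the log-density ratio. The unnormalized
    model log_phi gets a learnable log-normalizer c, so the
    normalization constant is never computed explicitly."""
    nu = len(noise) / len(data)            # noise-to-data ratio
    log_model = lambda x: log_phi(x) + c
    loss = 0.0
    for x in data:                         # data should look "real"
        r = log_model(x) - log_pn(x)
        loss -= math.log(1.0 / (1.0 + nu * math.exp(-r)))
    for y in noise:                        # noise should look "noise"
        r = log_model(y) - log_pn(y)
        loss -= math.log(nu / (nu + math.exp(r)))
    return loss / len(data)
```

Minimizing this over the model parameters and `c` gives the NCE estimate; the proposed criteria then correct the optimized objective for model complexity, analogously to how AIC corrects the maximized likelihood.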

Adaptive Embedding Gate for Attention-Based Scene Text Recognition

Title Adaptive Embedding Gate for Attention-Based Scene Text Recognition
Authors Xiaoxue Chen, Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo
Abstract Scene text recognition has attracted particular research interest because it is a very challenging problem and has various applications. The most cutting-edge methods are attentional encoder-decoder frameworks that learn the alignment between the input image and output sequences. In particular, the decoder recurrently outputs predictions, using the prediction of the previous step as guidance at every time step. In this study, we point out that the inappropriate use of previous predictions in existing attention mechanisms restricts recognition performance and brings instability. To handle this problem, we propose a novel module, namely the adaptive embedding gate (AEG). The proposed AEG introduces high-order character language models into the attention mechanism by controlling the information transmission between adjacent characters. AEG is a flexible module and can be easily integrated into state-of-the-art attentional methods. We evaluate its effectiveness as well as robustness on a number of standard benchmarks, including the IIIT5K, SVT, SVT-P, CUTE80, and ICDAR datasets. Experimental results demonstrate that AEG can significantly boost recognition performance and bring better robustness.
Tasks Scene Text Recognition
Published 2019-08-26
URL https://arxiv.org/abs/1908.09475v1
PDF https://arxiv.org/pdf/1908.09475v1.pdf
PWC https://paperswithcode.com/paper/adaptive-embedding-gate-for-attention-based
Repo
Framework
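
The gating idea can be sketched minimally: a scalar sigmoid gate, computed from the current attention context, scales the previous character's embedding before it enters the decoder. All names and the exact parameterization here are illustrative assumptions, not the paper's formulation:

```python
import math

def aeg_step(prev_char_embedding, attention_context, gate_weights):
    """Sketch of an adaptive embedding gate: a sigmoid gate computed
    from the current attention context decides how much information
    from the previously predicted character flows into this step."""
    z = sum(w * a for w, a in zip(gate_weights, attention_context))
    g = 1.0 / (1.0 + math.exp(-z))        # gate value in (0, 1)
    return [g * e for e in prev_char_embedding]
```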

Music Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions

Title Music Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions
Authors Eita Nakamura, Kazuyoshi Yoshii
Abstract Most work on models for music transcription has focused on describing local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on the rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly by sparse transition probabilities of notes or note patterns. This enables us to construct piece-specific models for unseen scores with unfixed repetitive structure and to derive tractable inference algorithms. Moreover, to describe approximate repetitions, we explicitly incorporate a process of modifying the repeated notes/note patterns. We apply these models as a prior music language model for rhythm transcription, where piece-specific score models are inferred from performed MIDI data by unsupervised learning, in contrast to the conventional supervised construction of score models. Evaluations using vocal melodies of popular music showed that the Bayesian models improved the transcription accuracy for most of the tested model types, indicating the universal efficacy of the proposed approach.
Tasks Language Modelling
Published 2019-08-18
URL https://arxiv.org/abs/1908.06969v1
PDF https://arxiv.org/pdf/1908.06969v1.pdf
PWC https://paperswithcode.com/paper/music-transcription-based-on-bayesian-piece
Repo
Framework
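
The "sparse transition probabilities" the abstract relies on can be illustrated with symmetric Dirichlet rows at a small concentration parameter: most probability mass lands on a few successor states, which is how repeated notes or note patterns get encoded indirectly. A minimal sketch, not the paper's full Bayesian model:

```python
import random

def sparse_transition_row(n_states, alpha=0.1, seed=0):
    """One row of a Markov transition matrix drawn from a symmetric
    Dirichlet(alpha), via normalized Gamma draws. Small alpha
    concentrates probability on few transitions."""
    rng = random.Random(seed)
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_states)]
    total = sum(gammas)
    return [g / total for g in gammas]
```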

Grounding Language Attributes to Objects using Bayesian Eigenobjects

Title Grounding Language Attributes to Objects using Bayesian Eigenobjects
Authors Vanya Cohen, Benjamin Burchfiel, Thao Nguyen, Nakul Gopalan, Stefanie Tellex, George Konidaris
Abstract We develop a system to disambiguate object instances within the same class based on simple physical descriptions. The system takes as input a natural language phrase and a depth image containing a segmented object and predicts how similar the observed object is to the object described by the phrase. Our system is designed to learn from only a small amount of human-labeled language data and generalize to viewpoints not represented in the language-annotated depth image training set. By decoupling 3D shape representation from language representation, this method is able to ground language to novel objects using a small amount of language-annotated depth-data and a larger corpus of unlabeled 3D object meshes, even when these objects are partially observed from unusual viewpoints. Our system is able to disambiguate between novel objects, observed via depth images, based on natural language descriptions. Our method also enables view-point transfer; trained on human-annotated data on a small set of depth images captured from frontal viewpoints, our system successfully predicted object attributes from rear views despite having no such depth images in its training set. Finally, we demonstrate our approach on a Baxter robot, enabling it to pick specific objects based on human-provided natural language descriptions.
Tasks 3D Shape Representation
Published 2019-05-30
URL https://arxiv.org/abs/1905.13153v2
PDF https://arxiv.org/pdf/1905.13153v2.pdf
PWC https://paperswithcode.com/paper/grounding-language-attributes-to-objects
Repo
Framework

Positive-Unlabeled Reward Learning

Title Positive-Unlabeled Reward Learning
Authors Danfei Xu, Misha Denil
Abstract Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high reward behaviors that do not correspond to the intended task. These reward delusions can lead to unintended and even dangerous behaviors. On the other hand, adversarial imitation learning frameworks tend to suffer the opposite problem, where the discriminator learns to trivially distinguish agent and expert behavior, resulting in reward models that produce low reward signal regardless of the input state. In this paper, we connect these two classes of reward learning methods to positive-unlabeled (PU) learning, and we show that by applying a large-scale PU learning algorithm to the reward learning problem, we can address both the reward under- and over-estimation problems simultaneously. Our approach drastically improves both GAIL and supervised reward learning, without any additional assumptions.
Tasks Imitation Learning
Published 2019-11-01
URL https://arxiv.org/abs/1911.00459v1
PDF https://arxiv.org/pdf/1911.00459v1.pdf
PWC https://paperswithcode.com/paper/positive-unlabeled-reward-learning
Repo
Framework

Fusing Visual, Textual and Connectivity Clues for Studying Mental Health

Title Fusing Visual, Textual and Connectivity Clues for Studying Mental Health
Authors Amir Hossein Yazdavar, Mohammad Saeid Mahdavinejad, Goonmeet Bajaj, William Romine, Amirhassan Monadjemi, Krishnaprasad Thirunarayan, Amit Sheth, Jyotishman Pathak
Abstract With the ubiquity of social media platforms, millions of people voluntarily and publicly share their online persona by expressing their thoughts, moods, emotions, feelings, and even their daily struggles with mental health. Unlike most existing efforts, which study depression by analyzing textual content, we examine and exploit multimodal big data to discern depressive behavior using a wide variety of features, including individual-level demographics. By developing a multimodal framework and employing statistical techniques to fuse heterogeneous sets of features obtained by processing visual, textual and user interaction data, we significantly improve on current state-of-the-art approaches for identifying depressed individuals on Twitter (improving the average F1-score by 5 percent) and facilitate demographic inference from social media for broader applications. Besides providing insights into the relationship between demographics and mental health, our research assists in the design of a new breed of demographic-aware health interventions.
Tasks
Published 2019-02-19
URL http://arxiv.org/abs/1902.06843v1
PDF http://arxiv.org/pdf/1902.06843v1.pdf
PWC https://paperswithcode.com/paper/fusing-visual-textual-and-connectivity-clues
Repo
Framework

Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems

Title Meta-Learned Per-Instance Algorithm Selection in Scholarly Recommender Systems
Authors Andrew Collins, Joeran Beel
Abstract The effectiveness of recommender system algorithms varies in different real-world scenarios. It is difficult to choose the best algorithm for a scenario due to the quantity of algorithms available and their varying performances. Furthermore, no single algorithm works optimally for all recommendation requests. We apply meta-learning to this problem of algorithm selection for scholarly article recommendation. We train a random forest, gradient boosting machine, and generalized linear model to predict the best algorithm from a pool of content similarity-based algorithms. We evaluate our approach on an offline dataset for scholarly article recommendation and attempt to predict the best algorithm per instance. The best meta-learning model achieved an average increase in F1 of 88% when compared to the average F1 of all base-algorithms (F1; 0.0708 vs 0.0376) and selected the correct base-algorithm significantly more often than chance (paired t-test; p < 0.1). The meta-learner had a 3% higher F1 when compared to the single best base-algorithm (F1; 0.0739 vs 0.0717). We further perform an online evaluation of our approach, conducting an A/B test through our recommender-as-a-service platform Mr. DLib. We deliver 148K recommendations to users between January and March 2019. User engagement was significantly higher for recommendations generated using our meta-learning approach than for a randomly selected algorithm (click-through rate (CTR); 0.51% vs. 0.44%, chi-squared test; p < 0.1); however, our approach did not produce a higher CTR than the best algorithm alone (CTR; MoreLikeThis (Title): 0.58%).
Tasks Meta-Learning, Recommendation Systems
Published 2019-12-18
URL https://arxiv.org/abs/1912.08694v1
PDF https://arxiv.org/pdf/1912.08694v1.pdf
PWC https://paperswithcode.com/paper/meta-learned-per-instance-algorithm-selection
Repo
Framework
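
Stripped to its core, per-instance selection reduces to predicting a quality score (here, expected F1) for each base algorithm from the instance's meta-features and recommending with the argmax. The sketch below assumes a trained `predict_score` meta-model; names are hypothetical:

```python
def select_algorithm(meta_features, algorithms, predict_score):
    """Per-instance algorithm selection: a meta-model predicts the
    expected F1 of each base algorithm for this request, and the
    highest-scoring algorithm serves the recommendation."""
    scores = {alg: predict_score(meta_features, alg) for alg in algorithms}
    return max(scores, key=scores.get)
```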

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Title Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
Authors Lars Buesing, Nicolas Heess, Theophane Weber
Abstract A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, raising the challenge of where in the domain to allocate these function calls in order to construct a good approximate solution. We formulate this problem as an instance of sequential decision-making under uncertainty and leverage methods from reinforcement learning for probabilistic inference with budget constraints. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference. This algorithm caches all previous queries to the density oracle in an explicit search tree, and dynamically allocates new queries based on a “best-first” heuristic for exploration, using existing upper confidence bound methods. Our non-parametric inference method can be effectively combined with neural networks that compile approximate conditionals of the target, which are then used to guide the inference search and enable generalization across multiple target distributions. We show empirically that TreeSample outperforms standard approximate inference methods on synthetic factor graphs.
Tasks Decision Making, Decision Making Under Uncertainty
Published 2019-10-15
URL https://arxiv.org/abs/1910.06862v1
PDF https://arxiv.org/pdf/1910.06862v1.pdf
PWC https://paperswithcode.com/paper/approximate-inference-in-discrete
Repo
Framework
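
The "best-first" heuristic the abstract describes rests on standard upper-confidence-bound scoring over the search tree's cached oracle queries. A minimal sketch of that scoring rule (the exact form TreeSample uses may differ):

```python
import math

def ucb_score(total_value, visits, parent_visits, c=1.4):
    """UCB heuristic for best-first expansion: exploit branches with
    high cached value, explore branches that have consumed few of
    the limited oracle calls."""
    if visits == 0:
        return float("inf")               # always try unvisited branches
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```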

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

Title Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control
Authors Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine
Abstract Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations – for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
Tasks Imitation Learning
Published 2019-10-30
URL https://arxiv.org/abs/1910.14033v1
PDF https://arxiv.org/pdf/1910.14033v1.pdf
PWC https://paperswithcode.com/paper/plan-arithmetic-compositional-plan-vectors
Repo
Framework
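
Since CPVs represent a trajectory as the sum of its subtask vectors, the arithmetic the abstract mentions is plain elementwise addition and subtraction. A minimal sketch (the `remaining` use is an illustrative application of the same arithmetic, assumed rather than quoted from the paper):

```python
def compose(cpv_a, cpv_b):
    """Composing two tasks: add their compositional plan vectors,
    mirroring word2vec-style embedding arithmetic."""
    return [a + b for a, b in zip(cpv_a, cpv_b)]

def remaining(goal_cpv, progress_cpv):
    """What is left to do: the goal vector minus the vector of the
    trajectory executed so far."""
    return [g - p for g, p in zip(goal_cpv, progress_cpv)]
```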

Efficacy of Pixel-Level OOD Detection for Semantic Segmentation

Title Efficacy of Pixel-Level OOD Detection for Semantic Segmentation
Authors Matt Angus, Krzysztof Czarnecki, Rick Salay
Abstract The detection of out-of-distribution samples for image classification has been widely researched. Safety-critical applications, such as autonomous driving, would benefit from the ability to localise the unusual objects causing an image to be out of distribution. This paper adapts state-of-the-art methods for detecting out-of-distribution images in image classification to the new task of detecting out-of-distribution pixels, which can localise the unusual objects. It further experimentally compares the adapted methods on two new datasets derived from existing semantic segmentation datasets using the PSPNet and DeepLabV3+ architectures, and proposes a new metric for the task. The evaluation shows that the performance ranking of the compared methods does not transfer to the new task, and that every method performs significantly worse than its image-level counterpart.
Tasks Autonomous Driving, Image Classification, Semantic Segmentation
Published 2019-11-07
URL https://arxiv.org/abs/1911.02897v1
PDF https://arxiv.org/pdf/1911.02897v1.pdf
PWC https://paperswithcode.com/paper/efficacy-of-pixel-level-ood-detection-for-1
Repo
Framework
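
One of the standard image-level detectors that can be adapted this way is the max-softmax-probability baseline; applied per pixel, each pixel's OOD score is one minus the confidence of its most likely class. A sketch of that adaptation pattern only; the paper compares several methods, and this is not claimed to be its best performer:

```python
import math

def pixel_ood_scores(per_pixel_logits):
    """Per-pixel max-softmax-probability OOD score: low classifier
    confidence at a pixel marks it as likely out of distribution."""
    scores = []
    for logits in per_pixel_logits:
        m = max(logits)                       # stabilize the softmax
        exps = [math.exp(l - m) for l in logits]
        scores.append(1.0 - max(exps) / sum(exps))
    return scores
```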

TiFi: Taxonomy Induction for Fictional Domains [Extended version]

Title TiFi: Taxonomy Induction for Fictional Domains [Extended version]
Authors Cuong Xuan Chu, Simon Razniewski, Gerhard Weikum
Abstract Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision, and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
Tasks
Published 2019-01-29
URL http://arxiv.org/abs/1901.10263v1
PDF http://arxiv.org/pdf/1901.10263v1.pdf
PWC https://paperswithcode.com/paper/tifi-taxonomy-induction-for-fictional-domains
Repo
Framework

Automatic Financial Trading Agent for Low-risk Portfolio Management using Deep Reinforcement Learning

Title Automatic Financial Trading Agent for Low-risk Portfolio Management using Deep Reinforcement Learning
Authors Wonsup Shin, Seok-Jun Bu, Sung-Bae Cho
Abstract The autonomous trading agent is one of the most actively studied areas of artificial intelligence for solving the capital market portfolio management problem. The two primary goals of portfolio management are maximizing profit and restraining risk. However, most approaches to this problem take account only of maximizing returns. This paper therefore proposes a deep reinforcement learning based trading agent that manages the portfolio considering not only profit maximization but also risk restraint. We also propose a new target policy that allows the trading agent to learn to prefer low-risk actions. The new target policy can be reflected in the update by adjusting the greediness for the optimal action through a hyperparameter. We verify the agent's performance on cryptocurrency market data; the cryptocurrency market is a good testing ground for trading agents because a huge amount of data accumulates every minute and market volatility is extremely large. During the test period, our agent achieved a return of 1800% and provided the least risky investment strategy among the existing methods. Another experiment shows that the agent maintains robust, generalized performance even when market volatility is large or the training period is short.
Tasks
Published 2019-09-07
URL https://arxiv.org/abs/1909.03278v1
PDF https://arxiv.org/pdf/1909.03278v1.pdf
PWC https://paperswithcode.com/paper/automatic-financial-trading-agent-for-low
Repo
Framework
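
One plausible reading of the "greediness" knob is a softened target policy: a hyperparameter controls the probability mass placed on the argmax action, with the remainder spread uniformly. This parameterization is an illustrative assumption, not necessarily the paper's exact construction:

```python
def target_policy(q_values, greediness):
    """Soft target policy: probability `greediness` (plus a uniform
    share) on the argmax action, uniform mass elsewhere. Lowering
    greediness tempers the update toward the optimal action, which
    can favor low-risk behavior."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])
    probs = [(1.0 - greediness) / n] * n
    probs[best] += greediness
    return probs
```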

Distributed deep learning for robust multi-site segmentation of CT imaging after traumatic brain injury

Title Distributed deep learning for robust multi-site segmentation of CT imaging after traumatic brain injury
Authors Samuel Remedios, Snehashis Roy, Justin Blaber, Camilo Bermudez, Vishwesh Nath, Mayur B. Patel, John A. Butman, Bennett A. Landman, Dzung L. Pham
Abstract Machine learning models are becoming commonplace in the domain of medical imaging, and with these methods comes an ever-increasing need for more data. However, to preserve patient anonymity it is frequently impractical or prohibited to transfer protected health information (PHI) between institutions. Additionally, due to the nature of some studies, there may not be a large public dataset available on which to train models. To address this conundrum, we analyze the efficacy of transferring the model itself in lieu of data between different sites. By doing so we accomplish two goals: 1) the model gains access to training on a larger dataset that it could not normally obtain and 2) the model better generalizes, having trained on data from separate locations. In this paper, we implement multi-site learning with disparate datasets from the National Institutes of Health (NIH) and Vanderbilt University Medical Center (VUMC) without compromising PHI. Three neural networks are trained to convergence on a computed tomography (CT) brain hematoma segmentation task: one only with NIH data, one only with VUMC data, and one multi-site model alternating between NIH and VUMC data. Resultant lesion masks with the multi-site model attain an average Dice similarity coefficient of 0.64, and the automatically segmented hematoma volumes correlate to those done manually with a Pearson correlation coefficient of 0.87, corresponding to an 8% and 5% improvement, respectively, over the single-site model counterparts.
Tasks Computed Tomography (CT)
Published 2019-03-11
URL http://arxiv.org/abs/1903.04207v1
PDF http://arxiv.org/pdf/1903.04207v1.pdf
PWC https://paperswithcode.com/paper/distributed-deep-learning-for-robust-multi
Repo
Framework
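
The Dice similarity coefficient the segmentation results are scored with (0.64 on average for the multi-site model) is twice the overlap of the two masks divided by their combined size. A minimal sketch on flat binary masks:

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient between two binary masks:
    2 * |intersection| / (|pred| + |truth|). By convention, two
    empty masks score 1.0 (perfect agreement)."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    size = sum(pred_mask) + sum(true_mask)
    return 2.0 * inter / size if size else 1.0
```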

Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study

Title Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study
Authors Erion Çano, Ondřej Bojar
Abstract Using data-driven models for solving text summarization or similar tasks has become very common in the last years. Yet most of the studies report basic accuracy scores only, and nothing is known about the ability of the proposed models to improve when trained on more data. In this paper, we define and propose three data efficiency metrics: data score efficiency, data time deficiency and overall data efficiency. We also propose a simple scheme that uses those metrics and apply it for a more comprehensive evaluation of popular methods on text summarization and title generation tasks. For the latter task, we process and release a huge collection of 35 million abstract-title pairs from scientific articles. Our results reveal that among the tested models, the Transformer is the most efficient on both tasks.
Tasks Text Summarization
Published 2019-09-14
URL https://arxiv.org/abs/1909.06618v1
PDF https://arxiv.org/pdf/1909.06618v1.pdf
PWC https://paperswithcode.com/paper/efficiency-metrics-for-data-driven-models-a
Repo
Framework
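
One plausible formalization of a data score efficiency metric is an elasticity: relative score improvement divided by the relative growth in training data. This formula is an assumption for illustration, not necessarily the paper's exact definition:

```python
def data_score_efficiency(score_small, score_big, n_small, n_big):
    """Hedged sketch of a data score efficiency metric: how much
    relative score gain a model extracts per unit of relative
    training-data growth. Higher means the model uses extra data
    more efficiently."""
    score_gain = (score_big - score_small) / score_small
    data_growth = (n_big - n_small) / n_small
    return score_gain / data_growth
```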

Detection and Classification of Breast Cancer Metastases Based on U-Net

Title Detection and Classification of Breast Cancer Metastases Based on U-Net
Authors Lin Xu, Cheng Xu, Yi Tong, Yu Chun Su
Abstract This paper presents U-Net based breast cancer metastases detection and classification in lymph nodes, as well as patient-level classification based on metastases detection. The whole pipeline can be divided into five steps: preprocessing and data augmentation, patch-based segmentation, post-processing, slide-level classification, and patient-level classification. To reduce overfitting and speed up convergence, we added batch normalization and dropout to U-Net. The final Kappa score reaches 0.902 on training data.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.04141v1
PDF https://arxiv.org/pdf/1909.04141v1.pdf
PWC https://paperswithcode.com/paper/detection-and-classification-of-breast-cancer
Repo
Framework
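
The Kappa score reported above is Cohen's kappa: observed agreement between predicted and reference labels, corrected for the agreement expected by chance. A minimal sketch:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences:
    (p_observed - p_chance) / (1 - p_chance), where p_chance comes
    from the marginal label frequencies of each sequence."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    classes = set(labels_a) | set(labels_b)
    p_chance = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in classes)
    return (p_observed - p_chance) / (1.0 - p_chance)
```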