January 27, 2020

Paper Group ANR 1306

An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks

Title An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks
Authors Charles P Martin, Jim Torresen
Abstract This paper is about creating digital musical instruments where a predictive neural network model is integrated into the interactive system. Rather than predicting symbolic music (e.g., MIDI notes), we suggest that predicting future control data from the user and precise temporal information can lead to new and interesting interactive possibilities. We propose that a mixture density recurrent neural network (MDRNN) is an appropriate model for this task. The predictions can be used to fill in control data when the user stops performing, or as a kind of filter on the user’s own input. We present an interactive MDRNN prediction server that allows rapid prototyping of new NIMEs featuring predictive musical interaction by recording datasets, training MDRNN models, and experimenting with interaction modes. We illustrate our system with several example NIMEs applying this idea. Our evaluation shows that real-time predictive interaction is viable even on single-board computers and that small models are appropriate for small datasets.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05009v1
PDF http://arxiv.org/pdf/1904.05009v1.pdf
PWC https://paperswithcode.com/paper/an-interactive-musical-prediction-system-with
Repo
Framework
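As a rough illustration of the mixture density output described in the abstract above, the sketch below draws a control value from a one-dimensional Gaussian mixture: pick a component according to its mixing weight, then sample from that Gaussian. The function names, the three-component setup, and all parameter values are hypothetical and are not taken from the paper, whose model also predicts timing information.

```python
import math
import random


def softmax(logits):
    """Turn raw network outputs into mixing weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mixture(weights, means, stds, rng):
    """Draw one value from a 1D Gaussian mixture.

    weights: mixing coefficients (non-negative, summing to 1)
    means, stds: per-component Gaussian parameters
    """
    # Pick a component in proportion to its mixing weight.
    component = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample from the chosen Gaussian.
    return rng.gauss(means[component], stds[component])


# Hypothetical mixture parameters for a single time step (assumed values).
rng = random.Random(0)
weights = softmax([2.0, 0.5, -1.0])
means = [0.2, 0.5, 0.9]   # e.g. a control position in [0, 1]
stds = [0.01, 0.05, 0.1]
samples = [sample_mixture(weights, means, stds, rng) for _ in range(5)]
```

In an interactive setting, each sampled value would be fed back as the next input whenever the performer stops playing; that loop is omitted here.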

ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT

Title ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT
Authors Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
Abstract Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 and 400,000 images in full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks, i.e., text detection and end-to-end text spotting, with 132 valid submissions. This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2019-LSVT challenge.
Tasks Text Spotting
Published 2019-09-17
URL https://arxiv.org/abs/1909.07741v1
PDF https://arxiv.org/pdf/1909.07741v1.pdf
PWC https://paperswithcode.com/paper/icdar-2019-competition-on-large-scale-street
Repo
Framework

A Unified Point-Based Framework for 3D Segmentation

Title A Unified Point-Based Framework for 3D Segmentation
Authors Hung-Yueh Chiang, Yen-Liang Lin, Yueh-Cheng Liu, Winston H. Hsu
Abstract 3D point cloud segmentation remains challenging for structureless and textureless regions. We present a new unified point-based framework for 3D point cloud segmentation that effectively optimizes pixel-level features, geometrical structures and global context priors of an entire scene. By back-projecting 2D image features into 3D coordinates, our network learns 2D textural appearance and 3D structural features in a unified framework. In addition, we investigate a global context prior to obtain a better prediction. We evaluate our framework on the ScanNet online benchmark and show that our method outperforms several state-of-the-art approaches. We explore synthesizing camera poses in 3D reconstructed scenes for achieving higher performance. In-depth analyses of feature combinations and synthetic camera poses verify that features from different modalities benefit each other and that dense camera pose sampling further improves the segmentation results.
Tasks
Published 2019-08-01
URL https://arxiv.org/abs/1908.00478v4
PDF https://arxiv.org/pdf/1908.00478v4.pdf
PWC https://paperswithcode.com/paper/a-unified-point-based-framework-for-3d
Repo
Framework

Neural Machine Translation with Recurrent Highway Networks

Title Neural Machine Translation with Recurrent Highway Networks
Authors Maulik Parmar, V. Susheela Devi
Abstract Recurrent Neural Networks have lately gained a lot of popularity in language modelling tasks, especially in neural machine translation (NMT). Very recent NMT models are based on the Encoder-Decoder architecture, where a deep LSTM-based encoder is used to project the source sentence to a fixed-dimensional vector and then another deep LSTM decodes the target sentence from the vector. However, there has been very little work on exploring architectures that have more than one layer in space (i.e., in each time step). This paper examines the effectiveness of simple Recurrent Highway Networks (RHN) in NMT tasks. The model uses a Recurrent Highway Neural Network in the encoder and decoder, with attention. We also explore the reconstructor model to improve adequacy. We demonstrate the effectiveness of all three approaches on the IWSLT English-Vietnamese dataset. We see that RHN performs on par with LSTM-based models and even better in some cases. We see that deep RHN models are easy to train compared to deep LSTM-based models because of highway connections. The paper also investigates the effects of increasing recurrent depth in each time step.
Tasks Language Modelling, Machine Translation
Published 2019-04-28
URL http://arxiv.org/abs/1905.01996v1
PDF http://arxiv.org/pdf/1905.01996v1.pdf
PWC https://paperswithcode.com/paper/190501996
Repo
Framework
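To make "recurrent depth in each time step" concrete, here is a toy scalar sketch of one recurrent highway step with a coupled carry gate (carry = 1 - transform). It follows the general recurrent highway formulation rather than this paper's exact encoder-decoder model; the scalar weights, the function names, and the example values are assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rhn_step(x, state, layers):
    """One time step of a scalar recurrent highway cell.

    layers: list of (w_h, r_h, b_h, w_t, r_t, b_t) tuples, one per
    micro-layer; the list length is the recurrent depth.
    """
    for depth, (w_h, r_h, b_h, w_t, r_t, b_t) in enumerate(layers):
        # The external input feeds only the first micro-layer.
        inp = x if depth == 0 else 0.0
        candidate = math.tanh(w_h * inp + r_h * state + b_h)
        transform = sigmoid(w_t * inp + r_t * state + b_t)
        # Highway update: blend the candidate with the carried state.
        state = transform * candidate + (1.0 - transform) * state
    return state


# Hypothetical weights: recurrent depth 4, shared across micro-layers.
layers = [(0.8, 0.5, 0.0, 1.0, 0.2, 0.0)] * 4
state = 0.0
for x in [0.5, -0.3, 0.9]:
    state = rhn_step(x, state, layers)
```

Because each micro-layer can pass its state through largely unchanged when the transform gate is near zero, stacking more of them within a time step adds depth without forcing every layer to rewrite the state.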

Transfer Learning with Sparse Associative Memories

Title Transfer Learning with Sparse Associative Memories
Authors Quentin Jodelet, Vincent Gripon, Masafumi Hagiwara
Abstract In this paper, we introduce a novel layer designed to be used as the output of pre-trained neural networks in the context of classification. Based on Associative Memories, this layer can help design Deep Neural Networks which support incremental learning and that can be (partially) trained in real time on embedded devices. Experiments on the ImageNet dataset and other different domain specific datasets show that it is possible to design more flexible and faster-to-train Neural Networks at the cost of a slight decrease in accuracy.
Tasks Transfer Learning
Published 2019-04-04
URL https://arxiv.org/abs/1904.02420v3
PDF https://arxiv.org/pdf/1904.02420v3.pdf
PWC https://paperswithcode.com/paper/transfer-learning-with-sparse-associative
Repo
Framework

Towards Unconstrained End-to-End Text Spotting

Title Towards Unconstrained End-to-End Text Spotting
Authors Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao
Abstract We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%.
Tasks Instance Segmentation, Optical Character Recognition, Semantic Segmentation, Text Spotting
Published 2019-08-24
URL https://arxiv.org/abs/1908.09231v1
PDF https://arxiv.org/pdf/1908.09231v1.pdf
PWC https://paperswithcode.com/paper/towards-unconstrained-end-to-end-text
Repo
Framework

Mutex Graphs and Multicliques: Reducing Grounding Size for Planning

Title Mutex Graphs and Multicliques: Reducing Grounding Size for Planning
Authors David Spies, Jia-Huai You, Ryan Hayward
Abstract We present an approach to representing large sets of mutual exclusions, also known as mutexes or mutex constraints. These are the types of constraints that specify the exclusion of some properties, events, processes, and so on. They are ubiquitous in many areas of applications. The size of these constraints for a given problem can be overwhelming enough to present a bottleneck for the solving efficiency of the underlying solver. In this paper, we propose a novel graph-theoretic technique based on multicliques for a compact representation of mutex constraints and apply it to domain-independent planning in ASP. As computing a minimum multiclique covering from a mutex graph is NP-hard, we propose an efficient approximation algorithm for multiclique covering and show experimentally that it generates substantially smaller grounding size for mutex constraints in ASP than the previously known work in SAT.
Tasks
Published 2019-09-18
URL https://arxiv.org/abs/1909.08240v1
PDF https://arxiv.org/pdf/1909.08240v1.pdf
PWC https://paperswithcode.com/paper/mutex-graphs-and-multicliques-reducing
Repo
Framework

Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving

Title Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving
Authors Krzysztof Czarnecki, Rick Salay
Abstract Perception is a safety-critical function of autonomous vehicles and machine learning (ML) plays a key role in its implementation. This position paper identifies (1) perceptual uncertainty as a performance measure used to define safety requirements and (2) its influence factors when using supervised ML. This work is a first step towards a framework for measuring and controlling the effects of these factors and supplying evidence to support claims about perceptual uncertainty.
Tasks Autonomous Vehicles
Published 2019-03-03
URL http://arxiv.org/abs/1903.03438v1
PDF http://arxiv.org/pdf/1903.03438v1.pdf
PWC https://paperswithcode.com/paper/towards-a-framework-to-manage-perceptual
Repo
Framework

Hybrid Precoding for Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach

Title Hybrid Precoding for Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach
Authors Ahmet M. Elbir, Anastasios Papazafeiropoulos
Abstract In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task to lower the complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either incur higher complexity or have suboptimal performance. Moreover, the performance of these methods mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework to improve the performance and reduce computation time as compared to conventional techniques. In fact, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts as input an imperfect channel matrix and gives the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm to select the analog precoder and combiners from a predefined codebook maximizing the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO where the input-output pairs are obtained. We evaluate the performance of the proposed method through numerous and extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections regarding the channel matrix. On top of this, the proposed approach requires less computation time in comparison to the optimization and codebook based approaches.
Tasks
Published 2019-11-11
URL https://arxiv.org/abs/1911.04239v1
PDF https://arxiv.org/pdf/1911.04239v1.pdf
PWC https://paperswithcode.com/paper/hybrid-precoding-for-multi-user-millimeter
Repo
Framework

Exponential Family Graph Embeddings

Title Exponential Family Graph Embeddings
Authors Abdulkadir Çelikkanat, Fragkiskos D. Malliaros
Abstract Representing networks in a low dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional Skip-Gram model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
Tasks Graph Embedding, Link Prediction, Node Classification, Representation Learning
Published 2019-11-20
URL https://arxiv.org/abs/1911.09007v1
PDF https://arxiv.org/pdf/1911.09007v1.pdf
PWC https://paperswithcode.com/paper/exponential-family-graph-embeddings
Repo
Framework

Decision Propagation Networks for Image Classification

Title Decision Propagation Networks for Image Classification
Authors Keke Tang, Peng Song, Yuexin Ma, Zhaoquan Gu, Yu Su, Zhihong Tian, Wenping Wang
Abstract High-level (e.g., semantic) features encoded in the later layers of convolutional neural networks are extensively exploited for image classification, leaving low-level (e.g., color) features in the early layers underexplored. In this paper, we propose a novel Decision Propagation Module (DPM) to make an intermediate decision that could act as category-coherent guidance extracted from early layers, and then propagate it to the later layers. Therefore, by stacking a collection of DPMs into a classification network, the generated Decision Propagation Network is explicitly formulated to progressively encode more discriminative features guided by the decision, and then refine the decision based on the newly generated features layer by layer. Comprehensive results on four publicly available datasets validate that DPM could bring significant improvements for existing classification networks with minimal additional computational cost and is superior to the state-of-the-art methods.
Tasks Image Classification
Published 2019-11-27
URL https://arxiv.org/abs/1911.12101v1
PDF https://arxiv.org/pdf/1911.12101v1.pdf
PWC https://paperswithcode.com/paper/decision-propagation-networks-for-image
Repo
Framework

Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning

Title Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Authors Junchao Zhang, Yuxin Peng
Abstract Video captioning aims to automatically generate natural language descriptions of video content, which has drawn a lot of attention in recent years. Generating accurate and fine-grained captions requires not only understanding the global content of the video, but also capturing detailed object information. Meanwhile, video representations have great impact on the quality of generated captions. Thus, it is important for video captioning to capture salient objects with their detailed temporal dynamics, and represent them using discriminative spatio-temporal representations. In this paper, we propose a new video captioning approach based on object-aware aggregation with bidirectional temporal graph (OA-BTG), which captures detailed temporal dynamics for salient objects in video, and learns discriminative spatio-temporal representations by performing object-aware local feature aggregation on detected object regions. The main novelties and advantages are: (1) Bidirectional temporal graph: A bidirectional temporal graph is constructed along and reversely along the temporal order, which provides complementary ways to capture the temporal trajectories for each salient object. (2) Object-aware aggregation: Learnable VLAD (Vector of Locally Aggregated Descriptors) models are constructed on object temporal trajectories and global frame sequence, which perform object-aware aggregation to learn discriminative representations. A hierarchical attention mechanism is also developed to distinguish different contributions of multiple objects. Experiments on two widely-used datasets demonstrate that our OA-BTG achieves state-of-the-art performance in terms of BLEU@4, METEOR and CIDEr metrics.
Tasks Video Captioning
Published 2019-06-11
URL https://arxiv.org/abs/1906.04375v1
PDF https://arxiv.org/pdf/1906.04375v1.pdf
PWC https://paperswithcode.com/paper/object-aware-aggregation-with-bidirectional-1
Repo
Framework

Confidentiality and linked data

Title Confidentiality and linked data
Authors Felix Ritchie, Jim Smith
Abstract Data providers such as government statistical agencies perform a balancing act: maximising information published to inform decision-making and research, while simultaneously protecting privacy. The emergence of identified administrative datasets with the potential for sharing (and thus linking) offers huge potential benefits but significant additional risks. This article introduces the principles and methods of linking data across different sources and points in time, focusing on potential areas of risk. We then consider confidentiality risk, focusing in particular on the “intruder” problem central to the area, and looking at both risks from data producer outputs and from the release of micro-data for further analysis. Finally, we briefly consider potential solutions to micro-data release, both the statistical solutions considered in other contributed articles and non-statistical solutions.
Tasks Decision Making
Published 2019-07-15
URL https://arxiv.org/abs/1907.06465v1
PDF https://arxiv.org/pdf/1907.06465v1.pdf
PWC https://paperswithcode.com/paper/confidentiality-and-linked-data
Repo
Framework

Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media

Title Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media
Authors Salvatore Giorgi, Veronica Lynn, Sandra Matz, Lyle Ungar, H. Andrew Schwartz
Abstract Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population, a “selection bias”. Across five tasks for predicting US county population health statistics from Twitter, we explore standard restratification techniques: bias mitigation approaches that reweight people-specific variables according to how under-sampled their socio-demographic groups are. We found standard restratification provided no improvement and often degraded population prediction accuracy. The core reason for this seemed to be that the estimates of each population’s socio-demographics are both shrunken and sparse. We thus develop and evaluate three methods to address this: predictive redistribution to account for shrinking, as well as adaptive binning and informed smoothing to handle sparse socio-demographic estimates. We show each of our methods can significantly improve over the standard restratification approaches. Combining approaches, we find substantial improvements over non-restratified models as well, yielding a 35.4% increase in variance explained for predicting surveyed life satisfaction, and a 10.0% average increase across all tasks.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03855v1
PDF https://arxiv.org/pdf/1911.03855v1.pdf
PWC https://paperswithcode.com/paper/correcting-sociodemographic-selection-biases
Repo
Framework
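The "standard restratification" baseline mentioned in the abstract above reweights each person by how under-sampled their group is. A minimal sketch of that idea, assuming simple post-stratification weights (population share divided by sample share), is shown below. The group labels, scores, and population shares are made up for illustration, and the paper's own corrections (predictive redistribution, adaptive binning, informed smoothing) are not implemented here.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_share):
    """Weight each sampled user by population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_share[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical county: young users are over-represented in the sample.
groups = ["young", "young", "young", "old"]
scores = [1.0, 1.0, 1.0, 5.0]
population = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, population)
estimate = weighted_mean(scores, weights)      # reweighted estimate
unweighted = sum(scores) / len(scores)         # naive sample mean
```

With only one "old" user in the sample, that single score carries half of the total weight, which hints at why sparse group estimates can make this kind of reweighting unstable.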

KCAT: A Knowledge-Constraint Typing Annotation Tool

Title KCAT: A Knowledge-Constraint Typing Annotation Tool
Authors Sheng Lin, Luye Zheng, Bo Chen, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren
Abstract Fine-grained Entity Typing is a tough task which suffers from noisy samples extracted from distant supervision. Thousands of manually annotated samples can achieve greater performance than millions of samples generated by the previous distant supervision method. However, it is hard for human beings to differentiate and memorize thousands of types, which makes large-scale human labeling hardly possible. In this paper, we introduce a Knowledge-Constraint Typing Annotation Tool (KCAT), which is efficient for fine-grained entity typing annotation. KCAT reduces the size of candidate types to an acceptable range for human beings through entity linking and provides a Multi-step Typing scheme to revise the entity linking result. Moreover, KCAT provides an efficient Annotator Client to accelerate the annotation process and a comprehensive Manager Module to analyse crowdsourcing annotations. Experiments show that KCAT can significantly improve annotation efficiency, and that the time consumption increases slowly as the size of the type set expands.
Tasks Entity Linking, Entity Typing
Published 2019-06-13
URL https://arxiv.org/abs/1906.05670v1
PDF https://arxiv.org/pdf/1906.05670v1.pdf
PWC https://paperswithcode.com/paper/kcat-a-knowledge-constraint-typing-annotation
Repo
Framework