Paper Group ANR 1306
An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks. ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT. A Unified Point-Based Framework for 3D Segmentation. Neural Machine Translation with Recurrent Highway Networks. Transfer Learning with Sparse Associative Memories. Toward …
An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks
Title | An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks |
Authors | Charles P Martin, Jim Torresen |
Abstract | This paper is about creating digital musical instruments where a predictive neural network model is integrated into the interactive system. Rather than predicting symbolic music (e.g., MIDI notes), we suggest that predicting future control data from the user and precise temporal information can lead to new and interesting interactive possibilities. We propose that a mixture density recurrent neural network (MDRNN) is an appropriate model for this task. The predictions can be used to fill in control data when the user stops performing, or as a kind of filter on the user’s own input. We present an interactive MDRNN prediction server that allows rapid prototyping of new NIMEs featuring predictive musical interaction by recording datasets, training MDRNN models, and experimenting with interaction modes. We illustrate our system with several example NIMEs applying this idea. Our evaluation shows that real-time predictive interaction is viable even on single-board computers and that small models are appropriate for small datasets. |
Tasks | |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05009v1 |
PDF | http://arxiv.org/pdf/1904.05009v1.pdf |
PWC | https://paperswithcode.com/paper/an-interactive-musical-prediction-system-with |
Repo | |
Framework | |
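The model class this entry describes is a mixture density RNN: a recurrent network whose output head parameterises a Gaussian mixture over the next continuous control vector (plus timing). The sketch below is a minimal PyTorch illustration with made-up dimensions and names; it is not the authors' prediction server.

```python
# Minimal MDRNN sketch: an LSTM whose head outputs mixture weights, means and
# scales for a Gaussian mixture over the next control vector. Sizes illustrative.
import torch
import torch.nn as nn

class MDRNN(nn.Module):
    def __init__(self, dim=2, hidden=64, mixtures=5):
        super().__init__()
        self.dim, self.mixtures = dim, mixtures
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        # Per mixture component: one weight, plus a mean and scale per output dim.
        self.head = nn.Linear(hidden, mixtures * (1 + 2 * dim))

    def forward(self, x, state=None):
        h, state = self.rnn(x, state)
        out = self.head(h)
        m, d = self.mixtures, self.dim
        logit_pi, mu, log_sigma = torch.split(out, [m, m * d, m * d], dim=-1)
        pi = torch.softmax(logit_pi, dim=-1)             # mixture weights
        mu = mu.reshape(*mu.shape[:-1], m, d)            # component means
        sigma = torch.exp(log_sigma).reshape(*mu.shape)  # positive scales
        return pi, mu, sigma, state
```

At performance time the next control event would be sampled from the predicted mixture, which is what allows the same model either to fill in data when the performer stops or to act as a filter on live input.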
ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT
Title | ICDAR 2019 Competition on Large-scale Street View Text with Partial Labeling – RRC-LSVT |
Authors | Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin |
Abstract | Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 and 400,000 images with full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two proposed tasks, i.e., text detection and end-to-end text spotting, with 132 valid submissions. This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2019-LSVT challenge. |
Tasks | Text Spotting |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07741v1 |
PDF | https://arxiv.org/pdf/1909.07741v1.pdf |
PWC | https://paperswithcode.com/paper/icdar-2019-competition-on-large-scale-street |
Repo | |
Framework | |
A Unified Point-Based Framework for 3D Segmentation
Title | A Unified Point-Based Framework for 3D Segmentation |
Authors | Hung-Yueh Chiang, Yen-Liang Lin, Yueh-Cheng Liu, Winston H. Hsu |
Abstract | 3D point cloud segmentation remains challenging for structureless and textureless regions. We present a new unified point-based framework for 3D point cloud segmentation that effectively optimizes pixel-level features, geometrical structures and global context priors of an entire scene. By back-projecting 2D image features into 3D coordinates, our network learns 2D textural appearance and 3D structural features in a unified framework. In addition, we investigate a global context prior to obtain a better prediction. We evaluate our framework on the ScanNet online benchmark and show that our method outperforms several state-of-the-art approaches. We explore synthesizing camera poses in 3D reconstructed scenes to achieve higher performance. In-depth analysis of feature combinations and synthetic camera poses verifies that features from different modalities benefit each other and that dense camera pose sampling further improves the segmentation results. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00478v4 |
PDF | https://arxiv.org/pdf/1908.00478v4.pdf |
PWC | https://paperswithcode.com/paper/a-unified-point-based-framework-for-3d |
Repo | |
Framework | |
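The central data move in this abstract, back-projecting 2D image features onto 3D points, can be illustrated with a generic pinhole-camera sketch. Conventions (world-frame points, a 4x4 camera-to-world pose, intrinsics K, nearest-pixel lookup) are assumptions for illustration, not the paper's pipeline.

```python
# Sketch: gather a 2D CNN feature vector for every 3D point by projecting the
# point into the image with intrinsics K and the camera pose. Points outside the
# frustum are simply clamped here; a real implementation would mask them out.
import numpy as np

def backproject_features(points_world, feat_map, K, cam_to_world):
    world_to_cam = np.linalg.inv(cam_to_world)                  # 4x4 pose inverse
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]                 # camera frame
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                               # perspective divide
    h, w, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]                                       # (N, C) per-point 2D features
```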
Neural Machine Translation with Recurrent Highway Networks
Title | Neural Machine Translation with Recurrent Highway Networks |
Authors | Maulik Parmar, V. Susheela Devi |
Abstract | Recurrent Neural Networks have lately gained a lot of popularity in language modelling tasks, especially in neural machine translation (NMT). Very recent NMT models are based on the Encoder-Decoder framework, where a deep LSTM-based encoder projects the source sentence to a fixed-dimensional vector and another deep LSTM decodes the target sentence from that vector. However, there has been very little work on exploring architectures that have more than one layer in space (i.e., in each time step). This paper examines the effectiveness of simple Recurrent Highway Networks (RHN) in NMT tasks. The model uses a Recurrent Highway Network in the encoder and decoder, with attention. We also explore the reconstructor model to improve adequacy. We demonstrate the effectiveness of all three approaches on the IWSLT English-Vietnamese dataset. We see that RHN performs on par with LSTM-based models and even better in some cases. We also see that deep RHN models are easier to train than deep LSTM-based models because of the highway connections. The paper also investigates the effects of increasing recurrent depth in each time step. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-04-28 |
URL | http://arxiv.org/abs/1905.01996v1 |
PDF | http://arxiv.org/pdf/1905.01996v1.pdf |
PWC | https://paperswithcode.com/paper/190501996 |
Repo | |
Framework | |
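For readers unfamiliar with Recurrent Highway Networks: the defining feature is several highway micro-layers inside each time step, so recurrence depth grows without stacking separate RNN layers. The sketch below follows the standard RHN formulation with a coupled carry gate; it is a generic illustration, not the paper's code.

```python
# One RHN time step with `depth` highway micro-layers. The input feeds only the
# first micro-layer; each layer mixes a candidate h with the carried state via a
# transform gate t (carry gate coupled as 1 - t). Sketch under stated assumptions.
import torch
import torch.nn as nn

class RHNCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, depth=3):
        super().__init__()
        self.depth = depth
        self.x_proj = nn.Linear(input_dim, 2 * hidden_dim)          # layer 0 only
        self.s_proj = nn.ModuleList(
            [nn.Linear(hidden_dim, 2 * hidden_dim) for _ in range(depth)])

    def forward(self, x, s):
        for l in range(self.depth):
            pre = self.s_proj[l](s)
            if l == 0:
                pre = pre + self.x_proj(x)
            h_pre, t_pre = pre.chunk(2, dim=-1)
            h = torch.tanh(h_pre)                                    # candidate state
            t = torch.sigmoid(t_pre)                                 # transform gate
            s = h * t + s * (1.0 - t)                                # coupled carry
        return s
```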
Transfer Learning with Sparse Associative Memories
Title | Transfer Learning with Sparse Associative Memories |
Authors | Quentin Jodelet, Vincent Gripon, Masafumi Hagiwara |
Abstract | In this paper, we introduce a novel layer designed to be used as the output of pre-trained neural networks in the context of classification. Based on Associative Memories, this layer can help design Deep Neural Networks which support incremental learning and that can be (partially) trained in real time on embedded devices. Experiments on the ImageNet dataset and other domain-specific datasets show that it is possible to design more flexible and faster-to-train Neural Networks at the cost of a slight decrease in accuracy. |
Tasks | Transfer Learning |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02420v3 |
PDF | https://arxiv.org/pdf/1904.02420v3.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-with-sparse-associative |
Repo | |
Framework | |
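To make the idea concrete, here is a heavily simplified, generic associative-memory-style classification head over frozen pre-trained features: each class keeps a sparse binary signature, examples can be added incrementally without any gradient pass, and prediction is by code overlap. This only illustrates the flavour of such a layer and is not the construction proposed in the paper.

```python
# Generic sparse associative-memory-style output layer: binarise the k strongest
# feature activations, OR them into a per-class signature, classify by overlap.
import numpy as np

class SparseAMClassifier:
    def __init__(self, feat_dim, active_bits=32):
        self.k = active_bits
        self.feat_dim = feat_dim
        self.signatures = {}                          # class label -> binary code

    def _encode(self, feat):
        code = np.zeros(self.feat_dim, dtype=np.uint8)
        code[np.argsort(feat)[-self.k:]] = 1          # keep k strongest activations
        return code

    def add_example(self, feat, label):               # incremental, no gradients
        code = self._encode(feat)
        self.signatures[label] = np.maximum(self.signatures.get(label, 0), code)

    def predict(self, feat):
        code = self._encode(feat)
        return max(self.signatures,
                   key=lambda c: int(np.count_nonzero(code & self.signatures[c])))
```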
Towards Unconstrained End-to-End Text Spotting
Title | Towards Unconstrained End-to-End Text Spotting |
Authors | Siyang Qin, Alessandro Bissacco, Michalis Raptis, Yasuhisa Fujii, Ying Xiao |
Abstract | We propose an end-to-end trainable network that can simultaneously detect and recognize text of arbitrary shape, making substantial progress on the open problem of reading scene text of irregular shape. We formulate arbitrary shape text detection as an instance segmentation problem; an attention model is then used to decode the textual content of each irregularly shaped text region without rectification. To extract useful irregularly shaped text instance features from image scale features, we propose a simple yet effective RoI masking step. Additionally, we show that predictions from an existing multi-step OCR engine can be leveraged as partially labeled training data, which leads to significant improvements in both the detection and recognition accuracy of our model. Our method surpasses the state-of-the-art for end-to-end recognition tasks on the ICDAR15 (straight) benchmark by 4.6%, and on the Total-Text (curved) benchmark by more than 16%. |
Tasks | Instance Segmentation, Optical Character Recognition, Semantic Segmentation, Text Spotting |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09231v1 |
PDF | https://arxiv.org/pdf/1908.09231v1.pdf |
PWC | https://paperswithcode.com/paper/towards-unconstrained-end-to-end-text |
Repo | |
Framework | |
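The "RoI masking" step mentioned in the abstract can be read as: crop features for a candidate region, then zero out positions outside the (possibly curved) text instance before the attention decoder reads them. A toy sketch under that reading, with illustrative shapes:

```python
# Toy RoI masking: keep only feature positions inside the predicted text-instance
# mask so the decoder is not distracted by neighbouring text or background.
# Shapes are illustrative: roi_features is (C, H, W), instance_mask is (H, W).
import torch

def roi_mask(roi_features: torch.Tensor, instance_mask: torch.Tensor) -> torch.Tensor:
    return roi_features * instance_mask.unsqueeze(0).to(roi_features.dtype)
```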
Mutex Graphs and Multicliques: Reducing Grounding Size for Planning
Title | Mutex Graphs and Multicliques: Reducing Grounding Size for Planning |
Authors | David Spies, Jia-Huai You, Ryan Hayward |
Abstract | We present an approach to representing large sets of mutual exclusions, also known as mutexes or mutex constraints. These are the types of constraints that specify the exclusion of some properties, events, processes, and so on. They are ubiquitous in many areas of applications. The size of these constraints for a given problem can be overwhelming enough to present a bottleneck for the solving efficiency of the underlying solver. In this paper, we propose a novel graph-theoretic technique based on multicliques for a compact representation of mutex constraints and apply it to domain-independent planning in ASP. As computing a minimum multiclique covering from a mutex graph is NP-hard, we propose an efficient approximation algorithm for multiclique covering and show experimentally that it generates substantially smaller grounding size for mutex constraints in ASP than the previously known work in SAT. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08240v1 |
PDF | https://arxiv.org/pdf/1909.08240v1.pdf |
PWC | https://paperswithcode.com/paper/mutex-graphs-and-multicliques-reducing |
Repo | |
Framework | |
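The motivation is easiest to see in the simpler clique special case: a clique in the mutex graph means "at most one of these atoms can hold", so covering the edges with few cliques replaces a quadratic number of pairwise constraints with a handful of cardinality constraints. The greedy heuristic below illustrates that compression idea only; it is not the paper's multiclique covering algorithm.

```python
# Greedy edge clique cover of a mutex graph: each emitted clique can be encoded
# as a single "at most one of these" constraint instead of ~|clique|^2/2 pairwise
# mutexes. Illustrative heuristic, not the paper's multiclique approach.
def greedy_clique_cover(adjacency):
    """adjacency: dict mapping each atom to the set of atoms it is mutex with."""
    uncovered = {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}
    cliques = []
    while uncovered:
        u, v = tuple(next(iter(uncovered)))
        clique = {u, v}
        # Grow the clique with vertices adjacent to everything already in it.
        for w in adjacency:
            if w not in clique and clique <= adjacency[w]:
                clique.add(w)
        cliques.append(clique)
        uncovered -= {frozenset((a, b)) for a in clique for b in clique if a != b}
    return cliques
```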
Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving
Title | Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving |
Authors | Krzysztof Czarnecki, Rick Salay |
Abstract | Perception is a safety-critical function of autonomous vehicles and machine learning (ML) plays a key role in its implementation. This position paper identifies (1) perceptual uncertainty as a performance measure used to define safety requirements and (2) its influence factors when using supervised ML. This work is a first step towards a framework for measuring and controlling the effects of these factors and supplying evidence to support claims about perceptual uncertainty. |
Tasks | Autonomous Vehicles |
Published | 2019-03-03 |
URL | http://arxiv.org/abs/1903.03438v1 |
PDF | http://arxiv.org/pdf/1903.03438v1.pdf |
PWC | https://paperswithcode.com/paper/towards-a-framework-to-manage-perceptual |
Repo | |
Framework | |
Hybrid Precoding for Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach
Title | Hybrid Precoding for Multi-User Millimeter Wave Massive MIMO Systems: A Deep Learning Approach |
Authors | Ahmet M. Elbir, Anastasios Papazafeiropoulos |
Abstract | In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task to lower the complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either incur high complexity or have sub-optimal performance. Moreover, the performance of these methods mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework that improves performance and requires less computation time compared to conventional techniques. In fact, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts an imperfect channel matrix as input and gives the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm to select the analog precoder and combiners from a predefined codebook by maximizing the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO, where the input-output pairs are obtained. We evaluate the performance of the proposed method through extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections in the channel matrix. On top of this, the proposed approach requires less computation time in comparison to the optimization- and codebook-based approaches. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04239v1 |
PDF | https://arxiv.org/pdf/1911.04239v1.pdf |
PWC | https://paperswithcode.com/paper/hybrid-precoding-for-multi-user-millimeter |
Repo | |
Framework | |
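As described, the network treats analog precoder/combiner selection as classification over a predefined codebook, trained on labels obtained from an exhaustive search. A rough sketch of such a selector (real and imaginary parts of the channel matrix as input channels; antenna counts and codebook size invented for illustration, not the paper's CNN-MIMO architecture):

```python
# Rough sketch: a CNN maps an (imperfect) channel matrix to an index in a
# predefined analog-precoder codebook. All sizes are illustrative only.
import torch
import torch.nn as nn

class CodebookSelector(nn.Module):
    def __init__(self, n_rx=16, n_tx=64, codebook_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),     # 2 channels: Re(H), Im(H)
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(32 * n_rx * n_tx, codebook_size)

    def forward(self, h_complex):                           # h_complex: (B, n_rx, n_tx)
        x = torch.stack([h_complex.real, h_complex.imag], dim=1)
        return self.classifier(self.features(x))            # logits over the codebook
```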
Exponential Family Graph Embeddings
Title | Exponential Family Graph Embeddings |
Authors | Abdulkadir Çelikkanat, Fragkiskos D. Malliaros |
Abstract | Representing networks in a low dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional \textit{Skip-Gram} model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic \textit{exponential family graph embedding} model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks. |
Tasks | Graph Embedding, Link Prediction, Node Classification, Representation Learning |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09007v1 |
PDF | https://arxiv.org/pdf/1911.09007v1.pdf |
PWC | https://paperswithcode.com/paper/exponential-family-graph-embeddings |
Repo | |
Framework | |
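The generic model replaces Skip-Gram's softmax with an exponential family conditional whose natural parameter is the dot product of center and context embeddings. A Bernoulli conditional trained with negative sampling, sketched below, is one concrete member of that family; it is a generic illustration and not necessarily one of the paper's three studied instances.

```python
# Bernoulli exponential-family conditional for (center, context) node pairs:
# natural parameter = dot product of the two embeddings, negative sampling loss.
import torch
import torch.nn as nn

class BernoulliGraphEmbedding(nn.Module):
    def __init__(self, n_nodes, dim=128):
        super().__init__()
        self.center = nn.Embedding(n_nodes, dim)
        self.context = nn.Embedding(n_nodes, dim)

    def loss(self, centers, contexts, negatives):
        u = self.center(centers)                                    # (B, d)
        pos = (u * self.context(contexts)).sum(-1)                  # natural parameter
        neg = torch.einsum('bd,bkd->bk', u, self.context(negatives))
        return -(nn.functional.logsigmoid(pos).mean()
                 + nn.functional.logsigmoid(-neg).mean())
```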
Decision Propagation Networks for Image Classification
Title | Decision Propagation Networks for Image Classification |
Authors | Keke Tang, Peng Song, Yuexin Ma, Zhaoquan Gu, Yu Su, Zhihong Tian, Wenping Wang |
Abstract | High-level (e.g., semantic) features encoded in the later layers of convolutional neural networks are extensively exploited for image classification, leaving low-level (e.g., color) features in the early layers underexplored. In this paper, we propose a novel Decision Propagation Module (DPM) to make an intermediate decision that acts as category-coherent guidance extracted from early layers, and then propagate it to the later layers. Therefore, by stacking a collection of DPMs into a classification network, the generated Decision Propagation Network is explicitly formulated to progressively encode more discriminative features guided by the decision, and then to refine the decision based on the newly generated features, layer by layer. Comprehensive results on four publicly available datasets validate that DPM brings significant improvements to existing classification networks with minimal additional computational cost and is superior to state-of-the-art methods. |
Tasks | Image Classification |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12101v1 |
PDF | https://arxiv.org/pdf/1911.12101v1.pdf |
PWC | https://paperswithcode.com/paper/decision-propagation-networks-for-image |
Repo | |
Framework | |
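One plausible reading of "propagating an intermediate decision" (this is our reading of the abstract, not the DPM design from the paper): an auxiliary classifier on early features produces a soft class decision, which is embedded and used to modulate later features.

```python
# Speculative decision-propagation-style module: an auxiliary classifier on early
# features yields a soft decision; an embedding of that decision gates later
# features channel-wise. Illustration only; not the paper's DPM.
import torch
import torch.nn as nn

class DecisionPropagation(nn.Module):
    def __init__(self, in_channels, out_channels, num_classes):
        super().__init__()
        self.decide = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(in_channels, num_classes))
        self.embed = nn.Linear(num_classes, out_channels)

    def forward(self, early_feat, later_feat):
        decision = torch.softmax(self.decide(early_feat), dim=-1)    # intermediate decision
        gate = torch.sigmoid(self.embed(decision))[..., None, None]  # (B, C, 1, 1)
        return later_feat * gate, decision
```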
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Title | Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning |
Authors | Junchao Zhang, Yuxin Peng |
Abstract | Video captioning aims to automatically generate natural language descriptions of video content, and has drawn a lot of attention in recent years. Generating accurate and fine-grained captions requires not only understanding the global content of the video, but also capturing detailed object information. Meanwhile, video representations have a great impact on the quality of generated captions. Thus, it is important for video captioning to capture salient objects with their detailed temporal dynamics, and to represent them using discriminative spatio-temporal representations. In this paper, we propose a new video captioning approach based on object-aware aggregation with bidirectional temporal graph (OA-BTG), which captures detailed temporal dynamics for salient objects in video, and learns discriminative spatio-temporal representations by performing object-aware local feature aggregation on detected object regions. The main novelties and advantages are: (1) Bidirectional temporal graph: A bidirectional temporal graph is constructed along and reversely along the temporal order, which provides complementary ways to capture the temporal trajectories for each salient object. (2) Object-aware aggregation: Learnable VLAD (Vector of Locally Aggregated Descriptors) models are constructed on object temporal trajectories and the global frame sequence, performing object-aware aggregation to learn discriminative representations. A hierarchical attention mechanism is also developed to distinguish the different contributions of multiple objects. Experiments on two widely-used datasets demonstrate that OA-BTG achieves state-of-the-art performance in terms of the BLEU@4, METEOR and CIDEr metrics. |
Tasks | Video Captioning |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04375v1 |
PDF | https://arxiv.org/pdf/1906.04375v1.pdf |
PWC | https://paperswithcode.com/paper/object-aware-aggregation-with-bidirectional-1 |
Repo | |
Framework | |
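The object-aware aggregation relies on learnable VLAD pooling over object-trajectory and frame features. Below is a minimal NetVLAD-style layer as a generic reference point; parameter choices are illustrative and this is not the OA-BTG implementation.

```python
# Minimal NetVLAD-style learnable aggregation: soft-assign each local feature to
# K learned cluster centres and accumulate the residuals to each centre.
import torch
import torch.nn as nn

class NetVLAD(nn.Module):
    def __init__(self, dim=512, clusters=32):
        super().__init__()
        self.assign = nn.Linear(dim, clusters)
        self.centres = nn.Parameter(torch.randn(clusters, dim))

    def forward(self, feats):                                       # feats: (B, N, dim)
        a = torch.softmax(self.assign(feats), dim=-1)               # (B, N, K) soft assignments
        residuals = feats.unsqueeze(2) - self.centres               # (B, N, K, dim)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)             # (B, K, dim)
        vlad = nn.functional.normalize(vlad, dim=-1)                # intra-normalisation
        return nn.functional.normalize(vlad.flatten(1), dim=-1)     # (B, K*dim) descriptor
```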
Confidentiality and linked data
Title | Confidentiality and linked data |
Authors | Felix Ritchie, Jim Smith |
Abstract | Data providers such as government statistical agencies perform a balancing act: maximising information published to inform decision-making and research, while simultaneously protecting privacy. The emergence of identified administrative datasets with the potential for sharing (and thus linking) offers huge potential benefits but significant additional risks. This article introduces the principles and methods of linking data across different sources and points in time, focusing on potential areas of risk. We then consider confidentiality risk, focusing in particular on the “intruder” problem central to the area, and looking at both risks from data producer outputs and from the release of micro-data for further analysis. Finally, we briefly consider potential solutions to micro-data release, both the statistical solutions considered in other contributed articles and non-statistical solutions. |
Tasks | Decision Making |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06465v1 |
PDF | https://arxiv.org/pdf/1907.06465v1.pdf |
PWC | https://paperswithcode.com/paper/confidentiality-and-linked-data |
Repo | |
Framework | |
Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media
Title | Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media |
Authors | Salvatore Giorgi, Veronica Lynn, Sandra Matz, Lyle Ungar, H. Andrew Schwartz |
Abstract | Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population — a “selection bias”. Across five tasks for predicting US county population health statistics from Twitter, we explore standard restratification techniques — bias mitigation approaches that reweight people-specific variables according to how under-sampled their socio-demographic groups are. We found that standard restratification provided no improvement and often degraded population prediction accuracy. The core reason seemed to be shrunken and sparse estimates of each population’s socio-demographics, which we address by developing and evaluating three methods: predictive redistribution to account for shrinking, and adaptive binning and informed smoothing to handle sparse socio-demographic estimates. We show that each of our methods significantly improves over the standard restratification approaches. Combining approaches, we find substantial improvements over non-restratified models as well, yielding a 35.4% increase in variance explained for predicting surveyed life satisfaction, and a 10.0% average increase across all tasks. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03855v1 |
PDF | https://arxiv.org/pdf/1911.03855v1.pdf |
PWC | https://paperswithcode.com/paper/correcting-sociodemographic-selection-biases |
Repo | |
Framework | |
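The "standard restratification" baseline the abstract starts from is ordinary post-stratification: reweight each user so the socio-demographic make-up of the sample matches the target population. A small sketch of that baseline (bin labels are illustrative; the paper's corrections address the failure modes of exactly this kind of reweighting):

```python
# Standard post-stratification weights: users in under-sampled socio-demographic
# bins are up-weighted so the sample matches the population's bin shares.
from collections import Counter

def poststratification_weights(user_bins, population_share):
    """user_bins: per-user socio-demographic bin label.
    population_share: bin label -> share of the target population."""
    counts = Counter(user_bins)
    n = len(user_bins)
    sample_share = {b: c / n for b, c in counts.items()}
    return [population_share[b] / sample_share[b] for b in user_bins]

# e.g. poststratification_weights(["18-29", "30-64", "30-64"],
#                                 {"18-29": 0.25, "30-64": 0.75})
```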
KCAT: A Knowledge-Constraint Typing Annotation Tool
Title | KCAT: A Knowledge-Constraint Typing Annotation Tool |
Authors | Sheng Lin, Luye Zheng, Bo Chen, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren |
Abstract | Fine-grained entity typing is a difficult task that suffers from noisy samples extracted by distant supervision. Thousands of manually annotated samples can achieve greater performance than millions of samples generated by previous distant supervision methods. However, it is hard for human beings to differentiate and memorize thousands of types, which makes large-scale human labeling hardly possible. In this paper, we introduce a Knowledge-Constraint Typing Annotation Tool (KCAT), which is efficient for fine-grained entity typing annotation. KCAT reduces the set of candidate types to an acceptable size for human annotators through entity linking, and provides a Multi-step Typing scheme to revise the entity linking result. Moreover, KCAT provides an efficient Annotator Client to accelerate the annotation process and a comprehensive Manager Module to analyse crowdsourcing annotations. Experiments show that KCAT can significantly improve annotation efficiency, with time consumption increasing only slowly as the size of the type set expands. |
Tasks | Entity Linking, Entity Typing |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05670v1 |
PDF | https://arxiv.org/pdf/1906.05670v1.pdf |
PWC | https://paperswithcode.com/paper/kcat-a-knowledge-constraint-typing-annotation |
Repo | |
Framework | |