October 17, 2019


Paper Group ANR 936

On Stacked Denoising Autoencoder based Pre-training of ANN for Isolated Handwritten Bengali Numerals Dataset Recognition. Reasoning about Unforeseen Possibilities During Policy Learning. An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer. Optimal Stochastic Delivery Planning in Full-Truckload and Less-Than-Truckl …

On Stacked Denoising Autoencoder based Pre-training of ANN for Isolated Handwritten Bengali Numerals Dataset Recognition


Title	On Stacked Denoising Autoencoder based Pre-training of ANN for Isolated Handwritten Bengali Numerals Dataset Recognition
Authors	Al Mehdi Saadat Chowdhury, M. Shahidur Rahman, Asia Khanom, Tamanna Islam Chowdhury, Afaz Uddin
Abstract	This work attempts to find the optimal parameter setting of a deep artificial neural network (ANN) for a Bengali digit dataset by pre-training it using a stacked denoising autoencoder (SDA). Although SDA-based recognition is hugely popular among researchers in image, speech and language processing tasks, it had never been tried on Bengali dataset recognition. For this work, a dataset of 70000 handwritten samples from (Chowdhury and Rahman, 2016) was used and recognized using several network architecture settings. Among all these settings, the optimal setting was found to be five or more deep hidden layers with sigmoid activation and one output layer with softmax activation. We propose that the optimal number of neurons per hidden layer is 1500 or more. The minimum validation error found in this work is 2.34%, which is the lowest error rate reported to date on a handwritten Bengali dataset.
Tasks	Denoising
Published	2018-12-14
URL	http://arxiv.org/abs/1812.05758v1
PDF	http://arxiv.org/pdf/1812.05758v1.pdf
PWC	https://paperswithcode.com/paper/on-stacked-denoising-autoencoder-based-pre
Repo
Framework
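The core of SDA pre-training is the denoising objective: corrupt the input, encode the corrupted version, and score the reconstruction against the clean input; trained layers are then stacked greedily. The sketch below shows only that objective and the stacking pass, in plain Python, assuming masking noise, tied weights and a cross-entropy loss (common choices, not confirmed by the abstract). Gradient-based training is omitted, and `HIDDEN_SIZES` merely restates the abstract's "five or more layers, 1500 or more units" finding as a hypothetical configuration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def corrupt(x, drop_prob, rng):
    """Masking noise: set each input to 0 independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def encode(x, W, b):
    """Sigmoid hidden layer: h[j] = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def decode(h, W, c):
    """Tied-weight decoder: reconstructs the input using the transpose of W."""
    n_in = len(W[0])
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + c[i]) for i in range(n_in)]

def reconstruction_loss(x_clean, x_hat, eps=1e-12):
    """Cross-entropy between the clean input and its reconstruction."""
    return -sum(x * math.log(xh + eps) + (1 - x) * math.log(1 - xh + eps)
                for x, xh in zip(x_clean, x_hat))

def dae_loss(x, W, b, c, drop_prob, rng):
    """Denoising objective: encode the CORRUPTED input, score against the CLEAN input."""
    x_tilde = corrupt(x, drop_prob, rng)
    return reconstruction_loss(x, decode(encode(x_tilde, W, b), W, c))

def stacked_encode(x, layers):
    """Greedy stacking: each pre-trained (W, b) encodes the previous layer's output."""
    h = x
    for W, b in layers:
        h = encode(h, W, b)
    return h

# Hypothetical fine-tuning topology restating the abstract's finding:
# five or more sigmoid hidden layers of at least 1500 units, then a softmax output.
HIDDEN_SIZES = [1500] * 5
NUM_CLASSES = 10  # Bengali digits 0-9
```

In a full pipeline each layer would be trained to minimise `dae_loss` on the codes produced by the layers below it, after which a softmax layer is added and the whole network is fine-tuned with labels.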

Reasoning about Unforeseen Possibilities During Policy Learning


Title	Reasoning about Unforeseen Possibilities During Policy Learning
Authors	Craig Innes, Alex Lascarides, Stefano V Albrecht, Subramanian Ramamoorthy, Benjamin Rosman
Abstract	Methods for learning optimal policies in autonomous agents often assume that the way the domain is conceptualised (its possible states and actions and their causal structure) is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios, because new evidence can reveal important information about what is possible, including possibilities that the agent was not aware existed prior to learning. We present a model of an agent which both discovers and learns to exploit unforeseen possibilities using two sources of evidence: direct interaction with the world and communication with a domain expert. We use a combination of probabilistic and symbolic reasoning to estimate all components of the decision problem, including its set of random variables and their causal dependencies. Agent simulations show that the agent converges on optimal policies even when it starts out unaware of factors that are critical to behaving optimally.
Tasks
Published	2018-01-10
URL	http://arxiv.org/abs/1801.03331v1
PDF	http://arxiv.org/pdf/1801.03331v1.pdf
PWC	https://paperswithcode.com/paper/reasoning-about-unforeseen-possibilities
Repo
Framework

An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer


Title	An Implementation of Back-Propagation Learning on GF11, a Large SIMD Parallel Computer
Authors	Michael Witbrock, Marco Zagha
Abstract	Current connectionist simulations require huge computational resources. We describe a neural network simulator for the IBM GF11, an experimental SIMD machine with 566 processors and a peak arithmetic performance of 11 Gigaflops. We present our parallel implementation of the backpropagation learning algorithm, techniques for increasing efficiency, performance measurements on the NetTalk text-to-speech benchmark, and a performance model for the simulator. Our simulator currently runs the back-propagation learning algorithm at 900 million connections per second, where each “connection per second” includes both a forward and backward pass. This figure was obtained on the machine when only 356 processors were working; with all 566 processors operational, our simulation will run at over one billion connections per second. We conclude that the GF11 is well-suited to neural network simulation, and we analyze our use of the machine to determine which features are the most important for high performance.
Tasks
Published	2018-01-04
URL	http://arxiv.org/abs/1801.01554v1
PDF	http://arxiv.org/pdf/1801.01554v1.pdf
PWC	https://paperswithcode.com/paper/an-implementation-of-back-propagation
Repo
Framework
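The scaling claim in the abstract can be checked with simple arithmetic. A minimal sketch, assuming (our assumption, not the authors') that throughput scales linearly with the number of working processors:

```python
def per_processor_rate(measured_cps, working_processors):
    """Connections per second contributed by each working processor."""
    return measured_cps / working_processors

def project_linear(measured_cps, working_processors, total_processors):
    """Naive projection assuming throughput scales linearly with processor count."""
    return per_processor_rate(measured_cps, working_processors) * total_processors

# Figures quoted in the abstract.
MEASURED_CPS = 900e6   # connections/s, each including a forward and a backward pass
WORKING = 356          # processors working when the figure was measured
TOTAL = 566            # processors in the full machine
PEAK_FLOPS = 11e9      # peak arithmetic performance

if __name__ == "__main__":
    print(f"{per_processor_rate(MEASURED_CPS, WORKING) / 1e6:.2f} M connections/s per processor")
    print(f"{project_linear(MEASURED_CPS, WORKING, TOTAL) / 1e9:.2f} G connections/s on {TOTAL} processors")
    print(f"{PEAK_FLOPS / TOTAL / 1e6:.1f} MFLOPS peak per processor")
```

This prints about 2.53 M connections/s per processor, 1.43 G connections/s for all 566 processors, and 19.4 MFLOPS peak per processor. The abstract's "over one billion" is therefore the more conservative figure; a linear projection ignores any communication or load-balancing overhead.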

Optimal Stochastic Delivery Planning in Full-Truckload and Less-Than-Truckload Delivery


Title	Optimal Stochastic Delivery Planning in Full-Truckload and Less-Than-Truckload Delivery
Authors	Suttinee Sawadsitang, Rakpong Kaewpuang, Siwei Jiang, Dusit Niyato, Ping Wang
Abstract	With increasing demand from emerging logistics businesses, the Vehicle Routing Problem with Private fleet and common Carrier (VRPPC) has been introduced to manage package delivery services from a supplier to customers. However, almost all existing studies focus on the deterministic problem, which assumes that all parameters are known perfectly at the time the planning and routing decisions are made. In reality, some parameters are random and unknown. Therefore, in this paper, we consider the VRPPC with hard time windows and random demand, called Optimal Delivery Planning (ODP). The proposed ODP aims to minimize the total package delivery cost while meeting the customer time window constraints. We use stochastic integer programming to formulate the optimization problem, incorporating customer demand uncertainty. Moreover, we evaluate the performance of the ODP using test data from a benchmark dataset and from an actual Singapore road map.
Tasks
Published	2018-02-04
URL	http://arxiv.org/abs/1802.08540v1
PDF	http://arxiv.org/pdf/1802.08540v1.pdf
PWC	https://paperswithcode.com/paper/optimal-stochastic-delivery-planning-in-full
Repo
Framework
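To make the "stochastic integer programming" idea concrete, here is a deliberately tiny two-stage sketch. It is not the ODP model: routing and time windows are dropped, and the cost structure (a fixed cost per customer served by the private fleet, a per-unit carrier rate, and a higher per-unit overflow rate when realised demand exceeds fleet capacity) is a hypothetical simplification. It keeps only the two-stage structure: a first-stage assignment made before demand is known, and a recourse cost averaged over demand scenarios. Real instances would be handed to an integer programming solver rather than enumerated.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Instance:
    customers: list       # customer ids
    scenarios: list       # (probability, {customer: demand}) pairs
    capacity: float       # private-fleet capacity
    fleet_cost: dict      # fixed cost of serving a customer with the private fleet
    carrier_rate: float   # per-unit cost of shipping via the common carrier
    overflow_rate: float  # per-unit cost when the private fleet overflows

def expected_cost(assigned, inst):
    """First-stage cost plus probability-weighted second-stage (recourse) cost."""
    cost = sum(inst.fleet_cost[c] for c in assigned)
    for prob, demand in inst.scenarios:
        load = sum(demand[c] for c in assigned)
        overflow = max(0.0, load - inst.capacity)
        outsourced = sum(demand[c] for c in inst.customers if c not in assigned)
        cost += prob * (overflow * inst.overflow_rate + outsourced * inst.carrier_rate)
    return cost

def solve_by_enumeration(inst):
    """Exhaustive search over first-stage assignments (viable only for tiny instances)."""
    best_cost, best_set = float("inf"), set()
    for k in range(len(inst.customers) + 1):
        for subset in combinations(inst.customers, k):
            cost = expected_cost(set(subset), inst)
            if cost < best_cost:
                best_cost, best_set = cost, set(subset)
    return best_cost, best_set
```

For example, with two customers, demands A in {2, 4} and B fixed at 3 (equally likely), capacity 5, fleet costs 3 and 4, carrier rate 2 and overflow rate 4, the cheapest plan in expectation is to serve only A with the private fleet, at an expected cost of 9.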

Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data


Title	Improving generalization of vocal tract feature reconstruction: from augmented acoustic inversion to articulatory feature reconstruction without articulatory data
Authors	Rosanna Turrisi, Raffaele Tavarone, Leonardo Badino
Abstract	We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. This approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurements. Results show that articulatory features generated by this approach reach a Pearson product-moment correlation of up to 0.59 with measured articulatory features.
Tasks
Published	2018-09-04
URL	http://arxiv.org/abs/1809.00938v2
PDF	http://arxiv.org/pdf/1809.00938v2.pdf
PWC	https://paperswithcode.com/paper/improving-generalization-of-vocal-tract
Repo
Framework
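The 0.59 figure is a Pearson product-moment correlation between generated and measured trajectories. A minimal sketch of that metric in plain Python follows; averaging per-channel correlations, and the channel names used in the example, are our assumptions, since the abstract does not say how the score is aggregated. (Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.)

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

def mean_channel_correlation(predicted, measured):
    """Average per-channel correlation; each argument maps channel name -> trajectory."""
    scores = [pearson(predicted[ch], measured[ch]) for ch in measured]
    return sum(scores) / len(scores)
```

Because Pearson correlation ignores offset and scale, a reconstruction can score highly while still being shifted or stretched relative to the measured trajectory; it measures shape agreement, not absolute position error.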

Pathology Extraction from Chest X-Ray Radiology Reports: A Performance Study


Title	Pathology Extraction from Chest X-Ray Radiology Reports: A Performance Study
Authors	Tahsin Mostafiz, Khalid Ashraf
Abstract	Extraction of relevant pathological terms from radiology reports is important for correct image label generation and disease population studies. In this letter, we compare the performance of several known application programming interfaces (APIs) for the task of thoracic abnormality extraction from radiology reports. We explored several medical domain-specific annotation tools, such as the Medical Text Indexer (MTI) with the Non-MEDLINE and MeSH on Demand (MOD) options, and the generic Natural Language Understanding (NLU) API provided by IBM Cloud. Our results show that although MTI and MOD are intended for extracting medical terms, their performance is worse than that of a generic extraction API such as IBM NLU. Finally, we trained a DNN-based Named Entity Recognition (NER) model to extract the key concept words from radiology reports. Our model outperforms both the medical-specific and the generic APIs by a large margin. Our results demonstrate the inadequacy of generic APIs for the pathology extraction task and establish the importance of domain-specific model training for improved results. We hope that these results motivate the research community to release larger de-identified radiology report corpora for building high-accuracy machine learning models for the important task of pathology extraction.
Tasks	Named Entity Recognition
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02305v1
PDF	http://arxiv.org/pdf/1812.02305v1.pdf
PWC	https://paperswithcode.com/paper/pathology-extraction-from-chest-x-ray
Repo
Framework
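Comparing extraction tools requires a scoring rule. The abstract does not name one, so the sketch below assumes exact-match comparison of normalised term sets, with precision, recall and F1 per report and micro-averaged over a collection. The example terms are illustrative only.

```python
def normalise(term):
    """Lower-case and collapse whitespace so 'Pleural  Effusion' matches 'pleural effusion'."""
    return " ".join(term.lower().split())

def _counts(predicted, gold):
    pred = {normalise(t) for t in predicted}
    ref = {normalise(t) for t in gold}
    return len(pred & ref), len(pred), len(ref)

def _prf(tp, n_pred, n_gold):
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def prf1(predicted, gold):
    """Precision, recall and F1 of extracted terms for a single report."""
    return _prf(*_counts(predicted, gold))

def micro_prf1(reports):
    """Micro-averaged scores; reports is an iterable of (predicted_terms, gold_terms)."""
    tp = n_pred = n_gold = 0
    for predicted, gold in reports:
        t, p, g = _counts(predicted, gold)
        tp, n_pred, n_gold = tp + t, n_pred + p, n_gold + g
    return _prf(tp, n_pred, n_gold)
```

For instance, predicting {"Cardiomegaly", "opacity"} against gold {"cardiomegaly", "pleural effusion"} gives precision, recall and F1 of 0.5 each. Exact matching is strict; partial-overlap or concept-level (e.g. synonym-aware) matching would give different numbers.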

Graph HyperNetworks for Neural Architecture Search


Title	Graph HyperNetworks for Neural Architecture Search
Authors	Chris Zhang, Mengye Ren, Raquel Urtasun
Abstract	Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive, as the search requires training thousands of different networks, each of which can take hours to train. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. GHNs model the topology of an architecture and therefore can predict network performance more accurately than regular hypernetworks and premature early stopping. To perform NAS, we randomly sample architectures and use the validation accuracy of networks with GHN-generated weights as the surrogate search signal. GHNs are fast: they can search nearly 10 times faster than other random search methods on CIFAR-10 and ImageNet. GHNs can be further extended to the anytime prediction setting, where they have found networks with a better speed-accuracy tradeoff than state-of-the-art manual designs.
Tasks	Neural Architecture Search
Published	2018-10-12
URL	http://arxiv.org/abs/1810.05749v2
PDF	http://arxiv.org/pdf/1810.05749v2.pdf
PWC	https://paperswithcode.com/paper/graph-hypernetworks-for-neural-architecture
Repo
Framework
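The search procedure described in the abstract (sample architectures at random, rank them by a cheap surrogate signal) can be sketched independently of the GHN itself. In the sketch below the operation vocabulary and cell encoding are hypothetical, and `surrogate_score` is a placeholder for "validation accuracy of the network with GHN-generated weights"; the GHN that would produce those weights is not implemented.

```python
import random

# Hypothetical operation vocabulary; the abstract does not specify the search space.
OPS = ("conv3x3", "conv5x5", "maxpool3x3", "identity")

def sample_architecture(rng, num_nodes=4):
    """Random cell as a DAG: node k (k = 1..num_nodes) picks an earlier node
    0..k-1 as its input (node 0 is the cell input) and an operation."""
    return tuple((rng.randrange(k), rng.choice(OPS)) for k in range(1, num_nodes + 1))

def random_search(surrogate_score, num_samples, rng, top_k=1):
    """Rank randomly sampled architectures by a cheap surrogate score.
    With a GHN, the surrogate is the validation accuracy obtained with generated
    weights, so no candidate has to be trained during the search itself."""
    candidates = (sample_architecture(rng) for _ in range(num_samples))
    scored = [(surrogate_score(arch), arch) for arch in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Usage with a toy surrogate that simply counts 3x3 convolutions: `random_search(lambda a: sum(op == "conv3x3" for _, op in a), 50, random.Random(0), top_k=3)`. In practice the top-ranked candidates would then be trained normally to confirm their accuracy.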

RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks


Title	RGB-D Based Action Recognition with Light-weight 3D Convolutional Networks
Authors	Haokui Zhang, Ying Li, Peng Wang, Yu Liu, Chunhua Shen
Abstract	Unlike RGB videos, RGB-D videos contain depth data that provide key complementary information to tristimulus (RGB) visual data and could potentially improve action recognition accuracy. However, most existing action recognition models rely solely on RGB videos, which limits their performance. Additionally, the state-of-the-art action recognition models, namely 3D convolutional neural networks (3D-CNNs), contain a tremendous number of parameters and suffer from computational inefficiency. In this paper, we propose a series of light-weight 3D architectures for action recognition based on RGB-D data. Compared with conventional 3D-CNN models, the proposed light-weight 3D-CNNs have considerably fewer parameters and lower computational cost, while still achieving favorable recognition performance. Experimental results on two public benchmark datasets show that our models can approximate or outperform state-of-the-art approaches. Specifically, on the RGB+D-NTU (NTU) dataset, we achieve 93.2% and 97.6% accuracy under the cross-subject and cross-view protocols, and on the Northwestern-UCLA Multiview Action 3D (N-UCLA) dataset, we achieve 95.5% cross-view accuracy.
Tasks	Temporal Action Localization
Published	2018-11-24
URL	http://arxiv.org/abs/1811.09908v1
PDF	http://arxiv.org/pdf/1811.09908v1.pdf
PWC	https://paperswithcode.com/paper/rgb-d-based-action-recognition-with-light
Repo
Framework
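Why "light-weight" 3D convolutions save so many parameters is easiest to see by counting weights. The abstract does not say which factorisation the authors use; the sketch below assumes a depthwise-separable 3D convolution (a k x k x k depthwise filter per input channel followed by a 1 x 1 x 1 pointwise convolution), a common light-weight design, and compares it with a standard 3D convolution.

```python
def conv3d_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k x k 3D convolution."""
    return k ** 3 * c_in * c_out + (c_out if bias else 0)

def separable_conv3d_params(k, c_in, c_out, bias=False):
    """Weights in a depthwise (k^3 per input channel) + pointwise (1x1x1) 3D convolution."""
    depthwise = k ** 3 * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)

if __name__ == "__main__":
    std = conv3d_params(3, 64, 128)
    sep = separable_conv3d_params(3, 64, 128)
    print(std, sep, round(std / sep, 1))
```

For a 3 x 3 x 3 layer mapping 64 to 128 channels this gives 221,184 versus 9,920 weights, roughly a 22x reduction; the multiply-accumulate count per output voxel shrinks by the same factor.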


Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources


Title	Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources
Authors	Maria Glenski, Tim Weninger, Svitlana Volkova
Abstract	In the age of social news, it is important to understand the types of reactions that are evoked from news sources with various levels of credibility. In the present work we seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. To that end, (1) we develop a model to classify user reactions into one of nine types, such as answer, elaboration, and question, and (2) we measure the speed and the type of reaction for trusted and deceptive news sources for 10.8M Twitter posts and 6.2M Reddit comments. We show that there are significant differences in the speed and the type of reactions between trusted and deceptive news sources on Twitter, but far smaller differences on Reddit.
Tasks
Published	2018-05-30
URL	http://arxiv.org/abs/1805.12032v1
PDF	http://arxiv.org/pdf/1805.12032v1.pdf
PWC	https://paperswithcode.com/paper/identifying-and-understanding-user-reactions
Repo
Framework
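Measuring "the speed and the type of reaction" per source class reduces to two aggregations over (source, reaction type, post time, reaction time) records. The sketch below assumes the median delay as the speed statistic and per-type shares as the type statistic; the abstract does not state which statistics the authors use, and the record layout and labels are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

def median_delay_by_source(reactions):
    """reactions: iterable of (source_label, reaction_type, posted_at, reacted_at),
    with times in seconds. Returns the median reaction delay per source label."""
    delays = defaultdict(list)
    for label, _rtype, posted, reacted in reactions:
        delays[label].append(reacted - posted)
    return {label: median(values) for label, values in delays.items()}

def reaction_type_shares(reactions):
    """Fraction of each reaction type (answer, question, ...) per source label."""
    counts = defaultdict(Counter)
    for label, rtype, _posted, _reacted in reactions:
        counts[label][rtype] += 1
    return {label: {t: n / sum(c.values()) for t, n in c.items()}
            for label, c in counts.items()}
```

The median is used here because reaction delays on social platforms tend to be heavy-tailed, so a few very late replies would dominate a mean.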

A state of the art of urban reconstruction: street, street network, vegetation, urban feature


Title	A state of the art of urban reconstruction: street, street network, vegetation, urban feature
Authors	Remi Cura, Julien Perret, Nicolas Paparoditis
Abstract	The world population is rising, especially the share of people living in cities. With growing populations and complex roles with respect to their inhabitants and surroundings, cities concentrate difficulties for design, planning and analysis. These tasks require a way to reconstruct/model a city. Traditionally, much attention has been given to building reconstruction, yet an essential part of the city has been neglected: streets. Street reconstruction has seldom been researched. Streets are also complex compositions of urban features, and have a unique role for transportation (as they comprise roads). We aim at completing the recent state of the art for building reconstruction (Musialski2012) by considering all other aspects of urban reconstruction. We introduce the need for city models. Because reconstruction always necessitates data, we first analyse which data are available. We then present a state of the art of street reconstruction, street network reconstruction, urban features reconstruction/modelling, vegetation, and urban objects reconstruction/modelling. Although reconstruction strategies vary widely, we can order them by the role the model plays, from data-driven approaches, to model-based approaches, to inverse procedural modelling and model catalogue matching. The main challenges seem to come from the complex nature of urban environments and from the limitations of the available data. Urban features have strong relationships with each other and with their surroundings, including hierarchical relations. Procedural modelling has the power to express these relations, and could be applied to the reconstruction of urban features via the Inverse Procedural Modelling paradigm.
Tasks
Published	2018-01-18
URL	http://arxiv.org/abs/1803.04332v1
PDF	http://arxiv.org/pdf/1803.04332v1.pdf
PWC	https://paperswithcode.com/paper/a-state-of-the-art-of-urban-reconstruction
Repo
Framework

Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time


Title	Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time
Authors	Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich
Abstract	This paper investigates how to utilize different forms of human interaction to safely train autonomous systems in real-time by learning from both human demonstrations and interventions. We implement two components of the Cycle-of-Learning for Autonomous Systems, which is our framework for combining multiple modalities of human interaction. The current effort employs human demonstrations to teach a desired behavior via imitation learning, then leverages intervention data to correct for undesired behaviors produced by the imitation learner to teach novel tasks to an autonomous agent safely, after only minutes of training. We demonstrate this method in an autonomous perching task using a quadrotor with continuous roll, pitch, yaw, and throttle commands and imagery captured from a downward-facing camera in a high-fidelity simulated environment. Our method improves task completion performance for the same amount of human interaction when compared to learning from demonstrations alone, while also requiring on average 32% less data to achieve that performance. This provides evidence that combining multiple modes of human interaction can increase both the training speed and overall performance of policies for autonomous systems.
Tasks	Imitation Learning
Published	2018-10-26
URL	http://arxiv.org/abs/1810.11545v2
PDF	http://arxiv.org/pdf/1810.11545v2.pdf
PWC	https://paperswithcode.com/paper/efficiently-combining-human-demonstrations
Repo
Framework
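The demonstrations-then-interventions idea can be illustrated with a toy tabular policy. This is not the Cycle-of-Learning implementation (which trains neural policies on a simulated quadrotor); it is a minimal sketch assuming discrete states, a behaviour-cloning policy that takes the weighted majority action per state, and intervention samples weighted more heavily than demonstrations. The state names, the default action `"hover"` and the weight of 2.0 are all hypothetical.

```python
from collections import Counter, defaultdict

class TabularImitationPolicy:
    """Toy behaviour-cloning policy: the weighted majority action seen in each discrete state."""

    def __init__(self, default_action):
        self.default_action = default_action
        self.table = {}

    def fit(self, pairs, weights=None):
        if weights is None:
            weights = [1.0] * len(pairs)
        votes = defaultdict(Counter)
        for (state, action), w in zip(pairs, weights):
            votes[state][action] += w
        self.table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def act(self, state):
        return self.table.get(state, self.default_action)

def train_with_interventions(demos, rollout_states, human_action,
                             intervention_weight=2.0, default_action="hover"):
    """1) Clone the demonstrations. 2) Roll out; the human takes over wherever the
    agent's action differs from theirs. 3) Refit on demonstrations + interventions."""
    policy = TabularImitationPolicy(default_action)
    policy.fit(demos)
    interventions = [(s, human_action(s)) for s in rollout_states
                     if policy.act(s) != human_action(s)]
    weights = [1.0] * len(demos) + [intervention_weight] * len(interventions)
    policy.fit(demos + interventions, weights)
    return policy, interventions
```

The point of the design is that interventions are collected only where the imitation learner actually misbehaves, which is one intuition for why combining the two modes can need less human data than demonstrations alone.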

Siamese Networks for Semantic Pattern Similarity


Title	Siamese Networks for Semantic Pattern Similarity
Authors	Yassine Benajiba, Jin Sun, Yong Zhang, Longquan Jiang, Zhiliang Weng, Or Biran
Abstract	Semantic Pattern Similarity is an interesting, though not often encountered NLP task where two sentences are compared not by their specific meaning, but by their more abstract semantic pattern (e.g., preposition or frame). We utilize Siamese Networks to model this task, and show its usefulness in determining SQL patterns for unseen questions in a database-backed question answering scenario. Our approach achieves high accuracy and contains a built-in proxy for confidence, which can be used to keep precision arbitrarily high.
Tasks	Question Answering
Published	2018-12-17
URL	http://arxiv.org/abs/1812.06604v1
PDF	http://arxiv.org/pdf/1812.06604v1.pdf
PWC	https://paperswithcode.com/paper/siamese-networks-for-semantic-pattern
Repo
Framework
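A Siamese setup embeds both inputs with the same encoder and compares the embeddings; the best similarity doubles as a confidence score, and refusing to answer below a threshold is what lets precision be kept high at the cost of coverage. In the sketch below a hashed bag-of-words encoder stands in for the learned network (an assumption purely for illustration), `zlib.crc32` is used so hashing is deterministic across runs, and the SQL pattern labels are hypothetical.

```python
import math
import zlib

def encode(sentence, dim=64):
    """Shared encoder (toy stand-in for the learned network): a hashed bag of words,
    L2-normalised. The SAME function embeds both sides of every pair."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def similarity(a, b):
    """Cosine similarity of the two shared-encoder embeddings."""
    return sum(x * y for x, y in zip(encode(a), encode(b)))

def match_pattern(question, labelled_examples, threshold):
    """Return (pattern, similarity) of the most similar labelled question, or
    (None, similarity) to abstain when the best similarity is below threshold."""
    best_pattern, best_sim = None, -1.0
    for example, pattern in labelled_examples:
        sim = similarity(question, example)
        if sim > best_sim:
            best_pattern, best_sim = pattern, sim
    if best_sim < threshold:
        return None, best_sim
    return best_pattern, best_sim
```

Raising `threshold` makes the matcher answer fewer questions but, if the similarity is well calibrated, answer them more reliably.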

Decision fusion with multiple spatial supports by conditional random fields


Title	Decision fusion with multiple spatial supports by conditional random fields
Authors	Devis Tuia, Michele Volpi, Gabriele Moser
Abstract	Classification of remotely sensed images into land cover or land use is highly dependent on geographical information at least at two levels. First, land cover classes are observed in a spatially smooth domain separated by sharp region boundaries. Second, land classes and observation scale are also tightly intertwined: they tend to be consistent within areas of homogeneous appearance, or regions, in the sense that all pixels within a roof should be classified as roof, independently of the spatial support used for the classification. In this paper, we follow these two observations and encode them as priors in an energy minimization framework based on conditional random fields (CRFs), where classification results obtained at pixel and region levels are probabilistically fused. The aim is to make the final maps consistent not only within their own spatial supports (pixel and region) but also across supports, i.e., by getting the predictions on the pixel lattice and on the set of regions to agree. To this end, we define an energy function with three terms: 1) a data term for the individual elements in each support (support-specific nodes); 2) spatial regularization terms in a neighborhood for each of the supports (support-specific edges); and 3) a regularization term between individual pixels and the region containing each of them (inter-support edges). We utilize these priors in a unified energy minimization problem that can be optimized by standard solvers. The proposed 2LCRF model consists of a CRF defined over a bipartite graph, i.e., two interconnected layers within a single graph accounting for interlattice connections.
Tasks
Published	2018-08-24
URL	http://arxiv.org/abs/1808.08024v1
PDF	http://arxiv.org/pdf/1808.08024v1.pdf
PWC	https://paperswithcode.com/paper/decision-fusion-with-multiple-spatial
Repo
Framework
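The three-term energy described in the abstract can be written down directly. The sketch below assumes unary costs given as per-label tables (e.g. negative log-probabilities from the pixel and region classifiers), Potts penalties for all pairwise terms, and hypothetical weights `beta_*`; the exact potentials in the paper may differ. Brute-force minimisation stands in for the "standard solvers" and is only feasible for toy graphs.

```python
from itertools import product

def two_layer_energy(pixel_labels, region_labels, pixel_unary, region_unary,
                     pixel_edges, region_edges, pixel_to_region,
                     beta_pixel=1.0, beta_region=1.0, beta_inter=1.0):
    # 1) Data terms for the elements of each support (support-specific nodes).
    e = sum(pixel_unary[i][l] for i, l in enumerate(pixel_labels))
    e += sum(region_unary[r][l] for r, l in enumerate(region_labels))
    # 2) Potts smoothness within each support (support-specific edges).
    e += beta_pixel * sum(pixel_labels[i] != pixel_labels[j] for i, j in pixel_edges)
    e += beta_region * sum(region_labels[r] != region_labels[s] for r, s in region_edges)
    # 3) Inter-support edges: each pixel should agree with the region containing it.
    e += beta_inter * sum(l != region_labels[pixel_to_region[i]]
                          for i, l in enumerate(pixel_labels))
    return e

def brute_force_minimum(n_labels, n_pixels, n_regions, **terms):
    """Exhaustive minimisation over all joint labellings (toy graphs only)."""
    best = (float("inf"), None, None)
    for labels in product(range(n_labels), repeat=n_pixels + n_regions):
        px, rg = list(labels[:n_pixels]), list(labels[n_pixels:])
        e = two_layer_energy(px, rg, **terms)
        if e < best[0]:
            best = (e, px, rg)
    return best
```

With two neighbouring pixels inside one region, the inter-support term penalises any labelling in which a pixel disagrees with its region, so a pixel whose own classifier is only weakly confident is pulled towards the region's label.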

Latent Variable Modeling for Generative Concept Representations and Deep Generative Models


Title	Latent Variable Modeling for Generative Concept Representations and Deep Generative Models
Authors	Daniel T. Chang
Abstract	Latent representations are the essence of deep generative models and determine their usefulness and power. For latent representations to be useful as generative concept representations, their latent space must support latent space interpolation, attribute vectors and concept vectors, among other things. We investigate and discuss latent variable modeling, including latent variable models, latent representations and latent spaces, particularly hierarchical latent representations and latent space vectors and geometry. Our focus is on latent variable modeling as used in variational autoencoders and generative adversarial networks.
Tasks	Latent Variable Models
Published	2018-12-26
URL	http://arxiv.org/abs/1812.11856v1
PDF	http://arxiv.org/pdf/1812.11856v1.pdf
PWC	https://paperswithcode.com/paper/latent-variable-modeling-for-generative
Repo
Framework
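Two of the latent-space operations named in the abstract, interpolation and attribute vectors, are simple vector arithmetic once latent codes are available. The sketch below shows generic versions (linear and spherical interpolation, and an attribute vector as the difference of mean codes with and without an attribute); these are common conventions, not a method attributed to the paper, and any encoder or decoder that would produce or consume the codes is assumed.

```python
import math

def lerp(z0, z1, t):
    """Linear interpolation between two latent codes."""
    return [(1 - t) * a + t * b for a, b in zip(z0, z1)]

def slerp(z0, z1, t, eps=1e-7):
    """Spherical interpolation; often preferred when the prior is an isotropic Gaussian."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if math.sin(omega) < eps:  # (anti)parallel codes: arc ill-defined, fall back to lerp
        return lerp(z0, z1, t)
    w0 = math.sin((1 - t) * omega) / math.sin(omega)
    w1 = math.sin(t * omega) / math.sin(omega)
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]

def mean_vector(codes):
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]

def attribute_vector(codes_with, codes_without):
    """Difference of mean latent codes of examples with and without an attribute."""
    return [a - b for a, b in zip(mean_vector(codes_with), mean_vector(codes_without))]

def add_attribute(z, attr, strength=1.0):
    """Move a code along an attribute direction; decoding it should add the attribute."""
    return [a + strength * b for a, b in zip(z, attr)]
```

Whether decoding `add_attribute(z, attr)` really changes only the intended attribute depends on how disentangled the learned latent space is, which is one of the geometric questions the paper discusses.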

Towards Fluent Translations from Disfluent Speech


Title	Towards Fluent Translations from Disfluent Speech
Authors	Elizabeth Salesky, Susanne Burger, Jan Niehues, Alex Waibel
Abstract	When translating from speech, special consideration for conversational speech phenomena such as disfluencies is necessary. Most machine translation training data consists of well-formed written texts, causing issues when translating spontaneous speech. Previous work has introduced an intermediate step between speech recognition (ASR) and machine translation (MT) to remove disfluencies, making the data better-matched to typical translation text and significantly improving performance. However, with the rise of end-to-end speech translation systems, this intermediate step must be incorporated into the sequence-to-sequence architecture. Further, though translated speech datasets exist, they are typically news or rehearsed speech without many disfluencies (e.g. TED), or the disfluencies are translated into the references (e.g. Fisher). To generate clean translations from disfluent speech, cleaned references are necessary for evaluation. We introduce a corpus of cleaned target data for the Fisher Spanish-English dataset for this task. We compare how different architectures handle disfluencies and provide a baseline for removing disfluencies in end-to-end translation.
Tasks	Machine Translation, Speech Recognition
Published	2018-11-07
URL	http://arxiv.org/abs/1811.03189v1
PDF	http://arxiv.org/pdf/1811.03189v1.pdf
PWC	https://paperswithcode.com/paper/towards-fluent-translations-from-disfluent
Repo
Framework
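The cleaned Fisher references remove disfluencies such as fillers and repetitions from the English target side. To show what "cleaning" means at the text level, here is a naive rule-based filter. It is not the paper's method (the paper studies end-to-end models that should learn to produce fluent output, and its references were cleaned for the corpus); the filler list is an assumption, and the repetition rule will wrongly collapse legitimate repeats such as "had had".

```python
FILLERS = {"uh", "um", "er", "ah", "mm", "hmm"}  # assumption: a small English filler list
PUNCT = ".,?!;:"

def _word(token):
    """Lower-cased token with surrounding punctuation stripped, for comparisons only."""
    return token.lower().strip(PUNCT)

def remove_disfluencies(utterance):
    """Drop filler words and immediate word repetitions ("I I think" -> "I think")."""
    kept = []
    for token in utterance.split():
        word = _word(token)
        if word in FILLERS:
            continue
        if kept and _word(kept[-1]) == word:
            continue
        kept.append(token)
    return " ".join(kept)
```

For example, `remove_disfluencies("Uh I I think, um, we should go.")` returns `"I think, we should go."`. Rules like these miss restarts and self-corrections ("we go, I mean, we went"), which is part of why learned approaches and properly cleaned references are needed.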