October 17, 2019

3024 words 15 mins read

Paper Group ANR 948

Learning Structure and Strength of CNN Filters for Small Sample Size Training. MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation. Predicate learning in neural systems: Discovering latent generative structures. Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concep …

Learning Structure and Strength of CNN Filters for Small Sample Size Training

Title Learning Structure and Strength of CNN Filters for Small Sample Size Training
Authors Rohit Keshari, Mayank Vatsa, Richa Singh, Afzel Noore
Abstract Convolutional Neural Networks have provided state-of-the-art results in several computer vision problems. However, due to the large number of parameters in CNNs, they require a large number of training samples, which is a limiting factor for small sample size problems. To address this limitation, we propose SSF-CNN, which focuses on learning the structure and strength of filters. The structure of the filter is initialized using a dictionary-based filter learning algorithm and the strength of the filter is learned using the small sample training data. The architecture provides the flexibility of training with both small and large training databases and yields good accuracies even with small training data. The effectiveness of the algorithm is first demonstrated on the MNIST, CIFAR10, and NORB databases, with a varying number of training samples. The results show that SSF-CNN significantly reduces the number of parameters required for training while providing high accuracies on the test databases. On small sample size problems such as newborn face recognition and Omniglot, it yields state-of-the-art results. Specifically, on the IIITD Newborn Face Database, the results demonstrate an improvement in rank-1 identification accuracy of at least 10%.
Tasks Face Recognition, Omniglot
Published 2018-03-30
URL http://arxiv.org/abs/1803.11405v1
PDF http://arxiv.org/pdf/1803.11405v1.pdf
PWC https://paperswithcode.com/paper/learning-structure-and-strength-of-cnn
Repo
Framework
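
To make the structure/strength idea concrete, here is a minimal PyTorch-style sketch (not the authors' implementation): the filter structure is frozen after initialization and only a per-filter scalar strength is trained. The dictionary-based initializer is replaced by a random placeholder, and all names are illustrative.

```python
# Minimal sketch (not the SSF-CNN code) of a conv layer whose filter
# *structure* is fixed and whose per-filter *strength* is learned.
import torch
import torch.nn as nn


class StructureStrengthConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Placeholder for the structure; the paper initializes this with a
        # dictionary-based filter learning algorithm.
        self.structure = nn.Parameter(torch.randn(out_ch, in_ch, k, k),
                                      requires_grad=False)
        # One trainable strength scalar per filter.
        self.strength = nn.Parameter(torch.ones(out_ch, 1, 1, 1))

    def forward(self, x):
        weight = self.strength * self.structure   # strength-scaled filters
        return nn.functional.conv2d(x, weight, padding=1)


layer = StructureStrengthConv(3, 16)
out = layer(torch.randn(8, 3, 32, 32))
print(out.shape)  # torch.Size([8, 16, 32, 32])
```

Only the strength tensors (and any classifier layers) would receive gradients, which is what keeps the number of trainable parameters small.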

MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation

Title MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation
Authors Tao Liu, Zihao Liu, Fuhong Lin, Yier Jin, Gang Quan, Wujie Wen
Abstract Modern deep-learning-enabled artificial neural networks, such as the Deep Neural Network (DNN) and Convolutional Neural Network (CNN), have achieved a series of record-breaking results on a broad spectrum of recognition applications. However, the enormous computation and storage requirements associated with such deep and complex neural network models greatly challenge their implementation on resource-limited platforms. The time-based spiking neural network has recently emerged as a promising solution in neuromorphic computing system designs for achieving remarkable computing and power efficiency within a single chip. However, the relevant research activities have been narrowly concentrated on biological plausibility and theoretical learning approaches, resulting in inefficient neural processing and impracticable multilayer extension, and thus significant limitations on speed and accuracy when handling realistic cognitive tasks. In this work, a practical multilayer time-based spiking neuromorphic architecture, namely “MT-Spike”, is developed to fill this gap. With the proposed practical time-coding scheme, average delay response model, temporal error backpropagation algorithm, and heuristic loss function, “MT-Spike” achieves more efficient neural processing through flexible neural model size reduction while offering very competitive classification accuracy for realistic recognition tasks. Simulation results validate that the algorithmic power of deep multi-layer learning can be seamlessly merged with the efficiency of the time-based spiking neuromorphic architecture, demonstrating the great potential of “MT-Spike” for resource- and power-constrained embedded platforms.
Tasks
Published 2018-03-14
URL http://arxiv.org/abs/1803.05117v1
PDF http://arxiv.org/pdf/1803.05117v1.pdf
PWC https://paperswithcode.com/paper/mt-spike-a-multilayer-time-based-spiking
Repo
Framework
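
The average delay response model and temporal error backpropagation are specific to the paper, but the basic time-coding intuition (stronger inputs fire earlier, and decisions are read from relative spike delays) can be illustrated with a toy sketch. Everything below is an assumption-laden stand-in, not MT-Spike itself.

```python
# Toy time-coding sketch (not the MT-Spike model): stronger inputs fire
# earlier, and the class is read out from relative spike delays.
import numpy as np

T_MAX = 10.0  # latest allowed spike time


def to_spike_times(x, t_max=T_MAX):
    """Map intensities in [0, 1] to spike times: larger value -> earlier spike."""
    x = np.clip(x, 0.0, 1.0)
    return t_max * (1.0 - x)


def delay_layer(spike_times, delays):
    """Each output neuron responds at the average of its delayed inputs
    (a crude stand-in for an average-delay response model)."""
    # spike_times: (n_in,), delays: (n_in, n_out)
    return (spike_times[:, None] + delays).mean(axis=0)


rng = np.random.default_rng(0)
x = rng.random(5)                      # fake input intensities
delays = rng.random((5, 3))            # per-synapse delays (learned in the paper)
out_times = delay_layer(to_spike_times(x), delays)
print("predicted class:", int(np.argmin(out_times)))  # earliest spike wins
```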

Predicate learning in neural systems: Discovering latent generative structures

Title Predicate learning in neural systems: Discovering latent generative structures
Authors Andrea E. Martin, Leonidas A. A. Doumas
Abstract Humans learn complex latent structures from their environments (e.g., natural language, mathematics, music, social hierarchies). In cognitive science and cognitive neuroscience, models that infer higher-order structures from sensory or first-order representations have been proposed to account for the complexity and flexibility of human behavior. But how do the structures that these models invoke arise in neural systems in the first place? To answer this question, we explain how a system can learn latent representational structures (i.e., predicates) from experience with wholly unstructured data. During the process of predicate learning, an artificial neural network exploits the naturally occurring dynamic properties of distributed computing across neuronal assemblies not only to learn predicates, but also to combine them compositionally, two computational aspects which appear to be necessary for human behavior as per formal theories in multiple domains. We describe how predicates can be combined generatively using neural oscillations to achieve human-like extrapolation and compositionality in an artificial neural network. The ability to learn predicates from experience, to represent structures compositionally, and to extrapolate to unseen data offers an inroad to understanding and modeling the most complex human behaviors.
Tasks
Published 2018-10-02
URL http://arxiv.org/abs/1810.01127v1
PDF http://arxiv.org/pdf/1810.01127v1.pdf
PWC https://paperswithcode.com/paper/predicate-learning-in-neural-systems
Repo
Framework

Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

Title Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Authors Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing
Abstract Textual grounding is an important but challenging task for human-computer interaction, robotics, and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep-net-based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and doesn’t rely on a successful first stage hypothesizing bounding box proposals. Beyond that, we demonstrate that the trained parameters of our model can be used as word embeddings which capture spatial-image relationships and provide interpretability. Lastly, at the time of submission, our approach outperformed the state-of-the-art methods on the Flickr 30k Entities and ReferItGame datasets by 3.08% and 7.77%, respectively.
Tasks Word Embeddings
Published 2018-03-29
URL http://arxiv.org/abs/1803.11209v1
PDF http://arxiv.org/pdf/1803.11209v1.pdf
PWC https://paperswithcode.com/paper/interpretable-and-globally-optimal-prediction
Repo
Framework
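
Exhaustive search over all possible boxes is typically made tractable with a summed-area table, which lets a per-pixel score map be summed over any box in constant time. The sketch below uses a made-up score map and brute-force enumeration to illustrate that idea; it is not the paper's actual energy function or optimization scheme.

```python
# Sketch: score every axis-aligned box over a per-pixel "phrase evidence" map
# using a summed-area table, so each box score costs O(1). Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
score_map = rng.normal(size=(20, 30))          # per-pixel score for a phrase

# Summed-area table with a zero border for easy indexing.
sat = np.zeros((score_map.shape[0] + 1, score_map.shape[1] + 1))
sat[1:, 1:] = score_map.cumsum(0).cumsum(1)


def box_score(r0, c0, r1, c1):
    """Sum of scores inside rows r0..r1-1, cols c0..c1-1."""
    return sat[r1, c1] - sat[r0, c1] - sat[r1, c0] + sat[r0, c0]


best, best_box = -np.inf, None
H, W = score_map.shape
for r0 in range(H):
    for r1 in range(r0 + 1, H + 1):
        for c0 in range(W):
            for c1 in range(c0 + 1, W + 1):
                s = box_score(r0, c0, r1, c1)
                if s > best:
                    best, best_box = s, (r0, c0, r1, c1)
print(best_box, round(best, 2))
```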

Fixation Data Analysis for High Resolution Satellite Images

Title Fixation Data Analysis for High Resolution Satellite Images
Authors Ashu Sharma, Jayanta Kumar Ghosh, Saptrarshi Kolay
Abstract The presented study is an eye tracking experiment for high-resolution satellite (HRS) images. The reported experiment explores Area Of Interest (AOI) based analysis of eye fixation data for complex HRS images. The study highlights the need for reference data for bottom-up saliency-based segmentation and the difficulty of eye tracking data analysis for complex satellite images. The fixation data analysis is aimed at creating reference data for bottom-up saliency-based segmentation of high-resolution satellite images. The analytical outcome of this experimental study provides a solution for AOI-based analysis of fixation data in the complex environment of satellite images, along with recommendations for reference data construction, which is already an ongoing effort.
Tasks Eye Tracking
Published 2018-05-01
URL http://arxiv.org/abs/1805.00192v1
PDF http://arxiv.org/pdf/1805.00192v1.pdf
PWC https://paperswithcode.com/paper/fixation-data-analysis-for-high-resolution
Repo
Framework

Nrityantar: Pose oblivious Indian classical dance sequence classification system

Title Nrityantar: Pose oblivious Indian classical dance sequence classification system
Authors Vinay Kaushik, Prerana Mukherjee, Brejesh Lall
Abstract In this paper, we attempt to extend research in human action recognition to a rather specialized application, namely Indian Classical Dance (ICD) classification. The variation in such dance forms in terms of hand and body postures, facial expressions or emotions, and head orientation makes pose estimation an extremely challenging task. To circumvent this problem, we construct a pose-oblivious shape signature which is fed to a sequence learning framework. The pose signature is constructed in a two-fold process. First, we represent the person’s pose in the first frame of a dance video using symmetric Spatial Transformer Networks (STN) to extract good person object proposals and a CNN-based parallel single person pose estimator (SPPE). Next, the pose bases are converted to pose flows by assigning a similarity score between successive poses followed by non-maximal suppression. Instead of feeding a simple chain of joints into the sequence learner, which generally hinders network performance, we constitute a feature vector of the normalized distance vectors, flow, and angles between anchor joints, which captures the adjacency configuration in the skeletal pattern. Thus, the kinematic relationship among the body joints across frames, obtained via pose estimation, helps in better establishing the spatio-temporal dependencies. We present an exhaustive empirical evaluation of state-of-the-art deep-network-based methods for dance classification on an ICD dataset.
Tasks Pose Estimation, Temporal Action Localization
Published 2018-12-13
URL http://arxiv.org/abs/1812.05231v1
PDF http://arxiv.org/pdf/1812.05231v1.pdf
PWC https://paperswithcode.com/paper/nrityantar-pose-oblivious-indian-classical
Repo
Framework
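
As a rough illustration of turning per-frame joint coordinates into a pose-oblivious signature (normalized distances and joint directions relative to an anchor joint), here is a small numpy sketch. The joint layout and exact features are invented; the paper's signature additionally includes flow between frames.

```python
# Rough sketch of a pose-oblivious frame signature: normalized joint-to-anchor
# distances plus joint directions. Joint indices here are invented, and the
# paper's feature set (distances, flow, angles) is richer.
import numpy as np


def frame_signature(joints, anchor=0):
    """joints: (n_joints, 2) array of 2D joint coordinates for one frame."""
    rel = joints - joints[anchor]                 # coordinates relative to anchor
    dists = np.linalg.norm(rel, axis=1)
    scale = dists.max() + 1e-8                    # normalize out body size
    angles = np.arctan2(rel[:, 1], rel[:, 0])     # direction of each joint
    return np.concatenate([dists / scale, np.cos(angles), np.sin(angles)])


rng = np.random.default_rng(2)
video = rng.random((16, 14, 2))                   # 16 frames, 14 joints
features = np.stack([frame_signature(f) for f in video])
print(features.shape)                             # (16, 42) -> fed to a sequence learner
```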

A Forest Mixture Bound for Block-Free Parallel Inference

Title A Forest Mixture Bound for Block-Free Parallel Inference
Authors Neal Lawton, Aram Galstyan, Greg Ver Steeg
Abstract Coordinate ascent variational inference is an important algorithm for inference in probabilistic models, but it is slow because it updates only a single variable at a time. Block coordinate methods perform inference faster by updating blocks of variables in parallel. However, the speed and stability of these algorithms depend on how the variables are partitioned into blocks. In this paper, we give a stable parallel algorithm for inference in deep exponential families that doesn’t require the variables to be partitioned into blocks. We achieve this by lower bounding the ELBO by a new objective we call the forest mixture bound (FM bound) that separates the inference problem for variables within a hidden layer. We apply this to the simple case in which all random variables are Gaussian and show empirically that the algorithm converges faster for models that are inherently more forest-like.
Tasks
Published 2018-05-17
URL http://arxiv.org/abs/1805.06951v1
PDF http://arxiv.org/pdf/1805.06951v1.pdf
PWC https://paperswithcode.com/paper/a-forest-mixture-bound-for-block-free
Repo
Framework

Coordinating and Integrating Faceted Classification with Rich Semantic Modeling

Title Coordinating and Integrating Faceted Classification with Rich Semantic Modeling
Authors Robert B. Allen, Jaihyun Park
Abstract Faceted classifications define dimensions for the types of entities included. In effect, the facets provide an “ontological commitment”. We compare a faceted thesaurus, the Art and Architecture Thesaurus (AAT), with ontologies derived from the Basic Formal Ontology (BFO2), which is an upper (or formal) ontology widely used to describe entities in biomedicine. We consider how the AAT and BFO2-based ontologies could be coordinated and integrated into a Human Activity and Infrastructure Foundry (HAIF). To extend the AAT to enable this coordination and integration, we describe how a wider range of relationships among its terms could be introduced. Using these extensions, we explore richer modeling of topics from AAT that deal with Technology. Finally, we consider how ontology-based frames and semantic role frames can be integrated to make rich semantic statements about changes in the world.
Tasks
Published 2018-09-25
URL http://arxiv.org/abs/1809.09548v1
PDF http://arxiv.org/pdf/1809.09548v1.pdf
PWC https://paperswithcode.com/paper/coordinating-and-integrating-faceted
Repo
Framework

Current Trends and Future Research Directions for Interactive Music

Title Current Trends and Future Research Directions for Interactive Music
Authors Mauricio Toro
Abstract In this review, we explain and compare different software and formalisms used in music interaction: sequencers, computer-assisted improvisation, meta-instruments, score-following, asynchronous dataflow languages, synchronous dataflow languages, process calculi, temporal constraints, and interactive scores. Formal approaches have the advantage of providing rigorous semantics for the behavior of the model and of proving correctness during execution. The main disadvantage of formal approaches is the lack of commercial tools.
Tasks
Published 2018-10-05
URL http://arxiv.org/abs/1810.04276v1
PDF http://arxiv.org/pdf/1810.04276v1.pdf
PWC https://paperswithcode.com/paper/current-trends-and-future-research-directions
Repo
Framework

Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning

Title Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning
Authors Hongyu Xu, Jingjing Zheng, Azadeh Alavi, Rama Chellappa
Abstract In real-world visual recognition problems, the assumption that the training data (source domain) and test data (target domain) are sampled from the same distribution is often violated. This is known as the domain adaptation problem. In this work, we propose a novel domain-adaptive dictionary learning framework for cross-domain visual recognition. Our method generates a set of intermediate domains. These intermediate domains form a smooth path and bridge the gap between the source and target domains. Specifically, we not only learn a common dictionary to encode the domain-shared features, but also learn a set of domain-specific dictionaries to model the domain shift. The separation of the common and domain-specific dictionaries enables us to learn more compact and reconstructive dictionaries for domain adaptation. These dictionaries are learned by alternating between domain-adaptive sparse coding and dictionary updating steps. Meanwhile, our approach gradually recovers the feature representations of both source and target data along the domain path. By aligning all the recovered domain data, we derive the final domain-adaptive features for cross-domain visual recognition. Extensive experiments on three public datasets demonstrate that our approach outperforms most state-of-the-art methods.
Tasks Dictionary Learning, Domain Adaptation
Published 2018-04-12
URL http://arxiv.org/abs/1804.04687v2
PDF http://arxiv.org/pdf/1804.04687v2.pdf
PWC https://paperswithcode.com/paper/cross-domain-visual-recognition-via-domain
Repo
Framework
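
The alternation between sparse coding and dictionary updating mentioned in the abstract can be sketched for a single dictionary as below (ISTA for the codes, least squares plus column renormalization for the atoms). The paper's common-plus-domain-specific dictionaries along a source-to-target path are not modeled in this toy loop.

```python
# Toy alternation between sparse coding (ISTA) and a least-squares dictionary
# update for ONE dictionary; a didactic stand-in, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(64, 200))          # 200 signals of dimension 64
D = rng.normal(size=(64, 32))           # 32 atoms
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
lam = 0.1                               # sparsity weight

for it in range(20):
    # --- sparse coding: a few ISTA steps for A in min 0.5*||X-DA||^2 + lam*|A|_1 ---
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    A = np.zeros((32, 200))
    for _ in range(50):
        A = A - step * (D.T @ (D @ A - X))
        A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)   # soft threshold
    # --- dictionary update: least squares, then renormalize atoms ---
    D = X @ A.T @ np.linalg.pinv(A @ A.T + 1e-6 * np.eye(32))
    D /= np.linalg.norm(D, axis=0) + 1e-12

print("final reconstruction error:", round(float(np.linalg.norm(X - D @ A)), 3))
```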

SAVERS: SAR ATR with Verification Support Based on Convolutional Neural Network

Title SAVERS: SAR ATR with Verification Support Based on Convolutional Neural Network
Authors Hidetoshi Furukawa
Abstract We propose a new convolutional neural network (CNN) which performs coarse and fine segmentation for an end-to-end synthetic aperture radar (SAR) automatic target recognition (ATR) system. In recent years, many CNNs for SAR ATR using deep learning have been proposed, but most of them classify target classes from fixed-size target chips extracted from SAR imagery. In earlier work, we proposed a CNN which performs fine segmentation, outputting per-pixel scores for multiple target classes and a background class from SAR imagery of arbitrary size containing multiple targets. However, it was necessary for humans to judge the CNN segmentation result. In this report, we propose a CNN called SAR ATR with verification support (SAVERS), which performs both region-wise (i.e., coarse) segmentation and pixel-wise segmentation. SAVERS discriminates between target and non-target, and classifies multiple target classes and a non-target class via coarse segmentation. This report describes the evaluation results of SAVERS using the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset.
Tasks
Published 2018-05-14
URL http://arxiv.org/abs/1805.06298v1
PDF http://arxiv.org/pdf/1805.06298v1.pdf
PWC https://paperswithcode.com/paper/savers-sar-atr-with-verification-support
Repo
Framework
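
A coarse-plus-fine segmentation network can be sketched as a shared trunk with two heads, one scoring every pixel and one scoring pooled regions. The layer sizes and pooling factor below are invented and do not reflect the actual SAVERS architecture.

```python
# Minimal two-head sketch (not the SAVERS architecture): a shared trunk with a
# pixel-wise segmentation head and a coarse, region-wise classification head.
import torch
import torch.nn as nn


class CoarseFineNet(nn.Module):
    def __init__(self, n_classes=11):              # e.g. 10 target classes + background
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.fine_head = nn.Conv2d(32, n_classes, 1)     # per-pixel scores
        self.coarse_head = nn.Conv2d(32, n_classes, 1)   # per-region scores

    def forward(self, x):
        h = self.trunk(x)
        fine = self.fine_head(h)                               # (B, C, H, W)
        # Region-wise scores: pool features over 8x8 regions before scoring.
        coarse = self.coarse_head(nn.functional.avg_pool2d(h, 8))
        return fine, coarse


net = CoarseFineNet()
fine, coarse = net(torch.randn(2, 1, 64, 64))
print(fine.shape, coarse.shape)    # (2, 11, 64, 64) and (2, 11, 8, 8)
```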

ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction

Title ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction
Authors Jonathan Rotsztejn, Nora Hollenstein, Ce Zhang
Abstract Reliably detecting relevant relations between entities in unstructured text is a valuable resource for knowledge extraction, which is why it has attracted significant interest in the field of Natural Language Processing. In this paper, we present a system for relation classification and extraction based on an ensemble of convolutional and recurrent neural networks that ranked first in 3 out of the 4 subtasks at SemEval 2018 Task 7. We provide detailed explanations and grounds for the design choices behind the most relevant features and analyze their importance.
Tasks Relation Classification
Published 2018-04-05
URL http://arxiv.org/abs/1804.02042v1
PDF http://arxiv.org/pdf/1804.02042v1.pdf
PWC https://paperswithcode.com/paper/eth-ds3lab-at-semeval-2018-task-7-effectively
Repo
Framework
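
One simple way to combine convolutional and recurrent text classifiers, in the spirit of the ensemble described above, is to average their class probabilities. The toy models and sizes below are placeholders, not the SemEval system.

```python
# Toy ensemble sketch: average the class probabilities of a small CNN and a
# small RNN text classifier. Shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn

VOCAB, EMB, CLASSES, LEN = 1000, 50, 6, 40
emb = nn.Embedding(VOCAB, EMB)
cnn = nn.Sequential(nn.Conv1d(EMB, 64, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveMaxPool1d(1), nn.Flatten(), nn.Linear(64, CLASSES))
rnn = nn.LSTM(EMB, 64, batch_first=True)
rnn_out = nn.Linear(64, CLASSES)

tokens = torch.randint(0, VOCAB, (8, LEN))          # a batch of 8 token sequences
x = emb(tokens)                                     # (8, LEN, EMB)
p_cnn = torch.softmax(cnn(x.transpose(1, 2)), dim=-1)
h, _ = rnn(x)
p_rnn = torch.softmax(rnn_out(h[:, -1]), dim=-1)
p_ens = (p_cnn + p_rnn) / 2                         # simple probability averaging
print(p_ens.argmax(dim=-1))
```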

Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback

Title Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
Authors Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff
Abstract We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Unlike much of the work on related problems, in our formulation a learning algorithm does not have access to either a priori information or observations of the state of the Markov chain and only observes smoothed reward feedback following time intervals we refer to as epochs. We demonstrate that existing methods such as UCB and $\varepsilon$-greedy can suffer linear regret in such an environment. Employing mixing-time bounds on Markov chains, we develop algorithms called EpochUCB and EpochGreedy that draw inspiration from the aforementioned methods, yet which admit sublinear regret guarantees for the problem formulation. Our proposed algorithms proceed in epochs in which an arm is played repeatedly for a number of iterations that grows linearly as a function of the number of times an arm has been played in the past. We analyze these algorithms under two types of smoothed reward feedback at the end of each epoch: a reward that is the discount-average of the discounted rewards within an epoch, and a reward that is the time-average of the rewards within an epoch.
Tasks Multi-Armed Bandits, Q-Learning
Published 2018-03-11
URL http://arxiv.org/abs/1803.04008v2
PDF http://arxiv.org/pdf/1803.04008v2.pdf
PWC https://paperswithcode.com/paper/multi-armed-bandits-for-correlated-markovian
Repo
Framework
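
A hedged sketch of the epoch idea: the selected arm is played for a whole epoch whose length grows with how often that arm has been chosen before, only the epoch-averaged reward is observed, and a UCB rule is applied to those averages. The constants, confidence width, and i.i.d. rewards below are illustrative and are not the paper's algorithm or analysis.

```python
# Illustrative EpochUCB-style loop (constants and confidence width are NOT the
# paper's): the chosen arm is played for a whole epoch, and only the
# epoch-averaged reward is fed back.
import numpy as np

rng = np.random.default_rng(4)
K, horizon = 3, 2000
means = np.array([0.3, 0.5, 0.7])           # i.i.d. stand-in for Markovian rewards
epoch_counts = np.zeros(K)                  # epochs in which each arm was chosen
avg_reward = np.zeros(K)                    # mean of the observed epoch averages
t = 0

while t < horizon:
    if np.any(epoch_counts == 0):
        arm = int(np.argmin(epoch_counts))  # play each arm once first
    else:
        ucb = avg_reward + np.sqrt(2 * np.log(epoch_counts.sum()) / epoch_counts)
        arm = int(np.argmax(ucb))
    plays = int(1 + epoch_counts[arm])      # epoch length grows with past selections
    rewards = rng.binomial(1, means[arm], size=plays)
    t += plays
    epoch_counts[arm] += 1
    # incremental update of the mean of epoch-averaged rewards
    avg_reward[arm] += (rewards.mean() - avg_reward[arm]) / epoch_counts[arm]

print("estimated best arm:", int(np.argmax(avg_reward)))
```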

General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation

Title General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation
Authors Lidia Contreras-Ochando, César Ferri, José Hernández-Orallo, Fernando Martínez-Plumed, María José Ramírez-Quintana, Susumu Katayama
Abstract Given one or two examples, humans are good at understanding how to solve a problem independently of its domain, because they are able to detect what the problem is and to choose the appropriate background knowledge according to the context. For instance, presented with the string “8/17/2017” to be transformed to “17th of August of 2017”, humans will process this in two steps: (1) they recognise that it is a date and (2) they map the date to the 17th of August of 2017. Inductive Programming (IP) aims at learning declarative (functional or logic) programs from examples. Two key advantages of IP are the use of background knowledge and the ability to synthesise programs from a few input/output examples (as humans do). In this paper we propose to use IP as a means for automating repetitive data manipulation tasks that frequently arise during the process of data wrangling in many data manipulation problems. Here we show that with the use of general-purpose declarative (programming) languages jointly with generic IP systems and the definition of domain-specific knowledge, many specific data wrangling problems from different application domains can be automatically solved from very few examples. We also propose an integrated benchmark for data wrangling, which we share publicly with the community.
Tasks
Published 2018-09-26
URL http://arxiv.org/abs/1809.10054v1
PDF http://arxiv.org/pdf/1809.10054v1.pdf
PWC https://paperswithcode.com/paper/general-purpose-declarative-inductive
Repo
Framework
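
The two-step behaviour in the abstract's date example can be mimicked by a hand-written Python function. The point of the paper is that an IP system learns such transformations from one or two examples plus domain-specific background knowledge, so the function below is only a toy analogue of the target behaviour, not the proposed method.

```python
# Toy analogue of the abstract's example: recognise that "8/17/2017" is a date
# and render it as "17th of August of 2017". The IP systems in the paper
# *learn* such transformations from examples; this is hand-written.
from datetime import datetime


def ordinal(n):
    suffix = "th" if 11 <= n % 100 <= 13 else {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def wrangle_us_date(text):
    d = datetime.strptime(text, "%m/%d/%Y")       # step 1: recognise a US-style date
    return f"{ordinal(d.day)} of {d.strftime('%B')} of {d.year}"  # step 2: render it


print(wrangle_us_date("8/17/2017"))  # 17th of August of 2017
```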

Unspeech: Unsupervised Speech Context Embeddings

Title Unspeech: Unsupervised Speech Context Embeddings
Authors Benjamin Milde, Chris Biemann
Abstract We introduce “Unspeech” embeddings, which are based on unsupervised learning of context feature representations for spoken language. The embeddings were trained on up to 9500 hours of crawled English speech data without transcriptions or speaker information, by using a straightforward learning objective based on context and non-context discrimination with negative sampling. We use a Siamese convolutional neural network architecture to train Unspeech embeddings and evaluate them on speaker comparison, utterance clustering, and as a context feature in TDNN-HMM acoustic models trained on TED-LIUM, comparing it to i-vector baselines. In particular, decoding out-of-domain speech data from the recently released Common Voice corpus shows consistent WER reductions. We release our source code and pre-trained Unspeech models under a permissive open source license.
Tasks
Published 2018-04-18
URL http://arxiv.org/abs/1804.06775v2
PDF http://arxiv.org/pdf/1804.06775v2.pdf
PWC https://paperswithcode.com/paper/unspeech-unsupervised-speech-context
Repo
Framework
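
The context-versus-non-context objective with negative sampling can be sketched on placeholder embedding vectors as below. In the actual system these embeddings come from a Siamese CNN over speech features, which is not reproduced here.

```python
# Sketch of a context-vs-non-context objective with negative sampling, on
# placeholder embedding vectors; only the loss is shown, not the Siamese CNN.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 100
target = torch.randn(dim, requires_grad=True)        # embedding of a target segment
context = torch.randn(4, dim)                        # true neighbouring segments
negatives = torch.randn(8, dim)                      # randomly sampled segments

pos_logits = context @ target                        # want high dot products
neg_logits = negatives @ target                      # want low dot products
loss = F.binary_cross_entropy_with_logits(pos_logits, torch.ones(4)) + \
       F.binary_cross_entropy_with_logits(neg_logits, torch.zeros(8))
loss.backward()                                      # gradients flow into the target embedding
print(float(loss))
```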