Paper Group ANR 948
Learning Structure and Strength of CNN Filters for Small Sample Size Training
Title | Learning Structure and Strength of CNN Filters for Small Sample Size Training |
Authors | Rohit Keshari, Mayank Vatsa, Richa Singh, Afzel Noore |
Abstract | Convolutional Neural Networks have provided state-of-the-art results in several computer vision problems. However, due to the large number of parameters in CNNs, they require a large number of training samples, which is a limiting factor for small sample size problems. To address this limitation, we propose SSF-CNN, which focuses on learning the structure and strength of filters. The structure of the filter is initialized using a dictionary-based filter learning algorithm and the strength of the filter is learned using the small sample training data. The architecture provides the flexibility of training with both small and large training databases and yields good accuracies even with small size training data. The effectiveness of the algorithm is first demonstrated on MNIST, CIFAR10, and NORB databases, with a varying number of training samples. The results show that SSF-CNN significantly reduces the number of parameters required for training while providing high accuracies on the test databases. On small sample size problems such as newborn face recognition and Omniglot, it yields state-of-the-art results. Specifically, on the IIITD Newborn Face Database, the results demonstrate improvement in rank-1 identification accuracy by at least 10%. |
Tasks | Face Recognition, Omniglot |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11405v1 |
http://arxiv.org/pdf/1803.11405v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-structure-and-strength-of-cnn |
Repo | |
Framework | |
MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation
Title | MT-Spike: A Multilayer Time-based Spiking Neuromorphic Architecture with Temporal Error Backpropagation |
Authors | Tao Liu, Zihao Liu, Fuhong Lin, Yier Jin, Gang Quan, Wujie Wen |
Abstract | Modern deep learning enabled artificial neural networks, such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN), have achieved a series of record-breaking results on a broad spectrum of recognition applications. However, the enormous computation and storage requirements associated with such deep and complex neural network models greatly challenge their implementation on resource-limited platforms. Time-based spiking neural networks have recently emerged as a promising solution in Neuromorphic Computing System designs for achieving remarkable computing and power efficiency within a single chip. However, the relevant research activities have been narrowly concentrated on biological plausibility and theoretical learning approaches, causing inefficient neural processing and impracticable multilayer extension, and thus significant limitations on speed and accuracy when handling realistic cognitive tasks. In this work, a practical multilayer time-based spiking neuromorphic architecture, namely “MT-Spike”, is developed to fill this gap. With the proposed practical time-coding scheme, average delay response model, temporal error backpropagation algorithm, and heuristic loss function, “MT-Spike” achieves more efficient neural processing through flexible neural model size reduction while offering very competitive classification accuracy for realistic recognition tasks. Simulation results validate that the algorithmic power of deep multi-layer learning can be seamlessly merged with the efficiency of time-based spiking neuromorphic architecture, demonstrating the great potential of “MT-Spike” in resource and power constrained embedded platforms. |
Tasks | |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05117v1 |
http://arxiv.org/pdf/1803.05117v1.pdf | |
PWC | https://paperswithcode.com/paper/mt-spike-a-multilayer-time-based-spiking |
Repo | |
Framework | |
Predicate learning in neural systems: Discovering latent generative structures
Title | Predicate learning in neural systems: Discovering latent generative structures |
Authors | Andrea E. Martin, Leonidas A. A. Doumas |
Abstract | Humans learn complex latent structures from their environments (e.g., natural language, mathematics, music, social hierarchies). In cognitive science and cognitive neuroscience, models that infer higher-order structures from sensory or first-order representations have been proposed to account for the complexity and flexibility of human behavior. But how do the structures that these models invoke arise in neural systems in the first place? To answer this question, we explain how a system can learn latent representational structures (i.e., predicates) from experience with wholly unstructured data. During the process of predicate learning, an artificial neural network exploits the naturally occurring dynamic properties of distributed computing across neuronal assemblies in order to learn predicates, but also to combine them compositionally, two computational aspects which appear to be necessary for human behavior as per formal theories in multiple domains. We describe how predicates can be combined generatively using neural oscillations to achieve human-like extrapolation and compositionality in an artificial neural network. The ability to learn predicates from experience, to represent structures compositionally, and to extrapolate to unseen data offers an inroad to understanding and modeling the most complex human behaviors. |
Tasks | |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01127v1 |
http://arxiv.org/pdf/1810.01127v1.pdf | |
PWC | https://paperswithcode.com/paper/predicate-learning-in-neural-systems |
Repo | |
Framework | |
Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
Title | Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts |
Authors | Raymond A. Yeh, Jinjun Xiong, Wen-mei W. Hwu, Minh N. Do, Alexander G. Schwing |
Abstract | Textual grounding is an important but challenging task for human-computer interaction, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and doesn’t rely on a successful first stage that hypothesizes bounding box proposals. Beyond this, we demonstrate that the trained parameters of our model can be used as word-embeddings which capture spatial-image relationships and provide interpretability. Lastly, at the time of submission, our approach outperformed the current state-of-the-art methods on the Flickr 30k Entities and the ReferItGame dataset by 3.08% and 7.77% respectively. |
Tasks | Word Embeddings |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11209v1 |
http://arxiv.org/pdf/1803.11209v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-and-globally-optimal-prediction |
Repo | |
Framework | |
Fixation Data Analysis for High Resolution Satellite Images
Title | Fixation Data Analysis for High Resolution Satellite Images |
Authors | Ashu Sharma, Jayanta Kumar Ghosh, Saptrarshi Kolay |
Abstract | The presented study is an eye tracking experiment for high-resolution satellite (HRS) images. The reported experiment explores the Area Of Interest (AOI) based analysis of eye fixation data for complex HRS images. The study reflects the need for reference data for bottom-up saliency-based segmentation and the difficulty of eye tracking data analysis for complex satellite images. The intended fixation data analysis aims at creating reference data for bottom-up saliency-based segmentation of high-resolution satellite images. The analytical outcome of this experimental study provides a solution for AOI-based analysis of fixation data in the complex environment of satellite images and recommendations for reference data construction, which is already an ongoing effort. |
Tasks | Eye Tracking |
Published | 2018-05-01 |
URL | http://arxiv.org/abs/1805.00192v1 |
http://arxiv.org/pdf/1805.00192v1.pdf | |
PWC | https://paperswithcode.com/paper/fixation-data-analysis-for-high-resolution |
Repo | |
Framework | |
Nrityantar: Pose oblivious Indian classical dance sequence classification system
Title | Nrityantar: Pose oblivious Indian classical dance sequence classification system |
Authors | Vinay Kaushik, Prerana Mukherjee, Brejesh Lall |
Abstract | In this paper, we attempt to advance the research work done in human action recognition to a rather specialized application, namely Indian Classical Dance (ICD) classification. The variation in such dance forms in terms of hand and body postures, facial expressions or emotions and head orientation makes pose estimation an extremely challenging task. To circumvent this problem, we construct a pose-oblivious shape signature which is fed to a sequence learning framework. The pose signature representation is built in a two-fold process. First, we represent the person-pose in the first frame of a dance video using symmetric Spatial Transformer Networks (STN) to extract good person object proposals and a CNN-based parallel single person pose estimator (SPPE). Next, the pose bases are converted to pose flows by assigning a similarity score between successive poses followed by non-maximal suppression. Instead of feeding a simple chain of joints into the sequence learner, which generally hinders network performance, we construct a feature vector of the normalized distance vectors, flow, and angles between anchor joints, which captures the adjacency configuration in the skeletal pattern. Thus, the kinematic relationship amongst the body joints across the frames using pose estimation helps in better establishing the spatio-temporal dependencies. We present an exhaustive empirical evaluation of state-of-the-art deep network based methods for dance classification on the ICD dataset. |
Tasks | Pose Estimation, Temporal Action Localization |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05231v1 |
http://arxiv.org/pdf/1812.05231v1.pdf | |
PWC | https://paperswithcode.com/paper/nrityantar-pose-oblivious-indian-classical |
Repo | |
Framework | |
A Forest Mixture Bound for Block-Free Parallel Inference
Title | A Forest Mixture Bound for Block-Free Parallel Inference |
Authors | Neal Lawton, Aram Galstyan, Greg Ver Steeg |
Abstract | Coordinate ascent variational inference is an important algorithm for inference in probabilistic models, but it is slow because it updates only a single variable at a time. Block coordinate methods perform inference faster by updating blocks of variables in parallel. However, the speed and stability of these algorithms depend on how the variables are partitioned into blocks. In this paper, we give a stable parallel algorithm for inference in deep exponential families that doesn’t require the variables to be partitioned into blocks. We achieve this by lower bounding the ELBO by a new objective we call the forest mixture bound (FM bound) that separates the inference problem for variables within a hidden layer. We apply this to the simple case when all random variables are Gaussian and show empirically that the algorithm converges faster for models that are inherently more forest-like. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06951v1 |
http://arxiv.org/pdf/1805.06951v1.pdf | |
PWC | https://paperswithcode.com/paper/a-forest-mixture-bound-for-block-free |
Repo | |
Framework | |
Coordinating and Integrating Faceted Classification with Rich Semantic Modeling
Title | Coordinating and Integrating Faceted Classification with Rich Semantic Modeling |
Authors | Robert B. Allen, Jaihyun Park |
Abstract | Faceted classifications define dimensions for the types of entities included. In effect, the facets provide an “ontological commitment”. We compare a faceted thesaurus, the Art and Architecture Thesaurus (AAT), with ontologies derived from the Basic Formal Ontology (BFO2), which is an upper (or formal) ontology widely used to describe entities in biomedicine. We consider how the AAT and BFO2-based ontologies could be coordinated and integrated into a Human Activity and Infrastructure Foundry (HAIF). To extend the AAT to enable this coordination and integration, we describe how a wider range of relationships among its terms could be introduced. Using these extensions, we explore richer modeling of topics from AAT that deal with Technology. Finally, we consider how ontology-based frames and semantic role frames can be integrated to make rich semantic statements about changes in the world. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09548v1 |
http://arxiv.org/pdf/1809.09548v1.pdf | |
PWC | https://paperswithcode.com/paper/coordinating-and-integrating-faceted |
Repo | |
Framework | |
Current Trends and Future Research Directions for Interactive Music
Title | Current Trends and Future Research Directions for Interactive Music |
Authors | Mauricio Toro |
Abstract | This review explains and compares different software and formalisms used in music interaction: sequencers, computer-assisted improvisation, meta-instruments, score-following, asynchronous dataflow languages, synchronous dataflow languages, process calculi, temporal constraints and interactive scores. Formal approaches have the advantage of providing rigorous semantics of the behavior of the model and proving correctness during execution. The main disadvantage of formal approaches is the lack of commercial tools. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.04276v1 |
http://arxiv.org/pdf/1810.04276v1.pdf | |
PWC | https://paperswithcode.com/paper/current-trends-and-future-research-directions |
Repo | |
Framework | |
Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning
Title | Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning |
Authors | Hongyu Xu, Jingjing Zheng, Azadeh Alavi, Rama Chellappa |
Abstract | In real-world visual recognition problems, the assumption that the training data (source domain) and test data (target domain) are sampled from the same distribution is often violated. This is known as the domain adaptation problem. In this work, we propose a novel domain-adaptive dictionary learning framework for cross-domain visual recognition. Our method generates a set of intermediate domains. These intermediate domains form a smooth path and bridge the gap between the source and target domains. Specifically, we not only learn a common dictionary to encode the domain-shared features, but also learn a set of domain-specific dictionaries to model the domain shift. The separation of the common and domain-specific dictionaries enables us to learn more compact and reconstructive dictionaries for domain adaptation. These dictionaries are learned by alternating between domain-adaptive sparse coding and dictionary updating steps. Meanwhile, our approach gradually recovers the feature representations of both source and target data along the domain path. By aligning all the recovered domain data, we derive the final domain-adaptive features for cross-domain visual recognition. Extensive experiments on three public datasets demonstrate that our approach outperforms most state-of-the-art methods. |
Tasks | Dictionary Learning, Domain Adaptation |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04687v2 |
http://arxiv.org/pdf/1804.04687v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-visual-recognition-via-domain |
Repo | |
Framework | |
SAVERS: SAR ATR with Verification Support Based on Convolutional Neural Network
Title | SAVERS: SAR ATR with Verification Support Based on Convolutional Neural Network |
Authors | Hidetoshi Furukawa |
Abstract | We propose a new convolutional neural network (CNN) which performs coarse and fine segmentation for an end-to-end synthetic aperture radar (SAR) automatic target recognition (ATR) system. In recent years, many CNNs for SAR ATR using deep learning have been proposed, but most of them classify target classes from fixed size target chips extracted from SAR imagery. On the other hand, we proposed a CNN which outputs the scores of multiple target classes and a background class for each pixel from SAR imagery of arbitrary size containing multiple targets, as fine segmentation. However, it was necessary for humans to judge the CNN segmentation result. In this report, we propose a CNN called SAR ATR with verification support (SAVERS), which performs region-wise (i.e. coarse) segmentation and pixel-wise segmentation. SAVERS discriminates between target and non-target, and classifies multiple target classes and a non-target class by coarse segmentation. This report describes the evaluation results of SAVERS using the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. |
Tasks | |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.06298v1 |
http://arxiv.org/pdf/1805.06298v1.pdf | |
PWC | https://paperswithcode.com/paper/savers-sar-atr-with-verification-support |
Repo | |
Framework | |
ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction
Title | ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction |
Authors | Jonathan Rotsztejn, Nora Hollenstein, Ce Zhang |
Abstract | Reliably detecting relevant relations between entities in unstructured text is a valuable resource for knowledge extraction, which is why it has awakened significant interest in the field of Natural Language Processing. In this paper, we present a system for relation classification and extraction based on an ensemble of convolutional and recurrent neural networks that ranked first in 3 out of the 4 subtasks at SemEval 2018 Task 7. We provide detailed explanations and grounds for the design choices behind the most relevant features and analyze their importance. |
Tasks | Relation Classification |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.02042v1 |
http://arxiv.org/pdf/1804.02042v1.pdf | |
PWC | https://paperswithcode.com/paper/eth-ds3lab-at-semeval-2018-task-7-effectively |
Repo | |
Framework | |
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
Title | Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback |
Authors | Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff |
Abstract | We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Unlike much of the work on related problems, in our formulation a learning algorithm does not have access to either a priori information or observations of the state of the Markov chain and only observes smoothed reward feedback following time intervals we refer to as epochs. We demonstrate that existing methods such as UCB and $\varepsilon$-greedy can suffer linear regret in such an environment. Employing mixing-time bounds on Markov chains, we develop algorithms called EpochUCB and EpochGreedy that draw inspiration from the aforementioned methods, yet which admit sublinear regret guarantees for the problem formulation. Our proposed algorithms proceed in epochs in which an arm is played repeatedly for a number of iterations that grows linearly as a function of the number of times an arm has been played in the past. We analyze these algorithms under two types of smoothed reward feedback at the end of each epoch: a reward that is the discount-average of the discounted rewards within an epoch, and a reward that is the time-average of the rewards within an epoch. |
Tasks | Multi-Armed Bandits, Q-Learning |
Published | 2018-03-11 |
URL | http://arxiv.org/abs/1803.04008v2 |
http://arxiv.org/pdf/1803.04008v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-armed-bandits-for-correlated-markovian |
Repo | |
Framework | |
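The epoch schedule described in the abstract above can be illustrated with a small sketch. This is not the authors' EpochUCB: the confidence bonus is the textbook UCB1 term, the environment `TwoStateArms` is a hypothetical toy in which each arm follows its own two-state Markov chain, and only the time-averaged feedback variant is shown. All names are assumptions made for illustration.

```python
import math
import random


class TwoStateArms:
    """Hypothetical toy environment: each arm's reward depends on a hidden two-state chain."""

    def __init__(self, rewards, stay_prob=0.9, seed=0):
        self.rewards = rewards            # rewards[arm][state]
        self.stay_prob = stay_prob        # probability a chain keeps its state
        self.rng = random.Random(seed)
        self.states = [0] * len(rewards)  # hidden states, never shown to the learner

    def pull(self, arm):
        # Every chain moves at every time step, whether or not its arm is played.
        for a in range(len(self.states)):
            if self.rng.random() > self.stay_prob:
                self.states[a] = 1 - self.states[a]
        return self.rewards[arm][self.states[arm]]


def epoch_ucb(pull, n_arms, n_epochs):
    """Play one arm per epoch and learn only from the epoch's time-averaged reward."""
    counts = [0] * n_arms    # number of epochs in which each arm was played
    means = [0.0] * n_arms   # running mean of the smoothed epoch rewards
    history = []             # (arm, epoch_length, smoothed_reward) per epoch
    for k in range(1, n_epochs + 1):
        if k <= n_arms:
            arm = k - 1      # initial pass: play every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(k) / counts[a]),
            )
        length = counts[arm] + 1  # epoch length grows linearly with past plays
        smoothed = sum(pull(arm) for _ in range(length)) / length
        counts[arm] += 1
        means[arm] += (smoothed - means[arm]) / counts[arm]
        history.append((arm, length, smoothed))
    return counts, means, history


env = TwoStateArms(rewards=[[0.0, 0.2], [0.8, 1.0]])
counts, means, history = epoch_ucb(env.pull, n_arms=2, n_epochs=200)
print(counts, [round(m, 3) for m in means])
```

Lengthening the epochs lets the time-averaged feedback approach each arm's long-run mean as the chain mixes, which is the intuition behind the mixing-time argument mentioned in the abstract.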
General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation
Title | General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation |
Authors | Lidia Contreras-Ochando, César Ferri, José Hernández-Orallo, Fernando Martínez-Plumed, María José Ramírez-Quintana, Susumu Katayama |
Abstract | Given one or two examples, humans are good at understanding how to solve a problem independently of its domain, because they are able to detect what the problem is and to choose the appropriate background knowledge according to the context. For instance, presented with the string “8/17/2017” to be transformed to “17th of August of 2017”, humans will process this in two steps: (1) they recognise that it is a date and (2) they map the date to the 17th of August of 2017. Inductive Programming (IP) aims at learning declarative (functional or logic) programs from examples. Two key advantages of IP are the use of background knowledge and the ability to synthesise programs from a few input/output examples (as humans do). In this paper we propose to use IP as a means for automating repetitive data manipulation tasks, frequently presented during the process of data wrangling in many data manipulation problems. Here we show that with the use of general-purpose declarative (programming) languages jointly with generic IP systems and the definition of domain-specific knowledge, many specific data wrangling problems from different application domains can be automatically solved from very few examples. We also propose an integrated benchmark for data wrangling, which we share publicly for the community. |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10054v1 |
http://arxiv.org/pdf/1809.10054v1.pdf | |
PWC | https://paperswithcode.com/paper/general-purpose-declarative-inductive |
Repo | |
Framework | |
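The two-step date example in the abstract above can be written out by hand as follows. This is only the kind of target program an inductive programming system would be expected to synthesise from examples, not the paper's system. The month-first input pattern and the helper names `recognise_date`, `ordinal`, `verbalise` and `wrangle` are assumptions made for illustration.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Assumed domain knowledge: dates arrive as month/day/year.
US_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{4})$")


def recognise_date(text):
    """Step 1: decide whether the string is a date; return a date object or None."""
    match = US_DATE.match(text.strip())
    if match is None:
        return None
    month, day, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None


def ordinal(n):
    """Return n with its English ordinal suffix (1st, 2nd, 3rd, 4th, 11th, ...)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def verbalise(d):
    """Step 2: map the recognised date to the target textual form."""
    return f"{ordinal(d.day)} of {MONTHS[d.month - 1]} of {d.year}"


def wrangle(text):
    """Apply both steps; return None when the input is not recognised as a date."""
    d = recognise_date(text)
    return verbalise(d) if d is not None else None


print(wrangle("8/17/2017"))
```

Keeping recognition separate from transformation mirrors the abstract's point that domain-specific background knowledge (here, what a date looks like) is chosen first and the mapping is applied second.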
Unspeech: Unsupervised Speech Context Embeddings
Title | Unspeech: Unsupervised Speech Context Embeddings |
Authors | Benjamin Milde, Chris Biemann |
Abstract | We introduce “Unspeech” embeddings, which are based on unsupervised learning of context feature representations for spoken language. The embeddings were trained on up to 9500 hours of crawled English speech data without transcriptions or speaker information, by using a straightforward learning objective based on context and non-context discrimination with negative sampling. We use a Siamese convolutional neural network architecture to train Unspeech embeddings and evaluate them on speaker comparison, utterance clustering and as a context feature in TDNN-HMM acoustic models trained on TED-LIUM, comparing them to i-vector baselines. In particular, decoding out-of-domain speech data from the recently released Common Voice corpus shows consistent WER reductions. We release our source code and pre-trained Unspeech models under a permissive open source license. |
Tasks | |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06775v2 |
http://arxiv.org/pdf/1804.06775v2.pdf | |
PWC | https://paperswithcode.com/paper/unspeech-unsupervised-speech-context |
Repo | |
Framework | |
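A context versus non-context objective with negative sampling, as mentioned in the abstract above, is commonly written as a logistic loss over embedding similarities. The sketch below assumes dot-product similarity and a word2vec-style loss; the abstract does not spell out the exact formulation, so the function `negative_sampling_loss` and its arguments are illustrative assumptions, and the embeddings are plain lists rather than outputs of a Siamese CNN.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(target, context, negatives):
    """Loss for one target embedding, one true context embedding and sampled negatives.

    The true pair is pushed towards high similarity and every sampled
    non-context pair towards low similarity.
    """
    loss = -log_sigmoid(dot(target, context))
    for negative in negatives:
        loss -= log_sigmoid(-dot(target, negative))
    return loss


# A target that agrees with its context and disagrees with the negative scores a lower loss.
good = negative_sampling_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
bad = negative_sampling_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(good < bad)
```

In a Siamese setup the target and context vectors would come from two encoder branches applied to neighbouring audio windows, with negatives drawn from unrelated windows.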