Paper Group ANR 205
EEF: Exponentially Embedded Families with Class-Specific Features for Classification. Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering. Modelling Interaction of Sentence Pair with coupled-LSTMs. Stability of Analytic Neural Networks with Event-triggered Synaptic Feedbacks. Optimal Generalized Decision Trees via Integer Programming. Predicting the Relative Difficulty of Single Sentences With and Without Surrounding Context. New Ideas for Brain Modelling 3. Multi-Source Multi-View Clustering via Discrepancy Penalty. Using Recurrent Neural Network for Learning Expressive Ontologies. Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis. Application of artificial neural networks and genetic algorithms for crude fractional distillation process modeling. Minimax Lower Bounds for Linear Independence Testing. Training Sparse Neural Networks. Deep Hashing: A Joint Approach for Image Signature Learning. Centralized and Decentralized Global Outer-synchronization of Asymmetric Recurrent Time-varying Neural Network by Data-sampling.
EEF: Exponentially Embedded Families with Class-Specific Features for Classification
Title | EEF: Exponentially Embedded Families with Class-Specific Features for Classification |
Authors | Bo Tang, Steven Kay, Haibo He, Paul M. Baggenstoss |
Abstract | In this letter, we present a novel exponentially embedded families (EEF) based classification method, in which the probability density function (PDF) on raw data is estimated from the PDF on features. With the PDF construction, we show that class-specific features can be used in the proposed classification method, instead of a common feature subset for all classes as used in conventional approaches. We apply the proposed EEF classifier for text categorization as a case study and derive an optimal Bayesian classification rule with class-specific feature selection based on the Information Gain (IG) score. The promising performance on real-life data sets demonstrates the effectiveness of the proposed approach and indicates its wide potential applications. |
Tasks | Feature Selection, Text Categorization |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03631v2 |
http://arxiv.org/pdf/1605.03631v2.pdf | |
PWC | https://paperswithcode.com/paper/eef-exponentially-embedded-families-with |
Repo | |
Framework | |
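The paper's optimal Bayesian rule is not reproduced here, but the class-specific selection step it builds on is easy to illustrate. Below is a minimal sketch, assuming a one-vs-rest Information Gain score (approximated by scikit-learn's `mutual_info_classif`); the function name and the synthetic data are invented for the example.

```python
# Hypothetical sketch: class-specific feature selection via an Information
# Gain-style score, in the spirit of (not identical to) the EEF setup above.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def class_specific_features(X, y, k=10):
    """Return the top-k feature indices for each class (one-vs-rest scoring)."""
    selected = {}
    for c in np.unique(y):
        ig = mutual_info_classif(X, (y == c).astype(int), random_state=0)
        selected[int(c)] = np.argsort(ig)[::-1][:k]
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 3, size=200)
X[np.arange(200), y] += 2.0  # plant a signal: feature c is informative for class c
print(class_specific_features(X, y, k=3))  # each class should recover its own feature
```

Note that each class ends up with its own feature subset, which is exactly the departure from the common-subset convention that the abstract highlights.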
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering
Title | Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering |
Authors | Arun Mallya, Svetlana Lazebnik |
Abstract | This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision on the level of individual person instances, and weighted loss to handle unbalanced training data. Further, we show how specialized features trained on these datasets can be used to improve accuracy on the Visual Question Answering (VQA) task, in the form of multiple choice fill-in-the-blank questions (Visual Madlibs). Specifically, we tackle two types of questions on person activity and person-object relationship and show improvements over generic features trained on the ImageNet classification task. |
Tasks | Human-Object Interaction Detection, Multiple Instance Learning, Question Answering, Visual Question Answering |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04808v2 |
http://arxiv.org/pdf/1604.04808v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-models-for-actions-and-person-object |
Repo | |
Framework | |
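Two of the abstract's ingredients, multiple instance learning over unlabeled person instances and a weighted loss for unbalanced data, can be sketched compactly. The pooling and weighting below are plausible stand-ins (max pooling, per-class positive weights), not the authors' exact formulation.

```python
# Illustrative sketch (not the paper's code): MIL max pooling over per-person
# scores, plus a class-weighted BCE loss to counter label imbalance.
import torch
import torch.nn.functional as F

def mil_weighted_loss(instance_logits, image_labels, pos_weight):
    # instance_logits: (num_instances, num_classes) scores, one row per person box
    # image_labels:    (num_classes,) multi-hot image-level activity labels
    image_logits, _ = instance_logits.max(dim=0)  # MIL: image score = best instance
    return F.binary_cross_entropy_with_logits(
        image_logits, image_labels, pos_weight=pos_weight)

logits = torch.randn(3, 5)                    # 3 person instances, 5 activity classes
labels = torch.tensor([1., 0., 0., 1., 0.])
weights = torch.tensor([2., 1., 1., 4., 1.])  # up-weight rarer classes
print(mil_weighted_loss(logits, labels, weights))
```

Max pooling attributes the image-level label to the highest-scoring person instance, which is how MIL sidesteps the missing instance-level supervision.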
Modelling Interaction of Sentence Pair with coupled-LSTMs
Title | Modelling Interaction of Sentence Pair with coupled-LSTMs |
Authors | Pengfei Liu, Xipeng Qiu, Xuanjing Huang |
Abstract | Recently, there has been rising interest in modelling the interactions of two sentences with deep neural networks. However, most existing methods encode the two sequences with separate encoders, in which a sentence is encoded with little or no information from the other sentence. In this paper, we propose a deep architecture to model the strong interaction of a sentence pair with two coupled LSTMs. Specifically, we introduce two ways of coupling to model the interdependencies of the two LSTMs, capturing the local contextualized interactions of the two sentences. We then aggregate these interactions and use dynamic pooling to select the most informative features. Experiments on two very large datasets demonstrate the efficacy of our proposed architecture and its superiority to state-of-the-art methods. |
Tasks | |
Published | 2016-05-18 |
URL | http://arxiv.org/abs/1605.05573v2 |
http://arxiv.org/pdf/1605.05573v2.pdf | |
PWC | https://paperswithcode.com/paper/modelling-interaction-of-sentence-pair-with |
Repo | |
Framework | |
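The coupling mechanism itself is paper-specific, but the downstream step, aggregating local interactions with dynamic pooling, can be sketched. The bilinear interaction grid and the 4x4 pooled size below are assumptions for illustration only.

```python
# Loose sketch: cross the positions of two encoded sentences into an
# interaction grid, then dynamically pool it to a fixed-size feature vector.
import torch
import torch.nn.functional as F

def interaction_features(h1, h2, pooled=(4, 4)):
    # h1: (len1, d) and h2: (len2, d) hidden states from the two (coupled) LSTMs
    grid = h1 @ h2.t()                              # (len1, len2) local interactions
    out = F.adaptive_max_pool2d(grid.unsqueeze(0).unsqueeze(0), pooled)
    return out.flatten()                            # fixed size regardless of lengths

h1, h2 = torch.randn(7, 32), torch.randn(9, 32)     # sentences of different lengths
print(interaction_features(h1, h2).shape)           # torch.Size([16])
```

Dynamic pooling is what lets variable-length sentence pairs produce a fixed-length feature for the downstream classifier.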
Stability of Analytic Neural Networks with Event-triggered Synaptic Feedbacks
Title | Stability of Analytic Neural Networks with Event-triggered Synaptic Feedbacks |
Authors | Ren Zheng, Xinlei Yi, Wenlian Lu, Tianping Chen |
Abstract | In this paper, we investigate the stability of a class of analytic neural networks with synaptic feedback governed by event-triggered rules. This model is general and includes the Hopfield neural network as a special case. Event-triggered rules can efficiently reduce the load of computation and information transmission at the synapses of the neurons. The synaptic feedback of each neuron is held at a constant value, based on the outputs of the other neurons at its latest triggering time, and changes at its next triggering time, which is determined by a certain criterion. It is proved that every trajectory of the analytic neural network converges to an equilibrium under this event-triggered rule for all initial values except a set of measure zero. The main technique of the proof is the Łojasiewicz inequality, used to prove the finiteness of the trajectory length. The realization of this event-triggered rule is verified by the exclusion of Zeno behaviors. Numerical examples are provided to illustrate the efficiency of the theoretical results. |
Tasks | |
Published | 2016-04-02 |
URL | http://arxiv.org/abs/1604.00457v1 |
http://arxiv.org/pdf/1604.00457v1.pdf | |
PWC | https://paperswithcode.com/paper/stability-of-analytic-neural-networks-with |
Repo | |
Framework | |
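The flavor of an event-triggered rule is easy to convey in simulation: each synaptic feedback is latched at its latest triggering time and refreshed only when a criterion fires. The threshold rule and dynamics below are toy assumptions, not the paper's exact trigger.

```python
# Toy simulation of event-triggered synaptic feedback (assumed rule):
# a Hopfield-style network whose feedback terms are held piecewise-constant
# and refreshed only when a neuron's output drifts past a threshold.
import numpy as np

def simulate(W, x0, steps=2000, dt=1e-3, threshold=0.05):
    x = x0.copy()
    held = np.tanh(x)                 # outputs latched at the latest triggering times
    triggers = 0
    for _ in range(steps):
        stale = np.abs(np.tanh(x) - held) > threshold  # event-trigger criterion
        held[stale] = np.tanh(x[stale])                # refresh only where triggered
        triggers += int(stale.sum())
        x += dt * (-x + W @ held)                      # dynamics driven by held feedback
    return x, triggers

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 5)); W = (W + W.T) / 2         # symmetric (Hopfield-like) coupling
x, n = simulate(W, rng.normal(size=5))
print("state:", x, " synaptic updates:", n)            # typically far fewer than one per synapse per step
```

The payoff the abstract describes is visible here: the number of synaptic updates is much smaller than one per neuron per time step.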
Optimal Generalized Decision Trees via Integer Programming
Title | Optimal Generalized Decision Trees via Integer Programming |
Authors | Oktay Gunluk, Jayant Kalagnanam, Minhan Li, Matt Menickelly, Katya Scheinberg |
Abstract | Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. Our approach can also handle numerical features via thresholding. We show that very good accuracy can be achieved with small trees using moderately-sized training sets. The optimization problems we solve are tractable with modern solvers. |
Tasks | |
Published | 2016-12-10 |
URL | https://arxiv.org/abs/1612.03225v3 |
https://arxiv.org/pdf/1612.03225v3.pdf | |
PWC | https://paperswithcode.com/paper/optimal-generalized-decision-trees-via |
Repo | |
Framework | |
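The "combinatorial decisions" the abstract allows at each node (branching on subsets of a categorical feature's values) are the interesting modelling twist. The brute-force search below illustrates one such decision at a single node; the paper solves for the whole tree at once with a MIP, which this sketch does not attempt.

```python
# Not the paper's MIP: a brute-force sketch of one combinatorial split,
# searching subsets S of a categorical feature's values ("x in S" goes left).
from itertools import combinations
import numpy as np

def best_subset_split(cat_feature, y):
    values = np.unique(cat_feature)
    classes = np.unique(y)
    best = (None, -1.0)
    for r in range(1, len(values)):            # nonempty proper subsets only
        for S in combinations(values, r):
            left = np.isin(cat_feature, S)
            acc = max(                         # majority-vote accuracy of the two leaves
                np.mean(y[left] == c1) * left.mean() +
                np.mean(y[~left] == c2) * (~left).mean()
                for c1 in classes for c2 in classes)
            if acc > best[1]:
                best = (set(S), float(acc))
    return best

x = np.array(list("aabbccdd")); y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(best_subset_split(x, y))                 # ({'a', 'b'}, 1.0)
```

Enumerating subsets is exponential in the number of category values, which is precisely why the paper encodes these decisions as integer-programming variables instead.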
Predicting the Relative Difficulty of Single Sentences With and Without Surrounding Context
Title | Predicting the Relative Difficulty of Single Sentences With and Without Surrounding Context |
Authors | Elliot Schumacher, Maxine Eskenazi, Gwen Frishkoff, Kevyn Collins-Thompson |
Abstract | The problem of accurately predicting relative reading difficulty across a set of sentences arises in a number of important natural language applications, such as finding and curating effective usage examples for intelligent language tutoring systems. Yet while significant research has explored document- and passage-level reading difficulty, the special challenges involved in assessing aspects of readability for single sentences have received much less attention, particularly when considering the role of surrounding passages. We introduce and evaluate a novel approach for estimating the relative reading difficulty of a set of sentences, with and without surrounding context. Using different sets of lexical and grammatical features, we explore models for predicting pairwise relative difficulty using logistic regression, and examine rankings generated by aggregating pairwise difficulty labels using a Bayesian rating system to form a final ranking. We also compare rankings derived for sentences assessed with and without context, and find that contextual features can help predict differences in relative difficulty judgments across these two conditions. |
Tasks | |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08425v3 |
http://arxiv.org/pdf/1606.08425v3.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-relative-difficulty-of-single |
Repo | |
Framework | |
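The pipeline in the abstract (pairwise comparisons via logistic regression, then aggregation into a ranking) can be sketched end to end. The synthetic features and the simple win-count aggregation below are placeholders; the paper uses a Bayesian rating system for the final ranking.

```python
# Sketch under assumptions: logistic regression on feature differences predicts
# which of two sentences is harder; win counts stand in for Bayesian rating.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 6))                   # per-sentence lexical/grammatical features
true_difficulty = feats @ rng.normal(size=6)       # hidden ground truth for the demo

pairs = [(i, j) for i in range(20) for j in range(20) if i != j]
X = np.array([feats[i] - feats[j] for i, j in pairs])
y = np.array([true_difficulty[i] > true_difficulty[j] for i, j in pairs])

model = LogisticRegression().fit(X, y)
wins = np.zeros(20)
for (i, j), harder in zip(pairs, model.predict(X)):
    wins[i] += harder                              # count of "i judged harder than j"
print("hardest five:", np.argsort(-wins)[:5])      # aggregate pairwise labels into a ranking
```

Working on feature *differences* is what turns the ranking problem into an ordinary binary classification problem.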
New Ideas for Brain Modelling 3
Title | New Ideas for Brain Modelling 3 |
Authors | Kieran Greer |
Abstract | This paper considers a process for the creation and subsequent firing of sequences of neuronal patterns, as might be found in the human brain. The scale is one of larger patterns emerging from an ensemble mass, possibly through some type of energy equation and a reduction procedure. The links between the patterns can be formed naturally, as a residual effect of the pattern creation itself. This paper follows on closely from the earlier research, including two earlier papers in the series, and uses the ideas of entropy and cohesion. With a small addition, it is possible to show how the inter-pattern links can be determined. A compact Grid form of an earlier Counting Mechanism is also demonstrated and may constitute a new clustering technique. It is possible to explain how a very basic repeating structure can form the arbitrary patterns and activation sequences between them, and a key question of how nodes synchronise may even be answerable. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1612.00369v8 |
http://arxiv.org/pdf/1612.00369v8.pdf | |
PWC | https://paperswithcode.com/paper/new-ideas-for-brain-modelling-3 |
Repo | |
Framework | |
Multi-Source Multi-View Clustering via Discrepancy Penalty
Title | Multi-Source Multi-View Clustering via Discrepancy Penalty |
Authors | Weixiang Shao, Jiawei Zhang, Lifang He, Philip S. Yu |
Abstract | With the advance of technology, entities can be observed in multiple views, and multiple views containing different types of features can be used for clustering. Although multi-view clustering has been successfully applied in many applications, previous methods usually assume a complete instance mapping between different views. In many real-world applications, information can be gathered from multiple sources, and each source can contain multiple views that are more cohesive for learning. The views under the same source are usually fully mapped, but they can be very heterogeneous. Moreover, the mappings between different sources are usually incomplete and only partially observed, which makes it more difficult to integrate all the views across different sources. In this paper, we propose MMC (Multi-source Multi-view Clustering), a framework based on collective spectral clustering with a discrepancy penalty across sources, to tackle these challenges. MMC has several advantages over existing methods. First, it can deal with incomplete mappings between sources. Second, it considers the disagreements between sources while treating the views within a source as a cohesive set. Third, it also infers instance similarities across sources to enhance the clustering performance. Extensive experiments conducted on real-world data demonstrate the effectiveness of the proposed approach. |
Tasks | |
Published | 2016-04-14 |
URL | http://arxiv.org/abs/1604.04029v2 |
http://arxiv.org/pdf/1604.04029v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-source-multi-view-clustering-via |
Repo | |
Framework | |
Using Recurrent Neural Network for Learning Expressive Ontologies
Title | Using Recurrent Neural Network for Learning Expressive Ontologies |
Authors | Giulio Petrucci, Chiara Ghidini, Marco Rospocher |
Abstract | Recently, neural networks have proven extremely effective in many natural language processing tasks, such as sentiment analysis, question answering, and machine translation. Aiming to exploit these advantages in the ontology learning process, in this technical report we present a detailed description of a recurrent neural network based system to be used to pursue this goal. |
Tasks | Machine Translation, Question Answering, Sentiment Analysis |
Published | 2016-07-14 |
URL | http://arxiv.org/abs/1607.04110v1 |
http://arxiv.org/pdf/1607.04110v1.pdf | |
PWC | https://paperswithcode.com/paper/using-recurrent-neural-network-for-learning |
Repo | |
Framework | |
Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis
Title | Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis |
Authors | Matthäus Kleindessner, Ulrike von Luxburg |
Abstract | In recent years it has become popular to study machine learning problems in a setting of ordinal distance information rather than numerical distance measurements. By ordinal distance information we refer to binary answers to distance comparisons such as $d(A,B)<d(C,D)$. For many problems in machine learning and statistics it is unclear how to solve them in such a scenario. Up to now, the main approach is to explicitly construct an ordinal embedding of the data points in the Euclidean space, an approach that has a number of drawbacks. In this paper, we propose algorithms for the problems of medoid estimation, outlier identification, classification, and clustering when given only ordinal data. They are based on estimating the lens depth function and the $k$-relative neighborhood graph on a data set. Our algorithms are simple, are much faster than an ordinal embedding approach and avoid some of its drawbacks, and can easily be parallelized. |
Tasks | |
Published | 2016-02-23 |
URL | http://arxiv.org/abs/1602.07194v2 |
http://arxiv.org/pdf/1602.07194v2.pdf | |
PWC | https://paperswithcode.com/paper/lens-depth-function-and-k-relative |
Repo | |
Framework | |
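The lens depth function is simple enough to state inline: a point x lies in the "lens" of a pair (A, B) when both d(A, x) < d(A, B) and d(B, x) < d(A, B), and its depth is the fraction of pairs whose lens contains it, which requires only ordinal comparisons. A sketch of medoid estimation on this basis follows; the distance oracle is Euclidean here purely for the demo.

```python
# Sketch: lens depth from distance comparisons only; the deepest point is an
# estimate of the medoid, one of the tasks listed in the abstract.
import numpy as np
from itertools import combinations

def lens_depth(x_idx, n, dist):
    pairs = [(a, b) for a, b in combinations(range(n), 2) if x_idx not in (a, b)]
    inside = sum(max(dist(a, x_idx), dist(b, x_idx)) < dist(a, b) for a, b in pairs)
    return inside / len(pairs)

rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 2))
d = lambda i, j: float(np.linalg.norm(pts[i] - pts[j]))  # only comparisons are used
depths = [lens_depth(i, len(pts), d) for i in range(len(pts))]
print("estimated medoid:", int(np.argmax(depths)))
```

Note that the code never uses distance *values* beyond comparing them, which is the point of the ordinal setting.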
Application of artificial neural networks and genetic algorithms for crude fractional distillation process modeling
Title | Application of artificial neural networks and genetic algorithms for crude fractional distillation process modeling |
Authors | Lukasz Pater |
Abstract | This work presents the application of artificial neural networks, trained and structurally optimized by genetic algorithms, to modelling of the crude distillation process at the PKN ORLEN S.A. refinery. Models for the main fractionator distillation column products were developed using historical data. The quality of the fractions was predicted from several chosen process variables, and the performance of the models was validated on test data. Neural networks used in combination with genetic algorithms proved able to accurately predict shifts in fraction quality, reproducing the results of the standard laboratory analysis. A simple knowledge-extraction method was also applied to the trained neural network model. Genetic algorithms can be successfully utilized to train large neural networks efficiently and to find their optimal structures. |
Tasks | |
Published | 2016-04-30 |
URL | http://arxiv.org/abs/1605.00097v1 |
http://arxiv.org/pdf/1605.00097v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-artificial-neural-networks-and |
Repo | |
Framework | |
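As a toy analogue of the abstract's pipeline, the sketch below uses a genetic algorithm (selection, crossover, mutation) to search over hidden-layer widths of a small regression network. All data, population sizes, and operators are invented for illustration; the refinery models and their inputs are, of course, not reproduced.

```python
# Hedged sketch: a minimal GA over MLP architectures (hidden-layer widths).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                          # stand-in process variables
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def fitness(genome):                                   # genome = hidden-layer widths
    net = MLPRegressor(hidden_layer_sizes=genome, max_iter=300, random_state=0)
    return net.fit(Xtr, ytr).score(Xte, yte)           # held-out R^2 as fitness

pop = [tuple(int(v) for v in rng.integers(2, 32, size=2)) for _ in range(6)]
for gen in range(3):
    parents = sorted(pop, key=fitness, reverse=True)[:3]               # selection
    children = [(a[0], b[1]) for a, b in zip(parents, parents[::-1])]  # crossover
    mutants = [tuple(max(2, w + int(rng.integers(-4, 5))) for w in g) for g in parents]
    pop = parents + children[:2] + mutants[:1]                         # next generation
print("best hidden layers:", max(pop, key=fitness))
```

Scoring fitness on held-out data keeps the GA from rewarding architectures that merely memorize the training set.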
Minimax Lower Bounds for Linear Independence Testing
Title | Minimax Lower Bounds for Linear Independence Testing |
Authors | Aaditya Ramdas, David Isenberg, Aarti Singh, Larry Wasserman |
Abstract | Linear independence testing is a fundamental information-theoretic and statistical problem that can be posed as follows: given $n$ points $\{(X_i,Y_i)\}_{i=1}^n$ from a $(p+q)$-dimensional multivariate distribution, where $X_i \in \mathbb{R}^p$ and $Y_i \in \mathbb{R}^q$, determine whether $a^T X$ and $b^T Y$ are uncorrelated for every $a \in \mathbb{R}^p, b \in \mathbb{R}^q$ or not. We give a minimax lower bound for this problem (when $p+q, n \to \infty$ and $(p+q)/n \leq \kappa < \infty$, without sparsity assumptions). In summary, our results imply that $n$ must be at least as large as $\sqrt{pq}/\|\Sigma_{XY}\|_F^2$ for any procedure (test) to have non-trivial power, where $\Sigma_{XY}$ is the cross-covariance matrix of $X, Y$. We also provide some evidence that the lower bound is tight, via connections to two-sample testing and regression in specific settings. |
Tasks | |
Published | 2016-01-23 |
URL | http://arxiv.org/abs/1601.06259v1 |
http://arxiv.org/pdf/1601.06259v1.pdf | |
PWC | https://paperswithcode.com/paper/minimax-lower-bounds-for-linear-independence |
Repo | |
Framework | |
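To make the bound concrete: the driver is the squared Frobenius norm of the cross-covariance matrix, and the snippet below simply estimates that quantity from samples and evaluates the $\sqrt{pq}/\|\Sigma_{XY}\|_F^2$ sample-size scale. This is an illustration of the statement, not the paper's testing procedure.

```python
# Companion sketch: estimate ||Sigma_XY||_F^2 and the sample-size scale that
# the lower bound says is necessary for non-trivial power.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 10, 8
X = rng.normal(size=(n, p))
Y = 0.3 * X[:, :q] + rng.normal(size=(n, q))          # a correlated alternative

Sigma_XY = (X - X.mean(0)).T @ (Y - Y.mean(0)) / n    # sample cross-covariance (p x q)
frob2 = float(np.sum(Sigma_XY ** 2))
print("||Sigma_XY||_F^2 =", round(frob2, 4))
print("required n scale sqrt(pq)/frob2 =", round(np.sqrt(p * q) / frob2, 1))
```

The weaker the cross-covariance, the larger the sample size the bound demands, matching the intuition that faint correlation is hard to detect.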
Training Sparse Neural Networks
Title | Training Sparse Neural Networks |
Authors | Suraj Srinivas, Akshayvarun Subramanya, R. Venkatesh Babu |
Abstract | Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are known to be much more efficient. In this work, we train and build neural networks which implicitly use sparse computations. We introduce additional gate variables to perform parameter selection and show that this is equivalent to using a spike-and-slab prior. We experimentally validate our method on both small and large networks and achieve state-of-the-art compression results for sparse neural network models. |
Tasks | Image Classification |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.06694v1 |
http://arxiv.org/pdf/1611.06694v1.pdf | |
PWC | https://paperswithcode.com/paper/training-sparse-neural-networks |
Repo | |
Framework | |
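The gate-variable construction can be sketched in a few lines: multiply each weight by a learnable gate and penalize gates toward zero, so that training performs parameter selection. The sigmoid relaxation and penalty below are assumed forms; the paper's spike-and-slab interpretation is not derived here.

```python
# Assumed-form sketch of gate variables for parameter selection: each weight
# is modulated by a learnable gate; a penalty pushes gates toward zero.
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.gate_logits = nn.Parameter(torch.zeros(d_out, d_in))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)       # soft 0/1 selection per weight
        return x @ (self.weight * gates).t()

    def sparsity_penalty(self):
        return torch.sigmoid(self.gate_logits).mean() # drive gates toward 0

layer = GatedLinear(16, 4)
x = torch.randn(8, 16)
loss = layer(x).pow(2).mean() + 0.01 * layer.sparsity_penalty()
loss.backward()                                       # gates train like any parameter
print(layer.gate_logits.grad.shape)                   # torch.Size([4, 16])
```

At inference, gates below a cutoff can be zeroed so the corresponding weights drop out entirely, yielding the implicitly sparse computation the abstract mentions.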
Deep Hashing: A Joint Approach for Image Signature Learning
Title | Deep Hashing: A Joint Approach for Image Signature Learning |
Authors | Yadong Mu, Zhu Liu |
Abstract | Similarity-based image hashing is a crucial technique for reducing visual data storage and expediting image search. Conventional hashing schemes typically feed hand-crafted features into hash functions, which separates the procedures of feature extraction and hash function learning. In this paper, we propose a novel algorithm that concurrently performs feature engineering and non-linear supervised hashing function learning. Our technical contributions are twofold: 1) deep network optimization is usually achieved by gradient propagation, which critically requires a smooth objective function; the discrete nature of hash codes makes them not amenable to gradient-based optimization. To address this issue, we propose an exponentiated hashing loss function and its bilinear smooth approximation, thereby enabling effective gradient calculation and propagation; 2) pre-training is an important trick in supervised deep learning, yet the impact of pre-training on hash code quality has not previously been discussed in the deep hashing literature. We propose a pre-training scheme inspired by recent advances in deep-network-based image classification and experimentally demonstrate its effectiveness. Comprehensive quantitative evaluations are conducted on several widely used image benchmarks, on all of which our proposed deep hashing algorithm outperforms state-of-the-art competitors by significant margins. In particular, our algorithm achieves a near-perfect 0.99 Hamming ranking accuracy with only 12 bits on MNIST, and a new record of 0.74 on the CIFAR10 dataset. In comparison, the best accuracies obtained on CIFAR10 by existing hashing algorithms without and with deep networks are known to be 0.36 and 0.58, respectively. |
Tasks | Feature Engineering, Image Classification, Image Retrieval |
Published | 2016-08-12 |
URL | http://arxiv.org/abs/1608.03658v1 |
http://arxiv.org/pdf/1608.03658v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-hashing-a-joint-approach-for-image |
Repo | |
Framework | |
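The first contribution, a smooth surrogate so gradients can reach discrete hash codes, admits a compact sketch. The `tanh` relaxation and the exact pairwise form below are stand-ins; the paper's exponentiated loss and its bilinear approximation are only described, not specified, in the abstract.

```python
# Hedged sketch: an exponentiated pairwise hashing loss with sign() relaxed to
# tanh() so that the hash layer remains differentiable.
import torch

def smooth_hash_loss(z1, z2, similar, bits=12):
    # z1, z2: (batch, bits) pre-binarization activations for paired images
    b1, b2 = torch.tanh(z1), torch.tanh(z2)        # smooth surrogate for sign()
    inner = (b1 * b2).sum(dim=1) / bits            # high when codes match
    sign = torch.where(similar, torch.full_like(inner, -1.0), torch.full_like(inner, 1.0))
    return torch.exp(sign * inner).mean()          # reward matches, punish collisions

z1 = torch.randn(4, 12, requires_grad=True)
z2 = torch.randn(4, 12)
sim = torch.tensor([True, False, True, False])
loss = smooth_hash_loss(z1, z2, sim)
loss.backward()                                    # gradients flow through tanh
print(float(loss))
```

At test time the `tanh` would be replaced by a hard `sign`, producing actual binary codes such as the 12-bit ones referenced in the MNIST result.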
Centralized and Decentralized Global Outer-synchronization of Asymmetric Recurrent Time-varying Neural Network by Data-sampling
Title | Centralized and Decentralized Global Outer-synchronization of Asymmetric Recurrent Time-varying Neural Network by Data-sampling |
Authors | Wenlian Lu, Ren Zheng, Tianping Chen |
Abstract | In this paper, we discuss the outer-synchronization of asymmetrically connected recurrent time-varying neural networks. Under both centralized and decentralized discretized data-sampling principles, we derive several sufficient conditions, based on diverse vector norms, that guarantee that any two trajectories of the identical neural network system starting from different initial values converge together. The lower bounds of the common time intervals between data samples under the centralized and decentralized principles are proved to be positive, which guarantees exclusion of Zeno behavior. A numerical example is provided to illustrate the efficiency of the theoretical results. |
Tasks | |
Published | 2016-04-02 |
URL | http://arxiv.org/abs/1604.00462v1 |
http://arxiv.org/pdf/1604.00462v1.pdf | |
PWC | https://paperswithcode.com/paper/centralized-and-decentralized-global-outer |
Repo | |
Framework | |
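A toy check of the claim is straightforward to set up: run two copies of the same network from different initial values, refresh each copy's feedback only at sampling instants, and watch the gap between the trajectories. The dynamics, sampling interval, and coupling scale below are assumptions chosen so that contraction plausibly holds.

```python
# Toy simulation (assumed dynamics): two trajectories of one network under
# centralized data sampling; their gap should shrink when conditions hold.
import numpy as np

def step(x, held, W, dt=1e-3):
    return x + dt * (-x + W @ np.tanh(held))       # feedback uses sampled states

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 4)) * 0.2                  # weak coupling favors contraction
xa, xb = rng.normal(size=4), rng.normal(size=4)    # two different initial values
ha, hb = xa.copy(), xb.copy()
gap0 = float(np.linalg.norm(xa - xb))
for t in range(5000):
    if t % 50 == 0:                                # common data-sampling instants
        ha, hb = xa.copy(), xb.copy()
    xa, xb = step(xa, ha, W), step(xb, hb, W)
print("initial gap:", round(gap0, 3), " final gap:", round(float(np.linalg.norm(xa - xb)), 6))
```

The positive lower bound on inter-sample intervals proved in the paper is what rules out Zeno behavior; here the interval is simply fixed.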