Paper Group ANR 489
Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks. Discriminative Label Consistent Domain Adaptation. Unsupervised Detection and Explanation of Latent-class Contextual Anomalies. On the information in spike timing: neural codes derived from polychronous groups. Using Normalized Cross Correlation in Least Squares Optimizations. On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks. Towards Task Understanding in Visual Settings. A Content-Based Late Fusion Approach Applied to Pedestrian Detection. Matrix Linear Discriminant Analysis. Cluster validity index based on Jeffrey divergence. Revisiting Perspective Information for Efficient Crowd Counting. Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition. Correlated Anomaly Detection from Large Streaming Data. A Symbolic Approach to Explaining Bayesian Network Classifiers. Connecting the Dots Between MLE and RL for Sequence Prediction.
Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks
Title | Emotions are Universal: Learning Sentiment Based Representations of Resource-Poor Languages using Siamese Networks |
Authors | Nurendra Choudhary, Rajat Singh, Ishita Bindlish, Manish Shrivastava |
Abstract | Machine learning approaches to sentiment analysis rely principally on the abundance of resources. To limit this dependence, we propose a novel method called Siamese Network Architecture for Sentiment Analysis (SNASA) to learn representations of resource-poor languages by jointly training them with resource-rich languages using a siamese network. The SNASA model consists of twin Bi-directional Long Short-Term Memory Recurrent Neural Networks (Bi-LSTM RNN) with shared parameters, joined by a contrastive loss function based on a similarity metric. The model learns the sentence representations of a resource-poor and a resource-rich language in a common sentiment space by using a similarity metric based on their individual sentiments. The model hence projects sentences with similar sentiment closer to each other and sentences with different sentiment farther from each other. Experiments on large-scale datasets of resource-rich languages - English and Spanish - and resource-poor languages - Hindi and Telugu - reveal that SNASA outperforms state-of-the-art sentiment analysis approaches based on distributional semantics, semantic rules, lexicon lists and deep neural network representations without shared parameters. |
Tasks | Sentiment Analysis |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00805v1 |
http://arxiv.org/pdf/1804.00805v1.pdf | |
PWC | https://paperswithcode.com/paper/emotions-are-universal-learning-sentiment |
Repo | |
Framework | |
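Below is a minimal sketch of the core idea in the abstract: twin Bi-LSTM encoders with shared parameters trained with a contrastive loss over cross-lingual sentence pairs. All names, hyperparameters, and the single shared vocabulary are illustrative assumptions, not the authors' implementation.

```python
# Minimal siamese Bi-LSTM with a contrastive loss, as described in the
# abstract. Names and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """Bi-LSTM encoder shared between the two branches of the siamese network."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids)
        _, (h, _) = self.bilstm(x)                    # h: (2, batch, hidden)
        return torch.cat([h[0], h[1]], dim=1)         # (batch, 2 * hidden)

def contrastive_loss(z1, z2, same_sentiment, margin=1.0):
    """Pull same-sentiment pairs together, push different pairs apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_sentiment * d.pow(2) +
            (1 - same_sentiment) * F.relu(margin - d).pow(2)).mean()

encoder = SentenceEncoder(vocab_size=10000)   # shared vocab only for brevity
rich = torch.randint(0, 10000, (4, 20))       # resource-rich language batch
poor = torch.randint(0, 10000, (4, 20))       # resource-poor language batch
same = torch.tensor([1., 0., 1., 0.])         # 1 if the pair shares sentiment
loss = contrastive_loss(encoder(rich), encoder(poor), same)
loss.backward()
```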
Discriminative Label Consistent Domain Adaptation
Title | Discriminative Label Consistent Domain Adaptation |
Authors | Lingkun Luo, Liming Chen, Ying lu, Shiqiang Hu |
Abstract | Domain adaptation (DA) is a form of transfer learning which aims to learn an effective predictor on target data from source data despite data distribution mismatch between source and target. We present in this paper a novel unsupervised DA method for cross-domain visual recognition which simultaneously optimizes the three terms of a theoretically established error bound. Specifically, the proposed DA method iteratively searches a latent shared feature subspace where not only the divergence of data distributions between the source domain and the target domain is decreased, as in most state-of-the-art DA methods, but the inter-class distances are also increased to facilitate discriminative learning. Moreover, the proposed DA method sparsely regresses class labels from the features achieved in the shared subspace while minimizing the prediction errors on the source data and ensuring label consistency between source and target. Data outliers are also accounted for to further avoid negative knowledge transfer. Comprehensive experiments and in-depth analysis verify the effectiveness of the proposed DA method, which consistently outperforms the state-of-the-art DA methods on standard DA benchmarks, i.e., 12 cross-domain image classification tasks. |
Tasks | Domain Adaptation, Image Classification, Transfer Learning |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.08077v1 |
http://arxiv.org/pdf/1802.08077v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-label-consistent-domain |
Repo | |
Framework | |
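As a rough illustration of two of the terms the abstract describes - shrinking the source/target distribution divergence while growing inter-class distances in a shared subspace - the sketch below scores a toy linear projection. The linear-kernel MMD, the fixed projection, and the weighting are stand-ins, not the paper's iterative optimization.

```python
# Toy objective: distribution divergence (linear-kernel MMD) minus a
# discriminativeness term (inter-class distances) in a shared subspace.
import numpy as np

def linear_mmd(Xs, Xt):
    """Squared distance between source and target feature means."""
    return np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)

def inter_class_distance(X, y):
    """Sum of pairwise squared distances between class means."""
    means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
    diffs = means[:, None, :] - means[None, :, :]
    return np.sum(diffs ** 2) / 2

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(100, 16)), rng.integers(0, 3, size=100)
Xt = rng.normal(loc=0.5, size=(80, 16))      # shifted target distribution
W = rng.normal(size=(16, 8))                 # toy shared-subspace projection
Zs, Zt = Xs @ W, Xt @ W
objective = linear_mmd(Zs, Zt) - 0.1 * inter_class_distance(Zs, ys)
print(f"toy DA objective: {objective:.3f}")  # to be minimized over W
```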
Unsupervised Detection and Explanation of Latent-class Contextual Anomalies
Title | Unsupervised Detection and Explanation of Latent-class Contextual Anomalies |
Authors | Jacob Kauffmann, Grégoire Montavon, Luiz Alberto Lima, Shinichi Nakajima, Klaus-Robert Müller, Nico Görnitz |
Abstract | Detecting and explaining anomalies is a challenging effort. This holds especially true when data exhibits strong dependencies and single measurements need to be assessed and analyzed in their respective context. In this work, we consider scenarios where measurements are non-i.i.d., i.e., where samples are dependent on corresponding discrete latent variables which are connected through some given dependency structure, the contextual information. Our contribution is twofold: (i) Building atop support vector data description (SVDD), we derive a method able to cope with latent-class dependency structure that can still be optimized efficiently. We further show that our approach neatly generalizes vanilla SVDD as well as k-means and conditional random fields (CRF) and provide a corresponding probabilistic interpretation. (ii) In unsupervised scenarios where it is not possible to quantify the accuracy of an anomaly detector, having a human-interpretable solution is the key to success. Based on deep Taylor decomposition and a reformulation of our trained anomaly detector as a neural network, we are able to backpropagate predictions to the pixel domain and thus identify features and regions of high relevance. We demonstrate the usefulness of our novel approach on toy data with known spatio-temporal structure and successfully validate on synthetic as well as real-world offshore data from the oil industry. |
Tasks | |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11326v1 |
http://arxiv.org/pdf/1806.11326v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-detection-and-explanation-of |
Repo | |
Framework | |
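The sketch below only illustrates the vanilla SVDD starting point that the abstract says the method generalizes (with the center fixed at the mean, this reduces to the one-cluster k-means special case); the latent-class dependency structure and the deep Taylor explanations are beyond this toy.

```python
# Toy, vanilla-SVDD-flavored anomaly scorer: fix a hypersphere center and
# score points by their distance to it.
import numpy as np

def svdd_scores(X, center=None):
    """Distance-to-center anomaly scores; center = mean gives the
    one-cluster k-means special case mentioned in the abstract."""
    c = X.mean(axis=0) if center is None else center
    return np.linalg.norm(X - c, axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 2)),          # nominal samples
               rng.normal(loc=4.0, size=(5, 2))])  # injected anomalies
scores = svdd_scores(X)
threshold = np.quantile(scores, 0.975)
print("flagged anomalies:", np.flatnonzero(scores > threshold))
```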
On the information in spike timing: neural codes derived from polychronous groups
Title | On the information in spike timing: neural codes derived from polychronous groups |
Authors | Zhinus Marzi, Joao Hespanha, Upamanyu Madhow |
Abstract | There is growing evidence regarding the importance of spike timing in neural information processing, with even a small number of spikes carrying information, but computational models lag significantly behind those for rate coding. Experimental evidence on neuronal behavior is consistent with the dynamical and state dependent behavior provided by recurrent connections. This motivates the minimalistic abstraction investigated in this paper, aimed at providing insight into information encoding in spike timing via recurrent connections. We employ information-theoretic techniques for a simple reservoir model which encodes input spatiotemporal patterns into a sparse neural code, translating the polychronous groups introduced by Izhikevich into codewords on which we can perform standard vector operations. We show that the distance properties of the code are similar to those for (optimal) random codes. In particular, the code meets benchmarks associated with both linear classification and capacity, with the latter scaling exponentially with reservoir size. |
Tasks | |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03692v1 |
http://arxiv.org/pdf/1803.03692v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-information-in-spike-timing-neural |
Repo | |
Framework | |
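The distance benchmark mentioned in the abstract can be probed on synthetic codewords: the snippet below compares pairwise Hamming distances of sparse binary codes against the expectation for random codes of the same density. The reservoir and polychronous-group construction themselves are not reproduced here.

```python
# Compare pairwise Hamming distances of sparse binary codewords with the
# expectation for independent random codes of equal density.
import numpy as np

rng = np.random.default_rng(2)
n_codewords, code_len, density = 64, 512, 0.05
codes = (rng.random((n_codewords, code_len)) < density).astype(int)

# Pairwise Hamming distances between all codeword pairs.
hamming = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
off_diag = hamming[~np.eye(n_codewords, dtype=bool)]
print(f"mean pairwise distance:   {off_diag.mean():.1f}")
# Two independent Bernoulli(p) bits differ with probability 2p(1-p).
print(f"random-code expectation:  {2 * code_len * density * (1 - density):.1f}")
```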
Using Normalized Cross Correlation in Least Squares Optimizations
Title | Using Normalized Cross Correlation in Least Squares Optimizations |
Authors | Oliver J. Woodford |
Abstract | Direct methods for vision have widely used photometric least squares minimizations since the seminal 1981 work of Lucas & Kanade, and have leveraged normalized cross correlation since at least 1972. However, no work to our knowledge has successfully combined photometric least squares minimizations and normalized cross correlation, despite the obvious complementary benefits of efficiency and accuracy on the one hand, and robustness to lighting changes on the other. This work shows that combining the two methods is not only possible, but also straightforward and efficient. The resulting minimization is shown to be superior to competing approaches, both in terms of convergence rate and computation time. Furthermore, a new, robust, sparse formulation is introduced to mitigate local intensity variations and partial occlusions. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04320v1 |
http://arxiv.org/pdf/1810.04320v1.pdf | |
PWC | https://paperswithcode.com/paper/using-normalized-cross-correlation-in-least |
Repo | |
Framework | |
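The identity that makes the combination natural is worth stating: for zero-mean, unit-norm patches a and b, ||a - b||^2 = 2(1 - NCC(a, b)), so an NCC objective can be posed as a least squares residual. The snippet below verifies this numerically; it is not the paper's solver.

```python
# Numerical check: after zero-meaning and unit-normalizing two patches,
# the sum of squared differences equals 2 * (1 - NCC).
import numpy as np

def normalize(x):
    x = x - x.mean()
    return x / np.linalg.norm(x)

rng = np.random.default_rng(3)
a, b = normalize(rng.normal(size=64)), normalize(rng.normal(size=64))
ncc = a @ b                        # normalized cross correlation
ssd = np.sum((a - b) ** 2)         # least squares residual norm
assert np.isclose(ssd, 2 * (1 - ncc))
print(f"NCC = {ncc:.4f}, SSD of normalized patches = {ssd:.4f}")
```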
On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks
Title | On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks |
Authors | Adepu Ravi Sankar, Vishwak Srinivasan, Vineeth N Balasubramanian |
Abstract | Theoretical analysis of the error landscape of deep neural networks has garnered significant interest in recent years. In this work, we theoretically study the importance of noise in the trajectories of gradient descent towards optimal solutions in multi-layer neural networks. We show that adding noise (in different ways) to a neural network while training increases the rank of the product of weight matrices of a multi-layer linear neural network. We thus study how adding noise can assist in reaching a global optimum when the product matrix is full-rank (under certain conditions). We establish theoretical connections between the noise induced in the neural network - whether in the gradient, the architecture, or the input/output - and the rank of the product of weight matrices. We corroborate our theoretical findings with empirical results. |
Tasks | |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.08140v1 |
http://arxiv.org/pdf/1807.08140v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-analysis-of-trajectories-of-gradient |
Repo | |
Framework | |
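A small numpy probe of the claim's flavor (not the paper's formal setting): train a two-layer linear network with plain versus noisy gradient descent from a rank-deficient initialization and compare the rank of the product of the weight matrices. Dimensions, step sizes, and the noise scale are arbitrary choices.

```python
# Two-layer linear network trained by gradient descent; compare the rank of
# W1 @ W2 with and without gradient noise. Rank-deficient initialization
# makes any rank increase visible.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
Y = X @ rng.normal(size=(10, 10))

def train(noise_std, steps=500, lr=1e-3):
    W1 = np.zeros((10, 10)); W1[0, 0] = 1.0   # rank-1 start
    W2 = np.zeros((10, 10)); W2[0, 0] = 1.0
    for _ in range(steps):
        E = X @ W1 @ W2 - Y                    # residual
        G1 = X.T @ E @ W2.T / len(X)           # dLoss/dW1
        G2 = (X @ W1).T @ E / len(X)           # dLoss/dW2
        W1 -= lr * (G1 + noise_std * rng.normal(size=G1.shape))
        W2 -= lr * (G2 + noise_std * rng.normal(size=G2.shape))
    return np.linalg.matrix_rank(W1 @ W2)

print("rank without noise:", train(noise_std=0.0))   # stays 1
print("rank with noise:   ", train(noise_std=0.1))   # typically full
```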
Towards Task Understanding in Visual Settings
Title | Towards Task Understanding in Visual Settings |
Authors | Sebastin Santy, Wazeer Zulfikar, Rishabh Mehrotra, Emine Yilmaz |
Abstract | We consider the problem of understanding real world tasks depicted in visual images. While most existing image captioning methods excel at producing natural language descriptions of visual scenes involving human tasks, there is often a need to understand the exact task being undertaken rather than to give a literal description of the scene. We leverage insights from real world task understanding systems and propose a framework composed of convolutional neural networks and an external hierarchical task ontology to produce task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way into many applications, including image alt text generation. |
Tasks | Image Captioning, Text Generation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11833v1 |
http://arxiv.org/pdf/1811.11833v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-task-understanding-in-visual-settings |
Repo | |
Framework | |
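The pipeline shape described in the abstract - recognizer outputs matched against an external hierarchical task ontology - can be caricatured in a few lines. The ontology, labels, and overlap score below are invented stand-ins for the paper's much larger components.

```python
# Toy pipeline: scene labels from a recognizer are matched against a tiny
# task ontology to produce a task description. All entries are invented.
TASK_ONTOLOGY = {
    "cooking": {"stove", "pan", "person", "kitchen"},
    "repairing a bicycle": {"bicycle", "wrench", "person"},
    "gardening": {"shovel", "plant", "person", "soil"},
}

def describe_task(detected_labels):
    """Pick the task whose evidence set best overlaps the detections."""
    scores = {task: len(evidence & detected_labels) / len(evidence)
              for task, evidence in TASK_ONTOLOGY.items()}
    return max(scores, key=scores.get), scores

task, scores = describe_task({"person", "pan", "stove"})
print(f"predicted task: {task}")   # -> cooking
```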
A Content-Based Late Fusion Approach Applied to Pedestrian Detection
Title | A Content-Based Late Fusion Approach Applied to Pedestrian Detection |
Authors | Jessica Sena, Artur Jordao, William Robson Schwartz |
Abstract | The variety of pedestrian detectors proposed in recent years has encouraged some works to fuse pedestrian detectors to achieve more accurate detection. The intuition behind this is to combine the detectors based on their spatial consensus. We propose a novel method called Content-Based Spatial Consensus (CSBC), which, in addition to relying on spatial consensus, considers the content of the detection windows to learn a weighted fusion of pedestrian detectors. The result is a reduction in false alarms and an enhancement in detection. In this work, we also demonstrate that the feature used to learn the content of each detector's windows has little influence, which enables our method to be efficient even when employing simple features. CSBC outperforms state-of-the-art fusion methods on the ETH and Caltech datasets. In particular, our method is more efficient, since fewer detectors are necessary to achieve strong results. |
Tasks | Pedestrian Detection |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03361v1 |
http://arxiv.org/pdf/1806.03361v1.pdf | |
PWC | https://paperswithcode.com/paper/a-content-based-late-fusion-approach-applied |
Repo | |
Framework | |
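A toy version of content-weighted late fusion follows: detections from different detectors that spatially agree (by IoU) are merged with per-detector weights standing in for the learned content-based weights. The exact consensus and weight learning in CSBC differ.

```python
# Late fusion of two detectors' boxes: spatial agreement via IoU, then a
# weighted combination of confidences with stand-in content weights.
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# (detector_id, box, confidence) from two hypothetical detectors.
d0 = (0, [10, 10, 50, 90], 0.8)
d1 = (1, [12, 8, 52, 88], 0.6)
content_weight = {0: 0.7, 1: 0.3}   # stand-in for learned content weights

# Fuse detections that spatially agree (IoU above a threshold).
if iou(d0[1], d1[1]) > 0.5:
    fused_score = content_weight[0] * d0[2] + content_weight[1] * d1[2]
    print(f"fused pedestrian confidence: {fused_score:.2f}")
```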
Matrix Linear Discriminant Analysis
Title | Matrix Linear Discriminant Analysis |
Authors | Wei Hu, Weining Shen, Hua Zhou, Dehan Kong |
Abstract | We propose a novel linear discriminant analysis approach for the classification of high-dimensional matrix-valued data that commonly arises from imaging studies. Motivated by the equivalence of the conventional linear discriminant analysis and the ordinary least squares, we consider an efficient nuclear norm penalized regression that encourages a low-rank structure. Theoretical properties including a non-asymptotic risk bound and a rank consistency result are established. Simulation studies and an application to electroencephalography data show the superior performance of the proposed method over the existing approaches. |
Tasks | |
Published | 2018-09-24 |
URL | https://arxiv.org/abs/1809.08746v2 |
https://arxiv.org/pdf/1809.08746v2.pdf | |
PWC | https://paperswithcode.com/paper/matrix-linear-discriminant-analysis |
Repo | |
Framework | |
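The computational core of a nuclear norm penalized regression is its proximal step, singular value soft-thresholding, which is what pushes the coefficient matrix toward low rank. The sketch below shows that single step on a noisy low-rank matrix; it is not the paper's full estimator or theory.

```python
# Singular value soft-thresholding: the proximal operator of the nuclear
# norm, applied once to a noisy low-rank matrix.
import numpy as np

def svt(B, tau):
    """prox of tau * ||.||_*: soft-threshold the singular values of B."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(5)
low_rank = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))   # rank 2
B_noisy = low_rank + 0.1 * rng.normal(size=(8, 6))             # full rank
B_hat = svt(B_noisy, tau=0.8)         # tau chosen for this toy example
print("rank before:", np.linalg.matrix_rank(B_noisy))   # 6
print("rank after: ", np.linalg.matrix_rank(B_hat))     # ~2
```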
Cluster validity index based on Jeffrey divergence
Title | Cluster validity index based on Jeffrey divergence |
Authors | Ahmed Ben Said, Rachid Hadjidj, Sebti Foufou |
Abstract | Cluster validity indexes are very important tools designed for two purposes: comparing the performance of clustering algorithms and determining the number of clusters that best fits the data. These indexes are in general constructed by combining a measure of compactness and a measure of separation. A classical measure of compactness is the variance. As for separation, the distance between cluster centers is used. However, such a distance does not always reflect the quality of the partition between clusters and sometimes gives misleading results. In this paper, we propose a new cluster validity index in which the Jeffrey divergence is used to measure separation between clusters. Experiments are conducted using different types of data, and comparison with widely used cluster validity indexes demonstrates that the proposed index outperforms them. |
Tasks | |
Published | 2018-12-20 |
URL | http://arxiv.org/abs/1812.08891v1 |
http://arxiv.org/pdf/1812.08891v1.pdf | |
PWC | https://paperswithcode.com/paper/cluster-validity-index-based-on-jeffrey |
Repo | |
Framework | |
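The ingredient the index changes can be shown directly: measure cluster separation with the Jeffrey divergence (symmetrized KL) between per-cluster histograms rather than the distance between centers. The compactness-over-separation template below is generic, not the paper's exact formula.

```python
# Validity index template: variance-based compactness over Jeffrey-divergence
# separation between per-cluster feature histograms (1-D data for brevity).
import numpy as np

def jeffrey(p, q, eps=1e-12):
    """Jeffrey divergence J(p, q) = KL(p||q) + KL(q||p)."""
    p, q = p + eps, q + eps
    return np.sum((p - q) * np.log(p / q))

def validity_index(X, labels, bins=20):
    clusters = [X[labels == c] for c in np.unique(labels)]
    compactness = np.mean([c.var() for c in clusters])
    edges = np.histogram_bin_edges(X, bins=bins)
    hists = [np.histogram(c, bins=edges)[0] / len(c) for c in clusters]
    separation = min(jeffrey(hists[i], hists[j])
                     for i in range(len(hists))
                     for j in range(i + 1, len(hists)))
    return compactness / separation          # lower = better partition

rng = np.random.default_rng(6)
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])
labels = np.array([0] * 100 + [1] * 100)
print(f"toy index: {validity_index(X, labels):.4f}")
```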
Revisiting Perspective Information for Efficient Crowd Counting
Title | Revisiting Perspective Information for Efficient Crowd Counting |
Authors | Miaojing Shi, Zhaohui Yang, Chao Xu, Qijun Chen |
Abstract | Crowd counting is the task of estimating the number of people in crowd images. Modern crowd counting methods employ deep neural networks to estimate crowd counts via crowd density regression. A major challenge of this task lies in perspective distortion, which results in drastic person scale change across an image. Density regression on small person areas is in general very hard. In this work, we propose a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates perspective information into the density regression to provide additional knowledge of the person scale change across an image. Ground truth perspective maps are first generated for training; PACNN is then specifically designed to predict multi-scale perspective maps and encode them as perspective-aware weighting layers in the network to adaptively combine the outputs of multi-scale density maps. The weights are learned at every pixel of the maps, such that the final density combination is robust to perspective distortion. We conduct extensive experiments on the ShanghaiTech, WorldExpo'10, UCF_CC_50, and UCSD datasets, and demonstrate the effectiveness and efficiency of PACNN over the state-of-the-art. |
Tasks | Crowd Counting |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01989v3 |
http://arxiv.org/pdf/1807.01989v3.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-perspective-information-for |
Repo | |
Framework | |
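The combination step the abstract describes, in miniature: a perspective map is squashed into per-pixel weights that blend a fine-scale and a coarse-scale density map. All maps below are random stand-ins; in PACNN both the density maps and the perspective maps are predicted by the network.

```python
# Perspective-aware blending of two density maps at different scales.
import torch

h, w = 32, 32
density_fine = torch.rand(1, 1, h, w)     # density map from a fine scale
density_coarse = torch.rand(1, 1, h, w)   # density map from a coarser scale
# Stand-in perspective map: scale varies smoothly down the image.
perspective = torch.linspace(0, 1, h).view(1, 1, h, 1).expand(1, 1, h, w)

# Perspective-aware weighting layer: squash the perspective map to (0, 1)
# and use it as a per-pixel mixing weight between the two scales.
weight = torch.sigmoid(perspective * 4 - 2)
density = weight * density_fine + (1 - weight) * density_coarse
print(f"estimated count: {density.sum().item():.1f}")
```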
Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition
Title | Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition |
Authors | Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen |
Abstract | Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of those classes, which is achieved by exploiting semantic information and auxiliary datasets. Recently, most ZSL approaches have focused on learning visual-semantic embeddings to transfer knowledge from auxiliary datasets to novel classes. However, few works study whether the semantic information is discriminative for the recognition task. To tackle this problem, we propose a coupled dictionary learning approach to align the visual-semantic structures using the class prototypes, where the discriminative information lying in the visual space is utilized to improve the less discriminative semantic space. Zero-shot recognition can then be performed in different spaces by a simple nearest neighbor approach using the learned class prototypes. Extensive experiments on four benchmark datasets show the effectiveness of the proposed approach. |
Tasks | Dictionary Learning, Zero-Shot Learning |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09123v1 |
http://arxiv.org/pdf/1807.09123v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-class-prototypes-via-structure |
Repo | |
Framework | |
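A linear stand-in for the alignment idea (the paper uses coupled dictionary learning): regress from class semantic vectors to visual prototypes on seen classes, synthesize prototypes for unseen classes, and classify by nearest prototype. The dimensions and the ridge regression are assumptions for illustration only.

```python
# Ridge regression from semantic space to visual prototypes, then
# nearest-prototype classification of unseen classes.
import numpy as np

rng = np.random.default_rng(7)
S_seen = rng.normal(size=(10, 5))            # semantic vectors, seen classes
P_seen = rng.normal(size=(10, 12))           # visual prototypes, seen classes

# Ridge regression W: semantic space -> visual space.
lam = 0.1
W = np.linalg.solve(S_seen.T @ S_seen + lam * np.eye(5), S_seen.T @ P_seen)

S_unseen = rng.normal(size=(3, 5))           # semantic vectors, unseen classes
P_unseen = S_unseen @ W                      # synthesized class prototypes

x = P_unseen[1] + 0.05 * rng.normal(size=12) # a test image feature
pred = np.argmin(np.linalg.norm(P_unseen - x, axis=1))
print("predicted unseen class:", pred)       # -> 1
```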
Correlated Anomaly Detection from Large Streaming Data
Title | Correlated Anomaly Detection from Large Streaming Data |
Authors | Zheng Chen, Xinli Yu, Yuan Ling, Bo Song, Wei Quan, Xiaohua Hu, Erjia Yan |
Abstract | Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in real-time data mining applications such as botnet detection, financial event detection, and industrial process monitoring. The primary approach for this type of detection in previous research is based on the principal score (PS) of divided batches or sliding windows, obtained by computing the top eigenvalues of the correlation matrix, e.g., via the Lanczos algorithm. However, this paper brings up the phenomenon of principal score degeneration for large data sets, and then mathematically and empirically shows that current PS-based methods are likely to fail at CAD on large-scale streaming data even if the number of correlated anomalies grows with the data size at a reasonable rate; in reality, anomalies tend to be a minority of the data, so this issue can be even more serious. We propose a framework with two novel randomized algorithms, rPS and gPS, for better detection of correlated anomalies from large streaming data of various correlation strengths. Experiments show high and balanced recall and estimated accuracy of our framework for anomaly detection on a large server log data set and a U.S. stock daily price data set, in comparison to direct principal score evaluation and several other recent group anomaly detection algorithms. Moreover, our techniques significantly improve the computational efficiency and scalability of principal score calculation. |
Tasks | Anomaly Detection, Group Anomaly Detection |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.09387v2 |
http://arxiv.org/pdf/1812.09387v2.pdf | |
PWC | https://paperswithcode.com/paper/correlated-anomaly-detection-from-large |
Repo | |
Framework | |
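The baseline quantity the paper improves on is easy to compute directly: the principal score of a batch is the top eigenvalue of its correlation matrix, which spikes when many streams become correlated. The snippet below contrasts independent and correlated synthetic streams; the paper's rPS and gPS are randomized variants built for scale.

```python
# Principal score of a (time, streams) batch: the top eigenvalue of its
# correlation matrix.
import numpy as np

def principal_score(window):
    corr = np.corrcoef(window, rowvar=False)
    return np.linalg.eigvalsh(corr)[-1]   # eigvalsh returns ascending order

rng = np.random.default_rng(8)
normal = rng.normal(size=(200, 50))                   # independent streams
shared = rng.normal(size=(200, 1))                    # common driver
correlated = 0.7 * shared + 0.3 * rng.normal(size=(200, 50))  # CAD event
print(f"PS, independent streams: {principal_score(normal):.2f}")
print(f"PS, correlated streams:  {principal_score(correlated):.2f}")
```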
A Symbolic Approach to Explaining Bayesian Network Classifiers
Title | A Symbolic Approach to Explaining Bayesian Network Classifiers |
Authors | Andy Shih, Arthur Choi, Adnan Darwiche |
Abstract | We propose an approach for explaining Bayesian network classifiers, which is based on compiling such classifiers into decision functions that have a tractable and symbolic form. We introduce two types of explanations for why a classifier may have classified an instance positively or negatively and suggest algorithms for computing these explanations. The first type of explanation identifies a minimal set of the currently active features that is responsible for the current classification, while the second type of explanation identifies a minimal set of features whose current state (active or not) is sufficient for the classification. We consider in particular the compilation of Naive and Latent-Tree Bayesian network classifiers into Ordered Decision Diagrams (ODDs), providing a context for evaluating our proposal using case studies and experiments based on classifiers from the literature. |
Tasks | |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03364v1 |
http://arxiv.org/pdf/1805.03364v1.pdf | |
PWC | https://paperswithcode.com/paper/a-symbolic-approach-to-explaining-bayesian |
Repo | |
Framework | |
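The second explanation type - a minimal set of feature states sufficient for the classification however the remaining features are set - can be brute-forced on a tiny classifier, as below. The paper's point is that compiling the classifier into an ODD makes this tractable; the naive Bayes log-odds here are invented.

```python
# Brute-force minimal sufficient reason for a tiny naive Bayes classifier
# over binary features (exponential in feature count; ODD compilation is
# what makes this tractable in the paper).
from itertools import combinations, product

def nb_decides_positive(instance, log_odds, prior):
    return prior + sum(log_odds[f][v] for f, v in instance.items()) > 0

def minimal_sufficient_reason(instance, log_odds, prior):
    features = list(instance)
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            rest = [f for f in features if f not in subset]
            # Sufficient iff the decision holds for every completion of rest.
            if all(nb_decides_positive(
                       {**{f: instance[f] for f in subset},
                        **dict(zip(rest, vals))}, log_odds, prior)
                   for vals in product([0, 1], repeat=len(rest))):
                return set(subset)
    return set(features)

# Toy classifier: per-feature log-odds contributions for values 0 and 1.
log_odds = {"fever": {0: -1.0, 1: 2.0},
            "cough": {0: -0.5, 1: 1.0},
            "rash":  {0: -0.2, 1: 0.3}}
instance = {"fever": 1, "cough": 1, "rash": 0}
print(minimal_sufficient_reason(instance, log_odds, prior=0.0))  # {'fever'}
```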
Connecting the Dots Between MLE and RL for Sequence Prediction
Title | Connecting the Dots Between MLE and RL for Sequence Prediction |
Authors | Bowen Tan, Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric Xing |
Abstract | Sequence prediction models can be learned from example sequences with a variety of training algorithms. Maximum likelihood learning is simple and efficient, yet can suffer from compounding error at test time. Reinforcement learning, such as policy gradient, addresses the issue but can have prohibitively poor exploration efficiency. A rich set of other algorithms, such as RAML, SPG, and data noising, have also been developed from different perspectives. This paper establishes a formal connection between these algorithms. We present a generalized entropy regularized policy optimization formulation, and show that the apparently distinct algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of a reward function and a couple of hyperparameters. The unified interpretation offers a systematic view of the varying properties of exploration and learning efficiency. Moreover, inspired by the framework, we present a new algorithm that dynamically interpolates among the family of algorithms for scheduled sequence model learning. Experiments on machine translation, text summarization, and game imitation learning demonstrate the superiority of the proposed algorithm. |
Tasks | Imitation Learning, Machine Translation, Text Summarization |
Published | 2018-11-24 |
URL | https://arxiv.org/abs/1811.09740v2 |
https://arxiv.org/pdf/1811.09740v2.pdf | |
PWC | https://paperswithcode.com/paper/connecting-the-dots-between-mle-and-rl-for |
Repo | |
Framework | |
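One way to see the interpolation: weight candidate target sequences by exp(reward / tau). As tau -> 0 the weights collapse onto the ground truth, recovering maximum likelihood; larger tau spreads mass over high-reward neighbors, as in RAML. The rewards and candidates below are invented, and the framework covers more settings (policy gradient, data noising) than this toy shows.

```python
# Exponentiated-reward target distribution: tau interpolates between plain
# MLE (tau -> 0) and reward-weighted training over candidate sequences.
import numpy as np

def target_weights(rewards, tau):
    """Exponentiated-reward distribution over candidate targets."""
    w = np.exp((rewards - rewards.max()) / tau)   # max-shift for stability
    return w / w.sum()

# Ground truth gets reward 1.0; nearby paraphrases get partial reward.
rewards = np.array([1.0, 0.8, 0.5, 0.1])
for tau in (0.01, 0.2, 1.0):
    w = target_weights(rewards, tau)
    print(f"tau={tau:<4}: weights = {np.round(w, 3)}")
# The training loss would then be -sum_i w_i * log p_model(candidate_i),
# which reduces to plain MLE as tau -> 0.
```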