Paper Group ANR 393
Theory reconstruction: a representation learning view on predicate invention. Combining local and global smoothing in multivariate density estimation. Disfluency Detection using a Bidirectional LSTM. Recommendations as Treatments: Debiasing Learning and Evaluation. Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets. …
Theory reconstruction: a representation learning view on predicate invention
Title | Theory reconstruction: a representation learning view on predicate invention |
Authors | Sebastijan Dumancic, Wannes Meert, Hendrik Blockeel |
Abstract | In this position paper we present a representation learning view on predicate invention. The intention of this proposal is to bridge the relational and deep learning communities on the problem of predicate invention. We propose a theory reconstruction approach, a formalism that extends the autoencoder approach to representation learning to relational settings. Our intention is to start a discussion to define a unifying framework for predicate invention and theory revision. |
Tasks | Representation Learning |
Published | 2016-06-28 |
URL | http://arxiv.org/abs/1606.08660v2 |
http://arxiv.org/pdf/1606.08660v2.pdf | |
PWC | https://paperswithcode.com/paper/theory-reconstruction-a-representation |
Repo | |
Framework | |
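As background for the proposal, here is a minimal sketch of the standard (non-relational) autoencoder objective the paper sets out to generalize: a latent code, the analogue of invented predicates, must suffice to reconstruct the input. Sizes are illustrative; this is not the paper's relational formalism.

```python
# Minimal autoencoder sketch: reconstruction through a learned latent code.
# In the paper's analogy, the latent code plays the role of invented predicates.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=32, n_latent=8):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_in, n_latent), nn.ReLU())
        self.decode = nn.Linear(n_latent, n_in)

    def forward(self, x):
        return self.decode(self.encode(x))

ae = AutoEncoder()
x = torch.randn(16, 32)                    # toy propositional data
loss = nn.functional.mse_loss(ae(x), x)    # reconstruction objective
print(loss.item())
```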
Combining local and global smoothing in multivariate density estimation
Title | Combining local and global smoothing in multivariate density estimation |
Authors | Adelchi Azzalini |
Abstract | Non-parametric estimation of a multivariate density is tackled via a method which combines traditional local smoothing with a form of global smoothing, but without imposing a rigid structure. Simulation work delivers encouraging indications of the effectiveness of the method. An application to density-based clustering illustrates a possible usage. |
Tasks | Density Estimation |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02372v1 |
http://arxiv.org/pdf/1610.02372v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-local-and-global-smoothing-in |
Repo | |
Framework | |
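A minimal sketch of the kind of combination the abstract describes, under the assumption that "global" means a simple parametric fit and "local" a kernel estimate; the convex mixing weight `alpha` is a hypothetical knob, not the paper's mechanism.

```python
# Blend a global parametric fit (multivariate Gaussian, fitted by moments)
# with a local kernel density estimate on the same sample.
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))                  # toy 2-D sample

mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)  # global smoother
global_pdf = multivariate_normal(mu, cov).pdf
local_pdf = gaussian_kde(X.T)                      # local smoother (KDE)

def blended_density(x, alpha=0.5):
    """Convex combination of the global and local estimates at point x."""
    return alpha * global_pdf(x) + (1 - alpha) * local_pdf(np.atleast_2d(x).T)

print(blended_density(np.array([0.0, 0.0])))
```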
Disfluency Detection using a Bidirectional LSTM
Title | Disfluency Detection using a Bidirectional LSTM |
Authors | Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi |
Abstract | We introduce a new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM). In addition to the word sequence, the model takes as input pattern match features that were developed to reduce sensitivity to vocabulary size in training, which leads to improved performance over the word sequence alone. The BLSTM takes advantage of explicit repair states in addition to the standard reparandum states. The final output leverages integer linear programming to incorporate constraints of disfluency structure. In experiments on the Switchboard corpus, the model achieves state-of-the-art performance for both the standard disfluency detection task and the correction detection task. Analysis shows that the model has better detection of non-repetition disfluencies, which tend to be much harder to detect. |
Tasks | |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03209v1 |
http://arxiv.org/pdf/1604.03209v1.pdf | |
PWC | https://paperswithcode.com/paper/disfluency-detection-using-a-bidirectional |
Repo | |
Framework | |
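A minimal sketch of the core model, assuming a PyTorch setup: a bidirectional LSTM that tags each token with a disfluency label. The paper's pattern-match input features and ILP post-processing are omitted; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)    # concat of both directions

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        return self.out(h)                          # per-token tag scores

tagger = BLSTMTagger()
scores = tagger(torch.randint(0, 10000, (2, 12)))   # two 12-token utterances
print(scores.shape)                                 # torch.Size([2, 12, 5])
```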
Recommendations as Treatments: Debiasing Learning and Evaluation
Title | Recommendations as Treatments: Debiasing Learning and Evaluation |
Authors | Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims |
Abstract | Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable. |
Tasks | Causal Inference, Recommendation Systems |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05352v2 |
http://arxiv.org/pdf/1602.05352v2.pdf | |
PWC | https://paperswithcode.com/paper/recommendations-as-treatments-debiasing |
Repo | |
Framework | |
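The estimation technique the paper adapts from causal inference is inverse propensity scoring (IPS); a minimal sketch of the debiased-evaluation idea follows, with made-up errors and propensities.

```python
# IPS-style debiased evaluation: reweight each observed per-rating error by
# the inverse of its (assumed known) probability of being observed.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_total = 1000, 5000                   # observed vs. all (user, item) pairs
errors = rng.normal(1.0, 0.5, n_obs) ** 2     # per-rating squared errors (toy)
propensity = rng.uniform(0.1, 0.9, n_obs)     # P(rating observed)

naive = errors.mean()                         # biased under self-selection
ips = (errors / propensity).sum() / n_total   # unbiased given correct propensities
print(f"naive={naive:.3f}  IPS={ips:.3f}")
```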
Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets
Title | Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets |
Authors | Yunjie Liu, Evan Racah, Prabhat, Joaquin Correa, Amir Khosrowshahi, David Lavers, Kenneth Kunkel, Michael Wehner, William Collins |
Abstract | Detecting extreme events in large datasets is a major challenge in climate science research. Current algorithms for extreme event detection are built upon human expertise in defining events based on subjective thresholds of relevant physical variables. Often, multiple competing methods produce vastly different results on the same dataset. Accurate characterization of extreme events in climate simulations and observational data archives is critical for understanding the trends and potential impacts of such events in a climate change context. This study presents the first application of Deep Learning techniques as an alternative methodology for climate extreme event detection. Deep neural networks are able to learn high-level representations of a broad class of patterns from labeled data. In this work, we developed a deep Convolutional Neural Network (CNN) classification system and demonstrated the usefulness of Deep Learning techniques for tackling climate pattern detection problems. Coupled with a Bayesian hyper-parameter optimization scheme, our deep CNN system achieves 89%-99% accuracy in detecting extreme events (Tropical Cyclones, Atmospheric Rivers and Weather Fronts). |
Tasks | |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01156v1 |
http://arxiv.org/pdf/1605.01156v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-deep-convolutional-neural-1 |
Repo | |
Framework | |
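A minimal sketch, not the paper's architecture: a small CNN that classifies multi-channel climate patches (e.g. pressure, wind, precipitation fields) as event vs. non-event. The channel count and patch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 2),                      # event vs. non-event
)

patch = torch.randn(4, 8, 64, 64)          # batch of 8-channel 64x64 patches
print(model(patch).shape)                  # torch.Size([4, 2])
```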
Joint Bayesian Gaussian discriminant analysis for speaker verification
Title | Joint Bayesian Gaussian discriminant analysis for speaker verification |
Authors | Yiyan Wang, Haotian Xu, Zhijian Ou |
Abstract | State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We are mainly motivated by the recent work on the joint Bayesian (JB) method, which was originally proposed for discriminant analysis in face verification. We apply JB to speaker verification and make three contributions beyond the original JB. 1) In contrast to the EM iterations with approximated statistics in the original JB, the EM iterations with exact statistics are employed and give better performance. 2) We propose to do simultaneous diagonalization (SD) of the within-class and between-class covariance matrices to achieve efficient testing, which has broader application scope than the SVD-based efficient testing method in the original JB. 3) We scrutinize similarities and differences between various Gaussian PLDAs and JB, complementing the previous analysis that compared JB only with Prince-Elder PLDA. Extensive experiments are conducted on NIST SRE10 core condition 5, empirically validating the superiority of JB with faster convergence rate and 9-13% EER reduction compared with state-of-the-art PLDA. |
Tasks | Face Verification, Speaker Verification |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04056v2 |
http://arxiv.org/pdf/1612.04056v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-bayesian-gaussian-discriminant-analysis |
Repo | |
Framework | |
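Contribution 2) rests on simultaneous diagonalization of the within- and between-class covariances, which is exactly a generalized symmetric eigenproblem; a minimal sketch with random stand-in matrices:

```python
# Simultaneous diagonalization via the generalized eigenproblem
# S_b W = S_w W diag(evals): the resulting W satisfies W^T S_w W = I and
# W^T S_b W = diag(evals), which is what enables efficient scoring.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); S_w = A @ A.T + 5 * np.eye(5)  # within-class (PD)
B = rng.standard_normal((5, 5)); S_b = B @ B.T                  # between-class

evals, W = eigh(S_b, S_w)
print(np.allclose(W.T @ S_w @ W, np.eye(5)))        # True
print(np.allclose(W.T @ S_b @ W, np.diag(evals)))   # True
```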
Learning Grimaces by Watching TV
Title | Learning Grimaces by Watching TV |
Authors | Samuel Albanie, Andrea Vedaldi |
Abstract | Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract events affecting the game and corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. We also develop, using benchmarks such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. Then, we extend these models to use facial expressions to predict events in videos and learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue |
Tasks | Emotion Recognition, Face Verification, Facial Expression Recognition |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02255v1 |
http://arxiv.org/pdf/1610.02255v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-grimaces-by-watching-tv |
Repo | |
Framework | |
A Differentiable Physics Engine for Deep Learning in Robotics
Title | A Differentiable Physics Engine for Deep Learning in Robotics |
Authors | Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels |
Abstract | An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian, an approach that quickly grows expensive with increasing numbers of parameters, such as those found in deep learning. We propose the implementation of a modern physics engine that can differentiate with respect to control parameters. This engine is implemented for both CPU and GPU. Firstly, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software. |
Tasks | Q-Learning |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01652v2 |
http://arxiv.org/pdf/1611.01652v2.pdf | |
PWC | https://paperswithcode.com/paper/a-differentiable-physics-engine-for-deep |
Repo | |
Framework | |
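A minimal sketch of the core idea: when every simulation step is differentiable, a task loss can be backpropagated through the whole rollout to the controller parameters. The dynamics and controller here are toy stand-ins, not the paper's engine.

```python
import torch

k = torch.tensor(0.5, requires_grad=True)      # controller parameter (gain)
pos, vel = torch.tensor(0.0), torch.tensor(0.0)
target, dt = 1.0, 0.1

for _ in range(50):                            # differentiable rollout
    force = k * (target - pos)                 # proportional controller
    vel = vel + force * dt                     # semi-implicit Euler step
    pos = pos + vel * dt

loss = (pos - target) ** 2
loss.backward()                                # gradient through the whole rollout
print(k.grad)                                  # d(loss)/d(k), usable for descent
```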
Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks
Title | Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks |
Authors | Lorenzo Baraldi, Costantino Grana, Rita Cucchiara |
Abstract | This paper presents a novel approach for temporal and semantic segmentation of edited videos into meaningful segments, from the point of view of the storytelling structure. The objective is to decompose a long video into more manageable sequences, which can in turn be used to retrieve the most significant parts of it given a textual query and to provide an effective summarization. Previous video decomposition methods mainly employed perceptual cues, tackling the problem either as a story change detection, or as a similarity grouping task, and the lack of semantics limited their ability to identify story boundaries. Our proposal connects together perceptual, audio and semantic cues in a specialized deep network architecture designed with a combination of CNNs which generate an appropriate embedding, and clusters shots into connected sequences of semantic scenes, i.e. stories. A retrieval presentation strategy is also proposed, by selecting the semantically and aesthetically “most valuable” thumbnails to present, considering the query in order to improve the storytelling presentation. Finally, the subjective nature of the task is considered, by conducting experiments with different annotators and by proposing an algorithm to maximize the agreement between automatic results and human annotators. |
Tasks | Semantic Segmentation |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01376v2 |
http://arxiv.org/pdf/1610.01376v2.pdf | |
PWC | https://paperswithcode.com/paper/recognizing-and-presenting-the-storytelling |
Repo | |
Framework | |
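A minimal sketch of the grouping step implied by the abstract: shots, taken in temporal order, are merged into the same scene while consecutive embeddings stay similar. The embeddings and threshold are toy stand-ins; the paper learns the embedding with a multimodal deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
shot_emb = rng.standard_normal((10, 16))       # one embedding per shot, in order

def segment_shots(emb, thresh=5.0):
    """Greedy temporal grouping: open a new scene whenever consecutive
    shot embeddings are farther apart than `thresh` (hypothetical rule)."""
    scenes, current = [], [0]
    for i in range(1, len(emb)):
        if np.linalg.norm(emb[i] - emb[i - 1]) > thresh:
            scenes.append(current)
            current = []
        current.append(i)
    scenes.append(current)
    return scenes

print(segment_shots(shot_emb))                 # shot indices grouped into scenes
```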
Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment
Title | Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment |
Authors | Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo |
Abstract | Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separate embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart TransE. |
Tasks | Calibration, Knowledge Graph Embeddings, Knowledge Graphs |
Published | 2016-11-12 |
URL | http://arxiv.org/abs/1611.03954v3 |
http://arxiv.org/pdf/1611.03954v3.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-knowledge-graph-embeddings-for |
Repo | |
Framework | |
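A minimal sketch of the two ingredients named in the abstract: a TransE-style monolingual score per language, plus (in the linear-transformation variant) a matrix that maps entity embeddings into the other language's space. Dimensions and the toy vectors are illustrative.

```python
import torch

dim = 50
def transe_score(h, r, t):
    """Monolingual knowledge-model score: small when h + r ≈ t."""
    return torch.norm(h + r - t, dim=-1)

h_en, r_en, t_en = (torch.randn(dim) for _ in range(3))  # toy English triple
h_fr = torch.randn(dim)                                  # aligned French entity
M = torch.randn(dim, dim, requires_grad=True)            # cross-lingual transform

# Alignment-model loss for the linear-transformation variant: push the
# transformed English embedding toward its French counterpart.
align_loss = torch.norm(M @ h_en - h_fr)
align_loss.backward()                                    # trains M
print(transe_score(h_en, r_en, t_en).item(), align_loss.item())
```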
3D Human Pose Estimation from a Single Image via Distance Matrix Regression
Title | 3D Human Pose Estimation from a Single Image via Distance Matrix Regression |
Authors | Francesc Moreno-Noguer |
Abstract | This paper addresses the problem of 3D human pose estimation from a single image. We follow a standard two-step pipeline by first detecting the 2D position of the $N$ body joints, and then using these observations to infer 3D pose. For the first step, we use a recent CNN-based detector. For the second step, most existing approaches perform 2$N$-to-3$N$ regression of the Cartesian joint coordinates. We show that more precise pose estimates can be obtained by representing both the 2D and 3D human poses using $N\times N$ distance matrices, and formulating the problem as a 2D-to-3D distance matrix regression. For learning such a regressor we leverage simple Neural Network architectures which, by construction, enforce positivity and symmetry of the predicted matrices. The approach also has the advantage of naturally handling missing observations and allowing us to hypothesize the position of non-observed joints. Quantitative results on the Humaneva and Human3.6M datasets demonstrate consistent performance gains over the state-of-the-art. Qualitative evaluation on in-the-wild images from the LSP dataset, using the regressor learned on Human3.6M, reveals very promising generalization results. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09010v1 |
http://arxiv.org/pdf/1611.09010v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-from-a-single-image |
Repo | |
Framework | |
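The representation being regressed between is the $N\times N$ Euclidean distance matrix of the joints; a minimal sketch of its construction, where the random joints stand in for real 2D detections or 3D ground truth:

```python
import numpy as np

N = 14                                        # number of body joints (illustrative)
joints_3d = np.random.rand(N, 3)              # stand-in 3D pose

diff = joints_3d[:, None, :] - joints_3d[None, :, :]
edm = np.linalg.norm(diff, axis=-1)           # pairwise joint distances

print(edm.shape)                              # (14, 14)
print(np.allclose(edm, edm.T), edm.diagonal().max())  # symmetric, zero diagonal
```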
Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion
Title | Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion |
Authors | Xu Han, Zhiyuan Liu, Maosong Sun |
Abstract | Joint representation learning of text and knowledge within a unified semantic space enables us to perform knowledge graph completion more accurately. In this work, we propose a novel framework to embed words, entities and relations into the same continuous vector space. In this model, both entity and relation embeddings are learned by taking knowledge graph and plain text into consideration. In experiments, we evaluate the joint learning model on three tasks including entity prediction, relation prediction and relation classification from text. The experiment results show that our model can significantly and consistently improve the performance on the three tasks as compared with other baselines. |
Tasks | Knowledge Graph Completion, Relation Classification, Representation Learning |
Published | 2016-11-13 |
URL | http://arxiv.org/abs/1611.04125v1 |
http://arxiv.org/pdf/1611.04125v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-representation-learning-of-text-and |
Repo | |
Framework | |
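A minimal sketch of what a shared word/entity/relation space buys for the entity-prediction task: ranking candidate tails by a translation-style score. The symbols and vectors are toy stand-ins, not the paper's trained embeddings or exact scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 20
space = {name: rng.standard_normal(dim)       # one space for all symbols
         for name in ["obama", "usa", "president_of", "word:election"]}

def predict_tail(h, r, candidates):
    """Rank candidate entities by ||h + r - t||, smallest (best) first."""
    scores = {c: np.linalg.norm(space[h] + space[r] - space[c]) for c in candidates}
    return sorted(scores, key=scores.get)

print(predict_tail("obama", "president_of", ["usa", "word:election"]))
```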
Improving energy efficiency and classification accuracy of neuromorphic chips by learning binary synaptic crossbars
Title | Improving energy efficiency and classification accuracy of neuromorphic chips by learning binary synaptic crossbars |
Authors | Antonio Jimeno Yepes, Jianbin Tang |
Abstract | Deep Neural Networks (DNN) have achieved human-level performance in many image analytics tasks, but DNNs are mostly deployed to GPU platforms that consume a considerable amount of power. Brain-inspired spiking neuromorphic chips consume low power and can be highly parallelized. However, to deploy DNNs on energy-efficient neuromorphic chips, the incompatibility between the continuous neurons and synaptic weights of traditional DNNs and the discrete spiking neurons and synapses of neuromorphic chips has to be overcome. Previous work achieved this by training a network to learn continuous probabilities, followed by deployment to a neuromorphic architecture by randomly sampling these probabilities. An ensemble of sampled networks is needed to approximate the performance of the trained network. In the work presented in this paper, we have extended previous research by directly learning binary synaptic crossbars. Results on MNIST show that better performance can be achieved with a small network in one time step (92.7% maximum observed accuracy vs 95.98% accuracy in our work). Top results on a larger network are similar to previously published results (99.42% maximum observed accuracy vs 99.45% accuracy in our work). More importantly, in our work a smaller ensemble is needed to achieve similar or better accuracy than previous work, which translates into significantly decreased energy consumption for both networks. Results of our work are stable since they do not require random sampling. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07740v1 |
http://arxiv.org/pdf/1605.07740v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-energy-efficiency-and |
Repo | |
Framework | |
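A minimal sketch of directly learning binary weights with a straight-through estimator: the forward pass uses the binarized crossbar, while gradients update the underlying real-valued weights. This is one common recipe, not necessarily the paper's exact training rule.

```python
import torch

w_real = torch.randn(4, 4, requires_grad=True)   # latent real-valued weights

def binarize_ste(w):
    """Forward: {0,1} crossbar weights. Backward: identity gradient."""
    b = (w > 0).float()
    return b + (w - w.detach())                  # straight-through trick

x = torch.randn(1, 4)
loss = (x @ binarize_ste(w_real)).pow(2).sum()   # toy objective
loss.backward()
print(w_real.grad is not None)                   # True: gradients reach w_real
```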
Learning and Fusing Multimodal Features from and for Multi-task Facial Computing
Title | Learning and Fusing Multimodal Features from and for Multi-task Facial Computing |
Authors | Wei Li, Zhigang Zhu |
Abstract | We propose a deep learning-based feature fusion approach for facial computing including face recognition as well as gender, race and age detection. Instead of training a single classifier on face images to classify them based on the features of the person whose face appears in the image, we first train four different classifiers for classifying face images based on race, age, gender and identification (ID). Multi-task features are then extracted from the trained models and cross-task-feature training is conducted, which shows the value of fusing multimodal features extracted from multiple tasks. We have found that features trained for one task can be used for other related tasks. More interestingly, features trained for a task with more classes (e.g. ID) and then used in another task with fewer classes (e.g. race) outperform the features trained for the other task itself. The final feature fusion is performed by combining the four types of features extracted from the images by the four classifiers. The feature fusion approach improves classification accuracy by margins of 7.2%, 20.1%, 22.2% and 21.8% for ID, age, race and gender recognition, respectively, over the results of single classifiers trained only on their individual features. The proposed method can be applied to applications in which different types of data or features can be extracted. |
Tasks | Face Recognition |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04322v1 |
http://arxiv.org/pdf/1610.04322v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-fusing-multimodal-features-from |
Repo | |
Framework | |
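A minimal sketch of the fusion step, assuming the "combining" is concatenation: features produced by the four task-specific networks are joined into one vector for the final classifier. The extractors and feature sizes are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these came from the ID, age, race and gender networks.
f_id, f_age, f_race, f_gender = (rng.standard_normal(128) for _ in range(4))

fused = np.concatenate([f_id, f_age, f_race, f_gender])  # 512-dim joint feature
print(fused.shape)                                       # (512,)
# `fused` would then feed a standard classifier for the target task.
```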
Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari
Title | Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari |
Authors | Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Richman, Ofir Trabelsi |
Abstract | The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in shaping their encrypted video traffic via DPI. This highlights the need for new traffic classification methods for encrypted HTTP adaptive video streaming to enable smart traffic shaping. These new methods will have to effectively estimate the quality representation layer and playout buffer. We present a new method and show for the first time that video quality representation classification for (YouTube) encrypted HTTP adaptive streaming is possible. We analyze the performance of this classification method with Safari over HTTPS. Based on a large number of offline and online traffic classification experiments, we demonstrate that it can independently classify, in real time, every video segment into one of the quality representation layers with 97.18% average accuracy. |
Tasks | |
Published | 2016-02-01 |
URL | http://arxiv.org/abs/1602.00489v2 |
http://arxiv.org/pdf/1602.00489v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-video-quality-representation |
Repo | |
Framework | |
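A minimal sketch of the setup the abstract implies: since payloads are encrypted, only per-segment size and timing features survive, and a standard classifier maps them to a quality representation layer. Features, labels and the model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy per-segment features: [total bytes, mean packet size, duration, peak rate]
X = rng.random((200, 4))
y = rng.integers(0, 3, 200)        # three hypothetical quality layers

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:5]))          # predicted quality layer per segment
```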