Paper Group ANR 393
Theory reconstruction: a representation learning view on predicate invention. Combining local and global smoothing in multivariate density estimation. Disfluency Detection using a Bidirectional LSTM. Recommendations as Treatments: Debiasing Learning and Evaluation. Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets. …
Theory reconstruction: a representation learning view on predicate invention
Title | Theory reconstruction: a representation learning view on predicate invention |
Authors | Sebastijan Dumancic, Wannes Meert, Hendrik Blockeel |
Abstract | In this position paper we present a representation learning view on predicate invention. The intention of this proposal is to bridge the relational and deep learning communities on the problem of predicate invention. We propose a theory reconstruction approach, a formalism that extends the autoencoder approach to representation learning to relational settings. Our intention is to start a discussion to define a unifying framework for predicate invention and theory revision. |
Tasks | Representation Learning |
Published | 2016-06-28 |
URL | http://arxiv.org/abs/1606.08660v2 |
http://arxiv.org/pdf/1606.08660v2.pdf | |
PWC | https://paperswithcode.com/paper/theory-reconstruction-a-representation |
Repo | |
Framework | |
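As background for the proposal, here is a minimal sketch of the standard (non-relational) autoencoder objective the paper sets out to generalize: a latent code, the analogue of invented predicates, must suffice to reconstruct the input. Sizes are illustrative; this is not the paper's relational formalism.

```python
# Minimal autoencoder sketch: reconstruction through a learned latent code.
# In the paper's analogy, the latent code plays the role of invented predicates.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=32, n_latent=8):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_in, n_latent), nn.ReLU())
        self.decode = nn.Linear(n_latent, n_in)

    def forward(self, x):
        return self.decode(self.encode(x))

ae = AutoEncoder()
x = torch.randn(16, 32)                    # toy propositional data
loss = nn.functional.mse_loss(ae(x), x)    # reconstruction objective
print(loss.item())
```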
Combining local and global smoothing in multivariate density estimation
Title | Combining local and global smoothing in multivariate density estimation |
Authors | Adelchi Azzalini |
Abstract | Non-parametric estimation of a multivariate density is tackled via a method which combines traditional local smoothing with a form of global smoothing, but without imposing a rigid structure. Simulation work delivers encouraging indications of the effectiveness of the method. An application to density-based clustering illustrates a possible usage. |
Tasks | Density Estimation |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02372v1 |
http://arxiv.org/pdf/1610.02372v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-local-and-global-smoothing-in |
Repo | |
Framework | |
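A minimal sketch of the kind of combination the abstract describes, under the assumption that "global" means a simple parametric fit and "local" a kernel estimate; the convex mixing weight `alpha` is a hypothetical knob, not the paper's mechanism.

```python
# Blend a global parametric fit (multivariate Gaussian, fitted by moments)
# with a local kernel density estimate on the same sample.
import numpy as np
from scipy.stats import multivariate_normal, gaussian_kde

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))                  # toy 2-D sample

mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)  # global smoother
global_pdf = multivariate_normal(mu, cov).pdf
local_pdf = gaussian_kde(X.T)                      # local smoother (KDE)

def blended_density(x, alpha=0.5):
    """Convex combination of the global and local estimates at point x."""
    return alpha * global_pdf(x) + (1 - alpha) * local_pdf(np.atleast_2d(x).T)

print(blended_density(np.array([0.0, 0.0])))
```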
Disfluency Detection using a Bidirectional LSTM
Title | Disfluency Detection using a Bidirectional LSTM |
Authors | Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi |
Abstract | We introduce a new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM). In addition to the word sequence, the model takes as input pattern match features that were developed to reduce sensitivity to vocabulary size in training, which leads to improved performance over the word sequence alone. The BLSTM takes advantage of explicit repair states in addition to the standard reparandum states. The final output leverages integer linear programming to incorporate constraints of disfluency structure. In experiments on the Switchboard corpus, the model achieves state-of-the-art performance for both the standard disfluency detection task and the correction detection task. Analysis shows that the model has better detection of non-repetition disfluencies, which tend to be much harder to detect. |
Tasks | |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03209v1 |
http://arxiv.org/pdf/1604.03209v1.pdf | |
PWC | https://paperswithcode.com/paper/disfluency-detection-using-a-bidirectional |
Repo | |
Framework | |
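A minimal sketch of the core model, assuming a PyTorch setup: a bidirectional LSTM that tags each token with a disfluency label. The paper's pattern-match input features and ILP post-processing are omitted; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class BLSTMTagger(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_tags=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)    # concat of both directions

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        return self.out(h)                          # per-token tag scores

tagger = BLSTMTagger()
scores = tagger(torch.randint(0, 10000, (2, 12)))   # two 12-token utterances
print(scores.shape)                                 # torch.Size([2, 12, 5])
```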
Recommendations as Treatments: Debiasing Learning and Evaluation
Title | Recommendations as Treatments: Debiasing Learning and Evaluation |
Authors | Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims |
Abstract | Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable. |
Tasks | Causal Inference, Recommendation Systems |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05352v2 |
http://arxiv.org/pdf/1602.05352v2.pdf | |
PWC | https://paperswithcode.com/paper/recommendations-as-treatments-debiasing |
Repo | |
Framework | |
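The estimation technique the paper adapts from causal inference is inverse propensity scoring (IPS); a minimal sketch of the debiased-evaluation idea follows, with made-up errors and propensities.

```python
# IPS-style debiased evaluation: reweight each observed per-rating error by
# the inverse of its (assumed known) probability of being observed.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_total = 1000, 5000                   # observed vs. all (user, item) pairs
errors = rng.normal(1.0, 0.5, n_obs) ** 2     # per-rating squared errors (toy)
propensity = rng.uniform(0.1, 0.9, n_obs)     # P(rating observed)

naive = errors.mean()                         # biased under self-selection
ips = (errors / propensity).sum() / n_total   # unbiased given correct propensities
print(f"naive={naive:.3f}  IPS={ips:.3f}")
```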
Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets
Title | Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets |
Authors | Yunjie Liu, Evan Racah, Prabhat, Joaquin Correa, Amir Khosrowshahi, David Lavers, Kenneth Kunkel, Michael Wehner, William Collins |
Abstract | Detecting extreme events in large datasets is a major challenge in climate science research. Current algorithms for extreme event detection are built upon human expertise in defining events based on subjective thresholds of relevant physical variables. Often, multiple competing methods produce vastly different results on the same dataset. Accurate characterization of extreme events in climate simulations and observational data archives is critical for understanding the trends and potential impacts of such events in a climate change context. This study presents the first application of Deep Learning techniques as an alternative methodology for climate extreme event detection. Deep neural networks are able to learn high-level representations of a broad class of patterns from labeled data. In this work, we developed a deep Convolutional Neural Network (CNN) classification system and demonstrated the usefulness of Deep Learning techniques for tackling climate pattern detection problems. Coupled with a Bayesian hyper-parameter optimization scheme, our deep CNN system achieves 89%-99% accuracy in detecting extreme events (Tropical Cyclones, Atmospheric Rivers and Weather Fronts). |
Tasks | |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01156v1 |
http://arxiv.org/pdf/1605.01156v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-deep-convolutional-neural-1 |
Repo | |
Framework | |
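A minimal sketch, not the paper's architecture: a small CNN that classifies multi-channel climate patches (e.g. pressure, wind, precipitation fields) as event vs. non-event. The channel count and patch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 2),                      # event vs. non-event
)

patch = torch.randn(4, 8, 64, 64)          # batch of 8-channel 64x64 patches
print(model(patch).shape)                  # torch.Size([4, 2])
```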
Joint Bayesian Gaussian discriminant analysis for speaker verification
Title | Joint Bayesian Gaussian discriminant analysis for speaker verification |
Authors | Yiyan Wang, Haotian Xu, Zhijian Ou |
Abstract | State-of-the-art i-vector based speaker verification relies on variants of Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We are mainly motivated by the recent work on the joint Bayesian (JB) method, which was originally proposed for discriminant analysis in face verification. We apply JB to speaker verification and make three contributions beyond the original JB. 1) In contrast to the EM iterations with approximated statistics in the original JB, the EM iterations with exact statistics are employed and give better performance. 2) We propose to do simultaneous diagonalization (SD) of the within-class and between-class covariance matrices to achieve efficient testing, which has broader application scope than the SVD-based efficient testing method in the original JB. 3) We scrutinize similarities and differences between various Gaussian PLDAs and JB, complementing the previous analysis that compared JB only with Prince-Elder PLDA. Extensive experiments are conducted on NIST SRE10 core condition 5, empirically validating the superiority of JB with faster convergence rate and 9-13% EER reduction compared with state-of-the-art PLDA. |
Tasks | Face Verification, Speaker Verification |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04056v2 |
http://arxiv.org/pdf/1612.04056v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-bayesian-gaussian-discriminant-analysis |
Repo | |
Framework | |
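Contribution 2) rests on simultaneous diagonalization of the within- and between-class covariances, which is exactly a generalized symmetric eigenproblem; a minimal sketch with random stand-in matrices:

```python
# Simultaneous diagonalization via the generalized eigenproblem
# S_b W = S_w W diag(evals): the resulting W satisfies W^T S_w W = I and
# W^T S_b W = diag(evals), which is what enables efficient scoring.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); S_w = A @ A.T + 5 * np.eye(5)  # within-class (PD)
B = rng.standard_normal((5, 5)); S_b = B @ B.T                  # between-class

evals, W = eigh(S_b, S_w)
print(np.allclose(W.T @ S_w @ W, np.eye(5)))        # True
print(np.allclose(W.T @ S_b @ W, np.diag(evals)))   # True
```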
Learning Grimaces by Watching TV
Title | Learning Grimaces by Watching TV |
Authors | Samuel Albanie, Andrea Vedaldi |
Abstract | Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract events affecting the game and corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. We also develop, using benchmarks such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. Then, we extend these models to use facial expressions to predict events in videos and learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue |
Tasks | Emotion Recognition, Face Verification, Facial Expression Recognition |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02255v1 |
http://arxiv.org/pdf/1610.02255v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-grimaces-by-watching-tv |
Repo | |
Framework | |
A Differentiable Physics Engine for Deep Learning in Robotics
Title | A Differentiable Physics Engine for Deep Learning in Robotics |
Authors | Jonas Degrave, Michiel Hermans, Joni Dambre, Francis wyffels |
Abstract | An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian, an approach that quickly grows expensive with increasing numbers of parameters, such as those found in deep learning. We propose the implementation of a modern physics engine that can differentiate with respect to control parameters. This engine is implemented for both CPU and GPU. Firstly, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software. |
Tasks | Q-Learning |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01652v2 |
http://arxiv.org/pdf/1611.01652v2.pdf | |
PWC | https://paperswithcode.com/paper/a-differentiable-physics-engine-for-deep |
Repo | |
Framework | |
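A minimal sketch of the core idea: when every simulation step is differentiable, a task loss can be backpropagated through the whole rollout to the controller parameters. The dynamics and controller here are toy stand-ins, not the paper's engine.

```python
import torch

k = torch.tensor(0.5, requires_grad=True)      # controller parameter (gain)
pos, vel = torch.tensor(0.0), torch.tensor(0.0)
target, dt = 1.0, 0.1

for _ in range(50):                            # differentiable rollout
    force = k * (target - pos)                 # proportional controller
    vel = vel + force * dt                     # semi-implicit Euler step
    pos = pos + vel * dt

loss = (pos - target) ** 2
loss.backward()                                # gradient through the whole rollout
print(k.grad)                                  # d(loss)/d(k), usable for descent
```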
Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks
Title | Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks |
Authors | Lorenzo Baraldi, Costantino Grana, Rita Cucchiara |
Abstract | This paper presents a novel approach for temporal and semantic segmentation of edited videos into meaningful segments, from the point of view of the storytelling structure. The objective is to decompose a long video into more manageable sequences, which can in turn be used to retrieve the most significant parts of it given a textual query and to provide an effective summarization. Previous video decomposition methods mainly employed perceptual cues, tackling the problem either as a story change detection, or as a similarity grouping task, and the lack of semantics limited their ability to identify story boundaries. Our proposal connects together perceptual, audio and semantic cues in a specialized deep network architecture designed with a combination of CNNs which generate an appropriate embedding, and clusters shots into connected sequences of semantic scenes, i.e. stories. A retrieval presentation strategy is also proposed, by selecting the semantically and aesthetically “most valuable” thumbnails to present, considering the query in order to improve the storytelling presentation. Finally, the subjective nature of the task is considered, by conducting experiments with different annotators and by proposing an algorithm to maximize the agreement between automatic results and human annotators. |
Tasks | Semantic Segmentation |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01376v2 |
http://arxiv.org/pdf/1610.01376v2.pdf | |
PWC | https://paperswithcode.com/paper/recognizing-and-presenting-the-storytelling |
Repo | |
Framework | |
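A minimal sketch of the grouping step implied by the abstract: shots, taken in temporal order, are merged into the same scene while consecutive embeddings stay similar. The embeddings and threshold are toy stand-ins; the paper learns the embedding with a multimodal deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
shot_emb = rng.standard_normal((10, 16))       # one embedding per shot, in order

def segment_shots(emb, thresh=5.0):
    """Greedy temporal grouping: open a new scene whenever consecutive
    shot embeddings are farther apart than `thresh` (hypothetical rule)."""
    scenes, current = [], [0]
    for i in range(1, len(emb)):
        if np.linalg.norm(emb[i] - emb[i - 1]) > thresh:
            scenes.append(current)
            current = []
        current.append(i)
    scenes.append(current)
    return scenes

print(segment_shots(shot_emb))                 # shot indices grouped into scenes
```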
Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment
Title | Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment |
Authors | Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo |
Abstract | Many recent works have demonstrated the benefits of knowledge graph embeddings in completing monolingual knowledge graphs. Inasmuch as related knowledge bases are built in several different languages, achieving cross-lingual knowledge alignment will help people in constructing a coherent knowledge base, and assist machines in dealing with different expressions of entity relationships across diverse human languages. Unfortunately, achieving this highly desirable cross-lingual alignment by human labor is very costly and error-prone. Thus, we propose MTransE, a translation-based model for multilingual knowledge graph embeddings, to provide a simple and automated solution. By encoding entities and relations of each language in a separate embedding space, MTransE provides transitions for each embedding vector to its cross-lingual counterparts in other spaces, while preserving the functionalities of monolingual embeddings. We deploy three different techniques to represent cross-lingual transitions, namely axis calibration, translation vectors, and linear transformations, and derive five variants for MTransE using different loss functions. Our models can be trained on partially aligned graphs, where just a small portion of triples are aligned with their cross-lingual counterparts. The experiments on cross-lingual entity matching and triple-wise alignment verification show promising results, with some variants consistently outperforming others on different tasks. We also explore how MTransE preserves the key properties of its monolingual counterpart TransE. |
Tasks | Calibration, Knowledge Graph Embeddings, Knowledge Graphs |
Published | 2016-11-12 |
URL | http://arxiv.org/abs/1611.03954v3 |
http://arxiv.org/pdf/1611.03954v3.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-knowledge-graph-embeddings-for |
Repo | |
Framework | |
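A minimal sketch of the two ingredients named in the abstract: a TransE-style monolingual score per language, plus (in the linear-transformation variant) a matrix that maps entity embeddings into the other language's space. Dimensions and the toy vectors are illustrative.

```python
import torch

dim = 50
def transe_score(h, r, t):
    """Monolingual knowledge-model score: small when h + r ≈ t."""
    return torch.norm(h + r - t, dim=-1)

h_en, r_en, t_en = (torch.randn(dim) for _ in range(3))  # toy English triple
h_fr = torch.randn(dim)                                  # aligned French entity
M = torch.randn(dim, dim, requires_grad=True)            # cross-lingual transform

# Alignment-model loss for the linear-transformation variant: push the
# transformed English embedding toward its French counterpart.
align_loss = torch.norm(M @ h_en - h_fr)
align_loss.backward()                                    # trains M
print(transe_score(h_en, r_en, t_en).item(), align_loss.item())
```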
3D Human Pose Estimation from a Single Image via Distance Matrix Regression
Title | 3D Human Pose Estimation from a Single Image via Distance Matrix Regression |
Authors | Francesc Moreno-Noguer |
Abstract | This paper addresses the problem of 3D human pose estimation from a single image. We follow a standard two-step pipeline by first detecting the 2D position of the $N$ body joints, and then using these observations to infer 3D pose. For the first step, we use a recent CNN-based detector. For the second step, most existing approaches perform 2$N$-to-3$N$ regression of the Cartesian joint coordinates. We show that more precise pose estimates can be obtained by representing both the 2D and 3D human poses using $N\times N$ distance matrices, and formulating the problem as a 2D-to-3D distance matrix regression. For learning such a regressor we leverage simple Neural Network architectures which, by construction, enforce positivity and symmetry of the predicted matrices. The approach also has the advantage of naturally handling missing observations and allowing us to hypothesize the position of non-observed joints. Quantitative results on the Humaneva and Human3.6M datasets demonstrate consistent performance gains over the state-of-the-art. Qualitative evaluation on in-the-wild images from the LSP dataset, using the regressor learned on Human3.6M, reveals very promising generalization results. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09010v1 |
http://arxiv.org/pdf/1611.09010v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-from-a-single-image |
Repo | |
Framework | |
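The representation being regressed between is the $N\times N$ Euclidean distance matrix of the joints; a minimal sketch of its construction, where the random joints stand in for real 2D detections or 3D ground truth:

```python
import numpy as np

N = 14                                        # number of body joints (illustrative)
joints_3d = np.random.rand(N, 3)              # stand-in 3D pose

diff = joints_3d[:, None, :] - joints_3d[None, :, :]
edm = np.linalg.norm(diff, axis=-1)           # pairwise joint distances

print(edm.shape)                              # (14, 14)
print(np.allclose(edm, edm.T), edm.diagonal().max())  # symmetric, zero diagonal
```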
Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion
Title | Joint Representation Learning of Text and Knowledge for Knowledge Graph Completion |
Authors | Xu Han, Zhiyuan Liu, Maosong Sun |
Abstract | Joint representation learning of text and knowledge within a unified semantic space enables us to perform knowledge graph completion more accurately. In this work, we propose a novel framework to embed words, entities and relations into the same continuous vector space. In this model, both entity and relation embeddings are learned by taking knowledge graph and plain text into consideration. In experiments, we evaluate the joint learning model on three tasks including entity prediction, relation prediction and relation classification from text. The experiment results show that our model can significantly and consistently improve the performance on the three tasks as compared with other baselines. |
Tasks | Knowledge Graph Completion, Relation Classification, Representation Learning |
Published | 2016-11-13 |
URL | http://arxiv.org/abs/1611.04125v1 |
http://arxiv.org/pdf/1611.04125v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-representation-learning-of-text-and |
Repo | |
Framework | |
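A minimal sketch of what a shared word/entity/relation space buys for the entity-prediction task: ranking candidate tails by a translation-style score. The symbols and vectors are toy stand-ins, not the paper's trained embeddings or exact scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 20
space = {name: rng.standard_normal(dim)       # one space for all symbols
         for name in ["obama", "usa", "president_of", "word:election"]}

def predict_tail(h, r, candidates):
    """Rank candidate entities by ||h + r - t||, smallest (best) first."""
    scores = {c: np.linalg.norm(space[h] + space[r] - space[c]) for c in candidates}
    return sorted(scores, key=scores.get)

print(predict_tail("obama", "president_of", ["usa", "word:election"]))
```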
Improving energy efficiency and classification accuracy of neuromorphic chips by learning binary synaptic crossbars
Title | Improving energy efficiency and classification accuracy of neuromorphic chips by learning binary synaptic crossbars |
Authors | Antonio Jimeno Yepes, Jianbin Tang |
Abstract | Deep Neural Networks (DNN) have achieved human-level performance in many image analytics tasks, but DNNs are mostly deployed to GPU platforms that consume a considerable amount of power. Brain-inspired spiking neuromorphic chips consume low power and can be highly parallelized. However, to deploy DNNs on energy-efficient neuromorphic chips, the incompatibility between the continuous neurons and synaptic weights of traditional DNNs and the discrete spiking neurons and synapses of neuromorphic chips has to be overcome. Previous work achieved this by training a network to learn continuous probabilities, followed by deployment to a neuromorphic architecture by randomly sampling these probabilities. An ensemble of sampled networks is needed to approximate the performance of the trained network. In the work presented in this paper, we have extended previous research by directly learning binary synaptic crossbars. Results on MNIST show that better performance can be achieved with a small network in one time step (92.7% maximum observed accuracy vs 95.98% accuracy in our work). Top results on a larger network are similar to previously published results (99.42% maximum observed accuracy vs 99.45% accuracy in our work). More importantly, in our work a smaller ensemble is needed to achieve similar or better accuracy than previous work, which translates into significantly decreased energy consumption for both networks. Results of our work are stable since they do not require random sampling. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07740v1 |
http://arxiv.org/pdf/1605.07740v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-energy-efficiency-and |
Repo | |
Framework | |
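A minimal sketch of directly learning binary weights with a straight-through estimator: the forward pass uses the binarized crossbar, while gradients update the underlying real-valued weights. This is one common recipe, not necessarily the paper's exact training rule.

```python
import torch

w_real = torch.randn(4, 4, requires_grad=True)   # latent real-valued weights

def binarize_ste(w):
    """Forward: {0,1} crossbar weights. Backward: identity gradient."""
    b = (w > 0).float()
    return b + (w - w.detach())                  # straight-through trick

x = torch.randn(1, 4)
loss = (x @ binarize_ste(w_real)).pow(2).sum()   # toy objective
loss.backward()
print(w_real.grad is not None)                   # True: gradients reach w_real
```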
Learning and Fusing Multimodal Features from and for Multi-task Facial Computing
Title | Learning and Fusing Multimodal Features from and for Multi-task Facial Computing |
Authors | Wei Li, Zhigang Zhu |
Abstract | We propose a deep learning-based feature fusion approach for facial computing including face recognition as well as gender, race and age detection. Instead of training a single classifier on face images to classify them based on the features of the person whose face appears in the image, we first train four different classifiers for classifying face images based on race, age, gender and identification (ID). Multi-task features are then extracted from the trained models and cross-task-feature training is conducted, which shows the value of fusing multimodal features extracted from multiple tasks. We have found that features trained for one task can be used for other related tasks. More interestingly, features trained for a task with more classes (e.g. ID) and then used in another task with fewer classes (e.g. race) outperform the features trained for the other task itself. The final feature fusion is performed by combining the four types of features extracted from the images by the four classifiers. The feature fusion approach improves classification accuracy by margins of 7.2%, 20.1%, 22.2% and 21.8% for ID, age, race and gender recognition, respectively, over the results of single classifiers trained only on their individual features. The proposed method can be applied to applications in which different types of data or features can be extracted. |
Tasks | Face Recognition |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04322v1 |
http://arxiv.org/pdf/1610.04322v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-fusing-multimodal-features-from |
Repo | |
Framework | |
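A minimal sketch of the fusion step, assuming the "combining" is concatenation: features produced by the four task-specific networks are joined into one vector for the final classifier. The extractors and feature sizes are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these came from the ID, age, race and gender networks.
f_id, f_age, f_race, f_gender = (rng.standard_normal(128) for _ in range(4))

fused = np.concatenate([f_id, f_age, f_race, f_gender])  # 512-dim joint feature
print(fused.shape)                                       # (512,)
# `fused` would then feed a standard classifier for the target task.
```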
Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari
Title | Real Time Video Quality Representation Classification of Encrypted HTTP Adaptive Video Streaming - the Case of Safari |
Authors | Ran Dubin, Amit Dvir, Ofir Pele, Ofer Hadar, Itay Richman, Ofir Trabelsi |
Abstract | The increasing popularity of HTTP adaptive video streaming services has dramatically increased bandwidth requirements on operator networks, which attempt to shape their traffic through Deep Packet Inspection (DPI). However, Google and certain content providers have started to encrypt their video services. As a result, operators often encounter difficulties in shaping their encrypted video traffic via DPI. This highlights the need for new traffic classification methods for encrypted HTTP adaptive video streaming to enable smart traffic shaping. These new methods will have to effectively estimate the quality representation layer and playout buffer. We present a new method and show for the first time that video quality representation classification for (YouTube) encrypted HTTP adaptive streaming is possible. We analyze the performance of this classification method with Safari over HTTPS. Based on a large number of offline and online traffic classification experiments, we demonstrate that it can independently classify, in real time, every video segment into one of the quality representation layers with 97.18% average accuracy. |
Tasks | |
Published | 2016-02-01 |
URL | http://arxiv.org/abs/1602.00489v2 |
http://arxiv.org/pdf/1602.00489v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-video-quality-representation |
Repo | |
Framework | |
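A minimal sketch of the setup the abstract implies: since payloads are encrypted, only per-segment size and timing features survive, and a standard classifier maps them to a quality representation layer. Features, labels and the model choice are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy per-segment features: [total bytes, mean packet size, duration, peak rate]
X = rng.random((200, 4))
y = rng.integers(0, 3, 200)        # three hypothetical quality layers

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:5]))          # predicted quality layer per segment
```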