July 27, 2019

Paper Group ANR 538

Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition


Title	Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition
Authors	Vikramjit Mitra, Horacio Franco
Abstract	Unseen data conditions can inflict serious performance degradation on systems that rely on supervised machine learning algorithms. Because data conditions are often unseen, and because traditional machine learning algorithms are trained in a supervised manner, unsupervised adaptation techniques must be used to adapt the model to the unseen data conditions. However, unsupervised adaptation is often challenging, as one must generate a hypothesis from the model and then use that hypothesis to bootstrap the model to the unseen data conditions. Unfortunately, the reliability of such hypotheses is often poor, given the mismatch between the training and testing datasets. In such cases, a confidence measure for the model's hypotheses enables data selection for model adaptation. Underlying this approach is the fact that unseen data conditions introduce data variability into the model, which propagates it to its output decision, impacting decision reliability. In a fully connected network, this data variability is propagated as distortions from one layer to the next. This work aims to estimate the propagation of such distortion in the form of network activation entropy, which is measured over a short-time running window on the activation of each neuron of a given hidden layer; these measurements are then used to compute a summary entropy. This work demonstrates that such an entropy measure can help select data for unsupervised model adaptation, resulting in performance gains in speech recognition tasks. Results from standard benchmark speech recognition tasks show that the proposed approach can alleviate the performance degradation experienced under unseen data conditions by iteratively adapting the model to the unseen data's acoustic conditions.
Tasks	Speech Recognition
Published	2017-08-31
URL	http://arxiv.org/abs/1708.09516v1
PDF	http://arxiv.org/pdf/1708.09516v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-deep-neural-network-activation
Repo
Framework
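The activation-entropy measure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a histogram-based entropy estimate per neuron, a hypothetical running window of 20 frames with a hop of 10, a plain mean as the summary statistic, and, for the hypothetical select_for_adaptation helper, that a lower summary entropy marks a more reliable hypothesis.

```python
import numpy as np


def window_entropy(x, bins=10):
    """Shannon entropy (in nats) of a 1-D sample, estimated from a histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def summary_activation_entropy(acts, win=20, hop=10, bins=10):
    """Average per-neuron entropy over short-time running windows.

    acts: array of shape (frames, neurons) holding one hidden layer's activations.
    """
    frames, neurons = acts.shape
    per_window = []
    for start in range(0, frames - win + 1, hop):
        seg = acts[start:start + win]
        # Entropy of each neuron's activations inside this window, averaged over neurons.
        per_window.append(np.mean([window_entropy(seg[:, j], bins) for j in range(neurons)]))
    return float(np.mean(per_window))


def select_for_adaptation(utterance_acts, keep_fraction=0.5):
    """Hypothetical selection rule: keep the utterances with the lowest summary entropy."""
    scores = np.array([summary_activation_entropy(a) for a in utterance_acts])
    k = max(1, int(len(scores) * keep_fraction))
    return np.argsort(scores)[:k].tolist()
```

The selected utterances (with their decoded hypotheses) would then serve as adaptation data in the next iteration; the thresholding rule is a design choice left open here.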

Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC


Title	Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC
Authors	Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Jose Garcia-Rodriguez
Abstract	Plane model extraction from three-dimensional point clouds is a necessary step in many applications, such as planar object reconstruction, indoor mapping, and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel RANSAC-based method called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud, using knowledge extracted from a scene (or an object) in order to reconstruct it accurately. The method comprises two steps: first, it clusters the data into planar faces that preserve constraints defined by knowledge related to the object (e.g., the angles between faces); second, it estimates the plane models from these data using a novel multi-constraint RANSAC. Experiments on both the clustering and the RANSAC stages showed that the proposed method performs better than state-of-the-art methods.
Tasks	Object Reconstruction
Published	2017-08-03
URL	http://arxiv.org/abs/1708.01143v1
PDF	http://arxiv.org/pdf/1708.01143v1.pdf
PWC	https://paperswithcode.com/paper/three-dimensional-planar-model-estimation
Repo
Framework
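For readers unfamiliar with RANSAC, a plain single-plane version looks roughly like this. It is a generic sketch, not the paper's multi-constraint variant: the iteration count, the inlier threshold, and the angle_between helper (one hypothetical way a knowledge constraint such as the angle between two faces could be checked) are illustrative assumptions.

```python
import numpy as np


def plane_from_points(p):
    """Plane n.x + d = 0 through three points; None if they are (nearly) collinear."""
    n = np.cross(p[1] - p[0], p[2] - p[0])
    norm = np.linalg.norm(n)
    if norm < 1e-12:
        return None
    n = n / norm
    return n, -float(n @ p[0])


def ransac_plane(points, iters=200, thresh=0.01, seed=0):
    """Classic single-plane RANSAC: keep the hypothesis with the most inliers."""
    rng = np.random.default_rng(seed)
    best_model = None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        model = plane_from_points(sample)
        if model is None:
            continue
        n, d = model
        # Point-to-plane distance below the threshold counts as an inlier.
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def angle_between(n1, n2):
    """Angle in degrees between two plane normals, ignoring normal orientation."""
    c = abs(float(np.dot(n1, n2)))
    return float(np.degrees(np.arccos(min(1.0, c))))
```

A multi-plane, knowledge-driven variant would reject or rescore hypotheses whose pairwise angles violate the expected values; that extension is not shown.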

Efficient and Invariant Convolutional Neural Networks for Dense Prediction


Title	Efficient and Invariant Convolutional Neural Networks for Dense Prediction
Authors	Hongyang Gao, Shuiwang Ji
Abstract	Convolutional neural networks have shown great success in feature extraction from raw input data such as images. Although convolutional neural networks are invariant to translations of the inputs, they are not invariant to other transformations, including rotation and flip. Recent attempts have been made to incorporate more invariance in image recognition applications, but they are not applicable to dense prediction tasks, such as image segmentation. In this paper, we propose a set of methods based on kernel rotation and flip to enable rotation and flip invariance in convolutional neural networks. Kernel rotation can be applied to 3 $\times$ 3 kernels, while kernel flip can be applied to kernels of any size. By rotating kernels to eight or four angles, a convolutional layer can produce the corresponding number of feature maps from eight or four different kernels. Using flip, a convolutional layer can produce three feature maps. By combining the produced feature maps with maxout, the resource requirement can be significantly reduced while the invariance properties are retained. Experimental results demonstrate that the proposed methods can achieve various invariances with reasonable resource requirements in terms of both memory and time.
Tasks	Semantic Segmentation
Published	2017-11-24
URL	http://arxiv.org/abs/1711.09064v1
PDF	http://arxiv.org/pdf/1711.09064v1.pdf
PWC	https://paperswithcode.com/paper/efficient-and-invariant-convolutional-neural
Repo
Framework
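The kernel-rotation idea for 3 $\times$ 3 kernels can be sketched as below, under stated assumptions: a 45-degree rotation is approximated by cycling the eight border weights one position, the rotated responses are combined with an elementwise max (maxout), and a zero-padded cross-correlation stands in for a convolutional layer. Names such as rotation_maxout are hypothetical, not taken from the paper; kernel flip (k[:, ::-1], k[::-1, :]) could be handled the same way but is not shown.

```python
import numpy as np

# Border cells of a 3x3 kernel, listed clockwise starting at the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def rotate45(k, steps=1):
    """Rotate a 3x3 kernel clockwise by 45 degrees per step by cycling its border."""
    out = k.copy()
    vals = [k[r, c] for r, c in RING]
    for i, (r, c) in enumerate(RING):
        out[r, c] = vals[(i - steps) % 8]
    return out


def correlate_same(img, k):
    """3x3 cross-correlation with zero padding; output has the input's shape."""
    h, w = img.shape
    p = np.pad(img, 1)
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out


def rotation_maxout(img, k, n_angles=8):
    """Correlate with n_angles rotated copies of k, then maxout across the maps."""
    step = 8 // n_angles
    maps = [correlate_same(img, rotate45(k, s * step)) for s in range(n_angles)]
    return np.max(maps, axis=0)
```

With four angles (90-degree steps) the rotated kernel set is closed under 90-degree rotation, so rotating the input by 90 degrees simply rotates the maxout response; with eight angles the 45-degree case is only approximate on a pixel grid. Only one output map is kept per kernel, which is where the memory saving comes from.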

Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription


Title	Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription
Authors	Li Su
Abstract	This paper presents a new approach to understanding how deep neural networks (DNNs) work by applying homomorphic signal processing techniques. Focusing on the task of multi-pitch estimation (MPE), this paper demonstrates the equivalence between a generalized cepstrum and a DNN in terms of their structure and functionality. This equivalence, together with pitch perception theories and the recently established rectified-correlations-on-a-sphere (RECOS) filter analysis, provides an alternative way of explaining the role of the nonlinear activation function and the multi-layer structure, both of which exist in a cepstrum and in a DNN. To validate the efficacy of this new approach, a new feature designed in the same fashion is proposed for the pitch salience function. The new feature outperforms the one-layer spectrum in the MPE task and, as predicted, it addresses the missing fundamental effect and achieves better robustness to noise.
Tasks
Published	2017-06-26
URL	http://arxiv.org/abs/1706.08231v1
PDF	http://arxiv.org/pdf/1706.08231v1.pdf
PWC	https://paperswithcode.com/paper/between-homomorphic-signal-processing-and
Repo
Framework
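To make the cepstrum-as-network analogy concrete, a generalized cepstrum can be written as two linear transforms (DFT and inverse DFT) separated by elementwise nonlinearities. The sketch below assumes a power-law exponent gamma = 0.5, a ReLU-style rectification, and a simple single-pitch peak pick in a hypothetical 60 to 400 Hz range; it is not the paper's multi-pitch salience feature.

```python
import numpy as np


def generalized_cepstrum(x, gamma=0.5):
    """DFT -> power-law compression -> inverse DFT -> rectification."""
    spec = np.abs(np.fft.fft(x)) ** gamma   # linear transform + elementwise nonlinearity
    ceps = np.fft.ifft(spec).real           # second linear transform
    return np.maximum(ceps, 0.0)            # ReLU-like rectification


def estimate_f0(x, fs, fmin=60.0, fmax=400.0, gamma=0.5):
    """Pick the quefrency peak within [1/fmax, 1/fmin] seconds and return its pitch in Hz."""
    ceps = generalized_cepstrum(x, gamma)
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    q = qmin + int(np.argmax(ceps[qmin:qmax + 1]))
    return fs / q
```

For a harmonic signal whose fundamental is absent (for example, only harmonics 2 to 10 of 100 Hz sampled at 8 kHz), the quefrency peak still lands at 1/100 s, which illustrates the missing fundamental effect mentioned in the abstract.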

Mutual Kernel Matrix Completion


Title	Mutual Kernel Matrix Completion
Authors	Tsuyoshi Kato, Rachelle Rivero
Abstract	With the huge influx of data nowadays, extracting knowledge from it has become an interesting but tedious task for data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques have been introduced, especially with the advent of kernel methods. However, among the many data completion techniques available in the literature, mutually completing several incomplete kernel matrices has not yet received much attention. In this paper, we present a new method, the Mutual Kernel Matrix Completion (MKMC) algorithm, which tackles the problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, applied to biological data sets used for a classification task. We first introduce an objective function that is minimized using the EM algorithm, which in turn yields an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the completed kernel matrices are used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm outperformed traditional completion techniques in preserving the relationships among the data points and in accurately recovering the missing kernel matrix entries. Overall, MKMC offers a promising solution to the problem of mutually estimating a number of relevant incomplete kernel matrices.
Tasks	Matrix Completion
Published	2017-02-14
URL	http://arxiv.org/abs/1702.04077v3
PDF	http://arxiv.org/pdf/1702.04077v3.pdf
PWC	https://paperswithcode.com/paper/mutual-kernel-matrix-completion
Repo
Framework
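The paper's closed-form E- and M-steps are not reproduced here. As a loose, hypothetical sketch of the same overall loop (complete each kernel from a shared model matrix, then combine the completed kernels into a new model matrix), the code below fills each kernel's missing rows and columns by Gaussian conditioning on the model matrix and uses the plain average of the completed kernels as the next model. Both choices, and the assumption that each kernel's observed index list is known, are illustrative only.

```python
import numpy as np


def complete_kernel(K, obs, M):
    """Fill rows/cols of K outside `obs` by conditioning on model matrix M (assumption)."""
    n = K.shape[0]
    o = np.asarray(obs)
    m = np.setdiff1d(np.arange(n), o)
    Koo = K[np.ix_(o, o)]
    W = np.linalg.solve(M[np.ix_(o, o)], M[np.ix_(o, m)])  # M_oo^{-1} M_om
    out = np.empty_like(M, dtype=float)
    out[np.ix_(o, o)] = Koo                                 # observed block kept as-is
    out[np.ix_(o, m)] = Koo @ W
    out[np.ix_(m, o)] = (Koo @ W).T
    # Conditional covariance of the missing block plus the propagated observed part.
    out[np.ix_(m, m)] = M[np.ix_(m, m)] - M[np.ix_(m, o)] @ W + W.T @ Koo @ W
    return out


def mutual_completion(kernels, observed, n_iter=20):
    """Alternate: complete every kernel from M, then set M to their average."""
    n = kernels[0].shape[0]
    M = np.eye(n)
    completed = kernels
    for _ in range(n_iter):
        completed = [complete_kernel(K, o, M) for K, o in zip(kernels, observed)]
        M = np.mean(completed, axis=0)
    return completed, M
```

A useful sanity check of the conditioning step: if a kernel's observed block already agrees with the model matrix, the completion returns the model matrix unchanged.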

Multiple-Kernel Local-Patch Descriptor


Title	Multiple-Kernel Local-Patch Descriptor
Authors	Arun Mukundan, Giorgos Tolias, Ondrej Chum
Abstract	We propose a multiple-kernel local-patch descriptor based on efficient match kernels of patch gradients. It combines two parametrizations of gradient position and direction, each of which provides robustness to a different type of patch misregistration: the polar parametrization to noise in the detection of the patch's dominant orientation, and the Cartesian parametrization to imprecise localization of the feature point. Even though it is handcrafted, the proposed method consistently outperforms state-of-the-art methods on two local patch benchmarks.
Tasks
Published	2017-07-25
URL	http://arxiv.org/abs/1707.07825v1
PDF	http://arxiv.org/pdf/1707.07825v1.pdf
PWC	https://paperswithcode.com/paper/multiple-kernel-local-patch-descriptor
Repo
Framework

Light Field Video Capture Using a Learning-Based Hybrid Imaging System


Title	Light Field Video Capture Using a Learning-Based Hybrid Imaging System
Authors	Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei A. Efros, Ravi Ramamoorthi
Abstract	Light field cameras have many advantages over traditional cameras, as they allow the user to change various camera settings after capture. However, capturing light fields requires a huge bandwidth to record the data: a modern light field camera can take only three images per second. This prevents current consumer light field cameras from capturing light field videos. Temporal interpolation at such an extreme scale (10x, from 3 fps to 30 fps) is infeasible because too much information is entirely missing between adjacent frames. Instead, we develop a hybrid imaging system, adding a standard video camera to capture the temporal information. Given a 3 fps light field sequence and a standard 30 fps 2D video, our system can then generate a full light field video at 30 fps. We adopt a learning-based approach, which can be decomposed into two steps: spatio-temporal flow estimation and appearance estimation. The flow estimation propagates the angular information from the light field sequence to the 2D video, so that input images can be warped to the target view. The appearance estimation then combines these warped images to output the final pixels. The whole process is trained end-to-end using convolutional neural networks. Experimental results demonstrate that our algorithm outperforms current video interpolation methods, enabling consumer light field videography and making applications such as refocusing and parallax view generation achievable on videos for the first time.
Tasks
Published	2017-05-08
URL	http://arxiv.org/abs/1705.02997v1
PDF	http://arxiv.org/pdf/1705.02997v1.pdf
PWC	https://paperswithcode.com/paper/light-field-video-capture-using-a-learning
Repo
Framework
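The warping step mentioned above (moving pixels to the target view with an estimated flow) can be illustrated with a generic bilinear backward warp. This is a sketch for a single-channel image under the assumption that flow[..., 0] and flow[..., 1] hold per-pixel x and y displacements in pixels, with out-of-range samples clamped to the border; the paper estimates the flow and combines the warped images with CNNs, neither of which is shown.

```python
import numpy as np


def backward_warp(img, flow):
    """Sample img at (x + flow_x, y + flow_y) for every output pixel, bilinearly."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sx = np.clip(xs + flow[..., 0], 0, w - 1)   # source x, clamped to the image
    sy = np.clip(ys + flow[..., 1], 0, h - 1)   # source y, clamped to the image
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Backward warping (pulling each output pixel from a source location) avoids the holes that forward splatting leaves, which is why it is the usual choice when a dense flow toward the source image is available.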

A short variational proof of equivalence between policy gradients and soft Q learning


Title	A short variational proof of equivalence between policy gradients and soft Q learning
Authors	Pierre H. Richemond, Brendan Maginnis
Abstract	Two main families of reinforcement learning algorithms, Q-learning and policy gradients, have recently been proven to be equivalent when using a softmax relaxation on one part, and an entropic regularization on the other. We relate this result to the well-known convex duality of Shannon entropy and the softmax function. Such a result is also known as the Donsker-Varadhan formula. This provides a short proof of the equivalence. We then interpret this duality further, and use ideas of convex analysis to prove a new policy inequality relative to soft Q-learning.
Tasks	Q-Learning
Published	2017-12-22
URL	http://arxiv.org/abs/1712.08650v1
PDF	http://arxiv.org/pdf/1712.08650v1.pdf
PWC	https://paperswithcode.com/paper/a-short-variational-proof-of-equivalence
Repo
Framework
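The convex duality the abstract refers to can be checked numerically. For a finite action set, $\tau \log \sum_a \exp(Q(a)/\tau) = \max_{\pi} \{ \mathbb{E}_{a \sim \pi}[Q(a)] + \tau H(\pi) \}$, with the maximum attained at $\pi = \mathrm{softmax}(Q/\tau)$. The sketch below evaluates both sides for an arbitrary example Q vector and temperature; it illustrates the identity only, not the paper's proof or its new policy inequality.

```python
import numpy as np


def soft_value(q, tau):
    """tau * log(sum(exp(q / tau))), computed stably by factoring out the max."""
    z = q / tau
    m = z.max()
    return float(tau * (m + np.log(np.exp(z - m).sum())))


def entropy_regularized_objective(pi, q, tau):
    """E_pi[q] + tau * H(pi), with the convention 0 * log 0 = 0."""
    pi = np.asarray(pi, dtype=float)
    nz = pi > 0
    return float(pi @ q - tau * np.sum(pi[nz] * np.log(pi[nz])))


def softmax(q, tau):
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()


q = np.array([1.0, 2.0, 0.5])
tau = 0.3
# Both sides agree at the Boltzmann (softmax) policy.
print(soft_value(q, tau), entropy_regularized_objective(softmax(q, tau), q, tau))
```

Any other distribution gives a smaller value of the entropy-regularized objective, which is exactly what makes the soft value the convex conjugate of the negative entropy.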

Robust Monocular SLAM for Egocentric Videos


Title	Robust Monocular SLAM for Egocentric Videos
Authors	Suvam Patra, Kartikeya Gupta, Faran Ahmad, Chetan Arora, Subhashis Banerjee
Abstract	Despite tremendous progress, a truly general-purpose pipeline for Simultaneous Localization and Mapping (SLAM) remains a challenge. We investigate the reported failure of state-of-the-art (SOTA) SLAM techniques on egocentric videos. We find that dominant 3D rotations, low parallax between successive frames, and primarily forward motion in egocentric videos are the most common causes of failure. The incremental nature of SOTA SLAM, in the presence of unreliable pose and 3D estimates in egocentric videos and with no opportunities for global loop closures, generates drift and eventually leads to the failure of such techniques. Taking inspiration from batch-mode Structure from Motion (SFM) techniques, we propose to solve SLAM as an SFM problem over sliding temporal windows. This makes the problem well constrained. Further, we propose to initialize the camera poses using 2D rotation averaging, followed by translation averaging, before structure estimation using bundle adjustment. This helps stabilize the camera poses when 3D estimates are not reliable. We show that the proposed SLAM technique, incorporating these two key ideas, works successfully on long, shaky egocentric videos where other SOTA techniques have been reported to fail. Qualitative and quantitative comparisons on publicly available egocentric video datasets validate our results.
Tasks	Simultaneous Localization and Mapping
Published	2017-07-18
URL	http://arxiv.org/abs/1707.05564v2
PDF	http://arxiv.org/pdf/1707.05564v2.pdf
PWC	https://paperswithcode.com/paper/robust-monocular-slam-for-egocentric-videos
Repo
Framework
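Rotation averaging recovers absolute camera rotations from noisy pairwise relative rotations. The abstract does not spell out the exact averaging scheme, so the following swaps in a generic spectral relaxation: stack the relative rotations into a symmetric block matrix, take its top three eigenvectors, and project each 3 $\times$ 3 block onto SO(3). It assumes the convention R_ij ≈ R_i R_j^T and a connected view graph, and recovers rotations only up to one global rotation; translation averaging and bundle adjustment are not shown.

```python
import numpy as np


def project_to_so3(a):
    """Nearest rotation matrix (in Frobenius norm) via SVD."""
    u, _, vt = np.linalg.svd(a)
    d = np.diag([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])
    return u @ d @ vt


def spectral_rotation_averaging(n, rel):
    """Absolute rotations (up to a global rotation) from relative ones.

    rel maps (i, j) -> R_ij, assumed to satisfy R_ij ~= R_i @ R_j.T.
    """
    g = np.zeros((3 * n, 3 * n))
    for i in range(n):
        g[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.eye(3)
    for (i, j), r_ij in rel.items():
        g[3 * i:3 * i + 3, 3 * j:3 * j + 3] = r_ij
        g[3 * j:3 * j + 3, 3 * i:3 * i + 3] = r_ij.T
    _, vecs = np.linalg.eigh(g)                 # eigenvalues in ascending order
    x = vecs[:, -3:]                            # eigenvectors of the 3 largest eigenvalues
    blocks = [x[3 * i:3 * i + 3] for i in range(n)]
    if sum(np.linalg.det(b) for b in blocks) < 0:
        blocks = [-b for b in blocks]           # remove a global reflection
    return [project_to_so3(b) for b in blocks]
```

Because every camera is estimated jointly rather than chained frame to frame, errors in one relative rotation are spread over the whole window instead of accumulating as drift.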

Neural Personalized Response Generation as Domain Adaptation


Title	Neural Personalized Response Generation as Domain Adaptation
Authors	Weinan Zhang, Ting Liu, Yifa Wang, Qingfu Zhu
Abstract	In this paper, we focus on personalized response generation for conversational systems. Based on sequence-to-sequence learning, especially the encoder-decoder framework, we propose a two-phase approach, namely initialization then adaptation, to model the responding style of humans and then generate personalized responses. For evaluation, we propose a novel human-aided method to evaluate the performance of personalized response generation models through online real-time conversation and offline human judgement. Moreover, the lexical divergence of the responses generated by the 5 personalized models indicates that the proposed two-phase approach achieves good results in modeling the responding style of humans and generating personalized responses for conversational systems.
Tasks	Domain Adaptation
Published	2017-01-09
URL	https://arxiv.org/abs/1701.02073v2
PDF	https://arxiv.org/pdf/1701.02073v2.pdf
PWC	https://paperswithcode.com/paper/neural-personalized-response-generation-as
Repo
Framework

Magnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule Robot


Title	Magnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule Robot
Authors	Mehmet Turan, Yasin Almalioglu, Hunter Gilbert, Helder Araujo, Ender Konukoglu, Metin Sitti
Abstract	A reliable, real-time simultaneous localization and mapping (SLAM) method is crucial for the navigation of actively controlled capsule endoscopy robots. These robots are an emerging, minimally invasive diagnostic and therapeutic technology for use in the gastrointestinal (GI) tract. In this study, we propose a dense, non-rigidly deformable, and real-time map fusion approach for actively controlled endoscopic capsule robot applications. The method combines magnetic and vision-based localization, and makes use of frame-to-model fusion and model-to-model loop closure. The performance of the method is demonstrated using an ex-vivo porcine stomach model. Across four trajectories of varying speed and complexity, and across three cameras, the root mean square localization errors range from 0.42 to 1.92 cm, and the root mean square surface reconstruction errors range from 1.23 to 2.39 cm.
Tasks	Sensor Fusion, Simultaneous Localization and Mapping
Published	2017-05-17
URL	http://arxiv.org/abs/1705.06196v2
PDF	http://arxiv.org/pdf/1705.06196v2.pdf
PWC	https://paperswithcode.com/paper/magnetic-visual-sensor-fusion-based-medical
Repo
Framework
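As a reminder of what the reported figures measure, a root mean square error over positions can be computed as below. This sketch assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame and units (here, cm); the paper's exact evaluation protocol, including any trajectory alignment, is not stated in the abstract.

```python
import numpy as np


def rmse(estimated, ground_truth):
    """Root mean square of per-point Euclidean errors, in the inputs' units."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))


print(rmse([[3.0, 4.0, 0.0]], [[0.0, 0.0, 0.0]]))  # a single 3-4-5 error: prints 5.0
```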

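Reti bayesiane per lo studio del fenomeno degli incidenti stradali tra i giovani in Toscana (Bayesian networks for studying road accidents among young people in Tuscany)


Title	Reti bayesiane per lo studio del fenomeno degli incidenti stradali tra i giovani in Toscana (Bayesian networks for studying road accidents among young people in Tuscany)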
Authors	Filippo Elba, Lisa Gnaulati, Fabio Voeller
Abstract	This paper aims to analyse adolescents’ road accidents in Tuscany. The analysis is based on the Edit database of the Osservatorio di Epidemiologia della Toscana. The complexity and heterogeneity of the Edit data represent an interesting opportunity to apply Machine Learning methods. In particular, this paper proposes an analysis based on a Bayesian probabilistic network, used to discover relationships between adolescents’ characteristics and behaviours that are more often associated with an audacious driving style. The probabilistic network developed in this study can be considered a useful starting point for follow-up research aimed at developing a causal network, a tool to help limit this phenomenon.
Tasks
Published	2017-10-19
URL	http://arxiv.org/abs/1710.07066v1
PDF	http://arxiv.org/pdf/1710.07066v1.pdf
PWC	https://paperswithcode.com/paper/reti-bayesiane-per-lo-studio-del-fenomeno
Repo
Framework

Recovering 3D Planar Arrangements from Videos


Title	Recovering 3D Planar Arrangements from Videos
Authors	Shuai Du, Youyi Zheng
Abstract	Acquiring the 3D geometry of real-world objects has various applications in 3D digitization, such as navigation and content generation in virtual environments. Images remain one of the most popular media for such visual tasks due to their simplicity of acquisition. Traditional image-based 3D reconstruction approaches heavily exploit point-to-point correspondence among multiple images to estimate camera motion and 3D geometry. Establishing point-to-point correspondence lies at the center of the 3D reconstruction pipeline, yet it is prone to errors. In this paper, we propose an optimization framework that traces image points using a novel structure-guided dynamic tracking algorithm and estimates both the camera motion and a 3D structure model by enforcing a set of planar constraints. The key to our method is a structure model represented as a set of planes and their arrangements. Constraints derived from the structure model are used in both the correspondence establishment stage and the bundle adjustment stage of our reconstruction pipeline. Experiments show that our algorithm can effectively localize structure correspondence across dense image frames while faithfully reconstructing the camera motion and the underlying structured 3D model.
Tasks	3D Reconstruction
Published	2017-01-25
URL	http://arxiv.org/abs/1701.07393v1
PDF	http://arxiv.org/pdf/1701.07393v1.pdf
PWC	https://paperswithcode.com/paper/recovering-3d-planar-arrangements-from-videos
Repo
Framework

Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding


Title	Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding
Authors	Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, Michael Gertz
Abstract	Many word clouds provide no semantics in the word placement, but use a random layout optimized solely for aesthetic purposes. We propose a novel approach to model word significance and word affinity within a document, in comparison to a large background corpus. We demonstrate its usefulness for generating more meaningful word clouds as a visual summary of a given document. We select keywords based on their significance and construct the word cloud based on the derived affinity. Based on a modified t-distributed stochastic neighbor embedding (t-SNE), we generate a semantic word placement. For words that cooccur significantly, we include edges, and we cluster the words according to their cooccurrence. For this, we designed a scalable and memory-efficient sketch-based approach, usable on commodity hardware, to aggregate the corpus statistics needed for normalization and for identifying keywords as well as significant cooccurrences. We empirically validate our approach using a large Wikipedia corpus.
Tasks
Published	2017-08-11
URL	http://arxiv.org/abs/1708.03569v1
PDF	http://arxiv.org/pdf/1708.03569v1.pdf
PWC	https://paperswithcode.com/paper/semantic-word-clouds-with-background-corpus
Repo
Framework
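The abstract does not name the sketch data structure used for the corpus statistics. A count-min sketch is one common choice for approximate word and cooccurrence counts in bounded memory, shown here as a swapped-in illustration; the width, depth, the salted BLAKE2b hashing, and the tab-joined sorted pair key used by add_pair are all assumptions.

```python
import hashlib

import numpy as np


class CountMinSketch:
    """Approximate counter: estimates never fall below the true count."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _cells(self, key):
        # One hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                     salt=row.to_bytes(8, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row, col] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the smallest one is the tightest bound.
        return int(min(self.table[row, col] for row, col in self._cells(key)))

    def add_pair(self, w1, w2, count=1):
        """Count an unordered cooccurrence under a sorted, tab-joined pair key."""
        self.add("\t".join(sorted((w1, w2))), count)

    def estimate_pair(self, w1, w2):
        return self.estimate("\t".join(sorted((w1, w2))))
```

Memory stays fixed at width × depth counters regardless of vocabulary size, which is what makes this family of sketches usable on commodity hardware for a Wikipedia-scale background corpus.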

Ethical Artificial Intelligence - An Open Question


Title	Ethical Artificial Intelligence - An Open Question
Authors	Alice Pavaloiu, Utku Kose
Abstract	Artificial Intelligence (AI) is an effective science that employs strong approaches, methods, and techniques to solve otherwise unsolvable real-world problems. Because of its unstoppable rise, there are also discussions about its ethics and safety. Shaping an AI-friendly environment for people and a people-friendly environment for AI can be a possible answer for finding a shared context of values for both humans and robots. In this context, the objective of this paper is to address the ethical issues of AI and explore the moral dilemmas that arise from ethical algorithms, from preset or acquired values. In addition, the paper also focuses on the subject of AI safety. In general, the paper briefly analyzes the concerns and potential solutions to the ethical issues presented and aims to increase readers' awareness of AI safety as another related research interest.
Tasks
Published	2017-05-16
URL	http://arxiv.org/abs/1706.03021v1
PDF	http://arxiv.org/pdf/1706.03021v1.pdf
PWC	https://paperswithcode.com/paper/ethical-artificial-intelligence-an-open
Repo
Framework