Paper Group ANR 538
Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition
Title | Leveraging Deep Neural Network Activation Entropy to cope with Unseen Data in Speech Recognition |
Authors | Vikramjit Mitra, Horacio Franco |
Abstract | Unseen data conditions can inflict serious performance degradation on systems relying on supervised machine learning algorithms. Because data can often be unseen, and because traditional machine learning algorithms are trained in a supervised manner, unsupervised adaptation techniques must be used to adapt the model to the unseen data conditions. However, unsupervised adaptation is often challenging, as one must generate some hypothesis given a model and then use that hypothesis to bootstrap the model to the unseen data conditions. Unfortunately, the reliability of such hypotheses is often poor, given the mismatch between the training and testing datasets. In such cases, a model hypothesis confidence measure enables performing data selection for the model adaptation. Underlying this approach is the fact that for unseen data conditions, data variability is introduced to the model, which the model propagates to its output decision, impacting decision reliability. In a fully connected network, this data variability is propagated as distortions from one layer to the next. This work aims to estimate the propagation of such distortion in the form of network activation entropy, which is measured over a short-time running window on the activation from each neuron of a given hidden layer; these measurements are then used to compute a summary entropy. This work demonstrates that such an entropy measure can help to select data for unsupervised model adaptation, resulting in performance gains in speech recognition tasks. Results from standard benchmark speech recognition tasks show that the proposed approach can alleviate the performance degradation experienced under unseen data conditions by iteratively adapting the model to the unseen data's acoustic condition. |
Tasks | Speech Recognition |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09516v1 |
http://arxiv.org/pdf/1708.09516v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-deep-neural-network-activation |
Repo | |
Framework | |
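The abstract above hinges on an entropy computed over a short-time running window on each hidden neuron's activation, which is then reduced to a summary value used to rank utterances for unsupervised adaptation. A minimal numpy sketch of that computation follows; the window length, bin count, and histogram-based entropy estimator are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def windowed_activation_entropy(activations, win=51, bins=20):
    """Entropy of each neuron's activation over a short-time running window.

    activations: (n_frames, n_neurons) array of hidden-layer outputs.
    Returns an (n_windows, n_neurons) array of per-window entropies.
    Window length and bin count are illustrative, not the paper's settings.
    """
    n_frames, n_neurons = activations.shape
    entropies = []
    for start in range(n_frames - win + 1):
        chunk = activations[start:start + win]            # (win, n_neurons)
        ent = np.empty(n_neurons)
        for j in range(n_neurons):
            hist, _ = np.histogram(chunk[:, j], bins=bins)
            p = hist / hist.sum()
            p = p[p > 0]
            ent[j] = -np.sum(p * np.log(p))               # Shannon entropy (nats)
        entropies.append(ent)
    return np.asarray(entropies)

def summary_entropy(activations, win=51, bins=20):
    """Scalar summary (here: mean over windows and neurons) used to rank
    utterances -- lower values suggest a less distorted forward pass, i.e.
    data better suited for bootstrapping unsupervised adaptation."""
    return float(windowed_activation_entropy(activations, win, bins).mean())

# A higher summary entropy would flag an utterance as acoustically mismatched.
acts = np.random.rand(300, 128)                           # placeholder activations
print(summary_entropy(acts))
```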
Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC
Title | Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC |
Authors | Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Jose Garcia-Rodriguez |
Abstract | Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel RANSAC-based method called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud using the knowledge extracted from a scene (or an object) in order to reconstruct it accurately. This method comprises two steps: first, it clusters the data into planar faces that preserve some constraints defined by knowledge related to the object (e.g., the angles between faces); and second, the models of the planes are estimated based on these data using a novel multi-constraint RANSAC. We performed experiments in the clustering and RANSAC stages, which showed that the proposed method performed better than state-of-the-art methods. |
Tasks | Object Reconstruction |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.01143v1 |
http://arxiv.org/pdf/1708.01143v1.pdf | |
PWC | https://paperswithcode.com/paper/three-dimensional-planar-model-estimation |
Repo | |
Framework | |
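The two-step pipeline described above (knowledge-constrained clustering, then multi-constraint RANSAC) can be roughed out as follows. This is a plain k-means plus vanilla per-cluster RANSAC sketch; the paper's multi-constraint step (e.g., enforcing known angles between faces) is only indicated by a comment, and all thresholds are placeholders.

```python
import numpy as np

def fit_plane(pts):
    """Least-squares plane through pts (n, 3): returns (unit normal n, d) with n.x + d = 0."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    return n, -float(n @ centroid)

def ransac_plane(pts, iters=200, thresh=0.01, rng=np.random.default_rng(0)):
    """Vanilla RANSAC plane fit; the paper's multi-constraint variant would also
    reject hypotheses that violate known inter-face angles."""
    best = None
    for _ in range(iters):
        n, d = fit_plane(pts[rng.choice(len(pts), 3, replace=False)])
        inliers = np.abs(pts @ n + d) < thresh
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return fit_plane(pts[best]), best

def kmeans(X, k, iters=50, rng=np.random.default_rng(1)):
    """Minimal k-means used to pre-cluster points into candidate planar faces."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Cluster a noisy cloud into k candidate faces, then fit one plane per cluster.
cloud = np.random.rand(1000, 3)
labels = kmeans(cloud, k=4)
planes = [ransac_plane(cloud[labels == j])[0] for j in range(4)]
```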
Efficient and Invariant Convolutional Neural Networks for Dense Prediction
Title | Efficient and Invariant Convolutional Neural Networks for Dense Prediction |
Authors | Hongyang Gao, Shuiwang Ji |
Abstract | Convolutional neural networks have shown great success on feature extraction from raw input data such as images. Although convolutional neural networks are invariant to translations of the inputs, they are not invariant to other transformations, including rotation and flip. Recent attempts have been made to incorporate more invariance in image recognition applications, but they are not applicable to dense prediction tasks, such as image segmentation. In this paper, we propose a set of methods based on kernel rotation and flip to enable rotation and flip invariance in convolutional neural networks. The kernel rotation can be applied to kernels of size 3 $\times$ 3, while kernel flip can be applied to kernels of any size. By rotating kernels to eight or four angles, a convolutional layer can produce the corresponding number of feature maps based on eight or four different kernels. By using flips, a convolutional layer can produce three feature maps. By combining the produced feature maps using maxout, the resource requirement can be significantly reduced while still retaining the invariance properties. Experimental results demonstrate that the proposed methods can achieve various invariances at reasonable resource requirements in terms of both memory and time. |
Tasks | Semantic Segmentation |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09064v1 |
http://arxiv.org/pdf/1711.09064v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-and-invariant-convolutional-neural |
Repo | |
Framework | |
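A small numpy sketch of the mechanism in the abstract: convolve with rotated (or flipped) copies of one kernel and reduce the resulting maps with maxout. Only 90-degree rotations are shown (the paper also uses an eight-angle variant for 3 $\times$ 3 kernels), and the naive valid-mode correlation is purely illustrative.

```python
import numpy as np

def conv2d(img, kernel):
    """Naive 'valid' 2D cross-correlation, enough for illustration."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def rotation_invariant_map(img, kernel, n_angles=4):
    """Convolve with the kernel rotated in 90-degree steps and take a maxout
    over the resulting maps; the maxout keeps the output the size of a single
    feature map, which is where the resource saving comes from."""
    maps = [conv2d(img, np.rot90(kernel, k)) for k in range(n_angles)]
    return np.max(np.stack(maps), axis=0)

def flip_invariant_map(img, kernel):
    """Original, left-right flipped, and up-down flipped kernels (three maps,
    as in the abstract), reduced with maxout."""
    maps = [conv2d(img, kernel),
            conv2d(img, np.fliplr(kernel)),
            conv2d(img, np.flipud(kernel))]
    return np.max(np.stack(maps), axis=0)

img, k3 = np.random.rand(8, 8), np.random.rand(3, 3)
print(rotation_invariant_map(img, k3).shape)   # (6, 6)
print(flip_invariant_map(img, k3).shape)       # (6, 6)
```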
Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription
Title | Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription |
Authors | Li Su |
Abstract | This paper presents a new approach to understanding how deep neural networks (DNNs) work by applying homomorphic signal processing techniques. Focusing on the task of multi-pitch estimation (MPE), this paper demonstrates the equivalence relation between a generalized cepstrum and a DNN in terms of their structures and functionality. Such an equivalence relation, together with pitch perception theories and the recently established rectified-correlations-on-a-sphere (RECOS) filter analysis, provides an alternative way of explaining the role of the nonlinear activation function and the multi-layer structure, both of which exist in a cepstrum and a DNN. To validate the efficacy of this new approach, a new feature designed in the same fashion is proposed as a pitch salience function. The new feature outperforms the one-layer spectrum in the MPE task and, as predicted, it addresses the issue of the missing-fundamental effect and also achieves better robustness to noise. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08231v1 |
http://arxiv.org/pdf/1706.08231v1.pdf | |
PWC | https://paperswithcode.com/paper/between-homomorphic-signal-processing-and |
Repo | |
Framework | |
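The structural analogy drawn in the abstract — a generalized cepstrum as a cascade of (linear transform, nonlinearity) stages, like DNN layers — can be illustrated with a toy pitch-salience computation. The power-law exponent, FFT size, and rectification choices below are assumptions rather than the paper's exact design.

```python
import numpy as np

def generalized_cepstrum(signal, n_fft=2048, gamma=0.1):
    """Two-'layer' cascade: |FFT| -> power-law nonlinearity -> inverse FFT ->
    half-wave rectification.  Each (transform, activation) pair mirrors a DNN
    layer; gamma is the generalized-log exponent (gamma -> 0 approaches the
    ordinary log cepstrum)."""
    spec = np.abs(np.fft.rfft(signal, n_fft))
    layer1 = np.maximum(spec, 1e-12) ** gamma     # nonlinearity on the spectrum
    ceps = np.fft.irfft(layer1, n_fft)            # second linear transform
    return np.maximum(ceps, 0.0)                  # ReLU-like rectification

# A 220 Hz tone: in a band around the expected lag, the rectified generalized
# cepstrum should peak at roughly sr / 220 samples (the fundamental period).
sr = 16000
t = np.arange(sr) / sr
salience = generalized_cepstrum(np.sin(2 * np.pi * 220 * t))
band = salience[40:110]
print(int(np.argmax(band)) + 40, "samples; expected ≈", round(sr / 220, 1))
```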
Mutual Kernel Matrix Completion
Title | Mutual Kernel Matrix Completion |
Authors | Tsuyoshi Kato, Rachelle Rivero |
Abstract | With the huge influx of various data nowadays, extracting knowledge from them has become an interesting but tedious task among data scientists, particularly when the data come in heterogeneous form and have missing information. Many data completion techniques have been introduced, especially with the advent of kernel methods. However, among the many data completion techniques available in the literature, studies about mutually completing several incomplete kernel matrices have not been given much attention yet. In this paper, we present a new method, called the Mutual Kernel Matrix Completion (MKMC) algorithm, that tackles this problem of mutually inferring the missing entries of multiple kernel matrices by combining the notions of data fusion and kernel matrix completion, applied to biological data sets used for classification tasks. We first introduce an objective function that is minimized by exploiting the EM algorithm, which in turn yields an estimate of the missing entries of the kernel matrices involved. The completed kernel matrices are then combined to produce a model matrix that can be used to further improve the obtained estimates. An interesting result of our study is that the E-step and the M-step are given in closed form, which makes our algorithm efficient in terms of time and memory. After completion, the (completed) kernel matrices are then used to train an SVM classifier to test how well the relationships among the entries are preserved. Our empirical results show that the proposed algorithm outperformed traditional completion techniques in preserving the relationships among the data points, and in accurately recovering the missing kernel matrix entries. Overall, MKMC offers a promising solution to the problem of mutual estimation of a number of relevant incomplete kernel matrices. |
Tasks | Matrix Completion |
Published | 2017-02-14 |
URL | http://arxiv.org/abs/1702.04077v3 |
http://arxiv.org/pdf/1702.04077v3.pdf | |
PWC | https://paperswithcode.com/paper/mutual-kernel-matrix-completion |
Repo | |
Framework | |
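A hedged sketch of the mutual-completion loop described above, assuming the E-step fills each kernel's missing block with the standard zero-mean Gaussian conditioning formula given the current model matrix, and the M-step averages the completed matrices; the paper's exact closed-form updates may differ.

```python
import numpy as np

def mkmc_sketch(kernels, observed_masks, n_iters=20):
    """Hedged sketch of mutual kernel matrix completion.

    kernels: list of (n, n) arrays; rows/columns flagged False in
             observed_masks[i] (boolean, length n) are treated as missing.
    E-step (assumed form): fill each matrix's missing block from the current
             model matrix with the zero-mean Gaussian conditioning formula.
    M-step (assumed form): set the model matrix to the mean of completions.
    """
    completed = [np.array(K, dtype=float) for K in kernels]
    for K, mask in zip(completed, observed_masks):
        K[~mask, :] = 0.0
        K[:, ~mask] = 0.0
        K[np.ix_(~mask, ~mask)] = np.eye(int((~mask).sum()))  # crude init
    for _ in range(n_iters):
        M = np.mean(completed, axis=0)                         # M-step
        for K, mask in zip(completed, observed_masks):         # E-step
            v, h = np.where(mask)[0], np.where(~mask)[0]
            if len(h) == 0:
                continue
            Mvv_inv = np.linalg.inv(M[np.ix_(v, v)] + 1e-8 * np.eye(len(v)))
            B = Mvv_inv @ M[np.ix_(v, h)]                      # regression coeffs
            Kvv = K[np.ix_(v, v)]
            Kvh = Kvv @ B
            K[np.ix_(v, h)] = Kvh
            K[np.ix_(h, v)] = Kvh.T
            K[np.ix_(h, h)] = M[np.ix_(h, h)] - M[np.ix_(h, v)] @ B + B.T @ Kvv @ B
    return completed, np.mean(completed, axis=0)               # completions, model
```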
Multiple-Kernel Local-Patch Descriptor
Title | Multiple-Kernel Local-Patch Descriptor |
Authors | Arun Mukundan, Giorgos Tolias, Ondrej Chum |
Abstract | We propose a multiple-kernel local-patch descriptor based on efficient match kernels of patch gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: the polar parametrization to noise in the patch's dominant orientation detection, and the Cartesian parametrization to imprecise location of the feature point. Even though handcrafted, the proposed method consistently outperforms the state-of-the-art methods on two local patch benchmarks. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07825v1 |
http://arxiv.org/pdf/1707.07825v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-kernel-local-patch-descriptor |
Repo | |
Framework | |
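To make the two-parametrization idea concrete, here is a much-simplified sketch that aggregates gradient orientations over both Cartesian and polar position bins and concatenates the results. The real descriptor uses smooth match-kernel embeddings rather than hard histograms, and the bin counts are arbitrary.

```python
import numpy as np

def patch_descriptor(patch, pos_bins=4, ang_bins=8):
    """Hard-assignment stand-in for the descriptor: gradient-orientation
    histograms over Cartesian AND polar position bins, concatenated and
    L2-normalised."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # gradient direction
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Cartesian parametrization of position
    cx = xs * pos_bins // w
    cy = ys * pos_bins // h
    # Polar parametrization (robust to errors in the dominant orientation)
    yc, xc = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    r = np.hypot(xc, yc)
    rbin = np.minimum((r / (r.max() + 1e-9) * pos_bins).astype(int), pos_bins - 1)
    tbin = (np.mod(np.arctan2(yc, xc), 2 * np.pi) / (2 * np.pi) * pos_bins).astype(int)
    tbin = np.minimum(tbin, pos_bins - 1)
    abin = np.minimum((ang / (2 * np.pi) * ang_bins).astype(int), ang_bins - 1)

    cart = np.zeros((pos_bins, pos_bins, ang_bins))
    polar = np.zeros((pos_bins, pos_bins, ang_bins))
    np.add.at(cart, (cy, cx, abin), mag)                 # magnitude-weighted votes
    np.add.at(polar, (rbin, tbin, abin), mag)
    desc = np.concatenate([cart.ravel(), polar.ravel()])
    return desc / (np.linalg.norm(desc) + 1e-12)

print(patch_descriptor(np.random.rand(32, 32)).shape)    # (256,)
```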
Light Field Video Capture Using a Learning-Based Hybrid Imaging System
Title | Light Field Video Capture Using a Learning-Based Hybrid Imaging System |
Authors | Ting-Chun Wang, Jun-Yan Zhu, Nima Khademi Kalantari, Alexei A. Efros, Ravi Ramamoorthi |
Abstract | Light field cameras have many advantages over traditional cameras, as they allow the user to change various camera settings after capture. However, capturing light fields requires a huge bandwidth to record the data: a modern light field camera can only take three images per second. This prevents current consumer light field cameras from capturing light field videos. Temporal interpolation at such extreme scale (10x, from 3 fps to 30 fps) is infeasible as too much information will be entirely missing between adjacent frames. Instead, we develop a hybrid imaging system, adding another standard video camera to capture the temporal information. Given a 3 fps light field sequence and a standard 30 fps 2D video, our system can then generate a full light field video at 30 fps. We adopt a learning-based approach, which can be decomposed into two steps: spatio-temporal flow estimation and appearance estimation. The flow estimation propagates the angular information from the light field sequence to the 2D video, so we can warp input images to the target view. The appearance estimation then combines these warped images to output the final pixels. The whole process is trained end-to-end using convolutional neural networks. Experimental results demonstrate that our algorithm outperforms current video interpolation methods, enabling consumer light field videography, and making applications such as refocusing and parallax view generation achievable on videos for the first time. |
Tasks | |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02997v1 |
http://arxiv.org/pdf/1705.02997v1.pdf | |
PWC | https://paperswithcode.com/paper/light-field-video-capture-using-a-learning |
Repo | |
Framework | |
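The two-step decomposition in the abstract — warp the 2D video toward each target angular view using estimated flow, then combine appearances — can be caricatured without any learning as follows; the shapes, nearest-neighbour warping, and the plain averaging stand-in for the appearance CNN are all assumptions.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp an image with a dense flow field (nearest-neighbour
    sampling for brevity; the actual system uses learned, sub-pixel warping)."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def naive_appearance_estimate(warped_views):
    """Stand-in for the appearance CNN: average the warped candidates.
    The real system learns the per-pixel combination end-to-end."""
    return np.mean(np.stack(warped_views), axis=0)

# Hypothetical shapes: one 2D video frame warped toward two angular views.
frame = np.random.rand(64, 64, 3)
flows = [np.random.randn(64, 64, 2), np.random.randn(64, 64, 2)]
novel_view = naive_appearance_estimate([warp_with_flow(frame, f) for f in flows])
print(novel_view.shape)                                   # (64, 64, 3)
```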
A short variational proof of equivalence between policy gradients and soft Q learning
Title | A short variational proof of equivalence between policy gradients and soft Q learning |
Authors | Pierre H. Richemond, Brendan Maginnis |
Abstract | Two main families of reinforcement learning algorithms, Q-learning and policy gradients, have recently been proven to be equivalent when using a softmax relaxation on one side and an entropic regularization on the other. We relate this result to the well-known convex duality of Shannon entropy and the softmax function. Such a result is also known as the Donsker-Varadhan formula. This provides a short proof of the equivalence. We then interpret this duality further, and use ideas of convex analysis to prove a new policy inequality relative to soft Q-learning. |
Tasks | Q-Learning |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08650v1 |
http://arxiv.org/pdf/1712.08650v1.pdf | |
PWC | https://paperswithcode.com/paper/a-short-variational-proof-of-equivalence |
Repo | |
Framework | |
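The duality invoked in the abstract is the convex (Donsker-Varadhan / Legendre-Fenchel) pairing between the log-sum-exp function and Shannon entropy. Stated with a temperature $\tau$ for the soft-Q setting (notation mine, not necessarily the paper's):

```latex
\tau \log \sum_{a} \exp\!\big(Q(s,a)/\tau\big)
  \;=\; \max_{\pi(\cdot\mid s)}\Big[\ \mathbb{E}_{a\sim\pi}\,Q(s,a) + \tau\,\mathcal{H}\big(\pi(\cdot\mid s)\big)\ \Big],
\qquad
\pi^{*}(a\mid s) = \frac{\exp\big(Q(s,a)/\tau\big)}{\sum_{a'}\exp\big(Q(s,a')/\tau\big)} .
```

The left-hand side is the soft state value used in soft Q-learning, and the maximizer is the Boltzmann policy targeted by entropy-regularized policy gradients, which is the equivalence the note proves.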
Robust Monocular SLAM for Egocentric Videos
Title | Robust Monocular SLAM for Egocentric Videos |
Authors | Suvam Patra, Kartikeya Gupta, Faran Ahmad, Chetan Arora, Subhashis Banerjee |
Abstract | Despite tremendous progress, a truly general-purpose pipeline for Simultaneous Localization and Mapping (SLAM) remains a challenge. We investigate the reported failure of state-of-the-art (SOTA) SLAM techniques on egocentric videos. We find that the dominant 3D rotations, low parallax between successive frames, and primarily forward motion in egocentric videos are the most common causes of failures. The incremental nature of SOTA SLAM, in the presence of unreliable pose and 3D estimates in egocentric videos, with no opportunities for global loop closures, generates drift and leads to the eventual failure of such techniques. Taking inspiration from batch-mode Structure from Motion (SFM) techniques, we propose to solve SLAM as an SFM problem over sliding temporal windows. This makes the problem well constrained. Further, we propose to initialize the camera poses using 2D rotation averaging, followed by translation averaging, before structure estimation using bundle adjustment. This helps in stabilizing the camera poses when 3D estimates are not reliable. We show that the proposed SLAM technique, incorporating the two key ideas, works successfully for long, shaky egocentric videos where other SOTA techniques have been reported to fail. Qualitative and quantitative comparisons on publicly available egocentric video datasets validate our results. |
Tasks | Simultaneous Localization and Mapping |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05564v2 |
http://arxiv.org/pdf/1707.05564v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-monocular-slam-for-egocentric-videos |
Repo | |
Framework | |
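A toy version of the pose-initialization step described above (rotation averaging before translation averaging and bundle adjustment): given relative rotations between frame pairs in a sliding window, each absolute rotation is re-estimated as the SO(3)-projected average of its neighbours' predictions. The Gauss-Seidel scheme and conventions here are assumptions, and translation averaging is omitted.

```python
import numpy as np

def project_to_so3(M):
    """Closest rotation matrix in the Frobenius sense (via SVD)."""
    u, _, vt = np.linalg.svd(M)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1
    return u @ vt

def rotation_averaging(n_cams, rel_rots, n_iters=50):
    """Toy chordal rotation averaging: rel_rots maps (i, j) -> R_ij with
    R_j ≈ R_ij @ R_i.  Each absolute rotation is repeatedly re-estimated as
    the SO(3)-projected average of its neighbours' predictions.  The paper
    follows this with translation averaging and bundle adjustment."""
    R = [np.eye(3) for _ in range(n_cams)]
    for _ in range(n_iters):
        for i in range(n_cams):
            acc, cnt = np.zeros((3, 3)), 0
            for (a, b), R_ab in rel_rots.items():
                if a == i:                      # R_i ≈ R_ij^T R_j
                    acc += R_ab.T @ R[b]; cnt += 1
                elif b == i:                    # R_j ≈ R_ij R_i
                    acc += R_ab @ R[a]; cnt += 1
            if cnt:
                R[i] = project_to_so3(acc / cnt)
    return R

# Smoke test: two cameras related by a known 30-degree yaw.
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R01 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rots = rotation_averaging(2, {(0, 1): R01})
print(np.allclose(rots[1] @ rots[0].T, R01, atol=1e-6))   # True (up to gauge)
```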
Neural Personalized Response Generation as Domain Adaptation
Title | Neural Personalized Response Generation as Domain Adaptation |
Authors | Weinan Zhang, Ting Liu, Yifa Wang, Qingfu Zhu |
Abstract | In this paper, we focus on personalized response generation for conversational systems. Based on sequence-to-sequence learning, especially the encoder-decoder framework, we propose a two-phase approach, namely initialization then adaptation, to model the responding style of a human and then generate personalized responses. For evaluation, we propose a novel human-aided method to evaluate the performance of the personalized response generation models through online real-time conversation and offline human judgement. Moreover, the lexical divergence of the responses generated by the 5 personalized models indicates that the proposed two-phase approach achieves good results in modeling the responding style of a human and generating personalized responses for conversational systems. |
Tasks | Domain Adaptation |
Published | 2017-01-09 |
URL | https://arxiv.org/abs/1701.02073v2 |
https://arxiv.org/pdf/1701.02073v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-personalized-response-generation-as |
Repo | |
Framework | |
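The initialization-then-adaptation schedule itself can be illustrated independently of the encoder-decoder: train a general model first, then bias it toward one speaker's data. The bigram toy below only demonstrates that schedule — the paper fine-tunes a neural sequence-to-sequence model, and the interpolation weight is arbitrary.

```python
from collections import Counter, defaultdict

def bigram_counts(sentences):
    """Phase 1 stand-in: 'initialize' a response model on general dialogue data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            counts[a][b] += 1
    return counts

def adapt(base, persona, weight=0.7):
    """Phase 2 stand-in: bias the general model toward one speaker's style by
    interpolating counts.  The paper instead fine-tunes an encoder-decoder."""
    merged = defaultdict(Counter)
    for src, w in ((base, 1 - weight), (persona, weight)):
        for a, nexts in src.items():
            for b, n in nexts.items():
                merged[a][b] += w * n
    return merged

def respond(model, last_word, max_len=10):
    """Greedy continuation from the (adapted) bigram table."""
    word, out = last_word, []
    for _ in range(max_len):
        if word not in model or not model[word]:
            break
        word = model[word].most_common(1)[0][0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

general = bigram_counts(["how are you", "i am fine thanks", "are you ok"])
persona = bigram_counts(["i am totally fine mate", "you ok mate"])
print(respond(adapt(general, persona), "i"))   # "am totally fine mate"
```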
Magnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule Robot
Title | Magnetic-Visual Sensor Fusion based Medical SLAM for Endoscopic Capsule Robot |
Authors | Mehmet Turan, Yasin Almalioglu, Hunter Gilbert, Helder Araujo, Ender Konukoglu, Metin Sitti |
Abstract | A reliable, real-time simultaneous localization and mapping (SLAM) method is crucial for the navigation of actively controlled capsule endoscopy robots. These robots are an emerging, minimally invasive diagnostic and therapeutic technology for use in the gastrointestinal (GI) tract. In this study, we propose a dense, non-rigidly deformable, and real-time map fusion approach for actively controlled endoscopic capsule robot applications. The method combines magnetic and vision based localization, and makes use of frame-to-model fusion and model-to-model loop closure. The performance of the method is demonstrated using an ex-vivo porcine stomach model. Across four trajectories of varying speed and complexity, and across three cameras, the root mean square localization errors range from 0.42 to 1.92 cm, and the root mean square surface reconstruction errors range from 1.23 to 2.39 cm. |
Tasks | Sensor Fusion, Simultaneous Localization and Mapping |
Published | 2017-05-17 |
URL | http://arxiv.org/abs/1705.06196v2 |
http://arxiv.org/pdf/1705.06196v2.pdf | |
PWC | https://paperswithcode.com/paper/magnetic-visual-sensor-fusion-based-medical |
Repo | |
Framework | |
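One small piece of the pipeline above — fusing a magnetic and a visual position estimate — can be sketched as an information-weighted (Gaussian) combination; the covariances below are hypothetical, and the dense frame-to-model map fusion the paper relies on is not shown.

```python
import numpy as np

def fuse_position(p_vis, cov_vis, p_mag, cov_mag):
    """Information-weighted fusion of a visual and a magnetic position estimate
    (independent Gaussian assumption).  The paper's full pipeline additionally
    performs dense frame-to-model map fusion, which is not shown here."""
    info_vis, info_mag = np.linalg.inv(cov_vis), np.linalg.inv(cov_mag)
    cov = np.linalg.inv(info_vis + info_mag)
    p = cov @ (info_vis @ p_vis + info_mag @ p_mag)
    return p, cov

# Hypothetical numbers: the magnetic estimate is much noisier along z.
p, cov = fuse_position(np.array([1.0, 2.0, 3.0]), np.eye(3) * 0.01,
                       np.array([1.1, 2.1, 3.5]), np.diag([0.02, 0.02, 0.5]))
print(p.round(3))   # pulled toward the visual estimate, especially in z
```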
Reti bayesiane per lo studio del fenomeno degli incidenti stradali tra i giovani in Toscana
Title | Reti bayesiane per lo studio del fenomeno degli incidenti stradali tra i giovani in Toscana (Bayesian networks for studying road accidents among young people in Tuscany) |
Authors | Filippo Elba, Lisa Gnaulati, Fabio Voeller |
Abstract | This paper aims to analyse adolescents' road accidents in Tuscany. The analysis is based on the EDIT database of the Osservatorio di Epidemiologia della Toscana. The complexity and heterogeneity of the EDIT data represent an interesting setting in which to apply machine learning methods. In particular, this paper proposes an analysis based on a Bayesian probabilistic network, used to discover relationships between adolescents' characteristics and the behaviours that are most often associated with a reckless driving style. The probabilistic network developed in this study can be considered a useful starting point for follow-up research aimed at developing a causal network, a tool to help limit this phenomenon. |
Tasks | |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07066v1 |
http://arxiv.org/pdf/1710.07066v1.pdf | |
PWC | https://paperswithcode.com/paper/reti-bayesiane-per-lo-studio-del-fenomeno |
Repo | |
Framework | |
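A minimal illustration of the kind of query such a Bayesian network supports, using two hypothetical binary variables and placeholder probabilities (not estimates from the EDIT data): enumerate the parent to get a marginal, then apply Bayes' rule for the diagnostic direction.

```python
# Hypothetical two-variable network: RiskyDriving -> Accident.
# The numbers are placeholders, not estimates from the EDIT database.
p_risky = {True: 0.3, False: 0.7}
p_accident_given_risky = {True: 0.12, False: 0.03}

def p_accident():
    """Marginal P(Accident) by enumerating the parent variable."""
    return sum(p_risky[r] * p_accident_given_risky[r] for r in (True, False))

def p_risky_given_accident():
    """Bayes' rule: the diagnostic direction such a network can be queried in."""
    return p_risky[True] * p_accident_given_risky[True] / p_accident()

print(round(p_accident(), 3), round(p_risky_given_accident(), 3))   # 0.057 0.632
```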
Recovering 3D Planar Arrangements from Videos
Title | Recovering 3D Planar Arrangements from Videos |
Authors | Shuai Du, Youyi Zheng |
Abstract | Acquiring the 3D geometry of real-world objects has various applications in 3D digitization, such as navigation and content generation in virtual environments. Images remain one of the most popular media for such visual tasks due to their simplicity of acquisition. Traditional image-based 3D reconstruction approaches heavily exploit point-to-point correspondence among multiple images to estimate camera motion and 3D geometry. Establishing point-to-point correspondence lies at the center of the 3D reconstruction pipeline, which, however, is easily prone to errors. In this paper, we propose an optimization framework which traces image points using a novel structure-guided dynamic tracking algorithm and estimates both the camera motion and a 3D structure model by enforcing a set of planar constraints. The key to our method is a structure model represented as a set of planes and their arrangements. Constraints derived from the structure model are used both in the correspondence establishment stage and in the bundle adjustment stage of our reconstruction pipeline. Experiments show that our algorithm can effectively localize structure correspondence across dense image frames while faithfully reconstructing the camera motion and the underlying structured 3D model. |
Tasks | 3D Reconstruction |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07393v1 |
http://arxiv.org/pdf/1701.07393v1.pdf | |
PWC | https://paperswithcode.com/paper/recovering-3d-planar-arrangements-from-videos |
Repo | |
Framework | |
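One way to read "enforcing a set of planar constraints" in bundle adjustment is to parametrize each 3D point by two free coordinates and recover the third from its plane equation, so every optimized point stays exactly on its plane. The sketch below takes that reading; the parametrization and the fixed single-camera setup are assumptions, not the paper's formulation.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (n, 3) into pixel coordinates."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x[:, :2] / x[:, 2:3]

def planar_ba_residual(params_free, K, R, t, obs_2d, plane_n, plane_d):
    """Reprojection residual with a hard planar constraint: each point is
    parametrized by its free (x, y) coordinates and z is recovered from
    n.X + d = 0, so optimized points can never leave their plane.  Assumes
    plane_n[2] != 0 and a single fixed camera, purely for illustration."""
    xy = params_free.reshape(-1, 2)
    z = -(plane_d + xy @ plane_n[:2]) / plane_n[2]
    X = np.column_stack([xy, z])
    return (project(K, R, t, X) - obs_2d).ravel()

# Smoke test on the plane z = 5 with a fixed camera at the origin.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 320.0], [0.0, 0.0, 1.0]])
n, d = np.array([0.0, 0.0, 1.0]), -5.0
xy = np.array([[0.5, 0.2], [-0.3, 0.1]])
obs = project(K, np.eye(3), np.zeros(3), np.column_stack([xy, [5.0, 5.0]]))
print(planar_ba_residual(xy.ravel(), K, np.eye(3), np.zeros(3), obs, n, d))  # ~0
```

In practice such a residual would be handed to a generic nonlinear least-squares solver over the stacked free coordinates (and the camera parameters, in the multi-view case).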
Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding
Title | Semantic Word Clouds with Background Corpus Normalization and t-distributed Stochastic Neighbor Embedding |
Authors | Erich Schubert, Andreas Spitz, Michael Weiler, Johanna Geiß, Michael Gertz |
Abstract | Many word clouds provide no semantics for the word placement, but use a random layout optimized solely for aesthetic purposes. We propose a novel approach to model word significance and word affinity within a document, and in comparison to a large background corpus. We demonstrate its usefulness for generating more meaningful word clouds as a visual summary of a given document. We then select keywords based on their significance and construct the word cloud based on the derived affinity. Based on a modified t-distributed stochastic neighbor embedding (t-SNE), we generate a semantic word placement. For words that co-occur significantly, we include edges, and cluster the words according to their co-occurrence. For this we designed a scalable and memory-efficient sketch-based approach, usable on commodity hardware, to aggregate the corpus statistics required for normalization and for identifying keywords as well as significant co-occurrences. We empirically validate our approach using a large Wikipedia corpus. |
Tasks | |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03569v1 |
http://arxiv.org/pdf/1708.03569v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-word-clouds-with-background-corpus |
Repo | |
Framework | |
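A compact end-to-end caricature of the pipeline: score words against a background corpus, keep the most significant ones, build co-occurrence vectors, and let t-SNE place them. scikit-learn's TSNE is assumed to be available, and the mini-document, background counts, smoothing, and bin choices are placeholders (the paper aggregates its background statistics with a sketch data structure instead).

```python
import numpy as np
from collections import Counter
from sklearn.manifold import TSNE   # scikit-learn, assumed available

def keyword_significance(doc_tokens, background_freq, total_background):
    """Simple significance score: log-ratio of in-document frequency to the
    background-corpus frequency (add-one smoothing is an arbitrary choice)."""
    doc_counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return {w: np.log((c / n) / ((background_freq.get(w, 0) + 1) / total_background))
            for w, c in doc_counts.items()}

def cooccurrence_vectors(sentences, vocab, window=2):
    """Count-based word vectors from within-sentence co-occurrence."""
    idx = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        toks = [t for t in sent.split() if t in idx]
        for i, a in enumerate(toks):
            for b in toks[max(0, i - window): i + window + 1]:
                if a != b:
                    M[idx[a], idx[b]] += 1
    return M

# Hypothetical mini-document and background statistics.
sentences = ["neural networks learn word embeddings",
             "word clouds show word significance",
             "networks and embeddings cluster together",
             "significance scores rank word candidates"]
tokens = " ".join(sentences).split()
background = {"word": 50000, "and": 900000, "networks": 20000, "show": 80000}
scores = keyword_significance(tokens, background, total_background=10_000_000)
keywords = sorted(scores, key=scores.get, reverse=True)[:8]
vecs = cooccurrence_vectors(sentences, keywords)
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vecs)
print(dict(zip(keywords, np.round(coords, 1))))
```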
Ethical Artificial Intelligence - An Open Question
Title | Ethical Artificial Intelligence - An Open Question |
Authors | Alice Pavaloiu, Utku Kose |
Abstract | Artificial Intelligence (AI) is an effective science which employs approaches, methods, and techniques strong enough to tackle real-world problems that would otherwise be intractable. Because of its rapid rise, there are also ongoing discussions about its ethics and safety. Shaping an AI-friendly environment for people and a people-friendly environment for AI can be a possible answer for finding a shared context of values for both humans and robots. In this context, the objective of this paper is to address the ethical issues of AI and explore the moral dilemmas that arise from ethical algorithms, whether from pre-set or acquired values. In addition, the paper also focuses on the subject of AI safety. In general, the paper briefly analyzes the concerns and potential solutions to the ethical issues presented and aims to increase readers' awareness of AI safety as another related research interest. |
Tasks | |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1706.03021v1 |
http://arxiv.org/pdf/1706.03021v1.pdf | |
PWC | https://paperswithcode.com/paper/ethical-artificial-intelligence-an-open |
Repo | |
Framework | |