Paper Group ANR 619
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition. VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets. Robust Zero-Shot Cross-Domain Slot Filling with Example Values. Structural Material Property Tailoring Using Deep Neural Networks. A machine learning approach for underwater gas leakage detection. The Min …
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition
Title | L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition |
Authors | Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, Qiang Yang |
Abstract | Modern Automatic Speech Recognition (ASR) systems primarily rely on scores from an Acoustic Model (AM) and a Language Model (LM) to rescore the N-best lists. Despite the abundance of recent natural language processing advances, the information utilized by current ASR systems for evaluating the linguistic and semantic legitimacy of the N-best hypotheses is rather limited. In this paper, we propose a novel Learning-to-Rescore (L2RS) mechanism, which is specialized for utilizing a wide range of textual information from state-of-the-art NLP models and automatically deciding their weights to rescore the N-best lists for ASR systems. Specifically, we incorporate features including BERT sentence embedding, topic vector, and perplexity scores produced by n-gram LM, topic modeling LM, BERT LM and RNNLM to train a rescoring model. We conduct extensive experiments based on a public dataset, and experimental results show that L2RS outperforms not only traditional rescoring methods but also its deep neural network counterparts by a substantial improvement of 20.67% in terms of NDCG@10. L2RS paves the way for developing more effective rescoring models for ASR. |
Tasks | Language Modelling, Sentence Embedding, Speech Recognition |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11496v1 |
https://arxiv.org/pdf/1910.11496v1.pdf | |
PWC | https://paperswithcode.com/paper/l2rs-a-learning-to-rescore-mechanism-for |
Repo | |
Framework | |
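The core of the L2RS idea above—scoring each N-best hypothesis with a learned weighted combination of heterogeneous textual features—can be illustrated with a toy pairwise-trained linear reranker. Everything below (the feature layout, the perceptron-style trainer, all values) is invented for illustration; the paper trains a full learning-to-rank model over BERT, topic-model and LM features.

```python
import numpy as np

def rescore(feats, w):
    """Rank N-best hypotheses best-first by a weighted sum of per-hypothesis
    features (e.g. AM score, n-gram LM score, BERT score, topic similarity)."""
    return np.argsort(-(feats @ w))

def train_pairwise(feats, ref_idx, w, lr=0.1, epochs=50):
    """Toy pairwise perceptron: whenever a competitor scores at least as high
    as the reference transcript, nudge the weights toward the reference."""
    for _ in range(epochs):
        for wrong in range(len(feats)):
            if wrong != ref_idx and feats[wrong] @ w >= feats[ref_idx] @ w:
                w = w + lr * (feats[ref_idx] - feats[wrong])
    return w
```

After training on enough (reference, competitor) pairs, the learned weights play the role that the fixed AM/LM interpolation weight plays in conventional rescoring.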
VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets
Title | VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets |
Authors | Nilavra Bhattacharya, Danna Gurari |
Abstract | We present a visualization tool to exhaustively search and browse through a set of large-scale machine learning datasets. Built on the top of the VizWiz dataset, our dataset browser tool has the potential to support and enable a variety of qualitative and quantitative research, and open new directions for visualizing and researching with multimodal information. The tool is publicly available at https://vizwiz.org/browse. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09336v1 |
https://arxiv.org/pdf/1912.09336v1.pdf | |
PWC | https://paperswithcode.com/paper/vizwiz-dataset-browser-a-tool-for-visualizing |
Repo | |
Framework | |
Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Title | Robust Zero-Shot Cross-Domain Slot Filling with Example Values |
Authors | Darsh J Shah, Raghav Gupta, Amir A Fayazi, Dilek Hakkani-Tur |
Abstract | Task-oriented dialog systems increasingly rely on deep learning-based slot filling models, usually needing extensive labeled training data for target domains. Often, however, little to no target domain training data may be available, or the training and target domain schemas may be misaligned, as is common for web forms on similar websites. Prior zero-shot slot filling models use slot descriptions to learn concepts, but are not robust to misaligned schemas. We propose utilizing both the slot description and a small number of examples of slot values, which may be easily available, to learn semantic representations of slots which are transferable across domains and robust to misaligned schemas. Our approach outperforms state-of-the-art models on two multi-domain datasets, especially in the low-data setting. |
Tasks | Slot Filling |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06870v1 |
https://arxiv.org/pdf/1906.06870v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-zero-shot-cross-domain-slot-filling |
Repo | |
Framework | |
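The mechanism described above—representing a slot by its description plus a few example values, then matching tokens against that representation—can be sketched with toy embeddings. The plain averaging, the cosine matcher and the threshold are all illustrative stand-ins; the paper learns the combination jointly with a neural tagger.

```python
import numpy as np

def slot_representation(desc_vec, example_vecs):
    """Fuse a slot's description embedding with embeddings of a handful of
    example values into a single slot vector (a plain average here)."""
    return np.vstack([desc_vec, *example_vecs]).mean(axis=0)

def score_token(token_vec, slot_vecs, threshold=0.5):
    """Assign a token to the slot whose representation it is most similar
    to (cosine), or to the outside label 'O' below the threshold."""
    sims = {name: vec @ token_vec / (np.linalg.norm(vec) * np.linalg.norm(token_vec))
            for name, vec in slot_vecs.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] > threshold else "O"
```

Because the slot vector depends only on a description and example values, the same matcher transfers to an unseen target schema without retraining, which is the zero-shot setting of the paper.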
Structural Material Property Tailoring Using Deep Neural Networks
Title | Structural Material Property Tailoring Using Deep Neural Networks |
Authors | Oshin Olesegun, Ryan Noraas, Michael Giering, Nagendra Somanath |
Abstract | Advances in robotics, artificial intelligence, and machine learning are ushering in a new age of automation, as machines match or outperform human performance. Machine intelligence can enable businesses to improve performance by reducing errors, improving sensitivity, quality and speed, and in some cases achieving outcomes that go beyond current resource capabilities. Relevant applications include new product architecture design, rapid material characterization, and life-cycle management tied with a digital strategy that will enable efficient development of products from cradle to grave. In addition, there are also challenges to overcome that must be addressed through a major, sustained research effort that is based solidly on both inferential and computational principles applied to design tailoring of functionally optimized structures. Current applications of structural materials in the aerospace industry demand the highest quality control of material microstructure, especially for advanced rotational turbomachinery in aircraft engines in order to have the best tailored material property. In this paper, deep convolutional neural networks were developed to accurately predict processing-structure-property relations from materials microstructures images, surpassing current best practices and modeling efforts. The models automatically learn critical features, without the need for manual specification and/or subjective and expensive image analysis. Further, in combination with generative deep learning models, a framework is proposed to enable rapid material design space exploration and property identification and optimization. The implementation must take account of real-time decision cycles and the trade-offs between speed and accuracy. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10281v1 |
http://arxiv.org/pdf/1901.10281v1.pdf | |
PWC | https://paperswithcode.com/paper/structural-material-property-tailoring-using |
Repo | |
Framework | |
A machine learning approach for underwater gas leakage detection
Title | A machine learning approach for underwater gas leakage detection |
Authors | Paulo Hubert, Linilson Padovese |
Abstract | Underwater gas reservoirs are used in many situations. In particular, Carbon Capture and Storage (CCS) facilities that are currently being developed intend to store greenhouse gases inside geological formations in the deep sea. In these formations, however, the gas might percolate, leaking back into the water and eventually into the atmosphere. The early detection of such leaks is therefore paramount for any underwater CCS project. In this work, we propose to use Passive Acoustic Monitoring (PAM) and a machine learning approach to design efficient detectors that can signal the presence of a leakage. We use data obtained from simulation experiments off the Brazilian shore, and show that detection based on classification algorithms achieves good performance. We also propose a smoothing strategy based on Hidden Markov Models in order to incorporate prior knowledge about the probabilities of leakage occurrences. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05661v1 |
http://arxiv.org/pdf/1904.05661v1.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-for-underwater |
Repo | |
Framework | |
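The HMM smoothing step the abstract mentions can be sketched as a two-state forward-backward pass over per-window classifier outputs. The sticky transition probability and the prior below are illustrative values, not the paper's fitted parameters.

```python
import numpy as np

def hmm_smooth(lik, p_stay=0.95, prior=(0.99, 0.01)):
    """Two-state forward-backward smoothing. lik[t, s] is the detector's
    likelihood of window t under state s (0 = no leak, 1 = leak); returns
    the smoothed posterior probability of a leak at each window."""
    T = len(lik)
    A = np.array([[p_stay, 1 - p_stay],
                  [1 - p_stay, p_stay]])      # sticky transitions
    alpha = np.zeros((T, 2))                  # forward messages
    beta = np.ones((T, 2))                    # backward messages
    alpha[0] = np.asarray(prior) * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (beta[t + 1] * lik[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post[:, 1] / post.sum(axis=1)
```

The effect is exactly what motivates the paper: an isolated spurious detection is suppressed, while a sustained run of leak-like windows is reinforced.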
The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors
Title | The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors |
Authors | William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, Phillip Wang |
Abstract | Though deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. As state-of-the-art reinforcement learning (RL) systems require an exponentially increasing number of samples, their development is restricted to a continually shrinking segment of the AI community. Likewise, many of these systems cannot be applied to real-world problems, where environment samples are expensive. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we introduce the MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors. The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task, a sequential decision making environment requiring long-term planning, hierarchical control, and efficient exploration methods; and (2) the MineRL-v0 dataset, a large-scale collection of over 60 million state-action pairs of human demonstrations that can be resimulated into embodied trajectories with arbitrary modifications to game state and visuals. Participants will compete to develop systems which solve the ObtainDiamond task with a limited number of samples from the environment simulator, Malmo. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures. At the end of each round, competitors will submit containerized versions of their learning algorithms and they will then be trained/evaluated from scratch on a hold-out dataset-environment pair for a total of 4-days on a prespecified hardware platform. |
Tasks | Decision Making, Efficient Exploration |
Published | 2019-04-22 |
URL | https://arxiv.org/abs/1904.10079v2 |
https://arxiv.org/pdf/1904.10079v2.pdf | |
PWC | https://paperswithcode.com/paper/the-minerl-competition-on-sample-efficient |
Repo | |
Framework | |
Object tracking in video signals using Compressive Sensing
Title | Object tracking in video signals using Compressive Sensing |
Authors | Marijana Kracunov, Milica Bastica, Jovana Tesovic |
Abstract | Reducing the number of pixels in video signals, while maintaining the quality needed to recover the trace of an object using Compressive Sensing, is the main subject of this work. The quality of frames from a video containing a moving object is gradually reduced by keeping a different number of pixels in each iteration, going from 45% all the way down to 1%. Using an algorithm for tracing the object, the results were satisfactory and showed only minor changes in the trajectory graphs obtained from the original and reconstructed videos. |
Tasks | Compressive Sensing, Object Tracking |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1903.06253v1 |
http://arxiv.org/pdf/1903.06253v1.pdf | |
PWC | https://paperswithcode.com/paper/object-tracking-in-video-signals-using |
Repo | |
Framework | |
One Embedding To Do Them All
Title | One Embedding To Do Them All |
Authors | Loveperteek Singh, Shreya Singh, Sagar Arora, Sumit Borar |
Abstract | Online shopping caters to the needs of millions of users daily. Search, recommendations, personalization have become essential building blocks for serving customer needs. Efficacy of such systems is dependent on a thorough understanding of products and their representation. Multiple information sources and data types provide a complete picture of the product on the platform. While each of these tasks shares some common characteristics, typically product embeddings are trained and used in isolation. In this paper, we propose a framework to combine multiple data sources and learn unified embeddings for products on our e-commerce platform. Our product embeddings are built from three types of data sources - catalog text data, a user’s clickstream session data and product images. We use various techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream data, Siamese neural network architecture for image data and combined ensemble over the above methods for unified embeddings. Further, we compare and analyze the performance of these embeddings across three unrelated real-world e-commerce tasks specifically checking product attribute coverage, finding similar products and predicting returns. We show that unified product embeddings perform uniformly well across all these tasks. |
Tasks | Denoising |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12120v1 |
https://arxiv.org/pdf/1906.12120v1.pdf | |
PWC | https://paperswithcode.com/paper/one-embedding-to-do-them-all |
Repo | |
Framework | |
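The clickstream component above uses Bayesian personalized ranking (BPR); its core update can be sketched as one SGD step on the pairwise objective. The latent dimension, learning rate and regularizer below are illustrative choices, not the paper's settings.

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD step of Bayesian Personalized Ranking: raise the score of a
    clicked item i above an unclicked item j for user u (updates in place)."""
    u_f = U[u].copy()
    x = u_f @ (V[i] - V[j])
    g = 1.0 / (1.0 + np.exp(x))   # sigmoid(-x): gradient of log sigmoid(x)
    U[u] += lr * (g * (V[i] - V[j]) - reg * u_f)
    V[i] += lr * (g * u_f - reg * V[i])
    V[j] += lr * (-g * u_f - reg * V[j])
```

Iterating this over sampled (user, clicked, non-clicked) triples yields the session-based product embeddings that the paper then ensembles with the text and image embeddings.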
Generalization of k-means Related Algorithms
Title | Generalization of k-means Related Algorithms |
Authors | Yiwei Li |
Abstract | This article briefly introduces Arthur and Vassilvitskii’s work on the k-means++ algorithm and further generalizes its center initialization process. It is found that choosing the sample point most distant from its nearest center as the new center mostly has the same effect as the center initialization process in the k-means++ algorithm. |
Tasks | |
Published | 2019-03-24 |
URL | http://arxiv.org/abs/1903.10025v1 |
http://arxiv.org/pdf/1903.10025v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-of-k-means-related-algorithms |
Repo | |
Framework | |
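The comparison the abstract makes can be stated concretely: k-means++ samples the next center with probability proportional to squared distance from the nearest chosen center, while the deterministic variant simply takes the farthest point. A minimal numpy sketch of both (function names and the shared helper are mine):

```python
import numpy as np

def _d2_to_nearest(X, centers):
    """Squared distance from each sample to its nearest current center."""
    diff = X[:, None, :] - np.asarray(centers)[None, :, :]
    return (diff ** 2).sum(-1).min(axis=1)

def farthest_point_init(X, k, rng=None):
    """Deterministic greedy seeding: after a random first center, repeatedly
    take the sample farthest from its nearest chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        centers.append(X[np.argmax(_d2_to_nearest(X, centers))])
    return np.array(centers)

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding: next center drawn with probability proportional
    to squared distance from the nearest chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = _d2_to_nearest(X, centers)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

On well-separated clusters the two procedures pick one seed per cluster, which is the "same effect" the article observes.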
TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis
Title | TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis |
Authors | Chenge Li, Gregory Dobler, Xin Feng, Yao Wang |
Abstract | Object detection and object tracking are usually treated as two separate processes. Significant progress has been made for object detection in 2D images using deep learning networks. The usual tracking-by-detection pipeline for object tracking requires that the object is successfully detected in the first frame and all subsequent frames, and tracking is done by associating detection results. Performing object detection and object tracking through a single network remains a challenging open question. We propose a novel network structure named TrackNet that can directly detect a 3D tube enclosing a moving object in a video segment by extending the faster R-CNN framework. A Tube Proposal Network (TPN) inside TrackNet is proposed to predict the objectness of each candidate tube and location parameters specifying the bounding tube. The proposed framework is applicable for detecting and tracking any object and in this paper, we focus on its application for traffic video analysis. The proposed model is trained and tested on UA-DETRAC, a large traffic video dataset available for multi-vehicle detection and tracking, and obtained very promising results. |
Tasks | Object Detection, Object Tracking |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01466v1 |
http://arxiv.org/pdf/1902.01466v1.pdf | |
PWC | https://paperswithcode.com/paper/tracknet-simultaneous-object-detection-and |
Repo | |
Framework | |
Video Affective Effects Prediction with Multi-modal Fusion and Shot-Long Temporal Context
Title | Video Affective Effects Prediction with Multi-modal Fusion and Shot-Long Temporal Context |
Authors | Jie Zhang, Yin Zhao, Longjun Cai, Chaoping Tu, Wu Wei |
Abstract | Predicting the emotional impact of videos using machine learning is a challenging task given the variety of modalities, the complicated temporal context of the video, and the time dependency of the emotional states. Feature extraction, multi-modal fusion and temporal context fusion are crucial stages for predicting valence and arousal values in the emotional impact, but have not been successfully exploited. In this paper, we propose a comprehensive framework with novel designs of modal structure and multi-modal fusion strategy. We select the most suitable modalities for the valence and arousal tasks respectively, and each modal feature is extracted using a modality-specific deep model pre-trained on a large generic dataset. Two-time-scale structures, one for the intra-clip and the other for the inter-clip, are proposed to capture the temporal dependency of video content and emotion states. To combine the complementary information from multiple modalities, an effective and efficient residual-based progressive training strategy is proposed. Each modality is step-wisely combined into the multi-modal model, responsible for completing the missing parts of features. With all those improvements, our proposed prediction framework outperforms the state-of-the-art on the LIRIS-ACCEDE dataset by a large margin. |
Tasks | |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.01763v1 |
https://arxiv.org/pdf/1909.01763v1.pdf | |
PWC | https://paperswithcode.com/paper/video-affective-effects-prediction-with-multi |
Repo | |
Framework | |
Deep Networks with Adaptive Nyström Approximation
Title | Deep Networks with Adaptive Nyström Approximation |
Authors | Luc Giffon, Stéphane Ayache, Thierry Artières, Hachem Kadri |
Abstract | Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new neural network architecture in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function based on the Nyström approximation. Our approach is easy and highly flexible: it is compatible with any kernel function and it allows exploiting multiple kernels. We show that our architecture has the same performance as standard architectures on datasets like SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited to small training set sizes, e.g. from 5 to 20 samples per class. |
Tasks | |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13036v1 |
https://arxiv.org/pdf/1911.13036v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-networks-with-adaptive-nystrom |
Repo | |
Framework | |
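The kernel layer described above builds on the classical Nyström feature map, which is easy to sketch: project onto a set of landmark points and whiten by the landmark Gram matrix, so that inner products of features approximate the kernel. The RBF kernel, gamma and eps are illustrative choices here, not the paper's configuration.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, gamma=0.5, eps=1e-8):
    """Nystrom feature map phi(x) = k(x, L) K_LL^{-1/2}, so that
    phi(x) . phi(y) approximates k(x, y). In the paper's setting such
    features would feed a final linear layer in place of dense layers."""
    K_ll = rbf(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(K_ll)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return rbf(X, landmarks, gamma) @ inv_sqrt
```

The only learnable parameters downstream are the final linear weights, which is why the approach suits very small training sets.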
Subexponential-Time Algorithms for Sparse PCA
Title | Subexponential-Time Algorithms for Sparse PCA |
Authors | Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira |
Abstract | We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the “possible but hard” regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11635v2 |
https://arxiv.org/pdf/1907.11635v2.pdf | |
PWC | https://paperswithcode.com/paper/subexponential-time-algorithms-for-sparse-pca |
Repo | |
Framework | |
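The polynomial-time baseline the abstract interpolates from—diagonal thresholding—fits in a few lines for the Wishart (spiked covariance) case: coordinates in the support of $x$ have inflated sample variance, so threshold on the diagonal of the sample covariance and diagonalize the restricted block. The sketch below assumes the sparsity $k$ is known and centered data.

```python
import numpy as np

def diagonal_thresholding(Y, k):
    """Diagonal thresholding for the spiked Wishart model: keep the k
    coordinates with the largest sample variances, then take the top
    eigenvector of the sample covariance restricted to that support."""
    S = Y.T @ Y / len(Y)                     # sample covariance, Y is (N, n)
    support = np.argsort(np.diag(S))[-k:]    # k largest diagonal entries
    vals, vecs = np.linalg.eigh(S[np.ix_(support, support)])
    x_hat = np.zeros(Y.shape[1])
    x_hat[support] = vecs[:, -1]             # eigh sorts eigenvalues ascending
    return x_hat
```

The subexponential family in the paper replaces the per-coordinate variance test with statistics over small subsets of coordinates, trading runtime $\exp(\rho^2 n)$ for a weaker sparsity requirement.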
Deep Surface Normal Estimation with Hierarchical RGB-D Fusion
Title | Deep Surface Normal Estimation with Hierarchical RGB-D Fusion |
Authors | Jin Zeng, Yanfeng Tong, Yunmu Huang, Qiong Yan, Wenxiu Sun, Jing Chen, Yongtian Wang |
Abstract | The growing availability of commodity RGB-D cameras has boosted the applications in the field of scene understanding. However, as a fundamental scene understanding task, surface normal estimation from RGB-D data lacks thorough investigation. In this paper, a hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image. Specifically, the features from color image and depth are successively integrated at multiple scales to ensure global surface smoothness while preserving visually salient details. Meanwhile, the depth features are re-weighted with a confidence map estimated from depth before merging into the color branch to avoid artifacts caused by input depth corruption. Additionally, a hybrid multi-scale loss function is designed to learn accurate normal estimation given noisy ground-truth dataset. Extensive experimental results validate the effectiveness of the fusion strategy and the loss design, outperforming state-of-the-art normal estimation schemes. |
Tasks | Scene Understanding |
Published | 2019-04-06 |
URL | https://arxiv.org/abs/1904.03405v2 |
https://arxiv.org/pdf/1904.03405v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-surface-normal-estimation-with |
Repo | |
Framework | |
Fairness in Machine Learning with Tractable Models
Title | Fairness in Machine Learning with Tractable Models |
Authors | Michael Varley, Vaishak Belle |
Abstract | Machine Learning techniques have become pervasive across a range of different applications, and are now widely used in areas as disparate as recidivism prediction, consumer credit-risk analysis and insurance pricing. The prevalence of machine learning techniques has raised concerns about the potential for learned algorithms to become biased against certain groups. Many definitions of fairness have been proposed in the literature, but the fundamental task of reasoning about probabilistic events is a challenging one, owing to the intractability of inference. The focus of this paper is taking steps towards the application of tractable models to fairness. Tractable probabilistic models have emerged that guarantee that conditional marginals can be computed in time linear in the size of the model. In particular, we show that sum product networks (SPNs) enable an effective technique for determining the statistical relationships between protected attributes and other training variables. If a subset of these training variables is found by the SPN to be independent of the protected attribute, then they can be considered ‘safe’ variables, from which we can train a classification model without concern that the resulting classifier will produce disparate outcomes for different demographic groups. Our initial experiments on the ‘German Credit’ data set indicate that this processing technique significantly reduces disparate treatment of male and female credit applicants, with a small reduction in classification accuracy compared to state of the art. We also motivate the concept of “fairness through percentile equivalence”, a new definition predicated on the notion that individuals at the same percentile of their respective distributions should be treated equivalently, which prevents unfair penalisation of those individuals who lie at the extremities of their respective distributions. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.07026v2 |
https://arxiv.org/pdf/1905.07026v2.pdf | |
PWC | https://paperswithcode.com/paper/fairness-in-machine-learning-with-tractable |
Repo | |
Framework | |
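The “fairness through percentile equivalence” notion above can be sketched as within-group quantile ranking: replace each raw score by its percentile within the individual's own group, so two people at the same within-group percentile receive the same value. The midrank convention below is an illustrative choice, not the paper's exact estimator.

```python
import numpy as np

def percentile_scores(scores, groups):
    """Map each score to its percentile rank (in [0, 1]) within the
    individual's own group; ties are broken by position, a simplification."""
    out = np.empty_like(scores, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        s = scores[mask]
        # rank within group, shifted to the bin midpoint
        out[mask] = (np.argsort(np.argsort(s)) + 0.5) / len(s)
    return out
```

With this transform, the group medians map to the same value even when the raw score distributions differ by an order of magnitude, which is the equal-treatment-by-percentile property the definition asks for.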