Paper Group ANR 619
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition. VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets. Robust Zero-Shot Cross-Domain Slot Filling with Example Values. Structural Material Property Tailoring Using Deep Neural Networks. A machine learning approach for underwater gas leakage detection. The Min …
L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition
Title | L2RS: A Learning-to-Rescore Mechanism for Automatic Speech Recognition |
Authors | Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, Qiang Yang |
Abstract | Modern Automatic Speech Recognition (ASR) systems primarily rely on scores from an Acoustic Model (AM) and a Language Model (LM) to rescore the N-best lists. Despite the abundance of recent natural language processing advances, the information utilized by current ASR systems for evaluating the linguistic and semantic legitimacy of the N-best hypotheses is rather limited. In this paper, we propose a novel Learning-to-Rescore (L2RS) mechanism, which is specialized for utilizing a wide range of textual information from state-of-the-art NLP models and automatically deciding their weights to rescore the N-best lists for ASR systems. Specifically, we incorporate features including BERT sentence embedding, topic vector, and perplexity scores produced by n-gram LM, topic modeling LM, BERT LM and RNNLM to train a rescoring model. We conduct extensive experiments based on a public dataset, and experimental results show that L2RS outperforms not only traditional rescoring methods but also its deep neural network counterparts by a substantial improvement of 20.67% in terms of NDCG@10. L2RS paves the way for developing more effective rescoring models for ASR. |
Tasks | Language Modelling, Sentence Embedding, Speech Recognition |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11496v1 |
https://arxiv.org/pdf/1910.11496v1.pdf | |
PWC | https://paperswithcode.com/paper/l2rs-a-learning-to-rescore-mechanism-for |
Repo | |
Framework | |
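The core of the L2RS idea above—scoring each N-best hypothesis with a learned weighted combination of heterogeneous textual features—can be illustrated with a toy pairwise-trained linear reranker. Everything below (the feature layout, the perceptron-style trainer, all values) is invented for illustration; the paper trains a full learning-to-rank model over BERT, topic-model and LM features.

```python
import numpy as np

def rescore(feats, w):
    """Rank N-best hypotheses best-first by a weighted sum of per-hypothesis
    features (e.g. AM score, n-gram LM score, BERT score, topic similarity)."""
    return np.argsort(-(feats @ w))

def train_pairwise(feats, ref_idx, w, lr=0.1, epochs=50):
    """Toy pairwise perceptron: whenever a competitor scores at least as high
    as the reference transcript, nudge the weights toward the reference."""
    for _ in range(epochs):
        for wrong in range(len(feats)):
            if wrong != ref_idx and feats[wrong] @ w >= feats[ref_idx] @ w:
                w = w + lr * (feats[ref_idx] - feats[wrong])
    return w
```

After training on enough (reference, competitor) pairs, the learned weights play the role that the fixed AM/LM interpolation weight plays in conventional rescoring.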
VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets
Title | VizWiz Dataset Browser: A Tool for Visualizing Machine Learning Datasets |
Authors | Nilavra Bhattacharya, Danna Gurari |
Abstract | We present a visualization tool to exhaustively search and browse through a set of large-scale machine learning datasets. Built on the top of the VizWiz dataset, our dataset browser tool has the potential to support and enable a variety of qualitative and quantitative research, and open new directions for visualizing and researching with multimodal information. The tool is publicly available at https://vizwiz.org/browse. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09336v1 |
https://arxiv.org/pdf/1912.09336v1.pdf | |
PWC | https://paperswithcode.com/paper/vizwiz-dataset-browser-a-tool-for-visualizing |
Repo | |
Framework | |
Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Title | Robust Zero-Shot Cross-Domain Slot Filling with Example Values |
Authors | Darsh J Shah, Raghav Gupta, Amir A Fayazi, Dilek Hakkani-Tur |
Abstract | Task-oriented dialog systems increasingly rely on deep learning-based slot filling models, usually needing extensive labeled training data for target domains. Often, however, little to no target domain training data may be available, or the training and target domain schemas may be misaligned, as is common for web forms on similar websites. Prior zero-shot slot filling models use slot descriptions to learn concepts, but are not robust to misaligned schemas. We propose utilizing both the slot description and a small number of examples of slot values, which may be easily available, to learn semantic representations of slots which are transferable across domains and robust to misaligned schemas. Our approach outperforms state-of-the-art models on two multi-domain datasets, especially in the low-data setting. |
Tasks | Slot Filling |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06870v1 |
https://arxiv.org/pdf/1906.06870v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-zero-shot-cross-domain-slot-filling |
Repo | |
Framework | |
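The mechanism described above—representing a slot by its description plus a few example values, then matching tokens against that representation—can be sketched with toy embeddings. The plain averaging, the cosine matcher and the threshold are all illustrative stand-ins; the paper learns the combination jointly with a neural tagger.

```python
import numpy as np

def slot_representation(desc_vec, example_vecs):
    """Fuse a slot's description embedding with embeddings of a handful of
    example values into a single slot vector (a plain average here)."""
    return np.vstack([desc_vec, *example_vecs]).mean(axis=0)

def score_token(token_vec, slot_vecs, threshold=0.5):
    """Assign a token to the slot whose representation it is most similar
    to (cosine), or to the outside label 'O' below the threshold."""
    sims = {name: vec @ token_vec / (np.linalg.norm(vec) * np.linalg.norm(token_vec))
            for name, vec in slot_vecs.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] > threshold else "O"
```

Because the slot vector depends only on a description and example values, the same matcher transfers to an unseen target schema without retraining, which is the zero-shot setting of the paper.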
Structural Material Property Tailoring Using Deep Neural Networks
Title | Structural Material Property Tailoring Using Deep Neural Networks |
Authors | Oshin Olesegun, Ryan Noraas, Michael Giering, Nagendra Somanath |
Abstract | Advances in robotics, artificial intelligence, and machine learning are ushering in a new age of automation, as machines match or outperform human performance. Machine intelligence can enable businesses to improve performance by reducing errors, improving sensitivity, quality and speed, and in some cases achieving outcomes that go beyond current resource capabilities. Relevant applications include new product architecture design, rapid material characterization, and life-cycle management tied with a digital strategy that will enable efficient development of products from cradle to grave. In addition, there are also challenges to overcome that must be addressed through a major, sustained research effort that is based solidly on both inferential and computational principles applied to design tailoring of functionally optimized structures. Current applications of structural materials in the aerospace industry demand the highest quality control of material microstructure, especially for advanced rotational turbomachinery in aircraft engines in order to have the best tailored material property. In this paper, deep convolutional neural networks were developed to accurately predict processing-structure-property relations from materials microstructures images, surpassing current best practices and modeling efforts. The models automatically learn critical features, without the need for manual specification and/or subjective and expensive image analysis. Further, in combination with generative deep learning models, a framework is proposed to enable rapid material design space exploration and property identification and optimization. The implementation must take account of real-time decision cycles and the trade-offs between speed and accuracy. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10281v1 |
http://arxiv.org/pdf/1901.10281v1.pdf | |
PWC | https://paperswithcode.com/paper/structural-material-property-tailoring-using |
Repo | |
Framework | |
A machine learning approach for underwater gas leakage detection
Title | A machine learning approach for underwater gas leakage detection |
Authors | Paulo Hubert, Linilson Padovese |
Abstract | Underwater gas reservoirs are used in many situations. In particular, Carbon Capture and Storage (CCS) facilities that are currently being developed intend to store greenhouse gases inside geological formations in the deep sea. In these formations, however, the gas might percolate, leaking back into the water and eventually into the atmosphere. The early detection of such leaks is therefore paramount for any underwater CCS project. In this work, we propose to use Passive Acoustic Monitoring (PAM) and a machine learning approach to design efficient detectors that can signal the presence of a leakage. We use data obtained from simulation experiments off the Brazilian shore, and show that detection based on classification algorithms achieves good performance. We also propose a smoothing strategy based on Hidden Markov Models in order to incorporate prior knowledge about the probabilities of leakage occurrences. |
Tasks | |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05661v1 |
http://arxiv.org/pdf/1904.05661v1.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-approach-for-underwater |
Repo | |
Framework | |
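The HMM smoothing step the abstract mentions can be sketched as a two-state forward-backward pass over per-window classifier outputs. The sticky transition probability and the prior below are illustrative values, not the paper's fitted parameters.

```python
import numpy as np

def hmm_smooth(lik, p_stay=0.95, prior=(0.99, 0.01)):
    """Two-state forward-backward smoothing. lik[t, s] is the detector's
    likelihood of window t under state s (0 = no leak, 1 = leak); returns
    the smoothed posterior probability of a leak at each window."""
    T = len(lik)
    A = np.array([[p_stay, 1 - p_stay],
                  [1 - p_stay, p_stay]])      # sticky transitions
    alpha = np.zeros((T, 2))                  # forward messages
    beta = np.ones((T, 2))                    # backward messages
    alpha[0] = np.asarray(prior) * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * lik[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (beta[t + 1] * lik[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post[:, 1] / post.sum(axis=1)
```

The effect is exactly what motivates the paper: an isolated spurious detection is suppressed, while a sustained run of leak-like windows is reinforced.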
The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors
Title | The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors |
Authors | William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, Phillip Wang |
Abstract | Though deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. As state-of-the-art reinforcement learning (RL) systems require an exponentially increasing number of samples, their development is restricted to a continually shrinking segment of the AI community. Likewise, many of these systems cannot be applied to real-world problems, where environment samples are expensive. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we introduce the MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors. The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, we introduce: (1) the Minecraft ObtainDiamond task, a sequential decision making environment requiring long-term planning, hierarchical control, and efficient exploration methods; and (2) the MineRL-v0 dataset, a large-scale collection of over 60 million state-action pairs of human demonstrations that can be resimulated into embodied trajectories with arbitrary modifications to game state and visuals. Participants will compete to develop systems which solve the ObtainDiamond task with a limited number of samples from the environment simulator, Malmo. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures. At the end of each round, competitors will submit containerized versions of their learning algorithms and they will then be trained/evaluated from scratch on a hold-out dataset-environment pair for a total of 4-days on a prespecified hardware platform. |
Tasks | Decision Making, Efficient Exploration |
Published | 2019-04-22 |
URL | https://arxiv.org/abs/1904.10079v2 |
https://arxiv.org/pdf/1904.10079v2.pdf | |
PWC | https://paperswithcode.com/paper/the-minerl-competition-on-sample-efficient |
Repo | |
Framework | |
Object tracking in video signals using Compressive Sensing
Title | Object tracking in video signals using Compressive Sensing |
Authors | Marijana Kracunov, Milica Bastica, Jovana Tesovic |
Abstract | Reducing the number of pixels in video signals, while maintaining the quality needed to recover the trace of an object using Compressive Sensing, is the main subject of this work. The quality of frames from a video containing a moving object is gradually reduced by keeping a different number of pixels in each iteration, going from 45% all the way down to 1%. Using an algorithm for tracing the object, the results were satisfactory and showed only minor changes in the trajectory graphs obtained from the original and reconstructed videos. |
Tasks | Compressive Sensing, Object Tracking |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1903.06253v1 |
http://arxiv.org/pdf/1903.06253v1.pdf | |
PWC | https://paperswithcode.com/paper/object-tracking-in-video-signals-using |
Repo | |
Framework | |
One Embedding To Do Them All
Title | One Embedding To Do Them All |
Authors | Loveperteek Singh, Shreya Singh, Sagar Arora, Sumit Borar |
Abstract | Online shopping caters to the needs of millions of users daily. Search, recommendations, personalization have become essential building blocks for serving customer needs. Efficacy of such systems is dependent on a thorough understanding of products and their representation. Multiple information sources and data types provide a complete picture of the product on the platform. While each of these tasks shares some common characteristics, typically product embeddings are trained and used in isolation. In this paper, we propose a framework to combine multiple data sources and learn unified embeddings for products on our e-commerce platform. Our product embeddings are built from three types of data sources - catalog text data, a user’s clickstream session data and product images. We use various techniques like denoising auto-encoders for text, Bayesian personalized ranking (BPR) for clickstream data, Siamese neural network architecture for image data and combined ensemble over the above methods for unified embeddings. Further, we compare and analyze the performance of these embeddings across three unrelated real-world e-commerce tasks specifically checking product attribute coverage, finding similar products and predicting returns. We show that unified product embeddings perform uniformly well across all these tasks. |
Tasks | Denoising |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12120v1 |
https://arxiv.org/pdf/1906.12120v1.pdf | |
PWC | https://paperswithcode.com/paper/one-embedding-to-do-them-all |
Repo | |
Framework | |
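The clickstream component above uses Bayesian personalized ranking (BPR); its core update can be sketched as one SGD step on the pairwise objective. The latent dimension, learning rate and regularizer below are illustrative choices, not the paper's settings.

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD step of Bayesian Personalized Ranking: raise the score of a
    clicked item i above an unclicked item j for user u (updates in place)."""
    u_f = U[u].copy()
    x = u_f @ (V[i] - V[j])
    g = 1.0 / (1.0 + np.exp(x))   # sigmoid(-x): gradient of log sigmoid(x)
    U[u] += lr * (g * (V[i] - V[j]) - reg * u_f)
    V[i] += lr * (g * u_f - reg * V[i])
    V[j] += lr * (-g * u_f - reg * V[j])
```

Iterating this over sampled (user, clicked, non-clicked) triples yields the session-based product embeddings that the paper then ensembles with the text and image embeddings.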
Generalization of k-means Related Algorithms
Title | Generalization of k-means Related Algorithms |
Authors | Yiwei Li |
Abstract | This article briefly introduces Arthur and Vassilvitskii’s work on the k-means++ algorithm and further generalizes its center initialization process. It is found that choosing the sample point most distant from its nearest center as the new center mostly has the same effect as the center initialization process in the k-means++ algorithm. |
Tasks | |
Published | 2019-03-24 |
URL | http://arxiv.org/abs/1903.10025v1 |
http://arxiv.org/pdf/1903.10025v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-of-k-means-related-algorithms |
Repo | |
Framework | |
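The comparison the abstract makes can be stated concretely: k-means++ samples the next center with probability proportional to squared distance from the nearest chosen center, while the deterministic variant simply takes the farthest point. A minimal numpy sketch of both (function names and the shared helper are mine):

```python
import numpy as np

def _d2_to_nearest(X, centers):
    """Squared distance from each sample to its nearest current center."""
    diff = X[:, None, :] - np.asarray(centers)[None, :, :]
    return (diff ** 2).sum(-1).min(axis=1)

def farthest_point_init(X, k, rng=None):
    """Deterministic greedy seeding: after a random first center, repeatedly
    take the sample farthest from its nearest chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        centers.append(X[np.argmax(_d2_to_nearest(X, centers))])
    return np.array(centers)

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding: next center drawn with probability proportional
    to squared distance from the nearest chosen center."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = _d2_to_nearest(X, centers)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

On well-separated clusters the two procedures pick one seed per cluster, which is the "same effect" the article observes.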
TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis
Title | TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis |
Authors | Chenge Li, Gregory Dobler, Xin Feng, Yao Wang |
Abstract | Object detection and object tracking are usually treated as two separate processes. Significant progress has been made for object detection in 2D images using deep learning networks. The usual tracking-by-detection pipeline for object tracking requires that the object is successfully detected in the first frame and all subsequent frames, and tracking is done by associating detection results. Performing object detection and object tracking through a single network remains a challenging open question. We propose a novel network structure named TrackNet that can directly detect a 3D tube enclosing a moving object in a video segment by extending the faster R-CNN framework. A Tube Proposal Network (TPN) inside TrackNet is proposed to predict the objectness of each candidate tube and location parameters specifying the bounding tube. The proposed framework is applicable for detecting and tracking any object and in this paper, we focus on its application for traffic video analysis. The proposed model is trained and tested on UA-DETRAC, a large traffic video dataset available for multi-vehicle detection and tracking, and obtained very promising results. |
Tasks | Object Detection, Object Tracking |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01466v1 |
http://arxiv.org/pdf/1902.01466v1.pdf | |
PWC | https://paperswithcode.com/paper/tracknet-simultaneous-object-detection-and |
Repo | |
Framework | |
Video Affective Effects Prediction with Multi-modal Fusion and Shot-Long Temporal Context
Title | Video Affective Effects Prediction with Multi-modal Fusion and Shot-Long Temporal Context |
Authors | Jie Zhang, Yin Zhao, Longjun Cai, Chaoping Tu, Wu Wei |
Abstract | Predicting the emotional impact of videos using machine learning is a challenging task given the variety of modalities, the complicated temporal context of the video, and the time dependency of the emotional states. Feature extraction, multi-modal fusion and temporal context fusion are crucial stages for predicting valence and arousal values in the emotional impact, but have not been successfully exploited. In this paper, we propose a comprehensive framework with novel designs of modal structure and multi-modal fusion strategy. We select the most suitable modalities for the valence and arousal tasks respectively, and each modal feature is extracted using a modality-specific deep model pre-trained on a large generic dataset. Two-time-scale structures, one for the intra-clip and the other for the inter-clip, are proposed to capture the temporal dependency of video content and emotion states. To combine the complementary information from multiple modalities, an effective and efficient residual-based progressive training strategy is proposed. Each modality is step-wisely combined into the multi-modal model, responsible for completing the missing parts of features. With all those improvements, our proposed prediction framework outperforms the state-of-the-art on the LIRIS-ACCEDE dataset by a large margin. |
Tasks | |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.01763v1 |
https://arxiv.org/pdf/1909.01763v1.pdf | |
PWC | https://paperswithcode.com/paper/video-affective-effects-prediction-with-multi |
Repo | |
Framework | |
Deep Networks with Adaptive Nyström Approximation
Title | Deep Networks with Adaptive Nyström Approximation |
Authors | Luc Giffon, Stéphane Ayache, Thierry Artières, Hachem Kadri |
Abstract | Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new neural network architecture in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function based on the Nyström approximation. Our approach is easy and highly flexible: it is compatible with any kernel function and it allows exploiting multiple kernels. We show that our architecture has the same performance as standard architectures on datasets like SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited to small training set sizes, e.g. from 5 to 20 samples per class. |
Tasks | |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13036v1 |
https://arxiv.org/pdf/1911.13036v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-networks-with-adaptive-nystrom |
Repo | |
Framework | |
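The kernel layer described above builds on the classical Nyström feature map, which is easy to sketch: project onto a set of landmark points and whiten by the landmark Gram matrix, so that inner products of features approximate the kernel. The RBF kernel, gamma and eps are illustrative choices here, not the paper's configuration.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, gamma=0.5, eps=1e-8):
    """Nystrom feature map phi(x) = k(x, L) K_LL^{-1/2}, so that
    phi(x) . phi(y) approximates k(x, y). In the paper's setting such
    features would feed a final linear layer in place of dense layers."""
    K_ll = rbf(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(K_ll)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return rbf(X, landmarks, gamma) @ inv_sqrt
```

The only learnable parameters downstream are the final linear weights, which is why the approach suits very small training sets.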
Subexponential-Time Algorithms for Sparse PCA
Title | Subexponential-Time Algorithms for Sparse PCA |
Authors | Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira |
Abstract | We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the “possible but hard” regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11635v2 |
https://arxiv.org/pdf/1907.11635v2.pdf | |
PWC | https://paperswithcode.com/paper/subexponential-time-algorithms-for-sparse-pca |
Repo | |
Framework | |
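The polynomial-time baseline the abstract interpolates from—diagonal thresholding—fits in a few lines for the Wishart (spiked covariance) case: coordinates in the support of $x$ have inflated sample variance, so threshold on the diagonal of the sample covariance and diagonalize the restricted block. The sketch below assumes the sparsity $k$ is known and centered data.

```python
import numpy as np

def diagonal_thresholding(Y, k):
    """Diagonal thresholding for the spiked Wishart model: keep the k
    coordinates with the largest sample variances, then take the top
    eigenvector of the sample covariance restricted to that support."""
    S = Y.T @ Y / len(Y)                     # sample covariance, Y is (N, n)
    support = np.argsort(np.diag(S))[-k:]    # k largest diagonal entries
    vals, vecs = np.linalg.eigh(S[np.ix_(support, support)])
    x_hat = np.zeros(Y.shape[1])
    x_hat[support] = vecs[:, -1]             # eigh sorts eigenvalues ascending
    return x_hat
```

The subexponential family in the paper replaces the per-coordinate variance test with statistics over small subsets of coordinates, trading runtime $\exp(\rho^2 n)$ for a weaker sparsity requirement.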
Deep Surface Normal Estimation with Hierarchical RGB-D Fusion
Title | Deep Surface Normal Estimation with Hierarchical RGB-D Fusion |
Authors | Jin Zeng, Yanfeng Tong, Yunmu Huang, Qiong Yan, Wenxiu Sun, Jing Chen, Yongtian Wang |
Abstract | The growing availability of commodity RGB-D cameras has boosted the applications in the field of scene understanding. However, as a fundamental scene understanding task, surface normal estimation from RGB-D data lacks thorough investigation. In this paper, a hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image. Specifically, the features from color image and depth are successively integrated at multiple scales to ensure global surface smoothness while preserving visually salient details. Meanwhile, the depth features are re-weighted with a confidence map estimated from depth before merging into the color branch to avoid artifacts caused by input depth corruption. Additionally, a hybrid multi-scale loss function is designed to learn accurate normal estimation given noisy ground-truth dataset. Extensive experimental results validate the effectiveness of the fusion strategy and the loss design, outperforming state-of-the-art normal estimation schemes. |
Tasks | Scene Understanding |
Published | 2019-04-06 |
URL | https://arxiv.org/abs/1904.03405v2 |
https://arxiv.org/pdf/1904.03405v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-surface-normal-estimation-with |
Repo | |
Framework | |
Fairness in Machine Learning with Tractable Models
Title | Fairness in Machine Learning with Tractable Models |
Authors | Michael Varley, Vaishak Belle |
Abstract | Machine Learning techniques have become pervasive across a range of different applications, and are now widely used in areas as disparate as recidivism prediction, consumer credit-risk analysis and insurance pricing. The prevalence of machine learning techniques has raised concerns about the potential for learned algorithms to become biased against certain groups. Many definitions of fairness have been proposed in the literature, but the fundamental task of reasoning about probabilistic events is a challenging one, owing to the intractability of inference. The focus of this paper is taking steps towards the application of tractable models to fairness. Tractable probabilistic models have emerged that guarantee that conditional marginals can be computed in time linear in the size of the model. In particular, we show that sum product networks (SPNs) enable an effective technique for determining the statistical relationships between protected attributes and other training variables. If a subset of these training variables is found by the SPN to be independent of the protected attribute, then they can be considered ‘safe’ variables, from which we can train a classification model without concern that the resulting classifier will produce disparate outcomes for different demographic groups. Our initial experiments on the ‘German Credit’ data set indicate that this processing technique significantly reduces disparate treatment of male and female credit applicants, with a small reduction in classification accuracy compared to state of the art. We also motivate the concept of “fairness through percentile equivalence”, a new definition predicated on the notion that individuals at the same percentile of their respective distributions should be treated equivalently, which prevents unfair penalisation of those individuals who lie at the extremities of their respective distributions. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.07026v2 |
https://arxiv.org/pdf/1905.07026v2.pdf | |
PWC | https://paperswithcode.com/paper/fairness-in-machine-learning-with-tractable |
Repo | |
Framework | |
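The “fairness through percentile equivalence” notion above can be sketched as within-group quantile ranking: replace each raw score by its percentile within the individual's own group, so two people at the same within-group percentile receive the same value. The midrank convention below is an illustrative choice, not the paper's exact estimator.

```python
import numpy as np

def percentile_scores(scores, groups):
    """Map each score to its percentile rank (in [0, 1]) within the
    individual's own group; ties are broken by position, a simplification."""
    out = np.empty_like(scores, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        s = scores[mask]
        # rank within group, shifted to the bin midpoint
        out[mask] = (np.argsort(np.argsort(s)) + 0.5) / len(s)
    return out
```

With this transform, the group medians map to the same value even when the raw score distributions differ by an order of magnitude, which is the equal-treatment-by-percentile property the definition asks for.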