Paper Group ANR 129
Distributed Deep Learning Strategies For Automatic Speech Recognition. MMTM: Multimodal Transfer Module for CNN Fusion. A3GAN: An Attribute-aware Attentive Generative Adversarial Network for Face Aging. Leveraging External Knowledge for Out-Of-Vocabulary Entity Labeling. BUOCA: Budget-Optimized Crowd Worker Allocation. Copy-Enhanced Heterogeneous I …
Distributed Deep Learning Strategies For Automatic Speech Recognition
Title | Distributed Deep Learning Strategies For Automatic Speech Recognition |
Authors | Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny |
Abstract | In this paper, we propose and investigate a variety of distributed deep learning strategies for automatic speech recognition (ASR) and evaluate them with a state-of-the-art long short-term memory (LSTM) acoustic model on the 2000-hour Switchboard corpus (SWB2000), one of the most widely used datasets for benchmarking ASR performance. We first investigate which hyper-parameters (e.g., learning rate) enable training with a sufficiently large batch size without impairing model accuracy. We then implement various distributed strategies, including synchronous SGD (SYNC), Asynchronous Decentralized Parallel SGD (ADPSGD), and a hybrid of the two (HYBRID), to study their runtime/accuracy trade-offs. We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set. Furthermore, we can train the model using HYBRID in 11.5 hours with 32 NVIDIA V100 GPUs without loss in accuracy. |
Tasks | Speech Recognition |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.04956v1 |
http://arxiv.org/pdf/1904.04956v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-strategies-for |
Repo | |
Framework | |
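The abstract describes three distributed training strategies; the decentralized ADPSGD variant is the least standard, so a minimal single-process simulation of its update pattern is sketched below. Each learner takes a local SGD step on a toy least-squares problem and then averages weights with one ring neighbor. The topology, step size, and problem are illustrative assumptions, not the authors' distributed implementation.

```python
# Minimal single-process simulation of ADPSGD-style decentralized training.
# Each "learner" holds its own copy of the weights, takes a local SGD step,
# and then averages its weights with one ring neighbor (no global barrier).
import numpy as np

rng = np.random.default_rng(0)
n_learners, dim, steps, lr = 4, 10, 200, 0.1
w_true = rng.normal(size=dim)                      # target of a toy least-squares problem
weights = [rng.normal(size=dim) for _ in range(n_learners)]

def local_gradient(w):
    """Gradient of 0.5*||w - w_true||^2 on a noisy mini-batch."""
    return (w - w_true) + 0.1 * rng.normal(size=dim)

for t in range(steps):
    k = rng.integers(n_learners)                   # learners update asynchronously
    weights[k] = weights[k] - lr * local_gradient(weights[k])
    neighbor = (k + 1) % n_learners                # ring topology
    avg = 0.5 * (weights[k] + weights[neighbor])   # pairwise weight averaging
    weights[k] = weights[neighbor] = avg

print("mean distance to optimum:",
      np.mean([np.linalg.norm(w - w_true) for w in weights]))
```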
MMTM: Multimodal Transfer Module for CNN Fusion
Title | MMTM: Multimodal Transfer Module for CNN Fusion |
Authors | Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida |
Abstract | In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end. Due to its simplicity, late fusion is still the predominant approach in many state-of-the-art multimodal applications. In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks. The proposed unit, named Multimodal Transfer Module (MMTM), can be added at different levels of the feature hierarchy, enabling slow modality fusion. Using squeeze and excitation operations, MMTM utilizes the knowledge of multiple modalities to recalibrate the channel-wise features in each CNN stream. Unlike other intermediate fusion methods, the proposed module can be used for feature modality fusion in convolution layers with different spatial dimensions. Another advantage of the proposed method is that it can be added between unimodal branches with minimal changes to their network architectures, allowing each branch to be initialized with existing pretrained weights. Experimental results show that our framework improves the recognition accuracy of well-known multimodal networks. We demonstrate state-of-the-art or competitive performance on four datasets that span the task domains of dynamic hand gesture recognition, speech enhancement, and action recognition with RGB and body joints. |
Tasks | Action Recognition In Videos, Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition, Speech Enhancement |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08670v2 |
https://arxiv.org/pdf/1911.08670v2.pdf | |
PWC | https://paperswithcode.com/paper/mmtm-multimodal-transfer-module-for-cnn |
Repo | |
Framework | |
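A rough PyTorch sketch of the squeeze-and-excitation style fusion the abstract describes: each stream's feature map is squeezed to a channel descriptor, the descriptors are mixed through a shared bottleneck, and per-stream gates recalibrate the channels. Channel sizes, the reduction ratio, and the gating nonlinearity are assumptions for illustration, not the exact MMTM configuration.

```python
# Sketch of an MMTM-style fusion unit for two streams with different spatial
# dimensions (e.g. a 2D RGB stream and a 1D skeleton stream).
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, c_a, c_b, ratio=4):
        super().__init__()
        joint = (c_a + c_b) // ratio
        self.shared = nn.Linear(c_a + c_b, joint)   # joint "squeeze" projection
        self.gate_a = nn.Linear(joint, c_a)         # per-stream "excitation"
        self.gate_b = nn.Linear(joint, c_b)

    def forward(self, feat_a, feat_b):
        # feat_a: (B, C_a, H, W); feat_b: (B, C_b, T)
        sq_a = feat_a.mean(dim=(2, 3))              # squeeze over spatial dims
        sq_b = feat_b.mean(dim=2)                   # squeeze over temporal dim
        z = torch.relu(self.shared(torch.cat([sq_a, sq_b], dim=1)))
        gate_a = torch.sigmoid(self.gate_a(z))      # channel-wise recalibration
        gate_b = torch.sigmoid(self.gate_b(z))
        return feat_a * gate_a[:, :, None, None], feat_b * gate_b[:, :, None]

fusion = TwoStreamFusion(c_a=64, c_b=32)
rgb, pose = torch.randn(2, 64, 14, 14), torch.randn(2, 32, 20)
out_rgb, out_pose = fusion(rgb, pose)
print(out_rgb.shape, out_pose.shape)
```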
A3GAN: An Attribute-aware Attentive Generative Adversarial Network for Face Aging
Title | A3GAN: An Attribute-aware Attentive Generative Adversarial Network for Face Aging |
Authors | Yunfan Liu, Qi Li, Zhenan Sun, Tieniu Tan |
Abstract | Face aging, which aims at aesthetically rendering a given face to predict its future appearance, has received significant research attention in recent years. Although great progress has been achieved with the success of Generative Adversarial Networks (GANs) in synthesizing realistic images, most existing GAN-based face aging methods have two main problems: 1) unnatural changes of high-level semantic information (e.g. facial attributes) due to the insufficient utilization of prior knowledge of input faces, and 2) distortions of low-level image content including ghosting artifacts and modifications in age-irrelevant regions. In this paper, we introduce A3GAN, an Attribute-Aware Attentive face aging model to address the above issues. Facial attribute vectors are regarded as the conditional information and embedded into both the generator and discriminator, encouraging synthesized faces to be faithful to attributes of corresponding inputs. To improve the visual fidelity of generation results, we leverage the attention mechanism to restrict modifications to age-related areas and preserve image details. Moreover, the wavelet packet transform is employed to capture textural features at multiple scales in the frequency space. Extensive experimental results demonstrate the effectiveness of our model in synthesizing photorealistic aged face images and achieving state-of-the-art performance on popular face aging datasets. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06531v1 |
https://arxiv.org/pdf/1911.06531v1.pdf | |
PWC | https://paperswithcode.com/paper/a3gan-an-attribute-aware-attentive-generative |
Repo | |
Framework | |
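A small sketch of the attention-based compositing the abstract mentions, in which a predicted attention mask restricts modifications to age-related regions while the rest of the image is copied from the input. The tensors and the `composite` helper below are hypothetical placeholders, not A3GAN's actual generator.

```python
# Illustrative attention compositing: keep input pixels wherever the mask
# marks the region as age-irrelevant, use the generated pixels elsewhere.
import torch

def composite(input_face, generated_face, attention_mask):
    # attention_mask in [0, 1], shape (B, 1, H, W); 1 = age-related region.
    return attention_mask * generated_face + (1.0 - attention_mask) * input_face

x = torch.rand(1, 3, 128, 128)          # input face
g = torch.rand(1, 3, 128, 128)          # raw generator output (placeholder)
a = torch.rand(1, 1, 128, 128)          # predicted attention mask (placeholder)
aged = composite(x, g, a)
print(aged.shape)
```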
Leveraging External Knowledge for Out-Of-Vocabulary Entity Labeling
Title | Leveraging External Knowledge for Out-Of-Vocabulary Entity Labeling |
Authors | Adrian de Wynter, Lambert Mathias |
Abstract | Dealing with previously unseen slots is a challenging problem in a real-world multi-domain dialogue state tracking task. Existing approaches rely on predefined mappings to generate candidate slot keys, as well as their associated values. This, however, may fail when the key, the value, or both are not seen during training. To address this problem we introduce a neural network that leverages external knowledge bases (KBs) to better classify out-of-vocabulary slot keys and values. This network projects the slot into an attribute space derived from the KB, and, by leveraging similarities in this space, proposes candidate slot keys and values to the dialogue state tracker. We provide extensive experiments that demonstrate that our stratagem can improve upon a previous approach, which relies on predefined candidate mappings. In particular, we evaluate this approach by training a state-of-the-art model with candidates generated from our network, obtaining relative increases of 57.7% and 82.7% in F1 score and accuracy, respectively, compared to the current candidate generation strategy. |
Tasks | Dialogue State Tracking |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09936v1 |
https://arxiv.org/pdf/1908.09936v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-external-knowledge-for-out-of |
Repo | |
Framework | |
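A hedged sketch of the candidate-generation idea in the abstract: project an out-of-vocabulary slot value into a KB-derived attribute space and propose the slot keys whose attribute vectors are most similar. The `kb_attributes` entries, the projection matrix, and the cosine scoring are toy placeholders rather than the paper's trained components.

```python
# Nearest-neighbor slot-key proposal in a KB-derived attribute space.
import numpy as np

rng = np.random.default_rng(1)
attribute_dim = 8
kb_attributes = {                       # hypothetical KB attribute vectors per slot key
    "restaurant_name": rng.normal(size=attribute_dim),
    "cuisine": rng.normal(size=attribute_dim),
    "city": rng.normal(size=attribute_dim),
}
projection = rng.normal(size=(16, attribute_dim))   # "learned" projection (random here)

def propose_slot_keys(value_embedding, top_k=2):
    projected = value_embedding @ projection          # map value into attribute space
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(kb_attributes.items(),
                    key=lambda kv: cosine(projected, kv[1]), reverse=True)
    return [key for key, _ in scored[:top_k]]

oov_value = rng.normal(size=16)                       # embedding of an unseen value
print(propose_slot_keys(oov_value))
```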
BUOCA: Budget-Optimized Crowd Worker Allocation
Title | BUOCA: Budget-Optimized Crowd Worker Allocation |
Authors | Mehrnoosh Sameki, Sha Lai, Kate K. Mays, Lei Guo, Prakash Ishwar, Margrit Betke |
Abstract | Due to concerns about human error in crowdsourcing, it is standard practice to collect labels for the same data point from multiple internet workers. Here we show that the resulting budget can be used more effectively with a flexible worker assignment strategy that asks fewer workers to analyze easy-to-label data and more workers to analyze data that requires extra scrutiny. Our main contribution is to show how the allocation of the number of workers to a task can be computed optimally based on task features alone, without using worker profiles. Our target tasks are delineating cells in microscopy images and analyzing the sentiment toward the 2016 U.S. presidential candidates in tweets. We first propose an algorithm that computes budget-optimized crowd worker allocation (BUOCA). We next train a machine learning system (BUOCA-ML) that predicts an optimal number of crowd workers needed to maximize the accuracy of the labeling. We show that the computed allocation can yield large savings in the crowdsourcing budget (up to 49 percentage points) while maintaining labeling accuracy. Finally, we envisage a human-machine system for performing budget-optimized data analysis at a scale beyond the feasibility of crowdsourcing. |
Tasks | |
Published | 2019-01-11 |
URL | http://arxiv.org/abs/1901.06237v1 |
http://arxiv.org/pdf/1901.06237v1.pdf | |
PWC | https://paperswithcode.com/paper/buoca-budget-optimized-crowd-worker |
Repo | |
Framework | |
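A toy budget-constrained allocation loop in the spirit of BUOCA: items predicted to be harder receive additional workers, chosen greedily by an assumed diminishing-returns gain model until the budget is exhausted. The `marginal_gain` function and difficulty scores are stand-ins, not the paper's learned allocation.

```python
# Greedy worker allocation under a fixed labeling budget.
import heapq

def marginal_gain(difficulty, n_workers):
    # Assumed diminishing-returns model: harder items benefit more from extra labels.
    return difficulty / (n_workers + 1)

def allocate(difficulties, budget, min_workers=1):
    allocation = [min_workers] * len(difficulties)
    budget -= min_workers * len(difficulties)
    # Max-heap keyed on the marginal gain of adding one more worker to each item.
    heap = [(-marginal_gain(d, min_workers), i) for i, d in enumerate(difficulties)]
    heapq.heapify(heap)
    while budget > 0 and heap:
        _, i = heapq.heappop(heap)
        allocation[i] += 1
        budget -= 1
        heapq.heappush(heap, (-marginal_gain(difficulties[i], allocation[i]), i))
    return allocation

print(allocate(difficulties=[0.9, 0.2, 0.6, 0.1], budget=10))
```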
Copy-Enhanced Heterogeneous Information Learning for Dialogue State Tracking
Title | Copy-Enhanced Heterogeneous Information Learning for Dialogue State Tracking |
Authors | Qingbin Liu, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao |
Abstract | Dialogue state tracking (DST) is an essential component in task-oriented dialogue systems, which estimates user goals at every dialogue turn. However, most previous approaches suffer from the following problems: many discriminative models, especially end-to-end (E2E) models, have difficulty extracting unknown values that are not in the candidate ontology, while previous generative models, which can extract unknown values from utterances, degrade performance because they ignore the semantic information of the pre-defined ontology. Besides, previous generative models usually need a hand-crafted list to normalize the generated values. How to integrate the semantic information of the pre-defined ontology and the dialogue text (heterogeneous texts) to generate unknown values and improve performance remains a serious challenge. In this paper, we propose a Copy-Enhanced Heterogeneous Information Learning model with multiple encoder-decoders for DST (CEDST), which can effectively generate all possible values, including unknown values, by copying values from heterogeneous texts. Meanwhile, CEDST can effectively decompose the large state space into several small state spaces through the multi-encoder, and employ the multi-decoder to make full use of the reduced spaces to generate values. The multi-encoder-decoder architecture can significantly improve performance. Experiments show that CEDST achieves state-of-the-art results on two datasets and on our constructed datasets with many unknown values. |
Tasks | Dialogue State Tracking, Task-Oriented Dialogue Systems |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07705v1 |
https://arxiv.org/pdf/1908.07705v1.pdf | |
PWC | https://paperswithcode.com/paper/190807705 |
Repo | |
Framework | |
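A minimal sketch of the copy mechanism the abstract relies on: the decoder's final distribution mixes a vocabulary (generation) distribution with attention-based copy distributions over two heterogeneous sources, ontology text and dialogue text. All tensors below are random placeholders; this is not the CEDST architecture itself.

```python
# Pointer-style mixing of a generation distribution with two copy distributions.
import torch
import torch.nn.functional as F

vocab = ["<unk>", "cheap", "north", "thai"]
ontology_tokens = [1, 3]        # vocab indices of ontology words
dialogue_tokens = [2, 0, 3]     # vocab indices of dialogue words

p_vocab = F.softmax(torch.randn(len(vocab)), dim=0)            # generation distribution
attn_ont = F.softmax(torch.randn(len(ontology_tokens)), dim=0) # attention over ontology
attn_dlg = F.softmax(torch.randn(len(dialogue_tokens)), dim=0) # attention over dialogue
mix = F.softmax(torch.randn(3), dim=0)                         # generate / copy-ont / copy-dlg

p_final = mix[0] * p_vocab
for weight, tokens, attn in ((mix[1], ontology_tokens, attn_ont),
                             (mix[2], dialogue_tokens, attn_dlg)):
    for pos, tok in enumerate(tokens):
        p_final[tok] = p_final[tok] + weight * attn[pos]       # scatter copy mass

print({w: round(float(p), 3) for w, p in zip(vocab, p_final)})
```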
Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects
Title | Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects |
Authors | Mengyuan Yan, Yilin Zhu, Ning Jin, Jeannette Bohg |
Abstract | We demonstrate model-based, visual robot manipulation of linear deformable objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors in the dynamics model and perception model, and the ease of planning manipulation actions. In addition, physical states can naturally represent object instances of different appearances. Therefore, dynamics in the state space can be learned in one setting and directly used in other visually different settings. This is in contrast to dynamics learned in pixel space or latent space, where generalization to visual differences is not guaranteed. Challenges in taking the state-space approach are estimating the high-dimensional state of a deformable object from raw images, where annotations are very expensive on real data, and finding a dynamics model that is accurate, generalizable, and efficient to compute. We are the first to demonstrate self-supervised training of rope state estimation on real images, without requiring expensive annotations. This is achieved by our novel self-supervised learning objective, which is generalizable across a wide range of visual appearances. With estimated rope states, we train a fast and differentiable neural network dynamics model that encodes the physics of mass-spring systems. Our method has a higher accuracy in predicting future states compared to models that do not involve explicit state estimation and do not use any physics prior, while only using 3% of training data. We also show that our approach achieves more efficient manipulation, both in simulation and on a real robot, when used within a model predictive controller. |
Tasks | |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.06283v2 |
https://arxiv.org/pdf/1911.06283v2.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-state-estimation |
Repo | |
Framework | |
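The dynamics model in the abstract encodes the physics of mass-spring systems; the sketch below steps a toy mass-spring rope with semi-implicit Euler integration to make the structure of such a model concrete. The spring constant, damping, and fixed first node are made-up choices, not parameters learned by the paper's network.

```python
# Toy mass-spring rope: point masses connected by springs, stepped explicitly.
import numpy as np

n_nodes, rest_len, k_spring, damping, dt = 10, 0.1, 50.0, 0.98, 0.01
pos = np.stack([np.linspace(0, rest_len * (n_nodes - 1), n_nodes),
                np.zeros(n_nodes)], axis=1)          # (N, 2) rope laid out on the x-axis
vel = np.zeros_like(pos)
gravity = np.array([0.0, -9.8])

def step(pos, vel):
    force = np.tile(gravity, (n_nodes, 1))
    seg = pos[1:] - pos[:-1]                          # vectors between neighbors
    length = np.linalg.norm(seg, axis=1, keepdims=True)
    spring = k_spring * (length - rest_len) * seg / np.maximum(length, 1e-8)
    force[:-1] += spring                              # pull each neighboring pair together
    force[1:] -= spring
    vel = damping * (vel + dt * force)
    vel[0] = 0.0                                      # first node is held fixed (grasped)
    return pos + dt * vel, vel

for _ in range(100):
    pos, vel = step(pos, vel)
print(pos[-1])                                        # free end sags under gravity
```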
Thompson Sampling via Local Uncertainty
Title | Thompson Sampling via Local Uncertainty |
Authors | Zhendong Wang, Mingyuan Zhou |
Abstract | Thompson sampling is an efficient algorithm for sequential decision making, which exploits the posterior uncertainty to solve the exploration-exploitation dilemma. There has been significant recent interest in integrating Bayesian neural networks into Thompson sampling. Most of these methods rely on global variable uncertainty for exploration. In this paper, we propose a new probabilistic modeling framework for Thompson sampling, where local latent variable uncertainty is used to sample the mean reward. Variational inference is used to approximate the posterior of the local variable, and a semi-implicit structure is further introduced to enhance its expressiveness. Our experimental results on eight contextual-bandit benchmark datasets show that Thompson sampling guided by local uncertainty achieves state-of-the-art performance while having low computational complexity. |
Tasks | Decision Making, Multi-Armed Bandits |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13673v1 |
https://arxiv.org/pdf/1910.13673v1.pdf | |
PWC | https://paperswithcode.com/paper/thompson-sampling-via-local-uncertainty |
Repo | |
Framework | |
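A toy contextual-bandit sketch of the local-uncertainty idea: rather than sampling global network weights, a local latent variable is sampled per decision, fed through a fixed reward model, and the arm with the highest sampled mean reward is pulled. The `latent_posterior` and reward weights are placeholders, not the paper's semi-implicit variational model.

```python
# Thompson sampling driven by a sampled local latent variable.
import numpy as np

rng = np.random.default_rng(2)
n_arms, ctx_dim, latent_dim = 3, 5, 4
W_ctx = rng.normal(size=(n_arms, ctx_dim))        # stand-in reward model weights
W_lat = rng.normal(size=(n_arms, latent_dim))

def latent_posterior(context):
    """Placeholder for the variational posterior q(z | context): mean and std."""
    mean = 0.1 * context[:latent_dim]
    std = np.full(latent_dim, 0.5)
    return mean, std

def choose_arm(context):
    mean, std = latent_posterior(context)
    z = mean + std * rng.normal(size=latent_dim)   # sample the local uncertainty
    sampled_means = W_ctx @ context + W_lat @ z    # per-arm sampled mean rewards
    return int(np.argmax(sampled_means))

context = rng.normal(size=ctx_dim)
print("chosen arm:", choose_arm(context))
```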
Cloud-Based Autonomous Indoor Navigation: A Case Study
Title | Cloud-Based Autonomous Indoor Navigation: A Case Study |
Authors | Uthman Baroudi, M. Alharbi, K. Alhouty, H. Baafeef, K. Alofi |
Abstract | In this case study, we design, integrate and implement a cloud-enabled autonomous robotic navigation system. The system has the following features: map generation and robot coordination via a cloud service, and video streaming to allow online monitoring and control in case of emergency. The system was tested by generating a map of a long corridor in two modes: manual and autonomous. The autonomous mode produced a more accurate map. In addition, the field experiments confirm the benefit of offloading the heavy computation to the cloud by significantly shortening the time required to build the map. |
Tasks | |
Published | 2019-02-21 |
URL | http://arxiv.org/abs/1902.08052v1 |
http://arxiv.org/pdf/1902.08052v1.pdf | |
PWC | https://paperswithcode.com/paper/cloud-based-autonomous-indoor-navigation-a |
Repo | |
Framework | |
Deep Ordinal Regression for Pledge Specificity Prediction
Title | Deep Ordinal Regression for Pledge Specificity Prediction |
Authors | Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin |
Abstract | Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability. At present, there are no publicly available annotated datasets of pledges, and most political analyses rely on manual analysis. In this paper we collate a novel dataset of manifestos from eleven Australian federal election cycles, with over 12,000 sentences annotated with specificity (e.g., rhetorical vs. detailed pledge) on a fine-grained scale. We propose deep ordinal regression approaches for specificity prediction, under both supervised and semi-supervised settings, and provide empirical results demonstrating the effectiveness of the proposed techniques over several baseline approaches. We analyze the utility of pledge specificity modeling across a spectrum of policy issues in performing ideology prediction, and further provide qualitative analysis in terms of capturing party-specific issue salience across election cycles. |
Tasks | |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.00187v1 |
https://arxiv.org/pdf/1909.00187v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-ordinal-regression-for-pledge |
Repo | |
Framework | |
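A small sketch of a generic ordinal-regression output layer of the kind the abstract refers to: a scalar score plus ordered thresholds yields cumulative probabilities P(y <= k) = sigmoid(theta_k - score), from which per-level probabilities follow. The score and thresholds below are illustrative numbers, not the paper's trained model.

```python
# Cumulative-link (threshold) ordinal regression head.
import numpy as np

def ordinal_probs(score, thresholds):
    thresholds = np.sort(thresholds)                      # enforce threshold ordering
    cum = 1.0 / (1.0 + np.exp(-(thresholds - score)))     # P(y <= k) for k = 0..K-2
    cum = np.concatenate([cum, [1.0]])                    # P(y <= K-1) = 1
    probs = np.diff(np.concatenate([[0.0], cum]))         # P(y = k)
    return probs

score = 1.3                                               # e.g. a predicted specificity score
thresholds = np.array([-1.0, 0.0, 1.0, 2.0])              # five ordinal levels
p = ordinal_probs(score, thresholds)
print(p, p.sum())                                         # probabilities sum to 1
```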
Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning
Title | Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning |
Authors | Felix Leibfried, Jordi Grau-Moya |
Abstract | Cumulative entropy regularization introduces a regulatory signal to the reinforcement learning (RL) problem that encourages policies with high-entropy actions, which is equivalent to enforcing small deviations from a uniform reference marginal policy. This has been shown to improve exploration and robustness, and it tackles the value overestimation problem. It also leads to a significant performance increase in tabular and high-dimensional settings, as demonstrated via algorithms such as soft Q-learning (SQL) and soft actor-critic (SAC). Cumulative entropy regularization has been extended to optimize over the reference marginal policy instead of keeping it fixed, yielding a regularization that minimizes the mutual information between states and actions. While this has been initially proposed for Markov Decision Processes (MDPs) in tabular settings, it was recently shown that a similar principle leads to significant improvements over vanilla SQL in RL for high-dimensional domains with discrete actions and function approximators. Here, we follow the motivation of mutual-information regularization from an inference perspective and theoretically analyze the corresponding Bellman operator. Inspired by this Bellman operator, we devise a novel mutual-information regularized actor-critic learning (MIRACLE) algorithm for continuous action spaces that optimizes over the reference marginal policy. We empirically validate MIRACLE in the Mujoco robotics simulator, where we demonstrate that it can compete with contemporary RL methods. Most notably, it can improve over the model-free state-of-the-art SAC algorithm which implicitly assumes a fixed reference policy. |
Tasks | Q-Learning |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05950v1 |
https://arxiv.org/pdf/1909.05950v1.pdf | |
PWC | https://paperswithcode.com/paper/mutual-information-regularization-in-markov |
Repo | |
Framework | |
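A worked numerical sketch of the mutual-information regularizer for discrete actions: the per-state penalty is (1/beta) * KL(pi(.|s) || rho), and the reference marginal rho that minimizes the expected penalty is the state-averaged policy, at which point the expected KL equals the mutual information I(S; A). The tables below are arbitrary; MIRACLE itself handles continuous actions with parametric policies.

```python
# Mutual-information penalty with the optimal (state-averaged) reference marginal.
import numpy as np

beta = 5.0
pi = np.array([[0.7, 0.2, 0.1],      # pi(a|s) for 3 actions in 2 states
               [0.1, 0.3, 0.6]])
state_weights = np.array([0.5, 0.5]) # visitation distribution over states

rho = state_weights @ pi             # optimal reference marginal = average policy
kl = np.sum(pi * np.log(pi / rho), axis=1)        # per-state KL(pi || rho)
penalty = kl / beta
mutual_information = state_weights @ kl           # equals I(S; A) under these tables

print("rho:", rho)
print("per-state penalty:", penalty)
print("mutual information:", mutual_information)
```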
Rethinking Temporal Object Detection from Robotic Perspectives
Title | Rethinking Temporal Object Detection from Robotic Perspectives |
Authors | Xingyu Chen, Zhengxing Wu, Junzhi Yu, Li Wen |
Abstract | Video object detection (VID) has been vigorously studied for years, but almost all the literature adopts a static accuracy-based evaluation, i.e., average precision (AP). From a robotic perspective, the importance of recall continuity and localization stability is equal to that of accuracy, but AP is insufficient to reflect detectors’ performance across time. In this paper, non-reference assessments are proposed for continuity and stability based on object tracklets. These temporal evaluations can serve as supplements to static AP. Further, we develop an online tracklet refinement for improving detectors’ temporal performance through short tracklet suppression, fragment filling, and temporal location fusion. In addition, we propose a small-overlap suppression to extend VID methods to the single object tracking (SOT) task, so that a flexible SOT-by-detection framework is formed. Extensive experiments are conducted on the ImageNet VID dataset and on real-world robotic tasks, where the superiority of our proposed approaches is validated and verified. Code will be publicly available. |
Tasks | Multi-Object Tracking, Object Detection, Object Tracking, Video Object Detection |
Published | 2019-12-22 |
URL | https://arxiv.org/abs/1912.10406v2 |
https://arxiv.org/pdf/1912.10406v2.pdf | |
PWC | https://paperswithcode.com/paper/continuity-stability-and-integration-novel |
Repo | |
Framework | |
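A hedged sketch of a reference-free, tracklet-based continuity check in the spirit of the abstract: count how often a tracked object's detections drop out and reappear across consecutive frames. The paper's actual continuity and stability metrics differ; this only illustrates the idea of scoring temporal behaviour from tracklets alone.

```python
# Count detection interruptions (fragments) inside a single tracklet.
def count_fragments(detected_flags):
    """detected_flags[t] is True if the object was detected at frame t."""
    fragments = 0
    for prev, cur in zip(detected_flags, detected_flags[1:]):
        if prev and not cur:          # the detection stream breaks here
            fragments += 1
    return fragments

tracklet = [True, True, False, False, True, True, True, False, True]
print("fragments:", count_fragments(tracklet))   # 2 interruptions
```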
EENA: Efficient Evolution of Neural Architecture
Title | EENA: Efficient Evolution of Neural Architecture |
Authors | Hui Zhu, Zhulin An, Chuanguang Yang, Kaiqiang Xu, Erhu Zhao, Yongjun Xu |
Abstract | The latest algorithms for automatic neural architecture search perform remarkably well but are largely undirected in the search space and computationally expensive, since every intermediate architecture must be trained. In this paper, we propose a method for efficient architecture search called EENA (Efficient Evolution of Neural Architecture). Due to the elaborately designed mutation and crossover operations, the evolution process can be guided by the information that has already been learned. Therefore, less computational effort is required, and the search and training time can be reduced significantly. On CIFAR-10 classification, EENA using minimal computational resources (0.65 GPU-days) can design a highly effective neural architecture that achieves 2.56% test error with 8.47M parameters. Furthermore, the best architecture discovered is also transferable to CIFAR-100. |
Tasks | Neural Architecture Search |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.07320v3 |
https://arxiv.org/pdf/1905.07320v3.pdf | |
PWC | https://paperswithcode.com/paper/190507320 |
Repo | |
Framework | |
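A toy evolutionary-search loop in the spirit of EENA: keep a small population of architecture encodings, score them, and produce children via mutation and crossover of the better parents. The operation list and fitness function are placeholders, and EENA's key ingredient of reusing already-learned weights is omitted here.

```python
# Minimal mutation/crossover evolution over toy architecture encodings.
import random

random.seed(0)
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_arch(length=6):
    return [random.choice(OPS) for _ in range(length)]

def fitness(arch):
    # Stand-in score; in practice this would come from (cheap) training + validation.
    return arch.count("conv3x3") + 0.5 * arch.count("conv5x5")

def mutate(arch):
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [random_arch() for _ in range(8)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]
    children = [mutate(random.choice(parents)) for _ in range(2)]
    children += [crossover(*random.sample(parents, 2)) for _ in range(2)]
    population = parents + children

print("best architecture:", max(population, key=fitness))
```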
Founded World Views with Autoepistemic Equilibrium Logic
Title | Founded World Views with Autoepistemic Equilibrium Logic |
Authors | Pedro Cabalar, Jorge Fandinno, Luis Fariñas |
Abstract | Defined by Gelfond in 1991 (G91), epistemic specifications (or programs) are an extension of logic programming under the stable models semantics that introduces subjective literals. A subjective literal allows checking whether some regular literal is true in all (or in some of) the stable models of the program, those models being collected in a set called a world view. One epistemic program may yield several world views but, under the original G91 semantics, some of them resulted from self-supported derivations. During the last eight years, several alternative approaches have been proposed to get rid of these self-supported world views. Unfortunately, their success could only be measured by studying their behaviour on a set of common examples in the literature, since no formal property of “self-supportedness” had been defined. To fill this gap, we extend in this paper the idea of unfounded sets from standard logic programming to the epistemic case. We define when a world view is founded with respect to some program and propose the foundedness property for any semantics whose world views are always founded. Using counterexamples, we explain that the previous approaches violate foundedness, and proceed to propose a new semantics based on a combination of Moore’s Autoepistemic Logic and Pearce’s Equilibrium Logic. The main result proves that this new semantics precisely captures the set of founded G91 world views. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07741v1 |
http://arxiv.org/pdf/1902.07741v1.pdf | |
PWC | https://paperswithcode.com/paper/founded-world-views-with-autoepistemic |
Repo | |
Framework | |
Network Based Pricing for 3D Printing Services in Two-Sided Manufacturing-as-a-Service Marketplace
Title | Network Based Pricing for 3D Printing Services in Two-Sided Manufacturing-as-a-Service Marketplace |
Authors | Deepak Pahwa, Binil Starly |
Abstract | This paper presents approaches to determine network based pricing for 3D printing services in the context of a two-sided manufacturing-as-a-service marketplace. The intent is to provide cost analytics to enable service bureaus to better compete in the market by moving away from setting ad-hoc and subjective prices. A data mining approach with machine learning methods is used to estimate a price range based on the profile characteristics of 3D printing service suppliers. The model considers factors such as supplier experience, supplier capabilities, customer reviews and ratings from past orders, and scale of operations, among others, to estimate a price range for suppliers’ services. Data was gathered from existing marketplace websites and used to train and test the model. The model achieves an accuracy of 65% for US-based suppliers and 59% for Europe-based suppliers in classifying a supplier’s 3D printer listing into one of seven price categories. The improvement over a baseline accuracy of 25% demonstrates that machine learning based methods are promising for network based pricing in manufacturing marketplaces. Conventional methodologies for pricing services through activity-based costing are inefficient for strategically pricing 3D printing service offerings in a connected marketplace. As opposed to arbitrarily determining prices, this work proposes an approach to determine prices through data mining methods that estimate competitive prices. Such tools can be built into online marketplaces to help independent service bureaus determine service price rates. |
Tasks | |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07673v1 |
https://arxiv.org/pdf/1907.07673v1.pdf | |
PWC | https://paperswithcode.com/paper/network-based-pricing-for-3d-printing |
Repo | |
Framework | |
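An illustrative sketch of the kind of supplier-profile classifier the abstract describes: profile features (experience, rating, order history, capabilities) are mapped to one of seven price categories. The data is synthetic and the random-forest model is an assumption, not necessarily the method used in the paper.

```python
# Synthetic seven-category price classification from supplier profile features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 500
features = np.column_stack([
    rng.integers(0, 15, n),        # years of experience
    rng.uniform(1, 5, n),          # average customer rating
    rng.integers(0, 200, n),       # number of past orders
    rng.integers(1, 10, n),        # number of listed capabilities
])
# Synthetic price category (0-6), loosely driven by the features plus noise.
score = 0.3 * features[:, 0] + features[:, 1] + 0.01 * features[:, 2] + rng.normal(0, 1, n)
labels = np.digitize(score, np.quantile(score, np.linspace(0, 1, 8)[1:-1]))

X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", round(model.score(X_test, y_test), 2))
```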