Paper Group ANR 1650
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
Title | Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding |
Authors | Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong |
Abstract | Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as intermediate. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model where the second model can use the hidden state from the first one, but its cascade manner greatly affects the efficiency of training and inference process. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations not only relies on its previous outputs but also the outputs predicted in the other task. Experiments on TED speech translation corpora have shown that our proposed model can outperform strong baselines on the quality of speech translation and achieve better speech recognition performances as well. |
Tasks | Multi-Task Learning, Speech Recognition |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07240v1 |
https://arxiv.org/pdf/1912.07240v1.pdf | |
PWC | https://paperswithcode.com/paper/synchronous-speech-recognition-and-speech-to |
Repo | |
Framework | |
Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation
Title | Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation |
Authors | Zengming Shen, Yifan Chen, S. Kevin Zhou, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang |
Abstract | The one-to-one mapping is necessary for many bidirectional image-to-image translation applications, such as MRI image synthesis as MRI images are unique to the patient. State-of-the-art approaches for image synthesis from domain X to domain Y learn a convolutional neural network that meticulously maps between the domains. A different network is typically implemented to map along the opposite direction, from Y to X. In this paper, we explore the possibility of only wielding one network for bi-directional image synthesis. In other words, such an autonomous learning network implements a self-inverse function. A self-inverse network shares several distinct advantages: only one network instead of two, better generalization and more restricted parameter space. Most importantly, a self-inverse function guarantees a one-to-one mapping, a property that cannot be guaranteed by earlier approaches that are not self-inverse. The experiments on three datasets show that, compared with the baseline approaches that use two separate models for the image synthesis along two directions, our self-inverse network achieves better synthesis results in terms of standard metrics. Finally, our sensitivity analysis confirms the feasibility of learning a self-inverse function for the bidirectional image translation. |
Tasks | Image Generation, Image-to-Image Translation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04104v2 |
https://arxiv.org/pdf/1909.04104v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-learning-a-self-inverse-network-for |
Repo | |
Framework | |
On Flow Profile Image for Video Representation
Title | On Flow Profile Image for Video Representation |
Authors | Mohammadreza Babaee, David Full, Gerhard Rigoll |
Abstract | Video representation is a key challenge in many computer vision applications such as video classification, video captioning, and video surveillance. In this paper, we propose a novel approach for video representation that captures meaningful information including motion and appearance from a sequence of video frames and compacts it into a single image. To this end, we compute the optical flow and use it in a least squares optimization to find a new image, the so-called Flow Profile Image (FPI). This image encodes motions as well as foreground appearance information while background information is removed. The quality of this image is validated in activity recognition experiments and the results are compared with other video representation techniques such as dynamic images [1] and eigen images [2]. The experimental results as well as visual quality confirm that FPIs can be successfully used in video processing applications. |
Tasks | Activity Recognition, Optical Flow Estimation, Video Captioning, Video Classification |
Published | 2019-05-12 |
URL | https://arxiv.org/abs/1905.04668v1 |
https://arxiv.org/pdf/1905.04668v1.pdf | |
PWC | https://paperswithcode.com/paper/on-flow-profile-image-for-video |
Repo | |
Framework | |
Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing
Title | Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing |
Authors | Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan |
Abstract | We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05153v1 |
https://arxiv.org/pdf/1912.05153v1.pdf | |
PWC | https://paperswithcode.com/paper/sampling-for-bayesian-mixture-models-mcmc |
Repo | |
Framework | |
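To illustrate the kind of sampler the abstract discusses, here is a minimal sketch of a plain random-walk Metropolis-Hastings chain on a one-dimensional symmetric two-component Gaussian mixture. It is not the RMRW algorithm from the paper: the reflection step is omitted, and the Gaussian prior, the tempering exponent `beta`, the step size, and all function names are assumptions made for this example.

```python
import math
import random


def log_power_posterior(theta, data, beta=1.0, prior_std=10.0):
    """Unnormalized log density for the mixture 0.5*N(theta, 1) + 0.5*N(-theta, 1).

    The likelihood is raised to the power `beta` (assumed form of the power
    posterior); the N(0, prior_std^2) prior is also an assumption.
    """
    log_lik = 0.0
    for x in data:
        a = abs(x * theta)
        # log[0.5*exp(-(x-theta)^2/2) + 0.5*exp(-(x+theta)^2/2)] up to a constant,
        # written as -(x^2 + theta^2)/2 + log cosh(x*theta) in a stable form.
        log_cosh = a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)
        log_lik += -0.5 * (x * x + theta * theta) + log_cosh
    log_prior = -0.5 * (theta / prior_std) ** 2
    return beta * log_lik + log_prior


def random_walk_metropolis(data, n_steps, step=0.2, theta_init=0.5, beta=1.0, seed=0):
    """Plain random-walk Metropolis-Hastings with a Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta_init
    current = log_power_posterior(theta, data, beta)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_power_posterior(proposal, data, beta)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain
```

Because the density is symmetric in theta, the chain settles near one of the two modes at plus or minus the true mean; summarizing with the absolute value of theta avoids depending on which mode it reaches.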
Emergent Coordination Through Competition
Title | Emergent Coordination Through Competition |
Authors | Siqi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel |
Abstract | We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents’ behaviors: from random, to simple ball chasing, and finally showing evidence of cooperation. Our study highlights several of the challenges encountered in large scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to co-operative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded by game theoretic principles, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines. |
Tasks | Continuous Control |
Published | 2019-02-19 |
URL | http://arxiv.org/abs/1902.07151v2 |
http://arxiv.org/pdf/1902.07151v2.pdf | |
PWC | https://paperswithcode.com/paper/emergent-coordination-through-competition |
Repo | |
Framework | |
A study for Image compression using Re-Pair algorithm
Title | A study for Image compression using Re-Pair algorithm |
Authors | Pasquale De Luca, Vincenzo Maria Russiello, Raffaele Ciro Sannino, Lorenzo Valente |
Abstract | Compression is an important topic in computer science because it allows us to store more data on our storage devices. There are several techniques for compressing any file. This manuscript describes the most important algorithm for compressing images, JPEG, and compares it with another method in order to find good reasons not to use that method on images. For text compression, the best-known encoding technique is Huffman encoding, which is explained exhaustively. The manuscript shows how to apply a text compression method to images, in particular the method itself and the reasons for choosing one image format over another. The method studied and analyzed in this manuscript is the Re-Pair algorithm, which is intended purely for compressing grammatical contexts. At the end, the good results of this application are shown. |
Tasks | Image Compression |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10744v3 |
http://arxiv.org/pdf/1901.10744v3.pdf | |
PWC | https://paperswithcode.com/paper/a-study-for-image-compression-using-re-pair |
Repo | |
Framework | |
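As a rough illustration of the Re-Pair idea named in the abstract, the sketch below repeatedly replaces the most frequent adjacent pair of symbols with a fresh grammar symbol until no pair occurs twice. It is a simple quadratic-time version, not the paper's implementation; the function names and the `("R", n)` rule symbols are assumptions for this example, and pair counts naively include overlapping occurrences.

```python
from collections import Counter


def repair_compress(symbols):
    """Return (compressed sequence, grammar rules) for a sequence of hashable symbols."""
    seq = list(symbols)
    rules = {}  # new symbol -> (left, right)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no further rule pays off
        new_symbol = ("R", next_id)
        next_id += 1
        rules[new_symbol] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def repair_expand(seq, rules):
    """Invert repair_compress by expanding rule symbols back into their pairs."""
    out = []
    stack = list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            stack.append(right)
            stack.append(left)
        else:
            out.append(symbol)
    return out
```

The same functions accept a row of pixel values, for example `repair_compress([0, 0, 255, 255, 0, 0, 255, 255])`, which is one hypothetical way a text method could be pointed at image data.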
Low-dimensional Semantic Space: from Text to Word Embedding
Title | Low-dimensional Semantic Space: from Text to Word Embedding |
Authors | Xiaolei Lu, Bin Ni |
Abstract | This article focuses on the study of Word Embedding, a feature-learning technique in Natural Language Processing that maps words or phrases to low-dimensional vectors. Beginning with the linguistic theories concerning contextual similarities - “Distributional Hypothesis” and “Context of Situation”, this article introduces two ways of numerical representation of text: One-hot and Distributed Representation. In addition, this article presents statistics-based Language Models (such as Co-occurrence Matrix and Singular Value Decomposition) as well as Neural Network Language Models (NNLM, such as Continuous Bag-of-Words and Skip-Gram). This article also analyzes how Word Embedding can be applied to the study of word-sense disambiguation and diachronic linguistics. |
Tasks | Word Sense Disambiguation |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00845v1 |
https://arxiv.org/pdf/1911.00845v1.pdf | |
PWC | https://paperswithcode.com/paper/low-dimensional-semantic-space-from-text-to |
Repo | |
Framework | |
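The two representations mentioned in the abstract can be sketched in a few lines. The example below builds one-hot vectors and a symmetric co-occurrence matrix for a toy sentence; the window size, the toy corpus, and the function names are assumptions for illustration. The Singular Value Decomposition step that would reduce the matrix to low-dimensional vectors is left out.

```python
def one_hot(word, vocab):
    """One-hot representation: a vector with a single 1 at the word's index."""
    return [1 if w == word else 0 for w in vocab]


def cooccurrence_matrix(tokens, vocab, window=1):
    """Count how often each pair of words appears within `window` positions."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in vocab]
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word]][index[tokens[j]]] += 1
    return counts


tokens = "i like nlp i like code".split()
vocab = sorted(set(tokens))  # ['code', 'i', 'like', 'nlp']
matrix = cooccurrence_matrix(tokens, vocab)
```

Each row of `matrix` is a distributed representation of one word: words that appear in similar contexts get similar rows, which is the distributional hypothesis in miniature.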
Improving Graph Attention Networks with Large Margin-based Constraints
Title | Improving Graph Attention Networks with Large Margin-based Constraints |
Authors | Guangtao Wang, Rex Ying, Jing Huang, Jure Leskovec |
Abstract | Graph Attention Networks (GATs) are the state-of-the-art neural architecture for representation learning with graphs. GATs learn attention functions that assign weights to nodes so that different nodes have different influences in the feature aggregation steps. In practice, however, induced attention functions are prone to over-fitting due to the increasing number of parameters and the lack of direct supervision on attention weights. GATs also suffer from over-smoothing at the decision boundary of nodes. Here we propose a framework to address their weaknesses via margin-based constraints on attention during training. We first theoretically demonstrate the over-smoothing behavior of GATs and then develop an approach using constraint on the attention weights according to the class boundary and feature aggregation pattern. Furthermore, to alleviate the over-fitting problem, we propose additional constraints on the graph structure. Extensive experiments and ablation studies on common benchmark datasets demonstrate the effectiveness of our method, which leads to significant improvements over the previous state-of-the-art graph attention methods on all datasets. |
Tasks | Representation Learning |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11945v1 |
https://arxiv.org/pdf/1910.11945v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-graph-attention-networks-with-large |
Repo | |
Framework | |
Multiscale Self Attentive Convolutions for Vision and Language Modeling
Title | Multiscale Self Attentive Convolutions for Vision and Language Modeling |
Authors | Oren Barkan |
Abstract | Self attention mechanisms have become a key building block in many state-of-the-art language understanding models. In this paper, we show that the self attention operator can be formulated in terms of 1x1 convolution operations. Following this observation, we propose several novel operators: First, we introduce a 2D version of self attention that is applicable for 2D signals such as images. Second, we present the 1D and 2D Self Attentive Convolutions (SAC) operator that generalizes self attention beyond 1x1 convolutions to 1xm and nxm convolutions, respectively. While 1D and 2D self attention operate on individual words and pixels, SAC operates on m-grams and image patches, respectively. Third, we present a multiscale version of SAC (MSAC) which analyzes the input by employing multiple SAC operators that vary by filter size, in parallel. Finally, we explain how MSAC can be utilized for vision and language modeling, and further harness MSAC to form a cross attentive image similarity machinery. |
Tasks | Language Modelling |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01521v1 |
https://arxiv.org/pdf/1912.01521v1.pdf | |
PWC | https://paperswithcode.com/paper/multiscale-self-attentive-convolutions-for |
Repo | |
Framework | |
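The claim that self attention can be written with 1x1 convolutions can be checked on a small example. The sketch below computes queries, keys, and values with a 1D convolution whose filters have width 1, which is the same as applying a linear map to every position independently. It is an illustration under assumptions, not the paper's SAC operator: the scaling by the square root of the feature size follows the common Transformer convention, and all names are hypothetical.

```python
import math


def conv1d(x, kernels):
    """Valid 1D convolution (cross-correlation) over a sequence of feature vectors.

    x: list of T vectors, each with c_in entries.
    kernels: list of c_out filters; each filter is a list of m rows of c_in weights.
    """
    m = len(kernels[0])
    c_in = len(x[0])
    out = []
    for t in range(len(x) - m + 1):
        out.append([
            sum(k[j][c] * x[t + j][c] for j in range(m) for c in range(c_in))
            for k in kernels
        ])
    return out


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self attention with Q, K, V produced by conv1d."""
    q, k, v = conv1d(x, wq), conv1d(x, wk), conv1d(x, wv)
    scale = math.sqrt(len(q[0]))
    weights = [
        softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        for qi in q
    ]
    out = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    return out, weights
```

Passing filters with more than one row (m > 1) makes each query, key, and value depend on an m-gram instead of a single token, which is the direction in which the abstract says SAC generalizes self attention.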
Microservices based Framework to Support Interoperable IoT Applications for Enhanced Data Analytics
Title | Microservices based Framework to Support Interoperable IoT Applications for Enhanced Data Analytics |
Authors | Sajjad Ali, Muhammad Aslam Jarwar, Ilyoung Chong |
Abstract | Internet of things is growing with a large number of diverse objects which generate billions of data streams by sensing, actuating and communicating. Management of heterogeneous IoT objects with existing approaches and processing of myriads of data from these objects using monolithic services have become major challenges in developing effective IoT applications. The heterogeneity can be resolved by providing interoperability with semantic virtualization of objects. Moreover, monolithic services can be substituted with modular microservices. This article presents an architecture that enables the development of IoT applications using semantically interoperable microservices and virtual objects. The proposed framework supports analytic features with knowledge-driven and data-driven techniques to provision intelligent services on top of interoperable microservices in Web Objects enabled IoT environment. The knowledge-driven aspects are supported with reasoning on semantic ontology models and the data-driven aspects are realized with machine learning pipeline. The development of service functionalities is supported with microservices to enhance modularity and reusability. To evaluate the proposed framework a proof of concept implementation with a use case is discussed. |
Tasks | |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08713v1 |
https://arxiv.org/pdf/1910.08713v1.pdf | |
PWC | https://paperswithcode.com/paper/microservices-based-framework-to-support |
Repo | |
Framework | |
Blessing of dimensionality at the edge
Title | Blessing of dimensionality at the edge |
Authors | Ivan Y. Tyukin, Alexander N. Gorban, Alistair A. McEwan, Sepehr Meshkinfamfard |
Abstract | In this paper we present theory and algorithms enabling classes of Artificial Intelligence (AI) systems to continuously and incrementally improve with a-priori quantifiable guarantees - or more specifically remove classification errors - over time. This is distinct from state-of-the-art machine learning, AI, and software approaches. Another feature of this approach is that, in the supervised setting, the computational complexity of training is linear in the number of training samples. At the time of classification, the computational complexity is bounded by a few inner product calculations. Moreover, the implementation is shown to be very scalable. This makes it viable for deployment in applications where computational power and memory are limited, such as embedded environments. It enables the possibility for fast on-line optimisation using improved training samples. The approach is based on the concentration of measure effects and stochastic separation theorems. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1910.00445v1 |
https://arxiv.org/pdf/1910.00445v1.pdf | |
PWC | https://paperswithcode.com/paper/blessing-of-dimensionality-at-the-edge |
Repo | |
Framework | |
A novel dynamic asset allocation system using Feature Saliency Hidden Markov models for smart beta investing
Title | A novel dynamic asset allocation system using Feature Saliency Hidden Markov models for smart beta investing |
Authors | Elizabeth Fons, Paula Dawson, Jeffrey Yau, Xiao-jun Zeng, John Keane |
Abstract | The financial crisis of 2008 generated interest in more transparent, rules-based strategies for portfolio construction, with Smart beta strategies emerging as a trend among institutional investors. While they perform well in the long run, these strategies often suffer from severe short-term drawdown (peak-to-trough decline) with fluctuating performance across cycles. To address cyclicality and underperformance, we build a dynamic asset allocation system using Hidden Markov Models (HMMs). We test our system across multiple combinations of smart beta strategies and the resulting portfolios show an improvement in risk-adjusted returns, especially on more return oriented portfolios (up to 50% in excess of market annually). In addition, we propose a novel smart beta allocation system based on the Feature Saliency HMM (FSHMM) algorithm that performs feature selection simultaneously with the training of the HMM, to improve regime identification. We evaluate our systematic trading system with real life assets using MSCI indices; further, the results (up to 60% in excess of market annually) show model performance improvement with respect to portfolios built using full feature HMMs. |
Tasks | Feature Selection |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.10849v1 |
http://arxiv.org/pdf/1902.10849v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-dynamic-asset-allocation-system-using |
Repo | |
Framework | |
Communication-Censored Linearized ADMM for Decentralized Consensus Optimization
Title | Communication-Censored Linearized ADMM for Decentralized Consensus Optimization |
Authors | Weiyu Li, Yaohua Liu, Zhi Tian, Qing Ling |
Abstract | In this paper, we propose a communication- and computation-efficient algorithm to solve a convex consensus optimization problem defined over a decentralized network. A remarkable existing algorithm to solve this problem is the alternating direction method of multipliers (ADMM), in which at every iteration every node updates its local variable by combining neighboring variables and solving an optimization subproblem. The proposed algorithm, called COmmunication-censored Linearized ADMM (COLA), leverages a linearization technique to reduce the iteration-wise computation cost of ADMM and uses a communication-censoring strategy to alleviate the communication cost. To be specific, COLA introduces successive linearization approximations to the local cost functions such that the resultant computation is first-order and light-weight. Since the linearization technique slows down the convergence speed, COLA further adopts the communication-censoring strategy to avoid transmissions of less informative messages. A node is allowed to transmit only if the distance between the current local variable and its previously transmitted one is larger than a censoring threshold. COLA is proven to be convergent when the local cost functions have Lipschitz continuous gradients and the censoring threshold is summable. When the local cost functions are further strongly convex, we establish the linear (sublinear) convergence rate of COLA, given that the censoring threshold linearly (sublinearly) decays to 0. Numerical experiments corroborate the theoretical findings and demonstrate the satisfactory communication-computation tradeoff of COLA. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06724v1 |
https://arxiv.org/pdf/1909.06724v1.pdf | |
PWC | https://paperswithcode.com/paper/communication-censored-linearized-admm-for |
Repo | |
Framework | |
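The censoring rule described in the abstract is simple enough to sketch on its own. The code below decides whether a node transmits by comparing the distance between its current local variable and the last transmitted one against a decaying threshold. The Euclidean norm, the geometric schedule `alpha * rho**k`, and the function names are assumptions for this example; the linearized ADMM updates themselves are not shown.

```python
import math


def censoring_threshold(k, alpha=1.0, rho=0.9):
    """Assumed geometric schedule tau_k = alpha * rho**k (summable for 0 < rho < 1)."""
    return alpha * rho ** k


def should_transmit(current, last_sent, threshold):
    """Transmit only if the local variable moved farther than the threshold."""
    distance = math.sqrt(sum((c - s) ** 2 for c, s in zip(current, last_sent)))
    return distance > threshold


def run_node(iterates):
    """Walk through a node's iterates and record at which iterations it transmits."""
    last_sent = iterates[0]
    sent_at = []
    for k, current in enumerate(iterates[1:], start=1):
        if should_transmit(current, last_sent, censoring_threshold(k)):
            last_sent = current  # neighbors now hold this value
            sent_at.append(k)
    return sent_at
```

A geometrically decaying threshold is one schedule that is summable and decays linearly to 0, matching the conditions the abstract states for convergence; small early moves are censored, while later moves of the same size pass once the threshold has shrunk.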
Neural Architecture Refinement: A Practical Way for Avoiding Overfitting in NAS
Title | Neural Architecture Refinement: A Practical Way for Avoiding Overfitting in NAS |
Authors | Yang Jiang, Cong Zhao, Zeyang Dou, Lei Pang |
Abstract | Neural architecture search (NAS) is proposed to automate the architecture design process and attracts overwhelming interest from both academia and industry. However, it is confronted with an overfitting issue due to the high-dimensional search space composed of the operator selection and skip connections of each layer. This paper explores the architecture overfitting issue in depth based on the reinforcement learning-based NAS framework. We show that the policy gradient method has deep correlations with the cross entropy minimization. Based on this correlation, we further demonstrate that, though the reward of NAS is sparse, the policy gradient method implicitly assigns the reward to all operations and skip connections based on the sampling frequency. However, due to the inaccurate reward estimation, the curse of dimensionality and the hierarchical structure of neural networks, reward characteristics for operators and skip connections have intrinsic differences, and the assigned rewards for the skip connections are extremely noisy and inaccurate. To alleviate this problem, we propose a neural architecture refinement approach that works with an initial state-of-the-art network structure and only refines its operators. Extensive experiments have demonstrated that the proposed method can achieve impressive results on tasks including classification and face recognition. |
Tasks | Face Recognition, Neural Architecture Search |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02341v3 |
https://arxiv.org/pdf/1905.02341v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-architecture-refinement-a-practical |
Repo | |
Framework | |
Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents
Title | Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents |
Authors | Nikolaos Malandrakis, Minmin Shen, Anuj Goyal, Shuyang Gao, Abhishek Sethi, Angeliki Metallinou |
Abstract | Data availability is a bottleneck during early stages of development of new capabilities for intelligent artificial agents. We investigate the use of text generation techniques to augment the training data of a popular commercial artificial agent across categories of functionality, with the goal of faster development of new functionality. We explore a variety of encoder-decoder generative models for synthetic training data generation and propose using conditional variational auto-encoders. Our approach requires only direct optimization, works well with limited data and significantly outperforms the previous controlled text generation techniques. Further, the generated data are used as additional training samples in an extrinsic intent classification task, leading to improved performance by up to 5% absolute f-score in low-resource cases, validating the usefulness of our approach. |
Tasks | Data Augmentation, Intent Classification, Text Generation |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.03487v1 |
https://arxiv.org/pdf/1910.03487v1.pdf | |
PWC | https://paperswithcode.com/paper/controlled-text-generation-for-data |
Repo | |
Framework | |