January 27, 2020

Paper Group ANR 1258

On using 2D sequence-to-sequence models for speech recognition

Title On using 2D sequence-to-sequence models for speech recognition
Authors Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Abstract Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more explicit alignment processes, like in classical HMM-based modeling. In contrast, here we apply a novel two-dimensional long short-term memory (2DLSTM) architecture to directly model the input/output relation between audio/feature vector sequences and word sequences. The proposed model is an alternative model such that instead of using any type of attention components, we apply a 2DLSTM layer to assimilate the context from both input observations and output transcriptions. The experimental evaluation on the Switchboard 300h automatic speech recognition task shows word error rates for the 2DLSTM model that are competitive with those of an end-to-end attention-based model.
Tasks Speech Recognition
Published 2019-11-20
URL https://arxiv.org/abs/1911.08888v1
PDF https://arxiv.org/pdf/1911.08888v1.pdf
PWC https://paperswithcode.com/paper/on-using-2d-sequence-to-sequence-models-for
Repo
Framework

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

Title End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Authors Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert
Abstract We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions. We perform experiments on the standard LibriSpeech dataset, and leverage additional unlabeled data from LibriVox through pseudo-labeling. We show that while Transformer-based acoustic models have superior performance with the supervised dataset alone, semi-supervision improves all models across architectures and loss functions and bridges much of the performance gaps between them. In doing so, we reach a new state-of-the-art for end-to-end acoustic models decoded with an external language model in the standard supervised learning setting, and a new absolute state-of-the-art with semi-supervised training. Finally, we study the effect of leveraging different amounts of unlabeled audio, propose several ways of evaluating the characteristics of unlabeled audio which improve acoustic modeling, and show that acoustic models trained with more audio rely less on external language models.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-11-19
URL https://arxiv.org/abs/1911.08460v2
PDF https://arxiv.org/pdf/1911.08460v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-asr-from-supervised-to-semi
Repo
Framework

The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

Title The Odds are Odd: A Statistical Test for Detecting Adversarial Examples
Authors Kevin Roth, Yannic Kilcher, Thomas Hofmann
Abstract We investigate conditions under which test statistics exist that can reliably detect examples that have been adversarially manipulated in a white-box attack. These statistics can be easily computed and calibrated by randomly corrupting inputs. They exploit certain anomalies that adversarial attacks introduce, in particular if they follow the paradigm of choosing perturbations optimally under p-norm constraints. Access to the log-odds is the only requirement to defend models. We justify our approach empirically, but also provide conditions under which detectability via the suggested test statistics is guaranteed to be effective. In our experiments, we show that it is even possible to correct test time predictions for adversarial attacks with high accuracy.
Tasks
Published 2019-02-13
URL https://arxiv.org/abs/1902.04818v2
PDF https://arxiv.org/pdf/1902.04818v2.pdf
PWC https://paperswithcode.com/paper/the-odds-are-odd-a-statistical-test-for
Repo
Framework

Generating Dialogue Agents via Automated Planning

Title Generating Dialogue Agents via Automated Planning
Authors Adi Botea, Christian Muise, Shubham Agarwal, Oznur Alkan, Ondrej Bajgar, Elizabeth Daly, Akihiro Kishimoto, Luis Lastras, Radu Marinescu, Josef Ondrej, Pablo Pedemonte, Miroslav Vodolan
Abstract Dialogue systems have many applications such as customer support or question answering. Typically they have been limited to shallow single-turn interactions. However, more advanced applications such as career coaching or planning a trip require a much more complex multi-turn dialogue. Current limitations of conversational systems have made it difficult to support applications that require personalization, customization and context-dependent interactions. We tackle this challenging problem by using domain-independent AI planning to automatically create dialogue plans, customized to guide a dialogue towards achieving a given goal. The input includes a library of atomic dialogue actions, an initial state of the dialogue, and a goal. Dialogue plans are plugged into a dialogue system capable of orchestrating their execution. Use cases demonstrate the viability of the approach. Our work on dialogue planning has been integrated into a product, and it is in the process of being deployed into another.
Tasks Question Answering
Published 2019-02-02
URL http://arxiv.org/abs/1902.00771v1
PDF http://arxiv.org/pdf/1902.00771v1.pdf
PWC https://paperswithcode.com/paper/generating-dialogue-agents-via-automated
Repo
Framework

Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors

Title Deep MR Brain Image Super-Resolution Using Spatio-Structural Priors
Authors Venkateswararao Cherukuri, Tiantong Guo, Steve. J. Schiff, Vishal Monga
Abstract High-resolution Magnetic Resonance (MR) images are desired for accurate diagnostics. In practice, image resolution is restricted by factors like hardware and processing constraints. Recently, deep learning methods have been shown to produce compelling state-of-the-art results for image enhancement/super-resolution. Paying particular attention to the desired high-resolution MR image structure, we propose a new regularized network that exploits image priors, namely a low-rank structure and a sharpness prior, to enhance deep MR image super-resolution (SR). Our contributions are incorporating these priors in an analytically tractable fashion and developing a novel prior-guided network architecture that accomplishes the super-resolution task. This is particularly challenging for the low-rank prior since the rank is not a differentiable function of the image matrix (and hence the network parameters), an issue we address by pursuing differentiable approximations of the rank. Sharpness is emphasized by the variance of the Laplacian, which we show can be implemented by a fixed feedback layer at the output of the network. As a key extension, we modify the fixed feedback (Laplacian) layer by learning a new set of training-data-driven filters that are optimized for enhanced sharpness. Experiments performed on publicly available MR brain image databases and comparisons against existing state-of-the-art methods show that the proposed prior-guided network offers significant practical gains in terms of improved SNR/image quality measures. Because our priors are on output images, the proposed method is versatile and can be combined with a wide variety of existing network architectures to further enhance their performance.
Tasks Image Enhancement, Image Super-Resolution, Super-Resolution
Published 2019-09-10
URL https://arxiv.org/abs/1909.04572v1
PDF https://arxiv.org/pdf/1909.04572v1.pdf
PWC https://paperswithcode.com/paper/deep-mr-brain-image-super-resolution-using
Repo
Framework
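
The two priors named in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 4-neighbour Laplacian stencil and the use of the nuclear norm (sum of singular values) as a rank surrogate are assumptions chosen for illustration, and the function names are hypothetical. The abstract only says that differentiable approximations of the rank are pursued, without naming one.

```python
import numpy as np


def laplacian(img: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian on interior pixels (assumed stencil)."""
    x = img.astype(np.float64)
    return (
        x[:-2, 1:-1] + x[2:, 1:-1]      # up + down
        + x[1:-1, :-2] + x[1:-1, 2:]    # left + right
        - 4.0 * x[1:-1, 1:-1]           # centre
    )


def variance_of_laplacian(img: np.ndarray) -> float:
    """Sharpness score: larger values indicate stronger edges."""
    return float(laplacian(img).var())


def nuclear_norm(img: np.ndarray) -> float:
    """Sum of singular values, a common convex surrogate for matrix rank."""
    singular_values = np.linalg.svd(img.astype(np.float64), compute_uv=False)
    return float(singular_values.sum())


if __name__ == "__main__":
    sharp = np.zeros((16, 16))
    sharp[:, 8:] = 1.0                                # hard vertical edge
    ramp = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))  # smooth gradient
    print(variance_of_laplacian(sharp) > variance_of_laplacian(ramp))
    print(nuclear_norm(sharp), nuclear_norm(ramp))
```

In a training setup, terms like these would typically be added to the reconstruction loss with weighting coefficients; how the paper weights them is not stated in the abstract.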

A survey of cross-lingual features for zero-shot cross-lingual semantic parsing

Title A survey of cross-lingual features for zero-shot cross-lingual semantic parsing
Authors Jingfeng Yang, Federico Fancellu, Bonnie Webber
Abstract The availability of corpora to train semantic parsers in English has led to significant advances in the field. Unfortunately, for languages other than English, annotation is scarce and so are developed parsers. We then ask: could a parser trained in English be applied to a language that it hasn’t been trained on? To answer this question we explore zero-shot cross-lingual semantic parsing where we train an available coarse-to-fine semantic parser (Liu et al., 2018) using cross-lingual word embeddings and universal dependencies in English and test it on Italian, German and Dutch. Results on the Parallel Meaning Bank, a multilingual semantic graphbank, show that Universal Dependency features significantly boost performance when used in conjunction with other lexical features but modelling the UD structure directly when encoding the input does not.
Tasks Semantic Parsing, Word Embeddings
Published 2019-08-27
URL https://arxiv.org/abs/1908.10461v1
PDF https://arxiv.org/pdf/1908.10461v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-cross-lingual-features-for-zero
Repo
Framework

Optimized Realization of Bayesian Networks in Reduced Normal Form using Latent Variable Model

Title Optimized Realization of Bayesian Networks in Reduced Normal Form using Latent Variable Model
Authors Giovanni Di Gennaro, Amedeo Buonanno, Francesco A. N. Palmieri
Abstract Bayesian networks in their Factor Graph Reduced Normal Form (FGrn) are a powerful paradigm for implementing inference graphs. Unfortunately, the computational and memory costs of these networks may be considerable, even for relatively small networks, and this is one of the main reasons why these structures have often been underused in practice. In this work, through a detailed algorithmic and structural analysis, various solutions for cost reduction are proposed. An online version of the classic batch learning algorithm is also analyzed, showing very similar results (in an unsupervised context), which is essential even if multilevel structures are to be built. The solutions proposed, together with the possible online learning algorithm, are included in a C++ library that is quite efficient, especially if compared to the direct use of the well-known sum-product and Maximum Likelihood (ML) algorithms. The results are discussed with particular reference to a Latent Variable Model (LVM) structure.
Tasks
Published 2019-01-18
URL http://arxiv.org/abs/1901.06201v1
PDF http://arxiv.org/pdf/1901.06201v1.pdf
PWC https://paperswithcode.com/paper/optimized-realization-of-bayesian-networks-in
Repo
Framework

Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks

Title Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks
Authors Qiu Ruihong, Li Jingjing, Huang Zi, Yin Hongzhi
Abstract Predicting a user’s preference in a short anonymous interaction session instead of long-term history is a challenging problem in real-life session-based recommendation, e.g., e-commerce and media stream. Recent research on session-based recommender systems mainly focuses on sequential patterns by utilizing the attention mechanism, which is straightforward for the session’s natural sequence sorted by time. However, the user’s preference is much more complicated than a solely consecutive time pattern in the transition of item choices. In this paper, therefore, we study the item transition pattern by constructing a session graph and propose a novel model which collaboratively considers the sequence order and the latent order in the session graph for a session-based recommender system. We formulate the next item recommendation within the session as a graph classification problem. Specifically, we propose a weighted attention graph layer and a Readout function to learn embeddings of items and sessions for the next item recommendation. Extensive experiments have been conducted on two benchmark E-commerce datasets, Yoochoose and Diginetica, and the experimental results show that our model outperforms other state-of-the-art methods.
Tasks Graph Classification, Recommendation Systems, Session-Based Recommendations
Published 2019-11-27
URL https://arxiv.org/abs/1911.11942v1
PDF https://arxiv.org/pdf/1911.11942v1.pdf
PWC https://paperswithcode.com/paper/rethinking-the-item-order-in-session-based
Repo
Framework
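
To make the "session graph" idea in the abstract above concrete, here is a minimal sketch of one plausible construction: items become nodes, and each consecutive click pair becomes a directed edge weighted by how often that transition occurs. The function name and the weighting scheme are assumptions for illustration. The abstract does not specify the paper's exact graph construction (for example, self-loops or edge normalization).

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def build_session_graph(session: Sequence[Hashable]) -> Tuple[List[Hashable], Dict[Edge, int]]:
    """Turn a time-ordered click session into (nodes, weighted directed edges)."""
    # Unique items, kept in first-seen order so node indices are stable.
    nodes = list(dict.fromkeys(session))
    # Count each consecutive (source, target) transition in the session.
    edges = Counter(zip(session, session[1:]))
    return nodes, dict(edges)


if __name__ == "__main__":
    nodes, edges = build_session_graph(["a", "b", "c", "b", "c"])
    print(nodes)
    print(edges)
```

A graph layer would then operate on these nodes and edges, while the original click order remains available from the input sequence itself.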

Deep Density: circumventing the Kohn-Sham equations via symmetry preserving neural networks

Title Deep Density: circumventing the Kohn-Sham equations via symmetry preserving neural networks
Authors Leonardo Zepeda-Núñez, Yixiao Chen, Jiefu Zhang, Weile Jia, Linfeng Zhang, Lin Lin
Abstract The recently developed Deep Potential [Phys. Rev. Lett. 120, 143001, 2018] is a powerful method to represent general inter-atomic potentials using deep neural networks. The success of Deep Potential rests on the proper treatment of locality and symmetry properties of each component of the network. In this paper, we leverage its network structure to effectively represent the mapping from the atomic configuration to the electron density in Kohn-Sham density functional theory (KS-DFT). By directly targeting the self-consistent electron density, we demonstrate that the adapted network architecture, called the Deep Density, can effectively represent the electron density as the linear combination of contributions from many local clusters. The network is constructed to satisfy the translation, rotation, and permutation symmetries, and is designed to be transferable to different system sizes. We demonstrate that using a relatively small number of training snapshots, Deep Density achieves excellent performance for one-dimensional insulating and metallic systems, as well as systems with mixed insulating and metallic characters. We also demonstrate its performance for real three-dimensional systems, including small organic molecules, as well as extended systems such as water (up to 512 molecules) and aluminum (up to 256 atoms).
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1912.00775v1
PDF https://arxiv.org/pdf/1912.00775v1.pdf
PWC https://paperswithcode.com/paper/deep-density-circumventing-the-kohn-sham
Repo
Framework

Interpolated Convolutional Networks for 3D Point Cloud Understanding

Title Interpolated Convolutional Networks for 3D Point Cloud Understanding
Authors Jiageng Mao, Xiaogang Wang, Hongsheng Li
Abstract Point cloud is an important type of 3D representation. However, directly applying convolutions on point clouds is challenging due to the sparse, irregular and unordered data structure. In this paper, we propose a novel Interpolated Convolution operation, InterpConv, to tackle the point cloud feature learning and understanding problem. The key idea is to utilize a set of discrete kernel weights and interpolate point features to neighboring kernel-weight coordinates by an interpolation function for convolution. A normalization term is introduced to handle neighborhoods of different sparsity levels. Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs. We further design Interpolated Convolutional Neural Networks (InterpCNNs) based on InterpConv layers to handle point cloud recognition tasks including shape classification, object part segmentation and indoor scene semantic parsing. Experiments show that the networks can capture both fine-grained local structures and global shape context information effectively. The proposed approach achieves state-of-the-art performance on public benchmarks including ModelNet40, ShapeNet Parts and S3DIS.
Tasks Semantic Parsing
Published 2019-08-13
URL https://arxiv.org/abs/1908.04512v1
PDF https://arxiv.org/pdf/1908.04512v1.pdf
PWC https://paperswithcode.com/paper/interpolated-convolutional-networks-for-3d
Repo
Framework

Topologically-Guided Color Image Enhancement

Title Topologically-Guided Color Image Enhancement
Authors Junyi Tu, Paul Rosen
Abstract Enhancement is an important step in post-processing digital images for personal use, in medical imaging, and for object recognition. Most existing manual techniques rely on region selection, similarity, and/or thresholding for editing, never really considering the topological structure of the image. In this paper, we leverage the contour tree to extract a hierarchical representation of the topology of an image. We propose 4 topology-aware transfer functions for editing features of the image using local topological properties, instead of global image properties. Finally, we evaluate our approach with grayscale and color images.
Tasks Image Enhancement, Object Recognition
Published 2019-09-03
URL https://arxiv.org/abs/1909.01456v1
PDF https://arxiv.org/pdf/1909.01456v1.pdf
PWC https://paperswithcode.com/paper/topologically-guided-color-image-enhancement
Repo
Framework

Dynamic Power Management for Neuromorphic Many-Core Systems

Title Dynamic Power Management for Neuromorphic Many-Core Systems
Authors Sebastian Hoeppner, Bernhard Vogginger, Yexin Yan, Andreas Dixius, Stefan Scholze, Johannes Partzsch, Felix Neumaerker, Stephan Hartmann, Stefan Schiefer, Georg Ellguth, Love Cederstroem, Luis Plana, Jim Garside, Steve Furber, Christian Mayr
Abstract This work presents a dynamic power management architecture for neuromorphic many-core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second-generation SpiNNaker neuromorphic many-core system.
Tasks
Published 2019-03-21
URL http://arxiv.org/abs/1903.08941v1
PDF http://arxiv.org/pdf/1903.08941v1.pdf
PWC https://paperswithcode.com/paper/dynamic-power-management-for-neuromorphic
Repo
Framework
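
As a rough illustration of why DVFS saves power, the textbook first-order model puts dynamic CMOS power proportional to V² · f, and dynamic energy per clock cycle proportional to V². The sketch below applies that model to the voltage and frequency endpoints quoted in the abstract above. It is not the paper's numerical model: it ignores leakage and baseline power, and pairing 0.7 V with 125 MHz and 1.0 V with 500 MHz is an assumption, since the abstract does not list the individual performance levels. Function names are hypothetical.

```python
def relative_dynamic_power(voltage: float, freq_hz: float,
                           ref_voltage: float, ref_freq_hz: float) -> float:
    """Dynamic power relative to a reference point, using P ~ V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (freq_hz / ref_freq_hz)


def relative_energy_per_cycle(voltage: float, ref_voltage: float) -> float:
    """Dynamic energy per clock cycle relative to a reference, using E ~ V^2."""
    return (voltage / ref_voltage) ** 2


if __name__ == "__main__":
    # Assumed pairing: lowest level = 0.7 V at 125 MHz, highest = 1.0 V at 500 MHz.
    power_ratio = relative_dynamic_power(0.7, 125e6, 1.0, 500e6)
    energy_ratio = relative_energy_per_cycle(0.7, 1.0)
    print(f"dynamic power at lowest vs highest level: {power_ratio:.4f}")
    print(f"dynamic energy per cycle at lowest vs highest level: {energy_ratio:.2f}")
```

Under these assumptions the lowest level draws about 12% of the highest level's dynamic power. The measured savings reported in the abstract depend on workload and leakage and should not be expected to match this simplified estimate.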

Neighborhood-Enhanced and Time-Aware Model for Session-based Recommendation

Title Neighborhood-Enhanced and Time-Aware Model for Session-based Recommendation
Authors Yang Lv, Liangsheng Zhuang, Pengyu Luo
Abstract Session-based recommendation has become one of the research hotspots in the field of recommendation systems due to its highly practical value. Previous deep learning methods mostly focus on the sequential characteristics within the current session, and neglect the context similarity and temporal similarity between sessions, which contain abundant collaborative information. In this paper, we propose a novel neural network framework, namely Neighborhood Enhanced and Time Aware Recommendation Machine (NETA), for session-based recommendation. Firstly, we introduce an efficient neighborhood retrieval mechanism to find similar sessions which include collaborative information. Then we design a guided attention with time-aware mechanism to extract collaborative representation from neighborhood sessions. In particular, temporal recency between sessions is considered separately. Finally, we design a simple co-attention mechanism to determine the importance of complementary collaborative representation when predicting the next item. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of our proposed model.
Tasks Recommendation Systems, Session-Based Recommendations
Published 2019-09-25
URL https://arxiv.org/abs/1909.11252v2
PDF https://arxiv.org/pdf/1909.11252v2.pdf
PWC https://paperswithcode.com/paper/neighborhood-enhanced-and-time-aware-model
Repo
Framework

EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera

Title EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera
Authors Lan Xu, Weipeng Xu, Vladislav Golyanik, Marc Habermann, Lu Fang, Christian Theobalt
Abstract A high frame rate is a critical requirement for capturing fast human motions. In this setting, existing markerless image-based methods are constrained by the lighting requirement, the high data bandwidth and the consequent high computation overhead. In this paper, we propose EventCap, the first approach for 3D capturing of high-speed human motions using a single event camera. Our method combines model-based optimization and CNN-based human pose detection to capture high-frequency motion details and to reduce the drifting in the tracking. As a result, we can capture fast motions at millisecond resolution with significantly higher data efficiency than using high frame rate videos. Experiments on our new event-based fast human motion dataset demonstrate the effectiveness and accuracy of our method, as well as its robustness to challenging lighting conditions.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11505v1
PDF https://arxiv.org/pdf/1908.11505v1.pdf
PWC https://paperswithcode.com/paper/eventcap-monocular-3d-capture-of-high-speed
Repo
Framework

Performance Evaluation of Histogram Equalization and Fuzzy image Enhancement Techniques on Low Contrast Images

Title Performance Evaluation of Histogram Equalization and Fuzzy image Enhancement Techniques on Low Contrast Images
Authors E Onyedinma, I Onyenwe, H Inyiama
Abstract Image enhancement aims at improving the information content of the original image for a specific purpose. This purpose could be visual interpretation or effective extraction of required details. Nevertheless, some acquired images have pixels with a low dynamic range and therefore appear as low-contrast images. Enhancing the contrast therefore tends to increase the dynamic range of the gray levels in the acquired image so as to span the full intensity range. Techniques such as Histogram Equalization (HE) and the fuzzy technique can be adopted for contrast enhancement. HE adjusts the contrast of an input image by modifying the intensity distribution of its histogram. It provides a global approach to image enhancement that is computationally fast and easy to implement, but it can introduce unnatural artifacts and other undesirable elements into the resulting image. The fuzzy technique, for its part, enhances an image by mapping the image gray level intensities into a fuzzy plane using membership functions, modifying the membership functions as desired, and mapping back into the gray level plane. Thus, details in desired areas can be enhanced at the expense of an increase in computational cost. This paper explores the effect of using HE and the fuzzy technique to enhance low-contrast images. Their performance is evaluated using the mean squared error (MSE), peak signal-to-noise ratio (PSNR), entropy and absolute mean brightness error (AMBE).
Tasks Image Enhancement
Published 2019-09-01
URL https://arxiv.org/abs/1909.03957v1
PDF https://arxiv.org/pdf/1909.03957v1.pdf
PWC https://paperswithcode.com/paper/performance-evaluation-of-histogram
Repo
Framework
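
The histogram equalization step and the four evaluation measures named in the abstract above can be sketched in a few lines of NumPy. This is a minimal illustration using the standard textbook definitions for 8-bit grayscale images, not the authors' code. The function names are hypothetical, and the fuzzy technique is not shown because the abstract does not specify its membership functions.

```python
import numpy as np


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image (dtype uint8)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied level
    if cdf_min == img.size:            # constant image: nothing to stretch
        return img.copy()
    # Map each gray level through the normalized CDF onto the range 0..255.
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


def psnr(a: np.ndarray, b: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))


def entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of the gray-level histogram."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def ambe(a: np.ndarray, b: np.ndarray) -> float:
    """Absolute mean brightness error: |mean(a) - mean(b)|."""
    return abs(float(a.mean()) - float(b.mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)  # low contrast
    eq = equalize_histogram(low)
    print("range before:", low.min(), low.max(), "after:", eq.min(), eq.max())
    print("MSE:", mse(low, eq), "PSNR:", psnr(low, eq))
    print("entropy before/after:", entropy(low), entropy(eq))
    print("AMBE:", ambe(low, eq))
```

Because global equalization applies a monotone gray-level mapping, it can merge histogram bins but never split them, so the histogram entropy of the output cannot exceed that of the input. That is one reason to report entropy alongside contrast-oriented measures.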