October 19, 2019

Paper Group ANR 277

Low-Latency Neural Speech Translation. Long-time predictive modeling of nonlinear dynamical systems using neural networks. Accurate brain extraction using Active Shape Model and Convolutional Neural Networks. Multiple Models for Recommending Temporal Aspects of Entities. A pooling based scene text proposal technique for scene text reading in the wi …

Low-Latency Neural Speech Translation

Title Low-Latency Neural Speech Translation
Authors Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel
Abstract Through the development of neural machine translation, the quality of machine translation systems has been improved significantly. By exploiting advancements in deep learning, systems are now able to better approximate the complex mapping from source sentences to target sentences. But with this ability, new challenges also arise. An example is the translation of partial sentences in low-latency speech translation. Since the model has only seen complete sentences in training, it will always try to generate a complete sentence, though the input may only be a partial sentence. We show that NMT systems can be adapted to scenarios where no task-specific training data is available. Furthermore, this is possible without losing performance on the original training data. We achieve this by creating artificial data and by using multi-task learning. After adaptation, we are able to reduce the number of corrections displayed during incremental output construction by 45%, without a decrease in translation quality.
Tasks Machine Translation, Multi-Task Learning
Published 2018-08-01
URL http://arxiv.org/abs/1808.00491v1
PDF http://arxiv.org/pdf/1808.00491v1.pdf
PWC https://paperswithcode.com/paper/low-latency-neural-speech-translation
Repo
Framework

Long-time predictive modeling of nonlinear dynamical systems using neural networks

Title Long-time predictive modeling of nonlinear dynamical systems using neural networks
Authors Shaowu Pan, Karthik Duraisamy
Abstract We study the use of feedforward neural networks (FNN) to develop models of nonlinear dynamical systems from data. Emphasis is placed on predictions at long times, with limited data availability. Inspired by global stability analysis, and the observation of the strong correlation between the local error and the maximum singular value of the Jacobian of the ANN, we introduce Jacobian regularization in the loss function. This regularization suppresses the sensitivity of the prediction to the local error and is shown to improve accuracy and robustness. Comparison between the proposed approach and sparse polynomial regression is presented in numerical examples ranging from simple ODE systems to nonlinear PDE systems including vortex shedding behind a cylinder, and instability-driven buoyant mixing flow. Furthermore, limitations of feedforward neural networks are highlighted, especially when the training data does not include a low dimensional attractor. Strategies of data augmentation are presented as remedies to address these issues to a certain extent.
Tasks Data Augmentation
Published 2018-05-31
URL http://arxiv.org/abs/1805.12547v5
PDF http://arxiv.org/pdf/1805.12547v5.pdf
PWC https://paperswithcode.com/paper/long-time-predictive-modeling-of-nonlinear
Repo
Framework
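
The abstract above describes adding Jacobian regularization to the loss so that predictions become less sensitive to local error. The following is only a minimal sketch of that idea, not the authors' implementation: the toy `model`, the finite-difference Jacobian, the weight `lam`, and the choice of the squared Frobenius norm as the penalty are all assumptions made for illustration (the Frobenius norm is an upper bound on the maximum singular value mentioned in the abstract).

```python
import math

def model(x, w):
    """Hypothetical toy model: a single tanh layer mapping R^2 -> R^2."""
    return [
        math.tanh(w[0][0] * x[0] + w[0][1] * x[1]),
        math.tanh(w[1][0] * x[0] + w[1][1] * x[1]),
    ]

def jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    # Transpose the list of columns into a row-major matrix.
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def regularized_loss(w, inputs, targets, lam=0.1):
    """Mean squared error plus lam times the mean squared Frobenius norm of the Jacobian.

    The Frobenius penalty is an assumed stand-in for the regularizer in the paper.
    """
    f = lambda x: model(x, w)
    mse = 0.0
    penalty = 0.0
    for x, y in zip(inputs, targets):
        pred = f(x)
        mse += sum((p - t) ** 2 for p, t in zip(pred, y))
        jac = jacobian_fd(f, x)
        penalty += sum(v * v for row in jac for v in row)
    n = len(inputs)
    return mse / n + lam * penalty / n
```

With identity weights and a single sample at the origin, the prediction error is zero and the Jacobian is approximately the identity matrix, so the loss is dominated by the penalty term.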

Accurate brain extraction using Active Shape Model and Convolutional Neural Networks

Title Accurate brain extraction using Active Shape Model and Convolutional Neural Networks
Authors Nguyen Ho Minh Duy, Nguyen Manh Duy, Mai Thanh Nhat Truong, Pham The Bao, Nguyen Thanh Binh
Abstract Brain extraction or skull stripping is a fundamental procedure in most neuroimaging processing systems. The performance of this procedure has had a critical impact on the success of neuroimaging analysis. After several years of research and development, brain extraction still remains a challenging problem. In this paper, we propose an effective method for skull stripping in Magnetic Resonance Imaging (MRI) scans named ASM-CNN. Our system is a combination of Active Shape Model (ASM) and Convolutional Neural Network (CNN), taking full advantage of these two methods to achieve remarkable results. Instead of working with 3D structures, we process 2D image sequences in the sagittal plane. First, we divide images into different groups such that, in each group, the shapes and structures of brain boundaries have similar appearances. This allows developing precise algorithms for each group in order to produce high performance segmentation results. Second, a modified version of ASM is used to detect the brain boundary in images by utilizing prior knowledge of each group. Finally, CNN and post-processing methods such as Conditional Random Field, Gaussian Process and some special rules are applied to refine the segmentation contour produced by ASM. We compared ASM-CNN with the latest versions of five state-of-the-art, publicly available methods, namely BET, BSE, 3DSS, ROBEX and BEAST. The evaluation was carried out using three public datasets: IBSR, LPBA and OASIS. The experimental results show that the proposed method outperforms the five state-of-the-art algorithms, surpassing all the other methods by a significant margin in all experiments.
Tasks Skull Stripping
Published 2018-02-05
URL http://arxiv.org/abs/1802.01268v1
PDF http://arxiv.org/pdf/1802.01268v1.pdf
PWC https://paperswithcode.com/paper/accurate-brain-extraction-using-active-shape
Repo
Framework

Multiple Models for Recommending Temporal Aspects of Entities

Title Multiple Models for Recommending Temporal Aspects of Entities
Authors Tu Ngoc Nguyen, Nattiya Kanhabua, Wolfgang Nejdl
Abstract Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity, of which salience (e.g., popularity) is the most important factor in previous work. However, entity aspects are temporally dynamic and often driven by events happening over time. For such cases, aspect suggestion based solely on salience features can give unsatisfactory results, for two reasons. First, salience is often accumulated over a long time period and does not account for recency. Second, many aspects related to an event entity are strongly time-dependent. In this paper, we study the task of temporal aspect recommendation for a given entity, which aims at recommending the most relevant aspects and takes into account time in order to improve search experience. We propose a novel event-centric ensemble ranking method that learns from multiple time and type-dependent models and dynamically trades off salience and recency characteristics. Through extensive experiments on real-world query logs, we demonstrate that our method is robust and achieves better effectiveness than competitive baselines.
Tasks
Published 2018-03-21
URL http://arxiv.org/abs/1803.07890v2
PDF http://arxiv.org/pdf/1803.07890v2.pdf
PWC https://paperswithcode.com/paper/multiple-models-for-recommending-temporal
Repo
Framework

A pooling based scene text proposal technique for scene text reading in the wild

Title A pooling based scene text proposal technique for scene text reading in the wild
Authors Dinh NguyenVan, Shijian Lu, Shangxuan Tian, Nizar Ouarti, Mounir Mokhtari
Abstract Automatic reading of text in scenes has attracted increasing interest in recent years, as text often carries rich semantic information that is useful for scene understanding. In this paper, we propose a novel scene text proposal technique aiming for accurate reading of text in scenes. Inspired by the pooling layer in the deep neural network architecture, a pooling based scene text proposal technique is developed. A novel score function is designed which exploits the histogram of oriented gradients and is capable of ranking the proposals according to their probabilities of being text. An end-to-end scene text reading system has also been developed by incorporating the proposed scene text proposal technique, where false alarm elimination and word recognition are performed simultaneously. Extensive experiments over several public datasets show that the proposed technique can handle multi-orientation and multi-language scene texts and obtains outstanding proposal performance. The developed end-to-end systems also achieve very competitive scene text spotting and reading performance.
Tasks Scene Understanding, Text Spotting
Published 2018-11-25
URL http://arxiv.org/abs/1811.10003v1
PDF http://arxiv.org/pdf/1811.10003v1.pdf
PWC https://paperswithcode.com/paper/a-pooling-based-scene-text-proposal-technique
Repo
Framework

Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime

Title Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime
Authors Dengxin Dai, Luc Van Gool
Abstract This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it is mainly related to daytime scenarios. This paper proposes a novel method to progressively adapt the semantic models trained on daytime scenes, along with large-scale annotations therein, to nighttime scenes via the bridge of twilight time – the time between dawn and sunrise, or between sunset and dusk. The goal of the method is to alleviate the cost of human annotation for nighttime images by transferring knowledge from standard daytime conditions. In addition to the method, a new dataset of road scenes is compiled; it consists of 35,000 images ranging from daytime to twilight time and to nighttime. Also, a subset of the nighttime images is densely annotated for method evaluation. Our experiments show that our method is effective for model adaptation from daytime scenes to nighttime scenes, without using extra human annotation.
Tasks Semantic Segmentation
Published 2018-10-05
URL http://arxiv.org/abs/1810.02575v1
PDF http://arxiv.org/pdf/1810.02575v1.pdf
PWC https://paperswithcode.com/paper/dark-model-adaptation-semantic-image
Repo
Framework

Stochastic quasi-Newton with adaptive step lengths for large-scale problems

Title Stochastic quasi-Newton with adaptive step lengths for large-scale problems
Authors Adrian Wills, Thomas Schön
Abstract We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against current state-of-the-art with encouraging performance on real-world benchmark problems where the number of observations and unknowns is in the order of millions.
Tasks
Published 2018-02-12
URL http://arxiv.org/abs/1802.04310v1
PDF http://arxiv.org/pdf/1802.04310v1.pdf
PWC https://paperswithcode.com/paper/stochastic-quasi-newton-with-adaptive-step
Repo
Framework

A Learning-Based Framework for Two-Dimensional Vehicle Maneuver Prediction over V2V Networks

Title A Learning-Based Framework for Two-Dimensional Vehicle Maneuver Prediction over V2V Networks
Authors Hossein Nourkhiz Mahjoub, Amin Tahmasbi-Sarvestani, Hadi Kazemi, Yaser P. Fallah
Abstract Situational awareness in vehicular networks could be substantially improved utilizing reliable trajectory prediction methods. More precise situational awareness, in turn, results in notably better performance of critical safety applications, such as Forward Collision Warning (FCW), as well as comfort applications like Cooperative Adaptive Cruise Control (CACC). Therefore, the vehicle trajectory prediction problem needs to be deeply investigated in order to come up with an end-to-end framework with the precision required by the safety applications’ controllers. This problem has been tackled in the literature using different methods. However, machine learning, which is a promising and emerging field with remarkable potential for time series prediction, has not been explored enough for this purpose. In this paper, a two-layer neural network-based system is developed which predicts the future values of vehicle parameters, such as velocity, acceleration, and yaw rate, in the first layer and then predicts the two-dimensional, i.e. longitudinal and lateral, trajectory points based on the first layer’s outputs. The performance of the proposed framework has been evaluated in realistic cut-in scenarios from the Safety Pilot Model Deployment (SPMD) dataset, and the results show a noticeable improvement in prediction accuracy in comparison with the kinematics model, which is the dominant model employed by the automotive industry. Both ideal and non-ideal communication circumstances have been investigated for our system evaluation. For the non-ideal case, an estimation step is included in the framework before the parameter prediction block to handle the drawbacks of packet drops or sensor failures and reconstruct the time series of vehicle parameters at a desirable frequency.
Tasks Time Series, Time Series Prediction, Trajectory Prediction
Published 2018-08-01
URL http://arxiv.org/abs/1808.00516v1
PDF http://arxiv.org/pdf/1808.00516v1.pdf
PWC https://paperswithcode.com/paper/a-learning-based-framework-for-two
Repo
Framework

Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents

Title Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents
Authors Amir Alansary, Loic Le Folgoc, Ghislain Vaillant, Ozan Oktay, Yuanwei Li, Wenjia Bai, Jonathan Passerat-Palmbach, Ricardo Guerrero, Konstantinos Kamnitsas, Benjamin Hou, Steven McDonagh, Ben Glocker, Bernhard Kainz, Daniel Rueckert
Abstract We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53mm, 1.98mm and 4.84mm, respectively.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.03228v1
PDF http://arxiv.org/pdf/1806.03228v1.pdf
PWC https://paperswithcode.com/paper/automatic-view-planning-with-multi-scale-deep
Repo
Framework
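
The abstract above evaluates detected view planes by the distance between anatomical landmarks and the plane, and by the angle between the detected and target normal vectors. As a rough illustration of those two measurements only, here is a small sketch. The plane representation (`normal`, `offset`), the function names, and the choice to treat a normal and its negation as the same plane are assumptions, not details taken from the paper.

```python
import math

def point_plane_distance(point, normal, offset):
    """Distance from a 3D point to the plane {x : normal . x + offset = 0}."""
    norm = math.sqrt(sum(n * n for n in normal))
    signed = sum(n * p for n, p in zip(normal, point)) + offset
    return abs(signed) / norm

def angle_between_normals_deg(n1, n2):
    """Angle in degrees between two plane normals.

    Assumption: n and -n describe the same plane, so the absolute value of
    the cosine is used and the result lies in [0, 90].
    """
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    cos = min(1.0, abs(dot) / (norm1 * norm2))  # clamp against rounding error
    return math.degrees(math.acos(cos))
```

For example, a landmark at `(0, 0, 3)` lies 3 units from the plane `z = 0`, regardless of how the normal vector is scaled.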

Rapid Time Series Prediction with a Hardware-Based Reservoir Computer

Title Rapid Time Series Prediction with a Hardware-Based Reservoir Computer
Authors Daniel Canaday, Aaron Griffith, Daniel Gauthier
Abstract Reservoir computing is a neural network approach for processing time-dependent signals that has seen rapid development in recent years. Physical implementations of the technique using optical reservoirs have demonstrated remarkable accuracy and processing speed at benchmark tasks. However, these approaches require an electronic output layer to maintain high performance, which limits their use in tasks such as time-series prediction, where the output is fed back into the reservoir. We present here a reservoir computing scheme that has rapid processing speed both by the reservoir and the output layer. The reservoir is realized by an autonomous, time-delay, Boolean network configured on a field-programmable gate array. We investigate the dynamical properties of the network and observe the fading memory property that is critical for successful reservoir computing. We demonstrate the utility of the technique by training a reservoir to learn the short- and long-term behavior of a chaotic system. We find accuracy comparable to state-of-the-art software approaches of similar network size, but with a superior real-time prediction rate up to 160 MHz.
Tasks Time Series, Time Series Prediction
Published 2018-07-19
URL http://arxiv.org/abs/1807.07627v2
PDF http://arxiv.org/pdf/1807.07627v2.pdf
PWC https://paperswithcode.com/paper/rapid-time-series-prediction-with-a-hardware
Repo
Framework

Fast and robust misalignment correction of Fourier ptychographic microscopy

Title Fast and robust misalignment correction of Fourier ptychographic microscopy
Authors Ao Zhou, Wei Wang, Ni Chen, Edmund Y. Lam, Byoungho Lee, Guohai Situ
Abstract Fourier ptychographic microscopy (FPM) is a newly developed computational imaging technique that can provide gigapixel images with both high resolution (HR) and wide field of view (FOV). However, the positional misalignment of the LED array induces a degradation of the reconstruction, especially in the regions away from the optical axis. In this paper, we propose a robust and fast method to correct the LED misalignment of FPM, termed as misalignment correction for FPM (mcFPM). Although different regions in the FOV have different sensitivity to the LED misalignment, the experimental results show that mcFPM is robust to eliminate the degradation in each region. Compared with the state-of-the-art methods, mcFPM is much faster.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1803.00395v1
PDF http://arxiv.org/pdf/1803.00395v1.pdf
PWC https://paperswithcode.com/paper/fast-and-robust-misalignment-correction-of
Repo
Framework

Deep Learning Super-Diffusion in Multiplex Networks

Title Deep Learning Super-Diffusion in Multiplex Networks
Authors Vito M. Leli, Saeed Osat, Timur Tlyachev, Jacob D. Biamonte
Abstract Complex network theory has shown success in understanding the emergent and collective behavior of complex systems [1]. Many real-world complex systems were recently discovered to be more accurately modeled as multiplex networks [2-6], in which each interaction type is mapped to its own network layer; e.g. multi-layer transportation networks, coupled social networks, metabolic and regulatory networks, etc. A salient physical phenomenon emerging from multiplexity is super-diffusion: exhibited by an accelerated diffusion admitted by the multi-layer structure as compared to any single layer. Theoretically, super-diffusion was only known to be predicted using the spectral gap of the full Laplacian of a multiplex network and its interacting layers. Here we turn to machine learning, which has developed techniques to recognize, classify, and characterize complex sets of data. We show that modern machine learning architectures, such as fully connected and convolutional neural networks, can classify and predict the presence of super-diffusion in multiplex networks with 94.12% accuracy. Such predictions can be done in situ, without the need to determine spectral properties of a network.
Tasks
Published 2018-11-09
URL http://arxiv.org/abs/1811.04104v1
PDF http://arxiv.org/pdf/1811.04104v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-super-diffusion-in-multiplex
Repo
Framework

The perceived quality of process discovery tools

Title The perceived quality of process discovery tools
Authors Francis Bru, Jan Claes
Abstract Process discovery has seen a rise in popularity in the last decade for both researchers and businesses. Recent developments mainly focused on the power and the functionalities of the discovery algorithm. While continuous improvement of these functional aspects is very important, non-functional aspects such as visualization and usability are often overlooked. However, these aspects are considered valuable for end-users and play an important part in the experience of these end-users when working with a process discovery tool. A questionnaire has been sent out to give end-users the opportunity to voice their opinion on available process discovery tools and about the state of process discovery as a domain in general. The results of 66 respondents are presented and compared with the answers of 63 respondents that were contacted through one particular software vendor’s employee and customer base (i.e., Celonis).
Tasks
Published 2018-08-13
URL http://arxiv.org/abs/1808.06475v1
PDF http://arxiv.org/pdf/1808.06475v1.pdf
PWC https://paperswithcode.com/paper/the-perceived-quality-of-process-discovery
Repo
Framework

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

Title Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis
Authors Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai
Abstract This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism to make decisions whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence speed and higher stability than the baseline attention method. Besides, the method of forward attention with transition agent can also help improve the naturalness of synthetic speech and control the speed of synthetic speech effectively.
Tasks Acoustic Modelling, Speech Synthesis
Published 2018-07-18
URL http://arxiv.org/abs/1807.06736v1
PDF http://arxiv.org/pdf/1807.06736v1.pdf
PWC https://paperswithcode.com/paper/forward-attention-in-sequence-to-sequence
Repo
Framework
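
The abstract above says the modified attention probabilities are computed recursively with a forward algorithm, keeping only monotonic alignment paths. The abstract does not give the formula, so the sketch below is an assumption for illustration: at each decoder step, a position can be reached either by staying in place or by moving forward by one from the previous position, the result is weighted by the ordinary attention probability `y`, and then renormalized. The transition agent is not modelled, and the zero-sum fallback is a hypothetical guard.

```python
def forward_attention_step(prev_alpha, y):
    """One decoder timestep of an assumed forward-attention recursion.

    prev_alpha: forward attention weights from the previous timestep.
    y: ordinary (unconstrained) attention probabilities at this timestep.
    Returns the renormalized forward attention weights for this timestep.
    """
    unnorm = []
    for i in range(len(prev_alpha)):
        stay = prev_alpha[i]                          # path that stays at position i
        move = prev_alpha[i - 1] if i > 0 else 0.0    # path that advances from i - 1
        unnorm.append((stay + move) * y[i])
    total = sum(unnorm)
    if total == 0.0:
        # Hypothetical guard: keep the previous alignment if all mass vanished.
        return list(prev_alpha)
    return [u / total for u in unnorm]
```

Starting from an alignment concentrated on the first phone, one step can place mass only on the first two positions, however much ordinary attention falls further ahead, which is the monotonic constraint in action.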

Multimodal Densenet

Title Multimodal Densenet
Authors Faisal Mahmood, Ziyun Yang, Thomas Ashley, Nicholas J. Durr
Abstract Humans make accurate decisions by interpreting complex data from multiple sources. Medical diagnostics, in particular, often hinge on human interpretation of multi-modal information. In order for artificial intelligence to make progress in automated, objective, and accurate diagnosis and prognosis, methods to fuse information from multiple medical imaging modalities are required. However, combining information from multiple data sources has several challenges, as current deep learning architectures lack the ability to extract useful representations from multimodal information, and often simple concatenation is used to fuse such information. In this work, we propose Multimodal DenseNet, a novel architecture for fusing multimodal data. Instead of focusing on concatenation or early and late fusion, our proposed architecture fuses information over several layers and gives the model flexibility in how it combines information from multiple sources. We apply this architecture to the challenge of polyp characterization and landmark identification in endoscopy. Features from white light images are fused with features from narrow band imaging or depth maps. This study demonstrates that Multimodal DenseNet outperforms monomodal classification as well as other multimodal fusion techniques by a significant margin on two different datasets.
Tasks
Published 2018-11-18
URL http://arxiv.org/abs/1811.07407v1
PDF http://arxiv.org/pdf/1811.07407v1.pdf
PWC https://paperswithcode.com/paper/multimodal-densenet
Repo
Framework