October 19, 2019

Paper Group ANR 277

Low-Latency Neural Speech Translation. Long-time predictive modeling of nonlinear dynamical systems using neural networks. Accurate brain extraction using Active Shape Model and Convolutional Neural Networks. Multiple Models for Recommending Temporal Aspects of Entities. A pooling based scene text proposal technique for scene text reading in the wi …

Low-Latency Neural Speech Translation


Title	Low-Latency Neural Speech Translation
Authors	Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel
Abstract	Through the development of neural machine translation (NMT), the quality of machine translation systems has improved significantly. By exploiting advances in deep learning, systems are now able to better approximate the complex mapping from source sentences to target sentences. But this ability also brings new challenges. One example is the translation of partial sentences in low-latency speech translation. Since the model has only seen complete sentences in training, it will always try to generate a complete sentence, even though the input may only be a partial sentence. We show that NMT systems can be adapted to scenarios where no task-specific training data is available. Furthermore, this is possible without losing performance on the original training data. We achieve this by creating artificial data and by using multi-task learning. After adaptation, we are able to reduce the number of corrections displayed during incremental output construction by 45%, without a decrease in translation quality.
Tasks	Machine Translation, Multi-Task Learning
Published	2018-08-01
URL	http://arxiv.org/abs/1808.00491v1
PDF	http://arxiv.org/pdf/1808.00491v1.pdf
PWC	https://paperswithcode.com/paper/low-latency-neural-speech-translation
Repo
Framework
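The abstract does not say how the artificial partial-sentence data is built. As a hedged illustration only, the sketch below assumes one plausible scheme: cut each full training pair into (source prefix, target prefix) pairs, keeping the same fraction of the target as of the source. The function name make_partial_pairs and the proportional-length rule are hypothetical, not the authors' method.

```python
import random

def make_partial_pairs(src_tokens, tgt_tokens, num_prefixes=3, seed=0):
    """Create artificial (source prefix, target prefix) training pairs.

    Hypothetical scheme: the target prefix keeps the same fraction of the
    target sentence as the source prefix keeps of the source sentence.
    """
    rng = random.Random(seed)
    n_src, n_tgt = len(src_tokens), len(tgt_tokens)
    pairs = []
    for _ in range(num_prefixes):
        k = rng.randint(1, n_src)                # source tokens received so far
        m = max(1, round(k * n_tgt / n_src))     # proportional target length
        pairs.append((src_tokens[:k], tgt_tokens[:m]))
    return pairs
```

A proportional cut ignores word-order differences between languages, so it is only a rough stand-in for whatever alignment-aware construction a real system would use.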

Long-time predictive modeling of nonlinear dynamical systems using neural networks


Title	Long-time predictive modeling of nonlinear dynamical systems using neural networks
Authors	Shaowu Pan, Karthik Duraisamy
Abstract	We study the use of feedforward neural networks (FNN) to develop models of nonlinear dynamical systems from data. Emphasis is placed on predictions at long times, with limited data availability. Inspired by global stability analysis, and by the observed strong correlation between the local error and the maximum singular value of the Jacobian of the network, we introduce Jacobian regularization in the loss function. This regularization suppresses the sensitivity of the prediction to the local error and is shown to improve accuracy and robustness. The proposed approach is compared with sparse polynomial regression on numerical examples ranging from simple ODE systems to nonlinear PDE systems, including vortex shedding behind a cylinder and instability-driven buoyant mixing flow. Furthermore, limitations of feedforward neural networks are highlighted, especially when the training data does not include a low-dimensional attractor. Data augmentation strategies are presented as remedies that address these issues to a certain extent.
Tasks	Data Augmentation
Published	2018-05-31
URL	http://arxiv.org/abs/1805.12547v5
PDF	http://arxiv.org/pdf/1805.12547v5.pdf
PWC	https://paperswithcode.com/paper/long-time-predictive-modeling-of-nonlinear
Repo
Framework
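To make the idea of Jacobian regularization concrete, here is a minimal sketch, assuming a one-hidden-layer tanh network and a penalty equal to the mean squared maximum singular value of the input-output Jacobian. The exact network, penalty form and weight lam used in the paper are not given in the abstract; all names below are hypothetical.

```python
import numpy as np

def mlp_forward(params, x):
    """One-hidden-layer tanh network f(x) = W2 tanh(W1 x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2

def mlp_jacobian(params, x):
    """Analytic Jacobian df/dx = W2 diag(1 - tanh^2(z)) W1 with z = W1 x + b1."""
    W1, b1, W2, _ = params
    z = W1 @ x + b1
    return W2 @ np.diag(1.0 - np.tanh(z) ** 2) @ W1

def jacobian_regularized_loss(params, X, Y, lam=1e-2):
    """One-step MSE plus lam * mean squared max singular value of the Jacobian."""
    mse = np.mean([np.sum((mlp_forward(params, x) - y) ** 2) for x, y in zip(X, Y)])
    # np.linalg.norm(J, 2) is the spectral norm, i.e. the largest singular value.
    sigma_max = [np.linalg.norm(mlp_jacobian(params, x), 2) for x in X]
    return mse + lam * np.mean(np.square(sigma_max))
```

Penalizing the largest singular value limits how much a small error in the current state can be amplified by one step of the learned map, which is the sensitivity the abstract says the regularization suppresses. Training would additionally need gradients of this loss, for example via an autodiff library.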

Accurate brain extraction using Active Shape Model and Convolutional Neural Networks


Title	Accurate brain extraction using Active Shape Model and Convolutional Neural Networks
Authors	Nguyen Ho Minh Duy, Nguyen Manh Duy, Mai Thanh Nhat Truong, Pham The Bao, Nguyen Thanh Binh
Abstract	Brain extraction, or skull stripping, is a fundamental procedure in most neuroimaging processing systems. The performance of this procedure has a critical impact on the success of neuroimaging analysis. After several years of research and development, brain extraction remains a challenging problem. In this paper, we propose an effective method for skull stripping in Magnetic Resonance Imaging (MRI) scans named ASM-CNN. Our system combines an Active Shape Model (ASM) and a Convolutional Neural Network (CNN), taking full advantage of both methods to achieve remarkable results. Instead of working with 3D structures, we process 2D image sequences in the sagittal plane. First, we divide images into groups such that, within each group, the shapes and structures of brain boundaries have similar appearances. This allows precise algorithms to be developed for each group in order to produce high-performance segmentation results. Second, a modified version of ASM is used to detect the brain boundary in images by utilizing prior knowledge of each group. Finally, a CNN and post-processing methods such as Conditional Random Fields, Gaussian Processes and some special rules are applied to refine the segmentation contour produced by ASM. We compared ASM-CNN with the latest versions of five state-of-the-art, publicly available methods, namely BET, BSE, 3DSS, ROBEX and BEAST. The evaluation was carried out using three public datasets: IBSR, LPBA and OASIS. The experimental results show that the proposed method outperforms all five state-of-the-art algorithms by a significant margin in all experiments.
Tasks	Skull Stripping
Published	2018-02-05
URL	http://arxiv.org/abs/1802.01268v1
PDF	http://arxiv.org/pdf/1802.01268v1.pdf
PWC	https://paperswithcode.com/paper/accurate-brain-extraction-using-active-shape
Repo
Framework

Multiple Models for Recommending Temporal Aspects of Entities


Title	Multiple Models for Recommending Temporal Aspects of Entities
Authors	Tu Ngoc Nguyen, Nattiya Kanhabua, Wolfgang Nejdl
Abstract	Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity; previous work has treated salience (e.g., popularity) as its most important factor. However, entity aspects are temporally dynamic and often driven by events happening over time. In such cases, aspect suggestion based solely on salience features can give unsatisfactory results, for two reasons. First, salience is often accumulated over a long time period and does not account for recency. Second, many aspects related to an event entity are strongly time-dependent. In this paper, we study the task of temporal aspect recommendation for a given entity, which aims to recommend the most relevant aspects while taking time into account in order to improve the search experience. We propose a novel event-centric ensemble ranking method that learns from multiple time- and type-dependent models and dynamically trades off salience and recency characteristics. Through extensive experiments on real-world query logs, we demonstrate that our method is robust and more effective than competitive baselines.
Tasks
Published	2018-03-21
URL	http://arxiv.org/abs/1803.07890v2
PDF	http://arxiv.org/pdf/1803.07890v2.pdf
PWC	https://paperswithcode.com/paper/multiple-models-for-recommending-temporal
Repo
Framework

A pooling based scene text proposal technique for scene text reading in the wild


Title	A pooling based scene text proposal technique for scene text reading in the wild
Authors	Dinh NguyenVan, Shijian Lu, Shangxuan Tian, Nizar Ouarti, Mounir Mokhtari
Abstract	Automatic reading of text in scenes has attracted increasing interest in recent years, as text often carries rich semantic information that is useful for scene understanding. In this paper, we propose a novel scene text proposal technique aimed at accurately reading text in scenes. Inspired by the pooling layer in deep neural network architectures, we develop a pooling-based scene text proposal technique. A novel score function is designed that exploits the histogram of oriented gradients and ranks the proposals according to their probability of being text. An end-to-end scene text reading system has also been developed by incorporating the proposed scene text proposal technique, in which false-alarm elimination and word recognition are performed simultaneously. Extensive experiments on several public datasets show that the proposed technique can handle multi-orientation and multi-language scene text and obtains outstanding proposal performance. The developed end-to-end system also achieves very competitive scene text spotting and reading performance.
Tasks	Scene Understanding, Text Spotting
Published	2018-11-25
URL	http://arxiv.org/abs/1811.10003v1
PDF	http://arxiv.org/pdf/1811.10003v1.pdf
PWC	https://paperswithcode.com/paper/a-pooling-based-scene-text-proposal-technique
Repo
Framework

Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime


Title	Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime
Authors	Dengxin Dai, Luc Van Gool
Abstract	This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it has mainly concerned daytime scenarios. This paper proposes a novel method to progressively adapt semantic models trained on daytime scenes, along with the large-scale annotations therein, to nighttime scenes via the bridge of twilight time: the time between dawn and sunrise, or between sunset and dusk. The goal of the method is to reduce the cost of human annotation for nighttime images by transferring knowledge from standard daytime conditions. In addition to the method, a new dataset of road scenes is compiled; it consists of 35,000 images ranging from daytime to twilight time and to nighttime. A subset of the nighttime images is also densely annotated for method evaluation. Our experiments show that our method is effective for model adaptation from daytime scenes to nighttime scenes, without using extra human annotation.
Tasks	Semantic Segmentation
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02575v1
PDF	http://arxiv.org/pdf/1810.02575v1.pdf
PWC	https://paperswithcode.com/paper/dark-model-adaptation-semantic-image
Repo
Framework

Stochastic quasi-Newton with adaptive step lengths for large-scale problems


Title	Stochastic quasi-Newton with adaptive step lengths for large-scale problems
Authors	Adrian Wills, Thomas Schön
Abstract	We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search that adapts the step length. We numerically evaluate the method and compare it against the current state of the art, with encouraging performance on real-world benchmark problems where the numbers of observations and unknowns are on the order of millions.
Tasks
Published	2018-02-12
URL	http://arxiv.org/abs/1802.04310v1
PDF	http://arxiv.org/pdf/1802.04310v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-quasi-newton-with-adaptive-step
Repo
Framework
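An inverse Hessian approximation built from a history of iterate and gradient differences is what the classic L-BFGS two-loop recursion computes. The sketch below shows that standard recursion as background only; it is not the authors' auxiliary-variable construction and contains no stochastic line search. The function name two_loop_direction is hypothetical.

```python
import numpy as np

def two_loop_direction(grad, s_hist, y_hist):
    """Return the search direction -H g via the standard L-BFGS two-loop recursion.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, ordered oldest first.
    Assumes y_i . s_i > 0 for every stored pair (curvature condition).
    """
    q = grad.astype(float).copy()
    coeffs = []  # (rho, alpha) pairs, newest first
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        coeffs.append((rho, alpha))
    # Initial inverse Hessian H0 = gamma * I, scaled from the newest pair.
    gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1]) if s_hist else 1.0
    r = gamma * q
    for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(coeffs)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r
```

Only the stored vector pairs are needed, never an explicit matrix, which is why this family of methods scales to problems with millions of unknowns. By construction the implied H satisfies the secant condition H y = s for the newest pair.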

A Learning-Based Framework for Two-Dimensional Vehicle Maneuver Prediction over V2V Networks


Title	A Learning-Based Framework for Two-Dimensional Vehicle Maneuver Prediction over V2V Networks
Authors	Hossein Nourkhiz Mahjoub, Amin Tahmasbi-Sarvestani, Hadi Kazemi, Yaser P. Fallah
Abstract	Situational awareness in vehicular networks could be substantially improved by utilizing reliable trajectory prediction methods. More precise situational awareness, in turn, results in notably better performance of critical safety applications, such as Forward Collision Warning (FCW), as well as comfort applications like Cooperative Adaptive Cruise Control (CACC). Therefore, the vehicle trajectory prediction problem needs to be investigated in depth in order to arrive at an end-to-end framework with the precision required by the safety applications’ controllers. This problem has been tackled in the literature using different methods. However, machine learning, a promising and emerging field with remarkable potential for time series prediction, has not been explored enough for this purpose. In this paper, a two-layer neural network-based system is developed that predicts the future values of vehicle parameters, such as velocity, acceleration, and yaw rate, in the first layer and then predicts the two-dimensional (i.e., longitudinal and lateral) trajectory points based on the first layer’s outputs. The performance of the proposed framework has been evaluated in realistic cut-in scenarios from the Safety Pilot Model Deployment (SPMD) dataset, and the results show a noticeable improvement in prediction accuracy compared with the kinematics model, which is the model predominantly employed by the automotive industry. Both ideal and non-ideal communication circumstances have been investigated in our system evaluation. For the non-ideal case, an estimation step is included in the framework before the parameter prediction block to handle packet drops or sensor failures and to reconstruct the time series of vehicle parameters at a desirable frequency.
Tasks	Time Series, Time Series Prediction, Trajectory Prediction
Published	2018-08-01
URL	http://arxiv.org/abs/1808.00516v1
PDF	http://arxiv.org/pdf/1808.00516v1.pdf
PWC	https://paperswithcode.com/paper/a-learning-based-framework-for-two
Repo
Framework
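The abstract compares against a kinematics model without specifying it. For orientation, here is a minimal sketch of one common variant, assuming constant acceleration and constant yaw rate integrated with a simple Euler step; the function name, time step and horizon are hypothetical choices, not the baseline used in the paper.

```python
import math

def kinematic_rollout(x, y, heading, speed, accel, yaw_rate, dt=0.1, steps=30):
    """Propagate a vehicle state assuming constant acceleration and yaw rate.

    Returns a list of (x, y) positions, one per step. With heading = 0 at the
    start, x is longitudinal and y is lateral in the initial vehicle frame.
    """
    points = []
    for _ in range(steps):
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += yaw_rate * dt
        speed = max(0.0, speed + accel * dt)  # no reversing in this sketch
        points.append((x, y))
    return points
```

Such a model extrapolates the current velocity, acceleration and yaw rate unchanged, which is the limitation a learned first layer that forecasts those parameters is meant to address.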

Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents


Title	Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents
Authors	Amir Alansary, Loic Le Folgoc, Ghislain Vaillant, Ozan Oktay, Yuanwei Li, Wenjia Bai, Jonathan Passerat-Palmbach, Ricardo Guerrero, Konstantinos Kamnitsas, Benjamin Hou, Steven McDonagh, Ben Glocker, Bernhard Kainz, Daniel Rueckert
Abstract	We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and the detected planes, and the angle between the normal vectors of the detected and target planes. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracies of 1.53 mm, 1.98 mm and 4.84 mm, respectively.
Tasks
Published	2018-06-08
URL	http://arxiv.org/abs/1806.03228v1
PDF	http://arxiv.org/pdf/1806.03228v1.pdf
PWC	https://paperswithcode.com/paper/automatic-view-planning-with-multi-scale-deep
Repo
Framework
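The two evaluation measures named in the abstract are simple geometry. A minimal sketch, assuming planes are given as a normal vector plus a point and that landmark coordinates are in millimetres; the function name and argument layout are hypothetical.

```python
import numpy as np

def plane_metrics(pred_normal, pred_point, target_normal, target_landmarks):
    """Return (mean landmark-to-plane distance, angle in degrees between normals)."""
    n_pred = np.asarray(pred_normal, float)
    n_pred = n_pred / np.linalg.norm(n_pred)
    n_tgt = np.asarray(target_normal, float)
    n_tgt = n_tgt / np.linalg.norm(n_tgt)
    pts = np.asarray(target_landmarks, float)
    # Point-to-plane distance is |(p - p0) . n| for a unit normal n.
    dists = np.abs((pts - np.asarray(pred_point, float)) @ n_pred)
    # Planes are unoriented (n and -n describe the same plane), so use |cos|.
    cos_angle = np.clip(abs(n_pred @ n_tgt), 0.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return float(dists.mean()), float(angle)
```

For example, a detected plane z = 0 scored against landmarks lying at z = 2 gives a distance of 2, and comparing normals (0, 0, 1) and (1, 0, 1) gives 45 degrees.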

Rapid Time Series Prediction with a Hardware-Based Reservoir Computer


Title	Rapid Time Series Prediction with a Hardware-Based Reservoir Computer
Authors	Daniel Canaday, Aaron Griffith, Daniel Gauthier
Abstract	Reservoir computing is a neural network approach for processing time-dependent signals that has seen rapid development in recent years. Physical implementations of the technique using optical reservoirs have demonstrated remarkable accuracy and processing speed at benchmark tasks. However, these approaches require an electronic output layer to maintain high performance, which limits their use in tasks such as time-series prediction, where the output is fed back into the reservoir. We present here a reservoir computing scheme with rapid processing speed in both the reservoir and the output layer. The reservoir is realized by an autonomous, time-delay, Boolean network configured on a field-programmable gate array. We investigate the dynamical properties of the network and observe the fading memory property that is critical for successful reservoir computing. We demonstrate the utility of the technique by training a reservoir to learn the short- and long-term behavior of a chaotic system. We find accuracy comparable to state-of-the-art software approaches of similar network size, but with a superior real-time prediction rate of up to 160 MHz.
Tasks	Time Series, Time Series Prediction
Published	2018-07-19
URL	http://arxiv.org/abs/1807.07627v2
PDF	http://arxiv.org/pdf/1807.07627v2.pdf
PWC	https://paperswithcode.com/paper/rapid-time-series-prediction-with-a-hardware
Repo
Framework
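For readers new to reservoir computing, the sketch below shows the general idea in software: a fixed random recurrent network whose only trained part is a linear readout fitted by ridge regression. It is a conventional echo state network with tanh nodes, not the hardware Boolean network on an FPGA described above; the class name, sizes and the spectral-radius heuristic (scaling recurrent weights below 1 to encourage fading memory) are assumptions.

```python
import numpy as np

class EchoStateNetwork:
    """Software echo state network: fixed random reservoir, trained linear readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
        self.W_out = None

    def _states(self, U):
        """Drive the reservoir with inputs U of shape (T, n_inputs)."""
        x = np.zeros(self.W.shape[0])
        states = []
        for u in U:
            x = np.tanh(self.W @ x + self.W_in @ u)
            states.append(x.copy())
        return np.array(states)

    def fit(self, U, Y, ridge=1e-6, washout=100):
        """Fit only the readout by ridge regression, discarding the first states."""
        S = self._states(U)[washout:]
        A = S.T @ S + ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Y[washout:]).T
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out.T
```

Because training reduces to one linear solve, the expensive part at run time is stepping the reservoir itself, which is the part the paper moves into hardware.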

Fast and robust misalignment correction of Fourier ptychographic microscopy


Title	Fast and robust misalignment correction of Fourier ptychographic microscopy
Authors	Ao Zhou, Wei Wang, Ni Chen, Edmund Y. Lam, Byoungho Lee, Guohai Situ
Abstract	Fourier ptychographic microscopy (FPM) is a newly developed computational imaging technique that can provide gigapixel images with both high resolution (HR) and a wide field of view (FOV). However, positional misalignment of the LED array degrades the reconstruction, especially in regions away from the optical axis. In this paper, we propose a robust and fast method to correct the LED misalignment of FPM, termed misalignment correction for FPM (mcFPM). Although different regions in the FOV have different sensitivities to the LED misalignment, the experimental results show that mcFPM robustly eliminates the degradation in each region. Compared with state-of-the-art methods, mcFPM is much faster.
Tasks
Published	2018-02-20
URL	http://arxiv.org/abs/1803.00395v1
PDF	http://arxiv.org/pdf/1803.00395v1.pdf
PWC	https://paperswithcode.com/paper/fast-and-robust-misalignment-correction-of
Repo
Framework

Deep Learning Super-Diffusion in Multiplex Networks


Title	Deep Learning Super-Diffusion in Multiplex Networks
Authors	Vito M. Leli, Saeed Osat, Timur Tlyachev, Jacob D. Biamonte
Abstract	Complex network theory has shown success in understanding the emergent and collective behavior of complex systems [1]. Many real-world complex systems were recently discovered to be more accurately modeled as multiplex networks [2-6], in which each interaction type is mapped to its own network layer; e.g., multi-layer transportation networks, coupled social networks, metabolic and regulatory networks, etc. A salient physical phenomenon emerging from multiplexity is super-diffusion: an accelerated diffusion admitted by the multi-layer structure as compared to any single layer. Theoretically, super-diffusion was only known to be predictable using the spectral gap of the full Laplacian of a multiplex network and its interacting layers. Here we turn to machine learning, which has developed techniques to recognize, classify, and characterize complex sets of data. We show that modern machine learning architectures, such as fully connected and convolutional neural networks, can classify and predict the presence of super-diffusion in multiplex networks with 94.12% accuracy. Such predictions can be made in situ, without the need to determine the spectral properties of a network.
Tasks
Published	2018-11-09
URL	http://arxiv.org/abs/1811.04104v1
PDF	http://arxiv.org/pdf/1811.04104v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-super-diffusion-in-multiplex
Repo
Framework
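The spectral criterion the abstract refers to can be computed directly for small networks. The sketch below uses a common two-layer formulation, assumed here rather than taken from this paper: each layer has diffusion coefficient 1, layers are coupled node-to-node with strength Dx, and the multiplex is called super-diffusive when the second-smallest eigenvalue of the supra-Laplacian exceeds that of every individual layer. Function names are hypothetical.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    A = np.asarray(A, float)
    return np.diag(A.sum(axis=1)) - A

def algebraic_connectivity(L):
    """Second-smallest eigenvalue (spectral gap) of a symmetric Laplacian."""
    return np.sort(np.linalg.eigvalsh(L))[1]

def supra_laplacian(A1, A2, Dx):
    """Two-layer supra-Laplacian with interlayer coupling strength Dx."""
    I = np.eye(len(A1))
    L1, L2 = laplacian(A1), laplacian(A2)
    return np.block([[L1 + Dx * I, -Dx * I], [-Dx * I, L2 + Dx * I]])

def is_super_diffusive(A1, A2, Dx):
    """Return (flag, multiplex spectral gap, best single-layer spectral gap)."""
    lam_multiplex = algebraic_connectivity(supra_laplacian(A1, A2, Dx))
    lam_layers = max(algebraic_connectivity(laplacian(A1)),
                     algebraic_connectivity(laplacian(A2)))
    return lam_multiplex > lam_layers, lam_multiplex, lam_layers
```

For example, two 4-node path graphs whose union is the complete graph K4 are super-diffusive at strong coupling (Dx = 100) but not at weak coupling (Dx = 0.01), because the supra-Laplacian's second eigenvalue can never exceed 2·Dx. Labels of this kind are what a classifier would have to reproduce without an eigendecomposition.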

The perceived quality of process discovery tools


Title	The perceived quality of process discovery tools
Authors	Francis Bru, Jan Claes
Abstract	Process discovery has seen a rise in popularity in the last decade for both researchers and businesses. Recent developments have mainly focused on the power and the functionalities of the discovery algorithm. While continuous improvement of these functional aspects is very important, non-functional aspects such as visualization and usability are often overlooked. However, these aspects are considered valuable by end-users and play an important part in their experience when working with a process discovery tool. A questionnaire was sent out to give end-users the opportunity to voice their opinions on available process discovery tools and on the state of process discovery as a domain in general. The results of 66 respondents are presented and compared with the answers of 63 respondents who were contacted through one particular software vendor’s employee and customer base (i.e., Celonis).
Tasks
Published	2018-08-13
URL	http://arxiv.org/abs/1808.06475v1
PDF	http://arxiv.org/pdf/1808.06475v1.pdf
PWC	https://paperswithcode.com/paper/the-perceived-quality-of-process-discovery
Repo
Framework

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis


Title	Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis
Authors	Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai
Abstract	This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with the transition agent can also help improve the naturalness of synthetic speech and effectively control its speed.
Tasks	Acoustic Modelling, Speech Synthesis
Published	2018-07-18
URL	http://arxiv.org/abs/1807.06736v1
PDF	http://arxiv.org/pdf/1807.06736v1.pdf
PWC	https://paperswithcode.com/paper/forward-attention-in-sequence-to-sequence
Repo
Framework
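The forward recursion can be sketched as follows. This is a hedged reading of the abstract, not the authors' code: at each decoder step the new weight for phone n combines the previous weight at n (stay) and at n-1 (move forward), is multiplied by the ordinary softmax attention probability, and is renormalized. With a transition agent, the two terms are weighted by 1 - u_t and u_t. The array shapes, the start at the first phone and the name forward_attention are assumptions.

```python
import numpy as np

def forward_attention(energies, transition_probs=None):
    """Forward-attention weights from raw attention energies.

    energies: (T, N) array of attention scores over N phones at T decoder steps.
    transition_probs: optional length-T array of u_t in [0, 1], the probability
        of moving forward; None means stay and move are summed unweighted.
    Returns a (T, N) array whose rows sum to 1.
    """
    T, N = energies.shape
    # Ordinary softmax attention probabilities y_t(n) at each step.
    y = np.exp(energies - energies.max(axis=1, keepdims=True))
    y /= y.sum(axis=1, keepdims=True)
    alpha_prev = np.zeros(N)
    alpha_prev[0] = 1.0  # alignment starts at the first phone
    out = np.zeros((T, N))
    for t in range(T):
        moved = np.concatenate(([0.0], alpha_prev[:-1]))  # alpha_{t-1}(n-1)
        if transition_probs is None:
            alpha = (alpha_prev + moved) * y[t]
        else:
            u = transition_probs[t]
            alpha = ((1.0 - u) * alpha_prev + u * moved) * y[t]
        alpha /= alpha.sum()
        out[t] = alpha
        alpha_prev = alpha
    return out
```

Because probability mass can only stay or advance by one phone per step, alignments that jump ahead or move backwards receive zero weight, which is the monotonic condition described above. Setting every u_t to 0 keeps the alignment on the first phone, and larger u_t values push it forward faster, a plausible lever for controlling speaking rate.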

Multimodal Densenet


Title	Multimodal Densenet
Authors	Faisal Mahmood, Ziyun Yang, Thomas Ashley, Nicholas J. Durr
Abstract	Humans make accurate decisions by interpreting complex data from multiple sources. Medical diagnostics, in particular, often hinge on human interpretation of multi-modal information. In order for artificial intelligence to make progress in automated, objective, and accurate diagnosis and prognosis, methods to fuse information from multiple medical imaging modalities are required. However, combining information from multiple data sources poses several challenges, as current deep learning architectures lack the ability to extract useful representations from multimodal information, and simple concatenation is often used to fuse such information. In this work, we propose Multimodal DenseNet, a novel architecture for fusing multimodal data. Instead of focusing on concatenation or early and late fusion, our proposed architecture fuses information over several layers and gives the model flexibility in how it combines information from multiple sources. We apply this architecture to the challenge of polyp characterization and landmark identification in endoscopy. Features from white light images are fused with features from narrow band imaging or depth maps. This study demonstrates that Multimodal DenseNet outperforms monomodal classification as well as other multimodal fusion techniques by a significant margin on two different datasets.
Tasks
Published	2018-11-18
URL	http://arxiv.org/abs/1811.07407v1
PDF	http://arxiv.org/pdf/1811.07407v1.pdf
PWC	https://paperswithcode.com/paper/multimodal-densenet
Repo
Framework