January 31, 2020


Paper Group ANR 130


Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders


Title	Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders
Authors	Mostafa Sadeghi, Xavier Alameda-Pineda
Abstract	Recently, an audio-visual speech generative model based on a variational autoencoder (VAE) has been proposed, which is combined with a nonnegative matrix factorization (NMF) model for the noise variance to perform unsupervised speech enhancement. When the visual data is clean, speech enhancement with the audio-visual VAE performs better than with an audio-only VAE, which is trained on audio-only data. However, the audio-visual VAE is not robust against noisy visual data, e.g., when the speaker's face is not frontal or the lip region is occluded in some video frames. In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. This mixture model consists of a trained audio-only VAE and a trained audio-visual VAE. The motivation is to skip noisy visual frames by switching to the audio-only VAE model. We present a variational expectation-maximization method to estimate the parameters of the model. Experiments show the promising performance of the proposed method.
Tasks	Speech Enhancement
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03930v1
PDF	https://arxiv.org/pdf/1911.03930v1.pdf
PWC	https://paperswithcode.com/paper/robust-unsupervised-audio-visual-speech
Repo
Framework
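
The switching idea in the abstract above can be illustrated with a generic two-component mixture: for each frame, compare how well each model explains the data and compute the posterior weight of the audio-only component. This is only a minimal sketch, not the paper's variational expectation-maximization procedure. The function name, the per-frame log-likelihood inputs, and the prior `prior_audio` are all hypothetical.

```python
import math


def audio_only_responsibility(loglik_audio, loglik_av, prior_audio=0.5):
    """Posterior weight of the audio-only component for one frame.

    loglik_audio / loglik_av are hypothetical per-frame log-likelihoods
    under the audio-only and audio-visual models; prior_audio must lie
    strictly between 0 and 1.
    """
    a = math.log(prior_audio) + loglik_audio
    b = math.log(1.0 - prior_audio) + loglik_av
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(a, b)
    wa = math.exp(a - m)
    wb = math.exp(b - m)
    return wa / (wa + wb)


# A frame that the audio-visual model explains poorly (e.g. occluded lips)
# gets a weight close to 1, i.e. it is effectively handled by audio only.
weights = [audio_only_responsibility(la, lav)
           for la, lav in [(-10.0, -10.0), (-10.0, -30.0)]]
```

Working in the log domain keeps the computation stable even when both log-likelihoods are very negative, which is common for high-dimensional spectrogram frames.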

Explicit-risk-aware Path Planning with Reward Maximization


Title	Explicit-risk-aware Path Planning with Reward Maximization
Authors	Xuesu Xiao, Jan Dufek, Robin Murphy
Abstract	This paper develops a path planner that minimizes risk (e.g., motion execution) while maximizing accumulated reward (e.g., quality of sensor viewpoint), motivated by visual assistance or tracking scenarios in unstructured or confined environments. In these scenarios, the robot should maintain the best viewpoint as it moves to the goal. However, in unstructured or confined environments, some paths may increase the risk of collision; therefore there is a tradeoff between risk and reward. Conventional state-dependent risk or probabilistic uncertainty modeling either does not consider path-level risk or is difficult to acquire. This risk-reward planner explicitly represents risk as a function of motion plans, i.e., paths. Without manual assignment of the negative impact to the planner caused by risk, this planner takes in a pre-established viewpoint quality map and plans the target location and the path leading to it simultaneously, in order to maximize overall reward along the entire path while minimizing risk. Exact and approximate algorithms are presented, whose solutions are further demonstrated on a physical tethered aerial vehicle. Beyond the visual assistance problem, the proposed framework also provides a new planning paradigm to address minimum-risk planning under dynamical risk and absence of substructure optimality and to balance the trade-off between reward and risk.
Tasks
Published	2019-03-07
URL	http://arxiv.org/abs/1903.03187v1
PDF	http://arxiv.org/pdf/1903.03187v1.pdf
PWC	https://paperswithcode.com/paper/explicit-risk-aware-path-planning-with-reward
Repo
Framework

KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents


Title	KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents
Authors	Ygor Gallina, Florian Boudin, Béatrice Daille
Abstract	Keyphrase generation is the task of predicting a set of lexical units that conveys the main content of a source text. Existing datasets for keyphrase generation are only readily available for the scholarly domain and include non-expert annotations. In this paper we present KPTimes, a large-scale dataset of news texts paired with editor-curated keyphrases. Exploring the dataset, we show how editors tag documents, and how their annotations differ from those found in existing datasets. We also train and evaluate state-of-the-art neural keyphrase generation models on KPTimes to gain insights on how well they perform on the news domain. The dataset is available online at https://github.com/ygorg/KPTimes .
Tasks
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12559v1
PDF	https://arxiv.org/pdf/1911.12559v1.pdf
PWC	https://paperswithcode.com/paper/kptimes-a-large-scale-dataset-for-keyphrase-1
Repo
Framework

Joint Multi-Domain Learning for Automatic Short Answer Grading


Title	Joint Multi-Domain Learning for Automatic Short Answer Grading
Authors	Swarnadeep Saha, Tejas I. Dhamecha, Smit Marvaniya, Peter Foltz, Renuka Sindhgatta, Bikram Sengupta
Abstract	One of the fundamental challenges towards building any intelligent tutoring system is its ability to automatically grade short student answers. A typical automatic short answer grading system (ASAG) grades student answers across multiple domains (or subjects). Grading student answers requires building a supervised machine learning model that evaluates the similarity of the student answer with the reference answer(s). We observe that unlike typical textual similarity or entailment tasks, the notion of similarity is not universal here. On one hand, para-phrasal constructs of the language can indicate similarity independent of the domain. On the other hand, two words, or phrases, that are not strict synonyms of each other, might mean the same in certain domains. Building on this observation, we propose JMD-ASAG, the first joint multidomain deep learning architecture for automatic short answer grading that performs domain adaptation by learning generic and domain-specific aspects from the limited domain-wise training data. JMD-ASAG not only learns the domain-specific characteristics but also overcomes the dependence on a large corpus by learning the generic characteristics from the task-specific data itself. On a large-scale industry dataset and a benchmarking dataset, we show that our model performs significantly better than existing techniques which either learn domain-specific models or adapt a generic similarity scoring model from a large corpus. Further, on the benchmarking dataset, we report state-of-the-art results against all existing non-neural and neural models.
Tasks	Domain Adaptation
Published	2019-02-25
URL	http://arxiv.org/abs/1902.09183v1
PDF	http://arxiv.org/pdf/1902.09183v1.pdf
PWC	https://paperswithcode.com/paper/joint-multi-domain-learning-for-automatic
Repo
Framework

SESF-Fuse: An Unsupervised Deep Model for Multi-Focus Image Fusion


Title	SESF-Fuse: An Unsupervised Deep Model for Multi-Focus Image Fusion
Authors	Boyuan Ma, Xiaojuan Ban, Haiyou Huang, Yu Zhu
Abstract	In this work, we propose a novel unsupervised deep learning model to address the multi-focus image fusion problem. First, we train an encoder-decoder network in an unsupervised manner to acquire deep features of the input images. We then use these features and spatial frequency to measure the activity level and obtain a decision map. Finally, we apply some consistency verification methods to adjust the decision map and produce the fused result. The key idea behind the proposed method is that only the objects within the depth-of-field (DOF) have a sharp appearance in the photograph, while other objects are likely to be blurred. In contrast to previous works, our method analyzes sharp appearance in the deep features instead of the original image. Experimental results demonstrate that the proposed method achieves state-of-the-art fusion performance compared to 16 existing fusion methods in objective and subjective assessment.
Tasks
Published	2019-08-05
URL	https://arxiv.org/abs/1908.01703v2
PDF	https://arxiv.org/pdf/1908.01703v2.pdf
PWC	https://paperswithcode.com/paper/sesf-fuse-an-unsupervised-deep-model-for
Repo
Framework
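
The abstract above uses spatial frequency as its activity measure. A minimal sketch of the classical spatial-frequency definition is given below, computed on a small 2D block of values and used to pick the sharper of two blocks. Applying it to raw blocks rather than learned deep features, and the helper name `pick_sharper`, are simplifying assumptions and not the paper's pipeline.

```python
import math


def spatial_frequency(block):
    """Classical spatial frequency of a 2D block (list of equal-length rows).

    SF = sqrt(RF^2 + CF^2), where RF and CF are the root-mean-square
    differences between horizontally and vertically adjacent values.
    """
    rows, cols = len(block), len(block[0])
    n = rows * cols
    row_sq = sum((block[i][j] - block[i][j - 1]) ** 2
                 for i in range(rows) for j in range(1, cols))
    col_sq = sum((block[i][j] - block[i - 1][j]) ** 2
                 for i in range(1, rows) for j in range(cols))
    return math.sqrt(row_sq / n + col_sq / n)


def pick_sharper(block_a, block_b):
    """Hypothetical decision rule: 1 if block_a is at least as sharp, else 0."""
    return 1 if spatial_frequency(block_a) >= spatial_frequency(block_b) else 0


flat = [[0.5, 0.5], [0.5, 0.5]]      # blurred region: no local variation
checker = [[0.0, 1.0], [1.0, 0.0]]   # in-focus region: strong local variation
decision = pick_sharper(checker, flat)
```

Repeating this comparison over every location of two source images would give a binary decision map, which is the object the consistency verification step in the abstract then cleans up.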

Segway DRIVE Benchmark: Place Recognition and SLAM Data Collected by A Fleet of Delivery Robots


Title	Segway DRIVE Benchmark: Place Recognition and SLAM Data Collected by A Fleet of Delivery Robots
Authors	Jianzhu Huai, Yusen Qin, Fumin Pang, Zichong Chen
Abstract	Visual place recognition and simultaneous localization and mapping (SLAM) have recently begun to be used in real-world autonomous navigation tasks like food delivery. Existing datasets for SLAM research are often not representative of in situ operations, leaving a gap between academic research and real-world deployment. In response, this paper presents the Segway DRIVE benchmark, a novel and challenging dataset suite collected by a fleet of Segway delivery robots. Each robot is equipped with a global-shutter fisheye camera, a consumer-grade IMU synced to the camera on chip, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. As they routinely carry out tasks in office buildings and shopping malls while collecting data, the dataset spanning a year is characterized by planar motions, moving pedestrians in scenes, and changing environment and lighting. Such factors typically pose severe challenges and may lead to failures for SLAM algorithms. Moreover, several metrics are proposed to evaluate metric place recognition algorithms. With these metrics, sample SLAM and metric place recognition methods were evaluated on this benchmark. The first release of our benchmark has hundreds of sequences, covering more than 50 km of indoor floors. More data will be added as the robot fleet continues to operate in real life. The benchmark is available at http://drive.segwayrobotics.com/#/dataset/download.
Tasks	Autonomous Navigation, Simultaneous Localization and Mapping, Visual Place Recognition
Published	2019-07-08
URL	https://arxiv.org/abs/1907.03424v1
PDF	https://arxiv.org/pdf/1907.03424v1.pdf
PWC	https://paperswithcode.com/paper/segway-drive-benchmark-place-recognition-and
Repo
Framework

DiscoTK: Using Discourse Structure for Machine Translation Evaluation


Title	DiscoTK: Using Discourse Structure for Machine Translation Evaluation
Authors	Shafiq Joty, Francisco Guzman, Lluis Marquez, Preslav Nakov
Abstract	We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. We experiment with five transformations and augmentations of a base discourse tree representation based on the rhetorical structure theory, and we combine the kernel scores for each of them into a single score. Finally, we add other metrics from the ASIYA MT evaluation toolkit, and we tune the weights of the combination on actual human judgments. Experiments on the WMT12 and WMT13 metrics shared task datasets show correlation with human judgments that outperforms what the best systems that participated in these years achieved, both at the segment and at the system level.
Tasks	Machine Translation
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12547v1
PDF	https://arxiv.org/pdf/1911.12547v1.pdf
PWC	https://paperswithcode.com/paper/discotk-using-discourse-structure-for-machine-1
Repo
Framework

On Using SpecAugment for End-to-End Speech Translation


Title	On Using SpecAugment for End-to-End Speech Translation
Authors	Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney
Abstract	This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features; it consists of masking blocks of frequency channels and/or time steps. We apply SpecAugment on end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En->Fr and +1.2% on IWSLT TED-talks En->De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
Tasks	Data Augmentation
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08876v1
PDF	https://arxiv.org/pdf/1911.08876v1.pdf
PWC	https://paperswithcode.com/paper/on-using-specaugment-for-end-to-end-speech
Repo
Framework
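
The masking described in the abstract above can be sketched in a few lines. The example below blanks one contiguous band of frequency channels and one contiguous run of time steps in a feature matrix. The function name, the width limits, and the use of a single mask of each kind are illustrative assumptions, not the configuration used in the paper.

```python
import random


def spec_augment(features, max_freq_width=2, max_time_width=3,
                 mask_value=0.0, rng=None):
    """Return a masked copy of `features`.

    `features` is a list of time frames, each a list of frequency channels.
    One frequency block and one time block are masked; widths are drawn
    uniformly from 0 up to the given maxima (hypothetical defaults).
    """
    rng = rng or random.Random()
    num_frames = len(features)
    num_channels = len(features[0]) if num_frames else 0
    masked = [list(frame) for frame in features]
    if num_frames == 0 or num_channels == 0:
        return masked

    # Frequency mask: blank the same band of channels in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_channels))
    f_start = rng.randint(0, num_channels - f_width)
    for frame in masked:
        for c in range(f_start, f_start + f_width):
            frame[c] = mask_value

    # Time mask: blank a contiguous run of whole frames.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [mask_value] * num_channels
    return masked


# Usage: 10 frames of 8 channels, masked with a seeded generator.
feats = [[1.0] * 8 for _ in range(10)]
out = spec_augment(feats, rng=random.Random(0))
```

Because the masks are redrawn on every call, the model sees a differently corrupted version of each utterance in every epoch, which is how the technique discourages overfitting.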

Learning Malware Representation based on Execution Sequences


Title	Learning Malware Representation based on Execution Sequences
Authors	Yi-Ting Huang, Ting-Yi Chen, Yeali S. Sun, Meng Chang Chen
Abstract	Malware analysis has been extensively investigated as the number and types of malware have increased dramatically. However, most previous studies use end-to-end systems to detect whether a sample is malicious, or to identify its malware family. In this paper, we propose a neural network framework composed of an embedder, an encoder, and a filter to learn malware representations from characteristic execution sequences for malware family classification. The embedder uses BERT and Sent2Vec, state-of-the-art embedding modules, to capture relations within a single API call and among consecutive API calls in an execution trace. The encoder comprises gated recurrent units (GRU) to preserve the ordinal position of API calls and a self-attention mechanism for comparing intra-relations among different positions of API calls. The filter identifies representative API calls to build the malware representation. We conduct broad experiments to determine the influence of individual framework components. The results show that the proposed framework outperforms the baselines, and also demonstrate that using Sent2Vec to learn complete API call embeddings and GRU to explicitly preserve ordinal information yields more information and thus significant improvements. Also, the proposed approach effectively classifies new malicious execution traces on the basis of similarities with previously collected families.
Tasks
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07250v1
PDF	https://arxiv.org/pdf/1912.07250v1.pdf
PWC	https://paperswithcode.com/paper/learning-malware-representation-based-on
Repo
Framework

Mixed Formal Learning: A Path to Transparent Machine Learning


Title	Mixed Formal Learning: A Path to Transparent Machine Learning
Authors	Sandra Carrico
Abstract	This paper presents Mixed Formal Learning, a new architecture that learns models based on formal mathematical representations of the domain of interest and exposes latent variables. The second element in the architecture learns a particular skill, typically by using traditional prediction or classification mechanisms. Our key findings include that this architecture: (1) Facilitates transparency by exposing key latent variables based on a learned mathematical model; (2) Enables Low Shot and Zero Shot training of machine learning without sacrificing accuracy or recall.
Tasks
Published	2019-01-20
URL	http://arxiv.org/abs/1901.06622v1
PDF	http://arxiv.org/pdf/1901.06622v1.pdf
PWC	https://paperswithcode.com/paper/mixed-formal-learning-a-path-to-transparent
Repo
Framework

Optimal $δ$-Correct Best-Arm Selection for General Distributions


Title	Optimal $δ$-Correct Best-Arm Selection for General Distributions
Authors	Shubhada Agrawal, Sandeep Juneja, Peter Glynn
Abstract	Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the one with the largest mean using a delta-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified delta) that has minimum sample complexity. Lower bounds for delta-correct algorithms are well known. Delta-correct algorithms that match the lower bound asymptotically as delta reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise under a delta-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a delta-correct algorithm that matches the lower bound as delta reduces to zero under the mild restriction that a known bound on the expectation of a non-negative, continuous, increasing convex function (for example, the squared moment) of the underlying random variables, exists. We also propose batch processing and identify near-optimal batch sizes to substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well studied classic problem in the simulation community.
Tasks	Recommendation Systems
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09094v2
PDF	https://arxiv.org/pdf/1908.09094v2.pdf
PWC	https://paperswithcode.com/paper/optimal-best-arm-selection-for-general
Repo
Framework

Machine Learning for Classification of Protein Helix Capping Motifs


Title	Machine Learning for Classification of Protein Helix Capping Motifs
Authors	Sean Mullane, Ruoyan Chen, Sri Vaishnavi Vemulapalli, Eli J. Draizen, Ke Wang, Cameron Mura, Philip E. Bourne
Abstract	The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular $\alpha$-helix and $\beta$-sheet; irregular structural patterns, such as ‘turns’ and ‘loops’, are less understood. Here, we present a study of Deep Learning techniques to classify the loop-like end cap structures which delimit $\alpha$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly–including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB)–as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00455v1
PDF	http://arxiv.org/pdf/1905.00455v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-for-classification-of
Repo
Framework
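
The abstract above reports class-balanced accuracy, which matters because helix-cap residues are far rarer than non-cap residues. A minimal sketch follows, assuming the term means the unweighted mean of per-class recall. That reading, and the toy labels, are assumptions, since the abstract does not define the metric.

```python
from collections import defaultdict


def class_balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy imbalanced example: 2 cap residues, 8 non-cap residues, and a
# degenerate classifier that always predicts "non".
y_true = ["cap"] * 2 + ["non"] * 8
y_pred = ["non"] * 10
balanced = class_balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here plain accuracy rewards the degenerate classifier for the majority class, while the balanced score exposes that it never finds a cap residue.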

Distributed Deep Convolutional Neural Networks for the Internet-of-Things


Title	Distributed Deep Convolutional Neural Networks for the Internet-of-Things
Authors	Simone Disabato, Manuel Roveri, Cesare Alippi
Abstract	Due to the high demand in computation and memory, deep learning solutions are mostly restricted to high-performance computing units, e.g., those present in servers, Cloud, and computing centers. In pervasive systems, e.g., those involving Internet-of-Things (IoT) technological solutions, this would require transmitting the acquired data from IoT sensors to the computing platform and waiting for its output. This solution might become infeasible when remote connectivity is either unavailable or limited in bandwidth. Moreover, it introduces uncertainty in the “data production to decision making” latency, which, in turn, might impair control loop stability if the response is used to drive IoT actuators. In order to support a real-time recall phase directly at the IoT level, deep learning solutions must be completely rethought, keeping in mind the constraints on memory and computation characterizing IoT units. In this paper we focus on Convolutional Neural Networks (CNNs), a specific deep learning solution for image and video classification, and introduce a methodology aiming at distributing their computation onto the units of the IoT system. We formalize such a methodology as an optimization problem where the latency between the data-gathering phase and the subsequent decision-making one is minimized. The methodology supports multiple IoT sources of data as well as multiple CNNs in execution on the same IoT system, making it a general-purpose distributed computing platform for CNN-based applications demanding autonomy, low decision-latency, and high Quality-of-Service.
Tasks	Decision Making, Video Classification
Published	2019-08-02
URL	https://arxiv.org/abs/1908.01656v1
PDF	https://arxiv.org/pdf/1908.01656v1.pdf
PWC	https://paperswithcode.com/paper/distributed-deep-convolutional-neural
Repo
Framework
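
To make the optimization view in the abstract above concrete, the toy sketch below assigns each CNN layer to one IoT unit so that end-to-end latency is minimal while no unit exceeds its memory. The latency model (compute cost divided by unit speed, plus a fixed delay whenever consecutive layers sit on different units), the brute-force search, and all names and numbers are hypothetical and not the paper's formulation.

```python
from itertools import product


def best_placement(layer_cost, layer_mem, unit_speed, unit_mem, hop_delay):
    """Exhaustively search layer-to-unit assignments.

    Returns (latency, assignment), or None if no assignment fits in memory.
    Only practical for very small problems: the search space has
    len(unit_speed) ** len(layer_cost) candidates.
    """
    n_units = len(unit_speed)
    best = None
    for assign in product(range(n_units), repeat=len(layer_cost)):
        # Memory constraint: total footprint of the layers on each unit.
        used = [0.0] * n_units
        for layer, unit in enumerate(assign):
            used[unit] += layer_mem[layer]
        if any(u > cap for u, cap in zip(used, unit_mem)):
            continue
        # Latency: compute time plus one hop per change of unit.
        latency = sum(layer_cost[l] / unit_speed[u]
                      for l, u in enumerate(assign))
        latency += sum(hop_delay
                       for a, b in zip(assign, assign[1:]) if a != b)
        if best is None or latency < best[0]:
            best = (latency, assign)
    return best


# Two layers, a slow unit with room for both and a fast unit with room for one.
result = best_placement(layer_cost=[4, 4], layer_mem=[1, 1],
                        unit_speed=[1, 2], unit_mem=[2, 1], hop_delay=1)
```

In this example the fastest unit cannot hold both layers, so the best plan splits the network across units and pays one transmission hop.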

Parity Partition Coding for Sharp Multi-Label Classification


Title	Parity Partition Coding for Sharp Multi-Label Classification
Authors	Christopher G. Blake, Giuseppe Castiglione, Christopher Srinivasa, Marcus Brubaker
Abstract	The problem of efficiently training and evaluating image classifiers that can distinguish between a large number of object categories is considered. A novel metric, sharpness, is proposed which is defined as the fraction of object categories that are above a threshold accuracy. To estimate sharpness (along with a confidence value), a technique called fraction-accurate estimation is introduced which samples categories and samples instances from these categories. In addition, a technique called parity partition coding, a special type of error correcting output code, is introduced, increasing sharpness, while reducing the multi-class problem to a multi-label one with exponentially fewer outputs. We demonstrate that this approach outperforms the baseline model for both MultiMNIST and CelebA, while requiring fewer parameters and exceeding state of the art accuracy on individual labels.
Tasks	Multi-Label Classification
Published	2019-08-23
URL	https://arxiv.org/abs/1908.09651v1
PDF	https://arxiv.org/pdf/1908.09651v1.pdf
PWC	https://paperswithcode.com/paper/parity-partition-coding-for-sharp-multi-label
Repo
Framework
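
The sharpness metric defined in the abstract above is simple to compute once per-category accuracies are available. The sketch below takes a mapping from category to accuracy and returns the fraction of categories above a threshold. The category names and accuracies are made up, and treating a category exactly at the threshold as not above it is an assumption.

```python
def sharpness(category_accuracy, threshold):
    """Fraction of categories whose accuracy is strictly above `threshold`."""
    if not category_accuracy:
        raise ValueError("need at least one category")
    above = sum(1 for acc in category_accuracy.values() if acc > threshold)
    return above / len(category_accuracy)


# Hypothetical per-category accuracies.
accs = {"cat": 0.9, "dog": 0.5, "bird": 0.8}
score = sharpness(accs, threshold=0.75)
```

Unlike average accuracy, this score does not let a few very accurate categories hide categories on which the classifier is unusable.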

ATFaceGAN: Single Face Image Restoration and Recognition from Atmospheric Turbulence


Title	ATFaceGAN: Single Face Image Restoration and Recognition from Atmospheric Turbulence
Authors	Chun Pong Lau, Hossein Souri, Rama Chellappa
Abstract	Image degradation due to atmospheric turbulence is common when capturing images at long ranges. To mitigate the degradation due to turbulence, which includes deformation and blur, we propose a generative single-frame restoration algorithm which disentangles the blur and deformation due to turbulence and reconstructs a restored image. The disentanglement is achieved by decomposing the distortion due to turbulence into blur and deformation components using a deblur generator and a deformation correction generator. Two paths of restoration are implemented to regularize the disentanglement and generate two restored images from one degraded image. A fusion function combines the features of the restored images to reconstruct a sharp image with rich details. Adversarial and perceptual losses are added to reconstruct a sharp image and suppress the artifacts, respectively. Extensive experiments demonstrate the effectiveness of the proposed restoration algorithm, which achieves satisfactory performance in face restoration and face recognition.
Tasks	Face Recognition, Image Restoration
Published	2019-10-07
URL	https://arxiv.org/abs/1910.03119v1
PDF	https://arxiv.org/pdf/1910.03119v1.pdf
PWC	https://paperswithcode.com/paper/atfacegan-single-face-image-restoration-and
Repo
Framework