October 21, 2019

3381 words 16 mins read

Paper Group AWR 12

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures. The Text-Based Adventure AI Competition. Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning. Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition. MVSNet: …

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures

Title High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures
Authors Iddo Drori, Isht Dwivedi, Pranav Shrestha, Jeffrey Wan, Yueqi Wang, Yunchu He, Anthony Mazza, Hugh Krogh-Freeman, Dimitri Leggas, Kendal Sandridge, Linyong Nan, Kaveri Thakoor, Chinmay Joshi, Sonam Goenka, Chen Keasar, Itsik Pe’er
Abstract We tackle the problem of protein secondary structure prediction using a common task framework. This led to the introduction of multiple ideas for neural architectures based on state-of-the-art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area of continuously improving methods. We use an ensemble of strong predictors to achieve an accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research.
Tasks Protein Secondary Structure Prediction
Published 2018-11-17
URL http://arxiv.org/abs/1811.07143v1
PDF http://arxiv.org/pdf/1811.07143v1.pdf
PWC https://paperswithcode.com/paper/high-quality-prediction-of-protein-q8
Repo https://github.com/idrori/cu-ssp
Framework tf
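
As a loose illustration of this kind of per-residue Q8 classifier (not any of the paper's specific architectures), the sketch below builds a bidirectional GRU in tf.keras over one-hot amino-acid sequences. The sequence length, alphabet size, and layer widths are assumptions, chosen to match the common CB6133/CB513 conventions.

```python
# Minimal sketch, assuming the usual CB6133/CB513 setup: sequences padded to
# length 700, 21-dimensional one-hot residue features, 8 output classes.
# This is NOT the paper's exact model, just one plausible baseline.
import tensorflow as tf

MAX_LEN, N_AA, N_Q8 = 700, 21, 8  # assumed dataset conventions

inputs = tf.keras.Input(shape=(MAX_LEN, N_AA))
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(64, return_sequences=True))(inputs)   # assumed width
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(64, activation="relu"))(x)
outputs = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(N_Q8, activation="softmax"))(x)     # per-residue Q8 labels

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```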

The Text-Based Adventure AI Competition

Title The Text-Based Adventure AI Competition
Authors Timothy Atkinson, Hendrik Baier, Tara Copplestone, Sam Devlin, Jerry Swan
Abstract In 2016, 2017, and 2018 at the IEEE Conference on Computational Intelligence in Games, the authors of this paper ran a competition for agents that can play classic text-based adventure games. This competition fills a gap in existing game AI competitions, which have typically focussed on traditional card/board games or modern video games with graphical interfaces. By providing a platform for evaluating agents in text-based adventures, the competition offers a novel benchmark for game AI with unique challenges for natural language understanding and generation. This paper summarises the three competitions run in 2016, 2017, and 2018 (including details of open-source implementations of both the competition framework and our competitors) and presents the results of an improved evaluation of these competitors across 20 games.
Tasks Board Games
Published 2018-08-03
URL http://arxiv.org/abs/1808.01262v4
PDF http://arxiv.org/pdf/1808.01262v4.pdf
PWC https://paperswithcode.com/paper/the-text-based-adventure-ai-competition
Repo https://github.com/Microsoft/nail_agent
Framework none

Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning

Title Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning
Authors Frank G. Glavin, Michael G. Madden
Abstract While reinforcement learning (RL) has been applied to turn-based board games for many years, more complex games involving decision-making in real-time are beginning to receive more attention. A challenge in such environments is that the time that elapses between deciding to take an action and receiving a reward based on its outcome can be longer than the interval between successive decisions. We explore this in the context of a non-player character (NPC) in a modern first-person shooter game. Such games take place in 3D environments where players, both human and computer-controlled, compete by engaging in combat and completing task objectives. We investigate the use of RL to enable NPCs to gather experience from game-play and improve their shooting skill over time from a reward signal based on the damage caused to opponents. We propose a new method for RL updates and reward calculations, in which the updates are carried out periodically, after each shooting encounter has ended, and a new weighted-reward mechanism is used which increases the reward applied to actions that lead to damaging the opponent in successive hits in what we term “hit clusters”.
Tasks Board Games, Decision Making
Published 2018-06-13
URL http://arxiv.org/abs/1806.05117v1
PDF http://arxiv.org/pdf/1806.05117v1.pdf
PWC https://paperswithcode.com/paper/learning-to-shoot-in-first-person-shooter
Repo https://github.com/lucylow/b00m-h3adsh0t
Framework none
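
To make the "hit cluster" reward idea concrete, here is a hedged sketch of one way such a weighted reward could be computed after a shooting encounter ends: successive hits in a run earn escalating credit, and misses reset the run. The exact weighting used in the paper may differ; the bonus scheme below is purely illustrative.

```python
# Hedged sketch of cluster-weighted rewards; not the paper's exact formula.
def clustered_rewards(hits, base_reward=1.0, cluster_bonus=0.5):
    """hits: list of booleans, one per shot fired during the encounter."""
    rewards, streak = [], 0
    for hit in hits:
        if hit:
            streak += 1
            # each successive hit in a cluster earns a growing bonus
            rewards.append(base_reward + cluster_bonus * (streak - 1))
        else:
            streak = 0
            rewards.append(0.0)
    return rewards

print(clustered_rewards([True, True, False, True]))  # [1.0, 1.5, 0.0, 1.0]
```

The updates would then be applied periodically, once per encounter, using these per-action rewards rather than an immediate per-step signal.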

Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition

Title Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition
Authors Mubariz Zaffar, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier
Abstract This paper presents a cognition-inspired, agnostic framework for building a map for Visual Place Recognition. The framework draws inspiration from human memorability, utilizes the traditional image entropy concept, and computes the static content in an image, thereby presenting a three-fold criterion to assess the ‘memorability’ of an image for visual place recognition. A dataset named ‘ESSEX3IN1’ is created, composed of highly confusing images from indoor, outdoor and natural scenes for analysis. When used in conjunction with state-of-the-art visual place recognition methods, the proposed framework provides a significant performance boost to these techniques, as evidenced by results on ESSEX3IN1 and other public datasets.
Tasks Visual Place Recognition
Published 2018-11-08
URL http://arxiv.org/abs/1811.03529v2
PDF http://arxiv.org/pdf/1811.03529v2.pdf
PWC https://paperswithcode.com/paper/memorable-maps-a-framework-for-re-defining
Repo https://github.com/MubarizZaffar/ESSEX3IN1-Dataset
Framework none
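
One of the three criteria is classic image entropy. A minimal NumPy sketch of Shannon entropy over an 8-bit grayscale histogram follows; the memorability threshold and the other two criteria (human memorability and static-content estimation) are not reproduced here.

```python
# Minimal sketch: Shannon entropy of an 8-bit grayscale image.
import numpy as np

def image_entropy(gray):
    """gray: 2-D uint8 array."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # ignore empty bins
    return float(-(p * np.log2(p)).sum())

frame = (np.random.rand(240, 320) * 255).astype(np.uint8)
print(image_entropy(frame))            # higher -> more textured, informative frame
```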

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Title MVSNet: Depth Inference for Unstructured Multi-view Stereo
Authors Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan
Abstract We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-art methods, but is also several times faster at runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.
Tasks
Published 2018-04-07
URL http://arxiv.org/abs/1804.02505v2
PDF http://arxiv.org/pdf/1804.02505v2.pdf
PWC https://paperswithcode.com/paper/mvsnet-depth-inference-for-unstructured-multi
Repo https://github.com/YoYo000/MVSNet
Framework tf
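
The variance-based cost metric is easy to illustrate in isolation: N warped feature volumes, one per view, reduce to a single cost volume as the per-voxel variance across views. The sketch below shows only that reduction (homography warping and 3D regularization are omitted), with toy tensor sizes as assumptions.

```python
# Minimal sketch of the variance-based cost aggregation across views.
import numpy as np

def variance_cost_volume(feature_volumes):
    """feature_volumes: array of shape (N_views, D, H, W, C)."""
    mean = feature_volumes.mean(axis=0)
    return ((feature_volumes - mean) ** 2).mean(axis=0)   # (D, H, W, C)

views = np.random.rand(3, 48, 32, 40, 8).astype(np.float32)  # toy sizes
cost = variance_cost_volume(views)
print(cost.shape)                                              # (48, 32, 40, 8)
```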

Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

Title Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
Authors Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao
Abstract To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on the VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We also develop a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.
Tasks Anomaly Detection, Unsupervised Anomaly Detection
Published 2018-02-12
URL http://arxiv.org/abs/1802.03903v1
PDF http://arxiv.org/pdf/1802.03903v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-anomaly-detection-via
Repo https://github.com/nakumgaurav/Anomaly-Detection
Framework none
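
A hedged sketch of the generic scoring step in VAE-based detectors of this kind: draw latent samples from a trained encoder, decode each into a Gaussian over the input window, and average the log-likelihood of the observed window. Low scores flag anomalies. `encode` and `decode` below are placeholders for a trained model, not Donut's actual API, and the details of Donut's training techniques are not shown.

```python
# Hedged sketch of Monte Carlo reconstruction-probability scoring.
import numpy as np

def reconstruction_score(window, encode, decode, n_samples=32):
    """window: observed KPI window; encode/decode: placeholder callables
    returning (mean, std) arrays of a trained VAE."""
    mu_z, sigma_z = encode(window)
    scores = []
    for _ in range(n_samples):
        z = mu_z + sigma_z * np.random.randn(*mu_z.shape)  # sample latent code
        mu_x, sigma_x = decode(z)
        log_p = -0.5 * (((window - mu_x) / sigma_x) ** 2
                        + np.log(2 * np.pi * sigma_x ** 2))
        scores.append(log_p.sum())
    return float(np.mean(scores))   # lower -> more anomalous
```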

DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN

Title DOPING: Generative Data Augmentation for Unsupervised Anomaly Detection with GAN
Authors Swee Kiat Lim, Yi Loo, Ngoc-Trung Tran, Ngai-Man Cheung, Gemma Roig, Yuval Elovici
Abstract Recently, the introduction of the generative adversarial network (GAN) and its variants has enabled the generation of realistic synthetic samples, which has been used for enlarging training sets. Previous work primarily focused on data augmentation for semi-supervised and supervised tasks. In this paper, we instead focus on unsupervised anomaly detection and propose a novel generative data augmentation framework optimized for this task. In particular, we propose to oversample infrequent normal samples - normal samples that occur with small probability, e.g., rare normal events. We show that these samples are responsible for false positives in anomaly detection. However, oversampling of infrequent normal samples is challenging for real-world high-dimensional data with multimodal distributions. To address this challenge, we propose to use a GAN variant known as the adversarial autoencoder (AAE) to transform the high-dimensional multimodal data distributions into low-dimensional unimodal latent distributions with well-defined tail probability. Then, we systematically oversample at the ‘edge’ of the latent distributions to increase the density of infrequent normal samples. We show that our oversampling pipeline is a unified one: it is generally applicable to datasets with different complex data distributions. To the best of our knowledge, our method is the first data augmentation technique focused on improving performance in unsupervised anomaly detection. We validate our method by demonstrating consistent improvements across several real-world datasets.
Tasks Anomaly Detection, Data Augmentation, Unsupervised Anomaly Detection
Published 2018-08-23
URL http://arxiv.org/abs/1808.07632v2
PDF http://arxiv.org/pdf/1808.07632v2.pdf
PWC https://paperswithcode.com/paper/doping-generative-data-augmentation-for
Repo https://github.com/greentfrapp/doping
Framework none
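
A hedged sketch of the oversampling step: assuming an adversarial autoencoder whose latent prior is a standard Gaussian, draw latent codes from the tail (the ‘edge’) of that prior and decode them into synthetic infrequent normal samples. The quantile cutoff is an assumption, and `decoder` is a placeholder for a trained AAE decoder.

```python
# Hedged sketch: rejection-sample latent codes whose norm lies in the tail
# of a standard Gaussian prior, then decode them to augment the training set.
import numpy as np

def sample_latent_edge(n, dim, tail_quantile=0.95):
    """Draw n latent vectors from the tail of an isotropic Gaussian prior."""
    norms_sq = np.sum(np.random.randn(100_000, dim) ** 2, axis=1)
    radius_threshold = np.sqrt(np.percentile(norms_sq, 100 * tail_quantile))
    samples = []
    while len(samples) < n:
        z = np.random.randn(dim)
        if np.linalg.norm(z) >= radius_threshold:   # keep only 'edge' codes
            samples.append(z)
    return np.stack(samples)

# synthetic = decoder.predict(sample_latent_edge(512, dim=8))  # placeholder decoder
```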

Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features

Title Scalable and Interpretable One-class SVMs with Deep Learning and Random Fourier features
Authors Minh-Nghia Nguyen, Ngo Anh Vien
Abstract The one-class support vector machine (OC-SVM) has long been one of the most effective anomaly detection methods and has been extensively adopted in both research and industrial applications. The biggest issue for OC-SVM, however, remains its limited ability to operate on large and high-dimensional datasets due to optimization complexity. These problems might be mitigated via dimensionality reduction techniques such as manifold learning or autoencoders. However, previous work often treats representation learning and anomaly prediction separately. In this paper, we propose the autoencoder-based one-class support vector machine (AE-1SVM), which brings OC-SVM, with the aid of random Fourier features to approximate the radial basis kernel, into the deep learning context by combining it with a representation learning architecture and jointly exploiting stochastic gradient descent to obtain end-to-end training. Interestingly, this also opens up the possible use of gradient-based attribution methods to explain the decision making for anomaly detection, which has long been challenging as a result of the implicit mappings between the input space and the kernel space. To the best of our knowledge, this is the first work to study the interpretability of deep learning in anomaly detection. We evaluate our method on a wide range of unsupervised anomaly detection tasks, in which our end-to-end training architecture achieves performance significantly better than previous work using separate training.
Tasks Anomaly Detection, Decision Making, Dimensionality Reduction, Representation Learning, Unsupervised Anomaly Detection
Published 2018-04-13
URL http://arxiv.org/abs/1804.04888v2
PDF http://arxiv.org/pdf/1804.04888v2.pdf
PWC https://paperswithcode.com/paper/scalable-and-interpretable-one-class-svms
Repo https://github.com/minh-nghia/AE-1SVM
Framework tf
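
The random-Fourier-feature approximation is the piece that turns the kernelized OC-SVM objective into a linear model trainable with SGD. A minimal sketch follows; the feature count and kernel bandwidth gamma are assumptions, and the autoencoder and one-class objective are not shown.

```python
# Minimal sketch of random Fourier features approximating the RBF kernel
# k(x, y) = exp(-gamma * ||x - y||^2), so that k(x, y) ~= Z(x) @ Z(y).
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.rand(1000, 32)
Z = random_fourier_features(X)   # explicit feature map replacing the kernel
print(Z.shape)                   # (1000, 256)
```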

Fast and Accurate Point Cloud Registration using Trees of Gaussian Mixtures

Title Fast and Accurate Point Cloud Registration using Trees of Gaussian Mixtures
Authors Ben Eckart, Kihwan Kim, Jan Kautz
Abstract Point cloud registration sits at the core of many important and challenging 3D perception problems including autonomous navigation, SLAM, object/scene recognition, and augmented reality. In this paper, we present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of a hierarchical Gaussian Mixture Model (GMM) representation. Our method constructs a top-down multi-scale representation of point cloud data by recursively running many small-scale data likelihood segmentations in parallel on a GPU. We leverage the resulting representation using a novel PCA-based optimization criterion that adaptively finds the best scale to perform data association between spatial subsets of point cloud data. Compared to previous Iterative Closest Point and GMM-based techniques, our tree-based point association algorithm performs data association in logarithmic-time while dynamically adjusting the level of detail to best match the complexity and spatial distribution characteristics of local scene geometry. In addition, unlike other GMM methods that restrict covariances to be isotropic, our new PCA-based optimization criterion well-approximates the true MLE solution even when fully anisotropic Gaussian covariances are used. Efficient data association, multi-scale adaptability, and a robust MLE approximation produce an algorithm that is up to an order of magnitude both faster and more accurate than current state-of-the-art on a wide variety of 3D datasets captured from LiDAR to structured light.
Tasks Autonomous Navigation, Point Cloud Registration, Scene Recognition
Published 2018-07-06
URL http://arxiv.org/abs/1807.02587v1
PDF http://arxiv.org/pdf/1807.02587v1.pdf
PWC https://paperswithcode.com/paper/fast-and-accurate-point-cloud-registration
Repo https://github.com/neka-nat/probreg
Framework none
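
As a hedged illustration of the underlying idea (not the paper's hierarchical tree-of-mixtures construction or its PCA-based scale selection), the sketch below fits a mixture of fully anisotropic Gaussians to a target cloud and uses the per-point responsibilities as soft data association for a source cloud, via scikit-learn.

```python
# Hedged sketch: GMM representation of a point cloud and soft data association.
import numpy as np
from sklearn.mixture import GaussianMixture

target = np.random.rand(2000, 3)                      # toy target cloud
gmm = GaussianMixture(n_components=16, covariance_type="full").fit(target)

source = np.random.rand(500, 3)                       # cloud to be registered
resp = gmm.predict_proba(source)                      # responsibilities = soft association
print(resp.shape)                                     # (500, 16)
```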

DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds

Title DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds
Authors Li Ding, Chen Feng
Abstract We propose DeepMapping, a novel registration framework using deep neural networks (DNNs) as auxiliary functions to align multiple point clouds from scratch to a globally consistent frame. We use DNNs to model the highly non-convex mapping process that traditionally involves hand-crafted data association, sensor pose initialization, and global refinement. Our key novelty is that “training” these DNNs with properly defined unsupervised losses is equivalent to solving the underlying registration problem, but less sensitive to good initialization than ICP. Our framework contains two DNNs: a localization network that estimates the poses for input point clouds, and a map network that models the scene structure by estimating the occupancy status of global coordinates. This allows us to convert the registration problem to a binary occupancy classification, which can be solved efficiently using gradient-based optimization. We further show that DeepMapping can be readily extended to address the problem of Lidar SLAM by imposing geometric constraints between consecutive point clouds. Experiments are conducted on both simulated and real datasets. Qualitative and quantitative comparisons demonstrate that DeepMapping often enables more robust and accurate global registration of multiple point clouds than existing techniques. Our code is available at https://ai4ce.github.io/DeepMapping/.
Tasks Point Cloud Registration
Published 2018-11-28
URL http://arxiv.org/abs/1811.11397v2
PDF http://arxiv.org/pdf/1811.11397v2.pdf
PWC https://paperswithcode.com/paper/deepmapping-unsupervised-map-estimation-from
Repo https://github.com/ai4ce/DeepMapping
Framework pytorch
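
The conversion of registration into binary occupancy classification can be sketched compactly: points transformed into the global frame by the (not shown) localization network should be classified as occupied by the map network, while points sampled between the sensor and each return should be classified as free. The network sizes below are illustrative only, and this is a sketch of the idea rather than the released DeepMapping code.

```python
# Hedged sketch of the occupancy-classification loss (2-D case).
import torch
import torch.nn as nn

map_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 1))             # logit of occupancy

def occupancy_loss(occupied_xy, free_xy):
    """occupied_xy, free_xy: (N, 2) global coordinates from the pose estimates."""
    bce = nn.BCEWithLogitsLoss()
    loss_occ = bce(map_net(occupied_xy), torch.ones(len(occupied_xy), 1))
    loss_free = bce(map_net(free_xy), torch.zeros(len(free_xy), 1))
    return loss_occ + loss_free

loss = occupancy_loss(torch.rand(256, 2), torch.rand(256, 2))
loss.backward()   # in the full method, gradients also flow into the pose network
```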

Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks

Title Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks
Authors Bryce Bagley, Blake Bordelon, Benjamin Moseley, Ralf Wessel
Abstract Learning synaptic weights of spiking neural network (SNN) models that can reproduce target spike trains from provided neural firing data is a central problem in computational neuroscience and spike-based computing. The discovery of the optimal weight values can be posed as a supervised learning task wherein the weights of the model network are chosen to maximize the similarity between the target spike trains and the model outputs. It is still largely unknown whether optimizing spike train similarity of highly recurrent SNNs produces weight matrices similar to those of the ground truth model. To this end, we propose flexible heuristic supervised learning rules, termed Pre-Synaptic Pool Modification (PSPM), that rely on stochastic weight updates in order to produce spikes within a short window of the desired times and eliminate spikes outside of this window. PSPM improves spike train similarity for all-to-all SNNs and makes no assumption about the post-synaptic potential of the neurons or the structure of the network since no gradients are required. We test whether optimizing for spike train similarity entails the discovery of accurate weights and explore the relative contributions of local and homeostatic weight updates. Although PSPM improves similarity between spike trains, the learned weights often differ from the weights of the ground truth model, implying that connectome inference from spike data may require additional constraints on connectivity statistics. We also find that spike train similarity is sensitive to local updates, but other measures of network activity, such as avalanche distributions, can be learned through synaptic homeostasis.
Tasks
Published 2018-10-07
URL https://arxiv.org/abs/1810.03199v3
PDF https://arxiv.org/pdf/1810.03199v3.pdf
PWC https://paperswithcode.com/paper/pre-synaptic-pool-modification-pspm-a
Repo https://github.com/blakebordelon/Spiking-Neural-Network-Optimization
Framework none
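
A hedged, heavily simplified sketch of the rule as described: compare model and target spike trains within a tolerance window; stochastically potentiate weights from the pre-synaptic pool of a neuron that missed a desired spike, and depress them when a spike occurred outside any desired window. The window size and update magnitudes are illustrative, not the paper's settings.

```python
# Hedged sketch of a PSPM-style stochastic, gradient-free weight update.
import numpy as np

def pspm_update(weights, model_spikes, target_spikes, window=3, delta=0.01):
    """weights: (N_pre, N_post); *_spikes: (T, N_post) binary arrays."""
    rng = np.random.default_rng()
    T, n_post = target_spikes.shape
    for j in range(n_post):
        for t in np.flatnonzero(target_spikes[:, j]):
            lo, hi = max(0, t - window), min(T, t + window + 1)
            if not model_spikes[lo:hi, j].any():          # missed a desired spike
                weights[:, j] += delta * rng.random(weights.shape[0])
        for t in np.flatnonzero(model_spikes[:, j]):
            lo, hi = max(0, t - window), min(T, t + window + 1)
            if not target_spikes[lo:hi, j].any():         # spurious spike
                weights[:, j] -= delta * rng.random(weights.shape[0])
    return weights
```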

AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

Title AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks
Authors Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, Jian Tang
Abstract Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to enumerate. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general and can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: https://github.com/DeepGraphLearning/RecommenderSystems.
Tasks Click-Through Rate Prediction, Recommendation Systems
Published 2018-10-29
URL https://arxiv.org/abs/1810.11921v2
PDF https://arxiv.org/pdf/1810.11921v2.pdf
PWC https://paperswithcode.com/paper/autoint-automatic-feature-interaction
Repo https://github.com/shichence/AutoInt
Framework tf
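
A minimal sketch of the interaction layer in this spirit: each field (categorical or numerical) is assumed already embedded into the same d-dimensional space; multi-head self-attention plus a residual connection then models interactions among fields, and stacking such layers captures higher-order combinations. Field count and sizes are illustrative, not the paper's configuration.

```python
# Minimal sketch of a self-attentive feature-interaction layer for CTR prediction.
import tensorflow as tf

n_fields, d_embed = 10, 16
fields = tf.keras.Input(shape=(n_fields, d_embed))         # pre-embedded fields

attn_out = tf.keras.layers.MultiHeadAttention(
    num_heads=2, key_dim=d_embed)(fields, fields)           # field-to-field attention
interacted = tf.keras.layers.ReLU()(
    tf.keras.layers.Add()([attn_out, fields]))               # residual connection

prob = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.Flatten()(interacted))                   # click probability

model = tf.keras.Model(fields, prob)
model.summary()
```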

How2: A Large-scale Dataset for Multimodal Language Understanding

Title How2: A Large-scale Dataset for Multimodal Language Understanding
Authors Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
Abstract In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks, we hope to stimulate more research on these and similar challenges, to obtain a deeper understanding of multimodality in language processing.
Tasks Machine Translation, Speech Recognition
Published 2018-11-01
URL http://arxiv.org/abs/1811.00347v2
PDF http://arxiv.org/pdf/1811.00347v2.pdf
PWC https://paperswithcode.com/paper/how2-a-large-scale-dataset-for-multimodal
Repo https://github.com/srvk/how2-dataset
Framework none


DTMT: A Novel Deep Transition Architecture for Neural Machine Translation

Title DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Authors Fandong Meng, Jinchao Zhang
Abstract Past years have witnessed rapid developments in Neural Machine Translation (NMT). Most recently, with advanced modeling and training techniques, RNN-based NMT (RNMT) has shown its potential strength, even compared with the well-known Transformer (self-attentional) model. Although the RNMT model can possess very deep architectures through stacking layers, the transition depth between consecutive hidden states along the sequential axis is still shallow. In this paper, we further enhance RNN-based NMT by increasing the transition depth between consecutive hidden states and build a novel Deep Transition RNN-based Architecture for Neural Machine Translation, named DTMT. This model enhances the hidden-to-hidden transition with multiple non-linear transformations, while maintaining a linear transformation path throughout this deep transition via a well-designed linear transformation mechanism to alleviate the vanishing gradient problem. Experiments show that with the specially designed deep transition modules, our DTMT can achieve remarkable improvements in translation quality. Experimental results on the Chinese->English translation task show that DTMT can outperform the Transformer model by +2.09 BLEU points and achieve the best results ever reported on the same dataset. On the WMT14 English->German and English->French translation tasks, DTMT shows superior quality to state-of-the-art NMT systems, including the Transformer and the RNMT+.
Tasks Machine Translation
Published 2018-12-19
URL https://arxiv.org/abs/1812.07807v2
PDF https://arxiv.org/pdf/1812.07807v2.pdf
PWC https://paperswithcode.com/paper/dtmt-a-novel-deep-transition-architecture-for
Repo https://github.com/fandongmeng/DTMT_InDec
Framework tf
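
The deep-transition idea can be sketched compactly: the first recurrent cell consumes the token embedding, and additional "transition" cells refine the hidden state with no new input before the next time step, deepening the hidden-to-hidden transition. The sketch below uses plain GRU cells fed a dummy zero input for the transitions; DTMT's specific cells with an explicit linear transformation path are not reproduced.

```python
# Hedged sketch of a deep-transition recurrent step (sizes are illustrative).
import torch
import torch.nn as nn

class DeepTransitionRNN(nn.Module):
    def __init__(self, d_emb=32, d_hid=64, transition_depth=3):
        super().__init__()
        self.first = nn.GRUCell(d_emb, d_hid)
        self.transitions = nn.ModuleList(
            nn.GRUCell(1, d_hid) for _ in range(transition_depth))

    def forward(self, embeddings):                     # (T, B, d_emb)
        T, B, _ = embeddings.shape
        h = embeddings.new_zeros(B, self.first.hidden_size)
        outputs = []
        for t in range(T):
            h = self.first(embeddings[t], h)           # consume the input token
            dummy = embeddings.new_zeros(B, 1)         # transition cells get no real input
            for cell in self.transitions:
                h = cell(dummy, h)                      # deepen the h->h transition
            outputs.append(h)
        return torch.stack(outputs)                     # (T, B, d_hid)

out = DeepTransitionRNN()(torch.randn(5, 2, 32))
print(out.shape)                                        # torch.Size([5, 2, 64])
```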

Partial Convolution based Padding

Title Partial Convolution based Padding
Authors Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, Bryan Catanzaro
Abstract In this paper, we present a simple yet effective padding scheme that can be used as a drop-in module for existing convolutional neural networks. We call it partial convolution based padding, with the intuition that the padded region can be treated as holes and the original input as non-holes. Specifically, during the convolution operation, the convolution results are re-weighted near image borders based on the ratios between the padded area and the convolution sliding window area. Extensive experiments with various deep network models on ImageNet classification and semantic segmentation demonstrate that the proposed padding scheme consistently outperforms standard zero padding with better accuracy.
Tasks Semantic Segmentation
Published 2018-11-28
URL http://arxiv.org/abs/1811.11718v1
PDF http://arxiv.org/pdf/1811.11718v1.pdf
PWC https://paperswithcode.com/paper/partial-convolution-based-padding
Repo https://github.com/lessw2020/auto-adaptive-ai
Framework pytorch
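
The re-weighting at the heart of the scheme is simple to sketch: treat padded pixels as holes, convolve a mask of ones with a same-sized window to count valid pixels, and scale the zero-padded convolution output by the ratio between the full window area and that valid count. This is a hedged sketch of the idea, not the released implementation.

```python
# Hedged sketch of partial-convolution-based padding via border re-weighting.
import torch
import torch.nn.functional as F

def partial_conv_padding(x, weight, bias=None, padding=1):
    """x: (B, C_in, H, W); weight: (C_out, C_in, k, k)."""
    k = weight.shape[-1]
    out = F.conv2d(x, weight, bias=None, padding=padding)      # plain zero padding
    mask = torch.ones(1, 1, x.shape[2], x.shape[3], device=x.device)
    ones_kernel = torch.ones(1, 1, k, k, device=x.device)
    valid = F.conv2d(mask, ones_kernel, padding=padding)       # valid pixels per window
    out = out * (k * k / valid)                                # boost results near borders
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    return out

y = partial_conv_padding(torch.randn(1, 3, 8, 8), torch.randn(4, 3, 3, 3))
print(y.shape)   # torch.Size([1, 4, 8, 8])
```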