January 28, 2020


Paper Group ANR 1028



Diverse Behavior Is What Game AI Needs: Generating Varied Human-Like Playing Styles Using Evolutionary Multi-Objective Deep Reinforcement Learning

Title Diverse Behavior Is What Game AI Needs: Generating Varied Human-Like Playing Styles Using Evolutionary Multi-Objective Deep Reinforcement Learning
Authors Ruimin Shen, Yan Zheng, Jianye Hao, Yinfeng Chen, Changjie Fan
Abstract Designing artificial intelligence for games (Game AI) has long been recognized as a notoriously challenging task in the game industry, as it mainly relies on manual design and requires plenty of domain knowledge. More frustratingly, even after considerable effort, a satisfying Game AI remains hard to achieve by manual design because of the almost infinite search space. The recent success of deep reinforcement learning (DRL) sheds light on advancing automated game design, significantly reducing the need for expert human support. However, existing DRL algorithms mostly focus on training a Game AI to win the game rather than on the way it wins (its style). To bridge the gap, we introduce EMO-DRL, an end-to-end game design framework that leverages evolutionary algorithms, DRL and multi-objective optimization (MOO) to perform intelligent and automatic game design. First, EMO-DRL proposes style-oriented learning, which bypasses manual reward shaping in DRL and directly learns a Game AI with an expected style in an end-to-end fashion. On this basis, prioritized multi-objective optimization is introduced to achieve more diverse, natural and human-like Game AIs. Large-scale evaluations on an Atari game and a commercial massively multiplayer online game are conducted. The results demonstrate that, compared to existing algorithms, EMO-DRL achieves better game designs in an intelligent and automatic way.
Tasks
Published 2019-10-20
URL https://arxiv.org/abs/1910.09022v2
PDF https://arxiv.org/pdf/1910.09022v2.pdf
PWC https://paperswithcode.com/paper/diverse-behavior-is-what-game-ai-needs
Repo
Framework
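The core idea, keeping a population of policies that trade off winning against a target style rather than collapsing everything into one scalar reward, can be sketched with a toy Pareto filter. The objective names and numbers below are illustrative, not the paper's implementation:

```python
# Toy policies scored on two objectives: win rate and closeness to a target style.
population = [
    {"win": 0.9, "style": 0.2},
    {"win": 0.6, "style": 0.8},
    {"win": 0.5, "style": 0.5},  # dominated by the policy above it
    {"win": 0.3, "style": 0.9},
]

def pareto_front(pop):
    """Keep policies that no other policy beats on both objectives -
    the diverse, style-varied set an evolutionary MOO loop maintains."""
    def dominates(q, p):
        return (q["win"] >= p["win"] and q["style"] >= p["style"]
                and (q["win"] > p["win"] or q["style"] > p["style"]))
    return [p for p in pop if not any(dominates(q, p) for q in pop)]

front = pareto_front(population)
print(front)  # three mutually non-dominated policies survive
```

An evolutionary loop would repeatedly mutate policies, re-evaluate both objectives via DRL rollouts, and keep only the front, yielding varied playing styles instead of a single win-maximizing agent.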

Evaluating structure learning algorithms with a balanced scoring function

Title Evaluating structure learning algorithms with a balanced scoring function
Authors Anthony Constantinou
Abstract Several structure learning algorithms have been proposed towards discovering causal or Bayesian Network (BN) graphs, which is a particularly challenging problem in AI. The performance of these algorithms is evaluated based on the relationship the learned graph has with respect to the ground truth graph. However, there is no agreed scoring function to determine this relationship. Moreover, this paper shows that the commonly used metrics tend to be biased in favour of graphs that minimise the number of edges. The evaluation bias is inconsistent and may lead to evaluating graphs with no edges as superior to graphs with varying numbers of correct and incorrect edges; implying that graphs that minimise edges are often favoured over more complex graphs due to bias rather than overall accuracy. While graphs that are less complex are often desirable, the current metrics encourage algorithms to optimise for simplicity, and to discover graphs with a limited number of edges that do not enable full propagation of evidence. This paper proposes a Balanced Scoring Function (BSF) that eliminates this bias by adjusting the reward function based on the difficulty of discovering an edge, or no edge, proportional to their occurrence rate in the ground truth graph. The BSF score can be used in conjunction with other traditional metrics to provide an alternative and unbiased assessment about the capability of structure learning algorithms in discovering causal or BN graphs.
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12666v1
PDF https://arxiv.org/pdf/1905.12666v1.pdf
PWC https://paperswithcode.com/paper/evaluating-structure-learning-algorithms-with
Repo
Framework
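The bias fix can be made concrete. Restating the BSF as it is commonly given (treat the exact form as a sketch and consult the paper for the definitive statement), true positives and negatives are rewarded in proportion to how hard they are to obtain:

```python
def bsf(tp, tn, fp, fn, a, i):
    """Balanced Scoring Function (sketch): a = edges in the ground-truth graph,
    i = absent edge positions, i.e. n*(n-1)/2 - a for n nodes.
    Ranges from -1 (everything wrong) to +1 (exact match)."""
    return 0.5 * (tp / a + tn / i - fp / i - fn / a)

# 5 nodes -> 10 candidate edge positions; true graph has a = 4 edges, so i = 6.
# An empty graph (tn = 6, fn = 4) scores exactly 0 rather than looking good:
print(bsf(tp=0, tn=6, fp=0, fn=4, a=4, i=6))  # 0.0
# A graph recovering 3 of 4 true edges with 1 spurious edge:
print(bsf(tp=3, tn=5, fp=1, fn=1, a=4, i=6))  # ~0.583
```

The zero score for the empty graph is exactly the bias the abstract targets: under edge-count-insensitive metrics an empty graph can look competitive, whereas BSF gives it no credit.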

What Makes Training Multi-Modal Classification Networks Hard?

Title What Makes Training Multi-Modal Classification Networks Hard?
Authors Weiyao Wang, Du Tran, Matt Feiszli
Abstract Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our experiments, however, we observe the opposite: the best single-modal network always outperforms the multi-modal network. This observation is consistent across different combinations of modalities and on different tasks and benchmarks. This paper identifies two main causes for this performance drop. First, multi-modal networks are often prone to overfitting due to increased capacity. Second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient Blending, which computes an optimal blend of modalities based on their overfitting behavior. We demonstrate that Gradient Blending outperforms widely-used baselines for avoiding overfitting and achieves state-of-the-art accuracy on various tasks including human action recognition, ego-centric action recognition, and acoustic event detection.
Tasks Action Classification, Action Recognition In Videos, Temporal Action Localization
Published 2019-05-29
URL https://arxiv.org/abs/1905.12681v4
PDF https://arxiv.org/pdf/1905.12681v4.pdf
PWC https://paperswithcode.com/paper/what-makes-training-multi-modal-networks-hard
Repo
Framework
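The blending rule can be sketched from train/validation loss curves. As I read the paper, the derived weights are proportional to generalization gained divided by overfitting squared (G/O²) per modality; the estimator below is a simplified illustration of that rule, not the authors' exact procedure:

```python
def blend_weights(train_losses, val_losses):
    """Estimate per-modality blending weights over a checkpoint window.
    For each modality:
      G = drop in validation loss (generalization gained)
      O = growth of the train/val gap (overfitting accrued)
    Weights are proportional to G / O^2, then normalized to sum to 1.
    Each argument maps modality name -> (loss_at_start, loss_at_end)."""
    raw = {}
    for m in train_losses:
        tr0, tr1 = train_losses[m]
        va0, va1 = val_losses[m]
        G = va0 - va1                          # validation improvement
        O = (va1 - tr1) - (va0 - tr0)          # increase in generalization gap
        raw[m] = max(G, 0.0) / max(O, 1e-8) ** 2  # guard against a shrinking gap
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}

# Toy curves: 'rgb' overfits heavily (large gap growth), 'audio' barely overfits.
w = blend_weights(
    train_losses={"rgb": (2.0, 0.5), "audio": (2.0, 1.2)},
    val_losses={"rgb": (2.0, 1.5), "audio": (2.0, 1.4)},
)
print(w)  # the less-overfitting modality receives the larger weight
```

The resulting weights scale each modality's loss (and hence its gradient contribution) in the joint objective, which is what lets all modalities train together without the worst overfitter dominating.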

Computing the Scope of Applicability for Acquired Task Knowledge in Experience-Based Planning Domains

Title Computing the Scope of Applicability for Acquired Task Knowledge in Experience-Based Planning Domains
Authors Vahid Mokhtari, Luis Seabra Lopes, Armando Pinho, Roman Manevich
Abstract Experience-based planning domains have been proposed to improve problem solving by learning from experience. They rely on acquiring and using task knowledge, i.e., activity schemata, for generating solutions to problem instances in a class of tasks. Using Three-Valued Logic Analysis (TVLA), we extend previous work to generate a set of conditions that determine the scope of applicability of an activity schema. The inferred scope is a bounded representation of a set of problems of potentially unbounded size, in the form of a 3-valued logical structure, which is used to automatically find an applicable activity schema for solving task problems. We validate this work in two classical planning domains.
Tasks
Published 2019-03-13
URL http://arxiv.org/abs/1903.06015v1
PDF http://arxiv.org/pdf/1903.06015v1.pdf
PWC https://paperswithcode.com/paper/computing-the-scope-of-applicability-for
Repo
Framework

Fine-grained evaluation of German-English Machine Translation based on a Test Suite

Title Fine-grained evaluation of German-English Machine Translation based on a Test Suite
Authors Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, Hans Uszkoreit
Abstract We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.
Tasks Machine Translation
Published 2019-10-16
URL https://arxiv.org/abs/1910.07460v1
PDF https://arxiv.org/pdf/1910.07460v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-evaluation-of-german-english-1
Repo
Framework
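The semi-automatic evaluation the abstract describes — regular expressions that inspect only the phenomenon-relevant part of the MT output — can be sketched as follows. The test item, patterns, and field names are invented for illustration; the actual suite contains 5,000 professionally curated sentences:

```python
import re

# A hypothetical test item in the style the paper describes: a German source
# sentence paired with regexes targeting only the verb construction under test.
item = {
    "source": "Er hätte das Buch gelesen.",
    "phenomenon": "conditional perfect",
    "pass_patterns": [r"\bwould have read\b"],
    "fail_patterns": [r"\bhad read\b", r"\bwould read\b"],
}

def judge(item, mt_output):
    """Return 'pass', 'fail', or 'unknown' (the last goes to manual review,
    which is what makes the procedure semi-automatic)."""
    if any(re.search(p, mt_output) for p in item["pass_patterns"]):
        return "pass"
    if any(re.search(p, mt_output) for p in item["fail_patterns"]):
        return "fail"
    return "unknown"

print(judge(item, "He would have read the book."))  # pass
print(judge(item, "He had read the book."))         # fail
print(judge(item, "He reads the book."))            # unknown
```

Aggregating pass rates per phenomenon category is then what allows the per-category system comparison reported in the paper.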

Semantics-Aware Image to Image Translation and Domain Transfer

Title Semantics-Aware Image to Image Translation and Domain Transfer
Authors Pravakar Roy, Nicolai Häni, Volkan Isler
Abstract Image to image translation is the problem of transferring an image from a source domain to a target domain. We present a new method to transfer the underlying semantics of an image even when there are geometric changes across the two domains. Specifically, we present a Generative Adversarial Network (GAN) that can transfer semantic information presented as segmentation masks. Our main technical contribution is an encoder-decoder based generator architecture that jointly encodes the image and its underlying semantics and translates both simultaneously to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve the underlying semantic label maps. We demonstrate the effectiveness of our approach in multiple object transfiguration and domain transfer tasks through qualitative and quantitative experiments. The results show that our method is better at transferring image semantics than state-of-the-art image-to-image translation methods.
Tasks Image-to-Image Translation
Published 2019-04-03
URL http://arxiv.org/abs/1904.02203v1
PDF http://arxiv.org/pdf/1904.02203v1.pdf
PWC https://paperswithcode.com/paper/semantics-aware-image-to-image-translation
Repo
Framework

Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles

Title Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles
Authors Siddharth Srivastava, Frederic Jurie, Gaurav Sharma
Abstract We address the problem of 3D object detection from 2D monocular images in autonomous driving scenarios. We propose to lift the 2D images to 3D representations using learned neural networks and leverage existing networks working directly on 3D data to perform 3D object detection and localization. We show that, with a carefully designed training mechanism and automatically selected, minimally noisy data, such a method is not only feasible but achieves better results than many methods working on actual 3D inputs acquired from physical sensors. On the challenging KITTI benchmark, we show that our 2D-to-3D lifted method outperforms many recent competitive 3D networks while significantly outperforming the previous state-of-the-art for 3D detection from monocular images. We also show that a late fusion of the output of the network trained on generated 3D images with that trained on real 3D images improves performance. We find the results very interesting and argue that such a method could serve as a highly reliable backup in case of malfunction of expensive 3D sensors, if not potentially making them redundant, at least in low human injury risk autonomous navigation scenarios such as warehouse automation.
Tasks 3D Object Detection, Autonomous Driving, Autonomous Navigation, Autonomous Vehicles, Object Detection
Published 2019-03-27
URL https://arxiv.org/abs/1904.08494v2
PDF https://arxiv.org/pdf/1904.08494v2.pdf
PWC https://paperswithcode.com/paper/190408494
Repo
Framework
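The late-fusion step the abstract mentions can be sketched as a confidence average over matched detections. The matching itself is assumed done (a real pipeline would first associate boxes, e.g. by 3D IoU), and the blend factor is illustrative:

```python
def late_fuse(scores_lifted, scores_real, alpha=0.5):
    """Blend per-detection confidences from the detector trained on lifted
    (pseudo-)3D data with one trained on real 3D data. Assumes detections
    are already matched by key; alpha and the averaging rule are a sketch."""
    return {det: alpha * scores_lifted[det] + (1.0 - alpha) * scores_real.get(det, 0.0)
            for det in scores_lifted}

fused = late_fuse({"car_0": 0.9, "car_1": 0.4},
                  {"car_0": 0.7, "car_1": 0.8})
print(fused)
```

Even this simple average shows the backup behaviour argued for: if the real-sensor branch drops out, the lifted branch still supplies usable scores.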

Selecting Biomarkers for building optimal treatment selection rules using Kernel Machines

Title Selecting Biomarkers for building optimal treatment selection rules using Kernel Machines
Authors Sayan Dasgupta, Ying Huang
Abstract Optimal biomarker combinations for treatment-selection can be derived by minimizing total burden to the population caused by the targeted disease and its treatment. However, when multiple biomarkers are present, including all in the model can be expensive and hurt model performance. To remedy this, we consider feature selection in optimization by minimizing an extended total burden that additionally incorporates biomarker measurement costs. Formulating it as a 0-norm penalized weighted classification, we develop various procedures for estimating linear and nonlinear combinations. Through simulations and a real data example, we demonstrate the importance of incorporating feature-selection and marker cost when deriving treatment-selection rules.
Tasks Feature Selection
Published 2019-06-06
URL https://arxiv.org/abs/1906.02384v1
PDF https://arxiv.org/pdf/1906.02384v1.pdf
PWC https://paperswithcode.com/paper/selecting-biomarkers-for-building-optimal
Repo
Framework
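The "extended total burden" idea — trade clinical benefit against marker measurement cost, with the 0-norm penalty appearing as the summed cost of selected markers — can be sketched with an exhaustive subset search. The benefit function, gains, and costs below are entirely hypothetical stand-ins for the paper's cross-validated burden estimate:

```python
from itertools import combinations

def total_value(subset, benefit, costs):
    """Benefit of a marker subset minus its measurement costs (sketch)."""
    return benefit(subset) - sum(costs[m] for m in subset)

def best_subset(markers, benefit, costs):
    """Exhaustive search over subsets; the paper instead optimizes a 0-norm
    penalized weighted classification objective, which scales far better."""
    best, best_val = (), float("-inf")
    for r in range(len(markers) + 1):
        for sub in combinations(markers, r):
            v = total_value(sub, benefit, costs)
            if v > best_val:
                best, best_val = sub, v
    return best

# Toy setup: marker 'a' helps a lot, 'b' helps less than it costs, 'c' not at all.
gains = {"a": 5.0, "b": 1.0, "c": 0.0}
benefit = lambda sub: sum(gains[m] for m in sub)
costs = {"a": 1.0, "b": 2.0, "c": 0.5}
print(best_subset(["a", "b", "c"], benefit, costs))  # ('a',)
```

Note how 'b' is excluded despite having predictive value: its cost exceeds its gain, which is precisely the behaviour the cost-augmented objective is designed to produce.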

The Angel is in the Priors: Improving GAN based Image and Sequence Inpainting with Better Noise and Structural Priors

Title The Angel is in the Priors: Improving GAN based Image and Sequence Inpainting with Better Noise and Structural Priors
Authors Avisek Lahiri, Arnav Kumar Jain, Prabir Kumar Biswas
Abstract Contemporary deep learning based inpainting algorithms are mainly based on a hybrid dual stage training policy of supervised reconstruction loss followed by an unsupervised adversarial critic loss. However, there is a dearth of literature for a fully unsupervised GAN based inpainting framework. The primary aversion towards the latter genre is due to its prohibitively slow iterative optimization requirement during inference to find a matching noise prior for a masked image. In this paper, we show that priors matter in GAN: we learn a data driven parametric network to predict a matching prior for a given image. This converts an iterative paradigm to a single feed forward inference pipeline with a massive 1500X speedup and simultaneous improvement in reconstruction quality. We show that an additional structural prior imposed on the GAN model results in higher fidelity outputs. To extend our model for sequence inpainting, we propose a recurrent net based grouped noise prior learning. To our knowledge, this is the first demonstration of unsupervised GAN based sequence inpainting. A further improvement in sequence inpainting is achieved with an additional subsequence consistency loss. These contributions improve the spatio-temporal characteristics of reconstructed sequences. Extensive experiments conducted on SVHN, Stanford Cars, CelebA and CelebA-HQ image datasets, synthetic sequences and the VidTIMIT video dataset reveal that we consistently improve upon the previous unsupervised baseline and also achieve performances comparable (sometimes superior) to hybrid benchmarks.
Tasks
Published 2019-08-16
URL https://arxiv.org/abs/1908.05861v1
PDF https://arxiv.org/pdf/1908.05861v1.pdf
PWC https://paperswithcode.com/paper/the-angel-is-in-the-priors-improving-gan
Repo
Framework

Keyphrase Generation: A Multi-Aspect Survey

Title Keyphrase Generation: A Multi-Aspect Survey
Authors Erion Çano, Ondřej Bojar
Abstract Extractive keyphrase generation research has been around since the nineties, but the more advanced abstractive approach based on the encoder-decoder framework and sequence-to-sequence learning has been explored only recently. In fact, more than a dozen abstractive methods have been proposed in the last three years, producing meaningful keyphrases and achieving state-of-the-art scores. In this survey, we examine various aspects of the extractive keyphrase generation methods and focus mostly on the more recent abstractive methods that are based on neural networks. We pay particular attention to the mechanisms that have driven the refinement of the latter. A huge collection of scientific article metadata and the corresponding keyphrases is created and released for the research community. We also present various keyphrase generation and text summarization research patterns and trends of the last two decades.
Tasks Text Summarization
Published 2019-10-11
URL https://arxiv.org/abs/1910.05059v1
PDF https://arxiv.org/pdf/1910.05059v1.pdf
PWC https://paperswithcode.com/paper/keyphrase-generation-a-multi-aspect-survey
Repo
Framework

Deep ReLU network approximation of functions on a manifold

Title Deep ReLU network approximation of functions on a manifold
Authors Johannes Schmidt-Hieber
Abstract Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less known. In this work, we study a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a Hölder function with smoothness index $\beta$ up to error $\epsilon$ using of the order of $\epsilon^{-d^*/\beta}\log(1/\epsilon)$ many non-zero network parameters. As an application, we derive statistical convergence rates for the estimator minimizing the empirical risk over all possible choices of bounded network parameters.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.00695v1
PDF https://arxiv.org/pdf/1908.00695v1.pdf
PWC https://paperswithcode.com/paper/deep-relu-network-approximation-of-functions
Repo
Framework
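Written out, the approximation claim is (with $d^*$ the intrinsic manifold dimension and $\beta$ the Hölder smoothness):

```latex
% There exists a sparsely connected ReLU network f_net with
% O(eps^{-d*/beta} log(1/eps)) non-zero parameters such that
\| f_{\mathrm{net}} - f \|_{\infty} \;\le\; \epsilon,
\qquad
\#\{\text{non-zero parameters}\} \;=\; O\!\left( \epsilon^{-d^{*}/\beta}\,\log(1/\epsilon) \right).
```

Balancing this approximation error against estimation error in the usual way suggests a prediction risk of order $n^{-2\beta/(2\beta + d^{*})}$ up to logarithmic factors, i.e. a rate governed by the intrinsic dimension $d^*$ rather than the ambient one; consult the paper for the exact statement and conditions.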

Mocycle-GAN: Unpaired Video-to-Video Translation

Title Mocycle-GAN: Unpaired Video-to-Video Translation
Authors Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, Tao Mei
Abstract Unsupervised image-to-image translation is the task of translating an image from one domain to another in the absence of any paired training examples and tends to be more applicable to practical applications. Nevertheless, the extension of such synthesis from image-to-image to video-to-video is not trivial, especially when it comes to capturing spatio-temporal structures in videos. The difficulty originates from the fact that not only the visual appearance in each frame but also the motion between consecutive frames should be realistic and consistent across the transformation. This motivates us to explore both appearance structure and temporal continuity in video synthesis. In this paper, we present a new Motion-guided Cycle GAN, dubbed Mocycle-GAN, that integrates motion estimation into an unpaired video translator. Technically, Mocycle-GAN capitalizes on three types of constraints: an adversarial constraint discriminating between synthetic and real frames, cycle consistency encouraging an inverse translation on both frames and motion, and motion translation validating the transfer of motion between consecutive frames. Extensive experiments are conducted on video-to-labels and labels-to-video translation, and superior results are reported when comparing to state-of-the-art methods. More remarkably, we qualitatively demonstrate our Mocycle-GAN for both flower-to-flower and ambient condition transfer.
Tasks Image-to-Image Translation, Motion Estimation, Unsupervised Image-To-Image Translation
Published 2019-08-26
URL https://arxiv.org/abs/1908.09514v1
PDF https://arxiv.org/pdf/1908.09514v1.pdf
PWC https://paperswithcode.com/paper/mocycle-gan-unpaired-video-to-video
Repo
Framework
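The cycle and motion constraints can be illustrated on a toy scalar "video", where frame difference stands in for optical flow and the translators are exact inverses. Real Mocycle-GAN uses learned GAN generators, a discriminator for the adversarial constraint, and a motion estimation network; none of that is modelled here:

```python
def motion(a, b):
    """Toy motion estimate: frame difference stands in for optical flow."""
    return b - a

def cycle_loss(x, G, F):
    """Cycle consistency: translating to the target domain and back
    should reconstruct the original frame."""
    return abs(F(G(x)) - x)

def motion_loss(x_t, x_t1, G):
    """Motion constraint (sketch): motion between translated frames should
    match the motion between the original consecutive frames."""
    return abs(motion(G(x_t), G(x_t1)) - motion(x_t, x_t1))

# Toy 'translators' on scalar frames: G shifts into the target domain, F back.
G = lambda x: x + 10.0
F = lambda y: y - 10.0

frames = [0.0, 1.0, 3.0]
total = (sum(cycle_loss(x, G, F) for x in frames)
         + sum(motion_loss(a, b, G) for a, b in zip(frames, frames[1:])))
print(total)  # 0.0 for this perfectly cycle- and motion-consistent pair
```

A translator that warped appearance correctly but distorted frame-to-frame motion would drive the second term up, which is exactly the temporal failure mode the motion constraint penalizes.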

“Is this an example image?” – Predicting the Relative Abstractness Level of Image and Text

Title “Is this an example image?” – Predicting the Relative Abstractness Level of Image and Text
Authors Christian Otto, Sebastian Holzki, Ralph Ewerth
Abstract Successful multimodal search and retrieval requires the automatic understanding of semantic cross-modal relations, which, however, is still an open research problem. Previous work has suggested the metrics cross-modal mutual information and semantic correlation to model and predict cross-modal semantic relations of image and text. In this paper, we present an approach to predict the (cross-modal) relative abstractness level of a given image-text pair, that is, whether the image is an abstraction of the text or vice versa. For this purpose, we introduce a new metric that captures this specific relationship between image and text at the Abstractness Level (ABS). We present a deep learning approach to predict this metric, which relies on an autoencoder architecture that allows us to significantly reduce the required amount of labeled training data. A comprehensive set of publicly available scientific documents has been gathered. Experimental results on a challenging test set demonstrate the feasibility of the approach.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07878v1
PDF http://arxiv.org/pdf/1901.07878v1.pdf
PWC https://paperswithcode.com/paper/is-this-an-example-image-predicting-the
Repo
Framework

Regularized Adversarial Sampling and Deep Time-aware Attention for Click-Through Rate Prediction

Title Regularized Adversarial Sampling and Deep Time-aware Attention for Click-Through Rate Prediction
Authors Yikai Wang, Liang Zhang, Quanyu Dai, Fuchun Sun, Bo Zhang, Yang He, Weipeng Yan, Yongjun Bao
Abstract Improving the performance of click-through rate (CTR) prediction remains one of the core tasks in online advertising systems. With the rise of deep learning, CTR prediction models with deep networks remarkably enhance model capacities. In deep CTR models, exploiting users’ historical data is essential for learning users’ behaviors and interests. As existing CTR prediction works neglect the importance of temporal signals when embedding users’ historical click records, we propose a time-aware attention model which explicitly uses absolute temporal signals for expressing the users’ periodic behaviors and relative temporal signals for expressing the temporal relation between items. Besides, we propose a regularized adversarial sampling strategy for negative sampling which eases the classification imbalance of CTR data and can make use of the strong guidance provided by the observed negative CTR samples. The adversarial sampling strategy significantly improves the training efficiency, and can be co-trained with the time-aware attention model seamlessly. Experiments are conducted on real-world CTR datasets from both in-station and out-station advertising placements.
Tasks Click-Through Rate Prediction
Published 2019-11-03
URL https://arxiv.org/abs/1911.00886v1
PDF https://arxiv.org/pdf/1911.00886v1.pdf
PWC https://paperswithcode.com/paper/regularized-adversarial-sampling-and-deep
Repo
Framework
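The split into absolute (periodic) and relative (recency) temporal signals can be sketched as two additive terms in an attention score. The functional forms, the period, and the decay rate below are all illustrative guesses, not the paper's parameterization:

```python
import math

def time_aware_attention(query_t, item_times, item_scores, period=7.0, decay=0.1):
    """Sketch: adjust a content score with (i) an absolute periodic signal
    (e.g. day-of-week phase, capturing periodic behaviour) and (ii) a
    relative recency signal, then softmax into attention weights."""
    adjusted = []
    for t, s in zip(item_times, item_scores):
        absolute = math.cos(2 * math.pi * (query_t - t) / period)  # periodicity
        relative = math.exp(-decay * (query_t - t))                # recency
        adjusted.append(s + absolute + relative)
    # Numerically stable softmax over the adjusted scores.
    m = max(adjusted)
    exps = [math.exp(v - m) for v in adjusted]
    z = sum(exps)
    return [e / z for e in exps]

# Two past clicks with equal content scores: one recent, one exactly one period ago.
w = time_aware_attention(query_t=10.0, item_times=[9.0, 3.0], item_scores=[0.0, 0.0])
print(w)  # weights sum to 1; recency and periodicity break the tie
```

The point of the sketch is that two items with identical content relevance receive different attention purely from when they were clicked, which is the signal the abstract says existing embeddings discard.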

Conversion Rate Prediction via Post-Click Behaviour Modeling

Title Conversion Rate Prediction via Post-Click Behaviour Modeling
Authors Hong Wen, Jing Zhang, Yuan Wang, Wentian Bao, Quan Lin, Keping Yang
Abstract Effective and efficient recommendation is crucial for modern e-commerce platforms. It consists of two indispensable components named Click-Through Rate (CTR) prediction and Conversion Rate (CVR) prediction, where the latter is an essential factor contributing to the final purchasing volume. Existing methods specifically predict CVR using the clicked and purchased samples, which has limited performance affected by the well-known sample selection bias and data sparsity issues. To address these issues, we propose a novel deep CVR prediction method by considering the post-click behaviors. After grouping deterministic actions together, we construct a novel sequential path, which elaborately depicts the post-click behaviors of users. Based on the path, we define the CVR and several related probabilities including CTR, etc., and devise a deep neural network with multiple targets involved accordingly. It takes advantage of the abundant samples with deterministic labels derived from the post-click actions, leading to a significant improvement of CVR prediction. Extensive experiments on both offline and online settings demonstrate its superiority over representative state-of-the-art methods.
Tasks Click-Through Rate Prediction
Published 2019-10-15
URL https://arxiv.org/abs/1910.07099v1
PDF https://arxiv.org/pdf/1910.07099v1.pdf
PWC https://paperswithcode.com/paper/conversion-rate-prediction-via-post-click
Repo
Framework
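The benefit of modelling a post-click path can be sketched as a chain of conditional probabilities. The decomposition below (impression → click → deterministic post-click action → purchase) follows the spirit of the abstract; the variable names and numbers are illustrative:

```python
def path_probabilities(p_ctr, p_action_given_click, p_buy_given_action, p_buy_direct):
    """Sketch of a post-click behaviour path:
    impression -> click -> (deterministic post-click action) -> purchase.
    Splitting conversion over the intermediate action yields denser, less
    selection-biased supervision than learning click -> purchase directly."""
    # CVR: purchase probability given a click, marginalised over the action.
    p_cvr = (p_action_given_click * p_buy_given_action
             + (1.0 - p_action_given_click) * p_buy_direct)
    # CTCVR: purchase probability given an impression - trainable on ALL
    # impressions, which is how the sample selection bias is sidestepped.
    p_ctcvr = p_ctr * p_cvr
    return p_cvr, p_ctcvr

p_cvr, p_ctcvr = path_probabilities(
    p_ctr=0.05, p_action_given_click=0.4,
    p_buy_given_action=0.10, p_buy_direct=0.01,
)
print(p_cvr, p_ctcvr)
```

A multi-task network would estimate each conditional probability with its own head and supervise the products (e.g. CTCVR) on the full impression space, which is what lets the abundant intermediate-action labels improve the sparse purchase signal.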