October 16, 2019

3220 words 16 mins read

Paper Group ANR 1074


A New Optimization Layer for Real-Time Bidding Advertising Campaigns

Title A New Optimization Layer for Real-Time Bidding Advertising Campaigns
Authors Gianluca Micchi, Saeid Soheily-Khah, Jacob Turner
Abstract While it is relatively easy to start an online advertising campaign, obtaining a high Key Performance Indicator (KPI) can be challenging. A large body of work on this subject has already been performed, and platforms known as DSPs are available on the market that deal with such an optimization. From the advertiser’s point of view, each DSP is a different black box, with its pros and cons, that needs to be configured. In order to take advantage of the pros of every DSP, advertisers are well-advised to use a combination of them when setting up their campaigns. In this paper, we propose an algorithm that lets advertisers add an optimization layer on top of DSPs. The algorithm we introduce, called SKOTT, maximizes the chosen KPI by optimally configuring the DSPs and putting them in competition with each other. SKOTT is a highly specialized iterative algorithm, loosely based on gradient descent, that is made up of three independent sub-routines, each dealing with a different problem: partitioning the budget, setting the desired average bid, and preventing under-delivery. In particular, one novelty of our approach lies in taking the perspective of the advertisers rather than the DSPs. Synthetic market data is used to evaluate the efficiency of SKOTT against other state-of-the-art approaches adapted from similar problems. The results illustrate the benefits of our proposal, which greatly outperforms the other methods.
Tasks
Published 2018-08-07
URL http://arxiv.org/abs/1808.03147v1
PDF http://arxiv.org/pdf/1808.03147v1.pdf
PWC https://paperswithcode.com/paper/a-new-optimization-layer-for-real-time
Repo
Framework
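The budget-partitioning sub-routine can be pictured as a projected gradient-style update: shift budget toward DSPs with higher observed KPI, then renormalize so the shares still sum to one. The sketch below is a loose illustration; the function name `repartition_budget` and the KPI-weighted update rule are assumptions for illustration, not the paper's actual SKOTT algorithm.

```python
# Hypothetical sketch of one SKOTT-style budget-partitioning step (not the
# paper's algorithm): move each DSP's budget share toward its KPI edge over
# the budget-weighted average, then project back onto the simplex.

def repartition_budget(shares, kpis, lr=0.1):
    """One gradient-like step over the DSP budget shares."""
    avg = sum(s * k for s, k in zip(shares, kpis))  # budget-weighted mean KPI
    # Increase shares of DSPs beating the average, decrease the others.
    updated = [max(s + lr * s * (k - avg), 0.0) for s, k in zip(shares, kpis)]
    total = sum(updated)
    return [u / total for u in updated]  # renormalize: shares sum to 1

# Three DSPs; the second has the best KPI, so its share should grow.
shares = repartition_budget([0.5, 0.3, 0.2], kpis=[0.02, 0.05, 0.01])
```

Iterating this step (with fresh KPI observations each round) is the flavor of "putting the DSPs in competition with each other" that the abstract describes.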

STS Classification with Dual-stream CNN

Title STS Classification with Dual-stream CNN
Authors Shuchen Weng, Wenbo Li, Yi Zhang, Siwei Lyu
Abstract The structured time series (STS) classification problem requires the modeling of interweaved spatiotemporal dependency. Most previous STS classification methods model the spatial and temporal dependencies independently. Due to the complexity of STS data, we argue that a desirable STS classification method should be a holistic framework that can be made as adaptive and flexible as possible. This motivates us to design a deep neural network with such merits. Inspired by the dual-stream hypothesis in neuroscience, we propose a novel dual-stream framework for modeling the interweaved spatiotemporal dependency, and develop a convolutional neural network within this framework that aims to achieve high adaptability and flexibility in STS configurations from various aspects, i.e., sequential order, dependency range, and features. The proposed architecture is highly modularized and scalable, making it easy to adapt to specific tasks. The effectiveness of our model is demonstrated through experiments on synthetic data as well as benchmark datasets for skeleton-based activity recognition.
Tasks Activity Recognition, Time Series
Published 2018-05-20
URL http://arxiv.org/abs/1805.07740v1
PDF http://arxiv.org/pdf/1805.07740v1.pdf
PWC https://paperswithcode.com/paper/sts-classification-with-dual-stream-cnn
Repo
Framework
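The dual-stream idea — one stream operating along the temporal axis, another along the spatial axis, fused before classification — can be illustrated with a toy example. Everything below (the fixed smoothing kernel, concatenation as the fusion step) is an assumption for illustration, not the paper's architecture.

```python
# Toy sketch of a dual-stream layout for structured time series of shape
# (time steps, spatial features): one stream convolves over time, the other
# over space, and the outputs are fused by concatenation.
import numpy as np

KERNEL = np.array([0.25, 0.5, 0.25])  # illustrative fixed smoothing kernel

def temporal_stream(x):
    # 1-D convolution over the time axis, applied per spatial channel
    return np.stack([np.convolve(x[:, j], KERNEL, mode="same")
                     for j in range(x.shape[1])], axis=1)

def spatial_stream(x):
    # 1-D convolution over the spatial axis, applied per time step
    return np.stack([np.convolve(x[t, :], KERNEL, mode="same")
                     for t in range(x.shape[0])], axis=0)

x = np.random.rand(16, 8)  # e.g. 16 frames of 8 skeleton joint features
fused = np.concatenate([temporal_stream(x), spatial_stream(x)], axis=1)
```

A real model would stack learned convolutions in each stream; the point here is only the parallel-streams-then-fuse topology.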

Jointly Localizing and Describing Events for Dense Video Captioning

Title Jointly Localizing and Describing Events for Dense Video Captioning
Authors Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei
Abstract Automatically describing a video with natural language is regarded as a fundamental challenge in computer vision. The problem nevertheless is not trivial especially when a video contains multiple events to be worthy of mention, which often happens in real videos. A valid question is how to temporally localize and then describe events, which is known as “dense video captioning.” In this paper, we present a novel framework for dense video captioning that unifies the localization of temporal event proposals and sentence generation of each proposal, by jointly training them in an end-to-end manner. To combine these two worlds, we integrate a new design, namely descriptiveness regression, into a single shot detection structure to infer the descriptive complexity of each detected proposal via sentence generation. This in turn adjusts the temporal locations of each event proposal. Our model differs from existing dense video captioning methods since we propose a joint and global optimization of detection and captioning, and the framework uniquely capitalizes on an attribute-augmented video captioning architecture. Extensive experiments are conducted on ActivityNet Captions dataset and our framework shows clear improvements when compared to the state-of-the-art techniques. More remarkably, we obtain a new record: METEOR of 12.96% on ActivityNet Captions official test set.
Tasks Dense Video Captioning, Video Captioning
Published 2018-04-23
URL http://arxiv.org/abs/1804.08274v1
PDF http://arxiv.org/pdf/1804.08274v1.pdf
PWC https://paperswithcode.com/paper/jointly-localizing-and-describing-events-for
Repo
Framework
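The role of descriptiveness regression — using a proposal's predicted descriptive complexity to adjust its temporal extent — might be pictured as follows. The shrink rule and its parameters are invented for illustration and are not the paper's formulation.

```python
# Illustrative sketch only: nudge a low-descriptiveness event proposal
# toward its center, leaving highly descriptive proposals intact.

def adjust_proposal(start, end, descriptiveness, max_shrink=0.2):
    """Shrink the proposal [start, end] by up to max_shrink of its length,
    proportionally to how low its descriptiveness score (in [0, 1]) is."""
    shrink = max_shrink * (1.0 - descriptiveness)
    length = end - start
    return start + shrink * length / 2, end - shrink * length / 2

# A 10-second proposal with middling descriptiveness loses a little span.
s, e = adjust_proposal(10.0, 20.0, descriptiveness=0.5)
```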

C3PO: Database and Benchmark for Early-stage Malicious Activity Detection in 3D Printing

Title C3PO: Database and Benchmark for Early-stage Malicious Activity Detection in 3D Printing
Authors Zhe Li, Xiaolong Ma, Hongjia Li, Qiyuan An, Aditya Singh Rathore, Qinru Qiu, Wenyao Xu, Yanzhi Wang
Abstract A growing number of malicious users have sought to leverage 3D printing technology to produce unlawful tools for criminal activities. Current regulations are inadequate to deal with the rapid growth of 3D printers. It is therefore of vital importance to enable 3D printers to identify the objects to be printed, so that the manufacture of an illegal weapon can be terminated at an early stage. Deep learning yields significant performance gains on object recognition tasks. However, the lack of large-scale databases in the 3D printing domain stalls the advancement of automatic illegal-weapon recognition. This paper presents a new 3D printing image database, namely C3PO, which comprises two subsets for two different system working scenarios. We extract images from the numerical control programming code files of 22 3D models, and categorize the images into 10 distinct labels. The first subset consists of 62,200 images representing object projections on the three planes of a Cartesian coordinate system. The second subset consists of sequences totaling 671,677 images that simulate camera captures of the printed objects. Importantly, we demonstrate that weapons can be recognized in either scenario by deep learning approaches using our proposed database. We also use the trained deep models to build a prototype of an object-aware 3D printer. The quantitative results are promising, and further exploration of the database and of crime prevention in 3D printing are demanding tasks.
Tasks Action Detection, Activity Detection, Object Recognition
Published 2018-03-20
URL http://arxiv.org/abs/1803.07544v2
PDF http://arxiv.org/pdf/1803.07544v2.pdf
PWC https://paperswithcode.com/paper/c3po-database-and-benchmark-for-early-stage
Repo
Framework
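The first subset's representation — object projections onto the three Cartesian planes — can be reproduced in miniature: rasterize a point set onto the XY, XZ, and YZ planes as occupancy grids. The grid resolution and the assumption of points in the unit cube are illustrative choices.

```python
# Toy version of three-plane Cartesian projections: points in the unit
# cube become binary occupancy grids on the XY, XZ, and YZ planes.
import numpy as np

def plane_projections(points, res=8):
    """Project (N, 3) unit-cube points onto the three planes as res x res grids."""
    idx = np.clip((points * res).astype(int), 0, res - 1)  # cell indices
    grids = []
    for a, b in [(0, 1), (0, 2), (1, 2)]:  # XY, XZ, YZ axis pairs
        g = np.zeros((res, res), dtype=int)
        g[idx[:, a], idx[:, b]] = 1        # mark occupied cells
        grids.append(g)
    return grids

pts = np.array([[0.1, 0.2, 0.9], [0.5, 0.5, 0.5]])
xy, xz, yz = plane_projections(pts)
```

The actual database extracts such views from G-code toolpaths rather than raw point clouds, but the three-plane idea is the same.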

Spatial Morphing Kernel Regression For Feature Interpolation

Title Spatial Morphing Kernel Regression For Feature Interpolation
Authors Xueqing Deng, Yi Zhu, Shawn Newsam
Abstract In recent years, geotagged social media has become popular as a novel source for geographic knowledge discovery. Ground-level images and videos provide a different perspective than overhead imagery and can be applied to a range of applications such as land use mapping, activity detection, pollution mapping, etc. The sparse and uneven distribution of this data presents a problem, however, for generating dense maps. We therefore investigate the problem of spatially interpolating the high-dimensional features extracted from sparse social media to enable dense labeling using standard classifiers. Further, we show how prior knowledge about region boundaries can be used to improve the interpolation through spatial morphing kernel regression. We show that an interpolate-then-classify framework can produce dense maps from sparse observations but that care must be taken in choosing the interpolation method. We also show that the spatial morphing kernel improves the results.
Tasks Action Detection, Activity Detection
Published 2018-02-21
URL http://arxiv.org/abs/1802.07452v2
PDF http://arxiv.org/pdf/1802.07452v2.pdf
PWC https://paperswithcode.com/paper/spatial-morphing-kernel-regression-for
Repo
Framework
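Plain kernel regression — the baseline that the spatial morphing kernel extends — interpolates a feature at a query location as a distance-weighted average of the sparse samples. Below is a minimal Nadaraya-Watson sketch; the paper's morphing kernel would additionally reweight sample pairs using region-boundary knowledge, which is omitted here.

```python
# Minimal Nadaraya-Watson kernel regression over sparse geotagged samples.
import math

def gaussian_kernel(d, h=1.0):
    """Gaussian weight for a distance d with bandwidth h."""
    return math.exp(-(d * d) / (2 * h * h))

def kernel_interpolate(query, samples, h=1.0):
    """Estimate a feature value at `query` from (location, feature) samples."""
    weights = [gaussian_kernel(math.dist(query, p), h) for p, _ in samples]
    total = sum(weights)
    return sum(w * f for w, (_, f) in zip(weights, samples)) / total

# Two samples; the midpoint query should get the average feature value.
samples = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
est = kernel_interpolate((1.0, 0.0), samples)
```

In the interpolate-then-classify framework, such estimates (per feature dimension) would be fed to a standard classifier at each map location.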

Simplifying Sentences with Sequence to Sequence Models

Title Simplifying Sentences with Sequence to Sequence Models
Authors Alexander Mathews, Lexing Xie, Xuming He
Abstract We simplify sentences with an attentive neural network sequence to sequence model, dubbed S4. The model includes a novel word-copy mechanism and loss function to exploit linguistic similarities between the original and simplified sentences. It also jointly uses pre-trained and fine-tuned word embeddings to capture the semantics of complex sentences and to mitigate the effects of limited data. When trained and evaluated on pairs of sentences from thousands of news articles, we observe an 8.8-point improvement in BLEU score over a sequence to sequence baseline; however, learning word substitutions remains difficult. Such sequence to sequence models are promising for other text generation tasks such as style transfer.
Tasks Style Transfer, Text Generation, Word Embeddings
Published 2018-05-15
URL http://arxiv.org/abs/1805.05557v1
PDF http://arxiv.org/pdf/1805.05557v1.pdf
PWC https://paperswithcode.com/paper/simplifying-sentences-with-sequence-to
Repo
Framework
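A word-copy mechanism of the general kind the abstract mentions typically mixes a distribution over source words with the decoder's generation distribution. The sketch below uses a fixed mixing weight and toy distributions; it is not S4's actual formulation.

```python
# Generic copy-vs-generate mixing (illustrative, not S4's mechanism):
# with probability p_copy the model copies a source word, otherwise it
# generates from the vocabulary distribution.

def copy_or_generate(p_copy, copy_dist, gen_dist):
    """Mix a copy distribution over source words with a generation distribution."""
    vocab = set(copy_dist) | set(gen_dist)
    return {w: p_copy * copy_dist.get(w, 0.0) + (1 - p_copy) * gen_dist.get(w, 0.0)
            for w in vocab}

# The source sentence contains "cat"; copying makes it dominate the output.
mixed = copy_or_generate(0.7, copy_dist={"cat": 1.0},
                         gen_dist={"cat": 0.2, "feline": 0.8})
```

In a trained model, `p_copy` and both distributions would be predicted per decoding step; here they are fixed toy values.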

Graph Convolutional Neural Networks for Web-Scale Recommender Systems

Title Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Authors Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, Jure Leskovec
Abstract Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.
Tasks Recommendation Systems
Published 2018-06-06
URL http://arxiv.org/abs/1806.01973v1
PDF http://arxiv.org/pdf/1806.01973v1.pdf
PWC https://paperswithcode.com/paper/graph-convolutional-neural-networks-for-web
Repo
Framework
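The random-walk idea behind PinSage can be illustrated in a few lines: rank a node's broader neighborhood by visit counts of short random walks, then convolve over only the top-ranked neighbors. The graph, walk parameters, and ranking rule below are toy assumptions, not Pinterest's production configuration.

```python
# Toy random-walk neighborhood sampling in the spirit of PinSage: the
# "important" neighbors of a node are those visited most often by short
# random walks starting from it.
import random
from collections import Counter

def important_neighbors(graph, node, walk_len=2, n_walks=200, top_k=2, seed=0):
    """Rank candidate neighbors of `node` by random-walk visit counts."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_walks):
        cur = node
        for _ in range(walk_len):
            cur = rng.choice(graph[cur])  # step to a uniformly random neighbor
            visits[cur] += 1
    visits.pop(node, None)  # the start node is not its own neighbor
    return [n for n, _ in visits.most_common(top_k)]

# Tiny graph: "d" is reachable from "a" only through "c", so it ranks low.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
top = important_neighbors(graph, "a")
```

The visit counts can also serve as importance weights when aggregating neighbor embeddings, which is how the paper describes structuring the convolutions.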

Artificial intelligence and pediatrics: A synthetic mini review

Title Artificial intelligence and pediatrics: A synthetic mini review
Authors Peter Kokol, Jernej Završnik, Helena Blažun Vošner
Abstract The use of artificial intelligence in medicine can be traced back to 1968, when Paycha published his paper Le diagnostic a l’aide d’intelligences artificielle, presentation de la premiere machine diagnostri. A few years later, Shortliffe et al. presented an expert system named Mycin, which was able to identify bacteria causing severe blood infections and to recommend antibiotics. Despite the fact that Mycin outperformed members of the Stanford medical school in reliability of diagnosis, it was never used in practice due to a legal issue: who do you sue if it gives a wrong diagnosis? Only in 2016, when the artificial intelligence software built into the IBM Watson AI platform correctly diagnosed and proposed an effective treatment for a 60-year-old woman’s rare form of leukemia, did the use of AI in medicine become truly popular. One of the first papers presenting the use of AI in pediatrics was published in 1984. The paper introduced a computer-assisted medical decision-making system called SHELP.
Tasks Decision Making
Published 2018-02-16
URL http://arxiv.org/abs/1802.06068v1
PDF http://arxiv.org/pdf/1802.06068v1.pdf
PWC https://paperswithcode.com/paper/artificial-intelligence-and-pediatrics-a
Repo
Framework

An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path

Title An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
Authors Ajin George Joseph, Shalabh Bhatnagar
Abstract In this paper, we consider a modified version of the control problem in a model-free Markov decision process (MDP) setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy which maximizes the value function, i.e., the long-run discounted reward of the MDP. The current settings also assume access to a generative model of the MDP, with the hidden premise that observations of the system behaviour in the form of sample trajectories can be obtained with ease from the model. In this paper, we consider a modified version, where the cost function is the expectation of a non-convex function of the value function, without access to the generative model. Rather, we assume that a sample trajectory generated using an a priori chosen behaviour policy is made available. In this restricted setting, we solve the modified control problem in its true sense, i.e., we find the best possible policy given this limited information. We propose a stochastic approximation algorithm based on the well-known cross-entropy method which is data (sample trajectory) efficient, stable, robust, and computationally and storage efficient. We provide a proof of convergence of our algorithm to a policy which is globally optimal relative to the behaviour policy. We also present experimental results to corroborate our claims, and we demonstrate the superiority of the solution produced by our algorithm over state-of-the-art algorithms under an appropriately chosen behaviour policy.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1801.10287v1
PDF http://arxiv.org/pdf/1801.10287v1.pdf
PWC https://paperswithcode.com/paper/an-incremental-off-policy-search-in-a-model
Repo
Framework
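The cross-entropy method the algorithm builds on can be sketched generically: sample candidates from a parameterized distribution, keep the elite fraction, and refit the distribution to the elites. This one-dimensional Gaussian version is a textbook illustration, not the paper's stochastic approximation scheme over policies.

```python
# Generic cross-entropy method for maximizing a scalar function f:
# repeatedly sample from a Gaussian, select the elite samples, and refit
# the Gaussian's mean and standard deviation to them.
import random
import statistics

def cross_entropy_maximize(f, mu=0.0, sigma=5.0, iters=30, n=50, elite=10, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        best = sorted(xs, key=f, reverse=True)[:elite]   # elite samples
        mu = statistics.mean(best)
        sigma = statistics.stdev(best) + 1e-6            # keep some exploration
    return mu  # the distribution concentrates around the maximizer

# f peaks at x = 3, so the method should converge near there.
opt = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
```

In the paper the sampled objects are policy parameters and f is estimated from the single behaviour-policy trajectory, which is where the real technical work lies.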

Thompson Sampling for Pursuit-Evasion Problems

Title Thompson Sampling for Pursuit-Evasion Problems
Authors Zhen Li, Nicholas J. Meyer, Eric B. Laber, Robert Brigantic
Abstract Pursuit-evasion is a multi-agent sequential decision problem wherein a group of agents known as pursuers coordinate their traversal of a spatial domain to locate an agent trying to evade them. Pursuit-evasion problems arise in a number of important application domains, including defense and route planning. Learning to optimally coordinate pursuer behaviors so as to minimize time to capture of the evader is challenging because of a large action space and sparse, noisy state information; consequently, previous approaches have relied primarily on heuristics. We propose a variant of Thompson Sampling for pursuit-evasion that allows for the application of existing model-based planning algorithms. This approach is general in that it allows for an arbitrary number of pursuers, a general spatial domain, and the integration of auxiliary information provided by informants. In a suite of simulation experiments, Thompson Sampling for pursuit-evasion significantly reduces time-to-capture relative to competing algorithms.
Tasks
Published 2018-11-11
URL http://arxiv.org/abs/1811.04471v1
PDF http://arxiv.org/pdf/1811.04471v1.pdf
PWC https://paperswithcode.com/paper/thompson-sampling-for-pursuit-evasion
Repo
Framework
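The Thompson-sampling pattern — sample a hypothesis from the posterior and act as if it were true — can be shown on a toy search problem: sample the evader's cell from the belief, search that cell, and update the belief on a miss. The grid world and update rule below are illustrative only, not the paper's formulation.

```python
# Toy Thompson sampling for a 1-D search: a pursuer holds a posterior over
# which cell hides the evader, samples a cell from it, searches that cell,
# and renormalizes the belief when the search fails.
import random

def thompson_step(belief, true_cell, rng):
    cells = list(belief)
    # Thompson sampling: draw a hypothesized evader location from the posterior.
    guess = rng.choices(cells, weights=[belief[c] for c in cells])[0]
    if guess == true_cell:
        return guess, True, belief
    # Evader not there: zero out that cell and renormalize the posterior.
    belief = {c: (0.0 if c == guess else belief[c]) for c in cells}
    z = sum(belief.values())
    return guess, False, {c: p / z for c, p in belief.items()}

rng = random.Random(0)
belief = {c: 0.25 for c in range(4)}  # uniform prior over 4 cells
caught, steps = False, 0
while not caught:
    steps += 1
    _, caught, belief = thompson_step(belief, true_cell=2, rng=rng)
```

The paper's contribution is plugging this sample-then-plan pattern into model-based planners for many pursuers on general spatial domains; the update here is deliberately minimal.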

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

Title Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Authors Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu
Abstract Dense video captioning is a newly emerging task that aims at both localizing and describing all events in a video. We identify and tackle two challenges on this task, namely, (1) how to utilize both past and future contexts for accurate event proposal predictions, and (2) how to construct informative input to the decoder for generating natural event descriptions. First, previous works predominantly generate temporal event proposals in the forward direction, which neglects future video context. We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions. Second, different events ending at (nearly) the same time are indistinguishable in the previous works, resulting in the same captions. We solve this problem by representing each event with an attentive fusion of hidden states from the proposal module and video contents (e.g., C3D features). We further propose a novel context gating mechanism to balance the contributions from the current event and its surrounding contexts dynamically. We empirically show that our attentively fused event representation is superior to the proposal hidden states or video contents alone. By coupling proposal and captioning modules into one unified framework, our model outperforms the state of the art on the ActivityNet Captions dataset with a relative gain of over 100% (Meteor score increases from 4.82 to 9.65).
Tasks Dense Video Captioning, Video Captioning
Published 2018-03-31
URL http://arxiv.org/abs/1804.00100v2
PDF http://arxiv.org/pdf/1804.00100v2.pdf
PWC https://paperswithcode.com/paper/bidirectional-attentive-fusion-with-context
Repo
Framework
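A context gating mechanism of the general form described — a learned sigmoid gate balancing the current event against its surrounding context — can be sketched with scalar gating. The weights and fusion rule here are toy assumptions, not the paper's parameterization.

```python
# Illustrative scalar context gate: g = sigmoid(w_e . e + w_c . c + b),
# fused representation = g * event + (1 - g) * context.
import math

def context_gate(event, context, w_e, w_c, b):
    score = (sum(we * e for we, e in zip(w_e, event))
             + sum(wc * c for wc, c in zip(w_c, context)) + b)
    g = 1.0 / (1.0 + math.exp(-score))  # gate in (0, 1)
    fused = [g * e + (1.0 - g) * c for e, c in zip(event, context)]
    return fused, g

# With zero weights the gate is exactly 0.5: an even event/context blend.
fused, g = context_gate([1.0, 0.0], [0.0, 1.0],
                        w_e=[0.0, 0.0], w_c=[0.0, 0.0], b=0.0)
```

In the paper the gate is learned, so events that need more surrounding context automatically receive it.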

Reconstruction Network for Video Captioning

Title Reconstruction Network for Video Captioning
Authors Bairui Wang, Lin Ma, Wei Zhang, Wei Liu
Abstract In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning. Specifically, the encoder-decoder makes use of the forward flow to produce the sentence description based on the encoded video semantic features. Two types of reconstructors are customized to employ the backward flow and reproduce the video features based on the hidden state sequence generated by the decoder. The generation loss yielded by the encoder-decoder and the reconstruction loss introduced by the reconstructor are jointly drawn into training the proposed RecNet in an end-to-end fashion. Experimental results on benchmark datasets demonstrate that the proposed reconstructor can boost the encoder-decoder models and lead to significant gains in video caption accuracy.
Tasks Video Captioning
Published 2018-03-30
URL http://arxiv.org/abs/1803.11438v1
PDF http://arxiv.org/pdf/1803.11438v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-network-for-video-captioning
Repo
Framework
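The joint training objective — generation loss plus a weighted reconstruction loss — might look like this in outline. The MSE form of the reconstruction term and the weight `lam` are assumptions for illustration, not the paper's exact losses.

```python
# Sketch of an encoder-decoder-reconstructor joint objective: the caption
# generation loss plus a weighted penalty for failing to reconstruct the
# original video features from the decoder's hidden states.

def recnet_loss(gen_loss, video_feats, recon_feats, lam=0.2):
    """Joint objective: generation loss + lam * feature-reconstruction MSE."""
    mse = sum((v - r) ** 2 for v, r in zip(video_feats, recon_feats)) / len(video_feats)
    return gen_loss + lam * mse

# A caption loss of 2.5 plus a small reconstruction penalty.
total = recnet_loss(2.5, video_feats=[1.0, 0.0], recon_feats=[0.5, 0.5])
```

Minimizing both terms end-to-end is what forces the decoder's hidden states to retain the video semantics, which is the backward-flow idea in the abstract.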

Ablation of a Robot’s Brain: Neural Networks Under a Knife

Title Ablation of a Robot’s Brain: Neural Networks Under a Knife
Authors Peter E. Lillian, Richard Meyes, Tobias Meisen
Abstract It is still not fully understood exactly how neural networks are able to solve the complex tasks that have recently pushed AI research forward. We present a novel method for determining how information is structured inside a neural network. Using ablation (a neuroscience technique for cutting away parts of a brain to determine their function), we approach several neural network architectures from a biological perspective. Through an analysis of this method’s results, we examine important similarities between biological and artificial neural networks to search for the implicit knowledge locked away in the network’s weights.
Tasks
Published 2018-12-13
URL http://arxiv.org/abs/1812.05687v2
PDF http://arxiv.org/pdf/1812.05687v2.pdf
PWC https://paperswithcode.com/paper/ablation-of-a-robots-brain-neural-networks
Repo
Framework
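The ablation procedure can be demonstrated on a tiny fixed network: zero out one hidden unit at a time and record how far the output moves from the unablated baseline. The two-unit network below is a toy stand-in for the architectures the paper studies.

```python
# Toy ablation study: lesion one hidden unit of a fixed 2-unit ReLU network
# and measure the resulting output change as that unit's "importance".

def forward(x, w1, w2, ablate=None):
    hidden = [max(sum(wi * xi for wi, xi in zip(row, x)), 0.0) for row in w1]  # ReLU
    if ablate is not None:
        hidden[ablate] = 0.0   # "cut away" one unit, as in a lesion experiment
    return sum(w * h for w, h in zip(w2, hidden))

w1 = [[1.0, 0.0], [0.0, 1.0]]   # each hidden unit reads one input
w2 = [0.9, 0.1]                 # unit 0 carries most of the output weight
x = [1.0, 1.0]

baseline = forward(x, w1, w2)
importance = [abs(baseline - forward(x, w1, w2, ablate=i)) for i in range(2)]
```

Repeating this over units (or whole layers) across many inputs yields the kind of importance map the paper uses to compare artificial networks to lesioned biological ones.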

Image Semantic Transformation: Faster, Lighter and Stronger

Title Image Semantic Transformation: Faster, Lighter and Stronger
Authors Dasong Li, Jianbo Wang
Abstract We propose the Image-Semantic-Transformation-Reconstruction-Circle (ISTRC) model, a novel and powerful method that uses FaceNet’s Euclidean latent space to understand images. As the name suggests, ISTRC constructs a circle that can perfectly reconstruct images. The powerful Euclidean latent space embedded in ISTRC is FaceNet’s last layer, with its power of distinguishing and understanding images. Our model reconstructs images and manipulates Euclidean latent vectors to achieve semantic transformations and semantic image arithmetic. In this paper, we show that ISTRC performs 10 high-level semantic transformations: “male and female”, “add smile”, “open mouth”, “deduct beard or add mustache”, “bigger/smaller nose”, “make older and younger”, “bigger lips”, “bigger eyes”, “bigger/smaller mouth” and “more attractive”. It takes just 3 hours (GTX 1080) to train the models for the 10 semantic transformations.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.09932v1
PDF http://arxiv.org/pdf/1803.09932v1.pdf
PWC https://paperswithcode.com/paper/image-semantic-transformation-faster-lighter
Repo
Framework
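Latent-space semantic transformations of this kind are commonly realized by shifting a latent vector along a direction associated with an attribute. The sketch below assumes a precomputed, hypothetical "smile" direction; how ISTRC actually obtains its directions and reconstructs images is not shown here.

```python
# Generic latent arithmetic (illustrative): move a latent code along an
# attribute direction, then (in a full model) decode the shifted code.

def apply_semantic_direction(z, direction, alpha=1.0):
    """Shift latent vector z along a semantic direction, scaled by alpha."""
    return [zi + alpha * di for zi, di in zip(z, direction)]

z = [0.5, -0.2, 0.1]            # toy latent code
smile_dir = [0.1, 0.0, -0.1]    # hypothetical "add smile" direction
z_smile = apply_semantic_direction(z, smile_dir, alpha=2.0)
```

Larger `alpha` values exaggerate the attribute, which is how "bigger/smaller" variants of a transformation typically arise.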

Paraphrase-Supervised Models of Compositionality

Title Paraphrase-Supervised Models of Compositionality
Authors Avneesh Saluja, Chris Dyer, Jean-David Ruvini
Abstract Compositional vector space models of meaning promise new solutions to stubborn language understanding problems. This paper makes two contributions toward this end: (i) it uses automatically-extracted paraphrase examples as a source of supervision for training compositional models, replacing previous work which relied on manual annotations used for the same purpose, and (ii) it develops a context-aware model for scoring phrasal compositionality. Experimental results indicate that these multiple sources of information can be used to learn partial semantic supervision that matches previous techniques in intrinsic evaluation tasks. Our approaches are also evaluated for their impact on a machine translation system, where we show improvements in translation quality, demonstrating that compositionality in interpretation correlates with compositionality in translation.
Tasks Machine Translation
Published 2018-01-31
URL http://arxiv.org/abs/1801.10293v1
PDF http://arxiv.org/pdf/1801.10293v1.pdf
PWC https://paperswithcode.com/paper/paraphrase-supervised-models-of
Repo
Framework
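A compositional vector space model in its simplest form combines word vectors into a phrase vector; weighted addition is one common choice. The fixed weight below is a toy assumption, not the paper's learned, context-aware compositionality scoring.

```python
# Simplest-possible compositional model: a weighted additive combination of
# two word vectors into one phrase vector.

def compose(vec_a, vec_b, alpha=0.5):
    """Additive composition; alpha weights the first word's contribution."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(vec_a, vec_b)]

# Two orthogonal toy word vectors blend into an even mixture.
phrase = compose([1.0, 0.0], [0.0, 1.0])
```

The paper's contribution is supervising such compositions with automatically extracted paraphrases and scoring how compositional a given phrase actually is, rather than assuming a fixed rule like this one.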