January 29, 2020

Paper Group ANR 658

AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks

Title AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks
Authors Jingyuan Wang, Yang Zhang, Ke Tang, Junjie Wu, Zhang Xiong
Abstract Recent years have witnessed the successful marriage of finance innovations and AI techniques in various finance applications, including quantitative trading (QT). Despite great research efforts devoted to leveraging deep learning (DL) methods for building better QT strategies, existing studies still face serious challenges, especially from the finance side, such as the balance of risk and return, resistance to extreme loss, and the interpretability of strategies, which limit the application of DL-based strategies in real-life financial markets. In this work, we propose AlphaStock, a novel reinforcement learning (RL) based investment strategy enhanced by interpretable deep attention networks, to address the above challenges. Our main contributions are summarized as follows: i) we integrate deep attention networks with a Sharpe ratio-oriented reinforcement learning framework to achieve a risk-return balanced investment strategy; ii) we suggest modeling interrelationships among assets to avoid selection bias and develop a cross-asset attention mechanism; iii) to the best of our knowledge, this work is among the first to offer an interpretable investment strategy using deep reinforcement learning models. Experiments on long-period U.S. and Chinese markets demonstrate the effectiveness and robustness of AlphaStock over diverse market states. It turns out that AlphaStock tends to select as winners stocks with high long-term growth, low volatility, high intrinsic value, and recent undervaluation. (A minimal sketch of the Sharpe-ratio objective follows this entry.)
Tasks Deep Attention
Published 2019-07-24
URL https://arxiv.org/abs/1908.02646v1
PDF https://arxiv.org/pdf/1908.02646v1.pdf
PWC https://paperswithcode.com/paper/alphastock-a-buying-winners-and-selling
Repo
Framework
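
The paper's full network is beyond an abstract-level note, but the Sharpe-ratio orientation is easy to illustrate. Below is a minimal numpy sketch of the metric for a buy-winners-and-sell-losers portfolio; the equal-weight top-/bottom-k construction and the toy data are assumptions for illustration, not AlphaStock's actual trading rule.

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a return series."""
    excess = np.asarray(returns) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / (excess.std() + 1e-12)

def long_short_returns(scores, asset_returns, k):
    """Equal-weight buy-winners-and-sell-losers portfolio:
    long the k highest-scored assets, short the k lowest."""
    order = np.argsort(scores)            # ascending by score
    longs, shorts = order[-k:], order[:k]
    return asset_returns[longs].mean() - asset_returns[shorts].mean()

# Toy example: random scores and returns for 50 assets over 100 days.
rng = np.random.default_rng(0)
daily = [long_short_returns(rng.normal(size=50), rng.normal(0, 0.01, size=50), k=5)
         for _ in range(100)]
print(f"Sharpe: {sharpe_ratio(daily):.2f}")
```

An RL agent in this spirit would use the realized Sharpe ratio of the episode as its reward signal rather than raw return, trading off mean profit against volatility.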

Microsoft AI Challenge India 2018: Learning to Rank Passages for Web Question Answering with Deep Attention Networks

Title Microsoft AI Challenge India 2018: Learning to Rank Passages for Web Question Answering with Deep Attention Networks
Authors Chaitanya Sai Alaparthi
Abstract This paper describes our system for the Microsoft AI Challenge India 2018: Ranking Passages for Web Question Answering. The system uses a biLSTM network with a co-attention mechanism between query and passage representations. Additionally, we apply self-attention over embeddings to increase lexical coverage by allowing the system to take a union over different embeddings. We also incorporate hand-crafted features to improve system performance. Our system achieved a Mean Reciprocal Rank (MRR) of 0.67 on the eval-1 dataset. (A minimal implementation of the MRR metric follows this entry.)
Tasks Deep Attention, Learning-To-Rank, Question Answering
Published 2019-06-14
URL https://arxiv.org/abs/1906.06056v1
PDF https://arxiv.org/pdf/1906.06056v1.pdf
PWC https://paperswithcode.com/paper/microsoft-ai-challenge-india-2018-learning-to
Repo
Framework
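
The model itself is not sketched here, but the reported metric is standard. A minimal implementation of Mean Reciprocal Rank, assuming one relevant passage per query:

```python
import numpy as np

def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR over queries. ranked_lists[i] is a list of passage ids ordered by
    model score; relevant[i] is the id of the correct passage for query i."""
    rr = []
    for ranking, gold in zip(ranked_lists, relevant):
        try:
            rr.append(1.0 / (ranking.index(gold) + 1))
        except ValueError:        # gold passage not retrieved at all
            rr.append(0.0)
    return float(np.mean(rr))

# Two toy queries: gold at rank 2 and rank 1 -> MRR = (0.5 + 1.0) / 2 = 0.75
print(mean_reciprocal_rank([[3, 7, 1], [4, 9, 2]], [7, 4]))
```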

Super Interaction Neural Network

Title Super Interaction Neural Network
Authors Yang Yao, Xu Zhang, Baile Xu, Furao Shen, Jian Zhao
Abstract Recent studies have demonstrated that convolutional networks rely heavily on the quality and quantity of the features they generate. In lightweight networks, however, the available feature information is limited because these networks tend to be shallower and thinner for efficiency. To further improve the performance and accuracy of lightweight networks, we develop the Super Interaction Neural Network (SINet) model from a novel point of view: enhancing the information interaction in neural networks. To achieve information interaction along the width of the deep network, we propose the Exchange Shortcut Connection, which integrates information from different convolution groups without any extra computational cost. To achieve information interaction along the depth of the network, we propose the Dense Funnel Layer and Attention-based Hierarchical Joint Decision, which make full use of middle-layer features. Our experiments show the superior performance of SINet over other state-of-the-art lightweight models on the ImageNet dataset. Furthermore, we exhibit the effectiveness and generality of the proposed components through ablation studies. (A toy sketch of a channel-exchange shortcut follows this entry.)
Tasks
Published 2019-05-29
URL https://arxiv.org/abs/1905.12349v1
PDF https://arxiv.org/pdf/1905.12349v1.pdf
PWC https://paperswithcode.com/paper/super-interaction-neural-network
Repo
Framework
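
No reference code is given in this digest, so the following numpy sketch shows one plausible reading of an exchange-style shortcut: swapping a fixed fraction of channels between two convolution groups, which costs only indexing and no extra FLOPs. The 50% exchange fraction is an assumption.

```python
import numpy as np

def exchange_shortcut(group_a, group_b, frac=0.5):
    """Swap the first `frac` of channels between two feature-map groups.
    Shapes are (channels, height, width); pure indexing, no extra compute."""
    c = int(group_a.shape[0] * frac)
    out_a = np.concatenate([group_b[:c], group_a[c:]], axis=0)
    out_b = np.concatenate([group_a[:c], group_b[c:]], axis=0)
    return out_a, out_b

a = np.zeros((8, 4, 4))   # group A features
b = np.ones((8, 4, 4))    # group B features
a2, b2 = exchange_shortcut(a, b)
print(a2[:, 0, 0])        # first four channels now come from group B
```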

Domain Authoring Assistant for Intelligent Virtual Agents

Title Domain Authoring Assistant for Intelligent Virtual Agents
Authors Sepehr Janghorbani, Ashutosh Modi, Jakob Buhmann, Mubbasir Kapadia
Abstract Developing intelligent virtual characters has attracted a lot of attention in recent years. The process of creating such characters often involves a team of creative authors who describe different aspects of the characters in natural language, and planning experts who translate this description into a planning domain. This can be quite challenging, as the team of creative authors must diligently define every aspect of the character, especially if it exhibits complex human-like behavior. In addition, a team of engineers has to manually translate the natural language description of a character’s personality into planning domain knowledge. This is extremely time- and resource-demanding and can be an obstacle to the authors’ creativity. The goal of this paper is to introduce an authoring assistant tool that automates the process of domain generation from the natural language description of virtual characters, thus bridging the gap between the creative authoring team and the planning domain experts. Moreover, the proposed tool also identifies possible missing information in the domain description and iteratively makes suggestions to the author.
Tasks
Published 2019-04-05
URL http://arxiv.org/abs/1904.03266v1
PDF http://arxiv.org/pdf/1904.03266v1.pdf
PWC https://paperswithcode.com/paper/domain-authoring-assistant-for-intelligent
Repo
Framework

An Empirical Study of Generation Order for Machine Translation

Title An Empirical Study of Generation Order for Machine Translation
Authors William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit
Abstract In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT’14 English $\to$ German translation task, order does not have a substantial impact on output quality, with unintuitive orderings such as alphabetical and shortest-first matching the performance of a standard Transformer. This demonstrates that traditional left-to-right generation is not strictly necessary to achieve high performance. On the other hand, results on the WMT’18 English $\to$ Chinese task tend to vary more widely, suggesting that translation for less well-aligned language pairs may be more sensitive to generation order. (A toy encoding of oracle generation orders follows this entry.)
Tasks Machine Translation
Published 2019-10-29
URL https://arxiv.org/abs/1910.13437v1
PDF https://arxiv.org/pdf/1910.13437v1.pdf
PWC https://paperswithcode.com/paper/191013437
Repo
Framework
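
As a toy illustration of oracle generation orders, the sketch below converts a target sequence into (slot, token) insertion steps under a few of the policies the paper studies. The encoding is a simplification of insertion-based decoding, not the paper's training setup.

```python
def oracle_insertions(tokens, policy="l2r"):
    """Turn a target sequence into (slot, token) insertion steps under an
    oracle ordering policy. `slot` is the insert position among the tokens
    already generated, as in insertion-based decoders."""
    idx = list(range(len(tokens)))
    if policy == "l2r":
        order = idx
    elif policy == "alphabetical":
        order = sorted(idx, key=lambda i: tokens[i])
    elif policy == "shortest_first":
        order = sorted(idx, key=lambda i: len(tokens[i]))
    else:
        raise ValueError(policy)
    placed, steps = [], []
    for i in order:
        slot = sum(1 for j in placed if j < i)  # earlier tokens already placed
        steps.append((slot, tokens[i]))
        placed.append(i)
    return steps

print(oracle_insertions(["the", "cat", "sat"], "alphabetical"))
# [(0, 'cat'), (1, 'sat'), (0, 'the')] -- replaying the inserts
# rebuilds "the cat sat" despite the non-left-to-right order.
```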

Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors

Title Cooperative Perception for 3D Object Detection in Driving Scenarios using Infrastructure Sensors
Authors Eduardo Arnold, Mehrdad Dianati, Robert de Temple
Abstract The perception system of an autonomous vehicle is responsible for mapping sensor observations into a semantic description of the vehicle’s environment. 3D object detection is a common function within this system and outputs a list of 3D bounding boxes around objects of interest. Various 3D object detection methods have relied on fusion of different sensor modalities to overcome the limitations of individual sensors. However, occlusion, a limited field of view, and low point density of the sensor data cannot be reliably and cost-effectively addressed by multi-modal sensing from a single point of view. Alternatively, cooperative perception incorporates information from spatially diverse sensors distributed around the environment as a way to mitigate these limitations. This paper proposes two schemes for cooperative 3D object detection. The early fusion scheme combines point clouds from multiple spatially diverse sensing points of view before detection. In contrast, the late fusion scheme fuses the independently estimated bounding boxes from multiple spatially diverse sensors. We evaluate the performance of both schemes using a synthetic cooperative dataset created in two complex driving scenarios, a T-junction and a roundabout. The evaluation shows that the early fusion approach outperforms late fusion by a significant margin, at the cost of higher communication bandwidth. The results demonstrate that cooperative perception can recall more than 95% of the objects, as opposed to 30% for single-point sensing, in the most challenging scenario. To provide practical insights into the deployment of such a system, we report how the number of sensors and their configuration impact the detection performance of the system. (A minimal sketch of the early-fusion scheme follows this entry.)
Tasks 3D Object Detection, Object Detection
Published 2019-12-18
URL https://arxiv.org/abs/1912.12147v1
PDF https://arxiv.org/pdf/1912.12147v1.pdf
PWC https://paperswithcode.com/paper/cooperative-perception-for-3d-object
Repo
Framework
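
Early fusion amounts to aggregating point clouds in a common frame before a single detector runs. A minimal numpy sketch, where the sensor poses and the detector are placeholders:

```python
import numpy as np

def to_global(points, rotation, translation):
    """Transform an (N, 3) point cloud from a sensor frame to the global frame."""
    return points @ rotation.T + translation

def early_fusion(clouds, poses):
    """Concatenate point clouds from spatially diverse sensors into one cloud;
    a single detector would then run on the fused cloud."""
    return np.vstack([to_global(pts, R, t) for pts, (R, t) in zip(clouds, poses)])

# Two toy sensors observing the same scene from different positions.
identity = np.eye(3)
clouds = [np.random.rand(100, 3), np.random.rand(80, 3)]
poses = [(identity, np.array([0.0, 0.0, 0.0])),
         (identity, np.array([10.0, 0.0, 0.0]))]
fused = early_fusion(clouds, poses)
print(fused.shape)  # (180, 3)
```

Late fusion would instead run a detector per sensor and merge the independently estimated boxes, e.g. with non-maximum suppression, trading accuracy for much lower communication bandwidth.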

Single-Stage Monocular 3D Object Detection with Virtual Cameras

Title Single-Stage Monocular 3D Object Detection with Virtual Cameras
Authors Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Elisa Ricci, Peter Kontschieder
Abstract While expensive LiDAR and stereo camera rigs have enabled the development of successful 3D object detection methods, monocular RGB-only approaches still lag significantly behind. Our work advances the state of the art by introducing MoVi-3D, a novel, single-stage deep architecture for monocular 3D object detection. At its core, MoVi-3D leverages geometrical information to generate synthetic views from virtual cameras at both training and test time, resulting in normalized object appearance with respect to distance. Our synthetically generated views facilitate the detection task as they cut down the variability in visual appearance associated with objects placed at different distances from the camera. As a consequence, the deep model is relieved from learning depth-specific representations and its complexity can be significantly reduced. In particular, we show that our proposed concept of exploiting virtual cameras enables us to set new state-of-the-art results on the popular KITTI3D benchmark using just a lightweight, single-stage architecture. (A sketch of the underlying pinhole relation follows this entry.)
Tasks 3D Object Detection, Object Detection
Published 2019-12-17
URL https://arxiv.org/abs/1912.08035v2
PDF https://arxiv.org/pdf/1912.08035v2.pdf
PWC https://paperswithcode.com/paper/single-stage-monocular-3d-object-detection
Repo
Framework
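
MoVi-3D's actual view-generation procedure is more involved, but the pinhole relation it exploits is simple: apparent size is inversely proportional to depth, so resizing depth-binned views to a common resolution normalizes object appearance. A sketch with an assumed focal length and object height:

```python
import numpy as np

def projected_height_px(focal_px, depth_m, phys_height_m=1.5):
    """Pixel height an object of fixed physical height projects to at a given
    depth, under the pinhole model: h_px = f * H / Z."""
    return focal_px * phys_height_m / depth_m

# Objects at 10 m vs 40 m differ 4x in apparent size; virtual cameras that
# crop and resize per depth bin remove that variability for the detector.
f = 720.0
for z in (10.0, 20.0, 40.0):
    print(f"depth {z:5.1f} m -> {projected_height_px(f, z):6.1f} px")
```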

SpecNet: Spectral Domain Convolutional Neural Network

Title SpecNet: Spectral Domain Convolutional Neural Network
Authors Bochen Guan, Jinnian Zhang, William A. Sethares, Richard Kijowski, Fang Liu
Abstract The memory consumption of most Convolutional Neural Network (CNN) architectures grows rapidly with increasing network depth, which is a major constraint for efficient network training and inference on modern GPUs with limited memory. Several studies show that the feature maps (generated after the convolutional layers) are the main bottleneck in this memory problem. Often, these feature maps mimic natural photographs in the sense that their energy is concentrated in the spectral domain. Although embedding CNN architectures in the spectral domain is widely exploited to accelerate the training process, we demonstrate that it is also possible to use the spectral domain to reduce the memory footprint by proposing a Spectral Domain Convolutional Neural Network (SpecNet) that performs both the convolution and the activation operations in the spectral domain. SpecNet exploits a configurable threshold to force small values in the feature maps to zero, allowing the feature maps to be stored sparsely. SpecNet also employs a special activation function that preserves the sparsity of the feature maps while effectively encouraging the convergence of the network. The performance of SpecNet is evaluated on three competitive object recognition benchmark tasks (MNIST, CIFAR-10, and SVHN), and compared with four state-of-the-art implementations (LeNet, AlexNet, VGG, and DenseNet). Overall, SpecNet is able to reduce memory consumption by about 60% without significant loss of performance for all tested network architectures. (A minimal sketch of spectral-domain convolution with thresholding follows this entry.)
Tasks Object Recognition
Published 2019-05-27
URL https://arxiv.org/abs/1905.10915v4
PDF https://arxiv.org/pdf/1905.10915v4.pdf
PWC https://paperswithcode.com/paper/specnet-spectral-domain-convolutional-neural
Repo
Framework
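
A minimal sketch of the core idea: convolution performed as element-wise multiplication in the frequency domain, with small spectral coefficients zeroed so feature maps can be stored sparsely. The relative threshold and toy kernel are assumptions; SpecNet's actual layer and activation differ in detail.

```python
import numpy as np

def spectral_conv2d(image, kernel, threshold=1e-3):
    """Convolution as element-wise multiplication in the frequency domain,
    with small spectral coefficients zeroed for sparse storage."""
    H, W = image.shape
    F_img = np.fft.rfft2(image)
    F_ker = np.fft.rfft2(kernel, s=(H, W))   # zero-pad kernel to image size
    F_out = F_img * F_ker
    F_out[np.abs(F_out) < threshold * np.abs(F_out).max()] = 0  # sparsify
    sparsity = 1.0 - np.count_nonzero(F_out) / F_out.size
    return np.fft.irfft2(F_out, s=(H, W)), sparsity

img = np.random.rand(32, 32)
ker = np.ones((3, 3)) / 9.0                  # box-blur kernel
out, sparsity = spectral_conv2d(img, ker, threshold=0.05)
print(f"spectral sparsity: {sparsity:.0%}")
```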

Introduction to Camera Pose Estimation with Deep Learning

Title Introduction to Camera Pose Estimation with Deep Learning
Authors Yoli Shavit, Ron Ferens
Abstract Over the last two decades, deep learning has transformed the field of computer vision. Deep convolutional networks were successfully applied to learn different vision tasks such as image classification, image segmentation, object detection, and many more. By transferring the knowledge learned by deep models on large generic datasets, researchers were further able to create fine-tuned models for other, more specific tasks. Recently, this idea was applied to regressing the absolute camera pose from an RGB image. Although the resulting accuracy was sub-optimal compared to classic feature-based solutions, this effort led to a surge of learning-based pose estimation methods. Here, we review deep learning approaches for camera pose estimation. We describe key methods in the field and identify trends aiming at improving the original deep pose regression solution. We further provide an extensive cross-comparison of existing learning-based pose estimators, together with practical notes on their execution for reproducibility purposes. Finally, we discuss emerging solutions and potential future research directions.
Tasks Image Classification, Object Detection, Pose Estimation, Semantic Segmentation
Published 2019-07-08
URL https://arxiv.org/abs/1907.05272v3
PDF https://arxiv.org/pdf/1907.05272v3.pdf
PWC https://paperswithcode.com/paper/introduction-to-camera-pose-estimation-with
Repo
Framework

Fair Meta-Learning: Learning How to Learn Fairly

Title Fair Meta-Learning: Learning How to Learn Fairly
Authors Dylan Slack, Sorelle Friedler, Emile Givental
Abstract Data sets for fairness-relevant tasks can lack examples or be biased according to a specific label in a sensitive attribute. We demonstrate the usefulness of weight-based meta-learning approaches in such situations. For models that can be trained through gradient descent, we demonstrate that there are parameter configurations that allow models to be optimized in a small number of gradient steps and with minimal data while being both fair and accurate. To learn such weight sets, we adapt the popular MAML algorithm into Fair-MAML by including a fairness regularization term. In practice, Fair-MAML allows practitioners to train fair machine learning models from only a few examples when data from related tasks is available. We empirically exhibit the value of this technique by comparing to relevant baselines. (A first-order sketch of such a meta-update follows this entry.)
Tasks Meta-Learning
Published 2019-11-06
URL https://arxiv.org/abs/1911.04336v1
PDF https://arxiv.org/pdf/1911.04336v1.pdf
PWC https://paperswithcode.com/paper/fair-meta-learning-learning-how-to-learn
Repo
Framework
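
A first-order sketch of the idea: a MAML-style meta-update where each task loss carries a fairness regularization term. The demographic-parity-style penalty (squared gap between group mean predictions) and the logistic model below are illustrative stand-ins; the paper's exact regularizer may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_loss_grad(w, X, y, s, gamma):
    """Gradient of logistic loss plus gamma * gap^2, where gap is the
    difference in mean predictions between sensitive groups s=1 and s=0."""
    p = sigmoid(X @ w)
    g_task = X.T @ (p - y) / len(y)
    gap = p[s == 1].mean() - p[s == 0].mean()
    d_gap = (X[s == 1].T @ (p * (1 - p))[s == 1] / max((s == 1).sum(), 1)
             - X[s == 0].T @ (p * (1 - p))[s == 0] / max((s == 0).sum(), 1))
    return g_task + gamma * 2 * gap * d_gap

def maml_step(w, tasks, alpha=0.1, beta=0.01, gamma=1.0):
    """One first-order meta-update: adapt per task with one inner gradient
    step, then move the meta-weights using the post-adaptation gradients."""
    meta_grad = np.zeros_like(w)
    for X, y, s in tasks:
        w_adapted = w - alpha * fair_loss_grad(w, X, y, s, gamma)
        meta_grad += fair_loss_grad(w_adapted, X, y, s, gamma)  # FO-MAML
    return w - beta * meta_grad / len(tasks)

rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(40, 3)), rng.integers(0, 2, 40).astype(float),
          rng.integers(0, 2, 40)) for _ in range(4)]
w = np.zeros(3)
for _ in range(50):
    w = maml_step(w, tasks)
print(w)
```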

PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Title PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
Authors Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
Abstract Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and improve the consistency between the 2D image plane and the 3D world coordinate frame. To address this challenge, we propose to adopt perspective points as a new intermediate representation for 3D object detection, defined as the 2D projections of local Manhattan 3D keypoints that locate an object; these perspective points satisfy geometric constraints imposed by the perspective projection. We further devise PerspectiveNet, an end-to-end trainable model that simultaneously detects the 2D bounding box, 2D perspective points, and 3D object bounding box for each object from a single RGB image. PerspectiveNet yields three unique advantages: (i) 3D object bounding boxes are estimated based on perspective points, bridging the gap between 2D and 3D bounding boxes without the need for category-specific 3D shape priors; (ii) it predicts the perspective points by a template-based method, and a perspective loss is formulated to maintain the perspective constraints; (iii) it maintains the consistency between the 2D perspective points and 3D bounding boxes via a differentiable projective function. Experiments on the SUN RGB-D dataset show that the proposed method significantly outperforms existing RGB-based approaches for 3D object detection. (A sketch of pinhole projection of 3D box corners follows this entry.)
Tasks 3D Object Detection, Object Detection
Published 2019-12-16
URL https://arxiv.org/abs/1912.07744v1
PDF https://arxiv.org/pdf/1912.07744v1.pdf
PWC https://paperswithcode.com/paper/perspectivenet-3d-object-detection-from-a-1
Repo
Framework
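
The geometric backbone of perspective points is ordinary pinhole projection. A minimal sketch projecting the corners of a 3D box into the image; the axis-aligned box stands in for the local Manhattan keypoints, and the intrinsics are assumed values.

```python
import numpy as np

def project_points(points_3d, K):
    """Project (N, 3) camera-frame 3D points to 2D pixels with intrinsics K."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box (a stand-in for the local
    Manhattan keypoints the paper uses)."""
    offsets = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])
    return center + 0.5 * offsets * size

K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
corners = box_corners(np.array([0.0, 0.0, 10.0]), np.array([2.0, 1.5, 4.0]))
print(project_points(corners, K).round(1))   # the 2D "perspective points"
```

A differentiable projection like this is what lets a loss on 2D perspective points backpropagate into the 3D box parameters.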

Learning to Generate Synthetic 3D Training Data through Hybrid Gradient

Title Learning to Generate Synthetic 3D Training Data through Hybrid Gradient
Authors Dawei Yang, Jia Deng
Abstract Synthetic images rendered by graphics engines are a promising source for training deep networks. However, it is challenging to ensure that they can help train a network to perform well on real images, because a graphics-based generation pipeline requires numerous design decisions such as the selection of 3D shapes and the placement of the camera. In this work, we propose a new method that optimizes the generation of 3D training data based on what we call “hybrid gradient”. We parametrize the design decisions as a real vector, and combine the approximate gradient and the analytical gradient to obtain the hybrid gradient of the network performance with respect to this vector. We evaluate our approach on the task of estimating surface normals from a single image. Experiments on standard benchmarks show that our approach can outperform the prior state of the art on optimizing the generation of 3D training data, particularly in terms of computational efficiency. (A toy sketch of a hybrid gradient follows this entry.)
Tasks
Published 2019-06-29
URL https://arxiv.org/abs/1907.00267v1
PDF https://arxiv.org/pdf/1907.00267v1.pdf
PWC https://paperswithcode.com/paper/learning-to-generate-synthetic-3d-training
Repo
Framework
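
A toy sketch of a hybrid gradient: the non-differentiable generation step is differentiated by central finite differences and chained with the analytical gradient of the downstream loss. The toy "renderer" below is an assumption for illustration; the paper applies this to a full graphics pipeline and network training loop.

```python
import numpy as np

def hybrid_gradient(theta, render, loss_grad_wrt_output, eps=1e-3):
    """Combine a finite-difference (approximate) gradient through `render`
    with the analytical gradient of the loss w.r.t. the rendered output."""
    output = render(theta)
    g_out = loss_grad_wrt_output(output)        # analytical part
    grad = np.zeros_like(theta)
    for i in range(len(theta)):                 # approximate part, per dim
        step = np.zeros_like(theta); step[i] = eps
        jvp = (render(theta + step) - render(theta - step)) / (2 * eps)
        grad[i] = g_out @ jvp                   # chain rule
    return grad

# Toy 'renderer' and quadratic loss; the result matches the true gradient
# d||render(theta)||^2 / d theta = [12, 4] at theta = [1, 2].
render = lambda t: np.array([t[0] ** 2, t[0] * t[1]])
loss_grad = lambda out: 2 * out                 # gradient of ||out||^2
print(hybrid_gradient(np.array([1.0, 2.0]), render, loss_grad))
```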

What You See is What You Get: Exploiting Visibility for 3D Object Detection

Title What You See is What You Get: Exploiting Visibility for 3D Object Detection
Authors Peiyun Hu, Jason Ziglar, David Held, Deva Ramanan
Abstract Recent advances in 3D sensing have created unique challenges for computer vision. One fundamental challenge is finding a good representation for 3D sensor data. Most popular representations (such as PointNet) are proposed in the context of processing truly 3D data (e.g. points sampled from mesh models), ignoring the fact that 3D sensor data such as a LiDAR sweep is in fact 2.5D. We argue that representing 2.5D data as collections of (x, y, z) points fundamentally destroys hidden information about freespace. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelized visibility map as an additional input stream. In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation of virtual objects and temporal aggregation of LiDAR sweeps over multiple time frames. On the NuScenes 3D detection benchmark, we show that, by adding an additional stream for visibility input, we can significantly improve the overall detection accuracy of a state-of-the-art 3D detector. (A toy ray-marching sketch of the visibility map follows this entry.)
Tasks 3D Object Detection, Data Augmentation, Object Detection
Published 2019-12-10
URL https://arxiv.org/abs/1912.04986v2
PDF https://arxiv.org/pdf/1912.04986v2.pdf
PWC https://paperswithcode.com/paper/what-you-see-is-what-you-get-exploiting
Repo
Framework
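
A toy sketch of recovering freespace by raycasting: march from the sensor origin to each LiDAR return, marking traversed voxels as observed free space. Uniform ray marching is used here for brevity (an exact voxel-traversal algorithm would be the efficient choice), and the grid parameters are assumptions.

```python
import numpy as np

def visibility_map(points, grid_shape, voxel=0.5, steps_per_voxel=2):
    """Voxelized visibility sketch: rays from the sensor origin to each LiDAR
    return mark traversed voxels as observed free space (1) and the end voxel
    as occupied (2); untouched voxels stay unknown (0)."""
    vis = np.zeros(grid_shape, dtype=np.uint8)
    origin = np.zeros(3)
    for p in points:
        dist = np.linalg.norm(p - origin)
        n = max(int(dist / voxel * steps_per_voxel), 1)
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            idx = tuple(((origin + t * (p - origin)) / voxel).astype(int))
            if all(0 <= i < s for i, s in zip(idx, grid_shape)):
                vis[idx] = 1                     # free space along the ray
        end = tuple((p / voxel).astype(int))
        if all(0 <= i < s for i, s in zip(end, grid_shape)):
            vis[end] = 2                         # the return itself
    return vis

pts = np.array([[4.0, 1.0, 0.5], [2.0, 3.0, 0.5]])
vm = visibility_map(pts, grid_shape=(16, 16, 4))
print((vm == 1).sum(), "free voxels,", (vm == 2).sum(), "occupied")
```

The resulting grid would be fed to the detector as an extra input stream alongside the usual occupancy features.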

Localization for Ground Robots: On Manifold Representation, Integration, Re-Parameterization, and Optimization

Title Localization for Ground Robots: On Manifold Representation, Integration, Re-Parameterization, and Optimization
Authors Mingming Zhang, Xingxing Zuo, Yiming Chen, Mingyang Li
Abstract In this paper, we focus on localizing ground robots by probabilistically fusing measurements from the wheel odometry and a monocular camera. For ground robots, the wheel odometry is widely used in localization tasks, especially in applications in planar-scene environments. However, since the wheel odometry only provides 2D motion estimates, it is extremely challenging to use it for accurate full 6D pose (3D position and 3D rotation) estimation. Traditional methods for 6D localization either approximate sensor or motion models, at the cost of reduced accuracy, or rely on other sensors, e.g., an inertial measurement unit (IMU), to obtain full 6D motion. By contrast, in this paper, we propose a novel probabilistic framework that is able to use the wheel odometry measurements for high-precision 6D pose estimation, in which only the wheel odometry and a monocular camera are mandatory. Specifically, we propose novel methods for i) formulating a motion manifold by parametric representation, ii) performing manifold-based 6D integration with the wheel odometry measurements, and iii) re-parameterizing the manifold equations periodically for error reduction. Finally, we propose a complete localization algorithm based on a manifold-assisted sliding-window estimator, fusing measurements from the wheel odometry, a monocular camera, and optionally an IMU. Through extensive simulated and real-world experiments, we show that the proposed algorithm outperforms a number of state-of-the-art vision-based localization algorithms by a significant margin, especially when deployed in large-scale complicated environments. (A toy sketch of lifting planar odometry onto a motion manifold follows this entry.)
Tasks 6D Pose Estimation, Pose Estimation
Published 2019-09-08
URL https://arxiv.org/abs/1909.03423v2
PDF https://arxiv.org/pdf/1909.03423v2.pdf
PWC https://paperswithcode.com/paper/localization-for-ground-robots-on-manifold
Repo
Framework
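
One plausible reading of a parametric motion manifold is a low-order surface z = m(x, y) that constrains the robot's height and attitude. The quadratic form and coefficients below are assumptions for illustration; the paper's representation, integration, and re-parameterization are more elaborate.

```python
import numpy as np

def lift_to_manifold(x, y, c):
    """Lift a planar position onto an assumed quadratic motion manifold
    z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2."""
    z = c[0] + c[1]*x + c[2]*y + c[3]*x*x + c[4]*x*y + c[5]*y*y
    # The manifold gradient gives the local surface normal, from which the
    # out-of-plane attitude (roll/pitch) can be recovered.
    dzdx = c[1] + 2*c[3]*x + c[4]*y
    dzdy = c[2] + c[4]*x + 2*c[5]*y
    normal = np.array([-dzdx, -dzdy, 1.0])
    return z, normal / np.linalg.norm(normal)

def integrate_odometry(poses_2d, c):
    """Turn 2D wheel-odometry poses (x, y, yaw) into 3D positions plus
    surface normals by constraining them to the manifold."""
    return [(x, y, *lift_to_manifold(x, y, c)) for x, y, yaw in poses_2d]

coeffs = np.array([0.0, 0.0, 0.0, 0.01, 0.0, 0.02])   # gentle bowl-shaped floor
track = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.1), (2.0, 1.0, 0.2)]
for x, y, z, n in integrate_odometry(track, coeffs):
    print(f"({x:.1f}, {y:.1f}, {z:.3f})  normal={n.round(3)}")
```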

Contextual Graph Attention for Answering Logical Queries over Incomplete Knowledge Graphs

Title Contextual Graph Attention for Answering Logical Queries over Incomplete Knowledge Graphs
Authors Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao
Abstract Recently, several studies have explored methods for using KG embeddings to answer logical queries. These approaches either treat embedding learning and query answering as two separate learning tasks, or fail to deal with the variability of contributions from different query paths. We propose to leverage a graph attention mechanism to handle the unequal contributions of different query paths. However, commonly used graph attention assumes that the center node embedding is provided, which is unavailable in this task since the center node is to be predicted. To solve this problem, we propose a multi-head attention-based end-to-end logical query answering model, called the Contextual Graph Attention model (CGA), which uses an initial neighborhood aggregation layer to generate the center embedding, and the whole model is trained jointly on the original KG structure as well as the sampled query-answer pairs. We also introduce two new datasets, DB18 and WikiGeo19, which are rather large compared to the existing datasets and contain many more relation types, and use them to evaluate the performance of the proposed model. Our results show that the proposed CGA, with fewer learnable parameters, consistently outperforms the baseline models on both datasets as well as on the Bio dataset. (A single-head sketch of the center-bootstrapping attention follows this entry.)
Tasks Knowledge Graphs
Published 2019-09-30
URL https://arxiv.org/abs/1910.00084v1
PDF https://arxiv.org/pdf/1910.00084v1.pdf
PWC https://paperswithcode.com/paper/contextual-graph-attention-for-answering
Repo
Framework
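
A single-head numpy sketch of the central trick: since the center node is unobserved, its query vector is bootstrapped from a mean aggregation of the neighborhood before attention weighs the unequal contributions of different paths. The mean initializer and random projections are illustrative stand-ins for CGA's learned aggregation layer and multi-head setup.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_neighbors(neighbors, W_q, W_k, W_v):
    """One attention head over neighbor embeddings. The center query is
    bootstrapped from a mean aggregation because the center node itself
    is the quantity to be predicted."""
    center_init = neighbors.mean(axis=0)          # initial center embedding
    q = W_q @ center_init                         # query, shape (d,)
    K = neighbors @ W_k.T                         # keys, shape (n, d)
    V = neighbors @ W_v.T                         # values, shape (n, d)
    weights = softmax(K @ q / np.sqrt(len(q)))    # unequal path contributions
    return weights @ V, weights

rng = np.random.default_rng(0)
d, n = 8, 5
neighbors = rng.normal(size=(n, d))               # embeddings along query paths
W_q, W_k, W_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
center, w = attend_neighbors(neighbors, W_q, W_k, W_v)
print("attention over paths:", w.round(2))
```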