Paper Group AWR 454
GRIP: Graph-based Interaction-aware Trajectory Prediction. Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model. Flow-Motion and Depth Network for Monocular Stereo and Beyond. Meta-learnt priors slow down catastrophic forgetting in neural networks. Indoor Depth Completion with Boundary Consistency and Self-Attention. OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching. Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations. TEASER: Early and Accurate Time Series Classification. Simplify the Usage of Lexicon in Chinese NER. NeuronBlocks: Building Your NLP DNN Models Like Playing Lego. Towards Improving Solution Dominance with Incomparability Conditions: A case-study using Generator Itemset Mining. Dual-modality seq2seq network for audio-visual event localization. Sobolev Independence Criterion. FCC-GAN: A Fully Connected and Convolutional Net Architecture for GANs. Modeling Semantic Compositionality with Sememe Knowledge.
GRIP: Graph-based Interaction-aware Trajectory Prediction
Title | GRIP: Graph-based Interaction-aware Trajectory Prediction |
Authors | Xin Li, Xiaowen Ying, Mooi Choo Chuah |
Abstract | Nowadays, autonomous driving cars have become commercially available. However, the safety of a self-driving car remains a challenging problem that has not been well studied. Motion prediction is one of the core functions of an autonomous driving car. In this paper, we propose a novel scheme called GRIP, designed to efficiently predict trajectories for traffic agents around an autonomous car. GRIP uses a graph to represent the interactions of close objects, applies several graph convolutional blocks to extract features, and subsequently uses an encoder-decoder long short-term memory (LSTM) model to make predictions. Experimental results on two well-known public datasets show that our proposed model improves prediction accuracy over the state-of-the-art solution by 30%: the prediction error of GRIP is one meter shorter than that of existing schemes. Such an improvement can help autonomous driving cars avoid many traffic accidents. In addition, GRIP runs 5x faster than state-of-the-art schemes. |
Tasks | Autonomous Driving, Motion Prediction, Trajectory Prediction |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07792v1 |
https://arxiv.org/pdf/1907.07792v1.pdf | |
PWC | https://paperswithcode.com/paper/grip-graph-based-interaction-aware-trajectory |
Repo | https://github.com/rohanchandra30/Spectral-Trajectory-Prediction |
Framework | pytorch |
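To make the described pipeline concrete, here is a minimal PyTorch sketch of the GRIP idea: a proximity graph over agents, a graph convolutional block, and an encoder-decoder LSTM. The layer sizes, 10 m distance threshold, and 5-step horizon are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

def proximity_adjacency(positions, threshold=10.0):
    """Connect agents whose last observed positions lie within `threshold` meters."""
    dist = torch.cdist(positions, positions)           # (N, N) pairwise distances
    adj = (dist < threshold).float()
    deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
    return adj / deg                                   # row-normalized adjacency

class GraphConvBlock(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, T, F); mix features of neighboring agents at every timestep
        return torch.relu(self.lin(torch.einsum("nm,mtf->ntf", adj, x)))

class GRIPLike(nn.Module):
    def __init__(self, feat=2, hidden=64, horizon=5):
        super().__init__()
        self.gc = GraphConvBlock(feat, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat)
        self.horizon = horizon

    def forward(self, x, adj):
        h = self.gc(x, adj)                            # (N, T, hidden)
        _, (hn, cn) = self.encoder(h)
        step, preds = h[:, -1:, :], []
        for _ in range(self.horizon):                  # roll the decoder forward
            step, (hn, cn) = self.decoder(step, (hn, cn))
            preds.append(self.out(step))
        return torch.cat(preds, dim=1)                 # (N, horizon, 2) future (x, y)

x = torch.randn(8, 6, 2)                               # 8 agents, 6 observed frames
adj = proximity_adjacency(x[:, -1, :])
print(GRIPLike()(x, adj).shape)                        # torch.Size([8, 5, 2])
```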
Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model
Title | Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model |
Authors | Renhao Cui, Gagan Agrawal, Rajiv Ramnath |
Abstract | This paper presents techniques to detect the “offline” activity a person is engaged in when she is tweeting (such as dining, shopping or entertainment), in order to create a dynamic profile of the user, for uses such as better targeting of advertisements. To this end, we propose a hybrid LSTM model for rich contextual learning, along with studies on the effects of applying and combining multiple LSTM based methods with different contextual features. The hybrid model is shown to outperform a set of baselines and state-of-the-art methods. Finally, this paper presents an orthogonal validation with a real-case application. Our model generates an offline activity analysis for the followers of several well-known accounts, which is quite representative of the expected characteristics of these accounts. |
Tasks | Activity Recognition |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1908.02551v1 |
https://arxiv.org/pdf/1908.02551v1.pdf | |
PWC | https://paperswithcode.com/paper/tweets-can-tell-activity-recognition-using |
Repo | https://github.com/renhaocui/activityExtractor |
Framework | none |
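A hedged sketch of what a "hybrid" LSTM of this kind can look like: an LSTM over the tweet tokens whose final state is concatenated with hand-crafted contextual features before classification. The vocabulary size, eight-feature context vector, and six activity classes are invented for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn

class HybridLSTM(nn.Module):
    def __init__(self, vocab=10000, emb=100, hidden=128, ctx_feats=8, classes=6):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.head = nn.Linear(hidden + ctx_feats, classes)

    def forward(self, tokens, ctx):
        _, (h, _) = self.lstm(self.emb(tokens))   # h: (1, B, hidden)
        # Fuse textual and contextual signals before the activity classifier
        return self.head(torch.cat([h[-1], ctx], dim=-1))

model = HybridLSTM()
tokens = torch.randint(0, 10000, (4, 20))   # a batch of 4 tweets, 20 tokens each
ctx = torch.randn(4, 8)                     # e.g., posting time, user statistics
logits = model(tokens, ctx)                 # (4, 6) activity scores
```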
Flow-Motion and Depth Network for Monocular Stereo and Beyond
Title | Flow-Motion and Depth Network for Monocular Stereo and Beyond |
Authors | Kaixuan Wang, Shaojie Shen |
Abstract | We propose a learning-based method that solves monocular stereo and can be extended to fuse depth information from multiple target frames. Given two unconstrained images from a monocular camera with known intrinsic calibration, our network estimates relative camera poses and the depth map of the source image. The core contribution of the proposed method is threefold. First, a network is tailored for static scenes that jointly estimates the optical flow and camera motion. By the joint estimation, the optical flow search space is gradually reduced, resulting in an efficient and accurate flow estimation. Second, a novel triangulation layer is proposed to encode the estimated optical flow and camera motion while avoiding common numerical issues caused by epipolar geometry. Third, beyond two-view depth estimation, we further extend the above networks to fuse depth information from multiple target images and estimate the depth map of the source image. To further benefit the research community, we introduce tools to generate photorealistic structure-from-motion datasets such that deep networks can be well trained and evaluated. The proposed method is compared with previous methods and achieves state-of-the-art results in less time. Images from real-world applications and Google Earth are used to demonstrate the generalization ability of the method. |
Tasks | Calibration, Depth Estimation, Optical Flow Estimation |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05452v1 |
https://arxiv.org/pdf/1909.05452v1.pdf | |
PWC | https://paperswithcode.com/paper/flow-motion-and-depth-network-for-monocular |
Repo | https://github.com/HKUST-Aerial-Robotics/Flow-Motion-Depth |
Framework | none |
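The classical operation underlying the paper's learned triangulation layer is two-view triangulation from flow and relative pose. Below is a standard linear (DLT) version as a reference point; the intrinsics, pose, and example correspondence are made up, and the learned layer itself works on dense feature maps rather than single points.

```python
import numpy as np

def triangulate(p1, p2, K, R, t):
    """DLT triangulation: p1, p2 are pixel coords of one point in two views."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # source camera at origin
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # target camera
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)                         # least-squares solution
    X = vt[-1]
    return X[:3] / X[3]                                 # 3D point in source frame

K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
R = np.eye(3)
t = np.array([-0.1, 0., 0.])                 # camera moved 0.1 m to the right
p1 = np.array([300., 240.])
p2 = p1 + np.array([-25., 0.])               # optical flow of (-25, 0) pixels
print(triangulate(p1, p2, K, R, t))          # z component is the depth: 2.0 m
```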
Meta-learnt priors slow down catastrophic forgetting in neural networks
Title | Meta-learnt priors slow down catastrophic forgetting in neural networks |
Authors | Giacomo Spigler |
Abstract | Current training regimes for deep learning usually involve exposure to a single task / dataset at a time. Here we start from the observation that in this context the trained model is not given any knowledge of anything outside its (single-task) training distribution, and thus has no way to learn parameters (i.e., feature detectors or policies) that could be helpful in solving other tasks and in limiting future interference with the acquired knowledge, and thus catastrophic forgetting. We show that catastrophic forgetting can be mitigated in a meta-learning context, by exposing a neural network to multiple tasks in a sequential manner during training. Finally, we present SeqFOMAML, a meta-learning algorithm that implements these principles, and we evaluate it on sequential learning problems composed of Omniglot and MiniImageNet classification tasks. |
Tasks | Meta-Learning, Omniglot |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04170v2 |
https://arxiv.org/pdf/1909.04170v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learnt-priors-slow-down-catastrophic |
Repo | https://github.com/spiglerg/pyMeta |
Framework | tf |
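For readers who want the mechanics, here is an illustrative first-order MAML (FOMAML) step, the meta-learning machinery SeqFOMAML builds on. The model, toy regression tasks, and step sizes are placeholders, and, unlike a faithful run, the meta-gradient is evaluated on the support data rather than a held-out query set.

```python
import copy
import torch
import torch.nn as nn

def fomaml_step(model, tasks, inner_lr=0.01, outer_lr=0.001, inner_steps=5):
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_x, support_y in tasks:
        fast = copy.deepcopy(model)                   # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                  # inner loop: adapt to the task
            opt.zero_grad()
            nn.functional.mse_loss(fast(support_x), support_y).backward()
            opt.step()
        # First-order approximation: take the gradient at the adapted weights
        # (a faithful run would evaluate this on held-out query data)
        loss = nn.functional.mse_loss(fast(support_x), support_y)
        for g, p in zip(meta_grads, torch.autograd.grad(loss, fast.parameters())):
            g += p
    with torch.no_grad():                             # outer loop: meta-update
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g / len(tasks)

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
tasks = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(3)]
fomaml_step(model, tasks)   # one meta-update over a batch of tasks
```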
Indoor Depth Completion with Boundary Consistency and Self-Attention
Title | Indoor Depth Completion with Boundary Consistency and Self-Attention |
Authors | Yu-Kai Huang, Tsung-Han Wu, Yueh-Cheng Liu, Winston H. Hsu |
Abstract | Depth estimation features are helpful for 3D recognition. Commodity-grade depth cameras are able to capture depth and color images in real time. However, glossy, transparent, or distant surfaces cannot be scanned properly by the sensor. As a result, enhancement and restoration of sensed depth is an important task. Depth completion aims at filling the holes that sensors fail to detect, which is still a complex task for machines to learn. Traditional hand-tuned methods have reached their limits, while neural-network-based methods tend to copy and interpolate the output from surrounding depth values. This leads to blurred boundaries, and the structures of the depth map are lost. Consequently, our main work is to design an end-to-end network that improves completed depth maps while maintaining edge clarity. We utilize a self-attention mechanism, previously used in image inpainting, to extract more useful information in each layer of convolution so that the completed depth map is enhanced. In addition, we propose a boundary consistency concept to enhance the depth map quality and structure. Experimental results validate the effectiveness of our self-attention and boundary consistency scheme, which outperforms previous state-of-the-art depth completion work on the Matterport3D dataset. Our code is publicly available at https://github.com/patrickwu2/Depth-Completion |
Tasks | Depth Completion, Depth Estimation, Image Inpainting |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08344v2 |
https://arxiv.org/pdf/1908.08344v2.pdf | |
PWC | https://paperswithcode.com/paper/indoor-depth-completion-with-boundary |
Repo | https://github.com/patrickwu2/Depth-Completion |
Framework | pytorch |
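The self-attention mechanism borrowed from image inpainting is typically a convolutional self-attention block of the SAGAN flavor; a compact PyTorch version is sketched below. Channel sizes are illustrative, and the paper's boundary-consistency loss is not shown.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)   # 1x1 query projection
        self.k = nn.Conv2d(channels, channels // 8, 1)   # 1x1 key projection
        self.v = nn.Conv2d(channels, channels, 1)        # 1x1 value projection
        self.gamma = nn.Parameter(torch.zeros(1))        # starts as the identity map

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)         # (B, HW, C/8)
        k = self.k(x).flatten(2)                         # (B, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)              # (B, HW, HW) attention map
        v = self.v(x).flatten(2)                         # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                      # residual connection

depth_feats = torch.randn(1, 64, 32, 32)
print(SelfAttention2d(64)(depth_feats).shape)            # torch.Size([1, 64, 32, 32])
```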
OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching
Title | OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching |
Authors | Changhee Won, Jongbin Ryu, Jongwoo Lim |
Abstract | In this paper, we propose a novel end-to-end deep neural network model for omnidirectional depth estimation from a wide-baseline multi-view stereo setup. The images captured with ultra wide field-of-view (FOV) cameras on an omnidirectional rig are processed by the feature extraction module, and then the deep feature maps are warped onto the concentric spheres swept through all candidate depths using the calibrated camera parameters. The 3D encoder-decoder block takes the aligned feature volume to produce the omnidirectional depth estimate with regularization on uncertain regions utilizing the global context information. In addition, we present large-scale synthetic datasets for training and testing omnidirectional multi-view stereo algorithms. Our datasets consist of 11K ground-truth depth maps and 45K fisheye images in four orthogonal directions with various objects and environments. Experimental results show that the proposed method generates excellent results in both synthetic and real-world environments, and it outperforms the prior art and the omnidirectional versions of the state-of-the-art conventional stereo algorithms. |
Tasks | Depth Estimation, Stereo Matching |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06257v1 |
https://arxiv.org/pdf/1908.06257v1.pdf | |
PWC | https://paperswithcode.com/paper/omnimvs-end-to-end-learning-for |
Repo | https://github.com/matsuren/omnimvs_pytorch |
Framework | pytorch |
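A simplified sphere-sweep sketch of the warping step described above: for each candidate depth, points on a sphere around the rig are projected into one camera and used to sample that camera's feature map. A pinhole projection stands in for the paper's calibrated fisheye model, and all resolutions and depths are illustrative.

```python
import torch
import torch.nn.functional as F

def sphere_sweep(feat, K, R, t, depths, H=32, W=64):
    # Ray directions on an equirectangular (longitude, latitude) grid
    lon = torch.linspace(-torch.pi, torch.pi, W)
    lat = torch.linspace(-torch.pi / 2, torch.pi / 2, H)
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")
    rays = torch.stack([lat.cos() * lon.sin(), lat.sin(), lat.cos() * lon.cos()], -1)
    volume = []
    for d in depths:                                   # one slice per swept sphere
        pts = rays.view(-1, 3) * d                     # points on sphere of radius d
        cam = R @ pts.T + t[:, None]                   # into the camera frame
        uv = (K @ cam).T
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)     # pixel coordinates
        grid = uv.clone()                              # normalize to [-1, 1]
        grid[:, 0] = uv[:, 0] / (feat.shape[-1] - 1) * 2 - 1
        grid[:, 1] = uv[:, 1] / (feat.shape[-2] - 1) * 2 - 1
        sampled = F.grid_sample(feat, grid.view(1, H, W, 2), align_corners=True)
        volume.append(sampled)
    return torch.stack(volume, dim=2)                  # (1, C, D, H, W) feature volume

feat = torch.randn(1, 16, 48, 64)                      # one camera's feature map
K = torch.tensor([[30., 0., 32.], [0., 30., 24.], [0., 0., 1.]])
vol = sphere_sweep(feat, K, torch.eye(3), torch.zeros(3), depths=[1.0, 2.0, 4.0])
print(vol.shape)                                       # torch.Size([1, 16, 3, 32, 64])
```

In the full model, volumes from all four fisheye cameras are concatenated and fed to the 3D encoder-decoder that regularizes the depth estimate.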
Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations
Title | Stay Ahead of Poachers: Illegal Wildlife Poaching Prediction and Patrol Planning Under Uncertainty with Field Test Evaluations |
Authors | Lily Xu, Shahrzad Gholami, Sara Mc Carthy, Bistra Dilkina, Andrew Plumptre, Milind Tambe, Rohit Singh, Mustapha Nsubuga, Joshua Mabonga, Margaret Driciru, Fred Wanyama, Aggrey Rwetsiba, Tom Okello, Eric Enyel |
Abstract | Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction. However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) is a machine learning pipeline that has been developed as a data-driven approach to identify areas at high risk of poaching throughout protected areas and compute optimal patrol routes. In this paper, we take an end-to-end approach to the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve robustness of our prescribed patrols and increase detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software. |
Tasks | Gaussian Processes |
Published | 2019-03-08 |
URL | https://arxiv.org/abs/1903.06669v3 |
https://arxiv.org/pdf/1903.06669v3.pdf | |
PWC | https://paperswithcode.com/paper/stay-ahead-of-poachers-illegal-wildlife |
Repo | https://github.com/lily-x/PAWS-public |
Framework | none |
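As a hedged sketch of the uncertainty-aware ranking the abstract describes: fit a Gaussian process to historical patrol observations and rank park cells by an upper confidence bound on predicted risk, so patrols visit cells that are risky and uncertain. The features, kernel, and UCB weight below are illustrative stand-ins, not the PAWS configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_hist = rng.uniform(0, 1, (200, 3))          # e.g., distance to road, water, slope
y_hist = (X_hist[:, 0] < 0.3).astype(float)   # fake historical snare observations

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_hist, y_hist)

X_cells = rng.uniform(0, 1, (500, 3))         # candidate park cells to patrol
mean, std = gp.predict(X_cells, return_std=True)
ucb = mean + 1.0 * std                        # prefer risky *and* uncertain cells
patrol_order = np.argsort(-ucb)[:20]          # top-20 cells for the next patrol
print(patrol_order)
```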
TEASER: Early and Accurate Time Series Classification
Title | TEASER: Early and Accurate Time Series Classification |
Authors | P. Schäfer, U. Leser |
Abstract | Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible with the highest possible accuracy. The most critical issue of any eTSC method is to decide when enough data of a time series has been seen to take a decision: waiting for more data points usually makes the classification problem easier but delays the time at which a classification is made; in contrast, earlier classification has to cope with less input data, often leading to inferior accuracy. State-of-the-art eTSC methods compute a fixed optimal decision time, assuming that every time series has the same defined start time (like turning on a machine). However, in many real-life applications measurements start at arbitrary times (like measuring the heartbeats of a patient), implying that the best time for taking a decision varies heavily between time series. We present TEASER, a novel algorithm that models eTSC as a two-tier classification problem: in the first tier, a classifier periodically assesses the incoming time series to compute class probabilities. However, these class probabilities are only used as an output label if a second-tier classifier decides that the predicted label is reliable enough, which can happen after a different number of measurements. In an evaluation using 45 benchmark datasets, TEASER is two to three times earlier in its predictions than its competitors while reaching the same or even higher classification accuracy. We further show TEASER's superior performance using real-life use cases, namely energy monitoring and gait detection. |
Tasks | Time Series, Time Series Classification |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03405v2 |
https://arxiv.org/pdf/1908.03405v2.pdf | |
PWC | https://paperswithcode.com/paper/teaser-early-and-accurate-time-series |
Repo | https://github.com/patrickzib/SFA |
Framework | tf |
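A toy rendering of the two-tier idea: a first-tier classifier scores growing prefixes of a time series, and a second-tier gate accepts the label only once the probabilities look reliable. TEASER learns its gates (one-class SVMs per snapshot); the fixed confidence threshold and synthetic data below are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
T, n = 100, 300
labels = rng.integers(0, 2, n)
# Class signal that grows over time, so later prefixes are easier to classify
series = rng.normal(0, 1, (n, T)) + labels[:, None] * np.linspace(0, 2, T)

checkpoints = [20, 40, 60, 80, 100]
tier1 = {t: LogisticRegression().fit(series[:, :t], labels) for t in checkpoints}

def classify_early(x, threshold=0.9):
    for t in checkpoints:
        proba = tier1[t].predict_proba(x[:t][None])[0]
        if proba.max() >= threshold:        # second tier: accept if confident
            return int(proba.argmax()), t   # predicted label and decision time
    return int(proba.argmax()), T           # fall back to the full series

print(classify_early(series[0]))
```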
Simplify the Usage of Lexicon in Chinese NER
Title | Simplify the Usage of Lexicon in Chinese NER |
Authors | Minlong Peng, Ruotian Ma, Qi Zhang, Xuanjing Huang |
Abstract | Recently, many works have tried to utilize word lexicons to augment the performance of Chinese named entity recognition (NER). As a representative work in this line, Lattice-LSTM \cite{zhang2018chinese} has achieved new state-of-the-art performance on several benchmark Chinese NER datasets. However, Lattice-LSTM suffers from a complicated model architecture, resulting in low computational efficiency. This heavily limits its application in many industrial areas that require real-time NER responses. In this work, we ask the question: can we simplify the usage of the lexicon and, at the same time, achieve performance comparable to Lattice-LSTM for Chinese NER? Starting from this question and motivated by the idea of Lattice-LSTM, we propose a concise but effective method to incorporate the lexicon information into the vector representations of characters. This way, our method avoids introducing a complicated sequence modeling architecture to model the lexicon information. Instead, it only needs to subtly adjust the character representation layer of the neural sequence model. An experimental study on four benchmark Chinese NER datasets shows that our method achieves much faster inference speed and comparable or better performance than Lattice-LSTM and its followers. It also shows that our method can be easily transferred across different neural architectures. |
Tasks | Chinese Named Entity Recognition, Named Entity Recognition |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05969v1 |
https://arxiv.org/pdf/1908.05969v1.pdf | |
PWC | https://paperswithcode.com/paper/simplify-the-usage-of-lexicon-in-chinese-ner |
Repo | https://github.com/v-mipeng/LexiconAugmentedNER |
Framework | pytorch |
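A schematic of the core trick as we read it: for every character, collect the lexicon words that Begin, continue (Middle), or End at it, or equal it (Single), pool each set's embeddings, and concatenate the result onto the character embedding, leaving the sequence model untouched. The tiny lexicon, random character encoder, and mean pooling are illustrative simplifications.

```python
import torch
import torch.nn as nn

lexicon = {"南京": 0, "南京市": 1, "市长": 2, "长江": 3, "长江大桥": 4, "大桥": 5}
word_emb = nn.Embedding(len(lexicon), 8)

def char_word_sets(sentence):
    """For each character, the lexicon word ids in its B/M/E/S sets."""
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    for word, idx in lexicon.items():
        start = sentence.find(word)
        while start != -1:
            end = start + len(word) - 1
            if start == end:
                sets[start]["S"].append(idx)
            else:
                sets[start]["B"].append(idx)
                sets[end]["E"].append(idx)
                for i in range(start + 1, end):
                    sets[i]["M"].append(idx)
            start = sentence.find(word, start + 1)
    return sets

def augment(sentence, char_emb_dim=16):
    char_emb = torch.randn(len(sentence), char_emb_dim)  # stand-in char encoder
    feats = []
    for s in char_word_sets(sentence):
        pooled = [word_emb(torch.tensor(s[k])).mean(0) if s[k]
                  else torch.zeros(8) for k in "BMES"]
        feats.append(torch.cat(pooled))                  # one vector per character
    return torch.cat([char_emb, torch.stack(feats)], dim=-1)

print(augment("南京市长江大桥").shape)                    # (7, 16 + 4*8)
```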
NeuronBlocks: Building Your NLP DNN Models Like Playing Lego
Title | NeuronBlocks: Building Your NLP DNN Models Like Playing Lego |
Authors | Ming Gong, Linjun Shou, Wutao Lin, Zhijie Sang, Quanjia Yan, Ze Yang, Feixiang Cheng, Daxin Jiang |
Abstract | Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks. However, many engineers find it a big overhead when they have to choose from multiple frameworks, compare different types of models, and understand various optimization mechanisms. An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by saving their learning cost and guiding them to find optimal solutions to their tasks. In this paper, we introduce NeuronBlocks\footnote{Code: \url{https://github.com/Microsoft/NeuronBlocks}} \footnote{Demo: \url{https://youtu.be/x6cOpVSZcdo}}, a toolkit encapsulating a suite of neural network modules as building blocks to construct various DNN models with complex architecture. This toolkit empowers engineers to build, train, and test various NLP models through simple configuration of JSON files. The experiments on several NLP datasets such as GLUE, WikiQA and CoNLL-2003 demonstrate the effectiveness of NeuronBlocks. |
Tasks | |
Published | 2019-04-21 |
URL | https://arxiv.org/abs/1904.09535v3 |
https://arxiv.org/pdf/1904.09535v3.pdf | |
PWC | https://paperswithcode.com/paper/neuronblocks-building-your-nlp-dnn-models |
Repo | https://github.com/Microsoft/NeuronBlocks |
Framework | pytorch |
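NeuronBlocks models are assembled from JSON configuration. The snippet below only mimics that pattern with a deliberately minimal, made-up schema to show the "blocks as Lego" idea; the toolkit's real configuration format is richer (see the repo for actual examples).

```python
import json
import torch.nn as nn

# A hypothetical block-list config, NOT NeuronBlocks' actual schema
config = json.loads("""
{
  "blocks": [
    {"type": "Embedding", "num_embeddings": 5000, "embedding_dim": 64},
    {"type": "LSTM", "input_size": 64, "hidden_size": 128, "batch_first": true},
    {"type": "Linear", "in_features": 128, "out_features": 3}
  ]
}
""")

registry = {"Embedding": nn.Embedding, "LSTM": nn.LSTM, "Linear": nn.Linear}

def build(cfg):
    # Each block entry names a module type; the rest of the keys are its kwargs
    return [registry[b.pop("type")](**b) for b in cfg["blocks"]]

blocks = build(config)   # the model is assembled declaratively, "like Lego"
print(blocks)
```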
Towards Improving Solution Dominance with Incomparability Conditions: A case-study using Generator Itemset Mining
Title | Towards Improving Solution Dominance with Incomparability Conditions: A case-study using Generator Itemset Mining |
Authors | Gökberk Koçak, Özgür Akgün, Tias Guns, Ian Miguel |
Abstract | Finding interesting patterns is a challenging task in data mining. Constraint-based mining is a well-known approach to this, and one for which constraint programming has been shown to be a well-suited and generic framework. Dominance programming has been proposed as an extension that can capture an even wider class of constraint-based mining problems by allowing relations between patterns to be compared. In this paper, in addition to specifying a dominance relation, we introduce the ability to specify an incomparability condition. Using these two concepts we devise a generic framework that performs a batch-wise search that avoids checking incomparable solutions. We extend the ESSENCE language and the underlying modelling pipeline to support this. We use the generator itemset mining problem as a test case and give a declarative specification for it. We also present preliminary experimental results on this specific problem class with a CP solver backend, showing that using the incomparability condition during search can improve the efficiency of dominance programming and reduce the need for post-processing to filter dominated solutions. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00505v1 |
https://arxiv.org/pdf/1910.00505v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-improving-solution-dominance-with |
Repo | https://github.com/stacs-cp/ModRef2019-Dominance |
Framework | none |
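The paper's framework is expressed in ESSENCE on top of a CP solver; the plain-Python toy below only illustrates the underlying notions on the test-case problem. A generator itemset is one that no proper subset dominates (i.e., no subset has the same support), while itemsets with different support are incomparable and never prune one another. The four-transaction dataset is made up.

```python
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
items = sorted(set().union(*transactions))

def support(itemset):
    """Number of transactions containing all of the itemset's items."""
    return sum(itemset <= t for t in transactions)

def dominated(itemset):
    # Dominated iff a proper subset has the same support (not a generator)
    return any(support(set(s)) == support(itemset)
               for r in range(len(itemset))
               for s in combinations(itemset, r))

candidates = [set(s) for r in range(1, len(items) + 1)
              for s in combinations(items, r)]
generators = [s for s in candidates if support(s) > 0 and not dominated(s)]
print(generators)   # itemsets that no same-support subset dominates
```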
Dual-modality seq2seq network for audio-visual event localization
Title | Dual-modality seq2seq network for audio-visual event localization |
Authors | Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang |
Abstract | Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level). To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner, which can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that our proposed method performs favorably against recent deep learning approaches in both settings. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07473v1 |
http://arxiv.org/pdf/1902.07473v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-modality-seq2seq-network-for-audio |
Repo | https://github.com/YapengTian/AVE-ECCV18 |
Framework | pytorch |
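A rough PyTorch outline of a dual-modality sequence-to-sequence model in this spirit: per-segment audio and visual features are fused and encoded, and the encoder's global state seeds a decoder that emits per-segment event predictions. All dimensions and the class count are invented for illustration.

```python
import torch
import torch.nn as nn

class DualSeq2Seq(nn.Module):
    def __init__(self, a_dim=128, v_dim=512, hidden=128, classes=29):
        super().__init__()
        self.enc = nn.LSTM(a_dim + v_dim, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, classes)

    def forward(self, audio, visual):
        # Fuse modalities per segment, then encode the whole sequence
        fused, state = self.enc(torch.cat([audio, visual], dim=-1))
        out, _ = self.dec(fused, state)        # global state seeds the decoder
        return self.cls(out)                   # per-segment event logits

audio = torch.randn(2, 10, 128)                # 10 one-second segments per video
visual = torch.randn(2, 10, 512)
print(DualSeq2Seq()(audio, visual).shape)      # torch.Size([2, 10, 29])
```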
Sobolev Independence Criterion
Title | Sobolev Independence Criterion |
Authors | Youssef Mroueh, Tom Sercu, Mattia Rigotti, Inkit Padhi, Cicero Dos Santos |
Abstract | We propose the Sobolev Independence Criterion (SIC), an interpretable dependency measure between a high dimensional random variable X and a response variable Y . SIC decomposes to the sum of feature importance scores and hence can be used for nonlinear feature selection. SIC can be seen as a gradient regularized Integral Probability Metric (IPM) between the joint distribution of the two random variables and the product of their marginals. We use sparsity inducing gradient penalties to promote input sparsity of the critic of the IPM. In the kernel version we show that SIC can be cast as a convex optimization problem by introducing auxiliary variables that play an important role in feature selection as they are normalized feature importance scores. We then present a neural version of SIC where the critic is parameterized as a homogeneous neural network, improving its representation power as well as its interpretability. We conduct experiments validating SIC for feature selection in synthetic and real-world experiments. We show that SIC enables reliable and interpretable discoveries, when used in conjunction with the holdout randomization test and knockoffs to control the False Discovery Rate. Code is available at http://github.com/ibm/sic. |
Tasks | Feature Importance, Feature Selection |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14212v1 |
https://arxiv.org/pdf/1910.14212v1.pdf | |
PWC | https://paperswithcode.com/paper/sobolev-independence-criterion |
Repo | https://github.com/IBM/SIC |
Framework | pytorch |
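A compact sketch of the SIC idea under stated assumptions: train a critic to separate the joint distribution of (X, Y) from the product of marginals (approximated by shuffling Y), with a sparsity-inducing penalty on the critic's input gradients, then read per-feature importance from those gradients. The architecture, penalty weight, and synthetic data are invented; see the repo for the actual method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 5
X = torch.randn(n, d)
Y = X[:, 0:1] * 2 + 0.1 * torch.randn(n, 1)      # only feature 0 matters

critic = nn.Sequential(nn.Linear(d + 1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

for _ in range(500):
    perm = torch.randperm(n)
    joint = torch.cat([X, Y], dim=1).requires_grad_(True)
    indep = torch.cat([X, Y[perm]], dim=1)       # product of marginals via shuffle
    gap = critic(joint).mean() - critic(indep).mean()   # IPM-style witness gap
    grads = torch.autograd.grad(critic(joint).sum(), joint, create_graph=True)[0]
    penalty = grads[:, :d].pow(2).mean(0).sqrt().sum()  # group-sparse on X dims
    loss = -gap + 0.1 * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

joint = torch.cat([X, Y], dim=1).requires_grad_(True)
grads = torch.autograd.grad(critic(joint).sum(), joint)[0]
print(grads[:, :d].pow(2).mean(0))               # importance peaks at feature 0
```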
FCC-GAN: A Fully Connected and Convolutional Net Architecture for GANs
Title | FCC-GAN: A Fully Connected and Convolutional Net Architecture for GANs |
Authors | Sukarna Barua, Sarah Monazam Erfani, James Bailey |
Abstract | Generative Adversarial Networks (GANs) are a powerful class of generative models. Despite their successes, the most appropriate choice of a GAN network architecture is still not well understood. GAN models for image synthesis have adopted a deep convolutional network architecture, which eliminates or minimizes the use of fully connected and pooling layers in favor of convolution layers in the generator and discriminator. In this paper, we demonstrate that a convolutional network architecture utilizing deep fully connected layers and pooling layers can be more effective than the traditional convolution-only architecture, and we propose FCC-GAN, a fully connected and convolutional GAN architecture. Models based on our FCC-GAN architecture both learn faster than the conventional architecture and generate higher-quality samples. We demonstrate the effectiveness and stability of our approach across four popular image datasets. |
Tasks | Image Generation |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02417v2 |
https://arxiv.org/pdf/1905.02417v2.pdf | |
PWC | https://paperswithcode.com/paper/fcc-gan-a-fully-connected-and-convolutional |
Repo | https://github.com/sukarnabarua/fccgan |
Framework | tf |
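An illustrative FCC-GAN-style generator: a stack of deep fully connected layers first, then convolutional upsampling, in contrast to the DCGAN convention of a single projection layer. The layer widths and the 32x32 output are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FCCGenerator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Sequential(                    # deep FC stack (the "FCC" part)
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 128 * 8 * 8), nn.ReLU(),
        )
        self.conv = nn.Sequential(                  # conv upsampling to an image
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.conv(self.fc(z).view(-1, 128, 8, 8))

z = torch.randn(4, 100)                             # a batch of latent codes
print(FCCGenerator()(z).shape)                      # torch.Size([4, 3, 32, 32])
```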
Modeling Semantic Compositionality with Sememe Knowledge
Title | Modeling Semantic Compositionality with Sememe Knowledge |
Authors | Fanchao Qi, Junjie Huang, Chenghao Yang, Zhiyuan Liu, Xiao Chen, Qun Liu, Maosong Sun |
Abstract | Semantic compositionality (SC) refers to the phenomenon that the meaning of a complex linguistic unit can be composed of the meanings of its constituents. Most related works focus on using complicated compositionality functions to model SC, while few works consider external knowledge in models. In this paper, we verify the effectiveness of sememes, the minimum semantic units of human languages, in modeling SC with a confirmatory experiment. Furthermore, we make the first attempt to incorporate sememe knowledge into SC models, and employ the sememe-incorporated models in learning representations of multiword expressions, a typical task of SC. In experiments, we implement our models by incorporating knowledge from HowNet, a well-known sememe knowledge base, and perform both intrinsic and extrinsic evaluations. Experimental results show that our models achieve a significant performance boost compared to the baseline methods that do not consider sememe knowledge. We further conduct quantitative analysis and case studies to demonstrate the effectiveness of applying sememe knowledge in modeling SC. All the code and data of this paper can be obtained at https://github.com/thunlp/Sememe-SC. |
Tasks | multi-word expression embedding, multi-word expression sememe prediction |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04744v1 |
https://arxiv.org/pdf/1907.04744v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-semantic-compositionality-with |
Repo | https://github.com/thunlp/Sememe-SC |
Framework | tf |
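A toy rendering of sememe-aware composition: the representation of a multiword expression is built from its constituents' embeddings plus an aggregate of their sememe embeddings, here a simple mean followed by a linear composition function (the paper's models are more elaborate). The two-character example, sememe inventory, and dimensions are all made up.

```python
import torch
import torch.nn as nn

word_ids = {"铁": 0, "路": 1}                      # constituents of "铁路" (railway)
sememe_ids = {"metal": 0, "route": 1, "transport": 2}
word_sememes = {"铁": ["metal"], "路": ["route", "transport"]}

word_emb = nn.Embedding(len(word_ids), 16)
sememe_emb = nn.Embedding(len(sememe_ids), 16)
compose = nn.Linear(2 * 16, 16)                    # composition function

def mwe_representation(words):
    parts = []
    for w in words:
        wv = word_emb(torch.tensor(word_ids[w]))
        sv = sememe_emb(torch.tensor(
            [sememe_ids[s] for s in word_sememes[w]])).mean(0)
        parts.append(wv + sv)                      # inject sememe knowledge
    return torch.tanh(compose(torch.cat(parts)))

print(mwe_representation(["铁", "路"]).shape)       # torch.Size([16])
```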