January 31, 2020


Paper Group AWR 430

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation. Training-Time-Friendly Network for Real-Time Object Detection. Event Representation Learning Enhanced with External Commonsense Knowledge. Hierarchical Optimal Transport for Document Representation. Attention-based Context Aggregation Network for Monocular Depth Es …

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation


Title	From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation
Authors	Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, Il Hong Suh
Abstract	Estimating accurate depth from a single image is challenging because it is an ill-posed problem as infinitely many 3D scenes can be projected to the same 2D scene. However, recent works based on deep convolutional neural networks show great progress with plausible results. The convolutional neural networks are generally composed of two parts: an encoder for dense feature extraction and a decoder for predicting the desired depth. In the encoder-decoder schemes, repeated strided convolution and spatial pooling layers lower the spatial resolution of transitional outputs, and several techniques such as skip connections or multi-layer deconvolutional networks are adopted to recover the original resolution for effective dense prediction. In this paper, for more effective guidance of densely encoded features to the desired depth prediction, we propose a network architecture that utilizes novel local planar guidance layers located at multiple stages in the decoding phase. We show that the proposed method outperforms state-of-the-art works by a significant margin when evaluated on challenging benchmarks. We also provide results from an ablation study to validate the effectiveness of the proposed method.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-07-24
URL	https://arxiv.org/abs/1907.10326v5
PDF	https://arxiv.org/pdf/1907.10326v5.pdf
PWC	https://paperswithcode.com/paper/from-big-to-small-multi-scale-local-planar
Repo	https://github.com/cogaplex-bts/bts
Framework	tf

Training-Time-Friendly Network for Real-Time Object Detection


Title	Training-Time-Friendly Network for Real-Time Object Detection
Authors	Zili Liu, Tu Zheng, Guodong Xu, Zheng Yang, Haifeng Liu, Deng Cai
Abstract	Modern object detectors can rarely achieve short training time, fast inference speed, and high accuracy at the same time. To strike a balance among them, we propose the Training-Time-Friendly Network (TTFNet). In this work, we start with light-head, single-stage, and anchor-free designs, which enable fast inference speed. Then, we focus on shortening training time. We notice that encoding more training samples from annotated boxes plays a similar role as increasing batch size, which helps enlarge the learning rate and accelerate the training process. To this end, we introduce a novel approach using Gaussian kernels to encode training samples. Besides, we design the initiative sample weights for better information utilization. Experiments on MS COCO show that our TTFNet has great advantages in balancing training time, inference speed, and accuracy. It has reduced training time by more than seven times compared to previous real-time detectors while maintaining state-of-the-art performance. In addition, our super-fast versions of TTFNet-18 and TTFNet-53 can outperform SSD300 and YOLOv3, respectively, using less than one-tenth of their training time. The code has been made available at https://github.com/ZJULearning/ttfnet.
Tasks	Object Detection, Real-Time Object Detection
Published	2019-09-02
URL	https://arxiv.org/abs/1909.00700v3
PDF	https://arxiv.org/pdf/1909.00700v3.pdf
PWC	https://paperswithcode.com/paper/training-time-friendly-network-for-real-time
Repo	https://github.com/ZJULearning/ttfnet
Framework	pytorch
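
The Gaussian encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each annotated box contributes a 2D Gaussian centered on the box, with standard deviations proportional to the box width and height, so that many pixels inside the box become weighted training samples. The function name `gaussian_box_heatmap` and the `sigma_scale` parameter are hypothetical and are not taken from the TTFNet code.

```python
import math


def gaussian_box_heatmap(height, width, box, sigma_scale=0.25):
    """Return a height x width grid holding a 2D Gaussian inside the box.

    box is (x_min, y_min, x_max, y_max) in integer pixel coordinates.
    sigma_scale is an illustrative assumption: sigma = sigma_scale * side.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    sigma_x = max(sigma_scale * (x_max - x_min), 1e-6)
    sigma_y = max(sigma_scale * (y_max - y_min), 1e-6)

    heatmap = [[0.0] * width for _ in range(height)]
    for y in range(max(y_min, 0), min(y_max, height - 1) + 1):
        for x in range(max(x_min, 0), min(x_max, width - 1) + 1):
            # Pixels near the box center receive values close to 1.
            exponent = ((x - cx) ** 2) / (2 * sigma_x ** 2) \
                + ((y - cy) ** 2) / (2 * sigma_y ** 2)
            heatmap[y][x] = math.exp(-exponent)
    return heatmap


heatmap = gaussian_box_heatmap(8, 8, (2, 2, 6, 6))
```

In this sketch, pixels outside the box stay at zero, while every pixel inside it carries a soft score that could serve as a sample weight.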

Event Representation Learning Enhanced with External Commonsense Knowledge


Title	Event Representation Learning Enhanced with External Commonsense Knowledge
Authors	Xiao Ding, Kuo Liao, Ting Liu, Zhongyang Li, Junwen Duan
Abstract	Prior work has proposed effective methods to learn event representations that can capture syntactic and semantic information over a text corpus, demonstrating their effectiveness for downstream tasks such as script event prediction. On the other hand, events extracted from raw texts lack commonsense knowledge, such as the intents and emotions of the event participants, which are useful for distinguishing event pairs when there are only subtle differences in their surface realizations. To address this issue, this paper proposes to leverage external commonsense knowledge about the intent and sentiment of the event. Experiments on three event-related tasks, i.e., event similarity, script event prediction and stock market prediction, show that our model obtains much better event embeddings for the tasks, achieving a 78% improvement on the hard similarity task, yielding more precise inferences on subsequent events under given contexts, and better accuracy in predicting the volatility of the stock market.
Tasks	Representation Learning, Stock Market Prediction
Published	2019-09-09
URL	https://arxiv.org/abs/1909.05190v1
PDF	https://arxiv.org/pdf/1909.05190v1.pdf
PWC	https://paperswithcode.com/paper/event-representation-learning-enhanced-with
Repo	https://github.com/MagiaSN/CommonsenseERL_EMNLP_2019
Framework	pytorch

Hierarchical Optimal Transport for Document Representation


Title	Hierarchical Optimal Transport for Document Representation
Authors	Mikhail Yurochkin, Sebastian Claici, Edward Chien, Farzaneh Mirzazadeh, Justin Solomon
Abstract	The ability to measure similarity between documents enables intelligent summarization and analysis of large corpora. Past distances between documents suffer from either an inability to incorporate semantic similarities between words or from scalability issues. As an alternative, we introduce hierarchical optimal transport as a meta-distance between documents, where documents are modeled as distributions over topics, which themselves are modeled as distributions over words. We then solve an optimal transport problem on the smaller topic space to compute a similarity score. We give conditions on the topics under which this construction defines a distance, and we relate it to the word mover’s distance. We evaluate our technique for k-NN classification and show better interpretability and scalability with comparable performance to current methods at a fraction of the cost.
Tasks
Published	2019-06-26
URL	https://arxiv.org/abs/1906.10827v2
PDF	https://arxiv.org/pdf/1906.10827v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-optimal-transport-for-document
Repo	https://github.com/IBM/HOTT
Framework	none
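
The two-level construction in the abstract can be sketched on toy data. The sketch assumes that the ground cost between two topics is itself a transport cost between their word distributions, and it swaps in Sinkhorn iterations (entropy-regularized transport) for an exact solver, so the returned values are approximations. The names `sinkhorn_cost` and `hott_distance`, the regularization value, and all numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.1, iters=200):
    """Approximate the transport cost between histograms a and b."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def hott_distance(doc_a, doc_b, topics, word_cost):
    """Transport between topic mixtures, with topic-to-topic transport as cost."""
    topic_cost = [[sinkhorn_cost(p, q, word_cost) for q in topics] for p in topics]
    return sinkhorn_cost(doc_a, doc_b, topic_cost)


# Toy setup: three words on a line, two topics, two single-topic documents.
word_cost = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
topics = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
doc_a, doc_b = [1.0, 0.0], [0.0, 1.0]

same = hott_distance(doc_a, doc_a, topics, word_cost)
different = hott_distance(doc_a, doc_b, topics, word_cost)
```

The outer problem only has as many variables as there are topic pairs, which is where the scalability claim in the abstract comes from.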

Attention-based Context Aggregation Network for Monocular Depth Estimation


Title	Attention-based Context Aggregation Network for Monocular Depth Estimation
Authors	Yuru Chen, Haitao Zhao, Zhengwei Hu
Abstract	Depth estimation is a traditional computer vision task, which plays a crucial role in understanding 3D scene geometry. Recently, methods based on deep convolutional neural networks have achieved promising results in the monocular depth estimation field. Specifically, the framework that combines the multi-scale features extracted by the dilated convolution based block (atrous spatial pyramid pooling, ASPP) has brought significant improvement in the dense labeling task. However, the discretized and predefined dilation rates cannot capture the continuous context information that differs in diverse scenes and easily introduce grid artifacts in depth estimation. In this paper, we propose an attention-based context aggregation network (ACAN) to tackle these difficulties. Based on the self-attention model, ACAN adaptively learns the task-specific similarities between pixels to model the context information. First, we recast monocular depth estimation as a dense labeling multi-class classification problem. Then we propose a soft ordinal inference to transform the predicted probabilities to continuous depth values, which can reduce the discretization error (about 1% decrease in RMSE). Second, the proposed ACAN aggregates both the image-level and pixel-level context information for depth estimation, where the former expresses the statistical characteristic of the whole image and the latter extracts the long-range spatial dependencies for each pixel. Third, to further reduce the inconsistency between the RGB image and depth map, we construct an attention loss to minimize their information entropy. We evaluate on public monocular depth-estimation benchmark datasets (including NYU Depth V2, KITTI). The experiments demonstrate the superiority of our proposed ACAN, which achieves results competitive with the state of the art.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10137v1
PDF	http://arxiv.org/pdf/1901.10137v1.pdf
PWC	https://paperswithcode.com/paper/attention-based-context-aggregation-network
Repo	https://github.com/miraiaroha/ACAN
Framework	pytorch
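
The step from predicted class probabilities to a continuous depth value can be illustrated with a small sketch. It assumes one plausible reading of soft inference, namely taking the probability-weighted average of the depth bin centers instead of the center of the most likely bin. The exact formulation in the paper may differ; the function names and bin values here are hypothetical.

```python
def hard_depth(probabilities, bin_centers):
    """Depth of the most likely bin (pure classification output)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return bin_centers[best]


def soft_depth(probabilities, bin_centers):
    """Probability-weighted average of bin centers (assumed soft inference)."""
    total = sum(probabilities)
    return sum(p * c for p, c in zip(probabilities, bin_centers)) / total


# A pixel whose probability mass is split between the 1 m and 3 m bins.
probs = [0.5, 0.5]
centers = [1.0, 3.0]
estimate = soft_depth(probs, centers)
```

The hard variant can only return one of the bin centers, while the soft variant can land between bins, which is the intuition behind the reduced discretization error.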

PaLM: A Hybrid Parser and Language Model


Title	PaLM: A Hybrid Parser and Language Model
Authors	Hao Peng, Roy Schwartz, Noah A. Smith
Abstract	We present PaLM, a hybrid parser and neural language model. Building on an RNN language model, PaLM adds an attention layer over text spans in the left context. An unsupervised constituency parser can be derived from its attention weights, using a greedy decoding algorithm. We evaluate PaLM on language modeling, and empirically show that it outperforms strong baselines. If syntactic annotations are available, the attention component can be trained in a supervised manner, providing syntactically-informed representations of the context, and further improving language modeling performance.
Tasks	Language Modelling
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02134v1
PDF	https://arxiv.org/pdf/1909.02134v1.pdf
PWC	https://paperswithcode.com/paper/palm-a-hybrid-parser-and-language-model
Repo	https://github.com/Noahs-ARK/PaLM
Framework	pytorch

Spotting Collective Behaviour of Online Frauds in Customer Reviews


Title	Spotting Collective Behaviour of Online Frauds in Customer Reviews
Authors	Sarthika Dhawan, Siva Charan Reddy Gangireddy, Shiv Kumar, Tanmoy Chakraborty
Abstract	Online reviews play a crucial role in judging the quality of a product before purchasing it. Unfortunately, spammers often take advantage of online review forums by writing fraud reviews to promote/demote certain products. It may turn out to be more detrimental when such spammers collude and collectively inject spam reviews as they can take complete control of users’ sentiment due to the volume of fraud reviews they inject. Group spam detection is thus more challenging than individual-level fraud detection due to the unclear definition of a group, variation of inter-group dynamics, scarcity of labeled group-level spam data, etc. Here, we propose DeFrauder, an unsupervised method to detect online fraud reviewer groups. It first detects candidate fraud groups by leveraging the underlying product review graph and incorporating several behavioral signals which model multi-faceted collaboration among reviewers. It then maps reviewers into an embedding space and assigns a spam score to each group such that groups comprising spammers with highly similar behavioral traits receive a high spam score. Compared with five baselines on four real-world datasets (two of them were curated by us), DeFrauder shows superior performance, outperforming the best baseline with 17.11% higher NDCG@50 (on average) across datasets.
Tasks	Fraud Detection
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13649v6
PDF	https://arxiv.org/pdf/1905.13649v6.pdf
PWC	https://paperswithcode.com/paper/spotting-collusive-behaviour-of-online-fraud
Repo	https://github.com/LCS2-IIITD/DeFrauder
Framework	none
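
The NDCG@50 figure quoted in the abstract is a ranking metric. As a rough illustration, the sketch below computes NDCG@k for a ranked list of relevance labels using linear gain and a log2 position discount, which is one common variant; the abstract does not state which variant the authors use, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# 1 marks a group labeled as spam, listed in order of predicted spam score.
ranking = [1, 0, 1, 1, 0]
score = ndcg_at_k(ranking, 50)
```

A ranking that places all spam groups first scores 1.0, and the score drops as spam groups slide down the list.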

Imputing missing values with unsupervised random trees


Title	Imputing missing values with unsupervised random trees
Authors	David Cortes
Abstract	This work proposes a non-iterative strategy for missing value imputations which is guided by similarity between observations, but instead of explicitly determining distances or nearest neighbors, it assigns observations to overlapping buckets through recursive semi-random hyperplane cuts, in which weighted averages are determined as imputations for each variable. The quality of these imputations is oftentimes not as good as that of chained equations, but the proposed technique is much faster, non-iterative, can make imputations on new data without re-calculating anything, and scales easily to large and high-dimensional datasets, providing a significant boost over simple mean/median imputation in regression and classification metrics with imputed values when other methods are not feasible.
Tasks	Imputation
Published	2019-11-15
URL	https://arxiv.org/abs/1911.06646v2
PDF	https://arxiv.org/pdf/1911.06646v2.pdf
PWC	https://paperswithcode.com/paper/imputing-missing-values-with-unsupervised
Repo	https://github.com/david-cortes/isotree
Framework	none

Open Set Recognition Through Deep Neural Network Uncertainty: Does Out-of-Distribution Detection Require Generative Classifiers?


Title	Open Set Recognition Through Deep Neural Network Uncertainty: Does Out-of-Distribution Detection Require Generative Classifiers?
Authors	Martin Mundt, Iuliia Pliushch, Sagnik Majumder, Visvanathan Ramesh
Abstract	We present an analysis of predictive uncertainty based out-of-distribution detection for different approaches to estimate various models’ epistemic uncertainty and contrast it with extreme value theory based open set recognition. While the former alone does not seem to be enough to overcome this challenge, we demonstrate that uncertainty goes hand in hand with the latter method. This seems to be particularly reflected in a generative model approach, where we show that posterior based open set recognition outperforms discriminative models and predictive uncertainty based outlier rejection, raising the question of whether classifiers need to be generative in order to know what they have not seen.
Tasks	Open Set Learning, Out-of-Distribution Detection
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09625v1
PDF	https://arxiv.org/pdf/1908.09625v1.pdf
PWC	https://paperswithcode.com/paper/open-set-recognition-through-deep-neural
Repo	https://github.com/MrtnMndt/Deep_Openset_Recognition_through_Uncertainty
Framework	pytorch

Deep CNN-based Multi-task Learning for Open-Set Recognition


Title	Deep CNN-based Multi-task Learning for Open-Set Recognition
Authors	Poojan Oza, Vishal M. Patel
Abstract	We propose a novel deep convolutional neural network (CNN) based multi-task learning approach for open-set visual recognition. We combine a classifier network and a decoder network with a shared feature extractor network within a multi-task learning framework. We show that this approach results in better open-set recognition accuracy. In our approach, reconstruction errors from the decoder network are utilized for open-set rejection. In addition, we model the tail of the reconstruction error distribution from the known classes using the statistical Extreme Value Theory to improve the overall performance. Experiments on multiple image classification datasets are performed and it is shown that this method can perform significantly better than many competitive open set recognition algorithms available in the literature. The code will be made available at: github.com/otkupjnoz/mlosr.
Tasks	Image Classification, Multi-Task Learning, Open Set Learning
Published	2019-03-07
URL	http://arxiv.org/abs/1903.03161v1
PDF	http://arxiv.org/pdf/1903.03161v1.pdf
PWC	https://paperswithcode.com/paper/deep-cnn-based-multi-task-learning-for-open
Repo	https://github.com/otkupjnoz/mlosr
Framework	pytorch
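
Rejection based on reconstruction error can be sketched in a few lines. The abstract models the tail of the known-class error distribution with Extreme Value Theory; the sketch below swaps in a much simpler empirical quantile as the threshold, purely for illustration. The names `fit_threshold`, `is_unknown`, and `tail_fraction` are hypothetical.

```python
def fit_threshold(known_errors, tail_fraction=0.05):
    """Pick a cut-off so that roughly tail_fraction of known errors exceed it.

    This empirical quantile is a stand-in for the EVT tail model in the paper.
    """
    ordered = sorted(known_errors)
    index = min(len(ordered) - 1, int((1.0 - tail_fraction) * len(ordered)))
    return ordered[index]


def is_unknown(reconstruction_error, threshold):
    """Flag a sample as open-set when the decoder reconstructs it poorly."""
    return reconstruction_error > threshold


# Reconstruction errors measured on held-out samples from known classes.
threshold = fit_threshold([1.0, 2.0, 3.0, 4.0], tail_fraction=0.5)
```

Samples from unseen classes are expected to reconstruct worse than known-class samples, so a large error is treated as evidence of an unknown class.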

Clustered Object Detection in Aerial Images


Title	Clustered Object Detection in Aerial Images
Authors	Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling
Abstract	Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in pixels, making them hard to distinguish from the surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making the detection very inefficient. In this paper, we address both issues inspired by observing that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region is fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips for final object detection and hence achieves high running time efficiency, (2) the cluster-based scale estimation is more accurate than previously used single-object based ones, hence effectively improves the detection for small objects, and (3) the final DetecNet is dedicated for clustered regions and implicitly models the prior context information so as to boost detection accuracy. The proposed method is tested on three popular aerial image datasets including VisDrone, UAVDT and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors. Code will be available at https://github.com/fyangneil.
Tasks	Object Detection, Object Detection In Aerial Images
Published	2019-04-16
URL	https://arxiv.org/abs/1904.08008v3
PDF	https://arxiv.org/pdf/1904.08008v3.pdf
PWC	https://paperswithcode.com/paper/clustered-object-detection-in-aerial-images
Repo	https://github.com/fyangneil/Clustered-Object-Detection-in-Aerial-Image
Framework	none

Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network


Title	Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Authors	Priyanka Mandikal, R. Venkatesh Babu
Abstract	Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
Tasks
Published	2019-01-25
URL	http://arxiv.org/abs/1901.08906v1
PDF	http://arxiv.org/pdf/1901.08906v1.pdf
PWC	https://paperswithcode.com/paper/dense-3d-point-cloud-reconstruction-using-a
Repo	https://github.com/val-iisc/densepcr
Framework	tf

Discrete Object Generation with Reversible Inductive Construction


Title	Discrete Object Generation with Reversible Inductive Construction
Authors	Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams
Abstract	The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. Building off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity. This approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories. We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics.
Tasks	Denoising
Published	2019-07-18
URL	https://arxiv.org/abs/1907.08268v2
PDF	https://arxiv.org/pdf/1907.08268v2.pdf
PWC	https://paperswithcode.com/paper/discrete-object-generation-with-reversible
Repo	https://github.com/PrincetonLIPS/reversible-inductive-construction
Framework	none

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition


Title	SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition
Authors	Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach
Abstract	We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ – Spatialized Multi-Speaker Wall Street Journal. It consists of artificially mixed speech taken from the WSJ database, but unlike earlier databases we consider all WSJ0+1 utterances and take care of strictly separating the speaker sets present in the training, validation and test sets. When spatializing the data we ensure a high degree of randomness w.r.t. room size, array center and rotation, as well as speaker position. Furthermore, this paper offers a critical assessment of recently proposed measures of source separation performance. Alongside the code to generate the database we provide a source separation baseline and a Kaldi recipe with competitive word error rates to provide common ground for evaluation.
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13934v1
PDF	https://arxiv.org/pdf/1910.13934v1.pdf
PWC	https://paperswithcode.com/paper/sms-wsj-database-performance-measures-and
Repo	https://github.com/fgnt/sms_wsj
Framework	none

MIDAS: A Dialog Act Annotation Scheme for Open Domain Human Machine Spoken Conversations


Title	MIDAS: A Dialog Act Annotation Scheme for Open Domain Human Machine Spoken Conversations
Authors	Dian Yu, Zhou Yu
Abstract	Dialog act prediction is an essential language comprehension task for both dialog system building and discourse analysis. Previous dialog act schemes, such as SWBD-DAMSL, are designed for human-human conversations, in which conversation partners have perfect language understanding ability. In this paper, we design a dialog act annotation scheme, MIDAS (Machine Interaction Dialog Act Scheme), targeted at open-domain human-machine conversations. MIDAS is designed to assist machines which have limited ability to understand their human partners. MIDAS has a hierarchical structure and supports multi-label annotations. We collected and annotated a large open-domain human-machine spoken conversation dataset (consisting of 24K utterances). To show the applicability of the scheme, we leverage transfer learning methods to train a multi-label dialog act prediction model and reach an F1 score of 0.79.
Tasks	Transfer Learning
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10023v1
PDF	https://arxiv.org/pdf/1908.10023v1.pdf
PWC	https://paperswithcode.com/paper/midas-a-dialog-act-annotation-scheme-for-open
Repo	https://github.com/DianDYu/MIDAS_dialog_act
Framework	pytorch