Paper Group AWR 42
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement. 3D Dilated Multi-Fiber Network for Real-time Brain Tumor Segmentation in MRI. A topological data analysis based classification method for multiple measurements. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts. Collaborative Similarity Embedding for Recommender Systems. The Shape of Data: Intrinsic Distance for Data Distributions. FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network. Short-distance commuters in the smart city. Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters. LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking. Adaptively Preconditioned Stochastic Gradient Langevin Dynamics. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. Learning to Optimize in Swarms. Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension.
Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement
Title | Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement |
Authors | Ting-En Lin, Hua Xu, Hanlei Zhang |
Abstract | Identifying new user intents is an essential task in the dialogue system. However, it is hard to get satisfying clustering results since the definition of intents is strongly guided by prior knowledge. Existing methods incorporate prior knowledge by intensive feature engineering, which not only leads to overfitting but also makes it sensitive to the number of clusters. In this paper, we propose constrained deep adaptive clustering with cluster refinement (CDAC+), an end-to-end clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process. Moreover, we refine the clusters by forcing the model to learn from the high confidence assignments. After eliminating low confidence assignments, our approach is surprisingly insensitive to the number of clusters. Experimental results on the three benchmark datasets show that our method can yield significant improvements over strong baselines. |
Tasks | Feature Engineering |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08891v1, https://arxiv.org/pdf/1911.08891v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-new-intents-via-constrained-deep |
Repo | https://github.com/thuiar/CDAC-plus |
Framework | pytorch |
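As a rough illustration of how pairwise constraints can steer clustering, the PyTorch sketch below penalizes must-link pairs with low embedding similarity and cannot-link pairs with high similarity. The names, the cosine-based similarity, and the toy data are illustrative assumptions, not the full CDAC+ objective or its cluster-refinement stage.

```python
# Minimal sketch of a pairwise-constraint loss for intent clustering (illustrative, not CDAC+ itself).
import torch
import torch.nn.functional as F

def pairwise_constraint_loss(embeddings, pair_idx, pair_labels):
    """embeddings: (N, D) sentence features from any encoder.
    pair_idx: (P, 2) indices of constrained pairs.
    pair_labels: (P,) 1.0 for must-link (same intent), 0.0 for cannot-link."""
    z = F.normalize(embeddings, dim=1)
    sim = (z[pair_idx[:, 0]] * z[pair_idx[:, 1]]).sum(dim=1)   # cosine similarity in [-1, 1]
    prob_same = ((sim + 1.0) / 2.0).clamp(1e-6, 1 - 1e-6)      # squash to (0, 1)
    return F.binary_cross_entropy(prob_same, pair_labels)

emb = torch.randn(8, 16, requires_grad=True)                   # toy embeddings
pairs = torch.tensor([[0, 1], [2, 3], [0, 4]])
labels = torch.tensor([1.0, 1.0, 0.0])
pairwise_constraint_loss(emb, pairs, labels).backward()
```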
3D Dilated Multi-Fiber Network for Real-time Brain Tumor Segmentation in MRI
Title | 3D Dilated Multi-Fiber Network for Real-time Brain Tumor Segmentation in MRI |
Authors | Chen Chen, Xiaopeng Liu, Meng Ding, Junfeng Zheng, Jiangyun Li |
Abstract | Brain tumor segmentation plays a pivotal role in medical image processing. In this work, we aim to segment brain MRI volumes. 3D convolution neural networks (CNN) such as 3D U-Net and V-Net employing 3D convolutions to capture the correlation between adjacent slices have achieved impressive segmentation results. However, these 3D CNN architectures come with high computational overheads due to multiple layers of 3D convolutions, which may make these models prohibitive for practical large-scale applications. To this end, we propose a highly efficient 3D CNN to achieve real-time dense volumetric segmentation. The network leverages the 3D multi-fiber unit which consists of an ensemble of lightweight 3D convolutional networks to significantly reduce the computational cost. Moreover, 3D dilated convolutions are used to build multi-scale feature representations. Extensive experimental results on the BraTS-2018 challenge dataset show that the proposed architecture greatly reduces computation cost while maintaining high accuracy for brain tumor segmentation. The source code can be found at https://github.com/China-LiuXiaopeng/BraTS-DMFNet |
Tasks | Brain Tumor Segmentation |
Published | 2019-04-06 |
URL | https://arxiv.org/abs/1904.03355v5, https://arxiv.org/pdf/1904.03355v5.pdf |
PWC | https://paperswithcode.com/paper/3d-dilated-multi-fiber-network-for-real-time |
Repo | https://github.com/China-LiuXiaopeng/BraTS-DMFNet |
Framework | pytorch |
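The multi-fiber idea can be approximated with grouped 3D convolutions: each group acts as a lightweight "fiber", and a 1x1x1 convolution lets fibers exchange information before the dilated 3x3x3 convolution. The unit below is a hedged PyTorch sketch with illustrative channel, group, and dilation choices, not the published DMFNet architecture.

```python
# Sketch of a 3D dilated multi-fiber unit built from grouped convolutions (illustrative settings).
import torch
import torch.nn as nn

class DilatedMultiFiberUnit3D(nn.Module):
    def __init__(self, channels=64, fibers=8, dilation=2):
        super().__init__()
        self.multiplex = nn.Conv3d(channels, channels, kernel_size=1)     # lets fibers exchange information
        self.fiber_conv = nn.Conv3d(channels, channels, kernel_size=3,
                                    padding=dilation, dilation=dilation,
                                    groups=fibers)                        # each group is a lightweight "fiber"
        self.bn = nn.BatchNorm3d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.act(self.bn(self.fiber_conv(self.multiplex(x))))  # residual connection

x = torch.randn(1, 64, 16, 32, 32)        # (batch, channels, depth, height, width)
print(DilatedMultiFiberUnit3D()(x).shape)
```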
A topological data analysis based classification method for multiple measurements
Title | A topological data analysis based classification method for multiple measurements |
Authors | Henri Riihimäki, Wojciech Chachólski, Jakob Theorell, Jan Hillert, Ryan Ramanujam |
Abstract | Machine learning models for repeated measurements are limited. Using topological data analysis (TDA), we present a classifier for repeated measurements which samples from the data space and builds a network graph based on the data topology. When applying this to two case studies, accuracy exceeds alternative models with additional benefits such as reporting data subsets with high purity along with feature values. For 300 examples of 3 tree species, the accuracy reached 80% after 30 datapoints, which was improved to 90% after increased sampling to 400 datapoints. Using data from 100 examples of each of 6 point processes, the classifier achieved 96.8% accuracy. In both datasets, the TDA classifier outperformed an alternative model. This algorithm and software can be beneficial for repeated measurement data common in biological sciences, as both an accurate classifier and a feature selection tool. |
Tasks | Feature Selection, Point Processes, Topological Data Analysis |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02971v1, http://arxiv.org/pdf/1904.02971v1.pdf |
PWC | https://paperswithcode.com/paper/a-topological-data-analysis-based |
Repo | https://github.com/ryaram1/mmTDA |
Framework | none |
HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
Title | HopSkipJumpAttack: A Query-Efficient Decision-Based Attack |
Authors | Jianbo Chen, Michael I. Jordan, Martin J. Wainwright |
Abstract | The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for $\ell_2$ and $\ell_\infty$ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than Boundary Attack. It also achieves competitive performance in attacking several widely-used defense mechanisms. (HopSkipJumpAttack was named Boundary Attack++ in a previous version of the preprint.) |
Tasks | Adversarial Attack |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02144v4, https://arxiv.org/pdf/1904.02144v4.pdf |
PWC | https://paperswithcode.com/paper/boundary-attack-query-efficient-decision |
Repo | https://github.com/Jianbo-Lab/HSJA |
Framework | tf |
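The core gradient-direction estimate can be sketched as a Monte Carlo average of random unit perturbations weighted by the binary outcome (adversarial or not) of each query; the `is_adversarial` oracle, batch size, and step size below are illustrative placeholders, and the full attack wraps step-size search and boundary projection around this estimate.

```python
# Sketch of a decision-based gradient-direction estimate at the boundary (toy oracle, illustrative sizes).
import numpy as np

def estimate_gradient_direction(x, is_adversarial, n_samples=100, delta=0.1):
    """x: flat point near the decision boundary; is_adversarial: callable returning True/False per query."""
    u = np.random.randn(n_samples, x.size)
    u /= np.linalg.norm(u, axis=1, keepdims=True)              # random unit directions
    phi = np.array([1.0 if is_adversarial(x + delta * ui) else -1.0 for ui in u])
    phi -= phi.mean()                                           # baseline subtraction reduces variance
    grad = (phi[:, None] * u).mean(axis=0)
    return grad / (np.linalg.norm(grad) + 1e-12)

# Toy oracle: the "adversarial" region is the half-space w.x > 0
w = np.random.randn(32)
w /= np.linalg.norm(w)
direction = estimate_gradient_direction(np.zeros(32), lambda z: float(w @ z) > 0)
print(direction @ w)                                            # close to 1: estimate aligns with the true normal
```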
Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts
Title | Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts |
Authors | Sandro Pezzelle, Raquel Fernández |
Abstract | This work aims at modeling how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually-grounded contexts. Inspired by cognitive and linguistic evidence showing that the use of these expressions relies on setting a threshold that is dependent on a specific context, we investigate the ability of multi-modal models in assessing whether an object is 'big' or 'small' in a given visual scene. In contrast with the standard computational approach that simplistically treats gradable adjectives as 'fixed' attributes, we pose the problem as relational: to be successful, a model has to consider the full visual context. By means of four main tasks, we show that state-of-the-art models (but not a relatively strong baseline) can learn the function subtending the meaning of size adjectives, though their performance is found to decrease while moving from simple to more complex tasks. Crucially, models fail in developing abstract representations of gradable adjectives that can be used compositionally. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10285v1, https://arxiv.org/pdf/1908.10285v1.pdf |
PWC | https://paperswithcode.com/paper/is-the-red-square-big-malevic-modeling |
Repo | https://github.com/sandropezzelle/malevic |
Framework | none |
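To make the context-dependent threshold idea concrete, here is one plausible rule (a hypothetical illustration, not the threshold function used in the paper): an object counts as 'big' if its size exceeds a cut-off interpolated between the mean and maximum sizes in the scene.

```python
# Illustrative context-dependent threshold for "big" (assumed form, not the paper's exact function).
def is_big(target_area, scene_areas, k=0.5):
    mx, mean = max(scene_areas), sum(scene_areas) / len(scene_areas)
    threshold = mean + k * (mx - mean)     # cut-off depends on the other objects in the scene
    return target_area > threshold

scene = [4.0, 6.0, 9.0, 20.0]
print([is_big(a, scene) for a in scene])   # only areas above the contextual threshold count as "big"
```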
Collaborative Similarity Embedding for Recommender Systems
Title | Collaborative Similarity Embedding for Recommender Systems |
Authors | Chih-Ming Chen, Chuan-Ju Wang, Ming-Feng Tsai, Yi-Hsuan Yang |
Abstract | We present collaborative similarity embedding (CSE), a unified framework that exploits comprehensive collaborative relations available in a user-item bipartite graph for representation learning and recommendation. In the proposed framework, we differentiate two types of proximity relations: direct proximity and k-th order neighborhood proximity. While learning from the former exploits direct user-item associations observable from the graph, learning from the latter makes use of implicit associations such as user-user similarities and item-item similarities, which can provide valuable information especially when the graph is sparse. Moreover, for improving scalability and flexibility, we propose a sampling technique that is specifically designed to capture the two types of proximity relations. Extensive experiments on eight benchmark datasets show that CSE yields significantly better performance than state-of-the-art recommendation methods. |
Tasks | Recommendation Systems, Representation Learning |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06188v2, http://arxiv.org/pdf/1902.06188v2.pdf |
PWC | https://paperswithcode.com/paper/collaborative-similarity-embedding-for |
Repo | https://github.com/cnclabs/smore |
Framework | none |
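A loose sketch of the two proximity signals: direct proximity comes from observed user-item edges, while higher-order proximity pairs are obtained through shared neighbors on the bipartite graph. The logistic loss, single negative sample, and second-order pair selection below are simplifications, not CSE's actual sampling technique.

```python
# Sketch of learning embeddings from direct and 2nd-order proximity pairs (illustrative simplification).
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 50, 80, 16
U = rng.normal(scale=0.1, size=(n_users, dim))   # user embeddings
V = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings
edges = [(int(rng.integers(n_users)), int(rng.integers(n_items))) for _ in range(500)]

def update(a_vecs, a, b_vecs, b, label, lr=0.05):
    """One SGD step on a logistic loss over the dot-product score of a sampled pair."""
    score = 1.0 / (1.0 + np.exp(-a_vecs[a] @ b_vecs[b]))
    a_vecs[a] += lr * (label - score) * b_vecs[b]
    b_vecs[b] += lr * (label - score) * a_vecs[a]

for u, i in edges:
    update(U, u, V, i, 1.0)                               # direct proximity: observed interaction
    update(U, u, V, int(rng.integers(n_items)), 0.0)      # random negative item
    co_users = [u2 for u2, i2 in edges if i2 == i and u2 != u]
    if co_users:
        update(U, u, U, co_users[0], 1.0)                 # 2nd-order proximity: users sharing item i
```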
The Shape of Data: Intrinsic Distance for Data Distributions
Title | The Shape of Data: Intrinsic Distance for Data Distributions |
Authors | Anton Tsitsulin, Marina Munkhoeva, Davide Mottin, Panagiotis Karras, Alex Bronstein, Ivan Oseledets, Emmanuel Müller |
Abstract | The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic and multi-scale method for characterizing and comparing data manifolds, using a lower-bound of the spectral variant of the Gromov-Wasserstein inter-manifold distance, which compares all data moments. In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionalities; moreover, we showcase its efficacy in evaluating the quality of generative models. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11141v2, https://arxiv.org/pdf/1905.11141v2.pdf |
PWC | https://paperswithcode.com/paper/intrinsic-multi-scale-evaluation-of |
Repo | https://github.com/xgfs/msid |
Framework | none |
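In spirit, the method summarizes a data manifold by a multi-scale spectral descriptor of a neighborhood graph. The sketch below builds a kNN graph, computes the heat trace h(t) = Σ_i exp(-t λ_i) of its normalized Laplacian over a range of scales t, and compares two datasets by their descriptors; the exact eigendecomposition and the plain descriptor difference are illustrative shortcuts for the paper's scalable estimator and normalization.

```python
# Sketch of a multi-scale heat-trace descriptor of a data manifold (illustrative shortcut).
import numpy as np
from sklearn.neighbors import kneighbors_graph

def heat_trace_descriptor(X, k=5, ts=np.logspace(-1, 1, 20)):
    A = kneighbors_graph(X, k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)                                 # symmetrize the kNN graph
    d = A.sum(axis=1)
    L = np.eye(len(X)) - A / np.sqrt(np.outer(d, d))       # normalized graph Laplacian
    lam = np.linalg.eigvalsh(L)
    return np.array([np.exp(-t * lam).sum() for t in ts])  # heat trace at each scale t

X, Y = np.random.randn(200, 8), np.random.randn(200, 32)   # different dimensionalities are fine
print(np.abs(heat_trace_descriptor(X) - heat_trace_descriptor(Y)).max())
```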
FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network
Title | FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network |
Authors | Jonah Philion |
Abstract | The search for predictive models that generalize to the long tail of sensor inputs is the central difficulty when developing data-driven models for autonomous vehicles. In this paper, we use lane detection to study modeling and training techniques that yield better performance on real world test drives. On the modeling side, we introduce a novel fully convolutional model of lane detection that learns to decode lane structures instead of delegating structure inference to post-processing. In contrast to previous works, our convolutional decoder is able to represent an arbitrary number of lanes per image, preserves the polyline representation of lanes without reducing lanes to polynomials, and draws lanes iteratively without requiring the computational and temporal complexity of recurrent neural networks. Because our model includes an estimate of the joint distribution of neighboring pixels belonging to the same lane, our formulation includes a natural and computationally cheap definition of uncertainty. On the training side, we demonstrate a simple yet effective approach to adapt the model to new environments using unsupervised style transfer. By training FastDraw to make predictions of lane structure that are invariant to low-level stylistic differences between images, we achieve strong performance at test time in weather and lighting conditions that deviate substantially from those of the annotated datasets that are publicly available. We quantitatively evaluate our approach on the CVPR 2017 Tusimple lane marking challenge, difficult CULane datasets, and a small labeled dataset of our own and achieve competitive accuracy while running at 90 FPS. |
Tasks | Autonomous Vehicles, Lane Detection, Style Transfer |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04354v2, https://arxiv.org/pdf/1905.04354v2.pdf |
PWC | https://paperswithcode.com/paper/fastdraw-addressing-the-long-tail-of-lane |
Repo | https://github.com/jonahthelion/FastDraw |
Framework | none |
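The iterative drawing can be pictured as following, row by row, a per-pixel prediction of where the lane continues. The sketch below assumes a small categorical offset vocabulary and a random prediction tensor, which are illustrative stand-ins for the model's actual decoder outputs.

```python
# Sketch of iterative lane decoding from per-pixel "next step" predictions (illustrative vocabulary).
import numpy as np

def decode_lane(next_step_logits, start_row, start_col):
    """next_step_logits: (H, W, 4) per-pixel scores for [col-1, col, col+1, end-of-lane]."""
    H, W, _ = next_step_logits.shape
    r, c, lane = start_row, start_col, [(start_row, start_col)]
    while r > 0:
        choice = int(np.argmax(next_step_logits[r, c]))
        if choice == 3:                       # end-of-lane token
            break
        c = int(np.clip(c + choice - 1, 0, W - 1))
        r -= 1                                # move one row up the image
        lane.append((r, c))
    return lane

logits = np.random.randn(64, 96, 4)           # stand-in for decoder output
print(decode_lane(logits, start_row=63, start_col=48)[:5])
```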
Short-distance commuters in the smart city
Title | Short-distance commuters in the smart city |
Authors | Francisco Benita, Garvit Bansal, Georgios Piliouras, Bige Tunçer |
Abstract | This study models and examines commuters' preferences for short-distance transportation modes, namely walking, taking a bus, or riding a metro. It uses a unique dataset from a large-scale field experiment in Singapore that provides rich information about the behavior of tens of thousands of commuters. In contrast to the standard approach, this work does not rely on survey data. Instead, the chosen transportation modes are identified by processing raw data (latitude, longitude, timestamp). The approach exploits the information generated by the smart transportation system in the city, which makes it feasible to obtain granular and nearly real-time data. Novel algorithms are proposed to generate proxies for walkability and public transport attributes. The empirical results of the case study suggest that commuters do not differentiate between public transport choices (bus and metro); possible nested structures for the public transport modes are therefore rejected. |
Tasks | |
Published | 2019-02-16 |
URL | http://arxiv.org/abs/1902.08028v1, http://arxiv.org/pdf/1902.08028v1.pdf |
PWC | https://paperswithcode.com/paper/short-distance-commuters-in-the-smart-city |
Repo | https://github.com/Garvit244/Shapefile_to_Network |
Framework | none |
Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters
Title | Machine Learning vs Statistical Methods for Time Series Forecasting: Size Matters |
Authors | Vitor Cerqueira, Luis Torgo, Carlos Soares |
Abstract | Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, these were shown to systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The code to reproduce the experiments is available at https://github.com/vcerqueira/MLforForecasting. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13316v1, https://arxiv.org/pdf/1909.13316v1.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-vs-statistical-methods-for |
Repo | https://github.com/vcerqueira/MLforForecasting |
Framework | none |
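The learning-curve argument can be reproduced in miniature: train a machine learning model and a simple statistical baseline on growing prefixes of a series and track out-of-sample error as the sample size grows. The synthetic series, random forest, and naive last-value baseline below are stand-ins for the paper's experimental setup.

```python
# Sketch of a learning-curve comparison between an ML model and a naive statistical baseline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
y = np.sin(np.arange(2000) / 20.0) + 0.1 * rng.standard_normal(2000)   # synthetic series
lags = 10
X = np.stack([y[i:i + lags] for i in range(len(y) - lags)])             # lagged feature windows
t = y[lags:]

for n in (50, 200, 1000):                        # growing training sample size
    Xtr, ttr, Xte, tte = X[:n], t[:n], X[1500:], t[1500:]
    ml = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xtr, ttr)
    mae_ml = np.abs(ml.predict(Xte) - tte).mean()
    mae_naive = np.abs(Xte[:, -1] - tte).mean()  # statistical stand-in: last-value (naive) forecast
    print(f"n={n:5d}  ML MAE={mae_ml:.3f}  naive MAE={mae_naive:.3f}")
```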
LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking
Title | LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking |
Authors | Bernie Wang, Virginia Wu, Bichen Wu, Kurt Keutzer |
Abstract | LiDAR (Light Detection And Ranging) is an essential and widely adopted sensor for autonomous vehicles, particularly for those vehicles operating at higher levels (L4-L5) of autonomy. Recent work has demonstrated the promise of deep-learning approaches for LiDAR-based detection. However, deep-learning algorithms are extremely data hungry, requiring large amounts of labeled point-cloud data for training and evaluation. Annotating LiDAR point cloud data is challenging due to the following issues: 1) A LiDAR point cloud is usually sparse and has low resolution, making it difficult for human annotators to recognize objects. 2) Compared to annotation on 2D images, the operation of drawing 3D bounding boxes or even point-wise labels on LiDAR point clouds is more complex and time-consuming. 3) LiDAR data are usually collected in sequences, so consecutive frames are highly correlated, leading to repeated annotations. To tackle these challenges, we propose LATTE, an open-sourced annotation tool for LiDAR point clouds. LATTE features the following innovations: 1) Sensor fusion: We utilize image-based detection algorithms to automatically pre-label a calibrated image, and transfer the labels to the point cloud. 2) One-click annotation: Instead of drawing 3D bounding boxes or point-wise labels, we simplify the annotation to just one click on the target object, and automatically generate the bounding box for the target. 3) Tracking: we integrate tracking into sequence annotation such that we can transfer labels from one frame to subsequent ones and therefore significantly reduce repeated labeling. Experiments show the proposed features accelerate the annotation speed by 6.2x and significantly improve label quality with 23.6% and 2.2% higher instance-level precision and recall, and 2.0% higher bounding box IoU. LATTE is open-sourced at https://github.com/bernwang/latte. |
Tasks | Autonomous Vehicles, Sensor Fusion |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09085v1, http://arxiv.org/pdf/1904.09085v1.pdf |
PWC | https://paperswithcode.com/paper/latte-accelerating-lidar-point-cloud |
Repo | https://github.com/bernwang/latte |
Framework | none |
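The sensor-fusion pre-labeling step reduces to projecting LiDAR points through the camera calibration and keeping those that land inside a 2D detection box. The sketch below assumes the points are already in the camera frame; the intrinsics and the detection box are toy placeholders.

```python
# Sketch of transferring a 2D detection box to LiDAR points via camera projection (toy calibration).
import numpy as np

def points_in_box(points_xyz, P, box):
    """points_xyz: (N, 3) points in the camera frame. P: (3, 4) projection matrix.
    box: (x1, y1, x2, y2) 2D detection in pixels."""
    homog = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])   # (N, 4) homogeneous coordinates
    uvw = homog @ P.T                                                # project onto the image plane
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    x1, y1, x2, y2 = box
    keep = (uvw[:, 2] > 0) & (uv[:, 0] >= x1) & (uv[:, 0] <= x2) \
                           & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points_xyz[keep]

K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])  # toy intrinsics
P = np.hstack([K, np.zeros((3, 1))])
pts = np.random.uniform(-10, 10, size=(1000, 3)) + np.array([0.0, 0.0, 15.0])
print(points_in_box(pts, P, box=(600, 300, 700, 420)).shape)
```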
Adaptively Preconditioned Stochastic Gradient Langevin Dynamics
Title | Adaptively Preconditioned Stochastic Gradient Langevin Dynamics |
Authors | Chandrasekaran Anirudh Bhardwaj |
Abstract | Stochastic Gradient Langevin Dynamics infuses isotropic gradient noise into SGD to help navigate pathological curvature in the loss landscape for deep networks. The isotropic nature of the noise leads to poor scaling, and adaptive methods based on higher-order curvature information, such as Fisher scoring, have been proposed to precondition the noise in order to achieve better convergence. In this paper, we describe an adaptive method to estimate the parameters of the noise and conduct experiments on well-known model architectures to show that the adaptively preconditioned SGLD method converges as fast as adaptive first-order methods such as Adam and AdaGrad, while achieving generalization equivalent to that of SGD on the test set. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04324v2, https://arxiv.org/pdf/1906.04324v2.pdf |
PWC | https://paperswithcode.com/paper/adaptively-preconditioned-stochastic-gradient |
Repo | https://github.com/Anirudhsekar96/Noisy_SGD |
Framework | pytorch |
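A preconditioned SGLD step can be sketched with an RMSProp-style second-moment estimate that scales both the gradient step and the injected Gaussian noise. This illustrates the general preconditioning idea rather than the paper's specific adaptive noise-parameter estimator.

```python
# Sketch of a diagonally preconditioned SGLD update (illustrative, RMSProp-style preconditioner).
import torch

def psgld_step(params, grads, state, lr=1e-3, alpha=0.99, eps=1e-8):
    for p, g in zip(params, grads):
        v = state.setdefault(id(p), torch.zeros_like(p))
        v.mul_(alpha).addcmul_(g, g, value=1 - alpha)          # running second moment of the gradient
        precond = 1.0 / (v.sqrt() + eps)
        noise = torch.randn_like(p) * torch.sqrt(2 * lr * precond)   # preconditioned Langevin noise
        p.add_(-lr * precond * g + noise)

# Toy quadratic example: minimize ||w||^2 with noisy preconditioned updates
w = torch.tensor([3.0, -2.0])
state = {}
for _ in range(200):
    grad = 2 * w
    psgld_step([w], [grad], state, lr=1e-2)
print(w)
```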
DropEdge: Towards Deep Graph Convolutional Networks on Node Classification
Title | DropEdge: Towards Deep Graph Convolutional Networks on Node Classification |
Authors | Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huang |
Abstract | Over-fitting and over-smoothing are two main obstacles to developing deep Graph Convolutional Networks (GCNs) for node classification. In particular, over-fitting weakens the generalization ability on small datasets, while over-smoothing impedes model training by isolating output representations from the input features as network depth increases. This paper proposes DropEdge, a novel and flexible technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graph at each training epoch, acting like a data augmenter and also a message passing reducer. Furthermore, we theoretically demonstrate that DropEdge either reduces the convergence speed of over-smoothing or relieves the information loss caused by it. More importantly, DropEdge is a general technique that can be equipped with many other backbone models (e.g. GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance. Extensive experiments on several benchmarks verify that DropEdge consistently improves performance on a variety of both shallow and deep GCNs. The effect of DropEdge on preventing over-smoothing is empirically visualized and validated as well. Code is released at https://github.com/DropEdge/DropEdge. |
Tasks | Node Classification |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10903v4, https://arxiv.org/pdf/1907.10903v4.pdf |
PWC | https://paperswithcode.com/paper/the-truly-deep-graph-convolutional-networks |
Repo | https://github.com/DropEdge/DropEdge |
Framework | pytorch |
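The core operation is simple to state in code: at each training epoch, keep only a random subset of edges. A minimal sketch on a COO edge list (the keep rate is a hyperparameter):

```python
# Minimal sketch of DropEdge: resample a random subgraph of the input edges every epoch.
import torch

def drop_edge(edge_index, keep_prob=0.8, training=True):
    """edge_index: (2, E) COO edge list as used by common GCN implementations."""
    if not training or keep_prob >= 1.0:
        return edge_index
    mask = torch.rand(edge_index.size(1)) < keep_prob
    return edge_index[:, mask]

edges = torch.tensor([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]])
print(drop_edge(edges, keep_prob=0.6))     # a different random subgraph on every call
```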
Learning to Optimize in Swarms
Title | Learning to Optimize in Swarms |
Authors | Yue Cao, Tianlong Chen, Zhangyang Wang, Yang Shen |
Abstract | Learning to optimize has emerged as a powerful framework for various optimization and machine learning tasks. Current such “meta-optimizers” often learn in the space of continuous optimization algorithms that are point-based and uncertainty-unaware. To overcome these limitations, we propose a meta-optimizer that learns in the algorithmic space of both point-based and population-based optimization algorithms. The meta-optimizer targets a meta-loss function consisting of both cumulative regret and entropy. Specifically, we learn and interpret the update formula through a population of LSTMs embedded with sample- and feature-level attentions. Meanwhile, we estimate the posterior directly over the global optimum and use an uncertainty measure to help guide the learning process. Empirical results over non-convex test functions and the protein-docking application demonstrate that this new meta-optimizer outperforms existing competitors. |
Tasks | |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03787v2, https://arxiv.org/pdf/1911.03787v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-optimize-in-swarms |
Repo | https://github.com/Shen-Lab/LOIS |
Framework | tf |
Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension
Title | Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension |
Authors | Todor Mihaylov, Anette Frank |
Abstract | In this work, we propose to use linguistic annotations as a basis for a Discourse-Aware Semantic Self-Attention encoder that we employ for reading comprehension on long narrative texts. We extract relations between discourse units, events and their arguments as well as coreferring mentions, using available annotation tools. Our empirical evaluation shows that the investigated structures improve the overall performance, especially intra-sentential and cross-sentential discourse relations, sentence-internal semantic role relations, and long-distance coreference relations. We show that dedicating self-attention heads to intra-sentential relations and relations connecting neighboring sentences is beneficial for finding answers to questions in longer contexts. Our findings encourage the use of discourse-semantic annotations to enhance the generalization capacity of self-attention models for reading comprehension. |
Tasks | Reading Comprehension |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10721v1, https://arxiv.org/pdf/1908.10721v1.pdf |
PWC | https://paperswithcode.com/paper/discourse-aware-semantic-self-attention-for |
Repo | https://github.com/Heidelberg-NLP/discourse-aware-semantic-self-attention |
Framework | none |
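Dedicating a self-attention head to one relation type amounts to masking that head's attention logits with an adjacency built from the linguistic annotations. The sketch below assumes the relation mask is constructed upstream from the annotation tools.

```python
# Sketch of a relation-dedicated attention head: logits are masked by an annotated relation adjacency.
import torch
import torch.nn.functional as F

def relation_masked_attention(q, k, v, relation_mask):
    """q, k, v: (T, d) for one head; relation_mask: (T, T) bool, True where attention is allowed."""
    scores = (q @ k.T) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~relation_mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

T, d = 6, 8
q, k, v = (torch.randn(T, d) for _ in range(3))
mask = torch.eye(T, dtype=torch.bool)    # every token may attend to itself ...
mask[0, 3] = mask[3, 0] = True           # ... plus tokens linked by an annotated relation
print(relation_masked_attention(q, k, v, mask).shape)
```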