October 21, 2019

3116 words 15 mins read

Paper Group AWR 146

A Large-Scale Corpus for Conversation Disentanglement. Improved Techniques for Learning to Dehaze and Beyond: A Collective Study. Open Logo Detection Challenge. Hoeffding Trees with nmin adaptation. Textual Explanations for Self-Driving Vehicles. Tukey-Inspired Video Object Segmentation. Model-order selection in statistical shape models. Fast Context Adaptation via Meta-Learning. Anytime Stereo Image Depth Estimation on Mobile Devices. Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification. Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction. Nonparametric Density Flows for MRI Intensity Normalisation. DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning. CREPE: A Convolutional Representation for Pitch Estimation. Bayesian Nonparametric Spectral Estimation.

A Large-Scale Corpus for Conversation Disentanglement

Title A Large-Scale Corpus for Conversation Disentanglement
Authors Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros Polymenakos, Walter S. Lasecki
Abstract Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our dataset is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context. We use our data to re-examine prior work, in particular, finding that 80% of conversations in a widely used dialogue corpus are either missing messages or contain extra messages. Our manually-annotated data presents an opportunity to develop robust data-driven methods for conversation disentanglement, which will help advance dialogue research.
Tasks
Published 2018-10-25
URL https://arxiv.org/abs/1810.11118v2
PDF https://arxiv.org/pdf/1810.11118v2.pdf
PWC https://paperswithcode.com/paper/analyzing-assumptions-in-conversation
Repo https://github.com/IBM/dstc7-noesis
Framework tf
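
As a quick illustration of what the reply-structure annotations afford: once each message is linked to the message it replies to, disentangled conversations fall out as the connected components of the reply graph. A minimal sketch with an illustrative edge list (not the dataset's actual format):

```python
# Sketch: recovering disentangled conversations from reply-structure
# annotations, treating each (message_id, parent_id) reply link as a
# graph edge and grouping messages into connected components.

def find(parent, x):
    # Path-compressing find for union-find.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def conversations(edges):
    """Group message ids into conversations (connected components)."""
    parent = {}
    for child, par in edges:
        for m in (child, par):
            parent.setdefault(m, m)
        parent[find(parent, child)] = find(parent, par)
    groups = {}
    for m in parent:
        groups.setdefault(find(parent, m), []).append(m)
    return list(groups.values())

# Illustrative reply edges: message 3 replies to 1, 4 to 2, etc.
edges = [(3, 1), (4, 2), (5, 3), (6, 2)]
print(conversations(edges))  # [[3, 1, 5], [4, 2, 6]]
```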

Improved Techniques for Learning to Dehaze and Beyond: A Collective Study

Title Improved Techniques for Learning to Dehaze and Beyond: A Collective Study
Authors Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, Dacheng Tao
Abstract Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images. For the first task, we investigate a variety of loss functions and show that a perception-driven loss significantly improves dehazing performance. For the second task, we provide multiple solutions, including using advanced modules in the dehazing-detection cascade and domain-adaptive object detectors. In both tasks, our proposed solutions significantly improve performance. The GitHub repository is available at: https://github.com/guanlongzhao/dehaze
Tasks Image Dehazing, Image Restoration, Object Detection, Single Image Dehazing
Published 2018-06-30
URL http://arxiv.org/abs/1807.00202v2
PDF http://arxiv.org/pdf/1807.00202v2.pdf
PWC https://paperswithcode.com/paper/improved-techniques-for-learning-to-dehaze
Repo https://github.com/guanlongzhao/dehaze
Framework none
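
The perception-driven loss the study credits for the dehazing gains can be approximated with a fixed VGG feature distance. A minimal sketch assuming PyTorch and torchvision; the layer choice, loss weight, and omitted ImageNet input normalization are illustrative rather than the paper's exact configuration:

```python
# Minimal perceptual-loss sketch (assumes PyTorch + torchvision).
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualLoss(nn.Module):
    def __init__(self, layer=8):  # index 8 is relu2_2 in VGG16.features
        super().__init__()
        features = vgg16(pretrained=True).features[:layer + 1]
        for p in features.parameters():
            p.requires_grad = False  # fixed, pre-trained feature extractor
        self.features = features.eval()
        self.mse = nn.MSELoss()

    def forward(self, dehazed, clean):
        # Distance in feature space rather than pixel space.
        return self.mse(self.features(dehazed), self.features(clean))

# Combined objective (the 0.1 weight is a guess, not the paper's value):
# loss = nn.MSELoss()(pred, gt) + 0.1 * perceptual(pred, gt)
```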

Open Logo Detection Challenge

Title Open Logo Detection Challenge
Authors Hang Su, Xiatian Zhu, Shaogang Gong
Abstract Existing logo detection benchmarks consider artificial deployment scenarios by assuming that large training data with fine-grained bounding box annotations for each class are available for model training. Such assumptions are often invalid in realistic logo detection scenarios, where new logo classes arrive progressively and need to be detected with little or no budget for exhaustively labelling fine-grained training data for every new class. Existing benchmarks are thus unable to evaluate the true performance of a logo detection method in realistic and open deployments. In this work, we introduce a more realistic and challenging logo detection setting, called Open Logo Detection. Specifically, this new setting assumes fine-grained labelling only on a small proportion of logo classes, whilst the remaining classes have no labelled training data, to simulate the open deployment. We further create an open logo detection benchmark, called OpenLogo, to promote the investigation of this new challenge. OpenLogo contains 27,083 images from 352 logo classes, built by aggregating/refining 7 existing datasets and establishing an open logo detection evaluation protocol. To address this challenge, we propose a Context Adversarial Learning (CAL) approach to synthesising training data with coherent logo instance appearance against diverse background context, enabling more effective optimisation of contemporary deep learning detection models. Experiments show the performance advantage of CAL over existing state-of-the-art alternative methods on the more realistic and challenging OpenLogo benchmark.
Tasks
Published 2018-07-05
URL http://arxiv.org/abs/1807.01964v3
PDF http://arxiv.org/pdf/1807.01964v3.pdf
PWC https://paperswithcode.com/paper/open-logo-detection-challenge
Repo https://github.com/dqhuy140598/LogoDetectionV2
Framework tf
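
At its simplest, the data-synthesis idea behind CAL is pasting transparent logo instances onto diverse backgrounds to manufacture annotated training images; CAL itself learns context-coherent synthesis adversarially. The naive compositor below sketches only that baseline notion, with hypothetical file paths:

```python
# Naive logo-compositing baseline (Pillow); CAL replaces this with
# adversarially learned, context-coherent synthesis.
import random
from PIL import Image

def composite(background_path, logo_path, scale=0.2):
    bg = Image.open(background_path).convert("RGB")
    logo = Image.open(logo_path).convert("RGBA")
    w = int(bg.width * scale)
    h = int(logo.height * w / logo.width)
    logo = logo.resize((w, h))
    x = random.randint(0, bg.width - w)
    y = random.randint(0, bg.height - h)
    bg.paste(logo, (x, y), logo)  # alpha-composite the logo
    return bg, (x, y, x + w, y + h)  # image plus bounding-box label

# Hypothetical inputs: any background photo and a transparent logo.
image, box = composite("street.jpg", "brand_logo.png")
```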

Hoeffding Trees with nmin adaptation

Title Hoeffding Trees with nmin adaptation
Authors Eva García-Martín, Niklas Lavesson, Håkan Grahn, Emiliano Casalicchio, Veselka Boeva
Abstract Machine learning software accounts for a significant amount of the energy consumed in data centers. These algorithms are usually optimized for predictive performance, i.e. accuracy, and scalability. This is the case for data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing energy use, while accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy on a few datasets.
Tasks
Published 2018-08-03
URL http://arxiv.org/abs/1808.01145v1
PDF http://arxiv.org/pdf/1808.01145v1.pdf
PWC https://paperswithcode.com/paper/hoeffding-trees-with-nmin-adaptation
Repo https://github.com/egarciamartin/hoeffding-nmin-adaptation
Framework none
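
The Hoeffding bound underlying the method states that a split decision is reliable once the observed advantage of the best attribute over the runner-up exceeds eps(n) = sqrt(R^2 ln(1/delta) / 2n). Estimating how many more instances are needed before that can happen is the essence of nmin adaptation; the sketch below illustrates the arithmetic with illustrative constants, not the authors' exact update rule:

```python
# Hoeffding-bound arithmetic behind nmin adaptation (constants illustrative).
import math

def hoeffding_bound(value_range, delta, n):
    # eps = sqrt(R^2 * ln(1/delta) / (2n))
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def next_nmin(advantage, value_range, delta, n, nmin_floor=200):
    """Smallest n at which the current advantage would pass the bound."""
    if advantage <= 0:
        return n + nmin_floor  # tie: wait a default batch before rechecking
    needed = value_range ** 2 * math.log(1.0 / delta) / (2.0 * advantage ** 2)
    return max(int(math.ceil(needed)), n + 1)

# Example: info-gain advantage 0.05, range 1 (scaled entropy), delta 1e-7.
print(hoeffding_bound(1.0, 1e-7, 1000))   # ~0.09: cannot split yet
print(next_nmin(0.05, 1.0, 1e-7, 1000))   # skip split checks until ~3224
```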

Textual Explanations for Self-Driving Vehicles

Title Textual Explanations for Self-Driving Vehicles
Authors Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata
Abstract Deep neural perception and control networks have become key components of self-driving vehicles. User acceptance is likely to benefit from easy-to-interpret textual explanations which allow end-users to understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller’s output, namely rationalizations. We propose a new approach to introspective explanations which consists of two parts. First, we use a visual (spatial) attention model to train a convolutional network end-to-end from images to the vehicle control commands, i.e., acceleration and change of course. The controller’s attention identifies image regions that potentially influence the network’s output. Second, we use an attention-based video-to-text model to produce textual explanations of model actions. The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments. We evaluate these models on a novel driving dataset with ground-truth human explanations, the Berkeley DeepDrive eXplanation (BDD-X) dataset. Code is available at https://github.com/JinkyuKimUCB/explainable-deep-driving.
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11546v1
PDF http://arxiv.org/pdf/1807.11546v1.pdf
PWC https://paperswithcode.com/paper/textual-explanations-for-self-driving
Repo https://github.com/JinkyuKimUCB/explainable-deep-driving
Framework none
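
Strong attention alignment can be realised as a divergence penalty between the controller's spatial attention map and the explanation model's map, so explanations attend to what actually drove the control output. A KL term is one natural choice, sketched below assuming PyTorch; the paper's exact formulation may differ:

```python
# Sketch of a strong attention-alignment penalty (assumes PyTorch).
import torch

def alignment_loss(controller_attn, explainer_attn, eps=1e-8):
    # Both attention maps: (batch, H*W), each row summing to 1.
    p = controller_attn.clamp_min(eps)
    q = explainer_attn.clamp_min(eps)
    return (p * (p.log() - q.log())).sum(dim=1).mean()  # KL(p || q)

p = torch.softmax(torch.randn(4, 100), dim=1)  # controller attention
q = torch.softmax(torch.randn(4, 100), dim=1)  # explanation attention
print(alignment_loss(p, q))  # scalar penalty, averaged over the batch
```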

Tukey-Inspired Video Object Segmentation

Title Tukey-Inspired Video Object Segmentation
Authors Brent A. Griffin, Jason J. Corso
Abstract We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from background in video without a user-provided object mask or any training on an annotated dataset. We find foreground objects in low-level vision data using a John Tukey-inspired measure of “outlierness”. This Tukey-inspired measure also estimates the reliability of each data source as video characteristics change (e.g., a camera starts moving). The proposed method achieves state-of-the-art results for strictly unsupervised video object segmentation on the challenging DAVIS dataset. Finally, we use a variant of the Tukey-inspired measure to combine the output of multiple segmentation methods, including those using supervision during training, runtime, or both. This collectively more robust method of segmentation improves the Jaccard measure of its constituent methods by as much as 28%.
Tasks Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2018-11-19
URL http://arxiv.org/abs/1811.07958v2
PDF http://arxiv.org/pdf/1811.07958v2.pdf
PWC https://paperswithcode.com/paper/tukey-inspired-video-object-segmentation
Repo https://github.com/griffbr/TIS
Framework none
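
Tukey's classic fences flag values that fall outside the interquartile range. The sketch below scores "outlierness" in IQR units, zero inside the fences; the paper builds its foreground measure on this idea, though its exact formulation differs in the details:

```python
# Tukey-style outlierness: distance outside the IQR, in IQR units.
import numpy as np

def tukey_outlierness(x):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = max(q3 - q1, 1e-12)               # guard against a zero IQR
    above = np.maximum(x - q3, 0.0)          # excess above the upper fence
    below = np.maximum(q1 - x, 0.0)          # shortfall below the lower fence
    return (above + below) / iqr

data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 4.0])
print(tukey_outlierness(data).round(2))  # the 4.0 entry scores far above the rest
```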

Model-order selection in statistical shape models

Title Model-order selection in statistical shape models
Authors Alma Eguizabal, Peter J. Schreier, David Ramírez
Abstract Statistical shape models enhance machine learning algorithms providing prior information about deformation. A Point Distribution Model (PDM) is a popular landmark-based statistical shape model for segmentation. It requires choosing a model order, which determines how much of the variation seen in the training data is accounted for by the PDM. A good choice of the model order depends on the number of training samples and the noise level in the training data set. Yet the most common approach for choosing the model order simply keeps a predetermined percentage of the total shape variation. In this paper, we present a technique for choosing the model order based on information-theoretic criteria, and we show empirical evidence that the model order chosen by this technique provides a good trade-off between over- and underfitting.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00309v1
PDF http://arxiv.org/pdf/1808.00309v1.pdf
PWC https://paperswithcode.com/paper/model-order-selection-in-statistical-shape
Repo https://github.com/SSTGroup/Source-detection-in-colored-noise
Framework none
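
Information-theoretic model-order selection of this kind can be illustrated with the classic Wax-Kailath MDL rule applied to the eigenvalues of the shape covariance; the paper's criterion is in this family but not necessarily identical. A sketch with synthetic eigenvalues (three true modes of deformation plus noise):

```python
# Wax-Kailath MDL rule on covariance eigenvalues as a stand-in for
# information-theoretic PDM order selection.
import numpy as np

def mdl_order(eigvals, n_samples):
    p = len(eigvals)
    scores = []
    for k in range(p):
        tail = eigvals[k:]                       # eigenvalues treated as noise
        geo = np.exp(np.mean(np.log(tail)))      # geometric mean
        arith = np.mean(tail)                    # arithmetic mean
        loglik = n_samples * (p - k) * np.log(geo / arith)
        penalty = 0.5 * k * (2 * p - k) * np.log(n_samples)
        scores.append(-loglik + penalty)
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
# Three real deformation modes plus a near-flat noise floor.
eigvals = np.r_[10.0, 5.0, 2.0, 0.1 * np.ones(17)]
eigvals += rng.uniform(0.0, 0.01, eigvals.size)  # tail not exactly flat
print(mdl_order(np.sort(eigvals)[::-1], n_samples=50))  # expected: 3
```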

Fast Context Adaptation via Meta-Learning

Title Fast Context Adaptation via Meta-Learning
Authors Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson
Abstract We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. At test time, only the context parameters are updated, leading to a low-dimensional task representation. We show empirically that CAVIA outperforms MAML for regression, classification, and reinforcement learning. Our experiments also highlight weaknesses in current benchmarks, in that the amount of adaptation needed in some cases is small.
Tasks Meta-Learning
Published 2018-10-08
URL https://arxiv.org/abs/1810.03642v4
PDF https://arxiv.org/pdf/1810.03642v4.pdf
PWC https://paperswithcode.com/paper/fast-context-adaptation-via-meta-learning
Repo https://github.com/lmzintgraf/cavia
Framework pytorch
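
The core of CAVIA is that the inner loop adapts only a small context vector appended to the input, leaving the shared weights untouched. A minimal sketch assuming PyTorch, with a stand-in linear model:

```python
# CAVIA-style inner loop (assumes PyTorch): only the context vector is
# adapted per task; the shared, meta-trained weights stay fixed here.
import torch

def adapt_context(model, task_batch, n_steps=2, lr=0.1, context_dim=5):
    context = torch.zeros(context_dim, requires_grad=True)  # fresh per task
    x, y = task_batch
    for _ in range(n_steps):
        inputs = torch.cat([x, context.expand(x.size(0), -1)], dim=1)
        loss = torch.nn.functional.mse_loss(model(inputs), y)
        (grad,) = torch.autograd.grad(loss, context)
        context = (context - lr * grad).detach().requires_grad_(True)
    return context  # low-dimensional task representation

model = torch.nn.Linear(1 + 5, 1)            # stand-in shared network
x, y = torch.randn(8, 1), torch.randn(8, 1)  # one regression task
ctx = adapt_context(model, (x, y))
```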

Anytime Stereo Image Depth Estimation on Mobile Devices

Title Anytime Stereo Image Depth Estimation on Mobile Devices
Authors Yan Wang, Zihang Lai, Gao Huang, Brian H. Wang, Laurens van der Maaten, Mark Campbell, Kilian Q. Weinberger
Abstract Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms force a choice between either generating accurate mappings at a slow pace, or quickly generating inaccurate ones, and additionally these methods typically require far too many parameters to be usable on power- or memory-constrained devices. Motivated by these shortcomings, we propose a novel approach for disparity prediction in the anytime setting. In contrast to prior work, our end-to-end learned approach can trade off computation and accuracy at inference time. Depth estimation is performed in stages, during which the model can be queried at any time to output its current best estimate. Our final model can process 1242$\times$375 resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error – using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet .
Tasks Depth Estimation, Stereo Depth Estimation
Published 2018-10-26
URL http://arxiv.org/abs/1810.11408v2
PDF http://arxiv.org/pdf/1810.11408v2.pdf
PWC https://paperswithcode.com/paper/anytime-stereo-image-depth-estimation-on
Repo https://github.com/mileyan/AnyNet
Framework pytorch
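
The anytime pattern itself is simple: refine the estimate in stages and return the latest one when the budget expires. A sketch with placeholder stages (not AnyNet's actual network stages):

```python
# Anytime refinement loop with a time budget; stages are placeholders.
import time

def anytime_disparity(stages, left, right, budget_s):
    estimate = None
    start = time.monotonic()
    for stage in stages:
        estimate = stage(left, right, estimate)  # refine previous output
        if time.monotonic() - start > budget_s:
            break  # budget spent: return the current best estimate
    return estimate

# Dummy stages that just count refinement passes.
stages = [lambda l, r, prev: (prev or 0) + 1] * 4
print(anytime_disparity(stages, None, None, budget_s=0.01))  # up to 4
```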

Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification

Title Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification
Authors Tatsuhiro Aoshima, Kei Kobayashi, Mihoko Minami
Abstract Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.
Tasks Extreme Multi-Label Classification, Information Retrieval, Multi-Label Classification
Published 2018-02-12
URL http://arxiv.org/abs/1802.03938v1
PDF http://arxiv.org/pdf/1802.03938v1.pdf
PWC https://paperswithcode.com/paper/revisiting-the-vector-space-model-sparse
Repo https://github.com/hiro4bbh/sticker
Framework none
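
Viewed as a vector space model, scoring a test document against all training documents is a single sparse matrix-vector product, after which labels are ranked by similarity-weighted votes from the nearest neighbours. A sketch with a toy corpus; the weighting scheme is illustrative:

```python
# Sparse weighted nearest-neighbour label ranking (toy example).
import numpy as np
from scipy.sparse import csr_matrix

# TF-IDF-like rows for 4 training docs over a 6-term vocabulary.
X = csr_matrix(np.array([
    [0.9, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.4],
]))
labels = [{0}, {0, 1}, {2}, {2, 3}]  # multi-label ground truth

def rank_labels(query, k=2):
    sims = X.dot(query)               # one sparse matrix-vector product
    top = np.argsort(sims)[::-1][:k]  # k nearest training docs
    scores = {}
    for i in top:
        for lbl in labels[i]:
            scores[lbl] = scores.get(lbl, 0.0) + sims[i]  # weighted votes
    return sorted(scores, key=scores.get, reverse=True)

print(rank_labels(np.array([0.9, 0.0, 0.1, 0.0, 0.0, 0.0])))  # -> [0, 1]
```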

Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction

Title Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction
Authors Maximilian Seitzer, Guang Yang, Jo Schlemper, Ozan Oktay, Tobias Würfl, Vincent Christlein, Tom Wong, Raad Mohiaddin, David Firmin, Jennifer Keegan, Daniel Rueckert, Andreas Maier
Abstract Deep learning approaches have shown promising performance for compressed sensing-based Magnetic Resonance Imaging. While deep neural networks trained with mean squared error (MSE) loss functions can achieve a high peak signal-to-noise ratio, the reconstructed images are often blurry and lack sharp details, especially for higher undersampling rates. Recently, adversarial and perceptual loss functions have been shown to achieve more visually appealing results. However, it remains an open question how to (1) optimally combine these loss functions with the MSE loss function and (2) evaluate such a perceptual enhancement. In this work, we propose a hybrid method, in which a visual refinement component is learnt on top of an MSE loss-based reconstruction network. In addition, we introduce a semantic interpretability score, measuring the visibility of the region of interest in both ground truth and reconstructed images, which allows us to objectively quantify the usefulness of the image quality for image post-processing and analysis. Applied on a large cardiac MRI dataset simulated with 8-fold undersampling, we demonstrate significant improvements ($p<0.01$) over the state-of-the-art in both a human observer study and the semantic interpretability score.
Tasks
Published 2018-06-28
URL http://arxiv.org/abs/1806.11216v1
PDF http://arxiv.org/pdf/1806.11216v1.pdf
PWC https://paperswithcode.com/paper/adversarial-and-perceptual-refinement-for
Repo https://github.com/mseitzer/csmri-refinement
Framework pytorch
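
The hybrid scheme can be summarised as freezing the MSE-trained reconstruction network and training a refinement network on top with added perceptual and adversarial terms. A sketch of the combined objective assuming PyTorch, with placeholder modules and illustrative weights (the adversarial term here is a WGAN-style critic score, one of several possible choices):

```python
# Hybrid refinement objective (assumes PyTorch); modules and weights
# are placeholders. 'perceptual' could be a VGG feature distance.
import torch
import torch.nn as nn

def refinement_loss(refiner, recon_net, critic, perceptual,
                    undersampled, target, w_per=0.1, w_adv=0.01):
    with torch.no_grad():
        base = recon_net(undersampled)    # frozen MSE-trained reconstruction
    refined = refiner(base)               # visual refinement on top
    mse = nn.functional.mse_loss(refined, target)
    per = perceptual(refined, target)     # feature-space distance
    adv = -critic(refined).mean()         # WGAN-style generator term
    return mse + w_per * per + w_adv * adv
```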

Nonparametric Density Flows for MRI Intensity Normalisation

Title Nonparametric Density Flows for MRI Intensity Normalisation
Authors Daniel C. Castro, Ben Glocker
Abstract With the adoption of powerful machine learning methods in medical image analysis, it is becoming increasingly desirable to aggregate data that is acquired across multiple sites. However, the underlying assumption of many analysis techniques that corresponding tissues have consistent intensities in all images is often violated in multi-centre databases. We introduce a novel intensity normalisation scheme based on density matching, wherein the histograms are modelled as Dirichlet process Gaussian mixtures. The source mixture model is transformed to minimise its $L^2$ divergence towards a target model, then the voxel intensities are transported through a mass-conserving flow to maintain agreement with the moving density. In a multi-centre study with brain MRI data, we show that the proposed technique produces excellent correspondence between the matched densities and histograms. We further demonstrate that our method makes tissue intensity statistics substantially more compatible between images than a baseline affine transformation and is comparable to state-of-the-art while providing considerably smoother transformations. Finally, we validate that nonlinear intensity normalisation is a step toward effective imaging data harmonisation.
Tasks
Published 2018-06-07
URL http://arxiv.org/abs/1806.02613v1
PDF http://arxiv.org/pdf/1806.02613v1.pdf
PWC https://paperswithcode.com/paper/nonparametric-density-flows-for-mri-intensity
Repo https://github.com/dccastro/NDFlow
Framework none
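
The quantity being minimised, the $L^2$ divergence between two Gaussian mixtures, has a closed form in 1-D via Gaussian product integrals. A sketch with illustrative mixture parameters (the paper fits its mixtures with Dirichlet processes):

```python
# Closed-form L2 distance between two 1-D Gaussian mixtures.
import numpy as np

def gauss_prod_integral(m1, v1, m2, v2):
    # Integral of N(x; m1, v1) * N(x; m2, v2) dx = N(m1 - m2; 0, v1 + v2).
    v = v1 + v2
    return np.exp(-0.5 * (m1 - m2) ** 2 / v) / np.sqrt(2 * np.pi * v)

def l2_divergence(p, q):
    """p, q: lists of (weight, mean, variance) components."""
    def cross(a, b):
        return sum(wa * wb * gauss_prod_integral(ma, va, mb, vb)
                   for wa, ma, va in a for wb, mb, vb in b)
    return cross(p, p) - 2 * cross(p, q) + cross(q, q)

source = [(0.6, 100.0, 25.0), (0.4, 160.0, 100.0)]  # illustrative histograms
target = [(0.5, 110.0, 30.0), (0.5, 150.0, 80.0)]
print(l2_divergence(source, target))  # >= 0, zero iff the densities match
```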

DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning

Title DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning
Authors Cagdas Ulas, Giles Tetteh, Stephan Kaczmarz, Christine Preibisch, Bjoern H. Menze
Abstract Arterial spin labeling (ASL) allows the cerebral blood flow (CBF) to be quantified by magnetic labeling of the arterial blood water. ASL is increasingly used in clinical studies due to its noninvasiveness, repeatability and benefits in quantification. However, ASL suffers from an inherently low signal-to-noise ratio (SNR), requiring repeated measurements of control/spin-labeled (C/L) pairs to achieve a reasonable image quality, which in turn increases motion sensitivity. This leads to clinically prolonged scanning times, increasing the risk of motion artifacts. Thus, there is an immense need for advanced imaging and processing techniques in ASL. In this paper, we propose a novel deep learning based approach to improve the perfusion-weighted image quality obtained from a subset of all available pairwise C/L subtractions. Specifically, we train a deep fully convolutional network (FCN) to learn a mapping from a noisy perfusion-weighted image to its residual (subtraction) from the clean image. Additionally, we incorporate the CBF estimation model in the loss function during training, which enables the network to produce high quality images while simultaneously enforcing the CBF estimates to be as close as possible to reference CBF values. Extensive experiments on synthetic and clinical ASL datasets demonstrate the effectiveness of our method in terms of improved ASL image quality, accurate CBF parameter estimation and considerably reduced computation time during testing.
Tasks Denoising
Published 2018-04-08
URL http://arxiv.org/abs/1804.02755v2
PDF http://arxiv.org/pdf/1804.02755v2.pdf
PWC https://paperswithcode.com/paper/deepasl-kinetic-model-incorporated-loss-for
Repo https://github.com/cagdasulas/ASL_CNN
Framework tf
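
A kinetic-model-incorporated loss can be sketched as an image term plus a penalty on the CBF estimates implied by the prediction. The single-compartment CBF formula below follows the common consensus form for labeled ASL; the constants, weighting, and exact loss composition are illustrative rather than the paper's:

```python
# Kinetic-model-incorporated loss sketch (NumPy); constants illustrative.
import numpy as np

def cbf(delta_m, m0, pld=1.8, tau=1.8, t1b=1.65, alpha=0.85, lam=0.9):
    # CBF in ml/100g/min from a perfusion-weighted difference image,
    # single-compartment model (PLD, label duration tau, blood T1).
    return (6000.0 * lam * delta_m * np.exp(pld / t1b)
            / (2.0 * alpha * t1b * m0 * (1.0 - np.exp(-tau / t1b))))

def kinetic_loss(pred_pwi, ref_pwi, m0, weight=0.1):
    image_term = np.mean((pred_pwi - ref_pwi) ** 2)                  # image fit
    cbf_term = np.mean((cbf(pred_pwi, m0) - cbf(ref_pwi, m0)) ** 2)  # CBF fit
    return image_term + weight * cbf_term

pred, ref = np.full((4, 4), 0.020), np.full((4, 4), 0.025)
print(kinetic_loss(pred, ref, np.ones((4, 4))))
```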

CREPE: A Convolutional Representation for Pitch Estimation

Title CREPE: A Convolutional Representation for Pitch Estimation
Authors Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello
Abstract The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch. In this paper, we propose a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform. We show that the proposed model produces state-of-the-art results, performing equally or better than pYIN. Furthermore, we evaluate the model’s generalizability in terms of noise robustness. A pre-trained version of CREPE is made freely available as an open-source Python module for easy application.
Tasks Information Retrieval, Music Information Retrieval
Published 2018-02-17
URL http://arxiv.org/abs/1802.06182v1
PDF http://arxiv.org/pdf/1802.06182v1.pdf
PWC https://paperswithcode.com/paper/crepe-a-convolutional-representation-for
Repo https://github.com/Pradeepiit/hf0
Framework none
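
Using the released module is nearly a two-liner, assuming the open-source crepe package's published interface (pip install crepe):

```python
# Pitch tracking with the pre-trained CREPE module (interface as
# documented by the crepe package; input file is hypothetical).
from scipy.io import wavfile
import crepe

sr, audio = wavfile.read("recording.wav")
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
# frequency[i] is the pitch estimate in Hz at time[i], with confidence[i].
print(frequency[confidence > 0.8][:10])
```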

Bayesian Nonparametric Spectral Estimation

Title Bayesian Nonparametric Spectral Estimation
Authors Felipe Tobar
Abstract Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. This can become particularly challenging when only partial and noisy observations of the signal are available, where current methods fail to handle uncertainty appropriately. In this context, we propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an exact inference problem. Assuming a Gaussian process prior over the signal, we apply Bayes’ rule to find the analytic posterior distribution of the spectrum given a set of observations. Besides its expressiveness and natural account of spectral uncertainty, the proposed model also provides a functional-form representation of the power spectral density, which can be optimised efficiently. Comparison with previous approaches, in particular against Lomb-Scargle, is addressed theoretically and also experimentally in three different scenarios. Code and demo available at https://github.com/GAMES-UChile/BayesianSpectralEstimation.
Tasks Time Series
Published 2018-09-06
URL http://arxiv.org/abs/1809.02196v2
PDF http://arxiv.org/pdf/1809.02196v2.pdf
PWC https://paperswithcode.com/paper/bayesian-nonparametric-spectral-estimation
Repo https://github.com/GAMES-UChile/BayesianSpectralEstimation
Framework none
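
The paper computes the spectrum's posterior analytically; the same notion of spectral uncertainty can be illustrated by Monte Carlo instead: draw signal samples from a GP posterior conditioned on the partial, noisy observations and inspect the spread of their periodograms. A sketch with an illustrative kernel and synthetic data, plainly a substitute for, not a reproduction of, the paper's exact inference:

```python
# Monte Carlo stand-in for posterior spectral uncertainty (NumPy):
# condition a GP on irregular noisy samples, draw posterior signals,
# and look at the distribution of their periodograms.
import numpy as np

def rbf(a, b, ls=0.5, var=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(1)
t_obs = np.sort(rng.uniform(0, 10, 30))                    # partial, irregular
y_obs = np.sin(2 * np.pi * 0.5 * t_obs) + 0.1 * rng.normal(size=30)
t_grid = np.linspace(0, 10, 256)

K = rbf(t_obs, t_obs) + 0.01 * np.eye(30)                  # noisy Gram matrix
Ks = rbf(t_grid, t_obs)
mean = Ks @ np.linalg.solve(K, y_obs)                      # posterior mean
cov = rbf(t_grid, t_grid) - Ks @ np.linalg.solve(K, Ks.T)  # posterior cov

draws = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(256), size=200)
spectra = np.abs(np.fft.rfft(draws, axis=1)) ** 2          # periodogram per draw
freq = np.fft.rfftfreq(256, d=t_grid[1] - t_grid[0])
print(freq[np.argmax(spectra.mean(axis=0))])               # peak near 0.5 Hz
```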