Paper Group AWR 146
A Large-Scale Corpus for Conversation Disentanglement. Improved Techniques for Learning to Dehaze and Beyond: A Collective Study. Open Logo Detection Challenge. Hoeffding Trees with nmin adaptation. Textual Explanations for Self-Driving Vehicles. Tukey-Inspired Video Object Segmentation. Model-order selection in statistical shape models. Fast Context Adaptation via Meta-Learning. Anytime Stereo Image Depth Estimation on Mobile Devices. Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification. Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction. Nonparametric Density Flows for MRI Intensity Normalisation. DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning. CREPE: A Convolutional Representation for Pitch Estimation. Bayesian Nonparametric Spectral Estimation.
A Large-Scale Corpus for Conversation Disentanglement
Title | A Large-Scale Corpus for Conversation Disentanglement |
Authors | Jonathan K. Kummerfeld, Sai R. Gouravajhala, Joseph Peper, Vignesh Athreya, Chulaka Gunasekara, Jatin Ganhotra, Siva Sankalp Patel, Lazaros Polymenakos, Walter S. Lasecki |
Abstract | Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our dataset is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context. We use our data to re-examine prior work, in particular, finding that 80% of conversations in a widely used dialogue corpus are either missing messages or contain extra messages. Our manually-annotated data presents an opportunity to develop robust data-driven methods for conversation disentanglement, which will help advance dialogue research. |
Tasks | |
Published | 2018-10-25 |
URL | https://arxiv.org/abs/1810.11118v2 |
PDF | https://arxiv.org/pdf/1810.11118v2.pdf |
PWC | https://paperswithcode.com/paper/analyzing-assumptions-in-conversation |
Repo | https://github.com/IBM/dstc7-noesis |
Framework | tf |
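Once reply links are annotated, disentanglement reduces to grouping messages by the connected components of the reply graph. Below is a minimal Python sketch of that operation; the message IDs and links are illustrative, not drawn from the dataset.

```python
# Group messages into conversations via connected components of the
# reply-structure graph. Message IDs and links are made up for illustration.
from collections import defaultdict

def disentangle(messages, reply_links):
    """Return one sorted list of message IDs per conversation."""
    adj = defaultdict(set)
    for child, parent in reply_links:
        adj[child].add(parent)
        adj[parent].add(child)
    seen, conversations = set(), []
    for m in messages:
        if m in seen:
            continue
        stack, component = [m], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adj[node] - seen)
        conversations.append(sorted(component))
    return conversations

messages = [1, 2, 3, 4, 5]
reply_links = [(2, 1), (4, 2), (5, 3)]  # (message, message it replies to)
print(disentangle(messages, reply_links))  # [[1, 2, 4], [3, 5]]
```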
Improved Techniques for Learning to Dehaze and Beyond: A Collective Study
Title | Improved Techniques for Learning to Dehaze and Beyond: A Collective Study |
Authors | Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, Dacheng Tao |
Abstract | Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images. For the first task, we investigate a variety of loss functions and show that perception-driven loss significantly improves dehazing performance. For the second task, we provide multiple solutions, including using advanced modules in the dehazing-detection cascade and domain-adaptive object detectors. In both tasks, our proposed solutions significantly improve performance. The GitHub repository is available at: https://github.com/guanlongzhao/dehaze |
Tasks | Image Dehazing, Image Restoration, Object Detection, Single Image Dehazing |
Published | 2018-06-30 |
URL | http://arxiv.org/abs/1807.00202v2 |
PDF | http://arxiv.org/pdf/1807.00202v2.pdf |
PWC | https://paperswithcode.com/paper/improved-techniques-for-learning-to-dehaze |
Repo | https://github.com/guanlongzhao/dehaze |
Framework | none |
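Most single image dehazing work, including the methods studied on RESIDE, builds on the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)). Below is a minimal sketch of inverting that model, assuming the transmission map t and atmospheric light A are already known (in practice both must be estimated).

```python
# Invert the atmospheric scattering model I = J*t + A*(1 - t) to recover
# the clean scene J from a hazy image I, given transmission t and light A.
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """Model inversion; t is clipped away from zero to avoid amplifying noise."""
    t = np.clip(t, t_min, 1.0)
    return (I - A) / t[..., None] + A

rng = np.random.default_rng(0)
J_true = rng.uniform(0, 1, size=(4, 4, 3))            # clean image
t = rng.uniform(0.3, 0.9, size=(4, 4))                # transmission map
A = 0.95                                              # atmospheric light
I_hazy = J_true * t[..., None] + A * (1 - t[..., None])
print(np.allclose(dehaze(I_hazy, t, A), J_true))      # True
```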
Open Logo Detection Challenge
Title | Open Logo Detection Challenge |
Authors | Hang Su, Xiatian Zhu, Shaogang Gong |
Abstract | Existing logo detection benchmarks consider artificial deployment scenarios by assuming that large training data with fine-grained bounding box annotations for each class are available for model training. Such assumptions are often invalid in realistic logo detection scenarios, where new logo classes arrive progressively and need to be detected with little or no budget for exhaustively labelling fine-grained training data for every new class. Existing benchmarks are thus unable to evaluate the true performance of a logo detection method in realistic and open deployments. In this work, we introduce a more realistic and challenging logo detection setting, called Open Logo Detection. Specifically, this new setting assumes fine-grained labelling only on a small proportion of logo classes, whilst the remaining classes have no labelled training data, to simulate the open deployment. We further create an open logo detection benchmark, called OpenLogo, to promote the investigation of this new challenge. OpenLogo contains 27,083 images from 352 logo classes, built by aggregating/refining 7 existing datasets and establishing an open logo detection evaluation protocol. To address this challenge, we propose a Context Adversarial Learning (CAL) approach to synthesising training data with coherent logo instance appearance against diverse background context, enabling more effective optimisation of contemporary deep learning detection models. Experiments show the performance advantage of CAL over existing state-of-the-art alternative methods on the more realistic and challenging OpenLogo benchmark. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01964v3 |
PDF | http://arxiv.org/pdf/1807.01964v3.pdf |
PWC | https://paperswithcode.com/paper/open-logo-detection-challenge |
Repo | https://github.com/dqhuy140598/LogoDetectionV2 |
Framework | tf |
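The CAL idea can be pictured, minus the adversarial part, as compositing labelled logo crops into diverse background images to create training data for classes without real annotations. The naive paste below is only an illustration of that data-synthesis step, not the paper's learned compositing.

```python
# Naive synthetic-data generation: paste a logo crop into a background image
# and record its bounding box. CAL learns coherent compositing adversarially;
# this placeholder only shows the overall idea.
import numpy as np

def paste_logo(background, logo, top, left):
    """Composite a logo crop into a background; return image and bbox."""
    h, w = logo.shape[:2]
    out = background.copy()
    out[top:top + h, left:left + w] = logo
    return out, (left, top, left + w, top + h)

rng = np.random.default_rng(0)
bg = rng.integers(0, 255, (240, 320, 3), dtype=np.uint8)    # context image
logo = rng.integers(0, 255, (32, 48, 3), dtype=np.uint8)    # labelled crop
top = int(rng.integers(0, 240 - 32))
left = int(rng.integers(0, 320 - 48))
img, bbox = paste_logo(bg, logo, top, left)
print(img.shape, bbox)
```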
Hoeffding Trees with nmin adaptation
Title | Hoeffding Trees with nmin adaptation |
Authors | Eva García-Martín, Niklas Lavesson, Håkan Grahn, Emiliano Casalicchio, Veselka Boeva |
Abstract | Machine learning software accounts for a significant amount of the energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case for data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, thus making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing energy consumption, while accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy on a few datasets. |
Tasks | |
Published | 2018-08-03 |
URL | http://arxiv.org/abs/1808.01145v1 |
PDF | http://arxiv.org/pdf/1808.01145v1.pdf |
PWC | https://paperswithcode.com/paper/hoeffding-trees-with-nmin-adaptation |
Repo | https://github.com/egarciamartin/hoeffding-nmin-adaptation |
Framework | none |
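The Hoeffding bound underlying VFDT is epsilon = sqrt(R^2 ln(1/delta) / (2n)); a split on the best attribute is accepted once its gain margin over the runner-up exceeds epsilon. Below is a minimal sketch of the bound and of one plausible adaptation rule: solve the bound for the n at which the current margin would suffice, and wait that long before re-checking. The paper's exact rule may differ.

```python
# Hoeffding bound plus an illustrative nmin adaptation rule. Instead of
# re-evaluating candidate splits every fixed nmin examples, estimate how many
# more examples are needed before the observed margin could exceed the bound.
import math

def hoeffding_bound(R, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def adapted_nmin(R, delta, margin, n_seen):
    """Examples still needed before the bound drops below the margin."""
    if margin <= 0.0:
        return n_seen  # tie between attributes: wait the same amount again
    n_needed = math.ceil(R * R * math.log(1.0 / delta) / (2.0 * margin ** 2))
    return max(n_needed - n_seen, 1)

R = math.log2(2)     # range of information gain for a two-class problem
delta = 1e-7         # allowed probability of choosing a wrong split
margin = 0.05        # gain(best attribute) - gain(second best)
print(hoeffding_bound(R, delta, 200))       # ~0.20 after 200 examples
print(adapted_nmin(R, delta, margin, 200))  # wait ~3000 more examples
```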
Textual Explanations for Self-Driving Vehicles
Title | Textual Explanations for Self-Driving Vehicles |
Authors | Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata |
Abstract | Deep neural perception and control networks have become key components of self-driving vehicles. User acceptance is likely to benefit from easy-to-interpret textual explanations which allow end-users to understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller’s output, namely rationalizations. We propose a new approach to introspective explanations which consists of two parts. First, we use a visual (spatial) attention model to train a convolutional network end-to-end from images to the vehicle control commands, i.e., acceleration and change of course. The controller’s attention identifies image regions that potentially influence the network’s output. Second, we use an attention-based video-to-text model to produce textual explanations of model actions. The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment. Finally, we explore a version of our model that generates rationalizations, and compare it with introspective explanations on the same video segments. We evaluate these models on a novel driving dataset with ground-truth human explanations, the Berkeley DeepDrive eXplanation (BDD-X) dataset. Code is available at https://github.com/JinkyuKimUCB/explainable-deep-driving. |
Tasks | |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11546v1 |
PDF | http://arxiv.org/pdf/1807.11546v1.pdf |
PWC | https://paperswithcode.com/paper/textual-explanations-for-self-driving |
Repo | https://github.com/JinkyuKimUCB/explainable-deep-driving |
Framework | none |
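The alignment between the controller's attention and the explanation model's attention can be expressed as a penalty between the two normalized maps. The KL-divergence penalty below is an assumed stand-in for illustration, not the paper's exact strong/weak alignment formulation.

```python
# Illustrative alignment penalty between two spatial attention maps:
# normalize both to distributions and take a KL divergence. This is an
# assumption standing in for the paper's alignment losses.
import numpy as np

def attention_alignment_loss(att_controller, att_explainer, eps=1e-8):
    """KL(controller || explainer) over flattened, normalized attention maps."""
    p = att_controller.ravel() + eps
    q = att_explainer.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

a = np.random.rand(7, 7)   # controller attention over image regions
b = np.random.rand(7, 7)   # explanation model attention
print(attention_alignment_loss(a, b))  # > 0 when the maps disagree
print(attention_alignment_loss(a, a))  # ~0 when they match
```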
Tukey-Inspired Video Object Segmentation
Title | Tukey-Inspired Video Object Segmentation |
Authors | Brent A. Griffin, Jason J. Corso |
Abstract | We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from background in video without a user-provided object mask or any training on an annotated dataset. We find foreground objects in low-level vision data using a John Tukey-inspired measure of “outlierness”. This Tukey-inspired measure also estimates the reliability of each data source as video characteristics change (e.g., a camera starts moving). The proposed method achieves state-of-the-art results for strictly unsupervised video object segmentation on the challenging DAVIS dataset. Finally, we use a variant of the Tukey-inspired measure to combine the output of multiple segmentation methods, including those using supervision during training, runtime, or both. This collectively more robust method of segmentation improves the Jaccard measure of its constituent methods by as much as 28%. |
Tasks | Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07958v2 |
PDF | http://arxiv.org/pdf/1811.07958v2.pdf |
PWC | https://paperswithcode.com/paper/tukey-inspired-video-object-segmentation |
Repo | https://github.com/griffbr/TIS |
Framework | none |
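Tukey's classic fences flag values outside (Q1 - 1.5*IQR, Q3 + 1.5*IQR). Below is a minimal sketch of an outlierness score in that spirit, applied to a toy vector of per-pixel motion magnitudes; the scoring is a simplified stand-in for the paper's measure.

```python
# Tukey-style "outlierness": distance beyond the Tukey fences in IQR units.
# Applied here to toy optical-flow magnitudes where one value is foreground.
import numpy as np

def tukey_outlierness(x):
    """0 for inliers; otherwise distance beyond the fences, scaled by IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = max(q3 - q1, 1e-12)
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return np.maximum(np.maximum(lower - x, x - upper), 0.0) / iqr

flow_mag = np.array([0.1, 0.2, 0.15, 0.12, 0.18, 4.0])  # one moving object
print(tukey_outlierness(flow_mag))  # large score only for the 4.0 entry
```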
Model-order selection in statistical shape models
Title | Model-order selection in statistical shape models |
Authors | Alma Eguizabal, Peter J. Schreier, David Ramírez |
Abstract | Statistical shape models enhance machine learning algorithms providing prior information about deformation. A Point Distribution Model (PDM) is a popular landmark-based statistical shape model for segmentation. It requires choosing a model order, which determines how much of the variation seen in the training data is accounted for by the PDM. A good choice of the model order depends on the number of training samples and the noise level in the training data set. Yet the most common approach for choosing the model order simply keeps a predetermined percentage of the total shape variation. In this paper, we present a technique for choosing the model order based on information-theoretic criteria, and we show empirical evidence that the model order chosen by this technique provides a good trade-off between over- and underfitting. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00309v1 |
PDF | http://arxiv.org/pdf/1808.00309v1.pdf |
PWC | https://paperswithcode.com/paper/model-order-selection-in-statistical-shape |
Repo | https://github.com/SSTGroup/Source-detection-in-colored-noise |
Framework | none |
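A classic instance of information-theoretic order selection is the Wax-Kailath MDL criterion over the eigenvalues of the training covariance; the criterion the paper derives for PDMs may differ, but the sketch below illustrates the flavor of the approach.

```python
# Wax-Kailath MDL order selection: pick the order k minimizing a fit term
# (how equal the trailing eigenvalues are) plus a complexity penalty.
import numpy as np

def mdl_order(eigvals, n_samples):
    """Return the model order minimizing the MDL criterion."""
    lam = np.sort(eigvals)[::-1]
    p = len(lam)
    scores = []
    for k in range(p):
        tail = lam[k:]
        geo = np.exp(np.mean(np.log(tail)))      # geometric mean
        arith = np.mean(tail)                    # arithmetic mean
        loglik = -n_samples * (p - k) * np.log(geo / arith)
        penalty = 0.5 * k * (2 * p - k) * np.log(n_samples)
        scores.append(loglik + penalty)
    return int(np.argmin(scores))

rng = np.random.default_rng(1)
n, p, true_order = 500, 10, 3
signal = rng.normal(size=(n, true_order)) @ rng.normal(size=(true_order, p)) * 3
data = signal + rng.normal(size=(n, p))          # noisy training shapes
eigvals = np.linalg.eigvalsh(np.cov(data.T))
print(mdl_order(eigvals, n))                     # typically recovers 3
```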
Fast Context Adaptation via Meta-Learning
Title | Fast Context Adaptation via Meta-Learning |
Authors | Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson |
Abstract | We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. At test time, only the context parameters are updated, leading to a low-dimensional task representation. We show empirically that CAVIA outperforms MAML for regression, classification, and reinforcement learning. Our experiments also highlight weaknesses in current benchmarks, in that the amount of adaptation needed in some cases is small. |
Tasks | Meta-Learning |
Published | 2018-10-08 |
URL | https://arxiv.org/abs/1810.03642v4 |
PDF | https://arxiv.org/pdf/1810.03642v4.pdf |
PWC | https://paperswithcode.com/paper/fast-context-adaptation-via-meta-learning |
Repo | https://github.com/lmzintgraf/cavia |
Framework | pytorch |
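CAVIA's split can be shown on a toy linear model: shared parameters theta stay fixed at test time, while a context parameter phi, fed to the model as an extra input, is adapted per task. This numpy sketch with analytic gradients is illustrative only; the paper meta-trains theta across tasks with neural networks.

```python
# Toy CAVIA-style adaptation: only the context parameter phi is updated in
# the inner loop; the shared weights theta are held fixed.
import numpy as np

def predict(theta, phi, x):
    # the model sees [x, phi] as input; theta weights both parts
    return theta[0] * x + theta[1] * phi

def adapt_context(theta, x, y, steps=50, lr=0.1):
    """Inner loop: gradient descent on the MSE w.r.t. phi only."""
    phi = 0.0  # context parameters are reset for each new task
    for _ in range(steps):
        err = predict(theta, phi, x) - y
        grad_phi = 2.0 * np.mean(err) * theta[1]   # d(MSE)/d(phi)
        phi -= lr * grad_phi
    return phi

theta = np.array([1.0, 1.0])   # pretend these were meta-trained
x = np.linspace(-1, 1, 20)
task_bias = 0.7                # task-specific offset to recover
y = x + task_bias
print(adapt_context(theta, x, y))  # ~0.7: the task lives in the context
```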
Anytime Stereo Image Depth Estimation on Mobile Devices
Title | Anytime Stereo Image Depth Estimation on Mobile Devices |
Authors | Yan Wang, Zihang Lai, Gao Huang, Brian H. Wang, Laurens van der Maaten, Mark Campbell, Kilian Q. Weinberger |
Abstract | Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms force a choice between either generating accurate mappings at a slow pace, or quickly generating inaccurate ones, and additionally these methods typically require far too many parameters to be usable on power- or memory-constrained devices. Motivated by these shortcomings, we propose a novel approach for disparity prediction in the anytime setting. In contrast to prior work, our end-to-end learned approach can trade off computation and accuracy at inference time. Depth estimation is performed in stages, during which the model can be queried at any time to output its current best estimate. Our final model can process 1242$\times$375 resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error, using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet. |
Tasks | Depth Estimation, Stereo Depth Estimation |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11408v2 |
PDF | http://arxiv.org/pdf/1810.11408v2.pdf |
PWC | https://paperswithcode.com/paper/anytime-stereo-image-depth-estimation-on |
Repo | https://github.com/mileyan/AnyNet |
Framework | pytorch |
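The anytime pattern the abstract describes: refine the disparity in stages and return whichever estimate is current when the time budget runs out. A minimal sketch follows, with a placeholder smoothing step standing in for the paper's refinement stages.

```python
# Anytime inference: stage-wise refinement with an early exit on a deadline.
# The stage itself is a placeholder, not the paper's network.
import time
import numpy as np

def stage_refine(disparity):
    """Placeholder for one refinement stage (e.g., a finer-scale residual)."""
    return disparity + 0.5 * (np.roll(disparity, 1, axis=0) - disparity)

def anytime_disparity(initial, budget_s, max_stages=4):
    best = initial
    deadline = time.monotonic() + budget_s
    for _ in range(max_stages):
        if time.monotonic() >= deadline:
            break                      # out of time: return the current best
        best = stage_refine(best)
    return best

coarse = np.random.rand(24, 80)        # coarse initial disparity map
print(anytime_disparity(coarse, budget_s=0.005).shape)
```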
Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification
Title | Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification |
Authors | Tatsuhiro Aoshima, Kei Kobayashi, Mihoko Minami |
Abstract | Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels. |
Tasks | Extreme Multi-Label Classification, Information Retrieval, Multi-Label Classification |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03938v1 |
PDF | http://arxiv.org/pdf/1802.03938v1.pdf |
PWC | https://paperswithcode.com/paper/revisiting-the-vector-space-model-sparse |
Repo | https://github.com/hiro4bbh/sticker |
Framework | none |
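The method's vector-space reading: score a query against training documents with sparse dot products, then accumulate the labels of the nearest documents weighted by similarity. Below is a minimal sketch with a simple similarity weighting, which is an assumption rather than the paper's exact scheme.

```python
# Sparse weighted nearest-neighbor label ranking over dict-encoded sparse
# vectors. The similarity weighting is a simple stand-in for illustration.
from collections import defaultdict

def sparse_dot(a, b):
    if len(b) < len(a):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def rank_labels(query, train_docs, train_labels, k=3):
    """Score labels by similarity-weighted votes from the k nearest docs."""
    sims = sorted(((sparse_dot(query, d), i) for i, d in enumerate(train_docs)),
                  reverse=True)[:k]
    scores = defaultdict(float)
    for sim, i in sims:
        for label in train_labels[i]:
            scores[label] += sim
    return sorted(scores, key=scores.get, reverse=True)

train_docs = [{"cat": 1.0, "pet": 0.5}, {"dog": 1.0, "pet": 0.6}, {"car": 1.0}]
train_labels = [["animal"], ["animal"], ["vehicle"]]
print(rank_labels({"pet": 1.0, "dog": 0.3}, train_docs, train_labels))
```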
Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction
Title | Adversarial and Perceptual Refinement for Compressed Sensing MRI Reconstruction |
Authors | Maximilian Seitzer, Guang Yang, Jo Schlemper, Ozan Oktay, Tobias Würfl, Vincent Christlein, Tom Wong, Raad Mohiaddin, David Firmin, Jennifer Keegan, Daniel Rueckert, Andreas Maier |
Abstract | Deep learning approaches have shown promising performance for compressed sensing-based Magnetic Resonance Imaging. While deep neural networks trained with mean squared error (MSE) loss functions can achieve a high peak signal-to-noise ratio, the reconstructed images are often blurry and lack sharp details, especially for higher undersampling rates. Recently, adversarial and perceptual loss functions have been shown to achieve more visually appealing results. However, it remains an open question how to (1) optimally combine these loss functions with the MSE loss function and (2) evaluate such a perceptual enhancement. In this work, we propose a hybrid method, in which a visual refinement component is learnt on top of an MSE loss-based reconstruction network. In addition, we introduce a semantic interpretability score, measuring the visibility of the region of interest in both ground truth and reconstructed images, which allows us to objectively quantify the usefulness of the image quality for image post-processing and analysis. Applied on a large cardiac MRI dataset simulated with 8-fold undersampling, we demonstrate significant improvements ($p<0.01$) over the state-of-the-art in both a human observer study and the semantic interpretability score. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.11216v1 |
PDF | http://arxiv.org/pdf/1806.11216v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-and-perceptual-refinement-for |
Repo | https://github.com/mseitzer/csmri-refinement |
Framework | pytorch |
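Pipelines of this kind typically keep the reconstruction tied to the measurements through a k-space data-consistency step: wherever a k-space location was actually sampled, the reconstruction's Fourier coefficient is replaced by the measured value. The sketch below shows that step in isolation; it is a common building block in compressed sensing MRI, not the paper's full architecture.

```python
# k-space data consistency: overwrite the sampled Fourier coefficients of a
# reconstruction with the measured values. Images are kept complex, as in MRI.
import numpy as np

def data_consistency(recon, measured_kspace, mask):
    k = np.fft.fft2(recon)
    k[mask] = measured_kspace[mask]
    return np.fft.ifft2(k)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                    # stand-in for an MR image
mask = rng.random((32, 32)) < 0.125             # ~8-fold undersampling
measured = np.fft.fft2(image) * mask
zero_filled = np.fft.ifft2(measured)            # blurry starting point
refined = data_consistency(zero_filled, measured, mask)
# the sampled k-space entries now match the measurements exactly:
print(np.abs(np.fft.fft2(refined) - np.fft.fft2(image))[mask].max())  # ~0
```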
Nonparametric Density Flows for MRI Intensity Normalisation
Title | Nonparametric Density Flows for MRI Intensity Normalisation |
Authors | Daniel C. Castro, Ben Glocker |
Abstract | With the adoption of powerful machine learning methods in medical image analysis, it is becoming increasingly desirable to aggregate data that is acquired across multiple sites. However, the underlying assumption of many analysis techniques that corresponding tissues have consistent intensities in all images is often violated in multi-centre databases. We introduce a novel intensity normalisation scheme based on density matching, wherein the histograms are modelled as Dirichlet process Gaussian mixtures. The source mixture model is transformed to minimise its $L^2$ divergence towards a target model, then the voxel intensities are transported through a mass-conserving flow to maintain agreement with the moving density. In a multi-centre study with brain MRI data, we show that the proposed technique produces excellent correspondence between the matched densities and histograms. We further demonstrate that our method makes tissue intensity statistics substantially more compatible between images than a baseline affine transformation and is comparable to state-of-the-art while providing considerably smoother transformations. Finally, we validate that nonlinear intensity normalisation is a step toward effective imaging data harmonisation. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02613v1 |
PDF | http://arxiv.org/pdf/1806.02613v1.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-density-flows-for-mri-intensity |
Repo | https://github.com/dccastro/NDFlow |
Framework | none |
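The L2 divergence between two Gaussian mixtures has a closed form via the identity ∫N(x; m1, v1)N(x; m2, v2)dx = N(m1 - m2; 0, v1 + v2), which is what makes a density-matching objective of this kind tractable. Below is a minimal 1-D sketch with made-up intensity histogram parameters.

```python
# Closed-form L2 distance between two 1-D Gaussian mixtures:
# L2(p, q) = \int p^2 - 2 \int pq + \int q^2, each term a sum of Gaussian
# product integrals. Mixture parameters below are illustrative.
import numpy as np

def gauss(x, var):
    return np.exp(-0.5 * x * x / var) / np.sqrt(2 * np.pi * var)

def cross_term(w1, m1, v1, w2, m2, v2):
    """sum_ij w1_i w2_j \int N_i N_j dx, vectorized over components."""
    return np.sum(np.outer(w1, w2) *
                  gauss(m1[:, None] - m2[None, :], v1[:, None] + v2[None, :]))

def l2_divergence(p, q):
    """p, q are (weights, means, variances) triples of numpy arrays."""
    return cross_term(*p, *p) - 2 * cross_term(*p, *q) + cross_term(*q, *q)

src = (np.array([0.6, 0.4]), np.array([30.0, 80.0]), np.array([25.0, 100.0]))
tgt = (np.array([0.5, 0.5]), np.array([35.0, 85.0]), np.array([30.0, 90.0]))
print(l2_divergence(src, tgt))   # shrinks as the source is matched to target
print(l2_divergence(src, src))   # zero up to float error
```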
DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning
Title | DeepASL: Kinetic Model Incorporated Loss for Denoising Arterial Spin Labeled MRI via Deep Residual Learning |
Authors | Cagdas Ulas, Giles Tetteh, Stephan Kaczmarz, Christine Preibisch, Bjoern H. Menze |
Abstract | Arterial spin labeling (ASL) allows quantification of cerebral blood flow (CBF) by magnetic labeling of the arterial blood water. ASL is increasingly used in clinical studies due to its noninvasiveness, repeatability and benefits in quantification. However, ASL suffers from an inherently low signal-to-noise ratio (SNR), requiring repeated measurements of control/spin-labeled (C/L) pairs to achieve a reasonable image quality, which in turn increases motion sensitivity. This leads to clinically prolonged scanning times, increasing the risk of motion artifacts. Thus, there is an immense need for advanced imaging and processing techniques in ASL. In this paper, we propose a novel deep learning based approach to improve the perfusion-weighted image quality obtained from a subset of all available pairwise C/L subtractions. Specifically, we train a deep fully convolutional network (FCN) to learn a mapping from a noisy perfusion-weighted image to its residual from the clean image. Additionally, we incorporate the CBF estimation model in the loss function during training, which enables the network to produce high quality images while simultaneously enforcing the CBF estimates to be as close as possible to reference CBF values. Extensive experiments on synthetic and clinical ASL datasets demonstrate the effectiveness of our method in terms of improved ASL image quality, accurate CBF parameter estimation and considerably reduced computation time during testing. |
Tasks | Denoising |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02755v2 |
PDF | http://arxiv.org/pdf/1804.02755v2.pdf |
PWC | https://paperswithcode.com/paper/deepasl-kinetic-model-incorporated-loss-for |
Repo | https://github.com/cagdasulas/ASL_CNN |
Framework | tf |
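The loss structure described in the abstract, image fidelity plus a term on the CBF values implied by the output, can be sketched as below. The quantification model is reduced here to a single calibration scale; real ASL quantification involves several sequence-dependent factors.

```python
# Kinetic-model-coupled loss sketch: penalize both pixel error and the error
# of the CBF values implied by the prediction. The quantification step is a
# toy stand-in for the full ASL kinetic model.
import numpy as np

def cbf_from_pwi(pwi, m0, scale=6000.0):
    """Toy quantification: CBF proportional to perfusion signal over M0."""
    return scale * pwi / np.maximum(m0, 1e-6)

def kinetic_loss(pred_pwi, clean_pwi, m0, lam=0.5):
    image_term = np.mean((pred_pwi - clean_pwi) ** 2)
    cbf_term = np.mean((cbf_from_pwi(pred_pwi, m0) -
                        cbf_from_pwi(clean_pwi, m0)) ** 2)
    return image_term + lam * cbf_term

rng = np.random.default_rng(0)
clean = rng.random((16, 16)) * 0.02     # perfusion-weighted image
m0 = np.ones((16, 16))                  # proton-density calibration image
pred = clean + rng.normal(0, 0.002, clean.shape)
print(kinetic_loss(pred, clean, m0))
```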
CREPE: A Convolutional Representation for Pitch Estimation
Title | CREPE: A Convolutional Representation for Pitch Estimation |
Authors | Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello |
Abstract | The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch. In this paper, we propose a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform. We show that the proposed model produces state-of-the-art results, performing as well as or better than pYIN. Furthermore, we evaluate the model’s generalizability in terms of noise robustness. A pre-trained version of CREPE is made freely available as an open-source Python module for easy application. |
Tasks | Information Retrieval, Music Information Retrieval |
Published | 2018-02-17 |
URL | http://arxiv.org/abs/1802.06182v1 |
PDF | http://arxiv.org/pdf/1802.06182v1.pdf |
PWC | https://paperswithcode.com/paper/crepe-a-convolutional-representation-for |
Repo | https://github.com/Pradeepiit/hf0 |
Framework | none |
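CREPE-style networks predict pitch as a classification over a grid of bins on a cent scale, roughly six octaves in 20-cent steps. The constants below (a 10 Hz cent reference, a grid starting near C1) follow common convention and should be treated as assumptions rather than the paper's verbatim values.

```python
# Map frequency to a 360-bin, 20-cent pitch grid and back. Constants are
# assumed conventions, not guaranteed to match the paper exactly.
import numpy as np

CENTS_PER_BIN = 20
N_BINS = 360
F_REF = 10.0                                   # cents reference frequency
CENTS_START = 1200 * np.log2(32.70 / F_REF)    # bin 0 centered near C1

def freq_to_bin(f_hz):
    cents = 1200 * np.log2(f_hz / F_REF)
    b = np.round((cents - CENTS_START) / CENTS_PER_BIN)
    return int(np.clip(b, 0, N_BINS - 1))

def bin_to_freq(b):
    cents = CENTS_START + b * CENTS_PER_BIN
    return F_REF * 2 ** (cents / 1200)

b = freq_to_bin(440.0)                         # A4
print(b, round(bin_to_freq(b), 2))             # maps back to ~440 Hz
```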
Bayesian Nonparametric Spectral Estimation
Title | Bayesian Nonparametric Spectral Estimation |
Authors | Felipe Tobar |
Abstract | Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. This can become particularly challenging when only partial and noisy observations of the signal are available, where current methods fail to handle uncertainty appropriately. In this context, we propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an exact inference problem. Assuming a Gaussian process prior over the signal, we apply Bayes’ rule to find the analytic posterior distribution of the spectrum given a set of observations. Besides its expressiveness and natural account of spectral uncertainty, the proposed model also provides a functional-form representation of the power spectral density, which can be optimised efficiently. Comparison with previous approaches, in particular against Lomb-Scargle, is addressed theoretically and also experimentally in three different scenarios. Code and demo available at https://github.com/GAMES-UChile/BayesianSpectralEstimation. |
Tasks | Time Series |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02196v2 |
PDF | http://arxiv.org/pdf/1809.02196v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-nonparametric-spectral-estimation |
Repo | https://github.com/GAMES-UChile/BayesianSpectralEstimation |
Framework | none |
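The pipeline the paper makes exact can be approximated crudely: condition a Gaussian process on noisy, partial observations and inspect the spectrum of the posterior. The sketch below takes the periodogram of the posterior mean on a dense grid; the paper instead derives the analytic posterior over the spectrum itself.

```python
# Crude approximation of GP-based spectral estimation: GP posterior mean on
# a dense grid, then a periodogram. The paper's exact analytic posterior over
# the spectrum is not reproduced here.
import numpy as np

def se_kernel(t1, t2, sigma=1.0, ell=0.3):
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0, 10, 40))           # irregular sample times
y_obs = np.sin(2 * np.pi * 1.5 * t_obs) + 0.1 * rng.normal(size=40)

t_grid = np.linspace(0, 10, 512)
K = se_kernel(t_obs, t_obs) + 0.1**2 * np.eye(40)  # kernel + noise variance
mean_post = se_kernel(t_grid, t_obs) @ np.linalg.solve(K, y_obs)

spec = np.abs(np.fft.rfft(mean_post)) ** 2
freqs = np.fft.rfftfreq(512, d=t_grid[1] - t_grid[0])
print(freqs[np.argmax(spec[1:]) + 1])             # peak near 1.5 Hz
```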