February 1, 2020


Paper Group AWR 313


Spatial-Angular Interaction for Light Field Image Super-Resolution

Title Spatial-Angular Interaction for Light Field Image Super-Resolution
Authors Yingqian Wang, Longguang Wang, Jungang Yang, Wei An, Jingyi Yu, Yulan Guo
Abstract Light field (LF) cameras record both the intensity and the directions of light rays, and capture scenes from a number of viewpoints. The information within each perspective (i.e., spatial information) and among different perspectives (i.e., angular information) is beneficial to image super-resolution (SR). In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR. Specifically, spatial and angular features are first separately extracted from input LFs, and then repetitively interacted to progressively incorporate spatial and angular information. Finally, the interacted features are fused to super-resolve each sub-aperture image. Experiments on 6 public LF datasets demonstrate that our method significantly outperforms state-of-the-art single image and LF image SR methods both quantitatively and qualitatively.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-12-17
URL https://arxiv.org/abs/1912.07849v2
PDF https://arxiv.org/pdf/1912.07849v2.pdf
PWC https://paperswithcode.com/paper/spatial-angular-interaction-for-light-field
Repo https://github.com/YingqianWang/DeOccNet
Framework pytorch
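
To make the spatial-angular interaction idea above concrete, here is a minimal PyTorch sketch of one interaction block: spatial features are extracted per sub-aperture view, angular features are extracted across the view grid at each pixel, and the two are fused. The tensor layout [B, U*V, C, H, W], the layer sizes, and the fusion scheme are illustrative assumptions, not the authors' LF-InterNet implementation.

# Illustrative PyTorch sketch of one spatial-angular interaction block
# (shapes and layers are assumptions, not the authors' code).
import torch
import torch.nn as nn

class SpatialAngularBlock(nn.Module):
    def __init__(self, channels=32, ang_res=5):
        super().__init__()
        self.ang_res = ang_res
        # spatial features: per-view 3x3 conv
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # angular features: conv over the U x V view grid at each pixel
        self.angular = nn.Conv2d(channels, channels, ang_res, padding=0)
        # fuse concatenated spatial + broadcast angular features
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, lf):                       # lf: [B, U*V, C, H, W]
        b, n, c, h, w = lf.shape
        u = v = self.ang_res
        assert n == u * v
        # spatial branch: fold views into the batch dimension
        s = self.spatial(lf.reshape(b * n, c, h, w))            # [B*N, C, H, W]
        # angular branch: fold pixels into the batch dimension
        a = lf.reshape(b, u, v, c, h, w).permute(0, 4, 5, 3, 1, 2)
        a = self.angular(a.reshape(b * h * w, c, u, v))          # [B*H*W, C, 1, 1]
        a = a.reshape(b, h, w, c).permute(0, 3, 1, 2)            # [B, C, H, W]
        # interaction: broadcast angular features to every view and fuse
        a = a.unsqueeze(1).expand(b, n, c, h, w).reshape(b * n, c, h, w)
        out = self.fuse(torch.cat([s, a], dim=1))
        return out.reshape(b, n, c, h, w) + lf                   # residual connection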

Sub-event detection from Twitter streams as a sequence labeling problem

Title Sub-event detection from Twitter streams as a sequence labeling problem
Authors Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder
Abstract This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. Current approaches to identify sub-events within a given event, such as a goal during a soccer match, essentially do not exploit the sequential nature of social media streams. We address this shortcoming by framing the sub-event detection problem in social media streams as a sequence labeling task and adopt a neural sequence architecture that explicitly accounts for the chronological order of posts. Specifically, we (i) establish a neural baseline that outperforms a graph-based state-of-the-art method for binary sub-event detection (2.7% micro-F1 improvement), as well as (ii) demonstrate superiority of a recurrent neural network model on the posts sequence level for labeled sub-events (2.4% bin-level F1 improvement over non-sequential models).
Tasks
Published 2019-03-13
URL http://arxiv.org/abs/1903.05396v1
PDF http://arxiv.org/pdf/1903.05396v1.pdf
PWC https://paperswithcode.com/paper/sub-event-detection-from-twitter-streams-as-a
Repo https://github.com/bekou/subevent_sequence_labeling
Framework pytorch
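
As a concrete illustration of the sequence-labeling framing, the sketch below runs a BiLSTM over a chronological sequence of per-time-bin tweet encodings and labels each bin. Dimensions and the bin encoder are assumptions; this is not the paper's exact architecture.

# Illustrative PyTorch sketch: label each time bin of a Twitter stream.
import torch
import torch.nn as nn

class StreamTagger(nn.Module):
    def __init__(self, bin_dim=256, hidden=128, num_labels=2):
        super().__init__()
        self.bilstm = nn.LSTM(bin_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, bins):             # bins: [batch, num_bins, bin_dim]
        h, _ = self.bilstm(bins)         # [batch, num_bins, 2*hidden]
        return self.classifier(h)        # per-bin logits

# toy usage: one stream of 90 one-minute bins with 256-d bin encodings
tagger = StreamTagger()
logits = tagger(torch.randn(1, 90, 256))
labels = logits.argmax(-1)               # 0 = background, 1 = sub-event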

On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems

Title On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems
Authors Gabriel de Souza P. Moreira, Dietmar Jannach, Adilson Marques da Cunha
Abstract News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that fresh articles are constantly published, which should be immediately considered for recommendation. To deal with this item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid approach. Furthermore, we show that the choice of the content encoding can have an impact on the resulting performance.
Tasks Recommendation Systems, Session-Based Recommendations
Published 2019-07-12
URL https://arxiv.org/abs/1907.07629v3
PDF https://arxiv.org/pdf/1907.07629v3.pdf
PWC https://paperswithcode.com/paper/on-the-importance-of-news-content
Repo https://github.com/gabrielspmoreira/chameleon_recsys
Framework tf
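
The hybrid idea can be illustrated with a small PyTorch sketch: each article is represented by a trainable ID embedding concatenated with a pre-computed content embedding, a GRU summarizes the anonymous session, and candidate articles are scored against the session state. This is a generic sketch for intuition, not the authors' CHAMELEON code (which uses TensorFlow); all sizes are assumptions.

# Illustrative PyTorch sketch of a content-aware session-based recommender.
import torch
import torch.nn as nn

class SessionRecommender(nn.Module):
    def __init__(self, num_items, id_dim=64, content_dim=300, hidden=128):
        super().__init__()
        self.id_emb = nn.Embedding(num_items, id_dim)
        self.proj = nn.Linear(id_dim + content_dim, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, hidden)

    def item_repr(self, item_ids, content_vecs):
        # hybrid item representation: trainable ID + fixed content embedding
        return self.proj(torch.cat([self.id_emb(item_ids), content_vecs], -1))

    def forward(self, sess_ids, sess_content, cand_ids, cand_content):
        x = self.item_repr(sess_ids, sess_content)        # [B, T, hidden]
        _, h = self.gru(x)                                 # h: [1, B, hidden]
        session = self.out(h.squeeze(0))                   # [B, hidden]
        cands = self.item_repr(cand_ids, cand_content)     # [B, K, hidden]
        return torch.einsum('bh,bkh->bk', session, cands)  # candidate scores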

DC3 – A Diagnostic Case Challenge Collection for Clinical Decision Support

Title DC3 – A Diagnostic Case Challenge Collection for Clinical Decision Support
Authors Carsten Eickhoff, Floran Gmehlin, Anu V. Patel, Jocelyn Boullier, Hamish Fraser
Abstract In clinical care, obtaining a correct diagnosis is the first step towards successful treatment and, ultimately, recovery. Depending on the complexity of the case, the diagnostic phase can be lengthy and ridden with errors and delays. Such errors have a high likelihood to cause patients severe harm or even lead to their death and are estimated to cost the U.S. healthcare system several hundred billion dollars each year. To avoid diagnostic errors, physicians increasingly rely on diagnostic decision support systems drawing from heuristics, historic cases, textbooks, clinical guidelines and scholarly biomedical literature. The evaluation of such systems, however, is often conducted in an ad-hoc fashion, using non-transparent methodology, and proprietary data. This paper presents DC3, a collection of 31 extremely difficult diagnostic case challenges, manually compiled and solved by clinical experts. For each case, we present a number of temporally ordered physician-generated observations alongside the eventually confirmed true diagnosis. We additionally provide inferred dense relevance judgments for these cases among the PubMed collection of 27 million scholarly biomedical articles.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.08581v1
PDF https://arxiv.org/pdf/1908.08581v1.pdf
PWC https://paperswithcode.com/paper/dc3-a-diagnostic-case-challenge-collection
Repo https://github.com/codiag-public/dc3
Framework none

Deep CNN-based Speech Balloon Detection and Segmentation for Comic Books

Title Deep CNN-based Speech Balloon Detection and Segmentation for Comic Books
Authors David Dubray, Jochen Laubrock
Abstract We develop a method for the automated detection and segmentation of speech balloons in comic books, including their carrier and tails. Our method is based on a deep convolutional neural network that was trained on annotated pages of the Graphic Narrative Corpus. More precisely, we are using a fully convolutional network approach inspired by the U-Net architecture, combined with a VGG-16 based encoder. The trained model delivers state-of-the-art performance with an F1-score of over 0.94. Qualitative results suggest that wiggly tails, curved corners, and even illusory contours do not pose a major problem. Furthermore, the model has learned to distinguish speech balloons from captions. We compare our model to earlier results and discuss some possible applications.
Tasks
Published 2019-02-21
URL http://arxiv.org/abs/1902.08137v1
PDF http://arxiv.org/pdf/1902.08137v1.pdf
PWC https://paperswithcode.com/paper/deep-cnn-based-speech-balloon-detection-and
Repo https://github.com/DRDRD18/balloons
Framework tf
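
A minimal PyTorch sketch of the described setup follows: a fully convolutional, U-Net-style model with a VGG-16 encoder and one skip connection. The layer cut points and the decoder are assumptions, not the authors' exact model (whose released code is TensorFlow).

# Illustrative PyTorch sketch of a U-Net-style balloon segmenter with a VGG-16 encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class BalloonSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.enc1 = vgg[:17]    # through pool3 -> 256 channels at 1/8 resolution
        self.enc2 = vgg[17:24]  # through pool4 -> 512 channels at 1/16 resolution
        self.up = nn.ConvTranspose2d(512, 256, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 1, 1))   # 1-channel balloon mask

    def forward(self, x):                  # x: [B, 3, H, W]
        f1 = self.enc1(x)                  # skip-connection features
        f2 = self.enc2(f1)
        d = torch.cat([self.up(f2), f1], dim=1)
        mask = self.dec(d)                 # mask logits at 1/8 resolution
        return F.interpolate(mask, size=x.shape[2:], mode='bilinear',
                             align_corners=False)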

MoEL: Mixture of Empathetic Listeners

Title MoEL: Mixture of Empathetic Listeners
Authors Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung
Abstract Previous research on empathetic dialogue systems has mostly focused on generating responses given certain emotions. However, being empathetic not only requires the ability to generate emotional responses, but more importantly, requires the understanding of user emotions and replying appropriately. In this paper, we propose a novel end-to-end approach for modeling empathy in dialogue systems: Mixture of Empathetic Listeners (MoEL). Our model first captures the user emotions and outputs an emotion distribution. Based on this, MoEL will softly combine the output states of the appropriate Listener(s), which are each optimized to react to certain emotions, and generate an empathetic response. Human evaluations on the empathetic-dialogues (Rashkin et al., 2018) dataset confirm that MoEL outperforms a multitask training baseline in terms of empathy, relevance, and fluency. Furthermore, the case study on generated responses of different Listeners shows high interpretability of our model.
Tasks
Published 2019-08-21
URL https://arxiv.org/abs/1908.07687v1
PDF https://arxiv.org/pdf/1908.07687v1.pdf
PWC https://paperswithcode.com/paper/190807687
Repo https://github.com/HLTCHKUST/MoEL
Framework pytorch
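
The core mechanism, softly combining listener outputs with the predicted emotion distribution, can be sketched in a few lines of PyTorch. The real model uses transformer decoders as listeners; the linear "listeners", sizes, and vocabulary below are placeholders for illustration.

# Illustrative PyTorch sketch of the soft combination of listeners.
import torch
import torch.nn as nn

class MixtureOfListeners(nn.Module):
    def __init__(self, d_model=300, num_emotions=32, vocab=20000):
        super().__init__()
        self.emotion_clf = nn.Linear(d_model, num_emotions)
        # one lightweight stand-in "listener" per emotion
        self.listeners = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_emotions)])
        self.generator = nn.Linear(d_model, vocab)

    def forward(self, context_state):                        # [B, d_model]
        p = self.emotion_clf(context_state).softmax(-1)      # emotion distribution [B, K]
        outs = torch.stack([l(context_state) for l in self.listeners], 1)  # [B, K, d]
        mixed = (p.unsqueeze(-1) * outs).sum(1)              # soft combination of listeners
        return p, self.generator(mixed)                      # emotion dist, token logits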

A New Distribution on the Simplex with Auto-Encoding Applications

Title A New Distribution on the Simplex with Auto-Encoding Applications
Authors Andrew Stirn, Tony Jebara, David A Knowles
Abstract We construct a new distribution for the simplex using the Kumaraswamy distribution and an ordered stick-breaking process. We explore and develop the theoretical properties of this new distribution and prove that it exhibits symmetry under the same conditions as the well-known Dirichlet. Like the Dirichlet, the new distribution is adept at capturing sparsity but, unlike the Dirichlet, has an exact and closed-form reparameterization, making it well suited for deep variational Bayesian modeling. We demonstrate the distribution’s utility in a variety of semi-supervised auto-encoding tasks. In all cases, the resulting models achieve competitive performance commensurate with their simplicity, use of explicit probability models, and abstinence from adversarial training.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12052v3
PDF https://arxiv.org/pdf/1905.12052v3.pdf
PWC https://paperswithcode.com/paper/a-new-distribution-on-the-simplex-with-auto
Repo https://github.com/astirn/MV-Kumaraswamy
Framework tf
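
A short PyTorch sketch of the reparameterized stick-breaking construction on the simplex using Kumaraswamy sticks, sampled through the closed-form inverse CDF. The paper additionally orders the stick-breaking; this plain version is a simplification for intuition.

# Illustrative sketch: reparameterized Kumaraswamy stick-breaking on the simplex.
import torch

def kumaraswamy_rsample(a, b):
    """Reparameterized Kumaraswamy(a, b) sample via the inverse CDF."""
    u = torch.rand_like(a)
    return (1.0 - (1.0 - u).pow(1.0 / b)).pow(1.0 / a)

def stick_breaking_simplex(a, b):
    """a, b: positive parameters of shape [..., K-1] -> probability vector over K categories."""
    v = kumaraswamy_rsample(a, b)                        # stick fractions in (0, 1)
    ones = torch.ones_like(v[..., :1])
    remaining = torch.cumprod(torch.cat([ones, 1.0 - v], dim=-1), dim=-1)
    pieces = torch.cat([v, ones], dim=-1) * remaining    # last stick takes the remainder
    return pieces                                        # sums to 1 along the last dim

probs = stick_breaking_simplex(torch.full((4,), 1.0), torch.full((4,), 3.0))
print(probs, probs.sum())   # a probability vector over 5 categories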

A Closed-form Solution to Universal Style Transfer

Title A Closed-form Solution to Universal Style Transfer
Authors Ming Lu, Hao Zhao, Anbang Yao, Yurong Chen, Feng Xu, Li Zhang
Abstract Universal style transfer tries to explicitly minimize the losses in feature space, so it does not require training on any pre-defined styles. It usually uses different layers of a VGG network as the encoders and trains several decoders to invert the features into images. Therefore, the effect of style transfer is achieved by feature transform. Although plenty of methods have been proposed, a theoretical analysis of feature transform is still missing. In this paper, we first propose a novel interpretation by treating it as an optimal transport problem. Then, we demonstrate the relations of our formulation with former works like Adaptive Instance Normalization (AdaIN) and Whitening and Coloring Transform (WCT). Finally, we derive a closed-form solution named Optimal Style Transfer (OST) under our formulation by additionally considering the content loss of Gatys. Comparatively, our solution can preserve better structure and achieve visually pleasing results. It is simple yet effective and we demonstrate its advantages both quantitatively and qualitatively. Besides, we hope our theoretical analysis can inspire future works in neural style transfer. Code is available at https://github.com/lu-m13/OptimalStyleTransfer.
Tasks Style Transfer
Published 2019-06-03
URL https://arxiv.org/abs/1906.00668v2
PDF https://arxiv.org/pdf/1906.00668v2.pdf
PWC https://paperswithcode.com/paper/190600668
Repo https://github.com/lu-m13/OptimalStyleTransfer
Framework pytorch
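
For intuition about the optimal-transport view, the sketch below applies the closed-form OT map between Gaussian feature statistics to move content features toward style statistics. Note this is the plain Gaussian OT transform underlying WCT-style methods; the paper's OST solution additionally accounts for the content loss, so its final formula differs.

# Illustrative sketch: closed-form OT map between Gaussian feature statistics.
import torch

def sqrtm(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = torch.linalg.eigh(mat)
    return vecs @ torch.diag(vals.clamp(min=0).sqrt()) @ vecs.T

def gaussian_ot_transfer(fc, fs, eps=1e-5):
    """fc, fs: [C, N] content / style feature maps flattened over space."""
    mu_c, mu_s = fc.mean(1, keepdim=True), fs.mean(1, keepdim=True)
    cc = (fc - mu_c) @ (fc - mu_c).T / fc.shape[1] + eps * torch.eye(fc.shape[0])
    cs = (fs - mu_s) @ (fs - mu_s).T / fs.shape[1] + eps * torch.eye(fs.shape[0])
    cc_half = sqrtm(cc)
    cc_half_inv = torch.linalg.inv(cc_half)
    # T = Cc^{-1/2} (Cc^{1/2} Cs Cc^{1/2})^{1/2} Cc^{-1/2}
    t = cc_half_inv @ sqrtm(cc_half @ cs @ cc_half) @ cc_half_inv
    return t @ (fc - mu_c) + mu_s       # transferred features, same shape as fc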

Active Learning in the Overparameterized and Interpolating Regime

Title Active Learning in the Overparameterized and Interpolating Regime
Authors Mina Karzand, Robert D. Nowak
Abstract Overparameterized models that interpolate training data often display surprisingly good generalization properties. Specifically, minimum norm solutions have been shown to generalize well in the overparameterized, interpolating regime. This paper introduces a new framework for active learning based on the notion of minimum norm interpolators. We analytically study its properties and behavior in the kernel-based setting and present experimental studies with kernel methods and neural networks. In general, active learning algorithms adaptively select examples for labeling that (1) rule out as many (incompatible) classifiers as possible at each step and/or (2) discover cluster structure in unlabeled data and label representative examples from each cluster. We show that our new active learning approach based on a minimum norm heuristic automatically exploits both these strategies.
Tasks Active Learning
Published 2019-05-29
URL https://arxiv.org/abs/1905.12782v1
PDF https://arxiv.org/pdf/1905.12782v1.pdf
PWC https://paperswithcode.com/paper/active-learning-in-the-overparameterized-and
Repo https://github.com/jackhwolf/Overparameterized-AL-NN
Framework pytorch
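
One way to read the minimum-norm heuristic in a kernel setting is sketched below: for each unlabeled candidate, compute the norm of the minimum-norm interpolant under either possible label and query the candidate whose smaller of the two norms is largest. This is a hedged reconstruction for intuition only; the kernel, the selection rule, and all names are assumptions, not necessarily the paper's exact criterion.

# Illustrative sketch of a minimum-norm-based query rule with a kernel interpolator.
import numpy as np

def rbf(a, b, gamma=1.0):
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def min_norm_sq(X, y, reg=1e-8):
    K = rbf(X, X) + reg * np.eye(len(X))
    return float(y @ np.linalg.solve(K, y))          # squared norm of the interpolant

def select_query(X_lab, y_lab, X_pool):
    scores = []
    for x in X_pool:
        X_aug = np.vstack([X_lab, x])
        norms = [min_norm_sq(X_aug, np.append(y_lab, lab)) for lab in (-1.0, 1.0)]
        scores.append(min(norms))                     # "costly" under either label
    return int(np.argmax(scores))                     # index into X_pool to label next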

A Closest Point Proposal for MCMC-based Probabilistic Surface Registration

Title A Closest Point Proposal for MCMC-based Probabilistic Surface Registration
Authors Dennis Madsen, Andreas Morel-Forster, Patrick Kahr, Dana Rahbani, Thomas Vetter, Marcel Lüthi
Abstract In this paper, we propose a non-rigid surface registration algorithm that estimates the correspondence uncertainty using the Markov chain Monte Carlo (MCMC) framework. The estimated uncertainty of the inferred registration is important for many applications, such as surgical planning or missing data reconstruction. The Metropolis-Hastings (MH) algorithm we use decouples the inference from modelling the posterior using a propose-and-verify scheme. The widely used random sampling strategy leads to slow convergence rates in high-dimensional spaces. In order to overcome this limitation, we introduce an informed probabilistic proposal based on ICP that can be used within the MH algorithm. While the ICP algorithm is used in the inference algorithm, the likelihood can be chosen independently. We showcase different surface distance measures, such as the traditional Euclidean norm and the Hausdorff distance. While quantifying the uncertainty of the correspondence, we also experimentally verify that our method is more robust than the non-rigid ICP algorithm and provides more accurate surface registrations. In a reconstruction task, we show how our probabilistic framework can be used to estimate the posterior distribution of missing data without assuming a fixed point-to-point correspondence. We have made our registration framework publicly available for the community.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01414v1
PDF https://arxiv.org/pdf/1907.01414v1.pdf
PWC https://paperswithcode.com/paper/a-closest-point-proposal-for-mcmc-based
Repo https://github.com/unibas-gravis/icp-proposal
Framework none
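
The propose-and-verify scheme reduces to a standard Metropolis-Hastings step with an asymmetric, informed proposal. The sketch below shows only that step; propose(), proposal_logpdf(), and log_posterior() are hypothetical placeholders for the ICP-based proposal and the chosen surface-distance likelihood.

# Illustrative sketch: one MH step with an informed, asymmetric proposal.
# Convention: proposal_logpdf(x, given) = log q(x | given).
import math, random

def mh_step(theta, log_posterior, propose, proposal_logpdf):
    theta_new = propose(theta)                          # e.g. one ICP-style update + noise
    log_alpha = (log_posterior(theta_new) - log_posterior(theta)
                 + proposal_logpdf(theta, theta_new)    # reverse move q(theta | theta_new)
                 - proposal_logpdf(theta_new, theta))   # forward move q(theta_new | theta)
    if math.log(random.random()) < log_alpha:
        return theta_new, True                          # accepted
    return theta, False                                 # rejected, keep current state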

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

Title Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition
Authors Yaman Kumar, Dhruva Sahrawat, Shubham Maheshwari, Debanjan Mahata, Amanda Stent, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann
Abstract Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.
Tasks Speech Recognition, Visual Speech Recognition, Zero-Shot Learning
Published 2019-01-29
URL https://arxiv.org/abs/1901.10139v4
PDF https://arxiv.org/pdf/1901.10139v4.pdf
PWC https://paperswithcode.com/paper/harnessing-gans-for-addition-of-new-classes
Repo https://github.com/midas-research/DECA
Framework pytorch

Proposed Guidelines for the Responsible Use of Explainable Machine Learning

Title Proposed Guidelines for the Responsible Use of Explainable Machine Learning
Authors Patrick Hall, Navdeep Gill, Nicholas Schmidt
Abstract Explainable machine learning (ML) enables human learning from ML, human appeal of automated model decisions, regulatory compliance, and security audits of ML models. Explainable ML (i.e. explainable artificial intelligence or XAI) has been implemented in numerous open source and commercial packages and explainable ML is also an important, mandatory, or embedded aspect of commercial predictive modeling in industries like financial services. However, like many technologies, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing or scaffolding, and for other malevolent purposes like stealing models and sensitive training data. To promote best-practice discussions for this already in-flight technology, this short text presents internal definitions and a few examples before covering the proposed guidelines. This text concludes with a seemingly natural argument for the use of interpretable models and explanatory, debugging, and disparate impact testing methods in life- or mission-critical ML systems.
Tasks
Published 2019-06-08
URL https://arxiv.org/abs/1906.03533v3
PDF https://arxiv.org/pdf/1906.03533v3.pdf
PWC https://paperswithcode.com/paper/guidelines-for-responsible-and-human-centered
Repo https://github.com/jphall663/interpretable_machine_learning_with_python
Framework none

Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction

Title Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction
Authors Wentao Ouyang, Xiuwu Zhang, Li Li, Heng Zou, Xin Xing, Zhaojie Liu, Yanlong Du
Abstract Click-through rate (CTR) prediction is a critical task in online advertising systems. A large body of research considers each ad independently, but ignores its relationship to other ads that may impact the CTR. In this paper, we investigate various types of auxiliary ads for improving the CTR prediction of the target ad. In particular, we explore auxiliary ads from two viewpoints: one is from the spatial domain, where we consider the contextual ads shown above the target ad on the same page; the other is from the temporal domain, where we consider historically clicked and unclicked ads of the user. The intuitions are that ads shown together may influence each other, clicked ads reflect a user’s preferences, and unclicked ads may indicate what a user dislikes to a certain extent. In order to effectively utilize these auxiliary data, we propose the Deep Spatio-Temporal neural Networks (DSTNs) for CTR prediction. Our model is able to learn the interactions between each type of auxiliary data and the target ad, to emphasize more important hidden information, and to fuse heterogeneous data in a unified framework. Offline experiments on one public dataset and two industrial datasets show that DSTNs outperform several state-of-the-art methods for CTR prediction. We have deployed the best-performing DSTN in Shenma Search, which is the second largest search engine in China. The A/B test results show that the online CTR is also significantly improved compared to our last serving model.
Tasks Click-Through Rate Prediction
Published 2019-06-10
URL https://arxiv.org/abs/1906.03776v2
PDF https://arxiv.org/pdf/1906.03776v2.pdf
PWC https://paperswithcode.com/paper/deep-spatio-temporal-neural-networks-for
Repo https://github.com/oywtece/dstn
Framework tf
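
The idea of fusing the target ad with spatial (contextual) and temporal (clicked/unclicked) auxiliary ads can be sketched as below, using simple mean pooling per auxiliary field followed by an MLP. This is an illustrative PyTorch sketch, not the released TensorFlow DSTN, and all field sizes are assumptions.

# Illustrative PyTorch sketch: target ad + pooled auxiliary ad fields -> CTR.
import torch
import torch.nn as nn

class CTRModel(nn.Module):
    def __init__(self, num_ads, emb=32):
        super().__init__()
        self.emb = nn.Embedding(num_ads, emb)
        self.mlp = nn.Sequential(nn.Linear(4 * emb, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def pool(self, ids):                  # ids: [B, K] padded id lists
        return self.emb(ids).mean(dim=1)  # simple mean pooling (ignores padding for brevity)

    def forward(self, target, context, clicked, unclicked):
        x = torch.cat([self.emb(target),        # target ad embedding [B, emb]
                       self.pool(context),      # spatial: contextual ads on the page
                       self.pool(clicked),      # temporal: historically clicked ads
                       self.pool(unclicked)],   # temporal: historically unclicked ads
                      dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # predicted CTR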

Unsupervised Object Segmentation by Redrawing

Title Unsupervised Object Segmentation by Redrawing
Authors Mickaël Chen, Thierry Artières, Ludovic Denoyer
Abstract Object segmentation is a crucial problem that is usually solved by using supervised learning approaches over very large datasets composed of both images and corresponding object masks. Since the masks have to be provided at pixel level, building such a dataset for any new domain can be very time-consuming. We present ReDO, a new model able to extract objects from images without any annotation in an unsupervised way. It relies on the idea that it should be possible to change the textures or colors of the objects without changing the overall distribution of the dataset. Following this assumption, our approach is based on an adversarial architecture where the generator is guided by an input sample: given an image, it extracts the object mask, then redraws a new object at the same location. The generator is controlled by a discriminator that ensures that the distribution of generated images is aligned to the original one. We experiment with this method on different datasets and demonstrate the good quality of extracted masks.
Tasks Semantic Segmentation
Published 2019-05-27
URL https://arxiv.org/abs/1905.13539v4
PDF https://arxiv.org/pdf/1905.13539v4.pdf
PWC https://paperswithcode.com/paper/190513539
Repo https://github.com/mickaelChen/ReDO
Framework pytorch
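
The "redrawing" composition at the heart of the approach is easy to write down: a mask network segments the object, a generator redraws the object region from noise, and the composite is sent to a discriminator that compares it with real images. The sketch below shows only the compositing step; the stand-in networks are placeholders.

# Illustrative sketch of the ReDO-style mask-and-redraw composition.
import torch

def redraw(image, mask_net, generator, z):
    mask = torch.sigmoid(mask_net(image))             # [B, 1, H, W] soft object mask
    new_object = generator(z, mask)                    # redrawn object content
    composite = mask * new_object + (1 - mask) * image
    return composite, mask

# toy usage with stand-in networks, just to check shapes
img = torch.rand(2, 3, 64, 64)
mask_net = torch.nn.Conv2d(3, 1, 3, padding=1)
gen = lambda z, m: torch.rand_like(img)               # stand-in generator
out, m = redraw(img, mask_net, gen, z=torch.randn(2, 32))

# Training intuition (GAN-style): the discriminator pushes D(real) up and
# D(composite) down; the mask net and generator push D(composite) up, which
# only succeeds if the mask isolates a real object that can be redrawn.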

Quantization Networks

Title Quantization Networks
Authors Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua
Abstract Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network into a low-bitwidth integer version, has been an active and promising research topic. Existing methods formulate the low-bit quantization of networks as an approximation or optimization problem. Approximation-based methods confront the gradient mismatch problem, while optimization-based methods are only suitable for quantizing weights and could introduce high computational cost in the training stage. In this paper, we propose a novel perspective of interpreting and implementing neural network quantization by formulating low-bit quantization as a differentiable non-linear function (termed quantization function). The proposed quantization function can be learned in a lossless and end-to-end manner and works for any weights and activations of neural networks in a simple and uniform way. Extensive experiments on image classification and object detection tasks show that our quantization networks outperform the state-of-the-art methods. We believe that the proposed method will offer new insights into the interpretation of neural network quantization. Our code is available at https://github.com/aliyun/alibabacloud-quantization-networks.
Tasks Image Classification, Object Detection, Quantization
Published 2019-11-21
URL https://arxiv.org/abs/1911.09464v2
PDF https://arxiv.org/pdf/1911.09464v2.pdf
PWC https://paperswithcode.com/paper/quantization-networks-1
Repo https://github.com/aliyun/alibabacloud-quantization-networks
Framework pytorch
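
The differentiable quantization function can be illustrated as a soft staircase, a sum of temperature-controlled sigmoids that approaches a hard step function as the temperature grows. This follows the spirit of the paper's formulation; the exact learnable parameterization and training schedule in the paper differ.

# Illustrative sketch of a soft, temperature-controlled staircase quantizer.
import torch

def soft_quantize(x, thresholds, step=1.0, temperature=10.0):
    """Map x to roughly step * (number of thresholds below x), differentiably."""
    # thresholds: 1-D tensor of quantization boundaries, e.g. [-0.5, 0.5]
    steps = torch.sigmoid(temperature * (x.unsqueeze(-1) - thresholds))
    return step * steps.sum(-1)

x = torch.linspace(-2, 2, 9)
print(soft_quantize(x, torch.tensor([-0.5, 0.5]), temperature=50.0))
# with a high temperature the output is close to {0, 1, 2}, i.e. a hard staircase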