Paper Group AWR 64
Detecting Overfitting of Deep Generative Networks via Latent Recovery. Training Neural Response Selection for Task-Oriented Dialogue Systems. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). Scalable Gaussian Process Regression for Kernels with a Non-Stationary Phase. Planted Hitting Set Recovery in Hypergraphs. Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax. Digital phase-only holography using deep conditional generative models. PolarMask: Single Shot Instance Segmentation with Polar Representation. EmbedMask: Embedding Coupling for One-stage Instance Segmentation. Parallel fault-tolerant programming of an arbitrary feedforward photonic network. PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification. “Does 4-4-2 exist?” – An Analytics Approach to Understand and Classify Football Team Formations in Single Match Situations. Causal inference using Bayesian non-parametric quasi-experimental design. Model-based clustering in very high dimensions via adaptive projections. Inexact Block Coordinate Descent Algorithms for Nonsmooth Nonconvex Optimization.
Detecting Overfitting of Deep Generative Networks via Latent Recovery
Title | Detecting Overfitting of Deep Generative Networks via Latent Recovery |
Authors | Ryan Webster, Julien Rabin, Loic Simon, Frederic Jurie |
Abstract | State-of-the-art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. This is why it is not uncommon to include visualizations of training set nearest neighbors, to suggest that generated images are not simply memorized. We demonstrate that this is not sufficient, and that it motivates studying the memorization/overfitting of deep generators with more scrutiny. This paper addresses this question by i) showing how simple losses are highly effective at reconstructing images for deep generators; and ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses. |
Tasks | Facial Inpainting, Super-Resolution |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.03396v1 |
http://arxiv.org/pdf/1901.03396v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-overfitting-of-deep-generative |
Repo | https://github.com/ryanwebster90/gen-overfitting-latent-recovery |
Framework | pytorch |
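A minimal sketch of the latent-recovery idea, assuming a pretrained PyTorch generator `G` that maps a latent vector to an image; the loss, optimizer, and step count are illustrative choices, not the authors' exact settings:

```python
import torch

def latent_recovery(G, target, z_dim=128, steps=500, lr=0.05):
    """Recover a latent vector z such that G(z) approximates `target`.

    G      : pretrained generator, mapping (1, z_dim) -> image tensor
    target : image tensor with the same shape as G's output
    """
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((G(z) - target) ** 2)  # simple pixel-space recovery loss
        loss.backward()
        opt.step()
    return z.detach(), loss.item()
```

The overfitting test then compares the distribution of final recovery errors on training images against held-out images: a memorizing generator reconstructs its training images markedly better.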
Training Neural Response Selection for Task-Oriented Dialogue Systems
Title | Training Neural Response Selection for Task-Oriented Dialogue Systems |
Authors | Matthew Henderson, Ivan Vulić, Daniela Gerz, Iñigo Casanueva, Paweł Budzianowski, Sam Coope, Georgios Spithourakis, Tsung-Hsien Wen, Nikola Mrkšić, Pei-Hao Su |
Abstract | Despite their popularity in the chatbot literature, retrieval-based models have had modest impact on task-oriented dialogue systems, with the main obstacle to their application being the low-data regime of most task-oriented dialogue tasks. Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue. To train response selection models for task-oriented dialogue tasks, we propose a novel method which: 1) pretrains the response selection model on large general-domain conversational corpora; and then 2) fine-tunes the pretrained model for the target dialogue domain, relying only on the small in-domain dataset to capture the nuances of the given dialogue domain. Our evaluation on six diverse application domains, ranging from e-commerce to banking, demonstrates the effectiveness of the proposed training method. |
Tasks | Chatbot, Language Modelling, Task-Oriented Dialogue Systems |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01543v2 |
https://arxiv.org/pdf/1906.01543v2.pdf | |
PWC | https://paperswithcode.com/paper/training-neural-response-selection-for-task |
Repo | https://github.com/avidale/arxivarius |
Framework | none |
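The pretrain-then-fine-tune recipe is typically built around a dual-encoder scorer. A hedged sketch assuming context and response encoders that produce fixed-size vectors; the dot-product scoring and in-batch negatives below are common choices, not necessarily the paper's exact model:

```python
import torch
import torch.nn.functional as F

def response_selection_loss(context_vecs, response_vecs):
    """In-batch softmax loss for response selection.

    context_vecs, response_vecs : (B, D) encodings of B (context, response)
    pairs; the other B-1 responses in the batch act as negatives.
    """
    scores = context_vecs @ response_vecs.t()   # (B, B) similarity matrix
    labels = torch.arange(scores.size(0))       # gold response is the diagonal
    return F.cross_entropy(scores, labels)

# Pretrain this loss on a large general-domain conversational corpus, then
# fine-tune the same model on the small in-domain dialogue dataset.
```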
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Title | Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain) |
Authors | Mariya Toneva, Leila Wehbe |
Abstract | Neural network models for NLP are typically implemented without the explicit encoding of language rules, and yet they are able to break one performance record after another. This has generated a lot of research interest in interpreting the representations learned by these networks. We propose here a novel interpretation approach that relies on the only processing system we have that does understand language: the human brain. We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from four recent NLP models: ELMo, USE, BERT and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type. Our results reveal differences in the context-related representations across these models. Further, in the transformer models, we find an interaction between layer depth and context length, and between layer depth and attention type. We finally hypothesize that altering BERT to better align with brain recordings would enable it to also better understand language. Probing the altered BERT using syntactic NLP tasks reveals that the model with increased brain-alignment outperforms the original model. Cognitive neuroscientists have already begun using NLP networks to study the brain, and this work closes the loop to allow the interaction between NLP and cognitive neuroscience to be a true cross-pollination. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11833v4 |
https://arxiv.org/pdf/1905.11833v4.pdf | |
PWC | https://paperswithcode.com/paper/interpreting-and-improving-natural-language |
Repo | https://github.com/mtoneva/brain_language_nlp |
Framework | pytorch |
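The interpretation pipeline rests on encoding models: predicting brain recordings from a network layer's representations and scoring the predictions on held-out data. A minimal sketch assuming time-aligned stimulus embeddings and recordings; ridge regression and the shapes below are assumptions, not the authors' exact analysis:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def embedding_to_brain_score(embeddings, brain, alpha=1.0, cv=5):
    """Score how well NLP-model embeddings predict brain recordings.

    embeddings : (n_timepoints, n_features) layer representations of the text
    brain      : (n_timepoints, n_voxels) imaging recordings for the same stimuli
    """
    scores = []
    for v in range(brain.shape[1]):
        r2 = cross_val_score(Ridge(alpha=alpha), embeddings, brain[:, v], cv=cv)
        scores.append(r2.mean())
    return np.array(scores)   # one predictivity score per voxel/sensor
```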
Scalable Gaussian Process Regression for Kernels with a Non-Stationary Phase
Title | Scalable Gaussian Process Regression for Kernels with a Non-Stationary Phase |
Authors | Jan Graßhoff, Alexandra Jankowski, Philipp Rostalski |
Abstract | The application of Gaussian processes (GPs) to large data sets is limited due to heavy memory and computational requirements. A variety of methods have been proposed to enable scalability, one of which is to exploit structure in the kernel matrix. Previous methods, however, cannot easily deal with non-stationary processes. This paper presents an efficient GP framework that extends structured kernel interpolation methods to GPs with a non-stationary phase. We particularly treat mixtures of non-stationary processes, which are commonly used in the context of separation problems, e.g., in biomedical signal processing. Our approach employs multiple sets of non-equidistant inducing points to account for the non-stationarity and to recover Toeplitz and Kronecker structure in the kernel matrix, allowing for efficient inference. Kernel learning is done by optimizing the marginal likelihood, which can be approximated efficiently using stochastic trace estimation methods. Our approach is demonstrated on numerical examples and large biomedical datasets. |
Tasks | Gaussian Processes |
Published | 2019-12-25 |
URL | https://arxiv.org/abs/1912.11713v1 |
https://arxiv.org/pdf/1912.11713v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-gaussian-process-regression-for |
Repo | https://github.com/ime-luebeck/non-stationary-phase-gp-mod |
Framework | none |
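The stochastic trace estimation step can be illustrated generically. A sketch of the Hutchinson estimator, which approximates a trace using only matrix-vector products; the probe count is an arbitrary choice, and in the structured-kernel setting the matvec itself is what the Toeplitz/Kronecker structure makes fast:

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes=30, rng=None):
    """Estimate trace(A) using only matrix-vector products with A.

    matvec     : function v -> A @ v for an n x n matrix A
                 (e.g. K^{-1} dK/dtheta in marginal-likelihood gradients)
    num_probes : number of random Rademacher probe vectors
    """
    rng = np.random.default_rng(rng)
    estimate = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        estimate += z @ matvec(z)             # unbiased estimate of trace(A)
    return estimate / num_probes
```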
Planted Hitting Set Recovery in Hypergraphs
Title | Planted Hitting Set Recovery in Hypergraphs |
Authors | Ilya Amburg, Jon Kleinberg, Austin R. Benson |
Abstract | In various application areas, networked data is collected by measuring interactions involving some specific set of core nodes. This results in a network dataset containing the core nodes along with a potentially much larger set of fringe nodes that all have at least one interaction with a core node. In many settings, this type of data arises for structures that are richer than graphs, because they involve the interactions of larger sets; for example, the core nodes might be a set of individuals under surveillance, where we observe the attendees of meetings involving at least one of the core individuals. We model such scenarios using hypergraphs, and we study the problem of core recovery: if we observe the hypergraph but not the labels of core and fringe nodes, can we recover the “planted” set of core nodes in the hypergraph? We provide a theoretical framework for analyzing the recovery of such a set of core nodes and use our theory to develop a practical and scalable algorithm for core recovery. The crux of our analysis and algorithm is that the core nodes are a hitting set of the hypergraph, meaning that every hyperedge has at least one node in the set of core nodes. We demonstrate the efficacy of our algorithm on a number of real-world datasets, outperforming competitive baselines derived from network centrality and core-periphery measures. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05839v1 |
https://arxiv.org/pdf/1905.05839v1.pdf | |
PWC | https://paperswithcode.com/paper/planted-hitting-set-recovery-in-hypergraphs |
Repo | https://github.com/ilyaamburg/Hypergraph-Planted-Hitting-Set-Recovery |
Framework | none |
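The hitting-set structure is easy to illustrate. A minimal greedy hitting-set heuristic as a baseline sketch; the paper's algorithm, built around unions of minimal hitting sets, is more refined:

```python
def greedy_hitting_set(hyperedges):
    """Greedy heuristic: repeatedly pick the node covering the most
    currently-unhit hyperedges, until every hyperedge contains a chosen node."""
    uncovered = [set(e) for e in hyperedges]
    chosen = set()
    while uncovered:
        counts = {}
        for e in uncovered:
            for v in e:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)      # node hitting the most edges
        chosen.add(best)
        uncovered = [e for e in uncovered if best not in e]
    return chosen

# Example: meetings as hyperedges; recover a small set hitting all of them.
print(greedy_hitting_set([{1, 2}, {2, 3, 4}, {4, 5}]))  # e.g. {2, 4}
```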
Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
Title | Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax |
Authors | Andres Potapczynski, Gabriel Loaiza-Ganem, John P. Cunningham |
Abstract | The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. Unfortunately, we show that the cost of this aesthetic interpretability is material: the temperature hyperparameter must be set too high, KL estimates are noisy, and as a result, performance suffers. We circumvent these issues by proposing a much simpler and more flexible reparameterizable family of distributions that transforms Gaussian noise into a one-hot approximation through an invertible function. This invertible function is composed of a modified softmax and can incorporate diverse transformations that serve different specific purposes. For example, the stick-breaking procedure allows us to extend the reparameterization trick to distributions with countably infinite support, while normalizing flows let us increase the flexibility of the distribution. Our construction improves numerical stability and outperforms the Gumbel-Softmax in a variety of experiments while generating samples that are closer to their discrete counterparts and achieving lower-variance gradients. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09588v2 |
https://arxiv.org/pdf/1912.09588v2.pdf | |
PWC | https://paperswithcode.com/paper/invertible-gaussian-reparameterization |
Repo | https://github.com/cunningham-lab/igr |
Framework | tf |
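The basic construction can be sketched as follows: sample Gaussian noise and push it through an invertible map onto the simplex. Here the map is a temperature-scaled softmax with a zero logit appended, making it invertible from R^{K-1} to the simplex interior; the paper's modified softmax differs in its details, so treat this as an illustration only:

```python
import torch

def invertible_gaussian_sample(mu, log_sigma, tau=0.5):
    """Sample a relaxed K-category one-hot vector from Gaussian noise.

    mu, log_sigma : (K-1,) parameters of the underlying Gaussian
    tau           : softmax temperature; appending a fixed zero logit keeps
                    the map from R^{K-1} to the simplex interior invertible
    """
    eps = torch.randn_like(mu)                     # reparameterized noise
    y = mu + torch.exp(log_sigma) * eps            # (K-1,) Gaussian sample
    logits = torch.cat([y, torch.zeros(1)]) / tau  # last coordinate pinned to 0
    return torch.softmax(logits, dim=-1)           # point on the simplex
```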
Digital phase-only holography using deep conditional generative models
Title | Digital phase-only holography using deep conditional generative models |
Authors | Jannes Gladrow |
Abstract | Holographic wave-shaping has found numerous applications across the physical sciences, especially since the development of digital spatial-light modulators (SLMs). A key challenge in digital holography consists in finding optimal hologram patterns which transform the incoming laser beam into desired shapes in a conjugate optical plane. The existing repertoire of approaches to solve this inverse problem is built on iterative phase-retrieval algorithms, which do not take optical aberrations and deviations from theoretical models into account. Here, we adopt a physics-free, data-driven, and probabilistic approach to the problem. Using deep conditional generative models such as conditional Generative Adversarial Networks (cGANs) or conditional Variational Autoencoders (cVAEs), we approximate conditional distributions of holograms for a given target laser intensity pattern. In order to reduce the cardinality of the problem, we train our models on a proxy mapping relating an 8x8 matrix of complex-valued spatial-frequency coefficients to the ensuing 100x100-shaped intensity distribution recorded on a camera. We discuss the degree of ‘ill-posedness’ that remains in this reduced problem and compare different generative model architectures in terms of their ability to find holograms that reconstruct given intensity patterns. Finally, we challenge our models to generalise to synthetic target intensities, where the existence of matching holograms cannot be guaranteed. We devise a forward-interpolating training scheme aimed at providing models with the ability to interpolate in laser intensity space, rather than hologram space, and show that this indeed enhances model performance on synthetic data sets. |
Tasks | |
Published | 2019-11-03 |
URL | https://arxiv.org/abs/1911.00904v1 |
https://arxiv.org/pdf/1911.00904v1.pdf | |
PWC | https://paperswithcode.com/paper/digital-phase-only-holography-using-deep |
Repo | https://github.com/JamesGlare/Holo_gen_models |
Framework | tf |
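A schematic of the conditional-generator setup described above, with all layer sizes as placeholders: the model maps a target intensity pattern plus noise to the 8x8 complex coefficients (represented as 128 reals), so repeated sampling with fresh noise explores the set of holograms consistent with one intensity:

```python
import torch
import torch.nn as nn

class HologramGenerator(nn.Module):
    """Maps a flattened (100*100,) target intensity plus noise to 8x8
    complex spatial-frequency coefficients (real + imaginary parts)."""
    def __init__(self, noise_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100 * 100 + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 2 * 8 * 8),   # 128 reals = 64 complex coefficients
        )

    def forward(self, intensity, noise):
        out = self.net(torch.cat([intensity, noise], dim=-1))
        return out.view(-1, 2, 8, 8)     # one sample from p(hologram | intensity)
```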
PolarMask: Single Shot Instance Segmentation with Polar Representation
Title | PolarMask: Single Shot Instance Segmentation with Polar Representation |
Authors | Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo |
Abstract | In this paper, we introduce an anchor-box-free and single-shot instance segmentation method, which is conceptually simple, fully convolutional, and can be used as a mask prediction module for instance segmentation by easily embedding it into most off-the-shelf detection methods. Our method, termed PolarMask, formulates the instance segmentation problem as instance center classification and dense distance regression in polar coordinates. Moreover, we propose two effective approaches to deal with sampling high-quality center examples and optimization for dense distance regression, respectively, which can significantly improve the performance and simplify the training process. Without any bells and whistles, PolarMask achieves 32.9% mask mAP with single-model and single-scale training/testing on the challenging COCO dataset. For the first time, we demonstrate a much simpler and more flexible instance segmentation framework achieving competitive accuracy. We hope that the proposed PolarMask framework can serve as a fundamental and strong baseline for single-shot instance segmentation tasks. Code is available at: github.com/xieenze/PolarMask. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13226v4 |
https://arxiv.org/pdf/1909.13226v4.pdf | |
PWC | https://paperswithcode.com/paper/polarmask-single-shot-instance-segmentation |
Repo | https://github.com/xieenze/PolarMask |
Framework | pytorch |
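The polar representation itself is simple to decode: an instance is a center plus one predicted distance per equally-spaced ray. A sketch of turning those distances back into a contour; 36 rays is a common configuration, assumed here:

```python
import numpy as np

def polar_to_contour(center, distances):
    """Decode a PolarMask-style instance: `center` (x, y) plus one distance
    per equally-spaced ray gives a closed polygon approximating the mask."""
    n = len(distances)
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    xs = center[0] + distances * np.cos(angles)
    ys = center[1] + distances * np.sin(angles)
    return np.stack([xs, ys], axis=1)   # (n, 2) contour vertices

# A constant-distance instance decodes to a regular 36-gon (roughly a circle).
contour = polar_to_contour(center=(50.0, 40.0), distances=np.full(36, 12.0))
```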
EmbedMask: Embedding Coupling for One-stage Instance Segmentation
Title | EmbedMask: Embedding Coupling for One-stage Instance Segmentation |
Authors | Hui Ying, Zhaojin Huang, Shu Liu, Tianjia Shao, Kun Zhou |
Abstract | Current instance segmentation methods can be categorized into segmentation-based methods, which segment first and then do clustering, and proposal-based methods, which detect first and then predict masks for each instance proposal using repooling. In this work, we propose a one-stage method, named EmbedMask, that unifies both by taking advantage of their strengths. Like proposal-based methods, EmbedMask builds on top of detection models, making it strong in detection capability. Meanwhile, EmbedMask applies extra embedding modules to generate embeddings for pixels and proposals, where pixel embeddings are guided by proposal embeddings if they belong to the same instance. Through this embedding coupling process, pixels are assigned to the mask of a proposal if their embeddings are similar. The pixel-level clustering enables EmbedMask to generate high-resolution masks without the loss of detail from repooling, and the existence of proposal embeddings simplifies and strengthens the clustering procedure, achieving high speed with higher performance than segmentation-based methods. Without any bells and whistles, EmbedMask achieves performance comparable to Mask R-CNN, the representative two-stage method, and can produce more detailed masks at a higher speed. Code is available at github.com/yinghdb/EmbedMask. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01954v2 |
https://arxiv.org/pdf/1912.01954v2.pdf | |
PWC | https://paperswithcode.com/paper/embedmask-embedding-coupling-for-one-stage |
Repo | https://github.com/yinghdb/EmbedMask |
Framework | pytorch |
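A sketch of the embedding-coupling step: a pixel is assigned to a proposal's mask when its embedding lies close to the proposal embedding. The Gaussian similarity and threshold below are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def couple_pixels_to_proposal(pixel_emb, proposal_emb, sigma=1.0, thresh=0.5):
    """Binary mask from embedding similarity.

    pixel_emb    : (H, W, D) per-pixel embeddings
    proposal_emb : (D,) embedding of one instance proposal
    """
    d2 = ((pixel_emb - proposal_emb) ** 2).sum(dim=-1)  # squared distance map
    prob = torch.exp(-d2 / (2 * sigma ** 2))            # similarity in (0, 1]
    return prob > thresh                                # pixels joining the mask
```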
Parallel fault-tolerant programming of an arbitrary feedforward photonic network
Title | Parallel fault-tolerant programming of an arbitrary feedforward photonic network |
Authors | Sunil Pai, Ian A. D. Williamson, Tyler W. Hughes, Momchil Minkov, Olav Solgaard, Shanhui Fan, David A. B. Miller |
Abstract | Reconfigurable photonic mesh networks of tunable beamsplitter nodes can linearly transform $N$-dimensional vectors representing input modal amplitudes of light for applications such as energy-efficient machine learning hardware, quantum information processing, and mode demultiplexing. Such photonic meshes are typically programmed and/or calibrated by tuning or characterizing each beam splitter one-by-one, which can be time-consuming and can limit scaling to larger meshes. Here we introduce a graph-topological approach that defines the general class of feedforward networks commonly used in such applications and identifies columns of non-interacting nodes that can be adjusted simultaneously. By virtue of this approach, we can calculate the necessary input vectors to program entire columns of nodes in parallel by simultaneously nullifying the power in one output of each node via optoelectronic feedback onto adjustable phase shifters or couplers. This parallel nullification approach is fault-tolerant to fabrication errors, requiring no prior knowledge or calibration of the node parameters, and can reduce the programming time by a factor of order $N$ to being proportional to the optical depth (or number of node columns in the device). As a demonstration, we simulate our programming protocol on a feedforward optical neural network model trained to classify handwritten digit images from the MNIST dataset with up to 98% validation accuracy. |
Tasks | Calibration |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.06179v1 |
https://arxiv.org/pdf/1909.06179v1.pdf | |
PWC | https://paperswithcode.com/paper/parallel-fault-tolerant-programming-of-an |
Repo | https://github.com/solgaardlab/neurophox |
Framework | tf |
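The graph-topological idea can be sketched by grouping nodes of the feedforward mesh by longest-path depth: two nodes at the same depth can share no directed path, so they are non-interacting and can be nullified in parallel. A pure-Python sketch on a DAG given as an adjacency dict; the paper's column construction is more general:

```python
from collections import defaultdict, deque

def columns_by_depth(edges, nodes):
    """Partition a feedforward (DAG) mesh into columns of mutually
    non-interacting nodes via longest-path depth (Kahn's algorithm)."""
    indeg = {v: 0 for v in nodes}
    for v in edges:
        for w in edges[v]:
            indeg[w] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    depth = {v: 0 for v in nodes}
    while queue:
        v = queue.popleft()
        for w in edges.get(v, ()):
            depth[w] = max(depth[w], depth[v] + 1)  # longest path to w
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    cols = defaultdict(list)
    for v, d in depth.items():
        cols[d].append(v)
    return [cols[d] for d in sorted(cols)]  # each column programmable at once
```

Programming time then scales with the number of columns (the optical depth) rather than with the number of nodes $N$.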
PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification
Title | PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification |
Authors | Yueru Chen, C. -C. Jay Kuo |
Abstract | A new machine learning methodology, called successive subspace learning (SSL), is introduced in this work. SSL contains four key ingredients: 1) successive near-to-far neighborhood expansion; 2) unsupervised dimension reduction via subspace approximation; 3) supervised dimension reduction via label-assisted regression (LAG); and 4) feature concatenation and decision making. An image-based object classification method, called PixelHop, is proposed to illustrate the SSL design. Experimental results show that the PixelHop method outperforms the classic CNN model of similar model complexity on three benchmark datasets (MNIST, Fashion MNIST and CIFAR-10). Although SSL and deep learning (DL) share some high-level concepts, they are fundamentally different in model formulation, training process, and training complexity. Extensive discussion comparing SSL and DL is provided to give further insight into the potential of SSL. |
Tasks | Decision Making, Dimensionality Reduction, Object Classification |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.08190v1 |
https://arxiv.org/pdf/1909.08190v1.pdf | |
PWC | https://paperswithcode.com/paper/pixelhop-a-successive-subspace-learning-ssl |
Repo | https://github.com/yifan-fanyi/Pixelhop |
Framework | none |
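Ingredient 2), unsupervised dimension reduction via subspace approximation, can be sketched for a single hop. Plain PCA below stands in for the Saab-transform variant used in the paper, and the patch size and component count are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.image import extract_patches_2d

def pixelhop_unit(images, patch=3, keep=12):
    """One hop: collect local neighborhoods from grayscale images, then
    approximate them with a learned subspace (PCA here; the paper uses a
    Saab-transform variant with explicit DC/bias handling)."""
    patches = np.concatenate(
        [extract_patches_2d(im, (patch, patch)) for im in images])
    flat = patches.reshape(len(patches), -1)       # (n_patches, patch*patch)
    pca = PCA(n_components=keep).fit(flat)         # subspace approximation
    return pca  # apply pca.transform to neighborhoods of new images
```

Successive hops repeat this on the transformed outputs, expanding the neighborhood from near to far.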
“Does 4-4-2 exist?” – An Analytics Approach to Understand and Classify Football Team Formations in Single Match Situations
Title | “Does 4-4-2 exist?” – An Analytics Approach to Understand and Classify Football Team Formations in Single Match Situations |
Authors | Eric Müller-Budack, Jonas Theiner, Robert Rein, Ralph Ewerth |
Abstract | The chances to win a football match can be significantly increased if the right tactic is chosen and the behavior of the opposing team is well anticipated. For this reason, every professional football club employs a team of game analysts. However, at present, game performance analysis is done manually and is therefore highly time-consuming. Consequently, automated tools to support the analysis process are required. In this context, one of the main tasks is to summarize team formations by patterns such as 4-4-2. In this paper, we introduce an analytics approach that automatically classifies and visualizes the team formation based on the players’ position data. We focus on single match situations instead of complete halftimes or matches to provide a more detailed analysis, examining individual match situations depending on ball possession and match segment length. For this purpose, a visual summary is utilized that condenses the team formation in a match segment. An expert annotation study is conducted that demonstrates 1) the complexity of the task and 2) the usefulness of the visualization of single situations for understanding team formations. The suggested classification approach outperforms existing methods for formation classification. In particular, our approach gives insights into the shortcomings of using patterns like 4-4-2 to describe team formations. |
Tasks | |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1910.00412v1 |
https://arxiv.org/pdf/1910.00412v1.pdf | |
PWC | https://paperswithcode.com/paper/does-4-4-2-exist-an-analytics-approach-to |
Repo | https://github.com/naokiwifruit/classify-football-formation |
Framework | none |
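A toy sketch of the underlying idea: average the outfield players' positions over a match segment and read off a formation string by grouping them into defensive, midfield, and attacking lines. KMeans on the longitudinal coordinate is an illustrative stand-in for the paper's classification approach:

```python
import numpy as np
from sklearn.cluster import KMeans

def formation_string(avg_positions, n_lines=3):
    """avg_positions : (10, 2) mean (x, y) of the outfield players over a
    match segment, with x pointing toward the opponent's goal.
    Returns e.g. '4-4-2' by grouping players into lines along x."""
    x = avg_positions[:, [0]]
    labels = KMeans(n_clusters=n_lines, n_init=10).fit_predict(x)
    order = np.argsort([x[labels == k].mean() for k in range(n_lines)])
    counts = [int((labels == k).sum()) for k in order]  # defense first
    return "-".join(map(str, counts))
```

The paper's point is precisely that such crisp strings often misrepresent real situations, which is why it analyzes single match situations instead of whole halves.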
Causal inference using Bayesian non-parametric quasi-experimental design
Title | Causal inference using Bayesian non-parametric quasi-experimental design |
Authors | Max Hinne, Marcel A. J. van Gerven, Luca Ambrogioni |
Abstract | The de facto standard for causal inference is the randomized controlled trial, where one compares a manipulated group with a control group in order to determine the effect of an intervention. However, this research design is not always realistically possible due to pragmatic or ethical concerns. In these situations, quasi-experimental designs may provide a solution, as these allow for causal conclusions at the cost of additional design assumptions. In this paper, we provide a generic framework for quasi-experimental design using Bayesian model comparison, and we show how it can be used as an alternative to several common research designs. We provide a theoretical motivation for a Gaussian process based approach and demonstrate its convenient use in a number of simulations. Finally, we apply the framework to determine the effect of population-based thresholds for municipality funding in France, the effect of the 2005 smoking ban in Sicily on the number of acute coronary events, and the effect of an alleged historical phantom border in the Netherlands on Dutch voting behaviour. |
Tasks | Causal Inference |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06722v1 |
https://arxiv.org/pdf/1911.06722v1.pdf | |
PWC | https://paperswithcode.com/paper/causal-inference-using-bayesian-non |
Repo | https://github.com/mhinne/BNQD |
Framework | none |
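For a regression-discontinuity-style design, the core step can be sketched as a comparison of marginal likelihoods: one GP fit to all data (no effect) versus independent GPs on either side of the threshold (effect). The kernel below is an assumption; the released BNQD implementation is considerably richer:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def log_bayes_factor(x, y, threshold):
    """log p(y | discontinuous model) - log p(y | continuous model)."""
    kernel = RBF() + WhiteKernel()
    continuous = GaussianProcessRegressor(kernel=kernel).fit(x[:, None], y)
    lml_cont = continuous.log_marginal_likelihood_value_
    lml_disc = 0.0
    for mask in (x < threshold, x >= threshold):   # fit each side separately
        gp = GaussianProcessRegressor(kernel=kernel).fit(x[mask, None], y[mask])
        lml_disc += gp.log_marginal_likelihood_value_
    return lml_disc - lml_cont   # > 0 favours a discontinuity at the threshold
```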
Model-based clustering in very high dimensions via adaptive projections
Title | Model-based clustering in very high dimensions via adaptive projections |
Authors | Bernd Taschler, Frank Dondelinger, Sach Mukherjee |
Abstract | Mixture models are a standard approach to dealing with heterogeneous data with non-i.i.d. structure. However, when the dimension $p$ is large relative to sample size $n$ and where either or both of means and covariances/graphical models may differ between the latent groups, mixture models face statistical and computational difficulties and currently available methods cannot realistically go beyond $p \! \sim \! 10^4$ or so. We propose an approach called Model-based Clustering via Adaptive Projections (MCAP). Instead of estimating mixtures in the original space, we work with a low-dimensional representation obtained by linear projection. The projection dimension itself plays an important role and governs a type of bias-variance tradeoff with respect to recovery of the relevant signals. MCAP sets the projection dimension automatically in a data-adaptive manner, using a proxy for the assignment risk. Combining a full covariance formulation with the adaptive projection allows detection of both mean and covariance signals in very high dimensional problems. We show real-data examples in which covariance signals are reliably detected in problems with $p \! \sim \! 10^4$ or more, and simulations going up to $p = 10^6$. In some examples, MCAP performs well even when the mean signal is entirely removed, leaving differential covariance structure in the high-dimensional space as the only signal. Across a number of regimes, MCAP performs as well or better than a range of existing methods, including a recently-proposed $\ell_1$-penalized approach; and performance remains broadly stable with increasing dimension. MCAP can be run “out of the box” and is fast enough for interactive use on large-$p$ problems using standard desktop computing resources. |
Tasks | |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08472v1 |
http://arxiv.org/pdf/1902.08472v1.pdf | |
PWC | https://paperswithcode.com/paper/model-based-clustering-in-very-high |
Repo | https://github.com/btaschler/mcap |
Framework | none |
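A schematic sketch of the MCAP recipe: project to a candidate low dimension, fit a full-covariance mixture, and pick the dimension via a proxy for the assignment risk. The mean responsibility entropy used below is an illustrative proxy, not the paper's exact criterion:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def mcap_like(X, n_clusters=2, dims=(2, 5, 10, 20)):
    """Cluster an (n, p) matrix X in an adaptively chosen linear projection."""
    best = None
    for q in dims:
        Z = PCA(n_components=q).fit_transform(X)          # linear projection
        gmm = GaussianMixture(n_clusters, covariance_type="full").fit(Z)
        resp = gmm.predict_proba(Z)
        # proxy for assignment risk: mean entropy of cluster responsibilities
        risk = -np.mean(np.sum(resp * np.log(resp + 1e-12), axis=1))
        if best is None or risk < best[0]:
            best = (risk, q, gmm.predict(Z))
    return best   # (proxy risk, chosen dimension, cluster labels)
```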
Inexact Block Coordinate Descent Algorithms for Nonsmooth Nonconvex Optimization
Title | Inexact Block Coordinate Descent Algorithms for Nonsmooth Nonconvex Optimization |
Authors | Yang Yang, Marius Pesavento, Zhi-Quan Luo, Björn Ottersten |
Abstract | In this paper, we propose an inexact block coordinate descent algorithm for large-scale nonsmooth nonconvex optimization problems. At each iteration, a particular block variable is selected and updated by inexactly solving the original optimization problem with respect to that block variable. More precisely, a local approximation of the original optimization problem is solved. The proposed algorithm has several attractive features, namely, i) high flexibility, as the approximation function only needs to be strictly convex and it does not have to be a global upper bound of the original function; ii) fast convergence, as the approximation function can be designed to exploit the problem structure at hand and the stepsize is calculated by the line search; iii) low complexity, as the approximation subproblems are much easier to solve and the line search scheme is carried out over a properly constructed differentiable function; iv) guaranteed convergence of a subsequence to a stationary point, even when the objective function does not have a Lipschitz continuous gradient. Interestingly, when the approximation subproblem is solved by a descent algorithm, convergence of a subsequence to a stationary point is still guaranteed even if the approximation subproblem is solved inexactly by terminating the descent algorithm after a finite number of iterations. These features make the proposed algorithm suitable for large-scale problems where the dimension exceeds the memory and/or the processing capability of the existing hardware. These features are also illustrated by several applications in signal processing and machine learning, for instance, network anomaly detection and phase retrieval. |
Tasks | Anomaly Detection |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04211v5 |
https://arxiv.org/pdf/1905.04211v5.pdf | |
PWC | https://paperswithcode.com/paper/inexact-block-coordinate-descent-algorithms |
Repo | https://github.com/optyang/BSCA |
Framework | none |
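A small numerical sketch of the algorithm's skeleton on the nonsmooth problem $\min_x \frac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$: each iteration updates one block by minimizing a strictly convex local approximation (solved here by soft-thresholding) and then takes a backtracking line-search step on the true objective. The paper's approximation functions and line-search rule are more general:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_bcd_lasso(A, b, lam, blocks, iters=100):
    """Block coordinate descent sketch for 0.5*||Ax-b||^2 + lam*||x||_1.

    blocks : list of index arrays partitioning the variables into blocks
    """
    x = np.zeros(A.shape[1])
    obj = lambda v: 0.5 * np.sum((A @ v - b) ** 2) + lam * np.sum(np.abs(v))
    for it in range(iters):
        blk = blocks[it % len(blocks)]                 # cyclic block selection
        grad = A[:, blk].T @ (A @ x - b)               # gradient w.r.t. block
        L = np.linalg.norm(A[:, blk], 2) ** 2 + 1e-12  # local curvature bound
        x_hat = x.copy()                               # strictly convex
        x_hat[blk] = soft_threshold(x[blk] - grad / L, lam / L)  # approx. step
        gamma, f0 = 1.0, obj(x)
        while obj(x + gamma * (x_hat - x)) > f0 and gamma > 1e-8:
            gamma *= 0.5                               # backtracking line search
        x = x + gamma * (x_hat - x)
    return x
```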