October 20, 2019

3251 words 16 mins read

Paper Group AWR 283

Understanding Regularized Spectral Clustering via Graph Conductance

Title Understanding Regularized Spectral Clustering via Graph Conductance
Authors Yilin Zhang, Karl Rohe
Abstract This paper uses the relationship between graph conductance and spectral clustering to study (i) the failures of spectral clustering and (ii) the benefits of regularization. The explanation is simple. Sparse and stochastic graphs create a lot of small trees that are connected to the core of the graph by only one edge. Graph conductance is sensitive to these noisy ‘dangling sets’. Spectral clustering inherits this sensitivity. The second part of the paper starts from a previously proposed form of regularized spectral clustering and shows that it is related to the graph conductance on a ‘regularized graph’. We call the conductance on the regularized graph CoreCut. Based upon previous arguments that relate graph conductance to spectral clustering (e.g. Cheeger inequality), minimizing CoreCut relaxes to regularized spectral clustering. Simple inspection of CoreCut reveals why it is less sensitive to small cuts in the graph. Together, these results show that unbalanced partitions from spectral clustering can be understood as overfitting to noise in the periphery of a sparse and stochastic graph. Regularization fixes this overfitting. In addition to this statistical benefit, these results also demonstrate how regularization can improve the computational speed of spectral clustering. We provide simulations and data examples to illustrate these results.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01468v4
PDF http://arxiv.org/pdf/1806.01468v4.pdf
PWC https://paperswithcode.com/paper/understanding-regularized-spectral-clustering
Repo https://github.com/yzhang672/NeurlPS18
Framework none
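
As an illustration of the regularization discussed above, here is a minimal Python sketch of regularized spectral clustering, assuming the common recipe of adding tau/n to every pair of nodes before forming the normalized Laplacian; the function name, the tau heuristic, and the toy graph are illustrative assumptions, not code from the linked repository.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def regularized_spectral_clustering(A, k=2, tau=None):
    """Cluster nodes of a graph with dense adjacency matrix A (n x n)."""
    n = A.shape[0]
    if tau is None:
        tau = A.sum() / n                  # heuristic: average node degree
    A_reg = A + tau / n                    # add tau/n to every pair of nodes
    d = A_reg.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(n) - D_inv_sqrt @ A_reg @ D_inv_sqrt  # normalized Laplacian
    _, vecs = eigh(L_sym, subset_by_index=[0, k - 1])    # k smallest eigvecs
    rows = vecs / np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(rows)

# Toy graph: two random blocks, one bridge edge, one dangling pendant node.
rng = np.random.default_rng(0)
B = (rng.random((20, 20)) < 0.5).astype(float)
A = np.zeros((41, 41))
A[:20, :20] = np.triu(B, 1)
A[20:40, 20:40] = np.triu(B, 1)
A[0, 20] = 1.0                             # bridge between the two blocks
A[39, 40] = 1.0                            # dangling node 40
A = A + A.T
print(regularized_spectral_clustering(A, k=2))
```

Setting tau = 0 recovers vanilla spectral clustering, which makes it easy to compare the two on sparse graphs with dangling trees like the toy example above.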

Stepping Stones to Inductive Synthesis of Low-Level Looping Programs

Title Stepping Stones to Inductive Synthesis of Low-Level Looping Programs
Authors Christopher D. Rosin
Abstract Inductive program synthesis, from input/output examples, can provide an opportunity to automatically create programs from scratch without presupposing the algorithmic form of the solution. For induction of general programs with loops (as opposed to loop-free programs, or synthesis for domain-specific languages), the state of the art is at the level of introductory programming assignments. Most problems that require algorithmic subtlety, such as fast sorting, have remained out of reach without the benefit of significant problem-specific background knowledge. A key challenge is to identify cues that are available to guide search towards correct looping programs. We present MAKESPEARE, a simple delayed-acceptance hillclimbing method that synthesizes low-level looping programs from input/output examples. During search, delayed acceptance bypasses small gains to identify significantly improved stepping-stone programs that tend to generalize and enable further progress. The method performs well on a set of established benchmarks, and succeeds on the previously unsolved “Collatz Numbers” program synthesis problem. Additional benchmarks include the problem of rapidly sorting integer arrays, in which we observe the emergence of comb sort (a Shell sort variant that is empirically fast). MAKESPEARE has also synthesized a record-setting program on one of the puzzles from the TIS-100 assembly language programming game.
Tasks Program Synthesis
Published 2018-11-26
URL http://arxiv.org/abs/1811.10665v1
PDF http://arxiv.org/pdf/1811.10665v1.pdf
PWC https://paperswithcode.com/paper/stepping-stones-to-inductive-synthesis-of-low
Repo https://github.com/ChristopherRosin/MAKESPEARE
Framework none
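
The search strategy lends itself to a generic sketch. Below is a toy delayed-acceptance hillclimber in Python: mutation chains run for a while and the incumbent is replaced only on a significant gain, bypassing small improvements as the abstract describes. The chain length, gain threshold, and bit-string task are illustrative assumptions; this is not MAKESPEARE's search over low-level programs.

```python
import random

def delayed_acceptance_hillclimb(init, mutate, score, chain_len=50,
                                 min_gain=1.0, iters=200):
    """Replace the incumbent only when a mutation chain finds a big gain."""
    best, best_s = init, score(init)
    for _ in range(iters):
        cand, cand_s = best, best_s
        for _ in range(chain_len):         # explore a chain of mutations
            nxt = mutate(cand)
            nxt_s = score(nxt)
            if nxt_s >= cand_s:            # greedy climb within the chain
                cand, cand_s = nxt, nxt_s
        if cand_s - best_s >= min_gain:    # delayed acceptance: only commit
            best, best_s = cand, cand_s    # to a significant stepping stone
    return best, best_s

# Toy task: maximize the number of 1s in a 64-bit string.
random.seed(0)
score = lambda bits: sum(bits)
mutate = lambda bits: [b ^ 1 if random.random() < 0.05 else b for b in bits]
sol, s = delayed_acceptance_hillclimb([0] * 64, mutate, score)
print(s)   # close to 64
```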

Neural Machine Translation of Text from Non-Native Speakers

Title Neural Machine Translation of Text from Non-Native Speakers
Authors Antonios Anastasopoulos, Alison Lui, Toan Nguyen, David Chiang
Abstract Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.5 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
Tasks Machine Translation
Published 2018-08-19
URL http://arxiv.org/abs/1808.06267v2
PDF http://arxiv.org/pdf/1808.06267v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-of-text-from-non
Repo https://github.com/tnq177/nmt_text_from_non_native_speaker
Framework pytorch
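
A data-augmentation pipeline of this kind is easy to prototype. The sketch below injects artificial grammatical errors into source sentences; the specific noise operations (article deletion, preposition substitution, adjacent-word transposition) are illustrative assumptions, not the paper's exact noise model.

```python
import random

ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "for", "to", "of"]

def add_grammar_noise(sentence, p=0.15, rng=random):
    """Return a noised copy of a whitespace-tokenized sentence."""
    out = []
    for tok in sentence.split():
        r = rng.random()
        if tok.lower() in ARTICLES and r < p:
            continue                              # drop an article
        if tok.lower() in PREPOSITIONS and r < p:
            out.append(rng.choice(PREPOSITIONS))  # substitute a preposition
            continue
        out.append(tok)
    if len(out) > 2 and rng.random() < p:         # transpose adjacent words
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return " ".join(out)

random.seed(1)
print(add_grammar_noise("She put the book on the table for a friend ."))
```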

Proximal Mean-field for Neural Network Quantization

Title Proximal Mean-field for Neural Network Quantization
Authors Thalaiyasingam Ajanthan, Puneet K. Dokania, Richard Hartley, Philip H. S. Torr
Abstract Compressing large Neural Networks (NN) by quantizing the parameters, while maintaining performance, is highly desirable due to reduced memory and time complexity. In this work, we cast NN quantization as a discrete labelling problem, and by examining relaxations, we design an efficient iterative optimization procedure that involves stochastic gradient descent followed by a projection. We prove that our simple projected gradient descent approach is, in fact, equivalent to a proximal version of the well-known mean-field method. These findings would allow the decades-old and theoretically grounded research on MRF optimization to be used to design better network quantization schemes. Our experiments on standard classification datasets (MNIST, CIFAR10/100, TinyImageNet) with convolutional and residual architectures show that our algorithm obtains fully-quantized networks with accuracies very close to the floating-point reference networks.
Tasks Image Classification, Quantization
Published 2018-12-11
URL https://arxiv.org/abs/1812.04353v3
PDF https://arxiv.org/pdf/1812.04353v3.pdf
PWC https://paperswithcode.com/paper/proximal-mean-field-for-neural-network
Repo https://github.com/tajanthan/pmf
Framework pytorch
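
The "stochastic gradient descent followed by a projection" recipe can be caricatured in a few lines of numpy. The sketch below quantizes the weights of a least-squares model to {-1, +1} by alternating SGD steps with projection onto the relaxed box [-1, 1], then rounding with sign(); the paper's actual proximal mean-field updates are more sophisticated than this toy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.sign(rng.normal(size=10))          # ground truth is binary
y = X @ w_true + 0.1 * rng.normal(size=200)

w = rng.normal(scale=0.1, size=10)             # real-valued auxiliary weights
lr = 0.01
for step in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)      # least-squares gradient
    w = w - lr * grad                          # SGD step
    w = np.clip(w, -1.0, 1.0)                  # projection onto [-1, 1]^d

w_q = np.sign(w)                               # final discrete quantization
print("recovered:", int((w_q == w_true).sum()), "of", len(w_true), "weights")
```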

Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

Title Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
Authors Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein
Abstract We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to “focus” the audio on desired speakers in a scene and to improve the speech separation quality. To train our joint audio-visual model, we introduce AVSpeech, a new dataset comprised of thousands of hours of video segments from the Web. We demonstrate the applicability of our method to classic speech separation tasks, as well as real-world scenarios involving heated interviews, noisy bars, and screaming children, only requiring the user to specify the face of the person in the video whose speech they want to isolate. Our method shows clear advantage over state-of-the-art audio-only speech separation in cases of mixed speech. In addition, our model, which is speaker-independent (trained once, applicable to any speaker), produces better results than recent audio-visual speech separation methods that are speaker-dependent (require training a separate model for each speaker of interest).
Tasks Speech Separation
Published 2018-04-10
URL http://arxiv.org/abs/1804.03619v2
PDF http://arxiv.org/pdf/1804.03619v2.pdf
PWC https://paperswithcode.com/paper/looking-to-listen-at-the-cocktail-party-a
Repo https://github.com/rusac/math
Framework none
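
The output stage described here, predicting a time-frequency mask that is applied to the mixture spectrogram, can be illustrated in isolation. In the numpy toy below, an oracle ratio mask stands in for the network's audio-visually predicted mask; the spectrogram shapes and the ratio-mask form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 100, 257                          # time frames x frequency bins
S1 = np.abs(rng.normal(size=(T, F)))     # speaker 1 magnitude spectrogram
S2 = np.abs(rng.normal(size=(T, F)))     # speaker 2 magnitude spectrogram
mix = S1 + S2                            # mixture spectrogram

# Oracle ratio mask, standing in for the mask a trained audio-visual
# network would predict for the user-selected speaker.
mask1 = S1 / (S1 + S2 + 1e-8)
est1 = mask1 * mix                       # separated speaker 1 estimate
print("mean reconstruction error:", np.abs(est1 - S1).mean())
```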

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

Title Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning
Authors Sebastian Raschka
Abstract The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning. Common methods such as the holdout method for model evaluation and selection are covered, though they are not recommended when working with small datasets. Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small.
Tasks Model Selection
Published 2018-11-13
URL http://arxiv.org/abs/1811.12808v2
PDF http://arxiv.org/pdf/1811.12808v2.pdf
PWC https://paperswithcode.com/paper/model-evaluation-model-selection-and
Repo https://github.com/rasbt/model-eval-article-supplementary
Framework none
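
Two of the reviewed techniques are easy to demonstrate with scikit-learn: stratified k-fold cross-validation and a percentile bootstrap for the uncertainty of a holdout accuracy estimate. The dataset, model, k = 10, and 1000 bootstrap rounds below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Stratified 10-fold cross-validation for model evaluation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Percentile bootstrap for the uncertainty of a holdout accuracy estimate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
preds = clf.fit(X_tr, y_tr).predict(X_te)
rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    i = rng.integers(0, len(y_te), len(y_te))    # resample test cases
    boot.append(np.mean(preds[i] == y_te[i]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"bootstrap 95% CI for accuracy: [{lo:.3f}, {hi:.3f}]")
```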

Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks

Title Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks
Authors Vitalii Zhelezniak, Dan Busbridge, April Shen, Samuel L. Smith, Nils Y. Hammerla
Abstract Experimental evidence indicates that simple models outperform complex deep networks on many unsupervised similarity tasks. We provide a simple yet rigorous explanation for this behaviour by introducing the concept of an optimal representation space, in which semantically close symbols are mapped to representations that are close under a similarity measure induced by the model’s objective function. In addition, we present a straightforward procedure that, without any retraining or architectural modifications, allows deep recurrent models to perform equally well (and sometimes better) when compared to shallow models. To validate our analysis, we conduct a set of consistent empirical evaluations and introduce several new sentence embedding models in the process. Even though this work is presented within the context of natural language processing, the insights are readily applicable to other domains that rely on distributed representations for transfer tasks.
Tasks Sentence Embedding
Published 2018-05-09
URL http://arxiv.org/abs/1805.03435v1
PDF http://arxiv.org/pdf/1805.03435v1.pdf
PWC https://paperswithcode.com/paper/decoding-decoders-finding-optimal
Repo https://github.com/Babylonpartners/decoding-decoders
Framework tf
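
The core point, that the space in which representations are compared matters as much as the model producing them, can be shown with a toy. In the numpy sketch below, random vectors stand in for trained embeddings, and two candidate representation spaces (mean of word vectors versus a deliberately poor last-word space) yield different similarity judgments; everything here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "cat", "sat", "on", "the", "mat", "rug", "dog", "ran"]
E = {w: rng.normal(size=16) for w in vocab}    # random stand-in embeddings

def mean_space(sent):
    """Mean of word vectors as the representation space."""
    return np.mean([E[w] for w in sent.split()], axis=0)

def last_word_space(sent):
    """A deliberately poor space: the final word's vector only."""
    return E[sent.split()[-1]]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

s1, s2, s3 = "the cat sat on the mat", "a cat sat on a rug", "the dog ran"
for space in (mean_space, last_word_space):
    print(space.__name__,
          round(cos(space(s1), space(s2)), 3),  # near-paraphrase pair
          round(cos(space(s1), space(s3)), 3))  # unrelated pair
```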

Calculating the similarity between words and sentences using a lexical database and corpus statistics

Title Calculating the similarity between words and sentences using a lexical database and corpus statistics
Authors Atish Pawar, Vijay Mago
Abstract Calculating the semantic similarity between sentences is a long-standing problem in natural language processing. Semantic analysis plays a crucial role in text-analytics research. Semantic similarity differs as the domain of operation differs. In this paper, we present a methodology which deals with this issue by incorporating semantic similarity and corpus statistics. To calculate the semantic similarity between words and sentences, the proposed method follows an edge-based approach using a lexical database. The methodology can be applied in a variety of domains. It has been tested on both benchmark standards and a mean human similarity dataset, where it gives the highest correlation values for both word and sentence similarity, outperforming other similar models. For word similarity, we obtained a Pearson correlation coefficient of 0.8753, and for sentence similarity, the correlation obtained is 0.8794.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2018-02-15
URL http://arxiv.org/abs/1802.05667v2
PDF http://arxiv.org/pdf/1802.05667v2.pdf
PWC https://paperswithcode.com/paper/calculating-the-similarity-between-words-and
Repo https://github.com/nihitsaxena95/sentence-similarity-wordnet-sementic
Framework none
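
The edge-based ingredient is straightforward to reproduce with NLTK's WordNet interface (shortest-path similarity between senses), and a naive best-match aggregation gives a sentence score. This is a simplified stand-in for the paper's method, which additionally uses corpus statistics; the aggregation scheme below is an illustrative assumption. Requires nltk with the wordnet data downloaded.

```python
# pip install nltk; then: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def word_sim(w1, w2):
    """Max shortest-path similarity over all sense pairs (edge-based)."""
    sims = [s1.path_similarity(s2) or 0.0
            for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(sims, default=0.0)

def sentence_sim(a, b):
    """Symmetrized average best-match word similarity (naive aggregation)."""
    ta, tb = a.lower().split(), b.lower().split()
    best_a = sum(max(word_sim(w, v) for v in tb) for w in ta) / len(ta)
    best_b = sum(max(word_sim(w, v) for v in ta) for w in tb) / len(tb)
    return 0.5 * (best_a + best_b)

print(word_sim("car", "automobile"))   # 1.0: they share a synset
print(word_sim("car", "banana"))       # much lower
print(sentence_sim("a car on the road", "an automobile on the street"))
```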

Learning long-range spatial dependencies with horizontal gated-recurrent units

Title Learning long-range spatial dependencies with horizontal gated-recurrent units
Authors Drew Linsley, Junkyung Kim, Vijay Veerabadran, Thomas Serre
Abstract Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural network, are now approaching – and sometimes even surpassing – human accuracy on a variety of visual recognition tasks. Here, however, we show that these neural networks and their recent extensions struggle in recognition tasks where co-dependent visual features must be detected over long spatial ranges. We introduce the horizontal gated-recurrent unit (hGRU) to learn intrinsic horizontal connections – both within and across feature columns. We demonstrate that a single hGRU layer matches or outperforms all tested feedforward hierarchical baselines including state-of-the-art architectures which have orders of magnitude more free parameters. We further discuss the biological plausibility of the hGRU in comparison to anatomical data from the visual cortex as well as human behavioral data on a classic contour detection task.
Tasks Contour Detection
Published 2018-05-21
URL https://arxiv.org/abs/1805.08315v4
PDF https://arxiv.org/pdf/1805.08315v4.pdf
PWC https://paperswithcode.com/paper/learning-long-range-spatial-dependencies-with-1
Repo https://github.com/serre-lab/hgru_share
Framework tf

Imagine This! Scripts to Compositions to Videos

Title Imagine This! Scripts to Compositions to Videos
Authors Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
Abstract Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval, and Fusion Network (CRAFT), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. CRAFT explicitly predicts a temporal-layout of mentioned entities (characters and objects), retrieves spatio-temporal entity segments from a video database and fuses them to generate scene videos. Our contributions include sequential training of components of CRAFT while jointly modeling layout and appearances, and losses that encourage learning compositional representations for retrieval. We evaluate CRAFT on semantic fidelity to caption, composition consistency, and visual quality. CRAFT outperforms direct pixel generation approaches and generalizes well to unseen captions and to unseen video databases with no text annotations. We demonstrate CRAFT on FLINTSTONES, a new richly annotated video-caption dataset with over 25000 videos. For a glimpse of videos generated by CRAFT, see https://youtu.be/688Vv86n0z8.
Tasks
Published 2018-04-10
URL http://arxiv.org/abs/1804.03608v1
PDF http://arxiv.org/pdf/1804.03608v1.pdf
PWC https://paperswithcode.com/paper/imagine-this-scripts-to-compositions-to
Repo https://github.com/Leiree/Paginas-importantes
Framework tf

MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics

Title MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics
Authors Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, Honglak Lee
Abstract Long-term human motion can be represented as a series of motion modes (motion sequences that capture short-term temporal dynamics) with transitions between them. We leverage this structure and present a novel Motion Transformation Variational Auto-Encoder (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes (that the motion sequence can be reconstructed from) and a feature transformation that represents the transition of one motion mode to the next motion mode. Our model is able to generate multiple diverse and plausible motion sequences in the future from the same input. We apply our approach to both facial and full body motion, and demonstrate applications like analogy-based motion transfer and video synthesis.
Tasks Human Dynamics, motion prediction
Published 2018-08-14
URL http://arxiv.org/abs/1808.04545v1
PDF http://arxiv.org/pdf/1808.04545v1.pdf
PWC https://paperswithcode.com/paper/mt-vae-learning-motion-transformations-to
Repo https://github.com/xcyan/eccv18_mtvae
Framework tf

Wasserstein regularization for sparse multi-task regression

Title Wasserstein regularization for sparse multi-task regression
Authors Hicham Janati, Marco Cuturi, Alexandre Gramfort
Abstract We focus in this paper on high-dimensional regression problems where each regressor can be associated with a location in a physical space, or more generally a generic geometric space. Such problems often employ sparse priors, which promote models using a small subset of regressors. To increase statistical power, the so-called multi-task techniques were proposed, which consist in the simultaneous estimation of several related models. Combined with sparsity assumptions, this leads to models enforcing the active regressors to be shared across models, thanks to, for instance, L1/Lq norms. We argue in this paper that these techniques fail to leverage the spatial information associated with regressors. Indeed, while sparse priors enforce that only a small subset of variables is used, the assumption that these regressors overlap across all tasks is overly simplistic given the spatial variability observed in real data. In this paper, we propose a convex regularizer for multi-task regression that encodes a more flexible geometry. Our regularizer is based on unbalanced optimal transport (OT) theory, and can take into account prior geometric knowledge of the regressor variables, without necessarily requiring overlapping supports. We derive an efficient algorithm based on a regularized formulation of OT, which iterates through applications of Sinkhorn’s algorithm along with coordinate descent iterations. The performance of our model is demonstrated on regular grids with both synthetic and real datasets as well as complex triangulated geometries of the cortex with an application in neuroimaging.
Tasks
Published 2018-05-20
URL http://arxiv.org/abs/1805.07833v3
PDF http://arxiv.org/pdf/1805.07833v3.pdf
PWC https://paperswithcode.com/paper/wasserstein-regularization-for-sparse-multi
Repo https://github.com/hichamjanati/mtw
Framework none
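
The OT subroutine at the heart of the solver is Sinkhorn's algorithm. Below is a compact numpy implementation of the classic balanced, entropy-regularized version on a toy 1-D problem; note the paper works with an unbalanced OT formulation, so this is the simpler textbook variant shown for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT plan between histograms a, b with cost matrix C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # scaling updates enforcing the marginals
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy example: transport between two Gaussian-like histograms on a 1-D grid.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("marginal error:", np.abs(P.sum(1) - a).max(), np.abs(P.sum(0) - b).max())
print("transport cost:", (P * C).sum())
```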

Few-Shot Segmentation Propagation with Guided Networks

Title Few-Shot Segmentation Propagation with Guided Networks
Authors Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alexei A. Efros, Sergey Levine
Abstract Learning-based methods for visual segmentation have made progress on particular types of segmentation tasks, but are limited by the necessary supervision, the narrow definitions of fixed tasks, and the lack of control during inference for correcting errors. To remedy the rigidity and annotation burden of standard approaches, we address the problem of few-shot segmentation: given a few images and a few pixels of supervision, segment any images accordingly. We propose guided networks, which extract a latent task representation from any amount of supervision, and optimize our architecture end-to-end for fast, accurate few-shot segmentation. Our method can switch tasks without further optimization and quickly update when given more guidance. We report the first results for segmentation from one pixel per concept and show real-time interactive video segmentation. Our unified approach propagates pixel annotations across space for interactive segmentation, across time for video segmentation, and across scenes for semantic segmentation. Our guided segmentor is state-of-the-art in accuracy for the amount of annotation and time. See http://github.com/shelhamer/revolver for code, models, and more details.
Tasks Interactive Segmentation, Semantic Segmentation, Video Semantic Segmentation
Published 2018-05-25
URL http://arxiv.org/abs/1806.07373v1
PDF http://arxiv.org/pdf/1806.07373v1.pdf
PWC https://paperswithcode.com/paper/few-shot-segmentation-propagation-with-guided
Repo https://github.com/shelhamer/revolver
Framework pytorch
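
The guidance mechanism can be sketched compactly: pool the features of annotated support pixels into a latent task representation, then score query pixels by similarity to it. The masked average pooling and cosine scoring below are illustrative simplifications of the guided-network architecture, with random arrays standing in for CNN features.

```python
import numpy as np

def task_representation(feats, mask):
    """feats: (H, W, C) support features; mask: (H, W) pixel annotations."""
    m = mask[..., None]
    return (feats * m).sum(axis=(0, 1)) / max(m.sum(), 1.0)  # masked avg pool

def segment(query_feats, z):
    """Cosine similarity of each query pixel's feature to task vector z."""
    q = query_feats / np.linalg.norm(query_feats, axis=-1, keepdims=True)
    zn = z / np.linalg.norm(z)
    return q @ zn                    # (H, W) score map; threshold to segment

rng = np.random.default_rng(0)
support = rng.normal(size=(8, 8, 32))
mask = np.zeros((8, 8)); mask[2:5, 2:5] = 1      # a few annotated pixels
z = task_representation(support, mask)
scores = segment(rng.normal(size=(8, 8, 32)), z)
print(scores.shape, (scores > 0.2).sum())
```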

Boosted Sparse and Low-Rank Tensor Regression

Title Boosted Sparse and Low-Rank Tensor Regression
Authors Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang
Abstract We propose a sparse and low-rank tensor regression model to relate a univariate outcome to a feature tensor, in which each unit-rank tensor from the CP decomposition of the coefficient tensor is assumed to be sparse. This structure is both parsimonious and highly interpretable, as it implies that the outcome is related to the features through a few distinct pathways, each of which may only involve subsets of feature dimensions. We take a divide-and-conquer strategy to simplify the task into a set of sparse unit-rank tensor regression problems. To make the computation efficient and scalable, for the unit-rank tensor regression, we propose a stagewise estimation procedure to efficiently trace out its entire solution path. We show that as the step size goes to zero, the stagewise solution paths converge exactly to those of the corresponding regularized regression. The superior performance of our approach is demonstrated on various real-world and synthetic examples.
Tasks
Published 2018-11-03
URL http://arxiv.org/abs/1811.01158v1
PDF http://arxiv.org/pdf/1811.01158v1.pdf
PWC https://paperswithcode.com/paper/boosted-sparse-and-low-rank-tensor-regression
Repo https://github.com/LifangHe/SURF
Framework none
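
The divide-and-conquer strategy is natural to prototype for a 2-way (matrix-valued) feature: fit sparse unit-rank components one at a time on the residual. The alternating soft-thresholded correlation updates in the sketch below are an illustrative stand-in for the paper's stagewise procedure.

```python
import numpy as np

def soft(v, lam):
    """Soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def fit_rank1(X, y, lam=0.05, n_alt=30):
    """Fit y ~ <X_i, u v^T> with sparse u, v by alternating updates."""
    n, p, q = X.shape
    u, v = np.ones(p) / np.sqrt(p), np.ones(q) / np.sqrt(q)
    for _ in range(n_alt):
        Zu = np.einsum("npq,q->np", X, v)        # design for u given v
        u = soft(Zu.T @ y / n, lam)
        nu = np.linalg.norm(u)
        if nu == 0:
            break                                 # nothing left to fit
        u /= nu
        Zv = np.einsum("npq,p->nq", X, u)        # design for v given u
        v = soft(Zv.T @ y / n, lam)
    return np.outer(u, v)

def boosted_fit(X, y, rank=2, lam=0.05):
    """Divide and conquer: fit unit-rank components on residuals, one by one."""
    B, r = np.zeros(X.shape[1:]), y.copy()
    for _ in range(rank):
        B += fit_rank1(X, r, lam)
        r = y - np.einsum("npq,pq->n", X, B)     # update the residual
    return B

rng = np.random.default_rng(0)
u_true = np.array([1.5, 0.0, 0.0, -1.0, 0.0, 0.8])
v_true = np.array([0.0, 2.0, 0.0, -1.2, 0.0])
B_true = np.outer(u_true, v_true)                # sparse rank-1 coefficient
X = rng.normal(size=(400, 6, 5))
y = np.einsum("npq,pq->n", X, B_true) + 0.05 * rng.normal(size=400)
B_hat = boosted_fit(X, y)
print("relative error:", np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))
```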

Learning Interpretable Anatomical Features Through Deep Generative Models: Application to Cardiac Remodeling

Title Learning Interpretable Anatomical Features Through Deep Generative Models: Application to Cardiac Remodeling
Authors Carlo Biffi, Ozan Oktay, Giacomo Tarroni, Wenjia Bai, Antonio De Marvao, Georgia Doumou, Martin Rajchl, Reem Bedair, Sanjay Prasad, Stuart Cook, Declan O’Regan, Daniel Rueckert
Abstract Alterations in the geometry and function of the heart define well-established causes of cardiovascular disease. However, current approaches to the diagnosis of cardiovascular diseases often rely on subjective human assessment as well as manual analysis of medical images. Both factors limit the sensitivity in quantifying complex structural and functional phenotypes. Deep learning approaches have recently achieved success for tasks such as classification or segmentation of medical images, but lack interpretability in the feature extraction and decision processes, limiting their value in clinical diagnosis. In this work, we propose a 3D convolutional generative model for automatic classification of images from patients with cardiac diseases associated with structural remodeling. The model leverages interpretable task-specific anatomic patterns learned from 3D segmentations. It further allows one to visualise and quantify the learned pathology-specific remodeling patterns in the original input space of the images. This approach yields high accuracy in the categorization of healthy and hypertrophic cardiomyopathy subjects when tested on unseen MR images from our own multi-centre dataset (100%) as well as on the ACDC MICCAI 2017 dataset (90%). We believe that the proposed deep learning approach is a promising step towards the development of interpretable classifiers for the medical imaging domain, which may help clinicians to improve diagnostic accuracy and enhance patient risk-stratification.
Tasks
Published 2018-07-18
URL http://arxiv.org/abs/1807.06843v1
PDF http://arxiv.org/pdf/1807.06843v1.pdf
PWC https://paperswithcode.com/paper/learning-interpretable-anatomical-features
Repo https://github.com/UK-Digital-Heart-Project/lvae_mlp
Framework tf