Paper Group AWR 118
Universal Dependency Parsing with a General Transition-Based DAG Parser
Title | Universal Dependency Parsing with a General Transition-Based DAG Parser |
Authors | Daniel Hershcovich, Omri Abend, Ari Rappoport |
Abstract | This paper presents our experiments with applying TUPA to the CoNLL 2018 UD shared task. TUPA is a general neural transition-based DAG parser, which we use to present the first experiments on recovering enhanced dependencies as part of the general parsing task. TUPA was designed for parsing UCCA, a cross-linguistic semantic annotation scheme, exhibiting reentrancy, discontinuity and non-terminal nodes. By converting UD trees and graphs to a UCCA-like DAG format, we train TUPA almost without modification on the UD parsing task. The generic nature of our approach lends itself naturally to multitask learning. Our code is available at https://github.com/CoNLL-UD-2018/HUJI |
Tasks | Dependency Parsing |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09354v1 |
PDF | http://arxiv.org/pdf/1808.09354v1.pdf
PWC | https://paperswithcode.com/paper/universal-dependency-parsing-with-a-general |
Repo | https://github.com/CoNLL-UD-2018/HUJI |
Framework | none |
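To make the "transition-based" part of the abstract concrete, here is a toy sketch of how a transition system turns a sequence of actions into dependency arcs. It uses a generic arc-standard-style stack/buffer system, not TUPA's richer transition set (which also handles reentrancy, discontinuity and non-terminal nodes); the example sentence and action sequence are purely illustrative.

```python
# Toy transition-based parsing: applying a given action sequence produces
# a set of (head, dependent) arcs over token indices.
def parse(words, transitions):
    stack, buffer, arcs = [], list(range(len(words))), []
    for action in transitions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":      # second-from-top becomes dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":     # top becomes dependent of the second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["She", "reads", "books"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
print(parse(words, actions))   # [(1, 0), (1, 2)]: "reads" heads both "She" and "books"
```

A neural parser such as TUPA learns a classifier that predicts the next action from the current stack/buffer configuration; a DAG parser additionally allows nodes to receive more than one incoming arc.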
Modeling Composite Labels for Neural Morphological Tagging
Title | Modeling Composite Labels for Neural Morphological Tagging |
Authors | Alexander Tkachenko, Kairit Sirts |
Abstract | Neural morphological tagging has been regarded as an extension to the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure. We propose to view morphological tags as composite labels and explicitly model their internal structure in a neural sequence tagger. For this, we explore three different neural architectures and compare their performance with both CRF and simple neural multiclass baselines. We evaluate our models on 49 languages and show that the neural architecture that models the morphological labels as sequences of morphological category values performs significantly better than both baselines, establishing state-of-the-art results in morphological tagging for most languages. |
Tasks | Morphological Tagging |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08815v1 |
PDF | http://arxiv.org/pdf/1810.08815v1.pdf
PWC | https://paperswithcode.com/paper/modeling-composite-labels-for-neural |
Repo | https://github.com/AleksTk/seq-morph-tagger |
Framework | tf |
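A small sketch of what the "composite label" view means in practice: a UD-style morphological tag string is decomposed into a sequence of category=value pairs, which a sequence model can predict one element at a time instead of choosing among every distinct tag string. The tag string below is an illustrative example, not taken from the paper.

```python
# Decompose a monolithic morphological tag into (category, value) pairs.
def decompose(tag):
    """'POS=NOUN|Case=Nom|Number=Sing' -> [('POS', 'NOUN'), ('Case', 'Nom'), ...]"""
    return [tuple(feat.split("=", 1)) for feat in tag.split("|")]

monolithic = "POS=NOUN|Case=Nom|Number=Sing"
print(decompose(monolithic))
# [('POS', 'NOUN'), ('Case', 'Nom'), ('Number', 'Sing')]
```

Modelling the label as this sequence, rather than as one softmax over all observed tag strings, is the design the abstract reports as performing best.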
Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme
Title | Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme |
Authors | Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond |
Abstract | In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling. We assess the quality of generated poems using crowd and expert judgements. The stress and rhyme models perform very well, as generated poems are largely indistinguishable from human-written poems. Expert evaluation, however, reveals that a vanilla language model captures meter implicitly, and that machine-generated poems still underperform in terms of readability and emotion. Our research shows the importance of expert evaluation for poetry generation, and that future research should look beyond rhyme/meter and focus on poetic language. |
Tasks | Language Modelling |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03491v1 |
PDF | http://arxiv.org/pdf/1807.03491v1.pdf
PWC | https://paperswithcode.com/paper/deep-speare-a-joint-neural-model-of-poetic |
Repo | https://github.com/jhlau/deepspeare |
Framework | tf |
Hashing with Mutual Information
Title | Hashing with Mutual Information |
Authors | Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff |
Abstract | Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an information-theoretic quantity: mutual information. We show that optimizing mutual information can reduce ambiguity in the induced neighborhood structure in the learned Hamming space, which is essential in obtaining high retrieval performance. To this end, we optimize mutual information in deep neural networks with minibatch stochastic gradient descent, with a formulation that maximally and efficiently utilizes available supervision. Experiments on four image retrieval benchmarks, including ImageNet, confirm the effectiveness of our method in learning high-quality binary embeddings for nearest neighbor retrieval. |
Tasks | Image Retrieval, Video Retrieval |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00974v2 |
PDF | http://arxiv.org/pdf/1803.00974v2.pdf
PWC | https://paperswithcode.com/paper/hashing-with-mutual-information |
Repo | https://github.com/fcakir/deep-mihash |
Framework | none |
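To illustrate the quantity being optimized, here is a rough numpy sketch of the mutual information between neighbourhood membership (same class as a query or not) and the Hamming distance from the query's binary code. This is a plain empirical estimate for a single query with random codes and labels, not the paper's differentiable minibatch estimator; all sizes and names below are illustrative assumptions.

```python
import numpy as np

def mutual_information(dist, is_neighbor, n_bits):
    """I(membership; Hamming distance) from empirical joint counts."""
    joint = np.zeros((2, n_bits + 1))
    for d, m in zip(dist, is_neighbor):
        joint[int(m), d] += 1
    joint /= joint.sum()
    pm = joint.sum(axis=1, keepdims=True)          # marginal over membership
    pd = joint.sum(axis=0, keepdims=True)          # marginal over distance
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pm @ pd)[nz])).sum())

rng = np.random.default_rng(0)
n_bits = 16
codes = rng.integers(0, 2, size=(1000, n_bits))    # random binary codes
labels = rng.integers(0, 10, size=1000)
query_code, query_label = codes[0], labels[0]
dist = (codes[1:] != query_code).sum(axis=1)       # Hamming distances to the query
is_neighbor = labels[1:] == query_label            # same-class points count as neighbours
print(mutual_information(dist, is_neighbor, n_bits))  # near 0 for random codes
```

Codes that cleanly separate neighbours from non-neighbours in Hamming distance drive this value up, which is the intuition behind using mutual information as the hashing objective.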
GAN Q-learning
Title | GAN Q-learning |
Authors | Thang Doan, Bogdan Mazoure, Clare Lyle |
Abstract | Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and blackbox approach of deep learning models while providing a viable alternative to traditional methods. |
Tasks | Distributional Reinforcement Learning, Q-Learning |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04874v3 |
PDF | http://arxiv.org/pdf/1805.04874v3.pdf
PWC | https://paperswithcode.com/paper/gan-q-learning |
Repo | https://github.com/daggertye/GAN-Q-Learning |
Framework | tf |
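A rough, heavily simplified PyTorch sketch in the spirit of GAN Q-learning (the linked repository uses TensorFlow; the network sizes, the dummy transition batch, and the crude per-action Bellman target below are illustrative assumptions, not the paper's exact formulation). A generator proposes samples of the return, and a discriminator is trained to distinguish them from Bellman targets; fooling the discriminator pushes the generator's samples toward the return distribution.

```python
import torch
import torch.nn as nn

state_dim, n_actions, noise_dim, gamma = 4, 2, 8, 0.99

class Generator(nn.Module):
    """Maps (state, noise) to one sampled return per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

class Discriminator(nn.Module):
    """Scores (state, per-action returns) as real (Bellman target) or fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + n_actions, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, y):
        return self.net(torch.cat([s, y], dim=-1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Dummy transition batch (s, r, s'); real data would come from a replay buffer.
s, r, s_next = torch.randn(32, state_dim), torch.randn(32, 1), torch.randn(32, state_dim)

with torch.no_grad():
    next_returns = G(s_next, torch.randn(32, noise_dim))          # sampled returns at s'
    target = r + gamma * next_returns.max(dim=1, keepdim=True)[0]
    target = target.expand(-1, n_actions)                         # crude per-action target

# Discriminator step: Bellman targets are "real", generator samples are "fake".
fake = G(s, torch.randn(32, noise_dim))
d_loss = bce(D(s, target), torch.ones(32, 1)) + bce(D(s, fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator.
g_loss = bce(D(s, G(s, torch.randn(32, noise_dim))), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In a full agent, actions would be chosen greedily with respect to the mean of several generated return samples, and a slowly updated target copy of the generator would produce the Bellman targets.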
3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
Title | 3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks |
Authors | Soo Ye Kim, Jeongyeon Lim, Taeyoung Na, Munchurl Kim |
Abstract | In video super-resolution, the spatio-temporal coherence between and among the frames must be exploited appropriately for accurate prediction of the high-resolution frames. Although 2D convolutional neural networks (CNNs) are powerful in modelling images, 3D-CNNs are more suitable for spatio-temporal feature extraction as they can preserve temporal information. To this end, we propose an effective 3D-CNN for video super-resolution, called 3DSRnet, that does not require motion alignment as preprocessing. Our 3DSRnet maintains the temporal depth of spatio-temporal feature maps to maximally capture the temporally nonlinear characteristics between low- and high-resolution frames, and adopts residual learning in conjunction with sub-pixel outputs. It outperforms the most recent state-of-the-art method by an average of 0.45 and 0.36 dB in PSNR for scales 3 and 4, respectively, on the Vidset4 benchmark. Our 3DSRnet is also the first to address the performance drop due to scene change, which is important in practice but has not been previously considered. |
Tasks | Super-Resolution, Video Super-Resolution |
Published | 2018-12-21 |
URL | https://arxiv.org/abs/1812.09079v2 |
PDF | https://arxiv.org/pdf/1812.09079v2.pdf
PWC | https://paperswithcode.com/paper/3dsrnet-video-super-resolution-using-3d |
Repo | https://github.com/sooyekim/3DSRnet |
Framework | none |
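A rough PyTorch sketch (independent of the authors' code and not the paper's exact architecture) of the ingredients the abstract describes: depth-preserving 3D convolutions over a short stack of low-resolution frames, a sub-pixel (PixelShuffle) output layer, and residual learning on top of a naively upscaled centre frame. Layer widths, kernel sizes and the 5-frame window are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny3DSR(nn.Module):
    def __init__(self, scale=4, n_frames=5, feats=32):
        super().__init__()
        self.body = nn.Sequential(                       # 3D convs that keep temporal depth
            nn.Conv3d(3, feats, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(feats, feats, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.collapse = nn.Conv3d(feats, feats, kernel_size=(n_frames, 3, 3),
                                  padding=(0, 1, 1))     # fuse the temporal axis
        self.upscale = nn.Sequential(                    # sub-pixel convolution
            nn.Conv2d(feats, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.scale = scale

    def forward(self, frames):                           # frames: (N, 3, T, H, W)
        centre = frames[:, :, frames.shape[2] // 2]      # (N, 3, H, W)
        base = F.interpolate(centre, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        x = self.collapse(self.body(frames)).squeeze(2)  # (N, feats, H, W)
        return base + self.upscale(x)                    # residual learning

lr_clip = torch.randn(1, 3, 5, 32, 32)                   # 5 low-resolution frames
print(Tiny3DSR()(lr_clip).shape)                         # torch.Size([1, 3, 128, 128])
```

Because the network predicts only the residual over a cheap upscaling of the centre frame, the 3D convolutions can focus on the temporally nonlinear detail that bicubic interpolation misses.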
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Title | Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! |
Authors | Steffen Eger, Johannes Daxenberger, Christian Stab, Iryna Gurevych |
Abstract | Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at \url{http://github.com/UKPLab/coling2018-xling_argument_mining}. |
Tasks | Cross-Lingual Transfer, Machine Translation, Word Embeddings |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08998v1 |
PDF | http://arxiv.org/pdf/1807.08998v1.pdf
PWC | https://paperswithcode.com/paper/cross-lingual-argumentation-mining-machine |
Repo | https://github.com/UKPLab/coling2018-xling_argument_mining |
Framework | none |
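A minimal illustration of annotation projection, the transfer strategy the abstract reports as performing best, shown here in a generic form: BIO argument-component labels on source-language tokens are carried over to the translated tokens via word alignments. The sentence, labels and alignment pairs below are hypothetical; in practice the alignments would come from an automatic word aligner run on the (human or machine) translations.

```python
def project_labels(src_labels, alignments, tgt_len, default="O"):
    """Project token labels through (src_index, tgt_index) alignment pairs."""
    tgt_labels = [default] * tgt_len
    for src_i, tgt_i in alignments:
        tgt_labels[tgt_i] = src_labels[src_i]
    return tgt_labels

src = ["Smoking", "should", "be", "banned", "in", "public"]
src_labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "I-Claim", "I-Claim"]
# hypothetical German translation: "Rauchen sollte in der Öffentlichkeit verboten werden"
alignments = [(0, 0), (1, 1), (4, 2), (5, 4), (3, 5), (2, 6)]
print(project_labels(src_labels, alignments, tgt_len=7))
# ['B-Claim', 'I-Claim', 'I-Claim', 'O', 'I-Claim', 'I-Claim', 'I-Claim']
```

The projected target-language corpus is then used to train a monolingual tagger, which is why the quality of the translations (human or machine) matters less than one might expect.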
PennyLane: Automatic differentiation of hybrid quantum-classical computations
Title | PennyLane: Automatic differentiation of hybrid quantum-classical computations |
Authors | Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola, Carsten Blank, Alain Delgado, Soran Jahangiri, Keri McKiernan, Johannes Jakob Meyer, Zeyue Niu, Antal Száva, Nathan Killoran |
Abstract | PennyLane is a Python 3 software framework for optimization and machine learning of quantum and hybrid quantum-classical computations. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane’s core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for Strawberry Fields, Rigetti Forest, Qiskit, Cirq, and ProjectQ, allowing PennyLane optimizations to be run on publicly accessible quantum devices provided by Rigetti and IBM Q. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, and autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications. |
Tasks | Quantum Machine Learning |
Published | 2018-11-12 |
URL | https://arxiv.org/abs/1811.04968v3 |
PDF | https://arxiv.org/pdf/1811.04968v3.pdf
PWC | https://paperswithcode.com/paper/pennylane-automatic-differentiation-of-hybrid |
Repo | https://github.com/XanaduAI/pennylane |
Framework | tf |
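A minimal sketch of the core PennyLane workflow the abstract describes: a variational quantum circuit whose gradient is computed automatically and fed to a classical optimizer. It uses the bundled simulator device; swapping in a hardware plugin would only change the device string. The circuit, parameters and step count are arbitrary illustrative choices.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))

params = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.4)
for _ in range(50):
    params = opt.step(circuit, params)   # gradients of the quantum circuit, automatically
print(circuit(params))                   # moves toward the minimum expectation value of -1
```

The same QNode can also be wrapped as a TensorFlow or PyTorch layer, which is what makes hybrid quantum-classical models differentiable end to end.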
Continuous-variable quantum neural networks
Title | Continuous-variable quantum neural networks |
Authors | Nathan Killoran, Thomas R. Bromley, Juan Miguel Arrazola, Maria Schuld, Nicolás Quesada, Seth Lloyd |
Abstract | We introduce a general method for building neural networks on quantum computers. The quantum neural network is a variational quantum circuit built in the continuous-variable (CV) architecture, which encodes quantum information in continuous degrees of freedom such as the amplitudes of the electromagnetic field. This circuit contains a layered structure of continuously parameterized gates which is universal for CV quantum computation. Affine transformations and nonlinear activation functions, two key elements in neural networks, are enacted in the quantum network using Gaussian and non-Gaussian gates, respectively. The non-Gaussian gates provide both the nonlinearity and the universality of the model. Due to the structure of the CV model, the CV quantum neural network can encode highly nonlinear transformations while remaining completely unitary. We show how a classical network can be embedded into the quantum formalism and propose quantum versions of various specialized models such as convolutional, recurrent, and residual networks. Finally, we present numerous modeling experiments built with the Strawberry Fields software library. These experiments, including a classifier for fraud detection, a network which generates Tetris images, and a hybrid classical-quantum autoencoder, demonstrate the capability and adaptability of CV quantum neural networks. |
Tasks | Fraud Detection, Quantum Machine Learning |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06871v1 |
PDF | http://arxiv.org/pdf/1806.06871v1.pdf
PWC | https://paperswithcode.com/paper/continuous-variable-quantum-neural-networks |
Repo | https://github.com/XanaduAI/quantum-learning |
Framework | tf |
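A hedged sketch of a single-mode continuous-variable "layer" as the abstract describes it, written with the Strawberry Fields library the paper uses: Gaussian gates (rotation, squeezing, displacement) play the role of the affine transformation, and a non-Gaussian Kerr gate supplies the nonlinearity. All gate parameters and the Fock-space cutoff are arbitrary illustrative values, and a real layer would act on several coupled modes.

```python
import strawberryfields as sf
from strawberryfields import ops

prog = sf.Program(1)
with prog.context as q:
    ops.Rgate(0.3)  | q[0]   # phase rotation   }
    ops.Sgate(0.4)  | q[0]   # squeezing        }  Gaussian ("affine") part
    ops.Rgate(0.1)  | q[0]   # phase rotation   }
    ops.Dgate(0.2)  | q[0]   # displacement     }
    ops.Kgate(0.05) | q[0]   # Kerr gate: the non-Gaussian nonlinearity

eng = sf.Engine("fock", backend_options={"cutoff_dim": 10})
state = eng.run(prog).state
print(state.mean_photon(0))  # (mean photon number, variance) of the output mode
```

Stacking such layers, with the gate parameters treated as trainable weights, gives the variational circuit that the paper trains on tasks such as fraud detection.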
Albumentations: fast and flexible image augmentations
Title | Albumentations: fast and flexible image augmentations |
Authors | Alexander Buslaev, Alex Parinov, Eugene Khvedchenya, Vladimir I. Iglovikov, Alexandr A. Kalinin |
Abstract | Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies among existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with a large variety of image transform operations available, that is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations. The source code for Albumentations is made publicly available online at https://github.com/albu/albumentations |
Tasks | Data Augmentation, Image Augmentation |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06839v1 |
PDF | http://arxiv.org/pdf/1809.06839v1.pdf
PWC | https://paperswithcode.com/paper/albumentations-fast-and-flexible-image |
Repo | https://github.com/albu/albumentations |
Framework | pytorch |
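A minimal usage sketch of the pipeline style Albumentations provides: transforms are composed into a single callable that is applied to an image (and, where relevant, to its mask or bounding boxes). The particular transforms and probabilities below are arbitrary examples, and the random image is only a stand-in for real data.

```python
import numpy as np
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Rotate(limit=15, p=0.5),
])

image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in image
augmented = transform(image=image)["image"]
print(augmented.shape)   # (256, 256, 3)
```

Because the transforms operate on plain numpy arrays, the same pipeline can feed training loops in any framework, which is part of what the abstract means by "flexible".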
Variance Networks: When Expectation Does Not Meet Your Expectations
Title | Variance Networks: When Expectation Does Not Meet Your Expectations |
Authors | Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov |
Abstract | Ordinary stochastic neural networks mostly rely on the expected values of their weights to make predictions, whereas the induced noise is mostly used to capture the uncertainty, prevent overfitting and slightly boost the performance through test-time averaging. In this paper, we introduce variance layers, a different kind of stochastic layer. Each weight of a variance layer follows a zero-mean distribution and is only parameterized by its variance. We show that such layers can learn surprisingly well, can serve as an efficient exploration tool in reinforcement learning tasks and provide a decent defense against adversarial attacks. We also show that a number of conventional Bayesian neural networks naturally converge to such zero-mean posteriors. We observe that in these cases such zero-mean parameterization leads to a much better training objective than conventional parameterizations where the mean is being learned. |
Tasks | Efficient Exploration |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03764v5 |
PDF | http://arxiv.org/pdf/1803.03764v5.pdf
PWC | https://paperswithcode.com/paper/variance-networks-when-expectation-does-not |
Repo | https://github.com/jondaa/CS236605FinalProject |
Framework | pytorch |
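A rough PyTorch sketch of the "variance layer" idea: every weight follows a zero-mean Gaussian and only its (log-)variance is a learnable parameter, so a fresh weight sample w = sigma * eps is drawn on each forward pass. The layer sizes and the initial log-variance are illustrative, and this omits the variational training objective used in the paper.

```python
import torch
import torch.nn as nn

class VarianceLinear(nn.Module):
    """Linear layer whose weights are zero-mean Gaussians with learned variances."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        eps = torch.randn_like(self.log_sigma)
        weight = torch.exp(self.log_sigma) * eps   # no learned mean at all
        return x @ weight.t()

layer = VarianceLinear(16, 4)
x = torch.randn(8, 16)
print(layer(x).shape)   # torch.Size([8, 4]); a different weight sample on every call
```

Since the expected weight is exactly zero, any predictive signal has to live in the noise itself, which is what makes it surprising that such layers learn well; predictions are typically averaged over several stochastic forward passes.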
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice
Title | A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice |
Authors | Hendrik Fichtenberger, Dennis Rohde |
Abstract | In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05064v3 |
PDF | http://arxiv.org/pdf/1810.05064v3.pdf
PWC | https://paperswithcode.com/paper/a-theory-based-evaluation-of-nearest-neighbor |
Repo | https://github.com/derohde/knn_test |
Framework | none |
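A simplified illustration of one-sided testing of the k-NN property, not the paper's actual tester or its $O(\sqrt{n} k^2 / \epsilon^2)$ procedure: sample a few vertices, compare their outgoing edges against a brute-force k-NN computation, and reject as soon as a violation is found. Like any one-sided tester, it never rejects a graph that really is a k-NN graph; the sample size and data below are illustrative.

```python
import numpy as np

def looks_like_knn_graph(points, edges, k, n_samples=20, seed=0):
    """One-sided check: edges[i] lists the out-neighbours of vertex i."""
    rng = np.random.default_rng(seed)
    n = len(points)
    for i in rng.choice(n, size=min(n_samples, n), replace=False):
        dists = np.linalg.norm(points - points[i], axis=1)
        dists[i] = np.inf
        if set(edges[i]) != set(np.argsort(dists)[:k]):
            return False          # a witness was found: definitely not a k-NN graph
    return True                   # no violation among the sampled vertices

rng = np.random.default_rng(1)
points = rng.normal(size=(100, 2))
k = 3
edges = {}
for i in range(100):              # build the exact k-NN graph for the demo
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf
    edges[i] = list(np.argsort(d)[:k])
print(looks_like_knn_graph(points, edges, k))   # True: the exact k-NN graph is accepted
```

The point of the paper's tester is that a graph that is epsilon-far from the k-NN property can be rejected after inspecting far fewer vertices and edges than rebuilding the k-NN model from scratch would require.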
Model selection with lasso-zero: adding straw to the haystack to better find needles
Title | Model selection with lasso-zero: adding straw to the haystack to better find needles |
Authors | Pascaline Descloux, Sylvain Sardy |
Abstract | The high-dimensional linear model $y = X \beta^0 + \epsilon$ is considered and the focus is put on the problem of recovering the support $S^0$ of the sparse vector $\beta^0.$ We introduce Lasso-Zero, a new $\ell_1$-based estimator whose novelty resides in an “overfit, then threshold” paradigm and the use of noise dictionaries concatenated to $X$ for overfitting the response. To select the threshold, we employ the quantile universal threshold based on a pivotal statistic that requires neither knowledge nor preliminary estimation of the noise level. Numerical simulations show that Lasso-Zero performs well in terms of support recovery and provides an excellent trade-off between high true positive rate and low false discovery rate compared to competitors. Our methodology is supported by theoretical results showing that when no noise dictionary is used, Lasso-Zero recovers the signs of $\beta^0$ under weaker conditions on $X$ and $S^0$ than the Lasso and achieves sign consistency for correlated Gaussian designs. The use of noise dictionary improves the procedure for low signals. |
Tasks | Model Selection |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05133v2 |
PDF | http://arxiv.org/pdf/1805.05133v2.pdf
PWC | https://paperswithcode.com/paper/model-selection-with-lasso-zero-adding-straw |
Repo | https://github.com/pascalinedescloux/lasso-zero |
Framework | none |
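A crude sketch of the "overfit, then threshold" idea: concatenate a Gaussian noise dictionary to X, fit an (almost) interpolating l1 solution, and then keep only the coefficients of the real columns whose magnitude exceeds a threshold. The paper repeats this over several noise dictionaries and selects the threshold with the quantile universal threshold; here a single dictionary, sklearn's Lasso with a tiny penalty as a stand-in for the lambda-to-zero limit, and an ad-hoc threshold are illustrative simplifications.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5
X = rng.normal(size=(n, p))
beta0 = np.zeros(p)
beta0[:s] = 3.0
y = X @ beta0 + 0.5 * rng.normal(size=n)

G = rng.normal(size=(n, p))                       # noise dictionary concatenated to X
fit = Lasso(alpha=1e-4, max_iter=100_000, fit_intercept=False).fit(np.hstack([X, G]), y)
coef_X = fit.coef_[:p]                            # overfit coefficients on the real columns

tau = 1.0                                         # ad-hoc threshold, illustration only
support = np.flatnonzero(np.abs(coef_X) > tau)
print(support)                                    # ideally recovers {0, 1, 2, 3, 4}
```

The noise columns give the overfitting somewhere harmless to put spurious signal, which is the "adding straw to the haystack" of the title.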
Universal and Succinct Source Coding of Deep Neural Networks
Title | Universal and Succinct Source Coding of Deep Neural Networks |
Authors | Sourya Basu, Lav R. Varshney |
Abstract | Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling of deep learning as a service (DLaaS) and use for on-device intelligence. This paper is concerned with finding universal lossless compressed representations of deep feedforward networks with synaptic weights drawn from discrete sets, and directly performing inference without full decompression. The basic insight that allows less rate than naive approaches is recognizing that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation. We provide efficient algorithms to dissipate this irrelevant uncertainty and then use arithmetic coding to nearly achieve the entropy bound in a universal manner. We also provide experimental results of our approach on several standard datasets. |
Tasks | |
Published | 2018-04-09 |
URL | https://arxiv.org/abs/1804.02800v2 |
PDF | https://arxiv.org/pdf/1804.02800v2.pdf
PWC | https://paperswithcode.com/paper/universal-and-succinct-source-coding-of-deep |
Repo | https://github.com/basusourya/DNN |
Framework | none |
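A back-of-the-envelope sketch of the key insight: the hidden units of a feedforward layer can be relabelled without changing the function it computes, so roughly log2(m!) bits of a naive encoding of an m-unit layer are redundant. The layer size and alphabet below are illustrative numbers, not figures from the paper, and the paper's actual scheme reaches the entropy bound with arithmetic coding rather than this simple count.

```python
import math

def naive_bits(n_in, m, alphabet_size):
    """Bits to store an n_in x m weight matrix over a discrete alphabet, ignoring structure."""
    return n_in * m * math.log2(alphabet_size)

def permutation_savings(m):
    """log2(m!) bits recoverable because the m hidden units can be arbitrarily relabelled."""
    return math.lgamma(m + 1) / math.log(2)

n_in, m, alphabet = 784, 512, 16          # e.g. a 784 -> 512 layer with 4-bit weights
print(f"naive encoding:      {naive_bits(n_in, m, alphabet):,.0f} bits")
print(f"permutation savings: {permutation_savings(m):,.0f} bits")
```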
Effects of sampling skewness of the importance-weighted risk estimator on model selection
Title | Effects of sampling skewness of the importance-weighted risk estimator on model selection |
Authors | Wouter M. Kouw, Marco Loog |
Abstract | Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for datasets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to suboptimal regularization parameters when used for importance-weighted validation. |
Tasks | Model Selection |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07344v1 |
PDF | http://arxiv.org/pdf/1804.07344v1.pdf
PWC | https://paperswithcode.com/paper/effects-of-sampling-skewness-of-the |
Repo | https://github.com/wmkouw/covshift-skewness |
Framework | none |
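A small numpy sketch of the estimator the abstract analyses: losses on source-distribution samples are reweighted by w(x) = p_target(x) / p_source(x) to estimate the target-domain risk. The densities, the fixed classifier and the small sample size below are illustrative assumptions chosen to mimic the covariate-shift, small-n regime where the paper observes skewness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                              # small sample, where skewness bites

def pdf(x, mu, sigma=1.0):                          # Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 1.0, size=n)                    # samples from the source domain
y = (x > 0).astype(float)                           # some labelling function
pred = (x > 0.2).astype(float)                      # a fixed classifier under evaluation
loss = (pred - y) ** 2

w = pdf(x, 0.5) / pdf(x, 0.0)                       # importance weights p_target / p_source
print("unweighted risk estimate:         ", loss.mean())
print("importance-weighted risk estimate:", (w * loss).mean())
```

Repeating this over many draws of x would reveal the skewed sampling distribution the paper describes: most draws slightly overestimate the target risk, while a few tail draws underestimate it badly, which in turn biases importance-weighted validation toward suboptimal regularization parameters.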