Paper Group AWR 118
Universal Dependency Parsing with a General Transition-Based DAG Parser
Title | Universal Dependency Parsing with a General Transition-Based DAG Parser |
Authors | Daniel Hershcovich, Omri Abend, Ari Rappoport |
Abstract | This paper presents our experiments with applying TUPA to the CoNLL 2018 UD shared task. TUPA is a general neural transition-based DAG parser, which we use to present the first experiments on recovering enhanced dependencies as part of the general parsing task. TUPA was designed for parsing UCCA, a cross-linguistic semantic annotation scheme, exhibiting reentrancy, discontinuity and non-terminal nodes. By converting UD trees and graphs to a UCCA-like DAG format, we train TUPA almost without modification on the UD parsing task. The generic nature of our approach lends itself naturally to multitask learning. Our code is available at https://github.com/CoNLL-UD-2018/HUJI |
Tasks | Dependency Parsing |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09354v1 |
PDF | http://arxiv.org/pdf/1808.09354v1.pdf
PWC | https://paperswithcode.com/paper/universal-dependency-parsing-with-a-general |
Repo | https://github.com/CoNLL-UD-2018/HUJI |
Framework | none |
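To make the "transition-based" part of the abstract concrete, here is a toy sketch of how a transition system turns a sequence of actions into dependency arcs. It uses a generic arc-standard-style stack/buffer system, not TUPA's richer transition set (which also handles reentrancy, discontinuity and non-terminal nodes); the example sentence and action sequence are purely illustrative.

```python
# Toy transition-based parsing: applying a given action sequence produces
# a set of (head, dependent) arcs over token indices.
def parse(words, transitions):
    stack, buffer, arcs = [], list(range(len(words))), []
    for action in transitions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":      # second-from-top becomes dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":     # top becomes dependent of the second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["She", "reads", "books"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
print(parse(words, actions))   # [(1, 0), (1, 2)]: "reads" heads both "She" and "books"
```

A neural parser such as TUPA learns a classifier that predicts the next action from the current stack/buffer configuration; a DAG parser additionally allows nodes to receive more than one incoming arc.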
Modeling Composite Labels for Neural Morphological Tagging
Title | Modeling Composite Labels for Neural Morphological Tagging |
Authors | Alexander Tkachenko, Kairit Sirts |
Abstract | Neural morphological tagging has been regarded as an extension to the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure. We propose to view morphological tags as composite labels and explicitly model their internal structure in a neural sequence tagger. For this, we explore three different neural architectures and compare their performance with both CRF and simple neural multiclass baselines. We evaluate our models on 49 languages and show that the neural architecture that models the morphological labels as sequences of morphological category values performs significantly better than both baselines, establishing state-of-the-art results in morphological tagging for most languages. |
Tasks | Morphological Tagging |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08815v1 |
PDF | http://arxiv.org/pdf/1810.08815v1.pdf
PWC | https://paperswithcode.com/paper/modeling-composite-labels-for-neural |
Repo | https://github.com/AleksTk/seq-morph-tagger |
Framework | tf |
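A small sketch of what the "composite label" view means in practice: a UD-style morphological tag string is decomposed into a sequence of category=value pairs, which a sequence model can predict one element at a time instead of choosing among every distinct tag string. The tag string below is an illustrative example, not taken from the paper.

```python
# Decompose a monolithic morphological tag into (category, value) pairs.
def decompose(tag):
    """'POS=NOUN|Case=Nom|Number=Sing' -> [('POS', 'NOUN'), ('Case', 'Nom'), ...]"""
    return [tuple(feat.split("=", 1)) for feat in tag.split("|")]

monolithic = "POS=NOUN|Case=Nom|Number=Sing"
print(decompose(monolithic))
# [('POS', 'NOUN'), ('Case', 'Nom'), ('Number', 'Sing')]
```

Modelling the label as this sequence, rather than as one softmax over all observed tag strings, is the design the abstract reports as performing best.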
Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme
Title | Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme |
Authors | Jey Han Lau, Trevor Cohn, Timothy Baldwin, Julian Brooke, Adam Hammond |
Abstract | In this paper, we propose a joint architecture that captures language, rhyme and meter for sonnet modelling. We assess the quality of generated poems using crowd and expert judgements. The stress and rhyme models perform very well, as generated poems are largely indistinguishable from human-written poems. Expert evaluation, however, reveals that a vanilla language model captures meter implicitly, and that machine-generated poems still underperform in terms of readability and emotion. Our research shows the importance of expert evaluation for poetry generation, and that future research should look beyond rhyme/meter and focus on poetic language. |
Tasks | Language Modelling |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03491v1 |
PDF | http://arxiv.org/pdf/1807.03491v1.pdf
PWC | https://paperswithcode.com/paper/deep-speare-a-joint-neural-model-of-poetic |
Repo | https://github.com/jhlau/deepspeare |
Framework | tf |
Hashing with Mutual Information
Title | Hashing with Mutual Information |
Authors | Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff |
Abstract | Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval. We study the problem of learning binary vector embeddings under a supervised setting, also known as hashing. We propose a novel supervised hashing method based on optimizing an information-theoretic quantity: mutual information. We show that optimizing mutual information can reduce ambiguity in the induced neighborhood structure in the learned Hamming space, which is essential in obtaining high retrieval performance. To this end, we optimize mutual information in deep neural networks with minibatch stochastic gradient descent, with a formulation that maximally and efficiently utilizes available supervision. Experiments on four image retrieval benchmarks, including ImageNet, confirm the effectiveness of our method in learning high-quality binary embeddings for nearest neighbor retrieval. |
Tasks | Image Retrieval, Video Retrieval |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00974v2 |
PDF | http://arxiv.org/pdf/1803.00974v2.pdf
PWC | https://paperswithcode.com/paper/hashing-with-mutual-information |
Repo | https://github.com/fcakir/deep-mihash |
Framework | none |
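To illustrate the quantity being optimized, here is a rough numpy sketch of the mutual information between neighbourhood membership (same class as a query or not) and the Hamming distance from the query's binary code. This is a plain empirical estimate for a single query with random codes and labels, not the paper's differentiable minibatch estimator; all sizes and names below are illustrative assumptions.

```python
import numpy as np

def mutual_information(dist, is_neighbor, n_bits):
    """I(membership; Hamming distance) from empirical joint counts."""
    joint = np.zeros((2, n_bits + 1))
    for d, m in zip(dist, is_neighbor):
        joint[int(m), d] += 1
    joint /= joint.sum()
    pm = joint.sum(axis=1, keepdims=True)          # marginal over membership
    pd = joint.sum(axis=0, keepdims=True)          # marginal over distance
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pm @ pd)[nz])).sum())

rng = np.random.default_rng(0)
n_bits = 16
codes = rng.integers(0, 2, size=(1000, n_bits))    # random binary codes
labels = rng.integers(0, 10, size=1000)
query_code, query_label = codes[0], labels[0]
dist = (codes[1:] != query_code).sum(axis=1)       # Hamming distances to the query
is_neighbor = labels[1:] == query_label            # same-class points count as neighbours
print(mutual_information(dist, is_neighbor, n_bits))  # near 0 for random codes
```

Codes that cleanly separate neighbours from non-neighbours in Hamming distance drive this value up, which is the intuition behind using mutual information as the hashing objective.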
GAN Q-learning
Title | GAN Q-learning |
Authors | Thang Doan, Bogdan Mazoure, Clare Lyle |
Abstract | Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and blackbox approach of deep learning models while providing a viable alternative to traditional methods. |
Tasks | Distributional Reinforcement Learning, Q-Learning |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04874v3 |
PDF | http://arxiv.org/pdf/1805.04874v3.pdf
PWC | https://paperswithcode.com/paper/gan-q-learning |
Repo | https://github.com/daggertye/GAN-Q-Learning |
Framework | tf |
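A rough, heavily simplified PyTorch sketch in the spirit of GAN Q-learning (the linked repository uses TensorFlow; the network sizes, the dummy transition batch, and the crude per-action Bellman target below are illustrative assumptions, not the paper's exact formulation). A generator proposes samples of the return, and a discriminator is trained to distinguish them from Bellman targets; fooling the discriminator pushes the generator's samples toward the return distribution.

```python
import torch
import torch.nn as nn

state_dim, n_actions, noise_dim, gamma = 4, 2, 8, 0.99

class Generator(nn.Module):
    """Maps (state, noise) to one sampled return per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

class Discriminator(nn.Module):
    """Scores (state, per-action returns) as real (Bellman target) or fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + n_actions, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, y):
        return self.net(torch.cat([s, y], dim=-1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Dummy transition batch (s, r, s'); real data would come from a replay buffer.
s, r, s_next = torch.randn(32, state_dim), torch.randn(32, 1), torch.randn(32, state_dim)

with torch.no_grad():
    next_returns = G(s_next, torch.randn(32, noise_dim))          # sampled returns at s'
    target = r + gamma * next_returns.max(dim=1, keepdim=True)[0]
    target = target.expand(-1, n_actions)                         # crude per-action target

# Discriminator step: Bellman targets are "real", generator samples are "fake".
fake = G(s, torch.randn(32, noise_dim))
d_loss = bce(D(s, target), torch.ones(32, 1)) + bce(D(s, fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator.
g_loss = bce(D(s, G(s, torch.randn(32, noise_dim))), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In a full agent, actions would be chosen greedily with respect to the mean of several generated return samples, and a slowly updated target copy of the generator would produce the Bellman targets.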
3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
Title | 3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks |
Authors | Soo Ye Kim, Jeongyeon Lim, Taeyoung Na, Munchurl Kim |
Abstract | In video super-resolution, the spatio-temporal coherence between and among the frames must be exploited appropriately for accurate prediction of the high-resolution frames. Although 2D convolutional neural networks (CNNs) are powerful in modelling images, 3D-CNNs are more suitable for spatio-temporal feature extraction as they can preserve temporal information. To this end, we propose an effective 3D-CNN for video super-resolution, called 3DSRnet, that does not require motion alignment as preprocessing. Our 3DSRnet maintains the temporal depth of spatio-temporal feature maps to maximally capture the temporally nonlinear characteristics between low- and high-resolution frames, and adopts residual learning in conjunction with sub-pixel outputs. It outperforms the most recent state-of-the-art method by an average of 0.45 and 0.36 dB in PSNR for scales 3 and 4, respectively, on the Vidset4 benchmark. Our 3DSRnet is also the first to address the performance drop due to scene change, which is important in practice but has not been previously considered. |
Tasks | Super-Resolution, Video Super-Resolution |
Published | 2018-12-21 |
URL | https://arxiv.org/abs/1812.09079v2 |
PDF | https://arxiv.org/pdf/1812.09079v2.pdf
PWC | https://paperswithcode.com/paper/3dsrnet-video-super-resolution-using-3d |
Repo | https://github.com/sooyekim/3DSRnet |
Framework | none |
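A rough PyTorch sketch (independent of the authors' code and not the paper's exact architecture) of the ingredients the abstract describes: depth-preserving 3D convolutions over a short stack of low-resolution frames, a sub-pixel (PixelShuffle) output layer, and residual learning on top of a naively upscaled centre frame. Layer widths, kernel sizes and the 5-frame window are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny3DSR(nn.Module):
    def __init__(self, scale=4, n_frames=5, feats=32):
        super().__init__()
        self.body = nn.Sequential(                       # 3D convs that keep temporal depth
            nn.Conv3d(3, feats, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(feats, feats, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.collapse = nn.Conv3d(feats, feats, kernel_size=(n_frames, 3, 3),
                                  padding=(0, 1, 1))     # fuse the temporal axis
        self.upscale = nn.Sequential(                    # sub-pixel convolution
            nn.Conv2d(feats, 3 * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.scale = scale

    def forward(self, frames):                           # frames: (N, 3, T, H, W)
        centre = frames[:, :, frames.shape[2] // 2]      # (N, 3, H, W)
        base = F.interpolate(centre, scale_factor=self.scale,
                             mode="bicubic", align_corners=False)
        x = self.collapse(self.body(frames)).squeeze(2)  # (N, feats, H, W)
        return base + self.upscale(x)                    # residual learning

lr_clip = torch.randn(1, 3, 5, 32, 32)                   # 5 low-resolution frames
print(Tiny3DSR()(lr_clip).shape)                         # torch.Size([1, 3, 128, 128])
```

Because the network predicts only the residual over a cheap upscaling of the centre frame, the 3D convolutions can focus on the temporally nonlinear detail that bicubic interpolation misses.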
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Title | Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! |
Authors | Steffen Eger, Johannes Daxenberger, Christian Stab, Iryna Gurevych |
Abstract | Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at \url{http://github.com/UKPLab/coling2018-xling_argument_mining}. |
Tasks | Cross-Lingual Transfer, Machine Translation, Word Embeddings |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08998v1 |
PDF | http://arxiv.org/pdf/1807.08998v1.pdf
PWC | https://paperswithcode.com/paper/cross-lingual-argumentation-mining-machine |
Repo | https://github.com/UKPLab/coling2018-xling_argument_mining |
Framework | none |
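A minimal illustration of annotation projection, the transfer strategy the abstract reports as performing best, shown here in a generic form: BIO argument-component labels on source-language tokens are carried over to the translated tokens via word alignments. The sentence, labels and alignment pairs below are hypothetical; in practice the alignments would come from an automatic word aligner run on the (human or machine) translations.

```python
def project_labels(src_labels, alignments, tgt_len, default="O"):
    """Project token labels through (src_index, tgt_index) alignment pairs."""
    tgt_labels = [default] * tgt_len
    for src_i, tgt_i in alignments:
        tgt_labels[tgt_i] = src_labels[src_i]
    return tgt_labels

src = ["Smoking", "should", "be", "banned", "in", "public"]
src_labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "I-Claim", "I-Claim"]
# hypothetical German translation: "Rauchen sollte in der Öffentlichkeit verboten werden"
alignments = [(0, 0), (1, 1), (4, 2), (5, 4), (3, 5), (2, 6)]
print(project_labels(src_labels, alignments, tgt_len=7))
# ['B-Claim', 'I-Claim', 'I-Claim', 'O', 'I-Claim', 'I-Claim', 'I-Claim']
```

The projected target-language corpus is then used to train a monolingual tagger, which is why the quality of the translations (human or machine) matters less than one might expect.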
PennyLane: Automatic differentiation of hybrid quantum-classical computations
Title | PennyLane: Automatic differentiation of hybrid quantum-classical computations |
Authors | Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola, Carsten Blank, Alain Delgado, Soran Jahangiri, Keri McKiernan, Johannes Jakob Meyer, Zeyue Niu, Antal Száva, Nathan Killoran |
Abstract | PennyLane is a Python 3 software framework for optimization and machine learning of quantum and hybrid quantum-classical computations. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane’s core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for Strawberry Fields, Rigetti Forest, Qiskit, Cirq, and ProjectQ, allowing PennyLane optimizations to be run on publicly accessible quantum devices provided by Rigetti and IBM Q. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, and autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications. |
Tasks | Quantum Machine Learning |
Published | 2018-11-12 |
URL | https://arxiv.org/abs/1811.04968v3 |
PDF | https://arxiv.org/pdf/1811.04968v3.pdf
PWC | https://paperswithcode.com/paper/pennylane-automatic-differentiation-of-hybrid |
Repo | https://github.com/XanaduAI/pennylane |
Framework | tf |
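A minimal sketch of the core PennyLane workflow the abstract describes: a variational quantum circuit whose gradient is computed automatically and fed to a classical optimizer. It uses the bundled simulator device; swapping in a hardware plugin would only change the device string. The circuit, parameters and step count are arbitrary illustrative choices.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))

params = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.4)
for _ in range(50):
    params = opt.step(circuit, params)   # gradients of the quantum circuit, automatically
print(circuit(params))                   # moves toward the minimum expectation value of -1
```

The same QNode can also be wrapped as a TensorFlow or PyTorch layer, which is what makes hybrid quantum-classical models differentiable end to end.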
Continuous-variable quantum neural networks
Title | Continuous-variable quantum neural networks |
Authors | Nathan Killoran, Thomas R. Bromley, Juan Miguel Arrazola, Maria Schuld, Nicolás Quesada, Seth Lloyd |
Abstract | We introduce a general method for building neural networks on quantum computers. The quantum neural network is a variational quantum circuit built in the continuous-variable (CV) architecture, which encodes quantum information in continuous degrees of freedom such as the amplitudes of the electromagnetic field. This circuit contains a layered structure of continuously parameterized gates which is universal for CV quantum computation. Affine transformations and nonlinear activation functions, two key elements in neural networks, are enacted in the quantum network using Gaussian and non-Gaussian gates, respectively. The non-Gaussian gates provide both the nonlinearity and the universality of the model. Due to the structure of the CV model, the CV quantum neural network can encode highly nonlinear transformations while remaining completely unitary. We show how a classical network can be embedded into the quantum formalism and propose quantum versions of various specialized models such as convolutional, recurrent, and residual networks. Finally, we present numerous modeling experiments built with the Strawberry Fields software library. These experiments, including a classifier for fraud detection, a network which generates Tetris images, and a hybrid classical-quantum autoencoder, demonstrate the capability and adaptability of CV quantum neural networks. |
Tasks | Fraud Detection, Quantum Machine Learning |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06871v1 |
PDF | http://arxiv.org/pdf/1806.06871v1.pdf
PWC | https://paperswithcode.com/paper/continuous-variable-quantum-neural-networks |
Repo | https://github.com/XanaduAI/quantum-learning |
Framework | tf |
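A hedged sketch of a single-mode continuous-variable "layer" as the abstract describes it, written with the Strawberry Fields library the paper uses: Gaussian gates (rotation, squeezing, displacement) play the role of the affine transformation, and a non-Gaussian Kerr gate supplies the nonlinearity. All gate parameters and the Fock-space cutoff are arbitrary illustrative values, and a real layer would act on several coupled modes.

```python
import strawberryfields as sf
from strawberryfields import ops

prog = sf.Program(1)
with prog.context as q:
    ops.Rgate(0.3)  | q[0]   # phase rotation   }
    ops.Sgate(0.4)  | q[0]   # squeezing        }  Gaussian ("affine") part
    ops.Rgate(0.1)  | q[0]   # phase rotation   }
    ops.Dgate(0.2)  | q[0]   # displacement     }
    ops.Kgate(0.05) | q[0]   # Kerr gate: the non-Gaussian nonlinearity

eng = sf.Engine("fock", backend_options={"cutoff_dim": 10})
state = eng.run(prog).state
print(state.mean_photon(0))  # (mean photon number, variance) of the output mode
```

Stacking such layers, with the gate parameters treated as trainable weights, gives the variational circuit that the paper trains on tasks such as fraud detection.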
Albumentations: fast and flexible image augmentations
Title | Albumentations: fast and flexible image augmentations |
Authors | Alexander Buslaev, Alex Parinov, Eugene Khvedchenya, Vladimir I. Iglovikov, Alexandr A. Kalinin |
Abstract | Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies among existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with a large variety of image transform operations available, that is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations. The source code for Albumentations is made publicly available online at https://github.com/albu/albumentations |
Tasks | Data Augmentation, Image Augmentation |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06839v1 |
PDF | http://arxiv.org/pdf/1809.06839v1.pdf
PWC | https://paperswithcode.com/paper/albumentations-fast-and-flexible-image |
Repo | https://github.com/albu/albumentations |
Framework | pytorch |
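A minimal usage sketch of the pipeline style Albumentations provides: transforms are composed into a single callable that is applied to an image (and, where relevant, to its mask or bounding boxes). The particular transforms and probabilities below are arbitrary examples, and the random image is only a stand-in for real data.

```python
import numpy as np
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
    A.Rotate(limit=15, p=0.5),
])

image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in image
augmented = transform(image=image)["image"]
print(augmented.shape)   # (256, 256, 3)
```

Because the transforms operate on plain numpy arrays, the same pipeline can feed training loops in any framework, which is part of what the abstract means by "flexible".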
Variance Networks: When Expectation Does Not Meet Your Expectations
Title | Variance Networks: When Expectation Does Not Meet Your Expectations |
Authors | Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov |
Abstract | Ordinary stochastic neural networks mostly rely on the expected values of their weights to make predictions, whereas the induced noise is mostly used to capture the uncertainty, prevent overfitting and slightly boost the performance through test-time averaging. In this paper, we introduce variance layers, a different kind of stochastic layer. Each weight of a variance layer follows a zero-mean distribution and is only parameterized by its variance. We show that such layers can learn surprisingly well, can serve as an efficient exploration tool in reinforcement learning tasks and provide a decent defense against adversarial attacks. We also show that a number of conventional Bayesian neural networks naturally converge to such zero-mean posteriors. We observe that in these cases such zero-mean parameterization leads to a much better training objective than conventional parameterizations where the mean is being learned. |
Tasks | Efficient Exploration |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03764v5 |
PDF | http://arxiv.org/pdf/1803.03764v5.pdf
PWC | https://paperswithcode.com/paper/variance-networks-when-expectation-does-not |
Repo | https://github.com/jondaa/CS236605FinalProject |
Framework | pytorch |
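A rough PyTorch sketch of the "variance layer" idea: every weight follows a zero-mean Gaussian and only its (log-)variance is a learnable parameter, so a fresh weight sample w = sigma * eps is drawn on each forward pass. The layer sizes and the initial log-variance are illustrative, and this omits the variational training objective used in the paper.

```python
import torch
import torch.nn as nn

class VarianceLinear(nn.Module):
    """Linear layer whose weights are zero-mean Gaussians with learned variances."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        eps = torch.randn_like(self.log_sigma)
        weight = torch.exp(self.log_sigma) * eps   # no learned mean at all
        return x @ weight.t()

layer = VarianceLinear(16, 4)
x = torch.randn(8, 16)
print(layer(x).shape)   # torch.Size([8, 4]); a different weight sample on every call
```

Since the expected weight is exactly zero, any predictive signal has to live in the noise itself, which is what makes it surprising that such layers learn well; predictions are typically averaged over several stochastic forward passes.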
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice
Title | A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice |
Authors | Hendrik Fichtenberger, Dennis Rohde |
Abstract | In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05064v3 |
PDF | http://arxiv.org/pdf/1810.05064v3.pdf
PWC | https://paperswithcode.com/paper/a-theory-based-evaluation-of-nearest-neighbor |
Repo | https://github.com/derohde/knn_test |
Framework | none |
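A simplified illustration of one-sided testing of the k-NN property, not the paper's actual tester or its $O(\sqrt{n} k^2 / \epsilon^2)$ procedure: sample a few vertices, compare their outgoing edges against a brute-force k-NN computation, and reject as soon as a violation is found. Like any one-sided tester, it never rejects a graph that really is a k-NN graph; the sample size and data below are illustrative.

```python
import numpy as np

def looks_like_knn_graph(points, edges, k, n_samples=20, seed=0):
    """One-sided check: edges[i] lists the out-neighbours of vertex i."""
    rng = np.random.default_rng(seed)
    n = len(points)
    for i in rng.choice(n, size=min(n_samples, n), replace=False):
        dists = np.linalg.norm(points - points[i], axis=1)
        dists[i] = np.inf
        if set(edges[i]) != set(np.argsort(dists)[:k]):
            return False          # a witness was found: definitely not a k-NN graph
    return True                   # no violation among the sampled vertices

rng = np.random.default_rng(1)
points = rng.normal(size=(100, 2))
k = 3
edges = {}
for i in range(100):              # build the exact k-NN graph for the demo
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf
    edges[i] = list(np.argsort(d)[:k])
print(looks_like_knn_graph(points, edges, k))   # True: the exact k-NN graph is accepted
```

The point of the paper's tester is that a graph that is epsilon-far from the k-NN property can be rejected after inspecting far fewer vertices and edges than rebuilding the k-NN model from scratch would require.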
Model selection with lasso-zero: adding straw to the haystack to better find needles
Title | Model selection with lasso-zero: adding straw to the haystack to better find needles |
Authors | Pascaline Descloux, Sylvain Sardy |
Abstract | The high-dimensional linear model $y = X \beta^0 + \epsilon$ is considered and the focus is put on the problem of recovering the support $S^0$ of the sparse vector $\beta^0.$ We introduce Lasso-Zero, a new $\ell_1$-based estimator whose novelty resides in an “overfit, then threshold” paradigm and the use of noise dictionaries concatenated to $X$ for overfitting the response. To select the threshold, we employ the quantile universal threshold based on a pivotal statistic that requires neither knowledge nor preliminary estimation of the noise level. Numerical simulations show that Lasso-Zero performs well in terms of support recovery and provides an excellent trade-off between high true positive rate and low false discovery rate compared to competitors. Our methodology is supported by theoretical results showing that when no noise dictionary is used, Lasso-Zero recovers the signs of $\beta^0$ under weaker conditions on $X$ and $S^0$ than the Lasso and achieves sign consistency for correlated Gaussian designs. The use of noise dictionary improves the procedure for low signals. |
Tasks | Model Selection |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05133v2 |
PDF | http://arxiv.org/pdf/1805.05133v2.pdf
PWC | https://paperswithcode.com/paper/model-selection-with-lasso-zero-adding-straw |
Repo | https://github.com/pascalinedescloux/lasso-zero |
Framework | none |
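A crude sketch of the "overfit, then threshold" idea: concatenate a Gaussian noise dictionary to X, fit an (almost) interpolating l1 solution, and then keep only the coefficients of the real columns whose magnitude exceeds a threshold. The paper repeats this over several noise dictionaries and selects the threshold with the quantile universal threshold; here a single dictionary, sklearn's Lasso with a tiny penalty as a stand-in for the lambda-to-zero limit, and an ad-hoc threshold are illustrative simplifications.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5
X = rng.normal(size=(n, p))
beta0 = np.zeros(p)
beta0[:s] = 3.0
y = X @ beta0 + 0.5 * rng.normal(size=n)

G = rng.normal(size=(n, p))                       # noise dictionary concatenated to X
fit = Lasso(alpha=1e-4, max_iter=100_000, fit_intercept=False).fit(np.hstack([X, G]), y)
coef_X = fit.coef_[:p]                            # overfit coefficients on the real columns

tau = 1.0                                         # ad-hoc threshold, illustration only
support = np.flatnonzero(np.abs(coef_X) > tau)
print(support)                                    # ideally recovers {0, 1, 2, 3, 4}
```

The noise columns give the overfitting somewhere harmless to put spurious signal, which is the "adding straw to the haystack" of the title.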
Universal and Succinct Source Coding of Deep Neural Networks
Title | Universal and Succinct Source Coding of Deep Neural Networks |
Authors | Sourya Basu, Lav R. Varshney |
Abstract | Deep neural networks have shown incredible performance for inference tasks in a variety of domains. Unfortunately, most current deep networks are enormous cloud-based structures that require significant storage space, which limits scaling of deep learning as a service (DLaaS) and use for on-device intelligence. This paper is concerned with finding universal lossless compressed representations of deep feedforward networks with synaptic weights drawn from discrete sets, and directly performing inference without full decompression. The basic insight that allows less rate than naive approaches is recognizing that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation. We provide efficient algorithms to dissipate this irrelevant uncertainty and then use arithmetic coding to nearly achieve the entropy bound in a universal manner. We also provide experimental results of our approach on several standard datasets. |
Tasks | |
Published | 2018-04-09 |
URL | https://arxiv.org/abs/1804.02800v2 |
PDF | https://arxiv.org/pdf/1804.02800v2.pdf
PWC | https://paperswithcode.com/paper/universal-and-succinct-source-coding-of-deep |
Repo | https://github.com/basusourya/DNN |
Framework | none |
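A back-of-the-envelope sketch of the key insight: the hidden units of a feedforward layer can be relabelled without changing the function it computes, so roughly log2(m!) bits of a naive encoding of an m-unit layer are redundant. The layer size and alphabet below are illustrative numbers, not figures from the paper, and the paper's actual scheme reaches the entropy bound with arithmetic coding rather than this simple count.

```python
import math

def naive_bits(n_in, m, alphabet_size):
    """Bits to store an n_in x m weight matrix over a discrete alphabet, ignoring structure."""
    return n_in * m * math.log2(alphabet_size)

def permutation_savings(m):
    """log2(m!) bits recoverable because the m hidden units can be arbitrarily relabelled."""
    return math.lgamma(m + 1) / math.log(2)

n_in, m, alphabet = 784, 512, 16          # e.g. a 784 -> 512 layer with 4-bit weights
print(f"naive encoding:      {naive_bits(n_in, m, alphabet):,.0f} bits")
print(f"permutation savings: {permutation_savings(m):,.0f} bits")
```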
Effects of sampling skewness of the importance-weighted risk estimator on model selection
Title | Effects of sampling skewness of the importance-weighted risk estimator on model selection |
Authors | Wouter M. Kouw, Marco Loog |
Abstract | Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for datasets in the body of the sampling distribution, i.e. the majority of cases, and large underestimates for data sets in the tail of the sampling distribution. These over- and underestimates of the risk lead to suboptimal regularization parameters when used for importance-weighted validation. |
Tasks | Model Selection |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07344v1 |
PDF | http://arxiv.org/pdf/1804.07344v1.pdf
PWC | https://paperswithcode.com/paper/effects-of-sampling-skewness-of-the |
Repo | https://github.com/wmkouw/covshift-skewness |
Framework | none |
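A small numpy sketch of the estimator the abstract analyses: losses on source-distribution samples are reweighted by w(x) = p_target(x) / p_source(x) to estimate the target-domain risk. The densities, the fixed classifier and the small sample size below are illustrative assumptions chosen to mimic the covariate-shift, small-n regime where the paper observes skewness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                              # small sample, where skewness bites

def pdf(x, mu, sigma=1.0):                          # Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 1.0, size=n)                    # samples from the source domain
y = (x > 0).astype(float)                           # some labelling function
pred = (x > 0.2).astype(float)                      # a fixed classifier under evaluation
loss = (pred - y) ** 2

w = pdf(x, 0.5) / pdf(x, 0.0)                       # importance weights p_target / p_source
print("unweighted risk estimate:         ", loss.mean())
print("importance-weighted risk estimate:", (w * loss).mean())
```

Repeating this over many draws of x would reveal the skewed sampling distribution the paper describes: most draws slightly overestimate the target risk, while a few tail draws underestimate it badly, which in turn biases importance-weighted validation toward suboptimal regularization parameters.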