October 15, 2019

Paper Group NANR 94

The German Reference Corpus DeReKo: New Developments – New Opportunities. LSH-SAMPLING BREAKS THE COMPUTATIONAL CHICKEN-AND-EGG LOOP IN ADAPTIVE STOCHASTIC GRADIENT ESTIMATION. Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018. Large Scale Multi-Domain Multi-Task Learning with MultiModel. Deep Pi …

The German Reference Corpus DeReKo: New Developments – New Opportunities


Title	The German Reference Corpus DeReKo: New Developments – New Opportunities
Authors	Marc Kupietz, Harald Lüngen, Paweł Kamocki, Andreas Witt
Abstract
Tasks	Word Embeddings
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1689/
PDF	https://www.aclweb.org/anthology/L18-1689
PWC	https://paperswithcode.com/paper/the-german-reference-corpus-dereko-new
Repo
Framework

LSH-SAMPLING BREAKS THE COMPUTATIONAL CHICKEN-AND-EGG LOOP IN ADAPTIVE STOCHASTIC GRADIENT ESTIMATION


Title	LSH-SAMPLING BREAKS THE COMPUTATIONAL CHICKEN-AND-EGG LOOP IN ADAPTIVE STOCHASTIC GRADIENT ESTIMATION
Authors	Beidi Chen, Yingchen Xu, Anshumali Shrivastava
Abstract	Stochastic Gradient Descent or SGD is the most popular optimization algorithm for large-scale problems. SGD estimates the gradient by uniform sampling with sample size one. There have been several other works that suggest faster epoch wise convergence by using weighted non-uniform sampling for better gradient estimates. Unfortunately, the per-iteration cost of maintaining this adaptive distribution for gradient estimation is more than calculating the full gradient. As a result, the false impression of faster convergence in iterations leads to slower convergence in time, which we call a chicken-and-egg loop. In this paper, we break this barrier by providing the first demonstration of a sampling scheme, which leads to superior gradient estimation, while keeping the sampling cost per iteration similar to that of the uniform sampling. Such an algorithm is possible due to the sampling view of Locality Sensitive Hashing (LSH), which came to light recently. As a consequence of superior and fast estimation, we reduce the running time of all existing gradient descent algorithms. We demonstrate the benefits of our proposal on both SGD and AdaGrad.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=SyVOjfbRb
PDF	https://openreview.net/pdf?id=SyVOjfbRb
PWC	https://paperswithcode.com/paper/lsh-sampling-breaks-the-computational-chicken
Repo
Framework
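
The trade-off described in this abstract can be illustrated with a minimal sketch of weighted (non-uniform) gradient sampling. This is not the authors' LSH scheme: the toy least-squares data, the function names, and the choice of probabilities proportional to gradient magnitude are all assumptions made for illustration. It shows that dividing each sampled gradient by `N * p_i` keeps the estimate unbiased, and that a well-chosen distribution can lower its variance.

```python
def per_example_gradients(w, xs, ys):
    """Gradient of 0.5 * (w*x - y)^2 with respect to scalar w, one value per example."""
    return [(w * x - y) * x for x, y in zip(xs, ys)]

def weighted_estimate(grads, probs, i):
    """Single-sample estimate of the mean gradient when index i was drawn with probability probs[i]."""
    return grads[i] / (len(grads) * probs[i])

def expected_estimate(grads, probs):
    """Exact expectation of the estimator under the sampling distribution."""
    return sum(p * weighted_estimate(grads, probs, i) for i, p in enumerate(probs))

def estimator_variance(grads, probs):
    """Exact variance of the estimator under the sampling distribution."""
    mean = expected_estimate(grads, probs)
    return sum(p * (weighted_estimate(grads, probs, i) - mean) ** 2
               for i, p in enumerate(probs))

# Hypothetical toy data, chosen only so that every per-example gradient is nonzero.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 0.0, 2.0, 9.0]
grads = per_example_gradients(0.5, xs, ys)
full_gradient = sum(grads) / len(grads)

# Uniform sampling, as in plain SGD.
uniform = [1.0 / len(grads)] * len(grads)

# Adaptive sampling: probabilities proportional to gradient magnitude.
# Computing these exactly needs every gradient, which is the per-iteration cost
# the abstract calls the chicken-and-egg loop.
magnitudes = [abs(g) for g in grads]
adaptive = [m / sum(magnitudes) for m in magnitudes]
```

Both distributions give an estimator whose expectation equals `full_gradient`; on this toy data the adaptive one has the smaller variance, but only because the sketch computed all gradients up front.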


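Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018


Title	Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018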
Authors	Roland Roller, Philippe Thomas, Sven Schmeier
Abstract	In many societies alcohol is a legal and common recreational substance and socially accepted. Alcohol consumption often comes along with social events as it helps people to increase their sociability and to overcome their inhibitions. On the other hand, we know that increased alcohol consumption can lead to serious health issues, such as cancer, cardiovascular diseases and diseases of the digestive system, to mention a few. This work examines alcohol consumption during the FIFA Football World Cup 2018, particularly the usage of alcohol-related information on Twitter. For this we analyse the tweeting behaviour and show that the tournament strongly increases the interest in beer. Furthermore, we show that countries that had to leave the tournament at an early stage might have done something good for their fans, as the interest in beer decreased again.
Tasks
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5901/
PDF	https://www.aclweb.org/anthology/W18-5901
PWC	https://paperswithcode.com/paper/football-and-beer-a-social-media-analysis-on
Repo
Framework

Large Scale Multi-Domain Multi-Task Learning with MultiModel


Title	Large Scale Multi-Domain Multi-Task Learning with MultiModel
Authors	Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit
Abstract	Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
Tasks	Image Captioning, Image Classification, Multi-Task Learning, Speech Recognition
Published	2018-01-01
URL	https://openreview.net/forum?id=HyKZyYlRZ
PDF	https://openreview.net/pdf?id=HyKZyYlRZ
PWC	https://paperswithcode.com/paper/large-scale-multi-domain-multi-task-learning
Repo
Framework

Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance


Title	Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance
Authors	Yftah Ziser, Roi Reichart
Abstract	While cross-domain and cross-language transfer have long been prominent topics in NLP research, their combination has hardly been explored. In this work we consider this problem, and propose a framework that builds on pivot-based learning, structure-aware Deep Neural Networks (particularly LSTMs and CNNs) and bilingual word embeddings, with the goal of training a model on labeled data from one (language, domain) pair so that it can be effectively applied to another (language, domain) pair. We consider two setups, differing with respect to the unlabeled data available for model training. In the full setup the model has access to unlabeled data from both pairs, while in the lazy setup, which is more realistic for truly resource-poor languages, unlabeled data is available for both domains but only for the source language. We design our model for the lazy setup so that for a given target domain, it can train once on the source language and then be applied to any target language without re-training. In experiments with nine English-German and nine English-French domain pairs our best model substantially outperforms previous models even when it is trained in the lazy setup and previous models are trained in the full setup.
Tasks	Word Embeddings
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1022/
PDF	https://www.aclweb.org/anthology/D18-1022
PWC	https://paperswithcode.com/paper/deep-pivot-based-modeling-for-cross-language
Repo
Framework

Thumbs Up and Down: Sentiment Analysis of Medical Online Forums


Title	Thumbs Up and Down: Sentiment Analysis of Medical Online Forums
Authors	Victoria Bobicev, Marina Sokolova
Abstract	In the current study, we apply multi-class and multi-label sentence classification to sentiment analysis of online medical forums. We aim to identify major health issues discussed in online social media and the types of sentiments those issues evoke. We use ontology of personal health information for Information Extraction and apply Machine Learning methods in automated recognition of the expressed sentiments.
Tasks	Sentence Classification, Sentiment Analysis
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-5906/
PDF	https://www.aclweb.org/anthology/W18-5906
PWC	https://paperswithcode.com/paper/thumbs-up-and-down-sentiment-analysis-of
Repo
Framework

Efficient Stochastic Gradient Hard Thresholding


Title	Efficient Stochastic Gradient Hard Thresholding
Authors	Pan Zhou, Xiaotong Yuan, Jiashi Feng
Abstract	Stochastic gradient hard thresholding methods have recently been shown to work favorably in solving large-scale empirical risk minimization problems under sparsity or rank constraint. Despite the improved iteration complexity over full gradient methods, the gradient evaluation and hard thresholding complexity of the existing stochastic algorithms usually scales linearly with data size, which could still be expensive when data is huge and the hard thresholding step could be as expensive as singular value decomposition in rank-constrained problems. To address these deficiencies, we propose an efficient hybrid stochastic gradient hard thresholding (HSG-HT) method that can be provably shown to have sample-size-independent gradient evaluation and hard thresholding complexity bounds. Specifically, we prove that the stochastic gradient evaluation complexity of HSG-HT scales linearly with inverse of sub-optimality and its hard thresholding complexity scales logarithmically. By applying the heavy ball acceleration technique, we further propose an accelerated variant of HSG-HT which can be shown to have improved factor dependence on restricted condition number. Numerical results confirm our theoretical affirmation and demonstrate the computational efficiency of the proposed methods.
Tasks
Published	2018-12-01
URL	http://papers.nips.cc/paper/7469-efficient-stochastic-gradient-hard-thresholding
PDF	http://papers.nips.cc/paper/7469-efficient-stochastic-gradient-hard-thresholding.pdf
PWC	https://paperswithcode.com/paper/efficient-stochastic-gradient-hard
Repo
Framework
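
For readers unfamiliar with the hard thresholding step mentioned in this abstract, here is a minimal sketch of plain stochastic gradient descent with hard thresholding on a sparsity-constrained least-squares problem. It is not the HSG-HT method of the paper: the toy data, step size, iteration count, and function names are assumptions made for illustration.

```python
import random

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]

def sgd_hard_thresholding(rows, targets, k, step=0.05, iters=2000, seed=0):
    """One-sample SGD on squared error, projecting onto k-sparse vectors after each step."""
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    for _ in range(iters):
        i = rng.randrange(len(rows))  # uniform sampling of one example
        residual = sum(wj * xj for wj, xj in zip(w, rows[i])) - targets[i]
        w = [wj - step * residual * xj for wj, xj in zip(w, rows[i])]
        w = hard_threshold(w, k)      # enforce the sparsity constraint
    return w

# Hypothetical toy problem with a 2-sparse ground truth.
true_w = [2.0, 0.0, -1.0, 0.0]
rows = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
targets = [sum(t * x for t, x in zip(true_w, row)) for row in rows]
w_hat = sgd_hard_thresholding(rows, targets, k=2)
```

Here the thresholding step is a sort over the coordinates; in the rank-constrained problems the abstract mentions, the analogous step would be a truncated singular value decomposition, which is why its cost matters.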

Re-Weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation


Title	Re-Weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation
Authors	Qingchao Chen, Yang Liu, Zhaowen Wang, Ian Wassell, Kevin Chetty
Abstract	Unsupervised Domain Adaptation (UDA) aims to transfer domain knowledge from existing well-defined tasks to new ones where labels are unavailable. In the real-world applications, as the domain (task) discrepancies are usually uncontrollable, it is significantly motivated to match the feature distributions even if the domain discrepancies are disparate. Additionally, as no label is available in the target domain, how to successfully adapt the classifier from the source to the target domain still remains an open question. In this paper, we propose the Re-weighted Adversarial Adaptation Network (RAAN) to reduce the feature distribution divergence and adapt the classifier when domain discrepancies are disparate. Specifically, to alleviate the need of common supports in matching the feature distribution, we choose to minimize optimal transport (OT) based Earth-Mover (EM) distance and reformulate it to a minimax objective function. Utilizing this, RAAN can be trained in an end-to-end and adversarial manner. To further adapt the classifier, we propose to match the label distribution and embed it into the adversarial training. Finally, after extensive evaluation of our method using UDA datasets of varying difficulty, RAAN achieved the state-of-the-art results and outperformed other methods by a large margin when the domain shifts are disparate.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Chen_Re-Weighted_Adversarial_Adaptation_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Re-Weighted_Adversarial_Adaptation_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/re-weighted-adversarial-adaptation-network
Repo
Framework

A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity


Title	A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity
Authors	Mengnan Zhao, Aaron J. Masino, Christopher C. Yang
Abstract	We investigate the quality of task specific word embeddings created with relatively small, targeted corpora. We present a comprehensive evaluation framework including both intrinsic and extrinsic evaluation that can be expanded to named entities beyond drug name. Intrinsic evaluation results show that drug name embeddings created with a domain specific document corpus outperformed the previously published versions that were derived from a very large general text corpus. Extrinsic evaluation uses word embedding for the task of drug name recognition with Bi-LSTM model and the results demonstrate the advantage of using domain-specific word embeddings as the only input feature for drug name recognition with F1-score achieving 0.91. This work suggests that it may be advantageous to derive domain specific embeddings for certain tasks even when the domain specific corpus is of limited size.
Tasks	Named Entity Recognition, Outlier Detection, Question Answering, Relation Extraction, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-2319/
PDF	https://www.aclweb.org/anthology/W18-2319
PWC	https://paperswithcode.com/paper/a-framework-for-developing-and-evaluating
Repo
Framework

Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech


Title	Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech
Authors	Jaka Aris Eko Wibawa, Supheakmungkol Sarin, Chenfang Li, Knot Pipatsrisawat, Keshan Sodimana, Oddur Kjartansson, Alexander Gutkin, Martin Jansche, Linne Ha
Abstract
Tasks	Speech Recognition, Speech Synthesis
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1255/
PDF	https://www.aclweb.org/anthology/L18-1255
PWC	https://paperswithcode.com/paper/building-open-javanese-and-sundanese-corpora
Repo
Framework

Spline Filters For End-to-End Deep Learning


Title	Spline Filters For End-to-End Deep Learning
Authors	Randall Balestriero, Romain Cosentino, Herve Glotin, Richard Baraniuk
Abstract	We propose to tackle the problem of end-to-end learning for raw waveform signals by introducing learnable continuous time-frequency atoms. The derivation of these filters is achieved by defining a functional space with a given smoothness order and boundary conditions. From this space, we derive the parametric analytical filters. Their differentiability property allows gradient-based optimization. As such, one can utilize any Deep Neural Network (DNN) with these filters. This enables us to tackle in a front-end fashion a large scale bird detection task based on the freefield1010 dataset known to contain key challenges, such as the dimensionality of the input data ($>100,000$) and the presence of additional noises: multiple sources and soundscapes.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2291
PDF	http://proceedings.mlr.press/v80/balestriero18a/balestriero18a.pdf
PWC	https://paperswithcode.com/paper/spline-filters-for-end-to-end-deep-learning
Repo
Framework

Baseline wander and power line interference removal from ECG signals using eigenvalue decomposition


Title	Baseline wander and power line interference removal from ECG signals using eigenvalue decomposition
Authors	Rishi Raj Sharma, Ram Bilas Pachori
Abstract	In this paper, a novel method is proposed for baseline wander (BW) and power line interference (PLI) removal from electrocardiogram (ECG) signals. The proposed methodology is based on the eigenvalue decomposition of the Hankel matrix. It has been observed that the end-point eigenvalues of the Hankel matrix formed using noisy ECG signals are correlated with BW and PLI components. We have proposed a methodology to remove BW and PLI noise by eliminating eigenvalues corresponding to noisy components. The proposed concept uses one-step process for removing both BW and PLI noise simultaneously. The proposed method has been compared with other existing methods using performance measure parameters namely output signal to noise ratio (SNRout), and percent root mean square difference (PRD). Simulation results validate the better performance of the proposed method than compared methods at different noise levels. The proposed method is suitable for preprocessing of ECG signals.
Tasks
Published	2018-06-01
URL	https://doi.org/10.1016/j.bspc.2018.05.002
PDF	https://doi.org/10.1016/j.bspc.2018.05.002
PWC	https://paperswithcode.com/paper/baseline-wander-and-power-line-interference
Repo
Framework

Neural Sparse Topical Coding


Title	Neural Sparse Topical Coding
Authors	Min Peng, Qianqian Xie, Yanchun Zhang, Hua Wang, Xiuzhen Zhang, Jimin Huang, Gang Tian
Abstract	Topic models with sparsity enhancement have been proven to be effective at learning discriminative and coherent latent topics of short texts, which is critical to many scientific and engineering applications. However, the extensions of these models require carefully tailored graphical models and re-deduced inference algorithms, limiting their variations and applications. We propose a novel sparsity-enhanced topic model, Neural Sparse Topical Coding (NSTC), based on a sparsity-enhanced topic model called Sparse Topical Coding (STC). It focuses on replacing the complex inference process with the back propagation, which makes the model easy to explore extensions. Moreover, the external semantic information of words in word embeddings is incorporated to improve the representation of short texts. To illustrate the flexibility offered by the neural network based framework, we present three extensions based on NSTC without re-deduced inference algorithms. Experiments on Web Snippet and 20Newsgroups datasets demonstrate that our models outperform existing methods.
Tasks	Language Modelling, Topic Models, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-1217/
PDF	https://www.aclweb.org/anthology/P18-1217
PWC	https://paperswithcode.com/paper/neural-sparse-topical-coding
Repo
Framework

Towards Safe Deep Learning: Unsupervised Defense Against Generic Adversarial Attacks


Title	Towards Safe Deep Learning: Unsupervised Defense Against Generic Adversarial Attacks
Authors	Bita Darvish Rouhani, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar
Abstract	Recent advances in adversarial Deep Learning (DL) have opened up a new and largely unexplored surface for malicious attacks jeopardizing the integrity of autonomous DL systems. We introduce a novel automated countermeasure called Parallel Checkpointing Learners (PCL) to thwart the potential adversarial attacks and significantly improve the reliability (safety) of a victim DL model. The proposed PCL methodology is unsupervised, meaning that no adversarial sample is leveraged to build/train parallel checkpointing learners. We formalize the goal of preventing adversarial attacks as an optimization problem to minimize the rarely observed regions in the latent feature space spanned by a DL network. To solve the aforementioned minimization problem, a set of complementary but disjoint checkpointing modules are trained and leveraged to validate the victim model execution in parallel. Each checkpointing learner explicitly characterizes the geometry of the input data and the corresponding high-level data abstractions within a particular DL layer. As such, the adversary is required to simultaneously deceive all the defender modules in order to succeed. We extensively evaluate the performance of the PCL methodology against the state-of-the-art attack scenarios, including Fast-Gradient-Sign (FGS), Jacobian Saliency Map Attack (JSMA), Deepfool, and Carlini&WagnerL2 algorithm. Extensive proof-of-concept evaluations for analyzing various data collections including MNIST, CIFAR10, and ImageNet corroborate the effectiveness of our proposed defense mechanism against adversarial samples.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=HyI6s40a-
PDF	https://openreview.net/pdf?id=HyI6s40a-
PWC	https://paperswithcode.com/paper/towards-safe-deep-learning-unsupervised
Repo
Framework

The Generalization Error of Dictionary Learning with Moreau Envelopes


Title	The Generalization Error of Dictionary Learning with Moreau Envelopes
Authors	Alexandros Georgogiannis
Abstract	This is a theoretical study on the sample complexity of dictionary learning with a general type of reconstruction loss. The goal is to estimate a $m \times d$ matrix $D$ of unit-norm columns when the only available information is a set of training samples. Points $x$ in $\mathbb{R}^m$ are subsequently approximated by the linear combination $Da$ after solving the problem $\min_{a \in \mathbb{R}^d} \Phi(x - Da) + g(a)$; function $g:\mathbb{R}^d \to [0,+\infty)$ is either an indicator function or a sparsity promoting regularizer. Here is considered the case where $\Phi(x) = \inf_{z \in \mathbb{R}^m} \{ \|x-z\|_2^2 + h(\|z\|_2) \}$ and $h$ is an even and univariate function on the real line. Connections are drawn between $\Phi$ and the Moreau envelope of $h$. A new sample complexity result concerning the $k$-sparse dictionary problem removes the spurious condition on the coherence of $D$ appearing in previous works. Finally, comments are made on the approximation error of certain families of losses. The derived generalization bounds are of order $\mathcal{O}(\sqrt{\log n /n})$ and valid without any further restrictions on the set of dictionaries with unit-norm columns.
Tasks	Dictionary Learning
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=1931
PDF	http://proceedings.mlr.press/v80/georgogiannis18a/georgogiannis18a.pdf
PWC	https://paperswithcode.com/paper/the-generalization-error-of-dictionary
Repo
Framework
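
The connection to the Moreau envelope mentioned in this abstract can be checked numerically in one dimension. The sketch below reads the loss as $\Phi(x) = \inf_z \{ \|x-z\|_2^2 + h(\|z\|_2) \}$, which matches the standard Moreau envelope $\inf_z \{ h(z) + \frac{1}{2\lambda}\|x-z\|_2^2 \}$ with $\lambda = 1/2$. The choice $h(t) = |t|$, the grid search, and the function names are assumptions made for illustration; for this $h$ the envelope is a Huber-type function.

```python
def phi_numeric(x, h, lo=-5.0, hi=5.0, steps=20001):
    """Approximate inf_z { (x - z)^2 + h(|z|) } by a grid search over z in [lo, hi]."""
    best = float("inf")
    for j in range(steps):
        z = lo + (hi - lo) * j / (steps - 1)
        best = min(best, (x - z) ** 2 + h(abs(z)))
    return best

def phi_closed_form_abs(x):
    """Closed form for h(t) = |t|: quadratic near zero, linear in the tails."""
    return x * x if abs(x) <= 0.5 else abs(x) - 0.25

# Compare the grid search with the closed form at a few hypothetical test points.
test_points = [-2.0, -0.3, 0.0, 0.4, 1.5]
gaps = [abs(phi_numeric(x, abs) - phi_closed_form_abs(x)) for x in test_points]
```

The gaps should be small (limited by the grid resolution), which illustrates how the infimum smooths the kink of $|t|$ at zero into a quadratic while leaving the linear growth in the tails.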