Paper Group NANR 171
Secretary Ranking with Minimal Inversions
Title | Secretary Ranking with Minimal Inversions |
Authors | Sepehr Assadi, Eric Balkanski, Renato Paes Leme |
Abstract | We study a secretary problem that captures the task of ranking in online settings. We term this the secretary ranking problem: elements from an ordered set arrive in random order and, instead of picking the maximum element, the algorithm is asked to assign a rank, or position, to each element. The assigned rank is irrevocable and is chosen knowing only the pairwise comparisons with previously arrived elements. The goal is to minimize the distance between the produced ranking and the true ranking of the elements, measured by the Kendall tau distance, i.e., the number of pairs that are inverted with respect to the true order. Our main result is a matching upper and lower bound for the secretary ranking problem. We present an algorithm that ranks n elements with only O(n^{3/2}) inversions in expectation, and show that any algorithm necessarily suffers \Omega(n^{3/2}) inversions when there are n available positions. In terms of techniques, the analysis of our algorithm draws connections to linear probing in the hashing literature, while our lower bound relies on a general anti-concentration bound for a generic balls-and-bins sampling process. We also consider the case where the number of positions m can be larger than the number of secretaries n, and provide an improved bound by showing a connection between this problem and random binary trees. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8390-secretary-ranking-with-minimal-inversions |
PDF | http://papers.nips.cc/paper/8390-secretary-ranking-with-minimal-inversions.pdf |
PWC | https://paperswithcode.com/paper/secretary-ranking-with-minimal-inversions |
Repo | |
Framework | |
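As a concrete illustration of the strategy the abstract hints at, here is a minimal Python sketch: estimate each arrival's true rank from comparisons with the elements seen so far, then assign the nearest free position at or above that estimate (the linear-probing flavor). The scaling rule and tie-breaking below are illustrative assumptions, not the paper's exact algorithm.

```python
import bisect
import random

def secretary_rank(stream):
    """Irrevocably assign each arriving element a position in {0,...,n-1}.

    Estimates the arrival's true rank from comparisons with previously
    seen elements, then takes the nearest free slot at or above the
    estimate (a linear-probing-style rule)."""
    n = len(stream)
    seen = []                      # sorted values observed so far
    free = list(range(n))          # free positions, kept sorted
    assignment = {}
    for t, x in enumerate(stream, start=1):
        r = bisect.bisect_left(seen, x)       # rank among the t-1 seen
        bisect.insort(seen, x)
        guess = int((r + 0.5) * n / t)        # scale rank up to n slots
        i = min(bisect.bisect_left(free, guess), len(free) - 1)
        assignment[x] = free.pop(i)
    return assignment

elements = random.sample(range(10**6), 400)
pos = secretary_rank(elements)
inversions = sum(pos[a] > pos[b]
                 for a in elements for b in elements if a < b)
print(f"n=400: {inversions} inversions")  # compare with the O(n^{3/2}) bound
```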
Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent
Title | Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent |
Authors | Minhao Cheng, Wei Wei, Cho-Jui Hsieh |
Abstract | Recent research has demonstrated that goal-oriented dialogue agents trained on large datasets can achieve striking performance when interacting with human users. In real-world applications, however, it is important to ensure that the agent performs smoothly not only with regular users but also with malicious ones who attack the system through interactions to further their own goals. In this paper, we develop algorithms to evaluate the robustness of a dialogue agent via carefully designed attacks using adversarial agents. These attacks are performed in both black-box and white-box settings. Furthermore, we demonstrate that adversarial training using our attacks can significantly improve the robustness of a goal-oriented dialogue system. In a case study of the negotiation agent developed by Lewis et al. (2017), our attacks reduced the average reward advantage between the attacker and the trained RL-based agent from 2.68 to -5.76 on a scale from -10 to 10 for randomized goals. Moreover, we show that adversarial training improves the robustness of negotiation agents by 1.5 points on average against all our attacks. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1336/ |
PDF | https://www.aclweb.org/anthology/N19-1336 |
PWC | https://paperswithcode.com/paper/evaluating-and-enhancing-the-robustness-of |
Repo | |
Framework | |
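To make the black-box setting concrete, here is a hedged sketch of one natural query-only attack loop: randomly substitute words in the adversary's utterance and keep any edit that lowers the target agent's reward. The paper's actual attacks (and its white-box variants) are more sophisticated; `agent_reward` is a hypothetical stand-in for rolling out a full negotiation and scoring it.

```python
import random

def black_box_attack(agent_reward, utterance, vocab, n_trials=200):
    """Query-only attack sketch: propose word substitutions and keep any
    edit that lowers the reward the target agent earns in the resulting
    dialogue."""
    best = list(utterance)
    best_reward = agent_reward(best)
    for _ in range(n_trials):
        trial = list(best)
        trial[random.randrange(len(trial))] = random.choice(vocab)
        r = agent_reward(trial)
        if r < best_reward:            # the edit hurt the target: keep it
            best, best_reward = trial, r
    return best, best_reward

# toy stand-in: this "agent" loses 3 reward per occurrence of "firm"
toy_reward = lambda words: 10 - 3 * words.count("firm")
adv, r = black_box_attack(toy_reward, ["i", "want", "the", "hats"],
                          ["firm", "deal", "no", "take"])
print(adv, r)
```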
Macrocosm: Social Media Persona Linking for Open Source Intelligence Applications
Title | Macrocosm: Social Media Persona Linking for Open Source Intelligence Applications |
Authors | Graham Horwood, Ning Yu, Thomas Boggs, Changjiang Yang, Chad Holvenstot |
Abstract | Online Social Networks (OSNs) provide a wealth of intelligence to analysts in assisting tasks such as tracking cyber-attacks, human trafficking activities, and misinformation campaigns. Open Source Intelligence (OSINT) analysts monitoring social media typically track users of interest manually, spending hours or days linking personas of interest within and across OSNs. This paper presents a multi-modal analysis of cross-contextual online social media (Macrocosm), a data-driven approach to detect similarities among user personas over six modalities: usernames, patterns-of-life, stylometry, semantic content, image content, and social network associations. It fuses component modalities into an ensemble similarity judgment. To the best of our knowledge, Macrocosm is the first research effort to apply Siamese neural networks to the persona linking problem. An important lesson is that SNNs (deep learning models that infer a distance function from high-dimensional data) consistently provide improvements over traditional models in testing. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1479/ |
PDF | https://www.aclweb.org/anthology/D19-1479 |
PWC | https://paperswithcode.com/paper/macrocosm-social-media-persona-linking-for |
Repo | |
Framework | |
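Since the abstract highlights Siamese neural networks for persona linking, the following PyTorch sketch shows the core pattern: one shared encoder embeds both sides of a candidate persona pair, trained with a contrastive loss. Dimensions and architecture are illustrative assumptions; Macrocosm's per-modality encoders and ensemble fusion are not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Siamese(nn.Module):
    """Twin encoder: the same weights embed both sides of a pair."""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, emb_dim))
    def forward(self, a, b):
        return self.net(a), self.net(b)

def contrastive_loss(za, zb, same, margin=1.0):
    """same=1 pulls matching personas together; same=0 pushes
    non-matching personas at least `margin` apart."""
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) +
            (1 - same) * F.relu(margin - d).pow(2)).mean()

model = Siamese()
a, b = torch.randn(16, 128), torch.randn(16, 128)   # persona feature pairs
same = torch.randint(0, 2, (16,)).float()           # 1 = same persona
loss = contrastive_loss(*model(a, b), same)
loss.backward()
```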
Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks
Title | Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks |
Authors | Guodong Zhang, James Martens, Roger B. Grosse |
Abstract | Natural gradient descent has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, especially for non-linear networks. In this work, we analyze for the first time the speed of convergence to a global optimum for natural gradient descent on non-linear neural networks with the squared error loss. We identify two conditions which guarantee global convergence: (1) the Jacobian matrix (of the network's outputs for all training cases w.r.t. the parameters) has full row rank, and (2) the Jacobian matrix is stable under small perturbations around the initialization. For two-layer ReLU neural networks (i.e., with one hidden layer), we prove that these two conditions do hold throughout training, under the assumptions that the inputs do not degenerate and the network is over-parameterized. We further extend our analysis to more general loss functions with similar convergence properties. Lastly, we show that K-FAC, an approximate natural gradient descent method, also converges to global minima under the same assumptions. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9020-fast-convergence-of-natural-gradient-descent-for-over-parameterized-neural-networks |
PDF | http://papers.nips.cc/paper/9020-fast-convergence-of-natural-gradient-descent-for-over-parameterized-neural-networks.pdf |
PWC | https://paperswithcode.com/paper/fast-convergence-of-natural-gradient-descent-1 |
Repo | |
Framework | |
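For the squared error loss analyzed above, the Fisher matrix coincides with the Gauss-Newton matrix J^T J (J being the Jacobian whose rank and stability the two conditions concern), so a natural gradient step amounts to solving a damped linear system. A minimal NumPy sketch, with a finite-difference Jacobian and a toy two-layer ReLU network as illustrative stand-ins:

```python
import numpy as np

def ngd_step(theta, f, y, lr=0.5, damping=1e-3, eps=1e-6):
    """One damped natural-gradient step for the squared error loss:
    theta <- theta - lr * (J^T J + damping I)^{-1} J^T (f(theta) - y).
    The Jacobian is taken by finite differences for simplicity."""
    out = f(theta)
    J = np.stack([(f(theta + eps * e) - out) / eps
                  for e in np.eye(theta.size)], axis=1)  # (outputs, params)
    fisher = J.T @ J + damping * np.eye(theta.size)
    return theta - lr * np.linalg.solve(fisher, J.T @ (out - y))

# toy two-layer ReLU net, 1-d inputs/outputs, 8 hidden units, 5 points
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
def f(theta):
    W = theta.reshape(8, 2)        # column 0: hidden weights, 1: output
    return np.maximum(np.outer(x, W[:, 0]), 0) @ W[:, 1]
theta = rng.normal(size=16)
for _ in range(20):
    theta = ngd_step(theta, f, y)
print("loss:", 0.5 * np.sum((f(theta) - y) ** 2))
```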
FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm
Title | FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm |
Authors | Yuzhong Hong, Xianguo Yu, Neng He, Nan Liu, Junhui Liu |
Abstract | We propose FASPell, a Chinese spell checker based on a new paradigm consisting of a denoising autoencoder (DAE) and a decoder. Compared with previous state-of-the-art models, the new paradigm allows our spell checker to be Faster in computation, readily Adaptable to both simplified and traditional Chinese texts produced by either humans or machines, and to require a much Simpler structure while being just as Powerful in both error detection and correction. These four achievements are made possible because the new paradigm circumvents two bottlenecks. First, the DAE curtails the amount of Chinese spell checking data needed for supervised learning (to <10k sentences) by leveraging the power of masked language models pre-trained without supervision, as in BERT, XLNet, and MASS. Second, the decoder eliminates the use of a confusion set, which is inflexible and under-utilizes the salient feature of Chinese character similarity. |
Tasks | Denoising, Language Modelling |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5522/ |
PDF | https://www.aclweb.org/anthology/D19-5522 |
PWC | https://paperswithcode.com/paper/faspell-a-fast-adaptable-simple-powerful |
Repo | |
Framework | |
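A hedged sketch of the DAE-decoder flow described above: a masked language model proposes candidate characters with confidences, and the decoder keeps a substitution only when confidence and character similarity jointly clear a threshold. FASPell actually learns a confidence-similarity decision curve rather than fixed thresholds; `mlm_topk` and `similarity` are hypothetical stand-ins for the pretrained components.

```python
def faspell_style_correct(sentence, mlm_topk, similarity,
                          conf_thresh=0.1, sim_thresh=0.6):
    """For each position, accept the first MLM candidate that is both
    confident enough and similar enough (visually/phonologically) to the
    original character; otherwise keep the original."""
    out = []
    for i, ch in enumerate(sentence):
        best = ch
        for cand, conf in mlm_topk(sentence, i):
            if cand != ch and conf > conf_thresh and \
               similarity(ch, cand) > sim_thresh:
                best = cand
                break
        out.append(best)
    return "".join(out)

# toy stand-ins for demonstration only
cands = {1: [("好", 0.8)]}
toy_mlm = lambda s, i: cands.get(i, [])
toy_sim = lambda a, b: 0.7
print(faspell_style_correct("你号吗", toy_mlm, toy_sim))  # -> 你好吗
```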
Automated Stock Price Prediction Using Machine Learning
Title | Automated Stock Price Prediction Using Machine Learning |
Authors | Wassim El-Hajj, Mariam Mokalled, Mohamad Jaber |
Abstract | |
Tasks | Stock Price Prediction |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/W19-6403/ |
PDF | https://www.aclweb.org/anthology/W19-6403 |
PWC | https://paperswithcode.com/paper/automated-stock-price-prediction-using |
Repo | |
Framework | |
ConvSent at CLPsych 2019 Task A: Using Post-level Sentiment Features for Suicide Risk Prediction on Reddit
Title | ConvSent at CLPsych 2019 Task A: Using Post-level Sentiment Features for Suicide Risk Prediction on Reddit |
Authors | Kristen Allen, Shrey Bagroy, Alex Davis, Tamar Krishnamurti |
Abstract | This work aims to infer mental health status from public text for early detection of suicide risk. It contributes to Shared Task A in the 2019 CLPsych workshop by predicting users' suicide risk given posts in the Reddit subforum r/SuicideWatch. We use a convolutional neural network to incorporate LIWC information at the Reddit post level about topics discussed, first-person focus, emotional experience, grammatical choices, and thematic style. In sorting users into one of four risk categories, our best system's macro-averaged F1 score was 0.50 on the withheld test set. The work demonstrates the predictive power of the Linguistic Inquiry and Word Count dictionary, in conjunction with a convolutional network and holistic consideration of each post and user. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-3024/ |
PDF | https://www.aclweb.org/anthology/W19-3024 |
PWC | https://paperswithcode.com/paper/convsent-at-clpsych-2019-task-a-using-post |
Repo | |
Framework | |
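A minimal PyTorch sketch of the architecture the abstract describes: each user is a sequence of posts, each post a vector of LIWC category scores, with a 1-d convolution over the post axis feeding a four-way risk classifier. All dimensions are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PostCNN(nn.Module):
    """Convolve over a user's sequence of post-level LIWC vectors,
    max-pool across posts, and classify into four risk categories."""
    def __init__(self, n_liwc=93, n_classes=4):
        super().__init__()
        self.conv = nn.Conv1d(n_liwc, 32, kernel_size=3, padding=1)
        self.head = nn.Linear(32, n_classes)
    def forward(self, posts):                    # (batch, n_liwc, n_posts)
        h = torch.relu(self.conv(posts))
        return self.head(h.max(dim=2).values)   # pool over the post axis

logits = PostCNN()(torch.rand(2, 93, 10))        # 2 users, 10 posts each
print(logits.shape)                              # torch.Size([2, 4])
```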
Traditional and Heavy Tailed Self Regularization in Neural Network Models
Title | Traditional and Heavy Tailed Self Regularization in Neural Network Models |
Authors | Charles H. Martin, Michael W. Mahoney |
Abstract | Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is like traditional Tikhonov regularization, in that there is a “size scale” separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of Heavy-Tailed Self-Regularization, similar to the self-organization seen in the statistical physics of disordered systems. This implicit Self-Regularization can depend strongly on the many knobs of the training process. By exploiting the generalization gap phenomena, we demonstrate that we can cause a small model to exhibit all 5+1 phases of training simply by changing the batch size. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SJeFNoRcFQ |
PDF | https://openreview.net/pdf?id=SJeFNoRcFQ |
PWC | https://paperswithcode.com/paper/traditional-and-heavy-tailed-self-1 |
Repo | |
Framework | |
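The basic diagnostic behind this analysis is easy to reproduce: compute the empirical spectral density of a layer's correlation matrix and examine the shape of its tail. A NumPy sketch contrasting a Gaussian (Marchenko-Pastur-like) weight matrix with a heavy-tailed one; the paper's 5+1 phase classification is of course much finer than this.

```python
import numpy as np

def esd(W):
    """Empirical spectral density of a layer: eigenvalues of the
    correlation matrix X = W^T W / N, the object whose bulk-and-tail
    shape distinguishes the phases of training described above."""
    N, _ = W.shape
    return np.linalg.eigvalsh(W.T @ W / N)

rng = np.random.default_rng(0)
lam_gauss = esd(rng.normal(0, 1, size=(1000, 300)))
lam_heavy = esd(rng.standard_t(df=2, size=(1000, 300)))  # heavy-tailed
# heavy-tailed entries push eigenvalues far beyond the MP bulk edge
print("max eigenvalue, Gaussian:", lam_gauss.max())
print("max eigenvalue, heavy-tailed:", lam_heavy.max())
```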
Generative Feature Matching Networks
Title | Generative Feature Matching Networks |
Authors | Cicero Nogueira dos Santos, Inkit Padhi, Pierre Dognin, Youssef Mroueh |
Abstract | We propose a non-adversarial, feature-matching-based approach to train generative models. Our approach, Generative Feature Matching Networks (GFMN), leverages pretrained neural networks such as autoencoders and ConvNet classifiers to perform feature extraction. We perform extensive experiments on several challenging datasets, including ImageNet. Our experimental results demonstrate that, due to the expressiveness of the features from pretrained ImageNet classifiers, our approach can achieve state-of-the-art results on challenging benchmarks such as CIFAR10 and STL10 even by just matching first-order statistics. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Syfz6sC9tQ |
PDF | https://openreview.net/pdf?id=Syfz6sC9tQ |
PWC | https://paperswithcode.com/paper/generative-feature-matching-networks |
Repo | |
Framework | |
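The core objective is simple to write down: move the generator so the mean of fixed pretrained features matches between real and generated batches. A hedged PyTorch sketch of that first-order term only, which the abstract notes already suffices for strong results; the dummy extractor below stands in for a pretrained ImageNet classifier.

```python
import torch

def feature_matching_loss(extractor, real, fake):
    """Non-adversarial objective: match the mean (first-order statistics)
    of fixed pretrained features between real and generated batches.
    Gradients flow only through the fake branch."""
    with torch.no_grad():
        mu_real = extractor(real).mean(dim=0)
    mu_fake = extractor(fake).mean(dim=0)
    return (mu_real - mu_fake).pow(2).sum()

# dummy extractor stand-in for a pretrained classifier's feature layers
extractor = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(3 * 32 * 32, 128))
fake = torch.rand(8, 3, 32, 32, requires_grad=True)  # generator output
loss = feature_matching_loss(extractor, torch.rand(8, 3, 32, 32), fake)
loss.backward()
```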
Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting
Title | Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting |
Authors | Muming Zhao, Jian Zhang, Chongyang Zhang, Wenjun Zhang |
Abstract | Crowd counting is challenging in the presence of drastic scale variations, cluttered backgrounds, and severe occlusions. Existing CNN-based counting methods tackle these challenges mainly by fusing either multi-scale or multi-context features to generate robust representations. In this paper, we propose to address these issues by leveraging the heterogeneous attributes compounded in the density map. We identify three attributes (geometric, semantic, and numeric) essential to density estimation, and demonstrate how to effectively utilize these heterogeneous attributes to assist crowd counting by formulating them as multiple auxiliary tasks. With the multi-fold regularization effects induced by the auxiliary tasks, the backbone CNN model is driven to embed the desired properties explicitly and thus gains robust representations for more accurate density estimation. Extensive experiments on three challenging crowd counting datasets demonstrate the effectiveness of the proposed approach. |
Tasks | Crowd Counting, Density Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Zhao_Leveraging_Heterogeneous_Auxiliary_Tasks_to_Assist_Crowd_Counting_CVPR_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_Leveraging_Heterogeneous_Auxiliary_Tasks_to_Assist_Crowd_Counting_CVPR_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/leveraging-heterogeneous-auxiliary-tasks-to |
Repo | |
Framework | |
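A PyTorch sketch of the auxiliary-task pattern: one backbone, a primary density head, and extra heads whose losses regularize the shared features. The specific auxiliary tasks below (a crowd/background mask and a global count) are illustrative stand-ins for the paper's geometric/semantic/numeric attributes, and the loss weights are arbitrary.

```python
import torch
import torch.nn as nn

class MultiTaskCounter(nn.Module):
    """Shared backbone with a primary density head plus auxiliary heads
    that regularize the learned representation."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                      nn.ReLU())
        self.density = nn.Conv2d(16, 1, 1)   # primary: per-pixel density
        self.mask = nn.Conv2d(16, 1, 1)      # aux 1: crowd/background mask
        self.count = nn.Linear(16, 1)        # aux 2: global count
    def forward(self, x):
        h = self.backbone(x)
        return (self.density(h), self.mask(h),
                self.count(h.mean(dim=(2, 3))))

def total_loss(pred, gt_density):
    den, mask, cnt = pred
    l_den = (den - gt_density).pow(2).mean()
    l_mask = nn.functional.binary_cross_entropy_with_logits(
        mask, (gt_density > 0).float())
    l_cnt = (cnt.squeeze(1) - gt_density.sum(dim=(1, 2, 3))).pow(2).mean()
    return l_den + 0.1 * l_mask + 0.01 * l_cnt   # illustrative weights

x = torch.rand(2, 3, 64, 64)
gt = torch.rand(2, 1, 64, 64) * 0.01
print(total_loss(MultiTaskCounter()(x), gt))
```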
The GAN Landscape: Losses, Architectures, Regularization, and Normalization
Title | The GAN Landscape: Losses, Architectures, Regularization, and Normalization |
Authors | Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly |
Abstract | Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they have been successfully applied to many problems, training a GAN is a notoriously challenging task that requires a significant amount of hyperparameter tuning, neural architecture engineering, and a non-trivial number of "tricks". The success in many practical applications, coupled with the lack of a measure to quantify the failure modes of GANs, has resulted in a plethora of proposed losses, regularization and normalization schemes, and neural architectures. In this work we take a sober view of the current state of GANs from a practical perspective. We reproduce the current state of the art and go beyond it, fairly exploring the GAN landscape. We discuss common pitfalls and reproducibility issues, open-source our code on GitHub, and provide pre-trained models on TensorFlow Hub. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rkGG6s0qKQ |
PDF | https://openreview.net/pdf?id=rkGG6s0qKQ |
PWC | https://paperswithcode.com/paper/the-gan-landscape-losses-architectures-1 |
Repo | |
Framework | |
Unlabeled Disentangling of GANs with Guided Siamese Networks
Title | Unlabeled Disentangling of GANs with Guided Siamese Networks |
Authors | Gökhan Yildirim, Nikolay Jetchev, Urs Bergmann |
Abstract | Disentangling the underlying generative factors of a data distribution is important for interpretability and generalizable representations. In this paper, we introduce two novel disentangling methods. Our first method, Unlabeled Disentangling GAN (UD-GAN, unsupervised), decomposes the latent noise by generating similar/dissimilar image pairs, and learns a distance metric on these pairs with siamese networks and a contrastive loss. This pairwise approach provides consistent representations for similar data points. Our second method (UD-GAN-G, weakly supervised) modifies UD-GAN with user-defined guidance functions, which restrict the information that goes into the siamese networks. This constraint helps UD-GAN-G focus on the desired semantic variations in the data. We show that both methods outperform existing unsupervised approaches on quantitative metrics that measure the semantic accuracy of the learned representations. In addition, we illustrate that the simple guidance functions used in UD-GAN-G allow us to directly capture the desired variations in the data. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=H1e0-30qKm |
PDF | https://openreview.net/pdf?id=H1e0-30qKm |
PWC | https://paperswithcode.com/paper/unlabeled-disentangling-of-gans-with-guided |
Repo | |
Framework | |
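The pair-generation trick that gives UD-GAN its supervision signal can be sketched in a few lines: two latent codes that share a designated chunk yield a similar image pair with respect to that chunk's factor, while fully independent codes yield a dissimilar pair; the siamese branches are then trained contrastively on such pairs. A hedged PyTorch sketch with a toy generator stand-in; the chunk layout is an illustrative assumption.

```python
import torch

def make_pair(G, z_dim=64, chunk=slice(0, 8)):
    """Two latents that share one chunk produce a *similar* pair for the
    factor that chunk controls; independent latents would produce a
    dissimilar pair for contrastive training of the siamese branch."""
    z1, z2 = torch.randn(1, z_dim), torch.randn(1, z_dim)
    z2[:, chunk] = z1[:, chunk]     # share the factor under study
    return G(z1), G(z2)

G = torch.nn.Linear(64, 3 * 8 * 8)  # toy generator stand-in
x1, x2 = make_pair(G)
print(x1.shape, x2.shape)
```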
Adaptive Density Map Generation for Crowd Counting
Title | Adaptive Density Map Generation for Crowd Counting |
Authors | Jia Wan, Antoni Chan |
Abstract | Crowd counting is an important topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). Most research efforts have concentrated on the density map estimation problem, while the problem of density map generation has not been adequately explored. In particular, the density map could be considered as an intermediate representation used to train a crowd counting network. In the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we first show the impact of different density maps and that better ground-truth density maps can be obtained by refining the existing ones using a learned refinement network, which is jointly trained with the counter. Then, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. The experiment results on popular counting datasets confirm the effectiveness of the proposed learnable density map representations. |
Tasks | Crowd Counting |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Wan_Adaptive_Density_Map_Generation_for_Crowd_Counting_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Wan_Adaptive_Density_Map_Generation_for_Crowd_Counting_ICCV_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/adaptive-density-map-generation-for-crowd |
Repo | |
Framework | |
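For reference, the fixed, hand-crafted generation step the paper argues should be learned looks like this: place a unit impulse at each annotated head in the dot map and smooth with a Gaussian kernel, so the density map integrates to the crowd count. A NumPy/SciPy sketch (sigma is the hand-tuned choice the learned generator replaces):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dot_map_to_density(points, shape, sigma=4.0):
    """Hand-crafted density map generation: a unit impulse per annotated
    head, smoothed with a Gaussian so the map sums to the count."""
    dots = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        dots[int(y), int(x)] += 1.0
    return gaussian_filter(dots, sigma=sigma)

density = dot_map_to_density([(30, 40), (32, 44), (100, 20)], (128, 128))
print(density.sum())   # ~3.0: smoothing preserves the crowd count
```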
Subword-based Compact Reconstruction of Word Embeddings
Title | Subword-based Compact Reconstruction of Word Embeddings |
Authors | Shota Sasaki, Jun Suzuki, Kentaro Inui |
Abstract | The idea of subword-based word embeddings has been proposed in the literature, mainly to address the out-of-vocabulary (OOV) word problem observed in standard word-based embeddings. In this paper, we propose a method for reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space. The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism. Our experiments show that our reconstructed subword-based embeddings can successfully imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings for OOV words. We also demonstrate the effectiveness of our reconstruction method when applied to downstream tasks. |
Tasks | Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1353/ |
PDF | https://www.aclweb.org/anthology/N19-1353 |
PWC | https://paperswithcode.com/paper/subword-based-compact-reconstruction-of-word |
Repo | |
Framework | |
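A hedged NumPy sketch of the two key techniques named above: character n-grams are hashed into a small shared table (memory-shared embeddings), and key-value-query attention combines the retrieved rows into a word vector. The hashing trick, projections, and dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reconstruct(word, memory, k_proj, q_proj, v_proj, ngram=3):
    """Reconstruct a word embedding as an attention-weighted mix of
    subword embeddings drawn from a small shared memory table."""
    grams = [word[i:i + ngram] for i in range(len(word) - ngram + 1)]
    # hash each n-gram into the shared table (memory-shared embeddings);
    # Python's str hash is per-process salted, fine for a demo
    rows = np.stack([memory[hash(g) % len(memory)] for g in grams])
    k = rows @ k_proj                    # keys, one per n-gram
    q = rows.mean(axis=0) @ q_proj       # a single query for the word
    v = rows @ v_proj                    # values
    scores = k @ q
    att = np.exp(scores - scores.max())  # softmax attention over n-grams
    att /= att.sum()
    return att @ v                       # reconstructed embedding

rng = np.random.default_rng(0)
mem = rng.normal(size=(1000, 16))        # small fixed shared memory
K, Q, V = (rng.normal(size=(16, 16)) for _ in range(3))
vec = reconstruct("unbelievable", mem, K, Q, V)
print(vec.shape)                         # (16,) even for OOV words
```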
Label Propagation Networks
Title | Label Propagation Networks |
Authors | Kojin Oshiba, Nir Rosenfeld, Amir Globerson |
Abstract | Graph networks have recently attracted considerable interest, and in particular in the context of semi-supervised learning. These methods typically work by generating node representations that are propagated throughout a given weighted graph. Here we argue that for semi-supervised learning, it is more natural to consider propagating labels in the graph instead. Towards this end, we propose a differentiable neural version of the classic Label Propagation (LP) algorithm. This formulation can be used for learning edge weights, unlike other methods where weights are set heuristically. Starting from a layer implementing a single iteration of LP, we proceed by adding several important non-linear steps that significantly enhance the label-propagating mechanism. Experiments in two distinct settings demonstrate the utility of our approach. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1g7y2RqYX |
PDF | https://openreview.net/pdf?id=r1g7y2RqYX |
PWC | https://paperswithcode.com/paper/label-propagation-networks |
Repo | |
Framework | |
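For reference, a NumPy sketch of the classic LP iteration that a single layer of the proposed network implements, using a symmetrically normalized adjacency and clamping of known labels; the paper's contribution is to make the edge weights learnable and interleave non-linear steps between such iterations. The normalization and clamping choices below are one common variant, not necessarily the paper's.

```python
import numpy as np

def label_propagation(W, Y0, labeled, alpha=0.9, iters=50):
    """Classic LP: Y <- alpha * S @ Y + (1 - alpha) * Y0, where
    S = D^{-1/2} W D^{-1/2}, re-clamping the labeled nodes each step."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))      # symmetric normalization
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0
        Y[labeled] = Y0[labeled]         # keep known labels fixed
    return Y.argmax(axis=1)

# 4-node chain; first and last nodes labeled with classes 0 and 1
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y0 = np.zeros((4, 2)); Y0[0, 0] = Y0[3, 1] = 1
print(label_propagation(W, Y0, labeled=np.array([0, 3])))  # [0 0 1 1]
```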