Paper Group AWR 354
Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs. Predicting Merge Conflicts in Collaborative Software Development. Interpretable multiclass classification by MDL-based rule lists. MultiFiT: Efficient Multi-lingual Language Model Fine-tuning. Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification. Deep Archetypal Analysis. Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching. Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels. AutoGMM: Automatic Gaussian Mixture Modeling in Python. BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer. Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation. Uncertainty Quantification with Generative Models. On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments. Audio-Visual Scene-Aware Dialog.
Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs
Title | Competing Against Equilibria in Zero-Sum Games with Evolving Payoffs |
Authors | Adrian Rivera Cardoso, Jacob Abernethy, He Wang, Huan Xu |
Abstract | We study the problem of repeated play in a zero-sum game in which the payoff matrix may change, in a possibly adversarial fashion, on each round; we call these Online Matrix Games. Finding the Nash Equilibrium (NE) of a two player zero-sum game is core to many problems in statistics, optimization, and economics, and for a fixed game matrix this can be easily reduced to solving a linear program. But when the payoff matrix evolves over time our goal is to find a sequential algorithm that can compete with, in a certain sense, the NE of the long-term-averaged payoff matrix. We design an algorithm with small NE regret; that is, we ensure that the long-term payoff of both players is close to the minimax optimum in hindsight. Our algorithm achieves near-optimal dependence on the number of rounds and depends poly-logarithmically on the number of available actions of the players. Additionally, we show that the naive reduction, where each player simply minimizes its own regret, fails to achieve the stated objective regardless of which algorithm is used. We also consider the so-called bandit setting, where the feedback is significantly limited, and we provide an algorithm with small NE regret using one-point estimates of each payoff matrix. |
Tasks | |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07723v1 |
https://arxiv.org/pdf/1907.07723v1.pdf | |
PWC | https://paperswithcode.com/paper/competing-against-equilibria-in-zero-sum |
Repo | https://github.com/adrianriv/gans-mode-collapse |
Framework | tf |
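As a reading aid, the following is a minimal sketch of the NE-regret notion the abstract refers to, in assumed notation ($A_t$ for the round-$t$ payoff matrix, $x_t, y_t$ for the players' mixed strategies, $\Delta_n, \Delta_m$ for the strategy simplices); the paper's formal definition may differ in its details.

```latex
% NE regret after T rounds (notation assumed, not taken from the paper):
% realized cumulative payoff versus the minimax value of the summed game.
\[
\mathrm{NE\text{-}Regret}_T \;=\;
\Bigl|\, \sum_{t=1}^{T} x_t^\top A_t\, y_t
\;-\; \min_{x \in \Delta_n} \max_{y \in \Delta_m}
x^\top \Bigl( \sum_{t=1}^{T} A_t \Bigr) y \,\Bigr|
\]
```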
Predicting Merge Conflicts in Collaborative Software Development
Title | Predicting Merge Conflicts in Collaborative Software Development |
Authors | Moein Owhadi-Kareshk, Sarah Nadi, Julia Rubin |
Abstract | Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about conflicts before they become large and complicated, is one way of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations of branches in the background to notify developers as soon as a conflict occurs, which is a computationally expensive process. One potential way to reduce this cost is to use a machine-learning-based conflict predictor that filters out the merge scenarios that are not likely to have conflicts, i.e., safe merge scenarios. Aims. In this paper, we assess whether conflict prediction is feasible. Method. We design a classifier for predicting merge conflicts, based on 9 lightweight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages. Results. Our results show that we achieve high f1-scores, varying from 0.95 to 0.97 for different programming languages, when predicting safe merge scenarios. The f1-score is between 0.57 and 0.68 for the conflicting merge scenarios. Conclusions. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging. |
Tasks | |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06274v1 |
https://arxiv.org/pdf/1907.06274v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-merge-conflicts-in-collaborative |
Repo | https://github.com/ualberta-smr/conflict-prediction |
Framework | none |
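To make the setup concrete, here is a minimal sketch of a lightweight-feature conflict classifier in scikit-learn. The synthetic features and the random-forest choice are illustrative assumptions, not the authors' exact pipeline (see the linked repository for that).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# One row per merge scenario: counts cheaply computable from Git metadata
# (e.g. files touched on each branch, commits on each branch, ...).
X = rng.integers(0, 50, size=(1000, 4)).astype(float)
y = (X[:, 0] + X[:, 1] > 60).astype(int)   # toy stand-in for conflict labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = clf.predict(X_te)
# F1 for safe (0) and conflicting (1) merges, mirroring the paper's split.
print(f1_score(y_te, pred, pos_label=0), f1_score(y_te, pred, pos_label=1))
```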
Interpretable multiclass classification by MDL-based rule lists
Title | Interpretable multiclass classification by MDL-based rule lists |
Authors | Hugo M. Proença, Matthijs van Leeuwen |
Abstract | Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows trading off model complexity against goodness of fit, by which overfitting and the need for hyperparameter tuning are effectively avoided. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion. |
Tasks | Model Selection |
Published | 2019-05-01 |
URL | https://arxiv.org/abs/1905.00328v2 |
https://arxiv.org/pdf/1905.00328v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-multiclass-classification-by |
Repo | https://github.com/HMProenca/MDLRuleLists |
Framework | none |
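To illustrate the selection criterion, the toy sketch below greedily grows a rule list by a two-part MDL score: a crude constant model cost per rule plus the empirical encoding cost of the labels each rule covers. This is a caricature of the idea only; it is not the Classy algorithm, whose encoding is more refined.

```python
import math

def data_cost(labels):
    """Bits to encode labels under their empirical class distribution."""
    n = len(labels)
    cost = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        cost -= labels.count(c) * math.log2(p)
    return cost

def total_cost(rules, data, bits_per_rule=8.0):
    """Two-part MDL score: constant model cost per rule plus the encoding
    cost of the examples each rule (and the default rule) covers."""
    cost = bits_per_rule * len(rules)
    rest = list(data)
    for cond in rules:
        covered = [y for x, y in rest if cond(x)]
        if covered:
            cost += data_cost(covered)
        rest = [(x, y) for x, y in rest if not cond(x)]
    if rest:
        cost += data_cost([y for _, y in rest])
    return cost

def greedy_rule_list(candidates, data):
    rules = []
    while candidates:
        best = min(candidates, key=lambda c: total_cost(rules + [c], data))
        if total_cost(rules + [best], data) >= total_cost(rules, data):
            break  # no candidate compresses the data further
        rules.append(best)
    return rules

data = [((x,), x > 5) for x in range(20)]   # toy one-feature dataset
cands = [lambda v: v[0] > 5, lambda v: v[0] > 10, lambda v: v[0] % 2 == 0]
print(len(greedy_rule_list(cands, data)))
```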
MultiFiT: Efficient Multi-lingual Language Model Fine-tuning
Title | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning |
Authors | Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard |
Abstract | Pretrained language models are particularly promising for low-resource languages, as they only require unlabelled data. However, training existing models requires huge amounts of compute, while pretrained cross-lingual models often underperform on low-resource languages. We propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable practitioners to train and fine-tune language models efficiently in their own language. In addition, we propose a zero-shot method using an existing pretrained cross-lingual model. We evaluate our methods on two widely used cross-lingual classification datasets where they outperform models pretrained on orders of magnitude more data and compute. We release all models and code. |
Tasks | Cross-Lingual Document Classification, Document Classification, Language Modelling |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04761v1 |
https://arxiv.org/pdf/1909.04761v1.pdf | |
PWC | https://paperswithcode.com/paper/multifit-efficient-multi-lingual-language |
Repo | https://github.com/lukexyz/Language-Models |
Framework | pytorch |
Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification
Title | Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification |
Authors | Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach |
Abstract | We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$. Our focus is on a special class of losses that includes, in particular, the multiclass hinge and logistic losses. Our approach combines several ideas: (i) passing to the equivalent saddle-point problem with a quasi-bilinear objective; (ii) applying stochastic mirror descent with a proper choice of geometry which guarantees a favorable accuracy bound; (iii) devising non-uniform sampling schemes to approximate the matrix products. In particular, for the multiclass hinge loss we propose a *sublinear* algorithm with iterations performed in $O(d+n+k)$ arithmetic operations. |
Tasks | |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03755v1 |
http://arxiv.org/pdf/1902.03755v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-primal-dual-algorithms-for-large |
Repo | https://github.com/flykiller/sublinear-svm |
Framework | none |
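For readers unfamiliar with step (i), here is a sketch of that style of reformulation for the Crammer-Singer multiclass hinge loss, in assumed notation ($U$ the weight matrix, $s_i = U x_i$ the score vector, $\delta_{y_i}$ the margin-cost vector with entries $\mathbb{1}[k \neq y_i]$); the paper's exact formulation may differ.

```latex
% Sketch (assumed notation): the Crammer-Singer hinge loss is a maximum of
% a linear function over the simplex, so l1-constrained ERM becomes a
% quasi-bilinear saddle-point problem amenable to stochastic mirror descent.
\[
\min_{\|U\|_1 \le R} \frac{1}{n} \sum_{i=1}^{n}
\max_{k} \bigl( s_{ik} - s_{i y_i} + \mathbb{1}[k \neq y_i] \bigr)
\;=\;
\min_{\|U\|_1 \le R} \max_{v_1, \dots, v_n \in \Delta_k}
\frac{1}{n} \sum_{i=1}^{n} \Bigl( v_i^\top ( U x_i + \delta_{y_i} )
- e_{y_i}^\top U x_i \Bigr)
\]
```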
Deep Archetypal Analysis
Title | Deep Archetypal Analysis |
Authors | Sebastian Mathias Keller, Maxim Samarin, Mario Wieser, Volker Roth |
Abstract | “Deep Archetypal Analysis” generates latent representations of high-dimensional datasets in terms of fractions of intuitively understandable basic entities called archetypes. The proposed method is an extension of linear “Archetypal Analysis” (AA), an unsupervised method to represent multivariate data points as sparse convex combinations of extremal elements of the dataset. Unlike the original formulation of AA, “Deep AA” can also handle side information and provides the ability for data-driven representation learning, which reduces the dependence on expert knowledge. Our method is motivated by studies of evolutionary trade-offs in biology where archetypes are species highly adapted to a single task. Along these lines, we demonstrate that “Deep AA” also lends itself to the supervised exploration of chemical space, marking a distinct starting point for de novo molecular design. In the unsupervised setting we show how “Deep AA” is used on CelebA to identify archetypal faces. These can then be superimposed in order to generate new faces which inherit dominant traits of the archetypes they are based on. |
Tasks | Representation Learning |
Published | 2019-01-30 |
URL | https://arxiv.org/abs/1901.10799v2 |
https://arxiv.org/pdf/1901.10799v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-archetypal-analysis |
Repo | https://github.com/bmda-unibas/DeepArchetypeAnalysis |
Framework | tf |
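A minimal PyTorch sketch of the convex-combination bottleneck at the heart of archetypal analysis, here dropped into an autoencoder; the learnable archetype matrix and layer sizes are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ArchetypalBottleneck(nn.Module):
    def __init__(self, in_dim, n_archetypes, latent_dim):
        super().__init__()
        self.to_weights = nn.Linear(in_dim, n_archetypes)
        # Learnable archetype coordinates in latent space.
        self.archetypes = nn.Parameter(torch.randn(n_archetypes, latent_dim))

    def forward(self, h):
        # Softmax yields convex weights, so z lies in the archetypes' hull.
        a = torch.softmax(self.to_weights(h), dim=-1)   # (batch, m)
        z = a @ self.archetypes                         # (batch, latent_dim)
        return z, a

# Usage: any decoder maps z back to data space; the weights `a` say which
# fraction of each archetype an example inherits.
enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
bottleneck = ArchetypalBottleneck(128, n_archetypes=5, latent_dim=2)
z, a = bottleneck(enc(torch.randn(16, 784)))
```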
Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching
Title | Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching |
Authors | Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, Ping Tan |
Abstract | Deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity. These methods are limited when high-resolution outputs are needed since the memory and time costs grow cubically as the volume resolution increases. In this paper, we propose a memory- and time-efficient cost volume formulation that is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes. First, the proposed cost volume is built upon a standard feature pyramid encoding geometry and context at gradually finer scales. Then, we can narrow the depth (or disparity) range of each stage by the depth (or disparity) map from the previous stage. With gradually higher cost volume resolution and adaptive adjustment of depth (or disparity) intervals, the output is recovered in a coarse-to-fine manner. We apply the cascade cost volume to the representative MVSNet, and obtain a 23.1% improvement on the DTU benchmark (1st place), with 50.6% and 74.2% reductions in GPU memory and run-time. It is also the state-of-the-art learning-based method on the Tanks and Temples benchmark. The statistics of accuracy, run-time and GPU memory on other representative stereo CNNs also validate the effectiveness of our proposed method. |
Tasks | Stereo Matching |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06378v2 |
https://arxiv.org/pdf/1912.06378v2.pdf | |
PWC | https://paperswithcode.com/paper/cascade-cost-volume-for-high-resolution-multi |
Repo | https://github.com/kwea123/CasMVSNet_pl |
Framework | pytorch |
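The coarse-to-fine narrowing is easy to state in code. The sketch below shows only the depth-hypothesis bookkeeping, with a random stand-in for the cost-volume regression; stage counts, hypothesis counts, and shrink factors are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def stage_hypotheses(center, half_range, n_hyp):
    """Per-pixel depth hypotheses: n_hyp planes centred on `center` (H, W),
    spanning +/- half_range."""
    offsets = np.linspace(-half_range, half_range, n_hyp)     # (n_hyp,)
    return center[None, :, :] + offsets[:, None, None]        # (n_hyp, H, W)

H, W = 64, 80
d_min, d_max = 2.0, 10.0

# Stage 1: uniform sampling over the full range (a flat initial estimate).
center = np.full((H, W), (d_min + d_max) / 2)
hyps = stage_hypotheses(center, (d_max - d_min) / 2, n_hyp=48)

# Pretend a cost-volume regression picked the best hypothesis per pixel;
# later stages re-centre on it and shrink the search interval.
depth = hyps[np.random.default_rng(0).integers(0, 48, (H, W)),
             np.arange(H)[:, None], np.arange(W)[None, :]]
for half_range, n in [(1.0, 32), (0.25, 8)]:    # narrowed later stages
    hyps = stage_hypotheses(depth, half_range, n)
    depth = hyps.mean(axis=0)                   # stand-in for regression
print(depth.shape)  # (64, 80)
```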
Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels
Title | Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels |
Authors | Shuang Song, David Berthelot, Afshin Rostamizadeh |
Abstract | We propose using active learning based techniques to further improve the state-of-the-art semi-supervised learning MixMatch algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which successfully demonstrate a significant improvement on the benchmark CIFAR-10, CIFAR-100, and SVHN datasets (as much as 1.5% in absolute accuracy). We also provide an empirical analysis of the cost trade-off between incrementally gathering more labeled versus unlabeled data. This analysis can be used to measure the relative value of labeled/unlabeled data at different points of the learning curve, where we find that although the incremental value of labeled data can be as much as 20x that of unlabeled, it quickly diminishes to less than 3x once more than 2,000 labeled examples are observed. Code can be found at https://github.com/google-research/mma. |
Tasks | Active Learning |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00594v2 |
https://arxiv.org/pdf/1912.00594v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-mixmatch-and-active-learning-for-1 |
Repo | https://github.com/google-research/mma |
Framework | tf |
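A minimal sketch of the margin-based acquisition loop that such active-learning methods build on, with a plain logistic-regression stand-in for the MixMatch-trained network; the query strategy and batch sizes here are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_query(model, X_pool, batch_size):
    """Pick the pool points with the smallest top-2 probability margin."""
    proba = np.sort(model.predict_proba(X_pool), axis=1)
    margins = proba[:, -1] - proba[:, -2]
    return np.argsort(margins)[:batch_size]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
labeled, pool = list(range(20)), list(range(20, 2000))

for _ in range(5):  # five acquisition rounds
    model = LogisticRegression().fit(X[labeled], y[labeled])
    picked = set(margin_query(model, X[pool], batch_size=20))
    labeled += [pool[j] for j in picked]
    pool = [ix for j, ix in enumerate(pool) if j not in picked]
```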
AutoGMM: Automatic Gaussian Mixture Modeling in Python
Title | AutoGMM: Automatic Gaussian Mixture Modeling in Python |
Authors | Thomas L. Athey, Joshua T. Vogelstein |
Abstract | Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation. However, estimating the optimal model for any given number of components is an NP-hard problem, and estimating the number of components is in some respects an even harder problem. In R, a popular package called mclust addresses both of these problems. However, Python has lacked such a package. We therefore introduce AutoGMM, a Python algorithm for automatic Gaussian mixture modeling. AutoGMM builds upon scikit-learn’s AgglomerativeClustering and GaussianMixture classes, with certain modifications to make the results more stable. Empirically, on several different applications, AutoGMM performs approximately as well as mclust. This algorithm is freely available and therefore further shrinks the gap between the functionality of R and Python for data science. |
Tasks | Density Estimation |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02688v2 |
https://arxiv.org/pdf/1909.02688v2.pdf | |
PWC | https://paperswithcode.com/paper/autogmm-automatic-gaussian-mixture-modeling |
Repo | https://github.com/tathey1/autogmm |
Framework | none |
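The kind of search AutoGMM automates can be sketched with plain scikit-learn: sweep component counts and covariance types and keep the BIC-best fit. This uses scikit-learn's public API, not AutoGMM's own interface (which also adds agglomerative initialization and stability tweaks).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three well-separated 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2))
               for c in (-3, 0, 3)])

# Fit every (k, covariance) combination and keep the lowest BIC.
best = min(
    (GaussianMixture(n_components=k, covariance_type=cov,
                     random_state=0).fit(X)
     for k in range(1, 7) for cov in ("full", "diag", "spherical")),
    key=lambda m: m.bic(X),
)
print(best.n_components, best.covariance_type)  # expect 3 components
```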
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer
Title | BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer |
Authors | Guan-Lin Chao, Ian Lane |
Abstract | An important yet rarely tackled problem in dialogue state tracking (DST) is scalability for dynamic ontology (e.g., movie, restaurant) and unseen slot values. We focus on a specific condition, where the ontology is unknown to the state tracker, but the target slot value (except for none and dontcare), possibly unseen during training, can be found as a word segment in the dialogue context. Prior approaches often rely on candidate generation from n-gram enumeration or slot tagger outputs, which can be inefficient or suffer from error propagation. We propose BERT-DST, an end-to-end dialogue state tracker which directly extracts slot values from the dialogue context. We use BERT as the dialogue context encoder whose contextualized language representations are suitable for scalable DST to identify slot values from their semantic context. Furthermore, we employ encoder parameter sharing across all slots with two advantages: (1) the number of parameters does not grow linearly with the size of the ontology, and (2) language representation knowledge can be transferred among slots. Empirical evaluation shows BERT-DST with cross-slot parameter sharing outperforms prior work on the benchmark scalable DST datasets Sim-M and Sim-R, and achieves competitive performance on the standard DSTC2 and WOZ 2.0 datasets. |
Tasks | Dialogue State Tracking |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.03040v1 |
https://arxiv.org/pdf/1907.03040v1.pdf | |
PWC | https://paperswithcode.com/paper/bert-dst-scalable-end-to-end-dialogue-state |
Repo | https://github.com/guanlinchao/bert-dst |
Framework | tf |
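A minimal PyTorch sketch of the slot-value extraction mechanism the abstract describes: a per-slot gate over {none, dontcare, span} plus start/end pointers over context tokens, with one head shared across slots. The encoder is replaced by stand-in tensors, how the slot identity is injected is omitted, and dimensions and slot names are assumptions.

```python
import torch
import torch.nn as nn

class SlotSpanHead(nn.Module):
    """Shared across slots: classifies {none, dontcare, span} and, for
    `span`, points at start/end tokens in the dialogue context."""
    def __init__(self, hidden=768):
        super().__init__()
        self.gate = nn.Linear(hidden, 3)   # none / dontcare / span
        self.span = nn.Linear(hidden, 2)   # start & end logits per token

    def forward(self, cls_vec, token_vecs):
        gate_logits = self.gate(cls_vec)       # (batch, 3)
        start_end = self.span(token_vecs)      # (batch, T, 2)
        return gate_logits, start_end[..., 0], start_end[..., 1]

head = SlotSpanHead()
cls_vec, tokens = torch.randn(4, 768), torch.randn(4, 50, 768)
for slot in ("food", "area", "price"):   # one shared head, many slots
    gate, start, end = head(cls_vec, tokens)
```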
Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
Title | Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation |
Authors | Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu |
Abstract | Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves a solid improvement (more than $1$ BLEU point) over previous NAT baselines in terms of translation accuracy, and greatly speeds up inference (by more than $10$ times) relative to AT baselines. |
Tasks | Machine Translation |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08717v2 |
https://arxiv.org/pdf/1911.08717v2.pdf | |
PWC | https://paperswithcode.com/paper/fine-tuning-by-curriculum-learning-for-non |
Repo | https://github.com/lemmonation/fcl-nat |
Framework | none |
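A toy sketch of the curriculum idea: anneal decoder inputs from fully observed (AT-like teacher forcing) to fully masked (NAT-like parallel prediction) over fine-tuning. The linear schedule and token-masking scheme are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def curriculum_decoder_input(tgt_tokens, step, total_steps, mask_id=0):
    """Keep autoregressive context early in fine-tuning, mask it out late:
    the masking probability ramps linearly from 0 to 1."""
    p_mask = min(1.0, step / total_steps)
    drop = torch.rand_like(tgt_tokens, dtype=torch.float) < p_mask
    return tgt_tokens.masked_fill(drop, mask_id)

tgt = torch.randint(1, 1000, (2, 8))
early = curriculum_decoder_input(tgt, step=100, total_steps=10000)   # ~AT
late = curriculum_decoder_input(tgt, step=9900, total_steps=10000)   # ~NAT
```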
Uncertainty Quantification with Generative Models
Title | Uncertainty Quantification with Generative Models |
Authors | Vanessa Böhm, François Lanusse, Uroš Seljak |
Abstract | We develop a generative model-based approach to Bayesian inverse problems, such as image reconstruction from noisy and incomplete images. Our framework addresses two common challenges of Bayesian reconstructions: 1) It makes use of complex, data-driven priors that comprise all available information about the uncorrupted data distribution. 2) It enables computationally tractable uncertainty quantification in the form of posterior analysis in latent and data space. The method is very efficient in that the generative model only has to be trained once on an uncorrupted data set; after that, the procedure can be used for arbitrary corruption types. |
Tasks | Image Reconstruction |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10046v1 |
https://arxiv.org/pdf/1910.10046v1.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-quantification-with-generative |
Repo | https://github.com/bccp/DeepUQ |
Framework | tf |
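A minimal PyTorch sketch of the latent-space posterior analysis the abstract describes: with a generator trained once, find the MAP latent code for a corrupted observation by maximizing corruption likelihood plus prior. The masking corruption model and the tiny generator are stand-ins; the paper's uncertainty machinery goes beyond this point estimate.

```python
import torch

torch.manual_seed(0)
# Stand-in for a generator trained once on uncorrupted data.
G = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 784))
x_corrupt, sigma = torch.randn(784), 0.1
mask = (torch.rand(784) > 0.5).float()   # observed-pixel mask (corruption)

z = torch.zeros(8, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    recon = G(z)
    # Gaussian corruption likelihood on observed pixels + standard prior.
    log_lik = -0.5 * ((mask * (recon - x_corrupt)) ** 2).sum() / sigma**2
    log_prior = -0.5 * (z ** 2).sum()
    loss = -(log_lik + log_prior)        # negative log posterior
    loss.backward()
    opt.step()
# z now approximates the MAP; a Laplace approximation around it would give
# latent-space uncertainty.
```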
On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models
Title | On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models |
Authors | Paul Michel, Xian Li, Graham Neubig, Juan Miguel Pino |
Abstract | Adversarial examples — perturbations to the input of a model that elicit large changes in the output — have been shown to be an effective way of assessing the robustness of sequence-to-sequence (seq2seq) models. However, these perturbations only indicate weaknesses in the model if they do not change the input so significantly that it legitimately results in changes in the expected output. This fact has largely been ignored in the evaluations of the growing body of related literature. Using the example of untargeted attacks on machine translation (MT), we propose a new evaluation framework for adversarial attacks on seq2seq models that takes the semantic equivalence of the pre- and post-perturbation input into account. Using this framework, we demonstrate that existing methods may not preserve meaning in general, breaking the aforementioned assumption that source side perturbations should not result in changes in the expected output. We further use this framework to demonstrate that adding additional constraints on attacks allows for adversarial perturbations that are more meaning-preserving, but nonetheless largely change the output sequence. Finally, we show that performing untargeted adversarial training with meaning-preserving attacks is beneficial to the model in terms of adversarial robustness, without hurting test performance. A toolkit implementing our evaluation framework is released at https://github.com/pmichel31415/teapot-nlp. |
Tasks | Machine Translation |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06620v2 |
http://arxiv.org/pdf/1903.06620v2.pdf | |
PWC | https://paperswithcode.com/paper/on-evaluation-of-adversarial-perturbations |
Repo | https://github.com/pmichel31415/teapot-nlp |
Framework | none |
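The evaluation idea can be sketched in a few lines: count an attack as successful only if the perturbed source stays close to the original while the translation degrades more than the source did. The character-bigram F1 below is a crude stand-in for the toolkit's similarity measures, and the threshold is an assumption.

```python
from collections import Counter

def char_bigram_f1(a, b):
    """Crude character-bigram F1 between two strings."""
    ca, cb = Counter(zip(a, a[1:])), Counter(zip(b, b[1:]))
    if not ca or not cb:
        return 0.0
    overlap = sum((ca & cb).values())
    p, r = overlap / sum(cb.values()), overlap / sum(ca.values())
    return 2 * p * r / (p + r) if p + r else 0.0

def attack_success(src, src_adv, ref, hyp, hyp_adv, tau=0.8):
    """Success iff source meaning is preserved (similarity >= tau) and the
    target-side quality drop exceeds the source-side one."""
    s_src = char_bigram_f1(src, src_adv)
    d_tgt = char_bigram_f1(ref, hyp) - char_bigram_f1(ref, hyp_adv)
    return s_src >= tau and d_tgt > (1 - s_src)

print(attack_success("the cat sat", "the cat sal",
                     "le chat était assis", "le chat était assis",
                     "le chien"))
```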
Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments
Title | Multi-Task Regression-based Learning for Autonomous Unmanned Aerial Vehicle Flight Control within Unstructured Outdoor Environments |
Authors | Bruna G. Maciel-Pearson, Samet Akcay, Amir Atapour-Abarghouei, Christopher Holder, Toby P. Breckon |
Abstract | Increased growth in the global Unmanned Aerial Vehicle (UAV, i.e. drone) industry has expanded possibilities for fully autonomous UAV applications. A particular application which has in part motivated this research is the use of UAVs in wide area search and surveillance operations in unstructured outdoor environments. The critical issue with such environments is the lack of structured features that could aid in autonomous flight, such as road lines or paths. In this paper, we propose an End-to-End Multi-Task Regression-based Learning approach capable of defining flight commands for navigation and exploration under the forest canopy, regardless of the presence of trails or additional sensors (i.e. GPS). Training and testing are performed using a software-in-the-loop pipeline which allows for a detailed evaluation against state-of-the-art pose estimation techniques. Our extensive experiments demonstrate that our approach excels in performing dense exploration within the required search perimeter, is capable of covering wider search regions, generalises to previously unseen and unexplored environments and outperforms contemporary state-of-the-art techniques. |
Tasks | Autonomous Flight (Dense Forest), Autonomous Navigation, Pose Estimation |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08320v1 |
https://arxiv.org/pdf/1907.08320v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-regression-based-learning-for |
Repo | https://github.com/brunapearson/mtrl-auto-uav |
Framework | tf |
Audio-Visual Scene-Aware Dialog
Title | Audio-Visual Scene-Aware Dialog |
Authors | Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh |
Abstract | We introduce the task of scene-aware dialog. Our goal is to generate a complete and natural response to a question about a scene, given video and audio of the scene and the history of previous turns in the dialog. To answer successfully, agents must ground concepts from the question in the video while leveraging contextual cues from the dialog history. To benchmark this task, we introduce the Audio Visual Scene-Aware Dialog (AVSD) Dataset. For each of more than 11,000 videos of human actions from the Charades dataset, our dataset contains a dialog about the video, plus a final summary of the video by one of the dialog participants. We train several baseline systems for this task and evaluate the performance of the trained models using both qualitative and quantitative metrics. Our results indicate that models must utilize all the available inputs (video, audio, question, and dialog history) to perform best on this dataset. |
Tasks | Scene-Aware Dialogue |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.09107v2 |
https://arxiv.org/pdf/1901.09107v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-visual-scene-aware-dialog |
Repo | https://github.com/batra-mlp-lab/avsd |
Framework | pytorch |