Paper Group AWR 276
75 Languages, 1 Model: Parsing Universal Dependencies Universally
Title | 75 Languages, 1 Model: Parsing Universal Dependencies Universally |
Authors | Dan Kondratyuk, Milan Straka |
Abstract | We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. By leveraging a multilingual BERT self-attention model pretrained on 104 languages, we found that fine-tuning it on all datasets concatenated together with simple softmax classifiers for each UD task can result in state-of-the-art UPOS, UFeats, Lemmas, UAS, and LAS scores, without requiring any recurrent or language-specific components. We evaluate UDify for multilingual learning, showing that low-resource languages benefit the most from cross-linguistic annotations. We also evaluate for zero-shot learning, with results suggesting that multilingual training provides strong UD predictions even for languages that neither UDify nor BERT have ever been trained on. Code for UDify is available at https://github.com/hyperparticle/udify. |
Tasks | Zero-Shot Learning |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02099v3 |
https://arxiv.org/pdf/1904.02099v3.pdf | |
PWC | https://paperswithcode.com/paper/75-languages-1-model-parsing-universal |
Repo | https://github.com/hyperparticle/udify |
Framework | pytorch |
FedMD: Heterogenous Federated Learning via Model Distillation
Title | FedMD: Heterogenous Federated Learning via Model Distillation |
Authors | Daliang Li, Junpu Wang |
Abstract | Federated learning enables the creation of a powerful centralized model without compromising data privacy of multiple participants. While successful, it does not incorporate the case where each participant independently designs its own model. Due to intellectual property concerns and heterogeneous nature of tasks and data, this is a widespread requirement in applications of federated learning to areas such as health care and AI as a service. In this work, we use transfer learning and knowledge distillation to develop a universal framework that enables federated learning when each agent owns not only their private data, but also uniquely designed models. We test our framework on the MNIST/FEMNIST dataset and the CIFAR10/CIFAR100 dataset and observe fast improvement across all participating models. With 10 distinct participants, the final test accuracy of each model on average receives a 20% gain on top of what’s possible without collaboration and is only a few percent lower than the performance each model would have obtained if all private datasets were pooled and made directly available for all participants. |
Tasks | Transfer Learning |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03581v1 |
https://arxiv.org/pdf/1910.03581v1.pdf | |
PWC | https://paperswithcode.com/paper/fedmd-heterogenous-federated-learning-via |
Repo | https://github.com/diogenes0319/FedMD_clean |
Framework | tf |
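The abstract does not spell out the distillation step, so the following is only a minimal sketch of one way differently designed models can share knowledge without exchanging weights or private data. Each participant scores a shared, non-private set of examples, the scores are averaged into a consensus, and each participant measures (and would then train to reduce) its distance from that consensus. The function names, the toy probabilities and the choice of cross-entropy are assumptions for illustration, not the authors' code.

```python
import math


def consensus(all_scores):
    """Average class-probability vectors from several participants.

    all_scores[p][i] is participant p's probability vector for shared example i.
    """
    n_participants = len(all_scores)
    n_examples = len(all_scores[0])
    n_classes = len(all_scores[0][0])
    return [
        [
            sum(all_scores[p][i][c] for p in range(n_participants)) / n_participants
            for c in range(n_classes)
        ]
        for i in range(n_examples)
    ]


def distillation_loss(own_scores, target_scores, eps=1e-12):
    """Mean cross-entropy between one participant's predictions and the consensus."""
    total = 0.0
    for own, target in zip(own_scores, target_scores):
        total -= sum(t * math.log(o + eps) for o, t in zip(own, target))
    return total / len(own_scores)


# Hypothetical outputs of three different architectures on two shared examples.
scores = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.5, 0.5], [0.3, 0.7]],
]
target = consensus(scores)
losses = [distillation_loss(s, target) for s in scores]
```

Because only predictions on shared examples are exchanged, nothing in this sketch requires the participants to have the same architecture.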
Learning distant cause and effect using only local and immediate credit assignment
Title | Learning distant cause and effect using only local and immediate credit assignment |
Authors | David Rawlinson, Abdelrahman Ahmed, Gideon Kowadlo |
Abstract | We present a recurrent neural network memory that uses sparse coding to create a combinatoric encoding of sequential inputs. Using several examples, we show that the network can associate distant causes and effects in a discrete stochastic process, predict partially-observable higher-order sequences, and enable a DQN agent to navigate a maze by giving it memory. The network uses only biologically-plausible, local and immediate credit assignment. Memory requirements are typically one order of magnitude less than existing LSTM, GRU and autoregressive feed-forward sequence learning models. The most significant limitation of the memory is generalization to unseen input sequences. We explore this limitation by measuring next-word prediction perplexity on the Penn Treebank dataset. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11589v2 |
https://arxiv.org/pdf/1905.11589v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-distant-cause-and-effect-using-only |
Repo | https://github.com/maximecb/gym-minigrid |
Framework | pytorch |
Competing Neural Networks for Robust Control of Nonlinear Systems
Title | Competing Neural Networks for Robust Control of Nonlinear Systems |
Authors | Babak Rahmani, Damien Loterie, Eirini Kakkava, Navid Borhani, Uğur Teğin, Demetri Psaltis, Christophe Moser |
Abstract | The output of physical systems is often accessible by measurements such as the 3D position of a robotic arm actuated by many actuators or the speckle patterns formed by shining the spot of a laser pointer on a wall. The selection of the input of such a system (actuators and the shape of the laser spot respectively) to obtain a desired output is difficult because it is an ill-posed problem, i.e., there are multiple inputs yielding the same output. In this paper, we propose an approach that provides a robust solution to this dilemma for any physical system. We show that it is possible to find the appropriate input of a system that results in a desired output, despite the input-output relation being nonlinear and/or with incomplete measurements of the system's variables. We showcase our approach using an extremely ill-posed problem in imaging. We demonstrate the projection of arbitrary shapes through a multimode fiber (MMF) when a sample of intensity-only measurements is taken at the output. We show image projection fidelity as high as ~90%, which is on par with the gold standard methods which characterize the system fully by phase and amplitude measurements. The generality as well as simplicity of the proposed approach provides a new way of target-oriented control in real-world applications. |
Tasks | |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00126v2 |
https://arxiv.org/pdf/1907.00126v2.pdf | |
PWC | https://paperswithcode.com/paper/multimode-fiber-projector |
Repo | https://github.com/Babak70/Projector_network |
Framework | tf |
Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations
Title | Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations |
Authors | Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi |
Abstract | Variational Bayes (VB) methods have emerged as a fast and computationally-efficient alternative to Markov chain Monte Carlo (MCMC) methods for scalable Bayesian estimation of mixed multinomial logit (MMNL) models. It has been established that VB is substantially faster than MCMC at practically no compromises in predictive accuracy. In this paper, we address two critical gaps concerning the usage and understanding of VB for MMNL. First, extant VB methods are limited to utility specifications involving only individual-specific taste parameters. Second, the finite-sample properties of VB estimators and the relative performance of VB, MCMC and maximum simulated likelihood estimation (MSLE) are not known. To address the former, this study extends several VB methods for MMNL to admit utility specifications including both fixed and random utility parameters. To address the latter, we conduct an extensive simulation-based evaluation to benchmark the extended VB methods against MCMC and MSLE in terms of estimation times, parameter recovery and predictive accuracy. The results suggest that all VB variants with the exception of the ones relying on an alternative variational lower bound constructed with the help of the modified Jensen’s inequality perform as well as MCMC and MSLE at prediction and parameter recovery. In particular, VB with nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta) is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an attractive alternative to MCMC and MSLE for fast, scalable and accurate estimation of MMNL models. |
Tasks | |
Published | 2019-04-07 |
URL | https://arxiv.org/abs/1904.03647v4 |
https://arxiv.org/pdf/1904.03647v4.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-estimation-of-mixed-multinomial |
Repo | https://github.com/RicoKrueger/bayes_mxl |
Framework | none |
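As a rough illustration of the model class discussed in the abstract, the sketch below approximates mixed multinomial logit choice probabilities by simulation: a fixed cost coefficient is combined with a randomly drawn time coefficient, and the logit probabilities are averaged over the draws. This is the kind of simulated probability that underlies simulation-based estimation, not the variational Bayes methods the paper develops. The attribute values, the normal distribution for the random coefficient and all names are assumptions.

```python
import math
import random


def logit_probs(utilities):
    """Multinomial logit choice probabilities (softmax of the utilities)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_mixed_logit_probs(cost, time, beta_cost, mu_time, sigma_time,
                                n_draws=2000, seed=0):
    """Average logit probabilities over draws of a random taste parameter."""
    rng = random.Random(seed)
    avg = [0.0] * len(cost)
    for _ in range(n_draws):
        beta_time = rng.gauss(mu_time, sigma_time)  # random utility parameter
        utilities = [beta_cost * c + beta_time * t for c, t in zip(cost, time)]
        for j, p in enumerate(logit_probs(utilities)):
            avg[j] += p / n_draws
    return avg


# Hypothetical choice situation with three alternatives.
probs = simulated_mixed_logit_probs(
    cost=[2.0, 3.0, 1.0],
    time=[30.0, 20.0, 45.0],
    beta_cost=-0.5,   # fixed parameter, identical for every decision maker
    mu_time=-0.05,    # mean of the random parameter
    sigma_time=0.02,  # standard deviation of the random parameter
)
```

Setting `sigma_time` to zero collapses the model to a plain multinomial logit, which is a convenient sanity check.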
Active Domain Randomization
Title | Active Domain Randomization |
Authors | Bhairav Mehta, Manfred Diaz, Florian Golemo, Christopher J. Pal, Liam Paull |
Abstract | Domain randomization is a popular technique for improving domain transfer, often used in a zero-shot setting when the target domain is unknown or cannot easily be used for training. In this work, we empirically examine the effects of domain randomization on agent generalization. Our experiments show that domain randomization may lead to suboptimal, high-variance policies, which we attribute to the uniform sampling of environment parameters. We propose Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. Our method looks for the most informative environment variations within the given randomization ranges by leveraging the discrepancies of policy rollouts in randomized and reference environment instances. We find that training more frequently on these instances leads to better overall agent generalization. Our experiments across various physics-based simulated and real-robot tasks show that this enhancement leads to more robust, consistent policies. |
Tasks | |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04762v2 |
https://arxiv.org/pdf/1904.04762v2.pdf | |
PWC | https://paperswithcode.com/paper/active-domain-randomization |
Repo | https://github.com/montrealrobotics/active-domainrand |
Framework | none |
VisualBERT: A Simple and Performant Baseline for Vision and Language
Title | VisualBERT: A Simple and Performant Baseline for Vision and Language |
Authors | Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang |
Abstract | We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments. |
Tasks | Language Modelling, Visual Question Answering, Visual Reasoning |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03557v1 |
https://arxiv.org/pdf/1908.03557v1.pdf | |
PWC | https://paperswithcode.com/paper/visualbert-a-simple-and-performant-baseline |
Repo | https://github.com/uclanlp/visualbert |
Framework | pytorch |
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Title | Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks |
Authors | Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu |
Abstract | Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with infinitesimally small learning rate, and (b) kernel regression with respect to so-called Neural Tangent Kernels (NTKs) (Jacot et al., 2018). An efficient algorithm to compute the NTK, as well as its convolutional counterparts, appears in Arora et al. (2019a), which allowed studying performance of infinitely wide nets on datasets like CIFAR-10. However, super-quadratic running time of kernel methods makes them best suited for small-data tasks. We report results suggesting neural tangent kernels perform strongly on low-data tasks. 1. On a standard testbed of classification/regression tasks from the UCI database, NTK SVM beats the previous gold standard, Random Forests (RF), and also the corresponding finite nets. 2. On CIFAR-10 with 10 - 640 training samples, Convolutional NTK consistently beats ResNet-34 by 1% - 3%. 3. On VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance. 4. Comparing the performance of NTK with the finite-width net it was derived from, NTK behavior starts at lower net widths than suggested by theoretical analysis (Arora et al., 2019a). NTK’s efficacy may trace to lower variance of output. |
Tasks | Few-Shot Image Classification, Image Classification, Transfer Learning |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01663v3 |
https://arxiv.org/pdf/1910.01663v3.pdf | |
PWC | https://paperswithcode.com/paper/harnessing-the-power-of-infinitely-wide-deep-1 |
Repo | https://github.com/LeoYu/neural-tangent-kernel-UCI |
Framework | none |
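To make the phrase "kernel regression" concrete, here is a minimal sketch in plain Python. It uses an RBF kernel purely as a stand-in; computing an actual Neural Tangent Kernel is not attempted here, and any kernel function with the same signature could be substituted. The tiny ridge term, the toy data and all names are assumptions. The cubic-time linear solve also shows why such methods suit small datasets.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Stand-in kernel; an NTK would be plugged in here instead."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_kernel_regression(X, y, kernel, ridge=1e-8):
    """Return dual coefficients alpha solving (K + ridge * I) alpha = y."""
    K = [
        [kernel(a, b) + (ridge if i == j else 0.0) for j, b in enumerate(X)]
        for i, a in enumerate(X)
    ]
    return solve(K, y)


def predict(X_train, alpha, kernel, x):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train))


# Toy one-dimensional regression problem.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 0.0, -1.0]
alpha = fit_kernel_regression(X, y, rbf_kernel)
fitted = [predict(X, alpha, rbf_kernel, x) for x in X]
```

With a near-zero ridge term the fitted values reproduce the training targets almost exactly, so this example says nothing about generalization.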
MAP-Net: Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery
Title | MAP-Net: Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery |
Authors | Qing Zhu, Cheng Liao, Han Hu, Xiaoming Mei, Haifeng Li |
Abstract | Accurately and efficiently extracting building footprints from a wide range of remote sensed imagery remains a challenge due to their complex structure, variety of scales and diverse appearances. Existing convolutional neural network (CNN)-based building extraction methods are criticized for failing to detect tiny buildings, because the spatial information of CNN feature maps is lost during repeated pooling operations, and for producing inaccurate segmentation edges on large buildings. Moreover, features extracted by a CNN are always partial, restricted by the size of the receptive field, and large-scale buildings with low texture are always discontinuous and holey when extracted. This paper proposes a novel multi attending path neural network (MAP-Net) for accurately extracting multiscale building footprints and precise boundaries. MAP-Net learns spatial localization-preserved multiscale features through a multi-parallel path in which each stage is gradually generated to extract high-level semantic features with fixed resolution. Then, an attention module adaptively squeezes channel-wise features from each path for optimization, and a pyramid spatial pooling module captures global dependency for refining discontinuous building footprints. Experimental results show that MAP-Net outperforms state-of-the-art (SOTA) algorithms in boundary localization accuracy as well as continuity of large buildings. Specifically, our method achieved 0.68%, 1.74%, 1.46% precision, and 1.50%, 1.53%, 0.82% IoU score improvement without increasing computational complexity compared with the latest HRNetv2 on the Urban 3D, Deep Globe and WHU datasets, respectively. The TensorFlow implementation is available at https://github.com/lehaifeng/MAPNet. |
Tasks | |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1910.12060v1 |
https://arxiv.org/pdf/1910.12060v1.pdf | |
PWC | https://paperswithcode.com/paper/map-net-multi-attending-path-neural-network |
Repo | https://github.com/lehaifeng/MAPNet |
Framework | tf |
EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
Title | EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference |
Authors | Abhilasha Ravichander, Aakanksha Naik, Carolyn Rose, Eduard Hovy |
Abstract | Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding. |
Tasks | Natural Language Inference |
Published | 2019-01-11 |
URL | https://arxiv.org/abs/1901.03735v2 |
https://arxiv.org/pdf/1901.03735v2.pdf | |
PWC | https://paperswithcode.com/paper/equate-a-benchmark-evaluation-framework-for |
Repo | https://github.com/AbhilashaRavichander/EQUATE |
Framework | none |
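The comparison against a majority-class baseline mentioned in the abstract can be illustrated with a short sketch. A majority-class baseline always predicts the most frequent label, and the absolute improvement is the difference between a model's accuracy and that baseline's. The labels and the model accuracy below are made up, and taking the majority label from the training split is an assumption about the setup.

```python
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


def absolute_improvement(model_accuracy, baseline_accuracy):
    """Difference in accuracy; a negative value means the model is worse."""
    return model_accuracy - baseline_accuracy


# Hypothetical label lists for a three-way entailment task.
train = ["entailment", "neutral", "entailment", "contradiction", "entailment"]
test = ["entailment", "neutral", "entailment", "contradiction"]

label, baseline = majority_class_accuracy(train, test)
gain = absolute_improvement(0.45, baseline)  # 0.45 is an invented model accuracy
```

A model whose `gain` is zero or negative has learned nothing that the label distribution alone does not already provide.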
Affordance Learning for End-to-End Visuomotor Robot Control
Title | Affordance Learning for End-to-End Visuomotor Robot Control |
Authors | Aleksi Hämäläinen, Karol Arndt, Ali Ghadirzadeh, Ville Kyrki |
Abstract | Training end-to-end deep robot policies requires a lot of domain-, task-, and hardware-specific data, which is often costly to provide. In this work, we propose to tackle this issue by employing a deep neural network with a modular architecture, consisting of separate perception, policy, and trajectory parts. Each part of the system is trained fully on synthetic data or in simulation. The data is exchanged between parts of the system as low-dimensional latent representations of affordances and trajectories. The performance is then evaluated in a zero-shot transfer scenario using a Franka Panda robot arm. Results demonstrate that a low-dimensional representation of scene affordances extracted from an RGB image is sufficient to successfully train manipulator policies. We also introduce a method for affordance dataset generation, which is easily generalizable to new tasks, objects and environments, and requires no manual pixel labeling. |
Tasks | |
Published | 2019-03-10 |
URL | http://arxiv.org/abs/1903.04053v1 |
http://arxiv.org/pdf/1903.04053v1.pdf | |
PWC | https://paperswithcode.com/paper/affordance-learning-for-end-to-end-visuomotor |
Repo | https://github.com/gamleksi/BlenderDomainRandomizer |
Framework | none |
Latent Normalizing Flows for Discrete Sequences
Title | Latent Normalizing Flows for Discrete Sequences |
Authors | Zachary M. Ziegler, Alexander M. Rush |
Abstract | Normalizing flows are a powerful class of generative models for continuous random variables, showing both strong model flexibility and the potential for non-autoregressive generation. These benefits are also desired when modeling discrete random variables such as text, but directly applying normalizing flows to discrete sequences poses significant additional challenges. We propose a VAE-based generative model which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space. In this setting, we find that it is crucial for the flow-based distribution to be highly multimodal. To capture this property, we propose several normalizing flow architectures to maximize model flexibility. Experiments consider common discrete sequence tasks of character-level language modeling and polyphonic music generation. Our results indicate that an autoregressive flow-based model can match the performance of a comparable autoregressive baseline, and a non-autoregressive flow-based model can improve generation speed with a penalty to performance. |
Tasks | Language Modelling, Music Generation |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10548v4 |
https://arxiv.org/pdf/1901.10548v4.pdf | |
PWC | https://paperswithcode.com/paper/latent-normalizing-flows-for-discrete |
Repo | https://github.com/harvardnlp/TextFlow |
Framework | pytorch |
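For readers unfamiliar with normalizing flows, the sketch below shows the change-of-variables rule they rely on, using the simplest possible case: a single affine transform of a standard normal variable. The log-density of x is the base log-density of the inverted value plus the log-determinant of the inverse transform. This is a generic illustration of the principle, not one of the flow architectures proposed in the paper, and all names are assumptions.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, shift, log_scale):
    """log p(x) for x = shift + exp(log_scale) * z with z ~ N(0, 1).

    Change of variables: log p(x) = log p_z(z) + log |dz/dx|,
    and for this transform log |dz/dx| = -log_scale.
    """
    z = (x - shift) * math.exp(-log_scale)
    return standard_normal_logpdf(z) - log_scale


def normal_logpdf(x, mean, std):
    """Closed-form log-density of N(mean, std^2), used only as a cross-check."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


flow_value = affine_flow_logpdf(1.3, shift=0.5, log_scale=math.log(2.0))
reference = normal_logpdf(1.3, mean=0.5, std=2.0)
```

The two values agree because an affine transform of a Gaussian is again Gaussian; more expressive flows stack many invertible transforms and sum their log-determinants in the same way.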
On Solving the 2-Dimensional Greedy Shooter Problem for UAVs
Title | On Solving the 2-Dimensional Greedy Shooter Problem for UAVs |
Authors | Loren Anderson, Sahitya Senapathy |
Abstract | Unmanned Aerial Vehicles (UAVs), autonomously-guided aircraft, are widely used for tasks involving surveillance and reconnaissance. A version of the pursuit-evasion problems centered around UAVs and its variants has been extensively studied in recent years due to numerous breakthroughs in AI. We present an approach to UAV pursuit-evasion in a 2D aerial-engagement environment using reinforcement learning (RL), a machine learning paradigm concerned with goal-oriented algorithms. In this work, a UAV wielding the greedy shooter strategy engages with a UAV trained using deep Q-learning techniques. Simulated results show that the latter UAV wins every engagement in which the UAVs are sufficiently separated during initialization. This approach highlights an exhaustive and robust application of reinforcement learning to pursuit-evasion that provides insight into effective strategies for UAV flight and interaction. |
Tasks | Q-Learning |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.01419v1 |
https://arxiv.org/pdf/1911.01419v1.pdf | |
PWC | https://paperswithcode.com/paper/on-solving-the-2-dimensional-greedy-shooter |
Repo | https://github.com/LorenJAnderson/uav-2d-greedyshooter-rl |
Framework | pytorch |
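The Q-learning update at the heart of this family of methods can be sketched with a lookup table. Deep Q-learning, as used in the paper, replaces the table with a neural network, which this sketch does not attempt. The state names, action names and hyperparameter values below are hypothetical and are not taken from the paper's environment.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, done=False):
    """One tabular Q-learning step.

    Moves Q(state, action) toward reward + gamma * max_a Q(next_state, a).
    At a terminal transition there is no future value to bootstrap from.
    """
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical discretized engagement: coarse range bins and turn commands.
actions = ["left", "straight", "right"]
q = defaultdict(float)       # unseen state-action pairs default to 0.0
q[("near", "left")] = 1.0    # pretend this value was learned earlier

new_value = q_update(q, "far", "straight", reward=0.0,
                     next_state="near", actions=actions)
```

Here the update target is 0.0 + 0.9 * 1.0 = 0.9, and with a step size of 0.5 the stored value moves halfway from 0.0 to 0.45.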
Auptimizer – an Extensible, Open-Source Framework for Hyperparameter Tuning
Title | Auptimizer – an Extensible, Open-Source Framework for Hyperparameter Tuning |
Authors | Jiayi Liu, Samarth Tripathi, Unmesh Kurup, Mohak Shah |
Abstract | Tuning machine learning models at scale, especially finding the right hyperparameter values, can be difficult and time-consuming. In addition to the computational effort required, this process also requires some ancillary efforts including engineering tasks (e.g., job scheduling) as well as more mundane tasks (e.g., keeping track of the various parameters and associated results). We present Auptimizer, a general Hyperparameter Optimization (HPO) framework to help data scientists speed up model tuning and bookkeeping. With Auptimizer, users can use all available computing resources in distributed settings for model training. The user-friendly system design simplifies creating, controlling, and tracking of a typical machine learning project. The design also allows researchers to integrate new HPO algorithms. To demonstrate its flexibility, we show how Auptimizer integrates a few major HPO techniques (from random search to neural architecture search). The code is available at https://github.com/LGE-ARC-AdvancedAI/auptimizer. |
Tasks | Hyperparameter Optimization, Neural Architecture Search |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02522v1 |
https://arxiv.org/pdf/1911.02522v1.pdf | |
PWC | https://paperswithcode.com/paper/auptimizer-an-extensible-open-source |
Repo | https://github.com/LGE-ARC-AdvancedAI/auptimizer |
Framework | none |
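Random search, the simplest of the HPO techniques the abstract mentions, can be sketched in a few lines. The code below is a generic standalone illustration and does not use or imitate the Auptimizer API. The search space, the toy objective and every name are assumptions.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly and keep the lowest-scoring trial.

    space maps a hyperparameter name to a (low, high) range.
    Returns the best parameters, the best score and the full trial history.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        history.append((objective(params), params))
    best_score, best_params = min(history, key=lambda item: item[0])
    return best_params, best_score, history


def toy_objective(params):
    """Stand-in for a validation loss; minimized at lr=0.1, dropout=0.3."""
    return (params["lr"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"lr": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_params, best_score, history = random_search(toy_objective, space)
```

Keeping the full `history` mirrors the bookkeeping burden the abstract describes: every trial's parameters and result have to be recorded somewhere to be compared later.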
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Title | Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications |
Authors | Peifeng Yu, Mosharaf Chowdhury |
Abstract | GPU computing is becoming increasingly popular with the proliferation of deep learning (DL) applications. However, unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. Consequently, implementing common policies such as time sharing and preemption is expensive. Worse, when a DL application cannot completely use a GPU’s resources, the GPU cannot be efficiently shared between multiple applications, leading to GPU underutilization. We present Salus to enable two GPU sharing primitives: fast job switching and memory sharing, in order to achieve fine-grained GPU sharing among multiple DL applications. Salus implements an efficient, consolidated execution service that exposes the GPU to different DL applications, and enforces fine-grained sharing by performing iteration scheduling and addressing associated memory management issues. We show that these primitives can then be used to implement flexible sharing policies such as fairness, prioritization, and packing for various use cases. Our integration of Salus with TensorFlow and evaluation on popular DL jobs show that Salus can improve the average completion time of DL training jobs by $3.19\times$, GPU utilization for hyper-parameter tuning by $2.38\times$, and GPU utilization of DL inference applications by $42\times$ over not sharing the GPU and $7\times$ over NVIDIA MPS with small overhead. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04610v1 |
http://arxiv.org/pdf/1902.04610v1.pdf | |
PWC | https://paperswithcode.com/paper/salus-fine-grained-gpu-sharing-primitives-for |
Repo | https://github.com/SymbioticLab/Salus |
Framework | tf |