Paper Group NAWR 13
hULMonA: The Universal Language Model in Arabic. Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices. Planning in entropy-regularized Markov decision processes and games. Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration. Efficient Pure Exploration in Adaptive Round model. Visual …
hULMonA: The Universal Language Model in Arabic
Title | hULMonA: The Universal Language Model in Arabic |
Authors | Obeida ElJundi, Wissam Antoun, Nour El Droubi, Hazem Hajj, Wassim El-Hajj, Khaled Shaban |
Abstract | Arabic is a complex language with limited resources, which makes it challenging to achieve accurate results on text classification tasks such as sentiment analysis. The utilization of transfer learning (TL) has recently shown promising results for advancing the accuracy of text classification in English. TL models are pre-trained on large corpora and then fine-tuned on task-specific datasets. In particular, universal language models (ULMs), such as the recently developed BERT, have achieved state-of-the-art results in various NLP tasks in English. In this paper, we hypothesize that similar success can be achieved for Arabic. The work aims at supporting the hypothesis by developing the first Universal Language Model in Arabic (hULMonA - حلمنا meaning our dream), demonstrating its use for Arabic classification tasks, and demonstrating how a pre-trained multi-lingual BERT can also be used for Arabic. We then conduct a benchmark study to evaluate the success of both ULMs on Arabic sentiment analysis. Experiment results show that the developed hULMonA and the multi-lingual ULM are able to generalize well to multiple Arabic data sets and achieve new state-of-the-art results in Arabic sentiment analysis for some of the tested sets. |
Tasks | Arabic Sentiment Analysis, Language Modelling, Sentiment Analysis, Text Classification, Transfer Learning |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4608/ |
https://www.aclweb.org/anthology/W19-4608 | |
PWC | https://paperswithcode.com/paper/hulmona-the-universal-language-model-in |
Repo | https://github.com/aub-mind/hULMonA |
Framework | none |
Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices
Title | Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices |
Authors | Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Venkatesh Saligrama, Harsha Vardhan Simhadri, Prateek Jain |
Abstract | Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential-data-based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN, thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available), which was not possible using standard RNN models. Similarly, using SRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy. |
Tasks | Time Series, Time Series Classification |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9451-shallow-rnn-accurate-time-series-classification-on-resource-constrained-devices |
http://papers.nips.cc/paper/9451-shallow-rnn-accurate-time-series-classification-on-resource-constrained-devices.pdf | |
PWC | https://paperswithcode.com/paper/shallow-rnn-accurate-time-series |
Repo | https://github.com/Microsoft/EdgeML |
Framework | tf |
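The two-layer structure described in the Shallow RNN abstract can be illustrated with a toy sketch. It assumes scalar tanh RNN cells with hypothetical fixed weights (`w1`, `w2`) and non-overlapping "bricks" of equal length; the EdgeML implementation uses trained multi-dimensional cells and its own windowing, so this only shows why the dependency chain gets shorter.

```python
import math
from typing import Sequence, Tuple


def rnn_final_state(xs: Sequence[float], w_x: float, w_h: float) -> float:
    """Run a toy scalar tanh RNN over xs and return its last hidden state."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
    return h


def shallow_rnn(xs: Sequence[float], brick: int,
                w1: Tuple[float, float] = (0.5, 0.8),   # hypothetical layer-1 weights
                w2: Tuple[float, float] = (1.0, 0.5)) -> float:  # hypothetical layer-2 weights
    """Two-layer shallow RNN sketch.

    Layer 1 splits xs into non-overlapping bricks and summarizes each one with
    an independent RNN; these runs do not depend on each other, so they could
    execute in parallel. Layer 2 runs a second RNN over the brick summaries.
    """
    if brick <= 0 or len(xs) % brick != 0:
        raise ValueError("sequence length must be a positive multiple of brick")
    bricks = [xs[i:i + brick] for i in range(0, len(xs), brick)]
    summaries = [rnn_final_state(b, *w1) for b in bricks]
    return rnn_final_state(summaries, *w2)


def sequential_steps(seq_len: int, brick: int) -> int:
    """Longest chain of dependent RNN steps when layer 1 runs in parallel."""
    return brick + seq_len // brick
```

For a length-16 sequence with bricks of 4, the longest dependent chain is 4 + 4 = 8 steps instead of 16; in general `brick + T / brick` is smallest when the brick length is close to the square root of T.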
Planning in entropy-regularized Markov decision processes and games
Title | Planning in entropy-regularized Markov decision processes and games |
Authors | Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko |
Abstract | We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9405-planning-in-entropy-regularized-markov-decision-processes-and-games |
http://papers.nips.cc/paper/9405-planning-in-entropy-regularized-markov-decision-processes-and-games.pdf | |
PWC | https://paperswithcode.com/paper/planning-in-entropy-regularized-markov |
Repo | https://github.com/omardrwch/smoothcruiser-check |
Framework | none |
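The smooth Bellman operator mentioned in the SmoothCruiser abstract replaces the hard max over actions with a log-sum-exp, $V(s) = \lambda \log \sum_a \exp(Q(s,a)/\lambda)$, which is the optimal value of the entropy-regularized one-step problem. The sketch below is a minimal illustration assuming a tiny, fully known tabular MDP and exact value iteration; it is not SmoothCruiser itself, which is a sampling-based planner that only queries a generative model. The data layout of `P` and `R` is a hypothetical choice.

```python
import math
from typing import List, Tuple

# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
Transitions = List[List[List[Tuple[int, float]]]]


def soft_max_value(qs: List[float], lam: float) -> float:
    """lam * log(sum(exp(q / lam))), computed stably by factoring out max(qs)."""
    m = max(qs)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in qs))


def soft_value_iteration(P: Transitions, R: List[List[float]],
                         gamma: float, lam: float, iters: int = 500) -> List[float]:
    """Iterate the entropy-regularized Bellman operator from V = 0."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [
            soft_max_value(
                [R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                 for a in range(len(P[s]))],
                lam,
            )
            for s in range(len(P))
        ]
    return V
```

The log-sum-exp is bounded by $\max_a Q \le V \le \max_a Q + \lambda \log |A|$, so the regularized value tends to the unregularized one as $\lambda \to 0$, while staying smooth for any $\lambda > 0$.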
Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration
Title | Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration |
Authors | Jianchun Chen, Lingjing Wang, Xiang Li, Yi Fang |
Abstract | This paper concerns the underdetermined problem of estimating the geometric transformation between image pairs. Recent methods introduce deep neural networks to predict the controlling parameters of hand-crafted geometric transformation models (e.g. thin-plate spline) for image registration and matching. However, such low-dimensional parametric models have limited flexibility and cannot estimate highly complex geometric transformations that model the actual deformation between image pairs. To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment. Arbicon-Net generalizes from training data to predict the desired arbitrary continuous geometric transformation in a data-driven manner for unseen image pairs. In particular, without imposing penalization terms, the predicted displacement vector function is proven to be spatially continuous and smooth. To verify the performance of Arbicon-Net, we conducted semantic alignment tests on both synthetic and real image datasets with various experimental settings. The results demonstrate that Arbicon-Net outperforms previous image alignment techniques in identifying image correspondences. |
Tasks | Image Registration |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8602-arbicon-net-arbitrary-continuous-geometric-transformation-networks-for-image-registration |
http://papers.nips.cc/paper/8602-arbicon-net-arbitrary-continuous-geometric-transformation-networks-for-image-registration.pdf | |
PWC | https://paperswithcode.com/paper/arbicon-net-arbitrary-continuous-geometric |
Repo | https://github.com/nyummvc/Arbicon-Net |
Framework | pytorch |
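To make "dense displacement field" concrete, the sketch below shows how such a field is typically applied: each output pixel samples the source image at its own location plus a per-pixel offset, using bilinear interpolation. The field here is given by hand rather than predicted by a network, the image is a plain grayscale list of lists, and clamping at the border is an assumption; Arbicon-Net's own network and sampling code are not reproduced.

```python
from typing import List

Image = List[List[float]]  # grayscale image indexed as img[row][col]


def bilinear(img: Image, x: float, y: float) -> float:
    """Sample img at real-valued (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(img: Image, dx: Image, dy: Image) -> Image:
    """Backward warp: out[y][x] = img sampled at (x + dx[y][x], y + dy[y][x])."""
    return [[bilinear(img, x + dx[y][x], y + dy[y][x]) for x in range(len(img[0]))]
            for y in range(len(img))]
```

A zero field returns the image unchanged, and a constant horizontal offset of 0.5 pixels produces interpolated half-pixel values, which is the kind of continuous, per-pixel transformation a low-dimensional parametric model cannot express freely.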
Efficient Pure Exploration in Adaptive Round model
Title | Efficient Pure Exploration in Adaptive Round model |
Authors | Tianyuan Jin, Jieming Shi, Xiaokui Xiao, Enhong Chen |
Abstract | In the adaptive setting, many multi-armed bandit applications allow the learner to adaptively draw samples and adjust its sampling strategy in rounds. In many real applications, not only the query complexity but also the round complexity needs to be optimized. In this paper, we study both PAC and exact top-$k$ arm identification problems and design efficient algorithms that consider both round complexity and query complexity. For the PAC problem, we achieve optimal query complexity and use only $O(\log_{\frac{k}{\delta}}^*(n))$ rounds, which matches the lower bound of round complexity, while most existing works need $\Theta(\log \frac{n}{k})$ rounds. For exact top-$k$ arm identification, we improve the round complexity factor from $\log n$ to $\log_{\frac{1}{\delta}}^*(n)$ and achieve near-optimal query complexity. In experiments, our algorithms conduct far fewer rounds and outperform the state of the art by orders of magnitude with respect to query cost. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8887-efficient-pure-exploration-in-adaptive-round-model |
http://papers.nips.cc/paper/8887-efficient-pure-exploration-in-adaptive-round-model.pdf | |
PWC | https://paperswithcode.com/paper/efficient-pure-exploration-in-adaptive-round |
Repo | https://github.com/jmshi123/mab-nips-2019 |
Framework | none |
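The round model in this abstract means that all queries of a round are issued as one batch, and decisions are only made between rounds. The sketch below is a simple batched successive accept/reject scheme for exact top-$k$ identification, assuming rewards in $[0, 1]$, Hoeffding confidence intervals and a doubling batch size; it is a generic baseline for illustration, not the paper's algorithm, and it does not achieve the $\log^*$ round bounds. The helper `bernoulli_arms` is hypothetical test scaffolding.

```python
import math
import random
from typing import Callable, List, Set, Tuple


def bernoulli_arms(means: List[float], seed: int = 0) -> Callable[[int], float]:
    """Hypothetical simulator: pulling arm i returns 1 with probability means[i]."""
    rng = random.Random(seed)
    return lambda i: 1.0 if rng.random() < means[i] else 0.0


def top_k_rounds(pull: Callable[[int], float], n: int, k: int,
                 delta: float = 0.01, max_rounds: int = 30) -> Tuple[Set[int], int]:
    """Batched accept/reject top-k identification; returns (accepted, rounds used)."""
    active = list(range(n))
    sums, counts = [0.0] * n, [0] * n
    accepted: Set[int] = set()
    for r in range(1, max_rounds + 1):
        batch = 2 ** r  # pulls per undecided arm in this round (doubling schedule)
        for i in active:
            sums[i] += sum(pull(i) for _ in range(batch))
            counts[i] += batch
        need = k - len(accepted)
        rad = {i: math.sqrt(math.log(4 * n * r * r / delta) / (2 * counts[i]))
               for i in active}
        lcb = {i: sums[i] / counts[i] - rad[i] for i in active}
        ucb = {i: sums[i] / counts[i] + rad[i] for i in active}
        # Accept i if fewer than `need` other arms could still beat it.
        new_acc = [i for i in active
                   if sum(ucb[j] >= lcb[i] for j in active if j != i) < need]
        # Reject i if at least `need` other arms are certainly better.
        rejected = [i for i in active
                    if sum(lcb[j] > ucb[i] for j in active if j != i) >= need]
        accepted.update(new_acc)
        active = [i for i in active if i not in new_acc and i not in rejected]
        if len(accepted) == k or not active:
            return accepted, r
    return accepted, max_rounds
```

Because whole rounds of samples are drawn before any decision, the number of rounds stays small (logarithmic in the required sample size here), while the total query count is driven by the gap between the $k$-th and $(k+1)$-th arm.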
Visual Tracking via Adaptive Spatially-Regularized Correlation Filters
Title | Visual Tracking via Adaptive Spatially-Regularized Correlation Filters |
Authors | Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, Jianhua Li |
Abstract | In this work, we propose a novel adaptive spatially-regularized correlation filters (ASRCF) model to simultaneously optimize the filter coefficients and the spatial regularization weight. First, this adaptive spatial regularization scheme can learn an effective spatial weight for a specific object and its appearance variations, resulting in more reliable filter coefficients during the tracking process. Second, our ASRCF model can be effectively optimized based on the alternating direction method of multipliers, where each subproblem has a closed-form solution. Third, our tracker applies two kinds of CF models to estimate the location and scale, respectively. The location CF model exploits ensembles of shallow and deep features to determine the optimal position accurately. The scale CF model works on multi-scale shallow features to estimate the optimal scale efficiently. Extensive experiments on five recent benchmarks show that our tracker performs favorably against many state-of-the-art algorithms, with real-time performance of 28 fps. |
Tasks | Visual Tracking |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Dai_Visual_Tracking_via_Adaptive_Spatially-Regularized_Correlation_Filters_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Visual_Tracking_via_Adaptive_Spatially-Regularized_Correlation_Filters_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/visual-tracking-via-adaptive-spatially |
Repo | https://github.com/Daikenan/ASRCF |
Framework | none |
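The closed-form solutions that correlation-filter trackers rely on come from solving the filter per frequency in the Fourier domain. As a hedged illustration, the sketch below trains a plain single-channel 1-D ridge-regression correlation filter (a MOSSE/KCF-style formulation swapped in for illustration), $G = \hat{Y}\hat{X}^* / (\hat{X}\hat{X}^* + \lambda)$, and finds a shifted target as the response peak. ASRCF's adaptive spatial weight and its ADMM solver are not included; the naive DFT and the regularizer `lam` are assumptions chosen for self-containment.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * u * k / n) for k in range(n))
           for u in range(n)]
    return [v / n for v in out] if inverse else out


def train_filter(x: Sequence[float], y: Sequence[float], lam: float = 1e-3) -> List[complex]:
    """Per-frequency ridge solution minimizing |G X - Y|^2 + lam |G|^2."""
    X, Y = dft(x), dft(y)
    return [Yu * Xu.conjugate() / (Xu * Xu.conjugate() + lam) for Xu, Yu in zip(X, Y)]


def detect(G: Sequence[complex], z: Sequence[float]) -> List[float]:
    """Correlation response of the trained filter on a new patch z."""
    Z = dft(z)
    return [r.real for r in dft([g * zu for g, zu in zip(G, Z)], inverse=True)]
```

Training on a patch with a Gaussian target peaked at index 0 and then detecting on a circularly shifted patch moves the response peak by the same shift, which is how the location filter reports the object's displacement.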
Katib: A Distributed General AutoML Platform on Kubernetes
Title | Katib: A Distributed General AutoML Platform on Kubernetes |
Authors | Jinan Zhou, Andrey Velichkevich, Kirill Prosvirov, Anubhav Garg, Yuji Oshima, Debo Dutta |
Abstract | Automatic Machine Learning (AutoML) is a powerful mechanism to design and tune models. We present Katib, a scalable Kubernetes-native general AutoML platform that can support a range of AutoML algorithms including both hyper-parameter tuning and neural architecture search. The system is divided into separate components, encapsulated as micro-services. Each micro-service operates within a Kubernetes pod and communicates with others via well-defined APIs, thus allowing flexible management and scalable deployment at a minimal cost. Together with a powerful user interface, Katib provides a universal platform for researchers as well as enterprises to try, compare and deploy their AutoML algorithms, on any Kubernetes platform. |
Tasks | AutoML, Hyperparameter Optimization, Neural Architecture Search |
Published | 2019-01-01 |
URL | https://www.usenix.org/conference/opml19/presentation/zhou |
https://www.usenix.org/system/files/opml19papers-zhou.pdf | |
PWC | https://paperswithcode.com/paper/katib-a-distributed-general-automl-platform |
Repo | https://github.com/kubeflow/katib |
Framework | pytorch |
AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism
Title | AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism |
Authors | Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li |
Abstract | Graph convolutional networks (GCNs) may lack the ability to learn hierarchical representations for graph embedding, which holds them back in the graph classification task. Here, we propose AttPool, a novel graph pooling module based on an attention mechanism, to remedy the problem. It adaptively selects the nodes that are significant for graph representation and generates hierarchical features by aggregating the attention-weighted information in nodes. Additionally, we devise a hierarchical prediction architecture to fully leverage the hierarchical representation and facilitate model learning. The AttPool module, together with the entire training structure, can be integrated into existing GCNs and conveniently trained end to end. Experimental results on several graph-classification benchmark datasets of various scales demonstrate the effectiveness of our method. |
Tasks | Graph Classification, Graph Embedding |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Huang_AttPool_Towards_Hierarchical_Feature_Representation_in_Graph_Convolutional_Networks_via_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Huang_AttPool_Towards_Hierarchical_Feature_Representation_in_Graph_Convolutional_Networks_via_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/attpool-towards-hierarchical-feature |
Repo | https://github.com/hjjpku/Attention_in_Graph |
Framework | pytorch |
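The pooling step described in the AttPool abstract (select significant nodes, then aggregate attention-weighted node information) can be sketched generically as below. The dot-product score with a weight vector `w` and the fixed number of kept nodes `k` are assumptions for illustration; the abstract does not specify AttPool's exact scoring, which may also use the graph structure.

```python
import math
from typing import List, Tuple

Vec = List[float]


def attention_pool(h: List[Vec], w: Vec, k: int) -> Tuple[List[int], Vec]:
    """Keep the k highest-attention nodes and return (kept indices, pooled vector)."""
    scores = [sum(wi * hi for wi, hi in zip(w, node)) for node in h]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(exps)
    att = [e / z for e in exps]
    keep = sorted(range(len(h)), key=lambda i: att[i], reverse=True)[:k]
    keep.sort()  # preserve the original node order among kept nodes
    norm = sum(att[i] for i in keep)  # renormalize attention over kept nodes
    dim = len(h[0])
    pooled = [sum(att[i] * h[i][d] for i in keep) / norm for d in range(dim)]
    return keep, pooled
```

In a hierarchical model the kept nodes (with their induced edges) would form the coarser graph for the next GCN layer, and the pooled vector from each level could feed a per-level prediction.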
Neural Lyapunov Control
Title | Neural Lyapunov Control |
Authors | Ya-Chien Chang, Nima Roohi, Sicun Gao |
Abstract | We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable stability guarantees. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantees, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging robot control problems such as path tracking for wheeled vehicles and humanoid robot balancing. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8587-neural-lyapunov-control |
http://papers.nips.cc/paper/8587-neural-lyapunov-control.pdf | |
PWC | https://paperswithcode.com/paper/neural-lyapunov-control |
Repo | https://github.com/YaChienChang/Neural-Lyapunov-Control |
Framework | pytorch |
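The falsifier's job in this learner/falsifier loop is to find states that violate the Lyapunov conditions $V(x) > 0$ and $\nabla V(x) \cdot f(x) < 0$ away from the equilibrium. The sketch below is a minimal stand-in that searches by random sampling in a 2-D box; this swap is an assumption for illustration, since sampling can find counterexamples but, unlike a sound verifier, can never prove that none exist. The radius, the excluded ball `eps` and the sample count are hypothetical parameters.

```python
import random
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]


def falsify(V: Callable[[Point], float],
            gradV: Callable[[Point], Point],
            f: Callable[[Point], Point],
            radius: float = 2.0, eps: float = 0.1,
            samples: int = 5000, seed: int = 0) -> Optional[Point]:
    """Return a sampled state violating the Lyapunov conditions, or None."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if x[0] ** 2 + x[1] ** 2 < eps ** 2:
            continue  # conditions are only checked away from the equilibrium
        g, fx = gradV(x), f(x)
        lie_derivative = g[0] * fx[0] + g[1] * fx[1]
        if V(x) <= 0 or lie_derivative >= 0:
            return x
    return None
```

With $V(x) = x_1^2 + x_2^2$, the closed loop $f(x) = (-x_1, -2x_2)$ yields no counterexample, while $f(x) = (x_1, -x_2)$ does; in the full method, each counterexample would be added to the learner's training set before the next round.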
When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images
Title | When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images |
Authors | Mahmoud Afifi, Brian Price, Scott Cohen, Michael S. Brown |
Abstract | This paper focuses on correcting a camera image that has been improperly white-balanced. This situation occurs when a camera’s auto white balance fails or when the wrong manual white-balance setting is used. Even after decades of computational color constancy research, there are no effective solutions to this problem. The challenge lies not in identifying what the correct white balance should have been, but in the fact that the in-camera white-balance procedure is followed by several camera-specific nonlinear color manipulations that make it challenging to correct the image’s colors in post-processing. This paper introduces the first method to explicitly address this problem. Our method is enabled by a dataset of over 65,000 pairs of incorrectly white-balanced images and their corresponding correctly white-balanced images. Using this dataset, we introduce a k-nearest neighbor strategy that is able to compute a nonlinear color mapping function to correct the image’s colors. We show our method is highly effective and generalizes well to camera models not in the training set. |
Tasks | Color Constancy |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Afifi_When_Color_Constancy_Goes_Wrong_Correcting_Improperly_White-Balanced_Images_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Afifi_When_Color_Constancy_Goes_Wrong_Correcting_Improperly_White-Balanced_Images_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/when-color-constancy-goes-wrong-correcting |
Repo | https://github.com/mahmoudnafifi/WB_sRGB |
Framework | none |
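The k-nearest-neighbor strategy in this abstract retrieves training images similar to the input and combines their known color corrections. The sketch below assumes each training entry pairs a feature vector with a 3x3 correction matrix and blends the k nearest matrices with Gaussian distance weights (`sigma` is a hypothetical value). The paper computes a nonlinear color mapping, so the plain linear 3x3 matrix here is a simplification for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def knn_blend(query: Sequence[float],
              train: List[Tuple[Sequence[float], Matrix]],
              k: int = 3, sigma: float = 0.25) -> Matrix:
    """Blend the correction matrices of the k training features nearest to query."""
    nearest = sorted(train, key=lambda fm: math.dist(query, fm[0]))[:k]
    weights = [math.exp(-math.dist(query, f) ** 2 / (2 * sigma ** 2)) for f, _ in nearest]
    total = sum(weights)
    if total == 0.0:  # all weights underflowed: fall back to a uniform average
        weights, total = [1.0] * len(nearest), float(len(nearest))
    rows, cols = len(nearest[0][1]), len(nearest[0][1][0])
    return [[sum(w * M[r][c] for w, (_, M) in zip(weights, nearest)) / total
             for c in range(cols)] for r in range(rows)]


def apply_correction(M: Matrix, rgb: Sequence[float]) -> List[float]:
    """Apply a 3x3 correction matrix to one RGB pixel."""
    return [sum(M[r][c] * rgb[c] for c in range(3)) for r in range(3)]
```

Blending several neighbors rather than copying the single closest correction makes the result vary smoothly with the input features, which helps on camera models absent from the training set.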
RelGAN: Relational Generative Adversarial Networks for Text Generation
Title | RelGAN: Relational Generative Adversarial Networks for Text Generation |
Authors | Weili Nie, Nina Narodytska, Ankit Patel |
Abstract | Generative adversarial networks (GANs) have achieved great success at generating realistic images. However, text generation remains a challenging task for modern GAN architectures. In this work, we propose RelGAN, a new GAN architecture for text generation, consisting of three main components: a relational-memory-based generator for modeling long-distance dependencies, the Gumbel-Softmax relaxation for training GANs on discrete data, and multiple embedded representations in the discriminator to provide a more informative signal for the generator updates. Our experiments show that RelGAN outperforms current state-of-the-art models in terms of sample quality and diversity, and we also reveal via ablation studies that each component of RelGAN contributes critically to its performance improvements. Moreover, a key advantage of our method, which distinguishes it from other GANs, is the ability to control the trade-off between sample quality and diversity via a single adjustable parameter. Finally, RelGAN is the first architecture that makes GANs with Gumbel-Softmax relaxation succeed in generating realistic text. |
Tasks | Text Generation |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJedV3R5tm |
https://openreview.net/pdf?id=rJedV3R5tm | |
PWC | https://paperswithcode.com/paper/relgan-relational-generative-adversarial |
Repo | https://github.com/weilinie/RelGAN |
Framework | tf |
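The Gumbel-Softmax relaxation named in the RelGAN abstract turns sampling a discrete token into a differentiable operation: add Gumbel noise to the logits and apply a softmax at temperature $\tau$. The sketch below shows only this relaxation in pure Python (no autograd); the relational-memory generator and the discriminator are not shown, and the seeded `random.Random` generator is an assumption for reproducibility.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) noise via -log(-log(U)); guard against U == 0.
    g = [-math.log(-math.log(rng.random() or 1e-12)) for _ in logits]
    y = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(y)
    e = [math.exp(v - m) for v in y]  # numerically stable softmax
    s = sum(e)
    return [v / s for v in e]
```

The argmax of each relaxed sample follows the categorical distribution given by `softmax(logits)` (the Gumbel-max trick), and lowering `tau` pushes the relaxed vector toward a one-hot token while keeping gradients defined.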
Multi-task Temporal Convolutional Network for Predicting Water Quality Sensor
Title | Multi-task Temporal Convolutional Network for Predicting Water Quality Sensor |
Authors | Yifan Zhang, Peter Thorburn, Peter Fitch |
Abstract | Predicting the trend of water quality is essential in environmental management decision support systems. Despite various data-driven models for water quality prediction, most studies focus on predicting a single water quality variable. When multiple water quality variables need to be estimated, preparing several data-driven models may require unaffordable computing resources. Also, the changing patterns of several water quality variables can only be revealed by processing long-term historical observations, which is not well supported by conventional data-driven models. In this paper, we propose a multi-task temporal convolutional network (MTCN) for predicting multiple water quality variables. Temporal convolution allows the model to capture temporal dependencies over a remarkably long historical period. Furthermore, instead of providing predictions for only one water quality variable, the MTCN is designed to predict multiple water quality variables simultaneously. Data collected from the Burnett River, Queensland, is used to evaluate the MTCN. Compared to training a set of single-task TCNs for each variable separately, the proposed MTCN achieves the best RMSE scores in predicting both temperature and dissolved oxygen (DO) over the following 48 time steps, while requiring only 53% of the total training time of the TCNs. Therefore, the MTCN is an encouraging approach for water quality management that processes a large amount of sensor data. |
Tasks | Time Series Prediction |
Published | 2019-12-05 |
URL | https://link.springer.com/chapter/10.1007/978-3-030-36808-1_14#citeas |
https://www.ivivan.com/papers/ICONIP2019.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-temporal-convolutional-network-for |
Repo | https://github.com/ivivan/MTCN |
Framework | tf |
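The two ideas in the MTCN abstract, long history via temporal convolution and several outputs from one shared model, can be illustrated with a toy sketch. It assumes stacked causal convolutions with dilations that double per layer, a shared kernel `w`, ReLU activations and one linear head per target variable; the paper's actual layer configuration is not given in the abstract, so all sizes here are hypothetical.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i w[i] * x[t - i*dilation], with implicit zeros before t = 0."""
    return [sum(wi * x[t - i * dilation] for i, wi in enumerate(w) if t - i * dilation >= 0)
            for t in range(len(x))]


def tcn(x: Sequence[float], w: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Stack of causal dilated convolutions with ReLU, sharing one toy kernel."""
    out = list(x)
    for d in dilations:
        out = [max(0.0, v) for v in causal_dilated_conv(out, w, d)]
    return out


def receptive_field(kernel: int, dilations: Sequence[int]) -> int:
    """Number of past time steps that can influence one output."""
    return 1 + sum((kernel - 1) * d for d in dilations)


def multi_task_heads(features: Sequence[float], heads: List[List[float]]) -> List[float]:
    """One shared feature vector, one linear head per water quality variable."""
    return [sum(h * f for h, f in zip(head, features)) for head in heads]
```

With kernel size 2 and dilations 1, 2, 4, an impulse at t = 0 reaches exactly 8 outputs, matching a receptive field of 8; each extra doubled layer doubles the history length at the cost of one more layer, and the shared trunk is what lets several variables be predicted with one training run.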
Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function
Title | Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function |
Authors | Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov |
Abstract | In this paper, we do a careful study of a bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. In prior work, it has been reported that in order to get good classification accuracy using LSTM models for the text classification task, pretraining the LSTM model parameters using unsupervised learning methods such as language modeling or sequence auto-encoders is necessary [2, 20]. However, we find that our simple model, when trained with cross-entropy loss, is able to achieve competitive results compared with more complex models. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report new state-of-the-art results for text classification on four benchmark datasets. In particular, on the ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin. |
Tasks | Language Modelling, Sentiment Analysis, Text Classification |
Published | 2019-02-01 |
URL | https://www.semanticscholar.org/paper/Revisiting-LSTM-Networks-for-Semi-Supervised-Text-Sachan-Petuum/c3f89364aecd661eb032840d2fe3efd0f6d1698c |
https://www.aaai.org/Papers/AAAI/2019/AAAI-SachanD.7236.pdf | |
PWC | https://paperswithcode.com/paper/revisiting-lstm-networks-for-semi-supervised |
Repo | https://github.com/DevSinghSachan/ssl_text_classification |
Framework | none |
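The mixed objective in this abstract sums several loss terms. The sketch below shows only the two that need no gradients to evaluate: cross-entropy on labeled examples plus an entropy-minimization term over labeled and unlabeled predictions. The adversarial and virtual adversarial terms require perturbing embeddings along loss gradients and are omitted; the weight `lam_em` and the per-term averaging are assumptions rather than the paper's exact settings.

```python
import math
from typing import List, Sequence, Tuple


def cross_entropy(probs: Sequence[float], label: int) -> float:
    return -math.log(probs[label])


def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixed_objective(labeled: List[Tuple[Sequence[float], int]],
                    unlabeled: List[Sequence[float]],
                    lam_em: float = 1.0) -> float:
    """Mean cross-entropy on labeled data + lam_em * mean prediction entropy on all data."""
    ce = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    all_probs = [p for p, _ in labeled] + list(unlabeled)
    em = sum(entropy(p) for p in all_probs) / len(all_probs)
    return ce + lam_em * em
```

The entropy term rewards confident predictions on unlabeled text, which is how unlabeled data contributes to training without any labels; a maximally uncertain binary prediction costs $\log 2$, a one-hot prediction costs 0.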
Examining Hyperparameters of Neural Networks Trained Using Local Search
Title | Examining Hyperparameters of Neural Networks Trained Using Local Search |
Authors | Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan |
Abstract | Deep neural networks (DNNs) have been found useful for many applications. However, training and designing those networks can be challenging and is considered more of an art or an engineering process than a rigorous science. Choosing hyperparameters is an important part of this process. In addition, training neural networks with derivative-free methods is somewhat understudied, particularly with regard to hyperparameter selection. The paper presents a small-scale study of three hyperparameter choices for convolutional neural networks (CNNs). The networks were trained with two single-candidate optimization algorithms: Stochastic Gradient Descent (derivative-based) and Local Search (derivative-free). The CNN is trained on a subset of the FashionMNIST dataset. Experimental results show that hyperparameter selection can be detrimental for Local Search, especially regarding network parametrization. Moreover, the best hyperparameter choices differed between the two algorithms. Further investigation into the training dynamics of Local Search is likely needed. |
Tasks | |
Published | 2019-12-10 |
URL | https://www.researchgate.net/publication/338501734_Examining_Hyperparameters_of_Neural_Networks_Trained_Using_Local_Search |
https://www.researchgate.net/publication/338501734_Examining_Hyperparameters_of_Neural_Networks_Trained_Using_Local_Search | |
PWC | https://paperswithcode.com/paper/examining-hyperparameters-of-neural-networks |
Repo | https://github.com/AroMorin/DNNOP |
Framework | pytorch |
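A single-candidate, derivative-free local search keeps one parameter vector, proposes a random perturbation, and accepts it only if the loss improves. The sketch below is a generic hill climber on a tiny linear regression, assuming Gaussian perturbations with a fixed step size; the Local Search variant studied in the paper (and in the DNNOP repository) may perturb and accept differently, and its step size is exactly the kind of hyperparameter the study examines.

```python
import random
from typing import Callable, List


def local_search(loss: Callable[[List[float]], float], w0: List[float],
                 step: float = 0.1, iters: int = 2000, seed: int = 0) -> List[float]:
    """Hill climbing: perturb all weights, keep the candidate only if loss drops."""
    rng = random.Random(seed)
    w, best = list(w0), loss(w0)
    for _ in range(iters):
        cand = [wi + rng.gauss(0.0, step) for wi in w]
        c = loss(cand)
        if c < best:  # no gradients are used anywhere
            w, best = cand, c
    return w
```

Because only improvements are accepted, the loss never increases, but progress depends heavily on `step`: too large and almost no candidate improves near the optimum, too small and the search crawls, which mirrors the abstract's finding that hyperparameter choices matter strongly for Local Search.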
Expressive power of tensor-network factorizations for probabilistic modeling
Title | Expressive power of tensor-network factorizations for probabilistic modeling |
Authors | Ivan Glasser, Ryan Sweke, Nicola Pancotti, Jens Eisert, Ignacio Cirac |
Abstract | Tensor-network techniques have recently proven useful in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to the probabilistic interpretation of quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Of particular interest, using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments. |
Tasks | Tensor Networks |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8429-expressive-power-of-tensor-network-factorizations-for-probabilistic-modeling |
http://papers.nips.cc/paper/8429-expressive-power-of-tensor-network-factorizations-for-probabilistic-modeling.pdf | |
PWC | https://paperswithcode.com/paper/expressive-power-of-tensor-network-1 |
Repo | https://github.com/glivan/tensor_networks_for_probabilistic_modeling |
Framework | none |
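To make the factorizations in this abstract concrete, the sketch below evaluates a matrix product state (tensor train) over discrete variables: each site contributes a matrix selected by that variable's value, and the product of the matrices gives an amplitude. A non-negative MPS uses the amplitude itself as an unnormalized probability, while a Born machine uses its squared modulus, which allows complex entries. Normalization here is by brute-force enumeration, an assumption that is only feasible for a few variables (efficient contraction exists but is not shown); locally purified states are not covered.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Mat = List[List[complex]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mps_amplitude(cores: List[List[Mat]], x: Sequence[int]) -> complex:
    """cores[i][x_i] is a D_{i-1} x D_i matrix with D_0 = D_N = 1."""
    m = cores[0][x[0]]
    for core, xi in zip(cores[1:], x[1:]):
        m = matmul(m, core[xi])
    return m[0][0]


def distribution(cores: List[List[Mat]], born: bool) -> Dict[Tuple[int, ...], float]:
    """Normalized distribution: |amplitude|^2 (Born) or amplitude (non-negative MPS)."""
    n, d = len(cores), len(cores[0])  # assumes the same local dimension at every site
    xs = list(itertools.product(range(d), repeat=n))
    w = [abs(mps_amplitude(cores, x)) ** 2 if born else mps_amplitude(cores, x).real
         for x in xs]
    z = sum(w)
    return {x: wi / z for x, wi in zip(xs, w)}
```

A bond dimension of 2 is enough to represent two perfectly correlated bits, and a one-site Born machine with amplitudes 1 and i gives equal probabilities, showing how complex parameters enter only through the modulus.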