January 25, 2020

Paper Group NAWR 13

hULMonA: The Universal Language Model in Arabic


Title	hULMonA: The Universal Language Model in Arabic
Authors	Obeida ElJundi, Wissam Antoun, Nour El Droubi, Hazem Hajj, Wassim El-Hajj, Khaled Shaban
Abstract	Arabic is a complex language with limited resources, which makes it challenging to achieve accurate results on text classification tasks such as sentiment analysis. The utilization of transfer learning (TL) has recently shown promising results for advancing accuracy of text classification in English. TL models are pre-trained on large corpora, and then fine-tuned on task-specific datasets. In particular, universal language models (ULMs), such as the recently developed BERT, have achieved state-of-the-art results in various NLP tasks in English. In this paper, we hypothesize that similar success can be achieved for Arabic. The work aims at supporting the hypothesis by developing the first Universal Language Model in Arabic (hULMonA - حلمنا meaning our dream), demonstrating its use for Arabic classification tasks, and demonstrating how a pre-trained multi-lingual BERT can also be used for Arabic. We then conduct a benchmark study to evaluate both ULMs on Arabic sentiment analysis. Experiment results show that the developed hULMonA and multi-lingual ULM are able to generalize well to multiple Arabic data sets and achieve new state-of-the-art results in Arabic Sentiment Analysis for some of the tested sets.
Tasks	Arabic Sentiment Analysis, Language Modelling, Sentiment Analysis, Text Classification, Transfer Learning
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-4608/
PDF	https://www.aclweb.org/anthology/W19-4608
PWC	https://paperswithcode.com/paper/hulmona-the-universal-language-model-in
Repo	https://github.com/aub-mind/hULMonA
Framework	none

Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices


Title	Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices
Authors	Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Venkatesh Saligrama, Harsha Vardhan Simhadri, Prateek Jain
Abstract	Recurrent Neural Networks (RNNs) capture long dependencies and context, and hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs. The second layer consumes the output of the first layer using a second RNN thus capturing long dependencies. We provide theoretical justification for our architecture under weak assumptions that we verify on real-world benchmarks. Furthermore, we show that for time-series classification, our technique leads to substantially improved inference time over standard RNNs without compromising accuracy. For example, we can deploy audio-keyword classification on tiny Cortex M4 devices (100MHz processor, 256KB RAM, no DSP available) which was not possible using standard RNN models. Similarly, using SRNN in the popular Listen-Attend-Spell (LAS) architecture for phoneme classification [4], we can reduce the lag in phoneme classification by 10-12x while maintaining state-of-the-art accuracy.
Tasks	Time Series, Time Series Classification
Published	2019-12-01
URL	http://papers.nips.cc/paper/9451-shallow-rnn-accurate-time-series-classification-on-resource-constrained-devices
PDF	http://papers.nips.cc/paper/9451-shallow-rnn-accurate-time-series-classification-on-resource-constrained-devices.pdf
PWC	https://paperswithcode.com/paper/shallow-rnn-accurate-time-series
Repo	https://github.com/Microsoft/EdgeML
Framework	tf
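
The two-layer structure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the EdgeML implementation: the scalar tanh cell, the function names `run_rnn` and `shallow_rnn`, the brick size, and all weights are assumptions chosen only to keep the example self-contained.

```python
import math

def run_rnn(inputs, w_in, w_rec):
    """Run a scalar-state tanh RNN over `inputs` and return the final hidden state."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def shallow_rnn(sequence, brick_size, first_layer, second_layer):
    """Two-layer sketch: independent RNNs per brick, then one RNN over their outputs."""
    # Layer 1: split the input into consecutive bricks of at most `brick_size` steps.
    bricks = [sequence[i:i + brick_size] for i in range(0, len(sequence), brick_size)]
    # Each brick is processed independently, so these calls could run in parallel.
    summaries = [run_rnn(brick, *first_layer) for brick in bricks]
    # Layer 2: a second RNN consumes the brick summaries to capture long-range context.
    return run_rnn(summaries, *second_layer), len(bricks)

if __name__ == "__main__":
    # Hypothetical input signal and weights, for illustration only.
    signal = [math.sin(0.1 * t) for t in range(100)]
    output, num_bricks = shallow_rnn(signal, 10, (0.5, 0.9), (1.0, 0.5))
    print(output, num_bricks)
```

With 100 time steps and bricks of 10, the longest sequential chain is 10 steps in the first layer plus 10 in the second, rather than 100, which is where the inference-time saving would come from if the bricks run in parallel.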

Planning in entropy-regularized Markov decision processes and games


Title	Planning in entropy-regularized Markov decision processes and games
Authors	Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko
Abstract	We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9405-planning-in-entropy-regularized-markov-decision-processes-and-games
PDF	http://papers.nips.cc/paper/9405-planning-in-entropy-regularized-markov-decision-processes-and-games.pdf
PWC	https://paperswithcode.com/paper/planning-in-entropy-regularized-markov
Repo	https://github.com/omardrwch/smoothcruiser-check
Framework	none
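
The smoothness mentioned in the abstract comes from entropy regularization replacing the hard maximum over actions with a log-sum-exp. The sketch below shows only that standard regularized backup for a single state; it is not SmoothCruiser itself, and the names `soft_value`, `soft_policy`, and the temperature `lam` are assumptions for illustration.

```python
import math

def soft_value(q_values, lam):
    """Entropy-regularized value: lam * log(sum_a exp(Q(a) / lam)), with lam > 0."""
    m = max(q_values)  # subtract the max before exponentiating, for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_policy(q_values, lam):
    """Softmax policy that attains the regularized value above."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values for one state
    for lam in (1.0, 0.1, 0.01):
        print(lam, soft_value(q, lam), soft_policy(q, lam))
```

The regularized value always lies between `max(q)` and `max(q) + lam * log(len(q))`, so it approaches the unregularized maximum as `lam` shrinks while remaining differentiable in the action values.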

Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration


Title	Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration
Authors	Jianchun Chen, Lingjing Wang, Xiang Li, Yi Fang
Abstract	This paper concerns the undetermined problem of estimating geometric transformation between image pairs. Recent methods introduce deep neural networks to predict the controlling parameters of hand-crafted geometric transformation models (e.g. thin-plate spline) for image registration and matching. However, the low-dimension parametric models are incapable of estimating a highly complex geometric transform with limited flexibility to model the actual geometric deformation from image pairs. To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment. Arbicon-Net is generalized from training data to predict the desired arbitrary continuous geometric transformation in a data-driven manner for unseen new pairs of images. Particularly, without imposing penalization terms, the predicted displacement vector function is proven to be spatially continuous and smooth. To verify the performance of Arbicon-Net, we conducted semantic alignment tests over both synthetic and real image datasets with various experimental settings. The results demonstrate that Arbicon-Net outperforms the previous image alignment techniques in identifying the image correspondences.
Tasks	Image Registration
Published	2019-12-01
URL	http://papers.nips.cc/paper/8602-arbicon-net-arbitrary-continuous-geometric-transformation-networks-for-image-registration
PDF	http://papers.nips.cc/paper/8602-arbicon-net-arbitrary-continuous-geometric-transformation-networks-for-image-registration.pdf
PWC	https://paperswithcode.com/paper/arbicon-net-arbitrary-continuous-geometric
Repo	https://github.com/nyummvc/Arbicon-Net
Framework	pytorch

Efficient Pure Exploration in Adaptive Round model


Title	Efficient Pure Exploration in Adaptive Round model
Authors	Tianyuan Jin, Jieming Shi, Xiaokui Xiao, Enhong Chen
Abstract	In the adaptive setting, many multi-armed bandit applications allow the learner to adaptively draw samples and adjust sampling strategy in rounds. In many real applications, not only the query complexity but also the round complexity need to be optimized. In this paper, we study both PAC and exact top-$k$ arm identification problems and design efficient algorithms considering both round complexity and query complexity. For the PAC problem, we achieve optimal query complexity and use only $O(\log_{\frac{k}{\delta}}^{*}(n))$ rounds, which matches the lower bound of round complexity, while most existing works need $\Theta(\log \frac{n}{k})$ rounds. For exact top-$k$ arm identification, we improve the round complexity factor from $\log n$ to $\log_{\frac{1}{\delta}}^{*}(n)$, and achieve near optimal query complexity. In experiments, our algorithms conduct far fewer rounds, and outperform state of the art by orders of magnitude with respect to query cost.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/8887-efficient-pure-exploration-in-adaptive-round-model
PDF	http://papers.nips.cc/paper/8887-efficient-pure-exploration-in-adaptive-round-model.pdf
PWC	https://paperswithcode.com/paper/efficient-pure-exploration-in-adaptive-round
Repo	https://github.com/jmshi123/mab-nips-2019
Framework	none
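
The round bounds in the abstract appear to be stated in terms of the iterated logarithm $\log^{*}_{b}(n)$, the number of times $\log_b$ must be applied to $n$ before the result drops to at most 1. The short sketch below, with a hypothetical helper `log_star`, shows how slowly this quantity grows compared with $\log \frac{n}{k}$; the values of `n`, `k`, and the bases are illustrative assumptions, not figures from the paper.

```python
import math

def log_star(n, base=2.0):
    """Iterated logarithm: how many times log_base is applied until the value is <= 1."""
    count = 0
    value = float(n)
    while value > 1.0:
        value = math.log(value, base)
        count += 1
    return count

if __name__ == "__main__":
    n, k = 10 ** 6, 10  # hypothetical number of arms and target set size
    print("log2(n / k) =", math.log2(n / k))
    print("log*_2(n)   =", log_star(n, 2.0))
    print("log*_16(n)  =", log_star(n, 16.0))
```

A larger base, such as $k/\delta$ for small $\delta$, makes the iterated logarithm smaller still, which is consistent with the claim that far fewer rounds are needed.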

Visual Tracking via Adaptive Spatially-Regularized Correlation Filters


Title	Visual Tracking via Adaptive Spatially-Regularized Correlation Filters
Authors	Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, Jianhua Li
Abstract	In this work, we propose a novel adaptive spatially-regularized correlation filters (ASRCF) model to simultaneously optimize the filter coefficients and the spatial regularization weight. First, this adaptive spatial regularization scheme could learn an effective spatial weight for a specific object and its appearance variations, and therefore result in more reliable filter coefficients during the tracking process. Second, our ASRCF model can be effectively optimized based on the alternating direction method of multipliers, where each subproblem has a closed-form solution. Third, our tracker applies two kinds of CF models to estimate the location and scale respectively. The location CF model exploits ensembles of shallow and deep features to determine the optimal position accurately. The scale CF model works on multi-scale shallow features to estimate the optimal scale efficiently. Extensive experiments on five recent benchmarks show that our tracker performs favorably against many state-of-the-art algorithms, with real-time performance of 28fps.
Tasks	Visual Tracking
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Dai_Visual_Tracking_via_Adaptive_Spatially-Regularized_Correlation_Filters_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Visual_Tracking_via_Adaptive_Spatially-Regularized_Correlation_Filters_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/visual-tracking-via-adaptive-spatially
Repo	https://github.com/Daikenan/ASRCF
Framework	none

Katib: A Distributed General AutoML Platform on Kubernetes


Title	Katib: A Distributed General AutoML Platform on Kubernetes
Authors	Jinan Zhou, Andrey Velichkevich, Kirill Prosvirov, Anubhav Garg, Yuji Oshima, Debo Dutta
Abstract	Automatic Machine Learning (AutoML) is a powerful mechanism to design and tune models. We present Katib, a scalable Kubernetes-native general AutoML platform that can support a range of AutoML algorithms including both hyper-parameter tuning and neural architecture search. The system is divided into separate components, encapsulated as micro-services. Each micro-service operates within a Kubernetes pod and communicates with others via well-defined APIs, thus allowing flexible management and scalable deployment at a minimal cost. Together with a powerful user interface, Katib provides a universal platform for researchers as well as enterprises to try, compare and deploy their AutoML algorithms, on any Kubernetes platform.
Tasks	AutoML, Hyperparameter Optimization, Neural Architecture Search
Published	2019-01-01
URL	https://www.usenix.org/conference/opml19/presentation/zhou
PDF	https://www.usenix.org/system/files/opml19papers-zhou.pdf
PWC	https://paperswithcode.com/paper/katib-a-distributed-general-automl-platform
Repo	https://github.com/kubeflow/katib
Framework	pytorch

AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism


Title	AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism
Authors	Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, Ge Li
Abstract	Graph convolutional networks (GCNs) are potentially short of the ability to learn hierarchical representation for graph embedding, which holds them back in the graph classification task. Here, we propose AttPool, which is a novel graph pooling module based on attention mechanism, to remedy the problem. It is able to select nodes that are significant for graph representation adaptively, and generate hierarchical features via aggregating the attention-weighted information in nodes. Additionally, we devise a hierarchical prediction architecture to sufficiently leverage the hierarchical representation and facilitate the model learning. The AttPool module together with the entire training structure can be integrated into existing GCNs, and is trained in an end-to-end fashion conveniently. The experimental results on several graph-classification benchmark datasets with various scales demonstrate the effectiveness of our method.
Tasks	Graph Classification, Graph Embedding
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Huang_AttPool_Towards_Hierarchical_Feature_Representation_in_Graph_Convolutional_Networks_via_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Huang_AttPool_Towards_Hierarchical_Feature_Representation_in_Graph_Convolutional_Networks_via_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/attpool-towards-hierarchical-feature
Repo	https://github.com/hjjpku/Attention_in_Graph
Framework	pytorch
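
The node selection step described in the abstract can be illustrated with a simplified sketch: score each node, keep the top-k, and scale the kept features by their attention weights. This is an assumption-laden simplification, not the AttPool module: the function names, the precomputed scores, and the omission of neighbor aggregation and graph rewiring are all choices made for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(node_features, scores, k):
    """Keep the k highest-attention nodes and weight their features by attention."""
    weights = softmax(scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = sorted(ranked[:k])  # restore original node order among the kept nodes
    pooled = [[weights[i] * x for x in node_features[i]] for i in keep]
    return keep, pooled

if __name__ == "__main__":
    # Hypothetical 4-node graph with 2-dimensional node features and attention scores.
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    scores = [0.1, 2.0, -1.0, 1.5]
    print(attention_pool(features, scores, k=2))
```

Applying such a step repeatedly would produce progressively coarser graphs, which is one way to read the "hierarchical" representation the abstract refers to.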

Neural Lyapunov Control


Title	Neural Lyapunov Control
Authors	Ya-Chien Chang, Nima Roohi, Sicun Gao
Abstract	We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging robot control problems such as path tracking for wheeled vehicles and humanoid robot balancing.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/8587-neural-lyapunov-control
PDF	http://papers.nips.cc/paper/8587-neural-lyapunov-control.pdf
PWC	https://paperswithcode.com/paper/neural-lyapunov-control
Repo	https://github.com/YaChienChang/Neural-Lyapunov-Control
Framework	pytorch
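
The learner and falsifier loop from the abstract can be sketched on a toy problem. Everything here is an assumption for illustration: a one-dimensional system $\dot{x} = x + u$ with control $u = -kx$, a fixed candidate $V(x) = x^2$, a learner that simply increases the gain $k$, and a grid search standing in for the falsifier. A grid check is not a proof of stability; it only shows the shape of the loop.

```python
def lie_derivative(x, k):
    """dV/dt for V(x) = x^2 along x' = x + u with u = -k * x."""
    return 2.0 * x * (x - k * x)

def falsifier(k, grid):
    """Return a state that violates dV/dt < 0, or None if the grid has no violation."""
    for x in grid:
        if x != 0.0 and lie_derivative(x, k) >= 0.0:
            return x
    return None

def learn_gain(k=0.0, step=0.25, max_iters=100):
    """Increase the feedback gain until the falsifier finds no counterexample."""
    grid = [i / 10.0 for i in range(-20, 21)]
    for _ in range(max_iters):
        counterexample = falsifier(k, grid)
        if counterexample is None:
            return k  # the loop terminates when no counterexample is found
        k += step     # toy "learner" update driven by the counterexample
    raise RuntimeError("no stabilizing gain found within max_iters")

if __name__ == "__main__":
    print(learn_gain())
```

In this toy setting the derivative equals $2(1-k)x^2$, so any gain above 1 passes the check, and the loop stops at the first such value it reaches.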

When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images


Title	When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images
Authors	Mahmoud Afifi, Brian Price, Scott Cohen, Michael S. Brown
Abstract	This paper focuses on correcting a camera image that has been improperly white-balanced. This situation occurs when a camera’s auto white balance fails or when the wrong manual white-balance setting is used. Even after decades of computational color constancy research, there are no effective solutions to this problem. The challenge lies not in identifying what the correct white balance should have been, but in the fact that the in-camera white-balance procedure is followed by several camera-specific nonlinear color manipulations that make it challenging to correct the image’s colors in post-processing. This paper introduces the first method to explicitly address this problem. Our method is enabled by a dataset of over 65,000 pairs of incorrectly white-balanced images and their corresponding correctly white-balanced images. Using this dataset, we introduce a k-nearest neighbor strategy that is able to compute a nonlinear color mapping function to correct the image’s colors. We show our method is highly effective and generalizes well to camera models not in the training set.
Tasks	Color Constancy
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Afifi_When_Color_Constancy_Goes_Wrong_Correcting_Improperly_White-Balanced_Images_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Afifi_When_Color_Constancy_Goes_Wrong_Correcting_Improperly_White-Balanced_Images_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/when-color-constancy-goes-wrong-correcting
Repo	https://github.com/mahmoudnafifi/WB_sRGB
Framework	none
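
The k-nearest neighbor idea in the abstract can be sketched as follows: look up the training images most similar to the input and blend their stored corrections. This is a deliberately simplified illustration. The feature vectors, the dictionary layout, and the function `knn_correct` are hypothetical, and per-channel gains are used here as a linear stand-in for the nonlinear color mapping the paper computes.

```python
def knn_correct(pixel, query_feature, training_set, k=2):
    """Blend the corrections of the k nearest training examples and apply them to one RGB pixel."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Find the k training examples whose features are closest to the query.
    neighbors = sorted(training_set, key=lambda ex: dist(ex["feature"], query_feature))[:k]
    # Inverse-distance weights, normalized to sum to 1.
    raw = [1.0 / (dist(ex["feature"], query_feature) + 1e-8) for ex in neighbors]
    total = sum(raw)
    weights = [r / total for r in raw]
    # Blend the stored per-channel gains (a linear simplification, see the note above).
    gains = [sum(w * ex["gains"][c] for w, ex in zip(weights, neighbors)) for c in range(3)]
    # Apply the blended correction and clip to the valid [0, 1] range.
    return [min(1.0, max(0.0, p * g)) for p, g in zip(pixel, gains)]

if __name__ == "__main__":
    # Hypothetical two-example training set.
    training_set = [
        {"feature": [0.0, 0.0], "gains": [1.2, 1.0, 0.8]},
        {"feature": [1.0, 1.0], "gains": [0.8, 1.0, 1.2]},
    ]
    print(knn_correct([0.5, 0.5, 0.5], [0.2, 0.1], training_set, k=2))
```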

RelGAN: Relational Generative Adversarial Networks for Text Generation


Title	RelGAN: Relational Generative Adversarial Networks for Text Generation
Authors	Weili Nie, Nina Narodytska, Ankit Patel
Abstract	Generative adversarial networks (GANs) have achieved great success at generating realistic images. However, the text generation still remains a challenging task for modern GAN architectures. In this work, we propose RelGAN, a new GAN architecture for text generation, consisting of three main components: a relational memory based generator for the long-distance dependency modeling, the Gumbel-Softmax relaxation for training GANs on discrete data, and multiple embedded representations in the discriminator to provide a more informative signal for the generator updates. Our experiments show that RelGAN outperforms current state-of-the-art models in terms of sample quality and diversity, and we also reveal via ablation studies that each component of RelGAN contributes critically to its performance improvements. Moreover, a key advantage of our method, that distinguishes it from other GANs, is the ability to control the trade-off between sample quality and diversity via the use of a single adjustable parameter. Finally, RelGAN is the first architecture that makes GANs with Gumbel-Softmax relaxation succeed in generating realistic text.
Tasks	Text Generation
Published	2019-05-01
URL	https://openreview.net/forum?id=rJedV3R5tm
PDF	https://openreview.net/pdf?id=rJedV3R5tm
PWC	https://paperswithcode.com/paper/relgan-relational-generative-adversarial
Repo	https://github.com/weilinie/RelGAN
Framework	tf
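
The Gumbel-Softmax relaxation named in the abstract replaces a discrete token sample with a differentiable approximation: add Gumbel noise to the logits and apply a temperature-scaled softmax. The sketch below shows that standard relaxation only, not the RelGAN generator; the function name, logits, and temperatures are assumptions for illustration.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)         # keep u in (0, 1) to avoid log(0)
        gumbel = -math.log(-math.log(u))     # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)                           # stabilize the softmax
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.1]  # hypothetical unnormalized token scores
    for temperature in (5.0, 1.0, 0.1):
        print(temperature, gumbel_softmax(logits, temperature, rng))
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures give smoother outputs, which is what makes gradients usable for discrete text.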

Multi-task Temporal Convolutional Network for Predicting Water Quality Sensor


Title	Multi-task Temporal Convolutional Network for Predicting Water Quality Sensor
Authors	Zhang, Yifan; Thorburn, Peter; Fitch, Peter
Abstract	Predicting the trend of water quality is essential in environmental management decision support systems. Despite various data-driven models in water quality prediction, most studies focus on predicting a single water quality variable. When multiple water quality variables need to be estimated, preparing several data-driven models may require unaffordable computing resources. Also, the changing patterns of several water quality variables can only be revealed by processing long term historical observations, which is not well supported by conventional data-driven models. In this paper, we propose a multi-task temporal convolution network (MTCN) for predicting multiple water quality variables. The temporal convolution offers one the capability to explore the temporal dependencies among a remarkably long historical period. Furthermore, instead of providing predictions for only one water quality variable, the MTCN is designed to predict multiple water quality variables simultaneously. Data collected from the Burnett River, Queensland is used to evaluate the MTCN. Compared to training a set of single-task TCNs for each variable separately, the proposed MTCN achieves the best RMSE scores in predicting both temperature and DO in the following 48 time steps but only requires 53% of the total training time of the TCN. Therefore, the MTCN is an encouraging approach for water quality management by processing a large amount of sensor data.
Tasks	Time Series Prediction
Published	2019-12-05
URL	https://link.springer.com/chapter/10.1007/978-3-030-36808-1_14#citeas
PDF	https://www.ivivan.com/papers/ICONIP2019.pdf
PWC	https://paperswithcode.com/paper/multi-task-temporal-convolutional-network-for
Repo	https://github.com/ivivan/MTCN
Framework	tf
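
The long historical window mentioned in the abstract is usually obtained in temporal convolutional networks by stacking causal convolutions with increasing dilation. The sketch below assumes that common design; the abstract does not state the MTCN layer configuration, so the kernel size, dilations, and function names are illustrative assumptions.

```python
def causal_dilated_conv(series, kernel, dilation):
    """1-D causal convolution: output[t] depends only on series[t], series[t - d], ..."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:          # positions before the start act as zero padding
                acc += w * series[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking one conv layer per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)

if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0]  # hypothetical sensor readings
    print(causal_dilated_conv(readings, [1.0, 1.0], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth; a multi-task variant would share such layers and attach one output head per water quality variable.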

Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function


Title	Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function
Authors	Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov
Abstract	In this paper, we do a careful study of a bidirectional LSTM network for the task of text classification using both supervised and semi-supervised approaches. In prior work, it has been reported that in order to get good classification accuracy using LSTM models for the text classification task, pretraining the LSTM model parameters using unsupervised learning methods such as language modeling or sequence auto-encoder is necessary [2, 20]. However, we find that our simple model, when trained with cross-entropy loss, is able to achieve competitive results compared with the more complex models. Furthermore, in addition to cross-entropy loss, by using a combination of entropy minimization, adversarial, and virtual adversarial losses for both labeled and unlabeled data, we report new state-of-the-art results for the text classification task on four benchmark datasets. In particular, on ACL-IMDB sentiment analysis and AG-News topic classification datasets, our method outperforms current approaches by a substantial margin.
Tasks	Language Modelling, Sentiment Analysis, Text Classification
Published	2019-02-01
URL	https://www.semanticscholar.org/paper/Revisiting-LSTM-Networks-for-Semi-Supervised-Text-Sachan-Petuum/c3f89364aecd661eb032840d2fe3efd0f6d1698c
PDF	https://www.aaai.org/Papers/AAAI/2019/AAAI-SachanD.7236.pdf
PWC	https://paperswithcode.com/paper/revisiting-lstm-networks-for-semi-supervised
Repo	https://github.com/DevSinghSachan/ssl_text_classification
Framework	none
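
Two of the loss terms in the abstract are simple enough to write out directly: cross-entropy on labeled examples and entropy minimization on unlabeled ones. The sketch below combines just those two; the adversarial and virtual adversarial terms are omitted, and the function names and the weight `ent_weight` are assumptions rather than the paper's settings.

```python
import math

def cross_entropy(probs, label):
    """Supervised loss: negative log-probability of the true class."""
    return -math.log(probs[label])

def entropy(probs):
    """Entropy of a predicted distribution; low values mean confident predictions."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mixed_loss(labeled, unlabeled, ent_weight=1.0):
    """Average cross-entropy on labeled data plus weighted entropy on unlabeled data."""
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsupervised = sum(entropy(p) for p in unlabeled) / len(unlabeled)
    return supervised + ent_weight * unsupervised

if __name__ == "__main__":
    # Hypothetical model outputs (class probabilities) for a two-class task.
    labeled = [([0.9, 0.1], 0), ([0.3, 0.7], 1)]
    unlabeled = [[0.6, 0.4], [0.95, 0.05]]
    print(mixed_loss(labeled, unlabeled, ent_weight=0.5))
```

Minimizing the entropy term pushes the classifier toward confident predictions on unlabeled text, which is how the unlabeled data contributes a training signal.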

Examining Hyperparameters of Neural Networks Trained Using Local Search


Title	Examining Hyperparameters of Neural Networks Trained Using Local Search
Authors	Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan
Abstract	Deep neural networks (DNNs) have been found useful for many applications. However, training and designing those networks can be challenging and is considered more of an art or an engineering process than rigorous science. In this regard, the important process of choosing hyperparameters is relevant. In addition, training neural networks with derivative-free methods is somewhat understudied, particularly with regard to hyperparameter selection. The paper presents a small-scale study of the choice of 3 hyperparameters for convolutional neural networks (CNNs). The networks were trained with two single-candidate optimization algorithms: Stochastic Gradient Descent (derivative-based) and Local Search (derivative-free). The CNN is trained on a subset of the FashionMNIST dataset. Experimental results show that hyperparameter selection can be detrimental for Local Search, especially regarding network parametrization. Moreover, the best hyperparameter choices didn’t match for both algorithms. Future investigation into the training dynamics of Local Search is likely needed.
Tasks
Published	2019-12-10
URL	https://www.researchgate.net/publication/338501734_Examining_Hyperparameters_of_Neural_Networks_Trained_Using_Local_Search
PDF	https://www.researchgate.net/publication/338501734_Examining_Hyperparameters_of_Neural_Networks_Trained_Using_Local_Search
PWC	https://paperswithcode.com/paper/examining-hyperparameters-of-neural-networks
Repo	https://github.com/AroMorin/DNNOP
Framework	pytorch
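
A single-candidate, derivative-free local search of the kind contrasted with SGD in the abstract can be sketched as: perturb the current parameters at random and keep the perturbation only if the loss improves. The variant below is a generic illustration, not the algorithm in the DNNOP repository; the Gaussian perturbation, step size, iteration count, and the quadratic test loss are all assumptions.

```python
import random

def local_search(loss, params, step=0.1, iters=500, seed=0):
    """Derivative-free search: accept a random perturbation only if the loss decreases."""
    rng = random.Random(seed)
    best = list(params)
    best_loss = loss(best)
    for _ in range(iters):
        candidate = [p + rng.gauss(0.0, step) for p in best]
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:  # no gradient is used, only loss comparisons
            best, best_loss = candidate, candidate_loss
    return best, best_loss

if __name__ == "__main__":
    # Hypothetical stand-in for a network loss: a quadratic bowl centered at (3, -2).
    def quadratic(p):
        return (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2

    print(local_search(quadratic, [0.0, 0.0]))
```

The step size plays a role loosely analogous to a learning rate, which hints at why hyperparameter choices that suit SGD need not suit Local Search.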

Expressive power of tensor-network factorizations for probabilistic modeling


Title	Expressive power of tensor-network factorizations for probabilistic modeling
Authors	Ivan Glasser, Ryan Sweke, Nicola Pancotti, Jens Eisert, Ignacio Cirac
Abstract	Tensor-network techniques have recently proven useful in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to the probabilistic interpretation of quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Of particular interest, using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments.
Tasks	Tensor Networks
Published	2019-12-01
URL	http://papers.nips.cc/paper/8429-expressive-power-of-tensor-network-factorizations-for-probabilistic-modeling
PDF	http://papers.nips.cc/paper/8429-expressive-power-of-tensor-network-factorizations-for-probabilistic-modeling.pdf
PWC	https://paperswithcode.com/paper/expressive-power-of-tensor-network-1
Repo	https://github.com/glivan/tensor_networks_for_probabilistic_modeling
Framework	none
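
Two of the factorizations named in the abstract differ only in how a contracted tensor-train value becomes a probability: a non-negative tensor train uses the value directly, while a Born machine uses its squared magnitude. The sketch below illustrates this on a tiny real-valued example. The core values, boundary vectors, and function names are assumptions, and the normalization is done by brute-force enumeration for clarity rather than by the efficient contraction that makes these models tractable.

```python
from itertools import product

def mps_amplitude(cores, left, right, config):
    """Contract left^T * A_1[x_1] * ... * A_n[x_n] * right for one configuration."""
    vec = list(left)
    for core, x in zip(cores, config):
        mat = core[x]  # matrix selected by the symbol x at this site
        vec = [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]
    return sum(v * r for v, r in zip(vec, right))

def distribution(cores, left, right, born=False):
    """Normalized distribution over all configurations (brute force, small n only)."""
    n_symbols = len(cores[0])
    configs = list(product(range(n_symbols), repeat=len(cores)))
    weights = []
    for config in configs:
        amp = mps_amplitude(cores, left, right, config)
        weights.append(amp * amp if born else amp)  # Born rule squares the amplitude
    z = sum(weights)
    return {config: w / z for config, w in zip(configs, weights)}

if __name__ == "__main__":
    # Hypothetical bond-dimension-2 cores over binary variables, shared across 3 sites.
    nonneg_core = [[[1.0, 0.5], [0.0, 1.0]], [[0.5, 0.0], [1.0, 0.5]]]
    signed_core = [[[1.0, -0.5], [0.0, 1.0]], [[0.5, 0.0], [-1.0, 0.5]]]
    print(distribution([nonneg_core] * 3, [1.0, 1.0], [1.0, 1.0]))
    print(distribution([signed_core] * 3, [1.0, 1.0], [1.0, 1.0], born=True))
```

Because the Born machine squares the amplitude, its cores may contain negative (or, in general, complex) entries, which is the extra freedom behind the separations discussed in the abstract.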