October 19, 2019

Paper Group ANR 337

Ramp-based Twin Support Vector Clustering


Title	Ramp-based Twin Support Vector Clustering
Authors	Zhen Wang, Xu Chen, Chun-Na Li, Yuan-Hai Shao
Abstract	Traditional plane-based clustering methods measure the within-cluster and between-cluster costs with quadratic, linear, or other unbounded functions, which may amplify the impact of those costs. This letter introduces a ramp cost function into plane-based clustering to propose a new clustering method, called ramp-based twin support vector clustering (RampTWSVC). RampTWSVC is more robust because of its boundedness, and thus it finds the intrinsic clusters more easily than other plane-based clustering methods. The non-convex programming problem in RampTWSVC is solved efficiently through an alternating iteration algorithm, and its local solution can theoretically be obtained in a finite number of iterations. In addition, a nonlinear manifold-based formulation of RampTWSVC is proposed via the kernel trick. Experimental results on several benchmark datasets show the better performance of RampTWSVC compared with other plane-based clustering methods.
Tasks
Published	2018-12-10
URL	http://arxiv.org/abs/1812.03710v1
PDF	http://arxiv.org/pdf/1812.03710v1.pdf
PWC	https://paperswithcode.com/paper/ramp-based-twin-support-vector-clustering
Repo
Framework
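
To illustrate why a bounded cost is more robust than an unbounded one, here is a minimal sketch comparing a clipped (ramp-style) cost with a quadratic cost. The function names, the clipping range `[0, 1]`, and the sample deviations are assumptions made for illustration; the exact ramp function used by RampTWSVC is not given in the abstract.

```python
def ramp(u, lo=0.0, hi=1.0):
    """Clip u to [lo, hi]: linear inside the range, flat outside (assumed form)."""
    return min(max(u, lo), hi)


def quadratic(u):
    """Unbounded quadratic cost, shown for comparison."""
    return u * u


# Hypothetical point-to-plane deviations; the last one is an outlier.
deviations = [0.1, 0.5, 2.0, 50.0]

ramp_costs = [ramp(abs(d)) for d in deviations]
quad_costs = [quadratic(d) for d in deviations]

# The outlier contributes at most `hi` to the ramp cost,
# while its quadratic cost dominates the total.
print(ramp_costs)
print(quad_costs)
```

With the bounded cost, a single far-away point cannot dominate the objective, which is the robustness argument the abstract makes.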

Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches


Title	Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches
Authors	Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin
Abstract	As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches, by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g., fewer than 10 samples in a mini-batch), since the statistics estimated in a mini-batch are not reliable with insufficient samples. In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system, and estimates the statistics of a certain layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. BKN has two appealing properties. First, it enables more stable training and faster convergence compared to previous works. Second, training DNNs using BKN performs substantially better than using BN and its variants, especially when very small mini-batches are presented. On the image classification benchmark of ImageNet, using BKN-powered networks we improve upon the best-published model-zoo results: reaching 74.0% top-1 val accuracy for InceptionV2. More importantly, BKN achieves comparable accuracy with much smaller batch sizes, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet.
Tasks	Image Classification
Published	2018-02-09
URL	http://arxiv.org/abs/1802.03133v2
PDF	http://arxiv.org/pdf/1802.03133v2.pdf
PWC	https://paperswithcode.com/paper/batch-kalman-normalization-towards-training
Repo
Framework
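
The idea of estimating a layer's statistics with help from its preceding layer can be sketched as a simple blend of two estimates. This is a simplified illustration, not the paper's update rule: the function `blend_stats`, the scalar `gain`, and the mixture-style variance formula are all assumptions, and real layers would need per-channel statistics and a mapping between layers of different widths.

```python
def blend_stats(prev_mean, prev_var, cur_mean, cur_var, gain):
    """Blend statistics carried over from the preceding layer with the
    current layer's micro-batch statistics.

    `gain` in [0, 1] is a hypothetical weight: 1.0 trusts only the current
    micro-batch (plain batch statistics), 0.0 trusts only the estimate
    propagated from the preceding layer.
    """
    mean = (1.0 - gain) * prev_mean + gain * cur_mean
    # Variance of a two-component mixture with weights (1 - gain, gain).
    var = (
        (1.0 - gain) * prev_var
        + gain * cur_var
        + gain * (1.0 - gain) * (cur_mean - prev_mean) ** 2
    )
    return mean, var


def normalize(xs, mean, var, eps=1e-5):
    """Standardize activations with the blended statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in xs]


# A noisy micro-batch of three activations (toy values).
batch = [1.0, 2.0, 3.0]
cur_mean = sum(batch) / len(batch)
cur_var = sum((x - cur_mean) ** 2 for x in batch) / len(batch)

mean, var = blend_stats(0.0, 1.0, cur_mean, cur_var, gain=0.5)
print(normalize(batch, mean, var))
```

The design point is that the blended estimate draws on more than the handful of samples in one micro-batch, which is the unreliability the abstract identifies in plain BN.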

Why every GBDT speed benchmark is wrong


Title	Why every GBDT speed benchmark is wrong
Authors	Anna Veronika Dorogush, Vasily Ershov, Dmitriy Kruchinin
Abstract	This article provides a comprehensive study of different ways to benchmark the speed of gradient boosted decision tree algorithms. We show the main problems with several straightforward ways of benchmarking, explain why speed benchmarking is a challenging task, and provide a set of reasonable requirements for a benchmark to be fair and useful.
Tasks
Published	2018-10-24
URL	http://arxiv.org/abs/1810.10380v1
PDF	http://arxiv.org/pdf/1810.10380v1.pdf
PWC	https://paperswithcode.com/paper/why-every-gbdt-speed-benchmark-is-wrong
Repo
Framework

Implicit Policy for Reinforcement Learning


Title	Implicit Policy for Reinforcement Learning
Authors	Yunhao Tang, Shipra Agrawal
Abstract	We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients. We empirically show that, despite its simplicity in implementation, entropy regularization combined with a rich policy class can attain desirable properties displayed under the maximum entropy reinforcement learning framework, such as robustness and multi-modality.
Tasks
Published	2018-06-10
URL	http://arxiv.org/abs/1806.06798v2
PDF	http://arxiv.org/pdf/1806.06798v2.pdf
PWC	https://paperswithcode.com/paper/implicit-policy-for-reinforcement-learning
Repo
Framework

An attention-based Bi-GRU-CapsNet model for hypernymy detection between compound entities


Title	An attention-based Bi-GRU-CapsNet model for hypernymy detection between compound entities
Authors	Qi Wang, Chenming Xu, Yangming Zhou, Tong Ruan, Daqi Gao, Ping He
Abstract	Named entities are usually composable and extensible. Typical examples are names of symptoms and diseases in medical areas. To distinguish these entities from general entities, we name them compound entities. In this paper, we present an attention-based Bi-GRU-CapsNet model to detect the hypernymy relationship between compound entities. Our model consists of several important components. To avoid the out-of-vocabulary problem, English words or Chinese characters in compound entities are fed into the bidirectional gated recurrent units. An attention mechanism is designed to focus on the differences between the two compound entities. Since there are several different cases of hypernymy relationships between compound entities, a capsule network is finally employed to decide whether the hypernymy relationship exists or not. Experimental results demonstrate
Tasks
Published	2018-05-13
URL	http://arxiv.org/abs/1805.04827v3
PDF	http://arxiv.org/pdf/1805.04827v3.pdf
PWC	https://paperswithcode.com/paper/an-attention-based-bi-gru-capsnet-model-for
Repo
Framework

A Characterwise Windowed Approach to Hebrew Morphological Segmentation


Title	A Characterwise Windowed Approach to Hebrew Morphological Segmentation
Authors	Amir Zeldes
Abstract	This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation. Casting the analysis task as character-wise binary classification and using adjacent character and word-based lexicon-lookup features, this approach achieves over 98% accuracy on the benchmark SPMRL shared task data for Hebrew, and 97% accuracy on a new out of domain Wikipedia dataset, an improvement of ~4% and 5% over previous state of the art performance.
Tasks	Morphological Analysis
Published	2018-08-22
URL	http://arxiv.org/abs/1808.07214v2
PDF	http://arxiv.org/pdf/1808.07214v2.pdf
PWC	https://paperswithcode.com/paper/a-characterwise-windowed-approach-to-hebrew
Repo
Framework
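
Casting segmentation as character-wise binary classification can be sketched as follows: each character gets a label saying whether a segment boundary follows it, and its features come from a window of adjacent characters. The helper names, the window size, the padding symbol, and the Latin toy word are assumptions for illustration; the lexicon-lookup features mentioned in the abstract are omitted.

```python
def window_features(word, i, size=2, pad="_"):
    """Characters at offsets -size..+size around position i (padded at the edges)."""
    padded = pad * size + word + pad * size
    center = i + size
    return {f"c{off:+d}": padded[center + off] for off in range(-size, size + 1)}


def boundary_labels(segments):
    """Label each character 1 if a segment boundary follows it, else 0."""
    labels = []
    for seg in segments:
        labels.extend([0] * (len(seg) - 1) + [1])
    labels[-1] = 0  # no boundary after the final character of the word
    return labels


# Toy word "abcde" with gold segmentation "ab" + "cde".
segments = ["ab", "cde"]
word = "".join(segments)

X = [window_features(word, i) for i in range(len(word))]
y = boundary_labels(segments)

for feats, label in zip(X, y):
    print(feats, label)
```

The resulting `(X, y)` pairs could be fed to any binary classifier; which classifier the paper uses is not stated in the abstract.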

An Online Attention-based Model for Speech Recognition


Title	An Online Attention-based Model for Speech Recognition
Authors	Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
Abstract	Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and have become popular in the field of speech recognition. In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism. However, the bidirectional encoder and GSA are two obstacles to real-time speech recognition. In this work, we aim to build a streaming version of the LAS baseline by removing these two obstacles. On the encoder side, we use a latency-controlled (LC) bidirectional structure to reduce the delay of forward computation. Meanwhile, an adaptive monotonic chunk-wise attention (AMoChA) mechanism is proposed to replace GSA for the calculation of the attention weight distribution. Furthermore, we propose two methods to alleviate the large performance degradation that occurs when combining LC and AMoChA. Finally, we obtain an online LAS model, LC-AMoChA, which shows only a 3.5% relative performance reduction compared to the LAS baseline on our internal Mandarin corpus.
Tasks	Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published	2018-11-13
URL	http://arxiv.org/abs/1811.05247v2
PDF	http://arxiv.org/pdf/1811.05247v2.pdf
PWC	https://paperswithcode.com/paper/an-online-attention-based-model-for-speech
Repo
Framework

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!


Title	Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
Authors	Katharina Kann, Sascha Rothe, Katja Filippova
Abstract	Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
Tasks	Language Modelling, Text Generation
Published	2018-09-24
URL	http://arxiv.org/abs/1809.08731v1
PDF	http://arxiv.org/pdf/1809.08731v1.pdf
PWC	https://paperswithcode.com/paper/sentence-level-fluency-evaluation-references
Repo
Framework
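
SLOR normalizes a language model's sentence log-probability by subtracting the unigram log-probability of the tokens and dividing by sentence length, i.e. SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|. The sketch below computes it from precomputed inputs; the function signature and the toy probabilities are assumptions, and obtaining `sentence_logprob` from an actual language model is left out.

```python
import math


def slor(sentence_logprob, tokens, unigram_prob):
    """Syntactic log-odds ratio from a sentence log-probability.

    sentence_logprob: ln p_M(S) under some language model (supplied by the caller).
    tokens:           the tokens of S.
    unigram_prob:     dict mapping each token to its unigram probability.
    """
    unigram_logprob = sum(math.log(unigram_prob[t]) for t in tokens)
    return (sentence_logprob - unigram_logprob) / len(tokens)


# Toy numbers: a rare-word sentence is not penalized merely for its rare words,
# because the unigram term cancels out token frequency.
unigrams = {"the": 0.05, "cat": 0.001, "sat": 0.002}
tokens = ["the", "cat", "sat"]
score = slor(math.log(1e-6), tokens, unigrams)
print(score)
```

A WordPiece-based variant such as WPSLOR would apply the same formula over subword tokens instead of words.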

Mean Square Prediction Error of Misspecified Gaussian Process Models


Title	Mean Square Prediction Error of Misspecified Gaussian Process Models
Authors	Thomas Beckers, Jonas Umlauft, Sandra Hirche
Abstract	Nonparametric modeling approaches show very promising results in the area of system identification and control. A naturally provided model confidence is highly relevant for system-theoretical considerations to provide guarantees for application scenarios. Gaussian process regression represents one approach which provides such an indicator for the model confidence. However, this measure is only valid if the covariance function and its hyperparameters fit the underlying data generating process. In this paper, we derive an upper bound for the mean square prediction error of misspecified Gaussian process models based on a pseudo-concave optimization problem. We present application scenarios and a simulation to compare the derived upper bound with the true mean square error.
Tasks
Published	2018-11-16
URL	http://arxiv.org/abs/1811.06642v1
PDF	http://arxiv.org/pdf/1811.06642v1.pdf
PWC	https://paperswithcode.com/paper/mean-square-prediction-error-of-misspecified
Repo
Framework

Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution


Title	Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution
Authors	Thomas Elsken, Jan Hendrik Metzen, Frank Hutter
Abstract	Neural Architecture Search aims at automatically finding neural architectures that are competitive with architectures designed by human experts. While recent approaches have achieved state-of-the-art predictive performance for image recognition, they are problematic under resource constraints for two reasons: (1) the neural architectures found are solely optimized for high predictive performance, without penalizing excessive resource consumption, and (2) most architecture search methods require vast computational resources. We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the entire Pareto-front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. We address the second shortcoming by proposing a Lamarckian inheritance mechanism for LEMONADE which generates children networks that are warmstarted with the predictive performance of their trained parents. This is accomplished by using (approximate) network morphism operators for generating children. The combination of these two contributions allows finding models that are on par with or even outperform both hand-crafted and automatically designed networks.
Tasks	Neural Architecture Search
Published	2018-04-24
URL	http://arxiv.org/abs/1804.09081v4
PDF	http://arxiv.org/pdf/1804.09081v4.pdf
PWC	https://paperswithcode.com/paper/efficient-multi-objective-neural-architecture
Repo
Framework
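
The notion of a Pareto front over objectives such as predictive error and number of parameters can be sketched with a brute-force dominance check. This shows only the multi-objective selection concept, not LEMONADE itself; the candidate values below are made up for illustration, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (validation error, parameters in millions).
candidates = [(0.05, 10.0), (0.06, 4.0), (0.07, 5.0), (0.10, 1.0)]

front = pareto_front(candidates)
print(front)  # (0.07, 5.0) is dropped: (0.06, 4.0) is better on both objectives
```

Every point on the front represents a different trade-off, so a practitioner can pick an architecture after the search according to the resource budget at hand.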

Attention Models with Random Features for Multi-layered Graph Embeddings


Title	Attention Models with Random Features for Multi-layered Graph Embeddings
Authors	Uday Shankar Shanthamallu, Jayaraman J. Thiagarajan, Huan Song, Andreas Spanias
Abstract	Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios a single graph is rarely sufficient to succinctly represent all interactions, and hence multi-layered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to solve classical problems, such as node classification, in the multi-layered case. In this paper, we consider the problem of semi-supervised learning with multi-layered graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning, and develop two novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer dependencies for building multi-layered graph embeddings. Using empirical studies on several benchmark datasets, we evaluate the proposed approaches and demonstrate significant performance improvements in comparison to state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available.
Tasks	Network Embedding, Node Classification
Published	2018-10-02
URL	http://arxiv.org/abs/1810.01405v1
PDF	http://arxiv.org/pdf/1810.01405v1.pdf
PWC	https://paperswithcode.com/paper/attention-models-with-random-features-for
Repo
Framework

Self Attention Grid for Person Re-Identification


Title	Self Attention Grid for Person Re-Identification
Authors	Jean-Paul Ainam, Ke Qin, Guisong Liu
Abstract	In this paper, we present an attention mechanism scheme to improve the person re-identification task. Inspired by biology, we propose Self Attention Grid (SAG) to discover the most informative parts from a high-resolution image using its internal representation. In particular, given an input image, the proposed model is fed with two copies of the same image and consists of two branches. The upper branch processes the high-resolution image and learns a high-dimensional feature representation, while the lower branch processes the low-resolution image and learns a filtering attention grid. We apply a max filter operation to non-overlapping sub-regions on the high-resolution feature representation before multiplying it element-wise with the output of the second branch. The feature maps of the second branch are subsequently weighted to reflect the importance of each patch of the grid using a softmax operation. Our attention module helps the network learn the most discriminative visual features of multiple image regions and is specifically optimized to attend to feature representations at different levels. Extensive experiments on three large-scale datasets show that our self-attention mechanism significantly improves the baseline model and outperforms various state-of-the-art models by a large margin.
Tasks	Person Re-Identification
Published	2018-09-23
URL	http://arxiv.org/abs/1809.08556v1
PDF	http://arxiv.org/pdf/1809.08556v1.pdf
PWC	https://paperswithcode.com/paper/self-attention-grid-for-person-re
Repo
Framework

Disparity Image Segmentation For ADAS


Title	Disparity Image Segmentation For ADAS
Authors	Viktor Mukha, Inon Sharony
Abstract	We present a simple solution for segmenting grayscale images using existing Connected Component Labeling (CCL) algorithms (which are generally applied to binary images), which was efficient enough to be implemented in a constrained (embedded automotive) architecture. Our solution customizes the region growing and merging approach, and is primarily targeted for stereoscopic disparity images where nearer objects carry more relevance. We provide results from a standard OpenCV implementation for some basic cases and an image from the Tsukuba stereo-pair dataset.
Tasks	Semantic Segmentation
Published	2018-06-27
URL	http://arxiv.org/abs/1806.10350v1
PDF	http://arxiv.org/pdf/1806.10350v1.pdf
PWC	https://paperswithcode.com/paper/disparity-image-segmentation-for-adas
Repo
Framework
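
Extending connected component labeling from binary to grayscale images can be sketched as region growing, where a neighbor joins a region when its value is close enough to the pixel it is reached from. This is a generic illustration, not the authors' customized algorithm or their OpenCV implementation; the function name, the 4-connectivity, the tolerance `tol`, and the toy image are all assumptions.

```python
from collections import deque


def label_regions(img, tol=1):
    """Label 4-connected regions of a grayscale image (list of rows).

    A neighbor joins the current region when its value differs from the
    pixel it is reached from by at most `tol`. Labels start at 1.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for r in range(h):
        for c in range(w):
            if labels[r][c]:
                continue  # already assigned to a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < h
                        and 0 <= nx < w
                        and not labels[ny][nx]
                        and abs(img[ny][nx] - img[y][x]) <= tol
                    ):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


# Toy "disparity" image with three depth layers.
image = [
    [10, 10, 50],
    [11, 10, 50],
    [90, 90, 51],
]
labels, count = label_regions(image, tol=1)
print(labels, count)
```

In a disparity image, larger values correspond to nearer objects, so the resulting regions could then be ranked by their disparity to prioritize the nearest ones.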

Active Inverse Reward Design


Title	Active Inverse Reward Design
Authors	Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell
Abstract	Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a series of queries asking the user to compare between different reward functions. Thus we can actively select queries for maximum informativeness about the true reward. In contrast to approaches asking the designer for optimal behavior, this allows us to gather additional information by eliciting preferences between suboptimal behaviors. After each query, we need to update the posterior over the true reward function from observing the proxy reward function chosen by the designer. The recently proposed Inverse Reward Design (IRD) enables this. Our approach substantially outperforms IRD in test environments. In particular, it can query the designer about interpretable, linear reward functions and still infer non-linear ones.
Tasks
Published	2018-09-09
URL	https://arxiv.org/abs/1809.03060v3
PDF	https://arxiv.org/pdf/1809.03060v3.pdf
PWC	https://paperswithcode.com/paper/active-inverse-reward-design
Repo
Framework

Index Set Fourier Series Features for Approximating Multi-dimensional Periodic Kernels


Title	Index Set Fourier Series Features for Approximating Multi-dimensional Periodic Kernels
Authors	Anthony Tompkins, Fabio Ramos
Abstract	Periodicity is often studied in time series modelling with autoregressive methods but is less popular in the kernel literature, particularly for higher dimensional problems such as in textures, crystallography, and quantum mechanics. Large datasets often make modelling periodicity untenable for otherwise powerful non-parametric methods like Gaussian Processes (GPs) which typically incur an $\mathcal{O}(N^3)$ computational burden and, consequently, are unable to scale to larger datasets. To this end we introduce a method termed Index Set Fourier Series Features to tractably exploit multivariate Fourier series and efficiently decompose periodic kernels on higher-dimensional data into a series of basis functions. We show that our approximation produces significantly less predictive error than alternative approaches such as those based on random Fourier features and achieves better generalisation on regression problems with periodic data.
Tasks	Gaussian Processes
Published	2018-05-14
URL	http://arxiv.org/abs/1805.04982v1
PDF	http://arxiv.org/pdf/1805.04982v1.pdf
PWC	https://paperswithcode.com/paper/index-set-fourier-series-features-for
Repo
Framework