Paper Group ANR 337
Ramp-based Twin Support Vector Clustering
Title | Ramp-based Twin Support Vector Clustering |
Authors | Zhen Wang, Xu Chen, Chun-Na Li, Yuan-Hai Shao |
Abstract | Traditional plane-based clustering methods measure the within-cluster and between-cluster costs with quadratic, linear, or other unbounded functions, which may amplify the impact of the cost. This letter introduces a ramp cost function into plane-based clustering to propose a new clustering method, called ramp-based twin support vector clustering (RampTWSVC). RampTWSVC is more robust because of its boundedness, and thus finds the intrinsic clusters more easily than other plane-based clustering methods. The non-convex programming problem in RampTWSVC is solved efficiently through an alternating iteration algorithm, and its local solution can theoretically be obtained in a finite number of iterations. In addition, a nonlinear manifold-based formulation of RampTWSVC is proposed via the kernel trick. Experimental results on several benchmark datasets show the better performance of RampTWSVC compared with other plane-based clustering methods. |
Tasks | |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03710v1 |
http://arxiv.org/pdf/1812.03710v1.pdf | |
PWC | https://paperswithcode.com/paper/ramp-based-twin-support-vector-clustering |
Repo | |
Framework | |
Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches
Title | Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-Batches |
Authors | Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang, Liang Lin |
Abstract | As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g., fewer than 10 samples in a mini-batch), since the statistics estimated from a mini-batch are unreliable with insufficient samples. In this paper, we present a novel normalization method, called Batch Kalman Normalization (BKN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, BKN treats all the layers in a network as a whole system, and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman filtering. BKN has two appealing properties. First, it enables more stable training and faster convergence compared to previous works. Second, training DNNs using BKN performs substantially better than using BN and its variants, especially when very small mini-batches are used. On the ImageNet image classification benchmark, using BKN-powered networks we improve upon the best-published model-zoo results, reaching 74.0% top-1 validation accuracy for InceptionV2. More importantly, BKN achieves comparable accuracy with much smaller batch sizes, such as 64 times smaller on CIFAR-10/100 and 8 times smaller on ImageNet. |
Tasks | Image Classification |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03133v2 |
http://arxiv.org/pdf/1802.03133v2.pdf | |
PWC | https://paperswithcode.com/paper/batch-kalman-normalization-towards-training |
Repo | |
Framework | |
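The claim in the abstract above that mini-batch statistics become unreliable with few samples can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `batch_mean_spread`, the unit-Gaussian activations, and the trial count are illustrative choices, and the code does not implement BKN itself.

```python
import random
import statistics


def batch_mean_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the per-batch mean over many simulated batches.

    Activations are assumed (hypothetically) to be i.i.d. N(0, 1), so the
    true mean is 0 and the spread is expected to be close to
    1 / sqrt(batch_size).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for size in (2, 8, 64):
        print(size, round(batch_mean_spread(size), 3))
```

The spread of the estimated mean shrinks roughly as one over the square root of the batch size, which is why a batch of 2 gives a far noisier normalization statistic than a batch of 64.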
Why every GBDT speed benchmark is wrong
Title | Why every GBDT speed benchmark is wrong |
Authors | Anna Veronika Dorogush, Vasily Ershov, Dmitriy Kruchinin |
Abstract | This article provides a comprehensive study of different ways to benchmark the speed of gradient boosted decision tree algorithms. We show the main problems with several straightforward ways of benchmarking, explain why speed benchmarking is a challenging task, and provide a set of reasonable requirements for a benchmark to be fair and useful. |
Tasks | |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10380v1 |
http://arxiv.org/pdf/1810.10380v1.pdf | |
PWC | https://paperswithcode.com/paper/why-every-gbdt-speed-benchmark-is-wrong |
Repo | |
Framework | |
Implicit Policy for Reinforcement Learning
Title | Implicit Policy for Reinforcement Learning |
Authors | Yunhao Tang, Shipra Agrawal |
Abstract | We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy-regularized policy gradients. We empirically show that, despite its simplicity of implementation, entropy regularization combined with a rich policy class can attain desirable properties displayed under the maximum entropy reinforcement learning framework, such as robustness and multi-modality. |
Tasks | |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.06798v2 |
http://arxiv.org/pdf/1806.06798v2.pdf | |
PWC | https://paperswithcode.com/paper/implicit-policy-for-reinforcement-learning |
Repo | |
Framework | |
An attention-based Bi-GRU-CapsNet model for hypernymy detection between compound entities
Title | An attention-based Bi-GRU-CapsNet model for hypernymy detection between compound entities |
Authors | Qi Wang, Chenming Xu, Yangming Zhou, Tong Ruan, Daqi Gao, Ping He |
Abstract | Named entities are usually composable and extensible. Typical examples are names of symptoms and diseases in the medical domain. To distinguish these entities from general entities, we name them compound entities. In this paper, we present an attention-based Bi-GRU-CapsNet model to detect hypernymy relationships between compound entities. Our model consists of several important components. To avoid the out-of-vocabulary problem, English words or Chinese characters in compound entities are fed into bidirectional gated recurrent units. An attention mechanism is designed to focus on the differences between the two compound entities. Since hypernymy relationships between compound entities come in several different cases, a capsule network is finally employed to decide whether the hypernymy relationship exists or not. Experimental results demonstrate |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04827v3 |
http://arxiv.org/pdf/1805.04827v3.pdf | |
PWC | https://paperswithcode.com/paper/an-attention-based-bi-gru-capsnet-model-for |
Repo | |
Framework | |
A Characterwise Windowed Approach to Hebrew Morphological Segmentation
Title | A Characterwise Windowed Approach to Hebrew Morphological Segmentation |
Authors | Amir Zeldes |
Abstract | This paper presents a novel approach to the segmentation of orthographic word forms in contemporary Hebrew, focusing purely on splitting without carrying out morphological analysis or disambiguation. Casting the analysis task as character-wise binary classification and using adjacent character and word-based lexicon-lookup features, this approach achieves over 98% accuracy on the benchmark SPMRL shared task data for Hebrew, and 97% accuracy on a new out of domain Wikipedia dataset, an improvement of ~4% and 5% over previous state of the art performance. |
Tasks | Morphological Analysis |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07214v2 |
http://arxiv.org/pdf/1808.07214v2.pdf | |
PWC | https://paperswithcode.com/paper/a-characterwise-windowed-approach-to-hebrew |
Repo | |
Framework | |
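Casting segmentation as character-wise binary classification, as the abstract above describes, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's feature set: the helper names `boundary_labels` and `window_features`, the window size, the padding symbol, and the Latin transliteration in the example are all hypothetical, and the lexicon-lookup features are omitted.

```python
def boundary_labels(segments):
    """One 0/1 label per character: 1 if a segment boundary follows it.

    The final character of the word never gets a 1, because the end of
    the word is not a split point.
    """
    labels = []
    for i, segment in enumerate(segments):
        for j in range(len(segment)):
            ends_segment = j == len(segment) - 1
            is_last_segment = i == len(segments) - 1
            labels.append(1 if ends_segment and not is_last_segment else 0)
    return labels


def window_features(word, index, size=2, pad="_"):
    """Characters within +/- `size` positions of `index`, padded at the edges."""
    padded = pad * size + word + pad * size
    centre = index + size
    return list(padded[centre - size:centre + size + 1])


if __name__ == "__main__":
    # Hypothetical transliterated word form split into three segments.
    segments = ["w", "b", "byt"]
    word = "".join(segments)
    print(boundary_labels(segments))
    print(window_features(word, 0))
```

Each character then becomes one training instance: its window of neighbouring characters is the input and its boundary label is the target for any binary classifier.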
An Online Attention-based Model for Speech Recognition
Title | An Online Attention-based Model for Speech Recognition |
Authors | Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu |
Abstract | Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and have become popular in the field of speech recognition. In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism. However, the bidirectional encoder and GSA are two obstacles to real-time speech recognition. In this work, we aim to make the LAS baseline streamable by removing these two obstacles. On the encoder side, we use a latency-controlled (LC) bidirectional structure to reduce the delay of forward computation. Meanwhile, an adaptive monotonic chunk-wise attention (AMoChA) mechanism is proposed to replace GSA for the calculation of the attention weight distribution. Furthermore, we propose two methods to alleviate the large performance degradation that occurs when combining LC and AMoChA. Finally, we obtain an online LAS model, LC-AMoChA, which shows only a 3.5% relative performance reduction compared with the LAS baseline on our internal Mandarin corpus. |
Tasks | Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05247v2 |
http://arxiv.org/pdf/1811.05247v2.pdf | |
PWC | https://paperswithcode.com/paper/an-online-attention-based-model-for-speech |
Repo | |
Framework | |
Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
Title | Sentence-Level Fluency Evaluation: References Help, But Can Be Spared! |
Authors | Katharina Kann, Sascha Rothe, Katja Filippova |
Abstract | Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own. |
Tasks | Language Modelling, Text Generation |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08731v1 |
http://arxiv.org/pdf/1809.08731v1.pdf | |
PWC | https://paperswithcode.com/paper/sentence-level-fluency-evaluation-references |
Repo | |
Framework | |
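The syntactic log-odds ratio mentioned in the abstract above is commonly defined as the sentence log-probability under a language model minus the sum of the tokens' unigram log-probabilities, divided by the sentence length. The sketch below assumes that definition; the function name `slor` and the numeric inputs are illustrative, and no real language model is involved.

```python
def slor(sentence_logprob, token_unigram_logprobs):
    """Length-normalized log-odds of a sentence against its unigram baseline.

    sentence_logprob: natural-log probability of the whole sentence under
        a language model (assumed to be computed elsewhere).
    token_unigram_logprobs: natural-log unigram probability of each token;
        for a WordPiece-based variant these would be WordPiece tokens.
    """
    length = len(token_unigram_logprobs)
    if length == 0:
        raise ValueError("sentence must contain at least one token")
    return (sentence_logprob - sum(token_unigram_logprobs)) / length


if __name__ == "__main__":
    # Made-up log-probabilities for a three-token sentence.
    print(slor(-10.0, [-3.0, -4.0, -5.0]))
```

Subtracting the unigram term keeps rare words from being penalized merely for being rare, and dividing by the length keeps long sentences from being penalized merely for being long.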
Mean Square Prediction Error of Misspecified Gaussian Process Models
Title | Mean Square Prediction Error of Misspecified Gaussian Process Models |
Authors | Thomas Beckers, Jonas Umlauft, Sandra Hirche |
Abstract | Nonparametric modeling approaches show very promising results in the area of system identification and control. A naturally provided model confidence is highly relevant for system-theoretical considerations to provide guarantees for application scenarios. Gaussian process regression represents one approach which provides such an indicator for the model confidence. However, this measure is only valid if the covariance function and its hyperparameters fit the underlying data generating process. In this paper, we derive an upper bound for the mean square prediction error of misspecified Gaussian process models based on a pseudo-concave optimization problem. We present application scenarios and a simulation to compare the derived upper bound with the true mean square error. |
Tasks | |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06642v1 |
http://arxiv.org/pdf/1811.06642v1.pdf | |
PWC | https://paperswithcode.com/paper/mean-square-prediction-error-of-misspecified |
Repo | |
Framework | |
Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution
Title | Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution |
Authors | Thomas Elsken, Jan Hendrik Metzen, Frank Hutter |
Abstract | Neural Architecture Search aims at automatically finding neural architectures that are competitive with architectures designed by human experts. While recent approaches have achieved state-of-the-art predictive performance for image recognition, they are problematic under resource constraints for two reasons: (1) the neural architectures found are optimized solely for high predictive performance, without penalizing excessive resource consumption, and (2) most architecture search methods require vast computational resources. We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the entire Pareto front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. We address the second shortcoming by proposing a Lamarckian inheritance mechanism for LEMONADE which generates children networks that are warmstarted with the predictive performance of their trained parents. This is accomplished by using (approximate) network morphism operators for generating children. The combination of these two contributions allows finding models that are on par with or even outperform both hand-crafted and automatically designed networks. |
Tasks | Neural Architecture Search |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09081v4 |
http://arxiv.org/pdf/1804.09081v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-multi-objective-neural-architecture |
Repo | |
Framework | |
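The Pareto front that the abstract above refers to is the set of candidates that no other candidate beats on every objective. A minimal sketch of that filter follows; the function name `pareto_front` and the (error, parameter count) tuples are illustrative assumptions, and this is a plain non-dominated filter rather than the LEMONADE algorithm.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Every objective is assumed to be minimized. Point a dominates point b
    when a is no worse on all objectives and strictly better on at least one.
    """
    def dominates(a, b):
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (validation error, parameters in millions) pairs.
    candidates = [(0.10, 5.0), (0.08, 9.0), (0.12, 4.0), (0.11, 6.0)]
    print(pareto_front(candidates))
```

In the example, the last candidate is dropped because the first one has both lower error and fewer parameters, while the remaining three each represent a different trade-off.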
Attention Models with Random Features for Multi-layered Graph Embeddings
Title | Attention Models with Random Features for Multi-layered Graph Embeddings |
Authors | Uday Shankar Shanthamallu, Jayaraman J. Thiagarajan, Huan Song, Andreas Spanias |
Abstract | Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios a single graph is rarely sufficient to succinctly represent all interactions, and hence multi-layered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to solve classical problems, such as node classification, in the multi-layered case. In this paper, we consider the problem of semi-supervised learning with multi-layered graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning, and develop two novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer dependencies for building multi-layered graph embeddings. Using empirical studies on several benchmark datasets, we evaluate the proposed approaches and demonstrate significant performance improvements in comparison to state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available. |
Tasks | Network Embedding, Node Classification |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01405v1 |
http://arxiv.org/pdf/1810.01405v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-models-with-random-features-for |
Repo | |
Framework | |
Self Attention Grid for Person Re-Identification
Title | Self Attention Grid for Person Re-Identification |
Authors | Jean-Paul Ainam, Ke Qin, Guisong Liu |
Abstract | In this paper, we present an attention mechanism scheme to improve the person re-identification task. Inspired by biology, we propose the Self Attention Grid (SAG) to discover the most informative parts of a high-resolution image using its internal representation. In particular, given an input image, the proposed model is fed with two copies of the same image and consists of two branches. The upper branch processes the high-resolution image and learns a high-dimensional feature representation, while the lower branch processes the low-resolution image and learns a filtering attention grid. We apply a max filter operation to non-overlapping sub-regions of the high-dimensional feature representation before multiplying it element-wise with the output of the second branch. The feature maps of the second branch are subsequently weighted to reflect the importance of each patch of the grid using a softmax operation. Our attention module helps the network learn the most discriminative visual features of multiple image regions and is specifically optimized to attend to feature representations at different levels. Extensive experiments on three large-scale datasets show that our self-attention mechanism significantly improves the baseline model and outperforms various state-of-the-art models by a large margin. |
Tasks | Person Re-Identification |
Published | 2018-09-23 |
URL | http://arxiv.org/abs/1809.08556v1 |
http://arxiv.org/pdf/1809.08556v1.pdf | |
PWC | https://paperswithcode.com/paper/self-attention-grid-for-person-re |
Repo | |
Framework | |
Disparity Image Segmentation For ADAS
Title | Disparity Image Segmentation For ADAS |
Authors | Viktor Mukha, Inon Sharony |
Abstract | We present a simple solution for segmenting grayscale images using existing Connected Component Labeling (CCL) algorithms (which are generally applied to binary images), which was efficient enough to be implemented in a constrained (embedded automotive) architecture. Our solution customizes the region growing and merging approach, and is primarily targeted for stereoscopic disparity images where nearer objects carry more relevance. We provide results from a standard OpenCV implementation for some basic cases and an image from the Tsukuba stereo-pair dataset. |
Tasks | Semantic Segmentation |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10350v1 |
http://arxiv.org/pdf/1806.10350v1.pdf | |
PWC | https://paperswithcode.com/paper/disparity-image-segmentation-for-adas |
Repo | |
Framework | |
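The abstract above does not spell out how the labeling is adapted to grayscale input, so the following is only a generic region-growing sketch and not the authors' method. It assumes that two 4-connected pixels belong to the same region when their values differ by at most a tolerance; the function name `label_grayscale`, the `tol` parameter, and the toy disparity grid are hypothetical.

```python
from collections import deque


def label_grayscale(image, tol=1):
    """Label 4-connected regions of similar value in a 2D list of ints.

    Returns (labels, count), where labels has the same shape as image and
    holds region ids starting at 1.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # already assigned to a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if (inside and not labels[ny][nx]
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    # Toy disparity map: larger values stand for nearer objects.
    disparity = [
        [10, 10, 2, 2],
        [10, 11, 2, 3],
        [50, 50, 2, 2],
    ]
    labels, count = label_grayscale(disparity)
    print(count)
    for row in labels:
        print(row)
```

Comparing each pixel with its already-labeled neighbour, rather than with the region seed, lets a region follow gradual disparity changes; a stricter variant would compare against the seed value instead.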
Active Inverse Reward Design
Title | Active Inverse Reward Design |
Authors | Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell |
Abstract | Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a series of queries asking the user to compare between different reward functions. Thus we can actively select queries for maximum informativeness about the true reward. In contrast to approaches asking the designer for optimal behavior, this allows us to gather additional information by eliciting preferences between suboptimal behaviors. After each query, we need to update the posterior over the true reward function from observing the proxy reward function chosen by the designer. The recently proposed Inverse Reward Design (IRD) enables this. Our approach substantially outperforms IRD in test environments. In particular, it can query the designer about interpretable, linear reward functions and still infer non-linear ones. |
Tasks | |
Published | 2018-09-09 |
URL | https://arxiv.org/abs/1809.03060v3 |
https://arxiv.org/pdf/1809.03060v3.pdf | |
PWC | https://paperswithcode.com/paper/active-inverse-reward-design |
Repo | |
Framework | |
Index Set Fourier Series Features for Approximating Multi-dimensional Periodic Kernels
Title | Index Set Fourier Series Features for Approximating Multi-dimensional Periodic Kernels |
Authors | Anthony Tompkins, Fabio Ramos |
Abstract | Periodicity is often studied in time series modelling with autoregressive methods but is less popular in the kernel literature, particularly for higher dimensional problems such as in textures, crystallography, and quantum mechanics. Large datasets often make modelling periodicity untenable for otherwise powerful non-parametric methods like Gaussian Processes (GPs) which typically incur an $\mathcal{O}(N^3)$ computational burden and, consequently, are unable to scale to larger datasets. To this end we introduce a method termed Index Set Fourier Series Features to tractably exploit multivariate Fourier series and efficiently decompose periodic kernels on higher-dimensional data into a series of basis functions. We show that our approximation produces significantly less predictive error than alternative approaches such as those based on random Fourier features and achieves better generalisation on regression problems with periodic data. |
Tasks | Gaussian Processes |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.04982v1 |
http://arxiv.org/pdf/1805.04982v1.pdf | |
PWC | https://paperswithcode.com/paper/index-set-fourier-series-features-for |
Repo | |
Framework | |