Paper Group ANR 1161
Dialog State Tracking with Reinforced Data Augmentation. Deep Weisfeiler-Lehman Assignment Kernels via Multiple Kernel Learning. SPair-71k: A Large-scale Benchmark for Semantic Correspondence. Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology. Self-Training for End-to-End Speech Recognition. Expressing Visual Relationships via Language. Gaussian Mixture Clustering Using Relative Tests of Fit. Ladder Loss for Coherent Visual-Semantic Embedding. Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings. R-SQAIR: Relational Sequential Attend, Infer, Repeat. Model Order Selection Based on Information Theoretic Criteria: Design of the Penalty. Adaptive Sampling for Stochastic Risk-Averse Learning. Soft computing methods for multiobjective location of garbage accumulation points in smart cities. Neural Machine Translation with Byte-Level Subwords. Certainty Equivalence is Efficient for Linear Quadratic Control.
Dialog State Tracking with Reinforced Data Augmentation
Title | Dialog State Tracking with Reinforced Data Augmentation |
Authors | Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu |
Abstract | Neural dialog state trackers are generally limited by the quantity and diversity of annotated training data. In this paper, we address this difficulty by proposing a reinforcement learning (RL) based framework for data augmentation that can generate high-quality data to improve the neural state tracker. Specifically, we introduce a novel contextual bandit generator to learn fine-grained augmentation policies that can generate new effective instances by choosing suitable replacements for the specific context. Moreover, by alternating between training the generator and the state tracker, we keep refining the generation policies to produce more high-quality training data for the neural state tracker. Experimental results on the WoZ and MultiWoZ (restaurant) datasets demonstrate that the proposed framework significantly improves performance over state-of-the-art models, especially with limited training data. |
Tasks | Data Augmentation |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07795v2 |
https://arxiv.org/pdf/1908.07795v2.pdf | |
PWC | https://paperswithcode.com/paper/190807795 |
Repo | |
Framework | |
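To make the contextual-bandit generator above concrete, here is a minimal, self-contained sketch: an epsilon-greedy policy chooses a replacement for a slot word given its context and is rewarded by the resulting change in tracker accuracy. The synonym table, the context representation, and the reward signal are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical replacement candidates per slot word (a stand-in for the
# paper's candidate generation).
SYNONYMS = {"cheap": ["inexpensive", "budget", "affordable"],
            "centre": ["center", "downtown", "central area"]}

class BanditAugmenter:
    """Epsilon-greedy contextual bandit over replacement actions."""
    def __init__(self, epsilon=0.1):
        self.q = defaultdict(float)   # value estimate per (context, action)
        self.n = defaultdict(int)     # visit counts
        self.epsilon = epsilon

    def choose(self, context, actions):
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.n[key] += 1
        self.q[key] += (reward - self.q[key]) / self.n[key]  # incremental mean

def augment(utterance, augmenter):
    """Replace the first known slot word using the bandit policy."""
    tokens = utterance.split()
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS:
            new = augmenter.choose(context=tok, actions=SYNONYMS[tok])
            return " ".join(tokens[:i] + [new] + tokens[i + 1:]), tok, new
    return utterance, None, None

aug = BanditAugmenter()
new_utt, ctx, act = augment("find a cheap restaurant in the centre", aug)
# Reward with the validation-accuracy change after retraining the tracker, e.g.:
aug.update(ctx, act, reward=0.02)
print(new_utt)
```

The alternation described in the abstract corresponds to repeating this loop: augment, retrain the tracker, and feed the accuracy deltas back as bandit rewards.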
Deep Weisfeiler-Lehman Assignment Kernels via Multiple Kernel Learning
Title | Deep Weisfeiler-Lehman Assignment Kernels via Multiple Kernel Learning |
Authors | Nils M. Kriege |
Abstract | Kernels for structured data are commonly obtained by decomposing objects into their parts and adding up the similarities between all pairs of parts measured by a base kernel. Assignment kernels are based on an optimal bijection between the parts and have proven to be an effective alternative to the established convolution kernels. We explore how the base kernel can be learned as part of the classification problem. We build on the theory of valid assignment kernels derived from hierarchies defined on the parts. We show that the weights of this hierarchy can be optimized via multiple kernel learning. We apply this result to learn vertex similarities for the Weisfeiler-Lehman optimal assignment kernel for graph classification. We present initial experimental results that demonstrate the feasibility and effectiveness of the approach. |
Tasks | Graph Classification |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06661v1 |
https://arxiv.org/pdf/1908.06661v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-weisfeiler-lehman-assignment-kernels-via |
Repo | |
Framework | |
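As a rough illustration of the idea, the Weisfeiler-Lehman optimal assignment kernel can be computed as a weighted sum of histogram-intersection kernels, one per WL refinement level; the level weights are the quantities multiple kernel learning would tune. A toy sketch under that reading (graphs as adjacency dicts with node labels), not the paper's implementation:

```python
from collections import Counter

def wl_histograms(adj, labels, iterations):
    """Return one label histogram per WL iteration (level 0 = initial labels)."""
    hists = [Counter(labels.values())]
    for _ in range(iterations):
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in adj}
        hists.append(Counter(labels.values()))
    return hists

def histogram_intersection(h1, h2):
    return sum(min(h1[k], h2[k]) for k in h1.keys() & h2.keys())

def wl_oa_kernel(g1, g2, iterations=2, weights=None):
    hists1 = wl_histograms(*g1, iterations)
    hists2 = wl_histograms(*g2, iterations)
    weights = weights or [1.0] * (iterations + 1)  # MKL would learn these
    return sum(w * histogram_intersection(a, b)
               for w, a, b in zip(weights, hists1, hists2))

# Two toy labelled graphs: (adjacency, node labels)
g1 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})
g2 = ({0: [1], 1: [0]}, {0: "A", 1: "B"})
print(wl_oa_kernel(g1, g2))
```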
SPair-71k: A Large-scale Benchmark for Semantic Correspondence
Title | SPair-71k: A Large-scale Benchmark for Semantic Correspondence |
Authors | Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho |
Abstract | Establishing visual correspondences under large intra-class variations, which is often referred to as semantic correspondence or semantic matching, remains a challenging problem in computer vision. Despite its significance, however, most of the datasets for semantic correspondence are limited to a small number of image pairs with similar viewpoints and scales. In this paper, we present a new large-scale benchmark dataset of semantically paired images, SPair-71k, which contains 70,958 image pairs with diverse variations in viewpoint and scale. Compared to previous datasets, it is significantly larger and contains more accurate and richer annotations. We believe this dataset will provide a reliable testbed to study the problem of semantic correspondence and will help to advance research in this area. We provide the results of recent methods on our new dataset as baselines for further research. Our benchmark is available online at http://cvlab.postech.ac.kr/research/SPair-71k/. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10543v1 |
https://arxiv.org/pdf/1908.10543v1.pdf | |
PWC | https://paperswithcode.com/paper/spair-71k-a-large-scale-benchmark-for |
Repo | |
Framework | |
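For readers implementing baselines, the metric commonly reported on SPair-71k is PCK (percentage of correct keypoints): a predicted keypoint counts as correct if it lies within alpha times the larger side of the target object's bounding box from the ground truth. A minimal sketch, with keypoints as (x, y) tuples; this is not the official evaluation code:

```python
def pck(pred_kps, gt_kps, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(h, w) of ground truth."""
    h, w = bbox_hw
    thresh = alpha * max(h, w)
    correct = sum(
        ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5 <= thresh
        for (px, py), (gx, gy) in zip(pred_kps, gt_kps)
    )
    return correct / len(gt_kps)

print(pck([(10, 10), (52, 40)], [(12, 11), (50, 38)], bbox_hw=(100, 80)))  # 1.0
```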
Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology
Title | Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology |
Authors | Emily E Storey, Amr S. Helmy |
Abstract | Raman spectroscopy’s capability to provide meaningful composition predictions relies heavily on a pre-processing step that removes insignificant spectral variation. This is crucial in biofluid analysis. Widespread adoption of Raman-based diagnostics requires a robust model that can withstand routine spectral discrepancies due to unavoidable variations such as age, diet, and medical background. A wealth of pre-processing methods is available, and it is often left to trial-and-error or user experience to select the method that gives the best results. This process can be extremely time consuming and inconsistent across operators. In this study we detail a method to analyze the statistical variability within a set of training spectra and determine their suitability for forming a robust model. This allows us to selectively qualify or exclude a pre-processing method, predetermine robustness, and simultaneously identify the number of components that will form the best predictive model. We demonstrate the ability of this technique to improve predictive models of two artificial biological fluids. Raman spectroscopy is ideal for noninvasive, nondestructive analysis, and routine health monitoring that maximizes comfort is increasingly crucial, particularly given epidemic-level diabetes diagnoses. High variability in spectra of biological samples, however, can hinder Raman’s adoption for these applications. Our technique allows the optimal pre-treatment method to be determined objectively for the operator; model performance is no longer a function of user experience. We foresee this statistical technique being instrumental in widening the adoption of Raman as a monitoring tool in the field of biofluid analysis. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02243v1 |
http://arxiv.org/pdf/1904.02243v1.pdf | |
PWC | https://paperswithcode.com/paper/optimized-preprocessing-and-machine-learning |
Repo | |
Framework | |
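The selection loop implied by the abstract can be sketched as a grid over pre-processing methods and component counts, scored by cross-validated error. The baseline corrections below (SNV, first derivative) and the synthetic data are generic stand-ins, not the paper's exact methods or its statistical variability analysis:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def snv(X):  # standard normal variate, a common Raman scatter correction
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_derivative(X):
    return np.gradient(X, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))                       # 60 toy spectra, 200 wavenumbers
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=60)  # toy concentration

best = None
for name, prep in [("raw", lambda X: X), ("snv", snv), ("deriv", first_derivative)]:
    Xp = prep(X)
    for n in range(1, 11):
        rmse = -cross_val_score(PLSRegression(n_components=n), Xp, y,
                                scoring="neg_root_mean_squared_error", cv=5).mean()
        if best is None or rmse < best[0]:
            best = (rmse, name, n)
print("best RMSE %.3f with %s, %d components" % best)
```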
Self-Training for End-to-End Speech Recognition
Title | Self-Training for End-to-End Speech Recognition |
Authors | Jacob Kahn, Ann Lee, Awni Hannun |
Abstract | We revisit self-training in the context of end-to-end speech recognition. We demonstrate that training with pseudo-labels can substantially improve the accuracy of a baseline model. Key to our approach are a strong baseline acoustic and language model used to generate the pseudo-labels, filtering mechanisms tailored to common errors from sequence-to-sequence models, and a novel ensemble approach to increase pseudo-label diversity. Experiments on the LibriSpeech corpus show that, with an ensemble of four models and label filtering, self-training yields a 33.9% relative improvement in WER over a baseline trained on 100 hours of labelled data in the noisy speech setting. In the clean speech setting, self-training recovers 59.3% of the gap between the baseline and an oracle model, at least 93.8% more, in relative terms, than previous approaches achieve. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Recognition |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09116v2 |
https://arxiv.org/pdf/1909.09116v2.pdf | |
PWC | https://paperswithcode.com/paper/self-training-for-end-to-end-speech |
Repo | |
Framework | |
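A minimal sketch of the filtering step: keep a hypothesis only if its average log-probability is high enough and it contains no n-gram loop, a common sequence-to-sequence failure mode that the abstract's filters target. Hypotheses are assumed to come from beam-search decoding with the baseline acoustic and language model; the thresholds are illustrative:

```python
def has_ngram_loop(tokens, n=4, max_repeats=2):
    """Detect degenerate looping output (the same n-gram repeated many times)."""
    counts = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
        if counts[gram] > max_repeats:
            return True
    return False

def filter_pseudo_labels(hypotheses, min_avg_logprob=-0.5):
    """hypotheses: list of (tokens, total_logprob) pairs from beam search."""
    kept = []
    for tokens, logp in hypotheses:
        if not tokens:
            continue
        if logp / len(tokens) < min_avg_logprob:
            continue                       # low-confidence transcript
        if has_ngram_loop(tokens):
            continue                       # looping output
        kept.append(tokens)
    return kept

hyps = [("the cat sat on the mat".split(), -1.2),
        ("a a a a a a a a".split(), -0.8)]
print(filter_pseudo_labels(hyps))  # keeps only the first hypothesis
```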
Expressing Visual Relationships via Language
Title | Expressing Visual Relationships via Language |
Authors | Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal |
Abstract | Describing images with text is a fundamental problem in vision-language research. Current studies in this domain mostly focus on single-image captioning. However, in various real applications (e.g., image editing, difference interpretation, and retrieval), generating relational captions for two images can also be very useful. This important problem has not been explored, mostly due to a lack of datasets and effective models. To push forward research in this direction, we first introduce a new language-guided image editing dataset that contains a large number of real image pairs with corresponding editing instructions. We then propose a new relational speaker model based on an encoder-decoder architecture with static relational attention and sequential multi-head attention. We also extend the model with dynamic relational attention, which calculates visual alignment while decoding. Our models are evaluated on our newly collected dataset and two public datasets consisting of image pairs annotated with relationship sentences. Experimental results, based on both automatic and human evaluation, demonstrate that our model outperforms all baselines and existing methods on all the datasets. |
Tasks | Image Captioning |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07689v2 |
https://arxiv.org/pdf/1906.07689v2.pdf | |
PWC | https://paperswithcode.com/paper/expressing-visual-relationships-via-language |
Repo | |
Framework | |
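As a loose illustration of relational attention, the sketch below scores every pairwise feature difference between the two images against the decoder query and aggregates them into a relational context vector. The shapes and the difference-based scoring are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relational_attention(query, feats_a, feats_b):
    """query: (d,); feats_a: (n, d); feats_b: (m, d). Returns (d,) context."""
    diff = feats_b[None, :, :] - feats_a[:, None, :]      # (n, m, d) pairwise diffs
    flat = diff.reshape(-1, diff.shape[-1])               # (n*m, d)
    weights = softmax(flat @ query)                       # score each pair
    return weights @ flat                                 # relational context

rng = np.random.default_rng(0)
ctx = relational_attention(rng.normal(size=8),
                           rng.normal(size=(5, 8)), rng.normal(size=(6, 8)))
print(ctx.shape)  # (8,)
```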
Gaussian Mixture Clustering Using Relative Tests of Fit
Title | Gaussian Mixture Clustering Using Relative Tests of Fit |
Authors | Purvasha Chakravarti, Sivaraman Balakrishnan, Larry Wasserman |
Abstract | We consider clustering based on significance tests for Gaussian Mixture Models (GMMs). Our starting point is the SigClust method developed by Liu et al. (2008), which introduces a test based on the k-means objective (with k = 2) to decide whether the data should be split into two clusters. When applied recursively, this test yields a method for hierarchical clustering that is equipped with a significance guarantee. We study the limiting distribution and power of this approach in some examples and show that there are large regions of the parameter space where the power is low. We then introduce a new test based on the idea of relative fit. Unlike prior work, we test for whether a mixture of Gaussians provides a better fit relative to a single Gaussian, without assuming that either model is correct. The proposed test has a simple critical value and provides provable error control. One version of our test provides exact, finite sample control of the type I error. We show how our tests can be used for hierarchical clustering as well as in a sequential manner for model selection. We conclude with an extensive simulation study and a cluster analysis of a gene expression dataset. |
Tasks | Model Selection |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02566v1 |
https://arxiv.org/pdf/1910.02566v1.pdf | |
PWC | https://paperswithcode.com/paper/gaussian-mixture-clustering-using-relative |
Repo | |
Framework | |
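A hedged sketch of a relative-fit split test in the spirit of the abstract: fit a two-component GMM and a single Gaussian on one half of the data, compare their per-point held-out log-likelihoods with a paired z-test, and split the cluster only if the mixture fits significantly better. This mirrors the idea, not the authors' exact statistic:

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture

def relative_fit_test(X, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, test = X[idx[: len(X) // 2]], X[idx[len(X) // 2:]]
    g1 = GaussianMixture(n_components=1, random_state=seed).fit(train)
    g2 = GaussianMixture(n_components=2, random_state=seed).fit(train)
    diff = g2.score_samples(test) - g1.score_samples(test)  # per-point LL gain
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
    return stats.norm.sf(z) < alpha     # one-sided: mixture fits better

X = np.vstack([np.random.default_rng(1).normal(-3, 1, (200, 2)),
               np.random.default_rng(2).normal(+3, 1, (200, 2))])
print(relative_fit_test(X))  # True: a split into two clusters is favoured
```

Applied recursively to each accepted split, this yields the hierarchical clustering procedure described in the abstract.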
Ladder Loss for Coherent Visual-Semantic Embedding
Title | Ladder Loss for Coherent Visual-Semantic Embedding |
Authors | Mo Zhou, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, Gang Hua |
Abstract | For visual-semantic embedding, existing methods normally treat the relevance between queries and candidates in a binary way (relevant or irrelevant), and all “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their varying proximity to the query. This practice disregards fine-grained relevance information and can lead to suboptimal ranking in the retrieval results and a poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce a continuous variable to model the degree of relevance between queries and multiple candidates, and propose to learn a coherent embedding space in which candidates with higher relevance degrees are mapped closer to the query than those with lower relevance degrees. In particular, the new ladder loss is proposed by extending the triplet-loss inequality to a more general inequality chain, which implements variable push-away margins according to the respective relevance degrees. In addition, a Coherent Score metric is proposed to better measure ranking results that include those “irrelevant” candidates. Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvements over existing state-of-the-art methods. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07528v1 |
https://arxiv.org/pdf/1911.07528v1.pdf | |
PWC | https://paperswithcode.com/paper/ladder-loss-for-coherent-visual-semantic |
Repo | |
Framework | |
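The inequality chain can be sketched directly: candidates are grouped into relevance levels, and each less-relevant level must sit farther from the query than all more-relevant ones by a level-specific margin, hinged per violating pair. The similarities and margins below are made up for illustration:

```python
import numpy as np

def ladder_loss(sim, levels, margins):
    """sim: (n,) query-candidate similarities; levels: (n,) relevance rank,
    0 = most relevant; margins[l-1] = required gap below level l."""
    loss = 0.0
    for l in range(1, int(levels.max()) + 1):
        hi = sim[levels < l]          # everything more relevant than level l
        lo = sim[levels == l]
        for s_lo in lo:               # hinge on every violating pair
            loss += np.maximum(0.0, margins[l - 1] - (hi - s_lo)).sum()
    return loss

sim = np.array([0.9, 0.85, 0.4, 0.35])
levels = np.array([0, 1, 1, 2])
print(ladder_loss(sim, levels, margins=[0.1, 0.3]))  # 0.3: two small violations
```

Setting a single level with one margin recovers the ordinary triplet loss, which is exactly the special case the abstract generalizes.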
Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings
Title | Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings |
Authors | Denis Newman-Griffis, Eric Fosler-Lussier |
Abstract | Natural language processing techniques are being applied to increasingly diverse types of electronic health records, and can benefit from in-depth understanding of the distinguishing characteristics of medical document types. We present a method for characterizing the usage patterns of clinical concepts among different document types, in order to capture semantic differences beyond the lexical level. By training concept embeddings on clinical documents of different types and measuring the differences in their nearest neighborhood structures, we are able to measure divergences in concept usage while correcting for noise in embedding learning. Experiments on the MIMIC-III corpus demonstrate that our approach captures clinically-relevant differences in concept usage and provides an intuitive way to explore semantic characteristics of clinical document collections. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00192v1 |
https://arxiv.org/pdf/1910.00192v1.pdf | |
PWC | https://paperswithcode.com/paper/writing-habits-and-telltale-neighbors |
Repo | |
Framework | |
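One simple instantiation of the neighborhood comparison: for each concept, take its k nearest neighbours in embeddings trained on each document type and measure the Jaccard overlap of the two neighbour sets; low overlap signals a usage shift. Cosine k-NN plus Jaccard is an assumption here, not necessarily the paper's exact divergence measure:

```python
import numpy as np

def knn(vectors, word, k=3):
    """k nearest neighbours of `word` by cosine similarity."""
    v = vectors[word]
    sims = {w: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
            for w, u in vectors.items() if w != word}
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def neighborhood_overlap(emb_a, emb_b, word, k=3):
    na, nb = knn(emb_a, word, k), knn(emb_b, word, k)
    return len(na & nb) / len(na | nb)   # Jaccard similarity of neighbour sets

rng = np.random.default_rng(0)
vocab = ["insulin", "glucose", "dose", "note", "discharge", "ecg"]
emb_notes = {w: rng.normal(size=16) for w in vocab}    # toy type-A embeddings
emb_reports = {w: rng.normal(size=16) for w in vocab}  # toy type-B embeddings
print(neighborhood_overlap(emb_notes, emb_reports, "insulin"))
```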
R-SQAIR: Relational Sequential Attend, Infer, Repeat
Title | R-SQAIR: Relational Sequential Attend, Infer, Repeat |
Authors | Aleksandar Stanić, Jürgen Schmidhuber |
Abstract | Traditional sequential multi-object attention models rely on a recurrent mechanism to infer object relations. We propose a relational extension (R-SQAIR) of one such attention model (SQAIR) by endowing it with a module with strong relational inductive bias that computes in parallel pairwise interactions between inferred objects. Two recently proposed relational modules are studied on tasks of unsupervised learning from videos. We demonstrate gains over sequential relational mechanisms, also in terms of combinatorial generalization. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05231v1 |
https://arxiv.org/pdf/1910.05231v1.pdf | |
PWC | https://paperswithcode.com/paper/r-sqair-relational-sequential-attend-infer |
Repo | |
Framework | |
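A toy version of the parallel pairwise-interaction module (in the spirit of a relation network) that R-SQAIR adds on top of SQAIR's inferred object states: every ordered pair of object vectors goes through a shared function and the outputs are aggregated permutation-invariantly. A numpy stand-in with random weights, not the trained module:

```python
import numpy as np

def relation_module(objects, W1, W2):
    """objects: (n, d). Returns the aggregated pairwise interaction vector."""
    n, d = objects.shape
    pairs = np.concatenate(
        [np.repeat(objects, n, axis=0), np.tile(objects, (n, 1))], axis=1)  # (n*n, 2d)
    hidden = np.maximum(0.0, pairs @ W1)  # shared MLP applied to all pairs in parallel
    return (hidden @ W2).sum(axis=0)      # permutation-invariant aggregation

rng = np.random.default_rng(0)
objs = rng.normal(size=(4, 8))            # four inferred object states
out = relation_module(objs, rng.normal(size=(16, 32)), rng.normal(size=(32, 8)))
print(out.shape)  # (8,)
```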
Model Order Selection Based on Information Theoretic Criteria: Design of the Penalty
Title | Model Order Selection Based on Information Theoretic Criteria: Design of the Penalty |
Authors | Andrea Mariani, Andrea Giorgetti, Marco Chiani |
Abstract | Information theoretic criteria (ITC) have been widely adopted in engineering and statistics for selecting, among an ordered set of candidate models, the one that best fits the observed sample data. The selected model minimizes a penalized likelihood metric, where the penalty is determined by the criterion adopted. While rules for choosing a penalty that guarantees a consistent estimate of the model order are known, theoretical tools for its design with finite samples have never been provided in a general setting. In this paper, we study model order selection for finite samples from a design perspective, focusing on the generalized information criterion (GIC), which embraces the most common ITC. The theory is general, and as case studies we consider: a) the problem of estimating the number of signals embedded in additive white Gaussian noise (AWGN) using multiple sensors; b) model selection for the general linear model (GLM), which includes, for example, the problem of estimating the number of sinusoids in AWGN. The analysis reveals a trade-off between the probabilities of overestimating and underestimating the order of the model. We then propose to design the GIC penalty to minimize underestimation while keeping the overestimation probability below a specified level. For the considered problems, this method leads to an analytical derivation of the optimal penalty for a given sample size. A performance comparison between the penalty-optimized GIC and the common AIC and BIC is provided, demonstrating the effectiveness of the proposed design strategy. |
Tasks | Model Selection |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.03980v1 |
https://arxiv.org/pdf/1910.03980v1.pdf | |
PWC | https://paperswithcode.com/paper/model-order-selection-based-on-information |
Repo | |
Framework | |
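To fix notation: GIC selects the order k minimizing -2 log-likelihood + nu * k, where the penalty coefficient nu is the design parameter the paper optimizes (nu = 2 recovers AIC, nu = log n recovers BIC). A toy polynomial-order example under a Gaussian noise assumption:

```python
import numpy as np

def gic_order(x, y, max_order, nu):
    """Select the number of polynomial coefficients k minimizing the GIC."""
    n = len(y)
    best_k, best_gic = None, np.inf
    for k in range(1, max_order + 1):
        X = np.vander(x, k)                       # design matrix with k parameters
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        sigma2 = (resid ** 2).mean()
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # Gaussian ML
        gic = -2 * loglik + nu * k
        if gic < best_gic:
            best_k, best_gic = k, gic
    return best_k

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = 1 + 2 * x + 0.5 * x ** 2 + rng.normal(scale=0.1, size=200)  # true k = 3
print(gic_order(x, y, max_order=8, nu=2),            # AIC-style penalty
      gic_order(x, y, max_order=8, nu=np.log(200)))  # BIC-style penalty
```

The paper's contribution is choosing nu in a principled way from the over/underestimation trade-off rather than taking the AIC or BIC defaults.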
Adaptive Sampling for Stochastic Risk-Averse Learning
Title | Adaptive Sampling for Stochastic Risk-Averse Learning |
Authors | Sebastian Curi, Kfir. Y. Levy, Stefanie Jegelka, Andreas Krause |
Abstract | In high-stakes machine learning applications, it is crucial to not only perform well on average, but also when restricted to difficult examples. To address this, we consider the problem of training models in a risk-averse manner. We propose an adaptive sampling algorithm for stochastically optimizing the Conditional Value-at-Risk (CVaR) of a loss distribution. We use a distributionally robust formulation of the CVaR to phrase the problem as a zero-sum game between two players, and solve it efficiently using regret minimization. Our approach relies on sampling from structured Determinantal Point Processes (DPPs), which allows scaling it to large data sets. Finally, we empirically demonstrate its effectiveness on large-scale convex and non-convex learning tasks. |
Tasks | Point Processes |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12511v2 |
https://arxiv.org/pdf/1910.12511v2.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-sampling-for-stochastic-risk-averse |
Repo | |
Framework | |
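A hedged sketch of the zero-sum game: the adversary keeps a distribution over training points capped at 1/(alpha*n) (the CVaR uncertainty set) and re-weights toward high-loss points multiplicatively, while the learner takes SGD steps on points sampled from that distribution. The projection below is only approximate, and the paper's DPP-based sampling and regret analysis are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, alpha = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.5, size=n)

w = np.zeros(d)
q = np.full(n, 1.0 / n)                  # adversary's distribution over points
for t in range(2000):
    i = rng.choice(n, p=q)               # sample a point from the adversary
    grad = 2 * (X[i] @ w - y[i]) * X[i]  # squared-loss gradient
    w -= 0.01 * grad                     # learner: SGD step
    losses = (X @ w - y) ** 2
    q *= np.exp(0.001 * losses)          # adversary: multiplicative update
    q = np.minimum(q / q.sum(), 1.0 / (alpha * n))
    q /= q.sum()                         # (approximate) projection onto CVaR set

tail = np.sort((X @ w - y) ** 2)[-int(alpha * n):]
print("CVaR_0.1 of loss:", tail.mean())  # mean loss over the worst 10% of points
```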
Soft computing methods for multiobjective location of garbage accumulation points in smart cities
Title | Soft computing methods for multiobjective location of garbage accumulation points in smart cities |
Authors | Jamal Toutouh, Diego Rossit, Sergio Nesmachnow |
Abstract | This article describes the application of soft computing methods to the problem of locating garbage accumulation points in urban scenarios. This is a relevant problem in modern smart cities: it reduces negative environmental and social impacts in the waste management process and optimizes the budget available to the city administration for installing waste bins. A specific problem model is presented, which accounts for reducing investment costs, increasing the number of citizens served by the installed bins, and improving the accessibility of the system. A family of single- and multi-objective heuristics based on the PageRank method and two multiobjective evolutionary algorithms are proposed. Experimental evaluation performed on real scenarios in the cities of Montevideo (Uruguay) and Bahia Blanca (Argentina) demonstrates the effectiveness of the proposed approaches. The methods allow computing plans with different trade-offs between the problem objectives. The computed results improve over the current planning in Montevideo and provide a reasonable budget cost and quality of service for Bahia Blanca. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10689v1 |
https://arxiv.org/pdf/1906.10689v1.pdf | |
PWC | https://paperswithcode.com/paper/soft-computing-methods-for-multiobjective |
Repo | |
Framework | |
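A toy rendering of the PageRank-based heuristic: candidate bin sites form a graph linked by shared demand, sites are ranked by PageRank, and the best-ranked sites are chosen greedily within budget. The graph, costs, and budget are invented for illustration; the paper's model also scores accessibility explicitly:

```python
import numpy as np

def pagerank(A, damping=0.85, iters=100):
    n = A.shape[0]
    col_sums = A.sum(axis=0, keepdims=True)
    M = A / np.where(col_sums == 0, 1, col_sums)  # column-normalized transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (M @ r)
    return r

# Adjacency: sites i, j are linked if they can serve overlapping street segments.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
cost = np.array([3.0, 2.0, 4.0, 2.5, 1.0])   # installation cost per site
budget = 6.0

rank = pagerank(A)
chosen, spent = [], 0.0
for site in np.argsort(-rank):               # best-ranked sites first
    if spent + cost[site] <= budget:
        chosen.append(int(site))
        spent += cost[site]
print(chosen, spent)
```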
Neural Machine Translation with Byte-Level Subwords
Title | Neural Machine Translation with Byte-Level Subwords |
Authors | Changhan Wang, Kyunghyun Cho, Jiatao Gu |
Abstract | Almost all existing machine translation models are built on top of character-based vocabularies: characters, subwords, or words. Rare characters from noisy text or character-rich languages such as Japanese and Chinese, however, can unnecessarily take up vocabulary slots and limit the vocabulary's compactness. Representing text at the byte level and using the 256-byte set as the vocabulary is a potential solution to this issue, but high computational cost has so far prevented it from being widely deployed or used in practice. In this paper, we investigate byte-level subwords, specifically byte-level BPE (BBPE), which is more compact than a character vocabulary and has no out-of-vocabulary tokens, while being more efficient than using pure bytes alone. We claim that contextualizing BBPE embeddings is necessary, which can be implemented by a convolutional or recurrent layer. Our experiments show that BBPE achieves performance comparable to BPE while its size is only 1/8 of that of BPE. In the multilingual setting, BBPE maximizes vocabulary sharing across many languages and achieves better translation quality. Moreover, we show that BBPE enables transferring models between languages with non-overlapping character sets. |
Tasks | Machine Translation |
Published | 2019-09-07 |
URL | https://arxiv.org/abs/1909.03341v2 |
https://arxiv.org/pdf/1909.03341v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-with-byte-level |
Repo | |
Framework | |
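A minimal byte-level BPE sketch: text is mapped to its UTF-8 bytes (a fixed 256-symbol base vocabulary, hence no out-of-vocabulary tokens), then the most frequent adjacent pair is merged repeatedly. This is the generic BPE procedure applied to bytes, not the paper's trained BBPE vocabulary or its contextualization layers:

```python
from collections import Counter

def merge(seq, pair, new_sym):
    """Replace every occurrence of `pair` in `seq` with `new_sym`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bbpe(corpus, num_merges):
    seqs = [list(s.encode("utf-8")) for s in corpus]   # start from raw bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(p for seq in seqs for p in zip(seq, seq[1:]))
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        seqs = [merge(seq, best, best) for seq in seqs]  # merged unit kept as a tuple
    return merges

print(learn_bbpe(["low lower lowest", "日本語 text"], num_merges=5))
```

Because multi-byte characters decompose into shared byte sequences, the same merges can apply across languages even when their character sets do not overlap, which is the transfer property the abstract highlights.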
Certainty Equivalence is Efficient for Linear Quadratic Control
Title | Certainty Equivalence is Efficient for Linear Quadratic Control |
Authors | Horia Mania, Stephen Tu, Benjamin Recht |
Abstract | We study the performance of the certainty equivalent controller on Linear Quadratic (LQ) control problems with unknown transition dynamics. We show that for both the fully and partially observed settings, the sub-optimality gap between the cost incurred by playing the certainty equivalent controller on the true system and the cost incurred by using the optimal LQ controller enjoys a fast statistical rate, scaling as the square of the parameter error. To the best of our knowledge, our result is the first sub-optimality guarantee in the partially observed Linear Quadratic Gaussian (LQG) setting. Furthermore, in the fully observed Linear Quadratic Regulator (LQR), our result improves upon recent work by Dean et al. (2017), who present an algorithm achieving a sub-optimality gap linear in the parameter error. A key part of our analysis relies on perturbation bounds for discrete Riccati equations. We provide two new perturbation bounds, one that expands on an existing result from Konstantinov et al. (1993), and another based on a new elementary proof strategy. |
Tasks | |
Published | 2019-02-21 |
URL | https://arxiv.org/abs/1902.07826v2 |
https://arxiv.org/pdf/1902.07826v2.pdf | |
PWC | https://paperswithcode.com/paper/certainty-equivalent-control-of-lqr-is |
Repo | |
Framework | |
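Certainty equivalence itself is easy to sketch: estimate (A, B) by least squares from observed transitions, then solve the discrete Riccati equation for the estimated system and play the resulting LQR gain. The fixed-point Riccati solver and the toy system below are illustrative choices, not the paper's analysis:

```python
import numpy as np

def dare(A, B, Q, R, iters=500):
    """Discrete algebraic Riccati equation via fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        BtPB = R + B.T @ P @ B
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(BtPB, B.T @ P @ A)
    return P

def lqr_gain(A, B, Q, R):
    P = dare(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator-like system
B_true = np.array([[0.0], [0.1]])

# Collect transitions under random excitation, then least-squares estimate [A B].
xs, us, xps = [], [], []
x = np.zeros(2)
for _ in range(500):
    u = rng.normal(size=1)
    xp = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    xs.append(x); us.append(u); xps.append(xp)
    x = xp
Z = np.hstack([np.array(xs), np.array(us)])                  # (T, 3)
theta = np.linalg.lstsq(Z, np.array(xps), rcond=None)[0].T   # (2, 3) = [A_hat B_hat]
A_hat, B_hat = theta[:, :2], theta[:, 2:]

K = lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(1))  # certainty-equivalent gain
print(np.round(K, 3))
```

The paper's result says the excess cost of playing this K on the true system scales with the *square* of the estimation error in (A_hat, B_hat), which is why certainty equivalence is statistically efficient.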