Paper Group ANR 344
Attention as a Perspective for Learning Tempo-invariant Audio Queries
Title | Attention as a Perspective for Learning Tempo-invariant Audio Queries |
Authors | Matthias Dorfer, Jan Hajič Jr., Gerhard Widmer |
Abstract | Current models for audio–sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant with respect to efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance, and exhibits intuitively appealing behavior. |
Tasks | |
Published | 2018-09-15 |
URL | http://arxiv.org/abs/1809.05689v1 |
http://arxiv.org/pdf/1809.05689v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-as-a-perspective-for-learning-tempo |
Repo | |
Framework | |
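For intuition, the attention idea above reduces to a small attention-pooling module: rather than averaging a fixed window of frame embeddings, the model learns to weight frames before pooling, so tempo changes shift the weights instead of the window. The PyTorch sketch below is a generic attention-pooling layer, not the authors' exact architecture; the embedding dimension and the linear scoring network are assumptions.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pool a variable-length sequence of frame embeddings into one query code."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one relevance score per frame

    def forward(self, frames):
        # frames: (batch, time, dim) -- e.g. CNN features of an audio excerpt
        weights = torch.softmax(self.score(frames), dim=1)   # (batch, time, 1)
        return (weights * frames).sum(dim=1)                 # (batch, dim)

pool = AttentionPooling(dim=32)
codes = pool(torch.randn(4, 100, 32))   # 4 excerpts, 100 frames each
print(codes.shape)                      # torch.Size([4, 32])
```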
Subsampling Sequential Monte Carlo for Static Bayesian Models
Title | Subsampling Sequential Monte Carlo for Static Bayesian Models |
Authors | David Gunawan, Khue-Dung Dang, Matias Quiroz, Robert Kohn, Minh-Ngoc Tran |
Abstract | We show how to speed up Sequential Monte Carlo (SMC) for Bayesian inference in large data problems by data subsampling. SMC sequentially updates a cloud of particles through a sequence of distributions, beginning with a distribution that is easy to sample from such as the prior and ending with the posterior distribution. Each update of the particle cloud consists of three steps: reweighting, resampling, and moving. In the move step, each particle is moved using a Markov kernel; this is typically the most computationally expensive part, particularly when the dataset is large. It is crucial to have an efficient move step to ensure particle diversity. Our article makes two important contributions. First, in order to speed up the SMC computation, we use an approximately unbiased and efficient annealed likelihood estimator based on data subsampling. The subsampling approach is more memory efficient than the corresponding full data SMC, which is an advantage for parallel computation. Second, we use a Metropolis within Gibbs kernel with two conditional updates. A Hamiltonian Monte Carlo update makes distant moves for the model parameters, and a block pseudo-marginal proposal is used for the particles corresponding to the auxiliary variables for the data subsampling. We demonstrate both the usefulness and limitations of the methodology for estimating four generalized linear models and a generalized additive model with large datasets. |
Tasks | Bayesian Inference |
Published | 2018-05-08 |
URL | https://arxiv.org/abs/1805.03317v3 |
https://arxiv.org/pdf/1805.03317v3.pdf | |
PWC | https://paperswithcode.com/paper/subsampling-sequential-monte-carlo-for-static |
Repo | |
Framework | |
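The reweight/resample/move recursion is compact in code. The sketch below anneals a particle cloud from the prior to the posterior of a toy Gaussian-mean model, estimating the annealed log-likelihood from a fixed data subsample; the random-walk Metropolis move stands in for the paper's Hamiltonian and block pseudo-marginal updates, so this illustrates the structure rather than the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=100_000)     # toy large dataset
n, m = len(data), 1_000                       # subsample size m << n

def subsampled_loglik(theta, u):
    # crude subsampling estimate of the full-data log-likelihood;
    # u indexes the current data subsample (the auxiliary variables)
    return n / m * np.sum(-0.5 * (data[u] - theta) ** 2)

P, temps = 200, np.linspace(0.0, 1.0, 21)     # particles, annealing schedule
theta = rng.normal(0.0, 10.0, size=P)         # particles drawn from the N(0, 10^2) prior
logw = np.zeros(P)
u = rng.integers(0, n, size=m)                # a block pseudo-marginal update of u
                                              # would go here; kept fixed for simplicity
for t0, t1 in zip(temps[:-1], temps[1:]):
    ll = np.array([subsampled_loglik(th, u) for th in theta])
    logw += (t1 - t0) * ll                                   # 1) reweight
    w = np.exp(logw - logw.max()); w /= w.sum()
    theta = theta[rng.choice(P, size=P, p=w)]                # 2) resample
    logw[:] = 0.0
    for _ in range(3):                                       # 3) move (MH kernel)
        prop = theta + 0.1 * rng.normal(size=P)
        log_acc = t1 * (np.array([subsampled_loglik(p, u) for p in prop])
                        - np.array([subsampled_loglik(th, u) for th in theta]))
        log_acc += (theta**2 - prop**2) / (2 * 10.0**2)      # prior ratio
        accept = np.log(rng.random(P)) < log_acc
        theta[accept] = prop[accept]

print("posterior mean estimate:", theta.mean())
```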
Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers
Title | Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers |
Authors | Yonatan Geifman, Guy Uziel, Ran El-Yaniv |
Abstract | We consider the problem of uncertainty estimation in the context of (non-Bayesian) deep neural classification. In this context, all known methods are based on extracting uncertainty signals from a trained network optimized to solve the classification problem at hand. We demonstrate that such techniques tend to introduce biased estimates for instances whose predictions are supposed to be highly confident. We argue that this deficiency is an artifact of the dynamics of training with SGD-like optimizers, and it has some properties similar to overfitting. Based on this observation, we develop an uncertainty estimation algorithm that selectively estimates the uncertainty of highly confident points, using earlier snapshots of the trained model, before their estimates are jittered (and way before they are ready for actual classification). We present extensive experiments indicating that the proposed algorithm provides uncertainty estimates that are consistently better than all known methods. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08206v4 |
http://arxiv.org/pdf/1805.08206v4.pdf | |
PWC | https://paperswithcode.com/paper/bias-reduced-uncertainty-estimation-for-deep |
Repo | |
Framework | |
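The core recipe — scoring highly confident points with earlier training snapshots rather than the final network — can be sketched independently of any particular architecture. Below, uncertainty is taken as the average negative max-softmax over a few saved checkpoints; the checkpoint paths, the confidence threshold, and the use of max-softmax as the signal are illustrative assumptions, not the paper's exact estimator.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def snapshot_uncertainty(model, checkpoint_paths, x, conf_threshold=0.99):
    """Score inputs with earlier training snapshots when the final model is
    highly confident (where final-model estimates tend to be biased).
    Note: this mutates `model` by loading snapshots; use a copy in practice."""
    probs = F.softmax(model(x), dim=1)
    conf = probs.max(dim=1).values
    scores = 1.0 - conf                         # default: final-model uncertainty

    snap = torch.zeros_like(conf)
    for path in checkpoint_paths:               # e.g. ["epoch_10.pt", "epoch_20.pt"]
        model.load_state_dict(torch.load(path))
        snap += 1.0 - F.softmax(model(x), dim=1).max(dim=1).values
    snap /= len(checkpoint_paths)

    confident = conf >= conf_threshold          # only confident points get snapshot scores
    scores[confident] = snap[confident]
    return scores
```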
MBS: Macroblock Scaling for CNN Model Reduction
Title | MBS: Macroblock Scaling for CNN Model Reduction |
Authors | Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang |
Abstract | In this paper we propose the macroblock scaling (MBS) algorithm, which can be applied to various CNN architectures to reduce their model size. MBS adaptively reduces each CNN macroblock depending on its information redundancy, measured by our proposed effective flops. Empirical studies conducted with ImageNet and CIFAR-10 attest that MBS can reduce the model size of some already compact CNN models, e.g., MobileNetV2 (25.03% further reduction) and ShuffleNet (20.74%), and even ultra-deep ones such as ResNet-101 (51.67%) and ResNet-1202 (72.71%), with negligible accuracy degradation. MBS also achieves greater reduction at a much lower cost than the state-of-the-art optimization-based methods do. MBS’s simplicity and efficiency, its flexibility to work with any CNN model, and its scalability to work with models of any depth make it an attractive choice for CNN model size reduction. |
Tasks | |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06569v2 |
http://arxiv.org/pdf/1809.06569v2.pdf | |
PWC | https://paperswithcode.com/paper/mbs-macroblock-scaling-for-cnn-model |
Repo | |
Framework | |
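MBS derives a per-macroblock scaling factor from its effective-flops redundancy measure; that measure is not reproduced here. The sketch below shows only the mechanical step — rebuilding a convolutional macroblock with scaled channel widths — with the factor `alpha` supplied externally as an assumption.

```python
import torch.nn as nn

def scale_block(channels, alpha):
    """Scale a macroblock's channel widths by alpha (0 < alpha <= 1).
    In MBS, alpha would be derived from the block's measured information
    redundancy (effective flops); here it is simply a given parameter."""
    return [max(1, int(round(c * alpha))) for c in channels]

def make_block(in_ch, widths):
    layers, prev = [], in_ch
    for w in widths:
        layers += [nn.Conv2d(prev, w, 3, padding=1), nn.ReLU()]
        prev = w
    return nn.Sequential(*layers)

original = [64, 64, 128]
reduced = scale_block(original, alpha=0.75)   # -> [48, 48, 96]
block = make_block(3, reduced)
```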
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Title | Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts |
Authors | Siyou Liu, Longyue Wang, Chao-Hong Liu |
Abstract | Although there are increasing and significant ties between China and Portuguese-speaking countries, there is little parallel corpus data for the Chinese-Portuguese language pair. Both languages have very large speaker populations, with 1.2 billion native Chinese speakers and 279 million native Portuguese speakers; the language pair, however, can be considered low-resource in terms of available parallel corpora. In this paper, we describe our methods for curating Chinese-Portuguese parallel corpora and evaluating their quality. We extracted bilingual data from Macao government websites and proposed a hierarchical strategy for building a large parallel corpus. Experiments are conducted on existing corpora and ours, using both Phrase-Based Machine Translation (PBMT) and state-of-the-art Neural Machine Translation (NMT) models. The results of this work can serve as a benchmark for future Chinese-Portuguese MT systems. The approach used in this paper is also a good example of how to boost the performance of MT systems for low-resource language pairs. |
Tasks | Machine Translation |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01768v1 |
http://arxiv.org/pdf/1804.01768v1.pdf | |
PWC | https://paperswithcode.com/paper/chinese-portuguese-machine-translation-a |
Repo | |
Framework | |
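Harvesting parallel text from comparable sources usually ends with a filtering pass over candidate sentence pairs. The sketch below applies a simple length-ratio filter of the kind commonly used when curating corpora like the one described above; the ratio bounds are assumptions, and the paper's hierarchical document/paragraph/sentence alignment strategy is not reproduced.

```python
def length_ratio_filter(pairs, low=0.4, high=2.5):
    """Keep candidate (zh, pt) sentence pairs whose length ratio is plausible.
    Chinese length is counted in characters, Portuguese in tokens, since
    Chinese text is not whitespace-segmented."""
    kept = []
    for zh, pt in pairs:
        zh_len, pt_len = len(zh.strip()), len(pt.split())
        if zh_len == 0 or pt_len == 0:
            continue
        if low <= pt_len / zh_len <= high:
            kept.append((zh, pt))
    return kept

candidates = [
    ("澳门是中华人民共和国的特别行政区。",
     "Macau é uma região administrativa especial da República Popular da China."),
    ("你好。", "Este par é claramente desalinhado e demasiado longo para a frase chinesa."),
]
print(len(length_ratio_filter(candidates)))   # 1 -- the misaligned pair is dropped
```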
Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG
Title | Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG |
Authors | Hangyu Mao, Zhengchao Zhang, Zhen Xiao, Zhibo Gong |
Abstract | Modelling and exploiting teammates’ policies in cooperative multi-agent systems has long been an interest and also a big challenge for the reinforcement learning (RL) community. The interest lies in the fact that if an agent knows its teammates’ policies, it can adjust its own policy accordingly to achieve proper cooperation; the challenge is that the agents’ policies change continuously because they are learning concurrently, which makes it difficult to model the dynamic policies of teammates accurately. In this paper, we present \emph{ATTention Multi-Agent Deep Deterministic Policy Gradient} (ATT-MADDPG) to address this challenge. ATT-MADDPG extends DDPG, a single-agent actor-critic RL method, with two special designs. First, in order to model the teammates’ policies, the agent needs access to the observations and actions of its teammates; ATT-MADDPG adopts a centralized critic to collect such information. Second, to model the teammates’ policies effectively using the collected information, ATT-MADDPG enhances the centralized critic with an attention mechanism. This attention mechanism introduces a special structure to explicitly model the dynamic joint policy of teammates, ensuring that the collected information can be processed efficiently. We evaluate ATT-MADDPG on both benchmark tasks and real-world packet routing tasks. Experimental results show that it not only outperforms state-of-the-art RL-based and rule-based methods by a large margin, but also achieves better scalability and robustness. |
Tasks | |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.07029v1 |
http://arxiv.org/pdf/1811.07029v1.pdf | |
PWC | https://paperswithcode.com/paper/modelling-the-dynamic-joint-policy-of |
Repo | |
Framework | |
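The attention-enhanced centralized critic can be sketched as a Q-network that attends over embeddings of the teammates' actions before scoring the joint state-action. The PyTorch module below is a minimal reading of that design; the embedding sizes, the use of the agent's own observation-action encoding as the attention query, and the single attention head are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """Centralized critic: Q(obs, a_self, a_teammates) with attention over teammates."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.query = nn.Linear(obs_dim + act_dim, hidden)   # own obs+action -> query
        self.key = nn.Linear(act_dim, hidden)               # teammate actions -> keys
        self.value = nn.Linear(act_dim, hidden)             # teammate actions -> values
        self.q_head = nn.Sequential(
            nn.Linear(obs_dim + act_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, act, teammate_acts):
        # obs: (B, obs_dim); act: (B, act_dim); teammate_acts: (B, N, act_dim)
        q = self.query(torch.cat([obs, act], dim=-1)).unsqueeze(1)     # (B, 1, H)
        k, v = self.key(teammate_acts), self.value(teammate_acts)      # (B, N, H)
        attn = torch.softmax((q * k).sum(-1, keepdim=True) / k.shape[-1] ** 0.5, dim=1)
        context = (attn * v).sum(dim=1)                                # (B, H)
        return self.q_head(torch.cat([obs, act, context], dim=-1))     # (B, 1)

critic = AttentionCritic(obs_dim=8, act_dim=2)
print(critic(torch.randn(5, 8), torch.randn(5, 2), torch.randn(5, 3, 2)).shape)
```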
Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives
Title | Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives |
Authors | Gil Jader, Luciano Oliveira, Matheus Pithon |
Abstract | This review presents an in-depth study of the literature on segmentation methods applied in dental imaging. Ten segmentation methods were studied and categorized according to the type of segmentation method (region-based, threshold-based, cluster-based, boundary-based or watershed-based), the type of X-ray images used (intra-oral or extra-oral), and the characteristics of the dataset used to evaluate the methods in the state-of-the-art works. We found that the literature has primarily focused on threshold-based segmentation methods (54%). 80% of the reviewed papers used intra-oral X-ray images in their experiments, demonstrating a preference for performing segmentation on images of already isolated parts of the teeth, rather than using extra-oral X-rays, which show the tooth structure of the mouth and the bones of the face. To fill a scientific gap in the field, a novel data set based on extra-oral X-ray images is proposed here. A statistical comparison of the results obtained with the 10 image segmentation methods over our proposed data set of 1,500 images is also carried out, providing a more comprehensive source of performance assessment. A discussion of the limitations of the methods conceived over the past years, as well as future perspectives on exploiting learning-based segmentation methods to improve performance, is also provided. |
Tasks | Semantic Segmentation |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03086v1 |
http://arxiv.org/pdf/1802.03086v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-segmenting-teeth-in-x-ray-images |
Repo | |
Framework | |
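Since the review finds threshold-based methods dominant (54%), a minimal baseline helps fix ideas. The sketch below applies Otsu thresholding to a grayscale radiograph with scikit-image; the file name is a placeholder, and real dental X-rays would need preprocessing (e.g., contrast normalization) that is omitted here.

```python
import numpy as np
from skimage import io, filters, morphology

def threshold_segment(path):
    """Baseline threshold-based tooth segmentation on a grayscale X-ray."""
    image = io.imread(path, as_gray=True)
    t = filters.threshold_otsu(image)            # global Otsu threshold
    mask = image > t                             # teeth are brighter than background
    mask = morphology.remove_small_objects(mask, min_size=200)  # drop speckles
    return mask

mask = threshold_segment("xray.png")             # placeholder file name
print("foreground fraction:", np.mean(mask))
```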
Distributionally Robust Submodular Maximization
Title | Distributionally Robust Submodular Maximization |
Authors | Matthew Staib, Bryan Wilder, Stefanie Jegelka |
Abstract | Submodular functions have applications throughout machine learning, but in many settings, we do not have direct access to the underlying function $f$. We focus on stochastic functions that are given as an expectation of functions over a distribution $P$. In practice, we often have only a limited set of samples $f_i$ from $P$. The standard approach indirectly optimizes $f$ by maximizing the sum of $f_i$. However, this ignores generalization to the true (unknown) distribution. In this paper, we achieve better performance on the actual underlying function $f$ by directly optimizing a combination of bias and variance. Algorithmically, we accomplish this by showing how to carry out distributionally robust optimization (DRO) for submodular functions, providing efficient algorithms backed by theoretical guarantees which leverage several novel contributions to the general theory of DRO. We also show compelling empirical evidence that DRO improves generalization to the unknown stochastic submodular function. |
Tasks | |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05249v2 |
http://arxiv.org/pdf/1802.05249v2.pdf | |
PWC | https://paperswithcode.com/paper/distributionally-robust-submodular |
Repo | |
Framework | |
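One simple instance of distributional robustness for stochastic submodular maximization is to run the greedy algorithm against the average of the worst-off sampled functions (a CVaR-style objective) rather than the plain sample mean. The sketch below does this for coverage functions; it is a heuristic illustration of the robust-objective idea, not the paper's algorithm or its guarantees.

```python
import numpy as np

def robust_greedy(sampled_sets, ground, k, worst_m=3):
    """Greedily maximize the mean of the worst_m sample coverage values f_i(S)."""
    def robust_value(S):
        vals = sorted(len(S & cover) for cover in sampled_sets)
        return np.mean(vals[:worst_m])           # average over the worst_m samples

    S = set()
    for _ in range(k):
        best = max(ground - S, key=lambda e: robust_value(S | {e}))
        S.add(best)
    return S

# each sample f_i is a coverage function over a random subset of the ground set
rng = np.random.default_rng(1)
ground = set(range(20))
samples = [set(rng.choice(20, size=8, replace=False)) for _ in range(10)]
print(robust_greedy(samples, ground, k=5))
```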
Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes
Title | Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes |
Authors | Simon Barthelmé, Pierre-Olivier Amblard, Nicolas Tremblay |
Abstract | Determinantal Point Processes (DPPs) are popular models for point processes with repulsion. They appear in numerous contexts, from physics to graph theory, and display appealing theoretical properties. On the more practical side of things, since DPPs tend to select sets of points that are some distance apart (repulsion), they have been advocated as a way of producing random subsets with high diversity. DPPs come in two variants: fixed-size and varying-size. A sample from a varying-size DPP is a subset of random cardinality, while in fixed-size “$k$-DPPs” the cardinality is fixed. The latter makes more sense in many applications, but unfortunately their computational properties are less attractive, since, among other things, inclusion probabilities are harder to compute. In this work we show that as the size of the ground set grows, $k$-DPPs and DPPs become equivalent, meaning that their inclusion probabilities converge. As a by-product, we obtain saddlepoint formulas for inclusion probabilities in $k$-DPPs. These turn out to be extremely accurate, and suffer less from numerical difficulties than exact methods do. Our results also suggest that $k$-DPPs and DPPs have equivalent maximum likelihood estimators. Finally, we obtain results on asymptotic approximations of elementary symmetric polynomials which may be of independent interest. |
Tasks | Point Processes |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01576v2 |
http://arxiv.org/pdf/1803.01576v2.pdf | |
PWC | https://paperswithcode.com/paper/asymptotic-equivalence-of-fixed-size-and |
Repo | |
Framework | |
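The convergence of inclusion probabilities can be checked numerically. For a DPP with $L$-ensemble $\mathbf{L}$, the marginal of item $i$ is $K_{ii}$ with $K = \mathbf{L}(\mathbf{L}+I)^{-1}$; for a $k$-DPP it can be written with elementary symmetric polynomials $e_k$ of the eigenvalues. The sketch below computes both exactly (no saddlepoint approximation) — a textbook computation useful only for watching the two sets of marginals approach each other as $N$ grows.

```python
import numpy as np

def esp(lams, k):
    """Elementary symmetric polynomials e_0 .. e_k of the values in lams."""
    E = np.zeros(k + 1); E[0] = 1.0
    for lam in lams:
        for j in range(k, 0, -1):
            E[j] += lam * E[j - 1]
    return E

rng = np.random.default_rng(0)
N = 100
A = rng.normal(size=(N, N)) / np.sqrt(N)
lam, V = np.linalg.eigh(A @ A.T)              # a random L-ensemble

# DPP marginals: diagonal of K = L(L + I)^{-1}
dpp = np.sum(V**2 * (lam / (1.0 + lam)), axis=1)

# pick k equal to the DPP's expected size, so the two models are comparable
k = int(round(np.sum(lam / (1.0 + lam))))

# k-DPP marginals: sum_n v_n(i)^2 * lam_n * e_{k-1}(lam_{-n}) / e_k(lam)
ek = esp(lam, k)[k]
w = np.array([lam[n] * esp(np.delete(lam, n), k - 1)[k - 1] / ek for n in range(N)])
kdpp = np.sum(V**2 * w, axis=1)

# the gap shrinks as N grows -- the equivalence shown in the paper
print(k, np.abs(dpp - kdpp).max())
```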
A Comparative Analysis of Content-based Geolocation in Blogs and Tweets
Title | A Comparative Analysis of Content-based Geolocation in Blogs and Tweets |
Authors | Konstantinos Pappas, Mahmoud Azab, Rada Mihalcea |
Abstract | The geolocation of online information is an essential component in any geospatial application. While most of the previous work on geolocation has focused on Twitter, in this paper we quantify and compare the performance of text-based geolocation methods on social media data drawn from both Blogger and Twitter. We introduce a novel set of location specific features that are both highly informative and easily interpretable, and show that we can achieve error rate reductions of up to 12.5% with respect to the best previously proposed geolocation features. We also show that despite posting longer text, Blogger users are significantly harder to geolocate than Twitter users. Additionally, we investigate the effect of training and testing on different media (cross-media predictions), or combining multiple social media sources (multi-media predictions). Finally, we explore the geolocability of social media in relation to three user dimensions: state, gender, and industry. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07497v1 |
http://arxiv.org/pdf/1811.07497v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-analysis-of-content-based |
Repo | |
Framework | |
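A text-based geolocation baseline of the kind compared above takes only a few lines with scikit-learn: bag-of-words features feeding a linear classifier over location labels. The toy posts and state labels below are fabricated placeholders; the paper's location-specific features are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins for (post text, US state) training pairs
posts = ["cheering for the lakers downtown tonight",
         "snow on the commute into boston again",
         "tacos on south congress are unbeatable",
         "another foggy morning by the golden gate"]
states = ["CA", "MA", "TX", "CA"]

geo = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
geo.fit(posts, states)
print(geo.predict(["stuck in traffic near the golden gate bridge"]))
```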
A Variable Neighborhood Search for Flying Sidekick Traveling Salesman Problem
Title | A Variable Neighborhood Search for Flying Sidekick Traveling Salesman Problem |
Authors | Julia C. Freitas, Puca Huachi V. Penna |
Abstract | The efficiency and dynamism of Unmanned Aerial Vehicles (UAVs), or drones, have created substantial application opportunities in several industries in recent years. Notably, logistics companies have paid close attention to these vehicles, envisioning reduced delivery times and operational costs. A variant of the Traveling Salesman Problem (TSP) called the Flying Sidekick Traveling Salesman Problem (FSTSP) was introduced to model drone-assisted parcel delivery. The drone is launched from the truck, proceeds to deliver a parcel to a customer, and is then recovered by the truck at a third location. While the drone performs its trip, the truck delivers parcels to other customers, as long as the drone has enough battery to hover while waiting for the truck. This work proposes a hybrid heuristic in which the initial solution is created from the optimal TSP solution obtained by a TSP solver. Next, an implementation of General Variable Neighborhood Search is used to obtain the delivery routes of the truck and the drone. Computational experiments show the potential of the algorithm to significantly improve delivery time. Furthermore, we provide a new set of instances based on well-known TSPLIB instances. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.03954v2 |
http://arxiv.org/pdf/1804.03954v2.pdf | |
PWC | https://paperswithcode.com/paper/a-variable-neighborhood-search-for-flying |
Repo | |
Framework | |
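The General VNS loop itself is compact: shake in neighborhood k, improve with local search, and reset k on improvement. The skeleton below applies it to a plain TSP tour for concreteness; the FSTSP-specific neighborhoods (drone launch/recovery reassignments) and the TSP-solver initial solution are not reproduced.

```python
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def shake(tour, k, rng):
    tour = tour[:]                            # perturb with k random swaps
    for _ in range(k):
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def two_opt(tour, dist):                      # local search: 2-opt descent
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour

def gvns(dist, k_max=3, iters=50, seed=0):
    rng = random.Random(seed)
    best = two_opt(list(range(len(dist))), dist)
    for _ in range(iters):
        k = 1
        while k <= k_max:                     # systematic neighborhood change
            cand = two_opt(shake(best, k, rng), dist)
            if tour_length(cand, dist) < tour_length(best, dist):
                best, k = cand, 1             # improvement: restart at k = 1
            else:
                k += 1
    return best

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(12)]
dist = [[((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5 for b in pts] for a in pts]
print(tour_length(gvns(dist), dist))
```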
Optimized Algorithms to Sample Determinantal Point Processes
Title | Optimized Algorithms to Sample Determinantal Point Processes |
Authors | Nicolas Tremblay, Simon Barthelme, Pierre-Olivier Amblard |
Abstract | In this technical report, we discuss several sampling algorithms for Determinantal Point Processes (DPP). DPPs have recently gained a broad interest in the machine learning and statistics literature as random point processes with negative correlation, i.e., ones that can generate a “diverse” sample from a set of items. They are parametrized by a matrix $\mathbf{L}$, called the $L$-ensemble, that encodes the correlations between items. The standard sampling algorithm is separated into three phases: 1) eigendecomposition of $\mathbf{L}$; 2) an eigenvector sampling phase, where $\mathbf{L}$'s eigenvectors are sampled independently via Bernoulli variables parametrized by their associated eigenvalues; 3) a Gram-Schmidt-type orthogonalisation procedure over the sampled eigenvectors. In a naive implementation, the computational cost of the third step is on average $\mathcal{O}(N\mu^3)$, where $\mu$ is the average number of samples of the DPP. We give an algorithm which runs in $\mathcal{O}(N\mu^2)$ and is extremely simple to implement. If memory is a constraint, we also describe a dual variant with reduced memory costs. In addition, we discuss implementation details often missing in the literature. |
Tasks | Point Processes |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08471v1 |
http://arxiv.org/pdf/1802.08471v1.pdf | |
PWC | https://paperswithcode.com/paper/optimized-algorithms-to-sample-determinantal |
Repo | |
Framework | |
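The three-phase procedure described in the abstract is short to write down. This is the standard spectral DPP sampler, with a QR re-orthogonalisation playing the role of the Gram-Schmidt step; it corresponds to the naive $\mathcal{O}(N\mu^3)$ baseline the report improves upon, not the optimized $\mathcal{O}(N\mu^2)$ variant.

```python
import numpy as np

def sample_dpp(L, seed=None):
    rng = np.random.default_rng(seed)
    lam, V = np.linalg.eigh(L)                        # phase 1: eigendecomposition
    keep = rng.random(len(lam)) < lam / (1.0 + lam)   # phase 2: Bernoulli(lam/(1+lam))
    V = V[:, keep]
    sample = []
    while V.shape[1] > 0:
        p = np.sum(V**2, axis=1) / V.shape[1]         # marginal of each remaining item
        i = rng.choice(len(p), p=p / p.sum())         # renormalise against round-off
        sample.append(i)
        j = np.argmax(np.abs(V[i, :]))                # a column with V[i, j] != 0
        V = V - np.outer(V[:, j] / V[i, j], V[i, :])  # zero out row i everywhere
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)                    # phase 3: Gram-Schmidt step
    return sorted(sample)

A = np.random.default_rng(0).normal(size=(50, 10))
print(sample_dpp(A @ A.T, seed=1))                    # a diverse subset of {0..49}
```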
A Neuronal Planar Modeling for Handwriting Signature based on Automatic Segmentation
Title | A Neuronal Planar Modeling for Handwriting Signature based on Automatic Segmentation |
Authors | Imen Abroug Ben Abdelghani, Najwa Essoukri Ben Amara |
Abstract | This paper deals with offline handwritten signature verification. We propose a planar neuronal model of the signature image. Planar models are generally based on delimiting homogeneous zones of images; we propose in this paper an automatic approach for segmenting signature images into bands. The signature image is modeled by a planar neuronal model with horizontal secondary models and a vertical principal model. The proposed method has been tested on two databases. The first is one we have collected; it includes 6,000 signatures corresponding to 60 writers. The second is the public GPDS-300 database, which includes 16,200 signatures corresponding to 300 persons. The achieved results are promising. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1804.00527v1 |
http://arxiv.org/pdf/1804.00527v1.pdf | |
PWC | https://paperswithcode.com/paper/a-neuronal-planar-modeling-for-handwriting |
Repo | |
Framework | |
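Splitting a signature image into horizontal bands can be driven by its ink projection profile. The sketch below cuts the image into a fixed number of bands carrying roughly equal amounts of ink mass; this is a generic band-segmentation heuristic for illustration, not the authors' segmentation scheme.

```python
import numpy as np

def ink_bands(image, n_bands=3):
    """Split a binary signature image (ink = 1) into n_bands horizontal bands,
    each holding roughly the same amount of ink."""
    profile = image.sum(axis=1).astype(float)        # ink mass per row
    cum = np.cumsum(profile) / profile.sum()
    cuts = [int(np.searchsorted(cum, q))
            for q in np.linspace(0, 1, n_bands + 1)[1:-1]]
    bounds = [0] + cuts + [image.shape[0]]
    return [image[bounds[b]:bounds[b + 1]] for b in range(n_bands)]

rng = np.random.default_rng(0)
fake_signature = (rng.random((60, 200)) < 0.1).astype(np.uint8)  # toy binary image
for band in ink_bands(fake_signature):
    print(band.shape)
```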
Weather Classification: A new multi-class dataset, data augmentation approach and comprehensive evaluations of Convolutional Neural Networks
Title | Weather Classification: A new multi-class dataset, data augmentation approach and comprehensive evaluations of Convolutional Neural Networks |
Authors | Jose Carlos Villarreal Guerra, Zeba Khanam, Shoaib Ehsan, Rustam Stolkin, Klaus McDonald-Maier |
Abstract | Weather conditions often disrupt the proper functioning of transportation systems. Present systems either deploy an array of sensors or use an in-vehicle camera to predict weather conditions. These solutions have resulted in increased cost and limited scope. To ensure the smooth operation of all transportation services in all weather conditions, a reliable detection system is necessary to classify weather in the wild. The challenges in solving this problem are that weather conditions are diverse in nature and that discriminative features are lacking among the various weather conditions. Existing works on this problem have been scene-specific and have targeted the classification of only two categories of weather. In this paper, we have created a new open-source dataset, called the RFS Dataset, consisting of images depicting three classes of weather, i.e., rain, snow and fog. A novel algorithm is also proposed which uses superpixel delimiting masks as a form of data augmentation, leading to reasonable results with respect to ten Convolutional Neural Network architectures. |
Tasks | Data Augmentation |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00588v1 |
http://arxiv.org/pdf/1808.00588v1.pdf | |
PWC | https://paperswithcode.com/paper/weather-classification-a-new-multi-class |
Repo | |
Framework | |
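Superpixel-based masking is straightforward with scikit-image: compute SLIC superpixels, then zero out a random subset of them to create augmented variants of an image. The segment count, drop fraction, and masking-to-zero choice below are assumptions; the paper's exact augmentation recipe may differ.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_mask(image, n_segments=100, drop_frac=0.2, seed=None):
    """Return a copy of image with a random fraction of SLIC superpixels zeroed."""
    rng = np.random.default_rng(seed)
    segments = slic(image, n_segments=n_segments, compactness=10.0)
    labels = np.unique(segments)
    dropped = rng.choice(labels, size=max(1, int(drop_frac * len(labels))),
                         replace=False)
    out = image.copy()
    out[np.isin(segments, dropped)] = 0          # mask the chosen superpixels
    return out

image = np.random.default_rng(0).random((64, 64, 3))   # toy RGB image
augmented = superpixel_mask(image, seed=1)
```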
Non-Parametric Transformation Networks
Title | Non-Parametric Transformation Networks |
Authors | Dipan K. Pal, Marios Savvides |
Abstract | ConvNets, through their architecture, only enforce invariance to translation. In this paper, we introduce a new class of deep convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn \textit{general} invariances and symmetries directly from data. NPTNs are a natural generalization of ConvNets and can be optimized directly using gradient descent. Unlike almost all previous works in deep architectures, they make no assumption regarding the structure of the invariances present in the data and in that aspect are flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks (TN), which yields a better understanding of the connection between the two. We demonstrate the efficacy of NPTNs on data such as MNIST with extreme transformations and CIFAR10 where they outperform baselines, and further outperform several recent algorithms on ETH-80. They do so while having the same number of parameters. We also show that they are more effective than ConvNets in modelling symmetries and invariances from data, without the explicit knowledge of the added arbitrary nuisance transformations. Finally, we replace ConvNets with NPTNs within Capsule Networks and show that this enables Capsule Nets to perform even better. |
Tasks | |
Published | 2018-01-14 |
URL | http://arxiv.org/abs/1801.04520v6 |
http://arxiv.org/pdf/1801.04520v6.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-transformation-networks |
Repo | |
Framework | |
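The NPTN layer can be read as: give every (input channel, output channel) pair $|G|$ independently learned filters, max-pool across the $|G|$ responses (the transformation pooling that buys invariance), then average over input channels. The PyTorch sketch below follows that reading; the channel-ordering convention is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class NPTN(nn.Module):
    """Sketch of a Non-Parametric Transformation Network layer."""
    def __init__(self, in_ch, out_ch, G, kernel_size=3, padding=1):
        super().__init__()
        self.in_ch, self.out_ch, self.G = in_ch, out_ch, G
        # every input channel gets out_ch * G independently learned filters
        self.conv = nn.Conv2d(in_ch, in_ch * out_ch * G, kernel_size,
                              padding=padding, groups=in_ch)

    def forward(self, x):
        b, _, h, w = x.shape
        y = self.conv(x)                                    # (b, in*out*G, h, w)
        y = y.view(b, self.in_ch, self.out_ch, self.G, h, w)
        y = y.max(dim=3).values                             # transformation max-pooling
        return y.mean(dim=1)                                # average over input channels

layer = NPTN(in_ch=3, out_ch=8, G=4)
print(layer(torch.randn(2, 3, 32, 32)).shape)               # torch.Size([2, 8, 32, 32])
```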