Paper Group ANR 1088
Convergence Rates of Gradient Descent and MM Algorithms for Generalized Bradley-Terry Models
Title | Convergence Rates of Gradient Descent and MM Algorithms for Generalized Bradley-Terry Models |
Authors | Milan Vojnovic, Seyoung Yun, Kaifang Zhou |
Abstract | We show tight convergence rate bounds for gradient descent and MM algorithms for maximum likelihood estimation and maximum a posteriori probability estimation of a popular Bayesian inference method for generalized Bradley-Terry models. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons with ties, the Luce choice model, and the Plackett-Luce ranking model. Our results show that MM algorithms have the same convergence rates as gradient descent algorithms, up to constant factors. For maximum likelihood estimation, the convergence is linear, with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in the observed comparison data. For Bayesian inference, the convergence rate is also linear, with the rate determined by a parameter of the prior distribution in a way that can make convergence arbitrarily slow for small values of this parameter. We propose a simple, first-order acceleration method that resolves the slow convergence issue. |
Tasks | Bayesian Inference |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00150v1 |
PDF | http://arxiv.org/pdf/1901.00150v1.pdf |
PWC | https://paperswithcode.com/paper/convergence-rates-of-gradient-descent-and-mm |
Repo | |
Framework | |
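As a concrete reference point for the iterations the abstract analyzes, here is a minimal NumPy sketch of the classical MM update and a plain gradient ascent step for Bradley-Terry maximum likelihood estimation. The win-count data layout, the step size, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bt_mm_step(w, wins):
    """One MM (minorize-maximize) update for Bradley-Terry MLE.
    wins[i, j] = number of times item i beat item j."""
    total_wins = wins.sum(axis=1)
    comp = wins + wins.T                       # comparisons per item pair
    denom = np.zeros_like(w)
    for i in range(len(w)):
        for j in range(len(w)):
            if i != j and comp[i, j] > 0:
                denom[i] += comp[i, j] / (w[i] + w[j])
    w_new = total_wins / denom
    return w_new / w_new.sum()                 # normalize for identifiability

def bt_gradient_step(theta, wins, lr=0.05):
    """One gradient ascent step on the same log-likelihood, in theta = log(w)."""
    w = np.exp(theta)
    comp = wins + wins.T
    grad = wins.sum(axis=1).astype(float)
    for i in range(len(w)):
        for j in range(len(w)):
            if i != j and comp[i, j] > 0:
                grad[i] -= comp[i, j] * w[i] / (w[i] + w[j])
    theta_new = theta + lr * grad
    return theta_new - theta_new.mean()        # fix the scale (sum of theta = 0)

# Tiny example: 4 items with random head-to-head win counts.
rng = np.random.default_rng(0)
wins = rng.integers(1, 5, size=(4, 4))
np.fill_diagonal(wins, 0)
w = np.full(4, 0.25)
for _ in range(100):
    w = bt_mm_step(w, wins)
print("MM estimate of item strengths:", np.round(w, 3))
```

The comparison graph behind `comp` is exactly the object whose algebraic connectivity governs the linear convergence rate discussed in the abstract.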
Geodesic Distance Estimation with Spherelets
Title | Geodesic Distance Estimation with Spherelets |
Authors | Didong Li, David B Dunson |
Abstract | Many statistical and machine learning approaches rely on pairwise distances between data points. The choice of distance metric has a fundamental impact on performance of these procedures, raising questions about how to appropriately calculate distances. When data points are real-valued vectors, by far the most common choice is the Euclidean distance. This article is focused on the problem of how to better calculate distances taking into account the intrinsic geometry of the data, assuming data are concentrated near an unknown subspace or manifold. The appropriate geometric distance corresponds to the length of the shortest path along the manifold, which is the geodesic distance. When the manifold is unknown, it is challenging to accurately approximate the geodesic distance. Current algorithms are either highly complex, and hence often impractical to implement, or based on simple local linear approximations and shortest path algorithms that may have inadequate accuracy. We propose a simple and general alternative, which uses pieces of spheres, or spherelets, to locally approximate the unknown subspace and thereby estimate the geodesic distance through paths over spheres. Theory is developed showing lower error for many manifolds. This conclusion is supported through multiple simulation examples and applications to real data sets. |
Tasks | |
Published | 2019-06-29 |
URL | https://arxiv.org/abs/1907.00296v1 |
PDF | https://arxiv.org/pdf/1907.00296v1.pdf |
PWC | https://paperswithcode.com/paper/geodesic-distance-estimation-with-spherelets |
Repo | |
Framework | |
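For orientation, the snippet below shows the simple local-linear baseline the abstract contrasts with: build a k-nearest-neighbor graph with Euclidean edge lengths and run shortest paths. A spherelet-based estimator would replace each straight-line local approximation with the arc length of a locally fitted sphere. The dataset, neighborhood size, and comparison here are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

# Points on the unit circle: a manifold whose geodesic (arc-length) distances we know.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]

# Baseline estimator: k-NN graph with Euclidean (straight-line) edge lengths,
# followed by shortest paths over the graph.
G = kneighbors_graph(X, n_neighbors=5, mode="distance")
D_est = shortest_path(G, method="D", directed=False)

# Compare the estimate against the true geodesic between two antipodal points;
# chords are slightly shorter than arcs, so the baseline underestimates.
i, j = 0, 100
d_angle = abs(theta[i] - theta[j])
true_geodesic = min(d_angle, 2 * np.pi - d_angle)
print(f"graph estimate: {D_est[i, j]:.4f}   true geodesic: {true_geodesic:.4f}")
```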
Analysis of high-dimensional Continuous Time Markov Chains using the Local Bouncy Particle Sampler
Title | Analysis of high-dimensional Continuous Time Markov Chains using the Local Bouncy Particle Sampler |
Authors | Tingting Zhao, Alexandre Bouchard-Côté |
Abstract | Sampling the parameters of high-dimensional Continuous Time Markov Chains (CTMC) is a challenging problem with important applications in many fields of applied statistics. In this work, a recently proposed type of non-reversible, rejection-free Markov Chain Monte Carlo (MCMC) sampler, the Bouncy Particle Sampler (BPS), is brought to bear on this problem. BPS has demonstrated favorable computational efficiency compared with state-of-the-art MCMC algorithms; however, to date, applications to real-data scenarios have been scarce. An important aspect of the practical implementation of BPS is the simulation of event times. Default implementations use conservative thinning bounds. Such bounds can slow down the algorithm and limit its computational performance. Our paper develops an algorithm with an exact analytical solution for the random event times in the context of CTMCs. Our local version of the BPS algorithm takes advantage of the sparse structure of the target factor graph, and we also provide a framework for assessing the computational complexity of local BPS algorithms. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13120v3 |
PDF | https://arxiv.org/pdf/1905.13120v3.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-high-dimensional-continuous-time |
Repo | |
Framework | |
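To illustrate the "exact event times instead of conservative thinning bounds" theme, here is a self-contained sketch of the (global) Bouncy Particle Sampler on a standard Gaussian target, where the integrated bounce rate can be inverted in closed form. It is not the paper's local BPS for CTMCs; the target, refresh rate, and event count are assumptions for the demo.

```python
import numpy as np

def bps_gaussian(dim=2, n_events=5000, refresh_rate=1.0, seed=0):
    """Bouncy Particle Sampler for a standard Gaussian target, U(x) = ||x||^2 / 2.
    Bounce times are simulated exactly by inverting the integrated rate
    lambda(t) = max(0, <v, x + t v>) in closed form, so no thinning is needed."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    v = rng.normal(size=dim)
    events = []
    for _ in range(n_events):
        a, b = v @ x, v @ v                    # bounce rate along the ray: max(0, a + b t)
        e = rng.exponential()
        if a >= 0:                             # invert a*t + b*t^2/2 = e
            t_bounce = (-a + np.sqrt(a * a + 2 * b * e)) / b
        else:                                  # rate is zero until t0 = -a/b
            t_bounce = -a / b + np.sqrt(2 * e / b)
        t_refresh = rng.exponential(1.0 / refresh_rate)
        t = min(t_bounce, t_refresh)
        x = x + t * v                          # deterministic straight-line move
        if t_refresh < t_bounce:
            v = rng.normal(size=dim)           # velocity refreshment
        else:
            grad = x                           # gradient of U at the event point
            v = v - 2 * (v @ grad) / (grad @ grad) * grad   # reflect off the gradient
        events.append(x.copy())
    return np.array(events)

# Note: unbiased expectations require time-weighted averages along trajectory
# segments; the event-point statistics below are only a rough sanity check.
chain = bps_gaussian()
print("event-point mean:", chain.mean(axis=0).round(2))
```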
Differentiable Algorithm Networks for Composable Robot Learning
Title | Differentiable Algorithm Networks for Composable Robot Learning |
Authors | Peter Karkus, Xiao Ma, David Hsu, Leslie Pack Kaelbling, Wee Sun Lee, Tomas Lozano-Perez |
Abstract | This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11602v1 |
PDF | https://arxiv.org/pdf/1905.11602v1.pdf |
PWC | https://paperswithcode.com/paper/differentiable-algorithm-networks-for |
Repo | |
Framework | |
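A minimal, hypothetical PyTorch sketch of the composability idea: two differentiable modules (a stand-in "filter" and "planner") are chained and trained end to end, so gradients from the final task loss adapt both modules to each other. The module designs, sizes, and synthetic data are invented for illustration and are not the paper's architecture.

```python
import torch
import torch.nn as nn

class DifferentiableFilter(nn.Module):
    """Toy stand-in for a model-based state estimator (a learned recursive filter)."""
    def __init__(self, obs_dim, state_dim):
        super().__init__()
        self.update = nn.GRUCell(obs_dim, state_dim)

    def forward(self, obs_seq):                      # obs_seq: (T, B, obs_dim)
        belief = torch.zeros(obs_seq.size(1), self.update.hidden_size)
        for obs in obs_seq:
            belief = self.update(obs, belief)        # recursive belief update
        return belief

class DifferentiablePlanner(nn.Module):
    """Toy stand-in for a planner that maps a belief to an action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                    nn.Linear(64, action_dim))

    def forward(self, belief):
        return self.policy(belief)

# Compose the modules and train them jointly, end to end, from (obs, action) data.
filt = DifferentiableFilter(obs_dim=8, state_dim=32)
plan = DifferentiablePlanner(state_dim=32, action_dim=4)
opt = torch.optim.Adam(list(filt.parameters()) + list(plan.parameters()), lr=1e-3)
obs_seq = torch.randn(10, 16, 8)                     # T=10 steps, batch of 16
expert_actions = torch.randint(0, 4, (16,))
for _ in range(5):
    logits = plan(filt(obs_seq))                     # gradients flow through both modules
    loss = nn.functional.cross_entropy(logits, expert_actions)
    opt.zero_grad(); loss.backward(); opt.step()
```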
Universal Transforming Geometric Network
Title | Universal Transforming Geometric Network |
Authors | Jin Li |
Abstract | The recurrent geometric network (RGN), the first end-to-end differentiable neural architecture for protein structure prediction, is a competitive alternative to existing models. However, the RGN’s use of recurrent neural networks (RNNs) as internal representations results in long training times and unstable gradients, and because of its sequential nature it is less effective at learning global dependencies among amino acids than existing transformer architectures. We propose the Universal Transforming Geometric Network (UTGN), an end-to-end differentiable model that uses the encoder portion of the Universal Transformer architecture as an alternative internal representation. Our experiments show that, compared to RGN, UTGN achieves a 1.7 Å improvement on the free modeling portion and a 0.7 Å improvement on the template-based modeling portion of the CASP12 competition. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00723v1 |
PDF | https://arxiv.org/pdf/1908.00723v1.pdf |
PWC | https://paperswithcode.com/paper/universal-transforming-geometric-network |
Repo | |
Framework | |
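The sketch below covers only the representation side of the idea: a Transformer encoder over residue embeddings that predicts backbone torsion angles per residue. It uses a plain PyTorch TransformerEncoder rather than the Universal Transformer (which shares weights across depth), and it omits the geometric unit that converts angles into 3-D coordinates; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class TorsionAngleEncoder(nn.Module):
    """Transformer encoder over residue embeddings that predicts per-residue
    backbone torsion angles (phi, psi, omega).  The geometric unit that turns
    angles into 3-D coordinates is intentionally omitted in this sketch."""
    def __init__(self, n_amino=20, d_model=128, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(n_amino, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.angle_head = nn.Linear(d_model, 3)            # (phi, psi, omega)

    def forward(self, residues):                           # residues: (B, L) integer codes
        h = self.encoder(self.embed(residues))
        return torch.pi * torch.tanh(self.angle_head(h))   # angles constrained to (-pi, pi)

model = TorsionAngleEncoder()
seq = torch.randint(0, 20, (2, 50))                        # two sequences of 50 residues
print(model(seq).shape)                                    # torch.Size([2, 50, 3])
```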
Relation Module for Non-answerable Prediction on Question Answering
Title | Relation Module for Non-answerable Prediction on Question Answering |
Authors | Kevin Huang, Yun Tang, Jing Huang, Xiaodong He, Bowen Zhou |
Abstract | Machine reading comprehension (MRC) has attracted significant research attention recently, due to an increase in challenging reading comprehension datasets. In this paper, we aim to improve an MRC model’s ability to determine whether a question has an answer in a given context (e.g., the recently proposed SQuAD 2.0 task). Our solution is a relation module that is adaptable to any MRC model. The relation module combines semantic extraction and relational information. We first extract high-level semantics as objects from both question and context with multi-head self-attentive pooling. These semantic objects are then passed to a relation network, which generates relationship scores for each object pair in a sentence. These scores are used to determine whether a question is non-answerable. We test the relation module on the SQuAD 2.0 dataset using both BiDAF and BERT models as baseline readers. We obtain a 1.8% gain in F1 on top of the BiDAF reader, and 1.0% on top of the BERT base model. These results show the effectiveness of our relation module on MRC. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10843v1 |
PDF | https://arxiv.org/pdf/1910.10843v1.pdf |
PWC | https://paperswithcode.com/paper/relation-module-for-non-answerable-prediction |
Repo | |
Framework | |
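A hypothetical PyTorch sketch of the two pieces the abstract describes: multi-head self-attentive pooling that extracts a few "semantic objects" from a reader's token representations, and a relation network that scores object pairs to produce a non-answerability logit. The dimensions and the way pair scores are aggregated are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Self-attentive pooling extracts K semantic objects from token states;
    a small relation network scores every object pair, and the pooled pair
    scores yield one non-answerability logit per example."""
    def __init__(self, hidden=256, n_objects=4):
        super().__init__()
        self.pool = nn.Linear(hidden, n_objects)           # one attention head per object
        self.relation = nn.Sequential(nn.Linear(2 * hidden, 128), nn.ReLU(),
                                      nn.Linear(128, 1))
        self.n_objects = n_objects

    def forward(self, token_states):                       # (B, T, hidden) from any reader
        attn = torch.softmax(self.pool(token_states), dim=1)          # (B, T, K)
        objects = torch.einsum("btk,bth->bkh", attn, token_states)    # (B, K, hidden)
        left = objects.unsqueeze(2).expand(-1, -1, self.n_objects, -1)
        right = objects.unsqueeze(1).expand(-1, self.n_objects, -1, -1)
        pair_scores = self.relation(torch.cat([left, right], dim=-1)) # (B, K, K, 1)
        return pair_scores.mean(dim=(1, 2, 3))             # non-answerability logit

reader_states = torch.randn(8, 64, 256)                    # e.g. BiDAF/BERT token outputs
print(RelationModule()(reader_states).shape)               # torch.Size([8])
```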
CNM: An Interpretable Complex-valued Network for Matching
Title | CNM: An Interpretable Complex-valued Network for Matching |
Authors | Qiuchi Li, Benyou Wang, Massimo Melucci |
Abstract | This paper seeks to model human language with the mathematical framework of quantum physics. With the well-designed mathematical formulations of quantum physics, this framework unifies different linguistic units in a single complex-valued vector space, e.g., words as particles in quantum states and sentences as mixed systems. A complex-valued network is built to implement this framework for semantic matching. With well-constrained complex-valued components, the network admits interpretations with explicit physical meanings. The proposed complex-valued network for matching (CNM) achieves performance comparable to strong CNN and RNN baselines on two benchmark question answering (QA) datasets. |
Tasks | Question Answering |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05298v1 |
PDF | http://arxiv.org/pdf/1904.05298v1.pdf |
PWC | https://paperswithcode.com/paper/cnm-an-interpretable-complex-valued-network |
Repo | |
Framework | |
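A stripped-down NumPy illustration of the quantum-inspired representation: words as unit-norm complex vectors (pure states), a sentence as a weighted mixture of their outer products (a density matrix), and matching as a trace inner product between density matrices. In CNM these components are trainable and learned end to end; the random embeddings and uniform mixture weights here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 8, 100

# Each word is a unit complex vector ("quantum state"): amplitude * exp(i * phase).
amps = np.abs(rng.normal(size=(vocab, dim)))
amps /= np.linalg.norm(amps, axis=1, keepdims=True)
phases = rng.uniform(0, 2 * np.pi, size=(vocab, dim))
word_states = amps * np.exp(1j * phases)                  # each row has unit norm

def sentence_density(word_ids, weights=None):
    """A sentence as a mixed state: weighted sum of outer products |w><w|."""
    states = word_states[word_ids]
    w = np.ones(len(word_ids)) / len(word_ids) if weights is None else weights
    return sum(wi * np.outer(s, s.conj()) for wi, s in zip(w, states))

def match(rho_q, rho_a):
    """Trace inner product between two density matrices (a real similarity score)."""
    return float(np.real(np.trace(rho_q @ rho_a)))

question = sentence_density([3, 17, 42])
answer_a = sentence_density([3, 17, 99])                  # shares two words with the question
answer_b = sentence_density([55, 60, 71])
print("overlap with answer A:", round(match(question, answer_a), 4))
print("overlap with answer B:", round(match(question, answer_b), 4))
```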
Set Flow: A Permutation Invariant Normalizing Flow
Title | Set Flow: A Permutation Invariant Normalizing Flow |
Authors | Kashif Rasul, Ingmar Schuster, Roland Vollgraf, Urs Bergmann |
Abstract | We present a generative model that is defined on finite sets of exchangeable, potentially high-dimensional, data. As the architecture is an extension of RealNVP, it inherits all of its favorable properties, such as being invertible and allowing exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, to capture statistical dependencies between entities of the set, and to train and sample with variable set sizes in a computationally efficient manner. Experiments on 3D point clouds show state-of-the-art likelihoods. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02775v1 |
PDF | https://arxiv.org/pdf/1909.02775v1.pdf |
PWC | https://paperswithcode.com/paper/set-flow-a-permutation-invariant-normalizing |
Repo | |
Framework | |
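A minimal sketch of one building block such a model could use: a RealNVP-style affine coupling whose scale and shift depend on each element plus a permutation-invariant set context, making the transform permutation-equivariant and exactly invertible. This is an illustrative layer under those assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SetCoupling(nn.Module):
    """Affine coupling for set data x of shape (B, N, D).  The scale/shift net
    sees each element's first feature half plus a permutation-invariant set
    context (a mean over elements), so the transform is permutation-equivariant
    and the log-determinant is permutation-invariant."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.context = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU())
        self.net = nn.Sequential(nn.Linear(self.half + hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def _scale_shift(self, xa, n):
        ctx = self.context(xa).mean(dim=1, keepdim=True)   # set-level summary
        h = torch.cat([xa, ctx.expand(-1, n, -1)], dim=-1)
        s, t = self.net(h).chunk(2, dim=-1)
        return torch.tanh(s), t                            # keep scales well-behaved

    def forward(self, x):
        xa, xb = x[..., :self.half], x[..., self.half:]
        s, t = self._scale_shift(xa, x.size(1))
        yb = xb * torch.exp(s) + t
        log_det = s.sum(dim=(1, 2))                        # per-set log|det Jacobian|
        return torch.cat([xa, yb], dim=-1), log_det

    def inverse(self, y):
        ya, yb = y[..., :self.half], y[..., self.half:]
        s, t = self._scale_shift(ya, y.size(1))
        return torch.cat([ya, (yb - t) * torch.exp(-s)], dim=-1)

layer = SetCoupling(dim=3)                                 # e.g. 3-D point clouds
x = torch.randn(4, 100, 3)                                 # 4 sets of 100 points
y, log_det = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-5))      # True: the layer is invertible
```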
Why can’t memory networks read effectively?
Title | Why can’t memory networks read effectively? |
Authors | Simon Šuster, Madhumita Sushil, Walter Daelemans |
Abstract | Memory networks have been a popular choice among neural architectures for machine reading comprehension and question answering. While recent work revealed that memory networks can’t truly perform multi-hop reasoning, we show in the present paper that vanilla memory networks are ineffective even in single-hop reading comprehension. We analyze the reasons for this on two cloze-style datasets, one from the medical domain and another including children’s fiction. We find that the output classification layer with entity-specific weights, and the aggregation of passage information with relatively flat attention distributions are the most important contributors to poor results. We propose network adaptations that can serve as simple remedies. We also find that the presence of unseen answers at test time can dramatically affect the reported results, so we suggest controlling for this factor during evaluation. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07350v1 |
PDF | https://arxiv.org/pdf/1910.07350v1.pdf |
PWC | https://paperswithcode.com/paper/why-cant-memory-networks-read-effectively |
Repo | |
Framework | |
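For reference, here is a bare-bones single-hop memory network reader of the kind analyzed above: the query attends over memory slots, the attended reading is aggregated, and an output layer with entity-specific weights scores the candidates — the two design choices the abstract identifies as problematic. Encoders and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SingleHopMemoryReader(nn.Module):
    """Vanilla single-hop memory network for cloze-style QA."""
    def __init__(self, vocab, n_entities, dim=64):
        super().__init__()
        self.embed_in = nn.EmbeddingBag(vocab, dim)   # memory (passage sentence) encoder
        self.embed_q = nn.EmbeddingBag(vocab, dim)    # query encoder
        self.out = nn.Linear(dim, n_entities)         # entity-specific output weights

    def forward(self, memory_bags, query_bag):
        # memory_bags: list of LongTensors, one bag of word ids per context sentence
        m = torch.stack([self.embed_in(b.unsqueeze(0)).squeeze(0) for b in memory_bags])
        q = self.embed_q(query_bag.unsqueeze(0)).squeeze(0)
        attn = torch.softmax(m @ q, dim=0)            # attention over memory slots
        reading = attn @ m                            # single-hop aggregation
        return self.out(reading)                      # logits over candidate entities

reader = SingleHopMemoryReader(vocab=1000, n_entities=50)
memory = [torch.randint(0, 1000, (8,)) for _ in range(12)]   # 12 context sentences
query = torch.randint(0, 1000, (6,))
print(reader(memory, query).shape)                    # torch.Size([50])
```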
Semantic Characteristics of Schizophrenic Speech
Title | Semantic Characteristics of Schizophrenic Speech |
Authors | Kfir Bar, Vered Zilberstein, Ido Ziv, Heli Baram, Nachum Dershowitz, Samuel Itzikowitz, Eiran Vadim Harel |
Abstract | Natural language processing tools are used to automatically detect disturbances in the transcribed speech of schizophrenia inpatients who speak Hebrew. We measure topic mutation over time and show that controls maintain more cohesive speech than inpatients. We also examine differences in how inpatients and controls use adjectives and adverbs to describe content words and show that the ones used by controls are more common than those used by inpatients. We provide experimental results and show their potential for automatically detecting schizophrenia in patients by means of their speech patterns alone. |
Tasks | |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07953v1 |
PDF | http://arxiv.org/pdf/1904.07953v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-characteristics-of-schizophrenic |
Repo | |
Framework | |
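A simple, hypothetical proxy for the "topic mutation over time" measurement: average cosine similarity between consecutive utterances under a TF-IDF representation, where lower values indicate faster topic drift. The authors' actual features and models are richer; this only illustrates the kind of signal involved.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def speech_cohesion(utterances):
    """Mean cosine similarity between consecutive utterances in a transcript:
    a crude proxy for topic cohesion (lower means the topic drifts faster)."""
    tfidf = TfidfVectorizer().fit_transform(utterances)
    sims = [cosine_similarity(tfidf[i], tfidf[i + 1])[0, 0]
            for i in range(tfidf.shape[0] - 1)]
    return float(np.mean(sims))

coherent = ["the garden was full of roses", "I watered the roses every morning",
            "the roses bloomed all through the summer"]
drifting = ["the garden was full of roses", "trains are usually late on sundays",
            "my brother never calls anymore"]
print("coherent transcript:", round(speech_cohesion(coherent), 3))
print("drifting transcript:", round(speech_cohesion(drifting), 3))
```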
EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation
Title | EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation |
Authors | Roshan Dathathri, Blagovesta Kostova, Olli Saarikivi, Wei Dai, Kim Laine, Madanlal Musuvathi |
Abstract | Fully-Homomorphic Encryption (FHE) offers powerful capabilities by enabling secure offloading of both storage and computation, and recent innovations in schemes and implementations have made it all the more attractive. At the same time, FHE is notoriously hard to use, with a very constrained programming model, a very unusual performance profile, and many cryptographic constraints. Existing compilers for FHE either target simpler but less efficient FHE schemes or only support specific domains where they can rely on expert-provided high-level runtimes to hide complications. This paper presents a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs, while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. For example, we have developed image processing applications using EVA with very few lines of code. EVA is also designed to work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET, an existing domain-specific compiler for neural network inference, onto EVA. Due to the novel optimizations in EVA, its programs are on average 5.3x faster than those generated by CHET. We believe EVA would enable wider adoption of FHE by making it easier to develop FHE applications and domain-specific FHE compilers. |
Tasks | |
Published | 2019-12-27 |
URL | https://arxiv.org/abs/1912.11951v1 |
PDF | https://arxiv.org/pdf/1912.11951v1.pdf |
PWC | https://paperswithcode.com/paper/eva-an-encrypted-vector-arithmetic-language |
Repo | |
Framework | |
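To make the programming model concrete without touching any real FHE library, the snippet below mocks, on plaintext NumPy vectors, the batched-vector semantics an EVA program expresses: fixed-width vectors with only elementwise add/multiply and cyclic rotation, plus the usual rotation-based reduction idiom. This is emphatically not the EVA API — just a plaintext view of what "encrypted vector arithmetic" computes.

```python
import numpy as np

WIDTH = 8                            # slot count of a (mock) ciphertext vector

def vec(values):
    out = np.zeros(WIDTH)
    out[:len(values)] = values
    return out

def rotate(x, k):
    return np.roll(x, -k)            # cyclic rotation left by k slots

def horizontal_sum(x):
    """Sum all slots using log2(WIDTH) rotations, the standard reduction idiom
    when only vector-wide operations are available."""
    acc = x.copy()
    k = 1
    while k < WIDTH:
        acc = acc + rotate(acc, k)
        k *= 2
    return acc                       # every slot now holds the total

# Example: a dot product written only with the vector primitives above --
# the shape of computation an FHE compiler would have to schedule.
a, b = vec([1, 2, 3, 4, 5, 6, 7, 8]), vec([8, 7, 6, 5, 4, 3, 2, 1])
print(horizontal_sum(a * b)[0])      # 120.0
```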
PseudoEdgeNet: Nuclei Segmentation only with Point Annotations
Title | PseudoEdgeNet: Nuclei Segmentation only with Point Annotations |
Authors | Inwan Yoo, Donggeun Yoo, Kyunghyun Paeng |
Abstract | Nuclei segmentation is one of the important tasks in whole slide image analysis for digital pathology. With the drastic advance of deep learning, recent deep networks have demonstrated successful performance on the nuclei segmentation task. However, a major bottleneck to achieving good performance is the cost of annotation. A large network requires a large number of segmentation masks, and this annotation task falls to pathologists, not the public. In this paper, we propose a weakly supervised nuclei segmentation method that requires only point annotations for training. This method can scale to large training sets, as marking a point on a nucleus is much cheaper than drawing a fine segmentation mask. To this end, we introduce a novel auxiliary network, called PseudoEdgeNet, which guides the segmentation network to recognize nuclei edges even without edge annotations. We evaluate our method on two public datasets, and the results demonstrate that it consistently outperforms other weakly supervised methods. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02924v2 |
PDF | https://arxiv.org/pdf/1906.02924v2.pdf |
PWC | https://paperswithcode.com/paper/pseudoedgenet-nuclei-segmentation-only-with |
Repo | |
Framework | |
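A simplified, hedged reading of the training signal described above: a point loss on the sparse annotated pixels plus a consistency term between the Sobel edges of the predicted mask and an auxiliary edge network's output. The actual PseudoEdgeNet formulation differs in details (for instance, an additional attention map and different loss weighting); the shapes and data here are synthetic.

```python
import torch
import torch.nn.functional as F

def sobel(x):
    """Sobel gradient magnitude of a (B, 1, H, W) probability map."""
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    ky = kx.transpose(2, 3)
    gx, gy = F.conv2d(x, kx, padding=1), F.conv2d(x, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def weak_supervision_loss(seg_logits, edge_pred, point_labels, point_mask, lam=1.0):
    """Point loss on annotated pixels + consistency between the Sobel edges of
    the predicted mask and the auxiliary edge network's prediction."""
    probs = torch.sigmoid(seg_logits)
    point_loss = F.binary_cross_entropy_with_logits(
        seg_logits[point_mask], point_labels[point_mask])
    edge_loss = F.l1_loss(sobel(probs), edge_pred)
    return point_loss + lam * edge_loss

# Toy shapes: any segmentation net f and edge net g producing (B, 1, H, W) maps.
seg_logits = torch.randn(2, 1, 64, 64, requires_grad=True)
edge_pred = torch.rand(2, 1, 64, 64)
point_mask = torch.zeros(2, 1, 64, 64, dtype=torch.bool)
point_mask[:, :, ::16, ::16] = True                  # sparse point annotations
point_labels = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = weak_supervision_loss(seg_logits, edge_pred, point_labels, point_mask)
loss.backward()
print(float(loss))
```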
Dedge-AGMNet: an effective stereo matching network optimized by depth edge auxiliary task
Title | Dedge-AGMNet: an effective stereo matching network optimized by depth edge auxiliary task |
Authors | Weida Yang, Xindong Ai, Zuliu Yang, Yong Xu, Yong Zhao |
Abstract | To improve performance in ill-posed regions, this paper proposes an atrous granular multi-scale network based on a depth edge subnetwork (Dedge-AGMNet). The depth edge can be viewed as an instance-sensitive binary semantic edge, and we generate depth edge ground truth by mining semantic and instance annotations simultaneously. To incorporate the depth edge cues efficiently, our network employs a hard parameter sharing mechanism for the stereo matching branch and the depth edge branch. The network modifies SPP into Dedge-SPP, which fuses the depth edge features into the disparity estimation network. The granular convolution is extended to a 3D architecture, and we design the AGM module to build a more suitable structure; this module captures a multi-scale receptive field with fewer parameters. Across different stereo datasets, our network outperforms other stereo matching networks and advances the state of the art on the Sceneflow, KITTI 2012 and KITTI 2015 benchmark datasets. |
Tasks | Disparity Estimation, Edge Detection, Multi-Task Learning, Stereo Matching |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09346v4 |
PDF | https://arxiv.org/pdf/1908.09346v4.pdf |
PWC | https://paperswithcode.com/paper/depth-agmnet-an-atrous-granular-multiscale |
Repo | |
Framework | |
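The hard parameter sharing mechanism in miniature: one shared feature extractor feeds both a disparity head and a depth-edge head, so the auxiliary edge loss shapes the features used for matching. The layers are generic placeholders, not the paper's AGM or Dedge-SPP modules.

```python
import torch
import torch.nn as nn

class SharedStereoBackbone(nn.Module):
    """Shared feature extractor with a disparity head and a depth-edge head."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.disparity_head = nn.Conv2d(64, 1, 3, padding=1)   # takes concat(left, right) features
        self.edge_head = nn.Conv2d(32, 1, 3, padding=1)        # binary depth-edge logits

    def forward(self, left, right):
        fl, fr = self.shared(left), self.shared(right)
        disparity = self.disparity_head(torch.cat([fl, fr], dim=1))
        depth_edge = self.edge_head(fl)
        return disparity, depth_edge

net = SharedStereoBackbone()
left = right = torch.randn(1, 3, 64, 128)
disp, edge = net(left, right)
# Multi-task loss: main disparity regression + weighted auxiliary edge classification.
loss = nn.functional.smooth_l1_loss(disp, torch.zeros_like(disp)) \
     + 0.2 * nn.functional.binary_cross_entropy_with_logits(edge, torch.zeros_like(edge))
print(disp.shape, edge.shape, float(loss))
```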
Direct Object Recognition Without Line-of-Sight Using Optical Coherence
Title | Direct Object Recognition Without Line-of-Sight Using Optical Coherence |
Authors | Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu |
Abstract | Visual object recognition in situations where the direct line of sight is blocked, such as when the object is occluded around a corner, is of practical importance in a wide range of applications. With coherent illumination, the light scattered from diffusive walls forms speckle patterns that contain information about the hidden object. It is possible to realize non-line-of-sight (NLOS) recognition with these speckle patterns. We introduce a novel approach based on speckle pattern recognition with a deep neural network, which is simpler and more robust than other NLOS recognition methods. Simulations and experiments are performed to verify the feasibility and performance of this approach. |
Tasks | Object Recognition |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07705v1 |
PDF | http://arxiv.org/pdf/1903.07705v1.pdf |
PWC | https://paperswithcode.com/paper/direct-object-recognition-without-line-of |
Repo | |
Framework | |
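At its core, the recognition step is an ordinary image classifier trained on speckle intensity patterns; a minimal, illustrative architecture is sketched below (layer sizes and the number of classes are assumptions, not the paper's network).

```python
import torch
import torch.nn as nn

# Minimal speckle-pattern classifier: a plain CNN over the speckle intensity
# image produced on the wall by coherent illumination of the hidden object.
classifier = nn.Sequential(
    nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                      # e.g. 10 hidden-object classes
)
speckles = torch.rand(4, 1, 128, 128)       # batch of measured speckle patterns
print(classifier(speckles).shape)           # torch.Size([4, 10])
```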
Improving Pre-Trained Multilingual Models with Vocabulary Expansion
Title | Improving Pre-Trained Multilingual Models with Vocabulary Expansion |
Authors | Hai Wang, Dian Yu, Kai Sun, Janshu Chen, Dong Yu |
Abstract | Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, where in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on the pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings. |
Tasks | Language Modelling, Machine Reading Comprehension, Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Reading Comprehension |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12440v1 |
PDF | https://arxiv.org/pdf/1909.12440v1.pdf |
PWC | https://paperswithcode.com/paper/improving-pre-trained-multilingual-models |
Repo | |
Framework | |
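A hypothetical sketch of the mixture-mapping idea: an out-of-vocabulary token is assigned an embedding that is a similarity-weighted mixture of existing in-vocabulary embeddings, with similarities computed in an external embedding space that covers both. The dimensions, the softmax weighting, and the top-k choice are assumptions for illustration.

```python
import numpy as np

def mixture_mapping(oov_vec, anchor_vecs, anchor_embeddings, top_k=5):
    """Represent an OOV token as a similarity-weighted mixture of existing
    embeddings.  Similarities come from an external embedding space that covers
    both the OOV token and the in-vocabulary anchors; mixture weights are a
    softmax over the top-k nearest anchors."""
    sims = anchor_vecs @ oov_vec / (
        np.linalg.norm(anchor_vecs, axis=1) * np.linalg.norm(oov_vec) + 1e-9)
    top = np.argsort(sims)[-top_k:]
    weights = np.exp(sims[top]) / np.exp(sims[top]).sum()
    return weights @ anchor_embeddings[top]           # new row for the model's embedding table

rng = np.random.default_rng(0)
external = rng.normal(size=(1000, 300))               # external space covering the anchors
model_table = rng.normal(size=(1000, 768))            # the pre-trained model's embedding table
oov_external_vec = rng.normal(size=300)               # external vector of the OOV token
new_embedding = mixture_mapping(oov_external_vec, external, model_table)
print(new_embedding.shape)                            # (768,)
```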