Paper Group ANR 1179
On the Consistency of Top-k Surrogate Losses
Title | On the Consistency of Top-k Surrogate Losses |
Authors | Forest Yang, Sanmi Koyejo |
Abstract | The top-$k$ error is often employed to evaluate performance for challenging classification tasks in computer vision, as it is designed to compensate for ambiguity in ground truth labels. This practical success motivates our theoretical analysis of consistent top-$k$ classification. To this end, we define top-$k$ calibration as a necessary and sufficient condition for consistency, for loss functions that are bounded below. Unlike prior work, our analysis of top-$k$ calibration handles non-uniqueness of the predictor scores, and extends calibration to consistency – providing a theoretically sound basis for analysis of this topic. Based on the top-$k$ calibration analysis, we propose a rich class of top-$k$ calibrated Bregman divergence surrogates. Our analysis continues by showing that previously proposed hinge-like top-$k$ surrogate losses are not top-$k$ calibrated and are thus inconsistent. We then propose two new hinge-like losses, one of which is similarly inconsistent and one of which is consistent. Our empirical results support the theoretical claims, confirming our analysis of the consistency of these losses. |
Tasks | Calibration |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.11141v1 |
http://arxiv.org/pdf/1901.11141v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-consistency-of-top-k-surrogate-losses |
Repo | |
Framework | |
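The top-$k$ error analysed in this paper is easy to state concretely. A minimal NumPy sketch (the tie-breaking by index order below is one illustration of the score non-uniqueness the authors handle; the toy scores are invented for the example):

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest scores.

    scores: (n, c) array of predictor scores; labels: (n,) true class indices.
    Ties are broken by index order via argsort, which matters when predictor
    scores are non-unique.
    """
    # indices of the k largest scores per row
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

scores = np.array([[0.5, 0.3, 0.2],
                   [0.1, 0.6, 0.3],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 1, 0])
err1 = top_k_error(scores, labels, 1)  # only the first argmax counts
err2 = top_k_error(scores, labels, 2)  # a larger k forgives ambiguity
```

A surrogate loss is top-$k$ consistent when minimising it also minimises this quantity in the limit.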
Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation
Title | Interactive Text Ranking with Bayesian Optimisation: A Case Study on Community QA and Summarisation |
Authors | Edwin Simpson, Yang Gao, Iryna Gurevych |
Abstract | For many NLP applications, such as question answering and summarisation, the goal is to select the best solution from a large space of candidates to meet a particular user’s needs. To address the lack of user-specific training data, we propose an interactive text ranking approach that actively selects pairs of candidates, from which the user selects the best. Unlike previous strategies, which attempt to learn a ranking across the whole candidate space, our method employs Bayesian optimisation to focus the user’s labelling effort on high quality candidates and integrates prior knowledge in a Bayesian manner to cope better with small data scenarios. We apply our method to community question answering (cQA) and extractive summarisation, finding that it significantly outperforms existing interactive approaches. We also show that the ranking function learned by our method is an effective reward function for reinforcement learning, which improves the state of the art for interactive summarisation. |
Tasks | Bayesian Optimisation, Community Question Answering, Question Answering |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10183v2 |
https://arxiv.org/pdf/1911.10183v2.pdf | |
PWC | https://paperswithcode.com/paper/interactive-text-ranking-with-bayesian |
Repo | |
Framework | |
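The core loop above — focus pairwise labelling effort on promising candidates rather than ranking the whole space — can be sketched with a toy upper-confidence-bound acquisition. This is an illustration of the idea only, not the authors' GP-based method; the utilities, belief update, and constants are all invented for the example:

```python
import numpy as np

# Hypothetical hidden candidate utilities (the simulated user's preferences).
true_utility = np.array([0.1, 0.9, 0.4, 0.7])
mu = np.zeros(4)    # belief mean per candidate
var = np.ones(4)    # belief variance per candidate

def select_pair(mu, var, beta=1.0):
    """Query the two candidates with the highest upper confidence bound,
    concentrating the user's labelling effort on high-quality candidates."""
    ucb = mu + beta * np.sqrt(var)
    top2 = np.argsort(-ucb)[:2]
    return int(top2[0]), int(top2[1])

for _ in range(20):
    i, j = select_pair(mu, var)
    # the simulated user prefers the candidate with higher true utility
    winner, loser = (i, j) if true_utility[i] > true_utility[j] else (j, i)
    mu[winner] += 0.5 * var[winner]   # crude Gaussian-style belief update
    mu[loser] -= 0.5 * var[loser]
    var[[i, j]] *= 0.9                # both queried candidates get less uncertain

best = int(np.argmax(mu))
```

After a handful of pairwise queries the belief mean identifies the best candidate without ever ranking the full space.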
The role of a layer in deep neural networks: a Gaussian Process perspective
Title | The role of a layer in deep neural networks: a Gaussian Process perspective |
Authors | Oded Ben-David, Zohar Ringel |
Abstract | A fundamental question in deep learning concerns the role played by individual layers in a deep neural network (DNN) and the transferable properties of the data representations which they learn. To the extent that layers have clear roles, one should be able to optimize them separately using layer-wise loss functions. Such loss functions would describe what is the set of good data representations at each depth of the network and provide a target for layer-wise greedy optimization (LEGO). Here we derive a novel correspondence between Gaussian Processes and SGD trained deep neural networks. Leveraging this correspondence, we derive the Deep Gaussian Layer-wise loss functions (DGLs) which, we believe, are the first supervised layer-wise loss functions which are both explicit and competitive in terms of accuracy. Being highly structured and symmetric, the DGLs provide a promising analytic route to understanding the internal representations generated by DNNs. |
Tasks | Gaussian Processes |
Published | 2019-02-06 |
URL | https://arxiv.org/abs/1902.02354v3 |
https://arxiv.org/pdf/1902.02354v3.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-a-layer-in-deep-neural-networks-a |
Repo | |
Framework | |
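As background for the GP correspondence the paper derives, the posterior mean of a zero-mean GP is a few lines of linear algebra. A generic sketch of standard GP regression (not the paper's DGL construction):

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * length_scale ** 2))

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean of a zero-mean GP: K_* (K + sigma^2 I)^{-1} y."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

x = np.array([0.0, 1.0, 2.0])
y = np.sin(x)
m = gp_posterior_mean(x, y, np.array([1.0]))  # near-interpolation at a training point
```

The paper's contribution is a correspondence between such GP posteriors and SGD-trained DNNs, from which the layer-wise DGL losses follow.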
Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach
Title | Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach |
Authors | Raed Kontar, Garvesh Raskutti, Shiyu Zhou |
Abstract | Recently there has been increasing interest in the multivariate Gaussian process (MGP), which extends the Gaussian process (GP) to deal with multiple outputs. One approach to constructing the MGP that accounts for non-trivial commonalities amongst outputs employs a convolution process (CP). The CP is based on the idea of sharing latent functions across several convolutions. Despite the elegance of the CP construction, it poses new challenges that have yet to be tackled. First, even with a moderate number of outputs, model building is prohibitively expensive due to the huge increase in computational demands and in the number of parameters to be estimated. Second, negative transfer of knowledge may occur when some outputs do not share commonalities. In this paper we address these issues. We propose a regularized pairwise modeling approach for the MGP established using the CP. The key feature of our approach is to distribute the estimation of the full multivariate model across a group of individually built bivariate GPs. Interestingly, pairwise modeling turns out to possess unique characteristics, which allow us to tackle the challenge of negative transfer by penalizing the latent function that facilitates information sharing in each bivariate model. Predictions are then made by combining predictions from the bivariate models within a Bayesian framework. The proposed method scales well when the number of outputs is large and minimizes the negative transfer of knowledge between uncorrelated outputs. Statistical guarantees for the proposed method are studied and its advantageous features are demonstrated through numerical studies. |
Tasks | Gaussian Processes |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11512v2 |
http://arxiv.org/pdf/1901.11512v2.pdf | |
PWC | https://paperswithcode.com/paper/minimizing-negative-transfer-of-knowledge-in |
Repo | |
Framework | |
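The prediction-combination step can be illustrated with a precision-weighted product of Gaussians, a common heuristic for fusing submodel predictions. This is only a sketch of the "combine the bivariate predictions" idea; the paper's actual Bayesian combination may differ:

```python
import numpy as np

def combine_pairwise(means, variances):
    """Fuse per-pair Gaussian predictions for one output by precision
    weighting (product of Gaussians): confident submodels dominate."""
    means = np.asarray(means, dtype=float)
    prec = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / prec.sum()
    mean = var * (prec * means).sum()
    return mean, var

# Two bivariate submodels predict the same target output with equal confidence:
mean, var = combine_pairwise([1.0, 3.0], [0.5, 0.5])
```

With equal variances the fused mean is the simple average and the fused variance shrinks, reflecting the extra evidence from each bivariate model.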
Octree guided CNN with Spherical Kernels for 3D Point Clouds
Title | Octree guided CNN with Spherical Kernels for 3D Point Clouds |
Authors | Huan Lei, Naveed Akhtar, Ajmal Mian |
Abstract | We propose an octree guided neural network architecture and spherical convolutional kernel for machine learning from arbitrary 3D point clouds. The network architecture capitalizes on the sparse nature of irregular point clouds, and hierarchically coarsens the data representation with space partitioning. At the same time, the proposed spherical kernels systematically quantize point neighborhoods to identify local geometric structures in the data, while maintaining the properties of translation-invariance and asymmetry. We specify spherical kernels with the help of network neurons that in turn are associated with spatial locations. We exploit this association to avert dynamic kernel generation during network training that enables efficient learning with high resolution point clouds. The effectiveness of the proposed technique is established on the benchmark tasks of 3D object classification and segmentation, achieving new state-of-the-art on ShapeNet and RueMonge2014 datasets. |
Tasks | 3D Object Classification, Object Classification |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1903.00343v1 |
http://arxiv.org/pdf/1903.00343v1.pdf | |
PWC | https://paperswithcode.com/paper/octree-guided-cnn-with-spherical-kernels-for |
Repo | |
Framework | |
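The systematic neighbourhood quantisation can be sketched by binning neighbour offsets in spherical coordinates, with each bin owning one kernel weight. The bin counts and layout here are illustrative, not the paper's exact parametrisation:

```python
import numpy as np

def spherical_bin(offsets, n_rad=2, n_azi=4, n_ele=2, radius=1.0):
    """Assign each neighbour offset (x, y, z) to a spherical-coordinate bin
    indexed by (radial shell, azimuth sector, elevation band)."""
    x, y, z = offsets[:, 0], offsets[:, 1], offsets[:, 2]
    r = np.linalg.norm(offsets, axis=1)
    azi = np.arctan2(y, x)                                       # [-pi, pi)
    ele = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1, 1))    # [0, pi]
    ri = np.minimum((r / radius * n_rad).astype(int), n_rad - 1)
    ai = np.minimum(((azi + np.pi) / (2 * np.pi) * n_azi).astype(int), n_azi - 1)
    ei = np.minimum((ele / np.pi * n_ele).astype(int), n_ele - 1)
    return ri * n_azi * n_ele + ai * n_ele + ei

pts = np.array([[0.1, 0.0, 0.05],
                [-0.5, -0.5, 0.0]])
bins = spherical_bin(pts)  # one flat bin index per neighbour
```

All points falling in the same bin share a weight, which is what makes the kernel applicable to irregular point neighbourhoods.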
Managing Popularity Bias in Recommender Systems with Personalized Re-ranking
Title | Managing Popularity Bias in Recommender Systems with Personalized Re-ranking |
Authors | Himan Abdollahpouri, Robin Burke, Bamshad Mobasher |
Abstract | Many recommender systems suffer from popularity bias: popular items are recommended frequently while less popular, niche products are recommended rarely or not at all. However, recommending the ignored products in the ‘long tail’ is critical for businesses, as these products are otherwise unlikely to be discovered. In this paper, we introduce a personalized diversification re-ranking approach that increases the representation of less popular items in recommendations while maintaining acceptable recommendation accuracy. Our approach is a post-processing step that can be applied to the output of any recommender system. We show that our approach manages popularity bias more effectively than an existing method based on regularization. We also examine both new and existing metrics to measure the coverage of long-tail items in the recommendations. |
Tasks | Recommendation Systems |
Published | 2019-01-22 |
URL | https://arxiv.org/abs/1901.07555v4 |
https://arxiv.org/pdf/1901.07555v4.pdf | |
PWC | https://paperswithcode.com/paper/managing-popularity-bias-in-recommender |
Repo | |
Framework | |
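A greedy re-ranking of this kind can be sketched in a few lines: trade off predicted relevance against long-tail coverage when building the final list. This is a simplified stand-in for the paper's personalized re-ranking; the bonus rule, weight `lam`, and toy data are invented for the example:

```python
def rerank(candidates, relevance, is_longtail, lam=0.3, k=3):
    """Greedily build a top-k list, boosting a long-tail item while the
    list contains none (a crude coverage-style diversification bonus)."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        n_tail = sum(1 for s in selected if is_longtail[s])
        def score(c):
            bonus = 1.0 if is_longtail[c] and n_tail == 0 else 0.0
            return (1 - lam) * relevance[c] + lam * bonus
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance   = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4}
is_longtail = {"a": False, "b": False, "c": True, "d": True}
top3 = rerank(["a", "b", "c", "d"], relevance, is_longtail)
```

Because the approach is pure post-processing, the base recommender's scores (`relevance` here) can come from any model.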
An “augmentation-free” rotation invariant classification scheme on point-cloud and its application to neuroimaging
Title | An “augmentation-free” rotation invariant classification scheme on point-cloud and its application to neuroimaging |
Authors | Liu Yang, Rudrasis Chakraborty |
Abstract | Recent years have witnessed the emergence and increasing popularity of 3D medical imaging techniques with the development of 3D sensors and technology. However, achieving geometric invariance in the processing of 3D medical images is computationally expensive but nonetheless essential, due to possible errors introduced by rigid registration techniques. An alternative way to analyze medical imaging is to understand the 3D shapes represented as point clouds. Though 3D point-cloud processing is not a “go-to” choice in the medical imaging community, it is a canonical way to preserve rotation invariance. Unfortunately, due to the discrete topology, one cannot use the standard convolution operator on a point cloud. To the best of our knowledge, the existing ways to do “convolution” cannot preserve rotation invariance without explicit data augmentation. Therefore, we propose a rotation invariant convolution operator by inducing a topology from the hypersphere. Experimental validation has been performed on the publicly available OASIS dataset, classifying subjects with and without dementia, demonstrating the usefulness of our proposed method in terms of model complexity, classification accuracy, and, last but most important, invariance to rotations. |
Tasks | Data Augmentation |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.03443v1 |
https://arxiv.org/pdf/1911.03443v1.pdf | |
PWC | https://paperswithcode.com/paper/an-augmentation-free-rotation-invariant |
Repo | |
Framework | |
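The property at stake is easy to check numerically: a rotation-invariant descriptor must not change when the cloud is rotated. Sorted pairwise distances are a classic such descriptor, shown here as a simple stand-in for the paper's hypersphere-induced convolution (not its actual operator):

```python
import numpy as np

def pairwise_distance_features(points):
    """Sorted pairwise distances: invariant to rotation (and reflection)
    without any data augmentation."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(points), k=1)
    return np.sort(d[iu])

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]])
f1 = pairwise_distance_features(cloud)
f2 = pairwise_distance_features(cloud @ rotation_z(0.7).T)  # rotated copy
```

A learned operator with the same invariance removes the need to augment the training set with rotated copies, which is the paper's "augmentation-free" claim.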
libconform v0.1.0: a Python library for conformal prediction
Title | libconform v0.1.0: a Python library for conformal prediction |
Authors | Jonas Fassbender |
Abstract | This paper introduces libconform v0.1.0, a Python library for the conformal prediction framework, licensed under the MIT license. libconform is not yet stable. This paper describes the main algorithms implemented and documents the API of libconform. Some details about the implementation and about changes planned for future versions are also described. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02015v1 |
https://arxiv.org/pdf/1907.02015v1.pdf | |
PWC | https://paperswithcode.com/paper/libconform-v010-a-python-library-for |
Repo | |
Framework | |
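The split conformal recipe underlying such libraries can be sketched generically. This is the textbook framework for regression, not libconform's actual API (which the paper documents):

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction: use calibration-set absolute residuals as
    nonconformity scores; the interval covers the truth with probability
    >= 1 - alpha under exchangeability."""
    n = len(cal_residuals)
    # finite-sample-corrected quantile level of the nonconformity scores
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(np.abs(cal_residuals), min(q_level, 1.0))
    return y_pred - q, y_pred + q

# residuals |y - y_hat| from a held-out calibration set (toy numbers)
residuals = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05])
lo, hi = split_conformal_interval(residuals, y_pred=2.0, alpha=0.2)
```

The coverage guarantee is distribution-free, which is what makes the framework attractive enough to deserve library support.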
Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes
Title | Fast-rate PAC-Bayes Generalization Bounds via Shifted Rademacher Processes |
Authors | Jun Yang, Shengyang Sun, Daniel M. Roy |
Abstract | The developments of Rademacher complexity and PAC-Bayesian theory have been largely independent. One exception is the PAC-Bayes theorem of Kakade, Sridharan, and Tewari (2008), which is established via Rademacher complexity theory by viewing Gibbs classifiers as linear operators. The goal of this paper is to extend this bridge between Rademacher complexity and state-of-the-art PAC-Bayesian theory. We first demonstrate that one can match the fast rate of Catoni’s PAC-Bayes bounds (Catoni, 2007) using shifted Rademacher processes (Wegkamp, 2003; Lecué and Mitchell, 2012; Zhivotovskiy and Hanneke, 2018). We then derive a new fast-rate PAC-Bayes bound in terms of the “flatness” of the empirical risk surface on which the posterior concentrates. Our analysis establishes a new framework for deriving fast-rate PAC-Bayes bounds and yields new insights on PAC-Bayesian theory. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07585v2 |
https://arxiv.org/pdf/1908.07585v2.pdf | |
PWC | https://paperswithcode.com/paper/190807585 |
Repo | |
Framework | |
Learnable Manifold Alignment (LeMA) : A Semi-supervised Cross-modality Learning Framework for Land Cover and Land Use Classification
Title | Learnable Manifold Alignment (LeMA) : A Semi-supervised Cross-modality Learning Framework for Land Cover and Land Use Classification |
Authors | Danfeng Hong, Naoto Yokoya, Nan Ge, Jocelyn Chanussot, Xiao Xiang Zhu |
Abstract | In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community — can a limited amount of highly-discriminative (e.g., hyperspectral) training data improve the performance of a classification task that uses a large amount of poorly-discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data, unlike multispectral data, is very expensive to collect at scale given the trade-off between time and efficiency. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a given fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods. |
Tasks | |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02838v1 |
http://arxiv.org/pdf/1901.02838v1.pdf | |
PWC | https://paperswithcode.com/paper/learnable-manifold-alignment-lema-a-semi |
Repo | |
Framework | |
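The graph-based label propagation step mentioned above is a standard iteration and can be sketched on a toy graph. LeMA's contribution is learning the graph `W` from data; here a fixed chain graph illustrates only the propagation:

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9, iters=50):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S the row-normalised
    adjacency; labelled rows of Y anchor the diffusion."""
    d = W.sum(axis=1)
    S = W / d[:, None]            # row-normalised adjacency
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1)

# 4 nodes in a chain; only nodes 0 and 3 are labelled (classes 0 and 1)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
pred = label_propagation(W, Y)
```

Each unlabelled node adopts the class that dominates its diffused neighbourhood, which is how a learned graph translates into a decision boundary.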
High-speed Railway Fastener Detection and Localization Method based on convolutional neural network
Title | High-speed Railway Fastener Detection and Localization Method based on convolutional neural network |
Authors | Qing Song, Yao Guo, Jianan Jiang, Chun Liu, Mengjie Hu |
Abstract | Railway transportation is the artery of China’s national economy and plays an important role in the development of today’s society. Because railway security inspection technology started late in China, current inspection tasks rely mainly on manual inspection, which is inefficient and demands substantial manpower and material resources. In this paper, we establish a steel rail fastener detection image dataset containing 4,000 rail fastener images covering 4 types. We use a region proposal network to generate regions of interest, extract features with a convolutional neural network, and fuse the classifier into the detection network. We improve model accuracy with online hard example mining and optimize the Faster RCNN detection framework by reducing the number of regions of interest. Finally, the model reaches 99% accuracy and 35 FPS in a deployment environment with a TITAN X GPU. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01141v2 |
https://arxiv.org/pdf/1907.01141v2.pdf | |
PWC | https://paperswithcode.com/paper/high-speed-railway-fastener-detection-and |
Repo | |
Framework | |
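The online hard example mining step the paper uses is a simple selection rule: backpropagate only through the proposals with the highest loss. A minimal sketch of that sampling step (not the full detector; the loss values are toy numbers):

```python
import numpy as np

def ohem_select(losses, keep=2):
    """Online hard example mining: keep the indices of the highest-loss
    proposals for the backward pass, discarding easy examples."""
    order = np.argsort(-np.asarray(losses))
    return order[:keep]

# per-proposal losses from a forward pass (illustrative values)
losses = [0.05, 0.9, 0.2, 0.7]
hard = ohem_select(losses, keep=2)  # indices of the hard proposals
```

Concentrating gradients on hard proposals is what lets the detector keep accuracy while the number of regions of interest is reduced.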
Learning to Collaborate from Simulation for Robot-Assisted Dressing
Title | Learning to Collaborate from Simulation for Robot-Assisted Dressing |
Authors | Alexander Clegg, Zackory Erickson, Patrick Grady, Greg Turk, Charles C. Kemp, C. Karen Liu |
Abstract | We investigated the application of haptic feedback control and deep reinforcement learning (DRL) to robot-assisted dressing. Our method uses DRL to simultaneously train human and robot control policies as separate neural networks using physics simulations. In addition, we modeled variations in human impairments relevant to dressing, including unilateral muscle weakness, involuntary arm motion, and limited range of motion. Our approach resulted in control policies that successfully collaborate in a variety of simulated dressing tasks involving a hospital gown and a T-shirt. In addition, our approach resulted in policies trained in simulation that enabled a real PR2 robot to dress the arm of a humanoid robot with a hospital gown. We found that training policies for specific impairments dramatically improved performance; that controller execution speed could be scaled after training to reduce the robot’s speed without steep reductions in performance; that curriculum learning could be used to lower applied forces; and that multi-modal sensing, including a simulated capacitive sensor, improved performance. |
Tasks | |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06682v2 |
https://arxiv.org/pdf/1909.06682v2.pdf | |
PWC | https://paperswithcode.com/paper/modeling-collaboration-for-robot-assisted |
Repo | |
Framework | |
Hear “No Evil”, See “Kenansville”: Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems
Title | Hear “No Evil”, See “Kenansville”: Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems |
Authors | Hadi Abdullah, Muhammad Sajidur Rahman, Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, Patrick Traynor |
Abstract | Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for modern systems are comprised of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and transferable, and demonstrably achieve mistranscription and misidentification rates as high as 100% by modifying only a few frames of audio. We perform a study via Amazon Mechanical Turk demonstrating that there is no statistically significant difference between human perception of regular and perturbed audio. Our findings suggest that models may learn aspects of speech that are generally not perceived by human subjects, but that are crucial for model accuracy. We also find that certain English language phonemes (in particular, vowels) are significantly more susceptible to our attack. We show that the attacks are effective when mounted over cellular networks, where signals are subject to degradation due to transcoding, jitter, and packet loss. |
Tasks | Speech Recognition |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05262v1 |
https://arxiv.org/pdf/1910.05262v1.pdf | |
PWC | https://paperswithcode.com/paper/hear-no-evil-see-kenansville-efficient-and |
Repo | |
Framework | |
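Signal-processing attacks of this family perturb the audio before the model's feature extraction, for example by discarding low-intensity spectral content that humans barely perceive. A sketch in that spirit (the framing, threshold rule, and signal are illustrative, not the paper's exact attack):

```python
import numpy as np

def discard_weak_frequencies(frame, keep_fraction=0.005):
    """Zero out all but the strongest DFT components of an audio frame.
    The result sounds similar to a human but presents the model's feature
    extractor with a different spectrum."""
    spec = np.fft.rfft(frame)
    mags = np.abs(spec)
    cutoff = np.quantile(mags, 1.0 - keep_fraction)
    spec[mags < cutoff] = 0.0
    return np.fft.irfft(spec, n=len(frame))

# a dominant 5 Hz tone plus a faint 60 Hz component (toy "frame")
t = np.linspace(0, 1, 256, endpoint=False)
frame = np.sin(2 * np.pi * 5 * t) + 0.01 * np.sin(2 * np.pi * 60 * t)
out = discard_weak_frequencies(frame)  # faint component removed, tone kept
```

Because the perturbation targets generic preprocessing rather than a specific model, it needs no white-box knowledge, which is why such attacks transfer across systems.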
Learning Theory and Support Vector Machines - a primer
Title | Learning Theory and Support Vector Machines - a primer |
Authors | Michael Banf |
Abstract | The main goal of statistical learning theory is to provide a fundamental framework for the problem of decision making and model construction based on sets of data. Here, we present a brief introduction to the fundamentals of statistical learning theory, in particular the difference between empirical and structural risk minimization, including one of its most prominent implementations, i.e. the Support Vector Machine. |
Tasks | Decision Making |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04622v1 |
http://arxiv.org/pdf/1902.04622v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-theory-and-support-vector-machines-a |
Repo | |
Framework | |
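The structural risk minimisation the primer contrasts with empirical risk is concrete in the SVM objective: a margin penalty plus the hinge loss over the data. A minimal sketch of that objective (toy data; evaluation only, no training loop):

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Structural risk an SVM minimises: ||w||^2 / 2 (margin / capacity term)
    plus C times the empirical hinge loss max(0, 1 - y (w.x + b))."""
    margins = y * (X @ w + b)
    return 0.5 * (w @ w) + C * np.maximum(0.0, 1.0 - margins).sum()

# two linearly separable points, one per class
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = np.array([0.5, 0.0]), 0.0
loss = svm_objective(w, b, X, y)  # both margins are exactly 1, so only ||w||^2/2 remains
```

The `C` parameter is the empirical-vs-structural trade-off: large `C` emphasises fitting the data, small `C` emphasises a large margin.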
Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation
Title | Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation |
Authors | Nikolay Bogoychev, Rico Sennrich |
Abstract | The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data. Source-side monolingual data can be (forward-)translated into the target language for self-training; target-side monolingual data can be back-translated. It has been widely reported that back-translation delivers superior results, but could this be due to artefacts in the test sets? We perform a case study on the French-English news translation task and separate test sets based on their original languages. We show that forward translation delivers superior gains in terms of BLEU on sentences that were originally in the source language, complementing previous studies which show large improvements with back-translation on sentences that were originally in the target language. To better understand when and why forward and back-translation are effective, we study the role of domains, translationese, and noise. While translationese effects are well known to influence MT evaluation, we also find evidence that news data from different languages shows subtle domain differences, which is another explanation for varying performance on different portions of the test set. We perform additional low-resource experiments which demonstrate that forward translation is more sensitive to the quality of the initial translation system than back-translation, and tends to perform worse in low-resource settings. |
Tasks | Machine Translation |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.03362v1 |
https://arxiv.org/pdf/1911.03362v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-translationese-and-noise-in-synthetic |
Repo | |
Framework | |