Paper Group ANR 1056
Causal Reasoning for Algorithmic Fairness. Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources. An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking. A discrete version of CMA-ES. Active Ranking with Subset-wise Preferences. Robust Optimization over Multiple Domains. Learning from Chunk-based …
Causal Reasoning for Algorithmic Fairness
Title | Causal Reasoning for Algorithmic Fairness |
Authors | Joshua R. Loftus, Chris Russell, Matt J. Kusner, Ricardo Silva |
Abstract | In this work, we argue for the importance of causal reasoning in creating fair algorithms for decision making. We give a review of existing approaches to fairness, describe work in causality necessary for the understanding of causal approaches, argue why causality is necessary for any approach that wishes to be fair, and give a detailed analysis of the many recent approaches to causality-based fairness. |
Tasks | Decision Making |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05859v1 |
PDF | http://arxiv.org/pdf/1805.05859v1.pdf |
PWC | https://paperswithcode.com/paper/causal-reasoning-for-algorithmic-fairness |
Repo | |
Framework | |
Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
Title | Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources |
Authors | Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen |
Abstract | Word vector specialisation (also known as retrofitting) is a portable, light-weight approach to fine-tuning arbitrary distributional word vector spaces by injecting external knowledge from rich lexical resources such as WordNet. By design, these post-processing methods only update the vectors of words occurring in external lexicons, leaving the representations of all unseen words intact. In this paper, we show that constraint-driven vector space specialisation can be extended to unseen words. We propose a novel post-specialisation method that: a) preserves the useful linguistic knowledge for seen words, while b) propagating this external signal to unseen words in order to improve their vector representations as well. Our post-specialisation approach makes explicit a non-linear specialisation function, in the form of a deep neural network, by learning to predict specialised vectors from their original distributional counterparts. The learned function is then used to specialise vectors of unseen words. This approach, applicable to any post-processing model, yields considerable gains over the initial specialisation models both in intrinsic word similarity tasks and in two downstream tasks: dialogue state tracking and lexical text simplification. The positive effects persist across three languages, demonstrating the importance of specialising the full vocabulary of distributional word vector spaces. |
Tasks | Dialogue State Tracking, Text Simplification |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03228v1 |
PDF | http://arxiv.org/pdf/1805.03228v1.pdf |
PWC | https://paperswithcode.com/paper/post-specialisation-retrofitting-vectors-of |
Repo | |
Framework | |
An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking
Title | An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking |
Authors | Puyang Xu, Qi Hu |
Abstract | We highlight a practical yet rarely discussed problem in dialogue state tracking (DST), namely handling unknown slot values. Previous approaches generally assume predefined candidate lists and thus are not designed to output unknown values, especially when the spoken language understanding (SLU) module is absent, as in many end-to-end (E2E) systems. We describe in this paper an E2E architecture based on the pointer network (PtrNet) that can effectively extract unknown slot values while still obtaining state-of-the-art accuracy on the standard DSTC2 benchmark. We also provide extensive empirical evidence to show that tracking unknown values can be challenging and that our approach can bring significant improvement with the help of an effective feature dropout technique. |
Tasks | Dialogue State Tracking, Spoken Language Understanding |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01555v1 |
PDF | http://arxiv.org/pdf/1805.01555v1.pdf |
PWC | https://paperswithcode.com/paper/an-end-to-end-approach-for-handling-unknown |
Repo | |
Framework | |
A discrete version of CMA-ES
Title | A discrete version of CMA-ES |
Authors | Eric Benhamou, Jamal Atif, Rida Laraki |
Abstract | Modern machine learning uses more and more advanced optimization techniques to find optimal hyperparameters. Whenever the objective function is non-convex, non-continuous, and has potentially multiple local minima, standard gradient descent optimization methods fail. A last-resort and very different method is to assume that the optima, not necessarily unique, are distributed according to a distribution, and to adapt that distribution iteratively according to tested points. These strategies, which originated in the early 1960s under the name Evolution Strategy (ES), have culminated in CMA-ES (Covariance Matrix Adaptation ES). It relies on a multivariate normal distribution and is regarded as state of the art for general optimization problems. However, it is far from optimal for discrete variables. In this paper, we extend the method to multivariate binomial correlated distributions. For such a distribution, we show that it shares similar features with the multivariate normal: independence and zero correlation are equivalent, and correlation is efficiently modeled by interaction between different variables. We discuss this distribution in the framework of the exponential family. We prove that the model can estimate not only pairwise interactions between two variables but is also capable of modeling higher-order interactions. This allows creating a version of CMA-ES that can efficiently accommodate discrete variables. We provide the corresponding algorithm and conclude. |
Tasks | |
Published | 2018-12-27 |
URL | http://arxiv.org/abs/1812.11859v2 |
PDF | http://arxiv.org/pdf/1812.11859v2.pdf |
PWC | https://paperswithcode.com/paper/a-discrete-version-of-cma-es |
Repo | |
Framework | |
Active Ranking with Subset-wise Preferences
Title | Active Ranking with Subset-wise Preferences |
Authors | Aadirupa Saha, Aditya Gopalan |
Abstract | We consider the problem of probably approximately correct (PAC) ranking of $n$ items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of $k$ items and observes stochastic feedback indicating preference information of the winner (most preferred) item of the chosen subset drawn according to a Plackett-Luce (PL) subset choice model unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of the $n$ items with probability at least $1 - \delta$. When the feedback in each subset round is a single Plackett-Luce-sampled item, we show $(\epsilon, \delta)$-PAC algorithms with a sample complexity of $O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$ rounds, which we establish as being order-optimal by exhibiting a matching sample complexity lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$; this shows that there is essentially no improvement possible from the pairwise comparisons setting ($k = 2$). When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case. This again turns out to be order-wise unimprovable across the class of symmetric ranking algorithms. Our algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates that have been used in prior work. We report results of numerical experiments that corroborate our findings. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.10321v1 |
PDF | http://arxiv.org/pdf/1810.10321v1.pdf |
PWC | https://paperswithcode.com/paper/active-ranking-with-subset-wise-preferences |
Repo | |
Framework | |
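The Plackett-Luce feedback model named in the abstract above can be illustrated with a minimal sketch. It assumes the standard PL definition, in which item i wins a subset with probability proportional to its score; the function names and the score values are hypothetical and are not taken from the paper, and the sketch shows only the feedback model, not the authors' ranking algorithms.

```python
import random

def sample_pl_winner(scores, subset, rng):
    """Draw the winner of `subset` under a Plackett-Luce model.

    Item i wins with probability scores[i] / sum(scores[j] for j in subset).
    """
    weights = [scores[i] for i in subset]
    return rng.choices(subset, weights=weights, k=1)[0]

def sample_pl_top_m(scores, subset, m, rng):
    """Draw a top-m ranking by sampling winners without replacement."""
    remaining = list(subset)
    ranking = []
    for _ in range(min(m, len(remaining))):
        winner = sample_pl_winner(scores, remaining, rng)
        ranking.append(winner)
        remaining.remove(winner)
    return ranking

rng = random.Random(0)
scores = [5.0, 1.0, 1.0, 1.0]  # hypothetical PL parameters
subset = [0, 1, 2]             # a subset of size k = 3

# Winner feedback: item 0 should win about 5 / (5 + 1 + 1) of the rounds.
rounds = 10000
wins = sum(sample_pl_winner(scores, subset, rng) == 0 for _ in range(rounds))
empirical = wins / rounds

# Top-m feedback with m = 2.
top2 = sample_pl_top_m(scores, subset, 2, rng)
```

Top-m feedback returns m ordered items per round rather than one, which is the extra information behind the m-wise reduction in sample complexity that the abstract reports.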
Robust Optimization over Multiple Domains
Title | Robust Optimization over Multiple Domains |
Authors | Qi Qian, Shenghuo Zhu, Jiasheng Tang, Rong Jin, Baigui Sun, Hao Li |
Abstract | In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have the corresponding model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-recognition model should be applicable to hand-written digits, house numbers, car plates, etc. Therefore, an ideal model for cloud computing has to perform well on each applicable domain. To address this new challenge from cloud computing, we develop a framework of robust optimization over multiple domains. In lieu of minimizing the empirical risk, we aim to learn a model optimized to the adversarial distribution over multiple domains. Hence, we propose to learn the model and the adversarial distribution simultaneously with a stochastic algorithm for efficiency. Theoretically, we analyze the convergence rate for convex and non-convex models. To the best of our knowledge, we are the first to study the convergence rate of learning a robust non-convex model with a practical algorithm. Furthermore, we demonstrate that the robustness of the framework and the convergence rate can be further enhanced by appropriate regularizers over the adversarial distribution. The empirical study on real-world fine-grained visual categorization and digit recognition tasks verifies the effectiveness and efficiency of the proposed framework. |
Tasks | Fine-Grained Visual Categorization |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07588v2 |
PDF | http://arxiv.org/pdf/1805.07588v2.pdf |
PWC | https://paperswithcode.com/paper/robust-optimization-over-multiple-domains |
Repo | |
Framework | |
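The idea of learning a model together with an adversarial distribution over domains can be sketched on a toy problem. This is not the paper's algorithm: it assumes one-dimensional quadratic losses per domain, full (non-stochastic) gradients, a multiplicative-weights update for the distribution, and hand-picked step sizes. The function name `robust_fit` and all constants are hypothetical.

```python
import math

def robust_fit(centers, steps=2000, lr_w=0.1, lr_p=0.05):
    """Toy min-max fit: minimise over w the worst-case mixture of domain losses.

    Domain k has loss (w - centers[k])**2. The adversarial distribution p
    over domains is raised on high-loss domains by a multiplicative-weights
    step, while w takes a gradient step on the p-weighted loss.
    """
    w = 0.0
    p = [1.0 / len(centers)] * len(centers)
    for _ in range(steps):
        losses = [(w - c) ** 2 for c in centers]
        # Descent step on the model for the current adversarial weights.
        grad_w = sum(pk * 2.0 * (w - c) for pk, c in zip(p, centers))
        w_new = w - lr_w * grad_w
        # Ascent step on the distribution: upweight domains with high loss.
        unnorm = [pk * math.exp(lr_p * lk) for pk, lk in zip(p, losses)]
        total = sum(unnorm)
        p = [u / total for u in unnorm]
        w = w_new
    return w, p

# Three hypothetical domains; the worst-case optimum lies midway between
# the two extreme centers (w = 2), and the middle domain gets ~zero weight.
w, p = robust_fit([0.0, 1.0, 4.0])
```

Plain empirical risk minimisation on the same three domains would settle at the mean of the centers (about 1.67) and leave the domain at 4.0 with the largest loss; the adversarial weighting instead equalises the loss on the two hardest domains.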
Learning from Chunk-based Feedback in Neural Machine Translation
Title | Learning from Chunk-based Feedback in Neural Machine Translation |
Authors | Pavel Petrushkov, Shahram Khadivi, Evgeny Matusov |
Abstract | We empirically investigate learning from partial feedback in neural machine translation (NMT), where partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute. |
Tasks | Machine Translation |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07169v1 |
PDF | http://arxiv.org/pdf/1806.07169v1.pdf |
PWC | https://paperswithcode.com/paper/learning-from-chunk-based-feedback-in-neural |
Repo | |
Framework | |
Atherosclerotic carotid plaques on panoramic imaging: an automatic detection using deep learning with small dataset
Title | Atherosclerotic carotid plaques on panoramic imaging: an automatic detection using deep learning with small dataset |
Authors | Lazar Kats, Marilena Vered, Ayelet Zlotogorski-Hurvitz, Itai Harpaz |
Abstract | Stroke is the second most frequent cause of death worldwide, with a considerable economic burden on health systems. In about 15% of strokes, atherosclerotic carotid plaques (ACPs) constitute the main etiological factor. Early detection of ACPs may play a key role in preventing strokes by managing the patient prior to the occurrence of the damage. ACPs can be detected on panoramic images. As these are among the most common images taken in routine dental practice, they can be used as a source of available data for computerized methods of automatic detection in order to significantly increase timely diagnosis of ACPs. Recently, there has been a definite breakthrough in the field of analysis of medical images due to the use of deep learning based on neural networks. These methods, however, have barely been used in dentistry. In this study we used the Faster Region-based Convolutional Network (Faster R-CNN) for deep learning. We aimed to assess the operation of the algorithm on a small database of 65 panoramic images. Due to the small amount of available training data, we had to use data augmentation by changing the brightness and randomly flipping and rotating cropped regions of interest at multiple angles. Receiver Operating Characteristic (ROC) analysis was performed to calculate the accuracy of detection. ACP was detected with a sensitivity of 75%, specificity of 80% and an accuracy of 83%. The ROC analysis showed a significant Area Under Curve (AUC) difference from 0.5. Our novelty lies in that we have shown the efficiency of the Faster R-CNN algorithm in detecting ACPs on routine panoramic images based on a small database. There is a need to further improve the application of the algorithm to the level of introducing this methodology in routine dental practice in order to enable us to prevent stroke events. |
Tasks | Data Augmentation |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08093v1 |
PDF | http://arxiv.org/pdf/1808.08093v1.pdf |
PWC | https://paperswithcode.com/paper/atherosclerotic-carotid-plaques-on-panoramic |
Repo | |
Framework | |
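For readers unfamiliar with the detection metrics quoted in the abstract above, the following sketch shows how sensitivity, specificity and accuracy are conventionally computed from confusion-matrix counts. The counts are hypothetical and are not the study's data; the function name is likewise an assumption made for illustration.

```python
def detection_metrics(tp, fn, tn, fp):
    """Standard binary-detection metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only.
sens, spec, acc = detection_metrics(tp=8, fn=2, tn=9, fp=1)
```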
Secure Deep Learning Engineering: A Software Quality Assurance Perspective
Title | Secure Deep Learning Engineering: A Software Quality Assurance Perspective |
Authors | Lei Ma, Felix Juefei-Xu, Minhui Xue, Qiang Hu, Sen Chen, Bo Li, Yang Liu, Jianjun Zhao, Jianxiong Yin, Simon See |
Abstract | Over the past decades, deep learning (DL) systems have achieved tremendous success and gained great popularity in various applications, such as intelligent machines, image processing, speech processing, and medical diagnostics. Deep neural networks are the key driving force behind this recent success, but they still seem to be a magic black box lacking interpretability and understanding. This brings up many open safety and security issues with enormous and urgent demands on rigorous methodologies and engineering practice for quality enhancement. A plethora of studies have shown that state-of-the-art DL systems suffer from defects and vulnerabilities that can lead to severe loss and tragedies, especially when applied to real-world safety-critical applications. In this paper, we perform a large-scale study and construct a paper repository of 223 works relevant to the quality assurance, security, and interpretation of deep learning. We, from a software quality assurance perspective, pinpoint challenges and future opportunities towards universal secure deep learning engineering. We hope this work and the accompanying paper repository can pave the path for the software engineering community towards addressing the pressing industrial demand for secure intelligent applications. |
Tasks | |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04538v1 |
PDF | http://arxiv.org/pdf/1810.04538v1.pdf |
PWC | https://paperswithcode.com/paper/secure-deep-learning-engineering-a-software |
Repo | |
Framework | |
Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation
Title | Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation |
Authors | Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, Maarten de Rijke |
Abstract | Most previous work on outfit recommendation focuses on designing visual features to enhance recommendations. Existing work neglects user comments of fashion items, which have been proved to be effective in generating explanations along with better recommendation results. We propose a novel neural network framework, neural outfit recommendation (NOR), that simultaneously provides outfit recommendations and generates abstractive comments. NOR consists of two parts: outfit matching and comment generation. For outfit matching, we propose a convolutional neural network with a mutual attention mechanism to extract visual features. The visual features are then decoded into a rating score for the matching prediction. For abstractive comment generation, we propose a gated recurrent neural network with a cross-modality attention mechanism to transform visual features into a concise sentence. The two parts are jointly trained based on a multi-task learning framework in an end-to-end back-propagation paradigm. Extensive experiments conducted on an existing dataset and a collected real-world dataset show NOR achieves significant improvements over state-of-the-art baselines for outfit recommendation. Meanwhile, our generated comments achieve impressive ROUGE and BLEU scores in comparison to human-written comments. The generated comments can be regarded as explanations for the recommendation results. We release the dataset and code to facilitate future research. |
Tasks | Multi-Task Learning |
Published | 2018-06-23 |
URL | http://arxiv.org/abs/1806.08977v3 |
PDF | http://arxiv.org/pdf/1806.08977v3.pdf |
PWC | https://paperswithcode.com/paper/explainable-outfit-recommendation-with-joint |
Repo | |
Framework | |
Interdependent Gibbs Samplers
Title | Interdependent Gibbs Samplers |
Authors | Mark Kozdoba, Shie Mannor |
Abstract | Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains. Yet, it is also well known that Gibbs random walks usually have bottlenecks, sometimes termed “local maxima”, and thus samplers often return suboptimal solutions. In this paper we introduce a variation of the Gibbs sampler which yields high likelihood solutions significantly more often than the regular Gibbs sampler. Specifically, we show that combining multiple samplers, with certain dependence (coupling) between them, results in higher likelihood solutions. This side-steps the well known issue of identifiability, which has been the obstacle to combining samplers in previous work. We evaluate the approach on a Latent Dirichlet Allocation model, and also on HMMs, where precise computation of likelihoods and comparisons to the standard EM algorithm are possible. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.03958v2 |
PDF | http://arxiv.org/pdf/1804.03958v2.pdf |
PWC | https://paperswithcode.com/paper/interdependent-gibbs-samplers |
Repo | |
Framework | |
Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care
Title | Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care |
Authors | Bin Liu, Ying Li, Soumya Ghosh, Zhaonan Sun, Kenney Ng, Jianying Hu |
Abstract | Type 2 diabetes mellitus (T2DM) is a chronic disease that often results in multiple complications. Risk prediction and profiling of T2DM complications is critical for healthcare professionals to design personalized treatment plans for patients in diabetes care for improved outcomes. In this paper, we study the risk of developing complications after the initial T2DM diagnosis from longitudinal patient records. We propose a novel multi-task learning approach to simultaneously model multiple complications where each task corresponds to the risk modeling of one complication. Specifically, the proposed method strategically captures the relationships (1) between the risks of multiple T2DM complications, (2) between the different risk factors, and (3) between the risk factor selection patterns. The method uses coefficient shrinkage to identify an informative subset of risk factors from high-dimensional data, and uses a hierarchical Bayesian framework to allow domain knowledge to be incorporated as priors. The proposed method is favorable for healthcare applications because, in addition to improved prediction performance, relationships among the different risks and risk factors are also identified. Extensive experimental results on a large electronic medical claims database show that the proposed method outperforms state-of-the-art models by a significant margin. Furthermore, we show that the risk associations learned and the risk factors identified lead to meaningful clinical insights. |
Tasks | Multi-Task Learning |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06476v1 |
PDF | http://arxiv.org/pdf/1802.06476v1.pdf |
PWC | https://paperswithcode.com/paper/simultaneous-modeling-of-multiple |
Repo | |
Framework | |
Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations
Title | Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations |
Authors | Qianxiao Li, Cheng Tai, Weinan E |
Abstract | We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter are approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and useful results on the approximations of stochastic gradient descent (SGD), momentum SGD and stochastic Nesterov’s accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorithms under consideration that may not be easy to obtain in a purely discrete-time setting. |
Tasks | |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01558v1 |
PDF | http://arxiv.org/pdf/1811.01558v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-modified-equations-and-dynamics-of |
Repo | |
Framework | |
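For orientation, the kind of approximation the abstract above refers to can be written out for plain SGD. The abstract itself contains no equations, so the notation below ($\eta$ for the learning rate, $f_i$ for the per-sample objectives, $\Sigma$ for the gradient covariance, $W_t$ for a Wiener process) is an assumption introduced here, following the form in which the first-order SME is usually stated.

```latex
% SGD iteration with learning rate \eta and random sample index \gamma_k:
x_{k+1} = x_k - \eta \, \nabla f_{\gamma_k}(x_k),
\qquad f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x).

% First-order stochastic modified equation (a weak approximation of the iteration):
dX_t = -\nabla f(X_t)\, dt + \bigl(\eta \, \Sigma(X_t)\bigr)^{1/2} dW_t,

% where \Sigma is the covariance of the per-sample gradients:
\Sigma(x) = \frac{1}{n} \sum_{i=1}^{n}
  \bigl(\nabla f(x) - \nabla f_i(x)\bigr)\bigl(\nabla f(x) - \nabla f_i(x)\bigr)^{\top}.
```

Here the discrete step $k$ corresponds to continuous time $t = k\eta$, and "weak" means that expectations of test functions of $x_k$ and $X_{k\eta}$ agree up to an error that vanishes with $\eta$.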
AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
Title | AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale |
Authors | Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu |
Abstract | AISHELL-1 is by far the largest open-source speech corpus available for Mandarin speech recognition research. It was released with a baseline system containing solid training and testing pipelines for Mandarin ASR. In AISHELL-2, 1000 hours of clean read-speech data from iOS are published, free for academic usage. On top of the AISHELL-2 corpus, an improved recipe is developed and released, containing key components for industrial applications, such as Chinese word segmentation, flexible vocabulary expansion, and phone set transformation. Pipelines support various state-of-the-art techniques, such as time-delayed neural networks and the Lattice-Free MMI objective function. In addition, we also release dev and test data from other channels (Android and Mic). For the research community, we hope that the AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR. For industry, we hope the AISHELL-2 recipe can be a helpful reference for building meaningful industrial systems and products. |
Tasks | Chinese Word Segmentation, Speech Recognition, Transfer Learning |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10583v2 |
PDF | http://arxiv.org/pdf/1808.10583v2.pdf |
PWC | https://paperswithcode.com/paper/aishell-2-transforming-mandarin-asr-research |
Repo | |
Framework | |
Asymmetric kernel in Gaussian Processes for learning target variance
Title | Asymmetric kernel in Gaussian Processes for learning target variance |
Authors | Silvia L. Pintea, Jan C. van Gemert, Arnold W. M. Smeulders |
Abstract | This work incorporates the multi-modality of the data distribution into a Gaussian Process regression model. We approach the problem from a discriminative perspective by learning, jointly over the training data, the target space variance in the neighborhood of a certain sample through metric learning. We start by using data centers rather than all training samples. Subsequently, each center selects an individualized kernel metric. This enables each center to adjust the kernel space in its vicinity in correspondence with the topology of the targets — a multi-modal approach. We additionally add descriptiveness by allowing each center to learn a precision matrix. We demonstrate empirically the reliability of the model. |
Tasks | Gaussian Processes, Metric Learning |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06952v1 |
PDF | http://arxiv.org/pdf/1803.06952v1.pdf |
PWC | https://paperswithcode.com/paper/asymmetric-kernel-in-gaussian-processes-for |
Repo | |
Framework | |