October 16, 2019

3226 words 16 mins read

Paper Group ANR 1056

Causal Reasoning for Algorithmic Fairness. Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources. An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking. A discrete version of CMA-ES. Active Ranking with Subset-wise Preferences. Robust Optimization over Multiple Domains. Learning from Chunk-based …

Causal Reasoning for Algorithmic Fairness

Title Causal Reasoning for Algorithmic Fairness
Authors Joshua R. Loftus, Chris Russell, Matt J. Kusner, Ricardo Silva
Abstract In this work, we argue for the importance of causal reasoning in creating fair algorithms for decision making. We give a review of existing approaches to fairness, describe work in causality necessary for the understanding of causal approaches, argue why causality is necessary for any approach that wishes to be fair, and give a detailed analysis of the many recent approaches to causality-based fairness.
Tasks Decision Making
Published 2018-05-15
URL http://arxiv.org/abs/1805.05859v1
PDF http://arxiv.org/pdf/1805.05859v1.pdf
PWC https://paperswithcode.com/paper/causal-reasoning-for-algorithmic-fairness
Repo
Framework

Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources

Title Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical Resources
Authors Ivan Vulić, Goran Glavaš, Nikola Mrkšić, Anna Korhonen
Abstract Word vector specialisation (also known as retrofitting) is a portable, light-weight approach to fine-tuning arbitrary distributional word vector spaces by injecting external knowledge from rich lexical resources such as WordNet. By design, these post-processing methods only update the vectors of words occurring in external lexicons, leaving the representations of all unseen words intact. In this paper, we show that constraint-driven vector space specialisation can be extended to unseen words. We propose a novel post-specialisation method that: a) preserves the useful linguistic knowledge for seen words; while b) propagates this external signal to unseen words in order to improve their vector representations as well. Our post-specialisation approach makes the non-linear specialisation function explicit, in the form of a deep neural network that learns to predict specialised vectors from their original distributional counterparts. The learned function is then used to specialise vectors of unseen words. This approach, applicable to any post-processing model, yields considerable gains over the initial specialisation models both in intrinsic word similarity tasks and in two downstream tasks: dialogue state tracking and lexical text simplification. The positive effects persist across three languages, demonstrating the importance of specialising the full vocabulary of distributional word vector spaces.
Tasks Dialogue State Tracking, Text Simplification
Published 2018-05-08
URL http://arxiv.org/abs/1805.03228v1
PDF http://arxiv.org/pdf/1805.03228v1.pdf
PWC https://paperswithcode.com/paper/post-specialisation-retrofitting-vectors-of
Repo
Framework
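To make the post-specialisation idea concrete, here is a minimal sketch under my own assumptions about data shapes, with placeholder vectors: a non-linear regressor is fit to map the original distributional vectors of seen words to their specialised counterparts, and is then applied to words unseen in the lexical resource. The dimensionality, network size, and synthetic vectors are illustrative only, not the authors' setup.

```python
# Minimal sketch (not the authors' code): learn a mapping from original
# distributional vectors to specialised vectors on seen words, then apply it
# to words unseen in the lexical resource.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
dim = 50                                 # toy embedding dimensionality (assumption)
X_seen = rng.normal(size=(1000, dim))    # original vectors of words found in the lexicon
Y_seen = np.tanh(X_seen @ rng.normal(size=(dim, dim)))  # placeholder "specialised" vectors
X_unseen = rng.normal(size=(200, dim))   # original vectors of unseen words

# Deep non-linear regression: predict specialised vectors from original ones.
mlp = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0)
mlp.fit(X_seen, Y_seen)

# Post-specialisation: propagate the external lexical signal to unseen words.
Y_unseen = mlp.predict(X_unseen)
print(Y_unseen.shape)                    # (200, 50)
```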

An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking

Title An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking
Authors Puyang Xu, Qi Hu
Abstract We highlight a practical yet rarely discussed problem in dialogue state tracking (DST), namely handling unknown slot values. Previous approaches generally assume predefined candidate lists and thus are not designed to output unknown values, especially when the spoken language understanding (SLU) module is absent as in many end-to-end (E2E) systems. We describe in this paper an E2E architecture based on the pointer network (PtrNet) that can effectively extract unknown slot values while still obtaining state-of-the-art accuracy on the standard DSTC2 benchmark. We also provide extensive empirical evidence to show that tracking unknown values can be challenging, and that our approach can bring significant improvement with the help of an effective feature dropout technique.
Tasks Dialogue State Tracking, Spoken Language Understanding
Published 2018-05-03
URL http://arxiv.org/abs/1805.01555v1
PDF http://arxiv.org/pdf/1805.01555v1.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-approach-for-handling-unknown
Repo
Framework
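As a rough illustration of why a pointer mechanism can emit values outside any predefined candidate list, the toy sketch below scores each input position against a query vector and copies the highest-scoring token as the slot value. The utterance, dimensions, and random vectors are all hypothetical; this shows only the copying idea, not the paper's architecture.

```python
# Toy pointer mechanism: attend over input positions and copy a token as the
# slot value, so values absent from any candidate list can still be produced.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
tokens = ["i", "want", "cheap", "thai", "food"]   # hypothetical user utterance
enc = rng.normal(size=(len(tokens), 8))           # encoder state per token (random stand-ins)
query = rng.normal(size=8)                        # query state for a slot such as "food type"

scores = enc @ query                              # dot-product pointer scores
attn = softmax(scores)                            # distribution over input positions
predicted_value = tokens[int(attn.argmax())]      # copy the pointed-to token
print(predicted_value)
```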

A discrete version of CMA-ES

Title A discrete version of CMA-ES
Authors Eric Benhamou, Jamal Atif, Rida Laraki
Abstract Modern machine learning uses more and more advanced optimization techniques to find optimal hyperparameters. Whenever the objective function is non-convex, non-continuous, and has potentially multiple local minima, standard gradient descent optimization methods fail. A last-resort and very different approach is to assume that the optima, not necessarily unique, are distributed according to some distribution, and to adapt that distribution iteratively based on the points tested. These strategies, which originated in the early 1960s under the name Evolution Strategies (ES), have culminated in CMA-ES (Covariance Matrix Adaptation ES). CMA-ES relies on a multivariate normal distribution and is considered state of the art for general optimization problems. However, it is far from optimal for discrete variables. In this paper, we extend the method to multivariate correlated binomial distributions. For such a distribution, we show that it shares key features with the multivariate normal: independence and uncorrelatedness are equivalent, and correlation is efficiently modeled by interactions between variables. We discuss this distribution in the framework of the exponential family. We prove that the model can estimate not only pairwise interactions between variables but is also capable of modeling higher-order interactions. This allows creating a version of CMA-ES that can efficiently accommodate discrete variables. We provide the corresponding algorithm and conclude.
Tasks
Published 2018-12-27
URL http://arxiv.org/abs/1812.11859v2
PDF http://arxiv.org/pdf/1812.11859v2.pdf
PWC https://paperswithcode.com/paper/a-discrete-version-of-cma-es
Repo
Framework
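The abstract describes an estimation-of-distribution loop for discrete variables. The sketch below conveys only that iterative scheme, using an independent-Bernoulli sampling distribution on a toy objective; the paper's actual contribution, modeling correlated multivariate binomials within the exponential family, is not implemented here.

```python
# Highly simplified evolution-strategy loop over binary variables: sample from a
# product-of-Bernoullis distribution and adapt the bit probabilities toward the
# best samples. This is an independent-bit caricature, not the paper's correlated
# binomial model.
import numpy as np

def onemax(x):                        # toy objective to maximise: number of ones
    return int(x.sum())

rng = np.random.default_rng(0)
n_bits, pop, elite, lr = 30, 40, 10, 0.3
p = np.full(n_bits, 0.5)              # current sampling distribution

for _ in range(50):
    X = (rng.random((pop, n_bits)) < p).astype(int)        # sample a population
    best = X[np.argsort([-onemax(x) for x in X])[:elite]]  # keep the elite samples
    p = (1 - lr) * p + lr * best.mean(axis=0)              # adapt the distribution
    p = p.clip(0.05, 0.95)                                 # keep some exploration

print(onemax((p > 0.5).astype(int)))  # typically 30 on this toy problem
```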

Active Ranking with Subset-wise Preferences

Title Active Ranking with Subset-wise Preferences
Authors Aadirupa Saha, Aditya Gopalan
Abstract We consider the problem of probably approximately correct (PAC) ranking $n$ items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of $k$ items and observes stochastic feedback indicating preference information of the winner (most preferred) item of the chosen subset drawn according to a Plackett-Luce (PL) subset choice model unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of the $n$ items with probability at least $1 - \delta$. When the feedback in each subset round is a single Plackett-Luce-sampled item, we show $(\epsilon, \delta)$-PAC algorithms with a sample complexity of $O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$ rounds, which we establish as being order-optimal by exhibiting a matching sample complexity lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$; this shows that there is essentially no improvement possible over the pairwise comparisons setting ($k = 2$). When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case. This again turns out to be order-wise unimprovable across the class of symmetric ranking algorithms. Our algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates that have been used in prior work. We report results of numerical experiments that corroborate our findings.
Tasks
Published 2018-10-23
URL http://arxiv.org/abs/1810.10321v1
PDF http://arxiv.org/pdf/1810.10321v1.pdf
PWC https://paperswithcode.com/paper/active-ranking-with-subset-wise-preferences
Repo
Framework
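The feedback model in the abstract is Plackett-Luce winner feedback from chosen subsets. The sketch below simulates only that feedback loop with a naive win-counting learner on made-up scores; the paper's PAC algorithms and pivot trick are not reproduced here.

```python
# Simulated Plackett-Luce subset feedback: item i wins a queried subset S with
# probability theta_i / sum_{j in S} theta_j. A naive learner just counts wins.
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.1, 0.4, 0.2, 1.0, 0.6])   # latent PL scores (unknown to the learner)

def winner(subset):
    w = theta[subset]
    return subset[rng.choice(len(subset), p=w / w.sum())]

wins = np.zeros(len(theta))
for _ in range(5000):
    S = rng.choice(len(theta), size=3, replace=False)   # learner queries a size-k subset
    wins[winner(S)] += 1

print(np.argsort(-wins))   # empirical ranking; tends toward np.argsort(-theta)
```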

Robust Optimization over Multiple Domains

Title Robust Optimization over Multiple Domains
Authors Qi Qian, Shenghuo Zhu, Jiasheng Tang, Rong Jin, Baigui Sun, Hao Li
Abstract In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have its own model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-recognition model should be applicable to hand-written digits, house numbers, car plates, etc. Therefore, an ideal model for cloud computing has to perform well in each applicable domain. To address this new challenge from cloud computing, we develop a framework of robust optimization over multiple domains. In lieu of minimizing the empirical risk, we aim to learn a model optimized for the adversarial distribution over multiple domains. Hence, we propose to learn the model and the adversarial distribution simultaneously with a stochastic algorithm for efficiency. Theoretically, we analyze the convergence rate for convex and non-convex models. To the best of our knowledge, this is the first study of the convergence rate of learning a robust non-convex model with a practical algorithm. Furthermore, we demonstrate that the robustness of the framework and the convergence rate can be further enhanced by appropriate regularizers over the adversarial distribution. The empirical study on real-world fine-grained visual categorization and digit recognition tasks verifies the effectiveness and efficiency of the proposed framework.
Tasks Fine-Grained Visual Categorization
Published 2018-05-19
URL http://arxiv.org/abs/1805.07588v2
PDF http://arxiv.org/pdf/1805.07588v2.pdf
PWC https://paperswithcode.com/paper/robust-optimization-over-multiple-domains
Repo
Framework
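A minimal min-max sketch, under my own assumptions about the losses and update rules rather than the authors' algorithm: the model takes a stochastic gradient step on the mixture loss while the adversarial distribution over domains takes an exponentiated-gradient ascent step toward the worst-performing domains.

```python
# Toy robust optimization over domains: minimise the loss under an adversarial
# distribution over three synthetic regression domains.
import numpy as np

rng = np.random.default_rng(0)
d = 5
# Synthetic domains: shared feature space, slightly different regression targets.
Xs = [rng.normal(size=(200, d)) for _ in range(3)]
ys = [X @ (np.ones(d) + 0.3 * rng.normal(size=d)) for X in Xs]

w = np.zeros(d)              # shared model parameters
p = np.ones(3) / 3           # adversarial distribution over domains
eta_w, eta_p = 0.05, 0.5

for _ in range(300):
    losses = np.array([np.mean((X @ w - y) ** 2) for X, y in zip(Xs, ys)])
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in zip(Xs, ys)]
    w -= eta_w * sum(pi * g for pi, g in zip(p, grads))   # descend on the weighted loss
    p *= np.exp(eta_p * losses)                           # ascend toward hard domains
    p /= p.sum()                                          # project back onto the simplex

print(np.round(p, 2), np.round(losses, 3))   # weights shift toward the harder domains
```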

Learning from Chunk-based Feedback in Neural Machine Translation

Title Learning from Chunk-based Feedback in Neural Machine Translation
Authors Pavel Petrushkov, Shahram Khadivi, Evgeny Matusov
Abstract We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely on the basis of chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence-based feedback by up to 2.61% BLEU absolute.
Tasks Machine Translation
Published 2018-06-19
URL http://arxiv.org/abs/1806.07169v1
PDF http://arxiv.org/pdf/1806.07169v1.pdf
PWC https://paperswithcode.com/paper/learning-from-chunk-based-feedback-in-neural
Repo
Framework
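One plausible way to turn a highlighted chunk into a training signal, which is my reading of the setup rather than the authors' exact objective, is to weight the per-token log-likelihood of a model-generated translation by whether each token falls inside the chunk the user marked as correct:

```python
# Chunk-level feedback as per-token weights on the log-likelihood of a generated
# translation (illustrative numbers; tokens, probabilities, and weights are made up).
import numpy as np

tokens      = ["das", "ist", "ein", "guter", "plan"]
log_probs   = np.log(np.array([0.4, 0.5, 0.3, 0.2, 0.6]))  # model prob. of each emitted token
highlighted = np.array([0, 0, 1, 1, 0])                    # user marked "ein guter" as correct

reward = highlighted.astype(float)                          # 1 inside the chunk, 0 outside
loss = -(reward * log_probs).sum() / max(reward.sum(), 1)   # reinforce only the approved chunk
print(loss)
```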

Atherosclerotic carotid plaques on panoramic imaging: an automatic detection using deep learning with small dataset

Title Atherosclerotic carotid plaques on panoramic imaging: an automatic detection using deep learning with small dataset
Authors Lazar Kats, Marilena Vered, Ayelet Zlotogorski-Hurvitz, Itai Harpaz
Abstract Stroke is the second most frequent cause of death worldwide, with a considerable economic burden on health systems. In about 15% of strokes, atherosclerotic carotid plaques (ACPs) constitute the main etiological factor. Early detection of ACPs may play a key role in preventing strokes by managing the patient before the damage occurs. ACPs can be detected on panoramic images. As these are among the most common images taken in routine dental practice, they can be used as a readily available source of data for computerized methods of automatic detection, in order to significantly increase timely diagnosis of ACPs. Recently, there has been a definite breakthrough in the analysis of medical images due to the use of deep learning based on neural networks. These methods, however, have barely been used in dentistry. In this study we used the Faster Region-based Convolutional Network (Faster R-CNN) for deep learning. We aimed to assess the operation of the algorithm on a small database of 65 panoramic images. Due to the small amount of available training data, we had to use data augmentation, changing the brightness and randomly flipping and rotating cropped regions of interest through multiple angles. Receiver Operating Characteristic (ROC) analysis was performed to calculate the accuracy of detection. ACPs were detected with a sensitivity of 75%, a specificity of 80%, and an accuracy of 83%. The ROC analysis showed an Area Under the Curve (AUC) significantly different from 0.5. Our novelty lies in showing the efficiency of the Faster R-CNN algorithm in detecting ACPs on routine panoramic images using a small database. Further work is needed to bring the algorithm to a level where this methodology can be introduced into routine dental practice, enabling the prevention of stroke events.
Tasks Data Augmentation
Published 2018-08-24
URL http://arxiv.org/abs/1808.08093v1
PDF http://arxiv.org/pdf/1808.08093v1.pdf
PWC https://paperswithcode.com/paper/atherosclerotic-carotid-plaques-on-panoramic
Repo
Framework
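The augmentation the abstract mentions, brightness changes plus random flips and rotations of cropped regions of interest, could look roughly like the sketch below; the parameter ranges and file name are my own guesses, not values from the paper.

```python
# Sketch of simple ROI augmentation for a small panoramic-image dataset:
# brightness jitter, random horizontal mirroring, and a small random rotation.
import random
from PIL import Image, ImageEnhance, ImageOps

def augment(roi: Image.Image) -> Image.Image:
    out = ImageEnhance.Brightness(roi).enhance(random.uniform(0.7, 1.3))  # brightness jitter
    if random.random() < 0.5:
        out = ImageOps.mirror(out)                                        # random flip
    return out.rotate(random.uniform(-15, 15), expand=False)              # small rotation

# roi = Image.open("cropped_carotid_region.png")   # hypothetical crop from a panoramic image
# samples = [augment(roi) for _ in range(20)]      # multiply a small training set
```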

Secure Deep Learning Engineering: A Software Quality Assurance Perspective

Title Secure Deep Learning Engineering: A Software Quality Assurance Perspective
Authors Lei Ma, Felix Juefei-Xu, Minhui Xue, Qiang Hu, Sen Chen, Bo Li, Yang Liu, Jianjun Zhao, Jianxiong Yin, Simon See
Abstract Over the past decades, deep learning (DL) systems have achieved tremendous success and gained great popularity in various applications, such as intelligent machines, image processing, speech processing, and medical diagnostics. Deep neural networks are the key driving force behind this recent success, but they still seem to be a magic black box lacking interpretability and understanding. This brings up many open safety and security issues with enormous and urgent demands on rigorous methodologies and engineering practice for quality enhancement. A plethora of studies have shown that state-of-the-art DL systems suffer from defects and vulnerabilities that can lead to severe loss and tragedies, especially when applied to real-world safety-critical applications. In this paper, we perform a large-scale study and construct a paper repository of 223 works relevant to the quality assurance, security, and interpretation of deep learning. From a software quality assurance perspective, we pinpoint challenges and future opportunities towards universal secure deep learning engineering. We hope this work and the accompanying paper repository can pave the path for the software engineering community towards addressing the pressing industrial demand for secure intelligent applications.
Tasks
Published 2018-10-10
URL http://arxiv.org/abs/1810.04538v1
PDF http://arxiv.org/pdf/1810.04538v1.pdf
PWC https://paperswithcode.com/paper/secure-deep-learning-engineering-a-software
Repo
Framework

Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation

Title Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation
Authors Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma, Maarten de Rijke
Abstract Most previous work on outfit recommendation focuses on designing visual features to enhance recommendations. Existing work neglects user comments on fashion items, which have proven effective in generating explanations along with better recommendation results. We propose a novel neural network framework, neural outfit recommendation (NOR), that simultaneously provides outfit recommendations and generates abstractive comments. NOR consists of two parts: outfit matching and comment generation. For outfit matching, we propose a convolutional neural network with a mutual attention mechanism to extract visual features. The visual features are then decoded into a rating score for the matching prediction. For abstractive comment generation, we propose a gated recurrent neural network with a cross-modality attention mechanism to transform visual features into a concise sentence. The two parts are jointly trained based on a multi-task learning framework in an end-to-end back-propagation paradigm. Extensive experiments conducted on an existing dataset and a collected real-world dataset show that NOR achieves significant improvements over state-of-the-art baselines for outfit recommendation. Meanwhile, our generated comments achieve impressive ROUGE and BLEU scores in comparison to human-written comments. The generated comments can be regarded as explanations for the recommendation results. We release the dataset and code to facilitate future research.
Tasks Multi-Task Learning
Published 2018-06-23
URL http://arxiv.org/abs/1806.08977v3
PDF http://arxiv.org/pdf/1806.08977v3.pdf
PWC https://paperswithcode.com/paper/explainable-outfit-recommendation-with-joint
Repo
Framework

Interdependent Gibbs Samplers

Title Interdependent Gibbs Samplers
Authors Mark Kozdoba, Shie Mannor
Abstract Gibbs sampling, as a model learning method, is known to produce the most accurate results available in a variety of domains, and is a de facto standard in these domains. Yet, it is also well known that Gibbs random walks usually have bottlenecks, sometimes termed “local maxima”, and thus samplers often return suboptimal solutions. In this paper we introduce a variation of the Gibbs sampler which yields high-likelihood solutions significantly more often than the regular Gibbs sampler. Specifically, we show that combining multiple samplers, with certain dependence (coupling) between them, results in higher-likelihood solutions. This side-steps the well-known issue of identifiability, which has been the obstacle to combining samplers in previous work. We evaluate the approach on a Latent Dirichlet Allocation model, and also on HMMs, where precise computation of likelihoods and comparisons to the standard EM algorithm are possible.
Tasks
Published 2018-04-11
URL http://arxiv.org/abs/1804.03958v2
PDF http://arxiv.org/pdf/1804.03958v2.pdf
PWC https://paperswithcode.com/paper/interdependent-gibbs-samplers
Repo
Framework

Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care

Title Simultaneous Modeling of Multiple Complications for Risk Profiling in Diabetes Care
Authors Bin Liu, Ying Li, Soumya Ghosh, Zhaonan Sun, Kenney Ng, Jianying Hu
Abstract Type 2 diabetes mellitus (T2DM) is a chronic disease that often results in multiple complications. Risk prediction and profiling of T2DM complications is critical for healthcare professionals to design personalized treatment plans for patients in diabetes care for improved outcomes. In this paper, we study the risk of developing complications after the initial T2DM diagnosis from longitudinal patient records. We propose a novel multi-task learning approach to simultaneously model multiple complications where each task corresponds to the risk modeling of one complication. Specifically, the proposed method strategically captures the relationships (1) between the risks of multiple T2DM complications, (2) between the different risk factors, and (3) between the risk factor selection patterns. The method uses coefficient shrinkage to identify an informative subset of risk factors from high-dimensional data, and uses a hierarchical Bayesian framework to allow domain knowledge to be incorporated as priors. The proposed method is favorable for healthcare applications because, in addition to improved prediction performance, relationships among the different risks and risk factors are also identified. Extensive experimental results on a large electronic medical claims database show that the proposed method outperforms state-of-the-art models by a significant margin. Furthermore, we show that the risk associations learned and the risk factors identified lead to meaningful clinical insights.
Tasks Multi-Task Learning
Published 2018-02-19
URL http://arxiv.org/abs/1802.06476v1
PDF http://arxiv.org/pdf/1802.06476v1.pdf
PWC https://paperswithcode.com/paper/simultaneous-modeling-of-multiple
Repo
Framework
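As a loose analogy only, since the paper uses a hierarchical Bayesian model rather than the estimator below, multi-task coefficient shrinkage with a shared sparsity pattern can be illustrated with scikit-learn's MultiTaskLasso: coefficients are selected jointly across tasks, mimicking joint risk-factor selection over multiple complications.

```python
# Analogy for joint risk-factor selection across complications: MultiTaskLasso
# shrinks coefficients with a sparsity pattern shared across tasks.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))            # 40 candidate risk factors (synthetic)
W = np.zeros((40, 3))                     # 3 "complications" share 5 informative factors
W[:5] = rng.normal(size=(5, 3))
Y = X @ W + 0.1 * rng.normal(size=(500, 3))

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.abs(model.coef_).sum(axis=0) > 1e-6)
print(selected)                           # indices of jointly selected risk factors
```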

Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations

Title Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations
Authors Qianxiao Li, Cheng Tai, Weinan E
Abstract We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter are approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and useful results on the approximations of stochastic gradient descent (SGD), momentum SGD, and stochastic Nesterov’s accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorithms under consideration that may not be easy to obtain in a purely discrete-time setting.
Tasks
Published 2018-11-05
URL http://arxiv.org/abs/1811.01558v1
PDF http://arxiv.org/pdf/1811.01558v1.pdf
PWC https://paperswithcode.com/paper/stochastic-modified-equations-and-dynamics-of
Repo
Framework
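For plain SGD, the first-order stochastic modified equation has the following schematic form (my paraphrase of the framework; see the paper for the precise hypotheses and weak-approximation error bounds):

```latex
% SGD iterates x_{k+1} = x_k - \eta \nabla f_{\gamma_k}(x_k), with \gamma_k a random
% sample index, are approximated in the weak sense by the small-noise SDE
\[
  \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t
                  + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t,
  \qquad
  \Sigma(x) = \operatorname{Cov}\!\big(\nabla f_{\gamma}(x)\big),
\]
% so that X_{k\eta} tracks x_k with weak error of order \eta over bounded time horizons.
```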

AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

Title AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale
Authors Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu
Abstract AISHELL-1 is by far the largest open-source speech corpus available for Mandarin speech recognition research. It was released with a baseline system containing solid training and testing pipelines for Mandarin ASR. In AISHELL-2, 1000 hours of clean read-speech data recorded on iOS devices is published, free for academic usage. On top of the AISHELL-2 corpus, an improved recipe is developed and released, containing key components for industrial applications, such as Chinese word segmentation, flexible vocabulary expansion, and phone set transformation. The pipelines support various state-of-the-art techniques, such as time-delay neural networks and the Lattice-Free MMI objective function. In addition, we also release dev and test data from other channels (Android and Mic). For the research community, we hope that the AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR. For industry, we hope the AISHELL-2 recipe can be a helpful reference for building meaningful industrial systems and products.
Tasks Chinese Word Segmentation, Speech Recognition, Transfer Learning
Published 2018-08-31
URL http://arxiv.org/abs/1808.10583v2
PDF http://arxiv.org/pdf/1808.10583v2.pdf
PWC https://paperswithcode.com/paper/aishell-2-transforming-mandarin-asr-research
Repo
Framework

Asymmetric kernel in Gaussian Processes for learning target variance

Title Asymmetric kernel in Gaussian Processes for learning target variance
Authors Silvia L. Pintea, Jan C. van Gemert, Arnold W. M. Smeulders
Abstract This work incorporates the multi-modality of the data distribution into a Gaussian Process regression model. We approach the problem from a discriminative perspective by learning, jointly over the training data, the target space variance in the neighborhood of a certain sample through metric learning. We start by using data centers rather than all training samples. Subsequently, each center selects an individualized kernel metric. This enables each center to adjust the kernel space in its vicinity in correspondence with the topology of the targets — a multi-modal approach. We additionally add descriptiveness by allowing each center to learn a precision matrix. We demonstrate empirically the reliability of the model.
Tasks Gaussian Processes, Metric Learning
Published 2018-03-19
URL http://arxiv.org/abs/1803.06952v1
PDF http://arxiv.org/pdf/1803.06952v1.pdf
PWC https://paperswithcode.com/paper/asymmetric-kernel-in-gaussian-processes-for
Repo
Framework
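The key idea, as I read the abstract, is that each data center carries its own learned metric, giving a locally adapted kernel of the form k(x, c) = exp(-(x - c)^T M_c (x - c)). The sketch below uses random symmetric positive-definite matrices as stand-ins for the learned per-center precision matrices; it illustrates the kernel form, not the authors' model.

```python
# Per-center Mahalanobis kernel: each center has its own precision matrix M_c,
# so the kernel shape adapts to the local topology of the targets.
import numpy as np

rng = np.random.default_rng(0)
d, n_centers = 2, 4
centers = rng.normal(size=(n_centers, d))
# Random SPD matrices standing in for the learned per-center metrics.
Ms = [A @ A.T + np.eye(d) for A in rng.normal(size=(n_centers, d, d))]

def k(x, c_idx):
    diff = x - centers[c_idx]
    return float(np.exp(-diff @ Ms[c_idx] @ diff))

x = rng.normal(size=d)
print([round(k(x, j), 4) for j in range(n_centers)])   # one kernel value per center
```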