Paper Group ANR 312
Harnessing Code Switching to Transcend the Linguistic Barrier
Title | Harnessing Code Switching to Transcend the Linguistic Barrier |
Authors | Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Jaime G. Carbonell |
Abstract | Code mixing (or code switching) is a common phenomenon observed in social-media content generated by a linguistically diverse user-base. Studies show that in the Indian sub-continent, a substantial fraction of social media posts exhibit code switching. While the difficulties posed by code mixed documents to further downstream analyses are well-understood, lending visibility to code mixed documents under certain scenarios may have utility that has been previously overlooked. For instance, a document written in a mixture of multiple languages can be partially accessible to a wider audience; this could be particularly useful if a considerable fraction of the audience lacks fluency in one of the component languages. In this paper, we provide a systematic approach to sample code mixed documents leveraging a polyglot embedding based method that requires minimal supervision. In the context of the 2019 India-Pakistan conflict triggered by the Pulwama terror attack, we demonstrate an untapped potential of harnessing code mixing for human well-being: starting from an existing hostility-diffusing "hope speech" classifier solely trained on English documents, code mixed documents are utilized as a bridge to retrieve "hope speech" content written in a low-resource but widely used language, Romanized Hindi. Our proposed pipeline requires minimal supervision and holds promise in substantially reducing web moderation efforts. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11258v1 |
PDF | https://arxiv.org/pdf/2001.11258v1.pdf |
PWC | https://paperswithcode.com/paper/harnessing-code-switching-to-transcend-the |
Repo | |
Framework | |
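The sampling step the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' exact pipeline: it assumes a token-to-vector lookup `embed` from a polyglot model and per-language centroid vectors `centroids` (both hypothetical names), and flags a document as code mixed when no single language dominates its recognizable tokens.

```python
import numpy as np

def token_language_shares(tokens, embed, centroids):
    """Assign each token to the language whose centroid is nearest in the
    shared polyglot embedding space; return per-language token shares."""
    langs = list(centroids)
    counts = {lang: 0 for lang in langs}
    for tok in tokens:
        v = embed.get(tok)
        if v is None:                      # out-of-vocabulary token
            continue
        nearest = min(langs, key=lambda l: np.linalg.norm(v - centroids[l]))
        counts[nearest] += 1
    total = max(sum(counts.values()), 1)
    return {lang: c / total for lang, c in counts.items()}

def is_code_mixed(tokens, embed, centroids, max_share=0.8):
    """Sample a document as code mixed when no single language accounts
    for more than max_share of its tokens (threshold is illustrative)."""
    shares = token_language_shares(tokens, embed, centroids)
    return max(shares.values()) <= max_share
```

With such a detector, the English-only hope speech classifier can be run on the English portion of flagged documents, using them as the bridge to Romanized Hindi content.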
Learning in Networked Control Systems
Title | Learning in Networked Control Systems |
Authors | Rahul Singh, P. R. Kumar |
Abstract | We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel. We propose Upper Confidence Bounds for Networked Control Systems (UCB-NCS), a learning rule that maintains confidence intervals for the estimates of the plant parameters $(A_{(\star)},B_{(\star)})$ and the channel reliability $p_{(\star)}$, and utilizes the principle of optimism in the face of uncertainty while making control decisions. We provide non-asymptotic performance guarantees for UCB-NCS by analyzing its “regret”, i.e., the performance gap from the scenario when $(A_{(\star)},B_{(\star)},p_{(\star)})$ are known to the controller. We show that with high probability the regret can be upper-bounded as $\tilde{O}\left(C\sqrt{T}\right)$, where $\tilde{O}$ hides logarithmic factors, $T$ is the operating time horizon of the system, and $C$ is a problem-dependent constant. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09596v1 |
PDF | https://arxiv.org/pdf/2003.09596v1.pdf |
PWC | https://paperswithcode.com/paper/learning-in-networked-control-systems |
Repo | |
Framework | |
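The channel-reliability half of the optimism principle is the easiest piece to sketch. Below is a generic Hoeffding-style upper confidence bound on the packet-delivery probability $p_{(\star)}$, a standard construction that need not match the paper's exact constants; the plant-parameter side would maintain analogous confidence ellipsoids around least-squares estimates of $(A_{(\star)},B_{(\star)})$.

```python
import numpy as np

class ChannelUCB:
    """Optimistic estimate of the packet-delivery probability p_star."""
    def __init__(self):
        self.sent = 0
        self.delivered = 0

    def update(self, delivered: bool):
        """Record one transmission attempt and its outcome."""
        self.sent += 1
        self.delivered += int(delivered)

    def optimistic_p(self, t: int) -> float:
        """Empirical mean plus a Hoeffding bonus, clipped to [0, 1]."""
        if self.sent == 0:
            return 1.0                      # fully optimistic before data
        p_hat = self.delivered / self.sent
        bonus = np.sqrt(2.0 * np.log(max(t, 2)) / self.sent)
        return min(1.0, p_hat + bonus)
```

The controller then acts as if the channel were as reliable as `optimistic_p(t)`, which is what "optimism in the face of uncertainty" means here.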
Adversarial Filters of Dataset Biases
Title | Adversarial Filters of Dataset Biases |
Authors | Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi |
Abstract | Large neural models have demonstrated human-level performance on language and vision benchmarks such as ImageNet and Stanford Natural Language Inference (SNLI). Yet, their performance degrades considerably when tested on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting on spurious dataset biases. We investigate one recently proposed approach, AFLite, which adversarially filters such dataset biases, as a means to mitigate the prevalent overestimation of machine performance. We provide a theoretical understanding for AFLite, by situating it in the generalized framework for optimum bias reduction. Our experiments show that as a result of the substantial reduction of these biases, models trained on the filtered datasets yield better generalization to out-of-distribution tasks, especially when the benchmarks used for training are over-populated with biased samples. We show that AFLite is broadly applicable to a variety of both real and synthetic datasets for reduction of measurable dataset biases and provide extensive supporting analyses. Finally, filtering results in a large drop in model performance (e.g., from 92% to 63% for SNLI), while human performance still remains high. Our work thus shows that such filtered datasets can pose new research challenges for robust generalization by serving as upgraded benchmarks. |
Tasks | Natural Language Inference |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04108v2 |
PDF | https://arxiv.org/pdf/2002.04108v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-filters-of-dataset-biases-1 |
Repo | |
Framework | |
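The adversarial filtering loop can be sketched as follows. This is a sketch in the spirit of AFLite rather than a faithful reimplementation: per-instance "predictability" is the fraction of held-out ensemble runs in which a linear probe classifies the instance correctly, and the most predictable instances are removed each round. All hyper-parameters here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def adversarial_filter(X, y, target_size, n_models=8, train_frac=0.5, cut=500):
    """Iteratively remove instances that weak linear probes find too easy."""
    idx = np.arange(len(y))
    while len(idx) > target_size:
        correct = np.zeros(len(idx))
        seen = np.zeros(len(idx))
        for _ in range(n_models):
            tr, te = train_test_split(np.arange(len(idx)), train_size=train_frac)
            clf = LogisticRegression(max_iter=1000).fit(X[idx[tr]], y[idx[tr]])
            pred = clf.predict(X[idx[te]])
            correct[te] += (pred == y[idx[te]])
            seen[te] += 1
        score = correct / np.maximum(seen, 1)      # predictability per instance
        drop = np.argsort(-score)[:cut]            # most predictable instances
        if score[drop].max() <= 0.75:              # nothing biased remains
            break
        idx = np.delete(idx, drop)
    return idx                                      # indices of the filtered set
```

The reported SNLI drop (92% to 63%) is model accuracy on such a filtered benchmark, not a change to the filtering procedure itself.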
Temporal Information Processing on Noisy Quantum Computers
Title | Temporal Information Processing on Noisy Quantum Computers |
Authors | Jiayin Chen, Hendra I. Nurdin, Naoki Yamamoto |
Abstract | The combination of machine learning and quantum computing has emerged as a promising approach for addressing previously untenable problems. Reservoir computing is a state-of-the-art machine learning paradigm that utilizes nonlinear dynamical systems for temporal information processing, whose state-space dimension plays a key role in the performance. Here we propose a quantum reservoir system that harnesses complex dissipative quantum dynamics and the exponentially large quantum state-space. Our proposal is readily implementable on available noisy gate-model quantum processors and possesses universal computational power for approximating nonlinear short-term memory maps, important in applications such as neural modeling, speech recognition and natural language processing. We experimentally demonstrate on superconducting quantum computers that small and noisy quantum reservoirs can tackle high-order nonlinear temporal tasks. Our theoretical and experimental results pave the way for attractive temporal processing applications of near-term gate-model quantum computers of increasing fidelity but without quantum error correction, signifying the potential of these devices for wider applications beyond static classification and regression tasks in interdisciplinary areas. |
Tasks | Speech Recognition |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09498v1 |
PDF | https://arxiv.org/pdf/2001.09498v1.pdf |
PWC | https://paperswithcode.com/paper/temporal-information-processing-on-noisy |
Repo | |
Framework | |
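The reservoir computing paradigm the abstract builds on is easiest to see in its classical echo-state form: a fixed dynamical system is driven by the input and only a linear readout is trained. The sketch below is a classical stand-in for intuition only; the paper's reservoir is a dissipative quantum circuit with an exponentially large state space, which this does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, n_res=50, rho=0.9):
    """Drive a fixed random nonlinear system with input sequence u and
    collect its states; nothing inside the reservoir is trained."""
    W = rng.normal(size=(n_res, n_res))
    W *= rho / max(abs(np.linalg.eigvals(W)))   # echo-state spectral scaling
    w_in = rng.normal(size=n_res)
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + w_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Short-term memory task: recover the input from three steps ago.
u = rng.uniform(0.0, 0.5, size=2000)
X, y = run_reservoir(u)[10:], u[7:-3]           # state at t, target u(t-3)
w = np.linalg.lstsq(X, y, rcond=None)[0]        # trained linear readout
print("train MSE:", np.mean((X @ w - y) ** 2))
```

In the quantum version, the tanh reservoir is replaced by repeated noisy gate dynamics and the readout regresses on measured observables.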
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300
Title | Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300 |
Authors | Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury |
Abstract | It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training. In this paper, we show that state-of-the-art recognition performance can be achieved on the Switchboard-300 database using a single headed attention, LSTM based model. Using a cross-utterance language model, our single-pass speaker independent system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and CallHome subsets of Hub5’00, without a pronunciation lexicon. While careful regularization and data augmentation are crucial in achieving this level of performance, experiments on Switchboard-2000 show that nothing is more useful than more data. |
Tasks | Data Augmentation, Language Modelling, Speech Recognition |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.07263v1 |
PDF | https://arxiv.org/pdf/2001.07263v1.pdf |
PWC | https://paperswithcode.com/paper/single-headed-attention-based-sequence-to |
Repo | |
Framework | |
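For reference, a single attention head in an LSTM-based attention decoder looks like the classic additive (Bahdanau-style) module below. This is a generic sketch of single-headed attention, not the paper's exact architecture or hyper-parameters.

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    """Additive attention with one head over encoder states."""
    def __init__(self, enc_dim, dec_dim, att_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, att_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, att_dim, bias=False)
        self.v = nn.Linear(att_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, time, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                         # (batch, time)
        alpha = torch.softmax(scores, dim=-1)  # attention weights
        context = (alpha.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, alpha                  # context feeds the decoder LSTM
```

The paper's point is that this one head, combined with careful regularization and data augmentation, suffices for state-of-the-art Switchboard-300 results.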
Investigating Simple Object Representations in Model-Free Deep Reinforcement Learning
Title | Investigating Simple Object Representations in Model-Free Deep Reinforcement Learning |
Authors | Guy Davidson, Brenden M. Lake |
Abstract | We explore the benefits of augmenting state-of-the-art model-free deep reinforcement learning algorithms with simple object representations. Following the Frostbite challenge posited by Lake et al. (2017), we identify object representations as a critical cognitive capacity lacking from current reinforcement learning agents. We discover that providing the Rainbow model (Hessel et al., 2018) with simple, feature-engineered object representations substantially boosts its performance on the Frostbite game from Atari 2600. We then analyze the relative contributions of the representations of different types of objects, identify environment states where these representations are most impactful, and examine how these representations aid in generalizing to novel situations. |
Tasks | |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06703v1 |
PDF | https://arxiv.org/pdf/2002.06703v1.pdf |
PWC | https://paperswithcode.com/paper/investigating-simple-object-representations |
Repo | |
Framework | |
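The augmentation idea amounts to concatenating hand-engineered object features to the convolutional embedding before the value head. The sketch below shows that wiring under stated simplifications: Rainbow's distributional and dueling heads are collapsed into a plain Q-head, and `conv`, `obj_feats` are placeholders for the frame encoder and the feature-engineered object vector.

```python
import torch
import torch.nn as nn

class ObjectAugmentedQNet(nn.Module):
    """Q-network whose head sees both pixels and object features."""
    def __init__(self, conv: nn.Module, conv_dim: int, obj_dim: int, n_actions: int):
        super().__init__()
        self.conv = conv                     # any frame encoder
        self.head = nn.Sequential(
            nn.Linear(conv_dim + obj_dim, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, frames, obj_feats):
        z = self.conv(frames).flatten(start_dim=1)
        # concatenate object positions/types etc. to the visual embedding
        return self.head(torch.cat([z, obj_feats], dim=-1))
```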
Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization
Title | Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization |
Authors | Tianxiang Gao, Songtao Lu, Jia Liu, Chris Chu |
Abstract | We propose a new randomized Bregman (block) coordinate descent (RBCD) method for minimizing a composite problem, where the objective function can be either convex or nonconvex, and the smooth part is freed from the global Lipschitz-continuous (partial) gradient assumption. Under the notion of relative smoothness based on the Bregman distance, we prove that every limit point of the generated sequence is a stationary point. Further, we show that the iteration complexity of the proposed method is $O(n\varepsilon^{-2})$ to achieve an $\varepsilon$-stationary point, where $n$ is the number of blocks of coordinates. If the objective is assumed to be convex, the iteration complexity improves to $O(n\varepsilon^{-1})$. If, in addition, the objective is strongly convex (relative to the reference function), the global linear convergence rate is recovered. We also present an accelerated version of the RBCD method, which attains an $O(n\varepsilon^{-1/\gamma})$ iteration complexity for the convex case, where the scalar $\gamma\in [1,2]$ is determined by the generalized translation variant of the Bregman distance. Convergence analysis without assuming the global Lipschitz-continuous (partial) gradient sets our results apart from existing work on composite problems. |
Tasks | |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05202v1 |
PDF | https://arxiv.org/pdf/2001.05202v1.pdf |
PWC | https://paperswithcode.com/paper/randomized-bregman-coordinate-descent-methods |
Repo | |
Framework | |
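For orientation, a generic randomized Bregman coordinate step has the following form; the notation ($f$ smooth part, $r$ nonsmooth part, $h$ reference function, $\alpha$ step size) is a standard rendering and may differ in detail from the paper's.

```latex
% One iteration: draw a block i_k uniformly from {1, ..., n}, update that block.
\[
x^{k+1}_{i_k} \in \operatorname*{arg\,min}_{u}\;
  \big\langle \nabla_{i_k} f(x^k),\, u - x^k_{i_k} \big\rangle
  + r_{i_k}(u)
  + \frac{1}{\alpha}\, D_{h}\!\big(u,\, x^k_{i_k}\big),
\qquad
D_h(u,v) = h(u) - h(v) - \langle \nabla h(v),\, u - v \rangle .
\]
```

Replacing the usual squared Euclidean proximal term with the Bregman distance $D_h$ is exactly what lets the analysis trade global Lipschitz gradients for relative smoothness with respect to $h$.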
Industrial Scale Privacy Preserving Deep Neural Network
Title | Industrial Scale Privacy Preserving Deep Neural Network |
Authors | Longfei Zheng, Chaochao Chen, Yingting Liu, Bingzhe Wu, Xibin Wu, Li Wang, Lei Wang, Jun Zhou, Shuang Yang |
Abstract | Deep Neural Networks (DNNs) have been showing great potential in various real-world applications such as fraud detection and distress prediction. Meanwhile, data isolation has become a serious problem, i.e., different parties cannot share data with each other. To solve this issue, most research leverages cryptographic techniques to train secure DNN models for multiple parties without compromising their private data. Although such methods have strong security guarantees, they are difficult to scale to deep networks and large datasets due to their high communication and computation complexities. To address the scalability of existing secure DNNs in data isolation scenarios, in this paper we propose an industrial-scale privacy-preserving neural network learning paradigm, which is secure against semi-honest adversaries. Our main idea is to split the computation graph of the DNN into two parts: the computations related to private data are performed by each party using cryptographic techniques, and the remaining computations are done by a neutral server with high computation ability. We also present a defender mechanism for further privacy protection. We conduct experiments on a real-world fraud detection dataset and a financial distress prediction dataset; the encouraging results demonstrate the practicality of our proposal. |
Tasks | Fraud Detection |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05198v2 |
PDF | https://arxiv.org/pdf/2003.05198v2.pdf |
PWC | https://paperswithcode.com/paper/industrial-scale-privacy-preserving-deep |
Repo | |
Framework | |
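The computation-graph split can be sketched structurally as below. This sketch deliberately omits the cryptography: in the paper the party-side layers run under cryptographic protection, whereas here they run in the clear purely to show where the graph is cut and what the neutral server sees (intermediate activations only, never raw features).

```python
import torch
import torch.nn as nn

class PartyModel(nn.Module):
    """Layers that touch one party's raw private features."""
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())

    def forward(self, x):
        return self.net(x)        # only this activation leaves the party

class ServerModel(nn.Module):
    """Remaining layers, run by the high-compute neutral server."""
    def __init__(self, hidden, n_parties, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden * n_parties, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, party_acts):
        # party_acts: list of activation tensors, one per party
        return self.net(torch.cat(party_acts, dim=-1))
```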
A Deep Neural Framework for Contextual Affect Detection
Title | A Deep Neural Framework for Contextual Affect Detection |
Authors | Kumar Shikhar Deep, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | A short and simple text carrying no emotion can convey strong emotions when read along with its context, i.e., the same sentence can express extreme anger as well as happiness depending on its context. In this paper, we propose a Contextual Affect Detection (CAD) framework which learns the inter-dependence of words in a sentence, and at the same time the inter-dependence of sentences in a dialogue. Our proposed CAD framework is based on a Gated Recurrent Unit (GRU), which is further assisted by contextual word embeddings and other diverse hand-crafted feature sets. Evaluation and analysis suggest that our model outperforms the state-of-the-art methods by 5.49% and 9.14% on the Friends and EmotionPush datasets, respectively. |
Tasks | Word Embeddings |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10169v1 |
PDF | https://arxiv.org/pdf/2001.10169v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-neural-framework-for-contextual-affect |
Repo | |
Framework | |
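The two levels of inter-dependence (words within a sentence, sentences within a dialogue) map naturally onto a hierarchical GRU. The sketch below shows that skeleton only; the paper additionally feeds in contextual word embeddings and hand-crafted features, which are omitted here.

```python
import torch
import torch.nn as nn

class ContextualAffectNet(nn.Module):
    """Word-level GRU per utterance, then a dialogue-level GRU over
    utterance vectors, then a per-utterance emotion classifier."""
    def __init__(self, emb_dim, hid, n_emotions):
        super().__init__()
        self.word_gru = nn.GRU(emb_dim, hid, batch_first=True)
        self.sent_gru = nn.GRU(hid, hid, batch_first=True)
        self.clf = nn.Linear(hid, n_emotions)

    def forward(self, dialogue):
        # dialogue: (n_sentences, n_words, emb_dim) for one conversation
        _, h = self.word_gru(dialogue)   # h: (1, n_sentences, hid)
        ctx, _ = self.sent_gru(h)        # sentences treated as one sequence
        return self.clf(ctx).squeeze(0)  # (n_sentences, n_emotions) logits
```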
An Information-Theoretic Approach to Personalized Explainable Machine Learning
Title | An Information-Theoretic Approach to Personalized Explainable Machine Learning |
Authors | Alexander Jung, Pedro H. J. Nardelli |
Abstract | Automated decision making is used routinely throughout our everyday life. Recommender systems decide which jobs, movies, or other user profiles might be interesting to us. Spell checkers help us to make good use of language. Fraud detection systems decide if a credit card transaction should be verified more closely. Many of these decision making systems use machine learning methods that fit complex models to massive datasets. The successful deployment of machine learning (ML) methods to many (critical) application domains crucially depends on their explainability. Indeed, humans have a strong desire to get explanations that resolve the uncertainty about experienced phenomena like the predictions and decisions obtained from ML methods. Explainable ML is challenging since explanations must be tailored (personalized) to individual users with varying backgrounds. Some users might have received university-level education in ML, while other users might have no formal training in linear algebra. Linear regression with few features might be perfectly interpretable for the first group but might be considered a black box by the latter. We propose a simple probabilistic model for the predictions and user knowledge. This model allows us to study explainable ML using information theory. Explaining is here considered as the task of reducing the “surprise” incurred by a prediction. We quantify the effect of an explanation by the conditional mutual information between the explanation and the prediction, given the user background. |
Tasks | Decision Making, Fraud Detection, Recommendation Systems |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00484v2 |
PDF | https://arxiv.org/pdf/2003.00484v2.pdf |
PWC | https://paperswithcode.com/paper/an-information-theoretic-approach-to-3 |
Repo | |
Framework | |
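The quantity the abstract proposes, the conditional mutual information between explanation E and prediction Yhat given user background U, is straightforward to compute for a finite probabilistic model. The sketch below does exactly that from a joint probability table; the three-variable setup is an illustrative discretization, not the paper's concrete model.

```python
import numpy as np

def conditional_mutual_information(p):
    """I(E; Yhat | U) for a joint pmf p[e, y, u] (numpy array summing to 1):
    sum over p(e,y,u) * log( p(e,y,u) p(u) / (p(e,u) p(y,u)) )."""
    p_u = p.sum(axis=(0, 1), keepdims=True)     # p(u)
    p_eu = p.sum(axis=1, keepdims=True)         # p(e, u)
    p_yu = p.sum(axis=0, keepdims=True)         # p(y, u)
    mask = p > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (p * p_u) / (p_eu * p_yu)
    return float((p[mask] * np.log(ratio[mask])).sum())

# Tiny worked example: two explanations, two predictions, one background.
# Explanation independent of prediction -> the explanation removes no surprise.
p = np.array([[[0.25], [0.25]],
              [[0.25], [0.25]]])
print(conditional_mutual_information(p))  # 0.0
```

A useful explanation is one for which this value is large: knowing E substantially reduces the remaining uncertainty about the prediction for a user with background U.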
Link Prediction using Graph Neural Networks for Master Data Management
Title | Link Prediction using Graph Neural Networks for Master Data Management |
Authors | Balaji Ganesan, Gayatri Mishra, Srinivas Parkala, Neeraj R Singh, Hima Patel, Somashekar Naganna |
Abstract | Learning graph representations of n-ary relational data has a number of real-world applications like anti-money laundering, fraud detection, risk assessment, etc. Graph Neural Networks have been shown to be effective in predicting links with few or no node features. While a number of datasets exist for link prediction, their features are considerably different from real-world applications. Temporal information on entities and relations is often unavailable. We introduce a new dataset with 10 subgraphs, 20912 nodes, 67564 links, 70 attributes and 9 relation types. We also present novel improvements to graph models to adapt them for industry-scale applications. |
Tasks | Fraud Detection, Link Prediction |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.04732v1 |
PDF | https://arxiv.org/pdf/2003.04732v1.pdf |
PWC | https://paperswithcode.com/paper/link-prediction-using-graph-neural-networks |
Repo | |
Framework | |
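The node-embedding-then-edge-score pattern underlying GNN link prediction is easy to sketch. Below is a minimal GraphSAGE-flavoured predictor in plain PyTorch, far simpler than the paper's models, with one mean-aggregation layer and a dot-product scorer; a dense adjacency matrix is assumed for brevity.

```python
import torch
import torch.nn as nn

class MeanAggLinkPredictor(nn.Module):
    """One mean-aggregation graph layer plus a dot-product edge scorer."""
    def __init__(self, in_dim, hid):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, hid)

    def forward(self, x, adj):
        # x: (n, in_dim) node attributes; adj: (n, n) dense 0/1 adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                      # mean of neighbour features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

    def score(self, h, src, dst):
        """Probability that an edge (src, dst) exists, from embeddings h."""
        return torch.sigmoid((h[src] * h[dst]).sum(dim=-1))
```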
Adapted tree boosting for Transfer Learning
Title | Adapted tree boosting for Transfer Learning |
Authors | Wenjing Fang, Chaochao Chen, Bowen Song, Li Wang, Jun Zhou, Kenny Q. Zhu |
Abstract | Secure online transactions are an essential task for e-commerce platforms. Alipay, one of the world’s leading cashless payment platforms, provides the payment service to both merchants and individual customers. Fraud detection models are built to protect the customers, but stronger demands are raised by new scenarios that lack training data and labels. The proposed model makes a difference by utilizing data from similar old scenarios, while data from a new scenario is treated as the target domain to be promoted. Inspired by this real case in Alipay, we view the problem as a transfer learning problem and design a set of revise strategies to transfer source-domain models to the target domain under the framework of gradient boosting tree models. This work provides an option for the cold-start and data-sharing problems. |
Tasks | Fraud Detection, Transfer Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.11982v1 |
PDF | https://arxiv.org/pdf/2002.11982v1.pdf |
PWC | https://paperswithcode.com/paper/adapted-tree-boosting-for-transfer-learning |
Repo | |
Framework | |
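One simple way to transfer a boosted-tree model from a data-rich source scenario to a label-scarce target is to feed the source model's score to the target model as an extra feature. The sketch below shows that stacking-style strategy as one plausible instance; it is not the paper's specific set of revise strategies.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def transfer_boosting(X_src, y_src, X_tgt, y_tgt):
    """Train on the source scenario, then train the target model with the
    source model's fraud score appended as an additional feature."""
    src = GradientBoostingClassifier().fit(X_src, y_src)
    aug = np.column_stack([X_tgt, src.predict_proba(X_tgt)[:, 1]])
    tgt = GradientBoostingClassifier().fit(aug, y_tgt)
    return src, tgt

def predict_target(src, tgt, X):
    """Score new target-scenario transactions with both models chained."""
    aug = np.column_stack([X, src.predict_proba(X)[:, 1]])
    return tgt.predict_proba(aug)[:, 1]
```

The target model only needs to learn how the new scenario deviates from the source model's judgment, which is why far fewer target labels suffice.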
Uncovering Insurance Fraud Conspiracy with Network Learning
Title | Uncovering Insurance Fraud Conspiracy with Network Learning |
Authors | Chen Liang, Ziqi Liu, Bin Liu, Jun Zhou, Xiaolong Li, Shuang Yang, Yuan Qi |
Abstract | Fraudulent claim detection is one of the greatest challenges the insurance industry faces. Alibaba’s return-freight insurance, providing return-shipping postage compensation for product returns on the e-commerce platform, receives thousands of potentially fraudulent claims every day. Such deliberate abuse of the insurance policy could lead to heavy financial losses. In order to detect and prevent fraudulent insurance claims, we developed a novel data-driven procedure to identify groups of organized fraudsters, one of the major contributors to financial losses, by learning network information. In this paper, we introduce a device-sharing network among claimants, followed by an automated solution for fraud detection based on graph learning algorithms, to separate fraudsters from regular customers and uncover groups of organized fraudsters. This solution, deployed at Alibaba, achieves more than 80% precision while covering 44% more suspicious accounts than a previously deployed rule-based classifier, after human expert investigations. Our approach easily and effectively generalizes to other types of insurance. |
Tasks | Fraud Detection |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12789v1 |
PDF | https://arxiv.org/pdf/2002.12789v1.pdf |
PWC | https://paperswithcode.com/paper/uncovering-insurance-fraud-conspiracy-with |
Repo | |
Framework | |
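The device-sharing network itself is simple to construct: claimants become nodes, and two claimants are linked when they filed claims from a common device. The sketch below builds that projection with networkx; the graph learning model the paper runs on top of it is not reproduced, and the `claims` input format is an assumption for illustration.

```python
import networkx as nx

def claimant_graph(claims):
    """Build the device-sharing network from (claimant_id, device_id) pairs
    via a bipartite claimant-device graph projected onto claimants."""
    b = nx.Graph()
    for claimant, device in claims:
        b.add_node(("c", claimant), bipartite=0)
        b.add_node(("d", device), bipartite=1)
        b.add_edge(("c", claimant), ("d", device))
    claimants = [n for n in b if n[0] == "c"]
    return nx.bipartite.projected_graph(b, claimants)

g = claimant_graph([(1, "a"), (2, "a"), (2, "b"), (3, "b"), (4, "c")])
# organized groups surface as dense clusters / connected components
print([sorted(c) for c in nx.connected_components(g)])
```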
Generative ODE Modeling with Known Unknowns
Title | Generative ODE Modeling with Known Unknowns |
Authors | Ori Linial, Danny Eytan, Uri Shalit |
Abstract | In several crucial applications, domain knowledge is encoded by a system of ordinary differential equations (ODEs). A motivating example is intensive care unit patients: the dynamics of some vital physiological variables such as heart rate, blood pressure and arterial compliance can be approximately described by a known system of ODEs. Typically, some of the ODE variables are directly observed while some are unobserved, and in addition many other variables are observed but not modeled by the ODE, for example body temperature. Importantly, the unobserved ODE variables are "known unknowns": we know they exist and know their functional dynamics, but cannot measure them directly, nor do we know the function tying them to all observed measurements. Estimating these known unknowns is often highly valuable to physicians. Under this scenario we wish to: (i) learn the static parameters of the ODE generating each observed time-series; (ii) infer the dynamic sequence of all ODE variables, including the known unknowns; and (iii) extrapolate the future of the ODE variables and the observations of the time-series. We address this task with a variational autoencoder incorporating the known ODE function, called GOKU-net, for Generative ODE modeling with Known Unknowns. We test our method on videos of pendulums with unknown length and on a model of the cardiovascular system. |
Tasks | Time Series |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10775v1 |
PDF | https://arxiv.org/pdf/2003.10775v1.pdf |
PWC | https://paperswithcode.com/paper/generative-ode-modeling-with-known-unknowns |
Repo | |
Framework | |
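The architectural idea, encode the observed series into an initial state and static parameters, roll the known ODE forward, and decode back to observation space, can be sketched as below. This is a structural sketch only: the variational (encoder/prior KL) machinery is omitted, a crude Euler integrator stands in for a proper solver, and the pendulum ODE is just a stand-in for whatever system is known in the application.

```python
import torch
import torch.nn as nn

class GOKUSketch(nn.Module):
    """Encoder -> (initial ODE state, static parameter) -> known ODE rollout
    -> decoder. The pendulum length plays the role of a learned static
    parameter; the angle/velocity pair is the latent ODE state."""
    def __init__(self, obs_dim, hid=64):
        super().__init__()
        self.enc = nn.GRU(obs_dim, hid, batch_first=True)
        self.to_z0 = nn.Linear(hid, 2)       # (angle, angular velocity)
        self.to_param = nn.Linear(hid, 1)    # unknown pendulum length
        self.dec = nn.Linear(2, obs_dim)

    def ode_step(self, z, length, dt=0.05, g=9.81):
        # one Euler step of the known pendulum ODE
        theta, omega = z[..., 0], z[..., 1]
        return torch.stack(
            [theta + dt * omega,
             omega - dt * (g / length.squeeze(-1)) * torch.sin(theta)],
            dim=-1)

    def forward(self, x, horizon):
        _, h = self.enc(x)                               # summarize the series
        z = self.to_z0(h[-1])
        length = torch.nn.functional.softplus(self.to_param(h[-1])) + 0.1
        out = []
        for _ in range(horizon):                         # roll the known ODE
            z = self.ode_step(z, length)
            out.append(self.dec(z))
        return torch.stack(out, dim=1)   # reconstructed / extrapolated series
```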
Interleaved Sequence RNNs for Fraud Detection
Title | Interleaved Sequence RNNs for Fraud Detection |
Authors | Bernardo Branco, Pedro Abreu, Ana Sofia Gomes, Mariana S. C. Almeida, João Tiago Ascensão, Pedro Bizarro |
Abstract | Payment card fraud causes multibillion-dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies. In this work, we avoid those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment. We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models, saving millions of dollars in fraud detection and using fewer computational resources. |
Tasks | Feature Engineering, Fraud Detection |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.05988v1 |
PDF | https://arxiv.org/pdf/2002.05988v1.pdf |
PWC | https://paperswithcode.com/paper/interleaved-sequence-rnns-for-fraud-detection |
Repo | |
Framework | |
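The interleaved-sequence trick is the interesting part: one recurrent cell is shared across all cards, but each card keeps its own hidden state, so each card's history forms an unbounded, irregular sub-sequence exactly as the abstract describes. The sketch below shows that state-keeping pattern, not the full pipeline; the stream format is an assumption for illustration.

```python
import torch
import torch.nn as nn

class InterleavedFraudRNN(nn.Module):
    """Score an interleaved payment stream with per-card hidden states."""
    def __init__(self, txn_dim, hid=32):
        super().__init__()
        self.cell = nn.GRUCell(txn_dim, hid)   # shared across all cards
        self.out = nn.Linear(hid, 1)
        self.hid = hid

    def forward(self, stream):
        # stream: iterable of (card_id, txn_features) in arrival order
        states, scores = {}, []
        for card, x in stream:
            h = states.get(card, torch.zeros(1, self.hid))
            h = self.cell(x.unsqueeze(0), h)        # update only this card
            states[card] = h.detach()               # persist across the stream
            scores.append(torch.sigmoid(self.out(h)))  # fraud probability
        return torch.cat(scores).squeeze(-1)
```

Because state lookup is a dictionary access rather than a feature-store join, the model sidesteps the millisecond-latency enrichment systems that tree-based pipelines require.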