Paper Group ANR 288
Machine Learning at the Network Edge: A Survey. Learning and Planning in Feature Deception Games. 3D Hand Shape and Pose Estimation from a Single RGB Image. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control. Rk-means: Fast Clustering for Relational Data. Lessons from Building Acoustic Models with a Millio …
Machine Learning at the Network Edge: A Survey
Title | Machine Learning at the Network Edge: A Survey |
Authors | M. G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain |
Abstract | Devices comprising the Internet of Things, such as sensors and small cameras, usually have small memories and limited computational power. The proliferation of such resource-constrained devices in recent years has led to the generation of large quantities of data. These data-producing devices are appealing targets for machine learning applications but struggle to run machine learning algorithms due to their limited computing capability. They typically offload data to external computing systems (such as cloud servers) for further processing. The results of the machine learning computations are communicated back to the resource-scarce devices, but this worsens latency, leads to increased communication costs, and adds to privacy concerns. Therefore, efforts have been made to place additional computing devices at the edge of the network, i.e., close to the IoT devices where the data is generated. Deploying machine learning systems on such edge devices alleviates the above issues by allowing computations to be performed close to the data sources. This survey describes major research efforts where machine learning has been deployed at the edge of computer networks. |
Tasks | |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1908.00080v2 |
https://arxiv.org/pdf/1908.00080v2.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-at-the-network-edge-a-survey |
Repo | |
Framework | |
Learning and Planning in Feature Deception Games
Title | Learning and Planning in Feature Deception Games |
Authors | Zheyuan Ryan Shi, Ariel D. Procaccia, Kevin S. Chan, Sridhar Venkatesan, Noam Ben-Asher, Nandi O. Leslie, Charles Kamhoua, Fei Fang |
Abstract | Today’s high-stakes adversarial interactions feature attackers who constantly breach the ever-improving security measures. Deception mitigates the defender’s loss by misleading the attacker to make suboptimal decisions. In order to formally reason about deception, we introduce the feature deception game (FDG), a domain-independent game-theoretic model and present a learning and planning framework. We make the following contributions. (1) We show that we can uniformly learn the adversary’s preferences using data from a modest number of deception strategies. (2) We propose an approximation algorithm for finding the optimal deception strategy and show that the problem is NP-hard. (3) We perform extensive experiments to empirically validate our methods and results. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04833v1 |
https://arxiv.org/pdf/1905.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-planning-in-feature-deception |
Repo | |
Framework | |
3D Hand Shape and Pose Estimation from a Single RGB Image
Title | 3D Hand Shape and Pose Estimation from a Single RGB Image |
Authors | Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan |
Abstract | This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods in 3D hand analysis from monocular RGB images only focus on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose. To train networks with full supervision, we create a large-scale synthetic dataset containing both ground truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly-supervised approach that leverages the depth map as a weak supervision in training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our proposed method can produce accurate and reasonable 3D hand meshes, and can achieve superior 3D hand pose estimation accuracy when compared with state-of-the-art methods. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2019-03-03 |
URL | http://arxiv.org/abs/1903.00812v2 |
http://arxiv.org/pdf/1903.00812v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-hand-shape-and-pose-estimation-from-a |
Repo | |
Framework | |
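As a rough illustration of the kind of layer a Graph CNN mesh regressor is built from (not the authors' exact architecture), the sketch below applies one normalized-adjacency graph convolution to per-vertex features of a toy 5-vertex mesh. The adjacency, feature sizes, and ReLU choice are assumptions for illustration only.

```python
import numpy as np

def graph_conv(X, A, W):
    """One graph convolution: aggregate neighbor features with a symmetrically
    normalized adjacency, then apply a linear map and ReLU.
    X: (N, F_in) vertex features, A: (N, N) mesh adjacency, W: (F_in, F_out)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # D^{-1/2} (A + I) D^{-1/2}
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy example: 5 mesh vertices in a ring, 8-dim features mapped to 3-D coordinates.
rng = np.random.default_rng(0)
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
X = rng.normal(size=(5, 8))
W = rng.normal(size=(8, 3))
print(graph_conv(X, A, W).shape)  # (5, 3): per-vertex 3D outputs
```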
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Title | V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control |
Authors | H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick |
Abstract | Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12238v1 |
https://arxiv.org/pdf/1909.12238v1.pdf | |
PWC | https://paperswithcode.com/paper/v-mpo-on-policy-maximum-a-posteriori-policy |
Repo | |
Framework | |
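A minimal sketch of V-MPO's sample-weighting step, assuming a fixed temperature eta; in the paper the temperature is itself optimized through a Lagrangian dual, and the policy is then updated by weighted maximum likelihood on these samples under a KL trust-region constraint.

```python
import numpy as np

def vmpo_weights(advantages, eta):
    """V-MPO E-step (sketch): keep the top half of samples by advantage and
    weight them with a softmax of advantage / temperature eta."""
    adv = np.asarray(advantages, dtype=float)
    keep = adv >= np.median(adv)                # top-half filtering
    w = np.zeros_like(adv)
    z = adv[keep] / eta
    z -= z.max()                                # numerical stability
    w[keep] = np.exp(z) / np.exp(z).sum()       # normalized weights
    return w

adv = np.array([-1.0, 0.2, 0.5, 2.0, -0.3, 1.1])
print(vmpo_weights(adv, eta=1.0))
```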
Rk-means: Fast Clustering for Relational Data
Title | Rk-means: Fast Clustering for Relational Data |
Authors | Ryan Curtin, Ben Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich |
Abstract | Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be prohibitive, as the data matrix may be (much) larger than the total input relation size. This paper introduces Rk-means, or relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix. As such, we avoid having to run the expensive feature extraction query and store its output. Our algorithm leverages the underlying structures in relational data. It involves construction of a small grid coreset of the data matrix for subsequent cluster construction. This gives a constant approximation for the k-means objective, while having asymptotic runtime improvements over standard approaches of first running the database query and then clustering. Empirical results show orders-of-magnitude speedup, and Rk-means can run faster on the database than even just computing the data matrix. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04939v1 |
https://arxiv.org/pdf/1910.04939v1.pdf | |
PWC | https://paperswithcode.com/paper/rk-means-fast-clustering-for-relational-data |
Repo | |
Framework | |
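The core idea can be sketched outside the database: build a weighted grid coreset and run ordinary weighted k-means on it. The sketch below works on a materialized matrix purely to illustrate the coreset construction; the paper's contribution is computing such a coreset directly from the relations, without materializing the data matrix. Grid resolution and data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def grid_coreset(X, per_dim_centers=5, seed=0):
    """Sketch of a grid coreset: cluster each coordinate independently, snap
    every row to its per-dimension centers, and weight each occupied grid
    cell by the number of rows that landed in it."""
    n, d = X.shape
    centers = []
    codes = np.empty((n, d), dtype=int)
    for j in range(d):
        km = KMeans(n_clusters=per_dim_centers, n_init=5, random_state=seed)
        codes[:, j] = km.fit_predict(X[:, [j]])
        centers.append(km.cluster_centers_.ravel())
    cells, counts = np.unique(codes, axis=0, return_counts=True)
    coreset = np.stack([centers[j][cells[:, j]] for j in range(d)], axis=1)
    return coreset, counts

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 3))
coreset, weights = grid_coreset(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0)
km.fit(coreset, sample_weight=weights)        # weighted k-means on the coreset
print(coreset.shape, km.cluster_centers_.shape)
```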
Lessons from Building Acoustic Models with a Million Hours of Speech
Title | Lessons from Building Acoustic Models with a Million Hours of Speech |
Authors | Sree Hari Krishnan Parthasarathi, Nikko Strom |
Abstract | This is a report of our lessons learned building acoustic models from 1 million hours of unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ student/teacher training on unlabeled data, which helps scale out target generation compared to confidence-model-based methods that require a decoder and a confidence model. To optimize storage and to parallelize target generation, we store high-valued logits from the teacher model. Introducing the notion of scheduled learning, we interleave learning on unlabeled and labeled data. To scale distributed training across a large number of GPUs, we use BMUF with 64 GPUs, while performing sequence training only on labeled data with gradient threshold compression SGD using 16 GPUs. Our experiments show that extremely large amounts of data are indeed useful; with little hyper-parameter tuning, we obtain relative WER improvements in the 10 to 20% range, with higher gains in noisier conditions. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01624v1 |
http://arxiv.org/pdf/1904.01624v1.pdf | |
PWC | https://paperswithcode.com/paper/lessons-from-building-acoustic-models-with-a |
Repo | |
Framework | |
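A toy NumPy sketch of the "store only high-valued teacher logits" idea: keep the top-k logits per frame and train the student against the renormalized top-k teacher posterior. The value of k, the renormalization, and the toy dimensions are assumptions; the paper's exact recipe may differ.

```python
import numpy as np

def top_k_targets(teacher_logits, k=20):
    """Keep only the k highest-valued teacher logits per frame. Storing
    (index, value) pairs instead of the full posterior over thousands of
    classes is what makes target generation cheap to store and parallelize."""
    idx = np.argsort(teacher_logits, axis=-1)[:, -k:]
    vals = np.take_along_axis(teacher_logits, idx, axis=-1)
    return idx, vals

def student_loss(student_logits, idx, vals):
    """Cross-entropy of the student against the renormalized top-k teacher
    posterior; probability mass outside the stored indices is ignored."""
    t = np.exp(vals - vals.max(axis=-1, keepdims=True))
    t /= t.sum(axis=-1, keepdims=True)
    s = student_logits - np.log(np.exp(student_logits).sum(axis=-1, keepdims=True))
    s_sel = np.take_along_axis(s, idx, axis=-1)
    return -(t * s_sel).sum(axis=-1).mean()

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 1000))   # 4 frames, 1000 output classes (toy sizes)
student = rng.normal(size=(4, 1000))
idx, vals = top_k_targets(teacher, k=20)
print(student_loss(student, idx, vals))
```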
Neural Input Search for Large Scale Recommendation Models
Title | Neural Input Search for Large Scale Recommendation Models |
Authors | Manas R. Joglekar, Cong Li, Jay K. Adams, Pranav Khaitan, Quoc V. Le |
Abstract | Recommendation problems with large numbers of discrete items, such as products, webpages, or videos, are ubiquitous in the technology industry. Deep neural networks are being increasingly used for these recommendation problems. These models use embeddings to represent discrete items as continuous vectors, and the vocabulary sizes and embedding dimensions, although they heavily influence the model’s accuracy, are often selected manually in a heuristic manner. We present Neural Input Search (NIS), a technique for learning the optimal vocabulary sizes and embedding dimensions for categorical features. The goal is to maximize prediction accuracy subject to a constraint on the total memory used by all embeddings. Moreover, we argue that the traditional Single-size Embedding (SE), which uses the same embedding dimension for all values of a feature, suffers from inefficient usage of model capacity and training data. We propose a novel type of embedding, namely Multi-size Embedding (ME), which allows the embedding dimension to vary for different values of the feature. During training we use reinforcement learning to find the optimal vocabulary size for each feature and embedding dimension for each value of the feature. In experiments on two common types of large scale recommendation problems, i.e., retrieval and ranking problems, NIS automatically found better vocabulary and embedding sizes that result in 6.8% and 1.8% relative improvements on Recall@1 and ROC-AUC over manually optimized ones. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04471v1 |
https://arxiv.org/pdf/1907.04471v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-input-search-for-large-scale |
Repo | |
Framework | |
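A sketch of a Multi-size Embedding lookup, assuming hand-picked block boundaries and dimensions: the vocabulary is split into frequency-ordered blocks, each block gets its own (smaller) embedding dimension, and a per-block projection maps everything to a common width. In NIS these block sizes and dimensions are chosen by a reinforcement-learning controller under a memory budget, not fixed as here.

```python
import numpy as np

class MultiSizeEmbedding:
    """Illustrative Multi-size Embedding (ME) lookup; block boundaries and
    dimensions are assumptions for the sketch, not the NIS-selected values."""
    def __init__(self, block_sizes, block_dims, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.offsets = np.cumsum([0] + list(block_sizes))
        self.tables = [rng.normal(scale=0.1, size=(v, d))
                       for v, d in zip(block_sizes, block_dims)]
        self.projs = [rng.normal(scale=0.1, size=(d, out_dim)) for d in block_dims]

    def lookup(self, ids):
        out = []
        for i in np.atleast_1d(ids):
            b = np.searchsorted(self.offsets, i, side="right") - 1  # which block
            row = self.tables[b][i - self.offsets[b]]               # block-local row
            out.append(row @ self.projs[b])                         # project to out_dim
        return np.stack(out)

emb = MultiSizeEmbedding(block_sizes=[1000, 9000, 90000],
                         block_dims=[64, 32, 8], out_dim=64)
print(emb.lookup([3, 5000, 70000]).shape)   # (3, 64)
```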
iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters
Title | iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters |
Authors | Ruhul Amin, Chowdhury Rafeed Rahman, Md. Habibur Rahman Sifat, Md Nazmul Khan Liton, Md. Moshiur Rahman, Swakkhar Shatabda, Sajid Ahmed |
Abstract | A promoter is a short region of DNA which is responsible for initiating transcription of specific genes. Development of computational tools for automatic identification of promoters is in high demand. Promoters can be of different types according to their functions. Promoters may have both intra- and inter-class variation and similarity in terms of consensus sequences. Accurate classification of the various types of sigma promoters still remains a challenge. We present iPromoter-BnCNN for identification and accurate classification of six types of promoters - sigma24, sigma28, sigma32, sigma38, sigma54, sigma70. It is a Convolutional Neural Network (CNN) based classifier which combines local features related to monomer nucleotide sequence, trimer nucleotide sequence, dimer structural properties and trimer structural properties through the use of parallel branching. We conducted experiments on a benchmark dataset and compared with two state-of-the-art tools, demonstrating superior performance under 5-fold cross-validation. Moreover, we tested our classifier on an independent test dataset. Our proposed tool iPromoter-BnCNN along with the source code is freely available at https://cutt.ly/te6XISV. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10251v3 |
https://arxiv.org/pdf/1912.10251v3.pdf | |
PWC | https://paperswithcode.com/paper/ipromoter-bncnn-a-novel-branched-cnn-based |
Repo | |
Framework | |
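A rough PyTorch sketch of the parallel-branch idea: separate 1-D convolution branches over different encodings of the same sequence, concatenated before a classifier head. The two branches, channel counts, kernel sizes, sequence lengths, and class count here are illustrative assumptions; the paper combines four branches (monomer/trimer sequence plus dimer/trimer structural properties).

```python
import torch
import torch.nn as nn

class BranchedPromoterCNN(nn.Module):
    """Sketch of a branched CNN in the spirit of iPromoter-BnCNN, with two
    parallel 1-D convolution branches over different input encodings."""
    def __init__(self, mono_channels=4, trimer_channels=64, n_classes=7):
        super().__init__()
        self.branch_mono = nn.Sequential(
            nn.Conv1d(mono_channels, 16, kernel_size=7, padding=3),
            nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.branch_tri = nn.Sequential(
            nn.Conv1d(trimer_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.head = nn.Linear(32, n_classes)

    def forward(self, mono, tri):
        a = self.branch_mono(mono).squeeze(-1)   # (B, 16)
        b = self.branch_tri(tri).squeeze(-1)     # (B, 16)
        return self.head(torch.cat([a, b], dim=1))

model = BranchedPromoterCNN()
mono = torch.randn(2, 4, 81)    # batch of 2 sequences, 81 bp, one-hot bases
tri = torch.randn(2, 64, 79)    # overlapping trimers encoded as 64 channels
print(model(mono, tri).shape)   # torch.Size([2, 7])
```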
Image Analytics for Legal Document Review: A Transfer Learning Approach
Title | Image Analytics for Legal Document Review: A Transfer Learning Approach |
Authors | Nathaniel Huber-Fliflet, Fusheng Wei, Haozhen Zhao, Han Qin, Shi Ye, Amy Tsang |
Abstract | Though technology-assisted review in electronic discovery has focused on text data, the need for advanced analytics to facilitate reviewing multimedia content is on the rise. In this paper, we present several applications of deep learning in computer vision to Technology Assisted Review of image data in the legal industry. These applications include image classification, image clustering, and object detection. We use transfer learning techniques to leverage established pretrained models for feature extraction and fine-tuning. These applications are the first of their kind in the legal industry for image document review. We demonstrate the effectiveness of these applications by solving real-world business challenges. |
Tasks | Image Classification, Image Clustering, Object Detection, Transfer Learning |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.12169v1 |
https://arxiv.org/pdf/1912.12169v1.pdf | |
PWC | https://paperswithcode.com/paper/image-analytics-for-legal-document-review-a |
Repo | |
Framework | |
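A generic transfer-learning sketch of the kind described (a frozen ImageNet backbone used as a feature extractor, with a new head fine-tuned on review labels); the backbone choice, the two-class labels, and the hyperparameters are assumptions, not the authors' pipeline. Assumes a recent torchvision.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical two-class setup (e.g. "responsive" vs "non-responsive" pages).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in backbone.parameters():
    p.requires_grad = False                                # freeze the backbone
backbone.fc = nn.Linear(backbone.fc.in_features, 2)        # new classification head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)                       # dummy batch of page images
labels = torch.tensor([0, 1, 0, 1])
logits = backbone(images)
loss = criterion(logits, labels)
loss.backward()                                            # only the head gets gradients
optimizer.step()
print(loss.item())
```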
Keyphrase Generation: A Text Summarization Struggle
Title | Keyphrase Generation: A Text Summarization Struggle |
Authors | Erion Çano, Ondřej Bojar |
Abstract | Authors’ keyphrases assigned to scientific articles are essential for recognizing content and topic aspects. Most of the proposed supervised and unsupervised methods for keyphrase generation are unable to produce terms that are valuable but do not appear in the text. In this paper, we explore the possibility of considering the keyphrase string as an abstractive summary of the title and the abstract. First, we collect, process and release a large dataset of scientific paper metadata that contains 2.2 million records. Then we experiment with popular text summarization neural architectures. Despite using advanced deep learning models, large quantities of data and many days of computation, our systematic evaluation on four test datasets reveals that the explored text summarization methods could not produce better keyphrases than the simpler unsupervised methods, or the existing supervised ones. |
Tasks | Text Summarization |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1904.00110v2 |
http://arxiv.org/pdf/1904.00110v2.pdf | |
PWC | https://paperswithcode.com/paper/keyphrase-generation-a-text-summarization |
Repo | |
Framework | |
Structured Bayesian Compression for Deep models in mobile enabled devices for connected healthcare
Title | Structured Bayesian Compression for Deep models in mobile enabled devices for connected healthcare |
Authors | Sijia Chen, Bin Song, Xiaojiang Du, Nadra Guizani |
Abstract | Deep models, typically deep neural networks, have millions of parameters and can analyze medical data accurately, yet in a time-consuming manner. However, energy cost-effectiveness and computational efficiency are important prerequisites for developing and deploying mobile-enabled devices, the mainstream trend in connected healthcare. |
Tasks | |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.05429v1 |
http://arxiv.org/pdf/1902.05429v1.pdf | |
PWC | https://paperswithcode.com/paper/structured-bayesian-compression-for-deep |
Repo | |
Framework | |
A Kernel Stein Test for Comparing Latent Variable Models
Title | A Kernel Stein Test for Comparing Latent Variable Models |
Authors | Heishiro Kanagawa, Wittawat Jitkrittum, Lester Mackey, Kenji Fukumizu, Arthur Gretton |
Abstract | We propose a nonparametric, kernel-based test to assess the relative goodness of fit of latent variable models with intractable unnormalized densities. Our test generalises the kernel Stein discrepancy (KSD) tests of (Liu et al., 2016; Chwialkowski et al., 2016; Yang et al., 2018; Jitkrittum et al., 2018), which required exact access to unnormalized densities. Our new test relies on the simple idea of using an approximate observed-variable marginal in place of the exact, intractable one. As our main theoretical contribution, we prove that the new test, with a properly corrected threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test (Bounliphone et al., 2015), which cannot exploit the latent structure. |
Tasks | Latent Variable Models |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00586v1 |
https://arxiv.org/pdf/1907.00586v1.pdf | |
PWC | https://paperswithcode.com/paper/a-kernel-stein-test-for-comparing-latent |
Repo | |
Framework | |
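For reference, the standard form of the kernel Stein discrepancy that these tests build on, assuming a model density $q$ with score $s_q(x) = \nabla_x \log q(x)$ and a kernel $k$; the paper's contribution is to replace the intractable marginal score with one derived from an approximate observed-variable marginal and to correct the test threshold accordingly:

$$
u_q(x, x') = s_q(x)^\top s_q(x')\, k(x, x') + s_q(x)^\top \nabla_{x'} k(x, x') + s_q(x')^\top \nabla_{x} k(x, x') + \operatorname{tr}\!\big(\nabla_x \nabla_{x'} k(x, x')\big),
\qquad
\mathrm{KSD}^2(p \,\|\, q) = \mathbb{E}_{x, x' \sim p}\big[u_q(x, x')\big].
$$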
FraudJudger: Real-World Data Oriented Fraud Detection on Digital Payment Platforms
Title | FraudJudger: Real-World Data Oriented Fraud Detection on Digital Payment Platforms |
Authors | Ruoyu Deng, Na Ruan |
Abstract | Automated detection of fraudulent behavior on electronic payment platforms is a tough problem. Fraud users often exploit the vulnerability of payment platforms and the carelessness of users to defraud money, steal passwords, launder money, etc., which causes enormous losses to digital payment platforms and users. There are many challenges for fraud detection in practice. Traditional fraud detection methods require a large-scale manually labeled dataset, which is hard to obtain in reality. Manually labeling data costs tremendous human effort. Besides, the continuous and rapid evolution of fraud users makes it hard to find new fraud patterns based on existing detection rules. In our work, we propose a real-world data oriented detection paradigm which can detect fraud users and upgrade its detection ability automatically. Based on the new paradigm, we design a novel fraud detection model, FraudJudger, to analyze user behavior on digital payment platforms and detect fraud users with less labeled data in training. FraudJudger can learn latent representations of users from unlabeled data with the help of an Adversarial Autoencoder (AAE). Furthermore, FraudJudger can find new fraud patterns from unknown users through cluster analysis. Our experiment is based on a real-world electronic payment dataset. Compared with other well-known fraud detection methods, FraudJudger achieves better detection performance with only 10% labeled data. |
Tasks | Fraud Detection |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02398v1 |
https://arxiv.org/pdf/1909.02398v1.pdf | |
PWC | https://paperswithcode.com/paper/fraudjudger-real-world-data-oriented-fraud |
Repo | |
Framework | |
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Title | Breaking Inter-Layer Co-Adaptation by Classifier Anonymization |
Authors | Ikuro Sato, Kohta Ishikawa, Guoqing Liu, Masayuki Tanaka |
Abstract | This study addresses the issue of co-adaptation between a feature extractor and a classifier in a neural network. Naive joint optimization of a feature extractor and a classifier often produces situations in which an excessively complex feature distribution, adapted to a very specific classifier, degrades the test performance. We introduce a method called Feature-extractor Optimization through Classifier Anonymization (FOCA), which is designed to avoid explicit co-adaptation between a feature extractor and a particular classifier by using many randomly-generated, weak classifiers during optimization. We put forth a mathematical proposition stating that, under special conditions, the FOCA features form a point-like distribution within the same class in a class-separable fashion. Real-data experiments under more general conditions provide supporting evidence. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01150v1 |
https://arxiv.org/pdf/1906.01150v1.pdf | |
PWC | https://paperswithcode.com/paper/breaking-inter-layer-co-adaptation-by |
Repo | |
Framework | |
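A sketch of one FOCA-style update step: draw a fresh, never-trained classifier, compute the loss through it, and update only the feature extractor, so the features cannot co-adapt to any particular classifier. The distribution of weak classifiers used here (randomly initialized linear heads) and the toy extractor are illustrative assumptions; the paper specifies its own weak-classifier construction.

```python
import torch
import torch.nn as nn

def foca_step(feature_extractor, images, labels, opt, feat_dim=64, n_classes=10):
    """One anonymized-classifier update (sketch): loss flows through a random,
    frozen weak head, and only the feature extractor is optimized."""
    weak_head = nn.Linear(feat_dim, n_classes)
    for p in weak_head.parameters():
        p.requires_grad = False                # the classifier is never trained
    feats = feature_extractor(images)
    loss = nn.functional.cross_entropy(weak_head(feats), labels)
    opt.zero_grad()
    loss.backward()                            # gradients reach only the extractor
    opt.step()
    return loss.item()

extractor = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 64), nn.ReLU())
opt = torch.optim.SGD(extractor.parameters(), lr=0.01)
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(foca_step(extractor, x, y, opt))
```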
Does SLOPE outperform bridge regression?
Title | Does SLOPE outperform bridge regression? |
Authors | Shuaiwen Wang, Haolei Weng, Arian Maleki |
Abstract | A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that the MSE concentrates around takes a complicated and implicit form. With delicate analysis of this quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators is sub-optimal compared with bridge regression estimators such as Ridge. |
Tasks | |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09345v2 |
https://arxiv.org/pdf/1909.09345v2.pdf | |
PWC | https://paperswithcode.com/paper/does-slope-outperform-bridge-regression |
Repo | |
Framework | |
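For context, the SLOPE estimator referred to above in standard notation; LASSO is the special case in which all $\lambda_i$ are equal, while bridge regression instead penalizes $\sum_i |\beta_i|^q$ (Ridge being $q = 2$):

$$
\hat{\beta}_{\mathrm{SLOPE}} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2 \;+\; \sum_{i=1}^{p} \lambda_i\, |\beta|_{(i)},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0,
$$

where $|\beta|_{(1)} \ge \cdots \ge |\beta|_{(p)}$ are the entries of $\beta$ sorted by absolute value.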